#data-science-and-ml


stone marlin
#

I remember going through both of them when I was picking out a framework to use, and I was like, "ehh, these are both fine." Streamlit, though, felt easier to manage and build stuff up with for me --- but totally subjective. They're both great tools.

#

You'll probably do the data validation and then, after you get the results, it'll take an existing template and fill the stuff. In Flask / Jinja, you need to make this template yourself. In Streamlit, it sort'a makes it for you. I don't remember Dash's thing, but I think it's similar.

agile cobalt
#

there's also Mode (and many others), if you just want reports instead of actual dashboards

#

in Dash you sort of define the HTML outline then feed plotly graphs into a div
(and use callbacks to update it based on user interactions)

stone marlin
#

Ahhh, that sounds familiar. I haven't seen Mode, dang, there's a lot of these.

#

I only know Streamlit because Emyrs told me about it here. :']

agile cobalt
#

I'm not sure if Mode was free or not though

odd meteor
#

Lol yeah in a bus. I thought that's what everyone calls it

stone marlin
agile cobalt
#

streamlit seems fine as well - if it doesn't lock anything behind a paywall (cough dash enterprise...), maybe go for it

stone marlin
#

Secretly, I love streamlit too because it integrates well with Altair, my beloved underrated plotting library. :'''']

desert oar
#

i have no idea, you will have to investigate what is different between the validation and training sets. it might be background objects, or it might be some other problem

acoustic crow
#

Let me represent this a bit more clearly. So I don't need to create any charts etc. What I have is a dataset with 'n' columns, and each column has some sort of value in it. For each column I have an observed and an expected value. I have to create a validation script in Python with the help of Pandas and some other validation scripts that I am able to find, to run checks such as whether a value is negative, etc. After the validation is completed I need to somehow visualize it in a report style, showing with flags (importance) each column that did not pass the check. It has to be sort of interactive, so it can be filtered and so on.

#

So I am not sure where to really categorize my problem if its web related or not
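
A rough sketch of what the validation step described above could look like in pandas. The column names, the sample values, the 5% tolerance and the flag levels are all assumptions made for illustration, not part of the original requirement.

```python
import pandas as pd

# Hypothetical data: one observed and one expected value per column being checked.
data = pd.DataFrame(
    {
        "column": ["revenue", "cost", "headcount"],
        "observed": [120.0, -5.0, 40.0],
        "expected": [118.0, 30.0, 50.0],
    }
)


def validate(df: pd.DataFrame, rel_tol: float = 0.05) -> pd.DataFrame:
    """Return one row per checked column with pass/fail checks and a severity flag."""
    report = df.copy()
    # Check 1: observed values must not be negative.
    report["non_negative"] = report["observed"] >= 0
    # Check 2: observed must be within rel_tol of expected.
    deviation = (report["observed"] - report["expected"]).abs()
    report["within_tolerance"] = deviation <= rel_tol * report["expected"].abs()
    report["passed"] = report["non_negative"] & report["within_tolerance"]
    # Flag: "high" for a negative value, "medium" for a tolerance miss, else "ok".
    report["flag"] = "ok"
    report.loc[~report["within_tolerance"], "flag"] = "medium"
    report.loc[~report["non_negative"], "flag"] = "high"
    return report


report = validate(data)
failed = report[~report["passed"]]
print(failed[["column", "observed", "expected", "flag"]])
```

The resulting `report` frame is an ordinary table, so it can be handed to whichever front end ends up being used for filtering.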

stone marlin
#

Yeah, that seems like something you could do in a table, but I'm not sure how things like icons of flags or highlighting work in Streamlit.

desert oar
#

this sounds like it might be easier to just roll your own flask app or something

#

the requirement seems pretty straightforward, just a yes/no indicator next to each column name, and an expandable <details> element w/ specific information about what failed if anything

#

maybe a way to export some report as a text document or json or whatever
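
A minimal sketch of that idea using only the standard library: a static HTML table with a yes/no indicator per column, an expandable `<details>` element listing what failed, and a JSON export of the same results. The `results` dictionary and its messages are made up for illustration.

```python
import html
import json

# Hypothetical validation results: column name -> list of failure messages (empty = passed).
results = {
    "revenue": [],
    "cost": ["observed value -5.0 is negative", "deviates from expected 30.0"],
}


def render_report(results: dict[str, list[str]]) -> str:
    """Build a static HTML table with a yes/no indicator and a <details> block per column."""
    rows = []
    for column, failures in results.items():
        status = "no" if failures else "yes"
        if failures:
            items = "".join(f"<li>{html.escape(msg)}</li>" for msg in failures)
            detail = f"<details><summary>{len(failures)} failed</summary><ul>{items}</ul></details>"
        else:
            detail = ""
        rows.append(f"<tr><td>{html.escape(column)}</td><td>{status}</td><td>{detail}</td></tr>")
    header = "<tr><th>Column</th><th>Passed</th><th>Details</th></tr>"
    return f"<table>{header}{''.join(rows)}</table>"


page = render_report(results)
export = json.dumps(results, indent=2)  # the same results as a JSON report
```

This covers the static version only; sorting and filtering in the browser would still need something on top of it.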

stone marlin
#

In the flask app, they'd still have to use some js framework like data.table or something. I think Streamlit has this built-in if the results are in pandas.

#

Either way is probably fine, though.

desert oar
#

would they? you could render it statically in an html table

stone marlin
#

To get an interactive table with filters?

desert oar
#

oh i missed that they wanted it to be interactive

#

this is gonna sound stupid but... have you considered generating an excel workbook?

stone marlin
#

Yeah, that's the only reason I'd recommend SL instead of just rollin' their own. https://datatables.net/ is very powerful, but also --- can be frustrating to work with.

#

Haha, that's not a bad idea either. And pandas, also, has a default exporter for excel.

desert oar
#

yeah ive generated pretty sophisticated reports that way

stone marlin
#

Yeah, it's actually really cool. A lot of people require excel, so it's a pretty nice thing they put in. :']

acoustic crow
#

So basically extract the results of the data validation into an excel spreadsheet and just structure it there?

desert oar
#

im a bit surprised there arent convenient and light-weight "off the shelf" libraries for sortable/filterable tables

desert oar
#

that said, i feel like everyone's first web app is a table

#

so it can't be that hard

#

not that there's anything wrong with streamlit either

acoustic crow
#

I am just not good with web dev, not much experience and I have no idea how to make a template and later feed data into that template and so on

desert oar
#

i'd go with excel then personally, if that meets the requirements

stone marlin
#

If you want to get better at webdev, try that out. Otherwise, there's some other good options here. :']

agile cobalt
desert oar
acoustic crow
#

I researched dash and I dont think it meets the requirements that are necessary. Streamlit sort of does. Excel seems like a good idea, but so does the web populating. I am down to learn new things but for the sake of graduating I am not sure which is the correct course of action

agile cobalt
#

excel is fine tbh

#

you can even use xlsxwriter / openpyxl to format it nicely for reports

sterile rivet
#

https://prnt.sc/xE7nXBAy0zyg

The labels consist of 3 items; together the 3 make around 12k data points. The graph y'all see above is correct for the first one:
1st item has 4.2k points
2nd item has 5.3k points (as y'all can see, the 5.3k was added on top of the 4.2k). How can I avoid this?
3rd item has 2.5k points, which is again added on top of 1 and 2. What should I do to plot their bars separately?

acoustic crow
#

i do appreciate the ideas, I think excel might be a good option as well at this point

#

Thank you for the ideas, people! If anybody has any other input that they would like to share, I'd love to discuss it!

stone marlin
desert oar
uneven flame
desert oar
sterile rivet
# desert oar is it correct or is it incorrect? who is "they"? can you provide more context fo...

So, New York(0), London(1) and Paris(2) has 4723, 5341, 2510 points respectively, and these are together merged in label(which is my x-axis here), together these make around 12k points.
I wanted to plot a bar chart for each label individually, the bar chart for New York is correct(as you could see in the graph) .
London(1) has 5.3k data points and it is supposed to show 5.3k in the graph above, but it is the addition of NewYork + London and addition of all 3 (which is 12k as shown in the graph) in the Paris barplot.
How can I plot them individually?

desert oar
#
import matplotlib.pyplot as plt
import pandas as pd

cities = pd.Series({
    'New York': 4723,
    'London': 5341,
    'Paris': 2510,
})

cities.plot.bar()
plt.show()
#

should be as easy as that

arctic wedgeBOT
#

Hey @sterile rivet!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

sterile rivet
desert oar
#
labels = [0] * len(new_york_text) + [1] * len(london_text) + [2] * len(paris_text)

what did you expect?

#

matplotlib bar just takes counts and positions

#

you're way overthinking this

#

why are you converting these to lists?

sterile rivet
desert oar
#

this code makes no sense to me

#

you just need to get the length of each dataframe and put it in a bar chart

#

it looks like you're trying to do a bunch of complicated stuff that you don't need to do

sterile rivet
# desert oar this code makes no sense to me

uh, this is actually a project with some assigned tasks, plotting a bar chart isn't a task but I am still trying to plot one for practicing matplotlib.
3 different datasets are given according to the areas, and I am supposed to make a system which predicts whether a tweet was sent from any of the 3 cities.

desert oar
#

even so. i think you are way overthinking this plot here

sterile rivet
desert oar
#

what is the simplest possible way this could work?

#

just put the 3 lengths in a list...

#
sizes = [
    len(new_york_tweets),
    len(london_tweets),
    len(paris_tweets),
]
labels = ["NY", "London", "Paris"]
plt.bar(range(len(sizes)), sizes, tick_label=labels)
sterile rivet
desert oar
#

but that isn't what i am saying to do

#

i'm saying to just get the length of each group of tweets individually

#

and just plot them

#

look at my code

#

it couldn't get any simpler

#

you're trying to do something much fancier and more complicated than you need to

#

simple is good

sterile rivet
neat anvil
acoustic crow
#

But thank you for the tip

#

Do I basically create the format of the excel file through this library or how exactly?

urban lance
#

how can I save the result of a df.groupby to a new dataframe?

neat anvil
# acoustic crow Do I basically create the format of the excel file through this library or how e...

It allows you to write excel files from python. Put data or formulas into cells, create filters, lock down certain sheets, graphs , everything. So you can use python to hit your apis or database or whatever, then make a dope excel out of the data you've collected. Once you write the code right once it’s automated and you can just run the python code every week or whatever to generate the dope spreadsheet with no work

#

But if your dataset is easily imported directly into excel, it may be kind of pointless to do anything in python

#

Except as a learning exercise for yourself
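
For a sense of what that looks like in practice, here is a small sketch that writes a results table to Excel through pandas with the openpyxl engine, switches on the filter dropdowns and colours the failed rows. The `report` frame, the sheet name and the fill colour are assumptions for illustration; an in-memory buffer stands in for a real file path.

```python
from io import BytesIO

import pandas as pd
from openpyxl.styles import PatternFill

# Hypothetical validation results; in practice this would come from the validation script.
report = pd.DataFrame(
    {
        "column": ["revenue", "cost", "headcount"],
        "passed": [True, False, True],
        "detail": ["", "observed value is negative", ""],
    }
)

buffer = BytesIO()  # stands in for a file path such as "report.xlsx"
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    report.to_excel(writer, sheet_name="validation", index=False)
    sheet = writer.sheets["validation"]
    # Turn on Excel's built-in filter dropdowns for the whole table.
    sheet.auto_filter.ref = sheet.dimensions
    # Highlight failed rows (row 1 is the header, so data starts at row 2).
    red = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    for row_number, passed in enumerate(report["passed"], start=2):
        if not passed:
            for cell in sheet[row_number]:
                cell.fill = red
```

The filter dropdowns give the "sort of interactive" part without any web code.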

daring frost
serene scaffold
urban lance
#

I might have what I need now 🙏

#

the code doesn't look nice, but if it works it works

serene scaffold
#

show code

urban lance
#

been struggling with this for way too long

urban lance
serene scaffold
#

if you're willing to swallow your pride, I can suggest improvements. up to you.

urban lance
#

alright then, I'll get back to you if I'm sure my jank worked

#

gimme 30min

serene scaffold
#

I might be doing something else by then. we'll see

urban lance
#

No worries, I have time. There is no need for a quick reply

#

@serene scaffold actually here is what I'm going right now 😅
The writing and reading csv part does exactly what I want. Of course it's not very efficient

df = pd.DataFrame(df).reset_index()
df.to_csv("chunk_processed_csv.csv", index=False, encoding='utf-8-sig')
df = pd.read_csv("chunk_processed_csv.csv")
df = df.iloc[1: , :]
serene scaffold
#

what is this supposed to do? you're trying to "get rid of the index"?

#

because you can't--every dataframe always has an index no matter what

#

if you just don't want to look at the index, that is doable.

grand vapor
#

Hey everyone, I’m trying to read and store H5 file data in pandas dataframes. I have 8 H5 files each around 3GB. So, it’s a lot of data. I can do this successfully, but it freezes my computer and takes a very long time. I’m wondering, is there a more efficient and less memory-taxing way of doing this? Should I convert from H5 to another format like CSV or Parquet or pickle?

urban lance
#

I'll show ya step by step

serene scaffold
# urban lance What no

the dataframe always has an index. there's no way around that. you can just choose to not print it.
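
A short illustration of "choosing not to print it", with a made-up two-row frame. The index is still there in every case; it is just left out of the output.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"city": ["NY", "London"], "tweets": [4723, 5341]})

# The index still exists; these calls simply leave it out of the output.
print(df.to_string(index=False))
csv_text = df.to_csv(index=False)

# reset_index(drop=True) does not remove the index either; it replaces it with 0..n-1.
renumbered = df.iloc[1:].reset_index(drop=True)
```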

urban lance
#

we're not talking about any index

serene scaffold
urban lance
#

I'll show you what I'm doing

serene scaffold
#

Sure. I have about five minutes.

urban lance
#

my program won't finish in that time

serene scaffold
#

Alright, good luck!

urban lance
#

it works anyways

#

not very fast, but it works

#

actually

#

This is what I have

#

and this is what I want

#

@serene scaffold

#

does that explain what I'm trying to do 🤔

#

I wanna save the groups I made as a new dataframe (and not ungroup them)

serene scaffold
#

I can't really tell what your data model is.

urban lance
#

once my code has finished running, I'll go through what I'm doing

#

today was the first time I used groupby so I know next to nothing about it

#

this is what the columns look like after grouping
(the data is sensitive so I'm not showing this)

#

then I reset the index with

df = pd.DataFrame(df).reset_index()
#

when I now write the file to a csv, and read it back in
the lambda, min max and sum level somehow becomes part of the data 🤷‍♂️

#

note that the data is still grouped in this state

#

Then I drop the first row with this state
drop the "drived_tstamp" and rename the other 2 to "min" and "max" respectively

#

and then I got the data exactly the way I wanted, still grouped but without the lambda, min max and sum level
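
One possible way to get the same result without the CSV round-trip is named aggregation, which produces flat column names straight away. This is only a sketch: the real data was not shown, so apart from "drived_tstamp" every name and value below is a guess.

```python
import pandas as pd

# Hypothetical stand-in data; only "drived_tstamp" is a name taken from the discussion.
df = pd.DataFrame(
    {
        "device_id": ["a", "a", "b"],
        "drived_tstamp": [10, 30, 20],
        "value": [1.0, 2.0, 5.0],
    }
)

# Named aggregation gives flat column names directly, so there is no extra
# "min"/"max"/"sum" header level to strip out afterwards.
grouped = (
    df.groupby("device_id")
    .agg(
        min=("drived_tstamp", "min"),
        max=("drived_tstamp", "max"),
        total=("value", "sum"),
    )
    .reset_index()  # turn the group key back into an ordinary column
)
print(grouped)
```

`grouped` is a regular DataFrame with one row per group, so it can be saved or processed like any other.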

twin hound
#

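import numpy as np
from scipy import stats
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Apply standardized scaling to the training and test data, but only fit the training set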

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# SVM model with parameters adjusted for maximum optimization
rand_list = {"C": stats.uniform(1, 100),
             "gamma": stats.uniform(0.1, 1)}
svm_model = SVC(max_iter = 5000,kernel='rbf',C=97.366, gamma=0.4834)

# Perform Randomized Search for hyper parameterization

clf = RandomizedSearchCV(svm_model, param_distributions = rand_list, random_state = 0)
search = clf.fit(X_train, y_train)
params = search.best_params_
print(params)
print(search.best_score_)  # best mean CV score (search.score is a method, not a value)

# Fit model with data and perform prediction

svm_fit = svm_model.fit(X_train, y_train)
prediction = svm_model.predict(X_test) # Model prediction using testing set


# Use the score metric for evaluation of the model accuracy

score_train = svm_model.score(X_train,y_train)
score_test = svm_model.score(X_test,y_test)

print(score_train)
print(score_test)

# Perform k-fold cross validation to optimize the model and reduce bias/variance
# Number of folds

k = 5
kf = StratifiedKFold(n_splits=k, shuffle = False, random_state = None)

# Model Prediction with k-fold cross-validation using testing set

prediction_kf = cross_val_predict(svm_fit,X_test,y_test,cv = k)

# K-fold cross validation on the training/validation set
k_score_train = cross_val_score(svm_fit,X_train,y_train,cv = k)

# K-fold cross validation on the testing set
k_score_test = cross_val_score(svm_fit,X_test,y_test,cv = k)

mean_accuracy_train = np.average(k_score_train)
mean_accuracy_test = np.average(k_score_test)

print(mean_accuracy_train)
print(mean_accuracy_test)
#

someone please help I have bad overfitting. note that my model is highly non-linear

neon heart
#

I can not figure out why this group is splitting like this -

twin hound
#

can someone help me 1 on 1? i can explain it easily through voice

misty flint
#

have you tried looking at the data columns individually. if i saw this, i would slice only IndustrySubsector just to double check

neon heart
neat anvil
twin hound
#

it didnt change

neat anvil
#

svm_fit = svm_model.fit(X_train, y_train, **params)

twin hound
#

my problem isnt with the parameters

#

I think the data is just too nonlinear

neat anvil
#

okay well can you understand that pasting code in here that's not the code you actually used is not very helpful

twin hound
#

that is the code I used

#

the C and gamma values I changed manually

#

in the code I posted. those were the results from the random search

neat anvil
twin hound
#

anyway I tried your fix its giving an error

neat anvil
#

O woops

#

yeah it'd be svm_fit = SVC(max_iter = 5000,kernel='rbf',**params).fit(X_train, y_train)

twin hound
#

ok yea it worked now

#

but it still has bad overfitting

#

not sure why test set has terrible score. Could I just do private chat with u, im sure u could help easily if u understood the data

neat anvil
#

@twin hound there's parameters you're not tuning in the hyperparameter search- the kernel and max_iter. Try adding those to the search.

twin hound
#

how would I add kernel

#

ive tried all the kernels rbf is the best

cinder thicket
#

(from #python-discussion )hello, new here
is there any way to do Shape From shading in python, if so, how do i do it?
i want to make DEMs for many of the solar system's moons with the image data avalable

neat anvil
#
rand_list = {
    "C": stats.uniform(1, 100),
    "gamma": stats.uniform(0.1, 1),
    "max_iter": stats.uniform(1,5000),
    "kernel": ["rbf", "opt2", "opt3"],
}

@twin hound

cinder thicket
neat anvil
twin hound
#

ok well anyway to summarize my issue, even after applying all of this tuning, the score on the training set is really high (0.96-0.997) but the testing set doesnt change (0.5-0.7) and when I apply kfold cross validation the training set ranges from 0.6-0.7 and the test set ranges from 0.4-0.5

#

my issue is it seems the model only works well with the training set

#

for 2 days Ive played around with the parameters. literally have changed everything tried many combinations, grid search, rand search, etc. my main issue is just what I said above. Wondering if you know why this occurs usually

tacit basin
#

Rather low number. What algorithm is that?

neat anvil
twin hound
#

how do I set up a good cross validated search?

#

I'm not sure it's #2 because I performed the same analysis using train_test_split to just verify if the test data was bad. but train_test_split gave same results

#

like given the code, what would u do

#

with your experience

tacit basin
twin hound
#

heres a sample of the training and test

#

with the model on it

twin hound
neat anvil
twin hound
#

could it be because the training set is small?

#

its only [750,8]

#

test set is [150,8]

neat anvil
#

you're not properly doing the cross-validated hyperparameter search, so your model is overfit to the Training data.
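
As a sketch of what a cleaner setup might look like: keep the scaler and the SVC in one pipeline, let the search cross-validate on the training data only, and score the held-out test set once at the end. The data here is synthetic and the parameter ranges are placeholders, not recommendations for the concrete dataset.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (8 features, as in the discussion).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is re-fit on each training fold only.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_distributions = {
    "svc__C": stats.loguniform(1e-2, 1e2),
    "svc__gamma": stats.loguniform(1e-3, 1e0),
}
search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=20,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)            # mean cross-validated accuracy of the best candidate
print(search.score(X_test, y_test))  # the held-out test set is touched once, at the end
```

After fitting, `search` itself predicts with the refit best pipeline, so there is no need to copy C and gamma by hand into a second model.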

tacit basin
twin hound
twin hound
tacit basin
twin hound
#

ok sure let me send my data

#

how do I send excel data here?

neat anvil
tacit basin
neat anvil
# twin hound am I supposed to literally just randomly try every single parameter and hope for...

this is an astute observation. That's why more recent versions of SKLearn added hyperparameter search functions that learn as they go: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.HalvingRandomSearchCV.html#sklearn.model_selection.HalvingRandomSearchCV

tacit basin
neat anvil
#

you can also try the hyperopt library, which uses different learning algorithms than sklearn for hyperparameter optimization

#

But random search is quite powerful. Your model and data are both small, so randomly searching a few hundred options and picking the best is likely to find a very good solution in a relatively short amount of computation time.

twin hound
#

so do I just use random search with every single parameter?

#

theres like 30

neat anvil
#

probably not all 30. that's where expertise comes in. You need to understand how SVMs work, how the training algorithm works, and how your data is interacting with those things. Then you can pick reasonable choices for which parameters to search and reasonable ranges for them.

#

just 2 parameters is clearly not enough, since you are reporting such a large difference b/w train and test performance

tacit basin
# twin hound theres like 30

If your data is small training probably doesn't take long. So you search a lot :). You can use some auto ml library as well.

twin hound
#

well I know for svm regularization, kernel and gamma have high impact

#

what if I change my input features

#

instead of just putting the x data as is, make a relationship between them to reduce dimensions

neat anvil
#

that would have a large impact on your model, yes

#

maybe good, maybe bad

twin hound
#

this is the data fyi

#

so for example reduce dimensions by going water/cement and coarse/fine aggregate. than

#

i feel my input data is really bad and has bad bias

tacit basin
twin hound
#

its the setting time of the concrete

#

different concretes have different setting times because it highly affects compressive strength

#

its classified as concrete age

tacit basin
twin hound
#

yea its all scaled

tacit basin
#

What are zeros in some rows?

twin hound
tacit basin
twin hound
#

the excel is not

#

0 just means it has no value for that certain input

#

like for example no fly ash for the third sample

tacit basin
twin hound
#

no it means there is no "amount" of that parameter in the concrete mix

#

essentially what you see is 8 inputs which are different materials for concrete mix and 1 output which is the strength class

#

some concretes have no fly ash, plasticizer, etc. so it has a value of 0

tacit basin
#

Or do we have more samples with certain class?

twin hound
tacit basin
#

Not bad

#

Ok.why SVC? :)

twin hound
#

heres also an example of what the data looks like if we plot the first and 2nd inputs against eachother. most of them look random like this

#

the colors are just the different classes

#

I have to use ANN and SVC

#

its for a project thats why : (

#

I am getting the same issue with ANN if you are wondering

#

MLP to be specific

tacit basin
#

Ah
Ok. All seems fine what you showed me.
I hate machine learning 🤪

twin hound
#

I know man... thats why im here 😭

tacit basin
#

So this seems like overfitting. What can cause overfitting in SVC?

iron basalt
#

Having your product revolve around one big model is a fundamental strategy for "unicorn" companies. The idea being to find some niche which has yet to be automated (low hanging pre-computerization fruit) and then automate it with a website + maybe an ML model. They often call themselves "tech" companies (using tech does not make the company a tech company) and mostly pop up on the west coast of the US.

#

The goal is then to hype it up to infinity and sell it to a bigger "tech" company or a bank when it's highly valued. And it works for some, and when it does it's very profitable so they keep trying.

serene scaffold
#

Sorry, but we don't allow recruitment in this server.

wispy remnant
#

ah apologies

#

I honestly do not know where to turn

serene scaffold
#

I'm not really sure either. There's a Python job board on python.org.

wispy remnant
#

as nobody will help me, and its vital I have someone who knows what they are doing to take on this task.

#

thanks

#

this channel would pertain to phyphox correct?

#

spectrum analysis etc.

serene scaffold
#

idk what phyphox is, but this is the channel to discuss scientific computing in Python

wispy remnant
#

it handles sensors in mobile devices, ranging across all types of the likes. mostly, I am looking for someone who knows a little bit about auditory and frequency spectrum analysis

#

here is an example

twin hound
#

anyone know how to make stats.uniform select integers only?

fiery dust
#

any online course to learn data science and data analysis?

#

thats good?

#

one or more

twin hound
#

@neat anvil How can I plot all of my predictions vs. actual (for example x1 with x_test1 vs. y for all inputs)

#

ty

neat anvil
#

!d scipy.stats.randint

arctic wedgeBOT
#

scipy.stats.randint = <scipy.stats._discrete_distns.randint_gen object>
A uniform discrete random variable.

As an instance of the [`rv_discrete`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete "scipy.stats.rv_discrete") class, [`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.

Notes

The probability mass function for [`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") is:

\[f(k) = \frac{1}{\texttt{high} - \texttt{low}}\] for \(k \in \{\texttt{low}, \dots, \texttt{high} - 1\}\).

[`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") takes \(\texttt{low}\) and \(\texttt{high}\) as shape parameters...
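
A quick sketch of how `scipy.stats.randint` could be used for an integer-only parameter such as `max_iter`. The range 100 to 5000 is just an example.

```python
from scipy import stats

# randint(low, high) draws integers from low up to high - 1, so this covers 100..5000.
max_iter_dist = stats.randint(100, 5001)
samples = max_iter_dist.rvs(size=5, random_state=0)
print(samples)

# In a search space it is used the same way as stats.uniform:
param_distributions = {
    "C": stats.uniform(1, 100),  # uniform(loc, scale): floats between 1 and 101
    "max_iter": max_iter_dist,   # integers only
}
```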
neat anvil
#

this bot

#

what is it doin

twin hound
#

also how do I just ignore all warnings

neat anvil
#

!d warnings.catch_warnings

arctic wedgeBOT
#

class warnings.catch_warnings(*, record=False, module=None)
A context manager that copies and, upon exit, restores the warnings filter and the [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning") function. If the *record* argument is [`False`](https://docs.python.org/3/library/constants.html#False "False") (the default) the context manager returns [`None`](https://docs.python.org/3/library/constants.html#None "None") on entry. If *record* is [`True`](https://docs.python.org/3/library/constants.html#True "True"), a list is returned that is progressively populated with objects as seen by a custom [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning") function (which also suppresses output to `sys.stdout`). Each object in the list has attributes with the same names as the arguments to [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning").

The *module* argument takes a module that will be used instead of the module returned when you import [`warnings`](https://docs.python.org/3/library/warnings.html#module-warnings "warnings: Issue warning messages and control their disposition.") whose filter will be protected. This argument exists primarily for testing the [`warnings`](https://docs.python.org/3/library/warnings.html#module-warnings "warnings: Issue warning messages and control their disposition.") module itself.
twin hound
#

how do I apply it in code

#
warnings.catch_warnings(*, record=False, module=None)?
neat anvil
#

follow the documentation link

#

it shows examples
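
In short, it is used as a context manager rather than called with that signature. A minimal sketch, where `noisy` is a made-up function standing in for whatever call emits the warnings:

```python
import warnings


def noisy() -> int:
    """Hypothetical function standing in for a call that emits warnings."""
    warnings.warn("this would normally be printed", UserWarning)
    return 42


# Warnings are silenced only inside the with-block; the previous filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

# To silence everything for the rest of the program instead (use with care):
# warnings.filterwarnings("ignore")
```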

twin hound
#

ok got it thanks

#

how do I make the randomized search select the best parameters based on the score?

#

because everytime I run it it keeps changing

#

@neat anvil

strange zealot
#

i have this data set i want to check what are the survival chances of people with same tickets

#

could someone help

neat anvil
#

It is selecting the best parameters based on cross-validation score

#

if it's changing every time that could mean a couple of things: your search space has many roughly equivalent optima (if the CV scores of many of the random models are around a similar reasonable value) OR you've selected the validation splits in a way that makes it difficult to get a reliable score (if the CV scores of many of the random models near 100%) OR the training data is so messy there is no way to achieve a good model with this type of model (if the CV score of many of the random models are low) OR your scoring metric is ill-defined OR the training data is so messy it's not much better than training on random noise, so you just get random parameters out (they're different each time you run it b/c it randomizes how it splits the data and the params)

#

those (if whatever) conditions are kind of hand-wavey, not for certain

#

but those are some signals and possible explanations
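
One way to check which of those cases applies is to look at the scores of all sampled candidates instead of only the winner. A sketch on synthetic data, reusing the parameter ranges from earlier in the thread:

```python
import pandas as pd
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; replace with the real training set.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={"C": stats.uniform(1, 100), "gamma": stats.uniform(0.1, 1)},
    n_iter=15,
    cv=5,
    random_state=0,  # fixes which candidates are sampled
)
search.fit(X, y)

# One row per sampled candidate, best first. If the top rows have nearly the same
# mean score, several parameter sets are effectively tied and the "best" one is arbitrary.
results = pd.DataFrame(search.cv_results_).sort_values("rank_test_score")
print(results[["param_C", "param_gamma", "mean_test_score", "std_test_score"]].head())
```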

twin hound
#

damn ok I see

#

the CV scores are calculated in the background correct?

neat anvil
#

I'd recommend trying a much, much simpler model. Like just a basic logistic regression.

#

If you can't fit it with decent accuracy on data that simple

twin hound
#

I would but the problem is this is for a project where SVM and MLP needs to be used

neat anvil
#

more complex models aren't going to do much better.

#

well, it can give you a baseline expectation of what is reasonable
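
A baseline like that takes only a few lines. This sketch uses synthetic data in place of the concrete dataset, so the numbers it prints say nothing about the real problem:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with the real features and class labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)

# Scale, then fit a plain logistic regression; score it with 5-fold cross-validation.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(baseline, X, y, cv=5)
print(scores.mean(), scores.std())
```

If the SVM and MLP cannot beat this number on cross-validation, the features are a more likely culprit than the hyperparameters.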

twin hound
#

its all good I appreciate your help. Im meeting with my prof today to help my sorry ass

neat anvil
#

always a good idea

twin hound
#

yea thanks man

haughty ibex
#

search = []
for values in df['data']:
    search.append(re.search(r'\d{7}[N]\d{7}[E]', values).group(0).rstrip())
print(search)

Hello everybody i have this regex. I'm trying to search through one of the columns in my dataframe and return the string not the match object. i know i need to use group to achieve this however on some occasions throughout my dataframe re.search will return none. and group() will crash saying 'NoneType' object has no attribute 'group' i saw somewhere that group(0) should get rid of the nones but it didn't work. I know i can fix this with a try: except: block but im trying to find a different solution.
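
For reference, the plain-Python way to avoid the crash without try/except is to check the match object before calling `group()`. A small sketch with made-up sample strings:

```python
import re

PATTERN = re.compile(r"\d{7}N\d{7}E")

# Hypothetical sample values; the second one contains no match.
values = ["pos 1234567N7654321E ok", "nothing here"]

matches = []
for value in values:
    match = PATTERN.search(value)
    # search() returns None when nothing matches, so check before calling group().
    matches.append(match.group(0) if match else None)

print(matches)
```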

serene scaffold
#

@haughty ibex did you try Series.str.find?

#

!docs pandas.Series.str.find

#

oh that's the wrong one. must be extract

#

!docs pandas.Series.str.extract

arctic wedgeBOT
#

Series.str.extract(pat, flags=0, expand=True)
Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.
serene scaffold
#

you'll also have to put parentheses around the part of the pattern you want to keep. which I guess will be all of it.

#

try to figure it out, and if you can't, I will show you the solution @haughty ibex

haughty ibex
#

df['report text'].str.extract(r'\d{7}[N]\d{7}[E]')

#

getting ValueError: pattern contains no capture groups

serene scaffold
#

@haughty ibex it extracts a capture group, so you have to put the whole thing in parentheses, if you want that

#

though it looks like there are two parts to this pattern, \d{7}[N] and \d{7}[E]

#

so you could get that information in two columns automatically, if you wanted.

#
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
   letter digit
0      a     1
1      b     2
2    NaN   NaN
haughty ibex
#

oh ok i think i got it to work but now im getting Length of values (0) does not match length of index (11)

#

in my test csv file i have 3 rows that would contain no matches for my re to test it out.

serene scaffold
#

please show what you changed the code to and the whole error message starting from Traceback.

haughty ibex
#

ok sorry the traceback was something i forgot to comment out while testing out the changes

#

df['pattern match'] = df['data'].str.extract(r'(\d{7}[N]\d{7}[E])')

serene scaffold
#

yay

haughty ibex
#

is there a flag to not get NaN values and just have and empty cell

serene scaffold
#

No, because NaN is an empty cell, basically

#

They're also the best way to represent missing data.
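
If blank cells are still wanted for display or export, they can be filled in afterwards. A sketch with made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data; the second row has no match.
df = pd.DataFrame({"data": ["pos 1234567N7654321E ok", "nothing here"]})

# expand=False with a single capture group returns a Series; rows without a match are NaN.
df["pattern match"] = df["data"].str.extract(r"(\d{7}N\d{7}E)", expand=False)

# Keep NaN for analysis, and fill a separate column for presentation.
df["pattern match (blank)"] = df["pattern match"].fillna("")
print(df)
```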

#

@haughty ibex make sense?

haughty ibex
#

@serene scaffold yes. i appreciate the help. can i use multiple regex patterns.
i have two other regex patterns that im using to find some data in csv files

regex1 = r'\d{1,3}[thrd]([a-zA-Z]+( [a-zA-Z]+)+)[e]\s'
regex2 = r'\d{1,3}([a-zA-Z]+( [a-zA-Z]+)+)\d\s+([a-zA-Z])+\b'

could i do something like:
df['pattern match'] = df['data'].str.extract(regex1,regex2)

im guessing its not that simple lol

#

regex1 = r'\d{1,3}[thrd]([a-zA-Z]+( [a-zA-Z]+)+)[e]\s'
regex2 = r'\d{1,3}([a-zA-Z]+( [a-zA-Z]+)+)\d\s+([a-zA-Z])+\b'
regex_list = [regex1, regex2]
regex_search = []
for x in df['data']:
    for regex in regex_list:
        try:
            regex_search.append(re.search(regex, x).group().rstrip())
        except:
            pass

i am currently doing this and it seems to be working just looking for a more optimized solution.

misty flint
#

dang stelercus seems to know pandas inside out huh?

#

im impressed

serene scaffold
#

you can do df['data'].str.extract more than once and make more than one column, yes.

haughty ibex
#

see i know it was trash

#

thats why i came here and asked lol

low jay
#

Hi, how would you plot a 3D linear regression model from a dataframe?

serene scaffold
#

@misty flint I actually don't know how I'd do that off the top of my head ^

haughty ibex
#

@serene scaffold i want the matches to be in the same column so thats why i did the double for loop.

serene scaffold
#

and then forgot

low jay
#

@serene scaffold I've been trying to look for solutions for it online but I genuinely don't understand it. Thank you tho.

serene scaffold
serene scaffold
low jay
#

@serene scaffold This is what it looks like

serene scaffold
#
df['pattern_match'] = ''
for pattern in [regex1, regex2]:
    # Wrap the pattern in one outer group so column 0 holds the whole match.
    df['pattern_match'] += df['data'].str.extract(f'({pattern})')[0].fillna('')
#

you could do this, I guess @haughty ibex
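For getting the matches into one column without the Python-level double loop, one option is to join the patterns with `|` and call `str.extract` once. This is only a sketch: `extract_first_match` is a made-up helper name, it keeps at most one match per row (the double loop above can append one per pattern), and rows with no match come back as NaN.

```python
import pandas as pd


def extract_first_match(series, patterns):
    """Return the first substring in each row that matches any of the patterns."""
    # Each pattern goes in a non-capturing group; the outer group captures the full match.
    combined = "(" + "|".join(f"(?:{p})" for p in patterns) + ")"
    # Inner capture groups add extra columns, so only column 0 (the outer group) is kept.
    return series.str.extract(combined, expand=True)[0].str.rstrip()


# Toy patterns for illustration; regex1 and regex2 from the chat would be passed the same way.
demo = pd.Series(["abc 123", "xyz", "id-77 "])
matches = extract_first_match(demo, [r"\d{3}", r"id-\d+\s"])
```

With the real data this would be something like `df['pattern match'] = extract_first_match(df['data'], [regex1, regex2])`.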

serene scaffold
low jay
#

@serene scaffold Oh the dataframe is from a practical

serene scaffold
#

and what three columns are going to be the axes on the plot?

low jay
#

@serene scaffold 1) Yes

#
  1. The dependent variable will be the flipper length, bill length and depth will be the other 2 variables
misty flint
#

oh wait

#

i think i did it before in MATLAB

serene scaffold
#
import matplotlib.pyplot as plt

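# Unpacking a DataFrame directly yields column names, so convert to an array and transpose first
x, y, z = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']].dropna().to_numpy().T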

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)

plt.show()
misty flint
#

but thats MATLAB monkaCHRIST

serene scaffold
#

try something like this
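For the regression part of the question, a rough sketch: fit a plane with ordinary least squares and draw it over a 3D scatter. `fit_plane` and `plot_plane` are hypothetical helper names, and this assumes flipper length is the dependent variable with the two bill measurements as predictors, as described above.

```python
import numpy as np


def fit_plane(x, y, z):
    """Least-squares fit of z ~ a*x + b*y + c; returns (a, b, c)."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([x, y, np.ones_like(x)])  # columns: x, y, intercept
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return tuple(coeffs)


def plot_plane(ax, x, y, coeffs, steps=10):
    """Draw the fitted plane on an existing matplotlib 3D axes."""
    a, b, c = coeffs
    grid_x, grid_y = np.meshgrid(
        np.linspace(np.min(x), np.max(x), steps),
        np.linspace(np.min(y), np.max(y), steps),
    )
    ax.plot_surface(grid_x, grid_y, a * grid_x + b * grid_y + c, alpha=0.3)
```

After `ax.scatter(x, y, z)`, calling `plot_plane(ax, x, y, fit_plane(x, y, z))` before `plt.show()` should overlay the plane on the points.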

#
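from functools import reduce
from operator import add

# reduce() already returns a single Series here, so no pd.concat is needed
reduce(
    add,
    (df['data'].str.extract(f'({pattern})')[0].fillna('') for pattern in (regex1, regex2))
)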

I made it slightly more lispy

steel geyser
#

Looking for a library that can analyze a game that is being played on my stream. I’m a twitch streamer and want to be able to track say the number of kills I get in a particular game. What libraries would I look into to do that?

serene scaffold
misty flint
#

this could end up being a relatively simple problem or something much more involved lol

serene scaffold
#

yes. if the solution isn't something happening to a static UI element, any solution we come up with will probably need so much compute power that you won't be able to play your game.

misty flint
#

video data is not something ive worked with personally but like you mentioned its def a hassle

serene scaffold
#

but at least you have a GPU 😄

misty flint
#

the groups ive seen work with it also complain about how much data is generated as well

#

so you def dont want to use all that data, just certain stills/frames if possible

steel geyser
misty flint
#

ah perfect

#

the simple solution

serene scaffold
#

so, you need something that watches the pixels on that part of the screen, and any time they change, it needs to detect if the change is the number going up.

steel geyser
misty flint
#

my first instinct is to look into opencv and pytesseract

serene scaffold
misty flint
#

at least thats off the top of my head

serene scaffold
#

anyway, I don't do anything with images except maybe optical character recognition. so I don't even know if there are libraries that watch parts of a screen.

misty flint
#

hmm i think ive seen an article about it once

#

there was a twitch streamer that did something similar

steel geyser
#

Ok. I’m familiar with opencv. Not so much pytesseract.

steel geyser
misty flint
#

ah i remember now, this was a high schooler on a podcast i listen to

#

maybe you can find something by googling him

#

i think he ended up getting into a really good school bc of this

#

its been a while, i dont remember

steel geyser
misty flint
#

lol the host is the data scientist, while the high school kid is the twitch streamer that was a guest on that episode

#

but you can still listen

#

he has some interesting guests from all across DS

#

some people work in all sorts of domains and fields

#

i think its most interesting hearing their background/journey

#

one was an olympic medalist before going into DS

#

another one was an ex-cultist

#

💀

#

anyway interesting stories tbh

steel geyser
#

Ahh. Ok. I see. Well I appreciate it.

misty flint
#

good luck bud. let me know if you end up getting it to work

#

i still think opencv will let you do something

iron basalt
#

If you generally know where the text is you probably want to only grab that region or it will be slow on larger resolutions.
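A minimal sketch of the "only grab that region" idea with opencv and pytesseract. The function names and the box coordinates are placeholders, the frame is assumed to be a BGR image array as opencv produces, and the OCR settings (single text line, digits only) are a guess at what suits a kill counter.

```python
import numpy as np


def crop_region(frame, box):
    """Return only the (left, top, width, height) rectangle of an H x W x C frame."""
    left, top, width, height = box
    return frame[top:top + height, left:left + width]


def read_counter(frame, box):
    """OCR the digits inside `box`; returns None if no digits are recognised."""
    import cv2          # opencv-python
    import pytesseract  # also needs the Tesseract binary installed

    roi = crop_region(frame, box)
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding gives black-and-white text, which tends to OCR more reliably.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(
        binary, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    )
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None
```

Running OCR on a small crop every second or so, rather than on every full frame, is probably what keeps this cheap enough to run next to the game.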

frank quiver
#
class AutoEncoder(nn.Module):
  def __init__(self):
    super(AutoEncoder, self).__init__()
    self.encoder = nn.Sequential(
            nn.Conv2d(55, 16, 3, stride=1, padding=1),  # b, 16, 10, 10
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1),  # b, 16, 5, 5
            nn.Conv2d(16, 8, 3, stride=1, padding=1),  # b, 8, 3, 3
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1)  # b, 8, 2, 2
        )
    self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=1),  # b, 16, 5, 5
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 5, stride=1, padding=1),  # b, 8, 15, 15
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 55, 2, stride=1, padding=1),  # b, 1, 28, 28
            nn.Tanh()
        )
  def forward(self, x):
    x = self.encoder(x)
    print(x.shape)
    x = self.decoder(x)
    print(x.shape)
    return x
I am getting this warning: `/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:47: UserWarning: Using a target size (torch.Size([1, 55, 46, 46])) that is different to the input size (torch.Size([1, 55, 47, 47])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.` My input size is `(1,55,46,46)` but I don't know why I am getting `[1, 55, 47, 47]`?
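One way to see where the 47 comes from is to push the spatial size through each layer by hand. The sketch below uses the standard output-size formulas with dilation 1 and no output padding (the defaults used in the model above); the helper names are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output size of MaxPool2d along one spatial dimension."""
    return (size - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of ConvTranspose2d along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel


def autoencoder_out(size):
    """Trace one spatial dimension through the encoder and decoder posted above."""
    size = conv_out(size, 3, padding=1)            # Conv2d(55, 16, 3, padding=1)
    size = pool_out(size, 2, 1)                    # MaxPool2d(2, stride=1)
    size = conv_out(size, 3, padding=1)            # Conv2d(16, 8, 3, padding=1)
    size = pool_out(size, 2, 1)                    # MaxPool2d(2, stride=1)
    size = conv_transpose_out(size, 3)             # ConvTranspose2d(8, 16, 3)
    size = conv_transpose_out(size, 5, padding=1)  # ConvTranspose2d(16, 8, 5, padding=1)
    size = conv_transpose_out(size, 2, padding=1)  # ConvTranspose2d(8, 55, 2, padding=1)
    return size


print(autoencoder_out(46))  # 46 -> 45 -> 45 -> 44 -> 46 -> 48 -> 47
```

So the decoder overshoots by one pixel. Under the same formulas, changing the last layer to something like `nn.ConvTranspose2d(8, 55, 3, stride=1, padding=2)` would bring it back to 46, though that is just one of several ways to make the sizes agree.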
rugged hawk
#

Is there any way to get only last 3 months data?
The first row is latest month so what I did is: made a list of months
l1=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

month=df['Date'].iloc[0]
curr_month=month[:3]
curr_index=l1.index(curr_month)
prev_month=l1[curr_index-1]
last_second_month=l1[curr_index-2]
month_list=[curr_month,prev_month,last_second_month]

so month_list gives me last 3 months including current, then I tried to find list elements in df column using df[df['Date'].str.contains('|'.join(month_list))]

#

but as you can see in the picture the last rows from df it contains last year Mar data. so it returning that data also. so How can I get the only latest last 3 months data

minor elbow
#

u can use slice operator with dates, so assuming u have the date in variable x you can go df.loc[x:,:]

#

date indexs can get a bit intricate with pandas

#

u could resample the index to monthly frequency, take the 3rd last index of that, then make x the "yyyy-mm" string of that with strftime(), then use x with the slice index
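An alternative sketch that avoids the month-name list: parse the column into real dates and compare month periods, so last year's March can never match this year's. It assumes the `Date` column holds strings that `pd.to_datetime` can parse; `last_n_months` is a made-up helper name.

```python
import pandas as pd


def last_n_months(df, date_col="Date", n=3):
    """Keep rows that fall in the latest n calendar months present in the data."""
    dates = pd.to_datetime(df[date_col])
    months = dates.dt.to_period("M")   # e.g. 2022-03, which includes the year
    latest = months.max()
    return df[months > latest - n]
```

Because the period carries the year, `Mar 2021` and `Mar 2022` are different values, which is what the `str.contains` approach was missing.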

#

in other news this is an interesting review from openai re gpt https://openai.com/blog/language-model-safety-and-misuse/

OpenAI

The deployment of powerful AI systems has enriched our understanding of safety and misuse far more than would have been possible through research alone. Notably: API-based language model misuse often comes in different forms than we feared most. We have identified limitations in existing language model evaluations that we are

flint pendant
#

How can I use Levenshtein.ratio to compared strings between 2 different columns in a dataframe? I have a dataframe with a few ten million rows and can't figure out how to get it to do the ratio of the strings in each row of the dataframe.
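A minimal sketch of the row-wise comparison: zip the two columns and call the scoring function on each pair. `rowwise_score` and the column names are placeholders. It assumes both columns hold plain strings with no missing values, and at tens of millions of rows a plain loop like this may still take a while.

```python
import pandas as pd


def rowwise_score(df, left, right, scorer):
    """Apply scorer(a, b) to each row's pair of values and return a Series."""
    pairs = zip(df[left], df[right])
    return pd.Series([scorer(a, b) for a, b in pairs], index=df.index)


# Intended usage, assuming the Levenshtein package is installed:
#   from Levenshtein import ratio
#   df["similarity"] = rowwise_score(df, "col_a", "col_b", ratio)
```

Zipping the columns is usually noticeably faster than `df.apply(..., axis=1)`, since it avoids building a Series object for every row.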

rugged hawk
pastel valley
#
base_model = Sequential()

resnet50_model = tf.keras.applications.ResNet50(include_top=False,
                   input_shape=(144,144,3),
                   pooling='max',classes=6,
                   weights='imagenet')

for layer in resnet50_model.layers:
        layer.trainable=False

base_model.add(resnet50_model)

base_model.add(Flatten())
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dense(512, activation='relu'))
base_model.add(Dense(256, activation='relu'))
base_model.add(Dense(6, activation='softmax'))
#

in this snippet there shows transfer learning right?

pastel valley
#

but what if i just want to just use the architecture of resnet50 and i want to train it myself?

pastel valley
#

non trainable parameters are units that are unchangable? isnt that bad?

#

how to prevent it?

urban lance
#

I wouldn't worry too much, but I'm interested in the answer anyways

pastel valley
#

how to diagnose this kind of thing on keras?

#

maybe those non trainables are from the resnet50?

tacit basin
# pastel valley non trainable parameters are units that are unchangable? isnt that bad?

The number of non-trainable weights of the model comes from the BatchNormalization layers, whose mean and variance vectors are updated via layer updates instead of backpropagation and are therefore counted as non-trainable parameters.
https://github.com/experiencor/keras-yolo2/issues/167

GitHub

I found the answer to that question but I am posting it here in case someone is asking themselves the same question as it took me some time to figure it out. We consider in this example the Tiny Yo...

pastel valley
#

btw in this code

base_model = Sequential()

resnet50_model = tf.keras.applications.ResNet50(include_top=False,
                   input_shape=(144,144,3),
                   pooling='max',classes=6,
                   weights=None)

base_model.add(resnet50_model)

base_model.add(Flatten())
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dense(512, activation='relu'))
base_model.add(Dense(256, activation='relu'))

base_model.add(Dense(6, activation='softmax'))

base_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

i want to just use the architecture of the resnet50 and train it myself with my own data and classes
i changed the input shape and added dense layer to the end is this it? did i implement what i wanted to do right?

pastel valley
#

yo?

#

wew since i am training resnet from scratch there are alot of computations needed right? this will take alot of time

limber kelp
#

I want to see if these two columns are related or not like if particular Branch always maps to particular city?

How can I check it?

#

using pandas

#

or any python lib
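One way to check this in pandas is to count the distinct cities per branch. A sketch, assuming the columns are literally named `Branch` and `City` (the helper name is made up):

```python
import pandas as pd


def keys_with_multiple_values(df, key, value):
    """Return the keys that map to more than one distinct value, with their counts."""
    counts = df.groupby(key)[value].nunique()
    return counts[counts > 1]


demo = pd.DataFrame({
    "Branch": ["A", "A", "B", "B"],
    "City": ["X", "X", "Y", "Z"],
})
offenders = keys_with_multiple_values(demo, "Branch", "City")
```

If the result is empty, every branch maps to exactly one city. Swapping the two column names checks the other direction.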

desert minnow
#

Hello all, Im trying to build a ordinal classification model (basically ranking prediction). Can someone help me out in choosing model? Thanks 😊

tacit basin
pastel valley
#

but my dataset is on my local drive, do i need to upload it to colab?

tacit basin
pastel valley
#

btw is this natural ?

tacit basin
# pastel valley btw is this natural ?

never trained from scratch. but early epochs that may be right since almost nothing is correct yet.
btw why don't you want to train using transfer learning?

pastel valley
# tacit basin never trained from scratch. but early epochs that may be right since almost noth...

i am doing experiment on image augmentations i want to compare if there will be performance boost or what and to compare them fairly i think using the same architecture and exactly the same initial weights will be good so i first created my own cnn architecture but i realized that doing this experiment on my own simple architecture is non sense because noone will ever use it so i decided to use a popular or one of the best architectures

#

using the architecture ill create 2 identical models and train them on classifying the same classes but with different data

#

like datasetA is with etc and datasetB with etc like that

#

does it make sense?

pastel valley
upper spindle
#

i want to read in a csv file from a directory using this code eth = pd.read_csv("../EC331/combined_posts_comments_final.csv") but it doesnt seem to work

pastel valley
#

@tacit basin
btw how about this ?
what does it mean? its learning but maybe it needs more epochs to get better validation?

neat anvil
#

honestly @pastel valley these questions about data augmentation, transfer learning, and deep learning model architecture are quite complicated to answer and get to the root of a lot of fundamentals of deep learning. You'd probably be best served taking some courses and building up your fundamentals in math and stats IMO.

#

and I mean sounds like you're curious enough about the topic that you'd probably enjoy the courses

urban lance
#

can someone explain to my how this "sum" param works exaclty? I'm having some strange results

df.groupby(["user", pd.Grouper(key="timestamp", freq="W")]).agg({
    "col1": "sum"
})
I have a column with true and false values exclusively, I'm trying to count the true values within a certain interval
but some results are negative
#

I really don't understand why it does that

serene scaffold
urban lance
#

it appears as though groupby tries to save an int16 value in an int8 🤔

civic stone
#

Good Afternoon everyone ,

i am trying to use "Word2Vec" package in pycharm
from gensim.models import Word2Vec

but it shows an error Unresolved reference 'Word2Vec'

can anybody support me on this

tacit basin
urban lance
#

@serene scaffold I've found my issue

#

it indeed was bit overflowing
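For anyone who hits the same thing, a small sketch of the failure and one possible fix. It is not clear from the chat how the narrow integer type got into the data (downcasting to save memory is a common cause), so treat the setup as an assumption; `weekly_true_counts` is a made-up name and the column names are the ones from the question.

```python
import numpy as np
import pandas as pd

# An 8-bit accumulator silently wraps around once the running total passes 127,
# which is how a count of True values can end up negative.
wrapped = np.array([100, 100], dtype=np.int8).sum(dtype=np.int8)


def weekly_true_counts(df):
    """Count True values per user per week using a 64-bit accumulator."""
    wide = df.assign(col1=df["col1"].astype("int64"))
    return wide.groupby(["user", pd.Grouper(key="timestamp", freq="W")])["col1"].sum()
```

Casting only the column being summed keeps the memory cost small compared with widening the whole frame.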

tacit basin
tacit basin
serene scaffold
arctic wedgeBOT
#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

urban lance
#

And also my dataset was HUGE and it caused memory issues

serene scaffold
#

ah

upper spindle
serene scaffold
#

though we already know from the error message that you don't.

upper spindle
#

there is the directory

#

but it still comes up with the same error

serene scaffold
# upper spindle

this screenshot cuts off the error message. but I'll only look at error messages that are given as text.

#

do you know what the .. at the beginning of the path do? if not, you should probably delete them.

upper spindle
#

FileNotFoundError: [Errno 2] No such file or directory: '../EC331/Ethereum Data/combined_posts_comments_final.csv'

#

this is the error sorry

upper spindle
serene scaffold
upper spindle
#

okay, ill give that a try

desert oar
#

you can figure out what the working directory is by doing this in a python code cell: import os; os.getcwd()

#

.. is always relative to the working directory, not to the current file/script being executed
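A quick sketch for checking what a relative path actually resolves to before handing it to `pd.read_csv`. Both helper names are made up for illustration.

```python
from pathlib import Path


def resolve_from_cwd(relative):
    """Absolute path a relative path points to, given the current working directory."""
    return (Path.cwd() / relative).resolve()


def resolve_from_file(script_file, relative):
    """Resolve relative to a script's own folder instead of the working directory."""
    return (Path(script_file).resolve().parent / relative).resolve()


print(resolve_from_cwd("../EC331/combined_posts_comments_final.csv"))
```

If the printed path is not where the file lives, the `..` is climbing out of the wrong folder. In a script (not a notebook), `resolve_from_file(__file__, ...)` makes the path independent of where Python was started.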

upper spindle
#

it worked thanks @serene scaffold and @desert oar

serene scaffold
desert oar
somber bough
#

So im planning to make a program that can identify the original 151 pokemon if you upload a picture. I got a dataset from kaggle, and i was going to use Google Teachable Machine to upload it and make the model, but i was wondering if it would be a bad idea to have 151 different things in 1 tensorflow model?

serene scaffold
#

what you use to create the model (tensorflow, pytorch, etc) doesn't actually matter as far as its potential capability

#

what matters is the training data that you have and the model architecture

#

that said, how many separate images do you have for each of the 151 pokemon?

#

because if you only have one image per pokemon, that's not going to be enough

somber bough
tacit basin
# upper spindle okay, ill give that a try

In addition to what salt rock lamp said about os.getcwd() in jupyterlab in a cell you could use bash pwd like that

!pwd

It's a useful way to execute bash commands in a notebook

serene scaffold
desert oar
desert oar
#

50-70 seems good

serene scaffold
tacit basin
desert oar
#

I think it's a standard practice in image classification problems to synthetically generate a lot more training samples by algorithmically distorting or otherwise modifying the images

desert oar
#

stretching, skewing, altering colors, rotating/mirroring, adding noise, etc.
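A bare-bones sketch of a few of those transformations using only NumPy. Real projects usually lean on a library for this; `augment` is a made-up helper, and the noise level is an arbitrary choice.

```python
import numpy as np


def augment(image, rng):
    """Return a few perturbed copies of an H x W x C uint8 image."""
    flipped = image[:, ::-1]                              # mirror left-right
    rotated = np.rot90(image, k=int(rng.integers(1, 4)))  # 90, 180 or 270 degrees
    noise = rng.normal(0, 10, size=image.shape)           # Gaussian pixel noise
    noisy = np.clip(image.astype(float) + noise, 0, 255).astype(np.uint8)
    return [flipped, rotated, noisy]


rng = np.random.default_rng(0)
copies = augment(np.zeros((8, 8, 3), dtype=np.uint8), rng)
```

Each original image then contributes several training samples, which helps most when there are only a few dozen images per class.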

normal saffron
#

anyone want to explain how tf to make ai

#

pls help

desert oar
#

there are lots of articles about data augmentation for image classification problems

normal saffron
#

i read

#

them

serene scaffold
normal saffron
#

they are no help:(

desert oar
#

what you need is a machine learning course starting at the basics

normal saffron
desert oar
#

you're basically asking somebody to type out a textbook chapter for you

normal saffron
#

copy and paste

#

no but rlly could someone at least show me an ai program so i could see how it works?

serene scaffold
upper spindle
#

how would someone from an econ background specialise/learn about ML/DL/NN

tacit basin
mild dirge
serene scaffold
normal saffron
tacit basin
serene scaffold
#

but self-driving cars are going to have numerous components

normal saffron
#

true

serene scaffold
#

they probably need cameras to see whats going on, and models to identify what each thing is

normal saffron
#

so cv?

serene scaffold
#

and then it needs some formula to decide how fast or slow to go based on those conditions, as well as incline, speed limits, etc.

desert oar
# upper spindle how would someone from an econ background specialise/learn about ML/DL/NN

depending on how advanced the econ background is, you should have more than enough math and statistics foundational knowledge to jump in "math first". Fast.ai can't hurt as an easy "first course in modern deep learning". for books, check out Probabilistic Machine Learning by Murphy and/or Deep Learning by Goodfellow. what the econ background lets you do is skip all the statistics basics and go right for the fun stuff

normal saffron
serene scaffold
normal saffron
desert oar
#

however you will probably want to revisit statistics from outside the perspective of econometrics, because in my experience econometricians tend to use different techniques and think about problems differently @upper spindle . so it depends on your background. the general recommendations are more or less the same as for someone who knows very little or nothing, but the benefit of having a quantitative background is that you can move a lot faster through the intro material and don't need to spend time learning how to program a computer, how to read equations, how to reason statistically/probabilistically, etc.

upper spindle
desert oar
#

what is your background @upper spindle, specifically?

serene scaffold
upper spindle
upper spindle
#

but im lacking on the programming side

serene scaffold
#

anyway, my advice would be to apply to graduate programs in something more closely related to data science. I've worked with data scientists with an economics background, so it's probably one of the better non-CS avenues into DS/AI.

serene scaffold
desert oar
#

ah, so undergrad econ

#

that changes things a bit

upper spindle
#

i have done some, but my department here in the uk used stata

desert oar
#

yeah you basically should treat yourself like an advanced beginner

#

start where everyone else starts

#

you probably can read equations, and do calculus, and know some linear algebra

#

you know what regression is, you know about model bias and variance, you know about statistical inference at least on a basic level, you know how to reason about model building

upper spindle
desert oar
#

so start at the basics but you can move quickly through it

#

i very strongly suggest the Murphy book

#

the beginning material should all be familiar to you from econometrics, but it might be expressed somewhat differently from what you are used to

#

that + the fast.ai course should be a great start imo

#

no need to rush through it

upper spindle
#

thanks

desert oar
#

i also strongly suggest learning python, since this is a python forum 🙂

#

R isn't that useful for "machine learning" as such

upper spindle
desert oar
#

good, that will be useful in industry

#

a lot of jobs will place high value on your ability to write code independently

upper spindle
serene scaffold
desert oar
desert oar
#

pytorch is pretty easy to use

#

especially when you already know the underlying math

#

i also wouldn't spend too much energy on learning how to implement things "from scratch"

#

numerical computing is its own field

#

learn about how the models work mathematically and how to use them, don't worry about implementing them

upper spindle
upper spindle
upper spindle
#

ive been using tensorflow to implement lstm's so far

serene scaffold
#

that is, you're actually implementing LSTMs "from scratch" (ie with no constructs more abstract than individual tensors)?

#

I ask because we've discussed lately how overused the word "implement" tends to be. but yeah, implementing things like that "from scratch" isn't something I'd do at your stage, though you'll get to a point where you could if you wanted to.

upper spindle
#

ohh okay, sorry haha, ive been using code from githubs, youtube and combining them into a univariate lstm

desert oar
#

yeah dont waste your time with the youtube tutorials

#

go in with a beginner's mind imo

#

you'll make progress quickly

#

you won't struggle like a real beginner would

misty flint
#

but tbh

#

i also recommended python

#

since if you want to do advanced data science in R, you end up calling the Reticulate package anyway

#

aka using python through R

#

even the R podcasters i listen to end up having to use python sometimes

#

and theyre trained as biostatisticians too

#

even if you have to do bioinformatics, theres biopython

#

but the documentation for some of that stuff can be terrible sometimes so good luck

misty flint
serene scaffold
#

is there cuda-enabled deep learning in R?

misty flint
#

if there is, idk about it

serene scaffold
#

because it's easier to just have all of scientific computing under one roof, and if they're missing that, it's going to become impossible to compete.

misty flint
#

true

#

many academics use R tho, so dont think its going away anytime soon

upper spindle
misty flint
#

for now

nova tapir
#

can someone explain why correct option is option 4?

upper spindle
#

is jupyter labs the best tool for data science/programming in python

#

seen a few people use spyder

#

or what are your go to tools for data science

serene scaffold
# nova tapir

think of which two quadrants the data points are in, and which quadrant the options are in

cinder thicket
serene scaffold
# nova tapir

the first two options are on an axis, but not lined up with data points. look at where the other two are, if you treat them as points.

upper spindle
#

and try to install again

cinder thicket
serene scaffold
#

are you following a tutorial? this looks like a misguided use of Python OOP

#
return self.df   # there is no self.df attribute of myDataframe
x.dataframe()    # This value isn't used, so nothing happens--did you want to return it?

If you want the d variable in the myDataframe.dataframe method to be exposed, you have to store it as self.d = ...

desert oar
#

even as pseudocode, this is very weird code that seems to have been written by someone who was confused about how to use classes

#

i don't mean to be offensive, but i think that was what stelercus was commenting on

cinder thicket
#

@upper spindle tried this and still getting same errors

#

its jupyterlab i am trying to install

serene scaffold
#

myDataframe just looks like a wrapper around a single dataframe with no particular purpose, and it refers to instance variables that aren't defined.

#
def make_df(ticker):
    return pd.DataFrame({'Ticker': [ticker]})

What you have appears to be an over-engineered version of this.

upper spindle
serene scaffold
#

if you need to go back to having {'Ticker': [ticker]}, you can just do df.to_dict() on the dataframe. the wrapper class just adds a layer of potential complexity.

upper spindle
cinder thicket
serene scaffold
#

I find it is easier to define an init function with self.var in case I decide to add functions or alter the code down the line.
one usually wants to avoid having lots of mutable state

#

you're also creating an additional API on top of pandas that people who use your code would have to learn.

stone marlin
#

Aw, I was trying to do a little refactor of the code, but they deleted it. :'[

tacit basin
tacit basin
tacit basin
cinder thicket
tacit basin
tacit basin
#

It's a commonly used tool in DS.

cinder thicket
tacit basin
#

Colab is a good option as well

cinder thicket
#

may install the desktop app in the future, but not now

cinder thicket
tacit basin
cinder thicket
tacit basin
cinder thicket
#

now that i know how to do this, can someone help me with doing shape from shading with python
i want to know if its possible, and if so, how do i do it?

#

i want to make some DEMs for moons in the solar system

tacit basin
#

Not sure what that means but you could do most things with images in python for example with opencv library

cinder thicket
tacit basin
floral valley
#

what does it mean when a arima problem is unconstrained?

serene scaffold
floral valley
#

The error when I run SARIMAX is "this problem is unconstrained"

#

It outputs, but not entirely

serene scaffold
#

try showing the whole error message from Traceback.

floral valley
#

heres the error if you want to see

serene scaffold
#

I will only look at text, sorry

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

floral valley
#

sure

#

been trying to get this completed forever and just cant lol

serene scaffold
floral valley
#

how do you get the inferred frequency?

#

i would presume it guesses off the data but how do you pass that in

serene scaffold
#

I don't know--I don't even know what the problem is

#

I'm just following my usual debugging steps

floral valley
#

should i send my kaggle code? everything is on here

#

may help

#

one sec

serene scaffold
#

I don't think I can do a deep dive right now, but someone else might.

floral valley
#

dont worry ill just have it on here if anyone can help, havnt had much progress by myself and need to do 3 models lol

#

its not a large program but trying to get it working

tacit basin
minor elbow
#

for your SARIMA thing i would suggest using a simpler model like dropping the seasonal order and trend and see if that works then you can add in the extra stuff to find out exactly whats causing the issue

#

im not sure the 'this problem is unconstrained' is an error

astral delta
#

Anyone here good with pytesseract and pyautogui dm me im tryna create a bot for something

serene scaffold
astral delta
#

Ight, so I am trying to make a bot answer these questions rlly fast, so I am trying to use ocr to get the questions and answer it, and then I will try to correspond the answer to one of the choices and press 1,2,3,4 to get the correct answer

serene scaffold
#

is it always two integers and one of the four basic operations?

thin palm
#

➜ wbanalysis git:(gcp) ✗ make upload_data [🐍 warren-buffet]
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
why is GCP saying the destination url must name a directory even though I give it a directory?

pseudo wren
#

I am having trouble converting api data into a csv and also manipulating said data

#

for example I am doing a project on murder rates as reported by different news publications

#

I was able to pull the API

#

but every time I try to convert it to a csv, I get an error message

#

i've looked at several stack overflow message boards but cannot find a solution that works for me

#
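import json
import csv
import urllib.request  # urlopen lives here; without this import the next line fails

with urllib.request.urlopen('https://content.guardianapis.com/search?api-key=47b0057b-d60d-4a3d-a6cf-c1f79aeedaa4') as url:
    s = url.read()  # raw bytes of the JSON response
    print(s)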
#

this is the code for right now
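A possible next step, sketched with only the standard library: parse the JSON and write selected fields with `csv.DictWriter`. It assumes the articles sit under `response` → `results` and that each one has `id` and `webTitle` keys, which is worth checking against the actual payload. `results_to_csv` is a made-up helper name.

```python
import csv
import io
import json


def results_to_csv(raw_json, fields):
    """Turn the JSON text (or bytes) of a search response into CSV text."""
    payload = json.loads(raw_json)
    rows = payload["response"]["results"]   # assumed location of the article list
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys that are not listed in `fields`
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = json.dumps({"response": {"results": [{"id": "a/1", "webTitle": "Example", "type": "article"}]}})
print(results_to_csv(sample, ["id", "webTitle"]))
```

With the code above, `results_to_csv(s, [...])` should work on the bytes returned by `url.read()`, and the returned text can be written to a `.csv` file.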

twin hound
#

Hey guys can someone tell me why my scores are so low for my testing set when clearly the model predicts the test data very well:

#

These plots are test vs prediction from my model

#

blue is test, red is prediction

misty flint
#

idk about clearly

twin hound
#

What I mean is they predict the data relatively well

#

So a score less than 0.6 makes zero sense

#

I've seen it plotted using another ML method with higher score and it doesn't look nearly that clean

rain temple
#

I am following a tutorial on pix2pix generation. The output for the shapes of each of the target arrays and the source arrays are
Loaded: (1096, 256, 256, 3) (1096, 256, 256, 3)

but for me the arrays are
Loaded: (256,256,3) (256,256,3)

Does this mean that not all of the images are loaded into the arrays?

#

I plotted the contents on matplotlib and this is what I get, but I am supposed to get a picture of the images.

#

Could someone please help. Thx

somber bough
#

So I'm making a custom detection model where you can upload an image and it will put it through the detection, and display the top 3 closest results (like on a bar graph), but im somewhat new to python so i dont know which library to use

quick eagle
#

Hello all! I'm trying to use the fastdtw module to align time-series data that is slightly off.... but the fastdtw alignment makes it WAAY worse!! Any suggestions on whether I'm using that module incorrectly, or a better way to synchronize data?

#

But I start with that (orange trace is a few seconds ahead), and fastdtw makes a mess of it!!!

violet gull
#

this makes me happy

#

predicting a 50 50 shot with 75% accuracy

grand dagger
#

WHAT

violet gull
#

what

serene scaffold
violet gull
#

is square or is not square

#

binary

serene scaffold
#

I don't mean to be the bearer of bad news, but for binary classification, 50% is the worst possible accuracy

#

so 75% is kind of like 50%

violet gull
#

huh?

#

what does that even mean

serene scaffold
#

If there's only two classes, and your model was completely random, then it would get 50% accuracy

violet gull
#

yes

#

and its getting 75%

#

therefor its better

serene scaffold
#

I suppose

violet gull
#

i dont see the problem

#

it is working as expected
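On the 50% point: a common sanity check is to compare accuracy against the trivial "always predict the most common class" baseline, which is 50% only when the two classes are balanced. A small sketch with made-up helper names:

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["square", "square", "square", "not_square"]
print(majority_baseline(labels))
```

If the test set were 75% squares, a model that always says "square" would also score 75%, so the number only means much once the baseline is known.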

pastel valley
#

btw i am using google collab is there an option to use more computational power?

serene scaffold
#

@pastel valley there is if you're willing to pay them for it.

pastel valley
serene scaffold
#

Colab is already generous.

pastel valley
#

its still take me forever hahha

serene scaffold
#

Did you remember to set it to use the gpu

pastel valley
#

no its all default i dont know how to configure collab

#

is there that option?

serene scaffold
#

Yes. But idk how to do it off the top of my head

#

Also, let me reiterate that I think you would benefit a lot from and very much enjoy a formal data science course.

tacit basin
serene scaffold
#

You might also need to move the model to the GPU

heavy bay
#

I'm making a simple neural network to find the relationship between 2 numbers ```py
from tensorflow import keras
import numpy as np

model = keras.Sequential(keras.layers.Dense(units=1, input_shape=[1]))
model.compile(optimizer='sgd', loss='mean_squared_error')

def calulate_trangular_numbers(n):
    for i in range(1, n+1):
        yield int(i*(i+1)/2)

n = 20
x = np.array(list(range(1, n+1)))
y = np.array(list(calulate_trangular_numbers(n)))

model.fit(x, y, epochs=500)``` (I want it to find the relationship between the x values and y y = x*(x+1)/2)
But for some reason when I fit the model the loss is nan

Epoch 1/500
1/1 [==============================] - 0s 9ms/step - loss: nan
Epoch 2/500
1/1 [==============================] - 0s 8ms/step - loss: nan
Epoch 3/500
1/1 [==============================] - 0s 7ms/step - loss: nan
Epoch 4/500
1/1 [==============================] - 0s 12ms/step - loss: nan
Epoch 5/500
1/1 [==============================] - 0s 12ms/step - loss: nan

any reason for why this could happen?
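A likely cause is that plain SGD diverges on unscaled inputs this large: with x up to 20 and y up to 210 the updates overshoot, the weights blow up to infinity, and the loss turns into nan. The sketch below reproduces that outside Keras with NumPy, assuming a learning rate of 0.01 and float32 arithmetic; `fit_line` is a made-up helper, not what Keras runs internally.

```python
import numpy as np


def fit_line(x, y, lr=0.01, epochs=500):
    """Full-batch gradient descent on MSE for y ~ w*x + b, in float32."""
    x = np.asarray(x).astype(np.float32)
    y = np.asarray(y).astype(np.float32)
    w = np.float32(0.0)
    b = np.float32(0.0)
    lr = np.float32(lr)
    two = np.float32(2.0)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(epochs):
            err = w * x + b - y
            w = w - lr * two * np.mean(err * x)   # d(MSE)/dw
            b = b - lr * two * np.mean(err)       # d(MSE)/db
        loss = np.mean((w * x + b - y) ** 2)
    return float(w), float(b), float(loss)


x = np.arange(1, 21)
y = x * (x + 1) / 2
raw = fit_line(x, y)                              # unscaled inputs
scaled = fit_line((x - x.mean()) / x.std(), y)    # standardized inputs
print(raw[2], scaled[2])
```

Standardizing x (or using a smaller learning rate, or an optimizer like Adam) should make the loss finite. Separately, a single `Dense(1)` layer is a straight line, so it can only approximate `x*(x+1)/2`; fitting the curve itself would need a hidden layer with a nonlinearity or an `x**2` input feature.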
tacit basin
serene scaffold
heavy bay
pastel valley
pastel valley
heavy bay
shut phoenix
#

Learn how to use TensorFlow 2.0 in this full tutorial course for beginners. This course is designed for Python programmers looking to enhance their knowledge and skills in machine learning and artificial intelligence.

Throughout the 8 modules in this course you will learn about fundamental concepts and methods in ML & AI like core learning alg...

#

I am getting into ml and ai field

copper dirge
#

I would recommend getting more confident with python before starting something like this...

pastel valve
#

Hi guys, I have a question regarding machine learning. Which algorithm would be best if the dataset is generated from the graphical location of the mouse cursor (numerical data)? The objective is to allow the machine to learn the mouse movements.

tacit basin
tacit basin
pastel valve
#

mouse movement. graphical data( numerical)

tacit basin
pastel valve
#

no, raw data

#

graphical location of the cursor

tacit basin
pastel valve
#

x and y axis

tacit basin
#

And output?

pastel valve
#

from what i believe, the output will be based on the input

#

since the machine will have to predict what the next input might look like

tacit basin
pastel valve
#

ye

#

s

tacit basin
tacit basin
tacit basin
# shut phoenix Alr ty

They will have a live course starting in April, in person and online https://mobile.twitter.com/jeremyphoward/status/1499600211714674688

I am over the moon to announce:

  1. I'm now a professor at University of Queensland (UQ), the top institute in my home state!
  2. I'll be teaching a brand new deep learning course at UQ from April, which will form the basis of a new @fastdotai course! 🧵
    https://t.co/RAMaHb7eZ2

shut phoenix
#

Interesting

tacit basin
shut phoenix
#

Tysm

tacit basin
#

The live course may be paid, but they release it as a free MOOC soon after the live course finishes.

sterile rivet
#

Any of yall are experienced with big data projects? I want to start with one and would love to know your dataset preferences.

somber prism
#

guys i have a doubt , here https://fractaldle.medium.com/brief-overview-on-object-detection-algorithms-ec516929be93

what does it mean by "For each object class, train a SVM (one versus other) classifier. You can use hard negative mining to improve the classification accuracy."? Does it take the output of the last FC hidden layer and feed it to the SVM for classification, or take the softmax FC layer and feed it to the SVM?

Medium

Understanding Object detection frameworks and discussing the evolution of the same.

sterile heath
#

https://youtu.be/GVsUOuSjvcg For anyone who hasn't seen it yet. Very interesting bit about flashable analogue chips running pretrained models with significantly reduced power consumption vs banks of gpus.

Visit https://brilliant.org/Veritasium/ to get started learning STEM for free, and the first 200 people will get 20% off their annual premium subscription. Digital computers have served us well for decades, but the rise of artificial intelligence demands a totally new kind of computer: analog.

Thanks to Mike Henry and everyone at Mythic for the...

#

Cool bit of ML history, too.

karmic moth
#

Does anyone know how to use TF-IDF with a CNN for text (NLP)?

#

any article or something u can refer me to or tutorial?

craggy tiger
#

Does anyone know of any data-science projects which I can join?

hollow sentinel
#

wow this is nice

#

it has application and bases itself off a good textbook

pastel valley
#

this is probably my best result so far, the gap between train and test is not like the other ones
but those spikes in loss and accuracy, are they normal? or is there common knowledge on why they happen?

grave frost
drifting lion
#

anyone who has worked on ML, do people put training and testing processes on same .py file or create different modules for each?

neat anvil
#

I like tests to always go in a separate ‘tests/‘ directory. Source and tests being together makes things confusing IMO

maiden kite
#

like the University of Helsinki one

iron basalt
#

(I do this for not just ML but all new algorithms)

#

In addition, I like to have at least 1 test made by someone else to make sure that i'm not just making tests I know it will pass.

craggy tiger
#

Hi there, I am looking for a data-science community to work with on interesting projects.

stone marlin
#

Everyone should read this and apply it to their development cycle, for real.

iron basalt
#

I like to imagine programming like crystallization or annealing. At first it's hot and I want to strike it often, but eventually I want it to cool off and harden / crystallize.

#

(Pro tip: check the commit rate of a piece of code. If it starts slowing down, it's time to add some tests and let it harden, but if it's updated a lot even after a long time, maybe it's the wrong approach / design and is therefore causing a lot of bugs)

#

(If someone asks you to fix their code base, look for what is being changed a lot and find out why)

stone marlin
#

I was given that last advice by a former manager, and we had a tool to look at file-commit-rates. Many of them were just adding business logic (or false positives --- typos someone forgot to squash) and so it was easy to pull that out so that the business logic could be more easily changed and updated and then "plugged in" to the microservice. Great advice.

lapis sequoia
#

Hello, I have a conda project with a typer CLI app in it, located in the libraryassignment/__main__.py file, and I'm currently running typer commands like so: python -m libraryassignment <command_name>. It works fine, but I want to be able to execute without the -m flag like so: python libraryassignment <command_name>, and then I get ModuleNotFoundError: No module named 'libraryassignment'.
As far as I know, I have to either include it to the path or create a python package. I'm relatively new to conda and I wonder how can I tackle this issue creating all the required configuration to build a package so that python detects it as a package allowing me to keep developing on the project.
I used poetry in the past and it's pretty intuitive and easy to use especially regarding the building process of python packages with pyproject.toml and poetry.lock files but I don't have much experience with conda and I wonder if you can help me with some guidelines that I can put into practice to build a package from a conda project.
Thank you very much in advance.

thin palm
#

any GCP experts out there who can tell me why I keep getting this error? Just trying to upload a folder to my GCP Bucket

➜ wbanalysis git:(gcp) make upload_data
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
make: *** [upload_data] Error 1

misty flint
#

speaking of, i need to learn a cloud tool

tacit basin
# maiden kite wdym by some python coding experience? I am am doing automatetheboringstuff book...

A year of coding (preferably Python) and high school math is the recommended pre-requisite. The best way to get up to speed is to start taking the current course new, and work to fill in any knowledge/expertise gaps you come across as you go.
https://mobile.twitter.com/jeremyphoward/status/1499600223920074754


maiden kite
#

like you need calculus and advanced functions

#

for the course

misty flint
#

just try it

#

you should be able to fill in any knowledge gaps like they said

sterile heath
pastel valley
#

how long is the cooldown with this?

serene scaffold
tacit basin
# pastel valley how long is the cooldown with this?

The more you use it, the longer you have to wait, I've read. It's like hours or days.
You could try transfer learning. Kaggle will give you around 30-40 hrs of GPU usage a week guaranteed. For now AWS SageMaker Studio Lab doesn't have limits other than a 4-hour session, similar to Paperspace, but here a GPU may not be available at times due to demand.

novel raven
#

Hey

#

Would you need maths skill for data science?

#

If yes then what could it be

serene scaffold
tacit basin
spare moat
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

karmic moth
#

hi i have a question

#
model = Sequential()

model.add(Conv1D(filters=3, kernel_size=1, activation='relu', input_shape=(None, 3, 10, 1)))
# model.add(MaxPool1D(pool_size=3, strides=1))
# model.add(GlobalMaxPooling1D())

# model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
# model.add(MaxPool1D(pool_size=2, strides=2))
# model.add(GlobalMaxPooling1D())

model.add(Flatten())

model.add(Dense(units=128,activation='relu'))

model.add(Dense(units=1,activation='sigmoid'))

# For a binary classification problem
model.compile(loss='binary_crossentropy', optimizer='adam')
#

here is a cnn model code

#

im getting this error

#
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-4023ec9e66ce> in <module>
     11 model.add(Flatten())
     12 
---> 13 model.add(Dense(units=128,activation='relu'))
     14 
     15 model.add(Dense(units=1,activation='sigmoid'))


ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
#

does anyone know why?

pastel valley
#

you only have 1 available prediction

karmic moth
#

what?

pastel valley
#

or also maybe the input shape?

karmic moth
#

so how can i fix?

#

isnt the input shape set on the first layer, which is the first Conv1D layer, and i have done that

pastel valley
#

am also noob so i dont know if what i am saying is correct hahaha

#

but the shape should be square i think

#

and the units after your flatten() should be =< the units of flattened

#

also your final layer should be more than one unit because if it is only one then its predicting only a single class in your case i think sigmoid is for binary class

#

correct me if am wrong 😅

karmic moth
#

i am predicting one value, a binary value, so it's binary prediction
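
On the `Dense` error above: in Keras, `input_shape` describes one sample and leaves out the batch dimension, so the leading `None` in `(None, 3, 10, 1)` is most likely treated as an extra axis of unknown length. `Flatten` then has to multiply an unknown size into the feature count, and `Dense` refuses a `None` last dimension. The helper below is a hypothetical, pure-Python imitation of that shape arithmetic (it is not Keras code), just to show where the `None` comes from.

```python
from math import prod

def flattened_size(sample_shape):
    """Feature count after flattening one sample; None if any axis is unknown."""
    if any(dim is None for dim in sample_shape):
        return None
    return prod(sample_shape)

# Shape as written in the model, after the convolution: a kernel_size=1
# convolution with 3 filters keeps the leading axes and turns the last one
# into 3, so the unknown axis survives up to Flatten.
as_written = flattened_size((None, 3, 10, 3))

# Assumed fix: describe a single sample only, e.g. 10 steps with 3 channels
# (the real step and channel counts depend on the data).
without_batch_axis = flattened_size((10, 3))

print(as_written, without_batch_axis)
```

In the model itself that would mean passing something like `input_shape=(steps, channels)` to the first `Conv1D`, with the actual numbers taken from the shape of one training sample.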

novel raven
violet gull
violet gull
#

also is there a certain matrix size inside the neural network that is most efficient

#

ex.
121 data points --> x --> y --> z --> 2

violet gull
iron basalt
violet gull
#

are we talking about the same thing?

#

thats not exactly how ML works

iron basalt
violet gull
#

so i input an array of size 121 * 121 right

#

and then i go through 3 layers of matrix multiplication to get the 121 into a 1x2 or a 2x1 i forgor

iron basalt
#

"get the 121 into a 1x2 or a 2x1 i forgor" - I don't understand what this means.

karmic moth
#

can someone help answer my question

lilac dagger
#

know where to get started with data science and what it basically is

#

is it legit just analyzing shit tons of data for a motive

#

like facebook with their ad systems?

lapis sequoia
lapis sequoia
lilac dagger
#

icic

tacit basin
#

Crazy idea: Neovim but like jupyter Notebook. So Neovim Notebooks! Possible?

modest shuttle
#

Hello,
rect = win32gui.GetWindowRect(hwnd)
I grab my screen for object detection, but I want to grab a specific section of my screen. How can I do that?
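
`win32gui.GetWindowRect` returns `(left, top, right, bottom)` in screen coordinates, so one way to grab only part of the window is to offset those values before capturing. A minimal sketch with a hypothetical helper `sub_region` (the name, the offsets and the example rectangle are made up for illustration):

```python
def sub_region(window_rect, x_offset, y_offset, width, height):
    """Return (left, top, right, bottom) of a box inside window_rect, clamped to it."""
    left, top, right, bottom = window_rect
    box_left = min(left + x_offset, right)
    box_top = min(top + y_offset, bottom)
    box_right = min(box_left + width, right)
    box_bottom = min(box_top + height, bottom)
    return (box_left, box_top, box_right, box_bottom)

# Stand-in for the tuple returned by win32gui.GetWindowRect(hwnd)
rect = (100, 50, 900, 650)
bbox = sub_region(rect, x_offset=200, y_offset=100, width=300, height=250)
print(bbox)  # (300, 150, 600, 400)
```

The resulting box can then be handed to a capture call that accepts a bounding box, such as Pillow's `ImageGrab.grab(bbox=bbox)`.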

gloomy anvil
#

hello y'all! I created a SARIMAX model and need some help evaluating the Results:

#

I mean this looks quite good at first glance, right? But is it? The RMSE is 0.024718 when comparing actual vs. prediction

#

Could you maybe have a look at it?

desert oar
#

rmse of 0.025 on values on the order of ~2 seems good to me!

#

however it looks like your model testing procedure is probably not valid

#

you don't want to just check a bunch of one-step-ahead forecasts, obviously those will always be good

#

you need a train/test split

#

or better yet cross validation
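
For time series, cross validation usually means a rolling (expanding-window) split: train on everything up to a cut-off, forecast the next block, move the cut-off forward and repeat, so the model never sees the future. A minimal sketch in plain Python; `expanding_window_splits` and its parameters are illustrative names, and the sizes are only examples.

```python
def expanding_window_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs that always respect time order."""
    start = initial_train
    while start + horizon <= n_samples:
        yield range(0, start), range(start, start + horizon)
        start += horizon

# Example sizes only: 100 rows, first fold trains on 60, each fold forecasts 10 steps.
splits = list(expanding_window_splits(n_samples=100, initial_train=60, horizon=10))
for train_idx, test_idx in splits:
    print(len(train_idx), "train rows, test rows", test_idx.start, "to", test_idx.stop - 1)
```

The model would be refit on each training range, used to forecast the matching test range, and the per-fold RMSE averaged. scikit-learn's `TimeSeriesSplit` offers a similar splitter.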

fallow frost
#

Anybody in the data science / analytics field?
I wanna ask how much more I need to know to get a basic / junior data analyst position

gloomy anvil
#

this is the description of my test dataset. I split it into 1000 rows for training and 171 rows for the test.

#

This is my code:

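#imports (assumed from the names used below)
import math
import pandas as pd
import matplotlib.pyplot as plt
from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

#load dataset
df = pd.read_csv('ADA_1440.csv', index_col = 'date', parse_dates = True)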

#split the closing price into train and test data
train = df.iloc[:1000,4]
test = df.iloc[1000:,4]

#select exogenous variables
exo = df.iloc[:,6:61]

#split exogenous variables into train and test data
exo_train = exo.iloc[:1000]
exo_test = exo.iloc[1000:]

#run auto_arima to find the best configuration (I selected m=7 and D=1 by running seasonal_decompose and acf and pacf plots)
auto_arima(df['close'], exogenous=exo, m=7, trace=True, D=1).summary()

#set the best configuration from auto_arima for the SARIMAX model 
Model = SARIMAX(train, exog = exo_train, order=(1,0,2), seasonal_order = (0,1,1,7))

#train model
Model = Model.fit()

#get prediction
prediction = Model.predict(len(train), len(train)+len(test)-1, exog = exo_test, typ = 'levels')

#plot the prediction
plt.plot(test, color ='red', label = 'Actual')
plt.plot(prediction, color ='blue', label = 'Prediction')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()

#calculate rmse
rmse = math.sqrt(mean_squared_error(test, prediction))

gloomy anvil
serene scaffold
misty flint
#

for entry-level data analyst positions, the amount of knowledge needed isnt usually too much

#

but the issue comes with how competitive those positions are, especially with people making career changes

#

sometimes you are competing with people with graduate degrees, work experience in certain domain, etc.

#

so you usually need something special to stand out

graceful glacier
#

hello!

#

i need help creating a dataframe with a multi index

serene scaffold
graceful glacier
#

sorry about that im still thinking through it

serene scaffold
#

sounds good. I might be able to check this channel again when your question is ready.

modern cypress
#

Hey, I am trying to change my project to be able to detect multiple objects in a single instance. All my images are annotated using pascal, but I am unsure where to go from here. Previously I had a "default" class filled with many random images, but I realized this is very incorrect (in my mind) and I would rather use general object detection and maybe add some bounding boxes if time allows

#

My file breakdown looks like this:

graceful glacier
#

i like the profile picture tony

modern cypress
#

thanks 🤣

somber prism
modern cypress
#

This is my first time trying something of this nature

#

These are some results of the older project but I realised that I was treating an ordinary image as a class instead of it being the default setting, if that makes sense

somber prism
#

btw does anyone know how to plot a normalized image? I mean, after normalizing the image using albumentations and converting the image range to -1 to 1, matplotlib is displaying a black image. How can I avoid that? I even tried plt.imshow((image * 255).to(torch.uint8))
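
The black image is probably because `imshow` expects floats in [0, 1] or integers in [0, 255], while the normalized tensor lives in [-1, 1]; multiplying by 255 still leaves negative values that do not survive the cast to `uint8`. Assuming the normalization used mean 0.5 and std 0.5 per channel (an assumption; substitute your own values if they differ), undoing it is `x * std + mean`. The arithmetic on plain floats, with a hypothetical helper:

```python
def to_display_range(value, mean=0.5, std=0.5):
    """Map a normalized pixel back to [0, 1], clamping any overshoot."""
    restored = value * std + mean
    return min(max(restored, 0.0), 1.0)

for pixel in (-1.0, 0.0, 1.0):
    print(pixel, "->", to_display_range(pixel))
```

On a channels-first CPU torch tensor the same expression applies elementwise, for example `plt.imshow((image * 0.5 + 0.5).clamp(0, 1).permute(1, 2, 0).numpy())`.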

somber prism
modern cypress
somber prism
#

what you are trying is image classification

modern cypress
#

Ohhhhhhh right. Okay time to look into object detection approaches

#

Thanks for the help

violet gull
#

i ran it for an hour and it stopped increasing at about a score of 165

woeful tusk
#

Any tips to plot this, all blocks on the same plot? I have it inside a DataFrame. My line of thought was iterating through each column, but I guess iterating and DataFrames shouldn't go together, right?

serene scaffold
#

what kind of plot are we talking about?

#

do you want line plots, where each block is a line?

woeful tusk
#

I was thinking of making it dynamically, since the amount of blocks can change based on user input

serene scaffold
# woeful tusk Yea

I would first do df.index = df.index.str.extract(r'(\d+)').astype(int) so that the index is ints instead of strings

#

and then you can use df.plot.line(). it might even work just like that, without any additional work

#

you might have to transpose it. but then that's just df.T.plot.line()

woeful tusk
serene scaffold
#

I would need a code representation of your dataframe that I can c/p to experiment.

woeful tusk
serene scaffold
woeful tusk
#
{'Bloco 1': [6000.0, 6000.0, 6000.0, 6000.0, 5996.913420966637], 'Bloco 2': [6000.0, 6000.0, 6000.0, 5986.342797261716, 5963.890663247039], 'Bloco 3': [6000.0, 6000.0, 5939.570902083334, 5873.3415172031355, 5809.641970812106], 'Bloco 4': [6000.0, 5732.619047619048, 5586.096291071429, 5478.48851392744, 5386.497501391264], 'Bloco 5': [6000.0, 6000.0, 5939.570902083334, 5859.684314464852, 5773.532634059145]}
serene scaffold
#

let me see

#

is this not basically what you want?

woeful tusk
#

There are only five values inside each list, but it goes further.

#

Yea

serene scaffold
#

so what's the problem? didn't I basically give you the solution?

woeful tusk
#

I was getting an error on the df.index =.... line

serene scaffold
#

okay, so show the error

woeful tusk
#

I guess it's because my index names have "Day" on the string

serene scaffold
#

saying that you "got an error" is uninformative. copy and paste the error from Traceback

#

also, you can label the x axis as "day" with xlabel='Day'

woeful tusk
#

ValueError: Index data must be 1-dimensional

serene scaffold
#

I asked you to copy and paste the error from Traceback.

#

!traceback

arctic wedgeBOT
#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

woeful tusk
#
Traceback (most recent call last):
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\simuladorexplicito.py", line 89, in <module>
    relatorio.index = relatorio.index.str.extract(r'(\d+)').astype(int)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\generic.py", line 5596, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas\_libs\properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\generic.py", line 768, in _set_axis
w__
    return Index(np.asarray(data), dtype=dtype, copy=copy, name=name, **kwargs)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\indexes\base.py", line 503, in __new__
    arr = klass._ensure_array(arr, dtype, copy)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\indexes\numeric.py", line 183, in _ensure_array
    raise ValueError("Index data must be 1-dimensional")
ValueError: Index data must be 1-dimensional
serene scaffold
#

okay, can you do print(df.index)?

woeful tusk
#
Index(['Dia 0', 'Dia 15', 'Dia 30', 'Dia 45', 'Dia 60', 'Dia 75', 'Dia 90',
       'Dia 105', 'Dia 120', 'Dia 135', 'Dia 150', 'Dia 165', 'Dia 180',
       'Dia 195', 'Dia 210', 'Dia 225', 'Dia 240', 'Dia 255', 'Dia 270',
       'Dia 285', 'Dia 300', 'Dia 315', 'Dia 330', 'Dia 345', 'Dia 360'],
      dtype='object')
#

Btw, do I need an extension to plot in VS Code? The plot line runs fine, but shows nothing

serene scaffold
#

I don't use vs code

#

try

df.index = df.index.str.extract(r'(\d+)').astype(int).squeeze().tolist()
#

there's probably a better way to do it. somewhere.
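
One candidate for that better way: `str.extract` returns a DataFrame by default, which is why the index assignment complained about dimensions; passing `expand=False` with a single capture group should give back a one-dimensional result instead. The same parsing can also be done with the standard library and assigned to `df.index` as a list. A small sketch (the helper name `day_numbers` is made up):

```python
import re

def day_numbers(labels):
    """Extract the first integer from each label, e.g. 'Dia 15' -> 15."""
    return [int(re.search(r"\d+", label).group()) for label in labels]

labels = ["Dia 0", "Dia 15", "Dia 30", "Dia 45"]
print(day_numbers(labels))  # [0, 15, 30, 45]
```

With a DataFrame this would be used as `df.index = day_numbers(df.index)`.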

woeful tusk
#

Had to do a plt.show()

serene scaffold
#

looks like you need to transpose it.

woeful tusk
#

That was with transpose already, gonna try without it

#

Worked, thank you very much mate

serene scaffold
#

🔥

violet gull
violet gull
#

what

serene scaffold
#

you got a score of 165. idk what that means.

violet gull
#

there is a data set of 500 squares and 500 not squares

#

for every data thing it correctly identifies it gets a point

#

and for every one it does wrong it loses a point

#

so 165 means it got 417.5 wrong and 582.5 right i think?

serene scaffold
violet gull
#

yes

#

wym by an actual metric

serene scaffold
#

saying that you "got 417.5 wrong and 582.5 right" is vague, whereas reporting the score for a performance metric is specific.

#

also how did you get some partially correct?

violet gull
#

it didnt

#

it never actually outputted 165

#

it only outputs even numbers

#

165 was just an average i saw

#

how do i make a performance metric

serene scaffold
#

well, let's go over a few issues with the code first.

notSquare = square  # This **does not** make a copy, it just makes another reference
if self.classify(square) == True:  # Never do comparisons to True or False. If `self.classify(square)` is already True, you're just writing `if True == True`

You're also using lowerCamelCase for everything, when you should be using UpperCamelCase for class names and snake_case for everything else.
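
To illustrate the first point: assigning a list to a second name does not duplicate it, so edits through either name show up in both. `copy.deepcopy` makes an independent copy, including nested lists. The variable names below are illustrative, not taken from the original code.

```python
import copy

square = [[1, 1], [1, 1]]

alias = square                       # same object, second name
independent = copy.deepcopy(square)  # new outer and inner lists

alias[0][0] = 0                      # also changes `square`
independent[1][1] = 0                # leaves `square` untouched

print(square)                                  # [[0, 1], [1, 1]]
print(alias is square, independent is square)  # True False
```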

vague kindle
violet gull
#

^

vague kindle
#

If you were using camel case

serene scaffold
#

I don't think I can dive into what it would take to improve the performance, as you've written a lot of it in "pure python" and I'm used to reading code that uses the numpy/torch style more extensively.

serene scaffold
violet gull
#

hmmm

#

so is it broken?

#

classify square returns a boolean

serene scaffold
#

I'm not sure what it's intended to do.

violet gull
#

which part

serene scaffold
#

all of it. what is the model supposed to predict, for what inputs?

violet gull
#

it's supposed to take an array like ```
test = [[
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]]
```

serene scaffold
#

like, if the region of 1s is square?

violet gull
#

yes

serene scaffold
#

you don't need ML for that?

violet gull
#

well i want to

serene scaffold
#

also you have [[ and ]] but each row isn't its own list

violet gull
#

ye

#

im just trying to learn how machine learning works

#

:C

serene scaffold
#
        for square in squares:
            if self.classify(square) == True:
                score += 1
            else:
                score -= 1 

you don't subtract when a model makes the wrong prediction.
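
A common alternative is accuracy: correct predictions divided by total predictions, which stays between 0 and 1 regardless of dataset size. A minimal sketch; `accuracy` and `accuracy_from_score` are hypothetical helpers, and the second one only converts the plus-one/minus-one score discussed above.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

def accuracy_from_score(score, total):
    """Convert a (+1 right, -1 wrong) score: right = (total + score) / 2."""
    return (total + score) / (2 * total)

print(accuracy([True, True, False, False], [True, False, False, True]))  # 0.5
print(accuracy_from_score(165, 1000))  # 0.5825
```

So an average score of 165 on 1000 examples corresponds to about 58% accuracy, not far above the 50% that guessing would give on a balanced set of squares and non-squares.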

violet gull
#

yes i do