#data-science-and-ml | Python | Page 382

stone marlin Mar 3, 2022, 4:26 PM

#

I remember going through both of them when I was picking out a framework to use, and I was like, "ehh, these are both fine." Streamlit, though, felt easier to manage and build stuff up with for me --- but totally subjective. They're both great tools.

#

You'll probably do the data validation and then, after you get the results, it'll take an existing template and fill the stuff. In Flask / Jinja, you need to make this template yourself. In Streamlit, it sort'a makes it for you. I don't remember Dash's thing, but I think it's similar.

agile cobalt Mar 3, 2022, 4:27 PM

#

there's also Mode (and many others), if you just want reports instead of actual dashboards

#

in Dash you sort of define the HTML outline then feed plotly graphs into a div
(and use callbacks to update it based on user interactions)

stone marlin Mar 3, 2022, 4:28 PM

#

Ahhh, that sounds familiar. I haven't seen Mode, dang, there's a lot of these.

#

I only know Streamlit because Emyrs told me about it here. :']

agile cobalt Mar 3, 2022, 4:28 PM

#

I'm not sure if Mode was free or not though

odd meteor Mar 3, 2022, 4:29 PM

#

Lol yeah in a bus. I thought that's what everyone calls it

stone marlin Mar 3, 2022, 4:30 PM

#

Here's a table example in Streamlit, for Tsar's reference. I'm not sure how much customization you can do on tables, tho. https://share.streamlit.io/streamlit/example-app-interactive-table/main

agile cobalt Mar 3, 2022, 4:30 PM

#

streamlit seems fine as well - if it doesn't lock anything behind a paywall (cough dash enterprise...), maybe go for it

stone marlin Mar 3, 2022, 4:31 PM

#

Secretly, I love streamlit too because it integrates well with Altair, my beloved underrated plotting library. :'''']

desert oar Mar 3, 2022, 4:31 PM

#

i have no idea, you will have to investigate what is different between the validation and training sets. it might be background objects, or it might be some other problem

acoustic crow Mar 3, 2022, 4:35 PM

#

Let me explain this a bit more clearly. I don't need to create any charts, etc. What I have is a dataset with 'n' columns, and each column holds some sort of value. For each column I have an observed and an expected value. I have to create a validation script in Python, with the help of Pandas and any other validation scripts I can find, to run checks such as whether a value is negative. Once the validation is complete, I need to visualize it somehow in a report style, flagging (by importance) each column that did not pass a check. It has to be somewhat interactive, so that you can filter and so on.
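
A minimal sketch of what such a validation step could look like in pandas. The column names, the tolerance, and the flag levels ("ok", "warning", "critical") are all assumptions made up for illustration, not part of the actual assignment.

```python
import pandas as pd

# Hypothetical input: one row per checked column, with observed and expected values.
results = pd.DataFrame({
    "column": ["price", "quantity", "discount"],
    "observed": [10.0, -3.0, 0.25],
    "expected": [10.0, 5.0, 0.20],
})


def validate(df: pd.DataFrame, tolerance: float = 0.1) -> pd.DataFrame:
    """Add one boolean column per check plus an overall importance flag."""
    out = df.copy()
    out["is_negative"] = out["observed"] < 0
    # Relative deviation from the expected value (assumes expected is never 0).
    out["deviation"] = (out["observed"] - out["expected"]).abs() / out["expected"].abs()
    out["off_target"] = out["deviation"] > tolerance
    # Later assignments win, so "critical" overrides "warning".
    out["flag"] = "ok"
    out.loc[out["off_target"], "flag"] = "warning"
    out.loc[out["is_negative"], "flag"] = "critical"
    return out


report = validate(results)
print(report[["column", "flag"]])
```

The resulting `report` is an ordinary DataFrame, so it can be handed to whichever front end gets picked later (Streamlit, Excel, or a static page).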

#

So I'm not really sure how to categorize my problem, whether it's web related or not

stone marlin Mar 3, 2022, 4:39 PM

#

Yeah, that seems like something you could do in a table, but I'm not sure how things like icons of flags or highlighting work in Streamlit.

desert oar Mar 3, 2022, 4:39 PM

#

this sounds like it might be easier to just roll your own flask app or something

#

the requirement seems pretty straightforward, just a yes/no indicator next to each column name, and an expandable <details> element w/ specific information about what failed if anything

#

maybe a way to export some report as a text document or json or whatever
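
A rough sketch of that idea using only the standard library: a static page with a pass/fail mark per column and an expandable `<details>` element, plus a JSON export. The `results` structure and the column names are hypothetical.

```python
import json
from html import escape

# Hypothetical validation results: column name -> list of failure messages.
results = {
    "price": [],
    "quantity": ["3 negative values", "observed mean differs from expected"],
}


def render_report(results: dict) -> str:
    """Build a static HTML page with one expandable <details> block per column."""
    rows = []
    for column, failures in results.items():
        status = "FAIL" if failures else "PASS"
        items = "".join(f"<li>{escape(msg)}</li>" for msg in failures)
        body = f"<ul>{items}</ul>" if failures else "<p>All checks passed.</p>"
        rows.append(
            f"<details><summary>{escape(column)}: {status}</summary>{body}</details>"
        )
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(rows) + "\n</body></html>"


html_report = render_report(results)
json_report = json.dumps(results, indent=2)  # the "export as json" option
```

This gives no filtering or sorting, so it only fits if a static report is acceptable.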

stone marlin Mar 3, 2022, 4:40 PM

#

In the flask app, they'd still have to use some js library like DataTables or something. I think Streamlit has this built-in if the results are in pandas.

#

Either way is probably fine, though.

desert oar Mar 3, 2022, 4:41 PM

#

would they? you could render it statically in an html table

stone marlin Mar 3, 2022, 4:41 PM

#

To get an interactive table with filters?

desert oar Mar 3, 2022, 4:41 PM

#

oh i missed that they wanted it to be interactive

#

this is gonna sound stupid but... have you considered generating an excel workbook?

stone marlin Mar 3, 2022, 4:42 PM

#

Yeah, that's the only reason I'd recommend SL instead of just rollin' their own. https://datatables.net/ is very powerful, but also --- can be frustrating to work with.

#

Haha, that's not a bad idea either. And pandas, also, has a default exporter for excel.

desert oar Mar 3, 2022, 4:42 PM

#

yeah ive generated pretty sophisticated reports that way

stone marlin Mar 3, 2022, 4:42 PM

#

Yeah, it's actually really cool. A lot of people require excel, so it's a pretty nice thing they put in. :']

acoustic crow Mar 3, 2022, 4:44 PM

#

So basically extract the results of the data validation into an excel spreadsheet and just structure it there?

desert oar Mar 3, 2022, 4:44 PM

#

im a bit surprised there arent convenient and light-weight "off the shelf" libraries for sortable/filterable tables

desert oar Mar 3, 2022, 4:44 PM

#

acoustic crow So basically extract the results of the data validation into an excel spreadshee...

yeah pretty much, instead of making a webpage

#

that said, i feel like everyone's first web app is a table

#

so it can't be that hard

#

not that there's anything wrong with streamlit either

acoustic crow Mar 3, 2022, 4:45 PM

#

I am just not good with web dev, not much experience and I have no idea how to make a template and later feed data into that template and so on

desert oar Mar 3, 2022, 4:45 PM

#

i'd go with excel then personally, if that meets the requirements

stone marlin Mar 3, 2022, 4:45 PM

#

If you want to get better at webdev, try that out. Otherwise, there's some other good options here. :']

agile cobalt Mar 3, 2022, 4:45 PM

#

desert oar im a bit surprised there arent convenient and light-weight "off the shelf" libra...

https://dash.plotly.com/datatable/interactivity exists... but even I admit it doesn't really meet these criteria all that well

desert oar Mar 3, 2022, 4:46 PM

#

weirdly i dont even see a "table" widget here https://streamlit.io/components?category=widget

acoustic crow Mar 3, 2022, 4:47 PM

#

I researched dash and I dont think it meets the requirements that are necessary. Streamlit sort of does. Excel seems like a good idea, but so does the web populating. I am down to learn new things but for the sake of graduating I am not sure which is the correct course of action

agile cobalt Mar 3, 2022, 4:48 PM

#

excel is fine tbh

#

you can even use xlsxwriter / openpyxl to format it nicely for reports

sterile rivet Mar 3, 2022, 4:48 PM

#

https://prnt.sc/xE7nXBAy0zyg

Labels consist of 3 items, together them 3 makes around 12k datapoints. The graph yall see above is correct,
1st item has 4.2k points
2nd item has 5.3k points( As yall can see, they added 5.3k on the 4.2k graph) How can I avoid this?
3rd item has 2.5k points which is again added over 1 and 2, what to do to make their bar plots separately?

acoustic crow Mar 3, 2022, 4:48 PM

#

i do appreciate the ideas, I think excel might be a good option as well at this point

#

Thank you for the ideas, people! If anybody has any other input that they would like to share, I'd love to discuss it!

stone marlin Mar 3, 2022, 4:54 PM

#

desert oar weirdly i dont even see a "table" widget here https://streamlit.io/components?ca...

I'm not sure why they call some stuff widgets and some stuff not, but it's done with

st.dataframe(my_dataframe)
st.table(data.iloc[0:10])

#

https://docs.streamlit.io/library/cheatsheet Here's most of the stuff they have.

desert oar Mar 3, 2022, 4:54 PM

#

acoustic crow I researched dash and I dont think it meets the requirements that are necessary....

for the sake of graduating I am not sure which is the correct course of action
"whatever is easiest for you" is the best option here imo

uneven flame Mar 3, 2022, 4:55 PM

#

Hi! If anyone has experience with Federated Learning implementations on custom image dataset, or any experience with TFF or even FLOWER or sth else, we need some advice/help to get started. Please dm me.
https://www.tensorflow.org/federated/federated_learning

desert oar Mar 3, 2022, 4:55 PM

#

sterile rivet https://prnt.sc/xE7nXBAy0zyg Labels consist of 3 items, together them 3 makes a...

is it correct or is it incorrect? who is "they"? can you provide more context for this question?

sterile rivet Mar 3, 2022, 5:01 PM

#

desert oar is it correct or is it incorrect? who is "they"? can you provide more context fo...

So, New York(0), London(1) and Paris(2) has 4723, 5341, 2510 points respectively, and these are together merged in label(which is my x-axis here), together these make around 12k points.
I wanted to plot a bar chart for each label individually, the bar chart for New York is correct(as you could see in the graph) .
London(1) has 5.3k data points and it is supposed to show 5.3k in the graph above, but it is the addition of NewYork + London and addition of all 3 (which is 12k as shown in the graph) in the Paris barplot.
How can I plot them individually?

desert oar Mar 3, 2022, 5:03 PM

#

sterile rivet So, New York(0), London(1) and Paris(2) has 4723, 5341, 2510 points respectively...

post the code where you defined your data

#

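import matplotlib.pyplot as plt
import pandas as pd

# one bar per city, bar height = number of tweets
cities = pd.Series({
    'New York': 4723,
    'London': 5341,
    'Paris': 2510,
})

cities.plot.bar()
plt.show()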

#

should be as easy as that

arctic wedgeBOT Mar 3, 2022, 5:09 PM

#

Hey @sterile rivet!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

sterile rivet Mar 3, 2022, 5:12 PM

#

desert oar post the code where you defined your data

Here

https://github.com/avonis3/Twitter-classification-project/blob/main/Twitter project part 2.ipynb

desert oar Mar 3, 2022, 5:14 PM

#

labels = [0] * len(new_york_text) + [1] * len(london_text) + [2] * len(paris_text)

what did you expect?

#

matplotlib bar just takes counts and positions

#

you're way overthinking this

#

why are you converting these to lists?

sterile rivet Mar 3, 2022, 5:18 PM

#

desert oar why are you converting these to lists?

Bc these are all actual tweets

desert oar Mar 3, 2022, 5:19 PM

#

sterile rivet Bc these are all actual tweets

so?

#

this code makes no sense to me

#

you just need to get the length of each dataframe and put it in a bar chart

#

it looks like you're trying to do a bunch of complicated stuff that you don't need to do

sterile rivet Mar 3, 2022, 5:24 PM

#

desert oar this code makes no sense to me

uh, this is actually a project with some assigned tasks, plotting a bar chart isnt a task but I am still trying to plot one for practicing matplotlib.
3 different datasets are given according to the areas, and I am supposed to make a system which predicts whether a tweet was sent from any of the 3 cities.

desert oar Mar 3, 2022, 5:25 PM

#

even so. i think you are way overthinking this plot here

sterile rivet Mar 3, 2022, 5:25 PM

#

desert oar so?

I was getting a Value Error

desert oar Mar 3, 2022, 5:25 PM

#

what is the simplest possible way this could work?

#

just put the 3 lengths in a list...

#

sizes = [
    len(new_york_tweets),
    len(london_tweets),
    len(paris_tweets),
]
labels = ["NY", "London", "Paris"]
plt.bar(range(len(sizes)), sizes, tick_label=labels)

sterile rivet Mar 3, 2022, 5:26 PM

#

desert oar just put the 3 lengths in a list...

Yep, that's what I did, I converted all the tweets and put it into 1 big list.

desert oar Mar 3, 2022, 5:27 PM

#

but that isn't what i am saying to do

#

i'm saying to just get the length of each group of tweets individually

#

and just plot them

#

look at my code

#

it couldn't get any simpler

#

you're trying to do something much fancier and more complicated than you need to

#

simple is good

sterile rivet Mar 3, 2022, 5:28 PM

#

desert oar look at my code

Yep! I got it now, ty!

neat anvil Mar 3, 2022, 5:37 PM

#

acoustic crow i do appreciate the ideas, I think excel might be a good option as well at this ...

If you want to output stuff to Excel from Python, the xlsxwriter library is incredible. I use it extensively for professional workflows to dynamically generate templated spreadsheets. It's pretty easy. I actually find it easier to make a complex spreadsheet using xlsxwriter than to make it using Excel.

#

https://xlsxwriter.readthedocs.io/

acoustic crow Mar 3, 2022, 5:39 PM

#

neat anvil If you want to output stuff to excel From python, the xlsxwriter library is incr...

I'm not sure exactly how these libraries work, because as I said I am rather new to Python and still figuring my way through it. So I would need to look into that library and what exactly it does

#

But thank you for the tip

#

Do I basically create the format of the excel file through this library or how exactly?

urban lance Mar 3, 2022, 5:39 PM

#

how can I save the result of a df.groupby to a new dataframe?

neat anvil Mar 3, 2022, 5:44 PM

#

acoustic crow Do I basically create the format of the excel file through this library or how e...

It allows you to write excel files from python. Put data or formulas into cells, create filters, lock down certain sheets, graphs , everything. So you can use python to hit your apis or database or whatever, then make a dope excel out of the data you've collected. Once you write the code right once it’s automated and you can just run the python code every week or whatever to generate the dope spreadsheet with no work

#

But if your dataset is easily imported directly into excel, it may be kind of pointless to do anything in python

#

Except as a learning exercise for yourself
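
To make the Excel route concrete, here is a hedged sketch that writes a validation summary with pandas and then uses xlsxwriter for filters and highlighting. The report contents and the file name are made up for illustration, and it assumes the xlsxwriter package is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical validation summary; the column names are placeholders.
report = pd.DataFrame({
    "column": ["price", "quantity", "discount"],
    "status": ["PASS", "FAIL", "PASS"],
    "details": ["", "3 negative values", ""],
})

path = os.path.join(tempfile.mkdtemp(), "validation_report.xlsx")

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    report.to_excel(writer, sheet_name="report", index=False)
    workbook = writer.book
    worksheet = writer.sheets["report"]

    last_row, last_col = report.shape[0], report.shape[1] - 1
    # Filter drop-downs on the header row give basic interactivity.
    worksheet.autofilter(0, 0, last_row, last_col)
    # Highlight failed checks in the "status" column (column index 1).
    red = workbook.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
    worksheet.conditional_format(
        1, 1, last_row, 1,
        {"type": "text", "criteria": "containing", "value": "FAIL", "format": red},
    )
    worksheet.set_column(0, last_col, 22)  # widen the columns for readability
```

Opening the file in Excel should show a filterable table with the failed rows marked in red.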

daring frost Mar 3, 2022, 5:59 PM

#

urban lance how can I save the result of a df.groupby to a new dataframe?

Let's say this is your DataFrame

df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Max Speed': [380., 370., 24., 26.]})

reset_index() can be used to return a DataFrame based on your grouping

df.groupby(["Animal"]).size().reset_index(name="count")

serene scaffold Mar 3, 2022, 6:02 PM

#

urban lance how can I save the result of a df.groupby to a new dataframe?

so df.groupby returns a "grouped dataframe", which is like a bag of dataframes where each dataframe is one group. you have to do some operation on the "bag" to reduce them back to one dataframe
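
A small sketch of the "bag of dataframes" idea, reusing the Animal / Max Speed example from above. The choice of `min`, `max` and `mean` as the reducing operations is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})

grouped = df.groupby("Animal")

# The "bag": iterating yields (group name, sub-DataFrame) pairs.
for name, group in grouped:
    print(name, len(group))

# Reducing the bag with an aggregation gives an ordinary DataFrame again.
summary = grouped["Max Speed"].agg(["min", "max", "mean"]).reset_index()
print(summary)
```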

urban lance Mar 3, 2022, 6:04 PM

#

serene scaffold so `df.groupby` returns a "grouped dataframe", which is like a bag of dataframes...

I just wanted to save the groups as a new dataframe

#

I might have what I need now 🙏

#

the code doesn't look nice, but if it works it works

serene scaffold Mar 3, 2022, 6:04 PM

#

show code

urban lance Mar 3, 2022, 6:04 PM

#

been struggling with this for way too long

urban lance Mar 3, 2022, 6:04 PM

#

serene scaffold show code

I'd rather not, it's embarrassing 😅

serene scaffold Mar 3, 2022, 6:06 PM

#

if you're willing to swallow your pride, I can suggest improvements. up to you.

urban lance Mar 3, 2022, 6:07 PM

#

alright then, I'll get back to you if I'm sure my jank worked

#

gimme 30min

serene scaffold Mar 3, 2022, 6:07 PM

#

I might be doing something else by then. we'll see

urban lance Mar 3, 2022, 6:07 PM

#

No worries, I have time. There is no need for a quick reply

#

@serene scaffold actually here is what I'm doing right now 😅
The writing and reading csv part does exactly what I want. Of course it's not very efficient

df = pd.DataFrame(df).reset_index()
df.to_csv("chunk_processed_csv.csv", index=False, encoding='utf-8-sig')
df = pd.read_csv("chunk_processed_csv.csv")
df = df.iloc[1: , :]

serene scaffold Mar 3, 2022, 6:14 PM

#

what is this supposed to do? you're trying to "get rid of the index"?

#

because you can't--every dataframe always has an index no matter what

#

if you just don't want to look at the index, that is doable.

grand vapor Mar 3, 2022, 6:29 PM

#

Hey everyone, I’m trying to read and store H5 file data in pandas dataframes. I have 8 H5 files each around 3GB. So, it’s a lot of data. I can do this successfully, but it freezes my computer and takes a very long time. I’m wondering, is there a more efficient and less memory-taxing way of doing this? Should I convert from H5 to another format like CSV or Parquet or pickle?

urban lance Mar 3, 2022, 6:38 PM

#

serene scaffold because you can't--every dataframe always has an index no matter what

What no

#

I'll show ya step by step

serene scaffold Mar 3, 2022, 6:41 PM

#

urban lance What no

the dataframe always has an index. there's no way around that. you can just choose to not print it.

urban lance Mar 3, 2022, 6:41 PM

#

we're not talking about any index

serene scaffold Mar 3, 2022, 6:43 PM

#

urban lance @serene scaffold actually here is what I'm doing right now 😅 The writing...

it seems that the whole point of all of this is just to get away from there being an index

urban lance Mar 3, 2022, 6:45 PM

#

I'll show you what I'm doing

serene scaffold Mar 3, 2022, 6:45 PM

#

Sure. I have about five minutes.

urban lance Mar 3, 2022, 6:47 PM

#

my program won't finish in that time

serene scaffold Mar 3, 2022, 6:47 PM

#

Alright, good luck!

urban lance Mar 3, 2022, 6:49 PM

#

it works anyways

#

not very fast, but it works

#

actually

#

This is what I have

#

and this is what I want

#

#

@serene scaffold

#

does that explain what I'm trying to do 🤔

#

I wanna save the groups I made as a new dataframe (and not ungroup them)

serene scaffold Mar 3, 2022, 6:55 PM

#

urban lance I wanna save the groups I made as a new dataframe (and not ungroup them)

you could make the group name a new column, I guess?

#

I can't really tell what your data model is.

urban lance Mar 3, 2022, 6:57 PM

#

once my code has finished running, I'll go through what I'm doing

#

today was the first time I used groupby so I know next to nothing about it

#

this is what the columns look like after grouping
(the data is sensitive so I'm not showing this)

#

then I reset the index with

df = pd.DataFrame(df).reset_index()

#

when I now write the file to a csv, and read it back in
the lambda, min max and sum level somehow becomes part of the data 🤷‍♂️

#

note that the data is still grouped in this state

#

Then I drop the first row in this state,
drop "drived_tstamp" and rename the other 2 to "min" and "max" respectively

#

and then I got the data exactly the way I wanted, still grouped but without the lambda, min max and sum level
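
For reference, the extra "lambda, min max and sum level" is most likely a second header level that pandas creates when several aggregations are applied per column. A sketch of flattening it directly instead of going through a CSV file follows; the data is a made-up stand-in for the sensitive data, and only `drived_tstamp` is a name taken from the messages above.

```python
import pandas as pd

# Hypothetical stand-in data; "session" and "value" are invented column names.
df = pd.DataFrame({
    "session": ["a", "a", "b", "b"],
    "drived_tstamp": [10, 14, 3, 9],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Several aggregations per column produce two header levels:
# (column name, function name).
agg = df.groupby("session").agg({"drived_tstamp": ["min", "max"], "value": ["sum"]})

# Join the two levels into single names, then turn the group key into a column.
agg.columns = ["_".join(parts) for parts in agg.columns]
flat = agg.reset_index()
print(flat)

# Named aggregation gives a flat result in one step.
named = df.groupby("session", as_index=False).agg(
    tstamp_min=("drived_tstamp", "min"),
    tstamp_max=("drived_tstamp", "max"),
    value_sum=("value", "sum"),
)
print(named)
```

Either way the result is a plain DataFrame with one row per group, so the write-to-CSV, read-back and drop-first-row steps should not be needed.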

twin hound Mar 3, 2022, 7:18 PM

#


# Apply standardized scaling to the training and test data, but only fit the training set

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# SVM model with parameters adjusted for maximum optimization
rand_list = {"C": stats.uniform(1, 100),
             "gamma": stats.uniform(0.1, 1)}
svm_model = SVC(max_iter = 5000,kernel='rbf',C=97.366, gamma=0.4834)

# Perform Randomized Search for hyper parameterization

clf = RandomizedSearchCV(svm_model, param_distributions = rand_list, random_state = 0)
search = clf.fit(X_train, y_train)
params = search.best_params_
print(params)
print(search.score)

# Fit model with data and perform prediction

svm_fit = svm_model.fit(X_train, y_train)
prediction = svm_model.predict(X_test) # Model prediction using testing set


# Use the score metric for evaluation of the model accuracy

score_train = svm_model.score(X_train,y_train)
score_test = svm_model.score(X_test,y_test)

print(score_train)
print(score_test)

# Perform k-fold cross validation to optimize the model and reduce bias/variance
# Number of folds

k = 5
kf = StratifiedKFold(n_splits=k, shuffle = False, random_state = None)

# Model Prediction with k-fold cross-validation using testing set

prediction_kf = cross_val_predict(svm_fit,X_test,y_test,cv = k)

# K-fold cross validation on the training/validation set
k_score_train = cross_val_score(svm_fit,X_train,y_train,cv = k)

# K-fold cross validation on the testing set
k_score_test = cross_val_score(svm_fit,X_test,y_test,cv = k)

mean_accuracy_train = np.average(k_score_train)
mean_accuracy_test = np.average(k_score_test)

print(mean_accuracy_train)
print(mean_accuracy_test)

#

someone please help I have bad overfitting. note that my model is highly non-linear

neon heart Mar 3, 2022, 7:34 PM

#

I can not figure out why this group is splitting like this -

Screen_Shot_2022-03-03_at_2.33.20_PM.png

twin hound Mar 3, 2022, 7:35 PM

#

can someone help me 1 on 1? i can explain it easily through voice

misty flint Mar 3, 2022, 7:43 PM

#

have you tried looking at the data columns individually. if i saw this, i would slice only IndustrySubsector just to double check

neon heart Mar 3, 2022, 7:44 PM

#

misty flint have you tried looking at the data columns individually. if i saw this, i would ...

Never mind it looked to be some sort of error in the string field - possibly spacing issue, I fixed it with this -
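
The actual fix is only in the screenshot, so this is a guess at the general pattern: stray whitespace makes otherwise identical labels land in separate groups, and stripping it merges them. The data is invented; `IndustrySubsector` is the column named above.

```python
import pandas as pd

# Hypothetical data with leading/trailing spaces in the label column.
df = pd.DataFrame({
    "IndustrySubsector": ["Banking", "Banking ", " Banking", "Retail"],
    "revenue": [1, 2, 3, 4],
})

# Before cleaning, "Banking" is split into three groups.
print(df.groupby("IndustrySubsector")["revenue"].sum())

# repr() exposes whitespace that normal printing hides.
print([repr(v) for v in df["IndustrySubsector"].unique()])

df["IndustrySubsector"] = df["IndustrySubsector"].str.strip()
totals = df.groupby("IndustrySubsector")["revenue"].sum()
print(totals)
```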

neat anvil Mar 3, 2022, 7:49 PM

#

twin hound ```py # Apply standardized scaling to the training and test data, but only fit ...

you're not actually using search.best_params_ in your "final" svm_model

twin hound Mar 3, 2022, 7:50 PM

#

neat anvil you're not actually using `search.best_params_` in your "final" svm_model

Yea I know i just input the values to do a quick check

#

it didnt change

neat anvil Mar 3, 2022, 7:50 PM

#

svm_fit = svm_model.fit(X_train, y_train, **params)

twin hound Mar 3, 2022, 7:50 PM

#

my problem isnt with the parameters

#

I think the data is just too nonlinear

neat anvil Mar 3, 2022, 7:50 PM

#

okay well can you understand that pasting code in here that's not the code you actually used is not very helpful

twin hound Mar 3, 2022, 7:51 PM

#

that is the code I used

#

the C and gamma values I changed manually

#

in the code I posted. those were the results from the random search

neon heart Mar 3, 2022, 7:51 PM

#

neon heart Never mind it looked to be some sort of error in the string field - possibly spa...

**

Screen_Shot_2022-03-03_at_2.50.36_PM.png

neat anvil Mar 3, 2022, 7:52 PM

#

twin hound in the code I posted. those were the results from the random search

ah. That's confusing.

twin hound Mar 3, 2022, 7:53 PM

#

anyway I tried your fix its giving an error

#

neat anvil Mar 3, 2022, 7:56 PM

#

O woops

#

yeah it'd be svm_fit = SVC(max_iter = 5000,kernel='rbf',**params).fit(X_train, y_train)

twin hound Mar 3, 2022, 7:59 PM

#

ok yea it worked now

#

but it still has bad overfitting

#

not sure why test set has terrible score. Could I just do private chat with u, im sure u could help easily if u understood the data

neat anvil Mar 3, 2022, 8:05 PM

#

@twin hound there's parameters you're not tuning in the hyperparameter search- the kernel and max_iter. Try adding those to the search.

twin hound Mar 3, 2022, 8:09 PM

#

how would I add kernel

#

ive tried all the kernels rbf is the best

cinder thicket Mar 3, 2022, 8:13 PM

#

(from #python-discussion )hello, new here
is there any way to do Shape From shading in python, if so, how do i do it?
i want to make DEMs for many of the solar system's moons with the image data available

neat anvil Mar 3, 2022, 8:14 PM

#

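rand_list = {
    "C": stats.uniform(1, 100),
    "gamma": stats.uniform(0.1, 1),
    "max_iter": stats.randint(1, 5000),  # max_iter must be an integer, so sample with randint
    "kernel": ["rbf", "opt2", "opt3"],  # "opt2"/"opt3" are placeholders for other kernel names, e.g. "poly" or "sigmoid"
}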

@twin hound

cinder thicket Mar 3, 2022, 8:14 PM

#

cinder thicket (from #python-discussion )hello, new here is there any way to do Shape From s...

like turn this image here into a usable height map/DEM that can be used in space programs, 3d modeling, etc

neat anvil Mar 3, 2022, 8:15 PM

#

twin hound ive tried all the kernels rbf is the best

There's no way to judge that rbf is best in isolation from changes in the other hyperparameters. Maybe when max_iter is 5000 rbf is best, but if max_iter is a different value, some other kernel may be best.

twin hound Mar 3, 2022, 8:18 PM

#

ok well anyway to summarize my issue, even after applying all of this tuning, the score on the training set is really high (0.96-0.997) but the testing set doesnt change (0.5-0.7) and when I apply kfold cross validation the training set ranges from 0.6-0.7 and the test set ranges from 0.4-0.5

#

my issue is it seems the model only works well with the training set

#

for 2 days Ive played around with the parameters. literally have changed everything tried many combinations, grid search, rand search, etc. my main issue is just what I said above. Wondering if you know why this occurs usually

tacit basin Mar 3, 2022, 8:24 PM

#

Rather low number. What algorithm is that?

neat anvil Mar 3, 2022, 8:26 PM

#

twin hound for 2 days Ive played around with the parameters. literally have changed everyth...

if you're not optimizing all the parameters in a properly set-up cross-validated search, that's my #1 guess as to why you're getting unexpectedly bad performance on the test set. #2 guess is the data in the test set is just too different from the data in the train set.

twin hound Mar 3, 2022, 8:27 PM

#

how do I set up a good cross validated search?

#

I'm not sure it's #2 because I performed the same analysis using train_test_split to just verify if the test data was bad. but train_test_split gave same results

#

like given the code, what would u do

#

with your experience

tacit basin Mar 3, 2022, 8:30 PM

#

twin hound I'm not sure it's #2 because I performed the same analysis using train_test_spli...

What is your data? Does it have time factor that you can't randomly select train test split

twin hound Mar 3, 2022, 8:30 PM

#

heres a sample of the training and test

#

with the model on it

twin hound Mar 3, 2022, 8:31 PM

#

tacit basin What is your data? Does it have time factor that you can't randomly select train...

its already provided training data, and already provided testing data from excel file. 8 inputs and 1 output with [0,1,2,3,4] classifiers

neat anvil Mar 3, 2022, 8:31 PM

#

twin hound how do I set up a good cross validated search?

that's what I'm trying to tell you to do. Use all the parameters with reasonable search ranges in the RandomizedSearchCV. Make sure the number of cross-validation folds makes sense
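
One way this could be set up, as a sketch rather than a drop-in fix. It runs on synthetic data from `make_classification` (8 inputs, 5 classes), not the real concrete dataset, and the search ranges are assumptions. The points it tries to show: scaling sits inside the pipeline, the kernel is searched together with `C` and `gamma`, and the test set is only touched once at the end through `best_estimator_`.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 8 inputs, 5 classes.
X, y = make_classification(
    n_samples=900, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=150, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so every CV fold is scaled
# using only its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Assumed search ranges; log-uniform sampling covers several orders of magnitude.
param_distributions = {
    "svc__C": stats.loguniform(1e-2, 1e2),
    "svc__gamma": stats.loguniform(1e-3, 1e1),
    "svc__kernel": ["rbf", "linear"],
}

search = RandomizedSearchCV(
    pipe,
    param_distributions=param_distributions,
    n_iter=30,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
print("mean CV accuracy:", search.best_score_)
# best_estimator_ is refit on the whole training set with the best parameters.
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```

If `best_score_` and the test accuracy are close, the search itself is probably fine and the remaining gap comes from the data or the model family.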

twin hound Mar 3, 2022, 8:32 PM

#

could it be because the training set is small?

#

its only [750,8]

#

test set is [150,8]

neat anvil Mar 3, 2022, 8:32 PM

#

you're not properly doing the cross-validated hyperparameter search, so your model is overfit to the Training data.

tacit basin Mar 3, 2022, 8:32 PM

#

twin hound its already provided training data, and already provided testing data from excel...

I missed it. Anyway does it have time related factor?

twin hound Mar 3, 2022, 8:33 PM

#

tacit basin I missed it. Anyway does it have time related factor?

no

twin hound Mar 3, 2022, 8:33 PM

#

neat anvil you're not properly doing the cross-validated hyperparameter search, so your mod...

I dont understand how I am not doing it correctly

twin hound Mar 3, 2022, 8:33 PM

#

neat anvil you're not properly doing the cross-validated hyperparameter search, so your mod...

am I supposed to literally just randomly try every single parameter and hope for the best. I feel that's inefficient

tacit basin Mar 3, 2022, 8:33 PM

#

twin hound no

Great. Can you send link to your data again?

twin hound Mar 3, 2022, 8:33 PM

#

ok sure let me send my data

#

how do I send excel data here?

neat anvil Mar 3, 2022, 8:34 PM

#

neat anvil ```py rand_list = { "C": stats.uniform(1, 100), "gamma": stats.uniform(0...

you need to run the hyperparameter search over several parameters at once, not one at a time. Like this

tacit basin Mar 3, 2022, 8:34 PM

#

twin hound how do I send excel data here?

Don't know. Can be just couple of rows

neat anvil Mar 3, 2022, 8:35 PM

#

twin hound am I supposed to literally just randomly try every single parameter and hope for...

this is an astute observation. That's why more recent versions of scikit-learn (0.24 and later) added search functions that narrow down the candidates as they go, giving more resources to the ones that look promising: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.HalvingRandomSearchCV.html#sklearn.model_selection.HalvingRandomSearchCV
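
A short sketch of how that class can be used. It is still marked experimental in scikit-learn, so the explicit `enable_halving_search_cv` import is required. The data here is synthetic and the search ranges are assumptions.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the next import)
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 750 samples, 8 inputs, 5 classes.
X, y = make_classification(
    n_samples=750, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# make_pipeline names the steps after their classes: "standardscaler" and "svc".
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_distributions = {
    "svc__C": stats.loguniform(1e-2, 1e2),
    "svc__gamma": stats.loguniform(1e-3, 1e1),
}

# Each round keeps roughly the best 1/factor of the candidates
# and evaluates them again on more samples.
search = HalvingRandomSearchCV(
    pipe, param_distributions, factor=3, cv=5, random_state=0
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```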

tacit basin Mar 3, 2022, 8:35 PM

#

twin hound am I supposed to literally just randomly try every single parameter and hope for...

Random search has its benefits over grid search. But you can try both

neat anvil Mar 3, 2022, 8:35 PM

#

you can also try the hyperopt library, which uses different learning algorithms than sklearn for hyperparameter optimization

#

But random search is quite powerful. Your model and data are both small, so randomly searching a few hundred options and picking the best is likely to find a very good solution in a relatively short amount of computation time.

twin hound Mar 3, 2022, 8:37 PM

#

so do I just use random search with every single parameter?

#

theres like 30

neat anvil Mar 3, 2022, 8:39 PM

#

probably not all 30. that's where expertise comes in. You need to understand how SVMs work, how the training algorithm works, and how your data is interacting with those things. Then you can pick reasonable choices for which parameters to search and reasonable ranges for them.

#

just 2 parameters is clearly not enough, since you are reporting such a large difference b/w train and test performance

tacit basin Mar 3, 2022, 8:40 PM

#

twin hound theres like 30

If your data is small training probably doesn't take long. So you search a lot :). You can use some auto ml library as well.

twin hound Mar 3, 2022, 8:41 PM

#

well I know for svm regularization, kernel and gamma have high impact

#

what if I change my input features

#

instead of just putting the x data as is, make a relationship between them to reduce dimensions

neat anvil Mar 3, 2022, 8:43 PM

#

that would have a large impact on your model, yes

#

maybe good, maybe bad

twin hound Mar 3, 2022, 8:43 PM

#

#

this is the data fyi

#

so for example reduce dimensions by going water/cement and coarse/fine aggregate. than
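
A sketch of what those ratio features might look like in pandas. The column names and numbers are hypothetical (the real spreadsheet headers may differ), and whether this helps the model is an open question.

```python
import pandas as pd

# Hypothetical column names and made-up values for three mixes.
mix = pd.DataFrame({
    "cement": [540.0, 332.5, 198.6],
    "water": [162.0, 228.0, 192.0],
    "coarse_aggregate": [1040.0, 932.0, 978.4],
    "fine_aggregate": [676.0, 594.0, 825.5],
})

# Two ratio features replace four raw inputs.
features = pd.DataFrame({
    "water_cement_ratio": mix["water"] / mix["cement"],
    "coarse_fine_ratio": mix["coarse_aggregate"] / mix["fine_aggregate"],
})
print(features.round(3))
```

Ratios like these only make sense for inputs that are never 0; columns such as fly ash, which can be 0, would need different handling.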

#

i feel my input data is really bad and has bad bias

tacit basin Mar 3, 2022, 8:46 PM

#

twin hound

What is age?

twin hound Mar 3, 2022, 8:47 PM

#

its the setting time of the concrete

#

different concretes have different setting times because it highly affects compressive strength

#

its classified as concrete age

tacit basin Mar 3, 2022, 8:49 PM

#

twin hound different concretes have different setting times because it highly affects compr...

Do you scale the data before fitting the model?

twin hound Mar 3, 2022, 8:50 PM

#

yea its all scaled

tacit basin Mar 3, 2022, 8:50 PM

#

What are zeros in some rows?

twin hound Mar 3, 2022, 8:50 PM

#

#data-science-and-ml message

tacit basin Mar 3, 2022, 8:50 PM

#

twin hound yea its all scaled

In the excel is not scaled right?

twin hound Mar 3, 2022, 8:50 PM

#

the excel is not

#

0 just means it has no value for that certain input

#

like for example no fly ash for the third sample

tacit basin Mar 3, 2022, 8:53 PM

#

twin hound 0 just means it has no value for that certain input

Is that missing value?

twin hound Mar 3, 2022, 8:53 PM

#

no it means there is no "amount" of that parameter in the concrete mix

#

essentially what you see is 8 inputs which are different materials for concrete mix and 1 output which is the strength class

#

some concretes have no fly ash, plasticizer, etc. so it has a value of 0

tacit basin Mar 3, 2022, 8:56 PM

#

twin hound some concretes have no fly ash, plasticizer, etc. so it has a value of 0

I see.
Regarding the y class. Is it balanced?

#

Or do we have more samples with certain class?

twin hound Mar 3, 2022, 8:57 PM

#

tacit basin Mar 3, 2022, 8:57 PM

#

Not bad

#

Ok. Why SVC? :)

twin hound Mar 3, 2022, 8:57 PM

#

heres also an example of what the data looks like if we plot the first and 2nd inputs against each other. most of them look random like this

#

#

the colors are just the different classes

#

I have to use ANN and SVC

#

its for a project thats why : (

#

I am getting the same issue with ANN if you are wondering

#

MLP to be specific

tacit basin Mar 3, 2022, 9:01 PM

#

Ah
Ok. All seems fine what you showed me.
I hate machine learning 🤪

twin hound Mar 3, 2022, 9:04 PM

#

I know man... thats why im here 😭

tacit basin Mar 3, 2022, 9:08 PM

#

So this seems like overfitting. What can cause overfitting in SVC?

iron basalt Mar 3, 2022, 9:17 PM

#

Having your product revolve around one big model is a fundamental strategy for "unicorn" companies. The idea being to find some niche which has yet to be automated (low hanging pre-computerization fruit) and then automate it with a website + maybe an ML model. They often call themselves "tech" companies (using tech does not make the company a tech company) and mostly pop up on the west coast of the US.

#

The goal is then to hype it up to infinity and sell when it's highly valued to a bigger "tech" company or a bank. And it works for some, and when it does it's very profitable so they keep trying.

serene scaffold Mar 3, 2022, 9:21 PM

#

Sorry, but we don't allow recruitment in this server.

wispy remnant Mar 3, 2022, 9:21 PM

#

ah apologies

#

I honestly do not know where to turn

serene scaffold Mar 3, 2022, 9:21 PM

#

I'm not really sure either. There's a Python job board on python.org.

wispy remnant Mar 3, 2022, 9:22 PM

#

as nobody will help me, and its vital I have someone who knows what they are doing to take on this task.

#

thanks

#

this channel would pertain to phyphox correct?

#

spectrum analysis etc.

serene scaffold Mar 3, 2022, 9:23 PM

#

idk what phyplox is, but this is the channel to discuss scientific computing in Python

wispy remnant Mar 3, 2022, 9:25 PM

#

it handles sensors in mobile devices, ranging across all types of the likes. mostly, I am looking for someone who knows a little bit about auditory and frequency spectrum analysis

#

here is an example

#

cinder thicket Mar 3, 2022, 9:27 PM

#

cinder thicket (from <#267624335836053506> )hello, new here is there any way to do Shape From s...

anyone see this?

twin hound Mar 3, 2022, 9:43 PM

#

tacit basin So this seems like overfitting. What can cause overfitting in SVC?

#

anyone know how to make stats.uniform select integers only?

fiery dust Mar 3, 2022, 9:47 PM

#

any online course to learn data science and data analysis?

#

thats good?

#

one or more

twin hound Mar 3, 2022, 10:00 PM

#

@neat anvil How can I plot all of my predictions vs. actual (for example x1 with x_test1 vs. y for all inputs)

#

ty

neat anvil Mar 3, 2022, 10:01 PM

#

!d scipy.stats.randint

arctic wedgeBOT Mar 3, 2022, 10:01 PM

#

scipy.stats.randint


scipy.stats.randint = <scipy.stats._discrete_distns.randint_gen object>
A uniform discrete random variable.

As an instance of the [`rv_discrete`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete "scipy.stats.rv_discrete") class, [`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.

Notes

The probability mass function for [`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") is:

\[f(k) = \frac{1}{\texttt{high} - \texttt{low}}\] for \(k \in \{\texttt{low}, \dots, \texttt{high} - 1\}\).

[`randint`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.randint.html#scipy.stats.randint "scipy.stats.randint") takes \(\texttt{low}\) and \(\texttt{high}\) as shape parameters...
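A small sketch of the difference, in case it helps: `stats.uniform` draws floats, while `stats.randint` draws integers with an exclusive upper bound. The ranges below are arbitrary examples, not recommended search ranges.

```python
from scipy import stats

# randint(low, high) draws integers from low .. high - 1 (high is exclusive).
int_dist = stats.randint(1, 11)        # integers 1..10
# uniform(loc, scale) draws floats from [loc, loc + scale].
float_dist = stats.uniform(0.1, 9.9)   # floats in [0.1, 10.0]

ints = int_dist.rvs(size=5, random_state=0)
floats = float_dist.rvs(size=5, random_state=0)
print(ints, floats)
```

Either object can be passed as a value in the `param_distributions` dict of a randomized search, since both expose `rvs`.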

neat anvil Mar 3, 2022, 10:01 PM

#

this bot

#

what is it doin

twin hound Mar 3, 2022, 10:05 PM

#

also how do I just ignore all warnings

#

neat anvil Mar 3, 2022, 10:07 PM

#

!d warnings.catch_warnings

arctic wedgeBOT Mar 3, 2022, 10:07 PM

#

warnings.catch\_warnings


class warnings.catch_warnings(*, record=False, module=None)
A context manager that copies and, upon exit, restores the warnings filter and the [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning") function. If the *record* argument is [`False`](https://docs.python.org/3/library/constants.html#False "False") (the default) the context manager returns [`None`](https://docs.python.org/3/library/constants.html#None "None") on entry. If *record* is [`True`](https://docs.python.org/3/library/constants.html#True "True"), a list is returned that is progressively populated with objects as seen by a custom [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning") function (which also suppresses output to `sys.stdout`). Each object in the list has attributes with the same names as the arguments to [`showwarning()`](https://docs.python.org/3/library/warnings.html#warnings.showwarning "warnings.showwarning").

The *module* argument takes a module that will be used instead of the module returned when you import [`warnings`](https://docs.python.org/3/library/warnings.html#module-warnings "warnings: Issue warning messages and control their disposition.") whose filter will be protected. This argument exists primarily for testing the [`warnings`](https://docs.python.org/3/library/warnings.html#module-warnings "warnings: Issue warning messages and control their disposition.") module itself.

twin hound Mar 3, 2022, 10:09 PM

#

how do I apply it in code

#

warnings.catch_warnings(*, record=False, module=None)?

neat anvil Mar 3, 2022, 10:09 PM

#

follow the documentation link

#

it shows examples
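For reference, the usual pattern looks roughly like this. `noisy()` is a made-up stand-in for whatever call is emitting the warnings.

```python
import warnings

def noisy():
    # Stand-in for a library call that emits a warning.
    warnings.warn("this estimator did not converge", UserWarning)
    return 42

# Warnings raised inside the block are suppressed; the previous
# filter settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

print(result)
```

Scoping the filter to a `with` block keeps warnings visible everywhere else, which is usually safer than silencing them globally.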

twin hound Mar 3, 2022, 10:10 PM

#

ok got it thanks

#

how do I make the randomized search select the best parameters based on the score?

#

because everytime I run it it keeps changing

#

@neat anvil

strange zealot Mar 3, 2022, 10:23 PM

#

i have this data set i want to check what are the survival chances of people with same tickets

#

could someone help

neat anvil Mar 3, 2022, 10:23 PM

#

It is selecting the best parameters based on cross-validation score

#

if it's changing every time that could mean a couple of things: your search space has many roughly equivalent optima (if the CV scores of many of the random models are around a similar reasonable value) OR you've selected the validation splits in a way that makes it difficult to get a reliable score (if the CV scores of many of the random models are near 100%) OR the training data is so messy there is no way to achieve a good model with this type of model (if the CV scores of many of the random models are low) OR your scoring metric is ill-defined OR the training data is so messy it's not much better than training on random noise, so you just get random parameters out (they're different each time you run it b/c it randomizes how it splits the data and the params)

#

those (if whatever) conditions are kind of hand-wavey, not for certain

#

but those are some signals and possible explanations
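To separate "the search is random" from "the data is noisy", one option is to pin both sources of randomness. This is only a sketch on synthetic data; the parameter ranges and `n_iter` are arbitrary assumptions, not tuned values.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 8 features, 3 classes (not the concrete dataset).
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

def run_search(seed):
    # Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
    pipe = make_pipeline(StandardScaler(), SVC())
    search = RandomizedSearchCV(
        pipe,
        param_distributions={
            "svc__C": loguniform(1e-2, 1e2),
            "svc__gamma": loguniform(1e-3, 1e1),
            "svc__kernel": ["rbf", "linear"],
        },
        n_iter=10,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        random_state=seed,   # fixes which parameter combinations are sampled
    )
    return search.fit(X, y)

first, second = run_search(0), run_search(0)
print(first.best_params_, round(first.best_score_, 3))
```

With the same seed the two runs should pick the same parameters. The per-candidate CV scores are stored in `first.cv_results_["mean_test_score"]`, which is where to look for the signals described above.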

twin hound Mar 3, 2022, 10:29 PM

#

damn ok I see

#

the CV scores are calculated in the background correct?

neat anvil Mar 3, 2022, 10:30 PM

#

I'd recommend trying a much, much simpler model. Like just a basic logistic regression.

#

If you can't fit it with decent accuracy on data that simple

twin hound Mar 3, 2022, 10:30 PM

#

I would but the problem is this is for a project where SVM and MLP needs to be used

neat anvil Mar 3, 2022, 10:30 PM

#

more complex models aren't going to do much better.

#

well, it can give you a baseline expectation of what is reasonable

twin hound Mar 3, 2022, 10:32 PM

#

its all good I appreciate your help. Im meeting with my prof today to help my sorry ass

neat anvil Mar 3, 2022, 10:32 PM

#

always a good idea

twin hound Mar 3, 2022, 10:32 PM

#

yea thanks man

haughty ibex Mar 4, 2022, 2:27 AM

#

search = []
for values in df['data']:
    search.append(re.search(r'\d{7}[N]\d{7}[E]', values).group(0).rstrip())
print(search)

Hello everybody i have this regex. I'm trying to search through one of the columns in my dataframe and return the string not the match object. i know i need to use group to achieve this however on some occasions throughout my dataframe re.search will return none. and group() will crash saying 'NoneType' object has no attribute 'group' i saw somewhere that group(0) should get rid of the nones but it didn't work. I know i can fix this with a try: except: block but im trying to find a different solution.
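Without pandas, one way to avoid the try/except is to check the match object before calling `group`. A minimal sketch with made-up sample strings:

```python
import re

PATTERN = re.compile(r"\d{7}N\d{7}E")

def first_match(text):
    """Return the matched substring, or None when the pattern is absent."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

# Made-up sample strings for illustration.
rows = ["grid 1234567N7654321E ok", "no coordinates here"]
print([first_match(r) for r in rows])
```

`re.search` returns `None` when nothing matches, so the conditional is what keeps `.group(0)` from raising.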

serene scaffold Mar 4, 2022, 2:37 AM

#

@haughty ibex did you try Series.str.find?

#

!docs pandas.Series.str.find

#

oh that's the wrong one. must be extract

#

!docs pandas.Series.str.extract

arctic wedgeBOT Mar 4, 2022, 2:38 AM

#

pandas.Series.str.extract


Series.str.extract(pat, flags=0, expand=True)
Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

serene scaffold Mar 4, 2022, 2:39 AM

#

you'll also have to put parentheses around the part of the pattern you want to keep. which I guess will be all of it.

#

try to figure it out, and if you can't, I will show you the solution @haughty ibex

haughty ibex Mar 4, 2022, 2:51 AM

#

df['report text'].str.extract(r'\d{7}[N]\d{7}[E]')

#

getting ValueError: pattern contains no capture groups

serene scaffold Mar 4, 2022, 2:55 AM

#

@haughty ibex it extracts a capture group, so you have to put the whole thing in parentheses, if you want that

#

though it looks like there are two parts to this pattern, \d{7}[N] and \d{7}[E]

#

so you could get that information in two columns automatically, if you wanted.

#

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
   letter digit
0      a     1
1      b     2
2    NaN   NaN

haughty ibex Mar 4, 2022, 2:57 AM

#

oh ok i think i got it to work but now im getting Length of values (0) does not match length of index (11)

#

in my test csv file i have 3 rows that would contain no matches for my re to test it out.

serene scaffold Mar 4, 2022, 2:57 AM

#

please show what you changed the code to and the whole error message starting from Traceback.

haughty ibex Mar 4, 2022, 3:02 AM

#

ok sorry the traceback was something i forgot to comment out while testing out the changes

#

df['pattern match'] = df['data'].str.extract(r'(\d{7}[N]\d{7}[E])')

serene scaffold Mar 4, 2022, 3:02 AM

#

yay

haughty ibex Mar 4, 2022, 3:02 AM

#

is there a flag to not get NaN values and just have and empty cell

serene scaffold Mar 4, 2022, 3:03 AM

#

No, because NaN is an empty cell, basically

#

They're also the best way to represent missing data.
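If an empty string is really wanted for display or export, `fillna('')` after the extract does it. A small sketch with made-up rows; the second column name is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"data": ["id 1234567N7654321E", "nothing to see"]})  # made-up rows

# expand=False returns a Series when the pattern has one capture group.
match = df["data"].str.extract(r"(\d{7}N\d{7}E)", expand=False)
df["pattern match"] = match                        # NaN where nothing matched
df["pattern match (display)"] = match.fillna("")   # empty string for display/export
print(df)
```

Keeping the NaN version around is handy because `isna()` and `dropna()` still work on it.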

#

@haughty ibex make sense?

haughty ibex Mar 4, 2022, 3:11 AM

#

@serene scaffold yes. i appreciate the help. can i use multiple regex patterns.
i have two other regex patterns that im using to find some data in csv files

regex1 = r'\d{1,3}[thrd]([a-zA-Z]+( [a-zA-Z]+)+)[e]\s' regex2 = r'\d{1,3}([a-zA-Z]+( [a-zA-Z]+)+)\d\s+([a-zA-Z])+\b'

could i do something like:
df['pattern match'] = df['data'].str.extract(regex1,regex2)

im guessing its not that simple lol

#

regex1 = r'\d{1,3}[thrd]([a-zA-Z]+( [a-zA-Z]+)+)[e]\s' regex2 = r'\d{1,3}([a-zA-Z]+( [a-zA-Z]+)+)\d\s+([a-zA-Z])+\b' regex_list = [regex1, regex2] regex_search = [] for x in df['data']: for regex in regex_list: try: regex_search.append(re.search(regex, x).group().rstrip()) except: pass

i am currently doing this and it seems to be working just looking for a more optimized solution.

misty flint Mar 4, 2022, 3:19 AM

#

dang stelercus seems to know pandas inside out huh?

#

PikaThink

#

im impressed

serene scaffold Mar 4, 2022, 3:27 AM

#

haughty ibex `regex1 = r'\d{1,3}[thrd]([a-zA-Z]+( [a-zA-Z]+)+)[e]\s' regex2 = r'\d{1,3}([a-zA...

banish this for loop from your life

#

you can do df['data'].str.extract more than once and make more than one column, yes.

haughty ibex Mar 4, 2022, 3:28 AM

#

see i know it was trash

#

thats why i came here and asked lol

low jay Mar 4, 2022, 3:28 AM

#

Hi, how would you plot a 3D linear regression model from a dataframe?

serene scaffold Mar 4, 2022, 3:28 AM

#

@misty flint I actually don't know how I'd do that off the top of my head ^

haughty ibex Mar 4, 2022, 3:29 AM

#

@serene scaffold i want the matches to be in the same column so thats why i did the double for loop.

serene scaffold Mar 4, 2022, 3:29 AM

#

low jay Hi, how would you plot a 3D linear regression model from a dataframe?

how does one plot 3d data in general? I only did that once for a homework assignment two years ago.

#

and then forgot

low jay Mar 4, 2022, 3:30 AM

#

@serene scaffold I've been trying to look for solutions for it online but I genuinely don't understand it. Thank you tho.

serene scaffold Mar 4, 2022, 3:31 AM

#

low jay <@!253696366952316929> I've been trying to look for solutions for it online but ...

is the dataframe that you currently have multi-indexed or what?

serene scaffold Mar 4, 2022, 3:32 AM

#

haughty ibex <@!253696366952316929> i want the matches to be in the same column so thats why ...

why do you want that? what do the matches even represent?

low jay Mar 4, 2022, 3:33 AM

#

@serene scaffold This is what it looks like

serene scaffold Mar 4, 2022, 3:34 AM

#

df['pattern_match'] = ''
for pattern in [regex1, regex2]:
    # wrap each pattern in an outer group so column 0 holds the whole match
    df['pattern_match'] += df['data'].str.extract(f'({pattern})')[0].fillna('')

#

you could do this, I guess @haughty ibex

serene scaffold Mar 4, 2022, 3:34 AM

#

low jay <@!253696366952316929> This is what it looks like

why are some of them NaNs?

low jay Mar 4, 2022, 3:36 AM

#

@serene scaffold Oh the dataframe is from a practical

serene scaffold Mar 4, 2022, 3:36 AM

#

low jay <@!253696366952316929> Oh the dataframe is from a practical

is practical a thing? anyway, do you want to ignore the rows with NaNs?

#

and what three columns are going to be the axes on the plot?

low jay Mar 4, 2022, 3:37 AM

#

@serene scaffold 1) Yes

#

The dependent variable will be the flipper length, bill length and depth will be the other 2 variables

misty flint Mar 4, 2022, 3:39 AM

#

serene scaffold <@!446424248479645706> I actually don't know how I'd do that off the top of my h...

tbh i wouldnt either lol

#

oh wait

#

i think i did it before in MATLAB

serene scaffold Mar 4, 2022, 3:39 AM

#

import matplotlib.pyplot as plt

# drop rows with NaNs, then split the three columns into separate arrays
x, y, z = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']].dropna().to_numpy().T

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)

plt.show()
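The scatter only shows the points. For the regression part, one option is a least-squares plane fitted with NumPy. A sketch on synthetic numbers (not the penguin data), assuming a model of the form z = a*x + b*y + c:

```python
import numpy as np

def fit_plane(x, y, z):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    design = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs

# Synthetic points lying exactly on z = 2x - 3y + 5, only to check the fit.
rng = np.random.default_rng(0)
x = rng.uniform(30, 60, 50)
y = rng.uniform(13, 22, 50)
z = 2 * x - 3 * y + 5
a, b, c = fit_plane(x, y, z)

# Grid over the data range on which the fitted plane can be drawn.
gx, gy = np.meshgrid(np.linspace(x.min(), x.max(), 10),
                     np.linspace(y.min(), y.max(), 10))
gz = a * gx + b * gy + c
```

On a 3D axes, `ax.plot_surface(gx, gy, gz, alpha=0.3)` would then draw the plane over the scatter.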

misty flint Mar 4, 2022, 3:39 AM

#

but thats MATLAB monkaCHRIST

serene scaffold Mar 4, 2022, 3:39 AM

#

try something like this

#

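from functools import reduce
from operator import add

# concatenate the whole-match strings from each pattern, row by row
reduce(
    add,
    (df['data'].str.extract(f'({pattern})')[0].fillna('') for pattern in (regex1, regex2))
)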

I made it slightly more lispy

steel geyser Mar 4, 2022, 4:47 AM

#

Looking for a library that can analyze a game that is being played on my stream. I’m a twitch streamer and want to be able to track say the number of kills I get in a particular game. What libraries would I look into to do that?

serene scaffold Mar 4, 2022, 4:48 AM

#

steel geyser Looking for a library that can analyze a game that is being played on my stream....

what's the simplest way to know that you got a kill? is there a kill count on screen?

misty flint Mar 4, 2022, 4:51 AM

#

this could end up being a relatively simple problem or something much more involved lol

serene scaffold Mar 4, 2022, 4:52 AM

#

yes. if the solution isn't something happening to a static UI element, any solution we come up with will probably need so much compute power that you won't be able to play your game.

misty flint Mar 4, 2022, 4:52 AM

#

video data is not something ive worked with personally but like you mentioned its def a hassle

serene scaffold Mar 4, 2022, 4:52 AM

#

but at least you have a GPU 😄

misty flint Mar 4, 2022, 4:53 AM

#

the groups ive seen work with it also complain about how much data is generated as well

#

so you def dont want to use all that data, just certain stills/frames if possible

steel geyser Mar 4, 2022, 4:55 AM

#

serene scaffold what's the simplest way to know that you got a kill? is there a kill count on sc...

Ya there is a kill count on screen.

misty flint Mar 4, 2022, 4:56 AM

#

ah perfect

#

the simple solution

#

DoggoKek

serene scaffold Mar 4, 2022, 4:56 AM

#

so, you need something that watches the pixels on that part of the screen, and any time they change, it needs to detect if the change is the number going up.

steel geyser Mar 4, 2022, 4:57 AM

#

serene scaffold so, you need something that watches the pixels on that part of the screen, and a...

Is that going to take up a lot of computing power? In my mind I was thinking of it taking a screen shot and then analyze it, see if it changed from last time then delete the screenshot.

misty flint Mar 4, 2022, 4:57 AM

#

my first instinct is to look into opencv and pytesseract

serene scaffold Mar 4, 2022, 4:58 AM

#

steel geyser Is that going to take up a lot of computing power? In my mind I was thinking of ...

you'd need to constantly be taking screenshots and analyzing them

misty flint Mar 4, 2022, 4:58 AM

#

at least thats off the top of my head

serene scaffold Mar 4, 2022, 4:58 AM

#

anyway, I don't do anything with images except maybe optical character recognition. so I don't even know if there are libraries that watch parts of a screen.

misty flint Mar 4, 2022, 4:59 AM

#

hmm i think ive seen an article about it once

#

there was a twitch streamer that did something similar

steel geyser Mar 4, 2022, 4:59 AM

#

Ok. I’m familiar with OpenCV. Not so much pytesseract.

steel geyser Mar 4, 2022, 5:00 AM

#

misty flint there was a twitch streamer that did something similar

I figured someone somewhere has done it. Not trying to copy someone’s code or work but just wanted to see what libraries they used to do it.

misty flint Mar 4, 2022, 5:02 AM

#

ah i remember now, this was a high schooler on a podcast i listen to

#

https://open.spotify.com/episode/3zAvY6tnNeCT8XNX5Rb1XD

Spotify

How this High Schooler's Data Helped Break Twitch Streaming Records...

Listen to this episode from Ken's Nearest Neighbors on Spotify. Will is a junior in high school, he has been super involved with data science. He is innovating the data collected on twitch and esports. He self taught himself how to code from a young age and is now using what he has learned to create tools for esports and analyze data from twitch.

#

maybe you can find something by googling him

#

pithink

#

i think he ended up getting into a really good school bc of this

#

its been a while, i dont remember

steel geyser Mar 4, 2022, 5:04 AM

#

misty flint maybe you can find something by googling him

Thank you! I definitely will. Just looking at his podcasts I’m probably going to listen to all of his episodes.

misty flint Mar 4, 2022, 5:04 AM

#

lol the host is the data scientist, while the high school kid is the twitch streamer that was a guest on that episode

#

ID_BoomKek

#

but you can still listen

#

he has some interesting guests from all across DS

#

some people work in all sorts of domains and fields

#

i think its most interesting hearing their background/journey

#

one was an olympic medalist before going into DS

#

another one was an ex-cultist

#

💀

#

anyway interesting stories tbh

steel geyser Mar 4, 2022, 5:09 AM

#

Ahh. Ok. I see. Well I appreciate it.

misty flint Mar 4, 2022, 5:10 AM

#

good luck bud. let me know if you end up getting it to work

#

i still think opencv will let you do something

iron basalt Mar 4, 2022, 5:21 AM

#

steel geyser Looking for a library that can analyze a game that is being played on my stream....

OpenCV with Pytesseract will probably just work.

#

PIL's image grab will work for getting the image: https://pillow.readthedocs.io/en/stable/reference/ImageGrab.html

ImageGrab Module

The ImageGrab module can be used to copy the contents of the screen or the clipboard to a PIL image memory.

#

If you generally know where the text is you probably want to only grab that region or it will be slow on larger resolutions.
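A rough sketch of how the pieces could fit together, assuming Pillow and pytesseract are installed and that the counter sits in a known screen rectangle. The `bbox` coordinates and the OCR config are placeholders, and `count_increments` is a hypothetical helper, not part of either library.

```python
import re

def grab_counter_text(bbox):
    """OCR the given screen region (left, top, right, bottom) and return raw text.

    Imports are local so the parsing logic below works without a display.
    """
    from PIL import ImageGrab
    import pytesseract
    image = ImageGrab.grab(bbox=bbox)          # capture only the counter region
    return pytesseract.image_to_string(image, config="--psm 7")

def count_increments(readings):
    """Sum the increases in the on-screen number across successive OCR readings."""
    kills, last = 0, None
    for text in readings:
        digits = re.search(r"\d+", text)
        if digits is None:                      # unreadable frame: skip it
            continue
        value = int(digits.group())
        if last is not None and value > last:
            kills += value - last
        last = value
    return kills

# Simulated OCR output, including a blank frame and a repeated value.
print(count_increments(["0", "0", "", "1", "1", "3"]))
```

Polling `grab_counter_text` once or twice a second on a small region should be much cheaper than analyzing full frames.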

steel geyser Mar 4, 2022, 5:30 AM

#

iron basalt If you generally know where the text is you probably want to only grab that regi...

Ok. Thank you!

frank quiver Mar 4, 2022, 6:21 AM

#

class AutoEncoder(nn.Module):
  def __init__(self):
    super(AutoEncoder, self).__init__()
    self.encoder = nn.Sequential(
            nn.Conv2d(55, 16, 3, stride=1, padding=1),  # b, 16, 10, 10
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1),  # b, 16, 5, 5
            nn.Conv2d(16, 8, 3, stride=1, padding=1),  # b, 8, 3, 3
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1)  # b, 8, 2, 2
        )
    self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=1),  # b, 16, 5, 5
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 5, stride=1, padding=1),  # b, 8, 15, 15
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 55, 2, stride=1, padding=1),  # b, 1, 28, 28
            nn.Tanh()
        )
  def forward(self, x):
    x = self.encoder(x)
    print(x.shape)
    x = self.decoder(x)
    print(x.shape)
    return x
I am getting this warning: `/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:47: UserWarning: Using a target size (torch.Size([1, 55, 46, 46])) that is different to the input size (torch.Size([1, 55, 47, 47])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.` My input size is `(1,55,46,46)`, but I don't know why I am getting `[1, 55, 47, 47]`.
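The 47 can be traced by hand with the standard output-size formulas for convolution, pooling and transposed convolution (dilation 1, no output padding). A pure-Python sketch that walks the layers above for a 46x46 input:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output size of Conv2d / MaxPool2d (dilation 1, floor rounding).
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0):
    # Output size of ConvTranspose2d (dilation 1, output_padding 0).
    return (size - 1) * stride - 2 * padding + kernel

size = 46
trace = [size]
for fn, kwargs in [
    (conv_out, dict(kernel=3, padding=1)),            # Conv2d(55, 16, 3, padding=1)
    (conv_out, dict(kernel=2)),                       # MaxPool2d(2, stride=1)
    (conv_out, dict(kernel=3, padding=1)),            # Conv2d(16, 8, 3, padding=1)
    (conv_out, dict(kernel=2)),                       # MaxPool2d(2, stride=1)
    (conv_transpose_out, dict(kernel=3)),             # ConvTranspose2d(8, 16, 3)
    (conv_transpose_out, dict(kernel=5, padding=1)),  # ConvTranspose2d(16, 8, 5, padding=1)
    (conv_transpose_out, dict(kernel=2, padding=1)),  # ConvTranspose2d(8, 55, 2, padding=1)
]:
    size = fn(size, **kwargs)
    trace.append(size)

print(trace)
```

The encoder shrinks 46 to 44 and the decoder grows it to 46, 48, then 47, so the output is one pixel larger than the target. One possible adjustment (among several) is a last layer with `kernel_size=3, padding=2`, which by the same formula maps 48 back to 46.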

rugged hawk Mar 4, 2022, 6:27 AM

#

Is there any way to get only last 3 months data?
The first row is latest month so what I did is: made a list of months
l1=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

month=df['Date'].iloc[0]
curr_month=month[:3]
curr_index=l1.index(curr_month)
prev_month=l1[curr_index-1]
last_second_month=l1[curr_index-2]
month_list=[curr_month,prev_month,last_second_month]

so month_list gives me last 3 months including current, then I tried to find list elements in df column using df[df['Date'].str.contains('|'.join(month_list))]

#

but as you can see in the picture the last rows from df it contains last year Mar data. so it returning that data also. so How can I get the only latest last 3 months data

minor elbow Mar 4, 2022, 6:59 AM

#

u can use slice operator with dates, so assuming u have the date in variable x you can go df.loc[x:,:]

#

date indexs can get a bit intricate with pandas

#

u could resample the index to monthly frequency, take the 3rd last index of that, then make x the "yyyy-mm" string of that with strftime(), then use x with the slice index

#

in other news this is an interesting review from openai re gpt https://openai.com/blog/language-model-safety-and-misuse/

OpenAI

Lessons Learned on Language Model Safety and Misuse

The deployment of powerful AI systems has enriched our understanding of safety and misuse far more than would have been possible through research alone. Notably: API-based language model misuse often comes in different forms than we feared most. We have identified limitations in existing language model evaluations that we are

flint pendant Mar 4, 2022, 7:53 AM

#

How can I use Levenshtein.ratio to compare strings between 2 different columns in a dataframe? I have a dataframe with a few ten million rows and can't figure out how to get it to compute the ratio of the strings in each row of the dataframe.
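One straightforward approach is a list comprehension over the two columns zipped together. In this sketch the column names are hypothetical, and `difflib` is used as a stand-in when the `Levenshtein` package is not installed (its scores are similar in spirit but not identical).

```python
import pandas as pd

try:
    from Levenshtein import ratio          # third-party package, if installed
except ImportError:
    from difflib import SequenceMatcher    # stdlib stand-in with a similar 0..1 score

    def ratio(a, b):
        return SequenceMatcher(None, a, b).ratio()

# Made-up rows; "left" and "right" are hypothetical column names.
df = pd.DataFrame({"left": ["kitten", "abc", "same"],
                   "right": ["sitting", "xyz", "same"]})

# A plain loop over zipped columns avoids the per-row overhead of DataFrame.apply(axis=1).
df["similarity"] = [ratio(a, b) for a, b in zip(df["left"], df["right"])]
print(df)
```

For tens of millions of rows this is still a Python-level loop, so processing the frame in chunks or in parallel may be worth considering.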

rugged hawk Mar 4, 2022, 7:53 AM

#

minor elbow u can use slice operator with dates, so assuming u have the date in variable x y...

I don't have year in data, as you can see the attached picture
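Without a year, one workaround is to rely on row order instead of the month name. This sketch assumes the rows are sorted newest first, that every month has at least one row, and that each date string starts with a three-letter month (the rows below are made up).

```python
import pandas as pd

# Made-up rows, newest first, with no year in the string (as in the question).
df = pd.DataFrame({"Date": ["Mar 03", "Mar 01", "Feb 20", "Jan 15",
                            "Dec 30", "Apr 02", "Mar 28"]})

month = df["Date"].str[:3]
# Number each run of consecutive identical months: 1 for the newest month, 2 for the next, ...
run_id = (month != month.shift()).cumsum()
last_three = df[run_id <= 3]
print(last_three)
```

Last year's "Mar" rows end up in a later run, so they are excluded even though the month name matches.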

pastel valley Mar 4, 2022, 8:13 AM

#

base_model = Sequential()

resnet50_model = tf.keras.applications.ResNet50(include_top=False,
                   input_shape=(144,144,3),
                   pooling='max',classes=6,
                   weights='imagenet')

for layer in resnet50_model.layers:
        layer.trainable=False

base_model.add(resnet50_model)

base_model.add(Flatten())
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dense(512, activation='relu'))
base_model.add(Dense(256, activation='relu'))
base_model.add(Dense(6, activation='softmax'))

#

this snippet shows transfer learning, right?

pastel valley Mar 4, 2022, 8:57 AM

#

but what if i just want to just use the architecture of resnet50 and i want to train it myself?

pastel valley Mar 4, 2022, 9:17 AM

#

non trainable parameters are units that are unchangeable? isnt that bad?

#

how to prevent it?

urban lance Mar 4, 2022, 9:21 AM

#

I wouldn't worry too much, but I'm interested in the answer anyways

pastel valley Mar 4, 2022, 9:25 AM

#

how to diagnose this kind of thing on keras?

#

maybe those non trainables are from the resnet50?

tacit basin Mar 4, 2022, 9:27 AM

#

pastel valley non trainable parameters are units that are unchangable? isnt that bad?

The number of non-trainable weights of the model comes from the BatchNormalization layers, whose mean and variance vectors are updated via layer updates instead of backpropagation and are therefore considered non-trainable parameters.
https://github.com/experiencor/keras-yolo2/issues/167

GitHub

[SOLVED] What are the non-trainable parameters of the model? · Issu...

I found the answer to that question but I am posting it here in case someone is asking themselves the same question as it took me some time to figure it out. We consider in this example the Tiny Yo...

pastel valley Mar 4, 2022, 9:30 AM

#

tacit basin The number of none trainable weights of the model comes from the BatchNormalizat...

oh its normal and its from the bn of resnet, i see. nice nice thank you

#

btw in this code

base_model = Sequential()

resnet50_model = tf.keras.applications.ResNet50(include_top=False,
                   input_shape=(144,144,3),
                   pooling='max',classes=6,
                   weights=None)

base_model.add(resnet50_model)

base_model.add(Flatten())
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dense(512, activation='relu'))
base_model.add(Dense(256, activation='relu'))

base_model.add(Dense(6, activation='softmax'))

base_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

i want to just use the architecture of the resnet50 and train it myself with my own data and classes
i changed the input shape and added dense layer to the end is this it? did i implement what i wanted to do right?

pastel valley Mar 4, 2022, 10:39 AM

#

yo?

#

wew since i am training resnet from scratch there are alot of computations needed right? this will take alot of time

limber kelp Mar 4, 2022, 11:15 AM

#

I want to see if these two columns are related or not like if particular Branch always maps to particular city?

How can I check it?

#

using pandas

#

or any python lib
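One way to check this in pandas is to count distinct cities per branch. The column names `Branch` and `City` and the rows below are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; "Branch" and "City" are assumed column names.
df = pd.DataFrame({"Branch": ["A", "A", "B", "C", "C"],
                   "City": ["Yangon", "Yangon", "Mandalay", "Naypyitaw", "Yangon"]})

cities_per_branch = df.groupby("Branch")["City"].nunique()
always_same_city = bool(cities_per_branch.eq(1).all())
inconsistent = cities_per_branch[cities_per_branch > 1].index.tolist()
print(always_same_city, inconsistent)
```

If `always_same_city` is true, every branch maps to exactly one city; otherwise `inconsistent` lists the branches that appear with more than one.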

desert minnow Mar 4, 2022, 11:20 AM

#

Hello all, Im trying to build a ordinal classification model (basically ranking prediction). Can someone help me out in choosing model? Thanks 😊

tacit basin Mar 4, 2022, 11:28 AM

#

pastel valley wew since i am training resnet from scratch there are alot of computations neede...

644 is the number of minibatches in your dataset. it's the same regardless if you train from scratch or using transfer learning. It depends on your minibatch size and number of training images. training from scratch you probably will need more than 10 epochs.
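The relationship is just a ceiling division. The sample count and batch size below are hypothetical numbers chosen so that they happen to give 644; the real dataset may use different values.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # The last batch may be smaller, hence the ceiling.
    return math.ceil(num_samples / batch_size)

# Hypothetical numbers: 20,600 images with batch size 32 would give 644 steps.
print(steps_per_epoch(20_600, 32))
```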

pastel valley Mar 4, 2022, 11:33 AM

#

tacit basin 644 is the number of minibatches in your dataset. it's the same regardless if yo...

will 100 be a good number for epochs? btw it will take forever if i train this on my laptop so i tried google colab

#

but my dataset is on my local drive, do i need to upload it to colab?

tacit basin Mar 4, 2022, 11:41 AM

#

pastel valley but my dataset is on my local drive, do i need to upload it to colab?

yes you would need to upload data to gdrive

pastel valley Mar 4, 2022, 11:49 AM

#

tacit basin yes you would need to upload data to gdrive

nice nice thank you

#

btw is this natural ?

tacit basin Mar 4, 2022, 11:59 AM

#

pastel valley btw is this natural ?

never trained from scratch. but early epochs that may be right since almost nothing is correct yet.
btw why don't you want to train using transfer learning?

pastel valley Mar 4, 2022, 1:02 PM

#

tacit basin never trained from scratch. but early epochs that may be right since almost noth...

i am doing an experiment on image augmentations. i want to compare whether there will be a performance boost, and to compare them fairly i think using the same architecture and exactly the same initial weights will be good. so i first created my own cnn architecture, but i realized that doing this experiment on my own simple architecture is nonsense because no one will ever use it, so i decided to use a popular or one of the best architectures

#

using the architecture ill create 2 identical models and train them on classifying the same classes but with different data

#

like datasetA is with etc and datasetB with etc like that

#

does it make sense?

pastel valley Mar 4, 2022, 1:07 PM

#

pastel valley btw in this code ```python base_model = Sequential() resnet50_model = tf.keras...

btw how you use pre trained models? did i do it right? but in this i just dont copy the weights learned from the imagenet dataset so in short i just copied the architecture?

upper spindle Mar 4, 2022, 1:49 PM

#

i want to read in a csv file from a directory using this code eth = pd.read_csv("../EC331/combined_posts_comments_final.csv") but it doesnt seem to work

pastel valley Mar 4, 2022, 1:54 PM

#

@tacit basin
btw how about this ?
what does it mean? its learning but maybe it needs more epochs to get better validation?

neat anvil Mar 4, 2022, 2:19 PM

#

honestly @pastel valley these questions about data augmentation, transfer learning, and deep learning model architecture are quite complicated to answer and get to the root of a lot of fundamentals of deep learning. You'd probably be best served taking some courses and building up your fundamentals in math and stats IMO.

#

and I mean sounds like you're curious enough about the topic that you'd probably enjoy the courses

urban lance Mar 4, 2022, 2:44 PM

#

can someone explain to me how this "sum" param works exactly? I'm having some strange results

df.groupby(["user",pd.Grouper(key="timestamp", freq="W")]).agg({
    "col1": "sum"
})
I have a column with true and false values exclusively, I'm trying to count the true values within a certain interval
but some results are negative

#

I really don't understand why it does that

serene scaffold Mar 4, 2022, 2:47 PM

#

urban lance can someone explain to me how this "sum" param works exactly? I'm having some st...

it should just be aggregating col1 with sum, or something like that. if you want additional help, show a reproducible example with df.head(10).to_dict('list'). Screenshots are useless, in this context.

urban lance Mar 4, 2022, 2:53 PM

#

it appears as though groupby tries to save an int16 value in an int8 🤔

civic stone Mar 4, 2022, 2:54 PM

#

Good Afternoon everyone ,

i am trying to use "Word2Vec" package in pycharm
from gensim.models import Word2Vec

but it shows an error Unresolved reference 'Word2Vec'

can anybody support me on this

tacit basin Mar 4, 2022, 2:58 PM

#

pastel valley i am doing experiment on image augmentations i want to compare if there will be...

You can evaluate image augmentations with transfer learning as well. You will see results faster.

urban lance Mar 4, 2022, 2:59 PM

#

@serene scaffold I've found my issue

#

it indeed was bit overflowing
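
A minimal sketch of the overflow described above, assuming the True/False column had been stored as int8. The frame, the values, and the column names are made up for illustration. int8 only holds -128 to 127, so a total that is kept in int8 wraps around to a negative number. Casting to a wider integer before aggregating gives the sum room to grow.

```python
import numpy as np
import pandas as pd

# 200 flags, all set to 1, stored in the smallest integer type.
flags = np.ones(200, dtype=np.int8)

# Forcing the accumulator to stay int8 makes the total wrap past 127.
wrapped_total = flags.sum(dtype=np.int8)
print(wrapped_total)

# Hypothetical frame shaped like the one in the question.
df = pd.DataFrame({
    "user": ["a"] * 200,
    "timestamp": pd.Timestamp("2022-03-01"),
    "col1": flags,
})

# Widen the column first so the weekly sum cannot overflow.
weekly = (
    df.assign(col1=df["col1"].astype("int64"))
      .groupby(["user", pd.Grouper(key="timestamp", freq="W")])
      .agg({"col1": "sum"})
)
safe_total = int(weekly["col1"].iloc[0])
print(safe_total)
```

If memory is the concern, keeping the column as bool or int8 on disk and widening only right before the aggregation is one possible compromise.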

tacit basin Mar 4, 2022, 3:00 PM

#

pastel valley btw how you use pre trained models? did i do it right? but in this i just dont c...

Yes, if you pass None to weights it will initialize 'random' weights. If you specify, say, imagenet, then it will use pretrained weights. You can still train from there with different augmentations.

tacit basin Mar 4, 2022, 3:02 PM

#

pastel valley @tacit basin btw how about this ? what does it mean its learning but m...

Yes, when accuracy on the validation set is improving, it's learning. You can continue training as long as your validation metric keeps improving. If it stops improving or gets worse, then it's overfitting.
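
The "keep training while validation improves" rule can be sketched in plain Python. This is only an illustration: the function name `early_stop`, the `patience` parameter, and the accuracy numbers are assumptions, not something from the conversation.

```python
def early_stop(val_accuracies, patience=3):
    """Return (best_epoch, stop_epoch) for per-epoch validation accuracies."""
    best_acc = float("-inf")
    best_epoch = -1
    stale = 0  # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                # No improvement for `patience` epochs: stop here.
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_accuracies) - 1


history = [0.60, 0.68, 0.72, 0.71, 0.72, 0.70, 0.69]  # made-up numbers
best_epoch, stop_epoch = early_stop(history, patience=3)
print(best_epoch, stop_epoch)
```

Keras ships a similar idea as the `EarlyStopping` callback (with `monitor` and `patience` arguments), so in practice there is usually no need to write this loop by hand.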

serene scaffold Mar 4, 2022, 3:03 PM

#

urban lance it appears as though groupby tries to save an int16 value in an int8 🤔

why are you using such low-bit integers?

tacit basin Mar 4, 2022, 3:04 PM

#

upper spindle i want to read in a csv file from a directory using this code `eth = pd.read_csv...

What error you get?

serene scaffold Mar 4, 2022, 3:04 PM

#

upper spindle i want to read in a csv file from a directory using this code `eth = pd.read_csv...

!traceback

arctic wedgeBOT Mar 4, 2022, 3:04 PM

#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

urban lance Mar 4, 2022, 3:06 PM

#

serene scaffold why are you using such low-bit integers?

Because they read as objects and I turned them into int8 because that's what they were before processing the datasets 🤦‍♂️

#

And also my dataset was HUGE and it caused memory issues

serene scaffold Mar 4, 2022, 3:07 PM

#

ah

upper spindle Mar 4, 2022, 3:12 PM

#

tacit basin What error you get?

im getting this error FileNotFoundError: [Errno 2] No such file or directory: '../EC331/combined_posts_comments_final.csv'

serene scaffold Mar 4, 2022, 3:13 PM

#

upper spindle im getting this error `FileNotFoundError: [Errno 2] No such file or directory: '...

you need to know your current working directory, and then see if there is an EC331 directory in the one above it, and if it has a combined_posts_comments_final.csv file

#

though we already know from the error message that you don't.

upper spindle Mar 4, 2022, 3:17 PM

#

#

there is the directory

#

but it still comes up with the same error

serene scaffold Mar 4, 2022, 3:22 PM

#

upper spindle

this screenshot cuts off the error message. but I'll only look at error messages that are given as text.

#

do you know what the .. at the beginning of the path do? if not, you should probably delete them.

upper spindle Mar 4, 2022, 3:24 PM

#

FileNotFoundError: [Errno 2] No such file or directory: '../EC331/Ethereum Data/combined_posts_comments_final.csv'

#

this is the error sorry

upper spindle Mar 4, 2022, 3:25 PM

#

serene scaffold do you know what the `..` at the beginning of the path do? if not, you should pr...

deleted but the error is still the same as i just sent FileNotFoundError: [Errno 2] No such file or directory: '../EC331/Ethereum Data/combined_posts_comments_final.csv'

serene scaffold Mar 4, 2022, 3:25 PM

#

upper spindle deleted but the error is still the same as i just sent `FileNotFoundError: [Errn...

you have to figure out the working directory that Python is using and give the path relative to that location

upper spindle Mar 4, 2022, 3:28 PM

#

okay, ill give that a try

desert oar Mar 4, 2022, 3:29 PM

#

upper spindle `FileNotFoundError: [Errno 2] No such file or directory: '../EC331/Ethereum Data...

this is jupyterlab?

#

you can figure out what the working directory is by doing this in a python code cell: import os; os.getcwd()

#

.. is always relative to the working directory, not to the current file/script being executed
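
A small sketch of how to check what a relative path actually points at, using only the standard library. The helper name `describe_path` is made up for this example. The path is the one from the question and may well not exist on another machine.

```python
from pathlib import Path
from typing import Tuple


def describe_path(relative: str) -> Tuple[Path, bool]:
    """Return the absolute path Python would open, and whether it exists."""
    # Relative paths (including "..") are resolved against the working directory.
    resolved = (Path.cwd() / relative).resolve()
    return resolved, resolved.exists()


resolved, exists = describe_path("../EC331/combined_posts_comments_final.csv")
print("working directory:", Path.cwd())
print("resolves to:", resolved)
print("exists:", exists)
```

Printing the resolved path next to the working directory usually makes it obvious which folder level is off.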

upper spindle Mar 4, 2022, 3:32 PM

#

it worked thanks @serene scaffold and @desert oar

serene scaffold Mar 4, 2022, 3:32 PM

#

desert oar `..` is always relative to the working directory, _not_ to the current file/scri...

I'm planning to make a tag for this kind of issue, and I should mention this caveat. thanks!

desert oar Mar 4, 2022, 3:32 PM

#

serene scaffold I'm planning to make a tag for this kind of issue, and I should mention this cav...

good idea. this comes up a lot in the context of "scripts" as well as notebooks

somber bough Mar 4, 2022, 3:38 PM

#

So I'm planning to make a program that can identify the original 151 Pokemon when you upload a picture. I got a dataset from Kaggle, and I was going to use Google Teachable Machine to upload it and make the model, but I was wondering if it would be a bad idea to have 151 different things in one TensorFlow model?

serene scaffold Mar 4, 2022, 3:38 PM

#

what you use to create the model (tensorflow, pytorch, etc) doesn't actually matter as far as its potential capability

#

what matters is the training data that you have and the model architecture

#

that said, how many separate images do you have for each of the 151 pokemon?

#

because if you only have one image per pokemon, that's not going to be enough

somber bough Mar 4, 2022, 3:40 PM

#

serene scaffold that said, how many separate images do you have for each of the 151 pokemon?

There are around 50 - 70 each

tacit basin Mar 4, 2022, 3:40 PM

#

upper spindle okay, ill give that a try

In addition to what salt rock lamp said about os.getcwd() in jupyterlab in a cell you could use bash pwd like that

!pwd

It's a useful way to execute bash commands in a notebook

serene scaffold Mar 4, 2022, 3:40 PM

#

somber bough There are around 50 - 70 each

okay, so you can make a neural network for that. 151 might be a lot of classes--I'm not sure.

desert oar Mar 4, 2022, 3:41 PM

#

tacit basin In addition to what salt rock lamp said about os.getcwd() in jupyterlab in a cel...

i generally recommend not using shell commands in notebooks, because it's convenient but you very quickly run into problems where your notebook is dependent on specifics of the user's environment, e.g. it no longer works on windows

desert oar Mar 4, 2022, 3:42 PM

#

serene scaffold okay, so you can make a neural network for that. 151 might be a lot of classes--...

151 is fine as long as you have enough data points per class

#

50-70 seems good

serene scaffold Mar 4, 2022, 3:42 PM

#

desert oar 151 is fine as long as you have enough data points per class

I'm actually learning about image processing for the first time, for text recognition 😄

tacit basin Mar 4, 2022, 3:42 PM

#

desert oar i generally recommend *not* using shell commands in notebooks, because it's conv...

That's possible. But for a quick check, pwd is perfect

desert oar Mar 4, 2022, 3:42 PM

#

I think it's a standard practice in image classification problems to synthetically generate a lot more training samples by algorithmically distorting or otherwise modifying the images

upper spindle Mar 4, 2022, 3:43 PM

#

tacit basin In addition to what salt rock lamp said about os.getcwd() in jupyterlab in a cel...

thanks

desert oar Mar 4, 2022, 3:43 PM

#

stretching, skewing, altering colors, rotating/mirroring, adding noise, etc.
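
A rough sketch of a few of these distortions with NumPy, assuming images are H x W x C uint8 arrays. The function name `augment` and the noise level are arbitrary choices for illustration. Real projects would more likely use the augmentation utilities of their deep learning framework.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> list:
    """Return a few distorted copies of an H x W x C uint8 image."""
    mirrored = image[:, ::-1, :]                 # horizontal flip
    rotated = np.rot90(image, k=1, axes=(0, 1))  # rotate by 90 degrees
    # Add Gaussian noise, then clip back into the valid pixel range.
    noise = rng.normal(0.0, 10.0, size=image.shape)
    noisy = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return [mirrored, rotated, noisy]


rng = np.random.default_rng(0)
# A random stand-in for a real photo.
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
variants = augment(image, rng)
print([v.shape for v in variants])
```

Each original image yields several training samples this way, which is the point of augmentation when there are only 50 to 70 images per class.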

normal saffron Mar 4, 2022, 3:43 PM

#

anyone want to explain how tf to make ai

#

pls help

desert oar Mar 4, 2022, 3:43 PM

#

there are lots of articles about data augmentation for image classification problems

normal saffron Mar 4, 2022, 3:43 PM

#

i read

#

them

serene scaffold Mar 4, 2022, 3:43 PM

#

normal saffron anyone want to explain how tf to make ai

that's a very broad question that could take months to explain

normal saffron Mar 4, 2022, 3:43 PM

#

they are no help:(

desert oar Mar 4, 2022, 3:44 PM

#

normal saffron i read

my post was in response to some thing above, not you

#

what you need is a machine learning course starting at the basics

normal saffron Mar 4, 2022, 3:44 PM

#

serene scaffold that's a very broad question that could take months to explain

oversimplified then

desert oar Mar 4, 2022, 3:44 PM

#

fast.ai is a good option

#

you're basically asking somebody to type out a textbook chapter for you

normal saffron Mar 4, 2022, 3:44 PM

#

copy and paste

#

no but rlly could someone at least show me an ai program so i could see how it works?

serene scaffold Mar 4, 2022, 3:46 PM

#

normal saffron no but rlly could someone at least show me an ai program so i could see how it w...

each one works differently. but if AI is something that interests you, you will probably enjoy a course about the fundamentals

normal saffron Mar 4, 2022, 3:46 PM

#

serene scaffold each one works differently. but if AI is something that interests you, you will ...

then a random ai program?

upper spindle Mar 4, 2022, 3:46 PM

#

how would someone from an econ background specialise/learn about ML/DL/NN

tacit basin Mar 4, 2022, 3:47 PM

#

normal saffron then a random ai program?

Self driving cars

mild dirge Mar 4, 2022, 3:47 PM

#

upper spindle how would someone from an econ background specialise/learn about ML/DL/NN

Same way anyone would, start with linear algebra and statistics

serene scaffold Mar 4, 2022, 3:47 PM

#

upper spindle how would someone from an econ background specialise/learn about ML/DL/NN

for your own edification, or to pivot careers?

tacit basin Mar 4, 2022, 3:47 PM

#

mild dirge Same way anyone would, start with linear algebra and statistics

Depends

normal saffron Mar 4, 2022, 3:47 PM

#

tacit basin Self driving cars

...........You know where to find code for a tesla?

tacit basin Mar 4, 2022, 3:47 PM

#

normal saffron ...........You know where to find code for a tesla?

At Tesla i guess

serene scaffold Mar 4, 2022, 3:47 PM

#

normal saffron ...........You know where to find code for a tesla?

the source code for tesla is probably very proprietary

#

but self-driving cars are going to have numerous components

normal saffron Mar 4, 2022, 3:48 PM

#

true

serene scaffold Mar 4, 2022, 3:48 PM

#

they probably need cameras to see whats going on, and models to identify what each thing is

normal saffron Mar 4, 2022, 3:48 PM

#

so cv?

serene scaffold Mar 4, 2022, 3:49 PM

#

and then it needs some formula to decide how fast or slow to go based on those conditions, as well as incline, speed limits, etc.

desert oar Mar 4, 2022, 3:49 PM

#

upper spindle how would someone from an econ background specialise/learn about ML/DL/NN

depending on how advanced the econ background is, you should have more than enough math and statistics foundational knowledge to jump in "math first". Fast.ai can't hurt as an easy "first course in modern deep learning". for books, check out Probabilistic Machine Learning by Murphy and/or Deep Learning by Goodfellow. what the econ background lets you do is skip all the statistics basics and go right for the fun stuff

normal saffron Mar 4, 2022, 3:49 PM

#

serene scaffold and then it needs some formula to decide how fast or slow to go based on those c...

well elon musk must have good employees

serene scaffold Mar 4, 2022, 3:50 PM

#

normal saffron well elon musk must have good employees

well, of course

normal saffron Mar 4, 2022, 3:50 PM

#

serene scaffold well, of course

lol

desert oar Mar 4, 2022, 3:51 PM

#

however you will probably want to revisit statistics from outside the perspective of econometrics, because in my experience econometricians tend to use different techniques and think about problems differently @upper spindle . so it depends on your background. the general recommendations are more or less the same as for someone who knows very little or nothing, but the benefit of having a quantitative background is that you can move a lot faster through the intro material and don't need to spend time learning how to program a computer, how to read equations, how to reason statistically/probabilistically, etc.

upper spindle Mar 4, 2022, 3:51 PM

#

serene scaffold for your own edification, or to pivot careers?

i wanna move into data science

desert oar Mar 4, 2022, 3:51 PM

#

what is your background @upper spindle, specifically?

serene scaffold Mar 4, 2022, 3:51 PM

#

upper spindle i wanna move into data science

so, to pivot careers? salt rock lamp just gave you some great advice, so I'll respond once you've been able to read that.

upper spindle Mar 4, 2022, 3:51 PM

#

desert oar however you will probably want to revisit statistics from _outside_ the perspect...

thanks, that was my issue with programming and with the statistics/probability

upper spindle Mar 4, 2022, 3:52 PM

#

desert oar what _is_ your background @upper spindle, specifically?

im a current university student but about to graduate in a few months

#

but im lacking on the programming side

serene scaffold Mar 4, 2022, 3:53 PM

#

anyway, my advice would be to apply to graduate programs in something more closely related to data science. I've worked with data scientists with an economics background, so it's probably one of the better non-CS avenues into DS/AI.

serene scaffold Mar 4, 2022, 3:53 PM

#

upper spindle but im lacking on the programming side

did you do any programming in R?

desert oar Mar 4, 2022, 3:54 PM

#

ah, so undergrad econ

#

that changes things a bit

upper spindle Mar 4, 2022, 3:54 PM

#

i have done some, but my department here in the uk used stata

desert oar Mar 4, 2022, 3:54 PM

#

yeah you basically should treat yourself like an advanced beginner

#

start where everyone else starts

#

you probably can read equations, and do calculus, and know some linear algebra

#

you know what regression is, you know about model bias and variance, you know about statistical inference at least on a basic level, you know how to reason about model building

upper spindle Mar 4, 2022, 3:55 PM

#

desert oar you probably can read equations, and do calculus, and know some linear algebra

yeh, my maths background from a-level was pretty strong so im not too worried about that too much, other than when equations get horrible

desert oar Mar 4, 2022, 3:55 PM

#

so start at the basics but you can move quickly through it

#

i very strongly suggest the Murphy book

#

the beginning material should all be familiar to you from econometrics, but it might be expressed somewhat differently from what you are used to

#

that + the fast.ai course should be a great start imo

#

no need to rush through it

upper spindle Mar 4, 2022, 3:56 PM

#

thanks

desert oar Mar 4, 2022, 3:56 PM

#

i also strongly suggest learning python, since this is a python forum 🙂

#

R isn't that useful for "machine learning" as such

upper spindle Mar 4, 2022, 3:56 PM

#

desert oar i also strongly suggest learning python, since this is a python forum 🙂

yeh, ive been developing my python ability over the year

desert oar Mar 4, 2022, 3:56 PM

#

good, that will be useful in industry

#

a lot of jobs will place high value on your ability to write code independently

upper spindle Mar 4, 2022, 3:58 PM

#

desert oar a lot of jobs will place high value on your ability to write code independently

that is the toughest skill ive found over the year, especially for NNs specifically

serene scaffold Mar 4, 2022, 3:58 PM

#

desert oar R isn't that useful for "machine learning" as such

I only asked about R because I thought economists usually use that

desert oar Mar 4, 2022, 3:58 PM

#

serene scaffold I only asked about R because I thought economists usually use that

a lot of them still use stata, but yeah social scientists and statisticians often use R

desert oar Mar 4, 2022, 3:59 PM

#

upper spindle that is the toughest skill ive found over the year, especially for NNs specifica...

are you writing them "from scratch" somehow?

#

pytorch is pretty easy to use

#

especially when you already know the underlying math

#

i also wouldn't spend too much energy on learning how to implement things "from scratch"

#

numerical computing is its own field

#

learn about how the models work mathematically and how to use them, don't worry about implementing them

upper spindle Mar 4, 2022, 4:14 PM

#

desert oar are you writing them "from scratch" somehow?

yeye, i am, but ive been using youtube and github projects to just get a sense of what im trying to do

upper spindle Mar 4, 2022, 4:14 PM

#

desert oar a lot of them still use stata, but yeah social scientists and statisticians ofte...

yeh, my whole department uses stata and are slowly transitioning to R

upper spindle Mar 4, 2022, 4:14 PM

#

desert oar learn about how the models work mathematically and how to use them, don't worry ...

okay thanks

#

ive been using tensorflow to implement lstm's so far

serene scaffold Mar 4, 2022, 4:16 PM

#

that is, you're actually implementing LSTMs "from scratch" (ie with no constructs more abstract than individual tensors)?

#

I ask because we've discussed lately how overused the word "implement" tends to be. but yeah, implementing things like that "from scratch" isn't something I'd do at your stage, though you'll get to a point where you could if you wanted to.

upper spindle Mar 4, 2022, 4:21 PM

#

ohh okay, sorry haha, ive been using code from githubs, youtube and combining them into a univariate lstm

desert oar Mar 4, 2022, 4:29 PM

#

yeah dont waste your time with the youtube tutorials

#

work through fast.ai

#

go in with a beginner's mind imo

#

you'll make progress quickly

#

you won't struggle like a real beginner would

misty flint Mar 4, 2022, 4:29 PM

#

serene scaffold I only asked about R because I thought economists usually use that

so do public health folks and many in pharmaceuticals/biostats. CDC uses R here.

#

but tbh

#

i also recommended python

#

since if you want to do advanced data science in R, you end up calling the Reticulate package anyway

#

aka using python through R

#

even the R podcasters i listen to end up having to use python sometimes

#

and theyre trained as biostatisticians too

#

even if you have to do bioinformatics, theres biopython

#

but the documentation for some of that stuff can be terrible sometimes so good luck

misty flint Mar 4, 2022, 4:34 PM

#

desert oar work through fast.ai

fast ai is great

#

praise

serene scaffold Mar 4, 2022, 4:35 PM

#

is there cuda-enabled deep learning in R?

misty flint Mar 4, 2022, 4:35 PM

#

if there is, idk about it

serene scaffold Mar 4, 2022, 4:36 PM

#

because it's easier to just have all of scientific computing under one roof, and if they're missing that, it's going to become impossible to compete.

misty flint Mar 4, 2022, 4:37 PM

#

true

#

many academics use R tho, so dont think its going away anytime soon

upper spindle Mar 4, 2022, 4:38 PM

#

desert oar work through fast.ai

okay, will do, thanks

misty flint Mar 4, 2022, 4:38 PM

#

~~for now~~

nova tapir Mar 4, 2022, 4:39 PM

#

#

can someone explain why correct option is option 4?

upper spindle Mar 4, 2022, 4:41 PM

#

is jupyter labs the best tool for data science/programming in python

#

seen a few people use spyder

#

or what are your go to tools for data science

serene scaffold Mar 4, 2022, 4:42 PM

#

nova tapir

think of which two quadrants the data points are in, and which quadrant the options are in

cinder thicket Mar 4, 2022, 4:42 PM

#

upper spindle is jupyter labs the best tool for data science/programming in python

oh yeah, i have a problem with jupyter, tried installing and keep getting this, how do i fix this and install jupyter

serene scaffold Mar 4, 2022, 4:43 PM

#

nova tapir

the first two options are on an axis, but not lined up with data points. look at where the other two are, if you treat them as points.

upper spindle Mar 4, 2022, 4:46 PM

#

cinder thicket oh yeah, i have a problem with jupyter, tried installing and keep getting this, ...

i would uninstall

#

and try to install again

cinder thicket Mar 4, 2022, 4:47 PM

#

upper spindle i would uninstall

you mean jupyter or python?

serene scaffold Mar 4, 2022, 4:48 PM

#

are you following a tutorial? this looks like a misguided use of Python OOP

#

return self.df   # there is no self.df attribute of myDataframe
x.dataframe()    # This value isn't used, so nothing happens--did you want to return it?

If you want the d variable in the myDataframe.dataframe method to be exposed, you have to store it as self.d = ...

desert oar Mar 4, 2022, 4:51 PM

#

upper spindle is jupyter labs the best tool for data science/programming in python

the best tool is the one that you like the best. try both jupyterlab and spyder, you can also try just using plain python files + ipython on the side, etc.

#

even as pseudocode, this is very weird code that seems to have been written by someone who was confused about how to use classes

#

i don't mean to be offensive, but i think that was what stelercus was commenting on

cinder thicket Mar 4, 2022, 4:54 PM

#

@upper spindle tried this and still getting same errors

#

its jupyterlab i am trying to install

serene scaffold Mar 4, 2022, 4:55 PM

#

myDataframe just looks like a wrapper around a single dataframe with no particular purpose, and it refers to instance variables that aren't defined.

#

def make_df(ticker):
    return pd.DataFrame({'Ticker': [ticker]})

What you have appears to be an over-engineered version of this.

upper spindle Mar 4, 2022, 4:56 PM

#

cinder thicket you mean jupyter or python?

jupyter

serene scaffold Mar 4, 2022, 4:56 PM

#

if you need to go back to having {'Ticker': [ticker]}, you can just do df.to_dict('list') on the dataframe. the wrapper class just adds a layer of potential complexity.
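
To make the round trip concrete, here is a short sketch built on the make_df function above. The ticker value is a placeholder. Note that a plain to_dict() nests values under the row index, while orient="list" returns the column-to-list shape.

```python
import pandas as pd


def make_df(ticker):
    """Build a one-row frame holding the ticker symbol."""
    return pd.DataFrame({"Ticker": [ticker]})


df = make_df("AAPL")  # "AAPL" is just a placeholder value

# The default orientation nests values under the row index...
nested = df.to_dict()
# ...while orient="list" gives back the column -> list shape.
as_lists = df.to_dict(orient="list")

print(nested)
print(as_lists)
```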

upper spindle Mar 4, 2022, 4:57 PM

#

cinder thicket @upper spindle tried this and still getting same errors

im not too sure, tbh, maybe check you have the right requirements

cinder thicket Mar 4, 2022, 5:00 PM

#

upper spindle im not too sure, tbh, maybe check you have the right requirements

where are the requirements on the jupyter site?

serene scaffold Mar 4, 2022, 5:03 PM

#

I find it is easier to define an init function with self.var in case I decide to add functions or alter the code down the line.
one usually wants to avoid having lots of mutable state

#

you're also creating an additional API on top of pandas that people who use your code would have to learn.

stone marlin Mar 4, 2022, 5:07 PM

#

Aw, I was trying to do a little refactor of the code, but they deleted it. :'[

tacit basin Mar 4, 2022, 5:09 PM

#

misty flint fast ai is great

They just announced new 'old' course for this April. Similar to last one but with Timm and transformers integration

tacit basin Mar 4, 2022, 5:11 PM

#

upper spindle is jupyter labs the best tool for data science/programming in python

VS Code is an option too. It supports notebooks as well as .py files.

tacit basin Mar 4, 2022, 5:15 PM

#

cinder thicket @upper spindle tried this and still getting same errors

On windows you can always install anaconda. You will get a lot of libraries for DS. And visual UI launcher. It's a bit heavy download though.

cinder thicket Mar 4, 2022, 5:17 PM

#

tacit basin On windows you can always install anaconda. You will get a lot of libraries for ...

i want to do these tutorials and they use jupyter or colab, and i want to use jupyter https://www.planethunters.coffee/tutorials

PH Coffee Chat

Tutorials | PH Coffee Chat

tacit basin Mar 4, 2022, 5:18 PM

#

cinder thicket i want to do these tutorials and they use jupyter or collab, and i want to use j...

Where do they say what 'tools' they use?

cinder thicket Mar 4, 2022, 5:23 PM

#

tacit basin Where do they say what 'tools' they use?

https://www.planethunters.coffee/intro-to-python

PH Coffee Chat

Intro to Python | PH Coffee Chat

tacit basin Mar 4, 2022, 5:26 PM

#

cinder thicket https://www.planethunters.coffee/intro-to-python

OK. They don't specify an install method, but you can install Anaconda with its graphical installer. It comes with Jupyter, a lot of preinstalled libraries, and the conda virtual environment manager. https://www.anaconda.com/products/individual#Downloads

Anaconda

Anaconda | Individual Edition

Anaconda's open-source Individual Edition is the easiest way to perform Python/R data science and machine learning on a single machine.

#

It's a commonly used tool in DS.

#

Another option is standalone jupyter app https://github.com/jupyterlab/jupyterlab-desktop

GitHub

GitHub - jupyterlab/jupyterlab-desktop: JupyterLab desktop applicat...

JupyterLab desktop application, based on Electron. - GitHub - jupyterlab/jupyterlab-desktop: JupyterLab desktop application, based on Electron.

cinder thicket Mar 4, 2022, 5:33 PM

#

tacit basin Another option is standalone jupyter app https://github.com/jupyterlab/jupyterla...

will i still be able to import Astropy and other stuff needed to do what i want to do?

tacit basin Mar 4, 2022, 5:36 PM

#

cinder thicket will i still be able to import Astropy and other stuff needed to do what i want ...

I would think so but i didn't use app version. With conda / anaconda yes you can install packages with conda install or pip install

#

Colab is a good option as well

cinder thicket Mar 4, 2022, 5:41 PM

#

tacit basin Colab is a good option as well

i may go with colab then

#

may install the desktop app in the future, but not now

cinder thicket Mar 4, 2022, 5:42 PM

#

tacit basin Colab is a good option as well

but how do you even import astropy and other modules on colab?

tacit basin Mar 4, 2022, 5:46 PM

#

cinder thicket but how do you even import astropy and other modules on colab?

You would need to pip install the package every time you start session if it's not a part of preinstalled packages on colab (a lot of packages are preinstalled)
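
One way to make that per-session install less tedious is a small helper at the top of the notebook. This is a sketch only, and the function name `ensure_installed` is made up. It checks whether a module can be imported and calls pip only when it cannot.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package, module=None):
    """Install `package` with pip if `module` cannot be imported yet.

    Returns True when an install was attempted, False when nothing was needed.
    """
    module = module or package
    if importlib.util.find_spec(module) is not None:
        return False
    # Use the interpreter running the notebook so the package lands in its environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return True


# The standard library is always present, so this call installs nothing.
print(ensure_installed("json"))
# At the top of a Colab notebook one would call: ensure_installed("astropy")
```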

cinder thicket Mar 4, 2022, 5:48 PM

#

tacit basin You would need to pip install the package every time you start session if it's n...

will it keep it installed if i save a session to my drive and reopen it (and just modify the code so i can look at a different TIC star)

tacit basin Mar 4, 2022, 5:51 PM

#

cinder thicket will it keep it installed if i save a session to my drive and reopen it (and jus...

It needs to be installed every time but it wouldn't take long

cinder thicket Mar 4, 2022, 5:52 PM

#

tacit basin It needs to be installed every time but it wouldn't take long

alright

#

now that i know how to do this, can someone help me with doing shape from shading with python
i want to know if its possible, and if so, how do i do it?

#

i want to make some DEMs for moons in the solar system

tacit basin Mar 4, 2022, 5:53 PM

#

Not sure what that means but you could do most things with images in python for example with opencv library

cinder thicket Mar 4, 2022, 5:55 PM

#

tacit basin Not sure what that means but you could do most things with images in python for ...

shape from shading is where you turn a still image into a DEM
here is a example from unmannedspaceflight.com (astronomy forum)

#

http://www.unmannedspaceflight.com/index.php?s=9c8c48a9b3359b9a1a68a71293b12207&showtopic=6543
whole topic for those interested; the program they're using is for linux only, not windows

tacit basin Mar 4, 2022, 6:04 PM

#

cinder thicket http://www.unmannedspaceflight.com/index.php?s=9c8c48a9b3359b9a1a68a71293b12207&...

Not sure if this is what you want? http://geologyandpython.com/dem-processing.html

Download and Process DEMs in Python | Geology and Python

This tutorial shows how to automate downloading and processing DEM files. It shows a one-liner code to download SRTM (30 or 90 m) data and how to use rasterio to reproject the downloaded data into a desired CRS, spatial resolution or bounds.

floral valley Mar 4, 2022, 6:06 PM

#

what does it mean when a arima problem is unconstrained?

serene scaffold Mar 4, 2022, 6:18 PM

#

floral valley what does it mean when a arima problem is unconstrained?

do you mean unconstrained optimization?

floral valley Mar 4, 2022, 6:19 PM

#

The error when I run SARIMAX is "this problem is unconstrained"

#

It outputs, but not entirely

serene scaffold Mar 4, 2022, 6:20 PM

#

try showing the whole error message from Traceback.

floral valley Mar 4, 2022, 6:21 PM

#

heres the error if you want to see

serene scaffold Mar 4, 2022, 6:21 PM

#

I will only look at text, sorry

#

!paste

arctic wedgeBOT Mar 4, 2022, 6:21 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

floral valley Mar 4, 2022, 6:21 PM

#

sure

#

https://paste.pythondiscord.com/unehiwolot

#

been trying to get this completed forever and just cant lol

serene scaffold Mar 4, 2022, 6:26 PM

#

floral valley https://paste.pythondiscord.com/unehiwolot

https://stackoverflow.com/questions/49547245/valuewarning-no-frequency-information-was-provided-so-inferred-frequency-ms-wi

Stack Overflow

ValueWarning: No frequency information was provided, so inferred fr...

I try to fit Autoregression by sm.tsa.statespace.SARIMAX. But I meet a warning, then I want to set frequency information for this model.
Who used to meet it, can you help me ?

fit1 = sm.tsa.states...

floral valley Mar 4, 2022, 6:29 PM

#

how do you get the inferred frequency?

#

i would presume it guesses off the data but how do you pass that in

serene scaffold Mar 4, 2022, 6:29 PM

#

I don't know--I don't even know what the problem is

#

I'm just following my usual debugging steps

floral valley Mar 4, 2022, 6:30 PM

#

should i send my kaggle code? everything is on here

#

may help

#

one sec

serene scaffold Mar 4, 2022, 6:32 PM

#

I don't think I can do a deep dive right now, but someone else might.

floral valley Mar 4, 2022, 6:33 PM

#

dont worry ill just have it on here if anyone can help, havnt had much progress by myself and need to do 3 models lol

#

https://www.kaggle.com/connor2608/codes

codes

Explore and run machine learning code with Kaggle Notebooks | Using data from [Private Datasource]

#

its not a large program but trying to get it working

misty flint Mar 4, 2022, 6:42 PM

#

tacit basin They just announced new 'old' course for this April. Similar to last one but wit...

blobhyperthink

#

i need april to come now

tacit basin Mar 4, 2022, 7:45 PM

#

misty flint i need april to come now

yeah, will be fun: https://twitter.com/jeremyphoward/status/1499600211714674688

Jeremy Howard (@jeremyphoward)

I am over the moon to announce:

I'm now a professor at University of Queensland (UQ), the top institute in my home state!
I'll be teaching a brand new deep learning course at UQ from April, which will form the basis of a new @fastdotai course! 🧵
https://t.co/RAMaHb7eZ2

Likes

2448

Retweets

164

minor elbow Mar 4, 2022, 10:58 PM

#

floral valley i would presume it guesses off the data but how do you pass that in

you can pass it with the freq argument, MS is monthly data where the index -dd part is the start of the month
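A minimal sketch of what that can look like, assuming a monthly pandas series (the numbers are made up). `asfreq("MS")` attaches a month-start frequency to the index so nothing has to be inferred. statsmodels' `SARIMAX` also takes a `freq=` argument, as mentioned above; that call is left as a comment so the snippet only needs pandas.

```python
import pandas as pd

# Hypothetical monthly data; every date is the first of a month.
dates = pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"])
series = pd.Series([10.0, 12.0, 11.0, 13.0], index=dates)

# Dates parsed from strings carry no frequency, which is what triggers the warning.
print(series.index.freq)

# pandas can guess the frequency from the spacing of the dates.
inferred = pd.infer_freq(series.index)
print(inferred)

# Attach the frequency explicitly so the model does not have to guess.
series = series.asfreq("MS")
print(series.index.freqstr)

# With statsmodels installed, pass the series above or set freq directly, e.g.:
# SARIMAX(series, order=(1, 0, 0), freq="MS")
```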

#

for your SARIMA thing i would suggest using a simpler model like dropping the seasonal order and trend and see if that works then you can add in the extra stuff to find out exactly whats causing the issue

#

im not sure the 'this problem is unconstrained' is an error

astral delta Mar 4, 2022, 11:15 PM

#

Anyone here good with pytesseract and pyautogui dm me im tryna create a bot for something

serene scaffold Mar 4, 2022, 11:16 PM

#

astral delta Anyone here good with pytesseract and pyautogui dm me im tryna create a bot for ...

It's not likely that anyone will DM you. you should say what you want help with in this channel.

astral delta Mar 4, 2022, 11:18 PM

#

Ight, so I am trying to make a bot answer these questions rlly fast, so I am trying to use ocr to get the questions and answer it, and then I will try to correspond the answer to one of the choices and press 1,2,3,4 to get the correct answer

serene scaffold Mar 4, 2022, 11:21 PM

#

is it always two integers and one of the four basic operations?
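If the answer to that is yes, the logic between the OCR step and the key press can stay small. A rough sketch, assuming the OCR text looks like `12 + 7` and the four choices arrive as strings; `solve` and `pick_key` are made-up helper names. The text would come from something like `pytesseract.image_to_string` and the returned key would go to `pyautogui.press`, neither of which is called here.

```python
import operator
import re

# OCR often reads the multiplication sign as a plain "x".
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "x": operator.mul,
    "/": operator.truediv,
}

def solve(question):
    """Parse text like '12 + 7' and return the numeric result."""
    match = re.search(r"(-?\d+)\s*([+\-*x/])\s*(-?\d+)", question)
    if match is None:
        raise ValueError(f"could not parse: {question!r}")
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))

def pick_key(question, choices):
    """Return '1'..'4' for the choice equal to the answer, or None if no match."""
    answer = solve(question)
    for position, choice in enumerate(choices, start=1):
        if float(choice) == float(answer):
            return str(position)
    return None

key = pick_key("12 + 7 = ?", ["17", "19", "21", "5"])
print(key)  # 2
```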

thin palm Mar 4, 2022, 11:31 PM

#

➜ wbanalysis git:(gcp) ✗ make upload_data [🐍 warren-buffet]
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
why is GCP saying the destination url must name a directory even though I give it a directory?

pseudo wren Mar 4, 2022, 11:39 PM

#

I am having trouble converting api data into a csv and also manipulating said data

#

for example I am doing a project on murder rates as reported by different news publications

#

I was able to pull the API

#

but every time I try to convert it to a csv, I get an error message

#

i've looked at several stack overflow message boards but cannot find a solution that works for me

#

```py
import json
import csv
import urllib.request

with urllib.request.urlopen('https://content.guardianapis.com/search?api-key=47b0057b-d60d-4a3d-a6cf-c1f79aeedaa4') as url:
    s = url.read()
    print(s)
```

#

this is the code for right now
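A sketch of the step from the downloaded bytes to a CSV, using only the standard library. The payload below is a made-up stand-in shaped like a search response (a `response` object holding a list of `results`); the field names are assumptions, so check them against what `print(s)` actually shows.

```python
import csv
import io
import json

# Hypothetical payload; the real bytes would come from url.read().
raw = b'''{"response": {"results": [
  {"id": "a/1", "webTitle": "First headline", "webPublicationDate": "2022-03-01T10:00:00Z"},
  {"id": "b/2", "webTitle": "Second headline", "webPublicationDate": "2022-03-02T11:30:00Z"}
]}}'''

data = json.loads(raw)               # json.loads accepts bytes directly
rows = data["response"]["results"]   # a list of dicts, one per article

fields = ["id", "webTitle", "webPublicationDate"]
buffer = io.StringIO()               # swap for open("articles.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Once the rows are a list of dicts, the same list can also be handed to `pandas.DataFrame(rows)` for the manipulation part.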

twin hound Mar 5, 2022, 1:04 AM

#

Hey guys can someone tell me why my scores are so low for my testing set when clearly the model predicts the test data very well:

#

#

These plots are test vs prediction from my model

#

blue is test, red is prediction

misty flint Mar 5, 2022, 1:07 AM

#

~~idk about clearly~~

#

RunFail

twin hound Mar 5, 2022, 1:16 AM

#

What I mean is they predict the data relatively well

#

So a score less than 0.6 makes zero sense

#

I've seen it plotted using another ML method with higher score and it doesn't look nearly that clean

rain temple Mar 5, 2022, 1:18 AM

#

I am following a tutorial on pix2pix generation. The output for the shapes of each of the target arrays and the source arrays are
Loaded: (1096, 256, 256, 3) (1096, 256, 256, 3)

but for me the arrays are
Loaded: (256,256,3) (256,256,3)

Does this mean that not all of the images are loaded into the arrays?

#

I plotted the contents on matplotlib and this is what I get, but I am supposed to get a picture of the images.

#

Could someone please help. Thx

somber bough Mar 5, 2022, 4:12 AM

#

So I'm making a custom detection model where you can upload an image and it will put it through the detection, and display the top 3 closest results (like on a bar graph), but im somewhat new to python so i dont know which library to use

quick eagle Mar 5, 2022, 4:20 AM

#

Hello all! I'm trying to use the fastdtw module to align time-series data that is slightly off.... but the fastdtw alignment makes it WAAY worse!! Any suggestions on whether I'm using that module incorrectly, or a better way to synchronize data?

#

I'm following this writeup:
https://towardsdatascience.com/how-to-synchronize-time-series-datasets-in-python-f2ae51bee212

Medium

How to Synchronize Time Series Datasets in Python

Using dynamic time warping to synchronize time-series data

#

But I start with that (orange trace is a few seconds ahead), and fastdtw makes a mess of it!!!
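If the offset really is a constant few seconds, one alternative to try (not something suggested in the thread) is to estimate a single lag by cross-correlation instead of letting DTW warp every sample. A minimal pure-Python sketch with made-up signals; `best_lag` is a hypothetical helper, and real data would normally be mean-centred first.

```python
def best_lag(reference, delayed, max_lag):
    """Return the shift in samples that best aligns `delayed` with `reference`."""
    def score(lag):
        # Average product over the samples that overlap at this lag.
        pairs = [
            (reference[i], delayed[i + lag])
            for i in range(len(reference))
            if 0 <= i + lag < len(delayed)
        ]
        return sum(a * b for a, b in pairs) / len(pairs)

    return max(range(-max_lag, max_lag + 1), key=score)

# Made-up pulse; `delayed` is the same pulse two samples later.
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

lag = best_lag(reference, delayed, max_lag=4)
print(lag)  # 2

# Drop the first `lag` samples of the delayed trace to line the two up.
aligned = delayed[lag:]
print(aligned[:5])  # [0, 0, 1, 3, 1]
```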


violet gull Mar 5, 2022, 5:23 AM

#

this makes me happy

#

predicting a 50 50 shot with 75% accuracy

grand dagger Mar 5, 2022, 5:24 AM

#

WHAT

violet gull Mar 5, 2022, 5:24 AM

#

what

serene scaffold Mar 5, 2022, 5:31 AM

#

violet gull predicting a 50 50 shot with 75% accuracy

so there's only two options?

violet gull Mar 5, 2022, 5:31 AM

#

is square or is not square

#

binary

serene scaffold Mar 5, 2022, 5:32 AM

#

I don't mean to be the bearer of bad news, but for binary classification, 50% is the worst possible accuracy

#

so 75% is kind of like 50%

violet gull Mar 5, 2022, 5:32 AM

#

huh?

#

what does that even mean

serene scaffold Mar 5, 2022, 5:32 AM

#

If there's only two classes, and your model was completely random, then it would get 50% accuracy
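Whether 75% is good depends on the class balance. With a 50/50 split, guessing gets about 50%, so 75% is a real improvement; with a 75/25 split, always predicting the majority class already scores 75%. A small sketch with made-up labels that compares a model against that baseline:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

# Made-up labels: 1 = square, 0 = not square.
balanced = [1, 0] * 50              # 50/50 classes
imbalanced = [0] * 75 + [1] * 25    # 75/25 classes

print(majority_baseline(balanced))    # 0.5
print(majority_baseline(imbalanced))  # 0.75

# A "model" that always says "not square" looks 75% accurate on the second set.
always_zero = [0] * len(imbalanced)
print(accuracy(imbalanced, always_zero))  # 0.75
```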

violet gull Mar 5, 2022, 5:33 AM

#

yes

#

and its getting 75%

#

therefore its better

serene scaffold Mar 5, 2022, 5:33 AM

#

I suppose

violet gull Mar 5, 2022, 5:33 AM

#

i dont see the problem

#

it is working as expected

pastel valley Mar 5, 2022, 5:34 AM

#

neat anvil honestly @pastel valley these questions about data augmentation, transfe...

im bad at the maths hahah

pastel valley Mar 5, 2022, 5:35 AM

#

tacit basin Yes when accuracy on valid set is improving it's learning. You can continue trai...

yo its been already 10 epochs and the validation metric are the same as the 1st epoch is this normal? i am training it for 100 epochs

#

btw i am using google colab is there an option to use more computational power?

serene scaffold Mar 5, 2022, 6:11 AM

#

@pastel valley there is if you're willing to pay them for it.

pastel valley Mar 5, 2022, 6:12 AM

#

serene scaffold @pastel valley there is if you're willing to pay them for it.

will there be any other platform which is free?

serene scaffold Mar 5, 2022, 6:12 AM

#

pastel valley will there be any other platform which is free?

Not one that will give you more compute power than colab

#

Colab is already generous.

pastel valley Mar 5, 2022, 6:13 AM

#

it still takes me forever hahha

#

serene scaffold Mar 5, 2022, 6:14 AM

#

Did you remember to set it to use the gpu

pastel valley Mar 5, 2022, 6:14 AM

#

no its all default i dont know how to configure colab

#

is there that option?

serene scaffold Mar 5, 2022, 6:15 AM

#

Yes. But idk how to do it off the top of my head

#

Also, let me reiterate that I think you would benefit a lot from and very much enjoy a formal data science course.

tacit basin Mar 5, 2022, 6:17 AM

#

pastel valley no its all default i dont know how to configure colab

You need to change runtime to GPU
Runtime - change runtime type

serene scaffold Mar 5, 2022, 6:18 AM

#

You might also need to move the model to the GPU

heavy bay Mar 5, 2022, 6:19 AM

#

I'm making a simple neural network to find the relationship between 2 numbers
```py
from tensorflow import keras
import numpy as np

model = keras.Sequential(keras.layers.Dense(units=1, input_shape=[1]))
model.compile(optimizer='sgd', loss='mean_squared_error')

def calulate_trangular_numbers(n):
    for i in range(1, n+1):
        yield int(i*(i+1)/2)

n = 20
x = np.array(list(range(1, n+1)))
y = np.array(list(calulate_trangular_numbers(n)))

model.fit(x, y, epochs=500)
```
(I want it to find the relationship between the x values and y: y = x*(x+1)/2)
But for some reason when I fit the model the loss is nan
```
Epoch 1/500
1/1 [==============================] - 0s 9ms/step - loss: nan
Epoch 2/500
1/1 [==============================] - 0s 8ms/step - loss: nan
Epoch 3/500
1/1 [==============================] - 0s 7ms/step - loss: nan
Epoch 4/500
1/1 [==============================] - 0s 12ms/step - loss: nan
Epoch 5/500
1/1 [==============================] - 0s 12ms/step - loss: nan
```
any reason for why this could happen?

tacit basin Mar 5, 2022, 6:20 AM

#

pastel valley will there be any other platform which is free?

There are couple of free GPU options: colab, paperspace, kaggle, AWS sagemaker studio lab

serene scaffold Mar 5, 2022, 6:21 AM

#

heavy bay I'm making a simple neural network to find the relationship between 2 numbers ``...

Make sure that y doesn't have any nans in it

heavy bay Mar 5, 2022, 6:21 AM

#

serene scaffold Make sure that y doesn't have any nans in it

I checked, it doesn't have any nan values

pastel valley Mar 5, 2022, 6:24 AM

#

serene scaffold Also, let me reiterate that I think you would benefit a lot from and very much e...

probably but i need the maths first i think hahaha

pastel valley Mar 5, 2022, 6:24 AM

#

tacit basin You need to change runtime to GPU Runtime - change runtime type

oh nice nice i got it

heavy bay Mar 5, 2022, 6:44 AM

#

heavy bay I checked, it doesn't have any nan values

weird the loss isn't nan when the input arrays contain 10 elements but the prediction is far from the expected value
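One likely explanation, not confirmed in the thread: with unscaled inputs up to 20 and targets up to 210, plain gradient descent at Keras's default SGD learning rate of 0.01 takes steps that are too large and the weights blow up, while inputs up to 10 happen to stay stable. The sketch below reproduces that with a hand-written line fit in pure Python (`fit_line_gd` is a made-up helper, not the Keras code). Keras trains in float32 by default, where the same blow-up overflows and is reported as nan. Separately, a single `Dense(1)` unit can only represent a straight line, so it cannot match x*(x+1)/2 exactly even when training is stable.

```python
import math

def fit_line_gd(xs, ys, lr=0.01, epochs=500):
    """Full-batch gradient descent on MSE for y ~ w*x + b; returns the final loss."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * 2.0 * sum(errs) / n
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errs) / n

def triangular(n):
    """Inputs 1..n and the matching triangular numbers."""
    xs = [float(i) for i in range(1, n + 1)]
    ys = [i * (i + 1) / 2 for i in range(1, n + 1)]
    return xs, ys

xs20, ys20 = triangular(20)
xs10, ys10 = triangular(10)

loss_raw_20 = fit_line_gd(xs20, ys20)  # step too large for inputs up to 20
loss_raw_10 = fit_line_gd(xs10, ys10)  # inputs up to 10 stay stable

# Rescale inputs and targets to at most 1 before fitting.
xs_scaled = [x / max(xs20) for x in xs20]
ys_scaled = [y / max(ys20) for y in ys20]
loss_scaled_20 = fit_line_gd(xs_scaled, ys_scaled)

print(loss_raw_20, loss_raw_10, loss_scaled_20)
```

Scaling the data or lowering the learning rate are the usual things to try first; fitting the curve itself would need extra features (such as x squared) or a hidden layer with a nonlinearity.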

shut phoenix Mar 5, 2022, 8:46 AM

#

Is this good
https://youtu.be/tPYj3fFJGjk

YouTube

freeCodeCamp.org

TensorFlow 2.0 Complete Course - Python Neural Networks for Beginne...

Learn how to use TensorFlow 2.0 in this full tutorial course for beginners. This course is designed for Python programmers looking to enhance their knowledge and skills in machine learning and artificial intelligence.

Throughout the 8 modules in this course you will learn about fundamental concepts and methods in ML & AI like core learning alg...

▶ Play video

#

I am getting into ml and ai field

copper dirge Mar 5, 2022, 8:59 AM

#

somber bough So I'm making a custom detection model where you can upload an image and it will...

Look up a youtube video on machine-learning image analysis

#

I would recommend getting more confident with python before starting something like this...

pastel valve Mar 5, 2022, 10:15 AM

#

Hi guys, i have a question regarding machine learning. Which algorithm will be the best if the data set generated will be based on the graphical location of the mouse cursor (numerical data)? the objective is to allow the machine to learn the mouse movements

tacit basin Mar 5, 2022, 10:17 AM

#

shut phoenix Is this good https://youtu.be/tPYj3fFJGjk

I don't know this course. I can recommend Fastai courses they are suitable for beginners in AI with some python coding experience. course.fast.ai

tacit basin Mar 5, 2022, 10:20 AM

#

pastel valve Hi guys, i have a question regarding machine learning. Which algorithm will be t...

What would the input and output for the model be?

pastel valve Mar 5, 2022, 10:21 AM

#

mouse movement. graphical data( numerical)

tacit basin Mar 5, 2022, 10:21 AM

#

pastel valve mouse movement. graphical data( numerical)

Images?

pastel valve Mar 5, 2022, 10:21 AM

#

no, raw data

#

graphical location of the cursor

tacit basin Mar 5, 2022, 10:21 AM

#

pastel valve no, raw data

Example?

pastel valve Mar 5, 2022, 10:21 AM

#

x and y axis

tacit basin Mar 5, 2022, 10:21 AM

#

And output?

pastel valve Mar 5, 2022, 10:27 AM

#

from what i believe, the output will be based on the input

#

since the machine will have to predict what the next input might look like

tacit basin Mar 5, 2022, 10:44 AM

#

pastel valve since the machine will have to predict what the next input might look like

I mean also x,y coordinates?

pastel valve Mar 5, 2022, 10:44 AM

#

ye

#

s

tacit basin Mar 5, 2022, 10:49 AM

#

It's a multiple output regression. For example a deep neural network https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/

Machine Learning Mastery

Deep Learning Models for Multi-Output Regression

Multi-output regression involves predicting two or more numerical variables. Unlike normal regression where a single value is predicted for each […]

shut phoenix Mar 5, 2022, 10:52 AM

#

tacit basin I don't know this course. I can recommend Fastai courses they are suitable for b...

Alr ty

tacit basin Mar 5, 2022, 10:53 AM

#

Or these algorithms support multioutput regression in scikit-learn:
LinearRegression (and related)
KNeighborsRegressor
DecisionTreeRegressor
RandomForestRegressor (and related)
https://machinelearningmastery.com/multi-output-regression-models-with-python/

Machine Learning Mastery

How to Develop Multi-Output Regression Models with Python

Multioutput regression are regression problems that involve predicting two or more numerical values given an input example. An example might […]
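To make the framing concrete, here is one way the cursor trace could be turned into samples for any of those regressors: the last few positions are the input and the next (x, y) is the two-value target. This is a sketch with a made-up trace; `make_windows` and the window length of 3 are illustrative choices, not something from the thread.

```python
def make_windows(path, history=3):
    """Turn a list of (x, y) cursor positions into regression samples.

    Each input is `history` consecutive positions flattened to
    [x1, y1, x2, y2, ...]; each target is the next position [x, y].
    """
    inputs, targets = [], []
    for start in range(len(path) - history):
        window = path[start:start + history]
        inputs.append([coord for point in window for coord in point])
        targets.append(list(path[start + history]))
    return inputs, targets

# Made-up cursor trace.
path = [(0, 0), (2, 1), (4, 2), (6, 3), (8, 4), (10, 5)]
X, Y = make_windows(path, history=3)

print(X[0], Y[0])  # [0, 0, 2, 1, 4, 2] [6, 3]
print(len(X))      # 3
```

`X` and `Y` can then go straight into `fit(X, Y)` on a multi-output model such as scikit-learn's `KNeighborsRegressor` or `RandomForestRegressor`.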

tacit basin Mar 5, 2022, 10:55 AM

#

shut phoenix Alr ty

They will have a live course starting in April, in person and online https://mobile.twitter.com/jeremyphoward/status/1499600211714674688

Jeremy Howard (@jeremyphoward)

I am over the moon to announce:

I'm now a professor at University of Queensland (UQ), the top institute in my home state!
I'll be teaching a brand new deep learning course at UQ from April, which will form the basis of a new @fastdotai course! 🧵
https://t.co/RAMaHb7eZ2

Likes

2797

Retweets

192

shut phoenix Mar 5, 2022, 10:55 AM

#

Interesting

tacit basin Mar 5, 2022, 10:58 AM

#

shut phoenix Interesting

But their courses and book are available for free, course above, book: GitHub.com/fastai/fastbook

shut phoenix Mar 5, 2022, 10:58 AM

#

Tysm

tacit basin Mar 5, 2022, 10:59 AM

#

Live course may be paid, but they release as free MOOC soon after live course finishes.

sterile rivet Mar 5, 2022, 11:50 AM

#

Any of yall are experienced with big data projects? I want to start with one and would love to know your dataset preferences.

somber prism Mar 5, 2022, 12:13 PM

#

guys i have a doubt , here https://fractaldle.medium.com/brief-overview-on-object-detection-algorithms-ec516929be93

what does it mean by For each object class, train a SVM (one versus other) classifier. You can use hard negative mining to improve the classification accuracy. , does it take the output of last fc hidden layer and feed it to svm for classification or take the softmax fc layer and feed it to svm?

Medium

Brief Overview on object detection Algorithms.

Understanding Object detection frameworks and discussing the evolution of the same.

sterile heath Mar 5, 2022, 12:17 PM

#

https://youtu.be/GVsUOuSjvcg For anyone who hasn't seen it yet. Very interesting bit about flashable analogue chips running pretrained models with significantly reduced power consumption vs banks of gpus.

YouTube

Veritasium

We're Building Computers Wrong

Visit https://brilliant.org/Veritasium/ to get started learning STEM for free, and the first 200 people will get 20% off their annual premium subscription. Digital computers have served us well for decades, but the rise of artificial intelligence demands a totally new kind of computer: analog.

Thanks to Mike Henry and everyone at Mythic for the...

▶ Play video

#

Cool bit of ML history, too.

karmic moth Mar 5, 2022, 1:53 PM

#

Does anyone know how to use Tf-Idf with a CNN for texts (NLP)

#

any article or something u can refer me to or tutorial?

craggy tiger Mar 5, 2022, 2:02 PM

#

Does anyone know of any data-science projects which I can join?

hollow sentinel Mar 5, 2022, 2:55 PM

#

https://harvard-iacs.github.io/2018-CS109A/pages/schedule.html

Harvard CS109A | Schedule

FALL 2018 - Harvard University, Institute for Applied Computational Science.

#

wow this is nice

#

it has application and bases itself off a good textbook

pastel valley Mar 5, 2022, 4:57 PM

#

this is probably my best results so far the distance of train test is not like the other ones
but those spikes on loss and accuracy is it normal? or there are common knowledge on why those happens?

grave frost Mar 5, 2022, 5:39 PM

#

sterile heath https://youtu.be/GVsUOuSjvcg For anyone who hasn't seen it yet. Very interesting...

its a pretty bad vid IMO, Derek doesn't look like he researched enough. Ever since his vids got out of physics, their quality has been steadily degrading

drifting lion Mar 5, 2022, 5:39 PM

#

anyone who has worked on ML, do people put training and testing processes on same .py file or create different modules for each?

neat anvil Mar 5, 2022, 5:50 PM

#

I like tests to always go in a separate ‘tests/‘ directory. Source and tests being together makes things confusing IMO

maiden kite Mar 5, 2022, 6:38 PM

#

tacit basin I don't know this course. I can recommend Fastai courses they are suitable for b...

wdym by some python coding experience? I am doing automatetheboringstuff book would that be enough or would a beginner mooc be better

#

like the university of helsinki one

iron basalt Mar 5, 2022, 7:02 PM

#

drifting lion anyone who has worked on ML, do people put training and testing processes on sam...

Both, when the model is new and still very buggy I just want fast development iteration times and just keep it all in one file. Then when it seems not so buggy anymore I start creating separate "official" tests that need to be passed before it's ready for use.

#

(I do this for not just ML but all new algorithms)

#

In addition, I like to have at least 1 test made by someone else to make sure that i'm not just making tests I know it will pass.

craggy tiger Mar 5, 2022, 7:30 PM

#

Hi there, I am looking for a data-science community to work with on interesting projects.

stone marlin Mar 5, 2022, 7:41 PM

#

iron basalt In addition, I like to have at least 1 test made by someone else to make sure th...

This is a perfect answer to the question.

#

Everyone should read this and apply it to their development cycle, for real.

iron basalt Mar 5, 2022, 7:44 PM

#

I like to imagine programming like crystallization or annealing. At first it's hot and I want to strike it often, but eventually I want it to cool off and harden / crystallize.

#

(Pro tip, check the commit rate of a piece of code, if it starts slowing down, it's time to add some tests and let it harden, but if it's updated a lot even after a long time, maybe it's the wrong approach / design and therefore is causing a lot of bugs)

#

(If someone asks you to fix their code base, look for what is being changed a lot and find out why)

stone marlin Mar 5, 2022, 7:53 PM

#

I was given that last advice by a former manager, and we had a tool to look at file-commit-rates. Many of them were just adding business logic (or false positives --- typos someone forgot to squash) and so it was easy to pull that out so that the business logic could be more easily changed and updated and then "plugged in" to the microservice. Great advice.

lapis sequoia Mar 5, 2022, 8:02 PM

#

Hello I have a conda project with a typer cli app in it located in libraryassignment/__main__.py file and I'm currently running typer commands like so: python -m libraryassignment <command_name>. It works fine but I want to be able to execute without -m flag like so: python libraryassignment <command_name> but I get ModuleNotFoundError: No module named 'libraryassignment'.
As far as I know, I have to either include it to the path or create a python package. I'm relatively new to conda and I wonder how can I tackle this issue creating all the required configuration to build a package so that python detects it as a package allowing me to keep developing on the project.
I used poetry in the past and it's pretty intuitive and easy to use especially regarding the building process of python packages with pyproject.toml and poetry.lock files but I don't have much experience with conda and I wonder if you can help me with some guidelines that I can put into practice to build a package from a conda project.
Thank you very much in advance.

thin palm Mar 5, 2022, 8:22 PM

#

any GCP experts out there who can tell me why I keep getting this error? Just trying to upload a folder to my GCP Bucket

➜ wbanalysis git:(gcp) make upload_data
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
make: *** [upload_data] Error 1

misty flint Mar 5, 2022, 9:21 PM

#

thin palm any GCP experts out there who can tell me why I keep getting this error? Just tr...

maybe ask in #tools-and-devops

#

speaking of, i need to learn a cloud tool

#

blobhyperthink

tacit basin Mar 5, 2022, 9:37 PM

#

maiden kite wdym by some python coding experience? I am doing automatetheboringstuff book...

A year of coding (preferably Python) and high school math is the recommended pre-requisite. The best way to get up to speed is to start taking the current course new, and work to fill in any knowledge/expertise gaps you come across as you go.
https://mobile.twitter.com/jeremyphoward/status/1499600223920074754

Jeremy Howard (@jeremyphoward)

A year of coding (preferably Python) and high school math is the recommended pre-requisite. The best way to get up to speed is to start taking the current course new, and work to fill in any knowledge/expertise gaps you come across as you go.
https://t.co/nzv7pek0iq

maiden kite Mar 5, 2022, 9:53 PM

#

tacit basin A year of coding (preferably Python) and high school math is the recommended pre...

highschool math means 12 math right

#

like you need calculus and advanced functions

#

for the course

misty flint Mar 5, 2022, 9:55 PM

#

just try it

#

you should be able to fill in any knowledge gaps like they said

sterile heath Mar 5, 2022, 10:17 PM

#

grave frost its a pretty bad vid IMO, Derek doesn't look like he researched enough. Ever sin...

Hm. Okay. Good to know another view.

pastel valley Mar 6, 2022, 4:28 AM

#

how long is the cooldown with this?

serene scaffold Mar 6, 2022, 4:31 AM

#

pastel valley how long is the cooldown with this?

I'm not sure, but going forward, you should probably experiment with a smaller amount of data. you might also consider paying.

tacit basin Mar 6, 2022, 4:46 AM

#

pastel valley how long is the cooldown with this?

The more you use it the longer you have to wait I've read. It's like hours or days.
You could try transfer learning. Kaggle will give you around 30-40 hrs of GPU usage a week guaranteed. For now AWS sagemaker studio lab doesn't have limits other than 4hrs session, similar to paperspace but here GPU may be not available at times due to demand.

novel raven Mar 6, 2022, 5:12 AM

#

Hey

#

Would you need maths skill for data science?

#

If yes then what could it be

#

plus1

serene scaffold Mar 6, 2022, 5:45 AM

#

novel raven Would you need maths skill for data science?

you at least need to understand/be excited to learn about probability and statistics. if you want to do ML as well, you'd need to have the same relationship with linear algebra and calculus.

tacit basin Mar 6, 2022, 6:48 AM

#

novel raven If yes then what could it be

You can start with using high level APIs like scikit-learn and fill the gaps as you go.

spare moat Mar 6, 2022, 6:53 AM

#

!code

arctic wedgeBOT Mar 6, 2022, 6:53 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

karmic moth Mar 6, 2022, 7:19 AM

#

hi i have a question

#

model = Sequential()

model.add(Conv1D(filters=3, kernel_size=1, activation='relu', input_shape=(None, 3, 10, 1)))
# model.add(MaxPool1D(pool_size=3, strides=1))
# model.add(GlobalMaxPooling1D())

# model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
# model.add(MaxPool1D(pool_size=2, strides=2))
# model.add(GlobalMaxPooling1D())

model.add(Flatten())

model.add(Dense(units=128,activation='relu'))

model.add(Dense(units=1,activation='sigmoid'))

# For a binary classification problem
model.compile(loss='binary_crossentropy', optimizer='adam')

#

here is a cnn model code

#

im getting this error

#

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-4023ec9e66ce> in <module>
     11 model.add(Flatten())
     12 
---> 13 model.add(Dense(units=128,activation='relu'))
     14 
     15 model.add(Dense(units=1,activation='sigmoid'))


ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.

#

does anyone know why?

pastel valley Mar 6, 2022, 7:21 AM

#

you only have 1 available prediction

karmic moth Mar 6, 2022, 7:21 AM

#

what?

pastel valley Mar 6, 2022, 7:22 AM

#

or also maybe the input shape?

karmic moth Mar 6, 2022, 7:23 AM

#

so how can i fix?

#

isnt the input shape set on the first layer, which is the first Conv1D layer, and i have done that
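A possible reading of the error, offered as an assumption rather than a confirmed diagnosis: in Keras, `input_shape` leaves out the batch axis, and `Conv1D` expects each sample to be `(steps, channels)`. Writing `(None, 3, 10, 1)` therefore adds an unknown dimension of its own, and after `Flatten` the feature size is "unknown times something", which `Dense` cannot use to size its weights. The helper below is plain Python that mimics that bookkeeping; it is not Keras code, and `(10, 3)` is a hypothetical per-sample shape.

```python
def flatten_dim(shape):
    """Size of the flattened feature axis for a shape given without the batch axis.

    Returns None when any dimension is unknown, because the product is then
    unknown too.
    """
    size = 1
    for dim in shape:
        if dim is None:
            return None
        size *= dim
    return size

print(flatten_dim((None, 3, 10, 3)))  # None: Dense cannot size its weights
print(flatten_dim((10, 3)))           # 30
```

If each sample is a sequence of 10 steps with 1 feature, for example, the first layer would take `input_shape=(10, 1)`; the right values depend on how the data is actually laid out.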

pastel valley Mar 6, 2022, 7:23 AM

#

am also noob so i dont know if what i am saying is correct hahaha

#

but the shape should be square i think

#

and the units after your flatten() should be <= the units of flattened

#

also your final layer should be more than one unit because if it is only one then its predicting only a single class in your case i think sigmoid is for binary class

#

correct me if am wrong 😅

karmic moth Mar 6, 2022, 7:52 AM

#

i am predicting one value, a binary value, so its binary prediction

novel raven Mar 6, 2022, 7:55 AM

#

serene scaffold you at least need to understand/be excited to learn about probability and statis...

i love data & specially probabilities but im not the algebra type of person

novel raven Mar 6, 2022, 7:55 AM

#

tacit basin You can start with using high level APIs like scikit-learn and fill the gaps as...

i have used it before

violet gull Mar 6, 2022, 8:20 AM

#

https://paste.pythondiscord.com/utaqajurat can someone verify that the algorithms are working and its working as expected?

pastel valley Mar 6, 2022, 8:32 AM

#

karmic moth i am predicting one value, a binary value, so its binary prediction

oh my bad

violet gull Mar 6, 2022, 8:32 AM

#

also is there a certain matrix size inside the neural network that is most efficient

#

ex.
121 data points --> x --> y --> z --> 2

iron basalt Mar 6, 2022, 9:07 AM

#

violet gull also is there a certain matrix size inside the neural network that is most effic...

1x1.

#

= 1

violet gull Mar 6, 2022, 9:07 AM

#

iron basalt 1x1.

what?

iron basalt Mar 6, 2022, 9:07 AM

#

violet gull what?

A matrix of size 1x1 is most efficient.

violet gull Mar 6, 2022, 9:09 AM

#

are we talking about the same thing?

#

thats not exactly how ML works

iron basalt Mar 6, 2022, 9:09 AM

#

violet gull are we talking about the same thing?

Are we? What do you mean then?

violet gull Mar 6, 2022, 9:10 AM

#

so i input an array of size 121 * 121 right

#

and then i go through 3 layers of matrix multiplication to get the 121 into a 1x2 or a 2x1 i forgor

iron basalt Mar 6, 2022, 9:11 AM

#

"get the 121 into a 1x2 or a 2x1 i forgor" - I don't understand what this means.

karmic moth Mar 6, 2022, 10:43 AM

#

https://datascience.stackexchange.com/questions/108800/input-0-of-layer-max-pooling1d-3-is-incompatible-with-the-layer-error

Data Science Stack Exchange

Input 0 of layer max_pooling1d_3 is incompatible with the layer Error

Ok, so basically, i have some Tf-Idf features and some additional features like wordcount, sentiment on my data. Now, according to my knowledge, when we use Convolutional layer, the data needs to be

#

can someone help answer my question

lilac dagger Mar 6, 2022, 10:58 AM

#

know where to get started with data science and what it basically is

#

is it legit just analyzing shit tons of data for a motive

#

like facebook with their ad systems?

lapis sequoia Mar 6, 2022, 11:25 AM

#

lilac dagger know where to get started with data science and what it basically is

you can check pins snow.

lapis sequoia Mar 6, 2022, 11:26 AM

#

lilac dagger is it legit just analyzing shit tons of data for a motive

it helps in various ways. it can be used for analytics, predictions, classification, reinforcement learning, problem solving and well.. hella hella stuff.

lilac dagger Mar 6, 2022, 11:27 AM

#

icic

tacit basin Mar 6, 2022, 11:41 AM

#

Crazy idea: Neovim but like jupyter Notebook. So Neovim Notebooks! Possible?

modest shuttle Mar 6, 2022, 1:56 PM

#

Hello,
rect = win32gui.GetWindowRect(hwnd)
I grab my screen for object detection but i want grab specific section of my screen, How can i do that?

gloomy anvil Mar 6, 2022, 3:00 PM

#

hello y'all! I created a SARIMAX model and need some help evaluating the Results:

#

#

I mean this looks quite good at first glance, right? But is it? The RMSE is 0.024718 when comparing actual vs. prediction

#

I posted my code and my approach to: https://www.reddit.com/r/learnmachinelearning/comments/t7yznq/i_need_help_evaluating_my_results_interpreting_my/

r/learnmachinelearning - I need help evaluating my results / interp...

0 votes and 0 comments so far on Reddit

#

Could you maybe have a look at it?

desert oar Mar 6, 2022, 3:58 PM

#

gloomy anvil I mean this looks quite good at first glance, right? But is it? The RMSE is 0.02...

what is the actual scale of the variable?

#

rmse of 0.025 on values on the order of ~2 seems good to me!

#

however it looks like your model testing procedure is probably not valid

#

you don't want to just check a bunch of one-step-ahead forecasts, obviously those will always be good

#

you need a train/test split

#

or better yet cross validation

#

https://otexts.com/fpp3/tscv.html @gloomy anvil

5.10 Time series cross-validation | Forecasting: Principles and Pra...

3rd edition

fallow frost Mar 6, 2022, 4:06 PM

#

Any body in the data science/ analytics field?
I wanna ask how much more do i need to know to get a basic/ junior data analyst position

gloomy anvil Mar 6, 2022, 4:07 PM

#

desert oar what is the actual scale of the variable?

So this is the closing price that I am trying to predict:

count    171.000000
mean       1.906868
std        0.505193
min        1.056412
25%        1.393226
50%        1.988028
75%        2.233953
max        2.968611
Name: close, dtype: float64

#

this is the description of my test dataset. I split it into 1000 rows for training and 171 rows for the test.

#

This is my code:

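#imports assumed from the calls used below
import math
import pandas as pd
import matplotlib.pyplot as plt
from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

#load dataset
df = pd.read_csv('ADA_1440.csv', index_col = 'date', parse_dates = True)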

#split the closing price into train and test data
train = df.iloc[:1000,4]
test = df.iloc[1000:,4]

#select exogenous variables
exo = df.iloc[:,6:61]

#split exogenous variables into train and test data
exo_train = exo.iloc[:1000]
exo_test = exo.iloc[1000:]

#run auto_arima to find the best configuration (I selected m=7 and D=1 by running seasonal_decompose and acf and pacf plots)
auto_arima(df['close'], exogenous=exo, m=7, trace=True, D=1).summary()

#set the best configuration from auto_arima for the SARIMAX model 
Model = SARIMAX(train, exog = exo_train, order=(1,0,2), seasonal_order = (0,1,1,7))

#train model
Model = Model.fit()

#get prediction
prediction = Model.predict(len(train), len(train)+len(test)-1, exog = exo_test, typ = 'levels')

#plot the prediction
plt.plot(test, color ='red', label = 'Actual')
plt.plot(prediction, color ='blue', label = 'Prediction')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()

#calculate rmse
rmse = math.sqrt(mean_squared_error(test, prediction))

gloomy anvil Mar 6, 2022, 4:11 PM

#

desert oar https://otexts.com/fpp3/tscv.html

thanks for the cross-validation link. I didn't think about this because I am working with a continuous time series. My assumption was that cross-validation is not possible for time series

serene scaffold Mar 6, 2022, 4:19 PM

#

novel raven i love data & specially probabilities but im not the algebra type of person

don't say that you're "not an algebra type of person". algebra is pretty much the most basic level of math, and if you let a prior experience dictate whether or not you like it, you're setting yourself up for disappointment.

misty flint Mar 6, 2022, 4:29 PM

#

fallow frost Any body in the data science/ analytics field? I wanna ask how much more do i ne...

best way to find out is to apply to a few jobs tbh

#

for entry-level data analyst positions, the amount of knowledge needed isnt usually too much

#

but the issue is how competitive those positions are, especially with people making career changes

#

sometimes you are competing with people with graduate degrees, work experience in a certain domain, etc.

#

so you usually need something special to stand out

graceful glacier Mar 6, 2022, 5:58 PM

#

hello!

#

i need help creating a dataframe with a multi index

serene scaffold Mar 6, 2022, 6:02 PM

#

graceful glacier i need help creating a dataframe with a multi index

please give more information; you haven't said enough for anyone to know what you need

graceful glacier Mar 6, 2022, 6:03 PM

#

sorry about that im still thinking through it

serene scaffold Mar 6, 2022, 6:03 PM

#

sounds good. I might be able to check this channel again when your question is ready.

modern cypress Mar 6, 2022, 6:09 PM

#

Hey, I am trying to change my project to be able to detect multiple objects in a single instance. All my images are annotated using pascal, but I am unsure where to go from here. Previously I had a "default" class filled with many random images, but I realized this is very incorrect (in my mind) and I would rather use general object detection and maybe add some bounding boxes if time allows

#

My file breakdown looks like this:

#

graceful glacier Mar 6, 2022, 6:11 PM

#

i like the profile picture tony

modern cypress Mar 6, 2022, 6:11 PM

#

thanks 🤣

somber prism Mar 6, 2022, 6:12 PM

#

modern cypress Hey, I am trying to change my project to be able to detect multiple objects in a...

🤔 what you mean by general obj detection ? you mean you want your target label to have xmin, ymin, xmax, ymax ? if thats the case then pascal voc format is exactly that

modern cypress Mar 6, 2022, 6:15 PM

#

somber prism 🤔 what you mean by general obj detection ? you mean you want your target label ...

Hmm yeah I guess so. Would that allow me to detect multiple objects in the same image?

#

This is my first time trying something of this nature

#

#

These are some results of the older project but I realised that I was treating an ordinary image as a class instead of it being the default setting, if that makes sense

somber prism Mar 6, 2022, 6:17 PM

#

btw does anyone know how to plot a normalized image? i mean, after normalizing the image using albumentations and converting the image range to -1 to 1, matplotlib displays a black image. how can i avoid that? i even tried plt.imshow((image * 255).to(torch.uint8))
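
One possible way to handle this, sketched with NumPy under two assumptions that the message does not confirm: the normalization used mean 0.5 and std 0.5 (so pixels really are in [-1, 1]), and the image is channels-first (C, H, W). `to_displayable` is a hypothetical helper name. `plt.imshow` expects float RGB data in [0, 1] or integer data in [0, 255], and multiplying a [-1, 1] image by 255 leaves half the values negative, which is a likely reason for the dark output.

```python
import numpy as np


def to_displayable(image_chw):
    """Map a channels-first float image in [-1, 1] to a channels-last image in [0, 1]."""
    image = np.asarray(image_chw, dtype=np.float32)
    image = (image + 1.0) / 2.0            # undo the assumed mean=0.5, std=0.5 normalization
    image = np.clip(image, 0.0, 1.0)       # guard against small numeric overshoot
    return np.transpose(image, (1, 2, 0))  # (C, H, W) -> (H, W, C), the layout imshow expects


# Toy stand-in for a normalized image; a torch tensor would need .cpu().numpy() first.
normalized = np.random.uniform(-1.0, 1.0, size=(3, 4, 5)).astype(np.float32)
displayable = to_displayable(normalized)
```

`plt.imshow(displayable)` should then show the picture. If a different mean and std were used, the first step becomes `image * std + mean` instead.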

somber prism Mar 6, 2022, 6:17 PM

#

modern cypress Hmm yeah I guess so. Would that allow me to detect multiple objects in the same ...

rn almost every object detection model can detect multiple objs in an image

modern cypress Mar 6, 2022, 6:20 PM

#

somber prism rn almost every object detection model can detect multiple objs in an image

Oh? This is my model currently. Maybe this is happening because I'm taking the argmax of the prediction

#

somber prism Mar 6, 2022, 6:21 PM

#

what you are trying is image classification

modern cypress Mar 6, 2022, 6:22 PM

#

Ohhhhhhh right. Okay time to look into object detection approaches

#

Thanks for the help

violet gull Mar 6, 2022, 6:34 PM

#

https://paste.pythondiscord.com/ekavewuzaf how fix and be less suck?

#

i ran it for an hour and it stopped increasing at about a score of 165

woeful tusk Mar 6, 2022, 6:55 PM

#

Any tips to plot this, all blocks on the same plot? I have it inside a DataFrame. My line of thought was iterating through it each column but I guess iterating and dataframe shouldnt work together, right?

serene scaffold Mar 6, 2022, 6:56 PM

#

woeful tusk Any tips to plot this, all blocks on the same plot? I have it inside a DataFrame...

avoid thinking about iterating when you're working with dataframes.

#

what kind of plot are we talking about?

#

do you want line plots, where each block is a line?

woeful tusk Mar 6, 2022, 6:57 PM

#

serene scaffold do you want line plots, where each block is a line?

Yea

#

I was thinking of making it dynamically, since the number of blocks can change based on user input

serene scaffold Mar 6, 2022, 6:58 PM

#

woeful tusk Yea

I would first do df.index = df.index.str.extract(r'(\d+)').astype(int) so that the index is ints instead of strings

#

and then you can use df.plot.line(). it might even work just like that, without any additional work

#

you might have to transpose it. but then that's just df.T.plot.line()

woeful tusk Mar 6, 2022, 7:00 PM

#

serene scaffold you might have to transpose it. but then that's just `df.T.plot.line()`

I thought of that too, I've used that Block as column format because it's easier to visualize on excel

serene scaffold Mar 6, 2022, 7:00 PM

#

woeful tusk I thought of that too, I've used that Block as column format because it's easier...

I don't know what you mean by this. did df.plot.line() look like something other than what you expected?

#

I would need a code representation of your dataframe that I can c/p to experiment.

woeful tusk Mar 6, 2022, 7:02 PM

#

serene scaffold I don't know what you mean by this. did `df.plot.line()` look like something oth...

I mean, the DF has the blocks as columns because my original project only had an Excel file as output. I thought of transposing now, before plotting

serene scaffold Mar 6, 2022, 7:03 PM

#

woeful tusk I mean, the DF is the blocks as columns because my original project only had an ...

you'll have to do print(df.head().to_dict('list')) and show the text for us to continue.

woeful tusk Mar 6, 2022, 7:04 PM

#

{'Bloco 1': [6000.0, 6000.0, 6000.0, 6000.0, 5996.913420966637], 'Bloco 2': [6000.0, 6000.0, 6000.0, 5986.342797261716, 5963.890663247039], 'Bloco 3': [6000.0, 6000.0, 5939.570902083334, 5873.3415172031355, 5809.641970812106], 'Bloco 4': [6000.0, 5732.619047619048, 5586.096291071429, 5478.48851392744, 5386.497501391264], 'Bloco 5': [6000.0, 6000.0, 5939.570902083334, 5859.684314464852, 5773.532634059145]}

serene scaffold Mar 6, 2022, 7:04 PM

#

let me see

#

is this not basically what you want?

woeful tusk Mar 6, 2022, 7:05 PM

#

There are only five values inside each list, but it goes further.

#

Yea

serene scaffold Mar 6, 2022, 7:05 PM

#

so what's the problem? didn't I basically give you the solution?

woeful tusk Mar 6, 2022, 7:07 PM

#

I was getting an error on the df.index =.... line

serene scaffold Mar 6, 2022, 7:07 PM

#

okay, so show the error

woeful tusk Mar 6, 2022, 7:07 PM

#

I guess it's because my index names have "Day" on the string

serene scaffold Mar 6, 2022, 7:07 PM

#

saying that you "got an error" is uninformative. copy and paste the error from Traceback

#

also, you can label the x axis as "day" with xlabel='Day'

woeful tusk Mar 6, 2022, 7:09 PM

#

ValueError: Index data must be 1-dimensional

serene scaffold Mar 6, 2022, 7:09 PM

#

I asked you to copy and paste the error from Traceback.

#

!traceback

arctic wedgeBOT Mar 6, 2022, 7:10 PM

#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

woeful tusk Mar 6, 2022, 7:10 PM

#

Traceback (most recent call last):
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\simuladorexplicito.py", line 89, in <module>
    relatorio.index = relatorio.index.str.extract(r'(\d+)').astype(int)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\generic.py", line 5596, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas\_libs\properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\generic.py", line 768, in _set_axis
    return Index(np.asarray(data), dtype=dtype, copy=copy, name=name, **kwargs)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\indexes\base.py", line 503, in __new__
    arr = klass._ensure_array(arr, dtype, copy)
  File "C:\Users\joao_\Desktop\Projetos Python\Simulador de pressão\.venv\lib\site-packages\pandas\core\indexes\numeric.py", line 183, in _ensure_array
    raise ValueError("Index data must be 1-dimensional")
ValueError: Index data must be 1-dimensional

serene scaffold Mar 6, 2022, 7:10 PM

#

okay, can you do print(df.index)?

woeful tusk Mar 6, 2022, 7:11 PM

#

Index(['Dia 0', 'Dia 15', 'Dia 30', 'Dia 45', 'Dia 60', 'Dia 75', 'Dia 90',
       'Dia 105', 'Dia 120', 'Dia 135', 'Dia 150', 'Dia 165', 'Dia 180',
       'Dia 195', 'Dia 210', 'Dia 225', 'Dia 240', 'Dia 255', 'Dia 270',
       'Dia 285', 'Dia 300', 'Dia 315', 'Dia 330', 'Dia 345', 'Dia 360'],
      dtype='object')

#

Btw, do I need an extension to plot in VS Code? The plot line runs fine, but shows nothing

serene scaffold Mar 6, 2022, 7:13 PM

#

I don't use vs code

#

try

df.index = df.index.str.extract(r'(\d+)').astype(int).squeeze().tolist()

#

there's probably a better way to do it. somewhere.
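
One candidate for that better way, as a sketch: `str.extract` returns a DataFrame by default (`expand=True`), which is 2-D and is what triggers "Index data must be 1-dimensional". With `expand=False` and a single capture group it returns an Index instead, so no `squeeze().tolist()` is needed. The frame below is a cut-down copy of the data posted above, not the full report:

```python
import pandas as pd

# Three rows of two of the posted columns, with the same "Dia N" style index.
df = pd.DataFrame(
    {
        "Bloco 1": [6000.0, 6000.0, 6000.0],
        "Bloco 4": [6000.0, 5732.619047619048, 5586.096291071429],
    },
    index=["Dia 0", "Dia 15", "Dia 30"],
)

# expand=False keeps the result 1-D, so it can be assigned straight to the index.
df.index = df.index.str.extract(r"(\d+)", expand=False).astype(int)
```

After that, `df.plot.line(xlabel='Day')` followed by `plt.show()` draws one line per block against the day number.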

woeful tusk Mar 6, 2022, 7:17 PM

#

Had to do a plt.show()

#

serene scaffold Mar 6, 2022, 7:17 PM

#

looks like you need to transpose it.

woeful tusk Mar 6, 2022, 7:18 PM

#

That was with transpose already, gonna try without it

#

#

Worked, thank you very much mate

serene scaffold Mar 6, 2022, 7:27 PM

#

🔥

violet gull Mar 6, 2022, 7:43 PM

#

https://paste.pythondiscord.com/ekavewuzaf how fix and be less suck?
i ran it for an hour and it stopped increasing at about a score of 165

serene scaffold Mar 6, 2022, 8:30 PM

#

violet gull https://paste.pythondiscord.com/ekavewuzaf how fix and be less suck? i ran it fo...

165 out of what?

violet gull Mar 6, 2022, 8:30 PM

#

what

serene scaffold Mar 6, 2022, 8:30 PM

#

you got a score of 165. idk what that means.

violet gull Mar 6, 2022, 8:31 PM

#

there is a data set of 500 squares and 500 not squares

#

for every data thing it correctly identifies it gets a point

#

and for every one it does wrong it loses a point

#

so 165 means it got 417.5 wrong and 582.5 right i think?
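
The arithmetic behind that, as a small sketch: with `right + wrong = total` and `right - wrong = score`, it follows that `right = (total + score) / 2`. `counts_from_score` is a made-up helper name:

```python
def counts_from_score(score, total):
    """Recover (right, wrong) from a +1/-1 score over `total` examples."""
    right = (total + score) / 2
    wrong = total - right
    return right, wrong


right, wrong = counts_from_score(165, 1000)  # (582.5, 417.5)
```

The half counts show that 165 cannot be a single run's score: with 1000 examples, `right - wrong` is always even.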

serene scaffold Mar 6, 2022, 8:33 PM

#

violet gull so 165 means it got 417.5 wrong and 582.5 right i think?

you should use an actual metric, like precision or recall. do you know what true positive, false positives, true negatives, and false negatives are?
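
A minimal sketch of those metrics in plain Python, assuming the labels and predictions are lists of booleans where True means "square". The function names and the toy data are made up for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything predicted "square", how much was right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of all real squares, how many were found
    accuracy = (tp + tn) / len(y_true)                # share of all predictions that were right
    return precision, recall, accuracy


# Toy example: three squares and three non-squares.
y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, True, False, False]
precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
```

Unlike a +1/-1 score, each of these is a fraction between 0 and 1, so it can be compared across datasets of different sizes.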

violet gull Mar 6, 2022, 8:33 PM

#

yes

#

wym by an actual metric

serene scaffold Mar 6, 2022, 8:35 PM

#

saying that you "got 417.5 wrong and 582.5 right" is vague, whereas reporting the score for a performance metric is specific.

#

also how did you get some partially correct?

violet gull Mar 6, 2022, 8:35 PM

#

it didnt

#

it never actually outputted 165

#

it only outputs even numbers

#

165 was just an average i saw

#

how do i make a performance metric

serene scaffold Mar 6, 2022, 8:39 PM

#

well, let's go over a few issues with the code first.

notSquare = square  # This **does not** make a copy, it just makes another reference
if self.classify(square) == True:  # Never do comparisons to True or False. if `self.classify(square)` is already True, you're just writing `if True == True`

You're also using lowerCamelCase for everything, when you should be using UpperCamelCase for class names and snake_case for everything else.
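
A small sketch of both points, using a toy nested list and a hypothetical `classify` stand-in, since the real classifier lives in the paste:

```python
import copy

# Plain assignment: both names refer to the same nested list.
square = [[1, 1], [1, 1]]
not_square = square
not_square[0][0] = 0
aliased_original_changed = square[0][0] == 0  # True: the "original" was edited too

# deepcopy: the nested rows are duplicated, so the original is left alone.
square = [[1, 1], [1, 1]]
not_square = copy.deepcopy(square)
not_square[0][0] = 0
copied_original_changed = square[0][0] == 0  # False


def classify(grid):
    """Hypothetical stand-in for the classifier; returns a bool."""
    return all(all(row) for row in grid)


# The boolean can be used directly; no `== True` is needed.
label = "square" if classify(square) else "not square"
```

For a flat list, `list(square)` or `square.copy()` would be enough; `copy.deepcopy` matters once the list contains other lists.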

vague kindle Mar 6, 2022, 8:42 PM

#

serene scaffold well, let's go over a few issues with the code first. ```py notSquare = square ...

Wouldn't the code still technically work anyway?

violet gull Mar 6, 2022, 8:43 PM

#

^

vague kindle Mar 6, 2022, 8:43 PM

#

If you were using camel case

serene scaffold Mar 6, 2022, 8:43 PM

#

I don't think I can dive into what it would take to improve the performance, as you've written a lot of it in "pure python" and I'm used to reading code where the numpy/torch style is used more extensively.

serene scaffold Mar 6, 2022, 8:44 PM

#

vague kindle Wouldn't the code still technically work anyway?

not the part about notSquare = square. but I was just making suggestions as I was reading through the code, regardless of whether they were logic or style errors.

violet gull Mar 6, 2022, 8:44 PM

#

hmmm

#

so is it broken?

#

classify square returns a boolean

serene scaffold Mar 6, 2022, 8:44 PM

#

I'm not sure what it's intended to do.

violet gull Mar 6, 2022, 8:44 PM

#

which part

serene scaffold Mar 6, 2022, 8:45 PM

#

all of it. what is the model supposed to predict, for what inputs?

violet gull Mar 6, 2022, 8:45 PM

#

it's supposed to take an array like
```
test = [[
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]]
```

serene scaffold Mar 6, 2022, 8:46 PM

#

like, if the region of 1s is square?

violet gull Mar 6, 2022, 8:46 PM

#

yes

serene scaffold Mar 6, 2022, 8:46 PM

#

you don't need ML for that?
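
For comparison, a rough sketch of a non-ML check, assuming the input is one flat list of 0s and 1s for an 11x11 grid (like the inner list above) and that "square" means a single filled block of 1s with equal height and width. `is_filled_square` and `make_grid` are made-up names:

```python
def is_filled_square(flat, width):
    """Return True if the 1s form one solid block whose height equals its width."""
    rows = [flat[i:i + width] for i in range(0, len(flat), width)]
    ones = [(r, c) for r, row in enumerate(rows) for c, value in enumerate(row) if value == 1]
    if not ones:
        return False
    row_ids = [r for r, _ in ones]
    col_ids = [c for _, c in ones]
    height = max(row_ids) - min(row_ids) + 1
    box_width = max(col_ids) - min(col_ids) + 1
    # Square bounding box, and every cell inside the box is a 1.
    return height == box_width and len(ones) == height * box_width


def make_grid(width, top, left, box_height, box_width):
    """Hypothetical helper: flat width*width grid containing one filled rectangle of 1s."""
    return [
        1 if top <= r < top + box_height and left <= c < left + box_width else 0
        for r in range(width)
        for c in range(width)
    ]


square_grid = make_grid(11, top=1, left=4, box_height=6, box_width=6)     # same layout as the posted example
rectangle_grid = make_grid(11, top=1, left=4, box_height=6, box_width=5)
```

A rule like this is also handy as a labelling function, to generate ground truth for the ML version to be scored against.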

violet gull Mar 6, 2022, 8:46 PM

#

well i want to

serene scaffold Mar 6, 2022, 8:46 PM

#

also you have [[ and ]] but each row isn't its own list

violet gull Mar 6, 2022, 8:46 PM

#

ye

#

im just trying to learn how machine learning works

#

:C

serene scaffold Mar 6, 2022, 8:48 PM

#

        for square in squares:
            if self.classify(square) == True:
                score += 1
            else:
                score -= 1

you don't subtract when a model makes the wrong prediction.

violet gull Mar 6, 2022, 8:48 PM

#

yes i do