#data-science-and-ml | Python | Page 311

broken quail May 12, 2021, 2:44 PM

#

can you give list of field?

hoary wigeon May 12, 2021, 2:46 PM

#

sure

#

Columns Name : ['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC', 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition', 'SalePrice', 'GrLivAreaGroup', 'PricePerGRLA']

broken quail May 12, 2021, 2:49 PM

#

what the hell so much hahaha... wait

#

maybe... housing style & average price (barplot). (i don't know what utilities is) but you can try utilities with landslope

it would help more if you had the dataset link or metadata

south crag May 12, 2021, 2:57 PM

#

hoary wigeon sure

is this the kaggle dataset

#

or is it something u r given

broken quail May 12, 2021, 2:58 PM

#

do you have the link? there are so many different housing datasets on kaggle

south crag May 12, 2021, 2:58 PM

#

I would suggest u to pick a subset randomly and pairplot to observe which features r important

#

or u can just preprocess things

#

build a model in keras and try using l2 regularisation

#

it will help to eliminate features

lapis sequoia May 12, 2021, 3:13 PM

#

enc = OneHotEncoder()

#

It creates a matrix, https://www.youtube.com/watch?v=irHhDMbw3xo

YouTube

Data School

How do I encode categorical features using scikit-learn?

In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?

In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a...

#

https://www.kaggle.com

Kaggle: Your Machine Learning and Data Science Community

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

desert oar May 12, 2021, 3:18 PM

#

Think of it this way: if you sample without replacement from a finite population, eventually you start running out of members in the population and the distribution starts to shift away from what it was originally.
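
A small sketch of that point, assuming a made-up urn of 3 red and 2 blue items (`p_red_second_draw` is a hypothetical helper, not from any library). It compares the chance of a second red draw with and without replacement:

```python
from fractions import Fraction


def p_red_second_draw(red, blue, replace):
    """P(second draw is red | first draw was red) for an urn of red/blue items."""
    if replace:
        # The first item goes back in, so the population is unchanged.
        return Fraction(red, red + blue)
    # The first red item is gone, so both the red count and the total shrink.
    return Fraction(red - 1, red + blue - 1)


with_replacement = p_red_second_draw(3, 2, replace=True)
without_replacement = p_red_second_draw(3, 2, replace=False)
print(with_replacement, without_replacement)
```

With replacement the probability stays at the original 3/5. Without it, the second draw already sees a different population.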

lapis sequoia May 12, 2021, 3:18 PM

#

south crag I would suggest u to pick a subset randomly and pairplot to observe which featur...

In addition to that I'd recommend using seaborn pairplot to find a good correlation between your target and features. @hoary wigeon

neon marsh May 12, 2021, 3:20 PM

#

Is anyone here familiar with Human Action Recognition or HAR?

bold timber May 12, 2021, 3:23 PM

#

lapis sequoia enc = OneHotEncoder()

Sorry, can u explain clearly to me, step by step, how to build the array [0.0 0.0 1.0 27.0 48000.0] for 'Spain'?

lapis sequoia May 12, 2021, 3:26 PM

#

Just watch the video

#

It explains how to build it

#

I'd recommend not using fit_transform() but using fit() and then transform()

fringe rapids May 12, 2021, 3:39 PM

#

Anyone have any guidance how quadratic programming works?

hoary wigeon May 12, 2021, 3:43 PM

#

south crag is this the kaggle dataset

its something that i got from sir as assignment.

hoary wigeon May 12, 2021, 3:45 PM

#

lapis sequoia In addition to that I'd recommend using seaborn pairplot to find a good correlat...

shall i generate a pairplot by taking 4 random features and filter the good ones ?

lapis sequoia May 12, 2021, 3:48 PM

#

up to you

#

note that the more features you give to the pairplot, the longer it will take to plot

#

around 5-10 minutes for all your features (depending on how many values you have in each feature)

hoary wigeon May 12, 2021, 3:50 PM

#

lapis sequoia note that the more features you give to the pairplot the longer it will take to p...

yeah i know

#

i'm confused about which columns can be helpful for me

#

I think i must try heatmap first

lapis sequoia May 12, 2021, 3:51 PM

#

just plot all, go and grab a drink or go for a walk

hoary wigeon May 12, 2021, 3:51 PM

#

then relate

lapis sequoia May 12, 2021, 3:51 PM

#

and see the result

#

Don't forget that preprocessing is mainly human based

hoary wigeon May 12, 2021, 3:52 PM

#

yeh

grave frost May 12, 2021, 3:57 PM

#

lapis sequoia Don't forget that preprocessing is mainly human based

you can use automatic tools for basic datasets tho

#

try PyCaret - it has outstanding feature engineering capabilities

lapis sequoia May 12, 2021, 3:59 PM

#

grave frost you can use automatic tools for basic datasets tho

are you telling me i can be even more lazy than i already am!?

#

😮

south crag May 12, 2021, 4:02 PM

#

lolll xD

hoary wigeon May 12, 2021, 4:03 PM

#

restriction : use only python

south crag May 12, 2021, 4:03 PM

#

hoary wigeon restriction : ` use only python `

yeah seaborn is python library

lapis sequoia May 12, 2021, 4:04 PM

#

grave frost you can use automatic tools for basic datasets tho

so it auto-parcels the dataframe picking the best features that can be utilized for the model?

hoary wigeon May 12, 2021, 4:05 PM

#

shall i drop all object column for finding correlation ?

#

shall i consider only int and float ?

south crag May 12, 2021, 4:05 PM

#

hoary wigeon restriction : ` use only python `

I would honestly ask u to look into the kaggle housing dataset code tab .. many people have implemented different solutions there ... most of them should work for ur dataset

hoary wigeon May 12, 2021, 4:06 PM

#

south crag I would honestly ask u to loook into kaggle housing dataset code tab .. many peo...

i guess no

#

its a challenge

hoary wigeon May 12, 2021, 4:08 PM

#

hoary wigeon shall i drop all object column for finding correlation ?

clear my this doubt ?

south crag May 12, 2021, 4:09 PM

#

no

#

u have to preprocess it

#

remove null things and stuff
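
A minimal sketch of one way to look at correlations before deciding what to drop. It assumes a tiny made-up frame that borrows a few column names from the list above; the values are invented for illustration. Only the correlation step is restricted to numeric columns, and the object columns stay in the frame for later encoding:

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset discussed above.
df = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "GrLivArea": [900, 1200, 1500, 1800, 2400],
    "LotFrontage": [60.0, None, 70.0, 80.0, 65.0],
    "MSZoning": ["RL", "RM", "RL", "RL", "FV"],
    "SalePrice": [120000, 150000, 185000, 230000, 310000],
})

# Count missing values per column before deciding how to fill or drop them.
missing = df.isna().sum()

# Correlation only makes sense for numeric columns, so select those for this step.
numeric = df.select_dtypes(include="number")
corr_with_target = (
    numeric.corr()["SalePrice"].drop("SalePrice").sort_values(ascending=False)
)

print(missing)
print(corr_with_target)
```

The same `numeric.corr()` matrix is what a heatmap would display, so this is also a cheap first pass before a full pairplot.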

grave frost May 12, 2021, 4:10 PM

#

lapis sequoia so it auto-parcels the dataframe picking the best features that can be utilized ...

yeah, it's pretty basic stuff that you can write on your own; just that brute-force mostly wins out over other 'human' alternatives

lapis sequoia May 12, 2021, 4:11 PM

#

grave frost yeah, it's pretty basic stuff that you can write on your own; just that brute-fo...

yup, i guess it is for someone who doesn't have a lot of time on coding and is stuck in constant meetings

#

hence can run the code in the background while working

grave frost May 12, 2021, 4:11 PM

#

lapis sequoia yup, i guess it is for someone who doesn't have a lot of time on coding and is s...

it's mostly as a starter point for a dataset.

#

you can't expect to beat SOTA or anything with it

lapis sequoia May 12, 2021, 4:11 PM

#

you tried it? how slow is it?

grave frost May 12, 2021, 4:11 PM

#

lapis sequoia you tried it? how slow is it?

20-30mins. does automatic EDA for me, which I find useful

lapis sequoia May 12, 2021, 4:12 PM

#

not bad, how big is the dataset?

grave frost May 12, 2021, 4:12 PM

#

lapis sequoia not bad, how big is the dataset?

mine was about 1000 rows

lapis sequoia May 12, 2021, 4:13 PM

#

it is fine i guess

#

would be amazing for a project that you don't want to do

#

🤣

grave frost May 12, 2021, 4:13 PM

#

uh-huh. won't give much accuracy tho

lapis sequoia May 12, 2021, 4:14 PM

#

what's the score difference?

grave frost May 12, 2021, 4:14 PM

#

so not a one-stop solution, but not bad

lapis sequoia May 12, 2021, 4:14 PM

#

between handpicking and that lib?

grave frost May 12, 2021, 4:14 PM

#

dunno - it uses traditional ML algos which I never want to see results from

lapis sequoia May 12, 2021, 4:15 PM

#

i guess it uses sklearn

grave frost May 12, 2021, 4:15 PM

#

lapis sequoia i guess it uses sklearn

most probably

lapis sequoia May 12, 2021, 4:15 PM

#

but bro

#

#

🤣

grave frost May 12, 2021, 4:16 PM

#

hefty

lapis sequoia May 12, 2021, 4:16 PM

#

reminds me of the mom can we have x at home

grave frost May 12, 2021, 4:16 PM

#

you won't use most of them anyways.

#

and models have more arguments lol

lapis sequoia May 12, 2021, 4:16 PM

#

lapis sequoia

"Mom can we have machine learning?"
"We already have ML at home"
ML at home

lapis sequoia May 12, 2021, 4:18 PM

#

grave frost you won't use most of them anyways.

what do you use, PyTorch or TensorFlow?

desert oar May 12, 2021, 4:18 PM

#

caret is already "we have X at home"

#

I kid. It wasn't a bad library for its time

#

Makes sense to port it to python. Much less verbose than sklearn

#

Good for quick things

grave frost May 12, 2021, 4:18 PM

#

lapis sequoia what do you use, PyTorch or TensorFlow?

TF mostly, Pytorch for complex tensor manipulation. though, I recently started using PT for modelling and I can say it's much easier to debug

south crag May 12, 2021, 4:25 PM

#

hoary wigeon clear my this doubt ?

https://www.kaggle.com/learn/intermediate-machine-learning for data preprocessing

Learn Intermediate Machine Learning Tutorials

Handle missing values, non-numeric values, data leakage, and more.

south crag May 12, 2021, 4:27 PM

#

hoary wigeon i guess no

https://www.kaggle.com/learn/intro-to-machine-learning in this one , they tell us how to solve kaggle housing data using random forest

Learn Intro to Machine Learning Tutorials

Learn the core ideas in machine learning, and build your first models.

lapis sequoia May 12, 2021, 4:28 PM

#

grave frost TF mostly, Pytorch for complex tensor manipulation. though, I recently started u...

Tried Scala? Wanna try it out and see what it is about

bronze skiff May 12, 2021, 4:40 PM

#

lapis sequoia Tried Scala? Wanna try it out and see what it is about

one of these things is not like the other

lapis sequoia May 12, 2021, 4:41 PM

#

bronze skiff one of these things is not like the other

?

lapis sequoia May 12, 2021, 5:48 PM

#

@bold timber message here

bold timber May 12, 2021, 5:51 PM

#

lapis sequoia ?

Sorry, I still don't understand this. I watched the video you suggested, but it is quite different from my case.

I know 'France' is encoded as [1.0 0.0 0.0 44.0 72000.0] because 'France' shows up first, so its other dummy values are 0.

How about 'Spain'?
Spain shows up in the second row of the table. Why are its first two dummy values 0 and 0, and not 0 and 1?

#

Please explain it clearly. I am so confused about that

lapis sequoia May 12, 2021, 5:54 PM

#

Alphabetical

#

F - G - S

#

France = [1, 0, 0]
Germany = [0, 1, 0]
Spain = [0, 0, 1]
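
A short sketch of the ordering described above, assuming scikit-learn's `OneHotEncoder` with default settings and a made-up country column. The encoder stores the categories in sorted order, independent of the row order in the data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Row order is France, Spain, Germany; the encoder does not keep this order.
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])

enc = OneHotEncoder()
enc.fit(countries)

categories = list(enc.categories_[0])                 # sorted category labels
spain = enc.transform([["Spain"]]).toarray()          # one row, one column per category

print(categories)
print(spain)
```

Each column position corresponds to one entry in `enc.categories_`, which is why 'Spain' lands in the third column even though it appears second in the data.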

lapis sequoia May 12, 2021, 5:56 PM

#

bold timber Sorry I still don't understand about this. I had been watching your suggest vide...

.

bold timber May 12, 2021, 5:56 PM

#

Oh my god, I thought of that before, but I still didn't believe it

#

ok. thank u so much!

lapis sequoia May 12, 2021, 5:57 PM

#

np

bold timber May 12, 2021, 5:59 PM

#

lapis sequoia np

But why do the 'Age' and 'Salary' values not change?

lapis sequoia May 12, 2021, 6:02 PM

#

they are values

#

they are not strings

#

onehot transforms strings

#

not values

#

unless you tell it to transform them

bold timber May 12, 2021, 6:05 PM

#

lapis sequoia they are values

Is the reason that I'm using remainder='passthrough'?

lapis sequoia May 12, 2021, 6:05 PM

#

yup it basically passes through
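
A minimal sketch of the passthrough behaviour, assuming a three-row frame: the France and Spain rows use the values quoted in this conversation, and the Germany row is made up for illustration. `sparse_threshold=0` is set only to force a dense array for printing:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44.0, 27.0, 30.0],               # Germany's values are invented
    "Salary": [72000.0, 48000.0, 54000.0],
})

ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(), ["Country"])],
    remainder="passthrough",   # untouched columns are appended after the dummies
    sparse_threshold=0,        # always return a dense NumPy array
)
X = ct.fit_transform(df)
print(X)
```

Only the column listed in `transformers` is encoded. 'Age' and 'Salary' are copied through unchanged and end up after the three dummy columns.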

bold timber May 12, 2021, 6:05 PM

#

So those values are kept as they are

bold timber May 12, 2021, 6:05 PM

#

lapis sequoia yup it basically passes through

Oh yeah I understand now

lapis sequoia May 12, 2021, 6:06 PM

#

👍

bold timber May 12, 2021, 6:08 PM

#

lapis sequoia 👍

Why do I get one new dummy column when I run it a second time?

#

And so on...

lapis sequoia May 12, 2021, 6:11 PM

#

I'd recommend not using fit_transform() but using fit() and then transform()

bold timber May 12, 2021, 6:11 PM

#

lapis sequoia I'd recommend not using fit_transform() but using fit() and then transform()

How I can do that?

#

where i place that code in a cell?

lapis sequoia May 12, 2021, 6:11 PM

#

you literally substitute fit_transform with fit and then transform

#

why are you calling fit_transform on your test set?

bold timber May 12, 2021, 6:13 PM

#

lapis sequoia why are you calling fit_transform on your test set?

Because I want to encode the values so that the machine can understand them

bold timber May 12, 2021, 6:14 PM

#

bold timber where i place that code in a cell?

I tried, but I got an error

neat basin May 12, 2021, 6:14 PM

#

hey sorry if I'm interrupting at all but I got a quick question about jupyter/colab. Is there a way to have the same np.random.seed across the entire notebook or do I have to call it in every cell I want to use that seed?

lapis sequoia May 12, 2021, 6:15 PM

#

you have to apply transform on your test dataset too

#

unless you don't have it
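
A minimal sketch of that split, assuming made-up train and test columns. The encoder learns its categories from the training data once, and the same fitted encoder is then applied to both sets. `handle_unknown="ignore"` is an optional setting added here so unseen test categories do not raise an error:

```python
from sklearn.preprocessing import OneHotEncoder

train = [["France"], ["Spain"], ["Germany"]]   # made-up training column
test = [["Spain"], ["France"]]                 # made-up test column

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)                                 # learn the categories from train only

X_train = enc.transform(train).toarray()
X_test = enc.transform(test).toarray()         # reuse the fitted encoder, no refit

print(X_train)
print(X_test)
```

`fit_transform(train)` gives the same result as `fit(train)` followed by `transform(train)`. The thing to avoid is fitting again on the test set, or re-running the encoding cell on data that has already been encoded.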

#

honestly fit_transform is the worst idea i've seen for beginners

#

share your notebook in a colab or datalys

bold timber May 12, 2021, 6:17 PM

#

lapis sequoia honestly fit_transform is the worst idea i've seen for a beginners

I'm learning from Udemy, and my instructor taught it like that

bold timber May 12, 2021, 6:17 PM

#

lapis sequoia honestly fit_transform is the worst idea i've seen for a beginners

why do u say it's the worst idea?

lapis sequoia May 12, 2021, 6:17 PM

#

because you have people who are wondering wtf is happening to their data

bold timber May 12, 2021, 6:18 PM

#

I'm so beginner in machine learning

bold timber May 12, 2021, 6:18 PM

#

lapis sequoia because you have people who are wondering wtf is happening to their data

yeah this happened to me hahaha

lapis sequoia May 12, 2021, 6:18 PM

#

share collab

bold timber May 12, 2021, 6:19 PM

#

lapis sequoia share collab

like this

lapis sequoia May 12, 2021, 6:21 PM

#

i added you

bold timber May 12, 2021, 6:21 PM

#

lapis sequoia i added you

what do you mean "added you?"

#

I have accepted, bro

oak geode May 12, 2021, 6:30 PM

#

do you ppl have any suggestions for whom to follow on YouTube or twitter regarding ds

#

like getting useful stuff and resources from them

bold timber May 12, 2021, 6:42 PM

#

oak geode do you ppl have any suggestions for whom to follow on YouTube or twitter regardi...

CS Dojo

#

or u can use Udemy or DataCamp

oak geode May 12, 2021, 6:44 PM

#

no I didn't mean courses

#

just knowledge and practical stuff

#

or maybe talks and conferences

ripe blade May 12, 2021, 7:23 PM

#

Hi, i need help regarding PCA implementation in python, can anyone guide me?

lilac raven May 12, 2021, 7:30 PM

#

If I want to find the correlation coefficient by using matrix multiplication on a time series data[t] and volume [x,y,z,t], like this seed_ts_win = seed_ts[t:t+win_width] vol_ts_win = vol_ts[:, :, :, t:t+win_width] Will I need to reshape the vol_ts_win data to something like [xyz,t] to do a matrix dot product?
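
One possible sketch for this, assuming random stand-in data with the shapes named in the question. Flattening the volume to `[xyz, t]` and z-scoring along time turns the Pearson correlation into a single matrix-vector product (`zscore` is a small helper defined here, not a library call):

```python
import numpy as np

rng = np.random.default_rng(0)
seed_ts_win = rng.normal(size=20)             # stand-in seed window, shape (t,)
vol_ts_win = rng.normal(size=(2, 3, 4, 20))   # stand-in volume window, shape (x, y, z, t)


def zscore(a):
    """Standardise along the last (time) axis using the population std."""
    return (a - a.mean(axis=-1, keepdims=True)) / a.std(axis=-1, keepdims=True)


x, y, z, t = vol_ts_win.shape
vol_2d = vol_ts_win.reshape(-1, t)                 # (x*y*z, t)
corr = zscore(vol_2d) @ zscore(seed_ts_win) / t    # one Pearson r per voxel
corr_map = corr.reshape(x, y, z)                   # back to the volume layout

print(corr_map.shape)
```

The reshape mainly keeps the bookkeeping explicit. `np.matmul` also accepts the 4-D array against a 1-D vector directly and contracts over the last axis.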

grave frost May 12, 2021, 7:47 PM

#

lapis sequoia Tried Scala? Wanna try it out and see what it is about

Why would I want to ever use Scala?

ripe blade May 12, 2021, 7:58 PM

#

ripe blade Hi, i need help regarding PCA implementation in python, can anyone guide me?

😦

desert oar May 12, 2021, 8:01 PM

#

grave frost Why would I want to ever use Scala?

different programming languages have different features and ways of expressing things. same reason you might choose one tool over another when building a shed.

desert oar May 12, 2021, 8:01 PM

#

ripe blade 😦

don't "ask to ask". state your question, then maybe someone can help.

grave frost May 12, 2021, 8:02 PM

#

hmmmm... 🤔

grave frost May 12, 2021, 8:02 PM

#

desert oar different programming languages have different features and ways of expressing t...

still, I don't need it, I don't like it and see no reason to learn some arbitrary language just to handle big data

desert oar May 12, 2021, 8:04 PM

#

any language that you don't already use is "some arbitrary language"

grave frost May 12, 2021, 8:04 PM

#

desert oar any language that you don't already use is "some arbitrary language"

true

#

but there is nothing there in scala that makes a compelling reason for me to learn

#

Anyways,
255 tensor(0.1585, device='cuda:0', grad_fn=<NegBackward>)
Any guesses what exactly that means so I can find it and remove that line?

#

my guess is that it's the loss

#

is that how PT displays its loss?

slim fox May 12, 2021, 8:55 PM

#

grave frost still, I don't need it, I don't like it and see no reason to learn some arbitrar...

That's until you have actual need for it, either working with Spark or maybe just bcs of your work....
I would recommend against just discarding things this way ("I don't want to learn Scala" or "I don't plan to learn SQL", etc.). It's totally fine that you don't need it now or don't want to bother with it at this point.

#

Just don't be too absolute.

#

“Only a Sith deals in absolutes.”

lapis sequoia May 12, 2021, 9:25 PM

#

grave frost Why would I want to ever use Scala?

data, big data processing, so just wondering if you happened to have a chance to play around w/ it

grave frost May 12, 2021, 10:12 PM

#

slim fox That's until you have actual need for it, either working with Spark or maybe jus...

ofc, when I would need it I would learn it - but chances are that would be a long time hence, and I see no benefit in torturing myself to learn something that I don't care at all

exotic maple May 12, 2021, 10:27 PM

#

grave frost dunno - it uses traditional ML algos which I never want to see results from

why?

exotic maple May 12, 2021, 10:28 PM

#

grave frost dunno - it uses traditional ML algos which I never want to see results from

why? I don't think it's wise to throw a NN at everything just because lol

desert oar May 12, 2021, 10:36 PM

#

exotic maple why? I don't think it's wise to throw a NN at everything just because lol

Same, I think it's a really weird approach, though it probably works in certain problem domains

exotic maple May 12, 2021, 10:40 PM

#

Generalizations are for the most part, nonsense. Neural Networks are extremely powerful in many domains, but if you can pop a Gaussian Naive Bayes and get 90% + accuracy, is that "wrong"?

I have a friend in DS that tells me "just use a NN" for everything and I've told him many times that it sounds like a stupid answer. If you can just throw a NN at everything after some preprocessing, where exactly is the Data Scientist's expertise needed?

neat basin May 12, 2021, 11:05 PM

#

oak geode do you ppl have any suggestions for whom to follow on YouTube or twitter regardi...

depends what you're looking for. Ken Jee is always good on YouTube, so is the Python Programming channel. Ken's podcast also is good for data science talks. A podcast that a guy I know runs who's a data scientist at four square is "the local maximum" which is all about data science and tech topics. Don't know anything for Twitter. LinkedIn has a lot of great people to follow.

lapis sequoia May 12, 2021, 11:10 PM

#

guys

#

i have a question

#

with an already trained image segmentation CNN, can i pass it an anime image, and will it segment it correctly?

slate hollow May 12, 2021, 11:33 PM

#

so

#

i was messing around with

#

tf.keras.losses.BinaryCrossentropy() (default values for everything)

#

and this happened:
loss.call([[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]], [[0, 1], [0.1, 0.9], [0.1, 0.9]])
Out[22]: <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.5379095 , 0.32508278, 0.32508278], dtype=float32)>

#

the thing is, even though it supposedly expected some numbers, it worked just fine

velvet thorn May 12, 2021, 11:44 PM

#

slate hollow the thing is, even though it supposedly expected some numbers, it worked just fi...

huh

#

what do you mean

slate hollow May 12, 2021, 11:44 PM

#

oh uh

#

in the docs

#

it says that for y_true it expects an array of 1's and 0's

velvet thorn May 12, 2021, 11:46 PM

#

slate hollow it says that for `y_true` it expects an array of 1's and 0's

oh

#

as in

#

you're asking why

slate hollow May 12, 2021, 11:46 PM

#

yeah

velvet thorn May 12, 2021, 11:46 PM

#

when you pass non-integral values for y_true (the first argument), you don't run into an error?

slate hollow May 12, 2021, 11:47 PM

#

yeah

#

i passed [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]

velvet thorn May 12, 2021, 11:48 PM

#

slate hollow yeah

okay

#

uh

#

do you know what crossentropy means?

#

mathematically

slate hollow May 12, 2021, 11:49 PM

#

no, not 100% sure

velvet thorn May 12, 2021, 11:49 PM

#

okay

#

what about entropy?

#

in the information theory context

#

just wanna get an idea of your background knowledge

#

actually do you wanna know the theory or do you just want a quick answer

slate hollow May 12, 2021, 11:51 PM

#

just a quick answer lol

velvet thorn May 12, 2021, 11:51 PM

#

because it's not checked

#

and it's mathematically valid

#

that's it basically
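
A rough sketch of why soft labels go through, assuming the textbook binary cross-entropy formula with clipping. `bce` and `eps` are illustrative names, and this is not the Keras implementation. Nothing in the formula requires `y_true` to be exactly 0 or 1:

```python
import math


def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for one sample; y_true may be any value in [0, 1]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip so log() never sees 0 or 1
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


soft = bce([0.1, 0.9], [0.1, 0.9])   # soft labels, close to the 0.325 in the output above
hard = bce([0.0, 1.0], [0.1, 0.9])   # hard labels, equal to -log(0.9)
print(soft, hard)
```

With soft labels the loss no longer reaches zero for a perfect prediction. Its minimum is the entropy of the label distribution itself, which is the 0.325 seen for the second and third rows.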

slate hollow May 12, 2021, 11:52 PM

#

how can you have something that can just indiscriminately take both 1 and 2 numbers

velvet thorn May 12, 2021, 11:53 PM

#

slate hollow how can you have something that can just indiscriminately take both 1 and 2 numb...

are you talking about the dimensions

slate hollow May 12, 2021, 11:53 PM

#

yeah

velvet thorn May 12, 2021, 11:53 PM

#

broadcasting?

#

again, it's not mathematically invalid

#

you could have such an output from a previous layer

#

so it's practically possible

slate hollow May 12, 2021, 11:55 PM

#

ok then

slate hollow May 13, 2021, 12:18 AM

#

so here's my code: https://paste.pythondiscord.com/heropoboro.py
all the filepaths contain is just one line of text, so yeah

#

thing is, when i run it i get this cryptic error: https://paste.pythondiscord.com/wusedupito.apache

#

and i used global variables to get what was actually happening in split_up and i get this: <tf.Tensor 'sequential/text_vectorization/StringSplit/RaggedGetItem/strided_slice_5:0' shape=(None,) dtype=string>

hoary wigeon May 13, 2021, 4:45 AM

#

std        20.645407
min        11.000000
25%        17.000000
50%        27.000000
75%        54.000000
max        71.000000

#

Is there any quick function to distribute my numerical data into categories, like

If value is 6, it should lie in category 0-10

?????

exotic maple May 13, 2021, 4:58 AM

#

hoary wigeon ``` std 20.645407 min 11.000000 25% 17.000000 50% 27...

you mean bins? you can use pandas cut

hoary wigeon May 13, 2021, 4:59 AM

#

nope

#

periods

#

like we use for date

#

range**

exotic maple May 13, 2021, 4:59 AM

#

If you want your numerical data into a category thats a bin...

hoary wigeon May 13, 2021, 5:00 AM

#

exotic maple May 13, 2021, 5:00 AM

#

hoary wigeon May 13, 2021, 5:00 AM

#

i created a function and applied to it

exotic maple May 13, 2021, 5:00 AM

#

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

#

I mean its literally created to create bins lol

hoary wigeon May 13, 2021, 5:00 AM

#

ohh

#

can we use min and max value and ask cut to create intervals automatically ?

exotic maple May 13, 2021, 5:02 AM

#

i dont know

hoary wigeon May 13, 2021, 5:03 AM

#

oh k

desert oar May 13, 2021, 5:36 AM

#

no, you'd have to write your own function to do it using e.g. np.linspace
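
A small sketch of both routes, assuming a made-up series of values. Explicit edges can be built with NumPy, and `pd.cut` also accepts an integer `bins`, in which case it derives equal-width intervals from the minimum and maximum of the data:

```python
import numpy as np
import pandas as pd

values = pd.Series([6, 11, 17, 27, 54, 71])   # made-up sample values

# Explicit edges: (0, 10], (10, 20], ..., (70, 80]; the right edge is included by default.
edges = np.arange(0, 81, 10)
by_edges = pd.cut(values, bins=edges)

# Integer `bins`: equal-width intervals spanning the min and max of the data.
by_count = pd.cut(values, bins=4)

print(by_edges.iloc[0])           # the interval that contains 6
print(by_count.cat.categories)    # the four automatically derived intervals
```

Explicit edges are the better fit when the ranges need round boundaries such as 0-10. The integer form is quicker when any equal-width split will do.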

tacit palm May 13, 2021, 6:30 AM

#

hello is there any libraries i should be aware of when dealing with time like hh:mm:ss in a dataframe

lilac raven May 13, 2021, 8:07 AM

#

Why is C just the last file and not the combined of the two?
```
if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
    full_name = pathlib.Path(root) / file
    try:
        read_fname = full_name
        data = np.loadtxt(read_fname)
        # data_num = data

        # data_list = data.tolist()
        # datalist = []
        # Output = []
        # datalist.append(data)
        data_list = data.tolist()

        data_list.extend(data_list)
        c = np.array(data_list)
        # for i in range(len(data)):
        #     Output.append(np.mean(data[i]))
        print(data_list)
```

desert oar May 13, 2021, 8:08 AM

#

tacit palm hello is there any libraries i should be aware of when dealing with time like hh...

pandas has good support for this by itself
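
A minimal sketch of that built-in support, assuming a made-up column of `hh:mm:ss` strings that represent durations. `pd.to_timedelta` parses them so they can be summed or converted to seconds:

```python
import pandas as pd

df = pd.DataFrame({"duration": ["00:45:15", "01:30:00", "12:05:30"]})  # made-up data

df["duration"] = pd.to_timedelta(df["duration"])      # strings -> timedelta64
df["seconds"] = df["duration"].dt.total_seconds()     # numeric form for modelling
total = df["duration"].sum()

print(df)
print(total)
```

If the strings are clock times on a particular date rather than durations, `pd.to_datetime` is the closer match.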

tidal bough May 13, 2021, 8:08 AM

#

lilac raven Why is C just the last file and not the combined of the two? ``` if file.endswit...

You're extending data_list with itself?.. Weird thing to do, but it looks like it should work.

lilac raven May 13, 2021, 8:09 AM

#

just trying to make a combined location for the multiple files

hallow bronze May 13, 2021, 8:09 AM

#

Hey guys I want to learn data science and other skills

#

I decided to take a Real Python subscription, is it any good?

lilac raven May 13, 2021, 8:14 AM

#

this is full code with print np.mean at the end
```
for root, dirs, files in os.walk("/Users/jsmith/Documents"):
    for file in files:
        if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
            full_name = pathlib.Path(root) / file
            try:
                read_fname = full_name
                data = np.loadtxt(read_fname)

                data_list = data.tolist()

                data_list.extend(data)
                c = np.array(data_list)

                print(c)

                print((np.mean[c]), axis = 0)

            except Exception as e:
                print (e)
```

#

I get [47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879 44.69394836 40.13199376 75. 72.50760609 28.40454509 22.94157339 26.28287415 30.52569707 37.17810563 32.2139077 23.27373341 47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879 44.69394836 40.13199376 75. 72.50760609 28.40454509 22.94157339 26.28287415 30.52569707 37.17810563 32.2139077 23.27373341] 'function' object is not subscriptable [356.22258666 349.47877856 256.22921202 251.57835095 393.43572114 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388 76.82953714 131.98240352 107.1130911 100. 155.02932274 267.62847382 342.38136632 289.35272592 319.09348501 277.627819 261.0439415 229.46949688 313.32438432 250.97033911 194.77984801 326.2595784 235.80044922 140.2466315 356.22258666 349.47877856 256.22921202 251.57835095 393.43572114 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388 76.82953714 131.98240352 107.1130911 100. 155.02932274 267.62847382 342.38136632 289.35272592 319.09348501 277.627819 261.0439415 229.46949688 313.32438432 250.97033911 194.77984801 326.2595784 235.80044922 140.2466315 ] 'function' object is not subscriptable

#

so it prints each file one at a time

#

and not together like a combined array

#

why so?
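
A possible sketch of the combined version, assuming two made-up files in a temporary directory (`load_combined` is a hypothetical helper). In the loop above, `data_list` is rebuilt from scratch for every file, so nothing carries over between files. The `'function' object is not subscriptable` message comes from `np.mean[c]`, which indexes the function instead of calling it:

```python
import pathlib
import tempfile

import numpy as np


def load_combined(root, suffix):
    """Load every file under `root` whose name ends in `suffix` into one 1-D array."""
    chunks = []  # created once, outside the loop, so it survives each iteration
    for path in sorted(pathlib.Path(root).rglob(f"*{suffix}")):
        chunks.append(np.loadtxt(path).ravel())
    return np.concatenate(chunks) if chunks else np.array([])


# Tiny demo with two invented files.
with tempfile.TemporaryDirectory() as tmp:
    np.savetxt(pathlib.Path(tmp) / "a_MID-R1-ECG.1D_hrv.txt", [1.0, 2.0, 3.0])
    np.savetxt(pathlib.Path(tmp) / "b_MID-R1-ECG.1D_hrv.txt", [4.0, 5.0])
    c = load_combined(tmp, "_MID-R1-ECG.1D_hrv.txt")

mean_value = np.mean(c)   # called with parentheses, not indexed with []
print(c)
print(mean_value)
```

Printing inside the loop will always show one file at a time, so the combined array and its mean are computed after the loop has finished.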

ebon geyser May 13, 2021, 10:37 AM

#

https://stackoverflow.com/questions/67516244/how-can-i-train-my-chatterbot-in-python-with-chatterbot-module/67516520#67516520

Stack Overflow

How can I train my chatterbot in Python with `Chatterbot` module

I am trying to make a chatbot, using Chatterbot, and then integrating it into my Discord Bot... I have done some research and got to know that I can use the Chatterbot library easily to train my bo...

#

Anyone here who can answer this?

#

Please ping me

lapis sequoia May 13, 2021, 11:17 AM

#

Can someone help how can I convert GML string to GeoJSON?

#

I know, GeoPandas has methods for it, but I'm totally lost in it.

lapis sequoia May 13, 2021, 11:47 AM

#

Guys im a highschool student so should i learn linear algebra 18 hours to be good at machine learning????

#

and calculus 12 hours

#

etc.

#

math from freecodecamp

serene scaffold May 13, 2021, 12:20 PM

#

lapis sequoia Guys im a highschool student so should i learn linear algebra 18 hours to be goo...

You will need to know linear algebra to understand machine learning, though I'm not sure you can "learn it" in 18 hours. If you're in the US, make sure that you're doing well in whatever math courses you're currently taking so that you're a competitive applicant to computer science degree programs.

lapis sequoia May 13, 2021, 12:21 PM

#

Im not in the us

#

Sad kek

#

But good answer@serene scaffold thanks your respect

serene scaffold May 13, 2021, 12:24 PM

#

lapis sequoia Im not in the us

I only mentioned "in the US" because I don't want to make general statements about computer science departments I know nothing about. Is a university education something that's expected for scientific work in your region?

lapis sequoia May 13, 2021, 12:25 PM

#

Yep

#

They do

serene scaffold May 13, 2021, 12:26 PM

#

Alright. So look at universities with computer science programs that you might want to attend and figure out what they look for in applicants. If it's not on their website, you can probably call their admissions department and ask.

#

My department looks for a strong academic record in general, but not getting an A in calculus immediately disqualifies you.

#

(And that's more or less it. No programming experience is expected.)

#

Let me know if you have comments about that @lapis sequoia.

lapis sequoia May 13, 2021, 12:29 PM

#

Uh

#

Its depend on region?

serene scaffold May 13, 2021, 12:32 PM

#

lapis sequoia Its depend on region?

yes, if your goal is to work in machine learning professionally, it depends on the local market and what those employers expect. And if they want a university education, then it also depends on what they expect.

lapis sequoia May 13, 2021, 12:37 PM

#

@serene scaffold ok sir so its possible to learn all courses? But how can i use them for programming???

serene scaffold May 13, 2021, 12:38 PM

#

lapis sequoia @serene scaffold ok sir so its possible to learn all courses? But how can i...

learn all courses? the math ones you mentioned?

lapis sequoia May 13, 2021, 12:39 PM

#

Yah

serene scaffold May 13, 2021, 12:39 PM

#

Yes, you can do them if you want.

lapis sequoia May 13, 2021, 12:43 PM

#

So all of you explain

#

Learn math for field work?

#

@serene scaffold 1 More stupid question

serene scaffold May 13, 2021, 12:45 PM

#

Ping me again when you ask the question.

lapis sequoia May 13, 2021, 12:47 PM

#

@serene scaffold Can i get freelance by become machine learning?

serene scaffold May 13, 2021, 12:48 PM

#

lapis sequoia @serene scaffold Can i get freelance by become machine learning?

you're asking if you can become a freelance machine learning engineer? I am not sure. Try asking in #career-advice, though you will probably need to disclose what country you are in.

lapis sequoia May 13, 2021, 12:48 PM

#

In usa?

serene scaffold May 13, 2021, 12:49 PM

#

you said you are not in the USA, right? People will need to know what country you are in to know if that career direction is viable in your market.

lapis sequoia May 13, 2021, 12:55 PM

#

@serene scaffold can you example how to use linear algebra in real programming?

serene scaffold May 13, 2021, 12:55 PM

#

lapis sequoia @serene scaffold can you example how to use linear algebra in real programm...

if you make a neural network, you would use linear algebra

lapis sequoia May 13, 2021, 12:56 PM

#

But i think most kinds of programming use linear algebra

#

Even game developer

#

@serene scaffold

desert oar May 13, 2021, 1:09 PM

#

serene scaffold My department looks for a strong academic record in general, but not getting an ...

Meanwhile here I am with my C in freshman calc 🙃

#

(but i had to work my butt off later to make up for it)

lapis sequoia May 13, 2021, 1:10 PM

#

@desert oar are you freelance?

desert oar May 13, 2021, 1:11 PM

#

i was a professional data scientist for 5 years although my current job is a software engineer. i was not a freelancer but i did do a one-off consulting gig.

lapis sequoia May 13, 2021, 1:11 PM

#

Jeez

#

Can you build own car?

#

Can you build rocket??

#

Can you build own discord app???

#

Can you hack nasa????

desert oar May 13, 2021, 1:12 PM

#

No car or rocket, but my friend with a mechanical engineering PhD can build a rocket 🙂 I have built a simple Discord bot before.

lapis sequoia May 13, 2021, 1:13 PM

#

That 5 years college?

desert oar May 13, 2021, 1:15 PM

#

My education path was:

well-regarded American public high school
well-regarded American research university BA with double major in economics and math (along with some other credentials)
top-ranked American research university MA

I have had a pretty "easy" journey, all things considered.

lapis sequoia May 13, 2021, 1:16 PM

#

@desert oar Is anything doesnt use much math?

desert oar May 13, 2021, 1:16 PM

#

In data science? No, unfortunately; you need math. Data engineering doesn't require much math, although math still helps in that role too.

#

I thought I hated math until I took a linear algebra course in college.

obtuse stratus May 13, 2021, 1:17 PM

#

desert oar In data science? No, unfortunately; you need math. Data engineering doesn't requ...

do i need to know economics along with data science

desert oar May 13, 2021, 1:17 PM

#

Then I realized I was just taught badly.

desert oar May 13, 2021, 1:17 PM

#

obtuse stratus do i need to know economics along with data science

No, I studied economics because I thought I wanted to become an economist. However it does turn out to be useful in some industries.

#

In fact I still kind of wish I became an economist 🙂 edit: I would not be opposed to a mid-career PhD, but the job market for academics is difficult now so I don't mind waiting.

lapis sequoia May 13, 2021, 1:19 PM

#

@desert oar What is your hardest math

desert oar May 13, 2021, 1:20 PM

#

Hardest? as in, the math that I have the most trouble with? Real analysis and financial math. Too much "computation". I prefer playing with abstract symbols.

obtuse stratus May 13, 2021, 1:21 PM

#

yeah, I'm a freshman in university with DS-AI major , do i need to know some other industries ?

desert oar May 13, 2021, 1:23 PM

#

obtuse stratus yeah, I'm a freshman in university with DS-AI major , do i need to know some oth...

The more you know about any particular industry, the more appealing you will be for a "general purpose" DS role. Industry knowledge can be a significant bonus on a job application, and can offset comparatively weaker research/academic credentials. A well-managed data science team has a mix of both "researchers" and "industry people". If you have an interest in a particular industry, you should feel empowered to pursue that interest, it might prove more fruitful than grinding away at stuff you don't care about.

lapis sequoia May 13, 2021, 1:23 PM

#

@desert oar god typing

#

@desert oar So can ai talk together?

obtuse stratus May 13, 2021, 1:25 PM

#

desert oar The more you know about any particular industry, the more appealing you will be ...

thank you so much. I appreciate it !

desert oar May 13, 2021, 1:25 PM

#

lapis sequoia @desert oar So can ai talk together?

Technically yes, but they won't say anything intelligent!

#

Artificial Gibberish

lapis sequoia May 13, 2021, 1:26 PM

#

Magic

#

@desert oar How to be mechanical engi?

#

https://tenor.com/view/when-youre-too-lonely-to-get-friends-to-help-capture-the-point-ready-turret-brain-gif-16208246

#

And if im a mechanical engi can i build my own sentry like tf2??

desert oar May 13, 2021, 1:29 PM

#

lapis sequoia @desert oar How to be mechanical engi?

Go to university, study a lot, learn lots and lots of calculus and linear algebra

#

You could probably build something that looks like it, but it can't "unpack" itself from nothing like that.

lapis sequoia May 13, 2021, 1:30 PM

#

Lol

#

Lmao

#

So how to be smarter than elon musk?

obtuse stratus May 13, 2021, 1:30 PM

#

desert oar Go to university, study a lot, learn lots and lots of calculus and linear algebr...

for DS, what should i do for good career?

lapis sequoia May 13, 2021, 1:31 PM

#

What is ds?

obtuse stratus May 13, 2021, 1:31 PM

#

Data Science

lapis sequoia May 13, 2021, 1:31 PM

#

Oh

mint palm May 13, 2021, 1:31 PM

#

lapis sequoia https://tenor.com/view/when-youre-too-lonely-to-get-friends-to-help-capture-the-...

This defies conservation of volume

lapis sequoia May 13, 2021, 1:32 PM

#

@mint palm uh

#

Im died

#

Hello, how can I convert GML string to GeoJSON in Python? I was looking https://gis.stackexchange.com/questions/77974/converting-gml-to-geojson-using-python-and-ogr-with-geometry-transformation/77982#77982 but it doesn't work, and also it's reading file instead of string from variable. Can someone help me please?

mint palm May 13, 2021, 1:32 PM

#

Haha

#

I was serious

karmic apex May 13, 2021, 1:34 PM

#

Hi everyone. Is there any software engineer/developer here who is switching to data science?

jade carbon May 13, 2021, 1:40 PM

#

can we convert the .pb file into h5?
i mean from a pure tf model into a keras model

#

i wanna check how they build the inception coco model

crude fable May 13, 2021, 2:02 PM

#

Hi, is anyone familiar with pytorch advanced tensor indexing (or operations) and is free to voicechat a little bit?

deft heron May 13, 2021, 2:17 PM

#

has anyone here ever used Chatterbot in python? (it's a library)

jade carbon May 13, 2021, 2:23 PM

#

jade carbon i wanna check how they build the inception coco model

how?

jade carbon May 13, 2021, 2:24 PM

#

crude fable Hi, is anyone familar with pytorch advanced tensor indexing (or operations) and ...

tensor indexing?
why not try tensorflow? it's even easier

crude fable May 13, 2021, 2:27 PM

#

yes, do u have time for voice chat so I can elaborate my problem

jade carbon May 13, 2021, 2:28 PM

#

i dont know when would be a good time to elaborate ur problem

#

i am busy now

crude fable May 13, 2021, 2:56 PM

#

I'll just ask here: how can I get multiple spans of a tensor into one?
E.g. the tensor is of size [32 (batch_size), 256 (seq_length), 768 (emb_size)] and I have multiple sequence indexers like [[1~2], [1~4]]

grave frost May 13, 2021, 3:03 PM

#

exotic maple Generalizations are for the most part, nonsense. Neural Networks are extremely p...

it honestly doesn't make sense for me to throw algos when in the same time I could have been researching NN architectures.

#

ofc, you can get 90%+ with NB or anything, but why should i use it when I know a simple NN would start at 96%+ ???

lime jewel May 13, 2021, 3:08 PM

#

Does anybody know if the QuadroRTX4000 is sufficient for deep learning on non-video tasks?

I'm using it for biomedical data, and im trying to figure out if its okay to do the RTX4000 or if i should sacrifice elsewhere to get something a little bigger

#

its 8 GB RAM, which can load the entirety of my data

exotic maple May 13, 2021, 3:12 PM

#

grave frost ofc, you can get 90%+ with NB or anything, but why should i use it when I know a...

The thing is...you dont know.

I think of it like this: you can cook fries in either your kitchen or an industrial fryer... but you're not going to fire up or buy an industrial fryer for a one-time, not life-changing thing

grave frost May 13, 2021, 3:19 PM

#

exotic maple The thing is...you dont know. I think of it like this: you can cook fries in bo...

bruh. sklearn doesn't work with TF data generators

#

there are a lot of utilities that frameworks offer, and I don't want to write my own generators just to use naive bayes

#

I do get your point

#

but I am simply describing the time saved in practical usage

#

ofc, maybe I don't need a NN - but then if an algo does outperform I would simply use it

young beacon May 13, 2021, 6:08 PM

#

Hi, I was trying to import gensim package but I get the following error:

   1053     # try to load fast, cythonized code if possible
-> 1054     from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
   1055 
   1056 except ImportError:

__init__.pxd in init gensim._matutils()

ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88

#

I have installed the version 3.4.0

fast dune May 13, 2021, 6:30 PM

#

I am self learning Numpy (for future ML classes). Numpy array broadcasting is brand new to me, since in the past I used basic loops for everything. I cannot wrap my head around doing math operations with broadcasting. All the online examples are TOO simple for me to learn.

#

Take for example, I have an image (1600 x 900) and I have a numpy array of 1000 random (x,y) coordinates. For each pixel on the image, I want to find the closest (x,y) coordinate, and replace its pixel to that (x,y) coordinate's pixel.

#

The easy part: Replacing the old pixel with the new pixel.
The hard part: How the heck do I compute 'closestDistance()' on each pixel? Aka. on each element in my 1600 x 900 ndarray.

exotic maple May 13, 2021, 6:59 PM

#

fast dune Take for example, I have an image (1600 x 900) and I have a numpy array of 1000 ...

can you share a sample of your array?

#

Because as I remember, numpy arrays for images are basically -> row and column index the pixel position (row is y, column is x), and the last axis stores the color information, or so

#

so basically, the pixel [0, 0, (255,255,255)] would be a white pixel at the x-0, y-0 position

slate hollow May 13, 2021, 9:30 PM

#

https://paste.pythondiscord.com/ikebizifin.py so i'm doing some stuff with the imdb dataset

#

https://paste.pythondiscord.com/domamahosu.sql

#

thing is, i get this cryptic error

#

happens in process_text

#

and for some reason i can't even see what's going on in there i just get some weird

#

Tensor("Placeholder:0", shape=(None, 1), dtype=string)
Tensor("ExpandDims:0", shape=(None, 1), dtype=string)

#

Tensor("text_vectorization_4/add:0", shape=(2,), dtype=int32)
Tensor("sequential_3/text_vectorization_4/add:0", shape=(2,), dtype=int32)

#

and stuff like that- any help?

tidal bough May 13, 2021, 10:25 PM

#

with input shapes: [2], [?], [], [?], [?].
huh

#

never seen an ? before lol

velvet thorn May 13, 2021, 11:05 PM

#

fast dune Take for example, I have an image (1600 x 900) and I have a numpy array of 1000 ...

uh.

#

I'm curious

#

why do you want to do this?

#

this isn't really a broadcasting problem btw

fast dune May 13, 2021, 11:50 PM

#

@velvet thorn Yep, a valid question. I'm learning Python by converting my old Java homework into Python. However, I ran into the brick wall that is Python loops are horribly inefficient so I cannot convert my code in a 1:1 format.

velvet thorn May 13, 2021, 11:51 PM

#

fast dune @velvet thorn Yep, a valid question. I'm learning Python by converting ...

okay so

fast dune May 13, 2021, 11:51 PM

#

Here is a code block from my Java code. It's like sophomore level code and uses all loops. https://gist.github.com/DennisPing/b049ee6331256bed7029db9e444405c7?ts=4

Gist

mosaic_filter.java

velvet thorn May 13, 2021, 11:51 PM

#

let me get this straight

#

on a toy example

#

okay I think

#

you need some sort of tree

fast dune May 13, 2021, 11:57 PM

#

My plan was to use Numpy with as few loops as possible. The math is easy to understand but I'm struggling with coding it as matrix operations. (tagging @exotic maple in case.)
As a visual I'm posting an example image:

#

#

velvet thorn May 13, 2021, 11:59 PM

#

fast dune My plan was to use Numpy with as few loops as possible. The math is easy to unde...

there's probably a specialised algorithm for this

#

but not in my area of experience

#

if you wanted to translate your loops naively though

#

you could look into numba

#

which optimises through JIT compilation

fast dune May 14, 2021, 12:02 AM

#

Yeah, that's a valid alternative. In the interest of learning, I want to find someone who can help me with Numpy first.

velvet thorn May 14, 2021, 12:02 AM

#

fast dune Yeah, that's a valid alternative. In the interest of learning, I want to find so...

I'm not really sure if this is easily vectorised

#

hm actually...

#

let me think

#

OKAY hold up

velvet thorn May 14, 2021, 12:03 AM

#

fast dune Yeah, that's a valid alternative. In the interest of learning, I want to find so...

how do you determine distance?

#

I'm assuming

#

Euclidean distance?

fast dune May 14, 2021, 12:04 AM

#

Yeah, euclidean but I exclude the square root because I only care about relative distance.

velvet thorn May 14, 2021, 12:04 AM

#

I don't know if this will be faster but this is my guess

fast dune May 14, 2021, 12:10 AM

#

Don't worry 🙂 , I'll ask around here for a few days. This problem is very dense because I'm not using loops. (The loop Python version takes 25 seconds to process while Java takes ~1 sec)

velvet thorn May 14, 2021, 12:10 AM

#

hm hold up thinking

velvet thorn May 14, 2021, 12:16 AM

#

fast dune Don't worry 🙂 , I'll ask around here for a few days. This problem is very dense...

okay

#

again, I don't know if this is faster

#

but this is my thought process.

#

say you have an image of shape (x, y)

#

create an array of shape (n, 2), a, where n = x * y, representing coordinates

#

so [[0, 0], [0, 1], [0, 2]...[0, y - 1], [1, 0], [1, 1]...[x - 1, y - 1]]

#

the array of seeds, s, is already in the shape (m, 2)

#

take a[:, np.newaxis, :] - s[np.newaxis, ...] to create a raw difference array, rd, of shape (n, m, 2), where rd[:, i, :] == a - s[i]

#

in other words, the result of the difference between all coordinates and a particular seed's coordinates

#

taking (rd ** 2).sum(axis=2), reducing over the last axis, gives an array d of shape (n, m), where the element i, j represents the squared Euclidean distance between the ith entry in a and the jth entry in s

#

the last step, then, is to take d.argmin(axis=-1), which will give, for each coordinate, the index of the seed that is nearest
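The steps above can be sketched in NumPy roughly as follows. This is a minimal sketch on a toy 4 x 6 grid, not the original poster's code: the names `height`, `width`, `coords`, `seeds`, `nearest`, `img`, and `mosaic` are assumptions made for illustration, and the seeds are drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration); a real image would be much larger.
height, width, num_seeds = 4, 6, 3

# a: every pixel coordinate as one row, shape (n, 2) with n = height * width.
rows, cols = np.indices((height, width))
coords = np.stack([rows.ravel(), cols.ravel()], axis=1)

# s: seed coordinates as (row, col) pairs, shape (m, 2).
seeds = np.stack(
    [rng.integers(0, height, num_seeds), rng.integers(0, width, num_seeds)],
    axis=1,
)

# rd: raw differences via broadcasting, shape (n, m, 2).
rd = coords[:, np.newaxis, :] - seeds[np.newaxis, :, :]

# d: squared Euclidean distances, shape (n, m).
d = (rd ** 2).sum(axis=2)

# For each pixel, the index of its nearest seed, reshaped to the image grid.
nearest = d.argmin(axis=-1).reshape(height, width)

# Replace every pixel with the colour found at its nearest seed.
img = rng.integers(0, 256, (height, width, 3))
mosaic = img[seeds[nearest, 0], seeds[nearest, 1]]
```

The last line uses integer-array indexing: `seeds[nearest, 0]` and `seeds[nearest, 1]` are both `(height, width)` arrays, so the result keeps the image shape.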

#

🥴 that was difficult

fast dune May 14, 2021, 12:25 AM

#

Starting from the top: If my numSeeds = 100, I create an ndarray of shape (100, 2)?

velvet thorn May 14, 2021, 12:25 AM

#

I think it makes sense

#

I haven't done numpy stuff in a long while

#

someone should check my reasoning

fast dune May 14, 2021, 12:28 AM

#

The stuff you posted matched some of the concept testing I did. I knew I have to do np.newaxis but I didn't know where to add it to. I had a feeling I should create some sort of map and do imgArray - seedArray to get a difference array.

#

But my brain couldn't handle all this new stuff.

velvet thorn May 14, 2021, 12:31 AM

#

yeah this kind of thing is easier if you have a background in mathematics

#

which is why I'm considering getting a master's

#

it's p fun though

velvet thorn May 14, 2021, 12:31 AM

#

fast dune The stuff you posted matched some of the concept testing I did. I knew I have to...

yup that's the basic idea

#

the dimensions just need to line up

fast dune May 14, 2021, 12:35 AM

#

I'll spend some time self testing and get back to you. Probably will post a github gist of my Python code.

slate hollow May 14, 2021, 1:56 AM

#

https://paste.pythondiscord.com/ikebizifin.py so i'm doing some stuff with the imdb dataset
https://paste.pythondiscord.com/domamahosu.sql
thing is, i get this cryptic error
happens in process_text
and for some reason i can't even see what's going on in there i just get some weird

Tensor("ExpandDims:0", shape=(None, 1), dtype=string)
Tensor("text_vectorization_4/add:0", shape=(2,), dtype=int32)
Tensor("sequential_3/text_vectorization_4/add:0", shape=(2,), dtype=int32)
and stuff like that- any help?

velvet thorn May 14, 2021, 2:40 AM

#

fast dune I'll spend some time self testing and get back to you. Probably will post a gith...

okay I think

#

you'd need to chunk it

#

otherwise for any image of reasonable size the resultant array gets too big
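To put a number on "too big": for a 1600 x 900 image and 1000 seeds, the full difference array would hold 1,440,000 x 1000 x 2 values, which is roughly 23 GB with 64-bit integers. A chunked version might look like the sketch below. The function name `nearest_seed_chunked` and the default `chunk_size` are assumptions for illustration, not something from the conversation.

```python
import numpy as np

def nearest_seed_chunked(coords, seeds, chunk_size=10_000):
    """Return, for each row of coords, the index of the nearest seed.

    coords has shape (n, 2) and seeds has shape (m, 2). Only a
    (chunk_size, m, 2) difference array exists at any one time.
    """
    out = np.empty(len(coords), dtype=np.intp)
    for start in range(0, len(coords), chunk_size):
        block = coords[start:start + chunk_size]
        diff = block[:, np.newaxis, :] - seeds[np.newaxis, :, :]
        out[start:start + chunk_size] = (diff ** 2).sum(axis=2).argmin(axis=1)
    return out
```

The Python loop now runs once per chunk rather than once per pixel, so most of the work still happens inside NumPy.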

limpid saddle May 14, 2021, 2:43 AM

#

SVM is taking more than an hour and counting.. is that okay or could there be a problem?

velvet thorn May 14, 2021, 2:43 AM

#

limpid saddle SVM is taking more than and hour and counting.. is that okay or could there be a...

data shape?

exotic maple May 14, 2021, 2:49 AM

#

limpid saddle SVM is taking more than and hour and counting.. is that okay or could there be a...

How many rows do you have? SVM takes a long time the more records you have

limpid saddle May 14, 2021, 2:50 AM

#

velvet thorn data shape?

is that what you mean?

limpid saddle May 14, 2021, 2:50 AM

#

exotic maple How many rows do you have? SVM takes a long time the more records you have

93636

exotic maple May 14, 2021, 2:50 AM

#

that sounds like too much for an SVM

#

what are you trying to classify?

limpid saddle May 14, 2021, 2:51 AM

#

it must be less tho after the data cleansing

desert oar May 14, 2021, 2:51 AM

#

yeah that will take forever to train, svm's are slow

#

probably faster and more accurate to throw it into keras with one hidden layer

limpid saddle May 14, 2021, 2:51 AM

#

hmm on average, how big should it be for SVM?

velvet thorn May 14, 2021, 2:51 AM

#

@limpid saddle what kernel

exotic maple May 14, 2021, 2:52 AM

#

limpid saddle hmm on average, how big should it be for SVM?

I don't think there's a hard rule but ive seen <50,000 records tops

velvet thorn May 14, 2021, 2:52 AM

#

why are you using SVMs btw

exotic maple May 14, 2021, 2:52 AM

#

velvet thorn why are you using SVMs btw

yeah thats why I asked him what is he trying to classify lol

desert oar May 14, 2021, 2:52 AM

#

is there any problem where svms are still useful? i feel like neural networks kind of ate kernel methods and i haven't seen anyone do kernel anything in forever

exotic maple May 14, 2021, 2:53 AM

#

desert oar is there any problem where svms are still useful? i feel like neural networks ki...

some NLP tasks?

desert oar May 14, 2021, 2:53 AM

#

and for the linear svm you can just do hinge loss w/ gradient descent or whatever
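As a rough illustration of that idea, here is a minimal sketch of a linear SVM trained by plain (sub)gradient descent on the L2-regularised hinge loss. Everything in it is an assumption for illustration: the function name `train_linear_svm`, the hyperparameters, and the toy two-cluster data. Labels are taken to be -1 or +1.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Fit w, b by subgradient descent on the regularised hinge loss.

    Loss: 0.5 * reg * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    y must contain only -1 and +1.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside or on the wrong side of the margin
        grad_w = reg * w - (y[active, np.newaxis] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

Each pass costs time linear in the number of rows, which is why this formulation scales to datasets where a kernel SVM becomes slow.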

desert oar May 14, 2021, 2:53 AM

#

exotic maple some NLP tasks?

curious what those would be

#

something where they specifically want to obtain support vectors?

exotic maple May 14, 2021, 2:58 AM

#

speaking of NN I really need to start learning TF2 and Keras...

limpid saddle May 14, 2021, 3:00 AM

#

velvet thorn why are you using SVMs btw

I'm trying to see what would bring out better results, I tried Naive Bayes as well but the results aren't looking so good

#

I'm very new to this so I'm pretty confused

strange fern May 14, 2021, 3:03 AM

#

Quick numpy Q: there's a notation used in a for loop using plt.scatter() that looks like X_r[y == i, 0], what does it actually mean?

Code snippet:

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)

exotic maple May 14, 2021, 3:03 AM

#

limpid saddle I'm trying to see what would bring out better results, I tried Naive Bayes as we...

but what are you trying to classify? and what kind of data do you have

velvet thorn May 14, 2021, 3:04 AM

#

strange fern Quick numpy Q: there's a notation used in a for loop using `plt.scatter()` that ...

the first column (`0`) of every row in `X_r` for which the corresponding value in `y` is equal to `i` (`y == i`)
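A small example of that indexing, with made-up values for `X_r` and `y` (only the names come from the snippet in the question):

```python
import numpy as np

# Hypothetical data: four rows, two columns, and one class label per row.
X_r = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
y = np.array([0, 1, 0, 2])

i = 0
mask = y == i                # boolean array: True where the label equals i
first_col = X_r[mask, 0]     # column 0 of the rows where y == 0 -> [1., 3.]
second_col = X_r[y == i, 1]  # column 1 of the same rows -> [10., 30.]
```

In the `plt.scatter` loop, these two arrays are the x and y coordinates of the points that belong to class `i`.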

exotic maple May 14, 2021, 3:06 AM

#

can any1 please recommend a good place or course where to learn PyTorch and Tensorflow2?

limpid saddle May 14, 2021, 3:06 AM

#

exotic maple but what ar eyou trying to classify? and what kind of data do you have

I have a column of phrases and a column of sentiments (int) and I'm supposed to train it to see the phrase and get the sentiment correctly

exotic maple May 14, 2021, 3:13 AM

#

limpid saddle I have a column of phrases and a column of sentiments (int) and I'm supposed to ...

have you already extracted features from that text?

#

Binary or Count_vectorized features?

#

You can try with Multinomial Naive Bayes if you are using Count_vectorize, or with a Bernoulli Naive Bayes if Binary

#

NB would be infinitely better than SVM

limpid saddle May 14, 2021, 3:16 AM

#

Yepp I have

#

What about LR, would that take so long too?

exotic maple May 14, 2021, 3:17 AM

#

LR should be fast but ive never heard of it being used for what you want

#

NB should be equally fast to be honest

limpid saddle May 14, 2021, 3:20 AM

#

Do you mind if I send some of the learning curves I got? Because I can't really tell if they're okay or not

limpid saddle May 14, 2021, 4:19 AM

#

Hello, could someone take a look at #help-peanut and give me their opinions on the graphs? and which would be the best to pick?

slate hollow May 14, 2021, 4:31 AM

#

https://paste.pythondiscord.com/ikebizifin.py so i'm doing some stuff with the imdb dataset
https://paste.pythondiscord.com/domamahosu.sql
thing is, i get this cryptic error
happens in process_text
and for some reason i can't even see what's going on in there i just get some weird

Tensor("Placeholder:0", shape=(None, 1), dtype=string)
Tensor("ExpandDims:0", shape=(None, 1), dtype=string)
Tensor("text_vectorization_4/add:0", shape=(2,), dtype=int32)
Tensor("sequential_3/text_vectorization_4/add:0", shape=(2,), dtype=int32)
and stuff like that- any help?

#

turns out

#

turns out all i need was to change `text_vec = TextVectorization()` to `text_vec = TextVectorization(input_shape=[])`

#

bruh

#

what does input_shape=[] even mean

exotic maple May 14, 2021, 4:45 AM

#

slate hollow what does `input_shape=[]` even mean

that you are passing a list (or array?) as an input?

slate hollow May 14, 2021, 4:46 AM

#

exotic maple that you are a passing list (or array?) as an input=

but it doesn't have any numbers

#

from my experience i've always had to pass like a input_shape=(10,)

#

or something of the sort

exotic maple May 14, 2021, 4:46 AM

#

oh

#

eh thats weird

#

ahh

#

your data is 0 dimensional

slate hollow May 14, 2021, 4:47 AM

#

uh

#

each batch is like

#

a batch of 32 sentence thingies

#

wait no

#

nvm it's just each batch is like a bunch of strings

exotic maple May 14, 2021, 4:48 AM

#

uh nvm me too. Your data is 1-dimensional, not 0

#

0 is just a scalar i forgot lol
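One way to reconcile the two views, assuming the usual Keras convention that `input_shape` describes a single sample and leaves out the batch axis: a batch of strings is 1-dimensional, while each sample in it is a 0-dimensional scalar, which is what an empty shape means. The NumPy sketch below only illustrates the shapes; the variable names and example strings are made up.

```python
import numpy as np

# A hypothetical batch of three review strings.
batch = np.array(["great movie", "terrible plot", "fine, I guess"])

batch_shape = batch.shape           # (3,): one axis, the batch axis
batch_ndim = batch.ndim             # 1

sample = batch[0]                   # a single string
sample_shape = np.shape(sample)     # (): no axes at all, i.e. a scalar
sample_ndim = np.ndim(sample)       # 0
```

Under that convention, `input_shape=(10,)` would mean each sample is a vector of 10 values, and `input_shape=[]` means each sample is a single scalar string.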

hazy basin May 14, 2021, 6:08 AM

#

oh arcpy is giving me a hassle

stark vortex May 14, 2021, 6:21 AM

#

I think this is probably the closest category of chat for my issue. (please tell me if I am wrong)

I need someone who is smarter than me in Opencv to assist me in some issues I am having capturing video frames from my webcam using the v4l2 backend. I am running this on a raspberry pi.

strange jay May 14, 2021, 7:35 AM

#

Hi guys, I have an ultra specific issue that I am trying to solve. So there are two cells in a given Excel sheet and they have a number that Identifies something and then a photo count given from a file explorer image upload. I am trying to find a solution to find discrepancies between different folder names that contain the file count. The issue is that the image counts in the consolidated excel sheet and the files is around 2000 photos, so I am trying to detect given folder number with files within it and have a script pull up which folders have discrepancies to the Excel sheet. Any possible solutions?

#

If anyone has any possible solutions or suggestions please let me know

tough cosmos May 14, 2021, 8:24 AM

#

Hey guys, I have been invested in the Data science sector a lot, studying courses on coursera and udemy from recognized institutions. I have learned almost all libraries required and even machine learning.
Now I don't know how to start my career...

iron ginkgo May 14, 2021, 2:01 PM

#

Hey, I got a probably simple(?) question about data analysis. I have a sequence of values (stored in a DataFrame) and from that sequence I want to find and extract subsequences where the mean value is less/more than some constant value (basically looking up chains of suspicious values). Does this process have some specific name?

potent badge May 14, 2021, 2:12 PM

#

Question: When dealing with a neural network, why would someone divide by the sqrt of the number of neurons in the layer after performing the dot product between the neurons and the weights of that specific layer?

desert oar May 14, 2021, 2:22 PM

#

iron ginkgo Hey, I got a probably simple(?) question about data analysis. I have a sequence ...

this doesn't have a particular name, but looking for sequences of suspicious values falls into the category of "anomaly detection"

iron ginkgo May 14, 2021, 2:22 PM

#

Thanks!

desert oar May 14, 2021, 2:22 PM

#

potent badge Question: When dealing with a neural network, why would someone divide by the sq...

did you see this written somewhere?

potent badge May 14, 2021, 2:26 PM

#

desert oar did you see this written somewhere?

no, one of my professors did it, and i don't know why?

#

I did read online on some sources, that some people initialize the weights as 1/sqrt(# of nodes) but....... not after doing the dot product

desert oar May 14, 2021, 2:28 PM

#

it's apparently a thing in attention units https://www.paperswithcode.com/method/scaled, where it scales the theoretical variance to 1 under the specific conditions of an attention unit

#

https://ai.stackexchange.com/a/25057

Artificial Intelligence Stack Exchange

Why does this multiplication of $Q$ and $K$ have a variance of $d_k...

In scaled dot product attention, we scale our outputs by dividing the dot product by the square root of the dimensionality of the matrix:

The reason why is stated that this constrains the distribu...

#

(side note: i hate that stats, data science, and ai are 3 separate forums... they all get the same damn questions)

#

think of it this way: the greater the number of nodes, the larger the resulting dot product

potent badge May 14, 2021, 2:30 PM

#

So if im going from a first layer with say 784 neurons to a layer with 128 neurons would I be doing sqrt(784) or the sqrt of (784*128)

desert oar May 14, 2021, 2:31 PM

#

im only seeing this technique used for the attention mechanism

#

i think you'd do the latter based on what i'm reading

#

because the linear component of a layer is (W · x) + b where in this case W is 128×784, x is 784×1, and b is 128×1

#

however because the nonlinearity is elementwise, you would apply this elementwise too...

#

so maybe it's sqrt(784)

#

honestly i have no idea, ask your prof and let us know

#

there is this, which is more what i would intuitively expect people to use https://paperswithcode.com/method/weight-normalization

Papers with Code - Weight Normalization Explained

Weight Normalization is a normalization method for training neural networks. It is inspired by batch normalization, but it is a deterministic method that does not share batch normalization's property of adding noise to the gradients. It reparameterizes each weight vector $\textbf{w}$ in terms of a parameter vector $\textbf{v}$ and a scalar param...

lapis sequoia May 14, 2021, 2:41 PM

#

@desert oar If i want to start technology business should i be good at math for coding?

bronze skiff May 14, 2021, 2:41 PM

#

dividing by the sqrt of num neurons is xavier initialization

lapis sequoia May 14, 2021, 2:42 PM

#

What are you all talking about ?

desert oar May 14, 2021, 2:42 PM

#

@bronze skiff i believe this is inside the network before applying the nonlinearity

bronze skiff May 14, 2021, 2:42 PM

#

suppose you assume your inputs are distributed according to N(0,1)

lapis sequoia May 14, 2021, 2:42 PM

#

Wait

#

Is this linear algebra?

bronze skiff May 14, 2021, 2:42 PM

#

then after applying independent weights in a dot product, you have something that is N(0,1)+...+N(0,1) = N(0,num_neurons) distributed

#

this is too large, so you divide by sqrt(num_neurons) to bring it back to N(0,1)
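A quick numerical check of that argument, as a sketch under the stated assumptions (inputs and weights drawn independently from N(0,1)). The sizes 784 and 128 are taken from the question above; the sample count and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_outputs, num_samples = 784, 128, 2000

x = rng.standard_normal((num_samples, num_inputs))   # inputs ~ N(0, 1)
W = rng.standard_normal((num_inputs, num_outputs))   # weights ~ N(0, 1)

pre = x @ W                          # preactivations: sums of 784 products
raw_var = pre.var()                  # close to num_inputs

scaled = pre / np.sqrt(num_inputs)   # divide by sqrt of the number of inputs
scaled_var = scaled.var()            # close to 1
```

In this sketch the divisor is the number of inputs feeding each neuron (784), not the number of outputs, because each preactivation is a sum over 784 terms.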

desert oar May 14, 2021, 2:43 PM

#

so if you can assume that the weights are already normal then yeah i can see how it acts to scale the output back down to unit variance

have you heard of doing this in the network itself, rather than for initialization?

bronze skiff May 14, 2021, 2:43 PM

#

this is what batch norm is kinda trying to do

#

but often initialization is good enough

#

which is why for example, pytorch nn.Linear does this by default

desert oar May 14, 2021, 2:44 PM

#

batch norm actually uses the estimated std dev of the weights right?

bronze skiff May 14, 2021, 2:44 PM

#

if you look at the source code for it, it initializes by dividing by the sqrt(num_neurons) already

bronze skiff May 14, 2021, 2:45 PM

#

desert oar batch norm actually uses the estimated std dev of the weights right?

yes, but batchwise-- so if your batch is too small it leads to biased estimates

#

another way to do this is layer norm

desert oar May 14, 2021, 2:45 PM

#

ah, i didn't realize layer norm was a thing. this must be a form of layer normalization then?

bronze skiff May 14, 2021, 2:45 PM

#

dynamically normalizing signals in neural nets is still an active field 😛

desert oar May 14, 2021, 2:46 PM

#

again i am familiar with dividing by the norm of the weight vector... but not by the sqrt of the number of weights

#

i just found the term "weight standardization", which seems apt here

bronze skiff May 14, 2021, 2:46 PM

#

yeah

#

one is to normalize the "size" of the preactivation outputs

#

and one is to actually normalize the "distribution" of the preactivations

#

they're not unrelated, but not the same

potent badge May 14, 2021, 2:47 PM

#

bronze skiff this is too large, so you divide by sqrt(num_neurons) to bring it back to N(0,1)

so Ive got layers [784, 784, 784, 10] when I'm dealing with the output layer would I do sqrt (10)?

desert oar May 14, 2021, 2:47 PM

#

https://arxiv.org/pdf/1903.10520.pdf
https://paperswithcode.com/method/weight-standardization
they still use the estimated std dev from the weights though

Papers with Code - Weight Standardization Explained

Weight Standardization is a normalization technique that smooths the loss landscape by standardizing the weights in convolutional layers. Different from the previous normalization methods that focus on activations, WS considers the smoothing effects of weights more than just length-direction decoupling. Theoretically, WS reduces the Lipschitz co...

#

they don't assume the variance is = # of weights

#

im really curious what conditions make that assumption valid; apparently something analogous is valid in attention units according to what i found and posted above

#

side note: i really like the graphics in this paper

#

looks like good old matplotlib

#

well-labeled figures, nicely typeset equations

bronze skiff May 14, 2021, 2:51 PM

#

potent badge so Ive got layers [784, 784, 784, 10] when I dealing with the output layer would...

during initialization of the net

#

https://cs230.stanford.edu/section/4/

Section 4 (Week 4)

Xavier Initialization and Regularization

potent badge May 14, 2021, 2:52 PM

#

so I would only divide for the first layer?

bronze skiff May 14, 2021, 2:53 PM

#

i think you're conflating an initialization with normalization

#

initialization is in the very beginning, when a network is constructed

#

normally the weights are randomly sampled under maybe an N(0,1) distribution

#

instead, we divide each weight in each layer by the sqrt of the number of input neurons to that layer

#

that's the initialized net

#

afterwards, we just run the net like normal during training
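
A minimal sketch of the scaling being described, assuming NumPy and a single 784-to-784 layer fed with unit-variance inputs. The layer size comes from the conversation; the variable names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 784

x = rng.standard_normal((1000, fan_in))          # batch of unit-variance inputs
w_raw = rng.standard_normal((fan_in, fan_out))   # weights sampled from N(0, 1)
w_scaled = w_raw / np.sqrt(fan_in)               # scaled once, at initialization

std_raw = (x @ w_raw).std()        # spread of the unscaled preactivations
std_scaled = (x @ w_scaled).std()  # spread after scaling the weights
```

Without the scaling the preactivations have a standard deviation near sqrt(784) = 28; with it they stay near 1.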

potent badge May 14, 2021, 2:55 PM

#

hmmm

bronze skiff May 14, 2021, 2:55 PM

#

also, +1 for jax

potent badge May 14, 2021, 2:56 PM

#

so I am dividing the weights before i do the dot product?

bronze skiff May 14, 2021, 2:56 PM

#

this is weight initialization

#

there are no dot products

potent badge May 14, 2021, 2:57 PM

#

okay makes sense but do you see where the sqrt is in that code

#

it happened after every dot product

#

but I just took some out

#

and im trying to figure out why

#

my professor had written the dot product like that

bronze skiff May 14, 2021, 2:59 PM

#

technically since it's preactivation there is literally no difference between "initializing the weights by dividing by sqrt" and "initializing weights at N(0,1) and dividing the preactivations by sqrt"

#

it's just math
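
The equivalence can be checked numerically. This is a small sketch assuming NumPy, with hypothetical shapes and names:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 784, 10
x = rng.standard_normal((32, n_in))
w = rng.standard_normal((n_in, n_out))

scaled_weights = x @ (w / np.sqrt(n_in))   # divide the weights, then multiply
scaled_preact = (x @ w) / np.sqrt(n_in)    # multiply, then divide the preactivations
same_forward = np.allclose(scaled_weights, scaled_preact)
```

The forward pass is identical either way. One caveat worth keeping in mind: if the division stays inside the forward function during training, the gradient with respect to the stored weights is also multiplied by that constant, so the effective step size can differ from a net whose weights were scaled once at initialization.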

potent badge May 14, 2021, 3:00 PM

#

thats what I would have thought, but he originally did it for the hidden layers and output layer as well... dividing by sqrt 784

bronze skiff May 14, 2021, 3:01 PM

#

that's fine

#

you scale by the number of input neurons

#

so at each input, you have 784 neurons

#

it's only the output that you have 10
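
For the layer list from the question, the fan-in of every weight matrix (including the one feeding the 10-unit output layer) can be read off directly. A small sketch with illustrative names:

```python
import math

layer_sizes = [784, 784, 784, 10]

# Weight matrix i maps layer_sizes[i] inputs to layer_sizes[i + 1] outputs,
# so its fan-in is the size of the layer feeding into it.
fan_ins = layer_sizes[:-1]
scales = [1 / math.sqrt(n) for n in fan_ins]
```

All three matrices have 784 inputs, so all three use 1/sqrt(784) = 1/28; sqrt(10) never appears.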

potent badge May 14, 2021, 3:01 PM

#

when I take the sqrts out and train the model as is, i get boosted accuracy, but when I do the sqrts the accuracy is like 10% for every epoch

bronze skiff May 14, 2021, 3:02 PM

#

¯_(ツ)_/¯

potent badge May 14, 2021, 3:02 PM

#

Pepega

#

literally without the sqrts

#

#

and then with them it will be like 0.10..etc (10%)

desert oar May 14, 2021, 3:04 PM

#

are you sure you didnt misunderstand what the prof was doing

#

(im also surprised that taking out the sqrts messes up the model that badly)

#

wouldnt you divide by sqrt(len(w))?

potent badge May 14, 2021, 3:05 PM

#

uh i dont think i could have misunderstood, he kinda just threw this file at us and was like "fix it"

#

this was his code so him dividing by 784 is like what i dont understand

desert oar May 14, 2021, 3:06 PM

#

oh thats your prof's code?

potent badge May 14, 2021, 3:06 PM

#

yeah

#

he changed it a bit from the Jax Neural Network code thats out there on the web

bronze skiff May 14, 2021, 3:07 PM

#

i just don't understand why there's a relu, and then relu again

potent badge May 14, 2021, 3:07 PM

#

under the #skip pre activations?

#

i believe its because we have not yet taken the relu of the input layer, so we do that and then that's our new x value

#

this is the original from jax creators or whatever

#

desert oar May 14, 2021, 3:10 PM

#

bronze skiff i just don't understand why there's a relu, and then relu again

1 relu good, 2 relu better lemon_eyes

grave frost May 14, 2021, 3:16 PM

#

desert oar 1 relu good, 2 relu better lemon_eyes

infinite relus enters the chat

haughty tree May 14, 2021, 3:16 PM

#

hey, I want to get started with AI and ML. Is there a course I can follow? I did maths in school and I'm a CS major (but I don't mind brushing up on my maths for ML, so please suggest something)

#

lacking without proper guidance

echo orbit May 14, 2021, 3:20 PM

#

Hi, i'm working on a project about COVID19 Tweets (especially hashtags), and i was thinking about making a neural network to make predictions.
I currently have a dataframe with each row listing all the hashtags used in a single tweet/thread (lists) based on multiple months (from january 2020 to march 2021). I already studied it using networkx (still a wip but it's all about aesthetics now). That means hashtags from a same list (so same row) are "linked" to each others.
My question is: is it possible to make a model that trains on these hashtag lists month by month, and then ask it to predict what next month's hashtag links would be (so I can build another network and compare it to the data)?

haughty tree May 14, 2021, 3:20 PM

#

Kindly help me I'm just delaying

echo orbit May 14, 2021, 3:20 PM

#

                   Tweet_ID                          Hashtag
0       1219778294238699520                   [#coronavirus]
1       1219780718680633344           [#us, #wuhanpneumonia]
2       1219785759277772800                [#wuhanpneumonia]
3       1219791407377895424                   [#coronavirus]
4       1219797876127215616                   [#coronavirus]
5       1219805336074215424                         [#virus]
6       1219806921953181697              [#wuhancoronavirus]
7       1219809142237552640                      [#ncov2019]
8       1219811430825771008                      [#breaking]
9       1219813007695286272                   [#coronavirus]
10      1219813206379466752                   [#coronavirus]
11      1219815181599019008                   [#coronavirus]
12      1219817038354558976                         [#wuhan]
13      1219818433165946880         [#us, #wuhancoronavirus]
14      1219819377157005314              [#wuhancoronavirus]
15      1219823330234003462                   [#coronavirus]
16      1219824203454529536                   [#coronavirus]
17      1219824463367172096                      [#breaking]
18      1219824742099824640                   [#coronavirus]
19      1219826185049231360              [#wuhancoronavirus]
20      1219828098025213952                            [#us]
21      1219832397790760966                   [#coronavirus]
22      1219832615743770624                      [#breaking]
23      1219837312114286594                   [#coronavirus]
24      1219838131005874176                [#wuhanpneumonia]
25      1219838150530351104                            [#us]
26      1219839406351106048                         [#wuhan]
27      1219840010779873281  [#china, #china, #wuhan, #ncov]
28      1219840206519422976                            [#us]
29      1219840734418747393              [#wuhancoronavirus]

#

The df looks like this (300K+ tweets)

#

I don't really know to what extent a neural network can make predictions here, so if anyone can enlighten me regarding my issue, that'd be greatly appreciated

grave frost May 14, 2021, 3:27 PM

#

haughty tree Kindly help me I'm just delaying

you can check out the pinned resources

desert oar May 14, 2021, 3:47 PM

#

@echo orbit this looks like what they call a "multi-label classification" task

#

although it's also kind of a time series
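
To make the "multi-label" framing concrete: each tweet can be turned into a 0/1 vector with one slot per known hashtag, and a model then predicts several slots at once. A plain-Python sketch using three rows from the sample dataframe; the helper name is hypothetical:

```python
tweets = [
    ["#coronavirus"],
    ["#us", "#wuhanpneumonia"],
    ["#china", "#china", "#wuhan", "#ncov"],
]

# Fixed vocabulary: one output slot per hashtag.
vocab = sorted({tag for tags in tweets for tag in tags})
index = {tag: i for i, tag in enumerate(vocab)}

def multi_hot(tags):
    # 1 where the hashtag is present in the tweet, 0 elsewhere
    row = [0] * len(vocab)
    for tag in set(tags):
        row[index[tag]] = 1
    return row

encoded = [multi_hot(tags) for tags in tweets]
```

Most entries are zero, which is the "sparse vector" mentioned in the follow-up question.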

echo orbit May 14, 2021, 3:48 PM

#

what do you mean (regarding the multi-label classification) please ?

desert oar May 14, 2021, 3:48 PM

#

@grave frost you're the RNN evangelist here, how would you model a time series where each time point is a sparse vector?

echo orbit May 14, 2021, 3:49 PM

#

Regarding time series i took a look on some articles but what i see is each value is set to a specific time, while here it's everything for the same month

#

So i'm a bit confused regarding how to approach the problem

haughty tree May 14, 2021, 3:50 PM

#

grave frost you can check out the pinned resources

where is it pinned?

grave frost May 14, 2021, 3:50 PM

#

echo orbit Regarding time series i took a look on some articles but what i see is each valu...

you don't

#

If it were me, I would compile a sizeable file of all possible hashtags @echo orbit. You can easily scrape them from twitter (putting a min limit to ensure they are reasonably famous), encode the tokens numerically, and try to predict them

desert oar May 14, 2021, 3:52 PM

#

haughty tree where is it pinned?

there is an icon at the top of the discord window, near the search bar

echo orbit May 14, 2021, 3:52 PM

#

On my program i made a dict so it takes only the 50 most used hashtags regarding COVID19 tweets (if that's what you were asking) for each month

#

However i don't understand what you mean by encoding the tokens then predicting them

grave frost May 14, 2021, 3:53 PM

#

echo orbit On my program i made a dict so it takes only the 50 most used hashtags regarding...

good. you won't get much out of it tho

#

the tokens method was something off NLP - I doubt it wouldn't work reasonably for your problem

echo orbit May 14, 2021, 3:54 PM

#

Yeah, it's just there so I don't have to count each hashtag's occurrences myself

grave frost May 14, 2021, 3:55 PM

#

it was based on the fact twitter is full of fools - their tags are like # + <some_weird_place> + <virus_name> + year all of which can be broken down into tokens.

#

best you can do is to try it - can't guarantee

echo orbit May 14, 2021, 3:55 PM

#

I see

grave frost May 14, 2021, 3:56 PM

#

so #chinacoronavirus and #wuhanvirus and #wuhancorona would be decomposable

#

I am not a twitter or Time-series expert, so take my advice with a grain of salt

echo orbit May 14, 2021, 3:57 PM

#

I don't think i'll have time for that unfortunately (as i have to submit the project before monday and i have a lot of stuff to do beside that)

#

If predictions aren't possible (at least not with such a short time), is it perhaps possible to categorize hashtags regarding how "linked" they are ?

#

Then make a program that gives the probability of two hashtags being linked for ex ?

grave frost May 14, 2021, 3:58 PM

#

yes - you can categorize hashtags but you need labelled data

#

wdym by linked?

echo orbit May 14, 2021, 4:01 PM

#

If i take the dataframe sample i posted above :

27      1219840010779873281  [#china, #china, #wuhan, #ncov]
In the tweet with the tweet_id `1219840010779873281`, `#china, #wuhan and #ncov` are in the same tweet/thread, so i consider them "linked" here (with china used twice, still have to figure out if i should count it twice or not)

#

With networkx i made a network to see these links in a more general way, and my question was if it's currently possible to make a program so it gives the probability for 2 hashtags to be linked

desert oar May 14, 2021, 4:08 PM

#

if you just want to score how strongly 2 hashtags tend to appear together in a tweet, you can use pointwise mutual information https://en.wikipedia.org/wiki/Pointwise_mutual_information

Pointwise mutual information

Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics. In contrast to mutual information (MI) which builds upon PMI, it refers to single events, whereas MI refers to the average of all possible events.
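
A small plain-Python sketch of PMI over hashtag lists. The toy tweets are partly made up, each hashtag is counted once per tweet (one possible answer to the "#china twice" question), and log base 2 is an arbitrary choice:

```python
import math
from collections import Counter
from itertools import combinations

tweets = [
    ["#china", "#china", "#wuhan", "#ncov"],
    ["#us", "#wuhanpneumonia"],
    ["#coronavirus"],
    ["#wuhan", "#china"],
]

n = len(tweets)
tag_counts = Counter()
pair_counts = Counter()
for tags in tweets:
    unique = sorted(set(tags))                   # count each hashtag once per tweet
    tag_counts.update(unique)
    pair_counts.update(combinations(unique, 2))  # pairs stored in sorted order

def pmi(a, b):
    # log2 of P(a, b) / (P(a) * P(b)); -inf if the pair never co-occurs
    p_ab = pair_counts[tuple(sorted((a, b)))] / n
    if p_ab == 0:
        return float("-inf")
    return math.log2(p_ab / ((tag_counts[a] / n) * (tag_counts[b] / n)))

china_wuhan = pmi("#china", "#wuhan")
```

A positive score means the pair shows up together more often than independence would predict; the raw co-occurrence probability is just `pair_counts[pair] / n`.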

echo orbit May 14, 2021, 4:36 PM

#

Hmmm

#

I don't see what i can do to improve my project then

#

aside from visualizing the data through networkx

light stump May 14, 2021, 4:36 PM

#

does anyone know how to perform maximum likelihood estimation for a 2D Gaussian?

uncut barn May 14, 2021, 4:46 PM

#

does anyone know what the K variable here means?

tidal bough May 14, 2021, 4:50 PM

#

number of dimensions

light stump May 14, 2021, 4:51 PM

#

my issue is that I don’t know how to translate the math into code

#

I’m trying to fit 2d Gaussians to fluorescent peaks on an image using MLE

#

but my parameters are somewhat nonstandard
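
For the textbook case only (i.i.d. 2D samples, not the image-fitting setup with nonstandard parameters mentioned here), the maximum likelihood estimates have a closed form: the sample mean and the covariance normalized by N. A sketch assuming NumPy, with made-up true parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
samples = rng.multivariate_normal(true_mean, true_cov, size=20000)

mean_hat = samples.mean(axis=0)                  # MLE of the mean
centered = samples - mean_hat
cov_hat = centered.T @ centered / len(samples)   # MLE divides by N, not N - 1
```

When the model has custom parameters, the usual route is to write the negative log-likelihood as a function and hand it to a numerical optimizer instead.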

uncut barn May 14, 2021, 4:54 PM

#

@tidal bough so if we were working in 3D, k=3? but how can that be if x, the input vector, ranges from 1 to P? are we assuming k=P?

tidal bough May 14, 2021, 4:55 PM

#

uncut barn @tidal bough so if we were workin in 3D k=3?, but how can that be if x...

hmm, no idea why that's the case

#

nevertheless, it should be the number of dimensions

uncut barn May 14, 2021, 4:56 PM

#

so number of features in x?

tidal bough May 14, 2021, 4:56 PM

#

yeah, k=P
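
The formula being asked about was posted as an image that is not in this log. Assuming it is the standard multivariate normal density, k shows up as the length of the input vector, as in this NumPy sketch (the function name is illustrative):

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    k = x.shape[-1]                      # number of dimensions (features) in x
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)   # log of the determinant of cov
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

at_origin = mvn_logpdf(np.zeros(2), np.zeros(2), np.eye(2))
```

For a 2D standard normal evaluated at the origin this gives -log(2*pi), since k = 2 and the other two terms vanish.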

uncut barn May 14, 2021, 4:57 PM

#

ok thanks

strange fern May 14, 2021, 5:05 PM

#

velvet thorn the first column (`0`) of every row in `X_r` `X_r[]` for which the corresponding...

Sorry I couldn't respond right away, thank you for the insight! I worked it through based on what you told me and I think I understand it, thanks!!

grave frost May 14, 2021, 5:19 PM

#

Suppose I am working with a Masked Language Model to pre-train on a specific dataset. In that dataset, most sequences have a particular token of a high frequency

Sample Sequence:-
    <tok1>, <tok1>, <tok4>, <tok7>, <tok4>, <tok4> ---> here tok4 is very frequent in this sequence

So if I mask some tokens and get the model to train to predict those masked tokens, obviously the model will gain a bias in predicting <tok4> due to its statistical frequency.

Since <tok4> represents important information, 'downsampling' (or removing those frequent tokens) would not be preferred and I would love to have my sequence as intact as possible.

How best should I deal with this? Is there any already established method that can counter this problem?

bronze skiff May 14, 2021, 5:24 PM

#

you can "preweight" the sequence before the attention step

#

if you're using an attention model

desert oar May 14, 2021, 5:24 PM

#

is that something like class weighting in logistic regression?

bronze skiff May 14, 2021, 5:25 PM

#

kinda-- though i think in logistic regression class weighting is penalizing incorrect classification of the dominant class less?

#

here it's like, you have the sequence <tok1> <tok1> <tok2> <tok3> but really you have something like <tok1, 0.1> <tok1, 0.1> <tok2, 0.5> <tok3, 0.3> where the weighting could be based on frequency or something

#

and so during self-attention you take weighted dot products instead of regular dot products
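
One possible reading of this suggestion, sketched with NumPy: treat the per-token weights as a soft attention mask by adding their logs to the attention logits. This is an assumption about what "weighted dot products" could mean, not an established method, and the weights are the illustrative ones from the message above:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8
tokens = ["tok1", "tok1", "tok2", "tok3"]
token_weights = np.array([0.1, 0.1, 0.5, 0.3])   # hypothetical per-position weights

q = rng.standard_normal((len(tokens), d))
k = rng.standard_normal((len(tokens), d))

scores = q @ k.T / np.sqrt(d)
plain = softmax(scores)
# Adding log(weight) to the logits multiplies each key's unnormalized
# attention by its weight before renormalizing.
weighted = softmax(scores + np.log(token_weights))
```

Frequent tokens with small weights then receive proportionally less attention, while the sequence itself stays intact.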

desert oar May 14, 2021, 5:29 PM

#

bronze skiff kinda-- though i think in logistic regression class weighting is penalizing inco...

rewarding correct classification of the rare class, but same thing basically

bronze skiff May 14, 2021, 5:30 PM

#

desert oar rewarding correct classification of the rare class, but same thing basically

i'm clearly a pessimist

grave frost May 14, 2021, 5:30 PM

#

bronze skiff and so during self-attention you take weighted dot products instead of regular d...

so I adjust the attention mask? I was thinking along the same lines, but I have never seen it ever implemented

bronze skiff May 14, 2021, 5:30 PM

#

grave frost so I adjust the attention mask? I was thinking along the same lines, but I have ...

yes, and, first time for everything

grave frost May 14, 2021, 5:31 PM

#

first time for everything
so nothing of that sort has even been written?

bronze skiff May 14, 2021, 5:31 PM

#

maybe it has, would be surprised if it hasn't

grave frost May 14, 2021, 5:33 PM

#

aight. now just to find someone capable enough

#

thanks a lot BTW @bronze skiff 🚀

bronze skiff May 14, 2021, 5:33 PM

#

np

upper sphinx May 14, 2021, 6:33 PM

#

May I ask, what are the advantages of using a python notebook over a regular python script?

#

I've heard that notebooks are often used in data science and machine learning, so should I only use them for those specifically?

serene scaffold May 14, 2021, 6:55 PM

#

Suppose there's a function that takes two arguments. Is there a vectorized way to call this function with every like row (same index) in two dataframes?

desert oar May 14, 2021, 7:22 PM

#

serene scaffold Suppose there's a function that takes two arguments. Is there a vectorized way t...

what do you mean "like row"?

#

maybe something like numpy.vectorize?

serene scaffold May 14, 2021, 7:24 PM

#

desert oar what do you mean "like row"?

The rows have indices that evaluate as equal

desert oar May 14, 2021, 7:24 PM

#

so you want to pair rows together across the 2 dataframes?

bronze skiff May 14, 2021, 7:25 PM

#

uh, why not join the two dataframes on the index

#

and then apply your function

desert oar May 14, 2021, 7:26 PM

#

or use pd.concat to get the multiindex column name

bronze skiff May 14, 2021, 7:26 PM

#

though you'll probably need to precompose your two-arity function with a random lambda

desert oar May 14, 2021, 7:27 PM

#

result = (
    pd.concat(
        {'x': df1, 'y': df2},
        axis=1,
    )
    .apply(
        lambda row: myfunc(row['x'], row['y']),
        axis=1,
    )
)

serene scaffold May 14, 2021, 7:32 PM

#

desert oar ```python result = ( pd.concat( {'x': df1, 'y': df2}, axis=1...

Lisp intensifies

desert oar May 14, 2021, 7:33 PM

#

i wish i could write it like this and not have people get confused

result = (
    pd.concat(
        {'x': df1, 'y': df2},
        axis=1)
    .apply(
        lambda row: myfunc(row['x'], row['y']),
        axis=1))

#

(which would be the lispy version)

serene scaffold May 14, 2021, 7:34 PM

#

Anyway, isn't this going to make a dataframe of dataframes or something?

#

Seems like a bad data model

desert oar May 14, 2021, 7:35 PM

#

it depends on what myfunc returns

serene scaffold May 14, 2021, 7:35 PM

#

A float

desert oar May 14, 2021, 7:35 PM

#

then it should just return a series of floats
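
A runnable version of the snippet above on toy data. `df1`, `df2` and `myfunc` are the names used in the conversation; the values and the dot-product body of `myfunc` are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["r1", "r2"])
df2 = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, 50.0]}, index=["r2", "r1"])

def myfunc(x, y):
    # hypothetical two-argument function: dot product of two aligned rows
    return float((x * y).sum())

# Rows are matched by index label, not by position.
paired = pd.concat({"x": df1, "y": df2}, axis=1)
result = paired.apply(lambda row: myfunc(row["x"], row["y"]), axis=1)

# When the function can be written with whole-frame operations, no apply is needed.
vectorized = (df1 * df2).sum(axis=1)
```

`result` is a Series of floats indexed by row label, and for this particular `myfunc` the whole-frame expression gives the same numbers.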

fresh zenith May 14, 2021, 8:17 PM

#

hey guys, if i want to get started in graphing things

#

matplotlib is the thing to learn right

#

but can you use it in any old ide

#

like pycharm?

serene scaffold May 14, 2021, 8:28 PM

#

fresh zenith but can you use it in any old ide

the IDE is a means of editing the code. that's unrelated to what the code actually does. So yes, you can use any IDE.

#

Or no IDE for that matter.

serene scaffold May 14, 2021, 8:29 PM

#

desert oar ```python result = ( pd.concat( {'x': df1, 'y': df2}, axis=1...

result = pd.concat(
        {'x': df1, 'y': df2},
        axis=1
    ).apply(
        lambda row: myfunc(row['x'], row['y']),
        axis=1
    )

This is how I'd have done it.

bronze skiff May 14, 2021, 8:30 PM

#

or since you're looking for indices that are equal, you know, just join

serene scaffold May 14, 2021, 8:31 PM

#

serene scaffold ```py result = pd.concat( {'x': df1, 'y': df2}, axis=1 ).app...

I somehow now don't like this either.

serene scaffold May 14, 2021, 8:32 PM

#

bronze skiff or since you're looking for indices that are equal, you know, just join

but wouldn't I then need to specify which columns should be included in either argument?

bronze skiff May 14, 2021, 8:32 PM

#

ah, you sound like you want something like a "zip" for dataframes

serene scaffold May 14, 2021, 8:32 PM

#

ye

bronze skiff May 14, 2021, 8:33 PM

#

i actually thought you could iterate through a df like that?

#

though not sure how to do it in a vectorized way that isn't a giant loop

serene scaffold May 14, 2021, 8:33 PM

#

I mean sure, but I want it to be v e c t o r i z e d

bronze skiff May 14, 2021, 8:33 PM

#

i mean, apply isn't even vectorized

#

sadly

serene scaffold May 14, 2021, 8:34 PM

#

ah well

bronze skiff May 14, 2021, 8:34 PM

#

it's a giant loop under the hood

desert oar May 14, 2021, 8:37 PM

#

bronze skiff or since you're looking for indices that are equal, you know, just join

pd.concat is a join on indices unless you specifically tell it not to

#

pd.concat, pd.merge, and pd.DataFrame.join are all kind of the same thing

#

in fact pretty much any pandas operation is a join on index
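
A quick illustration of that point on two small frames with differently ordered indexes (the data is made up):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [30, 10]}, index=["z", "x"])

via_concat = pd.concat([left, right], axis=1)   # outer join on the index by default
via_join = left.join(right, how="outer")
via_merge = left.merge(right, left_index=True, right_index=True, how="outer")
```

In all three results row "x" is paired with b = 10, row "z" with b = 30, and row "y" gets a missing value for b.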

bronze skiff May 14, 2021, 8:38 PM

#

ah yes, you're right

#

i literally haven't used pandas in months

desert oar May 14, 2021, 8:38 PM

#

there's a lot to forget

red hound May 14, 2021, 8:51 PM

#

    with tf.GradientTape() as gen_tape:
        predictions = generator(z)
        #predictions = tf.cast(predictions, dtype=tf.int32)
        predictions = tf.nn.embedding_lookup(embedding, predictions)
        predictions = tf.reshape(predictions, shape=(128, 18, 200))

I'm looking for a workaround, as tf.cast isn't differentiable but embedding_lookup strictly needs integers as indices. Since I want to optimize the generator, casting outside the gradient tape is not an option. If you've got an idea, please feel free to ping me

ripe forge May 14, 2021, 8:56 PM

#

serene scaffold I mean sure, but I want it to be v e c t o r i z e d

sounds like a bad premise. If the operation needs to be vectorized, the function should fundamentally be operating on vectors, not individual data points

#

also, fwiw, np.vectorize is a noob trap: it's essentially a plain Python loop under the hood. Also, amusingly enough, if you're using apply, you might actually get better performance from a good old list comp.
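
A small sketch of the difference, assuming NumPy. The scalar function is hypothetical; the point is that `np.vectorize` still calls it once per element, while the `np.where` version works on whole arrays:

```python
import numpy as np

def square_if_positive(x):
    # scalar function written with ordinary Python control flow
    return x * x if x > 0 else 0.0

xs = np.array([-2.0, -1.0, 0.5, 3.0])

looped = np.vectorize(square_if_positive)(xs)   # Python-level call per element
vectorized = np.where(xs > 0, xs * xs, 0.0)     # whole-array operations
```

Both give the same values; only the second avoids the per-element Python calls, which is where the speedup on large arrays comes from.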

serene scaffold May 14, 2021, 9:59 PM

#

ripe forge sounds like a bad premise. If the operation needs to be vectorized, the function...

Performance aside, I like how the "numpy data model" encourages you to think in terms of operations on the whole data rather than in terms of loops. And in general I try to learn how to do whatever I'm trying to do in numpy/pandas/etc without explicit loops.

patent loom May 14, 2021, 10:06 PM

#

So I’m trying to do some sentiment analysis on movie scripts and trying to distinguish at least a variety of emotions based on each sentence. Does anybody have any tips or recommendations on how to get started?

#

Was going to use nltk for python and then go from there

serene scaffold May 14, 2021, 10:08 PM

#

patent loom So I’m trying to do some sentiment analysis on movie scripts and trying to disti...

I've only heard of sentiment analysis that classifies "positive" and "negative", and maybe "neutral". I'm not familiar with one that attempts to classify into more specific emotions than that

#

So you might want to look into multi class sentiment analysis and see if that, well, exists

patent loom May 14, 2021, 10:10 PM

#

Well I’ll just make the adjustment and base it on positive, neutral and negative per sentence

serene scaffold May 14, 2021, 10:10 PM

#

Make the adjustment?

patent loom May 14, 2021, 10:10 PM

#

What about Bert?

#

I meant something else for the adjustment part

serene scaffold May 14, 2021, 10:11 PM

#

One could probably involve bert in a sentiment analysis pipeline, yes

#

So you want a model that predicts a tuple of three floats (positive, negative, neutral) for each input?

patent loom May 14, 2021, 10:15 PM

#

No I’m trying to do something like VaderSentiment

#

Ignore me right now. I’m not making sense and I’m stressed. I’ll be back once I get my shit together

grave frost May 14, 2021, 10:19 PM

#

VADER is pretty old-tech. Pre-trained models are all the rage now

patent loom May 14, 2021, 10:20 PM

#

Any suggestions?

#

@grave frost

grave frost May 14, 2021, 10:20 PM

#

patent loom Any suggestions?

what's the dataset?

patent loom May 14, 2021, 10:21 PM

#

A movie script from the movie script database

grave frost May 14, 2021, 10:21 PM

#

imdb?

patent loom May 14, 2021, 10:21 PM

#

https://imsdb.com/

The Internet Movie Script Database (IMSDb)

Movie scripts, Film scripts at IMSDb

grave frost May 14, 2021, 10:21 PM

#

you can use simple RNN's if you are new

patent loom May 14, 2021, 10:21 PM

#

https://imsdb.com/scripts/Joker.html

grave frost May 14, 2021, 10:21 PM

#

rather than jumping on pre-trained models

patent loom May 14, 2021, 10:21 PM

#

Here goes an example of a script

#

Could you send a link to the RNN documentation

grave frost May 14, 2021, 10:22 PM

#

RNN is a type of model arch

#

I recommend you learn the ML basics first before diving in

flint mason May 14, 2021, 10:22 PM

#

Can we store bucket iterator type dataset in pytorch

grave frost May 14, 2021, 10:22 PM

#

flint mason Can we store bucket iterator type dataset in pytorch

generator?

patent loom May 14, 2021, 10:23 PM

#

https://tenor.com/view/sad-cry-crying-tears-broken-gif-15062040

Tenor

flint mason May 14, 2021, 10:23 PM

#

grave frost generator?

yeah

patent loom May 14, 2021, 10:24 PM

#

It’s a love hate relationship with coding man 🥲

grave frost May 14, 2021, 10:24 PM

#

flint mason yeah

yea pytorch has generators

grave frost May 14, 2021, 10:24 PM

#

patent loom It’s a love hate relationship with coding man 🥲

I hate coding myself
~~everyone on this server - triggered~~

flint mason May 14, 2021, 10:24 PM

#

no, how can we store BucketIterator-type datasets from the torchtext library and load them later, to avoid the download time every time the code runs

grave frost May 14, 2021, 10:25 PM

#

flint mason no how can we store bucketiteratore type datasets from torchtext library and loa...

you write on your own then - shouldn't be too difficult

flint mason May 14, 2021, 11:12 PM

#

grave frost you write on your own then - shouldn't be too difficult

What are you talking about? How am I supposed to write training data on my own

exotic maple May 14, 2021, 11:40 PM

#

does TF have a train/test splitter or do you use sklearns train_test_split?

serene scaffold May 14, 2021, 11:50 PM

#

exotic maple does TF have a train/test splitter or do you use sklearns train_test_split?

I would just use the sklearn one

exotic maple May 14, 2021, 11:51 PM

#

serene scaffold I would just use the sklearn one

thanks! In that case, to get the train, val, test splits you can do 2 train_test_splits, right? First to get train and test data, and then split the train data again to get train and val datasets

velvet thorn May 14, 2021, 11:51 PM

#

exotic maple thanks! In that you case to get the train,val, test splits you can do 2 train_te...

ye
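
A sketch of the two-step split described above, assuming scikit-learn and NumPy are installed. The data is synthetic and the 60/20/20 proportions are an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 synthetic samples, 2 features
y = np.arange(50)

# First split: hold out 20% of all samples as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Second split: 25% of the remaining 80% becomes validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)

sizes = (len(X_train), len(X_val), len(X_test))
```

Note that the second `test_size` is a fraction of what is left after the first split, not of the full dataset.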

lunar zenith May 15, 2021, 12:57 AM

#

Anyone knows matplot?

#

#

How do I get the recovered out of there

#

df_pie = df.loc[(df['Recovered'] >= 80000)]
df_pie = df_pie.groupby('WHO Region')['Recovered'].mean()

df_pie
df_pie.plot(kind='pie', radius=2)

near cosmos May 15, 2021, 1:33 AM

#

lunar zenith ```py df_pie = df.loc[(df['Recovered'] >= 80000)] df_pie = df_pie.groupby('WHO R...

It's been a bit since I've used pandas plot, but I would do

fig, ax = plt.subplots(1, 1)
df_pie.plot(..., ax=ax)

That gives you handles on the figure and axis objects, and you can futz around to edit the figure further

lunar zenith May 15, 2021, 1:39 AM

#

near cosmos It's been a bit since I've used pandas plot, but I would do ``` fig, ax = plt.s...

what exactly do you mean by ax = ax

lunar zenith May 15, 2021, 1:40 AM

#

near cosmos It's been a bit since I've used pandas plot, but I would do ``` fig, ax = plt.s...

I want to use a pie chart tho

#

not a regular plot

#

Is there a way to maximize, or remove the "Recovered" in the chart

#

near cosmos May 15, 2021, 1:58 AM

#

lunar zenith

ylabel=None maybe?

near cosmos May 15, 2021, 1:59 AM

#

near cosmos `ylabel=None` maybe?

Sorry, I'm on my phone so it's hard to write out in full

dapper hatch May 15, 2021, 2:31 AM

#

hi, has anyone built a web app that runs two jupyter notebooks?

severe cloud May 15, 2021, 3:00 AM

#

i am building a face detection app on opencv but i am stumbling on the issue that its only drawing one eye and not all the faces with both eyes

#

#

can anyone point me in the right direction to drawing the eyes and faces of everyone in the picture?

lavish tundra May 15, 2021, 3:13 AM

#

does anyone here understand line graph animation using matplotlib?

zealous tulip May 15, 2021, 3:43 AM

#

I am a beginner python user, any libraries or software to learn to get into visual recognition and machine learning

exotic maple May 15, 2021, 4:07 AM

#

zealous tulip I am a beginner python user, any libraries or software to learn to get into visu...

more than libraries you should focus on understanding machine learning itself. the libraries aren't complicated when you have an idea of what you're doing

zealous tulip May 15, 2021, 4:08 AM

#

I see @exotic maple

exotic maple May 15, 2021, 4:09 AM

#

but if you really want names -> Numpy, Pandas, Matplotlib / seaborn / plotly, sklearn, Tensorflow, opencv...etc

zealous tulip May 15, 2021, 4:09 AM

#

Thank you!

whole hamlet May 15, 2021, 4:14 AM

#

People, my LightGBM classifier is working at same speed when being run on GPU even after installing the Lightgbm for GPU

near cosmos May 15, 2021, 5:06 AM

#

lunar zenith

df_pie.plot(kind="pie", ylabel="") worked for me

near cosmos May 15, 2021, 5:10 AM

#

lunar zenith what exactly do you mean by ax = ax

ax is an argument to DataFrame.plot that you can give to force it to plot to an existing axis. So,

fig, ax = plt.subplots()   # create a figure and axis object
df_pie.plot(kind="pie", legend=False, ax=ax) # plot the pie chart to axis 'ax'

# make further adjustments
ax.set_ylabel("")
fig.tight_layout()

This is particularly useful for figures with multiple subplots, obviously. But it's a common trick to do the plt.subplots call for pretty much every plot, even for single plots, because it is the most convenient and consistent way to have the fig and ax objects available. The matplotlib docs themselves say so https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html#a-figure-with-just-one-subplot

Although for this case it looks like DataFrame.plot will return an axis to you, so you can also do, e.g.:

ax = df_pie.plot(kind="pie", legend=False)
ax.set_ylabel("")
ax.figure.tight_layout()

and for the direct question, you can turn off the label in the original call to plot:

df_pie.plot(kind="pie", legend=False, ylabel="")

near cosmos May 15, 2021, 5:25 AM

#

severe cloud can anyone point me in the right direction to drawing the eyes and faces of ever...

What technique are you using now

ripe forge May 15, 2021, 7:20 AM

#

serene scaffold Performance aside, I like how the "numpy data model" encourages you to think in ...

Aye, it was definitely a very different style of thinking, it took me a lot of time staring at stackoverflow to get it to click 😅

bold timber May 15, 2021, 8:40 AM

#

Hi, I have a question for y'all: why, when I want to visualize the test set, am I using 'X_train' again for plotting?

grave breach May 15, 2021, 9:11 AM

#

@bold timber

#

You're visualising the model trained on X_train on X_test data

bold timber May 15, 2021, 9:20 AM

#

grave breach @bold timber

Why do I have only 10 dots in that plot?

grave breach May 15, 2021, 9:21 AM

#

Don't know, I haven't read the full code (and I don't use matplotlib for visualization)

bold timber May 15, 2021, 9:29 AM

#

grave breach Don't know, haven't read the full code (and doesn't use matplotlib for visualiza...

In this case, why do I get a list of just 20 data points when the dataset actually contains 30 values?

dusky granite May 15, 2021, 9:31 AM

#

hi i need help with tensorflow

#

Why do i get this error?

#

InvalidArgumentError: Unable to parse tensor proto

#

googling it, it seems this happens when the dataset is greater than 2gb
which i doubt but want to verify
even if i use a takedataset of 10 it still gives the error

#

also i tried to solve it unsuccessfully in #help-cookie

dusky granite May 15, 2021, 10:08 AM

#

#help-cookie message

#

as it may get lost

ripe forge May 15, 2021, 10:14 AM

#

bold timber In this case, why I just get a 20 list of data that actually dataset contain 30 ...

Because you split it into train and test. That's literally what that line of code in the cell is doing

ripe forge May 15, 2021, 10:16 AM

#

bold timber Why do I have only 10 dots in that plot?

As for this.. Same reason, test set only has 10 points

ripe forge May 15, 2021, 10:16 AM

#

bold timber Hi, I have a question for y'all: Why when i want to Visualizing the test set, I ...

There's no "must" about this, you could comment out that line if you wanted.

bold timber May 15, 2021, 10:18 AM

#

ripe forge As for this.. Same reason, test set only has 10 points

so do the other 20 data values belong to the training data?

ripe forge May 15, 2021, 10:18 AM

#

Are there 20 data points in train? Yes. That's what your code did.

bold timber May 15, 2021, 10:19 AM

#

Oh yeah, I get it. It's because the test data is 1/3, which means 30/3 = 10. So 10 values belong to test and 20 values belong to train, right?

ripe forge May 15, 2021, 10:19 AM

#

Yep

#

And see the function name. Train test split. Its job is to split the data into train and test.
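
To make the 30 → 20/10 arithmetic concrete, here is a minimal stdlib-only sketch. It is not scikit-learn's implementation: `simple_split` is a hypothetical helper that only mimics the idea behind a call like `train_test_split(..., test_size=1/3, random_state=0)`, and the 30 values are made up.

```python
import math
import random


def simple_split(values, test_size, seed):
    """Hypothetical helper: shuffle a copy, then cut it into train and test."""
    rng = random.Random(seed)                      # plays the role of random_state
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)  # size of the test portion
    return shuffled[n_test:], shuffled[:n_test]    # (train, test)


data = list(range(30))                             # stand-in for a 30-row dataset
train, test = simple_split(data, test_size=1 / 3, seed=0)
print(len(train), len(test))
```

Printing only the train part therefore shows 20 values, and plotting only the test part shows 10 dots.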

bold timber May 15, 2021, 10:20 AM

#

yeah I understand now. thank u.

bold timber May 15, 2021, 10:20 AM

#

ripe forge And see the function name. Train test split. It's job is to split the data into ...

But what is the meaning of random_state = 0?

ripe forge May 15, 2021, 10:21 AM

#

That's basically a "seed" for the random number generator required to do a random split

#

Basically, we want to make the program split the points randomly

#

That requires a random number generator. And computers don't "really" do random numbers, but instead they use some kind of pseudo random generator techniques.

#

All those techniques start with a seed. So if you give a fixed seed, you'll always get the same split

#

So long story short, the seed allows you to consistently get the same output from any random operation, such as a random split.
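
A small sketch of the seed idea, using Python's own `random` module rather than scikit-learn (so it only illustrates the principle; `shuffled_copy` is a hypothetical helper name):

```python
import random


def shuffled_copy(values, seed):
    """Return a shuffled copy using a generator started from `seed`."""
    rng = random.Random(seed)   # same seed -> same pseudo-random sequence
    out = list(values)
    rng.shuffle(out)
    return out


points = list(range(10))
first = shuffled_copy(points, seed=0)
second = shuffled_copy(points, seed=0)
other = shuffled_copy(points, seed=3)

print(first == second)   # same seed, so the two orders are identical
print(first == other)    # a different seed will usually give a different order
```

The values themselves never change, only the order in which they are drawn, which is what decides which rows land in train and which in test.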

bold timber May 15, 2021, 10:25 AM

#

ripe forge So long story short, the seed allows you to consistently get the same output fro...

How about this? Why did I lose 1 value when I changed random_state to 3?

#

i lost 63777.77

dusky granite May 15, 2021, 10:26 AM

#

as the seed changed

#

you didn't lose it, it probably went to validation

#

(test)

bold timber May 15, 2021, 10:28 AM

#

dusky granite you didn't lose it it probably went to validation

Is it fine if I don't use a random state?

dusky granite May 15, 2021, 10:29 AM

#

then there won't be any test data i think

bold timber May 15, 2021, 10:31 AM

#

dusky granite then there won't be any test data i think

I searched on Google. So when I don't set random_state, I'll get a different random split every time I run that code, right?

dusky granite May 15, 2021, 10:32 AM

#

i think so yes
(i have not used this)
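
For what it's worth, leaving the seed out should not remove the test set; it only makes the split differ between runs. A stdlib-only sketch of that behaviour, assuming a hypothetical `split_two_thirds` helper and 30 made-up values (this is an illustration, not scikit-learn's code):

```python
import random


def split_two_thirds(values, seed=None):
    """Hypothetical helper: 2/3 train, 1/3 test; seed=None means 'not fixed'."""
    rng = random.Random(seed)   # with None, Python picks its own starting state
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 2 // 3
    return shuffled[:cut], shuffled[cut:]


salaries = [float(1000 * i) for i in range(1, 31)]    # 30 made-up values

train_a, test_a = split_two_thirds(salaries)          # unseeded run
train_b, test_b = split_two_thirds(salaries, seed=3)  # seeded run

# The sizes never change, and every value is in exactly one of the two parts.
print(len(train_a), len(test_a))
print(sorted(train_a + test_a) == salaries)
print(sorted(train_b + test_b) == salaries)
```

So a value that disappears from the train output after changing the seed has simply moved to the test part.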

dusky granite May 15, 2021, 10:35 AM

#

dusky granite hi i need help with tensorflow

bump

hoary wigeon May 15, 2021, 10:46 AM

#

Hello

dusky granite May 15, 2021, 10:47 AM

#

hello

hoary wigeon May 15, 2021, 10:47 AM

#

i need help

dusky granite May 15, 2021, 10:47 AM

#

i too need help

hoary wigeon May 15, 2021, 10:48 AM

#

what kind of help ?

dusky granite May 15, 2021, 10:48 AM

#

dusky granite hi i need help with tensorflow

this

hoary wigeon May 15, 2021, 10:48 AM

#

no idea sorry

dusky granite May 15, 2021, 10:48 AM

#

what do you need help with?

wooden forge May 15, 2021, 11:07 AM

#

Hello everyone, i'm trying to make an Image Recognition algorithm, after creating a neural network from scratch, my goal is to create one without NN to compare their energy consumption

#

the thing is, after creating that last one, the precision is terrible and i don't know how to improve that, something even funnier : the only number to be correctly recognized is 4 (not all the time tho)

#

May i request your help ?

#

(i'm using the MNIST data base)

dusky granite May 15, 2021, 11:11 AM

#

have you tried training more?

wooden forge May 15, 2021, 11:14 AM

#

well, the technique is simply to calculate the average pixel values

#

my issues are on that one not the neural network, this one is working and i am very happy about it

dusky granite May 15, 2021, 11:15 AM

#

oh ok

#

so what's wrong?

wooden forge May 15, 2021, 11:15 AM

#

well the accuracy

#

that program doesn't have great accuracy (it's wrong almost every time except when it's a 4 lol)

dusky granite May 15, 2021, 11:16 AM

#

do you have a big testing dataset?

wooden forge May 15, 2021, 11:16 AM

#

well 42000 images

#

but because it's an average, i could have 3000 or 1M it would be the same

#

i think the problem is in the comparison

dusky granite May 15, 2021, 11:17 AM

#

well 42000 proves that it does not occur due to shortage of data

wooden forge May 15, 2021, 11:17 AM

#

the goal is to compare the image you're studying with the averages from a list containing the average for every label

#

So those are hand written numbers between 0 and 9
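
One reading of this approach is a nearest-mean classifier: build one mean image per label, then pick the label whose mean image is closest, pixel by pixel, to the new image. The sketch below assumes that reading and uses tiny made-up 4-pixel "images" instead of MNIST; `class_means` and `predict` are hypothetical names, not the code being discussed.

```python
import math


def class_means(images, labels):
    """Average the pixel vectors of each label (one mean image per label)."""
    sums, counts = {}, {}
    for pixels, label in zip(images, labels):
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(pixels, means):
    """Return the label whose mean image is closest in Euclidean distance."""
    return min(means, key=lambda lab: math.dist(pixels, means[lab]))


# Label 0 is bright on the left, label 1 is bright on the right.
train_images = [[250, 240, 10, 0], [230, 255, 0, 20],
                [5, 0, 245, 250], [0, 15, 255, 235]]
train_labels = [0, 0, 1, 1]

means = class_means(train_images, train_labels)
print(predict([200, 220, 30, 10], means))
print(predict([10, 40, 210, 190], means))
```

Picking the smallest distance avoids the need for a hand-tuned tolerance, since some label is always the closest one.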

dusky granite May 15, 2021, 11:19 AM

#

well i don't have a solution for your problem

#

interesting application tho

#

open source?

wooden forge May 15, 2021, 11:20 AM

#

wdym open source ?

dusky granite May 15, 2021, 11:20 AM

#

like is the code open for anyone to use

wooden forge May 15, 2021, 11:20 AM

#

it's a personal project so i assume yes

dusky granite May 15, 2021, 11:20 AM

#

cool

wooden forge May 15, 2021, 11:21 AM

#

i didn't upload it on anything apart here this morning because i was looking for help haha

dusky granite May 15, 2021, 11:21 AM

#

so not open source

#

atleast yet!

wooden forge May 15, 2021, 11:21 AM

#

yeah haha

#

i'm not really into code sharing organisation ? idk how to explain

#

i'm just coding stuff for me

dusky granite May 15, 2021, 11:22 AM

#

i am into it because i often steal code from other people

#

so i like to have my code open aswell

wooden forge May 15, 2021, 11:22 AM

#

lmao

#

wait i think i might know why

#

for the NN i had to normalize the pixels because with some functions it caused overflow

#

but with this i can just use regular pixel values

#

let me try that

#

alright let's see if it works

#

it doesn't but ! i just need to put a tolerance and that should do the thing

#

yeah haha i need to do something else but i feel very close

grave frost May 15, 2021, 11:45 AM

#

flint mason What are you talking about? How am I supposed to write training data on my own

I meant the generator lol, not the training data

dusky granite May 15, 2021, 11:47 AM

#

dusky granite hi i need help with tensorflow

bump

wooden forge May 15, 2021, 11:50 AM

#

yeah i have no idea how to use tensor flow

#

i wrote neural networks from scratch not with this

#

sorry mate

crude fable May 15, 2021, 11:50 AM

#

grave frost I meant the generator lol, not the training data

You mean customize a batch generator?

grave frost May 15, 2021, 11:50 AM

#

yea, you can wrap PT around it

dusky granite May 15, 2021, 11:51 AM

#

wooden forge yeah i have no idea how to use tensor flow

i just bump it when no conversation is going on

wooden forge May 15, 2021, 11:51 AM

#

wdym?

dusky granite May 15, 2021, 11:51 AM

#

so that people who view this channel know i have a problem

wooden forge May 15, 2021, 11:51 AM

#

ha okay

grave frost May 15, 2021, 11:51 AM

#

dusky granite so that people who view this channel know i have a problem

what GPU?

#

what dataset?

dusky granite May 15, 2021, 11:51 AM

#

grave frost what GPU?

TPU

grave frost May 15, 2021, 11:51 AM

#

dusky granite TPU

leave it

crude fable May 15, 2021, 11:51 AM

#

grave frost yea, you can wrap PT around it

Subclass torch.utils.data.Dataset, which can then be wrapped by a DataLoader

#

Or just write a generator yourself
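
A rough sketch of both options without importing PyTorch. `PairDataset` only follows the map-style protocol (`__len__` and `__getitem__`) that `torch.utils.data.Dataset` subclasses implement, and `batch_generator` is a hypothetical hand-written stand-in for what a `DataLoader` would do; the data is made up.

```python
import random


class PairDataset:
    """Map-style container: implements __len__ and __getitem__.

    In real PyTorch code this would subclass torch.utils.data.Dataset.
    """

    def __init__(self, features, targets):
        self.features = features
        self.targets = targets

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.targets[index]


def batch_generator(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of (feature, target) samples, batch_size at a time."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


dataset = PairDataset(features=list(range(10)), targets=[x * 2 for x in range(10)])
batches = list(batch_generator(dataset, batch_size=4))
print([len(b) for b in batches])   # the last batch is smaller: 10 = 4 + 4 + 2
```

Because the generator only relies on `len()` and indexing, the same class could later be handed to a real `DataLoader` once it inherits from `Dataset`.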

grave frost May 15, 2021, 11:51 AM

#

crude fable Inherit a torch.utils.data.Dataset which could be wrapped by a DataLoader

exactly 👍

dusky granite May 15, 2021, 11:52 AM

#

grave frost leave it

that is all i have managed to do yet

grave frost May 15, 2021, 11:52 AM

#

TPU is not supposed to be used by beginners especially if you don't know how to use CUDA GPUs

dusky granite May 15, 2021, 11:52 AM

#

i have the thing working with a gpu

grave frost May 15, 2021, 11:52 AM

#

or debug code for that matter

grave frost May 15, 2021, 11:52 AM

#

dusky granite i have the thing working with a gpu

then why do you want it on TPU?

dusky granite May 15, 2021, 11:52 AM

#

i use colab and am no longer getting connected to a gpu instance

#

you know, the more you use, the less you get

grave frost May 15, 2021, 11:53 AM

#

dusky granite i use colab and am no longer getting connected to a gpu instance

wait for a few hours

#

you will get one eventually

dusky granite May 15, 2021, 11:53 AM

#

i have been using it heavily for weeks

grave frost May 15, 2021, 11:53 AM

#

colab is not an unlimited supply of GPUs

wooden forge May 15, 2021, 11:53 AM

#

yeah i was told to use colab, because i need approximately 2 days to train my NN lol

grave frost May 15, 2021, 11:53 AM

#

use CPU when writing code

dusky granite May 15, 2021, 11:53 AM

#

the downtimes have reached 5 days

dusky granite May 15, 2021, 11:53 AM

#

wooden forge yeah i was told to use colab, because i need approximately 2 days to train my NN...

what do you use now?

grave frost May 15, 2021, 11:53 AM

#

I use colab all the time with no problems

wooden forge May 15, 2021, 11:54 AM

#

dusky granite what do you use now?

my pc

dusky granite May 15, 2021, 11:54 AM

#

grave frost I use colab all the time with no problems

pro user?

grave frost May 15, 2021, 11:54 AM

#

if you are not using CPU 95% of your time, then you are doing something very wrong

wooden forge May 15, 2021, 11:54 AM

#

i have to find how to use my GPU with spyder instead of my CPU

grave frost May 15, 2021, 11:54 AM

#

don't switch to GPU instance, just keep it on CPU

dusky granite May 15, 2021, 11:54 AM

#

grave frost if you are not using CPU 95% of your time, then you are doing something very wro...

i use my cpu most of the time

#

i just use gpu when training

grave frost May 15, 2021, 11:55 AM

#

dusky granite i use my cpu most of the time

then why were you locked out? most prob, you forgot to terminate the instance or left your GPU on

wooden forge May 15, 2021, 11:55 AM

#

i have a RTX i want to see if it's better (it's supposed to be because a GPU is faster than CPU for such things, why do you think people buy so many to mine bitcoins)

dusky granite May 15, 2021, 11:55 AM

#

yup i realised that later

#

that closing the tab does not terminate the session

dusky granite May 15, 2021, 11:55 AM

#

wooden forge i have a RTX i want to see if it's better (it's supposed to be because a GPU is ...

gpu will be faster for training

grave frost May 15, 2021, 11:55 AM

#

wooden forge i have a RTX i want to see if it's better (it's supposed to be because a GPU is ...

that has nothing to do with AI/mining

wooden forge May 15, 2021, 11:56 AM

#

i was just giving an example omg

grave frost May 15, 2021, 11:56 AM

#

wooden forge i was just giving an example omg

a very wrong one lol

dusky granite May 15, 2021, 11:56 AM

#

is a properly made model for tpu faster than gpu?

wooden forge May 15, 2021, 11:56 AM

#

dusky granite gpu will be faster for training

yes so if anyone knows how to switch spyder to GPU usage i'll be happy

crude fable May 15, 2021, 11:56 AM

#

Do you need to modify your code for gpu to run on TPUs?

grave frost May 15, 2021, 11:56 AM

#

crude fable Do you need to modify your code for gpu to run on TPUs?

a bit in TF, yes

dusky granite May 15, 2021, 11:56 AM

#

for fast use yes

#

or it works at cpu speeds

crude fable May 15, 2021, 11:57 AM

#

like, copying the tensors and parameters to the TPU device?

dusky granite May 15, 2021, 11:57 AM

#

there is some modification in code if that is what you mean

grave frost May 15, 2021, 11:58 AM

#

crude fable like, copying the tensors and parameters to the TPU device?

yeah, it's just the initialization for the TPU device, after that you place ops and tensors on individual TPU device. if you don't, then you are using a single TPU core which doesn't give any speed-up

dusky granite May 15, 2021, 11:58 AM

#

awesome ruler do you use tpu?

crude fable May 15, 2021, 11:58 AM

#

ic

grave frost May 15, 2021, 11:59 AM

#

dusky granite awesome ruler do you use tpu?

yeah, sometimes

crude fable May 15, 2021, 11:59 AM

#

How much faster are TPUs to GPUs

dusky granite May 15, 2021, 11:59 AM

#

can you help me convert this one example?

grave frost May 15, 2021, 11:59 AM

#

crude fable How much faster are TPUs to GPUs

8x roughly

crude fable May 15, 2021, 11:59 AM

#

cool

grave frost May 15, 2021, 11:59 AM

#

could be more, could be less

grave frost May 15, 2021, 12:00 PM

#

crude fable cool

because you have 8 cores in a TPU. think of a TPU like multiple GPU's integrated in a single device. it's a bit more complex than that, but a good analogy

#

so each core is like a GPU, and 8 cores = 8 GPUs, hence the rough 8x

crude fable May 15, 2021, 12:00 PM

#

what about memory?

grave frost May 15, 2021, 12:00 PM

#

crude fable what about memory?

8gb per core

#

so your model has to fit in 8gb

crude fable May 15, 2021, 12:01 PM

#

ic, not that large

grave frost May 15, 2021, 12:01 PM

#

in practice, it's quite different tho

crude fable May 15, 2021, 12:01 PM

#

I'm currently training my models on Tesla V100s

grave frost May 15, 2021, 12:01 PM

#

im not a hardware expert, but I am able to use models bigger than 8gb on TPU

crude fable May 15, 2021, 12:01 PM

#

with 32GB/card

grave frost May 15, 2021, 12:01 PM

#

dunno why?

dusky granite May 15, 2021, 12:01 PM

#

can you help me convert this one time?

grave frost May 15, 2021, 12:01 PM

#

most prob smthing to do with the TPU architecture

grave frost May 15, 2021, 12:02 PM

#

dusky granite can you help me convert this one time?

there isn't much "converting"

#

look it up on google

dusky granite May 15, 2021, 12:02 PM

#

error solving

#

i tried all i could

grave frost May 15, 2021, 12:02 PM

#

what's the error?

#

the full traceback

dusky granite May 15, 2021, 12:03 PM

#

this is the full thing

arctic wedge BOT May 15, 2021, 12:03 PM

#

Hey @dusky granite!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

crude fable May 15, 2021, 12:03 PM

#

grave frost 8gb per core

with just 8gb per card maybe you'll have to reduce your batch size or it won't fit in. Or split the same batch among different cores which sounds complicated= =
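
The splitting itself is simple arithmetic. Here is a plain-Python sketch with a hypothetical `per_core_batches` helper and made-up sample indices; as far as I know, TensorFlow's distribution strategies do this sharding for you when you pass a global batch, so this only shows what ends up on each core.

```python
def per_core_batches(batch, num_cores=8):
    """Split one global batch into num_cores near-equal shards."""
    base, extra = divmod(len(batch), num_cores)
    shards, start = [], 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)   # spread any remainder
        shards.append(batch[start:start + size])
        start += size
    return shards


global_batch = list(range(128))          # 128 made-up sample indices
shards = per_core_batches(global_batch)
print([len(s) for s in shards])          # 8 shards of 16 samples each
```

With a global batch of 128, each core only has to hold 16 samples at a time, which is what matters for the per-core memory limit.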

dusky granite May 15, 2021, 12:03 PM

#

grave frost the full traceback

https://pastebin.com/c7YE23NQ

#

in short InvalidArgumentError: Unable to parse tensor proto

grave breach May 15, 2021, 12:06 PM

#

bold timber In this case, why I just get a 20 list of data that actually dataset contain 30 ...

Total data is 30, test data is 1/3 of the total, so train data is 2/3 of the data, this means that, if you only print X_train you only get two thirds, so 20

dusky granite May 15, 2021, 12:08 PM

#

here is my attempt at using tpu

#

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

# Assumption: `tpu` is a cluster resolver created before this snippet, e.g. like this on Colab.
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
# Newer TF releases also expose this as tf.distribute.TPUStrategy.
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
print("REPLICAS: ", tpu_strategy.num_replicas_in_sync)

# data_augmentation, img_height, img_width and num_classes are defined earlier (not shown).
with tpu_strategy.scope():
  image_learner = Sequential([
    data_augmentation,  # augmentation layers that randomly vary the input images
    # rescale RGB values from ints in 0-255 to floats in 0-1; a smaller range is easier for the model
    layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    # three conv + pooling stages with ReLU activations extract image features
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),  # randomly drops 20% of the units during training to reduce overfitting
    layers.Flatten(),  # flattens the feature maps into a 1-D vector
    layers.Dense(128, activation='relu'),  # hidden fully connected layer
    layers.Dense(num_classes)  # output layer: one logit per class
  ])
  image_learner.compile(optimizer='adam',
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=['accuracy'])
```

#

@grave frost you there?

inland zephyr May 15, 2021, 12:15 PM

#

hello all, i want to ask about Siamese networks. Is it okay if i have only 1 image per class to train the network? A siamese network works on pairs of images, so if I only have one image for a class, the positive pair would be the same picture twice.

#

and are there any examples i should read (other than paid courses) to learn how to implement a siamese network?

bold timber May 15, 2021, 12:28 PM

#

grave breach Total data is 30, test data is 1/3 of the total, so train data is 2/3 of the dat...

Why do I get almost the same plot when I use plt.plot(X_train, regressor.predict(X_train), color = 'blue') and plt.plot(X_test, regressor.predict(X_test), color = 'blue')?

viral hearth May 15, 2021, 12:31 PM

#

Im not sure. But here's another way to do it

#

Use numpy.polyfit(X_train, y_train, 1) to get the polynomial of degree 1 (assuming y_train holds the matching targets and X_train is flattened to 1-D)

#

And then you use numpy.polyval(numpy.polyfit(X_train, y_train, 1), X_train) to get the image of X_train

#

Use then plt.plot. This way it should work. Or at least it works for me

#

It is the regression polynomial ax+b
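
A small sketch of why the two plots look almost the same: the fitted line `a*x + b` is one and the same model, so predictions for the train inputs and for the test inputs all lie on it. The code below is a plain-Python closed-form least squares on made-up data, offered as an illustration; `numpy.polyfit(x, y, 1)` should give the same two coefficients (highest degree first).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up data lying exactly on y = 2x + 1, split by hand into train and test.
x_train, y_train = [1.0, 2.0, 4.0, 5.0], [3.0, 5.0, 9.0, 11.0]
x_test = [3.0, 6.0]

a, b = fit_line(x_train, y_train)
train_line = [a * x + b for x in x_train]   # what gets plotted against x_train
test_line = [a * x + b for x in x_test]     # what gets plotted against x_test
print(a, b)
print(test_line)
```

Only the scatter points differ between a train plot and a test plot; the blue line is drawn from the same `a` and `b` either way.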

crude fable May 15, 2021, 12:41 PM

#

inland zephyr and anyway is there any examples should i read (except paid course) to learn sia...

It's meaningless to use Siamese Networks if you just have one pic under a class

inland zephyr May 15, 2021, 12:43 PM

#

I only have a single image per person, a photo that is taken only once. I use MTCNN to extract the face

#

in this case I cannot use any kind of traditional cnn since it needs a lot of data

#

so my best bet is a Siamese network since it can work with few images

crude fable May 15, 2021, 12:44 PM

#

your goal is to train this MTCNN Net right?

inland zephyr May 15, 2021, 12:44 PM

#

no

#

MTCNN's job is to extract the face

crude fable May 15, 2021, 12:45 PM

#

then what are u training for

inland zephyr May 15, 2021, 12:45 PM

#

and the siamese network will check whose face this is

#

so i have, let's say, thousands of people and each one has only 1 face photo

#

so i need to build a simple nn to determine whose face this is

#

and based on my case and what i have read, siamese is my best bet for this task

#

but i'm aware that if i only have 1 face, let's say used as the anchor or as a negative, i don't have a face for the positive

#

is preprocessing the image again to make more variations of the face good advice?

crude fable May 15, 2021, 12:49 PM

#

yes, basically what you want is to generalize to different situations (like expressions etc) based on only one pic

#

My advice is to use data augmentation to enlarge your class size first
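
As a rough idea of what augmentation means here, a plain-Python sketch that turns one tiny made-up grayscale "photo" into several variants. The function names are hypothetical, and real projects would normally use an image library's augmentation tools instead of nested lists.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Move the content right by `pixels` columns, padding the left edge."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]


def change_brightness(image, delta):
    """Add `delta` to every pixel and clamp the result to 0-255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image):
    """Return the original plus a few simple variants of it."""
    return [
        image,
        horizontal_flip(image),
        shift_right(image, 1),
        change_brightness(image, 40),
        change_brightness(image, -40),
    ]


face = [[10, 200, 30],
        [40, 250, 60]]          # made-up 2x3 grayscale "photo"
variants = augment(face)
print(len(variants))
```

Each variant keeps the same identity label, so one photo per person becomes several samples that can be paired up as positives.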

inland zephyr May 15, 2021, 12:51 PM

#

for one image, how many copies do i need for siamese training?

crude fable May 15, 2021, 12:52 PM

#

I think Siamese Networks are mainly for matching problems and does not apply to your situation

inland zephyr May 15, 2021, 12:52 PM

#

the minimum that works in practice, since i will use a pre-trained model to speed up development

crude fable May 15, 2021, 12:56 PM

#

At least, you've got to have many pics for the same class so that the model can generalize

inland zephyr May 15, 2021, 12:56 PM

#

anyway... if siamese networks are used for matching tasks, i think that is similar to my goal.

crude fable May 15, 2021, 12:57 PM

#

The proper situation would be you have many pics under many classes

#

and you want to map pics of the same class to closer representations

#

while increasing the distance between pics of different classes
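
That pull-together, push-apart idea is often written as a contrastive loss. Below is a minimal sketch of one common form of it on made-up 2-D embeddings; the margin value and the function name are assumptions for illustration, not something taken from this conversation.

```python
import math


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """One common form: d^2 for same-class pairs, max(0, margin - d)^2 otherwise."""
    d = math.dist(emb_a, emb_b)
    if same_class:
        return d ** 2                     # penalize distance between matching pics
    return max(0.0, margin - d) ** 2      # penalize non-matching pics that are too close


same_close = contrastive_loss([0.0, 0.0], [0.0, 0.0], same_class=True)
diff_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_class=False)
diff_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False)
print(same_close, diff_close, diff_far)
```

Pairs from different classes stop contributing once they are further apart than the margin, so training effort goes into the pairs that are still confusable.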

inland zephyr May 15, 2021, 12:58 PM

#

i'm sorry, i need to clarify: by "whose face is this" i mean that when the same person's face is captured in the future and it matches one of the faces in my database (one face per person), it will tell me that the person is similar to this person in the db

#

just like attendance checking or fraudster recognition

#

so based on my case, that's why i chose Siamese for my network
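
The matching step described here could look roughly like the sketch below: compare the embedding of a new face with one stored embedding per person and accept the closest one only if it is near enough. The embeddings, names, and threshold are all made up; in practice the vectors would come from a trained or pre-trained embedding network.

```python
import math


def best_match(query, database, threshold):
    """Return (name, distance) of the closest enrolled embedding.

    The name is None when even the closest embedding is further than `threshold`.
    """
    name, dist = min(
        ((person, math.dist(query, emb)) for person, emb in database.items()),
        key=lambda pair: pair[1],
    )
    if dist <= threshold:
        return name, dist
    return None, dist


# One made-up embedding per enrolled person.
database = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}

known, known_dist = best_match([0.1, 0.9], database, threshold=0.5)
unknown, unknown_dist = best_match([5.0, 5.0], database, threshold=0.5)
print(known, unknown)
```

The threshold is what lets the system answer "nobody in the database", which matters for cases like fraud checks where the person may not be enrolled at all.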

dusky granite May 15, 2021, 1:39 PM

#

i think i have shortened my problem

#

i need help figuring out this error

#

   1086           self._maybe_load_initial_epoch_from_ckpt(initial_epoch))
   1087       logs = None
-> 1088       for epoch, iterator in data_handler.enumerate_epochs():
   1089         self.reset_metrics()
   1090         callbacks.on_epoch_begin(epoch)

serene scaffold May 15, 2021, 1:46 PM

#

dusky granite i need help figuring out this error

there has to be more to the error message than this

dusky granite May 15, 2021, 1:47 PM

#

sure wait

dusky granite May 15, 2021, 1:47 PM

#

serene scaffold there has to be more to the error message than this

https://pastebin.com/BT1U2Vw2

#

i think this is what is the main problem currently

serene scaffold May 15, 2021, 1:48 PM

#

@dusky granite look at history = image_learner.fit(train_ds,validation_data=val_ds,epochs=10,steps_per_epoch=128) and make sure that the items you passed to it are all the right type.

dusky granite May 15, 2021, 1:49 PM

#

i don't know what steps_per_epoch is