#data-science-and-ml

broken quail
#

can you give a list of the fields?

hoary wigeon
#

sure

#
Columns Name : ['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC', 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition', 'SalePrice', 'GrLivAreaGroup', 'PricePerGRLA']
broken quail
#

what the hell so much hahaha... wait

#

maybe... housing style & average price (barplot), (i don't know what utilities is) but you can try utilities with landslope

it would help more if you had the dataset link or metadata

south crag
#

or is it something u r given

broken quail
#

do you have the link? there are so many different housing datasets on kaggle

south crag
#

I would suggest u to pick a subset randomly and pairplot to observe which features r important

#

or u can just preprocess things

#

build a model in keras and try using l2 regularisation

#

it will help to eliminate features

lapis sequoia
#

enc = OneHotEncoder()

#

In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?

In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a...

desert oar
#

Think of it this way: if you sample without replacement from a finite population, eventually you start running out of members in the population and the distribution starts to shift away from what it was originally.

lapis sequoia
neon marsh
#

Is anyone here familiar with Human Action Recognition or HAR?

bold timber
lapis sequoia
#

Just watch the video

#

It explains how to build it

#

I'd recommend not using fit_transform() but using fit() and then transform()

fringe rapids
#

Anyone have any guidance how quadratic programming works?

hoary wigeon
hoary wigeon
lapis sequoia
#

up to you

#

note that the more features you give to the pairplot, the longer it will take to plot

#

around 5-10 minutes for all your features (depending on how many values you have in each feature)

hoary wigeon
#

i'm confused about which columns can be helpful for me

#

I think i must try heatmap first

lapis sequoia
#

just plot all, go and grab a drink or go for a walk

hoary wigeon
#

then relate

lapis sequoia
#

and see the result

#

Don't forget that preprocessing is mainly human based

hoary wigeon
#

yeh

grave frost
#

try PyCaret - it has outstanding feature engineering capabilities

lapis sequoia
#

😮

south crag
#

lolll xD

hoary wigeon
#

restriction : use only python

south crag
lapis sequoia
hoary wigeon
#

shall i drop all object column for finding correlation ?

#

shall i consider only int and float ?

south crag
hoary wigeon
south crag
#

no

#

u have to preprocess it

#

remove null things and stuff
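A minimal sketch of that preprocessing step, assuming pandas and a small made-up frame that only borrows three column names from the list above (`LotArea`, `Street`, `SalePrice`); the values are invented for illustration. It drops rows with nulls and computes the correlation on the numeric columns, while the object column stays in the cleaned frame so it can be encoded later instead of being thrown away.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the chat.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, None, 14260],
    "Street": ["Pave", "Pave", "Grvl", "Pave", "Pave"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

clean = df.dropna()                              # remove rows with nulls
numeric = clean.select_dtypes(include="number")  # int and float columns only
corr_with_price = numeric.corr()["SalePrice"].sort_values(ascending=False)
print(corr_with_price)
```

The same `numeric.corr()` table is what a heatmap would be drawn from.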

grave frost
lapis sequoia
#

hence can run the code in the background while working

grave frost
#

you can't expect to beat SOTA or anything with it

lapis sequoia
#

you tried it? how slow is it?

grave frost
lapis sequoia
#

not bad, how big is the dataset?

grave frost
lapis sequoia
#

it is fine i guess

#

would be amazing for a project that you don't want to do

#

🤣

grave frost
#

uh-huh. won't give much accuracy tho

lapis sequoia
#

what's the score difference?

grave frost
#

so not a one-stop solution, but not bad

lapis sequoia
#

between handpicking and that lib?

grave frost
#

dunno - it uses traditional ML algos which I never want to see results from

lapis sequoia
#

i guess it uses sklearn

grave frost
lapis sequoia
#

but bro

#

🤣

grave frost
#

hefty

lapis sequoia
#

reminds me of the mom can we have x at home

grave frost
#

you won't use most of them anyways.

#

and models have more arguments lol

lapis sequoia
# lapis sequoia

"Mom can we have machine learning?"
"We already have ML at home"
ML at home

lapis sequoia
desert oar
#

caret is already "we have X at home"

#

I kid. It wasn't a bad library for its time

#

Makes sense to port it to python. Much less verbose than sklearn

#

Good for quick things

grave frost
south crag
south crag
lapis sequoia
bronze skiff
lapis sequoia
lapis sequoia
#

@bold timber message here

bold timber
# lapis sequoia ?

Sorry, I still don't understand this. I watched the video you suggested, and it's quite different from my case.

I know 'France' gets the encoding [1.0 0.0 0.0 44.0 72000.0] because 'France' shows up first, and the dummy values for everything except 'France' are 0.

How about 'Spain'?
Spain shows up in second place. Why are its first two dummy values 0 and 0? Why not 0 and 1, since 'Spain' shows up in the second row of that table?

#

Please explain it to me clearly. I am so confused about that

lapis sequoia
#

Alphabetical

#

F - G - S

#

France = [1, 0, 0]
Germany = [0, 1, 0]
Spain = [0, 0, 1]
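A small sketch of the ordering described above, assuming scikit-learn is installed; the toy `countries` array is made up for illustration. The encoder sorts the categories it finds, so the column order does not depend on which country appears first in the table.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column where Spain appears before Germany.
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])

enc = OneHotEncoder()
enc.fit(countries)
print(enc.categories_[0])                      # sorted categories, not order of appearance

encoded = enc.transform(countries).toarray()   # default output is sparse, so densify it
print(encoded)
```

Checking `enc.categories_` is a quick way to see which column belongs to which country.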

bold timber
#

Oh my god, I thought it was like that before, but I still didn't believe it

#

ok. thank u so much!

lapis sequoia
#

np

bold timber
lapis sequoia
#

they are values

#

they are not strings

#

onehot transforms strings

#

not values

#

unless you tell it to transform them

bold timber
lapis sequoia
#

yup it basically passes through

bold timber
#

So those values are kept as they are

bold timber
lapis sequoia
#

👍

bold timber
#

And so on...

lapis sequoia
#

I'd recommend not using fit_transform() but using fit() and then transform()

bold timber
#

where i place that code in a cell?

lapis sequoia
#

you literally substitute fit_transform with fit and then transform

#

why are you calling fit_transform on your test set?

bold timber
bold timber
neat basin
#

hey sorry if I'm interrupting at all but I got a quick question about jupyter/colab. Is there a way to have the same np.random.seed across the entire notebook or do I have to call it in every cell I want to use that seed?
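A short sketch of how the seed behaves, assuming the cells all run in the same kernel. `np.random.seed` sets one global state, so a single call at the top carries over to later cells; each draw then continues the same stream rather than restarting it. The variable names are only for illustration.

```python
import numpy as np

np.random.seed(42)            # sets the global state once for the whole kernel
first = np.random.rand(3)     # imagine this in cell 1
second = np.random.rand(3)    # imagine this in cell 2; it continues the same stream

np.random.seed(42)            # re-seeding restarts the stream
replay = np.random.rand(3)    # same numbers as `first`

rng = np.random.default_rng(42)   # explicit generator, no hidden global state
sample = rng.random(3)
```

Re-running a single cell out of order will give different numbers unless that cell re-seeds, which is one reason passing an explicit `rng` around can be easier to reason about.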

lapis sequoia
#

you have to apply transform on your test dataset too

#

unless you don't have it
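A minimal sketch of the fit-then-transform pattern, assuming scikit-learn; the `train` and `test` arrays are made up. The encoder learns its categories from the training data only, and the test data is pushed through that same mapping. Calling `fit_transform` on the test set here would relearn the categories from two countries and produce two columns instead of three.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical split; the test set happens to contain no "Germany".
train = np.array([["France"], ["Spain"], ["Germany"]])
test = np.array([["Spain"], ["France"]])

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)                                 # learn the categories from training data only
train_encoded = enc.transform(train).toarray()
test_encoded = enc.transform(test).toarray()   # reuse the same mapping, no refitting
print(test_encoded)
```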

#

honestly fit_transform is the worst idea i've seen for beginners

#

share your notebook in a colab or datalys

bold timber
bold timber
lapis sequoia
#

because you have people who are wondering wtf is happening to their data

bold timber
#

I'm so beginner in machine learning

bold timber
lapis sequoia
#

share collab

bold timber
lapis sequoia
#

i added you

bold timber
#

I accepted, bro

oak geode
#

do you ppl have any suggestions for whom to follow on YouTube or twitter regarding ds

#

like getting useful stuff and resources from them

bold timber
#

or u can use Udemy or Datacamp

oak geode
#

no I didn't mean courses

#

just knowledge and practical stuff

#

or maybe talks and conferences

ripe blade
#

Hi, i need help regarding PCA implementation in python, can anyone guide me?

lilac raven
#

If I want to find the correlation coefficient by using matrix multiplication on a time series data[t] and volume [x,y,z,t], like this seed_ts_win = seed_ts[t:t+win_width] vol_ts_win = vol_ts[:, :, :, t:t+win_width] Will I need to reshape the vol_ts_win data to something like [xyz,t] to do a matrix dot product?
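One way this could look, as a sketch assuming NumPy and random stand-in data with the shapes from the question. Flattening the volume to `[xyz, t]` lets a single matrix-vector product compute the numerator of the Pearson correlation for every voxel at once. The function name `seed_correlation` is made up here.

```python
import numpy as np

def seed_correlation(seed_ts_win, vol_ts_win):
    """Pearson correlation between one time series and every voxel's time series."""
    x, y, z, t = vol_ts_win.shape
    vol = vol_ts_win.reshape(-1, t)                    # (x*y*z, t)
    vol_c = vol - vol.mean(axis=1, keepdims=True)      # centre each voxel over time
    seed_c = seed_ts_win - seed_ts_win.mean()          # centre the seed series
    num = vol_c @ seed_c                               # (x*y*z,) dot products
    den = np.sqrt((vol_c ** 2).sum(axis=1) * (seed_c ** 2).sum())
    return (num / den).reshape(x, y, z)

# Stand-in data: 20 time points, a 2 x 3 x 4 volume.
rng = np.random.default_rng(0)
seed_ts_win = rng.normal(size=20)
vol_ts_win = rng.normal(size=(2, 3, 4, 20))
r = seed_correlation(seed_ts_win, vol_ts_win)
print(r.shape)
```

The reshape is not strictly required, since `@` also contracts over the last axis of a 4-D array, but the flat `[xyz, t]` layout makes the centring and normalisation steps easier to follow. A voxel that is constant over the window would give a zero denominator and needs separate handling.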

grave frost
desert oar
desert oar
grave frost
#

hmmmm... 🤔

grave frost
desert oar
#

any language that you don't already use is "some arbitrary language"

grave frost
#

but there is nothing there in scala that makes a compelling reason for me to learn

#

Anyways,
255 tensor(0.1585, device='cuda:0', grad_fn=<NegBackward>)
Any guesses what exactly that means so I can find it and remove that line?

#

my guess is that it's the loss

#

is that how PT displays its loss?

slim fox
#

Just don't be too absolute.

#

“Only a Sith deals in absolutes.”

lapis sequoia
grave frost
exotic maple
desert oar
exotic maple
#

Generalizations are for the most part, nonsense. Neural Networks are extremely powerful in many domains, but if you can pop a Gaussian Naive Bayes and get 90% + accuracy, is that "wrong"?

I have a friend in DS that tells me "just use a NN" for everything and I've told him many times that it sounds like a stupid answer. If you can just throw a NN at everything after some preprocessing, where exactly is the Data Scientist's expertise needed?

neat basin
lapis sequoia
#

guys

#

i have a question

#

with an already-trained image segmentation CNN, can i pass it an anime image, and will it segment it correctly?

slate hollow
#

so

#

i was messing around with

#

tf.keras.losses.BinaryCrossentropy() (default values for everything)

#

and this happened:
loss.call([[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]], [[0, 1], [0.1, 0.9], [0.1, 0.9]])
Out[22]: <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.5379095 , 0.32508278, 0.32508278], dtype=float32)>

#

the thing is, even though it supposedly expected some numbers, it worked just fine

slate hollow
#

oh uh

#

in the docs

#

it says that for y_true it expects an array of 1's and 0's

velvet thorn
#

as in

#

you're asking why

slate hollow
#

yeah

velvet thorn
#

when you pass non-integral values for y_true (the first argument), you don't run into an error?

slate hollow
#

yeah

#

i passed [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]

velvet thorn
#

uh

#

do you know what crossentropy means?

#

mathematically

slate hollow
#

no, not 100% sure

velvet thorn
#

okay

#

what about entropy?

#

in the information theory context

#

just wanna get an idea of your background knowledge

#

actually do you wanna know the theory or do you just want a quick answer

slate hollow
#

just a quick answer lol

velvet thorn
#

because it's not checked

#

and it's mathematically valid

#

that's it basically
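To make "mathematically valid" concrete, here is a plain NumPy sketch of the binary cross-entropy formula. It is not Keras' implementation, and the clipping constant `eps` is an assumption chosen for illustration. Nothing in the formula needs `y_true` to be exactly 0 or 1; a soft label just weights the two log terms.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the last axis of -(y*log(p) + (1-y)*log(1-p))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logs stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_element = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_element.mean(axis=-1)

soft = binary_crossentropy([[0.1, 0.9]], [[0.1, 0.9]])   # soft labels
hard = binary_crossentropy([[0.0, 1.0]], [[0.1, 0.9]])   # ordinary 0/1 labels
print(soft, hard)
```

With soft labels equal to the predictions, this gives about 0.325, which lines up with the second and third values in the output pasted earlier. The first value there depends on how the framework clips an exact 0 or 1, so this sketch is not expected to reproduce it digit for digit.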

slate hollow
#

how can you have something that can just indiscriminately take both 1 and 2 numbers

velvet thorn
slate hollow
#

yeah

velvet thorn
#

broadcasting?

#

again, it's not mathematically invalid

#

you could have such an output from a previous layer

#

so it's practically possible

slate hollow
#

ok then

slate hollow
#

and i used global variables to get what was actually happening in split_up and i get this: <tf.Tensor 'sequential/text_vectorization/StringSplit/RaggedGetItem/strided_slice_5:0' shape=(None,) dtype=string>

hoary wigeon
#
std        20.645407
min        11.000000
25%        17.000000
50%        27.000000
75%        54.000000
max        71.000000
#

Is there any quick function to distribute my numerical data in to category , like

If value is 6, it should lie in category 0-10

?????

exotic maple
hoary wigeon
#

nope

#

periods

#

like we use for date

#

range**

exotic maple
#

If you want your numerical data into a category thats a bin...

hoary wigeon
exotic maple
hoary wigeon
#

i created a function and applied to it

exotic maple
#

I mean its literally created to create bins lol

hoary wigeon
#

ohh

#

can we use min and max value and ask cut to create intervals automatically ?

exotic maple
#

i dont know

hoary wigeon
#

oh k

desert oar
#

no, you'd have to write your own function to do it using e.g. np.linspace
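A sketch of both routes, assuming pandas and NumPy and a few made-up values in the range of the summary above. Explicit edges give the "0-10, 10-20, ..." style categories. As far as I know, passing an integer for `bins` also makes `pd.cut` build that many equal-width intervals from the data's min and max, which is worth checking against the pandas docs before relying on it.

```python
import numpy as np
import pandas as pd

values = pd.Series([6, 11, 27, 54, 71])   # made-up sample values

# Explicit edges: (0, 10], (10, 20], ..., (70, 80]
edges = np.arange(0, 81, 10)
fixed = pd.cut(values, bins=edges)

# Integer bins: four equal-width intervals spanning min..max of the data
auto = pd.cut(values, bins=4)

print(fixed)
print(auto)
```

`np.linspace(values.min(), values.max(), k + 1)` is another way to build the edges by hand when more control over the end points is needed.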

tacit palm
#

hello is there any libraries i should be aware of when dealing with time like hh:mm:ss in a dataframe

lilac raven
#

Why is C just the last file and not the two combined? ```
if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
    full_name = pathlib.Path(root) / file
    try:
        read_fname = full_name
        data = np.loadtxt(read_fname)
        # data_num = data

        # data_list = data.tolist()
        # datalist = []
        # Output = []
        # datalist.append(data)
        data_list = data.tolist()

        data_list.extend(data_list)
        c = np.array(data_list)
        # for i in range(len(data)):
        #     Output.append(np.mean(data[i]))
        print(data_list)```
desert oar
tidal bough
lilac raven
#

just trying to make a combined location for the multiple files

hallow bronze
#

Hey guys I want to learn data science and other skills

#

I decided to take a real python subscription is it any good

lilac raven
#

this is full code with print np.mean at the end ```
for root, dirs, files in os.walk("/Users/jsmith/Documents"):
    for file in files:
        if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
            full_name = pathlib.Path(root) / file
            try:
                read_fname = full_name
                data = np.loadtxt(read_fname)

                data_list = data.tolist()

                data_list.extend(data)
                c = np.array(data_list)

                print(c)

                print((np.mean[c]), axis = 0)

            except Exception as e:
                print(e)```
#

I get [47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879 44.69394836 40.13199376 75. 72.50760609 28.40454509 22.94157339 26.28287415 30.52569707 37.17810563 32.2139077 23.27373341 47.63424964 48.70779177 44.95860981 46.17740726 38.02733795 38.13563849 35.35533906 35.68120161 38.23956264 40.52353468 36.66523725 31.91423693 39.82019774 40.08918629 33.96831102 59.21946001 43.1648879 44.69394836 40.13199376 75. 72.50760609 28.40454509 22.94157339 26.28287415 30.52569707 37.17810563 32.2139077 23.27373341] 'function' object is not subscriptable [356.22258666 349.47877856 256.22921202 251.57835095 393.43572114 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388 76.82953714 131.98240352 107.1130911 100. 155.02932274 267.62847382 342.38136632 289.35272592 319.09348501 277.627819 261.0439415 229.46949688 313.32438432 250.97033911 194.77984801 326.2595784 235.80044922 140.2466315 356.22258666 349.47877856 256.22921202 251.57835095 393.43572114 204.17516989 108.25317547 109.66546928 156.79073102 215.62248388 76.82953714 131.98240352 107.1130911 100. 155.02932274 267.62847382 342.38136632 289.35272592 319.09348501 277.627819 261.0439415 229.46949688 313.32438432 250.97033911 194.77984801 326.2595784 235.80044922 140.2466315 ] 'function' object is not subscriptable

#

so it prints each file one at a time

#

and not together like a combined array

#

why so?
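A possible fix, sketched under the assumption that every matching file holds a plain column of numbers. In the posted loop, `data_list` is rebuilt from scratch for each file and then extended with that same file's values, so each printout is one file repeated twice. The `'function' object is not subscriptable` message comes from `np.mean[c]`, which uses square brackets instead of a call. Collecting into a list that lives outside the loop avoids both problems; `load_combined` is a made-up helper name.

```python
import os
import pathlib

import numpy as np

def load_combined(top_dir, suffix="_MID-R1-ECG.1D_hrv.txt"):
    """Load every matching file under top_dir and join the values into one array."""
    chunks = []                                   # created once, so it survives between files
    for root, dirs, files in os.walk(top_dir):
        for file in sorted(files):
            if file.endswith(suffix):
                data = np.loadtxt(pathlib.Path(root) / file)
                chunks.append(np.atleast_1d(data))
    return np.concatenate(chunks) if chunks else np.array([])

# Example call, using the path from the question:
# c = load_combined("/Users/jsmith/Documents")
# print(c)
# print(np.mean(c, axis=0))    # parentheses, not square brackets
```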

ebon geyser
#

Anyone here who can answer this?

#

Please ping me

lapis sequoia
#

Can someone help how can I convert GML string to GeoJSON?

#

I know, GeoPandas has methods for it, but I'm totally lost in it.

lapis sequoia
#

Guys im a highschool student so should i learn linear algebra 18 hours to be good at machine learning????

#

and calculus 12 hours

#

etc.

#

math from freecodecamp

serene scaffold
lapis sequoia
#

Im not in the us

#

Sad kek

#

But good answer @serene scaffold, thanks for your respect

serene scaffold
# lapis sequoia Im not in the us

I only mentioned "in the US" because I don't want to make general statements about computer science departments I know nothing about. Is a university education something that's expected for scientific work in your region?

lapis sequoia
#

Yep

#

They do

serene scaffold
#

Alright. So look at universities with computer science programs that you might want to attend and figure out what they look for in applicants. If it's not on their website, you can probably call their admissions department and ask.

#

My department looks for a strong academic record in general, but not getting an A in calculus immediately disqualifies you.

#

(And that's more or less it. No programming experience is expected.)

#

Let me know if you have comments about that @lapis sequoia.

lapis sequoia
#

Uh

#

Its depend on region?

serene scaffold
# lapis sequoia Its depend on region?

yes, if your goal is to work in machine learning professionally, it depends on the local market and what those employers expect. And if they want a university education, then it also depends on what they expect.

lapis sequoia
#

@serene scaffold ok sir so its possible to learn all courses? But how can i use them for programming???

serene scaffold
lapis sequoia
#

Yah

serene scaffold
#

Yes, you can do them if you want.

lapis sequoia
#

So all of you explain

#

Learn math for field work?

#

@serene scaffold 1 More stupid question

serene scaffold
#

Ping me again when you ask the question.

lapis sequoia
#

@serene scaffold Can i get freelance work by learning machine learning?

serene scaffold
lapis sequoia
#

In usa?

serene scaffold
#

you said you are not in the USA, right? People will need to know what country you are in to know if that career direction is viable in your market.

lapis sequoia
#

@serene scaffold can you give an example of how to use linear algebra in real programming?

serene scaffold
lapis sequoia
#

But i think most kinds of programming use linear algebra

#

Even game developer

#

@serene scaffold

desert oar
#

(but i had to work my butt off later to make up for it)

lapis sequoia
#

@desert oar are you freelance?

desert oar
#

i was a professional data scientist for 5 years although my current job is a software engineer. i was not a freelancer but i did do a one-off consulting gig.

lapis sequoia
#

Jeez

#

Can you build own car?

#

Can you build rocket??

#

Can you build own discord app???

#

Can you hack nasa????

desert oar
#

No car or rocket, but my friend with a mechanical engineering PhD can build a rocket 🙂 I have built a simple Discord bot before.

lapis sequoia
#

That 5 years college?

desert oar
#

My education path was:

  1. well-regarded American public high school
  2. well-regarded American research university BA with double major in economics and math (along with some other credentials)
  3. top-ranked American research university MA

I have had a pretty "easy" journey, all things considered.

lapis sequoia
#

@desert oar Is there anything that doesn't use much math?

desert oar
#

In data science? No, unfortunately; you need math. Data engineering doesn't require much math, although math still helps in that role too.

#

I thought I hated math until I took a linear algebra course in college.

obtuse stratus
desert oar
#

Then I realized I was just taught badly.

desert oar
#

In fact I still kind of wish I became an economist 🙂 edit: I would not be opposed to a mid-career PhD, but the job market for academics is difficult now so I don't mind waiting.

lapis sequoia
#

@desert oar What is your hardest math?

desert oar
#

Hardest? as in, the math that I have the most trouble with? Real analysis and financial math. Too much "computation". I prefer playing with abstract symbols.

obtuse stratus
#

yeah, I'm a freshman in university with DS-AI major , do i need to know some other industries ?

desert oar
# obtuse stratus yeah, I'm a freshman in university with DS-AI major , do i need to know some oth...

The more you know about any particular industry, the more appealing you will be for a "general purpose" DS role. Industry knowledge can be a significant bonus on a job application, and can offset comparatively weaker research/academic credentials. A well-managed data science team has a mix of both "researchers" and "industry people". If you have an interest in a particular industry, you should feel empowered to pursue that interest, it might prove more fruitful than grinding away at stuff you don't care about.

lapis sequoia
#

@desert oar god typing

#

@desert oar So can ai talk together?

obtuse stratus
desert oar
#

Artificial Gibberish

lapis sequoia
#

Magic

#

@desert oar How to be mechanical engi?

#

And if i'm a mechanical engi can i build my own sentry like tf2??

desert oar
#

You could probably build something that looks like it, but it can't "unpack" itself from nothing like that.

lapis sequoia
#

Lol

#

Lmao

#

So how to be smarter than elon musk?

obtuse stratus
lapis sequoia
#

What is ds?

obtuse stratus
#

Data Science

lapis sequoia
#

Oh

mint palm
lapis sequoia
#

@mint palm uh

#

Im died

mint palm
#

Haha

#

I was serious

karmic apex
#

Hi everyone. Is there any software engineer/developer here who is switching to data science?

jade carbon
#

can we convert the .pb file into h5?
i mean from a pure tf model into a keras model

#

i wanna check how they built the inception coco model

crude fable
#

Hi, is anyone familiar with pytorch advanced tensor indexing (or operations) and is free to voicechat a little bit?

deft heron
#

has anyone here ever used Chatterbot in python? (it's a library)

jade carbon
crude fable
#

yes, do u have time for voice chat so I can elaborate my problem

jade carbon
#

i don't know when would be a good time to elaborate ur problem

#

i am busy now

crude fable
#

I'll just ask here: how can I get multiple spans of a tensor into one?
E.g. the tensor is of size [32 (batch_size), 256 (seq_length), 768 (emb_size)] and I have multiple sequence indexers like [[1~2], [1~4]]

grave frost
#

ofc, you can get 90%+ with NB or anything, but why should i use it when I know a simple NN would start at 96%+ ???

lime jewel
#

Does anybody know if the QuadroRTX4000 is sufficient for deep learning on non-video tasks?

I'm using it for biomedical data, and im trying to figure out if its okay to do the RTX4000 or if i should sacrifice elsewhere to get something a little bigger

#

its 8 GB RAM, which can load the entirety of my data

exotic maple
grave frost
#

there are a lot of utilities that frameworks offer, and I don't want to write my own generators just to use naive bayes

#

I do get your point

#

but I am simply describing the time saved in practical usage

#

ofc, maybe I don't need a NN - but then if an algo does outperform I would simply use it

young beacon
#

Hi, I was trying to import gensim package but I get the following error:

   1053     # try to load fast, cythonized code if possible
-> 1054     from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
   1055 
   1056 except ImportError:

__init__.pxd in init gensim._matutils()

ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88
#

I have installed the version 3.4.0

fast dune
#

I am self learning Numpy (for future ML classes). Numpy array broadcasting is brand new to me, since in the past I used basic loops for everything. I cannot wrap my head around doing math operations with broadcasting. All the online examples are TOO simple for me to learn.

#

Take for example, I have an image (1600 x 900) and I have a numpy array of 1000 random (x,y) coordinates. For each pixel on the image, I want to find the closest (x,y) coordinate, and replace its pixel to that (x,y) coordinate's pixel.

#

The easy part: Replacing the old pixel with the new pixel.
The hard part: How the heck do I compute 'closestDistance()' on each pixel? Aka. on each element in my 1600 x 900 ndarray.

exotic maple
#

Because as I remember, numpy arrays for images is basically - > Row - column are used as (x,y) to map the pixel position, and Z is used to store color information, or so

#

so basically, the pixel [0, 0, (255,255,255)] would be a white pixel at the x=0, y=0 position

slate hollow
#

thing is, i get this cryptic error

#

happens in process_text

#

and for some reason i can't even see what's going on in there i just get some weird

#
```
Tensor("Placeholder:0", shape=(None, 1), dtype=string)
Tensor("ExpandDims:0", shape=(None, 1), dtype=string)
```
#
```
Tensor("text_vectorization_4/add:0", shape=(2,), dtype=int32)
Tensor("sequential_3/text_vectorization_4/add:0", shape=(2,), dtype=int32)
```
#

and stuff like that- any help?

tidal bough
#

with input shapes: [2], [?], [], [?], [?].
huh

#

never seen an ? before lol

velvet thorn
#

I'm curious

#

why do you want to do this?

#

this isn't really a broadcasting problem btw

fast dune
#

@velvet thorn Yep, a valid question. I'm learning Python by converting my old Java homework into Python. However, I ran into the brick wall that is Python loops are horribly inefficient so I cannot convert my code in a 1:1 format.

fast dune
velvet thorn
#

let me get this straight

#

on a toy example

#

okay I think

#

you need some sort of tree

fast dune
#

My plan was to use Numpy with as few loops as possible. The math is easy to understand but I'm struggling with coding it as matrix operations. (tagging @exotic maple in case.)
As a visual I'm posting an example image:

velvet thorn
#

but not in my area of experience

#

if you wanted to translate your loops naively though

#

you could look into numba

#

which optimises through JIT compilation

fast dune
#

Yeah, that's a valid alternative. In the interest of learning, I want to find someone who can help me with Numpy first.

velvet thorn
#

hm actually...

#

let me think

#

OKAY hold up

velvet thorn
#

I'm assuming

#

Euclidean distance?

fast dune
#

Yeah, euclidean but I exclude the sq root because I only care about relative distance.

velvet thorn
#

I don't know if this will be faster but this is my guess

fast dune
#

Don't worry 🙂, I'll ask around here for a few days. This problem is very dense because I'm not using loops. (The loop Python version takes 25 seconds to process while Java takes ~1 sec)

velvet thorn
#

hm hold up thinking

velvet thorn
#

again, I don't know if this is faster

#

but this is my thought process.

#

say you have an image of shape (x, y)

#

create an array of shape (n, 2), a, where n = x * y, representing coordinates

#

so [[0, 0], [0, 1], [0, 2]...[0, y-1], [1, 0], [1, 1]...[x-1, y-1]]

#

the array of seeds, s, is already in the shape (m, 2)

#

take a[:, np.newaxis, :] - s[np.newaxis, ...] to create a raw difference array, rd, of shape (n, m, 2), where rd[:, i, :] == a - s[i]

#

in other words, the result of the difference between all coordinates and a particular seed's coordinates

#

taking (rd ** 2).sum(axis=2), reducing over the last axis, gives an array d of shape (n, m), where the element i, j represents the squared Euclidean distance between the ith entry in a and the jth entry in s

#

the last step, then, is to take d.argmin(axis=-1), which will give, for each coordinate, the index of the seed that is nearest
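The steps above can be sketched roughly like this, assuming NumPy, seeds stored as integer (row, col) pairs in an array called `s`, and a small random image standing in for the real one. The function name and the test image are made up; the intermediate names `a`, `rd`, and `d` follow the description.

```python
import numpy as np

def nearest_seed_index(height, width, s):
    """For every pixel, return the index of the nearest seed (squared Euclidean distance)."""
    rows, cols = np.indices((height, width))
    a = np.stack([rows.ravel(), cols.ravel()], axis=1)   # (n, 2) pixel coordinates
    rd = a[:, np.newaxis, :] - s[np.newaxis, :, :]        # (n, m, 2) raw differences
    d = (rd ** 2).sum(axis=2)                             # (n, m) squared distances
    return d.argmin(axis=1).reshape(height, width)        # nearest seed per pixel

# Stand-in data: a 40 x 30 RGB image and 10 random seeds.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 30, 3), dtype=np.uint8)
s = np.stack([rng.integers(0, 40, size=10), rng.integers(0, 30, size=10)], axis=1)

nearest = nearest_seed_index(40, 30, s)
# Replace each pixel with the colour found at its nearest seed.
result = image[s[nearest, 0], s[nearest, 1]]
```

Skipping the square root is fine here because `argmin` only needs the ordering of the distances.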

#

🥴 that was difficult

fast dune
#

Starting from the top: If my numSeeds = 100, I create an ndarray of shape (100, 2)?

velvet thorn
#

I think it makes sense

#

I haven't done numpy stuff in a long while

#

someone should check my reasoning

fast dune
#

The stuff you posted matched some of the concept testing I did. I knew I have to do np.newaxis but I didn't know where to add it to. I had a feeling I should create some sort of map and do imgArray - seedArray to get a difference array.

#

But my brain couldn't handle all this new stuff.

velvet thorn
#

yeah this kind of thing is easier if you have a background in mathematics

#

which is why I'm considering getting a master's

#

it's p fun though

velvet thorn
#

the dimensions just need to line up

fast dune
#

I'll spend some time self testing and get back to you. Probably will post a github gist of my Python code.

slate hollow
#

https://paste.pythondiscord.com/ikebizifin.py so i'm doing some stuff with the imdb dataset
https://paste.pythondiscord.com/domamahosu.sql
thing is, i get this cryptic error
happens in process_text
and for some reason i can't even see what's going on in there i just get some weird

```
Tensor("ExpandDims:0", shape=(None, 1), dtype=string)
Tensor("text_vectorization_4/add:0", shape=(2,), dtype=int32)
Tensor("sequential_3/text_vectorization_4/add:0", shape=(2,), dtype=int32)
```
and stuff like that- any help?
velvet thorn
#

you'd need to chunk it

#

otherwise for any image of reasonable size the resultant array gets too big
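To put a number on "too big": a 1600 x 900 image with 1000 seeds means 1,440,000 x 1000 x 2 differences, roughly 23 GB at 8 bytes each. One way to chunk it, as a sketch assuming NumPy and integer (row, col) seeds, is to handle a few rows at a time and add the squared row and column offsets separately. The function name and the `chunk_rows` default are made up.

```python
import numpy as np

def nearest_seed_index_chunked(height, width, s, chunk_rows=16):
    """Nearest-seed index per pixel, computed a block of rows at a time."""
    out = np.empty((height, width), dtype=np.intp)
    cols = np.arange(width)
    # Squared column offsets are the same for every row block: (width, m)
    dc2 = (cols[:, np.newaxis] - s[np.newaxis, :, 1]) ** 2
    for start in range(0, height, chunk_rows):
        stop = min(start + chunk_rows, height)
        rows = np.arange(start, stop)
        dr2 = (rows[:, np.newaxis] - s[np.newaxis, :, 0]) ** 2    # (r, m)
        # Broadcast to (r, width, m): squared distance from each pixel to each seed
        sq = dr2[:, np.newaxis, :] + dc2[np.newaxis, :, :]
        out[start:stop] = sq.argmin(axis=2)
    return out

s = np.array([[0, 0], [5, 7], [9, 2]])     # three made-up seeds
labels = nearest_seed_index_chunked(10, 8, s, chunk_rows=3)
print(labels)
```

The peak temporary array is now `chunk_rows * width * m` values, so `chunk_rows` can be tuned to the memory available. A spatial tree, as suggested earlier, is the other usual route when the number of seeds grows.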

limpid saddle
#

SVM is taking more than an hour and counting.. is that okay or could there be a problem?

exotic maple
limpid saddle
exotic maple
#

that sounds like too much for an SVM

#

what are you trying to classify?

limpid saddle
#

it must be less tho after the data cleansing

desert oar
#

yeah that will take forever to train, svm's are slow

#

probably faster and more accurate to throw it into keras with one hidden layer

limpid saddle
#

hmm on average, how big should it be for SVM?

velvet thorn
#

@limpid saddle what kernel

exotic maple
velvet thorn
#

why are you using SVMs btw

exotic maple
desert oar
#

is there any problem where svms are still useful? i feel like neural networks kind of ate kernel methods and i haven't seen anyone do kernel anything in forever

desert oar
#

and for the linear svm you can just do hinge loss w/ gradient descent or whatever
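A bare-bones sketch of that idea, assuming NumPy, labels in {-1, +1}, and a tiny made-up dataset. It minimises the L2-regularised hinge loss with plain full-batch subgradient descent. The learning rate, regularisation strength, and epoch count are arbitrary illustration values, and `fit_linear_svm` is a made-up name.

```python
import numpy as np

def fit_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Minimise reg/2 * ||w||^2 + mean(max(0, 1 - y * (Xw + b))) by subgradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points that violate the margin
        # Subgradient: regulariser plus the hinge term of the violating points
        grad_w = reg * w - (y[active, np.newaxis] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up, linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = fit_linear_svm(X, y)
preds = np.sign(X @ w + b)
print(w, b, preds)
```

In practice, a library routine with mini-batches is likely a better fit for a large text dataset; this only shows that the linear case needs no kernel machinery.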

desert oar
#

something where they specifically want to obtain support vectors?

exotic maple
#

speaking of NN I really need to start learning TF2 and Keras...

limpid saddle
#

I'm very new to this so I'm pretty confused

strange fern
#

Quick numpy Q: there's a notation used in a for loop using plt.scatter() that looks like X_r[y == i, 0], what does it actually mean?

Code snippet:

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
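A tiny illustration of that notation, assuming NumPy and made-up arrays in place of the real `X_r` and `y`. The expression `y == i` builds a boolean mask with one entry per row, and `X_r[mask, 0]` keeps column 0 of only the rows where the mask is True. So each `plt.scatter` call plots the first component against the second for one class at a time.

```python
import numpy as np

# Made-up stand-ins: four samples with two components each, and their class labels.
X_r = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 2])

mask = (y == 0)               # boolean array: True, False, True, False
first_col = X_r[mask, 0]      # column 0 of the rows where y == 0
second_col = X_r[y == 0, 1]   # column 1 of the same rows, mask written inline

print(mask)
print(first_col, second_col)
```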
exotic maple
velvet thorn
exotic maple
#

can any1 please recommend a good place or course where to learn PyTorch and Tensorflow2?

limpid saddle
exotic maple
#

Binary or Count_vectorized features?

#

You can try with Multinomial Naive Bayes if you are using Count_vectorize, or with a Bernoulli Naive Bayes if Binary

#

NB would be infinitely better than SVM

limpid saddle
#

Yepp I have

#

What about LR, would that take so long too?

exotic maple
#

LR should be fast but i've never heard of it being used for what you want

#

NB should be equally fast to be honest

limpid saddle
#

Do you mind if I send some of the learning curves I got? Because I can't really tell if they're okay or not

limpid saddle
#

Hello, could someone take a look at #help-peanut and give me their opinions on the graphs? and which would be the best to pick?

slate hollow
#

turns out

#

turns out all i needed was to change text_vec = TextVectorization() to text_vec = TextVectorization(input_shape=[])

#

bruh

#

what does input_shape=[] even mean

exotic maple
slate hollow
#

from my experience i've always had to pass like a input_shape=(10,)

#

or something of the sort

exotic maple
#

oh

#

eh thats weird

#

ahh

#

your data is 0 dimensional

slate hollow
#

uh

#

each batch is like

#

a batch of 32 sentence thingies

#

wait no

#

nvm it's just each batch is like a bunch of strings

exotic maple
#

uh nvm me too. Your data is 1-dimensional, not 0

#

0 is just a scalar i forgot lol

hazy basin
#

oh arcpy is giving me a hassle

stark vortex
#

I think this is probably the closest category of chat for my issue. (please tell me if I am wrong)

I need someone who is smarter than me in Opencv to assist me in some issues I am having capturing video frames from my webcam using the v4l2 backend. I am running this on a raspberry pi.

strange jay
#

Hi guys, I have an ultra specific issue that I am trying to solve. So there are two cells in a given Excel sheet and they have a number that Identifies something and then a photo count given from a file explorer image upload. I am trying to find a solution to find discrepancies between different folder names that contain the file count. The issue is that the image counts in the consolidated excel sheet and the files is around 2000 photos, so I am trying to detect given folder number with files within it and have a script pull up which folders have discrepancies to the Excel sheet. Any possible solutions?

#

If anyone has any possible solutions or suggestions please let me know
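One way to approach the folder check, as a rough sketch: count the files in each folder and compare against the counts from the sheet. It assumes the Excel counts have already been loaded into a plain dict keyed by folder name (`expected_from_sheet` here is made up), and that each ID has its own sub-folder under one root. The folder names, counts, and helper names are all hypothetical.

```python
from pathlib import Path
import tempfile

def count_files(root):
    """Map each sub-folder name under `root` to the number of files inside it."""
    return {
        folder.name: sum(1 for item in folder.iterdir() if item.is_file())
        for folder in Path(root).iterdir()
        if folder.is_dir()
    }

def find_discrepancies(expected, actual):
    """Return {folder: (expected_count, actual_count)} where the two disagree."""
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(set(expected) | set(actual))
        if expected.get(name) != actual.get(name)
    }

# Demo with a throwaway directory tree; the IDs and counts are made up.
with tempfile.TemporaryDirectory() as root:
    for folder, n_files in {"1001": 3, "1002": 1}.items():
        (Path(root) / folder).mkdir()
        for i in range(n_files):
            (Path(root) / folder / f"photo_{i}.jpg").touch()

    expected_from_sheet = {"1001": 3, "1002": 2, "1003": 4}
    mismatches = find_discrepancies(expected_from_sheet, count_files(root))

print(mismatches)
```

A `None` on either side means the folder exists only in the sheet or only on disk.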

tough cosmos
#

Hey guys, I have been invested in the data science sector a lot, studying courses on Coursera and Udemy from recognized institutions. I have learned almost all the required libraries and even machine learning.
Now I don't know how to start my career...

iron ginkgo
#

Hey, I got a probably simple(?) question about data analysis. I have a sequence of values (stored in a DataFrame) and from that sequence I want to analyse and extract subsequences where the mean value is less/more than some other constant value (basically looking up chains of suspicious values). Does this process have some specific name?
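A minimal sketch of what is described above, assuming a fixed window length and a "mean greater than threshold" rule. This is usually treated as a rolling (moving) window statistic; the function name, window size, and threshold are illustrative choices, not anything from the original question.

```python
def windows_above(values, window, threshold):
    """Return (start, end) index pairs of windows whose mean exceeds threshold."""
    hits = []
    running = sum(values[:window])
    for start in range(len(values) - window + 1):
        if start > 0:
            # Slide the window: add the new value, drop the one that fell out.
            running += values[start + window - 1] - values[start - 1]
        if running / window > threshold:
            hits.append((start, start + window))
    return hits

values = [1, 2, 1, 9, 8, 9, 1, 2, 1, 1]
print(windows_above(values, window=3, threshold=6))  # [(3, 6)]
```

Overlapping hits can be merged afterwards to get the longer "chains" of suspicious values.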

potent badge
#

Question: When dealing with a neural network, why would someone divide by the sqrt of the number of neurons in the layer after performing the dot product between the neurons and the weights of that specific layer?

desert oar
iron ginkgo
#

Thanks!

desert oar
potent badge
#

I did read online on some sources, that some people initialize the weights as 1/sqrt(# of nodes) but....... not after doing the dot product

desert oar
#

(side note: i hate that stats, data science, and ai are 3 separate forums... they all get the same damn questions)

#

think of it this way: the greater the number of nodes, the larger the resulting dot product

potent badge
#

So if im going from a first layer with say 784 neurons to a layer with 128 neurons would I be doing sqrt(784) or the sqrt of (784*128)

desert oar
#

im only seeing this technique used for the attention mechanism

#

i think you'd do the latter based on what i'm reading

#

because the linear component of a layer is (W · x) + b where in this case W is 128×784, x is 784×1, and b is 128×1

#

however because the nonlinearity is elementwise, you would apply this elementwise too...

#

so maybe it's sqrt(784)

#

honestly i have no idea, ask your prof and let us know

#

there is this, which is more what i would intuitively expect people to use https://paperswithcode.com/method/weight-normalization

Weight Normalization is a normalization method for training neural networks. It is inspired by batch normalization, but it is a deterministic method that does not share batch normalization's property of adding noise to the gradients. It reparameterizes each weight vector $\textbf{w}$ in terms of a parameter vector $\textbf{v}$ and a scalar param...

lapis sequoia
#

@desert oar If i want to start a technology business, should i be good at math for coding?

bronze skiff
#

dividing by the sqrt of num neurons is xavier initialization

lapis sequoia
#

What are you all talking about ?

desert oar
#

@bronze skiff i believe this is inside the network before applying the nonlinearity

bronze skiff
#

suppose you assume your inputs are distributed according to N(0,1)

lapis sequoia
#

Wait

#

Is this linear algebra?

bronze skiff
#

then after applying independent weights in a dot product, you have something that is N(0,1)+...+N(0,1) = N(0,num_neurons) distributed

#

this is too large, so you divide by sqrt(num_neurons) to bring it back to N(0,1)
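The variance argument can be checked numerically. This is a small simulation sketch, assuming both inputs and weights are drawn independently from N(0,1); each product then has mean 0 and variance 1 (it is not exactly normal, but the sum behaves as described). The helper name and trial count are arbitrary.

```python
import random
import statistics

def preactivation_samples(num_neurons, trials, rng):
    """Draw `trials` dot products of N(0,1) inputs with N(0,1) weights."""
    samples = []
    for _ in range(trials):
        total = 0.0
        for _ in range(num_neurons):
            total += rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0)
        samples.append(total)
    return samples

rng = random.Random(0)
num_neurons = 784
raw = preactivation_samples(num_neurons, trials=1000, rng=rng)
scaled = [s / num_neurons ** 0.5 for s in raw]

raw_var = statistics.pvariance(raw)
scaled_var = statistics.pvariance(scaled)
print(f"variance before scaling: {raw_var:.1f}")     # roughly num_neurons
print(f"variance after scaling:  {scaled_var:.3f}")  # roughly 1
```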

desert oar
#

so if you can assume that the weights are already normal then yeah i can see how it acts to scale the output back down to unit variance

have you heard of doing this in the network itself, rather than for initialization?

bronze skiff
#

this is what batch norm is kinda trying to do

#

but often initialization is good enough

#

which is why for example, pytorch nn.Linear does this by default

desert oar
#

batch norm actually uses the estimated std dev of the weights right?

bronze skiff
#

if you look at the source code for it, it initializes by dividing by the sqrt(num_neurons) already

bronze skiff
#

another way to do this is layer norm

desert oar
#

ah, i didn't realize layer norm was a thing. this must be a form of layer normalization then?

bronze skiff
#

dynamically normalizing signals in neural nets is still an active field 😛

desert oar
#

again i am familiar with dividing by the norm of the weight vector... but not by the sqrt of the number of weights

#

i just found the term "weight standardization", which seems apt here

bronze skiff
#

yeah

#

one is to normalize the "size" of the preactivation outputs

#

and one is to actually normalize the "distribution" of the preactivations

#

they're not unrelated, but not the same

potent badge
desert oar
#

https://arxiv.org/pdf/1903.10520.pdf
https://paperswithcode.com/method/weight-standardization
they still use the estimated std dev from the weights though

Weight Standardization is a normalization technique that smooths the loss landscape by standardizing the weights in convolutional layers. Different from the previous normalization methods that focus on activations, WS considers the smoothing effects of weights more than just length-direction decoupling. Theoretically, WS reduces the Lipschitz co...

#

they don't assume the variance is = # of weights

#

im really curious what conditions make that assumption valid; apparently something analogous is valid in attention units according to what i found and posted above

#

side note: i really like the graphics in this paper

#

looks like good old matplotlib

#

well-labeled figures, nicely typeset equations

bronze skiff
potent badge
#

so I would only divide for the first layer?

bronze skiff
#

i think you're conflating an initialization with normalization

#

initialization is in the very beginning, when a network is constructed

#

normally the weights are randomly sampled under maybe an N(0,1) distribution

#

instead, we divide each weight in each layer by the sqrt of the number of input neurons to that layer

#

that's the initialized net

#

afterwards, we just run the net like normal during training

potent badge
#

hmmm

bronze skiff
#

also, +1 for jax

potent badge
#

so I am dividing the weights before i do the dot product?

bronze skiff
#

this is weight initialization

#

there are no dot products

potent badge
#

okay makes sense but do you see where the sqrt is in that code

#

it happened after every dot product

#

but I just took some out

#

and im trying to figure out why

#

my professor had written the dot product like that

bronze skiff
#

technically since it's preactivation there is literally no difference between "initializing the weights by dividing by sqrt" and "initializing weights at N(0,1) and dividing the preactivations by sqrt"

#

it's just math
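A quick check of that equivalence for a single neuron, as a sketch: scaling the weights by 1/sqrt(fan_in) before the dot product gives the same preactivation as scaling the dot product afterwards. The names `fan_in`, `pre_a`, and `pre_b` are illustrative. This only shows that the forward values match; the two parametrizations can still train differently, since the gradients with respect to the stored weights are scaled differently.

```python
import math
import random

def dot(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

rng = random.Random(1)
fan_in = 784
weights = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
inputs = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
scale = math.sqrt(fan_in)

# Option A: shrink the weights once, then take the dot product.
scaled_weights = [w / scale for w in weights]
pre_a = dot(scaled_weights, inputs)

# Option B: keep N(0,1) weights and divide the preactivation instead.
pre_b = dot(weights, inputs) / scale

print(math.isclose(pre_a, pre_b, rel_tol=1e-9, abs_tol=1e-9))
```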

potent badge
#

thats what I would have thought, but he originally did it for the hidden layers and output layer as well... dividing by sqrt 784

bronze skiff
#

that's fine

#

you scale by the number of input neurons

#

so at each input, you have 784 neurons

#

it's only the output that you have 10

potent badge
#

when I take the sqrts out and train the model as is, i get boosted accuracy, but when I do the sqrts the accuracy is like 10% for every epoch

bronze skiff
#

¯\_(ツ)_/¯

potent badge
#

literally without the sqrts

#

and then with them it will be like 0.10..etc (10%)

desert oar
#

are you sure you didnt misunderstand what the prof was doing

#

(im also surprised that taking out the sqrts messes up the model that badly)

#

wouldnt you divide by sqrt(len(w))?

potent badge
#

uh i dont think i could have misunderstood, he kinda just threw this file at us and was like "fix it"

#

this was his code so him dividing by 784 is like what i dont understand

desert oar
#

oh thats your prof's code?

potent badge
#

yeah

#

he changed it a bit from the Jax Neural Network code thats out there on the web

bronze skiff
#

i just don't understand why there's a relu, and then relu again

potent badge
#

under the #skip pre activations?

#

i believe its because we have not yet taken the relu of the input layer, so we do that and then that's our new x value

#

this is the original from jax creators or whatever

desert oar
grave frost
haughty tree
#

hey I want to start on AI and ml is there any course to follow through done my maths in school and csci major (but I don't mind to brush up my maths for ml suggest me )

#

lacking without proper guidance

echo orbit
#

Hi, i'm working on a project about COVID19 Tweets (especially hashtags), and i was thinking about making a neural network to make predictions.
I currently have a dataframe with each row listing all the hashtags used in a single tweet/thread (lists) based on multiple months (from january 2020 to march 2021). I already studied it using networkx (still a wip but it's all about aesthetics now). That means hashtags from a same list (so same row) are "linked" to each others.
My question is : is it possible to make a machine that train on these hashtag lists month by month, then ask it to predict what would the next month's hashtag links be (so i can make another network and compare it to the datas) ?

haughty tree
#

Kindly help me I'm just delaying

echo orbit
#
                   Tweet_ID                          Hashtag
0       1219778294238699520                   [#coronavirus]
1       1219780718680633344           [#us, #wuhanpneumonia]
2       1219785759277772800                [#wuhanpneumonia]
3       1219791407377895424                   [#coronavirus]
4       1219797876127215616                   [#coronavirus]
5       1219805336074215424                         [#virus]
6       1219806921953181697              [#wuhancoronavirus]
7       1219809142237552640                      [#ncov2019]
8       1219811430825771008                      [#breaking]
9       1219813007695286272                   [#coronavirus]
10      1219813206379466752                   [#coronavirus]
11      1219815181599019008                   [#coronavirus]
12      1219817038354558976                         [#wuhan]
13      1219818433165946880         [#us, #wuhancoronavirus]
14      1219819377157005314              [#wuhancoronavirus]
15      1219823330234003462                   [#coronavirus]
16      1219824203454529536                   [#coronavirus]
17      1219824463367172096                      [#breaking]
18      1219824742099824640                   [#coronavirus]
19      1219826185049231360              [#wuhancoronavirus]
20      1219828098025213952                            [#us]
21      1219832397790760966                   [#coronavirus]
22      1219832615743770624                      [#breaking]
23      1219837312114286594                   [#coronavirus]
24      1219838131005874176                [#wuhanpneumonia]
25      1219838150530351104                            [#us]
26      1219839406351106048                         [#wuhan]
27      1219840010779873281  [#china, #china, #wuhan, #ncov]
28      1219840206519422976                            [#us]
29      1219840734418747393              [#wuhancoronavirus]
#

The df looks like this (300K+ tweets)

#

I don't really know on what extent can a neural network make predictions on, so if anyone can enlighten me regarding my issue, that'd be greatly appreciated

grave frost
desert oar
#

@echo orbit this looks like what they call a "multi-label classification" task

#

although it's also kind of a time series

echo orbit
#

what do you mean (regarding the multi-label classification) please ?

desert oar
#

@grave frost you're the RNN evangelist here, how would you model a time series where each time point is a sparse vector?

echo orbit
#

Regarding time series i took a look on some articles but what i see is each value is set to a specific time, while here it's everything for the same month

#

So i'm a bit confused regarding how to approach the problem

haughty tree
grave frost
#

If it were me, I would compile a sizeable file of all hashtags possible @echo orbit you can easily scrape them from twitter (putting a min limit to ensure they are reasonably famous) encode the tokens numerically and try to predict them

desert oar
echo orbit
#

On my program i made a dict so it takes only the 50 most used hashtags regarding COVID19 tweets (if that's what you were asking) for each month

#

However i don't understand what you mean by encoding the tokens then predicting them

grave frost
#

the tokens method was something off NLP - I doubt it wouldn't work reasonably for your problem

echo orbit
#

Yeah, it just serves to me so i don't have to count each hashtag's occurence counts

grave frost
#

it was based on the fact twitter is full of fools - their tags are like # + <some_weird_place> + <virus_name> + year all of which can be broken down into tokens.

#

best you can do is to try it - can't guarantee

echo orbit
#

I see

grave frost
#

so #chinacoronavirus and #wuhanvirus and wuhancorona would be decomposable

#

I am not a twitter or Time-series expert, so take my advice with a grain of salt
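To make "decomposable into tokens" concrete, here is a rough sketch using greedy longest-match segmentation against a known vocabulary. The vocabulary is a hand-picked assumption; a real one would have to be built from the data, and greedy matching can fail on tags that a smarter tokenizer would handle.

```python
def split_hashtag(tag, vocabulary):
    """Greedy longest-match segmentation of a hashtag into known tokens."""
    text = tag.lstrip("#").lower()
    tokens = []
    position = 0
    while position < len(text):
        # Try the longest candidate first, then shorter ones.
        for end in range(len(text), position, -1):
            if text[position:end] in vocabulary:
                tokens.append(text[position:end])
                position = end
                break
        else:
            return None  # no known token starts at this position
    return tokens

vocabulary = {"china", "wuhan", "corona", "virus", "coronavirus"}
for tag in ["#chinacoronavirus", "#wuhanvirus", "#wuhancorona"]:
    print(tag, split_hashtag(tag, vocabulary))
```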

echo orbit
#

I don't think i'll have time for that unfortunately (as i have to submit the project before monday and i have a lot of stuff to do beside that)

#

If predictions aren't possible (at least not with such a short time), is it perhaps possible to categorize hashtags regarding how "linked" they are ?

#

Then make a program that gives the probability of two hashtags being linked for ex ?

grave frost
#

yes - you can categorize hashtags but you need labelled data

#

wdym by linked?

echo orbit
#

If i take the dataframe sample i posted above :

```
27      1219840010779873281  [#china, #china, #wuhan, #ncov]
```
In the tweet with the tweet_id `1219840010779873281`, `#china, #wuhan and #ncov` are in the same tweet/thread, so i consider them "linked" here (with china used twice, still have to figure out if i should count it twice or not)
#

With networkx i made a network to see these links in a more general way, and my question was if it's currently possible to make a program so it gives the probability for 2 hashtags to be linked

desert oar
#

if you just want the probability of 2 hashtags appearing together in a tweet, you can use pointwise mutual information https://en.wikipedia.org/wiki/Pointwise_mutual_information

Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics. In contrast to mutual information (MI) which builds upon PMI, it refers to single events, whereas MI refers to the average of all possible events.
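A minimal sketch of PMI on hashtag lists shaped like the dataframe sample above, assuming each tweet counts a hashtag at most once (so the repeated `#china` is collapsed) and probabilities are estimated as fractions of tweets. The tiny `tweets` list is a hand-made subset for illustration, not the real data. PMI is an association score rather than a probability: positive means the pair co-occurs more often than independence would predict.

```python
import math
from collections import Counter
from itertools import combinations

tweets = [
    ["#coronavirus"],
    ["#us", "#wuhanpneumonia"],
    ["#wuhanpneumonia"],
    ["#us", "#wuhancoronavirus"],
    ["#china", "#china", "#wuhan", "#ncov"],
    ["#wuhan"],
    ["#us"],
    ["#coronavirus"],
]

def pmi_table(tweets):
    """PMI (base 2) for every hashtag pair that co-occurs at least once."""
    n = len(tweets)
    single = Counter()
    pair = Counter()
    for tags in tweets:
        unique = sorted(set(tags))  # count each hashtag once per tweet
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }

pmi = pmi_table(tweets)
for (a, b), score in sorted(pmi.items(), key=lambda kv: -kv[1]):
    print(f"{a} {b}: {score:.2f}")
```

Pairs that never co-occur are simply absent from the table here; with sparse data, rare pairs get inflated scores, so a minimum count cutoff is worth considering.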

echo orbit
#

Hmmm

#

I don't see what i can do to improve my project then

#

aside from visualizing datas through networkx

light stump
#

does anyone know how to perform maximum likelihood estimation for a 2D Gaussian?

uncut barn
#

does anyone know what the K variable here means?

tidal bough
#

number of dimensions

light stump
#

my issue is that I don't know how to translate the math into code

#

I'm trying to fit 2d Gaussians to fluorescent peaks on an image using MLE

#

but my parameters are somewhat nonstandard

uncut barn
#

@tidal bough so if we were workin in 3D k=3?, but how can that be if x the input vector ranges from 1 to P, are we assuming k=p?

tidal bough
#

nevertheless, it should be the number of dimensions

uncut barn
#

so number of features in x?

tidal bough
#

yeah, k=P

uncut barn
#

ok thanks

strange fern
grave frost
#

Suppose I am working with a Masked Language Model to pre-train on a specific dataset. In that dataset, most sequences have a particular token of a high frequency

Sample Sequence:-
    <tok1>, <tok1>, <tok4>, <tok7>, <tok4>, <tok4> ---> here tok4 is very frequent in this sequence

So if I mask some tokens and get the model to train to predict those masked tokens, obviously the model will gain a bias in predicting <tok4> due to its statistical frequency.

Since <tok4> represents important information, 'downsampling' (or removing those frequent tokens) would not be preferred and I would love to have my sequence as intact as possible.

How best should I deal with this? Is there any already established method that can counter this problem?

bronze skiff
#

you can "preweight" the sequence before the attention step

#

if you're using an attention model

desert oar
#

is that something like class weighting in logistic regression?

bronze skiff
#

kinda-- though i think in logistic regression class weighting is penalizing incorrect classification of the dominant class less?

#

here it's like, you have the sequence <tok1> <tok1> <tok2> <tok3> but really you have something like <tok1, 0.1> <tok1, 0.1> <tok2, 0.5> <tok3, 0.3> where the weighting could be based on frequency or something

#

and so during self-attention you take weighted dot products instead of regular dot products

desert oar
bronze skiff
grave frost
bronze skiff
grave frost
#

first time for everything
so nothing of that sort has even been written?

bronze skiff
#

maybe it has, would be surprised if it hasn't

grave frost
#

aight. now just to find someone capable enough

#

thanks a lot BTW @bronze skiff 🚀

bronze skiff
#

np

upper sphinx
#

May I ask? what are the advantages of using a python notebook than using a regular python script?

#

I've heard that it is often used on data science and machine learning so should I only use notebooks on these specifically?

serene scaffold
#

Suppose there's a function that takes two arguments. Is there a vectorized way to call this function with every like row (same index) in two dataframes?

desert oar
#

maybe something like numpy.vectorize?

serene scaffold
desert oar
#

so you want to pair rows together across the 2 dataframes?

bronze skiff
#

uh, why not join the two dataframes on the index

#

and then apply your function

desert oar
#

or use pd.concat to get the multiindex column name

bronze skiff
#

though you'll probably need to precompose your two-arity function with a random lambda

desert oar
#
result = (
    pd.concat(
        {'x': df1, 'y': df2},
        axis=1,
    )
    .apply(
        lambda row: myfunc(row['x'], row['y']),
        axis=1,
    )
)
serene scaffold
desert oar
#

i wish i could write it like this and not have people get confused

result = (
    pd.concat(
        {'x': df1, 'y': df2},
        axis=1)
    .apply(
        lambda row: myfunc(row['x'], row['y']),
        axis=1))
#

(which would be the lispy version)

serene scaffold
#

Anyway, isn't this going to make a dataframe of dataframes or something?

#

Seems like a bad data model

desert oar
#

it depends on what myfunc returns

serene scaffold
#

A float

desert oar
#

then it should just return a series of floats

fresh zenith
#

hey guys if i want to get started in graphing things

#

matplotlib is the thing to learn right

#

but can you use it in any old ide

#

like pycharm?

serene scaffold
#

Or no IDE for that matter.

serene scaffold
bronze skiff
#

or since you're looking for indices that are equal, you know, just join

serene scaffold
serene scaffold
bronze skiff
#

ah, you sound like you want something like a "zip" for dataframes

serene scaffold
#

ye

bronze skiff
#

i actually thought you could iterate through a df like that?

#

though not sure how to do it in a vectorized way that isn't a giant loop

serene scaffold
#

I mean sure, but I want it to be v e c t o r i z e d

bronze skiff
#

i mean, apply isn't even vectorized

#

sadly

serene scaffold
#

ah well

bronze skiff
#

it's a giant loop under the hood

desert oar
#

pd.concat, pd.merge, and pd.DataFrame.join are all kind of the same thing

#

in fact pretty much any pandas operation is a join on index

bronze skiff
#

ah yes, you're right

#

i literally haven't used pandas in months

desert oar
#

there's a lot to forget

red hound
#
    with tf.GradientTape() as gen_tape:
        predictions = generator(z)
        #predictions = tf.cast(predictions, dtype=tf.int32)
        predictions = tf.nn.embedding_lookup(embedding, predictions)
        predictions = tf.reshape(predictions, shape=(128, 18, 200))

Im looking for a workaround as tf.cast isnt differentiable but the embedding_lookup strictly need integers as indices. As i want to optimize the generator, casting outside the gradient tape is no option. If you got an idea please feel free to ping me

ripe forge
#

also, fwiw, np.vectorize is a noob trap, it is essentially a plain Python loop under the hood. Also, amusingly enough, if you're using apply, you might actually get better performance by good old list comp.
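A sketch of the list-comprehension route for the "zip two dataframes" question, assuming both frames share the same index labels and columns. `myfunc` is a hypothetical stand-in for any function of two rows that returns a float. The `reindex` call pairs rows by label rather than by position, which is what the join-based suggestions above were after. This is still a Python-level loop, not true vectorization.

```python
import pandas as pd

def myfunc(x_row, y_row):
    # Hypothetical stand-in: any function of two rows that returns a float.
    return float((x_row * y_row).sum())

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["r1", "r2"])
df2 = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, 50.0]}, index=["r2", "r1"])

# Align df2 to df1's index first so that rows are paired by label, not position.
df2_aligned = df2.reindex(df1.index)

result = pd.Series(
    [myfunc(x, y) for x, y in zip(df1.to_numpy(), df2_aligned.to_numpy())],
    index=df1.index,
)
print(result)
```

If `myfunc` can be rewritten in terms of whole-column arithmetic (here, `(df1 * df2_aligned).sum(axis=1)`), that version avoids the per-row loop entirely.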

serene scaffold
patent loom
#

So I'm trying to do some sentiment analysis on movie scripts and trying to distinguish at least a variety of emotions based on each sentence. Does anybody have any tips or recommendations on how to get started?

#

Was going to use nltk for python and then go from there

serene scaffold
#

So you might want to look into multi class sentiment analysis and see if that, well, exists

patent loom
#

Well I'll just make the adjustment and base it on positive, neutral and negative per sentence

serene scaffold
#

Make the adjustment?

patent loom
#

What about Bert?

#

I meant something else for the adjustment part

serene scaffold
#

One could probably involve bert in a sentiment analysis pipeline, yes

#

So you want a model that predicts a tuple of three floats (positive, negative, neutral) for each input?

patent loom
#

No I'm trying to do something like VaderSentiment

#

Ignore me right now. I'm not making sense and I'm stressed. I'll be back once I get my shit together

grave frost
#

VADER is pretty old-tech. Pre-trained models are all the rage now

patent loom
#

Any suggestions?

#

@grave frost

grave frost
patent loom
#

A movie script the movie script database

grave frost
#

imdb?

patent loom
grave frost
#

you can use simple RNN's if you are new

patent loom
grave frost
#

rather than jumping on pre-trained models

patent loom
#

Here goes an example of a script

#

Could you send a link to the RNN documentation

grave frost
#

RNN is a type of model arch

#

I recommend you learn the ML basics first before diving in

flint mason
#

Can we store bucket iterator type dataset in pytorch

grave frost
flint mason
patent loom
#

It's a love hate relationship with coding man 🥲

grave frost
grave frost
flint mason
#

no, how can we store BucketIterator type datasets from the torchtext library and load them later, to avoid the download time on every run

grave frost
flint mason
exotic maple
#

does TF have a train/test splitter or do you use sklearns train_test_split?

serene scaffold
exotic maple
lunar zenith
#

Anyone knows matplot?

#

How do I get the recovered out of there

#
```
df_pie = df.loc[(df['Recovered'] >= 80000)]
df_pie = df_pie.groupby('WHO Region')['Recovered'].mean()

df_pie
df_pie.plot(kind = 'pie', radius = 2)
```
near cosmos
lunar zenith
lunar zenith
#

not a regular plot

#

Is there a way to maximize, or remove the "Recovered" in the chart

near cosmos
near cosmos
dapper hatch
#

hi someone built a web app and run two jupyter notebooks ?

severe cloud
#

i am building a face detection app on opencv but i am stumbling on the issue that its only drawing one eye and not all the faces with both eyes

#

can anyone point me in the right direction to drawing the eyes and faces of everyone in the picture?

lavish tundra
#

someone here understand about line graph animation using matplotlib?

zealous tulip
#

I am a beginner python user, any libraries or software to learn to get into visual recognition and machine learning

exotic maple
zealous tulip
#

I see @exotic maple

exotic maple
#

but if you really want names -> Numpy, Pandas, Matplotlib / seaborn / plotly, sklearn, Tensorflow, opencv...etc

zealous tulip
#

Thank you!

whole hamlet
#

People, my LightGBM classifier is working at same speed when being run on GPU even after installing the Lightgbm for GPU

near cosmos
near cosmos
# lunar zenith what exactly do you mean by ax = ax

ax is an argument to DataFrame.plot that you can give to force it to plot to an existing axis. So,

fig, ax = plt.subplots()   # create a figure and axis object
df_pie.plot(kind="pie", legend=False, ax=ax) # plot the pie chart to axis 'ax'

# make further adjustments
ax.set_ylabel("")
fig.tight_layout()

This is particularly useful for figures with multiple subplots, obviously. But its a common trick to do the plt.subplots call for pretty much every plot, even for single plots, because it is the most convenient and consistent way to have the fig and ax objects available. The matplotlib docs themselves say so https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html#a-figure-with-just-one-subplot

Although for this case it looks like DataFrame.plot will return an axis to you, so the you can also do, e.g.:

ax = df_pie.plot(kind="pie", legend=False)
ax.set_ylabel("")
ax.figure.tight_layout()

and for the direct question, you can turn off the label in the original call to plot:

df_pie.plot(kind="pie", legend=False, ylabel="")
near cosmos
ripe forge
bold timber
#

Hi, I have a question for y'all: Why, when I want to visualize the test set, am I using 'X_train' again for plotting?

grave breach
#

@bold timber

#

You're visualising the model trained on X_train on X_test data

bold timber
grave breach
#

Don't know, haven't read the full code (and doesn't use matplotlib for visualization)

bold timber
dusky granite
#

hi i need help with tensorflow

#

Why do i get this error?

#

InvalidArgumentError: Unable to parse tensor proto

#

googling it, it seems the cause is that the dataset is greater than 2gb
which i doubt but want to verify
even if i use a takedataset of 10 it still gives the error

dusky granite
#

as it may get lost

ripe forge
ripe forge
ripe forge
bold timber
ripe forge
#

Are there 20 data points in train? Yes. That's what your code did.

bold timber
#

Oh yeah I know it. Is because test_data is 1/3, which means 30/3 = 10. And 10 belongs to test. and 20 value of data belongs to train, right?

ripe forge
#

Yep

#

And see the function name. Train test split. Its job is to split the data into train and test.

bold timber
#

yeah I understand now. thank u.

bold timber
ripe forge
#

That's basically a "seed" for the random number generator required to do a random split

#

Basically, we want to make the program split the points randomly

#

That requires a random number generator. And computers don't "really" do random numbers, but instead they use some kind of pseudo random generator techniques.

#

All those techniques start with a seed. So if you give a fixed seed, you'll always get the same split

#

So long story short, the seed allows you to consistently get the same output from any random operation, such as a random split.
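A small sketch of the seed idea using only the standard library. `split_train_test` is a hypothetical helper written for illustration, not scikit-learn's `train_test_split`, though its `random_state` argument plays the same role as `seed` here.

```python
import random

def split_train_test(rows, test_fraction, seed):
    """Shuffle a copy of `rows` with a seeded RNG, then slice off the test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(30))
train_a, test_a = split_train_test(data, test_fraction=1 / 3, seed=0)
train_b, test_b = split_train_test(data, test_fraction=1 / 3, seed=0)
train_c, test_c = split_train_test(data, test_fraction=1 / 3, seed=1)

print(len(train_a), len(test_a))  # 20 10
print(test_a == test_b)           # same seed, same split: True
print(test_a == test_c)           # different seed, almost certainly a different split
```

So a point that "disappears" from the training plot after changing the seed has just moved into the other part of the split.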

bold timber
#

i lost 63777.77

dusky granite
#

as the seed changed

#

you didn't lose it it probably went to validation

#

(test)

bold timber
dusky granite
#

then there won't be any test data i think

bold timber
dusky granite
#

i think so yes
(i have not used this)

dusky granite
hoary wigeon
#

Hello

dusky granite
#

hello

hoary wigeon
#

i need help

dusky granite
#

i too need help

hoary wigeon
#

what kind of help ?

dusky granite
hoary wigeon
#

no idea sorry

dusky granite
#

what do you need help with?

wooden forge
#

Hello everyone, i'm trying to make an Image Recognition algorithm, after creating a neural network from scratch, my goal is to create one without NN to compare their energy consumption

#

the thing is, after creating that last one, the precision is terrible and i don't know how to improve that, something even funnier : the only number to be correctly recognized is 4 (not all the time tho)

#

May i request your help ?

#

(i'm using the MNIST data base)

dusky granite
#

have you tried training more?

wooden forge
#

well, the technique is simply to calculate the average pixel values

#

my issues are on that one not the neural network, this one is working and i am very happy about it

dusky granite
#

oh ok

#

so what's wrong?

wooden forge
#

well the accuracy

#

that program doesn't have a great accuracy (almost wrong every time except when it's a 4 lol)

dusky granite
#

do you have a big testing dataset?

wooden forge
#

well 42000 images

#

but because it's an average, i could have 3000 or 1M it would be the same

#

i think the problem is in the comparison

dusky granite
#

well 42000 proves that it does not occur due to shortage of data

wooden forge
#

the goal is to compare the image you're studying with the averages from a list containing the average for every labels

#

So those are hand written numbers between 0 and 9
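A minimal sketch of that comparison, assuming "average" means a pixel-wise mean image per label and that closeness is measured with Euclidean distance (a nearest-centroid classifier). The toy 2x2 images stand in for flattened 28x28 MNIST digits, and all function names are illustrative. If each image were instead reduced to a single scalar average, most digits would overlap heavily, which could be one reason for poor accuracy.

```python
import math

def mean_image(images):
    """Pixel-wise average of equally sized flattened images."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]

def fit_centroids(images, labels):
    """Group images by label and compute one mean image per label."""
    grouped = {}
    for image, label in zip(images, labels):
        grouped.setdefault(label, []).append(image)
    return {label: mean_image(group) for label, group in grouped.items()}

def predict(centroids, image):
    """Return the label whose mean image is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], image))

# Toy 2x2 "images" flattened to 4 pixels, standing in for 28x28 MNIST digits.
train_images = [
    [255, 250, 0, 0], [240, 255, 10, 0],   # label 0: bright top row
    [0, 5, 255, 245], [10, 0, 250, 255],   # label 1: bright bottom row
]
train_labels = [0, 0, 1, 1]

centroids = fit_centroids(train_images, train_labels)
print(predict(centroids, [230, 240, 20, 5]))   # 0
print(predict(centroids, [5, 20, 235, 250]))   # 1
```

Picking the minimum distance avoids needing a hand-tuned tolerance: every image gets the label of its nearest mean.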

dusky granite
#

well i don't have a solution for your problem

#

interesting application tho

#

open source?

wooden forge
#

wdym open source ?

dusky granite
#

like is the code open for anyone to use

wooden forge
#

it's a personal project so i assume yes

dusky granite
#

cool

wooden forge
#

i didn't upload it on anything apart here this morning because i was looking for help haha

dusky granite
#

so not open source

#

atleast yet!

wooden forge
#

yeah haha

#

i'm not really into code sharing organisation ? idk how to explain

#

i'm just coding stuff for me

dusky granite
#

i am into it because i often steal code from other people

#

so i like to have my code open aswell

wooden forge
#

lmao

#

wait i think i might know why

#

for the NN i had to normalize the pixels because with some functions it caused overflow

#

but with this i can just use regular pixel values

#

let me try that

#

alright let's see if it works

#

it doesn't but ! i just need to put a tolerance and that should do the thing

#

yeah haha i need to do something else but i feel very close

grave frost
dusky granite
wooden forge
#

yeah i have no idea how to use tensor flow

#

i wrote neural networks from scratch not with this

#

sorry mate

crude fable
grave frost
#

yea, you can wrap PT around it

dusky granite
wooden forge
#

wdym?

dusky granite
#

so that people who view this channel know i have a problem

wooden forge
#

ha okay

grave frost
#

what dataset?

dusky granite
grave frost
crude fable
#

Or just write a generator yourself

dusky granite
grave frost
#

TPU is not supposed to be used by beginners especially if you don't know how to use CUDA GPUs

dusky granite
#

i have the thing working with a gpu

grave frost
#

or debug code for that matter

grave frost
dusky granite
#

i use colab and am no longer getting connected to a gpu instance

#

you know, the more you use the less you get

grave frost
#

you will get one eventually

dusky granite
#

i have been using very much for weeks

grave frost
#

colab is not an unlimited supply of GPUs

wooden forge
#

yeah i was told to use colab, because i need approximately 2 days to train my NN lol

grave frost
#

use CPU when writing code

dusky granite
#

the downtimes have reached 5 days

grave frost
#

I use colab all the time with no problems

wooden forge
dusky granite
grave frost
#

if you are not using CPU 95% of your time, then you are doing something very wrong

wooden forge
#

i have to find how to use my GPU with spyder instead of my CPU

grave frost
#

don't switch to GPU instance, just keep it on CPU

dusky granite
#

i just use gpu when training

grave frost
wooden forge
#

i have an RTX and i want to see if it's better (it's supposed to be, because a GPU is faster than a CPU for such things, why do you think people buy so many to mine bitcoins)

dusky granite
#

yup i realised that later

#

that closing the tab does not terminate the session

dusky granite
grave frost
wooden forge
#

i was just giving an example omg

grave frost
dusky granite
#

is a properly made model for tpu faster than gpu?

wooden forge
crude fable
#

Do you need to modify your GPU code to run on TPUs?

grave frost
dusky granite
#

for fast use yes

#

or it works at cpu speeds

crude fable
#

like, copying the tensors and parameters to the TPU device?

dusky granite
#

there is some modification in code if that is what you mean

grave frost
dusky granite
#

awesome ruler do you use tpu?

crude fable
#

ic

grave frost
crude fable
#

How much faster are TPUs than GPUs?

dusky granite
#

can you help me convert this one example?

grave frost
crude fable
#

cool

grave frost
#

could be more, could be less

grave frost
# crude fable cool

because you have 8 cores in a TPU. think of a TPU like multiple GPUs integrated in a single device. it's a bit more complex than that, but it's a good analogy

#

so each core is like a GPU, and 8 cores = 8 GPUs

crude fable
#

what about memory?

grave frost
#

so your model has to fit in 8gb

crude fable
#

ic, not that large

grave frost
#

in practice, it's quite different tho

crude fable
#

I'm currently training my models on Tesla V100s

grave frost
#

im not a hardware expert, but I am able to use models bigger than 8gb on TPU

crude fable
#

with 32GB/card

grave frost
#

dunno why?

dusky granite
#

can you help me convert this one time?

grave frost
#

most prob smthing to do with the TPU architecture

grave frost
#

look it up on google

dusky granite
#

error solving

#

i tried all i could

grave frost
#

what's the error?

#

the full traceback

dusky granite
#

this is the full thing

arctic wedgeBOT
#

Hey @dusky granite!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

crude fable
# grave frost 8gb per core

with just 8gb per card maybe you'll have to reduce your batch size or it won't fit in. Or split the same batch among different cores which sounds complicated= =
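
A rough sketch of the "split the same batch among different cores" idea: with a `tf.distribute` strategy the dataset is normally batched with the global batch size and each replica gets an equal slice of it. The helper name `per_replica_batch_size` and the numbers are made up for illustration.

```python
def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Return how many samples each core sees per step (hypothetical helper)."""
    if global_batch_size % num_replicas != 0:
        # An uneven split would leave some replicas with a different batch shape.
        raise ValueError("global batch size should be divisible by the replica count")
    return global_batch_size // num_replicas


# Illustrative numbers: a global batch of 128 on an 8-core TPU.
print(per_replica_batch_size(128, 8))
```

So lowering memory use per core means lowering the global batch size, or keeping it and accepting that each core only holds its own slice.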

dusky granite
#

in short InvalidArgumentError: Unable to parse tensor proto

grave breach
dusky granite
#

here is my attempt at using tpu

#
```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

# assumption: the resolver line was not part of the pasted snippet; this is the usual way to get one on colab
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

print("REPLICAS: ", tpu_strategy.num_replicas_in_sync)
# data_augmentation, img_height, img_width and num_classes are defined earlier in the notebook
with tpu_strategy.scope():
  image_learner = Sequential([
    data_augmentation,#we pass all the images through data_augmentation to create multiple variations of them
    layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),#we change the rgb values range from existing 0-255 in int to 0-1 in floats
    #it is easier for the model to work in smaller range of values
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    #the conv layers above extract features, we use relu as the activation, which is a common default
    layers.Dropout(0.2),#we randomly drop 20% of the units during training, this is a method to reduce overfitting
    layers.Flatten(),#we flatten the feature maps into a 1D vector
    layers.Dense(128, activation='relu'),#we add a hidden dense layer
    layers.Dense(num_classes)#we make the output layer, one logit per class
  ])
  image_learner.compile(optimizer='adam',
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=['accuracy'])
```
#

@grave frost you there?

inland zephyr
#

hello all, i want to ask about Siamese networks. is it okay if i have only 1 image per class to train the network? siamese networks work on pairs of images, so if i only have one image for a class, the positive pair would be the same picture twice.

#

also, are there any examples i should read (other than paid courses) to learn how to implement a siamese network?

bold timber
viral hearth
#

I'm not sure. But here's another way to do it

#

Use numpy.polyfit(X_train, y_train, 1) to get the polynomial of degree 1

#

And then you use numpy.polyval(numpy.polyfit(X_train, y_train, 1), X_train) to get the image of X_image

#

X_train

#

Use then plt.plot. This way it should work. Or at least it works for me

#

It is the regression polynomial ax+b
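
A minimal sketch of that polyfit/polyval approach. The arrays `X_train` and `y_train` below are made-up toy data (the real ones are not shown in the chat), chosen so the fitted line is easy to check by eye.

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1 (assumed values, for illustration only).
X_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * X_train + 1.0

# Degree-1 least-squares fit; coefficients come back highest power first: [a, b].
coeffs = np.polyfit(X_train, y_train, 1)
a, b = coeffs

# Evaluate the fitted line a*x + b at the training inputs.
y_fit = np.polyval(coeffs, X_train)

print(a, b)
print(y_fit)
```

Passing `X_train` and `y_fit` to `plt.plot` then draws the regression line, and `plt.scatter(X_train, y_train)` can be added to show the raw points next to it.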

crude fable
inland zephyr
#

I only have a single image, a photo of each person. it is only taken once per person. I use MTCNN to extract the face

#

in this case I cannot use a traditional cnn classifier since it needs a lot of data

#

so my best bet is a Siamese network since it can work with few images

crude fable
#

your goal is to train this MTCNN Net right?

inland zephyr
#

no

#

the MTCNN's job is to extract the face

crude fable
#

then what are u training for

inland zephyr
#

and the siamese will check whose face this is

#

so i have, let's say, thousands of people and each one only has 1 face photo

#

so i need to build a simple nn to determine whose face this is

#

and based on my case and what i have read, siamese is my best bet for this task

#

but i'm aware that if i only have 1 face per person, which gives me the anchor and the negatives, i don't have a face for the positive

#

is preprocessing the image again to make more variations of the face a good idea?

crude fable
#

yes, basically what you want is to generalize to different situations (like expressions etc) based on only one pic

#

My advice is to use data augmentation to enlarge your class size first

inland zephyr
#

for one image, how many copies do i need for siamese training?

crude fable
#

I think Siamese Networks are mainly for matching problems and do not apply to your situation

inland zephyr
#

the practical minimum, i mean, since i will use a pretrained model to speed up development

crude fable
#

At least, you've got to have many pics for the same class so that the model can generalize

inland zephyr
#

anyway... if siamese networks are used for matching tasks, i think that is similar to my goal.

crude fable
#

The proper situation would be you have many pics under many classes

#

and you want to map pics of the same class to closer representations

#

while increasing the distance between pics of different classes
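
One common way to express "pull same-class pairs together, push different-class pairs apart" is a contrastive loss over pairs of embeddings. This is only a NumPy sketch of that idea, not what anyone in the chat is running; the function name, the `margin` value and the toy embeddings are assumptions.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim)
    same_class:   array of shape (batch,), 1 for a positive pair, 0 for a negative pair
    """
    # Euclidean distance between the two embeddings of each pair.
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    # Positive pairs are penalised for being far apart.
    positive_term = same_class * d ** 2
    # Negative pairs are penalised only while they are closer than the margin.
    negative_term = (1 - same_class) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(positive_term + negative_term))


# Toy pair: two points at distance 5 (made-up values).
a = np.array([[0.0, 0.0]])
b = np.array([[3.0, 4.0]])
print(contrastive_loss(a, b, np.array([1])))  # treated as a positive pair
print(contrastive_loss(a, b, np.array([0])))  # treated as a negative pair
```

With only one photo per person, the positive pairs would have to come from augmented copies of that photo, which is where the data augmentation advice above fits in.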

inland zephyr
#

sorry, i need to clarify what i mean by "whose face is this": when the same person's face is captured in the future and it matches one of the faces in my database (one face per person), it will tell me that the person is similar to this person in the db

#

just like attendance tracking or fraudster recognition

#

so based on my case, that's why i chose Siamese for my network
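
The lookup described here (one stored face per person, match a new face against the database) can be sketched as a nearest-neighbour search over embeddings. Everything below is hypothetical: the `identify` helper, the `threshold` value and the 2-d vectors stand in for whatever a trained or pretrained face model would produce.

```python
import numpy as np


def identify(query, db_embeddings, db_names, threshold=0.8):
    """Return (name, distance) of the closest stored face, or (None, distance) if nothing is close enough."""
    # Distance from the query embedding to every stored embedding.
    dists = np.linalg.norm(db_embeddings - query, axis=1)
    best = int(np.argmin(dists))
    if dists[best] > threshold:
        # Closest match is still too far away: treat the face as unknown.
        return None, float(dists[best])
    return db_names[best], float(dists[best])


# Made-up 2-d "embeddings", one per person in the database.
db_embeddings = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
db_names = ["person_a", "person_b", "person_c"]

print(identify(np.array([0.9, 1.1]), db_embeddings, db_names))
print(identify(np.array([10.0, 10.0]), db_embeddings, db_names))
```

The threshold is what separates "same person" from "not in the db", so it would need to be tuned on real pairs rather than guessed.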

dusky granite
#

i think i have shortened my problem

#

i need help figuring out this error

#
   1086           self._maybe_load_initial_epoch_from_ckpt(initial_epoch))
   1087       logs = None
-> 1088       for epoch, iterator in data_handler.enumerate_epochs():
   1089         self.reset_metrics()
   1090         callbacks.on_epoch_begin(epoch)
serene scaffold
dusky granite
#

sure wait

dusky granite
#

i think this is what is the main problem currently

serene scaffold
#

@dusky granite look at history = image_learner.fit(train_ds,validation_data=val_ds,epochs=10,steps_per_epoch=128) and make sure that the items you passed to it are all the right type.

dusky granite
#

i don't know what steps_per_epoch is
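
For context: in Keras `fit`, `steps_per_epoch` is the number of batches pulled from the dataset before an epoch counts as finished. A small sketch of how it is usually derived, assuming the sample count and batch size are known (the helper name and the numbers are illustrative, not taken from the notebook above):

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# Illustrative numbers: 1000 training images in batches of 32.
print(steps_per_epoch(1000, 32))
```

If the dataset yields fewer batches than `steps_per_epoch * epochs` and is not repeated, Keras runs out of data partway through, so it is worth checking that value against the real dataset size.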