#data-science-and-ml


delicate sphinx
#

The weights for my tensorflow model are 57MB haha

stone marlin
#

Wait, okay, so you're just putting in the tokenized words (as int indexes) into a NN?

delicate sphinx
#

yeah through an embedding layer

#

Yeah the tokenizer has a vocabulary I can index to get words with a given int

#

e.g. tokenizer.sequences_to_text([[581, 20, 14, 3414]]) = ["what are you doing"]
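A tokenizer is essentially a word-to-index dictionary plus its inverse. The sketch below rebuilds that idea in plain Python on a made-up three-sentence corpus. `build_vocab`, `text_to_sequence` and `sequence_to_text` are hypothetical helpers, not the Keras API. As with the Keras `Tokenizer`, index 0 is kept free for padding.

```python
from collections import Counter

def build_vocab(sentences):
    """Map each word to an integer index, most frequent words first.

    Index 0 is reserved for padding, so real words start at 1.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Sort by descending frequency, then alphabetically for a stable order
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    word_index = {word: i for i, word in enumerate(ordered, start=1)}
    index_word = {i: word for word, i in word_index.items()}
    return word_index, index_word

def text_to_sequence(text, word_index):
    # Unknown words are skipped here; a real tokenizer might map them to an OOV token
    return [word_index[w] for w in text.lower().split() if w in word_index]

def sequence_to_text(seq, index_word):
    # Padding (0) and unknown indices are dropped when converting back
    return " ".join(index_word[i] for i in seq if i in index_word)

sentences = ["what are you doing", "nothing", "what is that"]
word_index, index_word = build_vocab(sentences)
seq = text_to_sequence("what are you doing", word_index)
print(seq)
print(sequence_to_text(seq + [0, 0], index_word))
```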

stone marlin
#

Right, okay, so far so good. IIRC, Keras requires integer stuff for this kind of thing. I think this is what Word2Vec does?

delicate sphinx
#

I only use every word in my project which is around 28,000 unique ones

#

it keeps count of the unique words

#

though there is another part of it that counts occurrences of the words I think

delicate sphinx
stone marlin
#

Okay, so, you've got the embedding layer. What are you hoping to get out of your NN at the end? You end up with a 16-element thing, what is that supposed to be?

delicate sphinx
#

an answer

#

i.e.

"what are you doing" --> model --> "nothing"

#

hopefully between lengths of 1 and 16 as my longest answer in my dataset is 12 (and I want to keep it in powers of 2)

stone marlin
#

Okay, so you're using like SGD to train the NN? This is where my knowledge of NLP / NNs gets a bit fuzzy, apologies.

delicate sphinx
#

CCE and Adam

#

Optimizer and loss

stone marlin
#

Okay, same dealio. Okay.

delicate sphinx
#

CCE loss

#

myb I always say them the wrong way around haha

stone marlin
#

Okay, so now I'm gonna be totally in the dark. Does your output, right now (on a test set) correspond to up to 16 words (index of words) which you've tokenized?

#

If you get back like, [1, 4, 6, 7, ...] is that going to be those words?

delicate sphinx
#

Yeah I just realised the way I'm doing it is definitely wrong

#

as my answer is tokenized too

#

which means as long as it outputs float I'll never get good enough answers

stone marlin
#

Right, the softmax is gonna be wacky.

delicate sphinx
#

because the output expects float and I have int

#

so I need to either:

  1. convert the stuff to int just before it's output or

  2. find a way to convert from float to int and do the inverse of that to my answers so the model can actually learn it

#

either way my answer format is wrong :/

stone marlin
#

Yeah, it feels like it's giving you a "fuzzy" answer, but that makes no sense with an integer corpus. Hm.

delicate sphinx
#

yeah it's my processing that converts it to float

stone marlin
#

I'm not knowledgeable enough in this field to be able to help right now, but I'll take a look at it a bit later and see if I can mess around with a toy model.

delicate sphinx
#

tbh with you I combine an image and a question model into one

#

so it's fairly complex so dw too much

#

but if you can make a sort of model that converts integers to floats then that's fair

stone marlin
#

Haha, I'm just going to do the question part, mostly so that I know it better. :']

delicate sphinx
#

but because of my image input I think I have to avoid just processing on ints

stone marlin
#

I haven't done much with NNs since I've never needed them for work, but they seem pretty cool.

delicate sphinx
#

Yeah Tensorflow has lots of documentation and all as well

stone marlin
#

Can you process your image in another NN and pick out relevant features, then pass those in?

delicate sphinx
#

sadly the only project that appealed to me like it could've helped me used custom encoder/decoders which I don't want to steal

stone marlin
#

Object detection or whatnot? Or, possibly, figure out the "topic" of the question from your NN and feed that into an Object Detection NN?

delicate sphinx
#

and that's passed into an image model and my image model and question models combine into my merged model

stone marlin
#

Ahh, okay. Wild.

delicate sphinx
#

yeah, I can find help for each part on its own, but the only help I can find online for my whole project is quite different and doesn't just use TensorFlow, which I'd like to do

#

the way I've managed my model so far, this is the output

#

I think softmax is only really useful for one-hot encoding because it represents a probability of each word being correct
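Softmax turns a vector of raw scores (one per vocabulary word) into probabilities that sum to 1. The predicted word is then the index with the highest probability, recovered with argmax rather than by rounding the floats. A minimal pure-Python sketch with made-up scores for a 3-word vocabulary:

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it doesn't change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # hypothetical scores for a 3-word vocabulary
probs = softmax(logits)
# argmax: the index of the most likely word
best = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], best)
```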

arctic crown
#

@stone marlin which algo can i use for this?

stone marlin
#

There are a number of ways to architect this, but if you're specifically talking about one thing, you can record the times you've done this in a db or something and use linear regression.

#

At least for a proof of concept.

arctic crown
#

but

stone marlin
#

You could, for example, store "time lights went on in the last 30 days" or something, and regress on those.

arctic crown
#

but in linear regression lets say one axis is the time what would be the other axis?

stone marlin
#

x-axis is the day, y-axis is the time you put the light on or whatever.

#

Honestly, for your thing, you could literally just take the mean of the last N days.

arctic crown
stone marlin
#

Your y-axis would be the time. Your x-axis would be the day number.
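Here is a minimal sketch of that idea. It assumes light-on times are stored as minutes since midnight, one per day. It fits y = m·x + c by ordinary least squares, with the day number as x, and then predicts the next day. The data is made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Day numbers (x) and light-on times in minutes since midnight (y); 420 = 07:00
days = [1, 2, 3, 4, 5, 6, 7]
times = [420, 422, 424, 426, 428, 430, 432]
m, c = fit_line(days, times)
next_day = 8
print(m, c, m * next_day + c)
```

With a flat history the slope comes out close to 0, and the prediction collapses to the mean, which is the point made in the chat.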

arctic crown
#

hmm

stone marlin
#

If you want a 1-dimensional linear regression, that's pretty much just the mean. So, you could take the last 7 days, e.g. [7, 7, 7, 8, 7, 7, 6] for time to wake up, and it would take the mean of those and turn the light on then.

arctic crown
#

what do you mean by "mean"?

stone marlin
#

Average.

desert oar
#

there's nothing wrong imo if the index isn't "1 per row". if anything, it's good practice to try to use meaningful indices whenever you can, instead of just integer row numbers

delicate sphinx
#

mean as in: mean of [5,5,6,9,4] = (5+5+6+9+4) / 5 = 29/5 = 5.8 (mean = sum_of_list / len_of_list)

#

tbh I would've said just do the mean of it too, but I'm really no genius haha, and obviously you can use AI to prevent outliers contaminating it

#

but for the most part mean should work

arctic crown
#

ah

stone marlin
#

Yeah, mean is fine, median is prob better tbh.

arctic crown
#

but then i dont even need to use ml

stone marlin
#

Median's robust to outliers, so that'll be better.
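Python's standard `statistics` module shows why the median is the safer choice. The sketch uses made-up light-on times in minutes since midnight, where one day is an outlier (an 11:00 lie-in):

```python
import statistics

# Light-on times in minutes since midnight; the last value is a lie-in (11:00)
times = [420, 420, 425, 415, 420, 420, 660]
print(statistics.mean(times))    # pulled upward by the single outlier
print(statistics.median(times))  # unaffected by it
```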

#

Yes. You don't.

#

Haha, not every problem needs ML to solve!

arctic crown
#

yea lol

delicate sphinx
#

If it's some sort of "ML is required" project you've been given, linear regression is probably your best bet

arctic crown
delicate sphinx
#

But you might be trying to use a workshop of tools to hammer a nail in place here haha

stone marlin
#

Yeah, you can translate that to military time or whatever.

delicate sphinx
#

yeah 5:30 = 0530 in military time

#

much easier to manage

#

(or just do 5 * 60 + 30)

#

as long as you're consistent
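The minutes-since-midnight option keeps averages meaningful, which the 0530-style number does not. Averaging the plain numbers 550 and 610 (05:50 and 06:10) gives 580, which is not a valid time. Averaging 350 and 370 minutes gives 360, i.e. 06:00. A minimal sketch of the conversion in both directions (the helper names are hypothetical):

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight, e.g. '05:30' -> 330."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def from_minutes(total):
    """Convert minutes since midnight back to a zero-padded 'HH:MM' string."""
    total = round(total)  # averages can be fractional, so round to the nearest minute
    return f"{total // 60:02d}:{total % 60:02d}"

times = ["05:30", "05:45", "06:00"]
minutes = [to_minutes(t) for t in times]
average = sum(minutes) / len(minutes)
print(minutes, from_minutes(average))
```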

#

If you want to bring in machine learning, that will probably have to be a more complex project, e.g. learning whether or not a value is an outlier

#

e.g. if on the weekends they turn on the lights at 11:00 because they slept in longer than usual

#

then that shouldn't change the times for the rest of the week
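A simple non-ML way to get that behaviour, assuming each record carries its date: keep separate histories for weekdays and weekends and take the median of each. That way a weekend lie-in never affects the weekday prediction. The dates and times below are made up, and `predicted_time` is a hypothetical helper.

```python
import statistics
from datetime import date

# (date, light-on time in minutes since midnight); made-up example records
records = [
    (date(2021, 12, 13), 420),  # Monday
    (date(2021, 12, 14), 425),
    (date(2021, 12, 15), 415),
    (date(2021, 12, 16), 420),
    (date(2021, 12, 17), 430),  # Friday
    (date(2021, 12, 18), 660),  # Saturday lie-in
    (date(2021, 12, 19), 650),  # Sunday
]

def predicted_time(records, day):
    """Median light-on time over past days of the same kind (weekday vs weekend)."""
    is_weekend = day.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
    same_kind = [t for d, t in records if (d.weekday() >= 5) == is_weekend]
    return statistics.median(same_kind)

print(predicted_time(records, date(2021, 12, 20)))  # a Monday
print(predicted_time(records, date(2021, 12, 25)))  # a Saturday
```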

arctic crown
#

you know when some people say their ai can improve over time

#

what do they exactly mean?

#

like it improves in what?

#

@delicate sphinx

#

@stone marlin

#

sorry for the pings

delicate sphinx
#

It depends on what they look to improve

#

And it depends on if they let it continue to learn

arctic crown
#

what does it learn tho?

delicate sphinx
#

It learns based on the information you give it

loud cave
#

Usually they are referring to 'online' learning, which is a method that can continually update itself

delicate sphinx
#

I've not heard of online learning for AI but I'm familiar with the idea behind it

arctic crown
#

im still a bit confused on what it learns

loud cave
#

You might already be familiar with methods that partition a large data set into train, test, etc. So that type of model learns once on the training data and that's it. If you have newer data that you want it to learn from, you have to make an entirely new model

#

I guess I'm jumping into this conversation without context. I assumed we're using 'AI' and 'ML' interchangeably, but from Tentenmen's answer it sounds like we're distinguishing them

delicate sphinx
#

I'm not as well versed as you with all the lingo and jargon so you're probably better

#

at guessing the topic with your assumption

loud cave
#

I have no idea 😛

delicate sphinx
#

peace i forgot

#

how well versed are u with TF

loud cave
#

I have made some models but probably still a beginner in the big picture of things

arctic crown
#

same but im learning sklearn

delicate sphinx
#

have you ever used things like Embedding layer, Tokenizer or TextVectorization?

#

I've been stuck for like 2 weeks on how to actually get valid answers from a model

#

at first I just had <unk> <unk> <unk>, ....

#

then I had the same array over and over

arctic crown
#

also my ai is a personal assistant and if i want it to improve what can i make it improve in?

delicate sphinx
#

and now I'm just getting floats but have no idea how to translate that into words

loud cave
#

@arctic crown An example of online vs offline learning would be the linear regression model someone mentioned to you above. If you trained AKA fit that type of model, it would be 'learning' values for a coefficient that minimizes the squared error between the line and all of the ground truth values AKA labels AKA 'y'. If you had an algorithm that fit the line given a dataset and couldn't update the coefficient after, that would be 'offline' learning. If you had an algorithm that could update the coefficient after each new example, that would be 'online' learning, so it could improve over time. I think SKLearn has some online learning algorithms in it
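Here is a minimal sketch of the online idea from the message above. A one-feature linear model y ≈ w·x + b has its coefficients nudged by one stochastic-gradient step per new example. It keeps improving as data arrives instead of being refit from scratch. The learning rate, step count and data are arbitrary choices for illustration. In scikit-learn, estimators such as `SGDRegressor` expose a `partial_fit` method for this kind of incremental training.

```python
import random

class OnlineLinearRegression:
    """y ≈ w*x + b, updated one example at a time with stochastic gradient descent."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient of the squared error 0.5 * (pred - y)**2 is error*x for w, error for b
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

random.seed(0)
model = OnlineLinearRegression(learning_rate=0.05)
# Examples arrive as a stream drawn from y = 3x + 1 (noise-free for clarity)
for _ in range(2000):
    x = random.uniform(-1, 1)
    model.update(x, 3 * x + 1)
print(round(model.w, 3), round(model.b, 3))
```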

#

@delicate sphinx I actually have used those at a previous job. Or at least worked on a project where someone else used them and I inherited the model they made. When you say you don't know how to translate that into words, what do you mean?

delicate sphinx
#

So I can get float outputs, but I've no idea how to get that back into integer or string form

#

My current output is basically a softmax probability which should probably be used more for one-hot encoding

#

but apparently softmax could also work with other methods

#

if I one-hot encode I'm gonna destroy my processing time

#

(each answer / prediction would take 447,999 0's and one "1" value if I use one-hot encoding for most answers)

loud cave
#

What are you training the model to do? I guess the input is some words/sentences, but what is the ground truth/label that you are training it against?

delicate sphinx
#

Preprocessed image features + Question (LSTM) model --> a merged model that should output an answer

loud cave
#

Oh, I'm scrolling back up

#

and see some of the detail now

delicate sphinx
#

I also made a help post in #help-cookie but I'm not sure if it's any use really

#

I've looked at TextVectorization, Embedding, Tokenizer but I can't understand any of them :/

loud cave
#

Is the answer supposed to be a single word?

delicate sphinx
#

every time I've used them I've ended up with float values I don't know how to translate back

#

the longest answer my dataset uses is 12 words long

#

and to keep my model easy to use, I try to make everything a power of 2

#

so the output length is 16 (16 words maximum)

#

my inputs:

loud cave
#

so input1 is the RGB/greyscale values of the image, input2 is words of the question and output is words of the answer?

delicate sphinx
#

input1 is: (36,2048,3)

#

where 3 is the channels (RGB)

#

so yeah

#

question input2 is (32)

#

longest question is 24 so I pad it to 32 (with masking in place for the padding)
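In the padding step, each tokenized question is right-padded with 0 up to the fixed length of 32. A boolean mask then marks which positions hold real tokens. This is the same information a Keras `Embedding(mask_zero=True)` layer derives from the zeros. A minimal NumPy sketch with made-up token IDs; `pad_sequences_right` is a hypothetical helper, not the Keras function.

```python
import numpy as np

def pad_sequences_right(sequences, maxlen, pad_value=0):
    """Right-pad (or truncate) integer sequences to a fixed length."""
    out = np.full((len(sequences), maxlen), pad_value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = seq[:maxlen]  # truncate anything longer than maxlen
        out[row, :len(seq)] = seq
    return out

questions = [[581, 20, 14, 3414], [7, 9]]    # made-up token IDs
padded = pad_sequences_right(questions, maxlen=32)
mask = padded != 0                          # True where there is a real token
print(padded.shape, mask.sum(axis=1))
```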

iron basalt
#

Real life data that a robot receives in real time for example, is almost always not i.i.d.

loud cave
#

so there is some vocabulary of answer words defined?

iron basalt
#

(Though it's not binary, it can be more or less)

delicate sphinx
#

Which includes all questions and answers in my training, validation and test datasets

#

(28,000 unique words)

#

Peace is it cool if I DM? It's 00:19am here and I need to walk my dog before I code until 4am and walk him x-x

#

if not that's perfectly fine and don't worry

loud cave
#

So is the output supposed to be a vector of something like 16 X 28001? A probability for each word in each index of the answer, plus one more word for 'none'?
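A minimal NumPy sketch of decoding that output shape. It assumes the model ends in a per-position softmax of shape (16, vocab_size + 1), with index 0 meaning "no word". Take the argmax of each row to get integer indices, drop the 0s, and look the rest up in the index-to-word vocabulary. The tiny vocabulary and the hand-made probability array are placeholders. With a real model they would come from the tokenizer and `model.predict`.

```python
import numpy as np

index_word = {1: "nothing", 2: "a", 3: "dog", 4: "red"}   # made-up vocabulary; 0 = "no word"
max_len, vocab_size = 16, len(index_word) + 1

# Stand-in for one model prediction: a probability distribution per output position
probs = np.full((max_len, vocab_size), 0.1 / (vocab_size - 1))
probs[:, 0] = 0.9                            # every position mostly predicts "no word"...
probs[0] = [0.05, 0.8, 0.05, 0.05, 0.05]     # ...except the first, which predicts "nothing"

indices = probs.argmax(axis=1)               # float probabilities -> integer word indices
words = [index_word[i] for i in indices.tolist() if i != 0]  # drop the "no word" slots
print(indices.tolist(), " ".join(words))
```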

delicate sphinx
#

well, I was thinking of one hot encoding it

#

but that requires 16 * 28,000 values in lists

#

which is for most cases 447,999 0's

#

with one "1" value

#

and that is just such a waste imo, so I was hoping to use some sort of TextVectorization as apparently that's more of a dynamic approach that only uses as much as is needed

#

that's why I took the Tokenizer approach to begin with
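The usual way to avoid that one-hot memory cost is to keep the targets as integer word indices and use a sparse categorical cross-entropy loss. Keras provides `SparseCategoricalCrossentropy` for this. Mathematically it is the same loss as categorical cross-entropy on one-hot targets. It just reads the probability at the target index instead of multiplying by a row of zeros. A NumPy sketch checking that equivalence on made-up numbers:

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],      # predicted distributions for 2 positions
                  [0.1, 0.3, 0.6]])
targets = np.array([0, 2])              # integer word indices (no one-hot needed)

# Categorical cross-entropy with one-hot targets: build the 0/1 matrix first
one_hot = np.eye(probs.shape[1])[targets]
cce = -(one_hot * np.log(probs)).sum(axis=1).mean()

# Sparse version: index the probability of the correct word directly
sparse_cce = -np.log(probs[np.arange(len(targets)), targets]).mean()

print(cce, sparse_cce)
```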

loud cave
#

It's almost bedtime for me, you can send a message if you like but I may be asleep by the time you return. But I assume this isn't necessarily something you must finish in the next couple hours so if you haven't resolved it by the time I see your message I can still try to help

delicate sphinx
#

Yeah I mean I've been up till 4am every day this week trying to code this fix

#

I need to finish my model soon. If not by the end of this week I'll probably have to leave it where it's at

#

I've asked for help on this every day for the past week, but understandably I'm not catching the attention of many who understand TF, and of course TF is hard in itself, so I haven't been able to fix it

loud cave
#

I guess the only suggestion I can give is to try to make a simple dummy model using the embedding/vectorizer with a very limited vocabulary/answer size for the sake of easy debugging/understanding, and once you have that expand it into the real size

delicate sphinx
#

Yeah I guess I could :/ i sort of got tunnel vision with it all

loud cave
#

I guess 28,000 isn't that big a vocabulary really. But you could restrict it to like 5 words

#

It's not open source is it? If you put it on github or something I would try running it myself

delicate sphinx
#

I can give u a drive link to it but it's not public yet

loud cave
#

Does the output answer have to be grammatical? or is like just a list of tags?

delicate sphinx
#

When I finish it I plan to put it on my website and make like 20 questions on stack overflow for every issue I had to ask for help for

#

So that I can answer them myself

#

The output can be grammatical but for the most part is one worded answers

arctic crown
arctic crown
delicate sphinx
#

so y = mx + c

#

m would be a coefficient

#

or: 10 = 5y + 2

#

5 would be a coefficient

desert oar
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @balmy bolt until <t:1640049534:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

arctic crown
#

if i want my personal assistant to improve automatically what can i make it improve in?

iron basalt
#

ML people call them weights because they see it as a weighted sum.

#

In decision theory, the weighted sum model (WSM), also called weighted linear combination (WLC) or simple additive weighting (SAW), is the best known and simplest multi-criteria decision analysis (MCDA) / multi-criteria decision making method for evaluating a number of alternatives in terms of a number of decision criteria.
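The weighted sum model quoted above reduces to a simple formula. Each alternative's score is the sum of its criterion values multiplied by the criterion weights, and the highest score wins. This is the same sum of w·x products a neuron computes before its activation. A minimal sketch with made-up alternatives, criteria and weights:

```python
# Made-up example: three laptops scored on (performance, battery, price score), each 0-10
alternatives = {
    "A": [8, 5, 6],
    "B": [6, 9, 7],
    "C": [9, 4, 3],
}
weights = [0.5, 0.3, 0.2]   # importance of each criterion; they sum to 1

def weighted_sum(values, weights):
    """Score = sum of weight * value over all criteria."""
    return sum(w * v for w, v in zip(weights, values))

scores = {name: weighted_sum(vals, weights) for name, vals in alternatives.items()}
best = max(scores, key=scores.get)
print(scores, best)
```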

desert oar
iron basalt
desert oar
#

yeah i had no idea it was such a general term

#

interesting

#

makes sense given the Latin etymology

iron basalt
#

There is not really any good word to come up with anyhow for something abstract like that, but better than no word.

#

Weight also does not really make sense.

lapis sequoia
#

dont trust people online

elder thunder
#

What?

lapis sequoia
#

you said every information is not correct

#

so i told people not to trust them

elder thunder
#

No I didn't

lapis sequoia
#

yes u did

#

"Misinformation cannot be corrected"

#

u said that

elder thunder
#

I told you to not DM people because misinformation cannot be corrected

lapis sequoia
#

who?

#

cares

#

so u go fuck off and leave me the fuck alone

elder thunder
elder thunder
#

He was dealt with

sour quarry
#

Have any of you had any experience with Google Colab or similar tools? I'm trying to train some models but I'm a beginner. I'm looking for something that's relatively easy to use but will work well enough for what I'm trying to do. Do any of you have any suggestions? Thank you. (:

lone drum
#

Hello
I am trying to select rows for specific date using df.loc method
My date column is object type
I am not able to get specific rows data from dataframe
Ping me when replying
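A common cause of this is an object-dtype column holding strings, so a comparison with a date never matches. A minimal pandas sketch, assuming the column is called `date` and holds ISO-formatted strings (both assumptions): convert it with `pd.to_datetime`, then filter with a boolean mask inside `.loc`.

```python
import pandas as pd

# Made-up data: the 'date' column starts out as strings (dtype object)
df = pd.DataFrame({
    "date": ["2021-12-20", "2021-12-21", "2021-12-21", "2021-12-22"],
    "value": [10, 20, 30, 40],
})

df["date"] = pd.to_datetime(df["date"])                  # object -> datetime64
rows = df.loc[df["date"] == pd.Timestamp("2021-12-21")]  # exact timestamp match
# Alternatively, compare calendar dates only (ignores any time-of-day component)
rows_by_day = df.loc[df["date"].dt.date == pd.Timestamp("2021-12-21").date()]
print(rows)
```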

stray pewter
#

Hello, I was assigned a task "Route Planning and ETA estimation in Urban Traffic Network Using Artificial Intelligence".
No dataset was given. I was planning to use data from OpenStreetMap.org. My question is: how do you approach these types of problems?

teal mortar
rigid zodiac
#

!paste

hybrid mica
#

does anyone know of a dataset for a positive/negative sentiment analysis?

serene scaffold
#

I didn't check to see if the reviews are classified strictly as positive or negative, but if it's on a scale of some kind, you can discretize it.

#

(for example, you could say data['is_positive'] = data['sentiment'] >= 2.5)
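Here is that one-liner expanded into a runnable sketch: a boolean column is created by comparing the score against a threshold. The 0-5 scale, the 2.5 cut-off and the column names are assumptions carried over from the example. The reviews themselves are made up.

```python
import pandas as pd

data = pd.DataFrame({
    "review": ["great", "awful", "okay-ish", "loved it"],
    "sentiment": [4.5, 1.0, 2.5, 5.0],   # made-up scores on an assumed 0-5 scale
})

# Scores at or above the threshold count as positive
data["is_positive"] = data["sentiment"] >= 2.5
print(data)
```

Note that with `>=`, a score exactly on the threshold counts as positive. Dropping neutral reviews is an option if a cleaner split is needed.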

#

I dunno, try it

#

(rather, I don't remember because I'm on my work computer. if it requires an account then I must have one on my home computer)

coral sage
#

I have a Pandas Series with each value as a Series of authors and the number of messages they sent in a given day, how do I convert it into a DataFrame?

#

Excuse the terrible drawing but that's basically what I wanna do on a bigger scale

serene scaffold
#

can you do print(series.to_dict()) and paste the text into the chat?

coral sage
#

The index is a bunch of dates

serene scaffold
coral sage
serene scaffold
#

Please ping me if you come back; otherwise I'm going to do something else.

#

!docs pandas.DataFrame.unstack

arctic wedgeBOT
#

DataFrame.unstack(level=-1, fill_value=None)
Pivot a level of the (necessarily hierarchical) index labels.

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).
serene scaffold
#

The solution will probably involve that. Good luck!

lapis sequoia
#

Hi all

#

Has anyone built a tool for forecasting in Python?

serene scaffold
lapis sequoia
#

Yes, but I need to build an exe

serene scaffold
#

why does it need to be an exe?

lapis sequoia
#

Requirement

serene scaffold
#

well, once you have something working, you can ask how to make it an exe in #tools-and-devops

lapis sequoia
#

Does tslearn give me some dashboard sort of things

serene scaffold
#

if you want to make something with a UI, you can ask about that in #user-interfaces. AI libraries are about the actual AI component, and then there are other libraries for making interfaces.

coral sage
# serene scaffold Please ping me if you come back; otherwise I'm going to do something else.
{(datetime.date(2021, 6, 20), 'Name1'): 398,
 (datetime.date(2021, 6, 20), 'Name2'): 3,
 (datetime.date(2021, 6, 20), 'Name3'): 180,
 (datetime.date(2021, 6, 20), 'Name4'): 99,
 (datetime.date(2021, 6, 20), 'Name5'): 120,
 (datetime.date(2021, 6, 20), 'Name6'): 1,
 (datetime.date(2021, 6, 20), 'Name7'): 1347,
 (datetime.date(2021, 6, 20), 'Name8'): 893,
 (datetime.date(2021, 6, 20), 'Name9'): 207,
 ...

Sorry, I got preoccupied by something IRL. This is what I get when I use the .to_dict() method on the series.

serene scaffold
#

to_frame turns the Series into a DataFrame with two levels of indexing. The first level is the date and the second level is the name

#

Unstacking the second level (which is 1, because the numbering starts at 0) achieves the desired result.

#

actually it looks like you can just do series.unstack(level=1) and it's converted to a DataFrame as part of that
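Here is a runnable sketch of that fix on a few made-up rows shaped like the `to_dict()` output above. The `fill_value=0` argument is an optional extra that fills in authors with no messages on a given day.

```python
import datetime
import pandas as pd

# Rebuild a small version of the Series shown above: (date, author) -> message count
counts = pd.Series({
    (datetime.date(2021, 6, 20), "Name1"): 398,
    (datetime.date(2021, 6, 20), "Name2"): 3,
    (datetime.date(2021, 6, 21), "Name1"): 120,
})

# Move the second index level (authors) into columns: one row per date, one column per author
table = counts.unstack(level=1, fill_value=0)
print(table)
```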

#

Yay!

coral sage
#

ahh yeah

#

Thank you very much :D, both worked

serene scaffold
#

also remember to have a copy/pastable example ready whenever you have a pandas question

#

🐼 lemon_warpaint

coral sage
#

ah thanks for the tip, I'll remember that for next time :D

serene scaffold
coral sage
rigid zodiac
delicate sphinx
rigid zodiac
delicate sphinx
#

So your model is probably outputting shape 10 my bad

#

I'm not sure 10 and 2 are good values because if it was a power of 2 (e.g. 16) it's much easier to compress to your desired shape

#

(I.e. 16 > 8 > 4 > 2)

#

Instead of 10 > 5

#

(The > is an arrow im just on phone)

rigid zodiac
#

I'm still a bit confused about this. So what does the output shape mean? Is it the number of classes for my outcome?

delicate sphinx
#

So maybe add another layer thats size 2

#

Can you do model.summary()

#

And show the final layer(s)

#

The output shape section will likely say (None, 10)

#

If you add another layer that outputs shape (None, 2) it should work, otherwise maybe try and make a layer that goes into 16 then go to 2 for the output

rigid zodiac
delicate sphinx
#

Can you send pictures of your model code or your model.summary()

#

Just so I know if what I'm saying might fix it or not

delicate sphinx
#

Ok so can you add a Dense(2)

#

As it's sequential you should be able to just do model.add(tf.keras.layers.Dense(2))

#

Just put that right at the bottom

rigid zodiac
#

before the model.compile() and model.fit() right

delicate sphinx
#

Also just to give you a bit of extra info that may help you understand it

#

The (None, x) has None because that's your batch size

#

So that's a variable output size

#

If you're new to tf I just figured it worth saying that :)

rigid zodiac
delicate sphinx
delicate sphinx
#

Sorry I woke up a few minutes ago haha

rigid zodiac
delicate sphinx
#

Yeah sorry I meant 16 and 2

#

You could keep the 10 if you really want but personally I love using powers of 2 for all my layers due to the way tensorflow can squash stuff

#

But add another Dense(2) layer just after the 16

sleek sentinel
#

Hi

#

Is there a module to detect the font from an image?

rigid zodiac
delicate sphinx
#

Model summary?
Might need to change a dense layer from 10 to 16

rigid zodiac
rigid zodiac
delicate sphinx
#

I thought your dense(16) worked it was just the output size issue

delicate sphinx
#

But in that model.summary() id recommend size of 16 into a new layer of 2

rigid zodiac
#

dense_8 thats size 2 is not working. I put it in as you say before

rigid zodiac
delicate sphinx
#

Sorry had to give a lift to someone

#

Yeah tensorflow squishes them by halving and multiplying

#

I.e. a layer of 255 would rarely work because it wouldn't be compatible with other layers fully, I.e. 255 would halve to 127.5 and either go to 127 or 128

#

But if you want another layer of 255 the best that layer could give you is 254 or 256

#

Which is why I try to keep all my outputs to a power of 2

rigid zodiac
delicate sphinx
#

whats the model.summary() now

rigid zodiac
#

hold on

rigid zodiac
delicate sphinx
#

hmm weird

#

a very bad solve, but maybe just try another dense after 16 of 8 then 4 then 2

#

but idk

rigid zodiac
#

let me try with 8

rigid zodiac
# delicate sphinx but idk

I think the reason it won't run is the number of classes. I need to change it to 2 since I only have 2 classes
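This NumPy sketch of the forward pass through a final Dense layer plus softmax shows why the last layer's width must match the number of classes. The weights are random placeholders, not a trained model. The hidden width of 10 is taken from the discussion above, and nothing in this computation requires a power of 2. The leading dimension is the batch, which is why Keras reports it as `None`.

```python
import numpy as np

n_hidden, n_classes = 10, 2       # hidden width from the discussion; 2 output classes
rng = np.random.default_rng(42)

# Placeholder parameters of a Dense(2) layer: one weight column and bias per class
W = rng.normal(size=(n_hidden, n_classes))
b = np.zeros(n_classes)

def dense_softmax(h):
    """Dense layer (h @ W + b) followed by a row-wise, numerically stable softmax."""
    z = h @ W + b
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# The batch dimension can be any size, which is why Keras shows it as None
for batch_size in (1, 5):
    hidden = rng.normal(size=(batch_size, n_hidden))
    out = dense_softmax(hidden)
    print(out.shape)   # (batch_size, 2): one probability per class
```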

delicate sphinx
#

yeah I don't know all that n_classes stuff you mentioned earlier

#

but that's a shout, I just assumed you hadn't made a change

small mist
#

print (hello world )

rigid zodiac
#

it works now, but the accuracy is shit so far. I will need to wait 3 hours to know

delicate sphinx
#

you can always split up the dataset, your batch size was 16,000 or so a minute ago?

delicate sphinx
delicate sphinx
#

I've only learnt about it in uni and not done it myself as it sounds really complex, but I think eigenvectors of face images ("eigenfaces") are used for general face recognition

#

so it would make sense that you can expand on that for emotions (i.e. smiling)

#

I've no clue about the actual method to solve it all, I can't even seem to figure out how to one-hot encode my own work but yeah that sort of stuff is all I know

#

Personal + uneducated opinion, but lower face would be my go to

warm verge
#

I wanna see if I can detect that yeah

delicate sphinx
#

grayscale may be a shout, just because colour would require more data (i.e. 10,000 with and 10,000 without lipstick etc)

#

yeah for sure

warm verge
#

Then I can detect only the face using Haar cascade

delicate sphinx
#

and it can give you some images to send to friends and give them nightmares

warm verge
#

So once I have a grayscale face, it should be the best thing to apply PCA to
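A minimal NumPy sketch of PCA on grayscale face crops (the "eigenfaces" idea). It assumes the faces have already been detected, cropped and resized to the same shape. Flatten each image, subtract the mean face, and take the top right-singular vectors from an SVD as the principal components. Random arrays stand in for real face images, and the 48×48 size and 20 components are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((100, 48, 48))          # placeholder: 100 grayscale 48x48 face crops

X = faces.reshape(len(faces), -1)          # flatten: one row of 2304 pixels per face
mean_face = X.mean(axis=0)
Xc = X - mean_face                          # PCA requires mean-centred data

# Rows of Vt are the principal directions ("eigenfaces"), ordered by singular value
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_components = 20
components = Vt[:n_components]

features = Xc @ components.T                # each face compressed to 20 numbers
explained = (S ** 2) / (S ** 2).sum()       # fraction of variance per component
print(features.shape, explained[:n_components].sum())
```

The 20 numbers per face in `features` are what a downstream classifier would be trained on.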

delicate sphinx
#

forehead being scrunched can be surprised

#

though the issue with that is it could bias the model towards labelling older people as surprised (due to wrinkles)

warm verge
#

Forehead would be good yes, I haven't thought about that

delicate sphinx
#

Yeah you'd have to account for elderly people though

warm verge
#

The proposed system performs better than existing technique for facial emotion recognition when Gradient Filter, PCA and PSO has been considered for feature extraction with random forest classification technique.

delicate sphinx
#

though I'm not 100% on this perhaps if their eyes are not wide open but their forehead is scrunched then they're old

warm verge
delicate sphinx
#

but if eyes are wide open and forehead scrunched then they're surprised

warm verge
#

Although I would talk about its potential inaccuracy for wrinkles

#

Also I keep seeing 'random forest classifier' everywhere but I have no idea what it is

#

Google/Wikipedia/ELI5 not very helpful

delicate sphinx
#

Outside of my already limited knowledge I'm afraid

warm verge
#

Nw, thank you very much for the input! Got some new ideas now 😄

delicate sphinx
#

No worries, if you want to really expand it you can use RGB image to detect for wrinkles so you can better predict if their emotion is biased or not

#

but that's a self-thought idea so the practicality of it god knows

warm verge
#

RGB is definitely something that has big trade-offs so I'll try both ways

delicate sphinx
#

yeah you'd only want to preprocess bias weights with it

#

though you don't need to preprocess it and could just measure the amount of wrinkles and use that as a secondary accuracy score

#

i.e. "due to the wrinkles detected in the image this emotion guess may only be 30% accurate"

#

though in a less-harsh way so people in their 20-40s don't feel like you're calling them old haha

warm verge
#

That's a good point

#

I already have so many ethical considerations, ageism would be a good one to talk about

#

Good as in there's a lot to say, not that ageism is good

delicate sphinx
#

haha im glad

mortal silo
#

for theano to work with g++, do I have to have all 500mb of mingw-w64 files from conda? Aren't there any lightweight options or maybe I can precompile it somehow? I'm asking because I need to be able to run the program in any environment.

warped rapids
#

Does anyone have experience with matplotlib?

#

There's one thing I have to fix

#

And I have been stuck forever

hoary flame
#

Anyone experienced with Mask R-CNN? I am trying to figure out the input format for the system. My dataset does not include a JSON file; instead, each instance's pixels are labelled with a grayscale value.
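A minimal NumPy sketch of the conversion step. It assumes 0 is background and every other distinct gray value marks one instance, then splits the label image into one boolean mask per instance. Many Mask R-CNN implementations stack masks as a [height, width, instance_count] boolean array plus a class ID per instance. For example, `load_mask` in the Matterport Keras version returns this. Check what your implementation expects, and note that class IDs would still need to come from your own mapping. `label_image_to_masks` is a hypothetical helper and the tiny label image is made up.

```python
import numpy as np

def label_image_to_masks(label_img, background=0):
    """Split a grayscale instance-label image into per-instance boolean masks.

    Returns (masks, values): masks has shape [height, width, n_instances] and
    values[i] is the gray value that identified instance i.
    """
    values = np.unique(label_img)
    values = values[values != background]          # drop the background value
    # Compare the image against every instance value at once via broadcasting
    masks = label_img[..., None] == values[None, None, :]
    return masks, values

# Made-up 4x5 label image: two instances with gray values 50 and 120
label_img = np.array([
    [0,  0,   0,   0, 0],
    [0, 50,  50,   0, 0],
    [0, 50, 120, 120, 0],
    [0,  0, 120, 120, 0],
], dtype=np.uint8)

masks, values = label_image_to_masks(label_img)
print(masks.shape, values, masks.sum(axis=(0, 1)))
```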

warm verge
delicate sphinx
#

while also keeping it in here

#

personally I have no knowledge of it 😦

mortal silo
#

I tried. no one seems to know this 😦
Well time to try again

warped rapids
#

@warm verge

warm verge
warped rapids
#

It's Flask

warm verge
#

You asked about matplotlib?

warped rapids
#

Yes

#

But it's integrated in Flask

#

In my example

#

So it all correlates

#

Plots data to webbrowser

#

But do you have any idea?

warm verge
#

It's one of those methods but I can't remember which

#

Lets you plot above

warped rapids
#

No, it's text method

#

I just need texts

warm verge
#

On the axes

warped rapids
#

No

#

I think what you mean are labels

#

I just want text

#

Like that

#

Or like this @warm verge

#

The percentages

warm verge
#

So you can either plot them in line with a datum point, such as 0.05 on that example graph, or use the axes and plot above

warped rapids
#

Yeah but inputs differ

#

plt.text(mostLeftSection, stats.norm.pdf(mostLeftSection, u, o), "test")

#

This is what I have

#

To try and have text in the mostleftsection

#

It works for color

#

mostLeftSection = np.linspace(u - 3 * o, u - 1 * o, 100)

#

plt.fill_between(mostLeftSection, 0, stats.norm.pdf(mostLeftSection, u, o), facecolor='#49393d')
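`plt.text(x, y, s)` places one string at one (x, y) point. Passing the whole `mostLeftSection` array (100 points from `np.linspace`) as the position won't put the label where intended. The sketch below shows one way to label a shaded band with its percentage: use the band's midpoint for x and half the curve height there for y. It reuses the `u`, `o`, `mostLeftSection` names and the fill colour from the messages above, with made-up values for `u` and `o`. The percentage comes from the normal CDF via `scipy.stats.norm.cdf`.

```python
import io

import matplotlib
matplotlib.use("Agg")              # render off-screen (e.g. inside a Flask app)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

u, o = 0.0, 1.0                     # made-up mean and standard deviation

mostLeftSection = np.linspace(u - 3 * o, u - 1 * o, 100)
pdf = stats.norm.pdf(mostLeftSection, u, o)

fig, ax = plt.subplots()
ax.fill_between(mostLeftSection, 0, pdf, facecolor="#49393d")

# Share of the distribution inside this band, from the normal CDF
share = stats.norm.cdf(u - 1 * o, u, o) - stats.norm.cdf(u - 3 * o, u, o)

# ax.text needs a single (x, y) point: use the band's midpoint, halfway up the curve
x_mid = mostLeftSection.mean()
y_mid = stats.norm.pdf(x_mid, u, o) / 2
label = ax.text(x_mid, y_mid, f"{share:.1%}", ha="center", va="center", color="white")

# In a Flask view, the PNG bytes in buf can be returned as the response
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```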

novel acorn
#

Hello everyone, hope you're doing great!! 😄

Is there a way to get the function behind a neural network? In this image below, I want to get b

#

This is the network

warped rapids
#

Does anyone have experience with matplotlib and would like to help? :)

summer anchor
#

Greetings, I'd like to have some advice regarding our application.
We serve an API (Flask) that analyses images after testing them through about 10 models.
Currently the codebase is a bit messy. I am trying to use a pattern that would allow us to plug different models in and out, and also increase reusability of the source code by implementing class-based code where we separate concerns for preprocessing, configuration and finally running the models against the input image. Anyway, I am newer to the ML/DL domain, although I have a fundamental understanding of OOP and design patterns. Any recommendation regarding the "organization" of the project is appreciated, thanks in advance!
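A small registry is one way to get the plug-in/plug-out behaviour described here. Every model is wrapped in a class with the same preprocess/predict interface and registered under a name, and the API loops over whatever is registered. A minimal stdlib sketch follows. `MODEL_REGISTRY`, `register`, `ImageModel` and the two dummy "models" are hypothetical placeholders, not an established library.

```python
from abc import ABC, abstractmethod

MODEL_REGISTRY = {}

def register(name):
    """Class decorator that adds a model wrapper to the registry under `name`."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

class ImageModel(ABC):
    """Common interface: each wrapper owns its config, preprocessing and inference."""

    def __init__(self, config=None):
        self.config = config or {}

    @abstractmethod
    def preprocess(self, image):
        ...

    @abstractmethod
    def predict(self, inputs):
        ...

    def run(self, image):
        return self.predict(self.preprocess(image))

@register("brightness")
class BrightnessModel(ImageModel):
    # Dummy stand-in for a real model: averages normalised pixel values
    def preprocess(self, image):
        return [p / 255 for row in image for p in row]

    def predict(self, inputs):
        return {"mean_brightness": sum(inputs) / len(inputs)}

@register("size")
class SizeModel(ImageModel):
    def preprocess(self, image):
        return image

    def predict(self, inputs):
        return {"height": len(inputs), "width": len(inputs[0])}

def analyse(image, enabled=None):
    """Run every registered (or every enabled) model and collect results by name."""
    names = enabled or MODEL_REGISTRY.keys()
    return {name: MODEL_REGISTRY[name]().run(image) for name in names}

image = [[0, 255], [255, 0]]       # tiny made-up 2x2 grayscale "image"
print(analyse(image))
```

In a Flask app, `analyse` would be called from the route handler. Adding a model then means adding one decorated class, with no change to the route.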

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1640121661:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

odd meteor
# novel acorn Hello everyone, hope you're doing great!! 😄 Is there a way to get the function...

from this image, b is simply your output (the result predicted by your network).

Where:

X1, X2,..., Xn = n nodes/neurons in your input layer
W1, W2, ,..., Wn = The weight of the respective neurons in the input layer

Sigma = Your hidden layer with one neuron (remember each neuron in a hidden layer has its own bias weight, so because we only have 1 neuron in our hidden layer, that's why we have (W0) there)

a = sum of inputs x weights + bias

g(x) = it should be g(a) though. So here, a is then passed into the activation function to give us our yield

b = yield a.k.a output
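The computation described above, a = W0 + sum of Wi·Xi followed by b = g(a), is small enough to write out directly. This minimal sketch of one neuron uses made-up inputs and weights. Sigmoid stands in for g as an assumption, since the image's actual activation isn't shown.

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def neuron(x, w, w0, g=sigmoid):
    """Single neuron: weighted sum of inputs plus bias, passed through activation g."""
    a = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return g(a)

x = [1.0, 2.0, 3.0]        # inputs X1..Xn
w = [0.5, -0.25, 0.1]      # weights W1..Wn
w0 = 0.2                   # bias weight W0
b = neuron(x, w, w0)
print(b)
```

After training, calling predict on the model evaluates this kind of expression with the learned weights.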

odd meteor
# novel acorn This is the network

You've built your neural network architecture here, and have even made your neural nets fertile and ready for training by setting the loss function that your optimizer tries to minimize.

So the next stage is to train your model (neural nets) by calling the fit method on it.

Then to get 'b', call your predict method on the model.

That's how to get the yield. ✌️

novel acorn
#

thank you, is much clearer now!

mortal silo
#

for theano to work with g++, do I have to have all 500mb of mingw-w64 files from conda? Aren't there any lightweight options or maybe I can precompile it somehow? I'm asking because I need to be able to run the program in any environment.

warped rapids
#

Does anyone have experience with matplotlib and would like to help? :)

desert bear
#

Hey, I have developed my own neural network based on gradient descent for back propagation. I'm now trying to test it, and to my surprise I always get similar results when it comes to worsening of the model.
No matter the data shuffling, the performance of my NN always drops after the third epoch:

#

Is this concerning to you?

gentle swift
#

Saw this on Product Hunt today: https://www.producthunt.com/posts/modern-data-stack

All things related to modern data stack for engineering in a single place. Nice resource!


loud cave
#

It is a framework for deploying ML models

rose pasture
#

Hey guys, how do you judge whether a data science bootcamp is good or not?

odd meteor
# rose pasture Hey guys how do you judge if whether a data science bootcamp is good or not?

I think this question is subjective, so here's how I'd gauge the program:

  1. Richness of the curriculum

  2. Duration of the bootcamp

  3. Do they assume everyone is a novice willing to start from the basics, or do they assume we all know what we're doing 😂?
    Pay attention to the stated prerequisites, if there are any.

  4. Where exactly are the people from their previous cohorts now? Are most of them now employed as data scientists, or are they enrolling in yet another data science bootcamp? (This is the point where you really need to put on your Investigative Journalist + FBI cap) 😂

  5. The experience of your instructors. Are they data scientists or ML engineers at reputable companies?

  6. Last but not least... the teaching style. Do they assign mentors to students? What kind of projects will you be working on for your capstone? Etc.

==================
I'm not affiliated with FourthBrain, but I'll always recommend their bootcamp if you can finesse the payment.

https://FourthBrain.ai

delicate sphinx
#

(TENSORFLOW) does anyone know if my combination of TextVectorization + Embedding layer means I can remove the mask_zero=True parameter?

#

as mask_zero = True is stopping me from correctly loading my model from a json

#

but I want to be sure I don't need it before I remove it
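A hedged illustration of what mask_zero=True does: with the default output_mode="int", TextVectorization reserves index 0 for padding, and Embedding(mask_zero=True) builds a mask that is False wherever the id is 0, so mask-aware layers such as LSTM skip those padded timesteps. Without it, the LSTM reads the padding as ordinary tokens (the embedding vector of id 0). The NumPy sketch below mimics that effect with a masked mean; the embedding table and token ids are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 4))  # hypothetical vocab of 10 ids, 4-dim vectors

token_ids = np.array([5, 3, 7, 0, 0, 0])    # a sentence padded with 0s to length 6
vectors = embedding_table[token_ids]         # shape (6, 4)

mask = token_ids != 0                        # the mask that mask_zero=True computes
unmasked_mean = vectors.mean(axis=0)         # padding rows pull the average around
masked_mean = vectors[mask].mean(axis=0)     # only the 3 real tokens contribute

print(mask)
print(unmasked_mean)
print(masked_mean)
```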

wicked grove
#
```python
import os

import cv2
import matplotlib.pyplot as plt

# train_path, save_path and crop_square() come from earlier code (not shown).

def flip(image):
    # Mirror the image horizontally (flipCode=1).
    return cv2.flip(image, 1)
#train_datagen = ImageDataGenerator(preprocessing_function=orth_rot,horizontal_flip=False)


j = 0
my_img = os.listdir(train_path)
for i, image_name in enumerate(my_img):
    if image_name.split('.')[1] == 'jpg':
        print(train_path + image_name)
        x = cv2.imread(os.path.join(train_path, image_name))
        x = cv2.cvtColor(x, cv2.COLOR_BGR2RGB)

        y = crop_square(x, 512)
        im_flip = flip(y)
        plt.imshow(im_flip)
        plt.show()

        cv2.imwrite(os.path.join(save_path, str(j), '_', image_name), im_flip)
        break
```
#

can someone please tell me why cv2.imwrite is not working

delicate sphinx
#

can we get an output and the error output

#

Idk about cv2 but if it's something trivial I might be able to help

wicked grove
#

there is no error

#

the image just doesn't get saved to the location

delicate sphinx
#

does cv2 require opening a file to write to?

#

and it definitely prints just after the if? (if not then the if statement is likely wrong but as you're asking about cv2.imwrite I'll assume it definitely reaches that point)
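One likely cause, offered as a guess rather than a diagnosis: os.path.join treats every argument as a separate path component, so os.path.join(save_path, str(j), '_', image_name) points at save_path/0/_/<image_name>. Those folders probably don't exist, and cv2.imwrite does not raise in that case; it just returns False. A standard-library sketch of building the file name first (output_path is a hypothetical helper name):

```python
import os

def output_path(save_path, index, image_name):
    """Build '<save_path>/<index>_<image_name>' as a single file path."""
    return os.path.join(save_path, f"{index}_{image_name}")

bad = os.path.join("out", str(0), "_", "cat.jpg")  # what the original call builds
good = output_path("out", 0, "cat.jpg")
print(bad)    # out/0/_/cat.jpg on POSIX systems
print(good)   # out/0_cat.jpg on POSIX systems
```

When saving, it is also worth checking the boolean that cv2.imwrite returns (for example `ok = cv2.imwrite(path, img)`). Since the image was converted to RGB for plotting, converting it back with `cv2.cvtColor(im_flip, cv2.COLOR_RGB2BGR)` before writing keeps the colours correct, because OpenCV writes channels in BGR order.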

stone marlin
#

How's it goin' tonight, y'all? I've got a few days to kill so I wanted to get grounded with some of those NN things y'all have been chattin' about. :'] Haha.

[Note: I've been workin' in ML/DS for a while, so I've got a pretty good background in Python and general ML/DS architecture stuff, DDB nonsense, etc. Just don't know much about that sweet, sweet NN stuff!]

  1. Anyone got a tutorial series they dig?
  2. Anyone got a preferred framework? Is TF still the standard?

I was told "deeplearning.ai" is a good place to start, but, you know, wanted to survey a bit.

stone marlin
#

I think DeepLearning.ai links to Ng's DL course on coursera. Or at least that's what it linked to me. PyTorch is also fine for me, I don't mind either way with TF vs. PyTorch. Seems like DL.ai is a good place to start then!

stone marlin
#

Yeah, I'm gonna see what it's all about, I'm not too worried about what I start with. Cool, thanks! I'm gonna try that then, and see how it goes.

delicate sphinx
#

Well, at least my model doesn't overfit to just "yes" now ,-,

celest geyser
#

Not sure if this is the right channel, but does anyone know of good resources to start learning Tacotron 2 (or others) for voice synthesis?

arctic crown
#

how can i make my ai assistant copy my habits?
suppose I set an alarm for 7am five days in a row; I want the program to set an alarm at that same time on the 6th day, in case I forget to set it myself.

woeful falcon
#

what would be the best way for a beginner in ai and neural stuff to learn the basics, like a small goal to work towards (eg- make a tictactoe playing neural net) etc ?

olive patio
#

https://colab.research.google.com/drive/1gQO_RddY0aBYtTQ2HTDcP6PXVEsJYuJL?usp=sharing

this is a colab project i've been working on. i'm using the dataset - https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

I was able to train the model and get accuracy and loss for the validation set. However, when I predict on the test set images, the accuracy stays at 0.625. I ran it for 20, 4, and 2 epochs, and the value did not change. I know for a fact that the accuracy should be around 92% (I'm using a project with the same model but some different code as a reference). Could someone please go through the notebook and see where I'm going wrong? I just can't figure it out. Thanks very much!

lapis sequoia
#

Hi, is anyone familiar with LSTM models? I am using an LSTM to predict stock price movement; however, the prediction result is very similar to the original test set. Why?

#
import numpy as np
import pandas as pd
from itertools import chain
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

features = ['Open', 'High', 'Low', 'Volume']  # df and output_var are defined earlier (not shown)
scaler = MinMaxScaler()
feature_transform = scaler.fit_transform(df[features])  # note: fit on the whole series, test period included
feature_transform = pd.DataFrame(columns=features, data=feature_transform, index=df.index)
feature_transform.head()
timesplit = TimeSeriesSplit(n_splits=10, test_size=6126)
for train_index, test_index in timesplit.split(feature_transform):  # after the loop, only the last fold remains
    X_train, X_test = feature_transform[:len(train_index)], feature_transform[len(train_index):(len(train_index) + len(test_index))]
    y_train, y_test = output_var[:len(train_index)].values.ravel(), output_var[len(train_index):(len(train_index) + len(test_index))].values.ravel()
trainX = np.array(X_train)
testX = np.array(X_test)
X_train = trainX.reshape(X_train.shape[0], 1, X_train.shape[1])  # (samples, timesteps=1, features)
X_test = testX.reshape(X_test.shape[0], 1, X_test.shape[1])
lstm = Sequential()
lstm.add(LSTM(32, input_shape=(1, trainX.shape[1]), activation='relu', return_sequences=False))
lstm.add(Dense(1))
lstm.compile(loss='mean_squared_error', optimizer='adam')

history = lstm.fit(X_train, y_train, epochs=5, batch_size=8, verbose=1, shuffle=False)
y_pred = lstm.predict(X_test)
predictions = list(chain.from_iterable(y_pred.tolist()))  # flatten the (n, 1) predictions into a list
#

Here is the code I have used

#

Thanks. So my dataset is sorted by time, so there shouldn't be problem in input?

#

and I take a look at test_set, they took latest 6126 data which is fine, I just need it to predict next 6126 data

#

uh, I have 290K on train side, only 6126 on test side

#

well, it's because I can only get 6126 data points for now to verify the prediction result, maybe more in the future

#

but the thing I don't quite understand is still why the prediction looks basically same as test_set

#

well, maybe I will try another model and to see how it becomes, I use type() to check and didn't find out any problems in type

#

thanks
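About the reshape to (samples, 1, features) in the code above: with a time dimension of 1, each sample is a single row, and a non-stateful LSTM starts from a fresh state for every sample, so it cannot use any history and effectively acts like a dense layer on that row's features. A hedged sketch of building look-back windows instead; make_windows, series and lookback are illustrative names, not from the original code:

```python
import numpy as np

def make_windows(series, target, lookback):
    """Turn a (time, n_features) array into (samples, lookback, n_features) windows.

    Each window holds the `lookback` rows before time t and is paired with target[t].
    """
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])
        y.append(target[t])
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
target = np.arange(10, dtype=float)
X, y = make_windows(series, target, lookback=3)
print(X.shape, y.shape)   # (7, 3, 2) (7,)
```

With the real data this would look something like `make_windows(feature_transform.to_numpy(), output_var.to_numpy().ravel(), lookback=30)`, with the LSTM's `input_shape` becoming `(30, len(features))`; the window length of 30 is an arbitrary placeholder.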

odd meteor
# olive patio https://colab.research.google.com/drive/1gQO_RddY0aBYtTQ2HTDcP6PXVEsJYuJL?usp=sh...

I haven't checked the code on Colab but try this

  1. Plot the learning curves to aid your understanding of what's really going on.

  2. Could be a case of overfitting. If you didn't add Batch Normalization, a Dropout layer, or an Early Stopping callback, you might wanna do that and see if there's an improvement.

  3. Might be a problem of choosing the right number of epochs vs. batch size.

A quick way to find out whether this is part of what's happening is to train your neural net with 2 different batch sizes and a constant number of epochs.

For example use:

  1. epochs = 5, batch_size = 1
  2. epochs = 5, batch_size = X_train.shape[0]
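One more thing worth ruling out: a test accuracy that stays at exactly 0.625 regardless of epochs often means the model predicts the same class for every image. If the test split has 390 pneumonia and 234 normal images (the counts commonly reported for this Kaggle dataset; worth checking on your copy), then 390 / 624 = 0.625, the score of always answering "pneumonia". A quick check, where the label list is a stand-in for your real test labels:

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Stand-in labels matching the commonly reported test-split counts (1 = pneumonia).
test_labels = [1] * 390 + [0] * 234
print(majority_baseline(test_labels))
```

If the model's test accuracy matches this number, counting its predicted classes (e.g. `Counter(predicted_labels)`) will usually show it collapsing onto one class.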
lapis sequoia
#

hi can anyone recommend me some resources to learn reinforcement learning

amber lark
#

How is data science connected to neural networks?
Can I learn NNs without knowledge of data science?

delicate sphinx
#

NN is a part of data science I believe

#

I never explicitly learned about data science but I had learnt about what a NN is and how they work

#

so I think it's sort of hand-in-hand (though Data Science likely contains more than just NN obviously)

wary bison
#

Hi guys need some advice on a thing. I have a univariate time series dataset on which I need to apply anomaly detection. Everything needs to be unsupervised - the thresholds for the anomalies also need to come from the model/algorithm. Any suggestions/recommendations?

lapis sequoia
#

Hi could anyone please tell me about the math in machine learning?

delicate sphinx
#

if you plan to write from scratch or to apply your own functions to it, then you can expect lots of maths

lapis sequoia
#

I'm actually a beginner

delicate sphinx
#

It can get really mathematical but there are many libraries/packages out there that you can use that will do it largely for you

#

i.e. tensorflow has libraries that can apply loss functions, optimization functions, etc. all without me doing anything more than importing a loss and optimizer module

#

but TensorFlow also lets you write custom functions for your inputs/outputs and more, and judging by some examples, you'd need a very mathematical brain to understand them well enough to write your own
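To give a feel for the maths a library hides: below is categorical cross-entropy written out in NumPy, roughly the quantity a loss like Keras's categorical cross-entropy computes for one-hot targets. It is a simplified sketch; the clipping constant is an assumption, not the library's exact implementation:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]])               # one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])   # softmax outputs
print(categorical_cross_entropy(y_true, y_pred))        # -(log 0.8 + log 0.6) / 2
```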

lapis sequoia
#

Oh ok thank you very much 👍

delicate sphinx
#

if you want to do stuff with data, you can probably expect lots of disgusting variables to do things quickly and efficiently

#

but it's just like any project, to you it makes sense and to others you've just written a piece of space language

chilly torrent
#

I am struggling so hard with numpy.random.lognormal. For some reason, I am getting really high numbers when I am using a small mean. Does anyone have any idea what's going on?

# output 3.4214334251929405e+30

The output is on the order of 10^30, way bigger than the input!

ancient sorrel
#

I need some help. Idk if it's related to AI or data science, but it seems the closest domain. I have a 3D .obj with an .mtl texture, and I have a script which takes photos of it from different angles. I'm using pyrender. For some reason, the object I get in the photo is untextured. Can anyone give me a hand here?

delicate sphinx
# chilly torrent I am struggling so hard with numpy.random.lognormal. For some reason, I am getti...
#

does that help?
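A common cause (an assumption, since the call itself isn't shown): numpy.random.lognormal's `mean` and `sigma` are the mean and standard deviation of the underlying normal distribution, i.e. of log(X), not of the samples. The samples have mean exp(mean + sigma**2 / 2), so even a small `mean` with a largish `sigma` produces huge values (with mean=0 and sigma=10 the expected value is exp(50), roughly 5e21). To get samples with a chosen mean m and standard deviation s, the usual conversion is:

```python
import numpy as np

def lognormal_params(m, s):
    """Convert a desired sample mean m and std s into numpy's (mean, sigma)."""
    sigma2 = np.log(1.0 + (s / m) ** 2)
    mu = np.log(m) - sigma2 / 2.0
    return mu, np.sqrt(sigma2)

mu, sigma = lognormal_params(m=10.0, s=2.0)
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
print(mu, sigma)
print(samples.mean(), samples.std())  # both close to 10 and 2
```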

low spear
#

Can anyone help me with this? This is code I got from GitHub, and I tried to learn from it by using my own input, but I ran into an error. The code is about vehicle detection using Faster R-CNN.

delicate sphinx
#

and if you want more help, model.summary() shows the output shape of every layer

#

making sure they all match is really good

normal violet
#

hey if anyone here uses flask could they please look at this?

#

please upvote if you can

wooden night
#

Any tips on implementing novel improvements from papers? (I have RL algorithms in mind specifically but I suppose it can apply for any paper-about-a-concept)
I'd consider myself a fairly competent python dev and I know the basics of RL and deep learning - enough to play around with stable_baselines3 and to get the gist of the algorithms - but when I read a paper my eyes glazeth over and I'm just unable to start writing an implementation

#

I've implemented tabular RL from scratch (trivial) and A2C from scratch (was very difficult). Is it just an experience thing, and I should be working my way up by implementing the actor-critic/Q-learning ladder myself from DQN/A2C up to TD3 and SAC?

desert oar
#

@wooden night unglaze your eyes and start writing 🙂 i don't have experience specifically with reinforcement learning, but sometimes there's nothing you can do but sit down and write an implementation

#

trying to figure out test cases is probably the hardest part (you do want to test your code, right?)

#

these algorithms can be very very difficult to verify and audit

#

writing tests sometimes is a matter of verifying that certain mathematical properties hold (modulo floating point error)
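As a small example of property-based checks (the function and the properties are illustrative, not from any particular paper): discounted returns G_t = r_t + gamma * G_{t+1} should equal the raw rewards when gamma = 0, equal the suffix sums of the rewards when gamma = 1, and satisfy the recursion at every step for any gamma:

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = np.array([1.0, 0.0, 2.0, 3.0])

# Property 1: with gamma = 0 (no weight on future rewards), returns equal rewards.
assert np.allclose(discounted_returns(rewards, 0.0), rewards)
# Property 2: with gamma = 1, returns are suffix sums of the rewards.
assert np.allclose(discounted_returns(rewards, 1.0), np.cumsum(rewards[::-1])[::-1])
# Property 3: the recursion itself holds at every step (up to float error).
g = discounted_returns(rewards, 0.9)
assert np.allclose(g[:-1], rewards[:-1] + 0.9 * g[1:])
print(g)
```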

bronze skiff
#

as the physicists say, "shut up and calculate!"

desert oar
#

as another option, use the reference implementation to generate test data (known-correct input-output pairs)

#

plus maybe there are bugs in the reference implementation

wooden night
#

I don't have to use cool improvement B per se but I think it would really help, but sadly it hasn't been added to the package I'm using

desert oar
#

maybe your employer will let you dedicate time to contributing the new version upstream even

arctic crown
#

in ML, what are support vector machines?
and how do they work?

hybrid mica
#

in artificial neural networks, how does one know how many hidden layers are needed, and how many neurons?

boreal summit
#

Mehn, it's so frustrating when much of the code in the book you're studying ain't running.

#

And this book was released this year. ☹️

bronze skiff
#

code changes fast

#

that's why math == best lang

boreal summit
#

I type in the exact code and think I'm wrong, then I go to the GitHub repo to copy and paste, yet it still doesn't run.

bronze skiff
#

rustlang? no. mathlang

boreal summit
#

I learned C# before university, so I wanted to get back that feeling.

#

Python is plain, and doesn't really feel like programming IMO.

desert oar
# arctic crown in ml what is support vector machines? and how does it work?

there are 2 primary ways to interpret an SVM (assuming classification):

  1. the modern way: a linear model with a specific loss function called "hinge loss"

  2. the traditional way: an algorithm that finds a "separating hyperplane" that divides two classes, such that the hyperplane has the greatest possible "margin" between it and the data
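A minimal NumPy sketch of interpretation 1: a linear model trained by subgradient descent on the L2-regularised hinge loss. This is a teaching sketch with toy data and a fixed learning rate, not how libraries such as scikit-learn or LIBSVM actually solve the problem:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))); labels must be -1 or +1."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        # Subgradient of the hinge term is -y_i * x_i for active points, 0 otherwise.
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(w, b, predict(X, w, b))
```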

it also turns out that the SVM model is amenable to something called the "kernel trick", which lets you embed your data in an arbitrarily complicated space without having to actually transform the data, as long as you can compute inner products in that space in terms of the original data values.
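A quick numeric check of the kernel trick for one simple case (chosen here for illustration): in 2-D, the polynomial kernel k(x, z) = (x · z)^2 equals an ordinary inner product after the explicit map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so the kernel yields the feature-space inner product without ever building phi(x):

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original space."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map that this kernel corresponds to (2-D inputs only)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(poly2_kernel(x, z))        # 121.0
print(np.dot(phi(x), phi(z)))    # 121.0 up to floating-point rounding
```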

this "kernel SVM" technique is part of why SVM was so popular before gradient boosting and neural networks rolled around. it allowed you to develop a fairly complicated "feature space" that possibly encoded high-order relationships between data points, in which the classes were much easier to separate. this is not entirely unlike what gradient boosting and neural networks do.

#

but i highly recommend reading a book instead of asking people online 🙂

boreal summit
#

Applied Data science using Pyspark by Ramcharan Karla, Sundar Krishnan

#

One of the reasons I try to read recent books is so I'm sure I'm not reading old stuff. This book is copyrighted 2021, so I feel everything should work fine. But there are some hiccups.

bronze skiff
#

do your pyspark versions and jvm match with the book's?

#

they are using spark 3, which is quite stable

boreal summit
#

I can't confirm that ATM, but the latest Pyspark version is 3.0. if the book was released this year, then I think it should be compatible with the latest release.

bronze skiff
#

i find it hard to believe you're experiencing hiccups unless you are using a completely off version

#

spark 2 to spark 3 has quite a lot of changes

#

especially with pyspark

desert oar
#

that, and it might be useful to post the actual errors you are getting

bronze skiff
#

which basically sucked until this year, haha

boreal summit
#

I already sent the author a LinkedIn connect request. I'll Inbox him when he accepts, but I'll make sure I confirm the Pyspark version used in the book tomorrow so I can be certain that's not what's causing the issue.

bronze skiff
#

also, what errors are you getting?

desert oar
#

like how a plane is the generalization of a line

#

a hyperplane is the generalization of a plane, beyond what we as humans can visualize. but a lot of the properties are the same, and yes it helps if you know the linear algebra to avoid being stuck too much on low-dimensional intuition

desert oar
#

a plane is specifically a 2-dimensional object in a 3-dimensional space. a generalization would be an n-dimensional object in an (n+1)-dimensional space

#

it's no longer specifically "2" and "3", ergo it is "more general" and therefore a "generalization"

#

this is a common practice in math: finding generalizations of things
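In symbols, using the n and (n+1) from above: a hyperplane is the set of points satisfying a single linear equation, which is also the form of an SVM's separating hyperplane (w and b are generic symbols here):

$$
H = \{\, x \in \mathbb{R}^{n+1} : w^\top x + b = 0 \,\}, \qquad w \in \mathbb{R}^{n+1},\ w \neq 0,\ b \in \mathbb{R}
$$

H is n-dimensional. For n = 1 it is a line in the plane ($w_1 x_1 + w_2 x_2 + b = 0$), for n = 2 it is an ordinary plane in 3-D space, and the same formula works unchanged for any n.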

hybrid mica
#

I built my first artificial neural network in python today, and I got an accuracy rate of 0.86. However, when I used the same dataset and calculated the accuracy rate with kernel svm and random forest (without any deep learning), they were both higher than 0.86. Is this normal?

bronze skiff
#

yeah, why not?

#

neural nets aren't the best thing for every single dataset you see, especially tabular datasets

delicate sphinx
#

my model in tensorflow keeps outputting "yes" which is the most common single-word answer of my dataset

#

any ideas on how to fix?

#

I have image features and LSTMs combined into one model that then creates a network of dense layers, has a dropout of 0.5, activations tanh, and denses of 16,128 for each of those dropout,activation pairs.

The layer then has a dense layer of 256 and outputs via a dense layer of shape 16,23000. Finally, it is activated by a softmax

#

the dense layer of 256 uses a kernel_regularizer l1_l2 (elastic net)

delicate sphinx
#

Image + Question --> Answer

#

the answer can be up to 16 separate words

serene scaffold
#

and for what percentage of the training instances is the answer "yes"?

delicate sphinx
#

the question is up to 32 separate words, but it goes through a TextVectorization layer and an Embedding layer before the LSTM layers, so it's represented as a dense vector

#

A large amount; it's the single most common answer. I don't have exact counts of how many answers are "yes", but I could probably find out fairly quickly

#

though I'm trying to run another test on my model so have about 10 minutes before that will finish the first epoch

serene scaffold
#

so the possible answers are a mix of yes/no and qualitative questions?

delicate sphinx
#

yes I have 23,000 potential answers

serene scaffold
#

I feel like those two classes of questions should be handled separately.

delicate sphinx
#

(the vocab size will include words like "what" which for this example we can say isn't a possible answer, but I've included that in my output size regardless)

#

yeah it would be good if I could but idk how to :/

#

I was hoping it would work as just any old classifier

serene scaffold
#

where did you get the idea to do this?

delicate sphinx
#

the first 10 of 248,000 questions

#

Visual Question Answering

delicate sphinx
#

No worries

serene scaffold
#

if you provide it as text, we can continue.

delicate sphinx
#

it's nothing of actual importance, just an example of the proportion of yes/no questions

#
Is the food napping on the table?
True answer:  no
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 1 , Answer:  yes                

What has been upcycled to make lights?
True answer:  kettles
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 2 , Answer:  yes                

Is this an Spanish town?
True answer:  no
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 3 , Answer:  yes                

Are there shadows on the sidewalk?
True answer:  yes
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 4 , Answer:  yes                

What is in the top right corner?
True answer:  tree
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 5 , Answer:  yes                

Is it cold outside?
True answer:  yes
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 6 , Answer:  yes                

What is leaning against the house?
True answer:  ladder
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 7 , Answer:  yes                

How many windows can you see?
True answer:  1
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 8 , Answer:  yes                

Is this in a park?
True answer:  yes
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Index: 9 , Answer:  yes   
serene scaffold
#

interesting; what do the lists of ints mean?

delicate sphinx
#

those are the very first 10 so obviously the density of yes/no questions won't be represented by this

#

the list of ints are the int versions of the output (softmax output, then np.argmax(output))

#

so [9, 0,0,0,...] means that the answer is "whatever word is represented by 9, and the rest is unknown" as 0 is an unknown token
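For readers following along, the decoding step described above (softmax output, then argmax) can be sketched in NumPy. Each of the 16 output positions gets its own argmax over the vocabulary; the tiny vocabulary and probabilities below are made up, and index 0 is dropped as the unknown token, as described:

```python
import numpy as np

def decode_answer(probs, index_to_word):
    """Take the argmax at each output position and drop index-0 tokens."""
    ids = np.argmax(probs, axis=-1)             # shape (16,) for a (16, vocab) output
    words = [index_to_word[i] for i in ids if i != 0]
    return ids, " ".join(words)

vocab = {0: "<unk>", 1: "no", 2: "ladder", 9: "yes"}  # hypothetical, 10-id vocabulary
probs = np.zeros((16, 10))
probs[:, 0] = 1.0     # every position predicts index 0 ...
probs[0] = 0.0
probs[0, 9] = 1.0     # ... except the first, which predicts 9 ("yes")
ids, answer = decode_answer(probs, vocab)
print(ids.tolist())
print(answer)
```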

serene scaffold
#
Is the food napping on the table?
True answer:  no

this doesn't even make sense?

#
Are colorless green ideas sleeping furiously?
True answer: sometimes

????

delicate sphinx
#

The questions are created to give the model "commonsense knowledge"

serene scaffold
#

I see

#

and you said there were images. how are those images being represented?

delicate sphinx
#

i.e. a more trained and adapted model would look at object pairings such as "food napping" and decide "what the heck, food can't sleep???"

#

the images are preprocessed via the InceptionV3 model

#

they are then flattened in my model and that is about all I do with them before passing them to the merged model (that is constructed of Dense layers, Activations and Dropouts)

#

Someone recommended regularization but that hasn't really changed the output from "yes"

#

(The images are loaded with np.load() to load them as numpy arrays which are then converted to tensors)

modest mulch
#

Does anyone know how I might be able to make a mask out of the shapes/pictures in an image that contains text? It's an image from a PDF, so it could contain text, pictures, or shapes. Am I able to make a mask out of everything that is NOT text?

sleek tapir
#

I'm learning ML rn

#

is KNN related to topology in any way

#

like distances and metric spaces

bronze skiff
#

if your definition of topology is distances and metric spaces, sure

#

knn... uses distances

#

and distances make up metric spaces
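To make the connection concrete: k-nearest neighbours only needs a distance function, so any metric can be plugged in. A standard-library sketch (function names and data are illustrative):

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, metric=euclidean):
    """Majority label among the k training points closest to `query` under `metric`."""
    nearest = sorted(range(len(train_X)), key=lambda i: metric(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1, 1)))                     # a
print(knn_predict(train_X, train_y, (5, 4), metric=manhattan))   # b
```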

sleek tapir
#

well i learn topology at uni

#

how bout manifolds and differential geometry

#

Differential geometry finds applications throughout mathematics and the natural sciences. Most prominently the language of differential geometry was used by Albert Einstein in his theory of general relativity, and subsequently by physicists in the development of quantum field theory and the standard model of particle physics. Outside of physics, differential geometry finds applications in chemistry, economics, engineering, control theory, computer graphics and computer vision, and recently in machine learning.

#

from wiki

bronze skiff
#

i mean, i'm confused by your question

#

are you asking if KNN is related to parts of mathematics?

#

it's just a supervised learning algorithm-- it's hard to say if it relates to very large swathes of a very large field

sleek tapir
#

im deciding which subjects

#

to take

#

yea knn is related to other parts of mathematics

#

for next year uni

merry ridge
#

It doesn't really use topology no. You could argue it uses point set topology, but by that same argument calculus is topology

sleek tapir
#

well i dont classify calculus a part of topology

#

i classify topology and calc as part of real analysis

#

then there is complex analysis

#

im not even a pure major

#

im a stats major

#

together with cs

#

but ik a bit of topology etc.

merry ridge
#

Saying topology is part of real analysis is one way to offend every topologist.

sleek tapir
#

well my uni

#

does it as part of real analysis

#

my uni does a bit of topology in multivariable calc

#

(calc 2 or 3 in the us i think)

#

we learn calc 1 and probably a bit of calc 2 in High school

merry ridge
#

That is probably point set topology, I don't see any reason to introduce a student to anything from topology in multivariable calculus.

sleek tapir
#

yea its point set topology i think

#

it was hard when i did it

#

it's only very basic obviously

#

open and closed sets, intersections, etc.

short wren
#

Hey! I'm making a neural network to identify the presence of pneumonia in lung x-rays (1 or 0), so what activation functions and loss functions do you recommend?
I have relu for everything except output, which is softmax, and binary_crossentropy for loss
(tensorflow keras)

#

No matter what I try, my accuracy is 75% or below, but training almost always ends at 95% accuracy

#

and my model has a dropout of 50%

alpine rain
#

I have no idea which would be better, but I would suggest that you test multiple activation functions to see which provides the best result on one set of x-rays, and then test them on another set of x-rays to make sure the one that seems to be the best is really the best and it wasn't just overfitted

#

the difference in % between training and test data might be smaller if you have a larger training set

short wren
#

thanks!

alpine rain
#

that sounds good

#

not sure what kind of difference you're looking for in the images though, maybe the difference between a pneumonia patient's image and a regular person's image is small enough to need more training data... but the ratio between the training and testing images is good

willow linden
#

data augmentation?

short wren
#

i looked into it

#

is layers.RandomFlip("horizontal_and_vertical") sufficient?

willow linden
#

I don't know, that's the magic of AI spending thousands of hours on hyperparameter tuning

#

haha

#

but maybe you should do a bit more of augmentation

alpine rain
#

RandomFlip sounds like something that will randomly flip your image either horizontally or vertically, judging by the parameters you added

#

I'm not sure if that's in any way beneficial to you

#

why would you want to train your AI to recognize pneumonia in a picture of lungs upside-down? it has no added value

willow linden
#

true, maybe play more with contrast

#

and that kind of stuff
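A hedged NumPy sketch of the kind of contrast/brightness jitter suggested above, assuming images are float arrays scaled to [0, 1]. The ranges are arbitrary starting points, not tuned values; recent Keras versions also ship preprocessing layers for this (e.g. RandomContrast), depending on the TF version:

```python
import numpy as np

def jitter_contrast_brightness(image, rng, contrast=(0.8, 1.2), brightness=(-0.1, 0.1)):
    """Randomly scale contrast around the image mean, shift brightness, then clip to [0, 1]."""
    c = rng.uniform(*contrast)
    b = rng.uniform(*brightness)
    mean = image.mean()
    out = (image - mean) * c + mean + b
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
xray = rng.random((64, 64, 1))   # stand-in for a grayscale X-ray scaled to [0, 1]
augmented = jitter_contrast_brightness(xray, rng)
print(augmented.shape, augmented.min() >= 0.0, augmented.max() <= 1.0)
```

Unlike flips, this keeps the anatomy in its usual orientation and only varies exposure, which is closer to the variation real X-ray machines produce.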

alpine rain
#

I think creating a software to automatically detect an illness from an X-ray is a really cool idea. I honestly wish I could help you more, but I can only give you pointers

short wren
#

k ty

#

is there anything that is obviously wrong in this?

#
```python
import tensorflow as tf

# IMG_WIDTH, IMG_HEIGHT and NUM_CATEGORIES are defined elsewhere (not shown).
def get_model():  # function name assumed; the original snippet only shows the body
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(
            45, (5, 5), activation="relu", input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)
        ),
        tf.keras.layers.MaxPooling2D(pool_size=(3, 3)),
        tf.keras.layers.Conv2D(45, (2, 2), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(60, activation="relu"),
        tf.keras.layers.Dense(60, activation="relu"),
        tf.keras.layers.Dropout(0.35),
        tf.keras.layers.Dense(30, activation="relu"),
        tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax")
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"]
    )
    model.summary()
    return model
```
alpine rain
#

do you get an obvious error message from it?
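One thing that may be worth checking in the code above (a suggestion, since NUM_CATEGORIES isn't shown): if NUM_CATEGORIES is 1, a softmax over a single unit always outputs exactly 1.0, so the model can't express "no pneumonia". For a 0/1 label, the usual pairing is one sigmoid unit with binary_crossentropy, or NUM_CATEGORIES = 2 with softmax and categorical_crossentropy on one-hot labels. A NumPy illustration:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[-3.0], [0.0], [4.0]])   # one output unit, three images
print(softmax(logits).ravel())   # [1. 1. 1.] whatever the logit is
print(sigmoid(logits).ravel())   # varies with the logit, usable as P(pneumonia)
```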

alpine rain
#

Swish, Mish, and Phish? I thought this was the beginning of a joke and you're telling me these are actually names of features in tensorflow? 😄

charred umbra
#

I don't think Mish and Phish are in there, but you could probably still use them

#

Mish is defined as f(x) = x · tanh(softplus(x))

#

and Phish is f(x) = x · tanh(GELU(x))

#

Mish and Phish could probably be added if enough people start using them like Swish though
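For anyone wanting to try them: NumPy sketches of the two formulas above plus Swish (x · sigmoid(x), i.e. the β = 1 case). GELU is written with its common tanh approximation here, an assumption that differs slightly from the exact erf-based definition:

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)          # log(1 + e^x), computed stably

def swish(x):
    return x / (1.0 + np.exp(-x))        # x * sigmoid(x)

def gelu_tanh(x):
    # Tanh approximation of GELU(x) = x * Phi(x).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def mish(x):
    return x * np.tanh(softplus(x))

def phish(x):
    return x * np.tanh(gelu_tanh(x))

x = np.linspace(-4, 4, 9)
for name, f in [("swish", swish), ("mish", mish), ("phish", phish)]:
    print(name, np.round(f(x), 3))
```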

alpine rain
#

"avengers, activate Phish"... doesn't quite have the same ring to it

alpine rain
#

after fish, probably 🙂

#

it can be really funny though, how some programming things get their names... like you'd think phish comes from fish, but maybe it's some very elaborate thing involving 13 people over the course of 4 and a half years plus a dog and two chocolate cakes that ended up being this name...

alpine rain
#

the guy's name was Phish?

charred umbra
#

I don't think so. It would be hilarious if it was, though

alpine rain
#

so it wasn't quite "named after" those people so much as those people chose the name for it based on their initials...

alpine rain
#

also I imagine Philip was simping on Misra a little bit so he wanted to follow in that pattern Misra set with her name 🙂

charred umbra
#

I kinda wanna test out Mish and Swish in actual neural nets and see how they do

alpine rain
#

both are good

alpine rain
#

I like the whole Swish, Mish, Phish thing too, I think it's funny

#

also not very serious

#

so I don't take it seriously

#

but I probably should

#

but I'm using a programming language named after 5 british idiots so whatevs 🙂

charred umbra
#

The experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging datasets, and its simplicity and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network. The choice of activation functions in deep networks has a significant effect on the training...

#
#

Swish, Mish, and Phish respectively

alpine rain
#

if I'd make an activation function, I'd just call it good and then people would say good AF 😄

alpine rain
#

Heruk is not my real name

charred umbra
#

ah ok

alpine rain
#

maybe I'd call it Wish 😄

#

you WISH this was a good activation function but it isn't 😄

charred umbra
#

Swish and Phish seem to be on par, but Mish is seemingly better than both of those

alpine rain
#

I have no idea

stone marlin
#

I missed this, but: topology? Part of real analysis? Color me offended. 🦂

alpine rain
#

offended by what?

willow linden
#

Hey guys, does anyone know a good source to learn about probability density functions? Or maybe just probability in general

charred umbra
#

or maybe in autoencoders

analog hull
#

Do you think it could be implemented in the discriminator as well?

charred umbra
#

I would think that having it in the discriminator would make the generator more accurate

#

since the idea of a GAN is for the discriminator to end up with high loss and no better than 1/2 accuracy everywhere (it can't tell real from fake)

#

so a function that minimizes loss would make the generator more accurate in making the fake images

#

but im not sure

analog hull
#

Anyways it's really cool we have a new activation function

daring tiger
#

Phish is quite revolutionary

#

Surely the mods here should make an announcement about it, its very useful and applicable to the real world

#

At the very least it should be added to tensorflow

bronze skiff
#

any lesser source is just copium

stone marlin
#

I've been through Billingsley Prob + Measure like, ten years ago, so it might be nice to refresh --- is Ross' mostly theoretical, or is it more of an applied text?

desert oar
#

i guess the first chapter is a good rundown of probability theory, but probably not something you can effectively self-study from if you've never seen the material before

bronze skiff
#

it's a canonical text for undergraduate probability, so it's definitely something you can self-study from

desert oar
#

i missed the exercises at the end of the chapter

#

yeah these are pretty good

#

i stand corrected

tight dove
#

I have a dataframe in pandas

#

I'm trying to "collapse" multiple rows into one, since they have pretty much the same values

#

How do I go about this?

#

PS: I had used groupby initially to transform the data

arctic crown
#

anyone have a dataset for a chatbot?
like the intents.json

odd meteor
austere swift
#

well if it was only proposed a couple days ago i wouldn't expect it to be

lapis sequoia
#

can anybody suggest a hands-on course on reinforcement learning?

lapis sequoia
#

Yo Guys

#

Where can I learn AI

#

using python as a programming language specifically

#

I'm willing to enroll in a paid course, I just need a good recommendation

minor mica
#

👋 Hi, I'm a full-stack web developer with a little bit of experience in Python, but most of my programming experience is in Javascript. I've always had an interest in machine learning and NLP, so I decided I want to make a chat bot with Python, create an API for it, and turn it into a full-stack project for my portfolio. To be more specific, I wanna make something that is like CleverBot in the sense that its goal is just to have conversations with humans that are as natural as possible. However, rather than taking the pure ML-based model that CleverBot uses, I wanted to try something like a rule-based approach that uses AI to augment the quality of its responses over time. So basically, I guess I'm just looking for any tips/advice? Tools that might help me? Dunno, just kinda playing with the idea at this point haha. 😅
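One way to start the rule-first design described here: try hand-written patterns in order, and only fall back to a learned model when nothing matches. This is a minimal sketch, not CleverBot's approach or any library's API; the rules, replies, and the fallback hook are all hypothetical placeholders where an ML model (or a log of unmatched messages to write new rules from) would plug in.

```python
import re
from typing import Callable, List, Optional, Tuple

Rule = Tuple[re.Pattern, str]

# Hypothetical rules: (pattern, reply template). Checked in order, first match wins.
RULES: List[Rule] = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hey! How's it going?"),
    (re.compile(r"\bwhat are you doing\b", re.IGNORECASE), "Just chatting with you."),
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}!"),
]


def respond(message: str, fallback: Optional[Callable[[str], str]] = None) -> str:
    """Return the first matching rule's reply; otherwise defer to the fallback model."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Captured groups (e.g. a name) are substituted into the template.
            return template.format(*match.groups())
    if fallback is not None:
        return fallback(message)  # hypothetical hook for a seq2seq/retrieval model
    return "Interesting, tell me more."
```

Keeping the rules as data means the "AI augmentation" can later add or re-rank rules without touching respond() itself.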

bold timber
#

Does k-prototypes need feature scaling?

green zinc
# lapis sequoia Where can i Learn AI

there is this free course which is highly recommended https://es.coursera.org/learn/machine-learning, but it does not use Python (you can ignore the programming parts), and it will help you learn the basics of ML / AI. Then you can look for specific tutorials on how to develop an ML/AI solution in Python

iron basalt
# lapis sequoia i'm willing to enroll for a paid course just need good recommendation

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: http://ocw.mit.edu/6-034F10
Instructor: Patrick Winston

In this lecture, Prof. Winston introduces artificial intelligence and provides a brief history of the field. The last ten minutes are devoted to information about the course at MIT.

License: Creative Commons BY-NC-SA
...

odd meteor
iron basalt
#

There are several open courses now.

boreal summit
#

Hello everyone, the Pyspark version used in the book is 3.0.1, while the version on my laptop is 3.1.2

boreal summit
#

@bronze skiff @stone marlin

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @wintry quarry until <t:1640257217:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

serene scaffold
serene scaffold
tight dove
#

What I did was to do a groupby, then agg with the column names

serene scaffold
#

If you still have a question, I can try to help, though keep in mind that I do not look at screenshots of text.

tight dove
#

I already dropped duplicates

tight dove
# serene scaffold I don't know enough about what you're trying to do to understand this context.
processed_customers = processed_customer \
      .groupby('customer_id',as_index=False) \
      .agg({
          'customer_name':'first',
          'total_invoice_count': 'first',
          'total_invoiced_amount': 'first',
          'unpaid_amount': 'first',
          'unpaid_count': 'first',
          'first_invoice_date': 'first',
          'first_invoice_amount': 'first',
          'last_payment_date': 'first',
          'last_payment_amount': 'first',
          'customer_segment': 'first'
      })
#

That's what I did

serene scaffold
tight dove
#

Okay. Usually I use display(), but I will give this a try as well

serene scaffold
tight dove
serene scaffold
tight dove
serene scaffold
tight dove
#

Okay. Thanks!

serene scaffold
#
In [4]: df.groupby('customer_id',as_index=False) \
   ...:       .agg({
   ...:           'customer_name':'first',
   ...:           'total_invoice_count': 'first',
   ...:           'total_invoiced_amount': 'first',
   ...:           'unpaid_amount': 'first',
   ...:           'unpaid_count': 'first',
   ...:           'first_invoice_date': 'first',
   ...:           'first_invoice_amount': 'first',
   ...:           'last_payment_date': 'first',
   ...:           'last_payment_amount': 'first',
   ...:           'customer_segment': 'first'
   ...:       }).drop_duplicates()
Out[4]:
   customer_id customer_name  total_invoice_count  ...  last_payment_date  last_payment_amount  customer_segment
0            1     Microsoft                    6  ...         2021-06-01                 1000               Low
1            2         Apple                    4  ...         2021-08-15                 3000              High
2            3        Google                    4  ...         2021-04-01                 1000               Low
3            4       Netflix                    4  ...         2021-07-31                 2500              High
4            5          Meta                    2  ...         2021-07-15                  500               Low

[5 rows x 11 columns]
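When every column in the agg dict uses 'first', the long dict can likely be replaced with GroupBy.first(), which takes the first non-null value of each column per group (the same thing agg('first') does). A minimal sketch on hypothetical sample data (column names borrowed from the snippet above, values invented for illustration):

```python
import pandas as pd

# Hypothetical sample: customer 1 appears twice with identical values.
df = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "customer_name": ["Microsoft", "Microsoft", "Apple"],
    "unpaid_amount": [0, 0, 3000],
})

# One row per customer_id; every other column keeps its first non-null value.
collapsed = df.groupby("customer_id", as_index=False).first()

# If the duplicate rows are fully identical, drop_duplicates() alone is enough.
deduped = df.drop_duplicates()
```

The groupby version is the safer of the two when rows share an id but differ in some column you don't care about; drop_duplicates() only removes rows that match in every column.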
charred umbra
#

Actually, it seems that you can just run Phish by creating it like this:

#
"""Tensorflow-Keras Implementation of phish"""

## Import Necessary Modules
import tensorflow as tf
from tensorflow.keras.layers import Activation
from tensorflow.keras.utils import get_custom_objects


class Phish(Activation):

    def __init__(self, activation, **kwargs):
        super(Phish, self).__init__(activation, **kwargs)
        self.__name__ = "phish"


def phish(x):
    return x*tf.math.tanh(tf.nn.gelu(x))


get_custom_objects().update({"phish": Phish(phish)})
#

and then passing "phish" as a string for the activation of the Dense layers
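For reference, the snippet computes phish(x) = x · tanh(GELU(x)). Here is a plain-Python sketch of the same formula (no TensorFlow) using the exact, erf-based GELU, which is what tf.nn.gelu computes with its default approximate=False. As far as I know, Keras layers also accept the function object directly (e.g. Dense(64, activation=phish)); registering it under a name mainly matters for the string form and for saving/loading models.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def phish(x: float) -> float:
    """Phish as defined in the snippet above: x * tanh(gelu(x))."""
    return x * math.tanh(gelu(x))


for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"phish({v:+.1f}) = {phish(v):+.4f}")
```

Like Swish and Mish, it is close to the identity for large positive inputs and dips only slightly for negative ones.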

charred umbra
polar narwhal
#

Is there anyone out there who knows opencv and can help?

bronze skiff
#

and yes, so what is your error with that spark version

analog pike
#

I don't know if I should ask this here or not. I'm trying to work through the RL guides on TensorFlow's website, but every time I try to import anything related to tf-agents, I keep getting "AttributeError: module 'tf_agents.trajectories.trajectory' has no attribute 'Transition'"

#

I made sure my tf-agents and TensorFlow versions were correct

bronze skiff
#

are you using an IDE and forgot to link your venv interpreter to it?

analog pike
#

no, I don't believe so. I'm using the Anaconda prompt to install the packages for Jupyter Notebook

#

everything else I installed works just fine

lilac crest
#

I'm not sure if I should ask here, but could someone help me with matplotlib stuff?

analog pike
#

the weird thing is that if I just do import tf_agents it works fine, but when I try to import specific things from it, it gives an error
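An AttributeError on a submodule (while import tf_agents itself succeeds) often points to a version mismatch between tf-agents and the TensorFlow / TensorFlow Probability installed in the interpreter the notebook kernel actually runs, which is not always the environment the Anaconda prompt installed into. A small stdlib-only check you could run inside the notebook; the distribution names are the usual PyPI names, so adjust them if yours differ:

```python
import sys
from importlib import metadata  # Python 3.8+


def installed_versions(*dist_names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Which Python is this kernel using? Compare with `where python` in the Anaconda prompt.
print(sys.executable)
print(installed_versions("tensorflow", "tf-agents", "tensorflow-probability"))
```

If the printed versions differ from what you installed, the kernel is pointing at a different environment.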

boreal summit
stone marlin
#

I turn my pings off for exactly this reason.

boreal summit
hazy escarp
#

Hey, I need some help with NEAT, anybody who knows it? Maybe DMs or smth idk

serene scaffold
barren seal
#

hello

hazy escarp
#

hi

barren seal
#

I want to compute a Fast Fourier Transform.

#

import scipy as sp
import matplotlib.pyplot as plt
listA = sp.ones(500)
listA[100:300] = -1
f = sp.fft(listA)
plt.plot(f)

#

but it gives me: AttributeError: module 'scipy' has no attribute 'fft'

#

Can anyone help me with it?

#

if I use import scipy.fft as fft instead,

#

it says: TypeError: 'module' object is not callable

charred umbra
#

if you can't get the SciPy one to work

barren seal
#

does NumPy have an FFT?
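Both errors are consistent with scipy.fft being a subpackage (SciPy 1.4 and later): sp.fft(...) and fft(...) after import scipy.fft as fft both try to call a module, while the function itself is scipy.fft.fft. NumPy does have one too, numpy.fft.fft. A NumPy-only sketch of the original snippet; the output is complex-valued, so it is usually the magnitude that gets plotted:

```python
import numpy as np

# Test signal from the question: 500 ones with a block of -1s.
signal = np.ones(500)
signal[100:300] = -1

# numpy.fft.fft is a function. With SciPy >= 1.4 the equivalent is
#   from scipy.fft import fft
#   spectrum = fft(signal)
spectrum = np.fft.fft(signal)

# The FFT is complex, so take the magnitude before plotting,
# e.g. plt.plot(np.abs(spectrum)) with matplotlib.
magnitude = np.abs(spectrum)
print(magnitude[:5])
```

The zero-frequency bin equals the sum of the signal (300 ones minus 200 minus-ones, i.e. 100), which is a quick sanity check.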

desert bear
#

Hey, does anyone know any GitHub repo or module that can generate a 2D map like the one below? I'm looking for a playground on which I can test some reinforcement learning algorithms

charred umbra
#

I used it once

barren seal
round field
#

how do i start learning the python coding?

hazy escarp
#

just find some tutorials on ytb ig

delicate sphinx
# round field how do i start learning the python coding?

As this is your first message in this entire Discord, I assume you mean in general? If so, maybe check out something like HackerRank (if they're still around), Codecademy, etc. Or give yourself little projects to do; you can work up to things like PyTorch or TensorFlow for the data science side

delicate sphinx
#

how can I shuffle different datasets by the same amount? i.e. I need the alignment to be maintained, so my (input1, input2, output) need to be such that when I fetch them after being shuffled, I get (input1_list[x], input2_list[x], output_list[x])

#

tensorflow
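One framework-agnostic approach is to draw a single random permutation of the indices and apply it to every array, so row x stays aligned across all of them. In tf.data, building the dataset from a tuple, tf.data.Dataset.from_tensor_slices((input1, input2, output)), and then calling .shuffle(...) should likewise keep each triple together, since the tuple is a single element. A NumPy sketch of the permutation approach (function and array names are hypothetical):

```python
import numpy as np


def shuffle_together(*arrays, seed=None):
    """Shuffle several equal-length arrays with one shared permutation."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    perm = np.random.default_rng(seed).permutation(n)
    # Indexing every array with the same permutation keeps rows aligned.
    return tuple(np.asarray(a)[perm] for a in arrays)


# Hypothetical aligned data: row i of each array belongs together.
input1 = np.arange(10)
input2 = np.arange(10) * 10
output = np.arange(10) * 100
s1, s2, s3 = shuffle_together(input1, input2, output, seed=0)
```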

limber hemlock
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1640302301:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

sweet owl
#

so basically, for whoever wants to help: I'm trying to create a "touchscreen" using a side-facing camera. For now I want to see if the hand is touching the display, and idk where to start, since I know cv2 but I'm extremely weak with AI stuff

#

This is quite similar to a virtual mouse, except I would use a side-facing camera to determine whether the fingers are touching the screen or not

vivid plank
sweet owl
#

Ur a god

#

Thanks

vivid plank
delicate sphinx
sweet owl
#

@vivid plank it's kinda weird tho, and a whole process, like why do I need to make a PR

grave frost
#

what's i? put the whole snippet

delicate sphinx
#

i would be the batch size

grave frost
delicate sphinx
#

to cache all of the images, questions and answers I'd need about 64GB of RAM

grave frost
#

reduce the buffer size then

delicate sphinx
#

Wouldn't that drastically undermine the point of the shuffle if I could only use a buffer < 255?

#

since 1) that's a relatively small value (a perfect shuffle according to TensorFlow would need 248,001) and 2) from what I understand, a buffer smaller than the batch size means some values will be kept where they are

grave frost
#

if your batch size is 16, a buffer size of 32 would do?
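Roughly, tf.data.Dataset.shuffle(buffer_size) fills a buffer with buffer_size elements, emits a uniformly random one, and refills that slot with the next element. So the buffer size controls how far an element can move from its original position; it is a separate knob from the batch size. The pure-Python approximation below (not TensorFlow's actual implementation) makes the effect visible:

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Approximate tf.data-style shuffling with a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a random element (swap it to the end, then pop).
            i = rng.randrange(buffer_size)
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


for size in (1, 4, 1000):
    print(size, list(buffered_shuffle(range(20), size, seed=0)))
```

With buffer_size=1 the order is unchanged, with a small buffer elements only move locally (the element at output position i comes from input positions up to i + buffer_size - 1), and with a buffer at least as large as the data you get a full shuffle.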

delicate sphinx
#

It likely would be, I was thinking bigger = better in terms of batches though

#

I'm not sure if it actually would be

#

so was trying to keep it fairly high

grave frost
#

it probably won't be

delicate sphinx
#

My model is a mess lol

grave frost
#

try using prefetch too

delicate sphinx
#

Even using precision and recall metrics, all my model outputs is "yes"

#

yeah I prefetch my images, I should prefetch my one-hot answers too

grave frost
#

all models are like that at the start 🙂 DL is not that easy

delicate sphinx
#

the actual model processing takes about 0.8 seconds and loading all of the images (when I use the full dataset) takes about 4 seconds

#

so I always end up waiting for the I/O bottleneck anyway
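Prefetching overlaps loading the next batch with training on the current one, so the time per step tends toward the larger of load time and compute time rather than their sum. With roughly 4 s of loading against 0.8 s of compute, training would still be I/O-bound, just without the extra 0.8 s. In tf.data this is dataset.prefetch(tf.data.AUTOTUNE) (tf.data.experimental.AUTOTUNE in older versions). The idea in plain Python, as a sketch using a background thread (the prefetch name is hypothetical):

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from iterable while a background thread loads up to buffer_size ahead."""
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                q.put(("item", item))  # blocks when the buffer is full
        except Exception as exc:  # forward loader errors to the consumer
            q.put(("error", exc))
        else:
            q.put(("done", None))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "item":
            yield payload
        elif kind == "error":
            raise payload
        else:
            return
```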

grave frost
#

but over time, you'll quickly recognize what the problem is

delicate sphinx
#

Yeah, I'm just getting annoyed haha, no matter what I do it seems to only output "yes" so I'm doing all I can to change that

grave frost
#

yea, my model is bottlenecking too 😜 but I am a bit lazy and can wait more

delicate sphinx
#

I've tried kernel_regularizer, changing metrics to Precision and Recall (which I think is best), changing loss/optimizer etc. etc.

#

one change after the other and something I'm doing always ends up favouring the most common answer lol

grave frost
#

well, then speed shouldn't be a priority for you at this stage

delicate sphinx
#

yeah I'm using 10% of the full dataset

#

so it can do 1 epoch in about a minute as opposed to 45 minutes

grave frost
#

You can't expect to save much off 45 mins tbh

#

prolly 30 mins max

delicate sphinx
#

I can reduce that time but I'm trying to do extra processing to see if I can get a few varying results

#

but I still just get [9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] (once decoded that's "yes" with 15 spaces)

#

because like a big brain idiot I'm trying to also output up to 16 words ,-,

#

anyways, I'ma sleep cuz it's 1am rn, ty for the advice man

grave frost
#

hm..what's the task?

delicate sphinx
#

Visual Question Answering

#

Image + Question --> Answer

#

I've just thrown in like 6 metrics and I'm giving it a quick run-through now

grave frost
#

you're backpropping through 6 metrics?

grave frost
#

I doubt you'd get good accuracy without carefully adjusting your architecture

delicate sphinx
#

I'm using the ones given from this

#

ah :/

#

from that site I'm using:

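from tensorflow import keras

# Confusion-matrix counts plus precision/recall; each output unit is
# thresholded (0.5 by default) and counted as its own binary prediction.
METRICS = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
]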
grave frost
#

wuz the loss

delicate sphinx
#

but from a quick test it still outputs "yes"

#

tbh I seem to have deleted the code that gets the loss lmao, how do I get the loss from train_on_batch again?

grave frost
#

you can paste the code here, someone might look over it and help you out

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

delicate sphinx
#
[0.6561643481254578, 3789.0, 86.0, 100498472.0, 291.0, 0.9778064489364624, 0.9286764860153198]
#

The issue is much like the credit card fraud imbalanced dataset: I have a HUGE number of "yes" answers

#

except it's not a binary classification task and has 23,000 possibilities

#
Loss from last batch trained:  0.5620800852775574
TruePos from last batch trained:  3801.0
FalsePos from last batch trained:  24.0
TrueNeg from last batch trained:  100498536.0
FalseNeg from last batch trained: 279.0
Precision from last batch trained:  0.9937254786491394
Recall from last batch trained:  0.9316176176071167
#

Yeah it's scoring quite high just by guessing "yes"
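With metrics compiled, train_on_batch returns [loss, *metrics] in the order given by model.metrics_names, so the raw list above can be labelled by zipping. Recomputing precision and recall from the counts shows why they look so good: neither uses true negatives, and TN is huge because each one-hot output unit is thresholded as its own binary prediction, so almost all of the tens of thousands of vocabulary slots at every position are "correct negatives". Many of the true positives are probably the padding positions, too. The values below are the ones from the first printout; the names are assumed to follow ["loss"] plus the METRICS list.

```python
# train_on_batch output from above; names assumed to match model.metrics_names.
names = ["loss", "tp", "fp", "tn", "fn", "precision", "recall"]
values = [0.6561643481254578, 3789.0, 86.0, 100498472.0, 291.0,
          0.9778064489364624, 0.9286764860153198]
results = dict(zip(names, values))

tp, fp, tn, fn = (results[k] for k in ("tp", "fp", "tn", "fn"))
precision = tp / (tp + fp)                  # TN does not appear
recall = tp / (tp + fn)                     # TN does not appear
accuracy = (tp + tn) / (tp + fp + tn + fn)  # swamped by TN

print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.6f}")
```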

desert oar
#

imbalanced classification is hard in general

#

especially with a huge number of classes
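One common mitigation is to weight classes inversely to their frequency, e.g. the "balanced" heuristic n_samples / (n_classes · count_c) that scikit-learn uses. Keras's model.fit accepts a class_weight dict keyed by integer class index, though it may not be supported for sequence (3D) targets, in which case per-token sample weights are the usual alternative. A sketch with hypothetical answer labels:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical, heavily skewed answers.
answers = ["yes"] * 90 + ["no"] * 8 + ["two"] * 2
weights = balanced_class_weights(answers)
print(weights)
```

Rare answers get large weights and "yes" gets a small one, while the total weight over the dataset stays equal to the number of samples. For Keras, map the labels to the tokenizer's integer indices first.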

odd coral
#

Has anyone here coded counterfactual regret minimization?

coral sage
#

I'd like to have custom images instead of labels on my squarify treemap. How can I do that? I couldn't find anything on it when I checked.

#

This is the test data I'm working with for now

#

I plan on downloading the custom emoji images using requests and storing them in a folder with their emoji_id as the filename

#

This is what the treemap looks like, but instead of text labels I'd like the images with the respective filename

drifting mason
#

Hello, I need a little help

#

I am searching for proteins associated with a disease, let's say X,

#

in google

#

there are about a thousand

#

how do I automate the search and fetch only the proteins associated with the disease, as per Google?

#

any idea?

#

pls help

bold timber
#

Can anyone explain silhouette score to me?

delicate sphinx
#

For my loss and optimizer

#

I think one issue I have is that I try to output up to 16 words in 16 different lists, because I have answers between 1 word and 12 words long in my dataset

#

So if I run it too long with a kernel_regularizer, even with a tiny value it ends up forgetting even the most common answer and just outputs nothing

#

I'm not sure if it's natively masked, but without the regulariser it still seems to remember to output yes
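If index 0 is the padding token, the 15 padding slots after a one-word answer like "yes" can dominate an unmasked loss, which may be part of why the model settles on one common word plus padding. A common fix is to leave padded positions out of the loss. Keras's Embedding(mask_zero=True) propagates a mask in some architectures, but an explicit mask is easier to verify. A NumPy sketch of masked sparse categorical cross-entropy, assuming pad_id=0 and softmax outputs of shape (batch, seq_len, vocab):

```python
import numpy as np


def masked_sparse_cce(y_true, probs, pad_id=0, eps=1e-7):
    """Mean cross-entropy over non-padding positions only.

    y_true: (batch, seq_len) integer token ids
    probs:  (batch, seq_len, vocab) softmax outputs
    """
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    # Probability the model assigned to the correct token at each position.
    picked = np.take_along_axis(probs, y_true[..., None], axis=-1)[..., 0]
    nll = -np.log(np.clip(picked, eps, 1.0))
    mask = (y_true != pad_id).astype(float)  # 1 for real tokens, 0 for padding
    return float((nll * mask).sum() / max(mask.sum(), 1.0))


# One answer (hypothetical token id 2) followed by two padding slots.
y = np.array([[2, 0, 0]])
p = np.full((1, 3, 3), 1.0 / 3.0)
p[0, 0] = [0.25, 0.25, 0.5]
print(masked_sparse_cce(y, p))  # only position 0 counts: -ln(0.5) ≈ 0.693
```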

lapis sequoia
#

I'm wondering: are there any machine learning models that use past information to predict future outcomes, with time as the independent variable?

#

like doing time series forecasting on stock prices: is there any time-based model in machine learning?

delicate sphinx
#

I'm not sure about the time side of it, but in text there are models called RNNs that maintain a cell state, which carries information from earlier steps forward to later ones

#

And I think you may be able to create dense networks with a bias_initializer so you might be able to assign higher bias to ones that have historically scored better

#

I think you can also get bias weights from previous iterations of the network, though as far as I'm aware you'd have to build the model and compile it every time
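A common way to use past values for forecasting is to reframe the series as supervised learning with a sliding window: each input is the previous window values and the target is the next one. Any regressor (a linear model, gradient boosting, or an RNN/LSTM fed the same windows) can then be trained on the pairs. A minimal sketch; the function name and prices are hypothetical:

```python
def make_windows(series, window):
    """Turn a 1-D series into (inputs, target) pairs for one-step-ahead forecasting."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(window, len(series)):
        X.append(list(series[t - window:t]))  # the `window` values before t
        y.append(series[t])                   # the value to predict
    return X, y


prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]  # hypothetical
X, y = make_windows(prices, window=3)
```

When evaluating, split chronologically (train on earlier windows, test on later ones) instead of shuffling, otherwise future information leaks into training.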

lapis sequoia
#

thanks

mighty spoke
#

Hi, does anyone know how to compute the power spectral density of a particular stock's price time series?
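The simplest PSD estimate is the periodogram: the squared magnitude of the FFT, suitably scaled. Raw prices trend strongly, so it is often applied to returns (e.g. differenced log prices) rather than to prices; scipy.signal.periodogram and scipy.signal.welch do the same with more options. A NumPy sketch, assuming evenly spaced samples at rate fs (for instance fs=1 per trading day):

```python
import numpy as np


def periodogram(x, fs=1.0):
    """One-sided periodogram PSD estimate of an evenly sampled real signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the DC component
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * n)
    # Fold in negative frequencies: double every bin except DC (and Nyquist if n is even).
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    return freqs, psd


# Synthetic check: a cycle every 8 samples should peak at frequency 1/8.
t = np.arange(256)
freqs, psd = periodogram(np.sin(2 * np.pi * t / 8))
print(freqs[np.argmax(psd)])
```

For a price series, something like periodogram(np.diff(np.log(prices))) would be a typical starting point.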

lapis sequoia
lapis sequoia
#

It is not possible to transfer your consciousness to the virtual world. It is impossible. Maybe it will be possible to create a virtual, limited double of a person, their limited imitation of a character, etc., but it will be a lie, it will be a computer processed set of 0 and 1, not a real person or his actual consciousness. Man will never reach the intellectual level to create an artificial intelligence equal to his own. It is logically impossible. AI can do things faster and more accurately, but it won't be true intelligence, just a limited imitation of it. AI is not really intelligence, but an algorithm that mimics the characteristics of intelligence in a limited way. And that artificial intelligence does certain things faster and more accurately is due to the speed of computers and these algorithms, and the numerical nature of computers that compute everything much faster than humans with satisfactory accuracy, hence the efficiency and accuracy of these algorithms, but it is not intelligence.
Can someone tell me if I'm wrong or right?

cedar brook
#

Does anyone have any book recommendations for something between ISLR and The Elements of Statistical Learning w.r.t. difficulty/complexity?

burnt knot
#

So for the past few months I've been trying to build a bilingual voice cloning machine
The other day I ran into an issue, and I can't figure out whether it's just an obstacle or a permanent roadblock

#

I need to synthesise my audio, which requires trading my metadata files

#

But the process isn't streamlined in the slightest, meaning I have to take all my hundreds of datasets and manually write them in to be synthesised

#

Which I think might not be what I'm meant to do, if this is supposed to be dealing with one metadata file

#

I don't know, I feel as if there's still a chance for me to make this work, but I also feel as if everything I've been doing since August has been a complete mess
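If each clip has a matching transcript file, the metadata file can be generated instead of written by hand. A sketch assuming one folder of clip_001.wav / clip_001.txt pairs and an LJSpeech-style, pipe-separated metadata.csv (id|transcript|normalized transcript). Check the exact format your training repo expects, since some want two columns or full paths. The "normalized" column here just repeats the cleaned text; real normalization (expanding numbers etc.) is not done. For a bilingual set, you could run it once per language folder.

```python
import tempfile
from pathlib import Path


def build_metadata(wav_dir, out_file="metadata.csv"):
    """Write one 'id|text|text' line per .wav that has a matching .txt transcript."""
    wav_dir = Path(wav_dir)
    lines, missing = [], []
    for wav in sorted(wav_dir.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            missing.append(wav.name)
            continue
        # Collapse newlines/extra spaces; '|' is the column separator, so strip it.
        text = " ".join(txt.read_text(encoding="utf-8").split()).replace("|", " ")
        lines.append(f"{wav.stem}|{text}|{text}")
    out_path = wav_dir / out_file
    out_path.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return out_path, missing


# Tiny demo in a throwaway folder (file names are hypothetical).
demo = Path(tempfile.mkdtemp())
(demo / "clip_001.wav").write_bytes(b"")  # placeholder audio
(demo / "clip_001.txt").write_text("Hello there.\n", encoding="utf-8")
(demo / "clip_002.wav").write_bytes(b"")  # no transcript: reported as missing
meta_path, missing = build_metadata(demo)
print(meta_path.read_text(encoding="utf-8"), missing)
```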

narrow wren
#

Code design for a ML project

Hi everyone - I'd love your thoughts on something I'm working on:

I'm writing some code for a ML project that I'm working on. The rough pipeline is something like this:

  1. Pull raw training data from a SQL database
  2. Perform feature engineering
  3. Train the model on the data (hyperparameters have already been defined and stored in a config file)
  4. Save and upload trained model onto a server.
  5. Use trained model to score on new observations

All of this code will reside in a GitHub repo, and the same repo will house code for other models as well (in different folders within the repo) that are similar but have different features for the raw data for those models. My question is for point number 2 above:

How do I write modularized code for feature engineering? Should I define a specific class for the raw data that I pull for this model, and then define specific methods for each feature that I add to this dataframe? Or am I thinking about this wrongly? I apologise if my question doesn't make a lot of sense, but I'd appreciate any thoughts on this. Also, if you have links to a good public repo that I can look at to get some inspiration, that would be awesome too.

TIA

serene scaffold
narrow wren
serene scaffold
narrow wren
serene scaffold
narrow wren
# serene scaffold I guess you can make a function that takes a dataframe and returns a series, and...

Yes, that was my initial thought. But this function will only be applicable to a specific dataframe. For example, let's say the repo in this case hosts code for 5 different models. And let's say I have a file named get_raw_data.py which has different functions like get_raw_data_for_model1(), get_raw_data_for_model2(), ..., get_raw_data_for_model5(), each of which runs a SQL query to pull in raw data and returns a pandas dataframe required for each of the models. Each of these dataframes has different columns, and they are quite different from each other. So the functions I create in feature_engineering.py are not applicable to all 5 dataframes. In this case, how do I separate out these functions? More concretely, if I have a function create_feature_foo() in feature_engineering.py that is applicable only to dataframe 5, what's the most Pythonic way to do it?
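One way to keep model-specific features apart without a class per dataframe is a small registry: each feature stays a plain function (DataFrame in, Series out, as suggested above), tagged with the model it belongs to and the columns it needs. This is a sketch of one possible pattern, not a standard library API; all names (feature, FEATURES, build_features, "model5", unpaid_ratio) are hypothetical.

```python
from typing import Callable, Dict, List

import pandas as pd

FeatureFn = Callable[[pd.DataFrame], pd.Series]
FEATURES: Dict[str, List[FeatureFn]] = {}  # model name -> its feature functions


def feature(model_name: str, requires: tuple = ()):
    """Register a DataFrame -> Series feature function for one model."""
    def decorator(fn: FeatureFn) -> FeatureFn:
        def checked(df: pd.DataFrame) -> pd.Series:
            missing = set(requires) - set(df.columns)
            if missing:
                raise KeyError(f"{fn.__name__} needs columns {sorted(missing)}")
            return fn(df).rename(fn.__name__)  # column named after the function
        FEATURES.setdefault(model_name, []).append(checked)
        return fn
    return decorator


def build_features(model_name: str, raw: pd.DataFrame) -> pd.DataFrame:
    """Return the raw frame plus every feature registered for model_name."""
    new_cols = [fn(raw) for fn in FEATURES.get(model_name, [])]
    return pd.concat([raw, *new_cols], axis=1)


# feature_engineering.py (hypothetical): only applies to model 5's dataframe.
@feature("model5", requires=("unpaid_amount", "total_invoiced_amount"))
def unpaid_ratio(df: pd.DataFrame) -> pd.Series:
    return df["unpaid_amount"] / df["total_invoiced_amount"]


raw = pd.DataFrame({"unpaid_amount": [50.0, 0.0], "total_invoiced_amount": [100.0, 200.0]})
features = build_features("model5", raw)
```

Each feature can still be unit-tested on its own, and a model's pipeline only runs the features registered for it.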

iron basalt
narrow wren
iron basalt
#

The second method that takes more code makes sense if you want to make this a bit more robust against people new to the project that don't really yet know what they are doing or don't really care.

#

To that end you can have a dataframe format checking tool (in code) that takes in a given expected format.
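A minimal version of such a format check, as a sketch: the expected schema maps column names to NumPy dtype kinds ('i' integer, 'f' float, 'O' object/string, 'M' datetime). The function name and schema are hypothetical; libraries such as pandera offer much fuller versions of this idea.

```python
import pandas as pd


def check_schema(df, expected):
    """Raise ValueError if df lacks a column or a column has the wrong dtype kind."""
    problems = []
    for col, kind in expected.items():
        if col not in df.columns:
            problems.append(f"missing column {col!r}")
        elif df[col].dtype.kind != kind:
            problems.append(f"{col!r} has dtype {df[col].dtype}, expected kind {kind!r}")
    if problems:
        raise ValueError("; ".join(problems))
    return df  # returning df lets the check sit inside a method chain


MODEL5_SCHEMA = {"customer_id": "i", "unpaid_amount": "f"}  # hypothetical
ok = check_schema(pd.DataFrame({"customer_id": [1, 2], "unpaid_amount": [0.0, 10.5]}), MODEL5_SCHEMA)
```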

narrow wren
iron basalt
#

That's how you get those massive OOP projects with crazy class names that make no sense (cough Java libraries cough).

#

And they don't do much (pretty empty, just some getters and setters).

#

The time to make an object (often with no methods actually (data object)), is when you find yourself passing around the same arguments to different functions together all the time.

#

So you can think of an object / struct as being a shared stack frame for the variables.

#

Actually the way to get the optimal structure for a project is to first write out the entire thing flat (step 1), that is, no functions, no classes, just all inline. Then you look at which parts repeat or are very similar other than some differences in parameters. Take those repeat parts and make a function out of them (step 2). Note that this effectively compresses the code. Then look at your functions and see if they tend to have a bunch of arguments that can be grouped together / are passed around together. If they are, make an object out of them (with those variables as the members) (step 3). Again this compresses the code. Now go back to step 1. Extra step: sometimes it's worth splitting up the flat code even if it does not actually compress it further because you want to be able to read what is happening as high-level steps (like a story), this is often done in the main function / file.

#

This method will give you the optimal code in terms of size while still being readable. No function nor object is unnecessary.

#

All programming paradigms teach this method in an indirect way.

#

Note that this method uses hindsight. It does not try to predict which functions or classes need to be made and then make them (no upfront diagram (UML), only maybe after making it). It lets the code itself decide how it wants to look.

#

Prediction of classes and functions can be incorrect and lead to bad design.

safe elk
#

Done UML; not too great, yep

#

Evolve your code as needed

#

If you have UML, you have to update that too: more work

#

Document with UML when things are stable

narrow wren
stone marlin
#

Hm, I've done a significant amount of EDA with OOP and I thought it was fine, I don't think it's generally undesirable. But if someone is starting out, doing an easier or smaller project, or even just working alone, etc., I agree that it's sometimes sort'a overkill. Having said that, before anyone yells at me for liking OOP or whatever, I'll note my experience here.

My experience with Objects in EDA: For much of the data I worked with, there were components, subcomponents, etc., and these were typically all part of the same "thing" --- though I've done this even with travel data and other types of data, not just physically-modeled data. I can use classmethods to parse the appropriate parts of the data into objects which each have their own methods --- mainly cleaning and descriptor methods, as well as plotting methods. I'm also able to isolate different parts for inter-component feature engineering, and for component-to-component feature engineering it is enough to check for the class type to see if the components "go together". If one also forces an additive-only structure with feature engineering (at least until returning the df) then it's also trivial to add new features in the class.

I also feel that readability is significantly better than with the normal "imperative" EDA, where it's just a bunch of imperative code with comments above it saying what something does --- especially if you want to modify one single part of the imperative code and you didn't realize something lower required some strict size or something. Though, to be fair, I'm a huge stickler for Python's type-hinting + documentation --- usually my commit stuff runs mypy + sphinx and throws an error if something isn't documented --- though I'm on the extreme side of this, I know.

#

My bias, though, is to err on the side of verbosity and telling people exactly what needs to go in and come out. I know not everyone likes this, but it's been okay for me so far.

#

(This also works well with larger data, when you need to format / feature engineer around chunks and you need to make a DAG structure to do all that nonsense. But, again, that's kind of the "shared code" that Squiggle talks about above, just a specific point I wanted to note.)

iron basalt
# stone marlin Hm, I've done a significant amount of EDA with OOP and I thought it was fine, I ...

In Python one does need to fight the language a bit here by not being lazy on the type hinting. I like to also distinguish between API specification (what are the functions and classes, pre-conditions, post-conditions, side-effects, return values, errors, is the operation atomic?, security concerns, TODO, etc), and documentation (an extensive set of documents which can link to or contain the API specification as well). API specification can be auto generated by "documentation" tools, but documentation is often like writing a book and takes a lot of time. Documentation is often best done after when one actually knows what it looks like (rather than just prediction | giving it time to settle), because having to change it is a costly (in time) thing to do.

#

*The entire program does not need to be complete to write documentation; parts can be documented. I just look at the rate at which they were changed over time (commits). If a part has not been touched in a while (solidified), then I document it.

stone marlin
#

I don't disagree --- Python is great for prototyping, so to do anything for "production" (type-hinting, etc.) it does take a bit of work and boilerplate to make sure everyone's on the right page. Moreover, there's a lot of DS people that don't even know type-hinting is a thing, or that they can lint/format.

I'm not sure I understand the distinction you're making exactly between API docs and standard documentation, but it might be different here because we build our APIs separate from our EDA tooling and both take a different method of documenting (APIs use swagger, autogen usually; EDA uses self-written numpy docstyle). Either way, it's good to separate those otherwise it gets confusing, I agree.

The only thing I disagree with here (and mildly so) is that I make my peeps document when they're submitting a PR, even if things change soon, as well as write unit tests --- but, having said that, it's usually at a point then where things have "settled". Also, these are people who are doing this kind of thing a lot, so we already know the gist of what functions we should have. I also found that if I don't force them to document right away, it will literally never get documented, but maybe that's just me not badgering people enough later.

#

That's an interesting strategy --- I do not know if it would work for me in a team setting, but I can imagine if it's a solo project that isn't quite clear yet (initial stages of EDA, etc.) then, yeah, totally, I can see that being a reasonable way to go. Especially for early EDA when you're just scratchin' around at stuff.

#

One additional downside (at first) of my methodology is: it takes a LONG time to get people used to documenting, linting, type-hinting, etc. Eventually, it's second nature, but there's a pretty significant ramp-up time.

iron basalt
#

(It can take weeks)

stone marlin
#

Ahh, I understand. Yes, I agree --- sort of like, reporting or "long-term" documentation of tools for wide-spread use or something.

iron basalt
#

(Most projects can't afford this unless it's legacy / will stick around for a long time and not change too much / or they just have a lot of employees like the big popular game engines for example)

stone marlin
#

Yes, that documentation, I agree, should not be done until the tool is in a usable state and is not being used only by the team that made it. I misunderstood --- in the above, when I say "documentation" I mean doing google/numpy/whatever style docstrings and other minor things like that in a Python file, as well as Swaggering the API (or whatever doc system is used).

iron basalt
#

It's the sort of thing one creates to make sure that future employees can understand it all in total long after you are gone.

#

(An example would be like hardware documentation like Intel's official docs for their chips)

stone marlin
#

Yeah, that is a much more serious endeavor, one that I've luckily not had to do much. I don't think I'd require my team to do documentation like that on most of, if not all of, our EDA code. For a full ML project, probably only a small bit on how to run the pipeline (since it's generally standard pipelining).

iron basalt
# stone marlin Yeah, that is a much more serious endeavor, one that I've luckily not had to do ...

I think writing at least some comment per file (at the top; I like to include example code) is very nice. For the functions themselves, I like documenting the various assumptions being made (pre-conditions, post-conditions, etc.; every function has more of these than one might think), and in some cases covering up bad language design (like how C does not have multiple return values, so I need to say which argument is actually an output). I prefer very long function names (sentences sometimes) and variable names, so if those do not already tell you what it does, then maybe a comment still, but often they are (and are supposed to be) sufficient.
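A small illustration of that style in Python: type hints, numpy-style docstring sections spelling out the pre- and post-conditions, and an example inside the docstring that python -m doctest can check. The function itself is a hypothetical stand-in.

```python
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1.

    Parameters
    ----------
    weights : Sequence[float]
        Pre-condition: non-empty, every value >= 0, at least one value > 0.

    Returns
    -------
    List[float]
        Post-condition: same length as ``weights``, values in [0, 1] summing
        to 1 (up to floating-point rounding). The input is not modified.

    Raises
    ------
    ValueError
        If a pre-condition is violated.

    Examples
    --------
    >>> normalize_weights([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-empty, non-negative, and not all zero")
    total = sum(weights)
    return [w / total for w in weights]
```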

stone marlin
#

Agree with everything here. Especially in the context of Python.

#

Seems legit. I wanted above to note that these things are options, but for someone starting out in their journey / doing a smaller project, yeah, probably focusing on other things is more reasonable, as y'all noted above.

lapis sequoia
#

Hi all, has anyone used SHAP with Facebook's Prophet?

safe elk
#

Lol we all agree

hoary wigeon
#

Hey, I need a good project topic for developing a deep learning model.

Can anyone suggest some good topics?

trim cedar
# lapis sequoia ``` It is not possible to transfer your consciousness to the virtual world. It i...

DeepMind's AlphaZero is said to have learned chess on its own by playing against itself, and it then defeated the then-best chess engine, Stockfish! In that case, it has also incorporated self-learning and the use of its previous learning in subsequent attempts, to play even better, etc.
Other than this, human intelligence is capable of creative content creation like poetry or art, which models like GPT-3 are already capable of to some basic extent, right?
Would this suggest the contrary to you?

austere swift
bold timber
#

Hi, I'm wondering about this plot. Why isn't an 'other' value plotted for the 'Terminal' feature?

trim cedar
bold timber
trim cedar
rose loom
#

Hi guys, I have homework for my artificial intelligence class. Its topic is "artificial intelligence in law". They asked us to design an artificial intelligence program to help lawyers. But everything I can think of has been done. I want to get your opinion too. Can you share your ideas with me?

rose loom
#

I think that's a good idea. Thanks 😊

odd meteor
# rose loom hi guys, i have homework for artificial intelligence class. Its topic is "artifi...

Hmm, interesting... 😀

You could explore topic modelling for fraud detection using LDA. I don't mean Linear Discriminant Analysis; I mean the other LDA, Latent Dirichlet Allocation, which is used for topic modelling.

If you then want to make your work more compelling (or more sophisticated), dig into self-supervised vs. semi-supervised learning and compare their results with those from your LDA topic model.
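To make the LDA idea a bit more concrete: Latent Dirichlet Allocation models each document as a mixture of topics, and each topic as a distribution over words. Below is a minimal collapsed Gibbs sampler in plain Python. It assumes the documents are already tokenized. The toy corpus, `alpha`, `beta` and the iteration count are all illustrative assumptions. For a real project you'd more likely reach for scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on pre-tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d),
    topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic

    # Start from a random topic for every token
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ...resample its topic from the full conditional
                # P(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # ...and put it back under the newly sampled topic
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
                  for k in topics]
    return doc_topic, topic_word


# Made-up toy corpus: two "fraud-ish" and two "legal-ish" documents
docs = [
    "wire transfer offshore account urgent".split(),
    "invoice payment vendor account wire".split(),
    "court ruling appeal judge verdict".split(),
    "judge court hearing lawyer appeal".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])  # top 3 words per topic
```

With only four tiny documents the topics will be noisy. The point is the update rule: each word's topic is resampled in proportion to two things. One is how common that topic is in the word's document. The other is how common the word is within that topic.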

bold timber
#

Should we scale the data before clustering?

#

I want to use DBSCAN, but I'm confused about when to scale the data.
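On when to scale: DBSCAN's `eps` is a raw distance threshold. If one feature has a much bigger range than the others (say, income in dollars vs. age in years), that feature dominates the Euclidean distance. The usual approach is to scale first, then cluster.

Below is a plain-Python sketch with z-score scaling and a bare-bones DBSCAN. The data and both `eps` values are made up for this toy example. With scikit-learn, the equivalent would typically be `StandardScaler` followed by `DBSCAN(eps=..., min_samples=...)`.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column, (x - mean) / std, so features have comparable scales."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def dbscan(points, eps, min_samples):
    """Bare-bones DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, so min_samples counts it too
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical data: [age in years, annual income in dollars]
data = [[25, 40_000], [27, 41_000], [26, 40_500],
        [60, 40_200], [62, 40_800], [61, 41_100]]
print(dbscan(data, eps=1500, min_samples=2))             # [0, 0, 0, 0, 0, 0]
print(dbscan(standardize(data), eps=1.6, min_samples=2))  # [0, 0, 0, 1, 1, 1]
```

Unscaled, every point is within 1500 of every other point, so income swamps age and everything lands in one cluster. After scaling, the two age groups separate.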

bold timber
#

But do you know about the silhouette score?

#

Is it true that the silhouette score is the method for determining clusters when the data has labels?

odd meteor
# bold timber but do you know about silhouette score?

I only know that some people use the silhouette metric to gauge the performance of their clustering models. I haven't used it myself, though, so I don't know for sure.

Have you searched Google yet? I'm sure it'll have the right answer.
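For reference, on the silhouette question: the silhouette score does not need ground-truth labels. It only uses the cluster assignments your algorithm produced and the distances between points. That's why it's handy for comparing clusterings, e.g. different `k` or `eps`. If you do have true labels, external metrics such as the adjusted Rand index are the usual choice.

Here's a plain-Python sketch of the standard definition, s(i) = (b - a) / max(a, b). Singleton clusters get 0, as scikit-learn's `silhouette_score` does. The sample points are made up.

```python
import math
from statistics import fmean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters).

    For point i: a = mean distance to the other members of its own cluster,
    b = lowest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b), with s(i) = 0 for singleton clusters.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")

    scores = []
    for i, lab in enumerate(labels):
        same = [j for j in clusters[lab] if j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = fmean(math.dist(points[i], points[j]) for j in same)
        b = min(
            fmean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(silhouette_score(points, [0, 0, 1, 1]))  # well separated: close to 1
print(silhouette_score(points, [0, 1, 0, 1]))  # mixed up: negative
```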

bold timber
#

But I'm wondering about this: when a column is in a datetime format like this, is it advisable to drop it, or to keep it and include it in the scaling?
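On the datetime question: scalers only accept numbers, so a raw datetime column can't be scaled as-is. You either drop it or turn it into numeric features first.

One common option, sketched below with only the standard library, does two things:

- converts the value to a Unix timestamp, for an overall trend over time;
- encodes cyclic parts like hour of day with sine/cosine, so 23:00 and 00:00 end up close together.

The format string and the UTC assumption are guesses; adjust them to match your data.

```python
import math
from datetime import datetime, timezone


def datetime_features(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Turn a datetime string into numeric columns that a scaler can handle."""
    # Assumption: timestamps are in UTC and match `fmt`
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    hour_angle = 2 * math.pi * dt.hour / 24
    return {
        "unix_seconds": dt.timestamp(),    # overall trend: seconds since 1970-01-01 UTC
        "hour_sin": math.sin(hour_angle),  # cyclic encoding, so 23:00 ends up
        "hour_cos": math.cos(hour_angle),  # close to 00:00 instead of far away
        "day_of_week": dt.weekday(),       # 0 = Monday ... 6 = Sunday
    }


print(datetime_features("2022-03-14 23:00:00"))
```

These numeric columns can then be scaled along with the other features. If time doesn't actually matter for the clusters you're after, dropping the column is a reasonable choice too.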

safe elk
#

weapons of math destruction” (WMDs). She defines WMDs as opaque mathematical models that embed human prejudice, misunderstanding, and bias into the software systems that automate numerous aspects of our lives. Her book covers several types of these models and the frustrating injustices they can perpetrate. In addition to case studies about credit scoring, online advertising, employment, and insurance, O’Neil discusses the use of WMDs in the criminal justice system

#

Explore possible bias in models that causes inequality, and suggest ways to fix the models.
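One concrete starting point for auditing a model for bias is to compare its outcomes across groups. Below is a minimal sketch that computes two common group-fairness measures from a model's predictions:

- **Demographic parity difference:** the gap between groups' positive-prediction rates.
- **Disparate impact ratio:** the lowest rate divided by the highest. The "four-fifths rule" flags values under 0.8.

The predictions and group labels are made-up illustration data in the spirit of the credit-scoring examples from the book.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Fraction of positive (1) predictions for each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def fairness_report(predictions, groups):
    rates = positive_rates(predictions, groups)
    highest, lowest = max(rates.values()), min(rates.values())
    return {
        "rates": rates,
        "parity_difference": highest - lowest,                     # 0 means equal rates
        "disparate_impact": lowest / highest if highest else 1.0,  # < 0.8 fails four-fifths rule
    }


# Hypothetical loan-approval predictions (1 = approved) for two groups
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(fairness_report(preds, groups))
```

Here group A is approved 75% of the time vs. 25% for group B. That makes the ratio about 0.33, well under 0.8. Possible fixes (reweighting training data, changing features, adjusting decision thresholds per group) can then be judged by how much they move these numbers.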

#

The title is funny too.

#

Oh yeah, I read the book. It's informative and entertaining.

spark apex
#

Hey,
I want to do pose detection on the web,
so I tried MoveNet with ONNX and TF.js.
For PC:
ONNX: ~20 FPS
TF.js: ~40 FPS

For mobile:
ONNX: 2 FPS
TF.js: 6 FPS

What can I do to improve the speed? I'm doing all inference client-side in the browser.

Are there any other models I can try?
Or something completely different I could try?

Thanks in advance for the help.

austere swift
#

There are also things like TensorRT, which do lower-level optimizations as well.

spark apex