#data-science-and-ml


frail locust
#

how do you type in code like that in discord

thin pecan
#

@frail locust Use the ` character once on each side of the words to do this: `this`
If you want to do full code snippets, use it three times on each side, like:

```
this
Hello
```
#

Wow

#

Use the ` key

frail locust
#

ty

slow onyx
#

Hi guys! Could please somebody help me with a computer vision task?

lapis sequoia
#

Hey guys hows the mit course on linear algebra on youtube for data science ?

#

hey guys

arctic cliff
desert oar
#

@frail locust for visual display, or actually in the number?

#

@arctic cliff are those strings "NaN"?

#

oh hm

#

thats actually a bit weird

arctic cliff
#

The weirdest thing is the whole dataset has 9879 rows

#

Do 0 values affect ?

desert oar
#

0 is not the same as null

#

so no

arctic cliff
#

I see

orchid badge
#

Hi, I’m using Tensorflow 1.15.3 / Ludwig with Google Colab and have followed this guide to the letter: https://www.searchenginejournal.com/automated-intent-classification-using-deep-learning-part-2/318691/#close
When I run !ludwig experiment --data_csv Question_Classification_Dataset.csv --model_definition_file model_definition.yaml
I get an error File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 491, in __getattr__ raise _exceptions.UnparsedFlagAccessError(error_message) absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --preserve_unused_tokens before flags were parsed.

Here’s the link to Colab if you want to take a peek: https://colab.research.google.com/drive/1LZ9aA06B3wXysgGc8tfVHi8FZqZFx2xZ?usp=sharing

Thanks for your time. Been Googling all afternoon and don't want to give up! Hopefully not too much of a noobish question.

desert oar
#

@lapis sequoia maybe something like this
```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data.csv', parse_dates=['timestamp'])

# Count violations per hour and per directive.
violations_hourly = (
    data
    .groupby([pd.Grouper(key='timestamp', freq='60T'), 'VIOLATED_DIRECTIVE'])
    .apply(lambda x: x.shape[0])
    .to_frame('count')
    .reset_index()
)

# One line per directive on a shared axes.
fig, ax = plt.subplots()
for lab, grp in violations_hourly.groupby('VIOLATED_DIRECTIVE'):
    ax.plot(grp['timestamp'], grp['count'], label=lab)
fig.legend()
plt.show()
```

#

using .count() itself for some reason resulted in an empty dataframe

#

not sure why

arctic cliff
#

@desert oar
The blue df contains 2 int numbers
Can I split the x number like the y ?

lapis sequoia
#

thanks @desert oar I'll test it out and let you know

#

many thanks

desert oar
#

@arctic cliff

ax = plt.gca()
ax.set_xticklabels(blue['blueDragons'].unique())

maybe try something like this

arctic cliff
#

AttributeError: 'numpy.int64' object has no attribute 'unique'
@desert oar

#

It's only one value
But I want to automatically split it into 7 pieces so the plot can be more logical

#

I still don't know if I should let it like that ..

desert oar
#

oh

#

how does that make sense

#

how did you even plot a single value

#

ohhh i see what you did

#
ax = plt.gca()
ax.set_xticklabels(blue.columns)
#

@arctic cliff

#

anyway ax.set_xticklabels is what you want

#

plt.gca() is "Get Current Axes"

#

an "Axes" (in matplotlib terminology) is the area that you plot in

#

a "figure" is a grid of one or more axes

modest rune
#

In scipy, if I understand everything correctly, scipy.stats.norm.cdf(x) returns the probability that a value <= x will occur under a standard normal distribution. Thus x = 0 gives 0.5, x = infinity gives 1.0, and x = -infinity gives 0.0.

But can someone explain what scipy.stats.norm.pdf(x) does?

desert oar
#

the pdf is the probability density

#

for a continuous distribution its not really interpretable in and of itself, but

#

its analogous to density in physics/chemistry

#

integral of density within a certain domain = mass in that domain

modest rune
#

for a continuous distribution its not really interpretable in and of itself, but
@desert oar

That is what is confusing me. Because, scipy seems to let you do this.

desert oar
#

they have a lot of use

#

just not a lot of interpretation without more context

tidal bough
desert oar
#

^

modest rune
#

Ok, so PDF(0) is equal to 0

desert oar
#

no

tidal bough
#

they are both used, because well, you can easily get one from the other.

desert oar
#

PDF(x) is derivative of CDF at x

#

PDF(x) = (d CDF / dx)(x)

#

it makes more sense intuitively if you look at discrete distributions

#

for which PDF(x) = P(X = x)

#

whereas for a continuous distribution P(X = x) is always 0 because measure theory
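
A quick numerical check of the statements above, as a sketch using `scipy.stats.norm` for the standard normal. The point `x`, the step size `h` and the interval `[a, b]` are arbitrary choices made for illustration.

```python
from scipy.integrate import quad
from scipy.stats import norm

x = 0.7
h = 1e-5

# A central finite difference of the CDF approximates the PDF at x.
numeric_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(numeric_pdf, norm.pdf(x))

# Integrating the density over [a, b] gives the probability mass in [a, b].
a, b = -1.0, 1.0
mass, _ = quad(norm.pdf, a, b)
print(mass, norm.cdf(b) - norm.cdf(a))

# The density at 0 is not 0; for the standard normal it is 1 / sqrt(2 * pi).
print(norm.pdf(0.0))
```

The density itself can exceed 1 for narrow distributions, which is another reason it should not be read as a probability.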

modest rune
#

ok, that makes sense. So, if I have a function... like the Black-Scholes model that has one or more CDFs in it, and I want to find the derivative of that function, all CDFs turn into PDFs (times the derivative of whatever is inside them).

desert oar
#

so you can only talk about P(X <= x) in a continuous distribution, which is the CDF

#

correct

tidal bough
modest rune
#

excellent, you all were great help 🙂

tidal bough
#

The red is the PDF of a Gaussian (normal) distribution, the blue is its CDF. For the normal distribution, the CDF can be written in terms of the error function.

arctic cliff
#

I guess my plotting itself is wrong .. @desert oar

#

My columns contain only 0 or 1

#

So I'm summing

#

That's why I end up with only one value

modest rune
#

I ran into this while learning how to back-solve the implied volatility of a stock option from its current market price. The process required the derivative of the Black-Scholes model.
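
A minimal sketch of that back-solving step, assuming a European call without dividends and plain Newton iteration. The standard library's `statistics.NormalDist` is used here as a dependency-free stand-in for `scipy.stats.norm`; the function names, the starting guess and the sample inputs are all made up for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal, exposes .cdf and .pdf


def d1(S, K, T, r, sigma):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    a = d1(S, K, T, r, sigma)
    b = a - sigma * sqrt(T)
    return S * N.cdf(a) - K * exp(-r * T) * N.cdf(b)


def vega(S, K, T, r, sigma):
    """Derivative of the call price with respect to sigma: the CDF became a PDF."""
    return S * N.pdf(d1(S, K, T, r, sigma)) * sqrt(T)


def implied_vol(price, S, K, T, r, sigma=0.2, tol=1e-8, max_iter=100):
    """Newton's method on sigma; the starting guess of 0.2 is arbitrary."""
    for _ in range(max_iter):
        diff = call_price(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            return sigma
        sigma -= diff / vega(S, K, T, r, sigma)
    raise RuntimeError("Newton iteration did not converge")


# Round trip: price an option at a known volatility, then recover it.
market_price = call_price(S=100.0, K=105.0, T=0.5, r=0.01, sigma=0.35)
recovered = implied_vol(market_price, S=100.0, K=105.0, T=0.5, r=0.01)
print(recovered)
```

Newton's method can misbehave for deep in- or out-of-the-money options where vega is tiny, so a bracketing solver may be the safer choice there.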

desert oar
#

maybe something like this

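import matplotlib.pyplot as plt

blue = df[['blueDragons', 'blueWins']].sum()  # Series with one total per column
plt.plot([0, 1], blue)
plt.xticks([0, 1], blue.index)  # pin the tick positions before labelling them
plt.xlabel('Dragon Effect')
plt.ylabel('Winnings')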
#

@arctic cliff

arctic cliff
#

Oh

#

forgot something..

#

It's the same as before ..

#

Does that plot make sense to you ?
Can you get any useful info from it ?

#

Maybe it's right I don't know

tidal bough
#

the x-axis is missing on the new ones

arctic cliff
#

Guess I don't even need a visualization for that kind of comparing ?

#

Well, They're actually not
X and Y are only 2 values
1 for x
1 for y

desert oar
#

the idea of my code was to try and control the X axes more. you can keep experimenting

#

but yeah you should just print those values imo

#

no purpose in graphing 2 points

tidal bough
#

wait, you have two points?

desert oar
#

@tidal bough they did .sum() on 2 columns

tidal bough
#

Then yeah, lol, maybe don't connect them with a line, that's very misleading 😅

desert oar
#

df[['a', 'b']].sum() returns a Series with 2 numbers and index values 'a' and 'b'

tidal bough
#

it's not really something you see in How To Lie With Statistics. Not even there do people get 2 points of data and pretend it's a straight line.

desert oar
#

i think they're just trying to make a plot lol

#

matplotlib docs are a swamp

arctic cliff
#

xD Gotcha !

tidal bough
#

they are

desert oar
#

try plt.scatter or plt.plot(x, y, '.')

#

or plt.plot(x, y, 'o') (i think)

tidal bough
#

or marker = "d",linestyle="", I think

arctic cliff
#

I tried the scatter thing on both the series and the original df columns

#

I got some weird outputs

#

Not weird if they make sense ..

#

Something you can expect from 0 and 1

desert oar
#
df.plot.scatter('blueDragons', 'blueWins')

did you try this?

#

oh yeah just 0 and 1

#

how about a cross table

arctic cliff
#

What's a cross table ?

desert oar
#
pd.crosstab(df['blueDragons'], df['blueWins'])
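
A small sketch of what the cross table looks like. The eight rows below are invented; only the column names come from the chat.

```python
import pandas as pd

# Invented toy data: 1 = yes/won, 0 = no/lost.
df = pd.DataFrame({
    "blueDragons": [0, 0, 0, 1, 1, 1, 1, 0],
    "blueWins":    [0, 0, 1, 1, 1, 0, 1, 0],
})

# Counts of every (blueDragons, blueWins) combination.
counts = pd.crosstab(df["blueDragons"], df["blueWins"])
print(counts)

# Row-normalised version: share of losses and wins with and without a dragon.
rates = pd.crosstab(df["blueDragons"], df["blueWins"], normalize="index")
print(rates)
```

For two 0/1 columns this table carries all the information a scatter plot would, without stacking thousands of points on four coordinates.
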
arctic cliff
#

Wait

#

Oh

#

I would prefer printing

bleak fox
#

@arctic cliff is the issue resolved? If not please provide me some background I may help you in this

arctic cliff
#

Give me a second

bleak fox
#

@arctic cliff thanks, i have gone through with this... Now please share what is the exact problem which you are facing?

arctic cliff
#

I tried to make a plot about the relationship between blue/redWins and blue/redDragons

bleak fox
#

@arctic cliff what is the point in plotting these 4 points...?

arctic cliff
#

To show the relationship (the correlation) between the two of them

bleak fox
#

To show correlation we generally use a scatter plot, so you can plot df.blueWins vs df.redWins (all values)

#

Also for correlation, you can use df.corr() , to print

arctic cliff
#

Give me a second

bleak fox
#

df[['blueWins','blueDragons', 'redWins','redDragons']].corr()

arctic cliff
#

plt.scatter(df['blueWins'], df['blueDragons'])

bleak fox
#

Use all the values of these columns. Does your main df have only 4 rows?

arctic cliff
#

Nope
But it contains only 2 kind of values

#

0 and 1

#

1 = Won
0 = Lost

bleak fox
#

Can you share access of your notebook with me?

lapis sequoia
#

does anyone know why this sql query is not working in the 'WHERE' clause

arctic cliff
#

Sure thing, Wait

bleak fox
#

does anyone know why this sql query is not working in the 'WHERE' clause
@lapis sequoia share query

lapis sequoia
#
select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE mon != 8 and yyyy != 2020
GROUP BY 1,2
ORDER BY 2,1
#

getting column 'mon' does not exist

#

in the where clause

#

using psql

#

it works fine when i exclude the where clause but the columns are named mon and yyyy

bleak fox
#

Put "mon" And same for "yyyy" And try once

lapis sequoia
#

in the select or where clause?

arctic cliff
#

I guess I need to search for you to add you to the Collaborators ?

bleak fox
#

in the select or where clause?
@lapis sequoia in where clause

lapis sequoia
#

im still getting a column "mon" does not exist

arctic cliff
#

Do you have a kaggle account ?

lapis sequoia
#

WHERE "mon" != 8 and "yyyy" != 2020

bleak fox
arctic cliff
#

Done

bleak fox
#

WHERE "mon" != 8 and "yyyy" != 2020
@lapis sequoia nav, the names mon and yyyy are just display aliases. You can repeat the same expression in the WHERE clause as well, like WHERE extract(year from tstamp) != x

lapis sequoia
#

oh yeah i understand that

#

i just wanted to know why it doesnt work for an alias

#

so i cant use an alias in the where clause?

bleak fox
#

i just wanted to know why it doesnt work for an alias
@lapis sequoia the WHERE clause is evaluated on the db side against the actual column names, before the SELECT list is computed. The alias only names the result after extraction, so it doesn't exist yet when WHERE runs. Hence WHERE always requires the actual column name or expression

#

@arctic cliff added heatmap and correlation matrix for your data in notebook cell 6 and 7

lapis sequoia
#

ok thank you @bleak fox

#

also last question

bleak fox
#

also last question
@lapis sequoia I'll try

lapis sequoia
#
```sql
select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE extract(month from tstamp) != 8 and extract(year from tstamp) != 2020
GROUP BY 1,2
ORDER BY 2,1
```
#

why does this result in all the rows containing 8 in the month AND all the rows containing 2020 in year to be lost?

arctic cliff
#

What can I understand from a heatmap? It looks unfamiliar to me

lapis sequoia
#

i just want 8-2020 to be lost

#

the where clause seems to be the problem

bleak fox
#
```sql
select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE extract(month from tstamp) != 8 and extract(year from tstamp) != 2020
GROUP BY 1,2
ORDER BY 2,1
```

@lapis sequoia this is what your filter is doing if month is 8 and year is 2020 don't include them...

lapis sequoia
#

yes

#

but it gets rid of 8-2017, 8-2019, 8-2020, AND all the months of 2020

#

so it gets rid of 1-2020, 2-2020, 3-2020 as well

#

i just want only 8-2020 gone

#

ok wait i changed it to OR and it worked

#

i dont know why lol
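
The switch to OR works because of De Morgan's law: "not (month = 8 and year = 2020)" is the same as "month != 8 or year != 2020". A sketch of the three filters in Python, on a handful of made-up (month, year) pairs:

```python
rows = [(8, 2017), (8, 2019), (1, 2020), (2, 2020), (8, 2020), (7, 2019)]

# Original WHERE: both conditions must hold, so every August row
# and every 2020 row is dropped.
with_and = [(m, y) for m, y in rows if m != 8 and y != 2020]

# Intended filter: drop only the single combination 8-2020.
intended = [(m, y) for m, y in rows if not (m == 8 and y == 2020)]

# De Morgan: not (A and B) is equivalent to (not A) or (not B).
with_or = [(m, y) for m, y in rows if m != 8 or y != 2020]

print(with_and)
print(with_or == intended)
```

In SQL the intended filter can also be written directly as `WHERE NOT (extract(month from tstamp) = 8 AND extract(year from tstamp) = 2020)`.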

bleak fox
#

What can I understand from a heatmap? It looks unfamiliar to me
@arctic cliff it gives the correlation between all your columns: values near 1 (or -1) show a strong correlation, whereas values near 0 show little or no linear relationship

#

so it gets rid of 1-2020, 2-2020, 3-2020 as well
@lapis sequoia happy for you😀

lapis sequoia
#

thanks!

#

for helping

bleak fox
proper swift
#

quick question, (apologies if im using the wrong terminology) is there a way to stop jupyter notebooks from automatically moving the notebook every time i click a cell? Its driving me insane

bleak fox
#

quick question, (apologies if im using the wrong terminology) is there a way to stop jupyter notebooks from automatically moving the notebook every time i click a cell? Its driving me insane
@proper swift can you please elaborate what is moving and where?

proper swift
#

sorry first time using jupyter, everytime i try to click the end of piece of code, the notebook jaggedly moves. I have to scroll with the mouse to get it back into a more suitable position
I have no additional extensions installed. Im only using vanilla Jupyter on Windows 10 with Python 3.8

bleak fox
#

sorry first time using jupyter, everytime i try to click the end of piece of code, the notebook jaggedly moves downwards. I have to scroll with the mouse to get it back into more suitable position
@proper swift sorry bro... It seems some issue with your browser/os/jupyter settings.. It is outside my scope... 😩

proper swift
#

😦

bleak fox
#

@proper swift you can use vs code notebooks... I feel they are better than jupyter notebooks

proper swift
#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks

bleak fox
#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks
@proper swift look it is just a place where you write code... You will be easily able to do things in vs code notebooks with same commands...

#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks
@proper swift check this out https://youtu.be/sHk9PH-9tSs

proper swift
#

thanks for the link, will check it out

bleak fox
#

@proper swift welcome...

#

@bleak fox Are u a data science student?
@lapis sequoia no, i am a professional with 7+ year of experience in this field.

tidal sonnet
#

link to where i can find out more about data science?

#

other than wikipedia?
Or is wikipedia reliable?

bleak fox
bitter harbor
#

@tidal sonnet Data sci is more than ml tho

tidal sonnet
#

ik

#

i picked py cause i wanted to learn ml
but i also want to know more about other parts of data science

bitter harbor
#

well imo, databases are a big part of it

#

as in, if you learn to use them, they'll be pretty useful

#

numpy + pandas + matplotlib are some key libraries to learn as well

tidal sonnet
#

noted

#

thank youuuu

desert oar
#

python is pretty much used in every field nowadays

bitter harbor
#

ngl still haven't used it for ela

desert oar
#

ela?

bitter harbor
#

english

desert oar
#

oh. people use it in the humanities, albeit more rarely

#

pandoc (in haskell) was written by a philosophy professor, if i recall right

bitter harbor
#

hm I'll have a look into it

desert oar
#

you have cases where people in the humanities write python scripts to manage their reference lists, things like that

#

not typically used directly in research, but can definitely be used as an automation tool by researchers

solid aurora
#

So I'm trying to write a k-fold algorithm that maintains class balance (like sklearn's StratifiedKFold) and doesn't split groups (like sklearn's GroupKFold)

#

I'm not sure how to go about doing that though

#

any ideas for a basic algorithm I could follow?

flat quest
#

@solid aurora
Not really any particular algorithm, except for getting all the elements for each class, finding how many elements you need to have an even ratio, and then throwing away the extra elements (this could be problematic tho)

solid aurora
#

@flat quest I'm not trying to delete elements from my dataset at all

#

If my class balance is 1:4, StratifiedKFold will make all folds approximately 1:4 as well

#

meaning it purposely tries to maintain that ratio rather than leaving it up to probability

flat quest
#

well one way to go about it would be lets say your fold has 1000 elements and there's a ratio of 4 cats to 1 dog.

The dataset has 4000 cats to 1000 dogs. So you calculate the number of cats that you'll need, then get all the cats in the ds and randomly select that number of elements you need, then do the same for the dogs.

There's probably a faster way to do it, but that's just one way to do it @solid aurora

solid aurora
#

@flat quest then I also need to not split groups across folds (like GroupKFold)

#

what you described is basically StratifiedKFold, which is what I would normally use unless I needed groups

#

going with your example of cats+dogs, let there be owners who each own anywhere from 1 to 100 cats and dogs

#

owners can have both cats and dogs

#

then I'm not allowed to split the pets of an owner across two or more different folds

#

but let's say I have 1000 pets and I want 5 folds, then I still need 160 cats and 40 dogs per fold

#

@flat quest make sense?

flat quest
#

ah gotcha

So not split groups across folds, but you want to balance the overall classes
You can't get perfectly equal class ratios across each fold, but you can make an approximation

Ok so one way would be.

Calculate the number of elements for each class that should be in the fold.
Then select the group that can reduce the required number of elements for that fold by the greatest (so lets say fold needs 4000 cats 1000 dogs, but group has 80 cats 20 dogs, the remaining elements required would be 3920 cats 980 dogs).
Continue doing so until we hit a certain threshold for all classes.

This would require some calculation steps to find the one that can reduce the required elements by the greatest. It might be slow. It'll also congregate all the large groups into the first few folds.

Another way would be to again calculate the number of elements of each class for the fold, but then select a group at random. Continue to do so, until one or all the classes have surpassed a threshold. This one is faster and will distribute the groups better, but it will be more error prone.

For example we might have many groups that are 100 cats 1 dog. This might cause the class distribution for the fold to be like 8,000 cats to 1000 dogs when all the class counts reach their threshold.

@scarlet wigeon
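
A greedy sketch in the spirit of the first approach above, with one change that should be named plainly: instead of filling one fold at a time, it places each group (largest first) into whichever fold it improves the most. The function name, the imbalance measure and the toy owners are all invented for illustration. Newer scikit-learn releases also provide `StratifiedGroupKFold`, which may be worth checking first.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_folds):
    """Greedy sketch: returns ({group: fold index}, per-fold class counts)."""
    classes = sorted(set(labels))
    # Per-group class counts, e.g. {"a": Counter({"cat": 6, "dog": 2})}.
    group_counts = defaultdict(Counter)
    for label, group in zip(labels, groups):
        group_counts[group][label] += 1

    totals = Counter(labels)
    target = {c: totals[c] / n_folds for c in classes}  # ideal count per fold
    fold_counts = [Counter() for _ in range(n_folds)]
    assignment = {}

    def imbalance(counts):
        # Squared relative distance of a fold from the ideal class counts.
        return sum(((counts[c] - target[c]) / target[c]) ** 2 for c in classes)

    # Place large groups first; the small ones then fill the remaining gaps.
    by_size = sorted(group_counts.items(), key=lambda kv: -sum(kv[1].values()))
    for group, counts in by_size:
        def change(i):
            return imbalance(fold_counts[i] + counts) - imbalance(fold_counts[i])
        best = min(range(n_folds), key=change)
        fold_counts[best] += counts
        assignment[group] = best
    return assignment, fold_counts


# Toy data: owner -> (number of cats, number of dogs).
owners = {"a": (6, 2), "b": (6, 1), "c": (4, 1), "d": (4, 1), "e": (2, 1), "f": (2, 0)}
labels, groups = [], []
for owner, (n_cats, n_dogs) in owners.items():
    labels += ["cat"] * n_cats + ["dog"] * n_dogs
    groups += [owner] * (n_cats + n_dogs)

assignment, fold_counts = stratified_group_folds(labels, groups, n_folds=2)
print(assignment)
print(fold_counts)
```

On real data the folds will only be approximately balanced, and a single very large group can dominate a fold no matter how the rest is arranged.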

fervent bridge
#

A good complete blood cell dataset that I could be linked to?

bleak fox
#

@tidal sonnet Data sci is more than ml tho
@bitter harbor 100% right

solid aurora
#

@flat quest yea you're right that will probably be close enough

#

I was vastly overcomplicating it lol

#

I was trying to liken this to the packing problem and 0/1 knapsack and all

#

🙂

flat quest
#

yeah maybe, but if you still want to try that route, by all means 😉

I don't think it would provide much of an improvement @solid aurora
But you never know right?

small orbit
drowsy kite
#

hey guys, has anyone seen an example of a model being deployed on online streamlit?

#

im a little confused on how it works in terms of running the model on a server

#

like i know ordinarily you load the model onto a pickle file on flask

#

but all the examples ive seen with streamlit run the actually model before rendering the prediction

drowsy kite
#

nvm guys on the community had solution

safe sparrow
#

In keras, if im working on a multi-channel input layer, and throw a cnn onto that layer, does the cnn get applied to all channels, and how?

#
input = Input(shape=(100, 100, 4,))
x = Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same')(input)
#

is x added to all channels, and if so, how?

velvet thorn
#

no

#

okay, wait

#

what do you mean

#

never mind, I think I get what you meant

#

yes
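
To spell out that "yes": as far as I understand Keras' `Conv2D`, each of the 32 filters has a 3x3 window for every input channel, so the kernel has shape (3, 3, 4, 32), and the per-channel products are summed into a single output channel. The NumPy sketch below computes one output pixel that way. It is an illustration with random numbers, not Keras' actual implementation, and it skips the padding and the ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(100, 100, 4))    # height, width, channels
kernel = rng.normal(size=(3, 3, 4, 32))   # kh, kw, in_channels, filters
bias = np.zeros(32)

# 3x3 window around pixel (10, 20), taken across all 4 channels at once.
patch = image[9:12, 19:22, :]

# Each filter multiplies the whole 3x3x4 patch elementwise and sums everything,
# so every output channel mixes information from all input channels.
out_pixel = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias

print(out_pixel.shape)            # one value per filter
print(kernel.size + bias.size)    # trainable parameters in this layer
```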

eager ledge
#

Hi all, Pandas beginner here. Just wondering how it's possible to aggregate the value column as a weighted average?

#
df.columns = ['material_id', 'Thickness', 'Material', 'Width', 'Quantity', 'Date']
        pd.to_datetime(df['Date'])
        df = df.pivot_table(index=['Material', 'Thickness', 'Width'],
                            columns=[],
                            aggfunc=df.ewm(times='Date', halflife=datetime.timedelta(days=60)).mean(),
                            values='Quantity')
#

Here's where I'm up to, but I keep getting
```
IndexError: list assignment index out of range
```

ripe forge
#

Full trace back? Which line exactly gives that error

eager ledge
#

I found it was actually caused by another exception, it's trying to convert another column to a float?

#

I'm now getting a keyerror for 'Date' with the above code

#
```
KeyError: 'Date'
```
#

Also have tried this
```
df = df.groupby(by=['Material', 'Thickness', 'Width']).agg({'Quantity': ['mean', df.ewm(times='Date', halflife=datetime.timedelta(days=60)).mean()]})
```

#

Which causes the original exception, which is ValueError: could not convert string to float: which relates to the Material column
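
One possible way to get a time-decayed weighted average per group, sketched with a hand-rolled weight rather than `ewm`. This rests on an assumption about what "weighted average" should mean here: each row's weight halves for every 60 days of age, measured from the most recent date. The sample rows and the helper name `decayed_mean` are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Material":  ["steel", "steel", "alu"],
    "Thickness": [2.0, 2.0, 1.5],
    "Width":     [100, 100, 50],
    "Quantity":  [4.0, 10.0, 7.0],
    "Date":      ["2020-06-02", "2020-08-01", "2020-07-15"],
})
# to_datetime returns a new Series, so the result has to be assigned back.
df["Date"] = pd.to_datetime(df["Date"])
now = df["Date"].max()


def decayed_mean(group, halflife_days=60.0):
    """Weighted mean of Quantity; a row's weight halves every halflife_days."""
    age_days = (now - group["Date"]).dt.days
    weights = 0.5 ** (age_days / halflife_days)
    return np.average(group["Quantity"], weights=weights)


result = (
    df.groupby(["Material", "Thickness", "Width"])[["Quantity", "Date"]]
      .apply(decayed_mean)
)
print(result)
```

Passing a plain function to `apply` keeps the aggregation per group, whereas `df.ewm(...).mean()` in the snippets above is evaluated once on the whole frame before the grouping ever sees it.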

acoustic halo
#

I need some ensembling advice: I have 5 neural nets I want to ensemble, and I got good results with weighted averaging of their softmax outputs. Now I want to try applying weights to each model's individual class prediction scores. I tried LR but it seems to be slightly worse than the weighted average, plus I have to split my validation set up to train the weights

#

What else could I try?

#

In fact, should I train my stacking LR model on the validation or training data?

ripe forge
#

Never train on validation. Defeats the whole purpose of a validation dataset

acoustic halo
#

The general consensus is that you learn the weights for weighted averaging on the validation data rather than training

#

Which is why i did it

ripe forge
#

Uh, I'm not aware of such a general consensus but it seems wrong to me. Maybe I'm out of the loop.

#

Common sense dictates it's wrong though, so actually no, I'd challenge that statement.

acoustic halo
#

"A smarter way to ensemble classifiers is to do a weighted average, where the weights are learned on the validation data"

#

Which is in a book written by the author of keras

#

So idk

ripe forge
#

I guess the idea is to prevent overfit? It's so leaky though

#

Why would it not leak info from holdout into our actual ensemble?

acoustic halo
#

I am guessing that since whole models are weighted rather than individual outputs, it's not as much of a concern

#

But yes, there would to some degree

ripe forge
#

I'll refrain from further comment on this, this is out of my depth and just feels wrong, which may simply be due to a gap in my knowledge.

#

If theres some material around this that you or someone else encounters and can share, I'd greatly appreciate it

acoustic halo
#

That quote is the only real information I have on the matter

spark cape
#

in a frequentist model, your model should never adapt to the validation data. in a Bayesian model, it will because that's how it works. I wonder if the meaning behind the keras book's author's comment was in this spirit

#

the weights are adapted as you go

acoustic halo
#

The only thing that I can think that makes a bit of sense is that if we treat the average model weights as hyperparameters like we would when optimising the models, testing them on the validation set and picking the best parameters based on validation accuracy

spark cape
#

thats the definition of overfitting

acoustic halo
#

Then I have no idea why, but everywhere uses the validation set that i can see

spark cape
#

maybe it's a nomenclature thing? ime it's training and test data. your model never sees the test data until you're super convinced that the model is robust and sound.

acoustic halo
#

I have 3 sets, train, validation and test

spark cape
#

ok i never heard of validation data then

acoustic halo
#

Validation is basically just for hyperparam optimisation

#

and is representative of the test set

spark cape
#

couldn't that be done with cross validation style chopping up the data?

acoustic halo
#

Yeah, basically, but the competition specifically hands out the validation set for that purpose, with the test set unlabelled until it closes

spark cape
#

and is representative of the test set
ok my background would be training data = historical data; test data = paper trading. so you can't have a data set that 'represents' data that doesn't exist yet

acoustic halo
#

I get what you mean, but it's for a competition, so the test set is already defined

#

It's just that it hasn't been labelled unlike the val set

#

Effectively the entire dataset is split into the 3 sets. Using the train and validation sets, you provide a model that best predicts the unlabelled test set. Each set is drawn randomly from the entire corpus

velvet thorn
#

some people have a train-validation-test split

#

so instead of something like K-fold cross-validation

#

you have a fixed validation set

#

of course, this can lead to overfitting your hyperparameters to the validation set

#

which cross-validation is more robust to

#

but in either case you find out on the test set.

#

which cross-validation is more robust to
@velvet thorn and "more robust", not "immune to".

#

so

#

Yeah, basically, but the competition specifically hands out the validation set for that purpose, with the test set unlabelled until it closes
@acoustic halo and yes, this does happen.

#

a fair bit.

#

but ultimately the point is that you perform hyperparameter tuning on a subset of the data you have that is not seen by the model, right?

acoustic halo
#

Okay, so then in my case, where i have the 3 sets, which do I use for weighted averaging?

velvet thorn
#

and that you perform ultimate evaluation of the model upon a set that has not been seen at all, even for hyperparameter tuning

acoustic halo
#

@velvet thorn yes

velvet thorn
#

Okay, so then in my case, where i have the 3 sets, which do I use for weighted averaging?
@acoustic halo context?

#

weighted averaging of?

acoustic halo
#

Each model is trained on the train set and hyperparam optimisation is done on validation set

quick fox
#

Hi everyone,

I have a problem I've been working on for a couple of days and I just can't find a solution to it. I have 4 identically structured dataframes with multi-level columns and a single-level index.

Each dataframe consists of one sample with different dilutions, and for each dilution 3 separate measurements were taken. So all dilutions are grouped under sample and all replicates are grouped under their dilution.

I want to combine these different Dataframes so that the replicates of all 4 samples are grouped next to each other. So it's a kind of nested merge.

I tried merging two Groupby objects the following way:

for group, group2 in zip(df1, df2):
    pd.merge(group, group2, on="Label for level2")

But I get an error saying Grouped Objects cannot be merged. I tried looking for a solution but I'm not even sure how exactly to search exactly what I am looking for. Any help is greatly appreciated.

Thanks a lot

acoustic halo
#

Now i want to weighted average the outputs

#

Do I use the train set to find the weights or the validation set

spark cape
#

@quick fox oh my god i spent like a week trying to merge two dataframes with multi column indices. I gave up and walked the lists and wrote the join myself because nothing made sense.

velvet thorn
#

it depends on your process

#

but in general I would say the train set...?

acoustic halo
#

This is the problem, I would have thought the train set too

velvet thorn
#

unless

acoustic halo
#

But resources i find online use the validation set

velvet thorn
#

you fit on the validation set and evaluate on the test set and stop there

acoustic halo
#

specifically the author of keras says use validation set

velvet thorn
#

@quick fox if you don't show sample data it's gonna be hard for you to get help

quick fox
#

@spark cape Yeah I'm starting to get desperate, too. If it comes to it I'll have to do it by hand which I really don't want to

velvet thorn
#

it doesn't sound like a very simple problem

#

so if you want someone to be able to work on it, you need a way for them to easily reproduce the initial situation

#

as well as know what the expected result is

#

in my time on SO

#

I've seen a lot of pandas questions go unanswered because it's not clear what people want

#

and in general, written explanations are bad.

#

data is good.

#

code that can be copy-pasted to create initial state is best

quick fox
#

Yeah I get that. I thought this might be a problem that's easy to solve for someone more experienced than I

velvet thorn
#

it's not that the problem is difficult to solve

#

it's that it's difficult to explain

#

and if I don't know what your problem is, I can't help you with it

#

just as you have no idea what to search for, I have no idea what your data actually looks like

quick fox
#

Alright I'm on the phone right now. I guess a picture won't do?

velvet thorn
#

well

#

if it's not immediately obvious how to solve it

lapis sequoia
#

hey @desert oar thanks for the help yesterday

velvet thorn
#

I probably won't go any further than hazarding a guess

#

but someone else might

#

so why not

#

specifically the author of keras says use validation set
@acoustic halo which book

#

is this from

#

by the way

acoustic halo
#

@velvet thorn Deep Learning with Python by François Chollet

velvet thorn
#

I have read that

#

which page?

acoustic halo
#

265

#

Also found this:
"Finding the weights using the same training set used to fit the ensemble members will likely result in an overfit model. A more robust approach is to use a holdout validation dataset unseen by the ensemble members during training."

velvet thorn
#

yup

#

you fit on the validation set and evaluate on the test set and stop there
@velvet thorn should be this

#

I mean

#

the thing is

#

you're basically doing a form of boosting

acoustic halo
#

Except the part where I can't evaluate on the test set

velvet thorn
#

if you use the same dataset that the base learners are trained on

#

right?

acoustic halo
#

Because it's unlabelled

velvet thorn
#

yeah

#

so

#

what I would suggest is

#

split your data further

#

the train set

#

train base learners on t1, train meta-learner on t2, then evaluate on v

#

then final predictions on test set and submit that
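
A sketch of the "learn the blend weights on t2" step, using NumPy only. Everything here is made up for illustration: the softmax outputs are simulated rather than produced by real networks, and a plain random search over the weight simplex stands in for the differential evolution mentioned later in the chat.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, n_models = 500, 3, 3

# Simulated stand-ins for the base models' softmax outputs on t2,
# the slice of the training data the base models never saw.
y_t2 = rng.integers(0, n_classes, size=n_samples)
one_hot = np.eye(n_classes)[y_t2]
probs = []
for noise in (0.5, 1.0, 2.0):  # the first simulated model is the least noisy
    logits = 2.0 * one_hot + noise * rng.normal(size=(n_samples, n_classes))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs.append(e / e.sum(axis=1, keepdims=True))
probs = np.stack(probs)  # shape: (n_models, n_samples, n_classes)


def log_loss(weights):
    """Cross-entropy of the weighted average of the model outputs."""
    blended = np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)
    return -np.mean(np.log(blended[np.arange(n_samples), y_t2] + 1e-12))


# Candidate weight vectors on the simplex; uniform weights are the baseline.
candidates = np.vstack([
    np.full(n_models, 1.0 / n_models),
    rng.dirichlet(np.ones(n_models), size=2000),
])
losses = np.array([log_loss(w) for w in candidates])
best = candidates[losses.argmin()]
print(best, losses.min(), losses[0])
```

The chosen weights would then be frozen and scored once on v, so that v still gives an honest estimate before anything is submitted for the test set.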

#

like

acoustic halo
#

hmm okay not a bad idea

velvet thorn
#

I don't think it's wrong to train the meta-learner on the train set

#

but like I said

#

you're basically doing hardcore boosting

#

which is already fairly prone to overfitting

#

so

#

and yeah I mean meta-learning is probably pretty high variance already

#

what models are you using?

#

simple LR to combine?

acoustic halo
#

NN

velvet thorn
#

the metalearner

acoustic halo
#

no, i'm learning the weights through nelder mead minimisation currently

velvet thorn
#

oh, hm

#

I've actually never done that

#

but okay, that could work

acoustic halo
#

actually, differential evolution, not nelder mead

velvet thorn
#

oh, you can try

#

training with K-fold CV too

acoustic halo
#

I might just be lazy and leave it as is, my reasoning being that the averaging weights are not learnt per se, they are just another hyperparameter optimisation selected on the basis of the performance over the validation set

#

Just like layer size is selected on validation set performance

desert oar
#

@acoustic halo what are you doing? reinventing the wheel? 😛

#

oh ensembling

acoustic halo
#

I don't even know anymore

velvet thorn
#

☸️

#

man I got Kubernetes flashbacks there

desert oar
#

isnt the basic stacking method just train a bunch of uncorrelated models, then fit linear regression on their predictions?

acoustic halo
#

Yes

#

For stacking anyway

desert oar
#

so what are you working on? im curious

acoustic halo
#

I want to do average weight ensembling though, I tried stacking but got less than desirable results

#

fwiw, I actually won the first stage

desert oar
#

congrats

#

what is average weight ensembling? never heard of that

acoustic halo
#

Literally adding all the softmax outputs from various models together

#

but also applying a weight to each model output

#

It's super basic, but I was planning on using the validation set to find the weights

#

Which is a big no-no apparently, despite being used in every article i look at
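
A rough sketch of the weighted averaging described above, in plain Python. It assumes each model has already produced class probabilities on some held-out split; the helper names, the coarse grid and the toy numbers are all made up for illustration, and a grid search stands in for the optimiser used in the discussion.

```python
from itertools import product

def weighted_average(prob_sets, weights):
    """Combine per-model class probabilities with one weight per model.

    prob_sets[m][i][c] is model m's probability of class c for sample i.
    """
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
         for c in range(n_classes)]
        for i in range(n_samples)
    ]

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def grid_search_weights(prob_sets, labels, steps=5):
    """Try every weight combination on a coarse grid; keep the most accurate."""
    grid = [k / (steps - 1) for k in range(steps)]
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(prob_sets)):
        if sum(w) == 0:
            continue  # all-zero weights cannot be normalised
        acc = accuracy(weighted_average(prob_sets, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy held-out data (invented): 2 models, 3 samples, 2 classes.
model_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
model_b = [[0.4, 0.6], [0.3, 0.7], [0.1, 0.9]]
labels = [0, 1, 1]
weights, acc = grid_search_weights([model_a, model_b], labels)
print(weights, acc)
```

Whichever split supplies `labels` here is the one the weights are fitted to, so accuracy measured on that same split will be optimistic.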

desert oar
#

well linear regression is a weighted average if you squint

acoustic halo
#

Yeah, I used LR to learn weights for each softmax output from each model individually

#

also theres this rule:
"Participants are NOT allowed to use the development set or any external dataset (labeled or unlabeled) to train their systems."

#

So i don't want to train the meta model on the validation set per se

#

Though I would argue the ensemble weights are learnt in much the same way model hyperparameters are learnt

desert oar
#

Yeah

#

What is the development set?

acoustic halo
#

the validation set

desert oar
#

Ah ok

acoustic halo
#

they just call it dev

desert oar
#

Yeah you wouldnt use that

#

Youd have to split your training data

acoustic halo
#

darn

desert oar
#

We actually have this problem at work, people want to use methods like temperature scaling and gold loss correction

#

All of those require "auxiliary" training sets

#

So if you get too aggressive using those methods you end up cutting down the size of your main training set significantly

#

Which can really hurt when you have a highly imbalanced problem or you are already low on data

acoustic halo
#

So, this is my problem, I have to go back and retrain all my models on a smaller training set

#

Which is a massive pain

desert oar
#

In some cases we have just reused the training set, but in those cases we were able to convince ourselves that the training set wasn't significantly different from any other version of that data set we would have now or in the future

#

And we had to proceed very carefully to avoid overfitting

acoustic halo
#

But, this still doesnt explain why everyone seems to get their weights on the validation set

desert oar
#

Are they just breaking the rules? Lol

acoustic halo
#

based on the above link and textbooks

velvet thorn
#

didn't we discuss that earlier

desert oar
#

Oh, yeah. We do that too

#

But it really makes your validation set less useful

#

Think of it this way, every "external" procedure requires another validation set

#

So if you only have one validation set you basically need to decide which procedure gets trained on the main training set and which procedure gets trained on the validation

#

In this case the rules of the contest tell you what your decision is, either you reuse the training set or you split off your own validation sets

acoustic halo
#

This is what I originally thought, but my instructor insisted the rule was mainly in the context of using the validation set to train the neural nets

#

But what you said makes more sense

desert oar
#

what kind of problem is this

#

regression? classification? how many / what kind of features?

acoustic halo
#

Classification (1000 classes), features vary per model

#

But mainly n-grams, abstract syntax tree nodes and a special version of BERT

desert oar
#

ah very similar to stuff ive worked on

#

how imbalanced are the classes

#

and how imbalanced are the features

#

(why cant i spell imbalanced today)

acoustic halo
#

Classes are evenly split, features are alright

desert oar
#

how many records

sterile bobcat
#

If you want links for Machine Learning and AI learning courses and files send me a message

acoustic halo
#

50k in the training set, 25k in validation and test

desert oar
#

oh yeah

#

can you slice off like 5k from the training set?

#

use that to train the ensemble

#

how many models are you ensembling? like 5?

acoustic halo
#

Yeah it's 5, the main thing I am trying to justify in my mind is whether selecting the weights counts as hyperparameter optimisation

#

Because I could just grid search

#

and pick the best

desert oar
#

ew why

#

wait

acoustic halo
#

Just like picking layer sizes

desert oar
#

are you allowed to use the development set for hyperparameter optimization?

acoustic halo
#

yes

desert oar
#

oh

#

what the fuck

#

that's such a weaselly distinction

acoustic halo
#

Is that not the point of the validation set anyway?

desert oar
#

yes but in real life it's not a strict delineation

#

its not just "model + hyperparameters"

#

there are potentially several "layers" of training

#

as you're seeing here

#

if you have a model w/ gold loss correction, temperature scaling, and hyperparameter tuning, theoretically you have three nested training procedures

acoustic halo
#

So what i'm hearing is that i can get away with using the validation set to "optimise my hyperparameters" 😆

desert oar
#

well... more like "it's not the main model" is your argument

#

what a stupid rule imo

#

i think the idea here is that you aren't allowed to do a final training run that includes the validation set, before submitting

acoustic halo
#

Exactly

desert oar
#

idk

#

thats what people do in real life though!

acoustic halo
#

It's a mess

desert oar
#

thats the whole point of a validation set

#

can you like, clarify the rule w/ a judge

#

or i guess you can just do it anyway and hope nobody calls you out

acoustic halo
#

I'll do just that, but until I hear otherwise, I'll go on the basis that I'm allowed to use the validation set for hyperparameter optimisation which includes selecting weights

Realistically, I'm not too bothered about the competition, this is for my final project so I'm more concerned with learning the actual concepts than results

random perch
#

Does anyone have experience working on ml/ai open source projects? If so please reach out to me!

#

I'm trying to get started with tensorflow and opencv open source but im not sure where to start in terms of how to contribute.

paper niche
#

@random perch I've contributed to open source projects before, as I'm sure many here have. Just not specifically tensorflow and opencv, but I don't imagine the process being any/much different. Most decent projects have a Contributing page/document that point you where help is most appreciated by the core devs. For example tensorflow: https://www.tensorflow.org/community/contribute

lapis sequoia
#

@Klaouss#9437

random perch
#

@paper niche Thank you very much! I appreciate your help 🙂

faint ravine
#

Hey everyone, does anyone have experience with data generators?

weak sentinel
#

a little @faint ravine

faint ravine
#

What kind of data do you usually generate?

#

And are you aware of any generative algorithms other than the famous GAN?

weak sentinel
#

also stupid question: for an input layer do i have to use keras.layers.Flatten() if im just inputting an array of parameters

#

i just used generators for a Image Classifier CNN

#

just modulated CIFAR-10 for more training data

faint ravine
#

So, just standard generation? Like rotating the image or playing around with the contrast?

weak sentinel
#

yeah nothing complicated at all

faint ravine
#

Neat

weak sentinel
#

sorry if im not helpful lol

faint ravine
#

Lol, It's ok.

hollow silo
#

is SVM a good project to put on a resume

#

entry level roles

faint ravine
#

Probably not

#

Aim for something that has a bit more purpose. Coding up an algorithm and running it often does not count. You have to "make it do something" and show results.

#

what does your SVM do anyway?

desert oar
#

it might be a good project

#

if it's "i did a data science project and i happened to use an SVM for my model" that seems like a fine resume item

#

(as long as you can justify why you used the SVM)

hollow silo
#

what does your SVM do anyway?
@faint ravine i used SVM for point cloud segmentation

#

basically i had some point cloud data with different points belonging to different classes

#

and i implemented a multi class SVM to segment the point cloud into different regions

faint ravine
#

@desert oar is right.
Yeah, but don't say that on your resume. It sounds like: "I got some data, and I classified it". Something like: "I built a dog/cat recognizer" would be better.

desert oar
#

of course

#

youre describing your project

#

make your project sound like a project

#

put the details in the bullet points

#

what kind of data was in the point clouds? or was it just a toy project w/ simulated data?

hollow silo
#

they were 3D Coordinates from a LIDAR Scanner

faint ravine
#

Yeah, don't rehearse the theoretical ideas that you learned. Implement them into something practically useful.

lapis sequoia
#

Guys give me a advice like how to learn data science so how do i see data-science in thinking way

hollow silo
#

and the dataset is public

desert oar
#

LIDAR Scanner Data Segmentation

  • Used SVM to segment LIDAR scanner data
  • etc...
faint ravine
#

Guys give me a advice like how to learn data science so how do i see data-science in thinking way
@lapis sequoia Do you wanna plug-and-chug or learn the underlying theory?

hollow silo
#

how does this sound?

LIDAR Point Cloud Segmentation 
  -Implemented a soft margin multi-class SVM for point cloud segmentation 
  -Reduced computation time (by some metric) using efficient vectorized operations 
  -Achieved so and so accuracy 
lapis sequoia
#

@faint ravine what u mean by plug-and-chug

faint ravine
#

That's good

desert oar
#

did you implement the svm though?

hollow silo
#

yes

desert oar
#

nice

hollow silo
#

like do you mean if i used scikit-learn?

desert oar
#

yeah

hollow silo
#

i wrote an SVM Class using numpy

desert oar
#

very nice

#

you might also want to mention where/how you got the data

hollow silo
#

so no scikit

#

you might also want to mention where/how you got the data
@desert oar will do

lapis sequoia
#

i'm a beginner in numpy

#

but i love numpy

desert oar
#

one exercise that i was told to do by my MA thesis advisor was to write an executive summary of my projects

lapis sequoia
#

Thank You

desert oar
#

a 1 page document w/ maybe 1 plot. basically an extended abstract

hollow silo
#

i realised that being able to describe your projects is a very important skill

#

something you should keep a log of while you are building the project

desert oar
#

+1

lapis sequoia
#

+1

#

i wanna build something with numpy

desert oar
#

@hollow silo sounds like you'll have no problem getting hired, if that's your mentality 🙂

hollow silo
#

i wanna build something with numpy
@lapis sequoia write a neural network from scratch

#

one layer

lapis sequoia
#

hard to build or easy?

hollow silo
#

@hollow silo sounds like you'll have no problem getting hired, if that's your mentality 🙂
@desert oar thank u 🥺 its really hard bc i dont have a degree directly related to CS etc

#

the grind is real in software

#

hard to build or easy?
@lapis sequoia you can use numpy for pretty much anything actually...if you're interested in data science and ML then yeah a one layer NN is of moderate difficulty.. you can extend that to an autoencoder as well

lapis sequoia
#

oh thanks

hollow silo
#

the power of numpy lies in matrix slicing and dicing operations

lapis sequoia
#

yes i'm learning numpy

hollow silo
#

yeah np.dot etc is cool but a lot of times people just use for loops over their numpy matrices when the same thing can be represented as a matrix product
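
To make that point concrete, here is a small sketch comparing an explicit triple loop with the equivalent matrix product. The shapes and random inputs are arbitrary; `np.einsum` is included only because it comes up later in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 5))

# Explicit loops: C[i, j] = sum over k of A[i, k] * B[k, j]
C_loop = np.zeros((4, 5))
for i in range(4):
    for j in range(5):
        for k in range(3):
            C_loop[i, j] += A[i, k] * B[k, j]

C_matmul = A @ B                          # same computation as one matrix product
C_einsum = np.einsum("ik,kj->ij", A, B)   # same again, spelled with index labels

print(np.allclose(C_loop, C_matmul), np.allclose(C_loop, C_einsum))
```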

lapis sequoia
#

slicing index, shape,reshape i love them

hollow silo
#

yes i'm learning numpy
@lapis sequoia if you are interested in computer vision, i recommend following the cs231n course

desert oar
#

wait until you guys learn about np.einsum

hollow silo
#

you can do their assignments

#

wait until you guys learn about np.einsum
@desert oar i have read about that 😄 but never used it

#

i didnt understand it too well

desert oar
#

"regex for array math"

hollow silo
#

"regex for array math"
@desert oar thats a neat way to put it

tidal bough
#

sadly, unless I missed something major, it's "only" for any kind of multiplication operations

#

you can't, say, make it calculate the sum of each element with each.

#

you can do outer multiplication: "i,j->ij" but not summing

arctic wedgeBOT
#

Hey @quick fox!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

#

Hey @quick fox!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

digital juniper
#

hey, does anyone here use kaggle? i'm trying to make a team notebook for a competition and i'm not sure how to do it

flat quest
#

sadly, I think you did miss something major 😉 @tidal bough

Its just a summation equation at its core so it can deal with summing each element with each.
All you do is np.einsum('i,i', a, b)

desert oar
tidal bough
#

@flat quest nah, that's just the scalar (dot) product. I meant, from 1d arrays a and b, produce a 2d array c, where c[i,j]=a[i]+b[j]

flat quest
#

i mean i don't see how that would be multiplication. It's just summation all the way through.

But yeah if u want to do c[i,j] = a[i] + b[j]. You can do explicit mode I believe np.einsum('i,j -> i,j'). I'm not sure entirely if that works, but based on the docs, it seems like it would. @tidal bough

tidal bough
#

@flat quest nope, np.einsum("i,j -> ij") would do c[i,j] = a[i]*b[j]

flat quest
#

ah right. Yeah not thinking too straight this morning lol.

tidal bough
#

i mean i don't see how that would be multiplication. It's just summation all the way through.
the scalar (dot) product of vectors (1d arrays) is defined as the sum of a[i]*b[i] over all i 🙂

flat quest
#

yeah ur right, it's all multiplication, and then summing over those multiplicated terms
My bad :/

Guess it's up to the standard addition to deal with those problems then 😉

tidal bough
#

Well, semi-standard. You do this via the glory of np.ufunc.outer 🙂

#
In [254]: arr
Out[254]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [255]: np.add.outer(arr,arr)
Out[255]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16]])
#

huh, this function is actually pretty slow

#

yeah, this dumb C-loop is 4 times faster:

@numba.njit
def outer_sums_full(arr):
    res = np.zeros((len(arr),len(arr)),dtype=arr.dtype)
    for i in range(len(arr)):
        for j in range(len(arr)):
            res[i,j]=arr[i]+arr[j]
    return res
#

And this is even a bit faster, and is a very simple broadcasting-based solution:

@numba.njit
def outer_sums_3(arr):
    arr=arr.reshape(-1,1)
    return arr+arr.transpose()
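
For reference, a short sketch (without numba) putting the variants from this exchange side by side. The two input arrays are arbitrary examples.

```python
import numpy as np

a = np.arange(4)
b = np.array([10, 20, 30])

sums_ufunc = np.add.outer(a, b)            # c[i, j] = a[i] + b[j]
sums_broadcast = a[:, None] + b[None, :]   # same sums via broadcasting
products = np.einsum("i,j->ij", a, b)      # c[i, j] = a[i] * b[j], not a sum

assert np.array_equal(sums_ufunc, sums_broadcast)
assert np.array_equal(products, np.multiply.outer(a, b))
print(sums_broadcast.shape)
```

Which variant is fastest depends on array size and hardware, so the timings mentioned above are best re-measured on the data at hand.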
toxic bone
#

Hello guys! I want to create some universal pandas reader for parquet/csv/hdf5/excel/sqlite/whatever depending on file extension.

What do you think, is it better to create it as a function with **kwargs to pass arguments, or maybe use some kind of decorator?

Never tried decorators in practice, maybe it's time to try
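
One possible shape for such a reader, as a sketch: a plain function plus a lookup table keyed by file extension, with `**kwargs` forwarded to the chosen pandas reader. The names `READERS`, `reader_name_for` and `read_any` are made up here. SQLite is left out because `pd.read_sql` needs a connection and a query rather than just a path.

```python
from pathlib import Path

# Extension -> name of the pandas reader function. Illustrative, not exhaustive.
READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".xlsx": "read_excel",
    ".h5": "read_hdf",
    ".json": "read_json",
}

def reader_name_for(path):
    """Return the pandas reader name for a path, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r}") from None

def read_any(path, **kwargs):
    """Load a file into a DataFrame, forwarding kwargs to the chosen reader."""
    import pandas as pd  # imported lazily so the lookup works without pandas
    return getattr(pd, reader_name_for(path))(path, **kwargs)
```

Usage would look like `read_any("data.csv", sep=";")`. A decorator does not seem necessary for this; a dictionary keeps the mapping easy to extend.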

blazing sundial
#

Hey fam could I please get some help with creating an array in numpy?

bitter harbor
#

what kind of array

blazing sundial
#

hey man its super simple but its giving me a "not callable error"

#

totalBands = np.array([[180,15], [5,20], [8,16],])

tidal bough
#
In [296]: totalBands = np.array([[180,15], [5,20], [8,16],])

In [297]: totalBands
Out[297]:
array([[180,  15],
       [  5,  20],
       [  8,  16]])
blazing sundial
#

sorry that last comma shouldnt be there but still

tidal bough
#

My guess is that you did something bad like redefining np.array.

#

what's the full error?

blazing sundial
#

oh i see! that makes sense cause it worked yesterday

#

smh

tidal bough
#

yeah, definitely redefined something

#

check these:

In [300]: type(np)
Out[300]: module

In [301]: type(np.array)
Out[301]: builtin_function_or_method
blazing sundial
#

could you explain a little more? Im a bit confused

tidal bough
#

do type(np) and see what it gives you, same for np.array

blazing sundial
#

what should i be looking for in those outputs?

tidal bough
#

the above is what you should get(if they aren't redefined).
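
A small sketch of how this kind of "not callable" error can arise. It uses a stand-in object instead of the real numpy module, so `np_like` and the value 3.5 are purely illustrative.

```python
from types import SimpleNamespace

# Stand-in for the numpy module so the sketch runs without touching numpy.
np_like = SimpleNamespace(array=list)
print(callable(np_like.array))   # healthy state: the attribute is a function

np_like.array = 3.5              # accidental assignment instead of a call
try:
    np_like.array([1, 2, 3])
except TypeError as exc:
    message = str(exc)
    print(message)               # the error names the type that replaced it
```

In a real session, `callable(np.array)` is a quick check along the same lines as the `type(...)` calls above.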

blazing sundial
#

ohhhh

#

check it out

tidal bough
#

I'm not even sure how you did this.

bitter harbor
#

^^

tidal bough
#

do

del np.array
del np
import numpy as np
blazing sundial
#

lmaoooo f me

#

could i just clear everything?

tidal bough
#

well, restarting ipython works too, yes.

blazing sundial
#

could i just close out of spyder and reopen it?

#

sorry, im switching over from matlab and im still learning

tidal bough
#

yup, or probably just close and open the console.

#

or push a button somewhere to stop it.

blazing sundial
#

i think it happened when i tried to define a new array above

#

above it*

#

anyways let me try to restart it

bitter harbor
#

i think it happened when i tried to define a new array above
can you send that code?

tidal bough
#

you must have done something really weird like np.array = <something, a float64 to be precise>

blazing sundial
#

hmmm probably lol, could i get your advice on what im doing? maybe you could point me in the right direction

#

so i have that array right? basically for each element in that array i sent earlier (i.e. [[180,15], [5,20]]) I want to add additional numbers to each element in ascending order, so the element [5,20] would turn into [5,6,7,8,9,...20]

tidal bough
#

sounds like you just want arange.

#

the problem is that arrays have a specific size on each dimension.

#

like, you can't have the second row be length 5 and the first length 10.

#

you can only pad it with NaNs, I guess.

#

so for your task, a list of 1d arrays might make more sense.

#

though it depends on what you're later using that list for.

blazing sundial
#

yeah i was planning on using a 1xN array. The original idea was to iterate through the array and find where each element has a matching value

tidal bough
#

uhh

#

!xy

arctic wedgeBOT
#

xy-problem

Asking about your attempted solution rather than your actual problem.

Often programmers will get distracted with a potential solution they've come up with, and will try asking for help getting it to work. However, it's possible this solution either wouldn't work as they expect, or there's a much better solution instead.

For more information and examples: http://xyproblem.info/

tidal bough
#

What's the actual problem you're trying to solve?

blazing sundial
#

oh my b, basically the problem is I have 12 ranges of data, ranging from 0 to 360, and i was trying to write a function (or just code i guess) to see if there is a value that the 12 ranges contain

#

does that make sense? sorry, i can try to explain better

tidal bough
#

if there is a value that the 12 ranges contain
So, whether there's an intersection of these 12 ranges?

blazing sundial
#

yeah:)

tidal bough
#

I think it's much simpler and doesn't require any numpy.

#

Consider what the intersection of two ranges might be. It's either:

  1. Some range. Say 5:10 intersects with 7:12 on 7:10
  2. Empty set. Say, 5:10 with 20:50 have no intersection
#

So you need to just write a function determining the intersection of two ranges. Apply it to the first two elements, then to the result of this and the third element, then to the result of that and the fourth...
(this way of applying is, by the way, what functools.reduce does)
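
The approach above can be sketched as follows. This assumes each range is a closed `(lo, hi)` pair with `lo <= hi`; a range that wraps around 360, such as the `[180, 15]` pair mentioned earlier, would need extra handling that is not shown. The function names are made up.

```python
from functools import reduce

def intersect(r1, r2):
    """Intersect two closed ranges (lo, hi); None stands for the empty set."""
    if r1 is None or r2 is None:
        return None
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (lo, hi) if lo <= hi else None

def common_range(ranges):
    """Fold intersect over the list: ((r0 & r1) & r2) & ..."""
    return reduce(intersect, ranges)

print(common_range([(5, 10), (7, 12)]))    # (7, 10)
print(common_range([(5, 10), (20, 50)]))   # None
```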

blazing sundial
#

ohhh i see! okay thank you so much. Im gonna work on this and ill see if i can handle it from here

tame nest
#

I can probably ask a question here and someone help me with a pandas code

faint ravine
#

Yes, you can't.

tame nest
#

I was advised to come to this channel

faint ravine
#

I'm just kidding, what is it?

tame nest
#

I am trying to create a new column based on a condition on another column of string values and am facing weird behavior

#

please see this

#

I am trying to create the new variable 'flee1' based on the variable 'flee'..it should give True when 'flee' == 'Not fleeing'

#

any ideas anyone..even kaggle notebooks giving the same prob

#

nobody knows it seems 🙂 stackexchange for the real geeks

odd yoke
#

use .map @tame nest

#

df["flee"] = df["flee1"].map({True: "fleeing", False: "not fleeing"})

spiral peak
#

(it was a typo for anyone curious)

tame nest
#

🙂

#

@spiral peak helped me

#

the issue is solved..

#

Thanks @odd yoke

untold rose
#

are libraries like tensorflow or pytorch required to make neural networks?

odd yoke
#

required ? no
useful ? definitely

#

and chances are, if you don't use them, your code will very likely be less efficient in many ways

untold rose
#

ah ok

#

is it possible to make one using only numpy?

odd yoke
#

again, yes

#

but you'll have to do the differentiation yourself, you won't be able to run your code on the gpu, and there are not as many functions commonly used to build networks
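
To give a feel for what "doing the differentiation yourself" means, here is a minimal single-layer network (really logistic regression) in numpy only. The toy data, learning rate and step count are arbitrary choices for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: logical OR of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w = np.zeros(2)
b = 0.0
lr = 1.0

for _ in range(2000):
    p = sigmoid(X @ w + b)          # forward pass
    err = p - y                     # dLoss/dz for sigmoid + cross-entropy
    grad_w = X.T @ err / len(y)     # gradient written out by hand
    grad_b = err.mean()
    w -= lr * grad_w                # plain gradient descent step
    b -= lr * grad_b

pred = (sigmoid(X @ w + b) > 0.5).astype(int)
print(pred)
```

With more layers the hand-written gradients grow quickly, which is the part the frameworks automate.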

untold rose
#

alright

#

thanks

arctic wedgeBOT
#

Hey @atomic oxide!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

atomic oxide
#

Hello guys could I please get help to solve this issue why the yticks label are out of their places (ticks)

velvet thorn
#

Hello guys could I please get help to solve this issue why the yticks label are out of their places (ticks)
@atomic oxide what do you mean out of place

#

I'm assuming you intend the logarithmic scale?

#

do you mean like how the major tick labels appear to be misaligned with the major ticks?

atomic oxide
#

yes this is my question

#

I get this issue in all my plots

velvet thorn
#

hard to say without seeing all your code.

#

you do?

#

that's weird

#

can you create a basic plot and show me?

#

e.g.

#
fig, ax = plt.subplots(figsize=(4, 4))

x = np.linspace(0, 2 * np.pi, 200)
y = np.cos(x)

ax.plot(x, y)
atomic oxide
#

ok

velvet thorn
#

yeah, then it's not all your plots

#

I mean, I guess this is a long shot but

#

you're not manipulating the ticks and/or tick labels manually, right

#

or using some custom Locator/Formatter that might cause this

arctic wedgeBOT
#

Hey @atomic oxide!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

velvet thorn
#

just a heads up

#

if your code is that long I don't think many people will want to look through it.

atomic oxide
#

not long but how can i upload it

arctic wedgeBOT
#

Hey @atomic oxide!

It looks like you tried to attach file type(s) that we do not allow (). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

velvet thorn
#

read what the bot said

#

my GOD

#

why don't you try removing each line that deals with the y-ticks

#

until the problem stops

#

so you can figure out which line is causing it

#

ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y))) this is my guess though

atomic oxide
#

I'm going to try, Thank you soooo much

velvet thorn
#

I would really suggest you clean up your code a little

#

actually, maybe clean it up a lot?

atomic oxide
#

Ok

dim olive
#

hello friends, given a large dataset of [x, y] and assuming these two have some sort of correlation, what methods could I use to measure how closely these two are related?

I am analyzing video game statistics where x is vision of the map, and y is deaths. each set of [x, y] are from a new, unrelated instance of the game.
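
Two standard first measures for this are the Pearson coefficient (linear relationship) and the Spearman coefficient (monotonic relationship, computed on ranks). A standard-library sketch follows; the `vision` and `deaths` values are invented and not taken from the plot being discussed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

vision = [10, 20, 30, 40, 50]   # made-up values
deaths = [9, 7, 8, 4, 3]        # made-up values
print(round(pearson(vision, deaths), 3), round(spearman(vision, deaths), 3))
```

Both coefficients lie between -1 and 1. A value near 0 suggests little linear or monotonic relationship, though it does not rule out other kinds of dependence.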

dim olive
tidal bough
#

this doesn't look super correlated 😅

#

maybe you want to train a regression neural net for this.

#

"how well can my neural net predict y by x" is, technically, a measure of their correlation 🙂

dim olive
#

Yeah, sadly it does not, although it should have a fairly close correlation

#

ok, ty. My current regression model says: "lol"

#

I would very much like to prove whether or not it is correlated. As if it is not, it means x can be thrown out completely from my other analysis

bitter harbor
#

I mean that’s not awful

tidal bough
#

that's about as good a relationship as I can predict! 😅

#

by the way, is your x one-dimensional?

#

because, uhhh, this really doesn't look like enough data to predict y.

dim olive
#

It is one dimensional and not enough data to fully predict y, but I would like to measure the correlation before moving forward with some more in-depth ML

#

This is using linear regression, but I wanted to know if this was a reasonable approach to this specific problem

#

actually, the red line represents what I assumed, higher x should mean less y

#

but I dont just want to confirmation bias this whole project

serene scaffold
#

Is anyone familiar with a tool whereby you can enter a string, highlight a portion of that string, and see the index of the first and last character of what you've selected?

#

If I try to Google something like this I only get results about HTML.

#

I need it to write unit tests for an nlp project.
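
Not a GUI, but for writing tests a few lines may be enough: list every occurrence of the substring with its start and end index, then pick the one you mean. A sketch, with a hypothetical helper name `find_spans`:

```python
import re

def find_spans(text, substring):
    """Return (start, end) for every non-overlapping occurrence.

    end is exclusive, so text[start:end] == substring.
    """
    return [m.span() for m in re.finditer(re.escape(substring), text)]

text = "the cat sat on the mat"
spans = find_spans(text, "the")
print(spans)   # [(0, 3), (15, 18)]
```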

velvet thorn
#

Is anyone familiar with a tool whereby you can enter a string, highlight a portion of that string, and see the index of the first and last character of what you've selected?
@serene scaffold you mean like a frontend thing?

serene scaffold
#

I'm not sure what you mean

velvet thorn
#

when you say a "tool" do you mean like a (very small) webapp?

#

because when you mention highlighting I'm assuming there's a GUI?

serene scaffold
#

I'd prefer if it had a GUI because I'm not sure how to quickly disambiguate which instance of a substring I'm referring to if it were a CLI.

velvet thorn
#

hm I don't know of anything that can do that offhand but it shouldn't be that difficult to build

vague ivy
#

if i did help = random.randint(1,250) would it be used as a integer or a string when i do while help != 1: print('no')?

serene scaffold
#

!e

import random
thing = random.randint(1, 250)
print(thing, type(thing))
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

130 <class 'int'>
vague ivy
#

ok thanks

#

also is there a way to make a if statment in a loop?

#
whatever = random.randint(1,250)
test = 1
while whatever != test:
    print('no')
if rsr == UR:
    break
    print('UR')
serene scaffold
#

can you open a help session and ping me?

modern canyon
desert parcel
#
Traceback (most recent call last):
  File "D:\Coding\python\AI\dropout.py", line 18, in <module>
    train_ds = TensorDataset(inputs, targets)
  File "D:\Coding\python\AI\lib\site-packages\torch\utils\data\dataset.py", line 158, in __init__
    assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
AssertionError
#

Does anyone know what this error means

#

I was able to fix it

jovial thorn
#

Hey! How did I write the code blocks? I keep forgetting

#
def isPower (num, base):
    if base in {0, 1}:
        return num == base
    power = int (math.log (num, base) + 0.5)
    return base ** power == num
#

I'm using that function that I found in stackoverflow to check if a number is a power of 2, but I'm curious because I just don't understand why power is the logarithm of the number in the base, but +0.5. What is the +0.5 achieving?

#

I feel like it's basic math but I just don't get it, it's been a long week lol

kind granite
#

lol where did you find that function @jovial thorn

#
return math.log(num, base).is_integer()
#

this will do it
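
On the +0.5: `int()` truncates, so `int(x + 0.5)` rounds a positive `x` to the nearest integer. That absorbs small floating-point error in the logarithm, and the final `base ** power == num` comparison then confirms the candidate exactly. Calling `.is_integer()` on the raw logarithm has no such safety net; `math.log(1000, 10)` is commonly reported as `2.9999999999999996`, for example. A sketch that sidesteps floats entirely, assuming positive integer inputs (function names are illustrative):

```python
def is_power(num, base):
    """True if num == base ** k for some integer k >= 0 (positive integers only)."""
    if num < 1 or base < 1:
        return False
    if base == 1:
        return num == 1
    while num % base == 0:   # strip factors of base one at a time
        num //= base
    return num == 1

def is_power_of_two(num):
    """Bit trick: a power of two has exactly one bit set."""
    return num > 0 and num & (num - 1) == 0
```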

serene scaffold
#

@jovial thorn @kind granite if this isn't related to data science you might want to solve this in a help channel; see #❓|how-to-get-help

jovial thorn
#

is basic math not related to data science????

#

thanks a lot @kind granite !

serene scaffold
#

not necessarily

jovial thorn
#

emphasis on necessarily ? I think it's easy to make a case for needing proper logarithms in data science lol

#

I wrote it here cause I deemed it related to data science

serene scaffold
#

I was only pointing out the possibility that an individual help channel might be better. feel free to carry on.

jovial thorn
#

thanks!

ripe forge
#

For future reference, It's easy to make a case for random abstract things, doesn't mean it's correct for this specific instance where it's literally talking about an ispower function. Just because data science is built on math doesn't mean anything goes and we start talking about addition or multiplication. You can try #algos-and-data-structs for questions related to algorithms, perhaps that is a better fit.

vague ivy
#

i have this code right here:

#
```python
import random
t = 1
while (r := random.randint(1,40)) != t:
  print(r)
else:
  print("yes")
```
#

i want python to print ('amount of times "r" was said')

#

(it may not look like it but it is data science)

oak iron
#

how did you do that code thing?

#

like that code block

vague ivy
#

code here

oak iron
#

like how do you bold/spoilers, but with 3 `?

vague ivy
#

yep

oak iron
#

k, thanks

vague ivy
#

but can you help me?

oak iron
#

how?

vague ivy
#

i have this code right here:

```python
import random
t = 1
while (r := random.randint(1,40)) != t:
  print(r)
else:
  print("yes")
```
i want python to `print ('amount of times "r" was said')`
(it may not look like it but it is data science)
oak iron
#

i think the proper format is randint(num1, num2)

#

not random.randint()

vague ivy
#

i did that.......

#

ooooh

#

sorry

oak iron
#

in the while

pale thunder
#

you would have to increment a counter variable everytime you do print(r)

vague ivy
#

i didnt fo import random from random import *

pale thunder
#

your randint call is fine

vague ivy
#

so i have to do random.randint

oak iron
#

i think from random import randint works

vague ivy
#

no im talking to bizzarebazzar

#

oh ok

#

i will

pale thunder
#

import random works just fine

oak iron
#

yea, idk what's going wrong

#

wait

#

what compiler are you using?

ripe forge
#

Wrong? Nothing I thought. They just want to do something extra.

oak iron
#

wait

#

for me it outputs a bunch of numbers, before outputting yes

pale thunder
#

yes, that is what that code does. The goal is to also make it write the amount of numbers it wrote

ripe forge
#

Aye, that's because both r and yes is printed,

oak iron
#

random order, though

ripe forge
#

r is a randint

oak iron
#
14
32
16
...
28
6
yes
ripe forge
#

So yeah, make a counter variable before the loop set to 0. Each time the while loop is satisfied, add to this counter.

oak iron
#

that's my output

ripe forge
#

Print counter variable at the end of everything else outside the loop.
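
The counter approach described above could look roughly like this. It is a minimal sketch: the function name `count_draws` and its parameters are made up for illustration, and it needs Python 3.8+ for the `:=` operator used in the original snippet.

```python
import random


def count_draws(target=1, low=1, high=40, rng=random):
    """Print random numbers until `target` comes up; return how many were printed."""
    count = 0  # counter starts at 0 before the loop
    while (r := rng.randint(low, high)) != target:
        print(r)
        count += 1  # one more number was printed
    print("yes")
    # Report the counter once, after the loop has finished.
    print(f"{count} numbers were printed before hitting {target}")
    return count


count_draws()
```

Passing a seeded `random.Random(...)` as `rng` makes a run repeatable, which helps when checking the count by hand.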

#

That's your output because that's what the output should be. So the real question is this, what did you expect instead?

#

There's a mismatch between what the code does and what you think it does in this case, if you think something is unexpected. We can try to address that.

pale thunder
#

could also do something like

```python
import random
i = -1  # keeps the count at 0 if the very first draw is already 1
for i, r in enumerate(iter(lambda: random.randint(1, 40), 1)):
    print(r)
else:
    print(f'a number was said {i+1} times')
```
but a counter variable is probably saner.
oak iron
#

I think he's offline...

copper hemlock
#

hello, i have a question

#

about pytorch

#

my image batch comes with dimension of 3 instead of 4, its missing the color channel

#

how is this possible

ripe forge
#

Each image has a dim of 3? (in which case that's normal) or does the whole batch only have 3 dimensions

#

In general, When images don't have colour channel it means they are essentially greyscaled images.

#

Such that the same pixel value is used for all 3 channels at once

uncut shadow
#

Hey. What does Flatten layer do (in e.g. Tensorflow) and what is it for?

ripe forge
#

Reduce the dimensions of something.

uncut shadow
#

Oh

ripe forge
#

Say turning a 2d matrix into a 1d array
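
To make that concrete, here is a plain-Python sketch of the idea (not how TensorFlow implements it): a Flatten layer leaves the batch dimension alone and collapses everything else in each sample into one long vector. The helper names are hypothetical.

```python
from itertools import chain


def flatten_sample(sample):
    """Flatten one 2-D sample (a list of rows) into a single 1-D list."""
    return list(chain.from_iterable(sample))


def flatten_batch(batch):
    """Flatten every sample but keep one entry per sample (the batch axis)."""
    return [flatten_sample(sample) for sample in batch]


# Two tiny 2x3 "images": shape [2, 2, 3] becomes [2, 6].
batch = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
flat = flatten_batch(batch)
print(len(flat), len(flat[0]))
```

Doing this inside the model rather than in preprocessing means the data can stay in its natural 2-D shape until the point where a dense layer needs a vector.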

uncut shadow
#

What's the point? Couldn't u just do this before feeding data to model?

ripe forge
#

Sure, you could.

uncut shadow
#

Hmmm

copper hemlock
#

batch is supposed have 4 dims no?
[batch_size, in_channel, w, h]

uncut shadow
#

So both ways are possible?

copper hemlock
#

mine comes with [batch_size, w, h]

#

so it throws error

ripe forge
#

It's probably logically easier to understand data going in normally, say images make sense as 2d for example

uncut shadow
#

Oh, yeah makes sense. Thanks

ripe forge
#

If there's a shape error eysidi you can freely reshape as needed I'd assume. This is a guess but it shouldn't cause problems

#

Make sure you keep the correct axes when you reshape though

#

So perhaps a shape of [batch size, 1, w, h] if your notation is correct.

copper hemlock
#

hmm i will try that thanks

#

i think the issue is caused from my dataloader

random perch
#

Should I buy the book about tensorflow and keras by O’reilly

stuck oar
#

Hey

#

anyone uses Visual Studio notebook here?

#

I thought of using that but I can't find the equivalent of shift+tab (jupyter notebook) on vscode

#

anyone here familiar with this?

modern canyon
#

@stuck oar just hover your mouse over the function

stuck oar
#

@stuck oar just hover your mouse over the function
@modern canyon right haha, thanks!

modern canyon
#

👍

teal notch
#

I get that error when I'm trying to draw a picture inside another image

#

can u help me

#

?

grave frost
#

So pretty wide and vague query - does anyone know a transformer-based model that is good for seq2seq or NMT purposes?

#

Would something like FairSeq be considered good, or maybe some flavors of BERT-like models such as RoBERTa or BART, or even GPT-2? Would these be good models for direct sequence-to-sequence conversion?

#

I think FairSeq is pretty good in itself, since it is dedicated to seq2seq problem types. Would it then be a good idea to use BART, RoBerta and all the other NLP models out there?

molten hamlet
#

@teal notch tuple object has no load

teal notch
#

@teal notch tuple object has no load
@molten hamlet yeah i know but how can i make this bot add images to other image

molten hamlet
#

!ask

arctic wedgeBOT
#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

half bloom
#

idk where to put this but

#

whats the mistake here

desert parcel
#

what are you using to write your code in

#

just curious I can't actually help you lol

lapis sequoia
#

What r some good ai tutorials/courses

desert parcel
#

It's a VOD now

#

I'm currently on computer vision and logistic regression i'm not progressing as fast as I wanted it to

faint ravine
#

How can I build a digit recognizer in python?

lapis sequoia
#

@faint ravine you can build that using opencv,sklearn,numpy

faint ravine
#

How do I get the best neural network?

#

what is the best CNN for handwriting recognition?

#

SOTA

raven mulch
#

If you want to learn how to make your own deep learning library feel free to check out my youtube series! 🙂 https://www.youtube.com/watch?v=nNFsHQaD7gQ&t=1182s

Hello!
Today we start a new adventure where we will be expanding on the JoelNet library with the ultimate goal of deploying our own MNIST web classifier (and maybe attacking it using some simple adversarial attacks). The idea is to model the library around the scikit-learn api...

#

I'm a researcher in machine learning and I make videos on the subject, I'm hoping to create discussions in the comment section to share knowledge, feel free to share them around if you think they are interesting and we can all learn together 🙂

faint ravine
#

What kind of research do you do?

glacial rune
#

I have a script that I would like to run daily to get data from websites. If I want to store this locally, would it be best to create a csv file and append the entries there?

#

another thought - if I wanted to store this online/in the cloud, what would people recommend?

#

one potential issue I can see with the CSV is I'll need a tab for each thing I'm tracking? Which could become quite high?

desert oar
#

Sqlite

#

Or 1 file per request

glacial rune
#

ah ok, I'll look into SQLite, thanks

faint ravine
#

you're welcome.

glacial rune
#

would it be bad practice to have many tables in a SQL database? as I'll be tracking price over days

#

and the table name would be the product name, I guess
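
One common alternative to a table per product is a single table with the product name as a column. The sketch below uses the standard-library `sqlite3` module; the table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the prices table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prices (
            product TEXT NOT NULL,
            day     TEXT NOT NULL,   -- ISO date string, sorts chronologically
            price   REAL NOT NULL,
            PRIMARY KEY (product, day)
        )
        """
    )
    return conn


def record_price(conn, product, day, price):
    """Store one observation; re-running the script on the same day overwrites it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO prices (product, day, price) VALUES (?, ?, ?)",
            (product, day, price),
        )


def price_history(conn, product):
    """Return [(day, price), ...] for one product, oldest first."""
    rows = conn.execute(
        "SELECT day, price FROM prices WHERE product = ? ORDER BY day",
        (product,),
    )
    return rows.fetchall()


conn = open_db()
record_price(conn, "widget", "2020-07-02", 10.49)  # hypothetical data
record_price(conn, "widget", "2020-07-01", 9.99)
record_price(conn, "gadget", "2020-07-01", 25.0)
print(price_history(conn, "widget"))
```

With this layout, adding a new product is just new rows, and queries across products stay simple. Pass a file path to `open_db` to persist the data between daily runs.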

raven mulch
#

@faint ravine in machine learning security

#

That’s what some of my videos are on

#

Robustness etc

faint ravine
#

Like, adversarial attacks and such?

raven mulch
#

Yeah

faint ravine
#

Neat

raven mulch
#

I reviewed some papers on that already

#

🙂

faint ravine
#

So you must be aware of GANs?

raven mulch
#

Generative adversarial networks?

faint ravine
#

yeh

raven mulch
#

They don’t have much to do with adversarial samples

faint ravine
#

Really?

raven mulch
#

Yeah a lot of people get that mixed up haha

lapis sequoia
#

How would you write a docstring for something like kwargs in a function? The example below demonstrates two arguments from kwargs. The actual function will accept many more keyword arguments. I would like to define in the docstring what the possible keyword arguments are.

def prandtl(**kwargs):
    cp = kwargs.get('cp', None)
    alpha = kwargs.get('alpha', None)
    pr = cp / alpha
    return pr
faint ravine
#

Can you give a short summary of how you go about doing machine learning security? and some real world applications? I'd like to know more

pale thunder
#

why not just make those regular kw arguments

```python
def prandtl(*, cp, alpha):
    pr = cp / alpha
    return pr
```
@lapis sequoia
raven mulch
#

Check out my video on adversarial samples and my other one on Lipschitz continuity

#

I think that does a good intro

#

Better than what I could explain here haha

#

But in short

#

We look at how we can attack networks

#

And where they fail (distribution shifts)

lapis sequoia
#

That works fine for a few arguments. But what if I have many arguments like 5 or 10 or more?

raven mulch
#

And we try to make them more robust against this

pale thunder
#

then you list them in the signature

#

if you are taking 10 arguments, you write 10 arguments

lapis sequoia
#

So what's the point of **kwargs when you can just define all the arguments?

pale thunder
#

when you do not know all the arguments, for example when extending a class, or things like the dict constructor, types.SimpleNamespace
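
For the original docstring question, here is a sketch of both options in NumPy-style docstrings. It keeps the placeholder `cp / alpha` formula from the question as-is; the parameter descriptions and the name `prandtl_kwargs` are assumptions for illustration.

```python
def prandtl(*, cp, alpha):
    """Return cp / alpha (the placeholder formula from the question).

    Parameters
    ----------
    cp : float
        First quantity, named ``cp`` in the question.
    alpha : float
        Second quantity; must be non-zero.
    """
    return cp / alpha


def prandtl_kwargs(**kwargs):
    """Same calculation, but taking its inputs through ``**kwargs``.

    Parameters
    ----------
    **kwargs
        Recognised keys:

        cp : float
            First quantity.
        alpha : float
            Second quantity; must be non-zero.
    """
    # Reject typos early instead of silently ignoring them.
    unknown = set(kwargs) - {"cp", "alpha"}
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")
    return kwargs["cp"] / kwargs["alpha"]


print(prandtl(cp=4.0, alpha=2.0))
print(prandtl_kwargs(cp=4.0, alpha=2.0))
```

The explicit signature gets argument checking and editor hints for free; the `**kwargs` version has to re-implement that by hand, which is part of why listing the arguments is usually preferred.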

faint ravine
#

Fail at what?

raven mulch
#

At the task at hand

#

A lot of ML is based under the iid assumption

#

So stuff doesn’t work for ood (out of distribution)

#

We try to fix that in ML sec

#

Or we come up with new attacks

faint ravine
#

Nice

lapis sequoia
#

How can I make this work for only (u, d, rho, mu) or (u, d, nu)?
If I invoke reynolds(0.25, 0.102, rho=910, nu=1.4e-6) then the function will still run.

def reynolds(u, d, rho=None, mu=None, nu=None):

    if u and d and rho and mu:
        re = (rho * u * d) / mu
    elif u and d and nu:
        re = (u * d) / nu
    else:
        raise ValueError('Must provide u, d, rho, mu or u, d, nu')

    return re
pearl crystal
#

Why are some well-known packages in python messy like sklearn?
For example in sklearn.preprocessing (StandardScaler), I should first call fit method and then transform?! Why? It is really messy and is a type of side effect

#

It should be a method like transform, does everything and returns standard data, really simple but I have to remember I first need to call fit to compute mean and std data and then transform it

lapis sequoia
#

Well this seems to work fine.

def reynolds(u, d, rho=None, mu=None, nu=None):

    if rho and mu and not nu:
        re = (rho * u * d) / mu
    elif nu and not rho and not mu:
        re = (u * d) / nu
    else:
        raise ValueError('Must provide (u, d, rho, mu) or (u, d, nu)')

    return re
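
One caveat with the truthiness checks above: a legitimate value of `0` counts as "not provided". A variant that compares against `None` instead might look like this; it is a sketch, and making the optional arguments keyword-only is my own assumption rather than something from the original.

```python
def reynolds(u, d, *, rho=None, mu=None, nu=None):
    """Reynolds number from (u, d, rho, mu) or from (u, d, nu), but not a mix."""
    have_dynamic = rho is not None and mu is not None and nu is None
    have_kinematic = nu is not None and rho is None and mu is None

    if have_dynamic:
        return (rho * u * d) / mu
    if have_kinematic:
        return (u * d) / nu
    raise ValueError('Must provide (u, d, rho, mu) or (u, d, nu)')


print(reynolds(0.25, 0.102, nu=1.4e-6))

try:
    reynolds(0.25, 0.102, rho=910, nu=1.4e-6)  # mixed inputs are rejected
except ValueError as err:
    print(err)
```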
pearl crystal
#

When we have different methods to do the same thing, all of them are acceptable?
np.tile
np.matlib.repmat

Which one do you prefer?
np.reshape(arr,[2,5]) --> numpy methods
arr.reshape([2,5]) --> object methods

graceful thunder
#

second one, it's cleaner

half bloom
#

what are you using to write your code in
@desert parcel jupyter

pearl crystal
#

jupyter is cool only for prototype and learning, I think

#

One big problem for me about jupyter is about IntelliSense and code completion, debugging, refactoring and git integration, bla bla. It is awful

odd yoke
#

you're not alone on that one, 100% agree

bitter harbor
#

I haven’t fully used it yet, but apparently spyder was built for data sci

brazen canyon
#

I use vscode + the inbuilt jupyter feature

austere swift
fervent bridge
#

Downloaded the Cars 196 data set and I'm reading through the .mat file (never worked with .mat before). I want to know how to get further details of the file. Currently I got [('annotations', (1, 16185), 'struct'), ('class_names', (1, 196), 'cell')]. How do I get further details of 'annotations' and 'class_names'? Tried test['annotations'], nothing

drowsy kite
#

Hey guys does anyone have a cheat sheet or resource on how to predict values given that you have dummy columns?

#

im really confused on how you would identify the 1's and 0's when using .predict

fervent bridge
#

NVM my question already figured it out a while ago

drifting umbra
#

@drowsy kite you would need to decide on a model

#

if you have a lot of categorical variables i have found Catboost is faster and more accurate than XGBoost

#

let me know if you have questions

tidal sonnet
#

nice

buoyant cypress
#

hello data science people

#

I have a question which Im gonna crosspost

#

since this is probably the right place for it

uncut shadow
#

well, it's not connected with data science. But, what exactly do you mean? You could have just turned it to string and then just add , every 3 numbers

steady bronze
#

do you guys know how to return values which appear multiple times in a column using pandas

wintry sapphire
#

Hi guys, would anyone know why the output from np.polyfit is different from my own manual calculation through python?

tidal bough
#

@buoyant cypress probably can be done via just string formatting

uncut shadow
#

@buoyant cypress okay just use {n:,} where n is a number

tidal bough
#

Hi guys, would anyone know why the output from np.polyfit is different from my own manual calculation through python?
^ this is solved, by the way.

boreal swift
#

beat me to it

uncut shadow
boreal swift
#

There's also a way to make it locale aware

#

And in {n:,} you can replace comma with any symbol you want to seperate with
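
A small caveat on the separator: as far as the format spec itself goes, only `,` and `_` are accepted as grouping characters, so other separators need a `str.replace` afterwards. A minimal sketch, with a made-up helper name:

```python
def group_digits(n, sep=","):
    """Format an integer with a thousands separator of your choice."""
    # The format spec only knows "," and "_", so format with "," first
    # and then swap in the requested separator.
    return f"{n:,}".replace(",", sep)


n = 1234567
print(f"{n:,}")              # built-in comma grouping
print(f"{n:_}")              # built-in underscore grouping
print(group_digits(n, " "))  # any other separator via replace
```

For the locale-aware route mentioned above, the `n` presentation type (`f"{n:n}"`) follows whatever locale has been set with `locale.setlocale`.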

whole plover
#

I want to fit some data in a pandas dataframe using a custom lmfit model, however my output is shuffled around in a weird way

#

I'm using matplotlib for the plotting. the essence of the code is: ```python
result = model.fit(y, x=x, method="leastsq", params=params)

plt.scatter(x, y)
plt.scatter(x, result.best_fit)
```

#

any idea what is happening here?

#

the fitting model is a linear term with a numpy sin wave on top

grave frost
#

So pretty wide and vague query - does anyone know a model which uses transformers and is good for seq2seq or NMT purposes?

#

Would something like FairSeq be considered good, or maybe some flavors of BERT-like models such as RoBERTa or BART, or even GPT-2? Would these be good models for direct sequence-to-sequence conversion?
I think FairSeq is pretty good in itself, since it is dedicated to seq2seq problem types. Would it then be a good idea to use BART, RoBerta and all the other NLP models out there?

#

@whole plover Any reason why you are using linear regression for such a data pattern?

whole plover
#

@grave frost Not intentionally no, isnt this a nonlinear fit?

#

I've never done fitting with python before so forgive me if im wrong

solid aurora
#

@pearl crystal .fit_transform()

#

basically does fit and transform in one step
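
On why `fit` and `transform` are separate in the first place: the mean and standard deviation are learned once from the training data and then reused on any later data, so test data is scaled with the training statistics rather than its own. The toy class below is a plain-Python stand-in to show the pattern, not sklearn's implementation; it assumes the population standard deviation.

```python
from statistics import fmean, pstdev


class ToyStandardScaler:
    """Minimal single-feature scaler mimicking the fit / transform split."""

    def fit(self, values):
        # Learn the statistics once, from the training data only.
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def transform(self, values):
        # Reuse the stored statistics on any data, seen or unseen.
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        # Convenience shortcut for the training set.
        return self.fit(values).transform(values)


train = [1.0, 2.0, 3.0]
test = [2.0, 4.0]

scaler = ToyStandardScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)  # scaled with the *training* mean and std
print(train_scaled, test_scaled)
```

A single do-everything `transform` would have to recompute the statistics on each call, which would scale the test set differently from the training set.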

uncut python
#

Can anyone tell me how can do you interpreted a linear regression graph as given here in figure a, b, and c

#

Interpret*

#

I can send more details or figure legends if needed

smoky meadow
#

What is better to use if I need only clear copy of numpy array, .copy() or np.copy()?

dusty sage
#

I'm considering doing a course on data science. Can someone recommend to me some good reading material that would show me the ropes

lapis sequoia
#

i mean both functions do the same thing, so u can choose whichever one u like

#

personally i use np.copy() because i like the word np in my code

cosmic lynx
#

So I want to make a perfect AI for a fighting game, how far above my head am I getting? I don’t know anything about AI aside from how it works in theory.

cinder sage
#

No checking here either @cosmic lynx just fyi

cosmic lynx
#

I am confused

#

so should I just probe the internet instead?

cinder sage
#

You can ask general things, but not for us to help you cheat

grave frost
#

@cosmic lynx Do you want to use ML or just a generic game AI present in most single-player games??

#

@cinder sage How is asking for help cheating??

cosmic lynx
#

I was thinking ML just to see what insanity happens, who knows, it may find stuff like touch of death combos...

cinder sage
#

@grave frost they are asking to use AI to perform perfect actions in a game. My guess is that is against the ToS of said game.

grave frost
#

ML usually breaks the game because it exploits the game's engine, but yeah it is really powerful if you use it correctly

#

However, RL does require some expertise. The more advanced your model, the more powerful actions it can take and the more it draws itself to above-human level of playing...

cosmic lynx
#

?
okay, now I think I get it. The game I’m planning on doing this in is like street fighter but with more stuff thrown in

grave frost
#

np. Model can handle it. If you want to ease into Reinforcement Learning (RL) best way is to use a simple DQN and bump up the complexity as you learn....

cosmic lynx
#

DQN?

grave frost
#

@cinder sage Why is using a bot against the ToS? As long as you don't "hack" or cheat it is considered fine. OpenAI made a model for DOTA 2. Since it got the same input as a human, it was allowed to play in the international tournament too...

#

@cosmic lynx Deep Q-learning Network. A very simple yet sometimes effective model. Good for simple games (Atari) and for beginners in RL...
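
For a feel of what sits underneath a DQN (usually expanded as Deep Q-Network), here is the tabular Q-learning update it builds on. This is only a sketch of the update rule: a real DQN replaces the table with a neural network and typically adds pieces like a replay buffer. The state names and numbers are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps each state to a list of action values.
    """
    # Value of the best action available from the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q[next_state])
    # Target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Two made-up states with two actions each.
q_table = {"far": [0.0, 0.0], "near": [0.0, 1.0]}
new_value = q_learning_update(q_table, "far", 1, reward=0.5,
                              next_state="near", alpha=0.5, gamma=0.9)
print(new_value)
```

Repeating this update over many played games is what gradually teaches the table (or, in a DQN, the network) which action is worth taking in each state.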

cosmic lynx
#

Wait, a bot being allowed to play in a tournament? That’s interesting
I’ll have to look into it, thanks

grave frost
#

Anytime

cinder sage
#

@grave frost openAI had permission

grave frost
#

How does a bot get an advantage? For many things it's usually a limitation, as it can't take "blazing-fast actions" or use long-term strategy (consumer-level models). I don't see how it is cheating because you are basically limiting your own game score by allowing a bot to play...

cosmic lynx
#

also I don’t think I can hook up this bot to anything I can’t run on my potato....

grave frost
#

Do you have a GPU??

cosmic lynx
#

I have a 5 year old laptop that was middle end then....

grave frost
#

Well, training a model without a GPU is painfully slow. I suggest you look up Colab, Google's initiative to provide free GPUs with minimal setup. But it's not easy to do RL on Colab, so I suggest you get some GPU resources. A lappy isn't gonna cut it

cosmic lynx
#

F
Either way I was planning on buying a cheap desktop soon...

#

So much for AI tic-tac-toe bot perfecting the game....

grave frost
#

Just make sure it has an Nvidia GPU if you do want to do some ML. You can do ML with an AMD GPU but it won't work perfectly and may lead to a lot of crashes and bugs.

#

@cosmic lynx I think there are some people who have done RL on CPU only but I guess it will take hell of a time then. If you are fine with running your laptop 24hr+ then I think you can get started right away