#data-science-and-ml | Python | Page 244

frail locust Aug 13, 2020, 2:28 PM

#

how do you type in code like that in discord

thin pecan Aug 13, 2020, 2:30 PM

#

@frail locust Use the ` character once on each side of a word to do this: `this`
If you want to do full code snippets, use it three times on each side, like: ```Hello```

#

Wow

#

Use the ` key

frail locust Aug 13, 2020, 2:34 PM

#

ty

slow onyx Aug 13, 2020, 2:40 PM

#

Hi guys! Could please somebody help me with a computer vision task?

lapis sequoia Aug 13, 2020, 3:00 PM

#

Hey guys hows the mit course on linear algebra on youtube for data science ?

#

hey guys

arctic cliff Aug 13, 2020, 3:07 PM

#

Well ?

📎 unknown.png

#

WELL ?

📎 unknown.png

desert oar Aug 13, 2020, 3:33 PM

#

@frail locust for visual display, or actually in the number?

#

@arctic cliff are those strings "NaN"?

#

oh hm

#

thats actually a bit weird

arctic cliff Aug 13, 2020, 3:36 PM

#

The weirdest thing is the whole dataset has 9879 rows

#

Do 0 values affect ?

desert oar Aug 13, 2020, 3:37 PM

#

0 is not the same as null

#

so no

arctic cliff Aug 13, 2020, 3:37 PM

#

I see

orchid badge Aug 13, 2020, 3:50 PM

#

Hi, I’m using Tensorflow 1.15.3 / Ludwig with Google Colab and have followed this guide to the letter: https://www.searchenginejournal.com/automated-intent-classification-using-deep-learning-part-2/318691/#close
When I run !ludwig experiment --data_csv Question_Classification_Dataset.csv --model_definition_file model_definition.yaml
I get an error File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 491, in __getattr__ raise _exceptions.UnparsedFlagAccessError(error_message) absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --preserve_unused_tokens before flags were parsed.

Here’s the link to Colab if you want to take a peek: https://colab.research.google.com/drive/1LZ9aA06B3wXysgGc8tfVHi8FZqZFx2xZ?usp=sharing

Thanks for your time. Been Googling all afternoon and don't want to give up! Hopefully not too much of a noobish question.

desert oar Aug 13, 2020, 3:57 PM

#

@lapis sequoia maybe something like this ```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data.csv', parse_dates=['timestamp'])

# Count rows per hour and per directive.
# The chain is wrapped in parentheses so it can span several lines.
violations_hourly = (
    data
    .groupby([pd.Grouper(key='timestamp', freq='60min'), 'VIOLATED_DIRECTIVE'])
    .apply(lambda x: x.shape[0])
    .to_frame('count')
    .reset_index()
)

# One line per directive on a shared axes.
fig, ax = plt.subplots()
for lab, grp in violations_hourly.groupby('VIOLATED_DIRECTIVE'):
    ax.plot(grp['timestamp'], grp['count'], label=lab)
fig.legend()
plt.show()
```

#

the "Grouper" thing i didnt know about before. got it from here: https://stackoverflow.com/a/32012129/2954547

Stack Overflow

Pandas: resample timeseries with groupby

#

using .count() itself for some reason resulted in an empty dataframe

#

not sure why

arctic cliff Aug 13, 2020, 4:03 PM

#

@desert oar
The blue df contains 2 int numbers
Can I split the x number like the y ?

📎 unknown.png

lapis sequoia Aug 13, 2020, 4:04 PM

#

thanks @desert oar I'll test it out and let you know

#

many thanks

desert oar Aug 13, 2020, 4:06 PM

#

@arctic cliff

ax = plt.gca()
ax.set_xticklabels(blue['blueDragons'].unique())

maybe try something like this

arctic cliff Aug 13, 2020, 4:16 PM

#

AttributeError: 'numpy.int64' object has no attribute 'unique'
@desert oar

#

It's only one value
But I want to automatically split it into 7 pieces so the plot can be more logical

#

I still don't know if I should let it like that ..

desert oar Aug 13, 2020, 5:04 PM

#

oh

#

how does that make sense

#

how did you even plot a single value

#

ohhh i see what you did

#

ax = plt.gca()
ax.set_xticklabels(blue.columns)

#

@arctic cliff

#

anyway ax.set_xticklabels is what you want

#

plt.gca() is "Get Current Axes"

#

an "axes" (in matplotlib terminology) is the area that you plot in

#

a "figure" is a grid of one or more axes

modest rune Aug 13, 2020, 5:20 PM

#

In scipy, if I understand everything correctly, scipy.stats.norm.cdf(x) returns the probability that a value <= x will occur under a normal distribution. Thus x = 0 gives 0.5, x = Infinity gives 1.0, and x = -Infinity gives 0.0.

But can someone explain what scipy.stats.norm.pdf(x) does?

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

desert oar Aug 13, 2020, 5:20 PM

#

the pdf is the probability density

#

for a continuous distribution its not really interpretable in and of itself, but

#

its analogous to density in physics/chemistry

#

integral of density within a certain domain = mass in that domain

modest rune Aug 13, 2020, 5:21 PM

#

for a continuous distribution its not really interpretable in and of itself, but
@desert oar

That is what is confusing me. Because, scipy seems to let you do this.

desert oar Aug 13, 2020, 5:22 PM

#

they have a lot of use

#

just not a lot of interpretation without more context

tidal bough Aug 13, 2020, 5:22 PM

#

https://en.wikipedia.org/wiki/Probability_density_function
Basically, the PDF is the derivative of the function you mentioned.

desert oar Aug 13, 2020, 5:22 PM

#

^

modest rune Aug 13, 2020, 5:22 PM

#

Ok, so PDF(0) is equal to 0

desert oar Aug 13, 2020, 5:23 PM

#

no

tidal bough Aug 13, 2020, 5:23 PM

#

they are both used, because well, you can easily get one from the other.

desert oar Aug 13, 2020, 5:23 PM

#

PDF(x) is derivative of CDF at x

#

PDF(x) = (d CDF / dx)(x)

#

it makes more sense intuitively if you look at discrete distributions

#

for which PDF(x) = P(X = x)

#

whereas for a continuous distribution P(X = x) is always 0 because measure theory

modest rune Aug 13, 2020, 5:25 PM

#

ok, that makes sense. So, if I have a function... like the black scholes model that has one or more CDFs in it. And I want to find the derivative of that function, all CDFs turn into PDFs.

desert oar Aug 13, 2020, 5:25 PM

#

so you can only talk about P(X <= x) in a continuous distribution, which is the CDF

#

correct

tidal bough Aug 13, 2020, 5:25 PM

#

https://www.desmos.com/calculator/njtytzquvt
Here, I made a graph.

modest rune Aug 13, 2020, 5:25 PM

#

excellent, you all were great help 🙂

tidal bough Aug 13, 2020, 5:25 PM

#

The red is the PDF of a Gaussian (normal) distribution, the blue is its CDF, which for the normal distribution is expressed in terms of the error function.
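
The PDF/CDF relationship discussed above can be checked numerically. This is a minimal sketch that uses the standard library's `statistics.NormalDist` instead of `scipy.stats.norm` (an assumption made only to keep the example dependency-free; `scipy.stats.norm.cdf` and `scipy.stats.norm.pdf` play the same roles). The helper names `numeric_derivative` and `cdf_via_erf` are made up for illustration.

```python
from math import erf, sqrt
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def numeric_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cdf_via_erf(x):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# The slope of the CDF at x should match the PDF at x.
for x in (-1.0, 0.0, 2.0):
    slope = numeric_derivative(std_normal.cdf, x)
    print(
        f"x={x:+.1f}  cdf={std_normal.cdf(x):.4f}  "
        f"pdf={std_normal.pdf(x):.4f}  d(cdf)/dx={slope:.4f}"
    )
```

Note that the PDF at 0 is about 0.3989 rather than 0: it is the slope of the CDF at that point, not a probability.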

arctic cliff Aug 13, 2020, 5:28 PM

#

I guess my plotting itself is wrong .. @desert oar

#

My columns contain only 0 or 1

#

So I'm summing

#

That's why I end up with only one value

modest rune Aug 13, 2020, 5:28 PM

#

I ran into this while learning how to backsolve the implied volatility of a stock option from its current market price. The process required the usage of the derivative of the black scholes model.
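
A rough sketch of the back-solving described here, assuming a European call with no dividends and Newton's method as the root finder (the chat does not say which solver was used). The derivative of the price with respect to volatility (vega) is where the CDF terms turn into a PDF term. All function names are hypothetical, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf and N.pdf


def _d1(S, K, T, r, sigma):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = _d1(S, K, T, r, sigma)
    d2 = d1 - sigma * sqrt(T)
    return S * N.cdf(d1) - K * exp(-r * T) * N.cdf(d2)


def bs_vega(S, K, T, r, sigma):
    """Derivative of the call price with respect to sigma; note the PDF."""
    return S * N.pdf(_d1(S, K, T, r, sigma)) * sqrt(T)


def implied_vol(price, S, K, T, r, sigma=0.2, tol=1e-8, max_iter=100):
    """Newton's method: adjust sigma until the model price matches `price`."""
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            return sigma
        sigma -= diff / bs_vega(S, K, T, r, sigma)
    raise RuntimeError("Newton iteration did not converge")


# Round trip: price an option at a known volatility, then recover it.
market_price = bs_call(100.0, 100.0, 1.0, 0.05, 0.30)
recovered = implied_vol(market_price, 100.0, 100.0, 1.0, 0.05)
print(round(recovered, 6))
```

Newton's method can fail for deep in- or out-of-the-money options where vega is tiny; a bracketing method such as bisection is a common fallback.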

desert oar Aug 13, 2020, 5:30 PM

#

maybe something like this

blue = df[['blueDragons', 'blueWins']].sum()
plt.plot([0, 1], blue)
plt.gca().set_xticks([0, 1])  # fix the tick positions before labelling them
plt.gca().set_xticklabels(blue.index)
plt.xlabel('Dragon Effect')
plt.ylabel('Winnings')

#

@arctic cliff

arctic cliff Aug 13, 2020, 5:30 PM

#

📎 (image attachment)

#

Oh

#

forgot something..

#

📎 (image attachment)

#

It's the same as before ..

#

📎 unknown.png

#

Does that plot make sense to you ?
Can you get any useful info from it ?

#

Maybe it's right I don't know

tidal bough Aug 13, 2020, 5:36 PM

#

the x-axis is missing on the new ones

arctic cliff Aug 13, 2020, 5:37 PM

#

Guess I don't even need a visualization for that kind of comparing ?

#

Well, They're actually not
X and Y are only 2 values
1 for x
1 for y

desert oar Aug 13, 2020, 5:37 PM

#

the idea of my code was to try and control the X axes more. you can keep experimenting

#

but yeah you should just print those values imo

#

no purpose in graphing 2 points

tidal bough Aug 13, 2020, 5:37 PM

#

wait, you have two points?

desert oar Aug 13, 2020, 5:37 PM

#

@tidal bough they did .sum() on 2 columns

tidal bough Aug 13, 2020, 5:37 PM

#

Then yeah, lol, maybe don't connect them with a line, that's very misleading 😅

desert oar Aug 13, 2020, 5:38 PM

#

df[['a', 'b']].sum() returns a Series with 2 numbers and index values 'a' and 'b'

tidal bough Aug 13, 2020, 5:38 PM

#

it's not really something you see in How To Lie With Statistics. Not even there do people get 2 points of data and pretend it's a straight line.

desert oar Aug 13, 2020, 5:38 PM

#

i think they're just trying to make a plot lol

#

matplotlib docs are a swamp

arctic cliff Aug 13, 2020, 5:39 PM

#

xD Gotcha !

tidal bough Aug 13, 2020, 5:39 PM

#

they are

desert oar Aug 13, 2020, 5:39 PM

#

try plt.scatter or plt.plot(x, y, '.')

#

or plt.plot(x, y, 'o') (i think)

tidal bough Aug 13, 2020, 5:39 PM

#

or marker = "d",linestyle="", I think

arctic cliff Aug 13, 2020, 5:39 PM

#

I tried the scatter thing on both the series and the original df columns

#

I got some weird outputs

#

Not weird if they make sense ..

#

Something you can expect from 0 and 1

desert oar Aug 13, 2020, 5:40 PM

#

df.plot.scatter('blueDragons', 'blueWins')

did you try this?

#

oh yeah just 0 and 1

#

how about a cross table

arctic cliff Aug 13, 2020, 5:40 PM

#

What's a cross table ?

desert oar Aug 13, 2020, 5:40 PM

#

pd.crosstab(df['blueDragons'], df['blueWins'])
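
For two 0/1 columns, a cross table is just a count of each combination. A small sketch with made-up toy values standing in for the real `blueDragons` and `blueWins` columns; `normalize="index"` turns each row into proportions, which reads as a win rate per dragon value.

```python
import pandas as pd

# Toy data (values invented for illustration, not taken from the dataset).
df = pd.DataFrame({
    "blueDragons": [0, 1, 1, 0, 1, 0],
    "blueWins":    [0, 1, 1, 0, 0, 1],
})

# Rows: blueDragons value, columns: blueWins value, cells: number of games.
counts = pd.crosstab(df["blueDragons"], df["blueWins"])

# Same table, but each row sums to 1 (share of losses / wins per row).
rates = pd.crosstab(df["blueDragons"], df["blueWins"], normalize="index")

print(counts)
print(rates)
```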

arctic cliff Aug 13, 2020, 5:40 PM

#

Wait

#

Oh

#

I would prefer printing

#

📎 unknown.png

bleak fox Aug 13, 2020, 7:11 PM

#

@arctic cliff is the issue resolved? If not please provide me some background I may help you in this

arctic cliff Aug 13, 2020, 7:13 PM

#

Give me a second

#

@bleak fox Take a look at this:
https://www.kaggle.com/potatomanduh/league-of-legends-dragons-effect-on-winning

League Of Legends: Dragons Effect on Winning

Explore and run machine learning code with Kaggle Notebooks | Using data from League of Legends Diamond Ranked Games (10 min)

bleak fox Aug 13, 2020, 7:17 PM

#

@arctic cliff thanks, i have gone through with this... Now please share what is the exact problem which you are facing?

arctic cliff Aug 13, 2020, 7:18 PM

#

I tried to make a plot about the relationship between blue/redWins and blue/redDragons

bleak fox Aug 13, 2020, 7:20 PM

#

@arctic cliff what is the point in plotting these 4 points...?

arctic cliff Aug 13, 2020, 7:20 PM

#

To show the correlation between the two of them

bleak fox Aug 13, 2020, 7:22 PM

#

To show correlation we generally use a scatter plot, so you can plot df.blueWins vs df.redWins (all values)

#

Also for correlation, you can use df.corr() , to print

arctic cliff Aug 13, 2020, 7:23 PM

#

Give me a second

bleak fox Aug 13, 2020, 7:25 PM

#

df[['blueWins','blueDragons', 'redWins','redDragons']].corr()

arctic cliff Aug 13, 2020, 7:26 PM

#

📎 (image attachment)

#

plt.scatter(df['blueWins'], df['blueDragons'])

bleak fox Aug 13, 2020, 7:27 PM

#

Use all the values for these points. Does your main df have only 4 rows?

arctic cliff Aug 13, 2020, 7:27 PM

#

Nope
But it contains only 2 kinds of values

#

0 and 1

#

1 = Won
0 = Lost

bleak fox Aug 13, 2020, 7:27 PM

#

Can you share access of your notebook with me?

lapis sequoia Aug 13, 2020, 7:27 PM

#

does anyone know why this sql query is not working in the 'WHERE' clause

arctic cliff Aug 13, 2020, 7:27 PM

#

Sure thing, Wait

bleak fox Aug 13, 2020, 7:28 PM

#

does anyone know why this sql query is not working in the 'WHERE' clause
@lapis sequoia share query

lapis sequoia Aug 13, 2020, 7:28 PM

#

select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE mon != 8 and yyyy != 2020
GROUP BY 1,2
ORDER BY 2,1

#

getting column 'mon' does not exist

#

in the where clause

#

using psql

#

it works fine when i exclude the where clause but the columns are named mon and yyyy

bleak fox Aug 13, 2020, 7:29 PM

#

Put "mon" And same for "yyyy" And try once

lapis sequoia Aug 13, 2020, 7:29 PM

#

in the select or where clause?

arctic cliff Aug 13, 2020, 7:29 PM

#

I guess I need to search for you to add you to the Collaborators ?

bleak fox Aug 13, 2020, 7:30 PM

#

in the select or where clause?
@lapis sequoia in where clause

#

I guess I need to search for you to add you to the Collaborators ?
@arctic cliff kapil.task.pro@gmail.com

lapis sequoia Aug 13, 2020, 7:30 PM

#

im still getting a column "mon" does not exist

arctic cliff Aug 13, 2020, 7:30 PM

#

Do you have a kaggle account ?

lapis sequoia Aug 13, 2020, 7:30 PM

#

WHERE "mon" != 8 and "yyyy" != 2020

bleak fox Aug 13, 2020, 7:31 PM

#

Do you have a kaggle account ?
@arctic cliff https://www.kaggle.com/kapilpanwar

Kapil Panwar | Kaggle

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

arctic cliff Aug 13, 2020, 7:31 PM

#

Done

#

https://www.kaggle.com/potatomanduh/league-of-legends-dragons-effect-on-winning

bleak fox Aug 13, 2020, 7:32 PM

#

WHERE "mon" != 8 and "yyyy" != 2020
@lapis sequoia nav, the names mon and yyyy are only display aliases. You can repeat the same expression in the where clause as well, like where extract(year from tstamp) != x

lapis sequoia Aug 13, 2020, 7:33 PM

#

oh yeah i understand that

#

i just wanted to know why it doesnt work for an alias

#

so i cant use an alias in the where clause?

bleak fox Aug 13, 2020, 7:40 PM

#

i just wanted to know why it doesnt work for an alias
@lapis sequoia your where clause is evaluated on the db side against the actual column names, whereas an alias only gives a new name to the output after extraction. where runs before that step, hence it always requires the actual column name or expression

#

@arctic cliff added heatmap and correlation matrix for your data in notebook cell 6 and 7

#

📎 Screenshot_2020-08-14-01-09-10-355_com.android.chrome.jpg

lapis sequoia Aug 13, 2020, 7:41 PM

#

ok thank you @bleak fox

#

also last question

bleak fox Aug 13, 2020, 7:42 PM

#

also last question
@lapis sequoia I'll try

lapis sequoia Aug 13, 2020, 7:42 PM

#

```
select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE extract(month from tstamp) != 8 and extract(year from tstamp) != 2020
GROUP BY 1,2
ORDER BY 2,1
```

#

why does this result in all the rows containing 8 in the month AND all the rows containing 2020 in year to be lost?

arctic cliff Aug 13, 2020, 7:42 PM

#

What can I understand from a heatmap? It looks unfamiliar to me

lapis sequoia Aug 13, 2020, 7:42 PM

#

i just want 8-2020 to be lost

#

the where clause seems to be the problem

bleak fox Aug 13, 2020, 7:43 PM

#

```
select extract(month from tstamp) as mon, extract(year from tstamp) as yyyy, count(number)
FROM table
WHERE extract(month from tstamp) != 8 and extract(year from tstamp) != 2020
GROUP BY 1,2
ORDER BY 2,1
```

@lapis sequoia this is what your filter is doing if month is 8 and year is 2020 don't include them...

lapis sequoia Aug 13, 2020, 7:44 PM

#

yes

#

but it gets rid of 8-2017, 8-2019, 8-2020, AND all the months of 2020

#

so it gets rid of 1-2020, 2-2020, 3-2020 as well

#

i just want only 8-2020 gone

#

ok wait i changed it to OR and it worked

#

i dont know why lol
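
Why OR works: the goal is NOT (month = 8 AND year = 2020), and by De Morgan's law that is the same as month != 8 OR year != 2020. The AND version keeps only rows that are neither in August nor in 2020. A small sketch using Python's built-in `sqlite3` rather than PostgreSQL (an assumption, so `strftime` replaces `extract`; the `events` table and its rows are invented). The subquery also shows one way to filter on the `mon` and `yyyy` aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tstamp TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2017-08-30",), ("2019-03-05",), ("2019-08-01",),
     ("2020-01-15",), ("2020-08-13",)],
)


def months_kept(where):
    """Return the (yyyy, mon) pairs that survive the given WHERE condition."""
    sql = f"""
        SELECT yyyy, mon FROM (
            SELECT CAST(strftime('%Y', tstamp) AS INTEGER) AS yyyy,
                   CAST(strftime('%m', tstamp) AS INTEGER) AS mon
            FROM events
        ) AS t
        WHERE {where}
        ORDER BY yyyy, mon
    """
    return conn.execute(sql).fetchall()


with_and = months_kept("mon != 8 AND yyyy != 2020")      # drops every August and all of 2020
with_or = months_kept("mon != 8 OR yyyy != 2020")        # drops only 8-2020
with_not = months_kept("NOT (mon = 8 AND yyyy = 2020)")  # same thing, written directly

print(with_and)
print(with_or)
print(with_not)
```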

bleak fox Aug 13, 2020, 7:46 PM

#

What can I understand from a heatmap? It looks unfamiliar to me
@arctic cliff it is giving the correlation between all your columns. Values near 1 (or -1) show strong correlation, whereas values near 0 show little or no linear relationship

#

so it gets rid of 1-2020, 2-2020, 3-2020 as well
@lapis sequoia happy for you😀

lapis sequoia Aug 13, 2020, 7:47 PM

#

thanks!

#

for helping

bleak fox Aug 13, 2020, 7:50 PM

#

https://www.youtube.com/channel/UCChfG4FWN6qSPFqZF_E9XvA/videos please join and support 💪

YouTube

AI ML with Kapil Panwar

proper swift Aug 13, 2020, 7:50 PM

#

quick question, (apologies if im using the wrong terminology) is there a way to stop jupyter notebooks from automatically moving the notebook every time i click a cell? Its driving me insane

bleak fox Aug 13, 2020, 7:53 PM

#

quick question, (apologies if im using the wrong terminology) is there a way to stop jupyter notebooks from automatically moving the notebook every time i click a cell? Its driving me insane
@proper swift can you please elaborate what is moving and where?

proper swift Aug 13, 2020, 7:55 PM

#

sorry first time using jupyter, everytime i try to click the end of piece of code, the notebook jaggedly moves. I have to scroll with the mouse to get it back into a more suitable position
I have no additional extensions installed. Im only using vanilla Jupyter on Windows 10 with Python 3.8

bleak fox Aug 13, 2020, 7:58 PM

#

sorry first time using jupyter, everytime i try to click the end of piece of code, the notebook jaggedly moves downwards. I have to scroll with the mouse to get it back into more suitable position
@proper swift sorry bro... It seems some issue with your browser/os/jupyter settings.. It is outside my scope... 😩

proper swift Aug 13, 2020, 7:59 PM

#

😦

bleak fox Aug 13, 2020, 8:00 PM

#

@proper swift you can use vs code notebooks... I feel they are better than jupyter notebooks

proper swift Aug 13, 2020, 8:01 PM

#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks

bleak fox Aug 13, 2020, 8:02 PM

#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks
@proper swift look it is just a place where you write code... You will be easily able to do things in vs code notebooks with same commands...

#

Good to know. Sadly , i'm following a tutorial on Pandas which is using jupyter notebooks
@proper swift check this out https://youtu.be/sHk9PH-9tSs

YouTube

EvidenceN

How to install jupyter notebook in visual studio code

proper swift Aug 13, 2020, 8:04 PM

#

thanks for the link, will check it out

bleak fox Aug 13, 2020, 8:04 PM

#

@proper swift welcome...

#

thanks for the link, will check it out
@proper swift favour me in supporting my channel too, we have also started data science course from scratch https://www.youtube.com/channel/UCChfG4FWN6qSPFqZF_E9XvA/videos

#

@bleak fox Are u a data science student?
@lapis sequoia no, i am a professional with 7+ year of experience in this field.

tidal sonnet Aug 13, 2020, 9:42 PM

#

link to where i can find out more about data science?

#

other than wikipedia?
Or is wikipedia reliable?

bleak fox Aug 13, 2020, 10:10 PM

#

link to where i can find out more about data science?
@tidal sonnet https://www.youtube.com/channel/UCChfG4FWN6qSPFqZF_E9XvA/videos

bitter harbor Aug 13, 2020, 10:11 PM

#

@tidal sonnet Data sci is more than ml tho

tidal sonnet Aug 13, 2020, 10:11 PM

#

ik

#

i picked py cause i wanted to learn ml
but i also want to know more about other parts of data science

bitter harbor Aug 13, 2020, 10:13 PM

#

well imo, databases are a big part of it

#

as in, if you learn to use them, they'll be pretty useful

#

numpy + pandas + matplotlib are some key libraries to learn as well

tidal sonnet Aug 13, 2020, 10:22 PM

#

noted

#

thank youuuu

desert oar Aug 13, 2020, 10:36 PM

#

python is pretty much used in every field nowadays

bitter harbor Aug 13, 2020, 10:36 PM

#

ngl still haven't used it for ela

desert oar Aug 13, 2020, 10:38 PM

#

ela?

bitter harbor Aug 13, 2020, 10:38 PM

#

english

desert oar Aug 13, 2020, 10:39 PM

#

oh. people use it in the humanities, albeit more rarely

#

pandoc (in haskell) was written by a philosophy professor, if i recall right

bitter harbor Aug 13, 2020, 10:50 PM

#

hm I'll have a look into it

desert oar Aug 13, 2020, 10:53 PM

#

you have cases where people in the humanities write python scripts to manage their reference lists, things like that

#

not typically used directly in research, but can definitely be used as an automation tool by researchers

solid aurora Aug 13, 2020, 11:24 PM

#

So I'm trying to write a k-fold algorithm that maintains class balance (like sklearn's StratifiedKFold) and doesn't split groups (like sklearn's GroupKFold)

#

I'm not sure how to go about doing that though

#

any ideas for a basic algorithm I could follow?

flat quest Aug 13, 2020, 11:29 PM

#

@solid aurora
Not really any particular algorithm, except for getting all the elements for each class, finding how many elements you need to have an even ratio, and then throwing away the extra elements (this could be problematic tho)

solid aurora Aug 13, 2020, 11:29 PM

#

@flat quest I'm not trying to delete elements from my dataset at all

#

If my class balance is 1:4, StratifiedKFold will make all folds approximately 1:4 as well

#

meaning it purposely tries to maintain that ratio rather than leaving it up to probability

flat quest Aug 13, 2020, 11:34 PM

#

well one way to go about it would be lets say your fold has 1000 elements and there's a ratio of 4 cats to 1 dog.

The dataset has 4000 cats to 1000 dogs. So you calculate the number of cats that you'll need, then get all the cats in the ds and randomly select that number of elements you need, then do the same for the dogs.

There's probably a faster way to do it, but that's just one way to do it @solid aurora

solid aurora Aug 13, 2020, 11:52 PM

#

@flat quest then I also need to not split groups across folds (like GroupKFold)

#

what you described is basically StratifiedKFold, which is what I would normally use unless I needed groups

#

going with your example of cats+dogs, let there be owners who each own anywhere from 1 to 100 cats and dogs

#

owners can have both cats and dogs

#

then I'm not allowed to split the pets of an owner across two or more different folds

#

but let's say I have 1000 pets and I want 5 folds, then I still need 160 cats and 40 dogs per fold

#

@flat quest make sense?

flat quest Aug 14, 2020, 12:38 AM

#

ah gotcha

So not split groups across folds, but you want to balance the overall classes
You can't get perfectly equal class ratios across each fold, but you can make an approximation

Ok so one way would be.

Calculate the number of elements for each class that should be in the fold.
Then select the group that reduces the required number of elements for that fold by the greatest amount (so let's say the fold needs 4000 cats 1000 dogs, but the group has 80 cats 20 dogs, the remaining elements required would be 3920 cats 980 dogs).
Continue doing so until we hit a certain threshold for all classes.

This would require some calculation steps to find the group that reduces the required elements by the greatest amount. It might be slow. It'll also congregate all the large groups into the first few folds.

Another way would be to again calculate the number of elements of each class for the fold, but then select a group at random. Continue to do so, until one or all the classes have surpassed a threshold. This one is faster and will distribute the groups better, but it will be more error prone.

For example we might have many groups that are 100 cats 1 dog. This might cause the class distribution for the fold to be like 8,000 cats to 1000 dogs when all the class counts reach their threshold.

@scarlet wigeon
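
A rough sketch of a greedy variant of the idea above, not the exact procedure from the chat: whole groups are placed one at a time, largest first, into whichever fold leaves the class shares most even. The function name and the imbalance score are assumptions chosen for illustration. Newer scikit-learn releases also ship a `StratifiedGroupKFold` splitter for this problem.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_splits):
    """Greedily assign whole groups to folds while keeping class shares even.

    Returns a dict mapping each group to a fold index.
    """
    classes = sorted(set(labels))
    totals = Counter(labels)

    # Class counts inside each group, e.g. {"owner_1": Counter(cat=4, dog=1)}.
    group_counts = defaultdict(Counter)
    for label, group in zip(labels, groups):
        group_counts[group][label] += 1

    fold_counts = [Counter() for _ in range(n_splits)]

    def imbalance():
        # For each class, measure how unevenly its samples spread over folds.
        score = 0.0
        for c in classes:
            shares = [fc[c] / totals[c] for fc in fold_counts]
            mean = sum(shares) / n_splits
            score += sum((s - mean) ** 2 for s in shares)
        return score

    assignment = {}
    # Place the largest groups first; they are the hardest to balance later.
    by_size = sorted(group_counts.items(), key=lambda kv: -sum(kv[1].values()))
    for group, counts in by_size:
        best_fold, best_score = 0, None
        for fold in range(n_splits):
            fold_counts[fold] += counts   # try the group in this fold
            score = imbalance()
            fold_counts[fold] -= counts   # undo the trial
            if best_score is None or score < best_score:
                best_fold, best_score = fold, score
        fold_counts[best_fold] += counts
        assignment[group] = best_fold
    return assignment


# Toy data: 10 owners, each with 4 cats and 1 dog (invented for illustration).
labels, groups = [], []
for owner in range(10):
    labels += ["cat"] * 4 + ["dog"]
    groups += [f"owner_{owner}"] * 5

assignment = stratified_group_folds(labels, groups, n_splits=5)
per_fold = [Counter() for _ in range(5)]
for label, group in zip(labels, groups):
    per_fold[assignment[group]][label] += 1
print(per_fold)
```

With real data the folds will only be approximately balanced, since groups cannot be split.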

fervent bridge Aug 14, 2020, 1:15 AM

#

A good complete blood cell dataset that I could be linked to?

bleak fox Aug 14, 2020, 4:47 AM

#

@tidal sonnet Data sci is more than ml tho
@bitter harbor 100% right

solid aurora Aug 14, 2020, 5:06 AM

#

@flat quest yea you're right that will probably be close enough

#

I was vastly overcomplicating it lol

#

I was trying to liken this to the packing problem and 0/1 knapsack and all

#

🙂

flat quest Aug 14, 2020, 5:49 AM

#

yeah maybe, but if you still want to try that route, by all means 😉

I don't think it would provide much of an improvement @solid aurora
But you never know right?

small orbit Aug 14, 2020, 6:08 AM

#

https://www.kaggle.com/abhishekvaid19968/data-visualization-using-matplotlib-seaborn-plotly

Data-Visualization-Using-MATPLOTLIB-SEABORN-PLOTLY

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

drowsy kite Aug 14, 2020, 6:18 AM

#

hey guys, has anyone seen an example of a model being deployed on online streamlit?

#

im a little confused on how it works in terms of running the model on a server

#

like i know ordinarily you load the model onto a pickle file on flask

#

but all the examples ive seen with streamlit run the actual model before rendering the prediction

drowsy kite Aug 14, 2020, 6:46 AM

#

nvm guys on the community had solution

safe sparrow Aug 14, 2020, 9:53 AM

#

In keras, if im working on a multi-channel input layer, and throw a cnn onto that layer, does the cnn get applied to all channels, and how?

#

input = Input(shape=(100, 100, 4,))
x = Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same')(input)

#

is x added to all channels, and if so, how?

velvet thorn Aug 14, 2020, 10:08 AM

#

no

#

okay, wait

#

what do you mean

#

never mind, I think I get what you meant

#

yes
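
To spell out the "how": in a `Conv2D` layer each of the 32 filters has a kernel that spans all input channels (3 x 3 x 4 here), and each output pixel is a sum over the window and over every channel. The output therefore has 32 channels, one per filter. A dependency-free sketch of that arithmetic for a single filter, with hypothetical helper names and a tiny 5 x 5 input instead of 100 x 100:

```python
def conv2d_same(image, kernel, bias=0.0):
    """Apply ONE filter to an H x W x C image with zero 'same' padding.

    `image[y][x][c]` is a nested list; `kernel[dy][dx][c]` is kh x kw x C.
    Each output pixel sums over the kernel window AND every input channel.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[bias] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = bias
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - kh // 2, x + dx - kw // 2
                    if 0 <= yy < height and 0 <= xx < width:  # zeros outside
                        for c in range(channels):
                            total += image[yy][xx][c] * kernel[dy][dx][c]
            out[y][x] = total
    return out


def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights (kh * kw * in_channels per filter) plus one bias per filter."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + filters


image = [[[1.0] * 4 for _ in range(5)] for _ in range(5)]   # 5 x 5, 4 channels
kernel = [[[1.0] * 4 for _ in range(3)] for _ in range(3)]  # one 3 x 3 x 4 filter
feature_map = conv2d_same(image, kernel)

print(feature_map[2][2], feature_map[0][0])
print(conv2d_param_count((3, 3), in_channels=4, filters=32))
```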

eager ledge Aug 14, 2020, 10:13 AM

#

Hi all, Pandas beginner here. Just wondering how it's possible to aggregate the value column as a weighted average?

#

I've found this:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html

#

df.columns = ['material_id', 'Thickness', 'Material', 'Width', 'Quantity', 'Date']
pd.to_datetime(df['Date'])
df = df.pivot_table(index=['Material', 'Thickness', 'Width'],
                    columns=[],
                    aggfunc=df.ewm(times='Date', halflife=datetime.timedelta(days=60)).mean(),
                    values='Quantity')

#

Here's where I'm up to, but I keep getting ```
IndexError: list assignment index out of range
```

ripe forge Aug 14, 2020, 11:23 AM

#

Full trace back? Which line exactly gives that error

eager ledge Aug 14, 2020, 11:34 AM

#

I found it was actually caused by another exception, it's trying to convert another column to a float?

#

I'm now getting a keyerror for 'Date' with the above code

#

KeyError: 'Date'

#

Also have tried this ```
df = df.groupby(by=['Material', 'Thickness', 'Width']).agg({'Quantity': ['mean', df.ewm(times='Date', halflife=datetime.timedelta(days=60)).mean()]})
```

#

Which causes the original exception, which is ValueError: could not convert string to float: which relates to the Material column
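
One possible way to get a time-weighted average per group, sketched under assumptions: this swaps `DataFrame.ewm` for plain exponential decay weights (a weight that halves every 60 days of age), computed before grouping so that only numeric columns are aggregated. The toy rows and the `weight` / `weighted_qty` column names are invented. Note that the result of `pd.to_datetime` has to be assigned back; it does not modify the column in place.

```python
import pandas as pd

# Toy rows standing in for the real table (values invented for illustration).
df = pd.DataFrame({
    "Material":  ["steel", "steel", "alu"],
    "Thickness": [2.0, 2.0, 1.5],
    "Width":     [100, 100, 50],
    "Quantity":  [10.0, 20.0, 7.0],
    "Date":      ["2020-06-15", "2020-08-14", "2020-08-01"],
})
df["Date"] = pd.to_datetime(df["Date"])  # assign the converted column back

# Exponential time-decay weight: halves for every 60 days of age.
age_days = (df["Date"].max() - df["Date"]).dt.days
df["weight"] = 0.5 ** (age_days / 60)
df["weighted_qty"] = df["Quantity"] * df["weight"]

# Weighted mean per group = sum(weight * value) / sum(weight).
keys = ["Material", "Thickness", "Width"]
sums = df.groupby(keys)[["weighted_qty", "weight"]].sum()
weighted_avg = sums["weighted_qty"] / sums["weight"]

print(weighted_avg)
```

Selecting only the numeric helper columns before `.sum()` avoids the "could not convert string to float" error, since the string `Material` column is used purely as a grouping key.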

acoustic halo Aug 14, 2020, 11:47 AM

#

I need some ensembling advice: I have 5 neural nets I want to ensemble, and I got good results with weighted averaging of their softmax outputs. Now I want to try and apply weights to each model's individual class prediction scores. I tried LR but it seems to be slightly worse than the weighted average, plus I have to split my validation set up to train the weights

#

What else could I try?

#

In fact, should I train my stacking LR model on the validation or training data?

ripe forge Aug 14, 2020, 12:02 PM

#

Never train on validation. Defeats the whole purpose of a validation dataset

acoustic halo Aug 14, 2020, 12:02 PM

#

The general consensus is that you learn the weights for weighted averaging on the validation data rather than training

#

Which is why i did it

ripe forge Aug 14, 2020, 12:05 PM

#

Uh, I'm not aware of such a general consensus but it seems wrong to me. Maybe I'm out of the loop.

#

Common sense dictates it's wrong though, so actually no, I'd challenge that statement.

acoustic halo Aug 14, 2020, 12:06 PM

#

"A smarter way to ensemble classifiers is to do a weighted average, where the weights are learned on the validation data"

#

Which is in a book written by the author of keras

#

So idk

ripe forge Aug 14, 2020, 12:06 PM

#

I guess the idea is to prevent overfit? It's so leaky though

#

Why would it not leak info from holdout into our actual ensemble?

acoustic halo Aug 14, 2020, 12:07 PM

#

I am guessing since the whole models are weighted rather than individual outputs, it's not as much of a concern

#

But yes, there would to some degree

ripe forge Aug 14, 2020, 12:09 PM

#

I'll refrain from further comment on this, this is out of my depth and just feels wrong, which may simply be due to a gap in my knowledge.

#

If theres some material around this that you or someone else encounters and can share, I'd greatly appreciate it

acoustic halo Aug 14, 2020, 12:11 PM

#

That quote is the only real information I have on the matter

spark cape Aug 14, 2020, 12:13 PM

#

in a frequentist model, your model should never adapt to the validation data. in a Bayesian model, it will because that's how it works. I wonder if the meaning behind the keras book's author's comment was in this spirit

#

the weights are adapted as you go

acoustic halo Aug 14, 2020, 12:17 PM

#

The only thing that I can think that makes a bit of sense is that if we treat the average model weights as hyperparameters like we would when optimising the models, testing them on the validation set and picking the best parameters based on validation accuracy
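
A minimal sketch of that idea, treating the ensemble weights as hyperparameters picked by validation accuracy. The grid search, the helper names, and the toy softmax outputs are all assumptions for illustration. The accuracy it reports on the validation set is optimistic because the weights were chosen on that same data, so the final number should come from a separate test set.

```python
from itertools import product


def ensemble_predict(model_probs, weights):
    """Weighted average of per-model class probabilities, then argmax."""
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    preds = []
    for i in range(n_samples):
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def search_weights(model_probs, labels, steps=10):
    """Try every weight combination on a grid where the weights sum to 1."""
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(model_probs)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        acc = accuracy(ensemble_predict(model_probs, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation outputs for two models on four samples (invented numbers).
val_labels = [0, 1, 0, 1]
model_a_val = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
model_b_val = [[0.4, 0.6], [0.45, 0.55], [0.55, 0.45], [0.1, 0.9]]

best_weights, best_acc = search_weights([model_a_val, model_b_val], val_labels)
print(best_weights, best_acc)
```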

spark cape Aug 14, 2020, 12:23 PM

#

thats the definition of overfitting

acoustic halo Aug 14, 2020, 12:24 PM

#

Then I have no idea why, but everywhere uses the validation set that i can see

spark cape Aug 14, 2020, 12:25 PM

#

maybe it's a nomenclature thing? ime it's training and test data. your model never sees the test data until you're super convinced that the model is robust and sound.

acoustic halo Aug 14, 2020, 12:26 PM

#

I have 3 sets, train, validation and test

spark cape Aug 14, 2020, 12:26 PM

#

ok i never heard of validation data then

acoustic halo Aug 14, 2020, 12:27 PM

#

Validation is basically just for hyperparam optimisation

#

and is representative of the test set

spark cape Aug 14, 2020, 12:28 PM

#

couldn't that be done with cross validation style chopping up the data?

acoustic halo Aug 14, 2020, 12:28 PM

#

Yeah, basically, but the competition specifically hands out the validation set for that purpose, with the test set unlabelled until it closes

spark cape Aug 14, 2020, 12:30 PM

#

and is representative of the test set
ok my background would be training data = historical data; test data = paper trading. so you can't have a data set that 'represents' data that doesn't exist yet

acoustic halo Aug 14, 2020, 12:30 PM

#

I get what you mean, but it's for a competition, so the test set is already defined

#

It's just that it hasn't been labelled unlike the val set

#

Effectively the entire dataset is split into the 3 sets. Using the train and validation sets, you provide a model that best predicts the unlabelled test set. Each set is drawn randomly from the entire corpus

velvet thorn Aug 14, 2020, 12:32 PM

#

some people have a train-validation-test split

#

so instead of something like K-fold cross-validation

#

you have a fixed validation set

#

of course, this can lead to overfitting your hyperparameters to the validation set

#

which cross-validation is more robust to

#

but in either case you find out on the test set.

#

which cross-validation is more robust to
@velvet thorn and "more robust", not "immune to".

#

so

#

Yeah, basically, but the competition specifically hands out the validation set for that purpose, with the test set unlabelled until it closes
@acoustic halo and yes, this does happen.

#

a fair bit.

#

but ultimately the point is that you perform hyperparameter tuning on a subset of the data you have that is not seen by the model, right?

acoustic halo Aug 14, 2020, 12:34 PM

#

Okay, so then in my case, where i have the 3 sets, which do I use for weighted averaging?

velvet thorn Aug 14, 2020, 12:34 PM

#

and that you perform ultimate evaluation of the model upon a set that has not been seen at all, even for hyperparameter tuning

acoustic halo Aug 14, 2020, 12:34 PM

#

@velvet thorn yes

velvet thorn Aug 14, 2020, 12:34 PM

#

Okay, so then in my case, where i have the 3 sets, which do I use for weighted averaging?
@acoustic halo context?

#

weighted averaging of?

acoustic halo Aug 14, 2020, 12:35 PM

#

Each model is trained on the train set and hyperparam optimisation is done on validation set

quick fox Aug 14, 2020, 12:35 PM

#

Hi everyone,

I have a problem I've been working on for a couple of days and I just can't find a solution to it. I have 4 identically structured DataFrames with multi-level columns and a single-level index.

Each DataFrame consists of one sample with different dilutions, and for each dilution 3 separate measurements were taken. So all dilutions are grouped under the sample and all replicates are grouped under their dilution.

I want to combine these different Dataframes so that the replicates of all 4 samples are grouped next to each other. So it's a kind of nested merge.

I tried merging two Groupby objects the following way:

for group, group2 in zip(df1, df2):
    pd.merge(group, group2, on="Label for level2")

But I get an error saying grouped objects cannot be merged. I tried looking for a solution, but I'm not even sure how to search for what I'm looking for. Any help is greatly appreciated.

Thanks a lot

acoustic halo Aug 14, 2020, 12:35 PM

#

Now i want to weighted average the outputs

#

Do I use the train set to find the weights or the validation set

spark cape Aug 14, 2020, 12:36 PM

#

@quick fox oh my god i spent like a week trying to merge two dataframes with multi column indices. I gave up and walked the lists and wrote the join myself because nothing made sense.

velvet thorn Aug 14, 2020, 12:36 PM

#

it depends on your process

#

but in general I would say the train set...?

acoustic halo Aug 14, 2020, 12:36 PM

#

This is the problem, I would have thought the train set too

velvet thorn Aug 14, 2020, 12:36 PM

#

unless

acoustic halo Aug 14, 2020, 12:37 PM

#

But resources i find online use the validation set

velvet thorn Aug 14, 2020, 12:37 PM

#

you fit on the validation set and evaluate on the test set and stop there

acoustic halo Aug 14, 2020, 12:37 PM

#

specifically the author of keras says use validation set

velvet thorn Aug 14, 2020, 12:37 PM

#

@quick fox if you don't show sample data it's gonna be hard for you to get help

quick fox Aug 14, 2020, 12:37 PM

#

@spark cape Yeah I'm starting to get desperate, too. If it comes to it I'll have to do it by hand which I really don't want to

velvet thorn Aug 14, 2020, 12:38 PM

#

it doesn't sound like a very simple problem

#

so if you want someone to be able to work on it, you need a way for them to easily reproduce the initial situation

#

as well as know what the expected result is

#

in my time on SO

#

I've seen a lot of pandas questions go unanswered because it's not clear what people want

#

and in general, written explanations are bad.

#

data is good.

#

code that can be copy-pasted to create initial state is best

quick fox Aug 14, 2020, 12:39 PM

#

Yeah I get that. I thought this might be a problem that's easy to solve for someone more experienced than I

velvet thorn Aug 14, 2020, 12:39 PM

#

it's not that the problem is difficult to solve

#

it's that it's difficult to explain

#

and if I don't know what your problem is, I can't help you with it

#

just as you have no idea what to search for, I have no idea what your data actually looks like

quick fox Aug 14, 2020, 12:41 PM

#

Alright I'm on the phone right now. I guess a picture won't do?

velvet thorn Aug 14, 2020, 12:41 PM

#

well

#

if it's not immediately obvious how to solve it

lapis sequoia Aug 14, 2020, 12:41 PM

#

hey @desert oar thanks for the help yesterday

velvet thorn Aug 14, 2020, 12:41 PM

#

I probably won't go any further than hazarding a guess

#

but someone else might

#

so why not

#

specifically the author of keras says use validation set
@acoustic halo which book

#

is this from

#

by the way

acoustic halo Aug 14, 2020, 12:42 PM

#

@velvet thorn Deep Learning with Python by François Chollet

velvet thorn Aug 14, 2020, 12:42 PM

#

I have read that

#

which page?

acoustic halo Aug 14, 2020, 12:42 PM

#

265

#

Also found this:
"Finding the weights using the same training set used to fit the ensemble members will likely result in an overfit model. A more robust approach is to use a holdout validation dataset unseen by the ensemble members during training."

#

https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/

velvet thorn Aug 14, 2020, 12:44 PM

#

yup

#

you fit on the validation set and evaluate on the test set and stop there
@velvet thorn should be this

#

I mean

#

the thing is

#

you're basically doing a form of boosting

acoustic halo Aug 14, 2020, 12:45 PM

#

Except the part where I can't evaluate on the test set

velvet thorn Aug 14, 2020, 12:45 PM

#

if you use the same dataset that the base learners are trained on

#

right?

acoustic halo Aug 14, 2020, 12:45 PM

#

Because it's unlabelled

velvet thorn Aug 14, 2020, 12:45 PM

#

yeah

#

so

#

what I would suggest is

#

split your data further

#

the train set

#

train base learners on t1, train meta-learner on t2, then evaluate on v

#

then final predictions on test set and submit that
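
A minimal sketch of that extra split, assuming the training samples can be addressed by index. The helper name `split_train` and the 10% hold-out fraction are illustrative assumptions, not something stated in the chat.

```python
import random


def split_train(indices, meta_fraction=0.1, seed=0):
    """Shuffle the training indices and hold out a fraction for the meta-learner."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_meta = int(len(shuffled) * meta_fraction)
    t2 = shuffled[:n_meta]   # used only to fit the ensemble weights / meta-learner
    t1 = shuffled[n_meta:]   # used only to fit the base learners
    return t1, t2


t1, t2 = split_train(range(1000), meta_fraction=0.1)
```

The validation set `v` is then left untouched for evaluating the combined model.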

#

like

acoustic halo Aug 14, 2020, 12:46 PM

#

hmm okay not a bad idea

velvet thorn Aug 14, 2020, 12:46 PM

#

I don't think it's wrong to train the meta-learner on the train set

#

but like I said

#

you're basically doing hardcore boosting

#

which is already fairly prone to overfitting

#

so

#

and yeah I mean meta-learning is probably pretty high variance already

#

what models are you using?

#

simple LR to combine?

acoustic halo Aug 14, 2020, 12:46 PM

#

NN

velvet thorn Aug 14, 2020, 12:46 PM

#

the metalearner

acoustic halo Aug 14, 2020, 12:47 PM

#

no, i'm learning the weights through nelder mead minimisation currently

velvet thorn Aug 14, 2020, 12:48 PM

#

oh, hm

#

I've actually never done that

#

but okay, that could work

acoustic halo Aug 14, 2020, 12:48 PM

#

actually, differential evolution, not nelder mead

velvet thorn Aug 14, 2020, 12:50 PM

#

oh, you can try

#

training with K-fold CV too

acoustic halo Aug 14, 2020, 12:53 PM

#

I might just be lazy and leave it as is, my reasoning being that the averaging weights are not learnt per se, they are just another hyperparameter optimisation selected on the basis of the performance over the validation set

#

Just like layer size is selected on validation set performance

desert oar Aug 14, 2020, 1:09 PM

#

@acoustic halo what are you doing? reinventing the wheel? 😛

#

oh ensembling

acoustic halo Aug 14, 2020, 1:10 PM

#

I don't even know anymore

velvet thorn Aug 14, 2020, 1:10 PM

#

☸️

#

man I got Kubernetes flashbacks there

desert oar Aug 14, 2020, 1:10 PM

#

isnt the basic stacking method just train a bunch of uncorrelated models, then fit linear regression on their predictions?
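
A rough NumPy sketch of that basic stacking idea, on synthetic data. The three "base models" are simulated as the target plus noise, which is purely an assumption for illustration, and the meta-model is fitted in-sample for brevity (in practice it would be fitted on held-out predictions).

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=200)  # synthetic target

# Three hypothetical base models: the target plus independent noise.
preds = np.column_stack(
    [y + rng.normal(scale=s, size=200) for s in (0.5, 1.0, 2.0)]
)

# Meta-model: least-squares weights (plus an intercept) on the predictions.
X = np.column_stack([preds, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

stacked = X @ coef
mse_stacked = np.mean((stacked - y) ** 2)
mse_best_single = min(
    np.mean((preds[:, j] - y) ** 2) for j in range(preds.shape[1])
)
```

On the data it was fitted on, the stacked prediction cannot be worse than the best single model, because using one model alone is itself one of the linear combinations the fit can choose.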

acoustic halo Aug 14, 2020, 1:10 PM

#

Yes

#

For stacking anyway

desert oar Aug 14, 2020, 1:11 PM

#

so what are you working on? im curious

acoustic halo Aug 14, 2020, 1:11 PM

#

https://sites.google.com/view/ai-soco-2020/

AI-SOCO 2020

#

I want to do average weight ensembling though, I tried stacking but got less than desirable results

#

fwiw, I actually won the first stage

desert oar Aug 14, 2020, 1:12 PM

#

congrats

#

what is average weight ensembling? never heard of that

#

https://pechyonkin.me/stochastic-weight-averaging/ this?

Max Pechyonkin

Stochastic Weight Averaging — a New Way to Get State of the Art Res...

acoustic halo Aug 14, 2020, 1:13 PM

#

https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/

#

Literally adding all the softmax outputs from various models together

#

but also applying a weight to each model output
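
As a small sketch of what that looks like in NumPy: the array layout `(n_models, n_samples, n_classes)` and the function name `weighted_average` are assumptions made for this example.

```python
import numpy as np


def weighted_average(probs, weights):
    """probs: (n_models, n_samples, n_classes); weights: (n_models,)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so rows still sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(weights, probs, axes=1)


probs = np.array([
    [[0.7, 0.3], [0.4, 0.6]],  # softmax outputs of model A
    [[0.2, 0.8], [0.1, 0.9]],  # softmax outputs of model B
])
ensemble = weighted_average(probs, [3, 1])
labels = ensemble.argmax(axis=1)
```

Setting all weights equal reduces this to plain averaging of the softmax outputs.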

#

It's super basic, but I was planning on using the validation set to find the weights

#

Which is a big no-no apparently, despite being used in every article i look at

desert oar Aug 14, 2020, 1:16 PM

#

well linear regression is a weighted average if you squint

acoustic halo Aug 14, 2020, 1:16 PM

#

Yeah, I used LR to learn weights for each softmax output from each model individually

#

also theres this rule:
"Participants are NOT allowed to use the development set or any external dataset (labeled or unlabeled) to train their systems."

#

So i don't want to train the meta model on the validation set per se

#

Though I would argue optimising the model weights is learnt in much the same way model hyperparameters are learnt

desert oar Aug 14, 2020, 1:23 PM

#

Yeah

#

What is the development set?

acoustic halo Aug 14, 2020, 1:23 PM

#

the validation set

desert oar Aug 14, 2020, 1:24 PM

#

Ah ok

acoustic halo Aug 14, 2020, 1:24 PM

#

they just call it dev

desert oar Aug 14, 2020, 1:24 PM

#

Yeah you wouldnt use that

#

Youd have to split your training data

acoustic halo Aug 14, 2020, 1:24 PM

#

darn

desert oar Aug 14, 2020, 1:24 PM

#

We actually have this problem at work, people want to use methods like temperature scaling and gold loss correction

#

All of those require "auxiliary" training sets

#

So if you get too aggressive using those methods you end up cutting down the size of your main training set significantly

#

Which can really hurt when you have a highly imbalanced problem or you are already low on data

acoustic halo Aug 14, 2020, 1:26 PM

#

So, this is my problem, I have to go back and retrain all my models on a smaller training set

#

Which is a massive pain

desert oar Aug 14, 2020, 1:26 PM

#

In some cases we have just reused the training set, but in those cases we were able to convince ourselves that the training set wasn't significantly different from any other version of that data set we would have now or in the future

#

And we had to proceed very carefully to avoid overfitting

acoustic halo Aug 14, 2020, 1:27 PM

#

But, this still doesnt explain why everyone seems to get their weights on the validation set

desert oar Aug 14, 2020, 1:27 PM

#

Are they just breaking the rules? Lol

acoustic halo Aug 14, 2020, 1:27 PM

#

based on the above link and textbooks

velvet thorn Aug 14, 2020, 1:27 PM

#

didn't we discuss that earlier

desert oar Aug 14, 2020, 1:27 PM

#

Oh, yeah. We do that too

#

But it really makes your validation set less useful

#

Think of it this way, every "external" procedure requires another validation set

#

So if you only have one validation set you basically need to decide which procedure gets trained on the main training set and which procedure gets trained on the validation

#

In this case the rules of the contest tell you what your decision is, either you reuse the training set or you split off your own validation sets

acoustic halo Aug 14, 2020, 1:30 PM

#

This is what I originally thought, but my instructor insisted the rule was mainly in the context of using the validation set to train the neural nets

#

But what you said makes more sense

desert oar Aug 14, 2020, 1:31 PM

#

what kind of problem is this

#

regression? classification? how many / what kind of features?

acoustic halo Aug 14, 2020, 1:31 PM

#

Classification (1000 classes), features vary per model

#

But mainly n-grams, abstract syntax tree nodes and a special version of BERT

desert oar Aug 14, 2020, 1:33 PM

#

ah very similar to stuff ive worked on

#

how imbalanced are the classes

#

and how imbalanced are the features

#

(why cant i spell imbalanced today)

acoustic halo Aug 14, 2020, 1:34 PM

#

Classes are evenly split, features are alright

desert oar Aug 14, 2020, 1:35 PM

#

how many records

sterile bobcat Aug 14, 2020, 1:35 PM

#

If you want links for Machine Learning and AI learning courses and files send me a message

acoustic halo Aug 14, 2020, 1:36 PM

#

50k in the training set, 25k in validation and test

desert oar Aug 14, 2020, 1:36 PM

#

oh yeah

#

can you slice off like 5k from the training set?

#

use that to train the ensemble

#

how many models are you ensembling? like 5?

acoustic halo Aug 14, 2020, 1:39 PM

#

Yeah it's 5, the main thing I am trying to justify in my mind is whether selecting the weights counts as hyperparameter optimisation

#

Because I could just grid search

#

and pick the best
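
A sketch of what such a grid search over ensemble weights could look like. The data here is synthetic and the function names are hypothetical; the held-out predictions are assumed to be stored as `(n_models, n_samples, n_classes)`.

```python
import itertools

import numpy as np


def accuracy(weights, probs, y_true):
    """Accuracy of the weighted sum of model outputs."""
    combined = np.tensordot(np.asarray(weights, dtype=float), probs, axes=1)
    return float((combined.argmax(axis=1) == y_true).mean())


def grid_search_weights(probs, y_true, steps=5):
    """Try every weight combination on a coarse grid and keep the best one."""
    grid = np.linspace(0.0, 1.0, steps)
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=probs.shape[0]):
        if sum(w) == 0:  # skip the all-zero combination
            continue
        acc = accuracy(w, probs, y_true)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Synthetic held-out set: two noisy "models" over 3 classes (illustrative only).
rng = np.random.default_rng(0)
y_val = rng.integers(0, 3, size=300)
one_hot = np.eye(3)[y_val]
probs_val = np.stack(
    [one_hot + rng.normal(scale=s, size=one_hot.shape) for s in (0.8, 1.2)]
)

best_w, best_acc = grid_search_weights(probs_val, y_val)
```

The number of combinations is `steps ** n_models`, so the grid gets expensive quickly as models are added; that is one reason optimisers such as differential evolution come up for this.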

desert oar Aug 14, 2020, 1:39 PM

#

ew why

#

wait

acoustic halo Aug 14, 2020, 1:40 PM

#

Just like picking layer sizes

desert oar Aug 14, 2020, 1:40 PM

#

are you allowed to use the development set for hyperparameter optimization?

acoustic halo Aug 14, 2020, 1:40 PM

#

yes

desert oar Aug 14, 2020, 1:40 PM

#

oh

#

what the fuck

#

that's such a weaselly distinction

acoustic halo Aug 14, 2020, 1:40 PM

#

Is that not the point of the validation set anyway?

desert oar Aug 14, 2020, 1:40 PM

#

yes but in real life it's not a strict delineation

#

its not just "model + hyperparameters"

#

there are potentially several "layers" of training

#

as you're seeing here

#

if you have a model w/ gold loss correction, temperature scaling, and hyperparameter tuning, theoretically you have three nested training procedures

acoustic halo Aug 14, 2020, 1:41 PM

#

So what i'm hearing is that i can get away with using the validation set to "optimise my hyperparameters" 😆

desert oar Aug 14, 2020, 1:42 PM

#

well... more like "it's not the main model" is your argument

#

what a stupid rule imo

#

i think the idea here is that you aren't allowed to do a final training run that includes the validation set, before submitting

acoustic halo Aug 14, 2020, 1:43 PM

#

Exactly

desert oar Aug 14, 2020, 1:43 PM

#

idk

#

thats what people do in real life though!

acoustic halo Aug 14, 2020, 1:43 PM

#

It's a mess

desert oar Aug 14, 2020, 1:43 PM

#

thats the whole point of a validation set

#

can you like, clarify the rule w/ a judge

#

or i guess you can just do it anyway and hope nobody calls you out

acoustic halo Aug 14, 2020, 1:46 PM

#

I'll do just that, but until I hear otherwise, I'll go on the basis that I'm allowed to use the validation set for hyperparameter optimisation which includes selecting weights

Realistically, I'm not too bothered about the competition, this is for my final project so I'm more concerned in learning the actual concepts than results

random perch Aug 14, 2020, 2:03 PM

#

Does anyone have experience working on ml/ai open source projects? If so please reach out to me!

#

I'm trying to get started with tensorflow and opencv open source but im not sure where to start in terms of how to contribute.

paper niche Aug 14, 2020, 2:20 PM

#

@random perch I've contributed to open source projects before, as I'm sure many here have. Just not specifically tensorflow and opencv, but I don't imagine the process being any/much different. Most decent projects have a Contributing page/document that point you where help is most appreciated by the core devs. For example tensorflow: https://www.tensorflow.org/community/contribute

TensorFlow

Contribute to TensorFlow

lapis sequoia Aug 14, 2020, 3:00 PM

#

@Klaouss#9437

random perch Aug 14, 2020, 3:22 PM

#

@paper niche Thank you very much! I appreciate your help 🙂

faint ravine Aug 14, 2020, 3:49 PM

#

Hey everyone, does anyone have experience with data generators?

weak sentinel Aug 14, 2020, 3:50 PM

#

a little @faint ravine

faint ravine Aug 14, 2020, 3:51 PM

#

What kind of data do you usually generate?

#

And are you aware of any generative algorithms other than the famous GAN?

weak sentinel Aug 14, 2020, 3:51 PM

#

also stupid question: for an input layer do i have to use keras.layers.Flatten() if im just inputting an array of parameters

#

i just used generators for a Image Classifier CNN

#

just modulated CIFAR-10 for more training data

faint ravine Aug 14, 2020, 3:52 PM

#

So, just standard generation? Like rotating the image or playing around with the contrast?

weak sentinel Aug 14, 2020, 3:53 PM

#

yeah nothing complicated at all

faint ravine Aug 14, 2020, 3:53 PM

#

Neat

weak sentinel Aug 14, 2020, 3:53 PM

#

sorry if im not helpful lol

faint ravine Aug 14, 2020, 3:53 PM

#

Lol, It's ok.

hollow silo Aug 14, 2020, 3:58 PM

#

is SVM a good project to put on a resume

#

entry level roles

faint ravine Aug 14, 2020, 4:02 PM

#

Probably not

#

Aim for something that has a bit more purpose. Coding up an algorithm and running it often does not count. You have to "make it do something" and show results.

#

what does your SVM do anyway?

desert oar Aug 14, 2020, 4:07 PM

#

it might be a good project

#

if it's "i did a data science project and i happened to use an SVM for my model" that seems like a fine resume item

#

(as long as you can justify why you used the SVM)

hollow silo Aug 14, 2020, 4:08 PM

#

what does your SVM do anyway?
@faint ravine i used SVM for point cloud segmentation

#

basically i had some point cloud data with different points belonging to different classes

#

and i implemented a multi class SVM to segment the point cloud into different regions

faint ravine Aug 14, 2020, 4:12 PM

#

@desert oar is right.
Yeah, but don't say that on your resume. It sounds like: "I got some data, and I classified it". Something like: "I built a dog/cat recognizer" would be better.

desert oar Aug 14, 2020, 4:13 PM

#

of course

#

youre describing your project

#

make your project sound like a project

#

put the details in the bullet points

#

what kind of data was in the point clouds? or was it just a toy project w/ simulated data?

hollow silo Aug 14, 2020, 4:14 PM

#

they were 3D Coordinates from a LIDAR Scanner

faint ravine Aug 14, 2020, 4:14 PM

#

Yeah, don't rehearse the theoretical ideas that you learned. Implement them into something practically useful.

lapis sequoia Aug 14, 2020, 4:14 PM

#

Guys give me a advice like how to learn data science so how do i see data-science in thinking way

hollow silo Aug 14, 2020, 4:14 PM

#

and the dataset is public

desert oar Aug 14, 2020, 4:14 PM

#

LIDAR Scanner Data Segmentation

Used SVM to segment LIDAR scanner data
etc...

faint ravine Aug 14, 2020, 4:17 PM

#

Guys give me a advice like how to learn data science so how do i see data-science in thinking way
@lapis sequoia Do you wanna plug-and-chug or learn the underlying theory?

hollow silo Aug 14, 2020, 4:17 PM

#

how does this sound?

LIDAR Point Cloud Segmentation 
  -Implemented a soft margin multi-class SVM for point cloud segmentation 
  -Reduced computation time (by some metric) using efficient vectorized operations 
  -Achieved so and so accuracy

lapis sequoia Aug 14, 2020, 4:17 PM

#

@faint ravine what u mean by plug-and-chug

faint ravine Aug 14, 2020, 4:18 PM

#

That's good

desert oar Aug 14, 2020, 4:18 PM

#

did you implement the svm though?

hollow silo Aug 14, 2020, 4:18 PM

#

yes

desert oar Aug 14, 2020, 4:18 PM

#

nice

hollow silo Aug 14, 2020, 4:18 PM

#

like do you mean if use scikit learn?

desert oar Aug 14, 2020, 4:18 PM

#

yeah

hollow silo Aug 14, 2020, 4:19 PM

#

i wrote an SVM Class using numpy

desert oar Aug 14, 2020, 4:19 PM

#

very nice

#

you might also want to mention where/how you got the data

hollow silo Aug 14, 2020, 4:19 PM

#

so no scikit

#

you might also want to mention where/how you got the data
@desert oar will do

lapis sequoia Aug 14, 2020, 4:19 PM

#

i'm a beginner in numpy

#

but i love numpy

#

Hey guys for anyone interested in Data Science please check out my channel and leave a sub, if you want, would be very much appreciated to get my channel off the floor.
https://www.youtube.com/channel/UCiFF3AvbzLWdRyRnQMEttqw?view_as=subscriber

YouTube

Mazen Ahmed

Help me improve my videos.

desert oar Aug 14, 2020, 4:19 PM

#

one exercise that i was told to do by my MA thesis advisor was to write an executive summary of my projects

lapis sequoia Aug 14, 2020, 4:19 PM

#

Thank You

desert oar Aug 14, 2020, 4:19 PM

#

a 1 page document w/ maybe 1 plot. basically an extended abstract

hollow silo Aug 14, 2020, 4:20 PM

#

i realised that being able to describe your projects is a very important skill

#

something you should keep a log of while you are building the project

desert oar Aug 14, 2020, 4:20 PM

#

+1

lapis sequoia Aug 14, 2020, 4:20 PM

#

+1

#

i wanna build something with numpy

desert oar Aug 14, 2020, 4:20 PM

#

@hollow silo sounds like you'll have no problem getting hired, if that's your mentality 🙂

hollow silo Aug 14, 2020, 4:21 PM

#

i wanna build something with numpy
@lapis sequoia write a neural network from scratch

#

one layer

lapis sequoia Aug 14, 2020, 4:21 PM

#

hard to build or easy?

hollow silo Aug 14, 2020, 4:21 PM

#

@hollow silo sounds like you'll have no problem getting hired, if that's your mentality 🙂
@desert oar thank u 🥺 its really hard bc i dont have a degree directly related to CS etc

#

the grind is real in software

#

hard to build or easy?
@lapis sequoia you can use numpy for pretty much anything actually...if you're interested in data science and ML then yeah a one layer NN is of moderate difficulty.. you can extend that to an autoencoder as well
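
For a sense of scale, a one-layer network (a single sigmoid unit trained by gradient descent) fits in a few lines of NumPy. This is a minimal sketch on made-up toy data; the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs with labels 0 and 1.
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(100, 2)),
    rng.normal(1.0, 1.0, size=(100, 2)),
])
y = np.concatenate([np.zeros(100), np.ones(100)])

W = np.zeros(2)  # weights of the single layer
b = 0.0
lr = 0.1


def forward(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid activation


def loss(p, y):
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


initial_loss = loss(forward(X, W, b), y)
for _ in range(200):
    p = forward(X, W, b)
    grad_W = X.T @ (p - y) / len(y)  # gradient of cross-entropy w.r.t. W
    grad_b = np.mean(p - y)          # gradient w.r.t. the bias
    W -= lr * grad_W
    b -= lr * grad_b
final_loss = loss(forward(X, W, b), y)
```

Adding a hidden layer means repeating the same forward and gradient steps once more, which is where most of the learning value of the exercise lies.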

lapis sequoia Aug 14, 2020, 4:22 PM

#

oh thanks

hollow silo Aug 14, 2020, 4:23 PM

#

the power of numpy lies in matrix slicing and dicing operations

lapis sequoia Aug 14, 2020, 4:23 PM

#

yes i'm learning numpy

hollow silo Aug 14, 2020, 4:24 PM

#

yeah np.dot etc is cool but a lot of times people just use for loops over their numpy matrices when the same thing can be represented as a matrix product
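
To illustrate the point, here is the same computation written as explicit loops and as a single matrix product (random inputs, shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 5))

# Explicit triple loop: C[i, j] = sum over k of A[i, k] * B[k, j]
C_loop = np.zeros((4, 5))
for i in range(4):
    for j in range(5):
        for k in range(3):
            C_loop[i, j] += A[i, k] * B[k, j]

# The same computation as one matrix product.
C_matmul = A @ B
```

Both give the same result; the second form hands the loops to NumPy's compiled routines.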

lapis sequoia Aug 14, 2020, 4:24 PM

#

slicing index, shape,reshape i love them

hollow silo Aug 14, 2020, 4:24 PM

#

yes i'm learning numpy
@lapis sequoia if you are interested in computer vision, i recommend following the cs231n course

desert oar Aug 14, 2020, 4:24 PM

#

wait until you guys learn about np.einsum

hollow silo Aug 14, 2020, 4:24 PM

#

you can do their assignments

#

wait until you guys learn about np.einsum
@desert oar i have read about that 😄 but never used it

#

i didnt understand it too well

desert oar Aug 14, 2020, 4:27 PM

#

"regex for array math"

hollow silo Aug 14, 2020, 4:27 PM

#

"regex for array math"
@desert oar thats a neat way to put it

tidal bough Aug 14, 2020, 4:33 PM

#

sadly, unless I missed something major, it's "only" for any kind of multiplication operations

#

you can't, say, make it calculate the sum of each element with each.

#

you can do outer multiplication: "i,j->ij" but not summing

arctic wedgeBOT Aug 14, 2020, 5:05 PM

#

Hey @quick fox!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

#

Hey @quick fox!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

digital juniper Aug 14, 2020, 5:18 PM

#

hey, does anyone here use kaggle? i'm trying to make a team notebook for a competition and i'm not sure how to do it

flat quest Aug 14, 2020, 5:26 PM

#

sadly, I think you did miss something major 😉 @tidal bough

Its just a summation equation at its core so it can deal with summing each element with each.
All you do is np.einsum('i,i', a, b)

desert oar Aug 14, 2020, 5:29 PM

#

https://mathworld.wolfram.com/EinsteinSummation.html

tidal bough Aug 14, 2020, 5:41 PM

#

@flat quest nah, that's just the scalar (dot) product. I meant, from 1d arrays a and b, produce a 2d array c, where c[i,j]=a[i]+b[j]

flat quest Aug 14, 2020, 6:42 PM

#

i mean i don't see how that would be multiplication. It's just summation all the way through.

But yeah if u want to do c[i,j] = a[i] + b[j]. You can do explicit mode I believe np.einsum('i,j -> i,j'). I'm not sure entirely if that works, but based on the docs, it seems like it would. @tidal bough

tidal bough Aug 14, 2020, 6:44 PM

#

@flat quest nope, np.einsum("i,j -> ij") would do c[i,j] = a[i]*b[j]

flat quest Aug 14, 2020, 6:45 PM

#

ah right. Yeah not thinking too straight this morning lol.

tidal bough Aug 14, 2020, 6:45 PM

#

i mean i don't see how that would be multiplication. It's just summation all the way through.
the scalar (dot) product of vectors (1d arrays) is defined as the sum of a[i]*b[i] over all i 🙂
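
The two einsum patterns discussed here, side by side on small example arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Repeated index with no output index: multiply element-wise, then sum over i.
dot = np.einsum("i,i", a, b)

# Distinct indices kept in the output: outer[i, j] = a[i] * b[j].
outer = np.einsum("i,j->ij", a, b)
```

In both cases the operands are multiplied; the subscripts only decide which axes are then summed away.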

flat quest Aug 14, 2020, 6:50 PM

#

yeah ur right, it's all multiplication, and then summing over those multiplied terms
My bad :/

Guess it's up to the standard addition to deal with those problems then 😉

tidal bough Aug 14, 2020, 7:01 PM

#

Well, semi-standard. You do this via the glory of np.ufunc.outer 🙂

#

In [254]: arr
Out[254]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [255]: np.add.outer(arr,arr)
Out[255]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13],
       [ 6,  7,  8,  9, 10, 11, 12, 13, 14],
       [ 7,  8,  9, 10, 11, 12, 13, 14, 15],
       [ 8,  9, 10, 11, 12, 13, 14, 15, 16]])

#

huh, this function is actually pretty slow

#

yeah, this dumb C-loop is 4 times faster:

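import numba
import numpy as np

@numba.njit
def outer_sums_full(arr):
    # Fill res[i, j] = arr[i] + arr[j] with explicit loops (compiled by numba).
    res = np.zeros((len(arr),len(arr)),dtype=arr.dtype)
    for i in range(len(arr)):
        for j in range(len(arr)):
            res[i,j]=arr[i]+arr[j]
    return res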

#

And this is a bit faster still, and is a very simple broadcasting-based solution:

@numba.njit
def outer_sums_3(arr):
    arr=arr.reshape(-1,1)
    return arr+arr.transpose()
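
The same broadcasting idea also covers the original question with two different arrays, c[i,j] = a[i] + b[j], without numba. A small sketch with example values:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([10, 20])

# a as a column (3, 1) plus b as a row (1, 2) broadcasts to shape (3, 2).
c = a[:, None] + b[None, :]
```

This gives the same array as `np.add.outer(a, b)`.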

toxic bone Aug 14, 2020, 8:05 PM

#

Hello guys! I want to create some universal pandas reader for parquet/csv/hdf5/excel/sqlite/whatever depending on file extension.

How do you think, is it better to create it as function with **kwargs to send arguments or maybe use some kind of decorator?

Never tried decorators in practice, maybe it's time to try
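
One possible shape for the plain-function variant is a dictionary that maps extensions to pandas readers, with `**kwargs` forwarded unchanged. The name `read_any` and the chosen set of extensions are assumptions for this sketch; SQLite is left out because it needs a connection and a query rather than just a path.

```python
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers (the extension list is an assumption).
READERS = {
    ".csv": pd.read_csv,
    ".parquet": pd.read_parquet,
    ".xlsx": pd.read_excel,
    ".h5": pd.read_hdf,
}


def read_any(path, **kwargs):
    """Pick a pandas reader from the file extension and forward **kwargs to it."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    return reader(path, **kwargs)
```

Usage would look like `df = read_any("data.csv", sep=";")`. A decorator adds little here, since the only job is choosing which function to call.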

blazing sundial Aug 14, 2020, 8:17 PM

#

Hey fam could I please get some help with creating an array in numpy?

bitter harbor Aug 14, 2020, 8:21 PM

#

what kind of array

blazing sundial Aug 14, 2020, 8:22 PM

#

hey man its super simple but its giving me a "not callable error"

#

totalBands = np.array([[180,15], [5,20], [8,16],])

tidal bough Aug 14, 2020, 8:23 PM

#

In [296]: totalBands = np.array([[180,15], [5,20], [8,16],])

In [297]: totalBands
Out[297]:
array([[180,  15],
       [  5,  20],
       [  8,  16]])

blazing sundial Aug 14, 2020, 8:23 PM

#

sorry that last comma shouldnt be there but stil

tidal bough Aug 14, 2020, 8:23 PM

#

My guess is that you did something bad like redefining np.array.

#

what's the full error?

blazing sundial Aug 14, 2020, 8:24 PM

#

oh i see! that makes sense cause it worked yesterday

#

smh

#

📎 unknown.png

tidal bough Aug 14, 2020, 8:25 PM

#

yeah, definitely redefined something

#

check these:

In [300]: type(np)
Out[300]: module

In [301]: type(np.array)
Out[301]: builtin_function_or_method

blazing sundial Aug 14, 2020, 8:25 PM

#

could you explain a little more? Im a bit confused

tidal bough Aug 14, 2020, 8:26 PM

#

do type(np) and see what it gives you, same for np.array

blazing sundial Aug 14, 2020, 8:26 PM

#

what should i be looking for in those outputs?

tidal bough Aug 14, 2020, 8:26 PM

#

the above is what you should get(if they aren't redefined).

blazing sundial Aug 14, 2020, 8:27 PM

#

ohhhh

#

check it out

#

📎 unknown.png

tidal bough Aug 14, 2020, 8:27 PM

#

I'm not even sure how you did this.

bitter harbor Aug 14, 2020, 8:27 PM

#

^^

tidal bough Aug 14, 2020, 8:28 PM

#

do

del np.array
del np
import numpy as np

blazing sundial Aug 14, 2020, 8:28 PM

#

lmaoooo f me

#

could i just clear everything?

tidal bough Aug 14, 2020, 8:28 PM

#

well, restarting ipython works too, yes.

blazing sundial Aug 14, 2020, 8:29 PM

#

could i just close out of spyder and reopen it?

#

sorry, im switching over from matlab and im still learning

tidal bough Aug 14, 2020, 8:29 PM

#

yup, or probably just close and open the console.

#

or push a button somewhere to stop it.

blazing sundial Aug 14, 2020, 8:30 PM

#

i think it happen when i tried to define a new array above

#

above it*

#

anyways let me try to restart it

bitter harbor Aug 14, 2020, 8:31 PM

#

i think it happen when i tried to define a new array above
can you send that code?

tidal bough Aug 14, 2020, 8:31 PM

#

you must have done something really weird like np.array = <something, a float64 to be precise>
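
A small demonstration of how that kind of accidental assignment produces a "not callable" error. To avoid damaging a real import, a `types.SimpleNamespace` stands in for the module; `fake_np` is a hypothetical name.

```python
import types

# Stand-in for a module such as numpy, so the demo does not break a real import.
fake_np = types.SimpleNamespace(array=lambda data: list(data))

print(fake_np.array([1, 2]))  # works: array is still a function

fake_np.array = 3.14          # accidental assignment instead of a call

try:
    fake_np.array([1, 2])
except TypeError as exc:
    message = str(exc)        # the attribute is now a float, so calling it fails
    print(message)
```

Restarting the interpreter, as suggested above, is the simplest way to get a clean module back.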

blazing sundial Aug 14, 2020, 8:32 PM

#

hmmm probably lol, could i get your advice on what im doing? maybe you could point me in the right direction

#

so i have that array right? basically for each element in that array i sent earlier (i.e. [[180,15], [5,20]]) I want to add additional numbers to each element in ascending order, so the element [5,20] would turn into [5,6,7,8,9,...20]

tidal bough Aug 14, 2020, 8:36 PM

#

sounds like you just want arange.

#

the problem is that arrays have a specific size on each dimension.

#

like, you can't have the second row be length 5 and the first length 10.

#

you can only pad it with NaNs, I guess.

#

so for your task, a list of 1d arrays might make more sense.

#

though it depends on what you're later using that list for.
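
A sketch of the "list of 1d arrays" idea, assuming each row is a `(start, end)` pair with start <= end. A pair like `[180, 15]` would need separate wrap-around handling, which is not covered here.

```python
import numpy as np

bands = np.array([[5, 20], [8, 16]])  # each row is (start, end), inclusive

# One 1-D array per row; +1 because np.arange excludes the stop value.
expanded = [np.arange(start, end + 1) for start, end in bands]
```

The rows end up with different lengths (16 and 9 elements here), which is why a plain list is used instead of a 2-D array.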

blazing sundial Aug 14, 2020, 8:53 PM

#

yeah i was planning on using a 1xN array. The original idea was to iterate through the array and find where each element has a matching value

tidal bough Aug 14, 2020, 8:54 PM

#

uhh

#

!xy

arctic wedgeBOT Aug 14, 2020, 8:54 PM

#

xy-problem

Asking about your attempted solution rather than your actual problem.

Often programmers will get distracted with a potential solution they've come up with, and will try asking for help getting it to work. However, it's possible this solution either wouldn't work as they expect, or there's a much better solution instead.

For more information and examples: http://xyproblem.info/

tidal bough Aug 14, 2020, 8:54 PM

#

What's the actual problem you're trying to solve?

blazing sundial Aug 14, 2020, 9:00 PM

#

oh my b, basically the problem is I have 12 ranges of data, ranging from 0 to 360, and i was trying to write a function (or just code i guess) to see if there is a value that the 12 ranges contain

#

does that make sense? sorry, i can try to explain better

tidal bough Aug 14, 2020, 9:01 PM

#

if there is a value that the 12 ranges contain
So, whether there's an intersection of these 12 ranges?

blazing sundial Aug 14, 2020, 9:01 PM

#

yeah:)

tidal bough Aug 14, 2020, 9:01 PM

#

I think it's much simpler and doesn't require any numpy.

#

Consider what the intersection of two ranges might be. It's either:

Some range. Say 5:10 intersects with 7:12 on 7:10
Empty set. Say, 5:10 with 20:50 have no intersection

#

So you need to just write a function determining the intersection of two ranges. Apply it to the first two elements, then to the result of this and the third element, then to the result of that and the fourth...
(this way of applying is, by the way, what functools.reduce does)
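
The pairwise approach described above could be sketched like this, assuming each range is a `(start, end)` tuple with inclusive ends and no wrap-around past 360. The helper name `intersect` and the sample ranges are made up:

```python
from functools import reduce

def intersect(a, b):
    """Return the overlap of two (start, end) ranges, or None if they are disjoint."""
    if a is None or b is None:
        return None  # an empty intersection stays empty
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

# Hypothetical ranges; wrap-around (e.g. 350 to 10) is not handled here.
ranges = [(5, 10), (7, 12), (6, 9)]
print(reduce(intersect, ranges))  # (7, 9)
```

`reduce` feeds the running result back in as the first argument, so one disjoint pair makes the final answer `None`.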

blazing sundial Aug 14, 2020, 9:05 PM

#

ohhh i see! okay thank you so much. I'm gonna work on this and I'll see if I can handle it from here

tame nest Aug 14, 2020, 9:44 PM

#

I can probably ask a question here and someone help me with a pandas code

faint ravine Aug 14, 2020, 9:44 PM

#

Yes, you can't.

tame nest Aug 14, 2020, 9:45 PM

#

I was advised to come to this channel

faint ravine Aug 14, 2020, 9:45 PM

#

I'm just kidding, what is it?

tame nest Aug 14, 2020, 9:45 PM

#

I am trying to create a new column based on a condition on another column of string values and am facing weird behavior

#

please see this

#

create new variable

📎 unknown.png

#

I am trying to create the new variable 'flee1' based on the variable 'flee'..it should give True when 'flee' == 'Not fleeing'

#

any ideas anyone..even kaggle notebooks giving the same prob

#

nobody knows it seems 🙂 stackexchange for the real geeks

odd yoke Aug 14, 2020, 10:09 PM

#

use .map @tame nest

#

df["flee1"] = df["flee"].map(lambda v: v == "Not fleeing")
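
For the column described in the question (True where 'flee' equals 'Not fleeing'), a plain comparison is probably enough. A minimal sketch, where only the column names come from the question and the rows are made up:

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
df = pd.DataFrame({"flee": ["Not fleeing", "Car", "Foot", "Not fleeing"]})

# Comparing a column to a string yields a boolean Series directly.
df["flee1"] = df["flee"] == "Not fleeing"

print(df)
```

The comparison is exact, so a difference in case or spacing in the string would give False for every row.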

spiral peak Aug 14, 2020, 10:12 PM

#

(it was a typo for anyone curious)

tame nest Aug 14, 2020, 10:20 PM

#

🙂

#

@spiral peak helped me

#

the issue is solved..

#

Thanks @odd yoke

untold rose Aug 14, 2020, 10:39 PM

#

are libraries like tensorflow or pytorch required to make neural networks?

odd yoke Aug 14, 2020, 10:42 PM

#

required ? no
useful ? definitely

#

and chances are, if you don't use them, your code will very likely be less efficient in many ways

untold rose Aug 14, 2020, 10:43 PM

#

ah ok

#

is it possible to make one using only numpy?

odd yoke Aug 14, 2020, 10:45 PM

#

again, yes

#

but you'll have to do the differentiation yourself, you won't be able to run your code on the gpu, and there are not as many functions commonly used to build networks

untold rose Aug 14, 2020, 10:49 PM

#

alright

#

thanks

arctic wedgeBOT Aug 14, 2020, 11:03 PM

#

Hey @atomic oxide!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

atomic oxide Aug 14, 2020, 11:08 PM

#

Hello guys, could I please get help with this issue: why are the ytick labels out of place relative to the ticks?

📎 1.jpg

velvet thorn Aug 14, 2020, 11:26 PM

#

Hello guys, could I please get help with this issue: why are the ytick labels out of place relative to the ticks?
@atomic oxide what do you mean out of place

#

I'm assuming you intend the logarithmic scale?

#

do you mean like how the major tick labels appear to be misaligned with the major ticks?

atomic oxide Aug 14, 2020, 11:27 PM

#

yes this is my question

#

I get this issue in all my plots

velvet thorn Aug 14, 2020, 11:28 PM

#

hard to say without seeing all your code.

#

you do?

#

that's weird

#

can you create a basic plot and show me?

#

e.g.

#

fig, ax = plt.subplots(figsize=(4, 4))

x = np.linspace(0, 2 * np.pi, 200)
y = np.cos(x)

ax.plot(x, y)

atomic oxide Aug 14, 2020, 11:29 PM

#

ok

#

the issue doesn't appear

📎 2.jpg

velvet thorn Aug 14, 2020, 11:33 PM

#

yeah, then it's not all your plots

#

I mean, I guess this is a long shot but

#

you're not manipulating the ticks and/or tick labels manually, right

#

or using some custom Locator/Formatter that might cause this

arctic wedgeBOT Aug 14, 2020, 11:43 PM

#

Hey @atomic oxide!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

velvet thorn Aug 14, 2020, 11:44 PM

#

just a heads up

#

if your code is that long I don't think many people will want to look through it.

atomic oxide Aug 14, 2020, 11:46 PM

#

not long but how can i upload it

arctic wedgeBOT Aug 14, 2020, 11:46 PM

#

Hey @atomic oxide!

It looks like you tried to attach file type(s) that we do not allow (). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

velvet thorn Aug 14, 2020, 11:47 PM

#

read what the bot said

#

my GOD

#

why don't you try removing each line that deals with the y-ticks

#

until the problem stops

#

so you can figure out which line is causing it

#

ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y))) this is my guess though

atomic oxide Aug 14, 2020, 11:54 PM

#

I'm going to try, Thank you soooo much

velvet thorn Aug 14, 2020, 11:55 PM

#

I would really suggest you clean up your code a little

#

actually, maybe clean it up a lot?

atomic oxide Aug 14, 2020, 11:56 PM

#

Ok

dim olive Aug 15, 2020, 12:12 AM

#

hello friends, given a large dataset of [x, y] and assuming these two have some sort of correlation, what methods could I use to measure how closely these two are related?

I am analyzing video game statistics where x is vision of the map, and y is deaths. each set of [x, y] are from a new, unrelated instance of the game.

dim olive Aug 15, 2020, 12:39 AM

#

my data looks like this, so I am losing hope haha

📎 unknown.png

tidal bough Aug 15, 2020, 12:56 AM

#

this doesn't look super correlated 😅

#

maybe you want to train a regression neural net for this.

#

"how well can my neural net predict y by x" is, technically, a measure of their correlation 🙂

dim olive Aug 15, 2020, 12:58 AM

#

Yeah, sadly it does not, although it should have a fairly close correlation

#

ok, ty. My current regression model says: "lol"

#

I would very much like to prove whether or not it is correlated, because if it is not, x can be thrown out completely from my other analysis

#

red is the y predictions LOL

📎 unknown.png

bitter harbor Aug 15, 2020, 1:02 AM

#

I mean that’s not awful

tidal bough Aug 15, 2020, 1:05 AM

#

that's about as good a relationship as I can predict! 😅

#

by the way, is your x one-dimensional?

#

because, uhhh, this really doesn't look like enough data to predict y.

dim olive Aug 15, 2020, 1:08 AM

#

It is one dimensional and not enough data to fully predict y, but I would like to measure the correlation before moving forward with some more in-depth ML

#

This is using linear regression, but I wanted to know if this was a reasonable approach to this specific problem

#

actually, the red line represents what I assumed, higher x should mean less y

#

but I dont just want to confirmation bias this whole project
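
One way to put a number on "how closely related" without leaning on the eye is Pearson's r plus a permutation test. This is a sketch on synthetic stand-in data (the slope, noise level, and sample size are invented), and it only measures a linear relationship; a rank-based measure such as Spearman's would be the usual alternative for monotonic but non-linear trends:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a weak negative linear trend buried in noise.
x = rng.uniform(0, 100, size=2000)
y = 10 - 0.03 * x + rng.normal(0, 3, size=2000)

# Pearson r: strength of the linear relationship, always within [-1, 1].
r = np.corrcoef(x, y)[0, 1]

# Permutation test: shuffle y to see how large |r| gets by chance alone.
shuffled = np.array(
    [abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(1000)]
)
p_value = (np.sum(shuffled >= abs(r)) + 1) / (len(shuffled) + 1)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, permutation p ~ {p_value:.4f}")
```

A small p-value says the trend is unlikely to be chance; r squared says how much of the variation in y the line explains, which can still be small for a real effect.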

serene scaffold Aug 15, 2020, 2:35 AM

#

Is anyone familiar with a tool whereby you can enter a string, highlight a portion of that string, and see the index of the first and last character of what you've selected?

#

If I try to Google something like this I only get results about HTML.

#

I need it to write unit tests for an nlp project.

velvet thorn Aug 15, 2020, 2:41 AM

#

Is anyone familiar with a tool whereby you can enter a string, highlight a portion of that string, and see the index of the first and last character of what you've selected?
@serene scaffold you mean like a frontend thing?

serene scaffold Aug 15, 2020, 2:41 AM

#

I'm not sure what you mean

velvet thorn Aug 15, 2020, 2:43 AM

#

when you say a "tool" do you mean like a (very small) webapp?

#

because when you mention highlighting I'm assuming there's a GUI?

serene scaffold Aug 15, 2020, 2:44 AM

#

I'd prefer if it had a GUI because I'm not sure how to quickly disambiguate which instance of a substring I'm referring to if it were a CLI.

velvet thorn Aug 15, 2020, 2:45 AM

#

hm I don't know of anything that can do that offhand but it shouldn't be that difficult to build

vague ivy Aug 15, 2020, 2:45 AM

#

if i did help = random.randint(1,250) would it be used as an integer or a string when i do while help != 1: print('no')?

serene scaffold Aug 15, 2020, 2:46 AM

#

!e

import random
thing = random.randint(1, 250)
print(thing, type(thing))

arctic wedgeBOT Aug 15, 2020, 2:46 AM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

130 <class 'int'>

vague ivy Aug 15, 2020, 2:47 AM

#

ok thanks

#

also is there a way to make an if statement in a loop?

#

whatever = random.randint(1,250)
test = 1
while whatever != test:
    print('no')
if rsr == UR:
    break
    print('UR')
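
A sketch of an `if` inside a `while` loop, assuming the goal is to leave the loop early on some second condition. The name `stop_at` and its value are hypothetical. The `if` has to be indented under the `while`, and the `print` goes before `break` because nothing after `break` in the same block runs:

```python
import random

target = 1
stop_at = 250  # hypothetical extra value that should also end the loop
draws = 0

value = random.randint(1, 250)
while value != target:
    draws += 1
    if value == stop_at:
        # do the work first: nothing after break runs in this block
        print("hit stop_at, leaving early")
        break
    print("no")
    value = random.randint(1, 250)

print(f"left the loop after {draws} non-target draws")
```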

serene scaffold Aug 15, 2020, 2:50 AM

#

can you open a help session and ping me?

modern canyon Aug 15, 2020, 3:03 AM

#

can someone test my project? (especially if you're on a mac)
https://github.com/shyam1998/movie-recommendation-system-GUI

GitHub

shyam1998/movie-recommendation-system-GUI

movie-recommendation-system-GUI. Contribute to shyam1998/movie-recommendation-system-GUI development by creating an account on GitHub.

desert parcel Aug 15, 2020, 3:23 AM

#

Traceback (most recent call last):
  File "D:\Coding\python\AI\dropout.py", line 18, in <module>
    train_ds = TensorDataset(inputs, targets)
  File "D:\Coding\python\AI\lib\site-packages\torch\utils\data\dataset.py", line 158, in __init__
    assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
AssertionError

#

Does anyone know what this error means

#

I was able to fix it

jovial thorn Aug 15, 2020, 3:49 AM

#

Hey! How did I write the code blocks? I keep forgetting

#

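import math

def isPower (num, base):
    if base in {0, 1}:
        return num == base
    # candidate exponent taken from the floating-point logarithm
    power = int (math.log (num, base) + 0.5)
    # confirm the candidate with exact integer arithmetic
    return base ** power == num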

#

I'm using that function, which I found on Stack Overflow, to check if a number is a power of 2, but I'm curious because I just don't understand why power is the logarithm of the number in the base, but +0.5. What is the +0.5 achieving?

#

I feel like it's basic math but I just don't get it, it's been a long week lol

kind granite Aug 15, 2020, 4:07 AM

#

lol where did you find that function @jovial thorn

#

return math.log(num, base).is_integer()

#

this will do it
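
On the `+ 0.5` question: `int()` truncates toward zero, so adding 0.5 first turns truncation into rounding to the nearest integer, which absorbs small floating-point error in `math.log`. The rounded exponent is then checked with exact integer arithmetic. Testing `.is_integer()` on the raw logarithm skips that safety net and can give a wrong answer when the logarithm comes out slightly off a whole number (a commonly cited case is `math.log(1000, 10)`). A sketch assuming positive integer inputs; the function names here are made up:

```python
import math

def is_power(num, base):
    """Return True if num == base ** k for some integer k >= 0 (positive ints assumed)."""
    if base in (0, 1):
        return num == base
    if num < 1:
        return False
    # Round the float logarithm to the nearest integer, then verify exactly.
    power = round(math.log(num, base))
    return base ** power == num

def is_power_of_two(num):
    """Bit trick: a power of two has exactly one bit set."""
    return num > 0 and num & (num - 1) == 0

print(is_power(1000, 10), is_power(1001, 10))
print(is_power_of_two(1024), is_power_of_two(1000))
```

For the power-of-2 case specifically, the bit trick avoids floating point altogether.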

serene scaffold Aug 15, 2020, 4:08 AM

#

@jovial thorn @kind granite if this isn't related to data science you might want to solve this in a help channel; see #❓｜how-to-get-help

jovial thorn Aug 15, 2020, 4:09 AM

#

is basic math not related to data science????

#

thanks a lot @kind granite !

serene scaffold Aug 15, 2020, 4:10 AM

#

not necessarily

jovial thorn Aug 15, 2020, 4:14 AM

#

emphasis on necessarily ? I think it's easy to make a case for needing proper logarithms in data science lol

#

I wrote it here cause I deemed it related to data science

serene scaffold Aug 15, 2020, 4:16 AM

#

I was only pointing out the possibility that an individual help channel might be better. feel free to carry on.

jovial thorn Aug 15, 2020, 4:16 AM

#

thanks!

ripe forge Aug 15, 2020, 6:24 AM

#

For future reference, It's easy to make a case for random abstract things, doesn't mean it's correct for this specific instance where it's literally talking about an ispower function. Just because data science is built on math doesn't mean anything goes and we start talking about addition or multiplication. You can try #algos-and-data-structs for questions related to algorithms, perhaps that is a better fit.

vague ivy Aug 15, 2020, 6:38 AM

#

i have this code right here:

#

import random
t = 1
while (r := random.randint(1,40)) != t:
  print(r)
else:
  print("yes")

#

i want python to print ('amount of times "r" was said')

#

(it may not look like it but it is data science)

oak iron Aug 15, 2020, 6:39 AM

#

how did you do that code thing?

#

like that code block

vague ivy Aug 15, 2020, 6:40 AM

#

code here

#

📎 unknown.png

oak iron Aug 15, 2020, 6:40 AM

#

like how do you bold/spoilers, but with 3 `?

vague ivy Aug 15, 2020, 6:41 AM

#

yep

oak iron Aug 15, 2020, 6:41 AM

#

k, thanks

vague ivy Aug 15, 2020, 6:41 AM

#

but can you help me?

oak iron Aug 15, 2020, 6:41 AM

#

how?

vague ivy Aug 15, 2020, 6:42 AM

#

i have this code right here:

import random
t = 1
while (r := random.randint(1,40)) != t:
  print(r)
else:
  print("yes")
i want python to `print ('amount of times "r" was said')`
(it may not look like it but it is data science)

oak iron Aug 15, 2020, 6:45 AM

#

i think the proper format is randint(num1, num2)

#

not random.randint()

vague ivy Aug 15, 2020, 6:46 AM

#

i did that.......

#

ooooh

#

sorry

oak iron Aug 15, 2020, 6:46 AM

#

in the while

pale thunder Aug 15, 2020, 6:47 AM

#

you would have to increment a counter variable everytime you do print(r)

vague ivy Aug 15, 2020, 6:47 AM

#

i didnt do import random from random import *

pale thunder Aug 15, 2020, 6:47 AM

#

your randint call is fine

vague ivy Aug 15, 2020, 6:47 AM

#

so i have to do random.randint

oak iron Aug 15, 2020, 6:47 AM

#

i think from random import randint works

vague ivy Aug 15, 2020, 6:47 AM

#

no im talking to bizzarebazzar

#

oh ok

#

i will

pale thunder Aug 15, 2020, 6:47 AM

#

import random works just fine

oak iron Aug 15, 2020, 6:48 AM

#

yea, idk what's going wrong

#

wait

#

what compiler are you using?

ripe forge Aug 15, 2020, 6:48 AM

#

Wrong? Nothing I thought. They just want to do something extra.

oak iron Aug 15, 2020, 6:49 AM

#

wait

#

for me it outputs a bunch of numbers, before outputting yes

pale thunder Aug 15, 2020, 6:50 AM

#

yes, that is what that code does. The goal is to also make it write the amount of numbers it wrote

ripe forge Aug 15, 2020, 6:50 AM

#

Aye, that's because both r and yes are printed.

oak iron Aug 15, 2020, 6:50 AM

#

random order, though

ripe forge Aug 15, 2020, 6:50 AM

#

r is a randint

oak iron Aug 15, 2020, 6:51 AM

#

14
32
16
...
28
6
yes

ripe forge Aug 15, 2020, 6:51 AM

#

So yeah, make a counter variable before the loop set to 0. Each time the while loop is satisfied, add to this counter.

oak iron Aug 15, 2020, 6:51 AM

#

that's my output

ripe forge Aug 15, 2020, 6:52 AM

#

Print counter variable at the end of everything else outside the loop.
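
The counter suggestion, sketched against the code from the question (the variable name `count` is made up):

```python
import random

t = 1
count = 0  # how many times r was printed
while (r := random.randint(1, 40)) != t:
    print(r)
    count += 1  # one more number was printed

print("yes")
print(f"r was printed {count} times")
```

The counter starts at 0 before the loop, so it is still correct when the very first draw already equals `t`.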

#

That's your output because that's what the output should be. So the real question is this, what did you expect instead?

#

There's a mismatch between what the code does and what you think it does in this case, if you think something is unexpected. We can try to address that.

pale thunder Aug 15, 2020, 6:53 AM

#

could also do something like

import random
i = 0  # stays 0 if the very first draw is already 1
for i, r in enumerate(iter(lambda: random.randint(1, 40), 1), start=1):
    print(r)
print(f'a number was said {i} times')
but a counter variable is probably saner.

oak iron Aug 15, 2020, 6:57 AM

#

I think he's offline...

copper hemlock Aug 15, 2020, 6:57 AM

#

hello, i have a question

#

about pytorch

#

my image batch comes with dimension of 3 instead of 4, its missing the color channel

#

how is this possible

ripe forge Aug 15, 2020, 7:03 AM

#

Each image has a dim of 3? (in which case that's normal) or does the whole batch only have 3 dimensions

#

In general, When images don't have colour channel it means they are essentially greyscaled images.

#

Such that the same pixel value is used for all 3 channels at once

uncut shadow Aug 15, 2020, 7:04 AM

#

Hey. What does Flatten layer do (in e.g. Tensorflow) and what is it for?

ripe forge Aug 15, 2020, 7:05 AM

#

Reduce the dimensions of something.

uncut shadow Aug 15, 2020, 7:05 AM

#

Oh

ripe forge Aug 15, 2020, 7:05 AM

#

Say turning a 2d matrix into a 1d array

uncut shadow Aug 15, 2020, 7:05 AM

#

What's the point? Couldn't u just do this before feeding data to model?

ripe forge Aug 15, 2020, 7:06 AM

#

Sure, you could.

uncut shadow Aug 15, 2020, 7:06 AM

#

Hmmm

copper hemlock Aug 15, 2020, 7:06 AM

#

batch is supposed to have 4 dims, no?
[batch_size, in_channel, w, h]

uncut shadow Aug 15, 2020, 7:06 AM

#

So both ways are possible?

copper hemlock Aug 15, 2020, 7:06 AM

#

mine comes with [batch_size, w, h]

#

so it throws error

ripe forge Aug 15, 2020, 7:06 AM

#

It's probably logically easier to understand data going in normally, say images make sense as 2d for example

uncut shadow Aug 15, 2020, 7:07 AM

#

Oh, yeah makes sense. Thanks

ripe forge Aug 15, 2020, 7:07 AM

#

If there's a shape error eysidi you can freely reshape as needed I'd assume. This is a guess but it shouldn't cause problems

#

Make sure you keep the correct axes when you reshape though

#

So perhaps a shape of [batch size, 1, w, h] if your notation is correct.
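
A sketch of adding the missing channel axis, shown with NumPy on a made-up batch; on a PyTorch tensor the matching call would be `tensor.unsqueeze(1)`. The batch size and image size are hypothetical:

```python
import numpy as np

# Hypothetical grayscale batch with shape [batch_size, w, h].
batch = np.zeros((8, 28, 28), dtype=np.float32)

# Insert a channel axis of length 1 at position 1.
with_channel = np.expand_dims(batch, axis=1)

print(batch.shape, "->", with_channel.shape)  # (8, 28, 28) -> (8, 1, 28, 28)
```

Adding an axis of length 1 leaves the data untouched and keeps the other axes in their original order.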

copper hemlock Aug 15, 2020, 7:09 AM

#

hmm i will try that thanks

#

i think the issue is caused from my dataloader

random perch Aug 15, 2020, 7:44 AM

#

Should I buy the book about tensorflow and keras by O’reilly

stuck oar Aug 15, 2020, 8:23 AM

#

Hey

#

anyone uses Visual Studio notebook here?

#

I thought of using that but I can't find the equivalent of shift+tab (jupyter notebook) on vscode

#

it's this question https://stackoverflow.com/questions/63408190/what-is-the-equivalent-of-shift-tab-of-jupyter-notebook-on-visual-studio

Stack Overflow

What is the equivalent of shift + tab of jupyter notebook on visual...

I'd like to see the suggestions like so: jupyter-suggestion-ss
Shift + Tab works only on jupyter notebook.

#

anyone here familiar with this?

modern canyon Aug 15, 2020, 9:01 AM

#

@stuck oar just hover your mouse over the function

stuck oar Aug 15, 2020, 9:03 AM

#

@stuck oar just hover your mouse over the function
@modern canyon right haha, thanks!

modern canyon Aug 15, 2020, 9:03 AM

#

👍

teal notch Aug 15, 2020, 9:57 AM

#

📎 Annotation_2020-08-15_105642.png

#

that error shows up when i'm trying to draw a picture inside another image

#

can u help me

#

?

grave frost Aug 15, 2020, 10:48 AM

#

So pretty wide and vague query - does anyone know a model which uses transformers and is good for seq2seq or NMT purposes?

#

Would something like FairSeq be considered good, or maybe some flavors of BERT-like models such as RoBERTa or BART, or even GPT-2? Would these be good models for direct sequence to sequence conversion?

#

I think FairSeq is pretty good in itself, since it is dedicated to seq2seq problem types. Would it then be a good idea to use BART, RoBERTa and all the other NLP models out there?

molten hamlet Aug 15, 2020, 12:14 PM

#

@teal notch tuple object has no load

teal notch Aug 15, 2020, 1:14 PM

#

@teal notch tuple object has no load
@molten hamlet yeah i know but how can i make this bot add images to other image

molten hamlet Aug 15, 2020, 1:34 PM

#

!ask

arctic wedgeBOT Aug 15, 2020, 1:34 PM

#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

half bloom Aug 15, 2020, 2:04 PM

#

idk where to put this but

#

📎 unknown.png

#

📎 unknown.png

#

whats the mistake here

desert parcel Aug 15, 2020, 2:52 PM

#

what are you using to write your code in

#

just curious I can't actually help you lol

lapis sequoia Aug 15, 2020, 2:56 PM

#

What r some good ai tutorials/courses

desert parcel Aug 15, 2020, 3:00 PM

#

https://youtu.be/vo_fUOk-IKk

YouTube

freeCodeCamp.org

Deep Learning with PyTorch Live Course - Tensors, Gradient Descent ...

This is a beginner-friendly coding-first online course on PyTorch - one of the most widely used and fastest growing frameworks for machine learning. This video covers the basic concepts in PyTorch viz. tensors & gradients, and walks through the process of implementing linear r...

▶ Play video

#

It's a VOD now

#

I'm currently on computer vision and logistic regression, I'm not progressing as fast as I wanted to

faint ravine Aug 15, 2020, 3:27 PM

#

How can I build a digit recognizer in python?

lapis sequoia Aug 15, 2020, 3:38 PM

#

@faint ravine you can build that using opencv,sklearn,numpy

faint ravine Aug 15, 2020, 3:39 PM

#

How do I get the best neural network?

#

what is the best CNN for handwriting recognition?

#

SOTA

raven mulch Aug 15, 2020, 3:59 PM

#

If you want to learn how to make your own deep learning library feel free to check out my youtube series! 🙂 https://www.youtube.com/watch?v=nNFsHQaD7gQ&t=1182s

YouTube

Federico Barbero

Developing a Deep Learning Library - JoelNet Library and Neural Net...

Hello!
Today we start a new adventure where we will be expanding on the JoelNet library with the ultimate goal of deploying our own MNIST web classifier (and maybe attacking it using some simple adversarial attacks). The idea is to model the library around the scikit-learn api...

▶ Play video

#

I'm a researcher in machine learning and I make videos on the subject, I'm hoping to create discussions in the comment section to share knowledge, feel free to share them around if you think they are interesting and we can all learn together 🙂

faint ravine Aug 15, 2020, 4:16 PM

#

What kind of research do you do?

glacial rune Aug 15, 2020, 4:51 PM

#

I have a script that I would like to run daily to get data from websites. If I want to store this locally, would it be best to create a csv file and append the entries there?

#

another thought - if I wanted to store this online/in the cloud, what would people recommend?

#

one potential issue I can see with the CSV is I'll need a tab for each thing I'm tracking? Which could become quite high?

desert oar Aug 15, 2020, 4:52 PM

#

Sqlite

#

Or 1 file per request

glacial rune Aug 15, 2020, 4:53 PM

#

ah ok, I'll look into SQLite, thanks

faint ravine Aug 15, 2020, 4:54 PM

#

you're welcome.

glacial rune Aug 15, 2020, 4:54 PM

#

would it be bad practice to have many tables in a SQL database? as I'll be tracking price over days

#

and the table name would be the product name, I guess
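
A common alternative to one table per product is a single table with a product column. A sketch using the standard-library `sqlite3` module; the table name, column names, and rows are all made up for illustration:

```python
import sqlite3

# In-memory database for the sketch; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE prices (
        product TEXT NOT NULL,
        day     TEXT NOT NULL,   -- ISO date string, e.g. 2020-08-15
        price   REAL NOT NULL,
        PRIMARY KEY (product, day)
    )
    """
)

rows = [
    ("widget", "2020-08-14", 9.99),
    ("widget", "2020-08-15", 10.49),
    ("gadget", "2020-08-15", 24.00),
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
conn.commit()

# The history of one product is a filter, not a separate table.
history = conn.execute(
    "SELECT day, price FROM prices WHERE product = ? ORDER BY day", ("widget",)
).fetchall()
print(history)
```

With this layout a new product is just new rows, and the daily script never has to create tables.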

raven mulch Aug 15, 2020, 4:56 PM

#

@faint ravine in machine learning security

#

That’s what some of my videos are on

#

Robustness etc

faint ravine Aug 15, 2020, 4:56 PM

#

Like, adversarial attacks and such?

raven mulch Aug 15, 2020, 4:56 PM

#

Yeah

faint ravine Aug 15, 2020, 4:57 PM

#

Neat

raven mulch Aug 15, 2020, 4:57 PM

#

I reviewed some papers on that already

#

🙂

faint ravine Aug 15, 2020, 4:57 PM

#

So you must be aware of GANs?

raven mulch Aug 15, 2020, 4:57 PM

#

Generative adversarial networks?

faint ravine Aug 15, 2020, 4:57 PM

#

yeh

raven mulch Aug 15, 2020, 4:57 PM

#

They don’t have much to do with adversarial samples

faint ravine Aug 15, 2020, 4:58 PM

#

Really?

raven mulch Aug 15, 2020, 4:58 PM

#

Yeah a lot of people get that mixed up haha

lapis sequoia Aug 15, 2020, 4:58 PM

#

How would you write a docstring for something like kwargs in a function? The example below demonstrates two arguments from kwargs. The actual function will accept many more keyword arguments. I would like to define in the docstring what the possible keyword arguments are.

def prandtl(**kwargs):
    cp = kwargs.get('cp', None)
    alpha = kwargs.get('alpha', None)
    pr = cp / alpha
    return pr

faint ravine Aug 15, 2020, 4:59 PM

#

Can you give a short summary of how you go about doing machine learning security? and some real world applications? I'd like to know more

pale thunder Aug 15, 2020, 4:59 PM

#

why not just make those regular kw arguments

def prandtl(*, cp, alpha):
    pr = cp / alpha
    return pr
@lapis sequoia

raven mulch Aug 15, 2020, 5:00 PM

#

Check out my video on adversarial samples and my other one on Lipschitz continuity

#

I think that does a good intro

#

Better than what I could explain here haha

#

But in short

#

We look at how we can attack networks

#

And where they fail (distribution shifts)

lapis sequoia Aug 15, 2020, 5:00 PM

#

That works fine for a few arguments. But what if I have many arguments like 5 or 10 or more?

raven mulch Aug 15, 2020, 5:00 PM

#

And we try to make them more robust against this

pale thunder Aug 15, 2020, 5:01 PM

#

then you list them in the signature

#

if you are taking 10 arguments, you write 10 arguments

lapis sequoia Aug 15, 2020, 5:02 PM

#

So what's the point of **kwargs when you can just define all the arguments?

pale thunder Aug 15, 2020, 5:04 PM

#

when you do not know all the arguments, for example when extending a class, or things like the dict constructor, types.SimpleNamespace
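
With the arguments listed explicitly, the docstring question mostly answers itself. A sketch in numpydoc style using the two names from the question; the parameter descriptions are placeholders because the thread does not say what the quantities are:

```python
def prandtl(*, cp, alpha):
    """Compute cp / alpha, as in the example from the question.

    Parameters
    ----------
    cp : float
        First quantity (keyword-only). Description is a placeholder.
    alpha : float
        Second quantity (keyword-only); must be non-zero.

    Returns
    -------
    float
        The ratio cp / alpha.
    """
    return cp / alpha

print(prandtl(cp=4.0, alpha=2.0))  # 2.0
```

If a function really does need `**kwargs`, numpydoc has an "Other Parameters" section for listing the accepted keys, but editors and `help()` cannot check those against the signature.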

faint ravine Aug 15, 2020, 5:04 PM

#

Fail at what?

raven mulch Aug 15, 2020, 5:06 PM

#

At the task at hand

#

A lot of ML is based on the iid assumption

#

So stuff doesn’t work for ood (out of distribution)

#

We try to fix that in ML sec

#

Or we come up with new attacks

faint ravine Aug 15, 2020, 5:24 PM

#

Nice

lapis sequoia Aug 15, 2020, 5:46 PM

#

How can I make this work for only (u, d, rho, mu) or (u, d, nu)?
If I invoke reynolds(0.25, 0.102, rho=910, nu=1.4e-6) then the function will still run.

def reynolds(u, d, rho=None, mu=None, nu=None):

    if u and d and rho and mu:
        re = (rho * u * d) / mu
    elif u and d and nu:
        re = (u * d) / nu
    else:
        raise ValueError('Must provide u, d, rho, mu or u, d, nu')

    return re

pearl crystal Aug 15, 2020, 5:49 PM

#

Why are some well-known packages in python messy like sklearn?
For example in sklearn.preprocessing (StandardScaler), I should first call fit method and then transform?! Why? It is really messy and is a type of side effect

#

It should be a method like transform, does everything and returns standard data, really simple but I have to remember I first need to call fit to compute mean and std data and then transform it

lapis sequoia Aug 15, 2020, 5:58 PM

#

Well this seems to work fine.

def reynolds(u, d, rho=None, mu=None, nu=None):

    if rho and mu and not nu:
        re = (rho * u * d) / mu
    elif nu and not rho and not mu:
        re = (u * d) / nu
    else:
        raise ValueError('Must provide (u, d, rho, mu) or (u, d, nu)')

    return re
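
One caveat with the truthiness checks: a legitimate value of `0` counts as missing. A sketch of the same function using `is not None` instead (the helper variable names are made up):

```python
def reynolds(u, d, rho=None, mu=None, nu=None):
    """Reynolds number from (u, d, rho, mu) or from (u, d, nu), but not a mix."""
    has_rho_mu = rho is not None and mu is not None and nu is None
    has_nu = nu is not None and rho is None and mu is None

    if has_rho_mu:
        return (rho * u * d) / mu
    if has_nu:
        return (u * d) / nu
    raise ValueError('Must provide (u, d, rho, mu) or (u, d, nu)')

print(reynolds(2.0, 3.0, rho=4.0, mu=6.0))  # 4.0
print(reynolds(2.0, 3.0, nu=6.0))           # 1.0
```

A mixed call such as `reynolds(0.25, 0.102, rho=910, nu=1.4e-6)` still raises `ValueError`, as intended.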

pearl crystal Aug 15, 2020, 6:05 PM

#

When we have different methods to do the same thing, all of them are acceptable?
np.tile
np.matlib.repmat

Which one do you prefer?
np.reshape(arr,[2,5]) --> numpy methods
arr.reshape([2,5]) --> object methods

graceful thunder Aug 15, 2020, 6:19 PM

#

second one, it's cleaner

half bloom Aug 15, 2020, 6:23 PM

#

what are you using to write your code in
@desert parcel jupyter

pearl crystal Aug 15, 2020, 6:24 PM

#

jupyter is cool only for prototype and learning, I think

#

One big problem for me about jupyter is about IntelliSense and code completion, debugging, refactoring and git integration, bla bla. It is awful

odd yoke Aug 15, 2020, 6:30 PM

#

you're not alone on that one, 100% agree

bitter harbor Aug 15, 2020, 6:40 PM

#

I haven’t fully used it yet, but apparently spyder was built for data sci

brazen canyon Aug 15, 2020, 8:10 PM

#

I use vscode + the inbuilt jupyter feature

austere swift Aug 15, 2020, 8:33 PM

#

this

fervent bridge Aug 15, 2020, 11:49 PM

#

Downloaded the Cars 196 data set and I'm reading through the .mat file (never worked with .mat before). I want to know how to get further details of the file. Currently I've got [('annotations', (1, 16185), 'struct'), ('class_names', (1, 196), 'cell')]. How do I get further details of 'annotations' and 'class_names'? Tried test['annotations'], nothing.

drowsy kite Aug 16, 2020, 1:37 AM

#

Hey guys does anyone have a cheat sheet or resource on how to predict values given that you have dummy columns?

#

im really confused on how you would identify the 1's and 0's when using .predict
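
One usual approach is to build the dummy columns for the new rows the same way as for the training data, then line them up with the training columns so that missing categories become 0. A sketch with made-up data and column names, stopping just before the `.predict` call:

```python
import pandas as pd

# Made-up training data with one categorical column.
train = pd.DataFrame({"color": ["red", "blue", "green"], "size": [1, 2, 3]})
X_train = pd.get_dummies(train, columns=["color"])

# A new row to predict on; it only contains one of the categories.
new = pd.DataFrame({"color": ["blue"], "size": [5]})
X_new = pd.get_dummies(new, columns=["color"])

# Align to the training columns; dummies missing from the new data become 0.
X_new = X_new.reindex(columns=X_train.columns, fill_value=0)

print(X_new)
```

After the `reindex`, `X_new` has the same columns in the same order as `X_train`, which is what a fitted model expects.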

fervent bridge Aug 16, 2020, 2:50 AM

#

NVM my question already figured it out a while ago

drifting umbra Aug 16, 2020, 5:01 AM

#

@drowsy kite you would need to decide on a model

#

if you have a lot of categorical variables i have found Catboost is faster and more accurate than XGBoost

#

https://catboost.ai/

CatBoost - state-of-the-art open-source gradient boosting library w...

CatBoost - state-of-the-art open-source gradient boosting library with categorical features support, https://catboost.yandex/ #catboost

#

let me know if you have questions

tidal sonnet Aug 16, 2020, 7:40 AM

#

nice

buoyant cypress Aug 16, 2020, 8:03 AM

#

hello data science people

#

I have a question which Im gonna crosspost

#

since this is probably the right place for it

#

📎 Screen_Shot_2020-08-16_at_8.02.51_PM.png

uncut shadow Aug 16, 2020, 10:39 AM

#

well, it's not connected with data science. But, what exactly do you mean? You could have just turned it to string and then just add , every 3 numbers

steady bronze Aug 16, 2020, 11:01 AM

#

do you guys know how to return values which appear multiple times in a column using pandas
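
Two ways this is often done, sketched on a made-up column: `value_counts` for the repeated values and their counts, and `duplicated(keep=False)` for every row whose value is repeated.

```python
import pandas as pd

# Made-up column values.
s = pd.Series(["a", "b", "a", "c", "b", "a"], name="col")

# Values that occur more than once, with their counts.
counts = s.value_counts()
repeated = counts[counts > 1]
print(repeated)

# All rows whose value is repeated somewhere in the column.
print(s[s.duplicated(keep=False)])
```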

wintry sapphire Aug 16, 2020, 11:36 AM

#

Hi guys, would anyone know why the output from np.polyfit is different from my own manual calculation through python?

tidal bough Aug 16, 2020, 12:24 PM

#

@buoyant cypress probably can be done via just string formatting

uncut shadow Aug 16, 2020, 12:24 PM

#

@buoyant cypress okay just use {n:,} where n is a number

tidal bough Aug 16, 2020, 12:24 PM

#

Hi guys, would anyone know why the output from np.polyfit is different from my own manual calculation through python?
^ this is solved, by the way.

boreal swift Aug 16, 2020, 12:24 PM

#

beat me to it

uncut shadow Aug 16, 2020, 12:24 PM

#

GWaobloChildPepeSweat

boreal swift Aug 16, 2020, 12:25 PM

#

There's also a way to make it locale aware

#

And in {n:,} you can swap the comma for an underscore ({n:_}); for any other separator, format with the comma first and then .replace() it
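
A quick sketch of the `{n:,}` suggestion above, using made-up numbers. As far as I know the format spec itself only accepts `,` and `_` as grouping characters, so any other separator here goes through `str.replace`.

```python
n = 1234567
x = 1234567.891

with_commas = f"{n:,}"                          # thousands separated by commas
with_underscores = f"{n:_}"                     # the other built-in grouping option
two_decimals = f"{x:,.2f}"                      # grouping combined with rounding
with_spaces = f"{x:,.2f}".replace(",", " ")     # any other separator via replace
```

For the locale-aware route, the `n` format type (`f"{n:n}"`) uses the separators of whatever locale was set with `locale.setlocale`, so its output depends on the machine it runs on.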

whole plover Aug 16, 2020, 1:40 PM

#

I want to fit some data in a pandas dataframe using a custom lmfit model, however my output is shuffled around in a weird way

#

red: fit, blue: data

📎 unknown.png

#

I'm using matplotlib for the plotting. the essence of the code is:
```python
result = model.fit(y, x=x, method="leastsq", params=params)

plt.scatter(x, y)
plt.scatter(x, result.best_fit)
```

#

any idea what is happening here?

#

the fitting model is a linear term with a numpy sin wave on top

grave frost Aug 16, 2020, 1:58 PM

#

So pretty wide and vague query - anyone know a model which uses transformers and is good for seq2seq or NMT purposes?

#

Would something like FairSeq be considered good, or maybe some BERT-like models such as RoBERTa or BART, or even GPT-2? Would these be good models for direct sequence-to-sequence conversion?
I think FairSeq is pretty good in itself, since it is dedicated to seq2seq problem types. Would it then be a good idea to use BART, RoBERTa and all the other NLP models out there?

#

@whole plover Any reason why you are using linear regression for such a data pattern?

whole plover Aug 16, 2020, 2:06 PM

#

@grave frost Not intentionally no, isnt this a nonlinear fit?

#

I've never done fitting with python before so forgive me if im wrong

solid aurora Aug 16, 2020, 3:24 PM

#

@pearl crystal .fit_transform()

#

basically does fit and transform in one step
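
A small sketch of what "fit and transform in one step" means, assuming scikit-learn's `StandardScaler` as the example transformer (the thread does not say which one was being discussed). The arrays are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # made-up data
X_test = np.array([[2.0, 40.0]])

# Two steps: learn the per-column mean/std, then apply them.
two_step = StandardScaler().fit(X_train).transform(X_train)

# One step: the same result from a single call.
scaler = StandardScaler()
one_step = scaler.fit_transform(X_train)

# New data reuses the statistics learned from the training set,
# so it gets transform() only, not fit_transform().
X_test_scaled = scaler.transform(X_test)
```

Calling `fit_transform` on the test set would re-learn the statistics from the test data, which is usually not what you want.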

uncut python Aug 16, 2020, 3:31 PM

#

📎 Screenshot_2020-08-16-18-40-30-218_com.adobe.reader.jpg

#

Can anyone tell me how to interpret a linear regression graph like the ones given here in figures a, b, and c?

#

I can send more details or figure legends if needed

smoky meadow Aug 16, 2020, 3:35 PM

#

Which is better to use if I only need a clean copy of a numpy array, .copy() or np.copy()?

dusty sage Aug 16, 2020, 3:54 PM

#

I'm considering doing a course on data science. Can someone recommend to me some good reading material that would show me the ropes

lapis sequoia Aug 16, 2020, 4:01 PM

#

i mean both functions do the same thing, so you can choose whichever one you like

#

personally i use np.copy() because i like the word np in my code
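
A small sketch comparing the two, on a made-up array. Both return an independent copy; the one difference I'm aware of for plain arrays is the default memory layout (`order="C"` for the method, `order="K"`, i.e. keep the source layout, for the function).

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

b = a.copy()     # method form
c = np.copy(a)   # function form

# Writing to the copies leaves the original untouched.
b[0, 0] = 99
c[0, 1] = -1
original_unchanged = np.array_equal(a, np.arange(6).reshape(2, 3))

# The defaults only differ visibly for non-C-ordered input.
f = np.asfortranarray(a)
method_is_c_ordered = f.copy().flags["C_CONTIGUOUS"]        # method defaults to order="C"
function_keeps_layout = np.copy(f).flags["F_CONTIGUOUS"]    # function defaults to order="K"
```

For ordinary C-ordered arrays the two calls are interchangeable, so picking by taste is fine.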

cosmic lynx Aug 16, 2020, 4:45 PM

#

So I want to make a perfect AI for a fighting game, how far above my head am I getting? I don’t know anything about AI aside from how it works in theory.

cinder sage Aug 16, 2020, 4:45 PM

#

No cheating here either @cosmic lynx just fyi

cosmic lynx Aug 16, 2020, 4:45 PM

#

I am confused

#

so should I just probe the internet instead?

cinder sage Aug 16, 2020, 4:46 PM

#

You can ask general things, but not for us to help you cheat

grave frost Aug 16, 2020, 4:47 PM

#

@cosmic lynx Do you want to use ML or just a generic game AI present in most single-player games??

#

@cinder sage How is asking for help cheating??

cosmic lynx Aug 16, 2020, 4:49 PM

#

I was thinking ML just to see what insanity happens, who knows, it may find stuff like touch of death combos...

cinder sage Aug 16, 2020, 4:49 PM

#

@grave frost they are asking to use AI to perform perfect actions in a game. My guess is that is against the ToS of said game.

grave frost Aug 16, 2020, 4:50 PM

#

ML usually breaks the game because it exploits the game's engine, but yeah it is really powerful if you use it correctly

#

However, RL does require some expertise. The more advanced your model, the more powerful the actions it can take and the closer it gets to above-human level of play...

cosmic lynx Aug 16, 2020, 4:51 PM

#

?
okay, now I think I get it. The game I’m planning on doing this in is like street fighter but with more stuff thrown in

grave frost Aug 16, 2020, 4:52 PM

#

np. Model can handle it. If you want to ease into Reinforcement Learning (RL), the best way is to start with a simple DQN and bump up the complexity as you learn....

cosmic lynx Aug 16, 2020, 4:52 PM

#

DQN?

grave frost Aug 16, 2020, 4:53 PM

#

@cinder sage Why is using a bot against the ToS? As long as you don't "hack" or cheat it is considered fine. OpenAI made a model for Dota 2. Since it got the same input as a human, it was allowed to play exhibition matches at The International too...

#

@cosmic lynx Deep Q-Network. A very simple yet sometimes effective model. Good for simple games (Atari) and for beginners in RL...
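
To make the idea a bit more concrete, here is a rough sketch of the Q-learning target that a DQN is trained to match, written in plain NumPy. It leaves out the neural network, the replay buffer and the game itself; the function name `td_targets` and all the numbers are made up for illustration.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Targets r + gamma * max_a' Q(s', a'), with no bootstrapping on terminal steps."""
    best_next = next_q_values.max(axis=1)   # value of the best action in the next state
    return rewards + gamma * best_next * (1.0 - dones)

# A hypothetical batch of three transitions with two possible actions each.
rewards = np.array([1.0, 0.0, -1.0])
next_q = np.array([[0.5, 2.0],
                   [1.0, 0.0],
                   [3.0, 4.0]])             # Q-values the network predicts for the next states
dones = np.array([0.0, 0.0, 1.0])           # 1.0 where the episode ended

targets = td_targets(rewards, next_q, dones, gamma=0.9)
```

In a full DQN the network's prediction for the action actually taken is nudged towards these targets with a regression loss, and `next_q` usually comes from a slowly updated copy of the network.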

cosmic lynx Aug 16, 2020, 4:55 PM

#

Wait, a bot being allowed to play in a tournament? That’s interesting
I’ll have to look into it, thanks

grave frost Aug 16, 2020, 4:55 PM

#

Anytime

cinder sage Aug 16, 2020, 4:55 PM

#

@grave frost openAI had permission

grave frost Aug 16, 2020, 4:56 PM

#

How does a bot get an advantage? For many things it's usually a limitation, as it can't take "blazing-fast actions" or use long-term strategy (consumer-level models). I don't see how it is cheating because you are basically limiting your own game score by allowing a bot to play...

cosmic lynx Aug 16, 2020, 4:58 PM

#

also I don’t think I can hook up this bot to anything I can’t run on my potato....

grave frost Aug 16, 2020, 4:59 PM

#

Do you have a GPU??

cosmic lynx Aug 16, 2020, 4:59 PM

#

I have a 5 year old laptop that was middle end then....

grave frost Aug 16, 2020, 5:01 PM

#

Well, training a model without a GPU is going to be painfully slow. I suggest you look up Colab, Google's initiative to provide free GPUs with minimal setup. But it's not easy to do RL on Colab, so I suggest you get some GPU resources. A lappy isn't gonna cut it

cosmic lynx Aug 16, 2020, 5:02 PM

#

F
Either way I was planning on buying a cheap desktop soon...

#

So much for AI tic-tac-toe bot perfecting the game....

grave frost Aug 16, 2020, 5:04 PM

#

Just make sure it has an Nvidia GPU if you do want to do some ML. You can do ML with an AMD GPU, but the tooling is less mature, so it may lead to a lot of crashes and bugs.

#

@cosmic lynx I think there are some people who have done RL on CPU only, but I guess it will take a hell of a long time then. If you are fine with running your laptop 24hr+ then I think you can get started right away