latent blaze May 5, 2021, 2:12 AM

#

wsl actually is pain

misty flint May 5, 2021, 2:56 AM

#

i remember asking that question. then our last team project was on making a chatbot

#

DoggoKek

#

then i figured it out

#

rasa 11/10

#

if you ever need to make a chatbot for deployment

sour abyss May 5, 2021, 3:09 AM

#

how can i calculate the p-value of a test statistic in python?

#

as of right now i only plan on doing t stat, z, chi square and slope of regression line

#

this is what i have as of right now

#

https://gist.github.com/lmaosoggypancakes/cb508b7cac906e3545170da8b94dc13e

Gist

stats.py

GitHub Gist: instantly share code, notes, and snippets.

velvet thorn May 5, 2021, 3:14 AM

#

sour abyss how can i calculate the p-value of a test statistic in python?

manually?

#

or using a library

sour abyss May 5, 2021, 3:15 AM

#

ideally manually

velvet thorn May 5, 2021, 3:16 AM

#

sour abyss ideally manually

hm

#

you know the mathematics behind these calculations, yes?

sour abyss May 5, 2021, 3:16 AM

#

yes

#

for the most part at least

velvet thorn May 5, 2021, 3:17 AM

#

okay so

#

what problems are you having exactly

sour abyss May 5, 2021, 3:19 AM

#

i can find the correct standardized test statistic, such as z-score, t stat, chi square etc. but i'm clueless on how to find the p value from that. i know that hardcoding in a table of z score probability values is not the way to go either

velvet thorn May 5, 2021, 3:19 AM

#

okay so

velvet thorn May 5, 2021, 3:20 AM

#

sour abyss i can find the correct standardized test statistic, such as z-score, t stat, chi...

you can basically

#

think of the p-value as the area under the PDF, right

#

(which corresponds to the CDF)

#

that's all calculus

sour abyss May 5, 2021, 3:21 AM

#

from a normal curve the p-val would be the area to the right of it correct?

velvet thorn May 5, 2021, 3:21 AM

#

sour abyss from a normal curve the p-val would be the area to the right of it correct?

it depends on your test

sour abyss May 5, 2021, 3:22 AM

#

oh right if you're doing like a 2 sided z stat it's P(X > z) and P(X < -z) iirc

velvet thorn May 5, 2021, 3:23 AM

#

ye

#

so

#

those are definite integrals

sour abyss May 5, 2021, 3:23 AM

#

uhh

#

ohhhhhhh right i think it's the area covered to the corresponding side of the test statistic right?

#

just finding the % of the area to the right/left of the statistic

#

if it was 1 sided with a null hypothesis of p = 0 and alternate hypothesis of p > 0 the lower and upper bounds of the integral are from the z-score to infinity right?

#

i think

exotic maple May 5, 2021, 4:14 AM

#

velvet thorn those are definite integrals

Do people actually do integrals to get those from the PDF? I've never, even in college, integrated the PDF over a range to get the p-value. sounds a bit overkill unless explicitly for teaching. Most distributions have their standard tables no?

velvet thorn May 5, 2021, 4:15 AM

#

exotic maple Do people actually do integrals to get those from the PDF? I've never, even in c...

no, but

#

they said

#

they want to do it manually

exotic maple May 5, 2021, 4:15 AM

#

sour abyss if it was 1 sided with a null hypothesis of p = 0 and alternate hypothesis of p ...

that does sound correct.

exotic maple May 5, 2021, 4:15 AM

#

velvet thorn they want to do it manually

High respects to him lol. I'd just pop a z-table or t-table haha

sour abyss May 5, 2021, 4:21 AM

#

I found a resource called scipy which can do integrals for you, but all this integral stuff is completely new to me. I've messed around with it back in middle school, but it's surprising that you can use an integral to find a CDF for a normal curve. Ended up just using scipy in the program, because manually configuring tables and, god forbid, degrees of freedom didn't look like the best option programmatically

velvet thorn May 5, 2021, 4:38 AM

#

sour abyss I Found a resource called scipy which can do integrals for you, but all this int...

no

#

you can calculate the integral manually

#

well

#

it depends on what you want to do I guess

#

whether it's for learning

#

but anyway

#

the CDF is the integral of the PDF
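A minimal sketch of the idea above for a z statistic, using only the standard library. The function names and the integration cutoff of 10 standard deviations are assumptions made for illustration, not something from the conversation. It integrates the standard normal PDF with the trapezoid rule and compares that against the closed form based on `math.erfc`.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail_by_integration(z: float, upper: float = 10.0, steps: int = 20_000) -> float:
    """Approximate P(Z > z) with the composite trapezoid rule on [z, upper].

    `upper` stands in for infinity; the area beyond 10 is negligible.
    """
    if z >= upper:
        return 0.0
    h = (upper - z) / steps
    total = 0.5 * (normal_pdf(z) + normal_pdf(upper))
    for k in range(1, steps):
        total += normal_pdf(z + k * h)
    return total * h


def upper_tail_closed_form(z: float) -> float:
    """P(Z > z) via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_test_p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value of a z statistic for the chosen alternative hypothesis."""
    if alternative == "greater":
        return upper_tail_closed_form(z)
    if alternative == "less":
        return upper_tail_closed_form(-z)
    if alternative == "two-sided":
        return 2.0 * upper_tail_closed_form(abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")


print(upper_tail_by_integration(1.96), upper_tail_closed_form(1.96))
print(z_test_p_value(1.96))
```

For t and chi-square statistics the same approach applies with a different PDF. If a library is acceptable, `scipy.stats.norm.sf`, `scipy.stats.t.sf` and `scipy.stats.chi2.sf` return the upper-tail area directly.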

astral path May 5, 2021, 4:45 AM

#

ok i need help really quickly

#

if i have two lists of different lengths but they're within the same range, how would i plot them over each other so they start and end at the same locations?

astral path May 5, 2021, 5:34 AM

#

ok well im scratching that completely now

#

how do i find correlation between two lists with different lengths but which are within the same time series?

#

as in i measured one variable 5 times in an hour, and another 423 times in an hour, and need to see if they're correlated

#

thanks!

#

i have like 30 minutes btw, been working on this for HOURS

lavish tundra May 5, 2021, 5:51 AM

#

does someone know how to change all the symbols of a graph legend? I'm using seaborn and matplotlib, but even trying to use the legend_handler of matplotlib I haven't had any success

item_xy.legend(legend, fontsize=legendsize, bbox_to_anchor=(0.87, 1.15), loc=2, handler_map={item_xy: HandlerLine2D(numpoints=1)})

#

i'm talking about this

velvet thorn May 5, 2021, 5:54 AM

#

astral path how do i find correlation between two lists with different lengths but which are...

resample
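A rough sketch of what resampling could look like here, in plain Python: the dense series is averaged into as many equal time bins as the sparse series has readings, and the two equal-length lists are then correlated. The sample values, the helper names and the choice of bin averaging are all assumptions for illustration.

```python
import math
from statistics import mean


def bin_average(times, values, n_bins, t_start, t_end):
    """Average `values` inside n_bins equal-width time bins on [t_start, t_end)."""
    width = (t_end - t_start) / n_bins
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        idx = min(int((t - t_start) / width), n_bins - 1)
        bins[idx].append(v)
    # statistics.mean raises StatisticsError if a bin received no readings
    return [mean(b) for b in bins]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: 423 readings and 5 readings taken over the same 60 minutes.
dense_times = [60.0 * k / 423 for k in range(423)]
dense_values = [2.0 * t + 1.0 for t in dense_times]
sparse_values = [3.0, 9.5, 15.2, 21.1, 26.8]

dense_binned = bin_average(dense_times, dense_values, 5, 0.0, 60.0)
r = pearson(dense_binned, sparse_values)
print(dense_binned, r)
```

With only 5 paired points the correlation estimate is very noisy, so it is worth treating the result with caution. With timestamps in a pandas index, `DataFrame.resample` does the binning step; `numpy.interp` is an alternative when interpolating the sparse series up is preferred.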

rigid bolt May 5, 2021, 6:29 AM

#

A 28×28 numpy array can be interpreted as a matrix with order 28×28 right? For reference it was mentioned in the tensorflow docs

sinful gale May 5, 2021, 7:06 AM

#

Can anyone explain me how polynomial regression works with the use of linear regression?

#

What does that line do?

hexed heath May 5, 2021, 7:26 AM

#

Hi there! My question might be dumb but I am looking for the most efficient way to select N rows in a matrix such that their distances to one another are:

ideally, maximum
at least, larger than a threshold
I thought I could cluster my data on N cluster and pick in each one (because that is my underlying idea of selecting rows of different classes), but I wonder if there is really a need in clustering.
Thanks 🙂

mint palm May 5, 2021, 8:16 AM

#

so there is nothing to explore there?

#

if you can give an example of how visualised attention maps would show what a NN focuses on in binary classification of cats, that would be great

solar loom May 5, 2021, 8:38 AM

#

Can I get a little help on counting precision and recall of a search engine ?

ebon geyser May 5, 2021, 9:39 AM

#

Anyone who has heard about AIML files?

little compass May 5, 2021, 10:11 AM

#

Hey everybody!
I just uploaded a new video "Differentiable augmentation for GANs (using Kornia)"

https://youtu.be/J97EM3Clyys

GANs are known to be very data hungry. Are there ways how to make them more data efficient? As it turns out applying augmentations is not that straightforward. In this video, I explain a recent method called differentiable augmentation (DiffAugment) and use it to train the DCGAN.

YouTube

mildlyoverfitted

Differentiable augmentation for GANs (using Kornia)

In this video, I discuss the paper "Differentiable Augmentation for Data-Efficient GAN Training". Additionally, I take a few ideas from it and try to code up an experiment to investigate whether differentiable augmentation has any effect on GAN training. I use the open-source package Kornia to perform the augmentations. To make our lives simpler...


trail yoke May 5, 2021, 10:30 AM

#

line 1, in <module>
from PyQt5 import QtCore, QtGui, QtWidgets
ModuleNotFoundError: No module named 'PyQt5'

#

help

lapis sequoia May 5, 2021, 11:14 AM

#

what will be better for a data scientist future at microsoft?

python and SQL
python and TSQL
R and SQL
R and TSQL

red hound May 5, 2021, 11:29 AM

#

Does anyone have good resources to learn about Tensorflow Graph Debugging? I have a GAN whose graph isn't entirely or correctly connected. Maybe there are issues caused by discrete or non-differentiable parts, so that no gradients can be calculated. I have no idea how to look deeper inside and find out what's wrong. From the outside everything looks fine and the model compiles and runs the forward pass

mint palm May 5, 2021, 11:52 AM

#

trail yoke line 1, in <module> from PyQt5 import QtCore, QtGui, QtWidgets ModuleNotFoun...

are you running from terminal?

#

if so, make sure your terminal is on the same python version as the one you installed the modules on

#

you can switch version in terminal by using

#

py -3.x filename.py
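One way to check for the mismatch described above is to ask the interpreter that runs the script where it lives and whether it can see the module. This is a small diagnostic sketch; the helper name `is_importable` is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter actually running this script
print("interpreter:", sys.executable)
# Whether that interpreter can find PyQt5
print("PyQt5 importable:", is_importable("PyQt5"))
```

If the second line prints `False`, installing with that same interpreter, for example `python -m pip install PyQt5`, keeps the install and the run on one Python.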

ebon geyser May 5, 2021, 12:09 PM

#

https://stackoverflow.com/questions/67400936/using-aiml-files-predicate-and-sessions-with-discord-py

Any person willing to answer my question?

Stack Overflow

Using AIML files' Predicate and Sessions with discord.py

I was learning about AIML files with Python. I know I need to use aiml module of Python, but I want to use it with discord.py.
I want to make it so that, suppose I am talking with the bot, and I tell

spare vortex May 5, 2021, 12:19 PM

#

ebon geyser Anyone who has heard about AIML files?

yes

ebon geyser May 5, 2021, 12:19 PM

#

spare vortex yes

Uhh, I need some help

#

Can u plzz see the link above?

spare vortex May 5, 2021, 12:19 PM

#

I made a chatbot api with it

ebon geyser May 5, 2021, 12:19 PM

#

Ooh, cool

#

Can u plzzzz see the link above?

ebon geyser May 5, 2021, 12:19 PM

#

spare vortex I made a chatbot api with it

It's open source?

spare vortex May 5, 2021, 12:20 PM

#

I mean it's still in development

#

I saw it

#

and I read it

#

basically AIML files are like HTML

#

but made for chatbots

#

what you have to do is

ebon geyser May 5, 2021, 12:20 PM

#

Ok?

spare vortex May 5, 2021, 12:20 PM

#

you need to create a file called
std-startup.xml

ebon geyser May 5, 2021, 12:20 PM

#

I mean

#

I have all that

spare vortex May 5, 2021, 12:21 PM

#

ah you di

#

do

#

then I will give you the resource

#

wait

ebon geyser May 5, 2021, 12:21 PM

#

I just need some help with predicates and stuff

spare vortex May 5, 2021, 12:21 PM

#

about aiml and pandorabots

ebon geyser May 5, 2021, 12:21 PM

#

Uhhh, what's pandorabots?

spare vortex May 5, 2021, 12:22 PM

#

http://www.aiml.foundation/doc.html

AIML Foundation

spare vortex May 5, 2021, 12:22 PM

#

ebon geyser Uhhh, what's pandorabots?

pandorabots is chatbot application platform that uses AIML

#

it's very good platform to use chatbots from

ebon geyser May 5, 2021, 12:22 PM

#

Ohhh

spare vortex May 5, 2021, 12:22 PM

#

especially Kuki_ai

ebon geyser May 5, 2021, 12:23 PM

#

Wait

#

The AIML files I have

#

Those all have version 1.0.1

spare vortex May 5, 2021, 12:23 PM

#

https://home.pandorabots.com/home.html

Pandorabots: Home

The leading platform for building and deploying chatbots.

spare vortex May 5, 2021, 12:23 PM

#

ebon geyser Those all have version 1.0.1

for python three you need to install
python-aiml

#

not just aiml

ebon geyser May 5, 2021, 12:23 PM

#

ebon geyser Those all have version 1.0.1

~~got those from GitHub~~

spare vortex May 5, 2021, 12:23 PM

#

aiml is for python2

#

python-aiml is for python3

ebon geyser May 5, 2021, 12:24 PM

#

Yes I know that

#

And is there a getting started guide? Or I need to see that link u sent?

spare vortex May 5, 2021, 12:24 PM

#

aiml docs are the guiee

#

guide

#

and pandorabots is the example

ebon geyser May 5, 2021, 12:25 PM

#

Oh, ok

spare vortex May 5, 2021, 12:25 PM

#

of how would you make one

#

aiml is very easy and very good chatbot system

#

but it is rule based

ebon geyser May 5, 2021, 12:25 PM

#

And can I use old AIML files?

spare vortex May 5, 2021, 12:25 PM

#

so it has its disadvantages

spare vortex May 5, 2021, 12:25 PM

#

ebon geyser And can I use old AIML files?

Aiml files are similar to xml files

#

so that wont matter

#

the only thing will matter is your the library

ebon geyser May 5, 2021, 12:26 PM

#

https://github.com/sohelamin/chatbot

Like of these...

GitHub

sohelamin/chatbot

An AI Based Chatbot [DEPRECATED]. Contribute to sohelamin/chatbot development by creating an account on GitHub.

spare vortex May 5, 2021, 12:26 PM

#

you are using

#

not that

#

its deprecated

ebon geyser May 5, 2021, 12:26 PM

#

Well does it matter?

spare vortex May 5, 2021, 12:26 PM

#

use the official resources wait

ebon geyser May 5, 2021, 12:26 PM

#

Am just getting the AIML files

spare vortex May 5, 2021, 12:27 PM

#

wait a sec

ebon geyser May 5, 2021, 12:27 PM

#

Sure

spare vortex May 5, 2021, 12:28 PM

#

https://github.com/pandorabots/Free-AIML

GitHub

pandorabots/Free-AIML

A collection of free AIML files from Mitsuku Chatbot creator Steve Worswick - pandorabots/Free-AIML

#

this and

ebon geyser May 5, 2021, 12:28 PM

#

spare vortex https://home.pandorabots.com/home.html

I can't use this. It's using an API module and thinks am making a website, but am making a bot...

ebon geyser May 5, 2021, 12:29 PM

#

spare vortex this and

Uhhh, I have also heard about std-65.aiml thingy

spare vortex May 5, 2021, 12:29 PM

#

http://www.aiml.foundation/

AIML Foundation

ebon geyser May 5, 2021, 12:30 PM

#

Oh ok cool

spare vortex May 5, 2021, 12:30 PM

#

ebon geyser I can't use this. It's using an API module and thinks am making a website, but a...

it doesnt mattwe

#

matter

#

the aiml files are same

ebon geyser May 5, 2021, 12:30 PM

#

And also, if I want some aiml files, can I get those from any GitHub repo?

spare vortex May 5, 2021, 12:30 PM

#

i used these in my chatbot api

spare vortex May 5, 2021, 12:30 PM

#

ebon geyser And also, if I want some aiml files, can I get those from **any** GitHub repo?

yes

#

just download using git

ebon geyser May 5, 2021, 12:30 PM

#

And I also need to change the version?

spare vortex May 5, 2021, 12:30 PM

#

clone it

ebon geyser May 5, 2021, 12:30 PM

#

From 1.0 to 2.0

#

?

spare vortex May 5, 2021, 12:30 PM

#

ebon geyser And I also need to change the version?

nope not really

#

it will work

ebon geyser May 5, 2021, 12:30 PM

#

Oh ok

spare vortex May 5, 2021, 12:31 PM

#

aiml 2.0 will also work

ebon geyser May 5, 2021, 12:31 PM

#

spare vortex just download using git

Can't I download manually?

spare vortex May 5, 2021, 12:31 PM

#

you can ofc

ebon geyser May 5, 2021, 12:31 PM

#

spare vortex aiml 2.0 will also work

What if the version is 1.0.* Those would also work?

#

means anything

spare vortex May 5, 2021, 12:32 PM

#

ye it will work

ebon geyser May 5, 2021, 12:32 PM

#

Oh cool!

spare vortex May 5, 2021, 12:32 PM

#

just try it and see lol

ebon geyser May 5, 2021, 12:32 PM

#

So I can just copy paste the aiml tiles

#

Files*

spare vortex May 5, 2021, 12:32 PM

#

yea

ebon geyser May 5, 2021, 12:32 PM

#

BTW, if it's ok for u, may I ping/DM u, regarding them?

spare vortex May 5, 2021, 12:33 PM

#

sure

ebon geyser May 5, 2021, 12:33 PM

#

And also, have u used sessions and predicates, like I asked in that question above?

spare vortex May 5, 2021, 12:33 PM

#

https://github.com/hosford42/AIML_Sets/tree/master/aiml_sets

GitHub

hosford42/AIML_Sets

AIML sets (ALICE & Mitsuku). Contribute to hosford42/AIML_Sets development by creating an account on GitHub.

#

this one has everything

#

all the files you will need

ebon geyser May 5, 2021, 12:33 PM

#

Ooh

#

Cool!

spare vortex May 5, 2021, 12:33 PM

#

it has alice chatbot files
Mitsuku and
standard aiml files

ebon geyser May 5, 2021, 12:34 PM

#

Ooh

#

Those all are AIML files?!

raven knoll May 5, 2021, 12:34 PM

#

is it possible to use TextBlob in the dutch language?

spare vortex May 5, 2021, 12:34 PM

#

ebon geyser Those all are AIML files?!

yep

ebon geyser May 5, 2021, 12:34 PM

#

Damn, this person did do a looot of work

spare vortex May 5, 2021, 12:34 PM

#

yea

ebon geyser May 5, 2021, 12:35 PM

#

BTW, have u used predicates or something?

#

The get and set methods?

spare vortex May 5, 2021, 12:35 PM

#

they are not actually in aiml

#

you have tags

#

like learn
random

#

and say use

#

tags like that

ebon geyser May 5, 2021, 12:35 PM

#

Ooh ok

#

Thanks for help!

spare vortex May 5, 2021, 12:35 PM

#

look at the aiml website

ebon geyser May 5, 2021, 12:35 PM

#

Appreciate it dude

spare vortex May 5, 2021, 12:36 PM

#

np

spare vortex May 5, 2021, 12:37 PM

#

ebon geyser Damn, this person **did** do a looot of work

also

#

you are using it for your discord bot right?

kindred blade May 5, 2021, 12:46 PM

#

is it better to use matplotlib or Charts js to show on web

dark sigil May 5, 2021, 12:51 PM

#

For my data analysis unit I need to clean up a database to create charts from the data for the report i'm creating. I'm trying to figure out which columns won't be useful to drop them.
The scenario brief I'm set is this:
You are working in a small Data Analytics firm. A small Insurance Broker is looking to add another insurer on to their portfolio. The new insurer wants to see the claims performance of their current business (a “bordereau”).
Would I need to drop some columns?

lapis sequoia May 5, 2021, 12:53 PM

#

Hey anyone faced the similar issue where pandas converts ints,floats, and etc into objects?

```python
import numpy as np
import pandas as pd

example_array = np.array([
    [1, 2, 3],
    ['one', 'two', 'three'],
    [4.01, 5.01, 6.01],
    [np.nan, np.nan, np.nan]
])

df = pd.DataFrame(example_array, index=['int', 'string', 'float', 'nan'])

# df.select_dtypes(include = ['float'])
df.dtypes
```

output:

```
0    object
1    object
2    object
dtype: object
```

haughty tree May 5, 2021, 1:24 PM

#

can i have a good resource to learn ai/ml

#

kinda doesn't know the exact roadmap

grave breach May 5, 2021, 1:34 PM

#

@haughty tree Do you have a high school math background?

haughty tree May 5, 2021, 1:36 PM

#

grave breach @haughty tree Do you have a high school math background?

yes

grave breach May 5, 2021, 1:36 PM

#

Great

#

So if you want to start with neural networks I heavily suggest nnfs.io by sentdex (he's also making a youtube series of the book, but it might take a long time to complete)

tidal bough May 5, 2021, 2:10 PM

#

lapis sequoia Hey anyone faced the similar issue where pandas converts ints,floats, and etc in...

That seems somewhat understandable, because the original array example_array is already of dtype object (because numpy arrays are homogenous), so pandas merely doesn't recalculate the dtypes when creating a dataframe from a single numpy array.

#

Maybe you can avoid using an array here?

lapis sequoia May 5, 2021, 2:21 PM

#

yeah I did some research found it to be the way, just the issue I'm facing is if I construct multiple independent lists they will have the same problem while passing through DataFrame.

#

Instead of doing a whole pd.Series for each one I was wondering if there is a way to work around that issue

lapis sequoia May 5, 2021, 2:23 PM

#

tidal bough Maybe you can avoid using an array here?

.

#

```python
import numpy as np
import pandas as pd

example_array = [
    [1, 2, 3],
    ['one', 'two', 'three'],
    [4.01, 5.01, 6.01],
    [np.nan, np.nan, np.nan]]

dtypes = ['int', 'string', 'float', 'nan']

df = pd.DataFrame()

# each list becomes its own column, so pandas infers a dtype per column
for name, column in zip(dtypes, example_array):
    df[name] = column

df.dtypes
```

#

this seems to solve the issue

tidal bough May 5, 2021, 2:27 PM

#

> if I construct multiple independent lists they will have the same problem while passing through DataFrame.

Not sure what you mean by that. You get the same problem even when passing each column as a list or a numpy array with the right dtype?

lapis sequoia May 5, 2021, 2:28 PM

#

TIL: Panda associates dtypes per column basis

tidal bough May 5, 2021, 2:28 PM

#

Basically, I'm saying that if all that you have is a numpy array of dtype object, you need to somehow make pandas recalculate the dtypes of each column. Ideally, you just wouldn't create such an array.

tidal bough May 5, 2021, 2:28 PM

#

lapis sequoia TIL: Panda associates dtypes per column basis

Yup! In fact, secretly each column of a dataframe, a Series, is essentially a 1d numpy array with a dtype of its own.
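A short sketch of both points, assuming the same sample values as above: building the frame column by column lets pandas infer one dtype per column, and `DataFrame.infer_objects()` is one way to make pandas recalculate dtypes when all you have is an object array. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

columns = {
    "int": [1, 2, 3],
    "string": ["one", "two", "three"],
    "float": [4.01, 5.01, 6.01],
    "nan": [np.nan, np.nan, np.nan],
}

# One list per column: each column gets its own inferred dtype.
per_column = pd.DataFrame(columns)
print(per_column.dtypes)

# Starting from a single object array instead: every column is object
# until pandas is asked to re-infer the dtypes.
object_array = np.array(list(columns.values()), dtype=object).T
from_array = pd.DataFrame(object_array, columns=list(columns))
recalculated = from_array.infer_objects()
print(recalculated.dtypes)
```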

lapis sequoia May 5, 2021, 2:29 PM

#

yeah that wouldn't have been optimal, but the second version seems to work like a charm

balmy junco May 5, 2021, 3:12 PM

#

Hey guys, I am using resnet18 from pytorch and I'm having trouble optimizing for recall for my binary classifier. I think I need to make a change to my fc layer. Any thoughts on how to do this?

grave frost May 5, 2021, 3:46 PM

#

mint palm if you can give an example of how visualised attention maps would know what NN f...

99% of things you would propose would be done, unless you research them yourself 🤷

#

it's all just a google search away

mint palm May 5, 2021, 3:48 PM

#

thats true

hushed wasp May 5, 2021, 3:59 PM

#

```python
import time

import numpy as np

# Assumes `kmeans` (a fitted clustering model exposing predict and
# cluster_centers_) and `imagesarray` are defined earlier in the script.

# Creation of histograms (features)
temps1 = time.time()

def build_histogram(kmeans, des, image_num):
    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des = len(des)
    if nb_des == 0:
        print("histogram problem for image: ", image_num)
    for i in res:
        hist[i] += 1.0 / nb_des
    return hist


# Creation of a matrix of histograms
hist_vectors = []

for i, image_desc in enumerate(imagesarray):
    if i % 100 == 0:
        print(i)
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i)  # calculates the histogram
    hist_vectors.append(hist)  # histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1 = time.time() - temps1
print("histogram creation time: ", "%15.2f" % duration1, "seconds")
```


Hello guys, I don't know why sometimes this code works and why sometimes it's looping again and again.... can anyone help?

Thanks

stiff drift May 5, 2021, 4:06 PM

#

Hello, anyone knows about courses involving machine learning and trading?

grave frost May 5, 2021, 4:10 PM

#

do you want to use ML with trading/finance?

stiff drift May 5, 2021, 4:11 PM

#

yes

#

I would highly appreciate it if anyone has some data about it. There is a book by Stefan Jansen named Machine Learning for Algorithmic Trading but idk if it is useful,

#

And some online courses, but nowadays many people sell bullshit specially trading related

grave frost May 5, 2021, 4:21 PM

#

that's cuz it's not a great idea in general

#

you need advanced models to turn a good profit, since humans rely mostly on luck and crypto

#

which can't be taught by a course

mint palm May 5, 2021, 4:27 PM

#

stiff drift Hello, anyone knows about courses involving machine learning and trading?

i know a course that has machine learning and involves some projects that are for frauds in finances

#

but its in R lang

#

u interested

stiff drift May 5, 2021, 4:31 PM

#

yeah sure, maybe i can adapt it for python

stiff drift May 5, 2021, 4:32 PM

#

grave frost you need advanced models to turn a good profit, since humans rely mostly on luck...

Yeah i know that but trading requires lots of attention and maybe with a bot, less info may slip by

grave frost May 5, 2021, 4:33 PM

#

stiff drift Yeah i know that but trading requires lots of attention and maybe with a bot, le...

you sound like....do you know the basics of ML?

stiff drift May 5, 2021, 4:33 PM

#

The other day i was 8 hs in front of the pc and the minute i go to the supermarket things happened haha

stiff drift May 5, 2021, 4:33 PM

#

grave frost you sound like....do you know the basics of ML?

Yeah im learning, just started but i know the basics

grave frost May 5, 2021, 4:33 PM

#

because there was a guy here the other day who asked the same question

#

and you can guess, he is a ~~shitposter~~

#

anyways, that's not how trading works

#

trading won't require a lot of "attention" per se unless you are doing day trading- which you shouldn't at all since it's very risky

mint palm May 5, 2021, 4:36 PM

#

stiff drift yeah sure, maybe i can adapt it for python

its data science for r by harvardx on edx

grave frost May 5, 2021, 4:36 PM

#

but you should have a very specific usecase for such a "bot" for it to be actually helpful to you

#

in the end, it depends on what exactly you are trying to automate and to what extent

mint palm May 5, 2021, 4:37 PM

#

https://www.edx.org/professional-certificate/harvardx-data-science

edX

HarvardX Data Science Professional Certificate

Learn key data science essentials, including R and machine learning, through real-world case studies to jumpstart your career as a data scientist.

#

@stiff drift

stiff drift May 5, 2021, 4:38 PM

#

I do day trading and i rely solely on myself. But it is tiring and somehow i want like a machine backup just in case i miss something you know

#

I will try finding some courses on udemy or the book i mentioned before. Thxs everyone

desert oar May 5, 2021, 4:41 PM

#

@stiff drift maybe you can try to write up some heuristic rules for what counts as "something happening" (e.g. price movement above a certain threshold) and then encode those in a simple program, just following rules, no fancy AI stuff

#

after all, machine learning often amounts to trying to capture human intuition and reasoning in a machine

#

starting with simple rules is often the best way to go
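A minimal sketch of such a rule, assuming "something happening" means a price move between consecutive observations that exceeds a percentage threshold. The function name, the sample prices and the 2% threshold are made up for illustration.

```python
def price_alerts(prices, threshold_pct):
    """Return (index, pct_change) for each step whose move meets the threshold."""
    alerts = []
    for i in range(1, len(prices)):
        previous, current = prices[i - 1], prices[i]
        change_pct = (current - previous) / previous * 100.0
        if abs(change_pct) >= threshold_pct:
            alerts.append((i, change_pct))
    return alerts


prices = [100.0, 100.5, 103.0, 102.9, 98.0]
for index, change in price_alerts(prices, threshold_pct=2.0):
    print(f"observation {index}: moved {change:+.2f}%")
```

Rules like this are easy to check against what you would have flagged by hand, which gives a baseline before any learned model is considered.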

stiff drift May 5, 2021, 4:42 PM

#

Yeah, i ve done that. But maybe applying ML i build the most profitable bot ever hahah, just let me dream

#

thanks !

#

i will research if anyone is interested dm

desert oar May 5, 2021, 4:43 PM

#

start with trying to match what you, a human, currently do

#

then worry about making it better than what a human can do

#

the 1st one is already very hard, the 2nd is exponentially harder

stiff drift May 5, 2021, 4:44 PM

#

hahah yeah, will try my best

left jacinth May 5, 2021, 4:48 PM

#

i an amateur. can anybody help me?

stiff drift May 5, 2021, 4:51 PM

#

On what?

tidal bronze May 5, 2021, 5:06 PM

#

what is the performance impact of copying a value to a variable
example:
```python
for x in list1:
    y = self.data["Pallets"].get((t, i, f), 0) + 1
```
is what I am currently doing
but what if I did this instead:
```python
for x in list1:
    d = self.data["Pallets"].get((t, i, f), 0)
    y = d + 1
```
I think it makes the code more readable but my code should also be performant, is the impact of assigning the value to d negligible?

odd yoke May 5, 2021, 5:11 PM

#

It is negligible but it's not exactly the right channel
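If you want to see this on your own machine, `timeit` can compare the two versions. This is a rough sketch: a plain dict stands in for `self.data["Pallets"]`, which is an assumption, and the absolute numbers will vary between runs and machines.

```python
import timeit

# Stand-in for self.data["Pallets"]; the real object in the question is a pandas Series.
pallets = {("t", "i", "f"): 3}
key = ("t", "i", "f")


def inline(n):
    """Lookup and addition in one expression."""
    y = 0
    for _ in range(n):
        y = pallets.get(key, 0) + 1
    return y


def with_temp(n):
    """Same work, with the lookup stored in a temporary variable first."""
    y = 0
    for _ in range(n):
        d = pallets.get(key, 0)
        y = d + 1
    return y


print("inline:   ", timeit.timeit(lambda: inline(1000), number=200))
print("with temp:", timeit.timeit(lambda: with_temp(1000), number=200))
```

The extra assignment is a local variable store, so the lookup itself is what dominates the loop body.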

late shell May 5, 2021, 5:15 PM

#

Hello, A beginner question on ML, how do I know which model needs feature scaling & which doesn't?

mint palm May 5, 2021, 5:27 PM

#

late shell Hello, A beginner question on ML, how do I know which model needs feature scalin...

normalisation you mean?

late shell May 5, 2021, 5:38 PM

#

yeah, although normalization is just one feature scaling technique, right? there a few more

grave frost May 5, 2021, 5:38 PM

#

late shell Hello, A beginner question on ML, how do I know which model needs feature scalin...

you can research if that particular variant handles scaling (or doesn't need it) like NN's always need normalization

#

like with N.B, you don't normalize the probs

late shell May 5, 2021, 5:41 PM

#

grave frost you can research if that particular variant handles scaling (or doesn't need it)...

Is there a way of knowing which model requires it and which don't? like a thought process through which I can figure out for myself, or some kind of general rule of thumb?

mint palm May 5, 2021, 5:43 PM

#

just test on test set

#

if its too slow or too biased then it will require normalisation

grave frost May 5, 2021, 5:43 PM

#

late shell Is there a way of knowing which model requires it and which don't? like a though...

you know it on your own most times (like when you know your algo in-depth) but for some, the reasons are based mostly on practical observations, later supplemented by theory

mint palm May 5, 2021, 5:44 PM

#

and testing will tell you and normalisation is overall very much used......saves time too

grave frost May 5, 2021, 5:45 PM

#

what has testing got to do with normalization?

tidal bronze May 5, 2021, 5:45 PM

#

anybody know how to use the https://pandas.pydata.org/docs/reference/api/pandas.Series.get.html pandas method?

it always returns my default value, I am using a multi-index with a tuple like this:

self.data["Pallets"].get((t, i, f), 0)

late shell May 5, 2021, 5:46 PM

#

alright, thanks a ton @grave frost and @mint palm.

mint palm May 5, 2021, 5:47 PM

#

isnt it that some normalisation optimize how much portion of activation function we use

#

that do affect learning process

grave frost May 5, 2021, 5:49 PM

#

mint palm isnt it that some normalisation optimize how much portion of activation function...

no, from my limited knowledge, like in NN's large values lead to larger gradients. no need to boost a grad's worth if it's not actually contributing. it would just cause unnecessary bias

mint palm May 5, 2021, 5:50 PM

#

i too have just started it i dont know for sure.....so wont debate much😆

#

but for time i can say for sure it would fasten learning

grave frost May 5, 2021, 5:52 PM

#

mint palm but for time i can say for sure it would fasten learning

it might a little bit, but that is very insignificant. the primary purpose is to not create undue biases

mint palm May 5, 2021, 5:55 PM

#

https://towardsdatascience.com/the-vanishing-exploding-gradient-problem-in-deep-neural-networks-191358470c11#:~:text=In%20the%20case%20of%20exploding,causes%20overflow%20resulting%20in%20NaN

Medium

The Vanishing/Exploding Gradient Problem in Deep Neural Networks

Understanding the obstacles that faces us when building deep neural networks

#

see this, it causes insufficient learning due to improperly scaled params

grave frost May 5, 2021, 5:57 PM

#

mint palm see this it cause insufficient learning due to inproper scaled param

that's not insufficient learning, it's when due to lack of normalization, your gradient keeps on increasing blowing up to infinity and giving nans. that's why we normalize
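A toy illustration of that blow-up, not a neural network: plain gradient descent on a one-parameter least-squares fit, run once on raw feature values in the thousands and once on the same feature divided by its maximum. The data, learning rate and step counts are made up for illustration.

```python
import math


def fit_slope(xs, ys, lr, steps):
    """Gradient descent on mean squared error for the model y = w * x."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = 2.0 / n * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= lr * grad
    return w


raw_x = [1000.0, 2000.0, 3000.0]
ys = [2.0 * x for x in raw_x]  # the true slope is 2

# Unscaled feature: each step overshoots further until w overflows to inf, then nan.
w_raw = fit_slope(raw_x, ys, lr=0.1, steps=100)

# Same data with the feature scaled into [0, 1]: the updates shrink and converge.
scale = max(raw_x)
scaled_x = [x / scale for x in raw_x]
w_scaled = fit_slope(scaled_x, ys, lr=0.1, steps=500)

print("raw feature:   ", w_raw)
print("scaled feature:", w_scaled / scale)  # slope converted back to original units
print("raw result is finite:", math.isfinite(w_raw))
```

A much smaller learning rate would also keep the raw version stable, so the sketch shows an interaction between feature scale and step size rather than a rule that unscaled inputs always diverge.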

mint palm May 5, 2021, 5:57 PM

#

ya, that causes some of the details in the input to go unnoticed

#

that leads to more error when testing model

grave frost May 5, 2021, 5:58 PM

#

mint palm ya that cause some of details in input do unnoticed

???

mint palm May 5, 2021, 5:58 PM

#

mint palm ya that cause some of details in input do unnoticed

when some param vanish

grave frost May 5, 2021, 5:59 PM

#

parameters don't vanish

#

and model doesn't "notice" anything, it's just for analogy

mint palm May 5, 2021, 6:00 PM

#

grave frost parameters don't vanish

😒 i know its for analogy......they just become soo tiny that they are too small to be affected by further function application

grave frost May 5, 2021, 6:00 PM

#

mint palm 😒 i know its for analogy......then just become soo tiny they are too small to ...

I think you are using the terminology wrongly

mint palm May 5, 2021, 6:01 PM

#

ya maybe

grave frost May 5, 2021, 6:01 PM

#

The parameters of a neural network are typically the weights of the connections. In this case, these parameters are learned during the training stage. So, the algorithm itself (and the input data) tunes these parameters. The hyper parameters are typically the learning rate, the batch size or the number of epochs.
simple definition

#

bruh, stop spamming the same message

mint palm May 5, 2021, 6:01 PM

#

lol

grave frost May 5, 2021, 6:01 PM

#

#❓｜how-to-get-help

tidal bronze May 5, 2021, 6:02 PM

#

bro once in 15min is hardly spamming

mint palm May 5, 2021, 6:02 PM

#

ya take help

tidal bronze May 5, 2021, 6:02 PM

#

I already have a channel

grave frost May 5, 2021, 6:02 PM

#

tidal bronze I already have a channel

then it's cross-posting, which is even worse

#

🤷

tidal bronze May 5, 2021, 6:03 PM

#

bro quit spamming me

mint palm May 5, 2021, 6:03 PM

#

😆

#

chill out guys

tidal bronze May 5, 2021, 6:03 PM

#

now my message has lost visibility, thanks a lot...

grave frost May 5, 2021, 6:04 PM

#

aight then, if you get muted due to cross-posting, don't blame me

mint palm May 5, 2021, 6:04 PM

#

🤣

tidal bronze May 5, 2021, 6:04 PM

#

alright if you get muted for going off-topic, don't blame me

mint palm May 5, 2021, 6:04 PM

#

just delete previous ones and post one last time.....we are stopping the chat

tidal bronze May 5, 2021, 6:05 PM

#

anybody know how to use this pandas method? https://pandas.pydata.org/docs/reference/api/pandas.Series.get.html

it always returns my default value, I am using a multi-index with a tuple like this:

self.data["Pallets"].get((t, i, f), 0)

and when I print(self.data["Pallets"].keys()) I get the expected output

grave frost May 5, 2021, 6:41 PM

#

QQ: If both the training set and eval set have the same ratio for the unbalanced classes, should I deal with the imbalance? (~~I am too lazy~~)
Funny, never even thought about this 🙂 Any ideas?

#

BTW My aim is just to get a good acc on the test set. Generalization be damned

novel oyster May 5, 2021, 6:51 PM

#

feel free to @ me or dm if you see this and can help

desert oar May 5, 2021, 7:24 PM

#

late shell Hello, A beginner question on ML, how do I know which model needs feature scalin...

i recommend standardization (centering + scaling to unit variance) for all your unbounded numerical features in pretty much any model

#

for bounded features, normalize to [0,1]
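
A minimal sketch of the two rescalings mentioned above, using plain NumPy. The feature values and the helper names `standardize` and `min_max` are made up for illustration; scikit-learn's `StandardScaler` and `MinMaxScaler` do the same job, and in practice the statistics would usually be computed on the training split only.

```python
import numpy as np

def standardize(x):
    """Center to mean 0 and scale to unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def min_max(x, lo, hi):
    """Map a feature with known bounds [lo, hi] onto [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - lo) / (hi - lo)

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])  # unbounded feature (made-up values)
percent = np.array([0.0, 25.0, 100.0])                # bounded in [0, 100]

z = standardize(heights_cm)       # mean 0, standard deviation 1
u = min_max(percent, 0.0, 100.0)  # values in [0, 1]
```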

wicked mantle May 5, 2021, 8:33 PM

#

How can i resize image to bounding box? in pytorch
i mean dynamically set (xmin, ymin, xmax, ymax) values to all images. I think transforms.Resize() can help me, but Resize() only takes two arguments and its not accurate to bounding box

#

seems like there are no way to crop it with pytorch, i'll use Pillow lib
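
A small sketch of cropping to a bounding box, assuming the image is already an H x W x C array and the box is given in pixel coordinates as (xmin, ymin, xmax, ymax). The helper name `crop_to_box` is hypothetical. With Pillow the equivalent call is `Image.crop((left, upper, right, lower))`, and torchvision also has `torchvision.transforms.functional.crop(img, top, left, height, width)`, so staying inside the PyTorch ecosystem should be possible too.

```python
import numpy as np

def crop_to_box(image, box):
    """Crop an H x W x C array to (xmin, ymin, xmax, ymax) in pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    # Rows are the y axis, columns are the x axis.
    return image[ymin:ymax, xmin:xmax]

image = np.zeros((100, 200, 3), dtype=np.uint8)  # dummy 100 x 200 RGB image
patch = crop_to_box(image, (20, 10, 120, 60))
print(patch.shape)  # (50, 100, 3)
```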

grave frost May 5, 2021, 9:50 PM

#

So I am doing fine-tuning with images, and I had a quick question.

#

The keras docs suggest that I should freeze my base model (not importing its top part), and train the classifier placed at the end of the model. Then they would fine-tune on the whole model with SGD and a slow LR.

But intuitively, I was thinking that I would freeze the classifier 'layers' at the end of the model, allow the base model to train and learn the features from my specific dataset; then I would freeze the base model (which would perform feature extraction) and fine-tune with the same recipe on my custom/target dataset.

Why don't we do the second method, as opposed to the first?

wicked mantle May 5, 2021, 10:08 PM

#

How to add machine learning model to discord bot?
For example, i have a model to predict cats, and i want to implement this predict model to bot

desert oar May 5, 2021, 10:32 PM

#

grave frost The keras docs suggest that I should freeze my base model (not importing it's to...

because if you have frozen model weights, those model weights will still propagate information backwards through the network; in this case, erroneous information

#

if you want to train them separately, maybe use an autoencoder or something first and then put logistic regression or something on top of the low rank representation

#

but thats no better (and probably worse) than just doing it the way keras recommends

grave frost May 5, 2021, 10:34 PM

#

desert oar because if you have frozen model weights, those model weights will still propaga...

wait, so if I set trainable=False the model weights still get updated?

desert oar May 5, 2021, 10:35 PM

#

no, but the frozen model weights still affect the gradient, which affects the weight updates for the trainable weights
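
A toy illustration of that point, assuming a one-weight "base" followed by a one-weight frozen "head" and a squared-error loss (all numbers are made up). The head is never updated, yet its value decides what gradient the base receives.

```python
def grad_wrt_base(w_base, w_head, x, target):
    """Gradient of 0.5 * (w_head * w_base * x - target)**2 with respect to w_base."""
    y = w_head * (w_base * x)
    # Chain rule: dL/dy = (y - target), dy/dw_base = w_head * x
    return (y - target) * w_head * x

x, target, w_base = 2.0, 1.0, 0.5

# Same base weight and same data, two different frozen head weights.
g_small_head = grad_wrt_base(w_base, w_head=1.0, x=x, target=target)
g_large_head = grad_wrt_base(w_base, w_head=3.0, x=x, target=target)

print(g_small_head, g_large_head)  # 0.0 12.0
```

So a frozen head that was never fitted to the data still steers every update of the layers below it.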

grave frost May 5, 2021, 10:36 PM

#

hmm...so if I want to use the base model for feature extraction, what do I do?
should I train it separately, then load that checkpoint for the base model?

#

I had the base model for feature extraction, with a small CNN as a classifier. now, is there somehow a way to remove that CNN classifier at the bottom, only train the base model and store it?

#

because if I include the CNN classifier, I won't be able to load it since keras doesn't recognize why there are weights for layers not in the base model - so it errors out

desert oar May 5, 2021, 10:39 PM

#

can you do "everything but the last layer"?

#

like, include the convolutional layers from your model, but exclude the fully connected stuff at the end

grave frost May 5, 2021, 10:40 PM

#

you mean train the base model only, with no other layers? on my source dataset

desert oar May 5, 2021, 10:41 PM

#

what is the base model

#

and can you link to the keras doc recommendation? im curious what their wording is

grave frost May 5, 2021, 10:41 PM

#

efficientnetb0 for now

#

https://www.tensorflow.org/api_docs/python/tf/keras/applications/EfficientNetB0

TensorFlow

tf.keras.applications.EfficientNetB0 | TensorFlow Core v2.4.1

Instantiates the EfficientNetB0 architecture.

desert oar May 5, 2021, 10:42 PM

#

are you using the imagenet version or training from scratch

grave frost May 5, 2021, 10:42 PM

#

Im not including the FC classifier at top, so that flag is False.

grave frost May 5, 2021, 10:42 PM

#

desert oar are you using the imagenet version or training from scratch

yeah, imagenet

#

gives a better initialization 🤷

#

so imagenet weights as a starting point, then somehow modify the base model to learn features from my own source dataset. freeze it up, add FC layers and train those FC Layers on my target dataset

desert oar May 5, 2021, 10:44 PM

#

yeah. so train efficientb0 + fully-connected on your data, then: 1) freeze the efficientb0 and re-train the fully-connected layer at the end for better accuracy, and 2) go use the refined efficientb0 network for feature extraction elsewhere

grave frost May 5, 2021, 10:45 PM

#

step 0 and 1 are on the same "big" dataset, right? and step 2 I can use it on my target dataset?

desert oar May 5, 2021, 10:46 PM

#

that seems right, but what are you doing with the target dataset?

grave frost May 5, 2021, 10:46 PM

#

desert oar that seems right, but what are you doing with the target dataset?

classification - its composition should be slightly different than my main dataset, and it's small too

desert oar May 5, 2021, 10:48 PM

#

in that case, can you zero out and/or fine-tune the fully connected weights?

#

i think that should be roughly equivalent to taking the features from the efficientb0 part and stacking a separate model on top

grave frost May 5, 2021, 10:49 PM

#

desert oar i _think_ that should be roughly equivalent to taking the features from the effi...

yea

can you zero out and/or fine-tune the fully connected weights?
the only thing I can think of, is to freeze those layers.

desert oar May 5, 2021, 10:49 PM

#

transfer learning, thats what its called

#

https://keras.io/guides/transfer_learning/

Keras documentation: Transfer learning & fine-tuning

grave frost May 5, 2021, 10:50 PM

#

yeah, that's the guide I was referring to

desert oar May 5, 2021, 10:50 PM

#

and yes either method is valid

grave frost May 5, 2021, 10:53 PM

#

so, to summarize. I would have efficientnetb0 + F.C connected initially. I freeze the F.C Layers, and train on the base model alone......?
I think what I want is to somehow train the whole effnet+F.C on my source dataset, but export weights only for the effnet. then load it somewhere else, and train on target dataset

grave frost May 5, 2021, 11:09 PM

#

So after reading up, it does seem keras provides two handy functions get_weights and set_weights to get weights of individual layer, and save them as numpy arrays to be loaded later (from an instantiated layer). Hopefully, with a bit of luck I might be able to try it.

desert oar May 5, 2021, 11:10 PM

#

grave frost so, to summarize. I would have efficientnetb0 + F.C connected initially. I freez...

Yes, you want this:

I think what I want is to somehow train the whole effnet+F.C on my source dataset, but export weights only for the effnet. then load it somewhere else, and train on target dataset

#

Wait

#

No

#

You want the opposite

grave frost May 5, 2021, 11:11 PM

#

I don't get how I want the opposite?

#

I train my FC, but not my effnet?

desert oar May 5, 2021, 11:13 PM

#

Yes

#

All the deep stuff requires a lot of data

#

So you train that part on the big data set

#

Are you saying that you think the extracted features should be different between your big data set and the target data set?

#

Which is why you would want to retrain the deep layers on the target?

grave frost May 5, 2021, 11:14 PM

#

desert oar Are you saying that you think the extracted features should be different between...

I am saying that the CNN layers would be able to extract the features from the smaller dataset better, if they knew what to look for from the big dataset

desert oar May 5, 2021, 11:14 PM

#

It just doesn't make sense to change what the deep stuff emits but freeze the classifier on top

#

Imagine you don't retrain the model but randomly rescale the final hidden layer outputs

#

Then your classifier weights will all be meaningless and your classifier will produce garbage

grave frost May 5, 2021, 11:17 PM

#

desert oar Then your classifier weights will all be meaningless and your classifier will pr...

hmmm...so you are saying, that I keep the effnet frozen and train my F.C on it?

#

Effnet + F.C gives me a decent accuracy. --> if I keep the Effnet there with the same weights, then it would extract the same features, right?
Then I just need to re-train the F.C on the new dataset (to learn to make sense of features from new dataset and use it to predict slightly different classes [which would be accommodated by the activation function]) shouldn't that theoretically work?

desert oar May 5, 2021, 11:21 PM

#

grave frost hmmm...so you are saying, that I keep the effnet freezed up and train my F.C on ...

yes

desert oar May 5, 2021, 11:21 PM

#

grave frost Effnet + F.C gives me a decent accuracy. --> if I keep the Effnet there with the...

yes

void egret May 5, 2021, 11:22 PM

#

Hello. I have a question. I want to make a bar plot from the values of a data frame, as you can see in the screenshot. The values I want to show on the plot are at the bottom of the picture: I want the percentage for every education level. Is there a way to do this with a loop, without manually typing out the differences? If this is the wrong chat for such a question, I'm sorry in advance.

desert oar May 5, 2021, 11:22 PM

#

that's literally the description of transfer learning @grave frost

grave frost May 5, 2021, 11:23 PM

#

aight. Then there is another slightly different method I can do. train the whole effnet + F.C shebang. then just use SGD with lower Lr on the new dataset?

desert oar May 5, 2021, 11:23 PM

#

you're fine-tuning the effnet model on your big data set, then using that fine-tuned model for transfer learning on the target data set

#

im not sure how thats different

grave frost May 5, 2021, 11:23 PM

#

desert oar you're fine-tuning the effnet model on your big data set, then using that fine-t...

but that's why I said earlier 😖

desert oar May 5, 2021, 11:24 PM

#

no. you were saying the opposite... or so i thought

#

where you freeze the fc layer at the end and update only the effnet parts, which makes no sense

grave frost May 5, 2021, 11:24 PM

#

desert oar im not sure how thats different

doing LR slowly on the whole model, taking extra assumption that the weights used in source dataset won't differ too much in the target one - whatever will, would get slowly updated

grave frost May 5, 2021, 11:25 PM

#

desert oar where you freeze the fc layer at the end and update only the effnet parts, which...

that's what they do in the guide link above tho

#

base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,
)  # Do not include the ImageNet classifier at the top.

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

model.summary()

desert oar May 5, 2021, 11:26 PM

#

yeah i just saw that

#

huh... so they are using a very low learning rate to update (not re-train) the base model, freezing the fully connected output model?

grave frost May 5, 2021, 11:26 PM

#

then how is that supposed to work? if they freeze their feature extractor...then wouldn't it all just break down due to lack of features

desert oar May 5, 2021, 11:26 PM

#

i dont think thats whats happening in this code

#

i think its the opposite

#

it looks like they are freezing the base model, and only training the output layer (as well as their other stuff on top)

grave frost May 5, 2021, 11:27 PM

#

desert oar huh... so they are using a very low learning rate to update (not re-train) the b...

oh, so you are saying they are taking imagenet as a good point, and slowly updating to fit their dataset

desert oar May 5, 2021, 11:28 PM

#

hold on

#

back up

#

there are 2 things happening here:

1. transfer learning: freeze the base model, train a new model on top
2. fine-tuning: after step (1), un-freezing the entire model and running a few more epochs with a very low learning rate

#

at no point are they freezing the new layers and training only the base layers

grave frost May 5, 2021, 11:30 PM

#

desert oar at no point are they freezing the new layers and training only the base layers

yeah, that's cause the imagenet initialization is in the domain of their problem. I have to re-train the feature extractor to fit my own problem, which is nothing like imagenet

desert oar May 5, 2021, 11:31 PM

#

but you have 2 datasets right? a big one and the target one?

#

and those are at least similar in domain?

grave frost May 5, 2021, 11:31 PM

#

however, it seems imagenet is lucky for this dataset. so I just start it as a pseudo random initialization to learn features from the biggie dataset

desert oar May 5, 2021, 11:31 PM

#

void egret Hello. I have a question. I want to make bar plot from values of data frame, as...

df3.groupby(level='parental_level_of_education')['lunch'] \
    .apply(lambda y: y / y.sum())

maybe something like this?
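
A self-contained variant of the same idea, as a sketch only: the screenshot is not available, so the index level names, the `lunch` values and the counts below are all made up. It divides each count by the total of its education level using `groupby(...).transform("sum")`, which keeps the original index intact.

```python
import pandas as pd

# Hypothetical counts indexed by (education level, lunch type).
idx = pd.MultiIndex.from_tuples(
    [
        ("bachelor", "free"),
        ("bachelor", "standard"),
        ("master", "free"),
        ("master", "standard"),
    ],
    names=["parental_level_of_education", "lunch_type"],
)
counts = pd.Series([10, 30, 5, 15], index=idx, name="lunch")

# Total per education level, broadcast back to every row of that level.
totals = counts.groupby(level="parental_level_of_education").transform("sum")
share = counts / totals
print(share)
```

Multiplying `share` by 100 gives percentages, and `share.unstack().plot.bar()` is one way to get a grouped bar plot from it.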

grave frost May 5, 2021, 11:31 PM

#

desert oar and those are at least similar in domain?

y

desert oar May 5, 2021, 11:33 PM

#

grave frost however, it seems imagenet is lucky for this dataset. so I just start it as a ps...

1. (primary training) train base+new on data A, as if from scratch, until convergence
2. (transfer learning) freeze base, re-train new on data B until convergence
3. (fine-tuning) unfreeze base, update base+new on data B with very low learning rate

grave frost May 5, 2021, 11:33 PM

#

I wanna do 2 & 3

#

just with training base again

#

to recognize features from my domain, not imagenet

#

think of it as initializing base with no weights

desert oar May 5, 2021, 11:34 PM

#

but you are only using imagenet as initialization

grave frost May 5, 2021, 11:34 PM

#

desert oar but you are only using imagenet as initialization

yes, as a pseudo-random init

desert oar May 5, 2021, 11:35 PM

#

so im not sure what your hangup is

grave frost May 5, 2021, 11:35 PM

#

just coz it lets me converge to my needed features better and faster

desert oar May 5, 2021, 11:35 PM

#

but why arent you doing 1?

grave frost May 5, 2021, 11:35 PM

#

desert oar but why arent you doing 1?

very low accuracy

desert oar May 5, 2021, 11:36 PM

#

then maybe imagenet isnt good initialization after all?

#

why do you expect better accuracy if you dont train on the big dataset?

grave frost May 5, 2021, 11:36 PM

#

desert oar then maybe imagenet isnt good initialization after all?

none initialization gives 6% less

#

don't ask me why 🤷

lilac raven May 5, 2021, 11:37 PM

#

If I have a large number of files with the same number of values that I want to average, like I want to average a large number of curves, and some of those files have NaN as their value at some data points, can I still use (X+Y+Z)/N

grave frost May 5, 2021, 11:37 PM

#

desert oar why do you expect better accuracy if you _dont_ train on the big dataset?

I do train on the big dataset, but just to get my feature extractor to start extracting features relevant to my problem.

#

then using the trained F.T, I extract features from small one, and train another F.C from scratch to classify my small dataset

desert oar May 5, 2021, 11:38 PM

#

i still dont see how this is different from the 1-3 steps i outlined

#

all i am saying is, dont freeze the fc network at the top and unfreeze the cnn at the base, and expect useful results

lilac raven May 5, 2021, 11:39 PM

#

[#1 + #2 + #3 + #4...etc. / n (number of files)]. say those # files have arrays [#,#,#,#,#,#..etc] and some of them have [#,#,#,#,NaN,#,#NaN]. Can i still obtain an average curve out of those

#

or do I have to not use the NaN files

grave frost May 5, 2021, 11:39 PM

#

desert oar i still dont see how this is different from the 1-3 steps i outlined

alright, it's kinda similar now that you point it out

desert oar May 5, 2021, 11:40 PM

#

lilac raven ```[#1 + #2 + #3 + #4...etc. / n (number of files)]```. say those # files have a...

you have to remove the missing values, yes

#

there is something called "imputation" for missing data in more advanced applications, but that will not help you here

lilac raven May 5, 2021, 11:40 PM

#

ah damn

desert oar May 5, 2021, 11:41 PM

#

you can't produce new information where no information exists

grave frost May 5, 2021, 11:41 PM

#

Thanx a ton for the guidance @desert oar 👍 🚀

desert oar May 5, 2021, 11:41 PM

#

grave frost Thanx a ton for the guidance @desert oar 👍 🚀

you're welcome, good luck

lilac raven May 5, 2021, 11:41 PM

#

i was hoping there was like normalizing function to zero out that certain position in an array and somehow still do the whole array

#

but that makes sense

desert oar May 5, 2021, 11:42 PM

#

well you can tell numpy to omit the missing values for you

#

but thats just a convenience, its still removing them
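
A short sketch of what "tell numpy to omit the missing values" can look like, assuming the curves are stacked as rows of one 2D array (the numbers are made up). `np.mean` propagates NaN, while `np.nanmean` averages only the values that are present at each position.

```python
import numpy as np

# Three curves of equal length; NaN marks a missing data point.
curves = np.array([
    [1.0, 2.0, 3.0],
    [3.0, np.nan, 5.0],
    [5.0, 6.0, np.nan],
])

plain = np.mean(curves, axis=0)        # NaN wherever any curve is missing a value
skipping = np.nanmean(curves, axis=0)  # ignores NaN position by position

print(plain)
print(skipping)
```

Note that the second and third positions are then averages over two curves instead of three, which is exactly the "it's still removing them" caveat.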

lilac raven May 5, 2021, 11:42 PM

#

I was thinking in a way like the scatter plot, you can scatter plot an array with NaN

#

values

#

but visually seeing something and averaging are different

desert oar May 5, 2021, 11:44 PM

#

and what happens when you plot the missing values?

#

you just dont plot them

lilac raven May 5, 2021, 11:44 PM

#

yeah

desert oar May 5, 2021, 11:44 PM

#

same thing here

void egret May 5, 2021, 11:44 PM

#

desert oar ```python df3.groupby(level='parental_level_of_education')['lunch'] \ .apply...

Exactly what I wanted, thank You very much and sorry for the trouble.

desert oar May 5, 2021, 11:45 PM

#

void egret Exactly what I wanted, thank You very much and sorry for the trouble.

👍

lilac raven May 6, 2021, 12:37 AM

#

    for file in files:
       # for x in set(ids):
           # if file.startswith(str(x)):
                if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
                    full_name = pathlib.Path(root) / file
                    try:
                        read_fname = full_name
                        data = np.loadtxt(read_fname)
                        avg = sum(data)/float(len(data))

                        np.savetxt("Average-MID-hrv.txt",np.array(data))
                    except Exception as e:
                        print(e)

it doesn't look like the Average file it writes out is an average; rather, it has just the same values as the second MID-R1 file. I only have two files named that in the folder to see if the averaging works for now

#

in my avg=sum(data)/float(len(data)) line, do I need put something else other than data so it grabs all of the files that match data requirements (currently only 2 just to test)

serene scaffold May 6, 2021, 12:57 AM

#

lilac raven in my ```avg=sum(data)/float(len(data))``` line, do I need put something else ot...

what is data? If it's an array, why are you writing your own formula to get the average instead of using the array's methods?

#

actually it's np.mean rather than an array method.

lilac raven May 6, 2021, 12:58 AM

#

Data is the output of np.loadtxt, which reads from read_fname

serene scaffold May 6, 2021, 12:58 AM

#

lilac raven Data is the input from np.load txt which is input from read_fname

so what type is data?

lilac raven May 6, 2021, 12:58 AM

#

And I made the for loop to be looking a number of them.

#

An array

#

1 d array with like 10 ish values

serene scaffold May 6, 2021, 12:58 AM

#

so you'll want to use avg = np.mean(data).

lilac raven May 6, 2021, 12:59 AM

#

So that will automatically take the multiple data files that I'm reading?

serene scaffold May 6, 2021, 1:00 AM

#

you want to take multiple files, and do what?

#

concatenate the arrays from each one?

lilac raven May 6, 2021, 1:02 AM

#

Take the average of all those areays

#

Arrays*

serene scaffold May 6, 2021, 1:03 AM

#

what does it mean to take the average of all those arrays? do you want the output of that operation to be an array, or a single number?

lilac raven May 6, 2021, 1:04 AM

#

To create an average file of one array with the 10 values, so each value is average of all files

eternal briar May 6, 2021, 1:04 AM

#

Hi, In the help forum does anyone know Numpy / pandas

serene scaffold May 6, 2021, 1:04 AM

#

eternal briar Hi, In the help forum does anyone know Numpy / pandas

This is the right channel. Go ahead and ask your question.

lilac raven May 6, 2021, 1:04 AM

#

[Avg, avg, avg, avg,] not [avg]

#

An average array, not one value

serene scaffold May 6, 2021, 1:06 AM

#

lilac raven [Avg, avg, avg, avg,] not [avg]

>>> a = np.array([1, 2, 3])
>>> b = np.array([4, 5, 6])
>>> np.mean([a, b], axis=0)
array([2.5, 3.5, 4.5])

#

Or this--the effect is the same

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.mean(a, axis=0)
array([2.5, 3.5, 4.5])

#

axis=0 is the key.

lilac raven May 6, 2021, 1:08 AM

#

Yeah, so np.mean(data) will know to take all of the files that match My requirements?

serene scaffold May 6, 2021, 1:08 AM

#

lilac raven Yeah, so np.mean(data) will know to take all of the files that match My requirem...

no, np.mean assumes that you pass what you want to it.

lilac raven May 6, 2021, 1:09 AM

#

I'd have to use append then somehow

#

?

#

Like data =data.append under the original data

serene scaffold May 6, 2021, 1:09 AM

#

you'd have to make a 2d array and then use np.mean to take the average of each row

#

so you'd actually want to use axis=1

lilac raven May 6, 2021, 1:11 AM

#

Wouldnt utilizing append on data make it appends each new input in data

#

Since I'm looking at multiple files with the for loop

#

So would data becomes a 2d array after I append it

serene scaffold May 6, 2021, 1:12 AM

#

lilac raven Wouldnt utilizing append on data make it appends each new input in data

keep in mind that append operations on arrays creates a new array. You can append to lists if you want continuity.

#

It looks like Numpy might even handle it the same way

lilac raven May 6, 2021, 1:13 AM

#

[#,#,#,#,#,#] would be an array not a list though right

serene scaffold May 6, 2021, 1:14 AM

#

lilac raven [#,#,#,#,#,#] would be an array not a list though right

depends on the context. Lists are a data structure that come with Python. Arrays come from numpy.

mossy stratus May 6, 2021, 1:14 AM

#

anyone know how to graph an equation (3d) with matplotlib?

serene scaffold May 6, 2021, 1:14 AM

#

!e

import numpy as np
a = [1, 2, 3]
b = [4, 5, 6]
print(np.mean([a, b], axis=1))

arctic wedgeBOT May 6, 2021, 1:14 AM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[2. 5.]

serene scaffold May 6, 2021, 1:15 AM

#

Looks like numpy can give you the expected behavior using lists.

lilac raven May 6, 2021, 1:17 AM

#

So doing np.mean(data) will be doing np.mean(datafile1,datafile2,datafile3,etc) with the way I have it reading in? I feel like I have to use append still for it to do that

serene scaffold May 6, 2021, 1:33 AM

#

lilac raven So doing np.mean(data) will be doing np.mean(datafile1,datafile2,datafile3,etc) ...

if you mean np.mean(data, axis=1) where data is a list of lists of ints and you do axis=1, then yes.

#

you would need to append each sub-list to data before np.mean(data, axis=1) is calculated.
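
Putting the thread together, a hedged sketch of averaging several files position by position. Two in-memory `io.StringIO` objects stand in for the real text files (their contents are made up). Each loaded array is collected in a list instead of overwriting `data`, and the list is stacked so that each file is one row. With that layout, `axis=0` gives the "[avg, avg, avg]" array asked for above, while `axis=1` gives one mean per file, like the `[2. 5.]` output from the eval.

```python
import io
import numpy as np

# Stand-ins for two text files with one value per line (made-up numbers).
fake_files = [io.StringIO("1\n2\n3\n"), io.StringIO("4\n5\n6\n")]

arrays = []
for f in fake_files:
    arrays.append(np.loadtxt(f))  # collect every file instead of overwriting

stacked = np.vstack(arrays)           # shape (n_files, n_values)
per_position = stacked.mean(axis=0)   # average across files, one value per position
per_file = stacked.mean(axis=1)       # one mean per file

print(per_position)
print(per_file)
```

In the original loop, the `np.savetxt("Average-MID-hrv.txt", ...)` call would then move after the loop and receive `per_position` rather than `data`.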

minor marsh May 6, 2021, 1:53 AM

#

Hi guys, does anyone know a great tutorial/course about Recurrent Neural Networks with LSTM? I did a Udemy course about it, but I'm stuck at the predictions part

exotic maple May 6, 2021, 2:49 AM

#

serene scaffold !e ```py import numpy as np a = [1, 2, 3] b = [4, 5, 6] print(np.mean([a, b], ax...

This is so succinctly simple and i've never known about it omfg...

mossy stratus May 6, 2021, 3:36 AM

#

import matplotlib.pyplot as plt
import numpy as np
import sympy

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

x = np.linspace(-5,5,500)
y = np.linspace(-5,5,500)
x,y = np.meshgrid(x,y)

e = sympy.solve('x**2/4+y**2/9+z**2/16-1',sympy.Symbol('z'))

z = -2*np.sqrt(-9*x**2 - 4*y**2 + 36)/3

ax.plot_surface(x, y, z)

plt.show()

#

this isn't right

#

not sure why
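
One possible reading of why the plot above looks wrong: the expression for z is only the lower half of the ellipsoid, and on a square grid every point outside the ellipse 9x² + 4y² ≤ 36 takes the square root of a negative number and becomes NaN. A common alternative is to parametrize the surface with two angles, sketched below. The function names are hypothetical, and matplotlib is imported lazily inside the plotting helper.

```python
import numpy as np

def ellipsoid_surface(a=2.0, b=3.0, c=4.0, n=60):
    """Points on x**2/a**2 + y**2/b**2 + z**2/c**2 = 1 via spherical angles."""
    u = np.linspace(0.0, 2.0 * np.pi, n)  # azimuth
    v = np.linspace(0.0, np.pi, n)        # polar angle
    u, v = np.meshgrid(u, v)
    x = a * np.cos(u) * np.sin(v)
    y = b * np.sin(u) * np.sin(v)
    z = c * np.cos(v)
    return x, y, z

def plot_ellipsoid():
    """Draw the full surface; needs matplotlib."""
    import matplotlib.pyplot as plt
    x, y, z = ellipsoid_surface()
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z)
    plt.show()
```

Calling `plot_ellipsoid()` should show the whole closed surface for x²/4 + y²/9 + z²/16 = 1, with no NaN regions.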

civic summit May 6, 2021, 3:50 AM

#

Anyone have experience with model selection for binary independent variables and binary dependent variables? Thinking logit regression, but I'm actually trying to find which independent variables are most important and together have >70 accuracy, chi

#

12:00pm can't sleep thinking about projects.

hoary wigeon May 6, 2021, 4:34 AM

#

Can anyone help me with kernel not found Problem ?

#

When i try to open jupyter-notebook it doesnt start,

Something Error 500 is thrown back

#

when i open jupyter-lab and create new book, The kernel doesnt respond.

#

when i open a existing notebook, It works

#

Now the problem is im not able to use jupyter in anaconda for creating new notebooks

#

First Screen on launching Jupyter Notebook

#

On creating new NOTEBOOK

hoary wigeon May 6, 2021, 5:20 AM

#

Solved with this : conda install nbconvert=5.4.1

#

Thread Closed

#

Thank YOU

heavy bay May 6, 2021, 7:03 AM

#

Can anyone help me with fitting data to my model?
This is the data

le = sklearn.preprocessing.LabelEncoder()
date = le.fit_transform(list(data["Date"]))
_open = le.fit_transform(list(data["Open"]))
high = le.fit_transform(list(data["High"]))
low = le.fit_transform(list(data["Low"]))
adj_close = le.fit_transform(list(data["Adj Close"]))
volume = le.fit_transform(list(data["Volume"]))

X = list(date)
y = list(zip(high, low, _open, adj_close, volume))

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)

But when I try to fit the data into the model as displayed below

linear = sklearn.linear_model.LinearRegression()
linear.fit(x_train, y_train)

I get this error

ValueError: Expected 2D array, got 1D array instead:
array=[2088  311 1839 ... 2422   64 1705].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Thanks
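
What the error message is asking for, as a minimal NumPy sketch: scikit-learn estimators expect `X` with shape (n_samples, n_features), and `X = list(date)` is one-dimensional. The values below are taken from the error output and are only placeholders.

```python
import numpy as np

date_codes = np.array([2088, 311, 1839, 2422])  # 1D: shape (4,)

# One feature per sample: turn the flat array into a single column.
X = date_codes.reshape(-1, 1)

print(date_codes.shape)  # (4,)
print(X.shape)           # (4, 1)
```

In the posted code this would presumably mean building `X = np.array(date).reshape(-1, 1)` before calling `train_test_split`.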

twin fiber May 6, 2021, 7:21 AM

#

hello, is there any chance there is someone here who knows Littlewood's rule? I am absolutely desperate to understand how to apply this rule to a question I have been given regarding an assignment

#

any help is extremely appreciated ❤️

willow geyser May 6, 2021, 7:50 AM

#

curious if using the GPU for tensorflow makes it more "even" compared to the CPU.
And if that's what makes it faster?

arctic wedgeBOT May 6, 2021, 9:34 AM

#

:incoming_envelope: :ok_hand: applied mute to @tacit wharf until 2021-05-06 09:44 (9 minutes and 58 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

limpid oak May 6, 2021, 10:53 AM

#

need help

#

I have df which generate from pqsql, I can download it on my local system

#

but I want to create one url so when user visit that link script should run and output can be saved on user side

kindred radish May 6, 2021, 11:06 AM

#

Anyone know about Spectral Clustering? Just want to list the steps I think should happen and have someone be like: "yeah that's right" or "that bit's wrong"

wicked mantle May 6, 2021, 11:19 AM

#

RuntimeError: stack expects each tensor to be equal size, but got [3, 47, 47] at entry 0 and [1, 47, 47] at entry 5
Does this mean that my image isn't RGB?
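
Most likely yes: the shape [1, 47, 47] at entry 5 has a single channel, which is what a grayscale image looks like in channels-first layout. The sketch below reproduces the mismatch with NumPy arrays and shows one way to fix it by repeating the single channel three times. When images are loaded with Pillow, calling `.convert("RGB")` on the opened image is a common alternative.

```python
import numpy as np

rgb = np.zeros((3, 47, 47), dtype=np.float32)   # channels-first RGB image
gray = np.zeros((1, 47, 47), dtype=np.float32)  # single-channel image

# Stacking arrays of different shapes fails, just like in the batch collation.
try:
    np.stack([rgb, gray])
    stacked_ok = True
except ValueError:
    stacked_ok = False

# Repeat the gray channel so every image has 3 channels.
gray_as_rgb = np.repeat(gray, 3, axis=0)
batch = np.stack([rgb, gray_as_rgb])
print(batch.shape)  # (2, 3, 47, 47)
```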

hushed wasp May 6, 2021, 11:54 AM

#

# Creation of histograms (features)
temps1=time.time()

def build_histogram(kmeans, des, image_num):
    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des=len(des)
    if nb_des==0 : print("problème histogramme image  : ", image_num)
    for i in res:
        hist[i] += 1.0/nb_des
    return hist


# Creation of a matrix of histograms
hist_vectors=[]

for i, image_desc in enumerate(imagesarray) :
    if i%100 == 0 : print(i)  
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i) #calculates the histogram
    hist_vectors.append(hist) #histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1=time.time()-temps1
print("temps de création histogrammes : ", "%15.2f" % duration1, "secondes")



Hello guys, I don't know why sometimes this code works and why sometimes it's looping again and again.... can anyone help?

Thanks

serene scaffold May 6, 2021, 12:12 PM

#

hushed wasp ```py # Creation of histograms (features) temps1=time.time() def build_histogra...

so the for i, image_desc in enumerate(imagesarray) : loop is iterating more times than you expected?

hushed wasp May 6, 2021, 12:16 PM

#

Yes exactly

#

I just run it again it just continues to loop except one time it's worked

#

It worked one time and not the second time! Just the number of pictures is different but it's not even a question of too much pictures, cause sometimes with less pictures it doesn't work too...

limpid oak May 6, 2021, 12:20 PM

#

# Creation of histograms (features)

temps1=time.time()

    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des=len(des)
    if nb_des==0 : print("problème histogramme image  : ", image_num)
    for i in res:
        hist[i] += 1.0/nb_des
        return hist


# Creation of a matrix of histograms
hist_vectors=[]

for i, image_desc in enumerate(imagesarray) :
    if i%100 == 0 : print(i)  
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i) #calculates the histogram
    hist_vectors.append(hist) #histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1=time.time()-temps1
print("temps de création histogrammes : ", "%15.2f" % duration1, "secondes")

#

try this ones @hushed wasp

hushed wasp May 6, 2021, 12:27 PM

#

I've got the same error

#

isn't it exactly the same? :p

limpid oak May 6, 2021, 12:28 PM

#

you getting error because you are returning hist outside of for loop

#

that's why you are getting hist of first input data

#

no, check for loop

hushed wasp May 6, 2021, 12:30 PM

#

ok indeed

#

thx

limpid oak May 6, 2021, 12:30 PM

#

try to plot it inside loop

#

check output there

hushed wasp May 6, 2021, 12:30 PM

#

but I ve got the same looping over and over

limpid oak May 6, 2021, 12:31 PM

#

show full error and script

hushed wasp May 6, 2021, 12:33 PM

#

I don't really raise an error it just keep running and crash

#

however when it's working it just calculate the histograms in like few seconds

#

I adapted the code from a SIFT extraction that I try to use with some CNN

#

Working I only get this :

#

I just reran the exact same code and now I just get iterations again and again, having changed absolutely nothing... (in code and data)

limpid oak May 6, 2021, 12:56 PM

#

in my opinion you should check line 11

hushed wasp May 6, 2021, 1:02 PM

#

I will!

Thanks for giving me some of your time @limpid oak

civic summit May 6, 2021, 1:18 PM

#

@limpid oak, do you have a sec to advise on the below? I have survey data that has independent variables that are all binary, as well as dependent variables that are binary. I am thinking logistic regression for my model of choice, but I am wondering what would be the best way of finding which independent variables are the most important predictors?

strange plinth May 6, 2021, 1:32 PM

#

I have this matplotlib chart, how can I get two y axes, one for each line?

import matplotlib.pyplot as plt
import matplotlib.dates

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

plt.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)
plt.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)
plt.legend()
plt.show()

left mulch May 6, 2021, 1:43 PM

#

strange plinth I have this matplotlib chart, how can I get two y axes, one for each line? ...

ax2 = ax.twinx()

#

Plot the second one with ax2

strange plinth May 6, 2021, 1:43 PM

#

left mulch Plot the second one with ax2

can you tell me more about how to do that?

desert oar May 6, 2021, 1:44 PM

#

import matplotlib.dates
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

ax.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)

ax2 = ax.twinx()
ax2.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)

fig.legend()
fig.show()

like this?

left mulch May 6, 2021, 1:44 PM

#

Yes

strange plinth May 6, 2021, 1:46 PM

#

Beautiful, thanks!

#

Hmm, the legend is a little borked like this. It appears, but i get a warning also?

#

UserWarning: Matplotlib is currently using module://ipykernel.pylab.backend_inline, which is a non-GUI backend, so cannot show the figure. fig.show()

#

this is in a jupyter notebook

desert oar May 6, 2021, 1:48 PM

#

oh, use plt.legend and plt.show i guess. maybe theres some subtle difference

#

%matplotlib inline in the jupyter notebook should tell matplotlib to plot in the notebook and not elsewhere, but maybe plt.show does that automatically while fig.show doesn't

strange plinth May 6, 2021, 1:49 PM

#

plt.legend removes the warning, but now only one line is mentioned in the legend.

#

matplotlib is weird....

desert oar May 6, 2021, 1:49 PM

#

ah yep, fig.show is lower level and you should use plt.show https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure.show

#

not sure why the legend wouldnt detect this automatically, but you can manually specify the lines to be used in the legend

import matplotlib.dates
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

# ax.plot returns a list of lines, so unpack the single line with a trailing comma
line1, = ax.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)

ax2 = ax.twinx()
line2, = ax2.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)

plt.legend([line1, line2], [line1.label, line2.label])
plt.show()

#

im going based off this here https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend but ive never had to do this before

strange plinth May 6, 2021, 1:54 PM

#

can't find .label, but i can repeat the strings.

desert oar May 6, 2021, 1:54 PM

#

sorry, use .get_label()
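
Putting the pieces of this exchange together, a self-contained sketch of a two-axis chart with one combined legend. The lists below are made-up stand-ins for `df.when`, `df.total` and `df.pctcon`, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-GUI backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up stand-in data for df.when, df.total and df.pctcon.
when = [1, 2, 3, 4]
total = [120, 90, 150, 200]
pctcon = [10.0, 35.0, 60.0, 80.0]

fig, ax = plt.subplots(figsize=(12, 8))
# ax.plot returns a list of Line2D objects, hence the trailing comma.
line1, = ax.plot(when, total, ".-", label="# Commits", color="black", linewidth=0.5)

ax2 = ax.twinx()  # second y axis sharing the same x axis
line2, = ax2.plot(when, pctcon, label="% Conventional", color="green", linewidth=4)
ax2.set_ylim(0, 100)

# One legend listing the lines from both axes.
lines = [line1, line2]
legend = ax2.legend(lines, [line.get_label() for line in lines])
fig.canvas.draw()
```

In a notebook, `plt.show()` after this displays the figure; passing the handles explicitly is what lets a single legend cover both axes.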

strange plinth May 6, 2021, 1:57 PM

#

Nice! Now I want to force ax2 to go 0-100

desert oar May 6, 2021, 1:58 PM

#

ax2.set_ylim(0, 100) https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html#matplotlib.axes.Axes.set_ylim

strange plinth May 6, 2021, 1:59 PM

#

YAY! thanks 🙂

desert oar May 6, 2021, 1:59 PM

#

navigating the matplotlib docs is not easy... these ax things are instances of matplotlib.axes.Axes https://matplotlib.org/stable/api/axes_api.html#the-axes-class and the fig thing is an instance of matplotlib.figure.Figure https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure

sly salmon May 6, 2021, 3:37 PM

#

hey guys - when getting into algorithms I found a book intended for absolute beginners (Grokking Algorithms) which simplified concepts simply and built up. Are there any books similar to this (for beginners) which you guys would recommend?
alternatively, do you guys have any positive sentiment to platforms like dataquest?

mint palm May 6, 2021, 4:05 PM

#

"the dotted function is noisy in the diagram" is what i understood earlier but now the instructor says mini batch norm makes the z(tilde) more noisy due to using mean / variance for each mini batch seperatly......what does he mean here.....arent we actually just scalling z so should actually stable mini batches

#

shy kraken May 6, 2021, 4:18 PM

#

This is driving me nuts, I'm trying to do a simple rolling average of some data. Blue line and points is the data, yellow line is supposed to be the moving average:

#

I don't understand why the yellow line would be equal to where the blue dot is on the right side of the graph. It makes no sense

#

This is my code:

#


sma10 = data['PX']/data['WOS'].iloc[::-1].rolling(10).mean()

#

shouldn't that be reading in ten datapoints, finding the average? I had to throw in a .iloc[::-1] because it was starting calcs from the left side
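
One reading of the line above, offered as a guess at the intent: method calls bind tighter than `/`, so `.iloc[::-1].rolling(10).mean()` applies to `data['WOS']` alone and `PX` is then divided by that rolling mean. pandas also aligns the two Series on their index during the division, so the reversal does not change the order of the result. If the goal is a moving average of the ratio, a sketch would compute the ratio first and smooth it afterwards. The frame below is made-up data and the window is shortened to 3 to keep it small.

```python
import pandas as pd

# Made-up stand-in for the chat's `data` frame, ordered oldest to newest.
data = pd.DataFrame({
    "PX": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "WOS": [2.0, 2.0, 2.0, 4.0, 4.0, 4.0],
})

# Compute the ratio first, then smooth it. Without the parentheses or an
# intermediate variable, the rolling mean would apply to data["WOS"] only.
ratio = data["PX"] / data["WOS"]
sma3 = ratio.rolling(3).mean()
print(sma3)
```

The first `window - 1` entries are `NaN` because a full window is not available yet. If the rows are stored newest first, sorting them into chronological order before calling `rolling` is probably simpler than reversing inside the expression.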

sick wedge May 6, 2021, 4:24 PM

#

how do i get my current working directory on the left like this on spyder

fervent zenith May 6, 2021, 5:56 PM

#

how do i convert this to decimal value 3,2 to 32.0

lapis sequoia May 6, 2021, 6:04 PM

#

fervent zenith how do i convert this to decimal value 3,2 to 32.0

highlight the whole column
ctrl+h
go to the replace tab
replace , with nothing
press ok

lunar zenith May 6, 2021, 6:05 PM

#

import matplotlib.pyplot as plt 
import pandas as pd

ra = pd.read_csv("ramen-ratings.csv")

new_ra5 = ra.loc[(ra["Stars"] != "Unrated")]

new_ra5["Stars"] = ra["Stars"].astype(float)

new_ra5 = ra.groupby("Country","Stars").mean() 
new_ra1 = ra.groupby("Country").Stars.count() 
for x in new_ra: 
    print(new_ra[x] / new_ra1)
 

for x in new_ra1: 
    print(x)

I have this code, how do I fix the string to float: 'Unrated' error?
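
A possible fix, sketched on made-up data rather than the real `ramen-ratings.csv`: the conversion in the code above runs on `ra["Stars"]`, which still contains the `"Unrated"` rows, instead of on the filtered frame. Filtering first and converting the filtered column avoids the error. The `rated` and `mean_stars` names are illustrative.

```python
import pandas as pd

# Made-up stand-in for ramen-ratings.csv.
ra = pd.DataFrame({
    "Country": ["Japan", "Japan", "USA", "USA"],
    "Stars": ["3.5", "Unrated", "4", "5"],
})

# Drop the unrated rows first, then convert the same filtered frame.
rated = ra.loc[ra["Stars"] != "Unrated"].copy()
rated["Stars"] = rated["Stars"].astype(float)

# Mean rating per country, computed on numeric values only.
mean_stars = rated.groupby("Country")["Stars"].mean()
print(mean_stars)
```

An alternative is `pd.to_numeric(ra["Stars"], errors="coerce")`, which turns anything non-numeric into `NaN` instead of raising.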

fervent zenith May 6, 2021, 6:14 PM

#

lapis sequoia highlight the whole column ctrl+h go to the replace tab replace , with nothing p...

but there are some instances of decimal values like 460,6 which should be 460.6 but removing , would give 4606

#

lapis sequoia May 6, 2021, 6:14 PM

#

fervent zenith but there is some instance od decimal values like 460,6 which should be 460.6 bu...

you would have to identify those specific ones then

#

maybe put a 1 in the column next to those ones, then sort them

#

@fervent zenith excel can't distinguish 50,8 as 508 or 50.8 unless it has some more information
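
If the values can be loaded into Python instead, a minimal sketch under one loud assumption: every comma is a decimal separator (so `460,6` means 460.6) and there are no thousands separators. `parse_decimal_comma` is a hypothetical helper name.

```python
def parse_decimal_comma(text):
    """Convert a string such as '460,6' to the float 460.6."""
    # Assumes the comma is always the decimal separator.
    return float(text.strip().replace(",", "."))


values = ["3,2", "460,6", "50,8", "7"]
print([parse_decimal_comma(v) for v in values])
```

If the file goes through pandas, `pd.read_csv` has a `decimal=","` option that applies the same interpretation while reading.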

shut slate May 6, 2021, 6:18 PM

#

Hey guys

#

Does anyone know how to read the cluster centers of Kmeans?

#

Like this is what I have

#

#

What does this mean exactly?

desert oar May 6, 2021, 6:50 PM

#

@shut slate your matrix had 10 columns originally?

#

each cluster center is a vector of 10 coordinates

shut slate May 6, 2021, 6:51 PM

#

Thank you. But how do I make sense of what the clusters are?

#

I have the clusters and dont know what to do with it

desert oar May 6, 2021, 6:52 PM

#

what is this data?

#

what are you trying to achieve by using k-means?

shut slate May 6, 2021, 6:53 PM

#

Well the data is the housing market in Melbourne. I am trying to figure out the price

#

desert oar May 6, 2021, 7:00 PM

#

so why are you doing k-means clustering?

#

you want a price that corresponds to each cluster?

#

let's say you have N rows and P features, and you perform K means clustering. then the cluster_centers_ is a K x P array, where each row is a cluster center, and each column corresponds to one of your original data features.

#

so if price is the 2nd feature in your data, then price will be the 2nd element of each row of the cluster_centers_ array

shut slate May 6, 2021, 7:02 PM

#

Actually yeah, i don't know why I am doing the clustering. Here is why, now that I think about it: i want to know how the combinations of each feature correspond to price I guess?

desert oar May 6, 2021, 7:02 PM

#

however note that you need all 10 elements to "describe" the cluster. you could have 2 clusters with similar mean prices, but very different values of the other features

shut slate May 6, 2021, 7:03 PM

#

So I guess my first problem is, Ok I clustered it into 3 clusters

#

Now what does that mean and how do I use it

#

lol

desert oar May 6, 2021, 7:03 PM

#

cluster analysis is fine as an exploratory technique, just keep in mind that k-means in particular tends to try to find equal-sized roughly-spherical clusters and won't necessarily give intelligent results unless you do more work to choose a suitable K

#

e.g. if you have completely random data it will still find K clusters for you, but those clusters will amount to basically segmenting the data into equal sizes and aren't so much "clusters" as they are "segments" with hard boundaries

shut slate May 6, 2021, 7:04 PM

#

I did the elbow analysis and it showed to do 3 clusters

desert oar May 6, 2021, 7:05 PM

#

ok, thats a reasonable place to start then

#

its still better to think of k-means output as "segments" rather than "clusters"

#

the two things you can do with k-means are:

look at the cluster/segment centers
determine which segment a data point belongs to, which amounts to finding the closest cluster center
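
The second point can be sketched in a few lines. The centers below are made up, laid out as K rows of P coordinates like `cluster_centers_`; `nearest_center` is a hypothetical helper, not a scikit-learn function.

```python
import math


def nearest_center(point, centers):
    """Return the index of the center with the smallest Euclidean distance to `point`."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))


# Three made-up centers with P = 2 features each (a K x P layout).
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(nearest_center((1.0, 2.0), centers))
print(nearest_center((9.0, 8.0), centers))
```

With a fitted scikit-learn `KMeans`, `kmeans.predict(X)` performs this assignment for every row of `X`.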

shut slate May 6, 2021, 7:07 PM

#

Ok makes sense, is there any way I can visualize the clusters?

#

for example

grave breach May 6, 2021, 7:08 PM

#

Yes

#

look at this: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html

shut slate May 6, 2021, 7:09 PM

#

#

So does this mean it just clustered by yearbuilt?

grave breach May 6, 2021, 7:10 PM

#

Year built on the x

#

And price on the y

shut slate May 6, 2021, 7:11 PM

#

the hue is the cluster

grave breach May 6, 2021, 7:11 PM

#

Yes, that should be the cluster

shut slate May 6, 2021, 7:11 PM

#

and as I can see is that from 1880 to 130s its one cluster

#

1930*

#

1940 to 1980 is 2nd

#

and the 3rd is 1980 to 2020

#

or am I talking non sense here

grave breach May 6, 2021, 7:13 PM

#

That should be correct

#

But I cannot clearly see the picture

shut slate May 6, 2021, 7:14 PM

#

sec

#

#

but other features for example

grave breach May 6, 2021, 7:15 PM

#

Yes, that is correct

shut slate May 6, 2021, 7:16 PM

#

#

So I just solved how it clustered?

#

And can you get Python to tell you what features mattered the most?

grave breach May 6, 2021, 7:17 PM

#

It depends on the classifier

desert oar May 6, 2021, 7:17 PM

#

in this case it's obvious from the plot that year built is mainly driving the clustering

#

that's possibly because the scales of the numbers are all off

#

for feature importance you can do anova, i think its a nice idea https://stats.stackexchange.com/a/77693/36229

Cross Validated

Estimating the most important features in a k-means cluster partition

Is there a way to determine which features / variables of the dataset are the most important / dominant within a k-means cluster solution?

#

or you can do the distance between cluster centers feature-wise, that's a nice idea too

#

you should probably standardize your data before doing k-means
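
A minimal sketch of what standardizing means here, on made-up numbers: each feature is shifted to mean 0 and scaled to standard deviation 1, so no single column dominates the Euclidean distances k-means relies on. `standardize` is a hypothetical helper.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Made-up values: the two features live on very different scales.
year_built = [1900.0, 1950.0, 2000.0]
price = [500_000.0, 900_000.0, 1_300_000.0]
print(standardize(year_built))
print(standardize(price))
```

After rescaling, both columns cover the same range. In scikit-learn, `sklearn.preprocessing.StandardScaler` does this per column before the data is passed to `KMeans`.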

shut slate May 6, 2021, 7:20 PM

#

Ok will look into it l8

#

Thank you all

grave breach May 6, 2021, 7:21 PM

#

You're welcome

zinc lark May 6, 2021, 7:52 PM

#

anyone have an idea on why pytorch 11.1 cuda is so much bigger than 10.2 cuda?

#

coral kindle May 6, 2021, 9:24 PM

#

zinc lark anyone have an idea on why pytorch 11.1 cuda is so much bigger than 10.2 cuda?

Most likely cudatoolkit

#

Since installing pytorch through conda means you're downloading the associated CUDA version, that means everything comes prepackaged, even CuDNN

grave frost May 6, 2021, 9:57 PM

#

but there hasn't been that major of a revamp to add 1.2GB to the Cuda toolkit

coral kindle May 6, 2021, 10:19 PM

#

10 to 11 implies there's been one. I haven't checked the CUDA changelog however

lapis sequoia May 6, 2021, 10:22 PM

#

Is there any obvious pattern on how to design a deep learning model?

#

Like how do you know what to set the parameters to?

grave frost May 6, 2021, 10:39 PM

#

lapis sequoia Like how do you know what to set the parameters to?

nah, it's mostly guesswork and intuition. mostly, you try a set, see the result and adjust accordingly

lapis sequoia May 6, 2021, 10:39 PM

#

grave frost nah, it's mostly guesswork and intuition. mostly, you try a set, see the result ...

how do you go about guessing though

#

there are so many parameters

grave frost May 6, 2021, 10:39 PM

#

because you don't need to tune them all to get a decent accuracy?

lapis sequoia May 6, 2021, 10:40 PM

#

I see

harsh karma May 6, 2021, 10:50 PM

#

idk if this is the place to ask but why does heroku install scipy? i had it in requirements.txt but i removed it and all but it still installs it

desert oar May 6, 2021, 11:10 PM

#

harsh karma idk if this is the place to ask but why does heroku install scipy? i had it in r...

is it a dependency of another library you are using?

harsh karma May 6, 2021, 11:11 PM

#

i narrowed it down to this:

discord.py==1.7.2
pafy==0.5.5
praw==7.1.0
prawcore==1.5.0
premailer==3.7.0
protobuf==3.15.1
pycparser==2.20
pylint==2.6.0
python-dateutil==2.8.1
requests==2.25.1
yagmail==0.14.245
youtube-dl==2020.12.31
youtubepy==6.0.2

#

i deleted all the scientific stuff but it still downloaded it

#

who thought this was a good idea

#

ayy

grave frost May 6, 2021, 11:28 PM

#

hmm...is it a problem if it downloads scipy?

dry terrace May 7, 2021, 5:53 AM

#

hey i would like to copy a dataset based on another but not directly. e.g. per column the distribution should remain and also conditional probabilities.
My approach so far is to see which column can be described exactly with the fewest conditions (Prob(A=1|B) = 1) and then set this column. Iteratively repeat until all columns are described.

The problem is that the column distribution may be destroyed. Does anyone have a better idea? The goal is to create a more anonymous dataset, which still has the best possible quality.

clever bramble May 7, 2021, 7:04 AM

#

Sound source separation challenge by Sony with an exclusive dataset created just for the challenge.

Pretty neat baselines to start with. 10,000 Swiss Francs prize.

Any one interested to participate?
https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021?utm_source=discord&utm_medium=python&utm_campaign=sony

AIcrowd | Music Demixing Challenge @ISMIR 2021 | Challenges

coral kindle May 7, 2021, 7:29 AM

#

I'm not sure which channel is appropriate for that, but has anybody managed to parse the insides of a PDF document using nothing but raw Python?

broken stratus May 7, 2021, 7:51 AM

#

Hey is anyone here good at PyTorch... I need help to change my code from theano to pytorch

coral kindle May 7, 2021, 7:58 AM

#

I have some PyTorch knowledge but I never touched Theano

native nimbus May 7, 2021, 9:38 AM

#

https://www.youtube.com/watch?v=Rv3o54WNCio

YouTube

UbiOps

UbiOps Release Update (Thursday, 6th May 2021)

Yesterday (on the 6th of May) was our UbiOps Release Update Webinar.
During the session, Anouk Dutrée, Product Owner at UbiOps, gave a demo of several new features, which we have recently added to our platform.
She demonstrated how to deploy R code in UbiOps, set up monitoring emails and more. In case you missed the live session, we got you cov...


desert oar May 7, 2021, 1:24 PM

#

dry terrace hey i would like to copy a dataset based on another but not directly. e.g. per c...

This doesn't sound like a bad idea. how many features are in the dataset? Maybe you could try to approximate the joint distribution directly

#

I believe a VAE can do that

#

http://ruishu.io/2018/03/14/vae/

Density Estimation: Variational Autoencoders - Rui Shu

One of the most popular models for density estimation is the Variational Autoencoder. It is a model that I have spent...

#

However if you ask on https://stats.stackexchange.com you might get more interesting and helpful answers

Cross Validated

Q&A for people interested in statistics, machine learning, data analysis, data mining, and data visualization

olive moat May 7, 2021, 2:38 PM

#

this might be an xy but i'm going to explain this as best i can
i am using pytorch and trying to create an lstm that takes a character and maps it to another, but i'm struggling a little with the representation of characters
everything i've seen encodes characters with onehot vectors, however i'm wondering why class labels aren't used instead? i.e. an integer 0-25 each one representing a letter, possibly a 26th representing padding
another issue is that i am trying to use Cross Entropy Loss and the target seems to be required to be class label encoded instead of onehot, so why not just use class labels in the first place?
i'm a little lost :P
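
One way to see the relationship, sketched in plain Python with made-up weights: a one-hot vector multiplied by a weight matrix just selects one row, so an integer label plus a row lookup carries the same information. That lookup is what `nn.Embedding` does with integer indices in PyTorch, and `nn.CrossEntropyLoss` accepts integer class indices as targets. What usually does not work is feeding the raw integer in as a numeric input, because that implies letter 25 is "more" than letter 0.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(n_cols)]


# Made-up 3-letter alphabet with a 2-dimensional weight row per letter.
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

label = 2  # integer class label for the third letter
via_one_hot = matvec(one_hot(label, 3), weights)
via_lookup = weights[label]
print(via_one_hot, via_lookup)
```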

neon marsh May 7, 2021, 2:59 PM

#

For anyone that used jupyter notebook. Does jupyter notebook run better if your pc is more powerful since jupyter notebook runs on your browser

austere swift May 7, 2021, 3:35 PM

#

it runs in your browser but it still uses local resources, it just runs on a web server on your pc

#

so yes

#

its not using any external server to run the code or anything

serene scaffold May 7, 2021, 3:54 PM

#

neon marsh For anyone that used jupyter notebook. Does jupyter notebook run better if your ...

The computation power of the machine you're running on is what will determine performance, regardless of whether you're using jupyter or a repl or what have you.

neon marsh May 7, 2021, 3:54 PM

#

Alright thank you

serene scaffold May 7, 2021, 3:55 PM

#

for the record, I would strongly discourage you from using jupyter notebooks unless you have a lot of experience programming Python without them: https://datapastry.com/blog/why-i-dont-use-jupyter-notebooks-and-you-shouldnt-either/

DataPastry Blog

Why I don’t use Jupyter notebooks and you shouldn’t either

Jupyter notebooks give you instant feedback but lead to bad habits.

neon marsh May 7, 2021, 3:57 PM

#

Ohh ok will look into that

spring obsidian May 7, 2021, 4:03 PM

#

serene scaffold for the record, I would strongly discourage you from using jupyter notebooks unl...

I'm starting to see notebooks show up in intro python classes (like, programming fundamentals... not even data science courses) and I just cringe. It's unnecessary and it gets in the way.

serene scaffold May 7, 2021, 4:04 PM

#

spring obsidian I'm starting to see notebooks show up in intro python classes (like, programming...

In my experience helping people who are enrolled in university Python courses, they are very very bad.

#

If they don't teach them to use notebooks, they teach them to write getters and setters. You can't win.

spring obsidian May 7, 2021, 4:07 PM

#

serene scaffold In my experience helping people who are enrolled in university Python courses, t...

Agreed. I teach a python intro course at a US university and I'm sticking with replit.com.

serene scaffold May 7, 2021, 4:08 PM

#

Is the thinking there that teaching them to setup Python on their machine would be too cumbersome?

#

(Because I wouldn't blame you if that's the thinking.)

spring obsidian May 7, 2021, 4:10 PM

#

We have one class for both of these students: "I can't even navigate a file system from the command line" and "I actually know how to code already"

blazing bridge May 7, 2021, 4:10 PM

#

I think the reason for that would be to just stick to the fundamentals of the course so it can cater to both spectrums

desert oar May 7, 2021, 4:10 PM

#

One legitimate use is for "literate programming" homework assignments that mix code and written solutions

spring obsidian May 7, 2021, 4:11 PM

#

So we introduce VS Code eventually, but it starts with the replit.com "ide"

#

I feel like a shill for repl.it 😄

serene scaffold May 7, 2021, 4:12 PM

#

spring obsidian We have one class for both of these students: "I can't even navigate a file syst...

oddly enough, explicit use of the terminal rarely came up in my curriculum.

desert oar May 7, 2021, 4:12 PM

#

Repl.it is great. That said I don't see why notebooks are so much worse than anything else.

serene scaffold May 7, 2021, 4:12 PM

#

but yeah, that's always the dilemma with those courses.

desert oar May 7, 2021, 4:13 PM

#

I do think in a university setting there should be a mandatory one or two credit "how to use the unix flavored cli" course

spring obsidian May 7, 2021, 4:13 PM

#

desert oar I do think in a university setting there should be a mandatory one or two credit...

I like MIT's "Missing Semester"

blazing bridge May 7, 2021, 4:13 PM

#

desert oar Repl.it is great. That said I don't see why notebooks are so much worse than any...

I think the reason for not teaching notebooks early on is because students don't learn the fundamentals of scripting and seeing output in the terminal.

serene scaffold May 7, 2021, 4:13 PM

#

desert oar Repl.it is great. That said I don't see why notebooks are so much worse than any...

they encourage you to break the problem down in terms of how you can display stuff at the end of each cell, not in terms of code reusability. And then you have to have the entire state of your notebook in your live human memory if you re-execute cells for some situation-specific reason.

spring obsidian May 7, 2021, 4:14 PM

#

https://missing.csail.mit.edu/

the missing semester of your cs education

The Missing Semester of Your CS Education

serene scaffold May 7, 2021, 4:15 PM

#

That being said, if you understand the problems with notebooks and are quite specifically trying to do exploratory analysis, I guess you can have at it.

spring obsidian May 7, 2021, 4:24 PM

#

Is Spyder really used out there in the professional DS world?

desert oar May 7, 2021, 4:25 PM

#

spring obsidian Is Spyder really used out there in the professional DS world?

occasionally. when i used it, it felt like a cheap rstudio imitation.

spring obsidian May 7, 2021, 4:26 PM

#

Yeah... RStudio is probably the only thing I miss from before I switched to Python

lapis sequoia May 7, 2021, 4:52 PM

#

Does anyone here use Chatterbot, and if so, do you know of any corpuses I can use to train my bot to be nice, and not ask people when they're gonna die?

#

https://cdn.discordapp.com/attachments/477912057560432680/840251407394799656/Screenshot_20210507-113859_Discord.jpg
https://cdn.discordapp.com/attachments/477912057560432680/840252659239616512/Screenshot_20210507-114333_Discord.jpg

burnt bronze May 7, 2021, 5:05 PM

#

https://paste.pythondiscord.com/azunajadih.json I'm getting a schema validation error. what's wrong with this?

#

#python-discussion

desert oar May 7, 2021, 5:07 PM

#

@burnt bronze need more context. what are you doing? what schema? what code is doing it? etc etc

lunar zenith May 7, 2021, 5:09 PM

#

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.read_csv('ramen-ratings.csv')

countries = df['Country']
ls = {}
for x in countries:
    if ls.get(x) is None:
        ls[x] = 1
    else:
        ls[x] += 1

countries = ls.keys()
df = pd.DataFrame.from_dict(ls.items())
df.index = countries
df.plot.bar(figsize = (15,4.5))


plt.title("Number of Ratings per Country", fontdict = {'fontsize': 15})
plt.xlabel("Countries", fontdict = {'fontsize': 15})
plt.xticks(rotation = 90)
plt.ylabel("Number of Ratings", fontdict = {'fontsize': 15})
plt.show()

Anyone know how I'd remove the '1' legend?

#

grave frost May 7, 2021, 5:25 PM

#

Hey I don't get the hate - what's wrong with notebooks? 😛

visual umbra May 7, 2021, 5:35 PM

#

do you guys have any recommendations for applying knowledge from andrew ng's coursera into projects

kindred blade May 7, 2021, 5:37 PM

#

does anyone know what's better to use for data visualization matplotlib or charts js

#

I found chartsJs is more customizable but matplotlib is a wonderful library

lunar zenith May 7, 2021, 5:38 PM

#

kindred blade does anyone know what's better to use for data visualization matplotlib or chart...

matplotlib

kindred blade May 7, 2021, 5:39 PM

#

why

#

what does matplotlib has that chartsJs doesnt

#

and isn't charts Js a javascript library, so i think it works better on web, though I love matplotlib and have used it a lot

silk tulip May 7, 2021, 5:44 PM

#

Hello, I want to use MAFA Dataset(https://www.kaggle.com/rahulmangalampalli/mafa-data) for mask-detection but the files are labeled in .mat format. Can anyone tell me how can I use this dataset in python?
I've used mafa extractor( https://pypi.org/project/mafaextractor/) but i am not sure how to implement this.
TIA

MAFA_data

grave frost May 7, 2021, 5:54 PM

#

anything you find interesting 🤷@visual umbra

upbeat topaz May 7, 2021, 6:02 PM

#

hello

#

Is this where we learn to code

#

or is that in another section

desert oar May 7, 2021, 6:08 PM

#

@upbeat topaz we don't have a general "learning" channel. we do have a list of resources, and a lot of help channels for targeted help, see #❓｜how-to-get-help

#

!resources

arctic wedgeBOT May 7, 2021, 6:08 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

upbeat topaz May 7, 2021, 6:08 PM

#

desert oar @upbeat topaz we don't have a general "learning" channel. we do have a ...

thank you

desert oar May 7, 2021, 6:09 PM

#

also don't forget to read channel topics 🙂 it's the text up at the top of your discord window, to the right of the channel name and to the left of the search bar. you can click on it to read the whole thing

upbeat topaz May 7, 2021, 6:09 PM

#

okay

serene scaffold May 7, 2021, 6:53 PM

#

Am I to understand that the a and b arguments can be array-likes of labels? https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

#

I'm not sure what is meant by "samples of scores" in "Calculate the T-test for the means of two independent samples of scores.".

desert oar May 7, 2021, 6:56 PM

#

i don't think it means anything here

#

this is just a t-test for 2 independent samples

#

the "scores" thing i think is just bad wording and/or someone lazily copying from their stats-for-engineers textbook

serene scaffold May 7, 2021, 7:02 PM

#

!e

from scipy.stats import ttest_ind as tt
result = tt(['a', 'a', 'b'], ['a', 'b', 'b'])
print(result)

arctic wedgeBOT May 7, 2021, 7:02 PM

#

@serene scaffold :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 2, in <module>
003 |   File "/snekbox/user_base/lib/python3.9/site-packages/scipy/stats/stats.py", line 5771, in ttest_ind
004 |     v1 = np.var(a, axis, ddof=1)
005 |   File "<__array_function__ internals>", line 5, in var
006 |   File "/snekbox/user_base/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3702, in var
007 |     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
008 |   File "/snekbox/user_base/lib/python3.9/site-packages/numpy/core/_methods.py", line 211, in _var
009 |     arrmean = umr_sum(arr, axis, dtype, keepdims=True, where=where)
010 | TypeError: cannot perform reduce with flexible type

serene scaffold May 7, 2021, 7:02 PM

#

@desert oar does one have to assign arbitrary integers to each label? that seems odd.

desert oar May 7, 2021, 7:03 PM

#

wait, what are you trying to do here

serene scaffold May 7, 2021, 7:03 PM

#

determine the statistical significance of two sets of predictions

desert oar May 7, 2021, 7:03 PM

#

banish that sentence from your mind

serene scaffold May 7, 2021, 7:03 PM

#

(that is, whether changes in the design of the model changed the predictions in the second as compared to the first in a way that can't be accounted for by random chance)

desert oar May 7, 2021, 7:04 PM

#

it sounds like you want to test whether 2 samples are from the same bernoulli distribution

serene scaffold May 7, 2021, 7:04 PM

#

I've never heard of bernoulli

desert oar May 7, 2021, 7:05 PM

#

bernoulli is yes/no with some probability p of "yes"

#

(meaning that "no" has probability 1-p)

#

the probability p does happen to be the mean of the bernoulli distribution

#

so yes you can use a 2-sample t-test to test the hypothesis of whether p1 and p2 are equal, if the samples are big enough for the central limit theorem to kick in. but you should calculate it directly, don't use this function (which tries to calculate the means from the data which it expects to be numeric)

#

note that you fundamentally can't assume equal variances unless the null hypothesis is true, because the variance of a bernoulli is p * (1-p). so if p1 != p2 then obviously p1 * (1-p1) != p2 * (1-p2)

#

https://en.wikipedia.org/wiki/Bernoulli_distribution
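
Calculating it directly could look like the sketch below, which assumes the usual large-sample two-proportion z-test with the variance pooled under the null hypothesis that p1 equals p2. The counts are made up, `two_proportion_z_test` is a hypothetical helper, and the normal approximation is only reasonable when both samples are large.

```python
import math


def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test for equality of two proportions, pooled under the null."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Under the null both samples share one p, so pool them to estimate it.
    pooled = (successes1 + successes2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 60 of 100 "yes" in one sample, 45 of 100 in the other.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The function does not guard against a pooled proportion of exactly 0 or 1, where the standard error is zero.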

serene scaffold May 7, 2021, 7:09 PM

#

.bm 840303279405793302

#

Thanks!

desert oar May 7, 2021, 7:09 PM

#

https://online.stat.psu.edu/stat800/lesson/5/5.5

PennState: Statistics Online Courses

5.5 - Hypothesis Testing for Two-Sample Proportions | STAT 800

Enroll today at Penn State World Campus to earn an accredited degree or certificate in Statistics.

#

https://online.stat.psu.edu/stat415/lesson/9/9.4

PennState: Statistics Online Courses

9.4 - Comparing Two Proportions | STAT 415

Enroll today at Penn State World Campus to earn an accredited degree or certificate in Statistics.

desert oar May 7, 2021, 7:12 PM

#

serene scaffold .bm 840303279405793302

what's this?

serene scaffold May 7, 2021, 7:12 PM

#

desert oar what's this?

Bookmark command from Lancebot.

desert oar May 7, 2021, 7:12 PM

#

ooh

#

yeah read those 2 links i sent. the 2nd is probably more useful to you

serene scaffold May 7, 2021, 7:13 PM

#

okay lemon_hyperpleased

obsidian quail May 7, 2021, 7:19 PM

#

hey, I'm trying to plot times in the format %H:%M on x_axis, but it is only returning 00:00.
When using dates, it seems to work fine, but not with the times.
here's the code:

plt.plot(time_x,players_y)

plt.gcf().autofmt_xdate()

date_format = mpl_dates.DateFormatter('%H:%M')

plt.gca().xaxis.set_major_formatter(date_format)

plt.tight_layout()
plt.show()
`time_x` is just regular datetime format eg. `2021-05-07 18:08:38`
Any ideas what's happening? (new to mpl)

#

desert oar May 7, 2021, 7:23 PM

#

@obsidian quail can you provide some sample data to reproduce with

obsidian quail May 7, 2021, 7:24 PM

#

1 sec

#

['2021-05-07 18:07:52', '2021-05-07 18:07:54', '2021-05-07 18:07:56', '2021-05-07 18:07:59', '2021-05-07 18:08:01', '2021-05-07 18:08:05', '2021-05-07 18:08:07', '2021-05-07 18:08:22', '2021-05-07 18:08:24', '2021-05-07 18:08:26', '2021-05-07 18:08:36', '2021-05-07 18:08:38', '2021-05-07 18:08:41', '2021-05-07 18:09:24', '2021-05-07 18:09:38', '2021-05-07 18:09:40', '2021-05-07 18:09:42', '2021-05-07 18:09:44', '2021-05-07 18:09:46', '2021-05-07 18:09:49', '2021-05-07 18:18:28', '2021-05-07 18:19:21', '2021-05-07 18:20:21', '2021-05-07 18:21:21']

#

[31, 35, 37, 31, 21, 24, 28, 44, 45, 39, 33, 32, 29, 19, 21, 24, 27, 29, 34, 55, 32, 27, 29, 25]

desert oar May 7, 2021, 7:41 PM

#

@obsidian quail it's because your time_x is all strings

#

i don't think matplotlib is smart enough to do that conversion for you

obsidian quail May 7, 2021, 7:43 PM

#

ah right, I'm not too familiar with datetime, I'm storing the values in an sqlite table, how would I convert into datetime whilst in the list?

desert oar May 7, 2021, 7:44 PM

#

pd.to_datetime would be the easiest option

#

you're storing them as these timestamp strings in sqlite?

obsidian quail May 7, 2021, 7:44 PM

#

as integers

desert oar May 7, 2021, 7:44 PM

#

like unix timestamps?

obsidian quail May 7, 2021, 7:45 PM

#

```
c.execute("SELECT * FROM last_24")
players_y = []
time_x = []
for x in c:
    players_y.append(x[1])
    time_x.append(x[0])
```
I'm then appending the values to each list?

desert oar May 7, 2021, 7:46 PM

#

it sounds like you have an int type on the column but are writing strings to the db

#

sqlite will let you write the wrong datatype to a column

#

(i think its a non-feature but they have their reasons for doing it)

obsidian quail May 7, 2021, 7:47 PM

#

hmm, it's a default value time integer DEFAULT (datetime('now', 'localtime'))
Would it be due to this?
(btw let me know if we should move to #databases if its getting offtopic here)

desert oar May 7, 2021, 7:48 PM

#

from datetime import datetime

c.execute("SELECT * FROM last_24")
players_y = []
time_x = []
for x in c:
    players_y.append(x[1])
    time_x.append(datetime.strptime(x[0], '%Y-%m-%d %H:%M:%S'))

this should work, or something like it anyway
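For reference, here is a minimal standalone sketch of that conversion. It uses three of the sample strings posted earlier and assumes every value has the `YYYY-MM-DD HH:MM:SS` shape that sqlite's `datetime('now', 'localtime')` produces. Note the argument order of `strptime`: the string comes first, the format second.

```python
from datetime import datetime

# A shortened copy of the sample timestamps posted above.
raw_times = ['2021-05-07 18:07:52', '2021-05-07 18:08:38', '2021-05-07 18:21:21']

# strptime(date_string, format): string first, format second.
time_x = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in raw_times]

# What a '%H:%M' formatter would display for each parsed value.
labels = [t.strftime('%H:%M') for t in time_x]
print(labels)  # ['18:07', '18:08', '18:21']
```

Once `time_x` holds real `datetime` objects instead of strings, the `DateFormatter('%H:%M')` from the original plotting snippet has actual times to format.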

#

and yes let's un-fuck your database in #databases

#

there is a "more right" way to do this

obsidian quail May 7, 2021, 7:49 PM

#

👍

fleet tundra May 7, 2021, 7:54 PM

#

Hello, I'm trying to find the combination of one entry of Column B with the other entries in the column B while trying to rank the top 5 entries it repeatedly occurs with reference to column A. Can anyone help me how to do this with pandas?

lapis sequoia May 7, 2021, 8:05 PM

#

how would u attempt to train a nn with a lot of classes and low images? or at least, not the same amount of images per class

desert oar May 7, 2021, 8:08 PM

#

fleet tundra Hello, I'm trying to find the combination of one entry of Column B with the othe...

this description is very unclear. can you try to clarify and provide examples?

grave frost May 7, 2021, 8:09 PM

#

lapis sequoia how would u attempt to train a nn with a lot of classes and low images? or at le...

pre-training

desert oar May 7, 2021, 8:09 PM

#

lapis sequoia how would u attempt to train a nn with a lot of classes and low images? or at le...

this is a problem for any kind of model, not just a neural network. these are also 2 different problems to some extent. the 2nd problem is "imbalanced data" and the first one is just "not having enough data". you can use transfer learning for "not enough data" (as long as you can find a pre-trained model that's relevant) but imbalanced data is harder imo

grave frost May 7, 2021, 8:09 PM

#

~~imagenet to the rescue~~

desert oar May 7, 2021, 8:10 PM

#

for the not-enough-data case, if you can gather a large amount of unlabeled data but only a small amount of labeled data, train an unsupervised model on the big unlabeled dataset then use it to create features for the small labeled dataset

#

for imbalanced data, honestly even when i was a professional data scientist working with other professional data scientists we still struggled with this

#

it is a hard and unsolved problem

grave frost May 7, 2021, 8:12 PM

#

desert oar for imbalanced data, honestly even when i was a professional data scientist work...

what solutions you solved it with BTW apart from artificially weighting and Data aug??

desert oar May 7, 2021, 8:12 PM

#

sometimes (eg with images) you can get some traction with data augmentation and/or generation. you can also try oversampling and/or undersampling but i dont know of anyone who gets great results with that.

desert oar May 7, 2021, 8:12 PM

#

grave frost what solutions you solved it with BTW apart from artificially weighting and Data...

spending a fuckton of money and time to acquire more labeled data

#

i.e. we didnt solve it...

grave frost May 7, 2021, 8:12 PM

#

desert oar spending a fuckton of money and time to acquire more labeled data

well, sometimes the simplest solutions work the best 😉

desert oar May 7, 2021, 8:13 PM

#

at least, that alleviated the worst of the problem. we still ended up with severely unbalanced data, and we kind of just accepted that our accuracy on those classes would be really bad

grave frost May 7, 2021, 8:13 PM

#

hmmm....what was it on tho?

desert oar May 7, 2021, 8:13 PM

#

so we adjusted our performance metrics and set expectations with the business stakeholders accordingly

grave frost May 7, 2021, 8:13 PM

#

like the task/dataset?

desert oar May 7, 2021, 8:13 PM

#

we never got improvements by using any "fancy" methods. it only ever added noise.

#

yeah, our business had a huge amount of hand-constructed "categories" for different types of businesses, and we had to figure out the type of a business based on whatever we could find about it

grave frost May 7, 2021, 8:14 PM

#

tried DAGAN? it works for quite many use cases. I was thinking of using it, but didn't want to spend so much compute power/$ on it.

grave frost May 7, 2021, 8:15 PM

#

desert oar yeah, our business had a huge amount of hand-constructed "categories" for differ...

ahh, tabular.

desert oar May 7, 2021, 8:15 PM

#

we could get its address (e.g. for zoning information), name, scrape the web for its facebook page, etc

#

so as you can imagine we got great results when distinguishing photographers from nightclubs, but distinguishing bars from nightclubs was a lot harder

#

(made up example but you get the idea)

#

and the imbalance was because we only had like 3 nightclubs and 6,000 bars (again made up but not far from what we saw in some cases)

grave frost May 7, 2021, 8:16 PM

#

don't companies tell what they do on the website? just scrape it all, filter and BERT is up

desert oar May 7, 2021, 8:16 PM

#

and there were 1000+ of these classes

grave frost May 7, 2021, 8:16 PM

#

desert oar and there were 1000+ of these classes

lemon_exploding_head

desert oar May 7, 2021, 8:17 PM

#

you would think so, and yes we used something based on bert ensembled with another model using a bunch of tabular metadata

grave frost May 7, 2021, 8:17 PM

#

uh-huh. tried weighting?

desert oar May 7, 2021, 8:18 PM

#

yep it helped a bit

#

but we ran up against the lower bound of "almost 0 data" at some points

#

weighting is great when you have 15 observations vs 150 observations

#

but when you have 5 observations youre kind of at the mercy of the dungeon generation algorithm, so to speak
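A rough sketch of what "weighting" can mean here, assuming the common inverse-frequency heuristic `n_samples / (n_classes * class_count)` (the same formula scikit-learn uses for `class_weight="balanced"`). The function name and the bar/nightclub labels are made up for illustration; this is not what the team above actually ran.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: rare classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical 150-vs-15 split, mirroring the example in the message above.
labels = ["bar"] * 150 + ["nightclub"] * 15
weights = balanced_class_weights(labels)
print(weights)
```

With only a handful of observations the weight gets very large, so a few possibly unrepresentative examples dominate the loss, which matches the point made above about the 5-observation case.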

grave frost May 7, 2021, 8:19 PM

#

oof.

#

that looks hard af

desert oar May 7, 2021, 8:19 PM

#

it sucked and it sucked the life out of our team and i think theyre still working on it long after i quit

#

but now im just ranting 😛

grave frost May 7, 2021, 8:19 PM

#

hmmm....

desert oar May 7, 2021, 8:20 PM

#

i havent used DAGAN

grave frost May 7, 2021, 8:20 PM

#

nah, leave it

#

it's for image data augmentation using GAN's

fleet tundra May 7, 2021, 8:20 PM

#

desert oar this description is very unclear. can you try to clarify and provide examples?

Ah, I'm sorry. Column A has order details and column B has the products purchased, column c has the client details. I wanna group the products that occur together in different orders and rank them based on the frequency with which they occur with other products. Then I want to assign the top 5 product occurences for each client according to their history

desert oar May 7, 2021, 8:20 PM

#

Data Augmentation Generative Adversarial Networks
heh i did actually consider doing something like this

#

seemed like a rabbit hole though

grave frost May 7, 2021, 8:21 PM

#

doesn't work too well on numerical? atleast I haven't read any papers that do explore smthin like that

desert oar May 7, 2021, 8:21 PM

#

i wouldnt know. also a lot of the data was categorical anyway

#

"annual revenue < 10k, 10-100k, 100k+"

#

giant clusterfuck

grave frost May 7, 2021, 8:21 PM

#

anyways, if those companies are in US, doesn't some org keep track of companies and their type?

desert oar May 7, 2021, 8:22 PM

#

you would think so!

#

several do

#

none of them do it reliably

#

its not like humans who have a social security number and 3 well curated credit scores available from 3 well known agencies

#

its a dumpster fire and there isn't even a good "ripe for innovation" solution

#

its probably better in other countries where the government actually does things

desert oar May 7, 2021, 8:23 PM

#

fleet tundra Ah, I'm sorry. Column A has order details and column B has the products purchase...

so each row is a single product, and several rows can be part of the same order details?

grave frost May 7, 2021, 8:23 PM

#

I mean - you could do the reverse. if data on bar clubs is less, then reverse search and scrape

#

atleast you can get that 5 up to 10, if not 15

desert oar May 7, 2021, 8:24 PM

#

wdym reverse search

fleet tundra May 7, 2021, 8:24 PM

#

desert oar so each row is a single product, and several rows can be part of the same order ...

Yes, that's right.

grave frost May 7, 2021, 8:25 PM

#

desert oar wdym reverse search

search up companies in "bar" category

#

and scrape

desert oar May 7, 2021, 8:25 PM

#

fleet tundra Yes, that's right.

and you want to count the frequency with which any pair of two products occurs? or something more complicated?

desert oar May 7, 2021, 8:26 PM

#

grave frost search up companies in "bar" category

what, like on google? its an interesting idea, although labeling would get very messy since the org had some pretty specific and weird labels

#

like i said we ended up spending a lot of money to do pretty much that: find more businesses in these categories

#

the problem was that the data you can buy from 3rd parties doesnt have the special in-house labels

fleet tundra May 7, 2021, 8:28 PM

#

desert oar and you want to count the frequency with which any pair of two products occurs? ...

Frequency with which every product occurs within different order details. Then group the products that occur together and assign them to the client for each product they bought. One client can have multiple rows of order details and which in turn can have multiple rows of product details

desert oar May 7, 2021, 8:42 PM

#

fleet tundra Frequency with which every product occurs within different order details. Then g...

can you give an example? it sounds like you might want to use .groupby

grave frost May 7, 2021, 8:43 PM

#

desert oar the problem was that the data you can buy from 3rd parties doesnt have the speci...

branched model then

#

2 inputs - for both dataset categories

#

some might overlap, so you can hand remove them

#

but overall, would be some work to ensure proper dtypes and no NaN - but I guess would be easy for data scientists

desert oar May 7, 2021, 8:45 PM

#

got a reference on this? i ended up separately creating vector embeddings on the "big" unlabeled dataset then applied those to the "small" labeled data

lapis sequoia May 7, 2021, 8:46 PM

#

mmm

grave frost May 7, 2021, 8:46 PM

#

desert oar got a reference on this? i ended up separately creating vector embeddings on the...

depends on the data type tho. if 3rd party is givin tabular, and you are using NLP then you have to have 2 models 🤷 but again, you are actual scientists so..

lapis sequoia May 7, 2021, 8:46 PM

#

is there a way to search with a py script images on google images and download them?

grave frost May 7, 2021, 8:46 PM

#

desert oar got a reference on this? i ended up separately creating vector embeddings on the...

not even contextual?

lapis sequoia May 7, 2021, 8:46 PM

#

so since i have the labels names, i can search the label on google and download a few

grave frost May 7, 2021, 8:47 PM

#

lapis sequoia so since i have the labels names, i can search the label on google and download ...

would be easy - google it up

#

but the real pain would be the data cleaning

lapis sequoia May 7, 2021, 8:47 PM

#

i dont wanna do it manually lol

desert oar May 7, 2021, 8:47 PM

#

grave frost depends on the data type tho. if 3rd party is givin tabular, and you are using N...

im not entirely sure what a model with 2 inputs would look like here. lets say you have 1 million records from dataset A and 5000 records from dataset B, and B might or might not contain some records from A.

grave frost May 7, 2021, 8:47 PM

#

lapis sequoia i dont wanna do it manually lol

I meant google up on how to write one

lapis sequoia May 7, 2021, 8:48 PM

#

ah

desert oar May 7, 2021, 8:48 PM

#

lapis sequoia is there a way to search with a py script images on google images and download t...

scripting google searches is probably against their terms of service, unless you use an official google image search api. discussing ToS violations is against our server's rules and therefore we can't help you with that aspect of your project.

#

!rules 5

arctic wedgeBOT May 7, 2021, 8:48 PM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious or inappropriate. Do not help with ongoing exams. Do not provide or request solutions for graded assignments, although general guidance is okay.

lapis sequoia May 7, 2021, 8:48 PM

#

unless you use an official google image search api

#

so u cant help me

#

nice

desert oar May 7, 2021, 8:48 PM

#

those are the rules

grave frost May 7, 2021, 8:48 PM

#

desert oar im not entirely sure what a model with 2 inputs would look like here. lets say y...

model with 2 inputs would require concatenation. you can directly ref with keras - I personally haven't implemented it so take my ideas with a grain of salt

lapis sequoia May 7, 2021, 8:48 PM

#

i repeat

#

unless you use an official google image search api

#

maybe u dont even read what u write

grave frost May 7, 2021, 8:49 PM

#

lapis sequoia i repeat

Im pretty sure it was just a continuation of his messages, not a reply to you

#

don't get too triggered

desert oar May 7, 2021, 8:49 PM

#

i dont think being rude or sarcastic to volunteer helpers on the internet is a good idea either

lapis sequoia May 7, 2021, 8:50 PM

#

so how are u a volunteer helper if u just said u wont

#

XDDD

#

xdxdxd

grave frost May 7, 2021, 8:50 PM

#

desert oar im not entirely sure what a model with 2 inputs would look like here. lets say y...

for tf/keras, a functional model can easily accomplish that

#

very high model complexity, but less time mucking about data

desert oar May 7, 2021, 8:51 PM

#

grave frost model with 2 inputs would require concatenation. you can directly ref with keras...

i guess im still not sure what this would mean. if i have 2 different "images" that together constitute a single record (maybe 2 different rows from 2 different databases that refer to the same entity), it makes sense. but how would i "concatenate" anything about 2 completely different entities and get any sensible results that i can use on a single entity at prediction time?

#

what would such a model be learning?

#

i have to step out but @ me so i dont miss your messages

fleet tundra May 7, 2021, 8:53 PM

#

desert oar can you give an example? it sounds like you might want to use `.groupby`

Well, yeah. Column A has say 10 orders from 1,2,3 (clients) with each order having a combo from the list of products a, b, c, d, e. I want to know in 10 orders how many times 'a' occurs along with 'b' and 'd' and with which orders so I can assign them based on the client. I want to do the same with every product. That is to find the combinations within product column but order specific. Then I want to find the aggregate of the most occurred combinations for each product

#

Do I make sense?

dim olive May 7, 2021, 8:55 PM

#

lapis sequoia so how are u a volunteer helper if u just said u wont

Hello, we are all volunteers here, meaning everyone here takes personal time to do this. No one is obliged to answer your questions, and we will never provide help when it breaks our rules. Please be mindful of this in the future.

lapis sequoia May 7, 2021, 8:55 PM

#

????

#

wtf are u talking about? lol. When did i obliged someone? rofl

grave frost May 7, 2021, 8:58 PM

#

desert oar i guess im still not sure what this would mean. if i have 2 different "images" t...

@desert oar basically from what I know, you have two input layers to take data from those 2 datasets. (tbh visually would be better) so when you build your networks for those 2 inputs - you basically have a multi-branch network (something like image segmentation and bounding box together, I believe)

Anyways, you can have initial layers for each input (say convs) and then at some point you would need a bottleneck - to merge both inputs together. you would have to create multiple other "branches" too to capture complex hierarchical relations from both datasets.

This is where concat comes in - it would provide a bottleneck. I have attached a visual image that kinda gives an example. While a single LSTM layer may not serve most datasets well - this is where multi-branched networks come in.

nets like inception are a pain in the <> to work with due to their branched structure, but ofc complex model helps so much better than spending months on data processing and alignment.

again, I haven't implemented so there may be some important aspect I might have missed out - so take my ideas with a grain of salt 👍

#

Another extremely basic one - doesn't seem to be too much of a problem as long as you pool and flatten the conv outputs eh?

dim olive May 7, 2021, 9:00 PM

#

lapis sequoia wtf are u talking about? lol. When did i obliged someone? rofl

I think you may have misunderstood what I said. "Obliged" here means that no user is required to answer a question. This is not really the correct place to discuss this. You can contact modmail if you would like to discuss it.

lapis sequoia May 7, 2021, 9:01 PM

#

i dont have to discuss anything. It seems u didnt read the whole conversation

grave frost May 7, 2021, 9:01 PM

#

theoretically, sounds good. multiple branches constituting different networks architectures working on different data types. but I guess you can always ping up your buddies to suggest and see if they might research a bit bout it

#

oh wait - you can also do transfer learning 🤣 it would be hell, but if you train different branches separately and use set_weights/get_weights to reconstruct a separate tf.keras.Model with it, you can also fine-tune the whole damn thing brainmon

desert oar May 7, 2021, 10:41 PM

#

@grave frost i think this still doesn't apply to the particular case i described, whereas transfer learning would. this is just segmenting the input features for a single record. it's a great idea, kind of like doing what we did with building 2 models and ensembling them, but in a single network

#

but it doesn't solve the "use data from the big dataset to inform model on small dataset"

#

whereas transfer learning is meant for this (re: our discussion from a while ago)

#

nice graphics though

#

very helpful

desert oar May 7, 2021, 11:10 PM

#

fleet tundra Well, yeah. Column A has say 10 orders from 1,2,3 (clients) with each order havi...

so the algorithm would be something like this:

for each client:
  for each product:
    count each combination with other products

fleet tundra May 7, 2021, 11:12 PM

#

desert oar so the algorithm would be something like this: ``` for each client: for each p...

Yes. But if a client has multiple orders, it has to be filtered by order too

desert oar May 7, 2021, 11:12 PM

#

it'd help if you gave some actual example data and example outputs

#

im not sure what "filtered by order" means, i thought that's what a combination w/ other products meant

fleet tundra May 7, 2021, 11:16 PM

#

Okay. Sorry about that.

```
Order        Product       Client
100          A             1
             B
101          A             2
             C
101a         B
             C
102          D             3
             A
             B
             C
```
If that's my data, I'd like each client to be recommended other products based on other purchases. Here client 2 has 2 orders, so while grouping, each order is treated as a separate entity when aggregating product suggestions.
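One possible reading of this, sketched with the standard library only: treat each order as a basket, count how often every pair of products shares a basket, then look up the most frequent partners of a given product. The rows are the example table above, assuming order 101a belongs to client 2; `top_partners` is a hypothetical helper name, and this is only a co-occurrence count, not a full recommender.

```python
from collections import Counter, defaultdict
from itertools import combinations

# (order, product, client) rows from the example table above.
# Assumption: order "101a" belongs to client 2.
rows = [
    ("100", "A", 1), ("100", "B", 1),
    ("101", "A", 2), ("101", "C", 2),
    ("101a", "B", 2), ("101a", "C", 2),
    ("102", "D", 3), ("102", "A", 3), ("102", "B", 3), ("102", "C", 3),
]

# Group products into one basket per order.
orders = defaultdict(set)
for order, product, _client in rows:
    orders[order].add(product)

# Count every unordered product pair that appears in the same order.
pair_counts = Counter()
for products in orders.values():
    pair_counts.update(combinations(sorted(products), 2))

def top_partners(product, n=5):
    """Return up to n (partner, count) pairs that co-occur most often with product."""
    partners = Counter()
    for (a, b), count in pair_counts.items():
        if a == product:
            partners[b] += count
        elif b == product:
            partners[a] += count
    return partners.most_common(n)

print(pair_counts[("A", "B")])  # 2 (orders 100 and 102)
print(top_partners("A"))
```

To turn this into per-client suggestions, one option is to call `top_partners` for each product a client bought and drop the products that client already has.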

fleet tundra May 7, 2021, 11:17 PM

#

desert oar im not sure what "filtered by order" means, i thought that's what a combination ...

That's to know what the combination is per order

fervent turret May 7, 2021, 11:18 PM

#

def formatCurrency (balance):
if balance == savBal:
return str(updateBalance(balance, savIR))

def updateBalance (balance, rate):
savBal = balance + balance * rate/100
else:
chBal = balance + balance * rate/100
return balance + balance * rate/100

savBal = float(input("Enter your savings balance:"))
chBal = float(input("Enter your checking balance:"))
savIR = float(input("Enter your savings interest rate %:"))
chIR = float(input("Enter your checking interest rate %"))

print("Your updated savings balance is", formatCurrency(savBal))
print("Your updated checking balance is", formatCurrency(chBal))

#

can someone plese help me

#

please*

grave frost May 7, 2021, 11:19 PM

#

desert oar @grave frost i think this still doesn't apply to the particular case i...

You can think of it as a single architecture trying to coordinate towards optimising weights for multiple other architectures all for the aim of increasing accuracy.

serene scaffold May 7, 2021, 11:19 PM

#

@fervent turret this is not a data science question. See #❓｜how-to-get-help

grave frost May 7, 2021, 11:21 PM

#

ofc, I wouldn't know the immediate advantages of that as opposed to a ensemble, but I don't see in what case ever unified architectures would not be appropriate.

#

Plus you are missing the key point @desert oar with multiple input nodes, the network can optimize what features to use from each branch and using them in various degrees (dropping features with no significance) and allows it to pool the information gained from each branch far more accurately than a naive ensemble.

lapis sequoia May 8, 2021, 12:33 AM

#

So, I have an interesting thing with GPT-2 and discord.
I wanna use GPT-2 to make a chatbot, but I don't know how to filter out previous messages from it's output so it only makes 1 message in response to other messages and stuff.
Currently, I append every message into a list called convo, and I feed it the previous 3 messages.

#

It generates the message and sends it to the discord channel, which triggers another message, adding its message to the conversation. The convo list stores everything, but when I pass the previous 3 messages in, its output also contains those previous messages, which leads to messages getting huge, like over 2000 characters, in a matter of seconds. My current code is:

if message.channel.id == focus_channel:
      messages = chatlog[-3:]
      convo = ""
      for x in chatlog:
        convo += f"{x}\n"
      inputs = tokenizer.encode(convo, return_tensors='pt')
      outputs = model.generate(inputs, max_length=50, do_sample=True)   
      text = tokenizer.decode(outputs[0], skip_special_tokens=True)
      print(text.split("\n"))
      await message.channel.send(text,reference=message,mention_author=False)

austere swift May 8, 2021, 12:45 AM

#

can't you just take the last message from the convo list?

#

i might be misunderstanding your question

lapis sequoia May 8, 2021, 12:50 AM

#

I want to take the last few messages so that it continues the conversation instead of completing the message.

#

Currently it's output is

message1
message2
message3
bot_response
with
newlines

carmine iron May 8, 2021, 12:51 AM

#

sometimes i see

#

what does this mean / do

lapis sequoia May 8, 2021, 12:59 AM

#

lapis sequoia Currently it's output is ``` message1 message2 message3 bot_response with newlin...

@austere swift

austere swift May 8, 2021, 1:00 AM

#

so just take the bot_response line

#

you can do something like text.split("\n")[3]

lapis sequoia May 8, 2021, 1:04 AM

#

But what about all the rest of it's response? (It's responses have newlines in them)

#

wait

#

I can probably just do text.split("\n")[3:]

#

Nope, that breaks horribly, just like the other times lol
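A likely cause: for a decoder-only model like GPT-2, the generated sequence starts with the prompt tokens, so decoding the whole output repeats the last messages. Slicing off the first `len(prompt)` tokens before decoding is usually more robust than splitting the decoded text on newlines. The sketch below uses plain lists as stand-ins for token tensors so it runs without any model; `strip_prompt` and `first_message` are hypothetical helper names.

```python
def strip_prompt(output_ids, prompt_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]

def first_message(text):
    """Optionally cut the reply at the first non-empty line to send one message."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""

# Stand-in token ids: the "model" echoes the prompt, then adds new tokens.
prompt_ids = [11, 12, 13, 14]
output_ids = prompt_ids + [21, 22, 23]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)  # [21, 22, 23]

# With the variables from the snippet above this would be roughly
# outputs[0][inputs.shape[-1]:] passed to tokenizer.decode (assumption, untested here).
print(first_message("\nbot_response\nwith\nnewlines"))  # bot_response
```

Two other things in the posted snippet are worth checking: the loop iterates over all of `chatlog` rather than the `messages` slice, and `max_length=50` counts the prompt tokens too, so a long prompt leaves little or no room for a reply (`max_new_tokens` limits only the generated part).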

#

mint palm May 8, 2021, 6:10 AM

#

#

the code of a course using tf.train.GradientDescentOptimizer(0.01).minimize(cost)

#

but this GradientDescentOptimizer is in version 1 of tensorflow

#

should i be doing this course?

#

or is it outdated

autumn basin May 8, 2021, 7:00 AM

#

It’s outdated

#

If it’s using TF 1.0

mint palm May 8, 2021, 7:01 AM

#

#

they say this

ripe forge May 8, 2021, 7:01 AM

#

Tf2, specifically its Keras API, will be a lot nicer.

#

Unfortunately you'll probably run into this issue with a lot of courses, since TensorFlow 2 is relatively new.

#

So as long as you're willing to port the code over, or at least get a sense of how the same code could be written in Tf2, you could proceed.
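For a sense of what is being ported: `tf.train.GradientDescentOptimizer(0.01).minimize(cost)` applies plain gradient descent with learning rate 0.01, and the Tf2 counterpart is `tf.keras.optimizers.SGD(learning_rate=0.01)`. The update rule itself does not depend on the TensorFlow version. Here is a framework-free sketch on a made-up one-parameter cost, `(w - 3)^2`, chosen only for illustration.

```python
def minimize(grad_fn, w, learning_rate=0.01, steps=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Hypothetical cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_final, 3))  # 3.0
```

The course material stays valid as long as you can map each optimizer call onto this idea; only the API surface changed between versions.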

mint palm May 8, 2021, 7:02 AM

#

mint palm

this sounds promising

ripe forge May 8, 2021, 7:03 AM

#

Ultimately the specifics of writing code don't really take away from the learnings offered by a course itself. So decide based on whether the course is good or not.

mint palm May 8, 2021, 7:03 AM

#

ok i believe its good

ripe forge May 8, 2021, 7:04 AM

#

OK cool, then in that case just keep the caveat in mind: you might have to modify the code presented

#

In most cases Tf2 keras looks very similar to keras so it's an easy port if they use keras.

mint palm May 8, 2021, 7:06 AM

#

yeah they say they will use keras later

silver widget May 8, 2021, 8:19 AM

#

Hi all, got a question about overfitting. Can this be assumed as overfitting? or the model is good?
LR scores
0.9452887537993921
              precision    recall  f1-score   support

           0       0.95      1.00      0.97       933
           1       0.00      0.00      0.00        54

    accuracy                           0.95       987
   macro avg       0.47      0.50      0.49       987
weighted avg       0.89      0.95      0.92       987

[[933   0]
 [ 54   0]]

#

I see the accuracy is good but the 0's at the negative side makes me question the model

#

also the precision, recall and f1 scores of 1( people had a stroke) is 0

ripe forge May 8, 2021, 9:22 AM

#

Is this score on train or test.

ripe forge May 8, 2021, 9:22 AM

#

silver widget Hi all, got a question about overfitting. Can this be assumed as overfitting? or...

Also, the answer is neither.

ripe forge May 8, 2021, 9:23 AM

#

silver widget I see the accuracy is good but the 0's at the negative side makes me question th...

This should hopefully indicate to you that accuracy is a terrible metric for this. Clearly this model is useless

#

This model can be entirely replaced by a single line of code print("NO stroke")

#

To state it more explicitly, the problem here is class imbalance. Your dataset is going to have more cases without stroke than with. Using accuracy as a metric then, would favour a model that just gets the no stroke cases correct.

#

I'm going to go out on a limb and assume that you agree that would make for a fairly bad model. So this isn't a good model, and our metric choice of accuracy isn't appropriate

#

As for overfitting, you only get a sense of that if you compare the fit on train vs the fit on test, after choosing a good metric.
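To make the point concrete, the numbers can be recomputed from the posted confusion matrix, assuming the usual scikit-learn layout (rows are true labels, columns are predictions). Balanced accuracy is shown as one example of a metric that does not reward always predicting the majority class; it is a suggestion, not something the chat settled on.

```python
# Confusion matrix from the message above: [[TN, FP], [FN, TP]].
cm = [[933, 0], [54, 0]]
(tn, fp), (fn, tp) = cm

def safe_div(numerator, denominator):
    """Return 0.0 instead of dividing by zero (e.g. no positive predictions)."""
    return numerator / denominator if denominator else 0.0

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall_stroke = safe_div(tp, tp + fn)       # share of real strokes that were caught
precision_stroke = safe_div(tp, tp + fp)    # share of stroke predictions that were right
recall_no_stroke = safe_div(tn, tn + fp)
balanced_accuracy = (recall_stroke + recall_no_stroke) / 2

print(round(accuracy, 4))   # 0.9453
print(recall_stroke)        # 0.0
print(balanced_accuracy)    # 0.5
```

A balanced accuracy of 0.5 is what a constant "no stroke" prediction scores, even though plain accuracy looks like 95%.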

silver widget May 8, 2021, 9:28 AM

#

ripe forge Is this score on train or test.

This score is based on 'test'

silver widget May 8, 2021, 9:29 AM

#

ripe forge To state it more explicitly, the problem here is class imbalance. Your dataset i...

Exactly, the data has lot more 'no stroke' than 'stroke' output.

silver widget May 8, 2021, 9:29 AM

#

ripe forge As for overfitting, you only get a sense of that if you compare the fit on train...

Thank you, will try that now.

small mulch May 8, 2021, 9:45 AM

#

Hello, I am writing a code, which scrapes data from a website. Now, I want to put it in an excel file. What will be the best - csv or pandas or if there is something else? Also, I want it to append the new fields to the existing file and not make a new one when it is run again

ripe forge May 8, 2021, 9:56 AM

#

Pandas should be easy to work with once you're used to it.

small mulch May 8, 2021, 9:57 AM

#

Okay, Thank you so much!

somber prism May 8, 2021, 11:02 AM

#

anyone know how to fill in the specified value based on the query for the df

#

so for eg if theres a 0 in anyone of the column in pandas df , i want to change those values to some specified val

#

how do i do it without using for or while loop

#

i know about fillna but this isnt for NaN values

desert oar May 8, 2021, 11:10 AM

#

grave frost Plus you are missing the key point @desert oar with multiple input no...

No, I got this point. And I do see how that it could propagate information more effectively than a naive ensemble. And I will definitely want to try it in a project at some point. But conceptually it's the same thing as ensembling two models, it just lets the learning process be smarter about the ensembling.

desert oar May 8, 2021, 11:16 AM

#

fleet tundra Okay. Sorry about that. ```Order Product Client 100 A ...

But I still don't understand how you want to calculate these recommendations. Can you give an example of the frequencies you calculate for this data?

Also, there are known algorithms that sound like what you're trying to do. You might want to look into "association rules" https://towardsdatascience.com/association-rules-2-aa9a77241654, and "collaborative filtering" https://towardsdatascience.com/intro-to-recommender-system-collaborative-filtering-64a238194a26.

desert oar May 8, 2021, 11:34 AM

#

somber prism so for eg if theres a 0 in anyone of the column in pandas df , i want to change ...

!e ```python
import pandas as pd

x = pd.Series([1,1,0,0,2,2])
print(x.tolist())

# Option 1
x1 = x.copy()
x1.loc[x1 == 0] = 9
print(x1.tolist())

# Option 2
x2 = x.apply(lambda val: 9 if val == 0 else val)
print(x2.tolist())
```

arctic wedgeBOT May 8, 2021, 11:34 AM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [1, 1, 0, 0, 2, 2]
002 | [1, 1, 9, 9, 2, 2]
003 | [1, 1, 9, 9, 2, 2]

somber prism May 8, 2021, 11:35 AM

#

desert oar !e ```python import pandas as pd x = pd.Series([1,1,0,0,2,2]) print(x.tolist())...

thanks

desert oar May 8, 2021, 11:36 AM

#

small mulch Hello, I am writing a code, which scrapes data from a website. Now, I want to pu...

writing one line at a time is easier with the csv module. once you have created the file, you should use pandas to work with it, convert to xlsx if needed, etc.
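A small sketch of the append-on-every-run part with the `csv` module. The file name, field names and rows are made up, and the demo writes into a temporary directory so it leaves nothing behind; the header is written only when the file is new or empty.

```python
import csv
import os
import tempfile

def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file; write the header only for a new or empty file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)

def read_rows(path):
    """Read the whole file back as a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Demo: two separate "runs" of a scraper appending to the same file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scraped.csv")  # hypothetical file name
    fields = ["title", "price"]              # hypothetical columns
    append_rows(path, fields, [{"title": "first item", "price": "10"}])
    append_rows(path, fields, [{"title": "second item", "price": "12"}])
    all_rows = read_rows(path)

print(len(all_rows))  # 2
```

From there, `pandas.read_csv` can load the file, and `DataFrame.to_excel` can produce an xlsx if one is needed (that step requires an Excel writer package such as openpyxl).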

grave frost May 8, 2021, 12:34 PM

#

@lapis sequoia GPT-2 can't hold a convo very effectively though (not as good as it's better counterpart)

grave frost May 8, 2021, 12:36 PM

#

desert oar No, I got this point. And I do see how that it could propagate information more ...

maybe it might be 🤷 I will open a question about this, cuz I can't find any good resources

winged stratus May 8, 2021, 1:50 PM

#

I wanted to know how many images are required to make a decent quality gan

#

ofc thats a very general question

#

but i have 300 1000x1000 images of mountains

#

and i plan on training a wasserstein gan on these images

#

would that be enough

#

or should i collect more images?

#

i have trash computer, so i dont want to waste a lot of time on a gan that wont work on 300 images

#

thanks!

grave breach May 8, 2021, 2:24 PM

#

Even though GANs can work with very little data, I don't think 300 is enough

#

Also, I think in this case lowering the quality could help a lot

#

@winged stratus

winged stratus May 8, 2021, 2:29 PM

#

thanks

#data-science-and-ml
