#data-science-and-ml

latent blaze
#

wsl actually is pain

misty flint
#

i remember asking that question. then our last team project was on making a chatbot

#

then i figured it out

#

rasa 11/10

#

if you ever need to make a chatbot for deployment

sour abyss
#

how can i calculate the p-value of a test statistic in python?

#

as of right now i only plan on doing t stat, z, chi square and slope of regression line

#

this is what i have as of right now

velvet thorn
#

or using a library

sour abyss
#

ideally manually

velvet thorn
#

you know the mathematics behind these calculations, yes?

sour abyss
#

yes

#

for the most part at least

velvet thorn
#

okay so

#

what problems are you having exactly

sour abyss
#

i can find the correct standardized test statistic, such as z-score, t stat, chi square etc. but i'm clueless on how to find the p value from that. i know that hardcoding in a table of z score probability values is not the way to go either

velvet thorn
#

okay so

velvet thorn
#

think of the p-value as the area under the PDF, right

#

(which corresponds to the CDF)

#

that's all calculus

sour abyss
#

from a normal curve the p-val would be the area to the right of it correct?

sour abyss
#

oh right if you're doing like a 2 sided z stat it's P(X > z) and P(X < -z) iirc

velvet thorn
#

ye

#

so

#

those are definite integrals

sour abyss
#

uhh

#

ohhhhhhh right i think it's the area covered to the corresponding side of the test statistic right?

#

just finding the % of the area to the right/left of the statistic

#

if it was 1 sided with a null hypothesis of p = 0 and alternate hypothesis of p > 0, the lower and upper bounds of the integral are from the z-score to infinity, right?

#

i think

exotic maple
# velvet thorn those are definite integrals

Do people actually do integrals to get those from the PDF? I've never, even in college, integrated the PDF over a range to get the p-value. Sounds a bit overkill unless explicitly for teaching. Most distributions have their standard tables, no?

velvet thorn
#

they said

#

they want to do it manually

exotic maple
sour abyss
#

I found a library called scipy which can do integrals for you, but all this integral stuff is completely new to me. I've messed around with it back in middle school, but it's surprising that you can use an integral to find a CDF for a normal curve. Ended up just using scipy in the program, because manually configuring tables and, god forbid, degrees of freedom didn't look like the best option programmatically

velvet thorn
#

you can calculate the integral manually

#

well

#

it depends on what you want to do I guess

#

whether it's for learning

#

but anyway

#

the CDF is the integral of the PDF
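
The idea in the last few messages, that a p-value is a tail area under the PDF, can be checked numerically without any table. A minimal sketch for the z case only, assuming a standard normal null distribution; the function names and the integration cutoff of 10 are illustrative choices, not anything from the conversation:

```python
import math


def normal_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail_numeric(z, upper=10.0, steps=100_000):
    """Approximate P(X > z) with the trapezoid rule on [z, upper].

    The density beyond `upper` is negligible for a standard normal,
    so the infinite upper bound is cut off there (an assumption).
    """
    h = (upper - z) / steps
    total = 0.5 * (normal_pdf(z) + normal_pdf(upper))
    for i in range(1, steps):
        total += normal_pdf(z + i * h)
    return total * h


def upper_tail_exact(z):
    """Closed form of the same area via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = 1.96
p_one_sided = upper_tail_numeric(z)            # H1: mean > 0
p_two_sided = 2.0 * upper_tail_numeric(abs(z)) # P(X > |z|) + P(X < -|z|)
print(p_one_sided, upper_tail_exact(z), p_two_sided)
```

The same tail-area idea carries over to t and chi-square statistics with their own PDFs; a library-based version would normally call a survival function such as `scipy.stats.norm.sf` instead of integrating by hand.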

astral path
#

ok i need help really quickly

#

if i have two lists of different lengths but they're within the same range, how would i plot them over each other so they start and end at the same locations?

astral path
#

ok well im scratching that completely now

#

how do i find correlation between two lists with different lengths but which are within the same time series?

#

as in i measured one variable 5 times in an hour, and another 423 times in an hour, and need to see if they're correlated

#

thanks!

#

i have like 30 minutes btw, been working on this for HOURS
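
One possible approach to the different-lengths question, sketched under the assumption that both variables were sampled at evenly spaced times over the same hour: evaluate the dense series at the timestamps of the sparse one, then correlate the two equal-length arrays. The data below is synthetic and the variable names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed timestamps in minutes: 423 readings and 5 readings over one hour.
t_dense = np.linspace(0, 60, 423)
t_sparse = np.linspace(0, 60, 5)

# Synthetic stand-ins for the two measured variables.
dense = np.sin(t_dense / 10) + rng.normal(0, 0.05, t_dense.size)
sparse = np.sin(t_sparse / 10)

# Linearly interpolate the dense series at the 5 sparse timestamps.
dense_at_sparse = np.interp(t_sparse, t_dense, dense)

# Pearson correlation between the two aligned series.
r = np.corrcoef(sparse, dense_at_sparse)[0, 1]
print(r)
```

With only 5 aligned points the estimate is very noisy, so it is better read as a rough indication than as evidence of a relationship.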

lavish tundra
#

does someone know how to change all the symbols of a graph legend? I'm using seaborn and matplotlib, but even trying to use legend_handler of matplotlib I haven't had success

item_xy.legend(legend, fontsize=legendsize, bbox_to_anchor=(0.87, 1.15), loc=2, handler_map={item_xy: HandlerLine2D(numpoints=1)})
#

i'm talking about this

rigid bolt
#

A 28×28 numpy array can be interpreted as a matrix with order 28×28 right? For reference it was mentioned in the tensorflow docs

sinful gale
#

Can anyone explain me how polynomial regression works with the use of linear regression?

#

What does that line do?
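
On the polynomial regression question: the model is still linear in its coefficients, so ordinary least squares applies once the powers of x are treated as separate features. A minimal NumPy sketch on made-up noiseless data (the coefficients 2, -1 and 0.5 are arbitrary):

```python
import numpy as np

# Made-up data following y = 2 - x + 0.5 * x^2.
x = np.linspace(-3, 3, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2

# Design matrix with columns [1, x, x^2]: each power is just another feature.
X = np.column_stack([np.ones_like(x), x, x**2])

# Plain linear least squares on the expanded features.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Libraries that offer a "polynomial features" step followed by a linear model are doing the same two steps: expand the inputs, then fit a linear regression.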

hexed heath
#

Hi there! My question might be dumb, but I am looking for the most efficient way to select N rows in a matrix such that their distances to one another are:

  • ideally, maximum
  • at least, larger than a threshold
    I thought I could cluster my data into N clusters and pick one row from each (because that is my underlying idea of selecting rows of different classes), but I wonder if there is really a need for clustering.
    Thanks 🙂
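
One alternative to clustering for this kind of selection is greedy farthest-point sampling: start from one row, then repeatedly add the row that is farthest from everything chosen so far. This is a heuristic sketch, not a guaranteed maximum; the helper names are hypothetical, and the final minimum pairwise distance can be compared against the threshold:

```python
import numpy as np


def farthest_point_sample(X, n, start=0):
    """Greedily pick n row indices that are spread far apart."""
    chosen = [start]
    # Distance from every row to its nearest already-chosen row.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(n - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen


def min_pairwise_distance(X, idx):
    """Smallest distance between any two of the selected rows."""
    pts = X[idx]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return d[np.triu_indices(len(idx), k=1)].min()


X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
picked = farthest_point_sample(X, 3)
spread = min_pairwise_distance(X, picked)
print(picked, spread)
```

If `spread` falls below the required threshold, that suggests N is too large for the data, or that a different starting row is worth trying.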
mint palm
#

so there is nothing to explore there?

#

if you can give an example of how visualised attention maps would know what NN focus on binary classification of cats, that would be great?

solar loom
#

Can I get a little help on counting precision and recall of a search engine ?
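
For a single query, precision and recall come down to set arithmetic between what the engine returned and what is actually relevant. A minimal sketch with made-up document ids:

```python
# Made-up example: ids returned by the engine and ids judged relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7"}

hits = relevant & set(retrieved)

# Precision: share of returned results that are relevant.
precision = len(hits) / len(retrieved)
# Recall: share of relevant documents that were returned.
recall = len(hits) / len(relevant)

print(precision, recall)
```

Averaging these per-query values over a set of test queries gives an overall figure for the engine.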

ebon geyser
#

Anyone who has heard about AIML files?

little compass
#

Hey everybody!
I just uploaded a new video "Differentiable augmentation for GANs (using Kornia)"

https://youtu.be/J97EM3Clyys

GANs are known to be very data hungry. Are there ways how to make them more data efficient? As it turns out applying augmentations is not that straightforward. In this video, I explain a recent method called differentiable augmentation (DiffAugment) and use it to train the DCGAN.

In this video, I discuss the paper "Differentiable Augmentation for Data-Efficient GAN Training". Additionally, I take a few ideas from it and try to code up an experiment to investigate whether differentiable augmentation has any effect on GAN training. I use the open-source package Kornia to perform the augmentations. To make our lives simpler...

trail yoke
#

line 1, in <module>
from PyQt5 import QtCore, QtGui, QtWidgets
ModuleNotFoundError: No module named 'PyQt5'

#

help

lapis sequoia
#

what will be better for a data scientist future at microsoft?

  1. python and SQL
  2. python and TSQL
  3. R and SQL
  4. R and TSQL
red hound
#

Does anyone have good resources to learn about Tensorflow Graph Debugging? I have a GAN whose graph isn't entirely or correctly connected. Maybe there are issues caused by discrete or non-differentiable parts, so that no gradients can be calculated. I have no idea how to look deeper inside and find out what's wrong. From the outside everything looks fine and the model compiles and runs the forward pass

mint palm
#

so make sure your terminal is using the same python version that you installed the modules for

#

you can switch version in terminal by using

#

py -3.x filename.py
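
A quick way to check the point about mismatched interpreters is to ask the running Python which executable it is and whether it can see the package. A small sketch using only the standard library:

```python
import importlib.util
import sys

# Path of the interpreter that is actually running this script.
interpreter = sys.executable

# True only if this interpreter can locate PyQt5 on its import path.
has_pyqt5 = importlib.util.find_spec("PyQt5") is not None

print("interpreter:", interpreter)
print("PyQt5 importable:", has_pyqt5)
```

If this prints `False`, installing with that same interpreter (for example by running it with `-m pip install PyQt5`) keeps the package and the script on the same Python.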

ebon geyser
spare vortex
ebon geyser
#

Can u plzz see the link above?

spare vortex
#

I made a chatbot api with it

ebon geyser
#

Ooh, cool

#

Can u plzzzz see the link above?

ebon geyser
spare vortex
#

I mean it's still in development

#

I saw it

#

and I read it

#

basically AIML files are like HTML

#

but made for chatbots

#

what you have to do is

ebon geyser
#

Ok?

spare vortex
#

you need to create a file called
std-startup.xml

ebon geyser
#

I mean

#

I have all that

spare vortex
#

ah you di

#

do

#

then I will give you the resource

#

wait

ebon geyser
#

I just need some help with predicates and stuff

spare vortex
#

about aiml and pandorabots

ebon geyser
#

Uhhh, what's pandorabots?

spare vortex
spare vortex
#

it's very good platform to use chatbots from

ebon geyser
#

Ohhh

spare vortex
#

especially Kuki_ai

ebon geyser
#

Wait

#

The AIML files I have

#

Those all have version 1.0.1

spare vortex
spare vortex
#

not just aiml

ebon geyser
spare vortex
#

aiml is for python2

#

python-aiml is for python3

ebon geyser
#

Yes I know that

#

And is there a getting started guide? Or I need to see that link u sent?

spare vortex
#

aiml docs are the guiee

#

guide

#

and pandorabots is the example

ebon geyser
#

Oh, ok

spare vortex
#

of how would you make one

#

aiml is very easy and very good chatbot system

#

but it is rule based

ebon geyser
#

And can I use old AIML files?

spare vortex
#

so it has its disadvantages

spare vortex
#

so that wont matter

#

the only thing that will matter is the library

ebon geyser
spare vortex
#

you are using

#

not that

#

its deprecated

ebon geyser
#

Well does it matter?

spare vortex
#

use the official resources wait

ebon geyser
#

Am just getting the AIML files

spare vortex
#

wait a sec

ebon geyser
#

Sure

spare vortex
#

this and

ebon geyser
ebon geyser
spare vortex
ebon geyser
#

Oh ok cool

spare vortex
#

matter

#

the aiml files are same

ebon geyser
#

And also, if I want some aiml files, can I get those from any GitHub repo?

spare vortex
#

i used these in my chatbot api

spare vortex
#

just download using git

ebon geyser
#

And I also need to change the version?

spare vortex
#

clone it

ebon geyser
#

From 1.0 to 2.0

#

?

spare vortex
#

it will work

ebon geyser
#

Oh ok

spare vortex
#

aiml 2.0 will also work

ebon geyser
spare vortex
#

you can ofc

ebon geyser
#

`*` means anything

spare vortex
#

ye it will work

ebon geyser
#

Oh cool!

spare vortex
#

just try it and see lol

ebon geyser
#

So I can just copy paste the aiml tiles

#

Files*

spare vortex
#

yea

ebon geyser
#

BTW, if it's ok for u, may I ping/DM u, regarding them?

spare vortex
#

sure

ebon geyser
#

And also, have u used sessions and predicates, like I asked in that question above?

spare vortex
#

this one has everything

#

all the files you will need

ebon geyser
#

Ooh

#

Cool!

spare vortex
#

it has alice chatbot files
mitsuki and
standard aiml files

ebon geyser
#

Ooh

#

Those all are AIML files?!

raven knoll
#

is it possible to use TextBlob in the dutch language?

spare vortex
ebon geyser
#

Damn, this person did do a looot of work

spare vortex
#

yea

ebon geyser
#

BTW, have u used predicates or something?

#

The get and set methods?

spare vortex
#

they are not actually in aiml

#

you have tags

#

like learn
random

#

and say use

#

tags like that

ebon geyser
#

Ooh ok

#

Thanks for help!

spare vortex
#

look at the aiml website

ebon geyser
#

Appreciate it dude

spare vortex
#

np

spare vortex
#

you are using it for your discord bot right?

kindred blade
#

is it better to use matplotlib or Charts js to show on web

dark sigil
#

For my data analysis unit I need to clean up a database to create charts from the data for the report i'm creating. I'm trying to figure out which columns won't be useful to drop them.
The scenario brief I'm set is this:
You are working in a small Data Analytics firm. A small Insurance Broker is looking to add another insurer on to their portfolio. The new insurer wants to see the claims performance of their current business (a “bordereau”).
Would I need to drop some columns?

lapis sequoia
#

Hey, has anyone faced a similar issue where pandas converts ints, floats, etc. into objects?

```python
import numpy as np
import pandas as pd

example_array = np.array([
    [1, 2, 3],
    ['one', 'two', 'three'],
    [4.01, 5.01, 6.01],
    [np.nan, np.nan, np.nan]
])

df = pd.DataFrame(example_array, index=['int', 'string', 'float', 'nan'])

# df.select_dtypes(include = ['float'])
df.dtypes
```

output:

```
0 object
1 object
2 object
dtype: object
```

haughty tree
#

can i have a good resource to learn ai/ml

#

kinda doesn't know the exact roadmap

grave breach
#

@haughty tree Do you have a high school math background?

grave breach
#

Great

#

So if you want to start with neural networks I heavily suggest nnfs.io by sentdex (he's also making a youtube series of the book, but it might take a long time to complete)

tidal bough
#

Maybe you can avoid using an array here?

lapis sequoia
#

yeah I did some research found it to be the way, just the issue I'm facing is if I construct multiple independent lists they will have the same problem while passing through DataFrame.

#

Instead of doing a whole pd.Series I was wondering if there is a way to work around that issue

lapis sequoia
#
```python
import numpy as np
import pandas as pd

example_array = [
    [1, 2, 3],
    ['one', 'two', 'three'],
    [4.01, 5.01, 6.01],
    [np.nan, np.nan, np.nan]]

dtypes = ['int', 'string', 'float', 'nan']

df = pd.DataFrame()

# Assign each list as its own column so pandas infers a dtype per column.
for index in range(len(example_array)):
    df[dtypes[index]] = example_array[index]

df.dtypes
```
#

this seems to solve the issue

tidal bough
#

if I construct multiple independent lists they will have the same problem while passing through DataFrame.
Not sure what you mean by that. You get the same problem even when passing each column as a list or a numpy array with the right dtype?

lapis sequoia
#

TIL: pandas assigns dtypes on a per-column basis

tidal bough
#

Basically, I'm saying that if all that you have is a numpy array of dtype object, you need to somehow make pandas recalculate the dtypes of each column. Ideally, you just wouldn't create such an array.
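
Both routes mentioned here can be sketched briefly: building the frame from per-column data so the dtypes are inferred up front, and asking pandas to re-infer dtypes on a frame that started out as an object array. The column names and the small object array are made up for illustration:

```python
import numpy as np
import pandas as pd

# Route 1: one list per column, so each column gets its own dtype.
df = pd.DataFrame({
    "int": [1, 2, 3],
    "string": ["one", "two", "three"],
    "float": [4.01, 5.01, 6.01],
    "nan": [np.nan, np.nan, np.nan],
})
print(df.dtypes)

# Route 2: an existing object-dtype array, re-inferred column by column.
obj_array = np.array([[1, "one", 4.01], [2, "two", 5.01]], dtype=object)
obj_df = pd.DataFrame(obj_array, columns=["int", "string", "float"])
fixed = obj_df.infer_objects()
print(fixed.dtypes)
```

`infer_objects` only helps when the cells still hold real Python ints and floats; if the array was built without `dtype=object` and everything became strings, the values would need an explicit conversion instead.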

tidal bough
lapis sequoia
#

yeah that wouldn't have been optimal, but the secondary version seems to work like a charm

balmy junco
#

Hey guys, I am using resnet18 from pytorch and I'm having trouble optimizing for recall for my binary classifier. I think I need to make a change to my fc layer. Any thoughts on how to do this?

grave frost
#

it's all just a google search away

mint palm
#

thats true

hushed wasp
#
```python
import time

import numpy as np

# Creation of histograms (features)
# Note: kmeans and imagesarray are defined earlier in the poster's script (not shown).
temps1 = time.time()

def build_histogram(kmeans, des, image_num):
    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des = len(des)
    if nb_des == 0:
        print("histogram problem for image: ", image_num)
    for i in res:
        hist[i] += 1.0 / nb_des
    return hist


# Creation of a matrix of histograms
hist_vectors = []

for i, image_desc in enumerate(imagesarray):
    if i % 100 == 0:
        print(i)
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i)  # calculates the histogram
    hist_vectors.append(hist)  # histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1 = time.time() - temps1
print("histogram creation time: ", "%15.2f" % duration1, "seconds")
```


Hello guys, I don't know why sometimes this code works and why sometimes it's looping again and again.... can anyone help?

Thanks
stiff drift
#

Hello, anyone knows about courses involving machine learning and trading?

grave frost
#

do you want to use ML with trading/finance?

stiff drift
#

yes

#

I would highly appreciate it if anyone has some info about it. There is a book by Stefan Jansen named Machine Learning for Algorithmic Trading but idk if it is useful,

#

And some online courses, but nowadays many people sell bullshit specially trading related

grave frost
#

that's cuz it's not a great idea in general

#

you need advanced models to turn a good profit, since humans rely mostly on luck and crypto

#

which can't be taught by a course

mint palm
#

but its in R lang

#

u interested

stiff drift
#

yeah sure, maybe i can adapt it for python

stiff drift
grave frost
stiff drift
#

The other day i was 8 hs in front of the pc and the minute i go to the supermarket things happened haha

stiff drift
grave frost
#

because there was a guy here the other day who asked the same question

#

and you can guess, he is a shitposter

#

anyways, that's not how trading works

#

trading won't require a lot of "attention" per se unless you are doing day trading- which you shouldn't at all since it's very risky

mint palm
grave frost
#

but you should have a very specific usecase for such a "bot" for it to be actually helpful to you

#

in the end, it depends on what exactly you are trying to automate and to what extent

mint palm
#

@stiff drift

stiff drift
#

I do day trading and I rely solely on myself. But it is tiring and somehow I want like a machine backup just in case I miss something, you know

#

I will try finding some courses on udemy or the book i mentioned before. Thxs everyone

desert oar
#

@stiff drift maybe you can try to write up some heuristic rules for what counts as "something happening" (e.g. price movement above a certain threshold) and then encode those in a simple program, just following rules, no fancy AI stuff

#

after all, machine learning often amounts to trying to capture human intuition and reasoning in a machine

#

starting with simple rules is often the best way to go
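
The kind of heuristic rule described here can be very small. A sketch of a "price moved more than a threshold" check, where the 2% threshold, the function name and the price list are all made-up illustrations rather than trading advice:

```python
def price_alerts(prices, threshold=0.02):
    """Return (index, relative_change) for every step whose move exceeds threshold."""
    alerts = []
    for i in range(1, len(prices)):
        change = (prices[i] - prices[i - 1]) / prices[i - 1]
        if abs(change) >= threshold:
            alerts.append((i, change))
    return alerts


prices = [100.0, 100.5, 103.0, 102.8, 99.0]
alerts = price_alerts(prices)
for index, change in alerts:
    print(f"step {index}: {change:+.2%}")
```

A rule like this is easy to check against what a person would have flagged by eye, which makes it a reasonable baseline before any learned model is considered.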

stiff drift
#

Yeah, i ve done that. But maybe applying ML i build the most profitable bot ever hahah, just let me dream

#

thanks !

#

i will research if anyone is interested dm

desert oar
#

start with trying to match what you, a human, currently do

#

then worry about making it better than what a human can do

#

the 1st one is already very hard, the 2nd is exponentially harder

stiff drift
#

hahah yeah, will try my best

left jacinth
#

i am an amateur. can anybody help me?

stiff drift
#

On what?

tidal bronze
#

what is the performance impact of copying a value to a variable
example:

```python
for x in list1:
    y = self.data["Pallets"].get((t, i, f), 0) + 1
```
is what I am currently doing
but what if I did this instead:
```python
for x in list1:
    d = self.data["Pallets"].get((t, i, f), 0)
    y = d + 1
```

I think it makes the code more readable but my code should also be performant, is the impact of assigning the value to d negligible?

odd yoke
#

It is negligible but it's not exactly the right channel
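
The claim that the extra assignment is negligible can be measured directly with `timeit`. A small sketch with a stand-in dictionary (the data and function names are made up; absolute timings depend on the machine):

```python
import timeit

# Stand-in for self.data["Pallets"] from the question.
data = {"Pallets": {(1, 2, 3): 4}}
key = (1, 2, 3)


def inline():
    return data["Pallets"].get(key, 0) + 1


def with_temp():
    d = data["Pallets"].get(key, 0)
    return d + 1


t_inline = timeit.timeit(inline, number=200_000)
t_temp = timeit.timeit(with_temp, number=200_000)
print(f"inline: {t_inline:.4f}s  with temp variable: {t_temp:.4f}s")
```

Both versions do the same dictionary lookups, which dominate the cost; the local assignment adds only a single store and load.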

late shell
#

Hello, A beginner question on ML, how do I know which model needs feature scaling & which doesn't?

late shell
#

yeah, although normalization is just one feature scaling technique, right? there a few more

grave frost
#

like with N.B, you don't normalize the probs

late shell
mint palm
#

just test on test set

#

if its too slow or too biased then it will require normalisation

grave frost
mint palm
#

and testing will tell you and normalisation is overall very much used......saves time too

grave frost
#

what has testing got to do with normalization?

tidal bronze
late shell
#

alright, thanks a ton @grave frost and @mint palm.

mint palm
#

isnt it that some normalisation optimize how much portion of activation function we use

#

that do affect learning process

grave frost
mint palm
#

i too have just started it i dont know for sure.....so wont debate much😆

#

but for time i can say for sure it would fasten learning

grave frost
mint palm
#

see this, it causes insufficient learning due to improperly scaled params

grave frost
mint palm
#

ya that causes some of the details in the input to go unnoticed

#

that leads to more error when testing model

mint palm
grave frost
#

parameters don't vanish

#

and model doesn't "notice" anything, it's just for analogy

mint palm
grave frost
mint palm
#

ya maybe

grave frost
#

The parameters of a neural network are typically the weights of the connections. In this case, these parameters are learned during the training stage. So, the algorithm itself (and the input data) tunes these parameters. The hyper parameters are typically the learning rate, the batch size or the number of epochs.
simple definition

#

bruh, stop spamming the same message

mint palm
#

lol

grave frost
tidal bronze
#

bro once in 15min is hardly spamming

mint palm
#

ya take help

tidal bronze
#

I already have a channel

grave frost
#

🤷

tidal bronze
#

bro quit spamming me

mint palm
#

😆

#

chill out guys

tidal bronze
#

now my message has lost visibility, thanks a lot...

grave frost
#

aight then, if you get muted due to cross-posting, don't blame me

mint palm
#

🤣

tidal bronze
#

alright if you get muted for going off-topic, don't blame me

mint palm
#

just delete previous ones and post one last time.....we are stopping the chat

tidal bronze
grave frost
#

QQ: If both the training set and eval set have the same ratio for the unbalanced classes, should I deal with the imbalance? (I am too lazy)
Funny, never even thought about this 🙂 Any ideas?

#

BTW My aim is just to get a good acc on the test set. Generalization be damned

novel oyster
#

feel free to @ me or dm if you see this and can help

desert oar
#

for bounded features, normalize to [0,1]
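
Normalizing a bounded feature to [0, 1] is min-max scaling. A minimal NumPy sketch with made-up data; the helper name is hypothetical, and in practice the bounds would come from the training set or from the known limits of the feature:

```python
import numpy as np


def min_max_scale(X, lo=None, hi=None):
    """Scale each column of X to [0, 1] using per-column bounds."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    # Assumes hi > lo in every column; a constant column would divide by zero.
    return (X - lo) / (hi - lo)


# Made-up features: a pixel intensity (0..255) and a percentage (50..100).
X = np.array([[0.0, 50.0], [128.0, 75.0], [255.0, 100.0]])
scaled = min_max_scale(X)
print(scaled)
```

Passing explicit `lo` and `hi` lets the same bounds be reused on validation and test data.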

wicked mantle
#

How can i resize image to bounding box? in pytorch
i mean dynamically set (xmin, ymin, xmax, ymax) values to all images. I think transforms.Resize() can help me, but Resize() only takes two arguments and its not accurate to bounding box

#

seems like there are no way to crop it with pytorch, i'll use Pillow lib
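
For reference, cropping to a box is just slicing once the image is an array. A sketch assuming a height × width × channels NumPy array and a box given as (xmin, ymin, xmax, ymax) in pixels; Pillow's `Image.crop` takes a box in the same order:

```python
import numpy as np


def crop_to_box(image, box):
    """Crop an H x W x C array to (xmin, ymin, xmax, ymax); max edges are exclusive."""
    xmin, ymin, xmax, ymax = box
    # Rows are indexed by y, columns by x.
    return image[ymin:ymax, xmin:xmax]


# Made-up 6 x 8 RGB image with distinct values in every cell.
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
patch = crop_to_box(image, (2, 1, 5, 4))
print(patch.shape)
```

Because every image has its own box, the crops come out in different sizes, so a resize to a common shape is usually still needed before batching.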

grave frost
#

So I am doing fine-tuning with images, and I had a quick question.

#

The keras docs suggest that I should freeze my base model (not importing its top part), and train the classifier placed at the end of the model. Then they would fine-tune the whole model with SGD and a slow LR.

But intuitively, I was thinking that I would freeze the classifier 'layers' at the end of the model, allow the base model to train and learn the features from my specific dataset; then I would freeze the base model (which would perform feature extraction) and fine-tune with the same recipe on my custom/target dataset.

Why don't we do the second method, as opposed to the first?

wicked mantle
#

How to add machine learning model to discord bot?
For example, i have a model to predict cats, and i want to implement this predict model to bot

desert oar
#

if you want to train them separately, maybe use an autoencoder or something first and then put logistic regression or something on top of the low rank representation

#

but thats no better (and probably worse) than just doing it the way keras recommends

grave frost
desert oar
#

no, but the frozen model weights still affect the gradient, which affects the weight updates for the trainable weights

grave frost
#

hmm...so if I want to use the base model for feature extraction, what do I do?
should I train it separately, then load that checkpoint for the base model?

#

I had the base model for feature extraction, with a small CNN as a classifier. now, is there somehow a way to remove that CNN classifier at the bottom, only train the base model and store it?

#

because if I include the CNN classifier, I won't be able to load it since keras doesn't recognize why there are weights for layers not in the base model - so it errors out

desert oar
#

can you do "everything but the last layer"?

#

like, include the convolutional layers from your model, but exclude the fully connected stuff at the end

grave frost
#

you mean train the base model only, with no other layers? on my source dataset

desert oar
#

what is the base model

#

and can you link to the keras doc recommendation? im curious what their wording is

grave frost
#

efficientnetb0 for now

desert oar
#

are you using the imagenet version or training from scratch

grave frost
#

Im not including the FC classifier at top, so that flag is False.

grave frost
#

gives a better initialization 🤷

#

so imagenet weights as a starting point, then somehow modify the base model to learn features from my own source dataset. freeze it up, add FC layers and train those FC Layers on my target dataset

desert oar
#

yeah. so train efficientb0 + fully-connected on your data, then: 1) freeze the efficientb0 and re-train the fully-connected layer at the end for better accuracy, and 2) go use the refined efficientb0 network for feature extraction elsewhere

grave frost
#

step 0 and 1 are on the same "big" dataset, right? and step 2 I can use it on my target dataset?

desert oar
#

that seems right, but what are you doing with the target dataset?

grave frost
desert oar
#

in that case, can you zero out and/or fine-tune the fully connected weights?

#

i think that should be roughly equivalent to taking the features from the efficientb0 part and stacking a separate model on top

grave frost
desert oar
#

transfer learning, thats what its called

grave frost
#

yeah, that's the guide I was referring to

desert oar
#

and yes either method is valid

grave frost
#

so, to summarize. I would have efficientnetb0 + F.C connected initially. I freeze the F.C Layers, and train on the base model alone......?
I think what I want is to somehow train the whole effnet+F.C on my source dataset, but export weights only for the effnet. Then load it somewhere else, and train on the target dataset

grave frost
#

So after reading up, it does seem keras provides two handy functions, get_weights and set_weights, to get the weights of an individual layer and save them as numpy arrays to be loaded later (into an instantiated layer). Hopefully, with a bit of luck, I might be able to try it.

desert oar
#

Wait

#

No

#

You want the opposite

grave frost
#

I don't get how I want the opposite?

#

I train my FC, but not my effnet?

desert oar
#

Yes

#

All the deep stuff requires a lot of data

#

So you train that part on the big data set

#

Are you saying that you think the extracted features should be different between your big data set and the target data set?

#

Which is why you would want to retrain the deep layers on the target?

grave frost
desert oar
#

It just doesn't make sense to change what the deep stuff emits but freeze the classifier on top

#

Imagine you don't retrain the model but randomly rescale the final hidden layer outputs

#

Then your classifier weights will all be meaningless and your classifier will produce garbage
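
The argument above can be illustrated with a toy experiment in plain NumPy, with no neural network involved. The sketch below stands in for a frozen classifier sitting on top of a feature extractor whose outputs change: a linear classifier is fit on synthetic 5-dimensional features, then evaluated after the features are reordered. The reordering is an assumed stand-in for "the base model now emits something different"; all data and names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 1000, 5

# Synthetic "extracted features": only dimension 0 carries the class signal.
labels = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, dim))
features[:, 0] += np.where(labels == 1, 2.0, -2.0)


def add_bias(F):
    return np.column_stack([F, np.ones(len(F))])


# "Classifier on top": least-squares fit to +/-1 targets, then frozen.
targets = 2.0 * labels - 1.0
weights, *_ = np.linalg.lstsq(add_bias(features), targets, rcond=None)


def accuracy(F):
    pred = (add_bias(F) @ weights > 0).astype(int)
    return float((pred == labels).mean())


acc_same = accuracy(features)

# The extractor "changes": same information, but emitted in a different order.
changed = features[:, [3, 0, 1, 2, 4]]
acc_changed = accuracy(changed)

print(acc_same, acc_changed)
```

The information needed to classify is still present in `changed`, but the frozen weights no longer line up with it, so accuracy falls to roughly chance level.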

grave frost
#

Effnet + F.C gives me a decent accuracy. --> if I keep the Effnet there with the same weights, then it would extract the same features, right?
Then I just need to re-train the F.C on the new dataset (to learn to make sense of features from the new dataset and use them to predict slightly different classes [which would be accommodated by the activation function]). Shouldn't that theoretically work?

void egret
#

Hello. I have a question. I want to make a bar plot from the values of a data frame, as you can see on the screen. The value I want to show on the plot is at the bottom of the picture. So I want a percentage for every education level. Is there a way to loop over them instead of manually typing the differences? If this is the wrong chat for such a question, I'm sorry in advance.

desert oar
#

that's literally the description of transfer learning @grave frost

grave frost
#

aight. Then there is another slightly different method I can do. train the whole effnet + F.C shebang. then just use SGD with lower Lr on the new dataset?

desert oar
#

you're fine-tuning the effnet model on your big data set, then using that fine-tuned model for transfer learning on the target data set

#

im not sure how thats different

grave frost
desert oar
#

no. you were saying the opposite... or so i thought

#

where you freeze the fc layer at the end and update only the effnet parts, which makes no sense

grave frost
# desert oar im not sure how thats different

doing LR slowly on the whole model, taking extra assumption that the weights used in source dataset won't differ too much in the target one - whatever will, would get slowly updated

grave frost
#
```python
from tensorflow import keras  # assumes the Keras bundled with TensorFlow

base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,
)  # Do not include the ImageNet classifier at the top.

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(inputs, training=False)  # pass the Input tensor; x is not defined before this line
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

model.summary()
```

desert oar
#

yeah i just saw that

#

huh... so they are using a very low learning rate to update (not re-train) the base model, freezing the fully connected output model?

grave frost
#

then how is that supposed to work? if they freeze their feature extractor...then wouldn't it all just break down due to lack of features

desert oar
#

i dont think thats whats happening in this code

#

i think its the opposite

#

it looks like they are freezing the base model, and only training the output layer (as well as their other stuff on top)

grave frost
desert oar
#

hold on

#

back up

#

there are 2 things happening here:

  1. transfer learning: freeze the base model, train a new model on top
  2. fine-tuning: after step (1), un-freezing the entire model and running a few more epochs with a very low learning rate
#

at no point are they freezing the new layers and training only the base layers

grave frost
desert oar
#

but you have 2 datasets right? a big one and the target one?

#

and those are at least similar in domain?

grave frost
#

however, it seems imagenet is lucky for this dataset. so I just start it as a pseudo random initialization to learn features from the biggie dataset

desert oar
grave frost
desert oar
grave frost
#

I wanna do 2 & 3

#

just with training base again

#

to recognize features from my domain, not imagenet

#

think of it as initializing base with no weights

desert oar
#

but you are only using imagenet as initialization

grave frost
desert oar
#

so im not sure what your hangup is

grave frost
#

just coz it lets me converge to my needed features better and faster

desert oar
#

but why arent you doing 1?

grave frost
desert oar
#

then maybe imagenet isnt good initialization after all?

#

why do you expect better accuracy if you dont train on the big dataset?

grave frost
#

don't ask me why 🤷

lilac raven
#

If I have a large number of files with the same number of values that I want to average, like I want to average a large number of curves, and some of those files have NaN as their value at some data points, can I still use (X+Y+Z)/N

grave frost
#

then using the trained F.T, I extract features from small one, and train another F.C from scratch to classify my small dataset

desert oar
#

i still dont see how this is different from the 1-3 steps i outlined

#

all i am saying is, dont freeze the fc network at the top and unfreeze the cnn at the base, and expect useful results

lilac raven
#

[(#1 + #2 + #3 + #4 + ...) / n (number of files)]. Say those # files have arrays [#,#,#,#,#,#, ...] and some of them have [#,#,#,#,NaN,#,#,NaN]. Can I still obtain an average curve out of those

#

or do I have to not use the NaN files

grave frost
desert oar
#

there is something called "imputation" for missing data in more advanced applications, but that will not help you here

lilac raven
#

ah damn

desert oar
#

you can't produce new information where no information exists

grave frost
#

Thanx a ton for the guidance @desert oar 👍 🚀

desert oar
lilac raven
#

i was hoping there was like normalizing function to zero out that certain position in an array and somehow still do the whole array

#

but that makes sense

desert oar
#

well you can tell numpy to omit the missing values for you

#

but thats just a convenience, its still removing them
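
Concretely, the NumPy convenience being referred to is `np.nanmean`, which averages each position over only the values that are present. A small sketch with three made-up curves stacked as rows:

```python
import numpy as np

# Three made-up curves of equal length, stacked one per row.
curves = np.array([
    [1.0, 2.0, 3.0],
    [3.0, np.nan, 5.0],
    [5.0, 6.0, np.nan],
])

# A plain mean propagates NaN to every position that has a missing value.
plain = np.mean(curves, axis=0)

# nanmean drops the missing entries, so each position divides by its own count.
skipping = np.nanmean(curves, axis=0)

print(plain)
print(skipping)
```

Positions with missing values are averaged over fewer curves, so they are noisier than the rest of the average curve.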

lilac raven
#

I was thinking in a way like the scatter plot, you can scatter plot an array with NaN

#

values

#

but visually seeing something and averaging are different

desert oar
#

and what happens when you plot the missing values?

#

you just dont plot them

lilac raven
#

yeah

desert oar
#

same thing here

void egret
lilac raven
#
```python
for file in files:
    # for x in set(ids):
    #     if file.startswith(str(x)):
    if file.endswith("_MID-R1-ECG.1D_hrv.txt"):
        full_name = pathlib.Path(root) / file
        try:
            read_fname = full_name
            data = np.loadtxt(read_fname)
            avg = sum(data)/float(len(data))

            np.savetxt("Average-MID-hrv.txt",np.array(data))
        except Exception as e:
            print(e)
```
it doesn't look like the Average file it prints out is an average, rather it is just the same values as the second MID-R1 file. I only have two files named that in the folder to see if the averaging works for now
#

in my avg=sum(data)/float(len(data)) line, do I need put something else other than data so it grabs all of the files that match data requirements (currently only 2 just to test)

serene scaffold
#

actually it's np.mean rather than an array method.

lilac raven
#

Data is the output of np.loadtxt, which reads from read_fname

serene scaffold
lilac raven
#

And I made the for loop look at a number of them.

#

An array

#

1 d array with like 10 ish values

serene scaffold
#

so you'll want to use avg = np.mean(data).

lilac raven
#

So that will automatically take the multiple data files that I'm reading?

serene scaffold
#

you want to take multiple files, and do what?

#

concatenate the arrays from each one?

lilac raven
#

Take the average of all those areays

#

Arrays*

serene scaffold
#

what does it mean to take the average of all those arrays? do you want the output of that operation to be an array, or a single number?

lilac raven
#

To create an average file of one array with the 10 values, so each value is average of all files

eternal briar
#

Hi, In the help forum does anyone know Numpy / pandas

serene scaffold
lilac raven
#

[Avg, avg, avg, avg,] not [avg]

#

An average array, not one value

serene scaffold
#

Or this--the effect is the same

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.mean(a, axis=0)
array([2.5, 3.5, 4.5])
#

axis=0 is the key.

lilac raven
#

Yeah, so np.mean(data) will know to take all of the files that match My requirements?

serene scaffold
lilac raven
#

I'd have to use append then somehow

#

?

#

Like data =data.append under the original data

serene scaffold
#

you'd have to make a 2d array with one row per file and then use np.mean to average across the rows, position by position

#

so you'd want axis=0 here; axis=1 would give you one mean per file instead

lilac raven
#

Wouldnt utilizing append on data make it appends each new input in data

#

Since I'm looking at multiple files with the for loop

#

So would data becomes a 2d array after I append it

serene scaffold
#

It looks like Numpy might even handle it the same way

lilac raven
#

[#,#,#,#,#,#] would be an array not a list though right

serene scaffold
mossy stratus
#

anyone know how to graph an equation (3d) with matplotlib?

serene scaffold
#

!e

import numpy as np
a = [1, 2, 3]
b = [4, 5, 6]
print(np.mean([a, b], axis=1))
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[2. 5.]
serene scaffold
#

Looks like numpy can give you the expected behavior using lists.

lilac raven
#

So doing np.mean(data) will be doing np.mean(datafile1,datafile2,datafile3,etc) with the way I have it reading in? I feel like I have to use append still for it to do that

serene scaffold
#

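you would need to append each file's array to one list (say all_data) inside the loop, and only then calculate np.mean(all_data, axis=0); axis=0 averages position by position across the files.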
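
A rough sketch of that collect-then-average pattern, assuming every file yields a 1-D array of the same length. The two literal lists stand in for what `np.loadtxt` would return, and `average_arrays` is a hypothetical helper name:

```python
import numpy as np

def average_arrays(arrays):
    """Return the position-by-position mean of equally long 1-D arrays."""
    stacked = np.vstack(arrays)      # shape: (number of files, values per file)
    return stacked.mean(axis=0)      # average down the rows -> one value per position

# Stand-ins for the arrays np.loadtxt would return for each matching file.
file_arrays = []
for data in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    file_arrays.append(np.asarray(data))   # collect inside the loop...

avg = average_arrays(file_arrays)          # ...and average once, after the loop
print(avg)
```

The averaged array is what would then go to `np.savetxt`, once, after the loop has finished.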

minor marsh
#

Hi guys, does anyone know a great tutorial/course about Recurrent Neural Networks with LSTM? I did a Udemy course about it, but I got stuck at the predictions part

exotic maple
mossy stratus
#
import matplotlib.pyplot as plt
import numpy as np
import sympy

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

x = np.linspace(-5,5,500)
y = np.linspace(-5,5,500)
x,y = np.meshgrid(x,y)

e = sympy.solve('x**2/4+y**2/9+z**2/16-1',sympy.Symbol('z'))

z = -2*np.sqrt(-9*x**2 - 4*y**2 + 36)/3

ax.plot_surface(x, y, z)

plt.show()
#

this isn't right

#

not sure why
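
One likely cause, as far as the snippet shows: the explicit formula only gives the lower half of the surface, and on a -5..5 grid the square root is NaN everywhere outside the ellipse. A sketch that avoids both problems is to parametrize the ellipsoid with two angles instead. `ellipsoid_surface` and `plot_ellipsoid` are hypothetical helper names; the semi-axes 2, 3, 4 come from the equation above:

```python
import numpy as np

def ellipsoid_surface(a=2.0, b=3.0, c=4.0, n=60):
    """Points on x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 from a parametric grid."""
    u = np.linspace(0.0, 2.0 * np.pi, n)   # angle around the z axis
    v = np.linspace(0.0, np.pi, n)         # angle down from the +z pole
    u, v = np.meshgrid(u, v)
    x = a * np.cos(u) * np.sin(v)
    y = b * np.sin(u) * np.sin(v)
    z = c * np.cos(v)
    return x, y, z

def plot_ellipsoid():
    """Draw the full surface; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    x, y, z = ellipsoid_surface()
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z)
    plt.show()

x, y, z = ellipsoid_surface()
```

Calling `plot_ellipsoid()` draws the whole closed surface, because every grid point lies on the ellipsoid by construction.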

civic summit
#

Anyone have experience with model selection for binary independent variables and binary dependent variables? Thinking logit regression, but I'm actually trying to find which independent variables are most important and together have >70 accuracy, chi

#

12:00pm can't sleep thinking about projects.

hoary wigeon
#

Can anyone help me with kernel not found Problem ?

#

When i try to open jupyter-notebook it doesnt start,

  • Something Error 500 is thrown back
#
  • when i open jupyter-lab and create new book, The kernel doesnt respond.
#
  • when i open a existing notebook, It works
#

Now the problem is im not able to use jupyter in anaconda for creating new notebooks

#

First Screen on launching Jupyter Notebook

#

On creating new NOTEBOOK

hoary wigeon
#

Solved with this : conda install nbconvert=5.4.1

#

Thread Closed

#

Thank YOU

heavy bay
#

Can anyone help me with fitting data to my model?
This is the data

le = sklearn.preprocessing.LabelEncoder()
date = le.fit_transform(list(data["Date"]))
_open = le.fit_transform(list(data["Open"]))
high = le.fit_transform(list(data["High"]))
low = le.fit_transform(list(data["Low"]))
adj_close = le.fit_transform(list(data["Adj Close"]))
volume = le.fit_transform(list(data["Volume"]))

X = list(date)
y = list(zip(high, low, _open, adj_close, volume))

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)
```But when I try to fit the data into the model as displayed below```py
linear = sklearn.linear_model.LinearRegression()
linear.fit(x_train, y_train)``` I get this error ```powershell
ValueError: Expected 2D array, got 1D array instead:
array=[2088  311 1839 ... 2422   64 1705].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.``` Thanks
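
The error message points at the fix: scikit-learn expects X with shape (n_samples, n_features), and a plain list of dates is 1-D. A minimal sketch with made-up numbers standing in for the encoded columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the encoded columns in the question.
date = np.array([0, 1, 2, 3, 4])             # one feature, five samples
high = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # target

X = date.reshape(-1, 1)    # shape (5, 1): rows are samples, one column per feature
y = high

model = LinearRegression().fit(X, y)
print(X.shape, model.coef_, model.intercept_)
```

Separately, and as far as I know, `LabelEncoder` is meant for class labels: applied to numeric columns such as prices it replaces each value with its rank code, so a regression on those columns probably wants the raw numbers instead.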
twin fiber
#

hello, is there any chance there is someone here who knows Littlewood's rule? I am absolutely desperate to understand how to apply this rule to a question I have been given for an assignment

#

any help is extremely appreciated ❤️

willow geyser
#

curious if using the GPU for tensorflow makes it more "even" compared to the CPU.
And if that's what makes it faster?

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @tacit wharf until 2021-05-06 09:44 (9 minutes and 58 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

limpid oak
#

need help

#

I have df which generate from pqsql, I can download it on my local system

#

but I want to create one url so when user visit that link script should run and output can be saved on user side

kindred radish
#

Anyone know about Spectral Clustering? Just want to list the steps I think should happen and have someone be like: "yeah that's right" or "that bit's wrong"

wicked mantle
#

RuntimeError: stack expects each tensor to be equal size, but got [3, 47, 47] at entry 0 and [1, 47, 47] at entry 5
This mean that my image isn't rgb?

hushed wasp
#
# Creation of histograms (features)
temps1=time.time()

def build_histogram(kmeans, des, image_num):
    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des=len(des)
    if nb_des==0 : print("problème histogramme image  : ", image_num)
    for i in res:
        hist[i] += 1.0/nb_des
    return hist


# Creation of a matrix of histograms
hist_vectors=[]

for i, image_desc in enumerate(imagesarray) :
    if i%100 == 0 : print(i)  
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i) #calculates the histogram
    hist_vectors.append(hist) #histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1=time.time()-temps1
print("temps de création histogrammes : ", "%15.2f" % duration1, "secondes")```



Hello guys, I don't know why sometimes this code works and why sometimes it's looping again and again.... can anyone help?

Thanks
serene scaffold
hushed wasp
#

Yes exactly

#

I just run it again it just continues to loop except one time it's worked

#

It worked one time and not the second time! Just the number of pictures is different but it's not even a question of too much pictures, cause sometimes with less pictures it doesn't work too...

limpid oak
#

# Creation of histograms (features)
temps1=time.time()

def build_histogram(kmeans, des, image_num):
    res = kmeans.predict(des)
    hist = np.zeros(len(kmeans.cluster_centers_))
    nb_des=len(des)
    if nb_des==0 : print("problème histogramme image  : ", image_num)
    for i in res:
        hist[i] += 1.0/nb_des
        return hist


# Creation of a matrix of histograms
hist_vectors=[]

for i, image_desc in enumerate(imagesarray) :
    if i%100 == 0 : print(i)  
    hist = build_histogram(kmeans, image_desc.reshape(-1, 1), i) #calculates the histogram
    hist_vectors.append(hist) #histogram is the feature vector

im_features = np.asarray(hist_vectors)

duration1=time.time()-temps1
print("temps de création histogrammes : ", "%15.2f" % duration1, "secondes")```
#

try this ones @hushed wasp

hushed wasp
#

I've got the same error

#

isn't it exactly the same? :p

limpid oak
#

you getting error because you are returning hist outside of for loop

#

that's why you are getting hist of first input data

#

no, check for loop

hushed wasp
#

ok indeed

#

thx
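
For reference, a sketch of the histogram step without the Python loop. It assumes `labels` is what `kmeans.predict(des)` returns (one integer cluster index per descriptor); `build_histogram_from_labels` is a hypothetical name. Whichever version is used, the `return` has to come after all descriptors are counted: a `return` inside the `for` loop exits after the first descriptor.

```python
import numpy as np

def build_histogram_from_labels(labels, n_clusters):
    """Normalized histogram of cluster labels (one bin per cluster)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return np.zeros(n_clusters)          # no descriptors: all-zero histogram
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / labels.size              # returned once, after counting everything

hist = build_histogram_from_labels([0, 2, 2, 1, 2], n_clusters=4)
print(hist)
```

This does not by itself explain the runs that never finish, but it removes the per-descriptor loop as a suspect.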

limpid oak
#

try to plot it inside loop

#

check output there

hushed wasp
#

but I ve got the same looping over and over

limpid oak
#

show full error and script

hushed wasp
#

I don't really raise an error it just keep running and crash

#

however when it's working it just calculate the histograms in like few seconds

#

I adapted the code from a SIFT extraction that I try to use with some CNN

#

Working I only get this :

#

I just reran the exact same code and now I just get iterations again and again, changing absolutely nothing... (in code and data)

limpid oak
#

in my opinion you should check line 11

hushed wasp
#

I will!

Thanks for giving me some of your time @limpid oak

civic summit
#

@limpid oak , do you have a sec to advise on the below? I have survey data that has independent variables that are all binary, as well as dependent variables that are binary. I am thinking logistic regression for my model of choice, but i am wondering what would be the best way of finding which independent variables are the most important predictors?

strange plinth
#

I have this matplotlib chart, how can I get two y axes, one for each line?

import matplotlib.pyplot as plt
import matplotlib.dates

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

plt.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)
plt.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)
plt.legend()
plt.show()
left mulch
#

Plot the second one with ax2

strange plinth
desert oar
#
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

ax.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)

ax2 = ax.twinx()
ax2.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)

fig.legend()
fig.show()

like this?

left mulch
#

Yes

strange plinth
#

Beautiful, thanks!

#

Hmm, the legend is a little borked like this. It appears, but i get a warning also?

#

UserWarning: Matplotlib is currently using module://ipykernel.pylab.backend_inline, which is a non-GUI backend, so cannot show the figure. fig.show()

#

this is in a jupyter notebook

desert oar
#

oh, use plt.legend and plt.show i guess. maybe theres some subtle difference

#

%matplotlib inline in the jupyter notebook should tell matplotlib to plot in the notebook and not elsewhere, but maybe plt.show does that automatically while fig.show doesn't

strange plinth
#

plt.legend removes the warning, but now only one line is mentioned in the legend.

#

matplotlib is weird....

desert oar
#

not sure why the legend wouldnt detect this automatically, but you can manually specify the lines to be used in the legend

import matplotlib.pyplot as plt
import matplotlib.dates

fig, ax = plt.subplots()
fig.set_size_inches(12, 8)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b'))

# ax.plot returns a list of lines, so unpack the single line from it
line1, = ax.plot(df.when, df.total, ".-", label="# Commits", color="black", linewidth=.5)

ax2 = ax.twinx()
line2, = ax2.plot(df.when, df.pctcon, label="% Conventional", color="green", linewidth=4)

plt.legend([line1, line2], [line1.label, line2.label])
plt.show()
strange plinth
#

can't find .label, but i can repeat the strings.

desert oar
#

sorry, use .get_label()

strange plinth
#

Nice! Now I want to force ax2 to go 0-100
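
Putting the pieces from this exchange together, a sketch of the two-axis chart with a combined legend and the percentage axis pinned to 0-100. The lists are made-up stand-ins for `df.when`, `df.total` and `df.pctcon`, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere; drop in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in data for df.when, df.total and df.pctcon.
when = [1, 2, 3, 4]
total = [120, 80, 150, 60]
pctcon = [10.0, 35.0, 60.0, 90.0]

fig, ax = plt.subplots(figsize=(12, 8))
(line1,) = ax.plot(when, total, ".-", label="# Commits", color="black", linewidth=0.5)

ax2 = ax.twinx()                      # second y axis sharing the same x axis
(line2,) = ax2.plot(when, pctcon, label="% Conventional", color="green", linewidth=4)
ax2.set_ylim(0, 100)                  # pin the percentage axis to 0-100

# One legend covering the lines from both axes.
lines = [line1, line2]
legend = ax2.legend(lines, [ln.get_label() for ln in lines])
```

Passing the line handles explicitly is what lets a single legend cover both axes; a bare `plt.legend()` only looks at the current axes.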

strange plinth
#

YAY! thanks 🙂

desert oar
sly salmon
#

hey guys - when getting into algorithms I found a book intended for absolute beginners (Grokking Algorithms) which simplified concepts simply and built up. Are there any books similar to this (for beginners) which you guys would recommend?
alternatively, do you guys have any positive sentiment to platforms like dataquest?

mint palm
#

"the dotted function is noisy in the diagram" is what i understood earlier but now the instructor says mini batch norm makes the z(tilde) more noisy due to using mean / variance for each mini batch seperatly......what does he mean here.....arent we actually just scalling z so should actually stable mini batches

shy kraken
#

This is driving me nuts, I'm trying to do a simple rolling average of some data. Blue line and points is the data, yellow line is supposed to be the moving average:

#

I don't understand why the yellow line would be equal to the where the blue dot is on the right side of the graph. It makes no sense

#

This is my code:

#

sma10 = data['PX']/data['WOS'].iloc[::-1].rolling(10).mean()

#

shouldn't that be reading in ten datapoints and finding the average? I had to throw in a .iloc[::-1] because it was starting calcs from the left side
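
A possible explanation, judging only from the one line of code: method calls bind tighter than `/`, so the rolling mean is applied to `data['WOS']` alone and `PX` is then divided by it. A sketch with made-up numbers and a window of 3 instead of 10:

```python
import pandas as pd

# Hypothetical data standing in for the PX and WOS columns.
data = pd.DataFrame({
    "PX":  [10.0, 20.0, 30.0, 40.0, 50.0],
    "WOS": [2.0, 4.0, 5.0, 8.0, 10.0],
})

ratio = data["PX"] / data["WOS"]        # compute the series first...
sma3 = ratio.rolling(3).mean()          # ...then roll over the whole ratio

# Without the intermediate variable (or parentheses), only WOS is averaged:
not_sma = data["PX"] / data["WOS"].rolling(3).mean()

print(sma3)
```

`rolling(n)` looks backwards, so the first n-1 values are NaN by design; reversing the series to get rid of them changes which points each window covers.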

sick wedge
#

how do i get my current working directory on the left like this on spyder

fervent zenith
#

how do i convert this to decimal value 3,2 to 32.0

lapis sequoia
lunar zenith
#
import matplotlib.pyplot as plt 
import pandas as pd

ra = pd.read_csv("ramen-ratings.csv")

new_ra5 = ra.loc[(ra["Stars"] != "Unrated")]

new_ra5["Stars"] = ra["Stars"].astype(float)

new_ra5 = ra.groupby("Country","Stars").mean() 
new_ra1 = ra.groupby("Country").Stars.count() 
for x in new_ra: 
    print(new_ra[x] / new_ra1)
 

for x in new_ra1: 
    print(x)``` I have this code, how do I fix the string to float: 'Unrated' error?
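
One way to get past the 'Unrated' error, sketched on a tiny made-up frame with the same two columns: coerce the column to numbers so that 'Unrated' becomes NaN, drop those rows, then group once.

```python
import pandas as pd

# Hypothetical rows in the shape of ramen-ratings.csv.
ra = pd.DataFrame({
    "Country": ["Japan", "Japan", "USA", "USA", "USA"],
    "Stars":   ["4.0", "Unrated", "3.5", "2.5", "Unrated"],
})

# Turn anything that isn't a number (such as "Unrated") into NaN, then drop it.
ra["Stars"] = pd.to_numeric(ra["Stars"], errors="coerce")
rated = ra.dropna(subset=["Stars"])

# Mean rating and number of rated entries per country in one pass.
summary = rated.groupby("Country")["Stars"].agg(["mean", "count"])
print(summary)
```

In the original snippet the `astype(float)` call reads from the unfiltered `ra`, which still contains 'Unrated', so the filter on the line before never takes effect.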
fervent zenith
lapis sequoia
#

maybe put a 1 in the column next to those ones, then sort them

#

@fervent zenith excel can't distinguish 50,8 as 508 or 50.8 unless it has some more information

shut slate
#

Hey guys

#

Does anyone know how to read the cluster centers of KMeans?

#

Like this is what I have

#

What does this mean exactly?

desert oar
#

@shut slate your matrix had 10 columns originally?

#

each cluster center is a vector of 10 coordinates

shut slate
#

Thank you. But how do I make sense of what the clusters are?

#

I have the clusters and dont know what to do with it

desert oar
#

what is this data?

#

what are you trying to achieve by using k-means?

shut slate
#

Well the data is the housing market in Melbourne. I am trying to figure out the price

desert oar
#

so why are you doing k-means clustering?

#

you want a price that corresponds to each cluster?

#

let's say you have N rows and P features, and you perform K means clustering. then the cluster_centers_ is a K x P array, where each row is a cluster center, and each column corresponds to one of your original data features.

#

so if price is the 2nd feature in your data, then price will be the 2nd element of each row of the cluster_centers_ array

shut slate
#

Actually yeah, i don't know why I am doing the clustering. Now that I think about it, i want to know how the combinations of each feature correspond to price I guess?

desert oar
#

however note that you need all 10 elements to "describe" the cluster. you could have 2 clusters with similar mean prices, but very different values of the other features

shut slate
#

So I guess my first problem is, Ok I clustered it into 3 clusters

#

Now what does that mean and how do I use it

#

lol

desert oar
#

cluster analysis is fine as an exploratory technique, just keep in mind that k-means in particular tends to try to find equal-sized roughly-spherical clusters and won't necessarily give intelligent results unless you do more work to choose a suitable K

#

e.g. if you have completely random data it will still find K clusters for you, but those clusters will amount to basically segmenting the data into equal sizes and aren't so much "clusters" as they are "segments" with hard boundaries

shut slate
#

I did the elbow analysis and it suggested 3 clusters

desert oar
#

ok, thats a reasonable place to start then

#

its still better to think of k-means output as "segments" rather than "clusters"

#

the two things you can do with k-means are:

  1. look at the cluster/segment centers
  2. determine which segment a data point belongs to, which amounts to finding the closest cluster center
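
The second point can be sketched in a few lines of numpy. The centers and points below are made up, and `assign_to_nearest_center` is a hypothetical helper; with scikit-learn the centers would come from `cluster_centers_`:

```python
import numpy as np

def assign_to_nearest_center(points, centers):
    """Index of the closest center (Euclidean distance) for each row of points."""
    points = np.asarray(points, dtype=float)     # shape (N, P)
    centers = np.asarray(centers, dtype=float)   # shape (K, P)
    # Broadcast to (N, K, P), then reduce over the feature axis -> (N, K) distances.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

centers = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]   # K = 3 made-up centers, P = 2
points = [[1.0, 1.0], [9.0, 8.0], [1.0, 9.0]]
labels = assign_to_nearest_center(points, centers)
print(labels)
```

Because the assignment is a plain distance comparison, a feature with a much larger numeric range dominates it unless the data is rescaled first.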
shut slate
#

Ok makes sense, is there any way I can visualize the clusters?

#

for example

grave breach
#

Yes

shut slate
#

So does this mean it just clustered by yearbuilt?

grave breach
#

Year built on the x

#

And price on the y

shut slate
#

the hue is the cluster

grave breach
#

Yes, that should be the cluster

shut slate
#

and as I can see is that from 1880 to 130s its one cluster

#

1930*

#

1940 to 1980 is 2nd

#

and the 3rd is 1980 to 2020

#

or am I talking non sense here

grave breach
#

That should be correct

#

But I cannot clearly see the picture

shut slate
#

sec

#

but other features for example

grave breach
#

Yes, that is correct

shut slate
#

So I just solved how it clustered?

#

And can you get Python to tell you what features mattered the most?

grave breach
#

It depends on the classifier

desert oar
#

in this case it's obvious from the plot that year built is mainly driving the clustering

#

that's possibly because the scales of the numbers are all off

#

or you can do the distance between cluster centers feature-wise, that's a nice idea too

#

you should probably standardize your data before doing k-means
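
A minimal sketch of that standardization step, assuming a purely numeric feature matrix with no constant columns (a constant column would divide by zero here). The rows are invented, and `standardize` is a hypothetical helper:

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Made-up rows: [year built, number of rooms]
X = np.array([[1900.0, 2.0], [1950.0, 3.0], [2000.0, 4.0]])
Z = standardize(X)
print(Z)
```

After this, every feature has mean 0 and standard deviation 1, so no single column dominates the distances k-means works with just because of its units.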

shut slate
#

Ok will look into it l8

#

Thank you all

grave breach
#

You're welcome

zinc lark
#

anyone have an idea on why pytorch 11.1 cuda is so much bigger than 10.2 cuda?

coral kindle
#

Since installing pytorch through conda means you're downloading the associated CUDA version, that means everything comes prepackaged, even CuDNN

grave frost
#

but there hasn't been that major of a revamp to add 1.2GB to the Cuda toolkit

coral kindle
#

10 to 11 implies there's been one. I haven't checked the CUDA changelog however

lapis sequoia
#

Is there any obvious pattern on how to design a deep learning model?

#

Like how do you know what to set the parameters to?

grave frost
lapis sequoia
#

there are so many parameters

grave frost
#

because you don't need to tune them all to get a decent accuracy?

lapis sequoia
#

I see

harsh karma
#

idk if this is the place to ask but why does heroku install scipy? i had it in requirements.txt but i removed it and all but it still installs it

desert oar
harsh karma
#

i narrowed it down to this:

discord.py==1.7.2
pafy==0.5.5
praw==7.1.0
prawcore==1.5.0
premailer==3.7.0
protobuf==3.15.1
pycparser==2.20
pylint==2.6.0
python-dateutil==2.8.1
requests==2.25.1
yagmail==0.14.245
youtube-dl==2020.12.31
youtubepy==6.0.2```
#

i deleted all the scientific stuff but it still downloaded it

#

who thought this was a good idea

grave frost
#

hmm...is it a problem if it downloads scipy?

dry terrace
#

hey i would like to copy a dataset based on another but not directly. e.g. per column the distribution should remain and also conditional probabilities.
My approach so far is to see which column can be described exactly with the fewest conditions (Prob(A=1|B) = 1) and then set this column. Iteratively repeat until all columns are described.

The problem is that the column distribution may be destroyed. Does anyone have a better idea? The goal is to create a more anonymous dataset, which still has the best possible quality.

clever bramble
coral kindle
#

I'm not sure which channel is appropriate for that, but has anybody managed to parse the insides of a PDF document using nothing but raw Python?

broken stratus
#

Hey is anyone here good at PyTorch... I need help to change my code from Theano to PyTorch

coral kindle
#

I have some PyTorch knowledge but I never touched Theano

native nimbus
desert oar
#

I believe a VAE can do that

#

However if you ask on https://stats.stackexchange.com you might get more interesting and helpful answers

olive moat
#

this might be an xy but i'm going to explain this as best i can
i am using pytorch and trying to create an lstm that takes a character and maps it to another, but i'm struggling a little with the representation of characters
everything i've seen encodes characters with onehot vectors, however i'm wondering why class labels aren't used instead? i.e. an integer 0-25 each one representing a letter, possibly a 26th representing padding
another issue is that i am trying to use Cross Entropy Loss and the target seems to be required to be class label encoded instead of onehot, so why not just use class labels in the first place?
i'm a little lost :P
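
One way to see how the two encodings relate, sketched in plain numpy with an assumed vocabulary of 27 symbols and a random stand-in weight matrix: multiplying a one-hot vector by a weight matrix picks out exactly one row, which is the same as indexing that row with the integer label.

```python
import numpy as np

vocab_size, embed_dim = 27, 4          # 26 letters plus one padding index (assumed sizes)
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))   # stand-in for a first-layer weight matrix

idx = 2                                 # class label for the letter "c"
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

via_one_hot = one_hot @ W               # multiply the one-hot vector by the weights
via_lookup = W[idx]                     # index the same row directly

print(np.allclose(via_one_hot, via_lookup))
```

So integer labels are fine as indices, which is how an embedding layer such as PyTorch's `nn.Embedding` consumes them. What usually goes wrong is feeding the raw integer in as a numeric input, since that implies "z" is 25 times "b".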

neon marsh
#

For anyone that used jupyter notebook. Does jupyter notebook run better if your pc is more powerful since jupyter notebook runs on your browser

austere swift
#

it runs in your browser but it still uses local resources, it just runs on a web server on your pc

#

so yes

#

its not using any external server to run the code or anything

serene scaffold
neon marsh
#

Alright thank you

serene scaffold
neon marsh
#

Ohh ok will look into that

spring obsidian
serene scaffold
#

If they don't teach them to use notebooks, they teach them to write getters and setters. You can't win.

spring obsidian
serene scaffold
#

Is the thinking there that teaching them to setup Python on their machine would be too cumbersome?

#

(Because I wouldn't blame you if that's the thinking.)

spring obsidian
#

We have one class for both of these students: "I can't even navigate a file system from the command line" and "I actually know how to code already"

blazing bridge
#

I think the reason for that would be to just stick to the fundamentals of the course so it can cater to both spectrums

desert oar
#

One legitimate use is for "literate programming" homework assignments that mix code and written solutions

spring obsidian
#

So we introduce VS Code eventually, but it starts with the replit.com "ide"

#

I feel like a shill for repl.it 😄

serene scaffold
desert oar
#

Repl.it is great. That said I don't see why notebooks are so much worse than anything else.

serene scaffold
#

but yeah, that's always the dilemma with those courses.

desert oar
#

I do think in a university setting there should be a mandatory one or two credit "how to use the unix flavored cli" course

spring obsidian
blazing bridge
serene scaffold
spring obsidian
serene scaffold
#

That being said, if you understand the problems with notebooks and are quite specifically trying to do exploratory analysis, I guess you can have at it.

spring obsidian
#

Is Spyder really used out there in the professional DS world?

desert oar
spring obsidian
#

Yeah... RStudio is probably the only thing I miss from before I switched to Python

lapis sequoia
#

Does anyone here use Chatterbot, and if so, do you know of any corpuses I can use to train my bot to be nice, and not ask people when they're gonna die?

burnt bronze
desert oar
#

@burnt bronze need more context. what are you doing? what schema? what code is doing it? etc etc

lunar zenith
#
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.read_csv('ramen-ratings.csv')

countries = df['Country']
ls = {}
for x in countries:
    if ls.get(x) is None:
        ls[x] = 1
    else:
        ls[x] += 1

countries = ls.keys()
df = pd.DataFrame.from_dict(ls.items())
df.index = countries
df.plot.bar(figsize = (15,4.5))


plt.title("Number of Ratings per Country", fontdict = {'fontsize': 15})
plt.xlabel("Countries", fontdict = {'fontsize': 15})
plt.xticks(rotation = 90)
plt.ylabel("Number of Ratings", fontdict = {'fontsize': 15})
plt.show()``` Anyone know how I'd remove the '1' legend?
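
The '1' appears to be the name of the numeric column that `from_dict(ls.items())` creates. A sketch that sidesteps it by counting with `value_counts` and plotting the resulting Series (tiny made-up frame; the Agg backend line is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the sketch; not needed in a notebook
import pandas as pd

# Hypothetical rows in the shape of ramen-ratings.csv.
df = pd.DataFrame({"Country": ["Japan", "USA", "Japan", "Taiwan", "USA", "Japan"]})

counts = df["Country"].value_counts()          # replaces the manual counting loop
ax = counts.plot.bar(figsize=(15, 4.5))        # a Series plot draws no legend by default
ax.set_title("Number of Ratings per Country", fontsize=15)
ax.set_xlabel("Countries", fontsize=15)
ax.set_ylabel("Number of Ratings", fontsize=15)
```

Keeping the original DataFrame approach and passing `legend=False` to `df.plot.bar(...)` should hide the legend as well.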
grave frost
#

Hey I don't get the hate - what's wrong with notebooks? 😛

visual umbra
#

do you guys have any recommendations for applying knowledge from andrew ng's coursera into projects

kindred blade
#

does anyone know what's better to use for data visualization matplotlib or charts js

#

I found chartsJs is more customizable but matplotlib is a wonderful library

kindred blade
#

why

#

what does matplotlib have that chartsJs doesnt

#

and isnt charts Js a javascript library so i think it works better on web though I love matplotlib and used it alot

silk tulip
grave frost
#

anything you find interesting 🤷@visual umbra

upbeat topaz
#

hello

#

Is this where we learn to code

#

or is that in another section

desert oar
#

@upbeat topaz we don't have a general "learning" channel. we do have a list of resources, and a lot of help channels for targeted help, see #❓|how-to-get-help

#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

desert oar
#

also don't forget to read channel topics 🙂 it's the text up at the top of your discord window, to the right of the channel name and to the left of the search bar. you can click on it to read the whole thing

upbeat topaz
#

okay

serene scaffold
#

I'm not sure what is meant by "samples of scores" in "Calculate the T-test for the means of two independent samples of scores.".

desert oar
#

i don't think it means anything here

#

this is just a t-test for 2 independent samples

#

the "scores" thing i think is just bad wording and/or someone lazily copying from their stats-for-engineers textbook

serene scaffold
#

!e

from scipy.stats import ttest_ind as tt
result = tt(['a', 'a', 'b'], ['a', 'b', 'b'])
print(result)
arctic wedgeBOT
#

@serene scaffold :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 2, in <module>
003 |   File "/snekbox/user_base/lib/python3.9/site-packages/scipy/stats/stats.py", line 5771, in ttest_ind
004 |     v1 = np.var(a, axis, ddof=1)
005 |   File "<__array_function__ internals>", line 5, in var
006 |   File "/snekbox/user_base/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3702, in var
007 |     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
008 |   File "/snekbox/user_base/lib/python3.9/site-packages/numpy/core/_methods.py", line 211, in _var
009 |     arrmean = umr_sum(arr, axis, dtype, keepdims=True, where=where)
010 | TypeError: cannot perform reduce with flexible type
serene scaffold
#

@desert oar does one have to assign arbitrary integers to each label? that seems odd.

desert oar
#

wait, what are you trying to do here

serene scaffold
#

determine the statistical significance of two sets of predictions

desert oar
#

banish that sentence from your mind

serene scaffold
#

(that is, whether changes in the design of the model changed the predictions in the second as compared to the first in a way that can't be accounted for by random chance)

desert oar
#

it sounds like you want to test whether 2 samples are from the same bernoulli distribution

serene scaffold
#

I've never heard of bernoulli

desert oar
#

bernoulli is yes/no with some probability p of "yes"

#

(meaning that "no" has probability 1-p)

#

the probability p does happen to be the mean of the bernoulli distribution

#

so yes you can use a 2-sample t-test to test the hypothesis of whether p1 and p2 are equal, if the samples are big enough for the central limit theorem to kick in. but you should calculate it directly, don't use this function (which tries to calculate the means from the data which it expects to be numeric)

#

note that you fundamentally can't assume equal variances unless the null hypothesis is true, because the variance of a bernoulli is p * (1-p). so if p1 != p2 then obviously p1 * (1-p1) != p2 * (1-p2)
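
A sketch of calculating it directly, under the assumptions stated above: two independent samples, each large enough for the normal approximation, and separate variances p * (1 - p) for each. `two_proportion_z_test` is a hypothetical helper and the counts are made up:

```python
import math

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test for p1 == p2 using separate (unpooled) Bernoulli variances."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Variance of a sample proportion is p * (1 - p) / n.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 60 "yes" out of 100 versus 45 "yes" out of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(z, p_value)
```

If both sets of predictions are made on the same examples, the samples are paired rather than independent, and a paired test would be the more appropriate tool.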

serene scaffold
#

.bm 840303279405793302

#

Thanks!

desert oar
desert oar
serene scaffold
desert oar
#

ooh

#

yeah read those 2 links i sent. the 2nd is probably more useful to you

serene scaffold
#

okay lemon_hyperpleased

obsidian quail
#

hey, I'm trying to plot times in the format %H:%M on x_axis, but it is only returning 00:00.
When using dates, it seems to work fine, but not with the times.
here's the code:

plt.plot(time_x,players_y)

plt.gcf().autofmt_xdate()

date_format = mpl_dates.DateFormatter('%H:%M')

plt.gca().xaxis.set_major_formatter(date_format)

plt.tight_layout()
plt.show()```
`time_x` is just regular datetime format eg. `2021-05-07 18:08:38`
Any ideas what's happening? (new to mpl)
desert oar
#

@obsidian quail can you provide some sample data to reproduce with

obsidian quail
#

1 sec

#

['2021-05-07 18:07:52', '2021-05-07 18:07:54', '2021-05-07 18:07:56', '2021-05-07 18:07:59', '2021-05-07 18:08:01', '2021-05-07 18:08:05', '2021-05-07 18:08:07', '2021-05-07 18:08:22', '2021-05-07 18:08:24', '2021-05-07 18:08:26', '2021-05-07 18:08:36', '2021-05-07 18:08:38', '2021-05-07 18:08:41', '2021-05-07 18:09:24', '2021-05-07 18:09:38', '2021-05-07 18:09:40', '2021-05-07 18:09:42', '2021-05-07 18:09:44', '2021-05-07 18:09:46', '2021-05-07 18:09:49', '2021-05-07 18:18:28', '2021-05-07 18:19:21', '2021-05-07 18:20:21', '2021-05-07 18:21:21']

#

[31, 35, 37, 31, 21, 24, 28, 44, 45, 39, 33, 32, 29, 19, 21, 24, 27, 29, 34, 55, 32, 27, 29, 25]

desert oar
#

@obsidian quail it's because your time_x is all strings

#

i don't think matplotlib is smart enough to do that conversion for you

obsidian quail
#

ah right, I'm not too familiar with datetime, I'm storing the values in an sqlite table, how would I convert into datetime whilst in the list?

desert oar
#

pd.to_datetime would be the easiest option

#

you're storing them as these timestamp strings in sqlite?

obsidian quail
#

as integers

desert oar
#

like unix timestamps?

obsidian quail
#
c.execute("SELECT * FROM last_24")
players_y = []
time_x = []
for x in c:
    players_y.append(x[1])
    time_x.append(x[0])```
I'm then appending the values to each list?
desert oar
#

it sounds like you have an int type on the column but are writing strings to the db

#

sqlite will let you write the wrong datatype to a column

#

(i think its a non-feature but they have their reasons for doing it)

obsidian quail
#

hmm, it's a default value time integer DEFAULT (datetime('now', 'localtime'))
Would it be due to this?
(btw let me know if we should move to #databases if its getting offtopic here)

desert oar
#
from datetime import datetime

c.execute("SELECT * FROM last_24")
players_y = []
time_x = []
for x in c:
    players_y.append(x[1])
    time_x.append(datetime.strptime(x[0], '%Y-%m-%d %H:%M:%S'))

this should work, or something like it anyway

#

there is a "more right" way to do this

obsidian quail
#

👍

fleet tundra
#

Hello, I'm trying to find the combination of one entry of Column B with the other entries in the column B while trying to rank the top 5 entries it repeatedly occurs with reference to column A. Can anyone help me how to do this with pandas?

lapis sequoia
#

how would u attempt to train a nn with a lot of classes and low images? or at least, not the same amount of images per class

desert oar
desert oar
grave frost
#

imagenet to the rescue

desert oar
#

for the not-enough-data case, if you can gather a large amount of unlabeled data but only a small amount of labeled data, train an unsupervised model on the big unlabeled dataset then use it to create features for the small labeled dataset

#

for imbalanced data, honestly even when i was a professional data scientist working with other professional data scientists we still struggled with this

#

it is a hard and unsolved problem

grave frost
desert oar
#

sometimes (eg with images) you can get some traction with data augmentation and/or generation. you can also try oversampling and/or undersampling but i dont know of anyone who gets great results with that.

desert oar
#

i.e. we didnt solve it...

grave frost
desert oar
#

at least, that alleviated the worst of the problem. we still ended up with severely unbalanced data, and we kind of just accepted that our accuracy on those classes would be really bad

grave frost
#

hmmm....what was it on tho?

desert oar
#

so we adjusted our performance metrics and set expectations with the business stakeholders accordingly

grave frost
#

like the task/dataset?

desert oar
#

we never got improvements by using any "fancy" methods. it only ever added noise.

#

yeah, our business had a huge amount of hand-constructed "categories" for different types of businesses, and we had to figure out the type of a business based on whatever we could find about it

grave frost
#

tried DAGAN? it works for quite a few use cases. I was thinking of using it, but didn't want to spend so much compute power/$ on it.

desert oar
#

we could get its address (e.g. for zoning information), name, scrape the web for its facebook page, etc

#

so as you can imagine we got great results when distinguishing photographers from nightclubs, but distinguishing bars from nightclubs was a lot harder

#

(made up example but you get the idea)

#

and the imbalance was because we only had like 3 nightclubs and 6,000 bars (again made up but not far from what we saw in some cases)

grave frost
#

don't companies tell what they do on the website? just scrape it all, filter and BERT is up

desert oar
#

and there were 1000+ of these classes

grave frost
desert oar
#

you would think so, and yes we used something based on bert ensembled with another model using a bunch of tabular metadata

grave frost
#

uh-huh. tried weighting?

desert oar
#

yep it helped a bit

#

but we ran up against the lower bound of "almost 0 data" at some points

#

weighting is great when you have 15 observations vs 150 observations

#

but when you have 5 observations youre kind of at the mercy of the dungeon generation algorithm, so to speak
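
For reference, the weighting discussed above is often done by inverse class frequency. A minimal sketch, assuming the common "balanced" heuristic `n_samples / (n_classes * class_count)`; the function name and the toy label counts are made up for illustration and are not from the project described here:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels echoing the 15-vs-150 case (made-up data).
labels = ["nightclub"] * 15 + ["bar"] * 150
weights = balanced_class_weights(labels)
print(weights)
```

The rare class gets a weight ten times larger than the common one here, but weighting only rescales the loss: with 5 examples the model still sees the same 5 examples, which fits the "almost 0 data" limit described above.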

grave frost
#

oof.

#

that looks hard af

desert oar
#

it sucked and it sucked the life out of our team and i think theyre still working on it long after i quit

#

but now im just ranting 😛

grave frost
#

hmmm....

desert oar
#

i havent used DAGAN

grave frost
#

nah, leave it

#

it's for image data augmentation using GAN's

fleet tundra
desert oar
#

Data Augmentation Generative Adversarial Networks
heh i did actually consider doing something like this

#

seemed like a rabbit hole though

grave frost
#

doesn't work too well on numerical data? at least I haven't read any papers that explore something like that

desert oar
#

i wouldnt know. also a lot of the data was categorical anyway

#

"annual revenue < 10k, 10-100k, 100k+"

#

giant clusterfuck

grave frost
#

anyways, if those companies are in US, doesn't some org keep track of companies and their type?

desert oar
#

you would think so!

#

several do

#

none of them do it reliably

#

its not like humans who have a social security number and 3 well curated credit scores available from 3 well known agencies

#

its a dumpster fire and there isn't even a good "ripe for innovation" solution

#

its probably better in other countries where the government actually does things

desert oar
grave frost
#

I mean - you could do the reverse. if data on bar clubs is less, then reverse search and scrape

#

at least you can get that 5 up to 10, if not 15

desert oar
#

wdym reverse search

grave frost
#

and scrape

desert oar
desert oar
#

like i said we ended up spending a lot of money to do pretty much that: find more businesses in these categories

#

the problem was that the data you can buy from 3rd parties doesnt have the special in-house labels

fleet tundra
desert oar
grave frost
#

2 inputs - for both dataset categories

#

some might overlap, so you can hand remove them

#

but overall, it would be some work to ensure proper dtypes and no NaN - but I guess that would be easy for data scientists

desert oar
#

got a reference on this? i ended up separately creating vector embeddings on the "big" unlabeled dataset then applied those to the "small" labeled data

lapis sequoia
#

mmm

grave frost
lapis sequoia
#

is there a way to search with a py script images on google images and download them?

lapis sequoia
#

so since i have the labels names, i can search the label on google and download a few

grave frost
#

but the real pain would be the data cleaning

lapis sequoia
#

i dont wanna do it manually lol

desert oar
grave frost
lapis sequoia
#

ah

desert oar
#

!rules 5

arctic wedgeBOT
#

5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious or inappropriate. Do not help with ongoing exams. Do not provide or request solutions for graded assignments, although general guidance is okay.

lapis sequoia
#

unless you use an official google image search api

#

so u cant help me

#

nice

desert oar
#

those are the rules

grave frost
lapis sequoia
#

i repeat

#

unless you use an official google image search api

#

maybe u dont even read what u write

grave frost
#

don't get too triggered

desert oar
#

i dont think being rude or sarcastic to volunteer helpers on the internet is a good idea either

lapis sequoia
#

so how are u a volunteer helper if u just said u wont

#

XDDD

#

xdxdxd

grave frost
#

very high model complexity, but less time mucking about data

desert oar
#

what would such a model be learning?

#

i have to step out but @ me so i dont miss your messages

fleet tundra
# desert oar can you give an example? it sounds like you might want to use `.groupby`

Well, yeah. Column A has say 10 orders from 1,2,3 (clients) with each order having a combo from the list of products a, b, c, d, e. I want to know in 10 orders how many times 'a' occurs along with 'b' and 'd' and with which orders so I can assign them based on the client. I want to do the same with every product. That is to find the combinations within product column but order specific. Then I want to find the aggregate of the most occurred combinations for each product

#

Do I make sense?

dim olive
lapis sequoia
#

????

#

wtf are u talking about? lol. When did i obliged someone? rofl

grave frost
# desert oar i guess im still not sure what this would mean. if i have 2 different "images" t...

@desert oar basically, from what I know, you have two input layers to take data from those 2 datasets (tbh this would be easier to show visually). When you build networks for those 2 inputs, you basically end up with a multi-branch network (something like doing image segmentation and bounding boxes together, I believe).

Anyways, you can have initial layers for each input (say convs), and then at some point you need a bottleneck to merge both inputs together. You would have to create multiple other "branches" too, to capture complex hierarchical relations from both datasets.

This is where concat comes in: it provides that bottleneck. I have attached an image that kinda gives an example. A single LSTM layer may not serve most datasets well, and this is where multi-branch networks come in.

nets like inception are a pain in the <> to work with due to their branched structure, but ofc a complex model can help much more than spending months on data processing and alignment.

again, I haven't implemented this, so there may be some important aspect I've missed. take my ideas with a grain of salt 👍

#

Another extremely basic one - doesn't seem to be too much of a problem as long as you pool and flatten the conv outputs eh?

dim olive
lapis sequoia
#

i dont have to discuss anything. It seems u didnt read the whole conversation

grave frost
#

theoretically, sounds good. multiple branches constituting different networks architectures working on different data types. but I guess you can always ping up your buddies to suggest and see if they might research a bit bout it

#

oh wait - you can also do transfer learning 🤣 it would be hell, but if you train different branches separately and use set_weights/get_weights to reconstruct a separate tf.keras.Model with them, you can also fine-tune the whole damn thing brainmon

desert oar
#

@grave frost i think this still doesn't apply to the particular case i described, whereas transfer learning would. this is just segmenting the input features for a single record. it's a great idea, kind of like doing what we did with building 2 models and ensembling them, but in a single network

#

but it doesn't solve the "use data from the big dataset to inform model on small dataset"

#

whereas transfer learning is meant for this (re: our discussion from a while ago)

#

nice graphics though

#

very helpful

desert oar
fleet tundra
desert oar
#

it'd help if you gave some actual example data and example outputs

#

im not sure what "filtered by order" means, i thought that's what a combination w/ other products meant

fleet tundra
#

Okay. Sorry about that.

```
   Order     Product       Client
   100          A             1
                B
   101          A             2
                C
   101a         B
                C
   102          D             3
                A
                B
                C
```
If that's my data, I'd like to get a each client to be recommended of other products based on other purchases. Here Client 2 has 2 orders so while grouping each order is treated as a separate entity for aggregation of product suggestion
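
One way to read the request above is "for each product, count how often every other product shows up in the same order, then keep the top 5". A minimal sketch of that reading using only the standard library; the rows mirror the example table, and assigning order 101a to client 2 is an assumption based on "Client 2 has 2 orders". With pandas, the `baskets` mapping could instead come from something like `df.groupby("Order")["Product"]`.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Rows mirroring the example table: (order, product, client).
# Order 101a is assumed to belong to client 2.
rows = [
    ("100", "A", 1), ("100", "B", 1),
    ("101", "A", 2), ("101", "C", 2),
    ("101a", "B", 2), ("101a", "C", 2),
    ("102", "D", 3), ("102", "A", 3), ("102", "B", 3), ("102", "C", 3),
]

# Collect the set of products in each order (each order is its own basket).
baskets = defaultdict(set)
for order, product, _client in rows:
    baskets[order].add(product)

# For every product, count how often each other product shares an order with it.
co_counts = defaultdict(Counter)
for products in baskets.values():
    for a, b in permutations(sorted(products), 2):
        co_counts[a][b] += 1

def top_partners(product, n=5):
    """Return up to n (other_product, count) pairs, most frequent first."""
    return co_counts[product].most_common(n)

print(top_partners("A"))
```

Recommending per client would then mean looking up `top_partners` for each product a client already bought and dropping the ones they already have. The association-rule and collaborative-filtering approaches mentioned later in the chat are more principled versions of the same idea.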
fleet tundra
fervent turret
#

def formatCurrency (balance):
if balance == savBal:
return str(updateBalance(balance, savIR))

def updateBalance (balance, rate):
savBal = balance + balance * rate/100
else:
chBal = balance + balance * rate/100
return balance + balance * rate/100

savBal = float(input("Enter your savings balance:"))
chBal = float(input("Enter your checking balance:"))
savIR = float(input("Enter your savings interest rate %:"))
chIR = float(input("Enter your checking interest rate %"))

print("Your updated savings balance is", formatCurrency(savBal))
print("Your updated checking balance is", formatCurrency(chBal))

#

can someone plese help me

#

please*
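
The posted code lost its indentation and mixes the two functions together, so the intent has to be guessed. A minimal sketch of one plausible reading: `update_balance` applies one period of interest and `format_currency` only formats a number. The snake_case names, the dollar sign, and the fixed example values (standing in for the `input()` calls) are assumptions.

```python
def update_balance(balance, rate):
    """Return the balance after adding one period of interest at `rate` percent."""
    return balance + balance * rate / 100

def format_currency(amount):
    """Format a number with thousands separators and two decimals."""
    return f"${amount:,.2f}"

# Fixed example values stand in for the input() prompts in the original.
sav_bal, sav_ir = 1000.0, 5.0
ch_bal, ch_ir = 250.0, 1.0

print("Your updated savings balance is", format_currency(update_balance(sav_bal, sav_ir)))
print("Your updated checking balance is", format_currency(update_balance(ch_bal, ch_ir)))
```

Passing the rate in explicitly avoids comparing `balance == savBal` to decide which rate applies, which breaks as soon as both accounts hold the same amount.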

grave frost
serene scaffold
grave frost
#

ofc, I wouldn't know the immediate advantages of that as opposed to an ensemble, but I don't see a case where a unified architecture would not be appropriate.

#

Plus you are missing the key point @desert oar: with multiple input nodes, the network can optimize which features to use from each branch, use them to varying degrees (dropping features with no significance), and pool the information gained from each branch far more accurately than a naive ensemble.

lapis sequoia
#

So, I have an interesting thing with GPT-2 and discord.
I wanna use GPT-2 to make a chatbot, but I don't know how to filter out previous messages from its output so it only makes 1 message in response to other messages and stuff.
Currently, I append every message into a list called convo, and I feed it the previous 3 messages.

#

It generates the message and sends it to the discord channel, which triggers another message, adding its message to the conversation. The convo list stores everything, but when I pass the previous 3 messages in, its output also contains those previous messages, which leads to messages getting huge, like over 2000 characters, in a matter of seconds. My current code is:

if message.channel.id == focus_channel:
      messages = chatlog[-3:]
      convo = ""
      for x in chatlog:
        convo += f"{x}\n"
      inputs = tokenizer.encode(convo, return_tensors='pt')
      outputs = model.generate(inputs, max_length=50, do_sample=True)   
      text = tokenizer.decode(outputs[0], skip_special_tokens=True)
      print(text.split("\n"))
      await message.channel.send(text,reference=message,mention_author=False)
austere swift
#

can't you just take the last message from the convo list?

#

i might be misunderstanding your question

lapis sequoia
#

I want to take the last few messages so that it continues the conversation instead of completing the message.

#

Currently its output is

message1
message2
message3
bot_response
with
newlines
carmine iron
#

sometimes i see

#

what does this mean / do

austere swift
#

so just take the bot_response line

#

you can do something like text.split("\n")[3]

lapis sequoia
#

But what about all the rest of its response? (Its responses have newlines in them)

#

wait

#

I can probably just do text.split("\n")[3:]

#

Nope, that breaks horribly, just like the other times lol
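
Splitting the decoded text on newlines is fragile because the reply itself can contain newlines. A common alternative is to drop the prompt at the token level: `generate` returns the prompt tokens followed by the new ones, so slicing at the prompt length leaves only the reply. A minimal sketch, assuming a Hugging Face causal LM and tokenizer (for example `GPT2LMHeadModel` and `GPT2Tokenizer`) and a transformers version that supports `max_new_tokens`; the function name and defaults are made up. Note also that the posted loop iterates over the whole `chatlog` rather than the three-message slice, and that `max_length=50` counts the prompt tokens too.

```python
def generate_reply(model, tokenizer, chatlog, n_context=3, max_new_tokens=40):
    """Generate a reply from the last `n_context` messages and return only the new text."""
    # Build the prompt from the recent slice only, not the whole history.
    prompt = "\n".join(chatlog[-n_context:]) + "\n"
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,  # caps generated tokens, excluding the prompt
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs.shape[-1]
    new_tokens = outputs[0][prompt_len:]  # drop the echoed prompt tokens
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

The model may still invent further turns of the conversation inside the reply; whether to keep the full text or cut it at the first newline is a separate choice.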

mint palm
#

the code of a course is using tf.train.GradientDescentOptimizer(0.01).minimize(cost)

#

but this GradientDescentOptimizer is in version 1 of tensorflow

#

should i be doing this course?

#

or is it outdated

autumn basin
#

It’s outdated

#

If it’s using TF 1.0

mint palm
#

they say this

ripe forge
#

TF2, specifically its Keras API, will be a lot nicer.

#

Unfortunately you'll probably run into this issue with a lot of courses, since TensorFlow 2 is relatively new.

#

So as long as you're willing to port the code over, or at least get a sense of how the same code could be written in TF2, you could proceed.

mint palm
ripe forge
#

Ultimately the specifics of writing code don't really take away from the learnings offered by a course itself. So decide based on whether the course is good or not.

mint palm
#

ok i believe its good

ripe forge
#

OK cool, then in that case just keep the caveat in mind: you might have to modify the code presented

#

In most cases TF2 Keras looks very similar to standalone Keras, so it's an easy port if they use Keras.

mint palm
#

yeah they say they will use keras later

silver widget
#

Hi all, got a question about overfitting. Can this be assumed as overfitting? or the model is good?
LR scores
0.9452887537993921
              precision    recall  f1-score   support

           0       0.95      1.00      0.97       933
           1       0.00      0.00      0.00        54

    accuracy                           0.95       987
   macro avg       0.47      0.50      0.49       987
weighted avg       0.89      0.95      0.92       987

[[933   0]
 [ 54   0]]

#

I see the accuracy is good but the 0's at the negative side makes me question the model

#

also the precision, recall and f1 scores for class 1 (people who had a stroke) are all 0

ripe forge
#

Is this score on train or test?

ripe forge
#

This model can be entirely replaced by a single line of code print("NO stroke")

#

To state it more explicitly, the problem here is class imbalance. Your dataset is going to have more cases without stroke than with. Using accuracy as a metric then, would favour a model that just gets the no stroke cases correct.

#

I'm going to go out on a limb and assume that you agree that would make for a fairly bad model. So this isn't a good model, and our metric choice of accuracy isn't appropriate

#

As for overfitting, you only get a sense of that if you compare the fit on train vs the fit on test, after choosing a good metric.
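
To make the point about metric choice concrete, here is a small sketch that recomputes accuracy, per-class recall and balanced accuracy from the confusion matrix posted above. It assumes the scikit-learn layout (rows are true classes, columns are predictions), which matches the support column of 933 and 54.

```python
# Confusion matrix from the report: rows are true classes, columns are predictions.
cm = [[933, 0],
      [54, 0]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall_pos = tp / (tp + fn)   # share of stroke cases the model catches
recall_neg = tn / (tn + fp)   # share of no-stroke cases it gets right
balanced_accuracy = (recall_pos + recall_neg) / 2

print(f"accuracy={accuracy:.3f}")
print(f"recall(stroke)={recall_pos:.3f}")
print(f"balanced accuracy={balanced_accuracy:.3f}")
```

Accuracy looks high only because the no-stroke class dominates; recall on the stroke class and balanced accuracy show that the model never predicts a stroke.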

silver widget
silver widget
silver widget
small mulch
#

Hello, I am writing code that scrapes data from a website. Now, I want to put it in an excel file. What would be best - csv or pandas or something else? Also, I want it to append the new fields to the existing file and not make a new one when it is run again

ripe forge
#

Pandas should be easy to work with once you're used to it.
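
For the append-on-rerun part, a minimal sketch using only the standard `csv` module (a CSV file opens fine in Excel). The function name, column names and file path are made up for illustration. With pandas, `DataFrame.to_csv` with `mode="a"` and a conditional `header` argument would be the rough equivalent.

```python
import csv
import os

def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)

# Example call with hypothetical scraped fields:
# append_rows("scraped.csv", [{"title": "Widget", "price": "9.99"}], ["title", "price"])
```

Opening the file in `"a"` mode is what keeps earlier runs intact; the header check just avoids repeating the column names on every run.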

small mulch
#

Okay, Thank you so much!

somber prism
#

anyone know how to fill in the specified value based on the query for the df

#

so for eg if theres a 0 in anyone of the column in pandas df , i want to change those values to some specified val

#

how do i do it without using for or while loop

#

i know about fillna but this isnt for NaN values
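
Two loop-free options in pandas are `DataFrame.replace` and `DataFrame.mask`. A minimal sketch with made-up data and a hypothetical replacement value of 9:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 0], "b": [0, 2, 2]})  # made-up data
fill_value = 9  # hypothetical replacement value

replaced = df.replace(0, fill_value)    # swap every 0 in any column
masked = df.mask(df == 0, fill_value)   # same idea via a boolean mask

print(replaced)
```

`replace` is the shorter spelling for an exact value, while `mask` generalizes to any condition, for example `df.mask(df < 0, fill_value)`.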

desert oar
desert oar
# fleet tundra Okay. Sorry about that. ```Order Product Client 100 A ...

But I still don't understand how you want to calculate these recommendations. Can you give an example of the frequencies you calculate for this data?

Also, there are known algorithms that sound like what you're trying to do. You might want to look into "association rules" https://towardsdatascience.com/association-rules-2-aa9a77241654, and "collaborative filtering" https://towardsdatascience.com/intro-to-recommender-system-collaborative-filtering-64a238194a26.

desert oar
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [1, 1, 0, 0, 2, 2]
002 | [1, 1, 9, 9, 2, 2]
003 | [1, 1, 9, 9, 2, 2]
desert oar
grave frost
#

@lapis sequoia GPT-2 can't hold a convo very effectively though (not as good as its better counterpart)

grave frost
winged stratus
#

I wanted to know how many images are required to make a decent quality gan

#

ofc thats a very general question

#

but i have 300 1000x1000 images of mountains

#

and i plan on training a wasserstein gan on these images

#

would that be enough

#

or should i collect more images?

#

i have trash computer, so i dont want to waste a lot of time on a gan that wont work on 300 images

#

thanks!

grave breach
#

Even though GANs can work with very little data, I don't think 300 is enough

#

Also, I think in this case lowering the quality could help a lot

#

@winged stratus

winged stratus
#

thanks