#data-science-and-ml

still karma
#

We will learn derivatives and integrals in grade 12

winged yew
#

should i send my file ?? or error pic is enough ??

grave frost
still karma
#

Okey then

serene scaffold
grave frost
#

just make sure your grades in maths are tip-top - I usually don't compare with grades but many colleges do see math grades so keeping them at a good level helps a lot

winged yew
#

i dont understand

#

i will send file here ok

#

?

serene scaffold
winged yew
#

why

serene scaffold
#

We disabled it because we don't want people to have to download files. We do have a pastebin

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

winged yew
#

ok ok

#

i will jst send pic then ok ?

serene scaffold
#

You can if you'd like. People are more likely to help with text.

winged yew
#

ok

mortal trout
#

Hi can someone help me, suppose i have a cnn model (binary classification), suppose i give it a image thats completely irrelevant is there any way to restrict it ?

serene scaffold
short heart
#

Can I somehow save keras lstm model so I dont have to teach it everytime I use it

#

into a file

short heart
#

yeah

#

ty

#

Is there a way to take a mean value of several prediction attempts. I feel like theres a "proper" way to do it

serene scaffold
short heart
#

yea

serene scaffold
# short heart yea

and this is for image classification? you might look into precision and recall

short heart
#

i think you got me wrong

#

for example im making 100 predictions with the same input

#

i want to get mean
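
A minimal sketch of what is being asked here, assuming the model really does give different outputs for the same input: run it several times, stack the results, and average along the new axis. `noisy_predict` is a hypothetical stand-in for the real model's predict call, not a Keras function.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_predict(x):
    """Hypothetical stand-in for a model whose output varies between calls."""
    return x * 2.0 + rng.normal(scale=0.1, size=x.shape)

def mean_prediction(predict_fn, x, n_runs=100):
    """Call predict_fn n_runs times on the same input and average the results."""
    runs = np.stack([predict_fn(x) for _ in range(n_runs)], axis=0)
    # Mean is the combined prediction; std shows how much the runs disagree.
    return runs.mean(axis=0), runs.std(axis=0)

x = np.array([1.0, 2.0, 3.0])
mean, spread = mean_prediction(noisy_predict, x)
print(mean, spread)
```

The spread is returned alongside the mean because a large disagreement between runs is worth investigating before averaging it away.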

serene scaffold
#

so your model, once trained, predicts the same input differently?

short heart
#

yea

serene scaffold
#

that doesn't sound right to me

short heart
#

its lstm

#

just, is there a proper way to get mean of it

#

for example i can get predictions looking like that

plain garden
#

I'm currently running R studio, but I've installed reticulate. I have a project to complete, and my first step is to define which of the variables are Dependent and Independent.

#

Anyone?

short heart
#

and that

#

they r both technically right so i want to get mean of them

plain garden
#

Can anyone help me?

serene scaffold
plain garden
#

I'm running reticulate (Python interface in R)

#

Both basically

#

The easiest way to define dependent and independent

#

With R or Python

grave frost
short heart
#

yep

#

ive got no idea why it was like that myself

grave frost
#

check your model, input_data, predictions, increase RNN units, check masking, etc.

short heart
#

ive checked it all

#

all seems fine

#

even double checked

grave frost
#

then make your model deeper

short heart
#

wdym

grave frost
#

inception - we need to go deeper

short heart
#

more layers?

grave frost
short heart
#

gonna try it

grave frost
#

please do, and don't overfit

short heart
#

yeah

#

ill see it

plain garden
bronze wolf
#

So I have a bunch of data in the following layout: Id, [x,y][x,y][x,y], and I want to rearrange the x,y pairs so that the one with the largest x value is on the left, then the next largest, then the smallest. I am stuck on how to do this. (I already have the data in an sql database, which each x and y in its own column)

tidal bough
short heart
#

i ran it 5 times with same input and its completely same in each answer

#

almost same i think

#

still, how can i add more data to lstm

plain garden
#

Can anyone help me please?

short heart
#

i have several datasets but again, uniting them is wrong i think

tidal bough
#

hmm, are you having trouble doing this smartly using vectorized operations (I have no idea about databases, so can't really help with that), or at all?

arctic wedgeBOT
#

Hey @tidal bronze!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

velvet thorn
trim oar
#

Or write a user-defined function that slices it? Or simply reorder the columns so you see the ones you want in the first 10?

velvet thorn
#

the SQL solution would be different from the numpy one

tidal bronze
#

https://paste.pythondiscord.com/ratajovuga.http
let's say we have this dataset and I want to do kmeans clustering what would be a better input:

  • feed the algo averages for petal_width
  • feed the algo list of observed values for each row so [0.2, 0.3,0.2,0.4] for example

What would intuitively make more sense?

trim oar
#

Sounds like you’re just passing simple problems to neural nets XD

plain garden
#

Can anyone help me?

bronze wolf
plain garden
#

I'm currently running R studio, but I've installed reticulate. I have a project to complete, and my first step is to define which of the variables are Dependent and Independent.
So, it is both an R question and Python.

#

Please help

velvet thorn
#

example:

>>> a
array([[[1, 9],
        [2, 3],
        [0, 4]],

       [[2, 4],
        [1, 2],
        [3, 4]]])
>>> np.sort(a, axis=1)
array([[[0, 3],
        [1, 4],
        [2, 9]],

       [[1, 2],
        [2, 4],
        [3, 4]]])
#

that's the concept

plain garden
#

Am I invisible?

tidal bough
#

less trivial if it's a pandas dataframe (you'd need to sort only some columns row-wise)

velvet thorn
#

reversing the sort is trivial (look into np.flip)
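
One caveat with the `np.sort(a, axis=1)` example above: it sorts the x column and the y column independently, so the original pairs get mixed (the pair `[1, 9]` no longer appears in the output). A hedged sketch that keeps each `[x, y]` pair intact and orders by x, largest first, assuming the data is an array of shape `(n_ids, n_pairs, 2)`; the function name is made up for illustration:

```python
import numpy as np

def sort_pairs_by_x_desc(a):
    """Reorder the [x, y] pairs of each row so that x is descending."""
    # argsort on the negated x values gives a descending order per row.
    order = np.argsort(-a[..., 0], axis=1, kind="stable")
    # Apply the same order to whole pairs, so x and y stay together.
    return np.take_along_axis(a, order[..., None], axis=1)

a = np.array([[[1, 9], [2, 3], [0, 4]],
              [[2, 4], [1, 2], [3, 4]]])
result = sort_pairs_by_x_desc(a)
print(result)
```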

tidal bough
plain garden
#

I'm currently running R studio, but I've installed reticulate. I have a project to complete, and my first step is to define which of the variables are Dependent and Independent.
So, it is both an R question and Python.

#

Can you help me with this?

indigo obsidian
# plain garden Am I invisible?

Your question is just so vague that it sounds more like a first week intro to stats thing and completely irrelevant to R or Python.

plain garden
#

Ok

bronze wolf
velvet thorn
bronze wolf
bronze wolf
velvet thorn
#

just use vanilla Python sorted

bronze wolf
violet glacier
#

hello I want to get better at data science are there any good resources videos pdfs etc u can reach me

grave frost
#

if a network can map foo-bar, then I don't see how it is simple

#

but yes, that advice is consistent based on my research so its probably correct

#

but still, I don't get it 🤷

trim oar
#

I don't know your data or your problem statement, but it sounds more like you're driving a Ferrari down an alley, or cutting a string with a chef knife

#

@grave frost Because neural networks are solving complex non-linear problems by their ensemble-on-steroids nature, if there isn't much data or the problem itself can be solved with a linear regression / logistic regression, there isn't much point in using a neural network

#

In fact in this case, linear models are preferable

short heart
#

How can I improve lstm model

grave frost
#

I tried with autokeras and that got me about 70~ish - but I can't use it since I don't know the code for the custom layers they use

#

so if autokeras can do that, I guess an NN can get SOTA 🤷

trim oar
#

What other ML algorithms have you tried?

#

And how many data points do you have?

#

Anything less than 5000 is almost not worth it

#

But if that's the benchmark accuracy, then I guess that's it

#

But I'd be wary of trying NNs before even trying other ML models

grave frost
#

I tried autokeras to see whether an NN works and that damn thing did manage to cross SOTA.

sacred gate
#

Hi everyone. I would like to ask your opinion about courses on coursera. Is it worth it or it's better to learn ML by reading books? Plus I'd like to know your opinion about their certificates as well.

lapis sequoia
#

Im trying to generate images with an opencv cascade sheet i made a while back, I cant find any info online on this, how would i do it?

desert oar
#

@oak tusk hi, i would restate your problem as this:

  • i have a dataset of stars, and there is a function/curve/time-series associated with each star
  • i want to be able to input the attributes of any star and obtain a hypothetical time series for that star, so that i can interpolate between stars and watch the shape of the time series change
#

is that accurate?

oak tusk
#

I think yes

#

they are evolution tracks for different masses and I'd like to interpolate between to get tracks for any mass

#

I heard about something called a VARIMA model but I'm not sure how to use it

#

So idk the best way to do this

#

Wow no one is here lol

oak tusk
#

Still no response?

#

they have been able to do it so I should be able to too

exotic maple
oak tusk
#

How can I do VARIMA in python?

#

Is there any tutorial

#

also how will it help me to get tracks for any mass

#

How should I solve this problem?

#

I'm just not very sure at all atm

#

Can anyone help?

exotic maple
#

sorry man I havent done time-series in ages and I cant catch-up right now 😦

oak tusk
#

don't even know if I should do time-series here

exotic maple
oak tusk
#

So what should I do

uncut orbit
#

ok that was a lot

#

let that be a lesson for us

#

never to do things like that

serene scaffold
uncut orbit
#

right

#

is there any way i can get the articles from medium about data science without having a subscription?

#

i keep on using my articles up

#

and i dont get how an mdp works

upper schooner
#

does anyone know how to asynchronously load image data in PyTorch while training a model so that my hard drive doesn't bottleneck it

upper schooner
#

is there a package for that or do i need to code it from scratch? I know about pytorch dataloaders but dont know how to make it load in parallel

grave frost
#

even then, I don't see how hard writing your own can be

upper schooner
grave frost
upper schooner
#

I have a really old 3200rpm hard drive so I'd imagine it would slow down training a lot though i should probably test it before trying to solve a problem that might not even exist

lapis sequoia
#

Guys is the ml course from Stanford University worth it?

#

The one featured by Coursera I think

hollow sentinel
#

do you know the math required for that course?

lapis sequoia
#

Is that math not part of it?

#

I don’t have probs learning math though that’s why I’m going into it

hollow sentinel
#

math is part of it

young dock
#

how do i change the marker size of sm.graphics.plot_regress_exog

hollow sentinel
#

I’m trying to say that they expect you to know that math going into the course

young dock
#

my dots are way too thick

red hound
#

Has anyone ever implemented the gumbel softmax trick thing (using tf/keras) ? Any suggestions on how to approach it?

grave frost
red hound
#

yeah, good point

uncut orbit
#

oh i made a video on that once

south quest
#

let's try keep things to on topic discussion in this channel, memes aren't appropriate in this channel

rare herald
#

👩‍💻🐍 Does anyone else find #Matplotlib's API hard to remember? I have friends who just export their #Pandas data and plot with #Excel. I spend a lot of time googling #Matplotlib help. How do you make your #Python #Data #Plots?

hollow sentinel
#

yeah I was going to ask too

#

I guess we'll never know

#

maybe they self-botted

grave frost
#

Well, my model can overfit with a small piece of sample data. but with 80M+ parameters, it shows no sign of overfitting at all - which is pretty confusing. does anyone have any idea as to what the problem might be?

#

where are all the intellectuals when you need 'em

uncut orbit
#

what model are you using?

grave frost
#

tried all - CNN, LSTM, Transformer

uncut orbit
#

you stacked them?

grave frost
#

no, single

uncut orbit
#

like which model did you use for your data that has 80+mil params

grave frost
#

LSTM + fuckton of dense layers

zenith agate
#

has anyone worked with pretrained tf hub models? Im trying to work with my camera and get 224x224 images into their pretrained imagenet, but its not accepting the format. How do I convert from the numpy array to tf input?

tidal bough
#

Tensorflow tensors pretty much are numpy arrays with some extra bells and whistles, casting them to tf.tensor should be all that's needed.

#

Though I'd expect even that not to be necessary (the model should do it on its own)

zenith agate
#

ok well then my problem lies in a different area... im trying to build a quick classifier without training, and ive seen that this model can already classify the things i need.

hub.KerasLayer("https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
                   trainable=False)

from the tutorial I know this is the way to load the model into a keras layer, but after this do I have to put it into a sequential model to make it usable?

exotic maple
mortal pendant
#

I'm trying to train an RNN using a module called textgenrnn, but while I was previously able to get understandable results, after trying to improve them, I've just made them pretty much non-existent. I was making loads of changes at once since it takes a while to train so I couldn't really just make one change and then see if it produced better results or what it did, so I have no idea what caused it. Any ideas? The 'biggest' change I think I made was switching to word_level, but I don't see how that could be causing this- surely it would actually be the opposite effect?

Code:
```py
from os import environ,listdir
environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

for filename in listdir("inputMessages"):
    if filename[:-4]+".hdf5" not in listdir(".") and filename != "all.txt":
        print(filename[:-4])
        from textgenrnn import textgenrnn
        textgen = textgenrnn(config_path="config.json", name=filename[:-4])
        textgen.train_from_file('inputMessages/'+filename, num_epochs=5)
        del textgen
        del textgenrnn
```
Sample text (inputMessages/wuulfy.txt, contains 26k lines):
```
i like that colour on the left
ew anime
i like the logo's
this isnt even close to done cameron
or even red screen if that's a things
yoooooooo
except when i started i was set 3 for maths
now that i have instagram
you cant break anything
i aint judging tho
```
Sample results (from wuulfy.txt, temperature 1, other temperatures produce literally nothing):
```

a.
'1.
```
config.json:
```json
{"rnn_layers": 2, "rnn_size": 128, "rnn_bidirectional": false, "max_length": 32, "max_words": 16384, "dim_embeddings": 100, "word_level": true, "single_text": false, "name": "textgenrnn"}
```

rare herald
#

👩‍💻🐍 Does anyone else find Matplotlib's API hard to remember? I have friends who just export their Pandas data and plot with Excel. I spend a lot of time googling Matplotlib help. How do you make your Python Data Plots?

exotic maple
#

matplotlib is af

mortal trout
#

i know we can do binary classification two times, or go for multi classification, im asking if there's any easy hack doing it

#

also im doing transfer learning using vgg16 if that helps

serene scaffold
fickle surge
#

python is better than R when it comes to machine learning, correct?

serene scaffold
fickle surge
#

ok

#

so python is the thing i should put my main focus on

zenith agate
#

has anyone ever used
detector = hub.load

serene scaffold
#

if you have general programming skills and you understand what you're trying to do, you'll be able to figure out the data science libraries when you need them.

fickle surge
#

ok, im currently taking a python course on codecademy, then i plan on reading a book on machine learning, and finally i plan on doing the codecademy courses on machine learning before starting any projects.

serene scaffold
fickle surge
#

i also know very basic java and javascript

serene scaffold
#

I have no idea what their python 3 course is like, but my one experience with codecademy like 7 years ago was pretty bad

#

I have no idea if the way they present the information now is the same

fickle surge
#

the machine learning courses seem pretty great

hollow sentinel
#

I did the code academy python course in high school and it made me frustrated

zenith agate
#

ive taken the python 3 course, it was pretty informative and they got me through the basics but majority of your expertise will come from working on projects

fickle surge
#

ive been taking the python course, its nice

serene scaffold
#

I've made an Alexa program and it has nothing to do with AI

#

Amazon abstracts away all the AI related considerations.

hollow sentinel
#

This code academy stuff looks different than what I did back then

#

I’m not even sure the course still exists lmao

fickle surge
#

haha

#

yeah its all really nice now

#

very interactive

hollow sentinel
#

Is it free?

fickle surge
fickle surge
#

others you need codecademy pro

hollow sentinel
#

oh I don’t pay to learn how to code 🥴

serene scaffold
#

Just based on the course titles for the beginner friendly courses, I would try to learn enough general Python knowledge to start with the intermediate content.

hollow sentinel
#

That’s my policy

fickle surge
#

this is what a normal lesson looks like now... not sure if this is any different

hollow sentinel
#

To each their own I guess

#

if courses are more your flavor then go for it

serene scaffold
#

@fickle surge are you aware of our resources page?

fickle surge
#

yes

hollow sentinel
#

Did you like automate the boring stuff?

fickle surge
#

huh?

hollow sentinel
#

automate the boring stuff w python

serene scaffold
#

which of us are you asking?

hollow sentinel
#

Xeos

serene scaffold
#

I don't recall them saying they had read that

fickle surge
#

read what?

hollow sentinel
#

Oh I thought he said he knows about the resources we offer

#

sorry

fickle surge
#

i looked at it a little

#

mainly the reading section

serene scaffold
hollow sentinel
#

understood

fickle surge
#

also i think i still might take the beginner courses just because they are interesting

#

should i look at the ones about learning r or just avoid them?

serene scaffold
#

I wouldn't think of it as avoiding R, but rather focusing on a more cohesive set of learning goals

#

(which probably won't include R)

fickle surge
#

ok

#

so dont take the courses on learning r?

serene scaffold
#

I wouldn't

fickle surge
#

ok

#

and i think the beginner lessons are mainly python oriented and kind of a way of dipping your toes in the water of machine learning

serene scaffold
#

R was created specifically for scientific computing, whereas Python was intended to be general-purpose and obtained a comprehensive data science stack after the fact.

fickle surge
#

ok

serene scaffold
#

but Python, even when used for scientific computing, benefits from having a large ecosystem that isn't specifically part of the data science stack.

fickle surge
#

yeah

hollow sentinel
#

What does ecosystem refer to?

fickle surge
#

python is great

hollow sentinel
#

Ecosystem of libraries?

serene scaffold
hollow sentinel
#

Oh

fickle surge
#

i think theyre referring to everything you can do with it

serene scaffold
#

even Python Discord is part of "the ecosystem"

fickle surge
#

not just machine learning

hollow sentinel
#

oh I just only see stelercus referring to the ecosystem w python so I didn’t know what it meant

fickle surge
#

oh

hollow sentinel
#

Now I do tho

serene scaffold
#

I'm not the only one who says it 🤷‍♂️

fickle surge
#

alright well ill finish working on this course and then ima read my book on machine learning. after that ill get started on the beginner lessons

#

mainly going to read about the concept of machine learning in the book rather than how to use it bc ill learn that in the course

#

cya!

exotic maple
#

comparing the ecosystem of python vs the ecosystem of R is analogous to the market share of windows vs linux; not even comparable

#

at this point R is "legacy" for DS. and tbh, if you know python, learning R should be trivial

fickle surge
#

ok

exotic maple
#

and the best thing about learning python over R is, even if you later dont care about DS / ML / AI, etc

#

you can always program in python

fickle surge
#

true

exotic maple
#

I have pet project of a text-based game. I'm trying to build it using no dependencies and no libraries

fickle surge
#

nice

#

alright! bye

misty flint
#

yes but i like the R libraries for statistics more

#

i mostly use python for most things tho

misty flint
#

rip R

#

🕯️

wintry sapphire
#

Hi guys, I have an issue that would need some help
so currently I have 2 csv files, 1 is the prices, the other is the amount of money invested
what I want to achieve is for it to be on a rolling basis
so like based on the money invested, I want to be able to get the number of stocks
and it keeps adding up
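
A rough sketch of the running total described here, assuming both files share a `date` column and that shares bought on a day are simply money invested divided by that day's price. The column names and the inline sample data are hypothetical stand-ins for the two real CSV files.

```python
import csv
import io
from itertools import accumulate

# Hypothetical file contents; the real files and column names will differ.
prices_csv = "date,price\n2021-01-01,10\n2021-01-02,20\n2021-01-03,25\n"
invested_csv = "date,invested\n2021-01-01,100\n2021-01-02,100\n2021-01-03,50\n"

def read_column(text, column):
    """Return {date: value} for one numeric column of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["date"]: float(row[column]) for row in reader}

prices = read_column(prices_csv, "price")
invested = read_column(invested_csv, "invested")

# Only use dates present in both files, in chronological order.
dates = sorted(prices.keys() & invested.keys())
bought = [invested[d] / prices[d] for d in dates]
# Running total of shares held after each date.
holdings = list(accumulate(bought))

for d, h in zip(dates, holdings):
    print(d, h)
```

With pandas the same idea would be a division of the two aligned columns followed by `cumsum()`.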

#

@misty flint any idea sir?

mortal pendant
#

In aitextgen when using line_by_line, the <|endoftext|> seems to be ignored meaning I have to split at it manually and meaning the min_length is inaccurate (I'm trying to force it to be a certain length, but it just shoves in an <|endoftext|> and then is a bit spammy, instead of properly reaching that length). Any ideas of how to solve this? I see a similar looking issue https://github.com/minimaxir/aitextgen/issues/88 but apparently this was patched early last month. I don't have the <|startoftext|> though

grave frost
#

Hi everyone, my model can overfit with a small piece of sample data. but with 80M+ parameters, it shows no sign of overfitting at all - which is pretty confusing. does anyone have any idea as to what the problem might be?

#

even if the data might be too "simple" I think overfitting should still be possible, no?

hard yew
#

Hi, I am looking for how to implement the minimum pooling 2d layer such as
```py
class MinPooling2D(MaxPooling2D):

    def __init__(self, pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs):
        super(MaxPooling2D, self).__init__(pool_size=pool_size, strides=strides, padding=padding,
                                           data_format=data_format, **kwargs)

    def pooling_function(self, inputs, pool_size, strides, padding, data_format):
        return -K.pool2d(x=-inputs, pool_size=pool_size, strides=strides, padding=padding, data_format=data_format,
                         pool_mode='max')
```
#

I got the error: __init__() missing 1 required positional argument: 'pool_function'

#

Could anyone help with this problem?
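
The error most likely comes from `super(MaxPooling2D, self).__init__`, which skips `MaxPooling2D`'s own constructor and calls its parent's instead, and that parent appears to be the one expecting `pool_function`; calling `super().__init__(...)` would probably avoid it. The negation trick itself is sound. Below is a framework-free NumPy sketch of it, where the two 2x2 helper functions are illustrative only and not part of Keras:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2D array with even height and width."""
    h, w = x.shape
    # Split into (row block, row in block, col block, col in block).
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def min_pool_2x2(x):
    """Min pooling expressed as negated max pooling of the negated input."""
    return -max_pool_2x2(-x)

x = np.array([[1, 5, 2, 0],
              [3, 4, 7, 8],
              [9, 6, 1, 1],
              [2, 2, 3, 5]])
pooled = min_pool_2x2(x)
print(pooled)
```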

wicked mantle
#

hmm, how to fix this? keras.optimizers actually have SGD function

spark stag
#

I think that is because doing from keras.models import Sequential tries to treat models as a sub package, not a file, so you can do from keras import models then use models.Sequential for example

serene scaffold
#

I can try to replicate it in a moment.

wicked mantle
#

thanks guys<3

#

solution:
ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow

serene scaffold
wicked mantle
#

yeah, installed it just now
upd:
tensorflow has two different backend versions: cpu (pip install tensorflow) and gpu (pip install tensorflow-gpu).

serene scaffold
#

nice 💥

spark stag
serene scaffold
spark stag
#

well originally I used package and meant a folder that contains files but my terminology may be wrong there

serene scaffold
#

Generally speaking, a module is a representation of a .py file in your file system.

spark stag
#

what I was coming at was the import error:
```py
ImportError: cannot import name 'SGD' from 'tensorflow.keras.models' (C:\Users\jamie\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\api\_v2\keras\models\__init__.py)
```
from
```py
from tensorflow.keras.models import SGD
```
it's looking for an __init__.py when you try to import SGD from keras.models so doesn't it expect it to be a folder or am I getting confused
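
On the `__init__.py` question: a package is a directory of modules, and importing it runs its `__init__.py`, so an import error for a name in a package points at that file. A small standard-library sketch of the distinction; `describe` is a made-up helper, and `json` and `string` are used only because one is a package and the other a single-file module:

```python
import importlib

def describe(name):
    """Return ('package' or 'module', file path) for an importable name."""
    mod = importlib.import_module(name)
    # Packages carry a __path__ attribute listing their directories.
    kind = "package" if hasattr(mod, "__path__") else "module"
    return kind, getattr(mod, "__file__", None)

json_kind, json_file = describe("json")
string_kind, string_file = describe("string")
print("json:", json_kind, json_file)
print("string:", string_kind, string_file)
```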

serene scaffold
spark stag
#

yeah, playing around with file structure that seems to work here, i just don't get why it doesn't work with the keras example in this case

#

was trying to import SGD from models 😢

#

ok now that error makes sense, that was just me being stupid, pycharm still doesn't like that way of importing it though so i'm not sure why it comes up underlined in red

stable cairn
#

Hi folks! I could use some help with a pandas query in #help-cookie , and I figure that if anyone could figure it out, it would be the data science gurus 🙂

short heart
#

Ive had an awesome model trained yesterday, but today im training the completely same code, nothing changed and model is far worse. How can I fix it?

#

Lstm keras.

dusk depot
#

wdym, your operator is backwards

#

you have 25000 < avg

#

don't you mean avg < 25000

grave frost
lavish tundra
short heart
#

for keras

#

im new to keras

grave frost
#

google it

serene scaffold
#

Is anyone familiar with an NER algorithm that can learn a single class when instances of that class (from a human perspective) are really two or more completely unrelated classes?

#

I don't think it would be possible to do it with a statistical approach because the differences between the underlying classes would throw everything off.

lapis sequoia
#

can i please get help csv and pads

#

pandas

#

Ive looked everywhere

hollow sentinel
#

yeah this would be the place for that

oak tusk
#

list.pop()?

exotic maple
#

numpy array? I think np has a np.nan you can use for masking and deleting the nans
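
If the data is a float NumPy array (an assumption, since the original question is not shown), dropping the NaN entries with a boolean mask could look like this:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# np.isnan marks the NaN positions; ~ inverts the mask to keep the rest.
mask = ~np.isnan(values)
cleaned = values[mask]
print(cleaned)
```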

bronze wolf
#

so I have two arrays that look like this: array1: [[7810.0 5710.0], [7910.0 5710.0]] array2: [[7860.0 5710.0]] and I want to check if array two falls in between elements one and two of array one.
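
A possible sketch for this check, reading "in between" as "inside the axis-aligned box spanned by the two points of array one". That reading is an assumption; testing whether the point lies exactly on the line segment would need a different check. `is_between` is an illustrative name.

```python
import numpy as np

array1 = np.array([[7810.0, 5710.0], [7910.0, 5710.0]])
array2 = np.array([[7860.0, 5710.0]])

def is_between(endpoints, point):
    """True if point lies within the per-coordinate range of the two endpoints."""
    lower = endpoints.min(axis=0)
    upper = endpoints.max(axis=0)
    return bool(np.all((lower <= point) & (point <= upper)))

inside = is_between(array1, array2[0])
outside = is_between(array1, np.array([8000.0, 5710.0]))
print(inside, outside)
```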

surreal willow
#

I hope that this is the right channel,
so this happens when I convert a json file to csv

#

how could I possibly fix this?

dusk depot
#

what do you want to happen

waxen girder
#

It looks like each cell is a json file.

short heart
#

How does random seed work? For example if I pick 0.9, is the trained model gonna perform close to 1, or will it be a completely different performance
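
A seed only selects which reproducible random stream you get. It is not a quality dial: neighbouring seeds give unrelated streams, and the seed value says nothing about how well the trained model will perform. A small standard-library sketch of that behaviour (Keras and TensorFlow have their own seeding functions, which are not shown here):

```python
import random

def draws(seed, n=3):
    """Return the first n random numbers produced by a generator with this seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same_a = draws(42)
same_b = draws(42)
neighbour = draws(43)

print(same_a == same_b)     # same seed, same stream
print(same_a == neighbour)  # nearby seed, different stream
```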

grave frost
granite wolf
#

is this usual?

grave frost
#

depends on the dataset

thin prism
# wicked mantle

try using tensorflow.keras.(your need) I might be so wrong about this but it's much more stable than regular keras I believe

#

and thats what stackoverflow recommended me as well haha

#

also try with google colab. I found myself so much easier coding in there. (I dont need to install any libraries haha).

grave frost
#

Keras automatically uses Tensorflow backend

#

even if you run import keras it would show "Using TensorFlow backend"

#

so there is no difference between stability

wicked mantle
#

ok, thnks

river yarrow
mortal pendant
# mortal pendant In aitextgen when using line_by_line, the <|endoftext|> seems to be ignored mean...

I'm still having the above issue, but I have another question. currently, I'm only training on a VPS (Hetzner CX11) and I want everything to be centralised there, but I was wondering if I could have my PC provide resources for modelling like a blockchain would or the likes, to speed up modeling. Would there be a way to have a script on the VPS and a script on my PC communicate and train separate datasets together (maybe my PC could do some of the larger ones while the VPS sticks to the smaller ones, but the files remaining on the VPS)? Or would it just be best to download the large models, whitelist the smaller ones for the VPS, and train on the larger ones and upload manually? Or maybe even collaborate on the same model at the same time? Preferably, I could also out-source training to some of the people involved in this project or even publicly, but I worry this could pose some security risks. The problem is, I'm using the models live as I train them so I would prefer to be able to also upload live. Any ideas?

mortal pendant
#

Found out what I'm looking for is distributed training. Any ideas how I would accomplish this with aitextgen? I know aitextgen uses PyTorch, so I assume if distributed training is possible in PyTorch then I can use it with aitextgen

bronze skiff
#

just read some docs on distributed data parallel

#

distributed training isn't difficult to do, unless you want something more fault tolerant

#

which is... painful

mortal pendant
#

I already have (and still am) but I don't think it's going to help since there's a reason I'm using aitextgen- it's a lot more simple than the packages it covers. I'm usually good at learning new packages such as discord.py and I'd consider myself pretty sufficient in Python but I've never been able to get my head around things like PyTorch, Tensorflow, Keras, transformers...

#

And I especially don't want to spend ages doing this to find out using something like SFTP would be just as effective and easier 😅

bronze skiff
#

then yeah, you might be a bit out of your rocker with distributed training

#

again, the point of distributed training is to speed up training times over clusters of resources.. most of the ways people do it are pretty low level

mortal pendant
#

So, what would be the best option that doesn't involve the likes of Keras, Tensorflow, PyTorch etc?

bronze skiff
#

for what? training a neural net over multiple nodes?

mortal pendant
#

Specifically for distributing training in aitextgen using mine (or multiple) computers connected to an external server (VPS)

bronze skiff
#

i mean, without knowing the internals of say, pytorch or lightning? probably isn't

mortal pendant
#

And aitextgen uses PyTorch and Transformers, so pretty much something that would be efficient with those models

bronze skiff
#

luckily it seems it's built on lightning which is more high level

#

but you still need to know how to configure your gpus/multinode cpus with the right comm protocols

#

so nccl for nvidia gpus and mpi for cpus

mortal pendant
#

Surely all the multinode stuff is only for if I'm doing distributed data parallel?

bronze skiff
#

you do know what distributed training is, right

#

you're using multiple resources, like multi gpu or multinode cpus

#

if you are just training on one gpu or your local cpu node, then you don't need any of this

mortal pendant
#

Yes, training a model over multiple nodes at the same time. But since that requires knowledge of things like PyTorch, I'd instead be training separate models per computer, which surely would mean I wouldn't need to use the multinode stuff, I just connect with something like SFTP?

bronze skiff
#

okay-- but then its not distributed training

mortal pendant
#

I'm training on a CPU on the VPS, and would prefer to be able to use the GPU I have on my PC but isn't necessarily needed, they could both use CPU if this would be more efficient

bronze skiff
#

now you're just training... n independent models

mortal pendant
#

I know, I asked for an alternative to distributed training

bronze skiff
#

but what are you trying to achieve?

#

it seems like repeated work

mortal pendant
#

I'm already training multiple models- currently, my VPS trains a bit on one model, then starts training another, and cycles through them all training them a bit more

bronze skiff
#

i mean, if you just want n independent models that never talk to each other, then its not like distributed training at all

#

then at which point you can forgo all this

mortal pendant
#

Yes, like I said, I'm asking for an alternative to it, not for how to do it without PyTorch since you said yourself it's likely not possible

bronze skiff
#

but its NOT an alternative to distributed training

#

its completely different

mortal pendant
#

I understand they're completely different, but in my case they accomplish the same thing. Surely if they're accomplishing the same thing, just in a different way, it would be classed as an alternative? 🤨

#

I'm just looking for any way to make use of resources over multiple computers to train a set of PyTorch models made in aitextgen. Distributed training was one of the options, but you said that (as I kinda expected) it would require quite a bit of knowledge in PyTorch or the likes to accomplish. So I'm asking, out of the other methods I could use to accomplish what I'm looking for, which would be easiest or most efficient for my use-case. In other words, the best alternative to distributed training for me

#

Sorry for the confusion 😅

bronze skiff
#

okay, then this isn't distributed training

mortal pendant
#
```py
from os import environ,listdir
environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

from aitextgen.TokenDataset import TokenDataset
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen
config = GPT2ConfigCPU()

models = []
steps = 512
for model in listdir("GPT2-models"):
    print("\n\n\n"+model)
    loc = "GPT2-models/"+model+"/"
    vocab,merges=loc+model+"-vocab.json",loc+model+"-merges.txt"
    models.append([loc,TokenDataset("messages/"+model+".txt", vocab, merges, block_size=64, line_by_line=True, progress_bar_refresh_rate=32, bos_token="", eos_token="", unk_token="", pad_token=""),aitextgen(model=loc+"pytorch_model.bin", vocab_file=vocab, merges_file=merges, config=config, cache_dir=loc, bos_token="", eos_token="", unk_token="")])
for model in models:
    model[2].train(model[1], model[0], batch_size=32, num_steps=steps, progress_bar_refresh_rate=32, save_every=steps, generate_every=steps)
```
That's my current code, in case it helps. I want to allow other computers to assist in the training process, but preferably while avoiding coding in things like PyTorch
bronze skiff
#

which is why i asked if you know the difference

mortal pendant
thin prism
mortal pendant
# bronze skiff which is why i asked if you know the difference

Yes. Distributed training works on the same model at once over nodes connected in a special kind of network I forgot the name of. I can't exactly say the difference between that and the other option since, well, there is no other option yet, I'm asking you for the other option. But I'm assuming the other option would involve each 'node' (computer connected to the server, including the server itself) working on its own set of models (where if one node is disconnected, the models it was working on would then be allocated to one of the other nodes)

mortal pendant
# mortal pendant ```py from os import environ,listdir environ['TF_CPP_MIN_LOG_LEVEL'] = '3' fro...

Also, just in case, here's the code used to make the original models in the first place, before being further developed
```py
for filename in listdir("messages"):
    if filename != "all.txt":
        if filename[:-4] in listdir("GPT2-models"):
            rmtree("GPT2-models/"+filename[:-4])
        print("\n\n\n"+filename[:-4])
        loc = "GPT2-models/"+filename[:-4]+"/"
        vocab,merges=loc+filename[:-4]+"-vocab.json",loc+filename[:-4]+"-merges.txt"
        mkdir(loc)
        train_tokenizer("messages/"+filename, prefix=filename[:-4], save_path=loc, serialize=False, min_frequency=5, bos_token="", eos_token="", unk_token="")
        ai = aitextgen(vocab_file=vocab, merges_file=merges, config=config, cache_dir=loc, bos_token="", eos_token="", unk_token="")
        ai.train(TokenDataset("messages/"+filename, vocab, merges, block_size=64, line_by_line=True, progress_bar_refresh_rate=32, bos_token="", eos_token="", unk_token="", pad_token=""), loc, batch_size=32, num_steps=8192, progress_bar_refresh_rate=32, save_every=8192, generate_every=1024)
        with open(loc+"sample.txt",'w+') as txtfile:
            txtfile.write("\n".join([result.split("\n")[0] for result in ai.generate(100, min_length=3, max_length=2000, return_as_list=True)]))
```

#

(I don't need help with the code shown above, it's just there in case you need to know specifics of my model in order to give advice for my original question, but any feedback would still be appreciated)

warm oak
#

I have a friend who's new to coding and is wondering if he could add AI to a snake game.

  1. Is this even reasonable? He would have another friend and myself helping him, but none of us have any experience with AI

  2. Do you have any suggestions of places to start if this doesn't seem too far-fetched?

uncut orbit
#

this is reasonable

#

but this is reinforcement learning

#

you might want to start at the basics

#

which is regression techniques

warm oak
#

Okay thanks. I'll do some research

sonic raft
#

Hi! Sorry for not asking strictly about Machine Learning..
So I found a Dataset on Kaggle that I find really fun and cool to work with, but I noticed a strange License, and I have no idea about Licenses.
Can someone help me out a bit?

serene scaffold
sonic raft
#

Yes I did, but it took me to the Legal Notice page of the site of the European Union(or something like that) but it didn't say anything about the license.

grave frost
#

why do you want to know BTW?

#

if it's on kaggle, then you can use it

sonic raft
#

I mean, that's alright but I want to train a model using the dataset, then put it on my website so that I can reach it easily without having to run it using a notebook service.
I know that it's basically impossible to tell what kind of dataset I used, but I still want to know! 🙂

grave frost
#

how would putting a model on your website help?

sonic raft
#

Not my model.. but an application, using Ipywidgets, and Voila you can create interactive applications.

grave frost
sonic raft
#

They probably will be, but the model I'm intending to create is meant to handle a basic task, so it won't be that big.

astral path
#

quick question

#

im using cross_val_score to test my ML model, but I'm getting very unexpected values

#

I do the following:
```python
model = tree.DecisionTreeRegressor()
clf = Pipeline([
    ('feature_selection', SelectKBest(chi2, k = 10)),
    ('classification', model)
])

clf.fit(minmax_train, labels)

scores = cross_val_score(clf, minmax_train, labels, cv=5)
```

#

and then scores is this: array([ -0.35824288, -1.72116347, -7.53874271, -1.00218319, -259.19346663])

#

I tried a DecisionTreeClassifier earlier which got more expected scores, but this is a regression problem

#

any ideas why this might occur?

#

should i try a different scoring metric besides default?
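
A likely explanation, for what it's worth: for a regressor, `cross_val_score` falls back to the estimator's own `score` method, which is R², and R² goes negative whenever a fold is predicted worse than just guessing the mean of the targets. A minimal sketch, assuming synthetic data from `make_regression` in place of `minmax_train` / `labels` (which aren't shown), and assuming `f_regression` as the selection statistic since `chi2` is meant for class labels:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features and targets (an assumption).
X, y = make_regression(n_samples=200, n_features=30, n_informative=10,
                       noise=10.0, random_state=0)

reg = Pipeline([
    ("feature_selection", SelectKBest(f_regression, k=10)),
    ("regression", DecisionTreeRegressor(random_state=0)),
])

# Default scoring for a regressor is R^2: 1.0 is perfect, 0.0 matches
# "always predict the mean", and anything below 0.0 is worse than that.
r2_scores = cross_val_score(reg, X, y, cv=5)

# An error metric in the units of the target; sklearn negates it so that
# "higher is better" still holds, hence the leading minus sign.
mae_scores = -cross_val_score(reg, X, y, cv=5,
                              scoring="neg_mean_absolute_error")

print(r2_scores)
print(mae_scores)
```

`cross_val_score` clones and refits the pipeline on every fold, so the separate `clf.fit(...)` call beforehand isn't needed for the scores.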

bitter harbor
#

is there a way to flip or separate the labels on the x axis vertically with seaborn?

#

cause either way it's unreadable currently

astral path
#

you can change the x variable to the y variable on the plot and vice versa

#

if you want it more readable you could make the graph higher (but that might be too high to be readable)

#

angle the labels

#

or make it interactive so the label only shows up when you hover over the bar on the graph (plotly does this)

bitter harbor
#

I'm pretty sure it has to be an image but still it's unreadable

#

that's why im wondering if you can flip them vertically

#

what's really weird is im filtering out openings that have <50 games present but there's still values under 50 here

astral path
#

oh yeah

#

matplotlib is under the hood of seaborn

#

you can rotate the xlabels by 90deg to do this

#

ax.set_xticklabels(ax.get_xticks(), rotation = 90)

#

where ax is defined with plt earlier in the code

bitter harbor
#

huh that didn't seem to work I had to do for tick in ax.get_xticklabels(): tick.set_rotation(90)
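
Probably why the first attempt didn't work: `ax.set_xticklabels(ax.get_xticks(), ...)` replaces the label text with the numeric tick positions, so the category names are lost. A minimal sketch of rotating the existing labels instead, assuming a plain matplotlib bar chart with made-up opening names and counts (seaborn returns the same kind of `Axes` object):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data standing in for the real openings table.
openings = ["Sicilian Defense", "French Defense", "Queen's Gambit"]
games = [120, 75, 60]

fig, ax = plt.subplots()
ax.bar(openings, games)

# Rotate the existing x tick labels without touching their text.
ax.tick_params(axis="x", labelrotation=90)

fig.tight_layout()   # make room for the now-vertical labels
fig.canvas.draw()
```

The loop over `ax.get_xticklabels()` with `tick.set_rotation(90)` does the same thing one label at a time.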

astral path
#

sounds good haha

velvet thorn
#

sounds like wrong visualisation

#

it's not even sorted 😢

bitter harbor
#

ya I need to figure out how to filter it properly what i've got obviously isn't working

#

it's much worse now tho smil

#

unless you wanna call it abstract art, then it's perfect

lapis sequoia
#

This is the error i get when i run this code

#

I'm simply computing the moving averages and graphing them for the user's input

#

Im really not sure what im doing wrong and really need help

tight dew
#

Hey, just a curiosity

#

How capable are the best AIs when it comes to asking themselves questions?

uncut orbit
#

gpt-3

#

its amazing

tight dew
#

like philosophical questions and such

uncut orbit
#

oh

#

thats a very good question

#

i dont know

#

i don't think we've built something like that

#

there's space for it tho

tight dew
#

i mean, in my mind the only reason it wouldnt be built is because it wouldnt work properly ?

#

cz it would be funny af

#
  • you can literally study so many things by analyzing the behaviour of the machine in regards to that
#

its more or less trying to simulate similar scenarios humans have been in

#

theoretical scenarios

#

like trying to teach what is ethics to a computer, or making it figure out by itself

misty flint
#

if i had data science nightmares, this would be it

tight dew
#

also are chess engines AIs?

#

or funny calculators?

#

or both

uncut orbit
#

what do you mean by chess engines

mortal trout
uncut orbit
#

i'm not sure how to use elastic net

stiff barn
astral path
#

How should I choose features when they all have very heavy correlation with other ones?

#

I have almost 5000 features to choose from after generating quadratic features from a preexisting set of features

#

is there a way to agglomerate features by clustering them based on which are heavily correlated with each other?

pure quiver
#

Alphazero is a neural net iirc

#

the very first chess engines like m20 were of course calculators

tight dew
#

alphazero is on a quantum computer tho isnt it

pure quiver
#

no idea about the specifics, but you're asking about how the engines select moves right

#

I think stockfish you would consider a calculator

#

they're currently working on a neural network version of stockfish, stockfish NNUE

uncut orbit
#

wouldn't it be fun to put a quantum computer on every desk on the world

#

just imagine the processing speeds going up

tight dew
#

please no

uncut orbit
#

then no for you right lmao

#

kinda like bill gates

#

put a computer on every desk

#

i think what i just said is cringe worthy

#

what does l2 do?

tidal bough
#

intervals between duckies so far (minutes)

#

data is [11.0, 15.0, 13.0, 12.0, 12.0, 16.0, 15.0, 11.0, 12.0, 11.0, 16.0, 13.0, 14.0, 16.0, 11.0, 12.0, 16.0, 12.0, 12.0, 12.0, 13.0, 14.0]

dapper halo
#

Is there a way to mask features in a neural network. I'm giving it a full set of training data, but if the user cannot supply the full list of data itd obviously be good to train it for missing inputs. Just randomly inject zeros for some features in idk 15% of the sample, or will that bias it in an undesirable way?

astral path
#

If I have a feature set that looks like

     0         1     2         3     ...      4491      4492      4493      4494
0     0.0  0.075377  1.70  1.707317  ...  0.797852  0.239469  0.033743  0.004755
1     0.0  0.065327  1.70  1.707317  ...  0.797852  0.433821  0.110739  0.028268
2     0.0  1.768844  1.65  1.658537  ...  0.797852  0.180470  0.019164  0.002035
3     0.0  0.438243  1.75  1.756098  ...  0.797852  0.086764  0.004430  0.000226
4     0.0  1.281407  1.65  1.658537  ...  0.797852  0.520586  0.159464  0.048846
..    ...       ...   ...       ...  ...       ...       ...       ...       ...
973   0.0  0.438243  1.55  1.560976  ...  0.735229  0.845391  0.437238  0.226141
974   0.0  0.438243  1.25  1.219512  ...  0.407545  0.379220  0.111079  0.032537
975   0.0  0.438243  1.55  1.560976  ...  0.681844  0.224723  0.031979  0.004551
976   0.0  0.438243  1.60  1.609756  ...  0.297484  0.285813  0.069200  0.016754
977   0.0  0.582915  1.50  1.463415  ...  0.620324  0.270069  0.048132  0.008578

and I also have clustered these feature columns, with each element in this list representing which cluster the corresponding column is in: [4 7 9 ... 1 1 1], how would I combine these features?

#

so like all features in cluster 4 are somehow agglomerated together, and so on for all of them

#

thanks and cheers!
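
One simple way to do this is to average the columns that share a cluster label. A minimal sketch, assuming a small random frame and a hypothetical `cluster_labels` array in place of the real 4495-column data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Small stand-in for the real feature matrix (5 rows, 6 feature columns).
features = pd.DataFrame(rng.random((5, 6)))

# One cluster id per column, in column order (hypothetical values).
cluster_labels = np.array([0, 1, 0, 2, 1, 2])

# Transpose so columns become rows, group those rows by cluster id,
# average within each cluster, then transpose back.
agglomerated = features.T.groupby(cluster_labels).mean().T

print(agglomerated.shape)
```

The result has one column per cluster. scikit-learn's `FeatureAgglomeration` does the clustering and the pooling in a single transformer, which may be worth a look if the clustering step itself is still open.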

short heart
#

Whats the best amount of batches and epochs for lstm

#

Or how can I calculate it

short heart
#

How do I feed several datasets into lstm

untold cove
#

Anyone managed to make a clone similar to Deep Nostalgia?

mortal trout
#

@stiff barn right now im doing this like
```python
if image_classify.predict(image) == 'Other':
    return 'Other Image'
return cat_dog_classifier(image)
```

bronze skiff
#

ml is just giant if then statements

serene scaffold
bronze skiff
#

more accurately its differentiable if then statements

late shell
#

hello, I'm fairly new to ML, and I can't understand whether DecisionTrees can handle categorical variables or not? some stackoverflow answers say they can, some say they can't? what's the deal here? Suppose I am predicting the % effectiveness of a drug based on 3 features ['Dosage', 'Age', 'Sex']. Sex is a categorical variable with values M & F. So how will the decision tree algorithm split based on the categorical data?

grave frost
bronze skiff
#

for each candidate splitter, it judges its validity based on what is on what side of the split

#

so if you have a categorical var like sex, if it decides to split on sex=m, then it'll bucket everything into two based on that split and compute the predictive gain function on it

#

then it tries another split, until it decides by whatever criterion to stop and pick the one with best gain

#

whether the package you use will support doing things this way is the better question

#

sklearn doesn't do it, but that's mostly because sklearn blows and you should use a better package

late shell
#

yes I'm using sklearn. so what does sklearn do in this case? And what other ml libraries do you recommend? Thanks for the prompt reply.

bronze skiff
#

sklearn treats everything as numeric vars, so if you have categorical ones, one hot encode them

#

though this presents its own litany of issues

#

unfortunately, in python this is the best you've got

#

i recommend picking up some r

misty flint
#

R

late shell
#

Alright, thankyou very much 🙌
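
The one-hot route mentioned above can be sketched like this, assuming a tiny made-up table with the three features from the question (the numbers are invented purely for illustration):

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Invented example rows; only the column names come from the question.
df = pd.DataFrame({
    "Dosage": [10, 20, 30, 10, 20, 30],
    "Age": [25, 40, 60, 35, 50, 70],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "Effectiveness": [40.0, 55.0, 70.0, 45.0, 60.0, 65.0],
})

# One-hot encode the categorical column: "Sex" becomes Sex_F and Sex_M,
# each holding 0.0 or 1.0, so every feature is numeric.
X = pd.get_dummies(df[["Dosage", "Age", "Sex"]], columns=["Sex"], dtype=float)
y = df["Effectiveness"]

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

print(list(X.columns))
print(tree.predict(X.iloc[:2]))
```

A split such as `Sex_M <= 0.5` is then how the tree separates M from F.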

warped chasm
#
```py
df_comp.ftse.plot(figsize = (20,5) ,title = "FTSE Prices")
plt.title("S&P vs FTSE")
plt.show()
```
#

Hey I'd like to know how this code really works^. Like how can we enter plt.title after plotting?

#

Shouldn't there be a separate function that takes all parameters together?

#

I read on SO that it's something related to "layers" in ds programming but didn't really understand that. Could anyone provide a primer?
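
Short version: pyplot keeps track of a "current" figure and axes, and calls like `plt.title` act on whatever is current until `plt.show()` renders it. A minimal sketch, assuming a small made-up series named `ftse` in place of `df_comp.ftse`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices standing in for df_comp.ftse.
prices = pd.Series([100, 102, 101, 105], name="ftse")

# The pandas .plot() call draws onto a matplotlib Axes and returns it.
ax = prices.plot(figsize=(20, 5), title="FTSE Prices")

# That Axes is now pyplot's current axes.
print(ax is plt.gca())

# plt.title() modifies the current axes, so it overwrites the earlier title.
plt.title("S&P vs FTSE")
print(ax.get_title())
```

Nothing is drawn on screen until `plt.show()`, so any number of such calls can adjust the same plot first. Calling methods on `ax` directly (for example `ax.set_title(...)`) is the explicit equivalent.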

iron basalt
split eagle
#

Hey, folks, I'm looking for help writing a function. I am working with a pandas df that includes clinical trials that are classified as either "low_accruing" (marked with a 1) or "normal_accruing" (marked with a 0). The df is a document-term matrix containing lemmas that show up in the clinical trials and is thousands of columns wide. To identify the most salient terms, I am trying to drop all columns for terms that appear in "low_accruing" trials and "normal_accruing" trials the same number of times. To do this I tried writing a function that would multiply the values in rows with "1" in the "low_accrual" column by -1 and then drop the columns with a sum = 0. Here's what I've written: `data3 = data1.apply(lambda row: [i for i in row if data1.loc[row,'low_accrual'] == 0] else i*-1)` This returns an "invalid syntax" error. Can you help me see what I'm writing wrong?

serene scaffold
#

that or you're missing a closing bracket for the list comp

#

what is that apply intended to do?

split eagle
#

@serene scaffold I was trying to apply the function to each row

serene scaffold
split eagle
#

I want to negate the values in a row if the value in the low_accrual column is 1.

serene scaffold
arctic wedgeBOT
#

Hey @split eagle!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

Please copy and paste it into the chat

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

split eagle
serene scaffold
arctic wedgeBOT
#

Hey @split eagle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

serene scaffold
#

You have to copy and paste the text into the chat or use the paste bin

#

it should look like this

0,1,5,6
1,1,7,3
2,5,9,2
split eagle
#

@serene scaffold It is still too large. Give me a sec to trim it down.

serene scaffold
#

A few lines will do.

arctic wedgeBOT
#

Hey @split eagle!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

I won't be able to help with this. Hopefully someone else can take a look.

#

though this may solve your problem.

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [1, 2, 5], [0, 5, 6]], columns='a b c'.split())
>>> df
   a  b  c
0  1  2  3
1  1  2  5
2  0  5  6
# Save what the a column is for later
>>> original_a = df['a'].copy()
>>> df[df['a'] == 1] *= -1
>>> df
   a  b  c
0 -1 -2 -3
1 -1 -2 -5
2  0  5  6
# This did what we wanted, except it also negated the a column
>>> df['a'] = original_a
>>> df
   a  b  c
0  1 -2 -3
1  1 -2 -5
2  0  5  6
#

I suppose you could also do

df[df['a'] == 1] *= -1
df['a'] *= -1
#

@split eagle see if you can apply that to your particular dataframe
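
Carrying that through to the stated goal (drop terms that appear equally often in both groups) could look like the sketch below. It assumes a tiny made-up document-term matrix; only the `low_accrual` column name comes from the question, the term columns are hypothetical:

```python
import pandas as pd

# Made-up document-term matrix; term column names are hypothetical.
data1 = pd.DataFrame({
    "low_accrual": [1, 1, 0, 0],
    "tumor":       [2, 1, 3, 0],   # 3 in low-accruing, 3 in normal: balanced
    "placebo":     [0, 4, 1, 1],
    "dose":        [1, 0, 0, 0],
})

term_cols = data1.columns.drop("low_accrual")

# Negate only the term counts in low-accruing rows, leaving the flag alone.
signed = data1.copy()
signed.loc[signed["low_accrual"] == 1, term_cols] *= -1

# A term whose signed counts sum to zero appears equally often in both groups.
balanced = signed[term_cols].sum() == 0

# Drop those terms from the original (unsigned) matrix.
data3 = data1.drop(columns=balanced[balanced].index)

print(list(data3.columns))
```

Selecting with `.loc[rows, columns]` avoids having to save and restore the flag column.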

split eagle
#

@serene scaffold Thank you for your help. Sorry for the technical difficulties on my end.

bronze skiff
uncut orbit
#

i found a data science cheat sheet

#

on medium

#

is it fine if i post it here

glad mulch
#

hello everyone. I am trying to create leading, coincident, and lagging composite indexes but i need to determine the optimal lag for each variable. should i use autoregressive model to identify optimal lag?

bronze skiff
bronze skiff
#

i'm not sure what it means to create a lead-lag indicator for an entire index

#

unless it was relative to another base index

glad mulch
#

oh im trying to create a composite index of a business cycle which composes of leading, lagging, and coincident indicators

bronze skiff
#

oh, so like a self-lag lead indicator?

glad mulch
#

but i want to determine optimal lags for each variable

bronze skiff
#

then yeah, you would base it off an autoregressive model

#

have you used the granger test?

#

look at that as a way to test for significance of intervals

glad mulch
#

so most of these variables have already been tested historically

#

for example, the conference board leading economic index has various indicators within it

#

im essentially trying to determine optimal lag period for each variable

bronze skiff
#

i think i might be conflating lag-lead indicators with what you're working with

#

i don't think i have expertise in this in that case, sorry

glad mulch
#

all good

untold charm
#

Hello I'm trying to use scipy's differential_evolution with a constraint of the type Bounds, but I'm getting an error. It works fine when I use a LinearConstraint

example code

import numpy as np
from scipy.optimize import (
    rosen,
    differential_evolution,
    Bounds,
    LinearConstraint,
)

bounds = np.array([
    [-5, 5],
    [-5, 5],
])

bc = Bounds(
    *np.array([
        [0, 2],
        [0, 2],
    ]).T
)

lc = LinearConstraint(np.eye(2), [0, 0], [2, 2])

result = differential_evolution(
    rosen,
    bounds=bounds,
    constraints=bc,
)

print(result)

the error:

Traceback (most recent call last):
  File "diff_evo.py", line 37, in <module>
    result = differential_evolution(
  File "C:\Users\Snaptraks\anaconda3\lib\site-packages\scipy\optimize\_differentialevolution.py", line 308, in differential_evolution
    ret = solver.solve()
  File "C:\Users\Snaptraks\anaconda3\lib\site-packages\scipy\optimize\_differentialevolution.py", line 810, in solve
    result = minimize(self.func,
  File "C:\Users\Snaptraks\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 605, in minimize
    constraints = standardize_constraints(constraints, x0, meth)
  File "C:\Users\Snaptraks\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 825, in standardize_constraints
    constraints = list(constraints)  # ensure it's a mutable sequence
TypeError: 'Bounds' object is not iterable
bronze skiff
#

are you using ver 1.4.0+?

untold charm
bronze skiff
#

no i mean scipy optimize

#

isn't it a separate package?

#

nvm, i was thinking scikit

untold charm
#

yeah scipy.optimize is shipped with scipy
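
The traceback shows the failure inside the `minimize` call that `solve()` makes, where the `Bounds` object is passed to `list(...)`. One way around it, assuming the `Bounds` constraint is only meant as a tighter box than `bounds`, is to intersect the two boxes up front and skip `constraints` entirely. A minimal sketch:

```python
import numpy as np
from scipy.optimize import differential_evolution, rosen

# Original search box and the tighter box that was passed as a constraint.
search_bounds = np.array([[-5, 5], [-5, 5]])
box = np.array([[0, 2], [0, 2]])

# Intersect the two boxes: largest lower limit, smallest upper limit.
lower = np.maximum(search_bounds[:, 0], box[:, 0])
upper = np.minimum(search_bounds[:, 1], box[:, 1])

result = differential_evolution(rosen, bounds=list(zip(lower, upper)))

print(result.x)
```

The `LinearConstraint(np.eye(2), [0, 0], [2, 2])` already defined as `lc` expresses the same box and is reported above to work, so it remains an option when the constraint has to stay separate from the search bounds.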

vale fjord
#

Im currently reading up on scikit-learn and running through some different examples, as there are some tasks at work i think could be solved nicely by machine learning.
I was wondering, when running something continually, how would you keep it learning? I have a tf-idf classifier rn, which reads categories from folders and uses the files in those folders as training data. Would I just add to the training data and retrain it every time it has to run, or is there a better way to do this?
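
Retraining on the grown dataset is a perfectly valid approach. If that gets too slow, some scikit-learn estimators can be updated in place with `partial_fit`. A minimal sketch, assuming made-up documents and category names, and swapping the tf-idf step for a stateless `HashingVectorizer` (a tf-idf vectorizer learns its vocabulary from the data, so it would need refitting as new words arrive):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Stateless vectorizer: no vocabulary to refit when new documents arrive.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
classifier = SGDClassifier(random_state=0)

# All categories must be declared on the first partial_fit call.
all_classes = ["invoice", "support"]

# Initial batch (made-up documents).
first_docs = ["please pay the attached invoice", "my login is broken, need help"]
first_labels = ["invoice", "support"]
classifier.partial_fit(vectorizer.transform(first_docs), first_labels,
                       classes=all_classes)

# Later: update the same model with newly labelled documents only.
new_docs = ["invoice for march is overdue", "help, the app crashes on start"]
new_labels = ["invoice", "support"]
classifier.partial_fit(vectorizer.transform(new_docs), new_labels)

prediction = classifier.predict(vectorizer.transform(["second invoice reminder"]))
print(prediction)
```

The updated model can then be saved between runs (for example with `joblib.dump`) instead of being rebuilt from the folders each time.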

lime goblet
#

any chance someone could explain to me how I know if my neural network is over-fitting based on the loss functions graph and the mean absolute error graph?

velvet thorn
#

that’s a good start

smoky lava
#

I am a complete beginner to LaTeX, is there a good way to represent the concept of mode with a formula?

lime goblet
#

what do you mean by represent the concept of mode?

lime goblet
smoky lava
#

I am working on homework, I have to write the formula / equation for mean, median and mode, but I can't find a formula for mode, and I can't think of how to write that in mathematical notation

#

For mean I did:

lime goblet
#

so you are doing this on a set of values?

smoky lava
#

Yes, an ungrouped set of values

#

I found a formula for grouped data but that doesn't really make sense to use here

#

The mode is the value that occurs most often in the data, but is there a way to write that in a formula with LaTeX as above? Not looking for a full answer, just a little guidance.

lime goblet
#

i don't think it can be written as a formula in general, if anything you could write a definition using the

\usepackage{amsthm}
\newtheorem{definition}{Definition}
\begin{definition}

\end{definition}
smoky lava
#

thank you, I think I will just include the formula for grouped values, since the assignment specifically says to include a formula...

sharp turret
#

Hello fellows. I was working on an object detection problem and had a question about PCA.
In short, I'm trying to build a model that looks at a picture and says if there's a car in it. One of the things I'm doing involves blowing up the dimensionality of the image, then cutting it back down to a reasonable size with PCA.
My question is: Is this at all reasonable when I'm trying to discern between cars and everything else? If I fit the PCA on cars and everything else, I'd think it would just drown in excess variance. On the other hand, if I fit the PCA on just the cars, wouldn't the PCA transform be terrible at representing the variance of the not cars during testing?

velvet thorn
#

you could do it with the indicator function

#

and the max function

#

the mode(s) of a multiset are the elements whose multiplicity equals the maximum multiplicity
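
Written out with the indicator and max functions, that could look like the following. This is one possible notation rather than a standard textbook formula; the symbols $m(v)$ and $x_1, \dots, x_n$ are introduced here for illustration, and `\operatorname` assumes the `amsmath` package:

```latex
\[
  m(v) = \sum_{i=1}^{n} \mathbf{1}[x_i = v],
  \qquad
  \operatorname{mode}(x_1, \dots, x_n)
    = \bigl\{\, v : m(v) = \max_{u} m(u) \,\bigr\}
\]
```

Here $m(v)$ counts how many times the value $v$ occurs in the data, and the mode is the set of values whose count reaches the maximum, which allows for ties.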

uncut orbit
#

What does the learning rate in AdaBoost do?

velvet thorn
uncut orbit
#

oh kk

#

thx

misty flint
#

what libraries/apis/datasets/etc. do you guys use to get crypto price information?

twilit oracle
#

Hey I wanted to get into machine learning and I wanted to start with tensorflow but from all the tutorials I have seen it was really confusing (mostly the math) so I was wondering if any has any good recommendations to where I should start

misty flint
#

the pinned message has some pretty good resources

uncut orbit
#

or google datasets or smth

#

type in what you want there

#

you should be able to find it i think

#

crypto price right?

#

lemme check

#

yup

misty flint
#

wish it was up-to-date

#

but good enough

uncut orbit
#

👍

#

try going to google data or smth

misty flint
#

ok

marsh berry
#

Anyone here good with pandas?

#

I have a column where I am trying to get a count for every type of item in the column

#

However, some of the items in the column are like this : "cars", "racercars", "super cars", etc...

#

I was wondering how I can combine all items that have "cars" in them and get the total count

#

I'm currently grabbing counts like this pd.DataFrame(df['Type'].value_counts().to_frame())

uncut orbit
#

you'll get everything

#

adding columns is a bit different

#

like wdym by add

marsh berry
#

I'm just trying to get a count of all item types

uncut orbit
#

oh

marsh berry
#

Right but how do I combine items that have the word "car" in it?

#

This is what i get
```
Type        Count
Airplane    23
car         2
race car    2
super car   2
```

#

This is what I want

#
```
Type        Count
Airplane    23
Car         6
```
uncut orbit
#

i think there's a way

#

i forgot

#

OHHHH

#

wait

#

nvm

#

i keep on forgetting

#

df.type()

#

no

#

not that

#

here's what im getting from my geeks to geeks search

#

```python
# importing pandas module
import pandas as pd

# reading csv file from url
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# dropping null value columns to avoid errors
data.dropna(inplace = True)

# substring to be searched
sub ='er'

# start var
start = 2

# creating and passing series to new column
data["Indexes"]= data["Name"].str.find(sub, start)

# display
data
```

marsh berry
#

Hmm that will find all items that have "car" in it

uncut orbit
#

probably

marsh berry
#

I suppose I could get a count that way and add a new row the original dataframe and then remove all rows that have "car" in it

#

It's a little tedious to do that but I suppose that's all I got
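
A less tedious route is to relabel the matching items before counting. A minimal sketch, assuming a frame rebuilt from the counts shown above and the `Type` column name from the question:

```python
import pandas as pd

# Rebuilt from the counts shown above.
df = pd.DataFrame({
    "Type": ["Airplane"] * 23 + ["car"] * 2 + ["race car"] * 2 + ["super car"] * 2
})

# True wherever the item name contains "car", ignoring case.
is_car = df["Type"].str.contains("car", case=False)

# Replace every matching item with a single label, then count as before.
grouped = df["Type"].mask(is_car, "Car")
counts = grouped.value_counts().rename_axis("Type").reset_index(name="Count")

print(counts)
```

Note that a plain substring match would also catch unrelated words such as "scarf", so a stricter pattern may be needed depending on what else is in the column.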

short heart
#

im doing a project on something that hasnt been done before i think, if i get overfitting and somewhat good looking prediction for training data, does that mean thats its possible to predict something on test data?

#

cause i didnt have anything giving hope for a few days on test data

orchid gull
#

hey, I wanted to share a python library I've been working on, it boosts the T5 model speed up to 5x & also reduces the model size. https://github.com/Ki6an/fastT5

GitHub

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. - Ki6an/fastT5

orchid gull
lapis sequoia
#
```py
listener = sr.Recognizer()
engine = pyttsx3.init()



def talk(text):
    engine.say(text)
    engine.runAndWait()


def take_command():
    try:
        with sr.Microphone() as source:
            print('listening...')
            voice = listener.listen(source)
            command = listener.recognize_google(voice)
            command = command.lower()
            if 'bob' in command:
                command = command.replace('bob', '')
                print(command)
    except:
        pass
    return command


def run_bob():
    command = take_command()
    print(command)
    if 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')
        talk('Current time is ' + time)
    else:
        talk('Please say the command again.')


while True:
    run_bob()```
#

im having a error in this code

#
```
Traceback (most recent call last):
  File "d:/Shashi's stuff/everything/Coding related lol/CodeCreations/bob/bob.py", line 44, in <module>
    run_alexa()
  File "d:/Shashi's stuff/everything/Coding related lol/CodeCreations/bob/bob.py", line 34, in run_alexa
    command = take_command()
  File "d:/Shashi's stuff/everything/Coding related lol/CodeCreations/bob/bob.py", line 30, in take_command
    return command
UnboundLocalError: local variable 'command' referenced before assignment```
#

please help
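
What the traceback says: something inside the `try` raised before `command` was ever assigned, the bare `except: pass` swallowed it, and `return command` then referred to a name that doesn't exist yet. A minimal sketch of the fix, assuming a hypothetical `listen` callable in place of the microphone and `recognize_google` calls so it runs without audio hardware:

```python
def take_command(listen):
    """Return the recognised command text, or "" if nothing was understood.

    `listen` is a hypothetical stand-in for the microphone + recogniser step.
    """
    command = ""  # bound up front, so the return below can never fail
    try:
        command = listen().lower()
        if "bob" in command:
            command = command.replace("bob", "").strip()
    except Exception as exc:
        # Report the real problem instead of silently discarding it.
        print(f"could not get a command: {exc!r}")
    return command


def broken_microphone():
    raise OSError("no default input device")


print(take_command(lambda: "Bob what time is it"))
print(take_command(broken_microphone))
```

Printing the exception should also reveal why the listening step is failing in the first place, for instance a missing microphone or no speech being recognised.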

jade chasm
#

I am running a transfer model for convolutional neural nets, and my validation accuracy is actually ahead of training accuracy due to image augmentation (distorting the input).

#

However, eventually my training accuracy starts to catch up ~60 epochs in. I feel like the model starts to overfit slightly afterward, but increasing dropout or augmentation only decreases overall accuracy

#

Any idea how I can get the val_accuracy closer to the train accuracy?

#

As in, early epochs, val_acc >> train_acc

#

Deeper in, model starts overfitting it seems. But dropout is not really working

warped chasm
#

What will be the equivalent code for annual data (1989, 1990, ... and not 01/01/1989 , 01/02/1990)? I know that asfreq would require 'a' but am unsure about the to_datetime() method and the set_index() method

```py
df_comp.set_index("date", inplace = True)
df_comp = df_comp.asfreq('b')
```
jade chasm
#

This is the plot by the way. Is this clear overfitting?

grave frost
#

if the train accuracy gets to high 90s, then its def overfitting

#

another symptom of overfitting is when the train accuracy keeps climbing and the val accuracy also starts steadily decreasing.

warped chasm
#

using this, it automatically adds the 01-01 to each year, my format is just "1961, 1962, 1963 ...". How do I fix it?

tidal bough
#

that's because datetimes store the full date (at least)

#

Is it a problem that they are assumed to be Jan 1 each?
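
If the Jan 1 dates are a problem, one option is to convert the index to yearly periods, which display as just the year. A minimal sketch, assuming a small made-up frame whose `date` column holds bare years:

```python
import pandas as pd

# Made-up annual data; the "date" column holds bare years.
df = pd.DataFrame({"date": [1961, 1962, 1963], "value": [1.0, 2.0, 3.0]})

# Parse the years explicitly; each becomes a timestamp on Jan 1 of that year.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y")
annual = df.set_index("date")

# Convert the timestamps to yearly periods, which carry no month or day.
annual.index = annual.index.to_period("Y")

print(annual)
```

Whether a `PeriodIndex` or a Jan 1 `DatetimeIndex` is more convenient depends on what the later modelling code expects.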

warped chasm
warped chasm
grave frost
#

I am confused about why on certain problems Neural Networks perform abysmally and traditional algos catch up? why does this happen and what kind of NN's are capable of mapping any function/data?

misty flint
#

one neural net to rule them all

cloud wigeon
#

I'm trying to calculate number of days between dates in an groupby object but getting stuck, trying to figure out number of 'hospital days' i.e 0-indexing for each patient admission based on a datetime column, I'm managing to easily do it with dplyr in R but having major difficulties in pandas..

#

I have long-format data with columns with
patientid | datum| variablex | variabely
11 2020-01-01 xx yy
11 2020-01-01 xz yz
11 2020-01-01 xo yf
11 2020-01-02 xx xf
12 2020-01-04 xx yz
12 2020-01-05 xx yf
12 2020-01-04 xx zz

#

df['hospital_days'] = df.groupby('patientid')['datum'].diff() / np.timedelta64(1, 'D')
df['hospital_days'] = df['hospital_days'].fillna(0)

#

Getting negative days, so I figure I have to sort the subgroups somehow before running the above line
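
Sorting first would fix the sign, but `diff()` gives the gap to the previous row rather than the day count since admission. A minimal sketch of the 0-indexed version, assuming one admission per `patientid` and using the sample rows above:

```python
import pandas as pd

# Sample rows from above (only the columns needed here).
df = pd.DataFrame({
    "patientid": [11, 11, 11, 11, 12, 12, 12],
    "datum": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02",
        "2020-01-04", "2020-01-05", "2020-01-04",
    ]),
})

# Earliest date per patient, broadcast back onto every row of that patient.
first_day = df.groupby("patientid")["datum"].transform("min")

# Days elapsed since that first date; row order does not matter.
df["hospital_days"] = (df["datum"] - first_day).dt.days

print(df)
```

This is roughly the pandas counterpart of `group_by() %>% mutate(datum - min(datum))` in dplyr. With several admissions per patient, the grouping key would need an admission id as well.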

sweet cedar
#

Hello to everyone, is anyone familiar using django and plotly dash together?

serene scaffold
exotic maple
short heart
#

how do i counter overfitting in keras

hallow bronze
#

Hi I want to learn ml and ai

#

can you reccomend some books on the above mentioned subject

#

hello

bronze skiff
bronze skiff
#

supplement with hastie et al's elements of statistical learning, and after you read the above transition to the canonical text by goodfellow et al on deep learning

#

that should be a good starting point

uncut orbit
#

what are some good reinforcement learning libraries?

soft salmon
#

can i ask a basic backpropagation math question here?

warped chasm
#

Can I ignore the PACF coefficients after lag 1 as being random?

sweet cedar
#

I need to know how to use the same dashboards with different users and different datasets (django + plotly dash), any idea?

smoky lava
#

Is this a good description of Chebyshev's Theorem?

short heart
#

I’m using LSTM to predict 1 value by 60 values before it, will it be ok if I just concat several databases for training

exotic maple
smoky lava
#

Thanks, that is more clear

bronze skiff
smoky lava
#

Doesn't it tell you the minimum amount of values that must be within k standard deviations? It's my first day of statistics so I might be misunderstanding. @bronze skiff

#

I already submitted the assignment though, so we'll see XD

soft dock
tidal bough
#

I only don't really like talking about "amount of values" here - that's only valid in case you have a uniform discrete distribution (so the probability of each possible value is constant, and the total probability of falling in an interval is just proportional to the number of values inside it). In general, it's a statement about the probability of the value being in the interval.

exotic maple
#

as in; the proportion of values that lie below k standard deviations, accumulated. I.e. -> below k standard devs lie at least 0.x of values, approx

#

correct me pls @tidal bough :x

tidal bough
#

among other things, it can be thought of as a constraint on how thick the distribution's tails can be (how slowly the probability falls off) before the distribution's variance becomes infinite.
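
For reference, the statement being discussed is P(|X - mu| >= k*sigma) <= 1/k^2, i.e. at least 1 - 1/k^2 of the probability lies within k standard deviations. A rough numerical check, using an exponential sample purely as an arbitrary skewed example (the distribution and the sample size are my choice, not from the assignment):

```py
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)   # skewed, clearly not normal

mu = x.mean()
sigma = x.std()   # population-style std (ddof=0) of this sample

def fraction_within(values, center, spread, k):
    """Fraction of values strictly closer than k * spread to the center."""
    return float(np.mean(np.abs(values - center) < k * spread))

for k in (1.5, 2, 3):
    lower_bound = 1 - 1 / k**2
    observed = fraction_within(x, mu, sigma, k)
    print(f"k={k}: observed {observed:.3f}, Chebyshev guarantees at least {lower_bound:.3f}")
```

The observed fractions are usually well above the bound; Chebyshev is a worst-case guarantee that holds for any distribution with finite variance, so it is loose for most of them.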

red hound
#
 def softargmax(values):
     beta = 1000.0
     tensor = tf.squeeze(tf.matmul(
         tf.nn.softmax((beta * values), axis=1),
         tf.reshape(tf.range(values.shape[2], dtype=tf.float32), [-1, 1])
     ))
     return tensor

Does anyone have an idea what's "non differentiable" in this code?
It throws me:
ValueError: Variable <tf.Variable 'generator_lstm/kernel:0' shape=(128, 512) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

I already tried with tf.keras.backend ops and also with tf.raw_ops. All of them are differentiable on their own.

bronze skiff
#

i don't know tf, but in pytorch you can call values.detach() first to pull something off the gradient tape so it's separate from it

#

so you might have to use something to pull it off the tape

red hound
#

i have also already tried to hardcode this range thing

#

same error

bronze skiff
#

have you tried commenting out the body of your function and just returned a flat tensor?

#

maybe the issue isn't even in this function

red hound
#

i'll try. But as it is the only Lambda layer I use, its pretty likely that it causes the issue

#

I already tried like 100 things today. Frustrating

bronze skiff
#

i'm also certain there's tf implementations for gumbel-softmax out there to begin with 😛

red hound
#

You are absolutely right. But as I need the Max-Index, the Gumbel-Softmax just transforms my normal Softmax Output into One-Hot. Would not change much for me. I need this kind of SoftArgmax implementation. For Torch there are many implementations. TF/Keras Implementations seem to be pretty rare

bronze skiff
#

for sure-- this shows up a lot in differentiable sorting procedures (soft argmax)

#

which is why i thought the issue was a non-detach from the gradient tape

#

but if you already tried replacing values.shape[2] with like, a fixed number 5 and it didn't work then i'm not sure 😛

red hound
#

yeah thats what i did. I also replaced the tf.range thing by hardcoding the range array/tensor

#

well, my brain literally hurts

#

sometimes miracles happen after you have taken a little break

bronze skiff
#

if you don't mind sharing a longer gist i wouldn't mind looking if it's not too huge

red hound
#

Thank you for the offer, unfortunately I can't share much more due to confidentiality reasons 😦

#

I'm going to take a break for now. I'll come back when my head is working again. Thanks for the help and gn8 🙂

misty flint
#

🕯️

glad mulch
#

i have a dataframe that has duplicate values but they vary from being first or last. however, one of them is always a NaN value. any ideas on how to remove the NaN duplicate

serene scaffold
#

like, duplicate rows?

glad mulch
#

duplicate index that has 2 values within the same column
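
A sketch of one possible approach, assuming the layout is "same index label twice, one row NaN and one row real" in a single column (the column name value and the data are made up): drop a row only when its index is duplicated and its value is NaN, so position (first or last) does not matter.

```py
import numpy as np
import pandas as pd

# Hypothetical frame: "a" and "b" each appear twice, the NaN is sometimes first, sometimes last.
df = pd.DataFrame(
    {"value": [1.0, np.nan, np.nan, 2.0, 3.0]},
    index=["a", "a", "b", "b", "c"],
)

# True for every row whose index label occurs more than once.
is_duplicated = df.index.duplicated(keep=False)

# True for rows that are both duplicated and NaN: these are the ones to drop.
drop_mask = is_duplicated & df["value"].isna().to_numpy()

cleaned = df[~drop_mask]
print(cleaned)
```

A NaN on a non-duplicated index is kept by this mask; use dropna instead if those should go too.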

dapper halo
#

Extremely new to all of this. What metrics do you guys always look at for the performance of your models. Specifically for assessing different scaling/normalization processes?

#

Following that, what would indicate a need for a more complex architecture versus varying the preprocessing routine.

serene scaffold
misty flint
#

interesting chart about AI metrics

#

here is switzerland

velvet thorn
#

you can look at things like lift

serene scaffold
#

lift?

velvet thorn
#

by predicted probability

#

calculate the proportion of positives for each group

#

it’s somewhat related to calibration
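
Roughly what that looks like in code, as a sketch on synthetic data (lift_table is a made-up helper name, not a library function): bucket the rows by predicted probability, take the share of positives per bucket, and divide by the overall positive rate.

```py
import numpy as np
import pandas as pd

def lift_table(y_true, y_prob, n_bins=10):
    """Positive rate and lift per predicted-probability bucket, highest scores first."""
    df = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(y_prob)})
    # Rank first so tied scores cannot break the equal-sized buckets.
    df["bucket"] = pd.qcut(df["p"].rank(method="first"), n_bins, labels=False)
    base_rate = df["y"].mean()
    table = df.groupby("bucket")["y"].agg(positive_rate="mean", count="count")
    table["lift"] = table["positive_rate"] / base_rate
    return table.sort_index(ascending=False)

# Synthetic, well-calibrated scores: the label is positive with probability y_prob.
rng = np.random.default_rng(0)
y_prob = rng.uniform(size=2000)
y_true = (rng.uniform(size=2000) < y_prob).astype(int)

table = lift_table(y_true, y_prob)
print(table)
```

A lift above 1 in the top buckets means the model concentrates positives there; comparing positive_rate with the mean predicted probability per bucket is the calibration side of the same table.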

dapper halo
#

well, a list to keep me busy for a day or so

#

appreciated

rigid echo
#

Hello anyone can explain input_shape=() inputs in LSTM keras
i have 250+label features and 99488 samples

#

i have used (250,1)

#

showing error
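
The error itself isn't shown, so this is only a guess at the usual cause: a Keras LSTM expects input_shape=(timesteps, features) and training data shaped (samples, timesteps, features), so a 2D array of 250 columns has to be reshaped to 3D first. A NumPy-only sketch with a small stand-in array:

```py
import numpy as np

# Stand-in for the real data: 8 samples here instead of 99488, 250 feature columns.
n_samples, n_features = 8, 250
X = np.zeros((n_samples, n_features), dtype="float32")

# Option 1: treat the 250 columns as 250 timesteps with 1 feature each.
# This is the layout that matches input_shape=(250, 1).
X_seq = X.reshape(n_samples, n_features, 1)

# Option 2: treat each row as a single timestep with 250 features.
# This would match input_shape=(1, 250).
X_one_step = X.reshape(n_samples, 1, n_features)

# input_shape never includes the sample dimension.
input_shape = X_seq.shape[1:]
print(X_seq.shape, X_one_step.shape, input_shape)
```

Which option makes sense depends on whether the 250 columns are really an ordered sequence; if they are unrelated features, an LSTM may not be the right layer in the first place.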

bronze skiff
# serene scaffold lift?

look up stuff like the lorenz curve or gini metric, that's usually where you see stuff like this

bright atlas
#

Can you guys tell me why this error occurs when I use grid.fit(X_train,y_train) in my SVM project?

bronze skiff
#

quantitative econ is full of interesting classifier metrics that aren't auc-based

hollow sentinel
#

quantitative econ amazes me dude

bronze skiff
hollow sentinel
#

the internships they offer give you a limited list of like T20 colleges and if you don't belong to them you have to say you're from another college

bronze skiff
#

you most likely instantiated the grid search estimator wrong

bright atlas
#

But it is installed

bronze skiff
#

it has nothing to do with being installed?

#

why don't you post a code fragment instead of just the error

#

don't dm me, post it here

bright atlas
#

So how to solve this

bronze skiff
#

post the code that is failing

bright atlas
#

Whole file

#

grid.fit(X_train,y_train)

bronze skiff
#

yeah, that means nothing

#

whats the surrounding context-- how did you define grid, etc

bright atlas
#

grid = GridSearchCV(SVC,param_grid,verbose=3,refit=True)

bronze skiff
#

okay, now notice that you are just calling the class SVC, you're not instantiating an object

#

you need to instantiate an SVC object first and use that in the constructor
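
A minimal sketch of that fix, with a toy dataset and a made-up param_grid standing in for the real ones: the only change that matters is SVC() with parentheses instead of the bare class SVC.

```py
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data in place of the real X_train / y_train.
X_train, y_train = make_classification(n_samples=60, n_features=4, random_state=0)

# Hypothetical grid; the original param_grid was not posted.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}

# SVC() creates an estimator object; passing the class SVC itself is what fails.
grid = GridSearchCV(SVC(), param_grid, refit=True, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)
```

GridSearchCV clones the estimator you pass in for each parameter combination, which is why it needs an instance and not the class.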

bright atlas
#

how do I do the instiate this

#

instantiate

bronze skiff
#

the same way you instantiate any other object from a class?

#

i seriously suggest you do a refresher on python itself

hollow sentinel
#

yeah I second what pastafish is saying

#

OOP is pretty important

bright atlas
#

OK I Got it

marsh berry
#

Anyone here use streamlit?

misty flint
#

streamlit is great

#

im using it for a project rn

marsh berry
#

@misty flint Yeah its made my life a lot easier. Do you know how to return data by using date ranges?

rigid echo
#

Hello anyone can explain input_shape=() inputs in LSTM keras
i have 250+label features and 99488 samples
i have used (250,1)
showing error

misty flint
#

thats less of a streamlit question and more of a datetime module question no?

#

however, if its loaded onto the df, you should be able to regardless

humble kestrel
#

Anyone here working on self supervised learning on images? BYOL to be precise

marsh berry
#

@misty flint I suppose you're right. I've got 2 st.date_input() boxes that takes a start date and end date. And I was thinking I would update the date if the user clicks a button.

misty flint
#

st.dataframe() is the function youre looking for for streamlit if you need to load the dataframe onto streamlit

marsh berry
#

@misty flint So I've got my dataframe and then I've created multiple new dataframes from the original and then I've created plotly charts out of each of the new dataframes which are based off the original.

#

I was thinking if I add parameters to filter the original dataframe with those 2 dates it should change the subsequent dataframes too right?
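
That should work as long as the derived dataframes are rebuilt from the filtered one. A sketch of just the filtering step in plain pandas, assuming a date column named date and that the two st.date_input() boxes hand back datetime.date values (filter_by_date is a made-up helper):

```py
import datetime
import pandas as pd

def filter_by_date(df, start, end, column="date"):
    """Keep rows whose date falls between start and end, both inclusive."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return df[df[column].between(start, end)]

# Stand-in for the original dataframe.
original = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "value": range(10),
})

# Stand-ins for the values of the two date pickers.
start_date = datetime.date(2021, 1, 3)
end_date = datetime.date(2021, 1, 5)

filtered = filter_by_date(original, start_date, end_date)

# Anything built from `filtered` (not from `original`) follows the date range.
daily_totals = filtered.groupby("date")["value"].sum()
print(filtered)
```

Since the script reruns on each widget change, building every downstream dataframe and chart from filtered keeps them all in sync with the selected range.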

rigid echo
#

anyone?

exotic maple
exotic maple
#

like plotly, but easier and prettier?

misty flint
#

it accepts plotly

#

think of it like flask

exotic maple
misty flint
#

but easier to use

#

much easier

#

and less template hassle

exotic maple
#

i'll have to check it out. im still a bit not-certain about what it does

rigid echo
#

anyone can help in LSTM keras ??

misty flint
#

and f strings helped me accomplish what i needed

#

outside of looking at your code logic, cant offer much more advice

marsh berry
#

@misty flint I think I've got it. Thank you for the help.

#

Any idea how you could potentially save an instance of your dashboard? Like if I create a report of a date range, could I share like a URL of that state?

#

I'm wondering if I'd have to integrate Flask to generate short URLs with parameters

#

Nonetheless streamlit is actually lit

misty flint
#

np

#

and saving an instance so that others can access it? youd need to host it with something like a cloud provider

#

or something like heroku or digital ocean

austere swift
#

what does adding _mirror to the name of a layer in tensorflow do?

#

i've never seen that syntax

#

layer.name = layer.name + str("_mirror")

marsh berry
#

How do you report spam?

austere swift
lapis sequoia
#

!warn 730519909221531648 Do not post referral links in the server.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @lapis sequoia.

patent merlin
#
```
                 State    Capital          State      Capital
0          Maharashtra     Mumbai     California   Sacramento
1          West Bengal    Kolkata        Florida  Tallahassee
2        Uttar Pradesh    Lucknow        Georgia      Atlanta
3                Bihar      Patna  Massachusetts       Boston
4            Karnataka  Bengaluru       New York       Albany
```
I have this data frame I want to delete country column how to do that?
austere swift
arctic wedgeBOT
#
```py
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
```
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters

**labels**: single label or list-like. Index or column labels to drop.

**axis**: {0 or ‘index’, 1 or ‘columns’}, default 0. Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

**index**: single label or list-like. Alternative to specifying axis (`labels, axis=0` is equivalent to `index=labels`).

**columns**: single label or list-like. Alternative to specifying axis (`labels, axis=1` is equivalent to `columns=labels`).

**level**: int or level name, optional. For MultiIndex, level from which the labels will be removed.

**inplace**: bool, default False. If False, return a copy. Otherwise, do operation inplace and return None.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html#pandas.DataFrame.drop)
patent merlin
#

@austere swift I have given name to column using df.columns.names=['Country',None] Now i want to remove India and America using Country name

#

And Dataframe is multiindex
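
Two possible readings of this, sketched on a cut-down copy of the frame (I'm assuming the top column level holds India / America, since it isn't visible in the paste): either remove the Country level itself, or remove every column that sits under one country.

```py
import pandas as pd

# Rebuild a small version of the frame with a two-level column index.
columns = pd.MultiIndex.from_product([["India", "America"], ["State", "Capital"]])
df = pd.DataFrame(
    [
        ["Maharashtra", "Mumbai", "California", "Sacramento"],
        ["West Bengal", "Kolkata", "Florida", "Tallahassee"],
    ],
    columns=columns,
)
df.columns.names = ["Country", None]

# Reading 1: drop the "Country" header row, keeping all four data columns.
flat = df.droplevel("Country", axis=1)

# Reading 2: drop every column under "India", selected by name on the Country level.
without_india = df.drop(columns="India", level="Country")

print(flat)
print(without_india)
```

Note that after reading 1 the column labels repeat (State, Capital, State, Capital), so selecting by name returns two columns at once.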

modest void
#

is there a way to do a rolling window on an entire pandas dataframe, and not just a single column?
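
For what it's worth, calling .rolling() on the dataframe itself applies the window to every column at once, at least for column-wise aggregations like a mean. A small sketch with made-up data:

```py
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# A window of 3 rows, applied to each column independently.
rolled = df.rolling(window=3).mean()

# The first window - 1 rows are NaN because the window is not full yet.
print(rolled)
```

If the goal is a function that sees all columns of each window together, that needs a different approach (for example looping over the windows), since the built-in aggregations work one column at a time.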

short heart
#

Ive got a question about lstm

#

whats better: if I put 100 dense units inside my model, or if I put 1 dense unit, predict 1 value, add it to input and do prediction again?

#

1 dense unit

#

100 dense units

short heart
#

How can I boost the learning process of keras LSTM?

bronze skiff
#

thats because your range is from -1000 to 1000... at those scales, the small bumps near -1 and 1 become invisible...

#

try restricting your range to -2 and 2 and see what happens

trail isle
uncut orbit
#

i can't wait for the day when this stuff is sold in art galleries

#

OHHHHHHH

#

now i understand how apple's background thingy works

#

the one where they blur it

sacred yarrow
#

I am a high school student that is mostly new to programming and completely new to python, ai, dl, and ml. Where should i begin to start learning to code ai?

bronze wolf
hollow sentinel
#

looking at the existing libraries alone won't be enough bc ML/AI is also pretty math heavy

#
#

check this out to see the math you need to know

bronze skiff
sacred yarrow
hollow sentinel
#

well when I had my brief endeavor into DS/ML I started with Numpy, Matplotlib, and Seaborn

#

and then I "learned" Pandas and sklearn

sacred yarrow
#

ah ok thanks!

vague crypt
short heart
#

How can I boost the learning process of keras LSTM?

#

I have a massive dataset with over 20000 values (should be approximately over 100000 if I use everything) and my model takes forever to learn

#

Increasing batch size changes performance significantly

lapis sequoia
#

Hello

hollow sentinel
#

AP CS?

#

AP Calc?

warped chasm
#

appropriate lag is 0 or 7 or 11? How do I decide?

vague crypt
#

And AP CS

hollow sentinel
#

yeah that should work

#

idk if ap calculus covers linear algebra

vague crypt
#

Oh kk, will do more research

hybrid birch
hybrid birch
vague crypt
short heart
#

Imagine lstm model thats predicting values. Will it break if I give it values that are .rolling(3).mean() (for example)?

hybrid birch
#

These are pretty fundamental things. Machine Learning uses surprisingly low level maths.

#

^^

hybrid birch
hybrid birch
hybrid birch
exotic maple
#

Now, if it's a good extrapolation or not, that's a whole different thing.

hybrid birch
#

If there's a sequential dependence be careful with means 🙂

short heart
#

Im not sure there is

#

Not sure if its really a time series since im not passing it any datetimes

#

It just learns to predict values, knowing 120 behind it

short heart
abstract zealot
#

need some help here, i have a df consisting of 500 np.float64 values and
```py
hist = sns.histplot(data=df, y='values', hue='Color', kde=True, palette='Dark2',ax=ax[1])
```
is returning an error:
```py
No loop matching the specified signature and casting was found for ufunc add
```
#

Any ideas?

#

stops working as soon as i include kde=True

#

no idea why

hybrid birch
short heart
#

Hmmm, yeah I got a dataset that should take like 30 minutes to learn

#

Another question

exotic maple
#

A histogram is literally a count of discrete values. passing a continuous value doesnt make much sense to me

short heart
#

When I made a model that learned from a not so big dataset, it (somehow) made predictions far into the future (around 2/3 weeks), but when I trained another one on several huge datasets, it predicts for only few days, even though very accurately

#

Let me get screenies

exotic maple
#

for continuous values you can compute KDE as well, but im not sure if its done in the same way.

short heart
abstract zealot
#

@exotic maple i agree, they are floats simply because they vary very slightly around integer values, casting the array to np.int before using kde=True still raises the same error

exotic maple
#

Try using the KDE class directly, without hist

short heart
# short heart

The one with huge dataset (small prediction) was a 1 epoch model, but for 100 epochs its the same, just the prediction goes a little further

abstract zealot
exotic maple
#

@abstract zealot try cumulative?

abstract zealot
#

same error

#

ahhhhhhhh!

short heart
#

Whats better: autoregression or straight up prediction?

abstract zealot
#

conda update --all and jupyter restart fixed it @exotic maple i have absolutely no idea lmao, thank you for your help though

short heart
#

eh

#

i added more units to my model and decreased batch size but performance only seems to drop

#

before i started adding more data to my model, performance was better too

#

and now...

#

what even is this mess??? is this some new kind of overfitting

#

cause im testing literally on train data here

#

also same thing but with autoregression

#

why does this happen? should i increase batch size and decrease units?

bronze skiff
#

generally you should decrease batch size and increase model capacity

#

large batch sizes also require some learning rate tuning, which can be tricky

short heart
#

the problem is

#

i decreased it

#

and it got worse

#

i added units and it got worse more

#

i added more DATA and though it made accurate predictions, it died too soon

#

i have literally no idea as to why thats happening

humble turret
#

hello, i have a large list of full names, i would like to be able to search by any part of that name for a match. basically name fuzzy search does anyone have any library recommendations for this?
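
Before reaching for a third-party library, the standard library's difflib may already be enough for a moderately sized list. A sketch, where search_names is a made-up helper and the cutoff of 0.6 is an arbitrary starting point: substring hits come first, then names with a part that is close to the query.

```py
import difflib

def search_names(query, names, limit=5, cutoff=0.6):
    """Return names that contain the query, then names with a similar-looking part."""
    q = query.casefold()
    # Exact substring matches on the whole name (case-insensitive).
    hits = [name for name in names if q in name.casefold()]
    # Fuzzy matches against each individual part (first name, last name, ...).
    for name in names:
        if name in hits:
            continue
        parts = name.casefold().split()
        if difflib.get_close_matches(q, parts, n=1, cutoff=cutoff):
            hits.append(name)
    return hits[:limit]

names = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
print(search_names("turing", names))   # substring match
print(search_names("hoper", names))    # misspelled last name
```

This scans the whole list on every query, so for a really large list a dedicated fuzzy-matching library or an index would likely be faster.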

short heart
#

im just gonna try and teach it with 1 epoch instead of 100, maybe some kind of magics gonna happen and do everything

bronze skiff
#

train longer until convergence maybe

#

and, no, unlikely

short heart
#

ive trained it for 5 hours

#

if not more

#

it gave worse result

#

far worse

#

um

short heart
#

it at least died after a small while but not instantly

dapper halo
#

Are there any other useful preprocessing routines for regression tasks? Have tried minmax, Zscore, robustScaler, yeo-johnson, quantile. Looking for more, but pretty much any document I find lists roughly the same ones.

short heart
#

thats some dark magic going on here

#

anyways gn to everyone

dapper halo
short heart
#

thats work for tomorrow

dapper halo
#

I wish you the best of luck and hopefully no hexing. Dont piss em off

lapis sequoia
#

Anybody done the machine learning AWS certification?

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
lapis sequoia
#

Atm I'm trying to get some more experience with the stuff I've found being looked for by employers but I don't have much direct exposure to

#

Like AWS

serene scaffold
# lapis sequoia Machine learning specialist

if your underlying question is "is the information presented in the AWS certification course useful for what I want to do?" (rather than "will anyone care if I get it?"), then I'm not the one to answer.

lapis sequoia
#

I was just curious if anybody had done it, since the enrollment process seems awkward and I wanna know if anybody else had studied for it

#

I'm sure the actual material will be dumbed down and boring like most of the certs I've looked at

#

But they namedrop amazon and google so...

serene scaffold
#

Let's see what others might say. You can also ask in #career-advice, though you should provide context about why you're asking

hollow sentinel
#

I'm only a college student I'm not sure if my opinion matters

serene scaffold
#

though if you want to say "I'm in college, though I've heard xyz about the certification", that's more substantial.

hollow sentinel
#

but my friend did an AWS certification and it didn't really help him

#

he just said it was like any other certificate you could pick up

lapis sequoia
velvet thorn
#

it's pretty simple

#

okay I took it a couple of months ago for fun?

#

since I didn't really have much to do @ my job

#

so I thought

lapis sequoia
#

yeah it's similar for me

velvet thorn
#

might as well use some of that training budget

lapis sequoia
#

I'd rather get some exposure just to pad out my CV a little

#

Especially with the scanning software companies use now

#

Namedrop a few big companies and decent unis along the way

velvet thorn
#

can't hurt I guess

#

also I come from a non-CS background so it's probz relatively better for me...?

#

and I'm currently with a consultancy

#

clients kinda like that kind of thing