winged stratus May 8, 2021, 2:29 PM

#

i can try and get up to 1000 more images

grave breach May 8, 2021, 2:30 PM

#

Yes, if you lower the quality 1000 should be ok

winged stratus May 8, 2021, 2:30 PM

#

500x500?

#

i dont know what i should lower it to

grave breach May 8, 2021, 2:38 PM

#

Give it a try

pallid parcel May 8, 2021, 2:48 PM

#

Anybody know of a library that produces PNG files from data points?

crude fable May 8, 2021, 2:49 PM

#

opencv?

#

what kind of data points?

pallid parcel May 8, 2021, 2:50 PM

#

Well, data points for line graphs or candlestick graphs

crude fable May 8, 2021, 2:50 PM

#

ah, then matplotlib

pallid parcel May 8, 2021, 2:51 PM

#

Matplotlib makes PNG files?

crude fable May 8, 2021, 2:51 PM

#

of course

pallid parcel May 8, 2021, 2:51 PM

#

Huh

#

Ok

#

Thanks

crude fable May 8, 2021, 2:51 PM

#

just plt.savefig("plot.png")
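
For reference, a minimal sketch of the savefig route for a line graph. The data points, the `save_line_plot` helper and the output file name are made up for illustration; the only matplotlib behaviour relied on is that `savefig` picks the image format from the file extension.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt


def save_line_plot(xs, ys, path):
    """Draw a simple line graph and write it to `path` as an image."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)  # format is inferred from the ".png" extension
    plt.close(fig)     # free the figure once it is on disk
    return path


# Hypothetical data points, just to have something to draw.
out_path = save_line_plot(
    [0, 1, 2, 3, 4],
    [10.0, 10.5, 9.8, 11.2, 11.0],
    os.path.join(tempfile.gettempdir(), "line_plot.png"),
)
print("wrote", out_path)
```

Candlesticks are not a built-in plot type in matplotlib itself; the separate mplfinance package is one option for those.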

pallid parcel May 8, 2021, 2:51 PM

#

Nice, sounds simple enough

#

Been looking all over

crude fable May 8, 2021, 2:56 PM

#

Do u have an idea of how GANs could converge?

grave frost May 8, 2021, 3:00 PM

#

https://www.reddit.com/r/MachineLearning/comments/n7lv1o/oc_phand_gesture_recognition_play_pause_control/
I am more impressed that he used windows

r/MachineLearning - [OC] [P]Hand Gesture Recognition : play , pause...

#

that's too few. even using the DeiT recipe wouldn't yield much

grave breach May 8, 2021, 3:03 PM

#

crude fable Do u have an idea of how GANs could converge?

What do you mean by converge?

shy ember May 8, 2021, 3:03 PM

#

does anyone know how to control the number of clusters here (https://scikit-learn.org/stable/auto_examples/cluster/plot_affinity_propagation.html#sphx-glr-auto-examples-cluster-plot-affinity-propagation-py)

#

i was hoping for 7-9 clusters but it doesn't seem to separate them far enough

grave breach May 8, 2021, 3:04 PM

#

For this you should use k-means clustering

crude fable May 8, 2021, 3:06 PM

#

grave breach What do you mean by converge?

Training a good discriminator while also training a better generator that fools D at the same time?

shy ember May 8, 2021, 3:07 PM

#

grave breach For this you should use k-means clustering

will this work if i don't know the number of clusters beforehand?

#

from what i recall this was why i didn't use k-means in the first place

crude fable May 8, 2021, 3:08 PM

#

the K in k-means is a hyperparameter I think

grave breach May 8, 2021, 3:08 PM

#

crude fable Do u have an idea of how GANs could converge?

Well, I don't think a GAN could ever converge

grave breach May 8, 2021, 3:09 PM

#

shy ember will this work if i don't know the number of clusters beforehand?

No, unfortunately k-means clustering needs a fixed number of clusters

#

Maybe you could try DBSCAN

#

If I'm not mistaken, scikit-learn has DBSCAN implemented

crude fable May 8, 2021, 3:10 PM

#

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
@shy ember

#

I am not quite familiar with the general training scheme of GANs. Do D and G both start from scratch or is D initialized with a good enough checkpoint?

grave breach May 8, 2021, 3:15 PM

#

They both start from scratch

shy ember May 8, 2021, 3:18 PM

#

grave breach Maybe you could try DBSCAN

does this return the cluster center? i can't seem to find how in the documentation

grave breach May 8, 2021, 3:18 PM

#

I think you have to calculate it by yourself

#

But

#

Once you have all the points in a cluster

crude fable May 8, 2021, 3:19 PM

#

quote from Stackoverflow

grave breach May 8, 2021, 3:19 PM

#

You can do the mean of all the xs and the mean of all the ys

#

So you have the center

#

@shy ember
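
A minimal sketch of the "mean of the xs and mean of the ys" idea, in plain NumPy. It assumes the cluster labels come from something like DBSCAN's `labels_` attribute, where -1 marks noise points; the `cluster_centers` helper and the sample data are hypothetical.

```python
import numpy as np


def cluster_centers(points, labels):
    """Return {label: centroid} using the mean of each cluster's points.

    Points labelled -1 are skipped (DBSCAN uses -1 for noise).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    centers = {}
    for label in np.unique(labels):
        if label == -1:
            continue
        centers[int(label)] = points[labels == label].mean(axis=0)
    return centers


# Hypothetical data: two small groups plus one noise point.
points = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0],
          [10.0, 10.0], [12.0, 10.0], [50.0, 50.0]]
labels = [0, 0, 0, 1, 1, -1]  # stand-in for DBSCAN(...).fit(points).labels_

centers = cluster_centers(points, labels)
for label, center in centers.items():
    print(label, center)
```

Note that for oddly shaped clusters, which DBSCAN can produce, the mean may fall outside the cluster, so treat it as a summary point rather than a true "center".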

lapis sequoia May 8, 2021, 3:29 PM

#

I'm assuming you mean "better counterpart" as GPT-3, but that's a private beta, so what else can I use?

crude fable May 8, 2021, 3:29 PM

#

maybe BART?

lapis sequoia May 8, 2021, 3:29 PM

#

What's BART?

crude fable May 8, 2021, 3:30 PM

#

auto-regressively pre-trained models are generally better at generative tasks like convo

#

https://huggingface.co/transformers/

Transformers

State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0. 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-be...

#

A whole bunch of pre-trained models here

#

Basically BART = BERT (as encoder) + GPT (as decoder)

livid oar May 8, 2021, 4:01 PM

#

Check out this new article 😃

https://www.analyticsvidhya.com/blog/2021/05/top-8-python-libraries-for-natural-language-processing-nlp-in-2021/

Analytics Vidhya

akshay31

Top 8 Python Libraries For Natural Language Processing (NLP) in 2021

These 8 NLP libraries in Python are a must know for anyone working with text data.

simple shadow May 8, 2021, 4:05 PM

#

hey everyone, how do i read a folder of images in github through python in google colab?

grave frost May 8, 2021, 4:21 PM

#

livid oar Checkout this new article 😃 https://www.analyticsvidhya.com/blog/2021/05/top-8...

ehh, that kind of stuff is always shite. There aren't any "must-know" libs ever - only those that are basic (numpy, pandas). It's absolutely useless to "learn" a library. you have to learn concepts, not libraries. in NLP, you do get some use out of those for basic processing, but I would rather say a person should learn "TF/Pytorch" (i.e. understand the practical terminology and API usage rather than just a lib) to use NLP models, since most ML tools don't do well in real-world NLP tasks.

#

You'd find much better blogs on Medium than Analytics Vidhya tbh

grave breach May 8, 2021, 4:56 PM

#

simple shadow hey everyone, how do i read a folder of images in github through python in googl...

Download them, put them on your google drive

#

And then open them via colab

grave breach May 8, 2021, 4:56 PM

#

grave breach And then open them via colab

Never did this but it seems pretty simple

#

@simple shadow

short heart May 8, 2021, 4:58 PM

#

How does fbprophet work?

#

cause I gave it some dataset and the output it generated kind of follows true values, but when it went to the values it didnt see, the prophet kinda starts..sucking?

#

what I mean is, how does it predict something? why did it even predict the values it was trained on if thats what happened on screenie?

grave frost May 8, 2021, 5:16 PM

#

short heart what I mean is, how does it predict something? why did it even predict the value...

you can only gain so much performance with black-box methods.

short heart May 8, 2021, 5:18 PM

#

wdym

grave frost May 8, 2021, 5:19 PM

#

exactly what I mean.....?

#

you can't expect something to work perfectly with minimal effort 🤷

short heart May 8, 2021, 5:21 PM

#

i just want to know how does fbprophet make predictions and why it predicts whats already there

grave frost May 8, 2021, 5:22 PM

#

short heart i just want to know how does fbprophet make predictions and why it predicts what...

learn ML basics, then see RNNs/LSTMs

simple shadow May 8, 2021, 5:40 PM

#

grave breach And then open them via colab

i want others to be able to run the google colab, wouldnt it not work if its on my google drive only?

grave breach May 8, 2021, 6:01 PM

#

simple shadow i want others to be able to run the google colab, wouldnt it not work if its on ...

Just create a shared google colab

#

*google account

royal lintel May 8, 2021, 6:34 PM

#

cannot pickle 'weakref' object
anyone knows how to solve it? Saving model using joblib.dump
tag if answered please

desert oar May 8, 2021, 6:34 PM

#

grave frost learn ML basics, then see RNNs/LSTMs

Prophet is less of a black box than an lstm

desert oar May 8, 2021, 6:36 PM

#

short heart cause I gave it some dataset and the output it generated kind of follows true va...

Prophet is essentially regression, and this is a common problem in forecasting. See the "how prophet works" section https://research.fb.com/blog/2017/02/prophet-forecasting-at-scale/

Facebook Research

Cecilia Lopez

Prophet: forecasting at scale - Facebook Research

Forecasting is a data science task that is central to many activities within an organization. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline.

desert oar May 8, 2021, 6:45 PM

#

short heart cause I gave it some dataset and the output it generated kind of follows true va...

Just eyeballing this output, it looks like the trend shifts after the big spike, and prophet tries to continue the flatter trend at the end of the time series. If you are aware of additional features that could have caused the flatter period, you should add them to the model

#

Basically, in order to predict a change in trend, the change either needs to be cyclical, or there needs to be some external feature that can indicate the trend is changing, and in what direction

#

Otherwise really the only reasonable thing a model could do is to continue the previous trend

short heart May 8, 2021, 6:46 PM

#

Makes sense

desert oar May 8, 2021, 6:46 PM

#

That said, it might be doing the wrong thing here in that what should be an unusual deviation from trend appears to be a change in trend

#

Maybe there are some configurable parameters that can help with this issue

#

Also, it would help if you drew a vertical line where the training data ended and the test data began

short heart May 8, 2021, 6:48 PM

#

Yeah, I did draw one after I posted but I'm not on pc now

#

Give me a sec

#

Should be something like that

grave frost May 8, 2021, 6:49 PM

#

desert oar Prophet is essentially regression, and this is a common problem in forecasting. ...

oof

desert oar May 8, 2021, 6:50 PM

#

short heart Should be something like that

Yeah that's pretty much what I figured

grave frost May 8, 2021, 6:50 PM

#

that won't be very high accuracy, but I guess it gets the job done

desert oar May 8, 2021, 6:50 PM

#

Neural networks have shown themselves to be pretty lackluster in time series modeling compared to other domains

grave frost May 8, 2021, 6:51 PM

#

desert oar Neural networks have shown themselves to be pretty lackluster in time series mod...

not true in practice tho

desert oar May 8, 2021, 6:51 PM

#

I believe it if you are doing something like classifying 1000 different time series

grave frost May 8, 2021, 6:51 PM

#

you can check out kaggle's jane street, and everyone is running nets (I think the winner too, not sure tho)

desert oar May 8, 2021, 6:51 PM

#

For time series prediction on a single series? I haven't seen good results but I am obviously willing to be proven wrong

grave frost May 8, 2021, 6:51 PM

#

jane street is stock prediction BTW

#

Yirun's Solution (1st on 2021-03-29): Training Supervised Autoencoder with MLP

#

ahh, and a blend of xgboost

#

Cross-Validation (CV) Strategy and Feature Engineering:

    5-fold 31-gap purged group time-series split
    Remove first 85 days for training
    Forward-fill the missing values
    Transfer all resp targets (resp, resp_1, resp_2, resp_3, resp_4) to action for multi-label classification
    During inference, the mean of all predicted actions is taken as the final probability

Deep Learning Model:

    Use autoencoder to create new features, adding along with original features to the MLP
    Train autoencoder and MLP together in each CV split to prevent data leakage
    Add target information to autoencoder (supervised learning) to force it generating more relevant features, and to create a shortcut for backpropagation of gradient
    Add Gaussian noise layer before encoder for data augmentation and to prevent overfitting
    Use swish activation function instead of ReLU to prevent ‘dead neuron’ and smooth the gradient
    Batch Normalisation and Dropout are used for MLP
    Train the model with 3 different random seeds and take the average to reduce prediction variance
    Only use the models (with different seeds) trained in the last two CV splits since they have seen more data
    Only monitor the BCE loss of MLP instead of the overall loss for early stopping
    Use Hyperopt to find the optimal hyperparameter set

Ahh, the mind of grandmasters works in different ways than the rest of us mortals can comprehend

desert oar May 8, 2021, 6:54 PM

#

Yeah I have no doubt that eventually we will figure out useful general purpose neural network models that offer incremental improvements over regression-based methods

#

But for now it's a mischaracterization to suggest that a regression model like prophet will be inherently less effective than something based on an LSTM

grave frost May 8, 2021, 6:55 PM

#

desert oar But for now it's a mischaracterization to suggest that a regression model like p...

if prophet hasn't yielded much in a comp for $100,000 - I think we can safely rule it out imo

desert oar May 8, 2021, 6:55 PM

#

The Jane Street problem is also a lot more sophisticated than forecasting a single series 🤷‍♂️

#

I'm not trying to say this particular model is the best thing since sliced bread either

#

Neural networks I would expect to perform much better on a complicated evaluation task like this

grave frost May 8, 2021, 6:59 PM

#

desert oar The Jane Street problem is also a lot more sophisticated than forecasting a sing...

it seems forecasting is the second stage of the comp - it will end in 3 months. I bet on NNs to rule

#

!remindme

arctic wedgeBOT May 8, 2021, 6:59 PM

#

Missing required argument

expiration

desert oar May 8, 2021, 7:00 PM

#

I have no doubt that the winning forecast will at some point use deep learning somewhere along the way

grave frost May 8, 2021, 7:00 PM

#

uh-huh

royal lintel May 8, 2021, 7:33 PM

#

royal lintel `cannot pickle 'weakref' object` anyone knows how to solve it? Saving model usin...

refreshing question

tawdry hamlet May 8, 2021, 7:41 PM

#

Hi, does anyone have any good info on how to interpret the output of the critic/discriminator module in a Wasserstein GAN (WGAN-GP)? I am struggling to interpret whether a large number means 'normal' or a small number means normal, and looking around online I'm seeing conflicting info.

#

Would love for some clarification

proper basin May 8, 2021, 7:55 PM

#

I'm using xml.sax to parse a 200GB XML file and I'm trying to get byte offsets so I can seek back. Is there a way to do this? The Locator interface only provides line/column numbers that are useless for seeking

lapis sequoia May 8, 2021, 7:57 PM

#

200GB... geez

proper basin May 8, 2021, 7:57 PM

#

yeah it's been a pain

lapis sequoia May 8, 2021, 7:57 PM

#

I can imagine.

#

Are you using a generator to deal with that much overhead?

proper basin May 8, 2021, 7:58 PM

#

I'm using xml.sax which has a push API, so I read some bytes, feed them to the XML parser, and it calls callbacks where I handle the XML elements

#

so like I feed 16k bytes to it, one of the events is interesting and I want to note the file position, and I don't know how to do that

#

If I count bytes when I push data in, I know where the 16k buffer is, but I don't know where exactly in the buffer the event happens

lapis sequoia May 8, 2021, 8:03 PM

#

Way over my head, wish I could help

#

The object event though

#

what happens when you print(blah.dir())

#

print(blah.__dir__())

#

Where blah is the obj

#

Even better, use ipython so you can tab out on the obj

proper basin May 8, 2021, 8:05 PM

#

unfortunately the parser's internals are in C so I can't really get at the internals

lapis sequoia May 8, 2021, 8:05 PM

#

It's still an obj, is it not?

#

i.e.

#

blah blah your code blah blah

x = your callbacks obj at the event level that is interesting


Now you just need to see what properties x has

#

So to do, you can tab it out if you make x available to you in ipython

#

In [1]: f = str()

In [2]: f.casefold
capitalize() format_map() isnumeric() maketrans() split()
casefold() index() isprintable() partition() splitlines()
center() isalnum() isspace() replace() startswith()
count() isalpha() istitle() rfind() strip()
encode() isascii() isupper() rindex() swapcase()
endswith() isdecimal() join() rjust() title()
expandtabs() isdigit() ljust() rpartition() translate()
find() isidentifier() lower() rsplit() upper()
format() islower() lstrip() rstrip() zfill()
function()

#

Like that. I tabbed out and all the things f can do or has, pop up 🙂

proper basin May 8, 2021, 8:07 PM

#

Oh yeah there's stuff, just not useful stuff

#

(Pdb) p parser
<xml.sax.expatreader.ExpatParser object at 0x7f393cee4430>
(Pdb) p dir(parser)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bufsize', '_close_source', '_cont_handler', '_decl_handler_prop', '_dtd_handler', '_ent_handler', '_entity_stack', '_err_handler', '_external_ges', '_interning', '_lex_handler_prop', '_namespaces', '_parser', '_parsing', '_reset_cont_handler', '_reset_lex_handler_prop', '_source', 'character_data', 'close', 'end_element', 'end_element_ns', 'end_namespace_decl', 'external_entity_ref', 'feed', 'getColumnNumber', 'getContentHandler', 'getDTDHandler', 'getEntityResolver', 'getErrorHandler', 'getFeature', 'getLineNumber', 'getProperty', 'getPublicId', 'getSystemId', 'notation_decl', 'parse', 'prepareParser', 'processing_instruction', 'reset', 'setContentHandler', 'setDTDHandler', 'setEntityResolver', 'setErrorHandler', 'setFeature', 'setLocale', 'setProperty', 'skipped_entity_handler', 'start_doctype_decl', 'start_element', 'start_element_ns', 'start_namespace_decl', 'unparsed_entity_decl']

proper basin May 8, 2021, 8:54 PM

#

Ok I wrote untractable code that records the byte offset of the buffer and the line number inside that buffer
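
A possible alternative sketch, assuming it is acceptable to drop down from xml.sax to the standard library's xml.parsers.expat: its parser object has a `CurrentByteIndex` attribute that can be read inside a callback. The `find_element_offsets` helper, the tag name and the sample document are hypothetical. The `_parser` entry in the dir() listing above is presumably this same kind of expat object, but it is a private attribute, so this is not something to rely on.

```python
import io
import xml.parsers.expat


def find_element_offsets(stream, tag, chunk_size=16 * 1024):
    """Return the byte offsets at which `<tag` start tags begin."""
    offsets = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == tag:
            # CurrentByteIndex is counted from the start of the whole input,
            # not from the start of the chunk currently being fed.
            offsets.append(parser.CurrentByteIndex)

    parser.StartElementHandler = on_start

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal end of input
    return offsets


# Tiny stand-in for the 200GB file; a small chunk size forces tags
# to straddle chunk boundaries.
data = b"<root><a/><item id='1'>x</item><item id='2'>y</item></root>"
offsets = find_element_offsets(io.BytesIO(data), "item", chunk_size=7)
for off in offsets:
    print(off, data[off:off + 5])
```

With a real file, `stream` would be the file opened in binary mode, and each recorded offset could later be passed to `seek()`.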

lapis sequoia May 8, 2021, 11:04 PM

#

What's the logic that goes into selecting a loss function

grave frost May 8, 2021, 11:04 PM

#

lapis sequoia What's the logic that goes into selecting a loss function

how else would you optimize your gradients?

#

do you mean selecting a loss function or why do we use one?

lapis sequoia May 8, 2021, 11:05 PM

#

I mean, what loss function should I choose

#

Yeah, selecting one, I worded that badly lol

#

Like atm I'm doing an MLP sequential model for recognising numbers, what loss function should I be choosing?

grave frost May 8, 2021, 11:05 PM

#

classification on multi-class --> cross entropy

#

I mean, there aren't that many options are there?

lapis sequoia May 8, 2021, 11:06 PM

#

What's the difference between that and the sparse variant?

grave frost May 8, 2021, 11:07 PM

#

the sparse accepts one-hot encoded labels I think - not sure tho

lapis sequoia May 8, 2021, 11:07 PM

#

Ah, makes sense

#

Is it just for performance enhancement?

grave frost May 8, 2021, 11:07 PM

#

I mean in TF/keras i don't think it does accept sparse.

#

it's the opposite in fact

lapis sequoia May 8, 2021, 11:08 PM

#

Then why use it

grave frost May 8, 2021, 11:08 PM

#

If you want to provide labels using one-hot representation, please use CategoricalCrossentropy loss

lapis sequoia May 8, 2021, 11:08 PM

#

Do you have a decision tree for this?

grave frost May 8, 2021, 11:08 PM

#

but in SO, Im pretty sure I read it as sparse loss for sparse labels

lapis sequoia May 8, 2021, 11:09 PM

#

Makes sense

#

Thanks

grave frost May 8, 2021, 11:10 PM

#

yup, it seems sparse_categorical_crossen is for labels that are NOT sparse.
https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
that's a weird convention

Cross Validated

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras+Tensorflow to classify categorical data. I have a choice of two loss functions: categorial_crossentropy and sparse_categorial_crossentrop...

#

The sparse_categorical_crossentropy is a little bit different, it works on integers that's true, but these integers must be the class indices, not actual values.
Ahh, that makes sense in the terminology, if it does represent indices
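
A small NumPy sketch of the distinction, not the Keras implementation: the two losses compute the same number and differ only in how the labels are passed in (integer class indices versus one-hot rows). The function names mirror the Keras ones for readability; the logits and labels are made up.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(one_hot, probs):
    """Mean cross-entropy with one-hot targets of shape (N, C)."""
    return float(-(one_hot * np.log(probs)).sum(axis=1).mean())


def sparse_categorical_crossentropy(class_ids, probs):
    """Mean cross-entropy with integer class indices of shape (N,)."""
    picked = probs[np.arange(len(class_ids)), class_ids]
    return float(-np.log(picked).mean())


# Hypothetical logits for 4 samples and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.0, 1.0],
                   [-2.0, 4.0, 0.0]])
probs = softmax(logits)

class_ids = np.array([0, 2, 1, 1])   # "sparse" form: one integer per sample
one_hot = np.eye(3)[class_ids]       # dense form: one row of 0/1 per sample

print(categorical_crossentropy(one_hot, probs))
print(sparse_categorical_crossentropy(class_ids, probs))
```

The index form stores N integers instead of an N x C matrix, which is where the memory saving comes from when the number of classes is large.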

sharp reef May 9, 2021, 12:46 AM

#

Yeah it really helps not having to keep one-hot distributions in memory for potentially multiple batches

grave frost May 9, 2021, 12:51 AM

#

I don't think it represents that significant of an overhead

lapis sequoia May 9, 2021, 1:58 AM

#

How can I use the transformers lib bert model to continue a conversation (and basically be a chatbot)

#

I've got this code, but I don't know how to turn its output into usable text.

      tokenized = tokenizer.tokenize(message.content)
      indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized)
      tokens_tensor = torch.tensor([indexed_tokens])
      tokens_tensor = tokens_tensor.to('cpu')
      model.to('cpu')
      outputs = model(tokens_tensor)
      encoded_layers = outputs[0]

#

The encoded_layers is just a buncha random numbers

serene scaffold May 9, 2021, 2:42 AM

#

@lapis sequoia I can look at this tomorrow. However I believe the tokenizer can decode the "random numbers".

lapis sequoia May 9, 2021, 2:42 AM

#

I know they're not random, they're outputs of the... neurons, I think they're called?

serene scaffold May 9, 2021, 2:46 AM

#

I know you know they're not random. Bert uses transformers and I don't believe that transformers use neurons.

lapis sequoia May 9, 2021, 2:46 AM

#

ah

foggy field May 9, 2021, 2:47 AM

#

Hi, I think this might be a stupid question but I'm stuck. How do I train KNNRegressor from scikit using my entire dataset and then predict a target feature from 1 outside observation?

#

i asked my question more in depth in help-pretzel

#

but no one has responded yet

foggy field May 9, 2021, 3:08 AM

#

anyone able to help?
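
A minimal sketch for the scikit-learn question, assuming "KNNRegressor" means `sklearn.neighbors.KNeighborsRegressor`. The dataset and the choice of `n_neighbors=3` are made up; the point is only that the model is fitted on all rows and that a single observation has to be reshaped to 2D before calling `predict`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical dataset: 6 observations, 2 features, one numeric target.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
y = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 51.0])

# Fit on the entire dataset (no train/test split).
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)

# predict() expects a 2D array, so one observation needs shape (1, n_features).
new_observation = np.array([8.5, 8.5]).reshape(1, -1)
prediction = model.predict(new_observation)
print(prediction[0])
```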

flint mason May 9, 2021, 3:14 AM

#

RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x1 and 128x32)

Anyone familiar with this error? I'm using ResNet with a modified nn.Linear. The error is thrown when trying to evaluate

flint mason May 9, 2021, 3:16 AM

#

foggy field anyone able to help?

def predict_single(input, target, model):
    # Add a batch dimension: the model expects a batch, not a single sample
    inputs = input.unsqueeze(0)
    predictions = model(inputs)
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)

#

Will work for almost anything other than image classification

foggy field May 9, 2021, 3:18 AM

#

Oh thanks

#

ill have a look

lapis sequoia May 9, 2021, 3:21 AM

#

flint mason ```python RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x1 and 128...

resnet needs some extra handling to pass earlier outputs on to later layers. usually the shapes don't line up exactly, and that seems to be the case here.

#

a simple solution would be keeping the layers at roughly the same dimensions instead of reducing

wispy tiger May 9, 2021, 8:23 AM

#

Hey folks! I'm trying to detect motion and people with opencv and deploy it with fastapi, but I'm having some trouble integrating the two. Details are in #help-potato. Could someone pop in and help out?

vast thunder May 9, 2021, 8:37 AM

#

Hello everyone!
So for a CNN (Convolutional Neural Network), I am using TensorFlow, and I have Conv2D for the first 4 layers, a Flatten, and then some Dense layers, with the last one being Dense(3), with relu activation in ALL of them. My input image is loaded with cv2 and is B&W (the shape is (480, 640, 1)) and the output should be a number for each image. The number should be either 0, 1, or 2. But with my input (x values) looking something like:

[array([[213, 212, 212, ..., 149, 149, 149],
       [212, 212, 211, ..., 151, 151, 151],
       [211, 211, 211, ..., 150, 150, 150],
       ...,
       [ 27,  27,  27, ...,  18,  18,  18],
       [ 27,  27,  27, ...,  17,  17,  17],
       [ 27,  27,  27, ...,  17,  17,  17]], dtype=uint8), array([[212, 212, 212, ..., 150, 150, 149],
       [211, 211, 211, ..., 152, 151, 151],
       [211, 211, 211, ..., 152, 151, 151],
 . . .

And my output (y values) looking like:

[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

It gives the following error:

ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'numpy.ndarray'>"}), (<class 'list'> containing values of types {"<class 'int'>"})

Why though? Because the types don't match? But when I turn the y values into an np.array (all of them), I still get an error saying the input sizes don't match. Anyone know why?

tidal bough May 9, 2021, 10:24 AM

#

vast thunder Hello everyone! So for a CNN (Convolutional Neural Network) , I am using Tensorf...

What's the input shape of your model? Is it (480, 640, 1) or (480,640)? I think you can check with .input_shape.

vast thunder May 9, 2021, 10:34 AM

#

tidal bough What's the input shape of your model? Is it `(480, 640, 1)` or `(480,640)`? I *t...

My input shape is (480, 640, 1)

#

I can send you the model code if you want

#

https://paste.pythondiscord.com/cefugitepe.apache

tidal bough May 9, 2021, 10:38 AM

#

If that's the case, the input shape for multiple images should be (N,480,640,1) (a single tensor/numpy array)

vast thunder May 9, 2021, 10:38 AM

#

Oh, N being the count?

tidal bough May 9, 2021, 10:38 AM

#

Yup

vast thunder May 9, 2021, 10:38 AM

#

Lemme try

tidal bough May 9, 2021, 10:38 AM

#

It looks like you're passing a list. If you convert that list to a numpy array, what's the shape and dtype?

vast thunder May 9, 2021, 10:44 AM

#

sorry my laptop crashed

#

after running the code with input shape of N

#

so uhm, Im waiting for it to load

#

Okay so

#

Um

#

It doesn't show the dtype

#

@tidal bough Should I convert EVERY element into one np array?

tidal bough May 9, 2021, 10:56 AM

#

vast thunder @tidal bough Should I convert EVERY element into one np array?

The entire thing you pass to the model should be a single numpy array.

tidal bough May 9, 2021, 10:56 AM

#

vast thunder It doesn't show the dtype

.dtype will show it. And what is this, the input? Shouldn't it be a lot bigger than this?

vast thunder May 9, 2021, 10:57 AM

#

This is for testing purposes, just wanted to practice

#

That's the output

#

The input is images

#

Images, loaded by cv2, grayscaled

tidal bough May 9, 2021, 10:57 AM

#

!e

import numpy as np
arr = np.arange(100).reshape(5,10,2)
print(arr.shape)
print(arr.dtype)

arctic wedgeBOT May 9, 2021, 10:57 AM

#

@tidal bough :white_check_mark: Your eval job has completed with return code 0.

001 | (5, 10, 2)
002 | int64

tidal bough May 9, 2021, 10:57 AM

#

check the shape and dtype of what you're passing to the model

vast thunder May 9, 2021, 10:57 AM

#

kk

#

uint8, (19, 1, 480, 640) is the shape

#

So I'm guessing ... the input shape should be (19, 480, 640, 1)?

tidal bough May 9, 2021, 11:04 AM

#

vast thunder So I'm guessing ... the input shape should be `(19, 480, 640, 1)`?

Yup, if the input shape of your model is (480, 640, 1).

#

Presumably it will complain about shape if you pass this tensor to it.

vast thunder May 9, 2021, 11:04 AM

#

tidal bough Yup, if the input shape of your model is `(480, 640, 1)`.

So I should just add the 19 as the first one?

#

model.add(Conv2D(20, (5, 5), (2, 2), input_shape=(19, 480, 640, 1), activation='relu'))

Is this alright as the first layer?

tidal bough May 9, 2021, 11:04 AM

#

If your shape is (19, 1, 480, 640), you can make it (19, 480, 640, 1) using .transpose([0,2,3,1])
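
Putting the steps from this exchange together as a sketch: turn the Python list of images into one array, move the channel axis to the end, and turn the labels into an array as well. The zero-filled images are a stand-in for the cv2-loaded ones; the shapes and label counts are the ones quoted in the conversation.

```python
import numpy as np

# Hypothetical stand-in for the cv2-loaded images: a Python list of 19
# arrays, each with a leading channel axis, shape (1, 480, 640).
images = [np.zeros((1, 480, 640), dtype=np.uint8) for _ in range(19)]
labels = [0] * 7 + [1] * 6 + [2] * 6

x = np.array(images)             # list -> one array, shape (19, 1, 480, 640)
x = x.transpose([0, 2, 3, 1])    # channels last,      shape (19, 480, 640, 1)
y = np.array(labels)             # shape (19,)

print(x.shape, x.dtype)
print(y.shape, y.dtype)
```

Arrays shaped like this should then match a model whose first layer declares `input_shape=(480, 640, 1)`, since that argument describes one sample and leaves the batch axis out.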

errant gull May 9, 2021, 11:05 AM

#

What would be the best way to append a column of 1s as the 0th column to a numpy array

X = ([4,5,6,7,8,9],
     [4,5,6,7,8,9],
.
.
.
X should then be:
X = ([1,4,5,6,7,8,9],
     [1,4,5,6,7,8,9],
     [1, ..
     [1, ..
.
.
.

vast thunder May 9, 2021, 11:05 AM

#

tidal bough If your shape is `(19, 1, 480, 640)`, you can make it `(19, 480, 640, 1)` using ...

Oh right

tidal bough May 9, 2021, 11:05 AM

#

vast thunder ```py model.add(Conv2D(20, (5, 5), (2, 2), input_shape=(19, 480, 640, 1), activa...

No, your input shape of the model should describe one sample, I believe.

#

So (480,640,1) is right.

vast thunder May 9, 2021, 11:05 AM

#

So (480, 640, 1) then

#

okay, so I just transpose, then fit?

tidal bough May 9, 2021, 11:05 AM

#

yeah, I think so

vast thunder May 9, 2021, 11:05 AM

#

Lemme try

tidal bough May 9, 2021, 11:06 AM

#

errant gull What would be the best way to append a column of 1s as the 0th column to a numpy...

np.hstack([np.ones((X.shape[0],1)),X])

#

np.ones((X.shape[0],1)) is a (n,1)-shape array where n is X's number of rows.
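
The same one-liner as a runnable sketch, with a made-up 3-row X. One assumption added here: `dtype=X.dtype` on the ones column, because `np.ones` defaults to float and would otherwise turn an integer X into floats after stacking.

```python
import numpy as np

# Hypothetical X with 3 rows, shaped like the example in the question.
X = np.array([[4, 5, 6, 7, 8, 9],
              [4, 5, 6, 7, 8, 9],
              [4, 5, 6, 7, 8, 9]])

ones = np.ones((X.shape[0], 1), dtype=X.dtype)  # (n, 1) column of 1s
X_with_ones = np.hstack([ones, X])              # ones become column 0

print(X_with_ones)
```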

vast thunder May 9, 2021, 11:10 AM

#

Thanks @tidal bough, really appreciate it!

grave frost May 9, 2021, 11:12 AM

#

wait, you can force convolutional layers to find horizontal/vertical patterns?

#

does anyone know the research/technical term behind this?

lapis sequoia May 9, 2021, 11:32 AM

#

you mean having different kind of filters in same layer?

#

yes. you can look out for inception network.

grave frost May 9, 2021, 11:36 AM

#

lapis sequoia you mean having different kind of filters in same layer?

how can you have different types of filters in the same layers?

lapis sequoia May 9, 2021, 11:37 AM

#

how do i install pyaudio?

lapis sequoia May 9, 2021, 11:37 AM

#

grave frost how can you have different types of filters in the same layers?

there is a model made for that.

grave frost May 9, 2021, 11:37 AM

#

lapis sequoia there is a model made for that.

which one?

lapis sequoia May 9, 2021, 11:37 AM

#

https://www.youtube.com/watch?v=C86ZXvgpejM

YouTube

DeepLearningAI

C4W2L06 Inception Network Motivation

#

..

#

inception network.

grave frost May 9, 2021, 11:38 AM

#

lapis sequoia inception network.

inception model is just composed of inception modules. and the architecture I described was sequential VGGish

lapis sequoia May 9, 2021, 11:39 AM

#

i was just implying that if it is like that then it is possible.

grave frost May 9, 2021, 11:39 AM

#

lapis sequoia i was just implying that if it is like that then it is possible.

well it's not, that's why it's so mysterious

#

and the guys who described it looked like they had written it all in 10 mins

lapis sequoia May 9, 2021, 11:40 AM

#

alright. thats why i kinda asked question as i was not 100% sure about it.

lapis sequoia May 9, 2021, 11:40 AM

#

grave frost and the guys who described it looked like they had written it all in 10 mins

ref?

grave frost May 9, 2021, 11:40 AM

#

literally fucking one statement "Musically motivated CNN". I mean, why publish a paper when you can't even write out its whole architecture

grave frost May 9, 2021, 11:41 AM

#

lapis sequoia ref?

their official paper is 1.5 pages of everything

#

so you can guess what's on it

lapis sequoia May 9, 2021, 11:41 AM

#

wow

#

1.5

grave frost May 9, 2021, 11:41 AM

#

https://arxiv.org/pdf/1909.06654.pdf

#

:trash

lapis sequoia May 9, 2021, 11:42 AM

#

http://jordipons.me/media/CBMI16.pdf

#

this seems like same person. IEEE format.

grave frost May 9, 2021, 11:43 AM

#

ahh, why didn't they link the repo with this paper?

lapis sequoia May 9, 2021, 11:43 AM

#

there is one in what i sent, hold on

#

oh i think its reference never mind

grave frost May 9, 2021, 11:45 AM

#

like that paper should have been the first thing in the repo

#

well, their technique is interesting, but severely outdated

lapis sequoia May 9, 2021, 11:47 AM

#

its just 5 layers?

#

or is there repetition like vgg?

grave frost May 9, 2021, 11:47 AM

#

uh-huh. on top of that, they have custom layers for temporal and spatial, didn't even run NAS on it

#

A mediocre submission, but I guess it's 5 years old so I have to cut them some slack.

#

I was thinking of using LeViT. Just gotta convince them to gimme more GPUs

lapis sequoia May 9, 2021, 11:49 AM

#

wow i need to learn a lot in data science.

grave frost May 9, 2021, 11:51 AM

#

imo I like ViT a lot, but its main drawback is the fact that it requires labelled data for pre-training

#

There was a new technique called BYOL, but I haven't got around to understanding it or its results + it's highly experimental. the advantage of it is that it can use random unlabelled images from the task and pre-train on that, just like in NLP

winged stratus May 9, 2021, 12:07 PM

#

have there been any advances in gans since wgan-gp?

hard hound May 9, 2021, 12:22 PM

#

Hey could someone help me define a function which computes average of every last 50 ints for every point?
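
One way to read this is as a trailing moving average: for each position, the mean of that value and up to 49 values before it. A minimal sketch, assuming shorter windows at the start of the list are acceptable (the names `trailing_mean` and `window` are made up for illustration):

```python
from collections import deque

def trailing_mean(values, window=50):
    """For each point, return the mean of the last `window` values up to and including it."""
    recent = deque()   # values currently inside the window
    total = 0          # running sum of `recent`
    out = []
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            total -= recent.popleft()   # drop the value that fell out of the window
        out.append(total / len(recent))
    return out

print(trailing_mean([1, 2, 3, 4], window=2))
```

Keeping a running sum means each new point costs constant work instead of re-summing 50 values every time.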

grave frost May 9, 2021, 12:31 PM

#

it's honestly quite confusing to keep up with advances in something. I guess you can check benchmarks 🤷

lapis sequoia May 9, 2021, 12:57 PM

#

Should I be one hot-encoding the training set as well (if possible) or just the labels?

inland zephyr May 9, 2021, 1:42 PM

#

Hello all. I want to ask if there is any good tutorial on transfer learning with VGGNet or FaceNet on a custom-built dataset for a production-level architecture? I need to retrain it for a specific face recognition task while cutting production time. I searched Medium and elsewhere, but there is a lack of this kind of tutorial.

forest zenith May 9, 2021, 1:48 PM

#

hi, my code runs, but the voice of my AI speaks too fast

#

here is my code:

import speech_recognition
import pyttsx3
from datetime import datetime

now = datetime.now()

name = input("Please enter your name before using this: ")
today = now.strftime("%B %d, %Y")

robot_ear = speech_recognition.Recognizer()
robot_mouth = pyttsx3.init()
robot_brain = ""

with speech_recognition.Microphone() as mic:
    print("Robot: I'm listening")
    audio = robot_ear.listen(mic)

print("Robot:...")

try:
    you = robot_ear.recognize_google(audio)
except:
    you = ""
print("you: " + you)

if you == "":
    robot_brain = "I can't hear you, try again."
elif "hello" in you:
    robot_brain = "Hello " + name
elif "today" in you:
    robot_brain = today
elif you == "WWDC 2021":
    robot_brain = "WWDC 2021 will start from June 7 to 11. You can see details in https://developer.apple.com/wwdc21/"
else:
    robot_brain = "Sorry, this question is not supported."
print("Robot:", robot_brain)

robot_mouth = pyttsx3.init()
robot_mouth.say(robot_brain)
robot_mouth.runAndWait()
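
On the "speaks too fast" part: pyttsx3 exposes a `rate` property (words per minute) that can be lowered before calling `say`. A rough sketch, assuming 150 is slow enough (that number is arbitrary) and using made-up helper names `pick_reply` and `speak`; the first helper also shows an explicit empty-string check for the case where nothing was recognised:

```python
def pick_reply(heard, name, today):
    """Choose the robot's reply from the recognised text (empty string = nothing heard)."""
    if not heard:                      # explicit empty check; a bare `if "":` is never true
        return "I can't hear you, try again."
    text = heard.lower()
    if "hello" in text:
        return "Hello " + name
    if "today" in text:
        return today
    return "Sorry, this question is not supported."

def speak(text, words_per_minute=150):
    """Say `text` with pyttsx3 at a reduced speaking rate."""
    import pyttsx3                     # imported here so pick_reply works without it
    engine = pyttsx3.init()
    engine.setProperty("rate", words_per_minute)
    engine.say(text)
    engine.runAndWait()
```

Calling `speak(pick_reply(you, name, today))` would replace the last three lines of the script; try a few values for the rate until the voice sounds right.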

lapis sequoia May 9, 2021, 1:50 PM

#

serene scaffold <@456226577798135808> I can look at this tomorrow. However I believe the tokeniz...

@serene scaffold if you have some time, I woke up and can do this now

serene scaffold May 9, 2021, 1:51 PM

#

lapis sequoia @serene scaffold if you have some time, I woke up and can do this now

Did you look at what methods the tokenizer has?

lapis sequoia May 9, 2021, 1:51 PM

#

I can't figure out how to look at the methods it has

#

I tried decode but there was an error, lemme try it again

serene scaffold May 9, 2021, 1:51 PM

#

The docs

lapis sequoia May 9, 2021, 1:51 PM

#

inland zephyr Hello all. I want to ask if there is any good tutorial to transfer learn VGGNet ...

I did the Tensorflow 2 for Deep Learning course on Coursera which is pretty good, goes into a fair amount of detail and uses VGG

#

The error is that encoded_layers is a list, and int can't convert it to a number (for tokenizer.decode)

#

Ah, I think I need to call convert_tokens_to_ids on the encoded_layers, then feed it to decode, possibly

serene scaffold May 9, 2021, 1:54 PM

#

lapis sequoia The error is that `encoded_layers` is a list, and `int` can't convert it to a nu...

I'll be at my desktop in like 15

lapis sequoia May 9, 2021, 1:54 PM

#

aight

#

Well, that's progress, I guess?

sick wedge May 9, 2021, 1:57 PM

#

can anyone understand this error?

#

lapis sequoia May 9, 2021, 1:58 PM

#

Trying convert_ids_to_tokens seems to return something that decode can't handle, and convert_tokens_to_ids and putting it through decode returns [UNK]

mint palm May 9, 2021, 2:06 PM

#

the course got updated, lets goooooooooo..................!!!!!!!!!!!

lapis sequoia May 9, 2021, 2:08 PM

#

sick wedge

You need to make sure your output layer and your labels layer are the same shape

#

What's your output layer?

sick wedge May 9, 2021, 2:09 PM

#

No worries I fixed it

lapis sequoia May 9, 2021, 2:09 PM

#

Cool

serene scaffold May 9, 2021, 2:11 PM

#

@lapis sequoia did you try this one? https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer.batch_decode

Tokenizer

A tokenizer is in charge of preparing the inputs for a model. The library contains tokenizers for all the models. Most of the tokenizers are available in two...

lapis sequoia May 9, 2021, 2:12 PM

#

No, I haven't, let me try that

#

Feeding batch decode the output directly: TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Feeding batch decode convert_ids_to_tokens: ValueError: only one element tensors can be converted to Python scalars
Feeding batch decode convert_tokens_to_ids:

serene scaffold May 9, 2021, 2:19 PM

#

lapis sequoia No, I haven't, let me try that

what line caused that error?

lapis sequoia May 9, 2021, 2:20 PM

#

output directly:

Traceback (most recent call last):
    await coro(*args, **kwargs)
  File ".\public-chatbot02.py", line 28, in on_message
    msg_out = tokenizer.batch_decode(encoded_layers)
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3019, in batch_decode
    for seq in sequences
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3019, in <listcomp>
    for seq in sequences
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3055, in decode
    **kwargs,
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 731, in _decode
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 706, in convert_ids_to_tokens
    index = int(index)

convert_ids_to_tokens:

Traceback (most recent call last):
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\discord\client.py", line 343, in _run_event
    await coro(*args, **kwargs)
  File ".\public-chatbot02.py", line 28, in on_message
    token_ids = tokenizer.convert_ids_to_tokens(encoded_layers)
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 706, in convert_ids_to_tokens
    index = int(index)
ValueError: only one element tensors can be converted to Python scalars

#

This is how I get encoded_layers

      outputs = model(tokens_tensor)
      encoded_layers = outputs[0]
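
Both errors are consistent with `encoded_layers` holding per-token float scores rather than a flat sequence of integer token ids, which is what `decode` and `batch_decode` expect. A toy sketch in plain Python, assuming `outputs[0]` is a `(batch, sequence, vocab)` block of logits from a masked-LM head (if the model is a bare encoder, `outputs[0]` is hidden states and cannot be decoded this way). `TOY_VOCAB` and `logits_to_ids` are invented for illustration; with real tensors the same step would be an argmax over the last dimension:

```python
TOY_VOCAB = ["[PAD]", "hello", "world", "[UNK]"]   # hypothetical 4-token vocabulary

def logits_to_ids(logits):
    """Pick the highest-scoring vocab index for every token of every sequence."""
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in sequence]
        for sequence in logits
    ]

# one sequence, two tokens, four vocab scores each
logits = [[[0.1, 2.0, 0.3, 0.0],
           [0.2, 0.1, 1.5, 0.4]]]

ids = logits_to_ids(logits)                            # nested lists of ints
tokens = [[TOY_VOCAB[i] for i in seq] for seq in ids]  # what a decode step would look up
print(ids, tokens)
```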

ripe forge May 9, 2021, 2:21 PM

#

mint palm the course got updated, lets goooooooooo..................!!!!!!!!!!!

Thats awesome

serene scaffold May 9, 2021, 2:22 PM

#

@lapis sequoia this is a program where I use BERT. https://github.com/swfarnsworth/pseudobert/blob/ner/pseudobert/pseudofiers/base_pseudofier.py#L70

arctic wedgeBOT May 9, 2021, 2:22 PM

#

pseudobert/pseudofiers/base_pseudofier.py line 70

def _call_bert(self, text: str, start: int, end: int) -> t.Generator[t.Tuple[str, float], None, None]:

serene scaffold May 9, 2021, 2:23 PM

#

see if that gives you any leads.

lapis sequoia May 9, 2021, 2:24 PM

#

Should I be one hot-encoding the training set as well (if possible) or just the labels?

granite sierra May 9, 2021, 6:09 PM

#

Hello, I'm doing object detection using darknet yolo, how would I write conditional statements based on the information in the image, like "if object in image, do this"

grave frost May 9, 2021, 6:09 PM

#

granite sierra Hello, I'm doing object detection using darknet yolo, how would I write conditio...

if model.predict(image) == 'tree': do_something

glass cedar May 9, 2021, 6:09 PM

#

lapis sequoia Should I be one hot-encoding the training set as well (if possible) or just the ...

Depends heavily on the type of information you want to capture

grave frost May 9, 2021, 6:10 PM

#

honestly, in reality there are very few times that you ever want to one-hot encode your x array
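
For the labels side, one-hot encoding just turns each class into a 0/1 vector with a single 1. A tiny sketch with invented labels (the helper name `one_hot` is made up):

```python
def one_hot(labels):
    """Map each label to a 0/1 vector, one position per distinct class (sorted order)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = [
        [1 if index[label] == i else 0 for i in range(len(classes))]
        for label in labels
    ]
    return classes, vectors

classes, encoded = one_hot(["cat", "dog", "cat", "bird"])
print(classes)
print(encoded)
```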

vast thunder May 9, 2021, 6:12 PM

#

Guys a cross-entropy loss basically runs soft-max on the losses?

granite sierra May 9, 2021, 6:12 PM

#

grave frost if `model.predict(image) == 'tree': do_something`

hmm, that's what I thought, the issue is I'm running the model through the exe, like sending the image over command prompt to the image.

import os
from pexpect import popen_spawn

def darknet(note):
    os.chdir("C:/Users/Denis/FinalProject/Carla/darknet-master/build/darknet/x64")
    process = popen_spawn.PopenSpawn("darknet.exe detector test data/obj.data cfg/yolov4.cfg yolov4.weights")
    print(note)
    return process

note = "Darknet Ready"
darknet_process = darknet(note)

def car_recogniser(new, detected_car_loop, sensor_name_image):
    global darknet_process
    carla_image_exists = os.path.isfile(f"C:/Users/Denis/FinalProject/Carla/carla/temp/{sensor_name_image}.jpg")
    if carla_image_exists:
        carla_image = (f"C:/Users/Denis/FinalProject/Carla/carla/temp/{sensor_name_image}.jpg").encode()
        darknet_process.send(carla_image+b"\n")
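
If the model stays behind `darknet.exe`, the "if object in image" check has to come from the text it prints. A rough sketch, assuming each detection shows up on its own line as `label: NN%` (check what your build actually prints); `parse_detections` and the sample text are made up for illustration:

```python
import re

# matches lines such as "car: 87%" (anything after the percent sign is ignored)
DETECTION = re.compile(r"^\s*([\w ]+?):\s*(\d+)%")

def parse_detections(stdout_text):
    """Return (label, confidence 0-1) pairs found in the detector's console output."""
    found = []
    for line in stdout_text.splitlines():
        match = DETECTION.match(line)
        if match:
            found.append((match.group(1), int(match.group(2)) / 100))
    return found

sample = "temp/front.jpg: Predicted in 23.5 milli-seconds.\ncar: 87%\ntraffic light: 54%\n"
detections = parse_detections(sample)
if any(label == "car" and conf >= 0.5 for label, conf in detections):
    print("car detected, do something")
```

The text itself would have to be read back from the spawned process after each image path is sent; that part depends on how the process is wrapped and is left out here.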

grave frost May 9, 2021, 6:14 PM

#

vast thunder Guys a cross-entropy loss basically runs soft-max on the losses?

it's just terminology used by people: "softmax loss" refers to cross-entropy loss, which is the actual technical term
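
To make the relationship concrete: softmax is applied to the model's raw outputs (logits), not to the losses, and cross-entropy is then computed on the resulting probabilities. A minimal sketch in plain Python with made-up logits:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_index])

logits = [2.0, 1.0, 0.1]                         # invented scores for three classes
probs = softmax(logits)
loss = cross_entropy(probs, true_index=0)
print(probs, loss)
```

Framework losses that take logits directly do both steps in one call, which is probably where the "softmax loss" name comes from.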

grave frost May 9, 2021, 6:14 PM

#

granite sierra hmm, that's what I thought, the issue is I'm running the model through t...

like what exactly?

vast thunder May 9, 2021, 6:15 PM

#

grave frost its just terminology used by people that softmax loss refers to cross-entropy lo...

Hm

granite sierra May 9, 2021, 6:16 PM

#

I posted the code above what i'm doing, I'm running the model through an executable, its not within the script. The model itself isn't written in the script, I wasn't sure how I could incorporate the official darknet model in a pyscript

grave frost May 9, 2021, 6:18 PM

#

granite sierra I posted the code above what i'm doing, I'm running the model through an executa...

so you don't know how to run inference on the model?

granite sierra May 9, 2021, 6:20 PM

#

I guess? My method does seem kinda jank

vast thunder May 9, 2021, 6:21 PM

#

Guys is it necessary to memorize/understand all the loss , or generally, formulas that Machine learning has? Considering you use something like Tensorflow though

grave breach May 9, 2021, 6:30 PM

#

vast thunder Guys is it necessary to memorize/understand all the loss , or generally, formula...

You can use machine learning without knowing anything about it

#

Pay attention to the fact that I wrote "use"

vast thunder May 9, 2021, 6:30 PM

#

Well

grave breach May 9, 2021, 6:30 PM

#

Because, if you use some auto ml, like autokeras, you don't really get equal results

#

But you can still do a lot

vast thunder May 9, 2021, 6:30 PM

#

I can somewhat write a model in keras, without knowing what the frick the formulas mean

grave breach May 9, 2021, 6:31 PM

#

I suggest you to use autokers

vast thunder May 9, 2021, 6:31 PM

#

But I wonder, can I continue like this?

grave breach May 9, 2021, 6:31 PM

#

No

#

But

vast thunder May 9, 2021, 6:31 PM

#

grave breach I suggest you to use autokers

autokeras? what's that

vast thunder May 9, 2021, 6:31 PM

#

grave breach No

Why not? I won't need them, do I?

grave breach May 9, 2021, 6:31 PM

#

A library that automate all the model buildingh

#

*building

#

Training

#

Ecc.

#

You give it data, and it creates and trains a suitable model

vast thunder May 9, 2021, 6:31 PM

#

Oh cool . But isn't the model building itself fun, and generally the whole point?

grave breach May 9, 2021, 6:32 PM

#

I seuggest you to start with autokeras, move to keras and then move to pytorch

#

*suggest

vast thunder May 9, 2021, 6:32 PM

#

I am already using keras ig

#

I understand conv2d, dense, maxpooling, dropout, and flatten. I guess that's enough for image classification

grave breach May 9, 2021, 6:33 PM

#

vast thunder Why not? I won't need them, do I?

When you deal with advanced stuff, you need to understand what you're dealing with

grave breach May 9, 2021, 6:33 PM

#

vast thunder I understand conv2d, dense, maxpooling, dropout, and flatten. I guess that's eno...

Yes

vast thunder May 9, 2021, 6:33 PM

#

grave breach When you deal with advanced stuff, you need to understand what you're dealing wi...

Oh I see ... Well AI isn't gonna be my job, but I would do projects in it , which would be pretty fun . So yeah , I guess I won't even be an advanced data scientist

vast thunder May 9, 2021, 6:33 PM

#

grave breach Yes

I see

modern vine May 9, 2021, 6:52 PM

#

Anybody use Chatterbot? How can I recover the author of a message?

twin fiber May 9, 2021, 7:52 PM

#

hello, could anyone please help me understand an issue i'm having with a model i'm building for predicting customer defaults on loan repayments

#

would really really appreciate it

paper agate May 9, 2021, 9:20 PM

#

How can i learn about machine learning in Python? :P

exotic maple May 9, 2021, 9:33 PM

#

twin fiber hello, could anyone please help me understand an issue i'm having with a model i...

You should state the issue directly and people might be able to help you with it

twin fiber May 9, 2021, 9:35 PM

#

I am applying a weight of evidence function to a train data set, however around half of my variables are categorical and have been split into their sub-categories which are using binary 1,0 to say whether that sub category was selected. my WoE however are extremely low and I think I haven't dealt with the categorical variables correctly so just need advice on how to do so

desert oar May 9, 2021, 9:38 PM

#

twin fiber I am applying a weight of evidence function to a train data set, however around ...

i thought weight of evidence only worked on categorical features anyway?

twin fiber May 9, 2021, 9:38 PM

#

nope

desert oar May 9, 2021, 9:38 PM

#

what's the target of the model? yes/no if the loan defaulted?

twin fiber May 9, 2021, 9:38 PM

#

yes

desert oar May 9, 2021, 9:39 PM

#

dont people just bin numerical features to compute weight of evidence?

#

then it's just the weight of evidence in each bin

twin fiber May 9, 2021, 9:39 PM

#

it's an assignment

#

and they have given us data with loads of variables

#

so I can't imagine we are meant to leave out all of the categorical variables

desert oar May 9, 2021, 9:40 PM

#

thats not what im saying

twin fiber May 9, 2021, 9:40 PM

#

as there are so many, like education type, housing type, contract type etc

desert oar May 9, 2021, 9:40 PM

#

im saying that weight of evidence only works on categorical variables

#

and that non-categorical variables need to be binned into categories
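
A small worked example of that idea: bin a numeric column into categories, then compute WoE per bin. This is a sketch under assumptions, not what scorecardpy does internally: the cut-off at 35 and the data are invented, and WoE is taken as ln(share of bads in the bin / share of goods in the bin), with 1 meaning "defaulted":

```python
import math

def woe_table(categories, targets):
    """Per-category WoE = ln(share of bads / share of goods), plus the total IV."""
    bad_total = sum(targets)
    good_total = len(targets) - bad_total
    counts = {}
    for cat, y in zip(categories, targets):
        bad, good = counts.get(cat, (0, 0))
        counts[cat] = (bad + y, good + (1 - y))
    woe, iv = {}, 0.0
    for cat, (bad, good) in counts.items():
        bad_share = bad / bad_total
        good_share = good / good_total
        woe[cat] = math.log(bad_share / good_share)   # undefined if a bin has 0 bads or 0 goods
        iv += (bad_share - good_share) * woe[cat]
    return woe, iv

age = [22, 25, 31, 29, 45, 52, 38, 60]
default = [1, 1, 1, 0, 0, 0, 1, 0]
age_bin = ["<35" if a < 35 else ">=35" for a in age]   # the numeric column becomes a category

woe, iv = woe_table(age_bin, default)
print(woe, iv)
```

A column that is already categorical (job type, housing type) would go straight into `woe_table` without the binning step.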

twin fiber May 9, 2021, 9:40 PM

#

well that can't be true because I'm following tutorial and they only have numerical variables exclusively

desert oar May 9, 2021, 9:40 PM

#

what tutorial?

sour abyss May 9, 2021, 9:40 PM

#

paper agate How can i learn about machine learning in Python? :P

Cs50 ai is a nice course for learning ai with python

twin fiber May 9, 2021, 9:40 PM

#

maybe I'm misunderstanding categories

#

it's a tutorial from my university

desert oar May 9, 2021, 9:40 PM

#

link to the tutorial if you can

twin fiber May 9, 2021, 9:40 PM

#

a lab session

#

I can't link it it wont allow access

#

#

here is a sample

#

all of those variables are numerical in the data set

desert oar May 9, 2021, 9:41 PM

#

what's this sc module?

twin fiber May 9, 2021, 9:41 PM

#

yes

desert oar May 9, 2021, 9:41 PM

#

?

twin fiber May 9, 2021, 9:42 PM

#

I mean none of those variables are categorical in the dataset they are all numerical values

paper agate May 9, 2021, 9:42 PM

#

sour abyss Cs50 ai is a nice course for learning ai with python

Alright! Thanks!

twin fiber May 9, 2021, 9:42 PM

#

i think i'm misunderstanding something about categories maybe

desert oar May 9, 2021, 9:42 PM

#

what is sc?

twin fiber May 9, 2021, 9:42 PM

#

oh

desert oar May 9, 2021, 9:42 PM

#

where is this woebin_ply function from?

twin fiber May 9, 2021, 9:43 PM

#

scorecard

#

idk I'm very new to this

desert oar May 9, 2021, 9:43 PM

#

is this a publicly available package?

twin fiber May 9, 2021, 9:43 PM

#

yes

desert oar May 9, 2021, 9:43 PM

#

i am pretty sure that the "bin" in that name means that the numerical features are divided into "bins", in order to produce categories

#

maybe im wrong

#

this? https://pypi.org/project/scorecard/

PyPI

scorecard

PMML Scorecard generator

twin fiber May 9, 2021, 9:43 PM

#

either way I have so many and it just looks wrong because all the sub categories have been given their own column

desert oar May 9, 2021, 9:44 PM

#

yes that seems wrong

#

don't one-hot encode the categorical features

twin fiber May 9, 2021, 9:44 PM

#

i just dont know what to do

#

because in the examples they only deal in numerical variables

desert oar May 9, 2021, 9:44 PM

#

can you link to this scorecard package? the one i found does not appear to be correct

twin fiber May 9, 2021, 9:44 PM

#

so they never discuss what to do with categorical ones

desert oar May 9, 2021, 9:44 PM

#

were you ever actually taught how WoE is calculated?

#

is it in a textbook? course notes?

twin fiber May 9, 2021, 9:45 PM

#

yeah but i'm cramming before it is due tomorrow

#

underestimated difficulty

desert oar May 9, 2021, 9:45 PM

#

well if you cant link to the package somewhere on the web, and you can't post the definition that you've been given, there isn't much i can say other than what i've already said

twin fiber May 9, 2021, 9:46 PM

#

this i guess

#

https://pypi.org/project/scorecardpy/

PyPI

scorecardpy

Credit Risk Scorecard

desert oar May 9, 2021, 9:47 PM

#

it looks like bins_adj were calculated by sc.woebin

#

i.e. those are "bins" calculated from the numerical features

#

which are themselves categories

twin fiber May 9, 2021, 9:47 PM

#

before this step

#

we used a zscore function on the data

#

if that helps

desert oar May 9, 2021, 9:47 PM

#

that's a good idea to do. but don't use zscore on anything categorical, it doesn't make sense to do it.

twin fiber May 9, 2021, 9:48 PM

#

and thats when i saw the split in the df

desert oar May 9, 2021, 9:48 PM

#

it looks like sc.woebin supports categorical features naturally, according to the docs http://shichen.name/scorecard/reference/woebin.html

WOE Binning — woebin

woebin generates optimal binning for numerical, factor and categorical variables using methods including tree-like segmentation or chi-square merge. woebin can also customizing breakpoints if the breaks_list was provided. The default woe is defined as ln(Pos_i/Neg_i). If you prefer ln(Neg_i/Pos_i), please set the argument positive as negative va...

#

woebin generates optimal binning for numerical, factor and categorical variables

twin fiber May 9, 2021, 9:49 PM

#

yeah i can't understand that page it's way above what i'm doing

#

i'm following their tutorial and applying the functions to the provided data set

#

no where do they explain how to deal with categorical variables 😦

desert oar May 9, 2021, 9:50 PM

#

(this is also the R docs i think)

#

what i would do is this:

1. make sure that your categorical features are either pd.Categorical or otherwise not a numeric type (e.g. int, float)
2. use sc.woebin and sc.woebin_ply as normal

#

just try it

#

use a small sample of the dataset

#

pick 2 categoricals and 2 numerics

twin fiber May 9, 2021, 9:50 PM

#

i dont know how

desert oar May 9, 2021, 9:50 PM

#

you dont know how to select columns from a pandas dataframe?

#

actually you dont even need to, it takes whole dataframes

#

    dt: A data frame with both x (predictor/feature) and y (response/label) variables.
    y: Name of y variable.
    x: Name of x variables. Default is None. If x is None, 
      then all variables except y are counted as x variables.

#

if you're using jupyter, go into a new cell and type ?sc.woebin

#

it will show you the docs

twin fiber May 9, 2021, 9:51 PM

#

i'm using colab

desert oar May 9, 2021, 9:51 PM

#

same thing

#

also Address is almost certainly a categorical variable...

twin fiber May 9, 2021, 9:52 PM

#

what am i meant to be reading I honestly am very new to this i can't understand what this is saying to me

desert oar May 9, 2021, 9:52 PM

#

you need to be learning basic python then, and basic pandas usage, as well as the fundamental vocabulary that people use to talk about data

#

im sorry if they didnt teach you this and you were expected to figure it out

#

thats not good teaching style and its counterproductive to learning

twin fiber May 9, 2021, 9:53 PM

#

:/

desert oar May 9, 2021, 9:53 PM

#

what kind of course is this? something actuarial?

twin fiber May 9, 2021, 9:53 PM

#

business analytics

desert oar May 9, 2021, 9:53 PM

#

i see

#

can you at least show the previous code sections

#

just copy and paste the code i dont need the text

#

i can try to give a very quick explanation of what it is doing

twin fiber May 9, 2021, 9:55 PM

#

there is quite a lot

#

i don't know how far to go back

#

I just need to know how to deal with these categorical variables because my WoE are so low they are useless

#

I don't understand why they didn't use categorical variables in any of the examples

#

#

#

have about 20 graphs like that

#

some look like this which seems wrong

#

#

#

that's all that's relevant I think

exotic maple May 9, 2021, 10:12 PM

#

It's the first time i've seen weight of evidence and information value lol. It looks interesting and easy to calculate, especially IV, but WoE...I dont think i'm getting what it intends to represent

twin fiber May 9, 2021, 10:15 PM

#

the negative or positive impact of the variable relative to your target I guess

#

I'm not too familiar either

exotic maple May 9, 2021, 10:15 PM

#

towards data science tends to do a good job explaining stuff

#

https://towardsdatascience.com/model-or-do-you-mean-weight-of-evidence-woe-and-information-value-iv-331499f6fc2

Medium

Model? Or do you mean Weight of Evidence (WoE) and Information Valu...

Using the Titanic data set to explain and implement both the concepts step-by-step. A great opportunity to also code in Julia!

twin fiber May 9, 2021, 10:16 PM

#

yeah thank you i'll have a look

#

sadly i'm just going to have to continue modelling even though I know what I've done is incorrect

#

can't figure this out right now and have a lot more to finish

chilly pebble May 9, 2021, 10:17 PM

#

Hi everyone, would this be a place to talk about a decision i need to make by tmrw morning. I have to decide if I am going to move forward with my CS masters degree or if I am making the switch to a MS in Data Science. Are any of you professional data scientists?

exotic maple May 9, 2021, 10:18 PM

#

chilly pebble Hi everyone, would this be a place to talk about a decision i need to make by tmr...

you should probably go to #career-advice

chilly pebble May 9, 2021, 10:18 PM

#

exotic maple you should probably go to #career-advice

I am there now, I guess i should have just asked if anyone did Data Science as a job right now. I have some basic questions about that

desert oar May 9, 2021, 10:27 PM

#

@twin fiber it doesn't look like they only used numeric variables

twin fiber May 9, 2021, 10:27 PM

#

they did

#

I have a copy of the dataset

#

it is all numeric

desert oar May 9, 2021, 10:27 PM

#

gender is not a numeric variable

twin fiber May 9, 2021, 10:27 PM

#

they even had random numbers in the excel file for the address

#

I know but they used numbers in the excel file to replace

desert oar May 9, 2021, 10:27 PM

#

then they were just showing how to use the code

#

try it on the categorical data

twin fiber May 9, 2021, 10:31 PM

#

i have tried their code on my data

#

that is how i'm at this stage

#

what i'm saying is in their examples they are only applying the code to numeric values

#

whereas I have categories and subcategories

#

and I don't know how to deal with them

#

I have used pandas.get_dummies()

#

which is meant to handle the categorical variables, but that has just divided the categories into the respective sub categories and assigned a 1 or a 0

desert oar May 9, 2021, 10:34 PM

#

im telling you not to use that pd.get_dummies or anything like it

#

just run it on the data with the categorical features, without trying to transform them

#

the docs clearly state that the default "tree" method works on both
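
If the frame has already been through `pd.get_dummies`, the 0/1 columns can be folded back into one categorical column before calling `sc.woebin`. A sketch on toy data, assuming the dummy columns share a `job_type_` prefix (the column names and values here are invented):

```python
import pandas as pd

# toy frame: one categorical column stored as text, one numeric column
df = pd.DataFrame({
    "job_type": ["office", "manual", "office", "driver"],
    "income": [30, 22, 41, 27],
})

dummies = pd.get_dummies(df, columns=["job_type"])      # what the one-hot step produces
job_cols = [c for c in dummies.columns if c.startswith("job_type_")]

# collapse the 0/1 columns back into a single categorical column
restored = dummies.drop(columns=job_cols)
restored["job_type"] = (
    dummies[job_cols].astype(int).idxmax(axis=1).str[len("job_type_"):].astype("category")
)
print(restored)
```

Skipping `pd.get_dummies` in the first place gives the same result with less work; this is only for the case where the one-hot frame is all that is left.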

twin fiber May 9, 2021, 10:35 PM

#

I'm just following the code

#

from my lectures

#

so I feel like it must be done this way surely

desert oar May 9, 2021, 10:36 PM

#

the code says to use pd.get_dummies on the categorical features?

#

it will certainly work

#

in fact it should be equivalent to not doing it, now that i think about it

twin fiber May 9, 2021, 10:36 PM

#

not explicitly on categorical variables

desert oar May 9, 2021, 10:36 PM

#

so what exactly is the problem that you are encountering

twin fiber May 9, 2021, 10:36 PM

#

the problem is that my data just seems wrong

desert oar May 9, 2021, 10:36 PM

#

well pd.get_dummies makes no sense to use on numeric features

twin fiber May 9, 2021, 10:36 PM

#

it has split sub categories up

#

and i have so many columns

desert oar May 9, 2021, 10:37 PM

#

what is a subcategory?

#

did you consider using fewer columns?

twin fiber May 9, 2021, 10:37 PM

#

and my WoE numbers are so low they are almost meaningless

#

like

desert oar May 9, 2021, 10:37 PM

#

that's probably because the splits were wrong

twin fiber May 9, 2021, 10:37 PM

#

type of job

#

and within that variable the customers can answer 5 different options

#

so my pd.get_dummies has split that into 5 columns

#

1 for each job

#

and assigned a 1 if it was selected

#

#

this slides along, there are just too many columns

#

and they say to remove any info_values below 0.1

#

#

all of mine are below 0.1 so I know it's wrong

desert oar May 9, 2021, 10:39 PM

#

can you just show the code that you ran

twin fiber May 9, 2021, 10:45 PM

#

which part man

#

my code is pages long

desert oar May 9, 2021, 10:46 PM

#

import pandas as pd
import scorecardpy as sc

df = pd.DataFrame([
  {'gender': 'f', 'weight_lbs': 120, 'is_adult': 1},
  {'gender': 'f', 'weight_lbs': 130, 'is_adult': 1},
  {'gender': 'f', 'weight_lbs': 60, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 50, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 70, 'is_adult': 0},
  {'gender': 'f', 'weight_lbs': 30, 'is_adult': 0},
  {'gender': 'f', 'weight_lbs': 45, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 175, 'is_adult': 1},
  {'gender': 'm', 'weight_lbs': 163, 'is_adult': 1},
])

bins = sc.woebin(
    df,
    y='is_adult',
    x=['weight_lbs', 'gender'],
    breaks_list={'weight_lbs': [100]},
)
df_woe = sc.woebin_ply(df, bins)
print(df_woe)

this works. i had to manually specify a break point for the weight_lbs column, not entirely sure why, maybe not enough data for the tree splitting algorithm. but it works.

twin fiber May 9, 2021, 10:46 PM

#

desert oar May 9, 2021, 10:47 PM

#

ok, none of that looks offensive. pd.get_dummies will ignore numerical columns and only convert the categorical ones, so that should be fine

#

however it looks like you never actually used the z-scores you created

twin fiber May 9, 2021, 10:48 PM

#

hmm

desert oar May 9, 2021, 10:48 PM

#

you'd have to do something like customer_data[zscores.columns] = zscores to replace the numeric columns in the customer data with the z-score versions

twin fiber May 9, 2021, 10:48 PM

#

i guess i didn't see them use it in their code

desert oar May 9, 2021, 10:48 PM

#

(ideally you wouldn't replace them, you'd make new columns, but it sounds like you're already struggling w/ the python basics and at this point you just need to get it working)
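
For reference, the z-score step itself is just "subtract the mean, divide by the standard deviation", applied to one numeric column at a time. A minimal sketch in plain Python with invented income values, keeping the standardised values separate from the originals:

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise a numeric column: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)      # population std; pandas' .std() defaults to the sample std
    return [(v - mu) / sigma for v in values]

income = [20, 30, 40, 50, 60]
income_z = zscores(income)      # stored separately, original values untouched
print(income_z)
```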

#

maybe they use it later in the code?

twin fiber May 9, 2021, 10:49 PM

#

well the first lab was preprocessing

#

it ended with zscore function to normalize the data

#

now the second lab goes straight to WoE

#

and they don't use the zscores created previously

#

they just do the WoE stuff I'm trying to do now

#

but my results are just terrible

desert oar May 9, 2021, 10:50 PM

#

its also not that big a deal if you dont use the z scores

#

what code did you use to compute the WoE scores

#

that is what i am interested in

#

because that is where we might be able to identify the problems and misunderstandings

#

evidently you were never taught what WoE actually is or how to use python. which makes me upset and frustrated on your behalf, but that's not something we can change now.

twin fiber May 9, 2021, 10:51 PM

#

yeah it's been an insane few days

desert oar May 9, 2021, 10:51 PM

#

so i want to at least see specifically what you did that generated the bad output, so i can try to at least patch up your understanding enough to survive your exam

twin fiber May 9, 2021, 10:51 PM

#

the zscore line took me 3.5 hours yesterday

#

to get to work and remove the right columns LOL

#

it's an assignment due tomorrow at 4, luckily no exam

#

but i have so much more to try and learn

desert oar May 9, 2021, 10:51 PM

#

im sorry to hear that. learning is always slow at first, but having poor instruction makes it that much worse.

twin fiber May 9, 2021, 10:52 PM

#

yeah true thanks for your concern haha

#

#

that's what i used to compute the scores

desert oar May 9, 2021, 10:53 PM

#

and how did you compute bins_adj

twin fiber May 9, 2021, 10:55 PM

#

well first it was this

#

#

and then used that other function to manually adjust some of them

#

for the WoE plots, I have quite a few that look like this

#

#

that can't be right

desert oar May 9, 2021, 10:56 PM

#

ok, and how did you compute breaks_adj

velvet thorn May 9, 2021, 10:56 PM

#

oh lord

#

weight of evidence

#

it's been a while 🥴

twin fiber May 9, 2021, 10:56 PM

#

what do you mean?

desert oar May 9, 2021, 10:56 PM

#

it looks to me like this data is highly imbalanced. where you only have a small % of "bad" cases

desert oar May 9, 2021, 10:57 PM

#

twin fiber what do you mean?

when you compute bins_adj there is a variable called breaks_adj. how did you create breaks_adj?

twin fiber May 9, 2021, 10:57 PM

#

i just used the function

#

sc.woebin_adj

#

and it has a little input box to manually play with bins

#

and i just tried different bins to try and get some of the unintuitive plots to be monotonic

#

which is what was advised in the lectures slides

grave frost May 9, 2021, 10:58 PM

#

**My biggest mystery in ML is: **
HOW IN THE HOLY FUCK CAN I GET BETTER ACCURACY IF I TRAIN A MODEL FROM SCRATCH THAN PRE-TRAINING

desert oar May 9, 2021, 10:58 PM

#

grave frost **My biggest mystery in ML is: ** *HOW IN THE HOLY FUCK CAN I GET BETTER ACCURAC...

local minima

twin fiber May 9, 2021, 10:59 PM

#

is this something to worry about

#

grave frost May 9, 2021, 10:59 PM

#

desert oar local minima

but I tried all the LR's 😦

desert oar May 9, 2021, 10:59 PM

#

it means that there is 1 column that has all the same data in every entry

#

no, it is not something to worry about

#

it really would help if you at least listed the steps that you took to get from "a dataframe" to "the final output"

twin fiber May 9, 2021, 11:01 PM

#

um okay

#

i will try

desert oar May 9, 2021, 11:01 PM

#

i imagine you did something like this:

load the data
apply woebin to df and got bins
apply woebin_adj to bins and got breaks_adj
apply woebin to df with breaks_adj and got bins_adj
apply woebin_ply to df with bins_adj to get the final bad output

#

is that close enough?

#

and you are worried because the output seems like it's so low-quality that you think you did something wrong

twin fiber May 9, 2021, 11:02 PM

#

give me a sec i will close frames and then show code

desert oar May 9, 2021, 11:02 PM

#

you can also use our code posting site https://paste.pythondiscord.com to post your code

#

it would be easier to read that way anyway

twin fiber May 9, 2021, 11:02 PM

#

okay

#

how do i copy and paste multiple modules

#

not letting me highlight more than 1

desert oar May 9, 2021, 11:03 PM

#

notebook cells? i dont know if you can in colab. i hate colab.

grave frost May 9, 2021, 11:03 PM

#

@desert oar ok, then how do I do this

One classic technique is to gradually reduce the learning rate, then increase it and slowly draw it down, again, several times.
it just sounds like random Learning rates with extra steps
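
The schedule in that quote (draw the learning rate down, bump it back up, draw it down again) is usually called a cyclical schedule or "warm restarts". A minimal sketch, assuming a cosine shape; `lr_max`, `lr_min` and `cycle_len` are made-up values for illustration, not anything from this conversation:

```python
import math

def cosine_restart_lr(step, lr_max=1e-3, lr_min=1e-5, cycle_len=100):
    """Learning rate at `step` for a cosine schedule that restarts every `cycle_len` steps."""
    # Position inside the current cycle, in [0, 1).
    pos = (step % cycle_len) / cycle_len
    # Cosine decay from lr_max (pos = 0) towards lr_min (pos -> 1), then jump back up.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos))

# Print the rate at a few steps across two cycles.
for step in (0, 50, 99, 100, 150):
    print(step, cosine_restart_lr(step))
```

So it is less "random learning rates" and more a fixed, repeating shape: each restart gives the optimizer a chance to hop out of wherever it settled.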

twin fiber May 9, 2021, 11:03 PM

#

yeah we have to use it

#

it's shit though imo

grave frost May 9, 2021, 11:04 PM

#

you can share notebook in colab to work on it in real time

desert oar May 9, 2021, 11:04 PM

#

i agree. i like that they make all the libraries and models available and i like that you get free gpu compute, but the interface is hot garbage.

desert oar May 9, 2021, 11:04 PM

#

grave frost you can share notebook in colab to work on it in real time

i can't and i don't want to, it's through their school

twin fiber May 9, 2021, 11:05 PM

#

https://paste.pythondiscord.com/seronokaco.sql

desert oar May 9, 2021, 11:06 PM

#

grave frost @desert oar ok, then how do I do this > One classic technique is to g...

yeah learning rate stuff is kind of black magic. but initialization also matters a lot, it made sense to use imagenet for initialization but i guess if your data is very different then it also makes sense that it wouldnt work

twin fiber May 9, 2021, 11:06 PM

#

the seed is my student number

#

so i've just changed that

desert oar May 9, 2021, 11:06 PM

#

ok, good. i do not want to know that information.

#

thank you for sharing the code

grave frost May 9, 2021, 11:07 PM

#

desert oar yeah learning rate stuff is kind of black magic. but initialization also matters...

no, like I initialized with imagenet, then fitted on big dataset. then again, with no changes, I recompiled my model slightly (increased regularization to prevent overfitting) and re-fitted. well, the accuracy on base dataset and small one is similar 😐

#

~~I have never experienced such shit in my life please send help~~

twin fiber May 9, 2021, 11:08 PM

#

thank you for helping

desert oar May 9, 2021, 11:09 PM

#

so do you think that any of these features should have a strong predictive effect on the target?

#

maybe this data is just crap for this task

twin fiber May 9, 2021, 11:09 PM

#

that's not the problem

#

well

#

maybe

#

but i mean

#

they have this in the code

desert oar May 9, 2021, 11:09 PM

#

if you selected the data yourself then it's very very possible that it's just not a good dataset

twin fiber May 9, 2021, 11:09 PM

#

#

look at my IV values on the right

#

#

so all of my variables have no impact?

#

and 1 has very weak impact

#

i've obviously done something wrong

distant path May 9, 2021, 11:12 PM

#

What level of python do I need to learn opencv?

grave frost May 9, 2021, 11:14 PM

#

Right, so basically I have a NN with an efficientnet + some Conv layers. Initially, I initialize the effnet with imagenet weights and then train from scratch on a pretty big dataset.

Next, I increase the regularization of the whole Net and re-fit to another smaller dataset. so the weights of all layers would be used again. the mystery is that this whole method performs much better than if I froze my base network (effnet) and pre-trained on the small dataset.

I need theories on this. Any idea what could have happened?

desert oar May 9, 2021, 11:14 PM

#

@twin fiber personally i think that IVs are fucked up because you used pd.get_dummies. for a variable like NAME_HOUSING_TYPE, you need to be adding up the IVs across all the individual columns created by pd.get_dummies

#

are you very very certain that they told you to use WoE and IV on the output from pd.get_dummies?
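
For reference, the IV of a categorical variable is one sum over its categories, which may be why scoring each `get_dummies` column on its own looks so weak. A minimal sketch in plain Python, assuming target 1 = "bad" and 0 = "good", WoE = ln(bad share / good share), and a 0.5 floor for empty cells; those conventions and the function name are assumptions, and scorecardpy may differ in the details:

```python
import math
from collections import Counter

def information_value(categories, targets):
    """IV of one categorical feature against a binary target (1 = bad, 0 = good)."""
    bad = Counter(c for c, t in zip(categories, targets) if t == 1)
    good = Counter(c for c, t in zip(categories, targets) if t == 0)
    n_bad, n_good = sum(bad.values()), sum(good.values())
    iv = 0.0
    for cat in set(categories):
        # Share of all bad / good cases that land in this category.
        # The 0.5 floor (an assumption) avoids log(0) for empty cells.
        p_bad = max(bad[cat], 0.5) / n_bad
        p_good = max(good[cat], 0.5) / n_good
        woe = math.log(p_bad / p_good)
        iv += (p_bad - p_good) * woe  # this category's contribution to the total
    return iv

# Toy data: one categorical column kept whole, not split into dummy columns.
housing = ["own"] * 4 + ["rent"] * 4
target = [0, 0, 0, 1, 1, 1, 1, 0]
print(information_value(housing, target))
```

The point is only that the per-category terms are added into a single number for the whole variable.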

distant path May 9, 2021, 11:17 PM

#

distant path What level of python do I need to learn opencv?

Please help

desert oar May 9, 2021, 11:18 PM

#

distant path Please help

what do you mean level? like your personal experience level?

distant path May 9, 2021, 11:18 PM

#

What are the things I should know about

twin fiber May 9, 2021, 11:21 PM

#

umm

desert oar May 9, 2021, 11:22 PM

#

distant path What are the things I should know about

at minimum: functions, modules, reading and writing files, numpy basics, how to use pip

twin fiber May 9, 2021, 11:22 PM

#

maybe not but I don't understand why we would go through a whole preprocessing stage

#

in the previous lab

#

and then ignore it and use fresh data

desert oar May 9, 2021, 11:22 PM

#

are you sure they werent just showing you various ways to process your data

distant path May 9, 2021, 11:22 PM

#

desert oar at minimum: functions, modules, reading and writing files, numpy basics, how to ...

Alright thanks

twin fiber May 9, 2021, 11:23 PM

#

desert oar are you very very certain that they told you to use WoE and IV on the output fro...

actually I don't think i did apply it to the output from pd.get_dummies

desert oar May 9, 2021, 11:23 PM

#

it looks like you did, based on the column names

#

from my messing around with this module, it doesnt look like it "splits" columns for you into separate columns, one per category. however get_dummies definitely does that

twin fiber May 9, 2021, 11:24 PM

#

oh i did yes

desert oar May 9, 2021, 11:24 PM

#

i bet your WoE numbers look fine but your IVs look bad

#

try it without the splits

desert oar May 9, 2021, 11:24 PM

#

grave frost Right, so basically I have a NN with an efficientnet + some Conv layers. Initial...

im not sure i follow the 2 different procedures here

twin fiber May 9, 2021, 11:24 PM

#

what do you mean how do i do that

grave frost May 9, 2021, 11:25 PM

#

desert oar im not sure i follow the 2 different procedures here

they aren't 2 different procedures; first is just training the model again with new data (keeping old weights) and second is the proper pre-training --> freezing base model, then training the F.C on new data

desert oar May 9, 2021, 11:25 PM

#

i have to step away for a while... @twin fiber maybe go take a break and get some fresh air. sounds like you've been at this nonstop for hours.

heady yarrow May 9, 2021, 11:26 PM

#

hello

twin fiber May 9, 2021, 11:26 PM

#

i have been but sadly I don't have time to stop as it's due tomorrow and I have a lot more stuff to do r.e. applying models

heady yarrow May 9, 2021, 11:26 PM

#

twin fiber May 9, 2021, 11:26 PM

#

thanks so much for your time

#

I appreciate you ❤️

heady yarrow May 9, 2021, 11:27 PM

#

I want to get a list of players data here.

grave frost May 9, 2021, 11:27 PM

#

for generating code with AI, why don't we have masked "code modules" with special tokens for variables, chars (language specific subtleties etc.) and apply the masking LM recipe to get a good (kind of advanced) code autocompleter?

heady yarrow May 9, 2021, 11:27 PM

#

what should I do?

#

rigid ledge May 10, 2021, 1:24 AM

#

hey guys, I am training a graph GAN that consumes a lot of RAM. what are the tricks to reduce its usage ?

bronze skiff May 10, 2021, 1:26 AM

#

uh, what part consumes a lot of ram?

#

you can lazily load in data, saving you some memory on the input side... if your backprop graphs are too large to fit in memory, you can do gradient checkpointing

#

if you need to use a large batch size, but not enough space locally, maybe distribute over multiple nodes

#

your graph representation could also be really inefficient-- are you storing entire adjacency matrices, or just lists?

#

lots of things you can try

#

but you should learn how to use a profiler first, to figure out memory usages of each part
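
A small sketch of the "profile first" and "lazy loading" points together, using the standard-library `tracemalloc`. The loader functions and sizes are made up for illustration, and this only tracks Python-side allocations, so GPU memory would need the framework's own tools:

```python
import tracemalloc

def load_all(n):
    """Eager: build every sample up front and keep them all in a list."""
    return [[float(i)] * 100 for i in range(n)]

def load_lazily(n):
    """Lazy: yield one sample at a time so only one is alive at once."""
    for i in range(n):
        yield [float(i)] * 100

def peak_bytes(fn):
    """Run `fn` and return the peak traced Python memory in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

eager_peak = peak_bytes(lambda: sum(len(s) for s in load_all(5_000)))
lazy_peak = peak_bytes(lambda: sum(len(s) for s in load_lazily(5_000)))
print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

Same work in both cases, but the generator version never holds more than one sample, which is the saving on the input side.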

lapis sequoia May 10, 2021, 3:13 AM

#

if i am lacking images on my data set, how good/bad would be using a GAN to generate more images? 🙂

zinc lark May 10, 2021, 3:35 AM

#

do i need cuda installed locally after I build pytorch from source w/ CUDA support? I know that building CUDA-enabled pytorch requires CUDA, but what about running that build?

short inlet May 10, 2021, 5:41 AM

#

distant path What level of python do I need to learn opencv?

After you finish Machine learning

#

Because you will need Train Test Split and everything to run detection

charred skiff May 10, 2021, 6:27 AM

#

Helloooo, a basic question I have, why do we multiply input with weights?

haughty tree May 10, 2021, 7:47 AM

#

https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Learn Python for Data Science, Structures, Algorithms, Interviews

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

#

can i start this course to get knowledge in ml

verbal niche May 10, 2021, 8:47 AM

#

Hey there ! i was looking into the math behind neural nets, can anyone suggest some resources where they explain how they actually work ?

distant path May 10, 2021, 9:44 AM

#

haughty tree can i start this course to get knowledge in ml

I would also like an answer to this question

inland zephyr May 10, 2021, 11:16 AM

#

i need some suggestions about face recognition with a transfer learning method. Let's say i have a hundred unique persons but i only have one image per person as the source. The task is to identify whose face this is. Is it feasible if i use an image generator to make train, test and valid images for each person?

#

since what i have now is one image per person

inland zephyr May 10, 2021, 11:20 AM

#

verbal niche Hey there ! i was looking into the math behind neural nets, can anyone suggest s...

you can read the Deep Learning PDF e-book from Ian Goodfellow, which gives a brief explanation of the math behind the nets

verbal niche May 10, 2021, 11:24 AM

#

inland zephyr you can read deep learning PDF e book from Ian Goodfellow which is give brief e...

Lan goodfellow, will remember that, thanks

inland zephyr May 10, 2021, 11:24 AM

#

Ian Goodfellow
you're welcome

brittle stratus May 10, 2021, 1:13 PM

#

zinc lark do i need cuda installed locally after I build pytorch from source w/ CUDA suppo...

I typically use GPUs via Google Colab notebooks for my grad courses. You need to verify cuda is available and you need to place the features and targets on the selected device. Here is the cuda documentation: https://pytorch.org/docs/stable/cuda.html

brittle stratus May 10, 2021, 1:16 PM

#

zinc lark do i need cuda installed locally after I build pytorch from source w/ CUDA suppo...

The easiest way is to do "torch.cuda.is_available()" for verification or just something like "device = torch.device("cuda" if torch.cuda.is_available() else "cpu")" and then during training and testing you place the features and targets on the device (i.e. a ".to(device)" call on each tensor).

brittle stratus May 10, 2021, 1:18 PM

#

haughty tree https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootca...

I would watch Sentdex machine learning course on YouTube. He walks through all the more basic machine learning models and implements them line by line with clear cut explanations. plus it is free! I used that to get started and it helped build up my foundational knowledge. Link: https://www.youtube.com/playlist?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

YouTube

Machine Learning with Python

#

Has anyone here used Named-Entity-Recognition to categorize streamed data and then perform sentiment analysis with BERT or VADER or some other pretrained model? I have only done sentiment analysis but am trying to also use NER, so I don't have to query my streamed data prior and can just analyze all of it real-time. Any help would be greatly appreciated!

raven knoll May 10, 2021, 1:50 PM

#

I trained a logistic regression model with some scraped data and saved the model. Now im trying to import it and im trying to predict the test data from my school project but I keep getting the message could not convert string to float:

My code looks like this right now.

```
test_data = df["text"]

# loading the vectorizer
vect_name = open("ml-vectorizer/tldf-vectorizer.sav", "rb")
Xtest = vectorizer.transform(test_data)

# loading the model
model_name = open("ml-models/sentiment-model.sav", "rb")
model = joblib.load(model_name)

model.predict(test_data)
```

I have no clue where to look...

neon marsh May 10, 2021, 2:08 PM

#

Does anyone know where I can find a tutorial of how to do this using python and tensorflow?

grim fiber May 10, 2021, 2:21 PM

#

X = teamid_matches.loc[:, ['home_team', 'away_team', 'home_team_cup', 'away_team_cup']]
X = np.array(X).astype('int32')
res_filter_drop = res_filter.drop(['date', 'home_score', 'away_score', 'tournament', 'city', 'country', 'neutral'], 1)
# Append data: simply exchange 'home team name' with 'away team name', 'home team championship' with 'away team
# championship', and replace the result
_X = X.copy()

_X[:, 0] = X[:, 1]
_X[:, 1] = X[:, 0]
_X[:, 2] = X[:, 3]
_X[:, 3] = X[:, 2]

y = res_filter_drop.loc[:, ['Winner']]
y = np.array(y).astype('int32')
y = np.reshape(y, (1, 900))
y = y[0]

_y = y.copy()

for i in range(len(_y)):
    if (_y[i] == 1):
        _y[i] = 2
    elif (_y[i] == 2):
        _y[i] = 1

X = np.concatenate((X, _X), axis=0)

y = np.concatenate((y, _y))

# Shuffle and split test, train
X, y = shuffle(X, y)

#

Guys I dont understand this piece of code right here why did he concatenate the columns for this prediction using svm
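
One possible reading, going by the snippet's own comment: the concatenation is along rows, not columns. Every match is duplicated with home and away swapped and the result flipped (1 <-> 2), so the model sees each fixture from both sides and twice as many samples. A toy sketch with plain lists; the label meaning (1 = home win, 2 = away win, anything else unchanged) is an assumption, the snippet does not say:

```python
def mirror_match(row, label):
    """Return the same match as seen from the other team's side."""
    home, away, home_cup, away_cup = row
    swapped = [away, home, away_cup, home_cup]
    # Swap labels 1 and 2; any other label (e.g. a draw) stays as it is.
    flipped = {1: 2, 2: 1}.get(label, label)
    return swapped, flipped

X = [[10, 20, 1, 0], [30, 40, 0, 1]]
y = [1, 0]

mirrored = [mirror_match(row, label) for row, label in zip(X, y)]
X_aug = X + [m[0] for m in mirrored]  # plays the role of np.concatenate((X, _X), axis=0)
y_aug = y + [m[1] for m in mirrored]  # plays the role of np.concatenate((y, _y))
print(X_aug, y_aug)
```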

haughty tree May 10, 2021, 2:22 PM

#

brittle stratus I would watch Sentdex machine learning course on YouTube. He walks through all t...

he uses some library which i don't even know

grim fiber May 10, 2021, 2:23 PM

#

raven knoll I trained a logistic regression model with some scraped data and saved the model...

read the error message word by word you will find why

thorny kite May 10, 2021, 2:36 PM

#

oh good math

raven knoll May 10, 2021, 2:51 PM

#

grim fiber read the error message word by word you will find why

I am a pendejo, thanks I just noticed I am using the wrong variable

grim fiber May 10, 2021, 2:55 PM

#

yes you used test_data

#

@raven knoll

kindred radish May 10, 2021, 3:50 PM

#

So i've come up with this diagram for a single-layer perceptron network:

#

#

Sigma is the summation function, phi is the activation function, x are the inputs, y are the outputs

#

Would this be the correct diagram for a multi-layer perceptron network with one hidden layer?

#

#

Basically just checking if you have layers of activation functions? Or is it only the output layer that has them?

mint palm May 10, 2021, 3:58 PM

#

Anyone currently having access to the coursera deep learning specialisation, course 2? plz dm... there's a small thing i need help with... please and thank you

rough otter May 10, 2021, 4:21 PM

#

if i have two correlated variables like sqft_living (house sqft) and sqft_above (upstairs sqft), is it necessary or beneficial to get rid of one of the two?

sweet zenith May 10, 2021, 4:42 PM

#

hey guys, which is better tensorflow or pytorch?

merry lintel May 10, 2021, 4:52 PM

#

most likely depends on the use case

austere swift May 10, 2021, 4:52 PM

#

^

#

i personally like pytorch better

#

its more pythonic

#

but for getting started using keras with tensorflow is easier

merry lintel May 10, 2021, 4:53 PM

#

@austere swift

#

i have a question

#

im pretty much a beginner but how did you get started with ai/ml?

#

i know dumb question but i just wonder

austere swift May 10, 2021, 4:54 PM

#

i just started out doing some projects and learning about it

#

you have to have a good math background first though

#

linear algebra and calculus mainly

merry lintel May 10, 2021, 4:54 PM

#

did you study it by yourself?

austere swift May 10, 2021, 4:54 PM

#

yes

#

i made a simple neural network from scratch using only numpy

#

when i first started

atomic tide May 10, 2021, 4:55 PM

#

merry lintel im pretty much a beginner but how did you get started with ai/ml?

What level of education are you at? An online course might be the way to go. There are several excellent free online courses on Edx, Coursera, and MIT OCW.

austere swift May 10, 2021, 4:55 PM

#

that way i can learn more about it

atomic tide May 10, 2021, 4:55 PM

#

I actually started by taking an online course run by Berkeley. It's what got me into computer science and programming.

merry lintel May 10, 2021, 4:56 PM

#

i am currently in 8th grade but i dont live in uk/us so it means a bit different thing

#

oh okay

merry lintel May 10, 2021, 4:56 PM

#

atomic tide I actually started by taking an online course run by Berkeley. It's what got me ...

do you think khan academy is a good resource?

atomic tide May 10, 2021, 4:56 PM

#

Khan Academy is great!

merry lintel May 10, 2021, 4:57 PM

#

agreed

atomic tide May 10, 2021, 4:57 PM

#

I don't think they have much on AI, but their math content is really good.

#

That would be a great way to get your math up to the level required to really understand what you're doing in AI/ML.

merry lintel May 10, 2021, 4:58 PM

#

yeah thats fine because i enjoy maths anyway xd

atomic tide May 10, 2021, 4:58 PM

#

On the practical side, you might want to check out Kaggle: https://www.kaggle.com

Kaggle: Your Machine Learning and Data Science Community

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

merry lintel May 10, 2021, 4:59 PM

#

i mean you can probably do ai without actually understanding the math much but i want to know what am i doing

merry lintel May 10, 2021, 4:59 PM

#

atomic tide On the practical side, you might want to check out Kaggle: https://www.kaggle.co...

oh neat

merry lintel May 10, 2021, 4:59 PM

#

merry lintel oh neat

is it free?

atomic tide May 10, 2021, 5:00 PM

#

merry lintel i mean you can probably do ai without actually understanding the math much but i...

Yeah, I was about to say actually, it would be fine to get started trying out AI even if you don't yet have the math prerequisites. I personally paid far too much attention to the theory of AI relative to the practical side.

merry lintel May 10, 2021, 5:01 PM

#

and one more thing

#

if i want to get started with making a basic neural network from scratch should i use numpy?

atomic tide May 10, 2021, 5:02 PM

#

merry lintel is it free?

I believe Kaggle has free courses. Edx and Coursera courses can usually be audited for free (i.e. you have access to all the materials but don't get a certificate at the end).

atomic tide May 10, 2021, 5:02 PM

#

merry lintel if i want to get started with making a basic neural network from scratch should ...

Indeed 🙂

#

There's one textbook I recommend above all others for AI, and that's Artificial Intelligence: A Modern Approach by Russell and Norvig. It provides a pretty comprehensive overview of the subject.

merry lintel May 10, 2021, 5:04 PM

#

oh okay

#

i will check out all the resources

#

thanks a lot

atomic tide May 10, 2021, 5:04 PM

#

No prob 👍

knotty flume May 10, 2021, 5:14 PM

#

hey guys hows it going

#

how do i increase the bar space in matplotlib in graphs

#

in barchart

#

```
AttributeError: 'Rectangle' object has no property 'rwdith'
```
also got this error

#

TypeError: barh() got multiple values for argument 'width'

neon marsh May 10, 2021, 5:28 PM

#

For anyone who needs to use scipy on Big Sur/Apple Silicon:
https://github.com/scipy/scipy/issues/13102#issuecomment-733988544

grave frost May 10, 2021, 6:15 PM

#

oh god, I joined this Ai server on discord and it's absolute shite. they think AGI is just a task, like classification
You would think an Ai-focused server would be better than a channel in the python server

grave frost May 10, 2021, 6:18 PM

#

austere swift i personally like pytorch better

I am pretty sure half of your messages are promotions for PyTorch. Seriously guys, TF/Keras is not that bad 🙂

austere swift May 10, 2021, 6:18 PM

#

I just said I personally like it more

#

I still use keras a lot lol

grave frost May 10, 2021, 6:28 PM

#

Are you sponsored by FAIR to spread Pytorch?

#

😛

#

oof, I hate all these stupid dependencies

knotty flume May 10, 2021, 6:30 PM

#

Tell me about it

grave frost May 10, 2021, 6:44 PM

#

I just hate numba. it f-ups all my stuff

true crag May 10, 2021, 6:51 PM

#

any Idea?
ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 96 but received input with shape (32, 1)

#

I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)

#

and that one

grave frost May 10, 2021, 6:58 PM

#

what do you think the error means?

limber yew May 10, 2021, 7:12 PM

#

Hello guys!

#

soooo I am like 14 and I want to learn AI

#

can anyone recommend any vids?

#

Anyone?

misty torrent May 10, 2021, 7:51 PM

#

Does the memory / cpu required to drop a column from a dataframe scale with the amount of records in the column?

sly salmon May 10, 2021, 7:51 PM

#

what is a data pipeline? is this similar to a ci/cd pipeline?

uncut orbit May 10, 2021, 7:56 PM

#

limber yew can anyone recommend any vids?

ik a pretty good class this summer

#

its berkeley coding academy

#

its pretty good and helpful

grave frost May 10, 2021, 7:56 PM

#

sly salmon what is a data pipeline? is this similar to a ci/cd pipeline?

what has that got to do with data?

tidal bough May 10, 2021, 7:57 PM

#

misty torrent Does the memory / cpu required to drop a column from a dataframe scale with the ...

Hmm, good question. I'd assume that it's O(1), but best idea is to test it.

misty torrent May 10, 2021, 7:57 PM

#

Wait I'm disabled, why not just... not load useless columns

#

then I don't have to drop it

#

big brain time

sly salmon May 10, 2021, 7:58 PM

#

grave frost what has that got to do with data?

well, I don't know - I'm not sure what a data pipeline is

misty torrent May 10, 2021, 7:58 PM

#

https://aakashgoel12.medium.com/pandas-optimize-memory-and-speed-operation-17d8a66c8be4#3.-Reduce-DataFrame-Loading-Time

also this helped significantly with the source problem, not the main question

Medium

Pandas — Optimize Memory and Speed Operation

Watch Other Interesting Data Science Topics and my other medium articles

rapid ridge May 10, 2021, 8:11 PM

#

I have a pipeline where I need to send data, but should I yield data as a single object or send it in a list of hashes?

limber yew May 10, 2021, 8:24 PM

#

uncut orbit its berkeley coding academy

k thanks!
.

desert oar May 10, 2021, 9:57 PM

#

rapid ridge I have a pipeline where I need to send data, but should I yield data as a single...

Can you elaborate

rapid ridge May 10, 2021, 10:00 PM

#

desert oar Can you elaborate

how do I properly build the following pipeline where all my data sources shares the same structure ?? https://dpaste.org/knNX#L1,15 the example: https://avleonov.com/wp-content/uploads/2017/05/home-grown_vulnerability_database.png

I currently need something like a chat where a user sends a message, then the other party receives that data and saves it
my approach

celery for task queue (2 tasks, one every 24 hours and another every hour)
node manager that acts as a chat to stream data
scrapers streaming data to node manager
node manager saves data < I am not sure if a list of dicts or single json object = many queries to db
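
On the "many queries to db" worry, one option is to have the scrapers send small batches (a list of dicts) and write each batch in a single transaction. A rough sketch with the standard-library `sqlite3` standing in for whatever database is actually used; the `vulns` table and its fields are made up for illustration:

```python
import sqlite3

def save_batch(conn, records):
    """Insert a list of dicts in one transaction instead of one commit per record."""
    with conn:  # commits once on success, rolls back if anything raises
        conn.executemany(
            "INSERT INTO vulns (source, ident, summary) "
            "VALUES (:source, :ident, :summary)",
            records,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vulns (source TEXT, ident TEXT, summary TEXT)")

# Every data source shares the same structure, so one insert statement covers them all.
save_batch(conn, [
    {"source": "feed_a", "ident": "X-1", "summary": "example entry"},
    {"source": "feed_b", "ident": "X-2", "summary": "example entry"},
])
print(conn.execute("SELECT COUNT(*) FROM vulns").fetchone()[0])
```

A single object per message still works with this: the node manager can buffer incoming objects and flush them as a batch.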

true crag May 10, 2021, 11:22 PM

#

I am trying to create a chatbot, I have a simple keras using intents, should I spend time adding information to that json, or try to find an already made one, or should I use a different method?

exotic maple May 11, 2021, 1:14 AM

#

for any of the more experienced guys here, do you think joining a kaggle competition and getting a decent score (even if not top of the leaderboards) would be good to showcase some python / DS proficiency?

sour abyss May 11, 2021, 3:26 AM

#

quick stats-related question - in AP statistics we learn to estimate with confidence with methods that are optimal as long as we are sampling less than 10% of our population. I read somewhere that if we can sample over 10%, then there are more optimal statistical procedures to estimate parameters (a confidence interval, for example). how do these methods differ from the ones used?
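
As far as I know, the main thing that changes past the 10% mark is the finite population correction: the usual standard error assumes near-independent draws, and when the sample is a big share of the population you shrink it by sqrt((N - n) / (N - 1)). A minimal sketch for a proportion, assuming a normal approximation and z = 1.96 for roughly 95% confidence; the function name and the numbers are illustrative only:

```python
import math

def proportion_ci(p_hat, n, N=None, z=1.96):
    """Normal-approximation CI for a proportion; applies the FPC when N is given."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if N is not None:
        # Finite population correction: sampling without replacement from N units.
        se *= math.sqrt((N - n) / (N - 1))
    return p_hat - z * se, p_hat + z * se

print(proportion_ci(0.4, n=400))          # population treated as effectively infinite
print(proportion_ci(0.4, n=400, N=1000))  # 40% of the population sampled
```

The corrected interval is narrower, and it collapses to a point when n equals N, which matches the intuition that a full census has no sampling error.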

ebon geyser May 11, 2021, 6:43 AM

#

Anyone here who knows of a good and free AI module/API? except Chatterbot

celest python May 11, 2021, 6:58 AM

#

Anyone knows the difference of ggplot between python and R?

glacial sparrow May 11, 2021, 7:13 AM

#

do 'interpretable machine learning' and 'explainable artificial intelligence' mean the same thing?

ripe forge May 11, 2021, 8:03 AM

#

Yes, pretty much

uncut barn May 11, 2021, 8:50 AM

#

what are the different types of scenarios of when a NN overfits/underfits?

grave frost May 11, 2021, 9:39 AM

#

exotic maple for any of the more experienced guys here, do you think joining a kaggle competi...

kind of yeah. most people in the industry realize the worth of kaggle and other competitions

grave frost May 11, 2021, 9:40 AM

#

uncut barn what are the different types of scenarios of when a NN overfits/underfits?

low data, low model power respectively

uncut barn May 11, 2021, 9:41 AM

#

What about number of hidden units?

#

Or batch size

grave frost May 11, 2021, 9:42 AM

#

that is model power - the number of parameters

uncut barn May 11, 2021, 9:43 AM

#

Do hyperparms also affect over/underfitting?

grave frost May 11, 2021, 9:45 AM

#

uncut barn Do hyperparms also affect over/underfitting?

ofc

uncut barn May 11, 2021, 9:47 AM

#

grave frost ofc

Ok but would this depend on the data or is there a general explanation for each hyperparam?

grave frost May 11, 2021, 9:48 AM

#

uncut barn Ok but would this depend on the data or is there a general explanation for each ...

I recommend you read up the topic on the net - it's too deep to communicate via chat

#

knowledge about your hyperparameters would also help a lot

uncut barn May 11, 2021, 9:48 AM

#

Kl could you share a useful link

grave frost May 11, 2021, 9:49 AM

#

🤷 I did this a long time ago, so I don't remember any resources. I guess google overfitting and underfitting and read every single article you get your hands on

uncut barn May 11, 2021, 9:49 AM

#

Alright

#

Thanks for ur help for the above

worldly ruin May 11, 2021, 11:56 AM

#

anyone have any luck installing pandas on an m1 mac?

uncut barn May 11, 2021, 12:04 PM

#

when the number of samples is less than the number of features does this mean we are prone to underfitting?

grave frost May 11, 2021, 12:05 PM

#

uncut barn when the number of samples is less than the number of features does this mean we...

not always - it could simply be that most of your features are useless, and only a few are useful that have a linear relationship (which can be easily mapped).

loud kindle May 11, 2021, 12:07 PM

#

what are you trying to overcome exactly?

grave frost May 11, 2021, 12:21 PM

#

~~ google lords, forgive me for sinning,~~ but Is it just me, or is pytorch very easy to ....... debug?

#

previously, I only did tensor manipulation with PT. but now, I am running a pytorch model repo which is basically shit - and I am finding it far easier to edit the files rather than change it on my side; which is smthing I would have never dared to do with TF

uncut barn May 11, 2021, 12:25 PM

#

yes it is @grave frost

true crag May 11, 2021, 12:41 PM

#

true crag I am trying to create a chatbot, I have a simple keras using intents, should I s...

anyone?^

grave frost May 11, 2021, 12:53 PM

#

anyone have any leads on this type of output tensor(-0., device='cuda:0', grad_fn=<NegBackward>) I am unable to locate the reason and the particular code

#

I assume it has something to do with my model gradients, since it seems to be at 0 validation accuracy

#

Hmm...I will try pre-processing my data again

main moat May 11, 2021, 1:09 PM

#

I just want to make sure what I have done so far makes sense, because it looks like I can't use any of the generic test train split method such as from sklearn.

My goal is to detect which user is typing based on samples from a typing test which captures key press and release times. There is a warmup prompt, then a real test. Regardless, all of the data for each user is put in one file, and looks like this:

```
1620322580238912200::1620322580342812900::h
1620322580421799400::1620322580502047400::e
1620322580559144700::1620322580677672600::l
1620322580741669700::1620322580861339900::l
1620322580878105600::1620322580967490900::o
1620322580984171600::1620322581046345000::
1620322581103447000::1620322581181768700::r
1620322582151222600::1620322582230004600::backspace
1620322582215080400::1620322582309995200::w
1620322586805930900::1620322586941929800::o
1620322587061273700::1620322587149182600::r
1620322587125172800::1620322587245511000::l
1620322587285746100::1620322587365764900::d
```
`timeUp` and `timeDown` are in nanoseconds.

#

For each user, I calculate what the text ended up looking like when they were finished with the prompt. I won't go into detail but I'm sure the algorithm that does this works as intended. I just want to give an overview of my whole process.

I put each user's table of data in a pd.DataFrame, and calculate the following:

timeDown of first key to timeUp of same key
timeUp of first key to timeDown of second key
timeUp between two consecutive keys
timeDown between two consecutive keys
count of times key was pressed

I add this info to a group of lists, organized by character typed. For example 'l' in the above example was pressed twice, so the following dictionary entries would be made:

```
'l_d_d_avg': [1620322580741669700 - 1620322580559144700, 1620322580878105600 - 1620322580741669700]
'l_d_u_avg': [1620322580677672600 - 1620322580559144700, 1620322580861339900 - 1620322580741669700]
'l_u_u_avg': [1620322580861339900 - 1620322580677672600, 1620322580967490900 - 1620322580861339900]
'l_count': 2
```
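
A rough sketch of how those per-character entries could be built from the raw rows, in plain Python rather than pandas. It assumes each row is `timeDown::timeUp::key` (inferred from the subtractions in the example) and uses only the first five sample rows; the function names are made up and this is not the original code:

```python
from collections import defaultdict

RAW = """\
1620322580238912200::1620322580342812900::h
1620322580421799400::1620322580502047400::e
1620322580559144700::1620322580677672600::l
1620322580741669700::1620322580861339900::l
1620322580878105600::1620322580967490900::o
"""

def parse(raw):
    """Turn 'down::up::key' lines into (down, up, key) tuples."""
    rows = []
    for line in raw.splitlines():
        down, up, key = line.split("::", 2)
        rows.append((int(down), int(up), key))
    return rows

def key_features(rows):
    """Average the timing intervals per character and count presses."""
    intervals = defaultdict(list)
    counts = defaultdict(int)
    for i, (down, up, key) in enumerate(rows):
        counts[key] += 1
        intervals[f"{key}_d_u_avg"].append(up - down)  # hold time of this key
        if i + 1 < len(rows):
            next_down, next_up, _ = rows[i + 1]
            intervals[f"{key}_d_d_avg"].append(next_down - down)  # press to next press
            intervals[f"{key}_u_u_avg"].append(next_up - up)      # release to next release
    features = {name: sum(v) / len(v) for name, v in intervals.items()}
    features.update({f"{key}_count": c for key, c in counts.items()})
    return features

features = key_features(parse(RAW))
print(features["l_count"], features["l_d_u_avg"], features["l_d_d_avg"])
```

Averaging per character is what keeps the column set the same across users, whatever they typed.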

#

All these calculations are stored as key-value pairs in a dict.
All users will not type exactly the same count of each character, but each dict entry is supposed to be a column. That means different users will give me different number of columns. So after looping through all the data, I replace the lists with average(list) , which greatly reduces the chance of not getting each character's stats. In other words, our attributes are character-focused, not key-event-focused. Any missing columns from this approach are unlikely, but can be imputed.

Now that I've shared all that, I can discuss what I want verification on. I want to make sure I'm splitting the train and test in a logical way. Because I need consecutive rows for my calculations, I can't just randomly split the rows for train and test, because consecutive rows may not be consecutive keystrokes if I do that, and it will throw off my calculations.

#

My Solution:
Currently, I take a percentage of unique, random indices (sampling without replacement) of the input rows before I start getting the character stats. Then while looping, I keep track of the index. If the current index is in that sample of indices, I copy the stats for that iteration to test_dict. If not, I copy them to train_dict. That way, train and test can both have averages from accurate data, but they will be computed from mutually exclusive data points.
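
A minimal sketch of the split described above, assuming each input row has already been reduced to a `(dict_key, value)` pair. The names `split_stats`, `events`, and `test_fraction` are made up for illustration, and the real per-row calculation is not shown.

```python
import random
from collections import defaultdict
from statistics import mean


def split_stats(events, test_fraction=0.2, seed=0):
    """Route each row's stats to train or test, then average per key.

    `events` is a hypothetical list of (dict_key, value) pairs, one per
    input row, e.g. ("l_d_u_avg", 118527900).
    """
    rng = random.Random(seed)
    n_test = int(len(events) * test_fraction)
    # Unique row indices, sampled without replacement.
    test_idx = set(rng.sample(range(len(events)), n_test))

    train, test = defaultdict(list), defaultdict(list)
    for i, (key, value) in enumerate(events):
        (test if i in test_idx else train)[key].append(value)

    # Replace each list with its average so each key becomes one column value.
    return ({k: mean(v) for k, v in train.items()},
            {k: mean(v) for k, v in test.items()})
```

Because the stats are computed per row before routing, the consecutive-keystroke differences stay valid, and the train and test averages come from disjoint rows.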

lapis sequoia May 11, 2021, 1:43 PM

#

sklearn or tensorflow?

neon marsh May 11, 2021, 1:53 PM

#

https://www.youtube.com/watch?v=O1aZmy_YuL4&t=190s does anyone know where I can learn how to recreate this

YouTube

Peter Ma

MixPose

MixPose on NVIDIA Jetson NANO


main moat May 11, 2021, 1:55 PM

#

@lapis sequoia sklearn

ebon geyser May 11, 2021, 2:01 PM

#

https://github.com/topics/gpt2-chatbot

Anyone here who can help me set this up so that it acts like a simple chatbot?


#

Ping me if u do plz

silver widget May 11, 2021, 2:52 PM

#

hi all.. I've been studying hyperparameter tuning. Of course my helpers are YouTube and DataCamp videos.
But I came up with a question:
grid = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=3, n_jobs=2, verbose=2)
That's my code for GridSearchCV.
Should I fit on my whole dataset, or do I still need to train_test_split the data first? If I do need to split it, what does cv=3 do?
Thanks in advence

#

*advance
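
For context on the question above: the usual pattern is to hold out a test set with train_test_split first and call grid.fit only on the training part. cv=3 then splits that training part into 3 folds internally, and every parameter combination is trained on 2 folds and scored on the remaining one, three times over. A rough pure-Python sketch of that fold rotation follows; `k_fold_indices` is a hypothetical helper for illustration, not scikit-learn's implementation (which also stratifies for classifiers).

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(6, k=3):
    print("train on", train_idx, "validate on", val_idx)
```

The held-out test set never enters this loop, so it stays available for one final, unbiased score of the best model.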

void prism May 11, 2021, 3:20 PM

#

Hello everyone, I just found the Python server and I am really excited to see how many channels there are. I was wondering if there is an equivalent server for R. I tried searching for it but couldn't find one. Thanks in advance

light rain May 11, 2021, 4:36 PM

#

{
"hi": "hello"
}
how can i get hi by the value hello in json

#

idk if this is the right channel lmao
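
One possible way to do the lookup asked about above, sketched with the standard `json` module. It assumes the values are unique and hashable; `raw`, `by_value`, and `key` are illustrative names.

```python
import json

raw = '{"hi": "hello"}'
data = json.loads(raw)

# Invert the mapping once: value -> key. Assumes values are unique and hashable.
by_value = {value: key for key, value in data.items()}
print(by_value["hello"])  # hi

# One-off lookup without building the inverted dict; None if nothing matches.
key = next((k for k, v in data.items() if v == "hello"), None)
```

The inverted dict pays off when there are many lookups; the `next(...)` form is enough for a single one.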

sick wedge May 11, 2021, 4:54 PM

#

I just designed my first Neural Network Model with a CNN architecture, it is meant to detect masked faces.

I've just trained it through 10 epochs, but can anyone explain these values to me such as accuracy and val_accuracy? I understand pretty well what they mean but what should I be aiming for with those values?

#

1st epoch

#

10th epoch

#

isn't it bad that the val_accuracy stays at around 78%, whereas the accuracy has increased from 71% to 98%

#

btw I did test the model with 6 images and got 50% accuracy so far

grave frost May 11, 2021, 4:56 PM

#

your model is overfitting. you should aim for the highest val_accuracy and your accuracy should be around 4-6% your val accuracy

sick wedge May 11, 2021, 4:56 PM

#

okay, I heard that term before, but what does it mean 😅

#

it's when it fits the data well but can't make good predictions with new data right?

grave frost May 11, 2021, 4:57 PM

#

it means that your model is basically "memorizing" your input data

#

which is not good, we want it to find the pattern

sick wedge May 11, 2021, 4:57 PM

#

also did you mean it should be around + or - 4-6%, e.g. if i have 70% val accuracy i should have 65-75 accuracy

grave frost May 11, 2021, 4:58 PM

#

being within a radius of 4-6% should apply in a lot of cases

#

± 4-6%

sick wedge May 11, 2021, 4:58 PM

#

okay

#

I almost thought u meant accuracy should be 4-6% of 70

sick wedge May 11, 2021, 4:58 PM

#

grave frost which is not good, we want it to find the pattern

so what's the best way to fix this? Better training data or improving the model?

grave frost May 11, 2021, 4:59 PM

#

sick wedge so what's the best way to fix this? Better training data or improving the model?

more data is an option, but it's infinitely easier to reduce the model complexity

sick wedge May 11, 2021, 4:59 PM

#

I think I'll learn more from the latter, so will try that 😄

#

my model's really basic though

#

grave frost May 11, 2021, 5:00 PM

#

sick wedge my model's really basic though

it might be, but it seems to be more than enough for your task

#

or you can add dropout layer

#

you can read up more about it on the net, it's pretty easy to implement

sick wedge May 11, 2021, 5:01 PM

#

okay, you think that might fix it alone? or is that on top of simplifying the model

grave frost May 11, 2021, 5:01 PM

#

it should. if it doesn't, then you can simply increase the dropout rate. that would def do it
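
A rough sketch of what a dropout layer does at training time, assuming the common "inverted dropout" formulation. `dropout` here is an illustrative helper, not a framework API; in Keras the equivalent would be a `tf.keras.layers.Dropout(rate)` layer placed between existing layers, and "more dropout" just means a higher `rate`.

```python
import random


def dropout(activations, rate, rng=random):
    """Inverted dropout for training: zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


# Roughly half of the values are zeroed, the rest are doubled.
print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
```

At prediction time dropout is switched off, so the network sees all activations unscaled.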

sick wedge May 11, 2021, 5:02 PM

#

Okay 👍

#

tbh I thought it was just cuz my training data sucked, so that's good I don't have to find more

grave frost May 11, 2021, 5:03 PM

#

nah, it's fine if the network can overfit. Data scientists often first overfit on the data to see that everything works, then gradually increase the complexity of the model

sick wedge May 11, 2021, 5:04 PM

#

But you said I have to reduce the complexity of the model, why is it different in my case?

grave frost May 11, 2021, 5:04 PM

#

ah, ignore that part

#

I was just describing a debugging flow for Models. you can leave it

sick wedge May 11, 2021, 5:05 PM

#

okay sweet

#

I'll see about getting that dropout layer implemented then, thanks 🙂

ebon geyser May 11, 2021, 6:21 PM

#

Anyways, anyone who used ChatterBot library before?

mint palm May 11, 2021, 6:56 PM

#

#

the earlier algo was trained on regular birds that are often seen

#

so why am i wrong if i marked option 1?

#

is it because i would still have fewer examples

desert oar May 11, 2021, 7:21 PM

#

this is a weird question because presumably data augmentation is part of your model training pipeline, so #2 would imply #1

chilly geyser May 11, 2021, 7:25 PM

#

It's also a weirdly high-level question

mint palm May 11, 2021, 7:42 PM

#

desert oar this is a weird question because presumably data augmentation is part of your mo...

ya but #1 seems correct (only drawback is we're only getting a couple of thousand examples... that's very few)

#

so which would be the best out of them

desert oar May 11, 2021, 7:42 PM

#

The city expects a better system from you within the next 3 months
Better get a telephoto lens and start birdwatching 😛

mint palm May 11, 2021, 7:43 PM

#

haha

desert oar May 11, 2021, 7:43 PM

#

The question says "which should you do first" (emphasis mine), which maybe helps narrow it down. What is option 3? is there an option 4+?

#

I wonder if option 2 is inherently wrong because you should be putting 800 of the images into train and 200 into test

mint palm May 11, 2021, 7:44 PM

#

4

desert oar May 11, 2021, 7:44 PM

#

yep, that. that's the correct answer

#

that's what i would personally want to do first

#

then you can answer the question of "how bad is our model at predicting these birds if we only have 1000 of them?"

mint palm May 11, 2021, 7:45 PM

#

2 are ruled out

desert oar May 11, 2021, 7:45 PM

#

what's the full statement of #3

#

also i assume this is a graded assignment from the past and you arent sneaking around rule 5 🙂

mint palm May 11, 2021, 7:46 PM

#

oh sorrrrrryyyy

#

mint palm May 11, 2021, 7:46 PM

#

desert oar also i assume this is a graded assignment from the past and you arent sneaking a...

mint palm May 11, 2021, 7:47 PM

#

desert oar also i assume this is a graded assignment from the past and you arent sneaking a...

this is true ya

desert oar May 11, 2021, 7:47 PM

#

yeah pedantically the right answer is #3 - make sure your evaluation metric takes the new bird species into account, before trying to actually do anything with the new bird data

#

its a weird question though

mint palm May 11, 2021, 7:48 PM

#

what is the main reason to rule out # 1

desert oar May 11, 2021, 7:48 PM

#

because it's not what you do first

#

it's part of the process, but not the first thing you should do

mint palm May 11, 2021, 7:49 PM

#

yeah but is it still acceptable solution later on?

desert oar May 11, 2021, 7:49 PM

#

sure, but that's not what the question is asking

#

(they should have bolded first)

mint palm May 11, 2021, 7:49 PM

#

ok

desert oar May 11, 2021, 7:50 PM

#

this is actually a good question and as you can see it tripped me up too. it almost seems designed for you to get it wrong and then learn from your wrong answer.

#

rather, the question is badly phrased but the intent behind it is good

mint palm May 11, 2021, 7:50 PM

#

so almost all are right but we first try a new eval metric to know if we can tweak the code and work without changing data sets?

mint palm May 11, 2021, 7:50 PM

#

desert oar this is actually a good question and as you can see it tripped me up too. it alm...

ya thanks

desert oar May 11, 2021, 7:51 PM

#

i think the idea is, if you add the new bird species to your evaluation metrics for the current model, you will have 2 things happen:

1. the model will no longer appear to (erroneously) underperform on known birds
2. the model will be total useless shit on the new birds

mint palm May 11, 2021, 7:52 PM

#

yeah

desert oar May 11, 2021, 7:52 PM

#

then you can iterate towards improving 2 while maybe accepting some decrease in 1 (e.g. if the birds are similar or they happen to look similar in pictures)

mint palm May 11, 2021, 7:53 PM

#

so we first check what we can do with what was made earlier, and then work from there

desert oar May 11, 2021, 7:53 PM

#

i think that is the idea, yes. something like test-driven development but for machine learning.

#

presumably also you still care about overall accuracy too, so you might be looking at multiple metrics even if you're only using 1 metric in your cross val loop

mint palm May 11, 2021, 7:54 PM

#

ok got it ....i am quite sure it was beyond syllabus....but andrew did say it will be worth the thinking and experience

mint palm May 11, 2021, 7:55 PM

#

desert oar presumably also you still care about overall accuracy too, so you might be looki...

multiple metric parameter?

#

or literally multiple metrics (won't that create multiple f1 scores)

desert oar May 11, 2021, 7:59 PM

#

yeah maybe you want to at least keep track of accuracy, f1, and brier score
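
A small sketch of tracking those three metrics side by side for a binary classifier, written in plain Python so the definitions are visible. The labels and probabilities are made-up toy values, and the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.3, 0.8, 0.1]        # made-up predicted P(class 1)
y_pred = [int(p >= 0.5) for p in y_prob]  # threshold at 0.5

report = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1(y_true, y_pred),
    "brier": brier(y_true, y_prob),
}
```

Accuracy and F1 only see the thresholded labels, while the Brier score uses the probabilities directly, which is why it is worth keeping next to the other two.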

mint palm May 11, 2021, 8:00 PM

#

👍 brier score is still left to be covered by me

desert oar May 11, 2021, 8:04 PM

#

some references:
https://www.fharrell.com/post/classification/
https://www.fharrell.com/post/class-damage/
https://stats.stackexchange.com/q/312780/36229
https://stats.stackexchange.com/q/359909/36229
https://neptune.ai/blog/brier-score-and-model-calibration
https://machinelearningmastery.com/probability-metrics-for-imbalanced-classification/

mint palm May 11, 2021, 8:04 PM

#

thank you very much

grave frost May 11, 2021, 9:00 PM

#

I think honestly, we can still make an AI that can code very well without reaching the AGI mark

#

since code would just be a set of instructions, we have an easy one-to-many relationship to map code with instructions performed. Once we have this loss metric, then it becomes easier to arrange research in a way to minimize that metric

#

the point won't be to write code fully autonomously, but to understand what a dev has already written and help them write much much more thereby reducing the programmers required, effort and money put in by companies for projects.

#

code is available in plenty, and we already have automatic documenters that can document each function, so there's that

#

I think coding itself is ripe for automation 🤑

granite wolf May 11, 2021, 9:06 PM

#

anyone happen to be good with SQLAlchemy?

exotic maple May 11, 2021, 9:10 PM

#

grave frost I think coding itself is ripe for automation 🤑

ofc it is, there are plenty of no-code tools that can easily challenge or nullify the skills of 50% of devs

#

does it replace all devs? Not even close, just as a "coding AI" wouldnt make all devs obsolete either.

#

but id be surprised if a FAANG wasn't working on something like that

grave frost May 11, 2021, 10:32 PM

#

exotic maple ofc it is, there are plenty of no-code tools that can easily challenge or nullif...

you think too narrowly, my friend

#

I don't mean no-code; I mean generating most of the code

exotic maple May 11, 2021, 10:33 PM

#

grave frost you think too narrowly, my friend

I want my AI powered genetically engineered foxgirls.

#

Is that broad enough? 😳

grave frost May 11, 2021, 10:34 PM

#

mostly, the reason why I think we can achieve automated programming before AGI is that you can't really measure the effectiveness of AGI - it's not a simple task to test and achieve best results on. But for automating coding, we can easily create a loss function. what's left is just to optimize it

#

so it's just the modelling part. and seeing the ingenuity from transformers, it does seem like a possibility

chilly geyser May 11, 2021, 10:35 PM

#

'easy loss function' for coding what

#

What losses would you be considering for that?

#

All losses essentially describe the structure of perfection and I'd say it's quite difficult to describe what is 'perfect code.'

Even if you just want the structure of something 'good enough' - what is 'good enough' code?

grave frost May 11, 2021, 10:37 PM

#

chilly geyser All losses essentially describe the structure of perfection and I'd say it's qui...

something that works well, and a loss that incorporates Big O wouldn't be too bad

#

mostly, something that runs. you guys are missing the point - the aim is not to fully automate the developers but to drastically reduce the number of developers required. this strategy plays out in almost every innovation and there's no reason for it not to work this time too

chilly geyser May 11, 2021, 10:40 PM

#

That's somewhat true, but despite Humans Need Not Apply being released in 2014, I feel like the economic revolution hasn't happened and won't happen soon

grave frost May 11, 2021, 10:40 PM

#

humans always overestimate the future; and CGP Grey is not an AI expert

#

I would be willing to bet my every nickel that it would happen in my lifetime

chilly geyser May 11, 2021, 10:41 PM

#

FWIW, I think a no-licenses-required GPT-3 will transform the whole industry, especially as I think it's very end-user friendly

#

It can do stuff like "give me this website design" or "make sure no contrast decisions are bad"

#

Which is already quite a lot of freelance work I'd say?

grave frost May 11, 2021, 10:42 PM

#

it def would not; it suffers too much from biases. it would require some regulation but yeah, your general hypothesis is correct imo

#

again, you overestimate capabilities of GPT3

#

it's just a giant overfitted model

#

it's not supposed to be for technical CS tasks or medical ones

#

only simple languages - and their usage

chilly geyser May 11, 2021, 10:43 PM

#

I'm not talking about technical CS or medical

#

Simple language and simple work is a lot of work

#

Deploy this webapp, make a website showcasing X

grave frost May 11, 2021, 10:44 PM

#

yeah, GPT3 can do that. give it notes on some news, and you have a news article in the style of an editor

chilly geyser May 11, 2021, 10:44 PM

#

I mean the fact is most work is boring work

#

And GPT3 has the capability to replace that

grave frost May 11, 2021, 10:45 PM

#

yeah, it can. and it has happened

#

a lot of newspapers have these AI-generated low-level articles at the end (for testing)

#

what's your point? it seems you are proving my own point to demonstrate what models are taking over specific industries

chilly geyser May 11, 2021, 10:47 PM

#

Just discussion

grave frost May 11, 2021, 10:50 PM

#

ahh. and consider the "LAnGUaAgEs aRe HaRD" from non-technical people a few years back. they overestimated the complexity of language. while GPT-3 can't write deep books or win a Nobel, it can take over quite a few jobs.

I suspect the same thing is gonna happen. programmers think their coding is a pretty complicated task, but it might later turn out that it wasn't as complicated as they thought - tho by the time most realize, they would be jobless 🤷

#

if companies are paying for 10 experts at $100,000 each, and that product saves them the seats of 5, then they only have to pay like half a million rather than a whole million

chilly geyser May 11, 2021, 10:51 PM

#

What about human problems like https://www.youtube.com/watch?v=y8OnoxKotPQ

YouTube

KRAZAM

Microservices


#

tl;dw, weird deprecation issues, and also possibly weird architected systems

#

Also it's just funny so I like posting lol

grave frost May 11, 2021, 10:53 PM

#

I know next to nothing about web-dev but I really don't see why the same AI can't generate code to interface with microservices??

grave frost May 11, 2021, 10:54 PM

#

chilly geyser tl;dw, weird deprecation issues, and also possibly weird architected systems

again, I said that NOT ALL PROGRAMMERS would be replaced - only the bulk. that's how new products come into markets. but even say, a 20% loss in jobs can drastically decrease your salary

#

you have to compete with 20% more people who would work more at lower salary, so bye-bye money and perks

limpid saddle May 11, 2021, 10:56 PM

#

Hello, I need a bit of help with a kaggle project
First of all, this was supposed to take the Phrases and cut them down into words then put it in the same row, so why is it different?

slate hollow May 11, 2021, 11:12 PM

#

hey

#

import os
import random
import tensorflow as tf


def process_text(text: tf.Tensor, cutoff_len: int = 300, word_len: int = 100, pad: str = 'asdf') -> tf.Tensor:
    print(text, text.numpy())
    initial = tf.strings.lower(tf.strings.substr(text, 0, cutoff_len))
    # strip <br /> tags first; stripping punctuation first would turn them into a stray "br"
    brs_removed = tf.strings.regex_replace(initial, r'<br\s*/?>', ' ')
    no_special = tf.strings.regex_replace(brs_removed, r'[^\w\s*]', '')
    # drop the trailing word because it might be cut off, then cap the word count
    split_up = tf.strings.split(no_special)[:-1][:word_len]
    # tf.pad(split_up, [[0, max(0, word_len - len(split_up))]], "CONSTANT", constant_values=tf.constant(pad))
    return split_up


raw_data = ([[[1, 0], open(f'data/aclImdb/train/neg/{f}', encoding='utf-8').readline().strip()]
             for f in os.listdir('data/aclImdb/train/neg')] +
            [[[0, 1], open(f'data/aclImdb/train/pos/{f}', encoding='utf-8').readline().strip()]
             for f in os.listdir('data/aclImdb/train/pos')])
random.shuffle(raw_data)

values = [d[0] for d in raw_data]
reviews = [d[1] for d in raw_data]
data = tf.data.Dataset.from_tensor_slices((values, reviews))
data = data.map(lambda v, r: (v, process_text(r)))

#

(the /neg and /pos directories are just a bunch of txt files)

#

so it's some simple strings

#

the thing is, i get this weird error: AttributeError: 'Tensor' object has no attribute 'numpy'

#

i googled it, and they said eager execution should solve this, but they also say tf2 (which is the one i'm using) has eager execution enabled by default so

velvet thorn May 11, 2021, 11:41 PM

#

slate hollow i googled it, and they said eager execution should solve this, but they also say...

wrap the map argument in tf.py_function

slate hollow May 11, 2021, 11:41 PM

#

yeah i tried that, but what should i pass to inp and Tout

#

passing a raw tf.Tensor doesn't seem to work
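
A hedged sketch of the `tf.py_function` wrapping suggested above, using two toy in-memory reviews instead of the aclImdb files. `wrapped` is an illustrative name, and the cleaning steps are a simplified stand-in for the original `process_text`. `inp` takes a list of the tensors to pass in, and `Tout` is the dtype of what the Python function returns.

```python
import tensorflow as tf


def process_text(text, cutoff_len=300, word_len=100):
    # Inside tf.py_function this runs eagerly, so text.numpy() is available here.
    initial = tf.strings.lower(tf.strings.substr(text, 0, cutoff_len))
    no_breaks = tf.strings.regex_replace(initial, r'<br\s*/?>', ' ')
    cleaned = tf.strings.regex_replace(no_breaks, r'[^\w\s]', '')
    return tf.strings.split(cleaned)[:-1][:word_len]


def wrapped(label, review):
    # inp: list of tensors handed to the Python function.
    # Tout: dtype of what it returns (a 1-D string tensor here).
    words = tf.py_function(func=process_text, inp=[review], Tout=tf.string)
    words.set_shape([None])  # py_function loses static shape information
    return label, words


values = [[1, 0], [0, 1]]  # toy stand-ins for the labels
reviews = ["Bad movie.<br />Avoid it", "A great movie! Loved it"]
data = tf.data.Dataset.from_tensor_slices((values, reviews)).map(wrapped)
```

The plain `map(lambda ...)` version fails because `map` traces the function in graph mode, where tensors are symbolic and have no `.numpy()`; `tf.py_function` is what brings eager execution back for that one step.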

fiery coyote May 12, 2021, 12:47 AM

#

Hello everyone, everything good?

Could someone guide me to solve a problem?

I have a text (recipes) where I need to identify the words that are the ingredients within it, how can I solve this problem? Could someone guide me?

Thanks!

serene scaffold May 12, 2021, 1:05 AM

#

fiery coyote Hello everyone, everything good? Could someone guide me to solve a problem? I ...

if you didn't already know, the problem you are trying to solve is called "named entity recognition". That might give you an entry point for finding more resources.

fiery coyote May 12, 2021, 1:11 AM

#

serene scaffold if you didn't already know, the problem you are trying to solve is called "named...

Thank you so much!!! I didn't know !! This will help me a lot to start research!

fiery coyote May 12, 2021, 1:12 AM

#

serene scaffold if you didn't already know, the problem you are trying to solve is called "named...

do you recommend any specific reading? Any material or website for this?

serene scaffold May 12, 2021, 1:12 AM

#

fiery coyote do you recommend any specific reading? Any material or website for this?

I'm not sure

sour abyss May 12, 2021, 2:43 AM

#

quick stats-related question - in AP statistics we learn to estimate with confidence with methods that are optimal as long as we are sampling less than 10% of our population. I read somewhere that if we can sample over 10%, then there are more optimal statistical procedures to estimate parameters (a confidence interval, for example). how do these methods differ from the ones used?

desert oar May 12, 2021, 3:02 AM

#

sour abyss quick stats-related question - in AP statistics we learn to estimate with confid...

they are probably referring to the "finite population correction" that can be applied to certain situations
https://stats.stackexchange.com/q/401763/36229
https://web.ma.utexas.edu/users/mks/M358KInstr/TenPctCond.pdf

Cross Validated

10% rule for sample sizes


exotic maple May 12, 2021, 3:42 AM

#

desert oar they are probably referring to the "finite population correction" that can be ap...

whoa holy crap I feel old. I havent seen this correction in nearly a decade hahaha

#

do you remember the criteria for determining finite or infinite populations? I only ever saw it in the context of manufacturing, where populations are always assumed infinite (you don't have a production ceiling)

desert oar May 12, 2021, 3:44 AM

#

the 10% rule mentioned above is as good as any, i think

#

i don't think i've ever had to use the FPC "in anger" although in hindsight i probably should have in some cases

sour abyss May 12, 2021, 4:59 AM

#

desert oar they are probably referring to the "finite population correction" that can be ap...

read the posts, but i'm still a bit confused. does it basically mean that, because we are sampling without replacement, this can be harmful to the independence of each sample?

#

i think that's where the bit about infinity comes in, if N = infinity then this isn't a problem

#

so the finite population correction can take care of this in confidence intervals by multiplying the "original" margin of error by sqrt((N - n)/(N - 1)) i think, right?

exotic maple May 12, 2021, 5:07 AM

#

sour abyss read the posts, but i'm still a bit confused. does it basically mean that, becau...

short story, yeah something like that.
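
To make the correction concrete, here is a minimal sketch for the margin of error of a sample mean, using the usual form of the factor, sqrt((N - n)/(N - 1)). The function name, the z-based interval, and the example numbers are assumptions for illustration.

```python
import math


def margin_of_error(std_dev, n, N=None, z=1.96):
    """z-based margin of error for a sample mean, with optional FPC.

    N is the population size; leave it as None to treat the population
    as effectively infinite.
    """
    moe = z * std_dev / math.sqrt(n)
    if N is not None:
        moe *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return moe


print(margin_of_error(10, 100))         # no correction
print(margin_of_error(10, 100, N=400))  # sampling 25% of the population
```

The factor is close to 1 when n is a small share of N, which is why it is normally ignored under the 10% condition, and it shrinks to 0 when the whole population is sampled.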

olive lichen May 12, 2021, 5:19 AM

#

hey all, I'm working on building a text classifier using nltk and was hoping to get some guidance from someone. is anyone free to talk a bit about it?

weary summit May 12, 2021, 7:45 AM

#

Hello,
I have a 2d numpy array, and I would like to evaluate the following equation.
Is there a numpy-like (vectorized) way to implement this evaluation rather than just iterating over the values?

Bonus question, is there a (vectorized again) way to save each result of the product inside a new 1d numpy array?
Thanks in advance

#

true crag May 12, 2021, 8:44 AM

#

I am using keras, tensorflow to make a simple AI bot, can u recommend any better data processing method?

#

I need to make a simple chatbot that'll interact with the user and keep a conversation going... Is that process ok? Can I achieve that by adding more data to my training session?

iron basalt May 12, 2021, 9:14 AM

#

weary summit

>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([1, 2, 3, 4, 5])
>>> m1 = x[1:] * y[:-1]
>>> m2 = x[:-1] * y[1:]
>>> s = 0.5 * np.sum(m1 - m2)

#

Where s is the final half sum result, m1 is the first elementwise-multiplication and m2 the second.

weary summit May 12, 2021, 9:25 AM

#

iron basalt ```python >>> x = np.array([1, 2, 3, 4, 5]) >>> y = np.array([1, 2, 3, 4, 5]) >>...

Thanks, I think that would probably work.
Though, I was thinking that m1, m2 should look something like:
m1= x[:-1] * y[1:]
m2 = x[1:] * y[:-1]

iron basalt May 12, 2021, 9:33 AM

#

weary summit Thanks, I think that would probably work. Though, I was thinking that m1, m2 sho...

yeah the ranges are flipped for x and y
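
Putting the agreed ordering together, with the per-pair terms kept in their own 1-D array for the bonus question. The equation itself is not visible in this log, so this assumes it is half the sum of x[i]*y[i+1] - x[i+1]*y[i] over consecutive pairs; the sample coordinates are made up.

```python
import numpy as np

x = np.array([0.0, 4.0, 4.0, 0.0])
y = np.array([0.0, 0.0, 3.0, 3.0])

m1 = x[:-1] * y[1:]   # x[i] * y[i+1]
m2 = x[1:] * y[:-1]   # x[i+1] * y[i]
terms = m1 - m2       # one entry per consecutive pair, kept as a 1-D array
s = 0.5 * np.sum(terms)

print(terms, s)
```

Slicing with `[:-1]` and `[1:]` pairs every element with its neighbour without a Python loop, so `terms` has one fewer entry than `x`.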

uncut barn May 12, 2021, 9:44 AM

#

Anyone know why there are 3 conv layers in a row in AlexNet?

bold timber May 12, 2021, 10:01 AM

#

Hi everybody

pallid parcel May 12, 2021, 10:07 AM

#

How do i rename these columns in pandas? Im using pandas.read_json() to read the data.json
{
"c": [120.57, 120.52, 120.54],
"h": [120.75, 120.63, 120.75],
"l": [120.54, 120.46, 120.48],
"o": [120.74, 120.58, 120.53],
"s": "ok",
"t": [1615302420, 1615302480, 1615302540],
"v": [483063, 498590, 516948]
}
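
One way this could look, sketched with `DataFrame.rename`. The long column names are guesses based on common candlestick naming (close, high, low, open, status, timestamp, volume), and the frame is built with `json.loads` plus `pd.DataFrame` here rather than `pandas.read_json`.

```python
import json
import pandas as pd

raw = '''{"c": [120.57, 120.52, 120.54], "h": [120.75, 120.63, 120.75],
          "l": [120.54, 120.46, 120.48], "o": [120.74, 120.58, 120.53],
          "s": "ok", "t": [1615302420, 1615302480, 1615302540],
          "v": [483063, 498590, 516948]}'''

df = pd.DataFrame(json.loads(raw))  # the scalar "s" is repeated on every row
df = df.rename(columns={
    "c": "close", "h": "high", "l": "low", "o": "open",
    "s": "status", "t": "timestamp", "v": "volume",  # assumed meanings
})
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
```

`rename` only touches the keys listed in the mapping, so any column left out keeps its original name.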

bold timber May 12, 2021, 10:11 AM

#

I have a question: How do I know 'spain' is equal to [0.0 0.0 1.0 27.0 48000.0]?

#

Are the binary numbers placed randomly? Please give me a clue. I'm confused and can't understand it

lapis sequoia May 12, 2021, 11:39 AM

#

bold timber I have a question: How do I know 'spain' is equal to [0.0 0.0 1.0 27.0 48000.0]?

so basically OHE (one-hot encoding) transforms your 'spain' into a vector of 0s and 1s

#

one good example would be if we have a list of variables

#

list = ['a', 'b', 'c']```

#

if we pass it through enc it gives us

#

a
will be
[1, 0, 0]

#

it basically takes all your strings and transforms them into a matrix

#

so what in reality you are seeing is this

#

(don't mind this just a line filler)
[0, 0, 1, 27, 48000]
 ^^^^^^^  ^^  ^^^^^
 Spain    Age Salary

#

btw as a starter I'd recommend not using fit_transform as it might confuse you
just use fit and then transform

bold timber May 12, 2021, 12:43 PM

#

lapis sequoia if we pass it through enc it gives us

what is 'enc'?

bold timber May 12, 2021, 1:23 PM

#

lapis sequoia `a` will be [1, 0, 0]

Why do I get two 0s in the first places for 'Spain'?
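
A plain-Python sketch of why the 1 lands in the third slot. `enc` in the messages above presumably refers to an encoder object such as scikit-learn's OneHotEncoder; this helper only imitates the idea. The category list is an assumption (the chat only mentions Spain), and the columns are put in sorted order, which as far as I know is also what OneHotEncoder does by default.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    ordered = sorted(categories)  # fix a column order; sorted alphabetically here
    return [1.0 if c == value else 0.0 for c in ordered]


countries = ["France", "Spain", "Germany"]  # assumed categories, not from the chat
row = one_hot("Spain", countries) + [27.0, 48000.0]
print(row)  # [0.0, 0.0, 1.0, 27.0, 48000.0]
```

So the positions are not random: each category gets its own column, and the two leading 0s are the columns of the other two categories that come before Spain in that order.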

broken quail May 12, 2021, 1:38 PM

#

Hello everyone, I am new to data science (2 months of studying data cleaning and simple machine learning, and a little bit of Streamlit)

Does anyone have a suggestion for a simple portfolio project?

uncut barn May 12, 2021, 2:07 PM

#

In LeNet from 6x14x14 ---> 16x10x10 via convolution, would the total number of parameters be 16x((6x14x14) + 1) where 6x14x14 is the dimension of each kernel?
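
A quick way to check the count, assuming stride 1, no padding, and every output map connected to all 6 input maps. Under those assumptions each kernel spans 6x5x5 rather than 6x14x14, because the kernel size follows from 14 - k + 1 = 10. `conv_params` is an illustrative helper.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights (+ biases) of a fully connected conv layer with square kernels."""
    per_filter = in_channels * kernel_size * kernel_size + (1 if bias else 0)
    return out_channels * per_filter


# 14 -> 10 with stride 1 and no padding means kernel_size = 14 - 10 + 1 = 5.
kernel_size = 14 - 10 + 1
print(conv_params(6, 16, kernel_size))  # 2416
```

That gives 16 x ((6 x 5 x 5) + 1) = 2416. The original LeNet-5 paper connects each of these 16 maps to only a subset of the 6 inputs, so its reported count for this layer is lower (1516); 2416 is what a standard fully connected conv layer in a modern framework would have.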

hoary wigeon May 12, 2021, 2:09 PM

#

I need help

#

asap

#

I got a dataset for an assignment. I don't want you to complete my assignment,
I just want some hints for exploring the dataset

Dataset : HousingPrice

How can i start finding insights ?

#

I have related time with price, Year with price/area
