grave frost Sep 28, 2020, 6:55 PM

#

oh - Dense does segmentation - is that what you mean?

#

"FC layer" is just another term for what TensorFlow/Keras calls a Dense layer

#

oh, I thought Fully connected.

brittle agate Sep 28, 2020, 6:57 PM

#

Fuck.

#

Man sorry.

#

That's my bad.

grave frost Sep 28, 2020, 6:57 PM

#

I don't see the similarity between a conv layer and Dense

brittle agate Sep 28, 2020, 6:58 PM

#

I messed up with R-CNN.

grave frost Sep 28, 2020, 6:58 PM

#

Ah- np

brittle agate Sep 28, 2020, 6:58 PM

#

Sorry that I confused you.

#

;(

grave frost Sep 28, 2020, 6:59 PM

#

np

dire acorn Sep 28, 2020, 6:59 PM

#

Hey guys anyone have time for a quick question?

brittle agate Sep 28, 2020, 6:59 PM

#

Yes.

grave frost Sep 28, 2020, 7:00 PM

#

@modern hatch My only problem with using cross-entropy loss was that the accuracy was 0.0000e+ and the loss was negative. I think that it may be due to some other factor. Could you provide some ideas to fix that?

dire acorn Sep 28, 2020, 7:01 PM

#

I need to turn the numbers generated by LabelEncoder() back into words for readability, and I am not sure how to
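For reference, a minimal sketch of one way to do this, assuming the encoder is scikit-learn's `LabelEncoder` and that the same fitted instance is still available. The label values here are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, used only for illustration.
words = ["cat", "dog", "bird", "dog", "cat"]

encoder = LabelEncoder()
# fit_transform sorts the unique labels, so bird=0, cat=1, dog=2.
codes = encoder.fit_transform(words)

# inverse_transform maps the integer codes back to the original labels.
# It must be called on the same fitted encoder that produced the codes.
decoded = encoder.inverse_transform(codes)

print(list(codes))
print(list(decoded))
```

The fitted labels are also available as `encoder.classes_`, which is handy for labelling plots or report columns.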

brittle agate Sep 28, 2020, 7:01 PM

#

I can't help with it, sorry.

dire acorn Sep 28, 2020, 7:02 PM

#

Oh no!

modern hatch Sep 28, 2020, 7:02 PM

#

@grave frost I wouldn't worry about your loss function but your problem formulation. Why should this work? Come up with a simple, made-up encryption method and see if you can get a decent result on that

grave frost Sep 28, 2020, 7:03 PM

#

@modern hatch Well it is just an experimentation, lemon_swag don't get all hyped up 🙂

#

My problem was regarding accuracy. Even for a random guess, it is not correct

modern hatch Sep 28, 2020, 7:04 PM

#

I'm not, just trying to be constructive

#

I mean, even predicting the plain text from a "cypher" that permutes the input isn't trivial

grave frost Sep 28, 2020, 7:05 PM

#

ofc it isn't

#

That's the whole point of a cypher

#

Hmm... I think there is an error with my validation data generator. Will debug it tomorrow to see if it works.

proven kite Sep 28, 2020, 8:18 PM

#

someone tell me whats wrong with my implementation of data science?

📎 Screen_Shot_2020-09-28_at_3.43.17_PM.png

#

i mean linear regression lmao

serene scaffold Sep 28, 2020, 8:18 PM

#

My AI prof has the formula Sigma[i=1, n](w_i x_i) where w and x are both 1x3 arrays

proven kite Sep 28, 2020, 8:18 PM

#

pls help

serene scaffold Sep 28, 2020, 8:19 PM

#

is that basically the sum of the elements for the inner product of w and x?
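A quick NumPy check of that reading, as a sketch with made-up numbers: summing the elementwise products of `w` and `x` gives the same value as their inner product.

```python
import numpy as np

# Hypothetical weights and inputs, three elements each.
w = np.array([0.5, -1.0, 2.0])
x = np.array([4.0, 3.0, 1.0])

# Sigma over i of w_i * x_i, written out literally.
elementwise_sum = np.sum(w * x)

# The same quantity as an inner (dot) product.
inner = np.dot(w, x)

# If both are stored as 1x3 row vectors, one of them has to be transposed.
row_w = w.reshape(1, 3)
row_x = x.reshape(1, 3)
inner_rows = (row_w @ row_x.T).item()

print(elementwise_sum, inner, inner_rows)
```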

#

@proven kite what is this code doing that isn't what you wanted?

proven kite Sep 28, 2020, 8:21 PM

#

wrong output

#

ill send u a screenshot

#

actually fuck hang on

#

here's the full thing from the jupyter notebook

📎 Screen_Shot_2020-09-28_at_4.22.04_PM.png

ripe forge Sep 28, 2020, 9:14 PM

#

What's the correct formula? As it stands, this is a logical error yes?

#

Tough to answer without knowing what the code is intended to write out.

gentle widget Sep 28, 2020, 10:22 PM

#

.help

solid aurora Sep 29, 2020, 1:34 AM

#

Do y'all know of a TensorFlow quickstart that includes reading images from files (not the tf.data module), performing augmentations, and training a model using them?

#

I just need example code to look at because I've never used the TensorFlow API before

hasty grail Sep 29, 2020, 1:39 AM

#

Is there a particular reason why you are avoiding tf.data?

solid aurora Sep 29, 2020, 2:14 AM

#

@hasty grail as far as I understand, tf.data is sample datasets

#

but maybe I'm misunderstanding?

#

I'm probably thinking of sklearn.datasets

rustic apex Sep 29, 2020, 2:33 AM

#

I created a “task” to create a jupyterlab project and start it, but how can I also have it install NumPy and pandas?

deft harbor Sep 29, 2020, 2:38 AM

#

@solid aurora no

#

@rustic apex how did you install jupyterlab?

rustic apex Sep 29, 2020, 2:41 AM

#

@deft harbor that installed fine. It creates an env and a Jupyterlab project and starts it, but how do I then have it continue and install np and pd?

#

When I run my task for installing both, they do install, but I want everything installed by one command for the whole project

deft harbor Sep 29, 2020, 2:43 AM

#

Did you use anaconda, pip, or something else?

solid aurora Sep 29, 2020, 2:43 AM

#

@deft harbor so I found tf.keras.preprocessing.image_dataset_from_directory, but I'm kinda stuck on tensorflow 2.0.0 which doesn't seem to have that

#

is there a similar way to load images in the TF 2.0.0 api?

rustic apex Sep 29, 2020, 2:45 AM

#

@deft harbor i used pip3

deft harbor Sep 29, 2020, 2:50 AM

#

@rustic apex ```
pip3 install pipenv
mkdir task
cd task
pipenv install jupyterlab pandas numpy
pipenv shell
jupyter lab
```

#

Then just import them as usual inside Jupyter

#

@solid aurora I used tf.data.Dataset.from_tensor_slices last project

rustic apex Sep 29, 2020, 2:52 AM

#

@deft harbor do I need to use pipenv? I haven’t used that command. I use:
python3 -m venv venv
source ./venv/bin/activate

deft harbor Sep 29, 2020, 2:53 AM

#

No, but it's the way I've found to work with projects outside conda

solid aurora Sep 29, 2020, 2:53 AM

#

@deft harbor ok, so I need to load the images and convert to tensors myself

deft harbor Sep 29, 2020, 2:53 AM

#

Makes it easy to deploy to production

solid aurora Sep 29, 2020, 2:53 AM

#

^

#

@rustic apex can't you just do pip3 install pandas numpy tho??

deft harbor Sep 29, 2020, 2:55 AM

#

There is a TensorFlow Hub way of doing it, but I've never tried that. I think it's google just trying to get my data. lemon_thinking

solid aurora Sep 29, 2020, 2:56 AM

#

@deft harbor uh is that method uploading your data to tensorflowhub and then downloading it?

#

ngl that seems roundabout and dumb

#

ugh I think I might as well figure out how to get tensorflow 2.3.0 working

#

all the tensorflow tutorials I find are for 2.3.0

rustic apex Sep 29, 2020, 2:56 AM

#

@solid aurora yes, but I would like that to be installed within the task that creates the project

solid aurora Sep 29, 2020, 2:57 AM

#

wdym "task"??

#

i've never used jupyterlab before, is it a jupyterlab-specific term?

#

jupyter notebooks definitely don't have tasks

rustic apex Sep 29, 2020, 3:01 AM

#

@solid aurora a task is in VSCode, it’s a json file

solid aurora Sep 29, 2020, 3:01 AM

#

ok paste the code for that json file here please

#

also how are jupyterlab and vscode working together, are you using the jupyter notebook view of vscode?

rustic apex Sep 29, 2020, 3:06 AM

#

@solid aurora I’m not by my computer now.... I looked and there’s a “dependsOn” part of the code, and then I guess it links to the task that’s being triggered.

solid aurora Sep 29, 2020, 3:06 AM

#

@rustic apex ok we can't help without seeing the task

#

in future, always post code

hasty grail Sep 29, 2020, 3:50 AM

#

@solid aurora If you're unwilling to upgrade to TF2.3 you can always look at the source code for the function and copy the operations into your code

solid aurora Sep 29, 2020, 3:50 AM

#

Hmm not a bad idea

lapis sequoia Sep 29, 2020, 4:12 AM

#

📎 unknown.png

#

can someone please help me with this

#

the error thrown is : AxisError: axis 2 is out of bounds for array of dimension 2

hasty grail Sep 29, 2020, 4:14 AM

#

what is normal_img?

#

can you print out its shape?

lapis sequoia Sep 29, 2020, 4:14 AM

#

yeah sure its (1125, 1600)

#

is it because of absence of colour channels?

hasty grail Sep 29, 2020, 4:15 AM

#

yeah

#

according to the docs it has to be 3-D

lapis sequoia Sep 29, 2020, 4:15 AM

#

so should I stop using Image data gen?

hasty grail Sep 29, 2020, 4:15 AM

#

you can just add a color channel

lapis sequoia Sep 29, 2020, 4:15 AM

#

how ?

#

by reshaping ?

hasty grail Sep 29, 2020, 4:16 AM

#

yeah

lapis sequoia Sep 29, 2020, 4:16 AM

#

to (1125, 1600,1)?

hasty grail Sep 29, 2020, 4:16 AM

#

or newaxis

#

or expand_dims

#

take your pick

lapis sequoia Sep 29, 2020, 4:16 AM

#

pic?

hasty grail Sep 29, 2020, 4:16 AM

#

reshape, newaxis and expand_dims all work

#

"take your pick" means choose any of them
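A small sketch of the three options, using a zero array as a stand-in for `normal_img` with the shape reported above. All three produce the same `(1125, 1600, 1)` result.

```python
import numpy as np

# Stand-in for normal_img: a grayscale image with no channel axis.
img = np.zeros((1125, 1600), dtype=np.uint8)

# Option 1: reshape to an explicit target shape.
a = img.reshape(1125, 1600, 1)

# Option 2: index with np.newaxis to append a length-1 axis.
b = img[..., np.newaxis]

# Option 3: expand_dims with axis=-1 appends the axis at the end.
c = np.expand_dims(img, axis=-1)

print(a.shape, b.shape, c.shape)
```

`expand_dims` and `newaxis` do not need the height and width spelled out, so they carry over to images of other sizes unchanged.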

lapis sequoia Sep 29, 2020, 4:16 AM

#

oh will try it

#

thanks!!!

hasty grail Sep 29, 2020, 4:17 AM

#

np

lapis sequoia Sep 29, 2020, 4:18 AM

#

should i consider reshaping all the images?

hasty grail Sep 29, 2020, 4:18 AM

#

if you want to use the generator on each image then yeah

#

and you have to add a color channel when feeding it to the model anyway

lapis sequoia Sep 29, 2020, 4:19 AM

#

its 1 right ?

hasty grail Sep 29, 2020, 4:19 AM

#

yes

lapis sequoia Sep 29, 2020, 4:19 AM

#

i thought of doing math to add the color channel 3

#

😅

hasty grail Sep 29, 2020, 4:19 AM

#

you're overcomplicating it 😛

lapis sequoia Sep 29, 2020, 4:19 AM

#

yeah ikr

#

i'm just fed up!

#

thanks for your help today!

hasty grail Sep 29, 2020, 4:20 AM

#

🙂

lapis sequoia Sep 29, 2020, 4:23 AM

#

thanks man it worked! @hasty grail

hasty grail Sep 29, 2020, 4:28 AM

#

np

olive lichen Sep 29, 2020, 5:16 AM

#

hello, first foray into coding. working with nltk. would anyone here be interested in helping me with some homework questions?

velvet thorn Sep 29, 2020, 6:31 AM

#

@olive lichen don't ask to ask, just ask.

olive lichen Sep 29, 2020, 6:50 AM

#

@velvet thorn good to know for next time when i'm not sleepy, thank you

zealous ermine Sep 29, 2020, 7:29 AM

#

Can we ask about gpu requirements for machine learning here?

cedar sky Sep 29, 2020, 7:32 AM

#

@zealous ermine Of course you can

zealous ermine Sep 29, 2020, 7:34 AM

#

Yay 🙂 ok so for machine learning, why is it recommended to have more VRAM? Does it just make stuff faster?

#

Like do I need to store the whole model in vram, or can I store it partially in ram, and it’s just faster if the whole thing is in vram?

odd yoke Sep 29, 2020, 7:39 AM

#

some models take up a lot of VRAM, thus, simply can't run on smaller GPUs

#

You would store the entire model in (V)RAM

#

in fact i don't think i've ever used a model that was below 12GB when used with a batch size of 16/32

#

except toy stuff

#

@zealous ermine, ping, in case you left

zealous ermine Sep 29, 2020, 7:43 AM

#

Sry back (good ping)

#

So I need to store the whole model in vram?

#

I can’t have some of it in vram and the rest in ram?

odd yoke Sep 29, 2020, 7:46 AM

#

You could do that, but that would tank performance

zealous ermine Sep 29, 2020, 7:47 AM

#

Ah :/

#

When u say you’ve never used a model below 12GB, how big are your models on average?

odd yoke Sep 29, 2020, 7:48 AM

#

moving arrays back and forth between the GPU and the CPU is extremely expensive, especially if you do that multiple times per iteration; i wouldn't be surprised if it gave you worse performance than just using the CPU directly
~20GB

#

+- 4 on average i'd say

#

needless to say we don't use GTX cards

zealous ermine Sep 29, 2020, 7:50 AM

#

Would you use a 3090?

odd yoke Sep 29, 2020, 7:50 AM

#

I would probably love to, but I don't make that choice

#

rn we use quadro cards

#

or whatever is available on GCP

zealous ermine Sep 29, 2020, 7:51 AM

#

Quadros are a little expensive, especially since you can’t also use them for gaming 😂

odd yoke Sep 29, 2020, 7:52 AM

#

well you can, but the speed/price ratio is bad

#

if it's for personal use, don't worry about it really, i have a 1650 at home and do just fine

#

admittedly i don't do heavy stuff of course

zealous ermine Sep 29, 2020, 7:53 AM

#

gtx1650?

odd yoke Sep 29, 2020, 7:53 AM

#

yes

#

completely fine for small POCs

zealous ermine Sep 29, 2020, 7:54 AM

#

I’m trying to decide if I go with 3080 which is reasonably priced, or 3090 because i’ll eventually want the vram

#

But sounds like 3080 is fine

odd yoke Sep 29, 2020, 7:55 AM

#

it also depends on what you want to do as well, are you planning on training DL models on video or large text corpora ? or mostly use "traditional" statistical models like regressions, svm, etc ?

zealous ermine Sep 29, 2020, 7:55 AM

#

Kinda wanna do video stuff

#

Which ig needs more

native bay Sep 29, 2020, 8:10 AM

#

Hey guys how can I start with ML and AI

lapis sequoia Sep 29, 2020, 9:19 AM

#

Are you familiar with data analysis?

lapis sequoia Sep 29, 2020, 10:11 AM

#

For those that are complete beginners in Data Science or Machine Learning, i have made a small Youtube channel where i upload weekly videos teaching Data Science:

#

https://www.youtube.com/channel/UCiFF3AvbzLWdRyRnQMEttqw/videos

YouTube

Mazen Ahmed

Weekly videos teaching Data Science.

#

If you find any of my videos to be useful please consider subscribing, that would be a great help!

cedar sky Sep 29, 2020, 11:17 AM

#

Hey guys how can I start with ML and AI
@native bay I think Andrew Ng's ML course is a great way to start

cedar sky Sep 29, 2020, 12:09 PM

#

For those that are complete beginners in Data Science or Machine Learning, i have made a small Youtube channel where i upload weekly videos teaching Data Science:
@lapis sequoia Great man.... Try growing your channel and I have subscribed to it

lapis sequoia Sep 29, 2020, 12:09 PM

#

Will do thanks for the motivation

cedar sky Sep 29, 2020, 12:11 PM

#

I really liked the way you went about the subject.. The presentation was great

#

Shall I post your channel link in another ML community?

lapis sequoia Sep 29, 2020, 12:39 PM

#

@lapis sequoia cool videos

keen root Sep 29, 2020, 2:31 PM

#

Hi, this is probably a stupid question, but can anyone help me understand how this model differs from a simple perceptron?

📎 unknown.png

odd yoke Sep 29, 2020, 2:37 PM

#

a perceptron would only have the last layer

#

the simplest implementation of a perceptron in keras would be `model = Sequential(); model.add(Dense(N))`
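To make the "only the last layer" point concrete, here is a sketch of the classical perceptron in plain NumPy: one weighted sum followed by a step function. The weights, bias, and inputs are hand-picked for illustration and are not taken from the screenshot. Note that a Keras `Dense(N)` layer is linear by default, so it computes the weighted sum without the step.

```python
import numpy as np

def perceptron(x, w, b):
    """Single-layer perceptron: weighted sum followed by a step activation."""
    return (x @ w + b > 0).astype(int)

# Hypothetical inputs: the four corners of the unit square.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights and bias that implement a logical AND.
w = np.array([1.0, 1.0])
b = -1.5

outputs = perceptron(X, w, b)
print(outputs.tolist())
```

A multi-layer model stacks several such weighted sums with non-linear activations in between, which is what lets it represent functions a single perceptron cannot.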

solid aurora Sep 29, 2020, 2:42 PM

#

The importance of VRAM is allowing the GPU to store the training data in VRAM rather than in regular RAM, which has higher latency for the GPU

#

@zealous ermine

odd yoke Sep 29, 2020, 2:43 PM

#

yes it does

solid aurora Sep 29, 2020, 2:43 PM

#

which btw, the model itself is probably quite small

#

ok yea fair enough both need to

#

but like, the data itself is really the limiting factor in training time

odd yoke Sep 29, 2020, 2:44 PM

#

that's because the number of parameters in the model is inherently tied to the size of the input

lapis sequoia Sep 29, 2020, 3:00 PM

#

```
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-67-e45e4d54822b> in <module>()
      1 res = model.fit(image_data_gen.flow(X_train,y_train,batch_size=batch_size),
      2                 validation_data = (X_test,y_test),
----> 3                 epochs=epochs)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  Can not squeeze dim[2], expected a dimension of 1, got 2
     [[node binary_crossentropy/remove_squeezable_dimensions/Squeeze (defined at <ipython-input-67-e45e4d54822b>:3) ]] [Op:__inference_train_function_50516]

Function call stack:
train_function
```

#

can somebody help me with this error ?

#

please?

#

i have used model.add(Flatten()) also to squeeze

#

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 222, 222, 32)      896       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 111, 111, 32)      0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 111, 111, 32)      0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 109, 109, 64)      18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 54, 54, 64)        0         
_________________________________________________________________
dropout_11 (Dropout)         (None, 54, 54, 64)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 52, 52, 128)       73856     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 26, 26, 128)       0         
_________________________________________________________________
dropout_12 (Dropout)         (None, 26, 26, 128)       0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 86528)             0         
_________________________________________________________________
dense_16 (Dense)             (None, 128)               11075712  
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 129       
=================================================================
Total params: 11,169,089
Trainable params: 11,169,089
Non-trainable params: 0
_________________________________________________________________

#

this is my model.summary
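The parameter counts in this summary can be reproduced by hand, which also shows why nearly all of the weights sit in the first Dense layer. This is a sketch that assumes 3x3 kernels with biases (consistent with 224 shrinking to 222 and with the 896 count); the helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

# Assumed 3x3 kernels; channel and unit counts are read off the summary.
layers = {
    "conv2d_3": conv2d_params(3, 3, 3, 32),
    "conv2d_4": conv2d_params(3, 3, 32, 64),
    "conv2d_5": conv2d_params(3, 3, 64, 128),
    "dense_16": dense_params(26 * 26 * 128, 128),  # flattened 26x26x128 input
    "dense_17": dense_params(128, 1),
}

total = sum(layers.values())

# Rough size of the weights alone if stored as float32 (4 bytes each).
float32_mib = total * 4 / 2**20

for name, count in layers.items():
    print(name, count)
print("total", total, "approx MiB", round(float32_mib, 1))
```

The weights themselves are only tens of MiB here; activations, gradients, and the batch of images typically account for much more memory during training.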

#

is the data preprocessed and it looks good?

#

yeah i checked the shape !

#

(3066, 224, 224, 3)

#

x_train's shape

#

(3066, 2, 2)

#

y_train's

#

(3066, 2, 2)?

#

yes

keen root Sep 29, 2020, 3:12 PM

#

Thank you

yes it does
@odd yoke

lapis sequoia Sep 29, 2020, 3:13 PM

#

the output is 2 * 2 ___?

#

which output ?

#

y_train

#

what is the label, an image?

#

with mask and without mask

#

its a classification prob

#

I mean you provide input and what is the output?

#

a class?

#

no when i try to make .fit it gives an error

#

i cant see any output

#

it just says dim error

#

maybe link the notebook?

#

are you using colab?

#

yeah!

#

colab!

#

yeah share the notebook

#

download and share?

arctic wedgeBOT Sep 29, 2020, 3:16 PM

#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia Sep 29, 2020, 3:16 PM

#

📎 unknown.png

#

wot

#

it deleted the notebook link

#

yeah there is it

#

🤦‍♂️

#

https://colab.research.google.com/drive/1IdYLawJmPYHPhSKD3fDiditNWfbUDY1w?usp=sharing

Google Colaboratory

#

what happened ?

#

nothing it didnt allow to upload notebook files

#

which one ? !

#

oh yeah!

#

the server

#

hmm the code looks fine to me GWgoaThinken

#

same here but why its showing the damn error?

#

an you find ?

#

can you ?

#

lemme try, no guarantee tho

#

its ok though thanks a lot !

#

i think there is a prob with y_train right ?

#

i never worked with 3 dims for labels

#

image_data_gen = ImageDataGenerator(rotation_range=20,
                                    height_shift_range=0.2,
                                    width_shift_range=0.2,
                                    shear_range = 0.15,
                                    horizontal_flip = True,
                                    zoom_range =0.15,
                                    fill_mode = 'nearest')

#

something wrong here?

#

image_data_gen = ImageDataGenerator(rotation_range=20,
                                    height_shift_range=0.2,
                                    width_shift_range=0.2,
                                    shear_range = 0.15,
                                    horizontal_flip = True,
                                    zoom_range =0.15,
                                    fill_mode = 'nearest',
                                    class_mode='binary')

#

I added class_mode='binary'

#

yeah!!!

#

god!

#

did it work?

#

no its restaring kernel

#

restarting

#

oh ok

#

i am an idiot!

#

keep forgetting some params!

#

😅

#

TypeError: __init__() got an unexpected keyword argument 'class_mode'

#

bruh imagine remembering them

#

lol yeah

#

wait

#

how?!

#

i just remember giving the same for some other classification prob!

#

that should be in flow_from_directory

cedar sky Sep 29, 2020, 3:29 PM

#

you need class_mode only when you are flowing from directory

lapis sequoia Sep 29, 2020, 3:29 PM

#

oh!

#

i should convert this into test and train flow ?

#

yes since you are reading data from directory right

#

yeah!

#

what!!!

#

this is running!

#

how?!

#

this time this one is running !

#

though i didnt change the code !

#

👀

#

🤣

#

luck?

#

idk lol

#

no just suspicious

#

I am also doing the same thing but classifying dogs and cats :)

#

great i have done that!

#

maybe for 3 times lol

#

also it will take an hour or so to train your model

#

i got 90% acc

#

nice

#

yeah thats why i use google colab

#

I did baseline model which gave 72 %

#

now adding data augmentation

#

try using some other hyper params!

#

baseline model took 30 mins to train for 30 epochs 😢

#

like steps_per_epoch

#

yeah

#

omg!

#

I am actually reading a book

#

yeah same here!

#

fchollet on deep learning

#

sebastian 's

#

mine

#

this one is great!

#

nice pdf?

#

no

#

rough copy

#

nice

#

you ?

#

pdf?

#

yeah

#

those days are over

high ravine Sep 29, 2020, 3:34 PM

#

Can someone help me out with a recursive function in python? I have a hierarchical ruleset stored in an online DB and am able to fetch it into a df; I need help building the recursive logic to categorize a score based on that ruleset

lapis sequoia Sep 29, 2020, 3:34 PM

#

oh great i feel reading book is better to mark

#

everything is digitized

#

yeah

#

lol yeah

#

i am still nervous !

#

running this

#

if epoch 1 runs successfully then its success :)

ripe forge Sep 29, 2020, 3:35 PM

#

what's a hierarchical ruleset?

lapis sequoia Sep 29, 2020, 3:35 PM

#

its almost!

#

great!

#

it gave an error

#

ValueError: logits and labels must have the same shape ((None, 1) vs (None, 2))
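This error says the model emits one value per sample (the `Dense(1)` output) while each label has two values, which is what one-hot encoded labels look like. A minimal sketch of collapsing such labels to a single column follows, assuming the labels are one-hot with shape `(N, 2)`; the sample values and the column order are made up.

```python
import numpy as np

# Hypothetical one-hot labels, shape (N, 2).
# Assumed column order: column 0 = "without mask", column 1 = "with mask".
y_onehot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=np.float32)

# A Dense(1) output with binary cross-entropy expects one value per sample,
# so take the index of the hot column and reshape to (N, 1).
y_binary = np.argmax(y_onehot, axis=1).astype(np.float32).reshape(-1, 1)

print(y_onehot.shape, "->", y_binary.shape)
print(y_binary.ravel().tolist())
```

The other common route is to keep the one-hot labels and change the model instead: a final `Dense(2, activation='softmax')` layer with categorical cross-entropy.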

high ravine Sep 29, 2020, 3:36 PM

#

@ripe forge something where the ranges are defined in a tree structure

lapis sequoia Sep 29, 2020, 3:36 PM

#

yeah about that

#

https://stackoverflow.com/questions/49083984/valueerror-can-not-squeeze-dim1-expected-a-dimension-of-1-got-3-for-sparse

Stack Overflow

ValueError: Can not squeeze dim[1], expected a dimension of 1, got ...

I tried to replace the training and validation data with local images. But when running the training code, it came up with the error :
ValueError: Can not squeeze dim[1], expected a dimension o...

#

there is a similar error

#

yeah i used this a bit!

#

flow from dir!

cedar sky Sep 29, 2020, 3:41 PM

#

@lapis sequoia I think I found the mistake you made

lapis sequoia Sep 29, 2020, 3:42 PM

#

what?!

#

can you please help?

cedar sky Sep 29, 2020, 3:44 PM

#

In the validation data, instead of having validation_data = (X_test,y_test), have a separate ImageDataGenerator instance with no augmentation params and use: instance.flow(X_test, y_test)

lapis sequoia Sep 29, 2020, 3:44 PM

#

oh!

#

i should use the damn image data gen right ?!

cedar sky Sep 29, 2020, 3:45 PM

#

yes

lapis sequoia Sep 29, 2020, 3:45 PM

#

then i should also add batch_size there itself!

cedar sky Sep 29, 2020, 3:45 PM

#

the validation data should be of similar type to train data

lapis sequoia Sep 29, 2020, 3:45 PM

#

yeah i just made my y_train 2 dim

cedar sky Sep 29, 2020, 3:45 PM

#

@lapis sequoia It's not needed it already defaults to 32

lapis sequoia Sep 29, 2020, 3:46 PM

#

oh

cedar sky Sep 29, 2020, 3:46 PM

#

@lapis sequoia ohh

lapis sequoia Sep 29, 2020, 3:46 PM

#

is that fine ?

#

it was 3 before!

#

y_train.shape = 3 dims

#

now 2

#

res = model.fit(image_data_gen.flow(X_train,y_train,batch_size=batch_size),
validation_data = (X_test,y_test),
epochs=epochs)

#

but this also gave an error!

#

I think this type of data preparation is safest to me

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary'
)

#

yeah but i have no train and test data

#

i just used sklearn's train_test_split

lapis sequoia Sep 29, 2020, 4:08 PM

#

how do i load a model which was saved using model.save and not model.save_model ?

#

rip I am training for 100 epochs now

#

will see you one eternity later XD

#

lol!

#

your pc will die eventually

#

so no replies

#

XD

#

nah google colab GWnonAiSmug

#

then thats fine!

#

I need to learn how to train on GPU

#

now its running on colab CPU

#

which is a bit slow

#

same here!

#

i have to learn to use it with gpu

#

omg i got a better dataset!

#

ok time to do hw meanwhile

#

multitasking 💯

#

lol!

#

you the great!

#

you use vs code?!

#

yep to take notes

#

what!

#

I read the pdf and take my notes

#

how?

#

what!

#

🤣

#

I write in markdown

#

you are next generation

#

📎 unknown.png

#

lol no

#

seriously man!

#

It's easy to take notes this way; I can even include code blocks

#

which format?

#

📎 unknown.png

#

like this

#

oh yeah!

#

its not a txt file lol

#

yes a markdown file

#

https://learn-the-web.algonquindesign.ca/topics/markdown-yaml-cheat-sheet/

Learn the Web

Markdown & YAML cheat sheet · Web Dev Topics · Learn the Web

A cheat sheet for understanding and writing in Markdown and YAML.

#

oh great though

grave frost Sep 29, 2020, 5:08 PM

#

I am getting val_accuracy to be 0.000e+0 for my model when using keras and TensorFlow. Though the loss seems to be decreasing. Can anyone provide some pointers on how to fix that?

austere swift Sep 29, 2020, 5:14 PM

#

what does your output look like?

#

like is it one-hot encoded, sparse classes, is it a regression, etc

grave frost Sep 29, 2020, 5:15 PM

#

@austere swift Integers in a giant list. their locations corresponding to the input list. Though I can also convert it to other dtypes

lapis sequoia Sep 29, 2020, 5:31 PM

#

What is the problem description?

grave frost Sep 29, 2020, 5:32 PM

#

What do you mean by a "problem description"? Like should I explain my task I want to perform?

lapis sequoia Sep 29, 2020, 5:33 PM

#

yes

grave frost Sep 29, 2020, 5:33 PM

#

It is basically a sequence2sequence where I have a list of input sequences as numpy array and the same thing with the corresponding outputs.

#

My model consists of mostly Dense and Dropout layers

#

Hmm.. my val_labels look very weird. My val_train looks great and since it was the same function, I assumed that val_labels would be good to go too. Let me debug it first

cold mortar Sep 29, 2020, 8:05 PM

#

subject = input("What is your favourite subject" + name)

#

Can someone tell me how to make it have a space between subject and the name when i print this?
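String concatenation with `+` adds nothing between the two pieces, so the space has to be written into the prompt itself. A small sketch, with a made-up name and a hypothetical helper so the prompt can be checked without typing anything:

```python
def build_prompt(name):
    """Build the question with explicit spaces around the name."""
    return f"What is your favourite subject, {name}? "

name = "Sam"  # hypothetical value; in the real script this comes from earlier input

# Smallest possible fix: put the space inside the string literal.
simple_prompt = "What is your favourite subject " + name

# Same idea with an f-string, plus a trailing space before the user's answer.
prompt = build_prompt(name)

print(simple_prompt)
print(prompt)
# subject = input(prompt)  # uncomment to actually ask the question
```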

worthy olive Sep 29, 2020, 8:22 PM

#

does anyone know if it's possible to extrapolate a surface given a set of 3d points? (using python)

interpolation sounds simple - just curve fitting. but idk how extrapolation works

last peak Sep 29, 2020, 8:23 PM

#

YES

#

hey so just so I am on same page

#

there's all kinds of extrapolation methods right, I can think of very easy ways, but you want some sort of ML or gradient descent algorithm to find your next point

#

have to do more research about that

#

but the way I see it is, with interpolation you are doing a fit in between datapoints, but with extrapolation you are going to do fits outside of the datapoints

worthy olive Sep 29, 2020, 8:27 PM

#

correct

last peak Sep 29, 2020, 8:27 PM

#

you can take a 2d case for an idea and build the intuition for 3d

#

for 2d, given the data points, you'd want to, let's say, use linear regression if the point is close enough

#

or use a higher order regression

#

if that helps

#

but you are generally going to get some y=f(x) some guess function and use that to guess your points outside of dataset

#

so similarly you are going to have to build a z=f(x,y) for 3d, using like gradient descent or umm directional derivatives in x and y or whatever coordinate system u want
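As a sketch of the `z = f(x, y)` idea, the simplest choice is a plane fitted by least squares and then evaluated outside the sampled region. The data below is synthetic, generated from a known plane so the result can be checked; real data would be noisy, and the fit is only as trustworthy as the assumption that the surface stays planar.

```python
import numpy as np

# Synthetic samples taken from the plane z = 2x - y + 3 (illustrative only).
x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 2.0])
z = 2.0 * x - y + 3.0

# Design matrix for z = a*x + b*y + c.
A = np.column_stack([x, y, np.ones_like(x)])

# Least-squares solution for the three coefficients.
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
a, b, c = coeffs

def predict(px, py):
    """Evaluate the fitted plane at any point, inside or outside the data."""
    return a * px + b * py + c

# Extrapolate well outside the sampled square.
z_outside = predict(10.0, -5.0)
print(coeffs, z_outside)
```

Swapping the columns of `A` for higher-order terms such as `x*y` or `x**2` gives the "higher order regression" variant, at the cost of less stable behaviour far from the data.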

worthy olive Sep 29, 2020, 8:28 PM

#

so thinking in 2d - if i have 3 points that form a triangle (and ground truth is a rectangle/plane in 2d), it is unclear that any regression model would guess that the "4th corner" of the rectangle/plane exists

#

like my understanding of splines is that they perform well for interpolation, but break down at the ends of the distribution. and splines are fairly complex models

last peak Sep 29, 2020, 8:29 PM

#

hold on, can you explain what a spline is

worthy olive Sep 29, 2020, 8:29 PM

#

it's like piecewise regression

#

so you divide your x axis (or feature space) into several regions, and fit a function (linear, polynomial etc) within each region

#

then just connect the curves to get your complete curve
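A rough illustration of the piecewise idea and of the behaviour at the ends, using `np.interp`, which connects known points with straight segments. This is plain piecewise-linear interpolation (the simplest spline-like model, not a regression), and the sample points are made up.

```python
import numpy as np

# Hypothetical known points.
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([0.0, 1.0, 4.0])

# Inside the data range: the value lies on the segment between (1, 1) and (2, 4).
inside = np.interp(1.5, xp, fp)

# Outside the data range: np.interp does not extend the last segment,
# it just returns the last known value.
outside = np.interp(5.0, xp, fp)

print(inside, outside)
```

Anything smarter outside the range needs an explicit assumption about how the curve continues, which is the extrapolation problem being discussed.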

last peak Sep 29, 2020, 8:30 PM

#

oh okay i see splines now, i've worked with Bézier curves, a spline is a more general version of that

#

okay sure, so that makes sense

#

this is the idea of higher order regression

worthy olive Sep 29, 2020, 8:31 PM

#

im just not sure how any algorithm can infer that an entire plane exists when i only give you 3 points. if that makes sense

last peak Sep 29, 2020, 8:31 PM

#

you cant in 2d

#

you can only guess a real function

#

what do you mean by plane and rectangle in 2d?

worthy olive Sep 29, 2020, 8:32 PM

#

okay fine - move to higher dimensions

last peak Sep 29, 2020, 8:32 PM

#

that wont have a real function

worthy olive Sep 29, 2020, 8:32 PM

#

if you move to higher dimensions, just imagine a surface

last peak Sep 29, 2020, 8:32 PM

#

okay sure

worthy olive Sep 29, 2020, 8:32 PM

#

rectangle, triangle, plane

#

but if i give you three points. do any algorithms infer the "fourth point"/rectangle?

#

most reasonable guesses would just connect the dots and say "hey you got a triangle bud"

last peak Sep 29, 2020, 8:34 PM

#

yes I see what u are saying, thats an interesting problem, I think this a case you can build up differently

#

do you have data points for multiple rectangles, triangles

worthy olive Sep 29, 2020, 8:35 PM

#

so you're saying if i had more than 3 points, then the other points would suggest the presence of a rectangle or triangle

last peak Sep 29, 2020, 8:35 PM

#

based on that data you can first infer from the points you are given whether it is of class rectangle or triangle, and then see which point would best fit according to the patterns for the 4 points in a rectangle

#

and 3 vertex in triangle

worthy olive Sep 29, 2020, 8:36 PM

#

i mean, but both a triangle and rectangle would fit well?

last peak Sep 29, 2020, 8:36 PM

#

oh nevermind I don't understand the problem, I thought you were saying u were given the vertices

#

yes that is true, you cant say anything if the points are just colinear or something

worthy olive Sep 29, 2020, 8:36 PM

#

well ideally we'd like all the vertices, but we don't have all the data, just some training sample

#

i see

last peak Sep 29, 2020, 8:37 PM

#

you need enough points to see that there is no triangle that can connect them

worthy olive Sep 29, 2020, 8:37 PM

#

i suppose - depending on the algorithm - it might make a generalization. that is, triangle is a simple subset of a rectangle, so the algorithm "assumes" the more general case

last peak Sep 29, 2020, 8:37 PM

#

hmmmm

worthy olive Sep 29, 2020, 8:37 PM

#

i guess im looking for an algorithm that will make assumptions of that sort

last peak Sep 29, 2020, 8:37 PM

#

can we talk about the problem actually

#

how are you getting the data

#

like for example if its rectangle and triangle

#

the probability of the spread of the data is going to be different

#

we are talking about some rectangle and triangle planes right

worthy olive Sep 29, 2020, 8:38 PM

#

thats a very long story - but it's latent codes from a GAN that correspond to training images

#

and the idea is to - in this N dimensional space - use the latent codes to fit a surface that encompasses all the data points and extrapolates to the (most likely) shape of the manifold

last peak Sep 29, 2020, 8:39 PM

#

like if its a triangle and u sample it enough, a bunch of times i mean it should form a triangleish shape

worthy olive Sep 29, 2020, 8:39 PM

#

yeah i understand

#

but theres always the case that we may not have that sufficient quantity of data to make that inference

last peak Sep 29, 2020, 8:39 PM

#

and the area will be around the area of the 2 surrounding curves

#

u can take them as lines and do cross product

olive lichen Sep 29, 2020, 8:40 PM

#

hello, i'm working with nltk, and i've been given this code. I've been asked to determine what's wrong with the code. the goal output is to print the 10 most frequent bigrams (pairs of adjacent words) of a text, omitting bigrams that contain stop words.

📎 Screen20Shot202018-02-1720at2011.png

last peak Sep 29, 2020, 8:40 PM

#

ah nvm that doent help hmm

worthy olive Sep 29, 2020, 8:40 PM

#

therefore, i'd like an algorithm that just assumes "everything that lies on a plane (be it a triangle, hexagon, polygon) is the most general surface possible (rectangle)"

last peak Sep 29, 2020, 8:41 PM

#

hmm

#

what would you want ideally?

#

do you need to classify shape like triangle,rectangle

worthy olive Sep 29, 2020, 8:41 PM

#

well im hoping theres a nonlinear extension to that

#

because the linear case is literally always a plane (rectangle)

last peak Sep 29, 2020, 8:43 PM

#

i am confused about the problem statement now,

#

lets say you are given some random 3d surface

#

finite

worthy olive Sep 29, 2020, 8:44 PM

#

ok

last peak Sep 29, 2020, 8:44 PM

#

you want a function z=f(x,y) that is infinite

#

that would fit your surface right

#

ideal world

worthy olive Sep 29, 2020, 8:44 PM

#

yes, and im realizing right now that you'd never get a triangle - always a rectangle. fuckn ell duh

last peak Sep 29, 2020, 8:45 PM

#

lol

worthy olive Sep 29, 2020, 8:45 PM

#

because if z is a linear surface/hyperplane - it will always be a plane

last peak Sep 29, 2020, 8:45 PM

#

this is true

worthy olive Sep 29, 2020, 8:45 PM

#

so if you fit 3 points to z, it's just gonna be a plane oriented in whatever way

last peak Sep 29, 2020, 8:45 PM

#

yes this is why i was talking about finite shapes

worthy olive Sep 29, 2020, 8:46 PM

#

ah i see

last peak Sep 29, 2020, 8:46 PM

#

if its an infinite plane theres no shape

#

if its finite then you get a probability distribution

#

on how the points will spread out

worthy olive Sep 29, 2020, 8:47 PM

#

right. yeah for some reason i didn't make the infinite vs finite distinction

#

thanks for spit balling lol

last peak Sep 29, 2020, 8:47 PM

#

sure lol!

#

I am in the middle of work with no work to do so

#

ya hah

deft harbor Sep 29, 2020, 8:56 PM

#

If you have a bunch of images, all of different dimensions, and want to find the cosine similarity.. what is the best way to handle the issue of the images being different dimensions and sizes?

#

One idea was to crop to the largest dimensions and then scale if need be, but that seems like it would remove a lot of useful information.

worthy olive Sep 29, 2020, 9:06 PM

#

i mean dot product requires the dimensions to be equal so you can't really compute similarity that way. i think the only way would be to downsample to the smallest resolution then compute similarity
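A rough sketch of the "downsample to the smallest resolution, then compare" suggestion, assuming grayscale images stored as nested lists and a simple nearest-neighbour resize. `resize_nearest` and `image_similarity` are hypothetical helpers; in practice an image library would do the resizing with proper interpolation.

```python
import math


def resize_nearest(img, height, width):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def image_similarity(img_a, img_b):
    """Shrink both images to the smaller height and width, then compare."""
    h = min(len(img_a), len(img_b))
    w = min(len(img_a[0]), len(img_b[0]))
    flat_a = [p for row in resize_nearest(img_a, h, w) for p in row]
    flat_b = [p for row in resize_nearest(img_b, h, w) for p in row]
    return cosine_similarity(flat_a, flat_b)


small = [[1, 2], [3, 4]]
large = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(image_similarity(small, large))
```

Another common route, not discussed above, is to compare fixed-length feature vectors (for example from a pretrained network) instead of raw pixels, which sidesteps the size mismatch entirely.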

deft harbor Sep 29, 2020, 9:07 PM

#

yeah, thats sort of what i was thinking

#

figured I would ask in case someone had a better idea that prevented lost data

worthy olive Sep 29, 2020, 9:08 PM

#

@last peak okay shit actually i can't use an infinite surface 😅 . I need to sample uniformly from this space (can't just assume gaussian and mostly take points from close to the mode), and therefore need some "boundaries".......

#

on one hand, the infinite surface is very helpful because it includes directions that were not already present in the data. on the other hand, i have no idea how to truncate these directions...

last peak Sep 29, 2020, 9:14 PM

#

if its close to a known point

#

then you can take a plane approximation of the surface close to the point

#

and use the equation of that plane ax+by+cz = d, to get values for points (x,y,z)

#

@worthy olive If its a sample of a surface and its close, what id do is pick a direction, lets say the unit vector in x, and do a linear regression line over a small distance there, then repeat it for the unit vector in y. then you have those 2 vectors, one in x and one in y; take the cross product of them to get the normal vector <a,b,c>
then plane: ax+by+cz = d
you can solve for d by using one of those points

only thing is this is no good for points further away unless your data set is already plane like
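The cross-product step can be sketched as follows, under the simplifying assumption that the two tangent directions come straight from three nearby points rather than from local regressions. `cross` and `plane_through` are illustrative names.

```python
def cross(u, v):
    """Cross product of two 3-D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def plane_through(p, q, r):
    """Return (a, b, c, d) with a*x + b*y + c*z = d through points p, q, r."""
    u = tuple(qi - pi for pi, qi in zip(p, q))  # first in-plane direction
    v = tuple(ri - pi for pi, ri in zip(p, r))  # second in-plane direction
    a, b, c = cross(u, v)                       # normal vector <a, b, c>
    if a == 0 and b == 0 and c == 0:
        raise ValueError("points are collinear, so no unique plane exists")
    d = a * p[0] + b * p[1] + c * p[2]          # solve for d using one point
    return a, b, c, d


print(plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1)))
```

The collinear check mirrors the earlier remark that nothing can be said when the points all lie on one line.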

#

you could do some gaussian distribution stuff though

#

like consider the set of planes from repeating this process with other points

#

and pick the most likely or give an array of some likely approximations

worthy olive Sep 29, 2020, 9:31 PM

#

hmmm ok. i'll have to think on it

last peak Sep 29, 2020, 9:31 PM

#

sure lemme know what u come up with, im interested too

#

itd be nice if u find a library that does the spline or some other polynomial curve surfaces to approximate

#

incase ur points are considerably far from sample

#

and ur surface is inherently very curvy

worthy olive Sep 29, 2020, 10:02 PM

#

Ok. So i think i will do something like impose a distribution (maybe nonparametric?) on the surface. In order to include all "corners" that the original data previously didn't include, I will have to select a variance that is sufficiently large. I can possibly base this off of a z-score of the original data

#

So the variance is just some hyperparameter. Make it sufficiently large. Cool. I think the challenge is the distribution. If the curve has many peaks, should you really throw a unimodal distribution over it? Probably not. The distribution should correspond to the surface in some way... just need to figure that out

deft harbor Sep 29, 2020, 11:16 PM

#

This isn't data science in the strictest sense, but say I train a CNN autoencoder on a series of images. I then want to create a web frontend where people can upload their images, which are then run through just the encoder of the pretrained autoencoder. How would I actually go about using that model with a web interface?

#

Would I just create a backend .py file importing the model and weights, then pass the image via post to this script?

wise garden Sep 30, 2020, 12:53 AM

#

Any idea why DataFrame.diff() would add a random column of null values in the middle of my data

stable otter Sep 30, 2020, 12:57 AM

#

anyone know how to save a model in pytorch

velvet thorn Sep 30, 2020, 1:49 AM

#

Any idea why DataFrame.diff() would add a random column of null values in the middle of my data
@wise garden which axis?

wise garden Sep 30, 2020, 1:49 AM

#

1

velvet thorn Sep 30, 2020, 1:50 AM

#

well, I'm not really sure what you were expecting then

#

like

wise garden Sep 30, 2020, 1:51 AM

#

i was expecting that not to happen

#

I passed through axis=1

velvet thorn Sep 30, 2020, 1:51 AM

#

📎 unknown.png

#

there will be one column where the difference cannot be calculated
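A small illustration of where `DataFrame.diff(axis=1)` produces nulls, using made-up data with integer column labels. It shows two general effects: the first column has no left neighbour, and a missing value in the input makes both its own difference and the next column's difference null for that row. It is not a diagnosis of the specific column 31 case.

```python
import numpy as np
import pandas as pd

# Toy frame: column 2 has a missing value in the second row
df = pd.DataFrame({1: [1.0, 2.0], 2: [3.0, np.nan], 3: [6.0, 9.0]})

# Each column minus the column to its left
diffs = df.diff(axis=1)
print(diffs)
```

Checking `df.isna().sum()` before differencing shows which input columns carry missing values.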

wise garden Sep 30, 2020, 1:52 AM

#

wasn't talking about first col I said middle

velvet thorn Sep 30, 2020, 1:52 AM

#

what are your column names

#

and what's the name of the empty column

wise garden Sep 30, 2020, 1:52 AM

#

all col numbered 1 - 40

#

col that's nan is 31

velvet thorn Sep 30, 2020, 1:53 AM

#

hm.

#

what type is your columns index

#

is it RangeIndex?

wise garden Sep 30, 2020, 1:53 AM

#

df.columns = np.arange(1,41)

velvet thorn Sep 30, 2020, 1:54 AM

#

column types?

#

are they all numeric

wise garden Sep 30, 2020, 1:54 AM

#

yea int

#

sorry

#

I have null vals in data

velvet thorn Sep 30, 2020, 1:55 AM

#

oh

#

okay

#

so that's the reason?

#

I was about to ask about integer overflow

#

🥴

#

but yeah I forgot the simplest possible reason

wise garden Sep 30, 2020, 1:56 AM

#

I don't think so, I had someone else run my exact code and no random null column

#

maybe its a version problem I just didn't think that'd be the case

velvet thorn Sep 30, 2020, 1:57 AM

#

hm

#

could be a breaking change from pandas 1.0

wise garden Sep 30, 2020, 1:58 AM

#

wdym

#

nvm, got a work around. thanks for the info tho

velvet thorn Sep 30, 2020, 2:04 AM

#

np

lapis sequoia Sep 30, 2020, 4:05 AM

#

bruh I forgot to save the model that I trained yesterday

#

I didn't download the saved model to my local pc

#

now I need to train it again

#

GWnonexUmaruCry

hasty grail Sep 30, 2020, 4:52 AM

#

Use checkpoints next time

austere swift Sep 30, 2020, 5:20 AM

#

^

#

those saved me quite a few times

#

one time the guy who was fixing my AC flipped the wrong breaker and it shut down my training server lol

lapis sequoia Sep 30, 2020, 5:24 AM

#

yeah I am such a rookie

lapis sequoia Sep 30, 2020, 6:45 AM

#

Hey guys, running python3 -m notebook will open up my jupyter notebook in the browser however when running the standard way jupyter notebook I get zsh: command not found: jupyter.

Is there a solution to this?

📎 Screenshot_2020-09-30_at_08.43.32.png

#

maybe because those modules are not in the PATH?

#

type this
echo $PATH

#

and see if the directory of above package is present inside PATH

#

alexanderberg@Alexanders-MacBook-Pro ~ % echo $PATH
/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

#

hmm it is in the PATH GWgoaThinken

#

hmm it is in the PATH :GWgoaThinken:
@lapis sequoia How do you see that?

#

/Library/Frameworks/Python.framework/Versions/3.8/bin

#

first line in the PATH

#

What does that have to do with Jupyter?

#

to run a command it must be in the PATH

#

the system scans those directories and searches for jupyter in it

#

if there is one it will execute that
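The PATH lookup described above can be checked from Python with the standard library. This is only a sketch; `find_on_path` and `path_entries` are illustrative wrappers. Note that a directory being listed in PATH is not enough on its own: an executable with that exact name has to exist inside one of the listed directories.

```python
import os
import shutil


def find_on_path(command):
    """Return the full path the shell would run for `command`, or None."""
    return shutil.which(command)


def path_entries():
    """List the directories that are searched, in search order."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


for name in ("jupyter", "jupyter-notebook"):
    print(name, "->", find_on_path(name))

for directory in path_entries():
    print("searched:", directory)
```

If `jupyter-notebook` resolves but `jupyter` does not, the `jupyter` launcher script itself is probably missing from that directory, which would be consistent with the symptoms described here.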

#

Yeah I can run python3 -m notebook so it works but it should be just jupyter notebook.

#

yeah ye

#

did you try jupyter-notebook

#

I have just installed python3 from python.org but I haven't set any Path variables in nano or such things. So everything is just standard

#

did you try jupyter-notebook
?

#

?
@lapis sequoia Yes that worked too

#

shouldn't python be set up usually in nano ~/.zshrc

#

Cause when I installed ipykernel when running my first .ipynb it said something about consider putting in PATH

#

or something along those lines.

#

yes you need to do that step

#

see the last section in this page

#

http://pages.cs.wisc.edu/~paris/cs564-f18/material/jupyter_install.html#:~:text="Command jupyter not found"%3A&text=You may need to add,~%2Fbin to your path.

#

since you are in MacOS

#

yes you need to do that step
@lapis sequoia That is obsolete. Python 2.7?

#

I have a mac from this year so i have zsh, does it make a difference?

#

no I mean replace that python version with yours

#

I think the correct way to do is jupyter-notebook

#

I never tried jupyter notebook

#

did you try it before?

#

It is the official way

#

As per the instructions from Jupyter themselves

#

Yeah it worked before I did a factory reset of my laptop

#

Thanks for your help, I will try to solve it somehow

mild topaz Sep 30, 2020, 10:20 AM

#

Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 71, in <module>
    assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong

grave frost Sep 30, 2020, 12:05 PM

#

Quick question - If I convert a list containing some elements along with a csv column for integers into a Pandas DataFrame (something like this :- ['a','b','c','d'],123 would the DF have 5 columns, like this :-

col1 | col2 | col3 | col4 | col5 (numeric type)
'a'  |  'b' |  'c' | 'd'  | 123

?

#

Like would it ignore the [] brackets or would that also form a seperate column? (one for each bracket)

kindred ridge Sep 30, 2020, 1:57 PM

#

Don't think you can convert that list..

lost moth Sep 30, 2020, 2:18 PM

#

I believe each list entry would go into a row rather than a column. This is a good case for a dict comprehension, probably followed by a merge:

df1 = pd.read_csv(path_to_csv)
df2 = pd.DataFrame({f'col{i}': [element] for i, element in enumerate(lst)})
df1.merge(df2)

#

So, I have a dataframe with X and y columns. The X was produced from SKLearn's PCA method, and behaves as expected. y was produced from LabelBinarizer, and also works as expected. I then do the following:

clf = KNeighborsClassifier(n_neighbors=5, weights="distance", n_jobs=-1)
X, y = df["X"].to_numpy(), df["y"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y)                    
clf.fit(X_train, y_train)

This gives me the following traceback:

#

TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cif3r/models/model_eval.py", line 117, in <module>
    main()
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "cif3r/models/model_eval.py", line 111, in main
    clf.fit(X_train, y_train)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 1130, in fit
    X, y = check_X_y(X, y, "csr", multi_output=True)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/sklearn/utils/validation.py", line 747, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/sklearn/utils/validation.py", line 531, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "/home/dal/.cache/pypoetry/virtualenvs/cif3r-fHcryY4V-py3.8/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

#

I've checked to see that X_train and y_train have the same shapes, and they do. Both have the object dtype in the parent dataframe, which I assume shouldn't make any difference with to_numpy() Any ideas what might be wrong here? Let me know if there's any other output/info I can give, and thanks in advance!
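One possible cause, offered as a guess since the dataframe itself is not shown: if every cell of `df["X"]` holds a whole PCA vector, `to_numpy()` returns a 1-D object array whose elements are arrays, and scikit-learn cannot turn that into a 2-D float matrix. The sketch below uses made-up data and stacks the per-row vectors explicitly.

```python
import numpy as np
import pandas as pd

# Toy stand-in: each cell of "X" is itself a small array
df = pd.DataFrame({
    "X": [np.array([0.1, 0.2]), np.array([0.3, 0.4]), np.array([0.5, 0.6])],
    "y": [0, 1, 0],
})

X_obj = df["X"].to_numpy()
print(X_obj.shape, X_obj.dtype)  # 1-D array of objects, not a 2-D matrix

# Stack the per-row vectors into a proper (n_samples, n_features) array
X = np.stack(df["X"].tolist())
y = df["y"].to_numpy()
print(X.shape, X.dtype)
```

If the `y` cells also hold arrays (as LabelBinarizer output can for more than two classes), the same stacking would apply to `y`.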

lapis sequoia Sep 30, 2020, 2:25 PM

#

#data-science-and-ml

proven phoenix Sep 30, 2020, 3:44 PM

#

[technologies: AWS Glue, PySpark, Python3]hello guys, I am trying to figure out how to pass variables to a function I created. This function is called on each record of my glue DynamicFrame (aws wrapper of a spark dataframe) but I can't figure out how to give extra arguments to my function. I need to use map(). I can either use

my dynamic frame directly DynamicFrame (Glue) => https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-map which would be something like my_dynamic_frame.map(replace_null_string)
or Map.apply() => https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-map.html#aws-glue-api-crawler-pyspark-transforms-map which would be Map.apply(frame=my_dynamic_frame, f=replace_null_string)
My function is something like that

# how to pass string_type_columns as parameters with kwargs or something else?
def replace_null_string(rec, string_type_columns???):
    # do something with the rec and the arg string_type_columns
    ...
    return rec

I already have my list prepared, I just want to be able to give it to the apply function and transfer it to my function replace_null_string().
Any idea? Thanks a lot

DynamicFrame Class - AWS Glue

Overview of the AWS Glue DynamicFrame Python class.

Map Class - AWS Glue

The Map transform builds a new DynamicFrame by applying a function to all records in the input DynamicFrame.

proven phoenix Sep 30, 2020, 4:10 PM

#

Nevermind, it is not possible because of the implementation of map() in the dynamicframe

    def map(self, f, preservesPartitioning=False,transformation_ctx = "", info="", stageThreshold=0, totalThreshold=0):
        def wrap_dict_with_dynamic_records(x):
            rec = _create_dynamic_record(x["record"])
            try:
                result_record = _revert_to_dict(f(rec))
                if result_record:
                    x["record"] = result_record
                else:
                    x['isError'] = True
                    x['errorMessage'] = "User-specified function returned None instead of DynamicRecord"
                return x
            except Exception as E:
                x['isError'] = True
                x['errorMessage'] = E.message
                return x
        def func(_, iterator):
            return imap(wrap_dict_with_dynamic_records, iterator)
        return self.mapPartitionsWithIndex(func, preservesPartitioning, transformation_ctx, info, stageThreshold, totalThreshold)

=> result_record = _revert_to_dict(f(rec))
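Since the quoted implementation calls `f(rec)` with a single argument, one workaround worth trying is to bind the extra argument beforehand with `functools.partial` (or a closure), so that the callable handed to `map()` still takes only the record. The sketch below uses plain dicts in place of Glue records, and the body of `replace_null_string` is invented for illustration. Whether Glue ships the bound list to the workers correctly has not been verified here.

```python
from functools import partial


def replace_null_string(rec, string_type_columns):
    """Illustrative body: replace None with '' in the listed columns."""
    for col in string_type_columns:
        if rec.get(col) is None:
            rec[col] = ""
    return rec


string_type_columns = ["name", "city"]

# Bind the extra argument; the result is a one-argument callable f(rec)
f = partial(replace_null_string, string_type_columns=string_type_columns)

# Stand-in for what map() does internally: call f on every record
records = [{"name": None, "city": "Paris", "age": 3}]
mapped = [f(rec) for rec in records]
print(mapped)
```

With this approach the call would look like `my_dynamic_frame.map(f)`; a `lambda rec: replace_null_string(rec, string_type_columns)` is the closure equivalent.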

worn sphinx Sep 30, 2020, 6:57 PM

#

Hi there, i wanna advance my feature engineering skills for the Kaggle's competitions. Are there any good guidelines on this subject?

raw blaze Sep 30, 2020, 7:18 PM

#

@worn sphinx there are literally hundreds, if not thousands, of books regarding machine learning, model building, feature engineering and the likes

#

loads of free ones too. Just google

desert oar Sep 30, 2020, 7:45 PM

#

I'm not aware of any good "general purpose" references in feature engineering

#

There are plenty of scattered recommendations in stats and ML books, and books on more specific topics like dimension reduction

#

As well as domain specific feature engineering for NLP, image processing, etc

chrome orbit Sep 30, 2020, 7:52 PM

#

Hey guys, I'm trying to learn to use plotly and i'm trying to plot 2 datasets... 1 is a bar chart and it has date and time on the x axis... and the second is a horizontal line (using scatter). when i added in the time to the bar chart (was only date before) i cannot see the horizontal lines anymore... i'm thinking its because the axis are all different

#

anyone know how i can fix that?

#

this is from the bar chart

📎 unknown.png

#

📎 unknown.png

#

this is from the horizontal line

last peak Sep 30, 2020, 7:54 PM

#

oh ur dates replaced *the points?

chrome orbit Sep 30, 2020, 7:56 PM

#

i think so

#

is there anyway i can create that similar layout on the xaxis using update_xaxis?

dire acorn Sep 30, 2020, 8:00 PM

#

@chrome orbit is there a specific reason you want to use that lib? there are others that do similar things.

chrome orbit Sep 30, 2020, 8:00 PM

#

plotly?

dire acorn Sep 30, 2020, 8:00 PM

#

yup

chrome orbit Sep 30, 2020, 8:00 PM

#

No i just heard it was good...

#

which would you recommend?

dire acorn Sep 30, 2020, 8:01 PM

#

there is seaborn, pandas

#

something else I can't think of

last peak Sep 30, 2020, 8:01 PM

#

matplotlib

dire acorn Sep 30, 2020, 8:01 PM

#

thats it

#

I like seaborn but to each their own. seaborn has some graphics that just make you look good

chrome orbit Sep 30, 2020, 8:03 PM

#

pandas can blot?

#

plot*

dire acorn Sep 30, 2020, 8:03 PM

#

yup

#

basic plots

chrome orbit Sep 30, 2020, 8:03 PM

#

hmm

#

ok. will take a look

#

i heard mpl is good

#

so plotly is not that good? it is damn confusing to learn thats for sure lol

last peak Sep 30, 2020, 8:05 PM

#

I think its good too

#

So did you want to make a scatter plot?

chrome orbit Sep 30, 2020, 8:07 PM

#

yeah. basically a scatter plot and then horizontal lines in them

#

with x-axis having the time & date similar to the pic above

last peak Sep 30, 2020, 8:08 PM

#

oh so a barplot

#

wait horizontal lines

#

Why are these horizontal lines in a scatter plot

#

import plotly.graph_objects as go

fig = go.Figure(go.Scatter(
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    y = [28.8, 28.5, 37, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
))

fig.update_layout(
    xaxis = dict(
        tickmode = 'linear',
        tick0 = 0.5,
        dtick = 0.75
    )
)

fig.show()

#

You could do something like that if u want ticks

#

from here : https://plotly.com/python/tick-formatting/

Formatting Ticks

How to format axes ticks in Python with Plotly.

#

import plotly.graph_objects as go

fig = go.Figure(go.Scatter(
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    y = [28.8, 28.5, 37, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
))

fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = [1, 3, 5, 7, 9, 11],
        ticktext = ['One', 'Three', 'Five', 'Seven', 'Nine', 'Eleven']
    )
)

fig.show()

#

Or like this where you can replace the ticktext with those dates

#

fig.update_layout(
title = 'Time Series with Custom Date-Time Format',
xaxis_tickformat = '%d %B (%a)<br>%Y'
)

#

you can also use this, it gives you dates, just change the format like you want

#

if you scroll down on that website it also shows u how to define a start and end time and tick sparsity

#

@chrome orbit

dire acorn Sep 30, 2020, 8:27 PM

#

everyone has their own preference 🙂

#

@last peak hey man you got time for a question?

last peak Sep 30, 2020, 8:29 PM

#

hey yeah?

chrome orbit Sep 30, 2020, 8:30 PM

#

ok ill try it out @last peak and let you know

dire acorn Sep 30, 2020, 8:30 PM

#

i have a program where I returned text to the console but now I want to assign it to a dataframe and I am not sure how to do that.

last peak Sep 30, 2020, 8:30 PM

#

@chrome orbit ok cool

#

do u want to make a new df or is there an existing df u want to add to

dire acorn Sep 30, 2020, 8:31 PM

#

whole new dataframe. i extracted the text from pdf files.

#

I think it is an issue with how I am initializing classes.

last peak Sep 30, 2020, 8:32 PM

#

oh

#

What's the format of the text, how do u want to add it to the df

dire acorn Sep 30, 2020, 8:32 PM

#

yea...

#

well the end goal is just to create a one column with all text extracted from pdfs assigned to a data frame.

#

but that may not be the best way to do it

last peak Sep 30, 2020, 8:34 PM

#

if its like word by word u can always just do
pd.Series([w1,w2,w3])

#

and then do pd.DataFrame({col_name : series})

#

or simply take as an array

dire acorn Sep 30, 2020, 8:35 PM

#

darn. I have no idea how may words there are. its like an 800 page pdf haha

last peak Sep 30, 2020, 8:35 PM

#

oh u want the whole txt in just one row then?

dire acorn Sep 30, 2020, 8:35 PM

#

one column

#

each word one row was what i was thinking

last peak Sep 30, 2020, 8:36 PM

#

ok sure

dire acorn Sep 30, 2020, 8:36 PM

#

but if you know of a better way to analyze text let me know 🙂

last peak Sep 30, 2020, 8:36 PM

#

then do
pd.DataFrame({'words' : set(text.split())})

#

oh u want distinct words then

#

are u thinking of counting words

dire acorn Sep 30, 2020, 8:37 PM

#

nah not counting

last peak Sep 30, 2020, 8:37 PM

#

you might have to change that set back into list

dire acorn Sep 30, 2020, 8:37 PM

#

spilt might work

last peak Sep 30, 2020, 8:38 PM

#

ya and use set to get distinct

dire acorn Sep 30, 2020, 8:38 PM

#

is there a good way to save the text printed on the console?

last peak Sep 30, 2020, 8:38 PM

#

you can write to file

#

instead of stdout

dire acorn Sep 30, 2020, 8:38 PM

#

hmmm i think I did that one sec

last peak Sep 30, 2020, 8:39 PM

#

wherever you are calling that print, make it just append to the end of a file

dire acorn Sep 30, 2020, 8:39 PM

#

yea i did use stdout

last peak Sep 30, 2020, 8:39 PM

#

ok so change stdout to some file

#

or you can put it as a variable

#

and then pickle it

#

or just pickle a dataframe

dire acorn Sep 30, 2020, 8:40 PM

#

class Savecsv(Transform):
    def test(text):
        sys.stdout= open("text.csv","w")
        print(text)
        sys.stdout.close()
        return "compeleted"

last peak Sep 30, 2020, 8:40 PM

#

some ppl like to keep everything as objects so

dire acorn Sep 30, 2020, 8:41 PM

#

that's how i have my object set up and I passed the previous class extracting the text through that object.

last peak Sep 30, 2020, 8:41 PM

#

oh okay

#

does it work

dire acorn Sep 30, 2020, 8:41 PM

#

nope haha

last peak Sep 30, 2020, 8:41 PM

#

u see the text there in text.csv

#

with open ('myfile', 'a') as f: f.write ('hi there\n')

#

with open('text.csv','w') as f:
    f.write(text)

dire acorn Sep 30, 2020, 8:46 PM

#

shoot i am not getting it

#

sys.sdout = open("text.csv", "w") as f?

#

then f . write text

last peak Sep 30, 2020, 8:47 PM

#

class Savecsv(Transform):
    def test(text):
        with open('text.csv','w') as f:
            f.write(text)
        return "compeleted"

#

like that
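For reference, a self-contained sketch of the two options mentioned in this exchange: writing the text to a file directly, and capturing whatever a function prints without reassigning `sys.stdout` by hand. The helper names and file names are placeholders, and a temporary directory is used so the example leaves nothing behind in the working folder.

```python
import contextlib
import os
import tempfile


def save_text(text, path):
    """Write text straight to a file; the with-block closes it for us."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def save_printed_output(func, path):
    """Run func() and send everything it prints into the file at path."""
    with open(path, "w", encoding="utf-8") as f, contextlib.redirect_stdout(f):
        func()


out_dir = tempfile.mkdtemp()

text_path = os.path.join(out_dir, "text.csv")
save_text("hello from the pdf", text_path)

log_path = os.path.join(out_dir, "printed.txt")
save_printed_output(lambda: print("hello from print"), log_path)

# stdout is back to normal here, so this line shows up on the console
print("saved to", out_dir)
```

Unlike assigning to `sys.stdout` and closing it afterwards, `redirect_stdout` restores the console automatically when the block ends, so later `print` calls keep working.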

dire acorn Sep 30, 2020, 8:47 PM

#

https://tenor.com/view/alonzo-lerone-oh-iget-it-gif-10735355

Tenor

last peak Sep 30, 2020, 8:48 PM

#

lol

split eagle Sep 30, 2020, 8:59 PM

#

I've been working with a large data frame trying to isolate studies that started before 2018-06-01. After looking through online resources, I've come to this: import datetime
df_start_date = df_studies[(df_studies["start_date"] <= "2018-6-1")] This code did not produce any errors, but I wanted to make sure that the values in "start_date" will be read as dates and not strings. Are there any changes I need to make to this code to ensure that or is it ok as is?

dire acorn Sep 30, 2020, 8:59 PM

#

check the data type with .dtype

last peak Sep 30, 2020, 9:32 PM

#

@split eagle that comparison really works with date objects?

#

just make sure date obj can handle a comparison with a str.. or else you might have to make that date object first

#

df_start_date = df_studies[(df_studies["start_date"] <= pd.Timestamp(2018,6,1)]
try this if the dtype is not right like thisbi pointed out then ud get an error

#

or else ur going to have to do a mapping

#

on that column to change them into date objects

split eagle Sep 30, 2020, 9:40 PM

#

I used the code from your last message and got this error message: File "<ipython-input-8-a5e9b3181afa>", line 18
df_start_date = df_studies[(df_studies["start_date"] <= pd.Timestamp(2018,6,1)]
^
SyntaxError: invalid syntax

#

The syntax error was apparently the last brace, which i don't understand.

last peak Sep 30, 2020, 9:41 PM

#

oh okay can you do a df_studies.head()
and show me the output here

split eagle Sep 30, 2020, 9:41 PM

#

Sure thing. Give me a sec.

last peak Sep 30, 2020, 9:42 PM

#

and u do have import pandas as pd right

split eagle Sep 30, 2020, 9:42 PM

#

yes

#

df_studies.head() gave this error...

#

NameError Traceback (most recent call last)
<ipython-input-9-8de5dcb0e6b0> in <module>
----> 1 df_studies.head()

NameError: name 'df_studies' is not defined

#

hold on, i think i ran things out of order.

last peak Sep 30, 2020, 9:45 PM

#

oh okay also

#

its df_studies.dtypes

#

show me that output too

#

just make sure its a date obj

split eagle Sep 30, 2020, 9:48 PM

#

I got this table

📎 2020-09-30.png

#

which looks right

#

^df_studies.head()

last peak Sep 30, 2020, 9:49 PM

#

okay can you right under that block also do

#

df_studies.dtypes

split eagle Sep 30, 2020, 9:49 PM

#

Yep

#

nct_id object
nlm_download_date_description object
study_first_submitted_date object
results_first_submitted_date object
disposition_first_submitted_date object
...
ipd_url object
plan_to_share_ipd object
plan_to_share_ipd_description object
created_at object
updated_at object
Length: 64, dtype: object

last peak Sep 30, 2020, 9:49 PM

#

okay so they are just object, you have to make that a date obj first then

split eagle Sep 30, 2020, 9:50 PM

#

^df_studies.dtypes. I got this earlier (sorry. I think I forgot to mention that.)

last peak Sep 30, 2020, 9:51 PM

#

so what I suggest is doing a mapping

#

if your string is of the format '2019-08-10'

split eagle Sep 30, 2020, 9:51 PM

#

Do I make an object a date with datetime()?

last peak Sep 30, 2020, 9:52 PM

#

oh umm can u see if datetime can handle string of that format

#

then you can do the same with the other string too instead of pd.Timestamp

#

I was just going to use pd.Timestamp all over again

#

for example you can do:

my_list = list(map(lambda x : pd.Timestamp(int(x.split('-')[0]),int(x.split('-')[1]),int(x.split('-')[2])), df_studies['start_date']))

#

then you can get a list of true and false values to put into the df
like
true_falses = [dt <= pd.Timestamp('2018-06-01') for dt in my_list]
df_start_date = df_studies[true_falses]

#

.

split eagle Sep 30, 2020, 10:05 PM

#

@last peak What about something like this?pd.to_datetime('-', format='%Y%m%d', errors='coerce')
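A hedged sketch of the `pd.to_datetime` route on made-up rows, assuming the `start_date` strings look like `2019-08-10`. `pd.to_datetime` takes the column itself as its first argument, and `errors='coerce'` turns unparseable values into `NaT`, which never passes the comparison. The SyntaxError reported earlier in this exchange appears to come from the `(` opened before `df_studies["start_date"]` never being closed before the final `]`.

```python
import pandas as pd

# Toy data standing in for the real table
df_studies = pd.DataFrame({
    "nct_id": ["A", "B", "C", "D"],
    "start_date": ["2017-03-15", "2018-06-01", "2019-01-20", "not a date"],
})

# Parse the strings into real timestamps; bad values become NaT
start = pd.to_datetime(df_studies["start_date"], errors="coerce")

# Strictly before 1 June 2018; use <= to include that day
mask = start < pd.Timestamp(2018, 6, 1)
df_start_date = df_studies[mask]
print(df_start_date)
```

Comparing the raw strings happens to work only while every value is zero-padded ISO text, so converting first is the safer habit.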

serene scaffold Sep 30, 2020, 10:26 PM

#

I have a dataframe and the third column tells us what class that row represents. And I want to use 75% of the data for training and 25% for test. So once I do df.groupby(2) how can I pull a representative sample from 75% of that? This is for homework but I know how I could do it with random and loops and stuff and I want to learn how to do it in pandas.

charred blaze Sep 30, 2020, 10:34 PM

#

you select randomly N indexes and use those to retrieve the relevant rows

#

no need to use loops

#

Don't quite recall the exact functions to do that in Pandas but it is possible from what I recall.
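One way to do this without loops, assuming pandas 1.1 or newer (where `GroupBy.sample` was added) and using a toy frame whose third column, labelled `2`, holds the class. Sampling 75% inside each group keeps the class proportions, and dropping the sampled index leaves the remaining 25% for testing.

```python
import pandas as pd

# Toy frame: columns 0 and 1 are features, column 2 is the class label
df = pd.DataFrame({
    0: list(range(8)),
    1: [x * 0.5 for x in range(8)],
    2: ["a", "a", "a", "a", "b", "b", "b", "b"],
})

# Draw 75% of the rows from every class separately
train = df.groupby(2).sample(frac=0.75, random_state=0)

# Whatever was not drawn becomes the test set
test = df.drop(train.index)
print(len(train), len(test))
```

scikit-learn's `train_test_split` with its `stratify` argument gives a similar split if using that library is allowed for the homework.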

dire acorn Sep 30, 2020, 11:30 PM

#

@split eagle change datatype like this .astype('datetime64[ns]') for whatever column the date info is in

fading wigeon Sep 30, 2020, 11:47 PM

#

Yay machine learning

#

Also, is it supposed to be this hard to plot graphs inline in jupyter notebooks?

kind granite Sep 30, 2020, 11:48 PM

#

Show your code, gonna be easier

fading wigeon Sep 30, 2020, 11:51 PM

#

I think I figured it out from SO. Apparently you need to have this in the first code cell %matplotlib inline

charred blaze Sep 30, 2020, 11:54 PM

#

what's up man

#

yeah, usually you need that boilerplate on your notebook

fading wigeon Sep 30, 2020, 11:55 PM

#

My first day using notebooks, lol

#

Boss being super anal about me using and loving it

charred blaze Sep 30, 2020, 11:55 PM

#

I find them overrated personally... I only use them when necessary, basically for plotting some data and some explanations about some results

#

and it's typically a good idea to not have all your code inside a notebook, some of it should be outside of it in a src folder or something

fading wigeon Sep 30, 2020, 11:57 PM

#

Hmm

#

Good idea

charred blaze Sep 30, 2020, 11:58 PM

#

but if you're doing a lot of data analysis work... yeah, you'll be living inside notebooks.

fading wigeon Sep 30, 2020, 11:58 PM

#

Is there an easy way to export code from a notebook to a file?

dire acorn Sep 30, 2020, 11:58 PM

#

@fading wigeon it depends what you do with them. for my job we put them everywhere. I meant to ask you what error did you get from Jupyter

charred blaze Sep 30, 2020, 11:58 PM

#

not to mention that versioning Jupyter Notebooks... is quite bothersome.

fading wigeon Sep 30, 2020, 11:59 PM

#

I've gotten two errors from Jupyter thus far. At first I just got an error message that only said error and then it just says In [*] which I guess means the program was stuck doing things

charred blaze Sep 30, 2020, 11:59 PM

#

https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit -> An interesting presentation from 2018 which made the rounds back then and brings up some pitfalls when using notebooks

Google Docs

I Don't Like Notebooks - Joel Grus - #JupyterCon 2018

I Don't Like Notebooks hi, I'm Joel, and I don't like notebooks Joel Grus (@joelgrus) #JupyterCon 2018

fading wigeon Sep 30, 2020, 11:59 PM

#

I'll check it out, I like avoiding pitfalls

charred blaze Sep 30, 2020, 11:59 PM

#

I think everyone does : D

#

yeah, that latter error means that your Jupyter Kernel is stuck running that cell

dire acorn Oct 1, 2020, 12:00 AM

#

refreshing helps that

#

restarting i mean

fading wigeon Oct 1, 2020, 12:37 AM

#

So many pitfalls... I've already encountered a few on my first day

chrome orbit Oct 1, 2020, 12:43 AM

#

@last peak how can i get more ticks on this?

#

📎 unknown.png

last peak Oct 1, 2020, 12:56 AM

#

@chrome orbit you can manually do it

#

import plotly.graph_objects as go

fig = go.Figure(go.Scatter(
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    y = [28.8, 28.5, 37, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
))

fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = [1, 3, 5, 7, 9, 11],
        ticktext = ['One', 'Three', 'Five', 'Seven', 'Nine', 'Eleven']
    )
)

fig.show()

#

like adding more tickvals and more ticktext

#

you have to generate the list of datetimes yourself
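A sketch of generating that list of datetimes with pandas, assuming monthly ticks are wanted; the date range is made up and should follow the real x-axis data. Plotly axes also have `nticks` and `dtick` settings that may cover the "automatic" case, though that was not tried in this conversation.

```python
import pandas as pd

# Hypothetical date range; replace with the min/max of the real x-axis data.
tickvals = pd.date_range("2020-01-01", "2020-12-01", freq="MS")  # one tick per month start
ticktext = [d.strftime("%b %Y") for d in tickvals]

# These could then be passed to the figure as:
# fig.update_layout(xaxis=dict(tickmode="array", tickvals=tickvals, ticktext=ticktext))
```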

velvet thorn Oct 1, 2020, 12:56 AM

#

I have a dataframe and the third column tells us what class that row represents. And I want to use 75% of the data for training and 25% for test. So once I do df.groupby(2) how can I pull a representative sample from 75% of that? This is for homework but I know how I could do it with random and loops and stuff and I want to learn how to do it in pandas.
@serene scaffold hint

#

you can groupby stuff that is not in the DataFrame, as long as it has the same length along the grouping axis

serene scaffold Oct 1, 2020, 12:57 AM

#

thxxxx

#

the prof postponed that assignment because everyone in that class is also taking the data science class, and there's a data science project due tomorrow

#

it's with rapid miner

#

I think I'd rather rapidly mine coal and deal with the adverse effects of that.

chrome orbit Oct 1, 2020, 12:59 AM

#

cant do it automatically? @last peak

bold ledge Oct 1, 2020, 1:40 AM

#

hi, if i want to find the dot product of EACH row in the blue circle vs the red array, how would i do so? (do i broadcast? reshape? slice an axis?) (it's a 5x7, and i have a 7x1) i want to multiply each of the 5 rows of 7 by that 7x1 array.

📎 unknown.png

velvet thorn Oct 1, 2020, 1:42 AM

#

@bold ledge so the result should be of shape (5,), right?

bold ledge Oct 1, 2020, 1:44 AM

#

yes

#

@velvet thorn

velvet thorn Oct 1, 2020, 1:45 AM

#

@bold ledge a @ b.T[0]

#

a is the (5, 7) array

#

wait, no

#

yeah, fixed
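To make the shapes concrete, here is a small sketch with stand-in arrays (the real values were only in the screenshot). It shows why `a @ b.T[0]` gives shape `(5,)` while `a @ b` keeps a trailing axis.

```python
import numpy as np

a = np.arange(35).reshape(5, 7)   # stand-in for the (5, 7) array
b = np.arange(7).reshape(7, 1)    # stand-in for the (7, 1) array

col = a @ b           # shape (5, 1): one dot product per row, kept as a column
flat = a @ b.T[0]     # shape (5,): b.T[0] is b flattened to length 7
same = a @ b.ravel()  # equivalent spelling of the line above
```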

bold ledge Oct 1, 2020, 1:47 AM

#

i think that worked thanks @velvet thorn

velvet thorn Oct 1, 2020, 1:47 AM

#

yw

tidal sonnet Oct 1, 2020, 2:31 AM

#

I can't figure out this practice question for the life of me :(

📎 unknown.png

#

Any ideas on how to go about working it out?

#

I tried going thru the options one by one... but i'm obviously still doing something wrong...

last peak Oct 1, 2020, 2:37 AM

#

@chrome orbit you probably can, i am just not familiar with this library sry

#

@tidal sonnet u have it right

tidal sonnet Oct 1, 2020, 2:42 AM

#

:o
Thank you

last peak Oct 1, 2020, 2:42 AM

#

-2(<3,4,1> - 3*<1,3/2,1/2>)

tidal sonnet Oct 1, 2020, 2:43 AM

#

????

#

The -2 is done first??

last peak Oct 1, 2020, 2:43 AM

#

no whats in bracket is done first

tidal sonnet Oct 1, 2020, 2:43 AM

#

OH... i get it

#

thank you...

last peak Oct 1, 2020, 2:44 AM

#

sure lol

#

@tidal sonnet

import numpy as np
a = np.array([3, 4, 1, 7])
b = np.array([1, 3/2, 1/2, 9/4])
-2 * (a - 3*b)
# array([-0. ,  1. ,  1. , -0.5])

#

i motivate numpy lol

tidal sonnet Oct 1, 2020, 2:52 AM

#

:o

#

I'll get into that as soon as I understand the maths >:)

#

But thanks for the advice

last peak Oct 1, 2020, 2:54 AM

#

ya good idea

#

by the way

#

you can also solve for that transformation matrix, to go from step 2 to step 3

#

T A = B

a=[[1,3/2,1/2],[3,4,1],[2,8,13]]

b=[[1,3/2,1/2],[0,1,1],[2,8,13]]
a=np.array(a)
b=np.array(b)
Then you can solve for T by multiplying both sides on the right by the inverse of A

#

T = BA^-1

#

and T should resemble what u see as the answer just in matrix form
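A short sketch of that calculation with the two matrices from the message above. Since T A = B, multiplying both sides on the right by the inverse of A gives T = B A^-1.

```python
import numpy as np

A = np.array([[1, 3/2, 1/2], [3, 4, 1], [2, 8, 13]])
B = np.array([[1, 3/2, 1/2], [0, 1, 1], [2, 8, 13]])

# T A = B  =>  T = B A^-1 (multiply both sides by A^-1 on the right).
T = B @ np.linalg.inv(A)

# Row 2 of T should encode -2 * (row2 - 3 * row1) = 6 * row1 - 2 * row2.
print(np.round(T, 6))
```

Rows 1 and 3 of T come out as identity rows because those rows of A were left unchanged.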

tidal sonnet Oct 1, 2020, 2:56 AM

#

I don't understand...
How did that work...
Have i been subtracting the vectors wrong?

last peak Oct 1, 2020, 2:57 AM

#

maybe you forgot the -2 multiplication

#

how are u subtracting vectors...

tidal sonnet Oct 1, 2020, 2:57 AM

#

3 4 1
1 1.5 0.5

2 2.5 0.5 x -2

last peak Oct 1, 2020, 2:58 AM

#

oh dont forget the 3*

tidal sonnet Oct 1, 2020, 2:58 AM

#

OHHHHHHHHHHHHH

#

[3, 4, 1]
[3, 4.5, 1.5]

[0, -0.5, -0.5] x -2

#

AHHHHHHH
I SEE IT NOW

#

Deepest appreciation

last peak Oct 1, 2020, 3:01 AM

#

sure thing 👍

tidal sonnet Oct 1, 2020, 3:06 AM

#

what is meant by linear combination?
Does that mean adding them together and then giving them a Scalar?
Or giving them a scalar separately then adding them together?

#

It's more of the method i'm interested in learning

📎 unknown.png

last peak Oct 1, 2020, 3:07 AM

#

linear combination of 2 vector v1,v2

#

is k1v1 + k2v2

#

so the new vector you get from this is a linear combination

#

for example
v=<v1,v2,v3>
u=<u1,u2,u3>
so a linear combination of u and v can be
2*<v1,v2,v3> + 3*<u1,u2,u3>
=<2v1+3u1, 2v2+3u2, 2v3+3u3>
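The same example in NumPy, as a quick sketch with made-up numbers: each vector gets its own scalar first, and the scaled vectors are then added.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # made-up vectors
u = np.array([4.0, 5.0, 6.0])

# Scale each vector by its own scalar first, then add the results.
combo = 2 * v + 3 * u
```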

#

That's basically it for linear combination, for echelon form they just do this over and over until you have 0s in the lower left triangle

#

so that you are able to use back substitution from bottom to top

tidal sonnet Oct 1, 2020, 3:12 AM

#

AH

#

THANK YOU!

#

That has been noted... I appreciate your help 518nad

last peak Oct 1, 2020, 3:13 AM

#

yw

spiral yew Oct 1, 2020, 3:51 AM

#

I'm wondering whether I should use my desktop pc which can run either windows or linux or my macbook pro. The desktop pc has pretty good specs (a quad core cpu, 16gb ram, and a nvidia 980 gpu). my macbook on the other hand has like a dual core cpu and 16gb of ram. For deep learning and computer vision stuff, which computer/OS should I use? I either use windows on the desktop pc, linux on the desktop pc, or use my macbook for coding the models and then training them on my desktop pc.

austere swift Oct 1, 2020, 3:52 AM

#

well youd probably wanna train the models on the desktop cus of the gpu

#

but as for coding thats up to you

#

and same with os

#

its mostly preference

plucky spindle Oct 1, 2020, 3:53 AM

#

Hello guys, can someone recommend a python module to analyze files in .wav format, I need to convert the audio into text and apply machine learning, thanks in advance

austere swift Oct 1, 2020, 3:54 AM

#

you can use the wave module from the standard library to read it

#

and i think scipy can do it too

#

and scipy does it into numpy arrays so you can train straight off of that

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html
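A small sketch of the `scipy.io.wavfile` route. It writes a synthetic tone first so there is a file to read; with a real recording only the `read` call is needed. Note this only loads the raw samples; turning speech into text is a separate step.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Write a one-second 440 Hz tone so the example has a file to read
# (a stand-in for a real recording).
rate = 8000
t = np.arange(rate) / rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
wavfile.write(path, rate, tone)

# read() returns the sample rate and the samples as a NumPy array.
sample_rate, samples = wavfile.read(path)
```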

spiral yew Oct 1, 2020, 3:58 AM

#

@austere swift do u use linux or windows?

austere swift Oct 1, 2020, 3:58 AM

#

i use both

#

dual boot

#

and the training server i have is straight ubuntu 18.04

spiral yew Oct 1, 2020, 3:59 AM

#

hm so windows is ur daily driver essentially?

austere swift Oct 1, 2020, 3:59 AM

#

yeah windows is for like games and files and stuff and linux is for dev work

#

but you can use windows for both if you want

plucky spindle Oct 1, 2020, 4:00 AM

#

@austere swift i see, reading the docs, thanks

austere swift Oct 1, 2020, 4:00 AM

#

the only thing is linux doesnt have support for a lot of programs anyways

serene scaffold Oct 1, 2020, 4:47 AM

#

group_0 = df[df[2] == 0 | df[2] == 0.] how can I express this for dataframes?

lapis sequoia Oct 1, 2020, 5:15 AM

#

do you want to select a subset of dataframe?

#

@serene scaffold

serene scaffold Oct 1, 2020, 5:16 AM

#

yes, though it turns out df[2] == 0 works for both 0 and 0. so this is not necessary.

#

unless my code is silently doing something unexpected.

#

but it appears to be working
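For reference, a sketch of how the combined condition can be written. In Python `|` binds tighter than `==`, so each comparison needs its own parentheses; the rows below are made up.

```python
import pandas as pd

df = pd.DataFrame([[1, "a", 0], [2, "b", 0.0], [3, "c", 1]])  # made-up rows, no headers

# `|` binds tighter than `==`, so each comparison needs its own parentheses.
group_0 = df[(df[2] == 0) | (df[2] == 0.0)]

# 0 == 0.0 in Python, so a single comparison selects the same rows.
group_0_simple = df[df[2] == 0]
```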

lapis sequoia Oct 1, 2020, 5:17 AM

#

is the column named as 2?

serene scaffold Oct 1, 2020, 5:17 AM

#

it doesn't have a name, it's just the third column

lapis sequoia Oct 1, 2020, 5:17 AM

#

oh ok

serene scaffold Oct 1, 2020, 5:18 AM

#

apparently the columns get integer labels, so you can index them like an array, if you specify that there aren't any headers.

lapis sequoia Oct 1, 2020, 5:18 AM

#

yes

thin terrace Oct 1, 2020, 6:39 AM

#

I'm using some file-extensions as features in a classification problem. But I've got a feeling that it may make the feature vector too sparse, do you think it's a good idea to group the extensions by types and how would you categorize them if so?

The extensions I'm looking at are ['java', 'json', 'ts', 'xml', 'js', 'html', 'css', 'ini', 'py', 'cfg', 'sh', 'yaml', 'env', 'properties']
Perhaps 'src': ['java'], 'web': ['html', 'css'], 'script': ['json', 'ts', 'js', 'py', 'sh'], 'conf': ['xml', 'ini', 'cfg', 'yaml', 'env', 'properties']?
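A sketch of how the proposed grouping could be applied before one-hot encoding. The buckets are exactly the ones suggested in the message above; the sample extensions are invented, and whether grouping helps will depend on the task.

```python
import pandas as pd

# Grouping proposed in the message above; the bucket names are that author's suggestion.
GROUPS = {
    "src": ["java"],
    "web": ["html", "css"],
    "script": ["json", "ts", "js", "py", "sh"],
    "conf": ["xml", "ini", "cfg", "yaml", "env", "properties"],
}
EXT_TO_GROUP = {ext: group for group, exts in GROUPS.items() for ext in exts}

# Hypothetical sample of extensions seen in the data.
exts = pd.Series(["py", "java", "yaml", "css", "js"])
grouped = exts.map(EXT_TO_GROUP)

# Four dummy columns instead of fourteen.
features = pd.get_dummies(grouped)
```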

glass jetty Oct 1, 2020, 7:16 AM

#

That probably depends on the classification task. Sparsity is also not necessarily a problem, that depends on your system's memory limits etc., and on the specific model.

lapis sequoia Oct 1, 2020, 7:26 AM

#

@plucky spindle Are you looking for speech-to-text?

#

There are several APIs. Google or Azure for example

#

you may also look into Mozilla's deepspeech

#

if there are privacy issues for example

thin terrace Oct 1, 2020, 7:57 AM

#

That probably depends on the classification task. Sparsity is also not necessarily a problem, that depends on your system's memory limits etc., and on the specific model.
@glass jetty yeah I know. it's an RFC and afaik they don't like sparse features as the splits become messed up when you create dummies out of categorical features. I guess I could use a NN to avoid that problem

glass jetty Oct 1, 2020, 7:57 AM

#

RFC = RandomForest?

thin terrace Oct 1, 2020, 7:58 AM

#

yes (classifier)

lapis sequoia Oct 1, 2020, 11:18 AM

#

What is the problem here?

📎 Screenshot_2020-10-01_at_13.17.46.png

eager heath Oct 1, 2020, 11:35 AM

#

Didn't you mean only one =?

mild topaz Oct 1, 2020, 12:05 PM

#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 75, in <module>
    assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong```

#

imageDimensions = 32, 32, 3 i am passing here

mild topaz Oct 1, 2020, 1:41 PM

#

i am following this tutorial

#

https://www.youtube.com/watch?v=SWaYRyi0TTs

YouTube

Murtaza's Workshop - Robotics and AI

Traffic Signs Classification Using Convolution Neural Networks CNN ...

Train and classify Traffic Signs using Convolutional neural networks This will be done using OPENCV in real time using a simple webcam . CNNs have been gaining popularity in the past couple of years due to their ability to generalize and classify the data with high accuracy. ...


halcyon vale Oct 1, 2020, 1:46 PM

#

https://www.linkedin.com/posts/thinam-tamang-3b12831a2_66daysofdata-datascience-machinelearning-activity-6717423517610082306-IFJU

Thinam Tamang posted on LinkedIn

Day 28 of #66DaysOfData! with Ken Jee

In my journey of Natural Language Processing, Today I have read and Implemented about Convolutional Neural Network...

mild topaz Oct 1, 2020, 1:50 PM

#

@halcyon vale sorry to ping you can u look into my issue?

plucky spindle Oct 1, 2020, 1:54 PM

#

you may also look into Mozilla's deepspeech
@lapis sequoia ok thanks for the info, i will search for it, yes i need speech-to-text for identify calls motives

sand pivot Oct 1, 2020, 1:57 PM

#

hey does anyone know why u would get runtimewarning : invalid value encountered greater than equal when trying to make a boxplot?

lapis sequoia Oct 1, 2020, 2:06 PM

#

https://github.com/gpk2000/Kitty-vs-Doggo

GitHub

gpk2000/Kitty-vs-Doggo

Building a deep learning model, which recognises cats and dogs. - gpk2000/Kitty-vs-Doggo

#

🙏 plz try this

mild topaz Oct 1, 2020, 2:12 PM

#

@lapis sequoia is it related to my issue?

lapis sequoia Oct 1, 2020, 2:12 PM

#

no It's my project

mild topaz Oct 1, 2020, 2:12 PM

#

can u look into my issue ? @lapis sequoia

#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 75, in <module>
    assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong```

lapis sequoia Oct 1, 2020, 2:15 PM

#

print these and see where the issue is
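A sketch of what that printing could look like. The array here is a hypothetical stand-in (grayscale images with no channel axis), which is only one possible cause; images that were never resized to 32x32 would fail the same assertion.

```python
import numpy as np

imageDimensions = (32, 32, 3)

# Hypothetical stand-in: ten grayscale 32x32 images, i.e. no channel axis.
x_train = np.zeros((10, 32, 32))

# Compare the two tuples by eye before asserting on them.
print(x_train.shape[1:], imageDimensions)
shapes_match = x_train.shape[1:] == imageDimensions
```

If `x_train` turns out to have dtype `object` or an unexpected number of axes, the images probably have differing sizes and need resizing before they are stacked.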

mild topaz Oct 1, 2020, 2:17 PM

#

can i share u my code so u get better idea what i am doing here?

#

https://paste.pythondiscord.com/tekulibage.coffeescript my code here @lapis sequoia

lapis sequoia Oct 1, 2020, 3:04 PM

#

from scipy import stats

class AI:
    used = []
    @classmethod
    def get_answers(cls):
        times = int(input('How many answers: '))
        count = 0
        while count < times:
            AI.used.append(input())
            count += 1

    @classmethod
    def get_mode(cls):
        mostused = stats.mode(AI.used)
        spl = str(mostused).split("'")
        print(f'Most Used: {spl[1]}')

AI.get_answers()
AI.get_mode()

does anyone know a more optimized way to do this?
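One possible simplification, as a sketch: `collections.Counter` from the standard library counts strings directly, so the string-splitting of the `stats.mode` result is not needed. The function name is made up for illustration.

```python
from collections import Counter


def most_used(answers):
    """Return the most frequent answer (ties go to the one seen first)."""
    return Counter(answers).most_common(1)[0][0]


print(most_used(["red", "blue", "red", "green"]))
```

The answers could still be collected with `input()` as in the class above and then passed in as a list.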

fringe stag Oct 1, 2020, 3:21 PM

#

hello, could someone suggest a way to detect rectangles in a bitmap image, like part of an OCR or computer vision system? I want to find the boundaries of each table cell and of the table itself.

lapis sequoia Oct 1, 2020, 3:38 PM

#

Hello, I'm looking for feedback based on experience regarding ML forecasting techniques: are there issues with training models on a small sample and then moving to the whole set of data? For example, take a small sample that is very representative of the whole set, test different models, select one, then train the selected model on the whole set to see if it is the right model to use for our data.

real wigeon Oct 1, 2020, 3:39 PM

#

anyone know how to append the output to a row? I'm using iterrows()

#

im iterating per row, and one of the columns is blank; im hashing an xls file and want to place the hashed values into the empty column (per row)

#

```
from flask import Flask
from flask_bcrypt import Bcrypt
import pandas as pd


df = pd.read_excel('/Users/daskjdhaDownloads/Employee Data Final.xlsx', names=['email','password',
                                                                                       'hashed password'])
app = Flask(__name__)
bcrypt = Bcrypt(app)


with open('hashed employee passwords.xls', 'a+') as f:
    for _, row in df.iterrows():
        email, unhashed_password, hashed_password = row
        pw_hash = bcrypt.generate_password_hash(unhashed_password).decode('utf-8')
        run = bcrypt.check_password_hash(pw_hash, unhashed_password)```
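A sketch of filling the empty column without `iterrows()`, by applying the hash function to the whole `password` column. To keep it runnable without Flask, `hash_password` below is a stand-in built on `hashlib`; it is an assumption for illustration only and not suitable for real passwords. The rows are made up as well.

```python
import hashlib

import pandas as pd


def hash_password(password):
    # Stand-in hasher so the sketch runs without Flask; NOT suitable for real passwords.
    # In the code above this would be
    # bcrypt.generate_password_hash(password).decode('utf-8').
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical rows in place of the spreadsheet.
df = pd.DataFrame(
    {"email": ["a@example.com", "b@example.com"], "password": ["hunter2", "swordfish"]}
)

# Fill the whole column in one go instead of writing inside iterrows().
df["hashed password"] = df["password"].apply(hash_password)

# df.to_excel("hashed employee passwords.xlsx", index=False) would then save the result.
```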

raw blaze Oct 1, 2020, 4:18 PM

#

I figured I'd ask here because pandas relates to data science..

I have a dataframe where each row is values to create a custom class.

class MyClass:
  def __init__(self, a,b,c):
    pass

df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
fields= ['a','b','c']

# my objective is to convert my dataframe into a list of generated MyClass objects. So i do..
new_series = df.apply(lambda row: dict(zip(fields, row)), axis=1)
# for example for the first row this gives me: {'a': 1, 'b': 2, 'c': 3} which is what i want..
#however here, when I apply to create my custom class, I get the error

new_series.apply(lambda key_pairs: MyClass(**key_pairs)) # got multiple values for argument 'a' TypeError

Any advice?

#

Just for brevity, this works perfectly:

first_row = new_series.iloc[0]
obj = MyClass(**first_row)
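This does not explain the `TypeError`, but as a sketch of an alternative that skips the second `apply`, the objects can be built from plain row tuples. `MyClass` stores its arguments here only so the result can be inspected.

```python
import pandas as pd


class MyClass:
    def __init__(self, a, b, c):
        # Store the arguments so the result can be inspected (the original uses `pass`).
        self.a, self.b, self.c = a, b, c


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
fields = ["a", "b", "c"]

# Iterate over plain row tuples and build one object per row.
objects = [
    MyClass(**dict(zip(fields, row)))
    for row in df.itertuples(index=False, name=None)
]
```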

hybrid rampart Oct 1, 2020, 4:46 PM

#

what do i have to do with these things?

📎 unknown.png

deft harbor Oct 1, 2020, 4:57 PM

#

class SiteAE(Model):
    def __init__(self):
        super(SiteAE, self).__init__()
    
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, 3), name='Inp_enc'),
            layers.Conv2D(16, 5, 2, padding='same', activation='relu', name='C1_enc'),
            layers.Conv2D(32, 5, 2, padding='same', activation='relu', name='C2_enc'),
            layers.Conv2D(64, 5, 2, padding='same', activation='relu', name='C3_enc'),
            layers.Flatten(),
            layers.Dense(LATENT_DIM, activation='relu', name='D_enc')], 
            name="Encoder")
    
        self.decoder = tf.keras.Sequential([
            layers.Input(shape=(LATENT_DIM, ), name='Inp_dec'),
            layers.Dense(4096, activation='relu', name='D_dec'),
            layers.Reshape((4, 4, 256), name='RS_dec'),
            layers.Conv2DTranspose(64, 3, 2, activation='relu', padding='same', name='C1_dec'),
            layers.Conv2DTranspose(32, 3, 2, activation='relu', padding='same', name='C2_dec'),
            layers.Conv2DTranspose(16, 3, 2, activation='relu', padding='same', name='C3_dec'),
            layers.Conv2DTranspose(3, 3, 2, activation='sigmoid', padding='same', name='Final_dec')], 
            name="Decoder")
    
    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = SiteAE()

#

Model: "site_ae"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Encoder (Sequential)         (None, 64)                4259680   
_________________________________________________________________
Decoder (Sequential)         (None, 64, 64, 3)         437283    
=================================================================
Total params: 4,696,963
Trainable params: 4,696,963
Non-trainable params: 0

#

Any idea why my Transpose layers aren't actually scaling up the tensor?

#

It should be scaling the latent space to (256, 256, 3)

mossy grotto Oct 1, 2020, 7:08 PM

#

Hello... I'm looking to hire a tutor for a few hours for some help with matplotlib. If you have experience with DSP that would be even better. I'm using spyder and I have a lot of random questions. They are all fairly easy. Shoot me a message with some of your work and we can discuss rates and whatnot.

dry hearth Oct 1, 2020, 7:36 PM

#

Hi, I'm looking for a couple of minutes of someone's time here who's skilled in NLP to understand some things

stark orchid Oct 1, 2020, 7:57 PM

#

GitHub and Great Expectations just published an awesome GitHub action that is the first CI workflow for Data Pipelines available directly from PRs on GitHub.
https://twitter.com/HamelHusain/status/1311699555243552769?s=20

Hamel Husain (@HamelHusain)

Really excited to announce the new @expectgreatdata GitHub Action!

The first CI workflow (that I know of) ✨✨for Data Pipelines✨✨ available directly from PRs on GitHub.

Read more about it here: https://t.co/MmzkrROADx

A teaser 👇, also thread: 🧵 (1/7) https://t.co/Ws8AUkB...


heady hatch Oct 1, 2020, 8:10 PM

#

@deft harbor I don't know enough about autoencoders, so I don't know if this will give you some insight.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 8, 8, 4096)        8192      
_________________________________________________________________
reshape_1 (Reshape)          (None, 4, 4, 256)         0         
_________________________________________________________________
conv2d_transpose_75 (Conv2DT (None, 8, 8, 64)          147520    
_________________________________________________________________
conv2d_transpose_76 (Conv2DT (None, 16, 16, 32)        18464     
_________________________________________________________________
conv2d_transpose_77 (Conv2DT (None, 32, 32, 16)        4624      
_________________________________________________________________
conv2d_transpose_78 (Conv2DT (None, 64, 64, 3)         435       
=================================================================

I noticed that the reshaping makes it 4, 4, 256. Then the Conv2DTranspose layers only scale up from there.

deft harbor Oct 1, 2020, 8:11 PM

#

That was indeed the problem. Thanks for the response.

#

Reshape has to be (16, 16, x)
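The arithmetic behind that fix, as a plain-Python sketch: with `padding='same'`, each stride-2 `Conv2DTranspose` doubles the height and width, so four layers multiply the starting size by 16. The helper name is made up for illustration.

```python
def upsampled_size(start, num_layers, stride=2):
    """Spatial size after `num_layers` Conv2DTranspose layers with padding='same'."""
    size = start
    for _ in range(num_layers):
        size *= stride  # with padding='same', each layer multiplies height/width by its stride
    return size


print(upsampled_size(4, 4))   # starting from Reshape((4, 4, 256))
print(upsampled_size(16, 4))  # starting from Reshape((16, 16, x))
```

If the `Dense(4096)` layer stays as it is, x would have to be 16, since 16 * 16 * 16 = 4096.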

heady hatch Oct 1, 2020, 8:12 PM

#

Ahh, I'm happy to hear that you're able to solve it! I'm planning on studying autoencoders later myself.

deft harbor Oct 1, 2020, 8:14 PM

#

They are fun when you start doing CVAEs

#

You can make people smile, or make them change genders

#

This one is simple, as all I need for the project are the latent variables

heady hatch Oct 1, 2020, 8:17 PM

#

~~latent variables similar to SVD and PCA?~~ Oh no, never mind. Lots to learn.

lone tusk Oct 1, 2020, 8:41 PM

#

is this the right place to ask a panda question :3 ?

fast plover Oct 1, 2020, 9:17 PM

#

re: matplotlib; Is there a way for me to increase the y-axis 'excess' by a percentage? I mean instead of it bounding at the min and max of my dataset, increase that by like 10% in each direction

#

Documentation is incredibly dense so finding the exact thing I need is finicky.

dreamy spoke Oct 1, 2020, 10:18 PM

#

Hi there, If I want to get into the ML world, where should I start? Thanks in advance!

paper niche Oct 2, 2020, 12:57 AM

#

@fast plover yep, https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.axes.Axes.set_ymargin.html
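A minimal sketch of `set_ymargin` with made-up data: a margin of 0.1 pads the y-axis by 10% of the data range on each side. The `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_ymargin(0.1)  # pad the y-axis by 10% of the data range on each side
ax.plot([0, 1, 2], [0, 5, 10])

ylim = ax.get_ylim()
```

The margin is set before plotting here so that the autoscaling triggered by `plot` picks it up.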

#

is this the right place to ask a panda question :3 ?
@lone tusk yes it is, feel free to ask

tidal sonnet Oct 2, 2020, 2:23 AM

#

I have this matrix A which is:
[[1, 1, 1],
[3, 2, 1],
[2, 1, 2]]

Which multiplies with [a, b, c]

Which is equal to S = [15, 28, 23]

The Values of [a, b, c] were then found to be [3, -1/2, 0], and I have been told to put the A in echelon form.
The values i'm supposed to replace are A12, A13, A23 and s1, s2 and s3.
A = [[ 1 , A12, A13],
[ 0, 1 , A23],
[ 0, 0 , 1 ]]
s = [s1, s2, s3]

But no matter how I was doing it... I just ended up being confused. Because that would mean that the price for carrots (c) is equal to 0, and the price for banana's (b) is equal to -0.5...

#

since [a, b, c] were said to be the prices of apples, bananas and carrots, and s is the total for that day

#

A = [[1, 1, 1], [0, -1, -2], [1, 0, 1]]

s = [15, -17, 8]
I reach as far as here, but then I get stuck...

heady hatch Oct 2, 2020, 3:23 AM

#

My linear algebra is a bit rusty but let's work through this step by step.

#

So what are the steps to turning A into REF (row echelon form)?

#

@tidal sonnet

tidal sonnet Oct 2, 2020, 3:55 AM

#

subtract a scalar of row one from row 2, then row 3, aim being to get the trailing diagonals as 0
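Those steps can be sketched in NumPy on the augmented matrix, assuming the system is A·[a, b, c] = s exactly as written above. Under that assumption the elimination gives A12 = 1, A13 = 1, A23 = 2 and s = [15, 17, 5], and back substitution gives a = 3, b = 7, c = 5, which differs from the [3, -1/2, 0] quoted earlier.

```python
import numpy as np

# Augmented matrix [A | s] from the message above.
M = np.array([[1, 1, 1, 15],
              [3, 2, 1, 28],
              [2, 1, 2, 23]], dtype=float)

M[1] -= 3 * M[0]        # clear the first entry of row 2
M[2] -= 2 * M[0]        # clear the first entry of row 3
M[1] /= M[1, 1]         # make the row-2 pivot equal to 1
M[2] -= M[2, 1] * M[1]  # clear the second entry of row 3
M[2] /= M[2, 2]         # make the row-3 pivot equal to 1

# Back substitution from the bottom row up.
c = M[2, 3]
b = M[1, 3] - M[1, 2] * c
a = M[0, 3] - M[0, 1] * b - M[0, 2] * c
```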

#

@heady hatch

heady hatch Oct 2, 2020, 6:40 AM

#

@tidal sonnet how did you find the values of a, b, c to be 3, -1/2, and 0?

molten bronze Oct 2, 2020, 7:13 AM

#

I am trying to train a NEAT neural net to play a game with a screenshot as the input but I am having an odd bug. If I run the game in directx everything works fine but if it is running using the vulkan API I just get the same frame over and over. I need to run it in vulkan as it is far more performance friendly. I have seen some discussion on issues with capturing vulkan applications. Would anyone have any ideas?

royal thunder Oct 2, 2020, 7:56 AM

#

can anyone help me i cant understand what the diagram is saying

📎 unknown.png

lapis sequoia Oct 2, 2020, 8:11 AM

#

well, you have some data points the blue dots

#

and you probably want to predict the 'life satisfaction' based on the 'GDP per capita'

#

so you try to find a model which does that

#

as you see the blue dots seem to fit on a line (a linear model)

#

which is described by the equation life_satisfaction = theta(0) + theta(1) x GDP

#

now this diagram shows some models for more or less random values of theta(0) and theta(1)

#

green line doesn't fit at all, red is doing better but way too low and blue is already doing an ok job

#

but could still be better, if theta(0) would be a bit higher

royal thunder Oct 2, 2020, 8:18 AM

#

the blue line and red line intersect, right? is that the point where people get satisfaction?

lapis sequoia Oct 2, 2020, 8:19 AM

#

these lines are purely random

#

they dont have any specific meaning

royal thunder Oct 2, 2020, 8:19 AM

#

so what does it mean?

lapis sequoia Oct 2, 2020, 8:20 AM

#

well, theyre just examples for possible models

royal thunder Oct 2, 2020, 8:20 AM

#

Oh i get it

lapis sequoia Oct 2, 2020, 8:20 AM

#

and of these, blue model is the one that fits the data best

royal thunder Oct 2, 2020, 8:20 AM

#

red?

lapis sequoia Oct 2, 2020, 8:20 AM

#

red is far away from the data points

#

but got at least the right trend

#

it goes up

#

whereas green goes down

royal thunder Oct 2, 2020, 8:21 AM

#

so the blue line is the trend where it shows satisfactory level right?

lapis sequoia Oct 2, 2020, 8:21 AM

#

blue is doing ok, but is clearly not the optimal solution

royal thunder Oct 2, 2020, 8:22 AM

#

what is the optimal solution then?

lapis sequoia Oct 2, 2020, 8:23 AM

#

well, finding that one is often the data scientist job

royal thunder Oct 2, 2020, 8:23 AM

#

i get that one now

lapis sequoia Oct 2, 2020, 8:23 AM

#

and theres no general way to find an optimal solution. You have to define a goal before that

#

what is often used is the root mean square distance

#

thats getting really technical

#

not sure if that helps you

royal thunder Oct 2, 2020, 8:24 AM

#

i am an noobie in machine learning so

#

i am haven't learned math that much for now

lapis sequoia Oct 2, 2020, 8:25 AM

#

a good solution is one, where the line (the linear model) has a low distance

#

to the data points

royal thunder Oct 2, 2020, 8:26 AM

#

oh i get it

lapis sequoia Oct 2, 2020, 8:26 AM

#

so you are confident you can predict unknown values

#

however, there are several ways to think of distance
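To make "distance" concrete, here is a sketch that scores a hand-picked line and a least-squares fit using two different distance measures. The numbers are made up purely for illustration and are not the data from the diagram; `rmse`, `mae`, and `guess` are names invented for this example.

```python
import numpy as np

# Made-up numbers purely for illustration; not the data from the diagram.
gdp = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
satisfaction = np.array([5.1, 5.9, 6.4, 7.2, 7.6])


def rmse(theta0, theta1):
    """Root mean square distance between the line and the data points."""
    predicted = theta0 + theta1 * gdp
    return np.sqrt(np.mean((predicted - satisfaction) ** 2))


def mae(theta0, theta1):
    """Mean absolute distance: another way to measure 'distance'."""
    return np.mean(np.abs(theta0 + theta1 * gdp - satisfaction))


# A hand-picked candidate line versus the least-squares fit.
guess = (4.0, 0.05)
theta1_fit, theta0_fit = np.polyfit(gdp, satisfaction, deg=1)  # returns slope first

print(rmse(*guess), rmse(theta0_fit, theta1_fit))
print(mae(*guess), mae(theta0_fit, theta1_fit))
```

The least-squares line is the one that minimizes the root mean square distance, so no hand-picked line can score lower on that measure; a different measure such as the mean absolute distance may prefer a slightly different line.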

royal thunder Oct 2, 2020, 8:26 AM

#

yeah i totatlly get it man

#

so i started like a day ago

#

with hands on machine learning on scikit and tensor flow in python

#

will you help me get better?

lapis sequoia Oct 2, 2020, 8:28 AM

#

I can try, but others here will help you as well

royal thunder Oct 2, 2020, 8:29 AM

#

not a problem man

#

for now i started

lapis sequoia Oct 2, 2020, 8:30 AM

#

finding a model to predict unknown data is a big part of machine learning

#

if not the essence

royal thunder Oct 2, 2020, 8:32 AM

#

ok for now you can help me with numpy

#

and pandas

lapis sequoia Oct 2, 2020, 8:33 AM

#

just ask a question here or in one of the help channels

royal thunder Oct 2, 2020, 8:34 AM

#

yeah i wanna learn numpy tho

#

so i got headed to their website

#

it kind of showed me a 1500 pages reference guide so i want resources for now to study

#

you know any?

lapis sequoia Oct 2, 2020, 8:38 AM

#

well, there are hundreds of tutorials out there

#

but I cant recommend one for you

royal thunder Oct 2, 2020, 8:39 AM

#

umm why?

lapis sequoia Oct 2, 2020, 8:39 AM

#

Didnt use one myself

#

learned most of the stuff i know in school and university

#

and got a pretty solid mathematical background

outer geyser Oct 2, 2020, 8:40 AM

#

I would recommend checking out freecodecamp in yt for tutorials

royal thunder Oct 2, 2020, 8:40 AM

#

i am not good in maths man

lapis sequoia Oct 2, 2020, 8:40 AM

#

not sure about you

royal thunder Oct 2, 2020, 8:40 AM

#

I would recommend checking out freecodecamp in yt for tutorials
@outer geyser checking that one thanks man

outer geyser Oct 2, 2020, 8:40 AM

#

they have a 4 hour beginner course

lapis sequoia Oct 2, 2020, 8:40 AM

#

yep, thats why you probably need other resources than Id watch

outer geyser Oct 2, 2020, 8:40 AM

#

can I send links here? @lapis sequoia

royal thunder Oct 2, 2020, 8:40 AM

#

i got low grades in math man wanna improve that

lapis sequoia Oct 2, 2020, 8:41 AM

#

I think so, but im not a mod

#

got the colored nick from the code jam

royal thunder Oct 2, 2020, 8:41 AM

#

i wanna get good in linear algebra ,calculus and probability statistics man

outer geyser Oct 2, 2020, 8:42 AM

#

https://www.youtube.com/watch?v=HfACrKJ_Y2w&t=1202s

YouTube

freeCodeCamp.org

Calculus 1 - Full College Course

Learn Calculus 1 in this full college course.

This course was created by Dr. Linda Green, a lecturer at the University of North Carolina at Chapel Hill. Check out her YouTube channel: https://www.youtube.com/channel/UCkyLJh6hQS1TlhUZxOMjTFw

This course combines two courses t...


#

check this one out @royal thunder ^

#data-science-and-ml
