#data-science-and-ml | Python | Page 222

raw rapids Apr 26, 2020, 3:45 PM

#

is there a discord for python machine learning

#

please tag me

#

with your answer

lapis sequoia Apr 26, 2020, 3:46 PM

#

I don't believe so

#

@raw rapids

raw rapids Apr 26, 2020, 3:46 PM

#

hmmm

#

ok

lapis sequoia Apr 26, 2020, 3:46 PM

#

Well there may be some

#

but they may be hard to find

#

also

raw rapids Apr 26, 2020, 3:46 PM

#

but u dont know

lapis sequoia Apr 26, 2020, 3:46 PM

#

Yeah im not too sure

raw rapids Apr 26, 2020, 3:46 PM

#

ok

#

also?

lapis sequoia Apr 26, 2020, 3:46 PM

#

oh

#

It will be hard to find one if there arent maybe

#

many*

#

and if there arent many it may be a small server with a couple hundred

raw rapids Apr 26, 2020, 3:47 PM

#

ok

#

i found a server

#

with 7,304

tired copper Apr 26, 2020, 3:48 PM

#

https://discordapp.com/invite/CbVJYtz

raw rapids Apr 26, 2020, 3:48 PM

#

ya

#

i joined that

oblique belfry Apr 26, 2020, 4:41 PM

#

Any thoughts on how you would implement machine learning and video streams? I am trying to run AI on different video streams and I am trying to think of the best way to present predictions to the end user. I am thinking I would need to incorporate different techniques from event sourcing.

worldly elm Apr 26, 2020, 7:54 PM

#

hey everyone, I am trying to train a transformer language model on wikitext2 using 6 layers, 12 attention heads and seq length 64

#

I know the papers usually report seq lengths on the order of 512, but I can only dream of that many GPUs. That being said, can convergence to a low enough perplexity be expected in less than 100 epochs?

#

I am using a StepLR scheduler, starting from a lr of 6.8

noble gyro Apr 26, 2020, 8:38 PM

#

well if i understand it (probably i dont) databases u store data in them right? And data science is just using it

#

in ml or some analysis

jolly briar Apr 26, 2020, 8:39 PM

#

@noble gyro i don't understand how that seems to be similar to you

noble gyro Apr 26, 2020, 8:39 PM

#

i said something wrong?

jolly briar Apr 26, 2020, 8:40 PM

#

@noble gyro data-science is typically analysing data, databases are storing it efficiently and designing the storage of it

noble gyro Apr 26, 2020, 8:42 PM

#

maybe its just connected like someone who is doing data science probably knows databases

jolly briar Apr 26, 2020, 8:43 PM

#

they likely have to interact with them, but they might not know about design etc

#

a data engineer would be more likely to understand databases

noble gyro Apr 26, 2020, 8:44 PM

#

hmm oki

#

xD

#

just asking cuz idk what field to choose

jolly briar Apr 26, 2020, 8:45 PM

#

i think it makes sense to do something within one first

noble gyro Apr 26, 2020, 8:50 PM

#

'within one' means?

spark stag Apr 26, 2020, 8:59 PM

#

have a go at both and experiment to see which field you like

jolly briar Apr 26, 2020, 9:00 PM

#

@noble gyro i mean do them, see what you like

late jackal Apr 27, 2020, 12:24 AM

#

is there a good place to learn how to use sklearn PCA?

#

i have a 58x1957 dataset

slim elm Apr 27, 2020, 12:28 AM

#

hello, I am using sqlite3 to execute a query, and I am not sure how to parameterize a filter on a datetime column so that the query only returns rows within a given date range.

arctic wedgeBOT Apr 27, 2020, 12:36 AM

#

Hey @slim elm!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

slim elm Apr 27, 2020, 12:37 AM

#

https://paste.pythondiscord.com/ecokunuxip.py

dull turtle Apr 27, 2020, 6:36 AM

#

hi guys, I have an image recognition model.

#

I have used passport images & licence images

#

If a user provides an image that is neither a passport nor a licence image,

#

then the model should identify that it is a wrong image, i.e. different from the provided categories.

#

How do I tell my model to identify that it is a wrong image?

lapis sequoia Apr 27, 2020, 7:38 AM

#

How to get started in data science ?

uncut shadow Apr 27, 2020, 7:40 AM

#

that is the question

echo tendon Apr 27, 2020, 12:33 PM

#

@echo tendon sample can be useful df.sample(5, random_state = 1) for example, if the top/tail of the dataframe aren't very representative
@jolly briar thank you, you have helped me so much!

#

exactly what I was looking for 😄

twilit imp Apr 27, 2020, 3:53 PM

#

hey so I am trying to create a neural net from scratch

#

and got into some problems when implementing the softmax functions

#

so here are some questions

#

do I have to use a softmax function in my neural network and why?

#

when using the softmax function, do I still have to multiply the hidden neurons by the weights?
or is just calculating the softmax from my hidden layer enough?

uncut shadow Apr 27, 2020, 3:56 PM

#

umm

#

softmax is used mostly in the output layer when you want to perform multi-class classification

#

so I don't think you should use it in hidden layer

#

but I might be wrong

twilit imp Apr 27, 2020, 3:58 PM

#

no I dont use it inside my hidden layer

#

I calculate the softmax from my hidden layer to my output layer

#

but I had weights and biases assigned between hidden and output layer

#

wait @uncut shadow do you know a little about neural networks?

uncut shadow Apr 27, 2020, 4:09 PM

#

ye

#

so you calculate the output for the output layer

#

and apply softmax function

#

and the output for the output layer will be wx + b where w are weights and b is bias

#

if this is Dense layer
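
A minimal NumPy sketch of the order described above: the output layer first computes wx + b from the hidden activations, and softmax is then applied to that result. The sizes (8 hidden neurons, 4 outputs) and all values are made up for illustration, not taken from anyone's actual code.

```python
import numpy as np

def softmax(z):
    # subtracting the max keeps exp() from overflowing; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
hidden = rng.random(8)           # activations of 8 hidden neurons (made up)
W = rng.normal(size=(4, 8))      # output-layer weights: 4 outputs x 8 hidden
b = np.zeros(4)                  # output-layer biases

z = W @ hidden + b               # dense layer: wx + b
probs = softmax(z)               # one probability per output neuron
```

So the weights between hidden and output layer are still used; softmax only replaces the activation function on top of them.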

twilit imp Apr 27, 2020, 4:24 PM

#

alright ive another question

#

so I am working on this "project" of mine where I am trying to teach a simple neural network to get to the apple. I have two input neurons that tell the network whether the x distance to the apple is negative or positive and whether the y distance to the apple is negative or positive

#

I have one hidden layer with 8 hidden neurons, where I apply weights and biases to the input layer

#

and 4 output neurons for the player movement: x, -x, y, -y

#

I am using a genetic algorithm where I pick every time the closest player to the apple and mutate his weights and biases

#

but for some reason after mutating weights nothing happens

#

all the players behave the same

#

!paste

#

this is the code I have so far

#

https://paste.pythondiscord.com/urapabasok.py

lapis ice Apr 27, 2020, 7:10 PM

#

Question regarding DCGAN/GAN's and datasets.
Can you manipulate your own dataset to increase the training data amount? For example, take the image, invert it.

agile shale Apr 27, 2020, 9:16 PM

#

Im beginner in datascience, pls i need init in this

noble gyro Apr 27, 2020, 9:19 PM

#

Hmm guys okay i am using pandas and is there a way to visualize data without going to the Jupyter notebook site

graceful ice Apr 27, 2020, 9:47 PM

#

Hey guys

#

anybody have worked with tensorflow here ?

#

Please dm me

#

if possible

vital sphinx Apr 27, 2020, 10:01 PM

#

Hmm guys okay i am using pandas and is there a way to visualize data without going to the Jupyter notebook site
@noble gyro not sure, I understand the question. can't you just print the dataframe in whatever IDE you're using?

noble gyro Apr 27, 2020, 10:02 PM

#

yes i can, i mean to make it like a graph @vital sphinx

vital sphinx Apr 27, 2020, 10:23 PM

#

@noble gyro have you tried using the matplotlib library?

noble gyro Apr 27, 2020, 10:24 PM

#

hmm i didnt, it allows data visualization?

jolly briar Apr 28, 2020, 12:02 AM

#

@noble gyro yes it's a visualisation library

drowsy ibex Apr 28, 2020, 2:12 AM

#

What are some datasets related to Covid-19? Can someone make a recommendation or two for getting started in DS?

royal lodge Apr 28, 2020, 3:09 AM

#

Hi guys, pandas question. I don't understand the behavior of pd.to_datetime with 'today' vs 'now'

pd.to_datetime('today', utc=True)
Timestamp('2020-04-28 11:06:51.310959+0000', tz='UTC')
pd.to_datetime('now', utc=True)
Timestamp('2020-04-28 03:06:54.922901+0000', tz='UTC')

#

why is it different

polar acorn Apr 28, 2020, 8:14 AM

#

Seems pd.to_datetime("today") ignores the utc argument and returns local time. Why though? Not sure.

thin kindle Apr 28, 2020, 11:57 AM

#

Hello guys, does someone know a website that stores covid data in JSON format?

slim fox Apr 28, 2020, 11:59 AM

#

https://covid-19-apis.postman.com/

Postman COVID-19 API Resource Center | List of APIs and Blueprints

View our growing list of novel coronavirus (COVID-19) API collections to help fight this pandemic. And learn how to use our blueprints for quickly deploying new APIs from existing data sets.

#

you can start here

thin kindle Apr 28, 2020, 12:00 PM

#

Thx 🙏

jolly briar Apr 28, 2020, 2:58 PM

#

@thin kindle if you want json and have csv then pd.read_csv(<url to raw csv>).to_json() might be useful... there might be a nicer way to do that though.
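
A small sketch of that CSV-to-JSON idea. To keep it runnable without a network connection, an in-memory string stands in for the raw CSV URL, and the column names are invented.

```python
import io
import json

import pandas as pd

# stand-in for pd.read_csv(<url to raw csv>); the columns are made up
csv_text = "country,confirmed\nA,10\nB,25\n"
df = pd.read_csv(io.StringIO(csv_text))

# orient="records" produces one JSON object per row
json_text = df.to_json(orient="records")
records = json.loads(json_text)
```

Without `orient`, `to_json()` on a DataFrame groups the output by column, which is usually less convenient if you want one object per row.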

oblique belfry Apr 28, 2020, 3:15 PM

#

Does anyone know if compiling OpenCV from scratch gives performance boosts?

buoyant trout Apr 28, 2020, 3:38 PM

#

maybe, if there are instruction set extensions that are disabled in the default binary and your CPU supports those extensions.

thin kindle Apr 28, 2020, 3:44 PM

#

Thx for the advice @rie

hardy harness Apr 28, 2020, 3:46 PM

#

In neural nets, do you apply dropout regularisation both in the forward and the backward pass?

#

If so, do you want to have the same masks for the layers in both passes?

buoyant trout Apr 28, 2020, 3:53 PM

#

masks as in what nodes gets dropped?

hardy harness Apr 28, 2020, 3:53 PM

#

yes

buoyant trout Apr 28, 2020, 3:53 PM

#

yeah they are the same

#

in fact dropouts are applied to a layer, and you don't even specify which "direction"

#

at least for frameworks like keras or tf

hardy harness Apr 28, 2020, 3:54 PM

#

I'm implementing a ffnn

#

and it's kinda weird as in we aren't using classes

#

so the network is defined by a dictionary of weights

#

which are in the dimensions needed to go from one layer to the next

buoyant trout Apr 28, 2020, 3:56 PM

#

Ahh can’t help you there, my experience with neural nets is limited to that one class I took in college. Sorry buddy.

hardy harness Apr 28, 2020, 3:56 PM

#

heh no worries

#

any case, my weights look like this

#

📎 unknown.png

#

so I guess I'd 0 some weights as a dropout?

buoyant trout Apr 28, 2020, 3:58 PM

#

yeah but after one batch you want to restore them to the original values

#

the nodes only have weights and no bias right

hardy harness Apr 28, 2020, 3:59 PM

#

yeap

#

we're not using biases for simplicity

#


def forward_pass(x, W, dropout_rate=0.2):
    out_vals = {}
    h_vecs = []
    a_vecs = []
    dropout_vecs = []
    
    #embedding lookup
    h0 = np.add.reduce(W[0][x,:],axis=0)
    a0 = relu(h0)
    h_vecs.append(h0)
    a_vecs.append(a0)
    
    for k in range(1,len(W)):
        h = np.dot(a_vecs[k-1],W[k])
        #don't calculate relu or store h,a for output layer
        if k+1 != len(W):
            a = relu(h)
            h_vecs.append(h)
            a_vecs.append(a)
        
        dropout = dropout_mask(W[k].shape[0],dropout_rate)
        dropout_vecs.append(dropout)
        
    out_vals = {
        'h':h_vecs,
        'a':a_vecs,
        'dropout_vec':dropout_vecs,
        'y': softmax(h)
    }

    return out_vals

this is my forward pass. I'm not sure how to use dropout tbh
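
One common way to do this is "inverted dropout", sketched below under a few assumptions: the mask is applied to a layer's activations rather than to the weight matrix, `dropout_mask` here is a hypothetical helper (the original one is not shown), and the values are made up. It is not necessarily what the assignment expects.

```python
import numpy as np

def dropout_mask(size, dropout_rate, rng):
    # hypothetical helper: 1.0 keeps a unit, 0.0 drops it
    return (rng.random(size) >= dropout_rate).astype(float)

def apply_dropout(values, mask, dropout_rate):
    # inverted dropout: rescale the kept units so the expected value is unchanged
    return values * mask / (1.0 - dropout_rate)

rng = np.random.default_rng(0)
rate = 0.2

# forward pass: mask the activations of a hidden layer
a = np.ones(10)                        # made-up hidden activations
mask = dropout_mask(a.shape[0], rate, rng)
a_dropped = apply_dropout(a, mask, rate)

# backward pass: reuse the SAME mask on the gradient for that layer
grad_a = np.ones(10)                   # made-up upstream gradient
grad_a_dropped = apply_dropout(grad_a, mask, rate)
```

A new mask is drawn for every batch during training, and no mask is applied at prediction time.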

buoyant trout Apr 28, 2020, 4:13 PM

#

http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

#

here chapter 4 talks about how to actually do it

#

but i'm too stupid to understand it, hope it helps tho

hardy harness Apr 28, 2020, 4:18 PM

#

thank you for spending the time! I will look into it

simple ocean Apr 28, 2020, 4:41 PM

#

So all the various things I've read about (deep) neural networks seem to say that to calculate a neurons value you should take the sum of the previous neurons multiplied by the weights linking them to your selected neuron. Is there any use of taking the average of these values instead?

spark stag Apr 28, 2020, 4:57 PM

#

you possibly could, it just means the inputs to later layers would be smaller by a factor of however many inputs that neuron has, compared to what they would otherwise be, but it may just lead the network to increase the weights and biases by that scale too, idk i'm no expert just speculating
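
A quick numeric check of that speculation, with made-up inputs and weights: the average is just the sum divided by the number of inputs, so scaling every weight by that number gives back the original sum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)           # outputs of 1000 previous neurons (made up)
w = rng.normal(size=1000)      # weights into one neuron (made up)

summed = np.sum(w * x)
averaged = np.mean(w * x)      # the sum divided by the number of inputs

# scaling the weights by n turns the average back into the sum
recovered = np.mean((w * len(x)) * x)
```

This only shows the two forms can represent the same values; whether training behaves the same with each is a separate question.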

simple ocean Apr 28, 2020, 5:02 PM

#

Seems likely. I guess it might be affected by the activation function as well... using sigmoid for example I was wondering about how large networks dealt with the huge sensitivity of the neuron, because even a tiny change in the input would change the sigmoid if it was close to 0. So if we had 1000 input weights and all of them were 0 except for one, which was set to say, 5, then given the previous neuron was high value enough it would completely change the output of the neuron. That's really what I'm confused by

turbid hearth Apr 28, 2020, 5:50 PM

#

Are there any courses online for me to learn scipy and numpy?

robust dome Apr 28, 2020, 6:02 PM

#

hey guys can anyone help me understand this:
def PatternCount(Text, Pattern):
    count = 0
    for i in range(len(Text)-len(Pattern)+1):
        if Text[i:i+len(Pattern)] == Pattern:
            count = count+1
    return count

print(PatternCount("GCGCG","GCG" ))
it returns 2

uncut shadow Apr 28, 2020, 6:04 PM

#

@robust dome it tells you exactly how many times GCG occurred in this text

#

also, I assume it's some DNA, right?

wispy cradle Apr 28, 2020, 6:05 PM

#

I have a tensorflow question if there are any experts here

uncut shadow Apr 28, 2020, 6:05 PM

#

@wispy cradle it will be better to ask a question and if somebody knows the answer he/she can easily answer it

wispy cradle Apr 28, 2020, 6:07 PM

#

Does anybody know if its possible to add extra text data to a tensorflow object detection bounding box? For example, adding a count of how many objects of that class in the image, to the bounding box.

robust dome Apr 28, 2020, 6:07 PM

#

yes it is. @uncut shadow. I know that it does that but what confuses me is the for loop part of the function, I keep looking at it and I can't wrap my head around it

uncut shadow Apr 28, 2020, 6:07 PM

#

@turbid hearth there is a numpy course on Udemy although I don't think it's worth it, you can always learn how to use it for free just using some cheat sheets and stuff

#

@robust dome so, this loop basically iterates through the text and checks if there is a pattern. Let's say that you look for GCG pattern in GCGCG.
It works like that:

It chooses the first letter (G in this text).
It takes G and the next few letters, as many as the pattern is long (so it's G + the next 2 letters, which together make GCG).
It checks if that slice equals the pattern you specified (in this first iteration it checks if GCG == GCG, which is true), then moves on to the next letter and does the same again.

for i in range(len(Text)-len(Pattern)+1):
  if Text[i:i+len(Pattern)] == Pattern:
    count = count+1

#

in your example, it returns 2 because it found GCG 2 times in your text
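
To see the loop at work, here is a variant of the same function that also records every slice it looks at. The `windows` list is added only for illustration and is not part of the original code.

```python
def pattern_count(text, pattern):
    count = 0
    windows = []
    # i runs over every position where the pattern could still fit
    for i in range(len(text) - len(pattern) + 1):
        window = text[i:i + len(pattern)]   # slice of the same length as pattern
        windows.append((i, window))
        if window == pattern:
            count += 1
    return count, windows

count, windows = pattern_count("GCGCG", "GCG")
for i, window in windows:
    print(i, window, window == "GCG")
```

The slices are GCG at position 0, CGC at position 1 and GCG at position 2, so the two matches overlap, which is why the count is 2.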

turbid hearth Apr 28, 2020, 6:14 PM

#

@robust dome is that from the UCSD bioinformatics course on coursera?

uncut shadow Apr 28, 2020, 6:14 PM

#

O.o

turbid hearth Apr 28, 2020, 6:14 PM

#

witcher by cheat sheets do u mean the official documentation of the specific python libraries?

robust dome Apr 28, 2020, 6:14 PM

#

@turbid hearth yeah it is

uncut shadow Apr 28, 2020, 6:15 PM

#

well, docs are great too but I meant things like for example this
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

turbid hearth Apr 28, 2020, 6:16 PM

#

oh thanks

uncut shadow Apr 28, 2020, 6:16 PM

#

but you can google "numpy cheat sheet" and you will find many more

#

or for example here https://www.dataquest.io/blog/numpy-cheat-sheet/

Dataquest

NumPy Cheat Sheet — Python for Data Science – Dataquest

Download a free NumPy Cheatsheet to help you work with data in Python. Includes importing, exporting, filtering, sorting, scalar and vector maths and more.

robust dome Apr 28, 2020, 6:17 PM

#

@uncut shadow ah okay I think I get it now. Thank you so much man

uncut shadow Apr 28, 2020, 6:17 PM

#

👍

serene scaffold Apr 28, 2020, 7:16 PM

#

I'm taking a look at this word embedding implementation: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281

Medium

A line-by-line implementation guide to Word2Vec using Numpy

Word2Vec is touted as one of the biggest recent breakthrough in the field of Natural Language Processing. The concept is simple, elegant…

#

their code transposes some matrices, but the text explaining what's going on doesn't actually acknowledge this, so I think it might have been a mistake.

eternal sentinel Apr 28, 2020, 7:49 PM

#

i need help with my homework. i'm supposed to write an implementation of a confusion matrix but i have no clue how to

lapis sequoia Apr 28, 2020, 7:52 PM

#

hey is there a course that teaches me ML and math needed for ML (I have a good basics but not sure all the basics I need)

serene scaffold Apr 28, 2020, 8:04 PM

#

@eternal sentinel do you understand what a confusion matrix is?

eternal sentinel Apr 28, 2020, 8:05 PM

#

i get it

serene scaffold Apr 28, 2020, 8:05 PM

#

great

eternal sentinel Apr 28, 2020, 8:05 PM

#

but how do you make it from scratch

serene scaffold Apr 28, 2020, 8:05 PM

#

you can use a dictionary

#

I would do this as a dictionary of tuples

#

so that the dictionary is flat
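
One possible reading of the "flat dictionary" suggestion, as a sketch: the keys are (actual, predicted) tuples and the values are counts. The labels below are made up, and `Counter` is just a dict subclass that does the counting.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    # key: (actual label, predicted label); value: how often that pair occurred
    return Counter(zip(y_true, y_pred))

y_true = [1, 0, 1, 1, 0, 1]    # made-up actual labels
y_pred = [1, 0, 0, 1, 1, 1]    # made-up predictions
cm = confusion_matrix(y_true, y_pred)

true_positives = cm[(1, 1)]
false_negatives = cm[(1, 0)]
```

Pairs that never occur simply return 0 when looked up, so no nested structure or pre-filled table is needed.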

eternal sentinel Apr 28, 2020, 8:06 PM

#

do you have any reference you could point me to

serene scaffold Apr 28, 2020, 8:07 PM

#

not off the top of my head

#

what sort of data are you trying to represent?

eternal sentinel Apr 28, 2020, 8:08 PM

#

regular 1D arrays

hardy harness Apr 28, 2020, 8:27 PM

#

noob question: in neural nets, does the size of each hidden layer have to be the same? meaning the amount of neurons

feral helm Apr 28, 2020, 8:35 PM

#

Is asking for help with a school assignment not allowed on this server

sterile zenith Apr 28, 2020, 8:39 PM

#

@hardy harness no, in fact, varying the number of neurons between layers is what helps neural nets choose the right features to pay attention to that give accurate predictions

hardy harness Apr 28, 2020, 8:57 PM

#

@sterile zenith huh, interesting. So both the layers and their sizes are hyperparameters then

sterile zenith Apr 28, 2020, 8:59 PM

#

I don't know what a hyperparameter is, but yeah, those are both variables

#

I remember seeing a website where you can play around with the # of layers and their size, let me see if I can find it

hardy harness Apr 28, 2020, 8:59 PM

#

well, variables that require tuning to get optimal performance from the nn

sterile zenith Apr 28, 2020, 8:59 PM

#

https://playground.tensorflow.org

Tensorflow — Neural Network Playground

Tinker with a real neural network right here in your browser.

hardy harness Apr 28, 2020, 9:00 PM

#

you wouldn't happen to be able to help with implementing dropout regularization, would you?

#

wow that's nice, thanks!

sterile zenith Apr 28, 2020, 9:03 PM

#

nope, can't help with that, sorry

hardy harness Apr 28, 2020, 9:03 PM

#

cool, cheers!

opaque crest Apr 28, 2020, 9:16 PM

#

I'm trying to import tkinter or Turtle, but it says I don't have tkinter. I'm trying to do pip install tkinter but it can't find it, can anyone help me plz?

#

?

late jackal Apr 28, 2020, 9:28 PM

#

trying here again but im hoping someone here has some familiarity with the Kmeans algorithm

sterile zenith Apr 28, 2020, 9:36 PM

#

I think I can help with that, though it’s been a while, could you remind me of it?

lapis sequoia Apr 28, 2020, 9:38 PM

#

Anyone know of an API's, or barring that scrape friendly sites, with relatively accurate and as up to date as possible data for the labor market? Looking for things like projected shortages and surpluses, pay scales for different fields in different locations, and basically any other bits of data I can get. Ideally worldwide, but the US is my target so just that is fine as well.

sterile zenith Apr 28, 2020, 9:40 PM

#

I’d recommend checking out data.gov

rough tapir Apr 28, 2020, 9:40 PM

#

Am not sure if my problem relates to data science

#

Am kinda new ;/

#

Has anyone heard of glowscript?

late jackal Apr 28, 2020, 9:44 PM

#

@sterile zenith it all of a sudden worked but i still have to do some hierarchical clustering and DBSCAN so maybe ill ping you then 😹

#

eww why is it a cat

sterile zenith Apr 28, 2020, 9:44 PM

#

lol 👍 good luck

opaque crest Apr 28, 2020, 10:06 PM

#

How can I match an action to a number? I mean only if the number is 3 for example, an action ceiling, if it is 4, another action ceiling.
Just with if?

#

if for every number?

vital sphinx Apr 28, 2020, 11:59 PM

#

How can I match an action to a number? I mean only if the number is 3 for example, an action ceiling, if it is 4, another action ceiling.
Just with if?
@opaque crest you could write it in a dictionary
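
A sketch of the dictionary approach: each number maps to a function, so there is no need for one `if` per number. The function names and return values are placeholders.

```python
def action_three():
    return "ran action for 3"

def action_four():
    return "ran action for 4"

# number -> function to call (the functions themselves, not their results)
actions = {3: action_three, 4: action_four}

def run(number):
    handler = actions.get(number)
    if handler is None:
        return "no action for this number"
    return handler()
```

Calling `run(3)` runs the first function, and any number missing from the dictionary falls through to the default message.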

amber ice Apr 29, 2020, 12:09 AM

#

hello, does anyone have any resources that would help in learning point clouds through python?

drowsy ibex Apr 29, 2020, 6:20 AM

#

https://covid-19-apis.postman.com/
@slim fox Thank you

balmy oar Apr 29, 2020, 6:21 AM

#

Hey folks, I would like to ask some help, books or resources to get up to speed with the interview coming, I will be getting a Python task which might include usage of Scikit Learn and Pandas, any ideas on the books or other reading material ?

brave frost Apr 29, 2020, 6:33 AM

#

I think it's far more important to understand the thing you need to do with scikit learn than to understand how to use the lib itself (but I am not an experienced data scientist, just used it in college)

balmy oar Apr 29, 2020, 6:45 AM

#

@brave frost that's a good point, so you would recommend steering for some materials that are more data science related than the lib itself ?

brave frost Apr 29, 2020, 6:47 AM

#

Depends on what the job description is and what you think you will be doing with the library

#

If you will be running ml models then know the different ones well and what their strengths and weaknesses are and which work for which data sets

#

For example, understanding this will help if the job is about clustering

#

https://scikit-learn.org/stable/_images/sphx_glr_plot_cluster_comparison_001.png

balmy oar Apr 29, 2020, 6:50 AM

#

@brave frost ok, let me ask more directly then, any recommendations for Data Science books that would dig deeper into understanding the screenshot above, since I couldn't explain it at this point

brave frost Apr 29, 2020, 6:51 AM

#

Can't help you there, Stanford has some good free online courses about ai/ml/clustering/data mining/etc

balmy oar Apr 29, 2020, 6:58 AM

#

@brave frost https://see.stanford.edu/Course/CS229 ?

brave frost Apr 29, 2020, 6:58 AM

#

Yeah one of those

#

That is the more advanced one. The lower level of that course is 221

finite solar Apr 29, 2020, 7:55 AM

#

I'd like to add 1 to a specific "area" of a 2d np.array like this:

arr = np.zeros((10, 10))
arr[(2, 2):(4, 4)] += 1

arr

[[0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 1 1 1 0 0 0 0 0]
 [0 0 1 1 1 0 0 0 0 0]
 [0 0 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 ...
]

#

That's not the real syntax but I couldn't find what I actually need to do

#

Unless this is not possible and I just need to use a for product over that "area"

#

Since currently I have this:
for point in itertools.product(range(s1, e1), range(s2, e2)):
    arr[point] += 1

#

But I imagine numpy has something like this built-in and I just can't find it

polar acorn Apr 29, 2020, 8:34 AM

#

Works for me with arr[2:4,2:4] += 1, except slices are not end inclusive so it won't be exactly like your example.
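
A short sketch of that slice syntax. Because the end of a slice is excluded, matching the 3x3 block in the example above (rows and columns 2 through 4) takes `2:5` rather than `2:4`.

```python
import numpy as np

arr = np.zeros((10, 10), dtype=int)

# rows 2..4 and columns 2..4; the end index 5 is not included
arr[2:5, 2:5] += 1
```

This updates the whole block in one step, with no explicit loop over the points.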

dull turtle Apr 29, 2020, 8:47 AM

#

how do I raise an alert when something goes wrong in python? If I expect a "cat image" as input and the user provides a "snake image or rat image" then it should raise an alert for it
the alert may be like "Something wrong has happened" or "plz provide proper image"

finite solar Apr 29, 2020, 8:52 AM

#

@polar acorn thanks

dull turtle Apr 29, 2020, 12:09 PM

#

how can I train an ML model to identify a "Wrong image"?

hardy harness Apr 29, 2020, 2:26 PM

#

I'm implementing a nn with 3 class output for text classification. As such, we're using softmax as the activation function for the output layer. I'm having a tough time computing the gradient on the output layer. any suggestions on that? or someone willing to take a look

acoustic forge Apr 29, 2020, 2:27 PM

#

I've written a script that gets a bunch of information on all of the games on steam. It does this through a bunch of API calls. It takes about 1 second for each game. My question is whether there's a way of making this faster? And is there a way of checking which parts of my script are taking up most of the time?

#

There are approximately 95k steam games/demo stuff etc. So that would be about 26 hours to get all information from steam

hardy harness Apr 29, 2020, 2:40 PM

#

well off the top of my head one way would be to somehow index the games, and then run parallel tasks each on a subset of the games list

#

but, tbh, given that you want to do it once, 26h sounds quite reasonable for that amount
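
A rough sketch of the "parallel tasks" idea using threads from the standard library. `fetch_game` is a hypothetical stand-in that only sleeps instead of calling the real API, and the worker count is arbitrary; a real API may have rate limits that make this many parallel requests a bad idea.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_game(app_id):
    # hypothetical stand-in for the real API call
    time.sleep(0.05)
    return {"app_id": app_id}

app_ids = list(range(20))      # made-up ids

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() keeps the results in the same order as app_ids
    results = list(pool.map(fetch_game, app_ids))
elapsed = time.perf_counter() - start
```

For the question of which part of a script is slow, the standard library profiler can be run as `python -m cProfile -s cumtime your_script.py` (the script name is a placeholder).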

acoustic forge Apr 29, 2020, 2:43 PM

#

That's probably true. It just seems like a lot to a new programmer :)))

uncut shadow Apr 29, 2020, 3:16 PM

#

@acoustic forge you can technically speed it up a bit but it depends on whether the API is slow (cuz if it is then you will have to wait those 26h) or whether it's your code. Also, if you are using python it means you don't actually care about speed. Python was made to be easy but it's quite slow so to make it fast u should use C++/C

delicate rune Apr 29, 2020, 4:44 PM

#

Hey guys, I'm having a problem related to Pygal

#

can anyone help me

#

📎 Capture_decran_164.png

uncut shadow Apr 29, 2020, 5:50 PM

#

it means you cannot subtract lists

#

!e

list1 = [1, 2, 3]
list2 = [1, 2, 3]
print(list1 - list2)

arctic wedgeBOT Apr 29, 2020, 5:52 PM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

uncut shadow Apr 29, 2020, 5:52 PM

#

oh

#

ok

#

so that's what it basically does

📎 unknown.png
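
Since the bot refused to run the snippet in this channel, here is a sketch of what it would have shown, plus one way to subtract two lists element by element. The list values are the ones from the message above.

```python
list1 = [1, 2, 3]
list2 = [1, 2, 3]

try:
    list1 - list2
except TypeError as exc:
    # Python lists do not define the - operator
    message = str(exc)

# element-wise subtraction has to be spelled out
diff = [a - b for a, b in zip(list1, list2)]
```

If this error shows up without any explicit subtraction in your own code, the lists are probably being subtracted somewhere inside a function you pass them to.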

spark stag Apr 29, 2020, 6:41 PM

#

@delicate rune i think you can replace that try-except block with .capitalize, it does the same thing, makes the first letter of a string upper case and the rest lower case

delicate rune Apr 29, 2020, 6:45 PM

#

Done it @spark stag

#

@uncut shadow yeah I know that but there is no list subtraction in my code

#

Guys I do need help if anyone knows about Pygal help me I think the error is related to pygal library

oak furnace Apr 29, 2020, 7:29 PM

#

I dont see where youre importing pygal.maps.world

#

did you install it?

#

pip install pygal_maps_world

delicate rune Apr 29, 2020, 7:31 PM

#

ey I did

oak furnace Apr 29, 2020, 7:33 PM

#

does the file need to have a path

delicate rune Apr 29, 2020, 7:33 PM

#

📎 Capture_decran_168.png

oak furnace Apr 29, 2020, 7:33 PM

#

instead of just a filename

delicate rune Apr 29, 2020, 7:33 PM

#

I use it in this file and I imported this file to the otherone

oak furnace Apr 29, 2020, 7:34 PM

#

Im talking about the file youre trying to render

delicate rune Apr 29, 2020, 7:34 PM

#

No it doesn't need

oak furnace Apr 29, 2020, 7:34 PM

#

chart.render_to_file('/tmp/chart.svg')  # Write the chart in the specified file

#

that shows a path

#

other than that im out of ideas

#

Ive never used pygal

#

you should try some small test examples to help you zero in on the issue

#

try to save a simple line chart with out all the country stuff

delicate rune Apr 29, 2020, 7:37 PM

#

@oak furnace I already did that it works but I need to find a solution to make it work with the country stuff

oak furnace Apr 29, 2020, 7:38 PM

#

is world.World() right?

#

yes I see that it is

#

pygal.maps.world.World()

#

try the example on the install page see if you cant get that to work

#

http://www.pygal.org/en/stable/documentation/types/maps/pygal_maps_world.html#installing

delicate rune Apr 29, 2020, 8:22 PM

#

If I use a simple int variable, like in this example the value of deaths in all countries, it works properly, but I need it to work with a list of variables

📎 Capture_decran_174.png

bronze grove Apr 30, 2020, 12:39 AM

#

Hi! I am making a line chart in matplotlib.pyplot using plot_date. I have that part under the nail, but i want to know, how would i change the line style to indicate a change from current to predicted? e.g solid = current, dashed = predicted

wise igloo Apr 30, 2020, 5:15 AM

#

Is python decent for quasi-experimental designs (time series, propensity score matching, etc)

lapis sequoia Apr 30, 2020, 6:41 AM

#

hello, I plan on doing text analysis on scientific publications. what type of analysis can be done?

grizzled quest Apr 30, 2020, 8:16 AM

#

what's the best way to read from a pdf with python for data extraction?

polar acorn Apr 30, 2020, 8:55 AM

#

@wise igloo I'm sure python has some packages for that, certainly you'll find some time series stuff in the statsmodels package. But in general R is the place to go for more stats oriented problems.

radiant notch Apr 30, 2020, 10:38 AM

#

I've got a bit of an odd question here -

#

Does anyone have any ideas on how I could measure the correlation between two n dimensional arrays - where them lining up exactly gives the closest correlation?

#

The problem is I need some way of only checking if they line up when they are actually in proximity of each other

#

📎 unknown.png

#

that's lining up pretty well

lapis sequoia Apr 30, 2020, 1:27 PM

#

what's wrong here?

📎 unknown.png

copper panther Apr 30, 2020, 1:55 PM

#

Anyone know a fix for this

📎 unknown.png

tawny oak Apr 30, 2020, 2:09 PM

#

I think you should install textblob module

copper panther Apr 30, 2020, 2:16 PM

#

@tawny oak I tried to by looking on the internet but unsure how to install it as those never worked

acoustic forge Apr 30, 2020, 2:26 PM

#

How do I merge two dataframes where both contains some of the same data and some new data to get one new dataframe that has the collected data but not duplicates?

tawny oak Apr 30, 2020, 3:25 PM

#

@copper panther write (pip install textblob) on your command prompt. It should work

copper panther Apr 30, 2020, 3:27 PM

#

Its not that I figured it out

#

U cant do it in cmd u have to do it in a separate command prompt

wise igloo Apr 30, 2020, 4:18 PM

#

I kinda hate R lol

#

@polar acorn lol what if I hated learning R

lapis sequoia Apr 30, 2020, 5:14 PM

#

I have made a detailed visualization about deaths, confirmed cases, recovered patients and a cnn model to predict the number of confirmed cases in future. Please feel free to share your views and how to improve it. Thanks!!
https://www.kaggle.com/frozenwolf/covid-19-visualization-prediction-nn-model/notebook?scriptVersionId=33041397

COVID-19 VISUALIZATION + PREDICTION NN MODEL

Explore and run machine learning code with Kaggle Notebooks | Using data from Novel Corona Virus 2019 Dataset

late jackal Apr 30, 2020, 6:04 PM

#

i am running some unsupervised learning algorithms to cluster data in 3d but every time i run the code i get slightly different plots

#

i am not sure if thats even an error or if that is how it should go

uncut shadow Apr 30, 2020, 6:24 PM

#

It depends on what you mean by "slightly", but I don't think really small difference will be a huge problem

late jackal Apr 30, 2020, 6:27 PM

#

well ok more than slightly like its odd it seems to vary by dimensionality reduction method

#

PCA changes almost none but locally linear embedding (LLE) and ISOMAP flip flop all over

unreal quiver Apr 30, 2020, 6:46 PM

#

Hello guys I'm new here and I wanted to ask you to recommend me a machine learning project that includes data visualisation. My experience so far is that I managed to create my own neural network using only numpy and built a minimax algorithm, also I have a vague understanding of linear regression.

deft harbor Apr 30, 2020, 7:05 PM

#

What interests you?

#

Find some data on something you like, do some EDA

#

If you have labeled data, try doing prediction with something like a decision tree

uncut shadow Apr 30, 2020, 7:09 PM

#

well, in ml there aren't often many things to visualize tho. I mean, you can still visualize data with 1, 2 features but it often comes with way more features (even after dimensionality reduction)

unreal quiver Apr 30, 2020, 7:13 PM

#

What exactly is EDA?

#

@uncut shadow I wanted to visualize the data that I find

uncut shadow Apr 30, 2020, 7:14 PM

#

it often has many features tho

#

but if it has 1, 2 then it should be ok

#

I think matplotlib has 3D function

unreal quiver Apr 30, 2020, 7:16 PM

#

Ok thx I can look that up

late jackal Apr 30, 2020, 7:22 PM

#

@uncut shadow one of my main concerns is im not getting very clean separated clusters

#

after DR i have a 1995x3 dataset

uncut shadow Apr 30, 2020, 7:22 PM

#

hmm

late jackal Apr 30, 2020, 7:22 PM

#

i can send an image

uncut shadow Apr 30, 2020, 7:22 PM

#

maybe you should try with different algorithm?

late jackal Apr 30, 2020, 7:23 PM

#

📎 unknown.png

#

left column is like the true labels then we have kmeans dbscan and hdbscan

#

the second figure is a slightly different dataset

#

we are supposed to tune the clustering algorithms till they match the truth labels then apply those settings to the second data set

#

. Truth KMeans DB HDB
PCA
ICA
t-SNE
ISOMAP
LLE

#

kmeans seem to be the only one functioning well

uncut shadow Apr 30, 2020, 7:25 PM

#

hmmm

late jackal Apr 30, 2020, 7:26 PM

#

🤷‍♂️

#

yeah not sure either i emailed a TA about it but we have a report due tomorrow lol

tall pollen May 1, 2020, 1:44 AM

#

hi, does anyone know how to perform a LR with adjustments by gender, age and so on?? I am new to python, and i know how to do this in SPSS but not in python.

#

anyone?

#

great community thankyou

tame kernel May 1, 2020, 4:34 AM

#

is there a way to fill a numpy array with the entire contents of a different array as the first entry?
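
The question can be read two ways, so here is a small sketch covering both. It assumes either that the outer array should hold the other array as a single element (which needs `dtype=object`), or that the other array should become the first row of a 2-D array. The names `inner`, `outer` and `stacked` are made up.

```python
import numpy as np

inner = np.array([1, 2, 3])

# Reading 1: store the whole array as ONE element of an object array.
outer = np.empty(4, dtype=object)
outer[0] = inner

# Reading 2: make the array the first row of a regular 2-D array.
rest = np.zeros((2, 3), dtype=inner.dtype)
stacked = np.concatenate([inner[None, :], rest])

print(outer[0], stacked.shape)
```

Object arrays lose most of NumPy's speed benefits, so the second form is usually preferable when all rows have the same length.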

lapis sequoia May 1, 2020, 11:36 AM

#

any book recommendations for data science?

jolly briar May 1, 2020, 1:50 PM

#

@lapis sequoia google ISLR, there' s a free text online as well

lapis sequoia May 1, 2020, 1:50 PM

#

Thanks

#

But i got a book "DATA SCIENCE FROM SCRATCH" @jolly briar

jolly briar May 1, 2020, 1:51 PM

#

@lapis sequoia fair , i think ISLR is better, depends what you want really, any learning is good learning

lapis sequoia May 1, 2020, 1:53 PM

#

Ok

jolly briar May 1, 2020, 1:54 PM

#

islr doesn't use python either - so it's not necessarily better @lapis sequoia , it's just preference

lapis sequoia May 1, 2020, 1:54 PM

#

@jolly briar i checked islr

#

But thats based on r language

#

I wanna learn in python

jolly briar May 1, 2020, 1:54 PM

#

yea - but it's more about the concepts

lapis sequoia May 1, 2020, 1:54 PM

#

Ok

#

Thanks

worldly elm May 1, 2020, 6:23 PM

#

how do I add new items to a spacy vocab? I have some pseudowords which are not in the spacy vocab and I would like the tagger to tag them correctly

#

namely the text contains <eos> and similar stuff that gets tagged as <=XX, eos=NNP and >=XX

#

while I'd like spacy to tag the whole <eos> as SYM

lunar holly May 2, 2020, 5:36 AM

#

This is more of a general data science question, but when performing EDA how do you find relationships or potential trends among variables in the dataset? So far all the multivariate relationships I've explored have produced no results

worldly elm May 2, 2020, 7:10 AM

#

@lunar holly I usually check correlations, variance explained, normality of distribution, and also all the assumptions of linear reg, like heteroscedasticity
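
As a rough sketch of the correlation check mentioned above, the snippet below ranks every pair of numeric columns by absolute Pearson correlation. The data frame is synthetic and the column names are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2.0 * x + rng.normal(scale=0.5, size=200),  # linearly related to x
    "noise": rng.normal(size=200),                    # unrelated column
})

corr = df.corr()  # Pearson correlation by default

# Keep only the upper triangle so each pair appears once, then rank pairs.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()
strongest = pairs.abs().sort_values(ascending=False)
print(strongest)
```

Pearson correlation only captures linear relationships, so `df.corr(method="spearman")` or scatter plots are worth a look when this comes back empty.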

lapis sequoia May 2, 2020, 8:20 AM

#

is it necessary to define default value None by using pydantic??

class Model(BaseModel):
    attribute: Optional[int] = None
    

class Model(BaseModel):
    attribute: Optional[int]

north sluice May 2, 2020, 8:33 AM

#

uhh i actually have no idea

#

quick question - do you guys know any good resources to learn the Keras API? I've been trying to get into machine learning, but most of the resources I've found pertain to the commands themselves, rather than the theory and process behind it

#

I'd like to learn how to make a machine learning model based on the environment, rather than being a one trick pony refactoring some code i found on the internet

uncut shadow May 2, 2020, 9:24 AM

#

@north sluice if you want to learn the Keras API you should check its documentation

rigid storm May 2, 2020, 10:37 AM

#

Hi guys just a quick stats question. I have 2 samples (n=34 and n=62). Should i even bother trying to run a Student's t-test or should i go for Welch's due to the fact that the variances are probably not equal anymore?

#

i was thinking, this ratio is 1.8. is there like a number, like a rule of thumb for which ratio is the maximum to be able to still assume equal variances?

worldly elm May 2, 2020, 12:23 PM

#

always Welch imo
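
For reference, Welch's test only changes the standard error and the degrees of freedom compared with Student's version. The sketch below computes both by hand with the standard library; the two samples are made-up numbers, and `welch_t` is a helper name invented here.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                                   # squared standard error
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    dof = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, dof

# Made-up samples with different sizes and spreads.
a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
b = [4.2, 4.8, 4.4, 4.1, 4.9, 4.6, 4.3, 4.7]

t, dof = welch_t(a, b)
print(f"t = {t:.3f}, dof = {dof:.1f}")
```

In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns the p-value.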

thin remnant May 2, 2020, 1:00 PM

#

I'm analysing a covid-19 dataset from a few weeks ago and try to plot some countries/regions. I made some code that takes all countries out of the dataset that start with the letter 'B' as shown in the first screenshot. In the second screenshot i'm plotting every country/region in the dataset. Now what i'm trying to do is only plotting the countries/regions I filtered out in the first screenshot. Could someone help me out on this one ?

📎 unknown.png

#

📎 unknown.png

lunar holly May 2, 2020, 3:53 PM

#

@worldly elm cheers man, thanks I'll take a look into that. Just exploring a dataset for a school project and figuring out if there are any derived variables to create or relationships to figure out before I build a classification model for it

worldly elm May 2, 2020, 3:56 PM

#

@thin remnant pie charts, if necessary, should have no more than 4 groups.

coral yoke May 2, 2020, 5:05 PM

#

you also get the countries and then revert entirely and plot the whole series rather than the ones you intended
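
A minimal sketch of that fix, assuming the dataset has columns named `Country/Region`, `ObservationDate` and `Confirmed` (the screenshots are not available, so these names and the toy numbers are assumptions): filter once, then keep plotting from the filtered frame rather than the original.

```python
import pandas as pd

# Toy stand-in for the dataset; column names and values are assumptions.
df = pd.DataFrame({
    "Country/Region": ["Belgium", "Brazil", "France", "Belgium", "Brazil", "France"],
    "ObservationDate": ["04/01/2020"] * 3 + ["04/02/2020"] * 3,
    "Confirmed": [1, 2, 3, 4, 5, 6],
})

# Keep only rows whose country starts with "B" and reuse THIS frame afterwards.
b_only = df[df["Country/Region"].str.startswith("B")]

# One column per country, one row per date: ready for b_wide.plot().
b_wide = b_only.pivot_table(
    index="ObservationDate",
    columns="Country/Region",
    values="Confirmed",
    aggfunc="sum",
)
print(b_wide)
```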

fiery maple May 2, 2020, 7:13 PM

#

Do you guys understand a little bit of how Allen NLP design pattern works? Because I know they have a interesting way to consume config files for example.

lapis sequoia May 2, 2020, 7:19 PM

#

Hey guys! I posted this video in discussion channel but it sank in a sea of messages in seconds 😄

I've prepared a list of 10 Python libraries for Data Science that you might have missed. I spent like 10 hours on editing because I lost all the progress due to a Premiere Pro crash ^^

Please, tell me that at least such a list is useful for you.

This is the video: https://www.youtube.com/watch?v=FFEVAZhT7iw

Peace & love tech people. Let the algorithm be with us!

YouTube

Machine Learning Jack

TOP 10 Python Libraries You Could Have MISSED

I have prepared 10 Python libraries that in my opinion might be useful for you and you could have missed them. I hope you will choose some of them and give them a try. It's always good to have a wider range of possibilities, so you can adjust the tool to the problem and not th...


tribal granite May 2, 2020, 7:39 PM

#

Does anyone have advice/ideas for unit testing a random number generator?
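
One possible approach, sketched with the standard library: inject the generator so tests can seed it, then check determinism, range, and a loose statistical property. `roll_die` is a hypothetical function under test, and the 5% tolerance is an arbitrary choice.

```python
import random

def roll_die(rng):
    """Hypothetical function under test: returns an integer from 1 to 6."""
    return rng.randint(1, 6)

def test_same_seed_gives_same_sequence():
    rng_a, rng_b = random.Random(42), random.Random(42)
    seq_a = [roll_die(rng_a) for _ in range(100)]
    seq_b = [roll_die(rng_b) for _ in range(100)]
    assert seq_a == seq_b

def test_values_stay_in_range():
    rng = random.Random(0)
    assert all(1 <= roll_die(rng) <= 6 for _ in range(10_000))

def test_distribution_is_roughly_uniform():
    rng = random.Random(0)
    n = 60_000
    counts = [0] * 6
    for _ in range(n):
        counts[roll_die(rng) - 1] += 1
    # Each face is expected n / 6 times; allow a loose 5% tolerance.
    assert all(abs(c - n / 6) < 0.05 * n / 6 for c in counts)
```

Fixing the seed keeps the statistical test reproducible, so it cannot fail intermittently in CI.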

uncut shadow May 2, 2020, 8:10 PM

#

@lapis sequoia well, I didn't know about any of those libraries (except spaCy) and they might come in handy later. But I still don't like using libraries and prefer making things from scratch tho 👍

pine dome May 2, 2020, 11:04 PM

#

Hi, I'm new to deep learning and ran into this error while following a tutorial:
ValueError: Input arrays should have the same number of samples as target arrays. Found 12708 input samples and 25416 target samples.

#

Based on my reading online, I need the input and target samples to match

#

In this case, it seems like I have twice as many target samples as input samples

#

What is usually the cause of this?

coral yoke May 3, 2020, 1:26 AM

#

@pine dome when you're preparing your data your x and y are way off when they should match. If you have 10,000 images, then you should have 10,000 labels

pine dome May 3, 2020, 1:36 AM

#

Makes sense

#

I realized what I did wrong - I used the x input array twice instead of x and y
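
A cheap guard against this kind of mix-up is to compare sample counts before calling `fit`. The sketch below uses placeholder arrays with the sizes from the error message; `check_same_samples` is a helper name made up here, not part of Keras.

```python
import numpy as np

def check_same_samples(x, y):
    """Raise early if inputs and targets disagree on the number of samples."""
    if len(x) != len(y):
        raise ValueError(f"{len(x)} input samples but {len(y)} target samples")

x_train = np.zeros((12708, 8))                 # placeholder inputs (8 features assumed)
y_train = np.zeros(12708)                      # one label per sample
y_doubled = np.concatenate([y_train, y_train]) # a target array built incorrectly

check_same_samples(x_train, y_train)           # passes silently

try:
    check_same_samples(x_train, y_doubled)
except ValueError as err:
    print(err)
```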

coral yoke May 3, 2020, 1:36 AM

#

@lapis sequoia heard of most and used a bit of those libraries myself, hug I wouldn't suggest though. I'd much rather use fastapi or just flask as both are faster than hug in benchmarks i've seen.

empty vector May 3, 2020, 5:24 AM

#

Learning python for futures analysis. Slowly making progress 🥳

📎 d1f71163b0c015cd088c7e7aed3a428a.png

noble gale May 3, 2020, 5:52 AM

#

is data science actually hard?

lapis sequoia May 3, 2020, 6:36 AM

#

Yes, if you want to do anything sophisticated. Basic data analysis isn't too hard to jump in to

thin remnant May 3, 2020, 8:48 AM

#

📎 unknown.png

#

I'm trying to put the country name or the date on the x-axis

#

can't get it to work...

#

both would be best

#

Belgium - date

vast tapir May 3, 2020, 12:02 PM

#

@thin remnant
Does something like this work?
belgie_new.set_index('ObservationDate').plot(kind='bar', rot=0, linewidth=2, figsize=(7,7))

Another option that might work is to add use_index=False
belgie_new.plot(kind='bar', rot=0, linewidth=2, figsize=(7,7), use_index=False)

Also you could add x='Belgium' to get the country name on the x-axis (assuming there's only 1 country name)
belgie_new.plot(x='Belgium', kind='bar', rot=0, linewidth=2, figsize=(7,7))

thin remnant May 3, 2020, 12:06 PM

#

Give me a sec

#

Imma try that

#

@vast tapir can you pm me ?

graceful birch May 3, 2020, 12:14 PM

#

i have an array qs = [0.13, 0.2, 0.3, 0.4, 0.5, ...] with 10 elements
i have a numpy array indexes which consists of indexes into qs (ie integers 0-9 inclusive).
indexes is quite large (10s of millions).
when I do y = qs[indexes] it seems very slow and uses a lot of RAM

#

is this expected? is there a better way to do that?
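
A small sketch of the lookup itself, assuming `qs` is (or is converted to) a NumPy array. The table values here are hypothetical, not the ones from the message. The result needs 8 bytes per index as float64, so tens of millions of entries will use a noticeable amount of RAM whatever method is used; a float32 table halves that.

```python
import numpy as np

qs = np.linspace(0.1, 1.0, 10)  # hypothetical 10-element lookup table
rng = np.random.default_rng(0)
indexes = rng.integers(0, len(qs), size=1_000_000, dtype=np.int8)

y = qs[indexes]                # fancy indexing: one vectorised gather
y_same = np.take(qs, indexes)  # equivalent call

# Gathering from a float32 table halves the memory of the result.
y32 = qs.astype(np.float32)[indexes]

print(y.nbytes, y32.nbytes)
```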

stoic condor May 3, 2020, 12:51 PM

#

@graceful birch have you ever tried using Numpy arrays?

graceful birch May 3, 2020, 12:53 PM

#

@graceful birch have you ever tried using Numpy arrays?
@stoic condor I was doing stuff with pandas, turned out pandas is allocating loads whenever I do df.loc[bool_mask, 'column'] = some_values

#

that was the problem

#

for some reason df['column'] = np.where(mask, some_values, np.nan) is massively faster and barely allocates

#

this is weird tbh, my df is ~100Gb but if I even look at it funny by slicing it, pandas seems to copy it into a massive temporary

stoic condor May 3, 2020, 1:03 PM

#

@graceful birch you could use

pandas.DataFrame.at()

graceful birch May 3, 2020, 1:05 PM

#

@stoic condor i have df.shape = (50_000_000, 50) , bool_mask is a vector of 50_000_000 bools, 90% of them true. I

stoic condor May 3, 2020, 1:05 PM

#

another option might be

pandas.DataFrame.update()

For tons of data..

graceful birch May 3, 2020, 1:06 PM

#

is there a reason why df.loc[bool_mask, 'column'] = some_values is so much slower than the df['column'] = np.where(mask, some_values, np.nan) ?

#

some_values is float64

#

perhaps i should not be using bool_mask with loc ?

stoic condor May 3, 2020, 1:11 PM

#

It might be slower because the interpreter checks each element via .loc, with multiple function calls. The np.where implementation is probably applied in place

#

if the data to be replaced are NaN, you have the .fillna function for DataFrames
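
To make the two assignments from the question concrete, here is a sketch on a small synthetic frame showing that they produce the same column when the target starts out as NaN. It says nothing about why one is faster on a 100 GB frame; the sizes and names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

n = 1_000
rng = np.random.default_rng(0)
mask = rng.random(n) < 0.9            # roughly 90% True, as in the question
some_values = rng.normal(size=n)      # float64 values, one per row

# Variant 1: masked assignment through .loc.
df_loc = pd.DataFrame({"column": np.full(n, np.nan)})
df_loc.loc[mask, "column"] = some_values[mask]  # right side matches the selected rows

# Variant 2: build the whole column at once with np.where.
df_where = pd.DataFrame({"other": np.zeros(n)})
df_where["column"] = np.where(mask, some_values, np.nan)

print(df_loc["column"].equals(df_where["column"]))
```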

faint furnace May 3, 2020, 2:54 PM

#

Hello! I am pretty new to data science and learning pandas, numpy, matplotlib,etc through courses online . Need to learn a new skill during this pandemic!
I want to find out which genre has received the maximum number of votes.

📎 unknown.png

#

I converted the genre to a List

📎 unknown.png

#

Now I just need to be able to select each genre like for example Documentary. Then get its index , use that index and add the number of votes it got.

#

Im trying to find out how to go about it but unable to get a lead

#

added them both in same dataframe

📎 unknown.png

coral yoke May 3, 2020, 3:30 PM

#

@faint furnace

>>> import pandas as pd
>>> df = pd.DataFrame({'genre': ['Documentary,Short', 'Animation,Short', 'Animation,Short', 'Comedy,Short'],\
...                    'votes': [1, 1, 1, 1]})
>>> df
               genre  votes
0  Documentary,Short      1
1    Animation,Short      1
2    Animation,Short      1
3       Comedy,Short      1
>>> df.groupby('genre').sum()
                   votes
genre
Animation,Short        2
Comedy,Short           1
Documentary,Short      1

opal knoll May 3, 2020, 4:11 PM

#

https://gist.github.com/DismalCamel/48addae43e36930b7653cb21b85705fd

Gist

570 Zillow Housing Data Project.ipynb

GitHub Gist: instantly share code, notes, and snippets.

#

Working on a project. posted at this link. How do I assign names to the rotated rows that are unnamed?

valid drum May 3, 2020, 7:45 PM

#

Is that a correct implementation of backpropagation in regular fully connected layer?

    def backprop(self, dA_prev):
        dA_prev = self.activation.backprop(dA_prev)
        x = self.cache['X']
        self.grads['dW'] = np.dot(dA_prev, x.transpose())
        self.grads['dB'] = dA_prev
        return np.dot(self.weights.transpose(), dA_prev)
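
One way to answer this yourself is a numerical gradient check. The sketch below assumes a linear layer `z = W @ x + b` with samples stored as columns and a toy loss `0.5 * sum(z**2)`; `dense_backward` is a name made up here. For a single column-vector sample the formulas reduce to the ones in the posted code, provided `activation.backprop` returns the gradient with respect to `z`. For a batch, the bias gradient has to be summed over samples.

```python
import numpy as np

def dense_backward(W, x, dZ):
    """Gradients for z = W @ x + b, x of shape (n_in, m), dZ of shape (n_out, m)."""
    dW = dZ @ x.T
    db = dZ.sum(axis=1, keepdims=True)  # sum over the batch; equals dZ when m == 1
    dX = W.T @ dZ
    return dW, db, dX

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 1))
x = rng.normal(size=(4, 5))

def loss(W_):
    z = W_ @ x + b
    return 0.5 * np.sum(z ** 2)

dZ = W @ x + b                      # dL/dz for the toy loss above
dW, db, dX = dense_backward(W, x, dZ)

# Central finite differences on every weight.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_dW[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(dW, num_dW, atol=1e-5))
```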

slow lynx May 3, 2020, 7:56 PM

#

Hello guys!

#

any one of you have experience with machine learning, AI or robotics ?

lapis sequoia May 3, 2020, 8:01 PM

#

yeah

#

audio mostly

slow lynx May 3, 2020, 8:05 PM

#

Awesome if i may ask how did you start did you learn programming at school or by yourself and decided to dive into AI and stuff ?

lapis sequoia May 3, 2020, 8:13 PM

#

by my self

#

just out of curiousity

#

and my hobby is music

slow lynx May 3, 2020, 8:15 PM

#

Oh shit okay that amazing!

lapis sequoia May 3, 2020, 8:15 PM

#

had a little bit on school

slow lynx May 3, 2020, 8:15 PM

#

how many years of experience do you have ?

lapis sequoia May 3, 2020, 8:16 PM

#

did a computer science

#

what ever

#

higher education

#

in holland

#

i prefer other languages to but whatever

slow lynx May 3, 2020, 8:17 PM

#

Holland yeah nice place

#

so would you say that you can create an audio ai like siri or something similar ?

lapis sequoia May 3, 2020, 8:18 PM

#

for filename in *.wav; do TempoDetector single "$filename" ; done

#

can anyone help me with this one liner

#

it spits out text

#

the tempo

#

nah there are just a couple of datasets that are freely available

#

but i want the output to be embedded in the wav file instead

#

its madmom

#

library

coral yoke May 3, 2020, 8:27 PM

#

i have experience with ML as well

slow lynx May 3, 2020, 8:30 PM

#

@coral yoke how much experience and what all have you made ?

coral yoke May 3, 2020, 8:32 PM

#

in ML specifically about half a year now and i've just made simple things, nothing fancy. image classifiers for different things, sentiment analysis for reviews, power plant generation prediction given very basic data, couple other things

slow lynx May 3, 2020, 8:33 PM

#

For anyone wondering i am just asking cause i am @ my 978 attempt in learning python i started with the automate the boring stuff seems to be going well and eventually want to deepen my skills with ML , ai or robotics but i want to have a realistic road map i guess you could say to work towards that goal

#

Pretty awesome man!

gloomy current May 3, 2020, 9:49 PM

#

got a quick question about best practices- ive written a web scraper but now i want to grab lots more points of data from each page

#

but if i add on what i need it seems like an ugly way to do it

#

```
timeframe_volumes = [[] for _ in range(4)]
# iterate through the url list and scrape data
#     append data to the lists on each iteration
# create a dataframe using the lists
# store the dataframe to csv
```

#

the data i want is in so many different places id have to declare like 15 different loops at the start which seems like the wrong way to do it

#

anyone know if there is a better way to go about this? just need pointing in the right direction as im kinda learning as i go
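
One pattern that avoids declaring a list per field is to build one dict per page and let pandas turn the list of dicts into a frame. This is only a sketch: the HTML snippets, field names and regular expressions below are invented, and a real scraper would fetch pages and use a proper HTML parser.

```python
import re
import pandas as pd

# Hypothetical pages standing in for fetched HTML.
pages = {
    "https://example.com/a": "<span id='price'>10.5</span><span id='volume'>300</span>",
    "https://example.com/b": "<span id='price'>11.0</span><span id='volume'>250</span>",
}

# One entry per data point: adding a field means adding a line here, not a loop.
FIELDS = {
    "price": r"id='price'>([^<]+)<",
    "volume": r"id='volume'>([^<]+)<",
}

def parse_page(url, html):
    """Return one record (dict) holding every field found on the page."""
    record = {"url": url}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, html)
        record[name] = float(match.group(1)) if match else None
    return record

records = [parse_page(url, html) for url, html in pages.items()]
df = pd.DataFrame(records)
print(df)
# df.to_csv("volumes.csv", index=False)  # uncomment to store the result
```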

jolly briar May 3, 2020, 11:02 PM

#

how are python notebooks such trash compared with things like RMD? I'd have thought by now they'd have either sorted themselves out or just moved things over to a markdown setup like R has, what's holding them back?

Having a variable in a markdown cell, or showing / hiding cells in report output is completely trivial in RMD but in notebooks is a complete hassle.

given that data analysis is so widely done in python, and notebooks are one of the main means for it, I'm constantly amazed with how awful they are.

coral yoke May 4, 2020, 12:14 AM

#

@jolly briar I don't know what you're talking about with "how awful they are." They're very useful and intuitive

#

Assuming by notebooks you mean Jupyter

jolly briar May 4, 2020, 12:24 AM

#

@coral yoke yeah jupyter - they have use but when one compares them to something like rmd they're severely lacking

#

don't know what you're talking about with "how awful they are."
i've given specific examples in my comment, so i'm not sure how you don't know what i'm talking about

#

unless you are aware of approaches to this which are as straightforward as those in rmd?

#

Assuming by notebooks you mean Jupyter

This is also ambiguous, unfortunately. Are we talking about notebooks or lab? See - i was actually talking about lab, but said notebook through habit... though - the issues that I've outlined aren't solved in notebook.

There's now lab and notebook, lab is meant to be the next version of notebook, but notebook widgets don't work with lab because the back end was changed a lot.

coral yoke May 4, 2020, 12:47 AM

#

Jupyter notebook is just fine and I nor my data analysis friends have ever had issues with it in any way. There wasn't any specific examples in your random, factless rant. I have no idea why you're acting like this

jolly briar May 4, 2020, 12:48 AM

#

@coral yoke variables in markdown

#

there's an example

#

showing / hiding cells in report output, there's another @coral yoke

#

In an rmd file having an variable in a markdown section is simply done with

text `r variable` more text

#

and a cell is hidden / shown with echo=False/True (iirc).
it's also possible to turn off evaluation of a cell with eval = False, all these are just put within { } tags at the top.

I think it's quite clear how much more straightforward this is than with jupyter

coral yoke May 4, 2020, 12:57 AM

#

Also none of that is anything I've ever needed so congratulations?

jolly briar May 4, 2020, 12:57 AM

#

i didn't realise they were designed for you personally

coral yoke May 4, 2020, 12:57 AM

#

If you don't want a cell evaluating in Jupyter notebook, don't run it

jolly briar May 4, 2020, 12:58 AM

#

but having a variable in a markdown cell is a pretty basic ask for something to build a report from

coral yoke May 4, 2020, 12:58 AM

#

I didn't realize they were designed for you personally either?

jolly briar May 4, 2020, 12:58 AM

#

If you don't want a cell evaluating in Jupyter notebook, don't run it
that's not practical

#

I didn't really they were designed for you personally either?
no - i'm comparing it to general tooling

#

i'm not sure if you've ever used rmd before?

coral yoke May 4, 2020, 12:59 AM

#

I haven't and I never will however it doesn't mean I haven't seen it used

jolly briar May 4, 2020, 12:59 AM

#

well maybe you don't understand the difference then

#

but these kind of things are extremely straightforward in rmd, and it's frustrating that they're not in jupyter

coral yoke May 4, 2020, 12:59 AM

#

Then why randomly rant in a python discord when nobody asked for it

jolly briar May 4, 2020, 12:59 AM

#

because it's about a python tool?

coral yoke May 4, 2020, 1:00 AM

#

Especially when you act like Jupyter is trash, and even said it, and don't even point out anything nice about it? Real nice of you

jolly briar May 4, 2020, 1:00 AM

#

Yes - I think it is kinda trash for these reasons

#

They're pretty basic aren't they? using variables in markdown etc

coral yoke May 4, 2020, 1:01 AM

#

No honestly. I've never needed variables in my markdown. I use code cells to display code, that's what it's for

jolly briar May 4, 2020, 1:01 AM

#

It's quite common - when writing to want to reference a variable

#

it also enables dynamic reporting, pretty basic stuff

#

Similarly with hiding / showing cells in the output - sometimes the code is not relevant, sometimes it is

coral yoke May 4, 2020, 1:02 AM

#

I just write in markdown "In this example we can see the mean of sales" and then just display it. It's not trash

jolly briar May 4, 2020, 1:02 AM

#

Yes - that is a trash work around

coral yoke May 4, 2020, 1:02 AM

#

In your opinion

jolly briar May 4, 2020, 1:02 AM

#

because it's having to work around a basic functionality

coral yoke May 4, 2020, 1:03 AM

#

It's not honestly

#

You're being over dramatic

jolly briar May 4, 2020, 1:03 AM

#

sure - look at rmd, look at jupyter, and tell me one is not objectively much better in this regard

coral yoke May 4, 2020, 1:03 AM

#

I'd still choose Jupyter

jolly briar May 4, 2020, 1:03 AM

#

I'd like to hear why not having the option is better than having the option

coral yoke May 4, 2020, 1:03 AM

#

I'm not going to fuss over having to write code in a code cell lol

jolly briar May 4, 2020, 1:04 AM

#

i'm assuming you've not had to use them for reporting

#

there's also the matter of plain text vs the mess of json in jupyter notebooks i guess

coral yoke May 4, 2020, 1:04 AM

#

I have for each project I've done and never have we had issues

jolly briar May 4, 2020, 1:04 AM

#

Well I am highlighting things you don't seem to have considered, and your response is to double down

coral yoke May 4, 2020, 1:05 AM

#

My response is my experience?

jolly briar May 4, 2020, 1:05 AM

#

I'm not asking whether these are useful, I'm telling you they are, hence them being functionality in many other tools

#

and hence there being insane work arounds to try and bring them into jupyter rather than them being a default as they are in other tools

coral yoke May 4, 2020, 1:05 AM

#

You find them useful, no? It's your opinion on the usefulness not a fact dude

jolly briar May 4, 2020, 1:06 AM

#

Yes I do - and many others do as well

coral yoke May 4, 2020, 1:06 AM

#

It's not insane to click the plus button and type one line

jolly briar May 4, 2020, 1:06 AM

#

Yes it's clunky - the variable belongs embedded within the paragraph so that it reads naturally

coral yoke May 4, 2020, 1:06 AM

#

Why so?

jolly briar May 4, 2020, 1:06 AM

#

not a stupid staccato flow of "and here's a blah" <code cell> etc

coral yoke May 4, 2020, 1:06 AM

#

I can read it naturally in Jupyter version

#

Does that not mean it comes down to preference?

jolly briar May 4, 2020, 1:07 AM

#

It means you're willing to have something that's further removed from reading like a natural paragraph of written text

coral yoke May 4, 2020, 1:07 AM

#

Have you looked at Jupyter notebooks then? They're very smooth transitions

jolly briar May 4, 2020, 1:07 AM

#

which isn't usually what I'd go for when writing something, i'd rather have something that has the ability to have variables integrated into it

#

pretty basic

#

I've used notebooks a lot yes

coral yoke May 4, 2020, 1:08 AM

#

You'd rather that, again preference

jolly briar May 4, 2020, 1:08 AM

#

they're not smooth at all in this regard, not even close

coral yoke May 4, 2020, 1:08 AM

#

Don't you see my point?

jolly briar May 4, 2020, 1:08 AM

#

yes - you're saying that i can just have an evaluation follow a line of text

coral yoke May 4, 2020, 1:08 AM

#

No disregard that

jolly briar May 4, 2020, 1:08 AM

#

and that is breaking the flow of text - it's not as fluid as being able to embed the variable within a paragraph

coral yoke May 4, 2020, 1:08 AM

#

I'm telling you it's preference, opinion

jolly briar May 4, 2020, 1:09 AM

#

and if you want say 8 variable in the paragraph then it's horrendous

coral yoke May 4, 2020, 1:09 AM

#

Not facts that you're sitting here screaming

jolly briar May 4, 2020, 1:09 AM

#

well you could like lots of daft things i'm not going to assume that it's all a good idea am i

coral yoke May 4, 2020, 1:09 AM

#

But it's again, your opinion, no?

jolly briar May 4, 2020, 1:09 AM

#

No - hence me giving examples and you trying to boil it down to "everyone has their own view"

coral yoke May 4, 2020, 1:10 AM

#

My point is there's no need to argue over a preference. You're honestly being childish man.

jolly briar May 4, 2020, 1:10 AM

#

no i'm highlighting a deficiency

coral yoke May 4, 2020, 1:10 AM

#

I'll let you do you and echo in your chamber oh great R programmer, have a good one

jolly briar May 4, 2020, 1:11 AM

#

@coral yoke it's not about that - I use python far more than R - but it's pretty blatant given any thought that there are some extremely basic features missing from notebooks which severely gimp it compared to something like rmd

agile forge May 4, 2020, 1:14 AM

#

apparently you can use python from rmd, though it's not clear to me if you can do the variable referencing stuff in markdown

jolly briar May 4, 2020, 1:15 AM

#

@agile forge I've heard some do this actually - i'm always a bit sceptical of these things for some reason... like using R within jupyter etc

agile forge May 4, 2020, 1:15 AM

#

jupyter's name comes from Julia/Python/R

#

so I'd expect jupyter to work fine with R

jolly briar May 4, 2020, 1:16 AM

#

yeah - i mean when one mixes python / r in the same notebook - i've never looked into it

#

I know others use it though, so I'm sure it's fine

agile forge May 4, 2020, 1:16 AM

#

ah, mixing I don't know about

#

also apparently there's a jupyter extension for variables-in-markdown

jolly briar May 4, 2020, 1:16 AM

#

yeah - you can have both, reticulate does it iirc

#

also apparently there's a jupyter extension for variables-in-markdown
they're often hassle though - and it depends if it's notebook or lab

agile forge May 4, 2020, 1:17 AM

#

nods

jolly briar May 4, 2020, 1:17 AM

#

because they changed the back end a lot so notebook doesn't often work with lab

#

😭

agile forge May 4, 2020, 1:17 AM

#

notebook is obsolete at this point, it seems, so if it works with lab it's all good

jolly briar May 4, 2020, 1:17 AM

#

there was one with a {{}} kinda syntax or something

agile forge May 4, 2020, 1:17 AM

#

definitely a lot of rough edges in notebooks though

#

like, three different solutions for working well with git

jolly briar May 4, 2020, 1:18 AM

#

notebook is obsolete at this point, it seems, so if it works with lab it's all good
yeah it'd be great if there was a kinda centralisation or something to force people to use lab instead of notebook

agile forge May 4, 2020, 1:18 AM

#

and they all feel a little hacky

jolly briar May 4, 2020, 1:18 AM

#

because the widget ecosystem still isn't there

#

like, three different solutions for working well with git
yeah this is another hassle

#

IMO I just don't know if they're ever going to be as flexible / straightforward as rmd... I just envy that they have plain text, simple flags for execution / hiding things etc

agile forge May 4, 2020, 1:20 AM

#

I guess I should look into rmd sometime

jolly briar May 4, 2020, 1:20 AM

#

yeah - although if you use python maybe it's not so much value, as a lot of others won't use it 😦

#

but it's _so_ straightforward to have cells on / off for evaluation, having cells hidden, variables in markdown and the rest of it. So for generating reports it's super nice

agile forge May 4, 2020, 1:21 AM

#

the best ideas are the ones you can, uh, liberate without any work cause someone else had a smart idea

jolly briar May 4, 2020, 1:21 AM

#

yeah definitely - I'm not sure how they'd port over to lab though , i guess they'd have to have 'magic' stuff

#

like

%%show_cell
this cell will be in report

#

something along those lines might be ok

#

in rmd they have:

\`\`\`{r, echo=FALSE}
1 + 1
\`\`\`

#

nested backticks beat me

agile forge May 4, 2020, 1:23 AM

#

https://rmarkdown.rstudio.com/ marketing talks about Python outright

R Markdown

Turn your analyses into high quality documents, reports, presentations and dashboards with R Markdown. Use a productive notebook interface to weave together narrative text and code to produce elegantly formatted output. Use multiple languages including R, Python, and SQL. R Ma...

#

not sure if that's reticulate or not

jolly briar May 4, 2020, 1:23 AM

#

So here - the code would be hidden, but the output would be shown

agile forge May 4, 2020, 1:23 AM

#

sounds like knitr just supports python

jolly briar May 4, 2020, 1:23 AM

#

yeah the r in the {} is saying to execute with r, so it might be super easy

agile forge May 4, 2020, 1:23 AM

#

maybe I'll write an article

jolly briar May 4, 2020, 1:24 AM

#

you can publish straight to rpubs from r-studio ha

agile forge May 4, 2020, 1:25 AM

#

seems like the wrong audience, though 🙂

#

later

jolly briar May 4, 2020, 1:25 AM

#

👍

faint furnace May 4, 2020, 10:50 AM

#

This is a very big dataset of IMDB movies. I want to find which genre has the maximum votes. Every movie has max 3 genres, and the votes column is next to it as in the pic. I want to basically get the list of all titles with their total number of votes

📎 unknown.png

#

like number of votes for "Animation" genre will be calculated by adding the votes of all movies which come under animation genre (like here 197+1287+121 + ....) . This is a big dataset. not just 5 entries. How do i do that?

#

How do I extract unique genres from this list? Like Documentary, Short, Animation, Comedy, Romance,Sport,etc.. this should be the result

📎 unknown.png

lapis ice May 4, 2020, 12:36 PM

#

Question regarding GANs,
What has already been done and what 'should be done' using GAN's?
It's a question which answers can be based on your own opinion or based on a paper, etc.
I would really appreciate any and all inputs 🙂

coral yoke May 4, 2020, 1:03 PM

#

@faint furnace I gave you your answer the last time you asked. Scroll up dude

faint furnace May 4, 2020, 1:46 PM

#

@coral yoke i read it. i didnt understand it 😦

#

```
>>> df = pd.DataFrame({'genre': ['Documentary,Short', 'Animation,Short', 'Animation,Short', 'Comedy,Short'],\
...                    'votes': [1, 1, 1, 1]})
>>> df
               genre  votes
0  Documentary,Short      1
1    Animation,Short      1
2    Animation,Short      1
3       Comedy,Short      1
>>> df.groupby('genre').sum()
                   votes
genre
Animation,Short        2
Comedy,Short           1
Documentary,Short      1
```

#

ok i understand it now. but that is not what i want

coral yoke May 4, 2020, 1:48 PM

#

It's simply grouping by the given column and then summing the votes

#

Yes, but that's the snippet you need to finish now

faint furnace May 4, 2020, 1:49 PM

#

see you are considering Animation, Short as same genre. i want it different

#

Animation =
Short =

#

each movie comes under max 3 genres. and i want to calculate the number of votes for each genre

coral yoke May 4, 2020, 1:49 PM

#

Yes I know, that should be decent enough to work with

faint furnace May 4, 2020, 1:50 PM

#

yea i know groupby can be used. but the problem im facing is how do i separate this list of genres

coral yoke May 4, 2020, 1:51 PM

#

You can separate the genres but you'll lose the votes when they go to one or the other

#

Short or long isn't a genre, you don't want that separated you want it removed

faint furnace May 4, 2020, 1:52 PM

#

you are right. i was planning to ignore the number of votes of short. but how do i get the total number of votes of Documentary, animation and other genres

coral yoke May 4, 2020, 1:53 PM

#

Just remove short or long from the columns

#

And then groupby

faint furnace May 4, 2020, 1:55 PM

#

how do i remove it btw?

coral yoke May 4, 2020, 1:55 PM

#

Just use the built in replace function

faint furnace May 4, 2020, 1:56 PM

#

ya replace thanks

faint furnace May 4, 2020, 2:19 PM

#

I was able to remove the Short from genres_list. but groupby not working

📎 unknown.png

#

@coral yoke

#

But you see there are genres like Crime Romance, Thriller. in same list.

coral yoke May 4, 2020, 2:43 PM

#

not sure if groupby likes lists but it shouldn't be a one-length list

faint furnace May 4, 2020, 2:54 PM

#

📎 unknown.png

#

trying to get a list of genres

drowsy marsh May 4, 2020, 6:16 PM

#

I have a set of data that I can interpolate using scipy.interpolate.interp2d, like z = f(x,y).
Now, can I obtain an analytic formula of the type z = a*x + b*y or something from this?

I tried using linear_model from sklearn. But it does a 2D-linear-regression if I understood, which is, I just realized, different than a 2D interpolation. And so the results are very different than what I expect.

terse torrent May 4, 2020, 6:34 PM

#

Is IBM_DB2 working with anyone else’s python?

patent kiln May 4, 2020, 6:38 PM

#

Just a heads-up. HumbleBundle is having nice sales on O'Reilly books
https://www.humblebundle.com/books/definitive-guides-to-all-things-programming-oreilly-books?partner=limitedtimeoffer

Humble Bundle

Humble Book Bundle: Definitive Guides to All Things Programming by ...

Pay what you want for awesome ebooks and support charity!

mortal vessel May 4, 2020, 7:05 PM

#

Got a question for you all: is there anything special I have to do with pycharm to have my plots show up? Matplotlib plots are not displaying with plt.show command

uncut shadow May 4, 2020, 7:18 PM

#

well

#

it should open new window with this plot

#

idk if pycharm supports opening plots

#

I mean

#

if you are going to run plt.show() (and you have made a plot already tho) then it doesn't change anything if it's pycharm

#

pycharm scientific mode has a function (I'm not 100% sure tho) for plots and stuff like that which makes it easier to use but it's only for professional version

mortal vessel May 4, 2020, 7:21 PM

#

Maybe I should try using scientific mode

#

I just know plots I would display in notebooks do not display correctly in pycharm. They don’t display at all

#

Debugger steps right over them

coral yoke May 4, 2020, 7:37 PM

#

@patent kiln 99% of the time you can find O'Reilly books as free PDFs instead of spending that money

lapis sequoia May 4, 2020, 7:48 PM

#

Hey my fellow data science friends, I am trying to see if this is possible in pandas

#

I'm trying to get all rows from a specific column between two rows with certain strings, all the non string rows are random integers

#

is this possible?

#

I'm trying to "slice all rows" *

#

in between the rows with two particular strings

terse torrent May 4, 2020, 7:51 PM

#

What should I use for SQL? MySQL or Jupyter?

coral yoke May 4, 2020, 7:53 PM

#

what?

#

also @lapis sequoia i need a bit better explanation to understand what you're trying to do

lapis sequoia May 4, 2020, 7:54 PM

#

Ok, for sure ! Thanks so much for reading

#

Ok so

terse torrent May 4, 2020, 7:55 PM

#

What should I use for creating sql tables?

coral yoke May 4, 2020, 7:55 PM

#

a program that can interface with the database or the database command line in the case of something like postgres

lapis sequoia May 4, 2020, 7:56 PM

#

I have an Excel sheet of 48 columns

#

and for each column, they are basically all random integers

#

but the words "buy curve" and "sell curve" appear in each column, at different indexes for each column

#

so I want to basically iterate for every column: find where buy curve appears and slice the column up to where "sell curve" appears

coral yoke May 4, 2020, 7:58 PM

#

just grab the index of each and then slice

lapis sequoia May 4, 2020, 7:58 PM

#

I know I can find a boolean for where buy curve appears I believe,

#

oh theres not a automated way to do it?

coral yoke May 4, 2020, 7:58 PM

#

no?

lapis sequoia May 4, 2020, 7:58 PM

#

because theres 48 columns and id have to individually find each buy curve and sell curve index

coral yoke May 4, 2020, 7:58 PM

#

pandas allows you to work with data, it can't do what you don't tell it to do

lapis sequoia May 4, 2020, 7:59 PM

#

oh I can use boolean actually yeah

coral yoke May 4, 2020, 7:59 PM

#

no, use pandas

#

df.loc the rows with that buy and sell, grab the index

#

no boolean required

lapis sequoia May 4, 2020, 7:59 PM

#

but from my understanding df .loc

#

it can slice within a column from one string to another even if there are integers in between?

coral yoke May 4, 2020, 8:00 PM

#

then do that

lapis sequoia May 4, 2020, 8:01 PM

#

ok, i remember trying that and it would give some error, let me give it a shot again

terse torrent May 4, 2020, 8:05 PM

#

Why would my IDLE be saying ibm_db could not be found if I imported it already

coral yoke May 4, 2020, 8:06 PM

#

because you're not using the correct interpreter
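#

A quick way to check this, as a rough sketch: run the snippet below inside IDLE and again in the terminal where the package was installed, then compare the paths. It only assumes the module name `ibm_db` from the question.

```python
import importlib.util
import sys

# Path of the interpreter that is actually running this code.
print(sys.executable)

# find_spec returns None when the module is not importable from this interpreter.
spec = importlib.util.find_spec("ibm_db")
print("ibm_db found" if spec is not None else "ibm_db not installed for this interpreter")
```

If the two paths differ, the package was probably installed for a different Python than the one IDLE runs.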

#

also, highly advise against IDLE. just get a proper IDE

lapis sequoia May 4, 2020, 8:09 PM

#

so I made a small scale Example, Soul, is there something wrong with my slicing notation? it doesnt return the values in between my slicing parameters

terse torrent May 4, 2020, 8:09 PM

#

Downloading Anaconda to run Jupyter.

lapis sequoia May 4, 2020, 8:09 PM

#

📎 Screen_Shot_2020-05-04_at_4.08.36_PM.png

#

oh derp, df loc only treats the inputs as indexes or columns?

coral yoke May 4, 2020, 8:44 PM

#

@terse torrent jupyter will not do it for you. it is not meant for that

#

@lapis sequoia yes

#

which is why i said use loc to find the row index where the str is contained

lapis sequoia May 4, 2020, 10:46 PM

#

@coral yoke I think i figured out how to do it!

#

📎 Screen_Shot_2020-05-04_at_6.48.26_PM.png

#

that returns the index , now I can try to slice that with iloc or something, since i have a numbered index right?

#

thanks a million man

lapis sequoia May 4, 2020, 11:32 PM

#

Yo i think i got everything! Thanks a million @coral yoke for nudging me in the right direction! I converted the series to a list and used int(list.index("Sell Curve")) then used iloc with that!! amazing
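#

Roughly what that looks like for every column at once. This is only a sketch: the tiny frame, the column names and the helper `between_markers` are made up for illustration, and it assumes each marker appears once per column.

```python
import pandas as pd

# Hypothetical two-column sheet; the real one has 48 columns.
df = pd.DataFrame({
    "a": [5, "Buy Curve", 1, 2, "Sell Curve", 9],
    "b": ["Buy Curve", 7, 8, 3, 4, "Sell Curve"],
})

def between_markers(col, start="Buy Curve", end="Sell Curve"):
    """Return the values strictly between the first start and end markers."""
    values = col.tolist()
    i, j = values.index(start), values.index(end)
    return col.iloc[i + 1:j]

# One slice per column, keyed by column name.
slices = {name: between_markers(df[name]) for name in df.columns}
print(slices["a"].tolist())  # [1, 2]
print(slices["b"].tolist())  # [7, 8, 3, 4]
```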

coral yoke May 4, 2020, 11:46 PM

#

np!

worldly elm May 5, 2020, 8:07 AM

#

Is there a simple way to get all the data from an MLFLOW experiment so that I can share them?

unreal thistle May 5, 2020, 2:03 PM

#

hi guys :)!

#

so i'm working on this project and i did a classification model and now i have to do the segmentation. i found some examples online but they train and do the segmentation at the same time. is there any method i can use to do the segmentation without editing all the code i did? Sorry if it is a stupid question, i'm a beginner.

lapis sequoia May 5, 2020, 2:06 PM

#

classify what now

#

what model did you make to classify what

#

and what do you need to do segmentation on

warm wedge May 5, 2020, 2:40 PM

#

bit of a newb question, but wondering if anyone could let me know...

#

if you run some vectorization over a column in a DataFrame, can the function be your own function, or is it only a select set of native functions you can do it with?

agile forge May 5, 2020, 2:41 PM

#

you can run any function on a pandas column, but random python functions will be slower
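#

A small sketch of the difference, with a made-up column and a made-up function `scale_and_shift`. A custom function is vectorized for free when it only uses operations that work on a whole Series (arithmetic, NumPy functions); `.apply` calls it once per element instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(6, dtype=float)})

def scale_and_shift(x):
    # Works on a scalar or on a whole Series because it only uses arithmetic.
    return 2.0 * x + 1.0

vectorized = scale_and_shift(df["x"])        # one call on the whole column
row_by_row = df["x"].apply(scale_and_shift)  # one Python call per element

# Same result either way; the vectorized call is usually much faster on big columns.
assert vectorized.equals(row_by_row)
```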

warm wedge May 5, 2020, 2:42 PM

#

but it would still be faster than doing it via iterrow() or .apply()?

#

with a custom function

lapis sequoia May 5, 2020, 2:50 PM

#

Do i have to master matplotlib ?

#

Or being average in using matplotlib is fine

#

For data science

warm wedge May 5, 2020, 3:01 PM

#

there's never an excuse not to master anything

#

but from my limited knowledge, its one of the bigger libraries so I'm sure theres no harm

#

but im not expert, im pretty much a newb to it all

fervent bridge May 5, 2020, 3:08 PM

#

Anyone here close to finishing the Neural Networks from Scratch book at NNFS.io? I like Sentdex series just want to know if the book makes it easy to grab the readers attention.

#

Also if I want to build a portfolio like what are some good example projects to actually work on?

uncut shadow May 5, 2020, 3:16 PM

#

@lapis sequoia you should be able to use matplotlib without problems tho. It's not like plotting and stuff is the heart of ML and DS, but it's super useful. It's often impossible to plot some data in some fields of ML, especially in Deep Learning, because it comes in many dimensions, but if you have a chance to do that, don't hesitate if it's worth the time. Not every dataset is correct (datasets are often checked but some of them are not) and some labels might be wrong. If you plot the dataset you will be able to easily see the problem and solve it. For example, let's say you have a dataset of people's height, weight, gender and stuff like that. You do .head() (assuming u opened it with pandas) and .tail() and everything seems ok, but it's not. 5 people in the dataset have the wrong height and you don't know about that

#

wrong data might cause problems with your model which is very bad for it

#

and if you plotted this dataset you could easily spot those anomalies and fix them

#

also, datasets are often not just 200 examples nor even 2000 examples. Some of them have millions of examples and human couldn't check them all

#

so they might contain wrong data

lapis sequoia May 5, 2020, 3:27 PM

#

Ok @uncut shadow

uncut shadow May 5, 2020, 3:27 PM

#

👍

lapis sequoia May 5, 2020, 3:27 PM

#

Thanks

#

Are u a data scientist ?

uncut shadow May 5, 2020, 3:28 PM

#

no

lapis sequoia May 5, 2020, 3:28 PM

#

Then ?

uncut shadow May 5, 2020, 3:28 PM

#

but I really like DS and ML

#

and I find plotting data really useful

lapis sequoia May 5, 2020, 3:28 PM

#

I am reading the book "DATA SCIENCE FROM SCRATCH"

#

Have u heard of it

uncut shadow May 5, 2020, 3:29 PM

#

gimme a sec

lapis sequoia May 5, 2020, 3:31 PM

#

I found this matplotlib series on youtube

#

https://www.youtube.com/playlist?list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_

YouTube

Matplotlib Tutorials - YouTube

In this Python Programming series, we will be learning how to use the Matplotlib library. Matplotlib allows us to create some great looking plots in order to...

#

What do u think of it ?

uncut shadow May 5, 2020, 3:33 PM

#

well, I didn't watch those tutorials but from what I can see they are good. Still, there is nothing which can replace your own coding so watching videos might help and show you the way but in the end you still have to make it for yourself to understand it

lapis sequoia May 5, 2020, 3:34 PM

#

In that book i am reading he didn't teach a lot about matplotlib

uncut shadow May 5, 2020, 3:35 PM

#

well, yeah, but still it doesn't mean it's not useful

#

datasets are sometimes assumed to be clear where in reality they often aren't

#

there is no better way than plotting everything and seeing for yourself

merry violet May 5, 2020, 3:36 PM

#

That matplotlib series on youtube is brilliant. Corey Schafer has one of the best python channels you will find.

lapis sequoia May 5, 2020, 3:54 PM

#

@uncut shadow @merry violet ok thanks

mortal vessel May 5, 2020, 4:08 PM

#

So I know python 2.7 is no longer supported. Have some groups discontinued the ability to download packages for 2.7?

#

Like numpy or pandas etc

agile forge May 5, 2020, 4:19 PM

#

mostly you'll just end up getting old versions

trail light May 5, 2020, 4:44 PM

#

is anyone here?

#

i wanna tell someone what i have just created

#

using python

uncut shadow May 5, 2020, 4:55 PM

#

Well, you can just post it tho, but the best place for that (if it's not related to data science or ML) is #303934982764625920

#

although I'm not a staff member so it might not be what they would do

past pewter May 5, 2020, 5:47 PM

#

@lapis sequoia matplotlib and seaborn are both useful. Don't kill yourself memorizing syntax, but be familiar with them. Try and make a few professional looking, highly customized graphs and then call it done.

Corey Schafer is good, Sentdex is also good
https://www.youtube.com/playlist?list=PLQVvvaa0QuDfefDfXb9Yf0la1fPDKluPF

YouTube

Matplotlib Tutorial Series - Graphing in Python - YouTube

Learn how to visualize data in the form of line graphs, bar charts, pie charts, 3D graphs, and more with Python 3 and Matplotlib.

lapis sequoia May 5, 2020, 6:08 PM

#

@past pewter ok thanks

#

Learning syntaxes isnt easy

#

You learn them then u forget them later

#

What is seaborn btw ?

past pewter May 5, 2020, 6:12 PM

#

It's another plotting library built on top of matplotlib. Its defaults are a bit faster and sexier than matplotlib, but if you want to do any deep customization you need to use matplotlib

#

It's common in industry

jolly briar May 5, 2020, 7:06 PM

#

@lapis sequoia knowing how to use dir() and help() is a bit of a must with matplotlib, in my experience at least

fervent bridge May 5, 2020, 7:28 PM

#

Ah so many tuts, I did not know Sentdex has a Matplotlib tut. Will definitely look into it.

umbral aspen May 5, 2020, 7:52 PM

#

Does anyone know of any good tutorials for predicting thousands of images at once with Tensorflow? I find lots of things about building pipelines to train models but not much around actually using that model once it has been trained...

polar acorn May 5, 2020, 8:00 PM

#

@umbral aspen Search for some info on tfx, I haven't tried it myself but it's a tensorflow library for hosting and using your tf models.

umbral aspen May 5, 2020, 8:01 PM

#

@pptt Thanks for the tip never heard of it until now...Will take a look

spark stag May 5, 2020, 8:42 PM

#

i have a question about the categorical crossentropy loss. from what I understand of how it computes the loss value, it multiplies the log of each prediction by the actual value. so if there is only 1 true label, does that mean only the prediction for the real label is used in calculating the loss, and the distribution of the predictions over the labels that are not the true label doesn't affect the output?

#

so if i have a prediction output of [0.05, 0.1, 0.7, 0.05, 0.1] and the labels are [0, 0, 1, 0, 0] the only values that are used to calculate loss are the log(0.7) * 1, none of the other values are used?

uncut shadow May 5, 2020, 8:54 PM

#

I'm not sure if I understood your question right but no

#

You compute loss for each of those predictions

#

And then sum it

spark stag May 5, 2020, 8:57 PM

#

so if that is one prediction with 5 output nodes, I calculate a loss for each then sum yes, but because I think the equation is -sum([real[index] * log(pred[index]) for index in range(len(real))]) where real is the labels, anything which has a label of 0 (it wasn't that item) has a * 0 so it isn't considered so is my implementation wrong or just how I am expecting the data

uncut shadow May 5, 2020, 8:58 PM

#

Well

#

I'm not sure if we are talking about the same loss function tho. Cross-entropy loss function is
-(y*log(p) + (1 - y)*log(1 - p)) (or sth like that)

#

Where p is prediction and y is the label
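#

To make the two formulas concrete, here is a small sketch using the numbers from the question. With a one-hot label, categorical cross-entropy reduces to -log of the probability on the true class. The binary form, -(y*log(p) + (1 - y)*log(1 - p)), is for a single output. The function name `bce` is just for illustration.

```python
import math

pred = [0.05, 0.1, 0.7, 0.05, 0.1]
real = [0, 0, 1, 0, 0]

# Categorical cross-entropy: -sum(y_i * log(p_i)) over all classes.
cce = -sum(y * math.log(p) for y, p in zip(real, pred))

# Every term with label 0 is multiplied by 0, so only the true class remains.
assert math.isclose(cce, -math.log(0.7))

def bce(y, p):
    """Binary cross-entropy for one output p in (0, 1) and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(round(cce, 4), round(bce(1, 0.7), 4))
```

The other predictions still matter indirectly if the outputs come from a softmax: they sum to 1, so probability put on wrong classes is taken away from the true one.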

spark stag May 5, 2020, 9:01 PM

#

ok, i'll look into that because this is just how i saw it

uncut shadow May 5, 2020, 9:01 PM

#

Ok

spark stag May 5, 2020, 9:37 PM

#

@uncut shadow thanks for your help, i think i found the issue now

deft harbor May 5, 2020, 11:04 PM

#

If im running a for loop, and I need to trigger an embedded if loop to run 15% of the time (extended to the future), what is a good way of going about it?

#

Some sort of draw from a distribution i would think

jolly briar May 6, 2020, 12:38 AM

#

@deft harbor

if np.random.choice([True, False], p=[0.15, 0.85]):
    do stuff

like that?

deft harbor May 6, 2020, 2:13 AM

#

yes

#

I ended up doing

#

        prob = np.random.uniform(0, 1)
        
        if prob <= 0.05:

#

curious what is the fastest way given i am using this to flip the labels while training a gan

lapis sequoia May 6, 2020, 5:16 AM

#

@jolly briar ok thanks

faint furnace May 6, 2020, 12:36 PM

#

Can someone tell me how to get it right

📎 unknown.png

#

📎 unknown.png

#

Basically I am trying to get the total number of votes for each genre. So I started out by making the "genres" column a list -> "genres_list" and set it as an index so I can use it in the find_votes() function

#

if there is anyone who can help fix my code or maybe give a better solution, would highly appreciate. been stuck on this problem for last couple days

flat bough May 6, 2020, 1:13 PM

#

@faint furnace Hi, are you trying to access a list element with round brackets? It won't work, you need square ones. line 6

faint furnace May 6, 2020, 1:15 PM

#

how do i correct it

#

it gives a longer error with square brackets

📎 unknown.png

flat bough May 6, 2020, 1:18 PM

#

I'm not sure how to fix this exact function, but if i were you i'd give loc a try https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

faint furnace May 6, 2020, 1:18 PM

#

This is my dataset. I want to find the number of votes for each Genre. Every movie comes under more than 1 genre, as you can see more than 1 genre in each row

📎 unknown.png

#

i dunno how I will use loc here. If I can get the index of a element then it will be great

#

i want to be able to get the index of [Documentary,Short] cells, then use this index to get the basics.numVotes[idx]

#

any way to do that?

flat bough May 6, 2020, 1:21 PM

#

honestly, not sure I need some time to think

#

can you send a link to this dataset, I wanna try to solve it too

paper niche May 6, 2020, 3:16 PM

#

@faint furnace https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html

#

explode the genres column then do a groupby + sum
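#

A minimal sketch of that, assuming a comma-separated `genres` column and a `numVotes` column. The toy rows are made up, and dropping "Short" is optional (it was suggested earlier because it describes length, not genre).

```python
import pandas as pd

# Toy stand-in for the IMDB table; values are invented for illustration.
df = pd.DataFrame({
    "genres": ["Documentary,Short", "Animation,Short", "Animation,Comedy,Romance", "Comedy"],
    "numVotes": [10, 20, 30, 40],
})

# Turn "A,B" into ["A", "B"], then give every genre its own row.
exploded = df.assign(genres=df["genres"].str.split(",")).explode("genres")

# Optionally drop "Short" before counting.
exploded = exploded[exploded["genres"] != "Short"]

# Total votes per individual genre, largest first.
votes_per_genre = exploded.groupby("genres")["numVotes"].sum().sort_values(ascending=False)
print(votes_per_genre)

# The unique genres fall out of the same exploded column.
unique_genres = sorted(exploded["genres"].unique())
```

Each movie's votes are counted once for every genre it belongs to, which is what the question asks for.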

junior fossil May 6, 2020, 3:17 PM

#

I was also curious, thank's for this explode hint. @faint furnace maybe this helps: https://share.cocalc.com/share/fe9a112b880dd0189ff891e503e420036824c001/pandas-groupby-list.ipynb?viewer=share

faint furnace May 6, 2020, 3:24 PM

#

Thanks for the replies I will check these links!

#

damn I was actually thinking of something like this!!! explode ! I think this might be it !

faint furnace May 6, 2020, 3:54 PM

#

Finally!!! Thanks for the help!

📎 unknown.png

#

📎 unknown.png

#

Now I can proceed to next question

flat bough May 6, 2020, 3:56 PM

#

good job

placid stratus May 6, 2020, 4:02 PM

#

Hey all! I'm looking to develop a preferred citation for a model I work on. I'm confused about how to determine authorship for software, since I'm used to doing it for papers. How far back do I have to cite contributors? There are a couple who did initial work, but none of their code exists in the model anymore

#

and do we cite the primary investigator? He doesn't do any of the work, but I think we do

mossy crow May 6, 2020, 6:14 PM

#

Anybody here use boto3 (AWS SDK) with s3?

past pewter May 6, 2020, 6:22 PM

#

@mossy crow what's the issue?

mossy crow May 6, 2020, 6:23 PM

#

@past pewter For the past 3 weeks I've been uploading 5 files in a for loop to an s3 bucket with no issue. When I ran it today it randomly will get through 1-3 of them and then throws a socket.gaierror: [Errno -2] Name or service not known error

#

@past pewter here is a detailed explanation https://stackoverflow.com/questions/61642290/boto3-error-socket-gaierror-errno-2-name-or-service-not-known

Stack Overflow

boto3 Error: socket.gaierror: [Errno -2] Name or service not known

I have a script that uploads files in a directory to s3 using the boto3 AWS SDK. I've been using it for weeks with no issue, and today it will upload a random amount of files (between 1-3 out of th...

past pewter May 6, 2020, 6:26 PM

#

I haven't encountered that one but just saw this:
https://boto-users.narkive.com/l82CTCe4/cannot-create-a-bucket-following-simple-web-example-why

#

any chance it's on your end and not a boto3 thing?

mossy crow May 6, 2020, 6:27 PM

#

I didn't change anything between yesterday and today

#

I think it might be a DNS issue

past pewter May 6, 2020, 6:28 PM

#

Yeah, that's what they concluded on the linked thread

mossy crow May 6, 2020, 6:28 PM

#

Yeah.

#

Hmm ok

mossy crow May 6, 2020, 7:00 PM

#

@past pewter I turned on DEBUG logging for the function, and then it worked flawlessly, so, guess i'll never know what actually was the issue.

past pewter May 6, 2020, 7:06 PM

#

🤣

gritty solstice May 6, 2020, 7:24 PM

#

hopefully quick question...
How do I get bytes to stop automatically showing in their ascii values?

IE: b'L' to b'\x4C'

#

.hex()

NVM, thanks guys
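#

For anyone else wondering, a short sketch: the repr of a bytes object shows printable ASCII as characters, `.hex()` gives the raw hex digits, and an escaped string can be built by hand if the `\x` form is needed.

```python
data = b"L"

print(data)        # b'L'
print(data.hex())  # 4c

# Build the \xNN form explicitly, one escape per byte.
escaped = "".join(f"\\x{byte:02x}" for byte in data)
print(escaped)     # \x4c
```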

tribal granite May 7, 2020, 1:54 AM

#

Hey guys, do you think Docker is worth learning for data science?

flat quest May 7, 2020, 3:13 AM

#

yeah its used pretty commonly with the data science / ml libraries

#

@deft harbor
if ur running np.random.uniform every time u want to run that if statement its not going to be much faster than a regular random generator

most of numpy's speed comes from vectorizing i.e. doing multiple computations at once. A better way may be to precreate a matrix of probabilities to refer to with the if statement

There might be a even faster way than that, tho i'm not sure what that might be
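#

Something like this, as a sketch. The step count and seed are made up, and the two branches are placeholders for the real training calls.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n_batches = 10_000              # hypothetical number of training steps
flip_prob = 0.05

# One vectorized draw up front instead of one random call per step.
flip = rng.random(n_batches) < flip_prob

for step in range(n_batches):
    if flip[step]:
        pass  # train this batch with swapped labels
    else:
        pass  # train this batch with the true labels
```

In practice a single random draw is likely tiny compared to a training step, so this mostly tidies the loop rather than speeding it up noticeably.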

deft harbor May 7, 2020, 3:19 AM

#

that makes a lot sense

#

        # Generate noisy labels for discriminator ------------
        # 5% of the time, switch the labels       
        if np.random.uniform(0.0, 1.0) <= 0.05:
            # Labels swapped
            dis_loss_real = cdis.train_on_batch([X_real, label_batch], y_fake)
            dis_loss_fake = cdis.train_on_batch([X_fake, L_fake], y_real-smoother)
        else:
            # True labels
            dis_loss_real = cdis.train_on_batch([X_real, label_batch], y_real-smoother)
            dis_loss_fake = cdis.train_on_batch([X_fake, L_fake], y_fake)

#

That is what I have now, but I knew there had to be a way to speed it up.

#

I'm fighting the model while also trying to figure out speed.

#

the model won

#

that is per batch, which is 64 out of like 1,000,000

flat quest May 7, 2020, 3:27 AM

#

what kinda model are u making?

reef flume May 7, 2020, 3:33 AM

#

What exactly is an AJAX API and how will it influence web scraping with BeautifulSoup? I'm trying to parse data from a table, but upon parsing, the extracted data (specifically from the table) differs from that of the source code found on Google Chrome. Almost as if I can't access it.

flat quest May 7, 2020, 3:33 AM

#

whats with the dis_loss_real and dis_loss_fake btw 😂

#

uh ajax is just for getting data asynchronously, using the js event loop
haven't used beautifulsoup yet, so not sure how it integrates with that

terse torrent May 7, 2020, 3:39 AM

#

What SQL databases should I be sure to learn? I learned IBM's

deft harbor May 7, 2020, 3:51 AM

#

@flat quest a conditional gan

#

📎 unknown2.png

#

discriminator loss on real vs generated images (fake)

#

📎 Screenshot_from_2020-05-06_20-37-28.png

#

early epochs

opaque stratus May 7, 2020, 5:08 AM

#

Hello Everyone,

I am a soon-to-be sophomore in college with strong interests in Machine Learning/Data Science and I am currently trying to decide on a major. As of now, I am majoring in mathematics, and I believe I like it enough. But please, do you guys have any advice with this? Those of you working in some field within data science, what was your major? What are some good majors for data science in general? Thanks! Data science is so vast, I am just sure there are many strong majors to help prepare one for the field that I could perhaps like even more!

flat quest May 7, 2020, 5:31 AM

#

ohh gotcha
yeah haven't worked with gan's yet,
are u using custom layers to build the gan? or the built in ones?

lunar holly May 7, 2020, 5:54 AM

#

Hello! I'm having a bit of trouble, working on a school project where I'm using K-Nearest Neighbors for classification. I'm trying to figure out if/when it would be wise to omit certain columns from my data in order to improve model accuracy? Certain features of my data, after EDA, seem to be pretty useless in relation to my target variable... I removed them, and from testing it seems I was actually able to improve my error rate

flat quest May 7, 2020, 5:58 AM

#

features that have very little relevance to the output should be removed
they're generally just providing extra noise, and that can create greater variance within the model

But make sure that they aren't related, otherwise you may be taking out important predictors from your data

lunar holly May 7, 2020, 6:00 AM

#

Got it, thanks! ^^ The features are all categorical, and from what I've seen it doesn't seem like there's a signal or anything indicating some type of trend

flat quest May 7, 2020, 6:01 AM

#

yeah then its probably safe to remove them, the only real determinant of the performance of a model is having a strong predictor within the data. Sometimes you can even get rid of like all of the features, and have a better performance lol

lunar holly May 7, 2020, 6:12 AM

#

And it keeps getting better ^^ the more I take out
Thanks once again

lapis sequoia May 7, 2020, 8:19 AM

#

https://www.coursera.org/specializations/data-science-python

Coursera

Applied Data Science with Python | Coursera

Learn Applied Data Science with Python from University of Michigan. The 5 courses in this University of Michigan specialization introduce learners to data science through the python programming language. This skills-based specialization is ...

#

Hey guys, is this a good course to start in data science ?

#

I haven't done any data science yet

#

I only know python

faint furnace May 7, 2020, 9:36 AM

#

looks good i should also enroll

cunning wadi May 7, 2020, 11:58 AM

#

hello chaps

#

Is this the right place if you need help with like decision trees and stuff

faint furnace May 7, 2020, 12:19 PM

#

Question is "Which actor - director pair is most successful (in terms of IMDB ratings)?" . So I was able to create this data set with Actor - Director pairs and their ratings in movies. But clearly I cannot judge which pair is best without taking into account the number of movies they have done together. Any recommendation on how to deal with this?

📎 unknown.png

polar acorn May 7, 2020, 12:26 PM

#

@cunning wadi Sure

cunning wadi May 7, 2020, 12:29 PM

#

Cool

#

I'm new to this kind of stuff and was wondering what I'm missing here cause something is clearly wrong

📎 unknown.png

polar acorn May 7, 2020, 12:48 PM

#

What do you suppose is wrong there? I mean cells 6-8 apparently do nothing at all. As for the results, we don't really know your data, a 98% accuracy might be nothing out of the ordinary.

cunning wadi May 7, 2020, 12:53 PM

#

Let me try and clean it up a bit

#

📎 unknown.png

#

Maybe this makes more sense

#

My question is how do i increase the accuracy?

polar acorn May 7, 2020, 1:06 PM

#

Yes that makes more sense :D. One thing you can check is how well your model predicts the training data. If your training data accuracy is very high but the test data accuracy is very low, you might suffer from overfitting, in which case you can limit the maximum depth of your decision tree. If both testing and training are bad maybe you would want a deeper tree, then again maybe that's the best your model can give you for that data. Looking into overfitting is probably where I would start at least.
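#

A rough sketch of that check with scikit-learn. The data here is synthetic (`make_classification`), so the numbers only illustrate the pattern, not your dataset, and `max_depth=3` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for depth in (None, 3):  # None lets the tree grow until it fits the training set
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, test={scores[depth][1]:.2f}")
```

A big gap between the train and test score for the unlimited tree is the overfitting sign; a limited depth usually narrows that gap.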

flat bough May 7, 2020, 2:38 PM

#

hi, I want to get familiar with machine learning. Can you recommend the best way to start?

fervent bridge May 7, 2020, 2:51 PM

#

Sentdex on youtube has some good courses. @flat bough

flat bough May 7, 2020, 2:52 PM

#

Thank you

fervent bridge May 7, 2020, 2:52 PM

#

Building my portfolio my code is running for approximately 30 hours to gather first touched data so excited 🙂

lapis sequoia May 7, 2020, 2:53 PM

#

hey guys can anyone tell that which course is best to start in data science out of these two?
https://www.coursera.org/specializations/jhu-data-science?siteID=OyHlmBp2G0c-0328ZKV34mF3.yMgOBpdWA&utm_content=2&utm_medium=partners&utm_source=linkshare&utm_campaign=OyHlmBp2G0c

Coursera

Data Science | Coursera

Learn Data Science from Johns Hopkins University. Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science ...

#

https://www.coursera.org/specializations/data-science-python

Coursera

Applied Data Science with Python | Coursera

Learn Applied Data Science with Python from University of Michigan. The 5 courses in this University of Michigan specialization introduce learners to data science through the python programming language. This skills-based specialization is ...

#

i have learned python but dont know R language

fervent bridge May 7, 2020, 2:53 PM

#

I was looking at Data from Github and I saw that R was not widely used as other languages.

deft harbor May 7, 2020, 2:56 PM

#

@flat quest mix of custom convolution layers in the generator and discriminator. The gan model itself is trained using python train_on_batch loops.

flat quest May 7, 2020, 5:09 PM

#

gotcha,
yeah its something i'll have to look into pretty soon

Trying to find better ways to deal with categorical data, it has to be in numbers, but ordinal data represents relationships that don't exist, same with one-hot (although its better)

#

@cunning wadi ya ur definitely overfitting, use the tree parameters to limit the depth / nodes of the tree

And then try using ensemble methods (xgboost is prob ur best choice here) to try to further increase accuracy.

umbral aspen May 7, 2020, 7:19 PM

#

I am still struggling to find simple examples of using tensorflow to classify a lot of images at once. Should I just loop through all files and classify them one by one? I also tried the predict_on_batch method, however that only returns one result for the entire numpy array of images I send to it...

#

Anyone have any ideas?

flat quest May 7, 2020, 7:24 PM

#

tf models take batches by default u can just use the predict method
no use batches, it has better performance

create a tf dataset or do it manually
but run a for loop and each iteration get a batch of elements. Feed those elements using model.predict
and u'll get a batch of predictions

glad obsidian May 7, 2020, 7:32 PM

#

Is this the right place to ask questions about python, especially jupyter notebooks?

umbral aspen May 7, 2020, 7:48 PM

#

@flat quest Thanks I will try that!

flat quest May 7, 2020, 7:48 PM

#

yeah go for it, jupyter notebooks generally related to data science

#

np

glad obsidian May 7, 2020, 7:52 PM

#

I'm currently working on a knn-algorithm
Got a function which produces a list like that,
label are values like 0, 0.3232, 1 ,....
now i want to count every label based on its occurence
and save the label with the highest occurence as a new variable
I am only allowed to use numpy, itertools, pandas and maths

[([Vector1], dist, label), ([Vector2], dist, label), ([Vector3], dist, label)]

Heres an example output for 1 Vector:

([0.2711736617240513, 0.014151057562208049, 0.125], 0.0, 0)

Desired Output:

1: 30 , 0.2: 4, 0: 3
a =1
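One possible way to get that tally with numpy alone, sketched on made-up neighbours in the `(vector, dist, label)` layout described above. Labels are compared by exact float equality, which is an assumption about how they are produced.

```python
import numpy as np

# Hypothetical neighbours; vectors and distances are invented for illustration.
neighbours = [
    ([0.27, 0.01, 0.125], 0.0, 0),
    ([0.30, 0.02, 0.250], 0.1, 1),
    ([0.31, 0.05, 0.500], 0.2, 1),
    ([0.40, 0.07, 0.125], 0.3, 0.2),
]

labels = np.array([label for _, _, label in neighbours])
values, counts = np.unique(labels, return_counts=True)

# Label -> count, most frequent first (stable sort keeps ties in label order).
order = np.argsort(-counts, kind="stable")
tally = {float(values[i]): int(counts[i]) for i in order}

# Most frequent label; ties resolve to the smallest label value.
a = float(values[np.argmax(counts)])

print(tally, a)
```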

umbral aspen May 7, 2020, 7:52 PM

#

@flat quest The predict method also seems to only return one prediction even when I pass an array. Any ideas?

images = []
files = os.listdir('../img/raw/category')
for i in range(len(files)):
    img = cv2.imread(f'../img/raw/category/{files[i]}')
    img = cv2.resize(img, (224, 224),3)
    img = np.array(img).astype(np.float32)/255.0
    img = np.expand_dims(img, axis=0) 
    images.append(img)

predictions = new_model.predict(images)

flat quest May 7, 2020, 7:55 PM

#

check the shape of the images
and whats the input shape of the model? It should be (None, input_dims, ...)
The None is the batch dimension

umbral aspen May 7, 2020, 7:55 PM

#

My input shape was input_shape=(224, 224, 3)

flat quest May 7, 2020, 7:57 PM

#

hmmmm and whats the shape of the images array?

lapis sequoia May 7, 2020, 8:00 PM

#

any pandas experts here?

umbral aspen May 7, 2020, 8:01 PM

#

Hmm not sure how to really see that...I do know that it works if I use predict on the individual images though...

#

I am fairly new to ML

lapis sequoia May 7, 2020, 8:01 PM

#

does anybody have any idea why pandas reads everything as columns and no rows in this situation?

📎 Skjermbilde_2020-05-07_kl._21.45.17.png

umbral aspen May 7, 2020, 8:02 PM

#

that looks like one string

#

loaded into the first cell of the dataframe

#

Show us the source of where you create your data frame and we can check

lapis sequoia May 7, 2020, 8:02 PM

#

ait ill send link now

#

basically nasdaq stockholm

#

http://www.nasdaqomxnordic.com/indexes/historical_prices?Instrument=SE0000337842

Historical prices OMXS30, OMX Stockholm 30 Index, (SE0000337842) - ...

Historical prices

#

and you can download the csv file below

#

@umbral aspen forgot to tag u

umbral aspen May 7, 2020, 8:07 PM

#

Can you add your code

lapis sequoia May 7, 2020, 8:08 PM

#

📎 Skjermbilde_2020-05-07_kl._22.08.25.png

umbral aspen May 7, 2020, 8:10 PM

#

Your code looks fine - I also just ran the same on my side and it is creating a dataframe and interpreting the columns correctly

lapis sequoia May 7, 2020, 8:11 PM

#

what

#

can you send your code?

#

maybe there is something im missing

umbral aspen May 7, 2020, 8:11 PM

#

It is the exact same as yours 🙂

lapis sequoia May 7, 2020, 8:12 PM

#

even more confusing

#

@umbral aspen can you do me one more favour?

#

can u open up the csv file in a text editor so i can see how it looks like from your side?

#

and just screenshot

flat quest May 7, 2020, 8:28 PM

#

@umbral aspen try converting ur list to an np.array and calling the .shape property on the np array

umbral aspen May 7, 2020, 8:37 PM

#

this is the shape (189, 1, 224, 224, 3)

#

@lapis sequoia It looks like this

sep=;
Date;Highprice;Lowprice;Closingprice;Averageprice;Totalvolume;Turnover;
2020-05-06;1,537.55;1,520.71;1,521.40;;1;;
2020-05-05;1,539.40;1,515.41;1,538.18;;1;;
2020-05-04;1,531.99;1,505.21;1,505.21;;1;;
2020-04-30;1,606.80;1,577.90;1,577.92;;1;;
2020-04-29;1,602.34;1,559.63;1,600.86;;1;;
2020-04-28;1,571.31;1,538.26;1,568.50;;1;;
2020-04-27;1,543.42;1,528.14;1,539.51;;1;;
2020-04-24;1,529.19;1,511.28;1,514.13;;1;;
2020-04-23;1,547.48;1,512.68;1,541.89;;1;;
2020-04-22;1,527.69;1,503.30;1,527.63;;1;;
2020-04-21;1,525.60;1,493.45;1,493.45;;1;;
2020-04-20;1,548.68;1,518.12;1,541.17;;1;;
2020-04-17;1,536.84;1,520.72;1,534.55;;1;;
2020-04-16;1,499.91;1,469.14;1,483.79;;1;;
2020-04-15;1,534.41;1,478.82;1,480.60;;1;;
2020-04-14;1,538.91;1,516.50;1,535.84;;1;;
2020-04-09;1,526.95;1,495.79;1,498.76;;1;;
2020-04-08;1,502.88;1,485.02;1,499.51;;1;;
2020-04-07;1,514.17;1,483.19;1,510.85;;1;
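The cause of the parsing problem was never pinned down in the chat. For a file shaped like the sample above, one thing that may help is telling pandas about the `sep=;` hint line, the semicolon delimiter and the comma thousands separator. This is a sketch on two rows copied from the sample, not the code from the screenshot.

```python
import io
import pandas as pd

raw = """sep=;
Date;Highprice;Lowprice;Closingprice;Averageprice;Totalvolume;Turnover;
2020-05-06;1,537.55;1,520.71;1,521.40;;1;;
2020-05-05;1,539.40;1,515.41;1,538.18;;1;;
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",        # fields are separated by semicolons
    skiprows=1,     # skip the "sep=;" hint line meant for Excel
    thousands=",",  # "1,537.55" should parse as 1537.55
)

# The trailing ";" and the empty fields leave all-NaN columns; drop them.
df = df.dropna(axis=1, how="all")
print(df)
```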

lapis sequoia May 7, 2020, 8:38 PM

#

so basically the same..

#

alright, thanks for the help buddy

umbral aspen May 7, 2020, 8:38 PM

#

no problem

delicate rune May 7, 2020, 8:44 PM

#

Hi guys, I'm working on a world map with 'folium'. It works perfectly for my needs, but if there is any other library in python that does the same job or a better job than 'folium', tell me!!

flat quest May 7, 2020, 8:47 PM

#

yeah @umbral aspen that image shape doesn't coincide with the input_shape thats expected

the model expects an input_shape of (batch_size, 224,224, 3)
u have an extra 1 there
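A short numpy sketch of where the extra 1 comes from and two ways to remove it. The dummy zero arrays stand in for the real images.

```python
import numpy as np

# Three dummy "images", each already expanded to (1, 224, 224, 3) as in the loop above.
images = [np.zeros((1, 224, 224, 3), dtype=np.float32) for _ in range(3)]

stacked = np.array(images)                # (3, 1, 224, 224, 3): one axis too many
batch = np.concatenate(images, axis=0)    # (3, 224, 224, 3): matches (batch_size, 224, 224, 3)
also_batch = np.squeeze(stacked, axis=1)  # same result starting from the 5-D array

print(stacked.shape, batch.shape, also_batch.shape)
```

Dropping the `np.expand_dims` call in the loop and calling `np.stack(images)` afterwards would give the same 4-D batch.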

umbral aspen May 7, 2020, 8:48 PM

#

So would batch_size be the amount of photos I want to predict the classification for

#

Or the batch_size I used to train?

flat quest May 7, 2020, 8:50 PM

#

it can be any
it doesnt really matter, tho its preferred to use the same batch_size that u used during training

lapis sequoia May 7, 2020, 8:58 PM

#

So I have one column in pandas, and I want to split it in half and put them side by side in a new excel sheet

#

I'm up to the point where I have them side by side, but there is a gap of empty cells, does that make sense

#

How do I slide up the first two columns so they are at row 0?

📎 Screen_Shot_2020-05-07_at_4.59.44_PM.png

#

without using the names of the columns, I only want to use the numbered names of the columns if that makes sense

surreal ivy May 7, 2020, 10:01 PM

#

does anybody have any idea why pandas reads everything as columns and no rows in this situation?
@lapis sequoia show us the command.

lapis sequoia May 7, 2020, 10:22 PM

#

i just found a similar dataset with better formatting tbh lol.. appreciate the help tho

unreal thistle May 7, 2020, 10:22 PM

#

hello guys

#

sorry for bothering

#

can someone explain to me how i can choose the learning rate, epochs and batch size for a machine learning process

#

?

lapis sequoia May 7, 2020, 10:30 PM

#

@unreal thistle i know nothing about this, but have you tried asking for help in the "help" rooms?

unreal thistle May 7, 2020, 10:36 PM

#

@lapis sequoia i solved it thanks anyway 🙂

woven saffron May 7, 2020, 10:45 PM

#

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(
    optimizer='adam', 
    loss=loss_fn, 
    metrics=['accuracy']
)

model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test, y_test, verbose=2)

model.summary()
model.save('mnist.model')


probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])

probability_model.save('mnist_probability.model')

I have the following code. I can properly do predictions with the probability model generated and it gives me correct output, and I can save the probability model. However, when I load the probability model I get the following error: ValueError: An empty Model cannot be used as a Layer. Anyone know why?

#

If needed I can just construct this model where I load it and call it a day

#

This code is straight off tensorflow website

flat quest May 7, 2020, 10:46 PM

#

how are u loading the model? can u show the code for the loading part?

woven saffron May 7, 2020, 10:46 PM

#

# Load model
model = tf.keras.models.load_model('mnist.model')
pmodel = tf.keras.models.load_model('mnist_probability.model')

#

Top one works

#

Bottom one doesn't

#

It is only the probability model

#

Is it because probability depends on another model?

#

And somehow I need to save that model within it?

flat quest May 7, 2020, 10:48 PM

#

i dont think so
hold on

woven saffron May 7, 2020, 10:49 PM

#

Oh wait

#

I need to do compile=False I think

#

@flat quest

#

It is a load_model kwarg

flat quest May 7, 2020, 10:50 PM

#

yea maybe, try that

woven saffron May 7, 2020, 10:50 PM

#

Nvm still not working

#

Traceback (most recent call last):
  File ".\app.py", line 11, in <module>
    pmodel = tf.keras.models.load_model('mnist_probability.model', compile=False)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\saving\save.py", line 150, in load_model
    return saved_model_load.load(filepath, compile)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\saving\saved_model\load.py", line 86, in load
    model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\saved_model\load.py", line 541, in load_internal
    export_dir)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\saving\saved_model\load.py", line 103, in __init__
    self._finalize()
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\saving\saved_model\load.py", line 127, in _finalize
    node.add(layer)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\training\tracking\base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\engine\sequential.py", line 170, in add
    batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
  File "C:\Users\Ryan\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 1776, in get_input_shape_and_dtype
    raise ValueError('An empty Model cannot be used as a Layer.')
ValueError: An empty Model cannot be used as a Layer.

#

Full traceback

flat quest May 7, 2020, 10:58 PM

#

hmm works for me when I run the code

woven saffron May 7, 2020, 10:58 PM

#

Really?

#

So you can save and load the model?

flat quest May 7, 2020, 10:58 PM

#

yeah

woven saffron May 7, 2020, 10:59 PM

#

Let me just wipe my saved models from disk

#

And rerun it

#

And see what happens

flat quest May 7, 2020, 10:59 PM

#

yeah try that

#

wait

#

oh nvr mind

#

yeah try it

woven saffron May 7, 2020, 11:00 PM

#

did yours stop working?

flat quest May 7, 2020, 11:00 PM

#

nah i was wondering why u were using tf.keras.Sequential
instead of tf.keras.models.Sequential

but it worked for ur initial model so that shouldnt be an issue

woven saffron May 7, 2020, 11:00 PM

#

Oh yeah

#

Idk if they just bring it up a layer

#

Or if its different

#

Lemme change it to be safe

#

Yeah same error for me

flat quest May 7, 2020, 11:02 PM

#

does the code work if u take out the load_model for the second model?

woven saffron May 7, 2020, 11:02 PM

#

Yeah

#

I am using tensorflow version 2.0.0 and python 3.7.7

#

Hbu?

#

Ultimately I just need the confidence values given by predict

#

However whenever I use a single model and change the activation of the last layer to softmax

#

I get incorrect outputs

flat quest May 7, 2020, 11:07 PM

#

pretty much same versions im using tf 2 as well
u could always change the logits to probability directly urself, using the conversion equation e^yi / sum(e^yj)

but it should work normally
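The conversion equation above, e^yi / sum(e^yj), written out in numpy as a sketch. Subtracting the row maximum before exponentiating is a common stability trick and does not change the result. The logits are made up.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Convert logits to probabilities along the given axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=axis, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])  # made-up logits
probs = softmax(logits)
print(probs.sum(axis=-1))  # each row sums to 1
```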

#

can u check if there is a folder thats created called probability_model and check the contents inside of it?

#

or mnist_probability.model

woven saffron May 7, 2020, 11:08 PM

#

📎 unknown.png

#

assets is empty but variables has stuff in it

flat quest May 7, 2020, 11:10 PM

#

hm, somehow the original model is empty
What does it say when u run model.summary() on the probability model?

woven saffron May 7, 2020, 11:11 PM

#

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
sequential (Sequential)      (None, 10)                101770
_________________________________________________________________
softmax (Softmax)            (None, 10)                0
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________

flat quest May 7, 2020, 11:12 PM

#

and u can run it as expected?

woven saffron May 7, 2020, 11:12 PM

#

Correct, here is some output

#

[[3.80928800e-09 1.10353615e-10 1.40276351e-08 2.80762464e-02
  8.11891407e-20 9.71923769e-01 2.31030445e-13 3.15607471e-11
  2.01315395e-10 1.41472851e-08]
 [9.99963284e-01 9.15403656e-11 3.66451968e-05 3.89118000e-08
  3.50633727e-14 1.38634775e-08 2.98542346e-08 1.21814037e-09
  3.78561360e-09 6.25292316e-08]
 [2.68988494e-08 2.68716594e-05 3.06581824e-05 2.91746710e-05
  9.99746859e-01 3.31298179e-06 2.84864218e-05 8.04476658e-05
  4.74372700e-06 4.93127300e-05]
 [4.18286461e-10 9.99978781e-01 1.48045337e-05 3.34784325e-08
  5.45938263e-08 2.90107138e-08 3.23322631e-08 5.08780204e-06
  1.27303042e-06 2.15702745e-09]
 [1.18402343e-10 3.49863080e-06 2.44232268e-07 2.54041806e-05
  7.62854412e-04 2.78798484e-06 2.42455744e-09 4.18028831e-05
  1.83790416e-05 9.99145031e-01]]

#

I predicted the first 5 items from the training set

flat quest May 7, 2020, 11:13 PM

#

gotcha

woven saffron May 7, 2020, 11:14 PM

#

I might just build the model in the app where I load the non-probability model

worldly elm May 7, 2020, 11:15 PM

#

How do I slide up the first two columns so they are at row 0?
@lapis sequoia iterate through each row of that column using df.loc, make its value equal to the +6 row, then remove the last 5 rows.
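A vectorised alternative to the row-by-row loop, sketched on a made-up single-column frame since the real data only appears in a screenshot. It selects by position with `iloc`, so no column names are needed.

```python
import pandas as pd

# Hypothetical single-column frame standing in for the real data.
df = pd.DataFrame({0: [10, 11, 12, 13, 14, 15]})

half = len(df) // 2
top = df.iloc[:half, 0].reset_index(drop=True)     # first half, indexed from 0
bottom = df.iloc[half:, 0].reset_index(drop=True)  # second half, re-indexed from 0

# concat with axis=1 aligns on the index, so resetting it removes the gap of empty cells.
side_by_side = pd.concat([top, bottom], axis=1, ignore_index=True)
print(side_by_side)
```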

unreal thistle May 7, 2020, 11:15 PM

#

guys sorry for bothering again, i'm just trying to understand some things. i've seen this deep learning model where he is constructing the head of the model that will be placed on top of the base model

#

what does it mean

#

?

woven saffron May 7, 2020, 11:16 PM

#

it adds an additional layer to the model

flat quest May 7, 2020, 11:17 PM

#

yeah its really weird, cause there's no problems when i run it and our package versions are the same

woven saffron May 7, 2020, 11:17 PM

#

im assuming you're on windows?

flat quest May 7, 2020, 11:18 PM

#

im on a mac rn
but im using colab

woven saffron May 7, 2020, 11:18 PM

#

Can you explain to me what an EagerTensor is?

flat quest May 7, 2020, 11:18 PM

#

eagerexecution?

woven saffron May 7, 2020, 11:19 PM

#

Yeah, and an EagerTensor is a class I guess

#

My prediction is an EagerTensor

flat quest May 7, 2020, 11:19 PM

#

yeah basically a tensor within an eagerexecution environment

unreal thistle May 7, 2020, 11:19 PM

#

@woven saffron why is the head of the layer needed in training?

woven saffron May 7, 2020, 11:20 PM

#

So drag

#

I was doing this before

prediction = pmodel(digit_data)
confidence = prediction.tolist()[0]

#

However now I am getting this error

#

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'tolist'

#

The output type of my prediction has changed

flat quest May 7, 2020, 11:21 PM

#

it just means that instead of placeholders, the outputs are directly displayed
in tf 1.x, everything was done inside a session, so all variables were placeholders, and outputs were placeholders as well, until we actually ran the session

#

anything in 2.x should default to an eager execution environment
just convert the values to numpy using .numpy()

woven saffron May 7, 2020, 11:22 PM

#

Ah I see

#

If I do a single prediction

#

Will it still give me back a 2d array?

#

Of length 1

flat quest May 7, 2020, 11:23 PM

#

yeah

woven saffron May 7, 2020, 11:23 PM

#

Awesome, thanks

#

Let me try this

flat quest May 7, 2020, 11:24 PM

#

np

woven saffron May 7, 2020, 11:26 PM

#

My model has a 98% accuracy but when I sent it a 1

#

It predicts 3

#

That is pretty strange

flat quest May 7, 2020, 11:26 PM

#

u sent it a 1? what do you mean

woven saffron May 7, 2020, 11:26 PM

#

It is the mnist dataset of hand drawn digits

flat quest May 7, 2020, 11:26 PM

#

ah

woven saffron May 7, 2020, 11:26 PM

#

I think I know why it is doing that though one sec

flat quest May 7, 2020, 11:27 PM

#

make sure ur using argmax on the last axis

#

or the results get messed up

woven saffron May 7, 2020, 11:27 PM

#

I was doing np.argmax(prediction)

#

Instead of on the first numpy array

#

prediction.numpy()[0]

#

I think that mightve screwed it up

#

 prediction = pmodel(digit_data)
 confidence = prediction.numpy()[0].tolist()
 highest = np.argmax(confidence)

flat quest May 7, 2020, 11:28 PM

#

yeah, easier way to do it is just run it on the last axis, so it can work with any batch size

woven saffron May 7, 2020, 11:28 PM

#

This is my code

#

I am turning it to a list because the data is being returned in an API

#

So it needs to be JSON safe

flat quest May 7, 2020, 11:28 PM

#

ah

#

i would do np.argmax(preds, -1)
then convert the resulting array into a list
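A small sketch of the difference between `np.argmax` on the whole array and on the last axis, plus the list conversion for a JSON response. The probabilities are invented, and the payload keys simply mirror the ones used elsewhere in this chat.

```python
import json
import numpy as np

# Made-up probabilities for a batch of two 4-class predictions.
preds = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.05, 0.05, 0.1, 0.8]])

flat = np.argmax(preds)              # index into the flattened array, not a class
per_row = np.argmax(preds, axis=-1)  # one class index per row, any batch size

# Plain Python lists are JSON safe; numpy arrays are not.
payload = {"confidence": preds.tolist(), "prediction": per_row.tolist()}
print(flat, per_row, json.dumps(payload))
```

With a batch of one, both calls return the same index, so switching to the last axis only matters once more than one row is passed.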

woven saffron May 7, 2020, 11:28 PM

#

Alright will do

#

preds being the numpy array?

flat quest May 7, 2020, 11:29 PM

#

yeah the predictions

#

eager tensors automatically get converted to np arrays when u pass them to numpy functions

woven saffron May 7, 2020, 11:30 PM

#

prediction = pmodel(digit_data)
highest = np.argmax(prediction, -1)
like this?

flat quest May 7, 2020, 11:30 PM

#

ya

#

if ur using a batch size of one the result will still be a 2d array, so u can remove extra dims if u want using [0]

woven saffron May 7, 2020, 11:33 PM

#

Yeah unfortunately the prediction is still wrong

#

Same value

#

{'confidence': [1.6521667149409522e-18, 2.794477973674936e-12, 0.002019522013142705, 0.9946495890617371, 0.0, 0.002819732530042529, 2.1464751850941433e-11, 0.0005111345089972019, 3.013474395628175e-32, 9.863308066247621e-36], 'prediction': 3}

#

Here is the image

#

📎 im.png