#data-science-and-ml | Python | Page 232

flat plank Jul 2, 2020, 3:02 PM

#

Hi everyone, new to the channel and to data science as a hobby. I've been working on my first toy project, which I can't stop adding to.

#

I have a simple cold-start recommender using get_dummies, then moved on to run sentiment analysis using VADER and topic modeling. I now want to combine the 3 dataframes into a hybrid, and would appreciate pointers to docs to read. Is this the right place to ask?

paper niche Jul 2, 2020, 3:22 PM

#

@tender wind So what you have is a list of JSON (raw data); whereas the training set should be a dataframe of features. E.g. you have an array of transactions per row; you would extract features like number of transactions, average transaction amount etc., and not input the entire transactions array into the ML model.

I would just map functions that extract individual features on the list of dictionaries (converted over from JSON), for example:

import numpy as np
import pandas as pd

json_list = [...]  # your raw data (map `json.loads` on these if need be)

# define a few functions here to extract specific features from your raw data
def number_of_transactions(row):
    return len(row['transactions'])

def avg_transaction_amt(row):
    # note: np.mean of an empty list returns nan (with a RuntimeWarning)
    return np.mean([txn['transactionAmount'] for txn in row['transactions']])

# map these functions onto json_list and concatenate the results together
feature_fns = [number_of_transactions, avg_transaction_amt]
feature_list = [[f(row) for f in feature_fns] for row in json_list]
df_X = pd.DataFrame(feature_list, columns=[f.__name__ for f in feature_fns])

Of course, there are plenty of ways to clean this up, but this is one possible way to approach the extraction process. Basically, the idea is to write functions, each corresponding to one feature that you want to extract from the "raw data".

lapis galleon Jul 2, 2020, 3:33 PM

#

Hello, I am trying to add a column to a dataframe to later obtain the frequencies. However, when I add the column I get the following warning:

```
In[]:
a=FFMonthsXY[2] #FFMonths[2] is a dataframe
a.loc[:, 'count']=1
print(a.head(3))

Out:
   X  Y  count
0  1  3     1
1  2  2     1
2  2  2     1
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py:845: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
```

acoustic halo Jul 2, 2020, 3:56 PM

#

@dull turtle 175 images is a really small dataset, you really need a lot more than that, that's probably why it isn't working very well

#

in fact, if your program always guesses invalid image for everything, it would still have accuracy higher than 50%

#

if your validation set is roughly the same

queen jungle Jul 2, 2020, 4:08 PM

#

What is the best method for saving a sklearn model? What happens to me is that when I save the model to a pickle file, its accuracy is always different (usually worse) from its original value. On top of that, when I loop over fitting the model to find the best one possible and dump it to a pickle file each time a better one is found (this could mean dumping a model 30+ times), I almost always end up with the worst version of my model.

acoustic halo Jul 2, 2020, 4:10 PM

#

Is the model so big that you cant just have a list of models?

queen jungle Jul 2, 2020, 4:11 PM

#

Well the data it's using size is 575154.
I didn't know that was possible

acoustic halo Jul 2, 2020, 4:11 PM

#

also i think sklearn recommends joblib, though it's effectively the same as pickle

#

And fwiw, if you dump a model, reload it and it is different, it's more likely you are dumping it wrong somehow

queen jungle Jul 2, 2020, 4:15 PM

#

well I'm using the normal syntax, the one from the sklearn docs: `s = pickle.dumps(clf)` then `clf2 = pickle.loads(s)`
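
One way to rule out a stale or overwritten file is to keep only the best model in memory during the search and serialize it once at the end. The following is a minimal sketch under stated assumptions: `fit_and_score` is a hypothetical helper, and the "model" is a plain dict standing in for a fitted estimator. A fitted sklearn estimator could be passed through `pickle.dumps` / `pickle.loads` in the same way.

```python
import copy
import pickle
import random


def fit_and_score(seed):
    """Hypothetical helper: 'train' a toy model and return (model, score).

    The model is a plain dict standing in for a fitted estimator.
    """
    weight = random.Random(seed).random()
    model = {"seed": seed, "weight": weight}
    return model, weight  # toy score: higher is better


best_model, best_score = None, float("-inf")
for seed in range(30):
    model, score = fit_and_score(seed)
    if score > best_score:
        # Deep-copy so later iterations cannot mutate the stored best model.
        best_model, best_score = copy.deepcopy(model), score

# Serialize once, after the search, then verify the round trip.
blob = pickle.dumps(best_model)
restored = pickle.loads(blob)
print(restored == best_model)
```

If the reloaded model scores differently from the one in memory, the usual suspects are dumping the wrong variable, or evaluating on a different data split than before.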

tender wind Jul 2, 2020, 4:33 PM

#

@paper niche That is a neat way to go at it. I am still wondering how to catch the relationship between the payment method used for a transaction among all the payment methods a user might have...

pastel compass Jul 2, 2020, 4:36 PM

#

What's a good way to get the average of multiple vectors?

lapis sequoia Jul 2, 2020, 5:54 PM

#

OOh i have a good example! ```listOfVectors = [[1, 2, 3], [3,2,1]]
numberOfVectors = len(listOfVectors)
totalXTerms = 0
totalYTerms = 0
totalZTerms = 0

for i in listOfVectors:
    totalXTerms += i[0]
    totalYTerms += i[1]
    totalZTerms += i[2]

print("Averaged vector is: " + str([totalXTerms / numberOfVectors, totalYTerms / numberOfVectors, totalZTerms / numberOfVectors]))```

#

Efficiency is O(n) (three additions per vector), where n is the number of 3D vectors in the list

#

not bad

cursive sun Jul 2, 2020, 6:32 PM

#

@pastel compass np.mean(np.array(listOfVectors),axis=0)

pastel compass Jul 2, 2020, 6:45 PM

#

Thanks! @cursive sun @lapis sequoia

cursive sun Jul 2, 2020, 6:53 PM

#

I think the more fun question is how do you average vectors that aren't in R^n

#

e.g. S^2

#

so if I gave you pairs of days of the month and hours of the day, e.g. [28,10],[31,8],[7,2], etc

#

how would you meaningfully talk about averages

#

like, what's the average time of day something happens
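
For the time-of-day case, one common approach is a circular mean: map each value to a point on the unit circle, average the points, and convert the resulting direction back. This is a sketch of that idea, not the only possible definition of an average on a circle; the function name `circular_mean` is made up here.

```python
import math


def circular_mean(values, period):
    """Mean of values that wrap around, e.g. hours on a 24-hour clock."""
    # Map each value to an angle on the unit circle.
    angles = [2 * math.pi * v / period for v in values]
    # Average the unit vectors rather than the raw numbers.
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    # Convert the direction of the averaged vector back to the original scale.
    mean_angle = math.atan2(mean_sin, mean_cos) % (2 * math.pi)
    return mean_angle * period / (2 * math.pi)


hours = [23, 1]  # one hour before and one hour after midnight
naive = sum(hours) / len(hours)     # arithmetic mean lands on noon
wrapped = circular_mean(hours, 24)  # lands at midnight (0 or 24, up to rounding)
```

When the averaged vector is close to zero length (for example hours 0 and 12), the direction is not well defined, so the result should not be trusted in that case.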

lapis sequoia Jul 2, 2020, 6:56 PM

#

lol i forgot about numpy XD

#

i need to learn how to use that

#

i'm still so new to python

#

data science is sick

queen jungle Jul 2, 2020, 8:10 PM

#

Is there something I'm doing wrong here?

📎 unknown.png

sonic goblet Jul 2, 2020, 9:31 PM

#

I want to ask about tfidf vectorizer

#

suppose I have testing and training separate datasets

#

if i fit the tfidfvectorizer to training set only

#

can i use the vectorizer on testing set?

tight stone Jul 2, 2020, 10:12 PM

#

Hello,
I am currently working on a web-project (React) which takes in 2 images (simple shapes like a triangle or circle on a white canvas) as inputs, sends them to a backend (Python) where multiple samples are created by using randomized transformation-functions from opencv and then are finally used to build (compile) a tf.keras-model which is supposed to return probabilities on a 3rd image that determines as to how close it looks to the 2 input images and how much it does not look like them.
For example:
3rd image (circle) probabilities: 60% - input-image1 (circle), 30% input-image2 (square), 10% undefined

I know this is a lot to take in but my actual issue lies in the model that I compile.
While the model goes through its epochs it shows a stable increment in its growth, so, acc is rising and the loss is decreasing. Though, I still get very confusing results when I predict the probabilities. For some reason, my model sometimes returns probabilities that make no sense.
For example, I choose to draw a triangle and a square and predict my 3rd image where I draw a different triangle it sometimes returns something like: 9% triangle, 91% square.
And here I am now trying to understand these kind of results.

important note:
I have not worked on neural networks/machine learning before.
I also have limited knowledge about all the ways you can build a model for different kinds of cases. So, please excuse my lack of knowledge at this point.
I build my current model by watching a video from the youtuber sentdex with the title: Deep Learning with Python, TensorFlow, and Keras tutorial

paper niche Jul 3, 2020, 12:06 AM

#

@tender wind hmm, I assume you mean you want to do the 'join' between transaction and payment methods. That would preferably be done during the raw dataset generation stage (i.e., place each payment method struct inside the transaction struct ) -- do you have any influence over the generation of this JSON data whatsoever?

if not, you could just implement the left / inner join yourself using pure python..
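
A rough sketch of such a join in pure Python follows. The field names (`paymentMethodId`, `paymentMethods`, `id`, `type`) are invented for illustration, since the real JSON layout isn't shown here.

```python
def left_join(transactions, payment_methods, key="paymentMethodId"):
    """Attach the matching payment method to each transaction (left join)."""
    # Index the right-hand side once so each lookup is O(1).
    methods_by_id = {m["id"]: m for m in payment_methods}
    joined = []
    for txn in transactions:
        # .get returns None when there is no match, which keeps the row.
        joined.append({**txn, "paymentMethod": methods_by_id.get(txn.get(key))})
    return joined


# Hypothetical user record; the field names are assumptions.
user = {
    "transactions": [
        {"transactionAmount": 10.0, "paymentMethodId": "pm1"},
        {"transactionAmount": 25.0, "paymentMethodId": "pm9"},
    ],
    "paymentMethods": [{"id": "pm1", "type": "card"}, {"id": "pm2", "type": "paypal"}],
}

rows = left_join(user["transactions"], user["paymentMethods"])
# Keeping only matched rows turns the left join into an inner join.
inner = [r for r in rows if r["paymentMethod"] is not None]
```

Once each transaction carries its payment method, the per-feature functions can read from it like any other field.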

#

Is there something I'm doing wrong here?
@queen jungle you need to call fit on the CV object first before you can use transform. Alternatively you can use fit_transform() which does both for you.

#

@tight stone maybe start by looking at all the artificial/augmented training examples that opencv is throwing out. and compare them visually with the 3rd image you're drawing.

ripe forge Jul 3, 2020, 12:52 AM

#

if i fit the tfidfvectorizer to training set only
@sonic goblet that is the only correct way to do it, yes. And yes you are expected to use it on both train and test.
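
A small sketch of that workflow with scikit-learn's `TfidfVectorizer`; the example sentences are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["the cat sat on the mat", "the dog chased the cat"]
test_docs = ["the bird chased the dog"]  # "bird" never appears in training

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training set only.
X_train = vectorizer.fit_transform(train_docs)
# Reuse the fitted vocabulary on the test set; do not call fit again.
X_test = vectorizer.transform(test_docs)

# Both matrices share the same columns; unseen words such as "bird" are dropped.
print(X_train.shape, X_test.shape)
```

Fitting on the test set as well would leak information about it into the features, which is why only `transform` is used there.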

sullen glacier Jul 3, 2020, 1:08 AM

#

@everyone
Just shipped something you'll like! 🚀
Highly recommend this free chrome/firefox extension as a must-have. It automatically finds code implementations for machine learning papers anywhere on the web (Google, Arxiv, Twitter, Scholar, and other sites)
https://chrome.google.com/webstore/detail/mlai-code-implementation/aikkeehnlfpamidigaffhfmgbkdeheil
or
https://addons.mozilla.org/en-US/firefox/addon/code-finder-catalyzex/


quasi cape Jul 3, 2020, 3:05 AM

#

Anyone new to data science who wants to practice? I am in the same situation and need someone to practice with

leaden snow Jul 3, 2020, 3:51 AM

#

@quasi cape yep ,

quasi cape Jul 3, 2020, 3:55 AM

#

I am at very beginner level

leaden snow Jul 3, 2020, 4:19 AM

#

@quasi cape oh , no worries

lapis sequoia Jul 3, 2020, 4:30 AM

#

I know almost nothing about data science, but I'm willing to learn XD

#

I'm guessing it's generating statistics on data, but I'm not sure how useful that is XD

#

Unless the data is SUPER extensive, like to the point of violating privacy

#

Like recording what times a person is online, what they search on their web browser, etc

#

Then, I guess you could use that information to figure out what to advertise to them lol

ripe forge Jul 3, 2020, 5:39 AM

#

see, that's not the only side of things that has data

#

taking your same analogy, i can either research every habit of a person.. or, quite simply, i just observe my own trends as a proxy

#

so, if i find that brand x sold more than brand y of some item, i dont have to research into the buying habits of my customer, i already have the evidence of what was sold

#

data science usually tries to work with the "indicators" that the business has, not stuff that is private to an individual

lapis sequoia Jul 3, 2020, 5:57 AM

#

is anyone familiar with spark here

#

I'm wondering about the efficiency of processes

#

so, I know spark api's in python work at the spark level on the jvm processes

#

but python processes are also spawned .. and we have to maintain the data processing on the spark level

#

I'm wondering if wrapping pyspark code in python functions (def) slows down spark by adding overhead

dull turtle Jul 3, 2020, 6:03 AM

#

when i do batch size = 32 i am getting score= model.evaluate_generator(test_set) [14.239362716674805, 0.17073170840740204]

#

i am getting more loss and accuracy

dull turtle Jul 3, 2020, 7:46 AM

#

what can be wrong happening here?

#

i am having dropout (0.4), batch size = 32, epoch = 1500

indigo steppe Jul 3, 2020, 8:21 AM

#

b = np.array([[[1,2], [3,4]], [[5,6], [7,8]]])
print(b)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

So this is a 3d array in numpy, right? It is 3d because of the [[5,6], [7,8]] being one dimension and the other arrays are the other two dimensions, right?

ripe marlin Jul 3, 2020, 9:10 AM

#

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import numpy as np
iris=load_iris()
knn=KNeighborsClassifier()
X=iris.data
y=iris.target
X_train,X_test,y_train,y_test=train_test_split(X,y)
knn.fit(X_train,y_train)
knn.predict(X_test)

#

TypeError Traceback (most recent call last)
<ipython-input-20-b2700a2c1e9a> in <module>
----> 1 knn.predict(X_test)

TypeError: 'numpy.ndarray' object is not callable

#

what's going on here?

dull turtle Jul 3, 2020, 9:38 AM

#

now i am getting this way score= model.evaluate_generator(test_set) [11.011309623718262, 0.0]

#

when i do prediction on this it is predicting correctly

acoustic halo Jul 3, 2020, 9:47 AM

#

@indigo steppe Every level you go down in a list effectively is a new dimension

#

so [1,2] is 1d, [[1,2],[2,3]] is 2d

#

a list of 2ds would be 3d

indigo steppe Jul 3, 2020, 9:49 AM

#

Ok thx,but looking on the printed out result,could you tell the dimension?

#

the number of dimensions

acoustic halo Jul 3, 2020, 9:49 AM

#

its 3d

indigo steppe Jul 3, 2020, 9:50 AM

#

i am new to numpy so indexing and dimensions are a bit confusing to me

acoustic halo Jul 3, 2020, 9:51 AM

#

if you print len(array.shape), that gives you the number of dimensions

indigo steppe Jul 3, 2020, 9:51 AM

#

oh,cool,thx for the info

pale thunder Jul 3, 2020, 9:52 AM

#

array.ndim iirc
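
To tie the two suggestions together, here is a quick check on the array from the question. It is only a sketch; the last line relies on how NumPy prints arrays, where the run of opening brackets at the start matches the number of dimensions.

```python
import numpy as np

b = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(b.ndim)        # number of dimensions (axes)
print(b.shape)       # size along each axis
print(len(b.shape))  # same value as b.ndim
# In the printed array, the count of opening brackets at the very start
# ("[[[") also matches the number of dimensions.
print(str(b).startswith("[[["))
```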

lapis sequoia Jul 3, 2020, 10:22 AM

#

any kagglers here?

topaz delta Jul 3, 2020, 10:24 AM

#

Hi all, i am doing a project using regression on how to predict the forex market with information relative to covid-19 and was wondering if there was anyone i could discuss my logic with over a video call? id really appreciate it! do let me know 🙂 vivienneobrien.github.io / twitter: @iamvob

acoustic halo Jul 3, 2020, 11:18 AM

#

@topaz delta hello fellow msc

spare stone Jul 3, 2020, 11:19 AM

#

Hi all

#

I'm trying to figure out how feasible it would be to leverage an open sourced social listening tool and some AWS credits to build word cloud trend reports

#

https://github.com/openstream/open-social-media-monitoring


#

Willing to pay for consultation and/or work, just trying to scope out the project

acoustic halo Jul 3, 2020, 11:30 AM

#

@spare stone It would be a relatively simple project, but that repo is super old. I would imagine a few of the APIs are dead by now, especially major ones like FB or twitter

spare stone Jul 3, 2020, 11:30 AM

#

that was a very limited search, mostly to validate that something like it existed.

#

Thank you for your input. Assuming there is an updated repo, this wouldn't be too difficult to pull off?

acoustic halo Jul 3, 2020, 11:32 AM

#

If all you want is word cloud info based on time/date then yes

spare stone Jul 3, 2020, 11:35 AM

#

I think so. I'd like to be able to look at different location parameters if possible. Would it cost a lot of amazon credits?

#

if I was looking at twitter for the whole US, for example.

unkempt rapids Jul 3, 2020, 11:41 AM

#

can i ask a web scraping question here?

acoustic halo Jul 3, 2020, 11:41 AM

#

I think if anything, the twitter api will cost more than cloud credit

#

esp if you want to continuously scrape all US tweets every day

spare stone Jul 3, 2020, 11:45 AM

#

hmm, maybe I could run it to only scrape tweets at intervals and only if they meet a certain amount of engagements...

#

I wasn't really thinking of using the API but a web scraper instead.

#

bc I have much more aws credits than I could use

acoustic halo Jul 3, 2020, 11:48 AM

#

I'm fairly sure most scrapers will use a websites API anyway

buoyant vine Jul 3, 2020, 11:48 AM

#

its not really a scraper if its using a api lol

acoustic halo Jul 3, 2020, 11:48 AM

#

tell that to tweepy, though admittedly thats the only "scraper" i have used

spare stone Jul 3, 2020, 11:49 AM

#

I'm not entirely sure

buoyant vine Jul 3, 2020, 11:49 AM

#

its also against twitters ToS btw

spare stone Jul 3, 2020, 11:49 AM

#

I don't think that phantombuster does

#

I don't know how else they can keep getting away with the solutions they offer

#

well, if I'm not using their API that wouldn't be a problem, right?

acoustic halo Jul 3, 2020, 11:51 AM

#

You'd probably be okay as long as you only got trends from the country you are actually in

#

Otherwise you need to also be logged in

buoyant vine Jul 3, 2020, 11:51 AM

#

🤔 Why wouldnt you use the API if its availble

spare stone Jul 3, 2020, 11:52 AM

#

because it costs dough

buoyant vine Jul 3, 2020, 11:52 AM

#

one is very much against ToS and rule 5 in this server
and the other is put in place for people to use

spare stone Jul 3, 2020, 11:52 AM

#

and I have a ton of aws credits if there is a way of open sourcing or licensing whatever phantombuster does

acoustic halo Jul 3, 2020, 11:52 AM

#

If you limited what tweets you were after you would be alright

spare stone Jul 3, 2020, 11:52 AM

#

wait web scraping is against ToS for this server?

buoyant vine Jul 3, 2020, 11:52 AM

#

that doesnt change the fact its against ToS which also goes to rule 5

#

!rule 5

arctic wedgeBOT Jul 3, 2020, 11:52 AM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious/inappropriate or be for graded coursework/exams.

spare stone Jul 3, 2020, 11:53 AM

#

heard

#

but, I'm not breaching any ToS if I'm not engaging them for any service

buoyant vine Jul 3, 2020, 11:54 AM

#

you're literally webscraping the site lmao

spare stone Jul 3, 2020, 11:54 AM

#

and I don't see how data scraping could be considered malicious, but I'm not the arbiter of that

buoyant vine Jul 3, 2020, 11:54 AM

#

If sites dont want you doing it, you shouldnt do it

#

this being a partnered server, those rules are heavily applied

spare stone Jul 3, 2020, 11:55 AM

#

ahh, gotcha

buoyant vine Jul 3, 2020, 11:55 AM

#

if it breaches any site's ToS we cant help you

spare stone Jul 3, 2020, 11:55 AM

#

understood

buoyant vine Jul 3, 2020, 11:55 AM

#

pithink

spare stone Jul 3, 2020, 11:56 AM

#

appreciate the help thus far! Sorry I broke your rules!

sonic goblet Jul 3, 2020, 1:06 PM

#

@sonic goblet that is the only correct way to do it, yes. And yes you are expected to use it on both train and test.
@ripe forge Thankss

tender wind Jul 3, 2020, 2:20 PM

#

@paper niche I have no influence on how that JSON blob is generated

#

I will probably run a model generating features without joins and see where it lands

flat plank Jul 3, 2020, 2:35 PM

#

I am new at data science looking for anyone to talk to about a toy project. I

#

made a simple one-hot-encoded cold start recommender, sentiment analysis and topic modeling on the same data set. The recommender table ranges from 0.0 to 1.0, the sentiment analysis ranges from -1.0 to 1.0, and the topics are 1 to 7. To make the hybrid, I know I can one-hot encode the topics and add them to the recommender, then maybe sort by the pos, neg, or compound score (e.g. 0.8705)

pastel compass Jul 3, 2020, 3:02 PM

#

So I am using this online corpus and the data is all in xml files. Does anyone know a good module for parsing xml files?

acoustic halo Jul 3, 2020, 3:10 PM

#

@pastel compass ElementTree is built into python, I would start there
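
A minimal sketch with the standard-library `xml.etree.ElementTree`. The tag and attribute names (`corpus`, `doc`, `title`, `body`, `id`) are made up, since the corpus format isn't shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus snippet; the tag and attribute names are assumptions.
xml_text = """
<corpus>
  <doc id="1"><title>First</title><body>Hello world</body></doc>
  <doc id="2"><title>Second</title><body>More text here</body></doc>
</corpus>
"""

root = ET.fromstring(xml_text)  # use ET.parse(path).getroot() for a file on disk

records = []
for doc in root.iter("doc"):
    records.append({
        "id": doc.get("id"),             # attribute lookup
        "title": doc.findtext("title"),  # text of the first <title> child
        "body": doc.findtext("body"),
    })
```

A list of dicts like `records` can be handed straight to `pd.DataFrame` if the data should end up in pandas.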

pastel compass Jul 3, 2020, 3:10 PM

#

Thanks!

dull turtle Jul 3, 2020, 3:23 PM

#

how i can get accuracy of CNN model? @acoustic halo

acoustic halo Jul 3, 2020, 3:44 PM

#

@dull turtle Assuming keras, model.evaluate with your validation data returns the loss and accuracy

tight stone Jul 3, 2020, 3:47 PM

#

@tight stone maybe start by looking at all the artificial/augmented training examples that opencv is throwing out. and compare them visually with the 3rd image you're drawing.
@paper niche So, did I get you correctly, to check if the randomly created samples are correct/close to the 3rd input image? I can do that through my frontend but visually you see differences between the samples and the input images.
Like, you can tell that the samples from input-image1 are triangles and the samples from input-image2 are circles but they don't look identical. My 3rd input image (the image I wanna identify) is basically like my first 2 input images except that I use it for identification.

upbeat knot Jul 3, 2020, 4:09 PM

#

Hello! I'm looking for a Python plotting library which allows for multiple dropdowns/widgets to filter data? Tried plotly but it looks like unless I use the ipywidgets library, I'm stuck with only one dropdown. Looking for something that is aesthetically pleasing, interactive and embeddable. Any help would be appreciated, thank you

serene scaffold Jul 3, 2020, 4:34 PM

#

>>> bools = token_idx != 0
>>> bools
tensor([[True, True, True, True, True, True, True, True, True, True, True]])
>>> torch.LongTensor(bools)  # causes error

#

any idea on how to get this as a long tensor?

#

looks like it's torch.tensor(bools, dtype=torch.float64)

acoustic halo Jul 3, 2020, 4:45 PM

#

tensor.long()

tacit brook Jul 3, 2020, 6:08 PM

#

Guys i was trying to visualize a decision tree with the following code. But i am getting the following error
Can anybody please help me?

📎 unknown.png

#

Error - https://pastebin.com/ABuT3iSQ


lapis sequoia Jul 3, 2020, 6:27 PM

#

I feel like I have seen your github profile before @tacit brook, but I'm not sure

tacit brook Jul 3, 2020, 6:28 PM

#

I feel like I have seen your github profile before @tacit brook, but I'm not sure
@lapis sequoia ooh

#

I think you might have seen the same dataset, coz it has turned out to be a standard dataset for decision trees lol

lapis sequoia Jul 3, 2020, 6:30 PM

#

oh actually i found the profile I thought it was close to your name

#

its sdusmantha something

tacit brook Jul 3, 2020, 6:30 PM

#

😅

lapis sequoia Jul 3, 2020, 6:30 PM

#

But the name is almost the same as yours

#

I got confused for some seconds there xD

tacit brook Jul 3, 2020, 6:30 PM

#

😆

lapis sequoia Jul 3, 2020, 6:30 PM

#

!paste

#

use this link and then send the traceback

#

it's much easier to read

tacit brook Jul 3, 2020, 6:31 PM

#

ok sure

#

@lapis sequoia https://paste.pythondiscord.com/ahapahovun.sql

#

Its a long error though

lapis sequoia Jul 3, 2020, 6:32 PM

#

Omg lol im so stupid

#

sorry for bothering you i accidentally pressed alt + up

#

and now im in data science instead of #discord-bots

#

just realized

tacit brook Jul 3, 2020, 6:33 PM

#

grumpchib

lapis sequoia Jul 3, 2020, 6:33 PM

#

@tacit brook Sorry for bothering you I don't know that much about data science 😦 sorry

tacit brook Jul 3, 2020, 6:33 PM

#

ya np

serene scaffold Jul 3, 2020, 7:00 PM

#

        token_idx = token_tensor.unsqueeze(0)
        mask_idx = torch.tensor(token_idx != 0, dtype=torch.long).unsqueeze(0)
        segment_idx = torch.tensor([token != '[MASK]' for token in tokenized_text], dtype=torch.long)

        token_idx = tf.reshape(token_idx, (num_tokens,))
        mask_idx = tf.reshape(mask_idx, (num_tokens,))

        with torch.no_grad():
            result = self.model(token_idx, segment_idx, masked_lm_labels=None)

#

somewhere along the way, my tensors are being converted to a type called EagerTensor, presumably within the call to self.model

serene scaffold Jul 3, 2020, 7:19 PM

#

huh I figured it out

drowsy kite Jul 3, 2020, 8:11 PM

#

Hey guys I got a really weird error after upgrading to python 3.8 and installing vs code (not sure if the two are related)

#

Basically my note book wont run now though

#

https://i.imgur.com/SNDj8vr.png

Imgur

#

anyone ever experience something similar?

shadow quiver Jul 3, 2020, 8:21 PM

#

Hey guys. Using pandas, can I make this give blue -> 0 in mercedes and volvo? I mean if a color doesn't exist for a brand, I still want to see it as 0 count

📎 unknown.png

#

Found it: https://stackoverflow.com/a/49128246/6402099

hearty jewel Jul 3, 2020, 9:16 PM

#

got a question regarding the following: i want to understand why we're looking at the fraction of replicates that were LESS than or equal to the test statistic - in other problems, sometimes we're looking at what was greater. Do you choose less than / greater than depending on the context of what you're testing?

📎 unknown.png

#

so hold on

#

ive done some digging, and by definition: p value = the probability of observing a test statistic equally or more extreme than the one you observed, given that the null hypothesis is true. so i looked at the hist, and what im always gonna do is look at the distribution of the replicates and choose the extreme side

📎 unknown.png

#

??

#

im remembering from undergrad we'd look at the absolute value, and this is probably why

#

in this situation the test statistic was negative, so we're looking at the left side of the curve
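
That reading can be sketched in code. The replicates below are simulated from a standard normal purely for illustration (the real ones would come from the resampling step in the exercise), and `one_sided_p_value` is a made-up helper name.

```python
import random


def one_sided_p_value(replicates, observed, tail):
    """Fraction of replicates at least as extreme as `observed` in one tail."""
    if tail == "left":
        hits = sum(r <= observed for r in replicates)
    elif tail == "right":
        hits = sum(r >= observed for r in replicates)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return hits / len(replicates)


rng = random.Random(0)
# Simulated replicates of a test statistic under the null (an assumption here).
replicates = [rng.gauss(0, 1) for _ in range(10_000)]
observed = -1.5  # negative, so "more extreme" means further left

p_left = one_sided_p_value(replicates, observed, "left")
p_right = one_sided_p_value(replicates, observed, "right")
```

Strictly speaking the tail is set by the alternative hypothesis rather than by the sign of the statistic, but the two usually agree. A two-sided test would compare absolute values instead, which matches the undergrad memory above.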

rare portal Jul 3, 2020, 9:35 PM

#

Hmm, has anyone worked with USGS rdb files? I'm wondering if there's a pretty way to load a file with data from multiple sites into a dataframe.
I was thinking of using the to_datetime coerce option and dropna to do it, but that seems a bit hacky?

Here's an example image.
You can see that the data for each site is separated by a few lines of comments and a header column. The comments are not a problem, the issue is the header row and the nonsense row that occurs right after the header row.
I don't really want to hardcode which rows to skip, so I'm wondering if there's a better way to do this...

📎 unknown.png

undone needle Jul 3, 2020, 11:58 PM

#

I have a quick question if someone is able to help.
I'm looking for a library that will allow me to not only build a heatmap with a few given points, but also let me sample points on it.

For example, I would like to have some points (2, 3) = 4; (5, 2) = 10; and (10, 10) = 1. I would like to be able to sample something like (5, 5) to get what value could be expected. It's something like FEA, but with an arbitrary arrangement and number of points.

I've found ways to plot heatmaps, but not a way to query interpolated points.

tacit brook Jul 4, 2020, 1:13 AM

#

I think with plotly its possible. But im not 100% sure abt it

flat quest Jul 4, 2020, 3:21 AM

#

if there is indeed always a few lines of comments and a header column
you could get the indices of those points and split the dataframe based on those indices @rare portal

paper niche Jul 4, 2020, 4:35 AM

#

to check if the randomly created samples are correct/close to the 3rd input image? I can do that through my frontend but visually you see differences between the samples and the input images.
@tight stone I guess my general point was to try and characterize the mistakes made by your model. i.e., is your model always confused by a specific 'pattern' in your 3rd image? Or is it when you trained with a triangle+square drawn in a particular way, then it makes a certain kind of mistake, but not when you train with a differently drawn triangle+square. Etc.

Basically if I understand you correctly, during training, your model only sees the 2 images + their augmentations, so if you're perplexed by why your model is so confident in guessing square when the 3rd image is a circle, could it be that it somehow saw a circle-like image that was generated by opencv but is labeled as square during training? Maybe opencv's transformations rounded the corners of your square, etc.

There are no magic solutions here. Try to first characterize under what circumstances your models are making mistakes; then I would say half the battle is won.

dull turtle Jul 4, 2020, 5:50 AM

#

hello guyz i have a CNN code

umbral aspen Jul 4, 2020, 7:55 AM

#

Hi guys are there any helper libraries or something to help choose the learning rate when using tensorflow/keras? Curious to see if there is something out there before I dive into some more complex code examples...

acoustic halo Jul 4, 2020, 9:14 AM

#

You tune the learning rate via the optimizer in keras

minor sapphire Jul 4, 2020, 9:18 AM

#

hello! i have this problem to solve. I need to find the first occurrence of a sum of 2 integers in an array that equal a certain value. What is a good algorithm for that? i need it to be efficient so it can search a list of 10,000,000 elements. Thank you!

tame basalt Jul 4, 2020, 9:28 AM

#

@minor sapphire maybe consider looking at a generator object with next(i for i in list if i == sum) or using filter(function, list) also try out different stuff with timeit to test efficiency
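
Note that a generator like the one above only finds a single element equal to the target, not a pair. For pairs, one common approach is a single pass with a set of values seen so far, which is O(n) on average. This is a sketch that assumes "first" means the pair whose second element appears earliest; the function name is made up.

```python
def first_pair_with_sum(numbers, target):
    """Return the first pair (a, b) with a + b == target, or None.

    "First" here means the pair whose second element appears earliest.
    """
    seen = set()
    for value in numbers:
        complement = target - value
        if complement in seen:  # O(1) average-case membership test
            return complement, value
        seen.add(value)
    return None


print(first_pair_with_sum([10, 5, 2, 3, 7, 5], 10))
```

With 10,000,000 elements this stays a single pass over the list, at the cost of holding up to that many integers in the set.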

quasi cape Jul 4, 2020, 9:28 AM

#

All remaining properties are passed to the constructor of
the specified trace type

    (e.g. [{'type': 'scatter', ...}, {'type': 'bar, ...}])

any one know how to over come this error in Choropleth

minor sapphire Jul 4, 2020, 9:30 AM

#

@tame basalt gonna check out the filter function. And yeah good idea, i even forgot about timeit module. Gonna check it out

umbral aspen Jul 4, 2020, 10:48 AM

#

You tune the learning rate via the optimizer in keras
@acoustic halo Hmm ok - I found this which seems to make it quite straight forward: https://github.com/keras-team/keras-tuner - would that be a good approach?


white wave Jul 4, 2020, 10:56 AM

#

guys, data science is just consist in data field?

nova timber Jul 4, 2020, 12:05 PM

#

Hello, I'm working with python 3.8.3, Im trying to install tensorflow but it won't work, can anyone help please?.

lapis sequoia Jul 4, 2020, 12:18 PM

#

@nova timber use Colab, tensorflow comes pre-installed

#

and it has a free gpu/tpu environment if you ever need it

stable cave Jul 4, 2020, 12:51 PM

#

Hello I have a problem with scipy.interpolate.splprep.
I am getting a ValueError less than 1% of the time in my program and I'm not sure why. I am using it to get points on a curve between two random points on my screen.
The function https://paste.pythondiscord.com/iqekupolot.py
Error (couldnt get entire traceback, it was too far back in the terminal): ```
  File "/home/johan/repos/chanscape/venv/lib/python3.8/site-packages/scipy/interpolate/fitpack.py", line 156, in splprep
    res = _impl.splprep(x, w, u, ub, ue, k, task, s, t, full_output, nest, per,
  File "/home/johan/repos/chanscape/venv/lib/python3.8/site-packages/scipy/interpolate/_fitpack_impl.py", line 279, in splprep
    t, c, o = _fitpack._parcur(ravel(transpose(x)), w, u, ub, ue, k,
ValueError: Invalid inputs.
```

It happens on this line in the function: `tck, u = scipy.interpolate.splprep([x, y], k=degree)`
Example values when fail: ```
x=[702 708 708 714] y=[145 139 139 133] k=3
```

paper niche Jul 4, 2020, 1:01 PM

#

that's because you have consecutive duplicate values in your x and y
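
One possible workaround is to drop repeated points before calling `splprep`. The helper below is a hypothetical sketch using only the standard library; it does not call scipy itself.

```python
from itertools import groupby


def drop_consecutive_duplicates(xs, ys):
    """Remove points that repeat the point immediately before them."""
    # groupby collapses runs of identical (x, y) tuples into one entry each.
    points = [point for point, _ in groupby(zip(xs, ys))]
    new_xs = [p[0] for p in points]
    new_ys = [p[1] for p in points]
    return new_xs, new_ys


x = [702, 708, 708, 714]
y = [145, 139, 139, 133]
x_clean, y_clean = drop_consecutive_duplicates(x, y)
# Only three distinct points remain, so a spline degree of 3 is now too high.
degree = min(3, len(x_clean) - 1)
```

The cleaned lists could then be passed on as `splprep([x_clean, y_clean], k=degree)`. As far as I know `splprep` needs more points than the spline degree, which is why `degree` is lowered here.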

real hollow Jul 4, 2020, 1:31 PM

#

Happy 4th to all of you from overseas 🥳
I wonder if any of you guys would like to recommend some resources to learn making a custom NLP sentiment analyzer with python

dull turtle Jul 4, 2020, 2:52 PM

#

i have my code but i am not able to see training process

#

see here

📎 unknown.png

#

!pastebin

arctic wedgeBOT Jul 4, 2020, 2:52 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

dull turtle Jul 4, 2020, 2:53 PM

#

https://paste.pythondiscord.com/homaxaseka.py see here

#


```
  File "E:\paymentz\image_save_api.py", line 194, in trainmodel
    model.fit_generator(
UnboundLocalError: local variable 'model' referenced before assignment
```

knotty moon Jul 4, 2020, 4:51 PM

#

I think i solved the 2 generals problem

rough umbra Jul 4, 2020, 5:35 PM

#

Not sure if this goes under data science, but I'm trying to pull an image url from a website. I am able to scrape the website down to finding the url, but I'm left with long HTML content and I only want what's inside "image url here". I'm not sure how; here is my code:

📎 unknown.png

#

I'm left with this HTML content, but I only want what's inside the blue marker

📎 unknown.png

umbral aspen Jul 4, 2020, 6:16 PM

#

@rough umbra I would not consider that data science, but you can try this link from stackoverflow: https://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup
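
For reference, the BeautifulSoup approach in that link boils down to indexing the tag like a dictionary, e.g. `soup.find('img')['src']`. Below is a dependency-free sketch of the same idea using the standard library's `html.parser`. The HTML snippet and the class name `ImgSrcCollector` are made-up stand-ins for whatever is in the screenshot:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


# Hypothetical fragment standing in for the scraped HTML.
html = '<div class="photo"><img alt="cover" src="https://example.com/cover.png"></div>'

parser = ImgSrcCollector()
parser.feed(html)
print(parser.sources)
```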

Stack Overflow

Extracting an attribute value with beautifulsoup

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code:

import urllib
f = urllib.urlopen("http://58.68.130.147")
s = f.re...

rough umbra Jul 4, 2020, 6:18 PM

#

Alright, thanks for letting me know and for the link!

robust dome Jul 4, 2020, 7:17 PM

#

Hey just a quick question. what is a boolean statement that I could use to tell if something is in a list

#

or a graph

dawn turtle Jul 4, 2020, 11:28 PM

#

`item in collection`

#

returns `True` if `item == collection[i]` for some `i`
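
A small sketch of what that looks like in practice. The "graph" here is assumed to be stored as a dictionary of adjacency lists, which is only one of several possible representations:

```python
fruits = ["apple", "pear", "plum"]
print("pear" in fruits)       # True
print("kiwi" not in fruits)   # True

# Hypothetical graph stored as {node: [neighbours]}.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
has_node = "a" in graph        # membership on a dict checks its keys
has_edge = "c" in graph["a"]   # is there an edge a -> c?
print(has_node, has_edge)
```

For big collections that get checked often, a `set` answers `in` much faster than a list, since a list is scanned element by element.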

hoary breach Jul 5, 2020, 5:23 AM

#

Hey, I've got a quick question: when setting up my JupyterLab notebook, the function `time` does not work. Any idea why? I'm using Arch; conda and pip are both up to date.

lapis sequoia Jul 5, 2020, 6:01 AM

#

Hi

#

Needed some help with numpy module

#

Wanted to know more about numpy

woeful kelp Jul 5, 2020, 6:04 AM

#

Same here but I have a very specific question regarding 2D arrays

flat quest Jul 5, 2020, 6:18 AM

#

just ask the question lol

dull turtle Jul 5, 2020, 8:24 AM

#

hi guys, i have code that saves an image to the training folder and then starts training a CNN model
when training completes it reports "loss" and "accuracy"
i want to save the CNN model based on that loss and accuracy:
only if "loss < 0.05" and "accuracy > 85 %" should it save the model; otherwise it should retrain the model

#

there are two functions in it: `def trainmodel`, which trains the CNN model, and `def post`, which handles the image-saving part

#

https://paste.pythondiscord.com/vizabuxopa.py code here
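
One way the "save only if good enough, otherwise retrain" logic could be structured, as a framework-agnostic sketch. `train_once` and `save` are hypothetical callables standing in for the real training function and for something like a model-saving call; accuracy is assumed to be a fraction (0.85 rather than 85), and `max_attempts` is an added safeguard that is not in the original description:

```python
def train_until_good(train_once, save, max_loss=0.05, min_accuracy=0.85, max_attempts=5):
    """Retrain until the metrics pass both thresholds, then save the model.

    train_once: hypothetical callable returning (model, loss, accuracy)
    save:       hypothetical callable that persists the model
    """
    for attempt in range(1, max_attempts + 1):
        model, loss, accuracy = train_once()
        if loss < max_loss and accuracy > min_accuracy:
            save(model)
            return model, attempt
    return None, max_attempts  # give up instead of looping forever


# Stand-in for real training: canned results that improve on each call.
fake_runs = iter([
    ("model_v1", 0.30, 0.70),
    ("model_v2", 0.08, 0.90),
    ("model_v3", 0.04, 0.91),
])
saved = []
best_model, attempts = train_until_good(lambda: next(fake_runs), saved.append)
print(best_model, attempts)
```

Capping the number of attempts matters because a model that never reaches the thresholds would otherwise retrain forever.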

toxic hound Jul 5, 2020, 12:19 PM

#

`tf.Tensor([[ 101 1192 1132 1177 103 119 102]], shape=(1, 7), dtype=int32)`
how do i get `[[ 101 1192 1132 1177 103 119 102]]`
from this

dusk osprey Jul 5, 2020, 1:38 PM

#

how do you guys share models across AI applications?

lapis sequoia Jul 5, 2020, 2:02 PM

#

Guys, I'm planning to make a word predictor using an RNN which predicts the next word based on previous sentences, but how do you deal with the time steps? For example, I'd train the model with n time steps, but what if the input has a different number of time steps?
I'm assuming we truncate the array if it has more than n time steps, but what if it has fewer than n?
Please ping when replying

ivory plank Jul 5, 2020, 2:37 PM

#

@lapis sequoia You only use what you know, so while you should worry about how that affects your predictions, you shouldn't worry about the model itself

#

You're probably looking at using an LSTM for next word prediction

lapis sequoia Jul 5, 2020, 2:37 PM

#

yes

#

im thinking of using a time step of 5

#

alright ty, i'll ask for help again in case i face any problems

ivory plank Jul 5, 2020, 2:40 PM

#

You should really look at how RNNs and LSTMs work to understand the answer to your question tho @lapis sequoia

lapis sequoia Jul 5, 2020, 2:40 PM

#

i have watched a few vids explaining them

#

on an intuitive level

#

cause i dont really know the math behind lstm or rnn

#

is it like those valves which control the flow of info?

#

from one layer to another

#

i was thinking of using a 0 for filling sentences which have fewer than n time steps

#

and in case it's more i'd take the last five words

ivory plank Jul 5, 2020, 2:44 PM

#

that's generally how that works, but you don't have to explicitly do that yourself

#

the model dimensions aren't a problem

lapis sequoia Jul 5, 2020, 2:46 PM

#

oh okay

ivory plank Jul 5, 2020, 2:48 PM

#

Well that's not entirely true; you do have to set up your code with the correct shape and then do it but I don't think that's the same as what you think you're doing

#

I really encourage you to check out medium posts on lstms and next word prediction

#

you'll get a much deeper understanding by looking at the code and the math of an lstm

ivory plank Jul 5, 2020, 3:40 PM

#

The short answer is what you think it is; you'd pass an empty spot. But the solution to this problem is built much deeper into the architecture and the input (with distinct end and missing states) than that. You should look closely at an LSTM to really understand how that works @lapis sequoia

lapis sequoia Jul 5, 2020, 4:05 PM

#

if u dont mind, for reference can u give a link for a medium post?

flat quest Jul 5, 2020, 4:16 PM

#

@lapis sequoia for dealing with the timestep issue: the general methodology is to make `seq_len` the minimum length that fits all the input sentences (i.e. the length of the longest one). Any sentences shorter than your required `seq_len` get padded with 0's.

As for the mathematics behind RNNs and LSTMs, I'd suggest reviewing those and actually understanding the inner workings. It'll help explain why your accuracy for text generation will be fairly low (which it will be)
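
The padding and truncation described in this thread can be sketched in a few lines of plain Python. The function name and the choice to pad on the left are assumptions for illustration; libraries such as Keras ship a `pad_sequences` helper that does the same job on whole batches:

```python
def pad_or_truncate(token_ids, seq_len, pad_id=0):
    """Return exactly seq_len ids: keep the last seq_len tokens, left-pad with pad_id."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    tail = token_ids[-seq_len:]                       # keeps the most recent words
    return [pad_id] * (seq_len - len(tail)) + tail    # pad only if too short


print(pad_or_truncate([4, 7], 5))                  # [0, 0, 0, 4, 7]
print(pad_or_truncate([1, 2, 3, 4, 5, 6, 7], 5))   # [3, 4, 5, 6, 7]
```

This only works if id 0 is reserved for padding and never assigned to a real word, so the model can learn (or be told via masking) to ignore it.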

lapis sequoia Jul 5, 2020, 4:17 PM

#

oh okay, so does that mean there are better ways to deal with this problem?

flat quest Jul 5, 2020, 4:18 PM

#

for increasing nlp accuracy? there are

#

but i'd suggest going with RNN's first

lapis sequoia Jul 5, 2020, 4:20 PM

#

okay thanks

unreal shoal Jul 5, 2020, 5:50 PM

#

thanks, I have now deleted my reply.

serene scaffold Jul 5, 2020, 5:50 PM

#

have fun!

unreal shoal Jul 5, 2020, 5:50 PM

#

ty

serene scaffold Jul 5, 2020, 6:00 PM

#

I currently have a project where I get a vector representation of a phrase after passing it through BERT, and then I find the nearest neighbor to the BERT vector in an entirely separate vector space. Obviously the outputs are initially meaningless, but the goal is to discover the mapping between the two vector spaces.

#

My advisor told me to use a feed forward neural network

#

All I've really done so far is speed up the nearest neighbor search using a KD tree.

robust dome Jul 5, 2020, 7:51 PM

#

does anyone know how to make python print elements that are common in a list?

rancid brook Jul 5, 2020, 8:20 PM

#

not data science

cinder pilot Jul 5, 2020, 8:24 PM

#

@serene scaffold why do you even need a neural network for this?

#

or I didn't get you

serene scaffold Jul 5, 2020, 8:25 PM

#

it could be that the problem is solvable without them

cinder pilot Jul 5, 2020, 8:26 PM

#

as far as I know the problem of the nearest neighbors is solved with plain algorithms, not neural nets

#

kd-trees, as you mentioned

#

but I might have understood you wrong

serene scaffold Jul 5, 2020, 8:28 PM

#

are you familiar with word embeddings?

cinder pilot Jul 5, 2020, 8:28 PM

#

a little bit 🙂

serene scaffold Jul 5, 2020, 8:28 PM

#

so we have word embeddings that were created from a large body of medical literature

#

and embeddings that represent labels that certain words and phrases get to place them in an ontology

#

they're called concept unique identifiers (CUIs). I'm not entirely sure how they were created.

#

The assumption is that if CUIs represent a number of similar terms, then you can find a mapping between the two semantic spaces

cinder pilot Jul 5, 2020, 8:32 PM

#

I need to check this out and think about it a little bit

serene scaffold Jul 5, 2020, 8:37 PM

#

I'd appreciate your help. I can also talk to my coworkers tomorrow.

#

Feel free to DM me

cinder pilot Jul 5, 2020, 9:07 PM

#

well, I have never faced such a problem before

#

so my answer will depend on how much you know about neural networks

#

if you are familiar with them and want to hear a concrete architecture/approach that is the best and most used for your type of problem then I am unable to help you there, sorry

#

but if you are absolutely new to NNs then I can tell you that this seems like a problem neural nets are able to solve for sure

#

as far as I understand it, it's just a mapping from an n-dimensional space to an m-dimensional space. A function. And neural nets are good at approximating functions

ripe forge Jul 5, 2020, 9:22 PM

#

Well put. I think a neural network should be able to do this task no problem

#

More concretely, do you have a set of "answers" corresponding to your BERT vectors in this new space?

#

Basically if you have the representation of your inputs in this new space, or a set of vectors that are close in this new space, you're simply trying to predict the new vectors from BERT vectors

#

The network then hopefully ends up simply learning a mapping inherently by doing this task

cinder pilot Jul 5, 2020, 9:27 PM

#

what I see as the most simple and obvious approach that comes to mind: your network has an n-dimensional input layer and m-dimensional output layer (and, of course, layers in between them). If I understood it right you already have labels for input embeddings (so, it's supervised learning) otherwise I can't say (I am not familiar with unsupervised NNs at all). And that becomes just a common deep neural network
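
To make the "n-dimensional in, m-dimensional out, trained on pairs" idea concrete, here is a deliberately tiny sketch in plain Python. It is a single linear layer trained with stochastic gradient descent, which is the simplest stand-in for the feed-forward network being discussed; a real network would add hidden layers and nonlinearities and would normally be written with a deep learning library. The toy data and all names are made up:

```python
import random


def train_linear_map(pairs, n, m, lr=0.1, epochs=300, seed=0):
    """Fit W (m x n) and b (m) so that W @ x + b approximates y, using plain SGD."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(m)]
    b = [0.0] * m
    for _ in range(epochs):
        for x, y in pairs:
            pred = [sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(m)]
            for i in range(m):
                err = pred[i] - y[i]  # gradient of 0.5 * squared error w.r.t. pred[i]
                for j in range(n):
                    W[i][j] -= lr * err * x[j]
                b[i] -= lr * err
    return W, b


def predict(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


# Toy supervised pairs: 3-dimensional inputs mapped to 2-dimensional targets.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 1.0]),
    ([0.0, 0.0, 1.0], [1.0, 1.0]),
]
W, b = train_linear_map(pairs, n=3, m=2)
print([round(v, 3) for v in predict(W, b, [0.0, 0.0, 1.0])])
```

The training data has the same shape as in the discussion: each example is an (input vector, target vector) pair, and the model is adjusted until its output for the input lands close to the target.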

ripe forge Jul 5, 2020, 9:27 PM

#

How well it works you'll just have to see and tweak. But you need these kinds of pairs as your training data

cinder pilot Jul 5, 2020, 9:29 PM

#

But, again, I am not familiar with this type of problem, and it seems like mapping between 2 semantic spaces should be well known and explored by now, though quick googling didn't give anything useful

#

Sorry if it's not what you expected and not a thorough explanation 😦

flat quest Jul 5, 2020, 11:29 PM

#

if you have both inputs and output vectors that you'd like to map to each other

NN's should do just fine, as long as the training data is there.

serene scaffold Jul 6, 2020, 12:08 AM

#

> if you are familiar with them and want to hear a concrete architecture/approach that is the best and most used for your type of problem then I am unable to help you there, sorry
@cinder pilot I don't really know very much about neural networks even though I've seen all those network diagrams. I've taken linear algebra though.

#

> But, again, I am not familiar with this type of problem, and it seems like mapping between 2 semantic spaces should be well known and explored by now, though quick googling didn't give anything useful
I have a paper where they did exactly what I'm trying to do but the part about the neural network was vague.

tight stone Jul 6, 2020, 12:13 AM

#

> @tight stone I guess my general point was to try and characterize the mistakes made by your model. i.e., is your model always confused by a specific 'pattern' in your 3rd image? Or is it when you trained with a triangle+square drawn in a particular way, then it makes a certain kind of mistake, but not when you train with a differently drawn triangle+square. Etc.
>
> Basically if I understand you correctly, during training, your model only sees the 2 images + their augmentations, so if you're perplexed by why your model is so confident in guessing square when the 3rd image is a circle, could it be that it somehow saw a circle-like image that was generated by opencv but is labeled as square during training? Maybe opencv's transformations rounded the corners of your square, etc.
>
> There are no magic solutions here. Try to first characterize under what circumstances your models are making mistakes; then I would say half the battle is won.
@paper niche Sorry for answering so late.
This is indeed a good argument and clue to solve this issue of false classification.
But this is apparently not the case. I send their transformed samples back to the frontend, so, the user can actually check them themselves.

There were also cases where I used a triangle and circle as inputs.
Even then, there were cases where the neural network identified the triangle as a circle.

The image shows what my application looks like.
I marked which shape-probability belongs to which canvas.
The 3rd probability is not set at the moment because I don't know how I would implement that properly.

📎 unknown.png

flat quest Jul 6, 2020, 12:42 AM

#

did u make the classifier yourself @tight stone or ur using a classifier a company has already built?

tight stone Jul 6, 2020, 12:45 AM

#

I made them myself if I got you correctly.

#

If the classifier is the number of the input-canvas'/-image that is used for the neural network.

flat quest Jul 6, 2020, 1:54 AM

#

classifier would be the neural network itself
@tight stone

tight stone Jul 6, 2020, 1:59 AM

#

Ah, sorry, I made the neural network myself @flat quest . I didn't use anything that was already built.

flat quest Jul 6, 2020, 2:56 AM

#

yeah was thinking itd be weird if an already existing model was performing that poorly.

how does the model do on training data? @tight stone

potent nymph Jul 6, 2020, 3:08 AM

#

I have a list of student dictionaries read in from a CSV file

#

Each student dictionary looks like this
{"Name": "A", "Email": "email@email.com", "Gender":"Male", "Homeroom":"Mr X's class", "Nationality":"American"},

#

The goal is to create groups based on gender, homeroom or nationality

#

So the user will specify the number of groups, say 3 groups

#

And it needs to create groups with even numbers of male and female students

e.g. something like
[MALE, MALE, FEMALE], [FEMALE, MALE, FEMALE], [MALE, FEMALE, MALE]

i.e. no groups should have all males or all females if it can be avoided

#

How can I do this?

rancid brook Jul 6, 2020, 4:24 AM

#

You could fill the males one by one into each group then the females after

#

Like put one male into each group, then if you still have some left do that again, until you are out

#

Then females go in after

#

@potent nymph
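
The dealing-out idea above can be sketched as a round-robin over the groups. The function name is made up, the student records follow the dictionary shape from the question, and the variation here is that dealing continues where the previous gender left off, which keeps group sizes as even as possible:

```python
from itertools import cycle


def make_groups(students, n_groups, key="Gender"):
    """Deal students into n_groups like cards, one attribute value at a time."""
    groups = [[] for _ in range(n_groups)]
    # Sorting puts students with the same attribute value next to each other,
    # so each value gets spread across the groups before the next one starts.
    ordered = sorted(students, key=lambda s: s[key])
    for group, student in zip(cycle(groups), ordered):
        group.append(student)
    return groups


# Made-up example data in the same shape as the CSV rows.
students = (
    [{"Name": f"M{i}", "Gender": "Male"} for i in range(5)]
    + [{"Name": f"F{i}", "Gender": "Female"} for i in range(4)]
)
groups = make_groups(students, 3)
for g in groups:
    print([s["Gender"] for s in g])
```

Passing `key="Homeroom"` or `key="Nationality"` spreads students by those fields instead; balancing several attributes at once needs a more involved approach.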

cinder pilot Jul 6, 2020, 7:11 AM

#

@serene scaffold can you share this paper?

paper niche Jul 6, 2020, 11:02 AM

#

@tight stone Nice app. Very reminiscent of QuickDraw and the likes.
Hmm, apart from the training performance that @flat quest brought up, I'd also be interested to hear about the general architecture that you're using. I'm assuming some variant of CNN. Intuitively, one would expect the NN to pick up on (in the case of triangles vs circles) the fact that one has 3 vertices / pointy-areas whereas the other does not.
If you visualize the convolutional filters, you might get some sense of what your model is actually learning. (there was something like this for the MNIST dataset; but I can't quite remember which site it is anymore)

If it were me, I would just stick to 1 model first (say, the one that you showed in your image that learnt from triangles and circles), and attempt to debug on that 1 model.

what kind of drawings is the model prone to getting wrong? Do some tests on a bunch of different drawings. Maybe triangles with very wide bases are being misconstrued as circles. Or maybe it's confused by right-angled triangles, etc.
Similarly, the flip side: what kind of drawings is your model prone to getting correct?

Then dig deeper into & visualize the filters the model has learnt. Which pixels are getting picked up as "important" for the classification task?

charred ocean Jul 6, 2020, 12:01 PM

#

I have a question. Is C# effective for data analytics ? Does anyone have any insight on this ?

pale thunder Jul 6, 2020, 12:11 PM

#

from what I understand, data analytics is done mostly in Python, Java, domain-specific languages like R and Julia, and sometimes JS. Never heard of C# in the domain

slim fox Jul 6, 2020, 12:39 PM

#

@pale thunder did you recently get promoted to helper? Grats 😉

pale thunder Jul 6, 2020, 12:39 PM

#

I did indeed

slim fox Jul 6, 2020, 12:42 PM

#

nice 🙂 On the topic, I see that Scala is also a language that seems to be getting more and more traction in data related stuff

manic bronze Jul 6, 2020, 12:42 PM

#

> I have a question. Is C# effective for data analytics ? Does anyone have any insight on this ?
@charred ocean better go with python

lapis sequoia Jul 6, 2020, 1:39 PM

#

hi, i'm a beginner in data science with Python. Any advice or recommendations are welcome

steel roost Jul 6, 2020, 2:00 PM

#

@lapis sequoia definitely recommend Pandas, bs4, and the requests library

lapis sequoia Jul 6, 2020, 2:02 PM

#

do you know what the best website to learn Pandas and bs4 is?

#

@steel roost Thanks

steel roost Jul 6, 2020, 2:04 PM

#

I learned from YouTube and used random data from Kaggle

#

i even went a step further and made charts from the data i extracted

lapis sequoia Jul 6, 2020, 2:08 PM

#

@steel roost can you share the best tutorial on YouTube, please

calm scarab Jul 6, 2020, 2:10 PM

#

Hi guys, I am struggling to find a job as a data scientist. Are there any professional data scientists here working in Europe, Asia, or the USA? I'd like to send my CV so that someone can take a quick look and give some advice about the CV and about projects I did or should do to improve my chances. I really need it, and if someone could help me, I would really appreciate it.

steel roost Jul 6, 2020, 2:17 PM

#

https://www.youtube.com/watch?v=vmEHCJofslg&feature=share

YouTube

Keith Galli

Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel fi...

Data used in this Tutorial: https://github.com/KeithGalli/pandas
Python Pandas Documentation: http://pandas.pydata.org/pandas-docs/stable/

Let me know if you have any questions!

In this video we walk through many of the fundamental concepts to use the Python Pandas Data Scie...

#

@lapis sequoia

lapis sequoia Jul 6, 2020, 2:20 PM

#

@steel roost Thank you very much

steel roost Jul 6, 2020, 2:20 PM

#

no problem @lapis sequoia

modest blaze Jul 6, 2020, 3:40 PM

#

Hi all, Im a bit of a noob here in python.. I'm trying to plot all my cycling rides in matplotlib so I can create a "year in review"..

📎 Screen_Shot_2020-07-06_at_10.39.59_AM.png

#

This is what i have so far, but the scale or aspect ratio doesn't seem to be maintained like it does on a map

#

ideally, each subplot would be 1x1 (inches), but I think I need to normalize the lat/longs somehow to do this

#

anyone have any ideas on how to achieve this?

serene scaffold Jul 6, 2020, 3:49 PM

#

@cinder pilot I only have the abstract and some diagrams that they made. This one describes their whole pipeline

#

📎 unknown.png

#

They calculate the cosine distance from the BERT output to every single vector in the CUI vocabulary, which was taking forever even with 40 CPUs working concurrently. I sped it up with a KD tree.
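
A note on why a KD tree can stand in for the cosine search, assuming all vectors are L2-normalized first (the chat does not say whether that was done): for unit vectors, squared Euclidean distance equals 2 × cosine distance, so both rank neighbours identically. The brute-force sketch below checks that on made-up vectors; the `cui_*` names are placeholders, not real identifiers:

```python
import math


def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical stand-ins for the CUI vocabulary and one BERT output vector.
vocab = {"cui_a": [1.0, 0.0, 0.0], "cui_b": [0.6, 0.8, 0.0], "cui_c": [0.0, 0.0, 5.0]}
query = [3.0, 1.0, 0.0]

by_cosine = min(vocab, key=lambda k: cosine_distance(query, vocab[k]))
q = normalize(query)
by_euclid = min(vocab, key=lambda k: euclidean(q, normalize(vocab[k])))
print(by_cosine, by_euclid)
```

In practice the normalized vocabulary would be indexed once with a KD tree (or another nearest-neighbour index) and queried with normalized BERT vectors, instead of the brute-force `min` used here.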

lapis sequoia Jul 6, 2020, 5:00 PM

#

hi i need help

#

how can i load my csv document

📎 he.PNG

ripe forge Jul 6, 2020, 5:02 PM

#

Use the r prefix (raw string) for the path: `r'C:yourpath'`

lapis sequoia Jul 6, 2020, 5:04 PM

#

it doesn't work

#

error message at the bottom

📎 hell.PNG

chrome barn Jul 6, 2020, 5:08 PM

#

`df = pd.read_csv('c:/test/file.csv')`
use `/` instead of `\` and also specify the file you want to load
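
A short illustration of why both suggestions help. In a normal string literal, backslash sequences like `\t` and `\n` are turned into a tab and a newline, which silently corrupts a Windows path. The file name below is made up to trigger exactly that:

```python
from pathlib import PureWindowsPath

plain = 'C:\test\new_file.csv'     # \t and \n become a tab and a newline
raw = r'C:\test\new_file.csv'      # raw string: backslashes are kept as typed
forward = 'C:/test/new_file.csv'   # forward slashes need no escaping at all

print(repr(plain))
print(repr(raw))

# Windows treats both separators the same, so these name the same file.
same_file = PureWindowsPath(raw) == PureWindowsPath(forward)
print(same_file)
```

Either form can then be handed to `pd.read_csv(...)`, as long as the path ends in the actual file name.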

lapis sequoia Jul 6, 2020, 5:13 PM

#

it gives this error message

#

📎 help.PNG

chrome barn Jul 6, 2020, 5:14 PM

#

probably you mean pokemon_data.csv instead of cvs

lapis sequoia Jul 6, 2020, 5:18 PM

#

@chrome barn Thanks,

lapis sequoia Jul 6, 2020, 6:00 PM

#

does anyone know how to remove a commit i mistakenly made to a specific branch

flat quest Jul 6, 2020, 6:02 PM

#

have you pushed the commit yet?

lapis sequoia Jul 6, 2020, 6:02 PM

#

yea

#

im new to git :/

#

i accidentally pushed it to several branches when i meant to do it to just one

limpid oak Jul 6, 2020, 6:07 PM

#

Hello everyone, I'm new to this server, so sorry for any mistakes. I have a problem that I have to submit a solution for tomorrow

#

I have to find a way to make polygons from a list of coordinates

spark stag Jul 6, 2020, 6:10 PM

#

this isn't really data science and we can't help with examined/school work because of

#

!rule 5

arctic wedgeBOT Jul 6, 2020, 6:10 PM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious/inappropriate or be for graded coursework/exams.

limpid oak Jul 6, 2020, 6:10 PM

#

not for exam

#

please listen to my question

spark stag Jul 6, 2020, 6:11 PM

#

you may have better luck in a help channel then because this doesn't really sound like data science

limpid oak Jul 6, 2020, 6:13 PM

#

'[{"position":0,"Latitude":19.334445,"Longitude":77.2685681},{"position":1,"Latitude":19.3344453,"Longitude":77.2685673}]'

#

how do I get this separated (split) to make a pandas DataFrame

#

or any other solution
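
Since that string is valid JSON, one possible approach (a sketch, not the only way) is to parse it with the standard `json` module rather than splitting it by hand. The variable names are made up; the data is the string from the message above:

```python
import json

raw = ('[{"position":0,"Latitude":19.334445,"Longitude":77.2685681},'
       '{"position":1,"Latitude":19.3344453,"Longitude":77.2685673}]')

records = json.loads(raw)  # a list of dicts, one per point

# Ordered (longitude, latitude) pairs, e.g. as vertices for a polygon.
coords = [
    (r["Longitude"], r["Latitude"])
    for r in sorted(records, key=lambda r: r["position"])
]
print(records[0])
print(coords)
```

The parsed list can be passed straight to `pandas.DataFrame(records)`, which produces one row per point with `position`, `Latitude` and `Longitude` columns.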

lapis sequoia Jul 6, 2020, 6:23 PM

#

can someone help with git?

#

i accidentally pushed files to the wrong branch and im trying to remove them from appearing

uncut shadow Jul 6, 2020, 6:36 PM

#

Well, that's not a python question and not a data science question

serene scaffold Jul 6, 2020, 6:41 PM

#

@lapis sequoia take a look at #tools-and-devops

lapis sequoia Jul 6, 2020, 6:41 PM

#

thanks

serene scaffold Jul 6, 2020, 6:42 PM

#

@limpid oak see #❓｜how-to-get-help

cinder pilot Jul 6, 2020, 6:49 PM

#

@serene scaffold well, they took the SciBERT model, which is a version of BERT pretrained on scientific text. Then they add additional layers right after the output of SciBERT. To be more precise, they use average pooling, then a fully connected layer (with a tanh activation function) going to a CUI-sized layer. I am not familiar with cosine similarity and how it can be applied there, so I'm unable to describe this layer, but then they use a softmax function, which gives you the probabilities

#

So, after you add these layers you're going to have to train this net again a little bit on your particular problem and dataset

#

I don't think I can explain it in more detail

#

But I can say what you need to try to google: BERT fine-tuning, adding layers to pre-trained networks and maybe transfer learning

serene scaffold Jul 6, 2020, 7:28 PM

#

@cinder pilot Thanks! I'm busy with something else today but I plan to dive into this some more pretty soon

tight stone Jul 6, 2020, 8:05 PM

#

@flat quest @paper niche
These are some images about what my model is made of and how it performs in general.

It usually starts to look accurate at epoch 7, but I have also seen cases where acc keeps rising even after epoch 7 - that's why I put in the EarlyStopping.
Usually, my training curve for acc rises just like that. The same goes for the training curve for loss. The only issue is that my val_acc/val_loss curve never really fits, because it randomly seems to be good or bad.

📎 unknown.png

#

📎 unknown.png

#

📎 unknown.png

#

> Nice app. Very reminiscent of QuickDraw and the likes.
Thanks, I actually didn't know about QuickDraw hahaha.

> If it were me, I would just stick to 1 model first (say, the one that you showed in your image that learnt from triangles and circles), and attempt to debug on that 1 model.
>
> what kind of drawings is the model prone to getting wrong? Do some tests on a bunch of different drawings. Maybe triangles with very wide bases are being misconstrued as circles. Or maybe it's confused by right-angled triangles, etc.
I did a bunch of tests with various forms but the results do not differ that much.
Though, what I realized was that it sometimes guesses things wrong depending on where I am drawing on the 3rd canvas. For example: It guesses most of my triangle-drawings right but, suddenly, guesses my triangle wrong because I drew it in the left corner of the 3rd canvas.
Maybe that has something to do with my randomly generated samples?

> Similarly, the flip side: what kind of drawings is your model prone to getting correct?
None in particular. Probably because my neural network is built whenever both input images are drawn and a number of samples is set.
So basically, every time I draw my 2 input-images, set the number of samples and start generating all the stuff (samples+NN) I create a brand new neural network.

> Then dig deeper into & visualize the filters the model has learnt. Which pixels are getting picked up as "important" for the classification task?
I really didn't find any help while googling for that. Would you be so kind as to tell me how I can do that? Whenever I search for visualizations of my layers/filters I just find articles about why I should do that, but not how to do it.

@paper niche

modest blaze Jul 6, 2020, 8:51 PM

#

I have a list of dataframes with 2 columns in each df, what's the best way to get the max value from the list of dataframes per column?

chilly geyser Jul 6, 2020, 9:01 PM

#

Do you want the max per column or

modest blaze Jul 6, 2020, 9:02 PM

#

I want max value across all dataframes of a given column

chilly geyser Jul 6, 2020, 9:03 PM

#

I'm not sure if combining the DFs then max is faster or if you get df-by-df-max, then take column-max over that is better

#

But those are the two ways I'd do it

modest blaze Jul 6, 2020, 9:04 PM

#

i have the latter, but was curious if there was a more "pythonic way" using list comprehension/zip maybe

chilly geyser Jul 6, 2020, 9:08 PM

#

I can't get a python-yet-good way of getting it for both columns

#

This is what I have if you want the first column:
`max(map(lambda x:x.max(), df_list), key=lambda x:x[0])[0]`

#

But I think it's a lot better to do this:

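from numpy.random import uniform
from pandas import DataFrame

# five example frames, 10 rows x 2 columns each
df_list = [DataFrame(uniform(size=(10, 2))) for _ in range(5)]
num_cols = df_list[0].shape[1]
col_max = [-float('inf')] * num_cols
# one pass over the per-frame maxima, updating every column's running max
for df_max in (df.max() for df in df_list):
    for idx in range(num_cols):
        # .iloc looks up by position, so this also works with named columns
        if col_max[idx] < df_max.iloc[idx]:
            col_max[idx] = df_max.iloc[idx]
print(col_max)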

#

So if you have and want 40 columns you don't iterate through it 40 times unnecessarily

#

Oh and uniform is a numpy.random func while DataFrame is from pandas

#

^This is assuming you want col 1 max and col 2 max by the way, even if they are different members of df_list

flat quest Jul 6, 2020, 9:43 PM

#

you don't want to use pure-Python loops when dealing with datasets. Python lists are terribly slow compared to vectorized matrix operations @modest blaze

#

@tight stone. What are the class distributions for your dataset? roughly even in circles and triangles?

lusty coral Jul 6, 2020, 9:45 PM

#

hey guys

#

is it possible to relate two same shaped numpy arrays with each other?

#

relating because i want to preserve the order of the data

#

one array is [4, 5, 6] and the other [33,44,88]. but the first elements of each array are related. how can i preserve it that way?

flat quest Jul 6, 2020, 9:51 PM

#

not sure what you're exactly trying to do

You're trying to relate them in what way?

lusty coral Jul 6, 2020, 10:12 PM

#

yeah they are related to each other, i dont want to store them in a single ndarray though

#

they'll be separated but i want to know the "indices" of the arrays as i calculate different things you know

#

i know i can use dataframe and series and stuff but numpy is like millions of times faster in my case where i iterate over rows of data and do calculations with them

#

using their order @flat quest

#

order of existence

#

i dont know if that s the right term 😄

spice widget Jul 6, 2020, 10:34 PM

#

If I have a query regarding Python for finance, can it get answered here? It's a pretty basic question

hearty jewel Jul 6, 2020, 11:08 PM

#

lol why does the imputer argument for axis have axis=0 meaning columns, when in every other API it is axis=1 for columns

safe tapir Jul 6, 2020, 11:09 PM

#

Is there an explanation for models that make sense only on the margins?

For example, suppose I have a stock picking model which selects the top N stocks for a time period. For the universe of stocks, there is poor R^2, but the edge cases seem to perform well in backtesting. Is it acceptable to use the model, or is it biased?

flat quest Jul 6, 2020, 11:22 PM

#

er xd i still didn't get how ur trying to relate them. @lusty coral

like 4 is related to 33, 5 to 44, and so forth?

tight stone Jul 6, 2020, 11:56 PM

#

> @tight stone. What are the class distributions for your dataset? roughly even in circles and triangles?
@flat quest Em, I am not sure if I got your question completely, sorry.
I usually have the same amount of training data for each image. So, for example, 20.000 for input-image1 and 20.000 for input-image2. But there are very few cases where my randomized image transformation fails. So, there might be cases where I have 19.980 and 20.000.

chilly geyser Jul 7, 2020, 12:26 AM

#

I assume the question is asking if the classes are balanced

#

But more directly the full distribution would be the exact percentage of each class

#

Imbalanced classes create problems

tight stone Jul 7, 2020, 12:29 AM

#

They are. At least that's how I set it up.

paper niche Jul 7, 2020, 12:38 AM

#

@tight stone

> I really didn't find any help while googling for that. Would you be so kind as to tell me how I can do that?
Something like https://machinelearningmastery.com/how-to-visualize-filters-and-feature-maps-in-convolutional-neural-networks/; the general idea is to extract out the weights of the conv. layers and visualize them on a heatmap. Or extract the output after 1 convolution to see what has been done to your original image by each of the learned filters.

But I realize your architecture is just a feed-forward NN. Have you tried your luck with CNN's yet? You might try to visualize the dense layer weights that you have via the method above, but I'm not sure that you would find the model is learning anything intuitive (to us humans) from your images (like a 'triangle' has 3 vertices as opposed to a circle having none). CNNs are more suited for such feature extraction tasks. (or rather, tasks that rely on the NN learning specific spatial features like the existence of straight lines, or pointy-ends or what-not).

> Maybe that has something to do with my randomly generated samples?
How much 'variety' is there in your augmented samples? What kinds of augmentations are you doing? You generate 20k different images from a single one; if they were all just translations of each other, you can imagine it wouldn't do much to help the model generalize to your '3rd' image (the test set) if it were a slightly deformed triangle or something. It's an extreme example, but you get my drift.

Machine Learning Mastery

Jason Brownlee

How to Visualize Filters and Feature Maps in Convolutional Neural N...

Deep learning neural networks are generally opaque, meaning that although they can make useful and skillful predictions, it is not clear how or why a given prediction was made. Convolutional neural networks, have internal structures that are designed to operate upon two-dimens...

tight stone Jul 7, 2020, 12:52 AM

#

@paper niche No, I haven't tried CNN's yet. I used this kind of model because I just happened to see that in one youtube-video where somebody used it for the mnist-dataset.
But I get the point why a CNN would possibly suit this case better.

Thanks for the link, I will give it a shot.

I use 4 different transformations: perspective transformation, translation, scaling and rotation.
Also, I use them randomly. So, there might be a transformed image that has gone through all 4 whereas there might be another image that has been translated, scaled and rotated. The order of transformations is also random.

I will just try out a CNN and see how this behaves.

flat quest Jul 7, 2020, 3:21 AM

#

make a test set
@tight stone don't train on this portion of the dataset.

Once you're done training on the training set, try evaluating on the test set. That'll tell you if the model is just screwing up on that single image, or on the other images as well. Convnets will boost performance, but an FFN should do decently well
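
A minimal sketch of holding out such a test set, in plain Python. The function name, the 20% fraction and the sample data are all assumptions for illustration; libraries like scikit-learn provide a ready-made equivalent:

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle once, then carve off a test portion that is never trained on."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]  # (train, test)


# Made-up (image_name, label) pairs standing in for the generated samples.
samples = [(f"img_{i}", i % 2) for i in range(100)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

The difference from a validation split is in how the data is used: validation data typically guides decisions during training (for example early stopping), while the test set is only looked at once training is finished.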

ebon nebula Jul 7, 2020, 5:39 AM

#

Hello all. Can someone recommend me a book/site/app where I can start studying Data Science? (Beginner level)

hardy shale Jul 7, 2020, 6:42 AM

#

@ebon nebula
https://www.freecodecamp.org/learn
They have a large section of material on Data Analysis which introduces you to Numpy/Pandas/Jupyter, even data cleaning. From there you can move on to the projects themselves and they even have a course over data visualization with D3. You can even get into ML as well

freeCodeCamp.org

Learn to code. Build projects. Earn certifications. Since 2015, 40,000 graduates have gotten jobs at tech companies including Google, Apple, Amazon, and Microsoft.

ebon nebula Jul 7, 2020, 6:43 AM

#

Thanks !!

cunning furnace Jul 7, 2020, 7:53 AM

#

Not sure if this is the correct channel. What module does this look like?

📎 nMD02.png

uncut shadow Jul 7, 2020, 9:01 AM

#

what do you mean

chrome barn Jul 7, 2020, 9:17 AM

#

probably which library was used to create the graph. hard to tell from the graph, but to create such a graph look into popular libraries like matplotlib and plotly, for example, and look for 3d plots in the documentation of these libraries

lapis sequoia Jul 7, 2020, 11:36 AM

#

@hardy shale thanks

robust marsh Jul 7, 2020, 12:36 PM

#

where can i find some good tutorials about Dash? i have no experience with plotly only know bare basics about html enough to write my flask app

tight stone Jul 7, 2020, 1:27 PM

#

@flat quest So, I should have a completely separated set just for testing?
Isn't the validation_split already doing this?

Or are you talking about pinpointing my mistake by using a training set that is not bound to my web-application (for example by using Jupyter Notebook)?

safe tapir Jul 7, 2020, 2:01 PM

#

Is there a way to get a "rolling" timeseries split?

In traditional TS split, you get this behaviour:

[0], [1]
[0, 1], [2]
[0, 1, 2], [3]
[0, 1, 2, 3], [4]

I want this behaviour:

[0], [1]
[0, 1], [2]
[1, 2], [3]
[2, 3], [4]
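
A rolling window like this can be sketched in plain Python. The function name `rolling_splits` and its `max_train_size` argument are illustrative, not taken from the discussion; scikit-learn's `TimeSeriesSplit` also accepts a `max_train_size` parameter that caps the training window in a similar way.

```python
def rolling_splits(n_samples, max_train_size):
    """Yield (train, test) index lists with a capped training window.

    Each test fold is the single next index; the training fold is the
    most recent `max_train_size` indices before it.
    """
    for test_idx in range(1, n_samples):
        start = max(0, test_idx - max_train_size)
        yield list(range(start, test_idx)), [test_idx]


splits = list(rolling_splits(n_samples=5, max_train_size=2))
for train, test in splits:
    print(train, test)
```

The window grows until it reaches `max_train_size` and then slides forward one step at a time.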

astral scaffold Jul 7, 2020, 2:13 PM

#

hi, do you have any reference for sound classification using deep learning? thank you

pure axle Jul 7, 2020, 2:33 PM

#

Hi Guys, I got an interview coming up for a Data Analyst role, i applied for, I have been told that I will be given a dataset and within half an hour i need to analyst and produce a report. I am still working on my learning and analyzing skills. Could someone give me example, what kind of analysis one can carry out on a simple data set? Thank you in advance.

chrome barn Jul 7, 2020, 3:14 PM

#

it really depends on the kind of dataset they will give you and the tools that are available for you to use. have they given you any information about that?

pure axle Jul 7, 2020, 3:38 PM

#

Thank you for your response. I haven't been given any information other than , I am told that I will be doing an unseen Task (30 min) at the start of the interview and I will be emailed the task- my guess is since it will be emailed to me it could be a dataset on an excel spreadsheet. The data set is in the Health and Safety Directorate of Transport authority i am currently working with. So dataset may contain some information on Health & Safety elements.

#

I will appreciate any advice or even example of what kind of simple ..may be a universal analysis I can carry out. Thank you.

chrome barn Jul 7, 2020, 3:48 PM

#

ah oke so it is a remote test only with a 30 minute time limit after which you have to send it back, i am not really domain knowledgeable about the industry your operating in so it will be hard to give specific advice for that but have you experience with working with datasets and drawing conclusions from them?

chrome barn Jul 7, 2020, 4:06 PM

#

if you want some dataset to practice with: use your favorite search engine: search for the tableau super store dataset it is an excel file. Open the file in excel or python and try to make some meaningful graphs and tables with the data. If you are stuck on what to do, or need some inspiration, search again with your favorite search engine and this time for tableau public dashboards that make use of the super store dataset to get a feeling of what kind of dashboard/graphs that can be made with the data and what questions they try to solve (sales per state, sales over time, etc), good luck with your interview

spare lotus Jul 7, 2020, 4:07 PM

#

Does Python have a way of fitting a stable distribution like Matlab's StableDistribution? I've found this https://erdogant.github.io/distfit/pages/html/Parametric.html#distributions and it has levy_stable, cauchy and norm distributions, but all of these have the alpha parameter fixed. Matlab can fit the alpha parameter as well. Is there a way to do this in Python?

limpid raft Jul 7, 2020, 4:12 PM

#

Can someone recommend a site to learn about convolutional nets?

chrome barn Jul 7, 2020, 4:15 PM

#

you could check out fastai and check if it is your cup-of-tea of learning style and what your looking for

limpid raft Jul 7, 2020, 4:22 PM

#

I'll take a look at it, thx

flat quest Jul 7, 2020, 4:36 PM

#

@tight stone

well generally you'll need to have a train set, a val set, and a test set.

While in your case the validation_split does do that. You generally make your model better based on validation_data. You use the test_data to do a final check on how good your model performance is (i.e. you only use it once - at the very end).

Your model might just be performing badly for that one triangle. I wanted to see if the model was performing badly on other triangles as well

tight stone Jul 7, 2020, 5:51 PM

#

@flat quest Ah, I see, sorry for not getting it right away.
I will try generating a test_set and see how model handles the test-images.
I can't really tell if this helps but at least it might help me pinpoint my mistakes, thanks.

flat quest Jul 7, 2020, 5:53 PM

#

yeah
its just to see if your model performs decently on other triangles. Cause then we know its probably not a programming error you made

scarlet river Jul 7, 2020, 8:37 PM

#

can anyone here help with pandas? currently i imported a jsonfile and turned it into a dataframe and im trying to get summary statistics on price (category) using df.describe but i want it to only count the prices in a row for a specific product (which is a separate column) how could i do this?

chrome barn Jul 7, 2020, 9:14 PM

#

if i am understanding your question correctly the groupby method could help you with this: an example would be df.groupby(["column_with_product"],as_index=False).agg({"column_with_price":"count"}) you could use a filter afterwards to only get the specific product(s) you need

lapis sequoia Jul 7, 2020, 9:59 PM

#

How can I visualize a plot of 21 lines easier lol

#

This just looks like a mess

#

Is there any better way to visualize that lol

#

📎 image0.jpg

obsidian mica Jul 7, 2020, 10:46 PM

#

is there a good library to use for data analysis on a list of numbers

like seeing deltas of when the numbers go up after x many indexes and back down etc, see if theres patterns

would numpy do something like that

pseudo sonnet Jul 7, 2020, 11:15 PM

#

Ok so I have a set of 1.4 million research abstracts and an exclusion list of 10k phrases

#

What I need to do is count the number of times a phrase from the exclusion list appears in each abstract

#

with open('suffix_phrase_exclusion.txt') as f:
    re_exclusion = f.read()

re_exclusion = re_exclusion.split('\n')
re_exclusion = re.compile('|'.join(re_exclusion))


def find_num_exclusion(regex, abstract):
    import re#necessary because of multiprocessing
    return len(re.findall(regex, abstract))


n_proc = multiprocess.cpu_count()
start = time.perf_counter()
with multiprocess.Pool(processes=n_proc) as pool:
    # starts the sub-processes with blocking
    # pass the chunk to each worker process
    results = pool.map(partial(find_num_exclusion, re_exclusion), data['Abstract'].to_list())
end = time.perf_counter()
print(end - start)

data['num_exclusion'] = results
data['num_exclusion'].to_csv('num_exclusion.csv', sep='\t', header=True)
data.head()

#

Is there any way I can speed this up more? It's been running for almost an hour

#

As you can see, I've parallelized it and I'm using compiled regex

rancid brook Jul 7, 2020, 11:21 PM

#

Thats a pretty huge regex

pseudo sonnet Jul 7, 2020, 11:21 PM

#

it is

rancid brook Jul 7, 2020, 11:22 PM

#

searching for 10k seperate things in each of the 1.4 million papers is gonna take quite a while

#

14 billion invididual checks

pseudo sonnet Jul 7, 2020, 11:22 PM

#

any idea ballpark how long?

rancid brook Jul 7, 2020, 11:23 PM

#

you could test it for like 100 papers and then extrapolate

pseudo sonnet Jul 7, 2020, 11:23 PM

#

If it's like a few hours I'll just let it run

#

I thought of that but I'd have to terminate what's running

#

Which is my fault lmao

rancid brook Jul 7, 2020, 11:24 PM

#

Why?

pseudo sonnet Jul 7, 2020, 11:25 PM

#

I'm at 100% load on my CPU. If I run a test alongside this I probably wouldn't get an accurate time right?

rancid brook Jul 7, 2020, 11:25 PM

#

True, but you'll at least get an upper bound

pseudo sonnet Jul 7, 2020, 11:27 PM

#

I mean off the top of your head would you have any idea how to calculate how much memory it would end up using?

#

My memory usage has been creeping up the whole time and if I knew what the final size would be I could get an idea how far along it is

rancid brook Jul 7, 2020, 11:28 PM

#

that sounds like a memory leak

#

Is it from all the numbers of words your storing?

#

mm can't be because 1.4 million numbers takes up like 5 MB of space

pseudo sonnet Jul 7, 2020, 11:32 PM

#

looking at task manager I have a python process where there's around 16 subprocesses actually using CPU power

#

But then there's a ton of top level python processes taking around 60MB each

#

no CPU power

#

Yeah something smells here it's probably gonna run out of memory before it finishes

#

I guess I'll just terminate it and do some tests

#

So yeah it would take 19 hours to do all of it

pseudo sonnet Jul 8, 2020, 12:11 AM

#

Hey I found an issue

#

It's matching every space

#

yay
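
One way a pattern built like this can end up matching far too much, assuming the phrase file ends with a newline or contains blank lines: `split('\n')` then yields empty strings, and an empty alternative in the joined pattern matches at every position. A minimal sketch of a safer build step follows; the helper names and sample phrases are hypothetical.

```python
import re


def build_exclusion_regex(phrases):
    """Compile one alternation from a list of literal phrases."""
    # Drop blank entries, which would otherwise match the empty string.
    cleaned = [p.strip() for p in phrases if p.strip()]
    # Longest first so longer phrases win over their own prefixes.
    cleaned.sort(key=len, reverse=True)
    # Escape so punctuation in a phrase is matched literally.
    return re.compile("|".join(re.escape(p) for p in cleaned))


def count_exclusions(regex, abstract):
    return len(regex.findall(abstract))


raw = "in this paper\nwe propose\n\n"   # note the trailing blank lines
naive = re.compile("|".join(raw.split("\n")))
safe = build_exclusion_regex(raw.split("\n"))

text = "in this paper we propose a method"
print(count_exclusions(naive, text))  # inflated by empty matches
print(count_exclusions(safe, text))
```

Escaping and filtering cost nothing at match time, since the work happens once when the pattern is compiled.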

turbid hearth Jul 8, 2020, 1:45 AM

#

I am trying to check if a pandas dataframe has NaNs and then output an error message if it does

#

what can i do in pandas to do that

spiral peak Jul 8, 2020, 1:53 AM

#

You can do something like: df.isnull().values.any()
That'll run through the entire dataframe and determine if each cell/item is null/NaN. .values drops the axis labels and leaves just the values in a NumPy array, and then .any() checks whether any of them is True.

turbid hearth Jul 8, 2020, 2:05 AM

#

would i have to filter the dataframe first so there are only numerical columns?

#

since the dataframe im working with would have categorical and numerical data

spiral peak Jul 8, 2020, 2:06 AM

#

It should work for any type of cell. All it's doing is checking if it's NaN/Null or not. It doesn't really care what the datatype is inside beyond that since it converts it to boolean based on the .isnull() output.

past maple Jul 8, 2020, 6:12 AM

#

hello. i want to extract data from a PDF to csv format.

#

📎 v1.png

#

its arranged in this format.
So how do i go about it?

rough lava Jul 8, 2020, 6:22 AM

#

if you do get an answer to that, I would love to know about it!

chrome barn Jul 8, 2020, 7:01 AM

#

you could look into tabula-py: https://aegis4048.github.io/parse-pdf-files-while-retaining-structure-with-tabula-py it worked for me when extracting data from pdf files

Pythonic Excursions

Parse PDF Files While Retaining Structure with Tabula-py | Pythonic...

It's hard to copy-and-paste rows of data out of PDF files. Try tabula-py to extract data into a CSV or Excel spreadsheet using a simple, easy-to-use interface.

dull turtle Jul 8, 2020, 8:46 AM

#

i want to train my model until i get "loss < 0.05" and "accuracy > 85%". is this the correct way to code that?
```python
if score[0] < 0.05 and score[1] > .85:
    # model_json = model.to_json()
    # with open("model.json", "w") as json_file:
    #     json_file.write(model_json)
    model.save_weights(save_path+country+"model.h5")
    model.save_weights(save_path+country+".model")
    print("model saved...1")
else:
    data["epoch"] += 100
    # epoch = epoch + 200
    print("model retrained...")
    print("epochs 2", data['epoch'])
    model.save(save_path+country+'.model')
    model.save(save_path+country+'.model.h5')
    print("model saved...after retraining")
    # self is passed implicitly when calling a bound method
    self.trainmodel(country, data['epoch'])
```

chilly sphinx Jul 8, 2020, 11:03 AM

#

i am creating a desktop asistant i require it to understand sentences can anyone help me with that

marble jasper Jul 8, 2020, 11:07 AM

#

quite a complex task, look into NLTK as a starting point, but that rabbit hole goes deep

#

unless you want to go use neural networks, for which lately BERT and GPT-2/3 are making big splashes, look into parse trees

#

this is the one on nltk's website: https://www.nltk.org/_images/tree.gif

#

this is part of one that Google's NLP API (which performs better than NLTK) produces:

📎 unknown.png

#

you'd hop through the tree to identify what the user wants in terms of verbs and nouns. It's quite complex doing it this way (due to the many different ways someone can express something), and you may be better off limiting the user to a very strict set of syntaxes

lapis sequoia Jul 8, 2020, 11:10 AM

#

import numpy as np

class perceptron(object):
    def __init__(self, eta=0.01, iters=50, random_state=1):
        self.eta = eta                    # learning rate
        self.iters = iters                # passes over the training set
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        # small random starting weights; w_[0] is the bias
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors_ = []
        for _ in range(self.iters):
            errors = 0
            for xi, target in zip(X, y):
                # update is zero when the prediction is already correct
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)   # misclassifications in this pass
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)

#

can somebody explain the def fit function please?

#

i tried to break this down but couldnt understand
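
As a rough illustration of what `fit` does, the same update rule can be traced in plain Python on a tiny data set. This is a sketch under stated assumptions, not the code from the question: weights start at zero instead of small random values, `eta` is 1 so the arithmetic stays in integers, and the AND-style data is made up for the example.

```python
def predict(w, b, x):
    """Return 1 if the weighted sum is non-negative, otherwise -1."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if net >= 0 else -1


def fit(X, y, eta=1, iters=10):
    w = [0] * len(X[0])   # one weight per feature
    b = 0                 # bias, the role played by w_[0] in the class
    errors_per_epoch = []
    for _ in range(iters):
        errors = 0
        for x, target in zip(X, y):
            # Zero when the prediction is right, +/- 2*eta when it is wrong.
            update = eta * (target - predict(w, b, x))
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update
            errors += int(update != 0)
        errors_per_epoch.append(errors)
    return w, b, errors_per_epoch


# Hypothetical toy data: the label is 1 only when both inputs are 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b, errors = fit(X, y)
print(w, b, errors)
```

Each inner-loop step nudges the weights toward the target only when the sample was misclassified, so the per-pass error count falls to zero once the data has been separated.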

wise cypress Jul 8, 2020, 1:34 PM

#

Anyway to change externalstylesheets in dash later after declaring the app?

glacial rune Jul 8, 2020, 2:56 PM

#

If I've got some data that erroneously flatlines (amongst real, trending data) for a couple of hours, does anyone have any suggestions on how to identify it? As doing if current value = previous value doesn't sound entirely robust, so maybe I could incorporate a check to see if most of the values are the same?

marble jasper Jul 8, 2020, 3:06 PM

#

sounds like checking the range or variance in a sliding window might be good?
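
A minimal sketch of the sliding-window idea, assuming the readings are a plain list of numbers. The function name, window length, and tolerance are illustrative choices.

```python
def flat_windows(values, window=4, tol=0.0):
    """Return start indices of windows whose range (max - min) is <= tol."""
    starts = []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        if max(chunk) - min(chunk) <= tol:
            starts.append(i)
    return starts


readings = [10, 12, 11, 15, 15, 15, 15, 15, 14, 13]
print(flat_windows(readings, window=4))
```

A non-zero `tol` would also catch sensors that jitter slightly around a stuck value.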

stuck cloak Jul 8, 2020, 6:08 PM

#

hi guys. new here and new to programming as well.

#

I have a question that maybe you guys can help me with. I have this dataset that I want to build a CNN model for. I'm trying to convert the sequence column to numerical values but don't know how. Can anyone help?

📎 Screen_Shot_2020-07-08_at_9.54.51_AM.png

rigid summit Jul 8, 2020, 6:13 PM

#

Is there a way to stop the Kite pop-up from loading onto my screen when I start Spyder?

lapis sequoia Jul 8, 2020, 6:34 PM

#

if i have a bunch of dates and i want to predict next months values, but the values depend on specific variables, should i use time series forecasting or regression?

#

or both

desert oar Jul 8, 2020, 7:01 PM

#

@stuck cloak "how to convert something to numerical values" is a huge topic

#

you might want to refer to the genetics literature for your problem. they were early adopters of machine learning and there should be plenty of field-specific techniques available

#

it also looks like you might need to apply some string/text cleaning first...

slow flare Jul 8, 2020, 8:12 PM

#

@lapis sequoia ARIMA with transfer functions. Unless you have a lot of variables, then you could try a RNN.

lapis sequoia Jul 8, 2020, 8:14 PM

#

ty

desert oar Jul 8, 2020, 9:30 PM

#

are there any good transfer function libraries in python right now?

#

for time series modeling

last agate Jul 8, 2020, 11:44 PM

#

Is Google colab enough when starting with machine learning?

#

As in I don't have the hardware needed

flat quest Jul 9, 2020, 12:01 AM

#

yeah its a great starting point

flat quest Jul 9, 2020, 1:04 AM

#

tho its limits on ram and storage aren't very generous. If you start working with larger datasets you might need to use a local computer or cloud

slow flare Jul 9, 2020, 2:10 AM

#

@desert oar you could try the NAG library, TSA submodule. I haven’t performed any time series analysis in python. I’m very strict in using R for stats and python for ML. 🙂

desert oar Jul 9, 2020, 3:20 AM

#

interesting thank you

#

yes normally i use R as well for that kind of thing

solid aurora Jul 9, 2020, 5:14 AM

#

When I perform feature normalization on my dataset,

#

should I perform it on the entire thing?

#

or just the training data (and obviously re-apply it to my test data)?

lapis sequoia Jul 9, 2020, 10:57 AM

#

can someone help with RL and time series?

glacial rune Jul 9, 2020, 11:24 AM

#

I have a group of shops and their prices with timestamps and want to plot this on a graph
I've made 2 dictionaries, both with key as shop name, then variable as prices and timestamps as a list
I think I can populate the two lists then plot them
is this a sensible way of going about this?

lapis sequoia Jul 9, 2020, 12:19 PM

#

can anyone tell me what exactly the difference is between the types of gradients?

#

i know the three types

#

but i exactly donno the difference

desert oar Jul 9, 2020, 12:35 PM

#

@solid aurora the latter: "learn" the normalization on the training set, apply to test set and at prediction time
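
A small sketch of that workflow in plain Python, with made-up numbers: the mean and standard deviation are computed from the training split only and then reused unchanged on the test split.

```python
from statistics import mean, pstdev

train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# "Learn" the normalization on the training data only.
mu = mean(train)
sigma = pstdev(train)


def standardize(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


train_scaled = standardize(train, mu, sigma)
# Reuse the training statistics; never refit on the test data.
test_scaled = standardize(test, mu, sigma)
print(mu, sigma, test_scaled)
```

Fitting on the full data set would leak information about the test split into training.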

#

@glacial rune have you used pandas and/or matplotlib before?

#

@lapis sequoia what are the three types?

lapis sequoia Jul 9, 2020, 12:37 PM

#

batch gradient descent,stochastic gradient descent and mini batch descent

glacial rune Jul 9, 2020, 12:42 PM

#

yup used matplotlib before

desert oar Jul 9, 2020, 12:46 PM

#

@lapis sequoia oh. those aren't 3 kinds of gradients

#

the gradient is the gradient

#

those are 3 different forms of gradient descent

#

which is a first-order optimization algorithm

lapis sequoia Jul 9, 2020, 12:47 PM

#

ohh ok i just confused

#

perceptron

desert oar Jul 9, 2020, 12:47 PM

#

@glacial rune can you provide some sample data? im not sure i understand the format of your data

lapis sequoia Jul 9, 2020, 12:48 PM

#

@desert oar means?

#

please?

desert oar Jul 9, 2020, 12:48 PM

#

gradient descent is an algorithm

#

there are different forms of it

lapis sequoia Jul 9, 2020, 12:48 PM

#

yea i know that

#

but what is the main difference between them?

#

those three?

desert oar Jul 9, 2020, 12:49 PM

#

the number of data points that are used to compute the parameter update

lapis sequoia Jul 9, 2020, 12:49 PM

#

ohh

desert oar Jul 9, 2020, 12:49 PM

#

stochastic = 1 point at a time

lapis sequoia Jul 9, 2020, 12:49 PM

#

is that like classification tasks?

desert oar Jul 9, 2020, 12:49 PM

#

mini batch = a few points at a time

#

batch = everything
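
The three variants differ only in how many points feed each update, which a single `batch_size` argument can express. A toy sketch, assuming a one-parameter model y = w * x with squared error; the data and names are made up.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.01, epochs=200, seed=0):
    """Fit y = w * x by minimizing mean squared error."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradient of mean((w*x - y)^2) over the current batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with w = 2

w_sgd = gradient_descent(xs, ys, batch_size=1)            # stochastic
w_mini = gradient_descent(xs, ys, batch_size=2)           # mini-batch
w_batch = gradient_descent(xs, ys, batch_size=len(xs))    # batch
print(w_sgd, w_mini, w_batch)
```

On this noise-free data all three settle near the same value; they differ in how many updates they make per pass and how noisy each update is.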

ebon nebula Jul 9, 2020, 12:50 PM

#

Hello all. What course/book would you suggest me to learn data science from. (I know the basics of python)

lapis sequoia Jul 9, 2020, 12:50 PM

#

ohhk

#

but is it like the perceptron and adaline ?

glacial rune Jul 9, 2020, 1:11 PM

#

yeah sure @desert oar !
so I actually have a list of dictionaries which was from a json:

[{"price" : 120, "shop" : "qwe", "timestamp" : "00:00"}, {"price" : 140, "shop" : "asd", "timestamp" : "00:00"}... {"price" : 130, "shop" : "qwe", "timestamp" : "01:00"} ]

is the sort of data I have in it. Ultimately I want to plot price against time for the different shops.
I've made two dictionaries using collections.defaultdict for storing values.
Both dictionaries have the different shop names as keys, e.g.

prices = {"qwe": [], "asd": []}
timestamps = {"qwe": [], "asd": []}

I'd like to put the prices and timestamp data into the empty lists in the dictionary

#

so I can ultimately plot it

#

is this a sensible approach?

desert oar Jul 9, 2020, 1:11 PM

#

i see

#

do you use pandas?

#

that will be the easiest way

#

otherwise you can do it "manually" like you're describing w/ the defaultdicts

glacial rune Jul 9, 2020, 1:13 PM

#

I've never used pandas no, I could have a look into it though!

desert oar Jul 9, 2020, 1:15 PM

#

yeah just use your current method then

#

nothing wrong with it
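
A sketch of the `defaultdict` approach described above, using made-up records in the same shape. The `%H:%M` time format is an assumption.

```python
from collections import defaultdict
from datetime import datetime

records = [
    {"price": 120, "shop": "qwe", "timestamp": "00:00"},
    {"price": 140, "shop": "asd", "timestamp": "00:00"},
    {"price": 130, "shop": "qwe", "timestamp": "01:00"},
]

prices = defaultdict(list)
timestamps = defaultdict(list)

for rec in records:
    shop = rec["shop"]
    prices[shop].append(rec["price"])
    # Parse to datetime so a plotting library can space the x axis properly.
    timestamps[shop].append(datetime.strptime(rec["timestamp"], "%H:%M"))

for shop in prices:
    # With matplotlib, plt.plot(timestamps[shop], prices[shop], label=shop)
    # would go here, followed by a single plt.show() after the loop.
    print(shop, prices[shop])
```

Because both dictionaries are filled in the same loop, the lists for each shop stay aligned index by index.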

glacial rune Jul 9, 2020, 1:33 PM

#

ok, managed to populate the prices and timestamps dictionaries, so if I wanted to plot qwe timestamps on the x axis, how do I refer to that in the code?

desert oar Jul 9, 2020, 1:41 PM

#

oh you should convert them to datetime objects

glacial rune Jul 9, 2020, 1:41 PM

#

yeah I've converted the timestamps to datetime objects

desert oar Jul 9, 2020, 1:41 PM

#

then just iterate over shop names and plot as desired

glacial rune Jul 9, 2020, 1:41 PM

#

ahh ok

#

that makes sense I'll give that a go

#

thanks!

glacial rune Jul 9, 2020, 2:16 PM

#

I'd like it to plot it all on the same graph but the way I've set up my loop, it's plotting it after every
plt.plot(...) line

#

I only have plt.show() after all of the plt.plot() iterations as I thought it would only show once it reaches plt.show()

#

but the graph pops out when the plt.plot() line is ran... is this normal? all the guides I've seen have multiple plt.plots and then a plt.show() to make the graph appear

#

nvm, it was an indent 😄

lapis sequoia Jul 9, 2020, 2:45 PM

#

is it bad to have a value of 0 at ACF and PACF values of 1

#

for time series

lapis sequoia Jul 9, 2020, 3:07 PM

#

I'm trying to work with an API but it's a little bit confusing because I don't have the best experience when it comes to JSON.

Code:
https://paste.pythondiscord.com/sicocalasi.py

Error:
I'm trying to get this right here:
"stats": {
"hp": "39",
"attack": "52",
"defense": "43",
"sp_atk": "60",
"sp_def": "50",
"speed": "65",
"total": "309"
}
But it's pretty hard because it has the "{" and for some reason confuses my code and apparently it isn't an array it's bigger so how would I get this to work?

#

hp = dictionary["hp"]

hp = dictionary[{"hp"}]

#

None of these two works

desert oar Jul 9, 2020, 3:14 PM

#

@glacial rune do you want them all on the same plot? or a grid of plots

#

oh you got it working

#

ok

glacial rune Jul 9, 2020, 3:15 PM

#

yeah I messed up an indent 😄

#

thanks for your help!

#

I now have plotted my data and have some flat lines that I want to get rid of. I’m fairly sure the value just stays stuck for a long period of time so could I perhaps iterate through the list of times and simply check if previous value = current value occurs consecutively over a large period of time then remove those data points?

chrome barn Jul 9, 2020, 3:19 PM

#

@lapis sequoia try: hp = dictionary[0]["hp"]

lapis sequoia Jul 9, 2020, 3:19 PM

#

ok

chrome barn Jul 9, 2020, 3:21 PM

#

@lapis sequoia for stats do hp = dictionary[0]["stats"]["hp"]
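
For illustration, assuming the API returns a JSON array whose first element holds a `stats` object like the one quoted in the question (the surrounding structure is a guess, since the full response is not shown):

```python
import json

# Hypothetical response body, trimmed to a few of the quoted fields.
raw = '[{"name": "example", "stats": {"hp": "39", "attack": "52", "speed": "65"}}]'

data = json.loads(raw)        # a list of dicts
stats = data[0]["stats"]      # "{...}" in JSON becomes a nested dict
hp = int(stats["hp"])         # values are strings here, so convert
print(hp)
```

Each `{` in the JSON is one more level of dictionary, so each level needs its own `["key"]` lookup.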

lapis sequoia Jul 9, 2020, 3:27 PM

#

thanks

ebon nebula Jul 9, 2020, 4:12 PM

#

Hi . Can someone recommended me a good course/site/book where I can study Data Science

lapis sequoia Jul 9, 2020, 4:32 PM

#

Hi, I have a question: what does this error message mean? Error in match.names(clabs, names(xi)) :

glacial rune Jul 9, 2020, 4:51 PM

#

Ok so I want to remove consecutive duplicates, but only if there are more than n consecutive duplicates... does anyone have a way of doing this please?

#

ah sorry, consecutive duplicates of elements in a list

#

I've been googling and checking stack exchange but can't find something entirely

#

I have been looking through pandas videos to see if they have any

#

sure, so for example:
[9, 9, 2, 3, 4, 5, 6 , 8, 9, 9, 9 ,9 ,9 ,9 ,9 ,9, 2, 4, 6, 6, 6, 6]
let's say if there are more than 3 consecutive duplicates, all of those will be removed
so we would have
[9, 9, 2, 3, 4, 5, 6 , 8, 2, 4]
could keep the first/last duplicate but not that important
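
One way to sketch this for a plain list uses `itertools.groupby`, which groups consecutive equal values. The function name and the `max_run` argument are illustrative.

```python
from itertools import groupby


def drop_long_runs(values, max_run=3):
    """Remove every run of equal consecutive values longer than max_run."""
    kept = []
    for _, run in groupby(values):
        run = list(run)
        if len(run) <= max_run:
            kept.extend(run)
    return kept


data = [9, 9, 2, 3, 4, 5, 6, 8, 9, 9, 9, 9, 9, 9, 9, 9, 2, 4, 6, 6, 6, 6]
print(drop_long_runs(data))
```

To keep the first element of a long run instead of dropping it entirely, the `if` could get an `else: kept.append(run[0])` branch.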

desert oar Jul 9, 2020, 5:00 PM

#

is a dict not similar?

glacial rune Jul 9, 2020, 5:00 PM

#

actually the data I have is in dicts

#

with lists as the values

#

but it's consecutive duplicates within those lists

#

as I'm tracking them over time

#

so my initial data was a list of dictionaries:

[{"shop": "qwe", "price": 123, "time": "00:00"}, {"shop": "asd", "price": 156, "time": "00:00"}, {"shop": "zxc", "price": 236, "time": "00:00"} etc. ] # with changes in price over time

I made two dictionaries to store the data, the keys being shop for both, but values being a list of prices and a list of time

#

as I want to remove flatlined data, I need to look for consecutive duplicates, no

#

?

#

I was thinking, if I could iterate over the lists and find say, >n consecutive duplicates, it would remove those for me... but can I iterate over the elements of a list within a dictionary?

ebon nebula Jul 9, 2020, 5:06 PM

#

Hi . Can someone recommended me a good course/site/book where I can study Data Science

slow flare Jul 9, 2020, 5:08 PM

#

Has anyone used any really good tools for labeling images with polygons instead of rectangular boxes? I am currently using labelImg.

chrome barn Jul 9, 2020, 5:14 PM

#

with pandas you could use the shift function to figure out which rows are duplicates and remove them afterwards; the drop_duplicates function with the subset argument might also work

peak zealot Jul 9, 2020, 6:36 PM

#

why is pandas converting discord ids like 308778632111980554 to 3.0877863211198054e+17?

#

I can just convert the ids to strings to fix it but I prefer to preserve them as ints

chrome barn Jul 9, 2020, 7:00 PM

#

pandas is not really converting them; because it is a large integer or float, the output is shown in scientific notation

plain forge Jul 9, 2020, 7:02 PM

#

That's how panda outputs large numbers
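
One thing worth checking, under the assumption that the column ended up as float64 (which pandas typically does when an integer column contains missing values): an ID this large does not fit exactly in a float, so the value itself changes, not only its display. Plain Python shows the effect:

```python
discord_id = 308778632111980554

as_float = float(discord_id)      # what a float64 column would store
round_trip = int(as_float)

print(as_float)                   # shown in scientific notation
print(round_trip == discord_id)   # the last digits do not survive
```

Reading the IDs as strings, as suggested in this thread, sidesteps the problem.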

lapis sequoia Jul 9, 2020, 7:15 PM

#

wouldn't it be easier if you would convert these large numbers into a string since they are IDs

#

@peak zealot

peak zealot Jul 9, 2020, 7:15 PM

#

yeah that's what i'm doing now

lapis sequoia Jul 9, 2020, 7:15 PM

#

👍

oblique belfry Jul 9, 2020, 7:19 PM

#

https://paperswithcode.com/methods

Papers with Code is such an awesome resource

Papers with Code - The Methods Corpus

753 methods • 25676 papers with code.

limpid raft Jul 9, 2020, 7:45 PM

#

Simple question about CNN kernel size. I'm trying to fit 8000 images of 32 by 32. Whenever I use a kernel size different than (1,1) it doesn't fit, which I don't understand why. Could anyone tell me what I should do to increase the kernel size or what I'm doing wrong? (I'd like to be more specific but I'm fairly new to this)

#

here the relevant code:

#

def build_model(learningrate=0.01):
    Model = Sequential()

    Model.add(Conv2D(64, kernel_size=(3,3), strides=(1,1), activation='relu', input_shape=(32,32,1), use_bias=False, kernel_initializer="he_uniform"))

    opt = tf.keras.optimizers.Adam(learning_rate=learningrate)
    # pass the configured optimizer; optimizer='adam' would ignore learningrate
    Model.compile(loss="mape", optimizer=opt)
    return Model

#

model = build_model()

#

n_epochs = 2
es_callback = EarlyStopping(monitor = 'loss', patience = 5)

model.fit(training_input, training_output, epochs=n_epochs, verbose=1, callbacks=[es_callback])
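
A possible explanation, assuming the targets are 32 by 32 like the inputs: with the Keras default `padding='valid'`, a 3x3 kernel shrinks each spatial dimension, so the layer's output no longer matches the target shape, while a 1x1 kernel leaves it unchanged. The arithmetic can be checked with a small helper (the function name is illustrative):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(32, kernel=1))             # 1x1 kernel
print(conv_output_size(32, kernel=3))             # 3x3, no padding ("valid")
print(conv_output_size(32, kernel=3, padding=1))  # 3x3, one-pixel padding
```

If that is the cause, passing `padding='same'` to `Conv2D` keeps the 32 by 32 size.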

flat quest Jul 9, 2020, 7:48 PM

#

totally agree @oblique belfry

Tho being able to understand those papers is a whole different thing. Most ppl have difficulty even when someone explains the paper to them.

oblique belfry Jul 9, 2020, 7:52 PM

#

True.

Though I guess that is an education thing. You have to have a certain amount of knowledge to understand what papers are talking about. But, I have found that although I might read a bunch of papers, trying to translate that in code is HARD.

#

I really like the push for Machine Learning and AI papers to post their code with the paper and if possible the dataset(s) used or the method of how to create that data yourself.

flat quest Jul 9, 2020, 7:54 PM

#

oh yeah definitely

its one thing to know the math behind it. Whole nother thing to be able to code it, even if you have a lot of exp with ml libraries.

Yeah I wish they did that too. But i suppose the papers would get even longer than they are now then

oblique belfry Jul 9, 2020, 8:00 PM

#

To me, the code is more enlightening to the process than the paper.

flat quest Jul 9, 2020, 8:09 PM

#

possibly

but you'll still need the actual verbal explanation. Code on its own won't cut it

glacial rune Jul 9, 2020, 9:31 PM

#

pick that dictionary's item and just iterate over it
@mellow saffron maybe I missed something super obvious then but I found it difficult to iterate over a dictionary since it wasn’t ordered? As I have each shop and price as a key: value respectively

#

So... iterating over the list for each one, I wasn’t sure about the syntax

peak zealot Jul 9, 2020, 9:49 PM

#

why is df.drop(labels=["whatever"]) always returning a KeyError when I definitely have a row with the appropriate value

#

my csv looks like this, and no matter what I put as role name it gives a KeyError:
guild_id,role_name
308778632111980554,Bot Dev
308778632111980554,Shade's Bots

dapper nexus Jul 9, 2020, 10:17 PM

#

Is data engineering competitive?

#

I can web scrape with python and sql

oblique belfry Jul 10, 2020, 12:06 AM

#

Yes.

pseudo sonnet Jul 10, 2020, 1:10 AM

#

So can I switch keras backends at runtime?

#

I have plaidML installed so i can accelerate my neural net on my R9 290 but I want to do the vectorization on my CPU

tawdry bobcat Jul 10, 2020, 6:01 AM

#

Any good resources on intent classification with python?

#

Sentence classifcation, sentiment analysis etc

lapis sequoia Jul 10, 2020, 6:08 AM

#

hey, i just started using numpy for my project as otherwise i wait like 30 minutes for my script to finish.

but im encoutering massive memory problems;
i have a folder of around 2 gb of text files, each file is around 200-500 kb but can go up to 10mb and i load them as 1d array like following:

for item_x in glob.glob(input_dir + '*'):
    mofo = np.append(mofo,[open(item_x).read().split('\n')])

after a couple of seconds it uses 10gb of my ram and i don't know how to fix it. anyone got an idea ?

#

the goal of my project is basically to compare each text file with each other and find similar text files,
i can do this with numpy using np.setdiff1d(input_1,input_2,assume_unique=True)
and then using the .shape function to calculate the difference in %
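
Repeated `np.append` copies the whole array on every call, and arrays of Python strings carry their own overhead, which may explain the memory growth. A lighter sketch, assuming each file can be treated as a set of unique lines; the function names and sample lines are made up, and the percentage is just one possible similarity measure.

```python
def read_lines(path):
    """Load one file as a set of its lines."""
    with open(path, encoding="utf-8") as f:
        return set(f.read().split("\n"))


def percent_missing(lines_a, lines_b):
    """Share of lines in A that do not occur in B, in percent."""
    if not lines_a:
        return 0.0
    return 100.0 * len(lines_a - lines_b) / len(lines_a)


a = {"alpha", "beta", "gamma", "delta"}
b = {"alpha", "beta", "epsilon"}
print(percent_missing(a, b))
```

With this layout only two files need to be held in memory per comparison, instead of one array containing every line of every file.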

rapid escarp Jul 10, 2020, 6:24 AM

#

Is there any API like google's Smart compose that i can use for my project ? Thanks for the help.

crude flame Jul 10, 2020, 9:19 AM

#

I did some data science online courses a while ago, but forgot a lot of it already... does anyone have a good cheat sheet or quick reference to brush up on the core concepts?

glacial rune Jul 10, 2020, 9:48 AM

#

following the code here: https://towardsdatascience.com/pandas-dataframe-group-by-consecutive-same-values-128913875dba
is there a way I can only show it if the group size is greater than a certain size?

Medium

Pandas DataFrame Group by Consecutive Same Values

Grouping Pandas DataFrame by consecutive same values repeated multiple times

lapis sequoia Jul 10, 2020, 9:55 AM

#

following the code here: https://towardsdatascience.com/pandas-dataframe-group-by-consecutive-same-values-128913875dba
is there a way I can only show it if the group size is greater than a certain size?
@glacial rune interesting

chrome barn Jul 10, 2020, 10:01 AM

#

use apply to iterate over each row and see if the value is greater than what you specify and if not just drop the row

glacial rune Jul 10, 2020, 10:09 AM

#

ok I figured it out - not familiar with looping when you have two things(?) to loop, e.g.

for k, v in df.groupby((df['v'].shift() != df['v']).cumsum()):

#

I figured that v was the group ('v' is the column heading) so I just made an if line to see if the group size was big enough

#

what are the proper names for 'k' and 'v' here?

#

also, if I want to delete the rows in those groups in my data, could I make a list of all the index numbers that show up in the group, then drop those from my original dataframe?
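On the naming question: iterating over a groupby yields (key, group) pairs, so `k` is the group key and `v` is a DataFrame holding that group's rows. A small sketch of both the size check and the index-based drop, assuming a toy column `v` and a hypothetical threshold `min_size`:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 1, 1, 2, 3, 3, 1, 1, 1, 1]})
min_size = 3  # hypothetical threshold for a "big enough" run

# A new run starts wherever the value differs from the previous row.
run_id = (df["v"].shift() != df["v"]).cumsum()

# key is the run number, group is a DataFrame with the rows of that run.
to_drop = []
for key, group in df.groupby(run_id):
    if len(group) >= min_size:
        to_drop.extend(group.index)

# Drop every row that belongs to a long run, using the collected index labels.
df_without_long_runs = df.drop(to_drop)

# The same filter without an explicit loop: size of the run each row belongs to.
run_size = df.groupby(run_id)["v"].transform("size")
df_vectorized = df[run_size < min_size]
```

Both results keep only the rows whose run is shorter than `min_size`.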

chrome barn Jul 10, 2020, 11:29 AM

#

df.loc[df["v"]-df["v"].shift(1) != 0, "v_prev"] = 1
df.loc[df["v"]-df["v"].shift(1) == 0, "v_prev"] = 0
df.loc[0,"v_prev"] = 1
df.loc[(df["v_prev"] == 1), "group"] = df['v_prev'].cumsum()
df['group'] = df['group'].fillna(method='ffill')
df['group_sum'] = df.groupby('group').v.transform('sum')
df = df[df['group_sum'] >= 10]

#

you could try something like this

#

it is based upon the same dataframe as in the link you mentioned earlier

glacial rune Jul 10, 2020, 1:43 PM

#

thanks, I forgot to check discord as I did it 😛 but I did something like:

df = df.drop(list(v.index))

#

as v is a dataframe of the group

graceful ice Jul 10, 2020, 2:02 PM

#

how to parse data from nested tables from html string

#

@chrome barn

chrome barn Jul 10, 2020, 2:04 PM

#

you got an example or the html string?

graceful ice Jul 10, 2020, 2:05 PM

#

yes I have the example string

arctic wedgeBOT Jul 10, 2020, 2:08 PM

#

Hey @graceful ice!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

serene veldt Jul 10, 2020, 2:09 PM

#

Hello, could someone help me understand how the mean works when working with 3d tensors?

#

📎 unknown.png

#

not understanding how its calculated when using this axis
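The screenshot is not available, so the framework and axis in question are unknown. As a sketch under that caveat, NumPy shows the general rule: averaging over an axis collapses that axis and keeps the others.

```python
import numpy as np

# Shape (2, 3, 4): 2 blocks, each with 3 rows and 4 columns.
t = np.arange(24).reshape(2, 3, 4)

# Averaging over an axis removes that axis from the shape.
mean_axis0 = t.mean(axis=0)  # shape (3, 4): the two blocks averaged element by element
mean_axis1 = t.mean(axis=1)  # shape (2, 4): the rows inside each block averaged
mean_axis2 = t.mean(axis=2)  # shape (2, 3): the columns of each row averaged

# By hand for axis=2, first row of the first block: (0 + 1 + 2 + 3) / 4
print(mean_axis2[0, 0])  # 1.5
# By hand for axis=0, top-left element: (0 + 12) / 2
print(mean_axis0[0, 0])  # 6.0
```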

graceful ice Jul 10, 2020, 2:10 PM

#

@chrome barn will pastebin work for you

chrome barn Jul 10, 2020, 2:10 PM

#

yes is fine

graceful ice Jul 10, 2020, 2:14 PM

#

@chrome barn https://file.io/xojV3JFEt6X1

#

check this

#

I want to fetch all tables where columns are more than 2

chrome barn Jul 10, 2020, 2:15 PM

#

it says file not found

ebon nebula Jul 10, 2020, 2:16 PM

#

Hello all. Can someone suggest me a course/book/site where I can start learning Data Science with Python.

graceful ice Jul 10, 2020, 2:19 PM

#

https://we.tl/t-1wuUm3bppX
@chrome barn

New Text Document (2).txt

1 file sent via WeTransfer, the simplest way to send your files around the world

arctic wedgeBOT Jul 10, 2020, 2:20 PM

#

Hey @graceful ice!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg.

Feel free to ask in #community-meta if you think this is a mistake.

graceful ice Jul 10, 2020, 2:21 PM

#

basically I want to read tabular data from emails

#

And I am not getting any ways to do that

chrome barn Jul 10, 2020, 2:22 PM

#

yes i see what you want

graceful ice Jul 10, 2020, 2:22 PM

#

can you help me

chrome barn Jul 10, 2020, 2:26 PM

#

there are multiple ways in which it can be done:

the easiest way that i know of will be using pandas and the read_html method but this will not always work
the second method which requires more work will be using a package like BeautifulSoup

#

and of course there will also be other methods but i am less familiar with them
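The promised snippet is not in the log, so the following is only a sketch of the BeautifulSoup route. It assumes the email body is already available as an HTML string (the sample `html` is made up) and keeps only tables whose own rows have more than 2 cells, ignoring rows of tables nested inside them.

```python
from bs4 import BeautifulSoup

# Made-up sample: a layout table wrapping two data tables.
html = """
<table>
  <tr><td>
    <table>
      <tr><th>id</th><th>name</th><th>amount</th></tr>
      <tr><td>1</td><td>foo</td><td>10</td></tr>
    </table>
  </td></tr>
  <tr><td>
    <table>
      <tr><td>a</td><td>b</td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def own_rows(table):
    """Rows that belong to this table itself, not to a table nested inside it."""
    return [tr for tr in table.find_all("tr") if tr.find_parent("table") is table]


def own_cells(row):
    """Direct td/th children of a row."""
    return row.find_all(["td", "th"], recursive=False)


wide_tables = []
for table in soup.find_all("table"):
    rows = own_rows(table)
    if rows and max(len(own_cells(row)) for row in rows) > 2:
        wide_tables.append(
            [[cell.get_text(strip=True) for cell in own_cells(row)] for row in rows]
        )

print(wide_tables)
```

Each entry of `wide_tables` is a list of rows, which can be passed to `pd.DataFrame` if a dataframe is wanted.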

graceful ice Jul 10, 2020, 2:27 PM

#

Umm

#

I tried with bs

#

but not getting a way to fetch the data

#

if you have worked with this type of things

#

could you please provide me a reference

#

or a snippet

chrome barn Jul 10, 2020, 2:28 PM

#

i can provide you with a bs snippet hold on

graceful ice Jul 10, 2020, 2:28 PM

#

please

#

do at least help me to do this

rancid dove Jul 10, 2020, 2:39 PM

#

When is it appropriate to use datetime in pandas

#

Like I have simulation data that has sim time, of about 600 seconds

#

Would using datetime be appropriate for anything time based, or only when it's over a period of days or months, when u would use categoricals too

ripe forge Jul 10, 2020, 3:17 PM

#

id use datetime when its real datetimes

#

not "durations"

#

or i suppose, to clarify. "time" is not enough for me to start using datetime. "dates" are. this is just personal preference though
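For durations such as elapsed simulation time, pandas has a separate timedelta type. A minimal sketch, assuming a hypothetical `sim_time_s` column holding seconds since the start of the run:

```python
import pandas as pd

# Hypothetical simulation output: elapsed seconds and a measured value.
df = pd.DataFrame(
    {"sim_time_s": [0.0, 0.5, 1.0, 1.5, 2.0], "value": [1, 2, 3, 4, 5]}
)

# Durations map to timedeltas; no calendar date is invented.
df.index = pd.to_timedelta(df["sim_time_s"], unit="s")

# A timedelta index still supports time-based tools such as resampling.
per_second = df["value"].resample("1s").mean()
print(per_second)
```

This keeps the time-aware operations without pretending the simulation happened on a particular date.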

lapis sequoia Jul 10, 2020, 5:03 PM

#

Can someone help me get this into an iterative form?
This creates a list of all possible unordered pairs from an input list.
Unfortunately my input list has 25 elements in it, which leads to nearly 2 bil possible permutations and over 40TB of data.
Since i dont have that much capacity i want to reduce the list on the fly, hoping it will save me enough space, but my recursive implementation does not allow for that.

def rec_perm(list):
    res = []
    if len(list) <= 2:
        return [[list]]
    for i in range(len(list)-1):
        touple = [list[0], list[i+1]]
        tmp = rec_perm(np.delete(np.delete(list, i+1), 0))
        for k in range(len(tmp)):
            tmp[k].append(touple)
        for x in tmp:
            res.append(x)
    return res

spark stag Jul 10, 2020, 5:15 PM

#

@lapis sequoia i'm not sure this is really data science, but you could try converting your function into a generator. It then won't store all the possible values; it generates them lazily as they are requested, which means less memory and less up-front processing.
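A sketch of what the generator version could look like. `iter_pairings` is a hypothetical name; it follows the same recursion as `rec_perm` (pair the first element with each other element, then recurse on the rest) but yields one result at a time and uses plain lists instead of `np.delete`.

```python
def iter_pairings(items):
    """Lazily yield every way to split items into unordered pairs."""
    items = list(items)
    if len(items) <= 2:
        # Base case: the remaining one or two elements form the last group.
        yield [tuple(items)] if items else []
        return
    first = items[0]
    for i in range(1, len(items)):
        # Everything except the first element and its chosen partner.
        rest = items[1:i] + items[i + 1:]
        for pairing in iter_pairings(rest):
            yield [(first, items[i])] + pairing


# Results can be filtered or counted on the fly without storing them all.
count = sum(1 for _ in iter_pairings([1, 2, 3, 4, 5, 6]))
print(count)  # 15
```

Only the pairing currently being built is held in memory, so any reduction step can run inside the consuming loop.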

lapis sequoia Jul 10, 2020, 5:23 PM

#

im sorry all these different channels confuse me, i never know if im in the correct one..

#

i thought this one fits the best since im trying to analyze data

spark stag Jul 10, 2020, 5:24 PM

#

thats fine, usually looking at the channel description gives an overview of what the channel is about

lapis sequoia Jul 10, 2020, 5:24 PM

#

oh mb didnt realize discord offers that

peak zealot Jul 10, 2020, 6:08 PM

#

How come when I update a row in a CSV with Pandas it messes up other rows

#

146705060938776579,1,1``` 
the 1.0 in the first row should stay as 1

loud sorrel Jul 10, 2020, 6:38 PM

#

What's the start to Data Science?

lusty coral Jul 10, 2020, 7:34 PM

#

Google

#

@peak zealot how it messes things up there?

#

@lapis sequoia use caching

#

And also generators as well

upper ginkgo Jul 10, 2020, 8:14 PM

#

Hi, I've been trying to build Tensorflow for the last 2 days without any luck. I've followed this guide: https://www.tensorflow.org/install/source_windows

I'm using Python 3.8.3, bazel 2.0.0

Build label: 2.0.0
Build target: bazel-out/x64_windows-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 19 12:31:16 2019 (1576758676)
Build timestamp: 1576758676
Build timestamp as int: 1576758676
and the latest version of MSYS2.

When running bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package after configuring the build with py configure.py it takes a long time before erroring:

INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (339 packages loaded, 21623 targets configured).
INFO: Found 1 target...
INFO: Deleting stale sandbox base C:/users/luis_/bazel_luis/r4xt3sxh/sandbox
ERROR: C:/users/luis_/desktop/tensorflow/tensorflow/core/framework/BUILD:1373:1: ProtoCompile tensorflow/core/framework/variable_pb2.py failed (Exit -1073741795)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1102.609s, Critical Path: 61.15s
INFO: 166 processes: 166 local.
FAILED: Build did NOT complete successfully


Can someone help me? I can't use the package provided by pip as it uses AVX2 instructions and errors.
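One detail worth decoding in the log above: Windows exit codes like `-1073741795` are NTSTATUS values printed as signed 32-bit integers. A short sketch of the conversion:

```python
exit_code = -1073741795  # value from the ProtoCompile error above

# Reinterpret the signed 32-bit number as unsigned to get the NTSTATUS code.
ntstatus = exit_code & 0xFFFFFFFF
print(hex(ntstatus))  # 0xc000001d
```

`0xC000001D` is `STATUS_ILLEGAL_INSTRUCTION`. That would be consistent with a tool in the build chain using CPU instructions this machine lacks, though that cause is a guess and not confirmed by the log.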

lusty coral Jul 10, 2020, 8:44 PM

#

146705060938776579,1,1``` 
the 1.0 in the first row should stay as 1

@peak zealot cast the column back into int again then

peak zealot Jul 10, 2020, 8:45 PM

#

I ended up just using all floats since pandas was adamant about them being floats

#

works fine now

lusty coral Jul 10, 2020, 8:46 PM

#

Yeah. one of their tricks i guess
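The original CSV is truncated in the log, so the cause can only be guessed. A common one is a missing value somewhere in the column, which makes pandas fall back to float64. A sketch under that assumption, with made-up column names, using the nullable `Int64` dtype:

```python
import io

import pandas as pd

# Made-up data: the second row has an empty "level" field.
csv_text = "user_id,level,xp\n146705060938776579,1,1\n146705060938776580,,5\n"

# Default parsing: one missing value turns the whole column into floats.
df_float = pd.read_csv(io.StringIO(csv_text))
print(df_float["level"].dtype)  # float64

# Nullable integer dtype keeps whole numbers and still allows missing values.
df_int = pd.read_csv(io.StringIO(csv_text), dtype={"level": "Int64"})
print(df_int["level"].dtype)  # Int64

# Written back out, the 1 stays 1 instead of becoming 1.0.
csv_out = df_int.to_csv(index=False)
```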

worldly elk Jul 10, 2020, 10:57 PM

#

!close

haughty cedar Jul 10, 2020, 11:07 PM

#

hi guys

#

i have an interesting problem

#

when plotting multiple lines with matplotlib

#

i get the following image

📎 unknown.png

#

however when i plot the orange line alone i have no issues

#

📎 unknown.png

#

if anyone has any idea why this is happening or what it could be I would really appreciate the help

rancid brook Jul 10, 2020, 11:33 PM

#

Posting your code will probably make it much easier to help you

spiral peak Jul 10, 2020, 11:45 PM

#

what does your x-y data look like for both?

velvet thorn Jul 11, 2020, 12:56 AM

#

seems like a type issue

obtuse condor Jul 11, 2020, 2:36 AM

#

Hello guys, anyone knows some things about discrete linear optimization in Python?

lament dust Jul 11, 2020, 2:37 AM

#

@obtuse condor oh noooooooooooooooo

obtuse condor Jul 11, 2020, 2:37 AM

#

hmm?

#

Well, if anyone does, let me know

mellow spruce Jul 11, 2020, 3:16 AM

#

Hello all, first time posting so sorry if formatting is not correct. I have a question about how to filter further after using group by and value counts. I have a huge data set (10 million rows) that looks like this:

df = pd.DataFrame([["John", "Cleaning"], ["John", "Cleaning"], ["Mary", "Driving"], ["Mary", "Cleaning"], ["Mary", "Walking"], ["John", "Driving"], ["Peter", "Cleaning"], ["John", "Driving"], ["John", "Cleaning"], ["John", "Walking"]], columns=["Name", "Activity"])

For each unique name I want to find how many times each activity occurred and keep only the names that performed any activity more than once. Someone on SO told me to do this:

result = df[
    df['Name'].isin(df[df.duplicated()]['Name'])
].groupby('Name')['Activity'].value_counts()

and it returned this:

Name  Activity
John  Cleaning    3
      Driving     2
      Walking     1

However, I want to return only the name. Since it is a series I cannot access the columns, so I am not sure how to proceed. Any help is welcome.

mellow spruce Jul 11, 2020, 3:43 AM

#

Pls grumpchib

chrome barn Jul 11, 2020, 4:08 AM

#

@mellow spruce
```python
result = df.groupby(['Name', 'Activity']).size().reset_index(name='count')
result = result[result['count'] > 1]
# reassign instead of inplace=True to avoid modifying a filtered slice
result = result.drop(columns=['Activity', 'count'])
```

#

you could add some drop duplicates at the end to remove any duplicates
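Put together with the de-duplication step, the idea can be sketched end to end on the sample data from the question. This is one possible variant that reads the names straight off the grouped index:

```python
import pandas as pd

df = pd.DataFrame(
    [["John", "Cleaning"], ["John", "Cleaning"], ["Mary", "Driving"],
     ["Mary", "Cleaning"], ["Mary", "Walking"], ["John", "Driving"],
     ["Peter", "Cleaning"], ["John", "Driving"], ["John", "Cleaning"],
     ["John", "Walking"]],
    columns=["Name", "Activity"],
)

# Count each (Name, Activity) combination.
counts = df.groupby(["Name", "Activity"]).size()

# Keep combinations seen more than once, then take the unique names.
names = counts[counts > 1].index.get_level_values("Name").unique().tolist()
print(names)  # ['John']
```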

mellow spruce Jul 11, 2020, 4:14 AM

#

Thank you master @chrome barn

dull turtle Jul 11, 2020, 8:53 AM

#

i want to access this .model.h5 file

📎 unknown.png

#

but i am not able to access this can anyone help here

#

country = data["country"]
modelType = data["modelType"]

if modelType == r"E://paymentz//"+country+".model.h5" or country :
    country = data["country"]
    #model = tf.keras.models.load_model(r'E://paymentz//albania//albania.model.h5') or ("albania")
    model = tf.keras.models.load_model(r"E://paymentz//"+country+".model.h5") or (country)
    print("model loaded...")
    test_img = image.load_img(r'E://paymentz//albania//training//albania_passport//asd.jpg', target_size= (64, 64))

lapis sequoia Jul 11, 2020, 9:34 AM

#

wow

#

spaghetti code

#

you shouldn't concatenate strings like that (use f strings or .format) and you don't need to reference filepaths like this, pass them as a flag or declare all your variables at the top

#

model loaded is not really doing anything here, you need to use a method that actually checks if your model is available
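A sketch of that advice: build the path in one place and check that the file exists before loading. The folder layout `<root>/<country>/<country>.model.h5` is an assumption taken from the commented-out line in the snippet, and the helper names are hypothetical.

```python
from pathlib import Path

MODEL_ROOT = Path("E:/paymentz")  # root folder taken from the snippet above


def model_path_for(country, root=MODEL_ROOT):
    """Build the expected model path for a country (assumed layout)."""
    return root / country / f"{country}.model.h5"


def find_model(country, root=MODEL_ROOT):
    """Return the model path, or raise a clear error if the file is missing."""
    path = model_path_for(country, root)
    if not path.is_file():
        raise FileNotFoundError(f"No model file for {country!r} at {path}")
    return path


# The checked path can then be handed to the loader, for example:
# model = tf.keras.models.load_model(str(find_model(data["country"])))
```

With this, a missing or misnamed file fails with a message that names the exact path that was tried.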

rose saffron Jul 11, 2020, 9:48 AM

#

I never thought I'd have to ask for help with pd.read_csv() but.... look at this... in the original file I can read the csv file without a problem; but in a copy of the same file I can't read the csv file because it says the file does not exist ...

#

📎 smadonne_1.jpg

#

📎 smadonne_2.jpg

#

Do u have any idea why this happens?

#

FileNotFoundError: [Errno 2] File b'Downloads/home-data-for-ml-course/train.csv' does not exist: b'Downloads/home-data-for-ml-course/train.csv'

lapis sequoia Jul 11, 2020, 9:52 AM

#

is this a windows environment

rose saffron Jul 11, 2020, 9:53 AM

#

yep

lapis sequoia Jul 11, 2020, 9:53 AM

#

'C:\\mydir'

rose saffron Jul 11, 2020, 9:54 AM

#

nothing...

#

if u see into the 1st image and the second one the code is the same

#

i didn't change the path of the csv file

lapis sequoia Jul 11, 2020, 10:05 AM

#

hmm yes I see that

#

try this

#

!pwd

rose saffron Jul 11, 2020, 10:14 AM

#

!pwd
@lapis sequoia where?
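`!pwd` is meant to be run in a notebook cell, where it prints the current working directory (on Windows the `%pwd` magic is the safer spelling, since `pwd` is not a cmd command). The same check in plain Python, as a sketch using the relative path from the error message:

```python
from pathlib import Path

csv_path = Path("Downloads/home-data-for-ml-course/train.csv")

# A relative path is resolved against the current working directory,
# which can differ between two notebooks that contain identical code.
working_dir = Path.cwd()
full_path = working_dir / csv_path

print("working directory:", working_dir)
print("pandas will look for:", full_path)
print("file exists:", full_path.exists())
```

If the copied notebook lives in a different folder than the original, the two print different working directories, which would explain why identical code finds the file in one and not the other.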

lapis sequoia Jul 11, 2020, 1:26 PM

#

Edit: going to try the scikit-learn tutorial

glacial torrent Jul 11, 2020, 1:56 PM

#

guys does somebody know why

#

the method .columns doesnt appear to me?

#

📎 unknown.png

#

📎 unknown.png

#

in the video that im following it worked 😦

📎 unknown.png

lusty coral Jul 11, 2020, 2:04 PM

#

Use another editor if you don't like its suggestions

glacial torrent Jul 11, 2020, 2:05 PM

#

but now it doesnt suggest a thing too @lusty coral

📎 unknown.png

#

it should be okay

lusty coral Jul 11, 2020, 2:06 PM

#

Suggests nothing?

#

What do you get when you hover over data variable in the first row of the code

#

What does it say

glacial torrent Jul 11, 2020, 2:07 PM

#

look

#

📎 unknown.png

#

just that

#

it should be okay

lusty coral Jul 11, 2020, 2:10 PM

#

So writing "data." Gives you nothing?

glacial torrent Jul 11, 2020, 2:11 PM

#

a lot of methods

📎 unknown.png

#

but not the columns one

#

that's the problem 😦

#

@lusty coral i have pasted the same code at google colab and there it worked

lapis sequoia Jul 11, 2020, 2:26 PM

#

Hello everyone.
I try to create an object but I fail.

#

📎 Annotation_2020-07-11_172527.png

#

This give me error:
AttributeError: 'Reader' object has no attribute 'fpx1'

uncut shadow Jul 11, 2020, 2:34 PM

#

well, it means you can't do self.fpx1 because ur class has no attribute like that

lapis sequoia Jul 11, 2020, 3:12 PM

#

@uncut shadow Thank you. But it's in def getdara(self)

#

I don't understand

#

Object reads dat file. Order its data into variables.

#

Then I want to use them. I don't want to do regex search-match job again and again

#

Hi, i need help

#

ValueError: invalid literal for int() with base 10: 'Or'

uncut shadow Jul 11, 2020, 3:14 PM

#

yes, but this fpx1 only exists in the scope of __init__ method. In order to make it usable in whole class, you have to define it using self.fpx1 = ... (of course, replace ... with that list you want it to store)
@lapis sequoia
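The screenshot is not available, so here is only a sketch of the difference being described. The class and attribute names come from the error message; the parsing logic and the method name `get_data` are made up.

```python
class Reader:
    def __init__(self, lines):
        # Local variable: it disappears as soon as __init__ returns.
        first_values = [line.split()[0] for line in lines]

        # Attribute: stored on the object, so every method can reach it.
        self.fpx1 = first_values

    def get_data(self):
        # Works because fpx1 was attached to self in __init__.
        return self.fpx1


reader = Reader(["1.0 2.0", "3.0 4.0"])
print(reader.get_data())  # ['1.0', '3.0']
```

Only values that other methods need later have to go on `self`; temporary helpers can stay local.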

#

@lapis sequoia you have to provide some code

lapis sequoia Jul 11, 2020, 3:15 PM

#

the code is:
all_data['Month'] = all_data['Order Date'].str[0:2]
all_data['Month'] = all_data['Month'].astype('int32')

#

@uncut shadow Thank you very much. I think I figured out by typing self before every word 🙂

#

📎 Annotation_2020-07-11_172527.png

uncut shadow Jul 11, 2020, 3:17 PM

#

yeah

lapis sequoia Jul 11, 2020, 3:18 PM

#

I don't know. Maybe I don't need to write "self" everywhere. But it works

uncut shadow Jul 11, 2020, 3:19 PM

#

@lapis sequoia it means that your data contains a value (probably a string) which cannot be converted to int32. In your example it's probably the string 'Or'
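An 'Or' in the first two characters suggests that some rows contain the header text "Order Date" instead of a date. That is an assumption about the data, which is not shown. Under it, one way to skip such rows is to coerce non-numeric values to NaN first:

```python
import pandas as pd

# Made-up sample with a repeated header row and a missing value.
all_data = pd.DataFrame(
    {"Order Date": ["04/19/19 08:46", "Order Date", "12/30/19 00:01", None]}
)

# Values that are not numbers become NaN instead of raising ValueError.
month = pd.to_numeric(all_data["Order Date"].str[0:2], errors="coerce")
valid = month.notna()

# Keep only rows with a real month, then the integer cast is safe.
all_data = all_data[valid].copy()
all_data["Month"] = month[valid].astype("int32")
print(all_data)
```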

#

@lapis sequoia Well, you should read more about classes in python, it will help a lot to answer this question. You can check this https://www.geeksforgeeks.org/self-in-python-class/ or this https://stackoverflow.com/questions/2709821/what-is-the-purpose-of-the-word-self for more info about what actually self is in python

GeeksforGeeks

self in Python class - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

Stack Overflow

What is the purpose of the word 'self'?

What is the purpose of the self word in Python? I understand it refers to the specific object created from that class, but I can't see why it explicitly needs to be added to every function as a par...

lapis sequoia Jul 11, 2020, 3:22 PM

#

Thank you so much.

#

@uncut shadow Thanks

lapis sequoia Jul 11, 2020, 5:58 PM

#

Hi, i need help for this message error:

#

ModuleNotFoundError Traceback (most recent call last)
<ipython-input-4-7829a603588a> in <module>
1 import math
2 import matplotlib.pyplot as plt
----> 3 import keras
4 import pandas as pd
5 import numpy as np

ModuleNotFoundError: No module named 'keras'

high kettle Jul 11, 2020, 5:59 PM

#

are you using anaconda?

lapis sequoia Jul 11, 2020, 6:01 PM

#

yes with jupyter notebook
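With Anaconda a frequent cause is that the notebook kernel runs in a different environment from the one where the package was installed. A small sketch for checking this from inside the notebook; `is_installed` is a hypothetical helper:

```python
import importlib.util
import sys

# The interpreter this notebook kernel is actually running on.
print(sys.executable)


def is_installed(module_name):
    """True if the current interpreter can find module_name."""
    return importlib.util.find_spec(module_name) is not None


print("keras available:", is_installed("keras"))
```

If this prints False, the package needs to be installed into the environment that `sys.executable` points to, for example with `conda install keras` in that activated environment.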

stuck cloak Jul 11, 2020, 6:02 PM

#

hey guys. rookie here. I have a pressing issue. trying to drop the \t\ in \t\ttactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt how can i do it? if i do .str.strip() it drops inclusive of the first t in the sequence.

uncut shadow Jul 11, 2020, 6:39 PM

#

@stuck cloak that's not exactly data science question, but if you want to remove \t\ just from that string, then you can use replace() or just slice it
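A sketch of the `replace()` route. It is unclear from the message whether the column holds real tab characters or a literal backslash followed by `t`, so the example covers both; passing characters to `strip` is shown as the likely cause of the lost letter.

```python
import pandas as pd

s = pd.Series([
    r"\t\ttactagcaatacg",  # literal backslash + t characters
    "\t\ttactagcaatacg",   # real tab characters
])

# strip("\\t") removes ANY leading/trailing backslash or 't' characters,
# so it also eats the first letter of the sequence.
too_greedy = r"\t\ttactagcaatacg".strip("\\t")
print(too_greedy)  # actagcaatacg

# Remove the exact two-character text first, then trim real whitespace.
cleaned = s.str.replace("\\t", "", regex=False).str.strip()
print(cleaned.tolist())  # ['tactagcaatacg', 'tactagcaatacg']
```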

#

also, it's something connected with DNA and Nucleic acid structure, right? :P

tardy portal Jul 11, 2020, 8:52 PM

#

Hello, i'm new to this server, but would someone be able to provide assistance with some questions I have regarding to dataframes?

desert oar Jul 11, 2020, 8:54 PM

#

just ask

#

don't "ask to ask"

#

!ask

arctic wedgeBOT Jul 11, 2020, 8:54 PM

#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

tardy portal Jul 11, 2020, 8:54 PM

#

perfect

#

thank you

#

one second

#

📎 unknown.png

#

however, when I test case I receive this error

#

📎 unknown.png

#

what am I doing incorrectly?

desert oar Jul 11, 2020, 8:58 PM

#

what do you expect that to do

#

and what is score

#

and what do you think the error message means?

tardy portal Jul 11, 2020, 9:01 PM

#

the second screenshot is to verify if I completed the question correctly

#

@desert oar would it be possible to pm?

desert oar Jul 11, 2020, 9:05 PM

#

no

#

all those questions are related to solving your problem

#

im trying to be socratic here

tardy portal Jul 11, 2020, 9:05 PM

#

I see

desert oar Jul 11, 2020, 9:05 PM

#

because this is a common misunderstanding with python and pandas novices

#

but it stems from fundamentally not knowing how pandas works

tardy portal Jul 11, 2020, 9:06 PM

#

okay, moving forward

desert oar Jul 11, 2020, 9:06 PM

#

so what is the intended outcome

tardy portal Jul 11, 2020, 9:06 PM

#

📎 unknown.png

#

okay so I need to create another column named "AutoPayment"

desert oar Jul 11, 2020, 9:07 PM

#

is this a completely different problem?

tardy portal Jul 11, 2020, 9:07 PM

#

that derives from the column named "PaymentMethod"

#

yes it is, ill get back to the other problem later

#

so to create a new column is simple

#

df_churn['AutoPayment'] = df_churn['PaymentMethod']

#

no?

#

I know I need to create a for loop to obtain the answer, but should I use a split method?

desert oar Jul 11, 2020, 9:08 PM

#

you do not need a for loop

tardy portal Jul 11, 2020, 9:08 PM

#

if statement?

desert oar Jul 11, 2020, 9:08 PM

#

no

#

well

desert oar Jul 11, 2020, 9:08 PM

#

yes

south quest Jul 11, 2020, 9:09 PM

#

uhh

desert oar Jul 11, 2020, 9:09 PM

#

@tardy portal are you familiar with Series.map?

#

or Series.apply?

tardy portal Jul 11, 2020, 9:10 PM

#

no I am not familiar with that

#

let me look it up

#

one second

#

im sure the df that I'm working with is not a dict

desert oar Jul 11, 2020, 9:11 PM

#

sorry?

#

why would it be a dict?

#

its a dataframe

tardy portal Jul 11, 2020, 9:11 PM

#

📎 unknown.png

desert oar Jul 11, 2020, 9:12 PM

#

ok lets not use things you didnt learn about

#

you can do it with a for loop

#

it will just be very slow on a big dataset

#

im a little alarmed you didnt learn about .apply or .where or .mask

tardy portal Jul 11, 2020, 9:12 PM

#

don't you think it would be effective to use an if statement?

desert oar Jul 11, 2020, 9:12 PM

#

none of those sound familiar to you?

#

show me what you are thinking of

tardy portal Jul 11, 2020, 9:13 PM

#

one moment

desert oar Jul 11, 2020, 9:13 PM

#

!code-block

arctic wedgeBOT Jul 11, 2020, 9:13 PM

#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
• These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')

tardy portal Jul 11, 2020, 9:14 PM

#

lets break this problem down step by step

#

so to create the new column named 'AutoPayment' would be the following:

#

''' df_churn['AutoPayment'] = df_churn['PaymentMethod'] '''

desert oar Jul 11, 2020, 9:15 PM

#

use backticks ` not single quotes '

tardy portal Jul 11, 2020, 9:15 PM

#

oh okay, my apologies

desert oar Jul 11, 2020, 9:15 PM

#

but ok

df_churn['AutoPayment'] = df_churn['PaymentMethod']

#

sure, that's fine

#

this should be a one liner btw

#

.apply, .map, .where, or .mask would all help you solve this

#

but its more important imo

#

to see what you currently are planning

#

because i think its the same issue

#

where you think if works differently than it does

#

and i want to have a simple example to demonstrate

tardy portal Jul 11, 2020, 9:18 PM

#

sure if you want to provide an example I would appreciate that

desert oar Jul 11, 2020, 9:18 PM

#

fundamentally it comes down to the fact that python is unaware that pandas objects are "vectors"

#

look at your if statement above

#

ok actually wait

#

your code in q 7 is even more confusing than i realized at first

#

lets just look at what you were planning to do for q 11

tardy portal Jul 11, 2020, 9:19 PM

#

Yes I want to create a new column where it would assign a value based on the payment method containing the words 'automatic'

desert oar Jul 11, 2020, 9:19 PM

#

yes i see that

#

so how were you planning to use if

tardy portal Jul 11, 2020, 9:26 PM

#

if search('automatic', df_churn['AutoPayment']): print 1 else: print 0

#

Im pretty sure its incorrect

#

that the if function does not operate that way

#

or if df_churn['AutoPayment'] == 'automatic': df_churn['AutoPayment'] = 1 else: df_churn['AutoPayment'] = 0

#

@desert oar

desert oar Jul 11, 2020, 9:33 PM

#

Yeah that's exactly the issue

#

Give me a minute

#

Im doing something offline ill @ you

tardy portal Jul 11, 2020, 9:33 PM

#

yeah would I have to incorporate a split function since the word 'automatic' is encased in parentheses, as (automatic)?

#

no worries, take your time

#

I appreciate your help

desert oar Jul 11, 2020, 9:34 PM

#

No

#

The if is misused

#

Python itself has no knowledge of the fact that a Series is a vector or collection

#

I guess this is a bit difficult to explain, basically "it doesn't work like that"

#

It's hard to explain on my phone anyway

#

If you write "if" that just acts on a single True/False

#

There's no notion in python of distributing that operation over a collection or sequence

#

At least, not with "if"

#

That's why pandas has all of these methods like apply, map, mask, etc.

#

That's what enables you to perform these vectorized/distributed computations
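A small sketch of the point being made, on a made-up Series: `if` wants a single True/False, so handing it a whole comparison result raises an error, while the comparison itself is already the element-wise answer.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"])

# "if" needs one True/False; a Series of booleans is ambiguous.
try:
    if s == "a":
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# The comparison is evaluated for every element at once.
is_a = s == "a"
flags = is_a.astype(int)
print(flags.tolist())  # [1, 0, 1]
```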

tardy portal Jul 11, 2020, 9:39 PM

#

that makes more sense

#

im going through each method to find what's most appropriate for the question

#

the datatype of the new column should be the same as the column we're taking the information from

#

let me find out the datatype

desert oar Jul 11, 2020, 9:40 PM

#

That's not true in this case as per the question

#

You have a text column and you're trying to produce a numerical/integer column

tardy portal Jul 11, 2020, 9:41 PM

#

so I would have to change the dtype for the new column hence if the payment method includes the word 'automatic' it would assign its value to either 1 or 0

desert oar Jul 11, 2020, 9:41 PM

#

You would just end up overwriting the column

#

You shouldn't need to manually change dtype

tardy portal Jul 11, 2020, 9:42 PM

#

so i don't have to write anything that would change the value of the new column, there's obviously a way for the new column to show a certain type of value

desert oar Jul 11, 2020, 9:43 PM

#

I'm not sure what you mean by that, if you write code that returns a column of integers, the new column is full of integers

#

Doesn't need to be more complicated than that

#

Do you have experience with other programming languages or SQL?

tardy portal Jul 11, 2020, 9:44 PM

#

I know basic SQL

#

nothing intricate

desert oar Jul 11, 2020, 9:45 PM

#

OK, sometimes misconceptions like this arise because people make incorrect analogies to other tools

#

That doesn't seem to be the case here

tardy portal Jul 11, 2020, 9:45 PM

#

which makes sense, and I totally understand that notion

#

i'm just confused on creating the new column, it should equal to the column i'm taking data from

#

there should be more to it than this:

desert oar Jul 11, 2020, 9:45 PM

#

Why should it?

#

You don't have to create a new column first and then modify it

tardy portal Jul 11, 2020, 9:46 PM

#

df_churn['AutoPayment'] = df_churn['PaymentMethod']

desert oar Jul 11, 2020, 9:46 PM

#

Just write some code that emits the column you want, and assign that new column to the df

#

Don't overthink this

tardy portal Jul 11, 2020, 9:47 PM

#

okay I just did that

#

what i'm having trouble is how the column is able to assign the value of 1 or 0 based on the payment method including the word 'automatic'

#

I have a tendency of overthinking things, but i'm trying not to

desert oar Jul 11, 2020, 9:49 PM

#

Oh

#

I know how you can do this

#

You used loc above right?

#

To subset

tardy portal Jul 11, 2020, 9:49 PM

#

yes

desert oar Jul 11, 2020, 9:49 PM

#

With a boolean valued series

#

You can assign to a subset

#

df.loc[df['a'] == 3, 'q'] = -99

#

The above being one silly example
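The same one-liner on a tiny made-up frame, to show what it changes:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 3], "q": [10, 20, 30]})

# Rows where the condition is True get the new value in column "q";
# all other rows are left untouched.
df.loc[df["a"] == 3, "q"] = -99
print(df["q"].tolist())  # [10, -99, -99]
```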

tardy portal Jul 11, 2020, 9:53 PM

#

but the concern is that within the original column 'PaymentMethod' the word 'automatic' is enclosed on parentheses

#


df_churn.loc[(df_churn['AutoPayment'] == 'automatic'), 'PaymentMethod'] = 1

#

📎 unknown.png

#

wouldn't those values be assigned a 1 if it includes the word 'automatic'?

desert oar Jul 11, 2020, 10:03 PM

#

yes, you're thinking one step ahead

#

also you can write df_churn['AutoPayment'] == 'automatic', you don't need to write (df_churn['AutoPayment'] == 'automatic')

#

also you have the column names swapped

#

of what you're assigning to

#

i think that's because the question is unclear, not your fault

tardy portal Jul 11, 2020, 10:04 PM

#

no the column names shouldn't be swapped because i'm creating the 'AutoPayment' column

desert oar Jul 11, 2020, 10:04 PM

#

look at the 2nd line

#

you're assigning to PaymentMethod

tardy portal Jul 11, 2020, 10:06 PM

#

alrighty

#

well after running df_churn['AutoPayment'] == 'automatic'

#

everything is false because the word 'automatic' is enclosed in parentheses

desert oar Jul 11, 2020, 10:07 PM

#

right

#

so that's the next challenge

#

you need to use some kind of string subset searching instead of testing for equality

#

right?

#

have you learned anything that helps you do that?

tardy portal Jul 11, 2020, 10:07 PM

#

that's why i ask if its wise to run a split function or iterate it based on '()'

desert oar Jul 11, 2020, 10:07 PM

#

no i think that's overengineered to your particular use case

#

do you know how to check if something is in a string in python?

#

i don't want to give away answers because this is homework

tardy portal Jul 11, 2020, 10:08 PM

#

one moment

#

df_churn['AutoPayment'].str.find('automatic')

desert oar Jul 11, 2020, 10:14 PM

#

you found that by searching the docs?

tardy portal Jul 11, 2020, 10:15 PM

#

well google lol

desert oar Jul 11, 2020, 10:15 PM

#

alright, let's pretend you found .str.contains instead 😉

fringe hearth Jul 11, 2020, 10:16 PM

#

hi, does anyone here know how to count points in an image using scikit?

desert oar Jul 11, 2020, 10:16 PM

#

you could do it with .str.find but you need to read the docs carefully

fringe hearth Jul 11, 2020, 10:16 PM

#

📎 pacman.jpg

tardy portal Jul 11, 2020, 10:16 PM

#

after running that, everything that does not include the word 'automatic' is valued at -1

fringe hearth Jul 11, 2020, 10:16 PM

#

Im willing to pay via paypal if anyone is able to count how many balls are there by using regionprops by scikit

#

please DM me

desert oar Jul 11, 2020, 10:17 PM

#

it's against the rules to ask for paid help here

#

!rules 6

arctic wedgeBOT Jul 11, 2020, 10:17 PM

#

Rules

6. No spamming or unapproved advertising, including requests for paid work. Open-source projects can be showcased in #show-your-projects.

fringe hearth Jul 11, 2020, 10:17 PM

#

whee should i ask?

tardy portal Jul 11, 2020, 10:17 PM

#

after changing it to .str.contains everything is either valued true or false

fringe hearth Jul 11, 2020, 10:17 PM

#

where

desert oar Jul 11, 2020, 10:17 PM

#

you can ask for help, you can't offer payment

#

but people will not do your work for you

slim fox Jul 11, 2020, 10:17 PM

#

everything is either valued true or false
That's normal, you can then use it as a boolean mask

desert oar Jul 11, 2020, 10:17 PM

#

right ^

#

you're using this inside loc

#

not to assign on the right hand side of =

#

again, dont overthink this
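Putting the pieces of this thread together, a sketch on made-up data. The payment method values are assumptions, since the real dataset is only visible in screenshots.

```python
import pandas as pd

# Assumed sample values; the real column is not shown in the log.
df_churn = pd.DataFrame({
    "PaymentMethod": [
        "Bank transfer (automatic)",
        "Credit card (automatic)",
        "Electronic check",
        "Mailed check",
    ]
})

# Substring test per row: True where the text contains "automatic".
is_auto = df_churn["PaymentMethod"].str.contains("automatic", regex=False)

# .str.find gives the position instead, with -1 meaning "not found".
positions = df_churn["PaymentMethod"].str.find("automatic")

# Use the boolean mask inside loc to assign to the matching rows only.
df_churn["AutoPayment"] = 0
df_churn.loc[is_auto, "AutoPayment"] = 1
print(df_churn)
```

The mask goes on the left, inside `loc`, and picks which rows receive the value on the right of `=`.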