desert parcel Jul 26, 2020, 3:53 AM

#

Yeah and how do I differenciate a vector?

bitter harbor Jul 26, 2020, 3:53 AM

#

https://en.wikipedia.org/wiki/Matrix_calculus

Matrix calculus

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a...

flat quest Jul 26, 2020, 3:54 AM

#

You do it by having a single loss. Or a way to combine the loss

#

Into a single value

desert parcel Jul 26, 2020, 3:54 AM

#

hmmm

flat quest Jul 26, 2020, 3:54 AM

#

Just sum c into a scalar vale

bitter harbor Jul 26, 2020, 3:54 AM

#

^

desert parcel Jul 26, 2020, 3:55 AM

#

📎 unknown.png

#

what does that mean

#

just add those two together?

bitter harbor Jul 26, 2020, 3:55 AM

#

numpy() isn't a thing

desert parcel Jul 26, 2020, 3:55 AM

#

I changed it into a scalar and made it into a numpy array

#

📎 unknown.png

bitter harbor Jul 26, 2020, 3:56 AM

#

this si why I don't like nn libraries

desert parcel Jul 26, 2020, 3:56 AM

#

what

#

but in anycase

#

.numpy() does exist

flat quest Jul 26, 2020, 3:57 AM

#

I mean nn libs make our life a lot easier. But they’re useless if you don’t understand how they drive calculations.

bitter harbor Jul 26, 2020, 3:58 AM

#

I haven't really looked into them although I know they're optimized

#

usually just because I want to do the math

#

math being like building it with numpy

desert parcel Jul 26, 2020, 3:59 AM

#

well I did your thing

#

📎 unknown.png

#

This came out

flat quest Jul 26, 2020, 3:59 AM

#

You should try making a 5 layer dense model without it 😉

I mean there’s really not too much to it. Just gradients

desert parcel Jul 26, 2020, 3:59 AM

#

even though c is now one equation I can't do anything with it

because it's not a scalar

bitter harbor Jul 26, 2020, 4:00 AM

#

I actually used to have one with 5 hidden

desert parcel Jul 26, 2020, 4:00 AM

#

You mean with keras or something?

bitter harbor Jul 26, 2020, 4:00 AM

#

no

desert parcel Jul 26, 2020, 4:00 AM

#

Wait is it related with my problem?

bitter harbor Jul 26, 2020, 4:00 AM

#

partically

#

you should be learning about__at least__ linear algebra and how nn's work

#

stats too but you can __kind of __ get away without intensively studying it

desert parcel Jul 26, 2020, 4:02 AM

#

The video said

#

requirements: HS level math

bitter harbor Jul 26, 2020, 4:02 AM

#

no

desert parcel Jul 26, 2020, 4:02 AM

#

I'm in the final year of HS so I thought I could do it

#

but guess not

bitter harbor Jul 26, 2020, 4:02 AM

#

absolutely not

#

^ to the math level

desert parcel Jul 26, 2020, 4:03 AM

#

then maybe just tell me what to do

#

I'm trying out different things

#

how do i make a matrix into a scalar anyways

bitter harbor Jul 26, 2020, 4:03 AM

#

linear algebra's a lot different from Euclidean algebra (so mostly what you've been doing)

tidal bough Jul 26, 2020, 4:03 AM

#

matrixes are, uhhm, not scalars.

desert parcel Jul 26, 2020, 4:04 AM

#

I didn't even know euclidean algebra is a thing

#

I know matrixes aren't scalars

#

📎 unknown.png

bitter harbor Jul 26, 2020, 4:04 AM

#

what you've learnt with scalars

desert parcel Jul 26, 2020, 4:04 AM

#

This is some exercise from the video I'm using as a resource

📎 unknown.png

bitter harbor Jul 26, 2020, 4:04 AM

#

matrices can contain scalars?

desert parcel Jul 26, 2020, 4:05 AM

#

I mean can't they?

#

Matrices are just a list of scalars?

#

I'm very confused

#

and have no idea what i'm saying

tidal bough Jul 26, 2020, 4:05 AM

#

2d array, sure.

desert parcel Jul 26, 2020, 4:05 AM

#

What

tidal bough Jul 26, 2020, 4:05 AM

#

a matrix is a 2d array of scalars.

bitter harbor Jul 26, 2020, 4:06 AM

#

this is why the oxford comma is needed

desert parcel Jul 26, 2020, 4:06 AM

#

what's an oxford comma

#

is it any different from a normal comma

#

nvm that's english but what can I do

bitter harbor Jul 26, 2020, 4:06 AM

#

first, second, and last

desert parcel Jul 26, 2020, 4:06 AM

#

oh ok I like doing that

#

📎 unknown.png

#

I reverted it to the most basic

bitter harbor Jul 26, 2020, 4:07 AM

#

because idk if they want a combination of (x,w) and b, or x,w, and b

desert parcel Jul 26, 2020, 4:07 AM

#

and I got this

tidal bough Jul 26, 2020, 4:08 AM

#

did you just scalar-multiply two vectors or something?

#

What are you trying to do?

desert parcel Jul 26, 2020, 4:09 AM

#

I want to make an equation in this case c = a *b

#

and then differientiate it

#

with respect to either a or b

#

I am curious about finding the differentials of matrices

bitter harbor Jul 26, 2020, 4:12 AM

#

what's the above example?

desert parcel Jul 26, 2020, 4:14 AM

#

wdym

bitter harbor Jul 26, 2020, 4:14 AM

#

the questions says something about referring to the above example

#

what is it

desert parcel Jul 26, 2020, 4:15 AM

#

Let me link you the notebook

#

https://jovian.ml/aakashns/01-pytorch-basics

aakashns/01-pytorch-basics | Jovian

Share Juptyer notebooks instantly. Jovian makes Jupyter notebooks shareable, commentable and reproducible.

#

These are the important parts, which are the examples mentioned in the questions/exercises

📎 unknown.png

#

these are just the questions

📎 unknown.png

tidal bough Jul 26, 2020, 4:19 AM

#

What makes PyTorch special is that we can automatically compute the derivative of y w.r.t. the tensors that have requires_grad set to True i.e. w and b. To compute the derivatives, we can call the .backward method on our result y.

bitter harbor Jul 26, 2020, 4:20 AM

#

^^

desert parcel Jul 26, 2020, 4:20 AM

#

I understand this part

tidal bough Jul 26, 2020, 4:20 AM

#

I don't see you doing that on a or b 😛

desert parcel Jul 26, 2020, 4:20 AM

#

I did that at the top

tidal bough Jul 26, 2020, 4:20 AM

#

🤔

📎 unknown.png

desert parcel Jul 26, 2020, 4:20 AM

#

📎 unknown.png

#

I have multiple instances

#

Doing .backward() on a matrix gives an error

tidal bough Jul 26, 2020, 4:21 AM

#

...on the matrix for which you don't set requires_grad? :/

desert parcel Jul 26, 2020, 4:21 AM

#

📎 unknown.png

#

I'll just get everything hold on

bitter harbor Jul 26, 2020, 4:21 AM

#

f

desert parcel Jul 26, 2020, 4:21 AM

#

📎 unknown.png

bitter harbor Jul 26, 2020, 4:22 AM

#

...on the matrix for which you don't set requires_grad? :/

desert parcel Jul 26, 2020, 4:22 AM

#

what

bitter harbor Jul 26, 2020, 4:22 AM

#

c

desert parcel Jul 26, 2020, 4:22 AM

#

what

bitter harbor Jul 26, 2020, 4:23 AM

#

you have to set the requires_grad set to True on c

#

c is the only one you haven't

tidal bough Jul 26, 2020, 4:23 AM

#

not sure about that

desert parcel Jul 26, 2020, 4:23 AM

#

I don't

#

in the previous ones

tidal bough Jul 26, 2020, 4:23 AM

#

you might notice that it's a different error

desert parcel Jul 26, 2020, 4:23 AM

#

setting everything with requires_grad=True

#

and then making c equal the vars with that condition

#

can do .backward() just fine

tidal bough Jul 26, 2020, 4:23 AM

#

in fact, it's an error you have not shown us before 😛

desert parcel Jul 26, 2020, 4:24 AM

#

c doesn't need requires_grad

📎 unknown.png

#

the dc/dc should be dc/dbs

bitter harbor Jul 26, 2020, 4:27 AM

#

My thought is this guide is either outdated or wrong

tidal bough Jul 26, 2020, 4:28 AM

#

There's no reason to assume that, considering that the guide doesn't have an example using backward with anything but scalars

desert parcel Jul 26, 2020, 4:29 AM

#

The exercise says

#

📎 unknown.png

#

making an equation with matrices then finding the differentials

#

He didn't say it will work but the wikipedia link he added at the bottom to say you can makes me wanna make it work since it is possible

tidal bough Jul 26, 2020, 4:33 AM

#

https://discuss.pytorch.org/t/loss-backward-raises-error-grad-can-be-implicitly-created-only-for-scalar-outputs/12152

PyTorch Forums

Loss.backward() raises error 'grad can be implicitly created only f...

Hi, loss.backward() do not go through while training and throws an error when on multiple GPUs using torch.nn.DataParallel grad can be implicitly created only for scalar outputs But, the same thing trains fine when I give only deviced_ids=[0] to torch.nn.DataParallel. Is t...

desert parcel Jul 26, 2020, 4:34 AM

#

haizz

#

Alright more reading here I come

bitter harbor Jul 26, 2020, 4:38 AM

#

how hard is it to send nn's through the gpu?

flat quest Jul 26, 2020, 6:34 AM

#

well you have to reduce c into a single value. Sum over it and it's a one value @desert parcel

Ah nice. Did you implement the backprop yourself? @bitter harbor

bitter harbor Jul 26, 2020, 6:35 AM

#

Yep it’s honestly not too hard

#

My computer didn’t have the mem for it tho

flat quest Jul 26, 2020, 6:36 AM

#

yeah i know
its just it can get horrible memory inefficient when you're working with a million params

#

and some of the gradients aren't as easily defined. For example you can get the gradient of an if function with tf through graph defs

bitter harbor Jul 26, 2020, 6:39 AM

#

its just it can get horrible memory inefficient when you're working with a million params
It wasn’t even that, my input was just too big

ebon plinth Jul 26, 2020, 7:57 AM

#

Hello!

#

I need help accessing a variable form another .py on Visual Studio Code

silk axle Jul 26, 2020, 7:59 AM

#

@ebon plinth probably best to stick to your help channel (#help-peanut)

#

Someone's already answered too

ebon plinth Jul 26, 2020, 7:59 AM

#

Oooh, sorry. :<

digital juniper Jul 26, 2020, 10:17 AM

#

so i have a linear regression model and i'm getting very similar mean_absolute_error for my training and test data of roughly 10%, and the R^2 of my predictions vs test data is about 0.6-0.7

#

any ideas on the next steps i can take to improve the model? i think it's underfitting and I don't have an easy way (yet) to get more features

#

also i haven't done any kind of preprocessing other than removing null values, don't think feature scaling would help here?

shadow quiver Jul 26, 2020, 12:53 PM

#

Hey guys, I think I have a fun topic to discuss here.

In "social media languages", there are different types of "laughs" in different languages. E.g. English has "lol, rofl" etc. In Turkish, there is a laugh called "random laugh". You just type randomly on keyboard to remark that you lol'd. Like "aslkjhfsalkjg".

I want to detect these "laughs" for NLP purposes in twitter data. Do you have any idea where to start? What do you think? Thanks.

marble jasper Jul 26, 2020, 1:14 PM

#

hmm... interesting question

#

there's always manual data collection

#

but to automate it, I think one way would be to find a twitter bot account that's tweeting out non-language-specific humourous content, that has a wide audience who might reply with laughs; discard any tweets that contain real words or just emojis, and the rest send to human labellers

#

but I think on the balance, it might actually be easier to get humans to provide examples

#

probably easier to find a group of people and say "how would you express a laugh in response to a funny tweet?". probably much faster and more reliable than to scrape data off twitter and label it

shadow quiver Jul 26, 2020, 1:34 PM

#

@marble jasper I see. These are good points. Thanks. But you know, just like labeling named entities on the text and training a model for NER, how can I do this to catch these laughs?

Can I train a character-level network to catch these words? I can find a formal written data and use it for normal words, because I'm sure there won't be any random laughs. And I can collect random laughs from my coworkers, then train a character-level model or something.

There are grammar rules for formal words like in Turkish. E.g. there can't be any successive vowels in a word (of course there are exceptions). Can such model learn these and detect random laughs as "this word doesn't seem to be ensure grammar rules" or something? At least a model that can decide "there are random laughs in this text" would be good, because these laughs can be very important for sentiment analysis problems.

marble jasper Jul 26, 2020, 1:35 PM

#

I was thinking just using a dictionary of valid words

#

and if they aren't in there, then the chance of that word being a laugh is higher

#

if you're looking at responses to humour, the chance of that word being a laugh is higher

#

so for both those reasons you should have a bias of leftover words that are laughs, that a human can then label; BUT the human would also need to be knowledgeable about laughs in different languages, at which point you'd have to ask...why don't you just ask that same human to write them out?

#

this is one of those datasets where labelling might take longer and provide no more information than having a human generate the data in the first place

shadow quiver Jul 26, 2020, 1:43 PM

#

@marble jasper Yeah. But you know as this is social media, people don't pay much attention to grammar rules. So any word that doesn't follow grammar rules there is a high chance that it's a normal word just written informal.

Maybe I can calculate the "formality" of each tweet and use them in training. So if a tweet is seem to be negative, but written in an informal way, may be just a chit-chat and may not be that negative. But if it seem to be negatie and is formal, it's probably really in a negative mood. I don't know, it's hard to guess without trying but it may improve the model accuracy.

marble jasper Jul 26, 2020, 1:58 PM

#

I feel like you're ignoring all the bits of my reply that is telling you to get a human to provide the dataset directly rather than attempt to work it out

#

because whatever you do, whatever rules you put in place, you will spend a lot of effort to reduce the number of false positives, but ultimately you will still need a human to make a final judgement about whether these are correct or not. And I am saying that this seems to me that you would get more and better results just asking the human to provide you with a bunch of examples of laughs

#

go into a discord server, or go on twitter, and ask people to provide examples. immediately you have generated a far higher quality dataset than if you had tried to scrape twitter and reduce the dataset down to something a human can look at and five a final yes/no label to

shadow quiver Jul 26, 2020, 2:04 PM

#

Okay I see now. Thank you again for discussion.

ebon nebula Jul 26, 2020, 2:07 PM

#

Hello all. Can someone suggest me a good book teaching mathematics trough Python?

lapis sequoia Jul 26, 2020, 2:30 PM

#

Hi, im very new to Python and got a lot of help making this 3D graph so please don't over estimate my knowledge (~10hrs). I was able to get this 3D scatter plot to work, but i want to add a color bar, same as the colormap, stuck to the side of the interactive figure when viewed. What coding would I have to pass to make that work along side my 'terrain' cmap?

import numpy as np
import matplotlib.pyplot as plt

f = open('wsands_lidar2003_topex_sample100.txt', 'r')

print(f.name)

lat, lon, elev = np.loadtxt('wsands_lidar2003_topex_sample100.txt', unpack=True, usecols=(1, 2, 3))

xlims = [253.55, 253.59]
ylims = [32.885, 33.01]
zlims = [1166, 1169]


def threeaxisgraph(xdata, ydata, zdata, xlabel, ylabel, zlabel):
  fig = plt.figure(figsize=(20, 2.5))
  ax = fig.add_subplot(projection='3d')
  ax.scatter(xdata, ydata, zdata, alpha=0.1, marker='o', c=zdata, cmap='YlOrBr', vmin=1165., vmax=1169., s=1)
  ax.set_xlabel(xlabel, size=20)
  ax.set_ylabel(ylabel, size=20)
  ax.set_zlabel(zlabel, size=20)
  ax.set_xlim(xlims)
  ax.set_ylim(ylims)
  ax.set_zlim(zlims)
  plt.show(threeaxisgraph)


threeaxisgraph(lon, lat, elev, "Longitude", "Latitude", "Elevation")

f.close()

arctic cliff Jul 26, 2020, 3:27 PM

#

Should I always get rid of rows with nan? even if only one column has Nan in it ?

lone nebula Jul 26, 2020, 3:35 PM

#

hey guys uh

#

im doing datacamp

#

doing this project

#

can someone help me with this error

#

cant figure it out

#

import numpy as np
# Clean the special case columns
# Changing kB to MB by dividing by 1000
apps['Size'] = apps['Size'].apply(lambda x: str(float(x.replace('k', '')) / 1000) \
                                  if 'k' in x else x)
apps['Size'] = apps['Size'].replace('Varies with device', np.nan)

chars_to_remove = ['+', ',', 'M', '$']
cols_to_clean = ['Installs', 'Size', 'Price']
for col in cols_to_clean:
    # Remove the characters preventing us from converting to numeric
    for char in chars_to_remove:
        apps[col] = apps[col].str.replace(char, '')
    # Convert the column to numeric
    apps[col] = pd.to_numeric(apps[col])```

#

the error is this

TypeError                                 Traceback (most recent call last)
<ipython-input-44-341eaa82adae> in <module>
      2 # Clean the special case columns
      3 # Changing kB to MB by dividing by 1000
----> 4 apps['Size'] = apps['Size'].apply(lambda x: str(float(x.replace('k', '')) / 1000) \
      5                                   if 'k' in x else x)
      6 apps['Size'] = apps['Size'].replace('Varies with device', np.nan)

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-44-341eaa82adae> in <lambda>(x)
      3 # Changing kB to MB by dividing by 1000
      4 apps['Size'] = apps['Size'].apply(lambda x: str(float(x.replace('k', '')) / 1000) \
----> 5                                   if 'k' in x else x)
      6 apps['Size'] = apps['Size'].replace('Varies with device', np.nan)
      7 

TypeError: argument of type 'float' is not iterable```

lone nebula Jul 26, 2020, 4:03 PM

#

nvm fixed

arctic cliff Jul 26, 2020, 4:46 PM

#

Doesn't look to be looping

📎 unknown.png

limpid oak Jul 26, 2020, 5:04 PM

#

@arctic cliff try apply method on df

ocean horizon Jul 26, 2020, 5:11 PM

#

Hi everyone. I'm looking for free resources to learn AI programming with Python. I would really appreciate it if you could help me out.

limpid oak Jul 26, 2020, 5:11 PM

#

!resources

arctic wedgeBOT Jul 26, 2020, 5:11 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

limpid oak Jul 26, 2020, 5:11 PM

#

try this

#

@ocean horizon

ocean horizon Jul 26, 2020, 5:13 PM

#

Thank Spidy!

lapis sequoia Jul 26, 2020, 5:46 PM

#

anyone know how i could update a subset of rows in my dataframe?

#

📎 image0.png

#

i'd like to update the top section to the bottom section

#

df.loc[df['APPLICATION_ID']=='AP000156'] = df.loc[df['APPLICATION_ID']=='AP000156'].reindex(idx, fill_value=0)``` doesn't work

#

anyone know

signal sluice Jul 26, 2020, 6:19 PM

#

man data science is hard when ur trying get as much useful data as possible but also store as little data as possible for space but you dont KNOW what data is useful

uncut shadow Jul 26, 2020, 6:47 PM

#

data science is about data, ya know

flat quest Jul 26, 2020, 7:00 PM

#

yup thats the dillemma. Though you could figure out which one's are strong predictive features by statistical analysis or empirical observation @signal sluice

modern canyon Jul 26, 2020, 7:12 PM

#

Is there any better way to do this?

alpha_list = []
numeric_list = []
for i in sorted(imdb['primaryTitle'].tolist()):
    if i[0] in string.ascii_letters:
        alpha_list.append(i)
    else:
        numeric_list.append(i)

sorted_list = alpha_list + numeric_list

flat quest Jul 26, 2020, 7:18 PM

#

well you need to reindex the entire df if you want the original one but modified. You're reindexing a subset of it @lapis sequoia

lapis sequoia Jul 26, 2020, 7:18 PM

#

i did it in an extremely inefficient way but i'd be curious how to efficiently do it lol

flat quest Jul 26, 2020, 7:20 PM

#

you could directly sort the dataframe itself
then select the ones with ascii chacarters and convert that into a alpha list

get the ones without ascii and convert that into a list for the numeric list.
You check for ascii by doing a regex match or just first character match if its guarenteed that a string will either be entirely ascii or it won't. @modern canyon

modern canyon Jul 26, 2020, 7:25 PM

#

I read your answer three times, but I still couldn't figure out how your approach is different than mine.

flat quest Jul 26, 2020, 7:26 PM

#

well to do it more efficiently, you could just copy the top values into a temporary variable.

Then do

tmp = df.loc[bottom section]

df.loc[bottomsection] = df.loc[uppersection]
df.loc[uppersection] = tmp

@lapis sequoia

#

it gets rid of the for loops since ur using vectorized pandas methods

#

pandas has a built in sorting function for df's

lapis sequoia Jul 26, 2020, 7:26 PM

#

oh ok, the problem is i had to do that inside a for loop which is what i had trouble with

#

but yeah i see how i could apply this in a for loop

#

thank you

flat quest Jul 26, 2020, 7:33 PM

#

you wouldnt necesarrily have to do it in a for loop.
just get the indices or boolean array that match the upper section. Just get the app_ids that equal AP000156.

If its the case that you're trying to move specific values dispersed all throughout the df into the upper region you could just do

upper = df.loc[bool array]
lower = df.loc[~bool ar]

joined = upper.append(lower).reset_index()

modern canyon Jul 26, 2020, 7:46 PM

#

then select the ones with ascii chacarters and convert that into a alpha list

@flat quest how do I "select"? .apply, .filter, .where?

#

i'm asking because idk how they work internally

#

does all of these use vectorized operations?

lapis sequoia Jul 26, 2020, 8:00 PM

#

i am trying to use numba in my code, can anyone help me out a bit?

#

code works for a few lines

#

and shows output

#

but it shows no output when i use more than 1000 lines

flat quest Jul 26, 2020, 8:01 PM

#

any one of those will work

There's also a number of str operations available. if you df[col].str that you could possibly use.

Well they're much more vectorized than python for loops, as they're all coded in C++. Apply and filter will be slower than any boolean condition

Such as df[col].str.contains([chars]) @modern canyon

signal sluice Jul 26, 2020, 8:04 PM

#

oh yeah drag

#

most of my data is categorical

#

which im not altogether too familiar with tbh

#

should probably look at some stats resources for categorical data

uncut shadow Jul 26, 2020, 8:06 PM

#

@lapis sequoia provide some code or errors

flat quest Jul 26, 2020, 8:24 PM

#

yeah
look into one hot encoding and label encoding @signal sluice

And brush up on terms like cardinality, ordinal data, and so forth. A basic statistics knowledge won't hurt 😉

desert oar Jul 26, 2020, 9:35 PM

#

does anyone know if pandas has any plans to add native list/array type columns in upcoming versions?

last agate Jul 26, 2020, 9:39 PM

#

Any body know of any good recourses on pathfinding algorithms?

wanton anchor Jul 26, 2020, 9:49 PM

#

Anyone know of any good Tensorflow tutorials?

silk knot Jul 26, 2020, 10:44 PM

#

!paste

#

alright

#

https://paste.pythondiscord.com/agizekusak.coffeescript

#

this piece of code

#

results in

📎 unknown.png

#

but I need the ''rows'' array to be the vertical ones to the left

#

actually, just realised it's irrelevant for the end result

desert oar Jul 26, 2020, 10:59 PM

#

@silk knot in general you can make a column the "index" (the stuff on the left) with DataFrame.set_index()

silk knot Jul 26, 2020, 11:01 PM

#

where should I put that? line 48?

silk knot Jul 26, 2020, 11:19 PM

#

pd.DataFrame.set_index(rows) doesn't really work for me right now

#

TypeError: set_index() missing 1 required positional argument: 'keys'

frank bone Jul 26, 2020, 11:40 PM

#

How would you go about grouping similar values (say +- 10% within a pandas df columns, mostly in dfs with 50ish entries and 2-3 such similar value pairs?

hoary flint Jul 26, 2020, 11:57 PM

#

@frank bone https://scikit-learn.org/stable/modules/clustering.html

frank bone Jul 27, 2020, 12:07 AM

#

Thanks! Looking into it 😄

desert oar Jul 27, 2020, 12:08 AM

#

@silk knot i still don't know what your code does, does rows represent a single Series?

lapis sequoia Jul 27, 2020, 12:08 AM

#

do i need to transform my data into a normal distribution if i want to calculate the +/- 1 standard deviation from the mean if my mean is 0.40, standard deviation is 1.25, and min value is 0?

#

it's heavily skewed right

desert oar Jul 27, 2020, 12:09 AM

#

standard deviation is always valid to compute

#

whether it's useful or relevant to your problem is another story

lapis sequoia Jul 27, 2020, 12:09 AM

#

median is 0 btw*

#

i see

#

so if i want to label something as low frequency or high frequency, given those values above, you think it'd be best to transform the skewed data and then label low/high frequency based on transformed values?

silk knot Jul 27, 2020, 12:11 AM

#

rows are mass, like g/mol values

📎 unknown.png

#

But I got alot of help with writing the code but as I understand it, those values are saved in 'rows'

desert oar Jul 27, 2020, 12:14 AM

#

@lapis sequoia maybe. another option would be to use above/below median

#

since then 50% of the data is "high" and 50% is "low"

#

so "rows" is a list/array/series of numbers? @silk knot

silk knot Jul 27, 2020, 12:14 AM

#

Yes

desert oar Jul 27, 2020, 12:15 AM

#

ok, i recommend using a more descriptive name like mass

lapis sequoia Jul 27, 2020, 12:15 AM

#

thanks

desert oar Jul 27, 2020, 12:15 AM

#

then you can assign it as a column to your dataframe data['mass'] = mass

#

and then you can do data.set_index('mass') once you've created the 'mass' column as per above

silk knot Jul 27, 2020, 12:17 AM

#

yea you are right, that would be better

#

im gonna give it a go

#

Didn't really workout, but Im gonna have to go to bed now, really appreciate your help tho!

#

the mass doesn't affect the PCA, or so I think, so its fine really.

#

ty very much tho

waxen inlet Jul 27, 2020, 12:39 AM

#

What do you guys reccomend for getting into data science?

marsh berry Jul 27, 2020, 2:01 AM

#

Anyone know how I can skip every 4th row in pandas?

#

@waxen inlet I recommend learning the Scipy libraries

velvet thorn Jul 27, 2020, 3:08 AM

#

@marsh berry

#

you can do this:

#

subset = s[s.index % 4 != 3]
subset.groupby(subset.index // 4).mean()

marsh berry Jul 27, 2020, 3:10 AM

#

@velvet thorn Thank you so much! What I am trying to achieve is this: How can I get the mean of every 3 rows skipping the 4th row in pandas? So like get the mean of rows 1,2,3 then skip row 4 and then get the mean of 5,6,7 and skip 8 and so forth

📎 unknown.png

velvet thorn Jul 27, 2020, 3:10 AM

#

where s is the Series that you want to apply that transformation to

#

>>> s = pd.Series(range(12))                                                                      >>> subset = s[s.index % 4 != 3]                                                                  >>> subset.groupby(subset.index // 4).mean()                                                      0    1
1    5
2    9

...because (0 + 1 + 2) / 3 is 1, (4 + 5 + 6) / 3 is 5 and (8 + 9 + 10) / 3 is 9

#

yes

marsh berry Jul 27, 2020, 3:13 AM

#

Thank you so much! And is this also possible to apply to a dataframe? Like its basically similar data just with multiple columns. And technically each column is considered a series right?

velvet thorn Jul 27, 2020, 3:13 AM

#

yes

#

not "technically"

#

each column is a Series

desert oar Jul 27, 2020, 3:18 AM

#

@marsh berry are these dates or something?

#

but yes this subset groupby thing is probably best in the absence of additional information

marsh berry Jul 27, 2020, 3:19 AM

#

@desert oar Nah they're UV absorbance values

#

The entire dataset looks like this

📎 unknown.png

#

I was trying to figure out how to skip every fourth row and get the mean of the first three

#

@velvet thorn Thank you btw!

desert oar Jul 27, 2020, 3:22 AM

#

note that if your index isn't integers it won't work

marsh berry Jul 27, 2020, 3:22 AM

#

Yeah my indices are ints like this

📎 unknown.png

desert oar Jul 27, 2020, 3:23 AM

#

ok

#

in general you can do something like np.arange(data.shape[0]) instead of data.index

#

if for example you are using strings or something else as index values

velvet thorn Jul 27, 2020, 3:23 AM

#

if your index isn't an integer you can use an iterable of the same shape

marsh berry Jul 27, 2020, 3:26 AM

#

So it's looking it skips every 3rd row instead of 4th

#

📎 unknown.png

#

df[df.index % 4 != 3]

desert oar Jul 27, 2020, 3:27 AM

#

i think there's some double subsetting going on

marsh berry Jul 27, 2020, 3:27 AM

#

Did I do something wrong in this

desert oar Jul 27, 2020, 3:27 AM

#

your index starts at 1 not 0

#

so make that (df.index - 1) % 4

#

oh

#

and they arent sequential integers

#

or are they?

#

either way if they start at 1 you will be off by 1

marsh berry Jul 27, 2020, 3:29 AM

#

Thank you so much for the catch!

#

It was indeed the 0 and 1 index

#

📎 unknown.png

velvet thorn Jul 27, 2020, 3:33 AM

#

incidentally, you can check if it's a sequential index

desert oar Jul 27, 2020, 3:34 AM

#

@velvet thorn is there some magic pandas method for this? or do you have some math trick in mind

#

you can check if it's a RangeIndex of course

#

but that doesn't necessarily cover all cases

velvet thorn Jul 27, 2020, 3:35 AM

#

ye, that's what I meant

#

like

#

that is the case in which you can be sure it's sequential

#

but if it's like an Int64Index or something

#

then I think you need the manual approach

slate scroll Jul 27, 2020, 3:37 AM

#

Couldn't you just do something like df.index.values() == list(range(df.shape[0]))? Or are there performance concerns with dataset size/etc.

velvet thorn Jul 27, 2020, 3:37 AM

#

in general you can do something like np.arange(data.shape[0]) instead of data.index
@desert oar basically this?

#

like

#

the thing is

#

if you're not sure whether the index is sequential

#

you might as well just groupby a separate array

desert oar Jul 27, 2020, 3:38 AM

#

yeah i would just do the arange thing in most cases

velvet thorn Jul 27, 2020, 3:38 AM

#

because you would have to do that anyway if the index wasn't sequential

desert oar Jul 27, 2020, 3:38 AM

#

im very frequently subsetting and slicing such that the index is non sequential

#

right

#

i lean heavily on indexes in pandas

slate scroll Jul 27, 2020, 3:39 AM

#

I also use this heavily: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html

desert oar Jul 27, 2020, 3:39 AM

#

i try to avoid it tbh

#

feels like an anti pattern sometimes

#

i wish groupby didnt automatically create an index for example

slate scroll Jul 27, 2020, 3:39 AM

#

Really, how so?

desert oar Jul 27, 2020, 3:40 AM

#

its just inelegant i guess

#

as for groupby, sometimes i want the group value as the index and sometimes i dont

slate scroll Jul 27, 2020, 3:40 AM

#

I find it's the clearest way to switch back to a range index

desert oar Jul 27, 2020, 3:40 AM

#

ah

#

i rarely want that, but maybe this is a good application for it

slate scroll Jul 27, 2020, 3:40 AM

#

I typically find myself say, joining so I need to set a column as an index, then I need a range index for sorting etc so I reset it

desert oar Jul 27, 2020, 3:40 AM

#

again, i tend to depend heavily on indexes so i dont like to reset them

slate scroll Jul 27, 2020, 3:41 AM

#

I use it when I want to switch from a manual index back to a range index.

desert oar Jul 27, 2020, 3:41 AM

#

yeah but how often do you need range index

slate scroll Jul 27, 2020, 3:41 AM

#

Pretty often, they're useful for sorting and ordering things which I do lots of.

desert oar Jul 27, 2020, 3:41 AM

#

can you give an example

#

maybe this is a way to use pandas i havent thought of

slate scroll Jul 27, 2020, 3:42 AM

#

Business logic might be, "I want a TV show with x category in slots 3-6, and 7-11. If one isn't in those spots, move the next one up"

desert oar Jul 27, 2020, 3:43 AM

#

so the index represents TV show ordering

slate scroll Jul 27, 2020, 3:43 AM

#

Yep.

desert oar Jul 27, 2020, 3:43 AM

#

and you actually want to change the ordering

#

i see

slate scroll Jul 27, 2020, 3:43 AM

#

Exactly

#

It's useful when ordering matters.

desert oar Jul 27, 2020, 3:43 AM

#

i dont think ive ever had to do such a thing with pandas

#

interesting

slate scroll Jul 27, 2020, 3:44 AM

#

Might be a bit of a niche problem I can see that but I do find myself using it.

desert oar Jul 27, 2020, 3:44 AM

#

im almost always carrying around things like 'customer_id'

#

which as you can imagine i do not want to lose track of

#

especially when i have customer_id, account_id, agent_id, et multa alia

slate scroll Jul 27, 2020, 3:44 AM

#

Right so that's where I'll set my id to the index for joining, but then reset it to do reordering

hollow oyster Jul 27, 2020, 3:45 AM

#

Hey may I ask a question if I m not interrupting

desert oar Jul 27, 2020, 3:45 AM

#

go ahead

hollow oyster Jul 27, 2020, 3:45 AM

#

I wanted to ask my X_train, X_test have 181 cols while training the model so does that mean if I have to deploy it on flask I need to take 181 inputs from the user?

desert oar Jul 27, 2020, 3:45 AM

#

flask is for handling HTTP requests

velvet thorn Jul 27, 2020, 3:45 AM

#

uh.

desert oar Jul 27, 2020, 3:46 AM

#

so if you want them to do it via a website or web API, then yes flask is one way to do it

velvet thorn Jul 27, 2020, 3:46 AM

#

that would depend on what those columns are...

desert oar Jul 27, 2020, 3:46 AM

#

if the "user" is a machine that seems easy enough

#

i have built apps like this for example where the input is a json array of dicts

slate scroll Jul 27, 2020, 3:46 AM

#

In case anyone else is unaware, I 💟 this for adding an API on top of models https://www.tensorflow.org/tfx/serving/api_rest

TensorFlow

RESTful API | TFX | TensorFlow

desert oar Jul 27, 2020, 3:46 AM

#

that array of dicts becomes a pandas data frame

#

we do data processing on said data frame

#

then pass that along to the ML stuff

#

but yeah there are lots of tools to build APIs automatically

#

however i will generally take umbrage at most uses of the term "REST" w/ respect to a deployed model..

hollow oyster Jul 27, 2020, 3:47 AM

#

👍🏻

desert oar Jul 27, 2020, 3:47 AM

#

oh this is actually pretty RESTful

#

nice

slate scroll Jul 27, 2020, 3:48 AM

#

Right, but I agree most are not. But that one is beautiful. Gives you a built in gRPC server as well.

desert oar Jul 27, 2020, 3:48 AM

#

yeah this is very complete

slate scroll Jul 27, 2020, 3:48 AM

#

All written in C too, very performant

desert oar Jul 27, 2020, 3:48 AM

#

its slick enough to make me want to use TF at work just for the convenience of having this

#

although its likely we are switching to MLFlow for "deployment" anyway

slate scroll Jul 27, 2020, 3:49 AM

#

We evaluated MLFlow and decided against it in favor or Kubeflow + KFServing (which includes the above). What about MLFlow made your $job wannt to use it?

desert oar Jul 27, 2020, 3:49 AM

#

Databricks integration

slate scroll Jul 27, 2020, 3:49 AM

#

Ahh luckily a need we don't share.

desert oar Jul 27, 2020, 3:50 AM

#

fwiw i think databricks has saved us a huge amount of overhead in maintaining a hadoop cluster just for spark

#

nobody was using hadoop for anything else

slate scroll Jul 27, 2020, 3:50 AM

#

we use dataproc on google cloud for the same thing

desert oar Jul 27, 2020, 3:50 AM

#

that said, there are definitely people here who just love throwing money at microsoft

#

it doesnt affect me either way, i stay far far away from those databricks notebooks

#

databricks-connect is at least as good as livy if not better

#

and there is a nice suite of CLI utilities for interacting with databricks and DBFS, which acts more or less the same as HDFS

#

plus it also has lots of integration w/ other azure things like ADLS

#

which we depend on heavily

slate scroll Jul 27, 2020, 3:52 AM

#

Yeah I like how EMR and GCP dataproc get native HDFS access to S3/GCS respectively.

desert oar Jul 27, 2020, 3:52 AM

#

not to mention spark itself

#

yep basically the same idea

#

we are an azure shop more or less

slate scroll Jul 27, 2020, 3:52 AM

#

Interesting, I've done a few years in AWS and am now in GCP but never touched Azure (aside from their Klingon translation API)

desert oar Jul 27, 2020, 3:53 AM

#

i think the killer feature for azure (for enterprise) is that everything integrates with active directory

#

so you can e.g. link your databricks account to your active directory account

#

SSO everywhere

#

very easy for the user

#

more secure for the org

slate scroll Jul 27, 2020, 3:53 AM

#

Yeah similarly we have G-Suite which integrates with GCP

desert oar Jul 27, 2020, 3:53 AM

#

yep makes sense

#

do your people manage k8s in house?

#

ive heard that can be quite involved

#

for context, i try to stay away from IT and devops/MLops

slate scroll Jul 27, 2020, 3:54 AM

#

Yes we do. It's really not bad. I do most of the management for my team of 8 devs. It has a learning curve.

desert oar Jul 27, 2020, 3:55 AM

#

(but it matters to me quite a bit when it comes to the team's workflow, reproducibility, sharing/collaboration ability, etc)

#

good to know

#

i'll keep it in mind for if/when i move to a smaller org again

slate scroll Jul 27, 2020, 3:55 AM

#

Yeah I love process improvement so I like doing CI/CD, devops, etc.

desert oar Jul 27, 2020, 3:55 AM

#

yeah i like process improvement but on the data science side

#

i really dont want to care about deployment

#

and i try to avoid caring as much as i can 😛

slate scroll Jul 27, 2020, 3:56 AM

#

Ahhh yeah so my group is applied ML, so most of our process is around building (and deploying) applications.

desert oar Jul 27, 2020, 3:56 AM

#

i see. yeah im much more concerned with making sure i can use my colleagues' code

#

than i am with making sure Joe in underwriting has a web interface for getting model predictions

slate scroll Jul 27, 2020, 3:57 AM

#

Yeah our products are 95% APIs which are integrated into products to provide inference for users

desert oar Jul 27, 2020, 3:57 AM

#

yeah i wish we had a team like yours

#

i've built dashboards etc before i just really dont want to support it in production

#

so much extra work that isn't getting my work done

slate scroll Jul 27, 2020, 3:58 AM

#

Yeah our group split out of a group like that

desert oar Jul 27, 2020, 3:58 AM

#

"Dear VP, please find attached a discussion I had with a guy on the internet, who supports my idea that we need to hire more qualified ML dev people. Best, salt rock."

#

sounds good right?

slate scroll Jul 27, 2020, 4:01 AM

#

Getting our group formed was not clean. It took a sr. exec threatening to leave since their work wasn't being integrated into products. They were given a mandate to start a group to prove we could do it. Now 2 years later we're growing.

desert oar Jul 27, 2020, 4:02 AM

#

ah, enterprise

#

glad youre making it work

#

sounds like you enjoy what you do

slate scroll Jul 27, 2020, 4:03 AM

#

I do, It's definitely rewarding.

fleet moth Jul 27, 2020, 6:17 AM

#

Hello Everybody. I try to get a grouped BarChart with Matplotlib and pandas as annoted here: https://matplotlib.org/3.3.0/gallery/lines_bars_and_markers/barchart.html#sphx-glr-gallery-lines-bars-and-markers-barchart-py

#

my code: ```py
class StatisticDialog:
def init(self):
self.labels = []
self.score = []
self.x = 0
self.width = 0.38
self.db = OctopusDB()

def create_graph(self):
    datas = self.db.select().to_numpy()
    x = np.arange(len(self.labels))
    fig, ax = plt.subplots()

    for data in datas:
        self.labels.append(f"{data[0]}: {data[1]}")
        self.labels = list(dict.fromkeys(self.labels))
        self.score = list(dict.fromkeys(self.score))

    for i in self.labels:
        for s in self.score:
            ax.bar(x - self.width / 2, s, self.width, label=i)


    ax.set_ylabel("Temps d'interruption")
    ax.set_title("Interruption en temps et priorité")
    ax.set_xticklabels(self.labels)
    ax.legend()

    fig.tight_layout()
    plt.show()

#

my error: No handles with labels found to put in legend. QCoreApplication::exec: The event loop is already running

#

and the visual result:

📎 Capture.PNG

#

can you help me to find the error why my code doesn't build bar chart ?

south wedge Jul 27, 2020, 7:11 AM

#

can some one help to build my own model in machine learning without using sklearn or tensorflow?

winter barn Jul 27, 2020, 7:13 AM

#

Whats the fastest way to jump into ML/AI

untold aspen Jul 27, 2020, 7:33 AM

#

can some one help to build my own model in machine learning without using sklearn or tensorflow?
@south wedge that's feasible using standard libs such as numpy and such but why would you do that?TF is used so much for ML because it leverages the use of tensors which is the perfect data type for dealing with n-dim data which ML models encounter all the time. You can build your own model in TF by using tensor placeholders, variables, and such.

south wedge Jul 27, 2020, 7:35 AM

#

but i need to learn how to program my own programs without using any others.

untold aspen Jul 27, 2020, 7:36 AM

#

You can even pick apart pre-existing models in TF libs and change it in your way by overwriting some functions in the models (due to them being Python classes at the very low level)

#

If you want to try to implement it without TF then perhaps audit some courses by Andrew Ng on Coursera

#

He's got notebooks that teaches you to build those ML models from just Numpy

#

and some other standard libs

south wedge Jul 27, 2020, 7:38 AM

#

Thanks

tidal bough Jul 27, 2020, 7:39 AM

#

some simple ML algorithms can be implemented yourself indeed; check out gradient descent, linear regression by directly solving the matrix equation, and NN backpropagation.

untold aspen Jul 27, 2020, 7:39 AM

#

i can help later if you struggle but just check those out cus Andrew Ng's courses are pretty nuanced

#

oh about implementing them with raw python codecademy premium course on ML is perfect

south wedge Jul 27, 2020, 7:40 AM

#

is cousera free or paid?

untold aspen Jul 27, 2020, 7:40 AM

#

free for audit

#

paid for if you want to get that certif

#

Codeacademy's course on ML is interactive

#

and it's basically you program the ML algos yourself (guided by instructions)

south wedge Jul 27, 2020, 7:41 AM

#

im trying to build my own AI system better if you can join my project.

untold aspen Jul 27, 2020, 7:41 AM

#

and then after that you use the API

#

I'm happy to help

#

Tell me more about it via PM

south wedge Jul 27, 2020, 7:43 AM

#

PM?

untold aspen Jul 27, 2020, 7:43 AM

#

priv msg

#

just friend me and chat

south wedge Jul 27, 2020, 7:43 AM

#

ok

#

so you or me should join a private server

#

im new to DISCORD

untold aspen Jul 27, 2020, 7:44 AM

#

nah don't need a serv if only 2 ppl chatting

#

click on my profile and copy my name+tag

#

then go to your home and go add friend

#

copy and paste to find me

frank bone Jul 27, 2020, 7:47 AM

#

does anyone have experience with isolation forest and streaming data? my question is on performance let's say I want a sliding window (30 days) and i want to omit the last and add the newst but dont want to retrain all 30 days...since ill have hundreds of thousands if not millions of trainings

#

is it possible? is there reference code for this? couldnt find anything usable so far. It concerns time series data

untold aspen Jul 27, 2020, 7:49 AM

#

Whats the fastest way to jump into ML/AI
@winter barn I highly rec Codecademy courses. They're great for starters who need interactive learning.

vivid idol Jul 27, 2020, 8:39 AM

#

Hello everyone, is the right place to ask questions for python coding related to data science?

tidal bough Jul 27, 2020, 8:41 AM

#

Presumably 🙂

last agate Jul 27, 2020, 8:55 AM

#

Any body know of any good recourses for pathfinding algorithms?

frank bone Jul 27, 2020, 9:08 AM

#

How do I avoid list within list when using csv module & csv.reader?

#

tried to make an empty list first and then use list.extend(csv.reader(file)) but it ends up putting a list into my list when I just want the elements

untold aspen Jul 27, 2020, 9:16 AM

#

Hello everyone, is the right place to ask questions for python coding related to data science?
@vivid idol well it's #data-science-and-ml so you're in the right place buddy

#

How do I avoid list within list when using csv module & csv.reader?
@frank bone what does the csv file contain? if it's something to make into a dataframe then pandas.csv_reader(file) is perfect

#

otherwise csv_reader is still good

frank bone Jul 27, 2020, 9:19 AM

#

i just want 1 row with data as follows: 1,2,3,4,5,6,7,8,9 in a simple array

untold aspen Jul 27, 2020, 9:19 AM

#

oh then make an empty list

#

open(csv.reader(file))

frank bone Jul 27, 2020, 9:19 AM

#

and currently i get array = [[1,2,3,4,5,6,7,8,9]]

#

dont want list within list

untold aspen Jul 27, 2020, 9:20 AM

#

for row in that csv.reader object: list.append(row)

#

it's inspired from the first example on the csv doc

#

https://docs.python.org/3/library/csv.html

frank bone Jul 27, 2020, 9:23 AM

#

worked thank you 🙂

untold aspen Jul 27, 2020, 9:27 AM

#

np

frank bone Jul 27, 2020, 9:27 AM

#

@untold aspen is it possible to get a specific subset of such an array without calling it with elements number like [0:5] but with ["string1":"string2"] doesnt work

untold aspen Jul 27, 2020, 9:28 AM

#

oh so you don't wanna use indices? smth like that?

frank bone Jul 27, 2020, 9:28 AM

#

yeah call by elements, list is filled with dates

#

and then i want to get a list of dates from date1 to date2

untold aspen Jul 27, 2020, 9:29 AM

#

use datetime lib

frank bone Jul 27, 2020, 9:29 AM

#

currently my dates are strings and not datetimes

#

does such a thing work with strings?

untold aspen Jul 27, 2020, 9:29 AM

#

what format is it in

frank bone Jul 27, 2020, 9:30 AM

#

"yyyy-mm-dd"

untold aspen Jul 27, 2020, 9:30 AM

#

strptime() from datetime is how to convert it to datetime obj

#

basically pass in a placeholder kind of string indicating your datetime format, and the string itself

frank bone Jul 27, 2020, 9:35 AM

#

list_dates.extend(datetime.strptime(str(row), '%Y-%m-%d'))

#

.......,'2019-02-22']" does not match format '%Y-%m-%d'

#

what did i do wrong? 😄

untold aspen Jul 27, 2020, 9:38 AM

#

what's the error raised?

frank bone Jul 27, 2020, 9:39 AM

#

raise ValueError ("time data %r does not match format %r

#

row is like this

#

row = ['yyyy-mm-dd', 'yyyy-mm-dd', 'yyyy-mm-dd', etc]

untold aspen Jul 27, 2020, 9:45 AM

#

try making the format string raw

#

put r infront

frank bone Jul 27, 2020, 9:49 AM

#

figured it out works like this list_dates.extend(datetime.strptime(str(row), '%Y-%m-%d').date() for row in row)

untold aspen Jul 27, 2020, 9:49 AM

#

nice

silk knot Jul 27, 2020, 9:49 AM

#

ValueError: The number of observations cannot be determined on an empty distance matrix. I can't figure out why this ValueError comes up.

#

from pyteomics import mzxml, auxiliary
from os import listdir
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn import preprocessing
from scipy.cluster import hierarchy
import scipy
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.cluster.hierarchy import fcluster
from scipy.cluster.hierarchy import cophenet

import plotly.figure_factory as ff



vert = {}
columns_1 = []

fileNum = 1
for file in listdir("./mzxml_files"):
  columns_1.append(file)
  lijst = []
  with mzxml.read(f"./mzxml_files/{file}/1/1SLin/analysis.mzxml") as reader:
    lijst.append(next(reader))
  # print(list[0]["intensity array"])
  i = 0
  for number in lijst[0]["m/z array"]:
    if number not in vert:
      vert[number] = []
    vert[number].append(lijst[0]["intensity array"][i])
    i+=1
  for item in vert:
    if len(vert[item]) != fileNum:
      vert[item].append(0)
  fileNum+=1

rows = []
data = []


for item in vert:
  rows.append(item)
  data.append(vert[item])

f = open("output.txt", "w")

f.write("\t\t   " + " ".join(columns_1) + "\n")

for i in range(0, len(rows)):
    f.write(f"{rows[i]} : {data[i]} \n")

data = pd.DataFrame.from_dict(dict(zip(columns_1, data)))
print(data.head())

X = data.loc[:'0:36']

fig = ff.create_dendrogram(X)
fig.update_layout(width=800, height=500)
fig.show()

frank bone Jul 27, 2020, 9:50 AM

#

nice
@untold aspen now that i have a datetime array, how do i go about specifying a range to print?

#

ie. date1 to (and including) date2

#

as easy as list[datetime1:datetime2]?

untold aspen Jul 27, 2020, 9:53 AM

#

i think you should make them into timestamps and then order it from there

frank bone Jul 27, 2020, 9:55 AM

#

and those are treated differently than strings?

#

when calling

untold aspen Jul 27, 2020, 9:56 AM

#

they're just strings

#

timestamp is an attribute of a datetime object

frank bone Jul 27, 2020, 9:57 AM

#

so in the end the timestamps will work when trying to call a range within list?

#

list[timestamp1:timestamp2]?

untold aspen Jul 27, 2020, 9:59 AM

#

no im just suggesting that you use timestamps to try ordering the dates

#

by convention lists uses indices

frank bone Jul 27, 2020, 10:00 AM

#

im trying to avoid calling by index like [0:5]

untold aspen Jul 27, 2020, 10:00 AM

#

probably make a dictionary?

frank bone Jul 27, 2020, 10:00 AM

#

since the indices are dynamically changing all the time

untold aspen Jul 27, 2020, 10:02 AM

#

make the key = datetime string that has keys values = index

#

that way if you call dict["DATETIME STRING"] = index

#

so i guess we can do list[dict["datetime_string_1"]:dict["datetime_string_2"]]

frank bone Jul 27, 2020, 10:04 AM

#

ill try that

untold aspen Jul 27, 2020, 10:04 AM

#

what do you mean by dynamically changing indices? you're gonna add more dates on?

frank bone Jul 27, 2020, 10:04 AM

#

yes

#

its streaming data

#

in the end

untold aspen Jul 27, 2020, 10:05 AM

#

well i thought if you append then indices don't change?

frank bone Jul 27, 2020, 10:05 AM

#

and im using the datetime range to train a sliding window for isolation forest

untold aspen Jul 27, 2020, 10:06 AM

#

indices shouldn't change if you append more to list

#

unless you're adding something inbetween existing ones

#

in that case use .index('datetime_string')

#

on your list of streaming data

#

that will search through your list for that datetime and return the index

#

so you don't have to worry about indices changing

acoustic halo Jul 27, 2020, 10:10 AM

#

Whats the best way to ensemble neural nets? Concatenate them at the last hidden layer before softmax or taking their individual softmax outputs and weighting them?

frank bone Jul 27, 2020, 10:26 AM

#

in that case use .index('datetime_string')
@untold aspen we still talking about dict here? having difficulty getting a range within dict printed

untold aspen Jul 27, 2020, 10:26 AM

#

.index() is applied on your list of datetime strings

frank bone Jul 27, 2020, 10:27 AM

#

yes but what the data type? pd np array, dict list?

untold aspen Jul 27, 2020, 10:27 AM

#

list

frank bone Jul 27, 2020, 10:28 AM

#

so your idea is to create just a list with my data as index then print index range?

#

to get range

#

sorry if i got it completely wrong, im new 🙂

#

still trying with dict rn

#

so i guess we can do list[dict["datetime_string_1"]:dict["datetime_string_2"]]
@untold aspen i got a dict with key=dates and im doing print(list_dates['dates'["yyyy-mm-dd"]:'dates'["yyyy-mm-dd"]]) but that doesnt work

untold aspen Jul 27, 2020, 10:50 AM

#

ok so basically the dictionary has keys = string of datetimes and the corresponding values = index of that datetime string on the list_dates

silk forge Jul 27, 2020, 10:50 AM

#

i need help with sklearn.compose.ColumnTransformer()

#

encoder = make_column_transformer(
    (LabelEncoder(),["Embarked"])
    #(LabelEncoder(),["Sex"])

,remainder="passthrough")

newtrainx = pd.DataFrame(imputer.fit_transform(trainx),columns=train_data.drop("Survived",axis=1).columns)
print(encoder.fit_transform(newtrainx))

untold aspen Jul 27, 2020, 10:51 AM

#

to get that index you use list_dates.index("string you wanted to find")

#

@frank bone

silk forge Jul 27, 2020, 10:51 AM

#

my LabelEncoder() isn't working when i put it inside the column transformer

#

TypeError: fit_transform() takes 2 positional arguments but 3 were given

untold aspen Jul 27, 2020, 10:52 AM

#

@untold aspen i got a dict with key=dates and im doing print(list_dates['dates'["yyyy-mm-dd"]:'dates'["yyyy-mm-dd"]]) but that doesnt work
@frank bone 'dates' is interpreted as a string and not a dict

frank bone Jul 27, 2020, 10:55 AM

#

when i write dates it days name dates is not defined

elder vault Jul 27, 2020, 10:56 AM

#

I'm trying to send a simple search request to ES but I'm getting all these errors: https://bpa.st/Z52Q
and each block of errors ends with ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123) what am i doing wrong?

untold aspen Jul 27, 2020, 11:03 AM

#

@frank bone you create a defaultdict() that allows you to add more key value pairs

dict_date_index = defaultdict()
for datetime_string in list_dates:
    dict_date_index[datetime_string] = list_dates.index(datetime_string)

frank bone Jul 27, 2020, 11:03 AM

#

ill just do it with pandas dataframe, setting dates as index and column 0 at same time, then using tolist() into a variable

untold aspen Jul 27, 2020, 11:03 AM

#

yea or that

frank bone Jul 27, 2020, 11:03 AM

#

seems easier rn 😄

untold aspen Jul 27, 2020, 11:04 AM

#

definitely

#

you have to sort them tho

#

if it was not sorted

frank bone Jul 27, 2020, 11:04 AM

#

my datelist.csv is alrdy sorted

#

so its no prob

untold aspen Jul 27, 2020, 11:04 AM

#

ok then that's good

frank bone Jul 27, 2020, 11:04 AM

#

thanks for the help

untold aspen Jul 27, 2020, 11:04 AM

#

np

#

encoder = make_column_transformer(
    (LabelEncoder(),["Embarked"])
    #(LabelEncoder(),["Sex"])

,remainder="passthrough")

newtrainx = pd.DataFrame(imputer.fit_transform(trainx),columns=train_data.drop("Survived",axis=1).columns)
print(encoder.fit_transform(newtrainx))

@silk forge shouldn't newtrainx be an np array?

silk forge Jul 27, 2020, 11:10 AM

#

it should work with dataframe too

untold aspen Jul 27, 2020, 11:10 AM

#

that's what it is in the ColumnTransformers doc

#

try making it an np array

#

it prob got interpreted as two args in the dataframe

elder vault Jul 27, 2020, 11:32 AM

#

I'm trying to send a simple search request to ES but I'm getting all these errors: https://bpa.st/Z52Q
and each block of errors ends with ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123) what am i doing wrong?
@elder vault anyone?

arctic cliff Jul 27, 2020, 11:55 AM

#

I'm trying to get the name of the city that has the most population

📎 unknown.png

untold aspen Jul 27, 2020, 11:56 AM

#

I'm trying to get the name of the city that has the most population
@arctic cliff try a simple for loop for if population = max then return the name

arctic cliff Jul 27, 2020, 11:57 AM

#

Oh

#

Thanks !

#

df['city'][df['population_total'].argmax()]

#

What if I wanna get the whole row ?

#

@arctic cliff try a simple for loop for if population = max then return the name
@untold aspen Oh

#

This helps too !

untold aspen Jul 27, 2020, 11:59 AM

#

df['city'][df['population_total'].argmax()]
@arctic cliff that's more succinct

arctic cliff Jul 27, 2020, 12:00 PM

#

Can't I specify the whole row instead of city ?

#

I tried to type : but It didn't work :/

#

Oh

#

Got it

#

iloc

broken sparrow Jul 27, 2020, 12:01 PM

#

Can somebody please help me with this? I don't understand the error I'm making.

📎 JPEG_20200727_173131.jpg

marble jasper Jul 27, 2020, 12:02 PM

#

your error is in the line above

#

possibly unmatched brackets

#

I can't see the rest of that line however, but check your brackets are corret at the end of the line

broken sparrow Jul 27, 2020, 12:04 PM

#

Yup it was the brackets 😅
Thank you so much @marble jasper

lapis sequoia Jul 27, 2020, 12:06 PM

#

also pd.Index isnt necessary there

#

just pass the list

broken sparrow Jul 27, 2020, 12:09 PM

#

@lapis sequoia I have zero background in CS and I'm learning online😅
Do I just omit pd.index or give some other command instead?

lapis sequoia Jul 27, 2020, 12:10 PM

#

Just omit and pass the list as argument

broken sparrow Jul 27, 2020, 12:12 PM

#

Thank you!

visual violet Jul 27, 2020, 12:54 PM

#

are you guys data scientist?

lapis sequoia Jul 27, 2020, 1:01 PM

#

I'm a second year undergrad lmao

visual violet Jul 27, 2020, 1:01 PM

#

wow nice

lapis sequoia Jul 27, 2020, 1:03 PM

#

I'm trying to print out the cute_name for the profile

And here is the error:

Exception has occurred: TypeError
list indices must be integers or slices, not str

And here is the code:

@skyblock.command()
    async def profiles(self, ctx, *, uuid: str):
        await ctx.message.delete()
        url = f"https://api.hypixel.net/skyblock/profiles?key={config['hypixelapi']}&uuid={uuid}"
        async with request("GET", url) as response:
            if response.status == 200:
                data = await response.json()
                profiles = data["profiles"]
                cute_name = profiles["cute_name"]

                for profile in cute_name:
                    print(profile)

            elif response.status == 400:
                print(f"API Returned {response.status}\nBad Request 400")
            else:
                print(f"API Returned {response.status} status.")

I feel like an example is needed for this but I'm not sure so I will show you the date I'm trying to get my hands on:
DESCRIPTION: under success you see profiles and that is actually a list not a dict.
https://imgur.com/a/GqlKc9E

Imgur

#

profiles is a list i guess, you're indexing it with a string

#

it's not clear what you're aiming for here but you can try type conversion of profile if you expect it to be a dictionary

frank bone Jul 27, 2020, 1:44 PM

#

anyone familiar with isolation forest? i cant figure out how to get an anomaly score for a single row/colum value instead of the whole dataframe

#

trying to save some computations..

#

on streaming data i only want the score for the current day, not the whole data frame in the past

uncut shadow Jul 27, 2020, 2:06 PM

#

@lapis sequoia like a user above said, profiles is probably a list. Also, you'd probably get better answer if u asked your question in #discord-bots or #networks (both can technically be good, but not data science lol)

earnest wadi Jul 27, 2020, 2:11 PM

#

         [[node sequential/embedding/embedding_lookup (defined at c:/Users/Silv3/OneDrive/Desktop/datasetup/datasetup/hillbilly.py:71) ]] [Op:__inference_train_function_995]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
 sequential/embedding/embedding_lookup/691 (defined at C:\Python38\lib\contextlib.py:113)

Function call stack:
train_function```

not sure whats going on here

untold aspen Jul 27, 2020, 2:19 PM

#

@earnest wadi you probably specified your embedding layer to have not enough input dim to cover the actual input

#

i read that from here

earnest wadi Jul 27, 2020, 2:20 PM

#

alright that what i read online

#

but thats jiberish to me

#

can you simplify it?

#

becuase i dont really ujderstand

untold aspen Jul 27, 2020, 2:20 PM

#

you probably did a sequential model in keras

#

with an embedding layer first right?

earnest wadi Jul 27, 2020, 2:21 PM

#

model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(64, activation=tf.nn.relu))
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(33, activation=tf.nn.sigmoid))

#

thats my model

#

do I need to change any order or values @untold aspen

untold aspen Jul 27, 2020, 2:22 PM

#

right so embedding layer takes on 3 non-default values

#

input dim, output dim, and input length

earnest wadi Jul 27, 2020, 2:22 PM

#

okay

untold aspen Jul 27, 2020, 2:23 PM

#

wait no sry

earnest wadi Jul 27, 2020, 2:23 PM

#

does dim mean dimension, like shape?

#

oh

untold aspen Jul 27, 2020, 2:23 PM

#

input length is None by default

earnest wadi Jul 27, 2020, 2:23 PM

#

okay

untold aspen Jul 27, 2020, 2:23 PM

#

what's vocab_size = ?

#

i want to understand your input data dims as well

earnest wadi Jul 27, 2020, 2:24 PM

#

835

#

as it says in the error

#

the dims are

#

(100, 33) for train data, labels and test data, labels

untold aspen Jul 27, 2020, 2:25 PM

#

so 100 training samples and 33 testing samples?

earnest wadi Jul 27, 2020, 2:25 PM

#

shouldnt be

#

there is 100 samples over all

#

im sure i split them evenly

untold aspen Jul 27, 2020, 2:25 PM

#

then what's that 33

earnest wadi Jul 27, 2020, 2:25 PM

#

¯_(ツ)_/¯

#

I havent mentioned 33 of anything anywhere

#

I honestly

#

have no clue

untold aspen Jul 27, 2020, 2:26 PM

#

(100, 33) for train data, labels and test data, labels
@earnest wadi

earnest wadi Jul 27, 2020, 2:26 PM

#

yeah i know

acoustic halo Jul 27, 2020, 2:26 PM

#

You didn't specify input dimension in the first layer

earnest wadi Jul 27, 2020, 2:26 PM

#

in my code

#

wdym spagoose

acoustic halo Jul 27, 2020, 2:26 PM

#

Fir example this is my first layer in an old model

#

model.add(Embedding(2000, 24, input_length=1000))

#

Where the input is a sequence of 1000 words

earnest wadi Jul 27, 2020, 2:27 PM

#

mine is a sequence of 100 sentances

#

or

#

250

untold aspen Jul 27, 2020, 2:27 PM

#

doesn't python ignore the input length if you don't specify names?

earnest wadi Jul 27, 2020, 2:27 PM

#

letters

untold aspen Jul 27, 2020, 2:27 PM

#

default is none

#

well best practice is to specify names

earnest wadi Jul 27, 2020, 2:28 PM

#

so what do I need to do / change

acoustic halo Jul 27, 2020, 2:29 PM

#

Embedding needs it if you plan on using dense layers on top

earnest wadi Jul 27, 2020, 2:29 PM

#

okay

#

so what value should it be set to?

#

my inoput length?

#

250?

acoustic halo Jul 27, 2020, 2:29 PM

#

yeah

earnest wadi Jul 27, 2020, 2:29 PM

#

okay

#

input_length=250

#

still same issue

acoustic halo Jul 27, 2020, 2:31 PM

#

Have you already converted the text to numerical values?

earnest wadi Jul 27, 2020, 2:31 PM

#

yes

acoustic halo Jul 27, 2020, 2:31 PM

#

You probably have a value that is higher than 835

earnest wadi Jul 27, 2020, 2:31 PM

#

yes i do

acoustic halo Jul 27, 2020, 2:31 PM

#

somewhere in the data

untold aspen Jul 27, 2020, 2:31 PM

#

try that value in the error

earnest wadi Jul 27, 2020, 2:31 PM

#

ooooooooooh

#

i know

untold aspen Jul 27, 2020, 2:32 PM

#

972

earnest wadi Jul 27, 2020, 2:32 PM

#

whats happpened

#

835 is the length of my word index but it reaches up to 2435

acoustic halo Jul 27, 2020, 2:32 PM

#

It is trying to look up 972 but there are only 835 possible values

untold aspen Jul 27, 2020, 2:32 PM

#

ok try that then

earnest wadi Jul 27, 2020, 2:32 PM

#

hang on

#

i got to epoch 10/10 now

#

but still error

arctic wedgeBOT Jul 27, 2020, 2:34 PM

#

Hey @earnest wadi!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

untold aspen Jul 27, 2020, 2:34 PM

#

how many unique vocabs are in your sentences?

earnest wadi Jul 27, 2020, 2:34 PM

#

got it

#

its done

untold aspen Jul 27, 2020, 2:34 PM

#

what did you do

earnest wadi Jul 27, 2020, 2:34 PM

#

it had to be 2346

#

not 2345

untold aspen Jul 27, 2020, 2:35 PM

#

ok prob vocabs then

earnest wadi Jul 27, 2020, 2:37 PM

#

why is my loss so frikin low -6956908.0000

#

i havent optimized the dataset yet, so I know why the rest isnt finished

#

but thats really low

untold aspen Jul 27, 2020, 2:38 PM

#

what loss are you using

earnest wadi Jul 27, 2020, 2:38 PM

#

binary cross

acoustic halo Jul 27, 2020, 2:43 PM

#

How many different labels do you have? It is usually when you have more than 2 labels by mistake

untold aspen Jul 27, 2020, 2:45 PM

#

wait why is binary cross entropy negative

#

something screwed up here

acoustic halo Jul 27, 2020, 2:46 PM

#

Probably because the labels are not 0 or 1

#

It goes weird if you use other values

untold aspen Jul 27, 2020, 2:47 PM

#

oh it's probably cuz of the 1-y term

#

so if it's y=2 it goes neg

#

@earnest wadi you should also do 0 and 1 for next time

earnest wadi Jul 27, 2020, 2:48 PM

#

my labels are 0 and 1

#

here

untold aspen Jul 27, 2020, 2:49 PM

#

ok now that's weird

earnest wadi Jul 27, 2020, 2:49 PM

#

sike nvm, my net just pranked me

#

uh

#

i got this...

[[   4.    5.    6. ...    0.    0.    0.]
 [   4.    5.   14. ...    0.    0.    0.]
 [  22.   23.    5. ...    0.    0.    0.]
 ...
 [1080.   32.    5. ...    0.    0.    0.]
 [  89.    5.   25. ...    0.    0.    0.]
 [ 448.   59.   76. ...    0.    0.    0.]]```

#

it shouldnt be that

#

uh

#

what

untold aspen Jul 27, 2020, 2:51 PM

#

yea i saw ppl said they got big negative loss using BCE

#

it's something with not labelling your target as 0 or 1 for binary classif problem

#

depends on your task

earnest wadi Jul 27, 2020, 2:55 PM

#

my labels are = data

#

have I done this right:

#

train_data, train_labels, test_data, test_labels = ds.import_data("Pickup Lines - Insults")

#

return train_data, train_labels, test_data, test_labels

#

is that correct? because before I return the data, they all work

untold aspen Jul 27, 2020, 2:56 PM

#

what is that function ds.import_data

#

ill assume they're fine

earnest wadi Jul 27, 2020, 2:56 PM

#

nvm found the probme again

#

loads of me being dubass

#

dumbass

autumn veldt Jul 27, 2020, 2:56 PM

#

Excuse me sir, do u know this problem? im tryin to train some dataset with method GLCM + SVM using colab. why i keep getting the same accuracy when i try to run on 5 epoch?

📎 dc3.PNG

earnest wadi Jul 27, 2020, 2:57 PM

#

0 errors now, but this, clearly isnt normal

4/4 [==============================] - 0s 1ms/step - loss: 0.6922 - acc: 0.0000e+00
Epoch 2/10
4/4 [==============================] - 0s 1ms/step - loss: 0.6893 - acc: 0.0000e+00
Epoch 3/10
4/4 [==============================] - 0s 1000us/step - loss: 0.6860 - acc: 0.0000e+00
Epoch 4/10
4/4 [==============================] - 0s 2ms/step - loss: 0.6822 - acc: 0.0000e+00
Epoch 5/10
4/4 [==============================] - 0s 1ms/step - loss: 0.6777 - acc: 0.0000e+00
Epoch 6/10
4/4 [==============================] - 0s 1ms/step - loss: 0.6724 - acc: 0.0000e+00
Epoch 7/10
4/4 [==============================] - 0s 1ms/step - loss: 0.6662 - acc: 0.0000e+00
Epoch 8/10
4/4 [==============================] - 0s 750us/step - loss: 0.6589 - acc: 0.0000e+00
Epoch 9/10
4/4 [==============================] - 0s 1ms/step - loss: 0.6504 - acc: 0.0000e+00
Epoch 10/10
4/4 [==============================] - 0s 750us/step - loss: 0.6405 - acc: 0.0000e+00
4/4 [==============================] - 0s 750us/step - loss: 0.6326 - acc: 0.0000e+00```

#

is it because of the super small dataset?

untold aspen Jul 27, 2020, 2:58 PM

#

100 samples

earnest wadi Jul 27, 2020, 2:58 PM

#

yeah

#

ill get to a bigger one now

#

just was using this to test

untold aspen Jul 27, 2020, 2:58 PM

#

with a fairly deep net

earnest wadi Jul 27, 2020, 2:58 PM

#

shall I reduce the nurons?

#

neurons*

untold aspen Jul 27, 2020, 2:58 PM

#

i think it should be in the thousands

earnest wadi Jul 27, 2020, 2:58 PM

#

yeah

#

alr

acoustic halo Jul 27, 2020, 2:59 PM

#

@autumn veldt You call .fit() every epoch which resets the model

untold aspen Jul 27, 2020, 2:59 PM

#

im more concerned with the 0% accuracy

earnest wadi Jul 27, 2020, 2:59 PM

#

yeah

acoustic halo Jul 27, 2020, 2:59 PM

#

Plus you don't need to run it multiple times in a loop like tha

autumn veldt Jul 27, 2020, 3:00 PM

#

@acoustic halo so what should i do? should i put fit() on top before def run_test?, im just tryin to get some accuracy with 5 epoch

untold aspen Jul 27, 2020, 3:00 PM

#

@earnest wadi try using a bigger dataset for this or remove some dense layers

earnest wadi Jul 27, 2020, 3:00 PM

#

there is only on dl now

#

still 0%

#

ill find a bigger one

#

i really wanted to do a dad joke generate

#

whats the best way to find a huge set od dad jokes

untold aspen Jul 27, 2020, 3:01 PM

#

scrape the web

#

boi

#

bunch of them on everywhere

earnest wadi Jul 27, 2020, 3:01 PM

#

most of the internet searches came wit the same small 100 set

#

alr

untold aspen Jul 27, 2020, 3:01 PM

#

reddit is nice

earnest wadi Jul 27, 2020, 3:01 PM

#

ill just get as many

#

oh yeah

#

untold aspen Jul 27, 2020, 3:01 PM

#

source of dankness

earnest wadi Jul 27, 2020, 3:01 PM

#

haha

untold aspen Jul 27, 2020, 3:02 PM

#

but yea web scrape for them

acoustic halo Jul 27, 2020, 3:02 PM

#

@autumn veldt You don't use epochs at all, the model will iterate until it is finished, the most you can do is specify the max iterations, but that would likely get you a worse accuracy

untold aspen Jul 27, 2020, 3:02 PM

#

@autumn veldt you can specify the epoch in there

#

assume you're using TF

#

or not

acoustic halo Jul 27, 2020, 3:03 PM

#

it's an sklearn model so no

autumn veldt Jul 27, 2020, 3:03 PM

#

yea, im using sklrean not TF

untold aspen Jul 27, 2020, 3:03 PM

#

ok then it will stop

#

once it converges

#

sorry got NN vibes

autumn veldt Jul 27, 2020, 3:05 PM

#

@acoustic halo sorry sir, but where and how can i put that 'max iterations'?

#

when i want to get more than 1 accuracy

acoustic halo Jul 27, 2020, 3:06 PM

#

YOu can't get more than 1 accuracy

#

1 is 100%

#

but its svm.SVC(maxiter=n) I think

#

Also you will get worse accuracy if you use it because it won't run as many, by default it rins infinite times until convergance

autumn veldt Jul 27, 2020, 3:08 PM

#

uhm... do u have any link that i can learn about this one sir?

acoustic halo Jul 27, 2020, 3:08 PM

#

Which bit?

autumn veldt Jul 27, 2020, 3:08 PM

#

im kinda dont understand

desert oar Jul 27, 2020, 3:08 PM

#

Read the scikit learn user guide

autumn veldt Jul 27, 2020, 3:09 PM

#

about testing running on dataset until i can get the most accurate of accuracy score

acoustic halo Jul 27, 2020, 3:10 PM

#

Okay, here's the simple low-down on sklearn models: You don't use epochs like neural nets, they will run as many as needed to get the the best results with the given parameters, so you only need to use fit once

untold aspen Jul 27, 2020, 3:11 PM

#

SVM stops when it gets the best result

autumn veldt Jul 27, 2020, 3:11 PM

#

okay

untold aspen Jul 27, 2020, 3:12 PM

#

so yea in your code remove that for loop

still delta Jul 27, 2020, 3:22 PM

#

Hello guys I am junior in AI and I am seek for a group of people to work with them

earnest wadi Jul 27, 2020, 3:30 PM

#

@untold aspen how can I print a vlue in a for loop every 10 iters?

desert oar Jul 27, 2020, 3:31 PM

#

Dont tag random people who happen to seem knowledgeable. It's intrusive to the person

earnest wadi Jul 27, 2020, 3:31 PM

#

its just we where talking like a min ago

#

ok

desert oar Jul 27, 2020, 3:31 PM

#

Ok i missed that

earnest wadi Jul 27, 2020, 3:31 PM

#

well

#

longer

desert oar Jul 27, 2020, 3:31 PM

#

My apologies

earnest wadi Jul 27, 2020, 3:31 PM

#

lol dw

#

mine too

desert oar Jul 27, 2020, 3:32 PM

#

To answer your question, use the enumerate function

#

Then you can get the iteration number

untold aspen Jul 27, 2020, 3:32 PM

#

print a value for every 10 iteration?

earnest wadi Jul 27, 2020, 3:32 PM

#

yes

untold aspen Jul 27, 2020, 3:32 PM

#

say in a range of 100

earnest wadi Jul 27, 2020, 3:32 PM

#

got that

#


for submission in ye:
    text = submission.title.rstrip('\n') + " " + submission.selftext.rstrip('\n')
    if "\n" not in text:
        f.write(text+"\n")
f.close()

#

but i have an int

untold aspen Jul 27, 2020, 3:33 PM

#

for i in range(100):
(something here)
if i % 10 == 0:
(print smth)

earnest wadi Jul 27, 2020, 3:33 PM

#

ok

#

thx

untold aspen Jul 27, 2020, 3:33 PM

#

yea use the divisibility of index

#

actually it should be range(1,101)

#

this will make a list of [1,2,...,99,100]

earnest wadi Jul 27, 2020, 3:34 PM

#

dont worry

#

i got it workin

untold aspen Jul 27, 2020, 3:34 PM

#

nice

desert oar Jul 27, 2020, 3:35 PM

#

for i, submission in enumerate(ye):
    # do things
    if i % 10 == 0:
        print(i)

earnest wadi Jul 27, 2020, 3:36 PM

#

snokpok, you think 10 000 samples will be enough?

untold aspen Jul 27, 2020, 3:36 PM

#

that's pretty good

earnest wadi Jul 27, 2020, 3:36 PM

#

ill try that

untold aspen Jul 27, 2020, 3:36 PM

#

make sure they're individually fine tho

earnest wadi Jul 27, 2020, 3:36 PM

#

yes I know

#

i have a package for text cleanup

#

that i made spciffially for datasets

#

so i can use that

untold aspen Jul 27, 2020, 3:36 PM

#

but yea that's really enough for training

earnest wadi Jul 27, 2020, 3:37 PM

#

ok good

#

ill get back later

#

oh

#

it only returned 1000 samples

#

from reddit

frank bone Jul 27, 2020, 3:46 PM

#

is it possible to pass variables between 2 functions multiple times?

#

like FunctionA > FunctionB > FunctionA

#

at the end of FunctionB im doing "return variable" but how do i take that up in the 3rd step?

acoustic halo Jul 27, 2020, 3:48 PM

#

value = FunctionB(parameter)

frank bone Jul 27, 2020, 3:48 PM

#

and its a new variable, that gets created in FunctionB, not defined as a FunctionA argument

marsh berry Jul 27, 2020, 3:49 PM

#

Without me having to manually enter it for each line that is

frank bone Jul 27, 2020, 3:51 PM

#

value = FunctionB(parameter)
@acoustic halo thanks a ton, worked like a charm, sory for noob questions 😄

untold aspen Jul 27, 2020, 3:55 PM

#

it only returned 1000 samples
@earnest wadi i think 1000 is manageable if you scale your architecture down

acoustic halo Jul 27, 2020, 3:55 PM

#

@marsh berry Some kind of iterable? You would still have to define a bunch of colours, but you would only have to do it one time and just call the iterable in the future

earnest wadi Jul 27, 2020, 3:56 PM

#

@untold aspen nah jnust gonna leave it running for 24 hours to collect all the new ones as they release

marsh berry Jul 27, 2020, 3:57 PM

#

@acoustic halo Should I just throw it in a list and iterate over it?

#

Is that the best way to do it?

acoustic halo Jul 27, 2020, 3:58 PM

#

I would, I remember doing something similar in the past

marsh berry Jul 27, 2020, 4:03 PM

#

@acoustic halo Ok I will definitely do that then. Seems the easiest. Do you know if its possible to create a gradient of colors?

acoustic halo Jul 27, 2020, 4:05 PM

#

As in the line is a gradient or to select the colours for your list?

marsh berry Jul 27, 2020, 4:06 PM

#

Like this basically

#

📎 5v9yY.png

acoustic halo Jul 27, 2020, 4:07 PM

#

I would assume so but no idea how

quasi pivot Jul 27, 2020, 4:21 PM

#

Hi I'm new to this discord, I was wondering if this was the appropriate area to ask about my code and how I can make variables that one function returns get called by another function that will use those called variables. I can move elsewhere or show my code if that helps. Thanks!

marsh berry Jul 27, 2020, 4:27 PM

#

@quasi pivot Maybe try general

quasi pivot Jul 27, 2020, 4:27 PM

#

Ok, thanks!

frank bone Jul 27, 2020, 4:58 PM

#

what am i doing wrong? It triggers 1 day after the actual trigger day

model.fit(data[[ticker]].loc[start_date:trade_date_minus])
data.loc[start_date:trade_date, 'Scores']=model.decision_function(data[[ticker]].loc[start_date:trade_date])
data.loc[start_date:trade_date, 'Anomaly']=model.predict(data[[ticker]].loc[start_date:trade_date])
anomalyscore = data['Anomaly].loc[trade_date]
scorevalue = data['Scores'].loc[trade_date]
if anomalyscore == -1 and scorevalue < -0.1:
  print("Triggered on", trade_date)```

#

when I change trade_date_minus to trade_date, it triggers on correct day but I don't understand it. Why would the model fitting function influence it? I purposely don't want to model the actual trigger date, as to not increase "scorevalue"

solemn atlas Jul 27, 2020, 5:00 PM

#

Hello I wnt to get started with Machine learning

#

Do I need to have solid grasp on calculus 😅

acoustic halo Jul 27, 2020, 5:02 PM

#

Not really, there are plenty of libraries like sklearn and keras where you just chuck data into a model and watch it work

solemn atlas Jul 27, 2020, 5:03 PM

#

Ooo I wnt to build my career as ai programmer

acoustic halo Jul 27, 2020, 5:03 PM

#

You should probably understand how it works though if you want a career in it though

untold aspen Jul 27, 2020, 5:03 PM

#

@solemn atlas well you need to understand calculus yea

solemn atlas Jul 27, 2020, 5:04 PM

#

Of which lvl

untold aspen Jul 27, 2020, 5:04 PM

#

AI researchers are much about math

solemn atlas Jul 27, 2020, 5:04 PM

#

What all maths area I need to focus on

untold aspen Jul 27, 2020, 5:04 PM

#

for beginning u needs to get to basic multivariate

#

multivariate calc is most of the basic concepts like backprop, loss min

solemn atlas Jul 27, 2020, 5:05 PM

#

Ok I will search these things on yt and try to get solid grasp on these

untold aspen Jul 27, 2020, 5:05 PM

#

gl

solemn atlas Jul 27, 2020, 5:05 PM

#

Ty buddy

frank bone Jul 27, 2020, 5:07 PM

#

when I change trade_date_minus to trade_date, it triggers on correct day but I don't understand it. Why would the model fitting function influence it? I purposely don't want to model the actual trigger date, as to not increase "scorevalue"
@frank bone any clue anyone?

lapis sequoia Jul 27, 2020, 6:00 PM

#

Guys I'm trying to make a data science discord community for things like kaggle kernel discussions and research paper reading

#

If youre interested DM me for link

spark stag Jul 27, 2020, 6:05 PM

#

!rule 6 @lapis sequoia please don't advertise your own server

arctic wedgeBOT Jul 27, 2020, 6:05 PM

#

Rules

6. No spamming or unapproved advertising, including requests for paid work. Open-source projects can be showcased in #show-your-projects.

lapis sequoia Jul 27, 2020, 6:06 PM

#

okay

frank bone Jul 27, 2020, 6:38 PM

#

when I call FunctionB from FunctionA and FunctionB calls FunctionC, and then FunctionC should return a variable but in extreme cases it can't, how would I go about continuing in FunctionA where I left off?

#

is there something like a goto in python?

marble jasper Jul 27, 2020, 6:40 PM

#

look into exceptions

pale thunder Jul 27, 2020, 6:40 PM

#

ye, either an exception or return some special value

frank bone Jul 27, 2020, 6:41 PM

#

maybe its worth noting that all functions are involved in one big iteration

#

right now im doing if .... = False -> break in FunctionC, which leads me back to FunctionB where it throws an error

#

because of missing variable

#

looking into exception

frank bone Jul 27, 2020, 7:02 PM

#

alright got it working by returning some 0 values

marble jasper Jul 27, 2020, 7:07 PM

#

the purpose of an exception is to provide another route back out, letting you exit all the middle functions without having to explicitly handle a return value

#

so in FunctionC, you would:

if ... == False:
  raise SomeException("uh oh")

in FunctionB you don't need to put anything, it'll just skip right through, until it hits FunctionA where you have a try/except block:

try:
  FunctionB(...) # run function B here
except SomeException as err:
  print("you did a naughty")

#

where SomeException is either a valid built-in exception, or one that you made yourself

#

@frank bone

#

the whole point of exceptions is to avoid having to deal with special return values indicating errors

frank bone Jul 27, 2020, 7:10 PM

#

oh i see! thanks for the explanation. never used this before so i wasnt sure how to do this

marble jasper Jul 27, 2020, 7:10 PM

#

it solves exactly your problem, which is why I suggested it

frank bone Jul 27, 2020, 7:11 PM

#

was more familiar with returns

#

still a noob 🙂

modern canyon Jul 27, 2020, 10:15 PM

#

@void anvil https://stackoverflow.com/a/31257931/13993951

Stack Overflow

Pandas error - invalid value encountered

I'm new to Pandas. I downloaded and installed Anaconda. Then I tried running the following code via the Spyder app:

import pandas as pd
import numpy as np

train = pd.read_csv('/Users/Ben/Documents/

marble jasper Jul 27, 2020, 10:56 PM

#

my nan was nice to me, I'm sorry you don't have any

drifting umbra Jul 28, 2020, 1:03 AM

#

wondering if anyone can help decipher this Tensorflow error?

#

I am trying to use TPUs on Google Colab

#

RuntimeError: apply_gradients() cannot be called in cross-replica context. Use tf.distribute.Strategy.experimental_run_v2` to enter replica context.

#

📎 unknown.png

#

i tried basing my code on this tutorial-
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/classification_iris_data_with_keras.ipynb#scrollTo=0tcdTWw1KLiF

Google Colaboratory

#

but i am doing a regression problem, not classification

desert oar Jul 28, 2020, 1:42 AM

#

@void anvil over/under flow

untold aspen Jul 28, 2020, 2:33 AM

#

ill just do it with pandas dataframe, setting dates as index and column 0 at same time, then using tolist() into a variable
@frank bone btw i just rmb that .loc on pandas dataframe does what you're trying to do lol

#

datetimes_df.loc["2017-05-20":"2017-05-30"] works exactly like how you want it, assuming the dates are the rows' names

flat quest Jul 28, 2020, 3:04 AM

#

run mdoel.fit outside the with_strategy block @drifting umbra

You only need it to create the model

severe island Jul 28, 2020, 3:25 AM

#

anyone here who has worked on twitter data stream dataset by the archive team?

#

I have a tar file packed with lots of jsons in many folders

#

how do i read it

slate scroll Jul 28, 2020, 3:32 AM

#

!d tarfile.open

arctic wedgeBOT Jul 28, 2020, 3:32 AM

#

`tarfile.open`

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)```
Return a [`TarFile`](#tarfile.TarFile "tarfile.TarFile") object for the pathname *name*. For detailed information on [`TarFile`](#tarfile.TarFile "tarfile.TarFile") objects and the keyword arguments that are allowed, see [TarFile Objects](#tarfile-objects).

*mode* has to be a string of the form `'filemode[:compression]'`, it defaults to `'r'`. Here is a full list of mode combinations:

mode

action

`'r' or 'r:*'`

Open for reading with transparent compression (recommended).

`'r:'`

Open for reading exclusively without compression.

`'r:gz'`

Open for reading with gzip compression.

`'r:bz2'`

Open for reading with bzip2 compression.

`'r:xz'`

Open for reading with lzma compression.

`'x'` or `'x:'`

Create a tarfile exclusively without compression. Raise an [`FileExistsError`](exceptions.html#FileExistsError "FileExistsError") exception if it already exists.

`'x:gz'`... [read more](https://docs.python.org/3/library/tarfile.html#tarfile.open)

slate scroll Jul 28, 2020, 3:32 AM

#

!d json.load

arctic wedgeBOT Jul 28, 2020, 3:32 AM

#

`json.load`

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)```
Deserialize *fp* (a `.read()`-supporting [text file](../glossary.html#term-text-file) or [binary file](../glossary.html#term-binary-file) containing a JSON document) to a Python object using this [conversion table](#json-to-py-table).

*object\_hook* is an optional function that will be called with the result of any object literal decoded (a [`dict`](stdtypes.html#dict "dict")). The return value of *object\_hook* will be used instead of the [`dict`](stdtypes.html#dict "dict"). This feature can be used to implement custom decoders (e.g. [JSON-RPC](http://www.jsonrpc.org) class hinting).

*object\_pairs\_hook* is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of *object\_pairs\_hook* will be used instead of the [`dict`](stdtypes.html#dict "dict"). This feature can be used to implement custom decoders. If *object\_hook* is also defined, the *object\_pairs\_hook* takes priority.... [read more](https://docs.python.org/3/library/json.html#json.load)

slate scroll Jul 28, 2020, 3:33 AM

#

Those two should get you started in the right direction, although I've never worked with that exact data set.

severe island Jul 28, 2020, 3:34 AM

#

the structure looks like ```
| - 01
| | x.json.bz2
| | y.json.bz2
..
..
| - 02
| |x2.json.bz2
| ...
..
..

#

in the .tar

slate scroll Jul 28, 2020, 3:50 AM

#

Ahh so you'll need bz2 also https://docs.python.org/3/library/bz2.html

#

hard to provide anything more concrete without some example data

drifting umbra Jul 28, 2020, 4:12 AM

#

@flat quest thank you!
when I run model fit I now get error stating

#

Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7f9da99775c0>), which is different from the scope used for the original variable (TPUMirroredVariable:{

#

📎 unknown.png

#

"Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope"

#

i am not restoring a checkpoint

#

just defined this model and loaded csv file from scratch

flat quest Jul 28, 2020, 4:52 AM

#

is build model a keras model subclass? @drifting umbra

drifting umbra Jul 28, 2020, 4:55 AM

#

@flat quest sorry, am noob

#

"Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope"

#

"ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7f77f87d4438>), which is different from the scope used for the original variable "

drifting umbra Jul 28, 2020, 6:07 AM

#

strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
  model = Sequential()
  model.add(Dense((input_length+1), input_dim=input_length, kernel_initializer='normal', activation='relu'))
  model.add(Dense(hidden_layer_neurons, activation='relu'))
  model.add(Dense(hidden_layer_neurons, activation='relu'))
  model.add(Dense(1, activation='linear'))

  adam = tf.keras.optimizers.Adam(
      learning_rate=0.001,
      beta_1=0.9,
      beta_2=0.999,
      epsilon=1e-07,
      amsgrad=False,
      name="Adam"
      )


  model.compile(loss='mse',
                optimizer=adam,
                metrics=['mse'])
  #['mae', 'mse']
  model.summary()

#

"Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope"

#

?

#

any ideas welcome im dying here

lapis sequoia Jul 28, 2020, 6:17 AM

#

Anyone know how to manipulate code to create AI
I mean I know how to retrieve data
But how do I make code which can learn from it

tidal bough Jul 28, 2020, 6:19 AM

#

@lapis sequoia Here's a list of recommended reading I just shamelessly copied from the Artificial Intelligence discord server:

MACHINE LEARNING
Before you start specialising in any particular field, it's important to learn the core theory of Machine Learning for a broad exposure to ideas and techniques that you can likely apply to any field.

Core
• Bishop - Pattern Recognition and Machine Learning

Also check out Model-Based Machine Learning by the same author
• Tibshirani, Friedman, Hastie - The Elements of Statistical Learning
• ColumbiaX on edX - Machine Learning

SPECIALISATIONS
Computer Vision
• Stanford - CS231n: Convolutional Neural Networks for Visual Recognition

Natural Language Processing
• Stanford - CS224n: Natural Language Processing with Deep Learning

Reinforcement Learning
• Sutton, Barto - Reinforcement Learning: An Introduction
• Berkeley - CS285: Deep Reinforcement Learning

#

this course is also awesome as an introduction: https://www.coursera.org/learn/machine-learning

lapis sequoia Jul 28, 2020, 6:20 AM

#

I C thenks

outer fulcrum Jul 28, 2020, 7:36 AM

#

Hey guys, I don't understand how operators are working here

#

This line drops any 'Iris-setosa' rows with a separal width less than 2.5 cm

iris_data = iris_data.loc[(iris_data['class'] != 'Iris-setosa') | (iris_data['sepal_width_cm'] >= 2.5)]

#

Someone can explain me ?

tidal bough Jul 28, 2020, 7:37 AM

#

When you use a logical operator with a Series, you get a series of True/Falses, one for each element.

#

You can then supply it as the index to get the elements at those locations.

#

So you can do, say: data = data[data>5] to drop all values that are <=5

#

In this case, you retain the values that fullfill either of two conditions:

(iris_data['class'] != 'Iris-setosa')
or
(iris_data['sepal_width_cm'] >= 2.5)

#

(| is the bitwise OR operator, which for Series works elementwise instead).

outer fulcrum Jul 28, 2020, 7:39 AM

#

Oh oke Thank you !

#

Its clear now

serene oar Jul 28, 2020, 8:35 AM

#

Hi! I had some trouble with this yesterday, but I didn't get an effective answer.

I'm splitting strings (urls) in a dataframe and then assign each part of the URL path to a different column. Thing is, some urls are shorter, so it produces a lot of "None" values.
How can I avoid that?

new = df["url"].str.split("/", expand = True)

 if str(new[3]):
        df['terto'] = new[3]

```py 
Results in:

Name: terto, dtype: object
0 None
1 None
3 modal
4 index
5 edit-modal?id=2713697


I would just like the None to not be there at all, so when I export to csv I can have clean cells.

uncut shadow Jul 28, 2020, 8:36 AM

#

well, wdym?

tidal bough Jul 28, 2020, 8:37 AM

#

pandas has a function to drop rows with NaN values, if that's what you want.

uncut shadow Jul 28, 2020, 8:37 AM

#

I don't think you can delete those Nones

tidal bough Jul 28, 2020, 8:37 AM

#

dropna or something.

serene oar Jul 28, 2020, 8:37 AM

#

I don't want to drop the whole row though

uncut shadow Jul 28, 2020, 8:37 AM

#

you can only replace them to become NaNs

tidal bough Jul 28, 2020, 8:37 AM

#

what do you want to do with them, then?

serene oar Jul 28, 2020, 8:37 AM

#

Just so that they are blank.

tidal bough Jul 28, 2020, 8:37 AM

#

I mean, a CSV must have a constant number of columns.

serene oar Jul 28, 2020, 8:37 AM

#

Because previous columns for that row have values

tidal bough Jul 28, 2020, 8:38 AM

#

Just so that they are blank.
replace them with ""'s, I suppose

serene oar Jul 28, 2020, 8:39 AM

#

And that's a post assignment thing then?
So I do df['blabla'] = new[3] first
and then change the None values to "" afterward?

#

Because I can imagine cases where maybe it finds a "none" from the url that I wouldn't want to delete.

tidal bough Jul 28, 2020, 8:52 AM

#

If that's the case, your only solution is to change the code that puts those Nones in the first place.

serene oar Jul 28, 2020, 8:58 AM

#

I haven't had any luck finding solutions online yet.

serene oar Jul 28, 2020, 9:23 AM

#

if new[1].empty:
        df['primo'] = ""
    else:
        df['primo'] = new[1]
    if str(new[2]):        
        df['secundo'] = new[2]
    if not str(new[3]):
        df['terto'] = new[3]

All of these versions still put None.

patent ferry Jul 28, 2020, 9:35 AM

#

whats the best way to use a multi-pronged approach to machine-learning, i.e using multiple csvs?

tidal bough Jul 28, 2020, 9:37 AM

#

CSVs?

#

anyway, both PyTorch and Tensorflow support parallelizing the computations.

#

Like with other things, TF allows fine control over what exactly is done where, whereas PyTorch is far easier 🙂

patent ferry Jul 28, 2020, 9:38 AM

#

sorry im noob, i use csvs being used as im googling around

#

ah ok thanks, i will have to do some more googling

tidal bough Jul 28, 2020, 9:40 AM

#

still don't know what you mean by "csvs"

patent ferry Jul 28, 2020, 9:40 AM

#

csv files

tidal bough Jul 28, 2020, 9:41 AM

#

ah, so feeding it data from many sources at the same time? yeah, probably possible in both.

patent ferry Jul 28, 2020, 9:41 AM

#

yeah i just feel like my data is too complicated to put in one csv file, so trying to figure out how to deal with that

thin terrace Jul 28, 2020, 10:05 AM

#

Anyone can recommend a simple and "fun" tabular dataset for binary classification.

I'm intending to use it to illustrate how KNN works for complete AI/ML noobs. So it needs at least 2 easy-to-understand features.

acoustic halo Jul 28, 2020, 10:06 AM

#

imdb positive or negative reviews

#

https://keras.io/api/datasets/imdb/

Keras documentation: IMDB movie review sentiment classification dat...

thin terrace Jul 28, 2020, 10:14 AM

#

I believe that is too complicated. The datapoints should be easy to plot in a graph with pen and paper

#

They should do something like drawing a graph with one feature on the X-axis and one on the Y-axis. Then plot some given datapoints, and then predict a new datapoint given K

acoustic halo Jul 28, 2020, 10:21 AM

#

You probably won't find a dataset with only two features, unless you do something like the titanic survival dataset and just select two of the features yourself

#

like class and age or something

thin terrace Jul 28, 2020, 11:03 AM

#

@acoustic halo yeah I want to select two features just like you said.

signal sluice Jul 28, 2020, 11:06 AM

#

what abt the iris dataset?

acoustic halo Jul 28, 2020, 11:06 AM

#

@thin terrace SHould be fairly easy to do, iirc when I was first starting out age and sex were the best survival predictors

thin terrace Jul 28, 2020, 11:07 AM

#

I feel like sex will be a little boring as it's binary

#

wont give a nice spread in the graph

signal sluice Jul 28, 2020, 11:07 AM

#

https://en.m.wikipedia.org/wiki/Iris_flower_data_set

#

classification of species based on a few fields

thin terrace Jul 28, 2020, 11:07 AM

#

maybe age and sibsp

#

yeah iris is just so boring

uncut shadow Jul 28, 2020, 11:24 AM

#

I feel like sex will be a little boring as it's binary
@thin terrace I had to stop for a minute and think what u have said, but then I saw u r talking bout dataset 😂

thin terrace Jul 28, 2020, 11:26 AM

#

😁

serene oar Jul 28, 2020, 1:17 PM

#

Hi again.
Is there a way to get a count of each unique string in a series object?
I have an issue where a dataset is so big that the series breaks the cell of csv. So I want to get the amount of occurrences of each unique string in the column.

I can't figure how. Is group by the way I should be leaning towards?
Sample of the column df['terto'], there are more columns as such.

Index          "terto"
0               [['1231', '12312', '32123', '31231'...
1               [['123', '554543', '463', '2342'...

desert oar Jul 28, 2020, 1:18 PM

#

@serene oar it looks like the valeues of 'terto' are nested lists, or numpy arrays, containing strings. si that right?

#

anyway you can use .value_counts() for that

#

df['terto'].value_counts()

serene oar Jul 28, 2020, 1:21 PM

#

That seems to be correct.
If I do

for a in df['terto']:
  print(a)

I get every print starting with

["['14142',...

#

I guess it's the result of my group by actions

#

Ah, but I think here I need to do something like

for a in df['terto']:
  print(a.value_counts())

Because I want it per row, as each row has a lot of different values in it.
But I get an error:
AttributeError: 'list' object has no attribute 'value_counts'

desert oar Jul 28, 2020, 1:26 PM

#

you did something very weird

#

each element of df['terto'] is one row

#

you must have done some operation that led to having a list of lists in each row

#

instead of data

#

can you show the code you used to produce this data

serene oar Jul 28, 2020, 1:31 PM

#

I have a column of URL's

    new = df["url"].str.split("/", expand = True)

then

    function_dict = {"primo": list, "secundo": list, "terto": list, "doce": list, "cinco": list, "seis": list, "siete": list}
    gdf = df.groupby('eventId').aggregate(function_dict)

then I write it to CSV and re read it from another place. Then merge two csv's as one has additional details

df = pd.merge(left=df, right=cname, how='left', left_on='eventId', right_on='eventId')

and I regroup them based on the new values I added to the dataframe in the merge

function_dict = {"primo": list, "secundo": list, "terto": list, "doce": list, "cinco": list, "seis": list, "siete": list, "eventId": list}
gdf = df.groupby('name').aggregate(function_dict)

#

Forgot this, after the first time I create the "new":

    df['primo'] = new[1]  
    df['secundo'] = new[2]
    df['terto'] = new[3]
    df['doce'] = new[4]
    df['cinco'] = new[5]
    df['seis'] = new[6]
    df['siete'] = new[7]

desert oar Jul 28, 2020, 1:34 PM

#

why would you do this? gdf = df.groupby('name').aggregate(function_dict)

#

what is the purpose?

serene oar Jul 28, 2020, 1:35 PM

#

I had a list of url's that correspond to an "event" and this event has a parent that isn't mentioned in the first dataframe. So after I had grouped the url paths to each event, I wanted to add the parent to these events and group them per parent.
The end goal is to group per parent

#

parent is 'name'

desert oar Jul 28, 2020, 1:36 PM

#

before you write to csv you should use json.dumps on the columns that contain list data

#

where do you use new?

#

function_dict = {"primo": list, "secundo": list, "terto": list, "doce": list, "cinco": list, "seis": list, "siete": list}
gdf = df.groupby('eventId').aggregate(function_dict)

# Convert lists -> JSON
list_cols = list(function_dict.keys())
gdf[list_cols] = gdf[list_cols].applymap(json.dumps)

# Use tab as delimiter to avoid confusion with JSON data
gdf.to_csv('gdata.tsv', sep='\t')

#

gdf = pd.read_csv('gdata.tsv', sep='\t')
gdf[list_cols] = gdf[list_cols].applymap(json.loads)

serene oar Jul 28, 2020, 1:38 PM

#

new is made by splitting the url path's by "/"
And is used for assigning values to other columns.

desert oar Jul 28, 2020, 1:39 PM

#

i see

#

i recommend using Parquet format

#

not CSV