#data-science-and-ml | Python | Page 287

misty flint Feb 20, 2021, 9:50 PM

#

ID_BoomKek

iron basalt Feb 20, 2021, 9:50 PM

#

I don't understand how that is an issue, also it's still sorted after cutting out the non-us rows.

analog pike Feb 20, 2021, 9:51 PM

#

all i know is that python is disagreeing with me trying to edit values on index, which is a copy of ufos

iron basalt Feb 20, 2021, 9:52 PM

#

what does ufos look like

#

with row and column labels

analog pike Feb 20, 2021, 9:53 PM

#

https://www.kaggle.com/NUFORC/ufo-sightings?select=complete.csv here

UFO Sightings

Reports of unidentified flying object reports in the last century

coral walrus Feb 20, 2021, 9:58 PM

#

anyone knows how I can format multiple datetime columns in pandas?

```py
df['WPCBDATO'] = df['WPCBDATO'].dt.strftime('%d-%m-%Y')
df['WPCIDATO'] = df['WPCIDATO'].dt.strftime('%d-%m-%Y')
df['WPCPLSLUT'] = df['WPCPLSLUT'].dt.strftime('%d-%m-%Y')
```
already tried df[[]]

analog pike Feb 20, 2021, 9:59 PM

#

pd.to_datetime i'm pretty sure

coral walrus Feb 20, 2021, 9:59 PM

#

gonna try again, pretty sure it's not working

#

seems odd if I can't format multiple columns at once

#

```py
df[['WPCBDATO', 'WPCIDATO', 'WPCPLSLUT']] = pd.to_datetime(df[['WPCBDATO', 'WPCIDATO', 'WPCPLSLUT']], format='%d-%m-%Y')
```
returns ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

#

.apply(pd.to_datetime, format='') did the trick it seems

analog pike Feb 20, 2021, 10:03 PM

#

hm

#

never heard of that

coral walrus Feb 20, 2021, 10:03 PM

#

now you have Wowee

analog pike Feb 20, 2021, 10:03 PM

#

yup

coral walrus Feb 20, 2021, 10:04 PM

#

but it ignores the format

#

strange
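A note on the two surprises above, as a minimal sketch with made-up sample values. When `pd.to_datetime` is handed a whole DataFrame it tries to assemble dates from columns named year, month and day, which is where the ValueError comes from; applying it column by column avoids that. The `format` argument only describes how to parse the input strings. A datetime column has no display format of its own, so `strftime` is what produces formatted strings.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the messages above.
df = pd.DataFrame({
    "WPCBDATO": ["20-02-2021", "21-02-2021"],
    "WPCIDATO": ["01-03-2021", "02-03-2021"],
    "WPCPLSLUT": ["15-04-2021", "16-04-2021"],
})
date_cols = ["WPCBDATO", "WPCIDATO", "WPCPLSLUT"]

# Parse each column separately; format= says how the strings are laid out.
parsed = df[date_cols].apply(pd.to_datetime, format="%d-%m-%Y")

# Turn the datetime columns back into strings in the layout wanted for display.
formatted = parsed.apply(lambda col: col.dt.strftime("%d-%m-%Y"))
```

`parsed` holds real datetime values (useful for sorting and arithmetic), while `formatted` holds plain strings.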

iron basalt Feb 20, 2021, 10:08 PM

#

column = ufos.loc[ufos['country'] == 'us', 'datetime']
for i, val in enumerate(column):
  column[i] = val.split()[1]

#

@analog pike

analog pike Feb 20, 2021, 10:10 PM

#

Well there's no more error

#

but it's giving me the half with the time not the date

#

okay nevermind

#

i switched the 1 to 0

iron basalt Feb 20, 2021, 10:11 PM

#

yea the format on https://www.kaggle.com/NUFORC/ufo-sightings?select=scrubbed.csv only has two parts

#

date and time

analog pike Feb 20, 2021, 10:11 PM

#

wdym

iron basalt Feb 20, 2021, 10:11 PM

#

example:

#

10/10/1949 20:30

analog pike Feb 20, 2021, 10:12 PM

#

oh yeah

#

only one space

iron basalt Feb 20, 2021, 10:12 PM

#

split()[0] is the first part which is what you want

analog pike Feb 20, 2021, 10:13 PM

#

alright thanks so much

iron basalt Feb 20, 2021, 10:13 PM

#

(side note: split without any arguments passed to it splits by all whitespace, so it works even with multiple spaces in-between, or new lines or tabs)
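The loop above can also be written without iterating, as a rough sketch on a tiny made-up frame (only the `datetime` and `country` column names come from the conversation). Assigning through `.loc` with a mask writes to `ufos` itself instead of to a copy.

```python
import pandas as pd

# Hypothetical rows in the "date time" layout mentioned above.
ufos = pd.DataFrame({
    "datetime": ["10/10/1949 20:30", "10/10/1955 17:00", "10/10/1956 21:00"],
    "country": ["us", "gb", "us"],
})

is_us = ufos["country"] == "us"

# .str.split() splits on any whitespace; .str[0] keeps the date part.
ufos.loc[is_us, "datetime"] = ufos.loc[is_us, "datetime"].str.split().str[0]
```

Rows that do not match the mask are left untouched.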

analog pike Feb 20, 2021, 10:15 PM

#

nice

grave frost Feb 20, 2021, 10:24 PM

#

Quick question - I am doing Hyperparameter search for my classification model. Is it reasonable to expect that if a model attains higher accuracy (or lower loss) in 5 epochs against another iteration of parameters then it would also attain higher accuracy after full training (say, like 15 epochs)?

cerulean spindle Feb 20, 2021, 10:28 PM

#

I'm pretty sure that if you put too many epochs, the model will overfit and the loss will go up.

#

so you'll have to have a balance

velvet thorn Feb 20, 2021, 10:29 PM

#

grave frost Quick question - I am doing Hyperparameter search for my classification model. I...

not necessarily

grave frost Feb 20, 2021, 10:29 PM

#

I am fine-tuning

velvet thorn Feb 20, 2021, 10:29 PM

#

example

#

let’s say the hyperparameter is learning rate

#

and weight initialisation is the same

#

you might get near a local minimum faster, but you might also be less likely to reach it exactly

#

with a higher learning rate
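A toy illustration of that point, assuming plain gradient descent on the one-dimensional loss |x| (a made-up example, not the model being discussed): the larger step size is ahead after 5 steps but ends up bouncing around the minimum, while the smaller one is behind early and closer later.

```python
def final_loss(lr, steps, x=1.0):
    """Run gradient descent on loss(x) = |x| and return the loss at the end."""
    for _ in range(steps):
        grad = 1.0 if x > 0 else -1.0  # derivative of |x| away from zero
        x -= lr * grad
    return abs(x)

high_lr, low_lr = 0.3, 0.07  # hypothetical learning rates

early = (final_loss(high_lr, 5), final_loss(low_lr, 5))
late = (final_loss(high_lr, 15), final_loss(low_lr, 15))
```

Here `early` favours the high learning rate and `late` favours the low one, so a ranking taken at 5 steps does not have to hold at 15.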

grave frost Feb 20, 2021, 10:30 PM

#

Aha.. good point

#

Looks like I'm gonna have to keep it for like 20 hours

#

Thanx a lot @velvet thorn 🚀

velvet thorn Feb 20, 2021, 10:32 PM

#

grave frost Thanx a lot @velvet thorn 🚀

yw 👋

astral path Feb 20, 2021, 11:43 PM

#

If I have a column in my dataset which contains short string descriptions using keywords, how could I include that in a heatmap/correlogram to show relationships between the keyword and other variables? e.g. I could use this to find that, for example, descriptions that contain the word "red" and "dress" have a smaller value in a column called stock than a description that includes "green" and "bag"
example of data

velvet thorn Feb 20, 2021, 11:44 PM

#

astral path If I have a column in my dataset which contains short string descriptions using ...

create columns representing the presence of words

#

i.e. one-hot encoding
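A minimal sketch of that idea, assuming a `description` column of space-separated keywords and a numeric `stock` column (both names and all values are made up for illustration): each word becomes a 0/1 column, which can then be correlated with the other variables or fed to a heatmap.

```python
import pandas as pd

# Hypothetical data in the spirit of the question.
df = pd.DataFrame({
    "description": ["red dress", "green bag", "red bag", "green dress"],
    "stock": [2, 10, 7, 3],
})

# One indicator column per word: 1 if the word appears in the description.
words = df["description"].str.lower().str.get_dummies(sep=" ")

# Correlation of each word indicator with stock.
corr_with_stock = pd.concat([words, df["stock"]], axis=1).corr()["stock"].drop("stock")
```

The full matrix from `.corr()` is what a heatmap function would take; with many distinct words it may be worth keeping only the most frequent ones.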

steady horizon Feb 21, 2021, 12:49 AM

#

how can i find a degree of similarity between two topics obtained with lda in different documents?

digital crescent Feb 21, 2021, 1:15 AM

#

I have a dataframe called data with two columns.

print(data.dtypes) yields:
data_in_datetimeformat datetime64[ns]
data_in_float64_format float64
dtype: object

What does "dtype: object" mean? It isn't one of my columns as far as I know.

simple flume Feb 21, 2021, 1:16 AM

#

any data type in python is object based

#

its considered as an object from a class

#

do u get me ?

digital crescent Feb 21, 2021, 1:18 AM

#

I think that makes sense, yeah.

#

Why does it list that though?

simple flume Feb 21, 2021, 1:20 AM

#

object as well but you can tell python that i want list of objects

#

which is a list

digital crescent Feb 21, 2021, 1:20 AM

#

Is it listing each dataframe's columns' data types and then also saying that "dtype" itself is an object?

simple flume Feb 21, 2021, 1:22 AM

#

yes data frame columns are objects as well i think you can define them as series and give it to them or a list

#

from panda package

digital crescent Feb 21, 2021, 1:24 AM

#

I mean, I don't understand why "dtype: object" is part of the output of print(data.dtypes)

simple flume Feb 21, 2021, 1:24 AM

#

for example dataframe is a class or bigger object which contains smaller objects which are columns

digital crescent Feb 21, 2021, 1:24 AM

#

That makes sense, yeah

simple flume Feb 21, 2021, 1:28 AM

#

when python for example tells us something data type is list its origin is a list [] the brackets for example define that you tell the compiler i will have just a number of objects

#

thats why the list could have different data types

#

its not like array in c++

digital crescent Feb 21, 2021, 1:29 AM

#

I think I understand what you are saying

#

So if a pandas array is composed of columns all themselves composed of the same datatype, pandas.dtypes might return something other than "dtype: object"?

#

Never mind. The output of the function seems like something I don't really need to understand right now

#

The important part to me was being able to identify the datatype being used for each column

plain jungle Feb 21, 2021, 2:12 AM

#

python has a better list than any other language imo because it allows for multiple datatypes

misty flint Feb 21, 2021, 2:46 AM

#

astral path If I have a column in my dataset which contains short string descriptions using ...

like gm said one-hot encoding or other recoding methods

#

scikit learn has the OrdinalEncoder function too. thats the one i used recently

velvet thorn Feb 21, 2021, 3:15 AM

#

@digital crescent no

#

because that’s a series

#

with string values

#

representing the data types

#

and strings are objects

#

(although that seems like it’s a bit outdated because there is a specific string dtype now)

velvet thorn Feb 21, 2021, 3:16 AM

#

plain jungle python has the best list than any other language imo because they allow for mult...

it’s because Python is dynamically typed

#

that has its own drawbacks.

#

it’s not necessarily better

velvet thorn Feb 21, 2021, 3:17 AM

#

simple flume any data type in python is object based

this is true but not the point

velvet thorn Feb 21, 2021, 3:17 AM

#

velvet thorn with string values

actually the contents could be type objects and not strings

#

which would explain it

#

you can check
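Checking it could look like the sketch below, on a small made-up frame with the same two column types. `data.dtypes` is itself a Series, the trailing `dtype: object` line describes that Series, and its elements here are NumPy dtype objects rather than strings.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one datetime column and one float column.
data = pd.DataFrame({
    "data_in_datetimeformat": pd.to_datetime(["2021-02-20", "2021-02-21"]),
    "data_in_float64_format": [1.5, 2.5],
})

dtypes = data.dtypes                                # a Series indexed by column name
series_dtype = dtypes.dtype                         # dtype of that Series itself
element = dtypes["data_in_float64_format"]          # one entry of the Series
element_is_numpy_dtype = isinstance(element, np.dtype)
```

So the last line of the printed output is about the container of dtypes, not about a column of the original frame.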

digital crescent Feb 21, 2021, 3:18 AM

#

velvet thorn @digital crescent no

Gotcha. So a pandas dataframe with 2 data columns essentially has an index column, the 2 data columns, a series with values representing the 2 data columns' data types, and everything else a dataframe object would have, right?

velvet thorn Feb 21, 2021, 3:19 AM

#

digital crescent Gotcha. So a pandas dataframe with 2 data columns essentially has an index colum...

well, the dtypes series is dynamically generated, I THINK?

digital crescent Feb 21, 2021, 3:20 AM

#

I'm not sure. Either way "series/dynamic generator" 🙂

molten bluff Feb 21, 2021, 4:00 AM

#

guys, I have this tweets data set i am trying to clean. I want to remove all words that begin with @ from a text column and then drop the rows that have the same texts after the above process. I have the following code but it isn't working

df['clean_text']=df['text'].str.replace('(@\w+.*?)',"")
df = df.drop_duplicates(subset = ['clean_text', 'username'])

can someone help me? Thanks!

misty flint Feb 21, 2021, 4:02 AM

#

sounds like a regex problem

#

pithink

molten bluff Feb 21, 2021, 4:04 AM

#

misty flint sounds like a regex problem

any suggestions on how to remove words that begin with @ or how to modify the above regex? I tried printing the dataframe to the console and it appears to have removed the @sub_strings from the texts but the data frame isn't dropping the duplicates

misty flint Feb 21, 2021, 4:07 AM

#

regexes are above my paygrade, sorry

molten bluff Feb 21, 2021, 4:07 AM

#

molten bluff guys, I have this tweets data set i am trying to clean. I want to remove all wor...

#

regex appears to be working:

harsh timber Feb 21, 2021, 4:10 AM

#

Hak's regex is working. I think the subset is the problem. Specifically, using the username column which I don't actually see... Probably removing "username" would work?

#

No prob. If it's an issue, just delete that post and let me know if the usernames are the same. I.e. for privacy reasons, you should prob delete that post : )

molten bluff Feb 21, 2021, 4:14 AM

#

harsh timber No prob. If it's an issue, just delete that post and let me know if the username...

the usernames are the same. I want to delete rows with the same usernames and the same clean_text values

quiet locust Feb 21, 2021, 4:14 AM

#

Hi guys I’m trying to figure out a good project to do for my data science portfolio

#

Having a hard time coming up with a research question

harsh timber Feb 21, 2021, 4:15 AM

#

molten bluff the usernames are the same. I want to delete rows with the same usernames and th...

~~Maybe try adding inplace=True in drop_duplicates?~~ Scratch that. You're returning df

molten bluff Feb 21, 2021, 4:15 AM

#

harsh timber ~~Maybe try adding `inplace=True` in `drop_duplicates`?~~ Scratch that. You're r...

yes

harsh timber Feb 21, 2021, 4:21 AM

#

molten bluff yes

Herm I'm not entirely sure. I would just try to take those two records and compare each string/list using == operator to double check. And then try again. If they are truly equal in Python, then try doing that inplace argument I suppose. Not really my field of expertise since I had my own question to ask, but hopefully testing out a bunch of edge cases should help resolve your issue.

molten bluff Feb 21, 2021, 4:23 AM

#

harsh timber Herm I'm not entirely sure. I would just try to take those two records and compa...

ok thanks! I will try it out. I tried inplace, but that didn't work. Probably will need to compare each string and check what is happening

flint mason Feb 21, 2021, 4:39 AM

#

```py
for i in range(100):
  loss = mse(model(inputs),targets)
  loss.backward()
  with torch.no_grad():
    weights -=weights*1e5
    bias -= bias*1e5
    weights.grad.zero_()
    bias.grad.zero_()
```

#

Can someone have a look is the weights and bias used properly
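One thing that stands out in the loop above: the update subtracts the parameters themselves times `1e5`, whereas gradient descent subtracts the gradient times a small learning rate (so `1e5` was presumably meant to be `1e-5`). A minimal sketch of the usual pattern follows, with made-up data and hypothetical `model` and `mse` helpers, since the originals were not posted.

```python
import torch

torch.manual_seed(0)

# Hypothetical linear-regression data standing in for the unseen inputs/targets.
inputs = torch.randn(20, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.3

weights = torch.zeros(1, 3, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

def model(x):
    return x @ weights.t() + bias

def mse(pred, target):
    return ((pred - target) ** 2).mean()

lr = 1e-2  # assumed learning rate; 1e-5 also works but converges far more slowly
initial_loss = mse(model(inputs), targets).item()

for _ in range(100):
    loss = mse(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        weights -= weights.grad * lr   # step against the gradient
        bias -= bias.grad * lr
        weights.grad.zero_()           # clear gradients for the next iteration
        bias.grad.zero_()

final_loss = mse(model(inputs), targets).item()
```

Resetting the gradients inside `torch.no_grad()` as in the original is fine; the part that changes is what gets subtracted.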

misty flint Feb 21, 2021, 4:41 AM

#

quiet locust Having a hard time coming up with a research questions

do something in an industry that interests you

#

if you dont like it, you wont finish the project

quiet locust Feb 21, 2021, 4:46 AM

#

Hmmm okay Rex thanks!

#

Do you mind if I send it in here as I go along?

misty flint Feb 21, 2021, 5:09 AM

#

sure why not

velvet thorn Feb 21, 2021, 5:22 AM

#

molten bluff the usernames are the same. I want to delete rows with the same usernames and th...

is that an AND

#

or an OR

molten bluff Feb 21, 2021, 5:25 AM

#

velvet thorn is that an AND

AND

velvet thorn Feb 21, 2021, 5:32 AM

#

molten bluff AND

my best guess is that you have spaces

molten bluff Feb 21, 2021, 6:04 AM

#

velvet thorn my best guess is that you have spaces

yeah, i tried updating the regex to (@\w+\s*) so spaces following the @substrings are stripped. But that didn't drop the duplicates as well.

#

I figured out that the strings weren't unique because there were exactly the same texts with different @s, hashtags and urls for some reason. So I had to remove all of them and then the drop duplicates worked

#

some user spammed the exact same tweet multiple times with different @s, hashtags and urls on different days and was messing up my eda
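The fix described above might look roughly like this, assuming `text` and `username` columns as in the earlier snippet (the sample tweets and the exact patterns are made up). Mentions, hashtags and URLs are removed, leftover whitespace is collapsed, and only then are duplicates dropped. `regex=True` is passed explicitly because the default of `str.replace` differs between pandas versions.

```python
import pandas as pd

# Hypothetical tweets: the first two differ only in mention, hashtag and URL.
df = pd.DataFrame({
    "username": ["spammer", "spammer", "someone"],
    "text": [
        "@alice check this out #deal https://example.com/a",
        "@bob  check this out #sale https://example.com/b",
        "check this out",
    ],
})

df["clean_text"] = (
    df["text"]
    .str.replace(r"https?://\S+", "", regex=True)  # URLs
    .str.replace(r"[@#]\w+", "", regex=True)       # mentions and hashtags
    .str.replace(r"\s+", " ", regex=True)          # collapse runs of whitespace
    .str.strip()                                   # trim leading/trailing spaces
)

# A row is a duplicate only if BOTH clean_text AND username match.
deduped = df.drop_duplicates(subset=["clean_text", "username"])
```

Without the whitespace steps, strings that look identical on screen can still differ by a stray space and survive `drop_duplicates`.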

misty flint Feb 21, 2021, 6:33 AM

#

sounds like a bot

#

ID_BoomKek

#

are you doing some sentiment analysis or something?

#

i need to do a project involving twitter api sometime

molten bluff Feb 21, 2021, 6:34 AM

#

I swear to god the guy was a real person, i personally verified it

misty flint Feb 21, 2021, 6:34 AM

#

to better understand it

#

pithink

#

how many calls are you limited to daily?

#

100?

dusty pasture Feb 21, 2021, 6:47 AM

#

Hi

#

Pls verify me I accidentally left

#

Before

molten bluff Feb 21, 2021, 6:51 AM

#

misty flint how many calls are you limited to daily?

I used twint and other scrapers to collect data.

astral path Feb 21, 2021, 7:03 AM

#

i'm using a seaborn distplot to visualize the distribution of one of my columns, but it's extremely distorted because there's some variables (which I'm not sure I want to leave out) that are extremely far away in value from the others

#

it looks like this now

#

there's 301 of these outlier values with a mean of ~10220 and an std of 5544, so idk if i should remove them

#

what do you think I should do?

#

same thing happens with another column

#

it seems like i just have some datapoints which are outliers in all variables

#

i've tried using sb.distplot(plot_df[plot_df['stock'] < 100]) to get all items under 100 (as a test case) and it changed nothing at all...
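One possible reason the filter seemed to change nothing: `plot_df[plot_df['stock'] < 100]` is still a whole DataFrame, so every other column (with its own extreme values) is passed along too. A small sketch with made-up numbers that selects just the one column after filtering:

```python
import pandas as pd

# Hypothetical data; only the 'stock' column name comes from the message above.
plot_df = pd.DataFrame({
    "stock": [3, 12, 40, 75, 9800, 15000],
    "price": [5000, 20, 7000, 15, 30, 25],
})

# Filter the rows AND pick the single column, which gives a Series.
stock_under_100 = plot_df.loc[plot_df["stock"] < 100, "stock"]
```

`stock_under_100` is then what would go into the plotting call in place of the filtered frame. Plotting the column on a log scale is another common way to keep the large values visible without letting them flatten everything else.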

misty flint Feb 21, 2021, 7:25 AM

#

molten bluff I used twint and other scrapers to collect data.

oh smart

#

i would want to work with twitter api only so that i can say ive worked with it

#

but in practicality its annoying

#

lol

misty flint Feb 21, 2021, 7:26 AM

#

astral path there's 301 of these outlier values with a mean of ~10220 and an std of 5544, so...

big yikes

#

monkaCHRIST

#

maybe split the dataset?

#

if its that many outliers in that region, seems like a dif. subcategory

astral path Feb 21, 2021, 7:28 AM

#

i guess i could try that

misty flint Feb 21, 2021, 7:28 AM

#

honestly idk what youre supposed to do in that case

#

thats just what i would do

#

lol

quiet locust Feb 21, 2021, 7:56 AM

#

This is a really simple question

#

But what’s the best method for making an api call

#

And why is it necessary?

astral path Feb 21, 2021, 7:58 AM

#

i think i'll just ask my TA

random thicket Feb 21, 2021, 9:23 AM

#

/python

slender radish Feb 21, 2021, 11:22 AM

#

hey everyone

#

i have a question about data types

#

Screen_Shot_2021-02-21_at_6.24.07_AM.png

#

so i have this dataset and height and weight are all integers, are these two attributes continuous data or discrete data?

#

i feel like height and weight should be continuous data, but in this case where all the entries are integers, are they discrete?

dawn turtle Feb 21, 2021, 11:58 AM

#

they are continuous

#

even if the measurement is granular

idle sail Feb 21, 2021, 1:17 PM

#

hey i'm new to the server can anyone link me some resources to get started with data science? (like some projects or bootcamp idk).
I have a basic knowledge of python, but i don't know where to find the resources to learn more... thank you in advance 🙂

grave frost Feb 21, 2021, 2:36 PM

#

I think there is a way to trigger a bot to list all the resources for DS

slender radish Feb 21, 2021, 2:41 PM

#

@dawn turtle what about an attribute that only takes 0 or 1. 1 being true and 0 being false. Is this categorical data or numerical (discrete) data?

dawn turtle Feb 21, 2021, 3:49 PM

#

If they represent true and false it's categorical. It's about the thing that the data is representing, not how it is measured

#

@slender radish

slender radish Feb 21, 2021, 3:53 PM

#

@dawn turtle okie thanks so much!

misty flint Feb 21, 2021, 3:58 PM

#

grave frost I think there is a way to trigger a bot to list all the resources for DS

oh? how? i dont think ive seen that before

idle sail Feb 21, 2021, 4:04 PM

#

grave frost I think there is a way to trigger a bot to list all the resources for DS

really? what is it?

lapis sequoia Feb 21, 2021, 4:06 PM

#

quiet locust But what’s the best method for making an api call

The requests library is a great and simple tool for API calls

arctic wedgeBOT Feb 21, 2021, 6:42 PM

#

Hey @sharp pumice!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

lapis sequoia Feb 21, 2021, 8:01 PM

#

Hi, hopefully this question isn't too taboo. I work with R a lot and am trying to relearn Python as more and more jobs are using Python (plus a personal interest in it). I've been learning how to use Pandas, matplotlib, numpy etc. Does anyone have any good resources/suggestions on how to get the hang of Python's syntax coming from R?

heady path Feb 21, 2021, 8:07 PM

#

Anyone interested in chatting approaches to text classification? I am not familiar with data science or python really, but I was a software developer for a few years and have a degree in computer engineering, so I'm sure I can keep up with a convo. I have a specific use case I'm trying to determine whether to go third party, hire a junior developer to work to build something or build it myself

#

Anyway, DM me if anyone is interested

grave frost Feb 21, 2021, 9:16 PM

#

heady path Anyone interested in chatting approaches to text classification? I am not famili...

There are still some basics of ML that are more mathematical than what you see in normal CS, but I still think you would be able to make a decent model on your own

#

we are here to help in case of any problem 🙂 but if you want, freelance is always there

little compass Feb 21, 2021, 9:56 PM

#

The Embedding module of PyTorch is very simple and extremely powerful at the same time. If you are interested in how to deal with representing tokens of a language or encoding categorical variables do not miss this video:)

https://youtu.be/euwN5DHfLEo

YouTube

mildlyoverfitted

torch.nn.Embedding explained (+ Character-level language model)

In this video, I will talk about the Embedding module of PyTorch. It has a lot of applications in the Natural language processing field and also when working with categorical variables. I will explain some of its functionalities like the padding index and maximum norm. In the second part of this video I will use the Embedding module to represent...

quiet locust Feb 21, 2021, 9:57 PM

#

would anyone be able to hop on a zoom call with me?

#

I'm having trouble making a request for an api

astral path Feb 21, 2021, 10:09 PM

#

would binary yes/no variables be considered categorical?

#

e.g. a variable called inStock holds a value 1 if the item is in stock, 0 if not

earnest falcon Feb 21, 2021, 10:15 PM

#

Hello so basically i wanna visualise a cumulative data of a specific country using panda but no matter what i cant get it to work but im able to get cumulative data of every country this is my data set https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv

GitHub

CSSEGISandData/COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE - CSSEGISandData/COVID-19
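A rough sketch for the single-country case, assuming the linked CSV has one row per province or country, four leading columns (`Province/State`, `Country/Region`, `Lat`, `Long`) and then one already-cumulative column per date. The tiny frame below is made up to mirror that layout.

```python
import pandas as pd

# Hypothetical excerpt shaped like the time-series file.
deaths = pd.DataFrame({
    "Province/State": [None, "Hubei", "Beijing"],
    "Country/Region": ["Italy", "China", "China"],
    "Lat": [41.9, 30.9, 40.2],
    "Long": [12.6, 112.3, 116.4],
    "1/22/20": [0, 17, 0],
    "1/23/20": [0, 17, 0],
    "1/24/20": [0, 24, 0],
})

def country_series(df, country):
    """Sum all rows of one country into a single date-indexed Series."""
    rows = df[df["Country/Region"] == country]
    totals = rows.drop(columns=["Province/State", "Country/Region", "Lat", "Long"]).sum()
    totals.index = pd.to_datetime(totals.index, format="%m/%d/%y")
    return totals

china = country_series(deaths, "China")
```

Calling `china.plot()` would then draw the cumulative curve for that one country; summing first matters for countries that are split across several province rows.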

quaint bloom Feb 21, 2021, 10:53 PM

#

this is probably a dumb question, but how is this 4 dimensions? and what would the shape even be?

velvet thorn Feb 21, 2021, 11:08 PM

#

quaint bloom this is probably a dumb question, but how is this 4 dimensions? and what would ...

in this case, “dimension” means “axis length”

quaint bloom Feb 21, 2021, 11:08 PM

#

ohhhhhhhh

#

i was so confused lmao

velvet thorn Feb 21, 2021, 11:08 PM

#

okay so you will notice

#

ML programmers

iron basalt Feb 21, 2021, 11:08 PM

#

quaint bloom this is probably a dumb question, but how is this 4 dimensions? and what would ...

There are the dimensions of the shape, and then there is dimension as it refers to the total number of components.

velvet thorn Feb 21, 2021, 11:09 PM

#

velvet thorn okay so you will notice

and mathematicians

#

actually use the term differently

#

so in coding

#

if we say “3D array” (3 dimensional)

#

we mean an array that requires 3 levels of indexing

#

to hit a scalar value

#

in mathematics, this is more commonly referred to as a rank 3 tensor

velvet thorn Feb 21, 2021, 11:11 PM

#

velvet thorn in mathematics, this is more commonly referred to as a rank 3 tensor

if you had a vector

#

let’s say representing a point on an x, y plot

#

that would be a rank 1 tensor

#

with 2 dimensions (values, namely x and y)

#

all that said

velvet thorn Feb 21, 2021, 11:13 PM

#

velvet thorn if we say “3D array” (3 *dimensional*)

this is usually more common but you should know the mathematical usage if you’re doing ML
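The two usages can be seen side by side in NumPy, as a small sketch with made-up arrays: `ndim` counts how many indices are needed to reach a scalar, while the length of a vector is the "dimension" in the x, y sense.

```python
import numpy as np

# "3D array" in the programming sense: three levels of indexing.
cube = np.zeros((2, 3, 4))
levels_of_indexing = cube.ndim      # number of axes
one_value = cube[0, 1, 2]           # three indices reach a scalar

# A point on an x, y plot: one axis, two components.
point = np.array([1.5, -2.0])
point_axes = point.ndim             # a single axis
point_components = point.shape[0]   # two values, x and y
```

So `point` is a 1D array in code while describing a position in 2-dimensional space.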

quaint bloom Feb 21, 2021, 11:13 PM

#

ah ok

#

thx for the pro explanation

iron basalt Feb 21, 2021, 11:13 PM

#

velvet thorn in mathematics, this is more commonly referred to as a rank 3 tensor

I'm being pedantic here, but tensor as used in ML is a programmer term. In the same way that convolutions are not actually used in ML, they just incorrectly use / borrow the term.

#

Though the misuse is so widespread that it's more or less the accepted use now.

velvet thorn Feb 21, 2021, 11:15 PM

#

iron basalt I'm being pedantic here, but tensor as used in ML is a programmer term. In the s...

that happens a lot in programming

#

cf. functor

#

🥴

iron basalt Feb 21, 2021, 11:16 PM

#

True. Though my favorite nonsensical term in programming is Dynamic Programming. It was named as such to disguise the fact that he (Richard Bellman) was doing mathematical research, because the Secretary of Defense at the time had a pathological fear of the word "research", and to make it sound so cool that no congressman could object to it.

#

I think explaining that difference in dimension usage with number of indices is nice, i'm gonna borrow that.

quiet locust Feb 21, 2021, 11:48 PM

#

What’s the best way to convert a json into a data frame

iron basalt Feb 21, 2021, 11:52 PM

#

quiet locust What’s the best way to convert a json into a data frame

Depends on the format of the json.

quiet locust Feb 21, 2021, 11:55 PM

#

So basically I requested an api

#

And want to take the json and convert it to a data frame

#

So that I can do some analysis

lapis sequoia Feb 21, 2021, 11:56 PM

#

nice

merry ridge Feb 21, 2021, 11:58 PM

#

I don't really think of it as mathematicians using the term differently, more so that non-mathematicians are not specific enough when they refer to dimension

#

Take for example, a 2x2 matrix. What is the dimension? Are you referring to the dimension of the column space? Or is this over the vector space of 2x2 matrices over a binary operation like addition?

#

It is especially unclear in machine learning where there is usually no isomorphism between the row and column space and a mathematician would be more likely to specify exactly what they mean while a non-mathematician is more likely to brush it under the rug because there is a contextually obvious answer without having to include possibly incorrect mathematical rigor

velvet thorn Feb 22, 2021, 12:03 AM

#

merry ridge Take for example, a 2x2 matrix. What is the dimension? Are you referring to the ...

in a programming context?

#

you wouldn't ask that question

#

you would say "what is the length of <this> axis?"

#

furthermore

#

"matrix" is a mathematical abstraction

#

you can have a 2D array representing that

merry ridge Feb 22, 2021, 12:04 AM

#

playing a game and my queue popped so I won't be able to reply for a while if necessary

velvet thorn Feb 22, 2021, 12:04 AM

#

quiet locust And want to take the json and convert it to a data frame

again, depends on the JSON.

#

if it's nested you probably won't be able to convert it to a DF as easily

#

because dataframes inherently deal with tabular data

quiet locust Feb 22, 2021, 12:05 AM

#

can I send a screenshot?

#

Screen_Shot_2021-02-21_at_4.06.21_PM.png

#

here's what I did

velvet thorn Feb 22, 2021, 12:06 AM

#

and then

quiet locust Feb 22, 2021, 12:07 AM

#

That’s as far as I got

velvet thorn Feb 22, 2021, 12:07 AM

#

what do you expect the result to look like

#

I'm not seeing much data there

quiet locust Feb 22, 2021, 12:07 AM

#

It seems like that api wasn’t a good example

#

Yeah it didn’t come out the way I expected

#

I thought I was getting season statistics for the nba

#

Let me try with a different api and see the result

#

Thank you though @velvet thorn

velvet thorn Feb 22, 2021, 12:10 AM

#

🥴 you're welcome but I didn't do much

prisma willow Feb 22, 2021, 12:11 AM

#

I just found that the machineLearning course for my university uses matlab, i wanted to do python with tensorflow+keras and stuff should i drop it and learn supervised/unsupervised/clustering my self or do the course anyway? im at the end of my diploma and done all the major stuff i wanted.

#

what u think?

quiet locust Feb 22, 2021, 12:18 AM

#

@velvet thorn do you have any suggestions in terms of api usage

#

Because I thought I was getting a large dataset and ended up with this

wispy tangle Feb 22, 2021, 12:19 AM

#

is there a way to use max for a list and ignoring strings found in the list

velvet thorn Feb 22, 2021, 12:19 AM

#

quiet locust @velvet thorn do you have any suggestions in terms of api usage

tbh

#

it just sounds to me

#

like you used the wrong API

#

🥴

#

maybe look through the docs

#

?

#

if there are docs

velvet thorn Feb 22, 2021, 12:20 AM

#

wispy tangle is there a way to use max for a list and ignoring strings found in the list

yes

#

but why do you have strings in the list

#

like you can do it but I would suggest you filter first

wispy tangle Feb 22, 2021, 12:23 AM

#

like i want to get the index of the max number in a list and print the item of that index in another list then move on to the second max number etc..

#

what im trying to do is replace that max num after i use it with a string so it doesn't count

#

if i replace it with zero it gets mixed with other nums

#

oh i could just replace it with -1 since the min num possible is zero

misty flint Feb 22, 2021, 12:25 AM

#

prisma willow I just found that the machineLearning course for my university uses matlab, i wa...

do what you want dude. you can either a) do ML with matlab or b) not and learn tf/keras on your own or c) learn both methods of doing it. just depends on your endgoal.

#

ML Engineer? youll probs want that matlab experience. Just a Data Scientist? tf/keras will probs be sufficient

#

or if you think youre NOT disciplined enough to learn on your own/you want structure, i would do the ML course

quiet locust Feb 22, 2021, 12:34 AM

#

@velvet thorn so I used a different api and got this result. Now I want to take this and make it a dataframe

Screen_Shot_2021-02-21_at_4.33.44_PM.png

velvet thorn Feb 22, 2021, 12:40 AM

#

quiet locust @velvet thorn so I used a different api and got this result. Now I want...

doesn't look flat

#

do you know what "flat" means in this context?

velvet thorn Feb 22, 2021, 12:40 AM

#

wispy tangle like i want to get the index of the max number in a list and print the item of t...

hm

#

what I would suggest instead is

#

are you familiar with the concept of "argsort"
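For reference, "argsort" means computing the order of the indices instead of modifying the values. A minimal pure-Python sketch with made-up lists (NumPy's `argsort` does the same thing for arrays, in ascending order):

```python
scores = [3, 9, 0, 7]          # hypothetical numbers to rank
names = ["a", "b", "c", "d"]   # hypothetical items in the other list

# Indices of scores, ordered from largest score to smallest.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Walk the other list in that order; nothing needs to be replaced with -1.
ranked = [names[i] for i in order]
```

This leaves `scores` untouched, so there is no need for a placeholder string or a sentinel value.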

quiet locust Feb 22, 2021, 12:41 AM

#

I don't know what flat means

velvet thorn Feb 22, 2021, 12:41 AM

#

quiet locust I don't know what flat means

okay, so

quiet locust Feb 22, 2021, 12:41 AM

#

yeah I'm running into a whole host of errors when I try to read it into a dataframe

velvet thorn Feb 22, 2021, 12:44 AM

#

!e

import pandas as pd

json = """[
    {
        "col_1": "a",
        "col_2": 3
    },
    {
        "col_1": "w",
        "col_2": -3
    }
]"""

print(pd.read_json(json))

arctic wedgeBOT Feb 22, 2021, 12:44 AM

#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 |   col_1  col_2
002 | 0     a      3
003 | 1     w     -3

velvet thorn Feb 22, 2021, 12:44 AM

#

this source JSON is flat

#

it maps nicely to a table

#

on the other hand

quiet locust Feb 22, 2021, 12:45 AM

#

ah ok I see what you mean now

velvet thorn Feb 22, 2021, 12:45 AM

#

!e

import pandas as pd

json = """[
    {
        "col_1": "a",
        "col_2": {
            "sub_col_1": 1,
            "sub_col_2": None
        }
    },
    {
        "col_1": "w",
        "col_2": {
            "sub_col_1": 6,
            "sub_col_2": 4
        }
    }
]"""

print(pd.read_json(json))

arctic wedgeBOT Feb 22, 2021, 12:45 AM

#

@velvet thorn :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 20, in <module>
003 |   File "/snekbox/user_base/lib/python3.9/site-packages/pandas/util/_decorators.py", line 199, in wrapper
004 |     return func(*args, **kwargs)
005 |   File "/snekbox/user_base/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
006 |     return func(*args, **kwargs)
007 |   File "/snekbox/user_base/lib/python3.9/site-packages/pandas/io/json/_json.py", line 563, in read_json
008 |     return json_reader.read()
009 |   File "/snekbox/user_base/lib/python3.9/site-packages/pandas/io/json/_json.py", line 694, in read
010 |     obj = self._get_object_parser(self.data)
011 |   File "/snekbox/user_base/lib/python3.9/site-packages/pandas/io/json/_json.py", line 716, in _get_object_parser
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/imiluyarol.txt

velvet thorn Feb 22, 2021, 12:45 AM

#

this is not

quiet locust Feb 22, 2021, 12:45 AM

#

hmmm okay

velvet thorn Feb 22, 2021, 12:45 AM

#

the opposite of flat is "nested"

#

so

#

the term for what you want to do is "normalise"

#

ideally

quiet locust Feb 22, 2021, 12:46 AM

#

And with "nested" it's much harder to untangle?

velvet thorn Feb 22, 2021, 12:46 AM

#

you would have some knowledge of the relational model of data

velvet thorn Feb 22, 2021, 12:46 AM

#

velvet thorn you would have some knowledge of the relational model of data

how SQL does it

quiet locust Feb 22, 2021, 12:46 AM

#

hmmm ok

velvet thorn Feb 22, 2021, 12:46 AM

#

quiet locust And with "nested" it's much harder to untangle?

more like

#

your data cannot be mapped to a row-column structure

#

pandas has a json_normalize method

#

but that won't work in all cases

#

you can try that

#

if it doesn't

#

then you need to do it manually
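A minimal sketch of the json_normalize route mentioned above. It reuses the nested shape from the earlier snippet, with JSON `null` in place of `None` (which is not valid JSON). The record contents are illustrative only.

```python
import json

import pandas as pd

# Nested records: "col_2" holds a sub-object, so the data is not flat.
raw = """[
    {"col_1": "a", "col_2": {"sub_col_1": 1, "sub_col_2": null}},
    {"col_1": "w", "col_2": {"sub_col_1": 6, "sub_col_2": 4}}
]"""

records = json.loads(raw)

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)
print(flat)
```

`pd.json_normalize` is available at the top level from pandas 1.0 onward; on older versions it was imported from `pandas.io.json`, which is one possible cause of an import error.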

quiet locust Feb 22, 2021, 12:47 AM

#

ok let me look up some syntax on json_normalize

#

thank you!

#

by manually you mean with like functions and iterations?

velvet thorn Feb 22, 2021, 12:48 AM

#

quiet locust by manually you mean with like functions and iterations?

ye

quiet locust Feb 22, 2021, 12:50 AM

#

shit I am not good at those

#

I'm like a beginner for sure as you could prolly already tell haha

#

@velvet thorn do you have any book recs for python with data science in particular?

velvet thorn Feb 22, 2021, 12:51 AM

#

quiet locust @velvet thorn do you have any book recs for python with data science in...

nope, sorry...

#

not really a book-for-learning person

quiet locust Feb 22, 2021, 12:52 AM

#

I feel that

#

I got a module error when trying json_normalize

#

That's prolly bc of version?

quiet locust Feb 22, 2021, 1:07 AM

#

@velvet thorn I gave up and decided to use a kaggle dataset LOL

merry pebble Feb 22, 2021, 2:30 AM

#

i'm looking to create a nice looking line chart that can be styled, is mobile friendly/dynamic. i need the package to either render html of the chart or a .svg so the image is interactive. How could I go about this?

lapis sequoia Feb 22, 2021, 2:51 AM

#

misty flint do what you want dude. you can either a) do ML with matlab or b) not and learn t...

what exactly do you need matlab for for ML?

#

I used to use matlab at university but havent touched it much since I started doing ML

#

So far I've been able to rely on sklearn for any stat crunching models

quasi sparrow Feb 22, 2021, 3:24 AM

#

Guys, quick question, Why would this throw me an error?

#

    df_bert = pd.DataFrame({
        'id': range(len(train_df)),
        'label': train_df[0],
        'alpha': ['a'] * train_df.shape[0],
        'text': train_df[1].replace(r'\n', ' ', regex=True)
    })

#

So, I have a CSV data file and I am using that file to create a data frame with only two columns

#

My interpreter does not like how I parse the columns [1] and [0] into a new dataframe

#

Anything helps!

lavish swift Feb 22, 2021, 4:22 AM

#

@quasi sparrow Not sure, but here are some thoughts. 1. Without knowing what's in your train_df, I think you're trying to set the label to the values you have in the first column? If so, you may want to try something like:

'label': train_df.iloc[:, 0].values

#

I don't think you can use the index of the column without telling pandas that's what you're trying to do.

quasi sparrow Feb 22, 2021, 4:23 AM

#

Yes, that's exactly what I'm trying to do! Many thanks!

#

I'm trying to train a bert model

lavish swift Feb 22, 2021, 4:24 AM

#

as for the replace, you want to add an .str.replace.... So pandas knows you're using a string method

#

or maybe pandas has a replace method? but it looks like you're doing string work

quasi sparrow Feb 22, 2021, 4:25 AM

#

I have a dataframe with 2 columns: One with string sentences and the other one with numbers (labels)

lavish swift Feb 22, 2021, 4:25 AM

#

though my first suggestion (using iloc) might solve both issues without having to use .str.
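A rough sketch of the iloc suggestion, assuming train_df has the label in its first column and the text in its second. The frame below is made up for illustration and is not the real data.

```python
import pandas as pd

# Made-up stand-in for train_df: column 0 is the label, column 1 the text.
train_df = pd.DataFrame(
    {"label": [0, 1], "sentence": ["first line\nsecond line", "no newline here"]}
)

df_bert = pd.DataFrame({
    "id": range(len(train_df)),
    # iloc selects by position, so it works whatever the columns are named.
    "label": train_df.iloc[:, 0].values,
    "alpha": ["a"] * train_df.shape[0],
    # .str.replace applies the substitution to each string in the column.
    "text": train_df.iloc[:, 1].str.replace("\n", " ", regex=False).values,
})
print(df_bert)
```

`train_df[0]` only works when a column is literally labelled 0 (for example after reading a CSV with `header=None`), whereas positional access through iloc does not depend on the labels.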

quasi sparrow Feb 22, 2021, 4:26 AM

#

I am trying to create a new dataframe with 4 columns, 2 coming from the original dataframe that I already have.

lavish swift Feb 22, 2021, 4:27 AM

#

hopefully those suggestion help. lemme know how it goes! 🙂 good luck!

quasi sparrow Feb 22, 2021, 4:27 AM

#

Thanks! Sure I will!

astral path Feb 22, 2021, 7:43 AM

#

Hi all, I'm trying to use some functions created in a kaggle notebook to create a custom correlation heatmap (https://www.kaggle.com/mlwhiz/seaborn-visualizations-using-football-data/), however despite the data I'm using being seemingly very similar to the data that they're using, I'm getting an error UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32') when I try to call results = associations(plot_df,nominal_columns=catcols,return_results=True) on my own dataframe plot_df. I have absolutely no idea what this means and answers from other places online haven't really explained this well at all

#

this is their dataframe when i called .info() on it

#

and here's mine

#

I really don't see any differences here in types, any ideas what could possibly be causing this?

#

if more info is needed I can provide it. Thank you!!!

viral goblet Feb 22, 2021, 7:53 AM

#

astral path I really don't see any differences here in types, any ideas what could possibly ...

What do you exactly mean by difference in types?

astral path Feb 22, 2021, 8:26 AM

#

viral goblet What do you exactly mean by difference in types?

The types of my data and theirs arent different

velvet thorn Feb 22, 2021, 11:11 AM

#

astral path Hi all, I'm trying to use some functions created in a kaggle notebook to create ...

<U32 means 32 character unicode string

#

that error is saying that you're treating non-numeric columns as numeric, basically
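One way to check for the situation described above is to list the columns pandas does not treat as numeric and convert the ones that should be. This is only a sketch: the frame and column names are hypothetical, and the chat does not say how the problem was actually fixed.

```python
import pandas as pd

# Hypothetical frame: "amount" looks numeric but was read in as strings.
plot_df = pd.DataFrame(
    {"brand": ["x", "y", "x"], "amount": ["1.5", "2.0", "3.5"], "count": [1, 2, 3]}
)

# List the columns pandas does not consider numeric.
non_numeric = plot_df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Convert a column that should be numeric; unparseable values become NaN.
plot_df["amount"] = pd.to_numeric(plot_df["amount"], errors="coerce")
print(plot_df.dtypes)
```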

tender gyro Feb 22, 2021, 11:15 AM

#

Hi guys, can you guide me on how to go about building this vehicle classifier/tracker system? I have data of vehicles in the form of video recordings

grave frost Feb 22, 2021, 11:43 AM

#

do you want to track or classify?

balmy bear Feb 22, 2021, 12:05 PM

#

hi

onyx drum Feb 22, 2021, 4:26 PM

#

Any suggestions for functions I can use to compute correlations? "pearsonr" from scipy.stats crashes my Jupyter notebook consistently. I have 500000 datapoints for each set of distributions whose correlation I want to compute, but surely there has to be a non-crashy way...

astral path Feb 22, 2021, 4:52 PM

#

velvet thorn `<U32` means 32 character unicode string

thank you! it worked

wise kettle Feb 22, 2021, 5:02 PM

#

What are some good resources to learn about graphs, probability, and statistics? I am auditing an edX course on python that involves things like JupyterLab and data science stuff but I never passed stats in high school

trail breach Feb 22, 2021, 5:08 PM

#

Is anyone familiar with the perceptron algorithm? I'm trying to use MNIST with TensorFlow to figure out how to tell a three from a five

#

looking for a reference

#

I have my data normalized, and separated, but not sure where to go from here.

magic panther Feb 22, 2021, 5:29 PM

#

Anyone know something about objective functions? How do I create one out of a 3D (or higher-dimensional) plot of data?

misty flint Feb 22, 2021, 5:48 PM

#

wise kettle What are some good resources to learn about graphs, probability, and statistics?...

statquest on YT

#

best stats explanations ever

#

tina huang

#

DoggoKek

wise kettle Feb 22, 2021, 6:06 PM

#

Thanks!

#

Checking it out ASAP

grave frost Feb 22, 2021, 7:02 PM

#

hi

late shell Feb 22, 2021, 7:02 PM

#

hello. I was building a Multiple Linear Regression Model, with the dummy dataset of 50_Startups which is usually used by beginner ML learners. The data set looks like this

#

#

I made 2 models, one in which I dropped one dummy variable in order to avoid the dummy variable trap after onehot encoding the State feature, and in the other model I didn't drop any dummy variable, kept all 3 of them. But the results were exactly the same

#

same R^2
same predictions
why did this happen? Aren't we supposed to drop one dummy variable in order to avoid multicollinearity ?

iron basalt Feb 22, 2021, 7:31 PM

#

The make_input_fn returns a function which is then later passed to tensorflow during training and evaluation. The reason for the nesting is that train_input_fn and eval_input_fn must be functions that take no arguments. Tensorflow expects / allows them to only have the arguments mode, params and config (and maybe input_context). The functions train_input_fn and eval_input_fn are supposed to return the data, but how are they supposed to do that if they don't take any arguments? How do you get your data to tensorflow? The trick is to either use globals or do the wrapping trick in which the inner function has the data without getting it passed as an argument, but unlike a global, it's not global. In general this is a common trick you will find in python programs. Side note: this is how decorators are implemented.
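The wrapping trick described above can be sketched in plain Python, without TensorFlow. The parameter names (features, labels, batch_size) and the batching logic are illustrative assumptions, not the tutorial's actual code.

```python
def make_input_fn(features, labels, batch_size=2):
    """Return a zero-argument function that still has access to the data."""

    def input_fn():
        # features, labels and batch_size come from the enclosing call,
        # not from arguments and not from globals.
        batches = []
        for start in range(0, len(features), batch_size):
            end = start + batch_size
            batches.append((features[start:end], labels[start:end]))
        return batches

    return input_fn


train_input_fn = make_input_fn([1, 2, 3, 4], [0, 1, 0, 1])

# The caller (TensorFlow, in the original discussion) can invoke it with no arguments.
print(train_input_fn())
```

Each call to make_input_fn creates a separate inner function with its own captured data, so a train and an eval version can coexist without sharing state.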

severe python Feb 22, 2021, 7:45 PM

#

hi data science people

#

have a pandas question that I haven't had a response for in the help channels:

#

made a script that searches an excel file (currently one column) based on user input and prints matching rows which is great, but i would like it to search multiple columns. any idea how to achieve this? here is my code:

import pandas as pd
from tabulate import tabulate
from termcolor import colored

class bcolors:
    FAIL = '\033[91m'
    ENDC = '\033[0m'  # reset code; the except branch below uses it

while True:
    try:
        variable = input("Please provide an acronym:  ")
        variable = variable.upper()
        df = pd.read_excel("accounts.xlsx")
        df = df.set_index('Acronym')
        result = df.loc[variable]
        print(tabulate(result, headers='keys', tablefmt='psql'))

    except KeyError:
        print(f"{bcolors.FAIL}Invalid acronym{bcolors.ENDC}")

iron basalt Feb 22, 2021, 7:52 PM

#

severe python hi data science people

Show input and output.

severe python Feb 22, 2021, 7:54 PM

#

Please provide an acronym: RYAN
+-----------+----------+-----------+-----------+
| Acronym | Parent | Clearer | Account |
|-----------+----------+-----------+-----------|
| RYAN | 26KM291 | GS | 285M322 |
| RYAN | 2378DM | Socgen | 2HKLM242 |
| RYAN | 26KM60 | GS | 285M322 |
| RYAN | 26KM60 | BAML | 268132 |
+-----------+----------+-----------+-----------+
Please provide an acronym:

iron basalt Feb 22, 2021, 7:55 PM

#

Show accounts.xlsx (assuming it's test data and not something that needs to be kept secret).

#

(screenshot a section of it)

severe python Feb 22, 2021, 7:56 PM

#

#

has 4k rows

#

goal is to search by parent or account as well as acronym, searching by clearer isn't necessary

iron basalt Feb 22, 2021, 7:57 PM

#

so you want to be able to search by a key other than Acronym?

severe python Feb 22, 2021, 7:58 PM

#

yes exactly, by the parent ID as well as the account ID

#

and print matching rows (including acronym column)

#

i'm thinking i will need to restructure a lot? @iron basalt

misty flint Feb 22, 2021, 8:25 PM

#

might as well throw it into SQL

#

CCL_Kek

iron basalt Feb 22, 2021, 8:26 PM

#

@severe python I'm back, so you could restructure, much like in a database, but you could also just not use an index and instead do something like df.loc[df["column name"] == value].

#

df.loc will give you the rows and df["some name"] gives you the column with that name. You then check that column for all values that match value and get the rows at those spots.

#

If you go the index route you can make things a lot faster, but you need to normalize (1NF, 2NF, 3NF, BCNF).

magic pivot Feb 22, 2021, 8:28 PM

#

hi

#

does anyone know about the opencv library

#

i have a problem with its Haar cascade classifier

#

the .detectMultiScale() method gets stuck while executing

lapis sequoia Feb 22, 2021, 8:30 PM

#

hi

magic pivot Feb 22, 2021, 8:30 PM

#

@lapis sequoia hi

#

can you help me with that problem

lapis sequoia Feb 22, 2021, 8:31 PM

#

don't know nothing

#

about python

misty flint Feb 22, 2021, 8:31 PM

#

5_KekBoom

lapis sequoia Feb 22, 2021, 8:31 PM

#

just looking around

magic pivot Feb 22, 2021, 8:31 PM

#

ohk

#

np

#

@misty flint can you help?

misty flint Feb 22, 2021, 8:31 PM

#

sorry ive only used the draw functions with opencv

magic pivot Feb 22, 2021, 8:31 PM

#

ohk np

misty flint Feb 22, 2021, 8:32 PM

#

and the image processing module

iron basalt Feb 22, 2021, 8:32 PM

#

@magic pivot opencv is a buggy mess (on the c++ side, which the python side inherits), try using scikit-image instead. If that does not work, come back.

magic pivot Feb 22, 2021, 8:32 PM

#

@iron basalt thanks

#

i was thinking it was a bug

iron basalt Feb 22, 2021, 8:33 PM

#

@magic pivot https://scikit-image.org/docs/stable/auto_examples/applications/plot_face_detection.html

magic pivot Feb 22, 2021, 8:34 PM

#

@iron basalt thank you very much 🙏

severe python Feb 22, 2021, 8:35 PM

#

@iron basalt so without index, it would look like df.loc[df["Acronym, Parent, Account"] == variable] ?

astral hound Feb 22, 2021, 8:41 PM

#

Hey so if I need to read an image and find it as accurately as possible what methods are there? Canny keeps giving me false positives

iron basalt Feb 22, 2021, 8:41 PM

#

import pandas as pd

df = pd.DataFrame(
    {
        "month": [1, 4, 7, 10],
        "year": [2012, 2014, 2013, 2014],
        "sale": [55, 40, 84, 31]
    }
)

print(df)
print("--------------------------")

print(df.loc[df["year"] == 2014])
print("--------------------------")

print(df.loc[df["year"] == 2014].loc[df["month"] == 4])
print("--------------------------")

#

run that

#

@severe python

#

it gets all rows with year == 2014 and from all those rows it gets all rows with month == 4.
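The same two-step filter can also be written as a single boolean mask. A small sketch on the same example frame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "month": [1, 4, 7, 10],
        "year": [2012, 2014, 2013, 2014],
        "sale": [55, 40, 84, 31],
    }
)

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than ==.
both = df.loc[(df["year"] == 2014) & (df["month"] == 4)]
print(both)
```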

severe python Feb 22, 2021, 8:43 PM

#

AttributeError: partially initialized module 'pandas' has no attribute 'DataFrame' (most likely due to a circular import)

#

i see what you mean, in my case would i just reference variable? which is basically the user input @iron basalt and would i need to change the ending lines?

iron basalt Feb 22, 2021, 8:47 PM

#

yeah, if your user inputs an acronym (wants to search by one), then you just do what you are doing now. If they select to search by Parent, then it's the same, but with parent instead.

#

So here is what you can do

#

There is a loop, the user selects which column they want to search by.

#

Then which value they want to match in that column.

#

It spits out all rows that match.

#

Then go back to step 1. But this time it only searches the remaining rows.

#

rows = df
# In a loop with user input
rows = rows.loc[rows[column_to_search_by] == value_to_match]
print(rows)
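A runnable version of the fragment above, as a sketch. The hard-coded list of queries stands in for the user input, and the helper name narrow is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "month": [1, 4, 7, 10],
        "year": [2012, 2014, 2013, 2014],
        "sale": [55, 40, 84, 31],
    }
)


def narrow(rows, column_to_search_by, value_to_match):
    """Keep only the rows whose column equals the given value."""
    return rows.loc[rows[column_to_search_by] == value_to_match]


# Hypothetical sequence of user choices, applied one after another.
queries = [("year", 2014), ("month", 10)]

rows = df
for column, value in queries:
    rows = narrow(rows, column, value)
    print(rows)
```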

severe python Feb 22, 2021, 8:51 PM

#

i see what you're saying

iron basalt Feb 22, 2021, 8:51 PM

#

It filters down till you only have whatever you want left.

severe python Feb 22, 2021, 8:51 PM

#

so there isn't a way to search multiple columns normally right? like instead of asking user what criteria they want to search by

iron basalt Feb 22, 2021, 8:52 PM

#

You can also detect that multiple columns were entered, e.g. month, year, and then accept multiple values 4, 2014 to speed things up a bit for the user.

#

This would just execute as before, one after another.

severe python Feb 22, 2021, 8:54 PM

#

a little lost

#

i feel like there's a way to search by both. like if i search a parent account, to print that row. if i search an acronym, print that row

#

what is preventing me from doing that? and can i use df.loc with multiple columns referencing the user input? what you said above

iron basalt Feb 22, 2021, 8:59 PM

#

severe python i feel like there's a way to search by both. like if i search a parent account, ...

The solution I gave you does exactly that.

severe python Feb 22, 2021, 9:01 PM

#

gotcha, can you double check this with the full code i gave above?
df.loc[df["Acronym, Parent, Account"] == variable]

iron basalt Feb 22, 2021, 9:11 PM

#

import pandas as pd

df = pd.DataFrame(
    {
        "month": [1, 4, 7, 10],
        "year": [2012, 2014, 2013, 2014],
        "sale": [55, 40, 84, 31]
    }
)

print(df)
print("--------------------------")

while True:
    search = input("Enter your query: ")

    if search == "quit":
        break

    sp = search.split(",")

    col = sp[0]
    val = int(sp[1]) 

    print("Showing all results where {} == {}:".format(col, val))
    print(df.loc[df[col] == val])

#

@severe python

#

Example output:

#

   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31
--------------------------
Enter your query: sale,55
Showing all results where sale == 55:
   month  year  sale
0      1  2012    55

iron basalt Feb 22, 2021, 9:23 PM

#

severe python gotcha, can you double check this with the full code i gave above? ```df.loc[df[...

No you can't do that in pandas. The solution is really simple, just search by Acronym, then take those results and search by Parent, and then take those results and search by Account.
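Chaining the searches one after another narrows the results, so it only keeps rows that match every step. If the goal is instead to find rows where the value appears in any one of the columns (Acronym or Parent or Account), a boolean mask over several columns is one option. A minimal sketch with made-up rows in the shape of the sheet shown earlier; the helper name search_any is hypothetical:

```python
import pandas as pd

# Made-up rows in the shape of the accounts sheet shown earlier.
df = pd.DataFrame(
    {
        "Acronym": ["RYAN", "RYAN", "ACME"],
        "Parent": ["26KM291", "2378DM", "99ZZ01"],
        "Clearer": ["GS", "Socgen", "BAML"],
        "Account": ["285M322", "2HKLM242", "268132"],
    }
)

search_columns = ["Acronym", "Parent", "Account"]


def search_any(frame, value):
    """Return rows where any of the search columns equals value."""
    mask = frame[search_columns].eq(value).any(axis=1)
    return frame.loc[mask]


print(search_any(df, "RYAN"))      # matched via Acronym
print(search_any(df, "2378DM"))    # matched via Parent
```

Because Acronym stays an ordinary column here rather than the index, it is still printed with every matching row.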

silk axle Feb 22, 2021, 9:26 PM

#

Is there a way to install TensorFlow with GPU support on Windows 10 without using anaconda? All the tutorials I've seen are either outdated or use anaconda which I don't have

iron basalt Feb 22, 2021, 9:27 PM

#

The reason why they use anaconda is because tensorflow GPU needs the CUDA toolkit.

#

(And conda has that)

silk axle Feb 22, 2021, 9:29 PM

#

Right

iron basalt Feb 22, 2021, 9:29 PM

#

https://developer.nvidia.com/cuda-toolkit-archive

NVIDIA Developer

CUDA Toolkit Archive

Previous releases of the CUDA Toolkit, GPU Computing SDK, documentation and developer drivers can be found using the links below. Please select the release you want from the list below, and be sure to check www.nvidia.com/drivers for more recent production drivers appropriate for your hardware configuration.

silk axle Feb 22, 2021, 9:30 PM

#

But could I not just install the CUDA toolkit myself?

iron basalt Feb 22, 2021, 9:30 PM

#

You can

silk axle Feb 22, 2021, 9:30 PM

#

Or is this just something where I should install anaconda?

final granite Feb 22, 2021, 9:30 PM

#

Not sure if there's an interest in this, but I've been working on a channel for Python for a while. It's not data science-centric, but it is text-analysis specific with bits of DS thrown in the mix. Thought I'd share it here: https://www.youtube.com/pythontutorialsfordigitalhumanities

YouTube

Python Tutorials for Digital Humanities

On this channel, I provide tutorials for working with Python in a digital humanities project. I design my videos and tutorials for humanists who have no coding experience. I am a medieval historian by trade, but I create my videos with all humanists in mind. If you want to interact with the videos in more dynamic ways, check out my website, www....

serene scaffold Feb 22, 2021, 9:33 PM

#

final granite Not sure if there's an interest in this, but I've been working on a channel for ...

You may be familiar with our #rules about self-promotion. On-topic self-promotion is a bit of a grey area, so try to make sure references to your own content are part of a legitimate effort to discuss that content.

I've actually been working with NLP for a while. What sort of text analysis are you doing?

final granite Feb 22, 2021, 9:34 PM

#

Oh dear. I didn't mean to violate the rules. I am sincerely sorry. Everything from custom NER for domain-specific problems to topic modeling.

#

I don't make money off these videos, just helping develop them as part of a postdoc and spreading the word.

serene scaffold Feb 22, 2021, 9:34 PM

#

final granite Oh dear. I didn't mean to violate the rules. I am sincerely sorry. Everything fr...

no problem, I was just telling you that for your information. I'm not accusing you of "drop it and run" self promotion

final granite Feb 22, 2021, 9:35 PM

#

Ah gotcha. No nothing like that.

serene scaffold Feb 22, 2021, 9:35 PM

#

I also see from your message history that you finished an NLP project with spaCy. My first big Python project was refactoring a spaCy-based NER package that my coworker wrote.

final granite Feb 22, 2021, 9:35 PM

#

Thanks for the heads up about the rules, though. I'll be more explicit in my intentions if I do something like that in the future.

#

Oh cool. Indeed. I am a huge fan of spaCy. Looking forward to preparing a series of tutorials for version 3.0. There's lots to unpack in the new update and I haven't had the time to fully explore it yet

serene scaffold Feb 22, 2021, 9:37 PM

#

Oh, there's going to be a new major release?

final granite Feb 22, 2021, 9:37 PM

#

I wrote a textbook on NER using spaCy. ner.pythonhumanities.com , if you are interested

#

Was released Feb 1.

#

They are moving towards BERT. Results are expected. Marked increase in accuracy at the cost of performance, but not as much as competitors.

serene scaffold Feb 22, 2021, 9:38 PM

#

Interesting. One of the reasons we didn't build our NER package entirely through spaCy (we collected features from Doc instances and used other learners) was because we eventually wanted to use BERT. Which we did.

#

But you're telling me that BERT embeddings will be what ship with the large model?

final granite Feb 22, 2021, 9:40 PM

#

I have only toyed with spaCy 3.0, but it's a fairly easy implementation of BERT. Lots more customization now too with 3.0. You can control your ANN architecture

#

No. There is a .trf model that is the BERT model. They still have all the same sm, md, lg models with embeddings

#

So you can still use the old embeddings, if you desire

serene scaffold Feb 22, 2021, 9:42 PM

#

Interesting. Right now my advisor has me working on dataset ablation and that might carry me through to when I leave the college. I'm in my last semester of undergrad and I've just had the opportunity to do research because she took a chance on me.

final granite Feb 22, 2021, 9:43 PM

#

Oh that's really cool. DS/NLP is a fun career path. I'm a historian by training

serene scaffold Feb 22, 2021, 9:43 PM

#

That is to say, your undergrad was in history but you pursued CS thereafter and that's how you're a postdoc?

final granite Feb 22, 2021, 9:44 PM

#

Oh no my B.A., M.A., and PhD are all in medieval history. During my PhD I taught myself DS/CS/and programming in secret so that I could do the research I wanted

serene scaffold Feb 22, 2021, 9:45 PM

#

final granite Oh no my B.A., M.A., and PhD are all in medieval history. During my PhD I taught...

I'd have to DM you to discuss that since history is off topic. Do you mind?

final granite Feb 22, 2021, 9:45 PM

#

Go for it. I will be dashing soon, though so I can only chat for a few miun

#

min*

lone blaze Feb 22, 2021, 10:30 PM

#

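import numpy as np
import scipy.integrate
import matplotlib.pyplot as plt

# f (the target function) and x (the plotting grid) are defined elsewhere.
aa = []
varss = []
errors = []  # was appended to below without being initialised
for i in range(10000):
    x1 = np.random.rand()*2-1
    x2 = np.random.rand()*2-1
    X = np.array([[x1], [x2]])

    y1 = f(x1)
    y2 = f(x2)
    y = np.array([[y1], [y2]])

    a = np.linalg.solve(X.T @ X, X.T @ y)[0][0]
    # quad returns (value, abs_error_estimate); keep only the value
    var = scipy.integrate.quad(lambda x: (a*x-1.4286*x)**2, -1, 1)[0]
    error = scipy.integrate.quad(lambda x: (a*x - f(x))**2, -1, 1)[0]

    aa.append(a)
    varss.append(var)
    errors.append(error)

variance = np.mean(varss)
error = np.mean(errors)
print("variance", variance)
print("error", error)

ahat = np.mean(aa)
print("ahat", ahat)

plt.plot(x, f(x))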

#

No idea where (ax-1.4286x)^2, 1.4286 came from?

#

#

My thoughts: it seems to me like the guy who wrote the code first calculated ghat
and got it to be approximately 1.4286
then put the numerical value in, since it won't really differ by much
and then avoid having to do all the calculations twice
seems reasonable or did I get something wrong?

astral path Feb 22, 2021, 10:35 PM

#

I have a dataframe with two columns

#

and another dataframe with a 3rd column representing the count of each pairing of values from the first dataframe with multiindexing

#

how would i populate a 3rd column in the original dataframe with the value of the count in the second dataframe for each row's pairing?

#

i originally thought something along the lines of mb_counts.loc([plot_df['merchantID'], plot_df['BrandID']]) but that doesn't work because it's using lists as an index

exotic maple Feb 22, 2021, 10:38 PM

#

I think you should be able to create a new column with apply and a lambda

#

something like

#

DF["NEW COLUMN"] = df.loc["applicable_index"].apply(len)

#

I -think- that's what you want, no? the size of the multiindex?

#

or do you want the count of the elements inside the index, per element?

astral path Feb 22, 2021, 10:41 PM

#

what i'm trying to do is if there's, say 4 rows in the original dataframe where merchantID is 9359 and BrandID is 8360, then the other dataframe contains the number 4 at the multiindex of merchantID being 9359 and BrandID being 8360

#

im adding a column to the original dataframe which contains the number of times each row occurs in the dataframe

exotic maple Feb 22, 2021, 10:45 PM

#

yeah you can do something like what i said, if i got you right

#

#

so for example that multiindex has 4 (in the dead) test

#

i can get the number with len(df.loc["Alabama"])

#

#

and you can multiindex it too

#

so what you need is to pass len into an apply

#

to get the size based on a multiindex loc

astral path Feb 22, 2021, 10:47 PM

#

hmm ok i'll try that

astral path Feb 22, 2021, 10:50 PM

#

exotic maple DF["NEW COLUMN"] = df.loc["applicable_index"].apply(len)

i tried merchant_brand_df['counts'] = merchant_brand_df.loc[['merchantID', 'BrandID']].apply(len) and got KeyError: "None of [Index(['merchantID', 'BrandID'], dtype='object')] are in the [index]"

velvet thorn Feb 22, 2021, 10:50 PM

#

no

exotic maple Feb 22, 2021, 10:51 PM

#

that means those are not in the index

#

i'm also testing the solution myself and its not optimal. let me try something else

astral path Feb 22, 2021, 10:51 PM

#

ok

velvet thorn Feb 22, 2021, 10:51 PM

#

that’s wrong

#

you want to merge.

#

AKA join

astral path Feb 22, 2021, 10:52 PM

#

merge the counts dataframe with the original one?

exotic maple Feb 22, 2021, 10:52 PM

#

ahhhh

velvet thorn Feb 22, 2021, 10:52 PM

#

ye

exotic maple Feb 22, 2021, 10:52 PM

#

yes that's actually much better

scenic patio Feb 22, 2021, 10:52 PM

#

may i ask some resources to learn data science.

astral path Feb 22, 2021, 10:52 PM

#

ahhh ok

exotic maple Feb 22, 2021, 10:52 PM

#

you can do it via pivot as well

#

or groupby

velvet thorn Feb 22, 2021, 10:52 PM

#

merge on those two columns
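A minimal sketch of the merge suggested above, assuming the counts frame is indexed by (merchantID, BrandID) as described. The data is made up, and mb_counts is built here only so the example runs on its own.

```python
import pandas as pd

# Made-up original frame with one row per transaction.
plot_df = pd.DataFrame(
    {"merchantID": [9359, 9359, 9359, 1111], "BrandID": [8360, 8360, 8360, 2222]}
)

# Made-up counts frame indexed by (merchantID, BrandID), as described above.
mb_counts = plot_df.groupby(["merchantID", "BrandID"]).size().to_frame("count")

# reset_index turns the MultiIndex levels back into columns so they can be
# used as merge keys; how="left" keeps every row of the original frame.
merged = plot_df.merge(
    mb_counts.reset_index(), on=["merchantID", "BrandID"], how="left"
)
print(merged)
```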

exotic maple Feb 22, 2021, 10:52 PM

#

groupby with .agg("count")

velvet thorn Feb 22, 2021, 10:52 PM

#

exotic maple groupby with .agg("count")

by “count”

exotic maple Feb 22, 2021, 10:52 PM

#

yeah honestly grouping / merging would be better there, @velvet thorn is right

velvet thorn Feb 22, 2021, 10:53 PM

#

doesn’t @astral path mean like just the value in the second DF

#

which is the already computed count

astral path Feb 22, 2021, 10:53 PM

#

yeah

exotic maple Feb 22, 2021, 10:53 PM

#

he's trying to compute it, though, right?

#

Or did i misunderstand the whole thing lol

astral path Feb 22, 2021, 10:53 PM

#

i already have it computed in the second dataframe

exotic maple Feb 22, 2021, 10:54 PM

#

ok i'm dumb lol

astral path Feb 22, 2021, 10:54 PM

#

although if it's better to compute it in a one-liner that also adds a column then that's better

exotic maple Feb 22, 2021, 10:54 PM

#

@velvet thorn 's approach is much cleaner

#

try using pivot or groupby

#

and aggregate

#

via len/count

#

@scenic patio any specifics in mind?

velvet thorn Feb 22, 2021, 10:55 PM

#

also @astral path a tip

scenic patio Feb 22, 2021, 10:55 PM

#

nope i just learned basics of python

velvet thorn Feb 22, 2021, 10:55 PM

#

in general for this kind of question

exotic maple Feb 22, 2021, 10:55 PM

#

My advice would be to first learn python

velvet thorn Feb 22, 2021, 10:55 PM

#

if you can provide a runnable sample of input and expected data

#

like something people can copy paste and run

scenic patio Feb 22, 2021, 10:55 PM

#

i was searching through videos and learned that python is good for ds

velvet thorn Feb 22, 2021, 10:55 PM

#

it gets a lot easier to understand your problem

astral path Feb 22, 2021, 10:56 PM

#

how would I get that expected data sample in an easily exportable way from python?

exotic maple Feb 22, 2021, 10:56 PM

#

@scenic patio There's Python, R, and Julia is interesting too, for example

#

Python is the most popular though

velvet thorn Feb 22, 2021, 10:56 PM

#

astral path how would I get that expected data sample in an easily exportable way from pytho...

I would just make one up

astral path Feb 22, 2021, 10:56 PM

#

ah ok

velvet thorn Feb 22, 2021, 10:56 PM

#

with a few rows

exotic maple Feb 22, 2021, 10:56 PM

#

do you have a background with decent math studies?

scenic patio Feb 22, 2021, 10:56 PM

#

i am in high school

exotic maple Feb 22, 2021, 10:56 PM

#

Ok you need to learn some math then

#

Try Khan Academy on these topics: Linear Algebra. Calculus I, II, III (vectors)

#

Multivariate calculus

#

As per resources...there's countless to be honest lol

astral path Feb 22, 2021, 10:57 PM

#

exotic maple Try Khan Academy on these topics: Linear Algebra. Calculus I, II, III (vectors)

also statistics

scenic patio Feb 22, 2021, 10:57 PM

#

thats the problem so much resources dont know what to use

exotic maple Feb 22, 2021, 10:57 PM

#

I'm going through the University of Michigan's Applied Data Science

#

which is in python

#

and its pretty good

#

but be aware of one thing

#

no one resource is EVER going to teach you everything

#

you need to research a lot by yourself

scenic patio Feb 22, 2021, 10:58 PM

#

i am in a midst of exploring web development and ds

#

dont know what i want to main

astral path Feb 22, 2021, 10:58 PM

#

https://worldpece.org/sites/default/files/datastyle.pdf
this is a fantastic book that my data wrangling professor used to teach us the fundamental questions behind data science

exotic maple Feb 22, 2021, 10:58 PM

#

Well, that's an important life topic you shouldnt ask strangers in discord about lol

#

but for data science you can start there

scenic patio Feb 22, 2021, 10:59 PM

#

yh i just want to know what ds resources are reliable

exotic maple Feb 22, 2021, 10:59 PM

#

University of Michigan's Applied Data Science -> I like this, but there are others

astral path Feb 22, 2021, 10:59 PM

#

scenic patio yh i just want to know what ds resources are reliable

i would search for textbooks/lectures/resources that universities provide or use

exotic maple Feb 22, 2021, 10:59 PM

#

Codequest? I think its good too

#

edX has a fantastic data science course, esp. from Harvard, but that's in R, not Python

scenic patio Feb 22, 2021, 11:00 PM

#

what about this website i stumbled upon: dataquest

exotic maple Feb 22, 2021, 11:00 PM

#

I think thats good

#

ive heard good reviews about

#

it

#

but i havent tried it myself

#

dont get stuck in tutorial hell though dude

#

review courses

#

choose one

#

learn as much as you can

#

and then do a project of your interests, whatever that is

grave frost Feb 22, 2021, 11:03 PM

#

agreed, you would learn more in a project than in a course with spoon-fed material

scenic patio Feb 22, 2021, 11:04 PM

#

thank you!!!

grave frost Feb 22, 2021, 11:04 PM

#

IMO that's one of the most enjoyable ways to learn. Because the project would be something you like and would be cool, you won't lose motivation.

exotic maple Feb 22, 2021, 11:05 PM

#

Exactly

#

choose a topic you like, find data about it, and make sense of it

#

You like anime? Try getting anime data depending on seasons, gender, profits, etc, and try to predict anime's success based on 2 or 3 parameters.

#

You like weather? Plenty of datasets lol

#

economics? shitloads of datasets

#

I think even pornhub has API nowadays...

grave frost Feb 22, 2021, 11:06 PM

#

That turned dark pretty quickly 🙂

misty flint Feb 22, 2021, 11:06 PM

#

~~dont corrupt the highschooler~~

#

ID_BoomKek

grave frost Feb 22, 2021, 11:06 PM

#

😆

exotic maple Feb 22, 2021, 11:06 PM

#

he's in high school

#

not kinder lmao

misty flint Feb 22, 2021, 11:07 PM

#

ID_BoomKek

exotic maple Feb 22, 2021, 11:07 PM

#

besides, I just said API :p

misty flint Feb 22, 2021, 11:07 PM

#

true

#

kids gotta grow up fast nowadays

#

amegablobsweats

exotic maple Feb 22, 2021, 11:07 PM

#

actually, jokes aside, those guys at PH have some decent datasets and insights

#

i always laugh at their yearly summaries lol

astral path Feb 22, 2021, 11:07 PM

#

facts

grave frost Feb 22, 2021, 11:07 PM

#

Well what do you know - there actually is an API for PH

misty flint Feb 22, 2021, 11:07 PM

#

i heard thats what let them stay ahead of their competition. them using data science and ML

astral path Feb 22, 2021, 11:07 PM

#

imagine getting a masters in statistics to go work for pornhub

exotic maple Feb 22, 2021, 11:07 PM

#

grave frost Well what do you know - there actually is an API for PH

Yeah, how would I know

grave frost Feb 22, 2021, 11:07 PM

#

I thought MS wont allow it

exotic maple Feb 22, 2021, 11:08 PM

#

-looks away-

exotic maple Feb 22, 2021, 11:08 PM

#

astral path imagine getting a masters in statistics to go work for pornhub

"What do you do your a living?" I investigate what people jerk off to

scenic patio Feb 22, 2021, 11:08 PM

#

true

grave frost Feb 22, 2021, 11:08 PM

#

exotic maple "What do you do your a living?" I investigate what people jerk off to

predicting that with a model

exotic maple Feb 22, 2021, 11:08 PM

#

Data Science in PH -> Interquartile range of "session" length based on categories

misty flint Feb 22, 2021, 11:09 PM

#

scenic patio true

dw too much about figuring out what you want to do. you still have time. take a chance to explore everything and find what you like

exotic maple Feb 22, 2021, 11:09 PM

#

this guy is so ahead of things he shouldnt stress

misty flint Feb 22, 2021, 11:09 PM

#

yeah def

exotic maple Feb 22, 2021, 11:10 PM

#

im 28, with career and shit, and im still a bit lost lmao

astral path Feb 22, 2021, 11:10 PM

#

i got my problem fixed `merchant_brand_df = merchant_brand_df.groupby(['merchantID', 'BrandID']).size().to_frame('count').reset_index()`

exotic maple Feb 22, 2021, 11:10 PM

#

though I'm liking data science a lot

misty flint Feb 22, 2021, 11:10 PM

#

exotic maple im 28, with career and shit, and im still a bit lost lmao

same except grad school

#

ID_BoomKek

exotic maple Feb 22, 2021, 11:10 PM

#

astral path i got my problem fixed `merchant_brand_df = merchant_brand_df.groupby(['merchant...

why the reset index though? 'screeches at positional indexes'
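On the `reset_index()` question, here is a minimal sketch on made-up data (only the column names come from the snippet above, the values are hypothetical). `groupby(...).size()` gives back a Series indexed by a `(merchantID, BrandID)` MultiIndex, and `reset_index()` moves those two levels back into ordinary columns, which is often what pivoting or plotting code expects.

```python
import pandas as pd

# Hypothetical listings: one row per product, values made up for illustration.
listings = pd.DataFrame({
    "merchantID": [1, 1, 1, 2, 2],
    "BrandID":    [10, 10, 20, 10, 30],
})

# size() counts rows per (merchantID, BrandID) pair; the result is a Series
# whose index is a two-level MultiIndex.
counts = listings.groupby(["merchantID", "BrandID"]).size()

# to_frame names the single column; reset_index turns the index levels into columns.
counts_df = counts.to_frame("count").reset_index()

print(counts_df)
```

Whether to keep the MultiIndex or flatten it is mostly a matter of what the next step needs, so skipping `reset_index()` is also a reasonable choice.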

misty flint Feb 22, 2021, 11:10 PM

#

mid career change

#

to data science

#

or at least trying

#

DoggoKek

#

you at least seem to know your stuff unlike me

astral path Feb 22, 2021, 11:11 PM

#

depressing thought: it would suck being the data scientist for the FBI who has to create algorithms for detecting certain illegal kinds of porn

misty flint Feb 22, 2021, 11:11 PM

#

NervousSip

misty flint Feb 22, 2021, 11:11 PM

#

astral path depressing thought: it would suck being the data scientist for the FBI who has t...

depressing yes but someones got to do it

astral path Feb 22, 2021, 11:11 PM

#

you would have to go through and analyze the features of those videos and that could be scarring

misty flint Feb 22, 2021, 11:12 PM

#

astral path you would have to go through and analyze the features of those videos and that c...

oh no, i heard its mostly looking at metadata

astral path Feb 22, 2021, 11:12 PM

#

oh thank god

misty flint Feb 22, 2021, 11:12 PM

#

then if its actual stuff they investigate it more closely

grave frost Feb 22, 2021, 11:12 PM

#

misty flint same except grad school

I am starting to feel too young 😅

misty flint Feb 22, 2021, 11:12 PM

#

grave frost I am starting to feel too young 😅

how young are you NervousSip

exotic maple Feb 22, 2021, 11:12 PM

#

grave frost I am starting to feel too young 😅

Age is just a number

#

oh hi FBI

grave frost Feb 22, 2021, 11:13 PM

#

exotic maple Age is just a number

Agreed

misty flint Feb 22, 2021, 11:13 PM

#

exotic maple Age is just a number

and jail is just a place

#

E_MonkaFBI

exotic maple Feb 22, 2021, 11:13 PM

#

misty flint and jail is just a place

Technically....

grave frost Feb 22, 2021, 11:13 PM

#

Juvenile your 3rd home

astral path Feb 22, 2021, 11:13 PM

#

exotic maple Age is just a number

oh wait i took this the wrong way based on prior context

#

or did i

misty flint Feb 22, 2021, 11:13 PM

#

double entendre?

#

NervousSip

astral path Feb 22, 2021, 11:13 PM

#

https://tenor.com/view/breaking-bad-walter-gif-10358432

Tenor

exotic maple Feb 22, 2021, 11:14 PM

#

Humanity will be doomed the day an NLP algorithm understands double meaning jokes

#

in multiple languages

astral path Feb 22, 2021, 11:14 PM

#

oh boy

#

i cant wait for the day algorithms can generate jokes so complex and clever that only other AIs can understand it

misty flint Feb 22, 2021, 11:15 PM

#

AI jokes

#

NervousSip

astral path Feb 22, 2021, 11:15 PM

#

centuple-entendre jokes

grave frost Feb 22, 2021, 11:16 PM

#

https://www.cnet.com/news/what-happens-when-ai-bots-invent-their-own-language/

CNET

Facebook puts cork in chatbots that created a secret language

Alice and Bob, the two bots, raise questions about the future of artificial intelligence.

astral path Feb 22, 2021, 11:16 PM

#

http://www.ucolick.org/~enrico/ast111/material_files/libraryofbabel.pdf

#

could be of related interest

#

fictional short story but with massive societal implications

exotic maple Feb 22, 2021, 11:17 PM

#

reddit and twitter are basically 50% bots

#

jerking off and upvoting each other

#

and creating trends out of nowhere

astral path Feb 22, 2021, 11:17 PM

#

exotic maple jerking off and upvoting each other

literally sometimes

exotic maple Feb 22, 2021, 11:17 PM

#

change my mind

astral path Feb 22, 2021, 11:17 PM

#

https://www.reddit.com/r/all/new/

r/all

Today’s top content from hundreds of thousands of Reddit communities.

#

at least 1/3 of new reddit posts are porn bots

#

oh god not marge

misty flint Feb 22, 2021, 11:19 PM

#

grave frost https://www.cnet.com/news/what-happens-when-ai-bots-invent-their-own-language/

Pika

#

amegablobsweats

#

big yikes moment

grave frost Feb 22, 2021, 11:19 PM

#

Nothing serious tho

#

Right now our Ai is not that advanced. It's just an effective mode of communication

#

https://towardsdatascience.com/the-truth-behind-facebook-ai-inventing-a-new-language-37c5d680e5a7

Medium

The truth behind Facebook AI inventing a new language

There have been so many articles published about Facebook shutting down its robots after they developed their own language. The media is…

misty flint Feb 22, 2021, 11:23 PM

#

~~for now~~

#

but yeah i getcha

#

semi-related

#

theres this poster thing at my uni, and i wanna do a poster about ai and ethics/society but idk what would be interesting to others

grave frost Feb 22, 2021, 11:24 PM

#

https://www.youtube.com/watch?v=WnzlbyTZsQY
The comments are pure gold 😁

YouTube

CornellCCSL

AI vs. AI. Two chatbots talking to each other

Are you a Robot or a Unicorn? Let the world know: http://yosinski.com/IAmAUnicorn/

What happens when you let two bots have a conversation? We certainly never expected this... (More: http://creativemachines.cornell.edu/AI-vs-AI)

By Igor Labutov, Jason Yosinski, and Hod Lipson of the Cornell Creative Machines Lab (http://creativemachine...


misty flint Feb 22, 2021, 11:25 PM

#

ID_BoomKek

#

what is this

exotic maple Feb 22, 2021, 11:28 PM

#

the one thing I havent been able to find in a "simple" understandable way is the theory behind some of the ML algorithms

#

Like, yeah I 99% care about the applied part, but Im curious about the theory too lol 😦

grave frost Feb 22, 2021, 11:31 PM

#

Ofc you can learn about theory - or get an idea at least if you watch 3B1B

#

But the real challenge is how exactly does a model learn?

exotic maple Feb 22, 2021, 11:32 PM

#

grave frost Ofc you can learn about theory - or get an idea at least if you watch 3B1B

3B1B?

grave frost Feb 22, 2021, 11:32 PM

#

3Blue 1Brown - youtube channel

exotic maple Feb 22, 2021, 11:33 PM

#

depends on what you mean by "learn". for advanced stuff like image recognition or NLP I have no clue -yet-

velvet thorn Feb 22, 2021, 11:33 PM

#

grave frost But the real challenge is how exactly does a model learn?

hm

#

which algorithm specifically

misty flint Feb 22, 2021, 11:33 PM

#

grave frost But the real challenge is how exactly does a model learn?

lol that video was hilarious

grave frost Feb 22, 2021, 11:33 PM

#

No, I mean interpretability of Black Box models. Its a pretty hot research topic

misty flint Feb 22, 2021, 11:33 PM

#

i thought for AI like this its a lot of reinforcement learning

grave frost Feb 22, 2021, 11:34 PM

#

misty flint i thought for AI like this its a lot of reinforcement learning

no, they are different

misty flint Feb 22, 2021, 11:34 PM

#

wait what are we talking about

#

ID_BoomKek

velvet thorn Feb 22, 2021, 11:34 PM

#

grave frost No, I mean interpretability of Black Box models. Its a pretty hot research topic

it is

#

not sure if "how does a model learn" is exactly the same thing though

grave frost Feb 22, 2021, 11:34 PM

#

The underlying RL structure has MDP (Markov Decision Processes) at its core. Normal ML is basically supervised .... function mapping? dunno the exact term lolol

velvet thorn Feb 22, 2021, 11:35 PM

#

the techniques that have been developed to interpret DL models are p cool though

grave frost Feb 22, 2021, 11:35 PM

#

velvet thorn the techniques that have been developed to interpret DL models are p cool though

true. Some visualizations are pretty cool

exotic maple Feb 22, 2021, 11:35 PM

#

That personally transcends me lol I'm more interested in the ways to apply ML / DL / NLP in real life

velvet thorn Feb 22, 2021, 11:35 PM

#

grave frost The underlying RL structure has MDP (Markov Decision Processes) at its core. Nor...

not necessarily

#

most commonly, yes

astral path Feb 22, 2021, 11:35 PM

#

woahhh

exotic maple Feb 22, 2021, 11:35 PM

#

theory is cool, but its not my thing

astral path Feb 22, 2021, 11:35 PM

#

huge heatmap zoomed way out

exotic maple Feb 22, 2021, 11:36 PM

#

excellent example of "WTF IS THAT CHART?"

grave frost Feb 22, 2021, 11:36 PM

#

@velvet thorn Hmm.. weren't all RL based on MDP? Sorry if I am behind times 🤷 I only know basic theory

exotic maple Feb 22, 2021, 11:36 PM

#

I have no idea of what that heatmap is lol

misty flint Feb 22, 2021, 11:36 PM

#

uhh explainability. the only thing i know is this is a cool tool: https://github.com/slundberg/shap

GitHub

slundberg/shap

A game theoretic approach to explain the output of any machine learning model. - slundberg/shap

velvet thorn Feb 22, 2021, 11:36 PM

#

grave frost @velvet thorn Hmm.. weren't all RL based on MDP? Sorry if I am behind t...

most are

#

but

grave frost Feb 22, 2021, 11:37 PM

#

A3C maybe? I dont get it 🙂

velvet thorn Feb 22, 2021, 11:38 PM

#

it's just like how there's DL without gradient descent

#

(admittedly a less "out there" case)

grave frost Feb 22, 2021, 11:38 PM

#

velvet thorn it's just like how there's DL without gradient descent

Hmmm....

exotic maple Feb 22, 2021, 11:39 PM

#

iron basalt Feb 22, 2021, 11:39 PM

#

@grave frost RL has a ton of methods, and an obvious example of non-MDP is when they are solving a partially observable MDP (POMDP), which is a much more interesting (and much more difficult) problem.

grave frost Feb 22, 2021, 11:39 PM

#

I stopped learning about RL cuz I don't think it's very useful for real-world application. mostly for playing games. The ones that are useful require a million lines of code with no libs

grave frost Feb 22, 2021, 11:39 PM

#

iron basalt @grave frost RL has a ton of methods, and an obvious example of non-MD...

complex env?

misty flint Feb 22, 2021, 11:39 PM

#

isnt RL used heavily in robotics?

#

pithink

velvet thorn Feb 22, 2021, 11:39 PM

#

misty flint isnt RL used heavily in robotics?

that's one application, yes

misty flint Feb 22, 2021, 11:40 PM

#

thats the only one ik ID_BoomKek

grave frost Feb 22, 2021, 11:40 PM

#

evolution

iron basalt Feb 22, 2021, 11:40 PM

#

@grave frost Reality is only partially observable.

velvet thorn Feb 22, 2021, 11:40 PM

#

grave frost evolution

do you mean genetic algorithms?

misty flint Feb 22, 2021, 11:40 PM

#

i also recently learned about that

#

very interesting way

grave frost Feb 22, 2021, 11:40 PM

#

velvet thorn do you mean genetic algorithms?

yep. those too, but even the simple ones can replicate simple evolutionary processes

misty flint Feb 22, 2021, 11:40 PM

#

of doing things

grave frost Feb 22, 2021, 11:41 PM

#

carykh does that kinda things - simple RL algo for simulating animals, development and such

exotic maple Feb 22, 2021, 11:42 PM

#

Is there a way to add inline code in discord?

misty flint Feb 22, 2021, 11:42 PM

#

backticks

velvet thorn Feb 22, 2021, 11:44 PM

#

`code` -> code

grave frost Feb 22, 2021, 11:44 PM

#

Haha 🤣

iron basalt Feb 22, 2021, 11:44 PM

#

@grave frost One thing to note is that RL is normal ML. RL, supervised, unsupervised are all ML, just supervised is typically the most immediately applicable one.

grave frost Feb 22, 2021, 11:45 PM

#

iron basalt @grave frost One thing to note is that RL is normal ML. RL, supervised...

Hmmm.... debatable

iron basalt Feb 22, 2021, 11:45 PM

#

I'm pretty sure that is just kind of set in stone, but ok

velvet thorn Feb 22, 2021, 11:45 PM

#

grave frost Hmmm.... debatable

can you elaborate

misty flint Feb 22, 2021, 11:45 PM

#

what else would Reinforcement Learning be if its not Machine Learning?

#

pithink

#

most places ive read place RL under ML

#

unless you consider it under the umbrella term AI instead

grave frost Feb 22, 2021, 11:47 PM

#

velvet thorn can you elaborate

I think it's more like an algo that maximizes reward. ML to me seems a bit.... Well, its kinda similar but still 🥴

iron basalt Feb 22, 2021, 11:47 PM

#

grave frost Feb 22, 2021, 11:47 PM

#

yeah, well I guess its the same 🤷

iron basalt Feb 22, 2021, 11:47 PM

#

wikipedia machine learning page

misty flint Feb 22, 2021, 11:47 PM

#

yeah its like the black sheep but it happens

iron basalt Feb 22, 2021, 11:48 PM

#

RL is arguably the most difficult and would be most essential to creating AI. But that means that few people bother with it because it can't be immediately applied most of the time (which means there is no money in it).

misty flint Feb 22, 2021, 11:49 PM

#

to creating General AI?

#

you mean?

grave frost Feb 22, 2021, 11:49 PM

#

iron basalt RL is arguably the most difficult and would be most essential to creating AI. Bu...

nah, its just that the translation from simulated environments to real world is much more jittery than supervised or unsup

iron basalt Feb 22, 2021, 11:49 PM

#

I recommend https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf if you are interested. It's the goto introduction.

grave frost Feb 22, 2021, 11:50 PM

#

grave frost nah, its just that the translation from simulated environments to real world is ...

OpenAI has some research for such translation but the tasks are too narrow to be very useful

iron basalt Feb 22, 2021, 11:50 PM

#

@grave frost Ideally for AI you need to be able to do RL online in the real world, not in a simulation (like a human).

grave frost Feb 22, 2021, 11:50 PM

#

iron basalt @grave frost Ideally for AI you need to be able to do RL online in the...

ofc, that is the end goal. its just not viable with current tech

#

atleast not commercially

iron basalt Feb 22, 2021, 11:50 PM

#

It's getting pretty close, just not popularized.

grave frost Feb 22, 2021, 11:51 PM

#

iron basalt It's getting pretty close, just not popularized.

I think it's pretty popular 🤷 people use it for games all the time 🙂 just less for real-world applications

iron basalt Feb 22, 2021, 11:51 PM

#

I don't see too many games with RL.

#

Or AI at all.

#

Game AI is a different thing, it does not learn.

#

Perhaps you mean Dynamic Programming? That is part of ML / a technique that can be used, but it's not RL.

grave frost Feb 22, 2021, 11:53 PM

#

iron basalt Perhaps you mean Dynamic Programming? That is part of ML / a technique that can ...

I don't know dynamic programming. what's that??

grave frost Feb 22, 2021, 11:54 PM

#

iron basalt Game AI is a different thing, it does not learn.

it does actually

iron basalt Feb 22, 2021, 11:54 PM

#

It's the general idea of breaking down a problem into subproblems, solving those, combining those answers and so on.
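A tiny illustration of that idea, outside of any RL setting: a memoized minimum-coin-change function. The function name and coin values are made up for the example. The point is only that each subproblem (`remaining - coin`) is solved once and its answer is reused. In the RL book linked above, "dynamic programming" refers to methods such as value iteration, which apply the same principle to value functions.

```python
from functools import lru_cache

def min_coins(amount, coins=(1, 5, 10, 25)):
    """Fewest coins needed to make `amount`, or None if it cannot be made."""

    @lru_cache(maxsize=None)
    def solve(remaining):
        if remaining == 0:
            return 0
        best = None
        for coin in coins:
            if coin <= remaining:
                sub = solve(remaining - coin)  # answer to a smaller subproblem
                if sub is not None and (best is None or sub + 1 < best):
                    best = sub + 1
        return best

    return solve(amount)

print(min_coins(63))
```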

velvet thorn Feb 22, 2021, 11:54 PM

#

iron basalt Game AI is a different thing, it does not learn.

some do

iron basalt Feb 22, 2021, 11:55 PM

#

Yeah some do, but I don't see it too often.

grave frost Feb 22, 2021, 11:55 PM

#

iron basalt It's the general idea of breaking down a problem into subproblems, solving those...

nah, that's not the case with AI for DOTA

iron basalt Feb 22, 2021, 11:55 PM

#

Game AI usually refers to the traditional stuff like GOAP. Or a chess bot.

grave frost Feb 22, 2021, 11:55 PM

#

Games like DOTA are way too complex to be broken into anything

iron basalt Feb 22, 2021, 11:56 PM

#

You mean like how DL was used to make a DOTA bot?

#

Yeah that is a more recent development in game design.

grave frost Feb 22, 2021, 11:56 PM

#

The model does real-time inferencing and partial learning (IDK how exactly does that work) to study the players patterns and counter

#

maybe partial learning is kinda like supervised fine-tuning? who knows

iron basalt Feb 22, 2021, 11:57 PM

#

However, the DL bots in DOTA don't learn, they only do inference, which is much easier than doing online learning.

#

(DL requires lots of iterations to learn things so it's unsuitable for online learning)

#

(and other reasons)

grave frost Feb 22, 2021, 11:58 PM

#

iron basalt (DL requires lots of iterations to learn things so it's unsuitable for online le...

ikr

#

but I don't see why an architecture cannot store the players moves into memory to be interpreted by a part of NN (Like previous timesteps) to predict future moves 🤷 A naive workaround

velvet thorn Feb 22, 2021, 11:59 PM

#

grave frost but I don't see why an architecture cannot store the players moves into memory t...

so you're saying

#

log what's happening

#

but don't use it for training

#

right then?

grave frost Feb 22, 2021, 11:59 PM

#

Just an idea bud

#

I don't know what OpenAi has done. But by that, I meant that the model can be trained to learn this new piece of info and try to predict the players moves and adjust accordingly

velvet thorn Feb 23, 2021, 12:00 AM

#

🥴

grave frost Feb 23, 2021, 12:00 AM

#

not exactly real-time learning, but you can't expect me to come up with an idea right now

#

🙂

#

honestly, even if the model does better job than human without real-time learning, I doubt it would make much difference on the job to be done

iron basalt Feb 23, 2021, 12:02 AM

#

The DL bots do prediction, but they 1. do it with the exact position of the players and they know where all the players are at all time (much easier than what a human need to do from screen pixels and limited knowledge of the game state). 2. they don't learn online, they do it later. 3. They often learn to abuse their ability to do super human timing (a human needs to slowly process the pixel data, and do a complex learned sequence of muscle commands to do things).

grave frost Feb 23, 2021, 12:03 AM

#

a human needs to slowly process the pixel data, and do a complex learned sequence of muscle commands to do things
That's such a big insult to reflexology lolo

iron basalt Feb 23, 2021, 12:03 AM

#

slowly compared to a computer

grave frost Feb 23, 2021, 12:03 AM

#

Gamers rely on their muscle memory which is the most efficient thing

iron basalt Feb 23, 2021, 12:03 AM

#

Computers are limited by the speed of light (memory transfer speeds).

grave frost Feb 23, 2021, 12:03 AM

#

iron basalt Computers are limited by the speed of light (memory transfer speeds).

But the computations

#

https://tenor.com/view/matthew-perry-chandler-bing-you-friends-pointing-gif-16589860

Tenor

iron basalt Feb 23, 2021, 12:04 AM

#

Modern computers are bottle-necked by the memory speed, not computations on that memory.

#

(see caching)

grave frost Feb 23, 2021, 12:05 AM

#

When you have to make computations that fast, often multiple devices are required CPU, GPU, RAM, etc. all the time taken to immediately process things adds overheads

#

you can't expect all computations to be placed on 1 device

#

its a colab b/w CPU and GPU (+ RAM)

#

that adds overhead. M1 tries to reduce that by a unified pool of memory (an Idea I very much like) but still, nothings out there for the M1 yet

misty flint Feb 23, 2021, 12:07 AM

#

well yeah. thats why you have distributed computing but i thought you guys were talking about supercomputers

#

ID_BoomKek

grave frost Feb 23, 2021, 12:07 AM

#

The more devices (gpus) you have, the more computations you have to do on which device to place tensors on

#

its not exactly a if/else

iron basalt Feb 23, 2021, 12:08 AM

#

If you did it distributed the latency would probably climb above 200ms which would make the human the winner.

misty flint Feb 23, 2021, 12:09 AM

#

so supercomputers only?

#

ValkNaruhodo

grave frost Feb 23, 2021, 12:09 AM

#

yup. and most models are very complex, so they have to be distributed

iron basalt Feb 23, 2021, 12:09 AM

#

It's a very different and more difficult task to have an AI actually run and learn online in real time on just one device (like a robot's single (and power consumption limited) processor).

grave frost Feb 23, 2021, 12:09 AM

#

but it is possible

misty flint Feb 23, 2021, 12:10 AM

#

online learning amegablobsweats

#

so. much. data

grave frost Feb 23, 2021, 12:10 AM

#

With AGI ¯_(ツ)_/¯

iron basalt Feb 23, 2021, 12:10 AM

#

You can also just not use DL and instead use much more efficient methods. DL is computationally inefficient like crazy.

grave frost Feb 23, 2021, 12:10 AM

#

iron basalt You can also just not use DL and instead use much more efficient methods. DL is ...

yeah, but its powerful too

#

Gotta go. adios

misty flint Feb 23, 2021, 12:15 AM

#

bye

#

waveboye

iron basalt Feb 23, 2021, 12:16 AM

#

int[] arr = new int[64 * 1024 * 1024];

// Loop 1
for (int i = 0; i < arr.Length; i++) arr[i] *= 3;

// Loop 2
for (int i = 0; i < arr.Length; i += 16) arr[i] *= 3;

#

For those wondering what I meant by memory speed being the limiter: these two for loops take roughly the same amount of time. The first does 16x more computation than the second, but they are both limited by memory speed (the CPU fetches a whole cache line at a time, typically 64 bytes, i.e. 16 ints).
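A rough numpy version of the same experiment, as a sketch only. The function name is made up, the array is smaller than in the snippet above to keep memory use modest, and the timings depend heavily on the machine, so the two numbers may not come out as close as in compiled code.

```python
import time
import numpy as np

def time_scaling(n, step):
    """Multiply every `step`-th element of an int32 array by 3; return (seconds, array)."""
    arr = np.ones(n, dtype=np.int32)
    start = time.perf_counter()
    arr[::step] *= 3          # step=1 touches every int, step=16 one int per 64 bytes
    elapsed = time.perf_counter() - start
    return elapsed, arr

n = 16 * 1024 * 1024          # 64 MB of int32, larger than typical CPU caches
for step in (1, 16):
    seconds, _ = time_scaling(n, step)
    print(f"step={step:2d}: {seconds * 1000:.1f} ms")
```

If the two timings are in the same ballpark despite the 16x difference in multiplications, that is consistent with the loop being limited by memory traffic rather than arithmetic.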

#

(Memory speed is also a huge issue on GPUs)

misty flint Feb 23, 2021, 12:26 AM

#

an increment operator. interesting

#

oh interesting

#

i see

iron basalt Feb 23, 2021, 12:27 AM

#

Yeah I could not use python code, because python is not running on the bare metal, it's really slow and kind of nullifies any speed gains you could have by abusing the ability to do a bunch of computation due to fetching.

misty flint Feb 23, 2021, 12:28 AM

#

makes sense

iron basalt Feb 23, 2021, 12:28 AM

#

numpy definitely abuses it (more specifically BLAS, which it just calls).

#

Numpy also uses vector operations so it would fetch let's say 16 ints, and 16 other ints, and then add those 16 ints to the other 16 in 1 operation (as opposed to 16 addition operations / operation level parallelism).
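To make the "16 ints plus 16 other ints" picture concrete, here is a small sketch with made-up arrays. The single expression `a + b` hands the whole loop to numpy's compiled code, which can use SIMD instructions where the CPU supports them. The explicit Python loop computes the same result one element at a time. As far as I know, BLAS is used for linear algebra such as matrix products, while element-wise arithmetic like this goes through numpy's own compiled loops.

```python
import numpy as np

a = np.arange(16, dtype=np.int32)        # 0, 1, ..., 15
b = np.full(16, 100, dtype=np.int32)     # sixteen copies of 100

# One array-level expression; the per-element loop runs in compiled code.
vector_sum = a + b

# Equivalent element-by-element version driven from Python.
loop_sum = np.empty_like(a)
for i in range(len(a)):
    loop_sum[i] = a[i] + b[i]

print(np.array_equal(vector_sum, loop_sum))
```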

iron basalt Feb 23, 2021, 12:59 AM

#

Btw, in the past, memory access was much faster relative to the time it took to do the operations. So in the past, things like linked lists actually made sense. Now everyone uses dense arrays even though you sometimes need to resize them (reallocate memory (slow)).

lapis sequoia Feb 23, 2021, 2:46 AM

#

cool

#

ok guys

#

If you guys use pycharm

#

then enter this code ok?

#

if you want to send automatic emails

vital ocean Feb 23, 2021, 2:59 AM

#

What's the code?

lapis sequoia Feb 23, 2021, 2:59 AM

#

ok

#

one min

vital ocean Feb 23, 2021, 3:00 AM

#

lapis sequoia if you want to send automatic emails

Why are you sending it in data science? You can send it in #python-discussion

#

It will be more cool 😄

lapis sequoia Feb 23, 2021, 3:01 AM

#

oh ok

coral ginkgo Feb 23, 2021, 3:05 AM

#

Tried searching on google but any communities I can join for SQL?

misty flint Feb 23, 2021, 3:47 AM

#

dunno but you can always ask sql stuff here or #databases

astral path Feb 23, 2021, 4:29 AM

#

so

#

i have two categorical variables, merchantID and brandID, which have a positive correlation of 0.11. I'm trying to somehow plot the correlation. I've chosen a heatmap for now to show correlation between different merchants based on brandID for my assignment, but I'm also looking for ways to visualize merchantID and brandID in a way that's useful. This is hard because both variables have thousands of categories, so a plot like a scatterplot looks extremely cluttered

#

e.g.

Ss30LiD9K7RDCDEU2fy14Ku016Hj20BXEjp6LgdMAPrFXPU7Sna3QFkCSE2CiHWIfsvDoRbgH8IIVYDzV8mhCTXBfifcv5kjuuTY.png

#

what should I do to visualize correlation between two categorical variables when they take on so many values?

#

should I just take a subset?

junior bane Feb 23, 2021, 4:41 AM

#

how can I pull data from spreadsheet in ETL?

misty flint Feb 23, 2021, 4:41 AM

#

out of these 9 subjects, what are the top 5 most important topics for data science? my guess would be 1) Programming, 2) Math for CS, 3) DS&A, 4) Databases, and 5) Distributed Systems

#

what do you guys think

astral path Feb 23, 2021, 4:43 AM

#

i would think programming, math for CS, databases, algo and data structures, and languages and compilers

misty flint Feb 23, 2021, 4:44 AM

#

you swapped languages and compilers for distributed systems?

#

hadoop, spark, etc. are so important for data science tho

#

esp for big data

astral path Feb 23, 2021, 4:45 AM

#

yeah, i think it would be wise to understand the languages and how they work, not very familiar with distributed systems

#

although if you think the other is a better option you should do that

misty flint Feb 23, 2021, 4:45 AM

#

ugh not enough time, too many things to learn

#

ill add it in as #6

#

DoggoKek

astral path Feb 23, 2021, 4:46 AM

#

oof nice

#

also i just ended up taking a sample

#

unfortunately, all that stuff earlier with pivoting was for nothing

#

cus the heatmap means nothing

misty flint Feb 23, 2021, 4:48 AM

#

hmm

#

do you have the dataset

#

or a link

#

let me see what happens when i upload it to tableau

astral path Feb 23, 2021, 4:48 AM

#

https://www.dropbox.com/s/9th2isoev1thgvj/current_farfetch_listings.csv?dl=1

#

uncleaned version

#

what's tableau?

misty flint Feb 23, 2021, 4:48 AM

#

just another data viz tool

#

lets see

astral path Feb 23, 2021, 4:53 AM

#

#

just a more clean version

misty flint Feb 23, 2021, 4:54 AM

#

rip just had a power surge gimme a sec

astral path Feb 23, 2021, 4:54 AM

#

ooooof

misty flint Feb 23, 2021, 5:01 AM

#

wait why are you comparing these two variables

#

i get brand id corresponds to the brand names

#

what does merchant id represent

astral path Feb 23, 2021, 5:02 AM

#

merchant id represents the merchant it's sold from

#

farfetch is a website that basically acts as a middle man for luxury boutiques which wouldn't otherwise have the reach to sell their products

#

i'm trying to find a correlation between merchants and brands

#

well

#

i used Cramer's V and found there's a correlation of 0.11 between them
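For reference, a minimal sketch of how Cramér's V can be computed, in plain Python so the arithmetic is visible. The function name and toy labels are made up, and this is the uncorrected formula V = sqrt(chi2 / (n * (min(r, c) - 1))), not necessarily how the 0.11 above was obtained. V measures strength of association between 0 and 1 and has no sign. With thousands of categories per variable the uncorrected value tends to be biased upward, so a bias-corrected variant may be worth a look.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of category labels (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))        # observed count per (x, y) pair
    x_totals = Counter(xs)              # row totals
    y_totals = Counter(ys)              # column totals

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy labels standing in for merchantID / brandID.
merchants = ["m1", "m1", "m2", "m2", "m3", "m3"]
brands    = ["b1", "b1", "b2", "b2", "b1", "b2"]
print(round(cramers_v(merchants, brands), 3))
```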

misty flint Feb 23, 2021, 5:06 AM

#

i see

#

also tableau hates your dataset

#

DoggoKek

astral path Feb 23, 2021, 5:06 AM

#

:(

#

what does it look like

misty flint Feb 23, 2021, 5:15 AM

#

it treats both as independent variables instead of trying to correlate them

astral path Feb 23, 2021, 5:16 AM

#

yeesh

misty flint Feb 23, 2021, 5:17 AM

#

its bc theyre both categorical technically

astral path Feb 23, 2021, 5:19 AM

#

here's another one with both categorical and numerical variables

misty flint Feb 23, 2021, 5:24 AM

#

pithink

#

interesting

astral path Feb 23, 2021, 5:26 AM

#

with line of best fit

#

doesnt look that great tbh

#

#

with smaller sample

misty flint Feb 23, 2021, 5:30 AM

#

yeah i tried some things too

#

nothing worked

#

DoggoKek

astral path Feb 23, 2021, 5:30 AM

#

¯_(ツ)_/¯

uncut bloom Feb 23, 2021, 5:50 AM

#

is the visualization important or knowing the groupings? I think of this as a simple collaborative filter, then tsne, draw bounds, grab 4 samples from each

misty flint Feb 23, 2021, 6:39 AM

#

interesting quote

#

But data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.

This is what leads to the big joke of how a data scientist is someone who knows more stats than a computer programmer and can program better than a statistician. What is this joke saying? It’s saying that a data scientist is someone who knows a little bit about two things.

#

this is the rest of the bit:

But I’d say they know about more than just two things. They also have to know to communicate. They also need to know more than just basic statistics; they’ve got to know probability, combinatorics, calculus, etc. Some visualization chops wouldn’t hurt. They also need to know how to push around data, use databases, and maybe even a little OR. There are a lot of things they need to know. And so it becomes really hard to find these people because they have to have touched a lot of disciplines and they have to be able to speak about their experience intelligently. It’s a tall order for any applicant.

#

john foreman from mailchimp

devout scroll Feb 23, 2021, 8:27 AM

#

Restarting my jupyter kernel does not reset variables. I'm using jupyter notebook in vs code. Does someone know what could cause this behaviour?

astral path Feb 23, 2021, 8:29 AM

#

how do I call sizes (normally a parameter for the seaborn scatterplot function) in scatter_kws in a regplot function?

stoic hollow Feb 23, 2021, 10:23 AM

#

which minor stream would be most ideal for data science

silk axle Feb 23, 2021, 10:45 AM

#

I'm currently using a pandas dataframe with scikit-learn LinearRegression as part of my ML program for predicting student grades:
```py
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Load the student data and keep only the feature and target columns
data: pd.DataFrame = pd.read_csv('./data/student-mat.csv', sep=";")
data = data[['sex', 'studytime', 'failures', 'schoolsup', 'paid', 'absences', 'G1', 'G2', 'G3']]
# Encode the binary string columns as 0/1
data = data.replace({'F': 0, 'M': 1, "no": 0, "yes": 1})

to_predict = "G3"
X = np.array(data.drop(columns=[to_predict]))
y = np.array(data[to_predict])

# Hold out 10% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

linear = linear_model.LinearRegression()
linear.fit(X_train, y_train)
```
How can I apply certain rules to `linear.fit()`?

For more context, I have two previous test scores ("G1" and "G2", values of 0-20) of each student where 0 means they didn't take the test. I need to implement logic so that when both scores are 0, it'll predict 0 (since it can't make a prediction), when one score is 0 it'll ignore that score and predict on the other score, and when neither are 0 it'll just do as normal --- I need to ignore the scores that are 0 since it means the student didn't take the test and so I can't predict a test score based off it
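One possible approach, sketched under assumptions: as far as I know `fit()` has no hook for rules like these, so the logic can live around the model instead. The idea is to route each student to one of three predictors depending on which earlier tests were taken. The function name and the three `predict_*` callables are hypothetical. In practice they could wrap three separately fitted regressions (all features, without G2, without G1).

```python
def predict_g3(g1, g2, other_features, predict_full, predict_g1_only, predict_g2_only):
    """Route one student to a predictor based on which earlier tests were taken."""
    if g1 == 0 and g2 == 0:
        return 0.0                                  # no test taken: nothing to predict from
    if g1 == 0:
        return predict_g2_only(other_features, g2)  # ignore the missing G1
    if g2 == 0:
        return predict_g1_only(other_features, g1)  # ignore the missing G2
    return predict_full(other_features, g1, g2)

# Stand-in predictors so the sketch runs on its own; real ones would call model.predict.
full = lambda feats, g1, g2: 0.5 * g1 + 0.5 * g2
g1_only = lambda feats, g1: float(g1)
g2_only = lambda feats, g2: float(g2)

print(predict_g3(0, 0, [], full, g1_only, g2_only))
print(predict_g3(0, 14, [], full, g1_only, g2_only))
print(predict_g3(12, 14, [], full, g1_only, g2_only))
```

Each of the three models would be trained only on rows where its inputs are real scores, so a 0 that means "did not take the test" never reaches a model as if it were a grade.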

earnest ibex Feb 23, 2021, 11:25 AM

#

Hi Guys!!

heady tide Feb 23, 2021, 12:12 PM

#

What is the best platform for training NNs on the cloud ?

lilac geyser Feb 23, 2021, 1:49 PM

#

Hello
Recently I was going through hypothesis testing.
Here is what I understood after listening to the introduction to hypothesis testing.

Basically, hypothesis testing works like this:
we take sample data from the population and try to estimate the population parameters. Whatever we conclude about the population parameters from the sample data is then judged, and we decide whether to reject it or not...
This process is known as hypothesis testing.
Is my understanding correct?
Please help me!

#

Please @ me🙏

solar phoenix Feb 23, 2021, 2:51 PM

#

Does anyone know how i can change the name of a dataframe to an item i have in a list

#

I have 6 dataframes and 6 items in my list

#

and i want to name the dataframes those items

wild dome Feb 23, 2021, 3:06 PM

#

Trying to convert a Jupyter notebook to PDF throws the following trace error:

#

https://paste.mod.gg/raw/nogufeqaco

#

pls help I just want a PDF hahaha :(

velvet thorn Feb 23, 2021, 3:18 PM

#

solar phoenix Does anyone know how i can change the name of a dataframe to an item i have in a...

rename with zip
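
A rough sketch of what "rename with zip" could look like, assuming the goal is to look each dataframe up by its label. A dataframe has no name of its own to change, so the usual pattern is a dict keyed by the list items. The labels and the tiny stand-in frames below are made up for illustration.

```python
import pandas as pd

# Hypothetical labels and stand-in dataframes (the real ones aren't shown in the chat)
names = ["north", "south", "east"]
frames = [pd.DataFrame({"x": [i]}) for i in range(3)]

# Pair each label with its dataframe; order in both lists must match
named = dict(zip(names, frames))

for name, frame in named.items():
    print(name, frame.shape)
```

After this, `named["south"]` gives back the second dataframe, which avoids creating variables dynamically.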

lapis sequoia Feb 23, 2021, 3:19 PM

#

wassap

#

u guys use R?

velvet thorn Feb 23, 2021, 3:22 PM

#

you know this is a Python channel, right

lapis sequoia Feb 23, 2021, 3:22 PM

#

oh right tho i was gonna ask something about the relationship of R and python

velvet thorn Feb 23, 2021, 3:23 PM

#

sure

#

go ahead

lapis sequoia Feb 23, 2021, 3:23 PM

#

like can i put some r files in my python program? or hmmm?

velvet thorn Feb 23, 2021, 3:25 PM

#

lapis sequoia like can i put some r files in my python program? or hmmm?

uh

#

well

lapis sequoia Feb 23, 2021, 3:25 PM

#

like lets say I have a bunch of data and stats and graphs in my r script, and some functions like which and i wanna use that in my py code like can i do that?

velvet thorn Feb 23, 2021, 3:25 PM

#

I mean, you could put them in

#

if you expect to be able to run them

#

that's more complicated

lapis sequoia Feb 23, 2021, 3:25 PM

#

aaah hmmm

#

ok ok

velvet thorn Feb 23, 2021, 3:25 PM

#

look into foreign function interfaces

proper swift Feb 23, 2021, 3:26 PM

#

Hi I have a question, can anyone help with running a Linear Regression in Python? I have a dataset that has 5 categorical variables and 1 dependent variable, and im a little bit confused on how best to do this. Before learning Python, i did this sort of stuff in SPSS

wild dome Feb 23, 2021, 4:04 PM

#

wild dome Trying to convert a Jupyter notebook to PDF throws the following trace error:

does anybody know why this is happening

#

do I have to uninstall Python and reinstall it for all users???

#

or guide me if this is not the channel for this question

misty flint Feb 23, 2021, 4:34 PM

#

proper swift Hi I have a question, can anyone help with running a Linear Regression in Python...

take a look at this link and see if it applies to your situation https://matplotlib.org/stable/gallery/lines_bars_and_markers/categorical_variables.html
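
The linked gallery page is about plotting categorical data. For fitting the regression itself, one common approach is to one-hot encode the categorical predictors first. Below is a minimal sketch under that assumption; the column names and values are invented, since the actual dataset isn't shown.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data: two categorical predictors and one numeric dependent variable
df = pd.DataFrame({
    "region": ["a", "b", "c", "a", "b", "c"],
    "plan": ["x", "x", "y", "y", "x", "y"],
    "score": [1.0, 2.0, 4.0, 2.0, 2.0, 4.0],
})

# One-hot encode; drop_first avoids a redundant column per variable
X = pd.get_dummies(df[["region", "plan"]], drop_first=True, dtype=float)

model = LinearRegression().fit(X, df["score"])

# Each coefficient is the shift relative to the dropped (baseline) category
coefs = dict(zip(X.columns, model.coef_))
print(coefs)
```

The dropped categories (`region_a`, `plan_x` here) act as the baseline, similar to how a reference category works for dummy-coded predictors in SPSS.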

serene scaffold Feb 23, 2021, 5:06 PM

#

I have a bunch of dataframes like this:

tag,precision,recall,f1
ADE,0.106,0.062,0.078
Dosage,0.788,0.804,0.796
Drug,0.534,0.655,0.588
Duration,0.674,0.609,0.640
Form,0.800,0.845,0.822
Frequency,0.668,0.725,0.695
Reason,0.250,0.259,0.254
Route,0.759,0.730,0.745
Strength,0.767,0.828,0.796

I want to name each dataframe and get the "argmax" of all the frames. So if dataframe A has the highest value for (Form, recall), I want that cell to be 'A' in the resulting frame. I heard it suggested that one use a multiindex but it's not clear to me from the docs how it could be used for that.

short heart Feb 23, 2021, 5:09 PM

#

need help with tensorflow ASAP

serene scaffold Feb 23, 2021, 5:11 PM

#

short heart need help with tensorflow ASAP

your best bet is to jump right in to describing what kind of help you need.

short heart Feb 23, 2021, 5:11 PM

#

yeah hold on

#

what is train_function error

#

Function call stack:
train_function

abstract zealot Feb 23, 2021, 5:12 PM

#

Anyone here familiar with chi square goodness of fit?

#

I have a problem with expected values that im generating that are far too small

#

basically im generating these values:```py
val = np.array([abs(int(e)) for e in norm.rvs(loc=1800, scale=2000, size=25, random_state=144)])
```

#

Ill send you the rest of the code if you can help 😄

grave frost Feb 23, 2021, 5:43 PM

#

@short heart post the whole traceback

carmine iron Feb 23, 2021, 5:48 PM

#

is there any finance / quant here. Having trouble understanding what a holding vector is .
Given a ticker of one of the stocks, compute the holdings vector h ∈ R^3 for the unique stock portfolio that is both dollar and beta neutral and has unit exposure to the specified stock

verbal jetty Feb 23, 2021, 5:49 PM

#

im completely new to scripts, so maybe im even in the wrong channel. but after installing the script from https://github.com/andrewning/sortphotos im not able to run it, and i dont even know where to start

GitHub

andrewning/sortphotos

SortPhotos is a Python script that organizes photos and videos into folders using date/time information - andrewning/sortphotos

severe python Feb 23, 2021, 5:55 PM

#

@carmine iron have never heard of a "holdings vector" before

#

@iron basalt you there? want to bounce some ideas off you

carmine iron Feb 23, 2021, 5:57 PM

#

@severe python yeah me neither...i think it has to do with covariance matrix and optimizing the portfolio for each beta under/over 1

#

but honestly i could be way off. its all linear algebra

severe python Feb 23, 2021, 6:00 PM

#

carmine iron @severe python yeah me neither...i think it has to do with covariance m...

that would make sense, not sure why it couldn't have been said in simple terms

carmine iron Feb 23, 2021, 6:01 PM

#

well even with that understanding, i am still stuck on how to proceed. All i have is a df with date index, benchmark of SPX, then three unique stocks

quaint kelp Feb 23, 2021, 6:04 PM

#

Has anybody here encountered this error?:

severe python Feb 23, 2021, 6:04 PM

#

hmm, so you're not constructing a portfolio, you're evaluating a company's exposure to USD and their beta in general? @carmine iron

carmine iron Feb 23, 2021, 6:06 PM

#

SPX is the proxy, it will be a portfolio of the three stocks given not including SPX. After the holdings vector is found, i need to calculate daily PnL

#

i believe the exposure should be to SPX

#

unit_exposure is an argument in the function

severe python Feb 23, 2021, 6:09 PM

#

ah i see, i understand the concept but can't really apply it to python because i'm relatively new to it

carmine iron Feb 23, 2021, 6:10 PM

#

@severe python lets work together! whats the concept.

#

even if talking in terms of excel, i dont have too much time left to figure this one out

quaint kelp Feb 23, 2021, 6:12 PM

#

quaint kelp Has anybody here encountered this error?:

anyone know how to fix this?

proper swift Feb 23, 2021, 6:13 PM

#

misty flint take a look at this link and see if it applies to your situation https://matplot...

will look into it thanks

severe python Feb 23, 2021, 6:18 PM

#

carmine iron @severe python lets work together! whats the concept.

in layman's terms, you are looking for exposure of one stock to SPX. could use the covariance function between avg return of the stock given a period vs SPX variance (use over 3yrs or so) to get beta. or if they want you to use linear regression you could do that

#

then you could filter with IF beta over/under 1 then .... whatever. not sure if you are looking for currency exposure as well but could do something similar would have to google. not sure that any of that helps
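
One possible reading of the problem statement, sketched under assumptions rather than as the intended solution: with three stocks, "dollar neutral" (holdings sum to 0), "beta neutral" (beta-weighted holdings sum to 0) and "unit exposure to the specified stock" (that stock's holding equals 1) give three linear equations in three unknowns, so h comes from one linear solve. The function names and the example betas are hypothetical.

```python
import numpy as np


def estimate_beta(stock_returns, benchmark_returns):
    """Beta = cov(stock, benchmark) / var(benchmark), from aligned return series."""
    cov = np.cov(stock_returns, benchmark_returns)  # 2x2 sample covariance matrix
    return cov[0, 1] / cov[1, 1]


def holdings_vector(betas, target, unit_exposure=1.0):
    """Solve for h with sum(h) = 0, betas @ h = 0 and h[target] = unit_exposure.

    Assumes exactly three stocks, so the system is square (3 equations, 3 unknowns).
    """
    betas = np.asarray(betas, dtype=float)
    n = betas.size
    pick = np.zeros(n)
    pick[target] = 1.0
    A = np.vstack([np.ones(n), betas, pick])  # one row per constraint
    b = np.array([0.0, 0.0, unit_exposure])
    return np.linalg.solve(A, b)


# Hypothetical betas versus SPX for the three stocks; the first stock is the specified one
h = holdings_vector([1.2, 0.8, 1.0], target=0)
print(h)
```

If this reading is right, daily PnL would be the dot product of `h` with each day's vector of stock returns. The solve fails when the two non-specified stocks have the same beta, because the constraints are then no longer independent.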

carmine iron Feb 23, 2021, 6:26 PM

#

thanks that does!

simple torrent Feb 23, 2021, 7:04 PM

#

i can't figure out how to run spark on jupyter notebook. Please help

lapis sequoia Feb 23, 2021, 7:39 PM

#

hello i am new to data science and i am learning it

#

any tips i can use ?

hollow sentinel Feb 23, 2021, 7:40 PM

#

Kaggle

lapis sequoia Feb 23, 2021, 7:41 PM

#

and ?

misty flint Feb 23, 2021, 7:42 PM

#

lapis sequoia any tips i can use ?

take a look at some of the vids from this playlist https://youtube.com/playlist?list=PLtqF5YXg7GLlHv-pD8PVu6NFqjwG-_U-s

YouTube

How to learn data science

A playlist of videos on how to learn data science curated from all of YouTube.

lapis sequoia Feb 23, 2021, 7:42 PM

#

thanks

misty flint Feb 23, 2021, 7:42 PM

#

np

ripe forge Feb 23, 2021, 8:12 PM

#

serene scaffold I have a bunch of dataframes like this: tag,precision,recall,f1 ADE,0.106...

One time operation or do you plan to do this many times? I don't know about multi index, but for a one off I'd just iterate. And for many times perhaps just make a 3d array of just precision recall f1, and store the tag externally. I'm assuming tag stays same in same order for all df.

serene scaffold Feb 23, 2021, 8:13 PM

#

ripe forge One time operation or do you plan to do this many times? I don't know about mult...

I'd like a generalized solution because I'll probably need it multiple times.

ripe forge Feb 23, 2021, 8:14 PM

#

Does the assumption about the number of rows and the tag column hold?

serene scaffold Feb 23, 2021, 8:16 PM

#

ripe forge Does the assumption about the number of rows and the tag column hold?

You can assume that every dataframe will have identical sets of indices and columns and I don't care if violating that assumption has unpredictable behavior

restive obsidian Feb 23, 2021, 8:20 PM

#

how to get row count in pandas with chunksize?
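
A small sketch for this, assuming the data is read with `pd.read_csv(..., chunksize=...)`: that call returns an iterator of dataframes, so the total row count is the sum of the chunk lengths. The helper name `count_rows` and the in-memory CSV are just for illustration.

```python
import io

import pandas as pd


def count_rows(source, chunksize=100_000):
    """Count data rows by summing the length of each chunk; only one chunk is in memory at a time."""
    return sum(len(chunk) for chunk in pd.read_csv(source, chunksize=chunksize))


# Stand-in for a large file: a header line plus 10 data rows
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
total = count_rows(io.StringIO(csv_text), chunksize=3)
print(total)
```

A file path works in place of the `io.StringIO` object.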

ripe forge Feb 23, 2021, 8:23 PM

#

Ah then yeah, my knee jerk reaction is to make a 3d array

#

Shape (num_dataframes, rows, cols) and then just freely use numpy operations after slicing for a single row
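
The 3d-array idea could be sketched roughly like this, assuming every dataframe has the same index and columns (as stated above) and that `tag` has already been made the index, e.g. with `set_index("tag")`. The function name `argmax_frame` and the two small example frames are made up.

```python
import numpy as np
import pandas as pd


def argmax_frame(frames):
    """Given {name: dataframe}, return a frame whose cells hold the name of the max-valued frame."""
    names = list(frames)
    first = frames[names[0]]
    # Stack into shape (num_dataframes, rows, cols)
    stacked = np.stack([frames[name].to_numpy() for name in names])
    winners = stacked.argmax(axis=0)  # position of the best frame for each cell
    labels = np.array(names)[winners]  # map positions back to frame names
    return pd.DataFrame(labels, index=first.index, columns=first.columns)


# Made-up frames with 'tag' values as the index
a = pd.DataFrame({"precision": [0.8, 0.1], "recall": [0.845, 0.2]}, index=["Form", "ADE"])
b = pd.DataFrame({"precision": [0.7, 0.3], "recall": [0.80, 0.1]}, index=["Form", "ADE"])

best = argmax_frame({"A": a, "B": b})
print(best)
```

On ties, `argmax` picks the first frame in dict order.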

rare ice Feb 23, 2021, 8:30 PM

#

I am using Apache Spark (specifically, pyspark) for some data processing. I noticed that syntax for "case when" and for "when otherwise" is very different and they are used differently Can someone explain the pros/cons of each method? Thanks!

#

It looks like the "case when" approach is potentially harder to use and debug because you are writing a big expression string

real wigeon Feb 23, 2021, 8:32 PM

#

hey folks, maybe you can chime in when available. I'm using sqlalchemy to pull some data with data = db.session.query, with a whole bunch of questions, a few of which contain datetime values

#

i then go on to drop that data in a pandas df, so i can prep it for conversion to xlsx

#

df = pd.DataFrame(data, columns=['upload_timestamp',