#data-science-and-ml | Python | Page 296

serene scaffold Mar 16, 2021, 12:40 AM

#

some people assume that Python won't do math the way they expect if they see 5.0 and not 5.000000000000000000000000000

exotic maple Mar 16, 2021, 12:51 AM

#

really? i didnt know that at all

#

so inside the env i should use conda install, not pip install?

loud finch Mar 16, 2021, 12:53 AM

#

Yes always conda install

#

If you cant run it that way, run it in other env

#

Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of these issues stem from that fact that conda, like other package managers, has limited abilities to control packages it did not install. Running conda after pip has the potential to overwrite and potentially break packages installed via pip. Similarly, pip may upgrade or remove a package which a conda-installed package requires. In some cases these breakages are cosmetic, where a few files are present that should have been removed, but in other cases the environment may evolve into an unusable state.

#

https://www.anaconda.com/blog/using-pip-in-a-conda-environment

Anaconda

Anaconda | Using Pip in a Conda Environment


misty flint Mar 16, 2021, 1:00 AM

#

oops

#

i use pip install when ive activated an env

#

RunFail

exotic maple Mar 16, 2021, 1:02 AM

#

misty flint i use pip install when ive activated an env

me too lmao. rip env

#

one always learns something

#

-destroys env-

misty flint Mar 16, 2021, 1:03 AM

#

but now i know

#

hey its better than not building in an env

#

DoggoKek

velvet thorn Mar 16, 2021, 1:08 AM

#

loud finch Yes always conda install

honestly

#

it doesn’t really matter for smaller environments where all dependencies are Python

#

for convenience you can just use pip

#

problems arise when you keep mixing the two over time

#

which you can generally avoid with a bit of planning

misty flint Mar 16, 2021, 1:09 AM

#

pithink

velvet thorn Mar 16, 2021, 1:09 AM

#

although using conda all the time is the safe play

bitter harbor Mar 16, 2021, 1:12 AM

#

do you happen to know where/how this is done?

misty flint Mar 16, 2021, 1:35 AM

#

thanks for the insight gm

#

ok_handbutflipped

misty flint Mar 16, 2021, 1:36 AM

#

bitter harbor do you happen to know where/how this is done?

your name = dsylexia²

#

blobhyperthink

bitter harbor Mar 16, 2021, 1:36 AM

#

ik I keep forgetting how bad it looks

misty flint Mar 16, 2021, 1:36 AM

#

DoggoKek

#

immunity

misty flint Mar 16, 2021, 1:56 AM

#

sounds like you want a scatterplot instead

#

which is a different plot

lavish tundra Mar 16, 2021, 1:57 AM

#

but i want to keep the lines

#

i was thinking about drawing another plot, hiding the legend of the lineplot, and trying to hide the other plot, but i think rendering 2 plots will cost too much performance

exotic maple Mar 16, 2021, 2:07 AM

#

at that point its easier to cast it with matplotlib

#

plt.plot(x, 'ro') and so on

misty flint Mar 16, 2021, 2:32 AM

#

thats what id do but only bc im not as familiar with seaborn

exotic maple Mar 16, 2021, 2:35 AM

#

seaborn is default gorgeous but sometimes you want something a biiit more specific

#

and you end up accessing the matplotlib background anyways

misty flint Mar 16, 2021, 2:36 AM

#

~~this is where R's ggplot or plotly would shine~~

#

RunFail

bitter harbor Mar 16, 2021, 2:38 AM

#

honestly the 1 and only thing I miss about R is the plotting

misty flint Mar 16, 2021, 2:39 AM

#

~~better python viz libraries when~~

#

RunFail

bitter harbor Mar 16, 2021, 2:40 AM

#

just python 4 things

lean ledge Mar 16, 2021, 2:40 AM

#

https://plotnine.readthedocs.io/en/stable/

misty flint Mar 16, 2021, 2:40 AM

#

based on ggplot2
already hooked

#

DoggoKek

#

oh this is a really cool approach

#

ID_blurryeyes

#

might play with this lib next time i have to plot something

still otter Mar 16, 2021, 2:43 AM

#

Is there a library/tool that lets you run custom simulations with AI agents? For example, I wrote a gridworld simulation, and I want to test AI models with it, and train them/evolve them.

exotic maple Mar 16, 2021, 2:44 AM

#

I think there is one library for visual simulations but i cant remember it lol

exotic maple Mar 16, 2021, 2:44 AM

#

misty flint > based on ggplot2 already hooked

Dude I love that dog sticker you use lol

lean ledge Mar 16, 2021, 2:45 AM

#

It's a grid world, shouldn't be hard to simulate yourself

still otter Mar 16, 2021, 2:46 AM

#

yeah, I've already written one as a sort of prototype

#

but I want to see if there are other options for me before I start "finalizing" things

#

specifically the "connecting AI agents" part, right now my ai is baked into the simulation code and I'd want to change that going forward

#

but if something already exists that does that for me i'd rather use that than reinvent the wheel

misty flint Mar 16, 2021, 2:49 AM

#

exotic maple Dude I love that dog sticker you use lol

its my favorite one. i also use it WAY too much. literally 50% of the time.

still otter Mar 16, 2021, 2:49 AM

#

I'm looking at openai/gym right now, but it's difficult to know where I'd find things similar to it

#

especially since gym seems to be built around reinforcement learning, while i'm focusing more on unsupervised learning right now

misty flint Mar 16, 2021, 2:57 AM

#

simulations with AI agents most of the time will be focused on RL

#

since you need it to actually do RL

#

lol

#

i think a cool side project to do would be to teach an AI to play a small game with Q-Learning

#

like that flappy bird example

still otter Mar 16, 2021, 3:02 AM

#

actually maybe i have my terms messed up and i am doing reinforcement learning lol

#

brute force reinforcement learning

#

or monte carlo might be the closest I guess

#

anyway, I see now that openai/gym provides specific environments and wasn't really made to handle custom environments

bitter harbor Mar 16, 2021, 3:09 AM

#

misty flint i think a cool side project to do would be to teach an AI to play a small game w...

actually I was thinking about doing that for my bots if I ever get there
I'm also thinking it'd be easier just to code in how to play the game well (with an LSTM?)

#

send all the bot's vars for detection/movement/whatever through an LSTM but clamp the values first based on conditions?

misty flint Mar 16, 2021, 3:12 AM

#

that would be a cool idea

#

i only bring up q-learning since thats the easiest to implement

#

less parameters

#

also have you seen the code for some q-learning stuff?

#

its SO short

#

its wild

exotic maple Mar 16, 2021, 3:14 AM

#

@misty flint whats q learning?

bitter harbor Mar 16, 2021, 3:18 AM

#

"Q-learning is a model-free reinforcement learning algorithm to learn quality of actions telling an agent what action to take under what circumstances."

the agent takes a bunch of actions, each action changes the state it is in, a reward is given, and the algo learns which actions maximise that reward

misty flint Mar 16, 2021, 3:19 AM

#

a type of RL that is model-FREE but still performs very, very well

#

the 2 main parameters are alpha (learning rate) and gamma (discount factor for future rewards)

#

thats it

#

only downside is it requires lots of repetitions

#

to "learn"

lean ledge Mar 16, 2021, 3:20 AM

#

misty flint a type of RL that is model-FREE but still performs very, very well

performs alright if you have a small discrete state space*

misty flint Mar 16, 2021, 3:20 AM

#

the model-free part i would say is the biggest advantage

misty flint Mar 16, 2021, 3:20 AM

#

lean ledge performs alright if you have a small discrete state space*

hahaha

#

yep

#

lets add that caveat

#

what are all the environmental constraints again

#

like it has to be deterministic ,etc.

#

DoggoKek

bitter harbor Mar 16, 2021, 3:21 AM

#

doesn't it also like just not work sometimes?

misty flint Mar 16, 2021, 3:22 AM

#

single-agent

#

yeah

#

sometimes

#

lol

#

q-learning is like the linear regression of reinforcement learning

#

so dont put your hopes on it

#

DoggoKek

lean ledge Mar 16, 2021, 3:23 AM

#

q learning is brute forcing it

#

there's X actions you can take, Y states you can be in

bitter harbor Mar 16, 2021, 3:23 AM

#

im sure it'd be fine for flappy bird tho

lean ledge Mar 16, 2021, 3:23 AM

#

Make an X*Y table of actions-state combination

#

choose them randomly

#

as you learn more you can update your actions to exploit whatever information about the reward you gained
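
A minimal sketch of the table idea described above, not a reference implementation. The environment is a made-up 1-D corridor (cells 0 to 4, goal at cell 4), and the names `step`, `train`, and the exploration rate `epsilon` are assumptions added for illustration; only `alpha` and `gamma` come from the discussion.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (made-up environment)
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Apply an action; the reward is 1.0 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The state-action table: one row per state, one column per action.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick the best known action, breaking ties randomly.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action for every non-terminal state.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

The table has `N_STATES * len(ACTIONS)` entries, which is why the approach stops being practical once either space gets large or continuous.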

bitter harbor Mar 16, 2021, 3:25 AM

#

it's been a while but isn't one of the problems that it's maximising not optimising?

lean ledge Mar 16, 2021, 3:26 AM

#

maximising is optimising

exotic maple Mar 16, 2021, 3:27 AM

#

maximising is a form of optimizing

#

any optimization is, by definition, max or min of a cost function.

exotic maple Mar 16, 2021, 3:28 AM

#

lean ledge performs alright if you have a small discrete state space*

in English?

#

I usually struggle more with terms than with understanding lol

fading vector Mar 16, 2021, 3:29 AM

#

i want to be a data scientist and i will pay cash anyone who can teach me and be my mentor..

misty flint Mar 16, 2021, 3:30 AM

#

bitter harbor im sure it'd be fine for flappy bird tho

it actually works amazing for flappy bird. take a look http://sarvagyavaish.github.io/FlappyBirdRL/

lean ledge Mar 16, 2021, 3:32 AM

#

exotic maple in English?

If there's only a few states the agent can be in and a few different types of actions. If the state of the agent is continuous, or there's a lot of states it can occupy and lots of actions it can take, it becomes incredibly inefficient

#

Remember you're making a table where the columns are the action you take and the rows are the states you can be in, and you're randomly taking actions to see how the reward changes

#

As action space and state space become large, it becomes a massive table

#

Or if they're continuous spaces, that's not a table anymore

astral path Mar 16, 2021, 3:35 AM

#

This might be a dumb question with an obvious answer, but what should I do if my dataset contains names such as St. Joseph's, Pennsylvania, Virginia Commonwealth, or Loyola, Illinois but I'm trying to use these as parts of a URL for scraping which contain these names under different aliases (St-josephs, VCU, Loyola-IL)? there's no dictionary that I know of which contains a list of aliases to try for each instance, so is the only option to manually rename each?

misty flint Mar 16, 2021, 3:35 AM

#

there's no dictionary that I know of
NLTK might

#

DoggoKek

#

itd probs be easier to rename tho

lean ledge Mar 16, 2021, 3:37 AM

#

Unless there's a lot of different names it can be, it might just be easier to manually search for key terms

astral path Mar 16, 2021, 3:37 AM

#

ooh lemme see

misty flint Mar 16, 2021, 3:37 AM

#

how big is your dataset

#

you always have the most interesting issues joseph. i like it

lean ledge Mar 16, 2021, 3:38 AM

#

If it's only 3 names and variations of those, pick key words like "joseph" and find all the ones that have it, add it in. Keep doing it for them until there's none left.
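
The keyword approach could look roughly like this. It is only a sketch: the `KEYWORD_ALIASES` table and the `to_alias` helper are hypothetical, the three entries simply mirror the examples from the question, and the dash-separated lowercase fallback is an assumption about what the URLs expect.

```python
# Hand-maintained overrides: keyword found in the dataset name -> alias used in the URL.
# Extend this table as further mismatches turn up.
KEYWORD_ALIASES = {
    "joseph": "St-josephs",
    "virginia commonwealth": "VCU",
    "loyola, illinois": "Loyola-IL",
}


def to_alias(name):
    """Return the override whose keyword appears in the name, else a default slug."""
    lowered = name.lower()
    for keyword, alias in KEYWORD_ALIASES.items():
        if keyword in lowered:
            return alias
    # Default: lowercase with spaces replaced by dashes.
    return lowered.replace(" ", "-")


names = ["St. Joseph's, Pennsylvania", "Virginia Commonwealth", "Loyola, Illinois", "Temple"]
print([to_alias(n) for n in names])
```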

astral path Mar 16, 2021, 3:38 AM

#

186 different names, idk how many are inconsistent with the URL

lean ledge Mar 16, 2021, 3:38 AM

#

hmm 186 is not small

misty flint Mar 16, 2021, 3:39 AM

#

how much time are you willing to spend

#

lol

astral path Mar 16, 2021, 3:39 AM

#

oof a lot i guess lol

misty flint Mar 16, 2021, 3:39 AM

#

DoggoKek

astral path Mar 16, 2021, 3:39 AM

#

😔

misty flint Mar 16, 2021, 3:40 AM

#

idk how many are inconsistent with the URL
maybe take a quick look through the data

#

its probably not that many

astral path Mar 16, 2021, 3:40 AM

#

idk i mean

#

they're all college names

#

so like half are acronyms

#

i gtg for another thing 😩 sorry bye

misty flint Mar 16, 2021, 3:45 AM

#

bye

#

waveboye

astral path Mar 16, 2021, 4:24 AM

#

ok it was only like 3 lmao

#

it now looks like this

#

what i want to do is take the winner name, for example 1985 Temple, and get a URL like https://www.sports-reference.com/cbb/schools/temple/1985.html

#

df['url'] = df.apply[lambda x: x['Winner'].lower().replace(' ', '-') + "/" + str(x['Date']) + ".html"] is what I have, but i'm getting TypeError: 'method' object is not subscriptable

#

any ideas how I should do this?

#

(this is just for converting the columns Winner and Date to winner/date with dashes instead of whitespace

hasty grail Mar 16, 2021, 4:34 AM

#

astral path ok it was only like 3 lmao

you need regular brackets after df.apply, not square brackets

astral path Mar 16, 2021, 4:34 AM

#

<> brackets?

#

shit nvm im dumb curly braces {}?

hasty grail Mar 16, 2021, 4:34 AM

#

df.apply(...)

#

not df.apply[...]
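
For reference, the string logic itself can be sketched in plain Python. `build_url` and `BASE_URL` are hypothetical names; the URL pattern is taken from the Temple example in the question.

```python
BASE_URL = "https://www.sports-reference.com/cbb/schools"


def build_url(winner, year):
    """Turn a winner name and a year into a school-season URL."""
    slug = winner.lower().replace(" ", "-")
    return f"{BASE_URL}/{slug}/{year}.html"


print(build_url("Temple", 1985))

# With pandas, the same function could be applied row by row.
# Note the parentheses and axis=1, which passes each row to the lambda:
# df["url"] = df.apply(lambda row: build_url(row["Winner"], row["Date"]), axis=1)
```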

astral path Mar 16, 2021, 4:35 AM

#

ohhh ok

#

i'll try that in a few

misty flint Mar 16, 2021, 4:36 AM

#

you know what this is like

#

this is like...

#

reverse web scraping

#

blobhyperthink

autumn veldt Mar 16, 2021, 4:49 AM

#

Excuse me, i have an question here.

uid      category
110      banana      
101      banana
.
.
001      apple
010      apple

when i train this dataset with datasplit 80% training and 20% testing, and then when my classification program running, I input manually test data with "uid 000" which is this data not on data train, and then the result is apple. my question is, how come the data that not include on data train can be classified as apple? i want to know how does the classification tell us that this data is classified as an apple?

sorry for bad writing, my english is not good.

keen kestrel Mar 16, 2021, 5:08 AM

#

autumn veldt Excuse me, i have an question here. ```lets say i got this dataset(500 datasampl...

How do you preprocess the "uid" into vector?

astral path Mar 16, 2021, 5:42 AM

#

huh so

#

i got a column like this

#

and i used df['Date'] = df['Date'].str.replace('85', '1985') to change 85 to 1985 and df['Date'] = df['Date'].str.replace('16', '2016') for 16 to 2016, and a same one for all years in between

#

this is what's returned

#

       '2011199989', '20119990', '20119991', '20119992', '20119993',
       '20119994', '20119995', '20119996', '20119997', '20119998', '1999',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
       '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016'], dtype=object)

these are all the unique values of the column

#

what am I doing wrong?

#

it's a problem with all before 1999

#

wait nvm im a dumbass

autumn veldt Mar 16, 2021, 5:45 AM

#

keen kestrel How do you preprocess the "uid" into vector?

Its just an example, the real data have 8 features, one of them is uid... Something like that.

astral path Mar 16, 2021, 5:45 AM

#

the 99 and 98 functions change the 98 and 99 in 198x and 199x to something else

#

how should I rewrite it to not have this bug?

keen kestrel Mar 16, 2021, 5:54 AM

#

autumn veldt Its just an example, the real data have 8 features, one of them is uid... Someth...

Strange, but usually if it is a unique ID for each sample, that feature needs to be dropped.

keen kestrel Mar 16, 2021, 5:55 AM

#

astral path how should I rewrite it to not have this bug?

What does Date column look like? is it consistently two-digit that represents year from 85 to 16?

misty flint Mar 16, 2021, 5:56 AM

#

astral path and i used `df['Date'] = df['Date'].str.replace('85', '1985')` to change 85 to 1...

ID_BoomKek

astral path Mar 16, 2021, 5:56 AM

#

yeah so like 85, 94, 00, 06, 12, etc...

misty flint Mar 16, 2021, 5:57 AM

#

honestly i would probably run into the same problem bc id do it your way. its just funny seeing it joseph

#

cattohug

astral path Mar 16, 2021, 5:57 AM

#

lol

#

def f(x):
  if int(x[0]) == 0 | int(x[0]) == 1:
    x = str("20" + x)
  if int(x[0]) == 8 | int(x[0]) == 9:
    x = str("19" + x)
df['Date'] = df['Date'].apply(lambda x: f(x))

also tried it this way but it all ended up as None
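
Two things appear to go wrong in the function above. First, `|` is bitwise OR and binds more tightly than `==`, so `a == 0 | a == 1` is read as the chained comparison `a == (0 | a) == 1`; `or` is the operator intended here. Second, the function never returns anything, so `apply` fills the column with `None`. A possible rewrite, with the hypothetical name `expand_year` and assuming every value is a two-digit string between 85 and 16 as described:

```python
def expand_year(two_digits):
    """Expand a two-digit year string such as '85' or '06' to four digits."""
    first = two_digits[0]
    if first in ("0", "1"):      # 00-19 -> 2000-2019
        return "20" + two_digits
    if first in ("8", "9"):      # 80-99 -> 1980-1999
        return "19" + two_digits
    raise ValueError(f"unexpected year value: {two_digits!r}")


print([expand_year(y) for y in ["85", "94", "00", "06", "12"]])
```

It could then be applied with `df['Date'] = df['Date'].apply(expand_year)`. Because each value is converted exactly once, an already expanded year such as `1985` is never matched again by a later `'98'` replacement, which is what broke the chained `str.replace` calls.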

keen kestrel Mar 16, 2021, 5:58 AM

#

astral path ```python def f(x): if int(x[0]) == 0 | int(x[0]) == 1: x = str("20" + x) ...

try this pd.to_datetime(df["Date"], format="%y").dt.year

misty flint Mar 16, 2021, 5:58 AM

#

pithink

#

will it work if you feed it as a string

#

idk the datetime module as well as i should

astral path Mar 16, 2021, 5:59 AM

#

YESSSSS

astral path Mar 16, 2021, 5:59 AM

#

keen kestrel try this pd.to_datetime(df["Date"], format="%y").dt.year

this worked

misty flint Mar 16, 2021, 5:59 AM

#

Praise

astral path Mar 16, 2021, 5:59 AM

#

THANK YOU

misty flint Mar 16, 2021, 5:59 AM

#

party time

astral path Mar 16, 2021, 6:00 AM

#

im building a march madness predictor btw if y'all couldn't tell

keen kestrel Mar 16, 2021, 6:00 AM

#

astral path THANK YOU

That column is now integer years by the way (`.dt.year` returns numbers, not dates), so convert it back to string if you somehow need it.
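
A standard-library illustration of what the `%y` format code does, as a sketch alongside the pandas call (which uses the same format code). `parse_two_digit_year` is a hypothetical helper name. In Python's `strptime`, two-digit years 69 to 99 map to 1969 to 1999 and 00 to 68 map to 2000 to 2068, which happens to fit the 85 to 16 range here.

```python
from datetime import datetime


def parse_two_digit_year(text):
    """Parse a two-digit year with %y and return it as an int."""
    return datetime.strptime(text, "%y").year


years = [parse_two_digit_year(y) for y in ["85", "94", "00", "06", "16"]]
print(years)

# Convert back to strings when they are needed for concatenation, e.g. in a URL.
print([str(y) for y in years])
```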

astral path Mar 16, 2021, 6:00 AM

#

i got a $2000 pool and im the only one in a stem major lol

#

ok i'll make sure to do that

misty flint Mar 16, 2021, 6:00 AM

#

Praise

#

march madness leggo

#

actually ken jee is building something similar

#

DoggoKek

#

since hes a sports analytics guy and all

astral path Mar 16, 2021, 6:01 AM

#

oh yeah ive heard of that guy

#

cool channel

#

i have 3 days left ! ! ! !

misty flint Mar 16, 2021, 6:03 AM

#

Praise

#

best of luck

#

listen to his podcast

astral path Mar 16, 2021, 6:03 AM

#

thank u!!

misty flint Mar 16, 2021, 6:03 AM

#

kens nearest neighbors

#

its great

astral path Mar 16, 2021, 6:03 AM

#

oof i never have time for podcasts

misty flint Mar 16, 2021, 6:03 AM

#

DoggoKek

#

you dont sit down and listen

#

i listen while im driving, running or doing chores

astral path Mar 16, 2021, 6:04 AM

#

lol i'm basically doing homework 18 hours of the day

#

i tried it and i either cant focus on the podcast or i cant focus on the hw 😔

misty flint Mar 16, 2021, 6:05 AM

#

hw all day everyday...

#

memecringeharold

#

anyway i will leave this here for the lurkers

#

https://open.spotify.com/show/7fJsuxiZl4TS1hqPUmDFbl?si=0ihyaNbTTkCy7NkyY-03gA

Spotify

Ken's Nearest Neighbors

Listen to Ken's Nearest Neighbors on Spotify. Welcome to Ken's Nearest Neighbors, the podcast where I interview the most interesting people I can find on Data Science, Sports Analytics, Content Creation, Health, Performance, and much much more!

astral path Mar 16, 2021, 6:06 AM

#

grazie i'll check it if i ever find time!!

modest void Mar 16, 2021, 6:09 AM

#

is there a way to use a rolling window on multiple columns in pandas?

uncut barn Mar 16, 2021, 8:38 AM

#

Anyone have any good links for auto encoders for curve fitting?

#

/resources

lapis sequoia Mar 16, 2021, 9:46 AM

#

hi

#

tall trail Mar 16, 2021, 9:55 AM

#

i have this list of lists i want to put in my dataframe, i have the row number and the column name but it doesnt work the way google tells me to do it, what am i doing wrong?

ls = df_sub.values.tolist()
print(ls)
df.loc[i,'trains'] = ls

i keep getting this value error: ValueError: Must have equal len keys and value when setting with an ndarray

grave frost Mar 16, 2021, 10:32 AM

#

I am surprised anyone even uses Q-learning now. Weren't DQN's the go-to "always better" method than Q-learning, since Q-learning is such a naive method

lean ledge Mar 16, 2021, 10:34 AM

#

DQN is still Q learning, the table is just replaced with an NN

#

It's still relatively naive

grave frost Mar 16, 2021, 10:38 AM

#

I was talking about the ease of implementation. If you could implement a DQN In the same amount of time as Q-learning, why would you use the more naive/simpler method?

obtuse sable Mar 16, 2021, 10:52 AM

#

what happens if I only standardize my features but not normalize to range(0, 1)

#

for neural network

#

my model training was stalling at 10 epochs before standardizing and now it seems to be improving past that after standardizing. is there a need to normalize then?

lean ledge Mar 16, 2021, 10:55 AM

#

grave frost I was talking about the ease of implementation. If you could implement a DQN In ...

Assuming you have an NN library, DQN doesn't take that many more lines than Q Learning

#

And also because Q learning is probably faster to train and computationally faster

stray mica Mar 16, 2021, 10:59 AM

#

Hey guys,
I have one of my thesis topics as 'Video captioning system to search through videos if a digital assistant needs to find an answer.'
Could anyone guide me on how do i start research on this topic and the concepts that i need to learn for this?
Currently I'm going through papers with video captioning system. Any help would be really helpful.
Thanks.

pearl vault Mar 16, 2021, 11:52 AM

#

Hello guys
I am thinking of buying a new laptop for training deep learning models
Can anyone tell the minimum requirements (especially the CPU like no of cores) should I consider before buying

austere swift Mar 16, 2021, 11:53 AM

#

are you gonna be training on the laptop? or coding on the laptop and using something like colab or kaggle

pearl vault Mar 16, 2021, 11:54 AM

#

Coding

#

And also training in the same laptop

#

@austere swift

austere swift Mar 16, 2021, 11:57 AM

#

if you're gonna be training locally on the laptop you definitely need a gpu

pearl vault Mar 16, 2021, 11:57 AM

#

austere swift Mar 16, 2021, 11:58 AM

#

8gb ram won't be enough lol

#

get at least 16

pearl vault Mar 16, 2021, 11:58 AM

#

I will upgrade it to 16

lapis sequoia Mar 16, 2021, 11:58 AM

#

bro that bottleneck is gonna be insane

pearl vault Mar 16, 2021, 11:58 AM

#

The problem is the CPU it's an quad core processor

#

Will it affect my performance?

lapis sequoia Mar 16, 2021, 12:01 PM

#

i think so

#

@pearl vault i actually think u should try to config a pc

#

that would be cheaper and you can freely choose ur components

austere swift Mar 16, 2021, 12:03 PM

#

i'm pretty sure they want a laptop

#

you can't really build a laptop very easily

lapis sequoia Mar 16, 2021, 12:04 PM

#

oh ok

lapis sequoia Mar 16, 2021, 12:05 PM

#

pearl vault

how much is that in dollars?

pearl vault Mar 16, 2021, 12:05 PM

#

1241$

#

Usd

lapis sequoia Mar 16, 2021, 12:06 PM

#

i think there are lots of videos for laptops to train ur ai

pearl vault Mar 16, 2021, 12:06 PM

#

Other option is

lapis sequoia Mar 16, 2021, 12:06 PM

#

still too little RAM

pearl vault Mar 16, 2021, 12:07 PM

#

I can upgrade ram

lapis sequoia Mar 16, 2021, 12:07 PM

#

16 would be enough

#

so search for another model

pearl vault Mar 16, 2021, 12:08 PM

#

One slot will be empty I can buy an another ram stick but the CPU gpu choice is the problem

lapis sequoia Mar 16, 2021, 12:09 PM

#

#

https://analyticsindiamag.com/which-is-the-best-laptop-for-machine-learning-and-artificial-intelligence/

Analytics India Magazine

Kishan Maladkar

Which Is The Best Laptop For Machine Learning and Artificial Intell...

By having one of these laptops, one can build, train and test their own deep learning models in a short span of time.

#

this could help

pearl vault Mar 16, 2021, 12:09 PM

#

Ty

lapis sequoia Mar 16, 2021, 12:09 PM

#

ur wlcm

shadow grail Mar 16, 2021, 12:34 PM

#

Can someone help me with this please ?

grave frost Mar 16, 2021, 12:34 PM

#

I do discourage using a laptop tho - you can easily use it up in a few months.

pearl vault Mar 16, 2021, 12:39 PM

#

@grave frost I just want to know whether the CPU is a bottleneck or not

#

Currently Laptop is the only option I have now

obtuse sable Mar 16, 2021, 1:03 PM

#

What happens if there is multicollinearity or useless features in a neural network? Is the algorithm smart enough to fix them? Sorry if this sounds stupid I'm a complete beginner

serene scaffold Mar 16, 2021, 1:17 PM

#

are df.loc and .iloc O(1) lookup if you only use indices/column names (or any for iloc)?

#

I mean I guess they're both O(n) for the number of rows or columns you ask them to fetch

civic ferry Mar 16, 2021, 1:46 PM

#

#Create a function to combine the values of the important columns into a single string
def get_important_features(data):
important_features = []
for i in range(0, data.shape[0]):
important_features.append(data['Actors'][i]+' '+data['Director'][i]+' '+data['Genre'][i]+' '+data['Title'][i])

return important_features

#Create a column to hold the combined strings

df['important_features'] = get_important_features(df)

#Show the data
df.head(3)
This is my code although i am getting below error: Traceback (most recent call last)
<ipython-input-10-50d23e3e0015> in <module>()
1 #Create a column to hold the combined strings
----> 2 df['important_features'] = get_important_features(df)
3
4 #Show the data
5 df.head(3)

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in sanitize_index(data, index)
746 if len(data) != len(index):
747 raise ValueError(
--> 748 "Length of values "
749 f"({len(data)}) "
750 "does not match length of index "

ValueError: Length of values (1) does not match length of index (1000)
Please help

serene scaffold Mar 16, 2021, 2:16 PM

#

!code @civic ferry

arctic wedgeBOT Mar 16, 2021, 2:16 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Mar 16, 2021, 2:17 PM

#

also what is data?

civic ferry Mar 16, 2021, 2:17 PM

#

Its data of movie reccomendation collected from kaggle

serene scaffold Mar 16, 2021, 2:18 PM

#

civic ferry Its data of movie reccomendation collected from kaggle

can you run print(data)?

civic ferry Mar 16, 2021, 2:18 PM

#

serene scaffold can you run `print(data)`?

Yes ,should i send you the whole code?

serene scaffold Mar 16, 2021, 2:18 PM

#

No, only what a few lines of print(data) prints out

#

!code

arctic wedgeBOT Mar 16, 2021, 2:18 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Mar 16, 2021, 2:18 PM

#

^ please paste it like that

civic ferry Mar 16, 2021, 2:19 PM

#

Okay 1 sec

#

#Store the data
df = pd.read_csv('IMDB-Movie-Data.csv')
#Show the first 3 rows of data
df.head(3)

#

I havent tried the print function though

#

This prints 3 rows of data

serene scaffold Mar 16, 2021, 2:20 PM

#

def get_important_features(data):
  important_features = []
  for i in range(0, data.shape[0]):
    important_features.append(data['Actors'][i]+' '+data['Director'][i]+' '+data['Genre'][i]+' '+data['Title'][i])

I can't guess what this function does unless I know what data looks like. Please run print(data)

civic ferry Mar 16, 2021, 2:20 PM

#

#

Here,hope this helps

serene scaffold Mar 16, 2021, 2:21 PM

#

I still don't know what data is. I really need to know that or I can't continue.

civic ferry Mar 16, 2021, 2:22 PM

#

#

Should i run print(data)?

serene scaffold Mar 16, 2021, 2:22 PM

#

yes

#

please have that be the first line of get_important_features, and then copy and paste what it prints as text.

civic ferry Mar 16, 2021, 2:26 PM

#

Its giving an error

arctic wedgeBOT Mar 16, 2021, 2:28 PM

#

Hey @civic ferry!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

civic ferry Mar 16, 2021, 2:29 PM

#

@serene scaffold

serene scaffold Mar 16, 2021, 2:29 PM

#

civic ferry Its giving an error

did it print out what data is before you got the error?

civic ferry Mar 16, 2021, 2:29 PM

#

Nope

serene scaffold Mar 16, 2021, 2:29 PM

#

can you copy and paste the error as text into this chat?

lavish swift Mar 16, 2021, 2:58 PM

#

@civic ferry I usually try to avoid any sort of loop when I'm working with dataframes. If you need to create a new column that combines other columns, have you tried something like:

df['combined'] = df['Actors'] + ' ' + df['Director'] + ' ' + df['Genre'] + ' ' + df['Title']

Though you may want to use a better delimiter than space, otherwise parsing the new column may get a little ugly (the names from actors, director and title are all likely going to have spaces)

#

oh...and replace df with data (since I think your dataframe is called data

#

out of habit, I called mine df
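
One guess at the `Length of values (1) does not match length of index (1000)` error, since the pasted code lost its indentation: a `return` placed inside the `for` loop would hand back a list with a single element. The sketch below uses plain Python with made-up rows in place of the real CSV, and the names `combine_buggy` and `combine_fixed` are hypothetical.

```python
# Made-up rows standing in for the movie data; the real CSV is not shown in the chat.
rows = [
    {"Actors": "A One", "Director": "D One", "Genre": "Drama", "Title": "First"},
    {"Actors": "A Two", "Director": "D Two", "Genre": "Comedy", "Title": "Second"},
    {"Actors": "A Three", "Director": "D Three", "Genre": "Action", "Title": "Third"},
]

COLUMNS = ("Actors", "Director", "Genre", "Title")


def combine_buggy(rows):
    combined = []
    for row in rows:
        combined.append(" ".join(row[c] for c in COLUMNS))
        return combined          # inside the loop: exits after the first row


def combine_fixed(rows):
    combined = []
    for row in rows:
        combined.append(" ".join(row[c] for c in COLUMNS))
    return combined              # after the loop: one string per row


print(len(combine_buggy(rows)), len(combine_fixed(rows)))
```

If that is the cause, dedenting the `return` by one level in `get_important_features` should be enough; the vectorised column addition avoids the loop altogether.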

fierce shadow Mar 16, 2021, 3:04 PM

#

pearl vault Other option is

focus more on your graphics card as machine learning and ai are graphics card intensive, if you are looking forward for reinforcement learning and stuff then 16gb+ ram is also recommended

#

1050ti is kinda bad choice for ai, I would recommend you going for 1650 super or better

pearl vault Mar 16, 2021, 3:11 PM

#

@fierce shadow 1660ti sir

fierce shadow Mar 16, 2021, 3:11 PM

#

thats pretty nice then

pearl vault Mar 16, 2021, 3:12 PM

#

What about the processor? The i5-10300H is a quad-core processor and many people are telling me it will bottleneck the GPU

fierce shadow Mar 16, 2021, 3:13 PM

#

I don't think it would bottleneck

pearl vault Mar 16, 2021, 3:14 PM

#

Ok ty

blazing bridge Mar 16, 2021, 3:16 PM

#

no a 1660ti and i5 10300h would not bottleneck the gpu

fair girder Mar 16, 2021, 4:12 PM

#

What's a good "sparsity" to switch a dense matrix to a sparse one?

#

I am dealing with a matrix with roughly 50% zeros and want to reduce ram usage

tidal bough Mar 16, 2021, 4:15 PM

#

not sure if 50% is low enough - you can test it yourself, compare the memory usage of numpy.ndarrays with scipy.sparse.csr_matrixes, say.

#

often you have to deal with matrixes with a density of like one percent, in which case it's definitely profitable
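
A back-of-the-envelope estimate before measuring the real thing. This is a sketch under stated assumptions: 8-byte float64 values, 4-byte integer indices, and the usual CSR layout of one value plus one column index per non-zero and a row-pointer array of length rows + 1. The helper names `dense_bytes` and `csr_bytes` are hypothetical, and actual library overhead may differ.

```python
def dense_bytes(rows, cols, value_bytes=8):
    """Memory for a dense array of fixed-size values (8 bytes for float64)."""
    return rows * cols * value_bytes


def csr_bytes(rows, cols, density, value_bytes=8, index_bytes=4):
    """Approximate CSR memory: one value and one column index per non-zero,
    plus a row-pointer array of length rows + 1."""
    nnz = int(rows * cols * density)
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


rows, cols = 10_000, 10_000
for density in (0.5, 0.1, 0.01):
    ratio = csr_bytes(rows, cols, density) / dense_bytes(rows, cols)
    print(f"density {density:.2f}: CSR is about {ratio:.2f}x the dense size")
```

Under these assumptions the break-even point is a density of roughly two thirds, so 50% zeros saves about a quarter of the memory, while a matrix with 1% non-zeros shrinks to a few percent of the dense size.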

grave frost Mar 16, 2021, 4:16 PM

#

pearl vault @grave frost I just want to know whether the CPU is a bottleneck or not

Deep Learning shouldn't be done on a laptop usually.

#

Your CPU doesn't matter much. The priority is memory and GPU. Since mobile GPUs (like the ones in laptops) have to balance power draw and thermals against battery life, you will likely face a lot of crashes and won't be able to use it for long (speaking from experience). Very few laptops have good thermals, like Razer laptops that cost 1.5 Lakh+

#

A desktop is usually the best choice for Deep Learning. Laptops are good only for light stuff

astral path Mar 16, 2021, 4:21 PM

#

quick stats question: what metric should I use for measuring how spread out the majority of a variable is (if that makes sense)?

tidal bough Mar 16, 2021, 4:22 PM

#

Variance?

astral path Mar 16, 2021, 4:22 PM

#

for example
you can see that pts drops off after 4 rows

tidal bough Mar 16, 2021, 4:22 PM

#

Alternatively, you may be looking for kurtosis.

astral path Mar 16, 2021, 4:22 PM

#

what's kurtosis?

exotic maple Mar 16, 2021, 4:23 PM

#

@tidal bough Isnt kurtosis only used when you want to know how one-sided your distribution is?

tidal bough Mar 16, 2021, 4:23 PM

#

tidal bough Mar 16, 2021, 4:23 PM

#

exotic maple @tidal bough Isnt kurtosis only used when you want to know how one-sid...

Nah, you're thinking of skewness

exotic maple Mar 16, 2021, 4:23 PM

#

as in, you want to know if you can assume a normal distribution

tidal bough Mar 16, 2021, 4:23 PM

#

kurtosis is for measuring how thicc the tails are, basically

exotic maple Mar 16, 2021, 4:23 PM

#

tidal bough Nah, you're thinking of skewness

oh yeah skewness. i dont remember what kurtosis is lol

exotic maple Mar 16, 2021, 4:23 PM

#

tidal bough kurtosis is for measuring how thicc the tails are, basically

E X T R A T H I C C

astral path Mar 16, 2021, 4:24 PM

#

as a more broad question, I'm trying to figure out if a team relies very heavily on a few players to score or if its scoring is very evenly distributed throughout the entire team

#

so like how top-heavy it is

#

would kurtosis apply ?

exotic maple Mar 16, 2021, 4:24 PM

#

I think that would be skewness

#

but honestly, you should be able to find it with a basic histogram and / or kde curve of goals

astral path Mar 16, 2021, 4:25 PM

#

ah ok

#

i'm using this as a feature for an ML model

#

so it'd need to be quantified

exotic maple Mar 16, 2021, 4:25 PM

#

basically what you want to know is if a player or group of players has the majority of goals right?

astral path Mar 16, 2021, 4:26 PM

#

yeah basically

exotic maple Mar 16, 2021, 4:26 PM

#

also consider: you need to normalize it by position. I dont know much about american "football" but

astral path Mar 16, 2021, 4:26 PM

#

or minutes played alternatively

#

this is basketball btw

exotic maple Mar 16, 2021, 4:26 PM

#

positions have a major influence

#

ah

astral path Mar 16, 2021, 4:26 PM

#

positions don't have a huge influence because all 5 positions are on the court the same amt of time for 90% of teams

exotic maple Mar 16, 2021, 4:27 PM

#

yeah for basketball i dont think you will have that consideration

grave frost Mar 16, 2021, 4:27 PM

#

how thicc the tails
I honestly never thought confusedreptile would ever use smthing like that lol

astral path Mar 16, 2021, 4:27 PM

#

lol

grave frost Mar 16, 2021, 4:27 PM

#

reminds of Dani 😁

astral path Mar 16, 2021, 4:27 PM

#

so skew is the way to go?

exotic maple Mar 16, 2021, 4:28 PM

#

I would think so, yes. perhaps someone else can chime in

#

remember that skewness measures "symmetry" basically

#

so, if every position / person scored the same, it would be symmetrical

astral path Mar 16, 2021, 4:29 PM

#

ah yeah that makes sense

exotic maple Mar 16, 2021, 4:29 PM

#

but if one person scores most, your distribution of scores will be asymmetrical

astral path Mar 16, 2021, 4:29 PM

#

ok, that really helps a lot!
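A rough sketch of turning "how top-heavy is the scoring" into a number, assuming pandas and made-up points-per-game figures for two rotations. `Series.skew()` gives the skewness discussed above. The `top_share` helper (share of points from the top k scorers) is not from the discussion; it is included only as a simple alternative to compare against.

```python
import pandas as pd

# Hypothetical points-per-game for two eight-player rotations.
top_heavy = pd.Series([30, 25, 5, 4, 3, 2, 2, 1])
balanced = pd.Series([12, 11, 10, 10, 9, 9, 8, 8])

def top_share(points, k=2):
    """Fraction of the team's points scored by its k highest scorers."""
    return points.nlargest(k).sum() / points.sum()

for name, pts in [("top_heavy", top_heavy), ("balanced", balanced)]:
    print(name, "skew:", round(pts.skew(), 2), "top-2 share:", round(top_share(pts), 2))
```

Either number can be computed per team and used as a single feature column. Whether it helps the model is something to check empirically.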

exotic maple Mar 16, 2021, 4:29 PM

#

I really need to have a statistics book to keep on hand ffs.

astral path Mar 16, 2021, 4:29 PM

#

im really thinking i should get at least a minor in statistics, i couldn't imagine only taking two courses and then going into industry

exotic maple Mar 16, 2021, 4:29 PM

#

I love hypothesis testing but i always forget when to use what

#

-rages in 5 types of t-tests-

astral path Mar 16, 2021, 4:30 PM

#

funny story

#

my data science class was supposed to be for stats majors only and i'm a data science major, and my university forgot to put a stats prereq on it even though most students are DS majors in the class and none of us knew about the prereq

#

so me and like 80% of the class went halfway thru without knowing a bit of statistics when the prof thought there was a stats prereq

#

luckily she caught on and removed the stats part of assignments cus i was lost lol

exotic maple Mar 16, 2021, 4:36 PM

#

literally 0 statistics?

#

wtf lol

#

like not even basic descriptives?

astral path Mar 16, 2021, 4:37 PM

#

yeah i never took it in high school and my first semester was filled with calc, CS, and gen ed

uncut orbit Mar 16, 2021, 4:38 PM

#

oof

exotic maple Mar 16, 2021, 4:38 PM

#

oof

uncut orbit Mar 16, 2021, 4:38 PM

#

do you know it now?

exotic maple Mar 16, 2021, 4:38 PM

#

look at this guy not knowing what's the difference between mean, median and mode :v

#

-calls them all average-

uncut orbit Mar 16, 2021, 4:39 PM

#

lmao

exotic maple Mar 16, 2021, 4:39 PM

#

pydis_snake

astral path Mar 16, 2021, 4:39 PM

#

uncut orbit do you know it now?

no lmao

astral path Mar 16, 2021, 4:39 PM

#

exotic maple look at this guy not knowing what's the difference between mean, median and mode...

lol that and stdev is about the extent of what i know

exotic maple Mar 16, 2021, 4:39 PM

#

that's basically descriptive statistics :p

#

add range, and quartiles and you're set on descriptives

astral path Mar 16, 2021, 4:39 PM

#

and stdev is stretching it because i was only told that it was "the average of how far away from the average things are" in sophomore year of HS

astral path Mar 16, 2021, 4:40 PM

#

exotic maple add range, and quartiles and you're set on descriptives

hehe my prof called the central limit theorem basic stats and i was like 😳

exotic maple Mar 16, 2021, 4:40 PM

#

depending on how elitist the person is, some will claim stddev is useless and you should always speak in variances instead

#

i think each has their use :v

astral path Mar 16, 2021, 4:40 PM

#

nah she's a really down to earth prof who understands the students, she just didnt' know we didnt know anything about stats

exotic maple Mar 16, 2021, 4:41 PM

#

central limit theorem isnt 'basic' but its definitely not the toughest thing out there

astral path Mar 16, 2021, 4:41 PM

#

yeah i just have a hard time when she put an integral into stats when i dont even get the stats without calculus in there lmao

exotic maple Mar 16, 2021, 4:41 PM

#

i've literally never, ever used calculus for stats

#

its in the definition of the density and all, but ive never, ever used it

uncut orbit Mar 16, 2021, 4:42 PM

#

LMAO

#

you guys are good

tidal bough Mar 16, 2021, 4:42 PM

#

how do you calculate conditional distributions and stuff like that, then?

astral path Mar 16, 2021, 4:42 PM

#

that bodes well ig

#

calculate what 👀

#

shit i should take stats this summer

exotic maple Mar 16, 2021, 4:42 PM

#

tidal bough how do you calculate conditional distributions and stuff like that, then?

if I knew what that is, i'd have an answer .:p

astral path Mar 16, 2021, 4:42 PM

#

i cant go on in my major like this

tidal bough Mar 16, 2021, 4:43 PM

#

in my probability theory course, we eventually switched to a notation where even sums over discrete distributions are written via integrals

#

because it's just nicer to have things consistent

exotic maple Mar 16, 2021, 4:43 PM

#

tidal bough in my probability theory course, we eventually switched to a notation where even...

that sounds a bit overkill lol

#

I can see where its coming from thou

astral path Mar 16, 2021, 4:44 PM

#

well its been nice talkin but i gtg to said data science class in 1 minute

exotic maple Mar 16, 2021, 4:45 PM

#

I've always found the definition of Riemann integrals to be so succinctly beautiful yet complex

bronze jacinth Mar 16, 2021, 4:46 PM

#

https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction
can anyone help me with this? i have some doubts

Company Bankruptcy Prediction

Bankruptcy data from the Taiwan Economic Journal for the years 1999–2009

uncut orbit Mar 16, 2021, 4:47 PM

#

explain your doubts

bronze jacinth Mar 16, 2021, 4:48 PM

#

im not sure which is bankrupt and which is not

#

like are 0=bankrupt and 1=not in the 'Bankrupt?' column?

#

also since im new its hard and takes time understanding the code

uncut orbit Mar 16, 2021, 4:50 PM

#

ah ofc

#

i would think that 1 means bankrupt

bronze jacinth Mar 16, 2021, 4:51 PM

#

yes even i thought the same thing

uncut orbit Mar 16, 2021, 4:51 PM

#

hmmm

bronze jacinth Mar 16, 2021, 4:52 PM

#

but (1min)

uncut orbit Mar 16, 2021, 4:52 PM

#

your data is skewed a lot

exotic maple Mar 16, 2021, 4:52 PM

#

0 normally is no

#

and also, use logic

#

do you believe most companies would be bankrupt?

bronze jacinth Mar 16, 2021, 4:53 PM

#

exotic maple do you believe most companies would be bankrupt?

my friend online believed so and i fell into confusion

uncut orbit Mar 16, 2021, 4:53 PM

#

go to google datasets...you might find an explanation

bronze jacinth Mar 16, 2021, 4:53 PM

#

he knows ml so i thought maybe i should trust him :/

uncut orbit Mar 16, 2021, 4:53 PM

#

https://datasetsearch.research.google.com/

bronze jacinth Mar 16, 2021, 4:53 PM

#

oh thanks!

exotic maple Mar 16, 2021, 4:53 PM

#

you can know something and still screw up

uncut orbit Mar 16, 2021, 4:54 PM

#

its from uci

exotic maple Mar 16, 2021, 4:54 PM

#

always curate things with a bit of common sense as well lol. It's unlikely 90% of taiwanese companies are bankrupt...
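A quick way to run that sanity check in pandas, sketched on a toy frame rather than the real CSV (the ten rows below are invented; only the column name `Bankrupt?` comes from the dataset). If one label turns out to be rare, that would be consistent with it meaning "bankrupt".

```python
import pandas as pd

# Toy stand-in for the Kaggle CSV; only the label column matters here.
df = pd.DataFrame({"Bankrupt?": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]})

# Share of rows per label value.
shares = df["Bankrupt?"].value_counts(normalize=True)
print(shares)

# The label that occurs least often.
minority_label = shares.idxmin()
print("rarest label:", minority_label)
```

With the real file, the same two lines after `pd.read_csv(...)` show how lopsided the classes are, which also matters when reading precision and recall later.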

uncut orbit Mar 16, 2021, 4:54 PM

#

https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

exotic maple Mar 16, 2021, 4:54 PM

#

if they are, i'll send you some money :v

bronze jacinth Mar 16, 2021, 4:54 PM

#

exotic maple you can know something and still screw up

ill check that out

bronze jacinth Mar 16, 2021, 4:54 PM

#

exotic maple always curate things with a bit of common sense as well lol. It's unlikely 90% o...

yes yes i asked the same thing and he was like it might be an error in the dataset

#

i didnt know if that was possible so here i am 🙂

#

wait ew no i dont want the emoji

bronze jacinth Mar 16, 2021, 4:55 PM

#

uncut orbit https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

thx!

uncut orbit Mar 16, 2021, 4:56 PM

#

welcome anytime

bronze jacinth Mar 16, 2021, 4:56 PM

#

for the prev dataset(one im using)

#

i put an example to check but it gives the same output for any data i put in there

uncut orbit Mar 16, 2021, 4:58 PM

#

check all ur variables

exotic maple Mar 16, 2021, 4:58 PM

#

what is the best place to take a refresher in calculus? I think i need to review a lot of things.

uncut orbit Mar 16, 2021, 4:59 PM

#

coursera

exotic maple Mar 16, 2021, 4:59 PM

#

any specifics?

#

i tried khan academy in the past and liked it, but im thinking something a bit more...applied

uncut orbit Mar 16, 2021, 5:00 PM

#

hmmm

#

i haven't done calculus yet

#

but lemme see

#

most courses are for free

astral path Mar 16, 2021, 5:00 PM

#

exotic maple i tried khan academy in the past and liked it, but im thinking something a bit m...

i might be able to hook you up with some of my calc 3 lab pdfs

bronze jacinth Mar 16, 2021, 5:00 PM

#

uncut orbit check all ur variables

variables as in all the attributes?

astral path Mar 16, 2021, 5:01 PM

#

they're pretty applied to engineering problems

exotic maple Mar 16, 2021, 5:01 PM

#

Calc3 as in vector calculus?

astral path Mar 16, 2021, 5:01 PM

#

we're getting into that right now actually so yeah

uncut orbit Mar 16, 2021, 5:01 PM

#

bronze jacinth variables as in all the attributes?

no like the variables that you have defined

astral path Mar 16, 2021, 5:01 PM

#

but start of the semester was series

exotic maple Mar 16, 2021, 5:01 PM

#

I think i have my college notebook for that one. The ones i've lost are calc 1 and 2 lol

astral path Mar 16, 2021, 5:01 PM

#

ohhh lemme check if im still in the canvas course for those

exotic maple Mar 16, 2021, 5:01 PM

#

thanks man 🙂

bronze jacinth Mar 16, 2021, 5:02 PM

#

uncut orbit no like the variables that you have defined

alright

astral path Mar 16, 2021, 5:02 PM

#

ye im in it!

astral path Mar 16, 2021, 5:02 PM

#

exotic maple I think i have my college notebook for that one. The ones i've lost are calc 1 a...

anything specific?

exotic maple Mar 16, 2021, 5:03 PM

#

nothing in general

#

just a refresher

#

anything is ok

astral path Mar 16, 2021, 5:03 PM

#

something like this?

uncut orbit Mar 16, 2021, 5:04 PM

#

https://www.coursera.org/specializations/mathematics-machine-learning

Coursera

Mathematics for Machine Learning

Offered by Imperial College London. Mathematics for Machine Learning. Learn about the prerequisite mathematics for applications in data ... Enroll for free.

#

it has some calculus

bronze jacinth Mar 16, 2021, 5:04 PM

#

@uncut orbit my friend asked me to run this and came to the conclusion that majority are bankrupt

uncut orbit Mar 16, 2021, 5:05 PM

#

id say not...

exotic maple Mar 16, 2021, 5:06 PM

#

astral path something like this?

that could do

arctic wedgeBOT Mar 16, 2021, 5:07 PM

#

Hey @astral path!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

astral path Mar 16, 2021, 5:07 PM

#

oof

#

i'll just DM

exotic maple Mar 16, 2021, 5:08 PM

#

thanks a lot man 😄 you too @uncut orbit

uncut orbit Mar 16, 2021, 5:08 PM

#

welcome

bronze jacinth Mar 16, 2021, 5:08 PM

#

uncut orbit id say not...

how should i download the dataset from the link you sent? the dataset folder has only parent directory

#

and i dont know how to go after that because in whatever ive learned there'll be .data files there

uncut orbit Mar 16, 2021, 5:10 PM

#

the kaggle one is the same

bronze jacinth Mar 16, 2021, 5:11 PM

#

oh

#

i checked the variables and i dont think they are wrong...

uncut orbit Mar 16, 2021, 5:12 PM

#

ok

#

try to directly copy paste the data points

bronze jacinth Mar 16, 2021, 5:12 PM

#

bronze jacinth i checked the variables and i dont think they are wrong...

to my knowledge

uncut orbit Mar 16, 2021, 5:13 PM

#

ok

#

are you using colab?

bronze jacinth Mar 16, 2021, 5:13 PM

#

uncut orbit try to directly copy paste the data points

yes thats what i did

#

vs code

uncut orbit Mar 16, 2021, 5:13 PM

#

hmmm do you have anaconda?

bronze jacinth Mar 16, 2021, 5:13 PM

#

nope :/

#

ill make a colab?

uncut orbit Mar 16, 2021, 5:13 PM

#

yea

#

it'll be easier that way

spring seal Mar 16, 2021, 5:16 PM

#

Hi there, I am interested in Simulation of Physics/Engineering problems by using python. For example: Simulation in Fluid Dynamics and Thermodynamics . So, which modules i have to learn and practice and get projects. Now days, I am learning Pandas, Numpy, Matplotlib and seaborn. My background is Mechanical Engineer. Please suggest modules / Techniques , I would like to hear and apply.

jaunty plaza Mar 16, 2021, 5:32 PM

#

yo how much does tensorRT help

misty flint Mar 16, 2021, 6:14 PM

#

exotic maple what is the best place to take a refresher in calculus? I think i need to review...

im also right there with you

#

DoggoKek

#

havent done calc in 10 years

#

amegablobsweats

#

but looked at partial derivatives just so i could understand backpropagation

#

DoggoKek

exotic maple Mar 16, 2021, 6:24 PM

#

8 years here

#

I remember some things about derivatives integration and calculus

#

But if u ask me to solve one ill embarass myself lol

grave frost Mar 16, 2021, 6:26 PM

#

I am learning Pandas, Numpy, Matplotlib and seaborn
Uh, how does that have anything to do with fluid simulations?

misty flint Mar 16, 2021, 6:27 PM

#

itll start to come back if you start looking at examples again

warm coyote Mar 16, 2021, 6:35 PM

#


## Import Libraries



import pandas as pd                 # pandas is a dataframe library
import matplotlib.pyplot as plt     # matplotlib.pyplot plots data
import numpy as np                  # numpy provides N-dim object support

# do ploting inline instead of in a separate window
%matplotlib inline

## Load and review data 

df = pd.read_csv("./tree/Notebooks/data/pima-data.csv")      # load Pima data.  Adjust path as necessary

df.shape

Error that I'm getting:

NameError Traceback (most recent call last)
<ipython-input-8-633337079cd0> in <module>
----> 1 df.shape

NameError: name 'df' is not defined

Also, I am using Jupyter notebook

misty flint Mar 16, 2021, 6:36 PM

#

what happens when you run 'df' by itself in a cell

#

if it doesnt show up, the cell that imports pandas and loads the csv probably didnt run (or it errored), so df never got created

silk marsh Mar 16, 2021, 7:39 PM

#

movie recommendation is based on KNN??

serene scaffold Mar 16, 2021, 7:40 PM

#

silk marsh movie recommendation is based on KNN??

what is the context for this question? you could make a movie recommendation system that uses KNN in some way.

silk marsh Mar 16, 2021, 7:45 PM

#

like it is based on matrix factorization ?

misty flint Mar 16, 2021, 7:47 PM

#

this was interesting to see

serene scaffold Mar 16, 2021, 7:47 PM

#

silk marsh like it is based on matrix factorization ?

we're talking about k nearest neighbors, right?

misty flint Mar 16, 2021, 7:48 PM

#

then for those in the states

astral path Mar 16, 2021, 7:48 PM

#

lmao wyoming

misty flint Mar 16, 2021, 7:48 PM

#

california, texas, and new york seem to have most jobs postings

misty flint Mar 16, 2021, 7:48 PM

#

astral path lmao wyoming

rip

astral path Mar 16, 2021, 7:48 PM

#

ay my state is second highest %

misty flint Mar 16, 2021, 7:48 PM

#

Praise

#

washington?

astral path Mar 16, 2021, 7:48 PM

#

actually my state is highest percent

#

ye

misty flint Mar 16, 2021, 7:48 PM

#

technically first

#

bc DC isnt a state

#

DoggoKek

astral path Mar 16, 2021, 7:49 PM

#

washington state not DC!

misty flint Mar 16, 2021, 7:49 PM

#

my state is 2nd highest in postings

astral path Mar 16, 2021, 7:49 PM

#

texas nice

#

never been

misty flint Mar 16, 2021, 7:49 PM

#

🇨🇱

#

yes ik thats the chile flag

#

but good enough

astral path Mar 16, 2021, 7:50 PM

#

lmao

#

austin?

misty flint Mar 16, 2021, 7:50 PM

#

i wish

#

dfw

astral path Mar 16, 2021, 7:50 PM

#

oh nice

#

austin is the big tech scene in texas so that was my first guess

misty flint Mar 16, 2021, 7:50 PM

#

spent some time in austin tho. at least ik there will be jobs for me in my state after i graduate

#

yeah gonna try to move there after my first job

#

we shall see

astral path Mar 16, 2021, 7:51 PM

#

slc is on the rise in tech jobs overall

misty flint Mar 16, 2021, 7:51 PM

#

they have a ton jobs down there

astral path Mar 16, 2021, 7:51 PM

#

which is where i go to school

misty flint Mar 16, 2021, 7:51 PM

#

noice

astral path Mar 16, 2021, 7:51 PM

#

ok so anyways i had a question

#

more boring than usual

misty flint Mar 16, 2021, 7:51 PM

#

lolol

#

go for it

astral path Mar 16, 2021, 7:51 PM

#

I'm building a function to scrape from a table and one part is dropping certain columns from a dataframe at some point. however, not all tables will have a specific column. how would I make it so this one column is only dropped if it exists?

#

rows_df = rows_df.drop(columns=['#', 'Weight', 'Hometown', 'High School', 'RSCI Top 100']) is what i have right now, and not all tables have High School as a column

misty flint Mar 16, 2021, 7:52 PM

#

what happens if you try to drop the column but it doesnt exist?

#

does it return an error?

astral path Mar 16, 2021, 7:52 PM

#

yeah

#

KeyError: "['High School'] not found in axis"

misty flint Mar 16, 2021, 7:53 PM

#

i would just do a small try/except code right before dropping the column

astral path Mar 16, 2021, 7:53 PM

#

i thought about it but

misty flint Mar 16, 2021, 7:53 PM

#

see if it exists

astral path Mar 16, 2021, 7:53 PM

#

wait nvm

misty flint Mar 16, 2021, 7:53 PM

#

did it not work?

#

i think it should work

astral path Mar 16, 2021, 7:54 PM

#

yeee it works!!

#

thank u man

misty flint Mar 16, 2021, 7:54 PM

#

Praise

astral path Mar 16, 2021, 7:54 PM

#

i'm ALMOST done with data collection ! !

misty flint Mar 16, 2021, 7:54 PM

#

np sometimes its easy to figure out once you talk your ideas out

#

and nice

#

Praise

#

i believe in you

astral path Mar 16, 2021, 7:55 PM

#

$350 now on the line here!

misty flint Mar 16, 2021, 7:55 PM

#

💸

astral path Mar 16, 2021, 7:55 PM

#

💰

silk marsh Mar 16, 2021, 7:58 PM

#

@serene scaffold yes like i was searching for the project like what it is based upon

#

am not able to understand is it based on knn or matrix....

astral path Mar 16, 2021, 7:59 PM

#

misty flint i would just do a small try/except code right before dropping the column

hmm so it's happening on other columns now in debugging

#

so it's not just one column

#

is there a way to just conditionally do it for any col?

exotic maple Mar 16, 2021, 8:03 PM

#

@astral path You can try creating a master list of cols, then retrieve the columns names from the actual table, evaluate which ones match, and then only delete the matche

#

matches

#

for example

#

!code

arctic wedgeBOT Mar 16, 2021, 8:03 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple Mar 16, 2021, 8:05 PM

#

cols_to_delete = ["PLOP1", "PLOP2", "COL3"]

# here you get the cols from the df
df_cols = list(df.columns)

# evaluate and keep only those that are found
to_del = [col for col in df_cols if col in cols_to_delete]

# finally, drop them (drop returns a new dataframe, so assign it back)
df = df.drop(to_del, axis=1)

#

I did it via lists but there are other ways to do it
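One of those other ways, sketched on a toy table: `DataFrame.drop` accepts `errors="ignore"`, which skips labels that are missing instead of raising `KeyError`. The column list is the one from the question; the sample data is invented.

```python
import pandas as pd

# Toy table that happens to lack the 'High School' column (and a few others).
rows_df = pd.DataFrame({"#": [1, 2], "Player": ["A", "B"], "Weight": [200, 215]})

unwanted = ["#", "Weight", "Hometown", "High School", "RSCI Top 100"]

# errors="ignore" drops the labels that exist and silently skips the rest.
rows_df = rows_df.drop(columns=unwanted, errors="ignore")
print(list(rows_df.columns))
```

This avoids wrapping every drop in try/except, at the cost of also hiding genuine typos in the column names.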

misty flint Mar 16, 2021, 8:13 PM

#

that seems like it would work yeah

#

just dont forget the ending brackets for the list comprehension

#

that ALWAYS trips me up

#

DoggoKek

#

"why doesnt this list comprehension work"

#

forgets the brackets

#

Clown2

exotic maple Mar 16, 2021, 8:16 PM

#

misty flint that ALWAYS trips me up

I sometimes forget that list comprehension needs them at all lol

misty flint Mar 16, 2021, 8:17 PM

#

we are literally the same person i swear

#

ID_BoomKek

astral path Mar 16, 2021, 8:58 PM

#

exotic maple ```py cols_to_delete = ["PLOP1", "PLOP2", "COL3"] # here you get the cols from ...

thank you! this worked

exotic maple Mar 16, 2021, 9:19 PM

#

what would be a good, non iteration way to find the largest string on a list

#

?

#

right now I have this

#

    long_word = ""
    
    for word in unique_words:
        if len(word) > len(long_word):
            long_word = word
        else:
            continue

#

but im sure there has to be more optimal way to do it 😦

astral path Mar 16, 2021, 9:26 PM

#

I'm assuming it's not sorted?

#

if not I'm pretty sure the only way would be O(N)

spark stag Mar 16, 2021, 9:32 PM

#

exotic maple what would be a good, non iteration way to find the largest string on a list

```py
>>> strings = ["abcd", "12345", "this_is_a_very_long_string"]
>>> max(strings, key=len)
'this_is_a_very_long_string'
```

#

max will implicitly iterate over the list but if the list is in no way sorted, you would need to iterate over the list to find the longest string

astral path Mar 16, 2021, 9:35 PM

#

if I have a dataframe which looks like this

#

and i want to append a row with new columns that looks like this for each row in the dataframe

#

how would I do that?

#

so the first row with 2016 Villanova vs. 2016 North Carolina would have that all the data from the second image as data in new columns

#

and all the other rows in the dataframe would have the same columns

nocturne kraken Mar 16, 2021, 9:52 PM

#

probably would be something called a "join"

astral path Mar 16, 2021, 9:58 PM

#

nocturne kraken probably would be something called a "join"

what if there's no common columns?

nocturne kraken Mar 16, 2021, 9:58 PM

#

then how do you decide which row gets to which row?

astral path Mar 16, 2021, 9:59 PM

#

i'm iterating over each row

#

for index, row in df.iterrows():
    try:
      # a should be the two rows combined, get_stats is a function that creates the new columns
      a = get_stats(row['url']).join(row)
      display(a)
    except:
      print("ERROR: ", row['url'])
    time.sleep(1.2)
    break

nocturne kraken Mar 16, 2021, 10:00 PM

#

so just by the order?

astral path Mar 16, 2021, 10:04 PM

#

nocturne kraken so just by the order?

yeah just by order

nocturne kraken Mar 16, 2021, 10:05 PM

#

you can concat with axis=1

astral path Mar 16, 2021, 10:10 PM

#

i'll try it

astral path Mar 16, 2021, 10:12 PM

#

nocturne kraken you can concat with axis=1

pd.concat([row, get_stats(row['url'])], axis=1) gives, although it should be a single row

nocturne kraken Mar 16, 2021, 10:13 PM

#

I don't know what you are doing

#

I thought you had two dataframes: df1 and df2

#

and you want them smooshed together horizontally

#

that is done by pd.concat([df1, df2], axis=1)

astral path Mar 16, 2021, 10:15 PM

#

i have a dataframe df with a column url, and I have a function get_stats(url) which takes the value of url for a given row and returns a new dataframe which is a single row. I want to append get_stats(url) to the dataframe for each value of url

nocturne kraken Mar 16, 2021, 10:19 PM

#

use row.append

#

or you could use df.apply and create the entire dataframe of stats from the urls
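A sketch of the "build the whole stats frame, then glue it on" idea, assuming `get_stats(url)` returns a one-row dataframe as described. The `get_stats` below is a hypothetical stand-in (the real one scrapes a page), and the column names `ppg` and `rpg` are placeholders.

```python
import pandas as pd

def get_stats(url):
    """Hypothetical stand-in: returns a one-row dataframe of stats for a URL."""
    return pd.DataFrame({"ppg": [len(url)], "rpg": [url.count("a")]})

df = pd.DataFrame({
    "team": ["A", "B"],
    "url": ["http://example.com/a", "http://example.com/bb"],
})

# One single-row frame per URL, stacked vertically in the same order as df.
stats = pd.concat([get_stats(u) for u in df["url"]], ignore_index=True)

# Line the two frames up by position and join them side by side.
combined = pd.concat([df.reset_index(drop=True), stats], axis=1)
print(combined)
```

Resetting the index on both sides matters here: `pd.concat(..., axis=1)` aligns on index labels, not on row order.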

astral path Mar 16, 2021, 10:32 PM

#

aight i'm running that

kindred radish Mar 16, 2021, 10:43 PM

#

In machine learning, is it important to remove outliers?

#

I've got this column where the majority of values are in the range of 50-100, however the column has a max value of 400. So when i normalise the column, i get a bunch of low numbers because of only one outlier

#

This would presumably have an effect on how the machine learns right? Is there a scheme for removing outliers? Like looking at the standard deviation or something?

lean ledge Mar 16, 2021, 10:51 PM

#

kindred radish I've got this column where the majority of values are in the range of 50-100, ho...

You can normalise using standard deviation instead of max value

lean ledge Mar 16, 2021, 10:51 PM

#

kindred radish This would presumably have an effect on how the machine learns right? Is there a...

Yeah standard deviations for finding outliers works fine

kindred radish Mar 16, 2021, 10:51 PM

#

oh hi Raggy!

lean ledge Mar 16, 2021, 10:51 PM

#

PanGSneakyWave

kindred radish Mar 16, 2021, 10:52 PM

#

Is that what this is doing?

astral path Mar 16, 2021, 10:53 PM

#

that's just normalizing it by min and max values

lean ledge Mar 16, 2021, 10:53 PM

#

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
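To make the difference concrete, here is a small NumPy sketch on a made-up column (values between 50 and 100 plus one 400). The z-score line is the same computation `StandardScaler` performs by default (subtract the mean, divide by the population standard deviation). The threshold of 2 standard deviations for flagging outliers is an arbitrary choice for this example.

```python
import numpy as np

values = np.array([50, 60, 70, 80, 90, 100, 400], dtype=float)  # made-up column

# Min-max scaling: the single 400 squeezes everything else toward 0.
min_max = (values - values.min()) / (values.max() - values.min())

# Standardisation (z-scores): subtract the mean, divide by the standard deviation.
z = (values - values.mean()) / values.std()

# Flag points more than 2 standard deviations from the mean (threshold is arbitrary).
outliers = values[np.abs(z) > 2]

print(min_max.round(2))
print(z.round(2))
print(outliers)
```

Negative z-scores are expected: they simply mark values below the mean. Note that the outlier still pulls the mean and standard deviation, so standardising reduces the squeezing but does not remove the outlier's influence.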

astral path Mar 16, 2021, 10:53 PM

#

yeah what raggy pasted is better for you

kindred radish Mar 16, 2021, 10:55 PM

#

physics wizards pulling through once again 👏 cheers Raggy ^^

kindred radish Mar 16, 2021, 11:10 PM

#

So i gave this a whirl, and I just want to know how this will affect learning:

#

So there are negatives, so if i put this through a MLP i imagine this will "slacken" weights instead of "tightening" them (does that make sense)?

#

Is this better or worse than using normalised values from 0 to 1?

#

Moreover, do i still need to remove these outliers? Or is it fine to keep them as this data is "standardised" instead of normalised?

high badge Mar 16, 2021, 11:13 PM

#

the outliers should be removed i think

#

or replaced

#

by median or mean or imputed with whatever statistic you see fit

kindred radish Mar 16, 2021, 11:14 PM

#

so if i remove them i get rid of the whole row right?

high badge Mar 16, 2021, 11:14 PM

#

you can get rid of the whole row, column, or just the value itself

kindred radish Mar 16, 2021, 11:14 PM

#

I thought i had to get rid of NaN data?

high badge Mar 16, 2021, 11:14 PM

#

but if u were to get rid of just the value itself you should replace it with another value that wont disproportionately affect training

kindred radish Mar 16, 2021, 11:15 PM

#

oh i see

#

which is why you suggested the mean/median

high badge Mar 16, 2021, 11:15 PM

#

yea

kindred radish Mar 16, 2021, 11:15 PM

#

hmmm i didn't think of that before. In this case it doesn't matter too much since it's only 2 values. But i have removed a decent amount of NaN data before

#

Didn't think to replace the values with the mean/median

#

is removing it the "safer" option?

high badge Mar 16, 2021, 11:16 PM

#

depends on how important your row/column is

#

if you identify a column as significant or you think it will be useful during training, then perhaps u should keep the column

#

heres a stackoverflow question explaining the difference between minmaxscaler and standardscaler
https://stackoverflow.com/questions/51237635/difference-between-standard-scaler-and-minmaxscaler

Stack Overflow

Difference between Standard scaler and MinMaxScaler

What is the difference between MinMaxScaler and standard scaler.

MMS= MinMaxScaler(feature_range = (0, 1)) ( Used in Program1)

sc = StandardScaler() ( In another program they used Standard scaler...

kindred radish Mar 16, 2021, 11:17 PM

#

the answer would be that I don't know, since im not sure how much of an effect this particular column has on the data

high badge Mar 16, 2021, 11:18 PM

#

u can try estimating how important it is, if its not something blatantly irrelevant then it should be kept for the network to find hidden patterns in

shy kraken Mar 16, 2021, 11:24 PM

#

I can't seem to find the explanation for when you use a ! or a % in google colab....when do you use which? why do I have to use ! for ls and % for cd?

misty flint Mar 16, 2021, 11:35 PM

#

its the same as for jupyter magic commands

#

since colab is built on jupyter

#

look up jupyter notebook magic
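The short version of why `cd` needs `%`: `!command` runs the command in a separate shell process that exits immediately, while `%cd` is a magic that changes the working directory of the notebook process itself. The plain-Python sketch below imitates both behaviours with `subprocess` and `os.chdir`. It is an analogy for illustration, not what the notebook literally executes.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()

# Roughly what `!cd <dir>` does: the cd runs in a child shell that exits right away.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_shell_cd = os.getcwd()  # still the starting directory

# Roughly what `%cd <dir>` does: change the directory of the current process.
os.chdir(target)
after_magic_cd = os.getcwd()

os.chdir(start)  # restore the original directory
print(after_shell_cd == start, os.path.samefile(after_magic_cd, target))
```

Commands like `ls` only read state, so running them in a throwaway shell with `!` works fine.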

#

blobwizard

shy kraken Mar 16, 2021, 11:38 PM

#

Got it! I am a magician now

misty flint Mar 17, 2021, 12:04 AM

#

ducky_wizard

eternal hare Mar 17, 2021, 2:33 AM

#

RuntimeError: Input type (torch.FloatTensor) and weight type (XLAFloatType) should be the same

#

i have no idea

#

what to do

serene scaffold Mar 17, 2021, 3:06 AM

#

eternal hare RuntimeError: Input type (torch.FloatTensor) and weight type (XLAFloatType) shou...

there's no way to guess what the problem is without more context.

#

though I'll make one uninformed guess anyway: is there a way to convert whatever your weights are to a FloatTensor?

dire comet Mar 17, 2021, 3:13 AM

#

Hi, anyone knows what should i do here? The only thing that I know is that I should make a py code using a montecarlo and the maximum likelihood estimation, but not really know how.

exotic maple Mar 17, 2021, 3:14 AM

#

ooh a confidence interval question

#

I havent seen it with std of random error thou lol

#

Confidence interval is

#

mean +- (t-score @ confidence) * std / sqrt(samples)

#

t-score also requires degrees of freedom, which normally is samples -1

#

but i have never seen a case where you're given the standard deviation of the error

#

in your case you can calculate the sample mean from the observations, the t-score from the confidence interval and degrees of freedom

#

but i have no clue if you can just input the standard error of the mean there lol
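For the case where the standard deviation of the error is given rather than estimated from the sample, the usual textbook interval swaps the t-score for a normal quantile: mean +- z * sigma / sqrt(n). A standard-library sketch follows. The observations and `sigma=0.2` are invented, and the original exercise is not visible in this log, so treat this as the general formula rather than the assignment's exact solution.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(observations, sigma, confidence=0.95):
    """Confidence interval for the mean when the error's std `sigma` is known."""
    n = len(observations)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * sigma / sqrt(n)
    centre = mean(observations)
    return centre - half_width, centre + half_width

low, high = z_interval([9.8, 10.1, 10.0, 10.3, 9.9], sigma=0.2)
print(round(low, 3), round(high, 3))
```

When sigma has to be estimated from the data instead, the t-based formula quoted above applies, with samples - 1 degrees of freedom.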

dire comet Mar 17, 2021, 3:18 AM

#

Oh ok ok, I will give it a try. Thanks for the help tho @exotic maple 🙂

exotic maple Mar 17, 2021, 3:18 AM

#

this is what i said, in math form

#

x bar is the average of the sample

dire comet Mar 17, 2021, 3:36 AM

#

Thank you so much!

mortal flicker Mar 17, 2021, 4:14 AM

#

can someone please help me to re-arrange this data frame?

#

want a matrix that is 50 (states) x months

#

i have tried to group by date and then transpose

exotic maple Mar 17, 2021, 4:20 AM

#

You want a multilevel index of state and month?

mortal flicker Mar 17, 2021, 4:22 AM

#

yes

#

month1  val    val         val
month2  val    val         val

#

maybe i can export it as a csv and reshape it in R lol

lavish swift Mar 17, 2021, 4:52 AM

#

@mortal flicker are you able to share your data? or a snippet? Just easier to try some things on.

#

If not, have you tried pivot? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
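
A minimal sketch of the pivot suggestion, assuming the data is in long format with one row per state and month. The column names `state`, `month`, and `value` are assumptions, since the original frame was only shared as an image.

```python
import pandas as pd

# Hypothetical long-format data: one row per (state, month) pair.
long_df = pd.DataFrame({
    "state": ["AL", "AL", "AK", "AK"],
    "month": ["2021-01", "2021-02", "2021-01", "2021-02"],
    "value": [10, 12, 3, 4],
})

# Rows become states, columns become months.
wide = long_df.pivot(index="state", columns="month", values="value")
print(wide)
```

Calling `wide.T` gives the months-as-rows layout sketched above. `pivot` raises an error if a (state, month) pair appears more than once; `pivot_table` with an aggregation function is the usual alternative in that case.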

astral path Mar 17, 2021, 5:50 AM

#

im just stumped at filling in the columns in this dataframe

#

im at the point where it looks like this, it has the desired columns and the original columns

#

and i have a function

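import numpy as np

def stats(x):
  # get_stats scrapes the page at URL x; return NaN if the scrape fails
  try:
    a = get_stats(x)
  except Exception:
    return np.nan
  display(x)  # show progress in the notebook
  return a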

which takes a URL and returns a dataframe with one row containing all the stats in the columns from ppg on

#

like this

#

i just have no clue how to loop over each url and insert the desired values from stats(url) into the desired columns

#

any help? Thanks!!

misty flint Mar 17, 2021, 6:07 AM

#

lavish swift If not, have you tried pivot? https://pandas.pydata.org/docs/reference/api/panda...

oh nice function. def taking note of that one for later use

#

ValkNaruhodo

brave owl Mar 17, 2021, 7:27 AM

#

Posted a Question about Normalizing at #help-kiwi , Please help if possible

sudden panther Mar 17, 2021, 7:56 AM

#

Hi everyone, I’m Anggita. I’m currently building a Twitter sentiment analysis project. I want to analyze the hashtags of trending tweets in different countries. Does anyone know what I should do?

obtuse sable Mar 17, 2021, 8:10 AM

#

Trade signals xD

bronze jacinth Mar 17, 2021, 9:27 AM

#

getting precision as 0 with this dataset. help?

neon thistle Mar 17, 2021, 9:29 AM

#

Hello guys, I'm starting out in Python and I'm trying to detect anomalies in a list. I searched for a way to do this and tested some code, but I'm receiving an error:
ModuleNotFoundError: No module named 'numpy'
Anyone can help me?

bronze jacinth Mar 17, 2021, 9:31 AM

#

you have to import numpy

lean ledge Mar 17, 2021, 9:33 AM

#

You have to install numpy first
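
A ModuleNotFoundError means the interpreter running the script cannot find the package at all, so an import statement alone will not fix it. A small sketch for checking which interpreter is in use and whether it can see numpy (the helper name `is_installed` is made up here):

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if this interpreter can find `module_name`."""
    return importlib.util.find_spec(module_name) is not None


print("interpreter:", sys.executable)
print("numpy installed:", is_installed("numpy"))
```

If it prints False, installing with `python -m pip install numpy` (or `conda install numpy` inside a conda env) using that same interpreter should resolve the error.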

bronze jacinth Mar 17, 2021, 9:34 AM

#

@lean ledge would you mind helping me out?

neon thistle Mar 17, 2021, 9:36 AM

#

bronze jacinth you have to import numpy

TY

neon thistle Mar 17, 2021, 9:36 AM

#

lean ledge You have to install numpy first

TY

misty tiger Mar 17, 2021, 9:41 AM

#

can anyone help me to set up a GPU for jupyter notebook

rich reef Mar 17, 2021, 10:41 AM

#

Greetings,
I have a pandas dataframe with a Datetimeindex and various discrete values.
For simplification:

index       p    q
2020-1-1    6    9
2020-1-8    4    19
2020-1-12   4    17

Every time a value p or q changes, a record is added. That means that at 2020-1-2 the values are essentially the same as 2020-1-1.

I want to plot these data but face a problem:

Plotting the sparse data structure as a line, there's a slope between measurements which is misleading.
Plotting the sparse data structure as a bar leaves countless gaps.
Filling the missing dates and plotting line or bar is slow, because the real dataset spans several years and has 40 columns.

Essentially I want a bar chart where bars have different widths, until the next measure, or a line plot that moves "like a staircase" with only horizontal and vertical segments.

What would you recommend?

untold cove Mar 17, 2021, 10:47 AM

#

@rich reef this may help you: https://stackoverflow.com/questions/66545695/python-plotly-dash-question-custom-labels-and-color-based-on-values

Stack Overflow

Python - Plotly, Dash question - Custom Labels and Color based on V...

I have the following code, im new to Plotly, dash, and pandas so I am hoping someone may be able to help me out. I am after 3 things:

A Second Y line with the data from df['ScoreMaths']
A Legend, so

rich reef Mar 17, 2021, 10:49 AM

#

This seems to be a very different issue than what I'm asking 🤔

#

But thanks anyway

velvet thorn Mar 17, 2021, 10:59 AM

#

@rich reef I like the line option

grave frost Mar 17, 2021, 11:02 AM

#

Oh god stupid Jupyter notebooks. I put my model to train for 48 hours on cloud time and it didn't say anything at all. the next time when I used the command line I got killed. Jupyter sucks 🤬

#

Now I have to spend more money to train it for the 3rd time with more RAM 😞

velvet thorn Mar 17, 2021, 11:03 AM

#

grave frost Now I have to spend more money to train it for the 3rd time with more RAM 😞

😔

rich reef Mar 17, 2021, 11:07 AM

#

velvet thorn @rich reef I like the line option

I'll try that, then. I think I'll duplicate the index, offset by a second, and merge and forwardfill.

That should get

index                    p    q
2019-12-31 23:59:59      NA   NA
2020-01-01 00:00:00      6    9
2020-01-07 23:59:59      6    9
2020-01-08 00:00:00      4    19
2020-01-11 23:59:59      4    19
2020-01-12 00:00:00      4    17

A line plot should then practically be a staircase. The NA at the start doesnt matter because its outside the interval.

velvet thorn Mar 17, 2021, 11:10 AM

#

rich reef I'll try that, then. I think I'll duplicate the index, offset by a second, and m...

I would have done it in pure numpy but that works too (and is arguably better!)

rich reef Mar 17, 2021, 11:11 AM

#

How would you do that? Honestly I always grab pandas out of convenience, and I work with Dataframe's in the rest of my data wrangling
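
One possible pure-NumPy version of the staircase, not necessarily what velvet thorn had in mind: repeat every point so that a plain line plot only ever moves horizontally or vertically. The helper name `staircase` is made up; the dates and values are the `p` column from the simplified example above.

```python
import numpy as np


def staircase(x, y):
    """Duplicate points so a line plot has only horizontal and vertical segments."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Every x except the first appears twice; every y except the last appears twice.
    return np.repeat(x, 2)[1:], np.repeat(y, 2)[:-1]


dates = np.array(["2020-01-01", "2020-01-08", "2020-01-12"], dtype="datetime64[D]")
p = np.array([6, 4, 4])

step_dates, step_p = staircase(dates, p)
for d, v in zip(step_dates, step_p):
    print(d, v)
```

If matplotlib does the plotting, `plt.step(x, y, where="post")` or the `drawstyle="steps-post"` line option draws the same shape directly from the sparse data, without filling in missing dates.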

rich reef Mar 17, 2021, 11:18 AM

#

grave frost Oh god stupid Jupyter notebooks. I put my model to train for 48 hours on cloud t...

https://www.youtube.com/watch?v=7jiPeIFXb6U I'll just leave this here ...

YouTube

O'Reilly

I don't like notebooks.- Joel Grus (Allen Institute for Artificial ...

I have been using and teaching Python for many years. I wrote a best-selling book about learning data science. And here’s my confession: I don’t like notebooks. (There are dozens of us!) I’ll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do ...

▶ Play video

warm moth Mar 17, 2021, 12:04 PM

#

Ai is cool but cyber security is better

#

im doing cyber sec currently

#

just hopefully i can get a good gpa

#

but im working on it

lavish tundra Mar 17, 2021, 12:35 PM

#

i'm making a bot that multiple people will use to generate an image that gets sent back to them, and i'm wondering how the file should be rendered. is it possible to send the XY plot without saving it to disk, keeping it in RAM so the script can tell each person's request apart?
What i mean is: instead of saving the file with an ID, sending it, and then deleting it, can i skip saving and just send each file to the person who requested it?
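
A minimal sketch of rendering to memory instead of disk, assuming the plot is drawn with matplotlib (the question does not say which plotting or bot library is used). Each call builds its own buffer, so nothing is shared between requests and no ID-named temp files are needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt


def render_plot_png(xs, ys):
    """Render an XY plot and return the PNG as bytes, without touching the disk."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # release the figure so a long-running bot does not leak memory
    return buffer.getvalue()


png_bytes = render_plot_png([0, 1, 2], [0, 1, 4])
print(len(png_bytes), "bytes")
```

The bytes can be wrapped in `io.BytesIO` again and handed to whatever upload call the bot library offers for file-like objects.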

fickle socket Mar 17, 2021, 12:47 PM

#

Anyone got any recommends on using Python and SLURM? I got a NLP pipeline I am working on and I am wondering if anyone here got some tips on trying to scale Python on a supercomputer.

#

Specifically I am using Spacy and pipe multiprocessing, and it seems like I am going to hit some road blocks with that, but I can figure that out on my end.

serene scaffold Mar 17, 2021, 12:54 PM

#

fickle socket Anyone got any recommends on using Python and SLURM? I got a NLP pipeline I am w...

What is slurm?

grave frost Mar 17, 2021, 12:56 PM

#

warm moth Ai is cool but cyber security is better

There are plenty of projects to automate CyberSec with AI. currently they are experimental so it required bombarding the server with requests - however I read about some corporate AI projects that can query a server and construct a full profile of it to aid another model to find vulnerabilities in it. pretty cool if you ask me

whole idol Mar 17, 2021, 1:00 PM

#

fickle socket Specifically I am using Spacy and pipe multiprocessing but I seems like I am goi...

If you are using SLURM I guess you'll be deploying your program on a Multi-Mode scale? The easiest would be to start using MPI via mpi4py to communicate your processes in the cluster

fickle socket Mar 17, 2021, 1:01 PM

#

serene scaffold What is slurm?

It's a job scheduler for super computers. It's not like Hadoop with map reduce. It looks like a very fancy bash script.

whole idol Mar 17, 2021, 1:01 PM

#

whole idol If you are using SLURM I guess you'll be deploying your program on a Multi-Mode ...

multinode rather

fickle socket Mar 17, 2021, 1:02 PM

#

whole idol If you are using SLURM I guess you'll be deploying your program on a Multi-Mode ...

I am worried about figuring out how to get Nodes to talk to each other. So far my code is basically what I run on my computer. Single node basically and spawning a bunch of processes.

How much of a pain is it to run things with MPI?

lean ledge Mar 17, 2021, 1:08 PM

#

fickle socket Anyone got any recommends on using Python and SLURM? I got a NLP pipeline I am w...

Horovod works well

#

Very very automatic

#

Essentially you just need to add a few lines to your code and it's automatically capable of running on a cluster

fickle socket Mar 17, 2021, 1:10 PM

#

Does that work with SLURM? It looks like I have to drag the admins to add that.

lean ledge Mar 17, 2021, 1:11 PM

#

It should work fine with slurm. Just need to load the openmpi and then set up relevant Python environments with horovod installed

#

Admins not needed unless for some reason there's no ompi

#

Edit your few lines of code, then you can mpirun within your sbatch

fickle socket Mar 17, 2021, 1:11 PM

#

There is definitely OMPI.

#

Isn't this for TensorFlow stuff? I am not running any GPU stuff just Python multiprocessing stuff.

lean ledge Mar 17, 2021, 1:12 PM

#

Oh you're not distributed NLP training?

fickle socket Mar 17, 2021, 1:13 PM

#

Oh, I am not doing training. I am running Spacy models. I should have mentioned that.

#

There is already a trained model. The challenge is to run it in a reasonable amount of time. I got thousands of biomedical papers to work through so clearly I need to think harder about running things.

lean ledge Mar 17, 2021, 1:15 PM

#

Do you have your NLP corpus stored in a pandas df at any point?

fickle socket Mar 17, 2021, 1:17 PM

#

Yea. But the dataset itself is a bunch of little json files with the plaintext in there. It's the CORD19 dataset if you know about it.

serene scaffold Mar 17, 2021, 1:17 PM

#

fickle socket There is already a trained model. The challenge is to run it in a reasonable amo...

So you're going to predict over some data? Can't you just divide it up between however many cores you have?

lean ledge Mar 17, 2021, 1:18 PM

#

Honestly there's a good chance you're better off launching multiple sbatch runs with different number arguments that pick which data they end up processing and then combining the information later

fickle socket Mar 17, 2021, 1:19 PM

#

serene scaffold So you're going to predict over some data? Can't you just divide it up between h...

That's the idea so far. It's "embarrassingly parallelized" since the docs don't have anything to do with each other. We aren't trying to predict things but mining them for linguistic features.

lean ledge Mar 17, 2021, 1:19 PM

#

Just make a script to split up the data into X directories, have a generic sbatch with an argument for which directory to look at, then launch X sbatch runs

#

They can be squeued and executed on their own
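
A sketch of the Python side of that idea. It is a variation on the suggestion above: instead of physically moving files into X directories, each job takes every n-th file from one sorted listing, so the argument passed by sbatch is just a shard index. The function names and the command-line layout are hypothetical.

```python
from pathlib import Path


def shard(paths, n_shards, shard_index):
    """Return the slice of `paths` that job number `shard_index` should process."""
    ordered = sorted(paths)  # every job must see the files in the same order
    return ordered[shard_index::n_shards]


def main(argv):
    # Hypothetical CLI: python process_shard.py DATA_DIR N_SHARDS SHARD_INDEX
    data_dir, n_shards, shard_index = argv[1], int(argv[2]), int(argv[3])
    for path in shard(Path(data_dir).glob("*.json"), n_shards, shard_index):
        print(path)  # placeholder for the real spaCy processing step


# Quick check with made-up file names: three jobs cover ten files without overlap.
files = [f"doc{i}.json" for i in range(10)]
parts = [shard(files, 3, i) for i in range(3)]
print(parts)
```

A Slurm job array (`sbatch --array=0-2`, which sets `SLURM_ARRAY_TASK_ID` in each task) is one way to supply the index, though launching separate sbatch runs with different arguments works just as well.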

fickle socket Mar 17, 2021, 1:19 PM

#

lean ledge Honestly there's a good chance you're better off launching multiple sbatch runs ...

Makes sense. The bash scripts to do this shouldn't be hard to write.

lean ledge Mar 17, 2021, 1:19 PM

#

While your first test run seems to be running, you can write a script to combine the output of the runs

#

Yep

#

Much easier than trying to parallelise the actual work on the application level using MPI

serene scaffold Mar 17, 2021, 1:20 PM

#

You can also use more_itertools.chunked and joblib

fickle socket Mar 17, 2021, 1:20 PM

#

I still need to make sure the pipeline itself runs with multiprocessing though, but that's already done.

lean ledge Mar 17, 2021, 1:21 PM

#

Psst, if in the future you need to do distributed training, horovod is lit 🔥 🔥

#

Implemented it on prod for all AutoML clients to use

#

Was fun

fickle socket Mar 17, 2021, 1:21 PM

#

lean ledge Psst, if in the future you need to do distributed training, horovod is lit 🔥 🔥

I'll keep that in mind. I know the cluster has some spicy GPUs on it but I don't think we are at that point yet.

lean ledge Mar 17, 2021, 1:23 PM

#

I sorta miss HPC stuff. The power rush you get from launching 64 GPUs for training is like a drug to my ego

#

Plus module load tensorflow is the chillest experience I've had with dependency management ever

grave frost Mar 17, 2021, 1:23 PM

#

Hmm... is there any way we can use pre-trained word embeddings with large documents (except averaging them)?

#

coz I don't think averaging would be very useful or retain information tbh

lean ledge Mar 17, 2021, 1:25 PM

#

https://towardsdatascience.com/document-embedding-techniques-fed3e7a6a25d

Medium

Document Embedding Techniques

A review of notable literature on the topic

grave frost Mar 17, 2021, 1:35 PM

#

Got it. Thanx a ton!

#

stuff like sentence-BERT isn't viable, and I only have the word2vec embeddings for a particular language. averaging loses the order, so that's not so good either

#

What d'you reckon might work better :-
Doc2Vec trained from scratch on dataset (which is small but for a specific domain)
OR
Word2Vec trained on Wiki + average + Tf-Idf

lapis sequoia Mar 17, 2021, 2:34 PM

#

Is it fine to return a variable from a function where the variable has the same name as the function it is contained in? For example:

def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds

# use the function as follows
>>> ds = ds(4.1, 9)

Or would something like this be better:

def calc_ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds

# use the function as follows
>>> ds = calc_ds(4.1, 9)

I have a bunch of functions like this defined in a module called solid. So I would actually call the function as

import solid
# approach 1
ds = solid.ds(4.1, 9)
# or using approach 2
ds = solid.calc_ds(4.1, 9)

I'm just wondering if there is a preferred naming convention for functions contained in a module.

serene scaffold Mar 17, 2021, 2:52 PM

#

@lapis sequoia functions contain references to themselves in their own namespace via the same name of the function in the outer namespace. That's what enables us to use recursion in the language. However in your first example, the statement ds = a + x / y re-assigns that name within the function's namespace.

#

however you could just do

def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    return a + x / y

no need to assign a name to it if you're just going to return it right away

#

please DM @sonic vapor about that

#

Anyway, it's bad practice to overwrite existing variable names unless you're trying to update the data that that name is supposed to represent.
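
The practical risk in the first approach is at the call site rather than inside the function. A short sketch of what happens: the assignment inside `ds` only creates a local variable, but `ds = ds(4.1, 9)` at module level rebinds the name to a float, so the function can no longer be called by that name. The `func` alias and the flag variable are only there for the demonstration.

```python
def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y  # local variable; does not touch the function object
    return ds


func = ds         # keep a second reference to the function
ds = ds(4.1, 9)   # module-level name `ds` now refers to a float

try:
    ds(4.1, 9)    # a float is not callable
    second_call_failed = False
except TypeError:
    second_call_failed = True

print(type(ds).__name__, second_call_failed)
```

Calling through the module, as in `ds = solid.ds(4.1, 9)`, avoids this particular collision because `solid.ds` and the caller's `ds` are different names.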

lapis sequoia Mar 17, 2021, 2:56 PM

#

Gotcha.

serene scaffold Mar 17, 2021, 2:56 PM

#

a common convention to avoid doing that is to put a trailing underscore at the end of the variable name. So you could have ds_ = a + x / y, though in this case that wouldn't be useful since you can just return that expression right away

lapis sequoia Mar 17, 2021, 3:00 PM

#

I like to assign whatever I return in a function to a variable. So the return statement is something like return x where x can be a small calculation (as in my example above) or a large calculation that spans 2 or 3 lines of code. I like the idea of using an underscore. Something like return _ds would work for my example.

serene scaffold Mar 17, 2021, 3:03 PM

#

lapis sequoia I like to assign whatever I return in a function to variable. So the return stat...

You can do it that way if you'd like, though I'm not sure I see the advantage.

The convention is usually to use a trailing underscore to avoid name overwriting. Leading underscores indicate that an attribute isn't part of an object's interface, but variables internal to a function aren't exposed anyway.

lapis sequoia Mar 17, 2021, 3:10 PM

#

My second approach seems to avoid these issues by providing a more descriptive function name which avoids the naming problems.

# in solid.py
def calc_ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds
# use the function
import solid
ds = solid.calc_ds(4.1, 9)

serene scaffold Mar 17, 2021, 3:11 PM

#

lapis sequoia My second approach seems to avoid these issues by providing a more descriptive f...

my suggestion is for the last two lines to be:

ds_ = a + x / y
return ds_

rather than

_ds = a + x / y
return _ds

by changing the location of the underscore.

bronze jacinth Mar 17, 2021, 3:11 PM

#

im getting my precision = 0
searched up online and realised i have too many values for class 1 (6k+) compared to class 2(around 200)
now the ratio is 500 to 200 ish but im still getting precision = 0
help?

serene scaffold Mar 17, 2021, 3:12 PM

#

I don't agree with the second approach. if your function already has a descriptive name for the namespace that it's in, prepending it with "calc_" isn't very informative. most functions do a calculation.

#

@bronze jacinth either your model doesn't work, or the system you're using to calculate your precision score doesn't work. Or some other third thing. Can you be more descriptive about how you arrived at this point?

bronze jacinth Mar 17, 2021, 3:13 PM

#

sure yes

#

(im new so i apologise for any mistakes or misunderstandings)
i think the model is working because im getting decent accuracy and confidence

serene scaffold Mar 17, 2021, 3:15 PM

#

bronze jacinth (im new so i apologise for any mistakes or misunderstandings) i think the model ...

no problem. do you understand what the precision score is telling you?

#

like, what is the formula, and why does it matter?

bronze jacinth Mar 17, 2021, 3:16 PM

#

is it like the number of predicted values that turned out to be right?

haughty finch Mar 17, 2021, 3:16 PM

#

Help, my code keeps crashing when it reaches model.fit(X,Y)
https://paste.pythondiscord.com/fanihamipe.py

serene scaffold Mar 17, 2021, 3:16 PM

#

bronze jacinth is it like the number of predicted values that turned out to be right?

"right" in what sense? lemon_sweat

bronze jacinth Mar 17, 2021, 3:16 PM

#

our teacher didnt explain the code much so i have to refer to youtube to understand syntax

serene scaffold Mar 17, 2021, 3:17 PM

#

what course is this?

bronze jacinth Mar 17, 2021, 3:17 PM

#

serene scaffold "right" in what sense? lemon_sweat

¯_(ツ)_/¯

serene scaffold Mar 17, 2021, 3:17 PM

#

on second thought though, it sounds like you do understand it

bronze jacinth Mar 17, 2021, 3:17 PM

#

serene scaffold what course is this?

just a college course where our seniors teach us stuff

serene scaffold Mar 17, 2021, 3:17 PM

#

because precision tells you how often the predictions that you actually make are right

bronze jacinth Mar 17, 2021, 3:18 PM

#

serene scaffold because precision tells you how often the *predictions that you actually make* a...

oOOo how often alright

serene scaffold Mar 17, 2021, 3:18 PM

#

if it is then I should leave

bronze jacinth Mar 17, 2021, 3:18 PM

#

im here so i drop the average here by a couple

serene scaffold Mar 17, 2021, 3:18 PM

#

bronze jacinth oOOo how often alright

do you know the formula?

#

I'm looking for one expression with tp, fp, fn, tn in there. but you won't use ~~one~~ two of those

bronze jacinth Mar 17, 2021, 3:18 PM

#

nope im afraid that wasnt taught

#

but i did see something like that while learning online

serene scaffold Mar 17, 2021, 3:19 PM

#

precision is tp / (tp + fp)

#

do you know what tp and fp are?

bronze jacinth Mar 17, 2021, 3:20 PM

#

no 😦

serene scaffold Mar 17, 2021, 3:20 PM

#

suppose you're making a classifier that tells you if something is a sandwich

#

if something is a sandwich, and your model says it's a sandwich, that's a true positive

misty flint Mar 17, 2021, 3:20 PM

#

ooo confusion matrix

serene scaffold Mar 17, 2021, 3:21 PM

#

if something is a sandwich, and your model says it's a salad, that's a false negative. it said it wasn't something (negative) and it was wrong (false)

#

but if it was a salad, and your model said it was a sandwich, that's a false positive. It said it was the thing you were looking for, but it's wrong.

misty flint Mar 17, 2021, 3:21 PM

#

speaking of stats, this was a good statement on p values and common misconceptions from the ASA https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108

bronze jacinth Mar 17, 2021, 3:21 PM

#

serene scaffold if something is a sandwich, and your model says it's a salad, that's a false neg...

sir that explanation is beautiful

serene scaffold Mar 17, 2021, 3:21 PM

#

bronze jacinth sir that explanation is beautiful

you're beautiful 💚

#

so basically, precision means "out of all the times my model said it was a sandwich, how many of them were actually sandwiches?"

#

as opposed to "what percentage of the sandwiches in the data did the model find?"

#

the latter is called recall
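
The sandwich example can be written out as a small sketch. The labels and predictions below are made up; precision is tp / (tp + fp) as given above, and recall is tp / (tp + fn).

```python
def precision_recall(y_true, y_pred, positive="sandwich"):
    """Count tp, fp, fn for the positive class and derive precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Fall back to 0.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up ground truth and model output.
y_true = ["sandwich", "sandwich", "sandwich", "salad", "salad"]
y_pred = ["sandwich", "salad", "sandwich", "sandwich", "salad"]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the model said "sandwich" three times and was right twice, and it found two of the three real sandwiches.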

bronze jacinth Mar 17, 2021, 3:24 PM

#

oOhh yes

serene scaffold Mar 17, 2021, 3:24 PM

#

So to your question, why are you getting 0 for your precision score

bronze jacinth Mar 17, 2021, 3:24 PM

#

and true predictions over all predictions made is accuracy right?

serene scaffold Mar 17, 2021, 3:25 PM

#

bronze jacinth and true predictions over all predictions made is accuracy right?

yes, though the accuracy score isn't very useful if true negatives aren't helpful
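
A sketch of how decent accuracy and zero precision can coexist on imbalanced data. The counts are invented: 95 negatives, 5 positives, and a model that flags two negatives and misses every positive.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up imbalanced data.
y_true = [0] * 95 + [1] * 5
y_pred = [1] * 2 + [0] * 93 + [0] * 5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if tp + fp else 0.0
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

The true negatives carry the accuracy to 0.93 even though not a single positive prediction is correct.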

lapis sequoia Mar 17, 2021, 3:25 PM

#

serene scaffold I don't agree with the second approach. if your function already has a descripti...

Well it depends on the use case. Pandas has a bunch of functions that start with read_. So knowing if a function performs a calculation or parses some data or provides a file path by just looking at the name of the function can be useful.

bronze jacinth Mar 17, 2021, 3:25 PM

#

my model says that these are the sandwiches, but actually none of them are
and thats why precision is coming 0

serene scaffold Mar 17, 2021, 3:26 PM

#

bronze jacinth my model says that these are the sandwiches, but actually none of them are and t...

so if your precision is quite literally zero, and this isn't an error with the precision calculator

#

I guess you'll have to adjust the parameters of your model 🙃

bronze jacinth Mar 17, 2021, 3:26 PM

#

hmm

#

not sure how but ill will try doing that

serene scaffold Mar 17, 2021, 3:26 PM

#

but not having run your code, I can't rule out that the precision calculator is broken

bronze jacinth Mar 17, 2021, 3:27 PM

#

what does support mean?

serene scaffold Mar 17, 2021, 3:27 PM

#

bronze jacinth what does support mean?

in what context?

bronze jacinth Mar 17, 2021, 3:28 PM

#

while using classification report

serene scaffold Mar 17, 2021, 3:28 PM

#

lapis sequoia Well it depends on the use case. Pandas has a bunch of functions that start with...

I assume ds has an established meaning

#

like in this other conversation, if I had a function that calculates precision, I would just call it precision because it's known that precision is a metric

serene scaffold Mar 17, 2021, 3:29 PM

#

bronze jacinth while using classification report

so it's some other metric?

#

I've never heard of it

bronze jacinth Mar 17, 2021, 3:29 PM

#

yea maybe

#

#

i printed the confusion matrix because it sounded cool (i still have to learn that tho)

serene scaffold Mar 17, 2021, 3:30 PM

#

It might be that support is the number of instances of that class for a given calculation
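
That guess matches scikit-learn's classification report, where support is the number of occurrences of each class in the true labels. A tiny sketch of the same count using only the standard library, with made-up labels:

```python
from collections import Counter

# Made-up true labels; predictions play no part in support.
y_true = ["sandwich", "sandwich", "sandwich", "salad", "salad"]

support = Counter(y_true)
print(support)
```

A very small support for one class is a hint that the per-class precision and recall for that class rest on only a handful of examples.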

#

but I'm not completely sure

bronze jacinth Mar 17, 2021, 3:31 PM

#

hmm

serene scaffold Mar 17, 2021, 3:31 PM

#

anyway, was this informative for you? I was going to study for my midterm, and then everything changed.

bronze jacinth Mar 17, 2021, 3:32 PM

#

this was hella helpful, thank you so much!

#

best of luck for your exams!

serene scaffold Mar 17, 2021, 3:32 PM

#

miss me with that operating systems shit

astral path Mar 17, 2021, 3:47 PM

#

im just stumped at filling in the columns in this dataframe
im at the point where it looks like this, it has the desired columns and the original columns

#

and i have a function

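import numpy as np

def stats(x):
  # get_stats scrapes the page at URL x; return NaN if the scrape fails
  try:
    a = get_stats(x)
  except Exception:
    return np.nan
  display(x)  # show progress in the notebook
  return a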

#

which takes a URL and returns a dataframe with one row containing all the stats in the columns from ppg on
like this

i just have no clue how to loop over each url and insert the desired values from stats(url) into the desired columns
any help? Thanks!!

deft ruin Mar 17, 2021, 3:55 PM

#

You can use the map method: df["url"] = df["url"].map(stats)

astral path Mar 17, 2021, 3:59 PM

#

aight that's running right now

#

it'll take a while to see if it worked tho, uses a webscraper

deft ruin Mar 17, 2021, 3:59 PM

#

Oh it’s probably a good idea to make a simple test case so you can see if it works quickly

astral path Mar 17, 2021, 4:00 PM

#

oop yeah thats true

#

this is what's returned

#

vs. before

#

df_test = df.head(5)
df_test['more_stats'] = df_test['url'].map(stats)
df_test

deft ruin Mar 17, 2021, 4:05 PM

#

ignoring the warning for a moment, what's in the more_stats column(s)?

astral path Mar 17, 2021, 4:08 PM

#

this is it

deft ruin Mar 17, 2021, 4:09 PM

#

looks like it worked but created a nested structure

#

my bad i didnt realize there were multiple columns at first

astral path Mar 17, 2021, 4:10 PM

#

yeah, so that's a good first step

#

farther than i'd gotten so far

deft ruin Mar 17, 2021, 4:10 PM

#

instead of redefining df["more_stats"] you can just append the new dataframe as extra columns

astral path Mar 17, 2021, 4:11 PM

#

would that just append new columns for each iteration or just for the first time i iterate

deft ruin Mar 17, 2021, 4:12 PM

#

it will only append once if you use pd.concat([df1, df2], axis = 1) after creating the more_stats df

#

as an intermediate variable I mean

astral path Mar 17, 2021, 4:17 PM

#

hmm so just df_test['url'].map(stats) still returns a nested structure

misty flint Mar 17, 2021, 4:23 PM

#

whats the difference between a 3d array and a tensor

#

is it like the difference between a computer science vector and a physics vector

#

different definitions depending on the field?

#

pithink

sudden panther Mar 17, 2021, 4:24 PM

#

Hi everyone, can anyone know how to fetch_tweets in arabic word?

serene scaffold Mar 17, 2021, 4:24 PM

#

misty flint whats the difference between a 3d array and a tensor

isn't a tensor just an array that can be on the GPU?

serene scaffold Mar 17, 2021, 4:26 PM

#

sudden panther Hi everyone, can anyone know how to fetch_tweets in arabic word?

would changing tweet.lang == 'en' to 'ar' work?

deft ruin Mar 17, 2021, 4:26 PM

#

@astral path maybe use .loc again to pull out the data frame you want and then concatenate

astral path Mar 17, 2021, 4:28 PM

#

ended up with this

#

it's slow cus its not vectorized but it'll work for my needs

serene scaffold Mar 17, 2021, 4:28 PM

#

astral path ended up with this

could you make one dataframe out of da and concatenate it onto the bottom?

#

rather than doing len(da) append operations?

astral path Mar 17, 2021, 4:29 PM

#

what do you mean?

#

da ended up returning a Series of DataFrames

serene scaffold Mar 17, 2021, 4:30 PM

#

@astral path can you copy and paste the code in that cell into this chat as text?

astral path Mar 17, 2021, 4:30 PM

#

ya

#

da = df_test['url'].map(stats)
dw = pd.DataFrame()
for i in da:
  dw = dw.append(i, ignore_index=True)
dw

#

!code

serene scaffold Mar 17, 2021, 4:32 PM

#

da = df_test['url'].map(stats)
dw = pd.concat(da, ignore_index=True)

#

see if that's what you want. it might be faster.

astral path Mar 17, 2021, 4:32 PM

#

i'll try

#

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"

serene scaffold Mar 17, 2021, 4:33 PM

#

it's a series of dataframes, right?

#

try dw = pd.concat(da.iteritems(), ignore_index=True)

#

@astral path does that work?

misty flint Mar 17, 2021, 4:40 PM

#

serene scaffold isn't a tensor just an array that can be on the GPU?

thats my working definition. mathematicians have a slightly dif variant but i dont envision it changing much

serene scaffold Mar 17, 2021, 4:41 PM

#

I didn't think "tensor" had a mathematical definition

#

I thought it was just a way of specifying how a mathematical data structure is being stored in a machine 😛

#

I stand corrected, but then I'm not a mathematician: https://en.wikipedia.org/wiki/Tensor

astral path Mar 17, 2021, 4:42 PM

#

serene scaffold @astral path does that work?

lemme check

#

nah TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

#

it didn't

serene scaffold Mar 17, 2021, 4:43 PM

#

astral path nah `TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series...

so it's not a series of dataframes
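
For what it's worth, the tuple error is what `iteritems()` would produce even for a Series of DataFrames, since it yields (index, value) pairs. A sketch of one way to get a single concat, assuming `da` holds one-row DataFrames plus NaN for failed scrapes. `fake_stats` and the example URLs are stand-ins, because the real `stats` function hits a website.

```python
import numpy as np
import pandas as pd


def fake_stats(url):
    """Hypothetical stand-in for stats(url): a one-row DataFrame, or NaN on failure."""
    if "bad" in url:
        return np.nan
    return pd.DataFrame({"url": [url], "ppg": [80.6]})


urls = pd.Series([
    "https://example.com/a",
    "https://example.com/bad",
    "https://example.com/c",
])
da = urls.map(fake_stats)  # object Series holding DataFrames and NaN

# pd.concat wants a plain list of frames, so keep only the DataFrames.
frames = [item for item in da.tolist() if isinstance(item, pd.DataFrame)]
dw = pd.concat(frames, ignore_index=True)
print(dw)
```

This does one concatenation instead of one append per row, which avoids copying the growing frame on every iteration.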

sudden panther Mar 17, 2021, 4:43 PM

#

serene scaffold would changing `tweet.lang == 'en'` to `'ar'` work?

It gives an error that says “failed to parse JSON payload: Unterminated string starting at: line 1 column 644416”

serene scaffold Mar 17, 2021, 4:44 PM

#

sudden panther It’s error and says “failed to parse JSON payload: Unterminated string starting ...

I'd need to see the whole error message and the related code

#

!paste

arctic wedgeBOT Mar 17, 2021, 4:44 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

misty flint Mar 17, 2021, 4:44 PM

#

serene scaffold I stand corrected, but then I'm not a mathematician: https://en.wikipedia.org/wi...

DoggoKek

astral path Mar 17, 2021, 4:44 PM

#

serene scaffold Mar 17, 2021, 4:44 PM

#

astral path

how about you do print(da) and copy/paste the text?

astral path Mar 17, 2021, 4:45 PM

#

ok

#

2049                                                  ...
2048                                                  ...
2047                                                  ...
2046                                                  ...
2045                                                  ...
Name: url, dtype: object

serene scaffold Mar 17, 2021, 4:46 PM

#

astral path 2049 ... 2048 ...

Can you instead do print(da.iloc[:5].to_csv())

astral path Mar 17, 2021, 4:46 PM

#

ya

#

,url
2049,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2048,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2047,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"
2046,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/s...  72.7  ...  0.666667        1.6

[1 rows x 20 columns]"
2045,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"

sudden panther Mar 17, 2021, 4:54 PM

#

serene scaffold I'd need to see the whole error message and the related code

These are the errors

serene scaffold Mar 17, 2021, 4:54 PM

#

@sudden panther I can't help with this, unfortunately

sudden panther Mar 17, 2021, 4:55 PM

#

serene scaffold @sudden panther I can't help with this, unfortunately

Ok, no problem and thank you🙏🏻

hazy niche Mar 17, 2021, 4:56 PM

#

Hi folks, I have imported a CSV file and I want to drop every column whose name equals alertXYZ, where XYZ are digits [0-9]
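
A minimal sketch of one way this could be done, assuming the unwanted columns are named exactly `alert` followed by one or more digits. The sample frame and its column names are made up for illustration, since the real CSV was not shared.

```python
import pandas as pd

# Hypothetical stand-in for the imported CSV.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "alert001": [0, 1],
        "alert123": [1, 1],
        "alerts": [5, 6],  # not "alert" + digits, so it should survive
        "value": [3.5, 4.0],
    }
)

# Select the column names that consist of "alert" followed only by digits.
alert_cols = df.columns[df.columns.str.fullmatch(r"alert\d+")]

# Drop them in one call; the original frame is left untouched.
cleaned = df.drop(columns=alert_cols)
print(list(cleaned.columns))
```

If the names need exactly three digits, the pattern could be tightened to `r"alert\d{3}"`.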

serene scaffold Mar 17, 2021, 4:56 PM

#

hazy niche Hi folks, I have imported a csv files and I want to drop every column if the nam...

can you copy and paste the first few rows of the CSV as text?

#

!paste

arctic wedgeBOT Mar 17, 2021, 4:56 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

astral path Mar 17, 2021, 4:57 PM

#

i think i'll just keep it with the O(N) implementation instead of vectorized unless it takes too long

serene scaffold Mar 17, 2021, 4:57 PM

#

astral path i think i'll just keep it with the O(N) implementation instead of vectorized unl...

I think your performance issue is that each append operation has to copy all the data in a dataframe, since append isn't a mutator method

#

In fact I think that might mean that it's O(n^2)
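
A rough sketch of the usual way around that cost: collect the pieces in a plain list and concatenate once at the end. `make_row` is a hypothetical stand-in for whatever builds each piece; the real code was not posted here.

```python
import pandas as pd


def make_row(i: int) -> pd.DataFrame:
    """Hypothetical per-item work that returns a one-row DataFrame."""
    return pd.DataFrame({"n": [i], "square": [i * i]})


# Appending to a Python list is cheap and does not copy the earlier pieces.
pieces = [make_row(i) for i in range(5)]

# A single concat copies the data once instead of once per appended row.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

`DataFrame.append` was later deprecated and then removed in pandas 2.0, so the list-then-`concat` pattern is also the one that keeps working on newer versions.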

astral path Mar 17, 2021, 4:59 PM

#

oh yikes

serene scaffold Mar 17, 2021, 4:59 PM

#

I'd still like to know what print(da.iloc[:5].to_csv()) looks like

#

I can probably help you solve it if you are able to provide that

astral path Mar 17, 2021, 4:59 PM

#

OH i sent it like 10 minutes ago

#

turns out it errored

#

,url
2049,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2048,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2047,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"
2046,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/s...  72.7  ...  0.666667        1.6

[1 rows x 20 columns]"
2045,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"

#

it didn't say it was an error until i hovered over the message...

serene scaffold Mar 17, 2021, 5:00 PM

#

this doesn't look right

#

actually

#

hmm

#

@astral path join the code help 0 voice chat

astral path Mar 17, 2021, 5:00 PM

#

ok

hazy niche Mar 17, 2021, 5:01 PM

#

serene scaffold can you copy and paste the first few rows of the CSV as text?

Have to bring tomorrow from work. Computer is not connected to Internet

astral path Mar 17, 2021, 5:02 PM

#

ok im there now

deft ruin Mar 17, 2021, 5:39 PM

#

@astral path this avoids the append issue:

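# Build every per-URL frame first, then concatenate once (no repeated append).
df2 = pd.concat([stats(url) for url in df['url']], ignore_index=True)
# Reset df's index so the two frames line up row by row.
full_df = pd.concat([df.reset_index(drop=True), df2], axis=1)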

astral path Mar 17, 2021, 5:39 PM

#

oh i should have specified, stelercus helped me in the VC to fix it

deft ruin Mar 17, 2021, 5:40 PM

#

oh no worries -- glad you got it working

astral path Mar 17, 2021, 5:40 PM

#

thanks!

mortal dove Mar 17, 2021, 6:32 PM

#

I have a dataframe like this. I want to create a new column C that has the first value of A for each value of B (the table is sorted by B)

   A        B        C
0  150      0   
1  153      0
2  157      0
3  160      1
4  165      1

So, when populated it would look like this:

   A        B        C
0  150      0        150 
1  153      0        150
2  157      0        150
3  160      1        160
4  165      1        160

Any ideas?

uncut orbit Mar 17, 2021, 6:38 PM

#

copy A

#

and then define it as df['C']

mortal dove Mar 17, 2021, 6:38 PM

#

It's not just a copy of A

uncut orbit Mar 17, 2021, 6:39 PM

#

ohhhh

exotic maple Mar 17, 2021, 6:55 PM

#

mortal dove I have a dataframe like this, I want to create a new column C That has the first...

I would do it like this (logic, not code)

find unique values of B
define a function that, for every value of B determines
- range; samples of that B value
- minimum value of A for that range (this is the value you want repeated)
create a new column C that pastes min(a) for range B

#

You can probably implement something like that with df.apply()

#

df["C"] = df.apply(YOUR FUNCTION HERE, axis=1) -> axis=1 so the function runs once per row (i.e. across the columns)

mortal dove Mar 17, 2021, 7:00 PM

#

Ah, the dataframe isn't sorted on A, so it wouldn't always be the smallest value that goes into C; it needs to be the first occurring value (time series index).
I hacked my way through with a loop, but it's pretty slow since I have 61k rows

exotic maple Mar 17, 2021, 7:07 PM

#

you can still do it

#

instead of min you can just cast list(a) and retrieve index 0

#

but since you loop already nvm :p

deft ruin Mar 17, 2021, 7:36 PM

#

I think groupby with transform would work here

#

df.groupby('B')['A'].transform('min')

#

oh my b not min but your function for getting the first index
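
A small sketch of that idea on the example table from the question, assuming the built-in `"first"` aggregation is what is wanted (the first value of A in row order within each B group):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [150, 153, 157, 160, 165],
        "B": [0, 0, 0, 1, 1],
    }
)

# transform returns one value per original row, so it can be assigned
# straight back as a new column without a Python-level loop.
df["C"] = df.groupby("B")["A"].transform("first")
print(df)
```

Because `"first"` follows row order rather than value order, this should still hold when A is not sorted.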

mighty cobalt Mar 17, 2021, 7:43 PM

#

Hello I need a little help with cv2

#

So cv2 does not support GIFs. How can I read GIFs from URLs to manipulate them?

#

I tried reading frame by frame and storing them in a np array but it didn't work
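
One possible route, sketched under assumptions: decode the GIF with Pillow rather than cv2, then hand the frames to cv2 as BGR arrays. `gif_bytes_to_frames` is a hypothetical helper name. Downloading is left out so the sketch runs offline; the bytes could come from something like `urllib.request.urlopen(url).read()`.

```python
import io

import numpy as np
from PIL import Image, ImageSequence


def gif_bytes_to_frames(data: bytes) -> np.ndarray:
    """Decode GIF bytes into an array of shape (frames, height, width, 3), BGR order."""
    with Image.open(io.BytesIO(data)) as im:
        frames = [
            # Convert palette frames to RGB, then flip channels to BGR for cv2.
            np.asarray(frame.convert("RGB"))[:, :, ::-1]
            for frame in ImageSequence.Iterator(im)
        ]
    return np.stack(frames)


# Build a tiny two-frame GIF in memory so the sketch needs no network access.
buf = io.BytesIO()
red = Image.new("RGB", (4, 3), (255, 0, 0))
green = Image.new("RGB", (4, 3), (0, 255, 0))
red.save(buf, format="GIF", save_all=True, append_images=[green], duration=100, loop=0)

frames = gif_bytes_to_frames(buf.getvalue())
print(frames.shape)
```

Each `frames[i]` is then an ordinary image array that cv2 functions can take as input.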

exotic maple Mar 17, 2021, 7:47 PM

#

@deft ruin to this day i still dont know wtf does transform do lol

deft ruin Mar 17, 2021, 7:47 PM

#

@exotic maple at least in this case it will broadcast the grouped df back to the original size

#

nice when you want e.g. a column with group means

exotic maple Mar 17, 2021, 7:49 PM

#

so if i wanted a column

#

with the mean of A

#

i can do

#

df["Mean"] = df.groupby("A").transform(np.average) ?

deft ruin Mar 17, 2021, 7:50 PM

#

yeah exactly
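
A short sketch of the difference between aggregating and transforming, reusing the A/B table from earlier and assuming the goal is the mean of A within each B group (grouping by A itself would only average each value with its own duplicates):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [150, 153, 157, 160, 165],
        "B": [0, 0, 0, 1, 1],
    }
)

# Plain aggregation collapses to one row per group.
group_means = df.groupby("B")["A"].mean()

# transform computes the same means but repeats them for every row of the
# group, so the result has the same length and index as the original frame.
df["mean_A"] = df.groupby("B")["A"].transform("mean")

print(len(group_means), len(df["mean_A"]))
print(df)
```

Selecting the single column `["A"]` before `transform` keeps the result a Series, which is what a single-column assignment expects.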

exotic maple Mar 17, 2021, 7:51 PM

#

thanks man. I never knew wtf that was for lol. never got the hang of it

#

btw question, just to confirm if im right.

deft ruin Mar 17, 2021, 7:51 PM

#

yeah no problem man I'm still learning it as well

#

coming from R actually lol

exotic maple Mar 17, 2021, 7:52 PM

#

apply() on a dataframe is applied as a vector or as functional programming? That is, to the whole column at once or value-by-value

#

i was having a debate about that in another forum

#

last i remember apply is also vectorized

#

applies the function to the whole column, at once

deft ruin Mar 17, 2021, 7:53 PM

#

yeah I think it's confusing because the dataframe method applies the function to each column or row (i.e. vector) but the apply method on a series is element by element

#

so you have to be careful with types
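
A quick way to see that difference for yourself. This is a sketch on a made-up frame; the lambdas only report what kind of object they were handed.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# DataFrame.apply with the default axis=0 hands each column over as a Series.
col_is_series = df.apply(lambda col: isinstance(col, pd.Series))

# With axis=1 each row is handed over as a Series instead.
row_is_series = df.apply(lambda row: isinstance(row, pd.Series), axis=1)

# Series.apply hands over one element at a time, never a Series.
elem_is_series = df["x"].apply(lambda v: isinstance(v, pd.Series))

# A column-wise function can therefore use Series methods directly.
col_range = df.apply(lambda col: col.max() - col.min())

print(col_is_series.tolist(), row_is_series.tolist(), elem_is_series.tolist())
print(col_range.to_dict())
```

In both cases the function is still called from Python once per column, row, or element, so `apply` is generally not the same thing as a vectorized operation.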

iron basalt Mar 17, 2021, 8:07 PM

#

exotic maple apply() on a dataframe is applied as a vector or as functional programming? That...

It depends on the function that you pass to it. Pandas will do different things depending on which function is given.

#

To understand apply vs transform: https://pbpython.com/pandas_transform.html

Understanding the Transform Function in Pandas

The transform function in pandas can be a useful tool for combining and analyzing data.

grave frost Mar 17, 2021, 8:39 PM

#

Why is `it depends` the most common programming answer for almost everything?

austere swift Mar 17, 2021, 8:41 PM

#

grave frost Why is `it depends` the most common programming answer for almost everything?

it depends

sick furnace Mar 17, 2021, 8:48 PM

#

I'm having some trouble in configuring my environment for Apache spark - when I try to run things like connecting to a postgresql, I get streams of errors

exotic maple Mar 17, 2021, 8:58 PM

#

iron basalt To understand apply vs transform: https://pbpython.com/pandas_transform.html

Sexy

plucky loom Mar 17, 2021, 9:37 PM

#

Do any of you guys know about sympy?