#data-science-and-ml

serene scaffold
#

some people assume that Python won't do math the way they expect if they see 5.0 and not 5.000000000000000000000000000
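A quick way to see that the trailing zeros are only display: both literals are the same float, and the format spec decides how many digits get printed.

```py
x = 5.0

# Both literals denote exactly the same floating-point value
print(x == 5.000000000000000000000000000)  # True

# How many digits are shown is a formatting choice, not a change in the value
print(f"{x:.27f}")  # 5. followed by 27 zeros
print(x)            # 5.0
```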

exotic maple
#

really? i didnt know that at all

#

so inside the env i should use conda install, not pip install?

loud finch
#

Yes always conda install

#

If you cant run it that way, run it in another env

#

Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of these issues stem from the fact that conda, like other package managers, has limited abilities to control packages it did not install. Running conda after pip can overwrite and potentially break packages installed via pip. Similarly, pip may upgrade or remove a package which a conda-installed package requires. In some cases these breakages are cosmetic, where a few files are present that should have been removed, but in other cases the environment may evolve into an unusable state.

misty flint
#

oops

#

i use pip install when ive activated an env

exotic maple
#

one always learns something

#

-destroys env-

misty flint
#

but now i know

#

hey its better than not building in an env

velvet thorn
#

it doesn’t really matter for smaller environments where all dependencies are Python

#

for convenience you can just use pip

#

problems arise when you keep mixing the two over time

#

which you can generally avoid with a bit of planning

velvet thorn
#

although using conda all the time is the safe play

bitter harbor
#

do you happen to know where/how this is done?

misty flint
#

thanks for the insight gm

bitter harbor
#

ik I keep forgetting how bad it looks

misty flint
#

immunity

misty flint
#

sounds like you want a scatterplot instead

#

which is a different plot

lavish tundra
#

but i want to keep the lines

#

i was thinking about drawing another graph, hiding the legend of the lineplot and trying to hide the other graph, but i think rendering 2 graphs will cost too much performance

exotic maple
#

at that point its easier to do it directly in matplotlib

#

plt.plot(x, 'ro') and so on
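A minimal sketch of keeping the line and adding point markers on the same Axes with plain matplotlib (the data is made up; note that `plt.plot(x, 'ro')` with a single sequence plots the values against their index):

```py
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data
x = [0, 1, 2, 3, 4]
y = [3, 1, 4, 1, 5]

fig, ax = plt.subplots()
ax.plot(x, y, "-", label="trend")    # the connecting line
ax.plot(x, y, "ro", label="points")  # red circle markers drawn on the same axes
ax.legend()
# A single call with a combined format string, ax.plot(x, y, "-ro"), draws both at once.
```

Because both artists live on one Axes, there is no second figure to hide, and the legend only lists what you label.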

misty flint
#

thats what id do but only bc im not as familiar with seaborn

exotic maple
#

seaborn is default gorgeous but sometimes you want something a biiit more specific

#

and you end up accessing the matplotlib background anyways

misty flint
#

this is where R's ggplot or plotly would shine

bitter harbor
#

honestly the 1 and only thing I miss about R is the plotting

misty flint
#

better python viz libraries when

bitter harbor
#

just python 4 things

misty flint
#

based on ggplot2
already hooked

#

oh this is a really cool approach

#

might play with this lib next time i have to plot something

still otter
#

Is there a library/tool that lets you run custom simulations with AI agents? For example, I wrote a gridworld simulation, and I want to test AI models with it, and train them/evolve them.

exotic maple
#

I think there is one library for visual simulations but i cant remember it lol

lean ledge
#

It's a grid world, shouldn't be hard to simulate yourself

still otter
#

yeah, I've already written one as a sort of prototype

#

but I want to see if there are other options for me before I start "finalizing" things

#

specifically the "connecting AI agents" part, right now my ai is baked into the simulation code and I'd want to change that going forward

#

but if something already exists that does that for me i'd rather use that than reinvent the wheel
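One common way to unbake the AI from the simulation is to give the world a small `reset()`/`step(action)` interface (the same general shape Gym environments use) and pass the agent in as a function from observation to action. A minimal sketch; `GridWorld`, `run_episode` and both policies are hypothetical names, not from any library:

```py
import random


class GridWorld:
    """Hypothetical minimal gridworld: walk from (0, 0) to a goal cell on a width x height grid."""

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, width=4, height=4, goal=(3, 3)):
        self.width, self.height, self.goal = width, height, goal
        self.pos = (0, 0)

    def reset(self):
        """Start a new episode and return the first observation."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.width - 1)  # walls clamp the position
        y = min(max(self.pos[1] + dy, 0), self.height - 1)
        self.pos = (x, y)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done


def run_episode(env, policy, max_steps=100):
    """Drive any env exposing reset/step with any policy (observation -> action)."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


def random_policy(obs):
    return random.choice(list(GridWorld.MOVES))


def scripted_policy(obs):
    # Assumes the default 4x4 grid: go right to the last column, then down
    return "right" if obs[0] < 3 else "down"


print(run_episode(GridWorld(), scripted_policy))
print(run_episode(GridWorld(), random_policy))
```

Swapping in a learned agent then only means writing another policy function (or object); the simulation code stays the same.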

still otter
#

I'm looking at openai/gym right now, but it's difficult to know where I'd find things similar to it

#

especially since gym seems to be built around reinforcement learning, while i'm focusing more on unsupervised learning right now

misty flint
#

simulations with AI agents most of the time will be focused on RL

#

since you need it to actually do RL

#

lol

#

i think a cool side project to do would be to teach an AI to play a small game with Q-Learning

#

like that flappy bird example

still otter
#

actually maybe i have my terms messed up and i am doing reinforcement learning lol

#

brute force reinforcement learning

#

or monte carlo might be the closest I guess

#

anyway, I see now that openai/gym provides specific environments and wasn't really made to handle custom environments

bitter harbor
#

send all the bot's vars for detection/movement/whatever through an LSTM but clamp the values first based on conditions?

misty flint
#

that would be a cool idea

#

i only bring up q-learning since thats the easiest to implement

#

fewer parameters

#

also have you seen the code for some q-learning stuff?

#

its SO short

#

its wild

exotic maple
#

@misty flint whats q learning?

bitter harbor
#
"Q-learning is a model-free reinforcement learning algorithm to learn quality of actions telling an agent what action to take under what circumstances."
you send a bunch of actions to the algorithm, those actions change the algorithm's state, a reward is given + the algo optimises that reward

misty flint
#

a type of RL that is model-FREE but still performs very, very well

#

2 main parameters are alpha (learning rate) and gamma (temporal discounting)
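For reference, the usual tabular Q-learning update with those two parameters (α the learning rate, γ the discount factor) is, where r is the reward for taking action a in state s and landing in state s':

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```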

#

thats it

#

only downside is it requires lots of repetitions

#

to "learn"

misty flint
#

the model-free part i would say is the biggest advantage

misty flint
#

yep

#

lets add that caveat

#

what are all the environmental constraints again

#

like it has to be deterministic, etc.

bitter harbor
#

doesn't it also like just not work sometimes?

misty flint
#

single-agent

#

yeah

#

sometimes

#

lol

#

q-learning is like the linear regression of reinforcement learning

#

so dont put your hopes on it

lean ledge
#

q learning is brute forcing it

#

there's X actions you can take, Y states you can be in

bitter harbor
#

im sure it'd be fine for flappy bird tho

lean ledge
#

Make an X*Y table of action-state combinations

#

choose them randomly

#

as you learn more you can update your actions to exploit whatever information about the reward you gained
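The table-plus-random-actions recipe can be sketched in a few lines. This assumes a made-up 1-D corridor (5 cells, reward only at the right end) and epsilon-greedy action selection; the corridor, `step` and `train_q_table` are illustrative, not from any library:

```py
import numpy as np

# Hypothetical toy environment: a 1-D corridor of 5 cells. The agent starts in
# cell 0; reaching cell 4 gives reward +1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2  # actions: 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Move one cell left or right (walls at both ends); reward only at the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train_q_table(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))  # the states x actions table
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))  # explore: random action
            else:
                # exploit: best known action, breaking ties at random
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))
            next_state, reward, done = step(state, action)
            # move Q(s, a) toward r + gamma * max_a' Q(s', a'); terminal states add no future value
            target = reward if done else reward + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state, steps = next_state, steps + 1
    return q


q = train_q_table()
print(q.argmax(axis=1)[:GOAL])  # greedy action in each non-terminal cell
```

After training, `q.argmax(axis=1)` is the learned greedy policy; on this corridor it should pick "right" in every non-terminal cell. The table has one row per state and one column per action, which is exactly what stops scaling once those spaces get large.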

bitter harbor
#

it's been a while but isn't one of the problems that it's maximising not optimising?

lean ledge
#

maximising is optimising

exotic maple
#

maximising is a form of optimizing

#

any optimization is, by definition, max or min of a cost function.
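In symbols, maximizing and minimizing are interchangeable by flipping the sign of the objective:

```latex
\max_x f(x) = -\min_x \bigl(-f(x)\bigr), \qquad \arg\max_x f(x) = \arg\min_x \bigl(-f(x)\bigr)
```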

exotic maple
#

I usually struggle more with terms than with understanding lol

fading vector
#

i want to be a data scientist and i will pay cash anyone who can teach me and be my mentor..

lean ledge
# exotic maple in English?

It's fine if there are only a few states the agent can be in and a few different types of actions. If the state of the agent is continuous, or there are a lot of states it can occupy and lots of actions it can take, it becomes incredibly inefficient

#

Remember you're making a table where the columns are the action you take and the rows are the states you can be in, and you're randomly taking actions to see how the reward changes

#

As action space and state space become large, it becomes a massive table

#

Or if they're continuous spaces, that's not a table anymore

astral path
#

This might be a dumb question with an obvious answer, but what should I do if my dataset contains names such as St. Joseph's, Pennsylvania, Virginia Commonwealth, or Loyola, Illinois but I'm trying to use these as parts of a URL for scraping which contain these names under different aliases (St-josephs, VCU, Loyola-IL)? there's no dictionary that I know of which contains a list of aliases to try for each instance, so is the only option to manually rename each?

misty flint
#

there's no dictionary that I know of
NLTK might

#

itd probs be easier to rename tho

lean ledge
#

Unless there's a lot of different names it can be, it might just be easier to manually search for key terms

astral path
#

ooh lemme see

misty flint
#

how big is your dataset

#

you always have the most interesting issues joseph. i like it

lean ledge
#

If it's only 3 names and variations of those, pick key words like "joseph" and find all the ones that have it, add it in. Keep doing it for them until there's none left.
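A sketch of keeping that manual work small: list only the exceptions in a dict and let a default rule handle everything else. The dict keys and slugs below are placeholders modelled on the examples in the question, not verified against the site:

```py
# Hypothetical alias table: only names whose URL slug differs from the plain
# "lowercase, spaces -> dashes" rule need an entry. Check these against the site.
URL_ALIASES = {
    "St. Joseph's": "st-josephs",
    "Virginia Commonwealth": "vcu",
    "Loyola, Illinois": "loyola-il",
}


def to_slug(name):
    """Return the URL slug for a school name, using the alias table first."""
    if name in URL_ALIASES:
        return URL_ALIASES[name]
    return name.lower().replace(" ", "-")


names = ["Temple", "Virginia Commonwealth", "North Carolina"]
print([to_slug(n) for n in names])  # ['temple', 'vcu', 'north-carolina']
```

Finding which names need an alias is still manual, but you only have to write down the exceptions.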

astral path
#

186 different names, idk how many are inconsistent with the URL

lean ledge
#

hmm 186 is not small

misty flint
#

how much time are you willing to spend

#

lol

astral path
#

oof a lot i guess lol

astral path
#

😔

misty flint
#

idk how many are inconsistent with the URL
maybe take a quick look through the data

#

its probably not that many

astral path
#

idk i mean

#

they're all college names

#

so like half are acronyms

#

i gtg for another thing 😩 sorry bye

misty flint
#

bye

astral path
#

ok it was only like 3 lmao

#

it now looks like this

#

what i want to do is take the winner name, for example 1985 Temple, and get a URL like https://www.sports-reference.com/cbb/schools/temple/1985.html

#

df['url'] = df.apply[lambda x: x['Winner'].lower().replace(' ', '-') + "/" + str(x['Date']) + ".html"] is what I have, but i'm getting TypeError: 'method' object is not subscriptable

#

any ideas how I should do this?

#

(this is just for converting the columns Winner and Date to winner/date with dashes instead of whitespace)

astral path
#

<> brackets?

#

shit nvm im dumb curly braces {}?

hasty grail
#

df.apply(...)

#

not df.apply[...]
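Besides the parentheses, `apply` also needs `axis=1` so the lambda receives one row at a time (with the default `axis=0` it receives whole columns, and `x['Winner']` fails). A sketch with made-up rows, using the URL pattern from the question, plus a vectorised version that skips `apply`:

```py
import pandas as pd

# Hypothetical rows shaped like the question's Winner / Date columns
df = pd.DataFrame({"Winner": ["Temple", "North Carolina"], "Date": [1985, 2016]})
BASE = "https://www.sports-reference.com/cbb/schools/"

# apply(...) with parentheses; axis=1 hands each row to the lambda
df["url"] = df.apply(
    lambda x: BASE + x["Winner"].lower().replace(" ", "-") + "/" + str(x["Date"]) + ".html",
    axis=1,
)

# Vectorised alternative using the .str accessor instead of a Python-level loop
df["url_vec"] = (
    BASE + df["Winner"].str.lower().str.replace(" ", "-", regex=False)
    + "/" + df["Date"].astype(str) + ".html"
)
print(df.loc[0, "url"])  # https://www.sports-reference.com/cbb/schools/temple/1985.html
```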

astral path
#

ohhh ok

#

i'll try that in a few

misty flint
#

you know what this is like

#

this is like...

#

reverse web scraping

autumn veldt
#

Excuse me, i have a question here.

uid      category
110      banana      
101      banana
.
.
001      apple
010      apple

when i train on this dataset with an 80% training / 20% testing split, and then, while my classification program is running, manually input test data with "uid 000" (which is not in the training data), the result is apple. my question is: how can data that is not in the training data be classified as apple? i want to know how the classifier decides that this data is an apple.

sorry for bad writing, my english is not good.
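The thread doesn't say which classifier was used, so this is only an illustration: a model doesn't need to have seen "000" before; it assigns the label of whatever learned pattern the new input resembles. Assuming a 1-nearest-neighbour classifier that compares uids bit by bit (Hamming distance), "000" lands next to the apple rows:

```py
# The four labelled rows from the question (the elided rows are left out)
train = [("110", "banana"), ("101", "banana"), ("001", "apple"), ("010", "apple")]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def predict_1nn(uid):
    """Label of the closest training uid; ties go to the first one in the list."""
    return min(train, key=lambda pair: hamming(uid, pair[0]))[1]


# "001" and "010" are 1 bit away from "000"; both banana uids are 2 bits away
print(predict_1nn("000"))  # apple
```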
astral path
#

huh so

#

i got a column like this

#

and i used df['Date'] = df['Date'].str.replace('85', '1985') to change 85 to 1985 and df['Date'] = df['Date'].str.replace('16', '2016') for 16 to 2016, and the same for all years in between

#

this is what's returned

#
       '2011199989', '20119990', '20119991', '20119992', '20119993',
       '20119994', '20119995', '20119996', '20119997', '20119998', '1999',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
       '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016'], dtype=object)
these are all the unique values of the column
#

what am I doing wrong?

#

it's a problem with all before 1999

#

wait nvm im a dumbass

astral path
#

the replacements for 98 and 99 also change the 98 and 99 inside 198x and 199x to something else

#

how should I rewrite it to not have this bug?

astral path
#

yeah so like 85, 94, 00, 06, 12, etc...
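`str.replace` matches substrings anywhere in the string, so a later replacement such as '98' → '1998' also fires inside an already converted '1985'. Parsing the value as a two-digit year avoids substring edits entirely. A sketch with `pd.to_datetime` and the `%y` directive, which maps 69-99 to the 1900s and 00-68 to the 2000s (fine for 1985-2016):

```py
import pandas as pd

# Hypothetical two-digit year strings like the ones in the question
years = pd.Series(["85", "94", "99", "00", "06", "12", "16"])

# %y parses a two-digit year: 69-99 -> 1900s, 00-68 -> 2000s
parsed = pd.to_datetime(years, format="%y")  # datetime64 column (Jan 1 of each year)
full_years = parsed.dt.year                  # back to plain integers

print(full_years.tolist())  # [1985, 1994, 1999, 2000, 2006, 2012, 2016]
```

`to_datetime` returns a datetime column; `.dt.year` or `.dt.strftime("%Y")` turns it back into plain years or strings.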

misty flint
#

honestly i would probably run into the same problem bc id do it your way. its just funny seeing it joseph

astral path
#

lol

#
def f(x):
  if int(x[0]) == 0 | int(x[0]) == 1:
    x = str("20" + x)
  if int(x[0]) == 8 | int(x[0]) == 9:
    x = str("19" + x)
df['Date'] = df['Date'].apply(lambda x: f(x))

also tried it this way but it all ended up as None
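Two things make that version produce `None`: the function has no `return`, so `apply` stores `None` for every row, and `|` binds tighter than `==`, so `int(x[0]) == 0 | int(x[0]) == 1` is a chained comparison rather than "0 or 1". A corrected sketch (the function name is hypothetical, and it assumes every value is a two-digit year string from the 1980s-2010s):

```py
import pandas as pd


def to_four_digit_year(x):
    """Prefix a two-digit year string with its century ('85' -> '1985', '06' -> '2006')."""
    # `x[0] in (...)` replaces `int(x[0]) == 0 | int(x[0]) == 1`, which Python reads as
    # the chained comparison `int(x[0]) == (0 | int(x[0])) == 1`.
    if x[0] in ("0", "1"):
        return "20" + x
    if x[0] in ("8", "9"):
        return "19" + x
    return x  # anything unexpected is returned unchanged instead of becoming None


df = pd.DataFrame({"Date": ["85", "99", "00", "16"]})  # hypothetical sample
df["Date"] = df["Date"].apply(to_four_digit_year)      # no lambda wrapper needed
print(df["Date"].tolist())  # ['1985', '1999', '2000', '2016']
```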

misty flint
#

will it work if you feed it as a string

#

idk the datetime module as well as i should

astral path
#

YESSSSS

astral path
#

THANK YOU

misty flint
#

party time

astral path
#

im building a march madness predictor btw if y'all couldn't tell

keen kestrel
# astral path THANK YOU

That column's type changes to datetime by the way, so convert it back to a string if you need one.

astral path
#

i got a $2000 pool and im the only one in a stem major lol

#

ok i'll make sure to do that

misty flint
#

march madness leggo

#

actually ken jee is building something similar

#

since hes a sports analytics guy and all

astral path
#

oh yeah ive heard of that guy

#

cool channel

#

i have 3 days left ! ! ! !

misty flint
#

best of luck

#

listen to his podcast

astral path
#

thank u!!

misty flint
#

kens nearest neighbors

#

its great

astral path
#

oof i never have time for podcasts

misty flint
#

you dont sit down and listen

#

i listen while im driving, running or doing chores

astral path
#

lol i'm basically doing homework 18 hours of the day

#

i tried it and i either cant focus on the podcast or i cant focus on the hw 😔

misty flint
#

hw all day everyday...

#

anyway i will leave this here for the lurkers

astral path
#

thanks, i'll check it out if i ever find time!!

modest void
#

is there a way to use a rolling window on multiple columns in pandas?
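`rolling` works on a whole DataFrame; by default each column is rolled independently. A sketch on made-up columns showing the same aggregation on several columns and a different aggregation per column:

```py
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})  # hypothetical data

# Same window and aggregation on several columns: select them, then roll
rolled = df[["a", "b"]].rolling(window=2).mean()

# A different aggregation per column: pass a dict to agg
mixed = df.rolling(window=2).agg({"a": "sum", "b": "max"})

print(rolled)
print(mixed)
```

The first row of each result is NaN because a window of 2 needs two rows before it can produce a value.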

uncut barn
#

Anyone have any good links for autoencoders for curve fitting?

#

/resources

lapis sequoia
#

hi

tall trail
#

i have this list of lists i want to put in my dataframe, i have the row number and the column name but it doesnt work the way google tells me to do it, what am i doing wrong?

ls = df_sub.values.tolist()
print(ls)
df.loc[i,'trains'] = ls

i keep getting this value error: ValueError: Must have equal len keys and value when setting with an ndarray
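Roughly speaking, `.loc` treats a list-like value as several values to spread over the selected cells, which is where the length mismatch comes from. To store the whole nested list in one cell, one option is `.at` on a column with `object` dtype. A sketch with made-up frames (`df_sub`, `i` and the column values are placeholders):

```py
import pandas as pd

df = pd.DataFrame({"trains": [None, None]}, dtype=object)  # object column so a cell can hold a list
df_sub = pd.DataFrame({"x": [1, 2], "y": [3, 4]})          # hypothetical stand-in

ls = df_sub.values.tolist()  # [[1, 3], [2, 4]]
i = 0

# .at addresses exactly one cell, so the whole nested list is stored as one object
df.at[i, "trains"] = ls
print(df.at[i, "trains"])  # [[1, 3], [2, 4]]
```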

grave frost
#

I am surprised anyone even uses Q-learning now. Weren't DQNs the go-to "always better" method compared to Q-learning, since Q-learning is such a naive method

lean ledge
#

DQN is still Q learning, the table is just replaced with an NN

#

It's still relatively naive

grave frost
#

I was talking about the ease of implementation. If you could implement a DQN In the same amount of time as Q-learning, why would you use the more naive/simpler method?

obtuse sable
#

what happens if I only standardize my features but don't normalize them to the range [0, 1]

#

for neural network

#

my model training was stalling at 10 epochs before standardizing and now it seems to be improving past that after standardizing. is there a need to normalize then?
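To make the two terms concrete: standardising (z-scores) and min-max normalising both put features on comparable scales, but they produce differently shaped results. A small NumPy sketch on made-up features. Whether you also need [0, 1] scaling after standardising wasn't settled in the thread; one of the two is usually enough for gradient-based training, but treat that as a rule of thumb.

```py
import numpy as np

# Hypothetical features on very different scales (rows = samples, columns = features)
x = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])

# Standardisation (z-score): each column gets mean 0 and standard deviation 1,
# but values are not confined to a fixed range
z = (x - x.mean(axis=0)) / x.std(axis=0)

# Min-max normalisation: each column is rescaled into [0, 1]
mm = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

print(z)
print(mm)
```

In either case, compute the statistics (means/stds or mins/maxes) on the training split and reuse them for validation and test data.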

lean ledge
#

And also because Q learning is probably faster to train and computationally faster

stray mica
#

Hey guys,
One of my thesis topics is 'Video captioning system to search through videos if a digital assistant needs to find an answer.'
Could anyone guide me on how to start researching this topic and the concepts that I need to learn for it?
Currently I'm going through papers on video captioning systems. Any help would be really helpful.
Thanks.

pearl vault
#

Hello guys
I am thinking of buying a new laptop for training deep learning models
Can anyone tell me the minimum requirements (especially for the CPU, like the number of cores) I should consider before buying?

austere swift
#

are you gonna be training on the laptop? or coding on the laptop and using something like colab or kaggle

pearl vault
#

Coding

#

And also training in the same laptop

#

@austere swift

austere swift
#

if you're gonna be training locally on the laptop you definitely need a gpu

austere swift
#

8gb ram won't be enough lol

#

get at least 16

pearl vault
#

I will upgrade it to 16

lapis sequoia
#

bro that bottleneck is gonna be insane

pearl vault
#

The problem is the CPU; it's a quad-core processor

#

Will it affect my performance?

lapis sequoia
#

i think so

#

@pearl vault i actually think u should try to config a pc

#

that would be cheaper and you can freely choose ur components

austere swift
#

i'm pretty sure they want a laptop

#

you can't really build a laptop very easily

lapis sequoia
#

oh ok

pearl vault
#

1241$

#

Usd

lapis sequoia
#

i think there are lots of videos for laptops to train ur ai

pearl vault
#

Other option is

lapis sequoia
#

still too little RAM

pearl vault
#

I can upgrade ram

lapis sequoia
#

16 would be enough

#

so search for another model

pearl vault
#

One slot will be empty, so I can buy another RAM stick, but the CPU/GPU choice is the problem

lapis sequoia
#

this could help

pearl vault
#

Ty

lapis sequoia
#

ur wlcm

shadow grail
#

Can someone help me with this please ?

grave frost
#

I do discourage using a laptop tho - you can easily use it up in a few months.

pearl vault
#

@grave frost I just want to know whether the CPU is a bottleneck or not

#

Currently Laptop is the only option I have now

obtuse sable
#

What happens if there is multicollinearity or useless features in a neural network? Is the algorithm smart enough to fix them? Sorry if this sounds stupid I'm a complete beginner

serene scaffold
#

are df.loc and .iloc O(1) lookup if you only use indices/column names (or any for iloc)?

#

I mean I guess they're both O(n) for the number of rows or columns you ask them to fetch

civic ferry
#

#Create a function to combine the values of the important columns into a single string
def get_important_features(data):
  important_features = []
  for i in range(0, data.shape[0]):
    important_features.append(data['Actors'][i]+' '+data['Director'][i]+' '+data['Genre'][i]+' '+data['Title'][i])

    return important_features

#Create a column to hold the combined strings
df['important_features'] = get_important_features(df)

#Show the data
df.head(3)

This is my code, although i am getting the error below:

Traceback (most recent call last)
<ipython-input-10-50d23e3e0015> in <module>()
      1 #Create a column to hold the combined strings
----> 2 df['important_features'] = get_important_features(df)
      3
      4 #Show the data
      5 df.head(3)

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in sanitize_index(data, index)
    746     if len(data) != len(index):
    747         raise ValueError(
--> 748             "Length of values "
    749             f"({len(data)}) "
    750             "does not match length of index "

ValueError: Length of values (1) does not match length of index (1000)
Please help
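The `(1)` in that error says the function returned a one-element list, which is what happens when `return important_features` sits inside the `for` loop: the function exits after the first row. A sketch with the `return` moved after the loop, using a made-up two-row frame with the question's column names:

```py
import pandas as pd


def get_important_features(data):
    """Join Actors, Director, Genre and Title into one string per row."""
    important_features = []
    for i in range(data.shape[0]):
        row = data.iloc[i]  # positional access works even if the index isn't 0..n-1
        important_features.append(
            row['Actors'] + ' ' + row['Director'] + ' ' + row['Genre'] + ' ' + row['Title']
        )
    return important_features  # after the loop, so every row is included


# Hypothetical two-row stand-in for IMDB-Movie-Data.csv
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B"],
    "Genre": ["Drama", "Comedy"],
    "Director": ["Dir One", "Dir Two"],
    "Actors": ["Actor X, Actor Y", "Actor Z"],
})
df['important_features'] = get_important_features(df)
print(df['important_features'].tolist())
```

If any of those columns contain missing values, fill them first (e.g. with `.fillna('')`), since adding a string to NaN raises a TypeError.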

serene scaffold
#

!code @civic ferry

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

also what is data?

civic ferry
#

It's movie recommendation data collected from kaggle

serene scaffold
#

No, only what a few lines of print(data) prints out

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

^ please paste it like that

civic ferry
#

Okay 1 sec

#

#Store the data
df = pd.read_csv('IMDB-Movie-Data.csv')
#Show the first 3 rows of data
df.head(3)

#

I havent tried the print function though

#

This prints 3 columns of data

serene scaffold
#
def get_important_features(data):
  important_features = []
  for i in range(0, data.shape[0]):
    important_features.append(data['Actors'][i]+' '+data['Director'][i]+' '+data['Genre'][i]+' '+data['Title'][i])

I can't guess what this function does unless I know what data looks like. Please run print(data)

civic ferry
#

Here, hope this helps

serene scaffold
#

I still don't know what data is. I really need to know that or I can't continue.

civic ferry
#

Should i run print(data)?

serene scaffold
#

yes

#

please have that be the first line of get_important_features, and then copy and paste what it prints as text.

civic ferry
#

Its giving an error

arctic wedgeBOT
#

Hey @civic ferry!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

civic ferry
#

@serene scaffold

civic ferry
#

Nope

serene scaffold
#

can you copy and paste the error as text into this chat?

lavish swift
#

@civic ferry I usually try to avoid any sort of loop when I'm working with dataframes. If you need to create a new column that combines other columns, have you tried something like:

df['combined'] = df['Actors'] + ' ' + df['Director'] + ' ' + df['Genre'] + ' ' + df['Title']

Though you may want to use a better delimiter than space, otherwise parsing the new column may get a little ugly (the names from actors, director and title are all likely going to have spaces)

#

oh...and replace df with data (since I think your dataframe is called data)

#

out of habit, I called mine df

fierce shadow
# pearl vault Other option is

focus more on your graphics card, as machine learning and AI are graphics-card intensive; if you are looking to do reinforcement learning and stuff, then 16gb+ ram is also recommended

#

a 1050ti is kinda a bad choice for ai, I would recommend going for a 1650 super or better

pearl vault
#

@fierce shadow 1660ti sir

fierce shadow
#

thats pretty nice then

pearl vault
#

What about the processor? The i5-10300H is a quad-core processor, and many people are telling me it will bottleneck the GPU

fierce shadow
#

I don't think it would bottleneck

pearl vault
#

Ok ty

blazing bridge
#

no, a 1660ti and an i5 10300h would not bottleneck the gpu

fair girder
#

What's a good "sparsity" to switch a dense matrix to a sparse one?

#

I am dealing with a matrix with roughly 50% zeros and want to reduce ram usage

tidal bough
#

not sure if 50% is low enough - you can test it yourself, compare the memory usage of numpy.ndarrays with scipy.sparse.csr_matrixes, say.

#

often you have to deal with matrices with a density of like one percent, in which case it's definitely profitable
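A sketch of that test on a random matrix (not the asker's data). CSR stores each nonzero value plus its column index, plus one row pointer per row, so with float64 values and (typically) 32-bit indices a nonzero costs about 12 bytes versus 8 bytes per entry in the dense array; at 50% zeros that is only a modest saving, and sparse arithmetic is often slower.

```py
import numpy as np
from scipy import sparse


def dense_vs_csr_bytes(density, shape=(1000, 1000), seed=0):
    """Compare memory of a dense float64 array and its CSR version at a given density."""
    rng = np.random.default_rng(seed)
    dense = rng.random(shape)
    dense[rng.random(shape) >= density] = 0.0  # zero out roughly (1 - density) of entries
    csr = sparse.csr_matrix(dense)
    csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
    return dense.nbytes, csr_bytes


for d in (0.5, 0.01):
    dense_b, csr_b = dense_vs_csr_bytes(d)
    print(f"density {d:>5}: dense {dense_b / 1e6:.1f} MB, CSR {csr_b / 1e6:.1f} MB")
```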

grave frost
#

Your CPU doesn't matter much. What matters most is memory and the GPU. Since mobile GPUs (like the ones in laptops) have to balance power draw for the battery against thermals, you will likely face a lot of crashes and won't be able to use it for long (speaking from experience). Very few laptops have good thermals, like Razer laptops that cost 1.5 lakh+

#

Desktop is usually the best choice for deep learning. Laptops are good only for light stuff

astral path
#

quick stats question: what metric should I use for measuring how spread out the majority of a variable is (if that makes sense)?

tidal bough
#

Variance?

astral path
#

for example
you can see that pts drops off after 4 rows

tidal bough
#

Alternatively, you may be looking for kurtosis.

astral path
#

what's kurtosis?

exotic maple
#

@tidal bough Isnt kurtosis only used when you want to know how one-sided your distribution is?

exotic maple
#

as in, you want to know if you can assume a normal distribution

tidal bough
#

kurtosis is for measuring how thicc the tails are, basically

astral path
#

as a more broad question, I'm trying to figure out if a team relies very heavily on a few players to score or if its scoring is very evenly distributed throughout the entire team

#

so like how top-heavy it is

#

would kurtosis apply ?

exotic maple
#

I think that would be skewness

#

but honestly, you should be able to find it with a basic histogram and / or kde curve of goals

astral path
#

ah ok

#

i'm using this as a feature for an ML model

#

so it'd need to be quantified

exotic maple
#

basically what you want to know is if a player or group of players has the majority of goals, right?

astral path
#

yeah basically

exotic maple
#

also consider: you need to normalize it by position. I dont know much about american "football" but

astral path
#

or minutes played alternatively

#

this is basketball btw

exotic maple
#

positions have a major influence

#

ah

astral path
#

positions don't have a huge influence because all 5 positions are on the court the same amt of time for 90% of teams

exotic maple
#

yeah for basketball i dont think you will have that consideration

grave frost
#

how thicc the tails
I honestly never thought confusedreptile would ever use smthing like that lol

astral path
#

lol

grave frost
#

reminds of Dani 😁

astral path
#

so skew is the way to go?

exotic maple
#

I would think so, yes. perhaps someone else can chime in

#

remember that skewness measures "symmetry" basically

#

so, if every position / person scored the same, it would be symmetrical

astral path
#

ah yeah that makes sense

exotic maple
#

but if one person scores most, your distribution of scores will be asymmetrical
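A sketch of turning that into a single number per team, using the population skewness (mean cubed deviation divided by the standard deviation cubed, which should match `scipy.stats.skew` with its default `bias=True`). The points are made up, and with only a handful of players per team the estimate will be noisy:

```py
import numpy as np


def skewness(x):
    """Population (biased) skewness: mean cubed deviation / std**3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / x.std() ** 3


# Hypothetical points per game for the top five scorers of two teams
balanced = [12, 11, 10, 9, 8]  # scoring spread evenly
top_heavy = [30, 6, 5, 4, 3]   # one player carries the team

print(skewness(balanced))   # 0.0: symmetric around the mean
print(skewness(top_heavy))  # positive: long tail toward the one big scorer
```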

astral path
#

ok, that really helps a lot!

exotic maple
#

I really need to have a statistics book to keep on hand ffs.

astral path
#

im really thinking i should get at least a minor in statistics, i couldn't imagine only taking two courses and then going into industry

exotic maple
#

I love hypothesis testing but i always forget when to use what

#

-rages in 5 types of t-tests-

astral path
#

funny story

#

my data science class was supposed to be for stats majors only and i'm a data science major, and my university forgot to put a stats prereq on it even though most students are DS majors in the class and none of us knew about the prereq

#

so me and like 80% of the class went halfway thru without knowing a bit of statistics when the prof thought there was a stats prereq

#

luckily she caught on and removed the stats part of assignments cus i was lost lol

exotic maple
#

literally 0 statistics?

#

wtf lol

#

like not even basic descriptives?

astral path
#

yeah i never took it in high school and my first semester was filled with calc, CS, and gen ed

uncut orbit
#

oof

exotic maple
#

oof

uncut orbit
#

do you know it now?

exotic maple
#

look at this guy not knowing what's the difference between mean, median and mode :v

#

-calls them all average-

uncut orbit
#

lmao

exotic maple
#

that's basically descriptive statistics :p

#

add range, and quartiles and you're set on descriptives

astral path
#

and stdev is stretching it because i was only told that it was "the average of how far away from the average things are" in sophomore year of HS
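That description is closer to the mean absolute deviation; the standard deviation is the square root of the average squared distance from the mean, which weights far-off values more heavily. On a small made-up sample:

```py
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)  # made-up sample, mean 5.0
dev = x - x.mean()

mad = np.abs(dev).mean()      # average distance from the average: 1.5
variance = (dev ** 2).mean()  # population variance: 4.0
std = np.sqrt(variance)       # standard deviation: 2.0, same as x.std()

print(mad, variance, std)
```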

exotic maple
#

depending on how elitist the person is, some will claim stddev is useless and you should always speak in variances instead

#

i think each has their use :v

astral path
#

nah she's a really down to earth prof who understands the students, she just didn't know we didn't know anything about stats

exotic maple
#

central limit theorem isnt 'basic' but its definitely not the toughest thing out there

astral path
#

yeah i just have a hard time when she put an integral into stats when i dont even get the stats without calculus in there lmao

exotic maple
#

i've literally never, ever used calculus for stats

#

its in the definition of the density and all, but ive never, ever used it

uncut orbit
#

LMAO

#

you guys are good

tidal bough
#

how do you calculate conditional distributions and stuff like that, then?

astral path
#

that bodes well ig

#

calculate what 👀

#

shit i should take stats this summer

astral path
#

i cant go on in my major like this

tidal bough
#

in my probability theory course, we eventually switched to a notation where even sums over discrete distributions are written via integrals

#

because it's just nicer to have things consistent

exotic maple
#

I can see where its coming from though

astral path
#

well its been nice talkin but i gtg to said data science class in 1 minute

exotic maple
#

I've always found the definition of Riemann integrals to be so succinctly beautiful yet complex

uncut orbit
#

explain your doubts

bronze jacinth
#

im not sure which is bankrupt and which is not

#

like are 0=bankrupt and 1=not in the 'Bankrupt?' column?

#

also since im new its hard and takes time understanding the code

uncut orbit
#

ah ofc

#

i would think that 1 means bankrupt

bronze jacinth
#

yes even i thought the same thing

uncut orbit
#

hmmm

bronze jacinth
#

but (1min)

uncut orbit
#

your data is skewed a lot

exotic maple
#

0 normally is no

#

and also, use logic

#

do you believe most companies would be bankrupt?
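A quick way to apply that common-sense check is to look at how often each label occurs. The dataset's own documentation (its UCI page, per the thread) is the real authority on what 1 means; this sketch uses made-up rows:

```py
import pandas as pd

# Hypothetical stand-in for the real data, which has a 0/1 "Bankrupt?" column
df = pd.DataFrame({"Bankrupt?": [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]})

# Count and share of each label; a lopsided split is what you'd expect
# if the rare label (here 1) marks the bankrupt companies
print(df["Bankrupt?"].value_counts())
print(df["Bankrupt?"].value_counts(normalize=True))
```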

uncut orbit
#

go to google datasets...you might find an explanation

bronze jacinth
#

he knows ml so maybe i mightve trust him :/

bronze jacinth
#

oh thanks!

exotic maple
#

you can know something and still screw up

uncut orbit
#

its from uci

exotic maple
#

always curate things with a bit of common sense as well lol. It's unlikely 90% of taiwanese companies are bankrupt...

exotic maple
#

if they are, i'll send you some money :v

bronze jacinth
#

i didnt know if that was possible so here i am 🙂

#

wait ew no i dont want the emoji

uncut orbit
#

welcome anytime

bronze jacinth
#

for the prev dataset(one im using)

#

i put an example to check but it gives the same output for any data i put in there

uncut orbit
#

check all ur variables

exotic maple
#

what is the best place to take a refresher in calculus? I think i need to review a lot of things.

uncut orbit
#

coursera

exotic maple
#

any specifics?

#

i tried khan academy in the past and liked it, but im thinking something a bit more...applied

uncut orbit
#

hmmm

#

i haven't done calculus yet

#

but lemme see

#

most courses are for free

astral path
#

they're pretty applied to engineering problems

exotic maple
#

Calc3 as in vector calculus?

astral path
#

we're getting into that right now actually so yeah

astral path
#

but start of the semester was series

exotic maple
#

I think i have my college notebook for that one. The ones i've lost are calc 1 and 2 lol

astral path
#

ohhh lemme check if im still in the canvas course for those

exotic maple
#

thanks man 🙂

astral path
#

ye im in it!

exotic maple
#

nothing in general

#

just a refresher

#

anything is ok

astral path
#

something like this?

uncut orbit
#

it has some calculus

bronze jacinth
#

@uncut orbit my friend asked me to run this and came to the conclusion that majority are bankrupt

uncut orbit
#

id say not...

arctic wedgeBOT
#

Hey @astral path!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

astral path
#

oof

#

i'll just DM

exotic maple
#

thanks a lot man 😄 you too @uncut orbit

uncut orbit
#

welcome

bronze jacinth
#

and i dont know how to go after that because in whatever ive learned there'll be .data files there

uncut orbit
#

the kaggle one is the same

bronze jacinth
#

oh

#

i checked the variables and i dont think they are wrong...

uncut orbit
#

ok

#

try to directly copy paste the data points

uncut orbit
#

ok

#

are you using colab?

bronze jacinth
#

vs code

uncut orbit
#

hmmm do you have anaconda?

bronze jacinth
#

nope :/

#

ill make a colab?

uncut orbit
#

yea

#

it'll be easier that way

spring seal
#

Hi there, I am interested in simulating physics/engineering problems with Python, for example fluid dynamics and thermodynamics. Which modules should I learn and practice, and what projects could I build? Nowadays I am learning Pandas, Numpy, Matplotlib and seaborn. My background is in mechanical engineering. Please suggest modules or techniques; I would like to hear about them and apply them.

jaunty plaza
#

yo how much does tensorRT help

misty flint
#

havent done calc in 10 years

#

but looked at partial derivatives just so i could understand backpropagation

exotic maple
#

8 years here

#

I remember some things about derivatives integration and calculus

#

But if u ask me to solve one ill embarrass myself lol

grave frost
#

I am learning Pandas, Numpy, Matplotlib and seaborn
Uh, how does that have anything to do with fluid simulations?

misty flint
#

itll start to come back if you start looking at examples again

warm coyote
#

## Import Libraries



import pandas as pd                 # pandas is a dataframe library
import matplotlib.pyplot as plt     # matplotlib.pyplot plots data
import numpy as np                  # numpy provides N-dim object support

# do plotting inline instead of in a separate window
%matplotlib inline

## Load and review data 

df = pd.read_csv("./tree/Notebooks/data/pima-data.csv")      # load Pima data.  Adjust path as necessary

df.shape

Error that I'm getting:


NameError Traceback (most recent call last)
<ipython-input-8-633337079cd0> in <module>
----> 1 df.shape

NameError: name 'df' is not defined

Also, I am using Jupyter notebook

misty flint
#

what happens when you run 'df' by itself in a cell

#

if it doesnt show up, you probably didnt import pandas
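A hedged side note: `NameError: name 'df' is not defined` usually means the cell that assigns `df` never finished, for example because `read_csv` raised `FileNotFoundError` on the relative path, or because that cell wasn't re-run after a kernel restart. A minimal sketch that checks the path before loading; `load_data` is a hypothetical helper and the path is the one from the message:

```python
from pathlib import Path

import pandas as pd


def load_data(path):
    """Load a CSV, or report where Python looked for it (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        # Relative paths resolve against the notebook's current working directory
        print(f"File not found: {path.resolve()}")
        return None
    return pd.read_csv(path)


df = load_data("./tree/Notebooks/data/pima-data.csv")  # adjust path as necessary
if df is not None:
    print(df.shape)
```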

silk marsh
#

movie recommendation is based on KNN??

serene scaffold
silk marsh
#

like it is based on matrix factorization ?

misty flint
#

this was interesting to see

serene scaffold
misty flint
#

then for those in the states

astral path
#

lmao wyoming

misty flint
#

california, texas, and new york seem to have the most job postings

misty flint
astral path
#

ay my state is second highest %

misty flint
#

washington?

astral path
#

actually my state is highest percent

#

ye

misty flint
#

technically first

#

bc DC isnt a state

astral path
#

washington state not DC!

misty flint
#

my state is 2nd highest in postings

astral path
#

texas nice

#

never been

misty flint
#

🇨🇱

#

yes ik thats the chile flag

#

but good enough

astral path
#

lmao

#

austin?

misty flint
#

i wish

#

dfw

astral path
#

oh nice

#

austin is the big tech scene in texas so that was my first guess

misty flint
#

spent some time in austin tho. at least ik there will be jobs for me in my state after i graduate

#

yeah gonna try to move there after my first job

#

we shall see

astral path
#

slc is on the rise in tech jobs overall

misty flint
#

they have a ton of jobs down there

astral path
#

which is where i go to school

misty flint
#

noice

astral path
#

ok so anyways i had a question

#

more boring than usual

misty flint
#

lolol

#

go for it

astral path
#

I'm building a function to scrape from a table and one part is dropping certain columns from a dataframe at some point. however, not all tables will have a specific column. how would I make it so this one column is only dropped if it exists?

#

rows_df = rows_df.drop(columns=['#', 'Weight', 'Hometown', 'High School', 'RSCI Top 100']) is what i have right now, and not all tables have High School as a column

misty flint
#

what happens if you try to drop the column but it doesnt exist?

#

does it return an error?

astral path
#

yeah

#

KeyError: "['High School'] not found in axis"

misty flint
#

i would just do a small try/except code right before dropping the column

astral path
#

i thought about it but

misty flint
#

see if it exists

astral path
#

wait nvm

misty flint
#

did it not work?

#

i think it should work

astral path
#

yeee it works!!

#

thank u man
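For reference, a minimal sketch of the two single-column approaches from this exchange (try/except around the drop, or checking that the column exists first). The column names come from the message; the one-row sample frame is made up:

```python
import pandas as pd

rows_df = pd.DataFrame({"#": [1], "Name": ["A"], "Weight": [180], "Hometown": ["X"]})

# Option 1: attempt the drop and carry on if the column is missing
try:
    rows_df = rows_df.drop(columns=["High School"])
except KeyError:
    pass  # this table has no 'High School' column

# Option 2: check that the column exists before dropping it
if "Hometown" in rows_df.columns:
    rows_df = rows_df.drop(columns=["Hometown"])

print(list(rows_df.columns))  # ['#', 'Name', 'Weight']
```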

misty flint
astral path
#

i'm ALMOST done with data collection ! !

misty flint
#

np sometimes its easy to figure out once you talk your ideas out

#

and nice

#

i believe in you

astral path
#

$350 now on the line here!

misty flint
#

💸

astral path
#

💰

silk marsh
#

@serene scaffold yes like i was searching for the project like what it is based upon

#

am not able to understand, is it based on knn or matrix....

astral path
#

so it's not just one column

#

is there a way to just conditionally do it for any col?

exotic maple
#

@astral path You can try creating a master list of cols, then retrieve the column names from the actual table, evaluate which ones match, and then only delete the matches

#

for example

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple
#
cols_to_delete = ["PLOP1", "PLOP2", "COL3"]

# here you get the cols from the df
df_cols = list(df.columns)

# evaluate and keep only those that are found
to_del = [col for col in df_cols if col in cols_to_delete]

# finally, drop them (drop returns a new DataFrame, so reassign it)
df = df.drop(to_del, axis=1)

#

I did it via lists but there are other ways to do it
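Two other ways to get the same result with pandas: `DataFrame.drop` accepts `errors="ignore"`, which silently skips labels that aren't present, and `Index.intersection` gives the matching names directly. The sample frame and names below are made up:

```python
import pandas as pd

df = pd.DataFrame({"PLOP1": [1], "KEEP": [2], "COL3": [3]})
cols_to_delete = ["PLOP1", "PLOP2", "COL3"]

# errors="ignore" skips labels that are not in the frame (PLOP2 here)
dropped = df.drop(columns=cols_to_delete, errors="ignore")

# Equivalent: keep only the names that actually exist, then drop those
present = df.columns.intersection(cols_to_delete)
dropped_too = df.drop(columns=present)

print(list(dropped.columns))      # ['KEEP']
print(list(dropped_too.columns))  # ['KEEP']
```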

misty flint
#

that seems like it would work yeah

#

just dont forget the ending brackets for the list comprehension

#

that ALWAYS trips me up

#

"why doesnt this list comprehension work"

#

forgets the brackets

exotic maple
misty flint
#

we are literally the same person i swear

exotic maple
#

what would be a good, non iteration way to find the largest string on a list

#

?

#

right now I have this

#
    long_word = ""
    
    for word in unique_words:
        if len(word) > len(long_word):
            long_word = word
#

but im sure there has to be more optimal way to do it 😦

astral path
#

I'm assuming it's not sorted?

#

if not I'm pretty sure the only way would be O(N)

spark stag
#

max will implicitly iterate over the list but if the list is in no way sorted, you would need to iterate over the list to find the longest string
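As noted, `max` still walks the whole list, but the loop runs inside the builtin (in C, on CPython) and the intent fits on one line. A short sketch with made-up words; on ties `max` returns the first longest word, matching the strict `>` in the loop above:

```python
unique_words = ["data", "science", "ml", "pandas", "notebook"]

# key=len compares words by length; ties resolve to the first occurrence
long_word = max(unique_words, key=len)

# default="" covers an empty list, matching the loop's starting value
empty_case = max([], key=len, default="")

print(long_word)  # notebook
```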

astral path
#

if I have a dataframe which looks like this

#

and i want to append a row with new columns that looks like this for each row in the dataframe

#

how would I do that?

#

so the first row with 2016 Villanova vs. 2016 North Carolina would have that all the data from the second image as data in new columns

#

and all the other rows in the dataframe would have the same columns

nocturne kraken
#

probably would be something called a "join"

astral path
nocturne kraken
#

then how do you decide which row gets to which row?

astral path
#

i'm iterating over each row

#
for index, row in df.iterrows():
    try:
        # a should be the two rows combined; get_stats is a function that creates the new columns
        a = get_stats(row['url']).join(row)
        display(a)
    except Exception:
        print("ERROR: ", row['url'])
    time.sleep(1.2)  # pause between scraper requests
    break  # stop after the first row while testing
nocturne kraken
#

so just by the order?

astral path
nocturne kraken
#

you can concat with axis=1

astral path
#

i'll try it

astral path
nocturne kraken
#

I don't know what you are doing

#

I thought you had two dataframes: df1 and df2

#

and you want them smooshed together horizontally

#

that is done by pd.concat([df1, df2], axis=1)
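One caveat with `pd.concat([df1, df2], axis=1)`: it lines rows up by index label, not by position. If a one-row frame from `get_stats` has index 0 while the original row has index 2049, the result is two half-empty rows. A sketch with values borrowed from the chat; resetting both indexes first pairs the rows positionally:

```python
import pandas as pd

left = pd.DataFrame({"team": ["2016 Villanova"]}, index=[2049])
right = pd.DataFrame({"ppg": [80.6]})  # index 0, like a freshly built one-row frame

# Aligns on index labels 2049 and 0, so the result has 2 NaN-padded rows
misaligned = pd.concat([left, right], axis=1)

# Dropping the old indexes makes both frames start at 0, so the rows pair up
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)

print(misaligned.shape)  # (2, 2)
print(aligned.shape)     # (1, 2)
```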

astral path
#

i have a dataframe df with a column url, and I have a function get_stats(url) which takes the value of url for a given row and returns a new dataframe which is a single row. I want to append get_stats(url) to the dataframe for each value of url

nocturne kraken
#

use row.append

#

or you could use .apply on the url column and create the entire dataframe of stats from the urls

astral path
#

aight i'm running that

kindred radish
#

In machine learning, is it important to remove outliers?

#

I've got this column where the majority of values are in the range of 50-100, however the column has a max value of 400. So when i normalise the column, i get a bunch of low numbers because of only one outlier

#

This would presumably have an effect on how the machine learns, right? Is there a scheme for removing outliers? Like looking at the standard deviation or something?

lean ledge
lean ledge
kindred radish
#

oh hi Raggy!

lean ledge
kindred radish
#

Is that what this is doing?

astral path
#

that's just normalizing it by min and max values
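To see why a single outlier matters, here is a small NumPy sketch with made-up values in the 50-100 range plus one 400. Min-max scaling divides by the full range, so the typical values get squeezed toward 0. Z-score standardization (presumably what was pasted, given the negative values mentioned below; that part is an assumption) centers on the mean instead, although the outlier still inflates the standard deviation:

```python
import numpy as np

values = np.array([55.0, 62.0, 70.0, 78.0, 85.0, 91.0, 400.0])

# Min-max normalization: maps the column onto [0, 1] using its min and max
min_max = (values - values.min()) / (values.max() - values.min())

# Z-score standardization: mean 0, standard deviation 1, negatives allowed
z_scores = (values - values.mean()) / values.std()

print(np.round(min_max, 3))
print(np.round(z_scores, 2))
```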

astral path
#

yeah what raggy pasted is better for you

kindred radish
#

physics wizards pulling through once again 👏 cheers Raggy ^^

kindred radish
#

So i gave this a whirl, and I just want to know how this will affect learning:

#

So there are negatives, so if i put this through a MLP i imagine this will "slacken" weights instead of "tightening" them (does that make sense)?

#

Is this better or worse than using normalised values from 0 to 1?

#

Moreover, do i still need to remove these outliers? Or is it fine to keep them as this data is "standardised" instead of normalised?

high badge
#

the outliers should be removed i think

#

or replaced

#

by median or mean or imputed with whatever statistic you see fit
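A minimal sketch of the "replace with the median" idea, assuming pandas. It flags outliers with the common 1.5 × IQR rule (that threshold is a convention, not something from this chat) and replaces only the flagged values, so no rows are lost:

```python
import pandas as pd

col = pd.Series([55, 62, 70, 78, 85, 91, 400], name="feature", dtype=float)

# Interquartile range rule: values far outside the middle 50% count as outliers
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
is_outlier = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)

# Replace only the flagged values with the median of the remaining ones
cleaned = col.mask(is_outlier, col[~is_outlier].median())

print(is_outlier.sum())  # 1
```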

kindred radish
#

so if i remove them i get rid of the whole row right?

high badge
#

you can get rid of the whole row, column, or just the value itself

kindred radish
#

I thought i had to get rid of NaN data?

high badge
#

but if u were to get rid of just the value itself you should replace it with another value that wont disproportionately affect training

kindred radish
#

oh i see

#

which is why you suggested the mean/median

high badge
#

yea

kindred radish
#

hmmm i didn't think of that before. In this case it doesn't matter too much since it's only 2 values. But i have removed a decent amount of NaN data before

#

Didn't think to replace the values with the mean/median

#

is removing it the "safer" option?

high badge
#

depends on how important your row/column is

#

if you identify a column as significant or you think it will be useful during training, then perhaps u should keep the column

kindred radish
#

the answer would be that I don't know, since im not sure how much of an effect this particular column has on the data

high badge
#

u can try estimating how important it is, if its not something blatantly irrelevant then it should be kept for the network to find hidden patterns in

shy kraken
#

I can't seem to find the explanation for when you use a ! or a % in google colab....when do you use which? why do I have to use ! for ls and % for cd?

misty flint
#

its the same as for jupyter magic commands

#

since colab is built on jupyter

#

look up jupyter notebook magic
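Roughly: a `!` line is handed to a separate shell that exits as soon as the command finishes, while `%` magics such as `%cd` run inside the Python kernel itself. That is why `!cd` appears to do nothing. A plain-Python sketch of the difference, using `subprocess` and `os.chdir` as stand-ins (an analogy, not how IPython is implemented internally):

```python
import os
import subprocess
import tempfile


def compare_cd(target):
    """Return whether each style of `cd` changed this process's directory."""
    start = os.getcwd()

    # Like `!cd target`: the cd happens in a child shell that exits right away
    subprocess.run(f'cd "{target}"', shell=True, check=True)
    child_shell_changed = not os.path.samefile(os.getcwd(), start)

    # Like `%cd target`: change the working directory of this Python process
    os.chdir(target)
    process_changed = os.path.samefile(os.getcwd(), target)

    os.chdir(start)  # restore the original directory
    return child_shell_changed, process_changed


print(compare_cd(tempfile.gettempdir()))  # (False, True)
```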

shy kraken
#

Got it! I am a magician now

misty flint
eternal hare
#

RuntimeError: Input type (torch.FloatTensor) and weight type (XLAFloatType) should be the same

#

i have no idea

#

what to do

serene scaffold
#

though I'll make one uninformed guess anyway: is there a way to convert whatever your weights are to a FloatTensor?

dire comet
#

Hi, anyone knows what should i do here? The only thing that I know is that I should make a py code using a montecarlo and the maximum likelihood estimation, but not really know how.

exotic maple
#

ooh a confidence interval question

#

I havent seen it with a std of random error though lol

#

Confidence interval is

#

mean +- (t-score @ confidence * std / sqrt(samples))

#

t-score also requires degrees of freedom, which is normally samples - 1

#

but i have never seen a case where you're given the standard deviation of the error

#

in your case you can calculate the sample mean from the observations, and the t-score from the confidence level and degrees of freedom

#

but i have no clue if you can just input the standard error of the mean there lol

dire comet
#

Oh ok ok, I will give it a try. Thanks for the help tho @exotic maple 🙂

exotic maple
#

this is what i said, in math form

#

x bar is the average of the sample
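In symbols, the interval described above, with x̄ the sample mean, s the sample standard deviation, n the number of samples, and the t critical value taken at n - 1 degrees of freedom (for a 95% interval, α = 0.05). When the standard deviation is known rather than estimated from the sample, the usual textbook choice is the normal critical value z in place of t:

```latex
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```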

dire comet
#

Thank you so much!

mortal flicker
#

can someone please help me to re-arrange this data frame?

#

want a matrix that is 50 (states) x months

#

i have tried to group by date and then transpose

exotic maple
#

You want a multilevel index of state and month?

mortal flicker
#

yes

#
```
month1  val    val         val
month2  val    val         val
```
#

maybe i can export it as a csv and reshape it in R lol
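For the states × months layout, `DataFrame.pivot` (or `pivot_table` if a state/month pair can repeat) does the group-and-transpose in one step. A sketch assuming long-format columns named `state`, `month`, and `value`; those names and the rows are hypothetical, since the real frame wasn't shared:

```python
import pandas as pd

long_df = pd.DataFrame({
    "state": ["AL", "AK", "AL", "AK"],
    "month": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# One row per month, one column per state; add .T for states x months
wide = long_df.pivot(index="month", columns="state", values="value")

# pivot_table aggregates duplicate state/month pairs (mean by default)
wide_agg = long_df.pivot_table(index="month", columns="state", values="value")

print(wide)
```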

lavish swift
#

@mortal flicker are you able to share your data? or a snippet? Just easier to try some things on.

astral path
#

im just stumped at filling in the columns in this dataframe

#

im at the point where it looks like this, it has the desired columns and the original columns

#

and i have a function

def stats(x):
  try:
    a = get_stats(x)
  except Exception:  # a failed scrape becomes NaN instead of crashing the loop
    return np.nan
  display(x)
  return a

which takes a URL and returns a dataframe with one row containing all the stats in the columns from ppg on

#

like this

#

i just have no clue how to loop over each url and insert the desired values from stats(url) into the desired columns

#

any help? Thanks!!

misty flint
brave owl
#

Posted a Question about Normalizing at #help-kiwi , Please help if possible

sudden panther
#

Hi everyone, I’m Anggita. I’m currently doing sentiment analysis on Twitter. I want to analyze the hashtags of trending tweets in different countries. Does anyone know what I should do?

obtuse sable
#

Trade signals xD

bronze jacinth
#

getting precision as 0 with this dataset. help?

neon thistle
#

Hello guys, I'm new to Python and I'm trying to detect anomalies in a list. I searched for a way to do this and tested some code, but I'm receiving an error:
ModuleNotFoundError: No module named 'numpy'
Can anyone help me?

bronze jacinth
#

you have to import numpy

lean ledge
#

You have to install numpy first

bronze jacinth
#

@lean ledge would you mind helping me out?

neon thistle
neon thistle
misty tiger
#

can anyone help me set up a gpu for jupyter notebook

rich reef
#

Greetings,
I have a pandas dataframe with a Datetimeindex and various discrete values.
For simplification:

index       p    q
2020-1-1    6    9
2020-1-8    4    19
2020-1-12   4    17

Every time a value p or q changes, a record is added. That means that at 2020-1-2 the values are essentially the same as 2020-1-1.

I want to plot these data but face a problem:

  • Plotting the sparse data structure as a line, there's a slope between measurements which is misleading.
  • Plotting the sparse data structure as a bar leaves countless gaps.
  • Filling the missing dates and plotting line or bar is slow, because the real dataset spans several years and has 40 columns.

Essentially I want a bar chart where bars have different widths, until the next measure, or a line plot that moves "like a staircase" with only horizontal and vertical segments.

What would you recommend?

untold cove
rich reef
#

This seems to be a very different issue than what I'm asking 🤔

#

But thanks anyway

velvet thorn
#

@rich reef I like the line option

grave frost
#

Oh god stupid Jupyter notebooks. I put my model to train for 48 hours on cloud time and it didn't say anything at all. the next time when I used the command line I got killed. Jupyter sucks 🤬

#

Now I have to spend more money to train it for the 3rd time with more RAM 😞

rich reef
# velvet thorn @rich reef I like the line option

I'll try that, then. I think I'll duplicate the index, offset by a second, and merge and forwardfill.

That should get

index                    p    q
2019-12-31 23:59:59      NA   NA
2020-01-01 00:00:00      6    9
2020-01-07 23:59:59      6    9
2020-01-08 00:00:00      4    19
2020-01-11 23:59:59      4    19
2020-01-12 00:00:00      4    17

A line plot should then practically be a staircase. The NA at the start doesnt matter because its outside the interval.
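An alternative to duplicating the index: matplotlib can draw the staircase directly. `Axes.step(..., where="post")` holds each value flat until the next timestamp, which matches "a record is added every time a value changes"; pandas' `df.plot(drawstyle="steps-post")` should give the same shape. A sketch using the sample rows from the question (note the last value only extends to the last timestamp unless you add a closing row):

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"p": [6, 4, 4], "q": [9, 19, 17]},
    index=pd.to_datetime(["2020-01-01", "2020-01-08", "2020-01-12"]),
)

fig, ax = plt.subplots()
lines = []
for col in df.columns:
    # where="post": the value recorded at each timestamp holds until the next one
    (line,) = ax.step(df.index, df[col], where="post", label=col)
    lines.append(line)
ax.legend()
```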

velvet thorn
rich reef
#

How would you do that? Honestly I always grab pandas out of convenience, and I work with DataFrames in the rest of my data wrangling

rich reef
# grave frost Oh god stupid Jupyter notebooks. I put my model to train for 48 hours on cloud t...

https://www.youtube.com/watch?v=7jiPeIFXb6U I'll just leave this here ...

I have been using and teaching Python for many years. I wrote a best-selling book about learning data science. And here’s my confession: I don’t like notebooks. (There are dozens of us!) I’ll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do ...

warm moth
#

AI is cool but cyber security is better

#

im doing cyber sec currently

#

just hopefully i can get a good gpa

#

but im working on it

lavish tundra
#

i'm making a bot that multiple people will use to generate an image that gets sent to the person who asked for it, so I was wondering how the file should be rendered. Is it possible to send an XY plot without saving it, keeping it in RAM, with the script still knowing which image belongs to which person?
What I mean is: instead of saving the file with an ID, sending it, and deleting it afterward, can I skip saving and just send the file to the person who requested it?
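One way to do this: `Figure.savefig` accepts a file-like object, so the image can be rendered into an in-memory `io.BytesIO` buffer and never touch the disk. Each call builds its own figure and buffer, so requests don't share a file. A minimal sketch; `render_plot` is a hypothetical helper, and how the bytes get sent depends on the bot library:

```python
import io

import matplotlib

matplotlib.use("Agg")  # no display needed on a bot server
import matplotlib.pyplot as plt


def render_plot(xs, ys):
    """Render an XY plot to PNG bytes in memory (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # free the figure so memory doesn't grow per request
    buffer.seek(0)  # rewind so whoever reads the buffer starts at the beginning
    return buffer


image = render_plot([1, 2, 3], [4, 1, 9])
```

With discord.py, for example, `discord.File` accepts a file-like object plus a `filename`, so the buffer can be passed in directly instead of a path.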

fickle socket
#

Anyone got any recommendations on using Python and SLURM? I have an NLP pipeline I am working on and I am wondering if anyone here has some tips on scaling Python on a supercomputer.

#

Specifically I am using Spacy and pipe multiprocessing, but it seems like I am going to hit some roadblocks with that; I can figure that out on my end.

grave frost
# warm moth AI is cool but cyber security is better

There are plenty of projects to automate CyberSec with AI. Currently they are experimental, so they require bombarding the server with requests; however, I read about some corporate AI projects that can query a server and construct a full profile of it to help another model find vulnerabilities in it. Pretty cool if you ask me

whole idol
fickle socket
fickle socket
lean ledge
#

Very very automatic

#

Essentially you just need to add a few lines to your code and it's automatically capable of running on a cluster

fickle socket
#

Does that work with SLURM? It looks like I have to drag the admins to add that.

lean ledge
#

It should work fine with slurm. Just need to load the openmpi and then set up relevant Python environments with horovod installed

#

Admins not needed unless for some reason there's no ompi

#

Edit your few lines of code, then you can mpirun within your sbatch

fickle socket
#

There is definitely OMPI.

#

Isn't this for TensorFlow stuff? I am not running any GPU stuff just Python multiprocessing stuff.

lean ledge
#

Oh you're not distributed NLP training?

fickle socket
#

Oh, I am not doing training. I am running Spacy models. I should have mentioned that.

#

There is already a trained model. The challenge is to run it in a reasonable amount of time. I got thousands of biomedical papers to work through so clearly I need to think harder about running things.

lean ledge
#

Do you have your NLP corpus stored in a pandas df at any point?

fickle socket
#

Yea. But the dataset itself is a bunch of little json files with the plaintext in there. It's the CORD19 dataset if you know about it.

serene scaffold
lean ledge
#

Honestly there's a good chance you're better off launching multiple sbatch runs with different number arguments that pick which data they end up processing and then combining the information later

fickle socket
lean ledge
#

Just make a script to split up the data into X directories, have a generic sbatch with an argument for which directory to look at, then launch X sbatch runs

#

They can be squeued and executed on their own

fickle socket
lean ledge
#

While your first test run seems to be running, you can write a script to combine the output of the runs

#

Yep

#

Much easier than trying to parallelise the actual work on the application level using MPI
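The split-and-launch idea can also be done without moving files into X directories: each job takes every N-th file based on its index. A sketch assuming a SLURM job array (e.g. `sbatch --array=0-7`), where SLURM sets `SLURM_ARRAY_TASK_ID` for each task; the `data` directory, the `*.json` glob, and `N_CHUNKS` are placeholders:

```python
import os
import sys
from pathlib import Path

N_CHUNKS = 8  # placeholder: should match the job array size, e.g. --array=0-7


def select_chunk(paths, n_chunks, chunk_id):
    """Return the share of files this job should process (round-robin split)."""
    ordered = sorted(paths)  # sort so every job sees the same order
    return ordered[chunk_id::n_chunks]


def chunk_id_from(argv, environ):
    """Chunk index from the first CLI argument, else from SLURM_ARRAY_TASK_ID."""
    if len(argv) > 1:
        return int(argv[1])
    return int(environ.get("SLURM_ARRAY_TASK_ID", "0"))


def main():
    chunk_id = chunk_id_from(sys.argv, os.environ)
    files = [str(p) for p in Path("data").glob("*.json")]  # placeholder input directory
    for path in select_chunk(files, N_CHUNKS, chunk_id):
        print(path)  # replace with: run the pipeline, write output named after chunk_id
```

Calling `main()` from the script each array task runs gives every task a disjoint slice; the combine step then only has to read the per-chunk outputs.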

serene scaffold
#

You can also use more_itertools.chunked and joblib

fickle socket
#

I still need to make sure the pipeline itself runs with multiprocessing though, but that's already done.

lean ledge
#

Psst, if in the future you need to do distributed training, horovod is lit 🔥 🔥

#

Implemented it on prod for all AutoML clients to use

#

Was fun

fickle socket
lean ledge
#

I sorta miss HPC stuff. The power rush you get from launching 64 GPUs for training is like a drug to my ego

#

Plus module load tensorflow is the chillest experience I've had with dependency management ever

grave frost
#

Hmm... is there any way we can use pre-trained word embeddings with large documents (except averaging them)?

#

coz I don't think averaging would be very useful or retain information tbh

lean ledge
grave frost
#

Got it. Thanx a ton!

#

stuff like sentence-BERT isn't viable, and I only have the word2vec embeddings for a particular language. averaging loses the order, so that's not so good either

#

What d'you reckon might work better :-
Doc2Vec trained from scratch on dataset (which is small but for a specific domain)
OR
Word2Vec trained on Wiki + average + Tf-Idf
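For the second option, a minimal sketch of a TF-IDF-weighted average of word vectors, so common words count less than distinctive ones. The tiny `embeddings` dict stands in for real Word2Vec vectors, and the weighting is one common TF-IDF variant (raw term count × log(N / document frequency)); like plain averaging, it still ignores word order:

```python
import math
from collections import Counter

import numpy as np

embeddings = {  # stand-in for pre-trained word2vec vectors
    "gene": np.array([1.0, 0.0]),
    "protein": np.array([0.0, 1.0]),
    "the": np.array([0.5, 0.5]),
}
docs = [["the", "gene", "the"], ["the", "protein"]]

# Document frequency: in how many documents each word appears
doc_freq = Counter(word for doc in docs for word in set(doc))


def doc_vector(doc):
    """TF-IDF weighted average of the vectors of the words in one document."""
    dim = len(next(iter(embeddings.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for word, tf in Counter(doc).items():
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        weight = tf * math.log(len(docs) / doc_freq[word])
        total += weight * embeddings[word]
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else total


print(doc_vector(docs[0]))
```

Here "the" appears in every document, so its IDF is log(1) = 0 and it contributes nothing.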

lapis sequoia
#

Is it fine to return a variable from a function where the variable has the same name as the function it is contained in? For example:

def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds

# use the function as follows
>>> ds = ds(4.1, 9)

Or would something like this be better:

def calc_ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds

# use the function as follows
>>> ds = calc_ds(4.1, 9)

I have a bunch of functions like this defined in a module called solid. So I would actually call the function as

import solid
# approach 1
ds = solid.ds(4.1, 9)
# or using approach 2
ds = solid.calc_ds(4.1, 9)

I'm just wondering if there is a preferred naming convention for functions contained in a module.

serene scaffold
#

@lapis sequoia a function can refer to itself by name because, when it runs, Python looks that name up in the enclosing (module) namespace. That's what enables us to use recursion in the language. However in your first example, the statement ds = a + x / y creates a local variable named ds that shadows the function's name inside the function.

#

however you could just do

def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    return a + x / y

no need to assign a name to it if you're just going to return it right away

#

please DM @sonic vapor about that

#

Anyway, it's bad practice to overwrite existing variable names unless you're trying to update the data that that name is supposed to represent.
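A tiny sketch of the concrete problem with approach 1 at the top level: after `ds = ds(4.1, 9)` the name `ds` refers to a float, so a second call fails. The function body is the one from the question:

```python
def ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    return a + x / y


ds = ds(4.1, 9)  # the name ds now refers to a float, not the function

try:
    ds(5.0, 2)
except TypeError as err:
    print(err)  # 'float' object is not callable
```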

lapis sequoia
#

Gotcha.

serene scaffold
#

a common convention to avoid doing that is to put a trailing underscore at the end of the variable name. So you could have ds_ = a + x / y, though in this case that wouldn't be useful since you can just return that expression right away

lapis sequoia
#

I like to assign whatever I return in a function to a variable. So the return statement is something like return x where x can be a small calculation (as my example above) or a large calculation that spans 2 or 3 lines of code. I like the idea of using an underscore. Something like return _ds would work for my example.

serene scaffold
lapis sequoia
#

My second approach seems to avoid these issues by providing a more descriptive function name which avoids the naming problems.

# in solid.py
def calc_ds(x, y):
    z = x + y**3
    a = 2 * (z / 3.14)
    ds = a + x / y
    return ds
# use the function
import solid
ds = solid.calc_ds(4.1, 9)
serene scaffold
bronze jacinth
#

im getting my precision = 0
searched up online and realised i have too many values for class 1 (6k+) compared to class 2 (around 200)
now the ratio is 500 to 200 ish but im still getting precision = 0
help?

serene scaffold
#

I don't agree with the second approach. if your function already has a descriptive name for the namespace that it's in, prepending it with "calc_" isn't very informative. most functions do a calculation.

#

@bronze jacinth either your model doesn't work, or the system you're using to calculate your precision score doesn't work. Or some other third thing. Can you be more descriptive about how you arrived at this point?

bronze jacinth
#

sure yes

#

(im new so i apologise for any mistakes or misunderstandings)
i think the model is working because im getting decent accuracy and confidence

serene scaffold
#

like, what is the formula, and why does it matter?

bronze jacinth
#

is it like the number of predicted values that turned out to be right?

haughty finch
serene scaffold
bronze jacinth
#

our teacher didnt explain the code much so i have to refer to youtube to understand syntax

serene scaffold
#

what course is this?

bronze jacinth
serene scaffold
#

on second thought though, it sounds like you do understand it

bronze jacinth
serene scaffold
#

because precision tells you how often the predictions that you actually make are right

serene scaffold
#

if it is then I should leave

bronze jacinth
#

im here so i drop the average here by a couple

serene scaffold
#

I'm looking for one expression with tp, fp, fn, tn in there. but you won't use two of those

bronze jacinth
#

nope im afraid that wasnt taught

#

but i did see something like that while learning online

serene scaffold
#

precision is tp / (tp + fp)

#

do you know what tp and fp are?

bronze jacinth
#

no 😦

serene scaffold
#

suppose you're making a classifier that tells you if something is a sandwich

#

if something is a sandwich, and your model says it's a sandwich, that's a true positive

misty flint
#

ooo confusion matrix

serene scaffold
#

if something is a sandwich, and your model says it's a salad, that's a false negative. it said it wasn't something (negative) and it was wrong (false)

#

but if it was a salad, and your model said it was a sandwich, that's a false positive. It said it was the thing you were looking for, but it's wrong.

misty flint
bronze jacinth
serene scaffold
#

so basically, precision means "out of all the times my model said it was a sandwich, how many of them were actually sandwiches?"

#

as opposed to "what percentage of the sandwiches in the data did the model find?"

#

the latter is called recall
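The sandwich example translates directly into counts. A small sketch with made-up labels (1 = sandwich, 0 = salad) that also shows how precision ends up at 0: the model does make positive predictions, but none of them are correct. The zero-division guard mirrors how libraries such as scikit-learn report 0 when there are no positive predictions at all (by default with a warning):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / (tp + fp), recall = tp / (tp + fn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 0, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0]  # two "sandwich" guesses, both wrong

print(precision_recall(y_true, y_pred))  # (0.0, 0.0)
```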

bronze jacinth
#

oOhh yes

serene scaffold
#

So to your question, why are you getting 0 for your precision score

bronze jacinth
#

and true predictions over all predictions made is accuracy right?

serene scaffold
lapis sequoia
bronze jacinth
#

my model says that these are the sandwiches, but actually none of them are
and thats why precision is coming 0

serene scaffold
#

I guess you'll have to adjust the parameters of your model 🙃

bronze jacinth
#

hmm

#

not sure how but i will try doing that

serene scaffold
#

but not having run your code, I can't rule out that the precision calculator is broken

bronze jacinth
#

what does support mean?

serene scaffold
bronze jacinth
#

while using classification report

serene scaffold
#

like in this other conversation, if I had a function that calculates precision, I would just call it precision because it's known that precision is a metric

serene scaffold
#

I've never heard of it

bronze jacinth
#

yea maybe

#

i printed the confusion matrix because it sounded cool (i still have to learn that tho)

serene scaffold
#

It might be that support is the number of instances of that class for a given calculation

#

but I'm not completely sure
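For the record, in scikit-learn's `classification_report` the support is the number of true instances of each class in `y_true`, so it doesn't depend on the model's predictions at all. It's just a count, as this sketch with made-up labels shows; a very lopsided support column is a quick sign of the class imbalance described above:

```python
from collections import Counter

y_true = [1, 1, 1, 1, 1, 2, 1, 1, 2, 1]

# Support per class: how many rows truly belong to each class
support = Counter(y_true)

print(support[1], support[2])  # 8 2
```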

bronze jacinth
#

hmm

serene scaffold
#

anyway, was this informative for you? I was going to study for my midterm, and then everything changed.

bronze jacinth
#

this was hella helpful, thank you so much!

#

best of luck for your exams!

serene scaffold
#

miss me with that operating systems shit

astral path
#

im just stumped at filling in the columns in this dataframe
im at the point where it looks like this, it has the desired columns and the original columns

#

and i have a function

def stats(x):
  try:
    a = get_stats(x)
  except Exception:  # a failed scrape becomes NaN instead of crashing the loop
    return np.nan
  display(x)
  return a
#

which takes a URL and returns a dataframe with one row containing all the stats in the columns from ppg on
like this

i just have no clue how to loop over each url and insert the desired values from stats(url) into the desired columns
any help? Thanks!!

deft ruin
#

You can use the map method: df["url"] = df["url"].map(stats)

astral path
#

aight that's running right now

#

it'll take a while to see if it worked tho, uses a webscraper

deft ruin
#

Oh it’s probably a good idea to make a simple test case so you can see if it works quickly

astral path
#

oop yeah thats true

#

this is what's returned

#

vs. before

#
df_test = df.head(5)
df_test['more_stats'] = df_test['url'].map(stats)
df_test
deft ruin
#

ignoring the warning for a moment, what's in the more_stats column(s)?

astral path
#

this is it

deft ruin
#

looks like it worked but created a nested structure

#

my bad i didnt realize there were multiple columns at first

astral path
#

yeah, so that's a good first step

#

farther than i'd gotten so far

deft ruin
#

instead of redefining df["more_stats"] you can just append the new dataframe as extra columns

astral path
#

would that just append new columns for each iteration or just for the first time i iterate

deft ruin
#

it will only append once if you use pd.concat([df1, df2], axis = 1) after creating the more_stats df

#

as an intermediate variable I mean

astral path
#

hmm so just df_test['url'].map(stats) still returns a nested structure

misty flint
#

whats the difference between a 3d array and a tensor

#

is it like the difference between a computer science vector and a physics vector

#

different definitions depending on the field?

sudden panther
#

Hi everyone, does anyone know how to fetch_tweets with Arabic words?

serene scaffold
serene scaffold
deft ruin
#

@astral path maybe use .loc again to pull out the data frame you want and then concatenate

astral path
#

ended up with this

#

it's slow cus its not vectorized but it'll work for my needs

serene scaffold
#

rather than doing len(da) append operations?

astral path
#

what do you mean?

#

da ended up returning a Series of DataFrames

serene scaffold
#

@astral path can you copy and paste the code in that cell into this chat as text?

astral path
#

ya

#
da = df_test['url'].map(stats)
dw = pd.DataFrame()
for i in da:
  dw = dw.append(i, ignore_index=True)
dw
#

!code

serene scaffold
#
da = df_test['url'].map(stats)
dw = pd.concat(da, ignore_index=True)
#

see if that's what you want. it might be faster.

astral path
#

i'll try

#

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"

serene scaffold
#

it's a series of dataframes, right?

#

try dw = pd.concat(da.iteritems(), ignore_index=True)

#

@astral path does that work?

misty flint
serene scaffold
#

I didn't think "tensor" had a mathematical definition

#

I thought it was just a way of specifying how a mathematical data structure is being stored in a machine 😛

astral path
#

nah TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

#

it didn't
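Why both attempts likely failed: `pd.concat` wants a list (or other iterable) of DataFrames and rejects a Series outright, while `da.iteritems()` yields `(index, DataFrame)` tuples, hence the tuple error (`iteritems` was also removed in pandas 2.0 in favor of `items`). Passing the Series values as a plain list should work, and concatenating once avoids re-copying a growing frame on every `append`. A sketch with a hypothetical `fake_stats` standing in for the scraper:

```python
import pandas as pd


def fake_stats(url):
    """Hypothetical stand-in for get_stats: one-row DataFrame per URL."""
    return pd.DataFrame({"url": [url], "ppg": [float(len(url))]})


df_test = pd.DataFrame({"url": ["a.html", "bb.html"]}, index=[2049, 2048])
da = df_test["url"].map(fake_stats)  # a Series whose values are DataFrames

# Concatenate all one-row frames in a single call
dw = pd.concat(da.tolist(), ignore_index=True)

# Attach the stats to the original rows by position
combined = pd.concat(
    [df_test.reset_index(drop=True), dw.drop(columns="url")], axis=1
)
print(combined)
```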

serene scaffold
sudden panther
serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

astral path
serene scaffold
astral path
#

ok

#
2049                                                  ...
2048                                                  ...
2047                                                  ...
2046                                                  ...
2045                                                  ...
Name: url, dtype: object
serene scaffold
astral path
#

ya

#
,url
2049,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2048,"                                                 url   ppg  ... momentum experience
0  https://www.sports-reference.com/cbb/schools/v...  80.6  ...        1        1.7

[1 rows x 20 columns]"
2047,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"
2046,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/s...  72.7  ...  0.666667        1.6

[1 rows x 20 columns]"
2045,"                                                 url   ppg  ...  momentum experience
0  https://www.sports-reference.com/cbb/schools/n...  87.5  ...  0.833333        1.8

[1 rows x 20 columns]"
sudden panther
serene scaffold
#

@sudden panther I can't help with this, unfortunately

sudden panther
hazy niche
#

Hi folks, I have imported a CSV file and I want to drop every column whose name is alertXYZ, where each of X, Y and Z is a digit [0-9]
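A minimal sketch, assuming XYZ means exactly three digits (swap `\d{3}` for `\d+` if the number of digits varies) and that the CSV has been loaded into a DataFrame named `df`; the column names below are made up for illustration. `DataFrame.filter(regex=...)` selects columns whose labels match the pattern, and `drop` removes them:

```python
import pandas as pd

# stand-in for pd.read_csv(...); these column names are hypothetical
df = pd.DataFrame(
    [[1, 2, 3, 4, "x", 5.0]],
    columns=["id", "alert001", "alert42", "alert999", "alert_note", "value"],
)

# anchored pattern: "alert" followed by exactly three digits and nothing else
alert_cols = df.filter(regex=r"^alert\d{3}$").columns
df = df.drop(columns=alert_cols)
print(list(df.columns))
```

The `^` and `$` anchors keep columns such as `alert_note` or `alert42` from being caught by the pattern.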

serene scaffold
#

!paste

astral path
#

i think i'll just keep it with the O(N) implementation instead of vectorized unless it takes too long

serene scaffold
#

In fact I think that might mean that it's O(n^2)
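The reasoning behind the O(n^2) estimate: each `DataFrame.append` call (deprecated in pandas 1.4 and removed in 2.0) builds a brand-new frame holding every row collected so far, so n appends copy roughly 1 + 2 + ... + n = n(n+1)/2 rows. Appending to a Python list is amortized O(1), and a single `pd.concat` at the end copies each row once. A sketch of that pattern, with `make_frame` as a hypothetical per-item function:

```python
import pandas as pd


def make_frame(i: int) -> pd.DataFrame:
    # hypothetical per-item work that yields a one-row DataFrame
    return pd.DataFrame({"item": [i], "square": [i * i]})


def collect(n: int) -> pd.DataFrame:
    frames = []  # list.append does not copy the previously collected rows
    for i in range(n):
        frames.append(make_frame(i))
    # one concatenation at the end copies every row exactly once
    return pd.concat(frames, ignore_index=True)


def rows_copied_by_repeated_append(n: int) -> int:
    # the k-th DataFrame.append rebuilds a frame of k rows: 1 + 2 + ... + n
    return n * (n + 1) // 2


print(collect(5))
```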

astral path
#

oh yikes

serene scaffold
#

I'd still like to know what print(da.iloc[:5].to_csv()) looks like

#

I can probably help you solve it if you are able to provide that

astral path
#

OH i sent it like 10 minutes ago

#

turns out it errored

#

it didn't say it was an error until i hovered over the message...

serene scaffold
#

this doesn't look right

#

actually

#

hmm

#

@astral path join the code help 0 voice chat

astral path
#

ok

hazy niche
astral path
#

ok im there now

deft ruin
#

@astral path this avoids the append issue:

df2 = pd.concat([stats(url) for url in df['url']], ignore_index=True)
df2.index = df.index  # line each stats row up with its source row before joining side by side
full_df = pd.concat([df, df2], axis=1)
astral path
#

oh i should have specified, stelercus helped me in the VC to fix it

deft ruin
#

oh no worries -- glad you got it working

astral path
#

thanks!

mortal dove
#

I have a DataFrame like this. I want to create a new column C that has the first value of A for each value of B (the table is sorted by B)

   A        B        C
0  150      0   
1  153      0
2  157      0
3  160      1
4  165      1

So, when populated it would look like this:

   A        B        C
0  150      0        150 
1  153      0        150
2  157      0        150
3  160      1        160
4  165      1        160

Any ideas?

uncut orbit
#

copy A

#

and then assign it to df['C']

mortal dove
#

It's not just a copy of A

uncut orbit
#

ohhhh

exotic maple
#

You can probably implement something like that with df.apply()

#

df["C"] = df.apply(your_function, axis=1)  # axis=1 passes each row (as a Series) to your_function

mortal dove
#

Ah, the DataFrame isn't sorted on A, so the value that goes into C won't always be the smallest one; it needs to be the first occurring value (by the time-series index).
I hacked my way through with a loop, but it's pretty slow since I have 61k rows

exotic maple
#

you can still do it

#

instead of min you can just cast list(a) and retrieve index 0

#

but since you loop already nvm :p

deft ruin
#

I think groupby with transform would work here

#
df.groupby('B')['A'].transform('min')
#

oh my bad, not min, but your function for getting the first value
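pandas ships a built-in `"first"` reduction that fits this case, so no custom function is needed: it takes the first value of A in each B group in row order (not the minimum), and `transform` broadcasts it back onto every row of that group. A minimal sketch using the table from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": [150, 153, 157, 160, 165], "B": [0, 0, 0, 1, 1]})

# first value of A within each B group, repeated on every row of the group
df["C"] = df.groupby("B")["A"].transform("first")
print(df)
```

One caveat: `"first"` returns the first non-null value, so if A can contain NaN and you need the literal first row, `transform(lambda s: s.iloc[0])` is an alternative.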

mighty cobalt
#

Hello I need a little help with cv2

#

So cv2 does not support GIFs. How can I read GIFs from URLs to manipulate them?

#

I tried reading frame by frame and storing them in a np array but it didn't work
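One workaround swaps in Pillow for the decoding step: Pillow can read animated GIFs frame by frame, and each frame converts to a NumPy array that cv2 functions accept. A minimal sketch; the function names are hypothetical, and fetching the bytes with `urllib.request.urlopen(url).read()` is an assumption about how the GIF is obtained:

```python
from io import BytesIO
from urllib.request import urlopen

import numpy as np
from PIL import Image, ImageSequence


def gif_frames(gif_bytes: bytes) -> list[np.ndarray]:
    """Decode every frame of a GIF into an RGB uint8 array of shape (h, w, 3)."""
    with Image.open(BytesIO(gif_bytes)) as im:
        # ImageSequence.Iterator walks the frames; convert("RGB") resolves palettes
        return [np.array(frame.convert("RGB")) for frame in ImageSequence.Iterator(im)]


def gif_frames_from_url(url: str) -> list[np.ndarray]:
    # assumes the URL serves the raw GIF bytes and network access is available
    with urlopen(url) as resp:
        return gif_frames(resp.read())


def to_bgr(frame: np.ndarray) -> np.ndarray:
    # OpenCV expects BGR channel order, so reverse the channel axis
    return np.ascontiguousarray(frame[..., ::-1])
```

Each array from `to_bgr` can then go through the usual cv2 calls; writing the result back out as a GIF would again need Pillow or another library.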

exotic maple
#

@deft ruin to this day I still don't know wtf transform does lol

deft ruin
#

@exotic maple at least in this case it will broadcast the grouped df back to the original size

#

nice when you want e.g. a column with group means
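The "broadcast back to the original size" point is easiest to see next to a plain aggregation: `mean()` on a groupby collapses to one row per group, while `transform("mean")` returns one value per original row, so it can be assigned straight to a new column. A small sketch using the A/B table from earlier (the new column name is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"A": [150, 153, 157, 160, 165], "B": [0, 0, 0, 1, 1]})

# aggregation: one row per group, indexed by the group key B
group_means = df.groupby("B")["A"].mean()

# transform: same length and index as df, each row gets its group's mean
df["A_group_mean"] = df.groupby("B")["A"].transform("mean")

print(group_means)
print(df)
```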

exotic maple
#

so if I wanted a column

#

with the mean of A

#

i can do

#

df["Mean"] = df.groupby("B")["A"].transform(np.average) ?

deft ruin
#

yeah exactly

exotic maple
#

thanks man. I never knew wtf that was for lol. never got the hang of it

#

btw, a question, just to confirm whether I'm right.

deft ruin
#

yeah no problem man I'm still learning it as well

#

coming from R actually lol

exotic maple
#

Is apply() on a DataFrame applied as a vector or element-wise, functional-programming style? That is, to the whole column at once or value by value?

#

i was having a debate about that in another forum

#

last i remember apply is also vectorized

#

applies the function to the whole column, at once

deft ruin
#

yeah I think it's confusing because the dataframe method applies the function to each column or row (i.e. vector) but the apply method on a series is element by element

#

so you have to be careful with types
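A quick way to check what each `apply` actually receives: `DataFrame.apply` calls the function once per column (default `axis=0`) or once per row (`axis=1`), passing a Series each time, while `Series.apply` calls it once per element with a scalar. Either way it is a Python-level loop over columns, rows or elements, not a single vectorized call. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [150, 153, 157], "B": [0, 0, 1]})

# DataFrame.apply, axis=0: one call per column, each receiving a whole Series
col_types = df.apply(lambda col: type(col).__name__)

# DataFrame.apply, axis=1: one call per row, again receiving a Series
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Series.apply: one call per element, each receiving a scalar
elem_is_series = df["A"].apply(lambda x: isinstance(x, pd.Series))

print(col_types)
print(row_sums)
print(elem_is_series)
```

For arithmetic like the row sum, the genuinely vectorized form is simply `df["A"] + df["B"]`, which avoids `apply` altogether.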

iron basalt
grave frost
#

Why is "it depends" the most common programming answer to almost everything?

sick furnace
#

I'm having some trouble configuring my environment for Apache Spark: when I try to do things like connect to a PostgreSQL database, I get streams of errors

plucky loom
#

Do any of you guys know about SymPy?