#data-science-and-ml


velvet thorn
#

My maths is alright
@desert parcel okay, I'm not here to quiz you

#

but yeah, try fixing the nan values in the input

#

and see if it works

desert parcel
#

I'm checking the csv

#

I don't see any nan values

#

even if there are I avoided those columns

velvet thorn
#

uh

#

why aren't you doing it programmatically

#

it's in a DataFrame

#

you can query it

desert parcel
#

I was looking at a pandas tutorial

#

but I fell asleep xd

#

I was tired

velvet thorn
#

okay...?

desert parcel
#

lol I know I'm not the best

#

Oh i see the error now

#

ok yeah there is a loss now

#

no more nans

velvet thorn
#

yup

#

that's good

#

btw for future reference

#

you can use df.isna().sum()

#

to see whether there are nulls on a column-wise basis

desert parcel
#

ohh

#

yeah that's handy

velvet thorn
#

or df.isna().sum().sum() to get the total number of nulls
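
A minimal sketch of the two checks above, assuming a small made-up DataFrame (the column names are hypothetical, not from the CSV being discussed). It also shows one common way to get rid of the NaNs before training; dropping rows is an assumption here, not necessarily what was done in this conversation.

```python
import numpy as np
import pandas as pd

# Hypothetical training data with a couple of missing values.
df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [0.5, 0.1, np.nan, 0.9],
    "label": [0, 1, 0, 1],
})

nulls_per_column = df.isna().sum()          # Series: one count per column
total_nulls = int(df.isna().sum().sum())    # single number for the whole frame

# One possible fix: drop every row that contains at least one NaN.
clean = df.dropna()
```

Filling the gaps with `df.fillna(...)` is the usual alternative when dropping rows would throw away too much data.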

#

yeah...

#

well.

#

pandas knowledge is really important IMO

#

scanning a CSV by hand is hell

desert parcel
#

yeah I can see that

#

I was looking at the tutorial before I fell asleep

#

the way you can handle data is really useful

velvet thorn
#

IMO you should be @ least intermediate with pandas and numpy

#

before you even think of touching TF/torch

desert parcel
#

yeah I'm gonna take a look at those

#

100%

#

I just thought it would be cool to learn ML

#

I went in without really knowing what I needed to know

#

I also got an inf in one of my losses

#

The values of loss change depending on what optimizer I use

velvet thorn
#

it's cool

#

but

#

not many people can pick it up right off the bat.

#

and if your fundamentals are weak

#

you run into a lot of problems

#

without knowing how to solve them

desert parcel
#

Well I just read the errors

#

and mess with the code for like 30 minutes or something

#

before I ask for help

velvet thorn
#

yeah, I think you depend way too much on getting help

desert parcel
#

Yeah I think so too lol

#

But sometimes I can't figure out the issue

velvet thorn
#

and if your fundamentals are weak
@velvet thorn largely because of this

#

to be fair, part of it is about experience

desert parcel
#

Well I can't say much about that since I'm new to this

#

I just mess around with stuff I know and read the docs

#

Not that I can understand a lot of it

velvet thorn
#

I would suggest at least 2-3 months of quality Python experience before beginning to touch deep learning

desert parcel
#

What would you consider quality then

#

Because I feel like i'm good at python but there are things that I miss

velvet thorn
#

Because I feel like i'm good at python but there are things that I miss
@desert parcel don't think so TBH

desert parcel
#

I know I'm not good at it but I just feel like I'm good at it lol

velvet thorn
#

even on the knowledge level

desert parcel
#

I did say feel

velvet thorn
#

for example, are you familiar with decorators, context managers, or the descriptor protocol?

desert parcel
#

never heard of the last one

velvet thorn
#

__get__?

desert parcel
#

Oh yeah

#

stuff like

#

__init__

velvet thorn
#

huh?

#

no...

desert parcel
#

lol then nvm

velvet thorn
#

the descriptor protocol underlies properties
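
A small sketch of what that means, with hypothetical class names. A descriptor is an object stored on a class that defines `__get__` (and optionally `__set__`); attribute access on instances is routed through those methods. The built-in `property` is implemented the same way.

```python
class Positive:
    """A data descriptor: defines __get__ and __set__ for one attribute."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:          # accessed on the class itself
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.private_name, value)


class Circle:
    radius = Positive()               # attribute access goes through the descriptor

    def __init__(self, radius):
        self.radius = radius          # calls Positive.__set__

    @property
    def diameter(self):               # property is itself a built-in descriptor
        return 2 * self.radius


c = Circle(3)
```

Reading `c.radius` calls `Positive.__get__`, and assigning a non-positive radius raises `ValueError`.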

desert parcel
#

maybe I should rewatch that python tut video

#

that got me into this

velvet thorn
#

anyway, IMO quality is about building things that stretch your capabilities

#

and developing knowledge

desert parcel
#

This being python for beginners

velvet thorn
#

breadth is really useful.

desert parcel
#

not the tut for getting into ml

velvet thorn
#

because everything is connected

#

a bit of computer science knowledge is also really nice for ML

desert parcel
#

I have none of that

#

mostly because I'm still in highschool and I haven't really searched up any vids on CS

#

Maybe I should do that

velvet thorn
#

the nice thing

#

about life nowadays

#

is that you're no longer locked into your major

#

I don't have a CS background either

#

nothing even close

#

but, yeah, knowing the answer to questions like "why is it faster to tell if an element is in a set vs a list" will come in handy someday.

desert parcel
#

An element in a set doesn't have repeating values

#

right?

#

like if there are repeats only one instance will be printed, not sure if the right language is used

velvet thorn
#

uh...

#

yes but no

#

I mean, nothing of what you have said is wrong

#

but what I mean is

#

3 in {1, 2, 3} vs 3 in [1, 2, 3]

#

the former is faster; why?

#

and that is a CS question

desert parcel
#

ohh

#

Idk maybe something with memory?

#

I'm just guessing

velvet thorn
#

no point in me telling you

desert parcel
#

ik

velvet thorn
#

the thing is that these things are not obvious, but they will be important sometime in the future
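
For readers who want to check the set-versus-list question themselves, here is a rough sketch that counts equality comparisons instead of timing anything. The `Probe` class is hypothetical and exists only to do the counting. The background fact is that a list membership test scans elements one by one, while a set hashes the value and jumps to the matching bucket.

```python
class Probe:
    """Wraps a value and counts how many times == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Probe.comparisons += 1
        return self.value == other.value


items = [Probe(i) for i in range(1000)]
as_list = list(items)
as_set = set(items)
target = Probe(999)   # equal to the last element, but a distinct object

Probe.comparisons = 0
found_in_list = target in as_list     # walks the list until it finds a match
list_comparisons = Probe.comparisons

Probe.comparisons = 0
found_in_set = target in as_set       # hashes target, then checks that bucket
set_comparisons = Probe.comparisons
```

Here the list needs one comparison per element, while the set needs only a handful at most, and that gap grows with the size of the collection.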

desert parcel
#

I'm checking it out

velvet thorn
#

just an illustration

#

anyway I'm out

#

have fun learning!

#

I was in your position like a year ago

#

it's a great journey

desert parcel
#

Well thanks for the constructive criticism

#

Really did bring some things to light

#

or shine

lapis sequoia
#

Do I need to know the foundational linear regression and kNN algorithms, plus some algebra, matrices, probability and statistics, calculus, and numpy completely in order to start out in machine learning? I'm also interested in computer vision.

desert oar
#

Yes

#

It's foundational material

#

Even if you don't use it frequently

worn bough
#

You don't need it to implement basic algorithms, but it's very handy if you want to know what you're doing.

#

I mean, you don't need a huge maths course, but you need to know a thing or two about probabilities and calculus.

lapis sequoia
#

no algebra?

#

what if i wanna be efficient

#

then i require good math right

desert oar
#

@lapis sequoia probability, calculus, linear algebra. i agree you dont need to learn it all at once, but you should definitely start learning it and seek to keep learning it over time.

lapis sequoia
#

@desert oar Thanks buddy

heady lance
#

COMMANDLINE Video Player - convert video files to ascii art
upvotes 263 comments 21 user Slingerhd

What is the most impressive Python based project you have seen?
Sometimes I find that Python can be so much more, but people use it mainly in data science (which is fine). Wonder any...
upvotes 38 comments 37 user vitsensei

Crime Watch: An Interactive Way To View Crime
A Demonstration Of Crime Watch Github Link...
upvotes 29 comments 8 user python959

Python logo in colored ASCII art!
upvotes 28 comments 2 user Honno

A DoS attack in 15 lines of code.
Hi, I have tried to create the simplest possible denial of service attack; for this script, I have not used more than 15...
upvotes 6 comments 26 user progsNyx

viral scroll
#

The driverBreakdown column here is a nested dict

  'environment': {'average': 5,
   'questions': {'Question 1': 5}},
  'peerRelationship': {'average': 4,
   'questions': {'Question 2': 4}}},
 'Mood': {'average': 5.0,
  'mood': {'average': 5,
   'questions': {'Question 3': 5}}},
 'RewardsAndRecognition': {'average': 1.0,
  'recognition': {'average': 1,
   'questions': {'Question 4': 1}}}}

I would like to convert the driverBreakdown column into multiple rows in this way

#

is there any way to achieve this directly via pandas

#

and by not using multiple python iterators

heady lance
#

Hello Everyone

desert oar
#

@viral scroll i would do a combination

#
  1. write a function to "flatten" each nested dict
  2. "explode" the flattened dicts into dataframe rows
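
The two steps above could be sketched roughly as follows. This assumes every entry has the shape visible in the snippet (category, then sub-driver, then `questions`); the `respondent` column, the `Workplace` category name, and the output column names are hypothetical, since the top of the original dict is cut off.

```python
import pandas as pd


def flatten_breakdown(breakdown):
    """Turn one nested driverBreakdown dict into a list of flat records."""
    records = []
    for category, category_data in breakdown.items():
        for driver, driver_data in category_data.items():
            if driver == "average":   # category-level average, not a sub-driver
                continue
            for question, score in driver_data["questions"].items():
                records.append({
                    "category": category,
                    "driver": driver,
                    "driver_average": driver_data["average"],
                    "question": question,
                    "score": score,
                })
    return records


df = pd.DataFrame({
    "respondent": ["a", "b"],
    "driverBreakdown": [
        {
            "Workplace": {  # hypothetical name: the real key is truncated in the post
                "average": 4.5,
                "environment": {"average": 5, "questions": {"Question 1": 5}},
                "peerRelationship": {"average": 4, "questions": {"Question 2": 4}},
            },
            "Mood": {
                "average": 5.0,
                "mood": {"average": 5, "questions": {"Question 3": 5}},
            },
        },
        {
            "RewardsAndRecognition": {
                "average": 1.0,
                "recognition": {"average": 1, "questions": {"Question 4": 1}},
            },
        },
    ],
})

# Step 1: flatten each dict into a list of records.
# Step 2: explode so that every record gets its own row.
exploded = (
    df.assign(record=df["driverBreakdown"].map(flatten_breakdown))
      .explode("record")
      .drop(columns="driverBreakdown")
      .reset_index(drop=True)
)

# Expand the record dicts into real columns next to the original ones.
flat = pd.concat(
    [exploded.drop(columns="record"), pd.DataFrame(exploded["record"].tolist())],
    axis=1,
)
```

The flattening itself is still a plain Python loop per row; only the explode and the column expansion are done by pandas.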
tribal hornet
#

hello, in which channel can i clear my doubts about python?

viral scroll
#

@desert oar
Any suggestions on how to optimize the flatten part? My data set can contain up to a million rows and I am afraid that flattening each row could be time-consuming.

Also, Thanks for letting me know about the explode function.

desert oar
lapis sequoia
#

where can i learn Linear Algebra

remote pumice
#

Hello can somebody help? so i am planning to make a driver distraction detection using open cv. so i am thinking of adding a feature which shows the number of alert the driver gets during driving. So how can i get the data?

near moss
#

you convince a lab or some agency to fund your research

#

because I am 99.9% sure those data are not publicly available if they exist at all

solid aurora
#

I've got a matplotlib question:

#

I have a plot using 2x2 subplots, and things are laid out properly

#

But when I add a column there is a large gap between the rows

#

and changing the figsize doesn't help

#

let me take some screenshots and show examples

last wind
#
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]]
anyone have an idea how to make this mask in numpy
#

please ping me if you have an answer
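
A possible answer, assuming the first row (cut off in the message) is `[1 0 0 0]`, so the mask is lower-triangular with the diagonal included. The size `n = 4` is taken from the visible rows.

```python
import numpy as np

n = 4

# Lower triangle (diagonal included) of an all-ones matrix.
mask = np.tril(np.ones((n, n), dtype=int))

# Equivalent boolean mask via broadcasting: row index >= column index.
bool_mask = np.arange(n)[:, None] >= np.arange(n)
```

The boolean version is convenient when the mask is only used for indexing or with `np.where`.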

solid aurora
#

the pics are intentionally low-quality

#

the content of the plots is irrelevant here

#

why is there a large gap between the two rows?

#

I ran plt.figure(figsize=(20, 20)) before both

#

I'm assuming that the figsize affects the subplots, since the bottom pic is clearly not square

last wind
#

because the total figure size is now 20 by 20. that's an aspect ratio of 1:1; for those you'd need an aspect ratio of like 2:3 i believe

#

wait is there whitespace after the lower ones?

solid aurora
#

@last wind if the figsize affects the total size then why is the bottom picture clearly non-square?

#

no

last wind
#

thanks

solid aurora
#

lemme try with (30, 20)

#

still don't fully understand how that works

#

but ¯\_(ツ)_/¯
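
A sketch of how `figsize` interacts with a subplot grid. `figsize` is the size of the whole figure in inches (width, height), and the grid shares it. One plausible reading of the gap, assuming the panels are images drawn with a fixed aspect ratio (the `imshow` default): three square panels side by side are much wider than they are tall, so a square figure leaves empty vertical space between the rows. The 2x3 grid and the dummy image are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Width:height of 30:20 matches a grid of 3 columns by 2 rows of square panels.
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(30, 20))
for ax in axes.flat:
    ax.imshow([[0, 1], [1, 0]])   # placeholder image, square by default

fig.tight_layout()  # reduce the padding between rows and columns
```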

jolly sinew
#

If I have an intake csv file from an animal shelter and an outcomes csv file from an animal shelter, but there are about 200 more records in the outcomes csv, how could I remove those so I can nicely join dataframes from the two csv files with pandas?

solid aurora
#

there are types of joins you can use

#

idr how exactly pandas does joins

#

but shouldn't you be able to do a Left Join (assuming intake is on the left)?

#

@jolly sinew ^

lapis sequoia
#

Hi guys, I'm learning numpy and want to make a face-recognition app using opencv, how do I do that

#

ik numpy basics and some essentials, so should i start learning opencv?

#

or do i need to know how ml works

jolly sinew
#

@solid aurora I tried a merge on the animal ID column, but it is not a unique column because sometimes an animal with the same animal ID is recorded / admitted multiple times, so merging on animal ID multiplied those records. However, I really appreciated your advice and I'll try the left join.

#

So there's not really a good primary key

solid aurora
#

@jolly sinew hmm maybe generate a new column which is f"{animal_id}-{visit_number}"

#

so if animal 1 gets seen 3 times, the column will be 1-1, 1-2, and 1-3

jolly sinew
#

Oh nice, that's a good idea

solid aurora
#

that's assuming that the intake and outtake happens sequentially and there are no missing records

#

that would totally break if there is an outtake that's not recorded

jolly sinew
#

I'm going to give it a shot

#

Am I allowed to post links to the datasets here? It might make more sense if you could see their general shape

solid aurora
#

Yea sure

#

I won't be able to take a look, but maybe someone else can

merry ridge
#

Is anyone familiar with the package MIP? I have a somewhat complex mixed integer LP problem I am trying to solve and the package seems to be running into numerical artifacts, or getting stuck trying to find a solution without terminating. I can't find much information online on how to troubleshoot it.

molten hamlet
#

uhh

#

what exactly do you want to achieve?

#

maybe it is using wrong numeric methods

jolly sinew
#

@solid aurora I found a solution thanks to your help! I used cumcount to generate a new column of occurrences for each animal id and then did a left join on both the animal id and the occurrences columns

#

outcomesdf["Occ_Number"] = outcomesdf.groupby("Animal ID").cumcount()+1
intakesdf["Occ_Number"] = intakesdf.groupby("Animal ID").cumcount()+1
fulldf = pd.merge(intakesdf, outcomesdf, on=['Animal ID', 'Occ_Number'],
how='left', validate="1:1")
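
A self-contained version of the same approach on toy data, for anyone who wants to try it. The miniature frames and the `Intake Type` / `Outcome Type` columns are made up. As noted earlier in the thread, this only pairs visits correctly if both files list each animal's visits in the same order with no missing records.

```python
import pandas as pd

# Hypothetical miniature versions of the two shelter files.
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A1"],
    "Intake Type": ["Stray", "Surrender", "Stray"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A1", "A3"],   # A3 has no intake record
    "Outcome Type": ["Adoption", "Transfer", "Return", "Adoption"],
})

# Number each animal's visits in file order: 1, 2, 3, ...
intakes["Occ_Number"] = intakes.groupby("Animal ID").cumcount() + 1
outcomes["Occ_Number"] = outcomes.groupby("Animal ID").cumcount() + 1

full = pd.merge(
    intakes, outcomes,
    on=["Animal ID", "Occ_Number"],
    how="left",        # keep every intake, drop outcomes with no matching intake
    validate="1:1",    # raise if the key pair is not unique on either side
)
```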

solid aurora
#

@jolly sinew glad to hear that!

merry ridge
#

I am trying to solve a convex mixed integer LP problem. I'm not sure how else I can describe it without going into extreme detail.

#

I read a recent paper by Garvie and Burkardt that showed their (unrelated) LP problem would not converge using gurobi, but would converge reliably with other solvers. I'm not sure how often something like that occurs in practice before I try to reimplement this entire thing.

upbeat ore
#

what's the best approach to store data to access it afterwards?

#

i want to build a face recognition db, which would store the id, entrances on screen and such stuff, now i'm confused on how to store it better, would an object be appropriate?

solid aurora
#

@upbeat ore use a proper database such as sqlite or postgresql or smth
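
As a rough illustration of that suggestion, here is a sketch using the `sqlite3` module from the standard library. The table and column names are hypothetical; the real schema depends on what needs to be stored per person and per appearance on screen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to persist between runs
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE sighting (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        seen_at   TEXT NOT NULL      -- ISO 8601 timestamp
    );
""")

conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (1, "example person"))
conn.executemany(
    "INSERT INTO sighting (person_id, seen_at) VALUES (?, ?)",
    [(1, "2020-01-01T20:15:00"), (1, "2020-01-02T21:40:00")],
)
conn.commit()

# How many times has person 1 appeared on screen?
entrances = conn.execute(
    "SELECT COUNT(*) FROM sighting WHERE person_id = ?", (1,)
).fetchone()[0]
```

Keeping people and sightings in separate tables means each new appearance is one inserted row rather than an update to a growing object.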

#

@merry ridge tbh I don't think #data-science-and-ml is the best place to find linear programming advice

#

I'm not even sure how many machine learning engineers have any linear programming experience

#

I for sure have none, but then I'm just a high schooler 🙂

#

========================

#

Anyway I came here to ask

merry ridge
#

I'm not sure I agree with that statement, but I'm willing to admit I'm wrong.

solid aurora
#

Is there a functional difference between a high figsize and a high DPI value in matplotlib?
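
A short sketch of the relationship, in case it helps. `figsize` is in inches and `dpi` is pixels per inch, so the pixel size of the canvas is their product. The two hypothetical figures below end up with the same number of pixels, but text and line widths are specified in points (1/72 inch), so they look twice as large relative to the canvas in the high-DPI figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

big_canvas = plt.figure(figsize=(12, 8), dpi=100)   # large in inches, low density
high_dpi = plt.figure(figsize=(6, 4), dpi=200)      # small in inches, high density

pixels_big = (big_canvas.get_size_inches() * big_canvas.dpi).tolist()
pixels_dpi = (high_dpi.get_size_inches() * high_dpi.dpi).tolist()
```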

#

@merry ridge you may be more likely right than wrong - the machine learning engineers I've interacted with are all fresh out of college and focusing more on the business/analytics side than the math side

desert oar
#

@merry ridge this is a perfectly fine place to ask, but i dont know how many people anywhere on this discord have experience with that

#

@solid aurora it's not that they don't know math. it's that linear programming isn't typically an important part of machine learning or data science nowadays, so it's not typically taught much. especially not to ML engineers who don't need the full methodological breadth that a researcher might need

#

it's probably good for a generalist to at least be aware of LP solvers and such, but i've certainly never needed it in my career

merry ridge
#

I figured that a lot of ML practitioners would frequently run into a problem where their black box algorithm fails and they need to troubleshoot it. Are the packages used more resilient than I think they are?

solid aurora
#

@desert oar oh yea I'm sure ML engineers know math, and are at least aware of what LP can do

#

just I doubt they have experience utilizing LP to solve such problems

#

@merry ridge oh no you're entirely right, black box algorithms fail often, just that i've never heard of someone turning to LP to solve them

desert oar
#

@merry ridge ML practitioners rarely use black box algorithms

#

99% of the time machine learning is differentiable and solved with convex optimization methods like gradient descent
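
For anyone unfamiliar with the term, here is a bare-bones sketch of gradient descent on a one-parameter least-squares objective, which is convex. The data, learning rate, and step count are made up for illustration.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2 * x


def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


w = 0.0
learning_rate = 0.05
for _ in range(50):
    w -= learning_rate * grad(w)   # step downhill along the gradient
```

After the loop `w` has converged to roughly 2, the slope that generated the data.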

upbeat ore
#

Could anyone advise me on how to proceed with building something like this? -- We got a surveillance system in a bar, and people sometimes fight here, bring weapons and stuff, we want to be able to identify weapons and people that are on ban list from cameras. My first idea, was using facenet, for face detection and recognition, what about weapons?, + if people come in with masks or hats, should i be looking at another data set? All suggestions are welcome. Thank you.

solid aurora
#

I think they mean things like neural networks where it's difficult to understand why something underperforms @desert oar

desert oar
#

even so, explicit LP solvers just aren't used for that

solid aurora
#

mmhm ^

#

@upbeat ore I always like to say that obtaining good data is 80% of the work of creating an ML model

#

you're going to need to find a dataset of images where people are holding/concealing weapons

#

and then label it so you can create ground truth

#

that can help build a model where you detect weapons on people

desert oar
#

^ this. you need a big labeled dataset of people holding weapons in various poses etc. and you need to make sure you arent accidentally training a racist model. basically this is a huge task that even major well-funded police departments have completely failed to successfully tackle, and i doubt you will be able to do it on your own.

solid aurora
#

and you need to make sure you arent accidentally training a racist model
THIS ^^^

desert oar
#

but if you really feel like you want to try it, someone will have to sit down and label potentially thousands of still frames of security footage

upbeat ore
#

what's the fastest face recognition right now ?

desert oar
#

there are data labeling tools you can use or purchase for that task. then you gotta actually build a model on top of it which will likely require significant gpu computing.

#

i would start there

merry ridge
#

Do you mind telling me a bit more about what you do salt rock lamp? I'm just curious because the kind of work I see actively used and considered within the realm of ML sounds very different from yours.

desert oar
#

@merry ridge data scientist

#

im curious what you see

solid aurora
#

@upbeat ore you don't want fastest, you want most accurate

merry ridge
#

I can describe the last few problems I've worked on if that helps. I just consider myself in data science

solid aurora
#

I can make a "face recognition toolkit" that just assigns labels randomly

#

it can run in 0 ms, but it will have absolutely terrible performance

#

There's always a good medium balance between speed and accuracy

upbeat ore
#

well, i was thinking that i need a fast one to be able track the faces and weapons in between

solid aurora
#

and you 100% want to err on the side of accuracy

upbeat ore
#

and still be able to output stuff on monitor

solid aurora
#

well what device is this running on?

#

a desktop computer with some sort of GPU should easily be able to handle 60fps if it's like a convnet

odd yoke
#

easily is an overstatement tbh

upbeat ore
#

would you mind explaining how to go about this? so basically i use yolo to detect the human, then extract the human and detect the face with facenet, then search in db for banlist, then look back at the full human and try to find the weapon there

#

or there's a better approach, sorry if this sounds stupid

desert oar
#

you might have a lot of false positives with that system, although i guess thats good enough to go and send a guard to visually inspect

upbeat ore
#

yeah that was the idea, just to know before hand

#

as people with weapons sit near the gambling machines, we got 3 of those

#

and usually its like +10 minutes before the real stuff happens, so this is the frame we would like to catch and disable the person to not harm himself or any others

desert oar
#

i mean this is not a small task and you'll have to test it a lot

#

but we are just now at a point where maybe this tech is within reach for a bar

upbeat ore
#

there is no timeline for it

desert oar
#

and not like, a giant company

#

@merry ridge sure

#

data science is a pretty broad range of job titles and tasks

#

i always like to know what other people work on

merry ridge
#

@desert oar The last three projects I've had were classifying electricity prices to detect moments when a player in the market could change their bidding strategy to alter spot prices in a significant way; looking at modeling strategies to predict mean electricity price levels given some anticipated regulatory changes next year; and modelling how congestion affects crude oil prices in the US Gulf Coast. The last several positions in data science for oil & gas, pipeline, electricity and other commodity markets I applied for all required very strong LP knowledge (which I am not that great at) because they still use quite a lot of excel models in house.

desert oar
#

ahh interesting

#

what do they use the LP models for?

merry ridge
#

It's mainly because a lot of this stuff depends on economic factors, so the LP part mostly handles finding price equilibrium

desert oar
#

i see. im mostly doing nlp classification nowadays, although my background was more in social science and statistics

merry ridge
#

But I don't exactly enjoy it. It is finicky and frustrating to work on

desert oar
#

hm yeah. honestly i only learned the simplex method in school

#

like i said its just not something ive ever needed

#

but i can probably think of times in the past where maybe it might have come in handy

#

e.g. i used to work in business travel, there were some problems i had at that job that i couldnt easily solve with standard "fit a predictive model" techniques

merry ridge
#

A lot of this is being handed a paper that showed great results at some conference I've never been to

#

and being told to implement it in some completely different context that doesn't always even make sense

#

To fish out some competitive advantage and it is pretty tiring having nothing ever work

iron rampart
#

Hey is this the right chat for machine learning based questions?

low glade
#

@iron rampart yes

iron rampart
#

Alright, so is it possible to create a machine learning script that can learn how to use a computer?

low glade
#

@iron rampart uhh....woah good question, I think so in terms of the operating system but how far would you want it to go, like opening notepad or...

iron rampart
#

Well doing tasks on its own

#

Lets start simple

#

So ive created a "bot" that can open spotify by moving the cursor to the right x and y coords. And then click

low glade
#

hmm....I believe that's possible but, im not sure how to go about that

iron rampart
#

But when i move the spotify icon it will be completely useless. So is there a way it can learn by itself where it is?

low glade
#

gonna have to research

iron rampart
#

Or should i start with a simpler idea

low glade
#

nah that sounds good, doing automation with spotify but I'm new myself so I'm not sure how I would go about it

iron rampart
#

Owhh

#

Cause i have no idea where to start

low glade
#

what library do you use? @iron rampart

iron rampart
#

Euhm tensorflow?

low glade
#

oh aight I'm learning that too

#

but I'm gonna have to learn more about it

iron rampart
#

Yeah me too

#

I just don't know where to start and what's impossible to make

desert oar
#

@merry ridge fair enough, at least you have people with domain expertise guiding you. in most of my work im completely doing it all from scratch and i have no clue whats going on

#

grass is greener i suppose 😛

merry ridge
#

Oh I certainly have no idea what I am doing

low glade
#

@merry ridge 💀

desert oar
#

@iron rampart "learn how to use a computer" is a big and ill-defined task. this is really more of an AI question than a ML question anyway

iron rampart
#

What's the difference between those?

#

I thought they were sort of the same

low glade
#

they are kinda tbh I believe machine learning has to do with models and coming up with algorithms to train them

#

but that's just my observation so far other than that it seems similar

desert oar
#

id say that machine learning is "lower level"

#

AI would be like carrying out sequences of tasks, and reacting to unexpected input

#

whereas ML is simpler clearly-defined tasks

#

e.g. something like "identify individuals with weapons in surveillance video" is machine learning

#

but "identify threatening individuals in surveillance video" is AI, because then the model needs to learn the general concept of "threatening"

low glade
#

@desert oar ahh I see, that makes better sense

desert oar
#

however im not an AI practitioner so i can't claim to speak for the industry

#

but that's how i separate the two in my mind

#

in common data science practice, ML usually means making one-off predictions in a live/production setting

#

or generally just making predictions without human input

#

or even more loosely, it's sometimes just used to refer to techniques for building models that aren't "traditional" statistics

#

or even just for building models without really being concerned with statistical inference

#

its weird because its used all the time but nobody seems to have a good definition for what ML really is

iron rampart
#

@desert oar Wow you seem pretty into machine learning... could you tell me where to start learning it?

earnest tundra
#

Can anyone suggest some good projects which can be done by an intermediate data science learner but like mainly about data cleaning and preprocessing??

crude karma
#

how much level of code should a data scientist know vs a programmer/coder

median relic
#

pretty much comparable to any programmer if you are dealing with Deep learning per se

#

otherwise, just a fundamental understanding of algorithms and statistics is sufficient

crude karma
#

im pre new and interested in this field.. im learning code right now and I finished a stats course last year in college.. is deep learning a masters level thing?

median relic
#

oh thats cool, no deep learning is not really a masters thing. It just happens to demand some prerequisites that are based in linear algebra, diff calc. And it has more to do with neural networks.

#

but otherwise like machine learning concepts it is pretty easy to learn

tidal bough
#

https://www.coursera.org/learn/machine-learning
I recommend this introductory ML course to everyone. It covers linear and logistic regression, basic unsupervised learning, some Support Vector Machine stuff and neural networks (including personally implementing backpropagation).

#

So essentially the basics of all fields, and with very little required knowledge - only basic linear algebra, which the course provides materials and a refresher for.

#

free, too.

crude karma
#

damn i haven't taken linear algebra in college

median relic
#

@tidal bough yes, that is a good place. If any one is interested in deep learning I would highly recommend http://www.deeplearningbook.org/ and the UCL deepmind lecture series on youtube

crude karma
#

i only took differential calculus and even then i got a bad mark rip

tidal bough
#

...how did you take differential calculus without linear algebra? All the stability theorems are about matrix eigenvalues and stuff.

median relic
#

@crude karma dont worry it is not that difficult, just put your mind to it. you can easily learn so many concepts quickly. Just don't think of it as some advanced concept

crude karma
#

uh

#

our courses are split

#

so like

#

differential calculus, integral calculus, then linear algebra

#

thats the progression

tidal bough
#

ah, I got it now

#

I thought you meant differential equations

#

differential calculus is indeed a lot earlier

crude karma
#

i really have to re take diff calculus

flat quest
#

i mean it depends on what you plan to do. @crude karma

you really won't need diff calculus to become an entry ML engineer or a data scientist.

Most of the linear algebra, differential calculus, statistics stuff only gets really important once you start going into research. Until then it's just familiarizing yourself with the techniques or models that have already been researched and proven.

crude karma
#

thinking of going into industry rather than academia

flat quest
#

well there's research in industry as well. Companies hire a number of ML researchers

desert oar
#

i kinda disagree

#

linalg and stats are essential problem solving tools in my work

#

and calculus is just a necessary prerequisite for understanding pretty much anything

#

do you need it all on day 1? no. will you need it to actually make it through and understand the material? yes. will you be worse off without it? yes.

#

take it from me, who tried to get by for a long time learning as little "theory" as possible

#

if you don't know the underlying math at least roughly, you can't really know what you're doing or why it works / doesn't work.

#

intuition is nice but not enough

#

especially once you're past the entry level and the problems in front of you no longer resemble exactly things that you saw in your coursework and textbooks

#

maybe if you're lucky enough that your work allows you to dump everything into keras and call it a day, then fine

#

but i dont know of many people whose jobs are actually like that

velvet thorn
#

^

#

this a thousand times

#

if you do not understand the concepts underlying the code you use (and I don't just mean the programming abstractions, but also the mathematics)

#

your work will be slow, inefficient and of low quality

#

and you will spend a ton of time not understanding the errors you get and the problems you face

#

I have trained data scientists and done freelance teaching

#

I cannot overstate the importance of a strong foundation

desert oar
#

(where "errors" includes "my model kinda works but it performs badly sometimes", not just "ValueError")

storm scroll
#

What's the best and easiest python package to implement plots on my website, including playable interface and 3D graphs

velvet thorn
#

(where "errors" includes "my model kinda works but it performs badly sometimes", not just "ValueError")
@desert oar edited to clarify

frank bone
#

why cant i do if (var = some_function): continuewithcode else: break

#

goal is to continue the code if a value can be assigned to var. If there's an error assigning a value to var, it should break the loop

velvet thorn
#

goal is to continue the code if a value can be assigned to var. If there's an error assigning a value to var, it should break the loop
@frank bone because that's not what if is for

#

but try-except

desert oar
#

@frank bone in python 3.8+ you can do

if (var := some_function()):
    ...
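
To make the difference between the two suggestions concrete, here is a sketch with hypothetical helper functions. `try`/`except` is what handles "break out of the loop if the assignment raises an error". The walrus operator (Python 3.8+) assigns and tests the value in one expression, but it only checks truthiness and does not catch exceptions.

```python
def parse_all(raw_values):
    """Convert strings to ints, stopping at the first one that fails."""
    parsed = []
    for raw in raw_values:
        try:
            value = int(raw)      # may raise ValueError
        except ValueError:
            break                 # assignment failed: leave the loop
        parsed.append(value)
    return parsed


def take_while_truthy(values):
    """Walrus version: stops on a falsy value, does NOT catch exceptions."""
    taken = []
    it = iter(values)
    while (value := next(it, None)):
        taken.append(value)
    return taken
```

For example, `parse_all(["1", "2", "x", "4"])` stops at `"x"`, while `take_while_truthy([3, 5, 0, 7])` stops at the `0`.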
frank bone
#

tried except, worked well 🙂 thanks!

desert oar
#

oh, i see

frank bone
#

@desert oar good to know!

#

ill try that too

#

thx

desert oar
#

it wont help with your specific question, i realized

#

but its a useful new feature otherwise

flat quest
#

Yeah I agree, but it depends on what exactly your work is, and how far you plan to go in terms of an ML/DS career @desert oar

By all means, actually understanding the underlying mathematics is vital to get far in ML. But an entry engineer doesn't really need to really know the underlying details.

For new people it might be worth just spending 4 - 5 months learning ML, and then getting an entry job for a while. And then come back and learn all those topics in depth

If you start getting into doing research, writing your own libraries, CUDA software, or just encountering problems that are very specific to a company, then those will be necessary.

merry ridge
#

In my opinion, at the absolute minimum even a very basic crash course in linear algebra provides a mountain of intuition that is helpful at all levels of skill. It is the least forgivable of the mathematical corners to cut.

flat quest
#

yeah prob that, stats, and gradients

bitter harbor
#

I've found stats to be a bit less useful when building nn's but it's still 100% important for ml

flat quest
#

its more so for DS
Actual neural nets it's not as important unless you're working with probabilistic models

bitter harbor
#

that's what I mean, it's more used in pre/post processing

merry ridge
#

So I'll be honest. I have no idea why stats is important.

#

Are you rolling regression techniques into the area of stats? Probability theory into stats? In my mind they are different things.

desert oar
#

stats and machine learning have the same problem

#

nowadays theyre mostly just references to different sets of problem solving approaches

#

the far ends of stats are very different from the far ends of machine learning, but its really hard to draw a line between them. to the extent that theyre even different things

#

so yeah, id say that fitting a probability model counts as stats

#

as do all the various forms of hypothesis testing

#

i can't say that NHST is that useful in industry nowadays, but the concept i think is very important to understand

merry ridge
#

Fair enough. At least in my experience, I feel like people cheat on the hypothesis testing part like crazy

desert oar
#

true. depends on what industry

#

basically any time youre inferring parameters of a probability model, i have a hard time resisting the temptation to call that stats

#

fun debate topic: is holt winters smoothing stats or machine learning?

merry ridge
#

Here, you learn linear regression etc. as part of your calculus curriculum, before Lagrange multipliers, and most of the probabilistic tools are taught in a proper probability theory course separate from stats, which is why I ask.

flat quest
#

I mean neural networks are just stats in a nutshell. We just made the process more incremental.

We're still finding a distribution of generated values that closely match the original dataset. You won't likely use direct stat techniques in a conventional neural net though, at least not directly.

For Bayesian/Probabilistic models, though, those are pretty much grounded on distributions.

And then stats also helps with data analysis. You need to know what you're working with before you do any feature engineering or data cleaning.

desert oar
#

I mean neural networks are just stats in a nutshell. We just made the process more incremental.
i dont know if this is true

bitter harbor
#

fun debate topic: is holt winters smoothing stats or machine learning?
ml

desert oar
#

"machine learning is making predictions without explicitly making statistical inferences, except where making such inferences is convenient for making better predictions"

#

its way past my bedtime

#

means and variances are probability theory... but estimating them from data is statistics

#

what other aphoristic platitudes can i come up with at 2:30 AM

flat quest
#

well ML was largely built on the basis of stats.

I guess its difficult to say that they're exactly the same, but they follow very similar principles.

The line is very blurred. I guess the question would be what is stats in the first place, cause statistics is a very large encompassing term.

bitter harbor
#

arguing that a computer making an inference on a dataset is stats just because it's simply an equation is dumb. even a basic neural net is a series of equations that allows those inferences to be made, so just because something has 'lower complexity' and can be done by hand doesn't mean it's not machine (or ig if you really want to be specific: mathematical) learning

#

I think the 'learning' part of it is misleading

#

but then again, mathematically(/statistically) induced inferences doesn't roll off the tongue

desert oar
#

in all seriousness, ML is fundamentally a problem domain, whereas statistics is a set of techniques for fitting and making inferences from probabilistic models. it just so happens that we now have a lot of methods for making predictions that arent inherently statistical, but still fulfill the task of ML (and happen to be useful in many other places), so we call them "ML techniques" and the whole thing becomes a terminological mess

#

statistics is one tool for approaching the task of ML

#

but you can minimize loss functions without explicitly appealing to statistics

#

whereas much of statistics does depend on minimizing loss functions

#

basically its all historical nonsense

merry ridge
#

From my perspective, ML is mainly Numerical Analysis (which is basically taylor series and secant lines) and linear algebra with comparatively little stats sprinkled on the shoulders of those two giants.

bitter harbor
#

see i'd argue ml is what happens when statistical models get combined with math

desert oar
#

i dont see much numerical analysis used in ML at all @merry ridge , i think your experience is somewhat unique (and very interesting)

merry ridge
#

What about something like gradient descent?

desert oar
#

sure

merry ridge
#

that is a core topic in every numerical analysis course

desert oar
#

then, granted

#

its funny that traditional statistics historically depended on 2nd order optimization methods

#

when it turned out that gradient descent was good enough all along. maybe its just that computers used to be so much slower so you needed the faster convergence of 2nd order methods?
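
#

A toy comparison of the two kinds of method, as a minimal sketch: the 1D function `f(x) = exp(x) - 2x`, the learning rate `0.1`, and the helper names are all made up for illustration. It only shows the iteration-count gap being discussed, not how either method behaves on real models.

```python
import math


def grad(x):
    return math.exp(x) - 2.0      # f(x) = exp(x) - 2x, minimised at x = ln 2


def hess(x):
    return math.exp(x)            # second derivative of f


def run(step, x=0.0, tol=1e-8, max_iter=10_000):
    """Iterate x <- x - step(x) until |f'(x)| < tol; return (x, iterations)."""
    for i in range(max_iter):
        if abs(grad(x)) < tol:
            return x, i
        x -= step(x)
    return x, max_iter


# First order: fixed learning rate times the gradient.
x_gd, n_gd = run(lambda x: 0.1 * grad(x))
# Second order: Newton's method divides the gradient by the curvature.
x_newton, n_newton = run(lambda x: grad(x) / hess(x))
```

Both runs end near `ln 2`, with Newton's method needing far fewer iterations on this function. The price is that every step needs the second derivative, which for a model with many parameters is a full Hessian matrix.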

flat quest
#

I mean I guess it depends on what you see ML models are doing. They fit to distributions on the data at the end of the day, whether they're good or not, using a number of statistical based techniques.

Well gradient descent on its own is not necessarily enough. We don't directly use the gradient anymore.

desert oar
#

a neural network is not explicitly fitting any distribution, at least in the general case

#

is there always an implied model for the conditional expectation and an error distribution around said expectation? yeah

#

does that mean you can actually know or make use of that information? maybe, maybe not

merry ridge
#

One of the problems with higher order methods is that they may be faster, but they may be less numerically stable

desert oar
#

doesnt computing hessians also get pretty gnarly

merry ridge
#

Truncating at x^2 is a bit of a trade off

desert oar
#

what were you saying about the secant lines?

merry ridge
#

I say secant lines because a lot of derivatives are just approximated by one
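
#

A minimal sketch of that idea, with hypothetical helper names and an arbitrary step size `h`: the derivative at a point is estimated by the slope of a secant line through two nearby points.

```python
import math


def secant_slope(f, x, h=1e-5):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def central_slope(f, x, h=1e-5):
    """Symmetric secant through x - h and x + h; usually more accurate."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.0
exact = math.cos(x0)  # d/dx sin(x) = cos(x)
forward_error = abs(secant_slope(math.sin, x0) - exact)
central_error = abs(central_slope(math.sin, x0) - exact)
```

The forward version has an error roughly proportional to `h`, the symmetric one roughly proportional to `h**2`, which is the kind of accuracy/cost trade-off that comes with truncating a Taylor series at different orders.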

desert oar
#

ah

flat quest
#

I mean the goal of a neural network in the majority of cases is to create values that are in the best case basically exactly matching to the real global dataset.

Unless I'm making a major error here, it seems to me that we're trying to approximate distributions

desert oar
#

i dont think thats the case. you can say that your predictions should roughly follow the distribution of the target in the training data (assuming the features have the same distribution as in the training data), but that isnt really the goal

flat quest
#

well roughly, because we don't have a true dataset encompassing all possible data points.

If we did we wouldn't need to worry about overfitting as much as we do now

desert oar
#

however im pretty sure you can safely say without loss of generality that a neural network in a regression problem typically makes predictions of the form f(x) = E(Y|X=x)
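
#

One way to see why, assuming the network is trained with squared-error loss (an assumption; other losses target other quantities, e.g. absolute error targets the conditional median):

```latex
\begin{aligned}
f^\ast &= \arg\min_f \; \mathbb{E}\big[(Y - f(X))^2\big], \\
\mathbb{E}\big[(Y - f(X))^2 \mid X = x\big]
  &= \operatorname{Var}(Y \mid X = x) + \big(\mathbb{E}[Y \mid X = x] - f(x)\big)^2, \\
\Rightarrow\quad f^\ast(x) &= \mathbb{E}[Y \mid X = x].
\end{aligned}
```

The variance term does not depend on `f`, so the loss is smallest when the squared term is zero at every `x`.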

flat quest
#

right

#

also on the topic of gradients, it is worthwhile wondering if the gradient is really the most useful thing. I mean the brain is using some form of learning, and its pretty good. I don't know if its using gradient descent and backprop - there was a recent paper on this using inverse functions - or something else

Tho current research on how synapses form between neurons and strengthen seems to suggest it's based on repeated firing.

desert oar
#

i mean, gradient descent is just the optimization algorithm that happens to work on networks

#

and it also lends itself to this really elegant formulation in terms of computation graphs and layers

#

im sure hexicle can attest to all kinds of other specialized ways to find the parameters of a function that minimize average loss on a dataset

flat quest
#

true, but its largely suited for supervised learning

merry ridge
#

I mean, I know of them, but they are mostly a gigantic pain in the ass to use

desert oar
#

if someone comes up with a better technique, then i dont think anyone is going to reject it out of hand, but youre going to have to make a case for why it beats the elegance and convenience of stochastic or batch gradient descent

flat quest
#

oh yeah, there's been attempts at more biologically plausible neurons before

They haven't seemed to work as of yet though. No reason to use it unless it's actually performing above our current SOTA

desert oar
#

right

#

hence all of these weird and interesting machine learning techniques have been relegated to history. goodbye to the radial basis functions and kernels and coordinate descent and support vectors

#

and this is why i dont know anything about anything

#

because i just dont need to anymore

merry ridge
#

A lot of the fancier techniques get increasingly complicated to the point of absurdity

#

One of my former Numerical Analysis professors would tell us stories of specialized rooms with paper that hung from the top of the walls to the floor and they would write high order methods from step ladders to figure out all the coefficients needed in a formula.

desert oar
#

heh

#

thats wild

flat quest
#

i mean everything's getting more complicated in general, until someone goes and abstracts portions of it

#

^^

#

i wonder how long the actual equations were...

merry ridge
#

It's been too long to remember the exact topic at the time

#

I'd have to ask them, but it is nearly impossible during the pandemic

flat quest
#

yeah for sure

wait are u a grad student?

merry ridge
#

No

#

I keep an office at my alma mater because I am an editor for a discrete mathematics journal and I didn't want to "take my work home". Also it is nice having a space at the university even if I show up once every 4 months.

flat quest
#

ah gotcha gotcha

bitter harbor
#

salt I want your opinion on something ;)

desert oar
#

heh sure

bitter harbor
#

I've started exploring linux based systems and found a laptop called the kubuntu focus

#

which claims:

#

is that a thing?

desert oar
#

i have no idea

bitter harbor
#

like it's a i7-9750H 6c/12t 4.5GHz, 2060-80 rtx laptop for like 2.5k

#

and I get branding and all that

#

but I can't tell if that's bs or not

#

would you happen to know what an ai score is?

lapis sequoia
#

I read 'up to' and 'often'

#

and I'm pretty sure that's marketing bs

velvet thorn
#

that sounds super dodgy.

bitter harbor
odd yoke
#

do not buy a laptop for deep learning

#

like, ever ever ever

#

it will have a combination of shitty gpu, shitty ventilation, and being overpriced

#

as well as generally not being upgradable

#

do yourself a favor and get a better desktop for half the price

willow kernel
#

Hello - I would like to get some help condensing a large dataset. Was wondering if anyone has experience with this?

lost patio
#

Hi @willow kernel, continuing from where we left off, let's write a little script to break up your dataset into separate files.

willow kernel
#

Sounds good!

lost patio
#

You have a csv file yes?

willow kernel
#

Yes

lost patio
#

Actually we should probably move to a help channel so we don't flood this one. Head over to #help-broccoli

frank bone
#

any idea how you would go about converting "requests.models.Response" type of data (coming from an API call) into normal python type of format...dict, list, pandas?

#

i got it, converting it into json then going from there
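
#

A minimal sketch of that route. The payload and its `"results"` key are made up, and a plain string stands in for `response.text` so the snippet runs without a network call; with a real `requests` response, `response.json()` does the parsing step.

```python
import json

# Stand-in for `response.text` from an API call; the payload is invented.
body = '{"results": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'

data = json.loads(body)        # with requests: data = response.json()
records = data["results"]      # a plain list of dicts
names = [r["name"] for r in records]

# If pandas is installed, a list of dicts converts directly:
#   pd.DataFrame(records)
```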

arctic wedgeBOT
#

Hey @vague portal!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

vague portal
#

Hmm how can I share an excel file with you guys? I need some help with analysing this dataframe 😮

charred agate
#

Hello all, I'm currently a comp sci student in college actively looking for motivated people to work on AI based projects with. TensorFlow would be the main API. I am always looking for like-minded people to collaborate with. Pardon me if this is an inappropriate place for a message like this. Pointers to other places where I could look for people would be great. Networking during a pandemic is hard and this was the first place that came to mind. Thanks!

vague portal
#

Basically, I have a large dataframe (200,000 x 10) that I want to analyse. I want to be able to group the data by one of the columns, and then make sub-groups within these groups. I think I could do this manually, but it will take me forever, is there a quicker way to do this than using for loops?
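
#

A minimal sketch of grouping with sub-groups and no explicit loops, assuming pandas is available. The column names `region`, `site` and `value` are invented stand-ins for the real columns.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "site": ["a", "b", "a", "a", "b"],
    "value": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Group by one column, then by a second column within each group.
summary = df.groupby(["region", "site"])["value"].agg(["count", "mean"])

# If the per-sub-group frames themselves are needed, the groupby is iterable.
sizes = {key: len(sub) for key, sub in df.groupby(["region", "site"])}
```

Passing a list of columns to `groupby` builds the nested groups in one pass, which tends to scale to a few hundred thousand rows without trouble.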

#

I'm on the water data science project and it would be helpful to have someone with machine learning experience to help us out

atomic forge
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

charred agate
#

Hey thank you, I really appreciate it. I will check it out later today, it's 5am for me. Thank you again @vague portal ! Also @atomic forge , thanks for the resources.

atomic forge
#

np

random perch
#

I'm planning on buying and putting together a PC build with the new 3080 when it comes out. Do you guys know if it's possible to train small models on GPUs that aren't like a Titan? I want to work on building a deep learning model to play chess.

lapis sequoia
#

has anyone used eta-squared in here (ideally in python)? have a question about it

little shard
#

Hi! I've been trying to screenshot data from a program so I could then convert it to text, because I want to automate the whole process. But for some reason, four separate screenshots are taken at the same time although I set time.sleep() in multiple places. When I do the same thing without the program running (just the desktop visible), the screenshots are taken separately. How can I delay screenshots while inside the program?

#

this is the part of the code:

```py
import os
import time
from datetime import datetime
from datetime import date
import subprocess

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# Not shown in this excerpt: `pag` and `scr` are imported elsewhere
# (presumably pyautogui and a screenshot module such as pyscreenshot).


def ocr_core(filename):
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    text = pytesseract.image_to_string(Image.open(filename))
    return text


# Raw string so the backslashes in the Windows path are taken literally.
os.startfile(r'C:\Program Files\Stellarium\stellarium.exe')

time.sleep(8)

pag.hotkey('f3')
pag.typewrite('Delta Cep')
pag.hotkey('enter')

time.sleep(4)
now_cep = datetime.now()
vrijeme_cep = now_cep.strftime('%m-%d-%Y %H-%M-%S')  # vrijeme = time
folder = 'images/'
filename = ' Delta Cep.png'
output_cep = folder + vrijeme_cep + filename
time.sleep(2)
im = scr.grab(bbox=(0, 0, 1919, 1079))
im.save(output_cep)
```

flat quest
#

hey @charred agate

Yeah looking for motivated AI/ML people as well, maybe we can work on a project together 🙂

proud iron
#

Guys, what is one of the fastest ways for a newcomer in AI to start writing machine learning code? 🙂

iron rampart
#

Where should I start with machine learning?

tidal bough
#

@proud iron doing a coursera course, probably.

#

It has little to no needed background knowledge, yet covers most of the fields.

proud iron
#

@tidal bough do you currently remember a resource that requires background Python knowledge or? 🙂

tidal bough
#

There probably are introductory ML courses in Python, but I don't know them.

proud iron
#

Cheers @tidal bough . 🙂

molten hamlet
#

yo, so I got 2 kinds of information, RGB map and height map,
how would I detect water groups?

#

lets say I want to count water areas

tidal bough
#

well, here it looks like water is just all the blue, lol

molten hamlet
#

xD

#

genius

#

its kinda ml question

#

how to find and count objects

#

xd

#

orrr

#

I will simply create mask and adios

tidal bough
#

you can just find contiguous areas of a color

desert cradle
#

flood fill
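
#

A minimal sketch of counting regions with a flood fill, assuming the map has already been turned into a boolean mask (`True` = water). The 4x4 grid is invented, and the sketch treats cells as connected only through their four edge neighbours.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of True cells in a 2D list of booleans."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Unvisited water cell: a new region. Flood outwards from it.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


water = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
mask = [[bool(v) for v in row] for row in water]
n_regions = count_regions(mask)
```

For real images, `scipy.ndimage.label` does the same labelling on a NumPy mask and is much faster than a pure-Python loop.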

haughty turtle
#

hey

#

i was wondering how i could (with pandas or numpy if needed) detect categorical data ?

tidal bough
#

WDYM by detecting categorical data?

haughty turtle
#

get every categorical columns inside a new dataset

tidal bough
#

You could check how many unique values there are in that column, and if it's, say, below 100, assume it's categorical.
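
#

A minimal sketch of that heuristic, assuming pandas. The function name, the threshold and the toy columns are invented, and the result is only a guess about which columns are categorical.

```python
import pandas as pd


def guess_categorical(df, max_unique=100):
    """Return columns whose number of distinct values is at most max_unique."""
    counts = df.nunique()
    return counts[counts <= max_unique].index.tolist()


df = pd.DataFrame({
    "pclass": [1, 3, 3, 2, 1, 3],
    "speed_kmh": [140.274, 98.1, 120.5, 87.3, 101.9, 133.0],
    "sex": ["m", "f", "f", "m", "f", "m"],
})
cols = guess_categorical(df, max_unique=3)
```

On a small table the threshold has to be small as well, otherwise every column qualifies. `df.select_dtypes(include=["object", "category"])` is a complementary check based on dtype instead of cardinality.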

haughty turtle
#

it wont work everytime

#

cause if we have the speed of a car it can be unique every time (like: 140.274km/h)

tidal bough
#

If your dataset has only like a dozen different values for a column, you can consider that column categorical even if the dataset's creators considered it continuous 😛

haughty turtle
#

the passenger class is a categorical column

#

however all the values are not unique

tidal bough
#

There are only 2 unique values, so by my definition it'd be a categorical column.

#

Same with price, here.

#

dataset with 3 rows isn't much of a one, though

haughty turtle
#

just an example

#

i need to deal with one with more than 100 columns

desert oar
#

@haughty turtle if the column contains only integers thats one potential sign

#

moreso if they're all consecutive integers

#

whereas very large nonconsecutive nonnegative integers all of the same digit length could be id numbers

#

look for "type" or "category" or "class" in the column name

#

this seems like a very weird problem to have

haughty turtle
#

its not haha

desert oar
#

what is this data? where is it retrieved from? how is it stored?

arctic wedgeBOT
#

Hey @haughty turtle!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

desert oar
#

its pretty unusual to have to perform automated data processing on 100s of tables with unknown schemata...

haughty turtle
#

i want to fillna

desert oar
#

or rather, tables with 100s of columns with unknown schemata

haughty turtle
#

i know the schemata

desert oar
#

then use it

#

look at it

haughty turtle
#

i can

#

but don't want to

#

cause i want to fill na in one line

desert oar
#

"i got the instruction manual but its long and i dont want to read it, how can i detect the right screws to tighten on my car?"

haughty turtle
#

that's why im doing a lib

#

i have plenty of other project to do and i want to create a lib to go faster

#

thats what code is supposed to do

desert oar
#

by the time you figure out an algorithm for this you could have just typed out a list of categorical/numeric indicators by hand

#

if you want to do it for fun, then go for it. but this doesnt sound like an optimal use of your time

haughty turtle
#

ill try, and if it shows results, publish it .... maybe it will help others to save time

iron rampart
#

@tidal bough The course is in Python lang right?

tidal bough
#

if you mean the coursera ML one - no, it uses Octave.

#

The programming assignments there all require manually writing code rather than using existing libraries anyway, so it wouldn't be much different if it were in Python.

#

Octave has builtin linear algebra support, so it's kinda like numpy 🙂

crisp jewel
#
```py
from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
```
#

can anyone help me with this?

#

The function checks, whether one row of a DataFrame should belong to the test-set or the train-set.

#

I don't know how it works

spiral yew
#

For deep learning, what OS do you guys recommend and most people use? I've heard that Windows is pretty bulky and Linux is the best. I cant use Linux for school since I need some software, so any recommendations? (I have a macbook pro as well)

crisp jewel
#

use your macbook pro, that's what I would go for; besides, linux has most of the software that you need in school

spiral yew
#

Is windows not good? I've seen some ml people use windows @crisp jewel

#

the specs of my macbook pro are pretty bad compared to my pc's (its a desktop pc not a laptop btw which i built this summer)

crisp jewel
#

use the macbook

#

leave the desktop as windows

#

macbook for working

#

the other one sell it lmao

#

dont sell the monitor tho

vivid zenith
#

k

spiral yew
#

macbooks are pretty bad tbh, im not doing anything and the fan runs really fast

#

and for deep learning a macbook isnt the best idea because it doesnt even have a gpu

uncut shadow
#

They have gpus tho

raven mulch
#

In this video we look at a paper which proposes with theoretical and empirical evidence to use tempered sigmoids instead of ReLU (or in general exploding activation functions) to improve on differentially private stochastic gradient descent (DP-SGD). I would love to spark discussions here or on the youtube comment section about this paper!

Video: https://www.youtube.com/watch?v=g2acvGl99-k

Paper: https://arxiv.org/abs/2007.14191

Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis.

bitter harbor
#

“the best tempered sigmoid achieves 98.1% test accuracy whereas the baseline ReLU model trained to provide identical privacy guarantees (ε = 2.93) achieved 96.6% accuracy.”

#

I’d like to see some proof of that

#

Also what does the ‘heat’ in figure 2 represent?

flat quest
#

i see we have another Yannic Kilcher @raven mulch

raven mulch
#

@bitter harbor it’s the testing accuracy

#

Hahaha @flat quest he’s great

velvet thorn
#
```py
from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
```

@crisp jewel this is WILD

#

why would anyone do that

#

this is like a great example of "trying to be smart" IMO

fathom raptor
#

where these are all columns in the csv file

#

that i'm reading from

#

sorry if this is too vague im just starting with data science 😬

velvet thorn
#

ah, this is the infamous Titanic dataset

fathom raptor
#

indeed

velvet thorn
#

are you asking

#

what each column means?

fathom raptor
#

no, i get that

#

i'm just asking how this data.corr() statement works

velvet thorn
#

incidentally, I would suggest data.corr().abs()['survived'].sort_values() instead

#

hm

#

are you asking about the meaning of the correlation coefficient?

fathom raptor
#

lol no okay lemme try to word this better

solar bluff
#

🛳️

fathom raptor
#

okay first of all why are there [[]] around the "survived"

velvet thorn
#

okay

#

that's a pandas question

#

so you know data is a DataFrame, right?

fathom raptor
#

is there a channel for that lol

#

yeah i get that

velvet thorn
#

which conceptually represents 2D data

fathom raptor
#

mhm

velvet thorn
#

okay, so you know what a Series is?

fathom raptor
#

uhh

#

no :)

velvet thorn
#

a Series represents 1D data

#

either a row or a column

#

so, for example, if you do data['survived']

#

you get the column representing whether each person survived or not

#

because a column is 1D, that's a Series

fathom raptor
#

ohh

velvet thorn
#

so you can think of a DataFrame as a collection of Series

fathom raptor
#

like a vector?

velvet thorn
#

yes

#

now, we used square brackets above

#

to access a single column of data

#

but what if we want to take multiple columns?

#

then we would pass a list

#

say we wanted the sex and age columns

#

data[['sex', 'age']]

#

which you can break down as:

```py
columns = ['sex', 'age']
data[columns]
```
#

make sense?

fathom raptor
#

yes

#

but in [['survived']] we only have one element in the list?

velvet thorn
#

yes

#

so now

#

that's the difference between 2D data with one unit dimension and 1D data

#

in other words

#

if you did ['survived']

#

you would get a Series

#

but with [['survived']] you have a DataFrame with one column.

#

and the two are different things

fathom raptor
#

ohh so data[['sex', 'age']] returns a dataframe okayy

velvet thorn
#

just like in normal Python, [[1, 2, 3]] and [1, 2, 3] are different things

#

yup
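
#

The distinction can be checked directly. A minimal sketch with an invented four-row frame that reuses the column names from the discussion:

```python
import pandas as pd

data = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "sex": ["male", "female", "female", "male"],
    "age": [22, 38, 26, 35],
})

one_col_series = data["survived"]       # 1D: a Series
one_col_frame = data[["survived"]]      # 2D: a DataFrame with one column
two_cols = data[["sex", "age"]]         # 2D: a DataFrame with two columns

shapes = (one_col_series.shape, one_col_frame.shape, two_cols.shape)
```

The shapes make the difference visible: `(4,)` for the Series against `(4, 1)` for the one-column DataFrame.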

fathom raptor
#

so data.corr() .abs() without the [['survived']] would give a very large dataframe with every pairwise correlation coefficient i assume

#

lemme try it out

velvet thorn
#

yes
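
#

A minimal sketch of the full table and of the ranked single column suggested earlier. The six rows are invented, and all columns are numeric so that `corr()` needs no extra arguments.

```python
import pandas as pd

data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass":   [3, 1, 1, 3, 2, 3],
    "fare":     [7.3, 71.3, 53.1, 8.1, 13.0, 7.9],
})

corr = data.corr()                              # square table: every column against every column
ranked = corr.abs()["survived"].sort_values()   # one column of it, weakest to strongest
```

Taking `.abs()` first means strongly negative correlations rank as high as strongly positive ones. The last entry is always `survived` itself, with a correlation of 1.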

fathom raptor
#

ooh this is cool

#

thanks for the help :))

velvet thorn
#

yw

versed violet
#

Hello ! I have a csv which represents the temperature data for the 4 seasons, I want to add a precise number for each 90 iterations and I am a little stuck doing it with pandas

velvet thorn
#

Hello ! I have a csv which represents the temperature data for the 4 seasons, I want to add a precise number for each 90 iterations and I am a little stuck doing it with pandas
@versed violet what do you mean precise number for each 90 iterations

#

like first 90 rows one number, next 90 rows one number, etc.?

versed violet
#

I want to count 94 rows for example and add a number to each of these 94 rows

hasty grail
#

From your image, do you mean

  • Add 3 to each of the first 79 rows
  • Add 2 to each of the next 93 rows
  • Add 5 to each of the 94 rows after that
  • ...
    ?
versed violet
#

Yes !

hasty grail
versed violet
#

Yes 1mn just to read how to claim the help channel thanks !

fathom raptor
#

quick question, how come if i replace '?' with numpy.nan i can still use .dropna() on the dataframe? does python have a built in nan data type? my intuition tells me that numpy.nan is different but idk

velvet thorn
#

quick question, how come if i replace '?' with numpy.nan i can still use .dropna() on the dataframe? does python have a built in nan data type? my intuition tells me that numpy.nan is different but idk
@fathom raptor yes

versed violet
#

@hasty grail it tells me I'm in a "Cool Down" except i've never opened a help channel

fathom raptor
#

yes to python having a builtin nan?

velvet thorn
#

to both

#

for the latter, float('nan')
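
#

A minimal standard-library sketch of that value. `numpy.nan` is an ordinary Python float holding the same special value, which is why pandas treats both the same way.

```python
import math

nan = float("nan")            # built-in Python float, no imports needed

is_nan = math.isnan(nan)      # the reliable way to test for nan
equals_itself = (nan == nan)  # nan compares unequal to everything, itself included
```

The second comparison is `False`, so checks like `x == float("nan")` never match. Use `math.isnan`, or `isna()` in pandas, instead.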

hasty grail
#

hmm not sure what that means, maybe one of the helpers/mods can elaborate?

versed violet
hasty grail
#

yeah that is strange...

#

But to answer your question, a simple way would be just to read the entire file and put the contents into a list

#

Edit the list

#

then overwrite the file with the contents of the new list

versed violet
#

Oh i see, and to write a loop to add the numbers right ?

hasty grail
#

yes

versed violet
#

That's where i have a problem, I can't really see how i can write the loop, like do I do it with a count or with a len ?

hasty grail
#

you can use enumerate()
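
#

A minimal sketch of such a loop, with a hypothetical helper name and short toy segments in place of the real 79/93/94 row counts. It assumes the values have already been read from the file into a list.

```python
def add_by_segment(values, segments):
    """Add a different offset to consecutive runs of rows.

    `segments` is a list of (row_count, offset) pairs, applied in order.
    Rows beyond the last segment are left unchanged.
    """
    # Expand the segments into one offset per row.
    offsets = []
    for row_count, offset in segments:
        offsets.extend([offset] * row_count)

    result = list(values)
    for i, value in enumerate(values):
        if i < len(offsets):
            result[i] = value + offsets[i]
    return result


temps = [10.0] * 8
shifted = add_by_segment(temps, [(3, 3), (2, 2), (2, 5)])
```

For the counts mentioned above the call would look like `add_by_segment(temps, [(79, 3), (93, 2), (94, 5)])`.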

fathom raptor
#

wait idk if this is a datascience question or just a noob programming question, but how come both of these syntaxes work?

hasty grail
#

!eval

```py
for i, v in enumerate(['a', 'b', 'c']):
    print(i, v)
```
arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

hasty grail
#

This is why being in a help channel would be helpful...

#

you can take a look at the output in #bot-commands

#

@fathom raptor It's analogous to calling class instance methods. You can call it by <obj>.<func> or <func>(<obj>)
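
#

The two syntaxes from the question are not preserved here, so this is only an illustration of the analogy, using a built-in string method:

```python
text = "abc"

via_instance = text.upper()      # <obj>.<func>()
via_class = str.upper(text)      # <func>(<obj>): the same method, looked up on the class
```

Both calls run the same code. The first form passes `text` as the first argument implicitly.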

versed violet
#

This is why being in a help channel would be helpful...
@hasty grail I see yeah, thanks a lot I will try with what you gave me until now and wait for this "cooldown" to finish, thanks a lot for the help !

velvet thorn
#

wait idk if this is a datascience question or just a noob programming question, but how come both of these syntaxes work?
@fathom raptor the latter is more idiomatic

graceful glacier
#

can anyone whos familiar with pandas tell me the difference between .nunique and .value_counts?

velvet thorn
#

can anyone whos familiar with pandas tell me the difference between .nunique and .value_counts?
@graceful glacier have you tried calling both of them on the same data

#

the difference should be quite apparent

graceful glacier
#

ok i got it after looking it up. just starting out with pandas so some of the concepts blur together for me
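
#

A minimal sketch of the difference on an invented Series:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"])

n_distinct = s.nunique()        # a single number: how many distinct values there are
counts = s.value_counts()       # a Series: how often each distinct value occurs
```

Here `n_distinct` is `3`, while `counts` maps `a` to 3, `b` to 2 and `c` to 1, so `len(s.value_counts())` equals `s.nunique()` when there are no missing values.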

junior quest
#

is it even possible to save a matplotlib animation as a html file?

crude karma
#

is the variable "axis" a built in variable in numpy???

bitter harbor
#

idk about axis, but row and column are

#

like:

```py
for row in array:
    ...
```
#

but axis is a argument for most np functions

#

ig it is

crude karma
#

but how does python recognize axis

#

if axis = 0

flat quest
#

probably through some properties but I'm not completely sure

bleak fox
#

but how does python recognize axis
@crude karma 0 as row and 1 as column

hasty grail
#

is the variable "axis" a built in variable in numpy???
I don't understand what you mean by "built in"

solid aurora
#

ok i'm super sleep deprived but is there a clean, elegant way of iterating through "square sections" in a numpy 2d array?

#

i.e. if n=2 I would want to look at the four "quarters" of the array:

```py
>>> squares(np.reshape(np.arange(16), (4, 4)), n=2)
np.array([
  [[ 0, 1],
   [ 4, 5]],
  [[ 2, 3],
   [ 6, 7]],
  [[ 8, 9],
   [12,13]],
  [[10,11],
   [14,15]]
])
```
#

there doesn't happen to be a built-in numpy way of doing this, right?

#

I just have to use slices?

hasty grail
#

I think there is a function for that

#

let me see..

#

ok no, apparently this is one of the things Tensorflow has but NumPy doesn't -_-

crude karma
#

like

#

how does axis know its 0 for row and 1 for column

#

you can name anything other than axis and have it assign 0 for row and 1 for column right?

velvet thorn
#

how does axis know its 0 for row and 1 for column
@crude karma convention

#

by default, axis 0 is rows

crude karma
#

convention?

velvet thorn
#

yes, convention

#

is something about that unclear?

solid aurora
#

@crude karma axis 0 is the outermost axis

#

i.e. you enter 0 lists before hitting the 0th axis

#

axis 1 is the second-most outermost axis

#

you must enter one list before you hit the 1st axis list
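
#

A minimal sketch of what the number means in practice. `axis` is an ordinary keyword argument of NumPy functions, not a special variable, and a name like `row` in a `for` loop is likewise just a variable name. The helper `total_along` is invented to show that the name carries no meaning of its own.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3): 2 rows, 3 columns

col_sums = a.sum(axis=0)         # collapse axis 0 (step down the rows): one value per column
row_sums = a.sum(axis=1)         # collapse axis 1 (step across the columns): one value per row


def total_along(arr, direction):
    """The parameter could have any name; NumPy only sees the integer."""
    return np.sum(arr, direction)
```

Axis `k` is simply position `k` in `a.shape`, which is why axis 0 has length 2 here and axis 1 has length 3.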

velvet thorn
#

only under C-order

solid aurora
#

true

velvet thorn
#

but well

#

I've never seen F-order being used

solid aurora
#

actually, how does numpy store arrays internally?

#

C-order?

velvet thorn
#

no

#

that's what the

#

uh

#

order argument is for

solid aurora
#

ah

#

what's the default?

velvet thorn
#

it physically changes the memory layout

#

'C'

crude karma
#

ah

velvet thorn
#

which is why, if you change the order, you notice that the speed of iteration across specific axes changes

#

since memory contiguity is actually affected
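
#

A minimal sketch that makes the layout change visible through strides, assuming the default `float64` dtype (8 bytes per element):

```python
import numpy as np

c = np.zeros((3, 4), order="C")      # row-major: the default
f = np.asfortranarray(c)             # same values, column-major layout

c_strides = c.strides                # bytes to step along (axis 0, axis 1)
f_strides = f.strides
same_values = np.array_equal(c, f)
```

In C order, neighbouring elements of a row sit next to each other in memory (strides `(32, 8)`). In F order, neighbouring elements of a column do (strides `(8, 24)`). The axis numbers and indexing stay the same either way; only the memory walk behind them changes.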

solid aurora
#

mmhm

#

purely for research purposes, somebody could probably make a wrapper that lets you specify a custom axis order to persist on disk

#

it would probably just permute the axes it passes to numpy

#

and use numpy's C-order internally

#

anyway @hasty grail why is that implemented in tensorflow? lol

#

seems like something they should have contributed to numpy

hasty grail
#

It can probably be implemented via a combination of NumPy ops
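
#

One possible way to do it with plain NumPy ops is a reshape plus an axis swap. This is only a sketch: `squares` is the hypothetical function name from the question, and it assumes both dimensions divide evenly by `n`:

```python
import numpy as np

def squares(arr, n):
    """Split a 2D array into an n-by-n grid of equally sized blocks."""
    rows, cols = arr.shape
    if rows % n or cols % n:
        raise ValueError("both dimensions must be divisible by n")
    bh, bw = rows // n, cols // n
    # (block_row, row_in_block, block_col, col_in_block)
    # -> (block_row, block_col, row_in_block, col_in_block)
    return (arr.reshape(n, bh, n, bw)
               .swapaxes(1, 2)
               .reshape(n * n, bh, bw))

blocks = squares(np.arange(16).reshape(4, 4), n=2)
print(blocks)
```

Iterating over `blocks` then visits the quarters left to right, top to bottom.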

#

Take a look at this

lapis sequoia
#

UnimplementedError: 2 root error(s) found.
(0) Unimplemented: {{function_node __inference_train_function_16567}} File system scheme '[local]' not implemented (file: '../input/birdsong-resampled-train-audio-03/redcro/XC143214.wav')
[[{{node ReadFile}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional_5]]
(1) Unimplemented: {{function_node __inference_train_function_16567}} File system scheme '[local]' not implemented (file: '../input/birdsong-resampled-train-audio-03/purfin/XC171695.wav')
[[{{node ReadFile}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional_3]]
0 successful operations.
7 derived errors ignored.

#

whats this error?

hasty grail
#

looks like you're running a TF model with distributed training?

#

or using a tf.data.Dataset object

#

in any case you need to provide more info than that for us to help, such as your actual code

lapis sequoia
#

I'm using a tf.data.Dataset object

#

wait I'll make a pastebin link

arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

:p

hasty grail
#

does pastebin not work?

jovial lotus
#

hey all, I need help figuring out where to go in my project. Basically I have gathered a TON of data of "if x then y's" and have a bunch of probabilities regarding such. I need help figuring out how to pick the next candidates given a set of x values.

lapis sequoia
#

hey all, I need help figuring out where to go in my project. Basically I have gathered a TON of data of "if x then y's" and have a bunch of probabilities regarding such. I need help figuring out how to pick the next candidates given a set of x values.
linear regression?

jovial lotus
#

I think so, thats what I found when I googled it. Guess I'll have to look back into my AI homework 😫

lapis sequoia
#

I'm assuming y's are continuous labels and x are features and you want to predict a y for x?

jovial lotus
#

@lapis sequoia do you have any experience with it? Could I dm you?

hasty grail
#

Oh, you're using TPU strategy to train the model

lapis sequoia
#

yesss is that causing this

#

I didnt think it is

#

@jovial lotus sure

hasty grail
#

is your model_dir a GCS bucket?

#

It has to be a GCS bucket from what I have read

lapis sequoia
#

Nope it's on kaggle

#

working directory

hasty grail
#

don't use TPU training then

lapis sequoia
#

Hmm I've used TPU on kaggle before in the same way. Let me remove it and try

lapis sequoia
#

@hasty grail OOM error with GPU xD

hasty grail
#

umm how many params does the model have?

lapis sequoia
#

350,000

hasty grail
#

what does nvidia-smi return?

lapis sequoia
#

worked

#

i reduced params

hasty grail
#

what worked?

#

your GPU must be tiny

#

350k params is not that much in terms of GPU usage unless you got a bunch of attention layers...

lapis sequoia
#

also removed Residual connections

#

not what i wanted but welp something to build on

hasty grail
#

wait I see you are doing some signal transformations

#

so that's probably what ate up so much GPU memory

lapis sequoia
#

Kaggle GPU tho. Yes, I'm using a mel spectrogram on audio

#

but it's gonna take 50 hours to train ffs

upbeat ore
#

I would like to hear your thoughts on this. First perform DETR object detection; if the detected object is the desired one, e.g. a person, then perform face detection from facenet on the cropped object. Would that be efficient in terms of speed?

#

DETR would also be really nice for finding other objects, such as weapons, which I need to detect in the same image as well

hidden halo
#

I have a Pandas related question in #help-popcorn. Could someone please take a look.

desert oar
#

@hidden halo did you get an answer?

hidden halo
#

Not entirely

#

One sec, I'll post it here

#

I have dataframe in Pandas which looks like this:

+--------+------------+---------+-------+
|class   | student_id | subject | score |
+--------+------------+---------+-------+
|4       |           5| Maths   |     56|
|4       |           5| English |     65|
|4       |           6| Maths   |     73|
|4       |           6| English |     78|
+--------+------------+---------+-------+

I want to convert the subject column to column headers, while retaining all other columns as is, like this:

+--------+------------+---------+-------+
|class   | student_id | English | Maths |
+--------+------------+---------+-------+
|4       |           5|       65|     56|
|4       |           6|       78|     73|
+--------+------------+---------+-------+
#

I got this. But I can't seem to be able to flatten it. Any ideas
df.set_index(['class', 'id', 'sub']).unstack('sub').reset_index()

desert oar
#

oh, yeah thats a thing

#

i think if you do rename with a callable, it receives a tuple

#

let me check

desert oar
#

@hidden halo
```python
data = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()

flat_colnames = ['_'.join(filter(None, ctup)) for ctup in data.columns.to_flat_index()]
data.columns = flat_colnames

print(data)
```

#

.to_flat_index() seems to be the trick you were missing
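
#

To see what that step does in isolation, here is a minimal sketch. It assumes a hand-built MultiIndex that mimics the columns you get after pivoting and resetting the index (the `cols` and `names` variables are illustrative only):

```python
import pandas as pd

# Columns as they look after pivot(...).reset_index():
# index columns get an empty second level
cols = pd.MultiIndex.from_tuples([
    ("class", ""),
    ("student_id", ""),
    ("score", "English"),
    ("score", "Maths"),
])

flat = cols.to_flat_index()      # a plain Index whose items are tuples
print(list(flat))

# filter(None, ...) drops the empty strings before joining
names = ["_".join(filter(None, tup)) for tup in flat]
print(names)
```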

paper niche
#

maybe I'm missing something, but couldn't you just have selected 'score' before resetting the index? as in df.set_index(['class', 'id', 'sub']).unstack('sub')['score'].reset_index()

#

also, pretty interesting point about .to_flat_index(), didn't know that was a thing

desert oar
#

yeah that would work too @paper niche

#

seems more "opaque" though. if i wrote that code i would want to leave a comment like # selecting the column avoids creating a multi-index

paper niche
#

sure, I agree. I like the pivot table solution better, I think the intention is clearer than setting index and unstacking

desert oar
#

i do wish pivot wouldn't mess w/ the index though

#

e.g. if you already have a meaningful index you then have to keep track of the indexes to reset

#

it gets messy

hidden halo
#

data = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()
This did not work for me, it threw an error saying Length mismatch: Expected 4 rows, received array of length 2
However, I tried the second part with my unstack method and that did the trick. Thanks a lot.
Now I'll go try to unpack what happened in that line.

desert oar
#

@hidden halo this was my whole script, maybe your real data is different

import io
from operator import methodcaller

import pandas as pd


data_txt = '''
class   | student_id | subject | score
4       |           5| Maths   |     56
4       |           5| English |     65
4       |           6| Maths   |     73
4       |           6| English |     78
'''

data = pd.read_csv(io.StringIO(data_txt), sep='|') \
    .rename(columns=methodcaller('strip'))
data['subject'] = data['subject'].str.strip()

data1 = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()
flat_colnames = ['_'.join(filter(None, ctup)) for ctup in data1.columns.to_flat_index()]
data1.columns = flat_colnames
print(data1)
hidden halo
#

maybe I'm missing something, but couldn't you just have selected 'score' before resetting the index? as in df.set_index(['class', 'id', 'sub']).unstack('sub')['score'].reset_index()
@paper niche And apparently this does the trick too, in a much simpler manner as well.
Thanks a lot fickletofu

desert oar
#

i would definitely leave comments explaining what this is doing

#

if i had to read that code id be confused

#

and also w/ fickle's method you have to prepend subject_ to the unstacked column names

paper niche
#

yeah if you'd like to add a prefix to the pivoted columns, just go with salt rock lamp's solution

desert oar
#

oooh wait

desert oar
#

fickle's method names the axis

#

that's kind of nice

#

unstack is nice because it names the column index itself subject which i think is cool

hidden halo
#

I haven't worked with multi-index data frames much, so I find this very confusing. Still trying to figure out what fickle's method does exactly by running it step by step

desert oar
#

@hidden halo selecting score selects a Series from the dataframe

#

so when you reset_index, that promotes the Series to a DataFrame with flat column names

#

rather, unstack creates a DataFrame with a multiindex column

#

the "outer" layer of the column axis has a score label

#

selecting that gives you just the "inner" layer, which is a DataFrame with a non-multi column index

#

then you reset_index on that, and the index "columns" become regular DataFrame columns
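
#

The same steps, run one at a time, as a rough sketch. It rebuilds the example table by hand and uses the full column names (`student_id`, `subject`) rather than the shortened `id` / `sub`:

```python
import pandas as pd

df = pd.DataFrame({
    "class": [4, 4, 4, 4],
    "student_id": [5, 5, 6, 6],
    "subject": ["Maths", "English", "Maths", "English"],
    "score": [56, 65, 73, 78],
})

# Step 1: unstack gives MultiIndex columns with "score" as the outer level
wide = df.set_index(["class", "student_id", "subject"]).unstack("subject")
print(list(wide.columns))   # [('score', 'English'), ('score', 'Maths')]

# Step 2: selecting the outer label leaves a DataFrame with flat columns
inner = wide["score"]

# Step 3: reset_index turns the row index levels back into regular columns
flat = inner.reset_index()
print(flat)
```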

#

comparison of both methods

hidden halo
#

@hidden halo selecting score selects a Series from the dataframe
@desert oar Yes, I get this now. I was not able to understand what was inside score. After looking at it in multiple ways, I've figured it out now

desert oar
#

i was actually wrong in those first 2 lines 😅

#

look at the next

#

i crossed out the wrong parts

hidden halo
#

Yeah, it makes sense. It's not very clear yet, I guess it will take a little more work with multi-index DFs for this to seem familiar. But I get the general idea.

#

One more question actually, I'm not able to get rid of the first column, which is basically the index, titled sub. I don't want the sub there, but reset index doesn't remove it.

desert oar
#

that isnt the first column

#

that's the name of the column index

#

which is what i was saying before

#

see the example repl i posted? you need to .rename_axis(columns=None)

#

if you do df2.columns you will see that the result is an Index object with name='sub'

#

this is an artifact of selecting a single key from MultiIndex columns
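
#

A minimal sketch of where that leftover label lives and how to clear it. It assumes the same example table, with the column called `subject` instead of `sub`:

```python
import pandas as pd

df = pd.DataFrame({
    "class": [4, 4, 4, 4],
    "student_id": [5, 5, 6, 6],
    "subject": ["Maths", "English", "Maths", "English"],
    "score": [56, 65, 73, 78],
})

flat = (df.set_index(["class", "student_id", "subject"])
          .unstack("subject")["score"]
          .reset_index())

# The label printed in the top-left corner is the *name* of the column index
print(flat.columns.name)    # subject

# rename_axis(columns=None) clears that name; the data is untouched
clean = flat.rename_axis(columns=None)
print(clean.columns.name)   # None
```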

hidden halo
#

Oh

#

Got it, working now. Thanks. Will probably take some time till I fully understand these methods.

lapis sequoia
#

Hi all. I have a question about the pandas module. I'm trying to delete rows from a series, but it looks like the drop() documentation only allows me to do this by making a completely new series. Is there a way to edit the current series I have? Because trying to do this in iterations runs into nightmare keyerrors, because I can't simply write series = series.drop([2])

#

I'm essentially asking if there's something in pandas that is the equivalent of .append or .remove in python's lists

solar bluff
#

.drop() has an inplace argument if you need to modify the series in place.

#

There is also a .append() method on pandas series that will let you effectively concatenate one series to another.

versed violet
#

@solar bluff are you good with pandas ?

solar bluff
#

I use it every day in my job. I'm no world class expert or anything but I get by

desert oar
#

@lapis sequoia note that drop drops by row label, not by numerical position

lapis sequoia
#

Well I'm currently trying to add the inplace=True argument, but now printing out the series is printing "None"

desert oar
#

df.drop(index=[2], inplace=True) might drop the 100th row, or it might even drop multiple rows, with the label 2

#

inplace=True makes .drop return None

lapis sequoia
#

Oh, is there a way to drop by numerical position? the documentation is confusing to me

solar bluff
#

"inplace bool, default False

If True, do operation inplace and return None." sure enough
lapis sequoia
#

I was doing some testing and series.drop([2]) seemed to remove the 3rd column (as I would expect)

desert oar
#
import pandas as pd

s = pd.Series(list('abcdefghijklmnop'))

pos = [2]
s.drop(index=s.index[pos], inplace=True)
print(s)
#

wait, columns?

#

or rows

lapis sequoia
#

Row

desert oar
#

it works if the row labels happen to be the same as the row numbers

#

which is only true sometimes or by default

solar bluff
#

I pretty much always avoid inplace so I'm not very well skilled with using that as an argument

lapis sequoia
#

In that case, is there no way to essentially remove a row without having to make a new variable?

desert oar
#

i just showed you

lapis sequoia
#

Because again this creates so many keyerror problems

desert oar
#

the keyerror problems have nothing to do with creating a new variable

#

the keyerror has to do with you confusing row numbers and row labels

#
s.drop(index=s.index[2], inplace=True)

should work

#

s.index gives you the row labels

#

so you can index that to get the relevant labels

#

then drop using that

#

also note that pandas doesn't make a copy of all the data even when you copy the Series

#

however if you are 100% sure that your row labels and row numbers are identical then you can just use drop(index=[2])

#

but if you for example do .sort_values() then the row labels will be out of order because the row labels stay attached to the rows

#

and then you'd have to .reset_index() to remove the out-of-order index and create a new correctly ordered index
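
#

A small sketch of the label-vs-position difference after sorting, using a made-up three-element Series:

```python
import pandas as pd

s = pd.Series([30, 10, 20])          # labels 0, 1, 2 match positions here
sorted_s = s.sort_values()           # values 10, 20, 30 with labels 1, 2, 0

# Dropping by label 2 removes the value 20, which now sits at position 1
by_label = sorted_s.drop(index=[2])

# Dropping by position 2: look up the label at that position first
by_pos = sorted_s.drop(index=sorted_s.index[[2]])

# reset_index(drop=True) rebuilds a fresh 0..n-1 index
renumbered = by_pos.reset_index(drop=True)

print(by_label.tolist(), by_pos.tolist(), renumbered.index.tolist())
```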

lapis sequoia
#

So what's the point of returning none?

desert oar
#

because it operates in-place

#

list.append also returns None
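
#

A quick sketch of why printing the result of an in-place drop shows None, with a toy Series for illustration:

```python
import pandas as pd

s = pd.Series(list("abc"))

# With inplace=True the Series itself is modified and nothing is returned
result = s.drop(index=[1], inplace=True)
print(result)        # None
print(s.tolist())    # ['a', 'c']

# Same convention as list methods that mutate in place
items = [1, 2]
print(items.append(3))   # None
print(items)             # [1, 2, 3]
```

So `s = s.drop(..., inplace=True)` overwrites the Series with None; use one style or the other, not both.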

lapis sequoia
#

Ah I see

#

Okay, that seems to work. And evidently I need to read up on how pandas defines indexes and labels because I'm getting confused

desert oar
#

im using the term "labels" loosely

#

a DataFrame has two "axes": the index (i.e. row labels) and the columns (i.e. column labels)

#

each "axis" is represented by an Index object, which is similar to but not the same as a Series

#

an Index has a dtype and can contain strings, numbers, dates, etc.

#

and you can do row and/or column lookups on DataFrames using the Index values

#

the .loc accessor does index lookups. the .iloc accessor does positional lookups

#

if you create a DataFrame and don't specify the index, you get a default RangeIndex which is just 1:1 with the row numbers
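
#

A minimal sketch of the two accessors side by side, assuming a toy DataFrame with string row labels:

```python
import pandas as pd

data = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=list("abc"))

by_label = data.loc["b", "x"]      # lookup by row label and column label
by_position = data.iloc[1, 0]      # lookup by row number and column number
print(by_label, by_position)       # 2 2

# Without an explicit index you get a RangeIndex: labels equal positions
default = pd.DataFrame({"x": [1, 2, 3]})
print(type(default.index).__name__)   # RangeIndex
```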

lapis sequoia
#

So an index is not always numerical?

desert oar
#

correct

#
data = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]}, index=list('abc'))
lapis sequoia
#

I see

#

So s.drop(index=s.index[2], inplace=True) seems to work on its own, but when doing it in an iteration I still seem to be getting keyerrors

desert oar
#

keyerror happens if the index value is missing

lapis sequoia
#

So I'm trying to drop a row at row index value x, but it can't find x

#

Either because it's out of bounds or there's nothing there

#

Oh, I think it's happening because when you remove something from a series, no index values change

solar bluff
#

"hey pandas, do this thing at this location"

#

pandas: "that location? what location? i don't see no location like that. KeyError"

desert oar
#

thats correct @lapis sequoia

#

when you remove the 'a' row from the dataframe in my example, there is no longer an 'a' row

lapis sequoia
#

Yeah, which I believe is something .remove would automatically take care of in python

#

I suppose I could find a way to work around that though

desert oar
#

im not sure what you mean

#

can you show more of your code

#

what are you even trying to do

lapis sequoia
#

Well, in a list, say [a, b, c]. if you do list.pop(2), list would be [a, b], with the indexes adjusting automatically

desert oar
#

thats because the indexes are positional in a list

#

a DataFrame is more like an OrderedDict

lapis sequoia
#

it's on another computer, but let me write something out to make it easier

desert oar
#

it has both positional indexes and named indexes, i.e. keys

#

the keys in a dict don't adjust when you delete an entry

#

if you want to keep the row labels in sync with the row positions you need to .reset_index(drop=True) after every row deletion
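
#

A short sketch contrasting the two behaviours, with toy data. List positions shift after a removal, while Series labels stay attached to their rows until you reset them:

```python
import pandas as pd

# List: positions shift down after a removal
items = ["a", "b", "c"]
items.pop(0)
print(items[0])              # b

# Series: labels stick to their rows, so label 0 is simply gone
s = pd.Series(["a", "b", "c"])
s = s.drop(index=0)
print(s.index.tolist())      # [1, 2]
print(0 in s.index)          # False, so s[0] would raise KeyError

# Renumber the labels to match positions again
s = s.reset_index(drop=True)
print(s[0])                  # b
```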

lapis sequoia
#
        for j in range(0, len(products_b)):
            if products_b[j] == products_a[i]:
                deleted_indicies.append(i)
                products_b.drop(index=products_b.index[j], inplace=True)
                break
desert oar
#

products_b and products_a are series objects? and the indexes are unique?

lapis sequoia
#

they are both series objects, yes

#

As for the indices, I've just been using numerical positions

desert oar
#

as in, you never call set_index on these right? and you never otherwise explicitly specified an index?

lapis sequoia
#

right

desert oar
#
deleted_indices = []
for i_a, val_a in products_a.items():
    for i_b, val_b in products_b.items():
        if val_a == val_b:
            deleted_indices.append(i_b)
            products_b.drop(index=i_b, inplace=True)
            break

does this work?

lapis sequoia
#

that's what's giving the keyerror

desert oar
#

oh?

#

oh i see

lapis sequoia
#

even after adding the reset_index line

desert oar
#

what if the key was deleted in a previous iteration?

lapis sequoia
#

Yeah, thats the problem

desert oar
#

you shouldnt modify something you're iterating over

#

you want to just delete the elements from b where they occur in a?

lapis sequoia
#

Yeah, I'm trying to delete the elements from b when they occur in a so that the iteration doesn't take as long

desert oar
#
for i, val in products_b[~products_b.isin(products_a)].items():
    # do something
#

but you should always question iterating manually over a Series

#

usually you can go a lot faster by using .map or .apply

lapis sequoia
#

I was wondering if there were better ways, because this is taking a very long time. the series are both pretty long

#

I'll look into map and apply

desert oar
#

iterating over pandas series is very slow

#

compared to iterating over a list

#

what are you trying to do more generally?

lapis sequoia
#

Basically I'm taking a master list of items, comparing them with another list of discontinued items

#

Ideally all of the discontinued items will be in the masterlist. If that's true it should be relatively simple to remove the discontinued items from the masterlist

#

Iteration like this was just the first thing that came to me

#

it's just going to take a long time when the masterlist has about 75k entries

desert oar
#

yeah just use .isin

#
products_current = products_master[~products_master.isin(products_discontinued)]
lapis sequoia
#

yeah I figured I was overthinking something to the very bone

#

I suppose that line would imply iteration as well, though?

desert oar
#

in C, internally.

#

not in python
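
#

A minimal sketch of the whole filter on made-up product names (the sample values are hypothetical; the variable names follow the ones used above):

```python
import pandas as pd

products_master = pd.Series(["apple", "banana", "cherry", "date"])
products_discontinued = pd.Series(["banana", "date"])

# Boolean mask: True where a master item appears in the discontinued list
mask = products_master.isin(products_discontinued)

# ~mask flips it, so only the non-discontinued rows are kept
products_current = products_master[~mask]

print(products_current.tolist())        # ['apple', 'cherry']
print(products_current.index.tolist())  # [0, 2] (original labels are kept)
```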

lapis sequoia
#

Ah, i see

desert oar
#

although it is a shame how slow iteration over a Series is

#

but thats a bigger design issue

lapis sequoia
#

So I'm looking at the .isin documentation now. Once you get the series of booleans I suppose you can just filter out all of the "True"s

#

I would imagine with the drop() function

#

Oh you can't actually pass a series object into .isin(). Guess I'll have to convert products_b to a list

desert oar
#

eh?

#

im not sure what you mean

#
pd.Series([1,2,3]).isin(pd.Series([1,2,4]))
lapis sequoia
#

products_discontinued is a series itself

#

Oh, you can

#

the documentation says it only accepts a set or list-like

desert oar
#
pd.testing.assert_series_equal(
    pd.Series([1,2,3]).isin(pd.Series([1,2,4,7,-5])),
    pd.Series([True, True, False])
)
lapis sequoia
#

does that include a series object?

desert oar
#

yes, pandas docs do a poor job of defining their terms

lapis sequoia
#

aha, it's simple then

desert oar
#

a "list-like" is a list, pandas series, numpy array, and a handful of other things

#

and ~ is logical negation on a Series

#

so

pd.Series([1,2,3], index=['a', 'b', 'c']).isin(pd.Series([1,2,4,7,-5]))

is

pd.Series([True, True, False], index=['a', 'b', 'c'])

and therefore

~pd.Series([1,2,3], index=['a', 'b', 'c']).isin(pd.Series([1,2,4,7,-5]))

is

pd.Series([False, False, True], index=['a', 'b', 'c'])
#

itd be nice if there was a .notin method for efficiency but this is still a lot faster than iterating

lapis sequoia
#

So my thought is that once I have the series of booleans, I could use the .iloc method to return the indices of all "True"s

desert oar
#

you dont need that

lapis sequoia
#

Oh, that's not true

desert oar
#

you can index/subset a series with a boolean series

#

again:

products_current = products_master[~products_master.isin(products_discontinued)]
lapis sequoia
#

So is the ~ operator specific to pandas? ive never seen it before

desert oar
#

no, the ~ is binary inversion

tidal bough
#

It's the bitwise NOT operator. pandas overloads it to work elementwise on Series.

desert oar
#

yeah

#

bitwise NOT, thats what its called

lapis sequoia
#

okay, so if i wanted to keep an archive of all the discontinued products i can just delete the ~

desert oar
#

well the discontinued products are already in their own series..

lapis sequoia
#

oh, getting carried away there

desert oar
#

unless you want the intersection of products_master and products_discontinued, in which case yes

#

that said... what format does this data originally arrive in?

lapis sequoia
#

Excel files

desert oar
#

(note that in python, custom classes can override the behavior of various operators including +, -, /, *, &, |, ~)
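
#

A small sketch of that overloading. `Flag` is a made-up class purely for illustration; `__invert__` is the special method Python calls for `~`:

```python
import pandas as pd

# On plain ints, ~ is bitwise NOT
print(~5)                      # -6

class Flag:
    """Toy class that defines its own meaning for ~."""
    def __init__(self, on):
        self.on = on

    def __invert__(self):
        return Flag(not self.on)

flipped = ~Flag(True)
print(flipped.on)              # False

# pandas defines ~ on a boolean Series as element-wise negation
mask = pd.Series([True, False, True])
inverted = ~mask
print(inverted.tolist())       # [False, True, False]
```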

#

sure, but they are in 2 different files?

#

or different sheets

lapis sequoia
#

different files

desert oar
#

the discontinued list and the master list

#

ok