#data-science-and-ml | Python | Page 247

velvet thorn Aug 24, 2020, 9:47 AM

#

My maths is alright
@desert parcel okay, I'm not here to quiz you

#

but yeah, try fixing the nan values in the input

#

and see if it works

desert parcel Aug 24, 2020, 9:48 AM

#

I'm checking the csv

#

I don't see any nan values

#

even if there are I avoided those columns

velvet thorn Aug 24, 2020, 9:49 AM

#

uh

#

why aren't you doing it programmatically

#

it's in a DataFrame

#

you can query it

desert parcel Aug 24, 2020, 9:50 AM

#

I was looking at a pandas tutorial

#

but I fell asleep xd

#

I was tired

velvet thorn Aug 24, 2020, 9:50 AM

#

okay...?

desert parcel Aug 24, 2020, 9:50 AM

#

lol I know I'm not the best

#

Oh i see the error now

#

ok yeah there is a loss now

#

no more nans

velvet thorn Aug 24, 2020, 9:52 AM

#

yup

#

that's good

#

btw for future reference

#

you can use df.isna().sum()

#

to see whether there are nulls on a column-wise basis

desert parcel Aug 24, 2020, 9:53 AM

#

ohh

#

yeah that's handy

velvet thorn Aug 24, 2020, 9:53 AM

#

or df.isna().sum().sum() to get the total number of nulls
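
A minimal sketch of both checks, assuming a small made-up DataFrame (the column names `age` and `city` are hypothetical, not from the CSV being discussed):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values.
df = pd.DataFrame({
    "age": [31.0, np.nan, 45.0, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Number of nulls in each column.
nulls_per_column = df.isna().sum()

# Total number of nulls in the whole frame.
total_nulls = int(df.isna().sum().sum())

# Rows containing at least one null, handy for inspecting them.
rows_with_nulls = df[df.isna().any(axis=1)]

print(nulls_per_column)
print(total_nulls)
```

Filtering with `df.isna().any(axis=1)` is one way to look at the offending rows instead of scanning the file by eye.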

#

yeah...

#

well.

#

pandas knowledge is really important IMO

#

scanning a CSV by hand is hell

desert parcel Aug 24, 2020, 9:55 AM

#

yeah I can see that

#

I was looking at the tutorial before I fell asleep

#

the way you can handle data is really useful

velvet thorn Aug 24, 2020, 9:55 AM

#

IMO you should be at least intermediate with pandas and numpy

#

before you even think of touching TF/torch

desert parcel Aug 24, 2020, 9:57 AM

#

yeah I'm gonna take a look at those

#

100%

#

I just thought it would be cool to learn ML

#

I went in without really knowing what I needed to know

#

I also got an inf in one of my losses

#

The values of loss change depending on what optimizer I use

velvet thorn Aug 24, 2020, 9:58 AM

#

it's cool

#

but

#

not many people can pick it up right off the bat.

#

and if your fundamentals are weak

#

you run into a lot of problems

#

without knowing how to solve them

desert parcel Aug 24, 2020, 9:58 AM

#

Well I just read the errors

#

and mess with the code for like 30 minutes or something

#

before I ask for help

velvet thorn Aug 24, 2020, 9:59 AM

#

yeah, I think you depend way too much on getting help

desert parcel Aug 24, 2020, 9:59 AM

#

Yeah I think so too lol

#

But sometimes I can't figure out the issue

velvet thorn Aug 24, 2020, 10:00 AM

#

and if your fundamentals are weak
@velvet thorn largely because of this

#

to be fair, part of it is about experience

desert parcel Aug 24, 2020, 10:01 AM

#

Well I can't say much about that since I'm new to this

#

I just mess around with stuff I know and read the docs

#

Not that I can understand a lot of it

velvet thorn Aug 24, 2020, 10:02 AM

#

I would suggest at least 2-3 months of quality Python experience before beginning to touch deep learning

desert parcel Aug 24, 2020, 10:02 AM

#

What would you consider quality then

#

Because I feel like i'm good at python but there are things that I miss

velvet thorn Aug 24, 2020, 10:02 AM

#

Because I feel like i'm good at python but there are things that I miss
@desert parcel don't think so TBH

desert parcel Aug 24, 2020, 10:02 AM

#

I know I'm not good at it but I just feel like I'm good at it lol

velvet thorn Aug 24, 2020, 10:02 AM

#

even on the knowledge level

desert parcel Aug 24, 2020, 10:02 AM

#

I did say feel

velvet thorn Aug 24, 2020, 10:03 AM

#

for example, are you familiar with decorators, context managers, or the descriptor protocol?

desert parcel Aug 24, 2020, 10:03 AM

#

never heard of the last one

velvet thorn Aug 24, 2020, 10:03 AM

#

__get__?

desert parcel Aug 24, 2020, 10:03 AM

#

Oh yeah

#

stuff like

#

__init__

velvet thorn Aug 24, 2020, 10:03 AM

#

huh?

#

no...

desert parcel Aug 24, 2020, 10:03 AM

#

lol then nvm

velvet thorn Aug 24, 2020, 10:03 AM

#

the descriptor protocol underlies properties
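
To make that concrete, here is a small sketch of a hand-written descriptor next to the equivalent `property`. The class names (`Positive`, `Account`, `AccountWithProperty`) are made up for illustration.

```python
class Positive:
    """Data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Remember where to store the value on each instance.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.private_name, value)


class Account:
    balance = Positive()  # attribute access is routed through __get__/__set__

    def __init__(self, balance):
        self.balance = balance


class AccountWithProperty:
    """Same behaviour written with property, which is itself a descriptor."""

    def __init__(self, balance):
        self.balance = balance

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value <= 0:
            raise ValueError("value must be positive")
        self._balance = value
```

`__init__` is a special method too, but it is not part of the descriptor protocol; that protocol is `__get__`, `__set__` and `__delete__`.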

desert parcel Aug 24, 2020, 10:04 AM

#

maybe I should rewatch that python tut video

#

that got me into this

velvet thorn Aug 24, 2020, 10:04 AM

#

anyway, IMO quality is about building things that stretch your capabilities

#

and developing knowledge

desert parcel Aug 24, 2020, 10:04 AM

#

This being python for beginners

velvet thorn Aug 24, 2020, 10:04 AM

#

breadth is really useful.

desert parcel Aug 24, 2020, 10:04 AM

#

not the tut for getting into ml

velvet thorn Aug 24, 2020, 10:04 AM

#

because everything is connected

#

a bit of computer science knowledge is also really nice for ML

desert parcel Aug 24, 2020, 10:05 AM

#

I have none of that

#

mostly because I'm still in highschool and I haven't really searched up any vids on CS

#

Maybe I should do that

velvet thorn Aug 24, 2020, 10:05 AM

#

the nice thing

#

about life nowadays

#

is that you're no longer locked into your major

#

I don't have a CS background either

#

nothing even close

#

but, yeah, knowing the answer to questions like "why is it faster to tell if an element is in a set vs a list" will come in handy someday.

desert parcel Aug 24, 2020, 10:06 AM

#

An element in a set doesn't have repeating values

#

right?

#

like if there are repeats only one instance will be printed, not sure if the right language is used

velvet thorn Aug 24, 2020, 10:07 AM

#

uh...

#

yes but no

#

I mean, nothing of what you have said is wrong

#

but what I mean is

#

3 in {1, 2, 3} vs 3 in [1, 2, 3]

#

the former is faster; why?

#

and that is a CS question
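
One way to see the difference for yourself is to time both lookups. This sketch uses only the standard library; the collection size and repeat count are arbitrary choices.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # the last element, the slowest case for the list

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Try changing `n` and watch how each timing responds; that pattern is the clue to the answer.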

desert parcel Aug 24, 2020, 10:07 AM

#

ohh

#

Idk maybe something with memory?

#

I'm just guessing

velvet thorn Aug 24, 2020, 10:08 AM

#

no point in me telling you

desert parcel Aug 24, 2020, 10:08 AM

#

ik

velvet thorn Aug 24, 2020, 10:08 AM

#

the thing is that these things are not obvious, but they will be important sometime in the future

desert parcel Aug 24, 2020, 10:08 AM

#

I'm checking it out

velvet thorn Aug 24, 2020, 10:08 AM

#

just an illustration

#

anyway I'm out

#

have fun learning!

#

I was in your position like a year ago

#

it's a great journey

desert parcel Aug 24, 2020, 10:08 AM

#

Well thanks for the constructive criticism

#

Really did bring some things to light

#

or shine

lapis sequoia Aug 24, 2020, 12:06 PM

#

Do I need to know the foundational linear regression and kNN algorithms, plus some algebra, matrices, probability and statistics, calculus, and numpy completely in order to start a new life in machine learning? I'm interested in the legendary computer vital version

desert oar Aug 24, 2020, 12:12 PM

#

Yes

#

It's foundational material

#

Even if you don't use it frequently

worn bough Aug 24, 2020, 12:49 PM

#

You don't need it to implement basic algorithms, but it's very handy if you want to know what you're doing.

#

I mean, you don't need a huge maths course, but you need to know a thing or two about probabilities and calculus.

lapis sequoia Aug 24, 2020, 12:50 PM

#

no algebra?

#

what if i wanna be efficient

#

then i require good math right

desert oar Aug 24, 2020, 12:52 PM

#

@lapis sequoia probability, calculus, linear algebra. i agree you dont need to learn it all at once, but you should definitely start learning it and seek to keep learning it over time.

lapis sequoia Aug 24, 2020, 12:53 PM

#

@desert oar Thanks buddy

heady lance Aug 24, 2020, 1:02 PM

#

COMMANDLINE Video Player - convert video files to ascii art
upvotes 263 comments 21 user Slingerhd

What is the most impressive Python based project you have seen?
Sometimes I find that Python can be so much more, but people use it mainly in data science (which is fine). Wonder any...
upvotes 38 comments 37 user vitsensei

Crime Watch: An Interactive Way To View Crime
 A Demonstration Of Crime Watch Github Link...
upvotes 29 comments 8 user python959

Python logo in colored ASCII art!
upvotes 28 comments 2 user Honno

A DoS attack in 15 lines of code.
Hi, I have tried to create the simplest possible denial of service attack; for this script, I have not used more than 15...
upvotes 6 comments 26 user progsNyx

r/Python - Python logo in colored ASCII art!

169 votes and 9 comments so far on Reddit

viral scroll Aug 24, 2020, 1:02 PM

#

Hi Guys,

I have a pandas data frame like this

📎 Screenshot_2020-08-24_at_4.17.38_PM.png

#

The driverBreakdown column here is a nested dict

  'environment': {'average': 5,
   'questions': {'Question 1': 5}},
  'peerRelationship': {'average': 4,
   'questions': {'Question 2': 4}}},
 'Mood': {'average': 5.0,
  'mood': {'average': 5,
   'questions': {'Question 3': 5}}},
 'RewardsAndRecognition': {'average': 1.0,
  'recognition': {'average': 1,
   'questions': {'Question 4': 1}}}}

I would like to convert the driverBreakdown column into multiple rows in this way

📎 Screenshot_2020-08-24_at_6.31.21_PM.png

#

is there any way to achieve this directly via pandas

#

and by not using multiple python iterators

heady lance Aug 24, 2020, 1:04 PM

#

Hello Everyone

desert oar Aug 24, 2020, 1:13 PM

#

@viral scroll i would do a combination

#

write a function to "flatten" each nested dict
"explode" the flattened dicts into dataframe rows

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html
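
A rough sketch of that flatten-then-explode combination. The target layout is an assumption (one row per question, since the screenshot is not available), and the `employee` column and the sample values are made up; only the nesting pattern follows the dict shown above.

```python
import pandas as pd


def flatten_breakdown(breakdown):
    """Turn one nested dict into a list of flat records, one per question."""
    records = []
    for driver, driver_info in breakdown.items():
        for sub_driver, sub_info in driver_info.items():
            if sub_driver == "average":
                continue  # driver-level average, not a sub-driver
            for question, score in sub_info["questions"].items():
                records.append({
                    "driver": driver,
                    "driver_average": driver_info["average"],
                    "sub_driver": sub_driver,
                    "sub_driver_average": sub_info["average"],
                    "question": question,
                    "score": score,
                })
    return records


# Hypothetical frame with the nested column.
df = pd.DataFrame({
    "employee": ["a", "b"],
    "driverBreakdown": [
        {
            "Mood": {"average": 5.0,
                     "mood": {"average": 5, "questions": {"Question 3": 5}}},
            "RewardsAndRecognition": {"average": 1.0,
                                      "recognition": {"average": 1,
                                                      "questions": {"Question 4": 1}}},
        },
        {
            "Mood": {"average": 3.0,
                     "mood": {"average": 3, "questions": {"Question 3": 3}}},
        },
    ],
})

# 1. Flatten each nested dict into a list of records.
df["record"] = df["driverBreakdown"].map(flatten_breakdown)

# 2. Explode so that every record gets its own row.
exploded = (
    df.drop(columns="driverBreakdown")
      .explode("record")
      .reset_index(drop=True)
)

# 3. Expand the record dicts into ordinary columns.
result = pd.concat(
    [exploded.drop(columns="record"), pd.DataFrame(exploded["record"].tolist())],
    axis=1,
)
print(result)
```

The flatten step is still a Python-level loop over every row, so it is worth timing on a sample before running it on the full data set.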

tribal hornet Aug 24, 2020, 1:18 PM

#

hello, in which channel can i clear my doubts about python?

viral scroll Aug 24, 2020, 1:22 PM

#

@desert oar
Any suggestion on how to optimize the flatten part, as my data set can contain up to a million rows and I am afraid that flattening each row could be time consuming.

Also, Thanks for letting me know about the explode function.

desert oar Aug 24, 2020, 1:41 PM

#

@tribal hornet you should clarify what exactly your doubts are, then ask in a help channel 🙂 #❓｜how-to-get-help

lapis sequoia Aug 24, 2020, 4:23 PM

#

where can i learn Linear Algebra

remote pumice Aug 24, 2020, 4:30 PM

#

Hello can somebody help? so i am planning to make a driver distraction detection system using OpenCV. so i am thinking of adding a feature which shows the number of alerts the driver gets during driving. So how can i get the data?

near moss Aug 24, 2020, 5:07 PM

#

you convince a lab or some agency to fund your research

#

because I am 99.9% sure those data are not publicly available if they exist at all

solid aurora Aug 24, 2020, 6:13 PM

#

I've got a matplotlib question:

#

I have a plot using 2x2 subplots, and things are laid out properly

#

But when I add a column there is a large gap between the rows

#

and changing the figsize doesn't help

#

let me take some screenshots and show examples

last wind Aug 24, 2020, 6:15 PM

#

```
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]]
```
anyone have an idea how to make this mask in numpy

#

please ping me if you have an answer
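
Assuming the cut-off first row is `[1 0 0 0]`, so that the full mask is lower-triangular (a guess based on the three rows that survive), `np.tril` builds it directly:

```python
import numpy as np

n = 4  # assumed size, matching the 4-column rows shown

# Ones on and below the diagonal, zeros above it.
mask = np.tril(np.ones((n, n), dtype=int))
print(mask)

# The same mask from an index comparison: column index <= row index.
rows, cols = np.indices((n, n))
mask_from_indices = (cols <= rows).astype(int)
```

The index-comparison form is easier to adapt if the real pattern turns out to be shifted or banded.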

solid aurora Aug 24, 2020, 6:16 PM

#

Pre-adding a column:

📎 unknown.png

#

the pics are intentionally low-quality

#

the content of the plots is irrelevant here

#

post-adding a column:

📎 unknown.png

#

why is there a large gap between the two rows?

#

I ran plt.figure(figsize=(20, 20)) before both

#

I'm assuming that the figsize affects the subplots, since the bottom pic is clearly not square

last wind Aug 24, 2020, 6:18 PM

#

because the total figure size is now 20 by 20. that's an aspect ratio of 1:1; for those you'd need an aspect ratio of like 2:3 i believe

#

wait is there whitespace after the lower ones?

solid aurora Aug 24, 2020, 6:19 PM

#

@last wind if the figsize affects the total size then why is the bottom picture clearly non-square?

#

no

#

that's the full figure

📎 unknown.png

last wind Aug 24, 2020, 6:19 PM

#

thanks

solid aurora Aug 24, 2020, 6:20 PM

#

I set figsize=(20, 30) and got this:

📎 unknown.png

#

lemme try with (30, 20)

#

oh yea that solved it

📎 unknown.png

#

still don't fully understand how that works

#

but ¯_(ツ)_/¯
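
A likely explanation, assuming the subplots hold image-like content with a fixed aspect ratio: `figsize` is `(width, height)` in inches for the whole figure, each subplot gets an equal share of that area, and a square image shrinks to fit its cell, leaving whitespace wherever the cell is not square. A sketch with placeholder data (the 2x3 grid and the cell size are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

n_rows, n_cols = 2, 3
cell = 10  # inches per subplot, an arbitrary choice

# Width follows the number of columns, height the number of rows,
# so a 2x3 grid of square cells needs a wider-than-tall figure.
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * cell, n_rows * cell))

for ax in axes.flat:
    ax.imshow(np.random.rand(32, 32))  # square placeholder image
```

With `figsize=(20, 20)` and three columns, each cell is taller than it is wide, which would show up as a gap between the rows.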

jolly sinew Aug 24, 2020, 6:23 PM

#

If I have an intake csv file from an animal shelter and an outcomes csv file from an animal shelter, but there are about 200 more records in the outcomes csv, how could I remove those so I can nicely join dataframes from the two csv files with pandas?

solid aurora Aug 24, 2020, 6:23 PM

#

there are types of joins you can use

#

idr how exactly pandas does joins

#

but shouldn't you be able to do a Left Join (assuming intake is on the left)?

#

@jolly sinew ^

lapis sequoia Aug 24, 2020, 6:24 PM

#

Hi guys I'm learning numpy and want to make face recognition using opencv, how do i do that

#

ik numpy basics and some essentials so should i start learning opencv?

#

or do i need to know how ml works

jolly sinew Aug 24, 2020, 6:26 PM

#

@solid aurora I tried a merge on the animal ID column, but it is not a unique column because sometimes an animal with the same animal ID is recorded / admitted multiple times, so merging on animal ID multiplied those records. However, I really appreciated your advice and I'll try the left join.

#

So there's not really a good primary key
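
A small sketch of why the rows multiply, using made-up records (the date columns are hypothetical). When a key value appears more than once on both sides, the merge pairs every matching left row with every matching right row:

```python
import pandas as pd

intakes = pd.DataFrame({
    "Animal ID": ["A1", "A1", "B2"],
    "intake_date": ["2020-01-01", "2020-03-01", "2020-02-01"],
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A1", "B2"],
    "outcome_date": ["2020-01-10", "2020-03-15", "2020-02-20"],
})

merged = pd.merge(intakes, outcomes, on="Animal ID", how="left")

# A1 appears twice on each side, so it yields 2 x 2 = 4 rows; B2 yields 1.
print(len(merged))

# validate= makes pandas raise instead of silently multiplying rows.
try:
    pd.merge(intakes, outcomes, on="Animal ID", how="left", validate="1:1")
except pd.errors.MergeError as exc:
    print("not a one-to-one key:", exc)
```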

solid aurora Aug 24, 2020, 6:33 PM

#

@jolly sinew hmm maybe generate a new column which is f"{animal_id}-{visit_number}"

#

so if animal 1 gets seen 3 times, the column will be 1-1, 1-2, and 1-3

jolly sinew Aug 24, 2020, 6:34 PM

#

Oh nice, that's a good idea

solid aurora Aug 24, 2020, 6:34 PM

#

that's assuming that the intake and outtake happens sequentially and there are no missing records

#

that would totally break if there is an outtake that's not recorded

jolly sinew Aug 24, 2020, 6:38 PM

#

I'm going to give it a shot

#

Am I allowed to post links to the datasets here? It might make more sense if you could see their general shape

solid aurora Aug 24, 2020, 6:40 PM

#

Yea sure

#

I won't be able to take a look, but maybe someone else can

merry ridge Aug 24, 2020, 6:53 PM

#

Is anyone familiar with the package MIP? I have a somewhat complex mixed integer LP problem I am trying to solve and the package seems to be running into numerical artifacts, or getting stuck trying to find a solution without terminating. I can't find much information online on how to troubleshoot it.

molten hamlet Aug 24, 2020, 6:59 PM

#

uhh

#

what exactly do you want to achieve?

#

maybe it is using wrong numeric methods

jolly sinew Aug 24, 2020, 7:17 PM

#

@solid aurora I found a solution thanks to your help! I used cumcount to generate a new column of occurrences for each animal id and then did a left join on both the animal id and the occurrences columns

#

outcomesdf["Occ_Number"] = outcomesdf.groupby("Animal ID").cumcount()+1
intakesdf["Occ_Number"] = intakesdf.groupby("Animal ID").cumcount()+1
fulldf = pd.merge(intakesdf, outcomesdf, on=['Animal ID', 'Occ_Number'],
                  how='left', validate="1:1")
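
The same approach as a self-contained sketch on toy data. The `Intake Type` and `Outcome Type` columns and all values are made up; only `Animal ID` and `Occ_Number` come from the snippet above.

```python
import pandas as pd

intakesdf = pd.DataFrame({
    "Animal ID": ["A1", "B2", "A1"],
    "Intake Type": ["Stray", "Owner Surrender", "Stray"],
})
outcomesdf = pd.DataFrame({
    "Animal ID": ["A1", "B2", "A1", "C3"],  # C3 has no matching intake
    "Outcome Type": ["Return to Owner", "Adoption", "Adoption", "Transfer"],
})

# Number each animal's visits 1, 2, 3, ... in row order.
intakesdf["Occ_Number"] = intakesdf.groupby("Animal ID").cumcount() + 1
outcomesdf["Occ_Number"] = outcomesdf.groupby("Animal ID").cumcount() + 1

# The (Animal ID, Occ_Number) pair is now unique on each side.
fulldf = pd.merge(intakesdf, outcomesdf, on=["Animal ID", "Occ_Number"],
                  how="left", validate="1:1")
print(fulldf)
```

`cumcount` numbers rows in the order they appear, so this relies on both files being sorted chronologically; sorting by the relevant date column first is a sensible precaution.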

solid aurora Aug 24, 2020, 7:20 PM

#

@jolly sinew glad to hear that!

merry ridge Aug 24, 2020, 7:21 PM

#

I am trying to solve a convex mixed integer LP problem. I'm not sure how else I can describe it without going into extreme detail.

#

I read a recent paper by Garvie and Burkardt that showed their (unrelated) LP problem would not converge using gurobi, but would converge reliably with other solvers. I'm not sure how often something like that occurs in practice before I try to reimplement this entire thing.

upbeat ore Aug 24, 2020, 8:18 PM

#

what's the best approach to store data to access it afterwards?

#

i want to build a face recognition db, which would store the id, entrances on screen and such stuff, now i'm confused on how to store it better, would an object be appropriate?

solid aurora Aug 24, 2020, 8:35 PM

#

@upbeat ore use a proper database such as sqlite or postgresql or smth
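
A minimal sketch of what that could look like with the standard-library `sqlite3` module. The table and column names are invented for illustration, and a real system would also need somewhere to keep face embeddings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist data

conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE appearance (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        seen_at TEXT NOT NULL  -- ISO 8601 timestamp
    )
""")

# Register one person and two on-screen appearances.
cur = conn.execute("INSERT INTO person (name) VALUES (?)", ("visitor-1",))
person_id = cur.lastrowid
conn.executemany(
    "INSERT INTO appearance (person_id, seen_at) VALUES (?, ?)",
    [(person_id, "2020-08-24T20:18:00"), (person_id, "2020-08-24T20:25:00")],
)
conn.commit()

# How many times has this person been seen?
count = conn.execute(
    "SELECT COUNT(*) FROM appearance WHERE person_id = ?", (person_id,)
).fetchone()[0]
print(count)
```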

#

@merry ridge tbh I don't think #data-science-and-ml is the best place to find linear programming advice

#

I'm not even sure how many machine learning engineers have any linear programming experience

#

I for sure have none, but then I'm just a high schooler 🙂

#

========================

#

Anyway I came here to ask

merry ridge Aug 24, 2020, 8:37 PM

#

I'm not sure I agree with that statement, but I'm willing to admit I'm wrong.

solid aurora Aug 24, 2020, 8:38 PM

#

Is there a functional difference between a high figsize and a high DPI value in matplotlib?
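
For pixel dimensions the two simply multiply: a saved raster image is `figsize * dpi` pixels. The difference is that font sizes and line widths are specified in points (1/72 inch), so a larger `figsize` makes text smaller relative to the plot, while a higher `dpi` scales everything together. A quick sketch with arbitrary sizes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

big_canvas = plt.figure(figsize=(20, 20), dpi=100)
high_dpi = plt.figure(figsize=(10, 10), dpi=200)

# Both figures come out as 2000 x 2000 pixels...
px_a = big_canvas.get_size_inches() * big_canvas.dpi
px_b = high_dpi.get_size_inches() * high_dpi.dpi
print(px_a, px_b)

# ...but a 12 pt label covers a different number of pixels in each.
label_px_a = 12 / 72 * big_canvas.dpi
label_px_b = 12 / 72 * high_dpi.dpi
print(label_px_a, label_px_b)
```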

#

@merry ridge you may be more likely right than wrong - the machine learning engineers I've interacted with are all fresh out of college and focusing more on the business/analytics side than the math side

desert oar Aug 24, 2020, 8:48 PM

#

@merry ridge this is a perfectly fine place to ask, but i dont know how many people anywhere on this discord have experience with that

#

@solid aurora it's not that they don't know math. it's that linear programming isn't typically an important part of machine learning or data science nowadays, so it's not typically taught much. especially not to ML engineers who don't need the full methodological breadth that a researcher might need

#

it's probably good for a generalist to at least be aware of LP solvers and such, but i've certainly never needed it in my career

merry ridge Aug 24, 2020, 8:51 PM

#

I figured that a lot of ML practitioners would frequently run into a problem where their black box algorithm fails and they need to troubleshoot it. Are the packages used more resilient than I think they are?

solid aurora Aug 24, 2020, 8:51 PM

#

@desert oar oh yea I'm sure ML engineers know math, and are at least aware of what LP can do

#

just I doubt they have experience utilizing LP to solve such problems

#

@merry ridge oh no you're entirely right, black box algorithms fail often, just that i've never heard of someone turning to LP to solve them

desert oar Aug 24, 2020, 8:53 PM

#

@merry ridge ML practitioners rarely use black box algorithms

#

99% of the time machine learning is differentiable and solved with convex optimization methods like gradient descent
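
For a sense of what gradient descent looks like in the simplest case, here is a sketch that fits a line by minimizing mean squared error. The data is synthetic and the learning rate and step count are arbitrary. This particular problem is convex; neural networks generally are not, although they are trained with the same kind of update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus a little noise.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(2000):
    error = w * x + b - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```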

upbeat ore Aug 24, 2020, 8:53 PM

#

Could anyone advise me on how to proceed with building something like this? -- We got a surveillance system in a bar, and people sometimes fight here, bring weapons and stuff, we want to be able to identify weapons and people that are on ban list from cameras. My first idea, was using facenet, for face detection and recognition, what about weapons?, + if people come in with masks or hats, should i be looking at another data set? All suggestions are welcome. Thank you.

solid aurora Aug 24, 2020, 8:53 PM

#

I think they mean things like neural networks where it's difficult to understand why something misperforms @desert oar

desert oar Aug 24, 2020, 8:54 PM

#

even so, explicit LP solvers just aren't used for that

solid aurora Aug 24, 2020, 8:54 PM

#

mmhm ^

#

@upbeat ore I always like to say that obtaining good data is 80% of the work of creating an ML model

#

you're going to need to find a dataset of images where people are holding/concealing weapons

#

and then label it so you can create ground truth

#

that can help build a model where you detect weapons on people

desert oar Aug 24, 2020, 8:55 PM

#

^ this. you need a big labeled dataset of people holding weapons in various poses etc. and you need to make sure you arent accidentally training a racist model. basically this is a huge task that even major well-funded police departments have completely failed to successfully tackle, and i doubt you will be able to do it on your own.

solid aurora Aug 24, 2020, 8:55 PM

#

and you need to make sure you arent accidentally training a racist model
THIS ^^^

desert oar Aug 24, 2020, 8:56 PM

#

but if you really feel like you want to try it, someone will have to sit down and label potentially thousands of still frames of security footage

upbeat ore Aug 24, 2020, 8:57 PM

#

what's the fastest face recognition right now ?

desert oar Aug 24, 2020, 8:57 PM

#

there are data labeling tools you can use or purchase for that task. then you gotta actually build a model on top of it which will likely require significant gpu computing.

#

that said apparently you can do OK fine-tuning existing models like YOLO v3 https://eng-memo.info/blog/yolo-original-dataset-en/

#

i would start there

merry ridge Aug 24, 2020, 8:58 PM

#

Do you mind telling me a bit more about what you do salt rock lamp? I'm just curious because the kind of work I see actively used and considered within the realm of ML sounds very different from yours.

desert oar Aug 24, 2020, 8:58 PM

#

@merry ridge data scientist

#

im curious what you see

solid aurora Aug 24, 2020, 8:59 PM

#

@upbeat ore you don't want fastest, you want most accurate

merry ridge Aug 24, 2020, 8:59 PM

#

I can describe the last few problems I've worked on if that helps. I just consider myself in data science

solid aurora Aug 24, 2020, 9:00 PM

#

I can make a "face recognition toolkit" that just assigns labels randomly

#

it can run in 0 ms, but it will have absolutely terrible performance

#

There's always a good medium balance between speed and accuracy

upbeat ore Aug 24, 2020, 9:00 PM

#

well, i was thinking that i need a fast one to be able to track the faces and weapons in between

solid aurora Aug 24, 2020, 9:00 PM

#

and you 100% want to err on the side of accuracy

upbeat ore Aug 24, 2020, 9:00 PM

#

and still be able to output stuff on monitor

solid aurora Aug 24, 2020, 9:00 PM

#

well what device is this running on?

#

a desktop computer with some sort of GPU should easily be able to handle 60fps if it's like a convnet

odd yoke Aug 24, 2020, 9:01 PM

#

easily is an overstatement tbh

upbeat ore Aug 24, 2020, 9:01 PM

#

would you mind explaining how to go about this? so basically i use yolo to detect the human, then extract the human and detect the face with facenet, then search in db for banlist, then look back at the full human and try to find the weapon there

#

or there's a better approach, sorry if this sounds stupid

desert oar Aug 24, 2020, 9:03 PM

#

you might have a lot of false positives with that system, although i guess thats good enough to go and send a guard to visually inspect

upbeat ore Aug 24, 2020, 9:04 PM

#

yeah that was the idea, just to know beforehand

#

as people with weapons sit near the gambling machines, we got 3 of those

#

and usually its like +10 minutes before the real stuff happens, so this is the frame we would like to catch and disable the person to not harm himself or any others

desert oar Aug 24, 2020, 9:05 PM

#

i mean this is not a small task and you'll have to test it a lot

#

but we are just now at a point where maybe this tech is within reach for a bar

upbeat ore Aug 24, 2020, 9:05 PM

#

there is no timeline for it

desert oar Aug 24, 2020, 9:05 PM

#

and not like, a giant company

#

@merry ridge sure

#

data science is a pretty broad range of job titles and tasks

#

i always like to know what other people work on

merry ridge Aug 24, 2020, 9:11 PM

#

@desert oar The last three projects I've had were classifying electricity prices to detect moments when a player in the market could change their bidding strategy to alter spot prices in a significant way; looking at modeling strategies to predict mean electricity price levels given some anticipated regulatory changes next year; and modelling how congestion affects crude oil prices in the US Gulf Coast. The last several data science positions I applied for in oil & gas, pipeline, electricity and other commodity markets all required very strong LP knowledge (which I am not that great at) because they still use quite a lot of Excel models in house.

desert oar Aug 24, 2020, 9:13 PM

#

ahh interesting

#

what do they use the LP models for?

merry ridge Aug 24, 2020, 9:13 PM

#

It's mainly because a lot of this stuff depends on economic factors, so the LP part mostly handles finding price equilibrium

desert oar Aug 24, 2020, 9:13 PM

#

i see. im mostly doing nlp classification nowadays, although my background was more in social science and statistics

merry ridge Aug 24, 2020, 9:14 PM

#

But I don't exactly enjoy it. It is finicky and frustrating to work on

desert oar Aug 24, 2020, 9:14 PM

#

hm yeah. honestly i only learned the simplex method in school

#

like i said its just not something ive ever needed

#

but i can probably think of times in the past where maybe it might have come in handy

#

e.g. i used to work in business travel, there were some problems i had at that job that i couldnt easily solve with standard "fit a predictive model" techniques

merry ridge Aug 24, 2020, 9:16 PM

#

A lot of this is being handed a paper that showed great results at some conference I've never been to

#

and being told to implement it in some completely different context that doesn't always even make sense

#

To fish out some competitive advantage and it is pretty tiring having nothing ever work

iron rampart Aug 24, 2020, 9:29 PM

#

Hey is this the right chat for machine learning based questions?

low glade Aug 24, 2020, 9:31 PM

#

@iron rampart yes

iron rampart Aug 24, 2020, 9:32 PM

#

Alright, so is it possible to create a machine learning script that can learn how to use a computer?

low glade Aug 24, 2020, 9:34 PM

#

@iron rampart uhh....woah good question, I think so in terms of the operating system but how far would you want it to go, like opening notepad or...

iron rampart Aug 24, 2020, 9:34 PM

#

Well doing tasks on its own

#

Lets start simple

#

So I've created a "bot" that can open spotify by moving the cursor to the right x and y coords. And then click

low glade Aug 24, 2020, 9:35 PM

#

hmm....I believe that's possible but, im not sure how to go about that

iron rampart Aug 24, 2020, 9:35 PM

#

But when i move the spotify icon it will be completely useless. So is there a way it can learn by itself where it is?

low glade Aug 24, 2020, 9:35 PM

#

gonna have to research

iron rampart Aug 24, 2020, 9:35 PM

#

Or should i start with a simpler idea

low glade Aug 24, 2020, 9:36 PM

#

nah that sounds good, doing automation with spotify but I'm new myself so I'm not sure how I would go about it

iron rampart Aug 24, 2020, 9:36 PM

#

Owhh

#

Cause i have no idea where to start

low glade Aug 24, 2020, 9:37 PM

#

what library do you use? @iron rampart

iron rampart Aug 24, 2020, 9:38 PM

#

Euhm tensorflow?

low glade Aug 24, 2020, 9:39 PM

#

oh aight I'm learning that too

#

but I'm gonna have to learn more about it

iron rampart Aug 24, 2020, 9:39 PM

#

Yeah me too

#

I just don't know where to start and what's impossible to make

desert oar Aug 24, 2020, 9:40 PM

#

@merry ridge fair enough, at least you have people with domain expertise guiding you. in most of my work im completely doing it all from scratch and i have no clue whats going on

#

grass is greener i suppose 😛

merry ridge Aug 24, 2020, 9:40 PM

#

Oh I certainly have no idea what I am doing

low glade Aug 24, 2020, 9:41 PM

#

@merry ridge 💀

desert oar Aug 24, 2020, 9:42 PM

#

@iron rampart "learn how to use a computer" is a big and ill-defined task. this is really more of an AI question than a ML question anyway

iron rampart Aug 24, 2020, 9:42 PM

#

What's the difference between those?

#

I thought they were sort of the same

low glade Aug 24, 2020, 9:43 PM

#

they are kinda tbh I believe machine learning has to do with models and coming up with algorithms to train them

#

but that's just my observation so far other than that it seems similar

desert oar Aug 24, 2020, 9:46 PM

#

id say that machine learning is "lower level"

#

AI would be like carrying out sequences of tasks, and reacting to unexpected input

#

whereas ML is simpler clearly-defined tasks

#

e.g. something like "identify individuals with weapons in surveillance video" is machine learning

#

but "identify threatening individuals in surveillance video" is AI, because then the model needs to learn the general concept of "threatening"

low glade Aug 24, 2020, 9:48 PM

#

@desert oar ahh I see, that makes better sense

desert oar Aug 24, 2020, 9:48 PM

#

however im not an AI practitioner so i can't claim to speak for the industry

#

but that's how i separate the two in my mind

#

in common data science practice, ML usually means making one-off predictions in a live/production setting

#

or generally just making predictions without human input

#

or even more loosely, it's sometimes just used to refer to techniques for building models that aren't "traditional" statistics

#

or even just for building models without really being concerned with statistical inference

#

its weird because its used all the time but nobody seems to have a good definition for what ML really is

iron rampart Aug 24, 2020, 10:01 PM

#

@desert oar Wow you seem pretty into machine learning... could you tell me where to start learning it?

earnest tundra Aug 24, 2020, 10:03 PM

#

Can anyone suggest some good projects which can be done by an intermediate data science learner but like mainly about data cleaning and preprocessing??

crude karma Aug 25, 2020, 12:32 AM

#

how much level of code should a data scientist know vs a programmer/coder

median relic Aug 25, 2020, 12:33 AM

#

pretty much comparable to any programmer if you are dealing with Deep learning per se

#

otherwise, just a fundamental understanding of algorithms and statistics is sufficient

crude karma Aug 25, 2020, 12:34 AM

#

im pretty new and interested in this field.. im learning code right now and I finished a stats course last year in college.. is deep learning a masters level thing?

median relic Aug 25, 2020, 12:36 AM

#

oh thats cool, no deep learning is not really a masters thing. It just happens to demand some prerequisites that are based in linear algebra, diff calc. And it has more to do with neural networks.

#

but otherwise like machine learning concepts it is pretty easy to learn

tidal bough Aug 25, 2020, 12:37 AM

#

https://www.coursera.org/learn/machine-learning
I recommend this introductory ML course to everyone. It covers linear and logistic regression, basic unsupervised learning, some Support Vector Machine stuff and neural networks (including personally implementing backpropagation).

#

So essentially the basics of all fields, and with very little required knowledge - only basic linear algebra, which the course provides materials and a refresher for.

#

free, too.

crude karma Aug 25, 2020, 12:39 AM

#

damn i haven't taken linear algebra in college

median relic Aug 25, 2020, 12:39 AM

#

@tidal bough yes, that is a good place. If any one is interested in deep learning I would highly recommend http://www.deeplearningbook.org/ and the UCL deepmind lecture series on youtube

crude karma Aug 25, 2020, 12:39 AM

#

i only took differential calculus and even then i got a bad mark rip

tidal bough Aug 25, 2020, 12:40 AM

#

...how did you take differential calculus without linear algebra? All the stability theorems are about matrix eigenvalues and stuff.

median relic Aug 25, 2020, 12:40 AM

#

@crude karma dont worry it is not that difficult, just put your mind to it. you can easily learn so many concepts quickly. Just don't think of it as some advanced concept

crude karma Aug 25, 2020, 12:40 AM

#

uh

#

our courses are split

#

so like

#

differential calculus, integral calculus, then linear algebra

#

thats the progression

tidal bough Aug 25, 2020, 12:41 AM

#

ah, I got it now

#

I thought you meant differential equations

#

differential calculus is indeed a lot earlier

crude karma Aug 25, 2020, 12:41 AM

#

i really have to re take diff calculus

flat quest Aug 25, 2020, 2:16 AM

#

i mean it depends on what you plan to do. @crude karma

you really won't need diff calculus to become an entry ML engineer or a data scientist.

Most of the linear algebra, differential calculus, statistics stuff only gets really important once you start going into research. Until then it's just familiarizing yourself with the techniques or models that have already been researched and proven.

crude karma Aug 25, 2020, 2:17 AM

#

thinking of going into industry rather than academia

flat quest Aug 25, 2020, 2:18 AM

#

well there's research in industry as well. Companies hire a number of ML researchers

desert oar Aug 25, 2020, 2:49 AM

#

i kinda disagree

#

linalg and stats are essential problem solving tools in my work

#

and calculus is just a necessary prerequisite for understanding pretty much anything

#

do you need it all on day 1? no. will you need it to actually make it through and understand the material? yes. will you be worse off without it? yes.

#

take it from me, who tried to get by for a long time learning as little "theory" as possible

#

if you don't know the underlying math at least roughly, you can't really know what you're doing or why it works / doesn't work.

#

intuition is nice but not enough

#

especially once you're past the entry level and the problems in front of you no longer resemble exactly things that you saw in your coursework and textbooks

#

maybe if you're lucky enough that your work allows you to dump everything into keras and call it a day, then fine

#

but i dont know of many people whose jobs are actually like that

velvet thorn Aug 25, 2020, 3:18 AM

#

^

#

this a thousand times

#

if you do not understand the concepts underlying the code you use (and I don't just mean the programming abstractions, but also the mathematics)

#

your work will be slow, inefficient and of low quality

#

and you will spend a ton of time not understanding the errors you get and the problems you face

#

I have trained data scientists and done freelance teaching

#

I cannot overstate the importance of a strong foundation

desert oar Aug 25, 2020, 3:21 AM

#

(where "errors" includes "my model kinda works but it performs badly sometimes", not just "ValueError")

storm scroll Aug 25, 2020, 4:18 AM

#

What’s the best and easiest python package to implement plots on my website, including playable interface and 3D graphs

velvet thorn Aug 25, 2020, 4:22 AM

#

(where "errors" includes "my model kinda works but it performs badly sometimes", not just "ValueError")
@desert oar edited to clarify

frank bone Aug 25, 2020, 4:59 AM

#

why cant i do if (var = some_function): continuewithcode else: break

#

goal is to continue the code if a value can be assigned to var. If there's an error assigning a value to var, it should break the loop

velvet thorn Aug 25, 2020, 5:06 AM

#

goal is to continue the code if a value can be assigned to var. If there's an error assigning a value to var, it should break the loop
@frank bone because that's not what if is for

#

but try-except
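A minimal sketch of the try-except approach suggested above. `some_function` here is a hypothetical stand-in that raises `ValueError` when it cannot produce a value; the real function and exception type would depend on the asker's code.

```python
def some_function(item):
    # Hypothetical stand-in: raises ValueError when no value can be produced.
    return int(item)


def collect_until_failure(items):
    results = []
    for item in items:
        try:
            var = some_function(item)
        except ValueError:
            # Assigning to var failed, so leave the loop.
            break
        # Assignment succeeded, so carry on with the rest of the loop body.
        results.append(var)
    return results


print(collect_until_failure(["1", "2", "oops", "4"]))  # [1, 2]
```

Catching the narrowest exception type that the call can raise keeps unrelated bugs from being silently swallowed.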

desert oar Aug 25, 2020, 5:07 AM

#

@frank bone in python 3.8+ you can do

if (var := some_function()):
    ...

frank bone Aug 25, 2020, 5:07 AM

#

tried except, worked well 🙂 thanks!

desert oar Aug 25, 2020, 5:07 AM

#

oh, i see

frank bone Aug 25, 2020, 5:07 AM

#

@desert oar good to know!

#

ill try that too

#

thx

desert oar Aug 25, 2020, 5:07 AM

#

it wont help with your specific question, i realized

#

but its a useful new feature otherwise
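For reference, a small sketch of what the `:=` operator does in Python 3.8+. It assigns and then tests the truthiness of the value in one expression; it does not catch exceptions, which is why it does not cover the original question. The `first_number` helper is made up for illustration.

```python
import re


def first_number(text):
    # := binds the match object to `match` and the if tests whether it is truthy.
    if (match := re.search(r"\d+", text)):
        return int(match.group())
    return None


print(first_number("loss after 12 epochs"))  # 12
print(first_number("no digits here"))        # None
```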

flat quest Aug 25, 2020, 6:09 AM

#

Yeah I agree, but it depends on what exactly your work is, and how far you plan to go in terms of an ML/DS career @desert oar

By all means, actually understanding the underlying mathematics is vital to get far in ML. But an entry engineer doesn't really need to know the underlying details.

For new people it might be worth just spending 4 - 5 months learning ML, and then getting an entry job for a while. And then come back and learn all those topics in depth

If you start getting into doing research, writing your own libraries, CUDA software, or just encountering problems that are very specific to a company, then those will be necessary.

merry ridge Aug 25, 2020, 6:13 AM

#

In my opinion, at the absolute minimum even a very basic crash course in linear algebra provides a mountain of intuition that is helpful at all levels of skill. It is the least forgivable of the mathematical corners to cut.

flat quest Aug 25, 2020, 6:17 AM

#

yeah prob that, stats, and gradients

bitter harbor Aug 25, 2020, 6:18 AM

#

I've found stats to be a bit less useful when building nn's but it's still 100% important for ml

flat quest Aug 25, 2020, 6:19 AM

#

its more so for DS
Actual neural nets it's not as important unless you're working with probabilistic models

bitter harbor Aug 25, 2020, 6:20 AM

#

that's what I mean, it's more used in pre/post processing

merry ridge Aug 25, 2020, 6:21 AM

#

So I'll be honest. I have no idea why stats is important.

#

Are you rolling regression techniques into the area of stats? Probability theory into stats? In my mind they are different things.

desert oar Aug 25, 2020, 6:22 AM

#

stats and machine learning have the same problem

#

nowadays theyre mostly just references to different sets of problem solving approaches

#

the far ends of stats are very different from the far ends of machine learning, but its really hard to draw a line between them. to the extent that theyre even different things

#

so yeah, id say that fitting a probability model counts as stats

#

as do all the various forms of hypothesis testing

#

i can't say that NHST is that useful in industry nowadays, but the concept i think is very important to understand

merry ridge Aug 25, 2020, 6:24 AM

#

Fair enough. At least in my experience, I feel like people cheat on the hypothesis testing part like crazy

desert oar Aug 25, 2020, 6:25 AM

#

true. depends on what industry

#

basically any time youre inferring parameters of a probability model, i have a hard time resisting the temptation to call that stats

#

fun debate topic: is holt winters smoothing stats or machine learning?

merry ridge Aug 25, 2020, 6:26 AM

#

Here, you learn linear regression etc. as part of your calculus curriculum, before Lagrange multipliers, and most of the probabilistic tools are taught in a proper probability theory course separate from stats, which is why I ask.

flat quest Aug 25, 2020, 6:26 AM

#

I mean neural networks are just stats in a nutshell. We just made the process more incremental.

We're still finding a distribution of generated values that closely match the original dataset. You won't likely use direct stat techniques in a conventional neural net though, at least not directly.

For Bayesian/Probabilistic models, though, those are pretty much grounded on distributions.

And then stats also helps with data analysis. You need to know what you're working with before you do any feature engineering or data cleaning.

desert oar Aug 25, 2020, 6:26 AM

#

I mean neural networks are just stats in a nutshell. We just made the process more incremental.
i dont know if this is true

bitter harbor Aug 25, 2020, 6:26 AM

#

fun debate topic: is holt winters smoothing stats or machine learning?
ml

desert oar Aug 25, 2020, 6:27 AM

#

"machine learning is making predictions without explicitly making statistical inferences, except where making such inferences is convenient for making better predictions"

#

its way past my bedtime

#

means and variances are probability theory... but estimating them from data is statistics

#

what other aphoristic platitudes can i come up with at 2:30 AM

flat quest Aug 25, 2020, 6:30 AM

#

well ML was largely built on the basis of stats.

I guess its difficult to say that they're exactly the same, but they follow very similar principles.

The line is very blurred. I guess the question would be what is stats in the first place, cause statistics is a very large encompassing term.

bitter harbor Aug 25, 2020, 6:30 AM

#

arguing that it's just stats when a computer makes an inference on a dataset using what is simply an equation is dumb. even a basic neural net is a series of equations that allows those inferences to be made, so just because something has a 'lower complexity' + can be done by hand, doesn't mean it's not machine (or ig if you really want to be specific: mathematical) learning

#

I think the 'learning' part of it is misleading

#

but then again, mathematically(/statistically) induced inferences doesn't roll off the tongue

desert oar Aug 25, 2020, 6:31 AM

#

in all seriousness, ML is fundamentally a problem domain, whereas statistics is a set of techniques for fitting and making inferences from probabilistic models. it just so happens that we now have a lot of methods for making predictions that arent inherently statistical, but still fulfill the task of ML (and happen to be useful in many other places), so we call them "ML techniques" and the whole thing becomes a terminological mess

#

statistics is one tool for approaching the task of ML

#

but you can minimize loss functions without explicitly appealing to statistics

#

whereas much of statistics does depend on minimizing loss functions

#

basically its all historical nonsense

merry ridge Aug 25, 2020, 6:33 AM

#

From my perspective, ML is mainly Numerical Analysis (which is basically taylor series and secant lines) and linear algebra with comparatively little stats sprinkled on the shoulders of those two giants.

bitter harbor Aug 25, 2020, 6:33 AM

#

see i'd argue ml is what happens when statistical models get combined with math

desert oar Aug 25, 2020, 6:33 AM

#

i dont see much numerical analysis used in ML at all @merry ridge , i think your experience is somewhat unique (and very interesting)

merry ridge Aug 25, 2020, 6:34 AM

#

What about something like gradient descent?

desert oar Aug 25, 2020, 6:34 AM

#

sure

merry ridge Aug 25, 2020, 6:34 AM

#

that is a core topic in every numerical analysis course

desert oar Aug 25, 2020, 6:34 AM

#

then, granted

#

its funny that traditional statistics historically depended on 2nd order optimization methods

#

when it turned out that gradient descent was good enough all along. maybe its just that computers used to be so much slower so you needed the faster convergence of 2nd order methods?
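A toy sketch of the first-order versus second-order trade-off being discussed, on a one-dimensional function chosen only for illustration: f(x) = exp(x) - 2x, whose minimum is at x = ln 2. The learning rate of 0.1 and the tolerance are arbitrary assumptions.

```python
import math


def grad(x):
    # First derivative of f(x) = exp(x) - 2x.
    return math.exp(x) - 2.0


def hess(x):
    # Second derivative of f.
    return math.exp(x)


def minimize(step, x=0.0, tol=1e-8, max_iter=10_000):
    """Repeat x <- x - step(x) until the gradient is nearly zero."""
    for i in range(max_iter):
        if abs(grad(x)) < tol:
            return x, i
        x -= step(x)
    return x, max_iter


# First-order: plain gradient descent with a fixed learning rate.
x_gd, n_gd = minimize(lambda x: 0.1 * grad(x))
# Second-order: Newton's method rescales the step by the curvature.
x_newton, n_newton = minimize(lambda x: grad(x) / hess(x))

print(n_gd, n_newton)
```

On this function the Newton iteration needs far fewer steps, but each step needs the second derivative. In many dimensions that becomes a Hessian matrix, which is expensive to form and invert.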

flat quest Aug 25, 2020, 6:35 AM

#

I mean I guess it depends on what you see ML models are doing. They fit to distributions on the data at the end of the day, whether they're good or not, using a number of statistical based techniques.

Well gradient descent on its own is not necessarily enough. We don't directly use the gradient anymore.

desert oar Aug 25, 2020, 6:36 AM

#

a neural network is not explicitly fitting any distribution, at least in the general case

#

is there always an implied model for the conditional expectation and an error distribution around said expectation? yeah

#

does that mean you can actually know or make use of that information? maybe, maybe not

merry ridge Aug 25, 2020, 6:37 AM

#

One of the problems with higher order methods is that they may be faster, but they may be less numerically stable

desert oar Aug 25, 2020, 6:37 AM

#

doesnt computing hessians also get pretty gnarly

merry ridge Aug 25, 2020, 6:38 AM

#

Truncating at x^2 is a bit of a trade off

desert oar Aug 25, 2020, 6:39 AM

#

what were you saying about the secant lines?

merry ridge Aug 25, 2020, 6:39 AM

#

I say secant lines because a lot of derivatives are just approximated by one
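A minimal sketch of approximating a derivative with a secant line, here using a central difference. The step size `h=1e-5` is an arbitrary choice for illustration.

```python
import math


def secant_slope(f, x, h=1e-5):
    # Slope of the secant line through (x - h, f(x - h)) and (x + h, f(x + h)).
    return (f(x + h) - f(x - h)) / (2 * h)


approx = secant_slope(math.sin, 1.0)
exact = math.cos(1.0)  # the true derivative of sin at x = 1
print(abs(approx - exact))
```

Making `h` smaller reduces the truncation error but amplifies floating-point rounding error, so there is a practical floor on the accuracy.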

desert oar Aug 25, 2020, 6:39 AM

#

ah

flat quest Aug 25, 2020, 6:40 AM

#

I mean the goal of a neural network in the majority of cases is to create values that are in the best case basically exactly matching to the real global dataset.

Unless I'm making a major error here, it seems to me that we're trying to approximate distributions

desert oar Aug 25, 2020, 6:42 AM

#

i dont think thats the case. you can say that your predictions should roughly follow the distribution of the target in the training data (assuming the features have the same distribution as in the training data), but that isnt really the goal

flat quest Aug 25, 2020, 6:43 AM

#

well roughly, because we don't have a true dataset encompassing all possible data points.

If we did we wouldn't need to worry about overfitting as much as we do now

desert oar Aug 25, 2020, 6:44 AM

#

however im pretty sure you can safely say without loss of generality that a neural network in a regression problem typically makes predictions of the form f(x) = E(Y|X=x)
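One way to see where that form comes from, assuming the network is trained with squared-error loss (an assumption; the chat does not name a loss): for any predictor f, the expected loss splits into a term that does not depend on f and a term that vanishes exactly when f(x) = E(Y | X = x). The cross term drops out by the tower property of conditional expectation.

```latex
\begin{aligned}
\mathbb{E}\big[(Y - f(X))^2\big]
  &= \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big]
   + \mathbb{E}\big[(\mathbb{E}[Y \mid X] - f(X))^2\big] \\
\Rightarrow\quad
\arg\min_f \, \mathbb{E}\big[(Y - f(X))^2\big]
  &= \big(x \mapsto \mathbb{E}[Y \mid X = x]\big)
\end{aligned}
```

With a different loss the target changes; absolute error, for example, leads to the conditional median instead.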

flat quest Aug 25, 2020, 6:45 AM

#

right

#

also on the topic of gradients, it is worthwhile wondering if the gradient is really the most useful thing. I mean the brain is using some form of learning, and its pretty good. I don't know if its using gradient descent and backprop - there was a recent paper on this using inverse functions - or something else

Tho current research on how synapses form between neurons and strengthen seems to suggest it's based on repeated firing.

desert oar Aug 25, 2020, 6:46 AM

#

i mean, gradient descent is just the optimization algorithm that happens to work on networks

#

and it also lends itself to this really elegant formulation in terms of computation graphs and layers

#

im sure hexicle can attest to all kinds of other specialized ways to find the parameters of a function that minimize average loss on a dataset

flat quest Aug 25, 2020, 6:47 AM

#

true, but its largely suited for supervised learning

merry ridge Aug 25, 2020, 6:48 AM

#

I mean, I know of them, but they are mostly a gigantic pain in the ass to use

desert oar Aug 25, 2020, 6:48 AM

#

if someone comes up with a better technique, then i dont think anyone is going to reject it out of hand, but youre going to have to make a case for why it beats the elegance and convenience of stochastic or batch gradient descent

flat quest Aug 25, 2020, 6:49 AM

#

oh yeah, there's been attempts at more biologically plausible neurons before

They haven't seemed to work as of yet though. No reason to use it unless it's actually performing above our current SOTA

desert oar Aug 25, 2020, 6:49 AM

#

right

#

hence all of these weird and interesting machine learning techniques have been relegated to history. goodbye to the radial basis functions and kernels and coordinate descent and support vectors

#

and this is why i dont know anything about anything

#

because i just dont need to anymore

merry ridge Aug 25, 2020, 6:51 AM

#

A lot of the fancier techniques get increasingly complicated to the point of absurdity

#

One of my former Numerical Analysis professors would tell us stories of specialized rooms with paper that hung from the top of the walls to the floor and they would write high order methods from step ladders to figure out all the coefficients needed in a formula.

desert oar Aug 25, 2020, 6:52 AM

#

heh

#

thats wild

flat quest Aug 25, 2020, 6:53 AM

#

i mean everything's getting more complicated in general, until someone goes and abstracts portions of it

#

^^

#

i wonder how long the actual equations were...

merry ridge Aug 25, 2020, 6:55 AM

#

It's been too long to remember the exact topic at the time

#

I'd have to ask them, but it is nearly impossible during the pandemic

flat quest Aug 25, 2020, 6:57 AM

#

yeah for sure

wait are u a grad student?

merry ridge Aug 25, 2020, 6:58 AM

#

No

#

I keep an office at my alma mater because I am an editor for a discrete mathematics journal and I didn't want to "take my work home". Also it is nice having a space at the university even if I show up once every 4 months.

flat quest Aug 25, 2020, 6:59 AM

#

ah gotcha gotcha

bitter harbor Aug 25, 2020, 7:18 AM

#

salt I want your opinion on something ;)

desert oar Aug 25, 2020, 7:18 AM

#

heh sure

bitter harbor Aug 25, 2020, 7:19 AM

#

I've started exploring linux based systems and found a laptop called the kubuntu focus

#

which claims:

#

📎 unknown.png

#

is that a thing?

desert oar Aug 25, 2020, 7:20 AM

#

i have no idea

bitter harbor Aug 25, 2020, 7:22 AM

#

like it's a i7-9750H 6c/12t 4.5GHz, 2060-80 rtx laptop for like 2.5k

#

and I get branding and all that

#

but I can't tell if that's bs or not

#

would you happen to know what an ai score is?

lapis sequoia Aug 25, 2020, 7:26 AM

#

I read 'up to' and 'often'

#

and I'm pretty sure that's marketing bs

velvet thorn Aug 25, 2020, 7:32 AM

#

that sounds super dodgy.

bitter harbor Aug 25, 2020, 7:49 AM

#

📎 unknown.png

odd yoke Aug 25, 2020, 8:09 AM

#

do not buy a laptop for deep learning

#

like, ever ever ever

#

it will have a combination of shitty gpu, shitty ventilation, and being overpriced

#

as well as generally not being upgradable

#

do yourself a favor and get a better desktop for half the price

willow kernel Aug 25, 2020, 8:13 AM

#

Hello - I would like to get some help condensing a large dataset. Was wondering if anyone has experience with this?

lost patio Aug 25, 2020, 8:15 AM

#

Hi @willow kernel, continuing from where we left off, let's write a little script to break up your dataset into separate files.

willow kernel Aug 25, 2020, 8:16 AM

#

Sounds good!

lost patio Aug 25, 2020, 8:17 AM

#

You have a csv file yes?

willow kernel Aug 25, 2020, 8:17 AM

#

Yes

lost patio Aug 25, 2020, 8:20 AM

#

Actually we should probably move to a help channel so we don't flood this one. Head over to #help-broccoli

frank bone Aug 25, 2020, 8:44 AM

#

any idea how you would go about converting "requests.models.Response" type of data (coming from an API call) into normal python type of format...dict, list, pandas?

#

i got it, converting it into json then going from there
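A rough sketch of the "convert to JSON, then go from there" route, using only the standard library. The payload is made up for illustration and stands in for `response.text`; with the requests library, `response.json()` returns the same parsed object directly.

```python
import json

# Made-up payload standing in for `response.text` from an API call.
payload_text = '{"results": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'

data = json.loads(payload_text)  # parsed into plain dicts and lists
rows = data["results"]           # a list of dicts, one per record
names = [row["name"] for row in rows]

print(names)  # ['a', 'b']
```

A list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)` when a table is wanted.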

arctic wedgeBOT Aug 25, 2020, 8:52 AM

#

Hey @vague portal!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

vague portal Aug 25, 2020, 8:54 AM

#

Hmm how can I share an excel file with you guys? I need some help with analysing this dataframe 😮

charred agate Aug 25, 2020, 9:14 AM

#

Hello all, I'm currently a comp sci student in college actively looking for motivated people to work on AI based projects with. TensorFlow would be the main API. I am always looking for like minded people to collaborate with. Pardon me if this is an inappropriate place for a message like this. Pointers to other places where I could look for people would be great. Networking during a pandemic is hard and this was the first place that came to mind. Thanks!

vague portal Aug 25, 2020, 9:14 AM

#

Basically, I have a large dataframe (200,000 x 10) that I want to analyse. I want to be able to group the data by one of the columns, and then make sub-groups within these groups. I think I could do this manually, but it will take me forever, is there a quicker way to do this than using for loops?
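A sketch of grouping and sub-grouping without explicit loops, assuming pandas. The column names and values are invented for illustration; passing a list of columns to `groupby` groups by the first column and then by the second within each group.

```python
import pandas as pd

# Toy frame with made-up columns; the real data would have 200,000 rows.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "year": [2019, 2020, 2019, 2019, 2020],
    "value": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Group by region, then by year within each region, and summarise each sub-group.
summary = df.groupby(["region", "year"])["value"].agg(["mean", "count"])
print(summary)
```

The result is indexed by (region, year) pairs, so a single sub-group can be looked up with `summary.loc[("south", 2019)]`.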

#

@charred agate check out R42 institute https://www.r42group.com/r42institutefellows

R42

R42 Institute - AI Fellows Program

The Fellowship is designed to develop the AI/machine learning, deep science, design thinking and entrepreneurial skills of emerging talent.

#

I'm on the water data science project and it would be helpful to have someone with machine learning experience to help us out

atomic forge Aug 25, 2020, 9:26 AM

#

!resources

arctic wedgeBOT Aug 25, 2020, 9:26 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

charred agate Aug 25, 2020, 9:28 AM

#

Hey thank you, I really appreciate it. I will check it out later today, it’s 5am for me. Thank you again @vague portal ! Also @atomic forge , thanks for the resources.

atomic forge Aug 25, 2020, 9:28 AM

#

np

random perch Aug 25, 2020, 2:34 PM

#

Im planning on buying and putting together a pc build with the new 3080 when it comes out. Do you guys know if its possible to train small models on gpus that aren't like a titan? I want to work on building a deep learning model to play chess.

lapis sequoia Aug 25, 2020, 3:15 PM

#

has anyone used eta-squared in here (ideally in python)? have a question about it

little shard Aug 25, 2020, 3:21 PM

#

Hi! I've been trying to screenshot data from a program so I could then convert it to text, because I want to automate the whole process. But for some reason, four separate screenshots are taken at the same time, although I set time.sleep() in multiple places. When I do the same thing without running the program (just the desktop is visible), the screenshots are taken separately. How can I delay screenshots while inside the program?

#

this is the part of the code:

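```python
import os
import subprocess
import time
from datetime import date
from datetime import datetime

import pyautogui as pag  # assumption: `pag` refers to pyautogui

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
from PIL import ImageGrab as scr  # assumption: `scr` is a screen grabber exposing grab(bbox=...)


def ocr_core(filename):
    # Point pytesseract at the local Tesseract install, then OCR the image.
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    text = pytesseract.image_to_string(Image.open(filename))
    return text


# Raw string so the backslashes in the Windows path are not read as escapes.
os.startfile(r'C:\Program Files\Stellarium\stellarium.exe')

time.sleep(8)  # wait for Stellarium to start

# Open the search dialog and look up the star.
pag.hotkey('f3')
pag.typewrite('Delta Cep')
pag.hotkey('enter')

time.sleep(4)
now_cep = datetime.now()
vrijeme_cep = now_cep.strftime('%m-%d-%Y %H-%M-%S')  # timestamp used in the file name
folder = 'images/'
filename = ' Delta Cep.png'
output_cep = folder + vrijeme_cep + filename
time.sleep(2)
im = scr.grab(bbox=(0, 0, 1919, 1079))
im.save(output_cep)
```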

flat quest Aug 25, 2020, 5:31 PM

#

hey @charred agate

Yeah looking for motivated AI/ML people as well, maybe we can work on a project together 🙂

proud iron Aug 25, 2020, 5:44 PM

#

Guys, what is one of the fastest ways for a newcomer in AI to start writing machine learning code. 🙂

iron rampart Aug 25, 2020, 5:44 PM

#

Where should i start with machine learning?

tidal bough Aug 25, 2020, 6:09 PM

#

@proud iron doing a coursera course, probably.

#

@iron rampart @proud iron https://www.coursera.org/learn/machine-learning I much recommend this one as an introductory ML course.

#

It has little to no needed background knowledge, yet covers most of the fields.

proud iron Aug 25, 2020, 6:11 PM

#

@tidal bough do you currently remember a resource that requires background Python knowledge or? 🙂

tidal bough Aug 25, 2020, 6:12 PM

#

There probably are introductory ML courses in Python, but I don't know them.

#

Doesn't really matter though, since all the courses in the https://www.coursera.org/specializations/aml specialization use Python, pretty much.

#

I've only done https://www.coursera.org/learn/practical-rl?specialization=aml of them, mind - wasn't very interested in the rest.

proud iron Aug 25, 2020, 6:25 PM

#

Cheers @tidal bough . 🙂

molten hamlet Aug 25, 2020, 6:27 PM

#

yo, so I got 2 kind of information, RGB map and height map,
how would I detect water groups?

📎 stacked.png

#

lets say I want to count water areas

tidal bough Aug 25, 2020, 6:30 PM

#

well, here it looks like water is just all the blue, lol

molten hamlet Aug 25, 2020, 6:30 PM

#

xD

#

genius

#

📎 Screenshot_from_2020-08-25_20-34-56.png

#

its kinda ml question

#

how to find and count objects

#

xd

#

orrr

#

I will simply create mask and adios

tidal bough Aug 25, 2020, 6:48 PM

#

you can just find contiguous areas of a color

desert cradle Aug 25, 2020, 6:50 PM

#

flood fill
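A minimal sketch of counting contiguous areas with a flood fill. It assumes the image has already been reduced to a boolean mask (for example, "is this pixel blue"), and it treats cells as connected when they share an edge; the grid is a toy example.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of truthy cells using an iterative flood fill."""
    if not mask:
        return 0
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited water cell: it starts a new region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


water = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(count_regions(water))  # 3
```

Using a queue instead of recursion avoids hitting Python's recursion limit on large images.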

haughty turtle Aug 25, 2020, 7:01 PM

#

hey

#

i was wondering how i could (with pandas or numpy if needed) detect categorical data ?

tidal bough Aug 25, 2020, 7:03 PM

#

WDYM by detecting categorical data?

haughty turtle Aug 25, 2020, 7:04 PM

#

get all the categorical columns into a new dataset

tidal bough Aug 25, 2020, 7:06 PM

#

You could check how many unique values there are in that column, and if it's, say, below 100, assume it's categorical.

haughty turtle Aug 25, 2020, 7:07 PM

#

it wont work everytime

#

cause if we have the speed of a car it can be unique every time (like: 140.274km/h)

tidal bough Aug 25, 2020, 7:10 PM

#

If your dataset has only like a dozen different values for a column, you can consider that column categorical even if the dataset's creators considered it continuous 😛

haughty turtle Aug 25, 2020, 7:11 PM

#

ok and what about that:

📎 unknown.png

#

the passenger class is a categorical column

#

however not all the values are unique

tidal bough Aug 25, 2020, 7:12 PM

#

There are only 2 unique values, so by my definition it'd be a categorical column.

#

Same with price, here.

#

dataset with 3 rows isn't much of a one, though

haughty turtle Aug 25, 2020, 7:13 PM

#

just an example

#

i need to deal with one with more than 100 columns

desert oar Aug 25, 2020, 7:19 PM

#

@haughty turtle if the column contains only integers thats one potential sign

#

moreso if they're all consecutive integers

#

whereas very large nonconsecutive nonnegative integers all of the same digit length could be id numbers

#

look for "type" or "category" or "class" in the column name
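A rough sketch combining two of the heuristics mentioned in this thread: a cap on the number of distinct values and a check for suggestive column names. It assumes pandas; `guess_categorical`, the threshold of 10 and the toy columns are all made up for illustration, and the result is only a guess that should be checked against the real schema.

```python
import pandas as pd


def guess_categorical(df, max_unique=10, name_hints=("type", "category", "class")):
    """Heuristic only: flag columns with few distinct values or a suggestive name."""
    flagged = []
    for col in df.columns:
        few_values = df[col].nunique(dropna=True) <= max_unique
        hinted = any(hint in str(col).lower() for hint in name_hints)
        if few_values or hinted:
            flagged.append(col)
    return flagged


df = pd.DataFrame({
    "passenger_class": [1, 3, 2, 3] * 5,
    "speed_kmh": [100.0 + 0.37 * i for i in range(20)],
})
print(guess_categorical(df))  # ['passenger_class']
```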

#

this seems like a very weird problem to have

haughty turtle Aug 25, 2020, 7:21 PM

#

its not haha

desert oar Aug 25, 2020, 7:21 PM

#

what is this data? where is it retrieved from? how is it stored?

arctic wedgeBOT Aug 25, 2020, 7:22 PM

#

Hey @haughty turtle!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

desert oar Aug 25, 2020, 7:22 PM

#

its pretty unusual to have to perform automated data processing on 100s of tables with unknown schemata...

haughty turtle Aug 25, 2020, 7:22 PM

#

i want to fillna

desert oar Aug 25, 2020, 7:23 PM

#

or rather, tables with 100s of columns with unknown schemata

haughty turtle Aug 25, 2020, 7:23 PM

#

i know the schemata

desert oar Aug 25, 2020, 7:23 PM

#

then use it

#

look at it

haughty turtle Aug 25, 2020, 7:23 PM

#

i can

#

but don't want to

#

cause i want to fill na in one line

desert oar Aug 25, 2020, 7:23 PM

#

"i got the instruction manual but its long and i dont want to read it, how can i detect the right screws to tighten on my car?"

haughty turtle Aug 25, 2020, 7:23 PM

#

that's why im doing a lib

#

i have plenty of other project to do and i want to create a lib to go faster

#

thats what code is supposed to do

desert oar Aug 25, 2020, 7:24 PM

#

by the time you figure out an algorithm for this you could have just typed out a list of categorical/numeric indicators by hand

#

if you want to do it for fun, then go for it. but this doesnt sound like an optimal use of your time

haughty turtle Aug 25, 2020, 7:28 PM

#

ill try, and if it shows results, publish it .... maybe it will help others to save time

iron rampart Aug 25, 2020, 7:48 PM

#

@tidal bough The course is in Python lang right?

tidal bough Aug 25, 2020, 7:50 PM

#

if you mean the coursera ML one - no, it uses Octave.

#

The programming assignments there all require manually writing code rather than using existing libraries anyway, so it wouldn't be much different if it were in Python.

#

Octave has builtin linear algebra support, so it's kinda like numpy 🙂

crisp jewel Aug 25, 2020, 7:57 PM

#

```python
from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
```

#

can anyone help me with this?

#

The function checks, whether one row of a DataFrame should belong to the test-set or the train-set.

#

I don't know how it works

spiral yew Aug 25, 2020, 8:00 PM

#

For deep learning, what OS do you guys recommend and most people use? I've heard that Windows is pretty bulky and Linux is the best. I cant use Linux for school since I need some software, so any recommendations? (I have a macbook pro as well)

crisp jewel Aug 25, 2020, 8:00 PM

#

use your macbook pro, that's what I would go for. besides, linux has most of the software that you need in school

spiral yew Aug 25, 2020, 8:04 PM

#

Is windows not good? I've seen some ml people use windows @crisp jewel

#

the specs of my macbook pro are pretty bad compared to my pc's (its a desktop pc not a laptop btw which i built this summer)

crisp jewel Aug 25, 2020, 8:07 PM

#

use the macbook

#

leave the desktop as windows

#

macbook for working

#

the other one sell it lmao

#

dont sell the monitor tho

vivid zenith Aug 25, 2020, 8:14 PM

#

k

spiral yew Aug 25, 2020, 8:25 PM

#

macbooks are pretty bad tbh, im not doing anything and the fan runs really fast

#

and for deep learning a macbook isnt the best idea because it doesnt even have a gpu

uncut shadow Aug 25, 2020, 8:29 PM

#

They have gpus tho

raven mulch Aug 25, 2020, 9:03 PM

#

In this video we look at a paper which proposes with theoretical and empirical evidence to use tempered sigmoids instead of ReLU (or in general exploding activation functions) to improve on differentially private stochastic gradient descent (DP-SGD). I would love to spark discussions here or on the youtube comment section about this paper!

Video: https://www.youtube.com/watch?v=g2acvGl99-k

Paper: https://arxiv.org/abs/2007.14191

Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis.


bitter harbor Aug 25, 2020, 9:26 PM

#

“the best tempered sigmoid achieves 98.1% test accuracy whereas the baseline ReLU model trained to provide identical privacy guarantees (ε = 2.93) achieved 96.6% accuracy.”

#

I’d like to see some proof of that

#

Also what does the ‘heat’ in figure 2 represent?

flat quest Aug 25, 2020, 10:38 PM

#

i see we have another Yannic Kilcher @raven mulch

raven mulch Aug 25, 2020, 10:40 PM

#

@bitter harbor it’s the testing accuracy

#

Hahaha @flat quest he’s great

velvet thorn Aug 26, 2020, 1:33 AM

#

```python
from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
```

@crisp jewel this is WILD

#

why would anyone do that

#

this is like a great example of "trying to be smart" IMO

fathom raptor Aug 26, 2020, 1:43 AM

#

alright so when i run that code this is what i get

📎 unknown.png

#

where these are all columns in the csv file

#

that i'm reading from

#

sorry if this is too vague im just starting with data science 😬

velvet thorn Aug 26, 2020, 1:44 AM

#

ah, this is the infamous Titanic dataset

fathom raptor Aug 26, 2020, 1:44 AM

#

indeed

velvet thorn Aug 26, 2020, 1:44 AM

#

are you asking

#

what each column means?

fathom raptor Aug 26, 2020, 1:44 AM

#

no, i get that

#

i'm just asking how this data.corr() statement works

velvet thorn Aug 26, 2020, 1:45 AM

#

incidentally, I would suggest data.corr().abs()['survived'].sort_values() instead

#

hm

#

are you asking about the meaning of the correlation coefficient?

fathom raptor Aug 26, 2020, 1:45 AM

#

lol no okay lemme try to word this better

solar bluff Aug 26, 2020, 1:45 AM

#

🛳️

fathom raptor Aug 26, 2020, 1:45 AM

#

okay first of all why are there [[]] around the "survived"

velvet thorn Aug 26, 2020, 1:46 AM

#

okay

#

that's a pandas question

#

so you know data is a DataFrame, right?

fathom raptor Aug 26, 2020, 1:46 AM

#

is there a channel for that lol

#

yeah i get that

velvet thorn Aug 26, 2020, 1:46 AM

#

which conceptually represents 2D data

fathom raptor Aug 26, 2020, 1:46 AM

#

mhm

velvet thorn Aug 26, 2020, 1:46 AM

#

okay, so you know what a Series is?

fathom raptor Aug 26, 2020, 1:46 AM

#

uhh

#

no :)

velvet thorn Aug 26, 2020, 1:46 AM

#

a Series represents 1D data

#

either a row or a column

#

so, for example, if you do data['survived']

#

you get the column representing whether each person survived or not

#

because a column is 1D, that's a Series

fathom raptor Aug 26, 2020, 1:47 AM

#

ohh

velvet thorn Aug 26, 2020, 1:47 AM

#

so you can think of a DataFrame as a collection of Series

fathom raptor Aug 26, 2020, 1:47 AM

#

like a vector?

velvet thorn Aug 26, 2020, 1:47 AM

#

yes

#

now, we used square brackets above

#

to access a single column of data

#

but what if we want to take multiple columns?

#

then we would pass a list

#

say we wanted the sex and age columns

#

data[['sex', 'age']]

#

which you can break down as:

columns = ['sex', 'age']
data[columns]

#

make sense?

fathom raptor Aug 26, 2020, 1:48 AM

#

yes

#

but in [['survived']] we only have one element in the list?

velvet thorn Aug 26, 2020, 1:49 AM

#

yes

#

so now

#

that's the difference between 2D data with one unit dimension and 1D data

#

in other words

#

if you did ['survived']

#

you would get a Series

#

but with [['survived']] you have a DataFrame with one column.

#

and the two are different things

fathom raptor Aug 26, 2020, 1:50 AM

#

ohh so data[['sex', 'age']] returns a dataframe okayy

velvet thorn Aug 26, 2020, 1:50 AM

#

just like in normal Python, [[1, 2, 3]] and [1, 2, 3] are different things

#

yup
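
To make the single- versus double-bracket distinction concrete, here is a minimal sketch. The tiny `data` frame is a made-up stand-in for the Titanic CSV (an assumption, not the real dataset); only the bracket behaviour matters.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic data.
data = pd.DataFrame({
    "survived": [1, 0, 1],
    "sex": ["female", "male", "female"],
    "age": [29.0, 40.0, 2.0],
})

col = data["survived"]      # single brackets: one column -> Series (1D)
frame = data[["survived"]]  # a list with one name -> DataFrame with one column (2D)

print(type(col).__name__, col.shape)      # Series (3,)
print(type(frame).__name__, frame.shape)  # DataFrame (3, 1)
```

The shapes are the quickest way to tell the two apart: `(3,)` has one dimension, `(3, 1)` has two.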

fathom raptor Aug 26, 2020, 1:51 AM

#

so data.corr().abs() without the [['survived']] would give a very large dataframe with every pairwise correlation coefficient i assume

#

lemme try it out

velvet thorn Aug 26, 2020, 1:52 AM

#

yes
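
A rough sketch of what the two expressions return, using a small made-up numeric frame (the column names mirror the Titanic data, the values are invented). Note that the real CSV also has text columns, and recent pandas versions may need `numeric_only=True` in `corr()` for those; that is an assumption about the reader's setup, not something from the chat.

```python
import pandas as pd

# Hypothetical numeric stand-in for the Titanic data.
data = pd.DataFrame({
    "survived": [1, 0, 1, 0, 1, 0],
    "pclass":   [1, 3, 1, 3, 2, 3],
    "age":      [29.0, 40.0, 2.0, 35.0, 24.0, 50.0],
    "fare":     [211.3, 7.9, 151.6, 8.1, 26.0, 7.2],
})

corr = data.corr()                 # square DataFrame: one row and one column per numeric column
strength = corr.abs()["survived"]  # Series: |correlation| of every column with 'survived'

print(corr.shape)                  # (4, 4)
print(strength.sort_values())      # weakest to strongest relationship with 'survived'
```

Taking `.abs()` before sorting ranks strong negative correlations alongside strong positive ones, which is presumably why it was suggested.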

fathom raptor Aug 26, 2020, 1:53 AM

#

ooh this is cool

#

thanks for the help :))

velvet thorn Aug 26, 2020, 1:53 AM

#

yw

versed violet Aug 26, 2020, 1:59 AM

#

Hello ! I have a csv which represents the temperature data for the 4 seasons, I want to add a precise number for each 90 iterations and I am a little stuck doing it with pandas

velvet thorn Aug 26, 2020, 2:04 AM

#

Hello ! I have a csv which represents the temperature data for the 4 seasons, I want to add a precise number for each 90 iterations and I am a little stuck doing it with pandas
@versed violet what do you mean precise number for each 90 iterations

#

like first 90 rows one number, next 90 rows one number, etc.?

versed violet Aug 26, 2020, 2:10 AM

#

📎 unknown.png

#

I want to count 94 rows, for example, and add a number to each of these 94 rows

#

My csv looks like this

📎 unknown.png

hasty grail Aug 26, 2020, 2:13 AM

#

From your image, do you mean

Add 3 to each of the first 79 rows
Add 2 to each of the next 93 rows
Add 5 to each of the 94 rows after that
...
?

versed violet Aug 26, 2020, 2:13 AM

#

Yes !

hasty grail Aug 26, 2020, 2:14 AM

#

Can you claim a help channel (read #❓｜how-to-get-help ) so that we can go more in-depth about it?

versed violet Aug 26, 2020, 2:17 AM

#

Yes 1mn just to read how to claim the help channel thanks !

fathom raptor Aug 26, 2020, 2:20 AM

#

quick question, how come if i replace '?' with numpy.nan i can still use .dropna() on the dataframe? does python have a built in nan data type? my intuition tells me that numpy.nan is different but idk

velvet thorn Aug 26, 2020, 2:21 AM

#

quick question, how come if i replace '?' with numpy.nan i can still use .dropna() on the dataframe? does python have a built in nan data type? my intuition tells me that numpy.nan is different but idk
@fathom raptor yes

versed violet Aug 26, 2020, 2:21 AM

#

@hasty grail it tells me I'm in a "Cool Down" except I've never opened a help channel

fathom raptor Aug 26, 2020, 2:21 AM

#

yes to python having a builtin nan?

velvet thorn Aug 26, 2020, 2:22 AM

#

to both

#

for the latter, float('nan')
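
A small sketch of why `.dropna()` still works after the replacement: `numpy.nan` is an ordinary Python float holding the NaN value, the same kind of object as `float('nan')`. The `age` column below is invented for illustration.

```python
import math
import numpy as np
import pandas as pd

# np.nan is a plain Python float, just like float('nan').
print(type(np.nan))               # <class 'float'>
print(math.isnan(float("nan")))   # True
print(np.nan == np.nan)           # False: NaN never compares equal, so use isnan/isna

# Hypothetical column where '?' marks a missing value.
df = pd.DataFrame({"age": ["29", "?", "40"]})
cleaned = df.replace("?", np.nan).dropna()
print(len(cleaned))               # 2
```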

hasty grail Aug 26, 2020, 2:22 AM

#

hmm not sure what that means, maybe one of the helpers/mods can elaborate?

versed violet Aug 26, 2020, 2:24 AM

#

📎 unknown.png

hasty grail Aug 26, 2020, 2:24 AM

#

yeah that is strange...

#

But to answer your question, a simple way would be just to read the entire file and put the contents into a list

#

Edit the list

#

then overwrite the file with the contents of the new list

versed violet Aug 26, 2020, 2:25 AM

#

Oh i see, and to write a loop to add the numbers right ?

hasty grail Aug 26, 2020, 2:26 AM

#

yes

versed violet Aug 26, 2020, 2:27 AM

#

That's where I have a problem, I can't really see how I can write the loop, like do I do it with a count or with a len ?

hasty grail Aug 26, 2020, 2:28 AM

#

you can use enumerate()

fathom raptor Aug 26, 2020, 2:28 AM

#

wait idk if this is a datascience question or just a noob programming question, but how come both of these syntaxes work?

📎 unknown.png

hasty grail Aug 26, 2020, 2:28 AM

#

!eval

for i, v in enumerate(['a', 'b', 'c']):
    print(i, v)

arctic wedgeBOT Aug 26, 2020, 2:28 AM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

hasty grail Aug 26, 2020, 2:29 AM

#

This is why being in a help channel would be helpful...

#

you can take a look at the output in #bot-commands

#

@fathom raptor It's analogous to calling class instance methods. You can call it by <obj>.<func> or <func>(<obj>)

versed violet Aug 26, 2020, 2:33 AM

#

This is why being in a help channel would be helpful...
@hasty grail I see yeah, thanks a lot I will try with what you gave me until now and wait for this "cooldown" to finish, thanks a lot for the help !
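
As an alternative to the read-into-a-list-and-loop approach, the "one number per block of rows" idea can be sketched with `numpy.repeat`. Everything here is an assumption for illustration: the 10-row frame, the column names, and the block sizes stand in for the real CSV and its 79/93/94-row seasons.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 temperature rows instead of the real CSV.
df = pd.DataFrame({"temp": np.arange(10, dtype=float)})

# (block length, number for that block) pairs; the lengths must sum to len(df).
blocks = [(3, 3), (4, 2), (3, 5)]
lengths = [n for n, _ in blocks]
values = [v for _, v in blocks]

per_row = np.repeat(values, lengths)   # one value per row: 3,3,3,2,2,2,2,5,5,5
if len(per_row) != len(df):
    raise ValueError("block lengths do not cover every row")

df["season_value"] = per_row                  # store the number as its own column...
df["temp_adjusted"] = df["temp"] + per_row    # ...or add it to an existing column
print(df)
```

With the real file, `df` would presumably come from `pd.read_csv(...)` and the result could be written back with `df.to_csv(...)`.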

velvet thorn Aug 26, 2020, 2:40 AM

#

wait idk if this is a datascience question or just a noob programming question, but how come both of these syntaxes work?
@fathom raptor the latter is more idiomatic

graceful glacier Aug 26, 2020, 3:37 AM

#

can anyone whos familiar with pandas tell me the difference between .nunique and .value_counts?

velvet thorn Aug 26, 2020, 3:40 AM

#

can anyone whos familiar with pandas tell me the difference between .nunique and .value_counts?
@graceful glacier have you tried calling both of them on the same data

#

the difference should be quite apparent

graceful glacier Aug 26, 2020, 3:47 AM

#

ok i got it after looking it up. just starting out with pandas so some of the concepts blur together for me
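
For reference, calling both on the same made-up Series shows the difference: `.nunique()` returns a single count of distinct values, while `.value_counts()` returns how often each distinct value occurs.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"])

n_distinct = s.nunique()     # a single integer
counts = s.value_counts()    # a Series indexed by the distinct values

print(n_distinct)            # 3
print(counts.to_dict())      # {'a': 3, 'b': 2, 'c': 1}
```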

junior quest Aug 26, 2020, 4:02 AM

#

is it even possible to save a matplotlib animation as a html file?

crude karma Aug 26, 2020, 5:13 AM

#

is the variable "axis" a built in variable in numpy???

bitter harbor Aug 26, 2020, 5:24 AM

#

idk about axis, but row and column are

#

like:

```python
for row in array:
    ...
```

#

but axis is a argument for most np functions

#

📎 unknown.png

#

ig it is

crude karma Aug 26, 2020, 5:44 AM

#

but how does python recognize axis

#

if axis = 0

flat quest Aug 26, 2020, 6:03 AM

#

probably through some properties but I'm not completely sure

bleak fox Aug 26, 2020, 6:11 AM

#

but how does python recognize axis
@crude karma 0 as row and 1 as column

hasty grail Aug 26, 2020, 6:13 AM

#

is the variable "axis" a built in variable in numpy???
I don't understand what you mean by "built in"

solid aurora Aug 26, 2020, 6:34 AM

#

ok i'm super sleep deprived but is there a clean, elegant way of iterating through "square sections" in a numpy 2d array?

#

i.e. if n=2 I would want to look at the four "quarters" of the array:

```python
>>> squares( np.reshape(np.arange(16), (4, 4)), n=2)
np.array([
  [[ 0, 1],
   [ 4, 5]],
  [[ 2, 3],
   [ 6, 7]],
  [[ 8, 9],
   [12,13]],
  [[10,11],
   [14,15]]
])
```

#

there doesn't happen to be a built-in numpy way of doing this, right?

#

I just have to use slices?

hasty grail Aug 26, 2020, 6:41 AM

#

I think there is a function for that

#

let me see..

#

ok no, apparently this is one of the things Tensorflow has but NumPy doesn't -_-

#

https://www.tensorflow.org/api_docs/python/tf/nn/with_space_to_batch?hl=en

crude karma Aug 26, 2020, 6:45 AM

#

like

#

how does axis know its 0 for row and 1 for column

#

you can name anything other than axis and have it assign 0 for row and 1 for column right?

velvet thorn Aug 26, 2020, 6:52 AM

#

how does axis know its 0 for row and 1 for column
@crude karma convention

#

by default, axis 0 is rows

crude karma Aug 26, 2020, 6:53 AM

#

convention?

velvet thorn Aug 26, 2020, 6:54 AM

#

yes, convention

#

is something about that unclear?

solid aurora Aug 26, 2020, 6:54 AM

#

@crude karma axis 0 is the outermost axis

#

i.e. you enter 0 lists before hitting the 0th axis

#

axis 1 is the second-most outermost axis

#

you must enter one list before you hit the 1st axis list
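
A short sketch of what `axis` means in practice: it is just a keyword argument that NumPy functions accept, and its value names the dimension that gets collapsed. The array is made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3): axis 0 has length 2, axis 1 has length 3

col_sums = a.sum(axis=0)    # collapse axis 0 (the rows): one result per column
row_sums = a.sum(axis=1)    # collapse axis 1 (the columns): one result per row

print(col_sums)             # [5 7 9]
print(row_sums)             # [ 6 15]
```

The numbers 0 and 1 simply index into `a.shape`, which is why the same convention extends to arrays with three or more dimensions.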

velvet thorn Aug 26, 2020, 6:55 AM

#

only under C-order

solid aurora Aug 26, 2020, 6:56 AM

#

true

velvet thorn Aug 26, 2020, 6:56 AM

#

but well

#

I've never seen F-order being used

solid aurora Aug 26, 2020, 6:56 AM

#

actually, how does numpy store arrays internally?

#

C-order?

velvet thorn Aug 26, 2020, 6:56 AM

#

no

#

that's what the

#

uh

#

order argument is for

solid aurora Aug 26, 2020, 6:56 AM

#

ah

#

what's the default?

velvet thorn Aug 26, 2020, 6:57 AM

#

it physically changes the memory layout

#

'C'

crude karma Aug 26, 2020, 6:57 AM

#

ah

velvet thorn Aug 26, 2020, 6:57 AM

#

which is why, if you change the order, you notice that the speed of iteration across specific axes changes

#

since memory contiguity is actually affected
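
A minimal sketch of the `order` point, assuming nothing beyond NumPy itself: the same values can be stored row-major ('C') or column-major ('F'). Indexing and axis numbers stay the same; only the memory layout, visible through the flags and strides, differs.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)   # default layout, order='C' (row-major)
f = np.asfortranarray(c)         # same values, column-major layout

print(np.array_equal(c, f))                              # True: same contents, same indexing
print(c.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])  # True True
print(c.strides, f.strides)                              # byte steps per axis differ
```

Iterating along the axis with the smallest stride touches neighbouring memory, which is where the speed difference mentioned above comes from.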

solid aurora Aug 26, 2020, 6:57 AM

#

mmhm

#

purely for research purposes, somebody could probably make a wrapper that lets you specify a custom axis order to persist on disk

#

it would probably just permute the axes it passes to numpy

#

and use numpy's C-order internally

#

anyway @hasty grail why is that implemented in tensorflow? lol

#

seems like something they should have contributed to numpy

hasty grail Aug 26, 2020, 6:59 AM

#

It can probably be implemented via a combination of NumPy ops

#

https://stackoverflow.com/questions/44357970/how-to-implement-tf-space-to-depth-with-numpy

#

Take a look at this
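
One way to get the "square sections" with plain NumPy ops is a reshape, an axis swap, and another reshape. This is a sketch, not the linked answer's code: the helper name `squares` comes from the question, and it assumes `n` is the side length of each block and that both dimensions are divisible by `n`.

```python
import numpy as np

def squares(a, n):
    """Split a 2D array into non-overlapping n x n blocks, ordered row by row.

    Hypothetical helper; assumes both dimensions are divisible by n.
    """
    h, w = a.shape
    if h % n or w % n:
        raise ValueError("array dimensions must be divisible by n")
    return (a.reshape(h // n, n, w // n, n)  # split rows and columns into (block, offset)
             .swapaxes(1, 2)                 # bring the two block indices to the front
             .reshape(-1, n, n))             # merge the block indices into one axis

blocks = squares(np.arange(16).reshape(4, 4), n=2)
print(blocks.shape)   # (4, 2, 2)
print(blocks[0])      # top-left quarter of the array
```

Iterating over `blocks` then visits the quarters in the order shown in the question.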

lapis sequoia Aug 26, 2020, 7:24 AM

#

UnimplementedError: 2 root error(s) found.
(0) Unimplemented: {{function_node __inference_train_function_16567}} File system scheme '[local]' not implemented (file: '../input/birdsong-resampled-train-audio-03/redcro/XC143214.wav')
[[{{node ReadFile}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional_5]]
(1) Unimplemented: {{function_node __inference_train_function_16567}} File system scheme '[local]' not implemented (file: '../input/birdsong-resampled-train-audio-03/purfin/XC171695.wav')
[[{{node ReadFile}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional_3]]
0 successful operations.
7 derived errors ignored.

#

whats this error?

hasty grail Aug 26, 2020, 7:28 AM

#

looks like you're running a TF model with distributed training?

#

or using a tf.data.Dataset object

#

in any case you need to provide more info than that for us to help, such as your actual code

lapis sequoia Aug 26, 2020, 7:34 AM

#

I'm using a tf.data.Dataset object

#

wait I'll make a pastebin link

arctic wedgeBOT Aug 26, 2020, 7:40 AM

#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia Aug 26, 2020, 7:41 AM

#

:p

hasty grail Aug 26, 2020, 7:45 AM

#

does pastebin not work?

jovial lotus Aug 26, 2020, 8:01 AM

#

hey all, I need help figuring out where to go in my project. Basically I have gathered a TON of data of "if x then y's" and have a bunch of probabilities regarding such. I need help figuring out how to pick the next candidates given a set of x values.

lapis sequoia Aug 26, 2020, 8:01 AM

#

does pastebin not work?
@hasty grail https://notepad.pw/47i2f3b1

#

hey all, I need help figuring out where to go in my project. Basically I have gathered a TON of data of "if x then y's" and have a bunch of probabilities regarding such. I need help figuring out how to pick the next candidates given a set of x values.
linear regression?

jovial lotus Aug 26, 2020, 8:03 AM

#

I think so, thats what I found when I googled it. Guess I'll have to look back into my AI homework 😫

lapis sequoia Aug 26, 2020, 8:04 AM

#

I'm assuming y's are continuous labels and x are features and you want to predict a y for x?

jovial lotus Aug 26, 2020, 8:04 AM

#

@lapis sequoia do you have any experience with it? Could I dm you?

hasty grail Aug 26, 2020, 8:04 AM

#

Oh, you're using TPU strategy to train the model

lapis sequoia Aug 26, 2020, 8:04 AM

#

yesss is that causing this

#

I didnt think it is

#

@jovial lotus sure

hasty grail Aug 26, 2020, 8:05 AM

#

is your model_dir a GCS bucket?

#

It has to be a GCS bucket from what I have read

lapis sequoia Aug 26, 2020, 8:06 AM

#

Nope it's on kaggle

#

working directory

hasty grail Aug 26, 2020, 8:08 AM

#

don't use TPU training then

lapis sequoia Aug 26, 2020, 8:09 AM

#

Hmm I've used TPU on kaggle before in the same way. Let me remove it and try

lapis sequoia Aug 26, 2020, 9:10 AM

#

@hasty grail OOM error with GPU xD

hasty grail Aug 26, 2020, 9:11 AM

#

umm how many params does the model have?

lapis sequoia Aug 26, 2020, 9:16 AM

#

350,000

hasty grail Aug 26, 2020, 9:17 AM

#

what does nvidia-smi return?

lapis sequoia Aug 26, 2020, 9:19 AM

#

worked

#

i reduced params

hasty grail Aug 26, 2020, 9:19 AM

#

what worked?

#

your GPU must be tiny

#

350k params is not that much in terms of GPU usage unless you got a bunch of attention layers...

lapis sequoia Aug 26, 2020, 9:20 AM

#

also removed Residual connections

#

not what i wanted but welp something to build on

hasty grail Aug 26, 2020, 9:20 AM

#

wait I see you are doing some signal transformations

#

so that's probably what ate up so much GPU memory

lapis sequoia Aug 26, 2020, 9:21 AM

#

Kaggle GPU tho. yes, I'm using a mel spectrogram on audio

#

but it's gonna take 50 hours to train ffs

upbeat ore Aug 26, 2020, 10:30 AM

#

I would like to hear your thoughts on this: first perform DETR object detection, and if the detected object is the desired one (e.g. a person), perform face detection with FaceNet on the cropped object. Would that be efficient in terms of speed?

#

DETR would also be really nice for finding other objects, such as weapons, which I'd need to detect in the same image as well

hidden halo Aug 26, 2020, 11:08 AM

#

I have a Pandas related question in #help-popcorn. Could someone please take a look.

desert oar Aug 26, 2020, 2:09 PM

#

@hidden halo did you get an answer?

hidden halo Aug 26, 2020, 2:10 PM

#

Not entirely

#

One sec, I'll post it here

#

I have dataframe in Pandas which looks like this:

+--------+------------+---------+-------+
|class   | student_id | subject | score |
+--------+------------+---------+-------+
|4       |           5| Maths   |     56|
|4       |           5| English |     65|
|4       |           6| Maths   |     73|
|4       |           6| English |     78|
+--------+------------+---------+-------+

I want to convert the subject column to column headers, while retaining all other columns as is, like this:

+--------+------------+---------+-------+
|class   | student_id | English | Maths |
+--------+------------+---------+-------+
|4       |           5|       65|     56|
|4       |           6|       78|     73|
+--------+------------+---------+-------+

#

📎 unknown.png

#

I got this. But I can't seem to be able to flatten it. Any ideas
df.set_index(['class', 'id', 'sub']).unstack('sub').reset_index()

desert oar Aug 26, 2020, 2:18 PM

#

oh, yeah thats a thing

#

i think if you do rename with a callable, it receives a tuple

#

let me check

desert oar Aug 26, 2020, 2:45 PM

#

@hidden halo
```python
data = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()

flat_colnames = ['_'.join(filter(None, ctup)) for ctup in data.columns.to_flat_index()]
data.columns = flat_colnames

print(data)
```

#

.to_flat_index() seems to be the trick you were missing

paper niche Aug 26, 2020, 2:57 PM

#

maybe I'm missing something, but couldn't you just have selected 'score' before resetting the index? as in df.set_index(['class', 'id', 'sub']).unstack('sub')['score'].reset_index()

#

also, pretty interesting point about .to_flat_index(), didn't know that was a thing

desert oar Aug 26, 2020, 2:57 PM

#

yeah that would work too @paper niche

#

seems more "opaque" though. if i wrote that code i would want to leave a comment like # selecting the column avoids creating a multi-index

paper niche Aug 26, 2020, 2:58 PM

#

sure, I agree. I like the pivot table solution better, I think the intention is clearer than setting index and unstacking

desert oar Aug 26, 2020, 2:59 PM

#

i do wish pivot wouldn't mess w/ the index though

#

e.g. if you already have a meaningful index you then have to keep track of the indexes to reset

#

it gets messy

hidden halo Aug 26, 2020, 2:59 PM

#

data = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()
This did not work for me, it threw an error saying Length mismatch: Expected 4 rows, received array of length 2
However, I tried the second part with my unstack method and that did the trick. Thanks a lot.
Now I'll go try to unpack what happened in that line.

desert oar Aug 26, 2020, 3:00 PM

#

@hidden halo this was my whole script, maybe your real data is different

import io
from operator import methodcaller

import pandas as pd


data_txt = '''
class   | student_id | subject | score
4       |           5| Maths   |     56
4       |           5| English |     65
4       |           6| Maths   |     73
4       |           6| English |     78
'''

data = pd.read_csv(io.StringIO(data_txt), sep='|') \
    .rename(columns=methodcaller('strip'))
data['subject'] = data['subject'].str.strip()

data1 = data.pivot(index=['class', 'student_id'], columns='subject').reset_index()
flat_colnames = ['_'.join(filter(None, ctup)) for ctup in data1.columns.to_flat_index()]
data1.columns = flat_colnames
print(data1)

hidden halo Aug 26, 2020, 3:00 PM

#

maybe I'm missing something, but couldn't you just have selected 'score' before resetting the index? as in df.set_index(['class', 'id', 'sub']).unstack('sub')['score'].reset_index()
@paper niche And apparently this does the trick too, in a much simpler manner as well.
Thanks a lot fickletofu

desert oar Aug 26, 2020, 3:01 PM

#

i would definitely leave comments explaining what this is doing

#

if i had to read that code id be confused

#

and also w/ fickle's method you have to prepend subject_ to the unstacked column names

paper niche Aug 26, 2020, 3:02 PM

#

yeah if you'd like to add a prefix to the pivoted columns, just go with salt rock lamp's solution

desert oar Aug 26, 2020, 3:03 PM

#

oooh wait

hidden halo Aug 26, 2020, 3:03 PM

#

Actually, this works fine for my usecase

📎 unknown.png

desert oar Aug 26, 2020, 3:03 PM

#

fickle's method names the axis

#

that's kind of nice

#

unstack is nice because it names the column index itself subject which i think is cool

hidden halo Aug 26, 2020, 3:05 PM

#

I haven't worked with multi-index data frames much, so I find this very confusing. Still trying to figure out what fickle's method does exactly by running it step by step

desert oar Aug 26, 2020, 3:06 PM

#

@hidden halo ~~selecting score selects a Series from the dataframe~~

#

~~so when you reset_index, that promotes the Series to a DataFrame with flat column names~~

#

rather, unstack creates a DataFrame with a multiindex column

#

the "outer" layer of the column axis has a score label

#

selecting that gives you just the "inner" layer, which is a DataFrame with a non-multi column index

#

then you reset_index on that, and the index "columns" become regular DataFrame columns

#

https://repl.it/@maximum__/flatten-multiindex-colnames#main.py

repl.it

maximum__

flatten multiindex colnames

A Python repl by maximum__

#

comparison of both methods

hidden halo Aug 26, 2020, 3:08 PM

#

@hidden halo selecting score selects a Series from the dataframe
@desert oar Yes, I get this now. I was not able to understand what was there inside score. After looking at it in multiple ways, now I figured

desert oar Aug 26, 2020, 3:09 PM

#

i was actually wrong in those first 2 lines 😅

#

look at the next

#

i crossed out the wrong parts

hidden halo Aug 26, 2020, 3:10 PM

#

Yeah, it makes sense. It's not very clear yet, I guess that will take a little more working with multi-index DFs for this to seem familiar. But I get the general idea.

#

One more question actually, I'm not able to get rid of the first column, which is basically the index, titled sub. I don't want the sub there, but reset index doesn't remove it.

#

It doesn't change at all

📎 unknown.png

desert oar Aug 26, 2020, 3:18 PM

#

that isnt the first column

#

that's the name of the column index

#

which is what i was saying before

#

see the example repl i posted? you need to .rename_axis(columns=None)

#

if you do df2.columns you will see that the result is an Index object with name='sub'

#

this is an artifact of selecting a single key from MultiIndex columns
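
Putting the unstack route together step by step, as a sketch on the four-row example table (the real data uses the shorter names `id` and `sub`, so the column names here are an assumption). As far as I know, passing a list to `index=` in `DataFrame.pivot` is only accepted from pandas 1.1.0 onward, which may explain the length-mismatch error on an older install; the unstack route avoids that.

```python
import pandas as pd

df = pd.DataFrame({
    "class": [4, 4, 4, 4],
    "student_id": [5, 5, 6, 6],
    "subject": ["Maths", "English", "Maths", "English"],
    "score": [56, 65, 73, 78],
})

wide = (
    df.set_index(["class", "student_id", "subject"])
      .unstack("subject")["score"]   # MultiIndex columns, then keep only the level under 'score'
      .rename_axis(columns=None)     # drop the leftover 'subject' name on the column index
      .reset_index()                 # turn class / student_id back into ordinary columns
)

print(wide)
print(wide.columns.tolist())   # ['class', 'student_id', 'English', 'Maths']
```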

hidden halo Aug 26, 2020, 3:21 PM

#

Oh

#

Got it, working now. Thanks. Will probably take some time till I fully understand these methods.

lapis sequoia Aug 26, 2020, 3:55 PM

#

Hi all. I have a question about the pandas module. I'm trying to delete rows from a series, but it looks like the drop() documentation only allows me to do this by making a completely new series. Is there a way to edit the current series I have? Because trying to do this in iterations runs into nightmare keyerrors, because I can't simply write series = series.drop([2])

#

I'm essentially asking if there's something in pandas that is the equivalent of .append or .remove in python's lists

solar bluff Aug 26, 2020, 4:00 PM

#

.drop() has an inplace argument if you need to modify the series in place.

#

There is also a .append() method on pandas series that will let you effectively concatenate one series to another.

versed violet Aug 26, 2020, 4:01 PM

#

@solar bluff are you good with pandas ?

solar bluff Aug 26, 2020, 4:02 PM

#

I use it every day in my job. I'm no world class expert or anything but I get by

desert oar Aug 26, 2020, 4:02 PM

#

@lapis sequoia note that drop drops by row label, not by numerical position

lapis sequoia Aug 26, 2020, 4:03 PM

#

Well I'm currently trying to add the inplace=True argument, but now printing out the series is printing "None"

desert oar Aug 26, 2020, 4:03 PM

#

df.drop(index=[2], inplace=True) might be the 100th row, or it might even be multiple rows, with the label 2

#

inplace=True makes .drop return None

lapis sequoia Aug 26, 2020, 4:03 PM

#

Oh, is there a way to drop by numerical position? the documentation is confusing to me

versed violet Aug 26, 2020, 4:04 PM

#

@solar bluff Yesterday @hasty grail helped me a lot to write code that goes through every line of a csv and adds a value depending on which season the line is in. Right now I have a little bug with the code and I can't find the error https://repl.it/repls/FamiliarMinorBases#main.py

repl.it

FamiliarMinorBases

A Python repl created by an anonymous user

solar bluff Aug 26, 2020, 4:04 PM

#

"inplace bool, default False

If True, do operation inplace and return None." sure enough

lapis sequoia Aug 26, 2020, 4:04 PM

#

I was doing some testing and series.drop([2]) seemed to remove the 3rd column (as I would expect)

desert oar Aug 26, 2020, 4:04 PM

#

import pandas as pd

s = pd.Series(list('abcdefghijklmnop'))

pos = [2]
s.drop(index=s.index[pos], inplace=True)
print(s)

#

wait, columns?

#

or rows

lapis sequoia Aug 26, 2020, 4:05 PM

#

Row

desert oar Aug 26, 2020, 4:05 PM

#

it works if the row labels happen to be the same as the row numbers

#

which is only true sometimes or by default

solar bluff Aug 26, 2020, 4:05 PM

#

I pretty much always avoid inplace so I'm not very well skilled with using that as an argument

lapis sequoia Aug 26, 2020, 4:05 PM

#

In that case, is there no way to essentially remove a row without having to make a new variable?

desert oar Aug 26, 2020, 4:05 PM

#

i just showed you

lapis sequoia Aug 26, 2020, 4:05 PM

#

Because again this creates so many keyerror problems

desert oar Aug 26, 2020, 4:05 PM

#

the keyerror problems have nothing to do with creating a new variable

#

the keyerror has to do with you confusing row numbers and row labels

#

s.drop(index=s.index[2], inplace=True)

should work

#

s.index gives you the row labels

#

so you can index that to get the relevant labels

#

then drop using that

#

also note that pandas doesn't make a copy of all the data even when you copy the Series

#

however if you are 100% sure that your row labels and row numbers are identical then you can just use drop(index=[2])

#

but if you for example do .sort_values() then the row labels will be out of order because the row labels stay attached to the rows

#

and then you'd have to .reset_index() to remove the out-of-order index and create a new correctly ordered index
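
A small made-up example of how labels and positions drift apart after sorting, and what each style of `drop` removes:

```python
import pandas as pd

s = pd.Series([30, 10, 20])   # default labels 0, 1, 2
s_sorted = s.sort_values()    # values 10, 20, 30 now carry labels 1, 2, 0

by_label = s_sorted.drop(index=2)                      # removes the row labelled 2 (value 20)
by_position = s_sorted.drop(index=s_sorted.index[2])   # removes the third row (label 0, value 30)

print(by_label.tolist())      # [10, 30]
print(by_position.tolist())   # [10, 20]
```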

lapis sequoia Aug 26, 2020, 4:08 PM

#

So what's the point of returning none?

desert oar Aug 26, 2020, 4:08 PM

#

because it operates in-place

#

list.append also returns None

lapis sequoia Aug 26, 2020, 4:08 PM

#

Ah I see

#

Okay, that seems to work. And evidently I need to read up on how pandas defines indexes and labels because I'm getting confused

desert oar Aug 26, 2020, 4:09 PM

#

im using the term "labels" loosely

#

a DataFrame has two "axes": the index (i.e. row labels) and the columns (i.e. column labels)

#

each "axis" is represented by an Index object, which is similar to but not the same as a Series

#

an Index has a dtype and can contain strings, numbers, dates, etc.

#

and you can do row and/or column lookups on DataFrames using the Index values

#

the .loc accessor does index lookups. the .iloc accessor does positional lookups

#

if you create a DataFrame and don't specify the index, you get a default RangeIndex which is just 1:1 with the row numbers

lapis sequoia Aug 26, 2020, 4:12 PM

#

So an index is not always numerical?

desert oar Aug 26, 2020, 4:12 PM

#

correct

#

data = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]}, index=list('abc'))
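
Using that same frame, a quick sketch of label-based versus positional lookup:

```python
import pandas as pd

data = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=list("abc"))

by_label = data.loc["b", "x"]    # row label 'b', column label 'x'
by_position = data.iloc[1, 0]    # second row, first column

print(by_label, by_position)     # 2 2
print(list(data.index))          # ['a', 'b', 'c']
```

Here `data.loc[1, "x"]` would raise a `KeyError`, because no row carries the label `1`.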

lapis sequoia Aug 26, 2020, 4:12 PM

#

I see

#

So s.drop(index=s.index[2], inplace=True) seems to work on its own, but when doing it in an iteration I still seem to be getting keyerrors

desert oar Aug 26, 2020, 4:14 PM

#

keyerror happens if the index value is missing

lapis sequoia Aug 26, 2020, 4:14 PM

#

So I'm trying to drop a row at row index value x, but it can't find x

#

Either because it's out of bounds or there's nothing there

#

Oh, I think it's happening because when you remove something from a series, no index values change

solar bluff Aug 26, 2020, 4:15 PM

#

"hey pandas, do this thing at this location"

#

pandas: "that location? what location? i don't see no location like that. KeyError"

desert oar Aug 26, 2020, 4:15 PM

#

thats correct @lapis sequoia

#

when you remove the 'a' row from the dataframe in my example, there is no longer an 'a' row

lapis sequoia Aug 26, 2020, 4:16 PM

#

Yeah, which I believe is something .remove would automatically take care of in python

#

I suppose I could find a way to work around that though

desert oar Aug 26, 2020, 4:16 PM

#

im not sure what you mean

#

can you show more of your code

#

what are you even trying to do

lapis sequoia Aug 26, 2020, 4:17 PM

#

Well, in a list, say [a, b, c]. if you do list.pop(2), list would be [a, b], with the indexes adjusting automatically

desert oar Aug 26, 2020, 4:17 PM

#

thats because the indexes are positional in a list

#

a DataFrame is more like an OrderedDict

lapis sequoia Aug 26, 2020, 4:17 PM

#

it's on another computer, but let me write something out to make it easier

desert oar Aug 26, 2020, 4:17 PM

#

it has both positional indexes and named indexes, i.e. keys

#

the keys in a dict don't adjust when you delete an entry

#

if you want to keep the row labels in sync with the row positions you need to .reset_index(drop=True) after every row deletion
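
The list-versus-Series difference can be sketched side by side (toy values, default integer labels assumed):

```python
import pandas as pd

items = ["a", "b", "c"]
del items[0]                  # positions shift: 'b' is now at position 0
first_item = items[0]

s = pd.Series(["a", "b", "c"])
s = s.drop(index=0)           # labels do not shift: the remaining labels are 1 and 2
labels_after_drop = list(s.index)   # a lookup of s[0] would now raise KeyError

s = s.reset_index(drop=True)  # relabel 0..n-1 so labels match positions again
first_value = s[0]

print(first_item, labels_after_drop, first_value)   # b [1, 2] b
```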

lapis sequoia Aug 26, 2020, 4:19 PM

#

        for j in range(len(products_b)):
            if products_b[j] == products_a[i]:
                deleted_indicies.append(i)
                products_b.drop(index=products_b.index[j], inplace=True)
                break

desert oar Aug 26, 2020, 4:19 PM

#

products_b and products_a are series objects? and the indexes are unique?

lapis sequoia Aug 26, 2020, 4:20 PM

#

they are both series objects, yes

#

As for the indices, I've just been using numerical positions

desert oar Aug 26, 2020, 4:21 PM

#

as in, you never call set_index on these right? and you never otherwise explicitly specified an index?

lapis sequoia Aug 26, 2020, 4:21 PM

#

right

desert oar Aug 26, 2020, 4:22 PM

#

deleted_indices = []
for i_a, val_a in products_a.items():
    for i_b, val_b in products_b.items():
        if val_a == val_b:
            deleted_indices.append(i_b)
            products_b.drop(index=i_b, inplace=True)
            break

does this work?

lapis sequoia Aug 26, 2020, 4:22 PM

#

that's what's giving the keyerror

desert oar Aug 26, 2020, 4:22 PM

#

oh?

#

oh i see

lapis sequoia Aug 26, 2020, 4:22 PM

#

even after adding the reset_index line

desert oar Aug 26, 2020, 4:22 PM

#

what if the key was deleted in a previous iteration?

lapis sequoia Aug 26, 2020, 4:23 PM

#

Yeah, thats the problem

desert oar Aug 26, 2020, 4:23 PM

#

you shouldnt modify something you're iterating over

#

you want to just delete the elements from b where they occur in a?

lapis sequoia Aug 26, 2020, 4:23 PM

#

Yeah, I'm trying to delete the elements from b when they occur in a so that the iteration doesn't take as long

desert oar Aug 26, 2020, 4:23 PM

#

for i, val in products_b[~products_b.isin(products_a)].items():
    ...  # do something with each element of b that is not in a

#

but you should always question iterating manually over a Series

#

usually you can go a lot faster by using .map or .apply

lapis sequoia Aug 26, 2020, 4:26 PM

#

I was wondering if there were better ways, because this is taking a very long time. the series are both pretty long

#

I'll look into map and apply

desert oar Aug 26, 2020, 4:26 PM

#

iterating over pandas series is very slow

#

compared to iterating over a list

#

what are you trying to do more generally?

lapis sequoia Aug 26, 2020, 4:28 PM

#

Basically I'm taking a master list of items, comparing them with another list of discontinued items

#

Ideally all of the discontinued items will be in the masterlist. If that's true it should be relatively simple to remove the discontinued items from the masterlist

#

Iteration like this was just the first thing that came to me

#

it's just going to take a long time when the masterlist has about 75k entries

desert oar Aug 26, 2020, 4:30 PM

#

yeah just use .isin

#

products_current = products_master[~products_master.isin(products_discontinued)]
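
#

the same trick also lets you check the "ideally everything discontinued is in the master list" assumption. a small sketch with invented product codes, where "p9" is deliberately missing from the master list:

```python
import pandas as pd

# made-up product codes; "p9" is discontinued but not in the master list
products_master = pd.Series(["p1", "p2", "p3", "p4", "p5"])
products_discontinued = pd.Series(["p2", "p5", "p9"])

# discontinued items that never appear in the master list
strays = products_discontinued[~products_discontinued.isin(products_master)]

# master list with the discontinued items filtered out
products_current = products_master[~products_master.isin(products_discontinued)]

print(strays.tolist())
print(products_current.tolist())
```

if strays comes back empty, the two lists are consistent.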

lapis sequoia Aug 26, 2020, 4:32 PM

#

yeah I figured I was overthinking something to the very bone

#

I suppose that line would imply iteration as well, though?

desert oar Aug 26, 2020, 4:32 PM

#

in C, internally.

#

not in python
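
#

to make that concrete, here is a sketch of the two versions side by side on generated placeholder data (the sku names are invented). they give the same result, the difference is only where the loop runs. no timings here, try timeit on your own data if you want numbers:

```python
import pandas as pd

master = pd.Series([f"sku{i}" for i in range(1000)])
discontinued = pd.Series([f"sku{i}" for i in range(0, 1000, 7)])

# Python-level loop: the interpreter handles one element at a time
discontinued_set = set(discontinued)
kept_loop = [val for val in master if val not in discontinued_set]

# same filtering, with the element-by-element work done inside pandas
kept_isin = master[~master.isin(discontinued)].tolist()

print(kept_loop == kept_isin)
```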

lapis sequoia Aug 26, 2020, 4:32 PM

#

Ah, i see

desert oar Aug 26, 2020, 4:33 PM

#

although it is a shame how slow iteration over a Series is

#

but thats a bigger design issue

lapis sequoia Aug 26, 2020, 4:34 PM

#

So I'm looking at the .isin documentation now. Once you get the series of booleans I suppose you can just filter out all of the "True"s

#

I would imagine with the drop() function

#

Oh you can't actually pass a series object into .isin(). Guess I'll have to convert products_b to a list

desert oar Aug 26, 2020, 4:36 PM

#

eh?

#

im not sure what you mean

#

pd.Series([1,2,3]).isin(pd.Series([1,2,4]))

lapis sequoia Aug 26, 2020, 4:36 PM

#

products_discontinued is a series itself

#

Oh, you can

#

the documentation says it only accepts a set or list-like

desert oar Aug 26, 2020, 4:36 PM

#

pd.testing.assert_series_equal(
    pd.Series([1,2,3]).isin(pd.Series([1,2,4,7,-5])),
    pd.Series([True, True, False])
)

lapis sequoia Aug 26, 2020, 4:36 PM

#

does that include a series object?

desert oar Aug 26, 2020, 4:37 PM

#

yes, pandas docs do a poor job of defining their terms

lapis sequoia Aug 26, 2020, 4:37 PM

#

aha, it's simple then

desert oar Aug 26, 2020, 4:37 PM

#

a "list-like" is a list, pandas series, numpy array, and a handful of other things
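
#

if you want to see it for yourself, pandas exposes the check it uses as pandas.api.types.is_list_like. a quick sketch with toy values:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

s = pd.Series([1, 2, 3])

candidates = {
    "list": [1, 2, 4],
    "set": {1, 2, 4},
    "Series": pd.Series([1, 2, 4]),
    "ndarray": np.array([1, 2, 4]),
}

# every one of these counts as list-like and is accepted by .isin
results = {name: s.isin(values).tolist() for name, values in candidates.items()}
for name, values in candidates.items():
    print(name, is_list_like(values), results[name])

# a plain string is not treated as list-like
string_is_list_like = is_list_like("abc")
print(string_is_list_like)
```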

#

and ~ is logical negation on a Series

#

so

pd.Series([1,2,3], index=['a', 'b', 'c']).isin(pd.Series([1,2,4,7,-5]))

is

pd.Series([True, True, False], index=['a', 'b', 'c'])

and therefore

~pd.Series([1,2,3], index=['a', 'b', 'c']).isin(pd.Series([1,2,4,7,-5]))

is

pd.Series([False, False, True], index=['a', 'b', 'c'])

#

itd be nice if there was a .notin method for efficiency but this is still a lot faster than iterating

lapis sequoia Aug 26, 2020, 4:44 PM

#

So my thought is that once I have the series of booleans, I could use the .iloc method to return the indices of all "True"s

desert oar Aug 26, 2020, 4:44 PM

#

you dont need that

lapis sequoia Aug 26, 2020, 4:44 PM

#

Oh, that's not true

desert oar Aug 26, 2020, 4:44 PM

#

you can index/subset a series with a boolean series

#

again:

products_current = products_master[~products_master.isin(products_discontinued)]
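
#

spelled out step by step, as a sketch with made-up product codes, so you can see there is no .iloc or drop() involved:

```python
import pandas as pd

products_master = pd.Series(["p1", "p2", "p3", "p4"])
products_discontinued = pd.Series(["p2", "p4"])

# boolean Series with the same index as products_master
mask = products_master.isin(products_discontinued)

products_current = products_master[~mask]   # keeps rows where mask is False
products_removed = products_master[mask]    # keeps rows where mask is True

print(mask.tolist())
print(products_current.tolist())
print(products_removed.tolist())
```

the result keeps the original row labels, so add .reset_index(drop=True) if you want them renumbered.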

lapis sequoia Aug 26, 2020, 4:47 PM

#

So is the ~ operator specific to pandas? ive never seen it before

desert oar Aug 26, 2020, 4:47 PM

#

no, the ~ is binary inversion

tidal bough Aug 26, 2020, 4:47 PM

#

It's the bitwise NOT operator. pandas overloads it to work elementwise on Series.

desert oar Aug 26, 2020, 4:47 PM

#

yeah

#

bitwise NOT, thats what its called
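
#

small sketch of the difference between ~ on a plain int, ~ on a boolean Series, and the not keyword (toy values only):

```python
import pandas as pd

# on a plain int, ~ flips every bit, so ~x equals -x - 1
inverted_int = ~5

# on a boolean Series, pandas applies the inversion to each element
flags = pd.Series([True, False, True])
inverted_flags = ~flags

# the `not` keyword cannot be overloaded elementwise, so it fails on a Series
try:
    not flags
    not_error = None
except ValueError as exc:
    not_error = exc

print(inverted_int)
print(inverted_flags.tolist())
print(type(not_error).__name__)
```

that is why masks are written with ~, & and | rather than not, and, or.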

lapis sequoia Aug 26, 2020, 4:48 PM

#

okay, so if i wanted to keep an archive of all the discontinued products i can just delete the ~

desert oar Aug 26, 2020, 4:48 PM

#

well the discontinued products are already in their own series..

lapis sequoia Aug 26, 2020, 4:48 PM

#

oh, getting carried away there

desert oar Aug 26, 2020, 4:49 PM

#

unless you want the intersection of products_master and products_discontinued, in which case yes

#

that said... what format does this data originally arrive in?

lapis sequoia Aug 26, 2020, 4:49 PM

#

Excel files

desert oar Aug 26, 2020, 4:49 PM

#

(note that in python, custom classes can override the behavior of various operators including +, -, /, *, &, |, ~)
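
#

a minimal sketch of how that works, using a toy Flags class invented purely for illustration (this is not how pandas implements it internally, just the same language mechanism):

```python
class Flags:
    """Toy container of booleans, made up for this example."""

    def __init__(self, values):
        self.values = list(values)

    def __invert__(self):            # called for ~obj
        return Flags(not v for v in self.values)

    def __and__(self, other):        # called for obj & other
        return Flags(a and b for a, b in zip(self.values, other.values))

    def __repr__(self):
        return f"Flags({self.values})"


a = Flags([True, False, True])
b = Flags([True, True, False])

print(~a)       # Flags([False, True, False])
print(a & ~b)   # Flags([False, False, True])
```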

#

sure, but they are in 2 different files?

#

or different sheets

lapis sequoia Aug 26, 2020, 4:50 PM

#

different files

desert oar Aug 26, 2020, 4:50 PM

#

the discontinued list and the master list

#

ok