#data-science-and-ml | Python | Page 233

desert oar Jul 11, 2020, 10:20 PM

#

you're solving the problem of "how do i select the rows i want"

tardy portal Jul 11, 2020, 10:21 PM

#

alright so since running the contains method, can't I just assign values to the column that if its either true or false?

desert oar Jul 11, 2020, 10:21 PM

#

show me what you mean

#

!rules 6

arctic wedgeBOT Jul 11, 2020, 10:21 PM

#

Rules

6. No spamming or unapproved advertising, including requests for paid work. Open-source projects can be showcased in #show-your-projects.

desert oar Jul 11, 2020, 10:21 PM

#

@lapis sequoia

tardy portal Jul 11, 2020, 10:21 PM

#

i'm running into an issue, but something along the lines of this: df_churn[(df_churn['AutoPayment'] == 'False'), 'AutoPayment'] = 0

desert oar Jul 11, 2020, 10:21 PM

#

eh?

#

what

#

back up

#

show me your whole code

tardy portal Jul 11, 2020, 10:22 PM

#


df_churn[(df_churn['AutoPayment'] == 'False'), 'AutoPayment'] = 0
df_churn[(df_churn['AutoPayment'] == 'True'), 'AutoPayment'] = 1```

desert oar Jul 11, 2020, 10:22 PM

#

oh boy

#

no

tardy portal Jul 11, 2020, 10:22 PM

#

hahaha

desert oar Jul 11, 2020, 10:22 PM

#

df_churn['AutoPayment'].str.contains('automatic') returns a new series

tardy portal Jul 11, 2020, 10:22 PM

#

yikes department?

desert oar Jul 11, 2020, 10:22 PM

#

almost nothing in pandas operates "in place" like that

#

at least without specifically requesting it

#

in fact almost nothing in python operates like that

#

well ok some methods on collections do

#

like list.append et al

tardy portal Jul 11, 2020, 10:24 PM

#

alrighty, let me see if I can figure this out, but there has to be a way to write it where the false and true values return 1 and 0

desert oar Jul 11, 2020, 10:24 PM

#

use .loc

#

remember

#

.loc allows you to select rows w/ a boolean series

#

right?

#

.str.contains returns... a boolean series

#

i'll give you a hint because i don't want you to spend all day on this: make a column of 0s and then assign 1s to the rows that meet your desired condition

tardy portal Jul 11, 2020, 10:25 PM

#

so then I should use the .str.find('automatic')

desert oar Jul 11, 2020, 10:25 PM

#

no

#

use like

#

90% less brain

#

be less clever

#

more stupid

tardy portal Jul 11, 2020, 10:25 PM

#

hahaha

#

if I could buy you a beer I would buy you one

desert oar Jul 11, 2020, 10:25 PM

#

start with my hint

#

go from there

#

hah that's ok i have some ciders here

tardy portal Jul 11, 2020, 10:39 PM

#

should I convert the new column to numeric?

desert oar Jul 11, 2020, 10:43 PM

#

no

#

again

#

more simple

#

make a new column of 0s

#

subset the rows that meet your desired condition

#

assign 1 to those rows

#

that's it

tardy portal Jul 11, 2020, 10:43 PM

#

am I able to do that without creating a new column?

desert oar Jul 11, 2020, 10:43 PM

#

make a new column of 0s

tardy portal Jul 11, 2020, 10:44 PM

#

the question only requests only 1 column to be added thats why I ask

desert oar Jul 11, 2020, 10:44 PM

#

?

#

make a new columns of 0s, and replace the data with 1s where you need it to be 1

#

yes you are creating exactly 1 new column in the dataframe

tardy portal Jul 11, 2020, 10:45 PM

#

which I already did which contains the values of True or False

desert oar Jul 11, 2020, 10:45 PM

#

you don't need to do that

#

you know how to use .loc with a series of True and False values right?

tardy portal Jul 11, 2020, 10:47 PM

#

df_churn.loc[df_churn['AutoPayment'] == True, 'AutoPayment']```

#

correct or no?

desert oar Jul 11, 2020, 10:47 PM

#

you're doing too much still

#

back up

#

and follow the steps i provided for you

#

you just have df_churn and the 'PaymentMethod' column

#

make me a new column called 'AutoPayement' that contains only 0s

#

just show me how to do that

tardy portal Jul 11, 2020, 10:49 PM

#

okay, ill drop the autopayment column that already exists

#

and recreate it

desert oar Jul 11, 2020, 10:49 PM

#

yes

#

this shouldn't be difficult, again don't think so hard

#

literally just make a new column in a dataframe

#

with any name

#

full of 0s

#

show me how to do that, 1 line

tardy portal Jul 11, 2020, 10:55 PM

#

df_churn['AutoPayment'] = 0

desert oar Jul 11, 2020, 10:55 PM

#

great

#

so

#

now

#

some of those values need to be 1s

#

right now they are all 0s

#

how do you do that, with .loc

tardy portal Jul 11, 2020, 10:56 PM

#

df_churn.loc[df_churn['AutoPayment'] == 0]

desert oar Jul 11, 2020, 10:56 PM

#

what do you think that code does

#

hint: you aren't assigning anything in that code

tardy portal Jul 11, 2020, 10:56 PM

#

its locating all the values of 0 within that column?

desert oar Jul 11, 2020, 10:56 PM

#

sure

#

but we already know that's literally every value

#

so df_churn['AutoPayment'] == 0 is just a Series containing only True

#

which isn't helpful

tardy portal Jul 11, 2020, 10:57 PM

#

Okay

desert oar Jul 11, 2020, 10:57 PM

#

df_churn['AutoPayment'] = 0
df_churn.loc[a_series_containing_boolean_values, 'AutoPayment'] = 1

#

does this make sense to you

#

again, way simpler than you're trying to make it out to be

tardy portal Jul 11, 2020, 10:59 PM

#

so df_churn.loc[df_churn['PaymentMethod'].str.contains('automatic'0, 'AutoPayment'] = 1?

desert oar Jul 11, 2020, 10:59 PM

#

fix your typo 🙂

tardy portal Jul 11, 2020, 11:00 PM

#

df_churn.loc[df_churn['PaymentMethod'].str.contains('automatic'), 'AutoPayment'] = 1

desert oar Jul 11, 2020, 11:01 PM

#

great

#

perfect

#

does that make sense to you now?

tardy portal Jul 11, 2020, 11:01 PM

#

yes, but I didn't know you could utilize the .loc method that way

desert oar Jul 11, 2020, 11:01 PM

#

ah

#

yeah

#

i have no idea how your instructor expected you to do this without knowing anything useful about pandas

#

hopefully that helps with your other assignments..

tardy portal Jul 11, 2020, 11:02 PM

#

the thing is everything is online, so she uploads videos per module, but does not go into depth

desert oar Jul 11, 2020, 11:02 PM

#

i see

#

that's unfortunate

tardy portal Jul 11, 2020, 11:02 PM

#

which is annoying because i'm paying over 1k for this class

slim fox Jul 11, 2020, 11:02 PM

#

drawbacks of online classes

desert oar Jul 11, 2020, 11:02 PM

#

yeah :/

tardy portal Jul 11, 2020, 11:03 PM

#

3 hecking credits

desert oar Jul 11, 2020, 11:03 PM

#

that said a lot of instructors would just do the same in lecture

#

so at least you get to do it at home in your pajamas or w/e

slim fox Jul 11, 2020, 11:03 PM

#

this

tardy portal Jul 11, 2020, 11:03 PM

#

you should change your name to 'salt rock champ'

desert oar Jul 11, 2020, 11:03 PM

#

sorry that took so long but hopefully it was educational

slim fox Jul 11, 2020, 11:03 PM

#

we had few professors just reading their slides in monotonous voice

desert oar Jul 11, 2020, 11:03 PM

#

ugh thats the worst @slim fox

#

i prefer being a lamp but i appreciate the sentiment

slim fox Jul 11, 2020, 11:04 PM

#

it is. we literally slept few lectures

#

and just went to the park few times

#

cause.... you miss nothing

#

as he gives .ppts

#

well. gave back then

#

considering his age he might no longer teach

tardy portal Jul 11, 2020, 11:07 PM

#

soo...I have one more question and ill be out of your guys' head for today

#

@desert oar if that's okay?

#

its in relation to the previous question

desert oar Jul 11, 2020, 11:09 PM

#

yeah just ask

#

if i cant help lossberg or someone else probably can

tardy portal Jul 11, 2020, 11:10 PM

#

alrighty

#

uno momento

#

is there a .churn method?

desert oar Jul 11, 2020, 11:11 PM

#

what would you want such a method to do

tardy portal Jul 11, 2020, 11:12 PM

#

📎 unknown.png

#

df_churn.loc[df_churn['Churn'] != 'Yes', 'AutoPayment'].agg()```

#

am I on the right track here?

lapis sequoia Jul 11, 2020, 11:23 PM

#

hi

#

Im making a bar graph in matplotlib, but my xaxis it too squished, how could i fix this?

desert oar Jul 11, 2020, 11:24 PM

#

@tardy portal

#

no

#

.sum()

tardy portal Jul 11, 2020, 11:27 PM

#

so should I find the sum and then find the percentage?

desert oar Jul 11, 2020, 11:27 PM

#

sure

#

or you can do .mean(), that's a nice shortcut

#

but stop and think for a sec about why .mean works

#

before you just do it

tardy portal Jul 11, 2020, 11:28 PM

#

but the thing is its asking for a percentage of two values within a column

desert oar Jul 11, 2020, 11:28 PM

#

eh?

#

you know what, just use sum

tardy portal Jul 11, 2020, 11:28 PM

#

mean is just the average I don't think that helps, from my understanding atleast

desert oar Jul 11, 2020, 11:28 PM

#

yeah

#

it works, trust me

#

but don't use it

#

it's just a math trick

#

use .sum() and divide

#

go with what make sense to you

#

you'll learn tricks later

tardy portal Jul 11, 2020, 11:30 PM

#

i'm trying to figure it out why you would use mean

#

I guess its one step less to take?

desert oar Jul 11, 2020, 11:31 PM

#

yeah. again not important

#

just a shortcut

#

and the point of homework isn't to practice shortcuts

#

i shouldn't have even mentioned it

tardy portal Jul 11, 2020, 11:31 PM

#

making me overthink! lmao

#

jk

desert oar Jul 11, 2020, 11:31 PM

#

yeah exactly

#

was a mistake 🙂

#

i need to listen to my own advice

tardy portal Jul 11, 2020, 11:32 PM

#

let me see if I can figure this out on my own and return the code and see if its correct

#

is there a way to combine both arguments and divide two times?

#

one with yes and one with no?

#

df_churn.loc[(df_churn['Churn'] == 'Yes') & (df_churn['Churn'] 1= 'Yes'), 'AutoPayment'].sum()

#

so that would be the total

#

for some reason that returns 0

desert oar Jul 11, 2020, 11:44 PM

#

pandas and python aren't magic

#

what does (df_churn['Churn'] == 'Yes') & (df_churn['Churn'] != 'Yes') evaluate to?

tardy portal Jul 11, 2020, 11:48 PM

#

0 technically because its combining rows with both values?

desert oar Jul 11, 2020, 11:49 PM

#

df_churn['Churn'] == 'Yes' evaluates to a Series

#

containing True and False values

#

i.e. dtype bool

#

df_churn['Churn'] != 'Yes' also evaluates to a bool Series

#

& does elementwise logical operations on 2 Series objects

#

so (df_churn['Churn'] == 'Yes') & (df_churn['Churn'] != 'Yes') will be a series containing all False

tardy portal Jul 11, 2020, 11:50 PM

#

but within the dataframe under that column it only shows 'yes' and 'no' values

desert oar Jul 11, 2020, 11:50 PM

#

huh?

#

df_churn['Churn'] contains whatever

#

df_churn['Churn'] == 'Yes' evaluates to a new Series

#

containing only True and False

#

that is how == works in pandas

tardy portal Jul 11, 2020, 11:50 PM

#

📎 unknown.png

desert oar Jul 11, 2020, 11:51 PM

#

df_churn['Churn'] == 'Yes' is not the same as df_churn['Churn']

tardy portal Jul 11, 2020, 11:51 PM

#

ahh okay so it should just be '='

desert oar Jul 11, 2020, 11:51 PM

#

no

#

no no no

#

= is assigment

#

python, like most programming languages, consists largely of evaluating sections of code called expressions

#

df_churn['Churn'] == 'Yes' is an expression

#

it's like 3 + 4

#

that's a math expression, which evaluates to some result according to how 3, 4, and + are defined

#

in this case we know + is addition and that 3+4 should evaluate to 7

tardy portal Jul 11, 2020, 11:52 PM

#

so how am I suppose to use the .loc method for both yes and no values for the churn column?

desert oar Jul 11, 2020, 11:53 PM

#

do it separately like you did the first time?

#

do one for yes and one for no

#

or use what you know mathematically about how percentages work

#

either one is valid

tardy portal Jul 11, 2020, 11:53 PM

#

okay, so don't combine them, got it

#

alright so i am on the right track here

desert oar Jul 11, 2020, 11:53 PM

#

yeah it's the same reason i regretted using the .mean trick

#

don't look for a clever elegant solution while you're still learning the basics

tardy portal Jul 11, 2020, 11:54 PM

#

I can't just divide each line of code by 100, that would be silly

#

correct?

desert oar Jul 11, 2020, 11:54 PM

#

that doesn't even make sense, so fortunately you don't have to do it

#

compute the thing you want, in this case a sum

#

plop it into a new variable

#

and go from there

tardy portal Jul 11, 2020, 11:55 PM

#

is there a total method I could pass? similar to .sum?

desert oar Jul 11, 2020, 11:56 PM

#

what do you mean?

#

something that computes "the number of True values in this Series"?

tardy portal Jul 11, 2020, 11:57 PM

#

I want to be able to combine both True and False values to get a total

desert oar Jul 11, 2020, 11:57 PM

#

im still not sure i understand what you're asking

tardy portal Jul 11, 2020, 11:57 PM

#

in order to find the percentage

desert oar Jul 11, 2020, 11:57 PM

#

look at the question you're being asked

#

q 12

#

it's asking something very straightforward and you already posted the answer

tardy portal Jul 11, 2020, 11:58 PM

#

I did not post the answer, I wish I did

desert oar Jul 11, 2020, 11:58 PM

#

state, in plain words, how you'd solve q 12

tardy portal Jul 11, 2020, 11:59 PM

#

yeah take both sums and separately divide by the total

desert oar Jul 11, 2020, 11:59 PM

#

ok

#

so?

#

go ahead and do that

tardy portal Jul 12, 2020, 12:01 AM

#

df_churn.loc[df_churn['Churn'] == 'Yes', 'AutoPayment'] / df_churn['Churn']

desert oar Jul 12, 2020, 12:01 AM

#

explain to me what that code does

tardy portal Jul 12, 2020, 12:04 AM

#

dividing the sum by the total, but that does not work

desert oar Jul 12, 2020, 12:04 AM

#

that's because that isn't what the code does 🙂

#

df_churn.loc[df_churn['Churn'] == 'Yes', 'AutoPayment'] what does this evaluate to?

tardy portal Jul 12, 2020, 12:06 AM

#

all of the 'yes' values in the churn column that also has autopayment?

desert oar Jul 12, 2020, 12:06 AM

#

more specifically

#

a Series, containing values from the 'AutoPayment' column such that df_churn == 'Yes'

#

so a Series of 1s and 0s

tardy portal Jul 12, 2020, 12:08 AM

#

shouldn't it be df_churn.loc[df_churn['AutoPayment'] == 0, 'Churn'].sum()

desert oar Jul 12, 2020, 12:08 AM

#

does that fix things?

#

what does that code do?

#

you were closer before

tardy portal Jul 12, 2020, 12:11 AM

#

nvm the last code I posted, so df_churn.loc[df_churn['Churn'] == 'Yes', 'AutoPayment'].sum() should locate the 'Yes' values from the 'Churn' column in which also includes the 0 values from the 'AutoPayment' column?

desert oar Jul 12, 2020, 12:11 AM

#

the code is right but your explanation doesn't make sense

#

the first argument to .loc selects rows

#

the 2nd argument selects columns

#

so this is values from the 'AutoPayment' column, from rows where df_churn['Churn'] == 'Yes' contains True

tardy portal Jul 12, 2020, 12:15 AM

#

I think i'm conducting the answer incorrectly

desert oar Jul 12, 2020, 12:15 AM

#

i think you need to think a bit more "methodically" about code

#

less MS Word more MS Paint, if that makes any sense

#

it looks like you're kind of just guessing at how to put these lines of code together

#

rather than being careful to write code that specifically corresponds to the logic you want

#

i don't mean to be insulting by saying that. it's a common thing that novices try to do

tardy portal Jul 12, 2020, 12:17 AM

#

no offense taken

desert oar Jul 12, 2020, 12:18 AM

#

coding is a continuous process of 1) writing step by step logic, aka an "algorithm", 2) figuring out how to express that logic with the tools provided to you by the programming language

#

sometimes it's iterative: you think you've expressed your logic correctly, then you realize there's no way to implement it in the way that you wanted

#

so you have to go back and revise your logic

#

etc

#

and always always remember that a computer, and by extension a programming language like python, is stupid

#

it cannot know what you mean

#

it can only follow the exact instructions you provide to it

tardy portal Jul 12, 2020, 12:19 AM

#

no i understand that

desert oar Jul 12, 2020, 12:19 AM

#

so you have to be very clear about what you want to do

#

in order to provide the correct instructions

#

good

#

hopefully that helps you adjust your thinking for the rest of these problems

tardy portal Jul 12, 2020, 12:20 AM

#

i'm just trying to figure out the total so I can divide and find the percentages, but i'm not sure how to do that correctly

desert oar Jul 12, 2020, 12:20 AM

#

df_churn.loc[df_churn['Churn'] == 'Yes', 'AutoPayment'].sum()

#

what do you think this code does?

tardy portal Jul 12, 2020, 12:20 AM

#

so its taking all of the 'yes' values from the rows of 'Churn'

#

is that correct/

#

?*

desert oar Jul 12, 2020, 12:21 AM

#

not quite

#

it's taking all the rows from df_churn

#

where ``Churn'is'Yes'`

#

see the difference?

tardy portal Jul 12, 2020, 12:21 AM

#

yes

desert oar Jul 12, 2020, 12:21 AM

#

good, continue

tardy portal Jul 12, 2020, 12:22 AM

#

with that said, it then applies that to the 'AutoPayment' column

desert oar Jul 12, 2020, 12:22 AM

#

yes

#

hint: it's correct

#

i just want to make sure you understand why

#

and that it isn't a guess

tardy portal Jul 12, 2020, 12:39 AM

#

alright so going back to finding the percentage, I simply cannot divide in order to find the percentage

desert oar Jul 12, 2020, 12:40 AM

#

what do you mean

#

you know the sum, i.e. the # of elements that meet the criterion you want

#

just divide that sum by the total, no?

tardy portal Jul 12, 2020, 12:40 AM

#

so I have the sum total of both groups that have/don't have autopayments

desert oar Jul 12, 2020, 12:41 AM

#

yes, and?

#

divide each one by the number of rows

urban quarry Jul 12, 2020, 12:52 AM

#

could anyone here help me choose a bayesian nonparametric clustering technique?

#

Or rather, if anyone is an expert in cluster analysis, please let me know, cuz i have a very specific question

vale tundra Jul 12, 2020, 2:16 AM

#

Hey guys^ Any idea on how i can resize images to like say 128x128 pixels?

plush mango Jul 12, 2020, 6:02 AM

#

I got a dict inside another dict I need to access the second dict

#

how do i do that pls help

inland rivet Jul 12, 2020, 6:08 AM

#

dic={'key':{'key2':'value'}}
print(dic['key']['key2'])

output: value

plucky harness Jul 12, 2020, 6:37 AM

#

How to collect the language data and create our own custom or artificial dataset? Any suggestions?

ripe forge Jul 12, 2020, 6:44 AM

#

The dataset depends on the task. So, what kind of task are you thinking of doing

#

And what does language data mean

plucky harness Jul 12, 2020, 7:29 AM

#

@ripe forge Suppose if i have to collect an ancient language data and the images of that language is very few and data augmentation is not an option here. So what strategies we can apply to get more images data of that language?

ripe forge Jul 12, 2020, 8:28 AM

#

Oh so images. Well the augmentation is the way, why would that not be an option?

lapis sequoia Jul 12, 2020, 10:23 AM

#

Hi, who can suggest me a project that can be done with libraries (PANDA, MATPLOTLIB), it's to practice in data science with python

umbral aspen Jul 12, 2020, 12:02 PM

#

Hi guys

#

Do you know of any pretrained models which are able to determine hair color?

#

I find a few examples of being able to identify hair for the purpose of replacing it or changing the color but nothing around just identifying the hair color

#

Preferably an example using tensorflow/keras

lapis sequoia Jul 12, 2020, 12:30 PM

#

@lapis sequoia I use Matplotlib which also supports beautiful LaTex fonts for publishing.

#

@lapis sequoia How

#

@lapis sequoia Such as plt.xlabel(r'$t*U/C$',fontsize=12), label=r'U'

#

@lapis sequoia ok, can you suggest me a project or online project that can be done with libraries (PANDA, MATPLOTLIB), it's to practice in data science with python

uncut shadow Jul 12, 2020, 12:38 PM

#

well, you can look for some on kaggle

#

there should be many

lapis sequoia Jul 12, 2020, 12:39 PM

#

@lapis sequoia You should listen someone who knows things. e.g. @uncut shadow 🙂

#

@lapis sequoia thanks

uncut shadow Jul 12, 2020, 12:43 PM

#

well, kaggle is an answer https://www.kaggle.com/datasets You can find there many datasets and use them for your own purposes. you can also do some googling and there is a huge chance you are going to find something intersting there. I have also found some on datacamp which might also look intersting https://www.datacamp.com/projects/topic:data_visualization

Find Open Datasets and Machine Learning Projects | Kaggle

Download Open Datasets on 1000s of Projects + Share Projects on One Platform. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Flexible Data Ingestion.

DataCamp

Projects

Data science projects

lapis sequoia Jul 12, 2020, 2:03 PM

#

@uncut shadow thanks

haughty narwhal Jul 12, 2020, 3:32 PM

#

Hey guys! Really simple question with Sympy 1.6 and python 3.7.7. I'm following along with the book "Numerical Python" by Robert Johannson, and this simple code returns a different result

g = sympy.Function("g")
g(x, y, z)
print(g.free_symbols)```

The author gets {x, y, z}, as he was successful in applying the domain variables to g. I get the empty set, however:
```set()```

#

Did the library's behavior change recently, or am I missing a simple concept?

chilly geyser Jul 12, 2020, 3:46 PM

#

I think you have the wrong code

#

It's not

g = sympy.Function("g")

It is

g = sympy.Function("g")(x, y, z)

#

@haughty narwhal https://discordapp.com/channels/267624335836053506/267659945086812160/731899723145019473

haughty narwhal Jul 12, 2020, 3:57 PM

#

@chilly geyser Thanks a lot! The book seemed to imply that you can define a function and assign a domain afterward, but after rereading a few times, it seems that I made an incorrect assumption. Thanks again.

umbral aspen Jul 12, 2020, 5:59 PM

#

@mellow saffron You should be able to detect leaves of certain plants...But without more info it is hard to know how well it will work

#

What you will need is a lot of test images with the correct leaves labelled...The images should be similar to what you plan to feed to your model as well

#

If the leaves are easy to identify (say bay leave vs maple leaf) then it could work well

umbral aspen Jul 12, 2020, 6:52 PM

#

well it can detect objects and where they are in the image

#

So then you should be able to calculate the pixels between at least

#

Then based on your camera etc you could approximate the distance

#

I would also search around...maybe someone has already created a leaf classifying model

idle otter Jul 12, 2020, 7:22 PM

#

a3 = np.array([[
[1, 2, 3], [4, 5, 6], [7, 8, 9],
[1, 2, 3], [4, 5, 6], [7, 8, 9]
]])

the shape of the array is (1, 6, 3) what does the 1, 6, and the 3 represent?

chilly gust Jul 12, 2020, 7:23 PM

#

the length of the nested arrays

#

you have 1 array containing 6 arrays of 3 scalars each

idle otter Jul 12, 2020, 7:24 PM

#

ah thanks

flat quest Jul 12, 2020, 7:46 PM

#

well it depends do you just want a classifier, and then u control where the robot moves?

Or a reinforcement model where the robot learns where to move on his own. For that you'll need more than a leaf classification model @mellow saffron

#

well yeah

But you have to options. Code in the instructions for what the robot should do if it detects a leaf. Or let the model learn how to do it by itself using a reinforcement model

inland rivet Jul 12, 2020, 8:02 PM

#

I'm taking the "Introduction to Data Science with Python" course by the University of Chicago on Coursera.

flat quest Jul 12, 2020, 8:19 PM

#

well reinforcement learning is based on three concepts

agent, environment, and reward. The learning would be based on the reward (now this can be an immediate reward or a long term reward - which is more difficult to interpret)

Anyways, it would probably be best to take a look into RL, and try some basic models with it. That should help to give you an intuition of how RL works

#

@mellow saffron

lapis sequoia Jul 12, 2020, 8:27 PM

#

I'm working with this and I can't get this to work:
{
"Country": "Albania",
"CountryCode": "AL",
"Slug": "albania",
"NewConfirmed": 93,
"TotalConfirmed": 3371,
"NewDeaths": 4,
"TotalDeaths": 89,
"NewRecovered": 6,
"TotalRecovered": 1881,
"Date": "2020-07-12T14:23:45Z",
"Premium": {}
},
Code:
https://paste.pythondiscord.com/hasotizuga.py

flat quest Jul 12, 2020, 8:28 PM

#

which part isn't working

lapis sequoia Jul 12, 2020, 8:36 PM

#

I have a .dat file consists of floating numbers which is ~1 GB. I use regex search via:
for line in lines:
match=re.search(self.forceRegex,line)
if match:
self.time.append(float(match.group(1)))
But since it's too huge, I'm at the "speed border" of Python I think. I can't open it. My laptop freezes. What do you think?

lapis sequoia Jul 12, 2020, 9:00 PM

#

@flat quest im trying to make a search function like they choose what country but i cant figure out how to do it

flat quest Jul 12, 2020, 9:04 PM

#

its a dict not a csv?

#

so the fetch returns a list of dicts? then you can just iterate over the dict to get the ones that have dict.country==country

unkempt stone Jul 12, 2020, 9:07 PM

#

hii! i’m very new to Matplotlib so I want to ask if it is possible to use a script and a csv file (with datas) to generate many different graphs (same kind of graph, but different data) at the same time?

coarse spire Jul 13, 2020, 2:24 AM

#

Hi, given an array of timestamps (or seconds) what would be the best way of finding large clusters? Sometimes they happen in the same minute but some other times they may be split out over the course of 15-30 minute of (semi) continuous activity.

I made histograms with 5 minute intervals but I would like some programmatic way so that I can feed it into some other analysis.

#

@unkempt stone You might want to look at pandas, you can use that to read in a csv file and it has some plotting functions built on top of matplotlib like histograms, bar charts, etc.

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

if you check the sidebar it has bar, box, hist, pie, etc.

agile anvil Jul 13, 2020, 2:58 AM

#

I have a 200 row, 51 column pandas dataframe of floats, and I want to delete all the columns which don't have one of the top ten values in the last row. Any ideas? My google/stackoverflow-fu is weak on this one.

#

looking at https://stackoverflow.com/questions/58758098/how-can-i-remove-columns-of-pandas-dataframe-conditional-on-last-row-values and just feeling dazed, knowing it's close....

Stack Overflow

How can I remove columns of pandas dataframe conditional on last ro...

Given a data-frame like:

         A  B  C

2019-11-02 120 25 11
2019-11-03 119 28 15
2019-11-04 115 23 18
2019-11-05 119 30 20
2019-11-06 121 32 25
2019-11-07 117 24 30
I would like to r...

sonic haven Jul 13, 2020, 6:16 AM

#

DEVICE_PLACEMENT_EXPLICIT = pywrap_tensorflow.TFE_DEVICE_PLACEMENT_EXPLICIT
AttributeError: module 'tensorflow.python.pywrap_tensorflow' has no attribute 'TFE_DEVICE_PLACEMENT_EXPLICIT'

#

is there a way to fix?

worn stratus Jul 13, 2020, 7:43 AM

#

I'm trying to figure out the right way to store some data. First I just want to get it in a dataframe though. The data is essentially a name as a string, then a long series of timsetamp/value pairs. E.g

 ('name 2', [(timestamp, value), (timestamp2, value), (timestamp3,value)...]),
 ('name 3', [(timestamp, value), (timestamp2, value), (timestamp3,value)...])]```
What I can't figure out is how to jam the timestamp pairs into a dataframe sensibly, any suggestions would be appreciated

glacial rune Jul 13, 2020, 8:12 AM

#

If I have a folder containing multiple text files containing an hour's worth of data (names include timestamps), is there a way I can read all of them in?

#

e.g. data files called
data-2020-07-01-00-00-00.txt
data-2020-07-01-01-00-00.txt
data-2020-07-01-02-00-00.txt

#

and then can I automate writing them out to a slightly different file name? e.g. just add _02 at the end i.e
data-2020-07-01-00-00-00_02.txt
data-2020-07-01-01-00-00_02.txt
data-2020-07-01-02-00-00_02.txt

#

could I make a list of the imported files and add the _02 to the end and use that list to create the text files I want?

chrome barn Jul 13, 2020, 8:46 AM

#

for f in os.listdir('folder/'):
    if f.endswith('.txt'):
        file_name = f.replace(".txt", "")
        os.rename{f'{f}',f'{file_name}_02.txt'}

#

you could use something like this or if you want to read them into something like a dataframe replace the inner part of the loop with pd.read_csv for the file and then output with pd.to_csv with the desired filenam

acoustic halo Jul 13, 2020, 11:21 AM

#

If i save a keras model with model.save(), will it save the training history dict as well?

small ermine Jul 13, 2020, 12:05 PM

#

Guys, where can i start learning some basic stuff about data sciences..?..before i enroll in a major course or something.

worldly elk Jul 13, 2020, 1:03 PM

#

Hi All,

What is the best Javascript method to crawl links, without sending 2 request each time.

I use Request and Selenium today, so i can get the http status code, because it doesn't exist in Selenium.

What is a better way of doing this? anybody 🙂

inland rivet Jul 13, 2020, 1:27 PM

#

What's the difference between skiprows and header in read_excel()?

spiral peak Jul 13, 2020, 1:36 PM

#

Header will take the first row (or the row you specify) as the column labels for the dataframe. Skiprows won't import the rows at the beginning that you specify.

sonic haven Jul 13, 2020, 2:55 PM

#

DEVICE_PLACEMENT_EXPLICIT = pywrap_tensorflow.TFE_DEVICE_PLACEMENT_EXPLICIT
AttributeError: module 'tensorflow.python.pywrap_tensorflow' has no attribute 'TFE_DEVICE_PLACEMENT_EXPLICIT'
any1 know how to fix

sonic haven Jul 13, 2020, 3:36 PM

#

class GNMTAttentionMultiCell(tf.nn.rnn_cell.MultiRNNCell):
AttributeError: module 'tensorflow._api.v2.nn' has no attribute 'rnn_cell'

desert oar Jul 13, 2020, 4:09 PM

#

https://towardsdatascience.com/sktime-a-unified-python-library-for-time-series-machine-learning-3c103c139a55 from another discord, seems interesting

Medium

Sktime: a Unified Python Library for Time Series Machine Learning

The “sklearn” for time series forecasting, classification, and regression

lapis sequoia Jul 13, 2020, 4:36 PM

#

its a dict not a csv?
@flat quest Yes it's a dict
so the fetch returns a list of dicts? then you can just iterate over the dict to get the ones that have dict.country==country
@flat quest See that's the problem I'm having and I can't figure out how to do that and I tried checking out some videos on how to work with json in Python .

#

I also get this error

Exception has occurred: TypeError
string indices must be integers

#

Code:
https://paste.pythondiscord.com/lowatutaro.py

#

I just checked and the Countries im looking for to get a name on is a list

#

"Countries": [
{
"Country": "Afghanistan",
"CountryCode": "AF",
"Slug": "afghanistan",
"NewConfirmed": 85,
"TotalConfirmed": 34451,
"NewDeaths": 16,
"TotalDeaths": 1010,
"NewRecovered": 81,
"TotalRecovered": 21216,
"Date": "2020-07-13T16:12:18Z",
"Premium": {}
},

#

This is an example after alphabetical order

#

Anyone have experience with computer vision? I'm trying to develop a way to effectively detect and crop images of rivets from pictures of airplane panels to be run through a CNN to determine if they are damaged or not. If anyone has any input on a good way to do this I would appreciate it! I've tried to use feature mapping in opencv but not having much luck or consistency

flat quest Jul 13, 2020, 4:46 PM

#

so object detection? @lapis sequoia there's a whole field of research on that

The most successful algorithms have been rcnn's, yolo, ssd, and more recently something like facebook's detr.

#

i think this part's your problem @lapis sequoia

countryname = country["Country"][country]

If its a list, you can't access it by name

lapis sequoia Jul 13, 2020, 4:52 PM

#

countryname = data["Countries"][0]["Country"]

This would work but it wouldn't take what im searching for this would the take the first

#

Which is:
"Countries": [
{
"Country": "Afghanistan",
"CountryCode": "AF",
"Slug": "afghanistan",
"NewConfirmed": 85,
"TotalConfirmed": 34451,
"NewDeaths": 16,
"TotalDeaths": 1010,
"NewRecovered": 81,
"TotalRecovered": 21216,
"Date": "2020-07-13T16:12:18Z",
"Premium": {}
},

flat quest Jul 13, 2020, 4:54 PM

#

so what you could do is simply loop over each country in countries

for data_country in data["countries"]: 

print(data_country[country])

lapis sequoia Jul 13, 2020, 4:54 PM

#

oh ok

#

thanks

flat quest Jul 13, 2020, 4:54 PM

#

yeah np

lapis sequoia Jul 13, 2020, 5:03 PM

#

@flat quest So we actually have a resnet CNN trained to detect if a rivet is damaged or not, the goal now is to be able to fly a drone around a plane and extract the rivets from photos to send through that trained CNN. Do you think yolo would be a good tool to actually detect the rivets to be extracted?

#

this is pretty weird:

for data_country in data["Countries"]:
  countryname = data_country[country]
  print(countryname)

#

I realized this may all be really basic stuff, but to be honest I was hired as an intern to a start up that I'm pretty underqualified for. I'm basically the only person in the datascience department of this project and im only a junior in a CS degree so im pretty clueless with a lot of this stuff.

flat quest Jul 13, 2020, 5:08 PM

#

yeah @lapis sequoia yolo should do pretty well, it won't have as high of an accuracy as something like rcnn, but its a lot faster

lapis sequoia Jul 13, 2020, 5:08 PM

#

so i get this error:

Exception has occurred: KeyError
'Sweden'

#

but it's correct

#

{
      "Country": "Sweden",
      "CountryCode": "SE",
      "Slug": "sweden",
      "NewConfirmed": 0,
      "TotalConfirmed": 74898,
      "NewDeaths": 0,
      "TotalDeaths": 5526,
      "NewRecovered": 0,
      "TotalRecovered": 0,
      "Date": "2020-07-13T16:12:18Z",
      "Premium": {}
    },

#

Awesome thanks, I'll give that a try

flat quest Jul 13, 2020, 5:10 PM

#

yeah thats pretty common nowadays. A lot of companies are hiring anyone for DS.

ah i see the issue. The country name is sweden inside another dict.

We can do the inefficient way i guess.

for country in countries:

if(country.name == country_name):
print(country)

lapis sequoia Jul 13, 2020, 5:15 PM

#

Really? Thats surprising to me, but also kinda good news, especially if I do decently well in this internship and get some good experience in the field before I even graduate

#

This is a good example:

{
  "Global": {
    "NewConfirmed": 132420,
    "TotalConfirmed": 12849441,
    "NewDeaths": 3537,
    "TotalDeaths": 568652,
    "NewRecovered": 111741,
    "TotalRecovered": 7116071
  },
  "Countries": [
    {
      "Country": "Afghanistan",
      "CountryCode": "AF",
      "Slug": "afghanistan",
      "NewConfirmed": 85,
      "TotalConfirmed": 34451,
      "NewDeaths": 16,
      "TotalDeaths": 1010,
      "NewRecovered": 81,
      "TotalRecovered": 21216,
      "Date": "2020-07-13T16:12:18Z",
      "Premium": {}
    },
    {
      "Country": "Albania",
      "CountryCode": "AL",
      "Slug": "albania",
      "NewConfirmed": 83,
      "TotalConfirmed": 3454,
      "NewDeaths": 4,
      "TotalDeaths": 93,
      "NewRecovered": 65,
      "TotalRecovered": 1946,
      "Date": "2020-07-13T16:12:18Z",
      "Premium": {}
    },

But the thing is that it gives me this error:

Exception has occurred: AttributeError
'dict' object has no attribute 'Country'

Code:

for data_country in data["Countries"]:
                        country_name = country
                        if (data_country == country_name):
                            print(country)

I changed some names here because of this:

@commands.command()
    async def corona(self, ctx, *, country = None):

This above me is apart of the discord.py library so it's correct it's just that it's named country and by default it's None

#

Maybe that's the problem?

desert oar Jul 13, 2020, 5:24 PM

#

dict elements can't be accessed with .

lapis sequoia Jul 13, 2020, 5:25 PM

#

it's a list

desert oar Jul 13, 2020, 5:26 PM

#

@commands.command()
async def corona(self, ctx, *, country = None):
    for data_country in data['Countries']:
        if data_country['Country'] == country:
            print(country)

#

is this what you want?

lapis sequoia Jul 13, 2020, 5:26 PM

#

I can try

#

I want the name from the list

desert oar Jul 13, 2020, 5:27 PM

#

this sounds like a great use for Pandas btw

#

but start with what you already have

lapis sequoia Jul 13, 2020, 5:31 PM

#

It works

#

but I need to type it in this way

!corona Sweden

#

Can I convert every single word in the begging with a capital letter?

desert oar Jul 13, 2020, 5:32 PM

#

eh?

#

isn't that what commands.command does?

#

you can do whatever string manipulation you want inside corona()

#

@commands.command()
async def corona(self, ctx, *, country = None):
    country = country.title().strip()
    for data_country in data['Countries']:
        if data_country['Country'] == country:
            print(country)

for example

lapis sequoia Jul 13, 2020, 5:40 PM

#

Thank you so much 🙂

#

Okay one more question, I am using labelimg to label rivets in a bunch of photos to train yolo, some of the rivets are cut off half way in the image that I'm labeling. Should I still label the rivets if only half of it is showing or should I only be labeling totally visible rivets?

desert oar Jul 13, 2020, 5:54 PM

#

i would label the half rivets too, no?

#

can you mark them as "half" so you can leave them out later if needed

#

idk if you're using a labeling tool or doing it w/ excel or something

lapis sequoia Jul 13, 2020, 5:55 PM

#

oh no:

for data_country in data["Countries"]:
                        if data_country["Country"] == country.title():
                            country = country.title()
                            country_code = data_country[""]

#

Now I gotta access the country code and all the other details and that is hard

#

I gotta do an if statement for every single one of them?

chrome barn Jul 13, 2020, 6:11 PM

#

country = 'Japan'
for data_country in data["Countries"]:
                        if data_country["Country"] == country:
                            country_name = data_country['Country']
                            country_code = data_country['CountryCode']

#

just assign them this way

desert oar Jul 13, 2020, 6:14 PM

#

why would it be that hard?

#

once you have the correct data_country element

#

you can do whatever you want with it

lapis sequoia Jul 13, 2020, 6:20 PM

#

How can I print each item in a list into an evenly spaced column on a single line?

>>> x = [3.4, 8.1, 9, 1.54, 12.8]
>>> f'items {[i for i in x]}'
'items [3.4, 8.1, 9, 1.54, 12.8]'
# I want to print something like this:
'items    3.4    8.1    9    1.54    12.8

uncut shadow Jul 13, 2020, 6:31 PM

#

check .join()

#

this will come in handy

lapis sequoia Jul 13, 2020, 6:38 PM

#

@uncut shadow Great suggestion. This seems to work for me:

''.join(f'{i:10.2}' for i in x)

uncut shadow Jul 13, 2020, 6:38 PM

#

well

#

you could just do " ".join(x)

#

(u can change number of spaces if u want)

lapis sequoia Jul 13, 2020, 6:39 PM

#

I like to define the spacing in the f-string.

uncut shadow Jul 13, 2020, 6:40 PM

#

oh Ok

desert oar Jul 13, 2020, 6:59 PM

#

!e ```python
x = [3.4, 8.1, 9, 1.54, 12.8]
print( f'items {"".join(format(i, "10.2f") for i in x)}' )

arctic wedgeBOT Jul 13, 2020, 6:59 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

items       3.40      8.10      9.00      1.54     12.80

desert oar Jul 13, 2020, 6:59 PM

#

not necessarily recommended for readability, but have fun anyway

#

!e ```python
x = [3.4, 8.1, 9, 1.54, 12.8]
print( 'items ' + ''.join(format(i, '10.2f') for i in x) )

arctic wedgeBOT Jul 13, 2020, 7:00 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

items       3.40      8.10      9.00      1.54     12.80

desert oar Jul 13, 2020, 7:00 PM

#

much easier to read imo

limber cove Jul 13, 2020, 9:22 PM

#

Simulate your own pandemic:

#

https://github.com/pvcraven/pandemic_simulator

GitHub

pvcraven/pandemic_simulator

Contribute to pvcraven/pandemic_simulator development by creating an account on GitHub.

eager heath Jul 13, 2020, 9:46 PM

#

Noice

#

That makes me think of the primer youtube channel

flat quest Jul 13, 2020, 9:46 PM

#

@chrome barn yeah that python code.

You can just reference the other parts by doing dict[key]

frank bone Jul 13, 2020, 10:10 PM

#

Does anyone know if there's a possibility to have an associative array like in PHP?

    AAPL:([DATE1, DATE2, DATE3, etc],
          [PRICE1, PRICE2, PRICE3, etc],
          [VOLUME1, VOLUME2, VOLUME3, etc])

    AMZN:([DATE1, DATE2, DATE3, etc],
          [PRICE1, PRICE2, PRICE3, etc],
          [VOLUME1, VOLUME2, VOLUME3, etc])

    TSLA:([DATE1, DATE2, DATE3, etc],
          [PRICE1, PRICE2, PRICE3, etc],
          [VOLUME1, VOLUME2, VOLUME3, etc])
        ]

so I could query a value like this:
request = master_array[AAPL, 1]
print(request)
output: [PRICE1, PRICE2, PRICE3, etc]```
Reason: be able to iterate variables like so master_array[VARIABLE, int]
Alternatively how would you go about organising this neatly? The goal is to create new values from csv files on the fly, then saving them in arrays like above to be able to further manipulate them

#

somehow naming arrays...planning to use numpy

#

but if there are better solutions lmk! sorry still quite a noobie in coding, learning every day 🙂

pale thunder Jul 13, 2020, 10:14 PM

#

You would use a dict

#

And do some_dict[key][index]

serene scaffold Jul 13, 2020, 10:15 PM

#

What you have right now is very close to what you want. Your don't need to wrap each list in parentheses though.

frank bone Jul 13, 2020, 10:15 PM

#

thats going to be insanely memory hungry no? the dicts

#

i have around 100GB of csv files

pale thunder Jul 13, 2020, 10:16 PM

#

You cannot load all of those into ram regardless

serene scaffold Jul 13, 2020, 10:16 PM

#

I mean, if you have 100gb of ram...

#

(you'd need more obviously)

frank bone Jul 13, 2020, 10:16 PM

#

well...the csv...the derivative values are just around 2GB

#

just read about numpy performance advantages, so i was wondering if theres a solution to this

pale thunder Jul 13, 2020, 10:17 PM

#

I would suggest handling it as lazily as possible, as that will be a better solution in case you ever need to work on larger datasets

#

Pandas may be what you want though

#

It is pretty much a csv-like data structure with fast random access and some other things

frank bone Jul 13, 2020, 10:18 PM

#

What you have right now is very close to what you want. Your don't need to wrap each list in parentheses though.
@serene scaffold how though? how would i call a sub array in numpy? then a value in a sub array (x,y)?

#

Pandas may be what you want though
@pale thunder ill look into it thanks, any quick links?

#

for this purpose

serene scaffold Jul 13, 2020, 10:19 PM

#

my_dict = {'hi': ['a', 'b']}

#

That's a perfectly fine list.

#

If you put the list in parentheses, it doesn't change anything, so it just makes your code less readable.

frank bone Jul 13, 2020, 10:21 PM

#

oh you can have key:array/list pairs? didnt know that

#

srry just started coding/python yesterday 😆

serene scaffold Jul 13, 2020, 10:21 PM

#

No problem.

#

A lot of programming languages have arrays, which are ordered strictures of a finite length

#

Python lists are different

frank bone Jul 13, 2020, 10:22 PM

#

so i can just append or extend them infinitely?

serene scaffold Jul 13, 2020, 10:22 PM

#

They grow and shrink to contain any number of items.

frank bone Jul 13, 2020, 10:22 PM

#

the lists

serene scaffold Jul 13, 2020, 10:22 PM

#

Yes, until you run out of memory.

#

For that reason they usually aren't called arrays. When you talk about arrays in python, it's assumed you're talking about math.

frank bone Jul 13, 2020, 10:24 PM

#

so numpy arrays are fixed length? cant extend them infinitely?

#

on thego

serene scaffold Jul 13, 2020, 10:24 PM

#

I know you can concatenate any number of numpy arrays

pale thunder Jul 13, 2020, 10:25 PM

#

Numpy arrays are pretty much entirely mutable, so you can extend them even in place afaik

frank bone Jul 13, 2020, 10:25 PM

#

just wondering about numpy because of speed/efficiency...when going through i.e. 5 years or backtesting i wouldnt want to wait a day or however long if it could be 15min 😄

#

not sure if it makes that much of a difference though?

serene scaffold Jul 13, 2020, 10:26 PM

#

Numpy uses C underneath. Its internal operations should be pretty fast.

pale thunder Jul 13, 2020, 10:26 PM

#

First get it to work, then see performance

#

Pandas will probably be faster than plain python though

frank bone Jul 13, 2020, 10:27 PM

#

true, getting ahead of myself here 😉

#

my_dict = {'hi': ['a', 'b']}

@serene scaffold could you make an example how to append 'c' to the list?

serene scaffold Jul 13, 2020, 10:29 PM

#

my_dict['hi'].append('c')

frank bone Jul 13, 2020, 10:29 PM

#

oh thats great thanks 😄

#

could you use a dynamic variable to generate a new key?

serene scaffold Jul 13, 2020, 10:30 PM

#

Generate a new key?

frank bone Jul 13, 2020, 10:30 PM

#

key:list pair

serene scaffold Jul 13, 2020, 10:31 PM

#

my_dict['bob'] = 8

frank bone Jul 13, 2020, 10:31 PM

#

could 'bob' be a variable?

serene scaffold Jul 13, 2020, 10:31 PM

#

Now bob is a key with that value.

#

Yes, you could use a variable as long as it points to a hashable object

frank bone Jul 13, 2020, 10:32 PM

#

nice thanks 🙂 noted

pale thunder Jul 13, 2020, 10:32 PM

#

Pretty much everywhere that you can put a literal in python you can put a variable

frank bone Jul 13, 2020, 10:32 PM

#

is it as straight forward in numpy and pandas?

pale thunder Jul 13, 2020, 10:32 PM

#

In pandas not quite, as it uses a csv like structure

#

Though i think it just sticks nulls everywhere

frank bone Jul 13, 2020, 10:34 PM

#

is it possible to have 2 different datatypes per key?

#

in the dict method

#

id want date + price "streams" and ticker symbol as key

serene scaffold Jul 13, 2020, 10:34 PM

#

I'm not sure what you mean

#

There's no rule that all the keys in a given dict have to have the same type

#

Or for the values.

frank bone Jul 13, 2020, 10:36 PM

#

my_dict = {'AAPL': ['date_a', 'date_b'], [price_a, price_b]}

#

like this

serene scaffold Jul 13, 2020, 10:37 PM

#

If you're trying to make a data structure for storing prices of something over time, I would not do it like this.

frank bone Jul 13, 2020, 10:37 PM

#

thats what im trying yes

#

price, volume vs. date

serene scaffold Jul 13, 2020, 10:37 PM

#

If you wanted to do this on a large scale, you probably want to look into data bases.

#

You could also make a data class that stores a date and a price

frank bone Jul 13, 2020, 10:38 PM

#

want to try it with lists or arrays

serene scaffold Jul 13, 2020, 10:38 PM

#

Why

mellow spruce Jul 13, 2020, 10:38 PM

#

Hello all, I am trying to think what is the best approach to solve this problem. I need to create a table that upon selecting some items will refer to other tables of those specific items to generate some calculations to be displayed on the main table. Furtheremore, if wanted these other tables used to calculate the main table should be accessible and show calculations performed on a data base or the lower heriarchy table. so It should be something like main table->table 1-> table2-> table 3. each table + some other data will be used to calculate the table above. I did some research and using the datatable library seems to do it but if there are other ways I am happy to hear. also let me know if this makes sense.

frank bone Jul 13, 2020, 10:40 PM

#

Why
@serene scaffold alrdy ran out of inodes once 😆

#

switched to xfs

serene scaffold Jul 13, 2020, 10:40 PM

#

Inodes?

frank bone Jul 13, 2020, 10:40 PM

#

yeah my hard drive ran out of inodes before it ran out of space

#

had way too many folders and files

serene scaffold Jul 13, 2020, 10:41 PM

#

I apparently don't know what that is.

frank bone Jul 13, 2020, 10:41 PM

#

each reserves a couple bytes in space

#

well not too important, i could make a database now

#

idk about complexity vs arrays/lists

#

?

serene scaffold Jul 13, 2020, 10:42 PM

#

Complexity?

frank bone Jul 13, 2020, 10:42 PM

#

ext4 has a limited amount of inodes, when you install the fs

#

if you have too many files or symlinks or folders youll run out of memory, even though you have 20% used

serene scaffold Jul 13, 2020, 10:43 PM

#

I don't know about details that are that low.

frank bone Jul 13, 2020, 10:43 PM

#

anyways, yeah i was wondering about DB complexity for a noob vs arrays

serene scaffold Jul 13, 2020, 10:43 PM

#

You can also look into options where only data you're currently using is in memory

frank bone Jul 13, 2020, 10:43 PM

#

id love to create the code in 3-4 weeks time

serene scaffold Jul 13, 2020, 10:43 PM

#

Python has generators for this.

frank bone Jul 13, 2020, 10:44 PM

#

well im kind of like going through a "moving window" when backtesting

#

around 30-45 day windows

serene scaffold Jul 13, 2020, 10:44 PM

#

I'm not sure what you mean.

frank bone Jul 13, 2020, 10:45 PM

#

ill pm u if its ok, dont want to spam the whole channel 😄

serene scaffold Jul 13, 2020, 10:45 PM

#

Well, I probably don't know how to solve what you're doing

#

And this channel is specifically for data science

#

So writing about it here leaves the door open for others to chime in.

frank bone Jul 13, 2020, 10:46 PM

#

alright ill keep asking my questions here then, i think i might just try dict for now and see if it works out for me

serene scaffold Jul 13, 2020, 10:46 PM

#

Well, don't load all 100gb at once

#

Load the data you need, compute stuff, write it to disk, and repeat.

frank bone Jul 13, 2020, 10:47 PM

#

yup 😄

#

how would you read from a specific index within the list in a dict?

#

print(my_dict['hi']) print the list

#

what about access 2nd index in the list?

mellow spruce Jul 13, 2020, 11:01 PM

#

Hello all, I am trying to think what is the best approach to solve this problem. I need to create a table that upon selecting some items will refer to other tables of those specific items to generate some calculations to be displayed on the main table. Furtheremore, if wanted these other tables used to calculate the main table should be accessible and show calculations performed on a data base or the lower heriarchy table. so It should be something like main table->table 1-> table2-> table 3. each table + some other data will be used to calculate the table above. I did some research and using the datatable library seems to do it but if there are other ways I am happy to hear. also let me know if this makes sense.

frank bone Jul 13, 2020, 11:31 PM

#

alrdy fell in love with pandas, as easy as excel basically 😄

hearty cloud Jul 13, 2020, 11:38 PM

#

Hello all, I am trying to think what is the best approach to solve this problem. I need to create a table that upon selecting some items will refer to other tables of those specific items to generate some calculations to be displayed on the main table. Furtheremore, if wanted these other tables used to calculate the main table should be accessible and show calculations performed on a data base or the lower heriarchy table. so It should be something like main table->table 1-> table2-> table 3. each table + some other data will be used to calculate the table above. I did some research and using the datatable library seems to do it but if there are other ways I am happy to hear. also let me know if this makes sense.
@mellow spruce It looks like a SQL type calculation, but I'm not sure if a temporary database or so could be built

mellow spruce Jul 13, 2020, 11:41 PM

#

@mellow spruce It looks like a SQL type calculation, but I'm not sure if a temporary database or so could be built
@hearty cloud The thing with doing it through sql is that it has to be interactive and updated every time you make different selections in the different tables. Thank you for the input!

lapis sequoia Jul 13, 2020, 11:50 PM

#

Hi, i need help with this message error : AttributeError: 'list' object has no attribute 'reshape'

frank bone Jul 14, 2020, 12:05 AM

#

Anyone know how I can have more depth in pandas dataframes?


cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

df = pd.DataFrame(cars, columns = ['Brand', 'Price'])

print (df)```
instead of cars as "master", I want "Transport" as "master"
Then:
Cars -> Brand, Price
Planes -> Brand, Price
Boats -> Brand, Price
etc

#

hope you understand what i mean 🙂

desert oar Jul 14, 2020, 12:08 AM

#

how would you do it in excel?

#

(i see that you have excel experience)

zealous kayak Jul 14, 2020, 12:09 AM

#

@lapis sequoia
Because reshape is not for python list . It's a numpy function for arrays.
Eg:
L = [1,2,3,4]
Arr = numpy.array(L)
Arr = Are.reshape(2,2)

desert oar Jul 14, 2020, 12:09 AM

#

@mellow spruce you can do that with pandas, although i dont think i entirely understand what youre asking for

frank bone Jul 14, 2020, 12:10 AM

#

how would you do it in excel?
@desert oar i would use different sheets

lapis sequoia Jul 14, 2020, 12:10 AM

#

@zealous kayak thanks

zealous kayak Jul 14, 2020, 12:10 AM

#

@frank bone how about using classes

desert oar Jul 14, 2020, 12:10 AM

#

ok, nothing wrong with using multiple data frames

frank bone Jul 14, 2020, 12:10 AM

#

it will be thousands tho

desert oar Jul 14, 2020, 12:10 AM

#

alternatively you can have a column vehicle_type with values like 'Plane', 'Car', etc.

frank bone Jul 14, 2020, 12:15 AM

#

will there be performance issues if i use lets say 1 huge dataframe, or 4000 smaller dataframes?

#

one better over the other?

#

all in RAM

zealous kayak Jul 14, 2020, 12:16 AM

#

What do you actually want?
Can you explain a bit more

frank bone Jul 14, 2020, 12:17 AM

#

ill try

#

so i have a dataset for stocks (csv files) with around 20 datafields, i want to multiply 2 data columns, sum them, then store the result as Symbol -> Date -> Volume, and this for around 1000 symbols per day

#

and i want to have 1 consistent stream for each symbol...vol1, vol2, vol3, vol4...etc

#

but theres also price1, price2, price3 etc...and date1, date2, date3 et..

#

so several "streams" per ticker per date

#

so n amount of arrays within array?

#

idk how to most efficiently solve this

#

Example: AAPL -> 20180601 -> 1000(vol), 356(price)
20180602 -> 1060, 374
20180603 -> 1400, 333
20180604 -> 1100, 353

#

but a thounsand symbols in same array/df

#

so now im wondering if i should just go easy and make 1000 dataframes instead of 1 big

mellow spruce Jul 14, 2020, 12:37 AM

#

@mellow spruce you can do that with pandas, although i dont think i entirely understand what youre asking for
@desert oar let me try to explain it better. We have different parts of a group and different groups that are part of the system. I want to display a main graphs that displays some calculations on the whole system. Upon selecting a group I want to display some data of the parts that are part of that group and calculations in a different table. Upon selecting a part I want to further drill down and again show some data and some calculations. Now on the Hughes llevel (system) upon selecting some groups. This table is going to access some calculations of the groups and part’s tables and show some calculations. Does that explain it better?

desert oar Jul 14, 2020, 12:40 AM

#

@frank bone it depends on how you're using the data

frank bone Jul 14, 2020, 12:57 AM

#

@frank bone it depends on how you're using the data
@desert oar im using it to do very simple manipulations and comparisons, for example comparing avg daily volume of last 3 days vs avg daily volume of last 30 days etc

#

nothing too math intensive i guess?

desert oar Jul 14, 2020, 12:58 AM

#

i mean how you're using it in code, if that makes any sense

#

you can put 1000 data frames in a dict in a for-loop for example

frank bone Jul 14, 2020, 12:58 AM

#

well im a noob 😄

#

not sure i even know yet lol

#

i have a big problem with the 1000 data frames example

#

i need to create dataframe variable names dynamically

#

very easy thing to do in PHP or bash

#

seems almost impossible in python...or do you know a way 😄 ?

#

like i would need to create the data frame like so:

#

volume_$TICKER

#

i dont want to type 1000 freaking variables in main function, just let the program do it

#

and keep in ram

#

so how do i accomplish to make something like: volume_$TICKER = pd.read_csv(etc...)

#

it iterates day to day...so each day TICKER reappears, and then ill append my value to the dict

desert oar Jul 14, 2020, 2:03 AM

#

Use a dict

haughty narwhal Jul 14, 2020, 2:38 AM

#

Really basic question on integration in Sympy. This code returns, embarrassingly enough:
0.707106781186547 * sqrt(2) * A
... which clearly should = A

# Import
import sympy
sympy.init_printing()

# Define symbols
t, RT, A= sympy.symbols("t RT A", real = True)
sigma = sympy.Symbol("sigma", real = True, positive = True)

# Define Gaussian function
f = (A / (sigma * sympy.sqrt(2 * sympy.pi))) * sympy.exp(-(1/2) * ((t - RT)/sigma)**2)

# Integrate
f.integrate((t, -1 * sympy.oo, sympy.oo))

#

I don't know if this reflects poorly on me or Sympy, but I was wondering if somebody could steer me in the right direction to get the real output (A)

plush crescent Jul 14, 2020, 4:06 AM

#

Any mathematicians that can help me out with a python/pandas formula? I'm trying to calculate the STC(Schaff Trend Cycle) using data I got from tradingview. The data I have includes their STC results so I can compare but the formula I've found for STC from different python libs out there is giving me bad results.
STC formula:

Schaff Trend Cycle - Three input values are used with the STC:
– Sh: shorter-term Exponential Moving Average with a default period of 23
– Lg: longer-term Exponential Moving Average with a default period of 50
– Cycle, set at half the cycle length with a default value of 10.
The STC is calculated in the following order:
First, the 23-period and the 50-period EMA and the MACD values are calculated:
EMA1 = EMA (Close, Short Length);
EMA2 = EMA (Close, Long Length);
MACD = EMA1 – EMA2.
Second, the 10-period Stochastic from the MACD values is calculated:
%K (MACD) = %KV (MACD, 10);
%D (MACD) = %DV (MACD, 10);
Schaff = 100 x (MACD – %K (MACD)) / (%D (MACD) – %K (MACD)).

Python version:

EMA_fast = pd.Series(
                df['Close'].ewm(ignore_na=False, span=23, adjust=True).mean(),
                name='EMA_fast',
            )
EMA_slow = pd.Series(
                df['Close'].ewm(ignore_na=False, span=50, adjust=True).mean(),
                name="EMA_slow",
            )


MACD = pd.Series((EMA_fast - EMA_slow), name="MACD")
STOK = (
                (MACD - MACD.rolling(window=10).min()) / (
                MACD.rolling(window=10).max() - MACD.rolling(window=10).min())) * 100

STOD = STOK.rolling(window=10).mean()

df['STC'] = 100 * (MACD - (STOK * MACD)) / ((STOD * MACD) - (STOK * MACD))

#

The python version isn't giving me the same results as tradingviews STC. I've looked at the pine script used to calculate STC via tradingview but having a hard time understanding it

dull turtle Jul 14, 2020, 7:00 AM

#

how i can get a prediction accuracy of CNN model with categorical data? i have folders or classes like "driving_licence images", "passport images"or say if i pass "passport image" then it should let us know that to which class or folder it is matching? e.g. say "passport" - 85 %, "driving_licence" - 70 % this way how i can get this ?

woeful plover Jul 14, 2020, 7:00 AM

#

Anyone reading the deep learning book by Ian Goodfellow? Let's form a study group. DM please

heavy crow Jul 14, 2020, 7:49 AM

#

I am following this tutorial trying to get object detection working. https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb
It works great but I want to detect multiple classes (3). The tutorial states

for simplicity, we assume a single class in this colab; though it should be straightforward to extend this to handle multiple classes

but I can find a "straightforward" way to extend it for multiple classes. Could someone explain what the process would be to get multiple classes?

GitHub

tensorflow/models

Models and examples built with TensorFlow. Contribute to tensorflow/models development by creating an account on GitHub.

dull turtle Jul 14, 2020, 8:11 AM

#

@heavy crow what u want to do exactly?

heavy crow Jul 14, 2020, 8:25 AM

#

@dull turtle detect 3 differnet objects using tf 2

dull turtle Jul 14, 2020, 8:49 AM

#

what is tf2? @heavy crow

heavy crow Jul 14, 2020, 8:50 AM

#

tensorflow 2.0

lapis sequoia Jul 14, 2020, 8:51 AM

#

skimming the notebook it seems like you need to pass a different model configuration for the model_builder.build method

#

i am not familiar /w the object detection api but from what i understood the config is a model.proto object - which i assume is just some abstractation for describing a tensorflow model

dull turtle Jul 14, 2020, 9:51 AM

#

!pastebin

arctic wedgeBOT Jul 14, 2020, 9:51 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

sullen kiln Jul 14, 2020, 10:09 AM

#

hi guys, any R users here?

silk axle Jul 14, 2020, 11:00 AM

#

How would I make an AI that can recognise hand-written numbers, such as the ones in the image below?

📎 image0.png

#

What libraries/specific aspects of a library should I be looking into?

silk axle Jul 14, 2020, 11:19 AM

#

@me upon response please

ripe forge Jul 14, 2020, 11:56 AM

#

@silk axle so, images of digits yes? You first need a dataset. Individual image of digits with their corresponding number that they're supposed to be displaying. Then, you need a model that works on images. That means any deep learning library, for example tensor flow keras.

#

Fortunately for you, this kind of problem is the de facto introduction to CNNs (which are a type of neural network architecture that excel at image tasks). Check out tutorials on CNN on the "mnist" dataset. (it's a dataset of handwritten images.)

#

Now, these kinds of models might not always carry over nicely to real tasks. Because images are expected to be resized and have only one digit in them at a time and so on.

#

So what ocrs usually do is first break the combination of digits into smaller boxes around individual digits

#

So that is something outside the model that you'd have to do. One of the ones I recall that can try doing this kind of box creation for you automatically is jtessboxeditor

#

But I feel like you'd be going very far down the rabbit hole at that point. I'd worry about just trying to understand one digit at a time first.

marble jasper Jul 14, 2020, 12:02 PM

#

recognising digits is a pretty standard exercise for ML, you don't have to go into neural networks at all. In fact many course on ML will start with adaptive boosting haar wavelets for this problem

#

there's a lot of tutorials on it as well

#

anyway, to help you, there is an MNIST database of a lot of handwriten digits training data you should grab: https://en.wikipedia.org/wiki/MNIST_database

#

I believe sklearn has this available in sklearn.datasets.load_digits

#

a good place to start

ripe forge Jul 14, 2020, 12:03 PM

#

Ooh. I've never tried non neural nets on these kinds of tasks before.

marble jasper Jul 14, 2020, 12:04 PM

#

I guess they're the "old world" of ML

ripe forge Jul 14, 2020, 12:04 PM

#

Call it a part of self learning I suppose, my first interaction with mnist came paired with cnns

silk axle Jul 14, 2020, 12:04 PM

#

I've never really done any ML/AI stuff before so I guess start with just one number and progress from there. The main things I'm not sure on with this are:

how do I tell Python that "this sample imagine is a 1/2/3..."?
how do I get Python to read in the input image in a format that I can use for this
how do I then extract the number from the input image
how do I then compare the extracted number to the sample images to tell which number it is

marble jasper Jul 14, 2020, 12:04 PM

#

but digit recognition as well as face detection are common examples used for them

ripe forge Jul 14, 2020, 12:05 PM

#

All those 4 questions are great. Fortunately (or unfortunately) all those 4 questions have the same answer: they are all abstracted away by the apis of the model libraries that you use.

#

So it's as simple as putting things in the correct folder, or calling some library function.

#

Everything else just.. Happens. Magic.

silk axle Jul 14, 2020, 12:06 PM

#

So all that stuff is handled behind-the-scenes?

ripe forge Jul 14, 2020, 12:06 PM

#

Mhm. Pretty much

silk axle Jul 14, 2020, 12:06 PM

#

Well that makes it a lot easier I guess lmao

#

I'm guessing I want to use sklearn for this? Since that has the sample set?

ripe forge Jul 14, 2020, 12:07 PM

#

Depends on what model you're going to build

#

Deep neural networks I would use tensor flow personally. Sklearn is amazing for most other models.

marble jasper Jul 14, 2020, 12:07 PM

#

here's a blog page on AdaBoost on that MNIST dataset by the way: https://towardsdatascience.com/boosting-and-adaboost-clearly-explained-856e21152d3e worth knowing just for background information. but a CNN is going to outperform this. but I think it's worth knowing anyway

silk axle Jul 14, 2020, 12:08 PM

#

What exactly do you mean by a "model"? I'm new to this ml/ai stuff lol

ripe forge Jul 14, 2020, 12:08 PM

#

Good. I think those are the kind of questions you'd actually want to ask at this stage.

#

So, here's the question to you then

#

What is ml?

#

What do you understand by the phrase so far, if anything

#

(and it's perfectly fine if you say, no clue yet either.)

silk axle Jul 14, 2020, 12:10 PM

#

All I really know is like, "getting a computer to identify patterns so that it can identify what an object is"

#

So say for the number "8" it's essentially looking for two circles ontop of each other kinda thing

ripe forge Jul 14, 2020, 12:11 PM

#

Good. Now, let's get one specific detail that should make it easier

#

"how" do we get it to identify what the object is

#

Because we aren't in the ml territory till we answer this one. Because we have a choice

silk axle Jul 14, 2020, 12:12 PM

#

I kinda know the theory behind it, in that you feed it samples and say, "these are all '8's, figure out the patterns", and then in future if it sees that pattern it knows it's an '8', but I don’t know like how you actually do that and how the computer actually identifies the patterns

ripe forge Jul 14, 2020, 12:13 PM

#

Bingo. Solid question

#

There's two ways: one is, we define the rules. Say, as you said yourself, maybe 8 is 2 circles touching at some point. Or say, sky is pixels that are blue in colour.

#

So we program the logic for it

#

We say, a circle is pixels of black that follow an equation. Or we say, blue is pixel 0,0,x

#

But then, we code those rules up. And those rules start to break on different inputs

#

So, we start predicting water as sky for example.

marble jasper Jul 14, 2020, 12:14 PM

#

(I think this is why courses start with haar wavelets, because they're easy to understand and reason for. And then you move onto NN and then you're like 🤷‍♂️ I don't know what's going on in there, but it works)

ripe forge Jul 14, 2020, 12:14 PM

#

So then we add more rules. It must be above some kind of horizon line. Etc etc

#

This is traditional programming. we control the rules

#

We look at input. We set the rules. We get the output.

#

The second way, the ml way is this: we look at input. We let a machine look at output. We let it figure out *any rule whatsoever * it can.

#

And we give it a task: get as many input output pairs correct as you can.

marble jasper Jul 14, 2020, 12:16 PM

#

can I just add to this explanation:
https://qph.fs.quoracdn.net/main-qimg-6d4739d376cc968727a97f99994dc64a

#

these black and white squares are example of 2D haar wavelets. This example isn't for finding numbers, but detecting faces

#

these are some examples of simple "rules"

ripe forge Jul 14, 2020, 12:17 PM

#

That's finally where ml comes in. The machine is given freedom to make its own rules, but it's given an instruction to get the best score

#

The real truth is, we have no idea what rules it will end up defining to achieve that goal

marble jasper Jul 14, 2020, 12:18 PM

#

"is the bottom half of this rectangle lighter than the top half?" that's a rule you can come up with. That rule matches most people's eyes and nose because your eyes and eyebrows tend to be darker pixels in an image than your cheeks

#

but this rule matches a lot of other things too. so you combine this rule with another rule "is the middle third of this rectangle lighter than the ones aside?" that rule matches the middle of the nose quite well in photos

#

again, these two rules together can match a lot of other things, but it's slightly better than each of them on their own

#

so you can see that if you string together a hundred, or maybe a thousand of these simple rules, you might end up with a single strong rule that detects faces more than it detects anything else

silk axle Jul 14, 2020, 12:20 PM

#

Yea, I get what you're saying @marble jasper. More rules means more precision, although I guess with ML it can make errors in the rules which would snowball

marble jasper Jul 14, 2020, 12:20 PM

#

but this is very tedious work to do manually. so you give this problem to a computer and say: "come up with a few hundred random sets of rules, and try them out on this dataset of known faces and dataset of unknown faces, and give me the best rules"

#

and then you say "ok, take these best rules and change them a bit - adjust them slightly, and then run another test to see if the change made them any better"

#

so what you end up doing is you tell a computer to adjust the rules automatically, gradually improving on them until you have a model that is as good as it can be on the dataset that you have. That automatic self-improvement is machine learning

ripe forge Jul 14, 2020, 12:21 PM

#

This is where the magic of ml comes in. When it comes with the rules, it's evaluated against its overall task. That is, get the best score.

silk axle Jul 14, 2020, 12:22 PM

#

But what's to stop, for example, the ML at some point determining that water is the sky, and then adding rules that are actually looking for water instead of the sky?

marble jasper Jul 14, 2020, 12:22 PM

#

Neural Nets are like this, but you don't need to use haar wavelets (the squares shown above)...although actually nothing stops you from using them. you can use the raw pixel data directly

ripe forge Jul 14, 2020, 12:22 PM

#

When it invariably fails the first time, it makes changes so that it improves. As for how the machine itself comes up with rules, that's a very specific concept of "optimization."or in case of neural networks, it's called"backpropogation"

#

Because we have the input output pair.

marble jasper Jul 14, 2020, 12:22 PM

#

no. you're not looking for water or sky. you are looking at raw pixel values. the computer doesn't understand concepts like "water" or "sky", only "this is dark", "this is light"

ripe forge Jul 14, 2020, 12:22 PM

#

We show it this is what you should have done

#

And it sees but this is what I did

#

And then it optimizes to do a better job

#

That's the rule creation/improvement step

silk axle Jul 14, 2020, 12:23 PM

#

I'm kinda confused with that. We say it got it wrong, but how does it know where it went wrong?

#

When we only see the end output, not the process of how it got there

ripe forge Jul 14, 2020, 12:24 PM

#

Yes and no. See, rules, when we humans think of them, have meaning

#

For machines, it's as mindless as a math equation

marble jasper Jul 14, 2020, 12:24 PM

#

that's the thing that makes neural networks hard to reason about. There are a lot of rules in there, and some of them are wrong or suboptimal. But we don't actually care (or rather, we can't care)

#

we only care that it gets a certain percentage of results right or wrong

silk axle Jul 14, 2020, 12:25 PM

#

But say it's not getting enough accuracy, how do we improve it?

marble jasper Jul 14, 2020, 12:25 PM

#

we can't go in there and figure out which rules are wrong and correct them. that would take impossibly long

ripe forge Jul 14, 2020, 12:25 PM

#

They just solve the equation. The equation ends up making our inputs and outputs coincide. At that point, it doesn't matter that we can't explain every rule, just that this combination of rules or math equations understood the input output pairs.

#

All this marketing hype around ai being smart? All lies.

marble jasper Jul 14, 2020, 12:26 PM

#

this is where traditional non-neural network ML differs from neural networks. In the example I gave above, adaboost/haar uses an evolutionary algorithm for improvement

#

that's quite easy to understand

ripe forge Jul 14, 2020, 12:26 PM

#

It's literally a bunch of equations.

marble jasper Jul 14, 2020, 12:26 PM

#

it takes a bunch of rules, and then makes random changes to them, and sees if it does any better

#

we don't actually care that these rules have errors or inefficiencies in them. we just care that this random mutation of rules performs better than this other random mutation of rules

#

we don't care why

#

or we can't care why

#

(it's too complex to actually go in and look at them)

ripe forge Jul 14, 2020, 12:27 PM

#

But say it's not getting enough accuracy, how do we improve it?
@silk axle we don't. Not In the traditional sense. You gave up control over the rules.

silk axle Jul 14, 2020, 12:27 PM

#

🤔

#

So it's basically just hoping that it's good enough?

marble jasper Jul 14, 2020, 12:27 PM

#

think about dog breeders

ripe forge Jul 14, 2020, 12:27 PM

#

we care about getting the most input output pairs right

marble jasper Jul 14, 2020, 12:28 PM

#

dog breeders try to breed certain traits in dogs - size, temperament, coat pattern, stamina for racing, etc. etc.

#

dog breeders aren't geneticists, they can't go into the genetic material and figure out what's "wrong"

pale thunder Jul 14, 2020, 12:28 PM

#

You essentially have an algorithm that guesses what changes would improve, and if they do, you go on from there. Improve enough times and you get an actually useful model

ripe forge Jul 14, 2020, 12:28 PM

#

Ooh this is a good example

marble jasper Jul 14, 2020, 12:28 PM

#

yet they can still breed dogs with certain traits

#

(again, still using adaboost as an example, NN slightly different as Lakmatiol suggests)

stray ivy Jul 14, 2020, 12:29 PM

#

the core of data science and ML is statistics, so you're never really going to get anything 100%. good enough is pretty much the best case scenario

marble jasper Jul 14, 2020, 12:29 PM

#

it's like saying "I don't know what you did, but whatever it is, keep doing it/do more of that"

silk axle Jul 14, 2020, 12:31 PM

#

Yea, that makes more sense, thanks

ripe forge Jul 14, 2020, 12:31 PM

#

That's the magic. For neural networks specifically, It's like, you took an multiple choice question example for some kid. without teaching him anything. The kid took random guesses. but then you told him, this was right, this was wrong.

#

Then the kid is made to resit the same exam

silk axle Jul 14, 2020, 12:31 PM

#

It's just making guesses and if it guesses right it'll keep going down that path, if it guesses wrong, it backtracks and tries something else?

ripe forge Jul 14, 2020, 12:31 PM

#

Eventually this kid is gonna ace this exam

#

Without knowing a single thing about this subject

pale thunder Jul 14, 2020, 12:32 PM

#

Yeah, though it is a bit smarter about it than random in most cases

ripe forge Jul 14, 2020, 12:32 PM

#

Aye

#

Call it the most over simplification of the principle, but it's a good one

stray ivy Jul 14, 2020, 12:32 PM

#

also the "learning" bit isn't as nebulous as you would think. it's really just using legit metrics for statistics/regression/w.e. to "learn" what's better

#

like, "better" could mean CVRMSE goes down

#

as an example

ripe forge Jul 14, 2020, 12:33 PM

#

Aye, the real learning is simply some kind of optimization or back propagation

#

Just like solving some equation

#

For the machine it makes sense. And for us, we don't really care.

pale thunder Jul 14, 2020, 12:34 PM

#

It is a lot of pretty basic numerical calculus

#

Sometimes

stray ivy Jul 14, 2020, 12:35 PM

#

maybe

#

afaik, calculus isn't something terribly easy to implement

marble jasper Jul 14, 2020, 12:35 PM

#

I've learned to never trust anyone who says "it's just basic calculus"

stray ivy Jul 14, 2020, 12:35 PM

#

for real

ripe forge Jul 14, 2020, 12:35 PM

#

I've learned to never trust anyone. 😉

stray ivy Jul 14, 2020, 12:36 PM

#

calculus as a theory is pretty easy for humans to understand, but it's pretty difficult to get a computer to "understand" calculus

#

that's why there exist numerical methods for approximating such things

pale thunder Jul 14, 2020, 12:36 PM

#

Numerical approximations of calc are quite simple. Algebraically solving it is not.

silk axle Jul 14, 2020, 12:36 PM

#

So I know more behind the theory of how it all works now, but how would I proceed with my actual task? @ripe forge

stray ivy Jul 14, 2020, 12:37 PM

#

for sure, numerical approximations are pretty simple. i was more referring to the algebraic portion

ripe forge Jul 14, 2020, 12:37 PM

#

You need. Your dataset. Input output pairs. Fortunately they exist. Mnist. Then you need, a model. Fortunately they also exist. Then you just need a tutorial that shows both. For that, give me one sec if you want a CNN. Alternatively meseta linked one

silk axle Jul 14, 2020, 12:38 PM

#

What do you mean by a "CNN"?

pale thunder Jul 14, 2020, 12:38 PM

#

Yeah, though in its basic form in NNs, it is the chain rule. Which is not horrible to do on paper

#

Convolutional neural network

#

They are generally better than most other NNs at dealing with images

silk axle Jul 14, 2020, 12:39 PM

#

Right, okay

ripe forge Jul 14, 2020, 12:57 PM

#

https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/

Machine Learning Mastery

Jason Brownlee

How to Develop a CNN for MNIST Handwritten Digit Classification

How to Develop a Convolutional Neural Network From Scratch for MNIST Handwritten Digit Classification. The MNIST handwritten digit classification problem is a standard dataset used in computer vision and deep learning. Although the dataset is effectively solved, it can be used...

#

In the code, change the keras imports to tensorflow.keras

#

It's a great site so I trust their tutorials but the code might be old now

dull turtle Jul 14, 2020, 12:59 PM

#

app.logger.info("result2 : %s", result2)                   
            search_index = (result2)
            mlb = MultiLabelBinarizer()
            mlb.fit([label_map])
            predictions = model.predict(samples_to_predict)[0]
            idxs = np.argsort(predictions)[::-1][:2]
            for (i, j) in enumerate(idxs):
                #build the label and draw the label on the image
                label = "{}: {:.2f}%".format(mlb.classes_[label_map], predictions[j] * 100)
                cv2.putText(test_img, label, (10, (i * 30) + 25), 
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
                # show the probabilities for each of the individual labels
            for (label, predictions) in (mlb.classes_, predictions):
                print("{}: {:.2f}%".format(label, predictions * 100))
            ```

silk axle Jul 14, 2020, 12:59 PM

#

Thanks @ripe forge

dull turtle Jul 14, 2020, 1:00 PM

#

Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\paymentz\dynamic_testing.py", line 123, in post
    label = "{}: {:.2f}%".format(mlb.classes_[label_map], predictions[j] * 100)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices```

ripe forge Jul 14, 2020, 1:03 PM

#

@TizzySaurus If you'd like a bit of a deeper dive (but still a high level one) behind the theory for this task from a NN perspective, take a look at this. http://neuralnetworksanddeeplearning.com/

dull turtle Jul 14, 2020, 1:03 PM

#

@ripe forge i am also working on deep learning NN

#

!pastebin

arctic wedgeBOT Jul 14, 2020, 1:03 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

silk axle Jul 14, 2020, 1:04 PM

#

@ripe forge I get this error whenever trying to use tensorflow module (importing) -- even just import tensorflow gives this https://paste.pythondiscord.com/ujamukigas.sql

dull turtle Jul 14, 2020, 1:04 PM

#

https://paste.pythondiscord.com/ecomuvoveq.py see code here @ripe forge

ripe forge Jul 14, 2020, 1:05 PM

#

Sorry @dull turtle i personally was never great at debugging actual model code, and I wouldn't be able to spare the time to take a look.

#

Weird @silk axle , do you use anaconda or python directly

silk axle Jul 14, 2020, 1:07 PM

#

Python 3.6.8 directly, on Windows 64-bit if that matters

dull turtle Jul 14, 2020, 1:07 PM

#

Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\paymentz\dynamic_testing.py", line 123, in post
    label = "{}: {:.2f}%".format(mlb.classes_[label_map], predictions[j] * 100)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices```

@ripe forge see error here

ripe forge Jul 14, 2020, 1:08 PM

#

If I had to guess, one of j or label_map is wrong. But I'm not sure what's being passed where

dull turtle Jul 14, 2020, 1:09 PM

#

https://www.pyimagesearch.com/2018/05/07/multi-label-classification-with-keras/ see this i am following look at their

PyImageSearch

Adrian Rosebrock

Multi-label classification with Keras - PyImageSearch

In this tutorial you will learn how to perform multi-label classification using Keras, Python, and deep learning.

ripe forge Jul 14, 2020, 1:09 PM

#

@TizzySaurus perhaps try to make a new virtual env and see if you can get a clean setup to work. Not too familiar with using python directly since I rely on conda commands. But basically installing tf correctly with its dependencies is a painful experience at times. I suspect it's simply your library is not setup correctly

#

It's just a guess though, sorry.

silk axle Jul 14, 2020, 1:10 PM

#

tbh I've never used venvs before and have no clue how to

dull turtle Jul 14, 2020, 1:10 PM

#

see here @ripe forge

📎 unknown.png

ripe forge Jul 14, 2020, 1:11 PM

#

😅 Me neither, I let conda commands handle environment creation

#

But you won't have conda.

silk axle Jul 14, 2020, 1:11 PM

#

I guess I'll try and figure it out, thanks

dull turtle Jul 14, 2020, 1:12 PM

#

i am trying to print the prediction with percentage @ripe forge

#

can u have a looke there bro it take only less than 5 mins bro?

ripe forge Jul 14, 2020, 1:14 PM

#

Sorry, I personally can't right now. Code debugging is rarely a 5 minute thing, and I wouldn't be able nor willing to spare the time right now.

#

If you'd like to, perhaps take up a help channel though. Hopefully someone can get back to you

dull turtle Jul 14, 2020, 1:15 PM

#

actually i am at the end of my projecyt

#

they recommende to data science

ripe forge Jul 14, 2020, 1:15 PM

#

As always, don't try to single out people for help, we might not always have the time to spare.

dull turtle Jul 14, 2020, 1:16 PM

#

yeah i can understand

uncut shadow Jul 14, 2020, 1:32 PM

#

Hey. What's variance (rather in statistics, than ML)? I understand it's definition and I know the formula, but I don't get what does it tell e.g. what does it tell me that variance of some numbers is e.g. 21.1324?

ripe forge Jul 14, 2020, 1:34 PM

#

In simple terms, it's just a number that indicates the spread from the mean value

uncut shadow Jul 14, 2020, 1:35 PM

#

Yeah, but what does it actually tell me? Like, when you measure height, and you get that building you measured is 10 m tall then it means that there is 10 m from the ground to the highest point of that building. What about the variance? How could I "show" it or sth?

ripe forge Jul 14, 2020, 1:35 PM

#

That means, it's more useful when you talk about it relatively.

uncut shadow Jul 14, 2020, 1:36 PM

#

hmmm

#

Ok

ripe forge Jul 14, 2020, 1:36 PM

#

Well, meters is a unit of height. Sadly varience was never quantified into well defined units. Not in the sense you'd like it to

#

However, standard deviation and variance still talks about something very fundamentally important about a group of data points.

#

Two groups with the same mean, can have different variances. So if I tell you one group has variance 10 and another 100, relatively you can infer which one has points more close together yes?

uncut shadow Jul 14, 2020, 1:38 PM

#

yeah

#

makes sense

ripe forge Jul 14, 2020, 1:38 PM

#

Cool. So yep, hope that helps

uncut shadow Jul 14, 2020, 1:38 PM

#

thanks

#

also, do you know why is standard deviation square root of variance?

ripe forge Jul 14, 2020, 1:39 PM

#

Other way around. Deviation came first. Edit: wrong. Ignore this

uncut shadow Jul 14, 2020, 1:39 PM

#

oh

ripe forge Jul 14, 2020, 1:39 PM

#

We squared because we didn't want positives and negatives to cancel out

#

Hm.

#

Now that I think of it, I'm not sure if that's a 100% correct. But that's how I understood it at the time

lapis sequoia Jul 14, 2020, 1:40 PM

#

Think of it this way. You have two datasets: [-10, 0, 10, 20, 30], and [8, 9, 10, 11, 12]. They both have a mean of 10 which gives some indication about which value they are clustered around, but they're obviously quite different.
That's where variance (or more commonly standard deviation) comes in.

#

I'm assuming you know how to calculate the variance. And for the first dataset it's 200, and for the second is 2

#

So if by knowing the mean and variance only you have some sense of how these two datasets are distributed

#

They are both around 10, but one is much more clustered than the other

uncut shadow Jul 14, 2020, 1:42 PM

#

oh, Ok, thanks

lapis sequoia Jul 14, 2020, 1:42 PM

#

Using the std.dev you get 10√2 and √2

ripe forge Jul 14, 2020, 1:42 PM

#

Apparently it can get really involved too. For the square root question

#

https://stats.stackexchange.com/questions/269405/why-do-we-take-the-square-root-of-variance-to-create-standard-deviation

Cross Validated

Why do we take the square root of variance to create standard devia...

Sorry if this is has been answered elsewhere, I haven't been able to find it.

I am wondering why we take the square root, in particular, of variance to create the standard deviation? What is it ab...

lapis sequoia Jul 14, 2020, 1:42 PM

#

And there you can see that the standard deviation of one set is 10 times greater than the other. Which makes perfectly sense.

uncut shadow Jul 14, 2020, 1:43 PM

#

oh Ok

lapis sequoia Jul 14, 2020, 1:44 PM

#

Variance is just std.dev squared, so I suggest you find a tool online or use a graphing calculator and just plot various normal-distributions and see how the standard deviation (and thus variance) changes the distribution of the data

ripe forge Jul 14, 2020, 1:45 PM

#

Aye, std dev gets back on the same scale as all the values were.

lapis sequoia Jul 14, 2020, 1:45 PM

#

yea

ripe forge Jul 14, 2020, 1:45 PM

#

Which has a certain elegance to it

lapis sequoia Jul 14, 2020, 1:45 PM

#

And also, units (unless that's what you meant)

ripe forge Jul 14, 2020, 1:45 PM

#

Mhm, yep

mellow spruce Jul 14, 2020, 3:59 PM

#

Hello all, I am trying to think what is the best approach to solve this problem. I need to create a table that upon selecting some items will refer to other tables of those specific items to generate some calculations to be displayed on the main table. Furtheremore, if wanted these other tables used to calculate the main table should be accessible and show calculations performed on a data base or the lower heriarchy table. so It should be something like main table->table 1-> table2-> table 3. each table + some other data will be used to calculate the table above. I did some research and using the datatable library seems to do it but if there are other ways I am happy to hear. also let me know if this makes sense.

limpid raft Jul 14, 2020, 4:00 PM

#

When building a 2D convolutional model, how do you determine the filter size and the order you place the layers in. For example when could filter size 64 then 32 be better than 32 then 64?

ripe forge Jul 14, 2020, 4:13 PM

#

General guidance go big to small. It's mostly a trial and error. However, one intuition I can suggest is thinking in terms of "more simple" and "more complex"

#

So a model with bigger layers, more layers, more neurons, etc is capable of overfit ting more. It's a more complex model

#

That kind of deal.

#

As for why big to small, the intuition is this: sparse to dense compacts information. Dense to sparse expands the space in which information lies

#

So the latter can teach the model to pick up on noises in the data, which is usually bad.

#

Disclaimer, these are the intuitions I've formed to understand how things work, and I cannot vouch for their technical rigour. Just, they've served me well so far.

limpid raft Jul 14, 2020, 4:18 PM

#

That's useful. Thank you!

ripe forge Jul 14, 2020, 4:19 PM

#

@mellow spruce sounds fine, though it just feels like an unnecessary complicated structure. Perhaps normalize (join and merge) some tables to make it easier. Otherwise sure, seems like a common table join approach

mellow spruce Jul 14, 2020, 4:53 PM

#

@mellow spruce sounds fine, though it just feels like an unnecessary complicated structure. Perhaps normalize (join and merge) some tables to make it easier. Otherwise sure, seems like a common table join approach
@ripe forge thanks!

drowsy kite Jul 14, 2020, 5:52 PM

#

is there a more efficient way to get dummy values for data with a lot of dates? I went from having 27 columns to 600 columns mostly based on the fact that it created a new column for each date instance.

#

dates play a good role predictability in this case so i don't want to take it out completely

flat quest Jul 14, 2020, 5:53 PM

#

Well one way might be to just not use one hot encoding?

@drowsy kite

drowsy kite Jul 14, 2020, 5:55 PM

#

that's a good point but i've stemmed away from using it in the past because models usually interoperate the non binary input as an increase when its not necessarily the case

pastel compass Jul 14, 2020, 5:59 PM

#

I'm trying to understand the code behind a simple encoder-decoder network example, but I had a small question. Is "input" in green because it corresponds with the input() function, or does it mean something else in this context?

📎 unknown.png

flat quest Jul 14, 2020, 6:00 PM

#

Yeah it’s green because the editor just sees it as the input function

stray ivy Jul 14, 2020, 6:00 PM

#

if there's a function defined as input(), then you should rename your variables

flat quest Jul 14, 2020, 6:00 PM

#

Rather than a function arg

#

It’ll still work but it might just throw you off

pastel compass Jul 14, 2020, 6:01 PM

#

Cool, I noticed the same thing for iter as well, so I'll change them to inp and iterable or something like that. Thanks!

flat quest Jul 14, 2020, 6:01 PM

#

Np

#

@drowsy kite that’s true. But dates do have an inherently numerical structure, and if you use dL nns rather than a default sklearn models, it should be able to deal with the non linearity better.

drowsy kite Jul 14, 2020, 6:06 PM

#

interesting, i didn't know that

#

do you know if its possible to transform some categorical data via one hot encoder and the other by label encoding ?

#

@flat quest

flat quest Jul 14, 2020, 6:17 PM

#

yeah you can pick and choose. Generally its done by selecting the specific columns you want to either one hot or label encode

@drowsy kite

real hollow Jul 14, 2020, 6:30 PM

#

Hello good evening guys

#

could someone help me with using more than one core to calculate a for loop with a nested for loop inside?

#

i cannot wrap my head around how to actually force python to use more than one core

#

its 2 for loops iterating through a pandas dataframe

#

the same dataframe

#

operations are made on strings not numbers which appears to be an additional obstacle

#

im using a server with 12 cores and the code runs only on one thread

#

estimated run time will be 15h-20h...

lapis sequoia Jul 14, 2020, 6:47 PM

#

have you tried dask? dask is specifically designed for these problems and it is developed on top of pandas and joblib

#

https://docs.dask.org/en/latest/delayed.html
specifically delayed might be what you are looking for

#

actually the basic dask dataframe should do it, probably no need for delayed, my bad

eternal rivet Jul 14, 2020, 6:51 PM

#

Hey all, I'm plotting a line graph with 5 lines on it. 4 of them are continuous but I need the 5th to have a few breaks in it where there is no data for the x coordinate. I'm struggling to find help on stackoverflow. Anyone know how to handle this?

real hollow Jul 14, 2020, 7:01 PM

#

Thank you @lapis sequoia 🙂 Will check it out right now!

#

WOAH THAT IS EXACTLY WHAT I NEEDED

#

YOU ARE GREAT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

flat quest Jul 14, 2020, 7:33 PM

#

hmm sounds like dask is useful 😉

maybe i should start using it

lapis sequoia Jul 14, 2020, 8:15 PM

#

@real hollow no problem mate. have fun using it 🙂

outer tusk Jul 14, 2020, 8:44 PM

#

any one know how i can drop the unnamed row or column?

📎 Screen_Shot_2020-07-14_at_4.42.08_PM.png

#

tried data.drop() and it wont delete it without dleeting the entire column

#

and also drop the 'Canada (map)' label

#

📎 Screen_Shot_2020-07-14_at_4.36.42_PM.png

lapis sequoia Jul 14, 2020, 8:51 PM

#

I stumbled across this:

webhook = DiscordWebhook(url=url)
embed = DiscordEmbed(title="**Ban Hammer**",
                     description=f"Banned {member}",
                     color=242424)

But that color value is really weird and how can I make it in pure red like rgb(255, 0, 0)

#

I tried searching it up but found nothing and also hex color doesn't work

rare portal Jul 14, 2020, 9:00 PM

#

any one know how i can drop the unnamed row or column?
@outer tusk Can you provide a screenshot of the dataframe? Print the index too, that looks like unamed is set to be the index, so you can df.reset_index and drop the column I believe. You can also drop the column before setting the index.

outer tusk Jul 14, 2020, 9:01 PM

#

📎 Screen_Shot_2020-07-14_at_5.01.11_PM.png

pastel compass Jul 14, 2020, 9:01 PM

#

Sorry for asking the question here, but would anyone be able to share which file extension I should used for a bcolz.carray object?

outer tusk Jul 14, 2020, 9:01 PM

#

driver = webdriver.Chrome(executable_path='/Users/w000jc/downloads/chromedriver')
driver.get('https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=2010001701')
html = driver.page_source
tables = pd.read_html(html, index_col=[0])
data = tables[0]
driver.close()``` @rare portal

pastel compass Jul 14, 2020, 9:01 PM

#

I've seen .dat, but I'm not 100% sure if that is typical convention

outer tusk Jul 14, 2020, 9:05 PM

#

@rare portal when i do data.drop(columns="Canada (map)") it drops everything under it

rare portal Jul 14, 2020, 9:34 PM

#

@rare portal when i do data.drop(columns="Canada (map)") it drops everything under it
@outer tusk Yeah, I see the issue. I don't have selenium installed right now so I can't provide you with the code, but can you try deleting the index_col from the read_html method and replace it with skiprows=[0]? You want to skip the Canada map line and set headers to line 1 I believe. Something like:

pd.read_html(skiprows=0, header=1)

That should return a cleaner dataframe,. Basically, the issue is that your data is 'nested' under the Canada (map) column so dropping using columns="Canada (map)" will drop everything.

outer tusk Jul 14, 2020, 9:35 PM

#

@rare portal it just worked using this

data.columns = data.columns.droplevel(level=0)```

#

multiindexes 😡

rare portal Jul 14, 2020, 9:36 PM

#

@rare portal it just worked using this
data.columns = data.columns.droplevel(level=0)```

@outer tusk haha yeah. They're quite tricky.

outer tusk Jul 14, 2020, 9:37 PM

#

last thing

#

for column in data.columns:
    data[column].apply(lambda x: x[:-1])``` how can I put this 'inplace'?

#

trying to iterate over all of the columns and take off the a letter at the end of the dollar amounts.
for example every cell is like so: 1231313A or 1441B

#

i want the letters to be removed and trying to loop over every column

umbral geode Jul 14, 2020, 9:45 PM

#

Hey,can someone help we with dataframes?

#

I wrote this code:

#

df1 = a_list=["hi","bye","car"]
for word in a_list:
if 'i' in a_list:
print(word)

#

That part worked.

#

Then I did this: df1=pandas.DataFrame([a_list])

#

and got this

#

0 1 2
0 hi bye car

#

I should only get hi.

#

From multiple columns how can I filter required columns ,and put them in a dataframe.

#

Please help.

pseudo sonnet Jul 14, 2020, 10:02 PM

#

I'm confusion

#

You shouldn't be only getting hi?

umbral geode Jul 14, 2020, 10:03 PM

#

oh

#

I am trying to filter and put the filtered results in the dataframe.

rare portal Jul 14, 2020, 10:04 PM

#

for column in data.columns:
    data[column].apply(lambda x: x[:-1])``` how can I put this 'inplace'?

@outer tusk Hmm, you don't need to iterate through the columns; in fact, you should avoid doing that due to performance reasons.

You can use pd.replace to make these kinds of operations. Combined with regex, you can do quite a lot of powerful stuff.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html

Something like:

df.iloc[:,1:].replace(regex='[^0-9]',value='', inplace=True)

Should replace none numeric characters from all columns that go from column 1 to last column.

pseudo sonnet Jul 14, 2020, 10:04 PM

#

where is the filtering happening?

umbral geode Jul 14, 2020, 10:05 PM

#

Only words that have i in them should be put in the dataframe.

pseudo sonnet Jul 14, 2020, 10:05 PM

#

I know

#

but where are you filtering?

#

There is no filtering going on in that code

umbral geode Jul 14, 2020, 10:05 PM

#

for word in a_list:
if 'i' in a_list:
print(word)

#

I am filitering words with i

outer tusk Jul 14, 2020, 10:05 PM

#

for column in data.columns:
    data[column].transform( lambda x: re.sub(r'[A-Z]', '', x)).replace(',', '')``` I did this but the commas still there 😦 @rare portal

pseudo sonnet Jul 14, 2020, 10:05 PM

#

that doesn't filter it, that just prints all the words with i in them

#

those words remain in a_list

umbral geode Jul 14, 2020, 10:06 PM

#

I know,that is what I need. It did that but when I put in dataframe it is printing all words in the dataframe.

pseudo sonnet Jul 14, 2020, 10:07 PM

#

well yeah

umbral geode Jul 14, 2020, 10:07 PM

#

hi is the only word that should be in dataframe but all words are going in

#

df1 = a_list=["hi","bye","car"]
for word in a_list:
if 'i' in a_list:
print(word)

pseudo sonnet Jul 14, 2020, 10:07 PM

#

because you're not filtering anything out of a_list

umbral geode Jul 14, 2020, 10:07 PM

#

Could you tell me how?

pseudo sonnet Jul 14, 2020, 10:08 PM

#

print() doesn't delete anything

umbral geode Jul 14, 2020, 10:08 PM

#

Oh no not that

#

I need it to delete in the dataframe.

#

df1=pandas.DataFrame([a_list])

#

what do I need to do to make only words with 'i' go in dataframe

pseudo sonnet Jul 14, 2020, 10:09 PM

#

that's something that's better handled outside the dataframe

#

so you want to keep a_list as it is?

umbral geode Jul 14, 2020, 10:10 PM

#

yes

pseudo sonnet Jul 14, 2020, 10:10 PM

#

df1 = a_list=["hi","bye","car"]

#

change this to ```python
another_list = a_list = ["hi", "bye", "car"]

#

since assigning a_list to df1 is redundant if you're just gonna reinitialize df1 as a dataframe anyway

#

for word in another_list:
    if 'i' in word:
        print(word)
        another_list.remove(word)

#

then initialize pd1 as a dataframe with another_list

umbral geode Jul 14, 2020, 10:15 PM

#

It did the opposite of what I need.

#

It removed hi,but I need it to remove the rest and only put 'hi' in the dataframe.

pseudo sonnet Jul 14, 2020, 10:19 PM

#

oh sorry I misread lmao

#

well then just use if 'i' not in word

#

or initialize another_list as an empty list and use another_list.append(word)

umbral geode Jul 14, 2020, 10:22 PM

#

so i just change remove(word) to append(word)?

pseudo sonnet Jul 14, 2020, 10:26 PM

#

initialize another_list as empty as well

#

https://www.tutorialspoint.com/python/python_lists.htm

Python - Lists - Tutorialspoint

Python - Lists - The most basic data structure in Python is the sequence. Each element of a sequence is assigned a number - its position or index. The first index is zero, the s

umbral geode Jul 14, 2020, 10:28 PM

#

ok so i made another_list empty

#

another_list=[""]

#

now i have no data though

#

if i make it an empty list

outer tusk Jul 14, 2020, 10:54 PM

#

anyone know why this line is jsut silently failing

for column in data.columns:
    data[column].transform( lambda x: x[:-1]).replace(',','')```

#

not returning anything in jupyter

#

no changes made in the columns

modern canyon Jul 14, 2020, 10:57 PM

#

Hello, anyone here has any experience with recommendation systems?

#

I have this dataframe:

📎 Capture.PNG

#

I'd like to take some movie/tv-series titles as input and recommend some similar items by taking in 'titleType', 'genres','averageRating' as features

#

How do I go about doing this?

#

Can cosine similarity provide good results?

#

If yes, how do I feed in non-numerical data as input?

marble jasper Jul 14, 2020, 11:08 PM

#

you'd need to tokenize the non-numerical data

#

or vectorize I guess

#

for example if you did: Documentary=0, News=1, Sport=2, Drama=3, Adventure=4, Fantasy=5, Biography=6, Family=7 then your genre vectors would be:

[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 1, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 1, 1]```

#

for the five films

#

cosine similarity may work ok-ish for matching the genres

#

I don't think you should try to do cosine similarity across genres and averageRatings for example

#

or at least you'd need to figure out an appropriate weighting

pseudo sonnet Jul 14, 2020, 11:15 PM

#

yeah

#

Up to you how much you think rating should be considered in comparison to genre

#

or if you don't trust yourself on that you could find a pre-trained model for movie recommendations and see what the feature importances are

serene scaffold Jul 14, 2020, 11:27 PM

#

I think I've almost wrapped my head around neural networks. I'm trying to make one to map between vector spaces.

#

Softmax is a function you can use to calculate loss, yes?

#

Do you multiply the loss vector by each matrix of weights?

modern canyon Jul 14, 2020, 11:33 PM

#

@serene scaffold @pseudo sonnet I see, so what are the SOTA methods for recommendation systems? I'd prefer not to use pre-trained models as I am a beginner and would love to learn and build my own models.

serene scaffold Jul 14, 2020, 11:33 PM

#

recommendation systems? No idea, I've only done NLP.

lapis sequoia Jul 14, 2020, 11:56 PM

#

Hi, How to convert column object to column string

frank bone Jul 14, 2020, 11:57 PM

#

some help needed please:
im trying to get the lowest directory names in a folder structure using this code (from stackexchange):


for root,dirs,files in os.walk(starting_directory):
    if not dirs:
        lowest_dirs.append(root)```
I have to get many thousand paths back, but it's the same depth for every single one. Is there a way to define a "mindepth" so it doesnt waste so much time walking?

#

found solution i think, ill just use this https://walkdir.readthedocs.io/en/stable/

pseudo sonnet Jul 15, 2020, 12:41 AM

#

@modern canyon I wouldn't know, try googling. I mean my opinion is that you don't really need ML to do what you're doing. You're really just developing a similarity metric between movies based on the features you chose

#

An ML-appropriate problem would be optimizing that metric based on a bunch of training data

frank bone Jul 15, 2020, 12:44 AM

#

is there any performance loss when running python code in visual studio code? or is it the real thing?

serene scaffold Jul 15, 2020, 1:12 AM

#

@frank bone VSC isn't what's running the code, so no

outer tusk Jul 15, 2020, 1:12 AM

#

Anyone know why these aren't returning any changes?

📎 Screen_Shot_2020-07-14_at_8.56.52_PM.png

#

📎 Screen_Shot_2020-07-14_at_8.52.59_PM.png

#

no error msgs

frank bone Jul 15, 2020, 1:20 AM

#

@serene scaffold i just tested it though, it was somewhat faster

#

Like 30-40%

#

30s vs 17s, multiple runs

#

Anyways is there something to make it even faster? Read something about c wrappers or something? Any experience how much faster it can get?

serene scaffold Jul 15, 2020, 1:22 AM

#

if it was faster, I don't think it had anything to do with VSC.

#

What are you trying to do?

frank bone Jul 15, 2020, 1:22 AM

#

Sifting through csv files and storing data in dataframes

serene scaffold Jul 15, 2020, 1:24 AM

#

Looks like Pandas uses Cython.

frank bone Jul 15, 2020, 1:24 AM

#

Oh so not much to gain i guess?

#

Or nothing

#

17s to sift through 12370 items, import 2 data columns from each csv and multiplying + summing them then print to console

#

It seems very slow imo

#

Gonna take like 1-2 hours for a year of data

serene scaffold Jul 15, 2020, 1:27 AM

#

I'm not really experienced with that, though I have gotten significant performance improvements with Cython code.

#

I've never used pandas

frank bone Jul 15, 2020, 1:28 AM

#

Time to investigate

#

But 1-2h waiting is painful

#

Especially if you have to make frequent changes to parameters

serene scaffold Jul 15, 2020, 1:29 AM

#

I know that if you're using numpy, you miss out on a lot of the C optimizations if you use for loops.

#

that might be true with pandas as well

frank bone Jul 15, 2020, 1:30 AM

#

The main thing i use is for loops with pandas 😄

#

And pandas uses numpy arrays

serene scaffold Jul 15, 2020, 1:30 AM

#

what's an example of an operation that you do with for loops?

frank bone Jul 15, 2020, 1:31 AM

#

Iterate through csv files