#data-science-and-ml


radiant nymph
#

I don't think it's about losing important features, but about dimension explosion and overfitting

charred blaze
#

sorry @wind plume , will look into your problem now

#

I'm a bit tipsy but let's see what I can do.

flat quest
#

@radiant nymph no, if important features are not selected in ur model due to one-hot, ur model will not perform well. Overfitting won't really happen as long as u limit the tree depth/size, and that will deal with dim explosion as well.

radiant nymph
#

okk.
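A minimal sketch of the point above, assuming scikit-learn and a made-up toy table (the column names `colour`, `size` and `label` are hypothetical): one-hot encoding adds one column per category level, and `max_depth` caps how many splits a decision tree can make on them. Whether that is enough to prevent overfitting on real data is something to check with cross-validation rather than assume.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical column with several levels.
df = pd.DataFrame({
    "colour": ["red", "green", "blue", "red", "yellow", "blue", "green", "red"],
    "size": [1.0, 2.5, 3.1, 0.7, 2.2, 3.3, 2.8, 1.1],
    "label": [0, 1, 1, 0, 1, 1, 1, 0],
})

# One-hot encoding turns 1 categorical column into one 0/1 column per level,
# which is where the "dimension explosion" comes from.
X = pd.get_dummies(df[["colour", "size"]], columns=["colour"])
y = df["label"]

# Capping max_depth bounds how many splits the tree can make,
# and so how many of those dummy columns it can use.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(X.shape[1], tree.get_depth())
```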

charred blaze
#

@wind plume Until LOC 152, your module looks OK to me.

#

considering your final purpose with this, I still think the whole column replacement issue is not the way to go.

#

considering that you just want to plot data according to whether they're dry or wet, couldn't you just create two separate dataframes, one for each type, with a common attribute between them, and then have the code that draws the bar plots reference that common attribute?

#

sorry if it's not very clear

wind plume
#

I appreciate the tipsy advice @charred blaze, in all seriousness!

What exactly do you mean tho? Are you saying like make a data frame with all wet samples and all dry samples? Rather than have a master dataframe with ALL samples? Wouldn't that require me to sort them at the beginning near where I ask for file inputs? Not sure the best way to go around it tbh.

charred blaze
#

two separate dataframes

wind plume
#

The fact I get NaNs from my input confuses the fuck out of me tho. The code is probably not elegant at all as it's my first real project

charred blaze
#

one for the wet samples

#

another one for the dry samples

wind plume
#

Would you recommend making the two dataframes AFTER I have the master sheet? Or have two master sheets

#

My thought process was having one dataframe that I'd constantly be uploading new data to and then isolating however many columns and graphing them. It's the isolation part that is giving me trouble with the Nan shit

charred blaze
#

I say before the master sheet

wind plume
#

So how do you envision it working? I'd still have to either concat, melt, or do what I am currently struggling with, no?

charred blaze
#

yes, you would have to join those two dataframes afterwards

wind plume
#

Would I join them a different way, tho? I still think I'd use the same or similar function right?

charred blaze
#

you need a common attribute in each row of those two separate dataframes

wind plume
#

But how? Just the same as I tried to make my previous code work?

#

input wet or dry in a column etc

#

maybe append a column after removing outliers??
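A sketch of the "two dataframes plus a common attribute" idea, with hypothetical column names (`sample`, `mass`) and made-up numbers: tag each frame with a `condition` column, stack them with `pd.concat`, and pivot for plotting. Because every row carries its own label, there is no need to replace columns later.

```python
import pandas as pd

# Hypothetical measurements; in practice each frame would come from its own input files.
wet = pd.DataFrame({"sample": ["A", "B", "C"], "mass": [12.1, 11.8, 12.6]})
dry = pd.DataFrame({"sample": ["A", "B", "C"], "mass": [9.4, 9.1, 9.9]})

# Tag every row with the common attribute before combining.
wet["condition"] = "wet"
dry["condition"] = "dry"

# Stack the two frames into one long-format master table.
master = pd.concat([wet, dry], ignore_index=True)

# Reshape so each condition becomes a column, ready for a grouped bar plot.
table = master.pivot(index="sample", columns="condition", values="mass")
print(table)
# table.plot.bar() would then draw one bar per condition for each sample (needs matplotlib).
```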

lapis sequoia
#

Can anyone recommend a book of maths required for data science and ml?

lapis sequoia
#

Thanks

digital lynx
#

can someone help me with pandas plz @ me

lusty coral
#

@digital lynx what's wrong?

real wigeon
#

can i print multiple slice objects? using iloc?

flat quest
#

yeah

#

ilocs prob the best way to go for that
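For non-contiguous slices, one option is `numpy.r_`, which concatenates slice ranges into a single integer array that `iloc` accepts (a small illustration on a made-up frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(10)})

# np.r_ turns several slices (and single positions) into one integer index array,
# which iloc accepts in a single call.
rows = df.iloc[np.r_[0:2, 5:7, 9]]
print(rows)
```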

digital lynx
#

I just need to know if pandas' DataFrame.to_csv(filename) overwrites the current CSV file if it already has stuff in it

#

I am making a bot that takes data from a csv file and graphs it and changes the data in the table. I need to delete the first row, and add data to the end. I need to know if the to_csv() method will just overwrite the file because that is what I need, not appending all the data I just changed

opaque stratus
#

Hey ---> can anyone help me install TensorFlow on VSCODE?

whole roost
#

Can anyone here help me with specific code troubleshooting? Basic array and histogram use

digital lynx
#

wouldn't you just pip install it on your machine? @opaque stratus

rough prawn
#

yo
i have this json

"warnings": [
      {
        "id": 711390341789646919,
        "reason": null
      }

how can i remove the obj with that specific id
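One way to do this in Python, assuming the fragment sits inside a larger JSON object (the wrapper and the second warning below are hypothetical): parse it with the standard `json` module and rebuild the list without the matching entry. Python's `json` reads large integers such as this id exactly, so comparing with `!=` is safe here.

```python
import json

# Hypothetical wrapper object; the original snippet is only a fragment.
raw = """
{
  "warnings": [
    {"id": 711390341789646919, "reason": null},
    {"id": 123, "reason": "spam"}
  ]
}
"""

data = json.loads(raw)
target_id = 711390341789646919

# Keep every warning whose id does not match the one to remove.
data["warnings"] = [w for w in data["warnings"] if w["id"] != target_id]

print(json.dumps(data, indent=2))
```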

paper niche
#

> I am making a bot that takes data from a csv file and graphs it and changes the data in the table. I need to delete the first row, and add data to the end. I need to know if the to_csv() method will just overwrite the file because that is what I need, not appending all the data I just changed
@digital lynx it does. the default mode is 'w', which is 'write'.
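A sketch of the read-edit-write cycle described above, using a temporary file and made-up columns (`day`, `value`). Since `to_csv` defaults to `mode='w'`, the file is truncated and only the current contents of `df` are written; `mode='a'` would append instead. `index=False` keeps the row index from being written as an extra column on every save.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file standing in for the bot's data.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
pd.DataFrame({"day": [1, 2, 3], "value": [10, 20, 30]}).to_csv(path, index=False)

# Read, drop the first row, then add a new row at the end.
df = pd.read_csv(path)
df = df.iloc[1:]
df = pd.concat([df, pd.DataFrame({"day": [4], "value": [40]})], ignore_index=True)

# mode="w" is the default, so this replaces the old file contents entirely.
df.to_csv(path, index=False)

print(pd.read_csv(path))
```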

opaque stratus
#

Hey

#

Currently using VSCode's jupyternotebook interface

#

any idea how I could route the usage to my laptop CPU and GPU?

blazing bridge
#

I suggest using google colab instead, everything runs in the cloud @opaque stratus

opaque stratus
#

@blazing bridge yeah i know lol

#

used it before

#

i just love vscode's look

blazing bridge
#

Oh ok

opaque stratus
#

and not google colabs lol

#

😦

blazing bridge
#

Yeah Ik what you mean

real wigeon
#

So long story short, I'm reading the pandas docs in my spare time. I'm curious what the best way to go about this would be. Should I be reading linearly, or would it be better to pick certain topics (if so which ones)?

quaint wyvern
#

@real wigeon I personally prefer to learn something by doing a project. Reading docs isn't gonna stick in your mind unless you actually use them in code, so going topic by topic with the related classes would be better. I think Kaggle has a pandas course; it's short and useful.

real wigeon
#

Ok

#

I have a job with a lot of downtime, so I'm trying to read docs

#

And code when I'm home

long shard
#

@opaque stratus ... use `pip install` in your command prompt / Conda prompt to download the necessary modules ... then open VS Code and import those modules. That will work

lapis sequoia
#

Any ideas?

#

Hello, I am trying to clean up some pandas dataframes using BeautifulSoup. I am unable to apply it to one column. Any help is appreciated.

#
```python
import pandas as pd
from bs4 import BeautifulSoup


df = pd.DataFrame({"id": [1,2], "a":[ ["<a>Hello</a>"],["<c>Aorld</c>"]], "b":[["<c>World</c>"],["<c>Corld</c>"]]})
df['c'] = df.apply(BeautifulSoup(df['a'].all(), 'html.parser').get_text())

print (df)
```

echo kelp
#

@lapis sequoia I'm a little confused... Are you trying to parse an existing webpage and put it into a dataframe? It seems like you're ultimately mixing two distinct data structures here

lapis sequoia
#

@echo kelp well, I am trying to clean up the text in column a by removing any html tags.

#

and I saw that BeautifulSoup can help

echo kelp
#

you can access the text inside tags returned by BeautifulSoup with `.string` (or `.get_text()`)

#

so if you have a tag already stored as an object, you should be able to get the 'Hello', for example, with something like tag.string

#

I'm not exactly sure about the syntax

lapis sequoia
#

sorry, I am not sure if I am following you

echo kelp
#

yeah, no, sorry

#

well

#

I don't think it would be best practice to store the raw tags themselves as the data in columns a and b

#

if you're looking to manipulate those strings, I'd probably try to use something like a regular expression rather than beautifulsoup in this context

#

beautiful soup can parse tags, but I don't know how applicable it is to iterating over a series of tags in this fashion, particularly when returned from a dataframe
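A sketch of the regex route on the example frame from the question. Since each cell holds a list of strings, the list is joined first; the pattern `<[^>]+>` is a simplification that strips anything shaped like a tag and will not handle every edge case real HTML can contain (which is where BeautifulSoup is more robust):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "a": [["<a>Hello</a>"], ["<c>Aorld</c>"]],
    "b": [["<c>World</c>"], ["<c>Corld</c>"]],
})

# Join each cell's list into one string, then remove anything that looks like <...>
# with a vectorized string replace (no explicit Python loop over rows).
df["c"] = df["a"].str.join("").str.replace(r"<[^>]+>", "", regex=True)
print(df["c"].tolist())
```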

lapis sequoia
#

ahh i see

#

let me try regular expressions

echo kelp
#

did this point you in this direction?

lapis sequoia
#

similar post

echo kelp
#

yeah

#

I can definitely see how it applies

#

I do know, though, that if you are working with pandas dataframes, every action you take should ideally be "vectorized": iterating over a dataframe row by row is heavily discouraged in pandas. So you can definitely construct a solution somehow doing this; maybe someone else knows better than I do.

lapis sequoia
#

Hello?

#

thanks

echo kelp
#

np, sorry I couldn't find a neat solution

lapis sequoia
#

@echo kelp i think i found it...

#
```python
df['c'] = df['a'].apply(lambda text: BeautifulSoup(''.join(text), 'html.parser').get_text())
```
#

this worked for me

#

thanks again

#

it was incorrect data type being passed

echo kelp
#

great!

#

that's nifty, great use of a lambda function

stone ruin
#

did anyone else chuckle when they first saw pandas' cumulative functions?

balmy chasm
#

@real wigeon
I find that the problem with reading docs is that they don't really have a structure/lesson plan. You just go to learn random tricks, not really see how they fit together.

I've heard a lot of good things about this book, and I plan to read through it myself down the road (It was written by the creator of Pandas).

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662

real wigeon
#

yeah i was going to watch a freecodecamp tut

#

and use the docs to augment the knowledge

#

while trying a project

last peak
#

HEy

#

I'm new to ML and had some basic questions about linear regression.
I am trying to understand this equation:
A^T A x = A^T b

#

If someone could message me, I could give some more context; I have some questions to clarify what is even going on here
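For context, A^T A x = A^T b are the normal equations of least squares: minimising ||Ax - b||^2 and setting its gradient 2A^T(Ax - b) to zero gives exactly this equation, where A holds one row per sample (with a column of ones for the intercept), b the targets, and x the fitted coefficients. A small NumPy check on made-up data that lies exactly on y = 1 + 2t:

```python
import numpy as np

# Toy data lying exactly on y = 1 + 2t.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (intercept) and the feature itself.
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same least-squares problem via lstsq.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)
```

`np.linalg.lstsq` solves the same problem without forming A^T A explicitly, which is usually preferred numerically when the columns of A are nearly collinear.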

valid drum
#

Hi, I'm having trouble implementing Conv2D backpropagation using NumPy.
This is what I've done for forward propagation:

```python
ch, h, w = x.shape
Hout = (h - self.filters.shape[-2]) // self.stride + 1
Wout = (w - self.filters.shape[-1]) // self.stride + 1

a = np.lib.stride_tricks.as_strided(x, (Hout, Wout, ch, self.filters.shape[2], self.filters.shape[3]),
                                    (x.strides[1] * self.stride, x.strides[2] * self.stride) + (
                                    x.strides[0], x.strides[1], x.strides[2]))
out = np.einsum('ijckl,ackl->aij', a, self.filters)
```

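I tried doing this, but it's not working:

```python
# Attempt 1 (not working): strided windows + einsum
F = np.lib.stride_tricks.as_strided(x, (n_filt, size_filt, size_filt, dim_filt, size_filt, size_filt),
                                    (x.strides[0], x.strides[1] * self.stride, x.strides[2] * self.stride) + (
                                    x.strides[0], x.strides[1], x.strides[2]))
F = np.einsum('aijckl,anm->acij', F, dA_prev)

# Attempt 2: explicit loop over output positions.
# dA_prev is the upstream gradient, shape [n_filters, Hout, Wout]; the input is square with side size_img.
dF = np.zeros(shape=self.filters.shape)  # shape=[n_filters, ch, h, w]
size_filt = self.filters.shape[-1]
for filt in range(n_filt):
    y_filt = y_out = 0
    while y_filt + size_filt <= size_img:
        x_filt = x_out = 0
        while x_filt + size_filt <= size_img:
            # each output position adds (its gradient) * (the input patch it was computed from)
            dF[filt] += dA_prev[filt, y_out, x_out] * x[:, y_filt:y_filt + size_filt, x_filt:x_filt + size_filt]
            x_filt += self.stride
            x_out += 1
        y_filt += self.stride
        y_out += 1
```

This is working great, but very slow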
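One vectorised alternative to the loop, sketched under the assumptions that `x` has shape `(ch, h, w)`, the filters `(n_filt, ch, kh, kw)`, and the upstream gradient `dout` (called `dA_prev` above) `(n_filt, Hout, Wout)`. The forward pass already builds the strided windows; since out[a,i,j] is a sum over windows[i,j,c,k,l] * F[a,c,k,l], the filter gradient is the same `einsum` with the output and filter roles swapped. The helper names below are illustrative:

```python
import numpy as np

def conv2d_forward(x, filters, stride=1):
    """x: (ch, h, w); filters: (n_filt, ch, kh, kw). Returns (out, windows)."""
    ch, h, w = x.shape
    kh, kw = filters.shape[-2:]
    Hout = (h - kh) // stride + 1
    Wout = (w - kw) // stride + 1
    # windows[i, j] is the (ch, kh, kw) input patch that produces out[:, i, j]
    windows = np.lib.stride_tricks.as_strided(
        x,
        shape=(Hout, Wout, ch, kh, kw),
        strides=(x.strides[1] * stride, x.strides[2] * stride) + x.strides,
    )
    out = np.einsum("ijckl,ackl->aij", windows, filters)
    return out, windows

def conv2d_filter_grad(windows, dout):
    """dout: (n_filt, Hout, Wout) upstream gradient. Returns dF: (n_filt, ch, kh, kw)."""
    # out[a,i,j] = sum_{c,k,l} windows[i,j,c,k,l] * F[a,c,k,l]
    # => dF[a,c,k,l] = sum_{i,j} dout[a,i,j] * windows[i,j,c,k,l]
    return np.einsum("ijckl,aij->ackl", windows, dout)
```

`as_strided` trusts the shape and strides it is given, so `x` should be an ordinary (e.g. C-contiguous) array; comparing the result against the explicit loop on small random inputs is a cheap sanity check.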

lapis sequoia
#

`!unzip '/content/drive/My Drive/Colab Notebooks/Dataset.zip'` is not working. The command runs, but the images don't show up anywhere in my Drive. I have a zip file in my Drive which I want to unzip to use for training, testing and validation.
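A likely cause, though not confirmed from the message: `!unzip` without `-d` extracts into the current working directory (normally `/content` in Colab), not next to the zip in Drive, so the files may be sitting outside Drive. Passing a destination (for example `!unzip <zip> -d <folder>`) or using Python's `zipfile` module makes the target explicit. A sketch, where the destination folder is a hypothetical choice:

```python
import zipfile
from pathlib import Path

def extract_zip(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the names of the archived files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()

# In Colab, after mounting Drive, this might look like (destination is hypothetical):
# extract_zip("/content/drive/My Drive/Colab Notebooks/Dataset.zip", "/content/dataset")
```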

late cargo
#

Is it ok to webscrape a website if its robots.txt has nearly nothing? It only has 3 lines

kindred finch
#

It depends what those three lines are and, more importantly, if they have a ToS page
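For the robots.txt part, Python's standard library can evaluate the rules; a sketch with a hypothetical three-line file (a real one would be loaded with `set_url(...)` and `read()`). robots.txt only covers crawling rules, so the site's Terms of Service still need a separate read:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical three-line robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/horoscopes/today"))  # True: no rule blocks it
print(rp.can_fetch("*", "https://example.com/private/data"))      # False: under /private/
print(rp.crawl_delay("*"))                                         # 5
```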

#

If you paste the website link here I could take a quick look

late cargo
kindred finch
#

Looks like a no

```
You agree:
not to use any manual or automated software, devices or other processes (including but not limited to spiders, robots, scrapers, crawlers, avatars, data mining tools or the like) to "scrape" or download data from any web pages contained in the Website
```
In the Terms of Service https://www.horoscope.com/us/tos.aspx
#

@late cargo

late cargo
#

Thanks

solar oracle
#

How should I go about choosing the best imputation method? I don't want to remove data because the dataset is very small already.

raw rapids
#

@solar oracle there's so many ways to impute values

#

The best thing to do is to run a grid search on the best imputer

#

for easy tasks sklearn's SimpleImputer() is really useful

#

There's also Knn Imputation and MICE Imputation

#

Interpolation

#

the list goes on

#

the best method would be to run a grid search over the imputers
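A sketch of "grid search over the imputers" with scikit-learn, on synthetic data with values knocked out at random: the imputer is a pipeline step, and `GridSearchCV` swaps different imputers in as if they were hyperparameters, fitting each one only on the training folds. The model (`LinearRegression`) and the candidate list are placeholders to adapt:

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
# Knock out roughly 15% of the feature values to simulate missing data.
X[rng.random(X.shape) < 0.15] = np.nan

pipe = Pipeline([("imputer", SimpleImputer()), ("model", LinearRegression())])

# The imputer itself is the hyperparameter being searched; each candidate is
# fit inside the cross-validation folds, so test folds don't leak into imputation.
param_grid = {
    "imputer": [
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        KNNImputer(n_neighbors=5),
    ]
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_["imputer"], search.best_score_)
```

MICE-style imputation is available as `IterativeImputer`, which (at least in current scikit-learn releases) has to be enabled with `from sklearn.experimental import enable_iterative_imputer` before it can be imported.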

sullen glacier
#

hello, I want to learn data science, or at least improve my understanding of the basics in this area. What free materials would you suggest? I'd also be happy if some people would agree to help me one-on-one, so I won't feel ashamed asking stupid questions

raw rapids
#

is a good introduction to machine learning

#

without too much rigor

#

then there's Andrew Ng course on Coursera

#

which is also really good

sullen glacier
#

@raw rapids looks hard but I will note those materials, thank you

raw rapids
#

how did u assume it's hard

#

lol

sullen glacier
#

@raw rapids because I've checked about deep learning before, it's extremely hard by itself

raw rapids
#

no it's not

#

it's based around simple concepts

#

the deep learning mit course

#

is really beginner-friendly intro

sullen glacier
#

I was thinking that I should learn the basics first before going deep, but maybe I was wrong

raw rapids
#

I started off with the course I mentioned above and I'm doing fine

#

is a really good place to supplement your skills

#

they have a treasure trove of awesome notebooks

sullen glacier
#

I only recently discovered what a notebook is

raw rapids
#

well

#

you can keep a note of the things I mentioned above

devout sail
sullen glacier
#

sure I will

raw rapids
#

ya thats the Andrew Ng course I mentioned earlier

devout sail
#

Oh cool didn't see you did

#

Yeah, Andrew's one of the best

raw rapids
#

ya definitely

devout sail
#

The certificate costs money, but participating is free

sullen glacier
#

good

raw rapids
#

yup, it requires a lot of time and dedication in my opinion

#

if you want to retain as much info as possible

#

so you have to make a commitment

#

but I agree with @devout sail, that is a very good intro

sullen glacier
#

my English is very bad but I may try it

somber tapir
#

Hello, I hope this is the correct channel. I am very new to using Python and trying to write a simple dividend yield formula for a single stock, but I want to actually see the steps involved.

My code:
```python
# NHI dividend yield = dividend per share / market price per share
d = [1.10, 1.11, 1.12]
m = 47.25
y = d / m
```
Output:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
```
However, if I write the code without a list, it works fine:
```
>>> d=1.10
>>> m=47.25
>>> y=d/m
>>> y
0.023280423280423283
>>> y*100
2.3280423280423284
```
What am I doing wrong with my list?

solar oracle
#

!code

arctic wedgeBOT
#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')
solar oracle
#

Please paste code this way so we can read it properly

somber tapir
#

```
d=1.10
m=47.25
y=d/m
y
0.023280423280423283
y*100
2.3280423280423284
d=[1.10, 1.11, 1.12]
m=47.25
y=d/m
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
```

solar oracle
#

Oh you are doing it in console

somber tapir
#

'''python

solar oracle
#

"Should be backticks, not quotes."

#

Your problem is that you can't divide a list with a float

jolly briar
#

@somber tapir if you're using the console a lot, have a look at the IPython console, it's sooo much nicer

somber tapir
#

thanks I am trying to put the code in chat, but am such a noob and don't want to spam the chat with my bad attempts

jolly briar
solar oracle
#

just use `

somber tapir
#

and a float refers to the fact that I have decimals correct?

solar oracle
#

the problem is coming from you trying to divide the LIST with anything, if it was int instead of float it would still raise an error

#

you need to divide the items inside the list

jolly briar
#

or use numpy

solar oracle
#

that too
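Both suggestions above, sketched with the numbers from the question: a list comprehension divides each item separately, and a NumPy array supports element-wise division by a scalar directly.

```python
import numpy as np

d = [1.10, 1.11, 1.12]   # dividend per share (from the question)
m = 47.25                # market price per share

# Plain Python: divide each item, not the list itself.
yields = [div / m for div in d]

# NumPy: arrays support element-wise arithmetic with a scalar.
yields_np = np.array(d) / m

print([round(y * 100, 2) for y in yields])
```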

somber tapir
#

Okay, I can see I have a knowledge gap. I am going to do some reading. Thanks for the help!

solar oracle
#

I think it is actually a fairly intuitive "mistake", but more learning never hurts. Have fun!

somber tapir
#

Oh holy hell, IPython does look way nicer.

lapis sequoia
#

What math is used in a self driving car? All the way from the auto pilot code to the electronics that drive the car from the outputs the code gives?

jolly briar
#

@lapis sequoia arithmetic would be used at all levels i imagine

lapis sequoia
#

Yeah I guess so

storm plume
#

Hey guys, I'm familiar with manipulating data in Alteryx with GUI but I'm trying my hand at doing it with Pandas. I'm trying to do a cross join with 3 series for a dataframe for every possible combination. Is there an equivalent function in Python/Pandas?

#

Here's an example I made.

raw rapids
#

@storm plume

#

You could create an array of permutations with sympy and then make it into the dataframe

storm plume
#

Nah, I figured it out... you have to create a dummy column and do repeated joins.

#

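```python
# merge on the shared dummy key 'foo' (every row matches every row), then drop it
df1.assign(foo=1).merge(df2.assign(foo=1)).drop(columns='foo')
```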

#

Wish there was a cleaner way to do it, but oh well.

lusty coral
#

@storm plume you could have used pandas.MultiIndex.from_product, then convert it to dataframe

storm plume
#

Ahhhhhhhhhhh! I just looked at the documentation.

#

That's perfect.

#

Wish I had known about the existence of it earlier. Thanks!
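A sketch of both approaches on three made-up series: `pd.MultiIndex.from_product` builds every combination (taking column names from the series names), and in pandas 1.2 or later `merge(how="cross")` does the cross join without a dummy key.

```python
import pandas as pd

colours = pd.Series(["red", "blue"], name="colour")
sizes = pd.Series(["S", "M", "L"], name="size")
shapes = pd.Series(["circle", "square"], name="shape")

# Every combination of the three series: 2 * 3 * 2 = 12 rows.
combos = pd.MultiIndex.from_product([colours, sizes, shapes]).to_frame(index=False)

# The same result with cross merges (pandas >= 1.2).
combos2 = (
    colours.to_frame()
    .merge(sizes.to_frame(), how="cross")
    .merge(shapes.to_frame(), how="cross")
)
print(combos)
```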

teal turret
#

Hi guys, is anyone here familiar with Tesseract? I am sort of new to Python, but I am familiar with other languages.
When I try to get the text off this image, I just get "AN afi", not even a number. I tried inverting the image but still got a similar result. Any ideas?

```python
import pytesseract as tess
from PIL import Image
import PIL.ImageOps


# inverted_image = PIL.ImageOps.invert(img)
# inverted_image.save('new_name.png')

img = Image.open("text.png")
text = tess.image_to_string(img)
print(text)
```
#

here is the image

lusty coral
#

why you inverting the image?

#

@teal turret

#

@teal turret set "exposure" to -100, this way the text is more clear

teal turret
#

I just tried to see if inverting would help
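Another common thing to try (a heuristic, not a guaranteed fix) is preprocessing before OCR: grayscale, upscale, binarise, and make sure the text ends up dark on a light background. A sketch with Pillow only; the scale factor and threshold are guesses to tune for the actual image:

```python
from PIL import Image, ImageOps

# Pillow >= 9.1 exposes resampling filters under Image.Resampling; older versions on Image.
_RESAMPLE = getattr(Image, "Resampling", Image).LANCZOS

def prepare_for_ocr(img, scale=3, threshold=128):
    """Grayscale, upscale and binarise an image before handing it to Tesseract."""
    gray = ImageOps.grayscale(img)
    # Larger glyphs are often easier for OCR, so upscale small crops.
    big = gray.resize((gray.width * scale, gray.height * scale), _RESAMPLE)
    # Global threshold: pixels brighter than threshold -> 255, the rest -> 0.
    bw = big.point(lambda p: 255 if p > threshold else 0)
    # Assuming the background covers most of the image: if the result is mostly
    # dark, invert it so the text ends up dark on a light background.
    if sum(bw.getdata()) / (bw.width * bw.height) < 128:
        bw = ImageOps.invert(bw)
    return bw
```

It would then be used as `tess.image_to_string(prepare_for_ocr(Image.open("text.png")), config="--psm 7")`, where `--psm 7` asks Tesseract to treat the image as a single line of text, which may suit a short number better than the default page segmentation.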

blazing bridge
#

Anyone have recommendations on the best way to learn and be proficient in machine learning. For example courses and books

#

Typically online would be best

storm plume
#

Hey guys, I'm trying to do a left join and keep only the rows from the parent (left) table that did not match.

#
```python
import pandas as pd

data1 = {'NameA':['Tom', 'Nick', 'Krish', 'Jack'],
        'AgeA':[20, 21, 19, 18]}
data2 = {'NameB':['Tom', 'Nick', 'C', 'D'],
        'AgeB':[20, 21, 3, 4]}

# Create DataFrame
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df1 = pd.merge(df1, df2, how='left', left_on=['NameA', 'AgeA'], right_on=['NameB', 'AgeB'])
print(df1)
```
#

Output:

#
```
   NameA  AgeA NameB  AgeB
0    Tom    20   Tom  20.0
1   Nick    21  Nick  21.0
2  Krish    19   NaN   NaN
3   Jack    18   NaN   NaN
```
#

Expected:

#

How should I approach this?
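Since the expected output did not survive, this assumes the goal is the rows of the left table that found no match (an "anti-join"). One way is `merge(..., indicator=True)`, which adds a `_merge` column; then keep the `left_only` rows and only the left table's columns:

```python
import pandas as pd

df1 = pd.DataFrame({"NameA": ["Tom", "Nick", "Krish", "Jack"], "AgeA": [20, 21, 19, 18]})
df2 = pd.DataFrame({"NameB": ["Tom", "Nick", "C", "D"], "AgeB": [20, 21, 3, 4]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = df1.merge(df2, how="left", left_on=["NameA", "AgeA"],
                   right_on=["NameB", "AgeB"], indicator=True)

# Keep rows that only exist in df1, and only df1's columns.
unmatched = merged.loc[merged["_merge"] == "left_only", df1.columns]
print(unmatched)
```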

tepid ocean
#

> Anyone have recommendations on the best way to learn and be proficient in machine learning. For example courses and books
@blazing bridge check out the book by the Keras creator Francois Chollet, Deep Learning with Python. It's a great resource with lots of code examples and tutorials. Also, have a look at the fast.ai website (https://www.fast.ai) and community forum, where you can share code and learn from others' code

swift latch
#

hi everyone i am new to this channel, i have a question regarding an error i am facing using twitterAPI. i have included an image of my question below

sonic lichen
#

guys, a question: if I want to import a script with functions that also contains imported modules (e.g. sys, os) into an empty script in order to reuse my code, is there a way to use those already-imported modules (sys, os), or do I have to import them again?

uncut shadow
#

well, I have never tested it but I don't think there is point in doing that

sonic lichen
#

What is the cleanest way to do it then? I have functions that rely on imported modules and now I want to import those functions into a new script; do I re-import those already-imported modules? I actually see with dir(function.py) that it does include the previously imported modules, but using them requires function.sys.argv, for example, which of course is very annoying

polar acorn
#

What? If a module has a function that uses sys (imported in that module), you do not need to explicitly import sys along with that function in a different module, if that's what you're asking. Just import the function itself and it should work.

sonic lichen
#

@polar acorn thnx :))

uncut shadow
#

oh, that's what you meant

#

then yes, you don't need to import anything
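A small demonstration of that answer, using a hypothetical module `my_helpers_demo.py` written to a temporary folder: the function looks up `os` in its own module's namespace, so the importing script does not need `import os` just to call it (it only needs its own import if it uses `os` directly).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a hypothetical helper module to a temporary folder for this demo.
tmp = Path(tempfile.mkdtemp())
(tmp / "my_helpers_demo.py").write_text(
    "import os\n"
    "\n"
    "def join_parts(*parts):\n"
    "    # 'os' is resolved in this module's own globals, not the caller's\n"
    "    return os.path.join(*parts)\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()  # make sure the newly written file is found

from my_helpers_demo import join_parts  # no "import os" needed here to call it

print(join_parts("data", "raw", "file.csv"))
```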

wind plume
#

I'm trying to find and filter outliers by each column individually, and then make a new dataframe with the outliers completely filtered.

The output has the same values in every cell, except some values are NaN which I assume means the filter worked. The code seems half functional.

#
```python
for col in df_new.columns:
    Q1 = df_new[col].quantile(0.25)
    Q3 = df_new[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*(IQR)
    upper = Q3 + 1.5*(IQR)
    
    target = df_new[col]
    df_iqr[col] = df_new[((target > lower) & (target < upper))]
```
#

Using pandas, BTW. I'm filtering by quartiles and defining the cutoffs in the for loop. I assume my problem is in the last line of the code. I'll try and play around. At the moment it is filtering some things and not others. It is completely removing a row (the index skips from 11 to 13)

#

If i should be posting this in a help thread let me know

#

I fixed one part by changing the last line to df_iqr from df_iqr[col] but for some reason it is just blasting index 12 row from every column

paper niche
#

> I fixed one part by changing the last line to df_iqr from df_iqr[col] but for some reason it is just blasting index 12 row from every column
@wind plume As long as the row has 1 column where the value is an "outlier", the whole row will be removed. What are you expecting will happen instead?

#

the dataframe has to remain in a tabular format (you can't just remove a "cell" -- speaking in excel terms)

wind plume
#

But other rows have outliers, yet they aren't removed. I'm hoping to filter every column individually by that column's IQR. There's no way every value in row 12 is an outlier

#

I hope that makes some sense.

#

I expect it to be replaced with NaN, a sign either a cell was blank or I filtered properly.

paper niche
#

what's df_iqr?

wind plume
#

That would be the new filtered dataframe

#

Col would be the individual columns in df_new

#

So it's saying df_iqr takes the same column names as df_new, but uses df_new's values and applies some function to them to selectively filter

paper niche
#

hmm but df_new is not changing throughout the loop, so the final df_iqr would just be the df_new with the rows with last column's outliers removed

#

your last line of the loop (after your "fix"), will keep overwriting df_iqr with df_new with 1 column's outliers removed

wind plume
#

I don't see how you get that, not saying you're wrong I just don't really understand what the last line of code is doing outside of the conditions I set.

paper niche
#

let's make sure we're on the same page first:
is this what you have at the moment?


```python
for col in df_new.columns:
    Q1 = df_new[col].quantile(0.25)
    Q3 = df_new[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*(IQR)
    upper = Q3 + 1.5*(IQR)
    
    target = df_new[col]
    df_iqr = df_new[((target > lower) & (target < upper))]
```

wind plume
#

So it's seeing an outlier in row0 and thus deleting it?

#

And yep!

paper niche
#

okay, so when col == 'B' (second loop), it sees row 2 has an outlier

#

and df_iqr will be set to a dataframe with row 2 removed

#

but then the next loop, col=='C', df_iqr gets set back to the full df_new (because there's no outliers), and row 2 appears back in df_iqr again

#

the final loop with col=='D', row 0 has an outlier, and so df_iqr is set to df_new with row 0 removed --> and that's what you get.

#

the issue here: df_new is never changing, yet df_iqr is being assigned df_new with a single column filtered (outliers removed)

#

and every loop df_iqr is being overwritten

wind plume
#

So basically it will remove the last row with an outlier? And in this case it happened to be mine

paper niche
#

it will remove rows where the last column has an outlier

wind plume
#

It sounds like I need to overwrite df_iqr[column] then

paper niche
#

so you want, at the end of everything, df_iqr and df_new to have the same shape?

#

just with the outliers to be replaced by np.nan?

wind plume
#

But if I replace df_iqr with df_iqr[column] I get a dataframe where column A is all the other columns but with the filter applied

#

Correct

#

I want the filter to apply and remove them. I figured making a new dataframe was the way to go but maybe not

paper niche
#

yea that won't be possible. you can't assign a dataframe as a series / column in a dataframe

lapis sequoia
#

hey yo

wind plume
#

Making a new dataframe out of the filtered values that is.

paper niche
#

in the simple example I have above: what is the desired output?

#

which rows should be in df_iqr?

wind plume
#

I can't easily paste it right now, but C0 and B2 would be NaN

#

All rows

paper niche
#

okay, you're not looking to remove them then. just replacing the outlier values with nan

wind plume
#

Yes, I suppose so. When I made this before, I was only working with a one column dataframe so it wasn't bad

#

But when working with multiple columns and applying individual statistics to each column its getting hard

#

I feel like my code is very close
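For reference, a minimal fix that keeps the loop (on made-up data with one outlier in column B, row 2, and one in column D, row 0): assign a Series back to each column with `Series.where`, which keeps values that pass the test and puts NaN elsewhere. Working on a copy keeps the shape and rows of `df_new`; as in the original, values exactly on a fence count as outliers because the comparisons are strict.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an outlier in column B (row 2) and one in column D (row 0).
df_new = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [10.0, 11.0, 99.0, 12.0, 11.5, 10.5],
    "C": [5.0, 5.1, 5.2, 5.3, 5.4, 5.5],
    "D": [-50.0, 1.0, 1.2, 0.9, 1.1, 1.0],
})

df_iqr = df_new.copy()
for col in df_new.columns:
    Q1 = df_new[col].quantile(0.25)
    Q3 = df_new[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR

    target = df_new[col]
    # Assign a Series (not a filtered DataFrame) back to this one column:
    # values outside the fences become NaN, all other rows and columns stay put.
    df_iqr[col] = target.where((target > lower) & (target < upper))

print(df_iqr)
```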

acoustic forge
#

I might be super dumb, but I am currently doing predictions on a dataset. I tried both Random Forest and Linear Regression models, but my R² score is consistently negative
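Some context on what a negative score means: R² = 1 - SS_res / SS_tot, so 0 is what you get by always predicting the mean of the true values, and anything below 0 is worse than that. Possible causes (guesses, not diagnosed from this message) include passing arguments in the wrong order (scikit-learn's `r2_score(y_true, y_pred)` is not symmetric) or predictions that are misaligned with the targets, for example after shuffling only one of them. A hand-rolled version on toy numbers:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
print(r2(y, y))                           # perfect predictions: 1.0
print(r2(y, np.full_like(y, y.mean())))   # always predicting the mean: 0.0
print(r2(y, y[::-1]))                     # worse than the mean: -3.0
```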

sonic night
#

Hello all, I'm a novice in data analysis and need some help: how can I make the number ranges on the x and y axes start from 1? Thank you very much for your help.
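The question is a little ambiguous; reading it as "plot with x values 1..n and make both axes start at 1", a matplotlib sketch with made-up values could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

values = [4, 7, 1, 8, 5]

fig, ax = plt.subplots()
# Use 1..n as x positions instead of the default 0..n-1.
x = range(1, len(values) + 1)
ax.plot(x, values, marker="o")

# Start both axes at 1 and put a tick on every integer x position.
ax.set_xlim(left=1)
ax.set_ylim(bottom=1)
ax.set_xticks(list(x))
```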

paper niche
#

@wind plume maybe try this

```python
Q1 = df_new.quantile(0.25)
Q3 = df_new.quantile(0.75)
IQR = Q3 - Q1

df_iqr = df_new.query('(@Q1 - 1.5 * @IQR) < @df_new < (@Q3 + 1.5 * @IQR)')
```

lapis sequoia
#

hey there

wind plume
#

What does @ do @paper niche

paper niche
#

it's a syntax for you to access your python variables within the query string

#

it's akin to

```python
df_new[(Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR)]
```

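A self-contained sketch of why no loop is needed, on made-up data: `quantile` on a DataFrame returns one value per column as a Series, and comparing a DataFrame with a Series lines the Series up with the columns, so each column is tested against its own fences. Indexing with the resulting boolean DataFrame gives the same result as `DataFrame.where`: same shape, NaN where the mask is False.

```python
import numpy as np
import pandas as pd

df_new = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [10.0, 11.0, 99.0, 12.0, 11.5, 10.5],
    "D": [-50.0, 1.0, 1.2, 0.9, 1.1, 1.0],
})

# Each of these is a Series indexed by column name: one value per column.
Q1 = df_new.quantile(0.25)
Q3 = df_new.quantile(0.75)
IQR = Q3 - Q1

# DataFrame-vs-Series comparisons align the Series' index with the columns,
# producing a boolean mask with the same shape as df_new.
mask = (Q1 - 1.5 * IQR < df_new) & (df_new < Q3 + 1.5 * IQR)

via_brackets = df_new[mask]      # indexing with a boolean DataFrame keeps the shape
via_where = df_new.where(mask)   # the explicit spelling of the same operation

print(via_where)
```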
wind plume
#

I don't think I've seen that before, but I'm pretty new to coding and pandas. Does it not work if you don't have the @?

#

Do i use the above code in my for loop? I assume not.

lapis sequoia
#

yeah first time I'm seeing this too

#

is this new.. is it performant

#

the query method

paper niche
#

> I don't think I've seen thst before but I'm pretty new to coding and pandas. Does it not work if you don't have the @
@wind plume no, it doesn't, and no, there's no need for a loop. If you have a look at what Q1 - 1.5*IQR < df_new is, it's a dataframe of the same shape as df_new, with boolean elements (True if the corresponding element in df_new is above its column's lower fence, i.e. not a low outlier; False otherwise). You can use this boolean "mask" to filter df_new and get your df_iqr

#

> is this new.. is it performant
@lapis sequoia I think so, lemme try to pull up a SO thread about this..

lapis sequoia
#

I see the last commit was on march 2020.. seems new

paper niche
#

I can't seem to find it... basically it saves you multiple lookups, especially if you're doing things like df[(df.A > 10) & (df.B < 100) & (df.C > 10)]

#

if I remember correctly
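A small illustration of the multi-condition filter above, on a made-up frame. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`; in a `query` string the same filter can be written with `and`. Whether `query` is actually faster depends on the data size and on whether numexpr is installed, so it is worth timing on the real data:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 15, 25, 30], "B": [50, 150, 50, 60], "C": [20, 20, 5, 40]})

# & binds tighter than > and <, so every comparison gets its own parentheses.
by_mask = df[(df.A > 10) & (df.B < 100) & (df.C > 10)]

# The same filter as a query string: bare names refer to columns.
by_query = df.query("A > 10 and B < 100 and C > 10")

print(by_mask)
```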

wind plume
#

@paper niche woah that worked. I don't really get how. I've seen people use query for things like this. I don't know how it is parsing through every column individually and doing statistics on it. That is insane.

#

With so little code wtf

paper niche
#

the performance gain is not significant if you're not doing multiple lookups like this

wind plume
#

I really want to understand this and not just accept this as an answer and move on

#

Cuz this is confusing to me

paper niche
#

> @paper niche woah that worked. I don't really get how. I've seen people use query for things like this. I don't know how it is parsing through every column individually and doing statistics on it. That is insane.
@wind plume break the code down into smaller pieces. like I said, have a look at what Q1 - 1.5*IQR < df_new is first, then what (Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR) is, then finally what df_new[...] looks like. you'll get a better feel of what this code is doing

wind plume
#

Like why it is a string. How it is looping through everything. What the @ does when the variables are already well defined

#

Okay I will

#

Mind if I try to explain it to you?

paper niche
#

go ahead

wind plume
#

The @q1-1.5*@iqr and @q3+1.5*iqr is applying a filter. It is saying if any value falls between that, it is now in df_iqr

#

Sorry for the bootleg discord code

#

On mobile atm

paper niche
#

no worries, yeah I get you

wind plume
#

I don't quite get the @ and why it is a string but I can look that up. And I assume the df_new.query is saying "for the values in the dataframe df_new that fit our condition, turn that into a dataframe" I guess it seems redundant and more of a procedure. I assume this wouldn't work if you used a generic df that would be defined earlier?

#

The grippy part is what the @ in the string is doing, I assume it is shorthand based off what you are saying

#

Because normally if you use IQR in a formula and you defined IQR before it is no problem

paper niche
wind plume
#

So a query by necessity needs to be a string

lapis sequoia
#

are you familiar with server logs, I'm trying to build a parser to end all parsers

paper niche
#

if you had a dataframe with a column x

```python
a = 1
df.query("x < @a")
```

it's just replacing @a with 1 (your python variable)

wind plume
#

So how is it applying my filter to EVERY column, individually?

paper niche
#

> So a query by necessity needs to be a string
@wind plume yeah. within the query string, normal alphabetic characters/words are interpreted as the column names (thus if you had df.query("x < a") instead (without the '@') then it will try to look for a column in df called 'a'
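A tiny example of the difference, on a made-up frame that has a column `a` as well as a Python variable `a`: inside the query string, `@a` is the variable and a bare `a` is the column.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2, 3], "a": [10, 10, 10, 10]})
a = 2

# '@a' refers to the Python variable a (= 2): keeps rows where x < 2.
print(df.query("x < @a"))
# A bare 'a' refers to the column named 'a' (all 10s): keeps every row.
print(df.query("x < a"))
```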

wind plume
#

Why is it not computing Q1, IQR, etc. over the whole dataframe?

paper niche
#

okay 1 sec

#

> are you familiar with server logs, I'm trying to build a parser to end all parsers
@lapis sequoia no unfortunately not. haha I don't deal with the logs 😉

lapis sequoia
#

me neither..

#

Apparently there is something called the common log format, and it can have n fields..

paper niche
lapis sequoia
#

but people have been trying to parse these with regexes, with limited success

#

seems like, they have to add capture groups when the logs change.. so I'm wondering if there's a way to make a parser that will parse the first line to check for all available fields from the entire list of fields defined in common log format

#

and not throw errors when a field is missing from newer logs

paper niche
wind plume
#

But your IQR and Q1/3 in this case i assume is applied not by column but by dataframe right?

#

Oh no, apparently not

paper niche
#

sorry, I misspoke just now

#

but this expression calculates the upper outlier threshold (fence) per column

#

as a pandas Series

lapis sequoia
#

and if a new field is introduced, it should be able to account for that too
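A sketch of one approach, not a general solution: a regex with named groups for the seven Common Log Format fields, plus an optional non-capturing group for the two extra Combined Log Format fields (referer, user agent), so lines with or without them both parse and missing fields come back as `None`. Servers such as Apache let admins define custom log formats, so a log with other fields would need its own pattern; the sample line follows the style of the Apache documentation example.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
# Combined Log Format adds "referer" "user-agent"; that part is optional here.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}|-) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields (absent optional fields are None), or None if no match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

clf = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
combined = clf + ' "http://www.example.com/start.html" "Mozilla/4.08"'

print(parse_line(clf))
print(parse_line(combined))
```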

paper niche
#

performing df_new < (series), pandas compares all the elements in column 'A', 'B' and 'C' with the respective outlier value in the (series) -- this is essentially your for-loop

#

you end up with a dataframe of booleans (called a mask) -- shown in the pic above

wind plume
#

But in that example above, Q3 + 1.5*IQR won't be a series, I think

#

You're just saying do this function

paper niche
#

you mean in the query string?

wind plume
#

Ah, that was your query string in this case? Ok

paper niche
#

if the query string syntax is still confusing, we can just discuss this one (it's entirely equivalent):

df_iqr = df_new[(Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR)]
#

where Q1, IQR and Q3 are pandas Series holding the respective per-column values
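A minimal sketch of the masking idea on hypothetical data. It also shows one likely source of unexpected NaNs. Indexing a DataFrame with a boolean DataFrame keeps the original shape and puts NaN wherever the mask is False. To drop whole rows instead, reduce the mask per row with `.all(axis=1)`:

```python
import pandas as pd

# hypothetical data: column A has one obvious outlier (100)
df_new = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 100.0],
    "B": [10.0, 11.0, 12.0, 13.0, 14.0],
})

Q1 = df_new.quantile(0.25)  # Series indexed by column name
Q3 = df_new.quantile(0.75)
IQR = Q3 - Q1

# comparing a DataFrame with a Series lines the Series up with the columns
mask = (df_new > Q1 - 1.5 * IQR) & (df_new < Q3 + 1.5 * IQR)

masked = df_new[mask]                 # same shape; outlier cells become NaN
rows_kept = df_new[mask.all(axis=1)]  # drops any row with an outlier in any column
```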

wind plume
#

The above makes a ton of sense. You are saying the iqr dataframe is now df_new with the appropriate cutoffs

#

So my problem was that I used THAT line inside a for loop

paper niche
#

yeah exactly. You were confused about how I was achieving this without a for-loop

#

hopefully it's clear now with the explanation about the masking

wind plume
#

And every time I did a for loop, it would throw out the last row that had an outlier

#

So I guess I need to learn when I need to use a for loop or not haha

paper niche
#

And every time I did a for loop, it would throw out the last row that had an outlier
@wind plume nono, you would throw out rows with an outlier in that column (the column that you're currently iterating over)

#

So I guess I need to learn when I need to use a for loop or not haha
@wind plume rule of thumb: explicit loops in pandas (and numpy) shouldn't be required in most cases (certainly not when the operations being performed are so simple)

wind plume
#

I've always thought of for loops like this: if I have a list and want to do something to every item in it, use a for loop

paper niche
#

for ordinary python lists, this is true. but not with numpy and pandas. much of the speed improvement from using these packages comes from knowing how to take advantage of vectorized operations.
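A minimal sketch of the difference, using a hypothetical array of random numbers. Both versions compute the same result. The absolute timings will vary by machine:

```python
import time

import numpy as np

values = np.random.default_rng(0).random(200_000)

# explicit Python loop: one interpreter round-trip per element
start = time.perf_counter()
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 2 + 1
loop_seconds = time.perf_counter() - start

# vectorized: the whole array is processed inside NumPy's compiled code
start = time.perf_counter()
vectorized = values * 2 + 1
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```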

#

and if a new field is introduced, it should be able to account for that too
@lapis sequoia regex isn't known for being flexible tho xD are you planning to do this with regex too? or..?

wind plume
#

Gooootcha. So the rule of thumb is if I'm a beginner there shouldn't be any need for me to use a for loop? I realize what I'm trying to do isn't probably mega difficult but it seems like the only way to solve this is to cheat and get help

lapis sequoia
#

I am not sure of the direction yet.. I think it'd be a nice tool to make

wind plume
#

It's my personal project not for school or anything so it's not cheating but you know what I mean

lapis sequoia
#

found the github

#
import apache_log_parser
line_parser = apache_log_parser.make_parser("%h <<%P>> %t %Dus \"%r\" %>s %b  \"%{Referer}i\" \"%{User-Agent}i\" %l %u")
paper niche
#

Gooootcha. So the rule of thumb is if I'm a beginner there shouldn't be any need for me to use a for loop? I realize what I'm trying to do isn't probably mega difficult but it seems like the only way to solve this is to cheat and get help
@wind plume yeah, just try to keep in mind when dealing with numpy/pandas/similar scientific computing packages that if you're explicitly building loops, there's probably a better way of doing it. and don't guilt trip over getting help haha. It's part of the learning process. Reading other people's answers is how you become aware of better solutions to your problems (it beats reading through the entire documentation yourself)

lapis sequoia
#

thing is if you see this line, it defines exactly what pattern to expect and where, this won't work if a field is missing in the log or if the fields are in different places

paper niche
#

so the common log format assumes every line follows the same format?

#

yeah I see

wind plume
#

I really appreciate the help dude. Those pesky for loops have been fucking me on my pandas project lol

paper niche
#

np 🙂

lapis sequoia
#

I'll work on a small part of my project first..

#

so, I'm thinking.. find all the fields available and set everything else to null in a row.. that way I can accept data when the field is introduced in newer logs
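A minimal sketch of that idea, under stated assumptions:

- Fields are space-separated and appear in Common Log Format order (host, ident, authuser, [date], "request", status, bytes).
- The line may optionally continue with the Combined Log Format's referer and user agent.
- The field names and the `parse_line` helper are hypothetical.

It only tolerates missing *trailing* fields. If fields can appear in a different order, you would still need each log's format string (the kind `apache_log_parser` takes) to know which value is which.

```python
import re

# Common Log Format field order, extended with the Combined Log Format's referer and user agent
FIELDS = ["host", "ident", "authuser", "timestamp", "request",
          "status", "bytes", "referer", "user_agent"]

# a token is a [bracketed] timestamp, a "quoted" string, or a run of non-space characters
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

def unwrap(token):
    """Drop the surrounding [] or "" from a bracketed or quoted token."""
    if len(token) >= 2 and token[0] + token[-1] in ("[]", '""'):
        return token[1:-1]
    return token

def parse_line(line):
    """Map tokens to FIELDS by position; missing fields and '-' placeholders become None."""
    tokens = [unwrap(t) for t in TOKEN.findall(line)]
    row = {name: None for name in FIELDS}
    for name, value in zip(FIELDS, tokens):
        row[name] = None if value == "-" else value
    row["extra"] = tokens[len(FIELDS):]  # anything beyond the known fields, kept rather than dropped
    return row

clf = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_line(clf))
```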

real wigeon
#

so im a bit new to datascience related projects

#

i looked up how to combine duplicate values across multiple columns

#

and am now trying to sort the output from greatest to smallest

#

data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())

#

im getting an error regarding the .max()

uncut shadow
#

well, it would be better if you would provide the error tho

jolly briar
#

@real wigeon do all the parts make sense? what is returned by ...max() ? does it make sense to .apply() to that returned value?

real wigeon
#
line 52, in <module>
    total_sum_by_region()
  File "/Users/asdkals/Library/Preferences/PyCharmCE2018.3/scratches/scratch_18.py", line 34, in total_sum_by_region
    data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())
  File "/Users/aklsdjals/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "/Users/alksdjals/Library/Preferences/PyCharmCE2018.3/scratches/scratch_18.py", line 34, in <lambda>
    data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())
AttributeError: 'int' object has no attribute 'nlargest'
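For what it's worth, here is why this fails. `.max()` after the groupby already returns a Series with one number per country. `Series.apply` then calls the lambda on each individual number, which produces `'int' object has no attribute 'nlargest'`. Call `nlargest` (or `sort_values`) on the aggregated Series itself instead.

A minimal sketch, assuming the goal is to combine the rows that belong to the same country and list the largest first. The toy frame is hypothetical, and whether you want `.sum()` (adding up provinces) or `.max()` depends on your data:

```python
import pandas as pd

# hypothetical stand-in for the "confirmed" table: several rows per country
confirmed = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "C", "C", "C"],
    "5/13/20": [5, 7, 20, 1, 2, 3],
})

per_country = confirmed.groupby("Country/Region")["5/13/20"].sum()  # one value per country
top = per_country.nlargest(20)  # called on the Series itself; already sorted largest first
# equivalently: per_country.sort_values(ascending=False).head(20)
```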
valid drum
#

Hi, I tried to implement backpropagation for convolutional layer but for some reason the results are wrong.
I tried to make a full convolution of the filters and the previous layer's gradients.
dA_prev shape : [K, H, W]
w(filters) shape: [K, C, H, W]
x shape: [C, H, W]

dA_dim, dA_h, dA_w = dA_prev.shape # previous layer's gradients
pad_h = dA_h - 1
pad_w = dA_w - 1
ow = np.pad(w, ((0, 0), (0, 0), (pad_h, pad_h), (pad_w, pad_w)), 'constant')
ow = ow[:, :, ::-1, ::-1]
dA = np.lib.stride_tricks.as_strided(ow, (ow.shape[0], x.shape[1], x.shape[2], dA_h, dA_w, ow.shape[1]),
                                     (ow.strides[0], ow.strides[2] * stride[0], ow.strides[3] * stride[1]) + (
                                         ow.strides[2], ow.strides[3], ow.strides[1]))
dA = np.tensordot(dA, dA_prev, axes=[(0, 3, 4), (0, 1, 2)])
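I haven't run the posted code, so treat this only as a hypothesis. Assume the forward pass is a valid cross-correlation, which is what most frameworks call convolution. Then flipping the padded filters and correlating them with an unflipped `dA_prev` appears to give the gradient rotated by 180°. Also, the `tensordot` result comes out as [H, W, C] rather than [C, H, W].

Below is a minimal stride-1 sketch to compare against, with a finite-difference check. `conv_forward`, `conv_backward_input` and the toy shapes are assumptions that mirror the layout described above. For stride > 1, the usual approach is to dilate the incoming gradient first (insert stride - 1 zeros between its elements) before this step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_forward(x, w):
    """Valid cross-correlation with stride 1.
    x: [C, H, W], w: [K, C, Hf, Wf] -> out: [K, H - Hf + 1, W - Wf + 1]"""
    _, _, hf, wf = w.shape
    windows = sliding_window_view(x, (hf, wf), axis=(1, 2))  # [C, Ho, Wo, Hf, Wf]
    return np.einsum("chwij,kcij->khw", windows, w)

def conv_backward_input(d_out, w):
    """dL/dx from d_out = dL/d(out): pad d_out by (filter size - 1) on every side
    and cross-correlate with the filters rotated by 180 degrees (a 'full' convolution)."""
    _, _, hf, wf = w.shape
    padded = np.pad(d_out, ((0, 0), (hf - 1, hf - 1), (wf - 1, wf - 1)))
    w_rot = w[:, :, ::-1, ::-1]
    windows = sliding_window_view(padded, (hf, wf), axis=(1, 2))  # [K, H, W, Hf, Wf]
    return np.einsum("khwij,kcij->chw", windows, w_rot)  # back to [C, H, W]

def numeric_grad_input(x, w, d_out, eps=1e-6):
    """Central differences of L = sum(conv_forward(x, w) * d_out) w.r.t. every element of x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        diff = np.sum(conv_forward(x_plus, w) * d_out) - np.sum(conv_forward(x_minus, w) * d_out)
        grad[idx] = diff / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 6))     # [C, H, W]
w = rng.standard_normal((3, 2, 3, 2))  # [K, C, Hf, Wf]
d_out = rng.standard_normal(conv_forward(x, w).shape)

analytic = conv_backward_input(d_out, w)
numeric = numeric_grad_input(x, w, d_out)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```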
sturdy laurel
#

Hey, I am looking for an uncased POS tagging model (preferably using the Hugging Face Transformers framework). Does anyone have any resources?

#

please tag me if you have any info, I am going to be away from Discord for a bit and I don't want to miss it 🙂

rain palm
wind plume
#

Maybe Im just putting in for loops when I dont need to, but say I have values (what I call item) in a column 'Sample' in dataframe called df_melt...

I've been stuck on this for a few hours. I am not sure if this is a red flag and it means my fundamentals are screwed or if this is tricky, or if there are millions of examples I can look up online. I hate coming here asking for help

Ultimately what I want to do is check whether a value in the 'Sample' column contains one of the case-insensitive keywords, and if it does, label THAT specific row 'Weathered'. If it does not have a keyword, we assume it is dry, so we label it 'Dry'.
An example column would have a value called "X Dry" or "Y weathered"

df_melt['State'] = ''
keywords = ['wet','weathered','weather']

for item in df_melt['Sample']:
    if any(kw.lower() in item.lower() for kw in keywords):
        print(item + ' is wet')
        df_melt['State'] = np.where(df_melt['Sample'].str.contains(item), 'Weathered','Dry')
    else:
        print(item + ' is dry')
#

My natural instinct is to make for loops if I want to iterate thru a list but as fickletofu said, there's probably ways around using for loops.

Is this where I should build a query?

#

Fwiw when I do this, I see the correct values labeled as 'is dry' or 'is wet', it's a matter of writing it. Not sure how, or why it's so difficult.

rain palm
#

@wind plume Like this?

In [39]: df = pd.DataFrame({'Sample': ['wet', 'WET', 'weathered', 'weather', 'dry']})                                                                                           

In [40]: df['State'] = np.where(df['Sample'].str.contains('wet|weathered|weather', case=False), 'Weathered','Dry')                                                              

In [41]: df                                                                                                                                                                     
Out[41]: 
      Sample      State
0        wet  Weathered
1        WET  Weathered
2  weathered  Weathered
3    weather  Weathered
4        dry        Dry
wind plume
#

@rain palm is there any way to make it totally case insensitive tho? So it could accept WeT, etc. That was my hope with the keywords. Will this also work for something named "720 Wet" or something like that?

rain palm
#

Do you know how to use regex?

wind plume
#

I don't, is it hard to learn? If this is something that will 100% help I am willing to learn

rain palm
#

Finds "720 WET" it seems:

>>> df = pd.DataFrame({'Sample': ['wet', '720 WET', 'weathered', 'weather', 'dry']})
>>> df['State'] = np.where(df['Sample'].str.contains('wet|weathered|weather', case=False), 'Weathered','Dry') 
>>> df
      Sample      State
0        wet  Weathered
1    720 WET  Weathered
2  weathered  Weathered
3    weather  Weathered
4        dry        Dry
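A minimal sketch that builds the same pattern from the original `keywords` list, so the list stays the single place to edit. The sample values are hypothetical. `na=False` makes missing samples count as "not wet" instead of producing NaN. The match is on substrings, so "wet" would also match a word like "wetland"; add `\b` word boundaries around each keyword if that matters.

```python
import re

import numpy as np
import pandas as pd

keywords = ["wet", "weathered", "weather"]
# "wet|weathered|weather"; re.escape guards against keywords that contain regex symbols
pattern = "|".join(re.escape(kw) for kw in keywords)

df_melt = pd.DataFrame({"Sample": ["X Dry", "Y weathered", "720 WeT", "Z", None]})
is_wet = df_melt["Sample"].str.contains(pattern, case=False, na=False)
df_melt["State"] = np.where(is_wet, "Weathered", "Dry")
print(df_melt)
```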
fallow thunder
#

@wind plume It will help you a long way with string matching.

wind plume
#

I missed the case = false, that is AWESOME.

rain palm
#

Yup.

#

The pandas docs are (unusually - sadly) very well written.

wind plume
#

That clarifies so much

#

I think then, any time I want to make or search something case insensitive I can do that

#

Does regex do it better or faster?

#

It's so counter intuitive to me why you wouldn't use a for loop, but fickletofu was right. Good solution without any for loop nonsense

#

Idk, if I am struggling with stuff like this is it normal? Or does this mean I really need to sit down and watch some YouTube class

rain palm
#

No, regex isn't faster necessarily.

#

Fine, takes time to learn.

fallow thunder
#

Avoid youtube.

wind plume
#

Do you recommend ways to learn this? I was really stuck and was going into for loops and shit when I really didn't need to. Tried a bunch of stuff and spent like hours on it. Then you showed me to use "|", and case = false which was immensely helpful. Doubt I could have found that elsewhere

#

What I heard was to go make your own program that's genuinely useful for you. That's what I'm doing

#

I've used this time WFH to learn to code since I am a research scientist and can't be in lab lol

fallow thunder
#

If you want to learn the right way that will help you a lot, look for books in youtube, if you want to find something there.

#

Also check the sites from the tools that you use, they normally spend time doing tutorials for you.

#

And experiment with your own projects.

#

But if you don't know the basics of programming avoid data science totally

wind plume
#

I learned the very basics through python crash course, but I'm no master of it. At that point in the book it had me code a game and build a website and work up data. I decided to start my own project that would help automate graphing and data workup (remove outliers etc)

fallow thunder
#

What did you learn?

wind plume
#

Dictionaries, lists, list comprehension, user inputs, if else, etc

#

Then it got into coding a game and I felt like I was copy and pasting and not really learning. And it didn't interest me because it wasn't for work. So I made it work applicable.

#

Learned pandas was pretty damn solid, then learned how to use pandas with other packages like seaborn and numpy tho in very limited detail

fallow thunder
#

You need to get logic going. Solve some problems.

#

It will help you find ways to solve problems with the data that you use

#

You can keep going with data science without doing that, but you can get stuck quite often with for loops, if else

wind plume
#

Ahhhh, awesome thank you for the link! Are these insanely hard challenges, or totally doable for a novice and if you can complete it, it's a solid start? If not, return to the python crash course?

fallow thunder
#

They have difficulties, so you can start with the easy ones

wind plume
#

I notice I get stuck on things for hours and trying to bash my head on things isn't fun. Sometimes I fix it, other times I post here and am like "ojhhhhhhhhhhh"

fallow thunder
#

If you find yourself hardstuck with the easy ones because you don't know the syntax, you can check the python documentation.

#

That's better than any video course you can find

wind plume
#

Is syntax my issue? A lot of problems I have are because I don't know how to do something I want to do. Not sure if that is logic, or syntax, or literally every coding problem ever.

You probably saw my example above and can probably see what I was trying to do, but the fact I couldn't do one small thing meant my code didn't work even tho the rest was sound

fallow thunder
#

It doesn't seem like it, that's why I'm recommending you do code challenges and read the documentation

#

Both things can help you to find solutions (for example, the case=False, it's in the documentation)

#

Alright, we should stop talking about this on a data science channel, if you want to ask for help #python-discussion

wind plume
#

I appreciate it a lot :)

uncut shadow
#

Hello. I have been looking for books on machine learning from scratch (it has to be from scratch, so no frameworks like TF, PyTorch, Theano etc., just numpy, pandas or matplotlib etc.). Unfortunately, I couldn't find any. Does anybody know any good books?

twin parcel
#

Thanks to everyone helped me with my job scraper! got an interview for a company that creates them next week 😄

lusty coral
#

Found strange pandas interaction let me share

#

Make a df

#

Then get a loc of df to another variable

#

Change original df

granite sierra
#

@uncut shadow I can't recommend any books; I had to do a similar project for a uni assignment. There are loads of YouTube tutorials, and if you search for example "neural net in python from scratch", you'll find loads of examples

lusty coral
#

Then the locced df is changed

#

Change locced df, original is not changed

#

Wth is that

#

I didn't try to change locced df after I changed original df though
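Whether a `.loc[...]` selection shares data with the original or is an independent copy depends on the selection and on the pandas version. Newer versions with Copy-on-Write enabled behave differently. That is probably why the two directions behaved differently here.

A minimal sketch of the portable fix, on hypothetical data: take an explicit `.copy()` whenever you want the two objects to be independent.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
subset = df.loc[0:1].copy()  # explicit copy: shares no data with df

df.loc[0, "a"] = 100     # changing the original...
subset.loc[1, "b"] = -1  # ...and changing the copy leave each other alone
```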

full flint
#

Hi guys,

#

is anyone around to answer a quick question?

#

I have a scatter plot showing the relationship between cosine and euclidean distance matrices that looks like this:
https://gyazo.com/372514a67c18132b6364582cfdc6125c

I have been asked to plot a second order polynomial over the data.

Our practicals and lectures don't really cover this so I was wondering if somebody could help explain? 😅
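One common way to do this:

1. Do a least-squares fit with `np.polyfit(x, y, 2)`. It returns the coefficients of a*x² + b*x + c.
2. Evaluate that polynomial on a grid of x values.
3. Draw the resulting curve over the scatter.

A minimal sketch, where the synthetic `cosine` and `euclidean` arrays are hypothetical stand-ins for your two distance matrices flattened into paired 1-D arrays (for example their upper triangles):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data standing in for the flattened distance matrices
cosine = rng.uniform(0, 1, 200)
euclidean = 0.5 * cosine**2 + 0.3 * cosine + 0.1 + rng.normal(0, 0.01, 200)

coeffs = np.polyfit(cosine, euclidean, deg=2)  # least squares, highest power first: [a, b, c]
fitted = np.poly1d(coeffs)

xs = np.linspace(cosine.min(), cosine.max(), 200)
fig, ax = plt.subplots()
ax.scatter(cosine, euclidean, s=5, label="pairs")
ax.plot(xs, fitted(xs), color="red", label="2nd-order fit")
ax.set_xlabel("cosine distance")
ax.set_ylabel("euclidean distance")
ax.legend()
```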

potent hamlet
#

hi everyone, does anyone know about TCN (Temporal Convolutional Network)? I have a project to predict inflation in my country (it's a time series case). I know TCN evolved from CNNs, which are used for image processing, but I've read that TCN can be used for time series data. I want to implement TCN for my case (inflation), but
I had difficulty getting started. Maybe you have used it or you have a reference about it, please tell me

polar acorn
#

@potent hamlet Check out https://github.com/philipperemy/keras-tcn for an implementation, easy to use and might work for your use case. I've used TCN for time series classification but not for forecasting. Worked great for classification at least.

potent hamlet
#

oh thank you very much

#

maybe you have repo on github about your TCN time series? can i see?

#

oh yes one more question, does that mean TCN is not suitable for forecasting (regression)?

polar acorn
#

@potent hamlet Sorry that repo is private and not mine I just wrote it 🙂 As I said I haven't tried it for time series forecasting and although it might work fine (Google had at least one article for "TCN time series forecasting") I have a feeling it would be overkill for something like inflation.

flat quest
#

haven't looked into TCN's yet, but if its anything similar to CNN's should work on time series regression just fine.
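The building block that separates a TCN from an image CNN is a 1-D convolution with two properties. It is *causal*: the output at time t only sees inputs at t and earlier. It is also *dilated*: it skips steps, so stacked layers cover a long history.

A minimal NumPy sketch of that block with fixed, hypothetical kernels. A real TCN such as keras-tcn learns the kernels and adds nonlinearities and residual connections on top.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], treating x before t = 0 as zeros."""
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left-pad only, so nothing from the future leaks in
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i, k in enumerate(kernel):
            y[t] += k * padded[t + pad - i * dilation]
    return y

# stacking kernel-size-2 layers with dilations 1, 2, 4 gives a receptive field of 1 + (1 + 2 + 4) = 8 steps
x = np.zeros(20)
x[5] = 1.0  # a single impulse at t = 5
y = x
for d in (1, 2, 4):
    y = causal_dilated_conv1d(y, np.ones(2), dilation=d)
print(y)  # zero before t = 5 (causal), non-zero for the 8 steps t = 5..12
```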

#

well you know you could always run an ml model on it and get the plot @full flint
it may not be as good as other statistical methods for finding the second order polynomial over the data, but would likely be significantly easier

#

@uncut shadow are you trying to implement complex models just to learn the code behind it or also the mathematical understanding of it?

uncut shadow
#

Yes

lapis sequoia
#

How do I plot 2 columns (one has decimal numbers ranging from 1 to 10 and other has corresponding values to that) using pandas and matplotlib? I want to plot the whole number 1-10 in x axis.
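A minimal sketch with hypothetical column names `x` and `y`. Sort by x so the line is drawn left to right, then fix the x-axis ticks to the whole numbers 1 to 10:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data: x holds decimal values between 1 and 10, y the matching values
df = pd.DataFrame({"x": [1.0, 2.5, 3.7, 5.2, 6.8, 8.1, 9.9],
                   "y": [3, 5, 4, 8, 7, 9, 6]})

ax = df.sort_values("x").plot(x="x", y="y", legend=False)
ax.set_xticks(range(1, 11))  # one tick per whole number 1..10
ax.set_xlim(1, 10)
```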

lapis sequoia
#

how can i master numpy

#

i cant learn all those syntaxes at once

polar acorn
#

Find a project you can do that would use numpy a lot and then learn what you need for that. Learning by doing often works better than reading through all the documentation or similar.

raw raptor
#

hello

#

I made some code for a neural network a little while back

#

I'm curious what you guys would think of it

arctic wedgeBOT
#

Hey @raw raptor!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

raw raptor
#

!code-blocks

arctic wedgeBOT
#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')
rigid storm
#

Hey guys, when using likert scale type responses for your analysis, how would you handle missing values? for example, in the experiment each participant had to fill out a total of 49 statements on a 7-point scale. Within these responses, sometimes there is an answer missing randomly.
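One common option for multi-item scales is person-mean imputation: fill a missing item with that participant's mean over the items they did answer, usually only when few items are missing. Whether it is appropriate depends on your analysis and on why values are missing, so treat this as a sketch. The column names, the scores and the `max_missing` threshold below are hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical responses: rows are participants, columns are statements, NaN = no answer
responses = pd.DataFrame({
    "s1": [7, 4, np.nan, 2],
    "s2": [6, np.nan, np.nan, 3],
    "s3": [5, 4, 1, np.nan],
})

max_missing = 1  # assumed threshold: only impute participants missing at most this many items
n_missing = responses.isna().sum(axis=1)
row_means = responses.mean(axis=1)  # NaN answers are skipped by default

imputed = responses.apply(lambda col: col.fillna(row_means))  # fill each gap with that row's mean
imputed = imputed[n_missing <= max_missing]  # drop participants with too many gaps
```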

raw raptor
#

k, here it is

#

I didn't have any use for a neural network yet, so I didn't program in back propagation or a fitness function

lapis sequoia
#

@polar acorn thanks , can u suggest some projects ?

polar acorn
#

If (and only if) you have some experience in deep learning, then making your own simple neural net is nice. Or you can explore the random module by implementing rock, paper, scissors vs the computer. Or a simple connect four game. Or find data on something you're interested in (sports, finance, Dota or whatever), put it in numpy and do some analysis. You would use pandas for this in real life, but for learning the internals you can use numpy. Or you could google around; you're probably not the first to ask.

raw raptor
#

Wow, never thought of that, thank you! I've barely used numpy before so I'd have to do some learning with that, but I'll definitely make some of these games to test it out once I get the time.

fervent bridge
#

When a 19 year old intern says that Data Scientist and AI are the same
-.-
I don't see the data in the video below that a data scientist is working on
https://www.youtube.com/watch?v=gn4nRCC9TwQ

Google's artificial intelligence company, DeepMind, has developed an AI that has managed to learn how to walk, run, jump, and climb without any prior guidance. The result is as impressive as it is goofy.

lapis sequoia
#

I need to do a project for learning numpy

#

How can i get the data sets for that

uncut shadow
fading drum
#

Hey guys sorry if this is the wrong chat room, does anyone mind giving me a hand with this error?

fervent bridge
#

@fading drum Its a warning

#

and it means exactly what it says

#

your data is prob a dict, I can't see what it is, but it's warning you that in future versions you won't be able to do that, so change your habits

lapis sequoia
#

@uncut shadow thanks

lament tiger
#

Hello team 👋, I'm working in Python on Colab with a BERT-base Q&A model using the simple-transformers library (https://simpletransformers.ai/). I have a model trained on SQuAD and it works pretty well, but! every time I ask a question I also have to provide a context that the answer can be extracted from 🤔.

Now here is the case, let's say i have a table containing a bunch of paragraphs with specific information about depression. Now let's say someone asks a query like: "what can i do to deal with depression?". What techniques do you guys recommend or know about so that based on the question i can choose the best paragraph where the answer will be taken? 🥴

Thank you for your time guys 🙏
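A common first step is a retrieval stage: score every paragraph against the question, then pass only the best one to the Q&A model as its context. A minimal sketch using plain TF-IDF and cosine similarity. It uses only the standard library, so it assumes nothing about simple-transformers. The toy paragraphs are hypothetical. A library vectorizer or sentence embeddings could replace the scoring later without changing the overall flow.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_tfidf(paragraphs):
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    # smoothed idf: rarer terms weigh more, and no term ends up with zero weight
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return docs, idf

def vectorize(counts, idf):
    return {t: c * idf[t] for t, c in counts.items() if t in idf}  # unseen words are ignored

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_paragraph(question, paragraphs):
    docs, idf = build_tfidf(paragraphs)
    q_vec = vectorize(Counter(tokenize(question)), idf)
    scores = [cosine(q_vec, vectorize(d, idf)) for d in docs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return best, scores

paragraphs = [
    "Depression symptoms include low mood and loss of interest.",
    "Regular exercise and talking to a therapist can help you deal with depression.",
    "Sleep hygiene tips: keep a regular schedule.",
]
question = "what can i do to deal with depression?"
best, scores = best_paragraph(question, paragraphs)
context = paragraphs[best]  # this paragraph would be passed to the Q&A model
```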

lapis ice
#

!paste

stable verge
#

Anyone used PIL before?

real wigeon
#

hey

#

I'm trying to get better with datascience, and currently trying to plot a line chart

#

trying to do something like this post

#

this is currently what I have

#
def plot_sums():
    confirmed.set_index('Country/Region')
    confirmed_date_time = confirmed[3:]
    date_time = pd.to_datetime(confirmed_date_time)
    countries = confirmed

    DF = pd.DataFrame()
    DF['countries'] = countries
    DF.set_index(date_time)

    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=.3)
    plt.xticks(rotation=90)
    plt.plot()
#

and my traceback is

#

File "/Applications/PyCharm CE.app/Contents/bin/BankApp/Users/asjkdhask/PycharmProjects/COVID19/Covid.py", line 73, in <module>
    plot_sums()
  File "/Applications/PyCharm CE.app/Contents/bin/BankApp/Users/asjkdhaskjda/PycharmProjects/COVID19/Covid.py", line 28, in plot_sums
    date_time = pd.to_datetime(confirmed_date_time)
  File "/Users/aksjdhaksjd/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 731, in to_datetime
    result = _assemble_from_unit_mappings(arg, errors, tz)
  File "/Users/ajksdhaskhd/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 832, in _assemble_from_unit_mappings
    "to assemble mappings requires at least that "
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
#

am I not allowed to use a slice like that to denote the columns I want to use?
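Here is what's going on:

- On a DataFrame, `df[3:]` slices *rows*, not columns.
- `pd.to_datetime` on a whole DataFrame tries to assemble dates from columns named year/month/day, which is what the ValueError is complaining about.
- `confirmed.set_index(...)` returns a new frame rather than changing `confirmed`, so its result has to be assigned.

A minimal sketch with a hypothetical wide table, assuming the date columns start at position 3 as in your slice:

```python
import pandas as pd

# hypothetical wide table: a few label columns, then one column per date
confirmed = pd.DataFrame({
    "Province/State": [None, "X", None],
    "Country/Region": ["A", "A", "B"],
    "Lat": [0.0, 1.0, 2.0],
    "5/11/20": [1, 2, 3],
    "5/12/20": [2, 3, 5],
    "5/13/20": [4, 4, 8],
})

rows = confirmed[3:]           # slices ROWS: everything from the 4th row on (empty here)
dates = confirmed.iloc[:, 3:]  # slices COLUMNS by position: the date columns

totals = dates.sum()  # one total per date column, indexed by the column labels
totals.index = pd.to_datetime(totals.index, format="%m/%d/%y")  # parse the labels, not the values
```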

uncut shadow
#

@real wigeon This should work I think

twilit onyx
#

Is there any website which allows me to feed images and it will recognise the digits for me?

#

Via API calls?

real wigeon
#

that definitely exists

#

but idk where

#

I know adobe has that as a premium feature

#

I can actually help you with that @twilit onyx

#

I have a script for stuff like that

#

thank you @uncut shadow that was an interesting read

#

yeah the thing is that im getting an unresolved attribute reference for unstack()

celest comet
#

Hey all, I'm a new python developer (really a new learner of python)

#

and I'm looking to get a job doing data mining/munging

sharp raven
#

does anyone know where i can find information about urllib? like how to use it
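The official reference is the `urllib.request` and `urllib.parse` pages of the Python standard library documentation. Below is a minimal sketch of the two pieces you'll use most: building a URL with query parameters, and downloading it. `build_url`, `fetch_text` and the example.com address are hypothetical placeholders.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

def build_url(base, path, params):
    """Join a base URL and a path, then append URL-encoded query parameters."""
    return urljoin(base, path) + "?" + urlencode(params)

def fetch_text(url, timeout=10):
    """Download a URL and decode the body (assumes the server sends UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

url = build_url("https://example.com/", "search", {"q": "data science", "page": 2})
# text = fetch_text(url)  # needs network access
```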

tranquil crane
#

Is there any good machine learning course for free that uses Python?

celest comet
#

I know there's a course from the Coursera guy...

#

Andrew.....

#

let me google it

tranquil crane
#

He uses Octave

celest comet
#

Andrew Ng

tranquil crane
#

That guy's voice is so....hypnotic

celest comet
#

says it's free enrollment, most of the courses on coursera can at least be taken for free without credit

#

@tranquil crane yeah it looks like it's done in octave, but it probably applies to python as well

tranquil crane
#

Thanks

cloud ledge
#

Hi Everyone, was wondering how you guys run your distributed programming

#

For instance, if I wanted to put 10 requests in for 10 models all at once, such as an async process, is there some sort of computing power I could dial into using a key and run my model?

ivory plank
#

What is it you want to do? @cloud ledge feedforward an x into 10 different models for 10 different outputs?

cloud ledge
#

This might not be the best place to ask, so I apologize in advance. I have users from my website submit requests to run pre-defined machine learning models

#

They purchase X amount of cores/gpus to run the models on, but the problem I am trying to solve is that now, how do I run 10 models, say for 10 users, all at once

ivory plank
#

Sorry, I'm still not getting it

cloud ledge
#

I can't run 10 models at once using the processing power of 1 server (say that only has 20 available cores)

#

So I was wondering if I could offload all that work somewhere

ivory plank
#

You're running your models in a VM or a container right, and now you're trying to optimize your network protocol handling?

cloud ledge
#

So yes, the models are in a container, I just don't have the resources to run them

#

I might have 100 people need to run a contained model, not sure the best way to be able to do that

ivory plank
#

I can't really help you much there since I'm not very familiar with distributed computing and database optimization. But, it appears your problem doesn't have to do with the neural networks themselves since you run them in a container and can treat them as just a piece of software.

#

You might want to try to ask the folks over at web-development/async/databases for their knowledge

cloud ledge
#

thanks ink - your patience and understanding is really appreciated

#

will do

real wigeon
#

im trying to get a sum per column from a df

#

which i then am trying to plot on a line chart

#

just need some help with the summing for now

uncut shadow
#

Well

real wigeon
#

i could probably use group by?

#

and do a loop

#

or something?

uncut shadow
#

You should google how to do this in pandas, but I think you can also put it in numpy and then sum the columns

real wigeon
#

i've been googling how to sum

#

.sum()

#

and then you set the index

#

df['column_name'].sum() sums one column

#

df.sum() returns sums for all the columns

#

idk, I guess I should put those new values in a series? and then plot those?
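No extra step should be needed: `df.sum()` already returns a Series whose index is the column names and whose values are the sums, and a Series can be plotted as a line directly. A minimal sketch on a hypothetical frame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove in a notebook
import pandas as pd

df = pd.DataFrame({"5/11/20": [1, 2], "5/12/20": [2, 3], "5/13/20": [4, 4]})  # hypothetical

column_sums = df.sum()  # already a Series: index = column names, values = sums
ax = column_sums.plot()  # line chart with the column names along the x axis
```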

#

alright yeah sry i thought it was more complicated

uncut shadow
#

👍

#

Googling mostly solves 99,99% of problems

real wigeon
#

and print statements

uncut shadow
#

Those too lol

real wigeon
#

the fact that it summed it per column but it's just called .sum()

#

confused me

valid drum
#

Will using a 4D array of [n, x, x, x] be a lot faster than iterating n times over an array of [x, x, x]?
I'm asking because I'm implementing a CNN using NumPy and I need to improve the performance to make it trainable at all (it's 100x slower than Keras on CPU only)...
You can check it out here if you want:
https://github.com/shafzhr/SimpleConvNet
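Usually yes. Each NumPy call carries fixed Python overhead, and one call over the whole [n, ...] array pays it once instead of n times. How much that helps depends on the sizes, so it is worth measuring. A minimal sketch, using a 1x1 convolution (pure channel mixing) as a hypothetical stand-in for your layer:

```python
import time

import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 3, 32, 32))  # [n, C, H, W]
w = rng.standard_normal((8, 3))               # hypothetical 1x1-conv weights [K, C]

def looped(batch, w):
    # one einsum call per sample, stacked afterwards
    return np.stack([np.einsum("kc,chw->khw", w, x) for x in batch])

def batched(batch, w):
    # a single einsum call over the whole [n, C, H, W] array
    return np.einsum("kc,nchw->nkhw", w, batch)

for fn in (looped, batched):
    start = time.perf_counter()
    out = fn(batch, w)
    print(fn.__name__, out.shape, f"{time.perf_counter() - start:.4f}s")
```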

whole roost
#

Hi!! I have a question about transposing an array?

#

I would say I know how, but it's just ... not working?

#
    phi = np.random.uniform(0,math.pi,N)
    theta = np.random.uniform(0,2*math.pi,N)
    r = 1
    x_array = r*np.sin(phi)*np.cos(theta)
    y_array = r*np.sin(phi)*np.sin(theta)
    z_array = r*np.cos(phi)
    skin = [x_array,y_array,z_array] # find a way to flip the rows and columns on this
    # print(np.ndim(skin))
    # not sure why ndim is giving 2, it ought to be 50x3
    sphere = np.asarray(skin)
    sphere.transpose
    print(np.shape(sphere))
    # make a 3D scatterplot with matplotlib
    return sphere
#

I'm trying to flip the rows and columns in skin. I thought I could do this with .transpose, but it's not working.

fervent bridge
#

@whole roost what are you passing in for N?

whole roost
#
    print(small_sphere)
fervent bridge
#
import numpy as np
import math

def sample_sphere_polar(N):
    phi = np.random.uniform(0,math.pi,N)
    theta = np.random.uniform(0,2*math.pi,N)
    r = 1
    x_array = r*np.sin(phi)*np.cos(theta)
    y_array = r*np.sin(phi)*np.cos(phi)
    z_array = r*np.cos(phi)
    skin = [x_array,y_array,z_array] # find a way to flip the rows and columns on this
    # print(np.ndim(skin))
    # not sure why ndim is giving 2, it ought to be 50x3
    sphere = np.array(skin).T
    print(np.shape(skin))
    print(np.shape(sphere))
    # make a 3D scatterplot with matplotlib
    return sphere

small_sphere = sample_sphere_polar(50)
#

@whole roost

#

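Worked for me. `sphere.transpose` without parentheses never actually calls the method, and even `sphere.transpose()` returns a new array instead of changing `sphere` in place, so the result has to be assigned, which is what `sphere = np.array(skin).T` does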

whole roost
#

Thank you!!

last peak
#

hi guys anyone here familiar with this factorization
A=UEV^T

#

When A is a rectangular matrix, the SVD

#

Does the SVD become Q^T DQ, where D is the diagonal eigenvalue matrix and Q is the orthogonal eigenvector matrix, when A is a square matrix?

merry ridge
#

No they are different decompositions.

#

You can always decompose a matrix using its SVD; an orthogonal diagonalization only exists when the matrix is symmetric (having n linearly independent eigenvectors gives an ordinary diagonalization A = PDP^-1, but not necessarily an orthogonal one).
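A quick numerical check of how the two relate, with random matrices as hypothetical input:

- A symmetric matrix S has an orthogonal diagonalization S = QDQ^T.
- The singular values of S are the absolute values of its eigenvalues, so the two factorizations coincide (U = V = Q, up to ordering and the usual sign/basis ambiguity) only when S is also positive semidefinite.
- A rectangular or non-symmetric matrix still always has an SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B + B.T  # symmetric, usually with both positive and negative eigenvalues

eigvals, Q = np.linalg.eigh(S)  # S = Q diag(eigvals) Q^T with orthogonal Q
U, sing, Vt = np.linalg.svd(S)  # S = U diag(sing) V^T, singular values sorted descending

print("eigenvalues:    ", np.round(eigvals, 3))
print("singular values:", np.round(sing, 3))

A = rng.standard_normal((3, 5))  # rectangular: no eigendecomposition, but an SVD exists
U_a, sing_a, Vt_a = np.linalg.svd(A, full_matrices=False)
```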

jagged basin
#

what does input_dim mean in keras?

#

if I had a perceptron like this

#

and input0 is 1

#

and input 1 is 0

#

what would be the input_dim of that layer?

spark stag
#

@jagged basin in this example it's (2,). The input dimensions are the 'shape' of the data for that layer; if you have used numpy you can think of it like the shape of a numpy array, there can be different dimensions of different sizes

jagged basin
#

I see

uncut shadow
#

from what I know it's basically the number of features your data has (number of columns)

#

normally you have data like this (x, y) where x stands for number of samples in your batch and y stands for number of features

#

(in RNNs you have (x, z, y), where z stands for the number of time steps, but this isn't an RNN)
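In the Keras versions that accept it, `input_dim=2` on the first Dense layer is shorthand for `input_shape=(2,)`. The batch dimension is left out, so only the number of features per sample is given.

A minimal NumPy sketch of what such a layer computes for the perceptron in the question. The weights, bias and step activation are hypothetical:

```python
import numpy as np

# each sample has two inputs (input0, input1) -> input_dim = 2, i.e. input_shape = (2,)
X = np.array([[1.0, 0.0],   # the sample from the question: input0 = 1, input1 = 0
              [0.0, 1.0],
              [1.0, 1.0]])  # a batch has shape (n_samples, n_features) = (3, 2)

weights = np.array([0.5, -0.5])  # hypothetical: one weight per input feature
bias = 0.1

outputs = (X @ weights + bias > 0).astype(int)  # step activation, one output per sample
```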

vague hawk
#

Noob question here regarding AI:

The google image search (similar image) - I assume it uses AI like the one below, right?
https://deepai.org/machine-learning-model/image-similarity

I was wondering - how does google get the result so quickly? Wouldn't they have to go through billions of pictures on the web?

uncut shadow
#

well

#
  1. They have really powerful machines for that
  2. They probably have some metadata for images so they are not checking all images tho
#

but I'm not 100% sure about the second one

crisp anvil
#

hey folks..
i need some help
i need to start learning machine learning

uncut shadow
#

so

#

do you know python?

crisp anvil
#

yeah i know

#

but i'm weak at maths

uncut shadow
#

well, it will be a problem if you want to make your own models without using any frameworks like Tensorflow, Keras or PyTorch

crisp anvil
#

actually i want to understand the underlying maths behind it

uncut shadow
#

so you need to know maths

#

mostly linear algebra

#

calculus

#

statistics

#

algorithms

#

and stuff like that

crisp anvil
#

maths and that plotting stuff

#

@uncut shadow yes

#

can you suggest me some good books to start with

#

including books for maths and ML etc

uncut shadow
#

well, I didn't read any books about this type of stuff so unfortunately, I'm not able to suggest anything

#

but you should google it; there are probably many interesting books out there

crisp anvil
#

i've tried that
but here i run into a problem:
if i start learning from a book on a single library, it keeps referencing concepts from another library

#

here i get stuck

lapis ice
#
```
Epoch [56/100] Batch 300/1588                   Loss D: 0.6502, loss G: 2.3085 D(x): 0.9010
Epoch [56/100] Batch 400/1588                   Loss D: 0.6502, loss G: 2.3040 D(x): 0.9014
Epoch [56/100] Batch 500/1588                   Loss D: 0.6502, loss G: 2.3045 D(x): 0.8998
Epoch [56/100] Batch 600/1588                   Loss D: 0.6502, loss G: 2.2953 D(x): 0.8995
Epoch [56/100] Batch 700/1588                   Loss D: 0.6502, loss G: 2.3021 D(x): 0.9003
Epoch [56/100] Batch 800/1588
```

Any idea what could cause the D loss to be "stuck" after a certain number of epochs, and the G loss to be so high from the very start?
Batch size is currently 8, learning rate is 0.0002

#

DCGAN

gusty willow
#

What are the best sources to learn deep learning?

rigid storm
#

Hi guys, i'm trying to compute means for some specific rows in a pd df. i changed the likert scale responses to numbers and got rid of some columns that had text in them. however, it still isn't able to compute the row means..

#

any idea why it returns NaN?

#

there are some NaN values in there (10 out of 4069) but i also set skipna to true so it should be able to calculate the mean still

lapis sequoia
#

suspecting that the nans are not understood as nans

rigid storm
#

In the original csv, this was just a blank cell

lapis sequoia
#

you have to make sure pandas agrees that they are nans

rigid storm
#

But surely it should at least be able to compute rows where no NaNs are found?

lapis sequoia
#

i guess?

rigid storm
#

however, only 4 out of 84 rows have NaNs in them

lapis sequoia
#

wait you're printing the temp_df

#

not temp_df.mean

rigid storm
#

Oh oops, this is the output;

lapis sequoia
#

wrong axis?

#

are they strings?

rigid storm
#

well, i dont think so, this is the code that i used to get response to number;

#
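```py
# Dutch Likert labels, from "not at all (-3)" to "completely (3)"; Man/Vrouw = male/female
mymap = {'Totaal niet (-3)': -3, 'Niet (-2)': -2, 'Enigszins niet (-1)': -1, 'Neutraal (0)': 0,
         'Enigszins wel (1)': 1, 'Wel (2)': 2, 'Helemaal (3)': 3, 'Man': 0, 'Vrouw': 1}

# Replace every mapped label with its number and leave any other cell unchanged
# (pandas >= 2.1 renames applymap to DataFrame.map; applymap still works but warns)
df = df.applymap(lambda s: mymap.get(s, s))
```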
lapis sequoia
#

have you printed the info

#

dtype

rigid storm
#

i tried some stuff indeed but it doesnt let me

#

whats the command for that?

#

to check all the datatypes in the dataframe

lapis sequoia
#

temp_df.info() just without any print

rigid storm
#

temp_df.dtypes()?

lapis sequoia
#

if there aren't too many columns/rows

#

or at least columns

rigid storm
lapis sequoia
#

so it's a series

rigid storm
#

Whats a series?

lapis sequoia
#

dataframes are made of series.

#

it's a 1D labeled array, basically a single column of a dataframe.

#

if you really want, you can do say temp_df.to_frame().info()

rigid storm
#

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 1 columns):
0 0 non-null float64
dtypes: float64(1)
memory usage: 752.0 bytes

lapis sequoia
#

so they're floats

rigid storm
#

i figured that shouldnt really be a problem for mean calculation tho

lapis sequoia
#

well you're doing something wrong

#

is my diagnosis

rigid storm
#

i mean for sure

#

haha

#

but its super weird, even if they're floats instead of integers, what would be the difference?

lapis sequoia
#

nothing

rigid storm
#

uhm

#

it doesnt like the 0?

lapis sequoia
#

0 is fine

rigid storm
#

it's okay with negative numbers, afaik?

lapis sequoia
#

though i wonder

#

just post the data here. it's 84 rows

rigid storm
#

the raw data (csv)?

lapis sequoia
#

in backticks please

rigid storm
#

Sorry i dont fully understand, the backticks are for code right?

lapis sequoia
#

yes but you can post data there too

#

it looks better and will be easier on to copypaste to test

#

that or the full csv. i don't care

rigid storm
#

I'll send the csv if you don't mind. if you want i can send the ipynb as well

arctic wedgeBOT
#

Hey @rigid storm!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

i don't need your ipynb. there's something wrong with it

rigid storm
#

uhmmm

#

could i send you a friend req real quick?

#

and send it privately

lapis sequoia
#

you can post the data here as a copypaste

#

it's not that many rows

rigid storm
#

How tho?

lapis sequoia
#

it's a csv file

#

that's text

#

open it up, ctrl+c, put it in back ticks, ctrl+v, enter

rigid storm
#

ah its > 2000 chars

#

cant post

lapis sequoia
#

such is life.

rigid storm
#

i mean if you dont feel like it via another way i'd understand that

lapis sequoia
#

my friend list is for people i know

rigid storm
#

Allright

#

np

lapis sequoia
#

but take the simplest solution here you can. just read in like 5 lines from the data

#

what happens then

rigid storm
#

should i send the first five?

lapis sequoia
#

well you can do that too

#

if it's somewhat representative of the rest of the data/problem

rigid storm
#

cant do it. even one line has too many chars.

#

with that text at the beginning at least

#

but i mightve fucked up at that mapping part. although it looks like it did convert to floats tho

lapis sequoia
#

are you sure the axis is correct

rigid storm
#

i think if i do axis=0 it will try to do something per row right

#

and axis =1 would be columns?

lapis sequoia
#

does it

rigid storm
#

all rows show this
@rigid storm this was something like temp_df.mean(axis=0)

#

which gave 'NaN' as the mean for each row up to the 84th

lapis sequoia
#

things are not always what they seem...

rigid storm
#

sorry it is 1 indeed

lapis sequoia
#

they flip around in these cases

#

normally 0 would be rows, so mean(axis=0) collapses the rows and gives one mean per column

rigid storm
#

yeah that mean(axis=1) gave all rows with na

lapis sequoia
#

this is what you expect right?

rigid storm
#

exactly

#

well

#

that, but with the rows

lapis sequoia
#

change the axis to 1 then
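A tiny pandas check of which way `axis` goes (the survey numbers are made up): `axis=0` collapses the rows and returns one value per column, `axis=1` collapses the columns and returns one value per row.

```py
import pandas as pd

# Hypothetical survey: each row is a participant, each column a question
df = pd.DataFrame({"q1": [1, -3], "q2": [2, 0], "q3": [3, 0]})

# axis=0 (the default) collapses the rows -> one mean per column (question)
print(df.mean(axis=0))  # q1: -1.0, q2: 1.0, q3: 1.5

# axis=1 collapses the columns -> one mean per row (participant)
print(df.mean(axis=1))  # row 0: 2.0, row 1: -1.0
```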

rigid storm
#

yeah ofc, but in your DF, it calculates the mean still

#

mine just outputs NaN

lapis sequoia
#

yes

#

what is your temp_df

#

is it still a series

#

if so, it only has one axis (axis=0), so axis=1 doesn't apply

rigid storm
#

let me check

#

wait

#

wtf

#

somethin happened

#

i actually have means now

#

i at least dropped the questions themselves (which was row1 - which was text)

lapis sequoia
#

are those means correct

rigid storm
#

they have to be, i'd say. scale goes from -3 to 3

lapis sequoia
#

dropping the zero hmm

rigid storm
#

a lot will be around 0 anyway

#

the zero's should have no influence on the means right

#

like it would be as if these responses dont exist

lapis sequoia
#

i guess you're dropping rows?

#

wait did you actually just drop the index 0?

rigid storm
#

wait maybe it does matter

lapis sequoia
#

since your index starts from 1 now

rigid storm
#

i dropped this:

#

oh

#

yeah

#

i dropped index 0 (row 0)

#

yes

lapis sequoia
#

... so what was your index 0?

rigid storm
#

which was the original question for the likert scale

lapis sequoia
#

🤦‍♂️

rigid storm
lapis sequoia
#

and that was causing the problem?

rigid storm
#

i guess? but i figured it would just give me NaN for 0 and the rest would be calculated

lapis sequoia
#

that shouldn't be a row in the data...

rigid storm
#

yeah true, that's how i got it from qualtrics 😅

lapis sequoia
#

the problem is that it's gonna coerce the whole column

#

into the same datatype

#

they're all some bs strings now or something

#

since you left it there

rigid storm
#

so it couldn't cope with it just because of that being in?

lapis sequoia
#

it should be part of the index if it has to be there but i think it shouldn't

#

well it makes everything a string

rigid storm
#

i thought it would just calculate row by row, and if a row wouldnt be possible NaN would be the output

lapis sequoia
#

did you try calculating row by row

#

you probably got nan for every single row

#

actually

#

it probably didn't even try

#

since it saw it was a string

rigid storm
#

do you have the syntax for calc of a secific row?

#

specific

#

kinda curious

lapis sequoia
#

it skipped over all the rows because they were all strings

#

just like it would skip over all the columns

rigid storm
#

But then those means should be the right ones correct?

lapis sequoia
#

the new ones? yes

rigid storm
#

i mean they look correct to me

lapis sequoia
#

they are

rigid storm
#

and then axis=0 i would get the means for all coumns right

lapis sequoia
#

yes

rigid storm
#

columns

#

God this took way too long haha

#

but thanks

lapis sequoia
#

you can try also .describe()

rigid storm
#

for the effort

lapis sequoia
#

to get the basic statistics for the axes

#

it gives you these same numbers and some others too

rigid storm
#

ah nice

lapis sequoia
#

also you would see all the data types if you just do .info() on the dataframe

#

it would have told you the columns are all objects

rigid storm
#

you mean

lapis sequoia
#

yes

rigid storm
#

this was on temp_df tho

#

so before i removed row 0

lapis sequoia
#

you can't calculate a mean on objects

#

that's the point

#

they have to be something that counts as a number of some kind
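A sketch that reproduces the problem with a made-up, Qualtrics-style CSV (the file contents are hypothetical): one text row under the header is enough to make every column non-numeric. The fix is to drop that row and convert the rest with `pd.to_numeric`, or to skip it while reading with `read_csv(..., skiprows=[1])`.

```py
import io

import pandas as pd

# Hypothetical Qualtrics-style export: the second line repeats the question text
csv_text = """Q1,Q2,Q3
How happy are you?,How tired are you?,How busy are you?
1,-2,
3,0,-1
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # no column is numeric: the text row turned them all into strings

# Fix 1: drop the text row (index 0) and convert what is left to numbers
clean = df.drop(index=0).apply(pd.to_numeric)
print(clean.mean(axis=1))  # one mean per participant; the empty cell is skipped (skipna=True)

# Fix 2: skip that line while reading (line 1 of the file, counting the header as line 0)
df2 = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(df2.dtypes)
```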

rigid storm
#

yeah makes sense

#

lol

#

well im glad it worked out in the end

lapis sequoia
#

hey y'all, i got this error: ParserError: Unknown string format: 2020-05-19 10-AM

after i tried to convert it with this df['Date'] = pd.to_datetime(df['Date'])

#

how can i convert "2020-05-19 10-AM" to datetime?

#

give to_datetime the correct format parameter

lapis sequoia
#

sorry for my ignorance, but do you have a simple example how i pass the format parameters?

#

i tried this now:

```py
for i in df['Date']:
    i.strptime('%Y-%m-%d %h-%p')  # fails: strptime belongs to datetime, not to str
```
#

but i just get this error AttributeError: 'str' object has no attribute 'strptime'

#
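```py
from datetime import datetime

for i in df['Date']:
    # %I is the 12-hour clock hour, which is what lets %p (AM/PM) take effect
    date = datetime.strptime(i, '%Y-%m-%d %I-%p')
```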
#

got it now 🙂 thanks though for your help @lapis sequoia

#

you can just shove that same format string to pd.to_datetime

#

and it'll do it automatically

#

without any looping
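Passing the format straight to `pd.to_datetime` converts the whole column at once, with no loop. A sketch with made-up dates; `%I` (12-hour clock) is used so that the `%p` AM/PM marker is actually applied:

```py
import pandas as pd

df = pd.DataFrame({"Date": ["2020-05-19 10-AM", "2020-05-18 11-PM"]})

# One vectorised call instead of a Python loop over strptime
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %I-%p")
print(df["Date"].tolist())
```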

rigid storm
#

hey @lapis sequoia if you dont mind one last question? Instead of replacing within the column, what would be the easiest way to replace for rows?

#

df['columnname'].replace(['NaN'], <the number>)

#

so this for example could be for all values in a column right?

lapis sequoia
#

mmm yes

#

that kinda takes the column as a series out of the dataframe and then does a replace on it

rigid storm
#

so id have to say df = .....

#

but for rows?

lapis sequoia
#

what's the difference between replacing in rows and columns in your case?

rigid storm
#

df['x'] = df['x'].replace(['NaN',], the number)

lapis sequoia
#

do you want to replace full rows?

rigid storm
#

i just want to replace the NaNs per row

#

with the mean

#

of that row

lapis sequoia
#

also when you assign back to a dataframe, you should always do df.loc[:, 'x'] =

#

ah

rigid storm
#

so each NaN can just be replaced with same number, but only that new number of that row (the mean of the row)

lapis sequoia
#

there are better functions for that

#

wait so

#

one moment

rigid storm
#

ok

lapis sequoia
#

you want to replace the NaN with the average of that row?

rigid storm
#

si

#

for ex. one participant might have 2 NaNs

#

those two will be replaced with that participant's mean of the rest of his responses

#

(Each row = one participant with 49 answers)

#

if they filled in everything

#

participant 74 has 4 NaNs > 74 mean was 0.222 so those 4 get 0.222

lapis sequoia
#

you want

#
```py
df.fillna(df.mean(axis=1), axis=1)
```
#

I think

#

try that

#

I hope that works

#

though I get the feeling those nans are not integers so it won't work

#

or floats

rigid storm
#

i assigned the same name to it again and then printed it btw

#

fyi

lapis sequoia
#

!e ```py
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((5,5)), columns=['a', 'b', 'c', 'd', 'e'])
df.iloc[2,2] = np.nan

print(df.fillna(df.mean(axis=1)))```

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lapis sequoia
#

blah

#

wrong chan

rigid storm
#

i can copy it

#

and indeed see what it does

#

ok so looks like the NaN is not filled in this case you sent

lapis sequoia
#

i guess you'll have to apply a function along axis=1 (row by row)

#

that does the filling by row

#

you could drop the axis=1 from what's happening here but then it'll fill with the column averages

rigid storm
#

What about filling the NaNs separately?

lapis sequoia
#

df.apply(lambda row: row.fillna(row.mean()), axis=1)

#

well that was 2015

#

but you can see it wasn't implemented in your version either
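Why the earlier attempts did nothing and what works, on a made-up two-participant frame: `replace(['NaN'], ...)` looks for the string `'NaN'`, not for missing values; `fillna` with a Series matches the Series index against the column labels, so row means keyed by row number fill nothing. Filling each row with its own mean works with `apply(..., axis=1)`, or, as an alternative sketched here, by transposing first.

```py
import numpy as np
import pandas as pd

# Hypothetical answers: each row is a participant, each column a question
df = pd.DataFrame({"q1": [1.0, -3.0], "q2": [np.nan, 0.0], "q3": [2.0, np.nan]})

# replace(['NaN'], ...) looks for the text 'NaN', so a real missing value stays missing
print(df["q2"].replace(["NaN"], 0).isna().sum())  # 1

# fillna(Series) matches the Series index (row labels 0, 1) against the column labels
# ('q1', 'q2', 'q3'); nothing matches, so nothing is filled
print(df.fillna(df.mean(axis=1)).isna().sum().sum())  # 2

# Row-wise: fill each row's gaps with that row's own mean (NaN is skipped in the mean)
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)

# Alternative: transpose so participants become columns, fill by column, transpose back
filled_t = df.T.fillna(df.mean(axis=1)).T
print(filled)
```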

rigid storm
#

this seems to work haha

#

numbers turned into floats as well somehow (at least for the observer)

#

but yeah that looks right

#

ofc they were already technically floats right

lapis sequoia
#

that depends a bit yeah

#

i'm not sure how the integer nans happen in pandas

rigid storm
#

but i think it should be fine right now

#

i can check one last time what the data type of the cells is or something

#

all float64

lapis sequoia
#

hey i got another question. I have dates in a dataframe that look like this: '2020-05-18 11-PM'
i used @broken mortar's tips and was able to convert the times with this function:

#

but now i realized that '2020-05-18 11-PM' and '2020-05-18 11-AM' both were converted to 2020-05-18 11:00:00

#

how can I make 11-PM 23:00 and 11-AM 11:00?

#

Why doesn't it automatically turn AM and PM into distinct times?

broken mortar
lapis sequoia
#
When used with the strptime() function, the %p directive only affects the output hour field if the %I directive is used to parse the hour.
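A minimal check of that rule with the standard library (timestamps taken from the question): with `%H` the AM/PM marker is parsed but ignored, while with `%I` it shifts PM times by 12 hours.

```py
from datetime import datetime

fmt_24h = "%Y-%m-%d %H-%p"  # %H: 24-hour clock, %p is parsed but has no effect
fmt_12h = "%Y-%m-%d %I-%p"  # %I: 12-hour clock, %p decides AM or PM

am, pm = "2020-05-18 11-AM", "2020-05-18 11-PM"
print(datetime.strptime(pm, fmt_24h))  # 2020-05-18 11:00:00
print(datetime.strptime(am, fmt_12h))  # 2020-05-18 11:00:00
print(datetime.strptime(pm, fmt_12h))  # 2020-05-18 23:00:00
```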
#

you are a walking demigod among us normal humans

#

it's called using google

#

thank you though

#

np. tips fedora

#

@lapis sequoia are you by any chance familiar with the reddit API or pushshift API?

broken mortar
lapis sequoia
#

I need to download an entire subreddit, that has a few posts a day and was created in 2008

lapis sequoia
#

that sounds fun

#

why do you need to

#

for my thesis i am doing some datascience and want to do some sentiment analysis based on posts and comments of certain subreddits

#

that sounds.. dated.. but ok

#

well it is just a little part of the thesis but it needs to be done...

#

do you have some suggestions on how to do it?

#

sure.. look up fastblob

#

there's also semantic context for complex sentences

#

as far as i read, the reddit API does not provide searching by time anymore, and my attempts with pushshift have been unsuccessful...

#

you mean fastblob for the sentiment analysis?

#

and you need to search by time, because?

#

I already created a framework for it, since it isn't in english.

#

ahh that's cool

#

i need it by time since I need to get all posts and comments from july 2017 to today, and the reddit API somehow limits results to about 1000

#

so do it in batches

#

yes, but I can only get the 1000 latest

#

but 1000 results seems like less than a week

#

ahh

#

that sucks..

#

that means you can't do it

#

try to look for an existing dataset

#

or send them a request through your school

#

the alternative is scraping, which is probably against ToS and a waste of time

#

well I thought about creating a spider with scrapy

#

but I also read that with pushshift it should be possible to get results by time but my attempts until now failed. I will paste my question from earlier this day:

#

hey y'all! Are any of you familiar with the pushshift API? I was using psaw and basically used their demo example to grab posts from 2017. But somehow it retrieves only the latest posts and not the ones from the time indicated:

```py
from psaw import PushshiftAPI
import datetime as dt

api = PushshiftAPI()

start_epoch = int(dt.datetime(2017, 1, 1).timestamp())

data = list(api.search_submissions(after=start_epoch, subreddit='neo',
                                   filter=['url', 'author', 'title', 'subreddit', 'num_comments', 'comments'],
                                   limit=10))

print(data)
```
This is what the code above returns for me, if I use limit=1 instead of 10:

```
[submission(author='anonboyGR', created_utc=1590054802, num_comments=0, subreddit='NEO', title='Pi Network Cryptocurrency', url='https://www.reddit.com/r/NEO/comments/gntymq/pi_network_cryptocurrency/', created=1590047602.0, d_={'author': 'anonboyGR', 'created_utc': 1590054802, 'num_comments': 0, 'subreddit': 'NEO', 'title': 'Pi Network Cryptocurrency', 'url': 'https://www.reddit.com/r/NEO/comments/gntymq/pi_network_cryptocurrency/', 'created': 1590047602.0})]
```
notice how this is in fact not from 2017...
This is the link to the example I used: https://psaw.readthedocs.io/en/latest/#first-10-submissions-to-r-politics-in-2017-filtering-results-to-url-author-title-subreddit-fields

#

where's the comment

#

I see author, date, etc but isn't the comment supposed to be part of the payload

#

well it's just an example but this post didn't have a comment (num_comments = 0)

#

also it was immediately deleted.

#

My problem though is that the post is not from 2017 even though i was sticking exactly to the example in the link

#

hmm

#

looks like it's something this person put together to search public posts.. he's not with reddit

#

maybe you can raise an issue on his github

#

the last commit was march

#

yeah... i will probably find a way to make it work

lapis ice
#

I have an issue with my DCGAN where the training basically halts at

#

Currently I am trying to read on the D/G module and how I can mess around with the activation functions

lapis sequoia
#

Hi guys. For emotion detection, which is the most accurate github project with pretrained models provided?

last peak
#

a = [1,2,3,4]

#

```py
import numpy as np

a = [1, 2, 3, 4]
b = np.asfarray(a)  # asfarray is a NumPy function, not a list method
b
# array([1., 2., 3., 4.])
```
Is there a way to change the dtype of np arrays back to int

#

like without turning it back to a list and converting to int with native python

#

i want to switch between numpy dtype to another numpy dtype in that framework, so hopefully nothing slows down too much
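`.astype` converts between NumPy dtypes directly, without a Python list round-trip. A short sketch; `np.asfarray` was removed in NumPy 2.0, so `np.asarray(a, dtype=float)` is used here instead. Note that casting floats to int truncates toward zero rather than rounding.

```py
import numpy as np

a = [1, 2, 3, 4]

b = np.asarray(a, dtype=float)  # array([1., 2., 3., 4.])
c = b.astype(int)               # back to the default integer dtype
d = b.astype(np.int32)          # or any specific NumPy dtype

print(b.dtype, c.dtype, d.dtype)
print(np.array([1.7, -1.7]).astype(int))  # [ 1 -1]: truncation, not rounding
```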

lapis ice
#
```
RuntimeError: size mismatch, m1: [8192 x 16], m2: [8192 x 16] at
```

What...

rustic igloo
#

Hello, i am stuck on an error that I don't know what the problem is... see code and error message below. Please let me know what i'm doing wrong! Thanks!

```py
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import Model

input_A = tf.random.normal([4,100],0,1)  # an eager tensor, not a symbolic tf.keras.Input(...)
input_B = tf.random.normal([4,100],0,1)  # same here; tf.keras.Model(inputs=...) expects Input placeholders

X = tf.matmul(input_A, tf.transpose(input_B))
X = tf.keras.layers.Dense(192)(X)
X = tf.keras.layers.Dropout(0.2)(X)
output = tf.keras.layers.Dense(1, activation='softmax')(X)  # note: softmax over one unit always outputs 1.0

# print(input_A, input_B, input_C, output)

model = tf.keras.Model(inputs=[input_A, input_B], outputs = output)
```

ERROR MESSAGE
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-68-4c747a5ac852> in <module>()
     13 # print(input_A, input_B, input_C, output)
     14 
---> 15 model = tf.keras.Model(inputs=[input_A, input_B], outputs = output)
     16 

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in op(self)
   1111   def op(self):
   1112     raise AttributeError(
-> 1113         "Tensor.op is meaningless when eager execution is enabled.")
   1114 
   1115   @property

AttributeError: Tensor.op is meaningless when eager execution is enabled.
```
uncut shadow
rustic igloo
#

@uncut shadow Thanks !

uncut shadow
#

👍

lapis sequoia
#

Hi. I was looking for a real-time emotion detection program written in Python that has the models pre-trained and available. Any suggestions?

sullen oasis
#

I think this might be the right place to ask... I am looking for an API that lets me see weather data. Specifically monthly highs/lows/averages. There's so many things online but they are all about the live weather forecast.

#

Any ideas?

rain palm
craggy coyote
#

One of the biggest struggles I find related to Data + python is regularly needing to un_nest keys in such a way that the data can be put into tables/CSV form for analysis.

I came across a stackoverflow post a while back which gave me a great function that I tweaked slightly, but it's still running into issues with nested lists of dictionaries.

Are there any well known methods of accomplishing this? Or should I keep hacking on what I have?

Example dataset:

```
{
    "key2": [
        {"nested_key1": "nested_value1",
         "nested_key2": "nested_value2"},

        {"nested_key1": "nested_value1",
         "nested_key2": "nested_value2"}
    ]
}
```
#

This is the function (it's 90% what I found on stack overflow, with tiny edits from me while testing it)

```py
def flatten_dictionary(d):
    result = {}
    stack = [iter(d.items())]  # Stack of iterators over (key, value) tuples, starting with the top-level dict
    keys = []
    while stack:
        for k, v in stack[-1]:  # Resume iterating the dict on top of the stack
            keys.append(k)
            if isinstance(v, list):
                if len(v) > 0:
                    for item in v:
                        if item:
                            if isinstance(item, dict):
                                if len(item.keys()) < 1:
                                    result['.'.join(keys)] = 'None'
                                else:
                                    stack.append(iter(item.items()))
                            elif isinstance(item, list):
                                result['.'.join(keys)] = '.'.join(item)
                                keys.pop()  # This may need to be re-commented out
                            else:
                                result['.'.join(keys)] = ''.join(str(v))
                                keys.pop()
                                break
                    break
                else:
                    result['.'.join(keys)] = 'None'
                    keys.pop()
            elif isinstance(v, dict):
                if len(v.keys()) < 1:
                    result['.'.join(keys)] = 'None'
                    keys.pop()
                else:
                    stack.append(iter(v.items()))
                    break
            else:
                result['.'.join(keys)] = str(v)
                keys.pop()
        else:
            if keys:
                keys.pop()
            stack.pop()
    return result
```
lapis sequoia
#

bruh

#

what are you doing.. don't do this

#

this is not how you unnest structures..

#

in your nested structure, what data do you actually hope to use and how do you want it structured
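Two common alternatives to the hand-rolled stack version, sketched under assumptions: a short recursive flattener that gives list items an index in the key, and `pandas.json_normalize`, which turns a nested list of records into rows (`record_path`) while repeating top-level fields on each row (`meta`). The `flatten` name, the `key1` field and the values are hypothetical, added so the example has a top-level field to carry along.

```py
import pandas as pd


def flatten(obj, parent_key="", sep="."):
    """Recursively flatten dicts and lists into {"a.b.0.c": value} pairs."""
    items = {}
    if isinstance(obj, dict):
        if not obj and parent_key:
            items[parent_key] = None  # keep empty dicts visible as a single None
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        if not obj and parent_key:
            items[parent_key] = None  # same for empty lists
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items


# Hypothetical record shaped like the example above, plus a made-up top-level "key1"
data = {
    "key1": "value1",
    "key2": [
        {"nested_key1": "a1", "nested_key2": "a2"},
        {"nested_key1": "b1", "nested_key2": "b2"},
    ],
}

print(flatten(data))

# One row per item in "key2", with "key1" repeated on every row
table = pd.json_normalize(data, record_path="key2", meta=["key1"])
print(table)
```

The recursive version gives one wide record per input; `json_normalize` gives one row per nested item, which is usually the shape you want for a CSV.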

jagged basin
#

any resources on creating a genetic algorithm in keras?

balmy hare
#

how do i run a command when someone reacts to a message 🙂

lapis sequoia
#

need a project that takes webcam feed and tells in real time if you are happy, sad, surprised etc. It should have pretrained models etc
Any links?

flint lynx
#

Anyone able to help with a Pandas/Matplotlib question in Help-hydrogen?

real wigeon
#

so

#
```py
def plot_sums():
    index_confirmed = confirmed.set_index('Country/Region')
    confirmed_date_time = index_confirmed.iloc[:, 3:]
    summed_values = confirmed_date_time.sum(skipna=True)
    summed_values.plot.line()
```
#

I'm getting an exit code 0 from this, but my output contains no plot. What gives?
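In a plain script, a pandas/matplotlib plot only appears when `plt.show()` is called (and only if `plot_sums()` itself is called). A sketch with a small made-up frame whose layout is assumed from the `Country/Region` column and the `iloc[:, 3:]` slice (descriptive columns first, then one column per date). Passing `confirmed` in as a parameter is a change made only for the sketch, and the `Agg` backend line exists only so it runs headless; remove it to get a window.

```py
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; delete this line to get a window
import matplotlib.pyplot as plt
import pandas as pd


def plot_sums(confirmed):
    index_confirmed = confirmed.set_index('Country/Region')
    confirmed_date_time = index_confirmed.iloc[:, 3:]  # keep only the date columns
    summed_values = confirmed_date_time.sum(skipna=True)
    ax = summed_values.plot.line()
    plt.show()  # without this, a plain script exits without ever displaying the figure
    return ax


# Hypothetical data: descriptive columns, then one column per date
confirmed = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['A', 'B'],
    'Lat': [0.0, 1.0],
    'Long': [0.0, 1.0],
    '1/22/20': [1, 2],
    '1/23/20': [3, 4],
})

ax = plot_sums(confirmed)  # the function also has to be called
```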

valid drum
#

How can I vectorize this (it’s very slow)?

```py
def backprop(self, dA_prev):
    """
    Back propagation in a max pooling layer
    :param dA_prev: derivative of the cost function with respect to the previous layer (when going backwards)
    :return: the derivative of the cost function with respect to the current layer
    """
    x = self.cache['X']
    n_batch, ch_x, h_x, w_x = x.shape
    h_poolwindow, w_poolwindow = self.pool_size

    dA = np.zeros(shape=x.shape)  # dC/dA --> gradient of the input
    for n in range(n_batch):
        for ch in range(ch_x):
            curr_y = out_y = 0
            while curr_y + h_poolwindow <= h_x:
                curr_x = out_x = 0
                while curr_x + w_poolwindow <= w_x:
                    window_slice = x[n, ch, curr_y:curr_y + h_poolwindow, curr_x:curr_x + w_poolwindow]
                    i, j = np.unravel_index(np.argmax(window_slice), window_slice.shape)
                    dA[n, ch, curr_y + i, curr_x + j] = dA_prev[n, ch, out_y, out_x]

                    curr_x += self.stride
                    out_x += 1

                curr_y += self.stride
                out_y += 1
    return dA
```

merry ridge
#

What kind of derivative is this? I assume you are doing some kind of shooting method but I can’t follow the discretization.

valid drum
#

@merry ridge
Max pooling

#

@merry ridge
That’s how I vectorized the forward propagation:


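```py
import numpy as np
from numpy.lib.stride_tricks import as_strided

# runs inside the layer's forward method; x has shape (N, C, H, W)
n_batch, ch_x, h_x, w_x = x.shape
h_poolwindow, w_poolwindow = self.pool_size

out_h = int((h_x - h_poolwindow) / self.stride) + 1
out_w = int((w_x - w_poolwindow) / self.stride) + 1

# Strided view of every pooling window: (N, C, out_h, out_w, pool_h, pool_w)
windows = as_strided(x,
                     shape=(n_batch, ch_x, out_h, out_w, *self.pool_size),
                     strides=(x.strides[0], x.strides[1],
                              self.stride * x.strides[2],
                              self.stride * x.strides[3],
                              x.strides[2], x.strides[3])
                     )
out = np.max(windows, axis=(4, 5))
```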

#

But I can’t find a way to do so for the back-propagation...
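One way to vectorize the backward pass, sketched as a standalone function (`maxpool_backward` is a hypothetical name, not part of the original layer): reuse the forward pass's `as_strided` windows, take `argmax` over each flattened window, convert it to absolute input coordinates, and scatter the incoming gradient with `np.add.at`. Unlike the loop's plain assignment, `np.add.at` accumulates when windows overlap (stride smaller than the pool size), which is the usual gradient; for non-overlapping windows both give the same result.

```py
import numpy as np
from numpy.lib.stride_tricks import as_strided


def maxpool_backward(x, dA_prev, pool_size, stride):
    """Vectorised max-pool gradient for x of shape (N, C, H, W)."""
    n_batch, ch_x, h_x, w_x = x.shape
    ph, pw = pool_size
    out_h = (h_x - ph) // stride + 1
    out_w = (w_x - pw) // stride + 1

    # Same strided view as in the forward pass: (N, C, out_h, out_w, ph, pw)
    windows = as_strided(
        x,
        shape=(n_batch, ch_x, out_h, out_w, ph, pw),
        strides=(x.strides[0], x.strides[1],
                 stride * x.strides[2], stride * x.strides[3],
                 x.strides[2], x.strides[3]),
    )

    # Position of the max inside every window (first max wins, like np.argmax in the loop)
    flat_idx = windows.reshape(n_batch, ch_x, out_h, out_w, ph * pw).argmax(axis=-1)
    di, dj = np.unravel_index(flat_idx, (ph, pw))

    # Absolute input row/column of each max; all index arrays broadcast to (N, C, out_h, out_w)
    rows = np.arange(out_h)[:, None] * stride + di
    cols = np.arange(out_w)[None, :] * stride + dj
    n_idx = np.arange(n_batch)[:, None, None, None]
    c_idx = np.arange(ch_x)[None, :, None, None]

    dA = np.zeros(x.shape)
    # Scatter-add so that overlapping windows accumulate their gradients
    np.add.at(dA, (n_idx, c_idx, rows, cols), dA_prev)
    return dA
```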

rustic igloo
#

@rustic igloo I think this might help you https://github.com/tensorflow/tensorflow/issues/27739
@uncut shadow Thanks for the link, but I don't still quite understand. The article said this was a bug and was fixed, so why is this still occurring?

If it is suggesting to use tf.Variable for all parameters of the moving average function, may I know which variables in my code this needs to be applied to?

Thanks.

lilac fiber
#

can anybody help with ROC curves?

arctic canopy
#

Guys, if anyone here works with numpy, please give me any advice on learning it and staying motivated (I want to learn it for ML, and I've already started, but sometimes it seems a bit difficult)

dense scroll
#

Hey guys, I am currently learning and specializing in data science. I am really loving this topic. I am starting to think of ways to make it my main source of income. Do you guys know what types of people would hire a Data Science Company/Product/Service and which problems they are usually trying to solve? (I am not trying to get hired by a company, but to start my own company that sells data science solutions)

rustic igloo
#

@arctic canopy I'm also learning this. The best way I found is to practice numpy methods with a small set of code. Also worthwhile to read up on difference it has with other packages like pandas (which is also based on numpy).

arctic canopy
#

@rustic igloo Thanks for your reply. I'm reading a book called Python for Data Analysis, so it will take me to pandas after I finish the numpy chapter. I will try to practice it more as you said; also, can you give me some beginner projects?

rustic igloo
#

@arctic canopy try practicing something like this:
https://www.machinelearningplus.com/python/101-numpy-exercises-python/

arctic canopy
#

@rustic igloo Thanks i will check it out

lapis sequoia
#

that's pretty nice

#

someone pin this

dull turtle
#

hello guys, i have a doubt: what does this mean?

#

```
109ms/step - loss: 5.1975e-07 - accuracy: 1.0000 - val_loss: 0.0000e+00 - val_accuracy: 1.0000
```

jagged basin
#
```py
self.networks[i].get_weights() + self.networks[v].get_weights()
```
#

(keras) whenever I try to add the weights of two different networks

#

it returns an error

#

is there a way I could bypass this?

uncut shadow
#

Well

#

It's probably because matrices storing these weights have different shapes
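Another likely factor: `get_weights()` returns a Python list of NumPy arrays, so `+` between two such lists concatenates them instead of adding them, and `set_weights` then receives twice as many arrays as the model has. Adding layer by layer with `zip` is one fix, assuming both networks share the same architecture (otherwise the shapes differ, as mentioned above). The arrays below are stand-ins, not real Keras weights:

```py
import numpy as np

# Hypothetical stand-ins for model.get_weights(): a list of arrays (kernel, bias)
weights_a = [np.ones((2, 3)), np.zeros(3)]
weights_b = [np.full((2, 3), 2.0), np.ones(3)]

concatenated = weights_a + weights_b  # list concatenation: 4 arrays, nothing is added
summed = [wa + wb for wa, wb in zip(weights_a, weights_b)]         # element-wise, layer by layer
averaged = [(wa + wb) / 2 for wa, wb in zip(weights_a, weights_b)]  # e.g. for averaging networks

print(len(concatenated), len(summed))  # 4 2
# summed (or averaged) could then be passed back with model.set_weights(...)
```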

valid drum
#

@merry ridge Do you have any ideas?

#

Maybe extracting the windows and then summing over a certain axis? I really have no idea...

blazing bridge
#

@arctic canopy
I noticed you were having trouble with Numpy. Check out the channel Coding Matrix. They have beginner friendly content. https://m.youtube.com/channel/UCKaajyjktvduM6mmuBtAOyg

#

@arctic canopy are you reading the book physically or electronically

valid drum
#

Do we divide the gradients by the batch-size in Adam optimizer?

#

Because I haven’t seen that mentioned at all...
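Adam itself does not divide by the batch size; the averaging usually happens in the loss, since common defaults reduce per-sample losses with a mean (for example `reduction='mean'` in PyTorch's loss classes). And because Adam divides the first-moment estimate by the square root of the second, multiplying every gradient by a constant barely changes the step (only `eps` breaks the symmetry). A NumPy sketch of one textbook Adam step with made-up gradients; `adam_step` is a hypothetical helper, not a library function:

```py
import numpy as np


def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns (parameter change, new m, new v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v


batch_size = 32
grad_mean = np.array([0.2, -0.05])  # hypothetical gradient of the batch-mean loss
grad_sum = grad_mean * batch_size   # the same gradient for the batch-sum loss

zeros = np.zeros(2)
step_mean, _, _ = adam_step(grad_mean, zeros, zeros, t=1)
step_sum, _, _ = adam_step(grad_sum, zeros, zeros, t=1)
print(step_mean, step_sum)  # both very close to [-0.001, 0.001]
```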

dull turtle
#

hi, i have the following condition

#
```py
if result2 == 0:
    print("country name: Aba, document type: driving licence")
elif result2 == 1:
    print("country name: Aba, document type: Passport")
```
#

but my 1st condition (the "if") is always true

#

but in this case the elif condition should be true: i am using a passport image and it still gives licence as the output

#

when i pass a 'passport' image it predicts 'licence'

late torrent
#

the 'Export Notebook as HTML' option has the most horrific styling ^

#

how can I get something simple and clean that still has all the syntax highlighting etc without rewriting all the CSS?!

dull turtle
#

my image recognition model is predicting "passport" as "licence" and vice versa. what could the issue be?

uncut shadow
#
  1. Maybe 1 stands for driving license and 0 for passport in the dataset?
  2. Your model might not be trained with enough data (or there is something wrong with the model), which causes this.
  3. Maybe you should change the threshold for predicting those values?
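The model code isn't shown, so this is only one possible cause, stated as an assumption: if the network ends in a single sigmoid unit and `result2` is computed with `argmax`, the result is always 0, because `argmax` over one column has only one index to return. Thresholding the probability avoids that. The probabilities, the 0.5 threshold and the label mapping below are hypothetical.

```py
import numpy as np

# Hypothetical outputs of a model ending in Dense(1, activation="sigmoid"), shape (batch, 1)
probs = np.array([[0.91], [0.08], [0.55]])

# argmax over a single column can only ever return 0
print(np.argmax(probs, axis=1))  # [0 0 0]

# Threshold the probability instead (0.5 is an assumed default; tune it on validation data)
threshold = 0.5
result2 = (probs[:, 0] > threshold).astype(int)
print(result2)  # [1 0 1]

labels = {0: "driving licence", 1: "Passport"}  # must match the encoding used in training
print([labels[r] for r in result2])
```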
lapis sequoia
#

Can anyone help me with this question?
You are designing a neural network to extract a feature map of size 50 x 50 from a colour image of size 100 x 100 x 3
What is the number of parameters if only one fully connected layer is used?

#

trying to study for my exam and i dont know where to begin with this question
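One way to count it, assuming the 50 x 50 feature map has a single channel and the layer has one bias per output unit: the image flattens to 100 · 100 · 3 = 30,000 inputs, the map has 50 · 50 = 2,500 outputs, and a fully connected layer has one weight for every input-output pair.

```py
n_in = 100 * 100 * 3  # flattened colour image: 30,000 inputs
n_out = 50 * 50       # one unit per feature-map position: 2,500 outputs (single channel assumed)

weights = n_in * n_out  # one weight per input-output pair
biases = n_out          # assumed: one bias per output unit
print(weights, biases, weights + biases)  # 75000000 2500 75002500
```

Without biases the count is 75,000,000.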

arctic canopy
#

@blazing bridge thanks for the channel, im reading an electronic book

dull turtle
#

@uncut shadow now my model only predicts "licence": even a "passport" image is predicted as "licence".

last peak
#

Could someone explain how NumPy's
`np.swapaxes(...)`
and
`np.moveaxis(...)`
work? I am having a hard time visualizing them.

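Examples

```python
>>> import numpy as np
>>> x = np.zeros((3, 4, 5))
>>> np.moveaxis(x, 0, -1).shape
(4, 5, 3)
>>> np.moveaxis(x, -1, 0).shape
(5, 3, 4)

>>> x = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
>>> x
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
>>> np.swapaxes(x, 0, 2)
array([[[0, 4],
        [2, 6]],

       [[1, 5],
        [3, 7]]])
```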

#

I don't understand what these axes 0 and 2 are. How did that switch those numbers?
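Axis 0 is the outermost bracket level and axis 2 the innermost, so every element has an index `[i, j, k]`. `np.swapaxes(x, 0, 2)` makes the first and last index trade places: the element at `x[k, j, i]` ends up at `y[i, j, k]`. For example, the 4 sits at `x[1, 0, 0]` and moves to `y[0, 0, 1]`. A small check using the same 2 x 2 x 2 array:

```python
import numpy as np

x = np.arange(8).reshape(2, 2, 2)  # same values as the example: [[[0,1],[2,3]],[[4,5],[6,7]]]
y = np.swapaxes(x, 0, 2)

# Swapping axes 0 and 2 means the first and last index trade places
for i in range(2):
    for j in range(2):
        for k in range(2):
            assert y[i, j, k] == x[k, j, i]

print(x[1, 0, 0], y[0, 0, 1])  # 4 4
```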

#

uh helloo

#

OK, I understand swapaxes now. What about moveaxis, though?
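One way to picture `np.moveaxis`: it takes a single axis out of the shape and reinserts it at a new position, and the remaining axes keep their relative order and shift over to make room. `swapaxes`, by contrast, exchanges two axes and leaves the others where they were. For a single axis this is equivalent to `transpose` with a permutation built as below; `moveaxis_via_transpose` is a hypothetical helper written for illustration, not part of NumPy.

```python
import numpy as np

def moveaxis_via_transpose(x, source, destination):
    """Illustrative re-implementation of np.moveaxis for one axis."""
    source %= x.ndim          # turn negative axis numbers into positive ones
    destination %= x.ndim
    # Keep every other axis in its original order...
    order = [ax for ax in range(x.ndim) if ax != source]
    # ...then slot the moved axis into its new position
    order.insert(destination, source)
    return x.transpose(order)

x = np.zeros((3, 4, 5))
print(moveaxis_via_transpose(x, 0, -1).shape)  # (4, 5, 3)
print(moveaxis_via_transpose(x, -1, 0).shape)  # (5, 3, 4)

# swapaxes exchanges two axes and leaves the rest alone
print(np.swapaxes(x, 0, 2).shape)              # (5, 4, 3)
```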

lapis sequoia
#

Can anyone suggest what I should learn for machine learning?

gusty willow
#

How do I select only the columns that are in English from a table with several languages?

#

@lapis sequoia maths and statistics, basically... and a language to code in

lapis sequoia
#

Yeah, I have learnt Python

#

I want to learn ML in Python

rigid storm
#

Hi guys, how would you approach comparing two groups that took the same survey but differ in age? (The survey was filled in with Likert-scale responses between -3 and 3.)

#

There are 50 Likert items per respondent, so do we just check normality (if that's even possible) for each column (question)? That seems incorrect.

#

Should we skip the normality check and just use a nonparametric test?

#

However, due to some NaNs, some of these responses were imputed with the mean of the row, which is a continuous value (for example 0.22).
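If you go the nonparametric route, one common choice for two independent groups is the Mann-Whitney U test, run once per item. It works on ranks, so it needs no normality assumption, and imputed values such as 0.22 simply take their place in the ranking. With 50 items, some tests will come out "significant" by chance, so the sketch applies a Bonferroni correction as the simplest (conservative) option. The data here is random and purely hypothetical; the group names and sizes are placeholders for your own arrays (rows = respondents, columns = items).

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical data: rows = respondents, columns = 50 Likert items scored -3..3
group_young = rng.integers(-3, 4, size=(40, 50))
group_old = rng.integers(-3, 4, size=(35, 50))

p_values = []
for item in range(group_young.shape[1]):
    # Compare the two groups' answers to one question at a time
    res = mannwhitneyu(group_young[:, item], group_old[:, item],
                       alternative="two-sided")
    p_values.append(res.pvalue)
p_values = np.array(p_values)

# Bonferroni: divide alpha by the number of tests run
alpha = 0.05
significant = p_values < alpha / len(p_values)
print(f"{significant.sum()} of {len(p_values)} items differ after correction")
```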

uncut shadow
#

@gusty willow well, if all you have is raw data (for example CSV files or something), then there's no way to do this automatically, though. A computer can't detect which language the text is in (not without machine learning, I mean).

gusty willow
#

@uncut shadow how with ML?

uncut shadow
#

you can technically build a model that detects which language the text is in

#

you would need a dataset for that

#

but data often doesn't have many columns, so the best way would be to choose the columns manually
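Choosing columns manually is short in pandas. This sketch assumes a hypothetical table whose column names carry a language tag (the `_en` suffix is an invented convention); if your names follow some pattern like that, `DataFrame.filter` can pick them out, otherwise list them by hand.

```python
import pandas as pd

# Hypothetical table: column names carry a language tag
df = pd.DataFrame({
    "title_en": ["Passport"], "title_de": ["Reisepass"],
    "notes_en": ["expired"], "notes_fr": ["expiré"],
})

# Option 1: list the English columns by hand
english = df[["title_en", "notes_en"]]

# Option 2: if the names follow a convention, filter on it
english_auto = df.filter(regex=r"_en$")
print(list(english_auto.columns))  # ['title_en', 'notes_en']
```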

summer yarrow
#

hi

#

can someone help me?

rigid storm
#

Can we assume normality of the data if both our groups have more than 30 respondents (according to the central limit theorem)?

#

The data points are discrete: [-3, -2, -1, 0, 1, 2, 3]

polar acorn
#

You can assume normality of the mean of the data but not the data itself, that is what the central limit theorem says.

rigid storm
#

Could you elaborate on the difference?

polar acorn
#

Let's say, for instance, the data is uniformly distributed. That's pretty far from a normal distribution, right? However, if you draw many large samples from the data and compute the mean of each one, then, provided each sample is big enough, a plot of those means will look normally distributed. What does this mean in practice? It means that no matter how your data is distributed (as long as it has a finite variance), with enough samples you can treat the mean as normally distributed and do everything you normally would with a normally distributed value, e.g. hypothesis testing. But the data itself is not normally distributed. It's a common misunderstanding, though.
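The uniform example can be simulated directly. This sketch draws many samples from Uniform(0, 1), takes each sample's mean, and compares the spread of those means with the CLT prediction N(0.5, sigma^2 / n), where sigma^2 = 1/12 for Uniform(0, 1). The sample size and number of repetitions are arbitrary choices for illustration. The means behave normally (about 68% within one standard deviation), while the raw data does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n_experiments = 10_000   # how many times we repeat the sampling
sample_size = 50         # observations per experiment

# Raw data: Uniform(0, 1), which is flat, not bell-shaped
samples = rng.uniform(0.0, 1.0, size=(n_experiments, sample_size))
means = samples.mean(axis=1)

# CLT prediction for the sample mean: sd = sigma / sqrt(n), sigma^2 = 1/12
predicted_sd = np.sqrt(1 / 12) / np.sqrt(sample_size)
print(means.mean(), means.std(), predicted_sd)

# The means: about 68% within 1 SD of 0.5, as for a normal distribution
within_1sd = np.mean(np.abs(means - 0.5) < predicted_sd)
print(within_1sd)

# The raw data: about 0.58 within 1 SD, not the ~0.68 a normal distribution gives
raw_within_1sd = np.mean(np.abs(samples - 0.5) < np.sqrt(1 / 12))
print(raw_within_1sd)
```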

rigid storm
#

Ah, so if I were to run this same experiment, let's say, 30 times,

#

then plot the distribution of all of those means,

#

I'd get a normal distribution?