#data-science-and-ml | Python | Page 224

radiant nymph May 15, 2020, 9:50 PM

#

I dont think its about losing imp but dimension explosion and overfitting

charred blaze May 15, 2020, 9:52 PM

#

sorry @wind plume , will look into your problem now

#

I'm a bit tipsy but let's see what I can do.

flat quest May 15, 2020, 9:53 PM

#

@radiant nymph no if important features are not selected in ur model due to one - hot ur model will not perform well. Overfitting won't really happen as long as u limit the tree depth / size, and that will deal with dim explosion as well.

radiant nymph May 15, 2020, 9:55 PM

#

okk.

#

https://github.com/atulbunkar/Wine-Prediction please review my work and give feedbacks

GitHub

atulbunkar/Wine-Prediction

Supervisied model to predict Wine based on its location and user Reviews ! - atulbunkar/Wine-Prediction

charred blaze May 15, 2020, 9:59 PM

#

@wind plume Until LOC 152, your module looks OK to me.

#

considering your final purpose with this, I still think the whole column replacement issue is not the way to go.

#

considering that you just want to plot data according to whether they're dried or wet, couldn't you just create two separate dataframes, one for each of these types, while having a common attribute between them? Then when it comes to the code that draws the bar plots, just have that code reference the common attribute.

#

sorry if it's not very clear

wind plume May 15, 2020, 10:43 PM

#

I appreciate the tipsy advice @charred blaze, in all seriousness!

What exactly do you mean tho? Are you saying like make a data frame with all wet samples and all dry samples? Rather than have a master dataframe with ALL samples? Wouldn't that require me to sort them at the beginning near where I ask for file inputs? Not sure the best way to go around it tbh.

charred blaze May 15, 2020, 10:43 PM

#

two separate dataframes

wind plume May 15, 2020, 10:43 PM

#

The fact I get NaNs from my input confuses the fuck out of me tho. The code is probably not elegant at all as it's my first real project

charred blaze May 15, 2020, 10:43 PM

#

one for the wet samples

#

another one for the dry samples

wind plume May 15, 2020, 10:44 PM

#

Would you recommend making the two dataframes AFTER I have the master sheet? Or have two master sheets

#

My thought process was having one dataframe that I'd constantly be uploading new data to and then isolating however many columns and graphing them. It's the isolation part that is giving me trouble with the Nan shit

charred blaze May 15, 2020, 10:51 PM

#

I say before the master sheet

wind plume May 15, 2020, 10:55 PM

#

So how do you envision it working? I'd still have to either concat, melt, or do what I am currently struggling with, no?

charred blaze May 15, 2020, 10:57 PM

#

yes, you would have to join those two dataframes afterwards

wind plume May 15, 2020, 10:59 PM

#

Would I join them a different way, tho? I still think I'd use the same or similar function right?

charred blaze May 16, 2020, 12:07 AM

#

you need a common attribute in each row of those two separate dataframes

wind plume May 16, 2020, 2:01 AM

#

But how? Just the same as I tried to make my previous code work?

#

input wet or dry in a column etc

#

maybe append a column after removing outliers??

lapis sequoia May 16, 2020, 6:08 AM

#

Can anyone recommend a book of maths required for data science and ml?

sudden juniper May 16, 2020, 8:07 AM

#

https://www.learndatasci.com/free-data-science-books/

100+ Free Data Science Books

lapis sequoia May 16, 2020, 9:50 AM

#

Thanks

digital lynx May 16, 2020, 8:57 PM

#

can someone help me with pandas plz @ me

lusty coral May 16, 2020, 9:53 PM

#

@digital lynx what's wrong?

real wigeon May 16, 2020, 10:41 PM

#

can i print multiple slice objects? using iloc?

flat quest May 16, 2020, 11:49 PM

#

yeah

#

ilocs prob the best way to go for that
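
For reference, a minimal sketch of selecting several slices at once with iloc. The dataframe and the column name `n` are made up for illustration; `np.r_` simply concatenates the slice ranges into one array of positions that iloc accepts.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with ten rows.
df = pd.DataFrame({"n": range(10)})

# np.r_ turns the two slices into the positions [0, 1, 5, 6, 7].
picked = df.iloc[np.r_[0:2, 5:8]]
print(picked)
```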

digital lynx May 17, 2020, 12:21 AM

#

I just need to know if Pandas.to_csv(filename) overwrites the current csv file if it has stuff on it

#

I am making a bot that takes data from a csv file and graphs it and changes the data in the table. I need to delete the first row, and add data to the end. I need to know if the to_csv() method will just overwrite the file because that is what I need, not appending all the data I just changed

opaque stratus May 17, 2020, 12:59 AM

#

Hey ---> can anyone help me install TensorFlow on VSCODE?

whole roost May 17, 2020, 1:02 AM

#

Can anyone here help me with specific code troubleshooting? Basic array and histogram use

digital lynx May 17, 2020, 1:49 AM

#

wouldnt you just pip install on your machine? @opaque stratus

rough prawn May 17, 2020, 2:09 AM

#

yo
i have this json

"warnings": [
      {
        "id": 711390341789646919,
        "reason": null
      }

how can i remove the obj with that specific id

paper niche May 17, 2020, 2:18 AM

#

I am making a bot that takes data from a csv file and graphs it and changes the data in the table. I need to delete the first row, and add data to the end. I need to know if the to_csv() method will just overwrite the file because that is what I need, not appending all the data I just changed
@digital lynx it does. the default mode is 'w', which is 'write'.
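
A small sketch of that workflow, assuming a single hypothetical column called `value` and a temporary file path: read the CSV, drop the first row, add a row at the end, and write back to the same path. Because `to_csv` opens the file with mode 'w' by default, the old contents are replaced rather than appended to.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file and column, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
pd.DataFrame({"value": [1, 2, 3]}).to_csv(path, index=False)

df = pd.read_csv(path)
df = df.iloc[1:]                                   # delete the first row
new_row = pd.DataFrame({"value": [4]})
df = pd.concat([df, new_row], ignore_index=True)   # add data to the end

df.to_csv(path, index=False)                       # default mode="w" overwrites
result = pd.read_csv(path)
print(result)
```

Passing `mode="a"` is what would append instead, so leaving the default is what the bot needs here.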

opaque stratus May 17, 2020, 5:55 AM

#

Hey

#

Currently using VSCode's jupyternotebook interface

#

any idea how I could route the usage to my laptop CPU and GPU?

blazing bridge May 17, 2020, 6:03 AM

#

I suggest using google colab instead, everything runs in the cloud @opaque stratus

opaque stratus May 17, 2020, 6:04 AM

#

@blazing bridge yeah i know lol

#

used it before

#

i just love vscode's look

blazing bridge May 17, 2020, 6:04 AM

#

Oh ok

opaque stratus May 17, 2020, 6:04 AM

#

and not google colabs lol

#

😦

blazing bridge May 17, 2020, 6:04 AM

#

Yeah Ik what you mean

real wigeon May 17, 2020, 12:55 PM

#

So long story short, I'm reading the pandas docs in my spare time. I'm curious what the best way to go about this would be. Should I be reading linearly, or would it be better to pick certain topics (if so which ones)?

quaint wyvern May 17, 2020, 1:01 PM

#

@real wigeon I personally prefer to learn smth doing a project. reading docs isnt gonna stick to your mind unless you actually use them in code. so if you go with topics and related classes it would be better. i think Kaggle has a Pandas course. its short and useful

real wigeon May 17, 2020, 1:02 PM

#

Ok

#

I have a job with a lot of downtime so I'm trying to read docs

#

And code when I'm home

long shard May 17, 2020, 1:47 PM

#

@opaque stratus ... use pip install in your command prompt / Conda prompt to install the necessary modules ... Open VS Code and import those modules ... That will work

lapis sequoia May 17, 2020, 3:01 PM

#

I have this problem https://www.reddit.com/r/Python/comments/glgo4s/how_do_i_properly_loopbackdsnoop/

r/Python - How do I properly loopback/dsnoop?

0 votes and 1 comment so far on Reddit

#

Any ideas?

#

Hello, I am trying to clean up some panda dataframes using BeautifulSoup. I am unable to apply that to one column. Any help is appreciated.

#

import pandas as pd
from bs4 import BeautifulSoup


df = pd.DataFrame({"id": [1,2], "a":[ ["<a>Hello</a>"],["<c>Aorld</c>"]], "b":[["<c>World</c>"],["<c>Corld</c>"]]})
df['c'] = df.apply(BeautifulSoup(df['a'].all(), 'html.parser').get_text())

print (df)

echo kelp May 17, 2020, 3:13 PM

#

@lapis sequoia I'm a little confused... Are you trying to parse an existing webpage and put it into a dataframe? It seems like you're ultimately mixing two distinct data structure here

lapis sequoia May 17, 2020, 3:14 PM

#

@echo kelp well, I am trying to clean up the text in column a by removing any html tags.

#

and I saw that BeautifulSoup can help

echo kelp May 17, 2020, 3:14 PM

#

you can access the data in tags returned in beautiful soup by using .string or .contents

#

so if you have a tag already stored as an object, you should be able to return the 'Hello' for example, by using something like tag.string

#

I'm not exactly sure about the syntax

lapis sequoia May 17, 2020, 3:15 PM

#

sorry, I am not sure if I am following you

echo kelp May 17, 2020, 3:15 PM

#

yeah, no, sorry

#

well

#

I don't think it would be best practice to store the raw tag themselves as the data in columns a and b

#

if you're looking to manipulate those strings, I'd probably try to use something like a regular expression rather than beautifulsoup in this context

#

beautiful soup can parse tags, but I don't know how applicable it is to iterating over a series of tags in this fashion, particularly when returned from a dataframe

lapis sequoia May 17, 2020, 3:19 PM

#

ahh i see

#

let me try regular expressions

echo kelp May 17, 2020, 3:20 PM

#

did this point you in this direction?

#

https://stackoverflow.com/questions/20045955/regex-pattern-in-python-for-parsing-html-title-tags

Stack Overflow

regex pattern in python for parsing HTML title tags

I am learning to use both the re module and the urllib module in python and attempting to write a simple web scraper. Here's the code I've written to scrape just the title of websites:

#!/usr/bin/...

lapis sequoia May 17, 2020, 3:21 PM

#

similar posr

#

post

echo kelp May 17, 2020, 3:21 PM

#

yeah

#

I can definitely see how it applies

#

I do know though, if you are working with pandas dataframes, every action you take should be "vectorized". Ideally, iterating over a dataframe row by row is heavily discouraged by pandas. So, you can definitely construct a solution somehow doing this, maybe someone else might know better than I do.

lapis sequoia May 17, 2020, 3:23 PM

#

Hello?

#

thanks

echo kelp May 17, 2020, 3:24 PM

#

np, sorry I couldn't find a neat solution

lapis sequoia May 17, 2020, 3:59 PM

#

@echo kelp i think i found it...

#

df['c'] = df['a'].apply(lambda text: BeautifulSoup(''.join(text), 'html.parser').get_text())

#

this worked for me

#

thanks again

#

it was incorrect data type being passed

echo kelp May 17, 2020, 4:02 PM

#

great!

#

that's nifty, great use of a lambda function

stone ruin May 17, 2020, 6:36 PM

#

did anyone else chuckle when they first saw panda's cumulative functions?

balmy chasm May 17, 2020, 7:59 PM

#

@real wigeon
I find that the problem with reading docs is that they don't really have a structure/lesson plan. You just go to learn random tricks, not really see how they fit together.

I've heard a lot of good things about this book, and I plan to read through it myself down the road (It was written by the creator of Pandas).

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662

real wigeon May 17, 2020, 8:00 PM

#

yeah i was going to watch a freecodecamp tut

#

and use the docs to augment the knowledge

#

while trying a project

last peak May 17, 2020, 11:05 PM

#

HEy

#

Im new to ML i had some basic question about Linear Regression
I am trying to understand this question
A^TA x = A^T b

#

If someone could message me I could give some more context I had some question to clarify what even is going on here
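
For context, A^TA x = A^T b is usually called the normal equations: it is the condition satisfied by the x that minimizes the squared error ||Ax - b||^2 in linear regression. A rough numeric sketch, with a made-up design matrix A (a column of ones plus one feature) and made-up targets b:

```python
import numpy as np

# Hypothetical data: intercept column plus one feature, four observations.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 2.9, 5.2, 6.8])

# Solve the normal equations (A^T A) x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's least-squares solver should give the same coefficients.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)
```

Here x holds the intercept and slope of the fitted line. In practice `lstsq` is generally preferred because forming A^T A can be numerically less stable.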

valid drum May 17, 2020, 11:09 PM

#

Hi, I'm having troubles with implementing Conv2D backpropagation using Numpy.
This is what I've done for forward propagation:

ch, h, w = x.shape
Hout = (h - self.filters.shape[-2]) // self.stride + 1
Wout = (w - self.filters.shape[-1]) // self.stride + 1

a = np.lib.stride_tricks.as_strided(x, (Hout, Wout, ch, self.filters.shape[2], self.filters.shape[3]),
                                    (x.strides[1] * self.stride, x.strides[2] * self.stride) + (
                                    x.strides[0], x.strides[1], x.strides[2]))
out = np.einsum('ijckl,ackl->aij', a, self.filters)

I tried doing this but it's not working:

F = np.lib.stride_tricks.as_strided(x, (n_filt, size_filt, size_filt, dim_filt, size_filt, size_filt),
                                    (x.strides[0], x.strides[1] * self.stride, x.strides[2] * self.stride) + (
                                    x.strides[0], x.strides[1], x.strides[2]))
F = np.einsum('aijckl,anm->acij', F, dA_prev)

dF = np.zeros(shape=self.filters.shape) # shape=[n_filters, ch, h, w]
size_filt = self.filters.shape[-1]
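for filt in range(n_filt):
    y_filt = y_out = 0
    while y_filt + size_filt <= size_img:
        x_filt = x_out = 0
        while x_filt + size_filt <= size_img:
            dF[filt] += dA_prev[filt, y_out, x_out] * x[:, y_filt:y_filt + size_filt, x_filt:x_filt + size_filt]
            # advance the window horizontally
            x_filt += self.stride
            x_out += 1
        # advance the window vertically
        y_filt += self.stride
        y_out += 1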

This is working great but very slow
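
One possible way to vectorize the filter gradient, sketched under the assumption that the forward pass is exactly the einsum above: since out[a,i,j] sums a[i,j,c,k,l] * filters[a,c,k,l], the gradient with respect to the filters is the same strided window view contracted with the upstream gradient over i and j. The function names and the standalone `stride` argument are illustrative, not from the original class.

```python
import numpy as np

def conv_windows(x, kh, kw, stride):
    """Read-only view of shape (Hout, Wout, ch, kh, kw) over x of shape (ch, h, w)."""
    ch, h, w = x.shape
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    shape = (h_out, w_out, ch, kh, kw)
    strides = (x.strides[1] * stride, x.strides[2] * stride,
               x.strides[0], x.strides[1], x.strides[2])
    return np.lib.stride_tricks.as_strided(x, shape, strides, writeable=False)

def filter_grad_einsum(x, d_out, kh, kw, stride):
    # dF[a,c,k,l] = sum over i,j of d_out[a,i,j] * windows[i,j,c,k,l]
    return np.einsum('aij,ijckl->ackl', d_out, conv_windows(x, kh, kw, stride))

def filter_grad_loop(x, d_out, kh, kw, stride):
    # Slow reference version, same idea as the nested loops above.
    n_filt, h_out, w_out = d_out.shape
    dF = np.zeros((n_filt, x.shape[0], kh, kw))
    for a in range(n_filt):
        for i in range(h_out):
            for j in range(w_out):
                ys, xs = i * stride, j * stride
                dF[a] += d_out[a, i, j] * x[:, ys:ys + kh, xs:xs + kw]
    return dF

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))       # ch=3, 7x7 image
d_out = rng.standard_normal((4, 3, 3))   # 4 filters, (7 - 3) // 2 + 1 = 3
fast = filter_grad_einsum(x, d_out, 3, 3, 2)
slow = filter_grad_loop(x, d_out, 3, 3, 2)
print(np.allclose(fast, slow))
```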

lapis sequoia May 18, 2020, 6:24 AM

#

!unzip '/content/drive/My Drive/Colab Notebooks/Dataset.zip' not working. The command is run but the images dont show anywhere in my drive. I have a zip file in my drive which I wanna unzip to use for training testing and validation.

late cargo May 18, 2020, 7:43 AM

#

Is it ok to webscrape a website if its robots.txt has nearly nothing? It only has 3 lines

kindred finch May 18, 2020, 7:49 AM

#

It depends what those three lines are and, more importantly, if they have a ToS page

#

If you paste the website link here I could take a quick look

late cargo May 18, 2020, 8:17 AM

#

https://www.horoscope.com/us/index.aspx

Planetary Update by Horoscope.com

The moon continues through Aries today and sextiles Mercury, the planet of communication, in Gemini, bringing information, passionate conversations, and choices.

kindred finch May 18, 2020, 8:24 AM

#

Looks like a no

You agree:
not to use any manual or automated software, devices or other processes (including but not limited to spiders, robots, scrapers, crawlers, avatars, data mining tools or the like) to "scrape" or download data from any web pages contained in the Website
In the Terms of Service https://www.horoscope.com/us/tos.aspx

#

@late cargo

late cargo May 18, 2020, 8:25 AM

#

Thanks

solar oracle May 18, 2020, 11:20 AM

#

How should I go about choosing the best impute method? I don't want to remove data because the dataset is very small already.

raw rapids May 18, 2020, 3:10 PM

#

@solar oracle there's so many ways to impute values

#

The best thing to do is to run a grid search on the best imputer

#

for easy tasks sklearn's SimpleImputer() is really useful

#

There's also Knn Imputation and MICE Imputation

#

Interpolation

#

the list goes on

#

the most reliable method would be to run a grid search over the imputers
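
A rough sketch of what a grid search over imputers could look like with scikit-learn. The synthetic data, the Ridge model and the three candidate imputers are all assumptions chosen for illustration; the idea is only that the imputation step of a pipeline can itself be a searched parameter.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic regression data with roughly 10% of the cells knocked out.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

pipe = Pipeline([("impute", SimpleImputer()), ("model", Ridge())])

# Each candidate replaces the whole "impute" step of the pipeline.
param_grid = {
    "impute": [
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        KNNImputer(n_neighbors=5),
    ]
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_missing, y)
print(search.best_params_["impute"], search.best_score_)
```

Keeping the imputer inside the pipeline means it is refit on each training fold, so the cross-validated score does not leak information from the held-out fold.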

sullen glacier May 18, 2020, 3:20 PM

#

hello, I want to learn data science or at least improve my understanding of basics in this area, what free materials may you suggest, I'd be also happy if some people will agree for in person help so I will feel no shame asking stupid questions

raw rapids May 18, 2020, 3:29 PM

#

http://introtodeeplearning.com/

MIT Deep Learning 6.S191

MIT's official introductory course on deep learning methods and applications.

#

is a good introduction to machine learning

#

without too much rigor

#

then there's Andrew Ng course on Coursera

#

which is also really good

sullen glacier May 18, 2020, 3:30 PM

#

@raw rapids looks hard but I will note those materials, thank you

raw rapids May 18, 2020, 3:30 PM

#

how did u assume its hard

#

lol

sullen glacier May 18, 2020, 3:31 PM

#

@raw rapids because I've checked about deep learning before, it's extremely hard by itself

raw rapids May 18, 2020, 3:31 PM

#

no its not

#

its based around simple concepts

#

the deep learning mit course

#

is really beginner-friendly intro

sullen glacier May 18, 2020, 3:32 PM

#

I was thinking that first I should learn the basics before going deep, but maybe I was wrong

raw rapids May 18, 2020, 3:33 PM

#

I started of with the course I mentioned above and I'm doing fine

#

kaggle.com

#

is a really good place to supplement your skills

#

they have a treasure trove of awesome notebooks

sullen glacier May 18, 2020, 3:33 PM

#

recently I only discovered what is a notebook

raw rapids May 18, 2020, 3:34 PM

#

well

#

you can keep a note of the things I mentioned above

devout sail May 18, 2020, 3:34 PM

#

I also suggest
https://www.coursera.org/learn/machine-learning

Coursera

Machine Learning | Coursera

Learn Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, ...

sullen glacier May 18, 2020, 3:34 PM

#

sure I will

raw rapids May 18, 2020, 3:34 PM

#

ya thats the Andrew Ng course I mentioned earlier

devout sail May 18, 2020, 3:35 PM

#

Oh cool didn't see you did

#

Yeah, Andrew's one of the best

raw rapids May 18, 2020, 3:35 PM

#

ya definitely

devout sail May 18, 2020, 3:35 PM

#

The certificate costs money, but participating is free

sullen glacier May 18, 2020, 3:35 PM

#

good

raw rapids May 18, 2020, 3:35 PM

#

yup, it requires a lot of time and dedication in my opinion

#

if you want to retain as much info as possible

#

so you have to make a commitment

#

but I agree with @devout sail , that is a very good intro

sullen glacier May 18, 2020, 3:38 PM

#

my English is very bad but I may try it

somber tapir May 18, 2020, 6:54 PM

#

Hello, I hope this is the correct channel. I am very new to using python and trying to just write a simple dividend yield formula for a single stock, but want to actually see the steps involved.
(My code)
# NHI dividend yield = dividend per share / market price per share
d=[1.10, 1.11, 1.12]
m=47.25
y=d/m
(output)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
However, if I write the code without a list it works fine:
>>> d=1.10

m=47.25

y=d/m
y
0.023280423280423283
y*100
2.3280423280423284
What am I doing wrong with my list?

solar oracle May 18, 2020, 6:56 PM

#

!code

arctic wedgeBOT May 18, 2020, 6:56 PM

#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
• These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')

solar oracle May 18, 2020, 6:56 PM

#

Please paste code this way so we can read it properly

somber tapir May 18, 2020, 6:57 PM

#

d=1.10

m=47.25
y=d/m
y
0.023280423280423283
y*100
2.3280423280423284
d=[1.10, 1.11, 1.12]
m=47.25
y=d/m
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'

solar oracle May 18, 2020, 6:58 PM

#

Oh you are doing it in console

somber tapir May 18, 2020, 6:58 PM

#

'''python

solar oracle May 18, 2020, 6:59 PM

#

"Should be backticks, not quotes."

#

Your problem is that you can't divide a list with a float

jolly briar May 18, 2020, 7:00 PM

#

@somber tapir if you're using the console a lot have a look at ipython console, it's sooo much nicer

somber tapir May 18, 2020, 7:00 PM

#

thanks I am trying to put the code in chat, but am such a noob and don't want to spam the chat with my bad attempts

jolly briar May 18, 2020, 7:01 PM

#

print('this')

📎 unknown.png

solar oracle May 18, 2020, 7:01 PM

#

just use `

somber tapir May 18, 2020, 7:01 PM

#

and a float refers to the fact that I have decimals correct?

solar oracle May 18, 2020, 7:01 PM

#

the problem is coming from you trying to divide the LIST with anything, if it was int instead of float it would still raise an error

#

you need to divide the items inside the list

jolly briar May 18, 2020, 7:02 PM

#

or use numpy

solar oracle May 18, 2020, 7:03 PM

#

that too
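
Both suggestions, sketched with the numbers from the question (the variable names are made up): a list comprehension divides each item in turn, while a NumPy array supports element-wise division directly.

```python
import numpy as np

dividends = [1.10, 1.11, 1.12]   # dividend per share
price = 47.25                    # market price per share

# Plain Python: divide each item of the list individually.
yields_list = [d / price for d in dividends]

# NumPy: the array is divided element by element.
yields_array = np.array(dividends) / price

print(yields_list)
print(yields_array * 100)        # as percentages
```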

somber tapir May 18, 2020, 7:05 PM

#

Okay, I can see I have a knowledge gap. I am going to do some reading. Thanks for the help!

solar oracle May 18, 2020, 7:07 PM

#

I think it is actually a fairly intuitive "mistake", but more learning never hurts. Have fun!

somber tapir May 18, 2020, 7:08 PM

#

Oh holy hell iPython does look way nicer.

lapis sequoia May 18, 2020, 8:59 PM

#

What math is used in a self driving car? All the way from the auto pilot code to the electronics that drive the car from the outputs the code gives?

jolly briar May 18, 2020, 8:59 PM

#

@lapis sequoia arithmetic would be used at all levels i imagine

lapis sequoia May 18, 2020, 9:02 PM

#

Yeah I guess so

storm plume May 18, 2020, 10:29 PM

#

Hey guys, I'm familiar with manipulating data in Alteryx with GUI but I'm trying my hand at doing it with Pandas. I'm trying to do a cross join with 3 series for a dataframe for every possible combination. Is there an equivalent function in Python/Pandas?

#

Here's an example I made.

#

📎 unknown.png

raw rapids May 18, 2020, 11:46 PM

#

@lapis sequoia , https://selfdrivingcars.mit.edu/ is a good introduction to self driving cars

MIT 6.S094: Deep Learning for Self-Driving Cars

lexfridman

MIT 6.S094: Deep Learning for Self-Driving Cars

An introduction to deep learning through the applied theme of building a self-driving car. Includes video lectures, competitions, and guest talks.

#

@storm plume

#

You could create an array of permutations with sympy and then make it into the dataframe

storm plume May 18, 2020, 11:52 PM

#

Nah, I figured it out... you have to create a dummy column and do repeated joins.

#

df1.assign(foo=1).merge(df2.assign(foo=1), on='foo').drop(columns='foo')

#

Wish there was a cleaner way to do it, but oh well.

lusty coral May 19, 2020, 1:00 AM

#

@storm plume you could have used pandas.MultiIndex.from_product, then convert it to dataframe
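
A minimal sketch of the MultiIndex.from_product route, using three made-up series (the names `color`, `size` and `year` are placeholders, since the original example was only posted as an image):

```python
import pandas as pd

# Three hypothetical series to combine.
colors = pd.Series(["red", "blue"], name="color")
sizes = pd.Series(["S", "M", "L"], name="size")
years = pd.Series([2019, 2020], name="year")

# Every combination of the three inputs, as index levels...
index = pd.MultiIndex.from_product(
    [colors, sizes, years], names=["color", "size", "year"]
)

# ...then flattened into ordinary columns.
combos = index.to_frame(index=False)
print(combos)
```

Newer pandas releases (1.2 and later) also accept `how="cross"` in `merge`, which removes the need for the dummy column in the repeated-join approach.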

storm plume May 19, 2020, 1:01 AM

#

Ahhhhhhhhhhh! I just looked at the documentation.

#

That's perfect.

#

Wish I had known about the existence of it earlier. Thanks!

teal turret May 19, 2020, 1:04 AM

#

Hi guys, is anyone here familiar with tesseract ? I am sort of new to python, but i am familiar with other languages,
when i try to get the text off this image I just get "AN afi" not even a number, i tried to invert the image but still got a similar result, any ideas?

import pytesseract as tess
from PIL import Image
import PIL.ImageOps


# inverted_image = PIL.ImageOps.invert(img)
# inverted_image.save('new_name.png')

img = Image.open("text.png")
text = tess.image_to_string(img)
print(text)

#

📎 text.png

#

here is the image

lusty coral May 19, 2020, 2:09 AM

#

why you inverting the image?

#

@teal turret

#

@teal turret set "exposure" to -100, this way the text is more clear

teal turret May 19, 2020, 2:31 AM

#

I just tried it to see if inverting would help

blazing bridge May 19, 2020, 3:27 AM

#

Anyone have recommendations on the best way to learn and be proficient in machine learning. For example courses and books

#

Typically online would be best

storm plume May 19, 2020, 4:52 AM

#

Hey guys, I'm trying to do a left join and keep only the parent table of anything that did not match.

#

data1 = {'NameA':['Tom', 'Nick', 'Krish', 'Jack'],
        'AgeA':[20, 21, 19, 18]}
data2 = {'NameB':['Tom', 'Nick', 'C', 'D'],
        'AgeB':[20, 21, 3, 4]}
 
# Create DataFrame
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
list = [df1, df2]

df1 = pd.merge(df1,df2,how='left',left_on=['NameA','AgeA'],right_on=['NameB','AgeB'])
print(df1)

#

Output:

#

0    Tom    20   Tom  20.0
1   Nick    21  Nick  21.0
2  Krish    19   NaN   NaN
3   Jack    18   NaN   NaN

#

Expected:

#

📎 unknown.png

#

How should I approach this?
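
One common approach, sketched on the data above and assuming the goal is to keep only the rows of the left table that found no match (the expected output was posted as an image, so this is a guess at the intent): pass `indicator=True` to the merge and filter on the `_merge` column it adds.

```python
import pandas as pd

df1 = pd.DataFrame({"NameA": ["Tom", "Nick", "Krish", "Jack"],
                    "AgeA": [20, 21, 19, 18]})
df2 = pd.DataFrame({"NameB": ["Tom", "Nick", "C", "D"],
                    "AgeB": [20, 21, 3, 4]})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
merged = df1.merge(df2, how="left",
                   left_on=["NameA", "AgeA"],
                   right_on=["NameB", "AgeB"],
                   indicator=True)

# Keep unmatched rows, and only the columns that came from the left table.
unmatched = merged.loc[merged["_merge"] == "left_only", df1.columns]
print(unmatched)
```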

tepid ocean May 19, 2020, 6:42 AM

#

Anyone have recommendations on the best way to learn and be proficient in machine learning. For example courses and books
@blazing bridge check out the book by the Keras creator Francois Chollet - Deep Learning with Python. It's a very great resource with lots of code examples and tutorials. Also, have a look at the fast.ai website (https://www.fast.ai) and community forum where you can share code and learn from other's coding

swift latch May 19, 2020, 9:51 AM

#

hi everyone i am new to this channel, i have a question regarding an error i am facing using twitterAPI. i have included an image of my question below

📎 Capture.PNG

sonic lichen May 19, 2020, 10:58 AM

#

guys, a question: if I want to import a script with functions that also contains imported modules (e.g. sys, os) into an empty script in order to reuse my code, is there a way around to use those already imported modules (sys, os) or do I have to reimport them again?

uncut shadow May 19, 2020, 11:11 AM

#

well, I have never tested it but I don't think there is point in doing that

sonic lichen May 19, 2020, 11:20 AM

#

What is the most clean way to use it then: I have functions that rely on imported modules and now I want to import those functions into a new script; do I reimport those already imported modules? I actually see with dir(function.py) that it does import previously imported modules, but their use needs to be function.sys.argv for example which of course is very annoying

polar acorn May 19, 2020, 11:25 AM

#

What? If a module defines a function that relies on that module's own import of sys, you do not need to import sys again in the module that uses the function, if that's what you're asking. Just import the function itself and it should work.

sonic lichen May 19, 2020, 11:27 AM

#

@polar acorn thnx :))

uncut shadow May 19, 2020, 12:17 PM

#

oh, that's what you meant

#

then yes, you don't need to import anything

wind plume May 19, 2020, 2:06 PM

#

I'm trying to find and filter outliers by each column individually, and then make a new dataframe with the outliers completely filtered.

The output has the same values in every cell, except some values are NaN which I assume means the filter worked. The code seems half functional.

#

for col in df_new.columns:
    Q1 = df_new[col].quantile(0.25)
    Q3 = df_new[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*(IQR)
    upper = Q3 + 1.5*(IQR)
    
    target = df_new[col]
    df_iqr[col] = df_new[((target > lower) & (target < upper))]

#

Using pandas BTW. Filtering by quartiles and defining it in the for loop. I assume my problem comes with the last sentence of the code. I'll try and play around. At the moment it is filtering things and not filtering things. It is completely removing a row (index skips from 11 to 13)

#

If i should be posting this in a help thread let me know

#

I fixed one part by changing the last line to df_iqr from df_iqr[col] but for some reason it is just blasting index 12 row from every column

paper niche May 19, 2020, 2:20 PM

#

I fixed one part by changing the last line to df_iqr from df_iqr[col] but for some reason it is just blasting index 12 row from every column
@wind plume As long as the row has 1 column where the value is an "outlier", the whole row will be removed. What are you expecting will happen instead?

#

the dataframe has to remain in a tabular format (you can't just remove a "cell" -- speaking in excel terms)

wind plume May 19, 2020, 2:21 PM

#

But other rows have outliers, yet they don't remove. I'm kinda hoping that I will filter through every column individually by the columns IQR. Yet there's no way there's all outliers in row 12

#

I hope that makes some sense.

#

I expect it to be replaced with NaN, a sign either a cell was blank or I filtered properly.

paper niche May 19, 2020, 2:22 PM

#

what's df_iqr?

wind plume May 19, 2020, 2:22 PM

#

That would be the new filtered dataframe

#

Col would be the individual columns in df_new

#

So it's saying df iqr is taking the same column names as dfnew, yet using dfnews values and applying some function to it to selectively filter

paper niche May 19, 2020, 2:24 PM

#

hmm but df_new is not changing throughout the loop, so the final df_iqr would just be the df_new with the rows with last column's outliers removed

#

your last line of the loop (after your "fix"), will keep overwriting df_iqr with df_new with 1 column's outliers removed

wind plume May 19, 2020, 2:26 PM

#

I don't see how you get that, not saying you're wrong I just don't really understand what the last line of code is doing outside of the conditions I set.

paper niche May 19, 2020, 2:27 PM

#

📎 Screenshot_2020-05-19_at_10.26.52_PM.png

#

let's make sure we're on the same page first:
is this what you have at the moment?

for col in df_new.columns:
    Q1 = df_new[col].quantile(0.25)
    Q3 = df_new[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*(IQR)
    upper = Q3 + 1.5*(IQR)
    
    target = df_new[col]
    df_iqr = df_new[((target > lower) & (target < upper))]

wind plume May 19, 2020, 2:28 PM

#

So it's seeing an outlier in row0 and thus deleting it?

#

And yep!

paper niche May 19, 2020, 2:28 PM

#

okay, so when col == 'B' (second loop), it sees row 2 has an outlier

#

and df_iqr will be set to a dataframe with row 2 removed

#

but then the next loop, col=='C', df_iqr gets set back to the full df_new (because there's no outliers), and row 2 appears back in df_iqr again

#

the final loop with col=='D', row 0 has an outlier, and so df_iqr is set to df_new with row 0 removed --> and that's what you get.

#

the issue here: df_new is never changing, yet df_iqr is being assigned df_new with a single column filtered (outliers removed)

#

and every loop df_iqr is being overwritten

wind plume May 19, 2020, 2:31 PM

#

So basically it will remove the last row with an outlier? And in this case it happened to be mine

paper niche May 19, 2020, 2:31 PM

#

it will remove rows where the last column has an outlier

wind plume May 19, 2020, 2:31 PM

#

It sounds like I need to overwrite df_iqr[column] then

paper niche May 19, 2020, 2:32 PM

#

so you want, at the end of everything, df_iqr and df_new to have the same shape?

#

just with the outliers to be replaced by np.nan?

wind plume May 19, 2020, 2:32 PM

#

But if I replace df_iqr with df_iqr[column] I get a dataframe where column A is all the other columns but with the filter applied

#

Correct

#

I want the filter to apply and remove them. I figured making a new dataframe was the way to go but maybe not

paper niche May 19, 2020, 2:33 PM

#

yea that won't be possible. you can't assign a dataframe as a series / column in a dataframe

lapis sequoia May 19, 2020, 2:33 PM

#

hey yo

wind plume May 19, 2020, 2:33 PM

#

Making a new dataframe out of the filtered values that is.

paper niche May 19, 2020, 2:33 PM

#

in the simple example I have above: what is the desired output?

#

which rows should be in df_iqr?

wind plume May 19, 2020, 2:34 PM

#

I can't easily paste it right now but C0 and B2 would be NaN

#

All rows

paper niche May 19, 2020, 2:34 PM

#

okay, you're not looking to remove them then. just replacing the outlier values with nan
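
A loop-based sketch of that idea, on a made-up three-column frame where C0 and B2 are the outliers. The key difference from the earlier attempts is that each iteration assigns a single filtered column (a Series) into df_iqr, and `Series.where` keeps every row while turning values that fail the condition into NaN.

```python
import pandas as pd

# Hypothetical data: row 0 of C and row 2 of B are outliers.
df_new = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                       "B": [10, 11, -50, 12, 13],
                       "C": [100, 7, 8, 9, 10]})

df_iqr = pd.DataFrame(index=df_new.index)
for col in df_new.columns:
    target = df_new[col]
    q1 = target.quantile(0.25)
    q3 = target.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # where() keeps the shape; values outside the bounds become NaN.
    df_iqr[col] = target.where((target > lower) & (target < upper))

print(df_iqr)
```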

wind plume May 19, 2020, 2:35 PM

#

Yes, I suppose so. When I made this before, I was only working with a one column dataframe so it wasn't bad

#

But when working with multiple columns and applying individual statistics to each column its getting hard

#

I feel like my code is very close

acoustic forge May 19, 2020, 2:38 PM

#

I might be super dumb. But I am currently doing predictions on a dataset. I tried with both Forest and Linear regression models, but my R2 score is constantly negative

sonic night May 19, 2020, 2:38 PM

#

Hello all, I'm novice in data analysis, I need some help, how can I show number range in x and y starting from 1? Thank you very much for your help.

📎 unknown.png

paper niche May 19, 2020, 2:39 PM

#

@wind plume maybe try this

Q1 = df_new.quantile(0.25)
Q3 = df_new.quantile(0.75)
IQR = Q3 - Q1

df_iqr = df_new.query('(@Q1 - 1.5 * @IQR) < @df_new < (@Q3 + 1.5 * @IQR)')

lapis sequoia May 19, 2020, 2:39 PM

#

hey there

wind plume May 19, 2020, 2:39 PM

#

What does @ do @paper niche

paper niche May 19, 2020, 2:40 PM

#

it's a syntax for you to access your python variables within the query string

#

it's akin to

df_new[(Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR)]

wind plume May 19, 2020, 2:42 PM

#

I don't think I've seen that before but I'm pretty new to coding and pandas. Does it not work if you don't have the @?

#

Do i use the above code in my for loop? I assume not.

lapis sequoia May 19, 2020, 2:43 PM

#

yeah first time I'm seeing this too

#

is this new.. is it performant

#

the query method

paper niche May 19, 2020, 2:45 PM

#

I don't think I've seen that before but I'm pretty new to coding and pandas. Does it not work if you don't have the @?
@wind plume no it doesn't, and no there's no need for a loop. If you have a look at what Q1 - 1.5*IQR < df_new is, it's a dataframe of the same shape as df_new, with elements as booleans. (True if the corresponding element in df_new is above the lower cutoff, False if it is a low outlier). You can use this boolean "mask" to filter from df_new to get your df_iqr

#

is this new.. is it performant
@lapis sequoia I think so, lemme try to pull up a SO thread about this..

lapis sequoia May 19, 2020, 2:46 PM

#

I see the last commit was on march 2020.. seems new

paper niche May 19, 2020, 2:47 PM

#

I can't seem to find it.. basically it saves you multiple lookups, especially if you're doing things like df[(df.A > 10) & (df.B < 100) & (df.C > 10)]

#

if I remember correctly

wind plume May 19, 2020, 2:48 PM

#

@paper niche woah that worked. I don't really get how. I've seen people use query for things like this. I don't know how it is parsing through every column individually and doing statistics on it. That is insane.

#

With so little code wtf

paper niche May 19, 2020, 2:48 PM

#

the performance is not significant if you're not doing multiple lookups like this (edited...)

wind plume May 19, 2020, 2:48 PM

#

I really want to understand this and not just accept this as an answer and move on

#

Cuz this is confusing to me

paper niche May 19, 2020, 2:49 PM

#

@paper niche woah that worked. I don't really get how. I've seen people use query for things like this. I don't know how it is parsing through every column individually and doing statistics on it. That is insane.
@wind plume break the code down into smaller pieces. like I said, have a look at what Q1 - 1.5*IQR < df_new is first, then what (Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR) is, then finally what df_new[...] looks like. you'll get a better feel of what this code is doing
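
The breakdown suggested above can be sketched like this. The data is made up for illustration (one outlier planted at C0 and one at B2, matching the cells mentioned earlier), and the intermediate names above_low, below_high and mask are assumptions, not names from the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an outlier in column C (row 0) and one in column B (row 2).
df_new = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 11.0, 500.0, 12.0, 13.0],
    "C": [-900.0, 7.0, 8.0, 9.0, 10.0],
})

# quantile() on a DataFrame returns a Series with one value per column.
Q1 = df_new.quantile(0.25)
Q3 = df_new.quantile(0.75)
IQR = Q3 - Q1

# Step 1: True where a value is above its column's lower cutoff.
above_low = df_new > (Q1 - 1.5 * IQR)
# Step 2: True where a value is below its column's upper cutoff.
below_high = df_new < (Q3 + 1.5 * IQR)
# Step 3: True only where both conditions hold.
mask = above_low & below_high
# Step 4: indexing with a boolean DataFrame keeps the shape and
# puts NaN wherever the mask is False.
df_iqr = df_new[mask]

print(mask)
print(df_iqr)
```

Printing each intermediate object is probably the quickest way to see that no explicit loop is involved: every comparison is already done per column.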

wind plume May 19, 2020, 2:49 PM

#

Like why it is a string. How it is looping through everything. What the @ does when the variables are already well defined

#

Okay I will

#

Mind if I try to explain it to you?

paper niche May 19, 2020, 2:50 PM

#

go ahead

wind plume May 19, 2020, 2:51 PM

#

The @q1-1.5*@iqr and @q3+1.5*@iqr is applying a filter. It is saying if any value falls between that, it is now in df_iqr

#

Sorry for the bootleg discord code

#

On mobile atm

paper niche May 19, 2020, 2:51 PM

#

no worries, yeah I get you

wind plume May 19, 2020, 2:54 PM

#

I don't quite get the @ and why it is a string but I can look that up. And I assume the df_new.query is saying "for the values in the dataframe df_new that fit our condition, turn that into a dataframe" I guess it seems redundant and more of a procedure. I assume this wouldn't work if you used a generic df that would be defined earlier?

#

The tricky part is what the @ in the string is doing, I assume it is shorthand based off what you are saying

#

Because normally if you use IQR in a formula and you defined IQR before it is no problem

paper niche May 19, 2020, 2:54 PM

#

📎 Screenshot_2020-05-19_at_10.54.47_PM.png

wind plume May 19, 2020, 2:55 PM

#

So a query by necessity needs to be a string

lapis sequoia May 19, 2020, 2:56 PM

#

are you familiar with server logs, I'm trying to build a parser to end all parsers

#

wanted to build something useful.. and was looking at this https://github.com/rory/apache-log-parser

GitHub

rory/apache-log-parser

Parses log lines from an apache log. Contribute to rory/apache-log-parser development by creating an account on GitHub.

paper niche May 19, 2020, 2:57 PM

#

if you had a dataframe with a column x

a = 1
df.query("x < @a")

it's just replacing @a with 1 (your python variable)

wind plume May 19, 2020, 2:58 PM

#

So how is it applying my filter to EVERY column, individually?

paper niche May 19, 2020, 2:58 PM

#

So a query by necessity needs to be a string
@wind plume yeah. within the query string, normal alphabetic characters/words are interpreted as the column names (thus if you had df.query("x < a") instead (without the '@') then it will try to look for a column in df called 'a')
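
A small sketch of that difference, assuming a made-up dataframe that happens to have both a column named a and a Python variable named a:

```python
import pandas as pd

# Hypothetical frame with a column "a" that is unrelated to the variable below.
df = pd.DataFrame({"x": [0, 1, 2, 3], "a": [5, 0, 5, 0]})
a = 2  # ordinary Python variable

# With "@", the name refers to the Python variable: keep rows where x < 2.
with_at = df.query("x < @a")

# Without "@", the name refers to the column: compare x with column a row by row.
without_at = df.query("x < a")

print(with_at.index.tolist())
print(without_at.index.tolist())
```

If the frame had no column called a, the second query would fail with an undefined-name error rather than silently using the variable.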

wind plume May 19, 2020, 2:58 PM

#

Why is it not taking the average Q1 iqr etc? Of the whole dataframe

paper niche May 19, 2020, 2:58 PM

#

okay 1 sec

#

are you familiar with server logs, I'm trying to build a parser to end all parsers
@lapis sequoia no unfortunately not. haha I don't deal with the logs 😉

lapis sequoia May 19, 2020, 2:59 PM

#

me neither..

#

Apparently there is something called the common log format, and it can have n fields..

#

https://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format

paper niche May 19, 2020, 3:00 PM

#

📎 Screenshot_2020-05-19_at_11.00.15_PM.png

lapis sequoia May 19, 2020, 3:00 PM

#

but people have been trying to parse these with regexes, with limited success

#

https://stackoverflow.com/questions/40549123/apache-access-log-regex-parsing

Stack Overflow

Apache access log regex parsing

I have a custom access LOG for Apache:

LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %{JSESSIONID}C %D %V" mylog
I am trying to parse from Python the LOGs generated; ...

#

seems like, they have to add capture groups when the logs change.. so I'm wondering if there's a way to make a parser that will parse the first line to check for all available fields from the entire list of fields defined in common log format

#

and not throw errors when a field is missing from newer logs

paper niche May 19, 2020, 3:02 PM

#

📎 Screenshot_2020-05-19_at_11.02.09_PM.png

wind plume May 19, 2020, 3:02 PM

#

But your IQR and Q1/3 in this case i assume is applied not by column but by dataframe right?

#

Oh no, apparently not

paper niche May 19, 2020, 3:02 PM

#

sorry I misspoke just now

#

but this expression calculates the upper outlier cutoff per column

#

in a pandas series

lapis sequoia May 19, 2020, 3:03 PM

#

and if a new field is introduced, it should be able to account for that too

paper niche May 19, 2020, 3:04 PM

#

performing df_new < (series), pandas compares all the elements in column 'A', 'B' and 'C' with the respective outlier value in the (series) -- this is essentially your for-loop

#

you end up with a dataframe of booleans (called a mask) -- shown in the pic above

wind plume May 19, 2020, 3:06 PM

#

But in that example above, q3+1.5*iqr won't be a series I think

#

You're just saying do this function

paper niche May 19, 2020, 3:07 PM

#

you mean in the query string?

wind plume May 19, 2020, 3:08 PM

#

Ah that was your query string in this case? Ok

paper niche May 19, 2020, 3:09 PM

#

if the query string syntax is still confusing, we can just discuss this one (it's entirely equivalent):

df_iqr = df_new[(Q1 - 1.5*IQR < df_new) & (df_new < Q3 + 1.5*IQR)]

#

where Q1, IQR and Q3 are pandas Series holding the respective per-column values

wind plume May 19, 2020, 3:10 PM

#

The above makes a ton of sense. You are saying the iqr dataframe is now df_new with the appropriate cutoffs

#

So my problem was that I used THAT line inside a for loop

paper niche May 19, 2020, 3:11 PM

#

yeah exactly. You were confused about how I was achieving this without a for-loop

#

hopefully it's clear now with the explanation about the masking

wind plume May 19, 2020, 3:11 PM

#

And every time I did a for loop, it would throw out the last row that had an outlier

#

So I guess I need to learn when I need to use a for loop or not haha

paper niche May 19, 2020, 3:11 PM

#

And every time I did a for loop, it would throw out the last row that had a outlier
@wind plume nono, you would throw out rows with an outlier in that column (the column that you're currently iterating over)

#

So I guess I need to learn when I need to use a for loop or not haha
@wind plume rule of thumb: explicit loops in pandas (and numpy) shouldn't be required in most cases (certainly not when the operations being performed are so simple)

wind plume May 19, 2020, 3:15 PM

#

I've always thought for loops as if I had a list and I wanted to apply iterations on every item in the list, use a for loop

paper niche May 19, 2020, 3:18 PM

#

for ordinary python list, this is true. but not with numpy and pandas. much of the speed improvements from using these packages comes from knowing how to take advantage of vectorized operations.
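
As a rough illustration of what "vectorized" means here (the array and the arithmetic are made up for the example), the loop and the one-line expression below produce the same result, but the second one lets numpy do the iteration internally:

```python
import numpy as np

values = np.arange(1, 6, dtype=float)  # [1., 2., 3., 4., 5.]

# Explicit Python loop: one element at a time.
looped = []
for v in values:
    looped.append(v * 2 + 1)
looped = np.array(looped)

# Vectorized: the same arithmetic applied to the whole array at once.
vectorized = values * 2 + 1

print(looped)
print(vectorized)
```

The same idea carries over to pandas columns, which is why the IQR filter above did not need a loop over columns.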

#

and if a new field is introduced, it should be able to account for that too
@lapis sequoia regex isn't known for being flexible tho xD are you planning to do this with regex too? or..?

wind plume May 19, 2020, 3:21 PM

#

Gooootcha. So the rule of thumb is if I'm a beginner there shouldn't be any need for me to use a for loop? I realize what I'm trying to do isn't probably mega difficult but it seems like the only way to solve this is to cheat and get help

lapis sequoia May 19, 2020, 3:21 PM

#

I am not sure of the direction yet.. I think it'd be a nice tool to make

wind plume May 19, 2020, 3:21 PM

#

It's my personal project not for school or anything so it's not cheating but you know what I mean

lapis sequoia May 19, 2020, 3:21 PM

#

https://pypi.org/project/apache-log-parser/ I'm trying to find the source for this, but I don't see it from the pypi page

PyPI

apache-log-parser

Parse lines from an apache log file

#

found the github

#

line_parser = apache_log_parser.make_parser("%h <<%P>> %t %Dus \"%r\" %>s %b  \"%{Referer}i\" \"%{User-Agent}i\" %l %u")

paper niche May 19, 2020, 3:23 PM

#

Gooootcha. So the rule of thumb is if I'm a beginner there shouldn't be any need for me to use a for loop? I realize what I'm trying to do isn't probably mega difficult but it seems like the only way to solve this is to cheat and get help
@wind plume yeah, just try to keep in mind when dealing with numpy/pandas/similar scientific computing packages that if you're explicitly building loops, there's probably a better way of doing it. and don't guilt trip over getting help haha. It's part of the learning process. Reading other people's answers is how you become aware of better solutions to your problems (it beats reading through the entire documentation yourself)

lapis sequoia May 19, 2020, 3:24 PM

#

thing is if you see this line, it defines exactly what pattern to expect and where, this won't work if a field is missing in the log or if the fields are in different places

paper niche May 19, 2020, 3:24 PM

#

so the common log format assumes every line follows the same format?

#

yeah I see

wind plume May 19, 2020, 3:26 PM

#

I really appreciate the help dude. Those pesky for loops have been fucking me on my pandas project lol

paper niche May 19, 2020, 3:26 PM

#

np 🙂

lapis sequoia May 19, 2020, 3:29 PM

#

I'll work on a small part of my project first..

#

so, I'm thinking.. find all the fields available and set everything else to null in a row.. that way I can accept data when the field is introduced in newer logs

#

something like this https://stackoverflow.com/questions/57528383/normalizing-nested-json-data-with-pandas?rq=1

Stack Overflow

Normalizing nested json data with pandas

I am trying to work with a nested json and I am not reaching the result that I want.

I have a JSON data like this:

{'from_cache': True,
'results': [{'data': [{'date': '2019/06/01', 'value': 0},
...

real wigeon May 19, 2020, 4:56 PM

#

so im a bit new to datascience related projects

#

i looked up how to combine duplicate values across multiple columns

#

and am now trying to sort the output from greatest to smallest

#

data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())

#

im getting an error regarding the .max()

uncut shadow May 19, 2020, 4:59 PM

#

well, it would be better if you would provide the error tho

jolly briar May 19, 2020, 5:01 PM

#

@real wigeon do all the parts make sense? what is returned by ...max() ? does it make sense to .apply() to that returned value?

real wigeon May 19, 2020, 5:06 PM

#

line 52, in <module>
    total_sum_by_region()
  File "/Users/asdkals/Library/Preferences/PyCharmCE2018.3/scratches/scratch_18.py", line 34, in total_sum_by_region
    data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())
  File "/Users/aklsdjals/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "/Users/alksdjals/Library/Preferences/PyCharmCE2018.3/scratches/scratch_18.py", line 34, in <lambda>
    data = confirmed.groupby('Country/Region')['5/13/20'].max().apply(lambda g: g.nlargest(20).sum())
AttributeError: 'int' object has no attribute 'nlargest'
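
The traceback suggests that .max() already collapses each country to a single number, so the lambda passed to .apply() receives a plain int, which has no nlargest method. One possible fix, assuming the goal is a per-country total sorted from greatest to smallest, is sketched below. The confirmed frame here is a made-up stand-in for the real data, and whether sum() or max() is the right aggregation depends on how the rows are structured.

```python
import pandas as pd

# Hypothetical stand-in for the real `confirmed` dataframe:
# several rows (e.g. provinces) can share one Country/Region.
confirmed = pd.DataFrame({
    "Country/Region": ["Canada", "Canada", "France", "Italy", "France"],
    "5/13/20": [100, 250, 400, 300, 50],
})

# Aggregate once per country; the result is a Series indexed by country.
per_country = confirmed.groupby("Country/Region")["5/13/20"].sum()

# nlargest is a Series method, so call it on the aggregated Series itself.
# It returns the values sorted from greatest to smallest.
top = per_country.nlargest(20)

print(top)
```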

valid drum May 19, 2020, 5:09 PM

#

Hi, I tried to implement backpropagation for convolutional layer but for some reason the results are wrong.
I tried to make a full convolution of the filters and the previous layer's gradients.
dA_prev shape : [K, H, W]
w(filters) shape: [K, C, H, W]
x shape: [C, H, W]

dA_dim, dA_h, dA_w = dA_prev.shape # previous layer's gradients
pad_h = dA_h - 1
pad_w = dA_w - 1
ow = np.pad(w, ((0, 0), (0, 0), (pad_h, pad_h), (pad_w, pad_w)), 'constant')
ow = ow[:, :, ::-1, ::-1]
dA = np.lib.stride_tricks.as_strided(ow, (ow.shape[0], x.shape[1], x.shape[2], dA_h, dA_w, ow.shape[1]),
                                     (ow.strides[0], ow.strides[2] * stride[0], ow.strides[3] * stride[1]) + (
                                         ow.strides[2], ow.strides[3], ow.strides[1]))
dA = np.tensordot(dA, dA_prev, axes=[(0, 3, 4), (0, 1, 2)])

sturdy laurel May 19, 2020, 5:23 PM

#

Hey I am looking for an uncased POS tagging model (preferably using the Hugging Face transformers framework), does anyone have any resources?

#

please tag me if you have any info, I am going to be away from discord for a bit and I don't want to miss it 🙂

rain palm May 19, 2020, 5:29 PM

#

@sturdy laurel Never used one (dabbled in NLP only) but found this - https://github.com/huggingface/transformers

GitHub

huggingface/transformers

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0. - huggingface/transformers

wind plume May 19, 2020, 6:40 PM

#

Maybe Im just putting in for loops when I dont need to, but say I have values (what I call item) in a column 'Sample' in dataframe called df_melt...

I've been stuck on this for a few hours. I am not sure if this is a red flag and it means my fundamentals are screwed or if this is tricky, or if there are millions of examples I can look up online. I hate coming here asking for help

Ultimately what I want to do is check whether a value in the 'Sample' column contains one of the case-insensitive keywords, and if it does, label THAT specific row as 'Weathered'. If it does not contain a keyword, we assume it is dry, so we label it 'Dry'.
An example column would have a value called "X Dry" or "Y weathered"

df_melt['State'] = ''
keywords = ['wet','weathered','weather']

for item in df_melt['Sample']:
    if any(kw.lower() in item.lower() for kw in keywords):
        print(item + ' is wet')
        df_melt['State'] = np.where(df_melt['Sample'].str.contains(item), 'Weathered','Dry')
    else:
        print(item + ' is dry')

#

My natural instinct is to make for loops if I want to iterate thru a list but as fickletofu said, there's probably ways around using for loops.

Is this where I should build a query?

#

Fwiw when I do this, I see the correct values labeled as 'is dry' or 'is wet', it's a matter of writing it. Not sure how, or why it's so difficult.

rain palm May 19, 2020, 6:55 PM

#

@wind plume Like this?

In [39]: df = pd.DataFrame({'Sample': ['wet', 'WET', 'weathered', 'weather', 'dry']})

In [40]: df['State'] = np.where(df['Sample'].str.contains('wet|weathered|weather', case=False), 'Weathered','Dry')

In [41]: df
Out[41]:
      Sample      State
0        wet  Weathered
1        WET  Weathered
2  weathered  Weathered
3    weather  Weathered
4        dry        Dry

wind plume May 19, 2020, 7:39 PM

#

@rain palm is there any way to make it totally case insensitive tho? So it could accept WeT, etc. That was my hope with the keywords. Will this also work for something named "720 Wet" or something like that?

rain palm May 19, 2020, 7:39 PM

#

Do you know how to use regex?

wind plume May 19, 2020, 7:40 PM

#

I don't, is it hard to learn? If this is something that will 100% help I am willing to learn

rain palm May 19, 2020, 7:41 PM

#

Finds "720 WET" it seems:

>>> df = pd.DataFrame({'Sample': ['wet', '720 WET', 'weathered', 'weather', 'dry']})
>>> df['State'] = np.where(df['Sample'].str.contains('wet|weathered|weather', case=False), 'Weathered','Dry') 
>>> df
      Sample      State
0        wet  Weathered
1    720 WET  Weathered
2  weathered  Weathered
3    weather  Weathered
4        dry        Dry
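
If the keywords should stay in a Python list, as in the original loop, one way to get the same pattern is to join them with "|". This is only a sketch: the sample values are made up, and re.escape is added as a precaution in case a keyword ever contains regex special characters.

```python
import re

import numpy as np
import pandas as pd

keywords = ["wet", "weathered", "weather"]

# Build "wet|weathered|weather" from the list; "|" means "or" in a regex.
pattern = "|".join(re.escape(kw) for kw in keywords)

# Hypothetical sample names with mixed case and extra text around the keyword.
df_melt = pd.DataFrame({"Sample": ["X Dry", "Y weathered", "720 WeT", "plain"]})

# case=False makes the match case-insensitive for every row at once.
is_wet = df_melt["Sample"].str.contains(pattern, case=False, regex=True)
df_melt["State"] = np.where(is_wet, "Weathered", "Dry")

print(df_melt)
```

If the column can contain missing values, passing na=False to str.contains treats them as non-matches.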

fallow thunder May 19, 2020, 7:41 PM

#

@wind plume It will help you long way with string matching.

wind plume May 19, 2020, 7:42 PM

#

I missed the case = false, that is AWESOME.

rain palm May 19, 2020, 7:42 PM

#

Yup.

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html

#

The pandas docs are (unusually - sadly) very well written.

wind plume May 19, 2020, 7:42 PM

#

That clarifies so much

#

I think then, any time I want to make or search something case insensitive I can do that

#

Does regex do it better or faster?

#

It's so counter intuitive to me why you wouldn't use a for loop, but fickletofu was right. Good solution without any for loop nonsense

#

Idk, if I am struggling with stuff like this is it normal? Or does this mean I really need to sit down and watch some YouTube class

rain palm May 19, 2020, 7:50 PM

#

No, regex isn't faster necessarily.

#

Fine, takes time to learn.

fallow thunder May 19, 2020, 7:53 PM

#

Avoid youtube.

wind plume May 19, 2020, 7:54 PM

#

Do you recommend ways to learn this? I was really stuck and was going into for loops and shit when I really didn't need to. Tried a bunch of stuff and spent like hours on it. Then you showed me to use "|", and case = false which was immensely helpful. Doubt I could have found that elsewhere

#

What I heard was to go make your own program that's genuinely useful for you. That's what I'm doing

#

I've used this time WFH to learn to code since I am a research scientist and can't be in lab lol

fallow thunder May 19, 2020, 7:56 PM

#

If you want to learn the right way that will help you a lot, look for books in youtube, if you want to find something there.

#

Also check the sites from the tools that you use, they normally spend time doing tutorials for you.

#

And experiment with your own projects.

#

But if you don't know the basics of programming avoid data science totally

wind plume May 19, 2020, 7:59 PM

#

I learned the very basics through python crash course, but I'm no master of it. At that point in the book it had me code a game and build a website and work up data. I decided to start my own project that would help automate graphing and data workup (remove outliers etc)

fallow thunder May 19, 2020, 8:01 PM

#

What did you learn?

wind plume May 19, 2020, 8:02 PM

#

Dictionaries, lists, list comprehension, user inputs, if else, etc

#

Then it got into coding a game and I felt like I was copy and pasting and not really learning. And it didn't interest me because it wasn't for work. So I made it work applicable.

#

Learned pandas was pretty damn solid, then learned how to use pandas with other packages like seaborn and numpy tho in very limited detail

fallow thunder May 19, 2020, 8:06 PM

#

You need to get logic going. Solve some problems.

#

https://www.freecodecamp.org/news/the-10-most-popular-coding-challenge-websites-of-2016-fb8a5672d22f/

freeCodeCamp.org

The 10 most popular coding challenge websites for 2020

A great way to improve your skills when learning to code is by solving coding
challenges. Solving different types of challenges and puzzles can help you
become a better problem solver, learn the intricacies of a programming language,
prepare for job interviews, learn new algor...

#

It will help you find ways to solve problems with the data that you use

#

You can keep going with data science without doing that, but you can get stuck quite often with for loops, if else

wind plume May 19, 2020, 8:09 PM

#

Ahhhh, awesome thank you for the link! Are these insanely hard challenges, or totally doable for a novice and if you can complete it, it's a solid start? If not, return to the python crash course?

fallow thunder May 19, 2020, 8:09 PM

#

They have difficulties, so you can start with the easy ones

wind plume May 19, 2020, 8:10 PM

#

I notice I get stuck on things for hours and trying to bash my head on things isn't fun. Sometimes I fix it, other times I post here and am like "ohhhhhhhhhhh"

fallow thunder May 19, 2020, 8:11 PM

#

If you find yourself hardstuck with the easy ones because you don't know the syntax, you can check the python documentation.

#

That's better than any video course you can find

wind plume May 19, 2020, 8:13 PM

#

Is syntax my issue? A lot of problems I have are because I don't know how to do something I want to do. Not sure if that is logic, or syntax, or literally every coding problem ever.

You probably saw my example above and can probably realize what I was trying to do, but the fact I couldn't do one small thing meant my code didn't work even tho the rest was sound

fallow thunder May 19, 2020, 8:14 PM

#

It doesn't seem like, that's why I'm recommending you to do code challenges and read the documentation

#

Both things can help you to find solutions (for example, the case=False, it's in the documentation)

#

Alright, we should stop talking about this on a data science channel, if you want to ask for help #python-discussion

wind plume May 19, 2020, 8:19 PM

#

I appreciate it a lot :)

uncut shadow May 19, 2020, 9:20 PM

#

Hello. I have been looking for books on machine learning from scratch (it has to be from scratch, so no frameworks like TF, PyTorch, Theano etc., just numpy, pandas or matplotlib etc.). Unfortunately, I couldn't find any. Does anybody know any good books?

twin parcel May 19, 2020, 9:55 PM

#

Thanks to everyone helped me with my job scraper! got an interview for a company that creates them next week 😄

lusty coral May 19, 2020, 10:26 PM

#

Found strange pandas interaction let me share

#

Make a df

#

Then get a loc of df to another variable

#

Change original df

granite sierra May 19, 2020, 10:27 PM

#

@uncut shadow I can't recommend any books, I had to do a similar project for a uni assignment, there are loads of youtube tutorials, and if you type for example "neural net in python from scratch", loads of examples

lusty coral May 19, 2020, 10:27 PM

#

Then the locced df is changed

#

Change locced df, original is not changed

#

Wth is that

#

I didn't try to change locced df after I changed original df though
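
Whether a .loc selection shares data with the original frame can depend on the pandas version and on how the selection was made, so the behaviour described above is hard to rely on. A minimal sketch of the usual way to avoid the ambiguity, using a made-up frame, is to take an explicit copy:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# .copy() makes the selection an independent object.
subset = df.loc[:, ["a"]].copy()

df.loc[0, "a"] = 100      # change the original
subset.loc[1, "a"] = -1   # change the copy

print(df)
print(subset)
```

After the explicit copy, neither assignment leaks into the other object.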

full flint May 20, 2020, 3:48 AM

#

Hi guys,

#

is anyone around to answer a quick question?

#

I have a scatter plot showing the relationship between cosine and euclidean distance matrices that looks like this:
https://gyazo.com/372514a67c18132b6364582cfdc6125c

I have been asked to plot a second order polynomial over the data.

Our practicals and lectures don't really cover this so I was wondering if somebody could help explain? 😅

Gyazo
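
One common way to put a second-order polynomial over a scatter plot is a least-squares fit with numpy. The sketch below uses synthetic x and y values as stand-ins for the flattened euclidean and cosine distances, which are not shown in the chat, so the numbers themselves are assumptions.

```python
import numpy as np

# Synthetic stand-ins for the two flattened distance matrices.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)                               # "euclidean"
y = 0.5 * x**2 + 0.1 * x + rng.normal(0.0, 0.01, x.size)    # "cosine"

# Fit y = c2*x^2 + c1*x + c0; polyfit returns the highest power first.
coeffs = np.polyfit(x, y, deg=2)
poly = np.poly1d(coeffs)

# Evaluate the fitted curve on a dense, sorted grid for a smooth line.
x_line = np.linspace(x.min(), x.max(), 200)
y_line = poly(x_line)

# With matplotlib, the overlay would then be:
#   plt.scatter(x, y)
#   plt.plot(x_line, y_line)
print(coeffs)
```

Evaluating on a sorted grid matters because plotting the fitted values at the original, unsorted x positions tends to produce a zigzag line.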

potent hamlet May 20, 2020, 5:12 AM

#

hi everyone, does anyone know about TCN (Temporal Convolutional Network)? I have a project to predict inflation in my country (it's a time series case). I know TCN evolved from CNN, which is used for image processing, but I've read that TCN can be used for time series data. I want to implement TCN for my case (inflation) but
I had difficulty getting started. If you have used it or have a reference about it, please tell me

polar acorn May 20, 2020, 6:55 AM

#

@potent hamlet Check out https://github.com/philipperemy/keras-tcn for a implementation, easy to use and might work for your use case. I've used TCN for time series classification but not for forecasting. Worked great for classification at least.

potent hamlet May 20, 2020, 7:03 AM

#

oh thank you very much

#

maybe you have repo on github about your TCN time series? can i see?

#

oh yes one more question, does that mean TCN is not suitable for forecasting (regression)?

polar acorn May 20, 2020, 7:05 AM

#

@lusty coral If you have time check out this article series https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c especially part 3 and 4. pandas might have changed since 2017 with 1.0 coming out but I still think most of the intuition would be valid. However when and where pandas uses deep or shallow copies have always been sort of opaque.

#

@potent hamlet Sorry that repo is private and not mine I just wrote it 🙂 As I said I haven't tried it for time series forecasting and although it might work fine (Google had at least one article for "TCN time series forecasting") I have a feeling it would be overkill for something like inflation.

flat quest May 20, 2020, 7:35 AM

#

haven't looked into TCN's yet, but if its anything similar to CNN's should work on time series regression just fine.

#

well you know you could always run an ml model on it and get the plot @full flint
it may not be as good as other statistical methods for finding the second order polynomial over the data, but would likely be significantly easier

#

@uncut shadow are you trying to implement complex models just to learn the code behind it or also the mathematical understanding of it?

uncut shadow May 20, 2020, 8:30 AM

#

Yes

lapis sequoia May 20, 2020, 10:27 AM

#

How do I plot 2 columns (one has decimal numbers ranging from 1 to 10 and other has corresponding values to that) using pandas and matplotlib? I want to plot the whole number 1-10 in x axis.

lapis sequoia May 20, 2020, 12:49 PM

#

how can i master numpy

#

i cant learn all those syntaxes at once

polar acorn May 20, 2020, 12:58 PM

#

Find a project you can do that would use numpy a lot and then learn what you need for that. Learning by doing often works better than reading through all the documentation or similar.

raw raptor May 20, 2020, 1:07 PM

#

hello

#

I made some code for a neural network a little while back

#

I'm curious what you guys would think of it

arctic wedgeBOT May 20, 2020, 1:08 PM

#

Hey @raw raptor!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

raw raptor May 20, 2020, 1:09 PM

#

!code-blocks

arctic wedgeBOT May 20, 2020, 1:09 PM

#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
• These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')


rigid storm May 20, 2020, 1:15 PM

#

Hey guys, when using likert scale type responses for your analysis, how would you handle missing values? for example, in the experiment each participant had to fill out a total of 49 statements on a 7-point scale. Within these responses, sometimes there is an answer missing randomly.

raw raptor May 20, 2020, 1:15 PM

#

https://gist.github.com/Blockplanet94/18386dd562e59ccbded658b007c03717

Gist

Neural network so far

Neural network so far. GitHub Gist: instantly share code, notes, and snippets.

#

k, here it is

#

I didn't have any use for a neural network yet, so I didn't program in back propagation or a fitness function

lapis sequoia May 20, 2020, 1:24 PM

#

@polar acorn thanks , can u suggest some projects ?

polar acorn May 20, 2020, 1:45 PM

#

If (and only if) you have some experience in deep learning then making your own simple neural net is nice. Or you can explore the random module by implementing rock, paper, scissors vs the computer. Or a simple connect four game. Or find data from something you're interested in (sports, finance, dota or whatever), put it in numpy and do some analysis. You would use pandas for this in real life but for learning internals you can use numpy. Or you could google around, you're probably not the first to ask.

raw raptor May 20, 2020, 1:48 PM

#

Wow, never thought of that, thank you! I've barely used numpy before so I'd have to do some learning with that, but I'll definitely make some of these games to test it out once I get the time.

fervent bridge May 20, 2020, 2:29 PM

#

When a 19 year old intern says that Data Scientist and AI are the same
-.-
I don't see the data in the video below that a data scientist is working on
https://www.youtube.com/watch?v=gn4nRCC9TwQ

YouTube

Tech Insider

Google's DeepMind AI Just Taught Itself To Walk

Google's artificial intelligence company, DeepMind, has developed an AI that has managed to learn how to walk, run, jump, and climb without any prior guidance. The result is as impressive as it is goofy.

lapis sequoia May 20, 2020, 3:38 PM

#

I need to do a project for learning numpy

#

How can i get the data sets for that

uncut shadow May 20, 2020, 3:45 PM

#

@lapis sequoia https://www.kaggle.com/datasets

Find Open Datasets and Machine Learning Projects | Kaggle

Download Open Datasets on 1000s of Projects + Share Projects on One Platform. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Flexible Data Ingestion.

fading drum May 20, 2020, 4:17 PM

#

📎 unknown.png

#

Hey guys sorry if this is the wrong chat room, does anyone mind giving me a hand with this error?

fervent bridge May 20, 2020, 4:24 PM

#

@fading drum Its a warning

#

and it means exactly what it says

#

your data is probably a dict. I can't see what it is, but it's warning you that in future versions you won't be able to do this, so change your habits

lapis sequoia May 20, 2020, 4:28 PM

#

@uncut shadow thanks

lament tiger May 20, 2020, 4:54 PM

#

Hello team 👋, so I'm working with Python on Colab on a Q&A BERT base model using the simple-transformers library (https://simpletransformers.ai/). I have a model which has been trained on SQuAD; it works pretty well and all, but every time I ask a question I also have to provide a context that the answer can be extracted from 🤔.

Now here is the case, let's say i have a table containing a bunch of paragraphs with specific information about depression. Now let's say someone asks a query like: "what can i do to deal with depression?". What techniques do you guys recommend or know about so that based on the question i can choose the best paragraph where the answer will be taken? 🥴

Thank you for your time guys 🙏

lapis ice May 20, 2020, 6:01 PM

#

!paste

stable verge May 20, 2020, 6:07 PM

#

Anyone used PIL before?

real wigeon May 20, 2020, 6:34 PM

#

hey

#

I'm trying to get better with datascience, and currently trying to plot a line chart

#

trying to do something like this post

#

https://stackoverflow.com/questions/49418248/plot-x-axis-as-date-in-matplotlib/49418672#49418672

Stack Overflow

plot x-axis as date in matplotlib

I am trying to perform some analysis on data. I got csv file and I convert it into pandas dataframe. the data looks like this. Its has several columns, but I am trying to draw x-axis as date column...

#

this is currently what I have

#

def plot_sums():
    confirmed.set_index('Country/Region')
    confirmed_date_time = confirmed[3:]
    date_time = pd.to_datetime(confirmed_date_time)
    countries = confirmed

    DF = pd.DataFrame()
    DF['countries'] = countries
    DF.set_index(date_time)

    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=.3)
    plt.xticks(rotation=90)
    plt.plot()

#

and my traceback is

#


Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/bin/BankApp/Users/asjkdhask/PycharmProjects/COVID19/Covid.py", line 73, in <module>
    plot_sums()
  File "/Applications/PyCharm CE.app/Contents/bin/BankApp/Users/asjkdhaskjda/PycharmProjects/COVID19/Covid.py", line 28, in plot_sums
    date_time = pd.to_datetime(confirmed_date_time)
  File "/Users/aksjdhaksjd/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 731, in to_datetime
    result = _assemble_from_unit_mappings(arg, errors, tz)
  File "/Users/ajksdhaskhd/.local/share/virtualenvs/COVID19-tX0C9oPJ/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 832, in _assemble_from_unit_mappings
    "to assemble mappings requires at least that "
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

#

am I not allowed to use a slice like that to denote the columns I want to use?

uncut shadow May 20, 2020, 6:47 PM

#

@real wigeon This should work I think

#

https://stackoverflow.com/questions/39992411/to-datetime-value-error-at-least-that-year-month-day-must-be-specified-pand

Stack Overflow

to_datetime Value Error: at least that [year, month, day] must be s...

I am reading from two different CSVs each having date values in their columns. After read_csv I want to convert the data to datetime with the to_datetime method. The formats of the dates in each CS...
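
For what it's worth, here is a minimal sketch of what seems to be going on, assuming `confirmed` has a few metadata columns followed by one column per date (the tiny table and the `%m/%d/%y` label format below are made up for illustration). `confirmed[3:]` slices rows, not columns, and handing a whole DataFrame to `pd.to_datetime` makes pandas look for year/month/day columns, which would explain the error:

```python
import pandas as pd

# Hypothetical stand-in for the `confirmed` table: metadata columns first,
# then one column per date (the date label format is an assumption).
confirmed = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
})

# Slice the column labels, not the rows, and convert only those labels.
date_columns = confirmed.columns[3:]
dates = pd.to_datetime(date_columns, format="%m/%d/%y")

# Sum over all countries for each date and index the result by real datetimes.
totals = confirmed[date_columns].sum()
totals.index = dates
```

`totals` is then a Series with a DatetimeIndex, which can be passed straight to `plt.plot(totals.index, totals.values)`.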

twilit onyx May 20, 2020, 6:48 PM

#

Is there any website which allows me to feed images and it will recognise the digits for me?

#

Via API calls?

real wigeon May 20, 2020, 6:54 PM

#

that definitely exists

#

but idk where

#

I know adobe has that as a premium feature

#

I can actually help you with that @twilit onyx

#

I have a script for stuff like that

#

thank you @uncut shadow that was an interesting read

#

yeah the thing is that im getting an unresolved attribute reference for unstack()

celest comet May 20, 2020, 7:31 PM

#

Hey all, I'm a new python developer (really a new learner of python)

#

and I'm looking to get a job doing data mining/munging

sharp raven May 20, 2020, 7:32 PM

#

does anyone know where i can find information about urllib? like how to use it

tranquil crane May 20, 2020, 7:32 PM

#

Is there any good machine learning course for free that uses Python?

celest comet May 20, 2020, 7:33 PM

#

I know there's a course from the Coursera guy...

#

Andrew.....

#

let me google it

tranquil crane May 20, 2020, 7:33 PM

#

He uses Octave

celest comet May 20, 2020, 7:33 PM

#

Andrew Ng

tranquil crane May 20, 2020, 7:34 PM

#

That guy's voice is so....hypnotic

celest comet May 20, 2020, 7:34 PM

#

https://www.coursera.org/learn/machine-learning#enroll

Coursera

Machine Learning | Coursera

Learn Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, ...

#

says it's free enrollment, most of the courses on coursera can at least be taken for free without credit

#

@tranquil crane yeah it looks like it's done in octave, but it probably applies to python as well

tranquil crane May 20, 2020, 7:37 PM

#

Thanks

cloud ledge May 20, 2020, 7:45 PM

#

Hi Everyone, was wondering how you guys run your distributed programming

#

For instance, if I wanted to put 10 requests in for 10 models all at once, such as an async process, is there some sort of computing power I could dial into using a key and run my model?

ivory plank May 20, 2020, 7:54 PM

#

What is it you want to do? @cloud ledge feedforward an x into 10 different models for 10 different outputs?

cloud ledge May 20, 2020, 7:54 PM

#

This might not be the best place to ask, so I apologize in advance. I have users from my website submit requests to run pre-defined machine learning models

#

They purchase X amount of cores/gpus to run the models on, but the problem I am trying to solve is that now, how do I run 10 models, say for 10 users, all at once

ivory plank May 20, 2020, 7:56 PM

#

Sorry, I'm still not getting it

cloud ledge May 20, 2020, 7:57 PM

#

I can't run 10 models at once using the processing power of 1 server (say it only has 20 available cores)

#

So I was wondering if I could offload all that work somewhere

ivory plank May 20, 2020, 7:57 PM

#

You're running your models in a VM or a container right, and now you're trying to optimize your network protocol handling?

cloud ledge May 20, 2020, 7:58 PM

#

So yes, the models are in a container, I just don't have the resources to run them

#

I might have 100 people need to run a contained model, not sure the best way to be able to do that

ivory plank May 20, 2020, 8:02 PM

#

I can't really help you much there since I'm not very familiar with distributed computing and database optimization. But it appears your problem doesn't have to do with the neural networks themselves, since you run them in a container and can treat them as just a piece of software.

#

You might want to try to ask the folks over at web-development/async/databases for their knowledge

cloud ledge May 20, 2020, 8:03 PM

#

thanks ink - your patience and understanding is really appreciated

#

will do

real wigeon May 20, 2020, 8:26 PM

#

im trying to get a sum per column from a df

#

which i then am trying to plot on a line chart

#

just need some help with the summing for now

uncut shadow May 20, 2020, 8:28 PM

#

Well

real wigeon May 20, 2020, 8:28 PM

#

i could probably use group by?

#

and do a loop

#

or something?

uncut shadow May 20, 2020, 8:28 PM

#

You should google how to do this in pandas, but I think you can also put it in numpy and then sum columns

real wigeon May 20, 2020, 8:28 PM

#

i've been googling how to sum

#

.sum()

#

and then you set the index

#

df['column_name'].sum()

#

returns sums for all the columns

#

idk, I guess I should put those new values in a series? and then plot those?

#

alright yeah sry i thought it was more complicated

uncut shadow May 20, 2020, 8:35 PM

#

👍

#

Googling mostly solves 99,99% of problems

real wigeon May 20, 2020, 8:35 PM

#

and print statements

uncut shadow May 20, 2020, 8:35 PM

#

Those too lol

real wigeon May 20, 2020, 8:36 PM

#

the fact that it summed it per column but it's just called .sum()

real wigeon May 20, 2020, 9:00 PM

#

confused me
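
A small sketch of why plain `.sum()` ends up per column (the frame and its column names are made up): on a DataFrame the default is `axis=0`, which collapses the rows and leaves one total per column.

```python
import pandas as pd

# Toy frame; the column names are placeholders.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

per_column = df.sum()        # default axis=0: one total per column
per_row = df.sum(axis=1)     # one total per row
one_column = df["a"].sum()   # a single number for one column
```

`per_column` is already a Series, so it can be plotted directly with `per_column.plot()`.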

valid drum May 20, 2020, 11:32 PM

#

Would using a 4D array of shape [n, x, x, x] be a lot faster than iterating n times over arrays of shape [x, x, x]?
I'm asking because I'm implementing a CNN using Numpy and I need to improve the performance in order to make it even trainable (it's 100x slower than Keras, CPU only)...
You can check it out here if you want:
https://github.com/shafzhr/SimpleConvNet
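
A rough sketch of the idea, not taken from the linked repo: the shapes and the multiply-and-sum operation are assumptions chosen only to show the pattern. Putting the batch on a leading axis lets NumPy run the loop over n in compiled code, which is usually faster than a Python-level loop, though the actual speedup depends on the sizes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 8, 8, 3))   # n = 64 arrays of shape (8, 8, 3)
kernel = rng.standard_normal((8, 8, 3))      # one filter of the same shape

# Python-level loop: one multiply-and-sum per sample.
looped = np.array([np.sum(sample * kernel) for sample in batch])

# Same computation on the whole 4D array at once.
vectorised = np.einsum("nijk,ijk->n", batch, kernel)
```

Both give the same 64 numbers; timing them with `timeit` on your real shapes is the reliable way to see what the batching buys you.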

whole roost May 21, 2020, 1:07 AM

#

Hi!! I have a question about transposing an array?

#

I would say I know how, but it's just ... not working?

#

def sample_sphere_polar(N):
    phi = np.random.uniform(0,math.pi,N)
    theta = np.random.uniform(0,2*math.pi,N)
    r = 1
    x_array = r*np.sin(phi)*np.cos(theta)
    y_array = r*np.sin(phi)*np.cos(phi)
    z_array = r*np.cos(phi)
    skin = [x_array,y_array,z_array] # find a way to flip the rows and columns on this
    # print(np.ndim(skin))
    # not sure why ndim is giving 2, it ought to be 50x3
    sphere = np.asarray(skin)
    sphere.transpose
    print(np.shape(sphere))
    # make a 3D scatterplot with matplotlib
    return sphere

#

I'm trying to flip the rows and columns in skin. I thought I could do this with .transpose, but it's not working.

fervent bridge May 21, 2020, 1:08 AM

#

@whole roost what are you passing in for N?

whole roost May 21, 2020, 1:09 AM

#

small_sphere = sample_sphere_polar(50)
print(small_sphere)

fervent bridge May 21, 2020, 1:21 AM

#

import numpy as np
import math

def sample_sphere_polar(N):
    phi = np.random.uniform(0,math.pi,N)
    theta = np.random.uniform(0,2*math.pi,N)
    r = 1
    x_array = r*np.sin(phi)*np.cos(theta)
    y_array = r*np.sin(phi)*np.sin(theta)  # sin(theta) here, not cos(phi), for spherical coordinates
    z_array = r*np.cos(phi)
    skin = [x_array,y_array,z_array] # find a way to flip the rows and columns on this
    # print(np.ndim(skin))
    # not sure why ndim is giving 2, it ought to be 50x3
    sphere = np.array(skin).T
    print(np.shape(skin))
    print(np.shape(sphere))
    # make a 3D scatterplot with matplotlib
    return sphere

small_sphere = sample_sphere_polar(50)

#

@whole roost

#

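Worked for me. .T returns the transposed array, so I assign it on the same line. In your version sphere.transpose without parentheses never calls the method, and even sphere.transpose() returns a new array rather than changing sphere in place.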
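
A minimal sketch of the difference, using dummy arrays in place of the x/y/z samples:

```python
import numpy as np

# Dummy stand-ins for x_array, y_array, z_array with N = 50.
skin = [np.zeros(50), np.ones(50), np.full(50, 2.0)]
sphere = np.asarray(skin)          # shape (3, 50): one row per coordinate

sphere.transpose                   # only looks the method up, nothing is called
shape_after_noop = sphere.shape    # still (3, 50)

transposed = sphere.T              # same result as sphere.transpose()
stacked = np.column_stack(skin)    # builds the (50, 3) array directly
```

Either `transposed` or `stacked` gives the 50x3 layout; the original `sphere` is left untouched in both cases.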

whole roost May 21, 2020, 1:23 AM

#

Thank you!!

last peak May 21, 2020, 4:40 AM

#

hi guys anyone here familiar with this factorization
A=UEV^T

#

When A is a rectangular matrix, the SVD

#

Does the SVD become Q^T DQ, where D is diagonal eigen value matrix and Q is the orthogonal
vector matrix, when A is a square matrix

merry ridge May 21, 2020, 6:06 AM

#

No they are different decompositions.

#

You can always decompose a matrix using its SVD. Whether it can be diagonalized at all depends on the number of linearly independent eigenvectors, and an orthogonal diagonalization like Q^T D Q exists only when A is symmetric.
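
A quick numerical check of this, using two small matrices picked purely for illustration: a symmetric positive definite one, where the singular values match the eigenvalues, and a shear matrix, which has an SVD but only one independent eigenvector.

```python
import numpy as np

# Symmetric positive definite: SVD and eigendecomposition agree.
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, Q = np.linalg.eigh(S)                    # S = Q diag(eigvals) Q^T
_, sigma, _ = np.linalg.svd(S)
same_spectrum = np.allclose(np.sort(eigvals)[::-1], sigma)

# Shear matrix: square but not symmetric, eigenvalues are 1 and 1.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt               # the SVD still rebuilds A
singular_equal_eigen = np.allclose(s, [1.0, 1.0])
```

For `S` the two factorizations coincide, while for `A` the singular values differ from the eigenvalues even though the matrix is square.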

jagged basin May 21, 2020, 6:53 AM

#

what does input_dim mean in keras?

#

📎 how-to-perform-classification-using-a-neural-network-a-simple-perceptron-example_rk_aac_image1.png

#

if I had a perceptron like this

#

and input0 is 1

#

and input 1 is 0

#

what would be the input_dim of that layer?

spark stag May 21, 2020, 7:09 AM

#

@jagged basin in this example it's (2,). The input dimensions are the 'shape' of the data for that layer; if you have used numpy you can think of it like the shape of a numpy array. There can be different dimensions of different sizes

jagged basin May 21, 2020, 7:13 AM

#

I see

uncut shadow May 21, 2020, 7:47 AM

#

from what I know it's basically the number of features your data has (number of columns)

#

normally you have data like this (x, y) where x stands for number of samples in your batch and y stands for number of features

#

(in RNNs you have (x, t, y) where t stands for number of time steps, but this is not an RNN)
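
To make the shapes concrete, here is a plain NumPy sketch of the two-input perceptron from the question (the weights and bias are made-up values, and this is not Keras code). As far as I know, `input_dim=2` on a Keras Dense layer is shorthand for `input_shape=(2,)`.

```python
import numpy as np

# One sample with input0 = 1 and input1 = 0, i.e. two features.
x = np.array([[1.0, 0.0]])        # shape (batch=1, features=2)
input_dim = x.shape[1]            # number of features per sample

weights = np.array([0.5, -0.5])   # hypothetical weights, one per input
bias = 0.0                        # hypothetical bias

# Step activation: fire if the weighted sum is positive.
output = (x @ weights + bias > 0).astype(int)
```

The batch size can change freely; `input_dim` stays 2 because every sample still has two inputs.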

vague hawk May 21, 2020, 10:26 AM

#

Noob question here regarding AI:

The google image search (similar image) - I assume it uses AI like the one below, right?
https://deepai.org/machine-learning-model/image-similarity

I was wondering - how does google get the result so quickly? Wouldn't they have to go through billions of pictures on the web?

DeepAI

Image Similarity

Image Similarity compares two images and returns a value that tells you how visually similar they are. The lower the the score, the more contextually similar the two images are with a score of '0' being identical. Sifting through datasets looking for duplicates or finding a vi...

uncut shadow May 21, 2020, 10:27 AM

#

well

#

They have really powerful machines for that
They probably have some metadata for images so they are not checking all images tho

#

but I'm not 100% sure about the second one

crisp anvil May 21, 2020, 10:28 AM

#

hey folks..
i need some help
i need to start learning machine learning

uncut shadow May 21, 2020, 10:28 AM

#

so

#

do you know python?

crisp anvil May 21, 2020, 10:29 AM

#

yeah i know

#

but i m weak at maths

uncut shadow May 21, 2020, 10:29 AM

#

well, it will be a problem if you want to make your own models without using any frameworks like Tensorflow, Keras or PyTorch

crisp anvil May 21, 2020, 10:30 AM

#

actually i want to understand the underlying maths behind it

uncut shadow May 21, 2020, 10:30 AM

#

so you need to know maths

#

mostly linear algebra

#

calculus

#

statistics

#

algorithms

#

and stuff like that

crisp anvil May 21, 2020, 10:31 AM

#

maths and that plotting stuff

#

@uncut shadow yes

#

can you suggest me some good books to start with

#

including books for maths and ML etc

uncut shadow May 21, 2020, 10:32 AM

#

well, I didn't read any books about this type of stuff so unfortunately, I'm not able to suggest anything

#

but you should google and search and there probably be many interesting books out there

crisp anvil May 21, 2020, 10:33 AM

#

i've tried that
but here i run into a problem
if i start learning from a book on a single library, it keeps referencing concepts from another lib

#

here i get stuck

lapis ice May 21, 2020, 10:34 AM

#

Epoch [56/100] Batch 300/1588                   Loss D: 0.6502, loss G: 2.3085 D(x): 0.9010
Epoch [56/100] Batch 400/1588                   Loss D: 0.6502, loss G: 2.3040 D(x): 0.9014
Epoch [56/100] Batch 500/1588                   Loss D: 0.6502, loss G: 2.3045 D(x): 0.8998
Epoch [56/100] Batch 600/1588                   Loss D: 0.6502, loss G: 2.2953 D(x): 0.8995
Epoch [56/100] Batch 700/1588                   Loss D: 0.6502, loss G: 2.3021 D(x): 0.9003
Epoch [56/100] Batch 800/1588

Any idea what could cause the D loss to be "stuck" after certain amount of epoch and the G loss being so huge from the very start?
Batch size is currently 8, learning rate is 0.0002

#

DCGAN

gusty willow May 21, 2020, 11:12 AM

#

What are the best sources to learn deep learning?

rigid storm May 21, 2020, 12:08 PM

#

Hi guys, im trying to compute means for some specific rows in a pd df. i changed the likert scale responses to numbers and got rid of some columns that had text in them. however it still isnt able to compute the row means..

#

📎 unknown.png

#

any idea how it returns NaN?

#

there are some NaN values in there (10 out of 4069) but i also set skipna to true so it should be able to calculate the mean still

lapis sequoia May 21, 2020, 12:17 PM

#

suspecting that the nans are not understood as nans

rigid storm May 21, 2020, 12:17 PM

#

In the original csv, this was just a blank cell

lapis sequoia May 21, 2020, 12:18 PM

#

you have to make sure pandas agrees that they are nans

rigid storm May 21, 2020, 12:18 PM

#

But surely it should at least be able to compute rows where no NaNs are found?

lapis sequoia May 21, 2020, 12:18 PM

#

i guess?

rigid storm May 21, 2020, 12:18 PM

#

all rows show this

📎 unknown.png

#

however, only 4 out of 84 rows have NaNs in them

lapis sequoia May 21, 2020, 12:18 PM

#

wait you're printing the temp_df

#

not temp_df.mean

rigid storm May 21, 2020, 12:19 PM

#

Oh oops, this is the output;

#

📎 unknown.png

lapis sequoia May 21, 2020, 12:19 PM

#

wrong axis?

#

are they strings?

rigid storm May 21, 2020, 12:20 PM

#

well, i dont think so, this is the code that i used to get response to number;

#

mymap = {'Totaal niet (-3)':-3, 'Niet (-2)':-2, 'Enigszins niet (-1)':-1, 'Neutraal (0)':0, 'Enigszins wel (1)':1,
         'Wel (2)':2, 'Helemaal (3)':3, 'Man':0, 'Vrouw':1}

df = df.applymap(lambda s: mymap.get(s) if s in mymap else s)

lapis sequoia May 21, 2020, 12:21 PM

#

have you printed the info

#

dtype

rigid storm May 21, 2020, 12:21 PM

#

i tried some stuff indeed but it doesnt let me

#

whats the command for that?

#

to check all the datatypes in the dataframe

lapis sequoia May 21, 2020, 12:22 PM

#

temp_df.info() just without any print

rigid storm May 21, 2020, 12:22 PM

#

temp_df.dtypes()?

lapis sequoia May 21, 2020, 12:23 PM

#

if there aren't too many columns\rows

#

or at least columns

rigid storm May 21, 2020, 12:23 PM

#

📎 unknown.png

lapis sequoia May 21, 2020, 12:23 PM

#

so it's a series

rigid storm May 21, 2020, 12:23 PM

#

Whats a series?

lapis sequoia May 21, 2020, 12:23 PM

#

dataframes are made of series.

#

it's a 1d dataframe except it's missing a bunch of stuff.

#

if you really want, you can do say temp_df.to_frame().info()

rigid storm May 21, 2020, 12:24 PM

#

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 1 columns):
0 0 non-null float64
dtypes: float64(1)
memory usage: 752.0 bytes

lapis sequoia May 21, 2020, 12:24 PM

#

so they're floats

rigid storm May 21, 2020, 12:25 PM

#

i figured that shouldnt really be a problem for mean calculation tho

lapis sequoia May 21, 2020, 12:25 PM

#

well you're doing something wrong

#

is my diagnosis

rigid storm May 21, 2020, 12:27 PM

#

i mean for sure

#

haha

#

but its super weird, even if they're floats instead of integers, what would be the difference?

lapis sequoia May 21, 2020, 12:27 PM

#

nothing

rigid storm May 21, 2020, 12:28 PM

#

uhm

#

it doesnt like the 0?

lapis sequoia May 21, 2020, 12:28 PM

#

0 is fine

rigid storm May 21, 2020, 12:28 PM

#

it's okay with negative numbers afaik?

lapis sequoia May 21, 2020, 12:28 PM

#

though i wonder

#

just post the data here. it's 84 rows

rigid storm May 21, 2020, 12:29 PM

#

the raw data (csv)?

lapis sequoia May 21, 2020, 12:29 PM

#

in backticks please

rigid storm May 21, 2020, 12:30 PM

#

Sorry i dont fully understand, the backticks are for code right?

lapis sequoia May 21, 2020, 12:30 PM

#

yes but you can post data there too

#

it looks better and will be easier on to copypaste to test

#

that or the full csv. i don't care

rigid storm May 21, 2020, 12:31 PM

#

Ill send the csv if you dont mind. if you want i can send the ipnb as well

arctic wedgeBOT May 21, 2020, 12:31 PM

#

Hey @rigid storm!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia May 21, 2020, 12:31 PM

#

i don't need your ipynb. there's something wrong with it

rigid storm May 21, 2020, 12:32 PM

#

uhmmm

#

could i send you a friend req real quick?

#

and send it privately

lapis sequoia May 21, 2020, 12:32 PM

#

you can post the data here as a copypaste

#

it's not that many rows

rigid storm May 21, 2020, 12:33 PM

#

How tho?

lapis sequoia May 21, 2020, 12:33 PM

#

it's a csv file

#

that's text

#

open it up, ctrl+c, put it in back ticks, ctrl+v, enter

rigid storm May 21, 2020, 12:34 PM

#

ah its > 2000 chars

#

cant post

lapis sequoia May 21, 2020, 12:34 PM

#

such is life.

rigid storm May 21, 2020, 12:36 PM

#

i mean if you dont feel like it via another way i'd understand that

lapis sequoia May 21, 2020, 12:37 PM

#

that thing friendlist is for people i know

rigid storm May 21, 2020, 12:37 PM

#

Allright

#

np

lapis sequoia May 21, 2020, 12:38 PM

#

but take the simplest solution here you can. just read in like 5 lines from the data

#

what happens then

rigid storm May 21, 2020, 12:39 PM

#

should i send the first five?

lapis sequoia May 21, 2020, 12:39 PM

#

well you can do that too

#

if it's somewhat representative of the rest of the data\problem

rigid storm May 21, 2020, 12:41 PM

#

cant do it. even one line has too many chars.

#

with that text at the beginning at least

#

but i mightve fucked up at that mapping part. although it looks like it did convert to floats tho

lapis sequoia May 21, 2020, 12:42 PM

#

are you sure the axis is correct

rigid storm May 21, 2020, 12:43 PM

#

i think if i do axis=0 it will try to do something per row right

#

and axis =1 would be columns?

lapis sequoia May 21, 2020, 12:44 PM

#

does it

rigid storm May 21, 2020, 12:45 PM

#

all rows show this
@rigid storm this was something like temp_df.mean(axis=0)

#

which gave 'NaN' as the mean for each row up to the 84th

lapis sequoia May 21, 2020, 12:45 PM

#

📎 unknown.png

#

things are not always what they seem...

rigid storm May 21, 2020, 12:46 PM

#

sorry it is 1 indeed

lapis sequoia May 21, 2020, 12:46 PM

#

they flip around in these cases

#

normally 0 would be rows, but for mean(axis=0) it means going down the rows, so you get one value per column
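
A tiny sketch of the axis convention, with made-up answers: the axis you pass is the one that gets collapsed, so `axis=0` leaves one mean per column and `axis=1` leaves one mean per row. NaNs are skipped by default.

```python
import numpy as np
import pandas as pd

# Two hypothetical participants (rows) answering three statements (columns).
df = pd.DataFrame({
    "q1": [1.0, 3.0],
    "q2": [2.0, np.nan],
    "q3": [3.0, -3.0],
})

column_means = df.mean(axis=0)   # collapses rows: one mean per statement
row_means = df.mean(axis=1)      # collapses columns: one mean per participant
```

Here `row_means` has one entry per participant, which is the shape you want for per-respondent scores.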

rigid storm May 21, 2020, 12:46 PM

#

yeah that mean(axis=1) gave all rows with na

lapis sequoia May 21, 2020, 12:49 PM

#

📎 unknown.png

#

this is what you expect right?

rigid storm May 21, 2020, 12:50 PM

#

exactly

#

well

#

that, but with the rows

lapis sequoia May 21, 2020, 12:50 PM

#

change the axis to 1 then

rigid storm May 21, 2020, 12:51 PM

#

yeah ofc, but in your DF, it calculates the mean still

#

mine just outputs NaN

lapis sequoia May 21, 2020, 12:51 PM

#

yes

#

what is your temp_df

#

is it still a series

#

if so, it doesn't have axes

rigid storm May 21, 2020, 12:53 PM

#

let me check

#

wait

#

wtf

#

somethin happened

#

i actually have means now

#

i at least dropped the questions themselves (which was row1 - which was text)

📎 unknown.png

lapis sequoia May 21, 2020, 12:55 PM

#

are those means correct

rigid storm May 21, 2020, 12:56 PM

#

they have to be id say. scale goes from -3 to 3

lapis sequoia May 21, 2020, 12:56 PM

#

dropping the zero hmm

rigid storm May 21, 2020, 12:56 PM

#

a lot will be around 0 anyway

#

the zero's should have no influence on the means right

#

like it would be as if these responses dont exist

lapis sequoia May 21, 2020, 12:57 PM

#

i guess you're dropping rows?

#

wait did you actually just drop the index 0?

rigid storm May 21, 2020, 12:57 PM

#

wait maybe it does matter

lapis sequoia May 21, 2020, 12:58 PM

#

since your index starts from 1 now

rigid storm May 21, 2020, 12:58 PM

#

i dropped this:

#

oh

#

yeah

#

i dropped index 0 (row 0)

#

yes

lapis sequoia May 21, 2020, 12:58 PM

#

... so what was your index 0?

rigid storm May 21, 2020, 12:58 PM

#

which was the original question for the likert scale

lapis sequoia May 21, 2020, 12:58 PM

#

🤦‍♂️

rigid storm May 21, 2020, 12:58 PM

#

📎 unknown.png

lapis sequoia May 21, 2020, 12:58 PM

#

and that was causing the problem?

rigid storm May 21, 2020, 12:59 PM

#

i guess? but i figured it would just give me NaN for 0 and the rest would be calculated

lapis sequoia May 21, 2020, 12:59 PM

#

that shouldn't be a row in the data...

rigid storm May 21, 2020, 12:59 PM

#

yeah true, that's how i got it from qualtrics 😅

lapis sequoia May 21, 2020, 12:59 PM

#

the problem is that it's gonna coerce the whole column datatype

#

into the same datatype

#

they're all some bs strings now or something

#

since you left it there

rigid storm May 21, 2020, 1:00 PM

#

so it couldn't cope with it just because of that being in?

lapis sequoia May 21, 2020, 1:00 PM

#

it should be part of the index if it has to be there but i think it shouldn't

#

well it makes everything a string

rigid storm May 21, 2020, 1:00 PM

#

i thought it would just calculate row by row, and if a row wouldnt be possible NaN would be the output

lapis sequoia May 21, 2020, 1:00 PM

#

did you try calculating row by row

#

you probably got nan for every single row

#

actually

#

it probably didn't even try

#

since it saw it was a string

rigid storm May 21, 2020, 1:01 PM

#

do you have the syntax for calc of a secific row?

#

specific

#

kinda curious

lapis sequoia May 21, 2020, 1:02 PM

#

it skipped over all the rows because they were all strings

#

just like it would skip over all the columns

rigid storm May 21, 2020, 1:02 PM

#

But then those means should be the right ones correct?

lapis sequoia May 21, 2020, 1:02 PM

#

the new ones? yes

rigid storm May 21, 2020, 1:02 PM

#

i mean they look correct to me

lapis sequoia May 21, 2020, 1:02 PM

#

they are

rigid storm May 21, 2020, 1:03 PM

#

and then axis=0 i would get the means for all coumns right

lapis sequoia May 21, 2020, 1:03 PM

#

yes

rigid storm May 21, 2020, 1:03 PM

#

columns

#

God this took way too long haha

#

but thanks

lapis sequoia May 21, 2020, 1:03 PM

#

you can try also .describe()

rigid storm May 21, 2020, 1:03 PM

#

for the effort

lapis sequoia May 21, 2020, 1:03 PM

#

to get the basic statistics for the axes

#

it gives you these same numbers and some others too

rigid storm May 21, 2020, 1:04 PM

#

ah nice

lapis sequoia May 21, 2020, 1:04 PM

#

also you would see all the data types if you just do .info() on the dataframe

#

it would have told you the columns are all objects

rigid storm May 21, 2020, 1:05 PM

#

this?

📎 unknown.png

#

you mean

lapis sequoia May 21, 2020, 1:05 PM

#

yes

rigid storm May 21, 2020, 1:05 PM

#

this was on temp_df tho

#

so before i removed row 0

lapis sequoia May 21, 2020, 1:05 PM

#

you can't calculate a mean on objects

#

that's the point

#

they have to be something that counts as a number of some kind
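
Roughly what happened here, as a sketch with invented question texts: one text row forces every column to dtype `object`, and dropping that row plus converting with `pd.to_numeric` gives numeric columns that `mean` can work with.

```python
import pandas as pd

# Hypothetical export: row 0 holds the question text, the rest are answers.
raw = pd.DataFrame({
    "q1": ["How do you feel about X?", 1, -2],
    "q2": ["How do you feel about Y?", 3, None],
})
dtypes_before = raw.dtypes            # both columns are object

# Drop the text row, then coerce what is left to numbers (bad values -> NaN).
answers = raw.drop(index=0).apply(pd.to_numeric, errors="coerce")
row_means = answers.mean(axis=1)      # NaNs are skipped by default
```

Calling `raw.info()` or looking at `raw.dtypes` before computing anything is a cheap way to catch this early.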

rigid storm May 21, 2020, 1:06 PM

#

yeah makes sense

#

lol

#

well im glad it worked out in the end

lapis sequoia May 21, 2020, 1:11 PM

#

hey y'all, i got this error: ParserError: Unknown string format: 2020-05-19 10-AM

after i tried to convert it with this df['Date'] = pd.to_datetime(df['Date'])

#

how can i convert "2020-05-19 10-AM" to datetime?

#

give to_datetime the correct format parameter

#

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior look the formatting here

lapis sequoia May 21, 2020, 1:32 PM

#

sorry for my ignorance, but do you have a simple example how i pass the format parameters?

#

i tried this now:

for i in df['Date']:
    i.strptime('%Y-%m-%d %H-%p')

#

but i just get this error AttributeError: 'str' object has no attribute 'strptime'

#

for i in df['Date']:
    date = datetime.strptime(i, '%Y-%m-%d %H-%p')

#

got it now 🙂 thanks though for your help @lapis sequoia

#

you can just shove that same format string to pd.to_datetime

#

and it'll do it automatically

#

without any looping

rigid storm May 21, 2020, 1:40 PM

#

hey @lapis sequoia if you dont mind one last question? Instead of replacing within the column, what would be the easiest way to replace for rows?

#

df['columnname'].replace(['NaN'], <the number>)

#

so this for example could be for all values in a column right?

lapis sequoia May 21, 2020, 1:41 PM

#

mmm yes

#

that kinda takes the column as a series out of the dataframe and then does a replace on it

rigid storm May 21, 2020, 1:41 PM

#

so id have to say df = .....

#

but for rows?

lapis sequoia May 21, 2020, 1:42 PM

#

what's the difference between replacing in rows and columns in your case?

rigid storm May 21, 2020, 1:42 PM

#

df['x'] = df['x'].replace(['NaN',], the number)

lapis sequoia May 21, 2020, 1:42 PM

#

do you want to replace full rows?

rigid storm May 21, 2020, 1:42 PM

#

i just want to replace the NaNs per row

#

with the mean

#

of that row

lapis sequoia May 21, 2020, 1:42 PM

#

also when you assign back to a dataframe, you should always do df.loc[:, 'x'] =

#

ah

rigid storm May 21, 2020, 1:43 PM

#

so each NaN can just be replaced with same number, but only that new number of that row (the mean of the row)

lapis sequoia May 21, 2020, 1:43 PM

#

there are better functions for that

#

wait so

#

one moment

rigid storm May 21, 2020, 1:43 PM

#

ok

lapis sequoia May 21, 2020, 1:43 PM

#

you want to replace the NaN with the average of that row?

rigid storm May 21, 2020, 1:44 PM

#

si

#

for ex. one participant might have 2 NaNs

#

those two will be replaced with that participant's mean of the rest of his responses

#

(Each row = one particiapnt with 49 answers)

#

if they filled in everything

#

participant 74 has 4 NaNs > 74 mean was 0.222 so those 4 get 0.222

📎 unknown.png

lapis sequoia May 21, 2020, 1:46 PM

#

you want

#

df.fillna(df.mean(axis=1), axis=1)

#

I think

#

try that

#

I hope that works

#

though I get the feeling those nans are not integers so it won't work

#

or floats

rigid storm May 21, 2020, 1:48 PM

#

📎 unknown.png

#

i assigned the same name to it again and then printed it btw

#

fyi

lapis sequoia May 21, 2020, 1:50 PM

#

!e ```py
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((5,5)), columns=['a', 'b', 'c', 'd', 'e'])
df.iloc[2,2] = np.nan

print(df.fillna(df.mean(axis=1)))```

arctic wedgeBOT May 21, 2020, 1:50 PM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lapis sequoia May 21, 2020, 1:50 PM

#

blah

#

wrong chan

rigid storm May 21, 2020, 1:50 PM

#

i can copy it

#

and indeed see what it does

#

ok so looks like the NaN is not filled in this case you sent

lapis sequoia May 21, 2020, 1:52 PM

#

i guess you'll have to apply a function along axis=1

#

that does the filling by row

#

you could drop the axis=1 from what's happening here but then it'll fill with the column averages

rigid storm May 21, 2020, 1:54 PM

#

What about filling the NaNs separately?

lapis sequoia May 21, 2020, 1:54 PM

#

df.apply(lambda row: row.fillna(row.mean()), axis=1)

#

heh this is apparently not implemented in pandas yet https://stackoverflow.com/questions/33058590/pandas-dataframe-replacing-nan-with-row-average

Stack Overflow

Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas but i have been puzzled with the following please. I want to replace NaNs is a dataframe with the row average. Hence something like df.fillna(df.mean(axis=1)) should wo...

#

well that was 2015

#

but you can see it wasn't implemented in your version either
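
A small self-contained sketch of the row-wise fill, with made-up participants and answers. It shows the `apply` version from above next to a transpose-based alternative that should give the same result:

```python
import numpy as np
import pandas as pd

# Two hypothetical participants, each with one missing answer.
df = pd.DataFrame(
    {"q1": [1.0, 2.0], "q2": [np.nan, 2.0], "q3": [3.0, np.nan]},
    index=["p1", "p2"],
)

# Fill each row's NaNs with that row's own mean.
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)

# Alternative: fillna works column by column, so transpose first.
filled_alt = df.T.fillna(df.mean(axis=1)).T
```

Both leave the existing answers alone and only replace the NaNs, so p1's missing q2 becomes the mean of p1's other answers.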

rigid storm May 21, 2020, 1:55 PM

#

this seems to work haha

#

74 had 4 nans, > 0.222

📎 unknown.png

#

numbers turned into floats as well somehow (at least for the observer)

#

but yeah that looks right

#

ofc they were already technically floats right

lapis sequoia May 21, 2020, 2:01 PM

#

that depends a bit yeah

#

i'm not sure how the integer nans happen in pandas

rigid storm May 21, 2020, 2:01 PM

#

but i think it should be fine right now

#

i can check one last time what the data type of the cells is or something

#

all float64

lapis sequoia May 21, 2020, 2:24 PM

#

hey i got another question. I have dates in a dataframe that look like this: '2020-05-18 11-PM'
i used @broken mortarwakes tips and was able to convert the times with this function:

#

but now i realized that '2020-05-18 11-PM' and '2020-05-18 11-AM' both were converted to 2020-05-18 11:00:00

#

how can I make 11-PM 23:00 and 11-AM 11:00?

#

Why doesn't it automatically turn AM and PM into distinct times?

#

https://stackoverflow.com/questions/51235708/parsing-string-to-datetime-while-accounting-for-am-pm-in-pandas

Stack Overflow

Parsing string to datetime while accounting for AM/PM in pandas

I am trying to parse a string in this format "2018 - 07 - 07 04 - AM"
to pandas datetime using strftime format. However, It seems to me the format doesn't recognize the difference between AM and PM...

broken mortar May 21, 2020, 2:28 PM

#

PepeS

lapis sequoia May 21, 2020, 2:28 PM

#

When used with the strptime() function, the %p directive only affects the output hour field if the %I directive is used to parse the hour.
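
A short sketch of what that documentation sentence means in practice, assuming the strings look exactly like the two examples above (the original conversion function was not preserved in the log, so the format strings here are a guess at it). With `%H` the AM/PM marker is matched but has no effect; with `%I` it does.

```python
from datetime import datetime

import pandas as pd

raw = ["2020-05-18 11-PM", "2020-05-18 11-AM"]

# %H reads a 24-hour value, so %p is parsed but ignored.
wrong = [datetime.strptime(s, "%Y-%m-%d %H-%p") for s in raw]

# %I reads a 12-hour value, which lets %p shift PM times by 12 hours.
right = pd.to_datetime(pd.Series(raw), format="%Y-%m-%d %I-%p")

print([d.hour for d in wrong])
print(right.dt.hour.tolist())
```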

#

you are a walking demigod among us normal humans

#

it's called using google

#

thank you though

#

np. tips fedora

#

@lapis sequoia are you by any chance familiar with the reddit API or pushshift API?

broken mortar May 21, 2020, 2:42 PM

#

PepeS

lapis sequoia May 21, 2020, 2:43 PM

#

I need to download an entire subreddit, that has a few posts a day and was created in 2008

lapis sequoia May 21, 2020, 3:15 PM

#

that sounds fun

#

why do you need to

#

for my thesis i am doing some datascience and want to do some sentiment analysis based on posts and comments of certain subreddits

#

that sounds.. dated.. but ok

#

well it is just a little part of the thesis but it needs to be done...

#

do you have some suggestions on how to do it?

#

sure.. look up fastblob

#

there's also semantic context for complex sentences

#

as i read the reddit API does not provide searching by time anymore. and my attempts with pushshift are unsuccessful...

#

you mean fastblob for the sentiment analysis?

#

and you need to search by time, because?

#

I already created a framework for it, since it isn't in english.

#

ahh that's cool

#

i need it by time since I need to get all posts and comments from july 2017 to today and reddit API restricts somehow more than 1000 results or something

#

so do it in batches

#

yes, but I can only get the 1000 latest

#

but 1000 results seems like less than a week

#

ahh

#

that sucks..

#

that means you can't do it

#

try to look for an existing dataset

#

or send them a request through your school

#

the alternative is scraping, which is probably against ToS and a waste of time

#

well I thought about creating a spider with scrapy

#

but I also read that with pushshift it should be possible to get results by time but my attempts until now failed. I will paste my question from earlier this day:

#

hey y'all! Is anyone of you familiar with the pushshift API? I was using psaw and basically used their demo example to grab posts from 2017. But somehow it retrieves only the latest posts and not from the time indicated:

```py
from psaw import PushshiftAPI
import datetime as dt

api = PushshiftAPI()

start_epoch=int(dt.datetime(2017, 1, 1).timestamp())

data = list(api.search_submissions(after=start_epoch, subreddit='neo',  filter=['url','author', 'title', 'subreddit', 'num_comments', 'comments'], limit=10))

print(data)```
This is what the code above returns for me, if I use limit=1 instead of 10:

```py
[submission(author='anonboyGR', created_utc=1590054802, num_comments=0, subreddit='NEO', title='Pi Network Cryptocurrency', url='https://www.reddit.com/r/NEO/comments/gntymq/pi_network_cryptocurrency/', created=1590047602.0, d_={'author': 'anonboyGR', 'created_utc': 1590054802, 'num_comments': 0, 'subreddit': 'NEO', 'title': 'Pi Network Cryptocurrency', 'url': 'https://www.reddit.com/r/NEO/comments/gntymq/pi_network_cryptocurrency/', 'created': 1590047602.0})]```
notice how this is in fact not from 2017...
This is the link to the example I used: https://psaw.readthedocs.io/en/latest/#first-10-submissions-to-r-politics-in-2017-filtering-results-to-url-author-title-subreddit-fields

#

where's the comment

#

I see author, date, etc but isn't the comment supposed to be part of the payload

#

well it's just an example but this post didn't have a comment (num_comments = 0)

#

also it was immediately deleted.

#

My problem though is that the post is not from 2017 even though i was sticking exactly to the example in the link

#

hmm

#

looks like it's something this person put together to search public posts.. he's not with reddit

#

maybe you can raise an issue on his github

#

the last commit was march

#

yeah... i will probably find a way to make it work

lapis ice May 21, 2020, 3:47 PM

#

I have issue with my DCGAN where the training basically halts at

📎 unknown.png

#

Currently I am trying to read on the D/G module and how I can mess around with the activation functions

lapis sequoia May 21, 2020, 4:13 PM

#

Hi guys. For emotion detection, which is the most accurate github project with pretrained models provided?

last peak May 21, 2020, 4:53 PM

#

a = [1,2,3,4]

#

a=[1,2,3,4]
b=np.asfarray(a)
b

array([1., 2., 3., 4.])
Is there a way to change the dtype of np arrays back to int

#

like without turning it back to a list and converting to int with native python

#

i want to switch between numpy dtype to another numpy dtype in that framework, so hopefully nothing slows down too much
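
One way to do this without going back through a Python list is `ndarray.astype`, sketched below. The variable names are illustrative; `np.asarray(..., dtype=float)` stands in for `np.asfarray`, which was removed in NumPy 2.0.

```python
import numpy as np

a = [1, 2, 3, 4]

# Float array, equivalent to what np.asfarray(a) produced.
b = np.asarray(a, dtype=float)

# astype returns a new array with the requested dtype.
# Converting float -> int truncates toward zero.
c = b.astype(np.int64)

print(b.dtype, c.dtype)
```

The conversion happens in compiled code on the whole array at once, so it is usually far cheaper than a round trip through native Python ints.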

lapis ice May 21, 2020, 5:27 PM

#

RuntimeError: size mismatch, m1: [8192 x 16], m2: [8192 x 16] at

What...

rustic igloo May 21, 2020, 5:30 PM

#

Hello, i am stuck on an error that I don't know what the problem is... see code and error message below. Please let me know what i'm doing wrong! Thanks!

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import Model

input_A = tf.random.normal([4,100],0,1)
input_B = tf.random.normal([4,100],0,1)

X = tf.matmul(input_A, tf.transpose(input_B))
X = tf.keras.layers.Dense(192)(X)
X = tf.keras.layers.Dropout(0.2)(X)
output = tf.keras.layers.Dense(1, activation='softmax')(X)

# print(input_A, input_B, input_C, output)

model = tf.keras.Model(inputs=[input_A, input_B], outputs = output)

ERROR MESSAGE
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-68-4c747a5ac852> in <module>()
     13 # print(input_A, input_B, input_C, output)
     14 
---> 15 model = tf.keras.Model(inputs=[input_A, input_B], outputs = output)
     16 

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in op(self)
   1111   def op(self):
   1112     raise AttributeError(
-> 1113         "Tensor.op is meaningless when eager execution is enabled.")
   1114 
   1115   @property

AttributeError: Tensor.op is meaningless when eager execution is enabled.

uncut shadow May 21, 2020, 5:35 PM

#

@rustic igloo I think this might help you https://github.com/tensorflow/tensorflow/issues/27739

GitHub

[Tensorflow 2.0] AttributeError: Tensor.op is meaningless when eage...

System information Enviroment : Google Colaboratory TensorFlow installed from (source or binary): !pip install tensorflow-gpu==2.0.0-alpha TensorFlow version (use command below): 2.0-alpha My code ...

rustic igloo May 21, 2020, 5:41 PM

#

@uncut shadow Thanks !

uncut shadow May 21, 2020, 5:42 PM

#

👍

lapis sequoia May 21, 2020, 5:56 PM

#

Hi. I was looking for a real-time emotion detection program written in Python that has the models pre-trained and available. Any suggestions?

sullen oasis May 21, 2020, 5:57 PM

#

I think this might be the right place to ask... I am looking for an API that lets me see weather data. Specifically monthly highs/lows/averages. There's so many things online but they are all about the live weather forecast.

#

Any ideas?

rain palm May 21, 2020, 6:19 PM

#

@lapis sequoia https://github.com/topics/emotion-recognition?l=python - many libraries.

GitHub

Build software better, together

GitHub is where people build software. More than 50 million people use GitHub to discover, fork, and contribute to over 100 million projects.

craggy coyote May 21, 2020, 6:40 PM

#

One of the biggest struggles I find related to Data + python is regularly needing to un_nest keys in such a way that the data can be put into tables/CSV form for analysis.

I came across a Stack Overflow post a while back which gave me a great function that I tweaked slightly, but it's still running into issues with nested lists of dictionaries.

Are there any well known methods of accomplishing this? Or should I keep hacking on what I have?

Example dataset:

```json
{
    "key2": [
              {"nested_key1": "nested_value1",
               "nested_key2": "nested_value2"},

              {"nested_key1": "nested_value1",
               "nested_key2": "nested_value2"}
            ]
}```

#

This is the function (it's 90% what I found on Stack Overflow, with tiny edits from me while testing it)

```py
def flatten_dictionary(d):
    result = {}
    stack = [iter(d.items())]  # Stack of iterators; start with one over the dictionary's (key, value) tuples
    keys = []
    while stack:
        for k, v in stack[-1]:  # Resume the most recently pushed iterator (top of the stack)
            keys.append(k)
            if isinstance(v, list):
                if len(v) > 0:
                    for item in v:
                        if item:
                            if isinstance(item, dict):
                                if len(item.keys()) < 1:
                                    result['.'.join(keys)] = 'None'
                                else:
                                    stack.append(iter(item.items()))
                            elif isinstance(item, list):
                                result['.'.join(keys)] = '.'.join(item)
                                keys.pop()  # This may need to be re-commented out
                            else:
                                result['.'.join(keys)] = ''.join(str(v))
                                keys.pop()
                                break
                    break
                else:
                    result['.'.join(keys)] = 'None'
                    keys.pop()
            elif isinstance(v, dict):
                if len(v.keys()) < 1:
                    result['.'.join(keys)] = 'None'
                    keys.pop()
                else:
                    stack.append(iter(v.items()))
                    break
            else:
                result['.'.join(keys)] = str(v)
                keys.pop()
        else:
            if keys:
                keys.pop()
            stack.pop()
    return result```
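
For comparison, a much shorter recursive sketch that handles lists of dictionaries by putting the list index into the key path. This is not the Stack Overflow function above, just one possible alternative; `flatten`, `record` and the `key1` entry are hypothetical names made up for the illustration.

```python
def flatten(obj, prefix=""):
    """Return {dotted.path: leaf_value} for nested dicts and lists."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = list(enumerate(obj))
    else:
        return {prefix: obj}      # leaf value
    if not items:
        return {prefix: None}     # keep empty containers visible
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Hypothetical record shaped like the example dataset.
record = {
    "key1": "value1",
    "key2": [
        {"nested_key1": "a", "nested_key2": "b"},
        {"nested_key1": "c", "nested_key2": "d"},
    ],
}

print(flatten(record))
```

If the goal is a table with one row per list element instead of one wide row, `pandas.json_normalize` with its `record_path` and `meta` arguments may be a better fit.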

lapis sequoia May 21, 2020, 6:54 PM

#

bruh

#

what are you doing.. don't do this

#

this is not how you unnest structures..

#

in your nested structure, what data do you actually hope to use and how do you want it structured

jagged basin May 21, 2020, 9:45 PM

#

any resources on creating a genetic algorithm in keras?

balmy hare May 21, 2020, 9:49 PM

#

how do i run a command when someone react on a message 🙂

lapis sequoia May 21, 2020, 10:10 PM

#

need a project that takes webcam feed and tells in real time if you are happy, sad, surprised etc. It should have pretrained models etc
Any links?

flint lynx May 21, 2020, 11:04 PM

#

Anyone able to help with a Pandas/Matplotlib question in Help-hydrogen?

real wigeon May 21, 2020, 11:04 PM

#

so

#

```py
def plot_sums():
    index_confirmed = confirmed.set_index('Country/Region')
    confirmed_date_time = index_confirmed.iloc[:, 3:]
    summed_values = confirmed_date_time.sum(skipna=True)
    summed_values.plot.line()```

#

I'm getting an exit code 0 from this, but my output contains no plot. What gives?
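
Two common causes for this symptom, neither confirmed in the thread: the function is defined but never called, or the script ends without `plt.show()`. The sketch below addresses both. The toy `confirmed` frame and its column names are assumptions modelled on the function above, and the frame is passed in as a parameter instead of being read from a global.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sums(confirmed):
    """Plot the total confirmed count per date and return the Axes."""
    index_confirmed = confirmed.set_index("Country/Region")
    # Skip the three non-date columns left after set_index.
    confirmed_date_time = index_confirmed.iloc[:, 3:]
    summed_values = confirmed_date_time.sum(skipna=True)
    return summed_values.plot.line()


# Hypothetical stand-in for the real dataset.
confirmed = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["A", "B"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 2],
    "1/23/20": [3, 5],
})

if __name__ == "__main__":
    plot_sums(confirmed)  # the function has to be called
    plt.show()            # a plain script needs this to open the window
```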

valid drum May 22, 2020, 12:13 AM

#

How can I vectorize this(it’s very slow)?


    def backprop(self, dA_prev):
        """
        Back propagation in a max pooling layer
        :param dA_prev: derivative of the cost function with respect to the previous layer(when going backwards)
        :return: the derivative of the cost layer with respect to the current layer
        """
        x = self.cache['X']
        n_batch, ch_x, h_x, w_x = x.shape
        h_poolwindow, w_poolwindow = self.pool_size

        dA = np.zeros(shape=x.shape)  # dC/dA --> gradient of the input
        for n in range(n_batch):
            for ch in range(ch_x):
                curr_y = out_y = 0
                while curr_y + h_poolwindow <= h_x:
                    curr_x = out_x = 0
                    while curr_x + w_poolwindow <= w_x:
                        window_slice = x[n, ch, curr_y:curr_y + h_poolwindow, curr_x:curr_x + w_poolwindow]
                        i, j = np.unravel_index(np.argmax(window_slice), window_slice.shape)
                        dA[n, ch, curr_y + i, curr_x + j] = dA_prev[n, ch, out_y, out_x]

                        curr_x += self.stride
                        out_x += 1

                    curr_y += self.stride
                    out_y += 1
        return dA

merry ridge May 22, 2020, 12:40 AM

#

What kind of derivative is this? I assume you are doing some kind of shooting method but I can’t follow the discretization.

valid drum May 22, 2020, 1:00 AM

#

What kind of derivative is this? I assume you are doing some kind of shooting method but I can’t follow the discretization.
@merry ridge
Max pooling

#

@merry ridge
That’s how I vectorized the forward propagation:


        n_batch, ch_x, h_x, w_x = x.shape
        h_poolwindow, w_poolwindow = self.pool_size

        out_h = int((h_x - h_poolwindow) / self.stride) + 1
        out_w = int((w_x - w_poolwindow) / self.stride) + 1

        windows = as_strided(x,
                             shape=(n_batch, ch_x, out_h, out_w, *self.pool_size),
                             strides=(x.strides[0], x.strides[1],
                                      self.stride * x.strides[2],
                                      self.stride * x.strides[3],
                                      x.strides[2], x.strides[3])
                             )
        out = np.max(windows, axis=(4, 5))

#

But I can’t find a way to do so for the back-propagation...
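
One possible vectorization, sketched as a standalone function so it can be tested outside the layer class (the name `maxpool_backward` and its signature are assumptions). It reuses the strided window view from the forward pass, takes the argmax inside every window at once, and scatters the incoming gradient with `np.add.at`. Note one difference from the loop version: where windows overlap, `np.add.at` accumulates gradients instead of overwriting them; for non-overlapping windows the two agree.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def maxpool_backward(x, dA_prev, pool_size, stride):
    """Route each upstream gradient to the argmax position of its window."""
    n, c, h, w = x.shape
    ph, pw = pool_size
    out_h = (h - ph) // stride + 1
    out_w = (w - pw) // stride + 1

    # Same window view as in the forward pass.
    windows = as_strided(
        x,
        shape=(n, c, out_h, out_w, ph, pw),
        strides=(x.strides[0], x.strides[1],
                 stride * x.strides[2], stride * x.strides[3],
                 x.strides[2], x.strides[3]),
    )

    # Flatten each window and find the position of its maximum.
    flat_idx = windows.reshape(n, c, out_h, out_w, ph * pw).argmax(axis=-1)
    i, j = np.unravel_index(flat_idx, (ph, pw))

    # Absolute coordinates of every maximum in the input.
    n_idx, c_idx, oy, ox = np.indices((n, c, out_h, out_w))
    rows = oy * stride + i
    cols = ox * stride + j

    dA = np.zeros(x.shape, dtype=dA_prev.dtype)
    np.add.at(dA, (n_idx, c_idx, rows, cols), dA_prev)
    return dA
```

Inside the class this would be called roughly as `maxpool_backward(self.cache['X'], dA_prev, self.pool_size, self.stride)`.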

rustic igloo May 22, 2020, 1:51 AM

#

@rustic igloo I think this might help you https://github.com/tensorflow/tensorflow/issues/27739
@uncut shadow Thanks for the link, but I don't still quite understand. The article said this was a bug and was fixed, so why is this still occurring?

If it is suggesting to use tf.Variable for all parameters on moving average function, may I know which of variable in my code need to apply this?

Thanks.

lilac fiber May 22, 2020, 2:16 AM

#

anybody can help about ROC curve?

arctic canopy May 22, 2020, 2:28 AM

#

Guys if anyone here works with numpy pls tell me any advices to learn it and keep motivated(I want to learn it for ML and I started already but sometimes it seems a bit difficult)

dense scroll May 22, 2020, 2:57 AM

#

Hey guys, I am currently learning and specializing in data science. I am really loving this topic. I am starting to think of ways to make it a my main source of income. Do you guys know what types of people would hire a Data Science Company/Product/Service and which problems are they usually trying to solve? (I am not trying to get hired by a company, but to start my own company that sells data science solutions)

rustic igloo May 22, 2020, 3:13 AM

#

Guys if anyone here works with numpy pls tell me any advices to learn it and keep motivated(I want to learn it for ML and I started already but sometimes it seems a bit difficult)
@arctic canopy I'm also learning this. The best way I found is to practice numpy methods with a small set of code. Also worthwhile to read up on difference it has with other packages like pandas (which is also based on numpy).

arctic canopy May 22, 2020, 3:20 AM

#

@rustic igloo Thanks for your reply. I'm reading a book called Python for Data Analysis, so it will take me to pandas after I finish the numpy chapter. I will try to practice it more as you said. Also, can you give me some beginner project?

rustic igloo May 22, 2020, 3:24 AM

#

@rustic igloo Thanks for your reply. I'm reading a book called Python for Data Analysis, so it will take me to pandas after I finish the numpy chapter. I will try to practice it more as you said. Also, can you give me some beginner project?
@arctic canopy try practicing something like this:
https://www.machinelearningplus.com/python/101-numpy-exercises-python/

Machine Learning Plus

Selva Prabhakaran

101 Numpy Exercises for Data Analysis

The goal of the numpy exercises is to serve as a reference as well as to get you to apply numpy beyond the basics. The questions are of 4 levels of difficulties with L1 being the easiest to L4 being the hardest.

arctic canopy May 22, 2020, 3:26 AM

#

@rustic igloo Thanks i will check it out

lapis sequoia May 22, 2020, 6:50 AM

#

that's pretty nice

#

someone pin this

dull turtle May 22, 2020, 7:00 AM

#

hello guys, I have one doubt: what is meant by

#

109ms/step - loss: 5.1975e-07 - accuracy: 1.0000 - val_loss: 0.0000e+00 - val_accuracy: 1.0000 this

jagged basin May 22, 2020, 7:54 AM

#

```py
self.networks[i].get_weights() + self.networks[v].get_weights()```

#

(keras) whenever I try to add the weights of two different networks

#

it returns an error

#

is there a way I could bypass this?

uncut shadow May 22, 2020, 8:01 AM

#

Well

#

It's probably because matrices storing these weights have different shapes
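
Another thing worth checking, as a sketch under assumptions: Keras `get_weights()` returns a plain Python list of NumPy arrays, so `+` on two such lists concatenates them instead of adding the weights. The lists below are stand-ins for two `get_weights()` results with matching shapes, not real networks.

```python
import numpy as np

# Stand-ins for model_a.get_weights() and model_b.get_weights():
# one kernel and one bias each, same shapes in both lists.
weights_a = [np.ones((3, 2)), np.zeros(2)]
weights_b = [np.full((3, 2), 2.0), np.ones(2)]

# List concatenation: four arrays, nothing is added.
concatenated = weights_a + weights_b

# Element-wise sum and average, layer by layer.
summed = [wa + wb for wa, wb in zip(weights_a, weights_b)]
averaged = [(wa + wb) / 2 for wa, wb in zip(weights_a, weights_b)]

print(len(concatenated), summed[0][0], averaged[1])
```

A list built this way has the right length and shapes to pass to `set_weights`, provided both networks share the same architecture.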

valid drum May 22, 2020, 8:57 AM

#

@merry ridge Do you have any ideas?

#

Maybe extracting the windows and than summing over a certain axis? I really have no idea...

blazing bridge May 22, 2020, 9:10 AM

#

@arctic canopy
I noticed you were having trouble with Numpy. Check out the channel Coding Matrix. They have beginner friendly content. https://m.youtube.com/channel/UCKaajyjktvduM6mmuBtAOyg

YouTube

Coding Matrix

Welcome to our channel, our names are Hamad Sultan and Shaheed Mohamed Ali. We are two aspiring high school students and programmers who wish to share our kn...

#

@arctic canopy are you reading the book physically or electronically

valid drum May 22, 2020, 9:49 AM

#

Do we divide the gradients by the batch-size in Adam optimizer?

#

Because I haven’t seen that mentioned at all...

dull turtle May 22, 2020, 10:25 AM

#

hi, I have the following condition

#

```py
if result2 ==0:
    print("country name: Aba, document type: driving licence")
elif result2 ==1:
    print("country name: Aba, document type: Passport") ```

#

but my first condition is always true, i.e. the "if" branch always runs

#

in my case the elif condition should be true: I am using a passport image, but it still gives licence as the output

#

when I pass a 'passport' image it predicts 'licence'

late torrent May 22, 2020, 10:43 AM

#

I have a dying question: how on earth do you output Jupyter notebooks to HTML without it looking truly terrible?!

📎 Screen_Shot_2020-05-22_at_6.33.13_pm.png

#

the 'Export Notebook as HTML' option has the most horrific styling ^

#

how can I get something simple and clean that still has all the syntax highlighting etc without rewriting all the CSS?!

dull turtle May 22, 2020, 11:29 AM

#

my image recognition model is predicting "passport" as "licence" and vice versa. what could the issue be?

uncut shadow May 22, 2020, 12:34 PM

#

Maybe 1 stands for driving license and 0 for passport in dataset?
Your model might not be trained with enough data (or there is something wrong with your model), which causes this.
Maybe you should change the threshold for predicting those values?

lapis sequoia May 22, 2020, 12:35 PM

#

Can anyone help me with this question?
You are designing a neural network to extract a feature map of size 50 x 50 from a colour image of size 100 x 100 x 3
What is the number of parameters if only one fully connected layer is used?

#

trying to study for my exam and i dont know where to begin with this question
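
A worked version of the count, under the usual textbook assumptions (not stated in the question): the image is flattened to a vector, the 50 x 50 feature map has a single channel, every input connects to every output, and each output unit has one bias.

```python
# Flattened input: every pixel of every colour channel.
in_features = 100 * 100 * 3      # 30,000 inputs

# One output unit per position of the 50 x 50 feature map.
out_features = 50 * 50           # 2,500 outputs

weights = in_features * out_features   # one weight per (input, output) pair
biases = out_features                  # one bias per output unit
total = weights + biases

print(weights, biases, total)
```

Without the bias term the answer is just the weight count, so it is worth checking which convention the course uses.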

arctic canopy May 22, 2020, 12:37 PM

#

@blazing bridge thanks for the channel, im reading an electronic book

dull turtle May 22, 2020, 12:40 PM

#

@uncut shadow now my model only predicts "licence". for a "passport" image it also predicts "licence".

last peak May 22, 2020, 2:40 PM

#

could someone explain how numpy
np.swapaxes(..)
and
np.moveaxis(..)
work? I am having a hard time visualizing it

Examples

x = np.zeros((3, 4, 5))

np.moveaxis(x, 0, -1).shape
(4, 5, 3)
np.moveaxis(x, -1, 0).shape
(5, 3, 4)

x
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])

np.swapaxes(x,0,2)
array([[[0, 4],
[2, 6]],
[[1, 5],
[3, 7]]])

#

I don't understand what this 0,2 axis is, how did that switch those numbers

#

uh helloo

#

ok i understand swapaxes, moveaxis though
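
A small sketch that may help with visualizing both calls, using the same arrays as the examples above. The key idea: neither function moves data, they only change which index position refers to which axis. `swapaxes` exchanges two positions; `moveaxis` takes one axis out and reinserts it elsewhere, keeping the remaining axes in their original order.

```python
import numpy as np

x = np.arange(8).reshape(2, 2, 2)

# swapaxes(x, 0, 2): first and last index trade places,
# so swapped[k, j, i] is the same element as x[i, j, k].
swapped = np.swapaxes(x, 0, 2)

y = np.zeros((3, 4, 5))

# moveaxis(y, 0, -1): axis 0 (length 3) goes to the end,
# the others keep their order: (3, 4, 5) -> (4, 5, 3).
moved_last = np.moveaxis(y, 0, -1)

# moveaxis(y, -1, 0): the last axis (length 5) goes to the front:
# (3, 4, 5) -> (5, 3, 4).
moved_first = np.moveaxis(y, -1, 0)

# The same result written as an explicit axis permutation.
same_as_transpose = np.transpose(y, (1, 2, 0))

print(swapped.tolist())
print(moved_last.shape, moved_first.shape)
```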

lapis sequoia May 22, 2020, 4:37 PM

#

Can anyone suggest me what to learn for machine learning?

gusty willow May 22, 2020, 4:48 PM

#

How to select columns that are in English only from a table of different languages?

#

@lapis sequoia maths and statistics basically...and a language to code in

lapis sequoia May 22, 2020, 4:49 PM

#

Yeah I have learnt python

#

I want to learn ml for python

rigid storm May 22, 2020, 5:02 PM

#

Hi guys how would you approach comparing two groups that made the same survey, but the groups differ in age (survey filled in with likert scale responses between -3 and 3)

#

there is 50 likert items per respondent, so are we just checking normality (if even possible?) for each column (question)? that seems incorrect

#

should we just not check normality and use a nonparametric test?

#

this is how responses look

📎 unknown.png

#

however, due to some NaNs, some of these responses were imputed with the mean of the row, which is a continuous value (for example 0.22)

uncut shadow May 22, 2020, 5:17 PM

#

@gusty willow well, if all you have is raw data (for example csv files or sth) then there is no way to do this tho. Computer cannot detect which language is it (I mean, not without machine learning)

gusty willow May 22, 2020, 5:18 PM

#

@uncut shadow how with ML?

uncut shadow May 22, 2020, 5:19 PM

#

you can technically make a model which could detect what language is it

#

you would need a dataset for that

#

but data often doesn't have many columns so the best way would be to choose columns manually

summer yarrow May 22, 2020, 5:27 PM

#

hi

#

can someone help me

#

https://stackoverflow.com/questions/61959701/policy-gradient-on-tic-tac-toe-not-working

Stack Overflow

Policy Gradient on Tic-Tac-Toe not working

I wanted to implement the Policy Gradient on Tic-Tac-Toe.
I tried to use the code that worked for any environment like CartPole-v0 to my Tic-Tac-To game. But it is not learning. There are no errors.

rigid storm May 22, 2020, 6:17 PM

#

Can we assume normality of data if both our groups are > 30? (according to central limit theorem) ?

#

datapoints are discrete [-3, -2, -1, 0, 1, 2, 3]

polar acorn May 22, 2020, 6:23 PM

#

You can assume approximate normality of the sample mean, but not of the data itself; that is what the central limit theorem says.

rigid storm May 22, 2020, 6:25 PM

#

could you elaborate the diff?

polar acorn May 22, 2020, 6:35 PM

#

Let's say, for instance, that the data is uniformly distributed. Pretty far from a normal distribution, right? Now draw many large samples from the data and compute the mean of each one. If each sample is big enough, the plotted means will look approximately normally distributed. What does this mean in practice? No matter how your data is distributed (as long as it has finite variance), with enough observations you can treat the sample mean as approximately normal and do everything you would normally do with a normally distributed value, e.g. hypothesis testing. But the data itself is not normally distributed. It's a common misunderstanding, though.

rigid storm May 22, 2020, 6:36 PM

#

Ah so if i were to run this same experiment lets say 30 times.

#

then i plot the distribution of all of those means.

#

i get a normal distribution.
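
A quick simulation of that idea, as a sketch with made-up numbers: responses are drawn uniformly from the seven Likert values, each simulated experiment has 30 respondents, and the experiment is repeated 10,000 times. The sample size and repetition count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experiments = 10_000   # how often the experiment is repeated
sample_size = 30         # respondents per experiment

# Discrete responses in [-3, 3]; the upper bound of integers() is exclusive.
data = rng.integers(-3, 4, size=(n_experiments, sample_size))

# One mean per experiment.
means = data.mean(axis=1)

# The raw data only ever takes 7 values, so it is not normal.
print(np.unique(data))

# The means cluster around 0 with spread close to sigma / sqrt(n),
# where sigma = 2 for a uniform distribution on -3..3.
print(means.mean(), means.std(), 2 / np.sqrt(sample_size))
```

A histogram of `means` should look bell-shaped, while a histogram of `data` stays flat.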