#data-science-and-ml | Python | Page 188

zealous ermine Jul 28, 2018, 8:43 PM

#

📎 b.png

#

im using cropped images such as those, and comparing them against other character images cropped in the same way, but drawn with a different font, to see if theyre the same character

#

right now, im using MSE to compare the 2 images - so i might compare the image against the whole alphabet in another font, and whichever character it matches best with, thats the character i assign it

#

so like the japanese extra thick is written with japanese characters

#

乇

#

and that would be compared against a-z, A-Z and whichever was most similar is what it would be

#

so in that case E

#

the thing is, i dont really get how these functions work, so im not really sure which one to use for this

#

does anyone have any idea how they work, so they can tell me which would be the most effective here?

#

(testing a lot of characters would be pretty labour intensive cuz id have to find them and then check them manually)

velvet anchor Jul 28, 2018, 9:43 PM

#

I covered MSE in AP Stats

#

and NRMSE seems pretty similar just, you know, normalized
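For anyone following along, here is a minimal sketch of what the two measures compute, assuming equally sized grayscale images held as NumPy arrays. The tiny 3x3 "glyphs" and the `best_match` helper are made up for illustration, and dividing by the value range is only one of several NRMSE normalizations in use.

```python
import numpy as np


def mse(a, b):
    """Mean squared error: average of the squared per-pixel differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def nrmse(reference, other):
    """Root of the MSE divided by the value range of the reference image.

    Assumption: range normalization; mean or norm based variants also exist.
    """
    reference = np.asarray(reference, dtype=float)
    value_range = reference.max() - reference.min()
    return float(np.sqrt(mse(reference, other)) / value_range)


def best_match(unknown, candidates):
    """Return the label of the candidate image with the lowest MSE."""
    return min(candidates, key=lambda label: mse(unknown, candidates[label]))


# Hypothetical 3x3 glyphs: 1 = ink, 0 = background.
glyph_e = [[1, 1, 1], [1, 1, 0], [1, 1, 1]]
glyph_l = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
unknown = [[1, 1, 1], [1, 0, 0], [1, 1, 1]]

match = best_match(unknown, {"E": glyph_e, "L": glyph_l})
print(match)
```

Lower is better for both measures: 0 means the images are identical, and MSE has no fixed upper bound, which is why a normalized variant can be easier to compare across image sets.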

#

what do you wanna know about how they work

zealous ermine Jul 29, 2018, 1:31 AM

#

Wait i’ll dm u this tomorrow so we’re not taking up the channel

#

Other ppl u can use it now

serene oar Jul 29, 2018, 7:11 PM

#

Hello.
I am using pandas to read a csv file and I have assigned the "Date" column as the index.
The data is about the volume and price of a stock. Is it possible to somehow assign weekdays to the dates and from there bring out on what days the biggest volume of shares on average was sold or the average of every weekday in general for comparison?

#

Would making nested lists of 5 indices per list and comparing the values at each position be a good option?
Or would that be too inefficient?

serene oar Jul 29, 2018, 7:42 PM

#

Found properties: got fix. Sry for noise
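The fix was not posted, but one plausible version looks like the sketch below, assuming the index is a `DatetimeIndex`. The sample rows and the `Volume`/`Close` column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical data standing in for the stock CSV.
csv_text = """Date,Volume,Close
2018-07-23,100,10.0
2018-07-24,200,10.5
2018-07-25,300,10.2
2018-07-30,500,10.8
2018-07-31,600,11.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# day_name() labels every row with "Monday", "Tuesday", ...
weekday = df.index.day_name()

# Average volume per weekday, then the weekday with the highest average.
avg_volume = df.groupby(weekday)["Volume"].mean()
busiest_day = avg_volume.idxmax()

print(avg_volume)
print(busiest_day)
```

Grouping on the weekday label avoids building nested lists by hand, and the same `groupby` works for any other column.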

analog rampart Jul 30, 2018, 8:24 AM

#

how many lectures are there in Andrew Ng's ML course

prime thistle Jul 30, 2018, 6:55 PM

#

does anyone know of a way to check image similarity in python?

velvet anchor Jul 30, 2018, 6:58 PM

#

I thought OpenCV had a method for it but I don't think it does upon researching

#

here's a way you can use MSE to compare @prime thistle https://www.pyimagesearch.com/2014/09/15/python-compare-two-images/

#

@zealous ermine this is also relevant to you I think

prime thistle Jul 30, 2018, 6:59 PM

#

nice find

velvet anchor Jul 30, 2018, 7:00 PM

#

Also here, same technique explained differently http://scikit-image.org/docs/dev/auto_examples/transform/plot_ssim.html

prime thistle Jul 30, 2018, 7:01 PM

#

its way faster than i expected

velvet anchor Jul 30, 2018, 7:02 PM

#

Yeah iterating over single images is pretty quick

#

especially if they're small

prime thistle Jul 30, 2018, 7:03 PM

#

i have nearly 2k

#

and i would like to be able to 'plot' them in some way or another

#

to see which images are close to each other and which are not

#

possibly even cluster them

velvet anchor Jul 30, 2018, 7:05 PM

#

That seems reasonable but you'll definitely want to cache the values in some way

prime thistle Jul 30, 2018, 7:06 PM

#

yeah sounds reasonable

#

ty for pointing me in a direction 😃

velvet anchor Jul 30, 2018, 7:07 PM

#

Because 2k images compared to 2k images is uh, quite a large number

desert cradle Jul 30, 2018, 7:14 PM

#

4 million pairs

#

wait, no, only 1.999 million

#

"only"

velvet anchor Jul 30, 2018, 7:24 PM

#

Yeah

#

and trying to store it all in ram 👌

#

Well I guess that’s only 32mb. Seemed much bigger in my head
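The numbers in this exchange can be reproduced with a few lines. This is a sketch that assumes one 8-byte float per similarity score.

```python
from itertools import combinations
from math import comb

n_images = 2000

# Every image against every image, counting A-B and B-A separately.
ordered_pairs = n_images * n_images

# Each unordered pair once, no self-comparisons.
unique_pairs = comb(n_images, 2)

# Assumption: scores are stored as float64 (8 bytes each).
bytes_per_score = 8
full_matrix_mb = ordered_pairs * bytes_per_score / 1e6

# combinations() yields each unordered pair exactly once.
pairs_small = list(combinations(range(4), 2))

print(ordered_pairs, unique_pairs, full_matrix_mb)
```

Iterating with `combinations` and filling both halves of a symmetric matrix roughly halves the number of comparisons that have to be computed.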

prime thistle Jul 30, 2018, 7:31 PM

#

yeah its a lot

#

i was hoping for some kind of canvas or plot to plot them on

velvet anchor Jul 30, 2018, 7:32 PM

#

Matplotlib will handle the drawing for you.

#

Or rather can handle the amount of points

#

You’ll just have to decide how you want it graphed

#

You could also use Jupyter and graph it inside the canvas

prime thistle Jul 30, 2018, 7:44 PM

#

well i think the literal drawing would be fine

#

but im looking for measures of similarity

#

to place them on axes
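One common option, not mentioned in the chat, is classical multidimensional scaling (MDS): it turns a matrix of pairwise distances into 2-D coordinates whose distances approximate the originals. The sketch below is a NumPy-only version under that assumption; the function name and the three-point example are made up.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to get a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical distances between three images (a 3-4-5 triangle).
dist = np.array([
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
])
coords = classical_mds(dist)
print(coords)
```

The resulting `coords` array has one (x, y) row per image and can be passed straight to a scatter plot or a clustering routine. With an MSE-based score the input would be the cached pairwise scores, though MSE is a squared quantity, so taking its square root first may give a more distance-like input.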

velvet anchor Jul 30, 2018, 7:50 PM

#

Yeah I just meant like matplotlib would be a good library to handle plotting.

prime thistle Jul 30, 2018, 7:50 PM

#

funny how its still the go to lib

velvet anchor Jul 30, 2018, 7:52 PM

#

Yeah it’s gonna be a long while before it’s overtaken

#

Because it’s just so good

prime thistle Jul 30, 2018, 8:01 PM

#

its kinda sweet

zealous ermine Jul 30, 2018, 8:51 PM

#

@velvet anchor i’ve seen that - that’s why I originally went with SSIM

#

but MSE turned out to work a lot better

velvet anchor Jul 30, 2018, 8:52 PM

#

👌

zealous ermine Jul 30, 2018, 8:53 PM

#

@prime thistle get like a 96GB ram server from digital ocean for a couple of hours (write and test the code beforehand) and then just use that so u can cache all the images at once

#

(Idk how big your images are, u might need less)

velvet anchor Jul 30, 2018, 8:54 PM

#

At 1080p quality to hold 2,001 in memory (all 2000 and the comparing image) its 16gb of ram

#

So to hold all of them it'd be like 40gb probably to hold everything

zealous ermine Jul 30, 2018, 8:55 PM

#

That’s it? Damn thought it would be more

#

Wait then why does he need 40 if it’s 16?

velvet anchor Jul 30, 2018, 8:55 PM

#

... nvm i'm dumb I doubled it

zealous ermine Jul 30, 2018, 8:55 PM

#

😂😂

velvet anchor Jul 30, 2018, 8:55 PM

#

because, mentally, I was like

#

you need 2000 images and then 2000 to compare them all

#

disregarding the fact its the same 2000

zealous ermine Jul 30, 2018, 8:56 PM

#

That would still only be 32 not sure where u got the extra 8 from

velvet anchor Jul 30, 2018, 8:56 PM

#

Just from like holding the numbers, the matplot, OS overhead

zealous ermine Jul 30, 2018, 8:56 PM

#

8 GB for that?!!

velvet anchor Jul 30, 2018, 8:56 PM

#

Seems reasonable to ensure you dont run oom

zealous ermine Jul 30, 2018, 8:57 PM

#

The plot is xy, so 2 numbers per photo - how much storage can 4000 numbers possibly take 😂

velvet anchor Jul 30, 2018, 8:57 PM

#

Yeah I mean its overkill but #napkinmath
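The napkin math can be written down as a small helper. This sketch assumes uncompressed 8-bit pixels; the 16 GB figure matches 4 bytes per pixel (RGBA), which is a guess about what was assumed in the estimate above.

```python
def images_memory_gb(n_images, width, height, bytes_per_pixel):
    """Memory needed to hold n uncompressed images, in gigabytes (1e9 bytes)."""
    return n_images * width * height * bytes_per_pixel / 1e9


# 2000 images plus the one being compared, at 1080p.
rgba_gb = images_memory_gb(2001, 1920, 1080, bytes_per_pixel=4)
rgb_gb = images_memory_gb(2001, 1920, 1080, bytes_per_pixel=3)
gray_gb = images_memory_gb(2001, 1920, 1080, bytes_per_pixel=1)

print(round(rgba_gb, 1), round(rgb_gb, 1), round(gray_gb, 1))
```

Converting to grayscale or downscaling before caching shrinks the footprint considerably, so a large server may not be needed at all.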

lapis sequoia Jul 30, 2018, 9:59 PM

#

What can be done with a simple basic neural net? (example: in:[1, 0, 1] out:[0,0,1] )

velvet anchor Jul 30, 2018, 10:08 PM

#

Well some of the very basic ones are used in tutorials like determining if the numbers passed are representative of a flower species for instance. It’s a little more complex than that, but not by a whole lot

prime thistle Jul 31, 2018, 7:26 AM

#

ill try to build it and ill tell you haha

dense siren Aug 1, 2018, 5:34 AM

#

Hi guys. I posted this question in a help server, but since it is computational astro, I decided to reach out here as people in this thread may have more specific knowledge for my problem. 😃

#

My question is about how I can solve a coupled system of ODE's, and print out the variables in a plot.

#

📎 unknown.png

#

https://paste.pythondiscord.com/ijehedevug.py Note, the q here is replaced by F variable

placid snow Aug 1, 2018, 5:45 AM

#

Is M supposed to be raised to the power of 2 in (48/(5*math.pi*M**2))?

dense siren Aug 1, 2018, 5:47 AM

#

Oops yes, LaTex error.

placid snow Aug 1, 2018, 6:00 AM

#

And where is x defined in `F = x[:,0]` and `e = x[:,1]`?

dense siren Aug 1, 2018, 6:06 AM

#

that was meant to be y (fixed): https://paste.pythondiscord.com/hibasiluxa.py

chrome seal Aug 1, 2018, 7:42 AM

#

has anyone taken a look at the most recent kaggle challenge? thoughts?
https://www.kaggle.com/c/airbus-ship-detection

Airbus Ship Detection Challenge

Find ships on satellite images as quickly as possible

velvet anchor Aug 1, 2018, 7:47 AM

#

That’s cool

serene oar Aug 1, 2018, 1:22 PM

#

Hello.
I got a feeling I shouldn't be plotting like this.. confirm?

df = pd.read_csv(str, parse_dates=True)
df.plot(subplots=True, figsize=(6, 6), sharex=True)
plt.show()

I want to assign the X axis to be the date instead of the random index that's in the file.
How can I do that?

#

This is the current outcome.
The Y axis sets itself automatically to what I want it to be. Although I'd like it to show the actual (huge) numbers other than making it smaller by removing zeros.

📎 unknown.png

#

Main problem is - how can I assign the Date column to be the shared X axis?

#

Ohhh, that's a Pandas problem. No wonder..
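A sketch of the pandas side of this, assuming the column is literally named `Date`: passing `index_col="Date"` together with `parse_dates=True` makes the dates the index, and `DataFrame.plot` then uses the index as the shared X axis. The sample data and the helper names `load_prices` and `plot_prices` are hypothetical.

```python
import io

import pandas as pd


def load_prices(csv_source):
    """Read the CSV with the Date column parsed and used as the index."""
    return pd.read_csv(csv_source, index_col="Date", parse_dates=True)


def plot_prices(df):
    """One subplot per column, sharing the date axis."""
    import matplotlib.pyplot as plt  # imported lazily so loading works without it

    axes = df.plot(subplots=True, figsize=(6, 6), sharex=True)
    for ax in axes:
        # Show full numbers on the Y axis instead of scientific notation.
        ax.ticklabel_format(style="plain", axis="y")
    plt.show()


# Hypothetical data standing in for the real file.
sample_csv = """Date,Close,Volume
2018-07-30,10.8,5000000
2018-07-31,11.0,4200000
2018-08-01,10.9,6100000
"""
df = load_prices(io.StringIO(sample_csv))
print(df.index)
# plot_prices(df)  # uncomment to draw the figure
```

Because `subplots=True` gives each column its own Y axis, price and volume can sit on very different scales without one flattening the other.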

prime thistle Aug 1, 2018, 1:56 PM

#

are the axes even different right now?

brazen spade Aug 1, 2018, 1:58 PM

#

Should i normalize the dataframe columns prior to creating a heatmap from a dataframe with 10 or so columns. Do you really only want columns with variable data, removing the categorical data?

prime thistle Aug 1, 2018, 2:15 PM

#

whats your definition of normalising

serene oar Aug 1, 2018, 2:38 PM

#

The X axis is shared, Y is different.
Working with stock price vs volume here. Volume is on a waaayy higher scale.

dreamy tartan Aug 2, 2018, 10:59 AM

#

Hi everyone,

I'm stuck on something and couldn't figure it out. I have train and test datasets of customer info. There are some differences between them: the numeric values, the lifetime, and the customer choices [choices are binary and are my target].

I trained my model and predicted on the test data. The results were fine: accuracy is 86%, and the accuracy for predicted 0 and predicted 1 is also pretty good. Up to here everything looks fine. The problem came up when I wanted to predict filtered data from the train data. I applied a special filter: rows whose output was 0 in the train data and became 1 in the test data. When I predict on those, the accuracy is not even close to 86%. It's less than 20%.

I couldn't understand why the accuracy is so bad. Any ideas?

prime thistle Aug 2, 2018, 6:30 PM

#

so you had a train and test set

#

which were unfiltered

#

and real data which is filtered

#

your question is a bit difficult for me to understand

velvet anchor Aug 2, 2018, 6:32 PM

#

So @dreamy tartan to specify a little more

#

Your accuracy, like the number in training is good, and then when you pass in data to the model using your predict function, that data is also good?

dense oak Aug 2, 2018, 10:36 PM

#

Hey all simple question, just learning how to use sklearn.metrics.accuracy_score . Is a higher score better, or a lower score better? ex. 0.734 vs 0.523

dusky agate Aug 3, 2018, 12:25 AM

#

i guess higher is better, do you have a link to the docs?

velvet anchor Aug 3, 2018, 1:55 AM

#

Higher is better in p much every other ML framework
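For reference, `sklearn.metrics.accuracy_score` returns the fraction of correctly classified samples by default, so it ranges from 0 to 1 and higher is better. A plain-Python sketch of the same calculation, with made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
good_pred = [1, 0, 1, 1, 0, 1, 0, 1]  # one mistake
poor_pred = [0, 0, 1, 0, 1, 1, 1, 0]  # four mistakes

good_score = accuracy(y_true, good_pred)
poor_score = accuracy(y_true, poor_pred)
print(good_score, poor_score)
```

The direction is reversed for loss or error metrics such as MSE, where lower is better.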

lapis sequoia Aug 3, 2018, 4:01 AM

#

Hello

dreamy tartan Aug 3, 2018, 7:13 AM

#

@velvet anchor Yes when i predict my all test data and compare result with test_output. Accuracy is also good.

prime thistle Aug 3, 2018, 8:55 AM

#

But is that filtered like the filtered data you mentioned later in the question?

heady sail Aug 4, 2018, 10:56 PM

#

I just wrote my own Binary Search Tree. Could someone tell me if I did anything incorrectly/inefficiently?
```python
class Node:
    def __init__(self, val):
        self.val = val
        self.leftchild = None
        self.rightchild = None


class bst:
    def __init__(self, inp):
        self.inp = inp
        self.root = Node(inp)

    def insert(self, num, nde=None):
        '''Inserts a new item into the tree
        Num: is the input of the new number or it can be a list
        nde: is the node we are checking(root by default)'''
        if nde is None:
            nde = self.root
        # check if num is an int
        if type(num) == int:
            # compares num with the current nde and makes sure that the leftchild is empty
            if num < nde.val and nde.leftchild is None:
                n = Node(num)
                nde.leftchild = n
            # compares num with the current nde and makes sure that the rightchild is empty
            elif num > nde.val and nde.rightchild is None:
                n = Node(num)
                nde.rightchild = n
            elif num < nde.val:
                self.insert(num, nde.leftchild)
            elif num > nde.val:
                self.insert(num, nde.rightchild)

    def search(self, inp, nde=None):
        if nde is None:
            nde = self.root
        # compare values with ==, since `is` checks object identity
        if inp == nde.val:
            return inp
        if inp is False:
            return None
        elif inp < nde.val and nde.leftchild is not None:
            return self.search(inp, nde.leftchild)
        elif inp > nde.val and nde.rightchild is not None:
            return self.search(inp, nde.rightchild)
```

lone mist Aug 4, 2018, 11:03 PM

#

first thing you could improve is making the bst not rely on having an initial/root node passed in

#

even if you don't, this line `self.inp = inp` is useless

#

furthermore, you could nest some if statements to save on comparisons

#

for example, you have `if num < nde.val and nde.leftchild is None:`

#

but then you later do just `elif num < nde.val:`

#

you could have just `if num < nde.val` and add a nested check for `nde.leftchild is None`
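A sketch of how these suggestions might look when applied: the tree starts empty, each comparison against the node value happens once, and the child check is nested inside it. The iterative loop and the names `BST`, `left` and `right` are choices made for this example, not part of the original code.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


class BST:
    def __init__(self):
        # No initial value required: the tree starts empty.
        self.root = None

    def insert(self, num):
        if self.root is None:
            self.root = Node(num)
            return
        node = self.root
        while True:
            if num < node.val:
                # Nested check: only look at the child once we know the side.
                if node.left is None:
                    node.left = Node(num)
                    return
                node = node.left
            elif num > node.val:
                if node.right is None:
                    node.right = Node(num)
                    return
                node = node.right
            else:
                return  # duplicates are ignored, as in the original

    def search(self, num):
        node = self.root
        while node is not None:
            if num == node.val:  # value comparison, not identity
                return node.val
            node = node.left if num < node.val else node.right
        return None


tree = BST()
for value in [8, 3, 10, 1, 6, 14]:
    tree.insert(value)
print(tree.search(6), tree.search(7))
```

Looping instead of recursing also sidesteps Python's recursion limit on very unbalanced trees, though recursion works just as well for small inputs.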

velvet anchor Aug 4, 2018, 11:07 PM

#

I did a bst for a school project not long ago

lone mist Aug 4, 2018, 11:07 PM

#

fun stuff

velvet anchor Aug 4, 2018, 11:07 PM

#

https://github.com/claythearc/CS-317-Proj-2/blob/master/MyBST.py

#

Here. It’s pretty messy though

lone mist Aug 4, 2018, 11:08 PM

#

@heady sail should mention you so you dont forget 😃

velvet anchor Aug 4, 2018, 11:08 PM

#

Not solely a bst though. Was half bst and then a linked list that ran along the tree too

lone mist Aug 4, 2018, 11:09 PM

#

ive done them in cpp and typescript

#

never in python surprisingly

heady sail Aug 4, 2018, 11:12 PM

#

Thanks for the resources and comments 😄

velvet anchor Aug 4, 2018, 11:26 PM

#

No problem. I’d clean it up, but it’s turned in already so there’s no incentive. 😂

stone oasis Aug 5, 2018, 1:30 AM

#

@velvet anchor just rebuild the tree brah, no problems here. 😂

velvet anchor Aug 5, 2018, 1:30 AM

#

👌🏾

stone oasis Aug 5, 2018, 1:30 AM

#

👌🏿 fuck the big O

velvet anchor Aug 5, 2018, 1:30 AM

#

its binary search that means its n log n by default

stone oasis Aug 5, 2018, 1:31 AM

#

🤔 yeah, works out

#

now find the least common ancestor between two nodes

#

🤠

velvet anchor Aug 5, 2018, 2:05 AM

#

k so proof by induction

#

assume a tree has 1 node

#

the lca is the root

#

qed
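Joking aside, the lowest common ancestor (the usual name for it) in a binary search tree can be found by walking down from the root until the two values fall on different sides. A minimal sketch, assuming both values are present in the tree; `Node`, `insert` and the sample values are made up for the example.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    """Insert val into the BST rooted at root and return the root."""
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root


def lowest_common_ancestor(root, a, b):
    """Value of the deepest node that has both a and b in its subtree."""
    low, high = min(a, b), max(a, b)
    node = root
    while node is not None:
        if high < node.val:
            node = node.left       # both values are in the left subtree
        elif low > node.val:
            node = node.right      # both values are in the right subtree
        else:
            return node.val        # the values split here (or one equals node)
    return None


root = None
for value in [8, 3, 10, 1, 6, 4, 7, 14]:
    root = insert(root, value)

print(lowest_common_ancestor(root, 4, 7))
```

The walk visits at most one node per level, so it takes time proportional to the height of the tree.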

scarlet mist Aug 6, 2018, 4:10 AM

#

Sooooooooooooo... Is anyone familiar with this bad boy: https://github.com/watson-developer-cloud/python-sdk/blob/develop/README.md

GitHub

watson-developer-cloud/python-sdk

python-sdk - :snake: Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud

velvet anchor Aug 6, 2018, 4:13 AM

#

most of the people who will prolly wont be around until tomorrow id wager but it at least wont get buried here

scarlet mist Aug 6, 2018, 5:40 AM

#

Much appreciated.

scarlet mist Aug 6, 2018, 6:59 PM

#

Okay, so right now, I'm just trying to figure out authentication. Here's my code:

#

from watson_developer_cloud import ToneAnalyzerV3 as tav3

tone = tav3(version = '2017-09-21',
username = 'username',
password = 'password')

tone = tav3(version = '2017-09-21')
tone.set_username_and_password('username',
'password')

#

and here's my console output:

#

runfile('C:/Users/kjohn/OneDrive/Desktop/Watson.py', wdir='C:/Users/kjohn/OneDrive/Desktop')
Traceback (most recent call last):

File "<ipython-input-7-0c81ff784954>", line 1, in <module>
runfile('C:/Users/kjohn/OneDrive/Desktop/Watson.py', wdir='C:/Users/kjohn/OneDrive/Desktop')

File "C:\Users\kjohn\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "C:\Users\kjohn\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/kjohn/OneDrive/Desktop/Watson.py", line 17, in <module>
tone = tav3(version = '2017-09-21')

File "C:\Users\kjohn\Anaconda3\lib\site-packages\watson_developer_cloud\tone_analyzer_v3.py", line 105, in __init__
use_vcap_services=True)

File "C:\Users\kjohn\Anaconda3\lib\site-packages\watson_developer_cloud\watson_service.py", line 272, in __init__
'You must specify your IAM api key or username and password service '
'You must specify your IAM api key or username and password service '

ValueError: You must specify your IAM api key or username and password service credentials (Note: these are different from your Bluemix id)

velvet anchor Aug 6, 2018, 7:51 PM

#

Are your user and password correct?

scarlet mist Aug 6, 2018, 8:05 PM

#

I copied them directly from the ones listed in my service instance on Bluemix.

#

And I'm following the documentation to a T, excepting the service being called.

velvet anchor Aug 6, 2018, 8:09 PM

#

Try your other user and password combination just to be sure

#

That way it’s 100% an error with something you’ve done and not just a dumb mistake we all make

scarlet mist Aug 6, 2018, 8:19 PM

#

Eeeeeeh... Also tried the service in the documentation examples with different credentials (paired to that specific instance of that service), and the same error got thrown in my face.

velvet anchor Aug 6, 2018, 8:20 PM

#

Hmm. Okay I’ll be at work and can take a closer look in a few. Just doing the small basic stuff while en route :p

scarlet mist Aug 6, 2018, 8:22 PM

#

Well dang, just look at you multitasking.

#

Debugging and driving at the same time?

#

I take it you've figured out how to import multiprocessing to IRL?

velvet anchor Aug 6, 2018, 8:25 PM

#

Not driving. Just in line for food and walking dog and stuff :p

scarlet mist Aug 6, 2018, 8:39 PM

#

Ah, gotcha!

velvet anchor Aug 6, 2018, 8:54 PM

#

@scarlet mist making an account here to toy around with it

scarlet mist Aug 6, 2018, 9:06 PM

#

Sweet, thanks for lending a hand.

velvet anchor Aug 6, 2018, 9:09 PM

#

which service did you make?

#

also @scarlet mist i get no errors with my py file being

#

from watson_developer_cloud import ToneAnalyzerV3 as tav3

tone = tav3(version = '2017-09-21', username = "username_here", password = "password here")

scarlet mist Aug 6, 2018, 9:19 PM

#

Worked for me as well, after commenting out the last couple of lines. But I was under the impression that I need them after instantiating tone.

#

Or am I misunderstanding their usage?

#

Also, I have both the discovery and tone analyzer services running.

#

@velvet anchor

velvet anchor Aug 6, 2018, 9:22 PM

#

i dont think you need both lines

#

i think its just showing 2 different ways to initialize the connection

scarlet mist Aug 6, 2018, 9:23 PM

#

Oh! I see, it's showing how to alter parameters after tone is instantiated.

#

That makes a good bit more sense.

#

Thanks for the extra set of eyes!

thorn river Aug 7, 2018, 8:40 AM

#

How could I use sets to calculate (for example) true positives, without creating a confusion matrix first?

Given I have two lists, y_true, y_pred. For example:

```
y_pred = [0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 1, 2, 0, 0, 0, 0, 1, 1]
```


Where 0 = no spam, 1 = spam, 2 = phishing

i created the first sets like this: 

```
really_spam = [i for (i,v) in enumerate(y_true) if v == 1]
spam_set = set(really_spam)

pred_spam = [i for (i,v) in enumerate(y_pred) if v == 1]

pred_spam_set = set(pred_spam)
```


How would I go about calculating true positives for spam using the two sets? 

Any help is much appreciated

#

Question has been answered in #help-coconut
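The answer from the other channel is not reproduced here, but one way to do it is with set intersection and difference on the index sets. In this sketch `y_pred` is the list from the question, while `y_true` is a made-up stand-in because the original list did not survive in the log.

```python
# Hypothetical ground truth (the original y_true is missing from the log).
y_true = [0, 0, 0, 0, 2, 0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0]
# Predictions as posted in the question.
y_pred = [0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 1, 2, 0, 0, 0, 0, 1, 1]

SPAM = 1
spam_set = {i for i, v in enumerate(y_true) if v == SPAM}
pred_spam_set = {i for i, v in enumerate(y_pred) if v == SPAM}

# True positives: indices that are spam and were predicted as spam.
true_positives = spam_set & pred_spam_set
# False positives: predicted spam, but not actually spam.
false_positives = pred_spam_set - spam_set
# False negatives: actual spam that was missed.
false_negatives = spam_set - pred_spam_set

print(len(true_positives), len(false_positives), len(false_negatives))
```

Repeating the same three lines with `SPAM` replaced by the phishing label gives the per-class counts without building a full confusion matrix.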

glad pivot Aug 7, 2018, 6:03 PM

#

I've got a question for anybody that's here

#

I am working through the Data Science Nanodegree through Udacity and am having a bit of trouble understanding some gradient descent concepts

#

I am working through an assignment where I am implementing and training a neural network, and I am tasked to write some of the helper functions

#

So here are some functions that were given to me:

#

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_prime(x):
    return sigmoid(x) * (1-sigmoid(x))
def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)

#

And I was tasked to write the error_term_formula, which takes y and output as arguments

#

So this is the error term formula: −(y−ŷ )σ′(x)

#

And this is Udacity's "solution" to writing the error_term_formula:

#

def error_term_formula(y, output):
    return (y-output) * output * (1 - output)

#

Can anyone help me understand why it is being written like this and why they are not utilizing the sigmoid_prime() function?

feral lodge Aug 7, 2018, 8:44 PM

#

Yo man, say the neuron's error is the squared error
E = -(y - ŷ)² and we want to compute the derivative of the error with respect to the neuron's input z. That means we want to find out in what manner the error E changes if we "wiggle" z by just a little bit -- if we know that, we can change z in such a way as to lower E. The small changes of E as a result of small changes in z are described by the derivative of E w.r.t. z:
∂E/∂z
But the error isn't a function of z, it's a function of ŷ! However, ŷ, the output, is a function of z, the input. So the situation is like this: E = f(ŷ(z)). That means that if we wiggle z a little bit, then ŷ will also wiggle a little bit as a result -- then the E will finally also wiggle a little bit. This propagation of wiggles is described by the chain rule:
∂E/∂z = ∂E/∂ŷ * ∂ŷ/∂z
Now, we know that
E = -(y - ŷ)² and ŷ = σ(z)
which means that
```
∂E/∂z = ∂E/∂ŷ            *    ∂ŷ/∂z
      = -2(y - ŷ) * (-1) *    ŷ * (1 - ŷ)
```

Where I put some extra spaces in the middle to show the difference between `∂E/∂ŷ` and `∂ŷ/∂z`.

Now, you'll notice that this formula equals `2(y - ŷ) * ŷ * (1 - ŷ)`, ie Udacity's formula multiplied by a factor `2`. The `2` comes from my using the squared error -- maybe Udacity defines the error to be `E = -(y - ŷ)` instead. @glad pivot

#

So there's no need to use sigmoid_prime and waste CPU cycles in this case; we would just get a value we already have
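The chain-rule result above can be checked numerically. This is a sketch using only the standard library: it compares finite-difference slopes with the analytic expressions, for arbitrary example values of `y` and `z`.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def numeric_slope(f, z, h=1e-6):
    """Central finite difference: how much f changes when z is wiggled."""
    return (f(z + h) - f(z - h)) / (2 * h)


# Arbitrary example values.
y, z = 1.0, 0.3
y_hat = sigmoid(z)

# The sigmoid's derivative equals y_hat * (1 - y_hat), so the already
# computed output can be reused instead of calling sigmoid_prime.
slope_sigmoid = numeric_slope(sigmoid, z)
analytic_sigmoid = y_hat * (1 - y_hat)

# E = -(y - y_hat)^2 with y_hat = sigmoid(z), as defined in the message above.
slope_error = numeric_slope(lambda t: -(y - sigmoid(t)) ** 2, z)
analytic_error = 2 * (y - y_hat) * y_hat * (1 - y_hat)

print(slope_sigmoid, analytic_sigmoid)
print(slope_error, analytic_error)
```

Both pairs agree to many decimal places, which is the point of the derivation: once the output is known, the derivative needs no further calls to the sigmoid.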

glad pivot Aug 7, 2018, 8:53 PM

#

@feral lodge Thank you so, so much! This helps immensely, none of the "mentors" at Udacity have responded to my question. This makes a lot more sense!

feral lodge Aug 7, 2018, 8:54 PM

#

Glad to help friendo! 👌

glad pivot Aug 7, 2018, 8:56 PM

#

I can show you the training function where it is being implemented, maybe that will help me understand if they are using the MSE or not

feral lodge Aug 7, 2018, 8:56 PM

#

Let's have a look

glad pivot Aug 7, 2018, 8:57 PM

#

It won't let me paste in a codeblock longer than 2,000 chars

feral lodge Aug 7, 2018, 8:57 PM

#

Try https://hastebin.com/

#

Can't read it if it's too long though, busy reading some papers 🤓

desert cradle Aug 7, 2018, 8:58 PM

#

https://paste.pythondiscord.com/

glad pivot Aug 7, 2018, 8:58 PM

#

It's only like 60 lines

#

https://hastebin.com/owumalecuq.py

desert cradle Aug 7, 2018, 8:59 PM

#

really almost any pastebin site's fine, but worth noting there is an official one

glad pivot Aug 7, 2018, 8:59 PM

#

Ok sounds good, I get the idea that you all really know what you're doing here in comparison to Udacity's Slack channel

#

I can provide the helper functions as well if needed

#

It says # Printing out the mean square error on the training set

#

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_prime(x):
    return sigmoid(x) * (1-sigmoid(x))
def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)
# TODO: Write the error term formula
def error_term_formula(y, output):
    return (y-output) * output * (1 - output)

#

So I suppose it is just squaring them at a later point

#

Also I just realized I already posted those functions earlier 😛

feral lodge Aug 7, 2018, 9:24 PM

#

Hmm, looks like there's an error in their code 🤔 The error_formula you're using there, is the log-loss formula, used when we have two possible outputs, 1 and 0.
The log-loss looks like this:

```
logloss(y, ŷ)
    if y == 1:
        return -log(ŷ)
    else if y == 0:
        return -log(1-ŷ)
    else:
        undefined
```
Which can equivalently be written like Udacity did:
```
    return - y*log(ŷ) - (1 - y) * log(1-ŷ)
```
However, the error derivative (that I showed earlier) was the result of differentiating the error `E = (y - ŷ)` They mention this in the code, line 27:
```python
           # The error, the target minus the network output
           error = error_formula(y, output)
```

#

I'm guessing they were intending to use the log-loss, but changed their minds and forgot to change the error computation
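The equivalence of the two log-loss forms is easy to confirm in runnable code. A small sketch with the standard library; the function names are invented for the example.

```python
import math


def logloss_piecewise(y, y_hat):
    """Log-loss written as a case distinction on the true label."""
    if y == 1:
        return -math.log(y_hat)
    if y == 0:
        return -math.log(1 - y_hat)
    raise ValueError("y must be 0 or 1")


def logloss_closed_form(y, y_hat):
    """Same loss in one expression: the label switches one term off."""
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)


for y in (0, 1):
    for y_hat in (0.1, 0.5, 0.9):
        print(y, y_hat, logloss_piecewise(y, y_hat), logloss_closed_form(y, y_hat))
```

Note that the loss is a non-negative real number, not a binary value: confident correct predictions give values near 0, and confident wrong ones grow without bound.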

glad pivot Aug 7, 2018, 9:28 PM

#

I understand all of that I think, but the log loss formula itself doesn't return a binary value, correct?

feral lodge Aug 7, 2018, 9:28 PM

#

Correct

glad pivot Aug 7, 2018, 9:29 PM

#

It simply omits one side of the equation given if it was correctly classified or not

feral lodge Aug 7, 2018, 9:29 PM

#

You can switch between log-loss, squared loss and whatever no problem. But you have to change the error differentiation formula accordingly

#

What is the optimum for one loss type may not be the optimum for another

#

So computing the log loss, but using the differentiation of another loss is not correct
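To make the mismatch concrete, the sketch below differentiates two candidate losses numerically with respect to the neuron's input `z`, assuming a sigmoid output. For log-loss the slope works out to `ŷ - y`; for a half squared error `0.5 * (y - ŷ)²` it works out to `-(y - ŷ) * ŷ * (1 - ŷ)`, which has the same form as the error term formula quoted earlier. The example values are arbitrary.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def log_loss(y, y_hat):
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)


def half_squared_error(y, y_hat):
    return 0.5 * (y - y_hat) ** 2


def numeric_slope(f, z, h=1e-6):
    """Central finite difference of f at z."""
    return (f(z + h) - f(z - h)) / (2 * h)


# Arbitrary example values.
y, z = 1.0, -0.4
y_hat = sigmoid(z)

grad_log_loss = numeric_slope(lambda t: log_loss(y, sigmoid(t)), z)
grad_half_sq = numeric_slope(lambda t: half_squared_error(y, sigmoid(t)), z)

print(grad_log_loss, y_hat - y)
print(grad_half_sq, -(y - y_hat) * y_hat * (1 - y_hat))
```

The two gradients point the same way but differ by the factor `ŷ * (1 - ŷ)`, so a reported loss and an update rule taken from different losses will not describe the same optimization.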

glad pivot Aug 7, 2018, 9:30 PM

#

Ah so they are using two different methods here

#

They are computing the error with one method, and then using the derived (gradient) of another formula?

feral lodge Aug 7, 2018, 9:31 PM

#

Yarp

#

Which may work fine if the optima of these two error types are close

#

But is almost certainly unintended

glad pivot Aug 7, 2018, 9:31 PM

#

It seems as if they are; I don't think they would have published it if they weren't

#

So in order for this to be 100% accurate, what would you change?

#

I apologise if I'm pulling you from your papers

feral lodge Aug 7, 2018, 9:33 PM

#

def error_formula(y, output):
    return y - output

I think 🤔

#

But not using the squared error is unusual, they should probably stick to that

glad pivot Aug 7, 2018, 9:33 PM

#

loss = np.mean((out - targets) ** 2)

#

Aren't they doing that here, though?

feral lodge Aug 7, 2018, 9:34 PM

#

So I would personally use

def error_formula(y, output):
    return (y-output)**2
def error_term_formula(y, output):
    return -2 * (y-output) * output * (1 - output)

glad pivot Aug 7, 2018, 9:34 PM

#

I'll run it as is, show the output, make your change, and show

feral lodge Aug 7, 2018, 9:35 PM

#

Regarding this:
```
loss = np.mean((out - targets) ** 2)
```

That's just for printing as far as I can tell

glad pivot Aug 7, 2018, 9:35 PM

#

📎 unknown.png

feral lodge Aug 7, 2018, 9:36 PM

#

So it's like this:

They wanted to use log loss, but changed their minds. They fixed the differentiation, but forgot to fix the actual error to the non-squared error.
For printing the progress however, they show the squared error

#

So there are three error functions floating about

glad pivot Aug 7, 2018, 9:36 PM

#

📎 unknown.png

#

With your changes 🤔

feral lodge Aug 7, 2018, 9:37 PM

#

Yes, we also have to pay attention to the gradient descent

glad pivot Aug 7, 2018, 9:37 PM

#

Oh, is it going the opposite way now?

#

📎 unknown.png

#

I flipped the += to be -=

feral lodge Aug 7, 2018, 9:38 PM

#

Correct

#

Writing an explanation

glad pivot Aug 7, 2018, 9:38 PM

#

No that makes sense! In case anyone is lurking

feral lodge Aug 7, 2018, 9:39 PM

#

Kk, sweet 😄

#

You'll observe that in my squared error function, i return Udacity's original error derivative multiplied by a factor of -2 yeah? Previously it was just 2, because I had, in my first explanation, defined the error to be E = -(y-ŷ)². But in my code you'll notice I opted for E = (y-ŷ)². This is the usual case; generally the error is positive, leading to a negated derivative

glad pivot Aug 7, 2018, 9:43 PM

#

Yes that makes more sense, they even went over that in their lectures

#

But I didn't know why it wasn't showing up in their code. I have a feeling that they adopted this code from some other lecture series, and didn't really give it that much editing

#

It would have to be squared--or else the error would be misrepresented, or close to zero even

#

but they are also taking 1/m * sum(MSE)

#

And they said it was that way for convention's sake

feral lodge Aug 7, 2018, 9:45 PM

#

This leads to the typical form of gradient descent -- using a minus sign in the update of the weights. You'll see on line 42 they update their weights like this:
weights += learnrate * del_w / n_records
This is unusual. Usually, we use -= in gradient descent; ie when we're minimizing. += is usually used when doing gradient ascent; ie when we're maximizing. The way they've done it is to define a negative error, where we usually use a positive error. This leads to their weight update adding instead of subtracting. You can read a bit about it here if you want https://medium.com/@aerinykim/why-do-we-subtract-the-slope-a-in-gradient-descent-73c7368644fa
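The sign convention can be seen on a toy problem. This sketch minimizes `f(w) = (w - 3)²` with plain gradient descent; the function and the learning rate are made up for illustration.

```python
def gradient(w):
    """Derivative of f(w) = (w - 3)**2."""
    return 2 * (w - 3)


def descend(w, learnrate=0.1, steps=200):
    """Usual form: subtract the gradient of a positive error."""
    for _ in range(steps):
        w -= learnrate * gradient(w)
    return w


def descend_with_plus(w, learnrate=0.1, steps=200):
    """Same update written with +=, using the negated gradient instead."""
    for _ in range(steps):
        w += learnrate * (-gradient(w))
    return w


def wrong_sign(w, learnrate=0.1, steps=50):
    """Adding the gradient of a positive error climbs away from the minimum."""
    for _ in range(steps):
        w += learnrate * gradient(w)
    return w


w_minus = descend(0.0)
w_plus = descend_with_plus(0.0)
w_wrong = wrong_sign(0.0)
print(w_minus, w_plus, w_wrong)
```

The first two variants land on the same minimum, which is why `+=` works when the error term already carries the minus sign, and why switching to a positive error without also flipping the update sends the weights the opposite way.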

#

Yes I think they've done some last-minute changes to their log loss coverage. I found this github ticket regarding their lectures https://github.com/udacity/sdc-issue-reports/issues/1331

glad pivot Aug 7, 2018, 9:47 PM

#

That was also confusing to me, but that also makes more sense. I had calculated GD in previous labs, and had always subtracted incrementally until reaching a minimum

#

That only further affirms my suspicions that it is third-party content adopted to their course

#

Well I guess going through this whole ordeal has helped me understand a little bit better, at least

#

I hope you're a teacher! (You mentioned papers...) Only because you're so great at explaining difficult concepts

feral lodge Aug 7, 2018, 9:51 PM

#

Thank you for the kind words my friend! 😄 I'm just a master's student currently, but I'll definitely have to do some teaching down the line as part of my doctorate

#

The papers are research papers 😃 Coincidentally, also about neural networks! https://arxiv.org/pdf/1703.01961.pdf

glad pivot Aug 7, 2018, 9:53 PM

#

Well if you ever need to practice teaching , I would be happy to be on the receiving end 😄

#

The end of this section of the course requires an Image Classifier programme, which I'm sure I will need some help with 😫

feral lodge Aug 7, 2018, 9:55 PM

#

Just ask away, there are a few of us here who've worked with image classifiers and the like 👌

glad pivot Aug 7, 2018, 9:55 PM

#

Oh man, I am not great with math notation. I could never do higher ed in DS

feral lodge Aug 7, 2018, 9:55 PM

#

Math was an acquired taste for me, don't sell yourself short!

glad pivot Aug 7, 2018, 9:55 PM

#

I haven't even taken a proper linear algebra course 🤣

feral lodge Aug 7, 2018, 9:56 PM

#

I took remedial maths as a kid 👀

glad pivot Aug 7, 2018, 9:56 PM

#

Just BC calculus in high school

#

Which then waived my requirements for maths in college, so my brain got really really rusty with derivatives and such

#

I majored in Finance, by the way

feral lodge Aug 7, 2018, 10:02 PM

#

Machine learning will be useful for you then!

#

Youtube channels like khan academy and 3blue1brown are golden for getting a better feel for math notation by the way! (khan has a great series on linear algebra; you couldn't ask for a better introduction to that subject imo) Usually the concepts are simple and intuitive enough, and the notation is just a convenient way to convey it

glad pivot Aug 7, 2018, 10:07 PM

#

You mean applying ML in Finance? Yes I agree, especially for trading and such. I have taken several "refresher" courses on LA in the past few weeks, and I understand all of the concepts just fine, I think. Determinants, Matmul, Linear transformations, etc., I just wouldn't be able to sit down and work out a problem by hand

lean ledge Aug 7, 2018, 11:01 PM

#

I think 3b1b 's essence of linear algebra is a gold mine and an absolute treasure. One of the best resources out there

glad pivot Aug 7, 2018, 11:03 PM

#

I have taken that! It is incredible

coarse perch Aug 8, 2018, 5:25 AM

#

Any mini projects you people can suggest?

dense ocean Aug 8, 2018, 6:43 AM

#

anyone dealing with dedupeio and having performance problems when you take a generator back from matchBlocks and try to convert it into a list() ?

#

it seems that dedupeio is efficient with cores (happily consumed 8 cores on this box), but when I get my clustered list back and try to do anything with it (ie. clustered_list = list(clustered_list)), python is restricting itself to a single core

serene oar Aug 8, 2018, 11:31 AM

#

Hi, how do you go about grabbing data from Yahoo finance nowadays? I used the fix_yahoo_finance import, but it returns some weird values in the Volume column

📎 unknown.png

proud pond Aug 8, 2018, 12:50 PM

#

does AI , ML stuff go under #data-science-and-ml ?

serene oar Aug 8, 2018, 12:52 PM

#

Yep

#

📎 unknown.png

glad pivot Aug 8, 2018, 1:12 PM

#

@serene oar i've been using source="morningstar" for my financial queries

#

When using pandas_datareader

#

And that seems to work just fine for me

serene oar Aug 8, 2018, 1:12 PM

#

👍

bronze coyote Aug 8, 2018, 7:31 PM

#

I know this isn't Python related , but does anyone here use R? Trying to do a conditional statement and hitting a wall. Finding it difficult to find resources.

scarlet mist Aug 8, 2018, 9:07 PM

#

Not strictly Python related, but I got my datasets from Python... Let's say I have a set of metadata from 16,000 tweets dating back to late 2010 (the end of history for tweets containing a given stock ticker), and a set of historical stock data dating back to 2006. If I want to potentially find a correlation between metrics relating to tweets and stock performance, should I include dates all the way back to 2006 in my regressions/correl-coeff analysis, or should I only include dates back to the end of tweet history?

#

Also, @glad pivot is the Morningstar API working again? It broke a few days ago and forced me to learn another api.

glad pivot Aug 8, 2018, 9:35 PM

#

I haven’t used it for a month or so, so maybe it is broken. I haven’t checked

scarlet mist Aug 8, 2018, 9:59 PM

#

Ah. I haven't checked it in a few days, but if it is still broken, the Quandl api is a nice substitute.

polar acorn Aug 9, 2018, 12:02 PM

#

@bronze coyote post code sample?

void anvil Aug 10, 2018, 2:12 AM

#

Any good packages for markowitz optimization? I think PYMCEF might be breaking

HL me if you know a different one

proud pond Aug 10, 2018, 11:20 AM

#

hey guys ,
can someone suggest to me a playlist on Youtube to watch for learning AI , i want to learn how to make AI that can play games , and i know almost nothing about AI so where should i start ?

earnest prawn Aug 10, 2018, 12:59 PM

#

certainly not at you tube

#

do some googling on python AI stuff

void anvil Aug 10, 2018, 5:11 PM

#

Are there any good packages for markowitz optimization / other similar stuff?

serene oar Aug 11, 2018, 12:47 PM

#

Hi!
If there are 2 datasets I want to combine, but the dates wont match, how can I go about this?
I currently took a real estate index, and converted it into an annual mean. The other data I have is the GDP, but it only has the year number as its index. When I added this data into one dataframe, it printed NaN for every GDP measure.

GDP imports as with this date format: 2000, 2001, 2002...
real estate index date is : 2015-02-01... etc
How can I have it to be a shared year number only?

#

Index is from Quandl, GDP from World Bank.

#

Can I at least add -12-31 to the end of the GDP date, as that's the date I get for each year when I take the annual mean?
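
One way to line the two up is to reduce both indexes to a plain year number before joining, rather than padding the GDP years with -12-31. A minimal sketch, assuming the real estate index has a DatetimeIndex and the GDP frame is indexed by year; the column names and all the numbers are made up for illustration:

```python
import pandas as pd

# Hypothetical real estate index with a DatetimeIndex (made-up values).
idx = pd.DataFrame(
    {"re_index": [100.0, 102.0, 110.0, 114.0]},
    index=pd.to_datetime(["2015-02-01", "2015-08-01", "2016-02-01", "2016-08-01"]),
)

# Hypothetical GDP table indexed by year number (made-up values).
gdp = pd.DataFrame({"gdp": [1.5, 1.7]}, index=[2015, 2016])

# Group by calendar year so the result is indexed by 2015, 2016, ...
# instead of year-end timestamps such as 2015-12-31.
annual = idx.groupby(idx.index.year).mean()

# Make sure the GDP index holds integers too (it may arrive as strings).
gdp.index = gdp.index.astype(int)

# Both frames now share the same kind of index, so the join lines up.
combined = annual.join(gdp)
print(combined)
```

If the year labels on one side are strings and integers on the other, the join silently finds no matches, which would be one explanation for a column full of NaN.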

ionic roost Aug 11, 2018, 8:19 PM

#

hello group

#

i am taking intro to data science this fall at a college level

#

i have been developing in python for seven months now but they use R in that class

#

does anybody have good free resources or online textbooks for introductions?

polar acorn Aug 11, 2018, 8:42 PM

#

Here's a free book on R for data science that I never read, but that someone once recommended me. http://r4ds.had.co.nz/

R for Data Science

This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how...

#

I might add it's written by Hadley Wickham, a name you are guaranteed to run into if you're using R for data science.

ionic roost Aug 11, 2018, 8:59 PM

#

My teacher just told me in email that's the book we are using! Interesting.

#

Thanks for help

vague dock Aug 12, 2018, 8:44 AM

#

Hi there! Not sure if this is the right place to ask, but does anyone know if matplotlib (imshow() specifically) only works in Jupyter notebook? Is there no default display function outside of the notebook?

import matplotlib.pyplot as plt
plt.imshow(some_image)

Thanks!

polar acorn Aug 12, 2018, 12:19 PM

#

Try adding plt.show()?

vague dock Aug 12, 2018, 12:21 PM

#

@pptt thanks so much, works great 👌

muted niche Aug 12, 2018, 1:49 PM

#

Does anyone happen to know off the top of their head what features would be good to extract from accelerometer data? I'm trying to classify movements

lapis plume Aug 12, 2018, 5:48 PM

#

Hi, is there any good resource for an intro to statistics with Python?

small ore Aug 12, 2018, 9:02 PM

#

@muted niche out of curiosity, how ( in what format) is the accelerometer data available to you?

muted niche Aug 13, 2018, 7:02 AM

#

@small ore It gets streamed from my phone as a string that is comma separated. I take what I need from the string and store it as a float in a numpy array

lilac shadow Aug 13, 2018, 7:04 AM

#

perhaps it would be a good idea to provide an example of how this accelerometer data is laid out in your array, but i'm not too sure. it would actually be interesting to see some visualisations of what this data looks like though. ^-^

small ore Aug 13, 2018, 7:09 AM

#

@muted niche is it just (ax, ay, az, plumb) or more data in the stream? Or in other words what data do you have in each packet of the stream?

muted niche Aug 13, 2018, 7:09 AM

#

📎 AccData.png

#

x, y,z, timestamp

small ore Aug 13, 2018, 7:10 AM

#

Cool

muted niche Aug 13, 2018, 7:10 AM

#

polling rate is around 16-17 Hz

#

at first I just chucked the raw data into an SVM model but that's not correct; I need to extract features from the data I receive and use those values in the SVM

small ore Aug 13, 2018, 7:13 AM

#

I can imagine a lot of different use cases from the accelerometer data. I suppose it will depend on your exact application how you process it

muted niche Aug 13, 2018, 7:14 AM

#

I'm trying to classify different activities, walking and falling for the most part

small ore Aug 13, 2018, 7:16 AM

#

Like if you are making an app to determine the velocity of the vehicle you are traveling you need to get the exact acceleration or in the direction of your travel and integrate it by some means.

#

I suggest making use of the plumb data in addition to your ax, ay, az as it seems to be logical to me that all the actions you mention are correlated to it

lilac shadow Aug 13, 2018, 7:18 AM

#

that's certainly an interesting idea. is this data transferred from your phone to the python application through sockets or something like that?

small ore Aug 13, 2018, 7:18 AM

#

Also why not just plain logistic regression?

lilac shadow Aug 13, 2018, 7:21 AM

#

oh that looks like it would be good, actually

small ore Aug 13, 2018, 7:21 AM

#

Juan, as far as I can imagine he just needs to somehow record raw data on his phone and transfer by any means to the system he is using to analyze them. ( Need not be real time). Once you get the prediction equation he may need to use the real time data if his use case is to predict an activity as soon as it happens

lilac shadow Aug 13, 2018, 7:22 AM

#

oh i see. if it's real-time however, then that would certainly be fancy :D

small ore Aug 13, 2018, 7:23 AM

#

The phone shouting out "I am falling, help me!" ? 😛

lilac shadow Aug 13, 2018, 7:23 AM

#

haha, there's an app for that™

muted niche Aug 13, 2018, 7:23 AM

#

it is in realtime

lilac shadow Aug 13, 2018, 7:23 AM

#

tell me more.

muted niche Aug 13, 2018, 7:24 AM

#

it is more to classify human movement

lilac shadow Aug 13, 2018, 7:24 AM

#

https://www.tenor.co/wQM2.gif

Tenor

the office

▶ Play video

muted niche Aug 13, 2018, 7:24 AM

#

what is plumb data?

#

im making a system that detects when a person falls and sends an alert

small ore Aug 13, 2018, 7:25 AM

#

Okay. Do you know how an accelerometer works?

muted niche Aug 13, 2018, 7:25 AM

#

it uses accelerometer data from a phone and images from a webcam

#

not entirely, i know how to get the xyz data

small ore Aug 13, 2018, 7:29 AM

#

While there are bound to be accelerometers working in different ways, an example would be an instrument that produces an electrical output corresponding to the finite difference of the movement. So it is not absolute acceleration that is available to you from the instrument but a series of values which is the difference from the previous value. It is relative data.

muted niche Aug 13, 2018, 7:31 AM

#

that sounds like it would explain why the first reading is always 0,0,0

small ore Aug 13, 2018, 7:33 AM

#

Suppose you have an application to locate the stars in the sky and are using the back camera of the phone to super-impose the image with a map of stars. You have your screen facing downwards. If you do not have a way to know which direction your phone is tilted and just know x, y and z accelerations, then it is impossible to know if your screen is facing downwards or upwards ( In general the orientation of the phone cannot be fully determined just with (ax, ay, az)

#

A plumb line in engineering practice ( Or say plumbing practice) is a device used to determine if a line is straight and oriented along the gravity of the earth. Phones have separate plumb sensors ( coz accelerometer data alone is insufficient to determine orientation) . You use this data to determine which direction the gravity of earth is pointing to

#

May go with different names than plumb. I am not sure

#

Do I make sense or did I confuse you?

muted niche Aug 13, 2018, 7:37 AM

#

no, that definitely makes sense.

#

I thought it would be possible to get orientation from accelerometer alone though

lilac shadow Aug 13, 2018, 7:37 AM

#

i was actually wondering how the device determines the orientation, that's certainly interesting.

muted niche Aug 13, 2018, 7:38 AM

#

whatever sensor reads -9.8 is facing up/down

vague dock Aug 13, 2018, 7:38 AM

#

I've been reading along and wondering: would it have merit to transform the data from relative to previous frame to relative to 0-frame (absolute-ish) for the problem of classifying movements?

small ore Aug 13, 2018, 7:39 AM

#

Accelerometers do not pick up earth's gravity as acceleration. But the (ax, ay, az) data you have got may already have been processed by your phone to add data from the plumb sensors. So it may already be adjusted for it

muted niche Aug 13, 2018, 7:39 AM

#

I was thinking I would need to perform some sort of statistical analysis on the data, standard deviation, fft the data and get energy of the signal etc

small ore Aug 13, 2018, 7:40 AM

#

Esp if you are getting a 9.8 when you hold it constant

muted niche Aug 13, 2018, 7:40 AM

#

Ah right, that would explain why the accelerometer reads -9.8 when aligned with the up/down axis

small ore Aug 13, 2018, 7:44 AM

#

@vague dock I am not sure what exactly you mean but I am thinking some processed data from the previous few seconds ( Example: Simple moving average) could perhaps be needed

vague dock Aug 13, 2018, 7:45 AM

#

well, for example instead of having [0, 1, 1, 2, 2, -1, -3, -5] you'd have [0, 1, 2, 4, 6, 5, 2, -3]

#

so instead of difference to previous frame you'd have the sum that represents difference with the first frame
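
The conversion described above is a running total. A small sketch using only the standard library, with the numbers from the message; `numpy.cumsum` does the same thing on an array:

```python
from itertools import accumulate

# Per-frame differences, taken from the example in the chat.
deltas = [0, 1, 1, 2, 2, -1, -3, -5]

# Running total: each entry is the difference relative to the first frame.
relative_to_start = list(accumulate(deltas))
print(relative_to_start)  # [0, 1, 2, 4, 6, 5, 2, -3]
```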

small ore Aug 13, 2018, 7:47 AM

#

But in a series of continuous data you will need a function that processes data over the previous few frames

#

Coz I think a 'fall' or some such action is a function over a few frames

vague dock Aug 13, 2018, 7:49 AM

#

yeah, but aren't all movements actions over multiple frames?

muted niche Aug 13, 2018, 7:49 AM

#

I'm keeping my buffer/window size to 61 data points

#

that is enough to capture the whole "fall" signal

#

So I need to do some signal processing on that window

small ore Aug 13, 2018, 7:50 AM

#

Oh. Also Shakis, I am thinking of a non ML approach. Do know that differential of displacement is velocity, differential of velocity is acceleration and differential of acceleration is jerk ( all over time).

muted niche Aug 13, 2018, 7:51 AM

#

I did not know that.

#

I am not even sure I know what differential is. Sounds a bit calculusy

#

(nor jerk)

small ore Aug 13, 2018, 7:52 AM

#

A fall tends to get finite values of jerk whereas the simple walking would have 0 or very small jerk

muted niche Aug 13, 2018, 7:52 AM

#

Alright, that sounds like a really good one to use.

small ore Aug 13, 2018, 7:52 AM

#

differential is certainly the calculus one. And one can numerically integrate/differentiate signals

vague dock Aug 13, 2018, 7:52 AM

#

Differential of a function f(x) is another function f'(x). f'(x) gives the rate of change/slope of f(x) at x

muted niche Aug 13, 2018, 7:53 AM

#

Okay, I understand that and I'm sure I remember how to do it somewhere deep in my memory lol

#

if not, google is my friend

small ore Aug 13, 2018, 7:54 AM

#

Just try to plot velocity/acceleration/jerk for your various activities ( A couple of examples each?) and you will perhaps hit upon an idea of filtering it

muted niche Aug 13, 2018, 7:55 AM

#

Okay, that sounds like a really good starting point tyvm

small ore Aug 13, 2018, 7:55 AM

#

A fall should ( I think) be marked with jerk spikes
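
For anyone wanting to try the jerk idea, a rough sketch of differentiating a window of readings numerically. The array layout (x, y, z columns plus a separate timestamp array) and every value here are made up; whether jerk actually separates falls from walking would need to be checked on real recordings:

```python
import numpy as np

# Hypothetical window: columns are x, y, z acceleration in m/s^2 (made-up values).
acc = np.array([
    [0.0, 0.0, 9.8],
    [0.1, 0.0, 9.7],
    [0.0, 0.1, 9.8],
    [0.2, 0.1, 2.0],   # sudden change, e.g. the start of a fall
    [0.1, 0.0, 0.3],
])

# Timestamps in seconds, assuming a steady 16 Hz polling rate.
t = np.arange(len(acc)) / 16.0

# Finite-difference derivative of acceleration with respect to time.
jerk = np.diff(acc, axis=0) / np.diff(t)[:, np.newaxis]

# Collapse the three axes into one magnitude per step.
jerk_mag = np.linalg.norm(jerk, axis=1)
print(jerk_mag.round(1))
print("largest jerk at step", int(jerk_mag.argmax()))
```

Summary values of `jerk_mag` over the window (max, mean, standard deviation) could then serve as features for a classifier instead of the raw samples.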

muted niche Aug 13, 2018, 7:55 AM

#

It is now 8.55am I gotta start work 😛

#

I will save this

vague dock Aug 13, 2018, 7:55 AM

#

have fun

small ore Aug 13, 2018, 7:55 AM

#

Yeah. That is indeed a fun idea

lilac shadow Aug 13, 2018, 7:56 AM

#

can confirm

vague dock Aug 13, 2018, 7:56 AM

#

also you could try to plot position by integrating, but without knowing initial position/velocity/acceleration could be inaccurate

small ore Aug 13, 2018, 7:56 AM

#

( I would if possible like to see your visualization plots if you can manage it)

muted niche Aug 13, 2018, 7:57 AM

#

I will post back once I have something to show.

#

oh one more question

#

differential of displacement: what is the displacement?

small ore Aug 13, 2018, 7:58 AM

#

Bitchmoon, you are technically correct but I am thinking if integrating without an initial condition( or an assumed initial condition) should still give out some graphs that could distinguish activities somehow. We do not need exact velocity values here

vague dock Aug 13, 2018, 7:58 AM

#

Not certain about the terminology, I'd guess position at time x + 1 - position at time x?

small ore Aug 13, 2018, 7:58 AM

#

Displacement in this context is how much your phone moved in a particular timeframe in a given direction

vague dock Aug 13, 2018, 7:58 AM

#

True, that's why it's just offhand suggestion, might give some insights

small ore Aug 13, 2018, 8:13 AM

#

Hm hm. Also try dropping your phone from 2 metres high upon a mattress or something and see if the 'free fall' case records '0' acceleration in the z direction for most of the fall. There may be a way to filter a deterministic data even if your phone is rotating while it falls. I will need to think hard on my basic mechanics

lean ledge Aug 13, 2018, 10:28 AM

#

Lol do any of you have a background in engineering or signal analysis

#

With an accelerometer if you're getting real-time data run it through a kalman filter to get accurate meaningful readings, use pythagoras to measure magnitude of acceleration. When it's falling, it will have an overall acceleration reading of almost 0. There are lots of ways to figure out when its not normal. Blindly measuring for acceleration until its under some magnitude should be enough given it's passed through a KF
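
A minimal sketch of the magnitude check described above. It works on raw readings and leaves out the Kalman filter step entirely; the 0.3 g threshold and the sample values are illustrative guesses, not tuned numbers:

```python
import math

G = 9.81  # standard gravity in m/s^2


def magnitude(x, y, z):
    """Overall acceleration from the three axis readings (Pythagoras in 3D)."""
    return math.sqrt(x * x + y * y + z * z)


def looks_like_free_fall(sample, threshold=0.3 * G):
    """Return True when the total acceleration is well below 1 g.

    The default threshold is an assumption made for this example.
    """
    return magnitude(*sample) < threshold


# Made-up readings: resting on a table, then dropping.
resting = (0.1, -0.2, 9.8)
falling = (0.3, 0.1, 0.5)

print(looks_like_free_fall(resting))  # False
print(looks_like_free_fall(falling))  # True
```

Using the magnitude rather than a single axis means the check does not depend on how the phone is oriented.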

#

@small ore @lilac shadow Almost all accelerometers you see will be MEMS. Pretty much all work on the same principle as a mass hanging from a spring

#

And I dont recommend using an accelerometer to predict position or velocity to any extent, the errors pile up very quickly so the drift is insane and unusable outside of small periods of time

lilac shadow Aug 13, 2018, 10:34 AM

#

@muted niche would be good to mention with this too.

lean ledge Aug 13, 2018, 10:36 AM

#

@small ore It's called a "derivative" not a "differential". A differential specifically is the infinitesimal quantity of something( "dx"). A derivative is the ratio of two differentials, with the overall quantity being the derivative of the top quantity with respect to the bottom quantity (dx/dt is the first derivative of x with respect to time)

muted niche Aug 13, 2018, 10:46 AM

#

@lean ledge thanks very much! I'll look into KF

teal veldt Aug 13, 2018, 1:08 PM

#

Hello folks, I'm currently reading and working through the Python Data Science Handbook, and there's a specific section that I'm having trouble understanding. If needed I can link the page, but the issue is that, after creating a 10 x 2 array to simulate points on a X,Y graph it calculates the distance between them by summing the square differences using the following code

dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)

#

Why is he putting a np.newaxis into the array?

#

Can't he just leave it as (X[: , :] - X[ :, :]) and numpy will just broadcast the subtraction to each point?

polar acorn Aug 13, 2018, 1:44 PM

#

Well (X[: , :] - X[ :, :]) would just give you an 10x2 array of zeros?

teal veldt Aug 13, 2018, 1:57 PM

#

Oh.

#

So, I kind of understand that since there's 10 points and we need to calculate the differences between each of them we need to add a dimension to the array

#

Since there will be a total of 100 measurements

#

But why was the np.newaxis put in two different places in the formula?

polar acorn Aug 13, 2018, 9:08 PM

#

First off I'm not quite sure what you mean by 100 measurements. You have two vectors with 10 elements, the distance between them is found by first taking the difference element by element, giving you 10 differences (x_1-y_1, x_2-y_2, ...., x_10-y_10), squaring and summing these and then taking the root of that sum.

You could have done this easier, you could have done np.sum((X[0 , :] - X[ 1 , :]) ** 2, axis=-1). The reason we aren't doing this is because this only works when we have two vectors. What if we have three? We could copy paste the previous line multiple times and change the indexes but that would be a hassle. Instead we use the first method because that gives us a distance matrix.

small ore Aug 13, 2018, 9:13 PM

#

Thank you @lean ledge I stand corrected

#

And no background on signal analysis

polar acorn Aug 13, 2018, 9:19 PM

#

Now feel free to correct if I'm far off but this is what I think the idea is. X[:, np.newaxis, :]changes your matrix from a 2x10 matrix to a 2x1x10 matrix. In the same way X[np.newaxis, :, :]changes a 2x10 into a 1x2x10.

Imagine these new matrices as follows, where v1 is your first vector X[0,:], v2 is the second vector X[1,:], and the z dimension is into the screen. Our two new matrices then look like this.
⎡v1⎤
⎣v2⎦, [v1, v2]
Subtracting these two from each other I suppose numpy broadcasts to the following
⎡v1-v1, v1-v2⎤
⎣v2-v1, v2-v2⎦
Where the subtraction is done element wise on all 10 elements in the z dimension. Now squaring all these element differences in the z dimension and adding them up gives you the distance between v1 and v1, v1 and v2, v2 and v1 and v2 and v2. This 2x2 matrix is what you get if you do this in python. Now this works for any number of vectors of course.
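
Applied to the 10 x 2 array from the original question (10 points, 2 coordinates each), the same broadcasting idea can be checked by printing the shapes. A short sketch with random points standing in for the book's data; the "100 measurements" are the 10 x 10 pairs of points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 2))          # 10 points, each with an x and a y coordinate

a = X[:, np.newaxis, :]          # shape (10, 1, 2)
b = X[np.newaxis, :, :]          # shape (1, 10, 2)

# Broadcasting stretches the length-1 axes, so diff[i, j] = X[i] - X[j].
diff = a - b                     # shape (10, 10, 2)

# Sum the squared coordinate differences over the last axis.
dist_sq = np.sum(diff ** 2, axis=-1)   # shape (10, 10)

print(a.shape, b.shape, diff.shape, dist_sq.shape)
# (10, 1, 2) (1, 10, 2) (10, 10, 2) (10, 10)
```

The new axis goes in two different places so that one copy of X varies along rows and the other along columns; with it in the same place both times, the subtraction would just pair each point with itself.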

#

@teal veldt Did that make sense?

small ore Aug 13, 2018, 9:56 PM

#

@muted niche Here is one site which may be useful: https://towardsdatascience.com/kalman-filter-an-algorithm-for-making-sense-from-the-insights-of-various-sensors-fused-together-ddf67597f35e
Has basic explanation of Kalman filter in text and math. Also has python implementation

Towards Data Science

Kalman Filter: An Algorithm for making sense from the insights of ...

An Algorithm that is an Astrologer for the Sensor fusion process.It can predict a estimate for you and would correct itself if its…

#

And it has got some amazing links in the "Useful Links" section too

muted niche Aug 14, 2018, 8:08 AM

#

ah, cheers! @small ore

tight dove Aug 14, 2018, 10:07 AM

#

Hi guys

#

I got a dictionary like this -

#


dict = {
  "Glo": [ "0705", "0807", "0805", "0811", "0815", "0905"],
  "Airtel": ["0701", "0708", "0802", "0808", "0812", "0902", "0907"],
  "MTN": ["0703", "0706", "0803", "0806", "0810", "0813", "0814", "0816", "0816"],
  "Etisalat": ["0809", "0817", "0818", "0908", "0909"],
  "Multilinks": ["07027", "0709"],
  "Visafone": ["07025", "07026", "0704"],
  "Starcomms": ["07028", "07029", "0819"],
  "Zoom": ["0707"],
  "Ntel": ["0804"],
  "Smile": ["0702"]
}

#

How do I print out the key given a specific value from the lists?

placid snow Aug 14, 2018, 10:12 AM

#

My first thought would be to iterate through key, value pairs and check if your desired_value is in the value part of your loop

tight dove Aug 14, 2018, 10:12 AM

#

with conventional dictionaries, I use pandas like this -

import pandas as pd
dict = {
  "Glo": [ "0705", "0807", "0805", "0811", "0815", "0905"],
  "Airtel": ["0701", "0708", "0802", "0808", "0812", "0902", "0907"],
  "MTN": ["0703", "0706", "0803", "0806", "0810", "0813", "0814", "0816", "0816"],
  "Etisalat": ["0809", "0817", "0818", "0908", "0909"],
  "Multilinks": ["07027", "0709"],
  "Visafone": ["07025", "07026", "0704"],
  "Starcomms": ["07028", "07029", "0819"],
  "Zoom": ["0707"],
  "Ntel": ["0804"],
  "Smile": ["0702"]
}
network = pd.Series(dict)
print (network[network.values == '0806'])

#

But I have a dictionary with list of values for each key

tight dove Aug 14, 2018, 10:47 AM

#

Hi @placid snow

#

So I did this

#

dictionary = {
  "Glo": [ "0705", "0807", "0805", "0811", "0815", "0905"],
  "Airtel": ["0701", "0708", "0802", "0808", "0812", "0902", "0907"],
  "MTN": ["0703", "0706", "0803", "0806", "0810", "0813", "0814", "0816", "0816"],
  "Etisalat": ["0809", "0817", "0818", "0908", "0909"],
  "Multilinks": ["07027", "0709"],
  "Visafone": ["07025", "07026", "0704"],
  "Starcomms": ["07028", "07029", "0819"],
  "Zoom": ["0707"],
  "Ntel": ["0804"],
  "Smile": ["0702"]
}

number = "0706"
for key, value in dictionary.items():
  s = set(value)
  if number in s:
    print(key)

placid snow Aug 14, 2018, 10:49 AM

#

Why convert the list to a set?

tight dove Aug 14, 2018, 10:49 AM

#

I read from a site it is more efficient

#

Best practice

#

https://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html

placid snow Aug 14, 2018, 10:49 AM

#

I mean, somewhat. But you're not working with big lists here?

tight dove Aug 14, 2018, 10:50 AM

#

Lol no, but in case what i'm working on scales

placid snow Aug 14, 2018, 10:50 AM

#

Alrighty

lean ledge Aug 14, 2018, 10:51 AM

#

@tight dove doing set() is even slower

tight dove Aug 14, 2018, 10:51 AM

#

Really

#

?

lean ledge Aug 14, 2018, 10:51 AM

#

It wouldn't be if it was already a set

#

But to convert it into the set

#

It already has to iterate over all of the things in the list

placid snow Aug 14, 2018, 10:51 AM

#

I'd assume the check would be faster, but the conversion would take a while compared

tight dove Aug 14, 2018, 10:51 AM

#

I was thinking initially iterate through the check if the value is contained in the list

#

@lean ledge hmmm

#

So what's best practice then?

#

Don't want to do anything anti-pattern

lean ledge Aug 14, 2018, 10:53 AM

#

Accessing from set is O(1) but construction using set() is O(n). Just iterating over the list is also O(n). All you do by doing set() is waste a function call unless you plan on reusing that set again and again

#

Just do it without the set

#

Or store sets in the original dict rather than lists

tight dove Aug 14, 2018, 10:55 AM

#

Ok, so this would suffice then

number = "0806"
for key, value in dictionary.items():
  if number in value:
    print(key)
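
If the lookup does end up running many times, another option is to build a reverse mapping once and then look prefixes up directly. A sketch with a shortened copy of the dictionary; the name `prefix_to_network` is made up for the example:

```python
# Same structure as the dictionary above, shortened for the example.
dictionary = {
    "Glo": ["0705", "0807", "0805"],
    "MTN": ["0703", "0706", "0803", "0806"],
    "Zoom": ["0707"],
}

# Build the reverse mapping once: prefix -> network name.
prefix_to_network = {
    prefix: network
    for network, prefixes in dictionary.items()
    for prefix in prefixes
}

print(prefix_to_network["0806"])      # MTN
print(prefix_to_network.get("0000"))  # None (unknown prefix)
```

Building the mapping costs one pass over all the prefixes; after that each lookup is a single dict access instead of a loop over every network.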

lean ledge Aug 14, 2018, 11:45 AM

#

Yep

teal veldt Aug 14, 2018, 1:42 PM

#

@polar acorn Yes, it does, thank you so much for taking your time to explain that in detail, I really appreciate it

jade robin Aug 14, 2018, 2:41 PM

#

ٴٴٴٴٴ

misty sonnet Aug 14, 2018, 2:43 PM

#

@weak kiln this name is against the guidelines?

#

Fix it?

#

:(

south quest Aug 14, 2018, 2:44 PM

#

@jade robin you're going to have to add a nickname which complies with our nickname policy.

#

Your current name is a character which messes around with how text is displayed

jade robin Aug 14, 2018, 2:45 PM

#

roger that

#

ok

south quest Aug 14, 2018, 2:45 PM

#

Also, don't send hidden unicode characters in help channels, that's considered spam

jade robin Aug 14, 2018, 2:46 PM

#

roger that

placid snow Aug 15, 2018, 10:49 AM

#

Is there any preferred way of storing training / test data?

#

I worked with 50,000 .txt files with my recent uni course. Doesn't sound like the best way of storing the data

placid snow Aug 15, 2018, 1:10 PM

#

While I'm at it, how would I go about loading 160k lines of json into memory, so I can rewrite it with proper indentation

velvet anchor Aug 15, 2018, 1:24 PM

#

Will probably all fit honestly unless the lines are massive

#

Even if every line is 1k that’s only ~200 megs

#

Storing data doesn’t super matter really depending on framework. Just go with what’s easiest to use. Keras / tf have generator options to feed into a model

turbid bay Aug 15, 2018, 1:54 PM

#

hey just a quick question. But i was wondering whether anyone would want to help teach me about machine learning. I've tried looking at tutorials online etc. but whenever i have tried to do anything without a tutorial i have no clue where to start. If anyone would like to teach me that'd be highly appreciated. thanks

lean ledge Aug 15, 2018, 2:00 PM

#

@placid snow Try looking at HDF5

placid snow Aug 15, 2018, 2:20 PM

#

@velvet anchor must be massive then, i ran into a MemoryError trying to load the json into memory so i could re-write it with indentation

#

Will do raggu

#

the json file itself is 150mb
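
For reference, the straightforward way to re-indent a JSON file with the standard library is sketched below. It loads the whole document at once, and the parsed Python objects usually take several times the file size in memory, so a file this large may still hit a MemoryError on a 32-bit Python. The function name and the throwaway file names are made up for the example:

```python
import json
import os
import tempfile


def reindent_json(src_path, dst_path, indent=2):
    """Read a JSON file and write it back out with indentation."""
    with open(src_path, "r", encoding="utf-8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, indent=indent)


# Small demo on a throwaway file.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "compact.json")
dst = os.path.join(tmp_dir, "pretty.json")
with open(src, "w", encoding="utf-8") as f:
    f.write('{"a":[1,2],"b":{"c":3}}')

reindent_json(src, dst)
with open(dst, encoding="utf-8") as f:
    pretty = f.read()
print(pretty)
```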

velvet anchor Aug 15, 2018, 2:22 PM

#

@turbid bay Andrew Ngs course on coursera

glad pivot Aug 15, 2018, 2:39 PM

#

Jose Portilla on Udemy

turbid bay Aug 15, 2018, 3:44 PM

#

i will. thanks @velvet anchor

velvet anchor Aug 15, 2018, 3:45 PM

#

Having personally taken it myself its quite good

#

widely regarded as one of the best

#

its also free, unless you want the certificate afterwards.

turbid bay Aug 15, 2018, 3:46 PM

#

do you mind sending me a link please

velvet anchor Aug 15, 2018, 3:46 PM

#

https://www.coursera.org/learn/machine-learning

Coursera

Machine Learning | Coursera

Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, ...

#

Some people recommend this one as well

#

https://courses.edx.org/courses/course-v1:ColumbiaX+CSMM.102x+1T2017/course/

turbid bay Aug 15, 2018, 3:47 PM

#

thankyou very much

lapis sequoia Aug 15, 2018, 5:49 PM

#

@velvet anchor Any book do you recommend to start on Data Science ?

velvet anchor Aug 15, 2018, 5:49 PM

#

Not that I have personal recommendations on.

glad pivot Aug 15, 2018, 5:50 PM

#

@feral lodge have you used PyTorch at all?

feral lodge Aug 15, 2018, 5:50 PM

#

Yep

#

been a while though

#

I liked it a lot

glad pivot Aug 15, 2018, 5:51 PM

#

I'm having some issues loading in some image data and can't seem to figure out why I'm getting the error I am

velvet anchor Aug 15, 2018, 5:51 PM

#

I liked it too but I wound up moving to Keras after a bit

glad pivot Aug 15, 2018, 5:52 PM

#

data_dir = 'Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5, 0.5, 0.5], 
                                                            [0.5, 0.5, 0.5])])

test_trainsforms = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor(), 
                                transforms.Normalize([0.5, 0.5, 0.5],
                                                    [0.5, 0.5, 0.5])])
                                


# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=32)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)

feral lodge Aug 15, 2018, 5:52 PM

#

Let's have a look bren, if I can't help probably someone else can

glad pivot Aug 15, 2018, 5:52 PM

#

# change this to the trainloader or testloader 
data_iter = iter(testloader)

images, labels = next(data_iter)
fig, axes = plt.subplots(figsize=(10,4), ncols=4)
for ii in range(4):
    ax = axes[ii]
    helper.imshow(images[ii], ax=ax)

#

invalid argument 0: Sizes of tensors must match except in dimension 0. Got 500 and 280 in dimension 2 at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/TH/generic/THTensorMath.cpp:3616

feral lodge Aug 15, 2018, 5:52 PM

#

I'll probably switch to keras or tensorflow for my thesis yeah

glad pivot Aug 15, 2018, 5:53 PM

#

So I'm loading in the Cat/Dog data from Kaggle

#

And I was tasked with the TODOs you can see above

#

They didn't really give me much instruction here, so I might be doing something completely wrong

#

The dog/cat data are in .png format I believe

#

Oh no actually they're .jpg

feral lodge Aug 15, 2018, 5:57 PM

#

https://medium.com/@yvanscher/pytorch-tip-yielding-image-sizes-6a776eb4115b This guy says to set batch_size to 1 when you get this error

glad pivot Aug 15, 2018, 5:58 PM

#

I tried that, and it didn't fix it

#

📎 unknown.png

#

I mean I guess it kind of fixes it?

#

That just messes with Udacity's helper functions though

#

But you wouldn't want to run a forward pass with batches of one image..

placid snow Aug 15, 2018, 6:14 PM

#

What's the correct way to append rows to a dataframe?

glad pivot Aug 15, 2018, 6:15 PM

#

pd.concat

#

and specify your axis

feral lodge Aug 15, 2018, 6:17 PM

#

df.append https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html works too, without specifying axis 🐸 👌

placid snow Aug 15, 2018, 6:17 PM

#

What exactly is axis in this context

#

And, doesn't append need a series ?

#

Hang on, I have dataframes im appending. silly me

feral lodge Aug 15, 2018, 6:19 PM

#

Axis is 0 for rows, 1 for columns
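
A quick way to see the difference between the two axes is to compare shapes. A small sketch with two made-up frames:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0 stacks rows: same columns, more rows.
rows = pd.concat([a, b], axis=0, ignore_index=True)
print(rows.shape)   # (4, 2)

# axis=1 places the frames side by side: same rows, more columns.
cols = pd.concat([a, b], axis=1)
print(cols.shape)   # (2, 4)
```

`ignore_index=True` renumbers the rows 0..n-1; without it the original row labels are kept and repeat.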

placid snow Aug 15, 2018, 6:19 PM

#

Ah, that explains why I got 6k columns c:

feral lodge Aug 15, 2018, 6:19 PM

#

No idea about your tensors tbh bren 🤔 Only thing I can think of is to check the image sizes

glad pivot Aug 15, 2018, 6:20 PM

#

Ugh ok, it seems like a not very common problem

#

So maybe I'll just move on..?

feral lodge Aug 15, 2018, 6:22 PM

#

Can't hurt, you can always go back and try to fix it again if this turns out to be a requisite for understanding the next lessons

glad pivot Aug 15, 2018, 6:23 PM

#

📎 unknown.png

#

📎 unknown.png

#

That is the instruction here, if that helps provide a little more context to the problem? Don't feel obligated if you don't have time!

#

I'm pretty sure that jpgs only have 3 color channels?

feral lodge Aug 15, 2018, 6:34 PM

#

I'll have a look again tomorrow friendo, long day for me! Hopefully someone can help before then

#

Yep

glad pivot Aug 15, 2018, 6:34 PM

#

No problem!! Thanks for being willing

#

I can try to ask a Udacity 'mentor'

feral lodge Aug 15, 2018, 6:35 PM

#

Give it a go 😮 Feel free to write the fix if they can help, I'd like to know

velvet anchor Aug 15, 2018, 6:37 PM

#

I've had this error in Keras before but I don't remember how I solved it

#

are your images actually loading?

#

like is the directory correct?

glad pivot Aug 15, 2018, 6:39 PM

#

yes, directory is correct

#

i've got one image to load correctly

#

📎 unknown.png

#

so it works when I switch out the testloader for the trainloader, the only difference here is that they are in different subdirectories

#

and that the data isn't being randomized

#

but i was able to get non-randomized data to work earlier

velvet anchor Aug 15, 2018, 7:01 PM

#

Hmm

#

Are they the same size?

glad pivot Aug 15, 2018, 7:01 PM

#

they're all different sizes

#

even the testing ones

#

*training ones

#

so for some reason it is working for the training but not testing set

velvet anchor Aug 15, 2018, 7:02 PM

#

hmm

#

can you send the code and a link to the data set?

glad pivot Aug 15, 2018, 7:02 PM

#

dataset is local

velvet anchor Aug 15, 2018, 7:02 PM

#

may be a minute before I can look at it though about to leave class

#

https://www.kaggle.com/c/dogs-vs-cats

#

its this one right?

glad pivot Aug 15, 2018, 7:03 PM

#

yes that's it

#

this is the code

📎 Part_7_-_Loading_Image_Data.ipynb

#

shouldn't have any sensitive info on it

#

it is utilizing some external (local) modules

velvet anchor Aug 15, 2018, 7:07 PM

#

kk lemme get home and I can take a closer look

cerulean mica Aug 16, 2018, 5:52 AM

#

Can anyone help with a qlearn issue/python issue? keep on getting value errors >.<

serene oar Aug 16, 2018, 8:17 AM

#

Hi, when plotting with mpld3 to html, it loses the X axis values for me, making it be from -1 to 23, instead of having the years 1995-2017.
When plotting with pyplot, everything is as it should be.

📎 unknown.png

#

📎 unknown.png

#

How can I have the exact same plot in html with mpld3, as I have with pyplot?

analog crane Aug 17, 2018, 12:02 AM

#

hello, what would be the fastest way to check if an image is (almost - compression) the same as another one ? (need to compare it to somewhere around 1000 pictures but stop if a good enough match is found)
pngs

velvet anchor Aug 17, 2018, 1:51 AM

#

You could look into perceptual hashes

#

You could also downscale the images super small. Like to 8x8 or something and then do some math to compute a similarity score

#

A quick one is using the square of the distance for all 3 colors
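
A rough sketch of the downscale-and-compare idea, assuming the images are already loaded as (H, W, 3) numpy arrays. The helper names and the threshold are hypothetical and would need tuning on real data; this is block averaging plus a squared distance, not a true perceptual hash.

```python
import numpy as np

def downscale(image, size=8):
    """Average-pool an (H, W, 3) array down to (size, size, 3)."""
    height, width, channels = image.shape
    # Crop so both dimensions divide evenly into `size` blocks.
    h_crop = height - height % size
    w_crop = width - width % size
    blocks = image[:h_crop, :w_crop].astype(float).reshape(
        size, h_crop // size, size, w_crop // size, channels
    )
    return blocks.mean(axis=(1, 3))

def squared_distance(a, b, size=8):
    """Sum of squared differences over all 3 colour channels."""
    return float(((downscale(a, size) - downscale(b, size)) ** 2).sum())

def find_match(target, candidates, threshold):
    """Return the index of the first candidate that is close enough, else None."""
    for index, candidate in enumerate(candidates):
        if squared_distance(target, candidate) <= threshold:
            return index  # stop as soon as a good enough match is found
    return None

# Made-up images: one very different, one nearly identical to the target.
target = np.zeros((32, 32, 3))
different = np.full((32, 32, 3), 100.0)
nearly_same = target + 1.0
match_index = find_match(target, [different, nearly_same], threshold=500.0)
print(match_index)
```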

novel path Aug 17, 2018, 9:32 AM

#

Hey all, I’m struggling with a dataframe manipulation question (posted #help-coconut ). Thanks a lot to take a look if you have some time.

summer plover Aug 17, 2018, 10:18 AM

#

@lone mist have been pinged in the help channel, he might get online soon. maybe you could repost your question here @novel path just for the more ease of access to the question

novel path Aug 17, 2018, 10:22 AM

#

Ok, thanks for the tip @summer plover

#

I'm struggling with a df case. I’d like to copy the High values from the source table for all matching Year-Week pairs
these two tables are not same dimension
thanks a lot in advance
https://imgur.com/4vwWgha

#

Thanks @lone mist

lone mist Aug 17, 2018, 10:35 AM

#

Sorry, I lack the knowledge to help you with this. I don't even know what that is.

novel path Aug 17, 2018, 11:31 AM

#

Finally I’ve solved it with database-like operations in pandas. Both merge and join work. Thanks anyway @lone mist

#

https://pandas.pydata.org/pandas-docs/stable/merging.html#joining-key-columns-on-an-index
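
A small sketch of what such a merge might look like. The real tables are only visible in the linked screenshot, so the column names `Year`, `Week`, `High` and `Close` and all values here are assumptions.

```python
import pandas as pd

# Source table: one High value per Year-Week (made-up data).
source = pd.DataFrame({
    "Year": [2018, 2018, 2018],
    "Week": [1, 2, 3],
    "High": [10.5, 11.0, 12.25],
})

# Target table: different length, repeated and missing weeks.
target = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2018],
    "Week": [3, 1, 1, 4],
    "Close": [12.0, 10.0, 10.2, 13.0],
})

# A left merge keeps every target row and copies High where Year and Week match.
merged = target.merge(source, on=["Year", "Week"], how="left")
print(merged)
```

Rows in the target with no matching Year-Week in the source end up with a missing `High` value rather than being dropped.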

placid snow Aug 17, 2018, 5:01 PM

#

What's a good way to partially load a huge json file

simple crag Aug 17, 2018, 5:04 PM

#

do you need the whole thing in memory?

placid snow Aug 17, 2018, 5:11 PM

#

I'm unsure if i actually do.

#

case 1, I want to format it with indentation and
case 2, I want to plot the data

placid snow Aug 17, 2018, 5:37 PM

#

Side question, how would I go about plotting occurrence of values in a bar graph?

#

I'm counting lengths of strings with

lengths = df["Contents"].apply(str).apply(str.split).apply(len)

And trying to plot it with

lengths.value_counts().plot.bar(x=lengths.values)

#

Unsure if my result is actually correct based on the messy x axis..

📎 unknown.png

#

Actually nvm, I lowered the max amount of str length and got a read-able answer. It does seem correct
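
For reference, a sketch of one likely cause of the messy axis: `value_counts()` orders by frequency, so the bars are not in numeric order unless the index is sorted first. The `Contents` data below is made up.

```python
import pandas as pd

# Made-up stand-in for the real "Contents" column.
df = pd.DataFrame({"Contents": ["a b", "a", "a b c", "b c", None]})

# Word count per row; str(None) becomes the single word "None".
lengths = df["Contents"].apply(str).apply(str.split).apply(len)

# Count occurrences, then order the bars by word count instead of frequency.
counts = lengths.value_counts().sort_index()
print(counts)

# counts.plot.bar() would then draw one bar per word count, in order.
```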

strong arrow Aug 17, 2018, 6:42 PM

#

Hello does anyone know about tesseract here?

naive hornet Aug 17, 2018, 6:42 PM

#

!t ask

arctic wedgeBOT Aug 17, 2018, 6:42 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

strong arrow Aug 17, 2018, 6:44 PM

#

Oh alright so I'm having this problem and it says Error opening data file \Program Files (x86)\Tesseract-OCR\tessdata/chi_tra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'chi_tra'
Tesseract couldn't load any languages!
Could not initialize tesseract.

#

Please lmk asap

#

welp

#

No one is here

#

damn i have best luck

#

everyone ignores me

#

nice

nocturne flower Aug 17, 2018, 6:56 PM

#

You're not being ignored. But you aren't going to get 100 people saying "I don't know"

velvet anchor Aug 17, 2018, 6:57 PM

#

Yup. Not many people are familiar with ML libraries. My only suggestion is setting the environment variable to what it tells you to.

strong arrow Aug 17, 2018, 6:58 PM

#

I'm new to this can you please help with that real quick

nocturne flower Aug 17, 2018, 6:58 PM

#

"Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory."

#

Google it and do what you find.

#

That's what anyone here would be doing

velvet anchor Aug 17, 2018, 7:00 PM

#

https://emop.tamu.edu/Installing-Tesseract-Windows8

void anvil Aug 17, 2018, 7:03 PM

#

is there a command to transpose partial columns of a DF into rows
and label the rows
eg I have the list 1:10
I want it to look like:
original, -1, -2, -3
1
2
3
4, 3, 2 , 1
5, 4, 3, 2
6, 5, 4, 3
etc.
10, 9, 8, 7
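
One possible approach, not confirmed anywhere in this thread: the layout above looks like lagged copies of a column, which `Series.shift` can produce. The column labels follow the example in the question.

```python
import pandas as pd

# The list 1..10 as a single column.
df = pd.DataFrame({"original": range(1, 11)})

# Each lag column holds the value from `lag` rows earlier.
for lag in (1, 2, 3):
    df[f"-{lag}"] = df["original"].shift(lag)

print(df)
```

The first rows contain NaN where no earlier value exists, which matches the blank cells in the example.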

naive hornet Aug 17, 2018, 7:09 PM

#

hey, @strong arrow, it's come to our attention that you're botting HQTrivia. that's very not allowed by them or us, time to stop asking for help.

livid jetty Aug 18, 2018, 10:35 AM

#

is this right place to talk about servers?

muted niche Aug 18, 2018, 11:08 AM

#

think data-science applies more to processing and visualising data, machine learning, statistics and stuff. I could be wrong

#

@livid jetty I am currently working on some server like stuff, although it is pretty basic stuff. If you have a question I can try to answer

signal siren Aug 18, 2018, 12:24 PM

#

Hi, I have a question regarding pandas plot function. First: Here is the code:

columns = [column, 'gender', 'twitterid']
group = [column, 'gender']
x = df[columns].groupby(group)['twitterid'].unique().apply(len)
axis = df.plot.bar()
plt.savefig('fancy_image.png')

The code above results in this plot:

#

📎 mbti_gender_16_tweets.png

#

But this plot has wrong ticks on the y-axis and it's bars are also wrong. Here is the csv file of the dataframe above

#

📎 mbti_gender_16_tweets.csv

#

ENFJ,F,6
ENFJ,M,2
ENFP,F,10
ENFP,M,9
ENTJ,F,2
ENTJ,M,7
ENTP,F,5
ENTP,M,10
ESFJ,F,5
ESFJ,M,2
ESFP,F,3
ESTJ,M,1
ESTP,M,2
INFJ,F,15
INFJ,M,9
INFP,F,25
INFP,M,19
INTJ,F,8
INTJ,M,8
INTP,F,7
INTP,M,16
ISFJ,F,3
ISFP,F,4
ISFP,M,5
ISTJ,F,1
ISTJ,M,4
ISTP,M,4

#

Has anyone a clue why this happens?

#

But if I run it interactively in the IPython Shell it works properly

fluid venture Aug 19, 2018, 7:12 AM

#

hey @signal siren

not sure how to help, there are some problems with the snippet to begin with:
1 - there's an unassigned variable column, probably should be 'mbti'?
2 - you assign a x variable and dont use it
3 - probably there is a misunderstanding in x = [...].unique().apply(len) , absolutely anything that you apply unique and then len will result in a column of a bunch of 1

about the image: the csv doesnt seem to relate in any way to the image you provided

about the csv: going by the labels in the image, you probably want to visualize 'number of tweets by gender and mbti', that is, you want to see which mbti tweets more, broken down by gender, is that so? That is kind of impossible using this csv, because if 'twitterid' corresponds to a 'unique individual', there is some error in the data: there are ids with multiple MBTIs and genders:

(
    pd.read_csv('./mbti_gender_16_tweets.csv', names=[
        'mbti', 'gender', 'twitterid'
    ])
    .set_index('twitterid')
    .sort_index()
)

📎 unknown.png

#

That being said, maybe you want something like this:

(
    df
    .groupby(['mbti', 'gender'])
    .twitterid
    .sum()
    .unstack()
    .plot.bar(stacked=True)
)

📎 unknown.png

signal siren Aug 19, 2018, 9:30 AM

#

@fluid venture thank you for your help!
The code snippet above was split into two functions and I forgot to reassign the x. But the .unstack() method is really nice! Thanks a lot!

fluid venture Aug 19, 2018, 5:42 PM

#

@signal siren cool, be careful though, this snippet doesn't make much sense, I'm summing twitter ids, used this only as an example
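
A hedged alternative to summing ids: if the goal is the number of distinct accounts per MBTI type and gender, `nunique()` counts them directly. The rows below are made up, and the column names follow the snippet above.

```python
import pandas as pd

# Made-up rows: one row per tweet, so ids can repeat.
df = pd.DataFrame({
    "mbti": ["ENFJ", "ENFJ", "ENFJ", "ENFJ", "ENFP"],
    "gender": ["F", "F", "F", "M", "F"],
    "twitterid": [1, 1, 5, 2, 3],
})

# Distinct ids per (mbti, gender), with gender spread into columns.
counts = (
    df
    .groupby(["mbti", "gender"])["twitterid"]
    .nunique()
    .unstack(fill_value=0)
)
print(counts)

# counts.plot.bar(stacked=True) would draw the stacked bars from this table.
```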

turbid bay Aug 20, 2018, 12:34 PM

#

can someone explain to me the meaning of partial derivative in the gradient descent algorithm? like what is a partial derivative and what does it do??

polar acorn Aug 20, 2018, 1:15 PM

#

Do you know what derivative is?

#

As in differentiate y = f(x) with respect to x. Often denoted dy/dx

turbid bay Aug 20, 2018, 3:26 PM

#

yes kind of i think XD

#

as in if y=x^3 then dy/dx = 3x^2

polar acorn Aug 20, 2018, 5:23 PM

#

Yes exactly. Now what if your function has several variables? Let's say y = x^2 * z^2
What happens when you differentiate this?

#

Partial derivatives are a solution to this: first you differentiate y as if everything but x was a constant (as if z were just a number like 2 or 5). You call this ∂y/∂x; since we pretend z is constant we change d to ∂ to signify that y really depends on more than just x. Then we can calculate ∂y/∂z, and for this we assume that x is a constant. In gradient descent we use ∂y/∂x to see how we need to adjust x to minimise y, and ∂y/∂z to see how we need to adjust z to minimise y.
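
To make this concrete, here is a small sketch for y = x^2 * z^2. The partials are ∂y/∂x = 2x·z^2 and ∂y/∂z = 2x^2·z; the code checks them against a numerical estimate and takes one gradient descent step. The starting point and learning rate are arbitrary choices for illustration.

```python
def y(x, z):
    return x**2 * z**2

def dy_dx(x, z):
    # Treat z as a constant: d/dx (x^2 * z^2) = 2x * z^2
    return 2 * x * z**2

def dy_dz(x, z):
    # Treat x as a constant: d/dz (x^2 * z^2) = 2 * x^2 * z
    return 2 * x**2 * z

def numerical_partial(f, x, z, wrt, h=1e-6):
    """Central-difference estimate of the partial derivative."""
    if wrt == "x":
        return (f(x + h, z) - f(x - h, z)) / (2 * h)
    return (f(x, z + h) - f(x, z - h)) / (2 * h)

# One gradient descent step from an arbitrary starting point.
x, z = 1.0, 2.0
learning_rate = 0.01
before = y(x, z)
x, z = x - learning_rate * dy_dx(x, z), z - learning_rate * dy_dz(x, z)
after = y(x, z)
print(before, after)
```

Each variable is nudged against its own partial derivative, and y gets smaller as a result.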

#

Makes sense?

misty sonnet Aug 20, 2018, 6:49 PM

#

@polar acorn if it is just a single expression you can apply that logic across the whole thing

#

If it is factors there is a different method I think?

#

Essence of calculus by 3blue1brown is amazing

#

Check it out!

lean ledge Aug 20, 2018, 11:45 PM

#

@turbid bay You should probably take a course on multivariate calculus

turbid bay Aug 20, 2018, 11:48 PM

#

i could yes. but im already taking a course on machine learning rn and thats taking a lot of effort as it is 😂. but. @polar acorn explanation makes a bit of sense actually. i just need to be able to visualise how it works with gradient descent. but that’ll have to wait until tomorrow

lean ledge Aug 21, 2018, 1:09 AM

#

Is that a formal course? You didnt have Calc 3 as a prerequisite?

polar acorn Aug 21, 2018, 7:46 AM

#

@turbid bay I would listen to @lean ledge here, understanding machine learning takes some effort, but understanding it without a good grasp on multivariate calculus and linear algebra takes a LOT more effort. At least you should check out the youtube channel @misty sonnet recommended, it's quite good. It seems like more effort but I think it'll save you some effort to really get the basics down.

small ore Aug 21, 2018, 9:06 AM

#

@turbid bay If it is Andrew Ng's course you do not need to understand calculus in detail. You just need to know what it means and you use the final result of the derivation rather than having to know the mathematical details of the steps involved. For gradient descent you have to just know that similar to dy/dx giving the slope of a curve defined by y = f(x), a partial derivative gives the gradient ('slope') of a multivariable function. Although a quick learning of partial derivatives can only help understand better

turbid bay Aug 21, 2018, 9:41 AM

#

no @lean ledge its an online course

#

and @small ore it is Andrew Ng’s course however even tho its not necessary to understand it in detail it is always nice to understand it

small ore Aug 21, 2018, 9:44 AM

#

Read up on some basic partial derivatives, esp. something with an example or demo which visually shows you what a gradient can be thought of as

lean ledge Aug 21, 2018, 9:45 AM

#

I do not at all recommend considering trying to learn data science without learning maths if you're actually considering going into it

#

Maths can get quite complex

polar acorn Aug 21, 2018, 10:17 AM

#

It depends on what kind of problem solver you want to be I guess. Do you want to solve your problems mostly by trial and error, copy pasting, looking for tutorials or similar problems with solutions. Or do you want to be able to understand why a solution works for a specific problem? This is of course not a binary thing but more of a sliding scale. Anyhow I feel doing machine learning without understanding math leads you more to first strategy.

turbid bay Aug 21, 2018, 11:05 AM

#

@lean ledge i am learning maths XD just haven't done this yet. I am still studying at school

earnest prawn Aug 21, 2018, 1:12 PM

#

School only teaches a fragment of the entirety of maths though, so it's possible that you maybe never get to these topics

turbid bay Aug 21, 2018, 2:18 PM

#

im pretty sure that the maths course at school covers partial derivatives but i will eventually look at learning more maths than school teaches

#

but 50% of my lessons are maths

trail current Aug 21, 2018, 5:46 PM

#

hey guys, have a matplotlib question
when i try to use matplotlib.animation.FuncAnimation, i have the axis set to update with the increase of time
so the axis 'follows' the data (it's live data)
but now i can't use the navigation toolbar because whenever i try to zoom or whatever the axes get redrawn and it gets set back to default

wild haven Aug 22, 2018, 10:17 PM

#

anyone around to help out with a json --> pandas question?

simple crag Aug 22, 2018, 10:17 PM

#

!t ask

arctic wedgeBOT Aug 22, 2018, 10:17 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

wild haven Aug 22, 2018, 10:18 PM

#

very well

#

i have some json data with multiple nested levels - some of the objects lack every category. example:

      "array": [
        "firstArrayValue":0,
        "secondArrayValue":1],
    "firstCategory":secondValue,
    "array":[
        "firstArrayValue":0]
]

#

so here, the second value's array doesn't contain "secondArrayValue"

#

so when i load this json into pandas and try to look up MyDataframe.at[1,'array']['secondArrayValue'] it gives a key error

#

i can't just tell the dataframe to deal with NaN values in a particular way - the key is actually missing

#

any thoughts on a workaround?
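
One possible workaround, sketched under the assumption that each `array` entry is really a nested object (a dict once loaded) and that the records look roughly like the example: use `dict.get` so a missing key yields a default instead of a KeyError, or flatten the nesting so missing keys become NaN. The records below are made up.

```python
import pandas as pd

# Made-up records shaped like the example: the second one lacks a key.
records = [
    {"firstCategory": "firstValue",
     "array": {"firstArrayValue": 0, "secondArrayValue": 1}},
    {"firstCategory": "secondValue",
     "array": {"firstArrayValue": 0}},
]

df = pd.DataFrame(records)

# Option 1: dict.get returns None for a missing key instead of raising.
second_values = df["array"].apply(lambda d: d.get("secondArrayValue"))

# Option 2: flatten the nesting; missing keys show up as NaN.
flat = pd.json_normalize(records)
print(second_values)
print(flat)
```

After flattening, the usual NaN handling (`fillna`, `dropna`) applies because the column exists for every row.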

quiet gyro Aug 23, 2018, 2:00 AM

#

https://github.com/att-innovate/squanch

GitHub

att-innovate/squanch

squanch - A distributed simulation framework for quantum networks and channels

earnest prawn Aug 23, 2018, 2:17 AM

#

thats pretty cool

wild hinge Aug 23, 2018, 6:14 AM

#

hi everyone, is there a way to sort a pandas dataframe on odd even values?

#

This is part of my df, I want to sort even to odd (from A to F), can anyone point me in the right direction?

📎 unknown.png

polar acorn Aug 23, 2018, 7:17 AM

#

Still not quite sure what you want to achieve? Do you want to sort the rows or columns? Do you have an example of what it should look like?

wild hinge Aug 23, 2018, 8:32 AM

#

I only modified the first 3 columns, I want to observe the odd and even distribution in my dataframe. After I distribute the numbers like in the example above I want to use a color map to have a visual representation of this distribution.

📎 unknown.png

random bolt Aug 23, 2018, 8:37 AM

#

Show us what a sample unsorted dataset should look like, then what the algorithm should turn it into.

wild hinge Aug 23, 2018, 8:38 AM

#

Sample of unsorted dataset

📎 unknown.png

#

Sample of desired outcome (Odd numbers to Even numbers sorted A to F for each event (1,2,3))

📎 unknown.png

random bolt Aug 23, 2018, 8:41 AM

#

So odd before even in each column?

wild hinge Aug 23, 2018, 8:41 AM

#

yes sir

random bolt Aug 23, 2018, 8:48 AM

#

So, while I'm not too experienced with pandas, from what I can tell, you can probably use the advice from this stackexchange to make auxiliary columns to make arbitrary sorts.

#

https://stackoverflow.com/questions/39525928/pandas-sort-lambda-function

Stack Overflow

pandas sort lambda function

Given a dataframe 'a' with 3 columns, A , B , C and 3 rows of numerical values. How does one sort all the rows with a comp operator using only the product of A[i]*B[i]. It seems that the pandas s...

#

Are you familiar with the modulus function (% in python)?

wild hinge Aug 23, 2018, 8:51 AM

#

I am, but with the method from the article above I can only create another column to display the desired result, are you proposing to create the new column with the desired outcome then delete the input column?

polar acorn Aug 23, 2018, 8:53 AM

#

I made an ugly solution that might work

#

a = pd.DataFrame({'b':[12,2,2,1,2], 'c':[4,12,5,3,1]})
print(a)
a = a.apply(lambda x: sorted(x[x%2==0]) + sorted(x[x%2!=0]), axis=0)
print(a)

#

See if that solves your problem

wild hinge Aug 23, 2018, 8:55 AM

#

Like a charm, thank you so much

📎 unknown.png

polar acorn Aug 23, 2018, 8:56 AM

#

np

wild hinge Aug 23, 2018, 8:57 AM

#

why do you say it is ugly?

polar acorn Aug 23, 2018, 8:57 AM

#

Because I suspect there might be a better solution, though I'm not sharp enough with pandas to find it 😃
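
One alternative, offered as a sketch rather than a definitive improvement: sort each column once with a two-part key (parity first, then value) using `np.lexsort`. The `evens_then_odds` helper is a made-up name, and it keeps the same evens-first order as the snippet above.

```python
import numpy as np
import pandas as pd

def evens_then_odds(col):
    """Sort a column so evens come first, each group in ascending order."""
    values = col.to_numpy()
    # lexsort uses the LAST key as the primary key: parity, then value.
    order = np.lexsort((values, values % 2))
    return pd.Series(values[order], index=col.index)

a = pd.DataFrame({'b': [12, 2, 2, 1, 2], 'c': [4, 12, 5, 3, 1]})
result = a.apply(evens_then_odds)
print(result)
```

Swapping the primary key to `(values + 1) % 2` would put the odd numbers first instead.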

wild hinge Aug 23, 2018, 9:01 AM

#

the same lambda function can be used to rearrange max to min?

#

never mind, stupid question

random bolt Aug 23, 2018, 9:03 AM

#

The thing I linked literally makes an index column, so I think your solution is about as nice as possible.

wild hinge Aug 23, 2018, 9:05 AM

#

This is the outcome I was looking for, to have a better overview of the odd even distribution in my DF
Cheers for the help again 😄

📎 unknown.png

polar acorn Aug 23, 2018, 9:09 AM

#

No problem!

small ore Aug 23, 2018, 10:47 AM

#

@wild hinge if I may ask, what are you using to visualize the dataframe?

wild hinge Aug 23, 2018, 11:05 AM

#

pandas

#

this is the style I apply on my df to get the example above

#

(
    new_df.transpose().style.applymap(odd_even)
    .set_properties(**{'max-width': '300px', 'font-size': '1pt'})
    .set_caption("Hover to magnify")
    .set_precision(2)
    .set_table_styles(magnify())
)

#

def odd_even(val):
    """
    Takes a scalar and returns a string with
    the css property 'background-color: green' for even
    values, 'background-color: cyan' otherwise.
    """
    color = 'background-color: green' if val % 2 == 0 else 'background-color: cyan'
    return color

#

this is the helper function I use to color my map

small ore Aug 23, 2018, 11:52 AM

#

No. I meant what you are using to visualize it. Both the spreadsheet and the "map"

wild hinge Aug 23, 2018, 11:53 AM

#

jupyter notebook

small ore Aug 23, 2018, 11:53 AM

#

Ahh. Nice

wild hinge Aug 23, 2018, 11:53 AM

#

it helps me visualize the observations I take, rather than using comments

small ore Aug 23, 2018, 11:58 AM

#

Yeah. Those are nice things to look at rather than just df.head

wild hinge Aug 23, 2018, 1:38 PM

#

I'm back with another question for plotting purposes now.

📎 unknown.png

#

I have the above df and I want to create a histogram

#

fig, ax = plt.subplots()

histogram = ax.hist(new_fr['G'], density=1)
y=new_fr['H']

ax.plot(histogram, y, '--')
ax.set_xlabel('Index')
ax.set_ylabel('%/event')
ax.set_title('Numbers')

#

and I get this value error :

ValueError: x and y must have same first dimension, but have shapes (3,) and (49,)
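
A likely explanation, sketched with numpy: `ax.hist` returns a 3-tuple (counts, bin edges, patches), so passing that tuple as x gives the shape (3,) in the error. Unpacking the counts and edges gives arrays that can be plotted against each other. `np.histogram` is used here so the sketch runs without a display, and the values are made up.

```python
import numpy as np

# Made-up stand-in for new_fr['G'].
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# np.histogram returns the same counts and bin edges that ax.hist returns
# as its first two items (ax.hist additionally returns the drawn patches).
counts, edges = np.histogram(values, bins=4, density=True)

# One x position per bar: the middle of each bin.
centers = (edges[:-1] + edges[1:]) / 2
print(centers, counts)

# ax.plot(centers, counts, '--') would then have matching shapes.
```

Any other series plotted on the same axes needs its own x values of matching length.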

small ore Aug 23, 2018, 2:02 PM

#

!t codeblock

arctic wedgeBOT Aug 23, 2018, 2:02 PM

#

codeblock

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print("Hello world!")
```

This will result in the following:

print("Hello world!")

wild hinge Aug 23, 2018, 2:05 PM

#

!t

#

thank you 😃

steel breach Aug 23, 2018, 2:21 PM

#

Hi, i have list of nodes and adjacency matrix as 2 dim numpy array. What is the best way if user wants to get if there is connection between Node1 and Node2. I have to get Node1/2 indexes from the list and then access the matrix, is there something faster? because i have to use indexing from 0-n for matrix

simple crag Aug 23, 2018, 2:43 PM

#

If you've built the adjacency matrix from the list then why would you need to go back and get the indices?

steel breach Aug 23, 2018, 2:44 PM

#

well, if i have list of object of class Node

#

how can i use it for indexing 2 dim numpy array

#

maybe i am overthinking the problem

simple crag Aug 23, 2018, 2:46 PM

#

probably

steel breach Aug 23, 2018, 2:46 PM

#

goal is to store lattice (partial ordered set)

#

it's good to use a graph for it, especially an adjacency matrix

#

problem is how should i index this matrix

#

because there is nothing like node "0" or node "1" there are some real objects

#

which holds extra informations

#

so i need some sort of "translation" between integers and real Node objects

#

thats the reason why i think that i need to find index of the Node object and then access the matrix

simple crag Aug 23, 2018, 2:52 PM

#

The rows and columns correspond to the nodes in the list

steel breach Aug 23, 2018, 2:52 PM

#

correct

#

use case is easy, imagine that you have var with Node object and you want to know his upper neighbors

#

first you have to get the index of the node, then read the row/column of the matrix and find nodes based on the index and return these nodes

#

and thats seems to be bad design for me

#

finding node for index is O(1), but the other direction can be a problem

simple crag Aug 23, 2018, 3:01 PM

#

I'm not sure how else you would like to do it, there are packages that abstract it away for you but they're going to do the same thing
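
A minimal sketch of that translation layer, assuming the `Node` objects are hashable (plain class instances are by default). A dict built once from the list makes the node-to-index direction O(1) as well. The `Node` class, labels and helper names are made up for illustration.

```python
import numpy as np

class Node:
    """Stand-in for the real Node class, which holds extra information."""
    def __init__(self, label):
        self.label = label

nodes = [Node("bottom"), Node("a"), Node("b"), Node("top")]

# Built once: node -> row/column index, so lookups need no list search.
index_of = {node: i for i, node in enumerate(nodes)}

adjacency = np.zeros((len(nodes), len(nodes)), dtype=bool)

def connect(lower, upper):
    adjacency[index_of[lower], index_of[upper]] = True

def is_connected(lower, upper):
    return bool(adjacency[index_of[lower], index_of[upper]])

def upper_neighbors(node):
    # Read the node's row and translate the set columns back into Node objects.
    return [nodes[j] for j in np.flatnonzero(adjacency[index_of[node]])]

connect(nodes[0], nodes[1])
connect(nodes[0], nodes[2])
connect(nodes[1], nodes[3])
connect(nodes[2], nodes[3])
print([n.label for n in upper_neighbors(nodes[0])])
```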

steel breach Aug 23, 2018, 3:02 PM

#

ok, then i will try it this way thanks 😃 just want to be sure that my concept is fine

analog musk Aug 24, 2018, 9:55 AM

#

Hi , Anyone know how to do mean normalization of a 1000x20 array with random values from 0 to 5000 ?

#

With numpy

polar acorn Aug 24, 2018, 11:02 AM

#

Normalising each column? Or across all fields?

steel breach Aug 24, 2018, 11:33 AM

#

Hi if i have an ndarray, is it possible to put values into it based on an index list like [(0,0), (1,1)]?

analog musk Aug 24, 2018, 2:02 PM

#

@polar acorn normalising each column

#

I did this. I subtract each column's mean from that column and divide by the column's standard deviation

#

My problem is now the data separation.

#

The exercise asks to create a new 1D array with a permutation of the row indices of the normalised x
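
A sketch of both steps with numpy, assuming the data is a 1000x20 integer array with values from 0 to 5000 as described. The 60/20/20 split at the end is an assumption about what "data separation" means in the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 x 20 array of random integers from 0 to 5000 inclusive.
x = rng.integers(0, 5001, size=(1000, 20))

# Column-wise normalisation: axis=0 computes one mean/std per column.
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)

# 1D array holding a random permutation of the row indices.
row_indices = rng.permutation(x_norm.shape[0])

# Assumed split: 60% training, 20% cross validation, 20% test.
x_train = x_norm[row_indices[:600]]
x_crossval = x_norm[row_indices[600:800]]
x_test = x_norm[row_indices[800:]]
print(x_train.shape, x_crossval.shape, x_test.shape)
```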

turbid bay Aug 24, 2018, 4:55 PM

#

can anybody tell me whether this bit of code performs the batch gradient descent algorithm......

#

📎 unknown.png

#

    for i in range(10000):
        sum_of_differences_0 = 0
        sum_of_differences_1 = 0
        for k in range(len(inputs)):
            sum_of_differences_0 += (theta_0 + (theta_1*inputs[k]))-answers[k]
            sum_of_differences_1 += ((theta_0 + (theta_1*inputs[k]))-answers[k])*inputs[k]
        theta_0 = theta_0 - learning_rate*(1/len(inputs))*sum_of_differences_0
        theta_1 = theta_1 - learning_rate*(1/len(inputs))*sum_of_differences_1

lone mist Aug 24, 2018, 5:15 PM

#

Not familiar with the alg but it looks correct to what the picture above shows

turbid bay Aug 24, 2018, 5:15 PM

#

um ok then. thanks. im just confused because when implemented in my programme it doesnt work. But thanks for the help 😃

void anvil Aug 24, 2018, 10:19 PM

#

what python ml algorithms are natively capable of doing multiple outputs (e.g. x0..xn predicts y0...yn)

feral lodge Aug 25, 2018, 3:35 AM

#

That's batch gradient descent yes! Specifically gradient descent for linear regression, meaning we assume the target values y depend on the predictors x in a linear fashion and thus model each data point (x, y) with the linear function h(x) = θ₀ + θ₁x. In this particular variant of gradient descent, the parameters θ₀ and θ₁ are updated to minimize the least squares cost function, which for a single example is J(θ) = ½(h(x) - y)². The difference between batch gradient descent and stochastic gradient descent is that batch considers all m inputs each time we update the parameters rather than just one. Stochastic gradient descent would look pretty much the same except it wouldn't have the summations (Σ).

You can read more on this in this pdf: http://cs229.stanford.edu/notes/cs229-notes1.pdf. Your expression is derived on page 4. If you have a look there you can also see that the two gradients on your picture -- ∂J(θ₀, θ₁)/∂θ₀ and ∂J(θ₀, θ₁)/∂θ₁ -- are actually conceptually the same. The only difference is that the first is multiplied with a 1 and the second is multiplied by x. This is because ∂h/∂θ₀ = 1 and ∂h/∂θ₁ = x.
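
As a sanity check of the update rule, here is a compact sketch of batch gradient descent on made-up data that lies exactly on the line y = 1 + 2x. The learning rate and iteration count are arbitrary choices that happen to converge for this data.

```python
def batch_gradient_descent(inputs, answers, learning_rate=0.05, iterations=5000):
    """Fit h(x) = theta_0 + theta_1 * x with batch gradient descent."""
    theta_0, theta_1 = 0.0, 0.0
    m = len(inputs)
    for _ in range(iterations):
        # Errors for ALL m examples, computed with the current parameters.
        errors = [theta_0 + theta_1 * x - y for x, y in zip(inputs, answers)]
        # dh/dtheta_0 = 1 and dh/dtheta_1 = x, hence the extra factor of x.
        grad_0 = sum(errors) / m
        grad_1 = sum(e * x for e, x in zip(errors, inputs)) / m
        # Both parameters are updated from the same set of errors.
        theta_0 -= learning_rate * grad_0
        theta_1 -= learning_rate * grad_1
    return theta_0, theta_1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
theta_0, theta_1 = batch_gradient_descent(xs, ys)
print(theta_0, theta_1)
```

The recovered parameters should be very close to the intercept 1 and slope 2 used to generate the data.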

#

Not sure why your code isn't working properly, I think we'd have to see more to tell @turbid bay

steep ibex Aug 25, 2018, 6:17 AM

#

I want to simulate a dwarf fortress like world. icosahedron: so 20 triangles, each triangle side should have 13 triangles in them, each triangle would be 512 triangles itself. This would simulate a 40.075 world about 1x1 km. It also needs a z-dimension: down (towards lava), and a bit up: say 512 total on the z level. But: that would give me an insane amount of objects to iterate through each cycle: like 45365 million. (or a number in that ballpark)... any ideas on how I could accomplish that? My best guess at this moment is to have two kinds of objects: they all start as 'inactive', then have the world builder make some 'active' (put them in a queue or sumthing). I would then only need to check the active ones. This could cut out 90+ percent of all the objects to check each cycle. Anyone have a better idea?

stiff sapphire Aug 25, 2018, 7:49 AM

#

In things like DF it will only iterate over actual things that do stuff like machines or people

turbid bay Aug 25, 2018, 10:07 AM

#

@feral lodge this is my full code

hours_studied = [100, 25, 150, 75, 36, 43, 100, 52, 97, 250, 10, 5, 0, 25, 15, 19]
score_percentage = [80, 66, 84, 80, 52, 45, 90, 72, 78, 96, 34, 36, 25, 52, 40, 44]

###uses mean normalisation and feature scaling to scale the data

def find_range(array):
    array.sort()

    return array[len(array)-1] - array[0]

def find_mean(array):
    score = 0
    for i in array:
        score+=i

    return score/(len(array))

def feature_scaling(array):
    scaled_array = []
    array_range = find_range(array)
    array_mean = find_mean(array)

    for i in array:
        value = (i-array_mean)/array_range
        scaled_array.append(value)

    return scaled_array
    
###

def finding_theta_values(inputs, answers):
    theta_0 = 0
    theta_1 = 0
    learning_rate = 0.0001

    for i in range(10000):
        sum_of_differences_0 = 0
        sum_of_differences_1 = 0
        for k in range(len(inputs)):
            sum_of_differences_0 += (theta_0 + (theta_1*inputs[k]))-answers[k]
            sum_of_differences_1 += ((theta_0 + (theta_1*inputs[k]))-answers[k])*inputs[k]
        theta_0 = theta_0 - learning_rate*(1/len(inputs))*sum_of_differences_0
        theta_1 = theta_1 - learning_rate*(1/len(inputs))*sum_of_differences_1

    return theta_0, theta_1

def test(c, m):
    test_hours_studied = 56
    estimated_score = m*test_hours_studied + c
    return estimated_score

def main():
    scaled_inputs = feature_scaling(hours_studied)
    scaled_answers = feature_scaling(score_percentage)

    c,m = finding_theta_values(scaled_inputs, scaled_answers)

    
    result = test(c, m)
    print(result)
            
main()

#

the problem im having is the more iterations i do in the first for loop the higher my final result is. It is even managing to reach 4000% somehow

polar acorn Aug 25, 2018, 12:42 PM

#

@turbid bay a couple of things.

First of all your find range function actually sorts hours_studied and score_percentage outside of the function as well. Try it out by running a = [1,43,2,0,3]; find_range(a); print(a).
Second when you train on scaled data you must test on scaled data as well. Your test function should scale 56 in the same way your input was scaled, and then invert the scaling for the estimated score

turbid bay Aug 25, 2018, 1:26 PM

#

how else can i find the range of the array??

#

@polar acorn

polar acorn Aug 25, 2018, 1:28 PM

#

You can call min(array) and max(array) directly without sorting anything first
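
A short sketch of why the sorting version is a problem: `list.sort()` reorders the caller's list in place, so sorting the hours and the scores independently breaks the pairing between them. The function names here are illustrative.

```python
def find_range_sorting(array):
    # Side effect: this reorders the list the caller passed in.
    array.sort()
    return array[-1] - array[0]

def find_range(array):
    # No side effect: min and max only read the list.
    return max(array) - min(array)

a = [1, 43, 2, 0, 3]
range_a = find_range_sorting(a)

b = [1, 43, 2, 0, 3]
range_b = find_range(b)

print(range_a, a)
print(range_b, b)
```

Both give the same range, but only the second leaves the original order intact.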

#

Also are you familiar with numpy and sklearn?

turbid bay Aug 25, 2018, 1:52 PM

#

no i am not. ive only just started learning machine learning so im only trying to put into practice the things ive learned.

polar acorn Aug 25, 2018, 2:08 PM

#

I see, well numpy is useful for all kinds of operations with arrays and sklearn is probably the first python library I'd look into if I wanted to learn machine learning with python. It's nice to write stuff yourself the first time so you know how it works, but once you have done that, in the future you can use numpy for finding means and ranges etc. and sklearn for scaling and gradient descent etc.

steel breach Aug 25, 2018, 2:34 PM

#

Hi if i have an ndarray, is it possible to put values into it based on an index list like [(0,0), (1,1)]?
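
One way this can be done, as a sketch: split the list of (row, column) pairs into a row array and a column array and use them together as an index. The grid size and values are made up.

```python
import numpy as np

grid = np.zeros((3, 3), dtype=int)
indices = [(0, 0), (1, 1), (2, 0)]

# Transposing the (n, 2) array gives one array of rows and one of columns.
rows, cols = np.array(indices).T

# Indexing with both arrays addresses exactly the listed cells.
grid[rows, cols] = [7, 8, 9]
print(grid)
```

Assigning a single scalar instead of a list would write the same value into every listed cell.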

feral lodge Aug 25, 2018, 3:43 PM

#

You're training with normalized data, but testing with non-normalized data. You can either skip normalizing the training data, or you can apply the exact same normalization transformation to the testing data as the training data 👌 It makes sense if you think about it -- your line has been fitted with points in a very small range centered on 0, like 0.02 and -0.4, but suddenly you're testing it with 56, so the predicted percentage skyrockets. By the way, a line isn't a great model for percentages, since percentages are bound between [0, 100] (or equivalently [0, 1]) but a line goes on forever. A proper model, like a sigmoid, wouldn't be able to predict 4000% @turbid bay
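
A minimal sketch of "the exact same normalization transformation": scale the test input with the training inputs' mean and range, predict in scaled space, then undo the scaling with the training scores' mean and range. The helper names are made up, and `c` and `m` stand for whatever gradient descent returned.

```python
hours_studied = [100, 25, 150, 75, 36, 43, 100, 52, 97, 250, 10, 5, 0, 25, 15, 19]
score_percentage = [80, 66, 84, 80, 52, 45, 90, 72, 78, 96, 34, 36, 25, 52, 40, 44]

def mean(values):
    return sum(values) / len(values)

def value_range(values):
    return max(values) - min(values)

def scale(value, training_values):
    """Apply the training data's mean normalisation to one value."""
    return (value - mean(training_values)) / value_range(training_values)

def unscale(scaled_value, training_values):
    """Invert scale(): map a scaled value back to original units."""
    return scaled_value * value_range(training_values) + mean(training_values)

def predict(c, m, hours):
    scaled_hours = scale(hours, hours_studied)        # same transform as training
    scaled_score = m * scaled_hours + c               # the line lives in scaled space
    return unscale(scaled_score, score_percentage)    # back to a percentage

# With c = 0 and m = 0 the prediction is just the mean training score.
print(predict(0.0, 0.0, 56))
```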

brazen spade Aug 25, 2018, 4:34 PM

#

Can I get some opinions on how to handle a data frame I'm working with? I'm analyzing soccer data to determine patterns in winning teams across multiple leagues. I have a valuable df of matches or games, with a lot of good info. Unfortunately some critical features in determining the control/pace/etc of the game are heavily null. The df is ~25k rows, with ~11k rows of these 8 critical features being empty. Should I drop these rows and lose half the data, fill with means, something else? These are technically games I could look up the data manually, but for 11k games, I don't have time for this so that's not an option

lapis sequoia Aug 25, 2018, 5:46 PM

#

good tutorial for blockchain development?

turbid bay Aug 26, 2018, 12:26 PM

#

hey guys, is this the correct way to test my gradient descent algorithm which has been feature scaled and mean normalised?

#

def test(c, m):
    test_hours_studied = 40
    test_range = find_range(hours_studied)
    test_mean = find_mean(hours_studied)
    score_range = find_range(score_percentage)
    score_mean = find_mean(score_percentage)

    scaled_hours_studied = (test_hours_studied-test_mean)/test_range
    estimated_score = (m*test_hours_studied)+ c
    estimated_score = (estimated_score*score_range)+score_mean
    return estimated_score

glad pivot Aug 26, 2018, 12:46 PM

#

We would need to see your other functions

turbid bay Aug 26, 2018, 1:08 PM

#

ok

#

hours_studied = [100, 25, 150, 75, 36, 43, 100, 52, 97, 250, 10, 5, 0, 25, 15, 19]
score_percentage = [80, 66, 84, 80, 52, 45, 90, 72, 78, 96, 34, 36, 25, 52, 40, 44]
###uses mean normalisation and feature scaling to scale the data
def find_range(array):
    top = max(array)
    bottom = min(array)
    return top-bottom
def find_mean(array):
    score = 0
    for i in array:
        score+=i
    return score/(len(array))
def feature_scaling(array):
    scaled_array = []
    array_range = find_range(array)
    array_mean = find_mean(array)
    for i in array:
        value = (i-array_mean)/array_range
        scaled_array.append(value)
    return scaled_array    
def finding_theta_values(inputs, answers):
    theta_0 = 0
    theta_1 = 0
    learning_rate = 0.0001
    for i in range(10000):
        sum_of_differences_0 = 0
        sum_of_differences_1 = 0
        for k in range(len(inputs)):
            sum_of_differences_0 += (theta_0 + (theta_1*inputs[k]))-answers[k]
            sum_of_differences_1 += ((theta_0 + (theta_1*inputs[k]))-answers[k])*inputs[k]
        theta_0 = theta_0 - learning_rate*(1/len(inputs))*sum_of_differences_0
        theta_1 = theta_1 - learning_rate*(1/len(inputs))*sum_of_differences_1
    return theta_0, theta_1
def test(c, m):
    test_hours_studied = 40
    test_range = find_range(hours_studied)
    test_mean = find_mean(hours_studied)
    score_range = find_range(score_percentage)
    score_mean = find_mean(score_percentage)
    scaled_hours_studied = (test_hours_studied-test_mean)/test_range
    estimated_score = (m*test_hours_studied)+ c
    estimated_score = (estimated_score*score_range)+score_mean
    return estimated_score
def main():
    scaled_inputs = feature_scaling(hours_studied)
    scaled_answers = feature_scaling(score_percentage)
    c,m = finding_theta_values(scaled_inputs, scaled_answers)
    #c,m = finding_theta_values(hours_studied, score_percentage)
    result = test(c, m)
    print(result)           
main()
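
One thing that stands out in `test` above: `scaled_hours_studied` is computed, but the prediction on the next line uses the raw `test_hours_studied`. A sketch of what consistent scaling might look like, assuming the line was fitted on scaled inputs and scaled scores as in `main`. The name `predict` and its argument order are made up for illustration:

```python
hours_studied = [100, 25, 150, 75, 36, 43, 100, 52, 97, 250, 10, 5, 0, 25, 15, 19]
score_percentage = [80, 66, 84, 80, 52, 45, 90, 72, 78, 96, 34, 36, 25, 52, 40, 44]


def find_range(array):
    return max(array) - min(array)


def find_mean(array):
    return sum(array) / len(array)


def predict(hours, c, m):
    # Hypothetical replacement for test(): scale the input with the
    # TRAINING mean and range, apply the line in scaled space, then
    # undo the scaling that was applied to the scores.
    scaled_hours = (hours - find_mean(hours_studied)) / find_range(hours_studied)
    scaled_score = m * scaled_hours + c
    return scaled_score * find_range(score_percentage) + find_mean(score_percentage)
```

With this shape, `predict(40, c, m)` would take the `c, m` returned by `finding_theta_values(scaled_inputs, scaled_answers)`.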

quiet gyro Aug 26, 2018, 7:13 PM

#

https://paste.pydis.com might be a good place for a large chunk like that

misty sonnet Aug 26, 2018, 7:13 PM

#

Also more tabs please

#

hours_studied = [
    100, 25, 150, 75, 36, 43, 100, 52, 97, 250, 10, 5, 0, 25, 15, 19]

score_percentage = [
    80, 66, 84, 80, 52, 45, 90, 72, 78, 96, 34, 36, 25, 52, 40, 44]

# uses mean normalisation and feature scaling to scale the data


def find_range(array):
    top = max(array)
    bottom = min(array)
    return top-bottom


def find_mean(array):
    score = 0
    for i in array:
        score += i
    return score/(len(array))


def feature_scaling(array):
    scaled_array = []
    array_range = find_range(array)
    array_mean = find_mean(array)
    for i in array:
        value = (i-array_mean)/array_range
        scaled_array.append(value)
    return scaled_array


def finding_theta_values(inputs, answers):
    theta_0 = 0
    theta_1 = 0
    learning_rate = 0.0001
    for _ in range(10000):
        sum_of_differences_0 = 0
        sum_of_differences_1 = 0
        for k in range(len(inputs)):
            sum_of_differences_0 += (theta_0 + (theta_1*inputs[k]))-answers[k]
            sum_of_differences_1 += (
                (theta_0 + (theta_1*inputs[k]))-answers[k])*inputs[k]
        theta_0 = theta_0 - learning_rate*(1/len(inputs))*sum_of_differences_0
        theta_1 = theta_1 - learning_rate*(1/len(inputs))*sum_of_differences_1
    return theta_0, theta_1


def test(c, m):
    test_hours_studied = 40
    score_range = find_range(score_percentage)
    score_mean = find_mean(score_percentage)
    estimated_score = (m * test_hours_studied) + c
    estimated_score = (estimated_score * score_range)+score_mean
    return estimated_score


def main():
    scaled_inputs = feature_scaling(hours_studied)
    scaled_answers = feature_scaling(score_percentage)
    c, m = finding_theta_values(scaled_inputs, scaled_answers)
    #  c, m = finding_theta_values(hours_studied, score_percentage)
    result = test(c, m)
    print(result)


main()

#

Fixed your code

#

xd

turbid bay Aug 26, 2018, 8:50 PM

#

???

wild hinge Aug 27, 2018, 8:24 AM

#

@misty sonnet you have a slow monday, don't you?

placid snow Aug 27, 2018, 8:26 AM

#

It was posted yesterday

wild hinge Aug 27, 2018, 8:26 AM

#

I'm having a slow Monday

bitter sable Aug 27, 2018, 9:33 AM

#

good day people

bitter sable Aug 27, 2018, 9:59 AM

#

df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
opgroup = df['datetime'].sub(df['datetime'].shift(1), axis = 0) / np.timedelta64(1, 'h')

yields TypeError: unsupported operand type(s) for -: 'Timestamp' and 'float' since one of the last updates. I think it now has problems with type coercion from the NaN at the first index. Does anyone have the same problem? Seems like such a basic issue

#

print(pd.to_datetime(df['datetime'].shift(1), infer_datetime_format=True))

because this doesnt work either

#

and it definitely worked in at least pandas 0.18 to 0.21
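
I can't say what changed between pandas versions here, but a sketch that sidesteps the manual shift is `Series.diff()`, which subtracts the previous row directly and leaves a missing value in the first position. The sample timestamps below are invented:

```python
import pandas as pd

# Invented sample data standing in for the real 'datetime' column.
df = pd.DataFrame({"datetime": ["2018-08-27 09:00", "2018-08-27 10:30", "2018-08-27 13:30"]})
df["datetime"] = pd.to_datetime(df["datetime"])

# diff() yields a timedelta Series; convert it to fractional hours.
opgroup = df["datetime"].diff().dt.total_seconds() / 3600
print(opgroup)
```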

polar acorn Aug 28, 2018, 2:05 PM

#

So I have two pandas dataframes with timestamp columns of different granularity. Is there a way to floor one of them to the other? That is, every timestamp in df1 that falls between timestamps t1 and t2 in df2 should take on the value of t1.
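
One option that might fit is `pd.merge_asof` with `direction="backward"`, which matches each row of the left frame to the last key in the right frame that is not later than it. A sketch with invented frames; `ts` and `ts_floor` are hypothetical column names, and both frames need to be sorted by the key:

```python
import pandas as pd

# Fine-grained timestamps (df1) and coarse-grained ones (df2), both invented.
df1 = pd.DataFrame({"ts": pd.to_datetime(
    ["2018-08-28 14:01", "2018-08-28 14:07", "2018-08-28 14:12"])})
df2 = pd.DataFrame({"ts": pd.to_datetime(
    ["2018-08-28 14:00", "2018-08-28 14:05", "2018-08-28 14:10"])})

# Keep a copy of the coarse timestamp so it survives the merge as its own column.
df2["ts_floor"] = df2["ts"]

# For each df1 row, pick the latest df2 timestamp that is <= the df1 timestamp.
floored = pd.merge_asof(df1, df2, on="ts", direction="backward")
print(floored)
```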

void anvil Aug 29, 2018, 7:14 PM

#

naconda3\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)

What's the new call for NNs, MLP, etc.?

elfin yacht Aug 31, 2018, 3:13 AM

#

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

cursive glade Aug 31, 2018, 11:01 AM

#

does the covariance in the upper left corner look like human error while trying to train the algorithm or just a problem that cant be solved by the algorithm ?

📎 unknown.png

noble plinth Aug 31, 2018, 12:08 PM

#

Hello, I am pretty new to ML stuff and was just wondering where I could start reading and learning how to go about "doing" it.

#

I want to do this as a school project and try to make the AI play snake

lean ledge Aug 31, 2018, 12:20 PM

#

@noble plinth maths is a good place to start. Finish till about calc 3 and linalg 2

noble plinth Aug 31, 2018, 12:22 PM

#

idk what it's called, or if you have this in the country you are. But I am in my third year of Engineering Science

#

I've had math for 2 years everyday

lean ledge Aug 31, 2018, 12:22 PM

#

Done differential equations in maths then, I'm guessing?

noble plinth Aug 31, 2018, 12:22 PM

#

yes

#

If I understand you correctly.

#

basically solving equations right?

lean ledge Aug 31, 2018, 12:23 PM

#

Can solve d²x/dt² = -kx?

noble plinth Aug 31, 2018, 12:24 PM

#

that particular one i think?

#

1 sec i'll come back to ya

lean ledge Aug 31, 2018, 12:24 PM

#

Which country are you in?

noble plinth Aug 31, 2018, 12:24 PM

#

sweden

lean ledge Aug 31, 2018, 12:25 PM

#

Hmhm

noble plinth Aug 31, 2018, 12:25 PM

#

i can give you what x is

lean ledge Aug 31, 2018, 12:25 PM

#

Know what grad, div and curl are?

noble plinth Aug 31, 2018, 12:25 PM

#

grad?

lean ledge Aug 31, 2018, 12:25 PM

#

In terms of vector calculus

noble plinth Aug 31, 2018, 12:25 PM

#

i think so ye

lean ledge Aug 31, 2018, 12:26 PM

#

Should be able to start ML properly then. I recommend Columbia's machine learning course on edX (it's free to audit)

noble plinth Aug 31, 2018, 12:27 PM

#

it's just that i have to translate all your terms to swedish to understand xd

lean ledge Aug 31, 2018, 12:27 PM

#

If you find it's too rigorous and mathsy for you, Andrew Ng's ML course on coursera is also okay

#

Yeah sorry about that, wish I could speak Swedish

noble plinth Aug 31, 2018, 12:27 PM

#

alrighty, thanks a lot

#

xd, it's not your fault

olive trench Aug 31, 2018, 8:04 PM

#

Guys I want to look into continually training a neural network. Are there any sources? I can't find anything on Google

#

Basically I want to have a pre-trained model and then add more examples continually. Do I just use partial fit? Do I fit it as many times as the network has been trained for?

#

Or do I wait for more examples and do a batch?

small ore Aug 31, 2018, 8:15 PM

#

@noble plinth AndrewNg's course has some kind of translation or transliteration. I am not sure to what extent it is good

void anvil Sep 4, 2018, 9:07 PM

#

I'm trying to run just a simple multioutput ANN on some data.

I've split my data into a train / test split for my predictors X and my predicted values Y (x_train, x_test, y_train, y_test; train is what the model should learn on and test is what the model should try to maximize prediction accuracy on). Trying to figure out the call for any multioutput train in scikit to just double check I have everything set up correctly.

I've only ever used the Weka GUI so I'm new to the python thing, trying to find the basic calls

#

There's stuff like this:

https://medium.com/technology-invention-and-more/how-to-build-a-multi-layered-neural-network-in-python-53ec3d1d326a

#

where it clearly shows you how to build your own ANN, but I would assume that there are a ton of preexisting packages

void anvil Sep 5, 2018, 12:53 AM

#

I guess I'm trying to ask is there a way to manually set train/test sets for k-fold validation

#

instead of relying on auto splits

humble loom Sep 5, 2018, 3:09 AM

#

I have a new computer running Windows 10 and I just installed a 1070ti. I cannot get python to recognize the GPU. Is there a doc that provides info on installing everything so that Python will recognize the GPU?

#

I think I need the CUDA Toolkit but the install keeps failing

turbid bay Sep 5, 2018, 3:33 PM

#

yh im pretty sure u 100% need the CUDA Toolkit

#

i had the same problem with my 1070ti when trying to install it

#

eventually, instead of selecting install all, I had to go through each section of the toolkit and install each one individually

velvet anchor Sep 5, 2018, 3:51 PM

#

CUDA can be quite fickle

#

it took me a couple days to install it for my Tesla card but I forget what I did to fix that issue

high ocean Sep 5, 2018, 4:27 PM

#

Yo. Does anyone here use xarray datastructures?

#

I have a few questions that docs don't resolve.

naive hornet Sep 5, 2018, 4:28 PM

#

(once again, the explicit question is significantly more likely to attract a response)

high ocean Sep 5, 2018, 4:33 PM

#

Thanks for guiding me through this @naive hornet

#

Ok.
I have an xarray.Dataset with dims x,y,t
lets call that foo
when I run foo.isel(t = 0) I get a subset xarray.Dataset called bar
bar,unlike foo only has the dims x, and y
the selection, for some reason dropped the t coordinates/dimensions despite this particular slice, still having an associated t value.
Any idea on how I can select single values without dropping dimensions?

thorn river Sep 6, 2018, 5:46 PM

#

I've tried for a long time and can't seem to figure it out. Anyway, if anyone feels like answering this it'd be great hahah.

Question description:

Complete function features , which takes a signal (as a three-dimensional array), a list of feature functions, and returns an array with all the features extracted by the functions concatenated together. Each feature function in the list can be applied to a 3-dimensional signal of shape (example, channel, time), and will reduce it to a two-dimensional array along the axis specified via the axis keyword argument.

You may find the functions np.concatenate or np.hstack useful.

Example list of feature functions:
summaries = [np.mean, np.min, np.max, np.std]
Given input of the following dimensions: (677, 204, 200)

Write the function in such a way that the output will have the following dimensions : (677, 816)
So basically the question now is: whats the function for this.

feral lodge Sep 7, 2018, 11:56 AM

#

@thorn river If I were you, I'd break this problem down and solve it incrementally!

First, when working with multi-dimensional data, it's useful to get a good mental picture of what the data actually is, or could be, to make it less abstract and to realize what we're actually doing. Your (sample, channel, time) data has dimensions (677, 204, 200). You can imagine that this data was recorded in a router in an office, where everyone constantly uses the internet. 204 people can connect to the router. Your data can then be imagined to be the recorded data usage of those 204 people, recorded for 200 minutes each day over a 677-day period. In that data, element [i, j, k] is the data usage (number of bytes) of person j, during minute k of day i.

The inputs to your function are the data recordings and the summaries list -- the functions in this list are different statistics, i.e. functions for extracting one-number summaries of data sequences. The output of your function is a matrix containing the result of applying each summary to each channel, for each day. So if summaries = [np.mean], your output will have dimension (677, 204). In that output, element [i, j] will be the mean data usage of channel j over the 200-minute recording period, on day i.

#

Now, instead of trying to solve the entire problem all at once, I would do the following:

1) Figure out how to extract the recordings for one day (this will be a 204x200 matrix) and apply one of the summaries to each channel for that day. This will produce a 204-length vector, since we have summarized the 200-minute data usage recording to one number for each of the 204 channels.

2) Figure out how to apply all of the summaries for that one day. This will probably involve placing some of the code from 1) inside a for loop, looping over summaries. Each iteration of the loop will produce a 204-length vector, just like in 1). According to the problem description, these vectors should be stacked, or concatenated, using np.hstack or np.concatenate. In the end, this will produce a (204*len(summaries))-length vector. If summaries contains 4 functions, like you showed, then the length will be 816.

3) Figure out how to apply the method of 2) to each of the 677 days in the data. This will probably involve putting the code from 2) inside a for loop, looping over range(677). (There may be more efficient methods than looping, but looping is probably the simplest to understand.) Now, each iteration of this loop will produce a (204*len(summaries))-length vector, just like in 2). According to the problem description, each of these vectors should constitute the rows in the final output matrix -- so one row for each of the 677 days. The dimension of the final output is thus (677, 204*len(summaries)). If summaries contains 4 functions, then the dimension will be (677, 816).

#

So if you do it like this, your function might look sort of like this:

def apply_summaries_to_data(data, summaries):
    # This function contains the code from 3)
    output = np.empty((data.shape[0], len(summaries)*data.shape[1]))

    for sample in range(data.shape[0]):
        # This loop contains the code from 2)
        output_row = np.empty(0)

        for summary in summaries:
            # This loop contains the code from 1)

            # apply summary to data[sample, :, :].
            # concatenate result to output_row.

        output[sample, :] = output_row
    return output

thorn river Sep 7, 2018, 12:21 PM

#

@feral lodge Thank you so much for the directions/explanation, I can work with this 😃

#

I'll let you know later how it went

feral lodge Sep 7, 2018, 12:22 PM

#

Good luck friendo! 👌

feral lodge Sep 7, 2018, 12:53 PM

#

@high ocean I've never worked with xarray and I can't really stay and help, but when I try with some random data, I get this:

>>> 
>>> # create random dataset
>>> ds = xr.DataArray(np.random.rand(4, 3), [('x', range(4)), ('t', range(3))]).to_dataset(name="my-data")
>>> 
>>> 
>>> # Get slice
>>> ds.isel(t=0)
<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
    t        int64 0
Data variables:
    my-data  (x) float64 0.3875 0.331 0.07215 0.7621
>>>

You mean it's not enough to have the info

    t        int64 0

and you would like to be able to keep indexing this slice with other values of t? There's probably no way of doing that in that case. Your data is a 3D-matrix, which you can imagine as a stack of papers. The stack of papers has height, width, and thickness, ie (x, y, t). When you extract a slice, say at t = 5, that's like removing the sixth paper from the stack and regarding that separately from the rest. A single sheet of paper has the same height and width as the stack, but has no thickness, so it has no t-dimension. The only use for indexing other t's would be to get other papers from the stack, which you cannot from this single sheet; you need to go back to the original stack.

high ocean Sep 7, 2018, 5:05 PM

#

strange. when I do the same thing it doesn't have t in the coordinates.

#

What version are you using?

#

Also I realized that I was applying this to a dataset rather than dataarray

#

I'll play around with it to see if splitting is different on the data-array level.

feral lodge Sep 7, 2018, 6:55 PM

#

My xarray version is 0.10.8! I get the same result for both datasets and dataarrays

high ocean Sep 7, 2018, 6:58 PM

#

I'll switch versions and see if things are different!

thorn river Sep 8, 2018, 3:40 PM

#

@feral lodge I still can't figure out how to get it to work.

def apply_summaries_to_data(data, summaries):
    # This function contains the code from 3)
    output = np.empty((data.shape[0], len(summaries)*data.shape[1]))

    for sample in range(data.shape[0]):
        # This loop contains the code from 2)
        output_row = np.zeros(0)

        for summary in summaries:
            # This loop contains the code from 1)
            stat = summary(data[sample, :, :], axis = 1)
            
            np.concatenate((output_row, stat))
            # apply summary to data[sample, :, :].
            # concatenate result to output_row.

        output[sample, :] = output_row
    return output

Gives the error: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-88d10ee6ba05> in <module>()
----> 1 apply_summaries_to_data(data, summaries)

<ipython-input-64-e1995b578e68> in apply_summaries_to_data(data, summaries)
     15             # concatenate result to output_row.
     16 
---> 17         output[sample, :] = output_row
     18     return output

ValueError: could not broadcast input array from shape (0) into shape (816)

feral lodge Sep 8, 2018, 3:44 PM

#

np.concatenate((output_row, stat)) returns the concatenation! To overwrite the vector in output_row, you'll have to do output_row = np.concatenate((output_row, stat))
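
A tiny illustration of the difference, with made-up arrays: calling `np.concatenate` on its own leaves the inputs untouched, so the result has to be assigned back.

```python
import numpy as np

output_row = np.zeros(0)
stat = np.array([1.0, 2.0])          # made-up summary values

np.concatenate((output_row, stat))   # result is discarded, output_row is unchanged
print(output_row.shape)

output_row = np.concatenate((output_row, stat))   # rebinding keeps the result
print(output_row.shape)
```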

thorn river Sep 8, 2018, 3:44 PM

#

o my god

#

that makes so much sense

#

completely forgot about that

feral lodge Sep 8, 2018, 3:45 PM

#

Easy mistake to make!

#

Numpy and pandas functions usually work like that. It makes it very easy to chain operations

thorn river Sep 8, 2018, 3:46 PM

#

Nice! Alright, time to test it with predictions

#

My first attempt at this exercise kept giving me the same accuracies, which did not agree with how the accuracies should have been

#

Hmmm

#

Anyway, @feral lodge thanks a lot for the help!!!! Very helpful to think about the arrays with for loops like that.

Made me better understand what I was actually doing as opposed to my first function to solve this exercise

def features_stat(array, sumstat):
    stat = []
    
    for i,j in enumerate(array):
        stat.append(sumstat(array[i], axis = 1))
    stat_np = np.array(stat)
    return stat_np


def features(signal, functions):
    x = 0
    for i in range(len(functions)):
        if i == 0:
            b = features_stat(signal, functions[i])
            x = b
        else:
            x = np.concatenate((x, features_stat(signal, functions[i])), axis = 1)
    return x
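
Both versions loop in Python. Since every summary function accepts an `axis` argument, the loops over samples can probably be dropped by reducing the time axis of the whole 3-D array at once. A sketch under that assumption; `features_vectorized` is a made-up name and the small random signal only stands in for the real (677, 204, 200) data:

```python
import numpy as np


def features_vectorized(signal, functions):
    # Each function reduces the time axis (axis=2):
    # (example, channel, time) -> (example, channel).
    # The per-function results are then placed side by side.
    return np.concatenate([f(signal, axis=2) for f in functions], axis=1)


summaries = [np.mean, np.min, np.max, np.std]
signal = np.random.rand(6, 4, 5)     # stand-in for the real data
out = features_vectorized(signal, summaries)
print(out.shape)
```

The column order matches the looped version: all channel means first, then all minima, and so on.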

feral lodge Sep 8, 2018, 4:08 PM

#

No worries! Did your perceptron thing work out?

#

It seems like weird data to use for linear regression if you ask me o:

thorn river Sep 8, 2018, 4:11 PM

#

Yeah we were told to use a perceptron for it haha

#

Still got stuck at 0.2 accuracy

#

I found out that it only predicted 5

radiant notch Sep 9, 2018, 11:05 AM

#

Hello
I've used a neural network template from the internet
And modified it slightly to use my own data

#

https://pastebin.com/Yuc6ZtHt

#

But it always outputs numbers that are very close to 1
I've tried to train it with this data:
dataset = [[1,2],[2,3],[3,4]]
targetDataset = [[3,4],[4,5],[5,6]]

#

So by inputting [1,2], you'd expect the neural network to output something close to [3,4] after 1000 training runs

#

But no, it outputs "[[0.99944112] [0.9995817 ]]"

#

At least it got one thing right, the second output is bigger than the first.

wary willow Sep 9, 2018, 11:48 AM

#

What does list.index() return if there isn't any of that entry?

#

Oops wrong chat

placid snow Sep 9, 2018, 11:49 AM

#

 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.

#

It doesnt return anything, it raises an exception

wary willow Sep 9, 2018, 11:50 AM

#

Is there a way to do an if statement that checks if an entry exists in a list?
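
For reference, the membership operator `in` does this. A minimal sketch with a made-up list:

```python
entries = ["a", "b", "c"]   # made-up list

# 'in' checks membership without raising, unlike list.index().
if "b" in entries:
    position = entries.index("b")
else:
    position = None

print(position)
```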

radiant notch Sep 9, 2018, 11:52 AM

#

you're in the wrong channel...

radiant notch Sep 9, 2018, 12:27 PM

#

Can anyone help me with my issue?

small ore Sep 9, 2018, 12:30 PM

#

@radiant notch You don't generally get replies in this channel as fast as in the help channels but you are in the right channel. Wait it out.

radiant notch Sep 9, 2018, 12:30 PM

#

I'm having the same issue with this code too.

#

https://pastebin.com/pqb9Esnk

#

Ok @small ore

open plaza Sep 9, 2018, 1:09 PM

#

can anybody here help me try plotting this csv file with seaborn? 😦

#

im on day 3 of no success

small ore Sep 9, 2018, 1:12 PM

#

!t ask

arctic wedgeBOT Sep 9, 2018, 1:12 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

timid yarrow Sep 9, 2018, 2:13 PM

#

Anyone here able to help me set one column based on the value of another column? Here's what I've got so far:
df['Group'] = 'Fixed Assets' if df['Item'] in FixedAssets else df['Item']
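
The `if ... else` expression there operates on the whole column at once rather than row by row, which is probably why it fails. A sketch of an element-wise alternative using `Series.isin` and `Series.mask`; the contents of `FixedAssets` and the sample frame are invented:

```python
import pandas as pd

FixedAssets = ["Building", "Vehicle"]            # invented list of item names
df = pd.DataFrame({"Item": ["Building", "Cash", "Vehicle"]})

# isin() gives one boolean per row; mask() replaces the rows where it is True
# and keeps the original Item everywhere else.
df["Group"] = df["Item"].mask(df["Item"].isin(FixedAssets), "Fixed Assets")
print(df)
```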

feral lodge Sep 9, 2018, 5:10 PM

#

The output of the last layer of neurons is passed through a sigmoid function σ(z) = 1/(1 + exp(-z)) https://en.wikipedia.org/wiki/Sigmoid_function.

σ(z) squashes z to a value between 0 and 1. This is useful for 2-class classification, but makes regression onto your targets (which is what you're trying to do) impossible, since the output will always lie between 0 and 1. The reason your output is always ≈ 1 is that you're training it on target values above 1, which means it's constantly being told "go higher, go higher!!", even though it can't

#

For the same reason, if we change the training targets in your last code from [4, 5, 6, 7, 8, 9, 10] to [-4, -5, -6, -7, -8, -9, -10], the predictions will not be 0.999996 ≈ 1; they'll be 1.90919584e-07 ≈ 0 🙂 The network is being told "go lower, go lower!!", even though it cannot go lower than 0
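
A quick way to see the squashing, using nothing beyond the sigmoid formula above (the sample inputs are arbitrary):

```python
import math


def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)), always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))


# Large negative inputs approach 0, large positive inputs approach 1,
# but no input ever produces a value like 3 or 4.
for z in (-20, -2, 0, 2, 20):
    print(z, sigmoid(z))
```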

#

@radiant notch

radiant notch Sep 9, 2018, 8:06 PM

#

Ok

#

What's a good way to make it so that it'll work on data that goes up

#

and is positive

#

Like...

#

[4, 5, 6, 7, 8, 9, 10]

#

@feral lodge

radiant notch Sep 9, 2018, 8:50 PM

#

Anyone?

feral lodge Sep 9, 2018, 10:35 PM

#

@radiant notch For regression you'll probably just want the identity activation function f(z) = z for the output layer -- the identity function has no restrictions on its output, so it can produce any number. Here's a list of activation functions with corresponding domain and range: https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions; it's good to have a look! However, we can't just switch output activation functions; they're related to the cost function that we optimize to train the network. So if we switch output activation function, we usually have to optimize a different cost function.

Before doing anything else though, you should try out a simpler model than a neural network. Neural nets are extremely flexible and can estimate a huge number of different functions, but that comes at the cost of requiring a lot of training data. If your training data consists of just a few points, a neural net is pretty much a useless model -- it will be much too overfit and will not be able to generalize to new, unseen, data.

On top of that, the flexibility of a neural network is often unneeded, since data often follows a linear, quadratic, or otherwise polynomial pattern! Before trying a neural net, you should try fitting a line or curve with linear regression. These models have much less flexibility, but that's both a strength and a weakness -- they require a lot less data for fitting.
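
As a rough sketch of that suggestion, here is a least-squares line fit on the three training pairs posted earlier, using `np.linalg.lstsq`. One assumption to flag: the two input columns in that data are perfectly correlated (the second is always the first plus one), so this sketch fits on the first column only, plus a bias term.

```python
import numpy as np

dataset = np.array([[1, 2], [2, 3], [3, 4]], dtype=float)
target_dataset = np.array([[3, 4], [4, 5], [5, 6]], dtype=float)

# Design matrix: first input column plus a column of ones for the intercept.
A = np.column_stack([dataset[:, 0], np.ones(len(dataset))])

# Solve A @ W ≈ targets in the least-squares sense; W has one column per output.
W, *_ = np.linalg.lstsq(A, target_dataset, rcond=None)

# Predict the outputs for an unseen input whose first value is 4.
prediction = np.array([[4.0, 1.0]]) @ W
print(prediction)
```

Unlike the sigmoid-output network, nothing here caps the predictions, so values above 1 are reachable.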

#

Finally, working with hand-written neural networks like in your files is great for learning the basics, especially if it's done in the context of a course and preferably written by yourself from the bottom up. In practice, however, you'll definitely want to work with a library like Keras, Tensorflow or ~~PyCharm~~ PyTorch. Then you don't have to change a bunch of code if you want to try a different output activation function, and you also won't have to manually implement the gradient descent, which gets very difficult when you have more than a single hidden layer. Linear models, polynomial models and many others (like tree-based models, which may also be a good idea for your data) can also be used from libraries; a lot of people like Scikit-learn

placid snow Sep 10, 2018, 7:39 AM

#

does it make sense to have a "custom" distance algorithm to use with A*, where I get the Manhattan distance, and multiply it with the Hamming distance?

#

It's solving a 0-8 board game

#

One of these games

📎 unknown.png
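
A small sketch that may help judge this, assuming the game is the 3x3 sliding puzzle with the goal layout shown in `GOAL` (the attachment isn't visible here, so that layout is a guess). It compares the product against the true number of moves for a state that is two moves from the goal:

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout, 0 is the blank


def hamming(state, goal=GOAL):
    # Number of tiles (blank excluded) that are not in their goal position.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def manhattan(state, goal=GOAL):
    # Sum over tiles of row distance plus column distance to the goal position.
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total


# Bottom row is "blank, 7, 8": sliding 7 left and then 8 left solves it in 2 moves.
state = (1, 2, 3, 4, 5, 6, 0, 7, 8)
true_cost = 2
product = hamming(state) * manhattan(state)
print(hamming(state), manhattan(state), product, true_cost)
```

Here the product is larger than the real remaining cost, so the combined heuristic can overestimate. A* only guarantees a shortest solution when the heuristic never overestimates, so the product may still find solutions quickly but not necessarily optimal ones.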

lean ledge Sep 10, 2018, 9:12 AM

#

To add on to what Sladon said, deep learning in general at the moment is sort of overrated. It has come leaps and bounds in situations like computer vision but for many tasks it's significantly slower and doesnt perform better than classic machine learning techniques. They're hyped and all but its worth learning more classical ML techniques. Decision trees or clustering or whatnot may not sound as cool as neural networks but they're more interpretable, faster and often just as well performing as NNs in many scenarios.

#

Personally I'm excited for how cool reinforcement learning has been recently. Lots of cool stuff happening in engineering such as in control theory where reinforcement learning based agents with trained policies are acting as controllers in complex systems that many people dont have the maths knowledge to analytically solve for

hasty maple Sep 10, 2018, 9:16 AM

#

An ML model is part of the feedback loop for a Control system eh? That's cool

lean ledge Sep 10, 2018, 9:18 AM

#

It is! Doing wonders in some parts of robotics

#

Reinforcement learning in a way is a control system on its own

#

Hell, the term agent is also used by mathematicians in control theory a bit

small ore Sep 10, 2018, 9:21 AM

#

@radiant notch Slandon means PyTorch, not PyCharm. Typo I guess

lean ledge Sep 10, 2018, 9:22 AM

#

lmao didnt even notice

small ore Sep 10, 2018, 9:22 AM

#

Raggy, did not know about 'Classical ML'. Good to know. Where do I learn about it?

lean ledge Sep 10, 2018, 9:24 AM

#

Almost any Machine Learning course will start with more classical ML stuff. Linear regression by gradient descent, logistic regression, KNN, clustering, etc

small ore Sep 10, 2018, 9:25 AM

#

Ah. So that is what you meant by classical ML. I learnt that but do not know what 'decision trees' etc are

lean ledge Sep 10, 2018, 9:26 AM

#

Search it up! https://en.wikipedia.org/wiki/Decision_tree

#

The more techniques you know as a data scientist, the better because you can choose the right tool for the job

#

Here's a slightly more ML based introduction rather than OR based https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052

small ore Sep 10, 2018, 9:28 AM

#

Thank you!

#

Also rags, donno if Portal jumper's nick is as per name policy

lean ledge Sep 10, 2018, 9:39 AM

#

I can mention them fine, shouldnt be a problem

small ore Sep 10, 2018, 9:41 AM

#

👍

lapis sequoia Sep 10, 2018, 11:19 AM

#

@small ore you can help me ?

small ore Sep 10, 2018, 11:20 AM

#

!t ask

arctic wedgeBOT Sep 10, 2018, 11:20 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

small ore Sep 10, 2018, 11:20 AM

#

No one helps one-on-one. Ask and wait

lapis sequoia Sep 10, 2018, 11:20 AM

#

This is true or false ?

📎 unknown.png

small ore Sep 10, 2018, 11:21 AM

#

That doesnt look like a question for this server. There are at least 5 different mathematics servers on discord which could help you

lapis sequoia Sep 10, 2018, 11:22 AM

#

which are these servers ? @small ore

small ore Sep 10, 2018, 11:22 AM

#

Can't give links here. DM me and I will see if I can get hold of a couple

glad pivot Sep 10, 2018, 2:57 PM

#

anybody here at the moment?

simple crag Sep 10, 2018, 2:59 PM

#

It's generally faster to get a response if you just ask your question instead of asking to ask your question.

glad pivot Sep 10, 2018, 3:00 PM

#

Yeah I know, but I also didn't want to go to another resource and get an answer and waste someone's time here.

#

This is a plot of a PCA component and the various weights associated with features. I'm a bit confused as to why the MOBI_REGIO score is indicative of low movement patterns, when it is a negative number.

📎 unknown.png

#

That same question goes for basically all of the other negative feature weights.

glad pivot Sep 10, 2018, 3:22 PM

#

https://stats.stackexchange.com/questions/26352/interpreting-positive-and-negative-signs-of-the-elements-of-pca-eigenvectors I was able to find my answer here

Cross Validated

Interpreting positive and negative signs of the elements of PCA ei...

If I center my variables and then run a PCA analysis, do I need to interpret negative eigenvectors different than positive eigenvectors?

Clarification: In my PCA analysis I have in a component both
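
For readers following along: the sign of a PCA weight only has meaning relative to the other weights in the same component, because flipping the sign of the whole component describes the same axis. A minimal NumPy sketch of that idea, using synthetic data (the two features and all variable names are made up for illustration and are not the poster's data set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic features that move in opposite directions.
a = rng.normal(size=200)
X = np.column_stack([a, -a + 0.1 * rng.normal(size=200)])
Xc = X - X.mean(axis=0)  # PCA assumes centered data

# Rows of vt are the principal axes (the component weights).
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = vt[0]

# Projecting onto pc1 or onto -pc1 captures exactly the same variance.
scores = Xc @ pc1
flipped_scores = Xc @ (-pc1)

print("weights:", pc1)
print("variance:", scores.var(), flipped_scores.var())
```

The two weights come out with opposite signs because the features are anticorrelated. A negative weight therefore says a feature moves against the positively weighted ones along that component, not that the feature is unimportant.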

wild haven Sep 10, 2018, 4:42 PM

#

i have a pandas data frame whose index refers to individual particles and whose columns hold information about those particles (vector components of momenta, amongst other things). i'd like to apply a function that takes all the data for a particular index and modifies that data in place (specifically, i want to perform coordinate transformations on the momenta). is there a "fast" way to do this via the dataframe, or do i have to just iterate over the index the slow way?

small ore Sep 10, 2018, 5:26 PM

#

Formula for the said transformation would be?

small ore Sep 10, 2018, 6:04 PM

#

@wild haven I cannot try out or search with just what you have provided. You could probably elaborate with an example or check one of the function application methods. Maybe this: https://pandas.pydata.org/pandas-docs/stable/basics.html#transform-with-multiple-functions

wild haven Sep 10, 2018, 6:14 PM

#

example of what to do with one of the rows:
df.columns returns ['rpx_1', 'rpx_2', 'rpy_1', 'rpy_2', 'rpz_1', 'rpz_2']
for each row, I'd like to sum the momenta and normalize the resulting vector:

new_x = row['rpx_1'] + row['rpx_2']
new_y = row['rpy_1'] + row['rpy_2']
new_z = row['rpz_1'] + row['rpz_2']
length = np.sqrt(new_x**2 + new_y**2 + new_z**2)
new_x = new_x / length
new_y = new_y / length
new_z = new_z / length
# return new_x, new_y, new_z as one row of a new data frame

#

i need one function that takes multiple values out of a given row, manipulates those values, and returns new values (rather than multiple functions)

#

ahh, think i got it @small ore ! i can use .apply and pick out values inside the function, then return a series that represents the new vector

#

thanks for the push in the right direction

small ore Sep 10, 2018, 6:26 PM

#

Um. Can you elaborate so that I can learn? I am only trying to learn by looking up on the web and trying out things

#

.apply seemed to change the column on which it is applied

wild haven Sep 10, 2018, 6:38 PM

#

import numpy as np
import pandas as pd

def transform_row(row):
    # pick out the six momentum components of this row
    data = row[['rpx_1','rpx_2','rpy_1','rpy_2','rpz_1','rpz_2']]
    rpx_sum = data['rpx_1'] + data['rpx_2']
    rpy_sum = data['rpy_1'] + data['rpy_2']
    rpz_sum = data['rpz_1'] + data['rpz_2']
    length = np.sqrt(rpx_sum**2 + rpy_sum**2 + rpz_sum**2)
    return pd.Series({'unit_x': rpx_sum/length, 'unit_y': rpy_sum/length, 'unit_z': rpz_sum/length})

# df is the input frame from above; axis=1 calls transform_row once per row
unit_vectors = df.apply(transform_row, axis=1)

#

this returns a data frame with 3 columns: unit_x, unit_y, unit_z

#

that declaration of data is probably superfluous

#

maybe not though, i don't have a good grasp of what's being done when .apply is being called
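
A side note on the "fast" part of the question: `.apply(..., axis=1)` still calls the Python function once per row. For this particular transformation, whole-column arithmetic gives the same result without the per-row call. A minimal sketch, assuming the six column names from the thread; the momentum values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up momenta; column names follow the thread.
df = pd.DataFrame({
    "rpx_1": [1.0, 0.0], "rpx_2": [2.0, 0.0],
    "rpy_1": [0.0, 3.0], "rpy_2": [0.0, 1.0],
    "rpz_1": [4.0, 0.0], "rpz_2": [0.0, 0.0],
})

# Sum the two momenta component-wise, one whole column at a time.
summed = pd.DataFrame({
    "unit_x": df["rpx_1"] + df["rpx_2"],
    "unit_y": df["rpy_1"] + df["rpy_2"],
    "unit_z": df["rpz_1"] + df["rpz_2"],
})

# Length of each summed vector, then divide every row by its own length.
length = np.sqrt((summed ** 2).sum(axis=1))
unit = summed.div(length, axis=0)

print(unit)
```

The result keeps the original index, so it can be joined back onto `df` if the unit vectors should live next to the raw momenta.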

wanton saffron Sep 10, 2018, 6:48 PM

#

Hi, not sure if this is the right place but I'm looking to display some data graphically in a line graph.

#

{
    "2018-09-09": 167,
    "2018-09-08": 786,
    "2018-09-07": 50,
    "2018-09-06": 148,
    "2018-09-05": 12,
    "2018-09-04": 18,
    "2018-09-03": 20,
    "2018-09-02": 8
}

#

This is a small sample, and in some places there aren't values for every day.

proud raven Sep 10, 2018, 6:56 PM

#

@wanton saffron What is your question? Is it how to show a line graph?

wanton saffron Sep 10, 2018, 6:59 PM

#

More so the best way to do it with a dataset this large

proud raven Sep 10, 2018, 7:01 PM

#

Can you be a little more specific about "best way"? Are you looking for module recommendations? Recommendations for how to display a lot of information? Recommendations for how to fill in missing days? etc.

wanton saffron Sep 10, 2018, 7:02 PM

#

How to display a lot of information

#

📎 Figure_1.png

#

This is my first attempt but the data isn't very clear at all

#

This is literally just by lining up the date and number and plotting it

#

Maybe a way to change the x axis to display dates by month so you can at least read those

proud raven Sep 10, 2018, 7:11 PM

#

The quickest general solution is to find the min/max date of your data and calculate a set number of intervals in between, say one per month.

#

😃

#

If you want something a little easier, or you want a module to do the heavy lifting, try plotly: https://plot.ly/python/time-series/

Time Series

How to plot date and time in python.

wanton saffron Sep 10, 2018, 7:12 PM

#

Thank you, I'll take a look

wild haven Sep 10, 2018, 7:16 PM

#

any decent plotting module should let you designate the increments of the axis

#

plotly, matplotlib, etc

wanton saffron Sep 10, 2018, 7:31 PM

#

📎 Capture.PNG

#

Is it possible to have the specific data points like in my first graph but with the labels just being monthly like it is here?

upbeat orbit Sep 11, 2018, 3:51 AM

#

@wanton saffron log scale y-axis (for the first graph)
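
The suggestions in this exchange (keep every data point, label the x axis by month, log-scale the y axis) can be combined in matplotlib. A minimal sketch, assuming the data arrives as the date-to-count mapping posted above; the `Agg` backend line is only there so the sketch runs without a display:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# The sample posted in the thread.
counts = {
    "2018-09-09": 167, "2018-09-08": 786, "2018-09-07": 50, "2018-09-06": 148,
    "2018-09-05": 12, "2018-09-04": 18, "2018-09-03": 20, "2018-09-02": 8,
}

# Parse the keys into real dates and sort them so the line runs left to right.
points = sorted((dt.date.fromisoformat(day), n) for day, n in counts.items())
days, values = zip(*points)

fig, ax = plt.subplots()
ax.plot(days, values, marker="o")  # every data point stays visible

# One tick label per month instead of one per day.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Log scale keeps small counts readable next to large spikes.
ax.set_yscale("log")
fig.autofmt_xdate()
# fig.savefig("counts.png")  # or plt.show() with an interactive backend
```

The eight sample days all fall within one month, so the monthly tick labels only become visible once the full, multi-month data set is plotted. Missing days need no special handling here: real dates are placed at their true position on the axis.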

brittle wing Sep 12, 2018, 11:02 AM

#

https://stackoverflow.com/questions/32669415/opencv-ordering-a-contours-by-area-python
I used this to sort my contours
But when I use cv2.drawContours(img,largestcon,-1,255,-1) it won't fill the contour

Stack Overflow

OpenCV: Ordering a contours by area (Python)

The OpenCV library provides a function that returns a set of contours for a binary (thresholded) image. The contourArea() can be applied to find associated areas.

Is the list of contours out outpu...

#

It will just give me the outline of the contour

#

Pls help

feral lodge Sep 12, 2018, 3:20 PM

#

From what I can see here https://docs.opencv.org/3.3.1/d6/d6e/group__imgproc__draw.html#ga746c0625f1781f1ffc9056259103edbc and here https://stackoverflow.com/questions/15340052/python-opencv-cv2-equivalent-for-cv-filled there's a thickness parameter in drawContours that defaults to drawing only the outline. It is the fifth positional argument, so pass it either by position or by keyword, not both. Try cv2.drawContours(img, largestcon, -1, 255, thickness=cv2.cv.CV_FILLED) or possibly cv2.drawContours(img, largestcon, -1, 255, thickness=cv2.FILLED) @brittle wing

brittle wing Sep 12, 2018, 10:21 PM

#

Thanks @feral lodge I'll try that

simple fjord Sep 13, 2018, 10:46 AM

#

Hi All

#

Would someone help me with PCA and regression ?

#

here is the question

#

https://stats.stackexchange.com/questions/366723/principle-component-regression-using-python

Cross Validated

Principle component regression using python

I have strain temperature data and I have read that article https://www.idtools.com.au/principal-component-regression-python-2/

I'm trying to build a model and predict the strain out of the temper...

rugged linden Sep 13, 2018, 4:28 PM

#

@glad pivot did you make this?

📎 unknown.png

glad pivot Sep 13, 2018, 4:29 PM

#

no

rugged linden Sep 13, 2018, 4:29 PM

#

do you know how to make it?

glad pivot Sep 13, 2018, 4:29 PM

#

yes

rugged linden Sep 13, 2018, 4:29 PM

#

do you know any tutorials on the internet?

velvet anchor Sep 13, 2018, 7:22 PM

#

@undone storm Andrew Ng's on Coursera is nice

undone storm Sep 13, 2018, 7:37 PM

#

I saw that one in the pinned messages too, I will definitely check that out thanks

lean ledge Sep 14, 2018, 2:13 AM

#

/r/LearnMachineLearning has a fantastic wiki that gives an overview of how to get started with ML based on your background and intention: https://www.reddit.com/r/learnmachinelearning/wiki/index
They also have a list of resources that could get you started with ML and some prerequisites:
https://www.reddit.com/r/learnmachinelearning/wiki/resource

lapis sequoia Sep 14, 2018, 11:57 AM

#

I'm getting started with pandas (again) and I'm running into an unexpected issue with the .plot() function.

#

📎 1.png

#

Why am I seeing decreases (dips) in the bottom plot? The first plot shows that 'Profit' is always positive, yet when plotting the cumsum of 'Profit' I don't get a monotonically increasing curve. Any ideas?

last pike Sep 14, 2018, 12:14 PM

#

@lapis sequoia would it be possible to post more relevant code?

#

My initial guess is something's wrong with your data

#

So if you could post the code that aggregates the data into a plottable array i could rule that out or say that's the issue

lapis sequoia Sep 14, 2018, 12:19 PM

#

@last pike Thanks for taking a look. I'm loading the data from a .csv file -- sorry, I'm not quite sure what you mean with your last sentence. But I think I have an idea. I'm assuming implicitly that my data is sorted according to 'DateSold' (it should be), but if it isn't, that could explain what I'm seeing here.

#

I'm checking this now.

last pike Sep 14, 2018, 12:22 PM

#

@lapis sequoia the x values suggest it's sorted right, could you post the csv?

lapis sequoia Sep 14, 2018, 12:23 PM

#

Yeah, that was it! Sorry if this was a very silly question. It wasn't perfectly sorted by 'DateSold'.

#

📎 3.png

last pike Sep 14, 2018, 12:23 PM

#

Ah, that works then

lapis sequoia Sep 14, 2018, 12:23 PM

#

Lesson learned! Thanks again for pointing me to it!

last pike Sep 14, 2018, 12:24 PM

#

Np, generally for graphing issues like that data will be the biggest potential point of failure

#

Make sure your data is exactly how you want it
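
The fix found in this exchange can be sketched as follows: a cumulative sum is taken in row order, so rows that are out of date order produce dips once the result is drawn against the date axis. The column names 'DateSold' and 'Profit' come from the thread; the three rows of data are made up for illustration:

```python
import pandas as pd

# Made-up rows, deliberately out of date order.
df = pd.DataFrame({
    "DateSold": pd.to_datetime(["2018-01-03", "2018-01-01", "2018-01-02"]),
    "Profit": [5.0, 10.0, 20.0],
})

# Cumulative sum in file order, then viewed along the date axis.
naive = df.assign(CumProfit=df["Profit"].cumsum()).sort_values("DateSold")

# Sort by date first, then accumulate.
fixed = df.sort_values("DateSold")
fixed = fixed.assign(CumProfit=fixed["Profit"].cumsum())

print(naive["CumProfit"].tolist())
print(fixed["CumProfit"].tolist())
print(fixed["CumProfit"].is_monotonic_increasing)
```

Checking `df["DateSold"].is_monotonic_increasing` right after loading is a cheap way to catch this before plotting.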

vapid swallow Sep 14, 2018, 12:46 PM

#

I'm completely new to matplotlib and I'm trying to make a live updating time series that reads from a serial port

#

I'm trying to set it up so that clicking a button starts reading and plotting the incoming data

#

But I'm not quite sure how to go about it, should I use ion() and a while loop or use FuncAnimation?
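
Nobody answered this in the channel, so here is one possible shape for the `FuncAnimation` route, as a sketch rather than a definitive answer. `read_sample` is a hypothetical stand-in that returns random numbers; in real use it would read and parse one value from the serial port. The buffer size, button position, and 100 ms interval are arbitrary choices:

```python
import random
from collections import deque

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.widgets import Button

MAX_POINTS = 200
samples = deque(maxlen=MAX_POINTS)  # keeps only the most recent readings

def read_sample():
    """Hypothetical stand-in for reading one value from the serial port."""
    return random.random()

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)  # leave room for the button
(line,) = ax.plot([], [])

def update(frame):
    """Called by FuncAnimation on every tick: read one sample and redraw."""
    samples.append(read_sample())
    line.set_data(list(range(len(samples))), list(samples))
    ax.relim()
    ax.autoscale_view()
    return (line,)

state = {"anim": None}  # holds a reference so the animation is not garbage collected

def start(event=None):
    """Button callback: begin animating on the first click only."""
    if state["anim"] is None:
        state["anim"] = FuncAnimation(fig, update, interval=100, save_count=MAX_POINTS)
        fig.canvas.draw_idle()  # the animation starts on the next draw

button = Button(fig.add_axes([0.4, 0.05, 0.2, 0.075]), "Start")
button.on_clicked(start)

# Call plt.show() here to open the window and enter the GUI event loop.
```

Compared with `ion()` and a `while` loop, this leaves the timing to the GUI event loop, so the window and the button stay responsive between reads. A serial read that blocks for long would still stall the plot and may need a timeout or a background thread.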

turbid bay Sep 14, 2018, 6:23 PM

#

I've been working on some basic machine learning code to predict the average test score from the amount of sleep and the amount of study (in hours). I thought a sigmoid function would be a good idea, since its output stays between 0 and 1, whereas a straight line could predict more than 100% on the test. However, my implementation isn't working well: my cost function is around 2.0 billion before gradient descent and around 2.5 billion after it. The code is below. What have I done wrong and how can I fix it? https://pastebin.com/y6fUU9xt

Pastebin

[Python] import math alpha = 0.0000001 theta = [1, 1, 1] de...
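
The pastebin code is not reproduced in this log, so the actual bug cannot be identified here. For comparison, this is a minimal working sketch of gradient descent with a sigmoid hypothesis. It rests on two assumptions that may not match the original code: scores are rescaled from percent to the 0 to 1 range so they are comparable with the sigmoid output, and the gradient includes the sigmoid's derivative. The data, starting values, and learning rate are all made up for illustration:

```python
import math

# Made-up data: (hours of sleep, hours of study) -> test score in percent.
samples = [(8, 2), (6, 5), (7, 1), (5, 8), (9, 4)]
scores = [65.0, 80.0, 55.0, 90.0, 78.0]

# Rescale targets to the 0..1 range that a sigmoid can actually output.
targets = [s / 100.0 for s in scores]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    return sigmoid(theta[0] + theta[1] * x[0] + theta[2] * x[1])

def cost(theta):
    """Half the mean squared error between predictions and targets."""
    m = len(samples)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(samples, targets)) / (2 * m)

def gradient_step(theta, alpha):
    """One batch gradient descent update."""
    m = len(samples)
    grad = [0.0, 0.0, 0.0]
    for x, y in zip(samples, targets):
        p = predict(theta, x)
        common = (p - y) * p * (1 - p)  # chain rule through the sigmoid
        grad[0] += common
        grad[1] += common * x[0]
        grad[2] += common * x[1]
    return [t - alpha * g / m for t, g in zip(theta, grad)]

theta = [0.0, 0.0, 0.0]
cost_before = cost(theta)
for _ in range(5000):
    theta = gradient_step(theta, alpha=0.01)
cost_after = cost(theta)

print("cost before:", cost_before)
print("cost after:", cost_after)
```

A cost that rises during gradient descent usually points to a gradient with the wrong sign or form, or a learning rate too large for the scale of the inputs. Printing the cost every few hundred iterations makes either problem visible early.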

small ore Sep 14, 2018, 9:20 PM

#

@simple fjord May I know what W_A1, TW_A1, etc. and T1 to T10 are in your data? Also what is Offsetwert?

#

Are there x,y,z ( co-ords) to your sensors? They may also be good inputs to your regression model.

#

From what I understood, you are trying to predict strain from temperature but given that your data seems to be time-dependent, do you want a straight correlation between those two or is there more to it?

simple fjord Sep 14, 2018, 9:26 PM

#

W_A1 is sensor 1 strain, TW_A1 is its corresponding temperature

#

@small ore

#

You can ignore T1 to T10

small ore Sep 14, 2018, 9:27 PM

#

There must be some relative co-ords to where these are mounted no?

simple fjord Sep 14, 2018, 9:27 PM

#

no just a straight correlation

#

yes there are

#

with expectation (t) = strain

#

with minimal error

small ore Sep 14, 2018, 9:28 PM

#

I do not understand you there

simple fjord Sep 14, 2018, 9:28 PM

#

Yes just a straight correlation

#

I tried curve fitting but the results were so poor

#

as you saw in the figure

#

How would the placement of the sensors help?

#

I just want to get the mean of all temperature sensors

#

with each strain sensor

#

and get the prediction

#

I think the mean of all temperature sensors is good?
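
The approach described here (average all temperature sensors, then predict one strain sensor from that average) can be sketched with an ordinary least-squares line. Everything below is synthetic: the real data was not posted, so the three temperature channels, the strain values, and the linear relation between them are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: three temperature channels and one strain channel.
temps = 20 + 5 * rng.random((50, 3))   # hypothetical TW_A1..TW_A3 readings
mean_temp = temps.mean(axis=1)         # one averaged temperature per time step
strain = 12.0 * mean_temp - 100 + rng.normal(scale=0.5, size=50)  # hypothetical W_A1

# Fit strain = slope * mean_temp + intercept by least squares.
slope, intercept = np.polyfit(mean_temp, strain, deg=1)
predicted = slope * mean_temp + intercept

# Root-mean-square error of the fit, in the same units as the strain.
rmse = np.sqrt(np.mean((predicted - strain) ** 2))

print("slope:", slope, "intercept:", intercept, "rmse:", rmse)
```

Averaging throws away any information about where each sensor sits. If the error of such a fit is large on the real data, fitting each strain sensor against its own temperature channel is a natural next comparison.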

small ore Sep 14, 2018, 9:32 PM

#

I do not know your component. But in general I would say strain depends on the position as well as the temperature (assuming there are no other loads). So I feel it is important

simple fjord Sep 14, 2018, 9:32 PM

#

you are correct

small ore Sep 14, 2018, 9:32 PM

#

Unless you are mounting several strain gauges about the same point