simple fjord Sep 14, 2018, 9:32 PM

#

so how would I use their placement information

#

no

#

several strain gauges in different positions

#

the strain sensor has a temperature sensor; it's a fiber optic one

small ore Sep 14, 2018, 9:33 PM

#

You can add relative co-ords as regression variables

simple fjord Sep 14, 2018, 9:33 PM

#

Can you explain more

#

I'm very beginner

#

Yesterday I tried PCA with regression

#

I got good prediction from 2 sensors out of 15
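
For reference, the PCA-then-regression approach mentioned above can be sketched with scikit-learn roughly like this. The data is synthetic, and the shapes (15 sensor columns, 3 components) are assumptions for illustration, not the actual strain-gauge setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 200 time steps from 15 correlated sensors driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 15)) + 0.05 * rng.normal(size=(200, 15))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=200)

# Principal component regression: scale, project onto a few components, fit a linear model.
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X[:150], y[:150])
r2 = pcr.score(X[150:], y[150:])  # R^2 on the held-out rows
print(f"held-out R^2: {r2:.3f}")
```

Sensor positions could be added as extra columns of X in the same way, if they vary between rows.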

small ore Sep 14, 2018, 9:36 PM

#

I am a beginner of some sort too. I am learning. If you take the first couple of lessons from a machine learning course (perhaps Andrew Ng's), you will know how to do multivariable regression. We can wait for someone else to answer you on PCA. Meanwhile I will try to learn what it is

simple fjord Sep 14, 2018, 9:37 PM

#

https://www.idtools.com.au/principal-component-regression-python-2/

Principal Component Regression in Python - Instruments & Data Tools

Learn about Principal Component Regression. A step-to-step tutorial to build a NIR calibration model using Principal Component Regression in Python.

#

I followed that article

#

maybe it's interesting for you

small ore Sep 14, 2018, 10:31 PM

#

Most certainly interesting. Now I need to read up other methods to reduce variables too 😃

void anvil Sep 14, 2018, 10:37 PM

#

With Scikit learn, is there a way to specify batches for online learners (like MLP)?

#

e.g.

#

```py
from sklearn.neural_network import MLPRegressor

clf = MLPRegressor(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(25, 25, 25), random_state=11)
clf.fit(x_train, y_train)
```

```MLPRegressor(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
            beta_2=0.999, early_stopping=False, epsilon=1e-08,
            hidden_layer_sizes=(25, 25, 25), learning_rate='constant',
            learning_rate_init=0.001, max_iter=200, momentum=0.9,
            nesterovs_momentum=True, power_t=0.5, random_state=11, shuffle=True,
            solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
            warm_start=False)```

#

And what I want to do is:

#

```
Batch 2: trains on 10-150 from x_train
Batch 3: trains on 9,2,159,37 from x_train
etc.
```

#

just hl me if you know
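
One possible way to do this, as far as I know: MLPRegressor has a partial_fit method that trains only on the rows you hand it, so you can choose the batches yourself. It is only available with the 'sgd' and 'adam' solvers, not 'lbfgs'. The data and the batch indices below are made up for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(11)
x_train = rng.normal(size=(200, 4))   # placeholder training data
y_train = x_train.sum(axis=1)

# partial_fit needs a stochastic solver; 'lbfgs' does not support it.
clf = MLPRegressor(solver="adam", alpha=1e-5,
                   hidden_layer_sizes=(25, 25, 25), random_state=11)

# Hand-picked batches of row indices (example values only).
batches = [np.arange(0, 10), np.arange(10, 150), np.array([9, 2, 159, 37])]
for epoch in range(20):
    for idx in batches:
        clf.partial_fit(x_train[idx], y_train[idx])  # one pass over just these rows

pred = clf.predict(x_train[:5])
```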

small ore Sep 15, 2018, 2:40 AM

#

@simple fjord Chapter 3 of https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf seems to have some methods for it

simple fjord Sep 15, 2018, 7:20 AM

#

@small ore Thanks so much, that's a very good book

radiant notch Sep 15, 2018, 9:07 AM

#

https://hastebin.com/ukukulayew.rb

#

Hello

#

My neural network is taken off of the internet because I'm a beginner and wanted to play around with it

#

But, I have a problem: All the neural network samples like this that I take off the internet use a sigmoid function which apparently means that the output can only ever be between 0 and 1

#

Furthermore, even if the output should be between 0 and 1 it doesn't seem to be giving correct responses

#

For example, [-3,-2,-1] would hopefully give 0

#

but instead it gives somewhere around 0.22

stone oasis Sep 15, 2018, 9:21 AM

#

what is that list of numbers

#

smoothstep or something? like, what are you doing with those three args and how do they interact with sigmoid

#

oh and the default or 'centered' value of a sigmoid function, i.e. its output for an input of 0, is 0.5, not 0
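
A quick numeric check of that, as a small NumPy sketch (the `sigmoid` helper is written out here, not taken from the pasted network):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

mid = sigmoid(0.0)
low = sigmoid(np.array([-3.0, -2.0, -1.0]))

print(mid)   # 0.5, the midpoint of the curve
print(low)   # all strictly between 0 and 0.5

# A target such as 4 can never be produced by a sigmoid output unit.
# Common workarounds: scale the targets into (0, 1) before training,
# or use a linear (identity) activation on the output layer.
```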

radiant notch Sep 15, 2018, 9:27 AM

#

Well

#

I'm just testing it at the moment

#

and giving it test runs where [1,2,3] = 4

#

and [3,2,1] = 0

#

stuff like that

stone oasis Sep 15, 2018, 9:28 AM

#

you know order matters right

#

or wait nvm

#

u just showed me lol

#

@radiant notch https://en.wikipedia.org/wiki/Vanishing_gradient_problem check this out, rly interestin stuff

Vanishing gradient problem

In machine learning, the vanishing gradient problem is a difficulty found in training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, each of the neural network's weights receives an update proportional to the partial deri...

radiant notch Sep 15, 2018, 12:46 PM

#

I know what that is

#

I've read some of it

#

And already knew anyhow

pine dune Sep 15, 2018, 7:15 PM

#

I am using this GitHub project and it is not working. I am getting this error:
Traceback (most recent call last):
  File "create_dataset.py", line 9, in <module>
    from predict import predict
  File "D:\AI\Game\Game-Bot-master\predict.py", line 3, in <module>
    from scipy.misc import imresize
  File "C:\Users\Fidgety\Anaconda3\envs\tensorflow\lib\site-packages\scipy\__init__.py", line 62, in <module>
    from numpy import show_config as show_numpy_config
ImportError: cannot import name 'show_config'
https://github.com/ardamavi/Game-Bot

GitHub

ardamavi/Game-Bot

Artificial intelligence learn playing any game with watching you. - ardamavi/Game-Bot

lapis sequoia Sep 16, 2018, 6:10 AM

#

!t resources

arctic wedgeBOT Sep 16, 2018, 6:10 AM

#

resources

It can be difficult to know where to begin when you are first starting out with Python. On our website, we have compiled a list of both free and paid resources that we recommend for learning and mastering Python.

It is hard to say exactly where you should start, as everyone will have a different preferred method of learning, but whether you like video tutorials, books or courses, you should find a suitable resource on our resources page.

lapis sequoia Sep 16, 2018, 6:23 AM

#

hey guyss

#

i am trying to learn data science using python

#

i have learnt how to use numpy

#

and some plotting

#

that was a free course on DataCamp

#

😉

#

now where should i head?

#

guide me

radiant notch Sep 16, 2018, 1:23 PM

#

I'm planning on making a neural network but the structure of one is obviously critical

#

So how can I get the neural network to change its own structure? Is this even possible? I don't want the network to be limited by my bad structure... if it is bad.

trail flicker Sep 16, 2018, 1:29 PM

#

@radiant notch if we could do that well, we'll have cracked the ai problem

radiant notch Sep 16, 2018, 4:47 PM

#

Surely there's a neural network that adapts in structure?

#

You'd just need to experiment with different structures when backpropagating?

radiant orbit Sep 16, 2018, 7:51 PM

#

What is the difference between SGD and BGD in linear regression? How are the training examples visited in both cases? Can someone please explain it to me in brief?

high ocean Sep 16, 2018, 8:04 PM

#

SGD = one update per single training pair. BGD = one update per many training pairs (batch)
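
To make the difference concrete, here is a rough sketch for linear regression with NumPy. The data, learning rates, and epoch counts are arbitrary choices. In both versions every training example is visited once per epoch; what differs is how often the weights get updated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, size=100)]   # bias column + one feature
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=100)

def batch_gd(X, y, lr=0.5, epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient averaged over ALL examples
        w -= lr * grad                      # one update per epoch
    return w

def sgd(X, y, lr=0.05, epochs=50, seed=0):
    order_rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order_rng.permutation(len(y)):   # every example once, in shuffled order
            grad = X[i] * (X[i] @ w - y[i])       # gradient from ONE example
            w -= lr * grad                        # one update per example
    return w

w_batch = batch_gd(X, y)
w_sgd = sgd(X, y)
print(w_batch, w_sgd)   # both should land near the true weights [2, -3]
```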

#

Also my phone is correcting SGD

#

and BGD for some reason

radiant orbit Sep 16, 2018, 8:27 PM

#

Thank you for your response. Let me clarify: in BGD, suppose I have selected the whole training set as the batch. Does that mean the average over the same set of examples produces one update in every epoch? And in SGD, do you mean that in each epoch I consider only one training example at a time?

junior ore Sep 16, 2018, 8:44 PM

#

Kindly post the answers for Quizzes 2, 3 and 4 in Applied Social Network Analysis in Python from Coursera
The course in the specialization Applied Data Science in Python is extremely abstract and challenging, the tutor extremely vague. My subscription to the specialization ends in less than 8 hours and my $49 USD will go in drain if I don't secure this specialization. Kindly post the answers for the quiz questions.

Quiz 2: https://www.coursera.org/learn/python-social-network-analysis/exam/tZYRH/module-2-quiz

Quiz 3: https://www.coursera.org/learn/python-social-network-analysis/exam/0qgIf/module-3-quiz

Quiz 4: https://www.coursera.org/learn/python-social-network-analysis/exam/CgIV0/module-4-quiz

Coursera

Coursera | Online Courses From Top Universities. Join for Free

1000+ courses from schools like Stanford and Yale - no application required. Build career skills in data science, computer science, business, and more.

trail flicker Sep 16, 2018, 8:44 PM

#

🤔 asking for course answers?

junior ore Sep 16, 2018, 8:45 PM

#

without other options

#

because it is inaccessible at least for me in spite of trying to get my head around it for the past 8 hours

trail flicker Sep 16, 2018, 8:46 PM

#

you do realize we can't see those without logging in, right?

junior ore Sep 16, 2018, 8:47 PM

#

my bad.. would you be able to help me if i send you the pictures?

silk acorn Sep 16, 2018, 8:48 PM

#

You aren't asking for help but for straight up answers

junior ore Sep 16, 2018, 8:49 PM

#

it would be great if you would be able to help me find the answers as well. understand i have reached this stage without other options

hearty hazel Sep 16, 2018, 8:51 PM

#

As stated in #303906096458891264, not gonna happen.

radiant orbit Sep 16, 2018, 8:52 PM

#

Can someone please respond to my previous question?

junior ore Sep 16, 2018, 8:53 PM

#

no issues .. lemme try it would be great if someone can just help me with the questions

hearty hazel Sep 16, 2018, 8:53 PM

#

You can ask specific questions in one of the designated help channels.

#

!ask

#

!t ask

arctic wedgeBOT Sep 16, 2018, 8:53 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

hearty hazel Sep 16, 2018, 8:53 PM

#

read that page for more info

junior ore Sep 16, 2018, 8:56 PM

#

the only problem i am grappling with is:

#

the tutor provided juvenile code to perform the functions and the quiz has got a network visual which is really difficult to replicate and also he did not teach the ways in which we can find stuff like node connectivity manually

#

it would help if anyone can provide a workaround so that I can climb this myself

simple crag Sep 16, 2018, 9:02 PM

#

Carte blanche ~~tutoring~~ giving you code with 8 hours to go still is not the purpose of this server

#

If you have a specific question then you can proceed to an unused help channel as previously directed

small ore Sep 17, 2018, 12:41 AM

#

I think it is against Coursera honor code to ask anyone on this planet for quiz answers. If you are not going for the certification you could even skip the quiz

polar acorn Sep 17, 2018, 8:25 AM

#

So he could potentially ask people on the international space station or on Mars for help? Or is in orbit considered on the planet ( I never read the coursera honor code, I just clicked agree)

lapis sequoia Sep 17, 2018, 5:55 PM

#

Hi, where can I learn scikit-learn for data science?

#

i watched a video

#

but i am not getting it

#

help me

#

@chrome spade i know u

#

u r justin from pramp server 😄

solid chasm Sep 17, 2018, 6:01 PM

#

@chrome spade no advertising

chrome spade Sep 17, 2018, 6:15 PM

#

@solid chasm Wasn't trying to advertise. Is there a place on this server I could post about a paid project?

#

@lapis sequoia that is correct 😃 Hello!

solid chasm Sep 17, 2018, 6:17 PM

#

nothing comes to mind, no, sorry

copper swan Sep 17, 2018, 7:45 PM

#

@rich yarrow you might find help here about that problem

#

and you can get some resources as well when you ask

rich yarrow Sep 17, 2018, 7:47 PM

#

Hello, I am wondering where I can learn more on how to "Use NumPy and Matplotlib to draw a scatterplot of uniform random (x, y) values all drawn from the [0, 1] interval"

#

How do I use both NumPy and Matplotlib to make one scatter plot..

#

```py
import numpy as np
```

rich yarrow Sep 17, 2018, 8:22 PM

#

What does the following mean?

#

Fixing random state for reproducibility

np.random.seed(19680801)

silk acorn Sep 17, 2018, 8:23 PM

#

pseudo-random numbers are generated from a seed

#

If you set the seed, you'll get the same random numbers every time
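
A tiny demonstration with NumPy, reusing the seed value from the question (any integer would do):

```python
import numpy as np

np.random.seed(19680801)
first = np.random.rand(3)

np.random.seed(19680801)      # reset to the same seed
second = np.random.rand(3)    # identical to `first`

print(np.array_equal(first, second))  # True
```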

rich yarrow Sep 17, 2018, 10:20 PM

#

import matplotlib.pyplot as plt
import numpy as np

How to generate a random number in the range [0, 1]
import random
x=random.randint(0,1)
print(x)
How to do that for two dimensions (x/y)
How to show that on a plot
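
A note on the snippet above: random.randint(0, 1) only ever returns the integers 0 or 1, not a float in between. One way to get uniform floats for both axes and plot them is sketched below; the number of points, the labels, and the output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)         # fixed seed so the picture is reproducible
n_points = 100                   # arbitrary choice

# One array per dimension, each drawn uniformly between 0 and 1.
x = np.random.uniform(0.0, 1.0, size=n_points)
y = np.random.uniform(0.0, 1.0, size=n_points)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("uniform_scatter.png")   # or plt.show() with an interactive backend
```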

#

Is anyone able to help me with this?

earnest prawn Sep 18, 2018, 1:03 AM

#

That sounds extremely assignmentish

simple fjord Sep 18, 2018, 1:46 PM

#

Does anyone have signal processing experience?

#

https://stackoverflow.com/questions/52366673/calculate-time-shift-between-two-signals-and-shifting/52367241?noredirect=1#comment91679450_52367241

Stack Overflow

Calculate time shift between two signals and shifting

I have a temperature sensor and a strain sensor, I would like to calculate the time delay between the two sensors.

def process_data_time_delay(temperature, strain, normal_data):
from scipy imp...

#

Would someone please help me with calculating the time delay between two signals ?

#

the data set is there in the post

#

I can't find the peaks and shift the signal using the max peak index
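
In case it helps others reading along, a common approach is to take the lag at which the cross-correlation of the two signals peaks, then cut both signals to their overlapping part. This is only a sketch on synthetic data, not the dataset from the post; `estimate_delay` and the variable names are made up.

```python
import numpy as np

def estimate_delay(delayed, reference):
    """Return the lag (in samples) at which `delayed` best matches `reference`."""
    a = delayed - np.mean(delayed)
    v = reference - np.mean(reference)
    corr = np.correlate(a, v, mode="full")
    lags = np.arange(-(len(v) - 1), len(a))   # lag value for each entry of `corr`
    return int(lags[np.argmax(corr)])

rng = np.random.default_rng(0)
reference = rng.normal(size=200)              # stand-in for the temperature signal
true_delay = 7
# Stand-in for the strain signal: the same shape, arriving 7 samples later.
delayed = np.concatenate([np.zeros(true_delay), reference[:-true_delay]])

lag = estimate_delay(delayed, reference)

# Shift the delayed signal back and trim both to the common length.
aligned = delayed[lag:]
trimmed_reference = reference[:len(aligned)]
print(lag, len(aligned))
```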

wary willow Sep 18, 2018, 3:49 PM

#

Is there a place I can go to learn about making python AIs/machine learning?

placid snow Sep 18, 2018, 3:51 PM

#

📌 pins should have a few sources iirc

velvet anchor Sep 18, 2018, 8:53 PM

#

Andrew Ng on coursera 👌

small ore Sep 18, 2018, 10:33 PM

#

That is not python though

velvet anchor Sep 18, 2018, 11:30 PM

#

the original message was about AI / ML 😛

tight ibex Sep 18, 2018, 11:32 PM

#

should i just ask the same here? not well versed in discord etiquette, sorry

#

sorry again, got the meaning of the emoji. used to irc kek

small ore Sep 19, 2018, 12:52 AM

#

I like math when explained like this: https://www.youtube.com/watch?v=FgakZw6K1QQ

YouTube

StatQuest with Josh Starmer

StatQuest: Principal Component Analysis (PCA), Step-by-Step

Principal Component Analysis, is one of the most useful data analysis and machine learning methods out there. It can be used to identify patterns in highly c...

▶ Play video

split zodiac Sep 21, 2018, 4:18 AM

#

posted in UI too, but I've been working on a new way to do data science stuff with IPython/Jupyter using graph nodes to connect IPython notebooks with data sources / control flows

#

📎 demo.png

lean ledge Sep 21, 2018, 5:40 AM

#

That looks super cool, whoa

polar acorn Sep 21, 2018, 7:28 AM

#

Matrix looking

polar acorn Sep 21, 2018, 8:11 AM

#

I'm doing a lot of similar Jupyter notebooks for testing some models. I would like to print the same metrics for the results every time, but don't want to copy paste those cells into every notebook. I will of course write an external function and import to all notebooks. But is there a way for this function to output things in several cells? I would like something like this:
cell 1 {
from external_helpers import print_results
print_results(y_hat, y)}
cell 2 {
Print accuracy in %
}
cell 3 {
Plot of y and y_hat
}
cell 4 {
Print a confusion matrix
}
etc. etc.
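
As far as I know, a function can only write into the output area of the cell that calls it, so one practical option is a single helper whose outputs all appear below that one cell. A minimal sketch using only NumPy follows; `print_results` is the hypothetical name from the message above, and the plotting step is left out.

```python
import numpy as np

def print_results(y_hat, y):
    """Print accuracy and a confusion matrix; return both for further use."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    accuracy = float(np.mean(y_hat == y))

    labels = np.unique(np.concatenate([y, y_hat]))
    index = {label: i for i, label in enumerate(labels)}
    confusion = np.zeros((len(labels), len(labels)), dtype=int)
    for true, pred in zip(y, y_hat):
        confusion[index[true], index[pred]] += 1   # rows: true label, columns: prediction

    print(f"Accuracy: {accuracy:.1%}")
    print("Confusion matrix (rows = true, columns = predicted):")
    print(confusion)
    return accuracy, confusion

accuracy, confusion = print_results(y_hat=[0, 1, 1, 0], y=[0, 1, 0, 0])
```

Returning the values as well as printing them lets later cells reuse the numbers without recomputing.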

simple fjord Sep 22, 2018, 7:21 AM

#

Hi

#

would someone help me with this please ?

#

https://stackoverflow.com/questions/52443511/cut-two-signals-in-numpy

Stack Overflow

Cut two signals in numpy

I have the following situation, I did ACF auto correlation between temperature and strain.
I got a shifted signal in temperature and I already shifted it.

Of course the shifted signal has shorter

proud raven Sep 22, 2018, 3:01 PM

#

@polar acorn That's a good use case for https://github.com/nteract/papermill

GitHub

nteract/papermill

📚 Parameterize, execute, and analyze notebooks. Contribute to nteract/papermill development by creating an account on GitHub.

polar acorn Sep 22, 2018, 3:24 PM

#

@proud raven Thanks, I'll check it out

viscid aspen Sep 23, 2018, 3:09 PM

#

Hey, how could I make something like

sensors[any(sensors['Zone'].str.contains(sensor) for sensor in relevant_sensors)]

work in pandas ?
relevant_sensors is a list of strings. (The error I'm getting is about the ambiguity where normally I'd need a bitwise operation instead of a boolean one)

placid snow Sep 23, 2018, 3:11 PM

#

Maybe something like py sensors[sensors['Zone'] in relevant_sensors]? and index is for the first one

#

Without seeing the data, or trying it.

viscid aspen Sep 23, 2018, 3:12 PM

#

I have to make sure to use str.contains cause I want to match e.g. SENSOR_1.203 while there's only SENSOR_1 in relevant_sensors

placid snow Sep 23, 2018, 3:13 PM

#

What is your df storing

#

and what are you trying to get with that slice? The first one that is in relevant sensors?

viscid aspen Sep 23, 2018, 3:15 PM

#

so let's say there's SENSOR_1, SENSOR_2, SENSOR_3 in my relevant_sensors but the values in sensors['Zone'] would be something like SENSOR_1.201, SENSOR_1.202, SENSOR_2.001, SENSOR_5.1234. The result should therefore be a DF sensors where sensors['Zone'] values are only the first 3 (everything except SENSOR_5.1234)

#

does that make sense?

placid snow Sep 23, 2018, 3:16 PM

#

But are they strings?

#

or some object

viscid aspen Sep 23, 2018, 3:16 PM

#

yep, sorry

placid snow Sep 23, 2018, 3:17 PM

#

What do you get if you try my suggestion above then?

viscid aspen Sep 23, 2018, 3:18 PM

#

I get The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

#

(error)

#

just to clarify... the relevant_sensors list is a primitive python list of strings and the column sensors['Zone'] has values of the pandas object type

#

@placid snow I can do what you're suggesting if I convert relevant_sensors to a np.array and then do sensors[sensors['Zone'].isin(relevant_sensors)] but that gives me only the values that are equal to some string in relevant_sensors, not those that merely contain that value

placid snow Sep 23, 2018, 3:24 PM

#

Did a bit of testing and googling, but what about something like this (similar to yours) ```py
import pandas as pd

df = pd.DataFrame()
df["Zone"] = ["SENSOR_1.201", "SENSOR_1.202", "SENSOR_2.001", "SENSOR_5.1234"]
relevant = ["SENSOR_1", "SENSOR_2", "SENSOR_3"]
df[df["Zone"].str.contains("|".join(relevant))]
# Output:
#            Zone
# 0  SENSOR_1.201
# 1  SENSOR_1.202
# 2  SENSOR_2.001
```

#

but it doesn't use any(); it joins all the strings in relevant into a single "or"-style regex

viscid aspen Sep 23, 2018, 3:26 PM

#

Well, it gets the job done. Thanks a lot.

#

So .str.contains can be used with a regex?

#

(I assume?)

placid snow Sep 23, 2018, 3:28 PM

#

That's what i gathered from it

#

That it treats the search pattern as a regex by default

#

so we just create a regex which is basically this or this or this
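
One caveat with building the pattern this way: characters such as "." have a special meaning in a regex. A slightly more defensive sketch escapes each name first, using the same example data as above:

```python
import re
import pandas as pd

df = pd.DataFrame({"Zone": ["SENSOR_1.201", "SENSOR_1.202", "SENSOR_2.001", "SENSOR_5.1234"]})
relevant = ["SENSOR_1", "SENSOR_2", "SENSOR_3"]

# Escape each name so regex metacharacters are matched literally, then OR them together.
pattern = "|".join(re.escape(name) for name in relevant)
matches = df[df["Zone"].str.contains(pattern, regex=True)]
print(matches)
```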

viscid aspen Sep 23, 2018, 3:32 PM

#

would still be useful to know if I can somehow put my own functions/logic inside the [] of a dataframe, like I wanted to do with any(). This time there was a workaround but sometimes there might not be.

placid snow Sep 23, 2018, 3:33 PM

#

Not that I'm aware of

#

General logic + some of their methods

#

afaik
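
For what it's worth, the [] of a DataFrame accepts any boolean Series of matching length, so custom logic can go in as long as it is evaluated per row first. The original any(...) attempt fails because any() asks each whole Series for a single truth value. A small sketch with the same example data; `is_relevant` is a made-up helper name.

```python
import pandas as pd

sensors = pd.DataFrame({"Zone": ["SENSOR_1.201", "SENSOR_1.202", "SENSOR_2.001", "SENSOR_5.1234"]})
relevant_sensors = ["SENSOR_1", "SENSOR_2", "SENSOR_3"]

def is_relevant(zone):
    """True if the part before the first "." is one of the relevant sensor names."""
    return zone.split(".")[0] in relevant_sensors

mask = sensors["Zone"].apply(is_relevant)   # boolean Series, one value per row
result = sensors[mask]
print(result)
```

Comparing the part before the dot also avoids matching a name like SENSOR_10 when only SENSOR_1 is listed.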

viscid aspen Sep 23, 2018, 3:38 PM

#

alright, thanks for the help 😃

placid snow Sep 23, 2018, 3:39 PM

#

Anytime

dreamy tapir Sep 23, 2018, 5:13 PM

#

What is the easiest-to-use neural network lib? I want something like

```py
import nnetwork as nn
MyNet = nn.layers(input=3, hidden=1, output=2)
MyNet.train([[[0,1,0],[0,1]], [[1,0,0],[1,0]]])
print(MyNet.calculate([0,0,1]))
```

hasty maple Sep 23, 2018, 5:22 PM

#

keras maybe

dreamy tapir Sep 23, 2018, 5:25 PM

#

Doesn't seem that easy.

#

The closest one I found is http://neupy.com

finite solar Sep 23, 2018, 5:47 PM

#

deepy?

dreamy tapir Sep 23, 2018, 5:52 PM

#

Actually, PyBrain now looks like the easiest one I've found.

#

```py
>>> net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
>>> trainer = BackpropTrainer(net, ds)
```

#

It is actually easier than I wanted! Cool!

polar acorn Sep 23, 2018, 6:18 PM

#

scikit-learn also has a limited selection of networks that is very easy to get started with
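
For example, something close to the wished-for three-line API could look like this with scikit-learn's MLPRegressor. The hidden layer size and solver settings are arbitrary picks, and two training rows are far too few for a meaningful model.

```python
from sklearn.neural_network import MLPRegressor

# Two training pairs: 3 inputs -> 2 outputs, mirroring the example above.
X = [[0, 1, 0], [1, 0, 0]]
y = [[0, 1], [1, 0]]

net = MLPRegressor(hidden_layer_sizes=(4,), solver="lbfgs", max_iter=1000, random_state=0)
net.fit(X, y)

prediction = net.predict([[0, 0, 1]])   # array with one row and two output columns
print(prediction)
```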

velvet anchor Sep 24, 2018, 3:55 PM

#

@dreamy tapir Keras.

#

It’s really a lot easier than it seems

#

You just like throw your layers In like

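from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(3,))                        # sizes here are placeholders
hidden = Dense(8, activation='relu')(inputs)
# ... more layers, each called on the previous one ...
outputs = Dense(2, activation='softmax')(hidden)
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')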

#

I also have a wrapper I’ve been working on to simplify the generation of networks slightly for genetic optimization but it’s kind of on hold at the moment

#

Then to train you just write up a generator or use one of the built in ones and pass it into model.fit()

#

iirc anyways, it’s been a bit since I’ve worked with it

dreamy tapir Sep 24, 2018, 5:00 PM

#

Really?

#

I'll take a look at the documentation

#

I don't understand... I'm not so advanced...

#

And it looks weird

velvet anchor Sep 24, 2018, 5:29 PM

#

You can look at this https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Machine Learning Mastery

Develop Your First Neural Network in Python With Keras Step-By-Step

Keras is a powerful easy-to-use Python library for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in a few short lines of code. In this post...

#

or this one https://www.pyimagesearch.com/2018/04/16/keras-and-convolutional-neural-networks-cnns/

PyImageSearch

Adrian Rosebrock

Keras and Convolutional Neural Networks (CNNs) - PyImageSearch

This gentle guide will show you how to implement, train, and evaluate your first Convolutional Neural Network (CNN) with Keras and deep learning.

dreamy tapir Sep 24, 2018, 6:18 PM

#

I still don't understand

dreamy tapir Sep 24, 2018, 6:52 PM

#

But I understand pyBrain

#

And I know synaptic.js perfectly and that's extremely easy and powerful.

dreamy tapir Sep 24, 2018, 7:29 PM

#

Just look at it. https://github.com/cazala/synaptic it is so easy and predefined

GitHub

cazala/synaptic

architecture-free neural network library for node.js and the browser - cazala/synaptic

#

And it's a pleasure to work with that library

#

Omg!

#

I found it http://jon--lee.github.io/neuralpy/

neuralpy | Python Neural Network

My new website

#

It's perfect

dreamy tapir Sep 25, 2018, 11:14 AM

#

But doesn't work on python3

flat thistle Sep 25, 2018, 6:11 PM

#

i’m just a beginner in ML and data science... i wanted to start working on some small projects so that i can improve my skills.. could you guys please suggest some projects to start with, or help me find a beginner project

keen pivot Sep 25, 2018, 7:02 PM

#

what's a good graph edge container?

#

i was thinking a dictionary like

#

{0:1, 0:2, 1:2, 2:3}

#

but you can only have unique keys...

#

so can you do {0: [1,2]}

#

or is it better to use {0:(1,2)}
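
A small sketch of the dict-of-collections idea, using the edge pairs from the dict literal above. A set is used here so duplicate edges collapse and membership checks are fast; a list would work if order or duplicates matter, and a tuple if the neighbours never change.

```python
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# A literal like {0: 1, 0: 2} silently keeps only the last value for key 0,
# so map each node to a collection of its neighbours instead.
graph = defaultdict(set)
for src, dst in edges:
    graph[src].add(dst)

print(dict(graph))   # {0: {1, 2}, 1: {2}, 2: {3}}
```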

placid snow Sep 25, 2018, 7:42 PM

#

What about pandas / numpys ?

keen pivot Sep 25, 2018, 8:07 PM

#

what do those have?

placid snow Sep 25, 2018, 8:28 PM

#

Depends what you need, but plotting the example data you gave would look something like

📎 unknown.png

#

but thats 2 different graphs for each column, you could most likely swap the numbers around to get the desired effect from that

#

@keen pivot

velvet anchor Sep 25, 2018, 8:32 PM

#

Could use a graph DB as well

keen pivot Sep 25, 2018, 8:33 PM

#

that's not the kind of graph i mean @placid snow

polar acorn Sep 25, 2018, 8:35 PM

#

A dict would probably do for small things. Any bigger and I'm sure you'll find plenty of packages for that.

keen pivot Sep 25, 2018, 8:37 PM

#

okay.

#

I'm trying to figure out whether to use a tuple, dict, or list inside the dict as well.

#

I'm creating something that would be described as a multilayered graph

#

with a main node that has edges to other main nodes

#

and within each node, there's a list(or dict?) of embedded nodes that may have edges to one or more other embedded nodes either in the same main node or others.

#

📎 20180925_213945.jpg

velvet anchor Sep 25, 2018, 8:40 PM

#

Tried Neo?

keen pivot Sep 25, 2018, 8:40 PM

#

I've not

velvet anchor Sep 25, 2018, 8:40 PM

#

It’s a neat graph database

keen pivot Sep 25, 2018, 8:57 PM

#

this makes a lot of sense.

#

I worry this may be a bit too heavy for what I'm looking to do.

#

I'm trying to make an application that runs concurrently while another larger application is running.

#

and based on what that application writes to a file, manipulate the nodes in my graph.

#

I've been implementing my graph myself purely in python.... I don't imagine there being more than 100 main nodes.

velvet anchor Sep 25, 2018, 9:00 PM

#

It may be a little overkill but it's neat to learn if you're going to be doing more stuff with graphs in the future

keen pivot Sep 25, 2018, 9:03 PM

#

Okay.

lean ledge Sep 25, 2018, 11:22 PM

#

@flat thistle The MNIST database is a classic project everyone does. If you want a more classic ML type, you can look at some basic regression or classification datasets on Kaggle

#

There's a few nice clustering datasets too to flex your DBSCAN and k-means muscles

pliant mantle Sep 26, 2018, 1:00 AM

#

I have a hard one for you, I want to learn about data synchronization

#

filesystems, binary data, python dictionaries

#

what are the tags / subject terms I should be looking for?

lean ledge Sep 26, 2018, 1:06 AM

#

Those are some very different subjects. Python dictionaries are hashmaps with dynamically allocated arrays. Filesystems is big enough to be a course on its own but easily searchable. Binary data is a vague term that can mean anything. For any specific file format, it can be differently laid out so you'll have to search those up separately. Executable binaries have their own executable formats too such as ELF for Linux

#

@pliant mantle

trail flicker Sep 26, 2018, 1:08 AM

#

elf is best binary executable, change my mind

pliant mantle Sep 26, 2018, 1:08 AM

#

@lean ledge
I'm talking about syncing binary data. Example, recognizing a bit has changed on another system and transmitting, byte 1235623451 has changed to 0b00001111

lean ledge Sep 26, 2018, 1:10 AM

#

Oof, I don't know about that, can't comment

pliant mantle Sep 26, 2018, 1:18 AM

#

binary, file, directory, node

lean ledge Sep 26, 2018, 3:23 AM

#

(@pliant mantle you might also wanna post this somewhere else since that isnt exactly data science)

spark nimbus Sep 26, 2018, 1:36 PM

#

@pliant mantle read /dev/sdX in binary mode and set up IPC between the two

#

not sure about writing though

trail flicker Sep 26, 2018, 2:22 PM

#

@spark nimbus ayy

spark nimbus Sep 26, 2018, 2:23 PM

#

ayy

velvet anchor Sep 27, 2018, 5:52 AM

#

@pliant mantle maybe researching how ECC ram works? It’s not exact but the error correcting there might translate over somehow

thorn river Sep 27, 2018, 9:27 AM

#

I'm interested in learning more about fairness/bias in machine learning (and trying to account for those things in a ml project on a dataset with text possibly), would anyone happen to have some directions on some good articles about this?

lean ledge Sep 27, 2018, 9:56 AM

#

ooh on a related note, at a tech conference recently I met the person in charge of the AI ethics report for Australia. Australia's been a bit behind in terms of policy changes to accommodate AI and the ethical challenges it poses, so she was in charge of writing a report built from case studies and possible pitfalls for policy makers: what sort of ML models should and shouldn't be used, how they should be managed to take care of fairness and bias, the data privacy of people whose data is used in the models, etc. She did her PhD in neuroscience and did work in ethics at the neuroscience research institute, which is why she was in that role

#

just relevant and thought it'd be cool to share

thorn river Sep 27, 2018, 10:19 AM

#

Oh wow that's interesting!

lapis sequoia Sep 27, 2018, 6:50 PM

#

Hello Everyone! I'm hoping someone could help me clear up this error I'm receiving? So I've been trying to learn the basics of Pandas and Matplotlib for Data Analysis/Science, and I've done OK so far. But whenever I get stuck it's tough to find an answer since I'm new to Python as well lol. Here's my code:

https://pastebin.com/VL3nZtrQ

Pastebin

[Python] import pandas as pd from matplotlib import pyplot as plt...

#

and I'm getting the following error: AttributeError: 'NoneType' object has no attribute 'seq'

#

I have no idea what that means lol So on line #12 I create a list from values within a column and on line #23 I use it to set the xticks

polar acorn Sep 27, 2018, 7:59 PM

#

Which line gives you the error?

placid snow Sep 27, 2018, 8:14 PM

#

!t traceback

arctic wedgeBOT Sep 27, 2018, 8:14 PM

#

traceback

Please provide a full traceback to your exception in order for us to identify your issue.

A full traceback could look like:

Traceback (most recent call last):
  File "tiny", line 3, in <module>
    do_something()
  File "tiny", line 2, in do_something
    a = 6 / 0
ZeroDivisionError: integer division or modulo by zero

The best way to read your traceback is bottom to top.
• Identify the exception raised (e.g. ZeroDivisionError)
• Make note of the line number, and navigate there in your program.
• Try to understand why the error occurred.

To read more about exceptions and errors, please refer to the official Python tutorial.

lapis sequoia Sep 27, 2018, 8:34 PM

#

I was actually able to figure that part out LOL but I do have a question resulting from this bit of code!

When I plot the graph itself, it prints out pretty small and scrunched up. Then I MANUALLY click+drag the bottom corner of the image then Save, and it saves more stretched out; making it easier to read the long x-axis. Why is that, and how do I get it to plot out with a width/length wide enough to accommodate a long axis? I've uploaded the two images.

📎 first_same_manually_dragged.png

#

📎 first_scrunched_up.png

#

Here's the pastebin for the code:

https://pastebin.com/DVTsWdu7

Pastebin

[Python] # Median price of each region plt.close('all') ax = plt...

#

So basically I'd like to use the 'savefig()' option to save the figure already 'stretched-out', without me having to manually print out the plot then click+drag?
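
One way this is usually handled is to set the figure size in inches when creating the figure, so savefig writes it out at that size without any dragging. A sketch with placeholder data follows; the region names, prices, 16 by 5 inch size, and file name are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")                               # non-interactive backend for saving only
import matplotlib.pyplot as plt

regions = [f"Region {i}" for i in range(30)]        # placeholder categories
prices = list(range(30))                            # placeholder values

fig, ax = plt.subplots(figsize=(16, 5))             # width, height in inches
ax.bar(range(len(regions)), prices)
ax.set_xticks(range(len(regions)))
ax.set_xticklabels(regions, rotation=90)            # rotate long labels so they do not overlap

width, height = fig.get_size_inches()
fig.savefig("median_price.png", dpi=150, bbox_inches="tight")
```

An existing figure can also be resized before saving with fig.set_size_inches(16, 5).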

lapis sequoia Sep 28, 2018, 7:59 AM

#

📎 python.JPG

#

http://www.cis.umassd.edu/~dkoop/dsc201-2018fa/a1/a1.ipynb https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2017-050118.txt

placid snow Sep 28, 2018, 8:05 AM

#

What did you not understand with using pandas? It seems to be the way to go with this, both for the reason it's asked for and your data goes well with it?

lapis sequoia Sep 28, 2018, 8:12 AM

#

📎 code.JPG

#

"list indices must be integers or slices, not str" posts this error message

#

so I did the first 3 without pandas but can't figure out how to get the year latitude and name of hurricane all in one

placid snow Sep 28, 2018, 8:35 AM

#

Are you trying to create a DataFrame per hurricane?

#

Seems to be a csv file you're loading. Pandas can create a single df from a csv file with pandas.read_csv(filepath) (iirc)

#

And you can do most of this logic with the df itself

small ore Sep 28, 2018, 8:39 AM

#

I don't see a problem description there. I only see some unfilled problems

#

Ah. It is in the picture. Difficult to even see the problem.

lapis sequoia Sep 28, 2018, 8:47 AM

#

Okay so I retried the pandas approach and did the read_csv and got a data table. Do I add headers now so the data looks more organized?

placid snow Sep 28, 2018, 8:50 AM

#

Your csv file doesn't have them already?

#

You can write them in the csv, or in code up to you. If they dont already exist

small ore Sep 28, 2018, 8:51 AM

#

There is something like a header= argument for that method

lapis sequoia Sep 28, 2018, 8:51 AM

#

No, it looks really messy, but I'm reading a tutorial now on how to clean it. It looks a lot cleaner when I access df.head()

#

when i print df though it comes out like a badly formatted text file

placid snow Sep 28, 2018, 8:52 AM

#

Show me

lapis sequoia Sep 28, 2018, 8:52 AM

#

okay give me a sec

small ore Sep 28, 2018, 8:53 AM

#


    List of column names to use. If file contains no header row, then you should explicitly pass header=None. Duplicates in this list will cause a UserWarning to be issued.

lapis sequoia Sep 28, 2018, 8:54 AM

#

📎 code2.JPG

#

📎 code3.JPG

placid snow Sep 28, 2018, 8:57 AM

#

Specify the names param as a list of your headers

#

And see if that fixes it
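A small sketch of what passing `names` together with `header=None` looks like. The two rows and the column names are simplified, made-up stand-ins, not the real layout of the hurricane file.

```python
import io

import pandas as pd

# Made-up, simplified rows with no header line
raw = io.StringIO(
    "18510625, 0000, HU, 28.0N, 94.8W, 80\n"
    "18510625, 0600, HU, 28.0N, 95.4W, 80\n"
)

# Hypothetical column names chosen for this illustration
column_names = ["date", "time", "status", "lat", "lon", "wind"]

# header=None: the first line is data; names: labels to use for the columns
df = pd.read_csv(raw, header=None, names=column_names, skipinitialspace=True)
print(df.head())
```

If the file mixes two kinds of lines with different numbers of fields, a single `read_csv` call will tend to produce misaligned columns and NaN values, so the lines may need to be separated first.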

small ore Sep 28, 2018, 9:00 AM

#

Although, going by just what I see ( I don't see the problem statement for problems 1 to 5, I might be missing something), they already have the data ready for you

lapis sequoia Sep 28, 2018, 9:06 AM

#

Problem 1 was to find the amount of unique hurricane names problem 2 was most common hurricane name problem 3 was year with most hurricanes and then 4 is most northern hurricane and 5 is hurricane with maximum sustained wind

#

📎 code4.JPG

placid snow Sep 28, 2018, 9:07 AM

#

Are there only 2 columns to your data?

#

Seemed to have lat and long as well

silk acorn Sep 28, 2018, 9:10 AM

#

For finding unique names, there exists a data type that can only have unique values in them.

small ore Sep 28, 2018, 9:10 AM

#

I mean is it required to use pandas? In the codeblocks above they have opened the file in a different way and allowed you to use it. Or is it some different data for those?

silk acorn Sep 28, 2018, 9:10 AM

#

The length of a set of the names will be the amount of unique names
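As a tiny illustration of that idea (the names in the list are made up):

```python
# Made-up list with repeats
names = ["ALEX", "BONNIE", "ALEX", "UNNAMED", "BONNIE"]

# A set keeps one copy of each value, so its length is the unique count
unique_names = set(names)
unique_count = len(unique_names)
print(unique_count)
```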

lapis sequoia Sep 28, 2018, 9:11 AM

#

It's not required. I did the first 3 problems without pandas very easily; I just got lost on how to find the latitude and longitude values from it

#

So I tried adding more columns but it ends up piling multiple data into one column and none of the columns retrieve the latitude, after column 6 it all starts saying NaN

wary willow Sep 28, 2018, 12:45 PM

#

Where should I go to learn very basic machine learning, if I'm quite a beginner Python programmer who has no idea how machine learning works?

lean ledge Sep 28, 2018, 12:47 PM

#

@wary willow Check pinned!

small ore Sep 28, 2018, 1:37 PM

#

@lapis sequoia If you have still not been able to solve your problem, try changing the file open code to:

records = []
with open(local_fname,'r') as f:
    for line in f:
        if line.startswith("AL"):
            record = line.strip()
            reports = []
            records.append((record, reports))
        else:
            reports.append([line.strip()])

small ore Sep 28, 2018, 2:13 PM

#

Scratch that. No need for that change.
You can just do:

print(max([float(rec.split(',')[4].strip()[:-1]) for rec in record[1]]))

hardy drift Sep 28, 2018, 2:20 PM

#

how could I add a new column to a pandas dataframe, based on other data in each row?

#

say I have a function that takes each row's 'text' column and transforms the data and adds a new column value

#

should i just create a new column , then write a for loop that iterates over each row and fills in the column?

#

not sure if that is the most efficient way

#

hmm it seems like https://stackoverflow.com/questions/34962104/pandas-how-can-i-use-the-apply-function-for-a-single-column answers my question

Stack Overflow

Pandas: How can I use the apply() function for a single column?

I have a pandas data frame with two columns. I need to change the values of the first column without affecting the second one and get back the whole data frame with just first column values changed...

small ore Sep 28, 2018, 2:31 PM

#

Or this?:https://stackoverflow.com/questions/33680666/creating-a-new-column-in-panda-by-using-lambda-function-on-two-existing-columns

Stack Overflow

Creating a new column in Panda by using lambda function on two exi...

I am able to add a new column in Panda by defining user function and then using apply. However, I want to do this using lambda; is there a way around?

For Example, df has two columns a and b. I wa...
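A minimal sketch of the `apply` approach from the linked answers, for a new column derived from a `text` column. The frame contents and the `count_words` helper are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"text": ["Hello World", "data science", "PANDAS"]})

def count_words(text):
    """Hypothetical transform: number of whitespace-separated words."""
    return len(text.split())

# apply() calls the function once per value and returns a new Series
df["word_count"] = df["text"].apply(count_words)

# For simple string operations the vectorised .str accessor avoids the function call
df["text_lower"] = df["text"].str.lower()
print(df)
```

Either form avoids writing an explicit `for` loop over the rows.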

tacit meteor Sep 28, 2018, 3:29 PM

#

@small ore you Ninja! 😄 https://javascript.info/ninja-code

Ninja code

hardy drift Sep 28, 2018, 4:44 PM

#

i ended up following the loop method described here https://stackoverflow.com/questions/15118111/apply-function-to-each-row-of-pandas-dataframe-to-create-two-new-columns

Stack Overflow

Apply function to each row of pandas dataframe to create two new c...

I have a pandas DataFrame, st containing multiple columns:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(d...

small ore Sep 28, 2018, 6:23 PM

#

Link from nowhere without context is suspicious brainmon

lapis sequoia Sep 28, 2018, 6:41 PM

#

Alright so I did print(max([float(rec.split(',')[4].strip()[:-1]) for rec in record[1]]))

#

and it returned the highest latitude N of that hurricane. now to generate that for all of them, do I need to make a set out of the latitudes?

small ore Sep 28, 2018, 7:17 PM

#

Well, I believe if you have done the previous problems you can figure out although this involves somewhat more head-scratching. This server does not encourage giving out answers to your assignments afaik. Think and let people here know what you have or where you are stuck. I will see if I can give you more hints tomorrow

lapis sequoia Sep 28, 2018, 8:33 PM

#

Hello data bois

#

anyone good with matplotlib?

earnest prawn Sep 28, 2018, 8:34 PM

#

!t ask

arctic wedgeBOT Sep 28, 2018, 8:34 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

earnest prawn Sep 28, 2018, 8:34 PM

#

@lapis sequoia

lapis sequoia Sep 28, 2018, 8:35 PM

#

lol okay. So some basic setup: I'm trying to plot two sets of data; one pulled from iexfinance and one from quandl

#

China = quandl.get_table('DY/IPA', ticker='000010')
China.sort_values('date', inplace=True)

#

so that works. Okay, great. Now I tried graphing them separately.

#

fig, ax = plt.subplots()
ax.plot(China_date, China.close)
ax.set_title('SSE 180 Index daily price')

#

and that works. Not gonna lie I'm pretty new to matplotlib / python so that might be a mess

#

and then this works

#

SPY['close'].plot()
plt.show()

#

plt.show()

#

So i was looking through the matplotlib tutorials and it seemed to indicate that you could just do two plt.plot(data) and it would set them on the same chart

#

like this:

#

plt.plot([2,3,4,5], [10,9,6,4], label='line2')

plt.xlabel('price')
plt.ylabel('date')

#

Which works fine, alright cool

#

SO why doesn't this work?

#


# these will plot separately but not together. I have no idea why.
plt.plot(China_date, China.close)
SPY['close'].plot()

plt.xlabel('Date')
plt.ylabel('Price')

plt.title('China and SPY Comparison')

plt.show()

#

I'm fairly sure it's something to do with the "China" data frame having a 'date' column that's converted to an object, while the SPY df has an index column just called date
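If that guess about mismatched date types is right, one thing to try is converting both to real datetimes and drawing both lines on the same `Axes` object. A rough sketch with made-up numbers; the two small frames only imitate the shape described above (dates in a column for one, dates in the index for the other).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for the two data sets
china = pd.DataFrame({"date": ["2018-09-24", "2018-09-25", "2018-09-26"],
                      "close": [7500.0, 7450.0, 7520.0]})
spy = pd.DataFrame({"close": [291.0, 290.7, 289.9]},
                   index=["2018-09-24", "2018-09-25", "2018-09-26"])

# Make both x-axes the same type: real datetimes instead of strings/objects
china["date"] = pd.to_datetime(china["date"])
spy.index = pd.to_datetime(spy.index)

# Draw both series on one Axes so they share the chart
fig, ax = plt.subplots()
ax.plot(china["date"], china["close"], label="China")
ax.plot(spy.index, spy["close"], label="SPY")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("China and SPY Comparison")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

With very different price levels, one line will look flat; a second y-axis via `ax.twinx()` is one way around that.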

lapis sequoia Sep 28, 2018, 11:24 PM

#

@earnest prawn any thoughts?

simple crag Sep 29, 2018, 2:53 AM

#

What do you mean by “doesn’t work”

silent current Sep 29, 2018, 4:30 AM

#

Is there a way to increase the size of jupyter inline figures without putting the figure in a scroll box?

#

I'm using plt.rcParams['figure.figsize'] = [15, 5]

#

but no matter what dimensions I make the figsize, I end up getting a scroll box and it's ugly

lapis sequoia Sep 29, 2018, 6:31 AM

#

@silent current I think it's due to the browser resizing the img

lapis sequoia Sep 29, 2018, 5:05 PM

#

hey guys, sorry for the extremely long question yesterday. I kind of tabled that issue as it wouldn't really make sense to graph anyway.

#

However, new question, I'm using Quandl to get the following:

#

SPY = get_historical_data('SPY', start=start, end=end, output_format='pandas')

#

and then plotting the close price

#

plt.show()

#

📎 download.png

#

but it's not plotting the date, or at least, it's not displaying the date. I'm kind of out of ideas on why / how to get it to actually display the figures

viscid aspen Sep 30, 2018, 12:57 PM

#

Any idea why conda install jupyterlab won't install the latest version for me ? I'm at 0.32 right now but there's already a 0.34, I can even see it in conda search jupyterlab

serene veldt Oct 1, 2018, 10:09 AM

#

hello, im having trouble with memory usage on numpy

#

I'm working with a lot of matrices, usually 500+, and when they become about 11x190 +- I get a MemoryError

#

is there any efficient way to serialize/compress the matrices to a list or similar, so that I can just grab them by index and unserialize/decompress on demand?

#

much appreciated

simple crag Oct 1, 2018, 10:46 AM

#

What operations are you performing with the matrices? And how much RAM do you have?

serene veldt Oct 1, 2018, 11:06 AM

#

I have 32 GB of RAM, I'm doing dot products mostly

#

But I managed to fix my memory usage during the operations; it now crashes just by having them stored

simple crag Oct 1, 2018, 11:08 AM

#

Are you sure you're not growing the matrices? 500 11x190 matrices shouldn't cause OOM issues
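A quick back-of-the-envelope check of that claim, assuming the matrices hold 64-bit floats (the dtype is an assumption, it is not stated in the thread):

```python
import numpy as np

# 500 matrices of shape 11x190, assumed float64 (8 bytes per element)
matrices = [np.zeros((11, 190), dtype=np.float64) for _ in range(500)]

# nbytes is the size of each array's data buffer
total_bytes = sum(m.nbytes for m in matrices)
total_mb = total_bytes / 1e6
print(f"{total_bytes} bytes, about {total_mb:.1f} MB")
```

That is 500 * 11 * 190 * 8 bytes, so a few megabytes in total, which is why growing arrays or extra copies are the more likely suspects.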

serene veldt Oct 1, 2018, 11:26 AM

#

All elements are tuples, not sure if it influences

#

I'm almost sure but I can double check

simple crag Oct 1, 2018, 11:28 AM

#

What do you mean all elements are tuples

serene veldt Oct 1, 2018, 11:35 AM

#

*floats

#

Sorry, idk how it autocorrected

small ore Oct 1, 2018, 1:06 PM

#

Are you doing dot-products on them all together?

#

Or can you open them from a dump and do it one by one and close as you go on?

serene veldt Oct 1, 2018, 1:21 PM

#

I was doing them all together

#

Dumping to disk and reading when needed will greatly affect performance

#

That's why I wanted to know what options I have :/

#

Right now I'm checking pytorch, maybe I can use the tensors

simple crag Oct 1, 2018, 2:10 PM

#

Are you using 64 bit or 32 bit python?

#

Your arrays combined take about 8.4 MB of memory

serene veldt Oct 1, 2018, 2:23 PM

#

should be 64

#

ill also double check that

placid snow Oct 1, 2018, 2:24 PM

#

I forgot what the name is, but there's a module that (with a bit of overhead) splits up big workloads into smaller chunks that can even be run on different hardware

#

Theres a pycon talk about it alongside pandas and numpy iirc, maybe you can find it

serene veldt Oct 1, 2018, 2:31 PM

#

will also look into that

#

much appreciated

lapis sequoia Oct 1, 2018, 4:21 PM

#

wtf is data science, analysis.. etc

placid snow Oct 1, 2018, 4:26 PM

#

Read the channel description

lapis sequoia Oct 1, 2018, 5:30 PM

#

@placid snow didn't answer my question^^

placid snow Oct 1, 2018, 5:48 PM

#

It's the work of analysing, categorizing, visualizing and in general reading & understanding large sets of data.

#

Say you wanted to create a graph showing the wealth distribution of everyone in the US, that's data science for instance

trail flicker Oct 1, 2018, 5:49 PM

#

Big Data™ basically

placid snow Oct 1, 2018, 5:49 PM

#

Or machine learning / AI

#

Pretty much

polar acorn Oct 1, 2018, 5:54 PM

#

It's the channel for the buzziest of buzz words

tall drum Oct 1, 2018, 8:17 PM

#

Hi, I was able to create the following pandas dataframe by using groupby. Now I would need to graph the results with bar plot (one bar for one column, divided to 2 parts)

#

📎 unknown.png

simple crag Oct 1, 2018, 8:18 PM

#

Something like this? https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart

tall drum Oct 1, 2018, 8:33 PM

#

thanks got it

tight dove Oct 1, 2018, 10:09 PM

#

Hi guys

#

So I'm getting this error while trying to get financial data from Yahoo's API

#

import datetime as dt

import pandas as pd
from matplotlib import style

# Compatibility shim so pandas_datareader can still find is_list_like
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader.data as web

style.use('ggplot')

start = dt.datetime(2000, 1, 1)
end = dt.datetime(2016, 12, 31)

df = web.DataReader('TSLA', 'yahoo', start, end)

#

Which gives me this error -

#

ImmediateDeprecationError: 
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.

#

Please how can I resolve this?

#

Is there a work around?

#

I'm using python 3.6

trail flicker Oct 1, 2018, 10:11 PM

#

dont use yahoo

tight dove Oct 1, 2018, 10:11 PM

#

Okay, which can I use

trail flicker Oct 1, 2018, 10:11 PM

#

dunno

tight dove Oct 1, 2018, 10:11 PM

#

*sigh

tight dove Oct 1, 2018, 11:23 PM

#

I found a fix for this

#

import fix_yahoo_finance as yf

yf.pdr_override()

supple mango Oct 2, 2018, 4:32 AM

#

I have a pandas dataframe as follows. I want to add a new cost column that is equal to the Shares column * the corresponding price for the current symbol. So for example in the first row (symbol=AAPL) the cost column would have a value of 1500 * 340.99000 = 511485. How would you do that?

📎 Screenshot_2018-10-02_00.30.06.png

placid snow Oct 2, 2018, 4:39 AM

#

`df["newcol"] = df.apply(lambda row: row["Shares"] * row["AAPL"], axis=1)` perhaps?

#

Can't test it myself on phone, but do try just the apply part first, and see if the result is correct

supple mango Oct 2, 2018, 4:42 AM

#

the row['AAPL'] is hardcoded there is it not? I would like to use whatever is in the Symbol column and pull the appropriate value from the corresponding price column for that symbol

#

so for example the third row with IBM, should multiply 4000 (Shares) * 144.55000 (the price under the IBM column)

placid snow Oct 2, 2018, 4:44 AM

#

You could most likely just pass the symbol to the lambda, or write a function instead and have it read which symbol to use beforehand

supple mango Oct 2, 2018, 4:49 AM

#

orders["newcol"] = orders.apply(lambda row: row["Shares"] * row[row['Symbol']], axis=1)

#

Surprisingly, that seems to be working. Thank you @placid snow !
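A small sketch of why that works: with `axis=1`, each `row` is a Series indexed by column name, so `row[row['Symbol']]` looks up the price column named by that row's symbol. Only the 1500 x 340.99 and 4000 x 144.55 figures come from the question; the rest of the frame is made up.

```python
import pandas as pd

# Stand-in for the screenshot; values other than the two quoted above are invented
orders = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "IBM"],
    "Shares": [1500, 1500, 4000],
    "AAPL": [340.99, 342.62, 341.30],
    "IBM": [143.41, 143.06, 144.55],
})

# For each row, pick the price column whose name matches that row's Symbol
orders["cost"] = orders.apply(
    lambda row: row["Shares"] * row[row["Symbol"]], axis=1
)
print(orders)
```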

stone oasis Oct 3, 2018, 4:33 AM

#

yahoo stopped with the ticker data i thought

coral lichen Oct 3, 2018, 8:46 PM

#

hopefully a question that can be answered!

#

I have three 1d arrays of data. Two of the arrays are coordinate arrays (x and y). The third array is intensity (z) measurements at each coordinate position. All three arrays are of the same length. This means that the position x[i], y[i] has an intensity of z[i]. Currently, I cannot create a map like a pcolormesh plot to create a sort of heat map due to the data structures. Does anyone know how I can grid this data so that it will have a map of coordinates with intensity values for each point on the map?

polar acorn Oct 3, 2018, 8:54 PM

#

Are the x and y values placed such that they form a grid?

coral lichen Oct 3, 2018, 8:58 PM

#

no, thats the thing. they are simple 1d arrays that have x and y values that represent coordinate values along each axis. not until you take the first element of each x and y array do we get a position. hoping that makes sense lol

polar acorn Oct 3, 2018, 9:01 PM

#

Sure, they are not formatted as a grid. But if you were to plot them would they make a grid? i.e. do you have n unique x values each repeated m times and m unique y values each repeated n times?

coral lichen Oct 3, 2018, 9:02 PM

#

no, i did try to use np meshgrid, and that created the grid like coordinates. now my issue i guess is assigning the positions with their intensity values

#

I assign intensities to the diagonal of that grid right?

polar acorn Oct 3, 2018, 9:11 PM

#

Are all your coordinates positive?

#

As in all x and y values

coral lichen Oct 3, 2018, 9:15 PM

#

theyre both negative actually

polar acorn Oct 3, 2018, 9:17 PM

#

oh

#

# imports
import numpy as np
import matplotlib.pyplot as plt

# fake data, replace with your own x, y, and z list
x = [-1,-2,-3]
y = [-3,-1,-3]
z = [10,3,4]

# turn negative coordinates into positive indexes
x_pos = np.array(x) + abs(min(x))
y_pos = np.array(y) + abs(min(y))

# create empty grid for z values
zz = np.zeros(shape = (max(x_pos)+1, max(y_pos)+1))

# fill empty grid at indexes corresponding to coordinates
for i in range(len(x)):
    zz[x_pos[i],y_pos[i]] = z[i]

# these are the cell edges (one more than the number of cells along each axis), so each rectangle is centred on its integer coordinate
xx = np.arange(min(x)-1, max(x)+1) +0.5
yy = np.arange(min(y)-1, max(y)+1) +0.5

# plot
plt.pcolormesh(xx, yy, np.transpose(zz))
plt.show()

polar acorn Oct 3, 2018, 9:46 PM

#

@coral lichen, this might work. It's not very nice looking but it plots the intensity at the right coordinates.

polar acorn Oct 3, 2018, 10:22 PM

#

Also in case your x and y values are not integers you might want to do a simple coloured scatterplot instead. Like this

x = -5*np.random.rand(1000)
y = -7*np.random.rand(1000)
z = x*y

plt.scatter(x,y, marker='o', c=z, linewidths=5)
plt.show()

coral lichen Oct 4, 2018, 12:52 AM

#

@polar acorn thanks! I'm going to work on these when i get home and will let you know!

wild hinge Oct 4, 2018, 10:02 AM

#

hi everyone, I have the following df and I want to create a binary matrix out of it using pandas or any other module to achieve this.

📎 unknown.png

#

the result should be like this

📎 unknown.png

#

I want to find a connection between the series and it seems that the best way to do this is by observing the binary matrix

placid snow Oct 4, 2018, 10:24 AM

#

What about creating dummy data?

#

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html

polar acorn Oct 4, 2018, 10:54 AM

#

If you want binary columns also for values not observed in your columns yet, you can enjoy this monstrous one liner

import pandas as pd

df = pd.DataFrame({'val1':[1,3,4,6], 'val2':[2,2,1,3], 'val3':[5,6,11,2]}, index = ['s1', 's2', 's3', 's4'])

binary_df = pd.concat([pd.concat([pd.get_dummies(df[col]) for col in df], axis=1).groupby(lambda x:x, axis=1).sum(), pd.DataFrame(columns=list(range(1, df.max().max()+1)))]).fillna(0)

Enjoy pulling it apart 😉 I had too much fun writing that to not share it
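For anyone pulling it apart, here is a more spelled-out sketch that aims at the same kind of result: one 0/1 column per value from 1 up to the largest value seen. It uses the same example frame but a different technique (comparing the frame against each value) rather than `get_dummies`, so treat it as an alternative, not a line-by-line translation.

```python
import pandas as pd

df = pd.DataFrame(
    {"val1": [1, 3, 4, 6], "val2": [2, 2, 1, 3], "val3": [5, 6, 11, 2]},
    index=["s1", "s2", "s3", "s4"],
)

# Every value from 1 to the overall maximum gets a column, observed or not
all_values = range(1, int(df.to_numpy().max()) + 1)

# df.eq(v) marks cells equal to v; any(axis=1) asks "does this row contain v?"
binary_df = pd.DataFrame(
    {v: df.eq(v).any(axis=1).astype(int) for v in all_values}
)
print(binary_df)
```

Unlike summing dummies, this always yields 0 or 1 even when a value appears twice in the same row.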

wild hinge Oct 4, 2018, 11:16 AM

#

@chibli cheers for the answer, the direction is right
@pptt cheers for the answer, I will give this a run asap but first I want to wrap my head around the formula :)))

limpid pivot Oct 4, 2018, 2:07 PM

#

Anyone here know stuff about the ASDF (Advanced Scientific Data Format) library?

small ore Oct 4, 2018, 3:33 PM

#

Guys, if anyone knows, any input regarding my question in #tools-and-devops is appreciated

radiant orbit Oct 5, 2018, 6:42 PM

#

In lowess.. After minimizing the error how should I define y to be predicted.. if I am writing the algorithm from scratch

lapis sequoia Oct 6, 2018, 3:11 PM

#

Is there any python library which is able to detect a float as being a fraction including an irrational number?

simple crag Oct 6, 2018, 3:12 PM

#

https://docs.python.org/3/library/fractions.html

lapis sequoia Oct 6, 2018, 3:12 PM

#

yeah, but I'm talking about irrationals

#

That library isn't able to detect something as pi, e, or a square root
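A standard-library-only sketch of one workaround: divide the float by a candidate constant first, then ask `fractions` whether the ratio is close to a simple fraction. The helper name `as_multiple_of`, the denominator limit and the tolerance are all choices made for this illustration, and it only tests constants you supply.

```python
import math
from fractions import Fraction

def as_multiple_of(value, constant, max_denominator=1000, tol=1e-9):
    """Return a Fraction q with value close to q * constant, else None."""
    # Closest fraction to the ratio with a bounded denominator
    ratio = Fraction(value / constant).limit_denominator(max_denominator)
    # Accept it only if it reproduces the original float within tol
    if abs(float(ratio) * constant - value) <= tol:
        return ratio
    return None

result = as_multiple_of(4.71238898038469, math.pi)
print(result)  # a Fraction if the value looks like a rational multiple of pi
```

A symbolic library can search over several constants and roots at once, which this sketch does not attempt.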

simple crag Oct 6, 2018, 3:14 PM

#

SymPy, maybe

#

I'm not really sure what the purpose is of what you're trying to do

lapis sequoia Oct 6, 2018, 3:15 PM

#

Well, I do a lot of math stuff using python. And when I get a result I would like to know if it's actually a random number or a fraction which includes an irrational

#

found it in sympy, thanks @simple crag

lapis sequoia Oct 6, 2018, 8:05 PM

#

Hi, can anyone suggest a data science course path that would make me a data scientist by the end? The courses should have nice Python syntax and cover all the data science concepts. As far as I can see, there is no good course that takes my data science and Python levels up simultaneously. Can anyone help? Thanks 😃

pseudo sentinel Oct 7, 2018, 1:47 AM

#

Have you checked out Udacity?

lapis sequoia Oct 7, 2018, 11:29 AM

#

Yes but, as I remember, there was a payment forcing for nanodegree 😦 . If something, some product is good, or better than others, there is nothing to prove yourself.

#

If the course were a face-to-face program, maybe I would pay for it. But it is online

small ore Oct 7, 2018, 6:39 PM

#

Check pins

#

But yeah, no python. But there are a lot of free courses, tutorials on the net which will not give you any certificate

simple prism Oct 8, 2018, 6:58 AM

#

This channel is quiet

lapis sequoia Oct 8, 2018, 7:26 AM

#

ded server

vestal axle Oct 8, 2018, 12:10 PM

#

Hello 😃

#

📎 unknown.png

#

Could someone please help me with this task, I really don't know how to approach it

placid snow Oct 8, 2018, 12:51 PM

#

I don't know much about the topic, but have you tried breaking it into smaller pieces?

#

a function for the equations for instance, and plan out what happens

lapis sequoia Oct 8, 2018, 6:26 PM

#

@vestal axle from what I can tell, three parameters are constant

#

a first step *could be rewriting the equations with those values filled in

#

reduce the greek letter salad a bit

small ore Oct 8, 2018, 7:00 PM

#

It looks straightforward to me (I haven't tried to work it, though). You have sigma0 and y0 given, and you calculate (sigma1, y1) and so on for every (sigma_t, y_t) using the given formula until you have 100 values (t = 100 or 99, not sure, probably the latter). That is your simulated data. Now plot the data over time. I think how to do the plot in that specific way is the task at hand

#

@vestal axle

pliant mantle Oct 8, 2018, 11:53 PM

#

this a good place to talk about data structures?

small ore Oct 8, 2018, 11:55 PM

#

Not likely

#

#python-discussion if you want a generic discussion on that topic. Any of the help channels if it is a specific question. #databases if it is regarding databases

vapid lion Oct 9, 2018, 7:47 PM

#

I am currently studying ML, and I ran into a question. How would you calculate a confusion matrix (in python) for a multi-label dataset?

pliant mantle Oct 10, 2018, 1:16 AM

#

What is it called when you have a ridiculously extreme deviation in data?

#

singularity?

trail flicker Oct 10, 2018, 1:54 AM

#

outlier?

brittle pewter Oct 10, 2018, 9:17 AM

#

Sounds like outlier

#

@SYMPHONIC DISHARMONY#3195 Row is true label, columns are predicted label

#

compute the counts

#

Use .pivot_table in pandas
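A small sketch of that counting idea, with rows as the true label and columns as the predicted label. It uses `pd.crosstab`, a close relative of `pivot_table` that counts co-occurrences directly, and the labels are made up. It covers the case of one label per sample; for a genuinely multi-label dataset the same table could be built separately for each label.

```python
import pandas as pd

# Made-up labels, one true and one predicted label per sample
y_true = pd.Series(["cat", "dog", "cat", "bird", "dog", "cat"], name="true")
y_pred = pd.Series(["cat", "cat", "cat", "bird", "dog", "dog"], name="predicted")

# Rows: true label, columns: predicted label, cells: counts
confusion = pd.crosstab(y_true, y_pred)
print(confusion)
```

The diagonal holds the correctly classified counts; everything off the diagonal is a specific kind of mistake.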

vestal axle Oct 10, 2018, 9:23 AM

#

py_noob thank you for your answer, we figured it out! Thanks though 😃

vestal axle Oct 10, 2018, 1:40 PM

#

Btw, is there anyone here who would be willing to help me out with a task that has to be delivered next week 😃

polar acorn Oct 10, 2018, 6:12 PM

#

@vestal axle If you have a specific question just ask and someone will answer if they have time.

@SYMPHONIC DISHARMONY#3195 scikit-learn has a nice implementation if you just want to see your results.

radiant orbit Oct 11, 2018, 4:11 AM

#

Guys which is the best book I should opt for probability and statistics.. ???

turbid bay Oct 12, 2018, 8:24 AM

#

can i ask a question on data-science that isnt written in python but instead octave?

turbid bay Oct 12, 2018, 8:52 PM

#

no one?

#

its not a hard question

simple crag Oct 12, 2018, 8:58 PM

#

This is a Python server, so I'm not sure how well anyone is going to be able to help you with Octave

turbid bay Oct 12, 2018, 9:03 PM

#

im more worried with the logical aspect of the programming tho

#

not any features that octave may have to offer

#

ill ask anyways

#

anybody with octave knowledge want to talk me through how to calculate the cost of the hypothesis using certain theta/weight variables. I am struggling to get my head around it.

i am using a feature/X matrix (5X4), Weight/theta vector (4X1), correct_label/y vector (5X1)

prediction = sigmoid(X*theta)
cost_1 = (-y) .* log(prediction)
cost_2 = (1-y).*log(1-prediction)

total_cost = cost_1-cost_2
J = sum(total_cost)/m


J should be the cost but it doesn't calculate it correctly

#

can you @me if you respond thanks 😃

small ore Oct 12, 2018, 9:21 PM

#

@turbid bay What do you mean by not calculate correctly? Values different from what you should have got or are you getting some errors? Secondly you do not seem to have given the entire code. That m in the last line is not available elsewhere in the code. Thirdly did you check if you get the right answers if you created a vector of 1s instead of just using (1-y).

#

The problem may also be in the sigmoid function if you wrote it instead of a built-in

turbid bay Oct 12, 2018, 9:58 PM

#

m is the length of the y vector (sorry for not including that) and no the sigmoid function is built in

#

and yes it doesnt give me the expected output

small ore Oct 12, 2018, 10:08 PM

#

Try (ones(m,1)-y) instead of (1-y) and same change for (1-prediction)

#

You could also do y'*log(prediction) instead of y .* log(prediction) coz that is supposed to be more efficient

turbid bay Oct 12, 2018, 10:14 PM

#

what does the ones() do exactly. and i tried the Transpose matrix multiplication but it wouldn’t give me the right value either so i kept on swapping between the 2

small ore Oct 12, 2018, 10:16 PM

#

ones(p, q) gives you a matrix of the size pXq with all the values as 1

turbid bay Oct 13, 2018, 8:51 AM

#

ah yes that will probably work better than using a scalar value then subtracting a matrix

#

when i am next at my computer i will change it and see if that works. thanks

dreamy tartan Oct 13, 2018, 10:47 AM

#

Hi i want to ask something.
I will try to predict lifetime period. The dataset has this information in months in a column, and I'll set it as the target column to predict. The lifetime period ranges from 0 to 74. Which method should I use to predict it? I was thinking of using Linear Regression, but terms like multioutput regression, multi-label classification etc. confused me.

small ore Oct 14, 2018, 2:24 AM

#

That is an incomplete question with insufficient information. Either make a clear presentation of your problem keeping in mind the reader or ....

#

!t ask

arctic wedgeBOT Oct 14, 2018, 2:24 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

timid zealot Oct 14, 2018, 9:13 PM

#

Hello! I'm new here, although I'll do my best to make a clear case. I'm new to data science, and basically my problem is I'm trying to make a recommendation system, using the kNN algorithm (k-nearest neighbors) and Euclidean distance, mixed ofc with the pandas and NumPy modules. I'm looking for some guide or cheatsheet or anything at all to get started, since the ones I've found all talk about the same example (some Iris flower test db that comes with the scikit-learn module) and have little (at least to me) adaptability to my case.

#

What I want is this system to recommend users new artists by comparing their answers to other users who had answered in a similar manner. User would submit three artists of their liking and then the program would have to work with that

#

If I'm allowed to, I can attach some files, like the db and the code I have so far

small ore Oct 14, 2018, 9:24 PM

#

if the code is small enough to fit, post it here using codeblocks, else https://paste.pythondiscord.com/ is your friend.

timid zealot Oct 14, 2018, 9:26 PM

#

https://paste.pythondiscord.com/edufijiwek.py

#

there it is

real wigeon Oct 14, 2018, 11:23 PM

#

@small ore hey I was told you might be the person to speak to regarding selenium

daring bison Oct 15, 2018, 12:42 AM

#

Hi

#

I have a question, why cooling some metals enables them to become superconductive?

small ore Oct 15, 2018, 1:57 AM

#

@real wigeon What? You must be mistaken. I dont even know how it looks like. From my limited knowledge, Selenium does not seem remotely related to this channel

real wigeon Oct 15, 2018, 1:59 AM

#

welp

#

can't tell if I'm being trolled

small ore Oct 15, 2018, 1:59 AM

#

I am not someone to speak to for any topic for that matter. And in general you should not be looking at individuals in this server for any question/discussion even if they are knowledgeable on a topic

real wigeon Oct 15, 2018, 2:00 AM

#

ok well thank you, I was in the help channel and someone told me I should speak to you.

small ore Oct 15, 2018, 2:00 AM

#

I can't tell if you were really trolled or you are trolling me 😄

real wigeon Oct 15, 2018, 2:03 AM

#

well played sir.. well played

lapis sequoia Oct 15, 2018, 2:19 AM

#

Ayo wha

#

@quiet gyro ^^

#

[better idea to ping @ Moderators or to ping an online mod?]

small ore Oct 15, 2018, 2:31 AM

#

Yeah. Need more active mods who look into this channel. I suspect this channel is going dead in ways

quiet gyro Oct 15, 2018, 2:47 AM

#

Better to ping @moderators

#

@real wigeon That sort of behavior and language isn't tolerated here

small ore Oct 15, 2018, 2:48 AM

#

I mean, if you could ban them (I think this is a repeat. I searched for them and they pinged the last person who answered earlier too) and preferably also delete all of what they said, I'd be glad

quiet gyro Oct 15, 2018, 2:49 AM

#

Bit preoccupied right now, I'll look in a bit, thanks for the heads up

real wigeon Oct 15, 2018, 3:14 AM

#

If you’re referring to me, this is the first time I’ve said anything remotely negative. Py_noob is being difficult and asserting that I should not ask questions. I assumed it was a joke so I responded in kind.

#

Throw the banhammer around if you want, life goes on regardless

lapis sequoia Oct 15, 2018, 3:19 AM

#

@real wigeon py_noob is being nothing

simple crag Oct 15, 2018, 3:19 AM

#

Enough

lapis sequoia Oct 15, 2018, 3:19 AM

#

it just looks like facknoobs misunderstood

#

some suggestion they got to talk to someone?

#

oh, sorry. didn't see I think this is repeat.

turbid bay Oct 15, 2018, 6:14 PM

#

hello, i am doing logistic regression and am trying to calculate the cost of some theta values. However, I am not getting the expected value I want with the code I have made. I was originally using Octave but have moved to python as I understand how to use it better. But still no luck. I will paste the entire code and would like to ask if anyone can spot any mistakes. The cost is supposed to equal 2.534819. https://pastebin.com/hsG3HcR6

Pastebin

[Python] import math x = [[1,0.1,0.6,1.1], [1,0.2,0.7,1.2],...

#

thanks in advance for whoever can help me solve this problem

polar acorn Oct 15, 2018, 8:02 PM

#

@turbid bay Can't really see that you're doing something wrong here. Are you sure you are comparing the correct examples?

spare karma Oct 15, 2018, 8:42 PM

#

Anyone familiar with making logistic regression models in python? I'm making my first model and can't seem to increase my accuracy_score() with the available variables. Specifically, whenever I include a 3rd, 4th variable, my score goes down. Does this mean that my model's accuracy in predicting outcomes is decreasing as I include more and more variables?

#

I'm trying to make the "best" model with what's available.

polar acorn Oct 15, 2018, 9:22 PM

#

@spare karma are you testing out of sample? You might be overfitting, consider regularisation.

spare karma Oct 16, 2018, 2:19 AM

#

@polar acorn Thank you for the response. I think out of sample? I'm using a baked-in function to test, train and split (my instructor recommended one). I'll look into regularization. Are there any general conventions towards selecting variables? (If interested, code below - kobe bryant data.)```

# fit a logistic regression model and store the predictions

feature_cols = ['combined_shot_type_numeric','season_numeric', 'shot_distance', 'minutes_remaining']
X = kobe[feature_cols]
y = kobe.shot_made_flag

model = Model()
model.fit(X, y)
kobe['pred'] = model.predict(X)

from sklearn.metrics import accuracy_score
accuracy_score(kobe.shot_made_flag, kobe.pred.round())```

#

0.5952834961279527
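
#

For what it's worth, the snippet above scores the model on the same rows it was fitted on, so 0.595 is an in-sample number. A minimal sketch of an out-of-sample check, assuming scikit-learn's train_test_split and a synthetic stand-in for the Kobe data (the arrays, the split size and the C value are illustrative assumptions, not from the course):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the shot data (hypothetical): one useful column, one noise column.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Hold back a quarter of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# In scikit-learn a smaller C means stronger L2 regularisation.
model = LogisticRegression(C=1.0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If the training score keeps rising while the test score falls as columns are added, that would point towards overfitting.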

small ore Oct 16, 2018, 7:11 AM

#

Could you show more code please? I am unable to know what your Model class is. If regularisation does not work, then consider reviewing the model. Like introducing more variables which could be some powers of the existing variables and do some analysis to determine which variables contribute little to your model

#

Disclaimer.I am not an expert. I am trying to learn from online sources and through forums like these

turbid bay Oct 16, 2018, 8:44 AM

#

@polar acorn thats why im extremely confused too, because i think its working correctly. im using the data from a course on coursera, i copied and pasted it pretty much so im 99% sure its the right data. my only thought is that there are multiple ways of calculating cost, so possibly this is just a different way compared to what they expected you to use on the course

dim wolf Oct 16, 2018, 8:45 AM

#

Hi! I have a probably trivial question, but I lack the proper word to search effectively for information on it. If I do a standard plt.plot(x,y) using matplotlib.pyplot, and y is a numpy array of large numbers, python will automatically remove some appropriate power from the y-tick labels, and write that power in the upper left corner of the plot. Can I control this feature somehow, and tell pyplot which factor should be divided out of the ticks? Most importantly, can this also be done for plt.yscale('log') ie a logarithmic axis? Code example here: https://paste.pythondiscord.com/nomamojofe.py

dim wolf Oct 16, 2018, 9:02 AM

#

I mean, I realize that I can just divide my y-array with some number and annotate that in the plt.ylabel, but since the automatic function is already there, it would make sense that it could also be controlled by the user.
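
#

A rough sketch of one way this might be controlled, assuming a reasonably recent matplotlib. On a linear axis, ticklabel_format with equal scilimits fixes the factor shown in the corner. On a log axis the default formatter is not a ScalarFormatter, so the sketch falls back to a hand-written FuncFormatter (in_millions is a made-up helper name and the data is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

x = np.arange(10)
y = (x + 1) * 2.5e6

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

# Linear axis: scilimits=(6, 6) pins the common factor to 10^6.
ax_lin.plot(x, y)
ax_lin.ticklabel_format(axis="y", style="sci", scilimits=(6, 6))


def in_millions(value, pos):
    """Tick label for a value expressed in units of 10^6."""
    return f"{value / 1e6:g}"


# Log axis: divide the factor out in a custom formatter and say so in the label.
ax_log.plot(x, y)
ax_log.set_yscale("log")
ax_log.yaxis.set_major_formatter(FuncFormatter(in_millions))
ax_log.set_ylabel("y (millions)")

fig.canvas.draw()  # forces the formatters to compute their tick labels
```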

small ore Oct 16, 2018, 9:05 AM

#

@turbid bay it is not only efficient but also easier to code and debug if you use those as vectorised stuff.

turbid bay Oct 16, 2018, 9:17 AM

#

@small ore i am using vectorisation in my octave code. however i dont know how to do vectorisation in python

small ore Oct 16, 2018, 9:17 AM

#

Using numpy methods. You dont have to code yourself

#

Numpy has arrays, matrices and ndarrays. I am not sure which one is a good fit but I think any of those will work with corresponding methods for transpose and such
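
#

A minimal vectorised sketch of the logistic regression cost with plain NumPy arrays. Only the first two rows of X are visible in the paste preview; the remaining rows, theta, y and lam below are assumptions chosen for illustration. With them, the unregularised cost comes out near 0.7348 and adding the L2 penalty gives about 2.5348, so one possible explanation for a mismatch is a missing regularisation term:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_cost(theta, X, y, lam=0.0):
    """Cross-entropy cost with an optional L2 penalty (theta[0] is not penalised)."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty


# Assumed data: a column of ones followed by 0.1 .. 1.5 laid out column by column.
X = np.column_stack([np.ones(5), np.arange(1, 16).reshape(3, 5).T / 10])
y = np.array([1, 0, 1, 0, 1])
theta = np.array([-2.0, -1.0, 1.0, 2.0])

print(logistic_cost(theta, X, y))           # without regularisation
print(logistic_cost(theta, X, y, lam=3.0))  # with the L2 penalty
```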

#

If you have problems in your octave bits I will try to help provided the head guys have no problems

turbid bay Oct 16, 2018, 9:47 AM

#

ill msg u privately with octave questions

spare karma Oct 16, 2018, 1:14 PM

#

@dim wolf I'm pretty new, so my opinion isn't worth much but if it were me, and there's a deadline associated to your plot, I'd just extract what I need manually into a separate df.

#

@small ore I can introduce powers to existing variables? That's awesome. I never thought of that. Not sure what you want to see, I'm currently at ~100 lines.

small ore Oct 16, 2018, 1:46 PM

#

I mean if that model.fit is a fancy method which already automatically calculates what variables/powers are needed and what are not and does a super good fit all of what I said regarding bettering the model makes no sense. I just wanted to see what the Model class is

#

And please note: your problem still stays linear if you introduce powers, because the model is still linear in its coefficients; the powers just become extra input columns. So when I said 'variable' for your feature_cols, I was in some sense wrong
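
#

A small sketch of what "introducing powers" could look like, assuming scikit-learn's PolynomialFeatures (the two-column array is made up). Each power or product becomes one more column, and the model fitted afterwards still has one coefficient per column:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two original columns, two example rows (illustrative values).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# degree=2 adds x1^2, x1*x2 and x2^2 next to the original columns.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)
```

The expanded array can then be passed to model.fit in place of the original columns.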

spare karma Oct 16, 2018, 7:35 PM

#

@small ore If only they'd make em' that way, lol. On the flipside, then they wouldn't be that much fun to produce. Anywho i'm using from sklearn.linear_model import LogisticRegression as Model

wispy harbor Oct 17, 2018, 2:51 AM

#

hey guys i have been trying to upload my ipynb to github. Once they are done uploading I am not able to view them instead it gives me this error

#

📎 Screen_Shot_2018-10-17_at_13.51.09.png

lone mist Oct 17, 2018, 3:03 AM

#

try clicking on "raw" at the top right

wispy harbor Oct 17, 2018, 3:55 AM

#

@lone mist yea but how does it help?

lone mist Oct 17, 2018, 3:57 AM

#

it will show you the contents of the file

#

isn't that what you wanted

small ore Oct 17, 2018, 10:29 AM

#

from sklearn.linear_model import LogisticRegression as Model . I wonder why these modules use such conventions in examples as well as their code.

knotty nexus Oct 17, 2018, 5:54 PM

#

as a first assignment I've been asked to perform a classification task based on JIRA user story summary, description and activity data. Sample size is ~100k user stories. Classification is based on which 'customer' is being supported by said user story, eg, regulators, internal IT improvement, actual product improvement, etc. Any idea which model is optimal for such a case? If supervised, how big should my output sample be?

vestal axle Oct 19, 2018, 12:00 PM

#

List all the odd numbers between 1 and 200 that are divisible by 5. Your result should be a list containing a set of integers satisfying this condition.

#

Anyone who can help me on this?

hollow gulch Oct 19, 2018, 12:33 PM

#

anyone able to turn this code so that instead of cumcount being inserted into the df, it would use the max count value for each group? Using pandas

#

📎 unknown.png

twilit bolt Oct 19, 2018, 12:49 PM

#

@vestal axle something like this maybe?

#

set_of_integers = []

for i in range(1, 201, 2):
    if i % 5 == 0:
        set_of_integers.append(i)
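
#

The same loop could also be written as a list comprehension; a short sketch:

```python
# Odd numbers from 1 to 199, keeping only the multiples of 5.
odd_multiples_of_5 = [i for i in range(1, 200, 2) if i % 5 == 0]
print(odd_multiples_of_5)
```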

vestal axle Oct 19, 2018, 12:51 PM

#

Yeah, that works! Thanks FONZ 😃

twilit bolt Oct 19, 2018, 1:14 PM

#

@hollow gulch why not just use .count() instead of .cumcount()? Or am I misunderstanding your objective?

hollow gulch Oct 19, 2018, 1:19 PM

#

@twilit bolt for some reasons using count for groupby doesnt work

#

unless i miss something

#

is there anyway to list all the function that I am typing out while using spyder? so I have some idea of what I could do

lyric canopy Oct 19, 2018, 1:25 PM

#

What are you counting? Values within Address Line 1 or do you want to know the count each group has?

#

@hollow gulch

hollow gulch Oct 19, 2018, 1:27 PM

#

@lyric canopy thanks for replying, this is what my df looks like

#

not exactly the problem I have but similar and easier to understand

#

📎 unknown.png

#

I want to create a count column by group to produce result so for example with 'ST', the data in the above example would look like this 2 2 1 1 1

#

the groupby.cumcount + 1 would look like this 1 2 1 1 1

lyric canopy Oct 19, 2018, 1:29 PM

#

Right, just total count per group on each of the lines of that group

hollow gulch Oct 19, 2018, 1:29 PM

#

yep ❤

#

here is my code

#

df2['Count']=df2.groupby('Address Line 1').max(df2.groupby('Address Line 1').cumcount()+1)

#

the max function doesnt work 😦 I don't know which function available that I could use so I just test out random syntax hope it would work

#

its different from excel where as you start typing, it suggest a list of function available to use. this one doesn't give suggestion which make it harder to code

lyric canopy Oct 19, 2018, 1:33 PM

#

Right. Well, it's been a while since I used pandas, so I'm going to experiment for a minute

#

But, you can find a list of all the groupby functions on this page: https://pandas.pydata.org/pandas-docs/stable/api.html#id39
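
#

If you want that kind of list without leaving the console, a small sketch using Python's built-in dir() (the tiny dataframe is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"ST": ["MT", "CA", "CA"]})
grouped = df.groupby("ST")

# Every public attribute and method available on the groupby object.
public_names = [name for name in dir(grouped) if not name.startswith("_")]
print(public_names)
```

In an IPython console, typing `grouped.` and pressing Tab usually shows the same names, and help(grouped.transform) prints the documentation for one of them.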

#

So, df2.groupby('Address Line 1').size() will get you the group sizes

#

Now, you still have to add them to the dataframe

#

This should work:

hollow gulch Oct 19, 2018, 1:40 PM

#

how do I add it to each line?

lyric canopy Oct 19, 2018, 1:41 PM

#

df2['Count'] = df2.groupby('GROUPINGTHINGY').transform(len)

should work I think

#

Wait

#

That's the wrong paste

#

Wait, now it doesn't work anymore.

hollow gulch Oct 19, 2018, 1:44 PM

#

yeah, individual syntax work

#

like print (df2.groupby('Address Line 1').size())

#

but I dont know how to put that value into the data frame for each row based on the 'Address Line 1' value

lyric canopy Oct 19, 2018, 1:47 PM

#

Yeah, I know, but there's a simple way to do it with one statement, it's just that I can't remember it because it's been months since I touched pandas. I can only hack it currently:

>>> df['count'] = df.groupby('A').size()
>>> df
   A  count
0  a    NaN
1  a    NaN
2  a    NaN
3  b    NaN
4  b    NaN
5  a    NaN
>>> df['count'] = df.groupby('A').transform(len)
>>> df
   A  count
0  a      4
1  a      4
2  a      4
3  b      2
4  b      2
5  a      4

#

But, that should be possible in one line

hollow gulch Oct 19, 2018, 1:52 PM

#

what do you mean you hack it 😛

twilit bolt Oct 19, 2018, 2:04 PM

#

If I'm understanding correctly, this should do the trick.: ```df = pd.DataFrame({
'City' : ['BILLINGS', 'LANSING', 'HICKORY', ' HAYWARD', 'NORTH EAST', 'SAN DIMAS'],
'ST' : ['MT', 'MI', 'NC', 'CA', 'MD', 'CA']
})

df

df['Count'] = df.groupby('ST')['ST'].transform('count')```

lyric canopy Oct 19, 2018, 2:07 PM

#

That's it, yeah.

#

Couldn't get it to work because I tried:

df['Count'] = df.groupby('ST').transform('count')

So, without the ['ST']. I

#

@hollow gulch Did you see the answer above?

hollow gulch Oct 19, 2018, 2:52 PM

#

what does transform do?

#

let me look it up real quick

#

the syntax looks a bit weird because I dont see groupby () [] next to each other that often

lyric canopy Oct 19, 2018, 2:53 PM

#

Okay.

hollow gulch Oct 19, 2018, 2:53 PM

#

way i understand it is group() and () would call function or combination of condition

#

and [] is to apply array

lyric canopy Oct 19, 2018, 2:54 PM

#

df.groupby('ST') creates a "groupby" object (grouped by 'ST')

#

Next, you select the ['ST'] column of that object and use count on it

#

So, it's equivalent to:

#

grouped_df = df.groupby('ST')
df['Count'] = grouped_df['ST'].transform('count')

hollow gulch Oct 19, 2018, 2:58 PM

#

and I didnt know you could do transform ('count') to command a function using a string

#

i thought it has to be something like groupby().command here

#

learning so much ❤

#

thanks you the greatest teachers ❤ @lyric canopy and @twilit bolt

hollow gulch Oct 19, 2018, 3:18 PM

#

so to have a better understanding of how things work, grouped_df['ST'].cumcount() would give cumulative count for each row within a group.

#

grouped_df['ST'].count() assume it would work would give count of the group but it's not in a usable format (wonder why)

#

so we need to use grouped_df['ST'].transform('count') to put it in the right format (I still don't know which format we are in or needed for each of this)

#

i assume the 2 format we dealing with here are array + int. For Array to work, it has to match the size.

#

I am not used to the transform function and what it does
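
#

A small sketch of the difference, using the ST column from the example above. The difference is the shape and index of the result rather than arrays versus ints: count() returns one value per group, while transform('count') repeats that value on every row of the group, so it lines up with the original dataframe:

```python
import pandas as pd

df = pd.DataFrame({"ST": ["MT", "MI", "NC", "CA", "MD", "CA"]})
grouped = df.groupby("ST")["ST"]

per_group = grouped.count()           # one row per group, indexed by the group label
per_row = grouped.transform("count")  # one row per original row, same index as df
running = grouped.cumcount() + 1      # 1, 2, 3, ... within each group

# per_row aligns with df, so it can be assigned directly.
df["Count"] = per_row

# Equivalent route: look each row's label up in the per-group result.
df["Count via map"] = df["ST"].map(per_group)
print(df)
```

Assigning per_group directly would produce NaN, because its index holds the state codes instead of the row numbers 0 to 5.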

twilit bolt Oct 19, 2018, 3:23 PM

#

@hollow gulch It seems like you are relatively new to Pandas. May I suggest 10-minutes to Pandas and the Cookbook, short and sweet examples. If you work through the examples therein, you'll be able to tackle about any data wrangling issues you may encounter.

hollow gulch Oct 19, 2018, 3:24 PM

#

❤ thanks so much for the resources I’ve been trying to find a good source so I can study from. I want to move my excel work to panda but syntax is what usually throw me off

twilit bolt Oct 19, 2018, 3:29 PM

#

You're welcome.

vestal axle Oct 19, 2018, 3:37 PM

#

📎 unknown.png

#

Anyone who knows how to solve this?

#

I need to run a for loop, but how should i code it?

lyric canopy Oct 19, 2018, 3:42 PM

#

What have you tried so far?

#

And, do you need to use for-loops? Because you don't need to if you don't

#

Well, I've gtg @vestal axle , but there are plenty of options for calculating the correlation without explicitly using for-loops yourself.

small ore Oct 19, 2018, 4:50 PM

#

I am curious what course that is

proud raven Oct 19, 2018, 8:01 PM

#

I don't know if this was posted elsewhere here but Twitter released a data store for all the accounts related to Russian trolling. Could be a fun dataset to play with. https://about.twitter.com/en_us/values/elections-integrity.html#data

Elections integrity

lapis sequoia Oct 20, 2018, 6:21 AM

#

im pondering the thought of using machine learning to identify what type of log file i am going to perform some graphing on. Would that be possible? Right now i am using pandas ```py
if df.columns.tolist() == my_dict.get("type"):
    plottype = type
```

But i feel a little automation there would be cool? would it be possible?
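
#

Before reaching for machine learning, a plain lookup over known column layouts might already be enough. A minimal sketch, where KNOWN_LAYOUTS, its column names and detect_log_type are all made-up names for illustration:

```python
import pandas as pd

# Hypothetical layouts: log type -> the exact column order that log uses.
KNOWN_LAYOUTS = {
    "cpu": ["timestamp", "core", "load"],
    "network": ["timestamp", "interface", "rx_bytes", "tx_bytes"],
}


def detect_log_type(df):
    """Return the name of the first layout matching the dataframe's columns, or None."""
    columns = df.columns.tolist()
    for log_type, expected in KNOWN_LAYOUTS.items():
        if columns == expected:
            return log_type
    return None


cpu_log = pd.DataFrame(columns=["timestamp", "core", "load"])
print(detect_log_type(cpu_log))
```

A learned classifier would mostly start to pay off if the logs have no stable header to match on.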

regal yarrow Oct 20, 2018, 6:22 AM

#

Hey

placid snow Oct 20, 2018, 6:24 AM

#

This is not the place for advertisement. @regal yarrow

regal yarrow Oct 20, 2018, 6:25 AM

#

sorry

lapis sequoia Oct 20, 2018, 6:39 AM

#

anaconda is a "closed" enviroment right? it wont mess up my already installed python3 stuff?

lapis sequoia Oct 20, 2018, 6:57 AM

#

hm, installed latest anaconda and started a "conda" project in pycharm, but it said python 3.6. Is it not 3.7 wich comes with conda?

vestal axle Oct 20, 2018, 11:01 AM

#

Vesiculus I managed it

#

py_noob its data analysis in python im on my second year of Msc Finance

#

in

small ore Oct 20, 2018, 3:03 PM

#

Nice!

minor bolt Oct 22, 2018, 11:03 AM

#

any help with timeit

lyric canopy Oct 22, 2018, 11:06 AM

#

Post your question and I'm sure people will help you

hollow gulch Oct 22, 2018, 1:42 PM

#

Good morning, I am trying to make a new column so that if QTY and MAX QTY > 0, they stay the same, otherwise QTY=0 and MAX QTY = 100000. Datatype is object, and I attempted to convert them to numeric but failed. can anyone help please

#

📎 unknown.png

#

df1[['QTY','MAX QTY']] = df1[['QTY','MAX QTY']].apply(pd.to_numeric)

#

I also tried this pd.to_numeric(df1[['QTY']], errors='coerce').fillna(0).astype(int)

twilit bolt Oct 22, 2018, 2:29 PM

#

@hollow gulch This is the way I would do it ```df = pd.DataFrame({
'qty' : [1, 2, 450, '--'],
'max_qty' : ['nan', 2, '-', '--']
})

df

df.replace(['nan', '--', '-'],[9999, 9999, 9999 ], inplace = True)
df.loc[(df['qty'] == 9999 ) | (df['max_qty'] == 9999), ['qty', 'max_qty']] = 0, 100000```

hollow gulch Oct 22, 2018, 2:34 PM

#

Thanks @twilit bolt as always ❤

#

I was hoping there was a better way to capture it instead of manually identify all miscellaneous items

#

it is what it is then ❤

twilit bolt Oct 22, 2018, 2:36 PM

#

I'm sure there is, using the 're' module. But, I'll leave someone else to come up with that solution. As still can't get my head around working with combinations of strings.

#

You may want to look into it, https://docs.python.org/3/library/re.html
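
#

For reference, a regex-free sketch that avoids listing every placeholder by hand: pd.to_numeric with errors='coerce' turns anything non-numeric into NaN, and NaN never passes a "> 0" test. The sample values mirror the ones above; the column names are taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "QTY": [1, 2, 450, "--"],
    "MAX QTY": ["nan", 2, "-", "--"],
})

# Column by column: anything that is not a number becomes NaN.
num = df[["QTY", "MAX QTY"]].apply(pd.to_numeric, errors="coerce")

# A row is kept only when both values are positive numbers.
ok = (num["QTY"] > 0) & (num["MAX QTY"] > 0)

df["QTY"] = num["QTY"].where(ok, 0)
df["MAX QTY"] = num["MAX QTY"].where(ok, 100000)
print(df)
```

Note that pd.to_numeric expects a single column (or list), which is why it is applied per column here instead of being called on df[['QTY']].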

hollow gulch Oct 22, 2018, 2:37 PM

#

quick and simple solution sometimes better as I do need to get this project move on, thanks for all the help

twilit bolt Oct 22, 2018, 2:37 PM

#

np

hollow gulch Oct 22, 2018, 2:37 PM

#

I love python because it has great learning curve and so much to its capability

#

and those are the question I keep on the back of my heads that one day I can come up with a better solution

#

I hope python is a good solution to my data analytics day to day project and it would be better than excel/VBA. I found it more amusing because I'd have to code the same thing in VBA, might as well just do all of the tasks in python and save the workload if I need to go back and edit anything

naive hornet Oct 23, 2018, 2:17 AM

#

this channel is meant to serve somewhat as a data science discussion location, but this is also a Python server first and foremost. might be a question better suited to an off topic channel

foggy junco Oct 23, 2018, 2:29 AM

#

Is there alot of math in datascience

foggy junco Oct 23, 2018, 2:56 AM

#

Hello

#

Yo

placid snow Oct 23, 2018, 4:19 AM

#

Depends what you do.

#

There is also no need to bump your message, this isnt a forum

naive orchid Oct 23, 2018, 6:05 AM

#

I'm getting a memory error from loading a CSV file (a few GB) into pandas. Is there a way to split up the file and analyze it piece by piece?

lyric canopy Oct 23, 2018, 7:22 AM

#

@naive orchid Sure.

#

Take a look at the chunksize parameter of https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.read_csv.html

#

It uses http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking to read a file chunk by chunk
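
#

A minimal sketch of the chunked pattern. An in-memory string stands in for the large file here, and the column names are made up; with a real file you would pass its path instead of the StringIO object:

```python
import io
import pandas as pd

# Stand-in for a large CSV on disk (illustrative content).
csv_text = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"

total = 0
rows = 0
# With chunksize set, read_csv yields dataframes of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["amount"].sum()
    rows += len(chunk)

print(rows, total)
```

Only one chunk is held in memory at a time, so this works for aggregates that can be built up piece by piece.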

olive trench Oct 23, 2018, 9:08 AM

#

Guys do you know what could trigger the setting with copy warning in pandas? I have 9 columns I'm doing an operation to and I only get the warning on two of them.

#

this is my code

    for column in metadata_part.columns:
        new_col = metadata_part.loc[:, column].astype(CategoricalDtype(categories=unique_vals[column]))
        metadata_part.loc[:, column] = new_col

polar acorn Oct 23, 2018, 9:18 AM

#

Setting with copy warning is a mysterious beast. But it usually means you are trying to write to something that is a view of a dataframe. In your case where does metadata_part come from? Is it a subset you've taken from a larger data frame?

olive trench Oct 23, 2018, 9:19 AM

#

It is. And I didn't use .loc for it. I probably should, right?

polar acorn Oct 23, 2018, 9:23 AM

#

Might not matter actually. If you want to change the parent dataframe you should change that directly. If you want to change only a copy of the subset and work with that leaving the parent df intact you can use .copy() when you subset it i.e. metadata_part = parent_df.loc[].copy()

olive trench Oct 23, 2018, 9:24 AM

#

I am not working with the source dataframe any further. I might try the .copy() thing. I am mostly confused it warns me only on two of my columns but not on the rest

polar acorn Oct 23, 2018, 9:27 AM

#

As I said it is a strange warning that sometimes triggers when things are fine and sometimes doesn't even when it should. But in general you should always be fine if you're working on a df that is not a shallow copy (either original or subsetted with .copy()) and using only one .loc or .iloc statement to subset it.
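
#

A small sketch of that pattern, with made-up data: take the subset with a single .loc call, copy it explicitly, then convert the columns on the copy. Assigning with part[column] = ... replaces the whole column, which is an assumption about what the loop above is meant to do:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

parent = pd.DataFrame({"kind": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
unique_vals = {"kind": ["a", "b", "c"]}  # hypothetical category lists per column

# One .loc call plus .copy(): part is now independent of parent.
part = parent.loc[parent["value"] > 1, ["kind"]].copy()

for column in part.columns:
    dtype = CategoricalDtype(categories=unique_vals[column])
    part[column] = part[column].astype(dtype)

print(part.dtypes)
```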

olive trench Oct 23, 2018, 9:27 AM

#

Alright, thank you very much for the help. I'll look more into it and see if anything changes!

twilit bolt Oct 25, 2018, 9:29 PM

#

@hollow gulch Something like this?
df['Count'] = 1
df.loc[(df['Tax Y/N'] != df['Tx Ex']) | (df['Tx Ex'] == 'NaN'), 'Count'] = 2

blazing bolt Oct 26, 2018, 2:59 PM

#

Has anyone got any good suggestions for creating tables and exporting them automatically using python, in either a .png format or straight onto a powerpoint? Has to be compatible with pandas.

#

???

hallow obsidian Oct 26, 2018, 4:47 PM

#

Say I partition the unit interval [0, 1] into a grid via u=1/(m-1) and G=(k*u for k in range(m)). I want a function which maps a real x to the value in that grid that's closest to x. Is there a better way than doing a sort of logarithmic walk starting in the middle of G and using abs for comparison? Any libraries which do that? I need to do it hundreds of thousands of times in a numerical execution.
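
#

Since the grid is evenly spaced, one possibility is to skip the search entirely: scale x by (m-1), round to the nearest integer index and clamp it to the grid. A vectorised NumPy sketch (snap_to_grid is a made-up name; exact ties go to the even index because of how np.rint rounds):

```python
import numpy as np


def snap_to_grid(x, m):
    """Return the point of {k/(m-1) : k = 0..m-1} closest to x (works on arrays)."""
    # Nearest index, clamped so values outside [0, 1] map to the end points.
    k = np.clip(np.rint(np.asarray(x, dtype=float) * (m - 1)), 0, m - 1)
    return k / (m - 1)


print(snap_to_grid([0.3, 0.9, -2.0, 7.0], m=5))
```

Passing all the x values in one array avoids a Python-level loop over the hundreds of thousands of calls.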

twilit bolt Oct 26, 2018, 8:22 PM

#

@blazing bolt I know statsmodels allows you to create simple tables from data and save them either to html, csv, or LaTex. https://www.statsmodels.org/dev/generated/statsmodels.iolib.table.SimpleTable.html

blazing bolt Oct 26, 2018, 9:29 PM

#

Thanks :')

lyric canopy Oct 27, 2018, 8:11 AM

#

@blazing bolt You can use Pandas with Matplotlib to create table figures. It's not that difficult to do, but I don't know how difficult it is to make them look pretty. I usually export my table data in LaTeX markup to display it in the style of the document I'm writing, but that may not be suitable for your application (Powerpoint; I mainly use LaTeX/pdf for presentations as well).

#

Here's a short tutorial on plotting tables with graphs, but if you skip the graph part, you should be able to just display/export the table in a graphical form: http://pandas.pydata.org/pandas-docs/stable/visualization.html#plotting-tables

#

I've never used it, though, so you maybe have to google a bit for the appropriate approach

#

Anyway, I hope the terminology on that page I linked you helps with that
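
#

A rough sketch of the table-only route, assuming pandas.plotting.table and a headless matplotlib backend. The dataframe, figure size and output path are illustrative, and the styling is left at the defaults:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table

df = pd.DataFrame({"product": ["A", "B"], "units": [10, 20]})

fig, ax = plt.subplots(figsize=(4, 1.5))
ax.axis("off")               # hide the axes so only the table is drawn
table(ax, df, loc="center")  # render the dataframe as a matplotlib table

out_path = os.path.join(tempfile.gettempdir(), "table_example.png")
fig.savefig(out_path, dpi=200, bbox_inches="tight")
plt.close(fig)
print(out_path)
```

The resulting .png can then be placed on a slide like any other image.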

copper swan Oct 27, 2018, 12:52 PM

#

where do i learn the python libraries for data science

#

and machine learning

earnest prawn Oct 27, 2018, 12:54 PM

#

in their documentation and stack overflow

#

as for many libraries

small ore Oct 27, 2018, 12:56 PM

#

Most YT videos are not good though. Either they use obsolete python 2 or they use dirty conventions and can be confusing.

#

On their sites, they have their own tutorials alongside the documentation. So that should help too

small ore Oct 27, 2018, 5:11 PM

#

Scroll down to python section and the sections below in https://medium.com/machine-learning-in-practice/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78
That has some suggestions

Medium

Over 150 of the Best Machine Learning, NLP, and Python Tutorials I...

By popular demand, I’ve updated this article with the latest tutorials from the past 12 months. Check it out here

lapis sequoia Oct 27, 2018, 6:44 PM

#

📎 bJWPjx1Cm1CAKCgCDwKBD4H22f7LV7dlAhAAAAAElFTkSuQmCC.png

#

📎 VB43tmvD2iNQh05pRIc0ulFWqJpQmBtxYyQQAEQAAEiAA5fZHPAq07CPSA4wLeCBAAARAAARAAARDgJyA2GDDT5EeDXBAAARAAAR.png

#

what am I doing wrong ;;

lyric canopy Oct 27, 2018, 7:26 PM

#

You need to calculate the correlation between the first and the second column.

#

In this case, you've only provided one of the two as an argument

#

Try this:

#

corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])

#

That will return a correlation matrix as an array, I think
@lapis sequoia
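
#

A small sketch of what comes back, with a made-up array standing in for np_baseball (column 0 as height, column 1 as weight is an assumption). np.corrcoef returns a 2x2 matrix, and the off-diagonal entry is the correlation between the two columns:

```python
import numpy as np

# Hypothetical stand-in data: column 0 = height, column 1 = weight.
data = np.array([
    [180.0, 78.0],
    [175.0, 72.0],
    [190.0, 88.0],
    [168.0, 65.0],
])

corr_matrix = np.corrcoef(data[:, 0], data[:, 1])
r = corr_matrix[0, 1]  # same value as corr_matrix[1, 0]
print(corr_matrix)
print(r)
```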

lapis sequoia Oct 27, 2018, 7:43 PM

#

OH I GOT IT NOW

#

THANK YOU SO MUCH

young aurora Oct 27, 2018, 8:37 PM

#

I need help with data cleaning and asked in help-0. Is it kosher for me to ask here, too?

#

Data cleaning/merging, rather.

lean ledge Oct 27, 2018, 8:47 PM

#

Covariance makes me very happy tbh

#

Non-zero correlation/covariance terms are best

small ore Oct 27, 2018, 8:51 PM

#

@young aurora It is always best to ask those questions here. This channel is not as active as the help channels but there is a chance your question is lost in the help channels if no one around there knows the answer. I have seen people go through old questions and answer in this channel a week after it was asked

young aurora Oct 27, 2018, 8:53 PM

#

If I want to combine 35 CSVs that have somewhat different column names, what's the best way to do it?
I'm trying to do it with "usecols" from Pandas, but when there aren't column names that I'm specifying within a CSV, it spits an error.
Here's an example```

                                                                   'COL2',
                                                                   'COL3',
                                                                   'COL4',
                                                                   'COL5',)
    for f in os.listdir(os.getcwd()) if f.endswith('csv')]

combexample = pandas.concat(example, axis=1, join='inner').sort_index()```

So, some of the files might not have COL4 (it's technically the same info within the row, but the header name is different). How can I get it to combine properly?
I'd love any help I can get.
If it helps, there are 40 total files.
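
#

One possible approach, sketched with in-memory files: read each CSV without usecols, rename the alternative headers to a common name, then reindex to the wanted columns so a missing one becomes NaN instead of raising. ALIASES, WANTED, load_normalised and the header COLUMN4 are made-up names, and stacking the files row-wise (instead of axis=1) is an assumption about the goal:

```python
import io
import pandas as pd

ALIASES = {"COLUMN4": "COL4"}  # hypothetical: alternative header -> common name
WANTED = ["COL1", "COL4"]      # columns to keep in the combined frame


def load_normalised(source):
    """Read one CSV, unify header names, and keep only the wanted columns."""
    frame = pd.read_csv(source).rename(columns=ALIASES)
    return frame.reindex(columns=WANTED)  # absent columns are filled with NaN


# Three small stand-ins for files with slightly different headers.
file_a = io.StringIO("COL1,COL4,EXTRA\n1,x,9\n")
file_b = io.StringIO("COL1,COLUMN4\n2,y\n")
file_c = io.StringIO("COL1\n3\n")

combined = pd.concat(
    [load_normalised(f) for f in (file_a, file_b, file_c)],
    ignore_index=True,
)
print(combined)
```

With real files, the list comprehension would loop over the filenames from os.listdir as in the fragment above.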

#

There's the question, for what its worth - thanks, @small ore

lapis sequoia Oct 27, 2018, 11:46 PM

#

Are there any experts here in Sqlalchemy? I was wanting to have a code review done on a project I’m working on.

trail flicker Oct 27, 2018, 11:48 PM

#

ill take a look

#

this probably belongs in #databases tho

foggy junco Oct 28, 2018, 8:33 PM

#

Hello?

lyric canopy Oct 28, 2018, 8:34 PM

#

Hi

hearty hazel Oct 29, 2018, 2:00 AM

#

@heavy brook We don't currently have a system for recruitment, we're trying to figure out the best way to handle it right now.

#

We tend to remove recruitment messages for the sake of safety

young aurora Oct 29, 2018, 3:13 AM

#

So I've got a function I've written that gives me a number as its return. I'd like to loop it X times, assigning an ID to each loop + the value within the two columns "Test Number" and "Value". How can I do this?
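
#

A minimal sketch of the shape I have in mind, assuming pandas is fine for the result. run_test is a hypothetical stand-in for the real function and collect_runs is a made-up helper name:

```python
import random

import pandas as pd


def run_test():
    """Hypothetical stand-in for the function that returns a number."""
    return random.random()


def collect_runs(func, n_runs):
    """Call func n_runs times and record each result with a 1-based ID."""
    values = [func() for _ in range(n_runs)]
    return pd.DataFrame({
        "Test Number": range(1, n_runs + 1),
        "Value": values,
    })


results = collect_runs(run_test, 5)
print(results)
```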

heavy brook Oct 29, 2018, 3:32 AM

#

im currently working on multi gaze tracking/estimation

#

I dont have an ML background, but I am good at piecing things together

#

im learning python along the way

#

anyhow, the current challenge I want to solve right now is whether someone is looking at the screen or not

#

i'll make a tv stand, put a tv and then put a camera on top

#

I have seen some open source code already for head pose estimation

#

my main question now is, is there a formula to determine if someone is looking directly given a roll, pitch yaw? plus some extra params? distance from the screen angle maybe?

heavy brook Oct 29, 2018, 4:10 AM

#

wow my code is slow AF T_T

fleet ginkgo Oct 29, 2018, 4:13 AM

#

I have a dataset that has a predictor column with two values: 'Fully Paid' and 'Charged Off'. For data visualization purposes, I split data into two. Unfortunately, the 'Fully Paid' subset has twice as many rows as the 'Charged Off' subset. I made a function to randomly shuffle the 'Fully Paid' subset and match length of the 'Charged Off' subset to see if I can understand the full picture better, but every time I run the function, the plots are way different each time. Is there a way I can work around this problem to better handle keeping the size of the two subsets the same but to somehow include the "entirety" of the larger subset? I hope I was clear enough.

polar acorn Oct 29, 2018, 9:34 AM

#

Why would they need to be the same length again?
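
#

If the aim is only to compare the two groups visually, proportions within each group might make the size difference irrelevant. And if a subsample is still wanted, fixing the seed keeps the plots identical between runs. A small pandas sketch with made-up rows (the grade column is an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "loan_status": ["Fully Paid"] * 6 + ["Charged Off"] * 3,
    "grade": ["A", "A", "B", "B", "B", "C", "B", "C", "C"],
})

# Option 1: share of each grade within its own class, so class size drops out.
shares = df.groupby("loan_status")["grade"].value_counts(normalize=True)
print(shares)

# Option 2: a subsample that is the same on every run thanks to random_state.
paid = df[df["loan_status"] == "Fully Paid"]
sample_1 = paid.sample(n=3, random_state=42)
sample_2 = paid.sample(n=3, random_state=42)
```

Option 1 uses every row of the larger subset, which is what the random subsample loses.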

small ore Oct 29, 2018, 3:09 PM

#

@heavy brook ML is about finding the prediction formula. I have no expertise in determining the (roll, pitch, yaw) of a certain face but I imagine it needs a database of several faces, each rotated at various (pitch, yaw, roll), and a formula is determined by number-crunching from that database

hoary terrace Oct 29, 2018, 4:40 PM

#

yo i wanna start a project using AI. Starting from scratch so does anyone have any ideas of a good project idea to do?

weary ferry Oct 29, 2018, 6:21 PM

#

sure

#

a good first project is a Mancala Bot

lean ledge Oct 29, 2018, 7:14 PM

#

@small ore @heavy brook If you have the position and angles of your gaze then calculating whether it's looking at the screen is just simple maths
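
#

A sketch of that maths under some loud assumptions: the camera sits at the origin on the top centre of the screen, x points right, y up, z out of the screen towards the viewer, and yaw = pitch = 0 means facing the screen head-on. Roll does not move the facing direction, so it is ignored. The function name and the sizes in metres are made up:

```python
import math


def looks_at_screen(head_pos, yaw_deg, pitch_deg, screen_w, screen_h):
    """Cast a ray from the head along its facing direction and test the screen hit."""
    hx, hy, hz = head_pos
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)

    # Unit facing direction; (0, 0, -1) points straight at the screen plane z = 0.
    dx = math.sin(yaw) * math.cos(pitch)
    dy = math.sin(pitch)
    dz = -math.cos(yaw) * math.cos(pitch)

    if hz <= 0 or dz >= 0:
        return False  # behind the screen plane, or facing away from it

    t = hz / -dz      # distance along the ray to the plane z = 0
    x = hx + t * dx
    y = hy + t * dy

    # Screen rectangle: centred on x, hanging below the camera.
    return abs(x) <= screen_w / 2 and -screen_h <= y <= 0


print(looks_at_screen((0.0, -0.3, 2.0), 0, 0, 1.2, 0.7))
print(looks_at_screen((0.0, -0.3, 2.0), 45, 0, 1.2, 0.7))
```

Head pose is only a proxy for where the eyes point, so a margin around the rectangle may be needed in practice.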

small ore Oct 29, 2018, 7:16 PM

#

How do you find the position and angle of the gaze? I Understood it as him trying to find the position and angle rather than a binary answer of whether or not he is looking at the screen

#

@lean ledge

lean ledge Oct 29, 2018, 7:17 PM

#

is there a formula to determine if someone is looking directly given a roll, pitch yaw? plus some extra params

#

hecc, you actually dont even need ML for this

#

you can do some fairly simple CV

#

do some face detection, do some feature extraction, a transform for the eyes, get location, compare it to rest location

small ore Oct 29, 2018, 7:18 PM

#

Hm. Yes. That final question seems to mean the opposite but if you look at the rest of the question including
im currently working on multi gaze tracking/estimation

foggy junco Oct 29, 2018, 7:19 PM

#

https://towardsdatascience.com/why-so-many-data-scientists-are-leaving-their-jobs-a1f0329d7ea4

Towards Data Science

Why so many data scientists are leaving their jobs – Towards Dat...

Frustrations of the data scientist!

#

Is this true lol

#

People are leaving data science jobs now

#

Welp

lean ledge Oct 29, 2018, 7:20 PM

#

They're not really

#

I'd ignore 80% of stuff that's on medium

small ore Oct 29, 2018, 7:21 PM

#

And yeah. I did think of transformation too . Wasn't sure if it could be applied

foggy junco Oct 29, 2018, 7:22 PM

#

A data scientist made the article

#

So this could be accurate

lean ledge Oct 29, 2018, 7:23 PM

#

could be. It's most likely not. I too am in data science and nobody wants to leave, they earn too much to :P

#

Everything he said is part of all jobs

#

Just worse in jobs where data science isnt the main business of the company

#

In which case I point you to a bunch of other devs who have the same problems in non tech companies

twilit bolt Oct 29, 2018, 7:24 PM

#

A bit of sample selection bias in those numbers, I think. But, what do I know.

lean ledge Oct 29, 2018, 7:25 PM

#

He provided like no numbers

twilit bolt Oct 29, 2018, 7:26 PM

#

The article, states, in part >These data were collected by Stack Overflow in their survey based on 64,000 developers.

foggy junco Oct 29, 2018, 7:26 PM

#

But he said he was a data scientist

twilit bolt Oct 29, 2018, 7:27 PM

#

Right, lol

foggy junco Oct 29, 2018, 7:27 PM

#

Why would they lie about that its a popular article

#

And he proves he knows his stuff

lean ledge Oct 29, 2018, 7:27 PM

#

The only piece of data is ML/DS people are looking for different jobs, which might as well be because there's lots of new high paying jobs coming up

#

I didnt say he wasnt a data scientist

#

I said being a data scientist doesnt at all make him an authority on the market and motivations of thousands of other data scientists

#

Everything else was an opinion piece from his job working in a non-tech company

foggy junco Oct 29, 2018, 7:29 PM

#

It was fact based, he proves he knows his stuff lol

#

Unless u can correct his fact mistakes in the article

lean ledge Oct 29, 2018, 7:30 PM

#

He said no facts lmao, he gave like 1 piece of statistics

foggy junco Oct 29, 2018, 7:31 PM

#

He said alot about vector algorithms

lean ledge Oct 29, 2018, 7:31 PM

#

So do I

foggy junco Oct 29, 2018, 7:31 PM

#

Everything he said made sense

lean ledge Oct 29, 2018, 7:32 PM

#

If your reasoning is that he's a data scientist, I am too. I work in a data science company. My coworkers are data scientists. We work with clients who have their own data science teams yet pay us to do things. Every deal we make gets verified by their data scientists and the data scientists in our accountancy firm (Deloitte)

foggy junco Oct 29, 2018, 7:33 PM

#

So basically u disagree with this guy that is also a data scientist

#

Did he get any of his facts wrong in the article

lean ledge Oct 29, 2018, 7:34 PM

#

I'm not saying he doesn't make sense, I'm saying his reasoning is personally based, and largely applicable only in scenarios where you're a tech employee in a non tech company, which is far from only true for data scientist and can not at all be generalised to explain the motivations of thousands of other data scientists that are looking for a new job

#

Jesus never mind

foggy junco Oct 29, 2018, 7:35 PM

#

So is the everyone looking for a new job and leaving data science jobs true or just a troll

foggy junco Oct 29, 2018, 7:58 PM

#

Well like ur a data scientist is this true lol

lyric canopy Oct 29, 2018, 8:06 PM

#

So, what I've seen around me is that a lot of people from our masters end up in data science. Most of our students are actually scooped up before they even graduate.

#

So, I'd take that article with a grain of salt

#

Still, there are some decent points made in that article

#

Some that apply to more fields than just data science

late garnet Oct 29, 2018, 8:11 PM

#

I think that the article had a great take away of managing expectations of the company. Know what you are getting yourself into prior to joining.

#

I'm in a similar situation as what the article describes - a company in its infancy when it comes to DS.

#

This can become frustrating at times when expectations on both sides are not agreed upon.

small ore Oct 29, 2018, 8:12 PM

#

And a lack of data on their part?

late garnet Oct 29, 2018, 8:12 PM

#

Depending on the project, yes.

foggy junco Oct 29, 2018, 8:33 PM

#

I thought most students in data science come out with a bachelors?

lean ledge Oct 29, 2018, 8:41 PM

#

most people in data science have masters, maybe even PhD

#

the problem with data science market is the number of companies following the hype that hire people without knowing what data science is for

placid snow Oct 29, 2018, 8:42 PM

#

Can confirm companies scoop up people for hype techs

lean ledge Oct 29, 2018, 8:44 PM

#

they may think their excel sheet with a hundred rows is Big Data, may hire someone without having anything to give them and then expecting something out, may only hire 1 person (1 is pretty much never enough), etc etc. They dont know how to pick the right person because DS really isnt anything like normal software dev.

#

Data scientists likely arent leaving jobs because data science can be a bad job but because there's a ton of companies, even good companies, that have no reason or way to manage data scientists which makes them miserable and because there's an increasing number of more lucrative opportunities daily

lyric canopy Oct 29, 2018, 8:49 PM

#

@foggy junco Getting a masters is pretty much standard here in The Netherlands, but our system is a bit different. It's almost impossible to start a Ph.D. without having completed a masters degree first, since, well, that's seen as the "base" degree for university.

#

The only field I've seen Dutch students working on a Ph.D. without getting their masters first a couple of times is medicine.

foggy junco Oct 29, 2018, 8:51 PM

#

So basically data science is starting to die

lean ledge Oct 29, 2018, 8:52 PM

#

no

#

it's always been this way

foggy junco Oct 29, 2018, 8:57 PM

#

SWE might be more secure than data science job-security-wise but idk

lean ledge Oct 29, 2018, 9:01 PM

#

completely different careers with very different skillsets

late garnet Oct 29, 2018, 9:10 PM

#

@lean ledge there is more overlap between SWE and DS than you would think. 😃

lean ledge Oct 29, 2018, 9:11 PM

#

I wouldn't say so. Perhaps if they are at a smaller place and take on some responsibilities that would be handled by a data engineer

#

DS requires experimentation, research skills, exploration, the ability to stay up to date on literature + whatever maths and statistics skills you need, which don't really match up with the skills a good SWE needs

late garnet Oct 29, 2018, 9:15 PM

#

The overlap that I am thinking of is more along the lines of algorithm development - writing unit tests, structuring libraries, etc.

lean ledge Oct 29, 2018, 9:18 PM

#

Not really the sort of stuff data scientists do from what I've seen. Data scientists I've seen do the exploration etc, come up with the right techniques etc in a notebook, pass the notebook on to someone else who writes it as proper deployable software. It's a norm in data science AFAIK

#

There's too much stuff to do for data scientists to be concerned with implementation

late garnet Oct 29, 2018, 9:19 PM

#

@lean ledge agreed within enterprise environment.

lean ledge Oct 29, 2018, 9:19 PM

#

Where else is that not true? Startups?

#

I'm in a startup and it's still the norm here

late garnet Oct 29, 2018, 9:19 PM

#

Open source?

lean ledge Oct 29, 2018, 9:20 PM

#

Open source what?

terse pewter Oct 29, 2018, 9:24 PM

#

Hey, feature vector question I wanted to get clarified

#

I'm working on an image classifier using a support vector machine

#

I've extracted various features such as hog, dominant colors, canny edge

#

I stumbled on this on quora:

#

if {a1,a2,a3,a4,a5}, {b1,b2,b3} and {c1,c2,c3,c4} are the features extracted from an object using different feature extraction mechanisms, concatenate them to form a single feature vector as {a1,a2,a3,a4,a5,b1,b2,b3,c1,c2,c3,c4} and use it for classification.

#

So if {a1,a2,a3,a4,a5,b1,b2,b3,c1,c2,c3,c4} is a list of all the features for one image in the training data and I have 500 images, would all of those concatenated vectors be placed into a vector containing: [ {a1,a2,a3,a4,a5,b1,b2,b3,c1,c2,c3,c4}, {a1,a2,a3,a4,a5,b1,b2,b3,c1,c2,c3,c4}, ... ] ?

lean ledge Oct 29, 2018, 9:27 PM

#

Sure, that works.
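For illustration, the layout described above can be sketched as a 2-D array with one row per image and one column per feature. This is a minimal sketch, assuming each extractor returns a fixed-length 1-D array; `extract_features` is a hypothetical stand-in that returns random numbers, not a real HOG/color/edge extractor.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 500

def extract_features(image_index):
    """Hypothetical stand-in: returns the concatenated features for one image."""
    a = rng.random(5)  # {a1..a5} from the first extractor
    b = rng.random(3)  # {b1..b3} from the second extractor
    c = rng.random(4)  # {c1..c4} from the third extractor
    return np.concatenate([a, b, c])  # one flat vector of length 12

# Stack the per-image vectors into a (n_images, n_features) matrix.
X = np.stack([extract_features(i) for i in range(n_images)])
print(X.shape)
```

A matrix in this shape, together with a label array of length 500, is what most classifier libraries expect as training input.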

terse pewter Oct 29, 2018, 9:29 PM

#

Thanks Raggy for the confirmation

#

Additionally, I was wondering why, in some of the numpy.concatenate examples, the array contents were mixed together rather than just appended to the end of the first array. Why is that?

lean ledge Oct 29, 2018, 9:34 PM

#

good question, i have no clue
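One likely explanation, assuming the examples in question used 2-D arrays, is the `axis` argument of `numpy.concatenate`. With `axis=0` (the default) the second array's rows are appended after the first array's rows; with `axis=1` each row of the second array is appended to the matching row of the first, which can look like "mixing" when printed. A small sketch with made-up arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 (default): b's rows are added below a's rows -> shape (4, 2)
rows = np.concatenate((a, b), axis=0)

# axis=1: each row of b is appended to the matching row of a -> shape (2, 4)
cols = np.concatenate((a, b), axis=1)

# axis=None: both arrays are flattened first, then joined end to end
flat = np.concatenate((a, b), axis=None)

print(rows)
print(cols)
print(flat)
```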

small ore Oct 29, 2018, 9:37 PM

#

After days (months?) this channel saw some activity. Not sure if I am happy with the nature of activity

terse pewter Oct 29, 2018, 9:37 PM

#

lol thanks anyways raggy

lean ledge Oct 29, 2018, 9:40 PM

#

out of curiosity, what features did you extract from canny edge? just the pixel locations after non-maximal suppression?

terse pewter Oct 29, 2018, 9:45 PM

#

just the pixel locations, tbh I just looked up NMS because of your comment, I think I should have that haha

lean ledge Oct 29, 2018, 9:45 PM

#

NMS is part of canny edge. if you're using a library function, it's included.

terse pewter Oct 29, 2018, 9:45 PM

#

im new to ML and data science and just learning

#

ah

#

Yeah then just the pixel locations

lean ledge Oct 29, 2018, 9:46 PM

#

cool cool

terse pewter Oct 29, 2018, 9:47 PM

#

So for dominant colors I have a size of (7,3) and canny edge and hog have a shape of (100,100). Would I need to pad my dominant colors ndarray?

lean ledge Oct 29, 2018, 9:50 PM

#

shouldnt have to. you should only be comparing relevant features to the same type of features so shape consistency isnt important. you do need to find a way to weigh or normalise differences in feature vectors to get the difference between images

#

assuming you use something like L2 distance between two vectors for measure of "closeness", the distances will be on different scales
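To make the scale problem concrete, here is a minimal sketch of one common option, standardizing each feature column to zero mean and unit variance before computing distances. The data is random and purely illustrative; the column split (three color-like columns in 0..255, four columns in 0..1) is an assumption, not the actual features from this discussion.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 samples: three columns on a 0..255 scale, four columns on a 0..1 scale.
X = np.hstack([rng.random((50, 3)) * 255.0, rng.random((50, 4))])

mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns (avoid dividing by zero)

X_scaled = (X - mean) / std

# Before scaling, L2 distance is dominated by the 0..255 columns;
# after scaling, every column contributes on a comparable scale.
d_raw = np.linalg.norm(X[0] - X[1])
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])
print(d_raw, d_scaled)
```

The mean and standard deviation should be computed on the training data only and then reused for any test data.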

terse pewter Oct 29, 2018, 9:57 PM

#

Makes sense, I asked because functions such as np.concatenate, np.hstack, np.vstack, and np.column_stack complain about the shape of the dominant colors. True, I haven't thought about normalizing the data yet, was going to get the vector filled and then go back to look into normalization

#

Most examples I've seen online, people have the same rows but different columns, but I have yet to see a different number of rows

terse pewter Oct 29, 2018, 10:45 PM

#

What if I flattened my data and then concatenated it?

#

Oh but that would cause data loss if I have something like a black and white image array as a feature?
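On the flattening question: flattening only changes the shape, not the values, so nothing is lost as long as the original shapes are known. A minimal sketch using the shapes mentioned above, (7, 3) and (100, 100); the array contents are placeholders, not real extractor output.

```python
import numpy as np

dominant_colors = np.arange(21, dtype=float).reshape(7, 3)  # placeholder values
edges = np.zeros((100, 100))
edges[10, 20] = 1.0                                         # one "edge" pixel
hog_like = np.ones((100, 100))                              # placeholder values

# ravel() turns each array into 1-D, so mismatched 2-D shapes stop mattering.
feature_vector = np.concatenate(
    [dominant_colors.ravel(), edges.ravel(), hog_like.ravel()]
)
print(feature_vector.shape)  # 21 + 10000 + 10000 values

# The edge map can be recovered exactly by slicing and reshaping.
restored_edges = feature_vector[21:21 + 10000].reshape(100, 100)
```

What does change is that the classifier no longer sees the 2-D neighbourhood structure explicitly; each pixel simply becomes its own feature column.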

hollow gulch Oct 30, 2018, 5:52 PM

#

anyone knows why it won't iterate?

#

📎 unknown.png

simple crag Oct 30, 2018, 5:54 PM

#

len() returns an integer

#

Perhaps you mean to also use range()

hollow gulch Oct 30, 2018, 5:56 PM

#

oh i see

#

range returns an array?

simple crag Oct 30, 2018, 5:57 PM

#

No, but it's iterable

desert cradle Oct 30, 2018, 5:57 PM

#

or just for row in df3

hollow gulch Oct 30, 2018, 5:57 PM

#

got not error

#

📎 unknown.png

#

new*

desert cradle Oct 30, 2018, 5:57 PM

#

for i, row in enumerate(df3) if you need the number

#

@hollow gulch it needed to be for row in df3 i think

#

it'd be range(len(df3)) if you were doing what ELA suggested though
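The original code is only in the screenshot, so this is a sketch of the general pattern using a plain list named `rows` as a stand-in. One caveat if `df3` is a pandas DataFrame: iterating over a DataFrame directly yields its column labels, so for row-wise access `range(len(df3))` with `.iloc`, or `df3.itertuples()`, is the usual route.

```python
rows = ["a", "b", "c"]

# len() returns an int, and an int cannot be looped over.
try:
    for i in len(rows):
        pass
except TypeError as exc:
    error_message = str(exc)  # 'int' object is not iterable

# range(len(...)) gives the indices 0, 1, 2.
by_index = [rows[i] for i in range(len(rows))]

# enumerate() gives index and item together.
with_enumerate = [(i, row) for i, row in enumerate(rows)]

print(error_message)
print(by_index)
print(with_enumerate)
```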

hollow gulch Oct 30, 2018, 5:59 PM

#

📎 unknown.png

#

still error

#

I tried all 3

lyric canopy Oct 30, 2018, 6:17 PM

#

This is a different error

#

Your if-statement there is wrong

#

You need to use and instead of &

hollow gulch Oct 30, 2018, 6:56 PM

#

in what cases do you use & and | vs 'and' and 'or'?

desert cradle Oct 30, 2018, 6:56 PM

#

& | are bitwise, or memberwise on numpy arrays

#

and/or are short-circuiting and strictly boolean

#

so if you do foo() and bar() and foo returns false, bar won't even be called

#

though, the other issue [and the actual reason yours fails] is that & is tighter-binding than ==/!=

#

anyway, it's partially a style thing (even though the short circuiting does mean a bit for efficiency) - it's very unusual to use &/| for connecting conditions within an if statement.

#

you'd need parentheses around each of the conditions if you did

small ore Oct 30, 2018, 7:14 PM

#

But the error complains about them being strings no?

desert cradle Oct 30, 2018, 7:18 PM

#

yes

#

because it's just doing df4.iloc[row1, 2] & df3.iloc[row, 2]

#

just change the & to and
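A small sketch of the precedence issue, using plain strings in place of the DataFrame cells (the variable names are made up for illustration). Because `&` binds tighter than `==` and `!=`, the unparenthesized condition tries to compute `str & str` first, which is where the string error comes from.

```python
value_a, value_b = "red", "red"
other_a, other_b = "blue", "green"

# Parsed as: value_a == (value_b & other_a) != other_b
try:
    broken = value_a == value_b & other_a != other_b
except TypeError as exc:
    error_message = str(exc)  # unsupported operand type(s) for &: 'str' and 'str'

# Fix 1: use `and`, which has lower precedence than the comparisons.
fixed_and = value_a == value_b and other_a != other_b

# Fix 2: keep `&` but parenthesize each comparison.
fixed_parens = (value_a == value_b) & (other_a != other_b)

# Short-circuiting: with `and`, the right side is skipped if the left is falsy.
calls = []

def foo():
    calls.append("foo")
    return False

def bar():
    calls.append("bar")
    return True

result = foo() and bar()  # bar() is never called

print(error_message, fixed_and, fixed_parens, calls)
```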

lapis sequoia Oct 31, 2018, 2:22 AM

#

do i need all built functions for data science?

#

in python

lean ledge Oct 31, 2018, 6:16 AM

#

@sa#8919 What do you mean?

#

@lapis sequoia

#

🤔

viscid aspen Oct 31, 2018, 3:35 PM

#

Hey, I have some wifi sensor time series accompanied with a schematic of the area where the sensors are located. Can anyone think of any tools that could help me visualize which sensor is pinged on the map over time ? Something where I can give it my schematic (Image/PDF) and map a column in the timeseries to some object on a map that would change color at a certain time in the animation?

hollow gulch Oct 31, 2018, 6:59 PM

#

📎 unknown.png

#

anyone know what the right syntax is to do this?

misty sonnet Oct 31, 2018, 8:05 PM

#

@hollow gulch try using a linter?

fleet flower Oct 31, 2018, 10:31 PM

#

hey buddies

quiet gyro Nov 1, 2018, 3:27 AM

#

https://blogs.msdn.microsoft.com/uk_faculty_connection/2018/10/29/data-science-in-visual-studio-code-using-neuron-a-new-vs-code-extension/

Microsoft Faculty Connection

Lee Stott

Data Science in Visual Studio Code using Neuron, a new VS Code ext...

Guest post by Lorenzo Silvestri, Electronic and Information Engineering Student at Imperial College London. Introduction In this post, I’ll give a short explanation of neuron, a Visual Studio Code extension that aims to be a one-stop-shop for data scientists. It’s an exte...

placid snow Nov 1, 2018, 5:47 AM

#

Cool extension

fathom zenith Nov 1, 2018, 7:05 AM

#

Very cool extension! As an avid jupyter notebook and spyder user, I really ought to give it a shot

#

But I am yet to find an editor with such a convenient ipython terminal like spyder's

twilit bolt Nov 1, 2018, 11:42 AM

#

I would agree. Spyder is, for me, second to none when it comes to iPython implementation in an IDE.

viscid aspen Nov 1, 2018, 11:43 AM

#

I've never used it, would you mind giving some examples briefly ? What kind of features/workflows do you like the most about it ?

fathom zenith Nov 1, 2018, 2:06 PM

#

Spyder? I like it because it's got a variable explorer that displays dataframe data in a tabular form and the iPython terminal allows you to play around with commands before you include them in your code. You can also select and run a few lines of your code in the iPython terminal and it'll keep track of the variables created/changed so you can continue experimenting with those adjusted variables

spark summit Nov 1, 2018, 2:10 PM

#

So I've been trolling in here for a while

#

I have decided that I need to dig into some data-science studies, any recommendation on study materials, or specific topics that I should look into?

viscid aspen Nov 1, 2018, 2:27 PM

#

@fathom zenith Oh, nice. So it's kinda like R-Studio for Python if you're familiar with it.

fathom zenith Nov 1, 2018, 3:00 PM

#

@viscid aspen Yup, exactly! Should've just said that lol, R-Studio for Python

#

@spark summit I would definitely recommend Andrew Ng's machine learning course on Coursera. It's not in Python, it's in Octave but it shouldn't be too difficult to translate everything you learn, besides it's more theory-heavy so you'll only be coding to solidify the theoretical concepts.

#

I'm on Week 8 of it and I must say, that man has a knack for simplifying the most complicated concepts

#

And when you start translating it to Python you'll probably just use a library for much of it instead of writing the learning algorithms from scratch. But ground-up knowledge of the ML algorithms will help you optimize them better, adjusting the parameters, performing error analysis, devising ways to improve the performance of your algorithm, creating learning curves for bias-variance trade-off analysis and all that

#

I can't wait to finish it so I can start his Deep Learning course, and the best part is that that's in Python so I can immediately start using what I learn as I learn it

viscid aspen Nov 1, 2018, 3:09 PM

#

Yeah, I've been doing a lot of work in pandas for uni lately, and jupyter (even jupyterlab) gets annoying quickly. I like VS Code better for how clean it looks, but the variable inspector in spyder looks super useful. I'm assuming it's especially helpful for stuff like pd.Series and basically all numpy structures, since those don't have a nice HTML output in jupyter

fathom zenith Nov 1, 2018, 3:13 PM

#

Absolutely. That's exactly what it's super helpful for. You can inspect your dataframe variables and arrays like they're in spreadsheets.

#

What are you studying in Uni @viscid aspen ?

spark summit Nov 1, 2018, 3:15 PM

#

thanks @fathom zenith

#

how much is the course?

fathom zenith Nov 1, 2018, 3:19 PM

#

you can do it entirely for free

#

but if you want a cert, it's $79

#

well worth it if you ask me

viscid aspen Nov 1, 2018, 3:27 PM

#

@fathom zenith CompSci (Master's), this semester's project is ML-oriented with a focus on the modelling of a real world dataset (a pretty nasty one if you ask me)

analog musk Nov 1, 2018, 4:21 PM

#

hi everyone, i'm developing an image classifier with pytorch and i'm getting an accuracy of about 80% on my test set. When i predict the image label, sometimes i get totally wrong predictions. Is this normal, or should i expect every prediction to be right even without 100% accuracy?
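For context, test accuracy is just the fraction of predictions that match the true label, so 80% accuracy implies that roughly one prediction in five on similar data will be wrong. A tiny sketch with made-up labels (not output from the classifier described here):

```python
# Made-up labels for ten test images.
true_labels = ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog", "cat", "dog"]
predicted   = ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog", "dog", "cat"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = correct / len(true_labels)
wrong = len(true_labels) - correct

print(accuracy, wrong)
```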

small ore Nov 1, 2018, 7:01 PM

#

How heavy or light are these spyder and R-Studio? I have an old spyder installation and I cant get it to use jupyter or ipython

lyric canopy Nov 1, 2018, 7:04 PM

#

R-Studio is an IDE for R. Spyder is one for Python that looks very similar to R-Studio. That's also why a lot of people use it, because they were used to R-Studio.

#

I don't know how heavy they are

small ore Nov 1, 2018, 7:10 PM

#

I read above something like R-Studio for python

lyric canopy Nov 1, 2018, 7:11 PM

#

I think what was meant was "Spyder is like R-Studio, but for python"

#

"So it's [Spyder's] kinda like R-Studio for Python if you're familiar with it."

small ore Nov 1, 2018, 7:12 PM

#

Ahh. Ty for the clarification

polar acorn Nov 1, 2018, 8:04 PM

#

Coming from R myself Spyder felt very familiar but also a bit unpolished compared to PyCharm which now has a scientific mode. Haven't looked at Spyder for a year now though so it might have improved.

small ore Nov 1, 2018, 8:07 PM

#

A scientific mode? Is that available on the community edition?

polar acorn Nov 1, 2018, 8:10 PM

#

Oh I'm sorry only on pro

woven tundra Nov 1, 2018, 8:57 PM

#

Jupyter is extremely beginner friendly though, you should start off with that

#

I can't wait until jupyterlabs goes off beta into launch, but right now I just don't want to risk it messing up my analyses

lean ledge Nov 1, 2018, 9:10 PM

#

Seconding Jupyterlabs, it's excellent

#

While we're on the topic, I believe Ng's course is rather bad and extremely mathematically shallow. I wouldn't recommend it. Being MATLAB focused isn't a plus either

#

Sort of lacking in theory

#

Columbia's course on edX is pretty good and is more language agnostic. Gives a much better and stronger grasp at fundamentals

polar acorn Nov 1, 2018, 9:34 PM

#

While that is true, the course can be very good depending on where you come from and what you want out of it of course. So first of all one should maybe think about ones goals and background.

long gate Nov 1, 2018, 10:21 PM

#

I want to learn more about algorithms and data structures with a focus on python. I have already completed an algorithms and data structures course in C, but want to read up on them with python implementations.
Can anyone recommend a resource for data structures and algorithms with python? Don't really care about the medium, if it's printed book i'll buy it 👍 Video tutorials works great too 🙂

#

Just want to focus on the algorithms and data structures, for now i want to avoid different modules like numpy and pandas

lean ledge Nov 1, 2018, 11:31 PM

#

@long gate data science isn't data structures and algorithms but look at CLRS which is all written in pseudocode. There's also elements of programming interviews python edition but I recommend the former.

#

CLRS is 👌

long gate Nov 2, 2018, 1:14 AM

#

Okey was mistaken then, but is this the book you meant? https://en.wikipedia.org/wiki/Introduction_to_Algorithms

Introduction to Algorithms

Introduction to Algorithms is a book by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is commonly cited as a reference for algorithms in published pa...

#

@lean ledge

#

Nvm saw in the wikipedia page that it's called CLRS, thanks 👍

fathom zenith Nov 2, 2018, 5:10 AM

#

@lean ledge I agree that it's mathematically shallow, but it was never meant to be an academic in-depth course on machine learning and was instead meant to introduce people to it and quickly give them a few tools they can practically use in the workplace.

#

Because unless you're in academia or R&D, you won't really be delving deep into the math all of the time. Most libraries abstract away the math after all, but knowing what you're doing instead of treating every model like a black box and choosing the one with the highest accuracy really helps when using ML to solve problems. Especially when you have to explain your solution to your colleagues.

#

But it all comes down on how you like to learn and what you like to learn. If you're mathematically-oriented, it's the wrong course for you because he just explains the intuition behind the formulas and algorithms.
If you like programming though and you want to quickly learn how to do some basic data science then it's absolutely the right course for you.

#

It comes down to your motivations, a really high level of math can be overwhelming to most people. But building little programs and watching them work is an absolute delight.

#

All in all, it's a great primer. So I wouldn't go as far as saying it's "bad". There's a reason more than 100,000 people have done it.

#

@viscid aspen what's the dataset, if you don't mind me asking

viscid aspen Nov 2, 2018, 6:29 AM

#

@fathom zenith wifi sensor time-series from a large European airport

#

There's a contractor that labels certain passengers with complicated rule engines etc. We're trying to model it and apply some basic ML

lean ledge Nov 2, 2018, 6:45 AM

#

@fathom zenith I think any decent data scientist will need a good understanding of the maths behind things. There's a reason almost all DS have at least a master's. Perhaps it's a good course for someone in HS or someone who only wants to do ML for fun but if it's a serious career goal, you very very likely should be familiar with the mathematical understanding.

fathom zenith Nov 2, 2018, 6:54 AM

#

Absolutely @lean ledge, but that's for data scientists. There are technology executives, data analysts, and hobbyists, and programmers with passing interests. Multiple types of users all of whom would find Andrew Ng's course a brilliant introduction to the world of machine learning.

#

But I'd like to ask you, with all of the layers of abstraction today, how vital is it to get to the absolute grass roots of the math behind ML? Especially if what you're primarily doing is commercial work within a company where you're employing models (validated by the academic community) and not cutting edge work (i.e., coming up with your own models)

lean ledge Nov 2, 2018, 7:10 AM

#

It's very very much vital. Again, most data scientists have a master's or PhD for a reason. Being able to choose an appropriate model for the task and the data requires understanding of both the data, the techniques and their intricacies. If you ask industry, they'll tell you there's not enough data scientists etc. That's not because there's not enough people interested in it. It's because there's too many people with shallow understanding and lack of people who've gained the understanding of ML techniques. This isn't in academia, this is according to industry.

#

Data scientists generally do not make their own cutting edge models

#

They're specialists in choosing them

#

If you don't understand how your data is shaped, how higher dimensionality affects models, how your network's topology affects it, how different architectures differ, etc, most of which you can't understand without a solid grounding in the mathematics, you can't choose the best model

fathom zenith Nov 2, 2018, 7:15 AM

#

I agree with you

#

But I don't agree that Andrew Ng's course is "bad" simply because its mathematically shallow

#

Different people need different introductions

#

And Andrew Ng's is one type of it

#

You don't need a strong mathematical grasp to get started, to go deeper you do absolutely. But to get started and playing around with it? Absolutely not. Interest is sustained when you can see what you do, not in abstracts. But then again, that's just my opinion

#

As for the vitality of math in commercial work, I'd have to agree to disagree there. I work with practitioners in the field, and the pressure on them to stick to deadlines and deliver fast is immense. That means you don't have the time to think through everything, it's all about pushing that MVP out and if it works, it works. Is it good data science? Hell no. Is it effective? Heck yes.

#

I'm not saying you don't need to know any of the math. Of course you do! But you don't need to go incredibly in-depth.

#

A fundamental understanding should be enough to get started

lean ledge Nov 2, 2018, 7:32 AM

#

The vast majority of companies I know look for mathematical talent. I was hired not because I was a good programmer but because I was great at maths and physics. Many companies finish off their job adverts with "candidates from maths, physics and electrical engineering do the best". Not a single one of my coworkers would agree with you in this case

#

I honestly would suggest someone do a statistics degree rather than a CS or whatever degree if they want to be a data scientist

#

Data science is 99% maths, 1% programming

fathom zenith Nov 2, 2018, 8:04 AM

#

Well that depends on the nature of your job of course. But there are far more users of the knowledge of machine learning than data scientists who don't need to necessarily understand all of the math behind the models.

But in summary my argument is that while that course may not be mathematically dense, it's still not "bad" for many, many people who use that knowledge in various ways.

#

So let's just agree to disagree 😃

ocean crag Nov 2, 2018, 6:19 PM

#

its popping

small ore Nov 3, 2018, 1:46 AM

#

Raggy, fyi, Andrew Ng has topics on how to pick models etc later in his course. Also, many times people who know all the math formulae and derivations fail to use their own intuition. I believe you are in the top 1 percentile if you are able to just see those formulae, get the intuitions and apply them yourself. Obviously Andrew Ng sounds frivolous to you. I believe for most people it is less stressful to go his way first and then maybe look into the math. It becomes easier to understand math when you already know some application and are looking to understand the nuances

lapis sequoia Nov 3, 2018, 3:34 AM

#

@lean ledge and @fathom zenith masters or phd are not really required to be a data scientist. I spoke to a family friend who is a data scientist and he has a bachelor in biology. So, yah.

lean ledge Nov 3, 2018, 6:10 AM

#

@lapis sequoia I am a data scientist. It is not required but the vast majority have a higher degree and they make it much easier to develop the skills you need.

#

@small ore He does not go into detail at all and skips over a lot of important things

#

And can we stop this idea that learning the maths will mean you're somehow incapable of actual practice????

#

I'm not telling you to memorize formulae, that's not learning maths. With maths comes a deeper understanding of what the data and models convey

#

Your mathematical intuition enhances and complements your normal intuition. Learning maths involves learning how things work not memorizing. I believe if you think otherwise that you have a fundamental misunderstanding of what maths involves.

lyric canopy Nov 3, 2018, 8:39 AM

#

I think the problem @lean ledge is highlighting is a major in statistics as well. This comment on Reddit to an article on statistical fraud illustrates why: https://old.reddit.com/r/statistics/comments/9toukm/1_in_4_biostatisticians_say_they_were_asked_to/e8ymwn9/?st=jo16k8bw&sh=9da7ca00

What happens a lot in academia is that a lot of the users of statistics, so your typical empirical researchers, don't actually know all that much about statistics other than the introductory applied stuff they've been taught while getting their degree. This leads to tunnel vision: People either shaping their problem/research question to fit the model they know ("to a man with a hammer everything's a nail") or, worse, shaping their data to fit the model. What's still worse than that is that, when they do actually apply a model, they don't realize what the limitations are, what violations of the assumptions mean for the interpretation of the results, or even what it means to delete a few "odd" data points from your data set. Some don't even really know why it's bad to choose a technique based on the results it gives you, so preferring model B over model A because model B gives you the results you were looking for (as described in that comment I linked above).

1 In 4 Biostatisticians Say They Were Asked To Commit Scientific F...

In my experience, some people just don't realise what they're asking (like the story also suggests). I once had a similar request (to pool data...

#

At the end of the day, what this means is that the number of Type I Errors in academia increases (so, the false positives, one of the reasons behind the replication "crisis") and/or models that don't actually do what they're claimed to do. Once it becomes clear that all those results that were once taken as facts turn out to be misleading or wrong, people lose their trust in the methodology and the statistics itself, while it was really the interpretation and the application where stuff went wrong. I think that's a danger for ML as well: If people start using models wrong, claim that they do things they don't actually do, and start suggesting decisions based on misleading results, the confidence in ML will diminish like the confidence in statistics has diminished in some fields of academia.

In a way, this is already happening: The recent amazing AI tool for recruitment that was scrapped because it turned out to be very biased is one of those examples that could lead to diminishing trust in the techniques, while, in fact, the way the model was trained was the problem, not the fact that ML was applied.

lean ledge Nov 3, 2018, 8:42 AM

#

^^^

#

That's just one of the issues with doing ML without having a mathematical background

lyric canopy Nov 3, 2018, 8:49 AM

#

Yeah, probably. It's just one of the issues with researchers not actually knowing statistics, too.

lean ledge Nov 3, 2018, 8:55 AM

#

What I said comes down to this. Your models are mathematical models. You cant understand them if you dont understand the maths. The idea that somehow you can know a lot of maths but not know intuition for the model is honestly just ignorance of what data science involves. Data science, including in a commercial setting, involves a lot of maths and it's simply not possible to consistently guess what models you need right every time without any understanding of maths

woven tundra Nov 3, 2018, 11:51 AM

#

I have a question and maybe I'm going to sound super ignorant but let's assume we've got a dude named Bob who has supposedly taken a whole bunch of Coursera courses in ML and DL and all the other "L"s.

#

Bob builds a recommendation engine for his website hacking together models based on what he's learned. Bob doesn't understand the math to a great level but since there are so many great libraries out there abstracting away a lot of the detail, he just uses them to do his job.

#

And it just so happens that it works and more people are buying things on his website because they're getting good recommendations

#

So really a black box method, he knows what goes in, he has very little knowledge of what's happening inside, but what he's getting, the output, is working for him

#

I know that's not proper data science, but is that in general a bad way of approaching ML?

lean ledge Nov 3, 2018, 12:00 PM

#

In general, yeah. now that Bob has made what he's made, he likely doesnt know how to make it better. He doesnt understand why exactly it works, what model they might want to swap out for another model, in what cases his model fails. Were he to try something else, there is no guarantee his previous strategy would work and no guarantee that his knowledge would transfer over.

#

Not being able to make your model better or knowing what's stopping it as it is is a problem. So is not being able to transfer over to another scenario due to lack of understanding.

#

Yes, it has worked apparently well in this scenario and that's great! Bob can continue using this model as it is if it works well enough for him. However, that method tends to not work very well in general

fallow summit Nov 3, 2018, 12:13 PM

#

Hello!
Can someone recommend how to get started with Machine Learning? I'm the kind of person that likes to start by making some practical project and learn while doing it. Basically that's how I learnt a lot of programming concepts. For the past few days I've been trying to dip my toes into machine learning, but all the learning resources that I've found were hours and hours of theory instead of building something. I'm good at math even though I'm still in high school, and I'll be fine learning concepts that I don't understand yet. I've got no problems at all with programming in Python or something else. Can someone recommend a good way to start my journey with machine learning?

#

I've got no problems learning theory, but I want to get started and see "that something is happening" and learn all the concepts needed while building something.

misty sonnet Nov 3, 2018, 12:30 PM

#

I'm actually really interested in a answer for this ^^^

lean ledge Nov 3, 2018, 1:31 PM

#

/r/LML has a few resources for people who don't really care about learning ML but just want to build stuff, listed in their hackers guide: https://old.reddit.com/r/learnmachinelearning/wiki/getting_into_ml_hackers_guide

getting_into_ml_hackers_guide - learnmachinelearning

Reddit gives you the best of the internet in one place. Get a constantly updating feed of breaking news, fun stories, pics, memes, and videos just for you. Passionate about something niche? Reddit has thousands of vibrant communities with people that share your interests. Alt...

#

There's also a high schooler's guide for someone who wants a career in the field https://old.reddit.com/r/learnmachinelearning/wiki/getting_into_ml_high_schoolers_guide

getting_into_ml_high_schoolers_guide - learnmachinelearning


#

@fallow summit @misty sonnet

stable tinsel Nov 3, 2018, 1:36 PM

#

@fallow summit #1 pick a problem you want to solve w/ ML

#

#2 make a dataset

lean ledge Nov 3, 2018, 1:37 PM

#

#1 choose a dataset on kaggle
#2 cry

stable tinsel Nov 3, 2018, 1:37 PM

#

yeah i mean building the dataset is like 90% of the battle

#

thats really all that matters

lean ledge Nov 3, 2018, 1:38 PM

#

Worst thing about working with non-tech, rather old-fashioned client companies is that the data they have is the biggest pile of crap

stable tinsel Nov 3, 2018, 1:39 PM

#

"yeah all we got is freeform natural language can you make this work?"

lean ledge Nov 3, 2018, 1:40 PM

#

my coworkers have it worse sometimes

📎 unknown.png

stable tinsel Nov 3, 2018, 1:40 PM

#

i dont even know what that means

lean ledge Nov 3, 2018, 1:41 PM

#

I'm not sure anyone but the guy who wrote it does

#

My last client company was based in Chile and sent me all their data in Spanish too

stable tinsel Nov 3, 2018, 1:41 PM

#

lol

#

sniadek just say it pls

#

whats on ur mind

fallow summit Nov 3, 2018, 1:43 PM

#

Thanks for your reply!
Don't get me wrong, I want to learn as much as I can about machine learning and data science. I'm not aiming for a job anytime soon. I'm just fascinated by what can be accomplished with it. I just know my strengths, and I know that jumping into a lot of theory at the beginning and not seeing anything happening will push me away from this field. Usually when I'm learning something new I try to build something and learn while building it. I couldn't find any good resources to get straight into a machine learning project and learn along the way, so I came here to ask. Atm I would like to build something that can play a video game, some really basic game.

#

😄

stable tinsel Nov 3, 2018, 1:44 PM

#

there's a lot of research on atari being published

#

do you want to start with an agent that plays atari games?

fallow summit Nov 3, 2018, 1:45 PM

#

Yup!
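
For a feel of what "an agent that plays a game" means before reaching for Atari, here is a minimal sketch of tabular Q-learning on a made-up one-dimensional corridor game. The environment, the reward values, and every name in it (`step`, `train`, `greedy_policy`, `N_CELLS`) are assumptions for illustration only, not anything from the Atari research mentioned above; agents for real Atari games typically replace the lookup table with a neural network and play through an emulator.

```python
import random

N_CELLS = 6          # hypothetical toy game: corridor cells 0..5, goal is the last cell
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    reward = 1.0 if done else -0.01   # small cost per move, bonus at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    q = [[0.0, 0.0] for _ in range(N_CELLS)]     # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (-1 or +1) for every non-terminal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_CELLS - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The loop of "observe state, pick action, get reward, update" stays the same for bigger games; what changes is how the state is represented and how the values are stored.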

lean ledge Nov 3, 2018, 1:45 PM

#

Basic machine learning requires maths that is usually taught in early 2nd year uni in the US, late first year in other places. It's hard to get into it in high school when you don't understand multivariate calc or linear algebra

#

You can try Andrew Ng's course, but it's rather shallow and I still don't recall it going over basic multivariate calc
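
To make the maths point a bit more concrete, here's a rough sketch of where both topics show up in about the simplest model there is: fitting a line by gradient descent. The data, learning rate, and function names are all made up for illustration. The partial derivatives of the mean squared error with respect to `w` and `b` are the multivariate calculus part; the sums over the data are what a linear algebra library would do as matrix-vector products (written as plain loops here to avoid dependencies).

```python
def predict(w, b, xs):
    """Model: y_hat = w * x + b for every x."""
    return [w * x + b for x in xs]


def gradients(w, b, xs, ys):
    """Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b."""
    n = len(xs)
    errors = [p - y for p, y in zip(predict(w, b, xs), ys)]
    dw = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))   # dMSE/dw
    db = (2.0 / n) * sum(errors)                              # dMSE/db
    return dw, db


def fit(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent: repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b


# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(round(w, 3), round(b, 3))
```

You can run this without the theory, but knowing why `dw` and `db` look the way they do is exactly the bit that needs partial derivatives.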

#data-science-and-ml

Fixing random state for reproducibility

fit a logistic regression model and store the predictions