#data-science-and-ml | Python | Page 187

young aurora Jun 19, 2018, 12:04 AM

#

IT WAS THE LIMITS

#

You're a hero, @velvet anchor

velvet anchor Jun 19, 2018, 12:04 AM

#

It’s the Santa hat.

#

I’m glad that worked because if it didn’t I had no idea where to go

#

Are you using some tutorial for all the work you’re doing with the Chinese timeline thing?

young aurora Jun 19, 2018, 3:12 AM

#

@velvet anchor No, I'm not.

#

Also, here's some cool visualizations from that article I just published

#

📎 xiarulers.png

#

📎 XiaTCvsXSZCP.png

#

And a link to the article, since I actually finished it. I've got a couple others up there too if anyone thinks they're fun to read.

#

http://www.magisterludi.net/2018/06/chinese-dynasties-and-data-part-ii-xia.html

Chinese Dynasties and Data: Part II, the Xia

Short Discussion of Methodology This is the second part of the series I've begun recently on Chinese history. In this post we'll cover th...

velvet anchor Jun 19, 2018, 3:14 AM

#

Oh okay cool i'll take a look in a bit

#

I was just curious since you've been doing so much with the same data

young aurora Jun 19, 2018, 3:16 AM

#

Yeah, I'm working on a massive project where I'm cataloguing every (ruling) dynasty in all of China's history

#

Also multiple concurrent dynasties that existed during warring periods

#

It's a buttload of work

visual notch Jun 19, 2018, 4:02 AM

#

I want to scale high-range data using a base-10 logarithm, but I run into a problem with data that has 0 values. What are some alternatives to the logarithm?

velvet anchor Jun 19, 2018, 4:09 AM

#

what are you exactly trying to do

desert cradle Jun 19, 2018, 4:11 AM

#

arc tangent 😛

velvet anchor Jun 19, 2018, 4:13 AM

#

arctan might not be steep enough

#

it's also rather linear so not super useful for scaling

feral lodge Jun 19, 2018, 4:21 AM

#

Square root, cubic root or log(x+1) might be worth trying

velvet anchor Jun 19, 2018, 4:22 AM

#

yeah x+1 was gonna be my suggestion

feral lodge Jun 19, 2018, 4:23 AM

#

I'd try that one first 👌 @visual notch

velvet anchor Jun 19, 2018, 4:23 AM

#

or maybe even just a flat division depending on scale

visual notch Jun 19, 2018, 8:05 AM

#

Actually im going to normalize those data i have into 0-1 scale

#

Using min-max

#

And placing the actual 0 value as 0

#

Instead of ignoring it like now and placing the lowest non zero value as 0

visual notch Jun 19, 2018, 8:47 AM

#

I think ill pick the log x+1
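
A minimal sketch of the log(x+1) idea combined with the min-max scaling mentioned above, assuming the values are non-negative. The function name `log_minmax_scale` and the sample values are made up for illustration.

```python
import math


def log_minmax_scale(values):
    """Apply log10(x + 1), then min-max scale the result to [0, 1]."""
    # log10(0 + 1) == 0, so zero values no longer break the logarithm.
    logged = [math.log10(v + 1) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:  # all values identical: avoid dividing by zero
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


raw = [0, 9, 99, 999, 99999]  # made-up sample values
scaled = log_minmax_scale(raw)
print(scaled)
```

If the data contains a real 0, it is the minimum after the log step and therefore maps to 0 on the final scale, which matches the "actual 0 value as 0" goal.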

open pecan Jun 19, 2018, 11:45 AM

#

hello, im trying to implement an A star search to find a least cost path, but i dont know where im going wrong

#

https://paste.pythondiscord.com/gevadujufa.rb

feral lodge Jun 19, 2018, 12:19 PM

#

Does it throw an error or is it just performing poorly?

open pecan Jun 19, 2018, 12:28 PM

#

poor performance, no errors

#

heres a map input ive been trying to get a result on https://paste.pythondiscord.com/ajehojehaz.nginx

feral lodge Jun 19, 2018, 1:05 PM

#

When you parse the map you're setting the cost of a node to be based on its w, r, f, h, m type, but you never seem to use cost anywhere. So, despite you updating the nodes' g, they're never set to anything but 0

#

If I put a little print inside update_node:

    def update_node(self, neighbor, node):
        neighbor.parent = node
        neighbor.g = neighbor.g + node.g
        print(neighbor.g)
        neighbor.h = self.get_heuristic(neighbor)
        neighbor.f = neighbor.h + neighbor.g

I get this:

#

So if i'm not wrong, since g is always 0 the algorithm currently ranks nodes purely by the heuristic: f is just the Manhattan distance, which lets it pick expensive paths

#

Since g(n) is the true cost of reaching node n, your problem is fixed by replacing this

neighbor.g = neighbor.g + node.g

with this

neighbor.g = neighbor.cost + node.g
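
For anyone reading along, here is a small self-contained sketch of the same idea. It is not the pasted code: the function `a_star`, the grid-of-costs representation and the sample map are all assumptions for illustration. The point it shows is that the g value of a neighbor is the g of the current node plus the neighbor's own terrain cost.

```python
import heapq


def a_star(grid, start, goal):
    """Return (total_cost, path) for the cheapest 4-connected path, or None.

    grid[r][c] is the cost of stepping onto cell (r, c). Costs are assumed
    to be >= 1 so the Manhattan distance never overestimates.
    """
    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}                          # cheapest known cost to each cell
    parent = {start: None}
    frontier = [(heuristic(start), start)]  # priority queue ordered by f = g + h

    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
                # True cost so far: g of the current node plus the
                # terrain cost of the neighbor being entered.
                new_g = g[node] + grid[nr][nc]
                if new_g < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_g
                    parent[(nr, nc)] = node
                    f = new_g + heuristic((nr, nc))
                    heapq.heappush(frontier, (f, (nr, nc)))
    return None


# Made-up map: the middle column is expensive terrain.
grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
cost, path = a_star(grid, (0, 0), (0, 2))
print(cost, path)
```

On this map the direct route through the expensive cell costs 10, while the detour along the bottom row costs 6, so a search that uses the real cost in g takes the detour.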

#

@open pecan

open pecan Jun 19, 2018, 3:21 PM

#

oh wow, thank you very much, id been staring at it for so long!

feral lodge Jun 19, 2018, 3:40 PM

#

Always good to get a fresh pair of eyes 👀

velvet anchor Jun 19, 2018, 3:41 PM

#

Yeah I had a bug i'd been tracking and rerunning for like 8+ hours yesterday. as soon as i posted the snippet i saw it haha. sometimes it takes just stepping away or having someone else look to solve

velvet anchor Jun 19, 2018, 5:59 PM

#

@feral lodge best results I was able to get on the obscured dataset was uh

#

Found 3636 correct faces out of 10049 total images

#

I'm continually enlarging the images and stuff to see if something weird happens but at 4x normal size that is the highest and it started going down from that point

feral lodge Jun 19, 2018, 7:27 PM

#

Looks like dlib's face detector is built with histogram of oriented gradients! Never heard of it, but from what I can see it's quick and has some other good properties but is largely antiquated by CNNs, at least when you have enough data 🤔
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6077989
https://arxiv.org/pdf/1703.05853.pdf

velvet anchor Jun 19, 2018, 8:01 PM

#

Yeah

#

It's a really cool concept

#

actually inspired my technique a fair bit

#

24hours no hiccups on the genetic algorithm 👌

small ore Jun 19, 2018, 8:06 PM

#

👍

jade sundial Jun 19, 2018, 9:26 PM

#

Hi, I was trying to plot some data real-time at work. Started off by using matplotlib, but it is too slow for the datastream. PyQtGraph seems like a hassle to get to work plotting real-time data. Does anyone have any recommendations, tips, or tricks?

velvet anchor Jun 19, 2018, 10:31 PM

#

How real time are we talking

#

Like how many data points how often

velvet anchor Jun 20, 2018, 5:06 AM

#

https://youtu.be/QbX7BhjOOvY

YouTube

DEFCONConference

DEF CON 25 - Dan Petro, Ben Morris - Weaponizing Machine Learning:...

At risk of appearing like mad scientists, reveling in our latest unholy creation, we proudly introduce you to DeepHack: the open-source hacking AI. This bot ...


#

Neat video on machine learning

jade sundial Jun 20, 2018, 4:21 PM

#

@velvet anchor It depends, but it is sub 300

#

But it should either be drawn efficiently or be drawn in another thread

#

And the dots are received at 10-30 fps

jade sundial Jun 20, 2018, 6:58 PM

#

And to add some info, it's several different tracks being scatter plotted. So it would be nice if one could scatter plot every track with a unique color.

velvet anchor Jun 20, 2018, 7:00 PM

#

@feral lodge may have some idea personally im not sure

#

if MPL wont keep up

jade sundial Jun 20, 2018, 8:38 PM

#

I think I got something working in pyqtgraph, here's the code for the interested https://github.com/eHammarstrom/pyqtgraph-live

Currently you have to control the clear/update loop externally (which is what I need). Critique is welcome.

GitHub

eHammarstrom/pyqtgraph-live

Contribute to pyqtgraph-live development by creating an account on GitHub.

velvet anchor Jun 20, 2018, 10:16 PM

#

Also slandon after a couple days it seems that 40% is the absolute best dlib can get out of the obscured faces

#

which is honestly pretty good

feral lodge Jun 21, 2018, 5:55 AM

#

Does that percentage include false/true positives/negatives? o: like, if it's a data set of 50% non-faces and 50% faces, if we guess randomly we'd be correct 25% of the time

#

Whereas if the data is all faces, guessing randomly would be correct 50% of the time

#

And sorry for not replying to the mention before! I have no idea how to handle live data though, good job getting it working functor

dreamy tartan Jun 21, 2018, 12:55 PM

#

Hi,

How should I decide which of these to use for feature selection? My goal is to choose the best features from my data for predicting the target feature.

Univariate selection
Recursive Feature Elimination (RFE)
Principal Component Analysis (PCA)
Choosing important features (feature importance)

feral lodge Jun 21, 2018, 2:05 PM

#

Out of those I've only ever used pca, which is cool but should not be used for data with non linear relationships

#

As far as I know, univariate selection is also mostly (not sure if always) used for linear data

#

There are other choices, like MIC and lasso/ridge regression, which can also be used for scoring the features

#

They all come with pros and cons though, and they're not always useful for the same purposes. PCA will not directly tell you which features best describe the data, rather it tells you what linear combinations of the features best describe the data. Lasso has the property of often shrinking the coefficients of less useful features to exactly 0, which singles out important features but loses some information
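
To make the point about PCA and linear combinations concrete, here is a rough sketch using plain NumPy (SVD on centered data) rather than any particular library's PCA class. The synthetic data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic data (made up): feature 1 is nearly a copy of feature 0,
# feature 2 is independent low-variance noise.
f0 = rng.normal(size=n)
f1 = f0 + 0.1 * rng.normal(size=n)
f2 = 0.1 * rng.normal(size=n)
X = np.column_stack([f0, f1, f2])

Xc = X - X.mean(axis=0)  # PCA works on centered data
# Rows of `components` are the principal axes, strongest first.
_, singular_values, components = np.linalg.svd(Xc, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

print("explained variance ratio:", np.round(explained, 3))
print("first component weights: ", np.round(components[0], 3))
```

The first component puts roughly equal weight on features 0 and 1, so it describes a mix of the two rather than saying "keep feature 0". That is why PCA on its own is not a direct answer to "which original features matter most".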

velvet anchor Jun 21, 2018, 3:06 PM

#

Slandon that 40% is just dlib over the dataset as it is

#

Was just curious how good dlib can detect hella obscured faces. I ran it over my test real faces set and it was 50k/50k

small ore Jun 21, 2018, 5:14 PM

#

Just curious. Is the learning also from obscured faces or is just the prediction on it? Coz learning from a normal data set and predicting on an obscure data set sounds something like a real life scenario

velvet anchor Jun 21, 2018, 5:18 PM

#

It’s not learning anything. It’s just using the default predictor of dlib because we were curious how well it worked

#

But yeah that is a real life scenario and it just highlights how important it is to have a correct data set. If you expect to see obscured faces you need to have your model built to find those too

#

But I think dlib is just trained on a massive amount of face photos. I’m not entirely sure of the specifics

lapis sequoia Jun 22, 2018, 12:28 AM

#

idk if this is the best channel to ask in, but do you guys have any recommended video series on machine learning w/ python (preferably on udemy)?

velvet anchor Jun 22, 2018, 12:29 AM

#

not on udemy, no

#

https://www.youtube.com/watch?v=OGxgnH8y2NM this guys pretty good tho

YouTube

sentdex

Practical Machine Learning Tutorial with Python Intro p.1

The objective of this course is to give you a holistic understanding of machine learning, covering theory, application, and inner workings of supervised, uns...


lapis sequoia Jun 22, 2018, 12:32 AM

#

ty i'll check it out

lilac shadow Jun 22, 2018, 1:32 AM

#

i've started to have a look at andrew ng's course on machine learning, and so far i like what i see. it's pretty much exactly how i'd prefer to learn stuff like this. i feel like it's better to learn what happens behind the scenes, and then use that knowledge to implement it yourself to develop a strong understanding before using magical frameworks. thanks for the recommendation, guys ^^

velvet anchor Jun 22, 2018, 1:40 AM

#

it depends

#

data science teams are generally like several people big

lilac shadow Jun 22, 2018, 1:40 AM

#

yeah, that's understandable

velvet anchor Jun 22, 2018, 1:41 AM

#

and on that team only 1 of them will know the inner workings of stuff

#

another will be really good at manipulating data, another a math guy, etc

gusty meteor Jun 22, 2018, 1:48 AM

#

@velvet anchor im watching that guy right now! haha

lilac shadow Jun 22, 2018, 1:48 AM

#

i suppose i'm just the type of person to want to know how stuff works, regardless of whether i'm going to be using it a lot

velvet anchor Jun 22, 2018, 1:51 AM

#

yeah and thats fine

#

but the problem is theres a LOT to know

#

almost impossible really

lilac shadow Jun 22, 2018, 1:52 AM

#

yeah, but i at least want a general idea.

wide oxide Jun 22, 2018, 3:24 AM

#

Is anyone into Algorithms and Data Structure?

#

I am trying to get good at Algorithms and Data Structure by solving problems on LeetCode and HackerRank.

#

I've currently solved 69 questions on LeetCode and around 70 on HackerRank but, 95% of them were easy questions.

velvet anchor Jun 22, 2018, 3:31 AM

#

It’s really all just practice honestly. As you solve more you’ll start to notice patterns you can use between problems

wide oxide Jun 22, 2018, 3:32 AM

#

Should I look for answers if I am not able to solve them under 30 mins? (Medium level questions)

velvet anchor Jun 22, 2018, 3:32 AM

#

I’d say only look for answers if you just actually can’t figure it out

#

Idk that there’s a hard and fast rule for like x amount of time

#

It’s important to understand the answer though. Not just get it working and move on

wide oxide Jun 22, 2018, 3:34 AM

#

I try to understand every aspect as possible

#

Did you do any course or read any book?

velvet anchor Jun 22, 2018, 3:36 AM

#

I’m a senior in college so yeah I’ve had a lot of courses in data structures and algorithms and math and stuff

wide oxide Jun 22, 2018, 3:37 AM

#

My college will start from next month ;-;

#

But, they will teach Algorithms and DS from 2nd year.

velvet anchor Jun 22, 2018, 3:39 AM

#

Pretty standard

wide oxide Jun 22, 2018, 4:27 AM

#

I have a list of movies, their genre and gross

#

I've grouped movies by their genre and summed up their grosses, so, now I have genre vs gross.

#

For example, Action, Comedy, Drama | $2 Billion
Comedy | $1.38 Million

#

Now, I am thinking about making something that will predict the chances of a movie making a billion based on its genre

#

Can someone help me with that? Like what stuff I should search for? ( I guess that I will have to deal with weightage?)

velvet anchor Jun 22, 2018, 4:30 AM

#

This is all statistics

wide oxide Jun 22, 2018, 4:31 AM

#

You mean I will have to go through the whole

velvet anchor Jun 22, 2018, 4:31 AM

#

Specifically you’ll want to look up predictive models

wide oxide Jun 22, 2018, 4:31 AM

#

Oh, thank you!

somber plank Jun 22, 2018, 2:17 PM

#

Is there anyone who can help me decipher the difference between mean square vs. least square when it comes to ML? I looked up some stuff but couldn't make it out in a simple way

#

Nevermind. Just came to me hahaha

small ore Jun 22, 2018, 5:37 PM

#

@lilac shadow Good to know you started on Andrew Ngs course. I kind of got scared and stopped it in the middle when it came to NN. I need a lot more concentration to pass that bit

lapis sequoia Jun 22, 2018, 7:19 PM

#

hey is anybody familiar with Q-Learning

feral lodge Jun 22, 2018, 8:13 PM

#

No expert, but I've done a few labs on q-learning! What's the problem? @lapis sequoia

lapis sequoia Jun 22, 2018, 8:18 PM

#

i have some basic Q-Learning code that uses an environment and now i want to test my own environment on the brain, but i don't know how to attach it. in my environment i already have the reward, states and actions defined

#

@feral lodge

#

the environment in the default code uses tkinter and i don't use that. is that a problem? i didn't think so. i studied the code but i couldn't find where i should put in the values and how the environment reads them @feral lodge

feral lodge Jun 22, 2018, 8:24 PM

#

That's a bit tricky to answer without being with you and checking the code, but if I were you I would check for the piece of code that describes the update rule; this thing

📎 unknown.png

#

Since it uses all the important structures and values of Q-learning, you can find all relevant variables and functions there
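
A rough sketch of how that update rule tends to look in code, and where a custom environment plugs in. Everything here is an assumption for illustration: the `LineWorld` class, the `reset()`/`step()` interface returning `(next_state, reward, done)`, and the hyperparameters are not taken from the linked tutorial.

```python
import random
from collections import defaultdict


class LineWorld:
    """Hypothetical toy environment: walk along cells 0..4, reward at cell 4."""

    actions = (0, 1)  # 0 = step left, 1 = step right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(4, max(0, self.state + move))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(env.actions)  # explore
            else:
                best = max(q[(state, a)] for a in env.actions)
                ties = [a for a in env.actions if q[(state, a)] == best]
                action = rng.choice(ties)         # exploit, random tie-break
            next_state, reward, done = env.step(action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            if done:
                target = reward
            else:
                target = reward + gamma * max(
                    q[(next_state, a)] for a in env.actions
                )
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_table = train(LineWorld())
greedy = [max(LineWorld.actions, key=lambda a: q_table[(s, a)]) for s in range(4)]
print(greedy)
```

The learner only ever touches the environment through `reset()`, `step()` and the list of actions, so swapping in a different environment means providing those three things. No GUI toolkit is needed for the learning itself.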

lapis sequoia Jun 22, 2018, 8:26 PM

#

yes i know that

#

but in coding it's different

feral lodge Jun 22, 2018, 8:29 PM

#

Could you maybe link a hastebin with the tkinter code?

lapis sequoia Jun 22, 2018, 8:49 PM

#

is github good also with all the code?

#

also can u pm me?

#

@feral lodge

#

this is the default code i use https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/2_Q_Learning_maze

GitHub

MorvanZhou/Reinforcement-learning-with-tensorflow

Reinforcement-learning-with-tensorflow - Simple Reinforcement learning tutorials

#

so i made my own environment with actions environment and rewards already defined

#

i tried to modify the code to get it to work on my environment but i don't know how i show the AI my environment i made @feral lodge , also PM would be great?

sacred summit Jun 22, 2018, 9:42 PM

#

looks interesting

velvet anchor Jun 22, 2018, 10:26 PM

#

Some people used that same idea at def con this year

#

to attack SQL databases

wide oxide Jun 23, 2018, 9:51 AM

#

@velvet anchor Hey, will predictive analysis require advanced statistics?

#

I've read some basics of predictive analysis

#

Downloaded a book: Predictive Analytics - Eric Siegel

dreamy tartan Jun 23, 2018, 11:13 AM

#

Hi everyone,

I'm trying to classify video with a CNN using CPU. It takes 200 frames for each prediction and every prediction takes approximately 1 minute. I want to reduce prediction time because it will be a real-time project. Is there a better way to do it? How can I reduce processing time?

velvet anchor Jun 23, 2018, 2:57 PM

#

Get a better GPU talat

#

@wide oxide kind of impossible to answer honestly without knowing the ins and outs. It also just depends on how correlated the data is.

wide oxide Jun 23, 2018, 3:43 PM

#

So, the best option is to go for beginner Statistics and then predictive analysis?

velvet anchor Jun 23, 2018, 4:52 PM

#

I mean predictive analysis is just applied statistics

#

So yeah it’s better to have a fundamental grasp

#

Though @dreamy tartan I guess first make sure your gpu is handling the requests and not your cpu. How are you predicting? You’re not training the model for every request are you?

bleak ether Jun 23, 2018, 11:44 PM

#

Hello! How can I make a histogram using a list of datetime.datetime?

small ore Jun 24, 2018, 2:13 AM

#

https://stackoverflow.com/questions/30330389/histogram-datetime-objects-in-numpy

Stack Overflow

Histogram datetime objects in Numpy

I have an array of datetime objects and I'd like to histogram them in Python.

The Numpy histogram method doesn't accept datetimes, the error thrown is

File "/usr/lib/python2.7/dist-packages/nump...

#

https://stackoverflow.com/questions/34814606/a-per-hour-histogram-of-datetime-using-pandas

Stack Overflow

A per-hour histogram of datetime using Pandas

Assume I have a timestamp column of datetime in a pandas.DataFrame. For the sake of example, the timestamp is in seconds resolution. I would like to bucket / bin the events in 10 minutes [1] bucket...

#

Not sure what you exactly meant. So posted two links unrelated to each other. Pick whichever or give more details

#

@bleak ether
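
In case it helps, a minimal standard-library sketch of one way to bin a list of `datetime.datetime` objects. The sample timestamps are made up, and binning by hour of day or by calendar day is just one guess at what the histogram should show.

```python
from collections import Counter
from datetime import datetime

# Made-up sample timestamps, purely for illustration.
stamps = [
    datetime(2018, 6, 23, 9, 15),
    datetime(2018, 6, 23, 9, 40),
    datetime(2018, 6, 23, 14, 5),
    datetime(2018, 6, 24, 9, 30),
]

per_hour = Counter(dt.hour for dt in stamps)   # bin by hour of day
per_day = Counter(dt.date() for dt in stamps)  # bin by calendar day

# Quick text histogram, one row per hour that has data.
for hour in sorted(per_hour):
    print(f"{hour:02d}:00  {'#' * per_hour[hour]}")
```

The resulting counts are ordinary numbers, so they can be handed to a bar plot in whatever plotting library is already in use.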

verbal cairn Jun 24, 2018, 12:02 PM

#

Anyone able to assist with helping me figure out how to prepare satellite imagery through this tutorial: https://datacube-core.readthedocs.io/en/latest/ops/prepare_scripts.html#prepare-scripts

lapis sequoia Jun 24, 2018, 1:00 PM

#

hold on did you say satellite imagery ?

#

does that mean that you can watch my house ?

#

@verbal cairn

verbal cairn Jun 24, 2018, 1:01 PM

#

Not yet @lapis sequoia 😦

lapis sequoia Jun 24, 2018, 1:01 PM

#

no i mean will that enable you to ?

#

@verbal cairn

verbal cairn Jun 24, 2018, 1:02 PM

#

No, I just want to do some land classification

#

No interest in houses or individuals

lapis sequoia Jun 24, 2018, 1:02 PM

#

no i mean if i did the same as you will i be able to watch people ? and houses and my school ?

verbal cairn Jun 24, 2018, 1:03 PM

#

You would have access to imagery of country land elements

#

But it's not real time so you wouldnt be "Watching"

lapis sequoia Jun 24, 2018, 1:03 PM

#

nooooo!

#

ok np

#

btw ever made a version of game of life ?

verbal cairn Jun 24, 2018, 1:03 PM

#

The board game?

lapis sequoia Jun 24, 2018, 1:03 PM

#

or the right question is

#

:

#

can you help me in #help-coconut

#

?

#

i really need that help

#

thanks

verbal cairn Jun 24, 2018, 1:05 PM

#

@lapis sequoia You're ahead of me, I'm just learning around that same area

lapis sequoia Jun 24, 2018, 1:05 PM

#

what the ????

#

how the heck am i ahead of you

#

i'm a beginner class programmer

#

and you are playing with sats

verbal cairn Jun 24, 2018, 1:06 PM

#

Yep, but I'm just grabbing the things I need to build the code, you're doing a good job starting from first principles

#

Satellite imagery is just a bunch of numbers on a matrix grid as I understand

#

So it would just be finding the patterns that equate to certain things and then implementing a bit of code to find and collate those patterns in an encoded number set

lapis sequoia Jun 24, 2018, 1:07 PM

#

i hate matrixes

verbal cairn Jun 24, 2018, 1:07 PM

#

I did until recently

#

Then I just realised I've been looking at them as more difficult than they are

lapis sequoia Jun 24, 2018, 1:08 PM

#

i get stuck on creating them

verbal cairn Jun 24, 2018, 1:08 PM

#

They're just tables of numbers

lapis sequoia Jun 24, 2018, 1:08 PM

#

yea a replicated one but

verbal cairn Jun 24, 2018, 1:08 PM

#

No different than a simplified excel

lapis sequoia Jun 24, 2018, 1:08 PM

#

i don't know how to use excel

verbal cairn Jun 24, 2018, 1:08 PM

#

Microsoft excel?

lapis sequoia Jun 24, 2018, 1:09 PM

#

yea

verbal cairn Jun 24, 2018, 1:10 PM

#

Ah, this complicates things a little

lapis sequoia Jun 24, 2018, 1:11 PM

#

i hate the office programs

#

but why do you copy and paste code ?

lapis sequoia Jun 24, 2018, 3:39 PM

#

Hey everyone, I'm trying to learn about statistics and programming at the same-ish time and I could use some guidance with an example:

I have a distribution of families by income in the US in 1973 with pre-counted data as follows:

income level (1000 $) percent

0-1 1
1-2 2
2-3 3
3-4 4
4-5 5
5-6 5
6-7 5
7-10 15
10-15 26
15-25 26
25-50 8
>= 50 1

note that the percents do not add to 100% due to rounding
and the class intervals include the left endpoint but not the right endpoint

the problem is that I want to make a histogram with matplotlib.pyplot but I have no idea how to use this pre-counted data, any suggestions?

stone oasis Jun 24, 2018, 3:39 PM

#

well

#

pair the data before doing the plot

#

the income level can be made 1 number

#

1,2,3,4,5,6,7,10,15,25,50+

lapis sequoia Jun 24, 2018, 3:47 PM

#

I'm sorry but I don't understand what you mean?
I kinda get how I can take the income level and turn it into the bins, but I don't know how to turn the percent into the associated height.

hasty maple Jun 24, 2018, 5:14 PM

#

https://stackoverflow.com/questions/33497559/display-a-histogram-with-very-non-uniform-bin-widths this could be helpful @lapis sequoia

Stack Overflow

display a histogram with very non-uniform bin widths

Here is the histogram
To generate this plot, I did:

bins = np.array([0.03, 0.3, 2, 100])
plt.hist(m, bins = bins, weights=np.zeros_like(m) + 1. / m.size)
However, as you noticed, I want to plot ...

#

https://pandas.pydata.org/pandas-docs/stable/visualization.html#area-plot this might be a nicer visualization

lapis sequoia Jun 24, 2018, 7:49 PM

#

Thanks @hasty maple
I managed to solve it differently, kind of hack-ish solution (feels like there must be a better/more pythonic way) but in case anyone is interested in what it should look like:

    import matplotlib.pyplot as plt

    # given data:
    bins = [0, 1, 2, 3, 4, 5, 6, 7, 10, 15, 25, 50]  # bin edges, in thousands of dollars
    percent = [1, 2, 3, 4, 5, 5, 5, 15, 26, 26, 8, 1]  # % of families per income class/bin

    # solution:
    x_steps = [b - a for a, b in zip(bins, bins[1:])]  # width of each bin
    # height = percent per thousand dollars; zip stops after 11 bins, so the 50+ class is dropped
    weights = [perc / step for perc, step in zip(percent, x_steps)]
    data = bins[:-1]  # one sample at each bin's left edge, carrying that bin's weight
    plt.hist(data, ec='black', bins=bins, weights=weights)  # ec = edgecolor
    plt.xlabel('INCOME (THOUSANDS OF DOLLARS)')
    plt.ylabel('PERCENT PER THOUSAND DOLLARS')
    plt.title('Distribution of families by income in the U.S. in 1973')
    locs = range(0, 55, 5)
    plt.xticks(locs)
    plt.show()

this is exactly what it looks like in the book
if anyone has a better solution (meaning the same histogram from the same given data but with prettier code): I'm all ears
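
One possible alternative, offered as a sketch rather than the "right" answer: since the data is already counted, the bars can be drawn directly with `plt.bar` instead of feeding fake samples to `plt.hist`. The function name `plot_income_histogram` is made up, and the open-ended 50+ class is left out here on the assumption that it has no finite width to draw.

```python
edges = [0, 1, 2, 3, 4, 5, 6, 7, 10, 15, 25, 50]  # bin edges, thousands of dollars
percent = [1, 2, 3, 4, 5, 5, 5, 15, 26, 26, 8]    # one value per closed bin

widths = [right - left for left, right in zip(edges, edges[1:])]
# Density scale: bar area (height * width) equals the bin's percentage.
heights = [p / w for p, w in zip(percent, widths)]


def plot_income_histogram():
    """Draw the bars. matplotlib is imported here so the arithmetic above runs without it."""
    import matplotlib.pyplot as plt

    # align="edge" anchors each bar at its left edge, so bars span exactly one bin.
    plt.bar(edges[:-1], heights, width=widths, align="edge", edgecolor="black")
    plt.xlabel("INCOME (THOUSANDS OF DOLLARS)")
    plt.ylabel("PERCENT PER THOUSAND DOLLARS")
    plt.title("Distribution of families by income in the U.S. in 1973")
    plt.xticks(range(0, 55, 5))
    plt.show()
```

Calling `plot_income_histogram()` draws the figure. Dividing each percentage by its bin width is what keeps the wide bins (10-15, 15-25, 25-50) from looking taller than they should.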

slow oriole Jun 26, 2018, 7:51 AM

#

https://cdn.discordapp.com/attachments/159253281774895106/460975651596533760/FD_m90IA-wE.png

lapis sequoia Jun 26, 2018, 8:58 AM

#

hey somebody familiar with Q-Learning? pm me please

lapis sequoia Jun 26, 2018, 7:34 PM

#

nobody?

lapis sequoia Jun 26, 2018, 8:31 PM

#

dunno what that is,

#

but look how cool data scientists can be: https://www.youtube.com/watch?time_continue=104&v=ndyjFUF2e9Q

YouTube

Udacity

Meet The Data Scientist Nanodegree Instructors


#

lol i dunno i just started it but have some doubts about this program

velvet anchor Jun 26, 2018, 9:33 PM

#

you can ask your question Jan but we dont really do PM help

wide oxide Jun 27, 2018, 12:56 AM

#

Good morning!

stone oasis Jun 27, 2018, 1:33 AM

#

this video is real?

#

cant be

velvet anchor Jun 27, 2018, 1:38 AM

#

i mean

#

at least 2 of them have Ph.D.s

lapis sequoia Jun 27, 2018, 5:44 AM

#

lol

#

"VC at STV, a $500 million venture capital fund"

dusky agate Jun 27, 2018, 7:51 AM

#

hold on

#

http://bfy.tw/Imtp

LMGTFY

#

@lapis sequoia

lapis sequoia Jun 27, 2018, 8:04 AM

#

"I recommend using bing"

#

lmgtfy gets shittier every day

dusky agate Jun 27, 2018, 8:33 AM

#

~~it's not like I intentionally chose bing when creating the link~~

lapis sequoia Jun 27, 2018, 9:30 AM

#

yes. but this question isn't really something for a group

lapis sequoia Jun 27, 2018, 10:27 AM

#

Is anyone here using an AMD GPU to do deep learning?

winter violet Jun 27, 2018, 12:01 PM

#

I'd like to explore using Python and specifically scikit-learn for a project I have at work. I want to do some useful things in machine learning but I'm not really sure how to. I know the basics of machine learning though...

#

I help out the regional manager for sales so I constantly have metrics about each branch location, including average revenue per day, upsell, and sales for certain upgrades on our products. I'm trying to figure out how to use machine learning in conjunction with this data. I'm not really sure how to do it though because every metric is calculated as an average. For example average revenue per day is just based on the location average and it's expected that associates push towards or exceed that average to be competitive... I'm not sure how I can make a machine learning algorithm that can predict or tell me if a certain value or range is acceptable or unacceptable

velvet anchor Jun 27, 2018, 4:20 PM

#

@lapis sequoia I have at home once or twice

#

@winter violet It seems from your description that NN may not be the best way to train on using like scikit learn. This seems like a statistical problem, I would probably just take all the revenue ranges you have and plot them out. Find a standard deviation of your data and see if a value falls within that block. Maybe I'm misunderstanding your statement though.

#

Basically, this seems like a statistics problem I suppose is what I mean
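
A tiny sketch of the standard-deviation check described above, using only the standard library. The revenue figures, the helper name `is_typical` and the cutoff of 2 standard deviations are all made up for illustration.

```python
import statistics

# Made-up daily revenue figures for one branch (illustration only).
daily_revenue = [980, 1020, 1005, 995, 1010, 990, 1000]

mean = statistics.mean(daily_revenue)
stdev = statistics.stdev(daily_revenue)  # sample standard deviation


def is_typical(value, k=2):
    """True if value lies within k standard deviations of the mean."""
    return abs(value - mean) <= k * stdev


print(mean, round(stdev, 2))
print(is_typical(1015), is_typical(900))
```

How many standard deviations count as "acceptable" is a business decision more than a statistical one, so `k` is left as a parameter.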

winter violet Jun 27, 2018, 4:41 PM

#

So if you had this context and this data what would you use machine learning for? What's a practical application of it?

#

@velvet anchor

velvet anchor Jun 27, 2018, 4:45 PM

#

Neural networks / deep learning is nice when there are problems without like direct correlations to it, or with a ton of different inputs that can’t easily be singled out. Or for image analysis where you need to identify specific patterns that may be inside a range of different things. Like Tumors inside X-rays. Or like right now my research is identifying deepfakes with them

#

It’s not that machine learning couldn’t do what you want to use it for. It’s just not needed and you might get a lot of additional noise from it that wouldn’t necessarily be present in a statistical model

#

Also the sheer amount of data you need for networks is kind of another huge road block that stops it from working on applications where statistics and such excel

lapis sequoia Jun 27, 2018, 10:25 PM

#

@velvet anchor what kind of software did you use to do deep learning on the AMD GPU (because CUDA doesn't support AMD GPUs)?

velvet anchor Jun 27, 2018, 10:26 PM

#

I think Theano supports OpenCL

#

and there's a couple tensorflow forks that do too

#

even keras does too

#

https://keras.io/why-use-keras/

polar acorn Jun 28, 2018, 9:08 AM

#

I'll try here as well I guess. So I'm looking for good ways to use Jupyter notebooks with git. I saw the option for adding jq as a filter to gitconfig and it seems very close to what I want. Anyone have any options they feel are better?

hearty plume Jun 28, 2018, 4:31 PM

#

Is anyone familiar with Multi-Input Multi-Output systems? I've been reading about them on Keras and I'm currently looking for resources to build my own (or adapt one from Github)

lapis sequoia Jun 29, 2018, 5:32 PM

#

the normalize=true option in some fitting methods says it removes the data mean and divides by the L2-norm. I don't get why... its calculated by summing all data values squared and taking the squareroot. But doesn't this value get larger and larger the more data points you have??

outer tiger Jun 29, 2018, 5:37 PM

#

is anyone familiar with np.ndarrays?

#

i have an array like x[][][][] where each dimension has between 10 and 20 values

#

and i want x[1][2][all][all]

#

i know that's supposed to be like one command with this, but i dont fully understand the documentation

lapis sequoia Jun 29, 2018, 7:02 PM

#

thats not a data science question its basic python indexing, look e.g. this tutorial: https://www.youtube.com/watch?v=ktyW-kOqGpY

#

in your case x[1][2][:][:] or x[1,2,:,:] should both work
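
A quick sketch to check that claim, using a made-up 4-D array whose shape is just an example:

```python
import numpy as np

x = np.arange(10 * 12 * 14 * 16).reshape(10, 12, 14, 16)  # made-up 4-D array

chained = x[1][2][:][:]  # index step by step
direct = x[1, 2, :, :]   # one indexing operation
short = x[1, 2]          # trailing full slices can be dropped

print(direct.shape)
print(np.array_equal(chained, direct), np.array_equal(direct, short))
```

One thing to watch with the chained form: `[:]` just returns the whole array again, so it only behaves as "all" when it comes last. Something like `x[:][:][1][2]` gives the same result as `x[1][2]`, whereas `x[:, :, 1, 2]` really does select along the last two axes.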

young aurora Jun 29, 2018, 8:25 PM

#

Hey data scientists - wanted to know what people suggest to use for python google maps API usage. Namely, I have an excel that I've thrown into a Pandas Dataframe and I'd like to spit out a map using location data. Still haven't converted the locations to something usable yet (which are literally city names as of now) but I wanted to see what packages are best for this sort of thing

young aurora Jun 29, 2018, 11:48 PM

#

To be clear, it'd be nice to be able to make a map like this and highlight/fill in a specific province or city

📎 2000px-China_location_map.png

south quest Jun 30, 2018, 12:00 AM

#

:^)

desert cradle Jun 30, 2018, 12:00 AM

#

@young aurora I drew this with pyshp and PIL:

📎 animated.gif

south quest Jun 30, 2018, 12:01 AM

#

ahh wrong Joseph, cool and good

desert cradle Jun 30, 2018, 12:01 AM

#

sorry

south quest Jun 30, 2018, 12:01 AM

#

No problem

desert cradle Jun 30, 2018, 12:02 AM

#

@young aurora that's based on a dataset of county boundaries i had to download myself, and it's an equirectangular projection

small pumice Jun 30, 2018, 1:51 AM

#

I am tired of machine learning library tutorials. Whether it be TensorFlow or Keras, all tutorials don’t explain concepts in depth and use popular datasets like MNIST. Before I classify MNIST images, I want to be able to hand-write a dataset, and train a network on that.

velvet anchor Jun 30, 2018, 2:23 AM

#

Well the data sets and such are a much more advanced topic

#

however I agree with you. I'm working on a keras tutorial now actually

#

That plans to go more in depth on the ML side

small pumice Jun 30, 2018, 2:41 AM

#

Cool. I’m just looking for one that explains what concepts are being programmed.

rich grove Jun 30, 2018, 4:12 AM

#

hey guys

#

I'm trying to cluster text data based on common themes

#

I've never really worked with text before, though

#

so far I've eliminated common stop words, converted everything to lower case, and eliminated as many spelling errors as I can

#

does anyone have any advice on where to go from there?

#

I converted each text block into a dictionary with word counts on a lark but I'm not sure what I'd actually do with those

velvet anchor Jun 30, 2018, 5:37 AM

#

Maybe NLTK could be of use

rich grove Jun 30, 2018, 3:25 PM

#

thanks for the recommendation @velvet anchor, that looks like a really good resource!

feral lodge Jul 1, 2018, 7:21 AM

#

@rich grove For your preprocessing, you'll want to do, among other things possibly:

Lemmatization, ie "normalizing" the text, by morphing certain types of words to their lemmas -- their "root" versions. This includes stuff like converting plural to singular: ponies -> pony, converting conjugated verbs to their infinitive form: {being, are, is} -> be, reverting comparative/superlative to standard: {better, best} -> good.
https://en.wikipedia.org/wiki/Lemmatisation, https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
Phrase modeling, ie finding combinations of words that occur often enough that they can be assumed to constitute a phrase. For instance, the words happy and hour have very different meanings by themselves compared to when they're used together as happy hour, so treating happy hour, happy, and hour separately is very important when modelling the themes of their sentences. Phrase modeling is a subtask of something called named entity recognition (for instance, New York or the red happy robot are not only phrases, but they're entities with names that exist in the world -- recognizing this is a larger problem than just recognizing that those word sequences occur together often), so you might find some good resources googling that.
https://en.wikipedia.org/wiki/Named-entity_recognition
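
A toy sketch of the phrase-modeling idea described above, assuming the text is already tokenized and lower-cased. The function name `merge_frequent_bigrams` and the `min_count` threshold are made up for illustration; real libraries score word pairs more carefully than a raw count, so treat this as the idea only.

```python
from collections import Counter


def merge_frequent_bigrams(docs, min_count=2):
    """Join adjacent word pairs that occur at least min_count times (hypothetical helper)."""
    # Count every adjacent pair of tokens across all documents.
    bigram_counts = Counter()
    for tokens in docs:
        bigram_counts.update(zip(tokens, tokens[1:]))

    # Pairs seen often enough are treated as phrases.
    phrases = {pair for pair, count in bigram_counts.items() if count >= min_count}

    merged_docs = []
    for tokens in docs:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2  # skip the second word of the phrase
            else:
                merged.append(tokens[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


docs = [
    ["happy", "hour", "starts", "soon"],
    ["cheap", "drinks", "during", "happy", "hour"],
    ["one", "hour", "left"],
]
merged = merge_frequent_bigrams(docs)
print(merged)
```

After merging, "happy_hour" is its own token while the lone "hour" in the third document stays as it was, which is the separation the message is asking for.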

For topic clustering, Latent Dirichlet Allocation (LDA) is, as far as I know, the standard approach.
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation, http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf

I like this video a lot https://www.youtube.com/watch?v=6zm9NC9uRkk; it works with several libraries that can do these things out-of-the-box, including LDA. The whole video is great, but the parts i mentioned start at 24:00, where he shows lemmatization followed by phrase modelling. LDA is shown starting at 40:20; that whole section is really cool

rich grove Jul 3, 2018, 10:29 PM

#

hey @feral lodge thanks for the reference!

#

been using it and it's working great

velvet anchor Jul 3, 2018, 11:13 PM

#

Microsoft also has something they’ve been talking about in their machine learning package too that might work. They’ve been using it to tell whether a review is positive or negative for example

#

But I’m not super knowledgeable on it

balmy moth Jul 4, 2018, 12:34 PM

#

Hi, I am looking for tips on numba, especially on how to modify code to GPU-parallelize it. Any expert in this topic around here?

stone oasis Jul 4, 2018, 4:15 PM

#

@balmy moth numpy??

balmy moth Jul 4, 2018, 4:29 PM

#

no numba

#

http://numba.pydata.org/numba-doc/0.35.0/index.html

dreamy tartan Jul 5, 2018, 12:28 PM

#

Hi everyone, I want to predict the survival probability of a specific person at a specific time. I've looked around and found this: http://savvastjortjoglou.com/nfl-survival-analysis-kaplan-meier.html

That project estimates survival probability for players in general, not for a particular individual. My goal is to predict the probability for each player. How should I go about predicting that?

Savvas Tjortjoglou

Surviving the NFL - Survival Analysis using Python

balmy moth Jul 5, 2018, 2:12 PM

#

Hi, im still looking for a numba expert for some questions, anyone around here ?

steel glen Jul 7, 2018, 8:21 PM

#

Ugh, Just started to learn Deep Learning

#

And Data Preprocessing is really a mess :/

analog rampart Jul 7, 2018, 8:57 PM

#

what is the difference b/w fit, fit_transform, transform 😃

lilac shadow Jul 7, 2018, 9:21 PM

#

could somebody please explain to me and @earnest prawn how B+ trees work? nix is smart, but i have very little understanding of them currently so it would be nice if it can be explained clearly as possible. :D

placid snow Jul 8, 2018, 5:33 AM

#

Now I'm curious as well after reading the wikipedia about it and not understanding bais 😇

steel glen Jul 8, 2018, 8:16 AM

#

Can someone eli5 bias variance trade off ?

lean ledge Jul 8, 2018, 10:37 AM

#

@steel glen Don't train your model enough -> underfitting, you're not fitting the data well enough. Train your model too much on the training data -> overfitting, you're trying to perfectly match the training data instead of matching the actual data properties

#

this essentially

📎 Bias-Variance-Tradeoff-In-Machine-Learning-1.png

steel glen Jul 8, 2018, 10:49 AM

#

@lean ledge Hmm thanks

feral lodge Jul 9, 2018, 5:51 PM

#

@steel glen I'll expand on Rags' explanation, because it's an interesting and central topic worth mulling over!

Bias and variance are statistical properties of (among other things) predictive models like neural networks, support vector machines, polynomials, etc. Both are causes for errors while evaluating the model, and a good model has been trained to balance between them, being not too biased and having not too much variance.

A model being biased means that it makes assumptions about the underlying structure of the data that don't quite line up with the actual data we observe. This leads to errors when testing the model, since what the model predicts is different from the true observed values. You can think of the word "biased" as meaning "the model is biased towards its own idea of what the data should be, rather than what it actually is". Bias may seem like a purely bad property that we would want to reduce to zero. This isn't the case however, and to see why, have a look at this figure:

#

📎 2.-lin-reg-op.png

#

In that figure we see some scatterplots of some observed data (the green dots), and six different attempts to model this data, using polynomials of power 1, 3, 6, 9, 12 and 15. If you want, you can imagine the graphs as being snapshots of a neural network at various stages of its training -- the effect is pretty much the same. In that case, the first graph is right in the beginning of the network's training, and the last graph is after having trained for many iterations.

Reducing the model's bias to 0 would mean we have created a curve that perfectly touches all of the data points. We can see that the 15th power polynomial (or, the neural network that has been trained for many iterations), is approaching this state -- it's not very biased towards the idea that the data must follow a simple and elegant shape. This immediately seems incorrect to us though, since we know that our data will inevitably contain random fluctuations unrelated to our variables. Just like Rags said, the last model shown has started to model this unimportant random noise, lowering its training error but losing any real understanding of the underlying structure of the data. The 3rd degree polynomial is more likely the "true" one even though its training error is higher and it's more biased. As such, a model being a bit biased towards its own idea of what the data should be is actually a useful and desirable property!

#

Now, as the bias is reduced, the variance is increased. Compare the first two graphs to the last two; you'll see that the curve "wiggles" up and down a lot more. This leads to an increase in the average discrepancy between the predicted values and their mean -- the variance of the model is increased. The increase in variance is what allows the curve to snugly fit the observed data, but this is only valuable up to a certain point; as said, the 3rd degree polynomial looks like the proper model of the data.

So, both high bias and high variance are undesirable features of the model.

High bias and low variance is common for models that have too little flexibility or that have been trained too little -- underfit models. These models will perform poorly on both training and testing data.
Low bias and high variance is common for models with too much flexibility, or that have been trained too much -- overfit models. These models will perform very well on training data, but very poorly on testing data, because their high variance makes them model random noise in the training data and they therefore generalize terribly.

The sweet spot for a model is somewhere inbetween the two extremes.
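
The polynomial comparison above can be tried out with a small sketch like the one below. Everything specific in it is an assumption made for illustration: the "true" curve (x^3 - x), the noise level, the sample sizes and the chosen degrees are not taken from the figure.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    # Assumed underlying structure: a cubic plus random noise.
    x = rng.uniform(-1.0, 1.0, n)
    y = x**3 - x + rng.normal(0.0, 0.1, n)
    return x, y


def mse(model, x, y):
    # Mean squared error of the fitted polynomial on (x, y).
    return float(np.mean((model(x) - y) ** 2))


x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

errors = {}
for degree in (1, 3, 9, 15):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.4f}, "
          f"test MSE {errors[degree][1]:.4f}")
```

The training error can only go down as the degree goes up, since each higher-degree polynomial can reproduce the lower-degree fit. The test error is the number to watch: once the flexible models start chasing noise, it usually stops improving.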

#

@analog rampart You're talking about scikit-learn yeah? fit() computes the transformation parameters and saves them on the transformer object. transform() then applies these saved parameters to the input. fit_transform() is a combination; it's equivalent to calling both functions, computing the parameters and applying the transform. For example, the standard scaler http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html scales data by transforming it:

When we work with data it's often a good idea to normalize it before training our model on it, to increase numerical stability or to compensate for different features measured in different magnitudes (like if we have two variables x1, measured in micrometers, and x2, measured in kilometers), or other reasons. There are different ways of normalizing data. One way is to subtract the mean of the data, and then divide by its standard deviation. Ie, doing this:

X' = (X - μ)/σ

and then using X' for training the model. The difference is this:

#

📎 Screenshot_from_2018-07-09_19-26-28.png

#

So you can see we've transformed X into X' using the parameters μ and σ, centering the data and giving it variance 1. With s = StandardScaler(), what scikit's s.fit(X) function does is compute μ and σ and save them on the scaler object (as s.mean_ and s.scale_). Xprime = s.transform(X) then applies the transformation X' = (X - μ)/σ. We can do this in one step by writing Xprime = s.fit_transform(X)
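
To make the fit / transform split concrete, here is a pure-Python toy that mimics the pattern for a single feature. `TinyStandardScaler` is a made-up stand-in written for this illustration, not scikit-learn's implementation; only the method names are borrowed.

```python
from statistics import fmean, pstdev


class TinyStandardScaler:
    """Toy illustration of fit/transform; not scikit-learn's code."""

    def fit(self, values):
        # Learn the parameters and keep them on the object.
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values)  # population standard deviation
        return self

    def transform(self, values):
        # Apply the stored parameters; nothing is re-estimated here.
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        # Same result as calling fit() and then transform() on the same data.
        return self.fit(values).transform(values)


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

s = TinyStandardScaler()
train_scaled = s.fit_transform(train)  # μ and σ come from the training data
test_scaled = s.transform(test)        # the same μ and σ are reused on new data

print(train_scaled)
print(test_scaled)
```

The usual reason to keep the two steps separate is the last line: new or test data gets scaled with the parameters learned from the training data, rather than with its own mean and standard deviation.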

plush raptor Jul 9, 2018, 7:41 PM

#

Anybody know any good free online 3d vector plotters?

lapis sequoia Jul 10, 2018, 2:57 AM

#

tf.initialize_all_variables()
lets say I had variables W1,b1,W2,b2,W3,b3
would that function do this?

    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())

I notice there is also a tf.initialize_variables function

lapis sequoia Jul 10, 2018, 7:13 AM

#

@honest relic

lapis sequoia Jul 10, 2018, 8:16 AM

#

What a human can do but machine can't?

#

(instead of love and emotional feelings)

dusky agate Jul 10, 2018, 8:18 AM

#

love and emotion can be emulated

#

we aren't unique or special, we're just more complex than the machines we create

lapis sequoia Jul 10, 2018, 8:18 AM

#

yes but they woun;t be real

dusky agate Jul 10, 2018, 8:18 AM

#

define real?

lapis sequoia Jul 10, 2018, 8:18 AM

#

um i don't think i can define real

dusky agate Jul 10, 2018, 8:19 AM

#

if given enough computational power, our thoughts and feelings can be perfectly replicated

lapis sequoia Jul 10, 2018, 8:19 AM

#

but what is the purpose of an AI?

velvet anchor Jul 10, 2018, 8:19 AM

#

Humans are really good at conditional statements that can't be like discretely stated

#

and machines are not

dusky agate Jul 10, 2018, 8:20 AM

#

A better question to ask would be "what practical use can AI perform today that a human has issues with."

lapis sequoia Jul 10, 2018, 8:20 AM

#

oh

dusky agate Jul 10, 2018, 8:20 AM

#

We aren't planning on replacing ourselves afaik

#

But AI can definitely exceed in certain tasks that we suck at

lapis sequoia Jul 10, 2018, 8:21 AM

#

such as

dusky agate Jul 10, 2018, 8:21 AM

#

Scientists and programmers are using it for a lot of tasks, one that always pops to my mind is emulating fluid simulations.

velvet anchor Jul 10, 2018, 8:21 AM

#

mass data processing is a big thing that its just not practical for humans to do

#

or massive number crunching

dusky agate Jul 10, 2018, 8:22 AM

#

Or just processing data instantly

#

Humans can do some amazing video editing, but it takes days or weeks of work to make even a rough video, an AI could do the same thing almost instantaneously

lapis sequoia Jul 10, 2018, 8:42 AM

#

@lapis sequoia if you want to join, I have a server about Human-level AI

#

Ok

lapis sequoia Jul 10, 2018, 10:13 AM

#

how the nyculina and myrcea ways are connected to the future universe?

#

nvm found, through rlk and dr4 ways

muted bridge Jul 10, 2018, 5:03 PM

#

Anyone used scrapy exxtensively here?

glossy elbow Jul 11, 2018, 2:28 AM

#

Could anyone here help me in help-2?

desert marsh Jul 11, 2018, 1:33 PM

#

Hi, was wondering if anyone had any experience with apache spot? I was thinking of parsing some syslogs using their open data model

steel glen Jul 11, 2018, 3:36 PM

#

Heyyo guys

#

Let's say I've trained a CNN and saved the model in h5 format

#

How can I use that model to make predictions?

velvet anchor Jul 11, 2018, 7:50 PM

#

Keras or?

#

Which framework @steel glen

naive hornet Jul 11, 2018, 7:53 PM

#

(he confirmed in another channel that this is a Keras question)

velvet anchor Jul 11, 2018, 7:53 PM

#

In keras.models there's a load_model function (from keras.models import load_model) that reads the saved h5 file back in

#

Then you can pass your data into that with .predict

steel glen Jul 11, 2018, 7:55 PM

#

Thx but i figured out

#

Preprocessing was the part that i had problem

#

Is it possible to overfit when you train CNN with MaxPooling ?

#

Cuz my model always classifies every image with the same class

velvet anchor Jul 11, 2018, 8:00 PM

#

Yes

#

Though it’s also likely you don’t have the data for the other class. Or your data for the class it’s overfitting is too wide

steel glen Jul 11, 2018, 8:02 PM

#

Well basically I'm using kaggle cats and dogs dataset

#

I don't think there's a problem with dataset

velvet anchor Jul 11, 2018, 8:05 PM

#

Can try adjusting windows / filters / activation functions too. Also epochs / steps

#

I had the same issue at work for the longest time

steel glen Jul 11, 2018, 8:07 PM

#

I think I will add one more Convolutional layer

velvet anchor Jul 11, 2018, 8:08 PM

#

It’s just all testing and trying stuff honestly. Messing with parameters. Retrain. Tweaking preprocessing done to images

#

I’ve been working on the same project at work for like almost 6 months now

steel glen Jul 11, 2018, 8:09 PM

#

Wow

#

What you're working on?

velvet anchor Jul 11, 2018, 8:10 PM

#

Training a network to detect deepfakes

steel glen Jul 11, 2018, 8:10 PM

#

Cool!

ruby gale Jul 11, 2018, 8:12 PM

#

The escalation war begins.... someone else is training a deep fake to defeat deep fake detection... The internet started because of pron and the AI overloads started with deep fakes...

velvet anchor Jul 11, 2018, 8:13 PM

#

It’s really cool honestly. Like nvidia is doing kinda the same thing. They made an adversarial network that acts like a filter for photos to defeat facial recognition

ruby gale Jul 11, 2018, 8:14 PM

#

It's the golden age right now.. I'm glad I'm learning how it all works.

velvet anchor Jul 11, 2018, 8:15 PM

#

Nvidia and google both have loads of journal articles to read that are super cool with MM

#

With ML*. Like every other month it seems they have just some giant leap forward

#

And some guys at defcon made one to perform sql injections

ruby gale Jul 11, 2018, 8:16 PM

#

Yeah im learning via fast.ai for top down to bottom details and simultaneously going from bottom up with Andrew Ng on coursera

steel glen Jul 11, 2018, 8:22 PM

#

@velvet anchor Just wondering,Which GPU do you have for training?

velvet anchor Jul 11, 2018, 8:26 PM

#

Quadro

#

IDK which one It's one of the low tiers though

steel glen Jul 11, 2018, 8:54 PM

#

Ugh now I have training set accuracy of %83

#

but still it does not predict correctly

#

📎 0.73.jpg

#

📎 cat_or_dog_2.jpg

#

How this is a dog 😩

pulsar surge Jul 11, 2018, 8:59 PM

#

well it is doing a doge-y pose

#

https://i.pinimg.com/736x/64/fe/9a/64fe9a90e6c8cf8ea4345e783b5cf703--smiling-dogs-shiba-inu.jpg
verily, the resemblance is uncanny

steel glen Jul 11, 2018, 9:00 PM

#

I tried with other cat images and they're all dogs according to my model yoj

#

Now It started to predict few cats correctly

#

@pulsar surge It did correctly predict that doggy

pulsar surge Jul 11, 2018, 9:05 PM

#

did it predict the doggo as a cat? 😄

steel glen Jul 11, 2018, 9:06 PM

#

nope

#

as a doggo

#

Yeey

velvet anchor Jul 12, 2018, 12:10 AM

#

Training accuracy is going to be misleading

#

if one of your classes is much bigger than the other

#

like if it's always predicting the larger class, the accuracy is gonna be whatever percent of the data that class makes up

#

if that makes sense
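
A quick sketch of that point, using an invented 90/10 split (the real cats-and-dogs set may well be balanced; the numbers here are only for illustration):

```python
# Hypothetical imbalanced data set: 90 dogs, 10 cats.
labels = ["dog"] * 90 + ["cat"] * 10

# A "model" that has collapsed to always predicting the larger class.
predictions = ["dog"] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Fraction of the actual cats that were predicted as cats.
cat_hits = sum(1 for p, y in zip(predictions, labels) if y == "cat" and p == "cat")
cat_recall = cat_hits / labels.count("cat")

print(f"accuracy: {accuracy:.2f}, cat recall: {cat_recall:.2f}")
```

The accuracy looks decent even though not a single cat is recognized, so a per-class number is worth checking alongside it.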

steel glen Jul 12, 2018, 9:03 AM

#

@velvet anchor My model now makes some great predictions with %81 accuracy on test set

#

I will try to make it like %90

#

but first I need to build TensorFlow from source

#

cuz they don't support CUDA compute capability 3.0 GPUs on stock version

#

I used to train with CPU 😄

#

To make training faster I resized images 64x64

#

If I can build tf-gpu from source then i will train it with resized 128x128 images, hoping to increase test set accuracy

craggy heart Jul 12, 2018, 9:53 AM

#

https://youtu.be/E4Y9BFhCICI

YouTube

Adam Saudagar

Fishy bot | Elder Scrolls online

Made a fishing bot for eso, using python and tensorflow

▶ Play video

#

Made it using tensorflow

topaz walrus Jul 12, 2018, 10:42 AM

#

@velvet anchor This is a bit late but you wouldn't happen to have any resources on that defcon talk about SQLi via machine learning would you? Can't seem to find it myself and I'm curious.

lapis sequoia Jul 12, 2018, 1:27 PM

#

i think i'm in love with matplotlib

craggy heart Jul 12, 2018, 1:55 PM

#

Ye it's a pretty good lib

velvet anchor Jul 12, 2018, 5:46 PM

#

@topaz walrus it’s on YouTube.

velvet anchor Jul 12, 2018, 7:57 PM

#

https://www.youtube.com/watch?v=wbRx18VZlYA here

topaz walrus Jul 12, 2018, 9:27 PM

#

@velvet anchor Awesome thanks, though I was able to find it after revising my search terms a bit lol

velvet anchor Jul 12, 2018, 9:28 PM

#

no prob

steel glen Jul 12, 2018, 10:48 PM

#

Heyyo

#

I'm getting
ValueError: Negative dimension size caused by subtracting 3 from 2 for 'conv2d_81/convolution' (op: 'Conv2D') with input shapes: [?,2,2,64], [3,3,64,32]

#

📎 malkeras.jpg

#

And here's the code

#

I do use TF not Theano so my (128,3,3) parts are correct

#

But I don't understand why I get ValueError
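
One possible reading of the error message: the input shape [?,2,2,64] says the feature map is only 2x2 by the time it reaches a 3x3 filter, and 2 - 3 is negative. The actual model is only in the screenshot, so the layer stack below is a guess (64x64 input, unpadded 3x3 convolutions, 2x2 pooling); `trace_spatial_size` is a made-up helper that just does the size arithmetic.

```python
def trace_spatial_size(size, layers):
    """Follow the height/width of a square feature map through a layer list.

    Each layer is ("conv", kernel) for an unpadded convolution with stride 1,
    or ("pool", window) for non-overlapping pooling. Hypothetical helper.
    """
    sizes = []
    for kind, k in layers:
        if kind == "conv":
            if size < k:
                raise ValueError(
                    f"cannot apply a {k}x{k} conv to a {size}x{size} input"
                )
            size = size - k + 1  # unpadded conv shrinks each side by k - 1
        elif kind == "pool":
            size = size // k     # pooling divides each side by the window
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        sizes.append(size)
    return sizes


# Assumed stack: four conv + pool pairs on a 64x64 image.
layers = [("conv", 3), ("pool", 2)] * 4
sizes = trace_spatial_size(64, layers)
print(sizes)

# A fifth 3x3 convolution no longer fits on what is left.
try:
    trace_spatial_size(64, layers + [("conv", 3)])
    fifth_conv_fits = True
except ValueError as err:
    fifth_conv_fits = False
    print(err)
```

If the real model looks anything like this, options include fewer conv/pool pairs, larger input images, or padding that keeps the size through each convolution.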

feral lodge Jul 12, 2018, 11:12 PM

#

Changing dimension ordering in the ~/.keras/keras.json file seems to fix this for many people: https://github.com/keras-team/keras/issues/3945

steel glen Jul 12, 2018, 11:22 PM

#

Ugh

#

First I need to locate .keras folder within google colab

#

But there's no .keras

velvet anchor Jul 12, 2018, 11:31 PM

#

what slandon said

steel glen Jul 12, 2018, 11:32 PM

#

There's no .keras folder in Google Colab

#

Also It does not let you to mkdir one

velvet anchor Jul 12, 2018, 11:32 PM

#

in the ~ directory?

steel glen Jul 12, 2018, 11:32 PM

#

Yep

velvet anchor Jul 12, 2018, 11:36 PM

#

put a

#

from keras import backend
print(backend.image_data_format())

#

somewhere

#

and lmk if it says channels first or last

steel glen Jul 12, 2018, 11:43 PM

#

channels_last

#

aka Tensorflow

velvet anchor Jul 12, 2018, 11:43 PM

#

kk

steel glen Jul 12, 2018, 11:44 PM

#

Is there any other platform that I can train DL models with GPU for free

#

I've tried GCP but they don't accept prepaid cards so yoj

velvet anchor Jul 12, 2018, 11:45 PM

#

maybe uh

#

target_size=(64,64,3) in your

steel glen Jul 12, 2018, 11:46 PM

#

directy

velvet anchor Jul 12, 2018, 11:46 PM

#

datagen.flow

steel glen Jul 12, 2018, 11:46 PM

#

ok

#

Same

velvet anchor Jul 12, 2018, 11:51 PM

#

same error?

steel glen Jul 12, 2018, 11:51 PM

#

Yeah

#

📎 unknown.png

#

Seriously, the Whole fu*ing universe is against me today

#

GCP,AWS,Colab they all said f*ck off

quick glacier Jul 13, 2018, 12:09 AM

#

@steel glen Crestle has 1 hour of free GPUs lol

#

Also floydhub

steel thicket Jul 13, 2018, 1:08 PM

#

Is anyone here familiar with web scrapers?

#

More specifically the BeautifulSoup library

earnest prawn Jul 13, 2018, 2:21 PM

#

bot.tags.get("ask")

arctic wedgeBOT Jul 13, 2018, 2:21 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

earnest prawn Jul 13, 2018, 2:21 PM

#

@steel thicket

steel thicket Jul 13, 2018, 2:22 PM

#

@earnest prawn Alright my bad. It's chill though I figured it out in the end.

earnest prawn Jul 13, 2018, 2:22 PM

#

No problem

prime thistle Jul 14, 2018, 3:15 PM

#

hey guise

#

general best practice question

#

i have a network of 30k nodes

#

any ways of visually making it appealing?

#

trying to do some network detection

prime thistle Jul 14, 2018, 6:07 PM

#

decided to filter on edge count

#

apart from f yeah

📎 unknown.png

steel glen Jul 15, 2018, 7:37 PM

#

Heyyo guys

#

If I have 100 output neurons for a classification problem

#

Let's say i fed my model with some input and i got 83 as a prediction

#

which is related neuron

#

How can i get the label of that neuron ?

velvet anchor Jul 16, 2018, 3:15 AM

#

which framework are you using?

velvet anchor Jul 16, 2018, 9:31 AM

#

@steel glen ask here please

steel glen Jul 16, 2018, 9:31 AM

#

Oh sorry

#

@velvet anchor Keras

#

It's same Q that I asked on 15 july

velvet anchor Jul 16, 2018, 9:32 AM

#

Ok so afaik you have to reconstruct the labels yourself

#

In other words. Keras doesn’t care or store the names of your 100 categories. Instead it returns a 100x1 matrix with probabilities of each classification

steel glen Jul 16, 2018, 9:33 AM

#

Yeah

velvet anchor Jul 16, 2018, 9:33 AM

#

By using np.argmax() you can get the index of the highest element within that matrix

steel glen Jul 16, 2018, 9:34 AM

#

predict_classes() does the same thing

velvet anchor Jul 16, 2018, 9:34 AM

#

But you’ll have to construct a list or some other data structure that maps each Index to a class

#

Yeah but predict_classes is going to be deprecated soon I believe

#

But tldr is you can’t get the name. Only the index. You have to map it out yourself

steel glen Jul 16, 2018, 9:34 AM

#

Ugh

#

@velvet anchor Thanks

velvet anchor Jul 16, 2018, 9:35 AM

#

NP

#

One thing you can do is os.listdir() or some other method to traverse your training set directory and map it that way or can hard code. Doesn’t matter really

#

also this thread https://stackoverflow.com/a/47944082/9352862
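
The index-to-label mapping described above could look roughly like this. The class names and the probability array are stand-ins (there is no trained model here); with a directory-per-class layout the list could come from sorted(os.listdir(train_dir)) instead, and if the data was loaded with flow_from_directory, the generator's class_indices attribute should give the same mapping.

```python
import numpy as np

# Hypothetical class names, sorted so that index i always means the same class.
class_names = sorted(["dog", "cat", "bird"])

# Stand-in for model.predict(batch): one row of class probabilities per input.
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
])

# Index of the most probable class in each row.
predicted_indices = np.argmax(probabilities, axis=1)

# Map each index back to a human-readable label.
predicted_labels = [class_names[i] for i in predicted_indices]
print(predicted_labels)
```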

steel glen Jul 16, 2018, 9:41 AM

#

Hmm thx

lapis sequoia Jul 16, 2018, 12:44 PM

#

Hi, I'm trying to create a numpy array representing an image, but I want every pixel to be the same color

#

I created an array like this

blue = np.zeros((4608, 2592, 3))
blue[blue < 1] = 120
blue[2] = (0,0,255)

#

the second line successfully changes it to a fully gray image but the third line fails to change every pixel to blue. Am I doing it wrong? I've searched and searched but all I've found are ways to create fully white or black images in grayscale. But I need different values for each RGB value (3)

drifting token Jul 16, 2018, 1:11 PM

#

Would numpy.full give you what you're looking for? https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.full.html
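
A small sketch of how that might look, with a tiny image standing in for the 4608 x 2592 one and assuming RGB channel order (OpenCV, for instance, uses BGR). In the snippet from the question, blue[2] indexes the third row along the first axis rather than the color channel, which is why only part of the image changes.

```python
import numpy as np

height, width = 4, 6  # small stand-in for the full-size image

# Fill every pixel with the same (R, G, B) triple in one call.
blue = np.full((height, width, 3), (0, 0, 255), dtype=np.uint8)

# Equivalent: start from zeros and broadcast the color over all pixels.
also_blue = np.zeros((height, width, 3), dtype=np.uint8)
also_blue[:, :] = (0, 0, 255)

print(blue.shape, blue[0, 0])
```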

lapis sequoia Jul 16, 2018, 1:15 PM

#

Hmm that could work, I'm gonna try it

lapis sequoia Jul 16, 2018, 1:50 PM

#

tried it out, works great lord greyweather

#

thanks 😃

daring bison Jul 17, 2018, 12:38 AM

#

hi, can someone help me with nns?

#

I want to adapt this nn to other uses (I want to change its possible inputs and outputs)

#

📎 ai.py

#

it's a mini-game, here is the map that comes with it.

📎 map.py

#

help plz

small ore Jul 17, 2018, 12:43 AM

#

Some people prefer not to open attachments. So it is better to post your code here in codeblocks if it is small enough for discord or use hastebin

daring bison Jul 17, 2018, 12:43 AM

#

what is hastebin?

small ore Jul 17, 2018, 12:45 AM

#

Either hastebin.com or the one link that is especially made for this server

#

I am not able to retrieve that link

daring bison Jul 17, 2018, 12:46 AM

#

I copied it on hastebin

#

how do I share it?

#

https://hastebin.com/zevudaqope.rb

#

https://hastebin.com/holumevuqi.rb

#

@small ore

small ore Jul 17, 2018, 12:49 AM

#

Heh. I am not qualified to answer your question. I just wander around here and just told you what people prefer. So please wait for your answer

lapis sequoia Jul 17, 2018, 1:46 AM

#

uhh...
what are some basic strategies on improving a computer vision algorithm?

daring bison Jul 17, 2018, 1:50 AM

#

does your algorithm include a relu function?

#

we're still talking about ais aren't we?

lapis sequoia Jul 17, 2018, 2:40 AM

#

yes

#

it includes a relu function

daring bison Jul 17, 2018, 2:47 AM

#

adapt your algorithm to your situation

#

what is the use of your ai?

#

also using temporal difference over simple q values might help alot

lapis sequoia Jul 17, 2018, 3:27 AM

#

ill be honest im really new to all of this, the only thing ive done in the past is the loan machine learning thing, if you can recommend to me a tutorial on keras that would be great

#

hmm

#

so for like MNIST...

#

are there any

#

is there any way to integrate one hot encoding to each pixel

#

AND

#

what modifications could i perform on the training data

#

i.e. rotating

#

what not

#

ALSO

#

for the initial layer, what should i use if the input is a 2d numpy array or list

daring bison Jul 17, 2018, 3:34 AM

#

I learned ai on udemy, the name of the course was "artificial intelligence a-z"

#

I don't use the exact same vocabulary as you do so it would be wise to ask someone who has deeper knowledge than I do

lapis sequoia Jul 17, 2018, 3:37 AM

#

oh

#

okay

daring bison Jul 17, 2018, 3:38 AM

#

sorry I can't help that much

#

I have to sleep see ya

lapis sequoia Jul 17, 2018, 3:39 AM

#

see ya

lapis sequoia Jul 17, 2018, 11:37 AM

#

So very simple question.. Is it possible to give a function as a blackbox in python?
So I have a function, which should be able to use different distance measures. Usually I would make this with a switch and a shit ton of duplicated code, but something like.

def somefunction(similarity_function, some_object, database):
    for database_item in database:
        similarity = similarity_function(some_object, database_item)

Would be cool.

feral lodge Jul 17, 2018, 12:46 PM

#

Yeah it is, pretty much exactly as you wrote! https://stackoverflow.com/a/706735 @lapis sequoia
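
For the record, a runnable sketch of the same idea. Functions are ordinary objects in Python, so they can be passed as arguments without any switch; the names `closest_item`, `euclidean` and `manhattan` and the sample points are invented for the example.

```python
import math


def closest_item(distance_function, some_object, database):
    # The distance measure is just another argument; call it like any function.
    return min(database, key=lambda item: distance_function(some_object, item))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


database = [(3, 3), (0, 4.5)]
query = (0, 0)

by_euclidean = closest_item(euclidean, query, database)
by_manhattan = closest_item(manhattan, query, database)
print(by_euclidean, by_manhattan)
```

The two measures disagree on which point is closest here, which shows that the passed-in function really is what drives the result.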

lapis sequoia Jul 17, 2018, 1:20 PM

#

Yeah it is exactly as I wrote. I already got the same answer in general - we had a laugh - fun has now been had. But thanks @feral lodge

daring bison Jul 17, 2018, 2:20 PM

#

yo halp plocks

#

can someone help me untangle this mess, I want to know how to make this nn usable for other purposes

#

https://hastebin.com/zevudaqope.rb

#

https://hastebin.com/holumevuqi.rb

#

the first link is the nn and the second is the map of the mini-game

analog rampart Jul 17, 2018, 4:22 PM

#

any good ML course
currently doing one from udemy

daring bison Jul 17, 2018, 4:38 PM

#

are you doing the artificial intelligence from a to z course?

analog rampart Jul 17, 2018, 5:31 PM

#

no i m doing ml a-z course

daring bison Jul 17, 2018, 5:38 PM

#

is ml like udemy?

#

wait ml means what?

lapis sequoia Jul 17, 2018, 6:19 PM

#

@analog rampart
Haven't tried any online courses, but I can put you in the direction of some books, where I learned all the theory from.

analog rampart Jul 17, 2018, 6:20 PM

#

not a book guy but i ll give it a shot 😉

lapis sequoia Jul 17, 2018, 6:25 PM

#

Okay. I think
http://www.dataminingbook.info/pmwiki.php <-- By Zaki and Meira, is really good, with quite advanced theory (currently doing research on a model proposed here).
https://www.amazon.com/Introduction-Data-Mining-Pang-Ning-Tan/dp/0321321367 <-- very good introduction to basically everything you need to know.
http://www.deeplearningbook.org/ <-- fine book for more theory on neural networks.
https://www.springer.com/la/book/9780387848570 <-- is great for the statistical foundation 😃

The Elements of Statistical Learning - Data Mining, Inference, and...

This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal ...

analog rampart Jul 17, 2018, 6:29 PM

#

ty

daring bison Jul 17, 2018, 6:38 PM

#

ty

lapis sequoia Jul 17, 2018, 6:39 PM

#

Are you guys studying CS or some related education ?

daring bison Jul 17, 2018, 7:02 PM

#

if by studying you mean school and university then the answer is nah

#

just finished high school

#

ai is one of my hobbies

feral lodge Jul 17, 2018, 7:04 PM

#

@analog rampart Check out Andrew Ng's ml course on coursera, it's boss 👌 They have a chat room dedicated to it on /r/LearnMachineLearning's discord https://www.reddit.com/r/learnmachinelearning/comments/8smyod/join_study_chat_groups_for_andrew_ngs_coursera/

#

Did my bachelor's in CS, doing my master's in AI! Hoping to start a doctorate next year 🤓

daring bison Jul 17, 2018, 7:07 PM

#

I then shall wish you success in your studies

feral lodge Jul 17, 2018, 7:07 PM

#

Much appreciated 😄

lapis sequoia Jul 17, 2018, 7:09 PM

#

@daring bison that's a pretty solid hobby 😃 .
@feral lodge cool we're the same place then - I also start my doctorate next year in ML.

daring bison Jul 17, 2018, 7:09 PM

#

thanks

feral lodge Jul 17, 2018, 7:09 PM

#

Sick 👌 Any idea what your area'll be?

lapis sequoia Jul 17, 2018, 7:11 PM

#

Currently I'm working on a paper about the class imbalance problem, and maybe I'll go more into semi-supervised learning in the coming year. Not sure though. I have some great ideas for class imbalance handling, that I also consider continuing with.
But it's primarily theoretical..

daring bison Jul 17, 2018, 7:14 PM

#

I'll take this opportunity to ask you some questions. I am new when it comes to programming and I want to learn how to write the code for a simple nn (I'll go further as I progress). Could you recommend a course that explains in detail how to program a nn or cnn for any kind of usage?

daring bison Jul 17, 2018, 7:58 PM

#

why is everyone ignoring my questions : (

grave wasp Jul 17, 2018, 7:59 PM

#

hello all

#

there is anyone that knows about data science on python ?

naive hornet Jul 17, 2018, 8:00 PM

#

@daring bison
(a) most people don't have unlimited time, those that do have the time might not have the answers
(b) Discord is currently experiencing major outages

#

@grave wasp if you have a particular question people are more likely to respond if you ask it!

grave wasp Jul 17, 2018, 8:04 PM

#

im trying to solve a task that its about to compute n-grams , unigrams , probabilities.. i have some errors on my code and i want someone with experience on that

#

i dont want to give me the solution just to help me to figure it out what is the problem with my code

daring bison Jul 17, 2018, 8:05 PM

#

@naive hornet ok I see, I'll be more patient, sorry : (

velvet anchor Jul 17, 2018, 8:17 PM

#

Sorry Prom there’s only 3-4 of us that regularly answer ML stuff.

#

Ok so @daring bison as far as courses go. There’s a couple, but Andrew Ngs on Coursera is 👌 and a lot of people consider it the best

#

@grave wasp post code please. I think I saw it in another channel but I can’t find it

grave wasp Jul 17, 2018, 8:26 PM

#

for some reason hastebin is not working properly

#

alternative like hastebin?

lapis sequoia Jul 17, 2018, 8:26 PM

#

http://paste.pydis.com

#

we actually prefer you use that instead of hastebin 😄

grave wasp Jul 17, 2018, 8:27 PM

#

https://paste.pydis.com/omiguyonat.py

#

this is the code but i have to put what is my task

#

can i put it here?

lapis sequoia Jul 17, 2018, 8:28 PM

#

yep sure

grave wasp Jul 17, 2018, 8:28 PM

#

Computing Conditional Unigram Probabilities [4 points]

Now we can build the model giving us the conditional probabilities of the next token given the N-1 previous tokens. The implementation of the required class method extract_conditional_probabilities() amounts to building and storing a probability distribution over all possible following tokens for each (N-1)-gram which occurred in the corpus. The lookup structure will be stored in the value of the cond_prob instance variable. For our example with N = 2, looking up how likely the token “viele” is after “sehe” is done via model.cond_prob[("sehe",)]["viele"]. (BTW: the notation (token,) is necessary in the bigram case to enforce that a unary tuple is looked up, because (token) == token in Python).

The recommended logical structure of your implementation is as follows:

• for each N-gram contained as a key in prob
    • split the N-gram into the first N-1 tokens (the "mgram") and the final unigram
    • if the mgram is not yet a key in cond_prob, store a new dictionary under that key
    • set the value for the unigram in cond_prob[mgram] to the probability of the N-gram
• for every dictionary in the values of cond_prob
    • add up the values assigned to all unigram keys
    • divide the value under each unigram by the sum of values
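
(Editor's sketch) The recommended structure in the task statement can be written out roughly as below. This is a minimal standalone sketch, not the poster's class: the free function and the toy `prob` dictionary with its made-up probabilities are assumptions for illustration. Note that with N = 2 the mgram is a one-element tuple such as ("sehe",), which is why that kind of key is expected in cond_prob.

```python
def extract_conditional_probabilities(prob):
    """Build cond_prob[mgram][unigram] from an N-gram probability dict."""
    cond_prob = {}
    for ngram, p in prob.items():
        # Split the N-gram into the first N-1 tokens and the final token.
        mgram, unigram = ngram[:-1], ngram[-1]
        if mgram not in cond_prob:
            cond_prob[mgram] = {}
        cond_prob[mgram][unigram] = p

    # Normalise each inner dictionary so its values sum to 1.
    for distribution in cond_prob.values():
        total = sum(distribution.values())
        for unigram in distribution:
            distribution[unigram] /= total
    return cond_prob


# Hypothetical bigram probabilities (N = 2), invented for this example.
prob = {
    ("sehe", "viele"): 0.2,
    ("sehe", "keine"): 0.1,
    ("viele", "Leute"): 0.3,
}
cond_prob = extract_conditional_probabilities(prob)
print(cond_prob[("sehe",)]["viele"])
```

Keying the outer dictionary by the mgram tuple keeps the lookup shape from the task, model.cond_prob[("sehe",)]["viele"], working for any N.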

#

after all i have a key error.. i dont understand because the dictionary is not empty

#

so i splitted

#

i did this on different dictionary

feral lodge Jul 17, 2018, 8:55 PM

#

Still using my code from yesterday? In that case you'll need to change

    if mgram not in cond_prob:
        cond_prob[mgram] = {}

    p = prob[ngram]
    cond_prob[mgram][unigram] = p

to

    if unigram not in cond_prob:
        cond_prob[unigram] = {}

    p = prob[ngram]
    cond_prob[unigram][mgram] = p

Since your example usage showed this:
>>> ngram_model.cond_prob[("beobachteten",)]

rather than something like this, which my code was assuming:
>>> ngram_model.cond_prob[("hello", "my", "name", "is")]

ie, you're using unigrams as keys in cond_prob rather than mgrams

#

So if you are testing this:
>>> ngram_model.cond_prob[("beobachteten",)]

then I would expect a KeyError because ("beobachteten",) does not exist as a key in cond_prob, because we never put it there, because it's a unigram

#

Btw I'm running back and forth in uni right now, so i'm semi-afk 😄

grave wasp Jul 17, 2018, 8:58 PM

#

yes but when i print my dict show me {'alekos' : 1, 'something' :2 }

feral lodge Jul 17, 2018, 8:59 PM

#

That's after changing the stuff i showed?

grave wasp Jul 17, 2018, 9:00 PM

#

before

#

sorry im little confused because im all day on this task and my head is really heavy 😃

feral lodge Jul 17, 2018, 9:01 PM

#

Try making the change and let's see. might not fix all problems, but my previous code was definitely not compatible with your example usage

#

No stress friendo

grave wasp Jul 17, 2018, 9:15 PM

#

my mistake is on this ?

               
        self.cond_prob[i] = self.prob[key]

feral lodge Jul 17, 2018, 9:19 PM

#

That's definitely an issue yeah. What you're doing there is making cond_prob be a dictionary that looks like this:

{("alekos",) : 0.08, ("slandon",) : 0.05}

But you want to make it look like this:

{("alekos",) : {("my", "name", "is") : 0.08}, ("slandon",) : {("my", "name", "is") : 0.02}}

#

Remember the difference between a "regular" probability p(X) and a conditional probability p(X|Y).

The regular probability p("slandon") of your text will be very small -- it'll be (number of times "slandon" appears in the text) / (total number of words in the text)

The conditional probability p("slandon" | "my name is") will be much higher -- it will be (number of times "slandon" appears after the string "my name is" in the text) / (total number of words that immediately follow the string "my name is" in the text).

#

So since conditional probabilities are p(X | Y) it makes sense that we would need both the unigram X and the mgram Y to create a dictionary of conditional probabilities. What you're doing is only using X

grave wasp Jul 17, 2018, 9:26 PM

#

hmm...

feral lodge Jul 17, 2018, 9:29 PM

#

Example:

Say your text has 10000 words. The string "my name is slandon" appears once. The string "my name is alekos" appears twice. No one else says "my name is" in the text, and our names don't appear anywhere else but in those sentences.

Then:

p("slandon") = 1/10000
p("alekos") = 2/10000

p("slandon" | "my name is") = 1/3
p("alekos" | "my name is") = 2/3
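
(Editor's sketch) The numbers in this example can be checked by counting. The token list below is an assumption built to match the description: 10000 tokens, of which 12 come from the three "my name is ..." sentences and the rest are a placeholder word.

```python
from collections import Counter
from fractions import Fraction

# 9988 placeholder tokens plus the three sentences gives 10000 tokens in total.
sentences = "my name is slandon my name is alekos my name is alekos".split()
tokens = ["filler"] * 9988 + sentences

unigram_counts = Counter(tokens)

def p(word):
    """Plain probability: occurrences of word / total number of tokens."""
    return Fraction(unigram_counts[word], len(tokens))

# Count which token follows each occurrence of the context "my name is".
context = ("my", "name", "is")
n = len(context)
followers = Counter(
    tokens[i + n]
    for i in range(len(tokens) - n)
    if tuple(tokens[i:i + n]) == context
)

def p_given_context(word):
    """Conditional probability: times word follows the context / times anything follows it."""
    return Fraction(followers[word], sum(followers.values()))

print(p("slandon"), p("alekos"))
print(p_given_context("slandon"), p_given_context("alekos"))
```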

grave wasp Jul 17, 2018, 9:30 PM

#

yes exactly

feral lodge Jul 17, 2018, 9:30 PM

#

So that's the difference between your two dictionaries prob and cond_prob

#

To make a meaningful dictionary for stuff like

p("slandon" | "my name is") = 1/3
p("alekos" | "my name is") = 2/3

we need to use both the unigrams, like "slandon" and the mgrams, like "my name is"

#

But what you're doing in this snippet

  if i not in self.cond_prob:
    self.cond_prob[i] = self.prob[key]

is to only use the unigram

grave wasp Jul 17, 2018, 9:34 PM

#

i have to get rid of that?

#

if key not in self.cond_prob?

#

self.cond_prob[key] = {} ?

#

it will print something {('alekos', 'something'): {'somethin': 4.}

daring bison Jul 17, 2018, 9:36 PM

#

@feral lodge I know you're busy so I won't disturb you with my questions, when can I ask them to you?

feral lodge Jul 17, 2018, 9:43 PM

#

I didn't forget you! Like Clay said, Andrew Ng's coursera course is a really good place to start. This free book http://neuralnetworksanddeeplearning.com/ is also a wonderful intro to NNs.

After a while checking out libraries like pytorch or keras will make implementing your own nets very simple and intuitive. As you go along you'll find you almost never want to implement the networks yourself, because you'll need to implement really sophisticated algorithms for computing gradients (partial derivatives) for many, many variables. It's really only feasible, imo, to implement your own neural networks if they have max 2-3 layers, so pytorch/keras/etc are more or less necessary. Implementing your own small nets is a good exercise though. Googling "implementing small neural network python" will probably get you some good results

#

@grave wasp self.cond_prob[i] = {} I think looks better. Remember that cond_prob[key] is a probability. So your snippet would make cond_prob look like this: {0.098 : {}} which is obviously not what you want 😄

#

@daring bison If you're a beginner at programming as well as NNs, you'll definitely want to learn some basic python before you try to implement nets though 😄 Nothing stops you from learning python and NN theory at the same time however

daring bison Jul 17, 2018, 9:52 PM

#

@feral lodge i get most of the things except for the 2 most important things in programming

#

1.what is the goal of an argument?

#

2.and what does the . mean in for example : torch.optim ?

lapis sequoia Jul 17, 2018, 10:23 PM

#

do you know about classes / OOP

#

the purpose of passing arguments to functions is to do some type of computation and produce a result that is unique to the argument you pass

daring bison Jul 17, 2018, 10:28 PM

#

can you please rephrase in simpler words, I don't fully understand : (

lapis sequoia Jul 17, 2018, 10:28 PM

#

im not really sure how to explain it uhh

#

you know how in math you have functions right?

#

like x and f(x)

#

you know how when you plug a value in x you do some computation to get a result?

daring bison Jul 17, 2018, 10:29 PM

#

oh I see now

#

you plug an x into a function

#

programming is closer to math than I thought

#

thank you

lapis sequoia Jul 17, 2018, 10:30 PM

#

yeah but remember

#

functions can have multiple arguments

daring bison Jul 17, 2018, 10:31 PM

#

yeah like y= a x +b could have 2 arguments and a constant

lapis sequoia Jul 17, 2018, 10:31 PM

#

sure

#

but remember theyre not all like math related

#

also this is kinda going off topic from this channel but

#

lets say you want a function that makes all the letters in a word capitalized

#

so you can do

    def make_word_uppercase(string):
        return string.upper()

#

it just uses arguments to do something, which can be producing an output or accomplishing a task

daring bison Jul 17, 2018, 10:37 PM

#

I see

#

I was lacking a key part

#

now let's jump to the second part

#

if I say string.upper() does the dot mean it's the fct upper from the string format?

lapis sequoia Jul 17, 2018, 10:41 PM

#

its the method upper from the str class

#

you should look more into classes / object oriented programming with python

daring bison Jul 17, 2018, 10:50 PM

#

class class_name

#

def name

#

self.name = name

#

when is it always something like that?
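
(Editor's sketch) A minimal runnable version of the class skeleton typed above, together with the three common meanings of the dot: an attribute of an object, a method of an object, and a name inside a module (the same mechanism that makes torch.optim work). The Dog class and its names are invented for illustration.

```python
import math


class Dog:
    def __init__(self, name):
        # __init__ runs when Dog(...) is called; self is the new object.
        self.name = name  # store the argument on the object

    def greet(self):
        # A method is a function that lives on the class and receives self.
        return "Woof, I am " + self.name


rex = Dog("Rex")

print(rex.name)          # dot = look up an attribute on an object
print(rex.greet())       # dot = look up a method on an object, then call it
print("hello".upper())   # same thing: upper is a method of the str class
print(math.sqrt(16.0))   # dot = look up a name inside a module
```

The `self.name = name` pattern shows up whenever an object should remember a value passed in at creation time; classes that need no stored state can skip it.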

novel path Jul 18, 2018, 1:08 AM

#

hey all, I was redirected here

#

I've a df

#

and would like to convert 'Date' to POSIX int

#

Date Open Close High Low Volume
745 2018-07-17 04:00:00+00:00 6723.000000 6695.600000 6759.1 6679.3 331.857518
746 2018-07-17 08:00:00+00:00 6695.600000 6700.273918 6744.0 6667.4 294.171766
747 2018-07-17 12:00:00+00:00 6700.273918 6695.300000 6726.9 6666.0 421.905261
748 2018-07-17 16:00:00+00:00 6695.300000 7190.200000 7274.0 6695.3 2132.705123
749 2018-07-17 20:00:00+00:00 7190.200000 7310.800000 7483.9 7157.7 3500.710142

#

could someone help?

#

thx a lot in advance guys

lapis sequoia Jul 18, 2018, 1:20 AM

#

like a unix timestamp?

#

you can use datetime.datetime.strptime
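
(Editor's sketch) One way the strptime suggestion could look for timestamps in the format shown in the dataframe. The helper name to_posix is hypothetical, and the code works on plain strings rather than on a pandas column; for a whole column a vectorised pandas conversion is usually preferable.

```python
from datetime import datetime


def to_posix(text):
    """Parse 'YYYY-MM-DD HH:MM:SS+00:00' and return whole seconds since the epoch."""
    # %z accepts an offset written with a colon ("+00:00") on Python 3.7+.
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")
    return int(parsed.timestamp())


dates = ["2018-07-17 04:00:00+00:00", "2018-07-17 08:00:00+00:00"]
stamps = [to_posix(d) for d in dates]
print(stamps)
```

Because the parsed datetime is timezone-aware, timestamp() does not depend on the local timezone of the machine running the code.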

lapis sequoia Jul 18, 2018, 8:50 AM

#

somebody familiar with matplotlib

naive hornet Jul 18, 2018, 8:51 AM

#

bot.tags['ask']

arctic wedgeBOT Jul 18, 2018, 8:51 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

lapis sequoia Jul 18, 2018, 8:53 AM

#

hi
i have a question about visualizing
i wanna call a matplotlib chart
so that i get the chart how we see it
and then give it to the code
how should that be done?

#

any ideas

#

nobody....?

velvet anchor Jul 18, 2018, 9:03 AM

#

Most people familiar with matplotlib are probably asleep / working. Least here it won’t get buried like in a help channel. Prolly take a few hours before someone can take a look

lapis sequoia Jul 18, 2018, 9:07 AM

#

sad yeah

#

don't you know any?

thorn topaz Jul 18, 2018, 9:42 AM

#

metis sux

wraith frigate Jul 18, 2018, 2:57 PM

#

Have some previous python experience in uni. Any thoughts on this udemy course for some personal learning? https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Python for Data Science and Machine Learning Bootcamp

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

velvet anchor Jul 18, 2018, 5:49 PM

#

Not on that course @wraith frigate Andrew NGs course though on coursera I believe is currently one of the best as far as I know

wraith frigate Jul 18, 2018, 6:00 PM

#

@velvet anchor thanks I appreciate your input

novel path Jul 18, 2018, 7:34 PM

#

@wraith frigate , I've enrolled and have the material. PM me if you are interested

#

guys, anyone on dataframe experience?

#

df['Test'] = df['Close'].ewm(span = 24).mean()

#

for ['Close'] NaN cells, I would like to keep ['Test'] cells also NaN
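
(Editor's sketch) With default settings, ewm(...).mean() generally carries the running average across missing rows, so one possible approach is to mask the result afterwards with Series.where. The toy Close values below are made up.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; the real frame would come from the poster's source.
df = pd.DataFrame({"Close": [1.0, 2.0, np.nan, 4.0, np.nan, 6.0]})

smoothed = df["Close"].ewm(span=24).mean()

# Keep the smoothed value only where Close itself is present; NaN elsewhere.
df["Test"] = smoothed.where(df["Close"].notna())

print(df)
```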

lapis sequoia Jul 19, 2018, 4:59 AM

#

GUYS

#

IS THERE AN ANIME

#

THAT INVOLVES NEURAL NETWORKS

quiet gyro Jul 19, 2018, 5:08 AM

#

@lapis sequoia Stop spamming channels with unrelated questions

ornate folio Jul 19, 2018, 8:30 AM

#

Hey guys, if you had a tool to collect information from a blockchain. What information would you want it to give back to you?

earnest prawn Jul 19, 2018, 11:20 AM

#

I'd make a graph with all the info about it

ornate folio Jul 19, 2018, 12:20 PM

#

At the moment, I'm only collecting raw data. Just trying to get ideas on what to collect, by asking people what they would like to see

prime thistle Jul 19, 2018, 3:08 PM

#

couldnt you just collect everything

wanton pier Jul 19, 2018, 7:22 PM

#

data newbie here. Do you guys know how to do a single colorbar for contour subplots? I'm trying to visualize some ocean current data and I'm breaking it up by component (so I have a scalar to contour). The velocity variable is 2D from a netCDF file and depends on time and depth (instrument used collects data at many different binned depths). I found something on stackoverflow that is almost my exact question (https://stackoverflow.com/questions/13784201/matplotlib-2-subplots-1-colorbar) but I can't seem to get imshow to play nice with my data

Stack Overflow

Matplotlib 2 Subplots, 1 Colorbar

I've spent entirely too long researching how to get two subplots to share the same y-axis with a single colorbar shared between the two in Matplotlib.

What was happening was that when I called the
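
(Editor's sketch) A possible way to get one colorbar for several contour subplots without imshow: draw every panel with contourf using the same levels, then pass all axes to fig.colorbar. The velocity fields here are synthetic stand-ins for the netCDF data, and the variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic (time, depth) fields standing in for the two velocity components.
time = np.linspace(0.0, 24.0, 60)
depth = np.linspace(0.0, 100.0, 30)
T, D = np.meshgrid(time, depth)
u = np.sin(T / 4.0) * np.exp(-D / 50.0)
v = 0.5 * np.cos(T / 4.0) * np.exp(-D / 50.0)

# Shared levels mean the same colour is the same value in both panels.
levels = np.linspace(-1.0, 1.0, 21)

fig, axes = plt.subplots(2, 1, sharex=True)
for ax, field, title in zip(axes, (u, v), ("u component", "v component")):
    contours = ax.contourf(T, D, field, levels=levels, cmap="RdBu_r")
    ax.set_title(title)
    ax.set_ylabel("depth")
    ax.invert_yaxis()  # depth increases downwards
axes[-1].set_xlabel("time")

# One colorbar whose space is taken from both subplots.
colorbar = fig.colorbar(contours, ax=axes)
colorbar.set_label("velocity")
```

Since both panels share the same levels and colormap, the colorbar built from the last contour set is valid for both.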

ornate folio Jul 19, 2018, 10:51 PM

#

@prime thistle yep, I could. Still need to format it though

#

I am just going by ear now and refactoring

serene oar Jul 20, 2018, 4:36 PM

#

Hello!
If I was to create a research bot kind of a thing, what should I start with? The idea would be to gather papers, posts etc that mention in them for example a specific keyword from sources such as scholar.google.com or reddit etc.. At first I was under the impression that web scraping would be the thing to learn, but it seems to be a different thing.
Any insight/ feedback is much appreciated.

lapis sequoia Jul 20, 2018, 5:10 PM

#

Can't you solve this by actually searching google scholar and research gate for instance - and just be subscribed to different authors ?

serene oar Jul 20, 2018, 5:16 PM

#

Are you suggesting to not develop anything for this at all and continue manual research?

lapis sequoia Jul 20, 2018, 5:18 PM

#

So if you do a google scholar search for a keyword, you get a result - why have a bot to do that? (unless you have 1000 keywords), but then you'll get n*1000 papers

serene oar Jul 20, 2018, 5:19 PM

#

There are other sources too, would be nice to have it centralised and then the program could be improved as time goes on

lapis sequoia Jul 20, 2018, 5:22 PM

#

Well I can't really see the benefit of the program.. In the ideal scenario, you auto pull new papers containing the keywords you're interested in. Deriving if a paper is important takes a ton of time, especially compared to searching, so I can only imagine that such a bot will give you a shitload of papers that you don't have time to read..

#

So what is it that you're trying to achieve academically by this? Do you want to be right at the edge of a specific research area - looking for new icebergs to take a shot at?

serene oar Jul 20, 2018, 5:25 PM

#

I'm just looking to learn by doing something that will help me.
I'd prefer if it brought up relevant news, articles etc of current innovations on e.g. gas fuels and mof's etc. But perhaps the MVP of the idea wouldn't be of much help

lapis sequoia Jul 20, 2018, 5:39 PM

#

Are you interested in learning data science? 😃

serene oar Jul 20, 2018, 5:52 PM

#

Yes!

#

I have been doing some stuff with pandas, matplotlib, dash etc

velvet anchor Jul 20, 2018, 6:00 PM

#

Here’s the problem - by creating what you want to do now you need both access to a google search API and the ability to process keywords to find words that are close but not in your exact keyword set.

#

You’ll probably want to use NLTK for the keyword handling

serene oar Jul 20, 2018, 6:02 PM

#

Isn't the API available?

velvet anchor Jul 20, 2018, 6:06 PM

#

No. Google scholar both disallows bots in their robots.txt and disallows use of the service through anything but the interface they provide in the ToS

lapis sequoia Jul 20, 2018, 7:39 PM

#

@serene oar I would probably look at Kaggle.com to get started

velvet anchor Jul 20, 2018, 7:40 PM

#

Individual journal publishers may have APIs you can use however

serene oar Jul 20, 2018, 7:45 PM

#

Alright. Thank you 😃

wanton pier Jul 20, 2018, 7:52 PM

#

Do any of you guys know how to extract the slope at arbitrary points from contour lines from a generated contour plot? I'm trying to align some data along directions defined by a constant water depth (isobaths). If I can get the slope of the contour at any point, I can calculate the angle it makes with the x axis and project my data onto the resultant vector.

velvet anchor Jul 20, 2018, 7:55 PM

#

That’s just the derivative right?

wanton pier Jul 20, 2018, 7:56 PM

#

yes

velvet anchor Jul 20, 2018, 7:56 PM

#

Like take first derivative then plug in your point. If you have the equations.

wanton pier Jul 20, 2018, 7:57 PM

#

I dont unfortunately. I only have a data field that represents the bathymetry, so I'd have to calculate the contours to find the direction of the bathymetry.
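
(Editor's sketch) If only a gridded depth field is available, one option is to skip extracting contour lines altogether: an isobath is perpendicular to the depth gradient, so its direction at every grid point follows from numpy.gradient. The depth field below is synthetic (a tilted plane whose isobaths have slope -2), and the current components u and v are made-up values.

```python
import numpy as np

# Synthetic bathymetry on a regular grid: depth = 2x + y.
x = np.linspace(0.0, 10.0, 51)
y = np.linspace(0.0, 5.0, 26)
X, Y = np.meshgrid(x, y)          # arrays of shape (len(y), len(x))
depth = 2.0 * X + Y

# np.gradient returns derivatives in axis order: d/dy first, then d/dx.
d_dy, d_dx = np.gradient(depth, y, x)

# Rotate the gradient by 90 degrees to get a vector along the isobath.
tangent_x, tangent_y = -d_dy, d_dx
angle = np.arctan2(tangent_y, tangent_x)   # angle with the x axis, in radians
slope = tangent_y / tangent_x              # undefined where the isobath is vertical

# Project a (made-up) current vector onto the along-isobath direction.
u, v = 0.3, 0.1
norm = np.hypot(tangent_x, tangent_y)
along_isobath = (u * tangent_x + v * tangent_y) / norm

print(np.degrees(angle[0, 0]), slope[0, 0])
```

On real, noisy bathymetry the gradient may need smoothing first, and flat regions where the gradient vanishes have no well-defined isobath direction.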

velvet anchor Jul 20, 2018, 7:57 PM

#

Hm. There’s probably a way to do that with a function call but I’m not sure.

#

I’ll have to research when I get home

#

I found this @wanton pier https://skemman.is/bitstream/1946/16233/1/final_processingwithpython_dillon.pdf

#

But I haven’t read through it. It looks promising though

wanton pier Jul 20, 2018, 8:06 PM

#

Thanks @velvet anchor I'll read through it

steep herald Jul 20, 2018, 11:10 PM

#

Hi Guys

I have a dataframe

which , in column form, looks like

Date Store Question
20180701 Store A Q1
20180701 Store A Q2
20180701 Store A Q3

20180702 Store B Q1
20180702 Store B Q2
20180702 Store B Q3

etc

I've written a pretty lengthy app that parses the info as needed
but im trying to expand my pandas knowledge so ive rebuilt it pretty neatly.

however I'm stuck on how to get unique data out with the new sleeker version of the code.

essentially i want to use df to get

Date Store Q1 Q2 Q3

with the store being unique

I've looked at pandas melt, groupby, unique, stack

but i can't seem to pull it off in the format I want.

Any advice?

#

Currently im thinking to concatenate , Date + Storename + OwnerName + Telephone to get a new column, 'id' , which is essentially a uniqueness checker?

Then use groupby ? to join all rows with the same 'id' so that i can get one row per unique entry?

velvet anchor Jul 20, 2018, 11:19 PM

#

Ok so

#

You want to extract all rows with a given store name?

#

Is that correct?

steep herald Jul 20, 2018, 11:22 PM

#

eh - i might be wording it wrong

i essentially just want to get to

20180701 Store A Q1 Q2 Q3
20180702 Store B Q1 Q2 Q3

velvet anchor Jul 20, 2018, 11:24 PM

#

Admittedly my pandas knowledge only comes from helping others but

#

Is GroupBy() what you want?

#

If not I’ll load up PyCharm and mess around some. Just got home

steep herald Jul 20, 2018, 11:26 PM

#

I'm going to test if Groupby works with the idea I had uptop Date + Storename + OwnerName + Telephone to get a new column, 'id' , which is essentially a uniqueness checker?

but yeah - it seems to maybe be what i need

if there was an easier way to do this that would be great.

i should probably give a better snippet of what the data looks like because i realize i might just run into a different problem

#

columns: ['SurveyGUID', 'SQNo', 'Data Type', 'Date', 'Description', 'Caption_1', 'Value_1', 'Caption_2', 'Value_2', 'Caption_3', 'Value_3', 'Caption_4', 'Value_4', 'Caption_5', 'Value_5', 'Question', 'Answer', 'Report_Criteria_1', 'Report_Criteria_2', 'Report_Criteria_3', 'Report_Criteria_4', 'Report_Criteria_5', 'Report_Criteria_6', 'Report_Criteria_7', 'GPS Longitude', 'GPS Latitude', 'Clerk', 'Checkbox_Option_1', 'Checkbox_Value_1', 'Checkbox_Option_2', 'Checkbox_Value_2', 'Checkbox_Option_3', 'Checkbox_Value_3', 'Checkbox_Option_4', 'Checkbox_Value_4', 'Checkbox_Option_5', 'Checkbox_Value_5', 'Checkbox_Option_6', 'Checkbox_Value_6']

I extract data from a less than ideal database application ( im going to build my own soon, to replace the current one, with proper normalization )

Whats problematic with group by is the Answers

I'm trying to arrange the multiple rows of each store into one row per store ( but I have to ensure the store isn't a duplicate )

however some Questions have their Answers in the very next column.
but other Questions have their Answers in Checkbox Form

velvet anchor Jul 20, 2018, 11:29 PM

#

Hmm

#

Maybe intersect?

#

It seems kinda similar to what they’re asking here

#

https://datascience.stackexchange.com/questions/14817/pandas-get-feature-values-which-appear-in-two-distinct-dataframes

Data Science Stack Exchange

Pandas - Get feature values which appear in two distinct dataframes

I have a Pandas DataFrame structured like this:

user_id movie_id    rating

0 1 1193 5
1 2 1193 5
2 12 1193 4
3 15 1193 4
4 17 1...

#

Different idea but kind of? Similar concept

steep herald Jul 20, 2018, 11:32 PM

#

breaks my brain trying to visualize that into what i need 😛

#

I dont think its far off however

velvet anchor Jul 20, 2018, 11:34 PM

#

Pandas can do basically everything but it’s so difficult sometimes

steep herald Jul 20, 2018, 11:45 PM

#

it can be quite frustrating to learn new things.

using stuff im used to however makes things a breeze usually. just new territory thats horrid

lilac shadow Jul 20, 2018, 11:46 PM

#

i've never used pandas but apparently it has a steep learning curve.

velvet anchor Jul 20, 2018, 11:47 PM

#

thats a bit of an understatement

#

but it is one of the most powerful data libraries

#

@feral lodge knows a bit about data frames I believe, not sure when he'll be around to see it though

steep herald Jul 20, 2018, 11:49 PM

#

hmm - im definitely going to pop back in here - i didnt realize Python Discord had a datascience section.

velvet anchor Jul 20, 2018, 11:49 PM

#

Yeah

#

its pretty sparse, theres only 3-4 people who regularly help but we normally don't give up xd

#

I'd look at intersect though

#

I'm not sure how to implement it for your df but I know that's the key

lilac shadow Jul 20, 2018, 11:50 PM

#

yeah, the people who can help here usually help a fuckton

steep herald Jul 20, 2018, 11:52 PM

#

im busy implementing the concatenation + groupby still -
if i can get everything into the same row ,

i can generate a checker that differentiates between Traditional Single Answer Type Questions and Checkbox Answer Type Questions.

then getting the results in the proper format shouldn't be hard.

if it fails il re-evaluate with intersect

#

all in all - if the database was just built correctly - i would not have these issues. 😛

feral lodge Jul 20, 2018, 11:57 PM

#

This too dirty?


import pandas as pd

# Recreate important part of dataframe
d = {'Store' : ["StoreA"]*3 + ["StoreB"]*3,
     'Question' : [f"q{j}{i}" for j in 'AB' for i in [1,2,3]]}
df = pd.DataFrame(data=d)

print(df)  # have a look
print("------")

# hop(s) returns pd.Series([s, s+3, s+6, ...]) with every value below nrow
nrow = df.shape[0]
hop = lambda start : pd.Series(range(start, nrow, 3))

# Drop questions. Keep only first row for each Store
df_new = df.drop('Question', axis=1)
df_new = df_new.loc[hop(0)]

# Extract questions
df_new = df_new.assign(Q1 = df.Question[hop(0)].copy().values)
df_new = df_new.assign(Q2 = df.Question[hop(1)].copy().values)
df_new = df_new.assign(Q3 = df.Question[hop(2)].copy().values)

# Update indices from [0,3] to [0,1] 
df_new = df_new.reset_index(drop=True) 

print(df_new)

#

Prints this

    Store Question
0  StoreA      qA1
1  StoreA      qA2
2  StoreA      qA3
3  StoreB      qB1
4  StoreB      qB2
5  StoreB      qB3
------
    Store   Q1   Q2   Q3
0  StoreA  qA1  qA2  qA3
1  StoreB  qB1  qB2  qB3
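
(Editor's sketch) The approach above relies on every store having exactly three rows in a fixed order. A variant that does not assume a fixed stride is to number the questions inside each (Date, Store) group and then unstack. The frame below recreates the simplified example from the question, and the helper column name "slot" is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["20180701"] * 3 + ["20180702"] * 3,
    "Store": ["Store A"] * 3 + ["Store B"] * 3,
    "Question": ["qA1", "qA2", "qA3", "qB1", "qB2", "qB3"],
})

# Label each row Q1, Q2, ... by its position within its (Date, Store) group.
position = df.groupby(["Date", "Store"]).cumcount() + 1
df["slot"] = "Q" + position.astype(str)

# Move the slot labels into columns: one row per (Date, Store).
wide = (
    df.set_index(["Date", "Store", "slot"])["Question"]
      .unstack("slot")
      .reset_index()
)
wide.columns.name = None

print(wide)
```

Groups with fewer questions simply get NaN in the missing Q columns, so stores with a different number of rows do not break the reshaping.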

velvet anchor Jul 20, 2018, 11:58 PM

#

slandon you're a fucking wizard honestly

feral lodge Jul 21, 2018, 12:00 AM

#

My secret is this magic elixir

📎 IMG_20180721_015905876.jpg

steep herald Jul 21, 2018, 12:05 AM

#

thanks so much - my swampy brain is going to take a while to understand why you did what you did

I'm going to attempt to MacGyver your code into my own and then add the necessary steps for the results i need

but thanks ! its definitely going to help

feral lodge Jul 21, 2018, 12:06 AM

#

My pleasure friendo, hope it helps

#

Do ask if anything is unclear 👌

steep herald Jul 21, 2018, 12:07 AM

#

will do - nice tunes btw

feral lodge Jul 21, 2018, 12:07 AM

#

Cheers 😄 🎶

#

It's the spotify radio for the album Polish Spirit 👌

velvet anchor Jul 21, 2018, 7:31 AM

#

Andrew Ng's course has a new enrollment on the 23rd btw for anyone who's interested

lapis sequoia Jul 21, 2018, 8:25 AM

#

Are you guys part of other data-science'y discords, that is not only for Python ❤ ❤ ❤

#

i am @lapis sequoia

#

also is this diagram accurate for a simple CNN?

#

do you want an invite to the server or something?

#

also your drawing might be a convolutional, but it's a little hard to see in that notation what you're convolving.

#

@lapis sequoia yeah sure

lapis sequoia Jul 21, 2018, 2:35 PM

#

oh nvm i realized it isnt

strange radish Jul 21, 2018, 2:51 PM

#

Hi everyone. I was looking for a tool that'd help me visualize decision boundaries in 2D and 3D.

#

I know Matplotlib exists, but is there an easy interface to it that'll let me do this?

#

(I can code many complex algos but when it comes to visualizing stuff I'm kind of useless)

lapis sequoia Jul 21, 2018, 5:59 PM

#

Can someone tell me a easy-to-use machine learning lib

placid snow Jul 21, 2018, 6:03 PM

#

Sklearn? http://scikit-learn.org/

lapis sequoia Jul 21, 2018, 6:04 PM

#

Thank you

velvet anchor Jul 21, 2018, 9:09 PM

#

Sklearn Keras tensorflow and pytorch are the relevant ones @lapis sequoia they all have pros and cons

#

Keras and sklearn are probably the two easiest

steel glen Jul 22, 2018, 10:56 AM

#

@lapis sequoia Keras is da best

#

with just few lines you get a ANN,CNN,RNN etc

#

With Tensorflow support

dense oak Jul 22, 2018, 1:35 PM

#

Good morning all, I am trying to finish a degree program and ended up in a Data Mining class as my only option...I have not coded in, let's go with, "a long time" and I am really struggling, with even simple things. Wondering if anyone has some burnable time today to help walk me through some stuff.

minor whale Jul 22, 2018, 4:30 PM

#

please tell me good resources to learn ML with python!!!

placid snow Jul 22, 2018, 4:40 PM

#

@minor whale @wind wasp you could have a go at https://developers.google.com/machine-learning/crash-course/

Google Developers

Machine Learning Crash Course | Google Developers

An intensive, practical 20-hour introduction to machine learning fundamentals, with companion TensorFlow exercises.

minor whale Jul 22, 2018, 4:40 PM

#

@placid snow thanks does it have assignments, exercises?

placid snow Jul 22, 2018, 4:41 PM

#

Yeah

#

A few ones in between sessions

minor whale Jul 22, 2018, 4:41 PM

#

okay

daring bison Jul 22, 2018, 5:07 PM

#

@feral lodge do you think it's possible to make an android with the current knowledge we have?

#

like for the walking part we take an ars algorithm

#

for the eyes just a cnn

#

for the language we use a chatbot

#

and etc

subtle idol Jul 22, 2018, 5:55 PM

#

i am trying to import my mnist data but when I try it

📎 Screen_Shot_2018-07-22_at_1.55.00_PM.png

#

it gives me that error

#

even tho that file exists

#

📎 Screen_Shot_2018-07-22_at_1.55.44_PM.png

daring bison Jul 22, 2018, 5:59 PM

#

correctly parented ?

#

format of the file?

subtle idol Jul 22, 2018, 6:08 PM

#

it is correctly parented

#

and it is a zip file

daring bison Jul 22, 2018, 6:12 PM

#

o_0

steel glen Jul 22, 2018, 6:55 PM

#

@daring bison Things you've said are of course possible but consciousness is decade(s) away

daring bison Jul 22, 2018, 6:56 PM

#

well at least we have the basics : )

steel glen Jul 22, 2018, 6:57 PM

#

@subtle idol U sure it's on right dir?

velvet anchor Jul 22, 2018, 7:54 PM

#

@minor whale not strictly python related but Andrew Ngs on coursera is the best ML course.

#

@subtle idol try unzipping it maybe and then loading the pickle file directly

daring bison Jul 22, 2018, 8:38 PM

#

to unzip you write "unsqueeze. "?

velvet anchor Jul 22, 2018, 8:40 PM

#

I meant like uncompressing the gz part so you have just the .pkl
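
(Editor's sketch) For a gzip-compressed pickle, decompressing by hand is optional: gzip.open can feed pickle.load directly. The example writes a tiny stand-in file first so it runs on its own; the file name and its contents are made up and are not the real MNIST data.

```python
import gzip
import os
import pickle
import tempfile


def load_pickle_gz(path):
    """Load a pickled object from a gzip-compressed file."""
    with gzip.open(path, "rb") as f:
        # Pickles written by Python 2 may need pickle.load(f, encoding="latin1").
        return pickle.load(f)


# Write a small stand-in file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy.pkl.gz")
with gzip.open(path, "wb") as f:
    pickle.dump({"images": [[0, 1], [1, 0]], "labels": [3, 7]}, f)

data = load_pickle_gz(path)
print(data["labels"])
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.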

minor whale Jul 23, 2018, 7:03 AM

#

@velvet anchor it's only the mathematical basis of ml?

velvet anchor Jul 23, 2018, 7:03 AM

#

Yeah but thats the important part really

#

the rest kinda transfers over into any language because most of the popular libraries have ports

small ore Jul 23, 2018, 9:18 AM

#

@minor whale Not much mathematical background required for Andrew Ngs course. It has programming exercises but uses the much easier higher level Octave/Matlab scripts. You can of course implement the same exercises in python with some numpy and scipy.

#

For a course with thorough mathematical approach grab the one pinned by Rags

minor whale Jul 23, 2018, 10:51 AM

#

@small ore where do i find that course?

placid snow Jul 23, 2018, 10:52 AM

#

Pinned messages upper right 📌

minor whale Jul 23, 2018, 11:27 AM

#

@placid snow ah saw it thanks

lapis sequoia Jul 23, 2018, 11:53 AM

#

Anyone good with pandas profiling

#

Actually this might just be a returning issue

#

https://stackoverflow.com/questions/51477800/how-do-i-display-my-html-file-on-my-page-in-flask

Stack Overflow

How do I display my html file on my page in Flask

I'm trying to render my report to the page however the html code is showing not the actual object, how do I display the object.

profile = pp.ProfileReport(df, check_correlation=False)

return

subtle idol Jul 23, 2018, 1:48 PM

#

@steel glen it is in the right dir

#

@velvet anchor I unzipped the file how do i load it directly?

subtle idol Jul 23, 2018, 3:32 PM

#

Nevermind I got it to work

copper swan Jul 23, 2018, 8:55 PM

#

any guide to get started with machine learning?

placid snow Jul 23, 2018, 8:56 PM

#

https://developers.google.com/machine-learning/crash-course/ May be what you're looking for

Google Developers

Machine Learning Crash Course | Google Developers

An intensive, practical 20-hour introduction to machine learning fundamentals, with companion TensorFlow exercises.

lapis sequoia Jul 23, 2018, 8:56 PM

#

well idk, i haven't heard too many good things about tensorflow

placid snow Jul 23, 2018, 8:57 PM

#

I used that course to pass my classes at least. If it works for everyone, that's another tale

lapis sequoia Jul 23, 2018, 8:57 PM

#

oh

#

idk i haven't tried it myself but from what i've heard you have to do things a particular way compared to other ML libraries likes keras

#

also i'm unexperienced too lol

velvet anchor Jul 23, 2018, 8:58 PM

#

Andrew Ngs course on coursera @copper swan

placid snow Jul 23, 2018, 8:59 PM

#

Else there's http://scikit-learn.org/stable/tutorial/index.html which is a machine learning library built on top of scipy afaik

velvet anchor Jul 23, 2018, 8:59 PM

#

scikit learn, keras, tensorflow, pytorch are the big 4

#

tensorflow is fine its just less abstracted

lapis sequoia Jul 23, 2018, 9:00 PM

#

oh

velvet anchor Jul 23, 2018, 9:01 PM

#

whereas with Keras you can like

#

throw numpy.arrays() at it

copper swan Jul 23, 2018, 9:11 PM

#

@velvet anchor and what language does that course on coursera uses?

velvet anchor Jul 23, 2018, 9:11 PM

#

You can use any

#

its a ML course not really a language specific one

#

(disclaimer: i haven't taken it, I am soon though, I just know its the most recommended one and its more on the backbone of ML than implementing it)

copper swan Jul 23, 2018, 9:12 PM

#

oh okay. but i suppose we need to learn about python libraries to program too right?

velvet anchor Jul 23, 2018, 9:12 PM

#

Yeah but if you know what you're looking for it becomes much easier to transfer between languages

copper swan Jul 23, 2018, 9:13 PM

#

oh okay

#

thanks man

rapid pawn Jul 24, 2018, 12:28 AM

#

is anyone familiar with python spline interpolation?

so i wrote something to do natural cubic spline interpolation
and the yellow points should all be connected
yet i can't find where i went wrong

#

📎 unknown.png

tulip cosmos Jul 24, 2018, 1:03 AM

#

Hey there, just came by to ask if any of you guys are bioinformaticians?

#

I'm thinking about switching from my biotechnology m.sc. to bioinformatics m.sc. and am still undecided

#

if so, do you have any pros/cons for me?

#

i'm kind of tired of the lab work (which is part of bioinformatics too) but I'm sure programming is more of my topic since I prefer math/statistics over biochemistry

#

would appreciate it a lot

misty sonnet Jul 24, 2018, 6:53 AM

#

@tulip cosmos if you enjoy one more than the other, that's pretty important, take that into account.

hearty hazel Jul 24, 2018, 9:13 AM

#

Numpy 1.15.0 is out and contains a number of breaking changes! https://github.com/numpy/numpy/releases/tag/v1.15.0

GitHub

numpy/numpy

numpy - Numpy main repository

small ore Jul 24, 2018, 9:16 AM

#

breaking as in it would break old code that uses numpy?

#

Or just a short for 'Ground breaking'?

hearty hazel Jul 24, 2018, 9:18 AM

#

It'd break old code, of course

small ore Jul 24, 2018, 9:19 AM

#

Hope it aint too much. Coz most good tutorials will become obsolete

lapis sequoia Jul 24, 2018, 9:24 AM

#

Well, array indexing is changing. It will be quite easy to fix, but this may require changes in many simple tutorials.
The hardest point for me will be to adapt my small brain to not use the old syntax.
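
The message above doesn't say which indexing change is meant. One indexing-related item in the 1.15.0 release notes is the deprecation of multidimensional indexing with a non-tuple sequence; assuming that is the one, here is a minimal sketch of the adjustment (the array and variable names are made up for illustration):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# An index assembled programmatically as a *list* of slices and ints.
index_parts = [slice(None), 0]

# Older code wrote arr[index_parts] and relied on NumPy treating the list
# like a tuple. NumPy 1.15 deprecates that, so convert explicitly:
first_column = arr[tuple(index_parts)]  # equivalent to arr[:, 0]

# A plain list of integers is still ordinary fancy indexing and is unaffected.
picked_rows = arr[[0, 2]]
```

Code that only ever writes indices literally, like `arr[:, 0]`, should not need touching under this assumption.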

tulip cosmos Jul 24, 2018, 10:58 AM

#

@misty sonnet I know that, but the risk is high since I'm new to programming. My M.sc. i'm doing right now is pretty safe and would be finished in one year. If I switch it will take me 3 years and I don't even know if I'm capable of doing this.. other things such as girlfriend (it is 2h apart from my home) and losing the right to continue my current master are also points that make me insecure in that regard.

#

Worst case scenario is that I switch and I'm not smart enough/experienced enough to finish, it could even be that bioinformatics isn't the right thing for me, that's why I was asking

misty sonnet Jul 24, 2018, 11:22 AM

#

@tulip cosmos looks like a lot of negatives to me

#

No one can tell you whether you will get coding quickly or not

tulip cosmos Jul 24, 2018, 12:07 PM

#

don't know honestly, it's 3 years vs. the rest of my life which makes it an important decision

small ore Jul 24, 2018, 12:22 PM

#

If you already know some coding and when you look at your course syllabus and perhaps browse what kind of algorithm/math you have to implement and if you feel confident about it, only then go about it

lean ledge Jul 24, 2018, 12:22 PM

#

you talk as if you cant do bioinformatics with a biotechnology degree

#

its very very easy to get experience with coding

#

pretty simple to teach yourself

small ore Jul 24, 2018, 12:23 PM

#

Unrelated:

```
NumPy 1.16 will drop support for Python 3.4.
NumPy 1.17 will drop support for Python 2.7.
```

tulip cosmos Jul 24, 2018, 12:26 PM

#

everyone says python is simple, but getting into machine learning etc can be a real pain cant it?

lean ledge Jul 24, 2018, 12:27 PM

#

a bit. if you're comfortable with maths/stats, its not too bad

tulip cosmos Jul 24, 2018, 12:27 PM

#

I really find this stuff interesting but if not in university I don't really know where to learn all this stuff apart from youtube celebrities

#

i'm decent I guess, i'm more of an average guy

lean ledge Jul 24, 2018, 12:27 PM

#

edX/Coursera courses, online resources, books, etc

#

lots of stuff

tulip cosmos Jul 24, 2018, 12:28 PM

#

yea, I know theres lots of literature

#

it's just super hard to start from 0 basically

#

(I guess)

lean ledge Jul 24, 2018, 12:28 PM

#

eh, its aight

#

try Andrew Ng and Introduction to Statistical Learning

tulip cosmos Jul 24, 2018, 12:29 PM

#

you talking about andrew Ngs books or does he have online courses?

#

I'll have to look it up, thanks a lot

lean ledge Jul 24, 2018, 12:30 PM

#

online courses

#

both the coursera course and slightly better CS229 content

tulip cosmos Jul 24, 2018, 12:31 PM

#

ok 👌🏻

#

luckily the university I'm applying for has video records of some of their courses so I can learn this way too

#

a bit unrelated but what's the best IDE for python in your opinion?

misty sonnet Jul 24, 2018, 12:41 PM

#

Pycharm 100%

#

Nothing else compares

#

@tulip cosmos

tulip cosmos Jul 24, 2018, 12:41 PM

#

cool

#

thanks

small ore Jul 24, 2018, 12:57 PM

#

Rags, can you also pin the Andrew Ng course link?

chilly crest Jul 24, 2018, 2:47 PM

#

@small ore the pinned messages reveal a link for this course https://courses.edx.org/courses/course-v1:ColumbiaX+CSMM.102x+1T2017/course/ which is purported by Rags to be superior to the Andrew Ng course

lapis sequoia Jul 24, 2018, 5:34 PM

#

is it actually superior?

feral lodge Jul 24, 2018, 7:31 PM

#

@rapid pawn could you post your code for the splines?

rapid pawn Jul 24, 2018, 7:32 PM

#

@feral lodge i found the problem thank you man 😃

feral lodge Jul 24, 2018, 7:33 PM

#

Oh good job! What was the issue if you don't mind? Never saw splines looking like that before

rapid pawn Jul 24, 2018, 7:34 PM

#

it was one of my coefficient formulas

#

namely the c coefficient

#

but it was rather strange since when i approx with less than 5 data points my c coefficient was correct

#

but once i passed that threshold c began to get weird values lol

feral lodge Jul 24, 2018, 7:38 PM

#

Weird 🤔 good job finding it 👌

rapid pawn Jul 24, 2018, 7:39 PM

#

lol it was weird indeed, which is why i couldn't find it at first
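
rapid pawn's code was never posted, so the actual bug can't be reconstructed here. For reference, a minimal sketch of one textbook way to get natural cubic spline coefficients, where the c terms come from a tridiagonal system solved by forward elimination and back substitution. The function names are hypothetical and the knots are assumed to be strictly increasing.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return lists (a, b, c, d), one entry per interval, such that on
    [xs[j], xs[j+1]] the spline is a + b*dx + c*dx**2 + d*dx**3, dx = x - xs[j]."""
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two knots and matching lengths")
    a = list(ys)
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 / h[i] * (a[i + 1] - a[i]) - 3.0 / h[i - 1] * (a[i] - a[i - 1])

    # Forward elimination. The natural boundary conditions fix c[0] = c[n] = 0.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution for c, then b and d follow from c.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return a[:n], b, c[:n], d


def evaluate_spline(xs, coeffs, x):
    """Evaluate the spline at x, clamping to the first/last interval."""
    a, b, c, d = coeffs
    j = max(0, min(bisect.bisect_right(xs, x) - 1, len(a) - 1))
    dx = x - xs[j]
    return a[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3
```

A quick sanity check for any hand-written version is that each cubic piece, evaluated at the right end of its interval, lands on the next data point; if that fails only beyond a certain number of points, the c system is a likely suspect.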

wanton pier Jul 24, 2018, 11:19 PM

#

Have any of you guys used scipy's KD tree for nearest neighbor searches? I have two large datasets (call them A and B) and I need to find for each point in A the corresponding closest point in B. A and B are spatial data (so x and y coordinates) and are not the same size. A is rectangular, B is not. I found this on stackoverflow (https://stackoverflow.com/questions/10818546/finding-index-of-nearest-point-in-numpy-arrays-of-x-and-y-coordinates) and OP answers his own question using a KDtree. I unfortunately don't at all get what's going on with his code, so if any of you guys have experience with this and would help me understand, I'd be very grateful lol

Stack Overflow

Finding index of nearest point in numpy arrays of x and y coordinates

I have two 2d numpy arrays: x_array contains positional information in the x-direction, y_array contains positions in the y-direction.

I then have a long list of x,y points.

For each point in the...

small ore Jul 25, 2018, 5:17 AM

#

@chilly crest I am just going to believe the edx course pinned by Rags is superior coz I am myself unable to assess it as it requires a superior intellect like Rags to straight away understand that math without also reading and revising math elsewhere. I therefore prefer Andrew Ng's course on Coursera

lapis sequoia Jul 25, 2018, 8:14 AM

#

Anyone good with pandas /pandas profiling? I keep getting this zero division error: float division by zero

grim elk Jul 25, 2018, 9:18 AM

#

Im looking for ELK channels

hearty heron Jul 25, 2018, 9:53 AM

#

does anyone have any experience with Surprise?

#

I want to know how I can implement my own similarity measures in it

#

as it doesn't have an implementation for Jaccard similarity

#

I could use MSD

dreamy tartan Jul 25, 2018, 12:09 PM

#

Which algorithms are recommended for predicting customer churn? I'm using the Xgboost classifier, is it good?
I tried MLPClassifier from sklearn, and the Xgboost classifier. Xgboost gets better accuracy than MLPClassifier.

feral lodge Jul 25, 2018, 12:18 PM

#

@hearty heron https://github.com/NicolasHug/Surprise/blob/master/surprise/similarities.pyx I'd have a look at their similarities module to see how their methods are implemented! The cython stuff might look a bit off putting if you're not used to that, but otherwise it seems straight forward to implement new measures in the same way they did

hearty heron Jul 25, 2018, 1:36 PM

#

Thanks, I'll take a look. Yeah haven't done cython before, but I'm sure it won't be too hard to figure out with a background in C

#

thanks for the pointer in the right direction @feral lodge 👍
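
This is not Surprise's API, just a plain-Python sketch of what a Jaccard similarity between two users would compute if each user is reduced to the set of items they rated. The function names and the `ratings` layout are assumptions for illustration; wiring it into Surprise would still mean following the pattern in the linked `similarities.pyx`.

```python
def jaccard(items_u, items_v):
    """Jaccard similarity of two collections of item ids: |U & V| / |U | V|."""
    u, v = set(items_u), set(items_v)
    union = u | v
    if not union:
        return 0.0  # convention chosen here for two empty profiles
    return len(u & v) / len(union)


def jaccard_matrix(ratings):
    """ratings: dict mapping user id -> dict of {item id: rating}.

    Only *which* items were rated matters; the rating values are ignored.
    """
    users = sorted(ratings)
    return {(u, v): jaccard(ratings[u], ratings[v]) for u in users for v in users}


ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item2": 1, "item3": 2, "item4": 5},
}
similarities = jaccard_matrix(ratings)
```

Because the rating values are discarded, this behaves quite differently from MSD, which compares the ratings themselves on the items two users share.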

serene oar Jul 25, 2018, 5:09 PM

#

Are there any graphing libs such as matplotlib that allow the graph to be modified (style, index, axes) when the plot is already opened?
Thinking of it in regards to a program that reads and plots different CSV's but some might want to graph the data differently.

velvet anchor Jul 25, 2018, 5:13 PM

#

Matplotlib has a live option

#

But I’m not positive that’s exactly what you’re looking for

serene oar Jul 25, 2018, 5:14 PM

#

I will check it out, thanks

chilly crest Jul 25, 2018, 7:34 PM

#

lol @small ore, I'd like to compare the difference at some point.

feral lodge Jul 25, 2018, 8:15 PM

#

@wanton pier hiho buddy, did you figure it out? This guy https://stackoverflow.com/a/32781737 in the same thread you linked gave a nice example
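
For the A/B question above, a minimal sketch of the KD-tree approach with `scipy.spatial.cKDTree`, assuming both datasets can be arranged as `(n_points, 2)` arrays of x/y pairs. The random data and variable names are stand-ins, not wanton pier's data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
A = rng.uniform(0, 10, size=(1000, 2))  # query points, one (x, y) per row
B = rng.uniform(0, 10, size=(350, 2))   # points to search in

# Build the tree once on B, then ask for the single nearest neighbour
# (k=1 is the default) of every row in A.
tree = cKDTree(B)
distances, nearest_idx = tree.query(A)

# nearest_idx[i] is the row of B closest to A[i].
nearest_points = B[nearest_idx]
```

If A starts out as two 2D grids of x and y values, as in the linked question, they would first need flattening into that two-column shape, for example with `np.column_stack([x_grid.ravel(), y_grid.ravel()])`.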

serene oar Jul 26, 2018, 12:40 PM

#

Dear plotting gurus.

df = pd.read_csv(str)
sharedax = df.ix[:, 0]
a1 = df.ix[:, 1]
a2 = df.ix[:, 2]
print(a1, a2)
ax1 = plt.subplot(211, sharex=sharedax)
plt.plot(sharedax, a1)

ax2 = plt.subplot(212, sharex=sharedax)
plt.plot(sharedax, a2)
plt.show()

TypeError: 'Series' objects are mutable, thus they cannot be hashed.

I can not find the error.
a1 and a2 print out perfectly, but I can't plot them for some reason.
When I'd plot the df before splitting them into subplots it works fine too.
Is there a workaround here or did I approach it from the wrong angle in the first place?

earnest prawn Jul 26, 2018, 1:16 PM

#

in which line does the error happen @serene oar

serene oar Jul 26, 2018, 2:41 PM

#

when starting to plot, at ax1 =

feral lodge Jul 26, 2018, 2:43 PM

#

https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html

sharex is a bool, no? Not a sequence

serene oar Jul 26, 2018, 2:44 PM

#

Took inspiration from this

📎 unknown.png

#

They use sharex=CustomAxisCreatedBefore

#

Oh, removing the sharex thing makes the window pop up, but it's also not responding..

feral lodge Jul 26, 2018, 2:50 PM

#

df = pd.read_csv(str)
x = df.ix[:, 0]
a1 = df.ix[:, 1]
a2 = df.ix[:, 2]

ax1 = plt.subplot(211)
plt.plot(x, a1)

ax2 = plt.subplot(212, sharex=ax1)
plt.plot(x, a2)

plt.show()

How about this?
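
On the `sharex` confusion: `plt.subplot(..., sharex=...)` expects an existing Axes object, while `plt.subplots(..., sharex=...)` (the page linked above) takes a bool or one of `'none'`, `'all'`, `'row'`, `'col'`. Passing a pandas Series is most likely what triggered the hashing TypeError. A minimal sketch of the `plt.subplots` variant, with made-up data in place of the CSV columns:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for a window
import matplotlib.pyplot as plt

# Stand-in data; in the thread these come from the CSV columns.
x = [0, 1, 2, 3, 4]
a1 = [1.0, 3.0, 2.0, 5.0, 4.0]
a2 = [10.0, 8.0, 9.0, 6.0, 7.0]

# Two stacked axes that share their x axis.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(x, a1)
ax2.plot(x, a2)

# Changing the limits on one axis carries over to the other.
ax1.set_xlim(1, 3)
shared_limits = ax2.get_xlim()
```

With an interactive backend, `plt.show()` at the end would open the window.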

serene oar Jul 26, 2018, 2:52 PM

#

opens, but crashes. not responding

feral lodge Jul 26, 2018, 2:56 PM

#

And still crashes when removing all mentions of sharex?

serene oar Jul 26, 2018, 3:01 PM

#

Yep

#

Other functions in the same class run well. E.g. plotting right after reading as csv.

#

Looks like it depends on the file that's being read.
A simple, self-made one runs, whereas a downloaded one doesn't.
Both have 3 lines of data.

feral lodge Jul 26, 2018, 3:19 PM

#

Three columns yeah? Try just using a few rows of the downloaded data, like this:

x = df.ix[3:10, 0]
a1 = df.ix[3:10, 1]
a2 = df.ix[3:10, 2]

Maybe there's some trash in the beginning or end of the file

serene oar Jul 26, 2018, 3:27 PM

#

That works

feral lodge Jul 26, 2018, 3:28 PM

#

Then I'd try using an increasingly larger range of rows until it stops working 😄

serene oar Jul 26, 2018, 3:28 PM

#

It's weird, cause plotting it without making subplots works

#

Maybe it's the number of rows. Might be too big?
There are 1200 of em

feral lodge Jul 26, 2018, 3:29 PM

#

I have no experience with pyplot so I can't say for sure, but that seems unlikely to me

#

It'd be a pretty poor excuse for a plotting library in that case imo

#

And it would also probably throw some custom TooMuchDataException

serene oar Jul 26, 2018, 3:32 PM

#

Can I loop through the list to find the TypeError?

feral lodge Jul 26, 2018, 3:35 PM

#

Maybe, I couldn't tell you how though 🤔 the row breaking the plot is probably in the beginning or end though
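
The thread never confirms what was wrong with the downloaded file. If the suspicion is stray non-numeric cells, one way to locate them without bisecting row ranges by hand is to coerce every column to numbers and look at what failed. A minimal sketch with a made-up frame standing in for the CSV:

```python
import pandas as pd

# Stand-in for the downloaded CSV, read as text, with one bad cell.
df = pd.DataFrame({
    "t": ["0", "1", "2", "3"],
    "a1": ["1.5", "2.5", "n/a", "4.0"],
    "a2": ["10", "11", "12", "13"],
})

# Cells that cannot be parsed as numbers become NaN instead of raising.
as_numbers = df.apply(pd.to_numeric, errors="coerce")

# Rows of the original frame where at least one cell failed to parse.
bad_rows = df[as_numbers.isna().any(axis=1)]

# A numeric frame that is safe to plot.
clean = as_numbers.dropna()
```

Checking `df.dtypes` right after `pd.read_csv` is a cheaper first hint: a column that should be numeric but shows up as `object` usually contains at least one such cell.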

serene oar Jul 26, 2018, 3:41 PM

#

Hm, the more data I load in, the longer it takes to start it. At 300 lines from all columns it stops responding for a second.

#

Eventually loads it still

#

Are there any other plotting libs that allow me to do this subplot view?

feral lodge Jul 26, 2018, 4:01 PM

#

Matplotlib is the goto afaik! But this is pretty weird though, 1200 data points isn't all that much. Surely it should be able to handle tens of thousands of points. Maybe this approach works better with dataframes? https://stackoverflow.com/questions/22483588/how-can-i-plot-separate-pandas-dataframes-as-subplots

#

They show subplots here too, also using df.plot http://pandas.pydata.org/pandas-docs/version/0.13/visualization.html

#

Looks like that's the intended way to plot
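
A minimal sketch of the `df.plot` route from the links above, assuming the first column is the shared x axis and the other two are the series to draw. The column names and values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for a window
import pandas as pd

# Stand-in for the three-column CSV from the thread.
df = pd.DataFrame({
    "time": [0, 1, 2, 3, 4],
    "a1": [1.0, 3.0, 2.0, 5.0, 4.0],
    "a2": [10.0, 8.0, 9.0, 6.0, 7.0],
})

# Use the first column as the index so it becomes the x axis, then let
# pandas draw one subplot per remaining column with a shared x axis.
axes = df.set_index("time").plot(subplots=True, sharex=True)
```

`axes` is an array with one matplotlib Axes per plotted column, so per-plot styling can still be applied afterwards, e.g. `axes[0].set_ylabel("a1")`.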

serene oar Jul 26, 2018, 4:22 PM

#

Awesome, thanks. That does the trick

dense oak Jul 28, 2018, 3:09 AM

#

Hey Everyone, I have 1 homework question left, and I simply can not figure it out. I have to create a correlation matrix with rows from one dataframe (which somehow has to be generated from the values in 1 column) and columns from the other. Right now all I can do is either get the column... Is there anyone out there who can help? I have been at this for 5 hours and my brain is in a deathspiral

torn inlet Jul 28, 2018, 3:10 AM

#

df.iloc[row_start:row_end]?

dense oak Jul 28, 2018, 3:13 AM

#

the column to make the rows has 25191 rows in it

#

i need to group it some how first

#

what is the bracket command to post code here? i cant remember

torn inlet Jul 28, 2018, 3:14 AM

#

I feel like some information is missing. What do you mean by "the column to make the rows"? Do you mean turning one of the columns into the index?

#

Single backticks for inline, groups of 3 for a block.

dense oak Jul 28, 2018, 3:15 AM

#

''' b3 = kdd20['label'].map(itc)
b3.value_counts()
b4 = pd.DataFrame (b3.value_counts())
b4 '''

#

i did that wrong

torn inlet Jul 28, 2018, 3:15 AM

#

It's the one on the tilde key.

#

and they have to be on their own lines.

dense oak Jul 28, 2018, 3:15 AM

#

b3 = kdd20['label'].map(itc)
b3.value_counts()
b4 = pd.DataFrame (b3.value_counts())
b4

torn inlet Jul 28, 2018, 3:16 AM

#

yay

dense oak Jul 28, 2018, 3:16 AM

#

i am trainable

velvet anchor Jul 28, 2018, 3:16 AM

#

result = pd.concat([df1, df2], axis=1).corr()

#

like this?

dense oak Jul 28, 2018, 3:16 AM

#

TypeError: cannot concatenate a non-NDFrame object

#

i can share my book...its on azure notebooks if you all are really interested...lol

velvet anchor Jul 28, 2018, 3:17 AM

#

did you replace df1 and df2 with your dataframes? :p

dense oak Jul 28, 2018, 3:17 AM

#

result = pd.concat([b3, Basic], axis=1).corr()

velvet anchor Jul 28, 2018, 3:18 AM

#

Hmm @feral lodge is normally who I turn to with df questions but I think he's AFK rn

#

I did find this SO page though https://stackoverflow.com/questions/41823728/how-to-perform-correlation-between-two-dataframes-with-different-column-names might have your answer

torn inlet Jul 28, 2018, 3:19 AM

#

I'm still unclear on what you want to do. Why do you need to group the rows? Is there a count mismatch?

dense oak Jul 28, 2018, 3:20 AM

#

the question i am trying to answer is : Create data frames which have intrusions (rows) and features based on the mappings given for Basic and Content features. Then calculate the correlation matrices for each. What is the highest absolute value of correlation (other than 1.0) in the Basic matrix?

#

Basic and Content are basically filters against column headers for a CSV file

#

ex.
Basic = ["duration","protocol_type","service","flag","src_bytes","dst_bytes","land","wrong_fragment","urgent"]

torn inlet Jul 28, 2018, 3:21 AM

#

Ya, makes sense.

dense oak Jul 28, 2018, 3:23 AM

#

all the intrusion data existed in one column in the spread sheet named 'label' but was categorized using
itc = {'back':'DOS','land':'DOS','neptune':'DOS','pod':'DOS','smurf':'DOS','teardrop':'DOS','satan':'PROBE','ipsweep':'PROBE','nmap':'PROBE','portsweep':'PROBE','normal':'NORMAL','guess_passwd':'R2L','ftp_write':'R2L','imap':'R2L','phf':'R2L','multihop':'R2L','warezmaster':'R2L','warezclient':'R2L','spy':'R2L','buffer_overflow':'U2R','loadmodule':'U2R','perl':'U2R','rootkit':'U2R'}

#

kdd20['label'].map(itc)

#

kdd20 is the variable for the spreadsheet

torn inlet Jul 28, 2018, 3:23 AM

#

Ok, this is making a lot more sense now

dense oak Jul 28, 2018, 3:24 AM

#

i am glad it is for someone 😃

torn inlet Jul 28, 2018, 3:25 AM

#

I am not a data scientist but I have some Pandas experience

dense oak Jul 28, 2018, 3:25 AM

#

yay 😃

torn inlet Jul 28, 2018, 3:25 AM

#

Do you really need a groupby? Are both the labels in Basic and itc features?

dense oak Jul 28, 2018, 3:26 AM

#

i think ...both are features

torn inlet Jul 28, 2018, 3:27 AM

#

It looks like it to me. I understand that rows are always observations or data points, so I assume that's the case here too

dense oak Jul 28, 2018, 3:28 AM

#

the closest I have gotten is

```
bcm = bcmd[Basic]
co = bcm.corr()
co
```

and

```
groupme = kdd20[Basic].groupby(kdd20['class'])
groupme.corr()
```

the first set of code is only correlating basic against itself
the second set of code does put the intrusion cats in, but it's funky, and is still correlating Basic against itself, almost nested like

torn inlet Jul 28, 2018, 3:29 AM

#

Correct, corr() computes pairwise correlation of columns, excluding NA/null values.

dense oak Jul 28, 2018, 3:29 AM

#

okay...how?

torn inlet Jul 28, 2018, 3:29 AM

#

So you want correlation between the Basic and Content data?

dense oak Jul 28, 2018, 3:30 AM

#

yes

#

i basically need something like

torn inlet Jul 28, 2018, 3:30 AM

#

Do the number of rows match properly? Can you concatenate the columns together?

dense oak Jul 28, 2018, 3:31 AM

#

they all come from the same csv so i would assume yes

torn inlet Jul 28, 2018, 3:31 AM

#

That is, do the ids (really the indices) match between the two, is what I'm asking

dense oak Jul 28, 2018, 3:31 AM

#

that I am not sure, sorry.

torn inlet Jul 28, 2018, 3:33 AM

#

This doesn't directly answer your question, but if I understand the problem statement correctly, Create data frames which have intrusions (rows) and features based on the mappings given for Basic and Content features. Then calculate the correlation matrices for each. What is the highest absolute value of correlation (other than 1.0) in the Basic matrix?, you don't need to calculate the correlation between the Basic and Content DataFrames, only within each.
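
Under that reading of the question, the last step is finding the largest absolute off-diagonal entry of a single correlation matrix. A minimal sketch with invented numbers; the column names are borrowed from the `Basic` list, and the non-numeric ones (`protocol_type`, `service`, `flag`) would need encoding or excluding first, since correlation is only defined for numeric columns.

```python
import numpy as np
import pandas as pd

# Made-up numeric features standing in for part of the Basic set.
features = pd.DataFrame({
    "duration": [1, 2, 3, 4, 5, 6],
    "src_bytes": [2, 1, 4, 3, 6, 5],
    "dst_bytes": [5, 3, 6, 1, 2, 4],
})

corr = features.corr()

# Blank out the diagonal (each column against itself is always 1.0),
# then take the largest absolute value that is left.
off_diagonal = ~np.eye(len(corr), dtype=bool)
abs_corr = corr.abs().where(off_diagonal)
highest = abs_corr.max().max()
```

This treats "other than 1.0" as "ignore the diagonal". If two different columns happened to be perfectly correlated, that pair would still be reported, which may or may not be what the assignment intends.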

dense oak Jul 28, 2018, 3:35 AM

#

I need to calculate the correlation matrices for the intrusions and basic and separately intrusions and content

#

something that looks like this bad ms paint representation

#

📎 Captureex.PNG

torn inlet Jul 28, 2018, 3:36 AM

#

aha

dense oak Jul 28, 2018, 3:37 AM

#

that example is intrusions correlated to Basic

torn inlet Jul 28, 2018, 3:38 AM

#

I think one way to do it, though ugly, would be (as I think you said) have each intrusion type be a column label, and be 1 or True for an observation where it's the correct intrusion type, and 0/False otherwise.
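
A minimal sketch of that indicator-column idea, using a tiny invented frame and a four-entry subset of the `itc` mapping. Since `Basic` is a plain list of column names rather than a DataFrame, it has to be used to select columns before concatenating, which is presumably why the earlier `pd.concat([b3, Basic], ...)` attempt raised a TypeError.

```python
import pandas as pd

# Subset of the itc mapping from the thread, enough for the toy data.
itc_subset = {"neptune": "DOS", "smurf": "DOS", "nmap": "PROBE", "normal": "NORMAL"}

# Invented rows standing in for kdd20.
kdd_toy = pd.DataFrame({
    "label": ["normal", "neptune", "smurf", "normal", "nmap", "normal"],
    "duration": [1, 10, 12, 2, 5, 1],
    "src_bytes": [100, 0, 0, 120, 30, 90],
})

# One 0/1 indicator column per intrusion category.
category = kdd_toy["label"].map(itc_subset)
indicators = pd.get_dummies(category).astype(float)

# Select the numeric feature columns by name, then correlate everything.
numeric_features = kdd_toy[["duration", "src_bytes"]]
full_corr = pd.concat([indicators, numeric_features], axis=1).corr()

# Keep only the block with intrusion categories as rows, features as columns.
cross = full_corr.loc[indicators.columns, numeric_features.columns]
```

The `cross` block has the shape of the MS Paint mock-up: one row per intrusion category, one column per feature. Whether this is what the assignment wants is a separate question, as discussed above.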

dense oak Jul 28, 2018, 3:39 AM

#

but i still need the values per intrusion type

#

```
kdd20['label'].map(itc)
b3 = kdd20['label'].map(itc)
b3.value_counts()
b4 = pd.DataFrame (b3.value_counts())
b4
label
NORMAL    13449
DOS    9234
PROBE    2289
R2L    209
U2R    11
```

ripe willow Jul 28, 2018, 10:45 AM

#

hey, can someone pls help with a webcrawler ?

naive hornet Jul 28, 2018, 10:55 AM

#

!tag g ask

arctic wedgeBOT Jul 28, 2018, 10:55 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

zealous ermine Jul 28, 2018, 8:41 PM

#

any of u good with statistics get how stuff like MSE, NRMSE, PSNR and SSIM work? (or at least the first 2)

#

mse is mean squared error, nrmse is normalised root mean squared error, psnr is peak signal-to-noise ratio, and ssim is structural similarity (no idea what the IM is)

#

http://scikit-image.org/docs/dev/api/skimage.measure.html

#

the functions for them are towards the bottom of that table

#

📎 a.png
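
For the first two metrics (plus PSNR), a plain-Python sketch of the usual definitions on flat sequences of pixel values. These are not the scikit-image functions; NRMSE in particular has several normalisation conventions, and dividing by the range of the reference is just the one assumed here. SSIM is left out because it needs windowed local statistics.

```python
import math


def mse(reference, test):
    """Mean squared error: the average of the squared differences."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def nrmse(reference, test):
    """Root of the MSE divided by the range (max - min) of the reference."""
    span = max(reference) - min(reference)
    return math.sqrt(mse(reference, test)) / span


def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB; data_range is the largest possible
    pixel value span (255 for 8-bit images)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / error)


reference = [0.0, 100.0, 200.0, 255.0]
noisy = [2.0, 98.0, 202.0, 253.0]
results = (mse(reference, noisy), nrmse(reference, noisy), psnr(reference, noisy))
```

Lower is better for MSE and NRMSE, higher is better for PSNR; all three only compare values position by position, which is the gap SSIM is meant to fill by comparing local structure.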