#data-science-and-ml | Python | Page 203

lyric canopy Jun 20, 2019, 2:47 PM

#

@silk forge Are you familiar with the term "mean squared error"?

silk forge Jun 20, 2019, 2:48 PM

#

thats exactly what i dont understand

#

@lyric canopy

#

Regression is so complex

#

Classification way easier

lyric canopy Jun 20, 2019, 2:49 PM

#

Your model predicts values, right?

#

Let's call those predicted values y-hat (ŷ).

#

We also have the "true" values, let's call those y

#

Now we want to compare the predictions our model makes to the true values to see how well it does

#

That's what the formula does

#

It computes the difference between the predicted values and the true values, ŷ - y, and we call that the error of the model

#

Or, how wrong each prediction is

#

We then square those errors to get the squared error between each individual prediction and "true" value

#

And then we calculate the mean of those squared errors by summing them together and dividing by the number of predictions we have

#

And that's what the formula does here

#

So, it's the mean squared error
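
#

Here's a rough sketch of those steps in plain Python. The numbers are made up, and the names `y_hat`, `y` and `mean_squared_error` are just for illustration, not from any library:

```python
def mean_squared_error(y_hat, y):
    """Mean of the squared differences between predictions and true values."""
    # error of each individual prediction: y_hat - y
    errors = [p - t for p, t in zip(y_hat, y)]
    # square each error, then average over the number of predictions
    squared_errors = [e ** 2 for e in errors]
    return sum(squared_errors) / len(squared_errors)


y_hat = [2.5, 0.0, 2.0, 8.0]   # hypothetical model predictions
y = [3.0, -0.5, 2.0, 7.0]      # hypothetical "true" values

mse = mean_squared_error(y_hat, y)
print(mse)
```

The lower that number, the closer the predictions are to the true values.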

silk forge Jun 20, 2019, 2:53 PM

#

Oh

#

so they try to find accuracy

lyric canopy Jun 20, 2019, 2:53 PM

#

In this case, there's an additional 1/2 involved (that's why it's 1/2m), but that's just to make differentiating easier

#

It doesn't have influence on the optimization

silk forge Jun 20, 2019, 2:54 PM

#

but why 1/2

#

?

lyric canopy Jun 20, 2019, 2:54 PM

#

Just to make a term drop out when you take the derivative

#

It doesn't influence the minimum/optimization, so it doesn't matter
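
#

To spell that out, assuming the cost in your formula has the usual form with m predictions and model parameters θ (I'm guessing at the notation, so treat the symbols as an assumption):

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( \hat{y}_i - y_i \right) \frac{\partial \hat{y}_i}{\partial \theta}
  = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right) \frac{\partial \hat{y}_i}{\partial \theta}
```

The 2 that comes down from the square cancels the 1/2. Multiplying the cost by a positive constant rescales it, but the θ where it is smallest stays the same.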

#

Now, think of your model: If the MSE is some kind of measurement for how accurate the model is, then it would be a good thing to minimize it

silk forge Jun 20, 2019, 2:55 PM

#

so i dont need to care about that right?

lyric canopy Jun 20, 2019, 2:55 PM

#

And that's what a lot of algorithms do: Find the model with the lowest MSE (or another measure of accuracy)

silk forge Jun 20, 2019, 2:55 PM

#

also is there any good linear regression tutorials?

#

that i can learn from

lyric canopy Jun 20, 2019, 2:56 PM

#

I'm not sure. I only know of a really good book, but that book probably contains more than what you're looking for

#

Fox (2015) Applied Regression Analysis and Generalized Linear Models (third edition)

silk forge Jun 20, 2019, 2:57 PM

#

hmm

#

thanks

#

ill see if i can get the book

#

yo wait

#

@lyric canopy

#

in an RMSE

#

do we need to do formula stuff in every point

#

i mean

#

📎 unknown.png

#

in all these green points

lyric canopy Jun 20, 2019, 3:29 PM

#

do you see that red line in there labeled error? That's the ŷ - y of earlier

#

The difference between the predicted value of the model (represented by the line), and the actual value (the green dot)

wind knoll Jun 20, 2019, 4:52 PM

#

Anyone have any of the following books? (Ping me)
https://smile.amazon.com/Machine-Learning-Deep-Algorithms-Techniques/dp/9388511131/ref=sr_1_3?keywords=deep+learning+matlab&qid=1561045671&s=books&sr=1-3
https://smile.amazon.com/MATLAB-Deep-Learning-Artificial-Intelligence/dp/1484228448/ref=sr_1_3?keywords=deep+learning+matlab&qid=1561046115&s=books&sr=1-3
https://smile.amazon.com/MATLAB-Machine-Learning-Recipes-Problem-Solution/dp/1484239156/ref=sr_1_9?keywords=deep+learning+matlab&qid=1561046632&s=books&sr=1-9

wide echo Jun 20, 2019, 5:05 PM

#

~~> Matlab~~

spark nimbus Jun 20, 2019, 7:35 PM

#

@lean ledge given an Array<Sample> how would I combine those samples back into a proper waveform?

jagged stump Jun 21, 2019, 12:34 PM

#

Hey everyone. I wanna do brand detection on live video. How can I do it? What is your advice? I am trying it with a CNN

lapis sequoia Jun 21, 2019, 12:38 PM

#

can someone tell me what this 3 means here

#

energy_series = df.loc[:, ('Energy', '3')]

#

Energy is a column name..

#

but not sure what '3' refers to

#

hmm maybe this is a tuple of columns

#

im having trouble coercing my dataframe column to datetime values

#

it's in unicode

#

I tried pd.to_datetime

desert oar Jun 21, 2019, 1:24 PM

#

@weary rose fortunately for you, 1 million isnt considered big 😛

#

yes you "can" do that

#

how is the data stored?

#

@lapis sequoia it could be one of two scenarios: 1) the column label is actually a tuple, so this tuple is accessing the column called ('Energy', '3'), or 2) the dataframe has a "multiindex" instead of simple column names, so ('Energy', '3') is accessing the key 'Energy' in the outer level of the index, and the key '3' in the next level of the index
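
#

A small sketch of scenario 2, using a made-up frame (only the `('Energy', '3')` key comes from your snippet, the other column labels and values are invented):

```python
import pandas as pd

# Hypothetical frame whose columns have two levels (a "multiindex")
columns = pd.MultiIndex.from_tuples(
    [("Cost", "3"), ("Energy", "3"), ("Energy", "4")]
)
df = pd.DataFrame([[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]], columns=columns)

# Outer level key 'Energy', inner level key '3' -> a single column (a Series)
energy_series = df.loc[:, ("Energy", "3")]
print(energy_series.tolist())

# Selecting only the outer key returns every column underneath it
print(df["Energy"].columns.tolist())
```

Printing `df.columns` on your own frame should tell you which of the two scenarios you're in.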

#

@jagged stump i have no experience in this area myself but i found this which could help https://wiki.tum.de/display/lfdv/Video+Analysis

wide gyro Jun 21, 2019, 1:35 PM

#

What would be the best way to convert a text file of cells into csv?

desert oar Jun 21, 2019, 1:35 PM

#

@wide gyro pd.read_csv ?

wide gyro Jun 21, 2019, 1:36 PM

#

@desert oar well I want to set it to a csv file that allows a dataframe to come in after and take the data but I'm struggling with getting the formatting down for csv

desert oar Jun 21, 2019, 1:36 PM

#

create the csv using pandas in the first place?

#

what do you mean "getting the formatting down"

#

you shouldnt ever be manually constructing CSV data except in very simple cases

wide gyro Jun 21, 2019, 1:38 PM

#

I am using iwlist scan and trying to get that output into csv file

desert oar Jun 21, 2019, 1:39 PM

#

what is "iwlist scan"

wide gyro Jun 21, 2019, 1:39 PM

#

But my csv file is setting it row by row, with no column headers

desert oar Jun 21, 2019, 1:39 PM

#

can you share some code

wide gyro Jun 21, 2019, 1:40 PM

#

oops forgot iwlist is in linux, but it scans for access points nearby

desert oar Jun 21, 2019, 1:40 PM

#

can you share some code

wide gyro Jun 21, 2019, 1:40 PM

#

Sure gimme a sec

desert oar Jun 21, 2019, 1:40 PM

#

iwlist is a command line tool right

#

so what are you doing with it

wide gyro Jun 21, 2019, 1:41 PM

#

ye

desert oar Jun 21, 2019, 1:41 PM

#

👍

wide gyro Jun 21, 2019, 1:42 PM

#

trying to do something similar to wifimap but a beginner version of it

#

if that makes sense

desert oar Jun 21, 2019, 1:42 PM

#

im not familiar w/ wifimap. but how are you parsing the output of iwlist

wide gyro Jun 21, 2019, 1:43 PM

#

can i show a terminal script here? im not doing it off python rn

desert oar Jun 21, 2019, 1:43 PM

#

is this a shell script?

#

yes ofc

#

use "bash" instead of "python" for highlighting

wide gyro Jun 21, 2019, 1:43 PM

#

yeah that's what im stuck on right now

desert oar Jun 21, 2019, 1:43 PM

#

bash highlighting:

echo 1
x=3
echo $x

wide gyro Jun 21, 2019, 1:44 PM

#

wifimap . io I think it is @desert oar it's like a hotspot finder for restaurants or stores

#

good if you travel and stuff

desert oar Jun 21, 2019, 1:44 PM

#

ok. yeah go ahead and share your shell script

#

the goal is to create a CSV that you can read to pandas right

wide gyro Jun 21, 2019, 1:46 PM

#

echo "Scan?"
select yn in "Yes" "No"; do
    case $yn in
        Yes) iwlist wlan0 scan | tee ~/output.txt;;
        No) exit;;
    esac
done

#

Yes

#

I receive a block of cells that all have their respective data, but not sure how to then parse it

#

Maybe I could bring all the data onto one line without using the category for it, and then when setting it to csv I could set the column headers manually that match up to the data?

#

or would that not be smart @desert oar

desert oar Jun 21, 2019, 1:52 PM

#

can you show some example output from iwlist wlan0 scan?

wide gyro Jun 21, 2019, 1:54 PM

#

Actually I figured it out, there's a command called awk that will take care of it

desert oar Jun 21, 2019, 1:54 PM

#

yes i've used awk a lot

#

still

#

if you make bad CSV files you make bad CSV files

#

also awk isn't that easy to use

#

and there is a lot of really really bad awk advice out there

#

so i'd still like to see the iwlist output

#

so you don't waste time on someone's over-clever awk solution that you don't understand

wide gyro Jun 21, 2019, 1:56 PM

#

Is there a formatting for txt files?

desert oar Jun 21, 2019, 1:56 PM

#

nothing standard

#

can you just share some output

wide gyro Jun 21, 2019, 1:56 PM

#

ye

#

Cell 01 -    Address: 00.something
        Channel:    10
        Quality: 57/70
        ESSID: "Main"

Cell 02 -    Address: 00.something
        Channel:    10
        Quality: 57/70
        ESSID: "Main"

#

Essentially that for txt file, and on csv its that but each one is a new row

#

I figured I would get rid of the cell categories and just keep the data, get all the data on same line separated by a comma or tab, and then when setting it to a csv, I would put the column headers in manually

#

I saw online something like sed -e "s/\tsignal: //" -e "s/\tSSID: //" which I figured I would do for all of them

desert oar Jun 21, 2019, 2:08 PM

#

can you demonstrate the output format you want to see

wide gyro Jun 21, 2019, 2:08 PM

#

00.something 10 57/70 "Main"

#

something like that, or could be separated by tabs or commas

desert oar Jun 21, 2019, 2:08 PM

#

got it. ok awk is actually a good tool for that

#

you could also do it in python

wide gyro Jun 21, 2019, 2:09 PM

#

Which would you suggest?

#

I just saw awk first and jumped on it

#

but open to anything

desert oar Jun 21, 2019, 2:10 PM

#

since you already know python id personally suggest using python
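
#

A rough sketch of how that could look in Python, assuming the text really has the shape you pasted (real iwlist output has more fields and may format them differently). `parse_cells`, `to_csv` and `FIELDS` are names I made up:

```python
import csv
import io
import re

# Sample text in the format shared above
SAMPLE = '''\
Cell 01 -    Address: 00.something
        Channel:    10
        Quality: 57/70
        ESSID: "Main"

Cell 02 -    Address: 00.something
        Channel:    10
        Quality: 57/70
        ESSID: "Main"
'''

FIELDS = ["Address", "Channel", "Quality", "ESSID"]


def parse_cells(text):
    """Return one dict per 'Cell NN' block, keyed by field name."""
    cells = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Cell"):
            cells.append({})                      # a new access point begins
            line = line.split("-", 1)[1].strip()  # drop the "Cell NN -" prefix
        match = re.match(r"(\w+):\s*(.*)", line)
        if match and cells and match.group(1) in FIELDS:
            cells[-1][match.group(1)] = match.group(2).strip('"')
    return cells


def to_csv(cells):
    """Write the cells as CSV text with a header row, via the csv module."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(cells)
    return buffer.getvalue()


cells = parse_cells(SAMPLE)
csv_text = to_csv(cells)
print(csv_text)
```

Letting the csv module do the writing takes care of the header row and the quoting, so you don't build the CSV by hand. The result can then go straight into `pd.read_csv`.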

wide gyro Jun 21, 2019, 2:11 PM

#

But would you suggest trying to clean it up a bit in linux before?

desert oar Jun 21, 2019, 2:11 PM

#

eh

#

no point imo

wide gyro Jun 21, 2019, 2:11 PM

#

Or everything in python

#

ye I understand

desert oar Jun 21, 2019, 2:12 PM

#

if you already know the tools, or want an excuse to learn them, then by all means let's get into it

#

but if you just wanna get the data processed, use what you already know

wide gyro Jun 21, 2019, 2:12 PM

#

I might wanna try it a couple ways just to learn, but I'll head down the path of what I already know just to get it running first

#

and then have a backup plan ready in case I end up not understanding the other methods

desert oar Jun 21, 2019, 2:14 PM

#

thats a good idea

#

i learned not to do that the hard way

wide gyro Jun 21, 2019, 2:15 PM

#

What do you mean haha

desert oar Jun 21, 2019, 2:50 PM

#

i mean, learning a bunch of new tools while trying to actually get something done, without having a backup in place first

drowsy marsh Jun 21, 2019, 2:53 PM

#

hello, I fail to create a .mplstyle for matplotlib with Spyder. Python doesn't find the new_style in the style library. And even if I modify a present default style, it doesn't change the output. I don't really know where I'm missing something :/

#

I missed this: matplotlib.style.reload_library()

silk forge Jun 21, 2019, 4:15 PM

#

import sklearn.naive_bayes as bae
import numpy
import pandas as pd




data = pd.read_csv(filepath_or_buffer="C:/Users/admin/Desktop/nibbeh.csv")


x = pd.DataFrame.as_matrix(data.Height,data.Weight)
y = [data["Gender"]]

clf = bae.GaussianNB()
clf.fit(x,y)
n = clf.predict([[140,120]])

#

ValueError: Expected 2D array, got 1D array instead:

#

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

#

help?

desert oar Jun 21, 2019, 4:28 PM

#

the error message is pretty clear in this case

#

the x is expected to be a 2d array

#

you have a 1d array

earnest prawn Jun 21, 2019, 4:55 PM

#

predict and fit do expect sequences of data points, not just a single data point

#

so you always gotta wrap them in an additional array
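
#

A sketch of the shapes sklearn wants here. The CSV isn't shown, so this uses a made-up stand-in frame with the same column names:

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Made-up stand-in for nibbeh.csv; the real file isn't shown
data = pd.DataFrame({
    "Height": [180, 175, 160, 155, 185, 150],
    "Weight": [80, 75, 55, 50, 90, 48],
    "Gender": ["M", "M", "F", "F", "M", "F"],
})

# X must be 2D: one row per sample, one column per feature
x = data[["Height", "Weight"]].to_numpy()
# y must be 1D: one label per sample (no extra list around it)
y = data["Gender"].to_numpy()

clf = GaussianNB()
clf.fit(x, y)

# predict also expects a 2D array, so a single sample is wrapped in a list
n = clf.predict([[140, 120]])
print(x.shape, y.shape, n)
```

Selecting with a list of column names (`data[["Height", "Weight"]]`) is what keeps the result two-dimensional.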

#

@drowsy marsh

#

oh shoot

#

@silk forge

bitter pewter Jun 21, 2019, 5:57 PM

#

Guys, how do I make selenium read multiple pages and then export an Excel file afterwards?

#

I've successfully made it read one page and then export a file with the results

#

Let me show MWE:

#

from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl import Workbook
import numpy as np
import pandas as pd

url = "https://scon.stj.jus.br/SCON/legaplic/toc.jsp?materia=%27Lei+8.429%2F1992+%28Lei+DE+IMPROBIDADE+ADMINISTRATIVA%29%27.mat.&b=TEMA&p=true&t=&l=1&i=18&ordem=MAT,@NUM"

driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div/div/div[3]/div[2]/div/div/div/div[16]/a')
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div[1]/div/div[3]/div[2]/div/div/div/div[3]/div[2]/span[2]/a')
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

textList = driver.find_elements_by_class_name("docTexto")

resultados = BeautifulSoup(driver.page_source, 'lxml')

parse = resultados.find('div', {'id':'listadocumentos'})
paragrafoBRS = parse.find_all('div',{'class':'paragrafoBRS'})

header = []
content = []
for each in paragrafoBRS:
    header.append(each.find('h4', {'class':'docTitulo'}).text.strip())
    content.append(each.find(['div','pre'], {'class':'docTexto'}).text.strip())

# Build the DataFrame once, after the loop has collected every header/content pair
df = pd.DataFrame([content], columns=header)

df.to_excel('dados.xlsx')

driver.quit()

#

So, selenium opens up a page, then goes through some links and gets to the point I want (a page that displays the data I want to scrape)

#

The problem is that there are 5 pages of data

#

And I'm struggling to make it read all 5 pages and then export the Excel file

bitter pewter Jun 21, 2019, 7:19 PM

#

Guys, nvm

#

I've figured it out

#

Now I'm struggling with how to merge columns that have the same name

#

Any tips?

#

Example: I have 3 columns named "Processo", and instead of 1 line with 3 columns I want 1 column with 3 lines of data

#

Using Pandas DataFrame, ofc

lapis sequoia Jun 22, 2019, 12:08 AM

#

rephrase your question so it makes sense

inland viper Jun 22, 2019, 1:04 AM

#

Hello?

earnest prawn Jun 22, 2019, 1:10 AM

#

hello

inland viper Jun 22, 2019, 1:11 AM

#

I cannot figure out what the second part of the prompt is asking for: Given the root of a binary search tree with distinct values, modify it so that every node has a new value equal to the sum of the values of the original tree that are greater than or equal to node.val.

#

I can post a picture of an example if that helps

earnest prawn Jun 22, 2019, 1:13 AM

#

I dont really see how much clearer you could express this question, what exactly is your problem with it?

inland viper Jun 22, 2019, 1:14 AM

#

I do not understand how to modify the tree

earnest prawn Jun 22, 2019, 1:15 AM

#

node.val = 4343434?

inland viper Jun 22, 2019, 1:15 AM

#

Input: [4,1,6,0,2,5,7,null,null,null,3,null,null,null,8]
Output: [30,36,21,36,35,26,15,null,null,null,33,null,null,null,8]

#

This is the example. There is a picture of the tree as well. But I do not understand where the values are coming from

earnest prawn Jun 22, 2019, 1:17 AM

#

show me the picture then

inland viper Jun 22, 2019, 1:18 AM

#

📎 binarytree.PNG

earnest prawn Jun 22, 2019, 1:21 AM

#

oh those lists are actually equally long i see

#

that confused me

#

well then its quite obvious where the new numbers are coming from, the task describes exactly how to calculate them

inland viper Jun 22, 2019, 1:22 AM

#

What does node.val mean?

earnest prawn Jun 22, 2019, 1:23 AM

#

the value associated with the node?

inland viper Jun 22, 2019, 1:32 AM

#

I see

lean ledge Jun 22, 2019, 4:01 AM

#

@inland viper just a recursive solution should work. It's not a hard problem by any measure. It recurses down and returns the sum to the upper tree. Upper tree takes the sum to the left and assigns it as its value and returns back next sum by adding its old value to the sum
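
#

A sketch of one way to write that recursion. It visits the right subtree first, since that's where the larger values are. `TreeNode` and `to_greater_sum_tree` are names I picked, the problem statement may give you different ones:

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def to_greater_sum_tree(root):
    """Replace every value with the sum of all original values >= it."""
    def visit(node, running_sum):
        if node is None:
            return running_sum
        # Everything in the right subtree is larger, so handle it first
        running_sum = visit(node.right, running_sum)
        # running_sum now holds the sum of all values greater than node.val
        running_sum += node.val
        node.val = running_sum
        # Pass the updated sum down to the smaller values on the left
        return visit(node.left, running_sum)

    visit(root, 0)
    return root


# The tree from the example: [4,1,6,0,2,5,7,null,null,null,3,null,null,null,8]
root = TreeNode(
    4,
    TreeNode(1, TreeNode(0), TreeNode(2, None, TreeNode(3))),
    TreeNode(6, TreeNode(5), TreeNode(7, None, TreeNode(8))),
)
to_greater_sum_tree(root)
print(root.val, root.left.val, root.right.val)
```

For the root, 4 + 5 + 6 + 7 + 8 = 30, which is the first number in the expected output.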

#

This is also not data science

strong flare Jun 22, 2019, 4:22 AM

#

Hi, may I know how I can make the whole x axis fit in the graph?

📎 unknown.png

silk forge Jun 22, 2019, 5:56 AM

#

wanna show us the code? @strong flare

strong flare Jun 22, 2019, 6:03 AM

#

@silk forge hi fwiz here is the code 😀

# Plot with differently-colored markers.
plt.plot(combine_month.index, combine_month.April, 'b-', label='April')
plt.plot(combine_month.index, combine_month.March, 'g-', label='March')
plt.plot(combine_month.index, combine_month.May, 'r-', label='May')

# Create legend.
plt.legend(loc='right')
plt.xlabel('Date')
plt.ylabel('Month trend')
plt.show()

bitter pewter Jun 22, 2019, 6:11 AM

#

@lapis sequoia when I get the xlsx file with all the data I've collected with selenium, instead of having multiple lines under 11 different columns, it creates multiple columns and one single line. I'll post a screenshot over here so you can check it out.

#

As you can see, the column B "Processo" is repeated at column L, but they correspond to different data sets. I want to get them both in the same column, but on two different lines.
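
#

One possible way to get there, sketched with invented values: start a new row whenever a header name shows up again. Only "Processo" is a real header from the sheet, "Relator" and all the values are made up for illustration:

```python
import pandas as pd

# Hypothetical scraped lists, in the same shape the scraping loop produces
header = ["Processo", "Relator", "Processo", "Relator", "Processo", "Relator"]
content = ["REsp 1", "Min. A", "REsp 2", "Min. B", "REsp 3", "Min. C"]

records = []
for name, value in zip(header, content):
    # A header already present in the current record means a new document starts
    if not records or name in records[-1]:
        records.append({})
    records[-1][name] = value

# One row per document, one column per distinct header
df = pd.DataFrame(records)
print(df)
```

This assumes every document repeats its headers in the same order. If some documents skip a field, pandas fills the gap with NaN.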

📎 unknown.png

strong flare Jun 22, 2019, 9:53 AM

#

Hi, does anyone know how to change a line plot's Y-axis to normal numbers, not scientific notation?

outer marsh Jun 22, 2019, 11:17 AM

#

Hello everyone

#

Does someone know why my scipy optimize minimize doesn't give the most optimal solution

#

https://github.com/nobobo1234/zipf/blob/master/main.py#L44

#

Things like this

📎 unknown.png

lean ledge Jun 22, 2019, 12:26 PM

#

https://old.reddit.com/r/MachineLearning/comments/c3e9qu/d_those_who_hireinterview_for_machine_learning/erqkrm4/

[D] Those who hire/interview for machine learning positions, what ...


serene crane Jun 22, 2019, 1:52 PM

#

When you're working with a jupyter notebook, do you still tend to put all imports at the top, or does it make sense to group imports around the cells they are used? What's the common pattern?

polar acorn Jun 22, 2019, 2:26 PM

#

I put them on top. If you're sharing the notebook with someone later they have all the requirements in one cell instead of scattered around.

onyx granite Jun 22, 2019, 3:13 PM

#

yes you want people to run into any errors immediately, not later down the line. It's always good to put imports first, especially since a missing import won't give you a compile time error, only a runtime one @serene crane

desert oar Jun 22, 2019, 3:35 PM

#

@outer marsh what's wrong with that output and what were you expecting instead

outer marsh Jun 22, 2019, 3:51 PM

#

@desert oar The second one could've been a bit more down

#

Same for the first one

#

I could draw a better line by hand

desert oar Jun 22, 2019, 4:18 PM

#

So you're trying to fit a linear regression line with least squares?

#

Oh I see, power law

#

How about this, punch in the parameters for what you think is a better line and compute the log likelihood of both your "manual" solution and the solution the computer found

#

If it turns out that you can in fact draw a better line then clearly it failed to converge for some reason

#

Or didn't find a global minimum

#

Also make sure to rule out any bugs in your log likelihood
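
#

A sketch of that comparison. I'm assuming a plain Zipf model over the observed ranks, which may not be the expression in your repo, and the counts and both exponents are invented:

```python
import math


def zipf_log_likelihood(s, counts):
    """Log likelihood of rank counts under Zipf's law with exponent s.

    counts[k-1] is how often the rank-k item occurred, and
    p(k) = k**-s / H, where H normalises over the observed ranks.
    """
    n = len(counts)
    log_norm = math.log(sum(k ** -s for k in range(1, n + 1)))
    return sum(c * (-s * math.log(k) - log_norm)
               for k, c in enumerate(counts, start=1))


counts = [1000, 480, 340, 250, 195, 170, 140, 130, 110, 100]  # made-up data
s_optimizer = 0.6   # hypothetical value the optimizer returned
s_by_hand = 1.0     # hypothetical value read off a hand-drawn line

ll_optimizer = zipf_log_likelihood(s_optimizer, counts)
ll_by_hand = zipf_log_likelihood(s_by_hand, counts)
print(ll_optimizer, ll_by_hand)
```

Whichever parameter set gives the higher log likelihood is the better fit under that objective. If the hand-picked one wins, the optimizer stopped somewhere it shouldn't have.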

#

That log likelihood looks kind of off to me

#

What expression are you actually trying to optimize, zipfs law log likelihood?

lapis sequoia Jun 22, 2019, 7:35 PM

#

has anyone here worked with Python cv2, computer vision. for face detection or anything

outer marsh Jun 22, 2019, 7:36 PM

#

@desert oar Yeah

hollow shard Jun 22, 2019, 7:39 PM

#

Hi guys, I'm in a spot of bother with a 1 hidden layer neural net. I created a very basic neural network after the one I created for MNIST didnt work, training on a very simple generated dataset. However, it spits out nonsense results, and nothing that ive tried works. Could anyone take a look? Apologies in advance for rubbish code and the probably basic errors, I'm completely new.

#

http://dpaste.com/1ZY6QZW

desert oar Jun 22, 2019, 7:42 PM

#

@hollow shard it looks like you're re-initializing everything inside the loop

#

so every single step all the parameters get re-initialized to 0
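
#

To illustrate where the initialization belongs, here's a tiny stand-in. It's plain gradient descent on a line rather than your network, and all names and numbers are mine:

```python
def train(xs, ys, steps=200, lr=0.05):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    # Initialise the parameters ONCE, before the loop. Doing this inside the
    # loop would throw away everything learned in the previous step.
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = train(xs, ys, steps=2000, lr=0.05)
print(w, b)
```

If `w, b = 0.0, 0.0` sat inside the `for` loop, every step would start from zero again and the result would never move past a single update.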

hollow shard Jun 22, 2019, 7:43 PM

#

oh crap hahaha

#

thanks, I appreciate it

onyx granite Jun 22, 2019, 7:55 PM

#

I have a problem that I can't seem to figure out, I am trying to create a boosting model based off of car accidents for a particular city. I have about 30-50k records, since this is a classic imbalanced data case, I am "creating" my own training set by sampling accidents, changing a feature, then if it is non-negative, add it to my trainset as a 0/non-accident

#

I can't seem to get any good recall scores, since I am more focused on actually preventing type II errors

#

anyone who can help would be appreciated

#

i have about 15-20 different features as well btw, I can go into more detail if needed

hollow shard Jun 22, 2019, 8:25 PM

#

strange, I modified the code to put the resets outside of the loop, and its still playing up.

#

http://dpaste.com/21KPBEK

sand reef Jun 22, 2019, 8:44 PM

#

Anybody on? I have a smol question.

#

Can increasing the resolution of a picture reduce the accuracy of the model?

#

Cuz at 45x45 the model was performing flawlessly. When I made it 100x100, suddenly the accuracy got stuck at 47%

#

And wouldn't rise even with 10 epochs. Btw the training data size of 73k images.

desert oar Jun 22, 2019, 8:53 PM

#

seems weird

sand reef Jun 22, 2019, 8:57 PM

#

Yeah. I too am wondering what went wrong. From what I found online, they say that low resolution helps in gathering global features better and high helps in finer features. But as you increase the resolution, it's harder to grasp global features.

#

So I am not sure if OCT scans seem to be classified on the basis of global features though.

desert oar Jun 22, 2019, 8:59 PM

#

maybe you can visualize the middle layers to see what's being learned

#

what was accuracy in 45x45?

sand reef Jun 22, 2019, 9:02 PM

#

88

desert oar Jun 22, 2019, 9:06 PM

#

oh. huh

#

im really not a image/vision guy but that aint right

#

the usual "check your code" and "check your data" caveats apply

sand reef Jun 22, 2019, 9:10 PM

#

Well. Can't really check my data. It's just an image.

#

Code wise. We could try out different designs though.

desert oar Jun 22, 2019, 9:12 PM

#

well how are you producing the images

#

i assume these arent actually 45x45 images

#

and youre downsampling them somehow

sand reef Jun 22, 2019, 9:13 PM

#

Nope.

#

Those are 360x760

desert oar Jun 22, 2019, 9:13 PM

#

so do the 100x100 images actually look right?

sand reef Jun 22, 2019, 9:13 PM

#

Yeah. That's why I was like, 45 might be very low resolution.

desert oar Jun 22, 2019, 9:13 PM

#

its kind of a long shot but worth checking. sorry im not more helpful on the actual neural network side of things

sand reef Jun 22, 2019, 9:13 PM

#

And guess what, with 45x45, the test dataset accuracy is 91%

desert oar Jun 22, 2019, 9:14 PM

#

it's possible that 100x100 is a "bad zone" where the easy blob-like features start to break up but the image itself hasnt resolved into anything meaningful yet

#

but again im speculating

#

cause 45x45 is just like what, blobs of dark and light?

sand reef Jun 22, 2019, 9:14 PM

#

Yus

#

So I guess then I'll just do it with multiple image sizes and networks

#

And plot it on tensorboard

desert oar Jun 22, 2019, 9:16 PM

#

seems reasonable as long as training isnt too intensive

sand reef Jun 22, 2019, 9:16 PM

#

Well. Not much. 76k images 100x100 takes 1. 10 min per epoch.

#

*1min 10s

#

I am pretty happy at the fact that this actually works fast with a gpu. With a cpu every epoch would've taken 5-10min

#

Say. Another smol question. I have some work related to taking features out of an image to generate a vector with features. I need to take texture, a little bit of geometry and some morphological features. Which library has these functions and can I get a source where I can look up such algorithms?

#

Stuff like glcm, drlbp and all.

jagged stump Jun 22, 2019, 9:24 PM

#

Hey everyone, I wanna ask a question. Is Data Engineer good? It looks like it focuses on the architecture behind the data, with less work on ML, DL etc. Is that right?

#

cause I took a call about a data engineer position and idk what I will be doing. Can anyone give me good info about it?

#

thanks

desert oar Jun 22, 2019, 9:28 PM

#

"good" is up to you. it's certainly important work, and the industry can always use competent data engineers

#

the work you'd do depends on the company, but you will generally be building or maintaining database systems and data pipelines

#

so maybe a data scientist or ML researcher will produce a model, maybe in a docker container

#

and the data engineers might be the people who connect that container to an application, which other programmers can use

#

or maybe you would be administering a data warehouse

#

stuff like that

#

possibly doing some "light duty" data analysis of your own

jagged stump Jun 22, 2019, 9:38 PM

#

Thanks for the answer. I am talking about a really big and good company

#

The data comes from autonomous cars

#

and they want hadoop, spark, also kubernetes, docker etc. They also mention creating new pipelines, and ML/DL is a plus

#

my mind is a mess now, idk. The company is awesome and the job would be better if it were data scientist, but it still looks cool because of the autonomous cars

desert oar Jun 22, 2019, 9:39 PM

#

yeah that would be a serious data engineering job

#

low latency pipelines handling huge amounts of data

jagged stump Jun 22, 2019, 9:40 PM

#

because the data will be real time. So it's level 5

#

also level 4 I guess, because they're supposed to do prediction too

#

so clearly level4-5 data engineer for autonomous cars

desert oar Jun 22, 2019, 9:41 PM

#

i dont know what levels you refer to

jagged stump Jun 22, 2019, 9:42 PM

#

in fact I dont know the levels well, I just know level 4 is like predictable data and level 5 is like real time data

desert oar Jun 22, 2019, 9:42 PM

#

ive never heard of these levels

jagged stump Jun 22, 2019, 9:46 PM

#

Here you go: https://dataconomy.com/2014/08/infographic-the-5-levels-of-big-data-maturity/

Dataconomy

Infographic: The 5 Levels of Big Data Maturity - Dataconomy


sand reef Jun 22, 2019, 11:22 PM

#

So. I am still waiting if anyone knows about such sources. Please do answer.

lean ledge Jun 22, 2019, 11:49 PM

#

@sand reef yes, increasing resolution can reduce accuracy in some scenarios. Might be a good idea to learn about Laplacian pyramids and multi-scale features from traditional computer vision. But you would expect a deeper CNN to generally learn Gaussian blurring filters to get over that, so it might have to do with your hyperparameters and how they're suboptimal for the new loss landscape

sand reef Jun 23, 2019, 12:49 AM

#

I see. Thanks! And @lean ledge do you know of a library with feature extraction algorithms for images like glcm, drlbp, etc.?

lean ledge Jun 23, 2019, 12:50 AM

#

I do not, sorry

sand reef Jun 23, 2019, 12:50 AM

#

Oh ok.

void anvil Jun 23, 2019, 2:46 AM

#

Data engineering for that is basically making high speed pipelines into ML models along with data validation and cleaning.

#

Personally that’s my least favorite part of the process, but it’s super valuable

#

What you want to know really well is data structures, pipeline methods, and outlier detection for the interview

silk forge Jun 23, 2019, 10:02 AM

#

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# `train` is a DataFrame defined elsewhere (not shown here)
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)

plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')
plt.xlabel("Engine size")
plt.ylabel("Emission")

#

plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')

#

i dont understand this line ^

#

why the [0][0] and stuff

desert oar Jun 23, 2019, 11:01 AM

#

Its bad

#

coef_ is a 2d array, i.e. a matrix

#

Looks like they're plotting the 0,0th element of coef_

#

Should be [0,0] not [0][0] - the second way works but it's not recommended or idiomatic

#

Meanwhile intercept is a 1d array, i.e. a vector
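
#

A quick illustration with stand-in arrays that have those shapes. The numbers are invented, not from the CO2 data:

```python
import numpy as np

# Stand-ins with the shapes described above: a 2D coef_ and a 1D intercept_
coef_ = np.array([[39.0]])       # shape (1, 1): a matrix with one element
intercept_ = np.array([125.0])   # shape (1,): a vector with one element

slope = coef_[0, 0]        # idiomatic: index both axes in one step
same_slope = coef_[0][0]   # works too: takes row 0, then element 0 of that row
offset = intercept_[0]

train_x = np.array([[2.0], [3.5]])   # engine sizes, one column
line = slope * train_x + offset      # y = slope * x + intercept
print(line.ravel())
```

So that plot call just draws the fitted straight line, slope times x plus intercept, over the training x values.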

fierce shadow Jun 23, 2019, 11:07 AM

#

Hey I was thinking about to get started with machine learning, just wanted to know what math is required ?

silk forge Jun 23, 2019, 11:09 AM

#

linear algebra

#

differential calculus and linear equations

#

@fierce shadow

fierce shadow Jun 23, 2019, 11:11 AM

#

anything more ?

silk forge Jun 23, 2019, 11:11 AM

#

probability

#

ik a course for you

fierce shadow Jun 23, 2019, 11:11 AM

#

hmm...okay...really ?

#

can you suggest one please ?

silk forge Jun 23, 2019, 11:12 AM

#

https://courses.edx.org/courses/course-v1:Microsoft+DAT256x+1T2019a/course/

#

Essential math for machine learning

fierce shadow Jun 23, 2019, 11:13 AM

#

thanks

lean ledge Jun 23, 2019, 11:28 AM

#

@fierce shadow there's a "mathematics of machine learning" post pinned on this channel that summarises the maths used

#

Right here actually https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568

Towards Data Science

The Mathematics of Machine Learning

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using…

lapis sequoia Jun 23, 2019, 11:28 AM

#

@sand reef sry for ping but i got a life or death situation again

#

i am loading few images from a directory but their shape is (250, 250, 2)

#

why is an images shape 3?

#

when i printed it

#

it has one pair of useless brackets

#

how can i remove it?

earnest prawn Jun 23, 2019, 11:30 AM

#

The image's shape is 3D because you have pixel x, pixel y, and colour channels

lapis sequoia Jun 23, 2019, 11:30 AM

#

how can i reshape it?

earnest prawn Jun 23, 2019, 11:31 AM

#

You usually don't want to?

lapis sequoia Jun 23, 2019, 11:31 AM

#

oh

lean ledge Jun 23, 2019, 11:31 AM

#

Why do you want to reshape it?

lapis sequoia Jun 23, 2019, 11:31 AM

#

so its fine?

earnest prawn Jun 23, 2019, 11:31 AM

#

Yeah it's perfectly fine

lapis sequoia Jun 23, 2019, 11:31 AM

#

okay thanks

lean ledge Jun 23, 2019, 11:31 AM

#

Image data is spatial. CNNs are spatial. Can't reshape it in any way without screwing up how CNNs naturally work

earnest prawn Jun 23, 2019, 11:31 AM

#

I mean you could certainly make it smaller or bigger or just make it grey scale (aka 1 channel) if you wanted to

lean ledge Jun 23, 2019, 11:32 AM

#

You can up or down sample etc or do other similar preprocessing but you can't reshape the network layer without losing properties
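
#

A small numpy sketch of those shapes, using a hypothetical 250x250 image with 3 colour channels:

```python
import numpy as np

# Hypothetical colour image: height x width x colour channels
image = np.zeros((250, 250, 3), dtype=np.uint8)
image[..., 0] = 255   # fill the first channel

height, width, channels = image.shape

# Collapsing the channel axis gives a 2D greyscale-like image
# (a plain average; real converters usually weight the channels)
grey = image.mean(axis=2)

# Downsampling by taking every 5th pixel keeps the spatial layout intact
small = image[::5, ::5]
print(image.shape, grey.shape, small.shape)
```

Both operations keep neighbouring pixels next to each other, which is the part a CNN relies on.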

lapis sequoia Jun 23, 2019, 11:32 AM

#

okay

round crane Jun 23, 2019, 4:53 PM

#

anyone knows how to make pie charts with matplotlib.figure?

olive willow Jun 23, 2019, 4:55 PM

#

you mean matplotlib.pyplot ? @round crane

round crane Jun 23, 2019, 4:55 PM

#

no

#

i don't mean that

#

i'm in fact explicitly avoiding that

olive willow Jun 23, 2019, 4:56 PM

#

https://matplotlib.org/3.1.0/gallery/pie_and_polar_charts/bar_of_pie.html#sphx-glr-gallery-pie-and-polar-charts-bar-of-pie-py

#

this should be it

round crane Jun 23, 2019, 4:56 PM

#

first line import matplotlib.pyplot as plt

olive willow Jun 23, 2019, 4:56 PM

#

idk I clicked figure

#

sry

#

so you mean this kind of import https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.figure.html

round crane Jun 23, 2019, 4:57 PM

#

right this one

olive willow Jun 23, 2019, 4:57 PM

#

cuz what I clicked was as an example of it

round crane Jun 23, 2019, 4:57 PM

#

really

olive willow Jun 23, 2019, 4:58 PM

#

yh

#

look

#

it's the last column, 5th row

#

bar of pie

round crane Jun 23, 2019, 4:59 PM

#

but https://matplotlib.org/3.1.0/faq/howto_faq.html#matplotlib-in-a-web-application-server

#

this mentions figure.figure not pyplot.figure

olive willow Jun 23, 2019, 4:59 PM

#

oohhh you want it to be in a flask site?

round crane Jun 23, 2019, 4:59 PM

#

not necessarily

#

but yes a web-app

olive willow Jun 23, 2019, 5:00 PM

#

I don't think it makes a difference

#

@earnest prawn please explain this

#

from matplotlib.figure import Figure

round crane Jun 23, 2019, 5:01 PM

#

right

olive willow Jun 23, 2019, 5:01 PM

#

from matplotlib.pyplot import figure

#

what's the difference

#

I think that the first one only imports the figure module

#

and the second one too

round crane Jun 23, 2019, 5:02 PM

#

the FAQ says without pyplot

olive willow Jun 23, 2019, 5:02 PM

#

idk

earnest prawn Jun 23, 2019, 5:02 PM

#

whats there to explain

#

you got your examples and youre done

olive willow Jun 23, 2019, 5:02 PM

#

what's the difference ?

#

or is there none?

earnest prawn Jun 23, 2019, 5:02 PM

#

pyplot can actually generate figures if you want it to

olive willow Jun 23, 2019, 5:02 PM

#

I think that they're the same

earnest prawn Jun 23, 2019, 5:02 PM

#

so pyplot is just a bit higher level

olive willow Jun 23, 2019, 5:03 PM

#

oohhhh so like more functionality

earnest prawn Jun 23, 2019, 5:03 PM

#

but internally (and, done the right way, also externally) you can create figures with it

#

no its less functionality

#

pyplot internally uses figures

olive willow Jun 23, 2019, 5:03 PM

#

I mean pyplot

earnest prawn Jun 23, 2019, 5:03 PM

#

yeah basically

olive willow Jun 23, 2019, 5:03 PM

#

Danke!

#

you see, you know everything

#

they should make a nix bot

earnest prawn Jun 23, 2019, 5:04 PM

#

i in fact am guessing this based on seeing other matplotlib code which mentioned the creation of a fig variable

#

and as they also do it in this code, I am just making an assumption
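
#

For the original pie chart question, a minimal sketch of the pyplot-free route the FAQ describes, assuming matplotlib 3.1 or newer (where a bare Figure can save itself). The labels and sizes are made up for illustration.

```python
import io
from matplotlib.figure import Figure

# Build the figure directly, without pyplot's global state
fig = Figure(figsize=(4, 4))
ax = fig.subplots()

# Made-up data, purely for illustration
labels = ["a", "b", "c"]
sizes = [50, 30, 20]
ax.pie(sizes, labels=labels, autopct="%1.0f%%")

# Render to an in-memory PNG, e.g. to return from a web app view
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```

The plotting calls on `ax` are the same ones you would use after `plt.subplots()`; the only difference is who creates and keeps track of the figure.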

sand reef Jun 23, 2019, 5:43 PM

#

I have a question. I want to make a face detection system. Just detection, nothing else. How do I get a good dataset to train my network in? I want to do it with live video feed.

#

And what model is good for it?

#

I want to do something for a small self project.

#

I tried training a small network with 32 conv net filters, dropout layers and a 36 neuron network. With just a resolution of 100x100. I used 1k images with face and 1k without.

#

Which I captured using the video camera.

#

How do I proceed? I tried doing it and the issues I see is, the model kind of does and doesn't recognize the face at different angles and distances.

#

I thought of using the hidden layers and generating Euclidean distance to find similarity between images and not, but I am not sure if that will work. Will it?

#

And I am also not sure how would I implement it.

#

Sorry. The resolution is 50x50

hallow pendant Jun 23, 2019, 6:47 PM

#

import re

pattern = r"^gr.y$"

if re.match(pattern, "grey"):
    print("Match 1")

if re.match(pattern, "gray"):
    print("Match 2")

if re.match(pattern, "stingray"):
    print("Match 3")

#

this outputs match 1 and 2

#

and someone explain meta characters to me??
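
#

A short sketch of what the three metacharacters in that pattern do, using a few extra test words that are not from the original snippet: `^` anchors the start, `.` stands for exactly one character, and `$` anchors the end.

```python
import re

pattern = r"^gr.y$"   # ^ = start of string, . = any one character, $ = end

# "gry" and "groovy" are extra illustrative words, not from the chat
words = ["grey", "gray", "stingray", "gry", "groovy"]
results = {word: bool(re.match(pattern, word)) for word in words}

for word, matched in results.items():
    print(word, matched)
```

"stingray" fails because the string has to start with "gr", and "gry" and "groovy" fail because `.` consumes exactly one character between "gr" and "y".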

sand reef Jun 23, 2019, 6:56 PM

#

@hallow pendant wrong channel. Go to one of the help channels

hallow pendant Jun 23, 2019, 7:56 PM

#

oh sorry

jagged stump Jun 23, 2019, 10:55 PM

#

I want to ask something. For example, I want data for a CNN: the Mercedes-Benz brand logo. I didn't want a ready-made logo dataset, so I wanted to make my own. I downloaded the first 40 images automatically from Google image search for different keywords, like "mercedes benz logo", "mercedes benz brand", etc. But some of them are really not about Mercedes-Benz, so the data is not well picked. What is the best way to classify this data and filter out the non-Mercedes-Benz images? Thanks a lot

velvet thorn Jun 23, 2019, 11:22 PM

#

well...that's what you're building the CNN for, right?

#

you could a. do it yourself or b. pay someone to do it for you

#

you could run unsupervised classification

lapis sequoia Jun 24, 2019, 1:09 AM

#

@jagged stump if you just want to filter images between sets, you can use microsoft computer vision api.. it's easy, just upload a couple of images to train it

strong flare Jun 24, 2019, 2:36 AM

#

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2000, 10000)
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')

plt.title("Simple Plot")

plt.legend()

plt.show()

Hi guys, may i know how can i set ytick to number not in scientific notation ?

#

this is what the graph looks like

📎 unknown.png

lean ledge Jun 24, 2019, 2:40 AM

#

you want the y axis to write 1 followed by 9 zeroes????

#

1000000000
2000000000
3000000000

strong flare Jun 24, 2019, 2:44 AM

#

yes

#

you have any idea ? 😀

desert oar Jun 24, 2019, 3:13 AM

#

ax.set_yticklabels should do it

lapis sequoia Jun 24, 2019, 3:21 AM

#

what's the support? (in classification tasks) i'm trying to make sense of my classification report..

#

I trained a classifier for 3 classes, I have the precision recall f1 score and support for 120k examples.. and it shows for individual classes, the support is 40k

#

not sure what that means
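
#

In scikit-learn's classification report, support is the number of true samples of each class that the other metrics were computed on, so 120k examples over 3 balanced classes gives 40k each. A tiny sketch with made-up labels, counted by hand rather than through scikit-learn:

```python
from collections import Counter

# Made-up true labels for a 3-class problem (illustration only)
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat", "dog", "bird"]

# Support = number of true samples per class
support = Counter(y_true)
total = sum(support.values())

print(dict(support), total)
```

A very uneven support column is a hint that per-class precision and recall matter more than overall accuracy.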

strong flare Jun 24, 2019, 3:30 AM

#

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np

def tickformat(x):
    if int(x) == float(x):
        return str(int(x))
    else:
        return str(x)  

    
# create the figure and axes in one call; a separate plt.figure() would leave an extra empty figure
fig, ax = plt.subplots(figsize=(8, 6))
    
x = np.linspace(0, 2000, 10000)
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

fmt = FuncFormatter(lambda x, pos: tickformat(x))
ax.yaxis.set_major_formatter(fmt)

plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")

plt.legend()

plt.show()

#

📎 unknown.png

lapis sequoia Jun 24, 2019, 3:31 AM

#

do you want your y in that scale?

strong flare Jun 24, 2019, 3:31 AM

#

@desert oar @lean ledge I just found this on Google and it works

lapis sequoia Jun 24, 2019, 3:31 AM

#

ok seems like you did

strong flare Jun 24, 2019, 3:32 AM

#

@lapis sequoia Yes, this is just a sample i need those number in figure not in scientific notation
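
#

A shorter variant of the same idea, as a sketch: matplotlib's built-in `StrMethodFormatter` can replace the hand-written `tickformat` helper. The format string here is an assumption about the wanted output (plain integers); `ax.ticklabel_format(style="plain", axis="y")` is another built-in route.

```python
import numpy as np
from matplotlib.figure import Figure
from matplotlib.ticker import StrMethodFormatter

fig = Figure(figsize=(8, 6))
ax = fig.subplots()

x = np.linspace(0, 2000, 10000)
ax.plot(x, x**3, label="cubic")

# Format every major y tick as a plain integer, with no exponent or offset
fmt = StrMethodFormatter("{x:.0f}")
ax.yaxis.set_major_formatter(fmt)
ax.legend()

print(fmt(3e9))
```

Using `"{x:,.0f}"` instead adds thousands separators, which helps with nine zeroes in a row.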

lapis sequoia Jun 24, 2019, 3:33 AM

#

ok

#

@lapis sequoia I'm having trouble understanding Support

jagged stump Jun 24, 2019, 5:17 AM

#

@lapis sequoia I prefer use python script

#

@velvet thorn so something like K-means? But how? HoG? SIFT? What is the way?

lapis sequoia Jun 24, 2019, 5:22 AM

#

??

jagged stump Jun 24, 2019, 5:49 AM

#

@lapis sequoia you suggested the Microsoft computer vision API. I have no idea what it even is. Can you give me more details?

lapis sequoia Jun 24, 2019, 5:55 AM

#

oh it seems like you want to use python.. so it's not an option

jagged stump Jun 24, 2019, 10:18 AM

#

There must be a way somehow. I have 100 images; 50 of them are Mercedes-Benz and the others are something different. Can't I somehow take only those 50 Mercedes-Benz ones? Maybe with some ML algorithm?

#

any idea?

hollow quartz Jun 24, 2019, 10:55 AM

#

Hi. I have a numerical variable that varies with a date-time variable. Can I use an ML algorithm to predict the numerical variable?

sand reef Jun 24, 2019, 2:56 PM

#

What's a good model to use for face detection? For live video feed?

wide gyro Jun 24, 2019, 3:58 PM

#

Switching from iterrows() to get_value for efficiency, but now I'm running into a problem with appending my copied dataframe

#

I have everything inside a for loop, for i in df.index:

#

and inside that loop I have a check statement and then dfCopy = dfCopy.append(i)

#

in which I am now getting TypeError: cannot concatenate object of type "<class 'int'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

#

I previously used for index, row in df.iterrows():

#

with the append statement being dfCopy = dfCopy.append(row)

#

which worked fine

#

nevermind seeing its being removed in a future release

desert oar Jun 24, 2019, 8:34 PM

#

chances are you don't actually need to loop over rows manually

#

what are you trying to achieve?

#

most of the time you think you need iterrows() you probably just want .apply(..., axis=1) instead
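
#

A sketch of what that can look like, assuming the check inside the loop only depends on the row's own values. The column name and threshold are made up, since the real condition was not posted.

```python
import pandas as pd

# Made-up frame; the column name and the threshold are assumptions
df = pd.DataFrame({"value": [5, 12, 7, 20]})

# Appending `i` from df.index fails because `i` is an int label, not a row.
# A boolean mask selects all matching rows in one step instead.
df_copy = df[df["value"] > 10].copy()

# If the check really needs the whole row, apply it row-wise
mask = df.apply(lambda row: row["value"] % 2 == 0, axis=1)
df_even = df[mask]

print(df_copy)
print(df_even)
```

The mask version is usually the faster of the two; `.apply(..., axis=1)` still calls a Python function per row.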

quartz stream Jun 25, 2019, 9:55 AM

#

Anyone on any start for voice authentication

#

I mean recognizing a person's voice

#

doesn't matter if initially its only for numbers

earnest prawn Jun 25, 2019, 1:15 PM

#

"initially only for numbers"??? a computer cannot make predictions on inputs which are not numeric

hollow quartz Jun 25, 2019, 2:11 PM

#

I have too many values in my scatter plot. How can I reduce the number of values on the x axis?

📎 Capture.PNG

desert oar Jun 25, 2019, 2:59 PM

#

ax.set_xticks @hollow quartz

vital bison Jun 25, 2019, 5:43 PM

#

has anyone here participated or is participating in a Two Sigma competition on Kaggle? I was interested in what the data set looks like, as the deadline has passed. If it's from an older one, that's no problem either. Please feel free to DM me

lapis sequoia Jun 25, 2019, 6:46 PM

#

I would love to learn data science, but I am still struggling with learning it.

sand reef Jun 25, 2019, 8:08 PM

#

Could I get a recommendation for a project to do? A project to do so that ppl will hire me for internship?

#

I know, wxpython, pyqt5, decent amount of tensorflow and keras, SQL and currently doing Django.

#

And I have been doing competitive in python and C++.

#

And I know cv2 and Pygame. And SFML in C++.

#

I checked online for good projects to do, and ppl were like reimplement GNU ld, objdump and make a binary recompiler or something. And I thought maybe I could do something with Facenet, turns out there is a library already with what I want to do. I wanted to use Facenet with live video feed. I am completely lost.

void anvil Jun 25, 2019, 10:11 PM

#

@sand reef scrape the weather station info from NOAA and try to do 5 minute ahead wind speed, direction, and day-ahead high / low temperatures

sand reef Jun 25, 2019, 11:34 PM

#

Oh. That's almost perfect. Thanks!

void anvil Jun 26, 2019, 1:43 AM

#

Common weather forecasts are: 5 minute ahead, 1 hour ahead, 1-7 days ahead, 90 day average temp

lapis sequoia Jun 26, 2019, 3:32 AM

#

That actually sounds really interesting

strong flare Jun 26, 2019, 7:47 AM

#

Hi guys, I have a dataframe with dates. May I know the easiest way to get the moving average?
I already have the min & max date

#

the moving avg like 3 day and 7 day

#

some of the solutions use pandas_datareader, but is that only for fetching data online?
https://stackoverflow.com/questions/40060842/moving-average-pandas
https://youtu.be/XWAPpyF62Vg

Stack Overflow

Moving Average- Pandas

I would like to add a moving average calculation to my exchange time series.

Original data from Quandl

Exchange = Quandl.get("BUNDESBANK/BBEX3_D_SEK_USD_CA_AC_000", authtoken="xxxxxxx")

Valu...

YouTube

The Academy

Python Pandas data analysis tutorial part 2

Welcome to the python pandas programming tutorial part 2. In this python pandas episode, we are going to calculate moving averages for a stock using python p...


strong flare Jun 26, 2019, 8:06 AM

#

ok I got it, I have to use df.rolling().mean()
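
#

A minimal sketch of the 3 day and 7 day averages with `rolling`, assuming the dates can be used as the index. The column names and values are made up.

```python
import pandas as pd

# Made-up daily data; the column names are assumptions
df = pd.DataFrame({
    "date": pd.date_range("2019-06-01", periods=10, freq="D"),
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
df = df.set_index("date").sort_index()

# Time-based windows: each row averages the rows from the last 3 / 7 days
df["ma_3d"] = df["value"].rolling("3D").mean()
df["ma_7d"] = df["value"].rolling("7D").mean()

print(df)
```

With an offset window like `"3D"` the first rows are averaged over whatever data is available; an integer window such as `rolling(3)` gives NaN until three rows exist.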

sand reef Jun 26, 2019, 1:02 PM

#

Can I get some advice on a series of algos I got from scikit image?

#

I need to take out texture features.

#

And there are a multitude of algos for it.

#

But what I see is: SIFT, SURF, Daisy, LBP, HoG and a binary descriptor extractor called ORB.

#

Is it of any use if I use multiple of them? Or just one suffices?

#

Here is the documentation : https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.hog

#

And could anyone point me to a source where I can understand why should I use any of them, and which one is preferred when?

lean ledge Jun 26, 2019, 1:11 PM

#

Computer vision is a rather large field in and of itself, with a very large body consisting of stuff inspired by signal processing, traditional algorithms, statistics and machine learning based techniques

#

Seriously though, it's a large field.
Take one example, SIFT. It's "Scale Invariant Feature Transform"
It takes an image and converts it into a difference of gaussian scale space representation for scale invariance (signal and image processing)
It finds features using generic multivariate calculus techniques + a discretised taylor series on the image, doing that over the entire new 3D scale space (general multivariate calculus)
It does feature matching (backed by a kd tree supported nearest neighbour searches) (traditional algorithms)
Uses hough transformations to identify clusters in the feature space and do voting (generic computer vision optimisation stuff)

#

There is no possible reference you can give that goes over the nuances of the techniques without a lot of background and explanation

sand reef Jun 26, 2019, 2:24 PM

#

Hmm. So that means back to square one.

#

And any directions for if I need to find some geometric features like depth of the retina, and for morphological features like lesions in the retina?

#

Something like the depth(or height? whichever makes more sense, its 2D) of the whitish region.

📎 CNV-103044-2.jpeg

#

Or abnormalities on the surface, like a little part of it being partially peeled.

#

And from the looks of it, I think that computer vision might be a field gradually explored by experience? Like very very gradual exploration

lean ledge Jun 26, 2019, 2:42 PM

#

Or just like, reading a book or going through a course? @sand reef

#

There's no need to be slow and gradual but also don't expect the field to be reducible to a cheat sheet

#

@sand reef https://www.edx.org/course/robotics-vision-intelligence-machine-pennx-robo2x Here's how I learnt it

edX

Learn how to design robot vision systems that avoid collisions, safely work with humans and understand their environment.

#

There's little specific to robots there so be scared not

sand reef Jun 26, 2019, 3:13 PM

#

Thanks! I'll look it up! Although, any pointers on what would I do for those requirements in my above question?

hollow hearth Jun 26, 2019, 3:16 PM

#

Hey guys, I have a pandas question. I am working with a dataset that has 'Download Date' and 'Customer Name' fields built into it. What I want to do is remove duplicate Customer Names based on the Download Date. For example, if this is the starting point:
Download Date: 06/10/2017 Customer Name: Jim Jacobs
Download Date: 06/10/2017 Customer Name: Mark Johnson
Download Date: 06/10/2017 Customer Name: Jim Jacobs
Download Date: 09/15/2018 Customer Name: Jim Jacobs

I want it to end like this:
Download Date: 06/10/2017 Customer Name: Jim Jacobs
Download Date: 06/10/2017 Customer Name: Mark Johnson
Download Date: 09/15/2018 Customer Name: Jim Jacobs

I have been trying to do it without iterating (I don't think I have to, I could be wrong). Can anyone point me in the right direction? My initial thoughts were to use .groupby() and do stuff with the groups but for some reason I cannot get that to work how I want it to.

Edit: I got it. I needed to pass the download date into the subset parameter of drop_duplicates.

chilly shuttle Jun 26, 2019, 3:59 PM

#

you can use drop_duplicates

#

the more general approach is to use groupby to aggregate by whatever fields you need and perform whatever operation you need

hollow hearth Jun 26, 2019, 7:10 PM

#

The only reason why I swayed from that is because the excel file that I was using was already grouped by the download date. Forgot to mention that D:. But I just ended up passing download date and customer name into drop_duplicates and that got me the results that I was looking for
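
#

For reference, a small sketch of both routes on the four example rows from the question. Only the column names and values shown there are used; treating the dates as plain strings is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Download Date": ["06/10/2017", "06/10/2017", "06/10/2017", "09/15/2018"],
    "Customer Name": ["Jim Jacobs", "Mark Johnson", "Jim Jacobs", "Jim Jacobs"],
})

# Rows count as duplicates only when both columns match; the first one is kept
deduped = df.drop_duplicates(subset=["Download Date", "Customer Name"])

# The groupby route: one group per (date, name) pair, here just counted
counts = df.groupby(["Download Date", "Customer Name"]).size()

print(deduped)
print(counts)
```

`drop_duplicates` is enough when you only want to discard repeats; `groupby` becomes useful once the repeated rows have to be aggregated in some way.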

true badger Jun 27, 2019, 4:03 AM

#

Do distances in PCA 2D space mean anything?

polar acorn Jun 27, 2019, 12:54 PM

#

Numerically, I don't think so i.e. a distance of 5 or 10 has no interpretation. But relative to each other I would think that a distance has some interpretation i.e. point pairs that have smaller distances are in some way more similar than point pairs with large distances, but I don't think there is a clear interpretation other than this vague idea of similarity. At least not without carefully examining the principal components.
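
#

One part of this can be made precise, assuming plain PCA without whitening: the projection is a rotation of the centred data followed by dropping axes. So a distance in the 2D plot is the original Euclidean distance with the discarded components ignored; it can shrink but never grow. A sketch on random made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))        # made-up data: 50 points, 5 features
Xc = X - X.mean(axis=0)             # PCA works on centred data

# The principal axes are the right singular vectors of the centred data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_full = Xc @ Vt.T                  # all 5 components: a pure rotation
Z_2d = Z_full[:, :2]                # keep only the first 2 components


def dist(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return np.linalg.norm(A[i] - A[j])


d_orig = dist(Xc, 0, 1)
d_full = dist(Z_full, 0, 1)         # same as d_orig
d_2d = dist(Z_2d, 0, 1)             # never larger than d_orig

print(d_orig, d_full, d_2d)
```

How much the 2D distances can be trusted therefore depends on how much variance the first two components explain.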

lapis sequoia Jun 27, 2019, 1:09 PM

#

Okay, so I want to use machine learning to try to categorize measurement series, where every data point is essentially a group of measurements (somewhere between 10 000 and 100 000 measurements for each data point). Does anyone know if there exists machine learning algorithms for these types of tasks, that will derive attributes from the data on its own, or will that have to be done manually?

polar acorn Jun 27, 2019, 1:15 PM

#

This could be done with an LSTM or a TCN. Do I understand it correctly if each timestep in your series has between 10 000 and 100 000 values associated with it? If that's the case maybe you should look into some form of dimensionality reduction before feeding to a classification model?

lapis sequoia Jun 27, 2019, 1:20 PM

#

Each timestep basically has as many values as i want. The more i use, the more accurate it becomes, as the measurements are prone to noise

#

I am basically looking at RTT delay measurements.

An example of what i want to be able to do is getting measurements between two points A and B, and a Server C. Then i want to be able to differentiate between the RTT of A-C and B-C. Most of the time the minimum value should be sufficient if i have enough measurements, but i want to be able to reduce the number of measurements to a point where i may not be able to tell from the minimum delay alone.

#

Does that clarify it a little bit? @polar acorn

polar acorn Jun 27, 2019, 1:53 PM

#

Sure! Now I do in no way have enough domain knowledge to actually be of use here, so I'll try making some assumptions. If I understand this correctly you have a series of measurement sets, for example [[3,4,3], [4,5,4,5], [5,5,2]] etc. of some arbitrary length, and you want to estimate whether this series belongs to the A-C class or the B-C class. I assume that you have several examples of both classes. I assume what sets A-C and B-C apart is some attribute of the series itself. Maybe A-C rises quicker or B-C oscillates more over time or whatever. In that case I would, for each time step, find the mean, std, minimum and maximum of the values sampled at that time and create a new time series with the same length but with those summary statistics at each time step, and then feed that to an LSTM model.
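
#

The summary step could be sketched like this, using the small example series from the message above. Choosing exactly these four statistics follows the suggestion; everything else is an assumption.

```python
import numpy as np

# Ragged series: one list of measurements per time step
series = [[3, 4, 3], [4, 5, 4, 5], [5, 5, 2]]

# One row per time step; columns are mean, std, min, max
features = np.array([
    [np.mean(step), np.std(step), np.min(step), np.max(step)]
    for step in series
])

print(features.shape)   # (3, 4)
```

The result has a fixed width per time step, which is what a sequence model expects even though the raw measurement sets differ in length.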

lapis sequoia Jun 27, 2019, 2:00 PM

#

Yea that was basically what i was wondering: If i had to derive values such as mean, std, minimum, etc. on my own, or if i could just feed the whole measurement series into a ML method that would do it on its own.

#

In regards to domain i am measuring RTT in a network with low traffic, so network traffic is generally considered noise i want to sort out, so maximum values wont be very interesting

#

📎 Skjermbilde_2019-06-25_kl._09.04.35.png

#

That is an example of the data i can get between 2 different points and a Server C

#

Anyway i should probably look into RNN and LSTM

polar acorn Jun 27, 2019, 2:17 PM

#

Another way could be to first compute summary stats for each time step and then use something like https://github.com/blue-yonder/tsfresh to automatically extract features from the whole of those time series, and then throw those features into whatever classifier you like. I'm afraid you have to create those summary stats yourself, but I'm sure you can set them up nicely in a pipeline together with your training.

GitHub

blue-yonder/tsfresh

Automatic extraction of relevant features from time series: - blue-yonder/tsfresh

lapis sequoia Jun 27, 2019, 2:22 PM

#

Yea, I was actually about to do something similar before i stopped myself and wondered if this was something ML could take care of entirely

#

Anyway, thanks so much for the input!

lapis sequoia Jun 28, 2019, 12:54 AM

#

deep learning would do a lot better of a job than machine learning ever could @lapis sequoia

earnest prawn Jun 28, 2019, 12:55 AM

#

Deep learning is a sub discipline of machine learning so your statement does simply not make any sense

eternal flare Jun 28, 2019, 1:24 AM

#

hey all can anyone look at some sample code ive been trying to get working? Im working on a simple trading algorithm using pandas and numpy and i feel like ive been looking at the code so long i cant see whats in front of my face

lapis sequoia Jun 28, 2019, 1:24 AM

#

!ask

arctic wedgeBOT Jun 28, 2019, 1:24 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

eternal flare Jun 28, 2019, 1:26 AM

#

Whats the simplest way to post code here?

#

Its over the 2,000 limit

lean ledge Jun 28, 2019, 1:26 AM

#

@lapis sequoia that is objectively false

#

Deep learning is rarely better than traditional ML in reality for a lot of reasons

lapis sequoia Jun 28, 2019, 1:28 AM

#

not from what I’ve read up about it js @lean ledge

#

from what I’ve read, deep learning is machine learning, but better cause of the neural networks @earnest prawn

eternal flare Jun 28, 2019, 1:29 AM

#

Can anyone tell me what I'm doing wrong when I'm trying to get my signal to buy in this simple trading algorithm application? https://docs.google.com/document/d/1d3LzcXByLDlF6a-amt-U4b_bOy8l6c2L109i9xU7OvE/edit?usp=sharing

Google Docs

stocks

import pandas as ps import matplotlib.pyplot as plt import numpy as np from datetime import * sma10 = 10 sma30= 30 Bank = 10000.00 charge = 5.00 transactions = 0 df = ps.read_csv('OilHistoryData.csv', parse_dates=['Date']) df['Date'] = ps.to_datetime(df['Date']) mask = (d...

lapis sequoia Jun 28, 2019, 1:30 AM

#

why’re you trying to get your signal to do that? @eternal flare

lean ledge Jun 28, 2019, 1:30 AM

#

@lapis sequoia neural networks doesnt make it better

#

Deep learning is very rarely better

#

Outside of the domains of CV and NLP, it's not even used much

eternal flare Jun 28, 2019, 1:31 AM

#

It's supposed to be a basic moving-average crossover strategy. When the simple moving average for 10 or 30 days is reached, a signal is put out to either buy or sell.
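
#

A generic sketch of a moving-average crossover signal in pandas. This is not the code from the linked document; the prices are made up and the windows are shortened from 10/30 to 2/4 so the toy data fits.

```python
import pandas as pd

# Made-up prices; the 2/4 windows stand in for the 10/30 day averages
prices = pd.Series([10, 11, 12, 13, 12, 11, 10, 9, 10, 11, 12, 13], dtype=float)
fast = prices.rolling(2).mean()
slow = prices.rolling(4).mean()

# 1 while the fast average is above the slow one, 0 otherwise
# (comparisons against the NaN warm-up values come out as 0)
position = (fast > slow).astype(int)

# A change in position marks a crossover: +1 = buy, -1 = sell
signal = position.diff().fillna(0)

buys = list(signal[signal > 0].index)
sells = list(signal[signal < 0].index)
print(buys, sells)
```

The very first buy here only reflects the slow average becoming available, so a real backtest would probably skip the warm-up period.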

lapis sequoia Jun 28, 2019, 1:33 AM

#

I concur.. DL hasn't had major applications over ML in applications outside of NLP.. I'm not sure about what they do in CV.. but image processing otherwise has had lot of impact

lapis sequoia Jun 28, 2019, 3:13 AM

#

well then why exactly isn’t it ever used much, in that case? @lean ledge

lean ledge Jun 28, 2019, 3:47 AM

#

Time to train
Amount of data needed
Being finicky in training

#

Getting ample clean data is ridiculously hard in the real world

#

Unlike traditional models, getting training right is a hard part of DL

lapis sequoia Jun 28, 2019, 6:21 AM

#

yeah, but yet it still manages to be better in terms of accuracy @lean ledge

lean ledge Jun 28, 2019, 6:52 AM

#

@lapis sequoia please don't comment on stuff you don't know.

#

It's best to not

lapis sequoia Jun 28, 2019, 6:55 AM

#

there are literally full on articles online talking about how deep learning's better than machine learning @lean ledge

lean ledge Jun 28, 2019, 6:57 AM

#

Congrats there's articles saying the opposite also. Your point?

#

I've listed out the reasons why deep learning is rarely the solution for things outside CV and NLP. Think you know better? Tell me why I'm wrong instead of referring to "there's articles"

#

@lapis sequoia

sand reef Jun 28, 2019, 7:11 AM

#

If I am not wrong, the amount of data needed for DL is the biggest drawback, right?

silent swan Jun 28, 2019, 7:25 AM

#

there're the generic things like: lots of data, big compute (varies depending on algo), large memory requirement

#

there're more hand-wavey things specific to specific areas like "imposing structure on your problem/solution"

#

DL is big and very good and progressing quickly. It's not all of ML, but it's certainly the most exciting and quickly progressing part of ML right now

#

but it's also because DL is big and exciting that you don't hear much about regular ML (which is still everywhere, and growing), because it's not that exciting to read about

#

no one wants to read about 1000 small companies using collaborative filtering or gradient boosted trees

#

everyone wants to read about deepnudes

lapis sequoia Jun 28, 2019, 9:19 AM

#

lol deepnudes

quartz stream Jun 28, 2019, 1:18 PM

#

anyone

#

wanna do a project

#

on Speaker Verification ?

#

creating a complete library where we just take voice of user say from number 1 to 10

#

and create a model that works for that user only

#

there are no ready made material for this so more the merrier

#

maybe a publish a good research paper on our technique.

lapis sequoia Jun 28, 2019, 1:30 PM

#

guys i have an image with shape (120, 120, 2)
how do i show it in matplotlib?

#

import os
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np


DOGS_DIRECTORY = "training_set/dogs/"
CATS_DIRECTORY = "training_set/cats/"
cats = os.listdir(CATS_DIRECTORY)
dogs = os.listdir(DOGS_DIRECTORY)
def load_data_dogs():
  x = []
  for dog in dogs:
    if dog[-4] == ".":
      img = Image.open(DOGS_DIRECTORY + dog)
      img = img.resize((120, 120))
      img = img.convert('LA')
      x.append(np.array(img))
  return x

x = load_data_dogs()
plt.imshow(x[0])

#

when i run this code

#

TypeError                                 Traceback (most recent call last)
<ipython-input-30-2e8a46bc51f4> in <module>()
     20 
     21 x = load_data_dogs()
---> 22 plt.imshow(x[0])
     23 

3 frames
/usr/local/lib/python3.6/dist-packages/matplotlib/image.py in set_data(self, A)
    636         if not (self._A.ndim == 2
    637                 or self._A.ndim == 3 and self._A.shape[-1] in [3, 4]):
--> 638             raise TypeError("Invalid dimensions for image data")
    639 
    640         if self._A.ndim == 3:

TypeError: Invalid dimensions for image data

#

please ping me when answering Thank you

lapis sequoia Jun 28, 2019, 5:49 PM

#

how're you able to program an image with shape? @lapis sequoia

earnest prawn Jun 28, 2019, 6:03 PM

#

@lapis sequoia images always have a shape. What plt.imshow expects, however, is either a 2D array or an image with 3 or 4 channels (the format being (X pixels, Y pixels, channels)). @lapis sequoia what type of image is this to only have two channels?
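
#

The two channels come from `img.convert('LA')`, which is luminance plus alpha. A minimal sketch of one way around the error, using a made-up stand-in array instead of the real dog images: keep only the luminance channel so the array is 2D. Converting with mode `'L'` instead of `'LA'` would give a single channel directly.

```python
import numpy as np

# Stand-in for np.array(img) after img.convert('LA'): luminance + alpha
la = np.zeros((120, 120, 2), dtype=np.uint8)
la[..., 0] = 128      # luminance channel
la[..., 1] = 255      # alpha channel (fully opaque)

# imshow accepts a 2D array, so keep only the luminance channel
grey = la[..., 0]     # shape (120, 120)
print(grey.shape)

# then, in the notebook: plt.imshow(grey, cmap="gray")
```

For a Conv2D input the channel axis can be added back afterwards with a reshape to (120, 120, 1).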

dull fern Jun 28, 2019, 6:48 PM

#

@naive shore have you tried to change dtype parameter ?

lapis sequoia Jun 28, 2019, 7:49 PM

#

no, I mean I was taken a bit aback that he was even able to literally program his own entire image too, although I shouldn't have been, now thinking back @earnest prawn

earnest prawn Jun 28, 2019, 8:00 PM

#

What

lapis sequoia Jun 28, 2019, 8:43 PM

#

I was referring to @lapis sequoia with my comment @earnest prawn

lean ledge Jun 28, 2019, 9:42 PM

#

@lapis sequoia Can you stop pinging random people with dumb questions? You've been asked to do so by an admin already

lapis sequoia Jun 28, 2019, 9:43 PM

#

sorry, didn’t mean to ping you asshole @lean ledge

south quest Jun 28, 2019, 9:50 PM

#

@lapis sequoia That's not an appropriate way to address others, you've already been warned for your behaviour, watch it.

lapis sequoia Jun 28, 2019, 9:54 PM

#

well tbqf, mr. “dumb questions” wasn’t being appropriate himself, with the tone he was conveying, so @south quest

obsidian linden Jun 28, 2019, 9:56 PM

#

could agree that was also not polite. Still using '@' on every post is not necessary and very annoying

#

In rooms with so little traffic it should not be needed

#

It triggers a notification

lapis sequoia Jun 28, 2019, 9:58 PM

#

it’s how I usually directly convey a message to someone in a public chat usually, although I’ve been trying to refrain from doing it as much recently

obsidian linden Jun 28, 2019, 10:01 PM

#

Probably a good thing. Using @ notifies the user directly even sometimes with sound. Can be very annoying 😃

#

It adds importance and urgency to your message, something that is almost never needed

void anvil Jun 28, 2019, 10:31 PM

#

I'm running some custom data feature creation on WSL. I'm getting a "kernel has died" error pretty consistently on the run through. Individually, each function runs through and the resulting dataframe when dumped to .csv should be 1.7 gb (~1.4M rows, ~100 columns) and fits in memory. I've created this for a slightly smaller data set (~1.2m rows, what the estimates are based off of). Is this happening due to an out of memory error for the WSL subsystem?

Error dump from the console:
https://pastebin.com/X1WrHrLS

Pastebin

7f602abd0000-7f602ac50000 rw-p 00000000 00:00 0 7f602ac50000-7f60...

grave void Jun 28, 2019, 10:32 PM

#

I am looking to use Grafana to plot some data via influxDB and am not sure how to structure my data

#

I have a CSV file with a column of scores (0,10) and a timestamp for each score

#

Does anyone here have any experience with Grafana/InfluxDB and how I should go about posting my data?

#

Not sure if I should use a data_frame

lapis sequoia Jun 29, 2019, 2:04 AM

#

@silent swan you seem to know your stuff in terms of machine & deep learning, you a dev for either?
(referring to this):

📎 Screen_Shot_2019-06-28_at_3.13.43_AM.png

versed wyvern Jun 29, 2019, 6:47 AM

#

AYE

#

im here for the DS!

sand reef Jun 29, 2019, 9:41 AM

#

@lapis sequoia it should always be 3D image. When it's black and white, it has 1 channel. Meaning it's dimension is [X, Y, 1]

#

So try using numpy.reshape

lapis sequoia Jun 29, 2019, 9:46 AM

#

👍

#

Solved it yesterday but thanks

#

But I get another problem as usual

#

I don’t think this is good

📎 image0.png

#

How can I reduce the over fitting

#

from keras.models import Sequential
from keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Activation


x_train, y_train = load_train_data()
x_test, y_test = load_test_data()
x_train = x_train.reshape((-1, 120, 120, 1))
print(x_train.shape, y_train.shape)

model = Sequential()
model.add(Conv2D(120, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, shuffle=True)

I already normalised the data

earnest prawn Jun 29, 2019, 10:13 AM

#

Maybe a dropout between the conv and the linear part of the model could help @lapis sequoia

lapis sequoia Jun 29, 2019, 10:13 AM

#

so a dropout after the first layer?

earnest prawn Jun 29, 2019, 10:14 AM

#

I'd say after the max pooling but i am honestly not exactly sure

lapis sequoia Jun 29, 2019, 10:14 AM

#

okay lemme try after the maxpooling

earnest prawn Jun 29, 2019, 10:21 AM

#

@lapis sequoia gtg now, I'll be back later but I'm sure if this doesn't help you'll find other people with more ideas

lapis sequoia Jun 29, 2019, 10:21 AM

#

okay thanks

#

it did show slight improvement but very minute
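
#

For context, a NumPy sketch of what a dropout layer does at training time (inverted dropout). In the Keras model above this would be a `Dropout` layer from `keras.layers` added after the max pooling; the rate of 0.25 and the batch shape are assumptions.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Inverted dropout as applied during training."""
    keep = rng.random(activations.shape) >= rate   # True where a unit survives
    return activations * keep / (1.0 - rate)       # rescale so the mean is kept


rng = np.random.default_rng(0)
pooled = np.ones((4, 59, 59, 120))   # hypothetical batch of pooled feature maps
dropped = dropout(pooled, rate=0.25, rng=rng)

print(np.unique(dropped), dropped.mean())
```

Because units are dropped at random on every batch, the dense layers cannot lean on any single feature, which is why it tends to narrow the gap between training and validation accuracy. More data or augmentation usually helps more than dropout alone.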

silk forge Jun 29, 2019, 1:25 PM

#

whats a good way to visualize pandas data sets

#

python interpreter can't fit all the data 😫

#

what if i wanted to see all the data

brazen wing Jun 29, 2019, 1:47 PM

#

That would depend on the data

lapis sequoia Jun 29, 2019, 1:52 PM

#

@silk forge pandas profiling? or you can do a group by and do size()

#

for each group

jagged stump Jun 29, 2019, 1:59 PM

#

so what about sparkSQL, does anyone know it very well?

lapis sequoia Jun 29, 2019, 2:07 PM

#

what about it

#

depends why you want to use it

jagged stump Jun 29, 2019, 2:16 PM

#

I have a case study from a company about data analytics. I have never done anything with big data, but this data looks fairly big; it is from OpenStreetMap (OSM).

#

So I must do something with sparkSQL (I know basic SQL) and/or Spark's MLlib

#

I am looking for examples and a good introduction to these topics

lapis sequoia Jun 29, 2019, 2:18 PM

#

if your purpose is to dashboard the data or use sql for running queries on it.. then you're better off using big query, etc..

#

if you want to write nested queries on a large dataset, go with sparksql

silk forge Jun 29, 2019, 2:19 PM

#

hey

jagged stump Jun 29, 2019, 2:21 PM

#

I have to go with sparkSQL because, as I already said, it's such big data

lapis sequoia Jun 29, 2019, 2:31 PM

#

sparksql is not the only tool for handling big data..

jagged stump Jun 29, 2019, 2:34 PM

#

I want to explore the data with it

crude parcel Jun 29, 2019, 6:09 PM

#

yeah what do people prefer with big data and python?

#

i personally hate spark

#

good for its time but i dont like it

jagged stump Jun 29, 2019, 6:38 PM

#

in fact I'm supposed to know scala, but I'm too lazy to learn it 😛

crude parcel Jun 29, 2019, 7:04 PM

#

not a big fan of scala either

#

there are lots of other options too, in python. Maybe some that have CPython involved too.

stoic rune Jun 29, 2019, 8:27 PM

#

How do you calculate a probability vector in Python? https://scipy.github.io/devdocs/generated/scipy.spatial.distance.jensenshannon.html

jagged stump Jun 29, 2019, 9:00 PM

#

Row(tags=[Row(key=bytearray(b'highway'), value=bytearray(b'residential')), Row(key=bytearray(b'name'), value=bytearray(b'Honv\xc3\xa9d utca'))])

This is a sample of the map data, taken from the first row. So how can I get every row's data, like key=highway and value=residential, with sparkSQL?

#

any idea

celest moss Jun 30, 2019, 12:48 PM

#

Hey guys ! I am new to machine learning. I want to understand the math used in the sklearn library. Can you recommend any books ?

safe merlin Jun 30, 2019, 12:54 PM

#

EVERYONE LISTEN, WHICH IS BETTER JUPYTER NOTEBOOK OR JUPYTER LABS

#

OOOF

simple crag Jun 30, 2019, 12:56 PM

#

Type like an adult

lean ledge Jun 30, 2019, 1:15 PM

#

@celest moss KF Riley's Mathematical methods for physics and engineering has always been my maths reference of choice

hollow shard Jun 30, 2019, 6:56 PM

#

Does anyone here know a good tutorial where the derivatives for the first layer of a neural net with one hidden layer are derived?

#

All the ones that I've read so far gloss over it

#

Also, really sorry for the off-topic question, but does anyone know where I could ask a broad question about fluid simulations with python?

jagged stump Jun 30, 2019, 7:13 PM

#

 |-- version: integer (nullable = true)
 |-- timestamp: long (nullable = true)
 |-- changeset: long (nullable = true)
 |-- uid: integer (nullable = true)
 |-- user_sid: binary (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: binary (nullable = true)
 |    |    |-- value: binary (nullable = true)
 |-- nodes: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- index: integer (nullable = true)
 |    |    |-- nodeId: long (nullable = true)

#

I want to parse this data with sparkSQL. Does anyone know how?

#

Row(tags=[Row(key=bytearray(b'highway'), value=bytearray(b'residential')), Row(key=bytearray(b'name'), value=bytearray(b'Honv\xc3\xa9d utca'))])

#

Sample of the data. The tags column holds key-value pairs, as you can see. For example, I want to get the rows with key=highway and value=primary

proud iris Jul 1, 2019, 6:35 PM

#

Offtopic, and MATLAB. So thing is I'm interfacing with ros and subscribing to some topics in matlab

#

When I'm using the rossubscriber command in terminal, everything is working fine, but when I'm trying to put it in a .m code, for example

Sub=rossubscriber('point_data')
Recv=Sub.LatestMessage
Recv.Data

The code is not printing anything

#

The same code works in the terminal. A note here, the point_data has a custom message, I've tried the same code with other topics with standard messages and they worked in both terminal and .m file

#

I can't understand why something will behave differently in terminal and code

dull fern Jul 1, 2019, 6:57 PM

#

@hollow shard https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ explains backpropagation very well

Matt Mazur

Mazur

A Step by Step Backpropagation Example

Background Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example…

hollow shard Jul 1, 2019, 6:57 PM

#

oh thanks

#

much obliged 👍

lyric canopy Jul 1, 2019, 7:19 PM

#

https://discordapp.com/invite/bBMbNCT

#

This invite was removed by our filters, but it doesn't quite fit what we mean by advertisement. I think it was meant for @proud iris by @polar acorn

proud iris Jul 1, 2019, 7:22 PM

#

Thanks!

faint spruce Jul 1, 2019, 7:44 PM

#

Hey guys, I have a small question about Arrays and Lists. Is this the right place to ask?

lyric canopy Jul 1, 2019, 7:48 PM

#

That probably depends on your question. If it's a more general Python question, you're probably better off in a help channel, since you'll get a much quicker response there. If it's more touching on data science or a related field, then this channel is probably a better fit.

faint spruce Jul 1, 2019, 7:54 PM

#

Ok, it is actually very basic. I need to compare lists to arrays for an exam. If I want to use an array I have to declare it first, right? So I would need to specify the data type it uses and how big it will be

lyric canopy Jul 1, 2019, 7:57 PM

#

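It depends a bit on what kind of array you're talking about. When you construct a Python array.array, you do indeed set the type, but it doesn't have a fixed size. You can still append to it just like a regular list. A numpy.ndarray also has a fixed element type, but its size is fixed once it is created: numpy.append returns a new array instead of growing the existing one in place.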

faint spruce Jul 1, 2019, 7:57 PM

#

yeah

#

thats why I am confused

lyric canopy Jul 1, 2019, 7:58 PM

#

There's another difference that's more important

faint spruce Jul 1, 2019, 7:58 PM

#

Because that was my understanding of Arrays and then I saw the following code

#

a = arr.array('d', [1.1, 3.5, 4.5])

#

and there was no specification of size

lyric canopy Jul 1, 2019, 7:58 PM

#

Yes, this sets the type (first argument) and initializes it with three elements (from the iterable that's provided as the second argument)

#

But, you can still append to it

#

a.append(9.2)

#

It's just like a python list: a = list([1.1, 3.5, 4.5])

#

It's just that if you were to look at how the two are defined in memory, then you'd see something entirely different

faint spruce Jul 1, 2019, 8:00 PM

#

yeah

#

thats what I am getting at

#

so isn't that type of array more of a linked list?

lyric canopy Jul 1, 2019, 8:00 PM

#

No

#

A Python list doesn't actually contain the values in memory, but rather references to values. In this case, you'd have three float objects somewhere in memory, 1.1, 3.5, and 4.5, and the list object holds references to those objects

#

So, the list itself doesn't really care for the size of objects, because it only ever holds references to objects "somewhere" else

faint spruce Jul 1, 2019, 8:01 PM

#

hm

lyric canopy Jul 1, 2019, 8:02 PM

#

An array.array, in contrast, doesn't store references, but rather the binary representation of the values itself

faint spruce Jul 1, 2019, 8:02 PM

#

ok

#

and that binary representation is stored all in the same place, right?

lyric canopy Jul 1, 2019, 8:02 PM

#

So, if you were to look in memory, you'd really see those three floats represented as bits, sequentially, in an array
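The difference can be seen from Python itself. A small sketch using only the standard library, with the values from the example above (the exact object size reported by `sys.getsizeof` is an implementation detail and may differ between Python builds):

```python
import array
import sys

values = [1.1, 3.5, 4.5]
as_list = list(values)
as_array = array.array('d', values)

# Each array element is a raw 8-byte C double stored inline.
print(as_array.itemsize)          # 8
print(len(as_array.tobytes()))    # 24, i.e. 3 elements * 8 bytes

# The list only holds references; every element is a full float object.
print(type(as_list[0]))
print(sys.getsizeof(as_list[0]))  # size of one float object

# Both containers grow dynamically.
as_array.append(9.2)
as_list.append(9.2)

# The array enforces its element type; the list accepts anything.
try:
    as_array.append("text")
except TypeError as exc:
    print("array rejected a str:", exc)
as_list.append("text")
```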

faint spruce Jul 1, 2019, 8:02 PM

#

ah

#

And a linked list is something different, right?

lyric canopy Jul 1, 2019, 8:03 PM

#

Yes

faint spruce Jul 1, 2019, 8:03 PM

#

so there are 3 different types of ways to store lists/ collections of data

lyric canopy Jul 1, 2019, 8:04 PM

#

In a linked list, there is no sequence of values in memory, one after the other. Rather, you have an element that holds a reference to the next element, which holds a reference to the next, and so on

#

So, you've got separated things in memory, but they reference each other in a linked chain, so a linked list.

faint spruce Jul 1, 2019, 8:04 PM

#

ok, so the reference is stored with the pieces of data?

lyric canopy Jul 1, 2019, 8:05 PM

#

Yes and each value just references the next one (singly linked list) or both the previous and the next one (doubly linked list)

#

But, to get to the third item, you need to start with the first, get the address of the second, then look at the second to get the third

#

Since the first doesn't know where the third is, it only knows where the second is
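A minimal sketch of a singly linked list that shows this walk. The `Node` class and `get` function are hypothetical names written for illustration, not something from the standard library:

```python
class Node:
    """One element of a singly linked list: a value plus a reference to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def get(head, index):
    """Walk from the head, following one reference per step, up to position `index`."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError("index out of range")
        node = node.next
    if node is None:
        raise IndexError("index out of range")
    return node.value


# Build the chain 1.1 -> 3.5 -> 4.5.
head = Node(1.1, Node(3.5, Node(4.5)))

# Reaching the third item takes two hops: first -> second -> third.
print(get(head, 2))  # 4.5
```

Compare this with a list or array, where the position of item `i` can be computed directly from the start address.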

faint spruce Jul 1, 2019, 8:06 PM

#

jeez. I didn't think there were this many caveats to something as simple as storing packets of data. That's so interesting

lyric canopy Jul 1, 2019, 8:06 PM

#

The problem with a regular list or array.array is that when you want to insert or remove something from the middle or the beginning, you need to move all the values after it, since you can't have a gap

#

With a linked list, if you want to cut out, say the sixth element, you only need to change the address the fifth refers to: let it point to what was the seventh instead of the sixth

#

This makes for very fast "popping" at both ends, and that's why the Python deque is built on such a doubly linked list (in CPython, a doubly linked list of fixed-size blocks).
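A short sketch of the same operations on a `list` and a `collections.deque`, using made-up integers:

```python
from collections import deque

items_list = [1, 2, 3, 4, 5]
items_deque = deque(items_list)

# Removing from the front of a list shifts every remaining element: O(n).
first_from_list = items_list.pop(0)

# A deque removes or adds at either end without shifting: O(1).
first_from_deque = items_deque.popleft()
last_from_deque = items_deque.pop()
items_deque.appendleft(0)

print(items_list)   # [2, 3, 4, 5]
print(items_deque)  # deque([0, 2, 3, 4])
```

The trade-off is that indexing into the middle of a deque is slower than indexing into a list.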

faint spruce Jul 1, 2019, 8:08 PM

#

Can I recap if I got everything right?

lyric canopy Jul 1, 2019, 8:09 PM

#

Sure

faint spruce Jul 1, 2019, 8:11 PM

#

so there are more or less 3 different ways to store collections of data

#

An array just takes some space in memory and then stores them one after another

#

A python list, doesn't store them in the same place, but stores the location of where the actual data is stored

#

And a linked list is stored "all over the place" with references from each element to the next one?

lyric canopy Jul 1, 2019, 8:13 PM

#

And sometimes from the current one to the previous one as well (for a doubly linked list)

#

But, yeah, that sounds about right.

faint spruce Jul 1, 2019, 8:14 PM

#

nice

#

Im going to get to 10 minutes from this stuff alone :D

#

I will have to hand in some sources showing where I know all of this stuff from. Do you know any by chance?

lyric canopy Jul 1, 2019, 8:20 PM

#

No, I'm sorry, I don't have any handy

faint spruce Jul 1, 2019, 8:21 PM

#

ok

#

oh and one more thing. Those things in linked lists that reference to the next/previous element, are these called pointers or am I getting things mixed up?

lyric canopy Jul 1, 2019, 8:27 PM

#

Yes,

#

Although we don't usually use that term in Python, as Python abstracts that layer away

faint spruce Jul 1, 2019, 8:30 PM

#

Yeah, but my exam requires me to dig a little deeper into the stuff we covered in class

lyric canopy Jul 1, 2019, 8:31 PM

#

If you're interested, the implementation is here: https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c

faint spruce Jul 1, 2019, 8:32 PM

#

Thank you. You are a great help...

#

Another question: what advantage does the method bring in which we store the address of the data in a list? Wouldn't it itself need to store the data in some type of array or linked list? Why not skip this extra step?

void anvil Jul 2, 2019, 1:08 AM

#

Because it’s faster to go to 6 than to 1,2,3,4,5,6

lapis sequoia Jul 2, 2019, 9:10 AM

#

What's the difference between tensorflow and keras?
I mean what are they, algorithms or libraries.?

earnest prawn Jul 2, 2019, 10:26 AM

#

Tensorflow is a c++ library for ml related algorithms with a python frontend

#

An ugly one

#

Or rather hard to use for the average guy

#

Keras is part of tf (tf.keras) and exposes an easier to use and more simplistic python API

#

@lapis sequoia

lapis sequoia Jul 2, 2019, 10:29 AM

#

Why is keras separate when importing?

#

One can just say import keras

#

Without importing tf

lean ledge Jul 2, 2019, 10:33 AM

#

Keras can be used for more than just tf

lapis sequoia Jul 2, 2019, 10:42 AM

#

Like for most of neural networks algorithms

#

And I'm running TF in Spyder, and I can't import the mnist dataset.
Does it need internet to work with TF?

vale arrow Jul 2, 2019, 12:42 PM

#

Would this be an appropriate channel for help with pandas?

#

or are the general help channels where I should go?

polar acorn Jul 2, 2019, 12:43 PM

#

Go ahead, I don't know how appropriate it is but you're hardly alone in doing it.

zenith nova Jul 2, 2019, 12:44 PM

#

pandas fits here I believe

vale arrow Jul 2, 2019, 12:47 PM

#

Gotcha

#

I'm trying to do some data processing with pandas in Python, but when I give the command to split my data into the dependent variable and independent variable I'm having an issue. Code below:

#

```python
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('CyberData.csv')
x = dataset.iloc[:, :-1].values  # not viewable object?
y = dataset.iloc[:, 44].values
```

zenith nova Jul 2, 2019, 12:49 PM

#

```python
code
```

vale arrow Jul 2, 2019, 12:49 PM

#

Thanks. Both my dataset variable and my y variable are correct. But my x variable turns into an n-dimensional array (ndarray object of numpy module, as my IDE calls it). I've no idea why it is turning this into that instead of a 2D array like my y variable.

zenith nova Jul 2, 2019, 12:50 PM

#

:-1 -> -1?

vale arrow Jul 2, 2019, 12:50 PM

#

I can provide the file if you would like as well

#

But I need every column besides the last column

zenith nova Jul 2, 2019, 12:51 PM

#

well

#

If you're slicing over 3 dimensions you'll get an ndarray

#

your second line slices over 2 dimensions

vale arrow Jul 2, 2019, 12:51 PM

#

how so?

#

Does :-1 not just mean all columns but 1?

zenith nova Jul 2, 2019, 12:52 PM

#

it does

#

but :, : means all/all

#

whereas the other means all/one

#

...I think

vale arrow Jul 2, 2019, 12:54 PM

#

If I just do -1 it only gives me the last column, essentially making x the same as y

zenith nova Jul 2, 2019, 12:54 PM

#

what's the shapes of the outputs

vale arrow Jul 2, 2019, 12:55 PM

#

dataset.head():

#

```
0 Copper 24.2 43 ... -2370 -262.368 -2570
1 Copper 24.2 44 ... -2320 -263.158 -2580
2 Copper 24.2 45 ... -2310 -263.158 -2580
3 Copper 23.7 43 ... -2380 -262.632 -2580
4 Copper 23.8 43 ... -2380 -263.158 -2590
[5 rows x 45 columns]
```

#

y is just a 1x 10 array, while I need x to be a 44x10

zenith nova Jul 2, 2019, 12:56 PM

#

dataset.iloc[:, :-1 ].shape

vale arrow Jul 2, 2019, 12:57 PM

#

ah okay my bad. one sec

#

(10,44)

zenith nova Jul 2, 2019, 12:58 PM

#

and the other lines .shape?

#

dataset.iloc[:, 44].shape

vale arrow Jul 2, 2019, 1:00 PM

#

(10,)

zenith nova Jul 2, 2019, 1:00 PM

#

right, so one is a one-dimensional array, and one is a two dimensional array

vale arrow Jul 2, 2019, 1:01 PM

#

But when I split it into x, the type listed is different than y and I can see a preview of y, but not x

#

whereas in the videos I was watching, they did what I'm doing (with different but similar data) and x was the same as y

paper niche Jul 2, 2019, 1:03 PM

#

this just seems like an ide issue. what ide?

vale arrow Jul 2, 2019, 1:03 PM

#

Spyder

#

It may be, I'm not really sure. They are using spyder in the video I'm watching without an issue, so I can only assume it's something on my end

paper niche Jul 2, 2019, 1:05 PM

#

well it’s easy to check I think. generate a random 2D numpy array and see if you can view it

vale arrow Jul 2, 2019, 1:06 PM

#

Yep, I can

#

So as far as data types go, 1D and 2D arrays say either int64 or float64 (respectively), while my x variable just says 'object'. It also won't list the data inside the sidebar like a 1D or 2D does, instead it says 'ndarray object of numpy module' which tells me its like a 3D array or something?
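The sidebar label does not mean the array is 3D. A sketch with NumPy only, assuming a small made-up table with one text column like the `head()` output above: mixing strings and numbers gives the array `dtype=object`, which is likely what the variable explorer is reporting, while the array itself stays 2D.

```python
import numpy as np

# Hypothetical stand-in for the dataset: one text column plus numeric columns.
data = np.array(
    [["Copper", 24.2, 43, -2570],
     ["Copper", 24.2, 44, -2580],
     ["Copper", 23.7, 43, -2580]],
    dtype=object,
)

x = data[:, :-1]  # every column except the last: 2D
y = data[:, -1]   # only the last column: 1D

print(x.shape, x.ndim, x.dtype)  # (3, 3) 2 object
print(y.shape, y.ndim)           # (3,) 1

# Dropping the text column leaves values that convert to a plain float array.
x_numeric = x[:, 1:].astype(float)
print(x_numeric.dtype)           # float64
```

With pandas, `dataset.iloc[:, :-1].values` behaves the same way when the selected columns have mixed types.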

polar acorn Jul 2, 2019, 1:11 PM

#

https://python-forum.io/Thread-Spyder-ndarray-object-of-numpy-module-error

Welcome to python-forum.io

Spyder: ndarray object of numpy module error

Hi all, I am going through the ML A-Z course on udemy to help me solve an assignment and I am at the stage of preprocessing my data at the moment. My code is as below: import numpy as np import pandas as pd import matplotlib.pyplot as plt dataset...

#

Seems like a Spyder issue. And seems your code should work fine despite it.

vale arrow Jul 2, 2019, 1:12 PM

#

That doesn't really make sense, as in the video his array has multiple types and works just fine. Regardless, I suppose I'll have to switch. What IDE do you all recommend?

polar acorn Jul 2, 2019, 1:14 PM

#

In the thread they mentioned it might be a problem with a specific Spyder version. Maybe check that you're using the latest one?

vale arrow Jul 2, 2019, 1:15 PM

#

I just downloaded it a few weeks back. So I would think it should be newer than whatever version they were using in 2018 lol

#

I'll check

polar acorn Jul 2, 2019, 1:17 PM

#

I use PyCharm for the record. ¯_(ツ)_/¯

vale arrow Jul 2, 2019, 2:28 PM

#

I've tried like 5 different ways of updating this IDE. It refuses to change, so PyCharm it is

haughty wind Jul 2, 2019, 11:24 PM

#

I'm trying to fit a neural net on my laptop, which has an NVIDIA GTX 1050

#

For now, I have ~36k training images, and I'm using VGG19 and doing some transfer learning so I don't have to train the net from scratch

#

Problem is, when I run the code I get a resource exhausted error

#

Is this reasonable, given the amount of data I have, my GPU, and the RAM (24 GB)?

#

Do I have to acquire a dedicated machine for this, or is there a way to optimize my data/training procedure?

earnest prawn Jul 2, 2019, 11:38 PM

#

The issue here is that your GPU only has a limited amount of memory itself (GPUs have dedicated so-called VRAM, which at least on normal ones doesn't exceed single-digit GB). What you can do to stop this error with limited resources is split your data into so-called mini-batches: instead of fitting on all 36k images at once, fit on only as many images as your GPU can hold at a time, then the next mini-batch, and so on until you have gone through every image, and then continue with the next epoch @haughty wind
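The mini-batch idea, sketched in plain Python. `iter_minibatches` is a hypothetical helper and the integers stand in for images:

```python
def iter_minibatches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # stand-in for 10 training images
n_epochs = 2

for epoch in range(n_epochs):
    for batch in iter_minibatches(samples, batch_size=4):
        # A real training loop would run one optimizer step on `batch` here.
        print(epoch, batch)
```

In Keras, the `batch_size` argument of `model.fit` does this splitting for you.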

lean ledge Jul 2, 2019, 11:48 PM

#

👍

#

You're probably running out of vram. Try monitoring your vram usage using nvidia-smi and reduce your batch size to 1

#

(for now)

#

If it still can't fit on your VRAM, you'll need to use a different GPU

earnest prawn Jul 2, 2019, 11:51 PM

#

(if VRAM is actually the issue, that's highly unlikely, as the GPU model you have doesn't come in a version with less than 2 GB... and you'd really have to have a big image to exceed that with one image at a time)

haughty wind Jul 2, 2019, 11:52 PM

#

Right now my images are somewhere around 70x70 pixels or so

earnest prawn Jul 2, 2019, 11:52 PM

#

yeah if VRAM is actually the issue the batch size fix should work out perfectly

haughty wind Jul 2, 2019, 11:52 PM

#

I dropped my batch size to 4 as well as samples per epoch

#

I forgot I also dropped my epochs to 5 so my net understands nothing, but at least it made it through

earnest prawn Jul 2, 2019, 11:53 PM

#

cool

#

keep increasing the batch size until it doesn't work anymore, then decrease it by a little and you should be ready to go 👍

haughty wind Jul 2, 2019, 11:54 PM

#

ok cool, thanks for the help

onyx granite Jul 3, 2019, 1:46 AM

#

what's the recommended approach for deploying ML models? specifically just a normal pickle/joblib? other than using AWS, Azure, Google Cloud

#

Flask + Celery + Redis?

quartz stream Jul 3, 2019, 9:03 AM

#

Hey

#

I have a dataset

#

Can anyone help me in Modifying it

#

like every row is in form of id;message;label in one column

#

i need them in different column

polar acorn Jul 3, 2019, 9:08 AM

#

Is your dataset stored in a csv?
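If it is, one way to split the rows, sketched with the standard `csv` module. The sample rows are made up, since the real file is not shown here:

```python
import csv
import io

# Hypothetical file contents: each line is "id;message;label".
raw = "1;hello there;ham\n2;win a prize now;spam\n"

# Tell the reader that fields are separated by semicolons, not commas.
reader = csv.reader(io.StringIO(raw), delimiter=";")
rows = [{"id": r[0], "message": r[1], "label": r[2]} for r in reader]

print(rows[0])
```

With pandas, `pd.read_csv(path, sep=";", names=["id", "message", "label"])` gives the same three columns directly, assuming the file has no header row.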

hollow quartz Jul 3, 2019, 12:58 PM

#

Hey, in my data set I have two features, one of type datetime and one with double values. I want to predict the double values. Is it possible?

lapis sequoia Jul 3, 2019, 1:10 PM

#

Has anyone here done a proper live company project?

I have started to work in a small startup, it's basically a online event ticketing platform website.
In order for me to join this company they have given me a task to come up with a model that can help their business.
I want to know how can I use machine learning or any similar technology in python to create a live project which can be deployed to help that company.
Any ideas?

polar acorn Jul 3, 2019, 1:14 PM

#

Come up with an idea or create and deploy a model to carry it out?

odd osprey Jul 3, 2019, 1:32 PM

#

I need help with keras model

#

I'm getting a "must be from the same graph as Tensor" error when I try to use it from python, loading a reshaped cv image

earnest prawn Jul 3, 2019, 1:45 PM

#

@odd osprey could you show us your code and your exact error?

odd osprey Jul 3, 2019, 1:53 PM

#

ahm, just fixed finally -_-'

#

my issue was actually very stupid - I got the singleton wrong in python :/

distant merlin Jul 3, 2019, 2:19 PM

#

@earnest prawn Well well well

earnest prawn Jul 3, 2019, 2:19 PM

#

what

distant merlin Jul 3, 2019, 2:19 PM

#

I joined a server and here you are

#

This was unexpected One in random

earnest prawn Jul 3, 2019, 2:20 PM

#

ive been here for ages lol

distant merlin Jul 3, 2019, 2:20 PM

#

I just joined

#

server in nice structure

#

The Authentication is nice

#

Best to avoid spam

dense rose Jul 3, 2019, 10:14 PM

#

Does anyone use the vim mappings in jupyterlab?

#

I'm trying to figure out how to do some remapping.

gusty maple Jul 3, 2019, 10:53 PM

#

Hello, has anyone here studied the sugarscape or the Think Complexity book?

lapis sequoia Jul 4, 2019, 2:59 AM

#

I have a problem i'm not sure how to solve

#

there's labelled data, random garbled text sequences and they're split into four classes.. how would I go about running a classifier on them

#

they're not meaningful words but all the words in each sequence are five characters, all alphanumeric

onyx granite Jul 4, 2019, 3:10 AM

#

so like example: row 1: sdasa tyghs ileis 1 row 2: ffasa byghs glets 2

#

?

#

1,2 being labels

lapis sequoia Jul 4, 2019, 3:12 AM

#

yeah

#

I have no clue how to approach this.. because they are meaningless

onyx granite Jul 4, 2019, 3:15 AM

#

meaningless meaning they are meaningless but have a certain pattern

#

or meaningless meaning you just dont know the pattern

#

or meaningless meaning literally meaningless

#

because the last part wont really let you do much with the data

lapis sequoia Jul 4, 2019, 3:16 AM

#

meaningless meaning I probably don't know the pattern

#

or what it's about

#

also, the words are separated by commas

onyx granite Jul 4, 2019, 3:18 AM

#

if you think there might be a pattern and you just dont know it, usually if it's dealing with words or characters it's more NLP based

#

fair warning i am not data scientist

#

but usually something related to a multi class classifier like naive bayes is a possible starting point

#

or if you can use some type of unsupervised learning it might reveal a pattern

#

i would look up like NLP algorithms that can do multi class classification

#

then just create your normal train, test, validation sets

#

there is also some for document classification, like if you have 30 different text corpuses from different letters you are reading, it can detect which ones belong to which areas

#

but that's more unsupervised, you said they have classes already

lapis sequoia Jul 4, 2019, 3:21 AM

#

the hints do say that the words are from a text document

#

but it is a classification task

#

I will use unsupervised for eda

onyx granite Jul 4, 2019, 3:23 AM

#

try something like naive bayes first

#

then move up to logistic regression + word2vec or something like that

#

then if you need even better results, move towards something more advanced

#

honestly logistic regression works very well for like sentiment analysis and such, even text classification in general

#

but obviously wont be as good as something like a NN

lapis sequoia Jul 4, 2019, 3:31 AM

#

thanks man

desert oar Jul 4, 2019, 3:45 AM

#

@lapis sequoia what kind of data is this?

#

it's "garbled" but is there any meaning to it?

#

if it's truly random then you won't have much luck with this

lapis sequoia Jul 4, 2019, 4:40 AM

#

@desert oar i'm not sure where it's from.. I was only told "training data is tab separated with a label and tokenized sequential data..the tokenized data are actually words of a text document. Each token is a length of 5 (alpha-numeric letters)."

#

but it is a classification task

desert oar Jul 4, 2019, 4:41 AM

#

I see

#

So they are words but you arent allowed to see the words? Interesting

#

Yes bag of words classification seems to be the best (maybe only) option

lapis sequoia Jul 4, 2019, 4:42 AM

#

yeah.. and there's 4 classes, so I'm guessing if there are common words between labels, I should drop them and check accuracy of classifying again

#

they might be stop words, etc

desert oar Jul 4, 2019, 4:43 AM

#

Yeah although i wouldnt drop all common words

#

Could use eg chi square or mutual information for feature selection

#

How many records and how many unique words?

lapis sequoia Jul 4, 2019, 4:44 AM

#

i'm not sure, let me reduce it and check

#

reduce it to a csv I mean, with the labels removed

#

I wonder if it makes sense to check if there are duplicate rows (same sequences repeated) but i'm not able to view it as rows next to each other when I do :

desert oar Jul 4, 2019, 4:45 AM

#

How is your data provided?

lapis sequoia Jul 4, 2019, 4:45 AM

#

the data is in a tsv: a label, then a tab, then the sequence of words

desert oar Jul 4, 2019, 4:45 AM

#

And are they actually sequential or just a bag of words already

lapis sequoia Jul 4, 2019, 4:45 AM

#

let me show an example row

#

printed from a df

desert oar Jul 4, 2019, 4:46 AM

#

Use df['sequence'].str.split(',') to get a column of lists of words

#

No need to save again

#

However you can save as Parquet format in the future to avoid having to split again every time you load

lapis sequoia Jul 4, 2019, 4:48 AM

#

I have the column of words now, it's a series and each series seems to be now converted to a list

#

I might export to csv and check it in something like answer minor to get quick eda

desert oar Jul 4, 2019, 4:49 AM

#

Each element of the series will be a list, yes

#

You can use .map(len) on that now to get the length of each list, etc

lapis sequoia Jul 4, 2019, 4:50 AM

#

they're all varying lengths, I see 19 and 45.. etc.

desert oar Jul 4, 2019, 4:51 AM

#

Use matplotlib or .plot to plot them, eg hist or kernel density

#

Or .describe

lapis sequoia Jul 4, 2019, 4:52 AM

#

63799 count, min 14, max 204, mean 44

desert oar Jul 4, 2019, 4:57 AM

#

Use co.Counter(_.tolist()) to get word counts

#

Import collections as co

lapis sequoia Jul 4, 2019, 5:08 AM

#

to_list on the series?

#

I can do the map on the co.Counter()

desert oar Jul 4, 2019, 11:29 AM

#

@lapis sequoia .tolist() and .map() are Series methods

#

its much faster to iterate over a list than a series

#

Btw that was wrong anyway, it should be Counter(chain.from_iterable(df['sequences_split'].tolist())), with Counter from collections and chain from itertools
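A self-contained sketch of that counting step. The nested list stands in for `df['sequences_split'].tolist()` and the tokens are made up:

```python
from collections import Counter
from itertools import chain

# Stand-in for df['sequences_split'].tolist(): one list of tokens per row.
sequences = [["sdasa", "tyghs", "ileis"], ["ffasa", "tyghs", "glets"]]

# Flatten the list of lists into one stream of tokens and count them.
word_counts = Counter(chain.from_iterable(sequences))

print(word_counts.most_common(1))  # [('tyghs', 2)]
print(len(word_counts))            # 5 unique tokens
```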

lapis sequoia Jul 4, 2019, 11:31 AM

#

neat:)

#

im just cleaning up my code.. seeing if some of the test sequences are already in train, etc.. haven't gotten to training part yet

celest moss Jul 4, 2019, 2:46 PM

#

I am trying to implement Naive Bayes classification on the Iris dataset, but the distributions of some features are not normal, so how do I proceed? Should I just ignore it and use a Gaussian distribution?

📎 graph.png

polar acorn Jul 4, 2019, 7:21 PM

#

@celest moss I think you assume normality for each feature conditioned on each class. So if the PetalLength for each class looks somewhat normal, that would be good enough.

celest moss Jul 4, 2019, 7:37 PM

#

@pptt, Thank you ! Implemented with normal distribution and seems like you are right, I got an accuracy of 93.33 %. So you are saying that P(feature_vector | class) must be normally distributed right ?

polar acorn Jul 4, 2019, 7:38 PM

#

Yes, that is the assumption we are making at least. I'm sure you could often get good enough results even if that is not entirely the case.
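A rough sketch of Gaussian naive Bayes in plain Python to make the per-class assumption concrete: one mean and standard deviation is estimated for each feature within each class. The measurements are made up (not the real Iris data), and `fit`, `log_normal_pdf` and `predict` are hypothetical helper names:

```python
import math
from statistics import mean, pstdev


def fit(rows, labels):
    """Estimate a prior plus per-feature (mean, std) for every class."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, l in zip(rows, labels) if l == cls]
        stats = [(mean(col), pstdev(col)) for col in zip(*cls_rows)]
        model[cls] = (len(cls_rows) / len(rows), stats)
    return model


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def score(cls):
        prior, stats = model[cls]
        return math.log(prior) + sum(
            log_normal_pdf(x, mu, sd) for x, (mu, sd) in zip(row, stats)
        )
    return max(model, key=score)


# Made-up (petal length, petal width) pairs for two classes.
rows = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.6)]
labels = ["a", "a", "a", "b", "b", "b"]

model = fit(rows, labels)
print(predict(model, (1.6, 0.25)))  # a
print(predict(model, (4.6, 1.45)))  # b
```

The pooled distribution of a feature over all classes can look bimodal even when each class on its own is close to normal.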

celest moss Jul 4, 2019, 7:39 PM

#

Thank you very much !

polar acorn Jul 4, 2019, 7:40 PM

#

np

#

@heavy crow @digital meteor @digital crescent
The data science channel could use some of this discussion 😃

digital meteor Jul 4, 2019, 7:48 PM

#

hi

heavy crow Jul 4, 2019, 7:48 PM

#

this is the dataset

#

https://cdn.discordapp.com/attachments/425755501276430336/596070621721526279/water-levels.csv

#

feel free to download it and play with it

digital meteor Jul 4, 2019, 7:49 PM

#

okay so what I am saying to do is
fit a polynomial to it
in a linear regression

heavy crow Jul 4, 2019, 7:49 PM

#

go ahead and do it. i cant get any good results with it

digital meteor Jul 4, 2019, 7:49 PM

#

i.e the cubic fit that you already did

heavy crow Jul 4, 2019, 7:49 PM

#

because for the 5th time. i dont want the trend

#

i want values

polar acorn Jul 4, 2019, 7:50 PM

#

@heavy crow the thing you should do to make sure that your LSTM actually performs well is to compare it to a simple model. So, for instance, split the data at several points, train an LSTM on the data prior to each point, and fit a linear model to the same data. Compare how wrong their predictions are on the next three months.

heavy crow Jul 4, 2019, 7:50 PM

#

yeah. does way better than the linear

#

the linear does terrible.

polar acorn Jul 4, 2019, 7:50 PM

#

What Apex is trying to tell you is that the LSTM likely isn't doing better than the linear trend in such a scenario.

digital meteor Jul 4, 2019, 7:50 PM

#

^

#

or the quadratic/cubic

#

https://cdn.discordapp.com/attachments/267624335836053506/596420235448418345/Advk0TqvKmzEAAAAAElFTkSuQmCC.png

#

i.e. this

heavy crow Jul 4, 2019, 7:51 PM

#

the obv hits almost all the points in training and hits most points in validation

polar acorn Jul 4, 2019, 7:52 PM

#

Also if you're into time series have a look at this free e-book https://otexts.com/fpp2/ It's to the point and a good intro.

Forecasting: Principles and Practice

2nd edition

heavy crow Jul 4, 2019, 7:52 PM

#

mk gonna take a look at tha

#

t

digital meteor Jul 4, 2019, 7:53 PM

#

essentially a summary of what I have been trying to say is that
I think deviations from that quadratic curve are random noise
and when I say random noise what I mean is "a random variable that is normally distributed with constant variance"
that's how it was defined in my uni lectures anyway 🤷

heavy crow Jul 4, 2019, 7:53 PM

#

ok cool but that doesnt help at all when you are trying to predict values not trends

digital meteor Jul 4, 2019, 7:53 PM

#

you may be able to improve the quadratic fit with some time-lagged variables i.e. autoregression
but I don't think the beta-coefficient on time-lagged variables will be that big in this case

#

well what the quadratic fit allows you to do
is predict future values using the assumption that the quadratic trend will continue

#

so essentially if you are using the quadratic fit to predict a value 3 months in the future, you can do that by carrying on the red line and taking the y-value from that date
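A sketch of that extrapolation with NumPy. The series is synthetic (a known quadratic plus noise), not the water-levels data, and the choice of 36 monthly points is an assumption for illustration:

```python
import numpy as np

# Synthetic series: a known quadratic in time plus small Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(36, dtype=float)  # e.g. 36 monthly observations
y = 0.05 * t**2 - 0.8 * t + 20 + rng.normal(0.0, 0.3, size=t.size)

# Least-squares fit of a degree-2 polynomial.
coeffs = np.polyfit(t, y, deg=2)

# Carry the fitted curve on for three steps past the last observation.
t_future = np.arange(36, 39, dtype=float)
forecast = np.polyval(coeffs, t_future)
print(forecast)
```

The forecast is only as good as the assumption that the quadratic trend continues.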

#

this is essentially just the regular multivariate regression method

#

what I don't think is that you can make a model (including non-regression-based models) that can predict in the future all those bumps and dips. I don't think its possible to accurately predict those with the data that you have

#

because, as I was saying, the bumps and dips don't seem to have much of a pattern, they just look like random noise on top of the quadratic curve
(random noise being a random variable that is normally distributed, mean of zero and constant variance)

#

so the way I was taught multivariate analysis was that in this situation you can't capture that random noise, and that you should settle for a polynomial fit, maybe with some lagged variables (autoregression)

#

even if there is an underlying pattern in the deviations from the quadratic, it hasn't repeated itself that many times. This means that if you do extend your model to try to capture the deviations from the quadratic (under the assumption that they are non-random) the model will have a lot of trouble capturing the pattern.

#

so its possible that a machine learning algo or other recursive algo will capture those deviations from quadratic a bit better, if they are non-random, but it won't beat the quadratic regression by much even if it does. Especially if lagged variables are introduced to the regression because they have some limited ability to capture patterns themselves.

#

that's essentially what I was trying to say earlier

polar acorn Jul 4, 2019, 11:11 PM

#

@heavy crow

Because I'm procrastinating and should be doing useful stuff I did this instead. I wrote a short script that tests your model (the LSTM for instance) vs a linear model with time series cross validation. You just have to implement two functions. One that extracts the features you want from x_train and trains your model from scratch. And one that extracts the features you want from x_test and predicts on it. Feel free to try it out. If you are interested in predictive power, comparing predictions might be a good idea 😃

#

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

df = pd.read_csv("water-levels.csv")

# Baseline model, we will try to beat this. (if we can you were right and the "noise" was not noise!)
lr = LinearRegression()

# Your alternative model, implement a train and predict function for your model.
def train_alt_model(x_train, y_train):
    # Your code here
    return alt_model
    
def predict_alt_model(alt_model, x_test):
    # Your code here
    return predictions

# Set up experiment.
tscv = TimeSeriesSplit(max_train_size=None, n_splits=26)
results_df = pd.DataFrame()

# Time series cross validation.
for train_index, test_index in tscv.split(df):
    x_train = df.date[train_index]
    y_train = df.waterlevel[train_index]
    x_test = df.date[test_index]
    y_test = df.waterlevel[test_index]
    
    results_df_tmp = df.iloc[test_index].copy()
    
    # Fit and predict with linear regression
    lr.fit(x_train.values.reshape(-1, 1), y_train)
    results_df_tmp['linear_regression_prediction'] = lr.predict(x_test.values.reshape(-1, 1))
    
    # Fit and predict with alternative model
    alt_model = train_alt_model(x_train, y_train)
    results_df_tmp['alternative_model_prediction'] = predict_alt_model(alt_model, x_test)
    
    # Update results data frame
    results_df = pd.concat([results_df, results_df_tmp], axis=0)

# Compute t statistic, we assume here that both model's residuals are standard normal dist.
linear_regression_residuals = (results_df.waterlevel-results_df.linear_regression_prediction)**2
alternative_model_residuals = (results_df.waterlevel-results_df.alternative_model_prediction)**2
residual_deltas = linear_regression_residuals - alternative_model_residuals
t_stat = np.mean(residual_deltas)/(np.std(residual_deltas)/np.sqrt(len(results_df)))

#

# See if we beat linear regression.
if t_stat >= 1.96: # We're assuming here that len(results_df) > 1000
    print("Congrats! Your model beats linear regression!")
elif t_stat > 0:
    print("Your model predicts slightly better than linear regression but the improvement is not statistically significant.\n" 
          "So we can not say for sure that you beat linear regression.")
else:
    print("Your model predicts worse than linear regression. Did you even try?")
    
plt.plot(df.date, df.waterlevel)
plt.plot(results_df.date, results_df.linear_regression_prediction)
plt.plot(results_df.date, results_df.alternative_model_prediction)
plt.legend(['waterlevel', 'linear reg', 'alt model'])
plt.show()

#

n_splits is set to 26, which is equivalent to predicting a year ahead into the future at each split. Set it to 100 to predict 3 months ahead, although this might take some time. If it takes too long you can set it to 10, which is predicting 2.5 years into the future. Or you could reduce max_train_size to cut down on the time.

Also if you want to compare linear and quadratic regression I made the two functions you have to implement:

def train_alt_model(x_train, y_train):
    alt_model = LinearRegression()
    x_train_reshaped = x_train.values.reshape(-1, 1)
    x_train_full = np.stack([x_train_reshaped[:,0], x_train_reshaped[:,0]**2], axis=1)
    alt_model.fit(x_train_full, y_train)
    return alt_model
    
def predict_alt_model(alt_model, x_test):
    x_test_reshaped = x_test.values.reshape(-1, 1)
    x_test_full = np.stack([x_test_reshaped[:,0], x_test_reshaped[:,0]**2], axis=1)
    predictions = alt_model.predict(x_test_full)
    return predictions

Good luck!
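For anyone who wants the comparison step on its own, here is a hedged sketch of the paired t statistic from the script above, pulled out into a function. The function name and the synthetic predictions are hypothetical. One deliberate difference: it uses the sample standard deviation (`ddof=1`) where the script uses NumPy's default `ddof=0`, which barely matters for large samples.

```python
import numpy as np

def paired_t_stat(y_true, pred_baseline, pred_alt):
    """t statistic for mean(baseline squared error - alt squared error)."""
    y_true = np.asarray(y_true, dtype=float)
    se_baseline = (y_true - np.asarray(pred_baseline, dtype=float)) ** 2
    se_alt = (y_true - np.asarray(pred_alt, dtype=float)) ** 2
    deltas = se_baseline - se_alt
    # Standard error of the mean difference, using the sample std (ddof=1).
    std_err = np.std(deltas, ddof=1) / np.sqrt(len(deltas))
    return np.mean(deltas) / std_err

# Synthetic check: the "alt" predictions are less noisy than the baseline.
rng = np.random.default_rng(1)
y = rng.normal(size=2000)
baseline = y + rng.normal(scale=1.0, size=2000)
alt = y + rng.normal(scale=0.5, size=2000)

t_stat = paired_t_stat(y, baseline, alt)
print(t_stat > 1.96)
```

A positive value means the alternative model has the lower squared error on average. Keep in mind that forecast errors from neighbouring time steps are often correlated, so treat the 1.96 cutoff as a rough guide rather than an exact test.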

lapis sequoia Jul 5, 2019, 2:55 AM

#

hi

lapis sequoia Jul 5, 2019, 8:46 AM

#

this is weird

#

my numpy array shows datatype as

#

dtype='|S1181'

strong flare Jul 5, 2019, 8:52 AM

#

Hi guys, may I know what error this is? I can't do the prediction on my data, it's showing me a KeyError

📎 unknown.png

polar acorn Jul 5, 2019, 9:01 AM

#

It says pretty clearly it's a KeyError. You are asking where in the index the key "20150217" is. But that key isn't in the index and so you get an error.

strong flare Jul 5, 2019, 9:08 AM

#

@polar acorn i try to check the index but it show this

📎 unknown.png

polar acorn Jul 5, 2019, 9:12 AM

#

The result doesn't have an index. When you call result.predict, somewhere inside that function it checks some index for your key. I don't know what index that is or what the insides of result.predict look like. Can you try to remove "20150217" and only predict on "20170301"?

strong flare Jul 5, 2019, 9:14 AM

#

@polar acorn it show another error now

📎 unknown.png

polar acorn Jul 5, 2019, 9:15 AM

#

Ah I see now, those are the start and end values. Okay you should put that value back. Hmmm.

strong flare Jul 5, 2019, 9:16 AM

#

😂 in a deadlock now cant figure out which part error

jagged stump Jul 5, 2019, 10:42 AM

#

What is the best way to do real-time logo detection? Any suggestions?

quartz monolith Jul 5, 2019, 12:31 PM

#

I have a big dataframe to clean and some columns have

'200, 334'
'200/334'
'200/ 334'
ADJ:Enc'
>Faul
first i want to clean out the spaces, afterwards i want to split them into two different columns to train the model

heavy crow Jul 5, 2019, 12:46 PM

#

.replace(" ","")

#

And by what do you want to split them?

polar acorn Jul 5, 2019, 12:56 PM

#

@quartz monolith
This should work, replace col with your column name.

import pandas as pd
df = pd.DataFrame({'col':['203,608', '200, 334', '200/334', '200/ 334']})

df.col = df.col.str.replace(' ', '')
df[['col_split_1', 'col_split_2']] = df.col.str.extract(r"(\d*)[,/](\d*)")

quartz monolith Jul 5, 2019, 1:10 PM

#

@polar acorn Thanks, that's exactly what I'm looking for

polar acorn Jul 5, 2019, 1:16 PM

#

np

quartz monolith Jul 5, 2019, 1:34 PM

#

What's the best method for dealing with NaN values in machine learning? Dropping the rows, converting them to an int like 999, or just replacing them?

#

if i drop my rows i will get worse results, so imo that's not an option

desert oar Jul 5, 2019, 3:54 PM

#

It depends on what the missing values mean

#

Sometimes it's better to "impute" the missing values
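A small sketch of what imputing can look like, assuming a single numeric feature where a median fill makes sense (that's an assumption about the data, not a general rule). The values are made up. It also keeps a 0/1 "was missing" flag, since the fact that a value is absent can itself be informative.

```python
import numpy as np

# Hypothetical feature column with missing entries.
x = np.array([3.0, np.nan, 5.0, 4.0, np.nan, 10.0])

missing = np.isnan(x)               # remember where values were missing
fill_value = np.nanmedian(x)        # median of the observed values only
x_imputed = np.where(missing, fill_value, x)

# Stack the imputed feature with the indicator column.
features = np.column_stack([x_imputed, missing.astype(float)])
print(features)
```

When there is a train/test split, the fill value would normally be computed on the training rows only and reused for the test rows, so nothing leaks from the test set.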

hollow quartz Jul 5, 2019, 4:26 PM

#

How do I display the full values in the column? I use pyspark

📎 Capture.PNG

quartz monolith Jul 5, 2019, 4:37 PM

#

Why dont you round the values? @hollow quartz

hollow quartz Jul 5, 2019, 4:38 PM

#

ok i try

vestal gazelle Jul 5, 2019, 6:39 PM

#

@hollow quartz also string formatting might help, if that applies in your case

quartz monolith Jul 5, 2019, 7:05 PM

#

I'm training a DecisionTreeClassifier and I'm really baffled... Why does a 5500-row dataset perform better than a 130k one? Does somebody have exp. with DT?

#

                      precision  recall  f1-score  support
weighted avg (small)       0.57    0.59      0.57     1383
weighted avg (big)         0.56    0.58      0.57    33584

onyx granite Jul 5, 2019, 7:13 PM

#

your smaller dataset might be overfitting due to less samples that are more representative of your actual dataset

#

also I think DTs are very much influenced by the data size more so than others

#

also be sure you dont have a class imbalance issue somewhere

#

just a few things off the top of my head

onyx granite Jul 5, 2019, 11:09 PM

#

Hi, I am trying to deploy a model to google ai platform and running into some generic errors, was wondering if anyone has run into the error I am getting

#

Create Version failed. Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. 47. (Error code: 0)"

lapis sequoia Jul 6, 2019, 1:16 AM

#

is anyone alive

vital bison Jul 6, 2019, 6:31 AM

#

I'm using sns.distplot to compare two features of a training dataset!
does it work only for binary datasets?

do the histograms get corrupted if the dataset is too large?
another question
how important are system features? if I'm using only cloud platforms?
like colab
https://gyazo.com/db268fec4c046c6203f94e0b7799a212

Gyazo

fringe timber Jul 6, 2019, 8:57 AM

#

i feel alive, but i might be biased, need to collect more data

hollow quartz Jul 6, 2019, 1:24 PM

#

Hi I want to normalize a column. I can normalize a column of Vector but not a column of value

lapis sequoia Jul 6, 2019, 2:00 PM

#

what do you mean

hollow quartz Jul 6, 2019, 3:50 PM

#

@lapis sequoia the difference between the features column and the scaled_features column

📎 Capture.PNG

lapis sequoia Jul 6, 2019, 3:51 PM

#

which column are you having trouble normalizing

hollow quartz Jul 6, 2019, 3:53 PM

#

conso_total column because it is not a vector

lapis sequoia Jul 6, 2019, 3:53 PM

#

what does conso total represent

hollow quartz Jul 6, 2019, 3:55 PM

#

it's my target prediction

#

my code is here df_train.withColumn('Scalerconso_total', ((df_train.conso_total)-_min)/(_max-_min))

#

the problem is here AnalysisException: "Can't extract value from conso_total#16: need struct type but got double;"

lapis sequoia Jul 6, 2019, 4:01 PM

#

sorry it's like 1 am here.. I'll get back to you in the morning

hollow quartz Jul 6, 2019, 4:02 PM

#

ok thanks

nova crow Jul 6, 2019, 7:30 PM

#

I am currently focusing on making a "Python Bot" for an online flash game. I've been looking it up on the internet, and most results say that I have to use AutoPy in order to handle the automated mouse clicks and keyboard input. But I'm having problems sniffing the output that comes from the Flash game to my IP address, and if I successfully sniff the packets, how can I use AutoPy in Python to make the bot capture a screenshot every 0.5 seconds? (the bot should be able to do it because it can then convert the captured screen into a numpy array for analysis)

rocky dust Jul 6, 2019, 7:43 PM

#

I think screenshots can be taken with the ImageGrab module from PIL.

#

Keyboard press and Mouse clicks are also supported by the module: pyautogui

quartz monolith Jul 6, 2019, 9:15 PM

#

Has anyone run into LabelEncoder failing on never-seen-before values when transforming labels?
I may have found the answer but can't implement it in my code:
https://stackoverflow.com/a/52505373

Stack Overflow

sklearn.LabelEncoder with never seen before values

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set.

The only solution I could come up with for this is to map everythin...
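One common workaround, sketched in plain Python rather than with sklearn, and not necessarily what the linked answer does: reserve one code for "unknown" when fitting, and send every label that was not in the training set to that code. The function names and the `<unknown>` placeholder are hypothetical, and the sketch assumes the placeholder string never appears as a real label.

```python
def fit_label_map(train_labels, unknown="<unknown>"):
    """Map each training label to an integer, reserving 0 for unseen labels."""
    classes = [unknown] + sorted(set(train_labels))
    return {label: i for i, label in enumerate(classes)}

def encode(labels, label_map, unknown="<unknown>"):
    """Encode labels, sending anything not seen during fitting to `unknown`."""
    fallback = label_map[unknown]
    return [label_map.get(label, fallback) for label in labels]

label_map = fit_label_map(["cat", "dog", "cat", "bird"])
encoded = encode(["dog", "fish", "bird"], label_map)
print(encoded)  # "fish" was never seen, so it gets the reserved code 0
```

Whether lumping all unseen values into one bucket is acceptable depends on the model and on how often new values show up at prediction time.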

#

📎 Bildschirmfoto_2019-07-06_um_23.20.42.png

hollow shard Jul 6, 2019, 9:22 PM

#

Hi guys, sorry to bother you again but my mnist 1 hidden layer neural net is still refusing to work. I'm completely stumped

#

http://dpaste.com/13ACF3V

#

heres the code, thanks in advance to those who look at it 👍

vital bison Jul 7, 2019, 4:51 AM

#

although i have helper installed, this line of code is giving me this error
https://gyazo.com/8c378d412bb0fe0a4bdde8fd1db10475
can anyone help me solve that

Gyazo

lapis sequoia Jul 7, 2019, 5:11 AM

#

is anyone alive

#

I need to evaluate my model on a suitable metric, but I'm wondering if k-fold CV is enough..

#

usually I figure out a good heuristic that's specific for the problem im solving, but i'm tired and facing a deadline..

#

I guess what im asking is.. how do I cover my ass

#

lol

lapis sequoia Jul 7, 2019, 8:47 AM

#

I saved my model, how do I know the properties of one of my layers

#

I want to find the filter size (kernel size)

#

I got it

#

.get_layer(layername).kernel_size

quartz monolith Jul 7, 2019, 2:53 PM

#

How to inverse transform on the True label and Predicted label?

📎 Bildschirmfoto_2019-07-07_um_16.51.49.png

#

`# Accuracy per Class
# ConfusionMatrix heatmap between RootCause

y_test_it = new_le.inverse_transform(y_test)
y_pred_it = new_le.inverse_transform(y_pred)
conf_mat = confusion_matrix(y_test_it, y_pred_it)

f, ax = plt.subplots(figsize=(50, 35))
#cmap=cmap
conf_mat_normalized = conf_mat.astype('int') / conf_mat.sum(axis=1)[:, np.newaxis]
mask = np.zeros_like(conf_mat)
sns.heatmap(conf_mat_normalized, vmax=1, center=0,
            square=True, linewidths=2,
            annot=True, annot_kws={"size": 15}, fmt=".1f", mask=mask)

plt.ylabel('True label')
plt.xlabel('Predicted label')`
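A guess at what is missing, since the screenshot isn't visible here: the labels are already inverse-transformed before the matrix is built, so the remaining step is probably to put the class names on the axes. The sketch below builds a small confusion matrix by hand with NumPy on made-up labels to show the row/column order. As far as I know, sklearn's `confusion_matrix` uses the same sorted order when no `labels` argument is passed.

```python
import numpy as np

# Made-up labels standing in for the inverse-transformed y_test / y_pred.
y_true = np.array(["a", "b", "a", "c", "b", "a"])
y_pred = np.array(["a", "b", "b", "c", "b", "a"])

# Sorted class names define the row (true) and column (predicted) order.
labels = np.unique(np.concatenate([y_true, y_pred]))
index = {label: i for i, label in enumerate(labels)}

conf_mat = np.zeros((labels.size, labels.size), dtype=int)
for t, p in zip(y_true, y_pred):
    conf_mat[index[t], index[p]] += 1

# Row-normalize; the maximum() guard avoids dividing by zero for a class
# that never appears as a true label.
row_sums = conf_mat.sum(axis=1, keepdims=True)
conf_mat_normalized = conf_mat / np.maximum(row_sums, 1)
print(labels)
print(conf_mat_normalized)
```

With seaborn, the same `labels` array can then be passed as `xticklabels=labels, yticklabels=labels` to `sns.heatmap` so the axes show class names instead of positions.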

tight sparrow Jul 7, 2019, 6:50 PM

#

hey guys I'm not sure if i should post here or help

#

📎 unknown.png

#

im following a video and the n_jobs =1 in output

#

not sure how that can be

feral lodge Jul 7, 2019, 10:35 PM

#

n_jobs is just the number of parallel CPU jobs scikit-learn uses; it doesn't change any properties of your linear model. If you have n_jobs=None, then it defaults to 1 anyway, so there's no difference at all @tight sparrow

#

If your output is different from the video, then you might have a slightly different scikit version or something, but that's fine probably

tight sparrow Jul 7, 2019, 10:42 PM

#

Awesome thanks for the clarification 👍

grizzled folio Jul 8, 2019, 5:44 AM

#

Hey team, I have a 3d numpy array (e.g. 218x100x2502). I run np.argmax(..., axis=1) to get indices along the middle axis. I want to index another array with the same shape, using these indices. Essentially:

for ii in range(a.shape[0]):
  for kk in range(a.shape[2]):
    out[ii,kk] = a[ii, indices[ii,kk], kk]

I thought maybe np.take would do this (I tried np.take(a, indices, axis=1) but this is definitely not the right thing, since it hangs Python...)

#

Looks like this: https://stackoverflow.com/questions/32089973/numpy-index-3d-array-with-index-of-last-axis-stored-in-2d-array is almost what I want. Couldn't get the right search terms!

Stack Overflow

Numpy: Index 3D array with index of last axis stored in 2D array

I have a ndarray of shape(z,y,x) containing values. I am trying to index this array with another ndarray of shape(y,x) that contains the z-index of the value I am interested in.

import numpy as np

#

np.take_along_axis(a, indices[:,None,:], axis=1).squeeze() -- easy! thanks team
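For reference, a small self-contained check that the one-liner matches the loop from the question. The shapes are shrunk to (4, 5, 6) and the data is random, purely for illustration. It uses `squeeze(axis=1)` rather than a bare `squeeze()` so that only the gathered axis is dropped even if another dimension happens to have length 1.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5, 6))   # array used to pick the indices
a = rng.normal(size=(4, 5, 6))        # array to index, same shape

indices = np.argmax(scores, axis=1)   # shape (4, 6)

# Vectorized gather: indices needs a length-1 middle axis to line up with a.
out = np.take_along_axis(a, indices[:, None, :], axis=1).squeeze(axis=1)

# Reference loop from the question, for comparison.
expected = np.empty((4, 6))
for ii in range(a.shape[0]):
    for kk in range(a.shape[2]):
        expected[ii, kk] = a[ii, indices[ii, kk], kk]

print(np.array_equal(out, expected))
```

As for why `np.take(a, indices, axis=1)` hangs: its result has shape `a.shape[:1] + indices.shape + a.shape[2:]`, which for the sizes in the question is (218, 218, 2502, 2502), far too large to allocate.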

silk forge Jul 8, 2019, 12:03 PM

#

yo