#data-science-and-ml | Python | Page 191

lime lava Nov 30, 2018, 6:27 PM

#

Gimme a sec, I'll put an example

#

I want to do this https://gist.github.com/waio1990/166ad2946302b368bc4dfcc2209e5b2e

#

Obv for a very big database, but the idea is to remove duplicated groups; in the example it would be the third group (Id=3)

desert oar Nov 30, 2018, 6:30 PM

#

How big

lime lava Nov 30, 2018, 6:31 PM

#

12k ids, each with variable amount of a and b characteristics

desert oar Nov 30, 2018, 6:33 PM

#

that's not big 😉 anyway you can do .groupby('B').apply(whatever).drop_duplicates(keep='first')

lime lava Nov 30, 2018, 6:33 PM

#

Haha youre right its not that big

#

So you’re suggesting i do it like reverse, group by the characteristic combo and then remove ids?

desert oar Nov 30, 2018, 6:36 PM

#

Maybe I don't fully understand what you are getting at

#

It looks to me like you are doing an aggregate operation by group

#

And then removing duplicates after that aggregation

lime lava Nov 30, 2018, 6:43 PM

#

I updated the gist, which maybe shows my problem a bit better

#

Group with Id 2 has the same first 2 characteristics, but has a third one so its not the same as groups 1 and 3

#

On the other hand groups 1 and 3 are, excluding ID, the same

#

So i want to remove either 1 or 3

desert oar Nov 30, 2018, 6:46 PM

#

Isnt that just what drop_duplicates does?

#

You can use subset=['foo', 'bar'] to only look at specific columns

lime lava Nov 30, 2018, 6:47 PM

#

Wouldn’t that catch group 2 though?

desert oar Nov 30, 2018, 6:52 PM

#

hm

#

i see

#

naively you can iterate over unique pairs of groups and remove dupes

#

for 17k records just do it

#

use tqdm to monitor progress and go get some water while it runs

lime lava Nov 30, 2018, 6:58 PM

#

Okay that doesn’t sound so bad actually

desert oar Nov 30, 2018, 7:04 PM

#

hmm the dupe removal is nontrivial

#

from a data structure perspective

#

import itertools as it

import pandas as pd
from tqdm import tqdm

def groups_are_identical(g1, g2):
    # compare only the characteristic columns; reset the row index so that
    # two groups taken from different rows of the frame can still match
    cols = ['A', 'B']
    try:
        pd.testing.assert_frame_equal(g1[cols].reset_index(drop=True),
                                      g2[cols].reset_index(drop=True))
    except AssertionError:
        return False
    else:
        return True

# assumes 'id' is a column, not the dataframe index itself
grps = data.groupby('id')
n_grps = len(grps)
grp_pairs = {(lab1, lab2): groups_are_identical(grp1, grp2)
             for (lab1, grp1), (lab2, grp2)
             in tqdm(it.combinations(grps, 2), total=n_grps * (n_grps - 1) // 2)}

this gets you part of the way

lime lava Nov 30, 2018, 7:07 PM

#

Thank you!

desert oar Nov 30, 2018, 7:08 PM

#

that tells you if any pair of id's is a dupe

#

then youd have to build up connected sets, and grab 1 id from each set
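The "connected sets" step could be sketched with a small union-find. This is only an illustration, assuming `grp_pairs` maps `(id_a, id_b)` to a bool like the dict built above; `pick_representatives` and the toy ids are hypothetical names, not part of the original snippet.

```python
def pick_representatives(labels, grp_pairs):
    """Return one id per set of mutually duplicated groups (the smallest id)."""
    parent = {label: label for label in labels}

    def find(label):
        # Walk up to the root of this label's set, shortening the path as we go.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for (lab1, lab2), is_dupe in grp_pairs.items():
        if is_dupe:
            root1, root2 = find(lab1), find(lab2)
            if root1 != root2:
                # Attach the larger root under the smaller one,
                # so the smallest id of each set ends up as its root.
                parent[max(root1, root2)] = min(root1, root2)

    return sorted({find(label) for label in labels})


# Toy input mirroring the gist discussion: groups 1 and 3 are duplicates.
toy_pairs = {(1, 2): False, (1, 3): True, (2, 3): False}
keep_ids = pick_representatives([1, 2, 3], toy_pairs)
print(keep_ids)
```

Filtering the frame afterwards would then be something like `data[data['id'].isin(keep_ids)]`.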

#

probably an easier way to do it tho

#

yeah theres gotta be an easier way

#

or not. graph algorithms pop up in unlikely places

#

wait hang on this is idiotic

#

lol

#

unique_grps = {}

for lab1, grp1 in grps:
    # keep a group only if it matches none of the groups kept so far
    if not any(groups_are_identical(grp1, grp2) for grp2 in unique_grps.values()):
        unique_grps[lab1] = grp1

ukeys, ugrps = zip(*unique_grps.items())
data_deduplicated = pd.concat(ugrps, keys=ukeys)
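One possibly easier route, sketched here under assumptions: give every id a hashable "signature" of its rows and keep the first id seen for each signature, which avoids comparing groups pairwise. The column names `id`, `A`, `B` and the toy frame are assumed stand-ins for the gist, and `group_signature` is a hypothetical helper.

```python
import pandas as pd

# Toy data standing in for the gist: ids 1 and 3 share the same (A, B) rows,
# id 2 has the same first two rows plus a third one.
data = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3],
    'A':  ['x', 'y', 'x', 'y', 'z', 'x', 'y'],
    'B':  [10, 20, 10, 20, 30, 10, 20],
})


def group_signature(grp):
    # Sorted tuple of (A, B) rows: hashable and independent of row order.
    return tuple(sorted(zip(grp['A'], grp['B'])))


first_id_for_signature = {}
for label, grp in data.groupby('id'):
    # setdefault keeps the first id that produced a given signature
    first_id_for_signature.setdefault(group_signature(grp), label)

keep_ids = sorted(first_id_for_signature.values())
data_deduplicated = data[data['id'].isin(keep_ids)]
print(data_deduplicated)
```

This does a single pass over the groups instead of looking at every pair, at the cost of treating row order inside a group as irrelevant.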

lime lava Nov 30, 2018, 7:18 PM

#

😮

desert oar Nov 30, 2018, 7:19 PM

#

ok thats my cue to go for a walk. too tired for this lol

lime lava Nov 30, 2018, 7:21 PM

#

Thank you!

granite stream Dec 1, 2018, 3:18 AM

#

Could anyone help with https://discuss.pytorch.org/t/attributeerror-nonetype-object-has-no-attribute-in-channels/30882 ?

PyTorch Forums

AttributeError: 'NoneType' object has no attribute 'in_channels'

I am trying to use prune.py in python 3.7.1 and pytorch 1.0, but I have the following error: [phung@archlinux SqueezeNet-Pruning]$ python finetune.py --prune /usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.S...

azure wren Dec 1, 2018, 5:49 AM

#

https://cdn.discordapp.com/attachments/465999263059673088/518232411818033162/unknown.png

#

🤔 Has anyone tried these Python for Data Science essential oils?

brazen spade Dec 1, 2018, 11:04 AM

#

Hello,

I'm having a hard time finding information on some machine learning topics. I'm looking for theoretical help, not how-to / programming stuff. If anyone can answer these questions for me it would be much appreciated (and sorry I'm a noob!), or at least point me in the right direction:

1) After training and testing a model, the accuracy score and Jaccard index are both reporting 98.7%. This seems ridiculously high considering I didn't even tune the model and used default parameters. I am following someone else's example that did extensive feature engineering. Is it possible it was just that good, or am I right to be skeptical?
2) After training a model, can I make alterations to the parameters of the model or to the original data itself to obtain different results? Or do I need completely new data? I've tried this and the result does not seem to change.
3) Along the same lines, can I train multiple models, say a decision tree and an SVM, on the same data set (one data set separated into training and test sets, with the same training and test sets used for both models), or do they need completely different data sets?

arctic moth Dec 1, 2018, 12:36 PM

#

@brazen spade for the first question, it is difficult to say, because it depends on the ratio of data you are using. If you use 99% of the dataset to train the model and just 1% to test it, you can get really high accuracy because of overfitting. So it is possible, but I would have to see the code. 2) What do you mean by changing the parameters of the model? If you fit the model on some data with some parameters and afterwards change the original data, the model would still be trained on the previous data and the inner weights would not change. But you can train 2 models with different parameters on the same dataset and compare the results.

#

if you are following someone else example (same parameters, same dataset etc.) you should obtain a comparable result.

brazen spade Dec 1, 2018, 1:06 PM

#

@arctic moth thanks for the comment. I followed an example for the data cleaning / feature engineering, but I did not like their modeling approach; it was complex, with very few notes explaining why they were doing what they were doing. My goal was essentially to obtain the same data that they used, then model it with maybe three different models with default parameters, then pick the best resulting model and use GridSearchCV from scikit-learn or something like that to tune the hyperparameters and see the improvement. As I mentioned I'm new to this so I'm not really sure of the best way to do it. Thank you for your answers though, you've answered a couple of my questions

arctic moth Dec 1, 2018, 1:07 PM

#

Sure np.

arctic moth Dec 1, 2018, 10:22 PM

#

Anyone know any good free tensorflow tutorials?

ripe niche Dec 2, 2018, 4:02 AM

#

@arctic moth If you're just starting out, might want to try pytorch instead.

chilly shuttle Dec 2, 2018, 8:54 AM

#

or keras

#

probably keras

arctic moth Dec 2, 2018, 10:57 AM

#

And do you have any recommendation for keras tutorials?

undone dirge Dec 2, 2018, 10:59 AM

#

There are some on reddit.. forgot the link

#

And you tube

arctic moth Dec 2, 2018, 11:04 AM

#

thanks 😃

desert oar Dec 2, 2018, 12:42 PM

#

@brazen spade 1) check baseline prevalence of class, or possibly its wildly overfitted to the test set, or its too easy (eg MNIST you should be getting > 98%). 2) same data is fine... why wouldn't it be? 3) as with 2, model algorithm choice is just a big parameter so same data is fine. But know that if you use a test set to compare models and select the best one, that test set is no longer valid for estimating the true out of sample performance of the model. Because once you use it to compare/evaluate models, you are incorporating information from it into the model. This is why i recommend keeping a "holdout set" until the end of the project. Choose model type, parameters, and features using cross validation on say 80% of the data, then use holdout set to evaluate final accuracy after you do all that stuff ... ideally you exclude this holdout set from exploratory analysis as well but that's not always feasible. When i start a predictive modeling / machine learning project i always try to reserve a holdout set asap if possible
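The holdout workflow described above could look roughly like this with scikit-learn. It is a sketch under assumptions: the synthetic data from `make_classification`, the 80/20 split, and the three candidate models are all illustrative choices, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: roughly 90% of the samples belong to one class.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Reserve the holdout set first and do not look at it during model selection.
X_work, X_hold, y_work, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

candidates = {
    # Always predicts the majority class: the "baseline prevalence" check.
    'baseline': DummyClassifier(strategy='most_frequent'),
    'decision tree': DecisionTreeClassifier(random_state=0),
    'svm': SVC(),
}

# Compare models with cross validation on the working 80% only.
cv_scores = {name: cross_val_score(model, X_work, y_work, cv=5).mean()
             for name, model in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Touch the holdout set once, at the very end, with the chosen model.
final_model = candidates[best_name].fit(X_work, y_work)
holdout_accuracy = final_model.score(X_hold, y_hold)
print(cv_scores, best_name, holdout_accuracy)
```

If the baseline already scores close to the "real" models, a high accuracy number mostly reflects class imbalance rather than model quality.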

#

Sorry for wall of text, there is no "return" in discord on mobile

#

@arctic moth i saw the link to the datasets, what did you want to know about them? I dont have personal experience with them

granite stream Dec 2, 2018, 4:44 PM

#

Could anyone help with https://discuss.pytorch.org/t/attributeerror-nonetype-object-has-no-attribute-in-channels/30882/13?u=promach ?

PyTorch Forums

AttributeError: 'NoneType' object has no attribute 'in_channels'

What do you think about this test function which somehow has the ability of testing out the pruned model ? [phung@archlinux SqueezeNet-Pruning]$ python finetune.py --run /usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the ...

arctic moth Dec 3, 2018, 8:38 AM

#

@desert oar i just wanted to ask how to import a dataset which is in a matlab file, but I already figured it out 😃

#

for anyone interested it is in module scipy.io

#

```python
import scipy.io
mat = scipy.io.loadmat('musk1_normalized_matlab.mat')
```

lapis sequoia Dec 4, 2018, 12:04 AM

#

so uh

#

I have a 4 million entries csv which can occupy more than like 30GB when in a pandas dataframe

#

what's the best way to compress all this mess?

#

~~actually 200GB~~

gritty hawk Dec 4, 2018, 7:02 AM

#

@lapis sequoia I would try to split that csv up first

#

it's kind of a mess to find things to split them by though =/

#

also see http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking
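The chunking approach from that link could be sketched as below. The tiny in-memory CSV, the column names, and the chunk size are made up for illustration; with a real file you would pass its path instead of the `StringIO` object. Specifying narrower dtypes is an extra assumption that often reduces memory use.

```python
import io

import pandas as pd

# Stand-in for the large file: 10 rows of "id,value".
csv_text = "id,value\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))

total = 0.0
rows = 0
# chunksize makes read_csv return an iterator of DataFrames
# instead of loading everything at once.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4,
                     dtype={'id': 'int32', 'value': 'float32'})
for chunk in reader:
    # Reduce each chunk to what you actually need, then let it go.
    total += float(chunk['value'].sum())
    rows += len(chunk)

print(rows, total)
```

Only one chunk is held in memory at a time, so this works for aggregations and filtering; anything that needs all rows at once still needs a different strategy.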

dreamy tartan Dec 4, 2018, 8:58 AM

#

Hi, I want to ask something. One-hot encoding ignores missing values in the test data, so there is no problem with the categorical features, but there are some missing values in the numeric features too. Is there any way to ignore these missing values in the test data? I don't want to drop these rows.

olive trench Dec 4, 2018, 12:46 PM

#

Guys do you know why pandas function cat.codes skips some numbers? It is consistent, but my problem is that I want to use one of the cat codes as a row number indicator. I have around 154k unique values but the cat codes go up to 157k (with around 3k being skipped). The cat codes was used on a column that has a text ID

void star Dec 4, 2018, 3:58 PM

#

didn't get far this morning...

📎 unknown.png

chilly shuttle Dec 4, 2018, 4:03 PM

#

@olive trench i mean it's trivial to convert any column into unique integer representation, so you could just do that

#

only guessing here but it might be skipping number ranges to help with ordering

#

and if you want a row number indicator... just make one?
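One plausible cause of gaps, offered as a guess since the data is not shown: the categorical still carries categories that no longer occur in the column (for example after filtering rows), and codes are positions in the full category list. The toy series below is made up to show the effect and two ways of getting gap-free integers.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], dtype='category')
sub = s[s != 'b']  # 'b' is gone from the data but still a category

gappy_codes = sub.cat.codes.tolist()

# Option 1: drop categories that are no longer used, then take the codes.
dense_codes = sub.cat.remove_unused_categories().cat.codes.tolist()

# Option 2: factorize assigns consecutive integers in order of appearance.
fact_codes, uniques = pd.factorize(sub.astype(object))

print(gappy_codes, dense_codes, fact_codes.tolist())
```

If the goal is just a row number, `range(len(df))` or `df.reset_index(drop=True)` is simpler than either option.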

desert oar Dec 5, 2018, 4:34 AM

#

@void star that's just your IDE telling you that you haven't used it yet...

polar acorn Dec 5, 2018, 11:55 AM

#

I assume he meant he didn't get anything done except for importing numpy, I've had such mornings too...

lapis sequoia Dec 5, 2018, 1:07 PM

#

Hey guys: quick question here:

```python
[1,3] in list1
>>> False
nplist = np.asarray(list1)
[1,3] in nplist
>>> True```
I would like numpy to compare the entire [1,3] and not break the list down first. I came up with this solution, but it's ugly.
```np.any(np.all([1,3] == nplist, axis=1))```
Any suggestions for a nicer solution? [was routed here from help]

desert oar Dec 5, 2018, 1:56 PM

#

oof

#

i dont know of a better workaround honestly

#

its a glaring edge case in numpy

#

because tbh its sometimes extremely convenient to write myarray == [1,2] instead of myarray == np.array([1,2]), so its really eager about casting iterable non-array data structures to arrays

#

specifically lists...

#

try it with tuples maybe but i think those get converted as well

#

did you try converting the RHS to array?

#

wait i actually dont think i understand what youre trying to do

#

are you trying to replicate the non-numpy comparison behavior?

lapis sequoia Dec 5, 2018, 2:21 PM

#

yes. that is what I was trying to do.

#

since I am working with coords, I figured it would be more logical to just convert everything to tuples and take it from there

#

but the numpy issue still kinda bothered me. So if you have a suggestion for the future

lyric canopy Dec 5, 2018, 2:52 PM

#

Yes, I think that's the best workaround. I'd write it as this, but it comes down to same thing:

>>> x = np.arange(20).reshape((10,2))
>>> x
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])
>>> ([6, 7] == x).all(axis=1).any()
True
>>> ([6, 8] == x).all(axis=1).any()
False
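For reference, the same idea wrapped in a helper, plus the tuple-based alternative mentioned earlier. `contains_row` is a hypothetical name and `list1` is an assumed example, since the original list was not shown. The surprising `True` comes from `in` on an array behaving like `(arr == value).any()`, which matches single elements rather than whole rows.

```python
import numpy as np


def contains_row(arr, row):
    """True if `row` equals one complete row of the 2-D array `arr`."""
    arr = np.asarray(arr)
    # Compare elementwise, require every column to match, then any row.
    return bool((arr == np.asarray(row)).all(axis=1).any())


list1 = [[1, 2], [3, 4]]
nplist = np.asarray(list1)

print([1, 3] in list1)            # plain list: compares whole sublists
print([1, 3] in nplist)           # numpy: any single element matching is enough
print(contains_row(nplist, [1, 3]))
print(contains_row(nplist, [3, 4]))

# Alternative for coordinates: a set of tuples gives plain Python semantics.
coords = {tuple(row) for row in nplist.tolist()}
print((1, 3) in coords, (3, 4) in coords)
```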

timber crescent Dec 5, 2018, 6:07 PM

#

guys i have this error in this notebook https://a.uguu.se/TuJggNeodOHj_mnist_1.ipynb

#

any one help??

dreamy tartan Dec 6, 2018, 12:09 PM

#

I couldnt solve this error:
TypeError: Wrong type for parameter `n_values`. Expected 'auto', int or array of ints, got <class 'numpy.ndarray'>
I didnt understand why i get this error because i didnt do something different than what i was doing till now. I have some numeric and categoric columns, i fit_transform X_train and i transform X_test and thats all. Also already my categoric columns are labeled by labelencoder.

```python
ohe = OneHotEncoder(categorical_features=col_index, handle_unknown='ignore')
X_train = ohe.fit_transform(X_train).toarray()
X_test = ohe.transform(X_test).toarray()```

What should i do? What im missing?
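Without the full traceback the cause of that error is unclear. As a separate sketch, later scikit-learn releases dropped the `categorical_features` argument in favor of `ColumnTransformer`, which also gives a place to handle the missing numeric values asked about earlier. The column names, toy frames, and the median imputation strategy below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'color': ['red', 'blue', 'red'],
                        'size': [1.0, 2.0, 3.0]})
# The test set has an unseen category and a missing numeric value.
X_test = pd.DataFrame({'color': ['green', 'blue'],
                       'size': [np.nan, 4.0]})

preprocess = ColumnTransformer(
    [
        # Unseen categories become all-zero one-hot rows instead of errors.
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['color']),
        # Missing numbers are filled with the training median, so no rows are dropped.
        ('num', SimpleImputer(strategy='median'), ['size']),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_train_enc = preprocess.fit_transform(X_train)
X_test_enc = preprocess.transform(X_test)
print(X_test_enc)
```

Fitting only on the training data and calling `transform` on the test data keeps the same pattern as the original snippet.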

desert oar Dec 7, 2018, 1:39 AM

#

hmmm

#

that seems weird

#

full traceback?

thorn river Dec 7, 2018, 10:04 AM

#

I want to use POS-Tags to train a model.

The data is like this:

```python
label = [1, 0, 0, ... N]
```

(1 = Female, 0 = Male) 

I have tokenized the strings with SpaCy and intend to use the POS-tagger from SpaCy.
If I apply the POS-tag to the tokenized strings, do I have to do anything else to train a model on this? Such as concatenating the POS-tags to the strings? 
Or can I immediately apply something like tfidfvectorizer or something to supply it to a model (such as a SVM or anything)

solemn topaz Dec 7, 2018, 1:46 PM

#

Anyone interested in helping me with classifying text documents based on the presence of certain keywords in them?

scenic musk Dec 7, 2018, 1:47 PM

#

!t ask

arctic wedgeBOT Dec 7, 2018, 1:47 PM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving
• Keep your patience while we're helping you.

You can find a much more detailed explanation on our website.

solemn topaz Dec 7, 2018, 1:49 PM

#

Ok. I have a bunch of call center recordings that have been transcribed with AWS Transcribe. I'm trying to detect whether or not the customer service agent promoted a new chat-bot tool on a website available to the customer.

#

When the agent promotes the tool they usually say something along the lines of "..by the way have you tried our new virtual-assistant"

#

or "we have a new tool available where you can chat with us..."

#

So my approach was to have all these key words and phrases 'ask IT', 'virtual assistant', 'chat with us' etc

#

And scan for their presence in the call transcription json files spit out by AWS transcribe

#

That approach wasn't very accurate. A lot of false positives. So now I am trying cosine similarity: tokenizing the transcription into sentences, stemming, removing stop words etc., then getting the term frequency vector of each sentence and comparing it to the term frequency vector of a concatenated string of all the keywords
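A bare-bones version of the sentence-versus-keywords comparison described here might look like this. It is a standard-library sketch only: stemming and stop-word removal are left out, the transcript is invented, and `tf_vector`, `cosine`, and `score_sentences` are hypothetical helper names.

```python
import math
import re
from collections import Counter

KEYWORDS = ["ask IT", "virtual assistant", "chat with us"]  # phrases quoted in the chat


def tf_vector(text):
    # Term-frequency vector: lowercase word -> count.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def score_sentences(transcript, keywords=KEYWORDS):
    keyword_vec = tf_vector(" ".join(keywords))
    sentences = [s.strip() for s in re.split(r"[.?!]+", transcript) if s.strip()]
    # Highest-scoring sentences first.
    return sorted(((cosine(tf_vector(s), keyword_vec), s) for s in sentences),
                  reverse=True)


transcript = ("Thanks for calling. By the way have you tried our new virtual "
              "assistant? It is on the website. Have a nice day.")
ranked = score_sentences(transcript)
best_score, best_sentence = ranked[0]
print(best_score, best_sentence)
```

In this toy transcript the sentence "It is on the website" also gets a nonzero score purely because of the word "it", which illustrates how concatenating all keywords into one vector can produce false positives.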

#

It's still not as accurate as I'd like so I'm wondering if anyone knows a good approach for this.

#

Cosine similarity seems to work well for comparing two sentences together however I'm trying to find the presence of certain keywords in the text and I'm not sure it's the most suitable method for my particular problem .

#

Wondering if anyone has any suggestions

scenic musk Dec 7, 2018, 1:56 PM

#

Try Machine Learning?

solemn topaz Dec 7, 2018, 1:56 PM

#

the problem is I only have about 10 confirmed examples of the agent promoting the tool. Not much of a training set.

scenic musk Dec 7, 2018, 1:56 PM

#

Yeah.

solemn topaz Dec 7, 2018, 1:57 PM

#

The plan is to use my current approach to detect new instances of promotion, verify them and slowly build up a decent sized training set.

#

but that will take some time.

#

If anyone has any experience with text similarity measures and knowledge of how to best approach this problem I would be so grateful.

hybrid temple Dec 7, 2018, 9:15 PM

#

Is python pretty good when it comes to handling math stuff? I'm about to take a math class that requires you to learn python along the way

reef bone Dec 7, 2018, 9:40 PM

#

I would say computers are generally pretty good when it comes to handling math stuff

#

Python is beginner-friendly and has many libraries

#

So it's a good tool

hybrid temple Dec 7, 2018, 10:24 PM

#

Yeah thats what i was more looking for

#

Seems python has more user friendly math libs than java

desert oar Dec 8, 2018, 4:08 AM

#

in python you will just have to write a lot less code than java

#

to do simple things

#

also python has Sympy for symbolic math. im not aware of a java equivalent

granite stream Dec 8, 2018, 10:15 AM

#

For https://gitlab.com/promach/Pruning-CNN/blob/master/SqueezeNet-Pruning/finetune.py#L261 , What do 'batch' and 'label' do ? I have printed them out, but I cannot figure it out

GitLab

SqueezeNet-Pruning/finetune.py · master · promach / pytorch-pruning

AlexNet and Squeezenet Pruning

#

See https://paste.ubuntu.com/p/gWQvTYVv9W/ for the printout

lapis sequoia Dec 8, 2018, 4:32 PM

#

A good tutorial for pandas and numpy?

chilly shuttle Dec 8, 2018, 5:23 PM

#

To do what? Those libraries are pretty general

vivid hedge Dec 8, 2018, 5:40 PM

#

Would this channel include mathematics?

placid snow Dec 8, 2018, 5:41 PM

#

The one relevant to data-science I suppose

vivid hedge Dec 8, 2018, 5:43 PM

#

Im not sure if this is but Ill give it a shot. Im trying to generate passwords based on different settings and with weights on certain characters (like I want symbols to be heavier than say lowercase letters) and was wondering if you guys can redirect me to some good resources for this.

hearty token Dec 8, 2018, 6:06 PM

#

@vivid hedge random.choices has a weights parameter you could use for that: https://docs.python.org/3/library/random.html#random.choices Not sure what other settings you were going for, but maybe that's a start anyway 😃
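A small sketch of how weighted character classes could work with `choices`. The class names, default weights, and the `weighted_password` helper are assumptions for illustration; `random.SystemRandom` is used here because the default generator is not meant for security-sensitive values.

```python
import random
import string

POOLS = {
    'lower': string.ascii_lowercase,
    'upper': string.ascii_uppercase,
    'digits': string.digits,
    'symbols': string.punctuation,
}


def weighted_password(length=16, weights=None, rng=None):
    # Assumed defaults: symbols are three times as likely as any other class.
    weights = weights or {'lower': 1, 'upper': 1, 'digits': 1, 'symbols': 3}
    rng = rng or random.SystemRandom()
    # First pick a character class per position according to the weights...
    kinds = rng.choices(list(POOLS), weights=[weights[k] for k in POOLS], k=length)
    # ...then pick a uniformly random character from each chosen class.
    return ''.join(rng.choice(POOLS[kind]) for kind in kinds)


print(weighted_password(20))
```

Other "settings" (minimum counts per class, excluded characters) could be layered on by adjusting `POOLS` or checking the result and regenerating.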

vivid hedge Dec 8, 2018, 6:23 PM

#

@hearty token That could actually help me.. I believed I had to write my own (or implement someone elses) algorithm for this.

That might be shooting for the stars abit tho.

Thank you!

hearty token Dec 8, 2018, 6:24 PM

#

Sure thing! Best of luck with it.

half olive Dec 8, 2018, 8:28 PM

#

hey guys. I would like to contribute to an open source python package. I am a data scientist with a good knowledge of python. I am aware that there are plenty of projects on github where I could start, but I would like to start with small projects as it is my first time. cheers!

lean ledge Dec 9, 2018, 11:14 PM

#

@lapis sequoia This is not data science

#

Also change your nickname please

lapis sequoia Dec 9, 2018, 11:14 PM

#

Prove me wrong

lean ledge Dec 9, 2018, 11:14 PM

#

?

lapis sequoia Dec 9, 2018, 11:14 PM

#

Prove me it’s not data science

lean ledge Dec 9, 2018, 11:15 PM

#

Stop trolling

simple crag Dec 9, 2018, 11:18 PM

#

!ban 387197586370592768 troll

arctic wedgeBOT Dec 9, 2018, 11:18 PM

#

:ok_hand: permanently banned @rustic lily (troll).

runic siren Dec 10, 2018, 12:57 PM

#

hello

lapis sequoia Dec 10, 2018, 5:12 PM

#

sup

#

so i am using this function: ```python
import requests

limit = 100
symbol = "BTCUSDT"
timeframe = "1m"

def get_bars(symbol, limit=100):
    api = '/api/v1/klines?'
    postdict = {
        'symbol': symbol,
        'interval': timeframe,
        'limit': limit
    }
    return _curl_fox(api=api, postdict=postdict)

def _curl_fox(api, postdict=None):
    BASE_URL = 'https://api.binance.com'
    url = BASE_URL + api
    if postdict:
        response = requests.get(url, params=postdict).json()
    else:
        response = requests.get(url).json()
    return response

bardata = get_bars(symbol=symbol, limit=limit)

# each kline is [open time, open, high, low, close, volume, ...],
# so the closing price is at index 4 (index 5 is the volume)
C = []
for innerlist in bardata:
    C.append(float(innerlist[4]))

#print(bardata)
print(C)```

#

that should make a list of the closing prices of btc for the last 100 one-minute bars

#

so i want to make a rsi out of that

#

i have a function from that somewhere ```python
def make_RSI(dataframe):
    delta = dataframe['c'].diff()
    dUp, dDown = delta.copy(), delta.copy()
    dUp[dUp < 0] = 0
    dDown[dDown > 0] = 0
    RolUp = dUp.rolling(14).mean()
    RolDown = dDown.rolling(14).mean().abs()

    RS = RolUp / RolDown
    dataframe['RSI'] = 100 - (100/(1+RS))``` but this one uses a pandas dataframe and i don't wanna use pandas, so how do i make this function useful for the code i am already using?!
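For a plain list of closes, the same calculation could be sketched without pandas as follows. This mirrors the simple 14-period rolling mean used in `make_RSI` above; `rsi_from_closes` is a hypothetical name, `None` stands in for pandas' NaN, and returning 100 when there are no losses in the window is a choice made here.

```python
def rsi_from_closes(closes, period=14):
    """RSI per bar using simple rolling means; None until enough data exists."""
    closes = [float(c) for c in closes]
    # Price change from each bar to the next.
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]

    rsi = [None] * len(closes)
    for i in range(period, len(closes)):
        # The `period` most recent changes, ending with the change into bar i.
        avg_gain = sum(gains[i - period:i]) / period
        avg_loss = sum(losses[i - period:i]) / period
        if avg_loss == 0:
            rsi[i] = 100.0  # no losses in the window
        else:
            rs = avg_gain / avg_loss
            rsi[i] = 100.0 - 100.0 / (1.0 + rs)
    return rsi


# Toy series that alternates up and down by the same amount.
toy_closes = [1, 2] * 10
rsi_values = rsi_from_closes(toy_closes)
latest_rsi = rsi_values[-1]
print(latest_rsi)
```

With the list `C` from the earlier snippet this would be called as `rsi_from_closes(C)`, and the most recent value is simply the last element.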

chilly shuttle Dec 10, 2018, 5:19 PM

#

re-implement all the pandas functionality that it's using..? I'm not sure why anyone would want to do that though

lapis sequoia Dec 10, 2018, 5:49 PM

#

no i wanna avoid using pandas @chilly shuttle

#

and i don't know how to do that thats what i am asking?

#

?

#

anybody knows how to rewrite the function for my list?

wispy blaze Dec 10, 2018, 5:59 PM

#

"don't wanna use pandas"? but why?

lapis sequoia Dec 10, 2018, 5:59 PM

#

too complicated for me now i don't know how to use it on what i wanna do

wispy blaze Dec 10, 2018, 5:59 PM

#

pandas is like the core.

#

you need to learn it xD

simple crag Dec 10, 2018, 5:59 PM

#

You're making it more complicated by attempting to reimplement Pandas

lapis sequoia Dec 10, 2018, 6:00 PM

#

how do i change my code to make this data in a pandas dataframe?

simple crag Dec 10, 2018, 6:02 PM

#

Didn't the code you copied from the internet already do that?

lapis sequoia Dec 10, 2018, 6:02 PM

#

yes

#

```python
        bar_data = pd.DataFrame(get_bars(symbol=symbol, limit=limit))

        if len(bar_data.index) < length + 2: #if the api dont return sufficent OHLC data: TERMINATE
            printMessages.terminatingProgram()
            printMessages.notEnoughBarData()
            quit()

        bar_data.drop([0, 6, 7, 8, 9, 10, 11], axis=1, inplace=True)
        bar_data.columns = ['o', 'h', 'l', 'c', 'v']
        for j in ['o', 'h', 'l', 'c', 'v']:
            for i, v in enumerate(bar_data[j]):
                bar_data.loc[i, j] = float(v)

        # H/O L/O H/C L/C
        for i in bar_data.index:
            bar_data.loc[i, 'Body'] = min((max(abs(bar_data.loc[i, 'o'] - bar_data.loc[i, 'c']), 0.0001) / max(
                (bar_data.loc[i, 'h'] - bar_data.loc[i, 'l']), 0.0001)), 0.001)
            bar_data.loc[i, 'L/O'] = (bar_data.loc[i, 'l'] / bar_data.loc[i, 'c'])
            bar_data.loc[i, 'C/O'] = (bar_data.loc[i, 'c'] / bar_data.loc[i, 'o'])
            if bar_data.loc[i, 'c'] >= bar_data.loc[i, 'o']:
                bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'c']), 0.001) / max(
                    (bar_data.loc[i, 'o'] - bar_data.loc[i, 'l']), 0.0001)), 100)
            else:
                bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'o']), 0.001) / max(
                    (bar_data.loc[i, 'c'] - bar_data.loc[i, 'l']), 0.0001)), 100)```

#

but i have like zero clue of pandas

chilly shuttle Dec 10, 2018, 6:07 PM

#

"too complicated for me now i don't know how to use it on what i wanna do"
then you're gonna have zeeeeero idea how to replicate the functionality in those functions without pandas

#

move on

lapis sequoia Dec 10, 2018, 6:07 PM

#

```python
import requests
import pandas as pd
limit = 100
symbol = "BTCUSDT"
timeframe = "1m"
length = 60



def get_bars(symbol, limit=100):
    api = '/api/v1/klines?'
    postdict = {
        'symbol': symbol,
        'interval': timeframe,
        'limit': limit
    }
    return _curl_fox(api=api, postdict=postdict)
    
    
def _curl_fox(api, postdict=None):
    BASE_URL = 'https://api.binance.com'
    url = BASE_URL + api
    if postdict:
        response = requests.get(url, params=postdict).json()
    else:
        response = requests.get(url).json()
    return response

    
        # get the bars
bar_data = pd.DataFrame(get_bars(symbol=symbol, limit=limit))

if len(bar_data.index) < length + 2: #if the api dont return sufficent OHLC data: TERMINATE
    printMessages.terminatingProgram()
    printMessages.notEnoughBarData()
    quit()

    bar_data.drop([0, 6, 7, 8, 9, 10, 11], axis=1, inplace=True)
    bar_data.columns = ['o', 'h', 'l', 'c', 'v']
    for j in ['o', 'h', 'l', 'c', 'v']:
        for i, v in enumerate(bar_data[j]):
            bar_data.loc[i, j] = float(v)

        # H/O L/O H/C L/C
    for i in bar_data.index:
        bar_data.loc[i, 'Body'] = min((max(abs(bar_data.loc[i, 'o'] - bar_data.loc[i, 'c']), 0.0001) / max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'l']), 0.0001)), 0.001)
        bar_data.loc[i, 'L/O'] = (bar_data.loc[i, 'l'] / bar_data.loc[i, 'c'])
        bar_data.loc[i, 'C/O'] = (bar_data.loc[i, 'c'] / bar_data.loc[i, 'o'])
        if bar_data.loc[i, 'c'] >= bar_data.loc[i, 'o']:
            bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'c']), 0.001) / max((bar_data.loc[i, 'o'] - bar_data.loc[i, 'l']), 0.0001)), 100)
        else:
            bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'o']), 0.001) / max((bar_data.loc[i, 'c'] - bar_data.loc[i, 'l']), 0.0001)), 100)


#print(bardata)
print(bar_data)```

#

this is what it should look with that

#

you can run it for yourself what the return is (its too big too paste here)

#

so how do i call this function then ```python
def make_RSI(dataframe):
    delta = dataframe['c'].diff()
    dUp, dDown = delta.copy(), delta.copy()
    dUp[dUp < 0] = 0
    dDown[dDown > 0] = 0
    RolUp = dUp.rolling(14).mean()
    RolDown = dDown.rolling(14).mean().abs()

    RS = RolUp / RolDown
    dataframe['RSI'] = 100 - (100/(1+RS))```

#

like what is "dataframe" bar_data?

simple crag Dec 10, 2018, 6:09 PM

#

bar_data is a dataframe, yes

lapis sequoia Dec 10, 2018, 6:11 PM

#

then i get ```KeyError: 'c'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "pandas_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 811, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required```

#

with this code

wispy blaze Dec 10, 2018, 6:12 PM

#

your columns are named after integers

#

not letters

lapis sequoia Dec 10, 2018, 6:12 PM

#

yes?

wispy blaze Dec 10, 2018, 6:12 PM

#

dataframe['c']

#

this is calling column with header "c"

chilly shuttle Dec 10, 2018, 6:12 PM

#

50 bucks it'll return object

#

as the dtype

wispy blaze Dec 10, 2018, 6:13 PM

#

well yeah

#

It's still not "c" tho

chilly shuttle Dec 10, 2018, 6:13 PM

#

i don't get it

#

all the code there is referencing a column named c

lapis sequoia Dec 10, 2018, 6:14 PM

#

yes

#

it should be named c

wispy blaze Dec 10, 2018, 6:14 PM

#

📎 unknown.png

lapis sequoia Dec 10, 2018, 6:14 PM

#

but it isn;t

wispy blaze Dec 10, 2018, 6:14 PM

#

his output

lapis sequoia Dec 10, 2018, 6:14 PM

#

yeah

#

shouldn't they be renamed here? for j in ['o', 'h', 'l', 'c', 'v']: for i, v in enumerate(bar_data[j]): bar_data.loc[i, j] = float(v)

#

or here

#

bar_data.columns = ['o', 'h', 'l', 'c', 'v']

#

currently running this code: https://hastebin.com/mewuxanuri.py

wispy blaze Dec 10, 2018, 6:17 PM

#

ah

#

okay

#

your if statement is failing

#

if len(bar_data.index) < length + 2:

#

length = 60

#

oh wait

#

lolol

lapis sequoia Dec 10, 2018, 6:18 PM

#

lol

wispy blaze Dec 10, 2018, 6:18 PM

#

i need a nap

lapis sequoia Dec 10, 2018, 6:18 PM

#

also an approach.....

wispy blaze Dec 10, 2018, 6:19 PM

#

so if i run it piecewise.. it works

lapis sequoia Dec 10, 2018, 6:19 PM

#

yes

#

but once i want to get indicators it doesn't

wispy blaze Dec 10, 2018, 6:20 PM

#

It is your if statement

#

len(bar_data.index) = 100

#

length = 60

#

if you switch < to > it works

📎 unknown.png

#

if you leave it in you need to move the rest of the code to else:

#

```python
if len(bar_data.index) < length + 2:
    printMessages.terminatingProgram()
    printMessages.notEnoughBarData()
    quit()

else:
    bar_data.drop([0, 6, 7, 8, 9, 10, 11], axis=1, inplace=True)
    bar_data.columns = ['o', 'h', 'l', 'c', 'v']
    for j in ['o', 'h', 'l', 'c', 'v']:
        for i, v in enumerate(bar_data[j]):
            bar_data.loc[i, j] = float(v)

        # H/O L/O H/C L/C
    for i in bar_data.index:
        bar_data.loc[i, 'Body'] = min((max(abs(bar_data.loc[i, 'o'] - bar_data.loc[i, 'c']), 0.0001) / max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'l']), 0.0001)), 0.001)
        bar_data.loc[i, 'L/O'] = (bar_data.loc[i, 'l'] / bar_data.loc[i, 'c'])
        bar_data.loc[i, 'C/O'] = (bar_data.loc[i, 'c'] / bar_data.loc[i, 'o'])
        if bar_data.loc[i, 'c'] >= bar_data.loc[i, 'o']:
            bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'c']), 0.001) / max((bar_data.loc[i, 'o'] - bar_data.loc[i, 'l']), 0.0001)), 100)
        else:
            bar_data.loc[i, 'TopBottom'] = min((max((bar_data.loc[i, 'h'] - bar_data.loc[i, 'o']), 0.001) / max((bar_data.loc[i, 'c'] - bar_data.loc[i, 'l']), 0.0001)), 100)```
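Another way to arrange this, sketched under assumptions: use the length check as an early exit and keep the column renaming at the top level, so it always runs when enough bars came back. `klines_to_frame` and the fake bars are hypothetical; the 12-field layout matches the columns dropped in the code above, and `astype(float)` replaces the cell-by-cell conversion loop.

```python
import pandas as pd


def klines_to_frame(raw_bars, min_rows):
    bar_data = pd.DataFrame(raw_bars)
    # Guard clause: bail out early, everything below runs otherwise.
    if len(bar_data.index) < min_rows:
        raise ValueError("not enough bar data")

    # Keep columns 1..5 and give them the names the RSI function expects.
    bar_data = bar_data.drop(columns=[0, 6, 7, 8, 9, 10, 11])
    bar_data.columns = ['o', 'h', 'l', 'c', 'v']
    # The API returns prices as strings; convert all columns in one step.
    return bar_data.astype(float)


def fake_bar(i):
    # Made-up kline with the same 12-field shape as the API response.
    price = 100.0 + i
    return [1544465820000 + 60000 * i, str(price), str(price + 2),
            str(price - 1), str(price + 1), "12.5",
            1544465879999 + 60000 * i, "1250.0", 10, "6.0", "600.0", "0"]


bars = klines_to_frame([fake_bar(i) for i in range(5)], min_rows=3)
print(bars)
```

Raising an exception instead of calling `quit()` is a design choice here, so that the caller decides what to do when the data is too short.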

lapis sequoia Dec 10, 2018, 6:23 PM

#

yes

#

so now it should work right?

wispy blaze Dec 10, 2018, 6:24 PM

#

it should

#

that rsi function should eat the bar_data just fine

lapis sequoia Dec 10, 2018, 6:25 PM

#

yeah oke thats cool i guess

#

but here is the part that i don't get lol, how to use this data

#

i wanna create an average of the rsi

#

but as you can see the rsi starts with nan in order to start creating a rsi

#

```
0     4.277778        NaN
1     4.462882        NaN
2   100.000000        NaN
3     3.490909        NaN
4     1.837302        NaN
5     2.070513        NaN
6    25.000000        NaN
7     0.212454        NaN
8     0.000741        NaN
9     3.985294        NaN
10    1.128302        NaN
11    1.585714        NaN
12    2.701754        NaN
13   42.500000        NaN
14    0.126482  55.902004```

#

so we can't run a calculation on that

wispy blaze Dec 10, 2018, 6:27 PM

#

wait what

#

rephrase

lapis sequoia Dec 10, 2018, 6:28 PM

#

the rsi's first outputs are NAN as you can see

#

so i want the latest rsi value

chilly shuttle Dec 10, 2018, 6:29 PM

#

yeah that's what rolling window functions will do

lapis sequoia Dec 10, 2018, 6:29 PM

#

in a variable

#

so how do you extract such thing from it?

chilly shuttle Dec 10, 2018, 6:29 PM

#

define 'latest'

lapis sequoia Dec 10, 2018, 6:29 PM

#

the latest known value so the 100th value in this case

#

correction 99th since the first is the name

wispy blaze Dec 10, 2018, 6:30 PM

#

just drop those rows?

lapis sequoia Dec 10, 2018, 6:30 PM

#

no but i want that value in a variable

wispy blaze Dec 10, 2018, 6:30 PM

#

then get more data?

lapis sequoia Dec 10, 2018, 6:30 PM

#

whut

chilly shuttle Dec 10, 2018, 6:30 PM

#

you are trying to build a castle

#

on a cloud

#

with no foundation

wispy blaze Dec 10, 2018, 6:30 PM

#

xD

chilly shuttle Dec 10, 2018, 6:30 PM

#

understand stats. Understand pandas

#

then do whatever tf you're trying to do

lapis sequoia Dec 10, 2018, 6:31 PM

#

i don't understand pandas, i told that already and also said i wanted to avoid it

wispy blaze Dec 10, 2018, 6:31 PM

#

there is no avoiding pandas

#

if you want to avoid pandas python is not for you xD

chilly shuttle Dec 10, 2018, 6:32 PM

#

you sound like 'i wanna drift around a pro racing circuit in a top 10% time, but I don't understand transmissions. How do I race without using a transmission?'

wispy blaze Dec 10, 2018, 6:32 PM

#

^^

lapis sequoia Dec 10, 2018, 6:32 PM

#

there are only 2 things left that i need? please 😄 ?

wispy blaze Dec 10, 2018, 6:32 PM

#

is this homework?

lapis sequoia Dec 10, 2018, 6:32 PM

#

nope

wispy blaze Dec 10, 2018, 6:33 PM

#

then youve got plenty of time to read up xD

lapis sequoia Dec 10, 2018, 6:33 PM

#

sad face

#

this is the code i have now

#

https://hastebin.com/venuxukoji.py

#

i added another indicator to it since heck why not

#

i am just testing around with this data, but sadly it is in pandas thats why i wanna make them variables now so its more in my field

#

so if you'd want to explain to me how extracting specific data from the dataframe is possible so that we can both sleep 😄

chilly shuttle Dec 10, 2018, 6:36 PM

#

.dropna().values[0]

#

now get out

wispy blaze Dec 10, 2018, 6:36 PM

#

LOLOL

lapis sequoia Dec 10, 2018, 6:36 PM

#

no i wasn't talking about the nan values

wispy blaze Dec 10, 2018, 6:37 PM

#

You're doing an extremely poor job of explaining what you even want out of the dataframe.

lapis sequoia Dec 10, 2018, 6:37 PM

#

i need the latest (in this case the 99th) value of the RSI column in a variable named x, and i need the distance of the upper band and the lower band from the ma in 2 more variables

wispy blaze Dec 10, 2018, 6:38 PM

#

x=dataframe['column'][row]
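
For the "latest RSI value in a variable" request, a minimal sketch, assuming a DataFrame with a hypothetical `rsi` column whose first rows are NaN (the numbers are made up for illustration). Note that `.dropna().values[0]` returns the first valid value, while `.iloc[-1]` returns the last row by position:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bar_data; the column name 'rsi' is an assumption.
bar_data = pd.DataFrame({"rsi": [np.nan, np.nan, 55.9, 61.2, 48.7]})

first_valid = bar_data["rsi"].dropna().values[0]   # first non-NaN value
latest = bar_data["rsi"].iloc[-1]                  # last row by position
latest_valid = bar_data["rsi"].dropna().iloc[-1]   # last non-NaN value, in case the tail holds NaN

x = float(latest)  # plain Python float, no pandas needed from here on
```

Once the value is a plain float, the rest of the script can work with it without touching pandas.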

chilly shuttle Dec 10, 2018, 6:38 PM

#

wait you can do that now?

#

no need for iloc?

wispy blaze Dec 10, 2018, 6:38 PM

#

I've never used iloc

chilly shuttle Dec 10, 2018, 6:39 PM

#

i guess it's very pandas'y to have multiple syntax for the same thing

wispy blaze Dec 10, 2018, 6:39 PM

#

bar_data['c'][12]

📎 unknown.png

#

iloc has always bothered me tbh

simple crag Dec 10, 2018, 6:40 PM

#

coming from MATLAB iloc is great

wispy blaze Dec 10, 2018, 6:41 PM

#

MATLAB is sin

simple crag Dec 10, 2018, 6:41 PM

#

No memes pls

wispy blaze Dec 10, 2018, 6:41 PM

#

i hear the germans still love it tho

lapis sequoia Dec 10, 2018, 6:41 PM

#

lol

chilly shuttle Dec 10, 2018, 6:42 PM

#

📎 unknown.png

#

oh shit it wins on perf too

#

TIL

#

caveat being it only works in favour if you're fetching a single column

wispy blaze Dec 10, 2018, 6:43 PM

#

true.

#

bar_data[['c','o']][1:2] unless you do that

#

but the output is a bit different

chilly shuttle Dec 10, 2018, 6:44 PM

#

also vectorized access

#

different beasts after all

#

but yeah i had no idea you would access rows with a second indexer like that

#

...even though it's the same as slicing a dataframe or view

#

I guess I'm just dumb

wispy blaze Dec 10, 2018, 6:45 PM

#

shrug you seem alright to me

chilly shuttle Dec 10, 2018, 6:47 PM

#

so the only actual winning use case for iloc is vectorized access

#

a la

📎 unknown.png

#

and iirc it's the only way to do assignment anymore
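
A small sketch of the access styles discussed here, on a made-up frame whose row labels deliberately differ from row positions. As far as I know `.loc` and `.at` also support assignment, so `iloc` is not the only option; the point is to assign through a single indexer rather than a chained one:

```python
import pandas as pd

df = pd.DataFrame(
    {"c": [10.0, 11.0, 12.0, 13.0], "o": [9.0, 10.5, 11.5, 12.5]},
    index=[100, 101, 102, 103],  # labels differ from positions on purpose
)

by_label = df["c"][101]          # column first, then row *label*
by_position = df["c"].iloc[1]    # column first, then row *position*
block = df.iloc[[0, 2], [0, 1]]  # vectorized positional access: rows 0 and 2, both columns

# Assignment through one indexer avoids chained-assignment pitfalls.
df.loc[101, "c"] = 99.0                     # by label
df.iloc[3, df.columns.get_loc("o")] = 0.0   # by position
```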

lapis sequoia Dec 10, 2018, 6:48 PM

#

yes

#

i got it working now

#

thank you guys both so much

chilly shuttle Dec 10, 2018, 6:48 PM

#

congrats, you have built a castle on top of a cloud

lapis sequoia Dec 10, 2018, 6:49 PM

#

yey 😄

chilly shuttle Dec 10, 2018, 6:49 PM

#

it will fall apart at the first breeze

wispy blaze Dec 10, 2018, 6:49 PM

#

lol

lapis sequoia Dec 10, 2018, 6:49 PM

#

¯_(ツ)_/¯

#

definitely

#

thats why i don't touch it anymore lolzs

#

but just thanks, really appreciated

wispy blaze Dec 10, 2018, 6:53 PM

#

no problem

lapis sequoia Dec 10, 2018, 6:55 PM

#

have a nice evening

chilly shuttle Dec 10, 2018, 6:55 PM

#

also for those who haven't seen it
https://github.com/google/jax

GitHub

google/jax

GPU- and TPU-backed NumPy with differentiation and JIT compilation. - google/jax

vale hedge Dec 11, 2018, 4:13 AM

#

anyone know for pandas dataframe, what is simplest way to add a row?

slender oracle Dec 11, 2018, 4:18 AM

#

If the row you’re adding is another dataframe with the same columns you could use pd.concat

woven tundra Dec 11, 2018, 6:26 AM

#

df.append({"col1": val1, "col2": val2}, ignore_index=True)

#

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html
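
Note for readers on newer pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the `pd.concat` route is the one that still works. A minimal sketch with made-up column names and values:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

# Wrap the new row in a one-row frame with the same columns, then concatenate.
new_row = pd.DataFrame([{"col1": 3, "col2": "c"}])
df = pd.concat([df, new_row], ignore_index=True)

# With a default RangeIndex, assigning to the next label also adds a row in place.
df.loc[len(df)] = [4, "d"]
```

Adding rows one at a time copies data on every call, so for many rows it is usually cheaper to collect them in a list and build or concatenate once.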

thorn river Dec 11, 2018, 11:26 AM

#

Does anyone know how to cite the SpaCy POS-tagger? I've scoured the internet but can't seem to find anything

void anvil Dec 11, 2018, 5:03 PM

#

for ML in python, is it more correct / efficient to keep everything in data frames or separate lists

#

e.g. df1 has predictor 1, predictor x, bins

#

or df predictor 1, predictor x; list bin

twin sierra Dec 11, 2018, 9:23 PM

#

Hello, I am looking for a better way to mask out the Green and Blue channel of an opencv image.

The current way I do it is to create an image the same size as my target, set each pixel to (0,0,255), then use cv.bitwise_and between the two images so only the Red channel is left. This works, but I would rather not keep a 2nd buffer for the mask when every pixel of the mask is the same.
The 2nd way I tried is img_target[:,:,:2] = 0 which sets the first two channels of every pixel to 0. This also works, but takes 16 times as long as the bitwise_and method I currently use.
Finally, I thought I could apply the bitwise operation using a slice like img_target[:,:] &= (0,0,255), but that threw a type error.

Is there a way to do this that is as fast as bitwise_and, but doesn't use a 2nd buffer?

twin sierra Dec 11, 2018, 9:42 PM

#

I figured out a way that is even faster than the buffer. cv.bitwise_and(img_target, (0, 0, 255), img_target)
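
For comparison, a NumPy-only sketch of the same in-place idea. The tiny array is a made-up stand-in for `img_target`. The earlier `&=` attempt with a plain tuple probably failed because NumPy turns the tuple into a wider integer array that it will not cast back into `uint8` in place; an explicit `uint8` mask avoids that. No speed claim is made here:

```python
import numpy as np

# Hypothetical 2x2 BGR image standing in for img_target.
img_target = np.full((2, 2, 3), 200, dtype=np.uint8)

# A 3-element uint8 mask broadcasts over every pixel, so no full-size buffer is needed.
mask = np.array([0, 0, 255], dtype=np.uint8)
np.bitwise_and(img_target, mask, out=img_target)  # result is written back into img_target
```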

wheat wedge Dec 12, 2018, 1:58 AM

#

is there a good way in python to calculate the joint eigenvalues of a pair of matrices?
so finding the lambdas for det(\lambda*A-B)=0

#

where A and B are covariance matrices
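
A hedged sketch of one way to do this with NumPy alone, assuming A is invertible: det(λA − B) = 0 holds exactly when λ is an eigenvalue of A⁻¹B. The matrices below are made up. If SciPy is available, `scipy.linalg.eigh(B, A)` solves the same generalized problem for symmetric B and positive-definite A.

```python
import numpy as np

# Made-up symmetric positive-definite matrices standing in for the covariances.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 3.0]])

# det(lambda*A - B) = 0  <=>  lambda is an eigenvalue of inv(A) @ B.
# solve(A, B) computes inv(A) @ B without forming the inverse explicitly.
lambdas = np.linalg.eigvals(np.linalg.solve(A, B))

# Sanity check: each lambda should make the determinant vanish (up to rounding).
residuals = [np.linalg.det(lam * A - B) for lam in lambdas]
```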

wheat wedge Dec 12, 2018, 3:03 AM

#

nvm, it is solved afaik

latent flicker Dec 12, 2018, 9:26 AM

#

Anyone have any good pandas resources to go from beginner to intermediate?

polar acorn Dec 12, 2018, 1:48 PM

#

https://github.com/tdpetrou/Learn-Pandas
https://medium.com/dunder-data/how-to-learn-pandas-108905ab4955

#

Should give you a good grasp of the fundamentals at least when it comes to indexing, where I think a lot of beginners have trouble.

gray tartan Dec 12, 2018, 3:31 PM

#

hey there !
important question
how would u turn a 2D image into a 180° 3D curvilinear perspective?
(sth like from a fisheye lens)
https://static.fnac-static.com/multimedia/Images/FR/MC/29/84/32/20087849/1541-4/tsp20160630233228/Objectif-Fisheye-0-35X-pour-reflex-CANON-EOS.jpg

i need to turn a panoramic 2D image into this

late garnet Dec 12, 2018, 5:06 PM

#

Does anyone have experience with NLP? I am trying to do some data cleansing on some free form text - specifically job titles. However, there are many variations in typos making it difficult to cleanse. Do any of you have any general suggestions? So far I have manually curated job titles, built a BernoulliNB model to suggest labels and have attempted to use spelling correction. Another approach I have thought about is to use edit distance, but I would need to find a list of common job titles.

carmine lava Dec 12, 2018, 7:47 PM

#

hi

lean ledge Dec 13, 2018, 12:12 AM

#

@gray tartan the image you linked looks like an image with high barrel radial distortion. Should be able to construct a distortion coefficient matrix that does that and crop into a circle. OpenCV should be able to do that. Think you might have to construct an identity camera matrix

small ore Dec 13, 2018, 12:36 AM

#

What is a greedy algorithm?

lean ledge Dec 13, 2018, 12:38 AM

#

Algorithms that at each step make the choice that maximises the local benefit (often judged by a heuristic function), hoping this also leads to the global optimum, which isn't guaranteed in general
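
A tiny illustration of the greedy idea, using coin change as a made-up example: always take the largest coin that still fits. It is a sketch, not a general solver, and the second call shows a case where the locally best move misses the global optimum.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that still fits (the locally best move)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    return picked if amount == 0 else None  # None: greedy got stuck


print(greedy_change(63, [25, 10, 5, 1]))  # [25, 25, 10, 1, 1, 1]
print(greedy_change(6, [4, 3, 1]))        # [4, 1, 1], although [3, 3] uses fewer coins
```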

small ore Dec 13, 2018, 12:44 AM

#

That definition seems more complicated than ESL 😄

#

"heuristic function" :faint:

latent flicker Dec 13, 2018, 12:47 AM

#

@polar acorn thanks

lean ledge Dec 13, 2018, 4:03 AM

#

@small ore A heuristic function is a "hint" function. It approximates and hints at whether you're close to the goal or not. Imagine you were looking for a path from point A to B in a maze. A naive search algorithm would start at the start and search in all directions, even the ones that lead away from B. A heuristic function would be a measure of how close you are to the goal. So an algorithm like A* would start by expanding only in the directions that help reach the goal (by listening to the heuristic function), but if that doesn't work it tries out other options
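
The maze description can be sketched as a small A* search on a grid. The grid layout, the function name, and the choice of Manhattan distance as the heuristic are assumptions for illustration:

```python
import heapq


def astar(grid, start, goal):
    """Length of the shortest path on a grid of 0 (free) / 1 (wall) cells, or None."""

    def h(cell):
        # Heuristic: Manhattan distance, which never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best = {start: 0}                   # cheapest known cost to reach each cell
    frontier = [(h(start), 0, start)]   # ordered by cost so far + heuristic
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best[cell]:
            continue                    # stale queue entry, a cheaper route was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None


maze = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(astar(maze, (0, 0), (2, 0)))  # 8
```

The heuristic pulls the search toward the goal first, and the cost-so-far term lets it fall back to the detour around the wall when the direct route is blocked.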

gray tartan Dec 13, 2018, 7:45 AM

#

@lean ledge hey thx

#

I learnt about it

#

But i cant find any algorithm that can reproduce it

#

So i think i'll try it myself

#

With opencv i found only things to correct it

#

So, the contrary

#

To project onto a spherical surface, is it better to do barrel or pincushion distortion?

lean ledge Dec 13, 2018, 7:50 AM

#

If you dont mind C++, https://stackoverflow.com/questions/10935882/re-distort-points-with-camera-intrinsics-extrinsics/13778828#13778828

Stack Overflow

Re-distort points with camera intrinsics/extrinsics

Given a set of 2D points, how can I apply the opposite of undistortPoints?

I have the camera intrinsics and distCoeffs and would like to (for example) create a square, and distort it as if the cam...

#

you want barrel distortion

gray tartan Dec 13, 2018, 7:51 AM

#

Cool

#

Thx

#

In fact I just have to move the points nearer to the center

#

Starting by the center

#

I'll try by dividing by 10 the distance between the pixel and the center

gray tartan Dec 13, 2018, 8:15 AM

#

ok @lean ledge

#

I've ended up with this:

#

from PIL import Image
import math


im = Image.open('08.jpg').convert('RGBA')
im2 = Image.new('RGBA', (im.width, im.height), (255,255,255,255))

coordsStart = (int(im.width/2) + 1, int(im.height/2) - 1)
cardinalSearch = [(-1, 0), (0, 1), (1, 0), (0, -1)]
pointer = 0
for x in range(im.height - 1):
    for i in range(2):
        for y in range(x):
            coords = (coordsStart[0] + (x * cardinalSearch[pointer][0]), coordsStart[1] + (x * cardinalSearch[pointer][1]))
            centerDistance = math.sqrt((coords[0] - coordsStart[0]) ** 2 + (coords[1] - coordsStart[1]) ** 2)

            im2.putpixel((int(coords[0] - cardinalSearch[pointer + 1 if pointer != 3 else 0][0] * centerDistance / 4), int(coords[1] - cardinalSearch[pointer + 1 if pointer != 3 else 0][1] * centerDistance / 4)), im.getpixel(coords))

        pointer += 1 if pointer != 3 and x != 0 else -3 if x != 0 else 0

im2.save('barrelDistorded.png')

#

but it gives me an index out of range on getpixel :/

#

coords is supposed to go spiral from center

#

to get a smooth distortion

#

starting from the top right pixel from the square of 4 pixels at the center
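
A minimal sketch of the barrel effect using inverse mapping, which sidesteps the out-of-range reads: instead of walking a spiral and pushing source pixels outward, it loops over every output pixel, looks up where that pixel comes from, and skips anything that falls outside the source. In the posted loop, `x` appears to grow to nearly the full image height while `coords` starts at the centre, which would explain the index error. The function name, the `strength` parameter, and the simple 1 + k·r² radial model are assumptions for illustration, and a nested list stands in for the PIL image.

```python
import math


def barrel_distort(pixels, strength=0.5, fill=0):
    """Return a barrel-distorted copy of a 2D list of pixel values."""
    h, w = len(pixels), len(pixels[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    norm = math.hypot(cx, cy) or 1.0          # so the corners sit at radius 1
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = (x - cx) / norm, (y - cy) / norm
            r2 = dx * dx + dy * dy
            scale = 1 + strength * r2          # source lies farther out than the destination
            sx = round(cx + dx * scale * norm)
            sy = round(cy + dy * scale * norm)
            if 0 <= sx < w and 0 <= sy < h:    # bounds check replaces the index error
                out[y][x] = pixels[sy][sx]
    return out


img = [[y * 5 + x for x in range(5)] for y in range(5)]  # made-up 5x5 "image"
out = barrel_distort(img, strength=0.5, fill=-1)
```

With PIL the same loop would read through `im.getpixel((sx, sy))` and write through `im2.putpixel((x, y), ...)`; because every output pixel is visited exactly once, there are no holes in the result.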

somber zodiac Dec 13, 2018, 4:00 PM

#

Hello

#

Is anyone familiar with Yolo object detection?

#

I'm doing a project for Cambridge Uni

#

Would greatly appreciate some help

#

Not regarding code, but concept

hearty token Dec 13, 2018, 4:06 PM

#

I'm not familiar with it @somber zodiac, but I saw your question in #help-kiwi and had a look. Just a thought, but:
By default, YOLO only displays objects detected with a confidence of .25 or higher. You can change this by passing the -thresh <val> flag to the yolo command. If you have many similar objects in the image (like the smileys), perhaps the threshold of 25% match is too low? So it would find too many matches. The tool is made for real-time object detection after all, so it can't be too picky. But in your case it seems it should be very picky. Did you try adjusting this?

somber zodiac Dec 13, 2018, 4:09 PM

#

I have. I'll explain more clearly what's happening

#

Suppose you have this image

#

📎 mobile.png

#

Now suppose all of the emojis are moving, such that they may occlude other emojis

#

And suppose we want to detect just 1 of all the emojis on the screen

#

I.e. this one:

#

📎 unknown.png

#

How many images of 😪 would you expect to require to train the model?

#

Because I've used 300 and sometimes it produces bounding boxes around 😪 and 😆 for instance saying they are the same thing

#

Thing is, for those 300 images, they're all of the same icon

#

I've changed the threshold as best as I can to narrow the detection down to 1 object as closely as possible but sometimes the minimum it can detect is 2 objects

#

Which it shouldn't, because there is only 1 😪

#

My error is near 0 in training

hearty token Dec 13, 2018, 4:19 PM

#

Well, this is not my area, but I'll share my thoughts. I'm thinking if you want it to be able to identify partially obstructed smileys, shouldn't you train it with what the partially obstructed smiley looks like? 300 identical images wouldn't give it any training, as far as I can tell anyway (or does the algorithm account for that?). Maybe you could randomize training images by randomly cropping off segments of the smiley you want to identify, or something like that?

somber zodiac Dec 13, 2018, 4:20 PM

#

Yup I've done that

#

I didn't literally use 300 of the same image

#

I used 300 of the same emoji, but in different scenarios

#

i.e. how you describe

#

Sometimes it is partially covered

#

Sometimes it isn't but there is something in the background

hearty token Dec 13, 2018, 4:25 PM

#

Well, I don't know, to be honest. And I don't even know if you're getting unexpected behavior. Does the percentage of times it gets it right correspond somewhat with the confidence threshold?

somber zodiac Dec 13, 2018, 4:27 PM

#

It does depend on the threshold. I'm usually able to detect it correctly when the object isn't occluded but sometimes it classifies 2 different emojis that aren't occluded as the same object

#

But when I vary the threshold to detect just 1 when it does that

#

It picks the wrong one

hearty token Dec 13, 2018, 4:31 PM

#

In the demo video YOLO is distinguishing bicycles from dogs, for example. That difference is pretty big compared to 😪 and 😆 . So perhaps it's expected that it confuses the two? Both smileys will have obstructed parts that look the same. I guess maybe the training images shouldn't be too obstructed? Because if all the distinguishing features are obstructed, and you're saying "that's my smiley", then the program will think anything yellow is the one you're looking for

somber zodiac Dec 13, 2018, 4:34 PM

#

Yes that's correct, but what would need to be done to overcome this?

#

Also, the problem I'm talking about doesn't have any occlusion

#

Sometimes, the emojis aren't even covered, yet it still classifies more than 1 emoji that isn't covered as the same thing

lyric canopy Dec 13, 2018, 4:37 PM

#

I've also been reading a bit about the model: Isn't its main goal to detect classes of object, not specific individuals?

#

That would explain why it sometimes groups smileys together

#

It also doesn't seem to care much about small details (because it isn't interested in classifying individuals, but classes)

hearty token Dec 13, 2018, 4:39 PM

#

@somber zodiac Right, but wouldn't that happen if you've trained it with too many "feature-weak" images?

#

Or perhaps it's in the model itself, as Ves suggests

#

Also, the first answer you got (in #help-kiwi ) was that you should train it more. You said it shouldn't need training because it's a static image, but in this problem description, it isn't at all static. So maybe it is just more training required?

somber zodiac Dec 13, 2018, 4:52 PM

#

Suppose you have a 60 second clip of the emojis moving

#

I've taken 60 frames

#

So 1 frame per second

#

So those are my static images

#

I have 60 static images

#

And in each image I want to detect that emoji

hearty token Dec 13, 2018, 4:58 PM

#

Why not try giving it more training and see if it improves?

lapis sequoia Dec 13, 2018, 5:14 PM

#

Also doesn't that more fall under object tracking rather than classification?

small ore Dec 13, 2018, 9:10 PM

#

Thanks @lean ledge .

lean ledge Dec 13, 2018, 9:14 PM

#

That sounded unproductive, lol

#

@somber zodiac Is it generating extra bounding boxes for the other emoji during the non maximal suppression stage?

#

It sounds to me like the problem might be removing the bounding box for the emoji behind because of high shared area

#

Have you tried using RCNNs instead of YOLO? They are not as fast but they result in generally better performance than YOLO

#

So they're better for non real-time contexts

#

And they can generally be built up to more complicated stages where they can do instance segmentation instead of just detection which YOLO isnt really built for. If you get instance segmentation working for your network, your problem would be solved because it would be trying to segment each different instance of the emoji differently
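
For readers unfamiliar with the non-maximal suppression step mentioned above, a plain-Python sketch of the usual idea: keep the highest-scoring box and drop boxes that overlap a kept one beyond an IoU threshold. The box format `(x1, y1, x2, y2)`, the threshold, and the sample boxes are assumptions for illustration; this is not YOLO's actual implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Indices of boxes kept after greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

This also shows the occlusion concern: two genuinely different objects that overlap heavily can lose one of their boxes at this stage.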

delicate nymph Dec 13, 2018, 9:23 PM

#

good evening

#

is there a way to draw the best line for this graph?

#

📎 unknown.png

lean ledge Dec 13, 2018, 9:24 PM

#

looks like you can just do linear fits for each set of points

delicate nymph Dec 13, 2018, 9:25 PM

#

they look more like curves to me

lean ledge Dec 13, 2018, 9:25 PM

#

search up linear regression. lots of frameworks that can do that

#

you sure?

#

they're blobby but the relationship is still linear

delicate nymph Dec 13, 2018, 9:26 PM

#

well i have a slightly different

#

📎 unknown.png

lean ledge Dec 13, 2018, 9:26 PM

#

now that is not linear, yes

delicate nymph Dec 13, 2018, 9:26 PM

#

and i think they both should be curves

#

may i explain what i've tried so far?

lean ledge Dec 13, 2018, 9:27 PM

#

honestly, the best fit to me looks like 2 different lines

delicate nymph Dec 13, 2018, 9:28 PM

#

i don't understand i'm sorry

#

i've tried 2 libraries so far and they don't have subplots

#

fig5, ax5 = plt.subplots()
ax5.plot(data30['PAR'],data30['302nm'],'bo',data30['PAR'],data30['312nm'],'ko',
        data30['PAR'],data30['320nm'],'go',data30['PAR'],data30['340nm'],'ro',
        data30['PAR'],data30['380nm'],'mo',ms=2)
ax5.grid()
fig5.savefig("PAR30.png")
plt.show()

lean ledge Dec 13, 2018, 9:29 PM

#

one line with a higher slope for the first few parts, another line for the second part also going up but less so

delicate nymph Dec 13, 2018, 9:30 PM

#

shouldn't there be code that finds the line by itself? i mean there is in Origin, so there should be in python too

lean ledge Dec 13, 2018, 9:30 PM

#

the closest curve I can think of for that data would be the output characteristics and operation regions of a MOSFET

📎 tran37.png

#

so you can look at the model equations they use for its regions

delicate nymph Dec 13, 2018, 9:31 PM

#

but this wouldn't be exact right?

#

it's like scipy's curve_fit

#

you have to guess the equation
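
A rough sketch of the "two different lines" suggestion, using only `np.polyfit`: try every breakpoint, fit a straight line on each side, and keep the split with the lowest squared error. The data is synthetic and the function name is made up. As noted above, any fit of this kind still assumes a model form (here: two straight segments).

```python
import numpy as np


def two_line_fit(x, y):
    """Return (breakpoint, left_coeffs, right_coeffs) for the best two-segment linear fit."""
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best = None
    for k in range(2, len(x) - 1):             # at least 2 points on each side
        left = np.polyfit(x[:k], y[:k], 1)     # (slope, intercept)
        right = np.polyfit(x[k:], y[k:], 1)
        err = (np.sum((np.polyval(left, x[:k]) - y[:k]) ** 2)
               + np.sum((np.polyval(right, x[k:]) - y[k:]) ** 2))
        if best is None or err < best[0]:
            best = (err, x[k], left, right)
    return best[1:]


# Synthetic data: steep line up to x=4, then a flatter line from x=5 onward.
x = np.arange(10.0)
y = np.where(x < 5, 3 * x, 20 + 0.5 * (x - 5))
breakpoint_x, left, right = two_line_fit(x, y)
```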

fierce saffron Dec 13, 2018, 9:55 PM

#

Is there any good way of including a numeric measure of the number of samples for jointplot?

#

in seaborn

#

I've tried kdeplot & jointplot, but neither seems to provide meaningful numbers to go along with the plot.

ancient dome Dec 14, 2018, 5:15 AM

#

hello

#

anyone here can help me with time series forecasting ?

polar acorn Dec 14, 2018, 9:19 AM

#

Maybe, ask your question and we'll see.

strange radish Dec 14, 2018, 8:11 PM

#

Hey, everyone. I'm the lead dev and maintainer of PyData/Sparse (http://sparse.pydata.org/) and I'd like to invite everyone here to a webinar where I'll be talking about it. It's on December 19 at Noon Eastern. https://app.livestorm.co/quansight/

Quansight

Quansight events | Livestorm

Partner with us on Open Source to prioritize your needs, hire from proven contributors, apply proven collaboration techniques to your business, and get support and maintenance for open source.

small ore Dec 14, 2018, 9:11 PM

#

Helps if you can give a x hours from now, @strange radish

strange radish Dec 14, 2018, 9:12 PM

#

@small ore Done.

small ore Dec 14, 2018, 9:13 PM

#

Ah. Still days to it. I said x hrs from now so the timezone miscalculation problem can be avoided

strange radish Dec 14, 2018, 9:14 PM

#

I'll re-post closer to when it's about to happen! 😄

#

But if you're interested, you can register and it'll email you.

small ore Dec 14, 2018, 9:15 PM

#

👍 I check here more than I check emails. Thanks for letting us know

serene veldt Dec 14, 2018, 11:11 PM

#

hey guys

#

im having an issue with eager tensorflow

#

i have this dataset

#

70,1,4,130,322,0,2,109,0,24,2,3,3,2
67,0,3,115,564,0,2,160,0,16,2,0,7,1
57,1,2,124,261,0,0,141,0,3,1,0,7,2
64,1,4,128,263,0,0,105,1,2,2,1,7,1
74,0,2,120,269,0,2,121,1,2,1,1,3,1
65,1,4,120,177,0,0,140,0,4,1,0,7,1
56,1,3,130,256,1,2,142,1,6,2,1,6,2
59,1,4,110,239,0,2,142,1,12,2,1,7,2
60,1,4,140,293,0,2,170,0,12,2,2,7,2
63,0,4,150,407,0,2,154,0,4,2,3,7,2

#

just a small sample for testing

#

and am trying to read it properly

#

tf.enable_eager_execution()
defaults = [tf.float64] * 14
dataset=tf.data.experimental.CsvDataset(path, defaults)
>>> dataset
>>> <CsvDataset shapes: ((), (), (), (), (), (), (), (), (), (), (), (), (), ()), types: (tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64, tf.float64)>

#

so far so good, but then i get this CsvDataset object

#

my goal would be to have a list of tensors containing all the values from each row

#

like such

#

[<tf.Tensor(shape=(10,), dtype=float64, numpy=array([70.0,67.0,57.0,64.0,74.0,65.0,56.0,59.0,60.0,63.0]))>,
(...) x14]

#

i have tried doing : col1 = dataset.map(lambda *row: row[0]) which makes an iterable <MapDataset shapes: (), types: tf.float64>

#

the problem with that would be raising the complexity to O(n^2) since i would have to loop all the columns and then iterate over the MapDatasets

#

isn't there any proper way to get that desired list of tensors?

small ore Dec 14, 2018, 11:45 PM

#

Could we discuss l1 vs l2 regression please? I just learnt about each and I am not sure how one fares vs the other and in what circumstances each is used

serene veldt Dec 15, 2018, 12:20 AM

#

ignore my question, found an answer

lean ledge Dec 15, 2018, 4:20 AM

#

l1 vs l2 isnt specifically about regression, it's about the nature of the norm

polar acorn Dec 15, 2018, 1:10 PM

#

@small ore Regularisation is used when you want to avoid overfitting, right? It works by adding a term to the cost function; the term involves the parameters of your model. So in essence you want to minimize the cost function while at the same time keeping your parameters close to zero. So do you want to minimize the square of the parameters (this is l2) or the absolute value of the parameters (this is l1)?

If you choose l1, the absolute value, there is a larger chance that some of your parameters end up at zero, but the cost function itself might not be as minimized so the model is a slightly worse fit to the data. With l2 you might end up with fewer parameters that are zero but with a slightly better fit.

So in the end. If you have a lot of parameters in your model and you suspect some of them are superfluous you can use l1 to reduce the number of parameters that influence your model. If you have fewer parameters and you suspect they are all valuable you could use l2 to squeeze out a better fit while still avoiding overfitting.
Makes sense?

chilly shuttle Dec 15, 2018, 3:09 PM

#

that's a pretty good explanation

#

one thing worth adding is even though regularisation modifies trainable parameters, the consequence is it modifies sensitivity to actual features in the input

#

and L1 will cause low-impact features to get ignored as they're received with a 0 term
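
A one-parameter sketch of why L1 zeroes things out while L2 only shrinks them. Assume a single weight whose unregularised best value is `a`, a squared-error cost 0.5·(w − a)², and a penalty strength `lam`; both function names are made up. The minimisers then have closed forms:

```python
def l1_shrink(a, lam):
    """argmin over w of 0.5*(w - a)**2 + lam*abs(w)  (soft-thresholding)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # weak parameters are set to exactly zero


def l2_shrink(a, lam):
    """argmin over w of 0.5*(w - a)**2 + 0.5*lam*w**2."""
    return a / (1.0 + lam)  # scaled toward zero, but never exactly zero unless a is


for a in (2.0, 0.3):
    print(a, l1_shrink(a, 0.5), l2_shrink(a, 0.5))
```

With `lam = 0.5`, the strong parameter (2.0) survives both penalties, while the weak one (0.3) becomes exactly 0 under L1 and merely smaller under L2, which matches the feature-selection behaviour described above.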

small ore Dec 15, 2018, 3:59 PM

#

Woo. Awesome explanations, pppt and bicubic. Thank you. Though I would like to hear any more intuitions if there are any.

chilly shuttle Dec 15, 2018, 4:00 PM

#

ask questions

small ore Dec 15, 2018, 4:03 PM

#

Hm. This is also a discussion channel. So I thought a discussion would benefit. Anyway, question: Is there a way to assess sensitivity of each parameter rather than just find out these are in or out from a l1 after the optimization?

chilly shuttle Dec 15, 2018, 4:05 PM

#

there's a whole bunch of feature selection algorithms, using L1 is not the only way to do feature selection

small ore Dec 15, 2018, 4:07 PM

#

Well, noob here. I just learnt l1 and was interested in knowing how best to put it to use

charred crest Dec 15, 2018, 7:49 PM

#

Hey, is there the right channel to ask question about DL ? (Basically I'm wondering if my results are coherent, I train an AI against itself then I make it play against another AI that my teacher trained before, but my results are kind of weird)

lean ledge Dec 15, 2018, 11:05 PM

#

@charred crest Yes

charred crest Dec 15, 2018, 11:08 PM

#

So, I'm wondering why when I train my AI 70.000 times I have a better % of victory than when I train it 100.000 times, my % of victory decrease by 5 ? That's kind of weird, no? @lean ledge

📎 unknown.png

latent flicker Dec 15, 2018, 11:32 PM

#

@lean ledge are there assignments in that course?

charred crest Dec 15, 2018, 11:34 PM

#

@latent flicker Are you talking to me or?

latent flicker Dec 15, 2018, 11:34 PM

#

No, to @lean ledge

small ore Dec 16, 2018, 12:15 AM

#

Anyone has experience with azure notebooks unable to restart?

worn hollow Dec 16, 2018, 5:24 AM

#

Hey im trying to get started with machine learning and neural networks in python (seems like tensorflow is the way to go?) could anyone get my pointed in the right direction to start this stuff? I tried to do the tutorials from https://www.tensorflow.org/tutorials/ but keras fails to download the datasets.

arctic moth Dec 16, 2018, 9:40 AM

#

Hello, I want to use the weights that are output by the SVM classifier from scikit-learn to test the data that I used to fit the model (positive should give me a number > 1 and negative should give me a number < -1). For a small number of samples (5) the output weights are 2D, but when I add more points (100) and train the classifier on 2-dimensional vectors, its output weights are a 3-dimensional array. How can I get the coefficients that are used to multiply the 2D vector?

#

@worn hollow Don't start with tensorflow if you have no experience with machine learning or neural networks. Try the Keras tutorials for starters, or even scikit-learn, which is much more user friendly. There are a lot of concepts behind machine learning and neural networks in general, and it is much better to start learning in easier frameworks than in the hardest one.

sleek path Dec 16, 2018, 11:49 AM

#

hi

#

is ms in data science a good idea?

lyric canopy Dec 16, 2018, 11:59 AM

#

No one can decide that for you

charred crest Dec 16, 2018, 1:06 PM

#

Is this a coherent result for AI reinforcement learning with Q-learning / TD-lambda algorithms?

📎 unknown.png

lapis sequoia Dec 17, 2018, 11:09 AM

#

Not sure to what extent this is a specific data science question, but my application is regarding data science, so I'll ask away. Do any of you have experience in subclassing numpy's ndarray? The context is typhoon tracking. My idea is to break down my global model into region class instances and add my parameters to these. The parameters are all np arrays (from netCDF)
e.g.

def func_for_ndarray():
    pass

#region.parameter.function
se_asia.pressure.func_for_ndarray()

Would this method be advisable? It would be extremely elegant, but I'm afraid of starting something that's maybe too complex for what it's worth.
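
For reference, a minimal sketch of what such a subclass could look like, following the `__new__` / `__array_finalize__` pattern from the NumPy subclassing guide. The class name `Parameter`, the `name` attribute, and the `anomaly` method are made up for illustration. A plain class that holds an ndarray as an attribute is often simpler if the extra machinery is not needed.

```python
import numpy as np


class Parameter(np.ndarray):
    """ndarray that carries a name and a custom method (hypothetical example)."""

    def __new__(cls, data, name=None):
        obj = np.asarray(data, dtype=float).view(cls)  # reinterpret the array as our subclass
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too, so the attribute has to be copied over here.
        if obj is None:
            return
        self.name = getattr(obj, "name", None)

    def anomaly(self):
        """Deviation from the mean, standing in for func_for_ndarray."""
        return self - self.mean()


pressure = Parameter([1000.0, 1001.0, 1002.0], name="pressure")
anom = pressure.anomaly()
```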

void anvil Dec 17, 2018, 6:19 PM

#

Quick question about using an ensemble classifier for time series data:

Say I have a dataset that's all of 2017. For a train / test split I arbitrarily choose October as the last month for train data, Nov:Dec for the test and I train a few models (SVM, RF, MLP, etc.) on this data.

Now I want to train an ensemble (without contamination) on the few models I've made (e.g. train an RF using predictors of SVM, RF, MLP probability output from the Jan:Oct / Nov:Dec train / test split). Can I predict Jan:Oct using the previous models and feed that to train the ensemble model (e.g. use the same Jan:Oct train / Nov:Dec test), or is that 'cheating' because the initial set of models are trained on the input data which is then being fed to the ensemble? I'm thinking in order to create a 'clean' model I would need to train on the November model outputs (of this example) and test on the December actuals in order to keep the data sanitar

void anvil Dec 17, 2018, 7:40 PM

#

Basically should the ensemble train/test be red or green?

📎 unknown.png

fierce saffron Dec 17, 2018, 8:34 PM

#

anyone know if I can add numbers to the histograms in this seaborn plot:

📎 unknown.png

void anvil Dec 17, 2018, 8:36 PM

#

What do you mean numbers

#

like volume numbers on the x / y axis?

fierce saffron Dec 17, 2018, 8:52 PM

#

yeah.

#

like the number the bar on the histogram represents.

void anvil Dec 18, 2018, 2:05 AM

#

don't know, sorry

#

Anyone know how I should preprocess data? Currently using

```
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
```

However, trying to do a Multinomial NaiveBayes is throwing me for a massive loop because it's rather upset there's negative values
```ValueError: Input X must be non-negative```

feral lodge Dec 18, 2018, 9:17 AM

#

@void anvil Draws from a k-dimensional multinomial is like casting a k-sided die n times and counting how many times each side came up, right? So here are a few draws from a Multinomial(k=3, n=5):

[1, 1, 3]
[3, 0, 2]
[4, 1, 0]
[3, 2, 0]
[0, 3, 2]
[3, 1, 1]
[1, 2, 2]
[2, 2, 1]
[3, 2, 0]
[3, 2, 0]

In this case the die is not fair; the probabilities for the three sides are [0.5, 0.3, 0.2]. So, if you have a bunch of data and you want to see which multinomial fits it best, your data should look like that -- a bunch of counts from each category. If you're using sklearn, you can also use fractional "counts". Then the above draws are equivalent to this:

[0.2, 0.2, 0.6]
[0.6, 0.0, 0.4]
[0.8, 0.2, 0.0]
[0.6, 0.4, 0.0]
[0.0, 0.6, 0.4]
[0.6, 0.2, 0.2]
[0.2, 0.4, 0.4]
[0.4, 0.4, 0.2]
[0.6, 0.4, 0.0]
[0.6, 0.4, 0.0]

Regardless of whether your data is integer counts or fractional counts, you can see that negative numbers don't make sense for fitting a multinomial -- no multinomial will ever produce negative counts, so there is no good fit.
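
The die-casting picture above can be reproduced with the standard library. This is a sketch; the seed and the helper name are arbitrary, so the exact draws will differ from the ones listed.

```python
import random
from collections import Counter


def draw_multinomial(n, probs, rng):
    """Cast a len(probs)-sided die n times and count how often each side came up."""
    sides = range(len(probs))
    counts = Counter(rng.choices(sides, weights=probs, k=n))
    return [counts[s] for s in sides]


rng = random.Random(0)  # seeded only to make the sketch repeatable
draws = [draw_multinomial(5, [0.5, 0.3, 0.2], rng) for _ in range(10)]

# Fractional "counts": divide each draw by the number of casts.
fractions = [[c / 5 for c in d] for d in draws]
```

Every entry is a count (or a fraction of one), so nothing here can go below zero, which is what the `ValueError` is complaining about.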

feral lodge Dec 18, 2018, 9:44 AM

#

@void anvil By the way, in an old post (https://discordapp.com/channels/267624335836053506/366673247892275221/465938441520152576) I explained and plotted what the standard scaler does to your data, in case you're not sure! Since the standard scaler transforms your data to have zero mean, you'll always get some negative data points. I've personally never been in a situation where I needed to normalize multinomial data. It's usually either in fractional count (ie category frequencies) or integer count form already, so I have always been able to just fit it

void anvil Dec 18, 2018, 1:12 PM

#

@feral lodge thanks for that

Unfortunately this is real world data and it very much needs to be scaled for piping into MLP. I'm setting up data pipelines inside my ensemble class so it can make the appropriate decision on a model by model basis (e.g. feed data in to ensemble class; pipeline scales for MLP, doesn't scale for RF, etc.)

feral lodge Dec 18, 2018, 5:21 PM

#

@void anvil Oh, okay! Could you maybe show some sample data? I've only ever fit multinomials as part of school work and I'm having a hard time understanding why vectors of category counts would need any kind of normalization (other than dividing by the sum to produce category frequencies)

feral lodge Dec 18, 2018, 5:42 PM

#

Or I guess since it's an ensemble you don't really know the nature of the data? The way I see it (though i'm only vaguely familiar with multinomials) we can immediately exclude the multinomial distribution as a model of x if x is not either (1) a vector of non-negative integers, or (2) a vector of non-negative fractions that sum to 1

#

In the same way we can immediately, for example, exclude the Beta distribution if the input is not a real number between 0 and 1; the Gamma distribution if the input is negative; or the Bernoulli distribution if the input is not exactly 1 or 0

void anvil Dec 18, 2018, 6:18 PM

#

so an example of the data would be like:

10999    -1    37    1/1/18 00:05
10432    -1    5        1/1/18 02:13
11993     1    3        1/1/18 02:15
12345    1    27    1/1/18 17:29
13500    -1    -13    1/1/18 23:23
13400    1    -150    1/1/18 23:45

#

We know the nature of the data before it gets fed in to the initial models, then the models are fed to an ensemble

#

so it looks something like:

```
Data
stage 1 models
stage 2 ensemble
```

#

And each stage 1 model needs a different data pipeline

#

MNB requires non-negative data, MLP requires standardized data

#

RF doesn't require any data preprocessing

#

so when you're piping in to the first stage models there's varying preprocessing needed (none, some)

feral lodge Dec 18, 2018, 7:28 PM

#

You know, I think it might work with feature scaling https://en.wikipedia.org/wiki/Feature_scaling rather than standard normalization

#

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html

#

@void anvil

void anvil Dec 18, 2018, 7:48 PM

#

it fucks up way bad when you do 0-1 and you can get features outside
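One way to read "features outside": a min-max scaler learns its minimum and maximum from the training data, so any later value beyond that range lands outside [0, 1], possibly below zero. A small pure-Python sketch of the arithmetic (helper names and numbers are invented for illustration, not taken from the data in this chat):

```python
def fit_min_max(values):
    """Remember the range seen during training."""
    return min(values), max(values)

def min_max_transform(values, lo, hi):
    """Map lo -> 0 and hi -> 1; nothing clips values outside [lo, hi]."""
    return [(v - lo) / (hi - lo) for v in values]

train = [3, 5, 27, 37]
lo, hi = fit_min_max(train)

scaled_train = min_max_transform(train, lo, hi)    # stays inside [0, 1]
scaled_new = min_max_transform([-13, 50], lo, hi)  # unseen values escape the range

print(scaled_train)
print(scaled_new)
```

The first new value comes out negative, which is exactly what a model restricted to non-negative inputs cannot accept.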

feral lodge Dec 18, 2018, 7:51 PM

#

Ah, true! I'm out of ideas, maybe someone else can pick up from here 😄 To me it just doesn't look like multinomial data, so if it were me I'd probably just not include a multinomial in the ensemble

void anvil Dec 18, 2018, 7:53 PM

#

even if the data isn't multinomial you can still get good results

#

using MNB and using it as part of an ensemble

#

just need it to pick up something the other models don't

#

it's about model diversity rather than single model perfection

small ore Dec 18, 2018, 8:13 PM

#

Out of curiosity, how are you modelling the date and time?

stray elk Dec 18, 2018, 8:20 PM

#

i dont mean to interrupt a conversation if one is already happening, but I have a question regarding hyperparameter tuning that i cant seem to find an answer to

#

I would like to know if it is possible to dynamically tune hyperparameters that aren't specified in the code

#

I have some code that will iterate over a bunch of different regression models, and then will select the 3 most accurate (based on mae) and create an ensemble model out of them

#

I was wondering if it was possible to be able to tune the parameters on this ensemble model, given I won't know beforehand what ml models will actually be a part of it

#

it could even do the parameter tuning within the ModelTransformer class that I have created for the ensemble model, however if that was the case it would still need to somehow do it dynamically due to not knowing what model could be passed to it

void anvil Dec 18, 2018, 11:39 PM

#

you would tune it the same way you would a regular model tuning

#

you should tune all the 1st layer algorithms

#

if you're going to do that

#

and once you're done tuning hyperparameters on the first layer your inputs to layer 2 don't change

#

so tune all your stuff there

#

etc.

#

just make sure you don't train too many

small shore Dec 19, 2018, 2:55 PM

#

Hello. So I need to run tensorflow gpu on my computer, but I have cuda 10 installed and no cuda 9 support because of rtx not being able to work with it apparently, so how do I get its support/how do I compile tensorflow from the source/able to run cuda 10.

#

idk if this is appropriate to ask here, so excuse me if it isnt

strange radish Dec 19, 2018, 4:16 PM

#

Hey everyone. I'm the maintainer of PyData/Sparse. Join me in talking about this FOSS project in a webinar hosted by Quansight. It starts in 43ish minutes, You can register at https://app.livestorm.co/quansight/

#

@small ore Sorry if the ping is unwelcome, but as I understand it you wanted to join. 😄

small ore Dec 19, 2018, 4:39 PM

#

I am already trying to connect from the past 5 mins. No luck

strange radish Dec 19, 2018, 5:02 PM

#

😦

#

Just like, click Access Webinar in the registration email.

#

@small ore

#

It's live now.

stray elk Dec 19, 2018, 6:55 PM

#

@void anvil hi, I had a look into tuning the 1st layer algorithms, but I wasn't quite sure how to do that as the first layer parameters are simply the classes for the transformers I have created

#

📎 unknown.png

#

this is part of the output for when i run get_params() on the ensemble model

#

Im not quite sure how i would go about tuning the model as a whole, given 1) i wouldn't know what the parameters are in the first place due to dynamic creation, and 2) I don't know how to access deeper parameter levels given the actual ml models are wrapped in a transformer class as part of a feature union

void anvil Dec 19, 2018, 7:06 PM

#

basically before you pump into the ensemble

#

you want to tune each one individually

#

so the pred_union_modelA

#

just gridsearch all the hyperparameters

#

or w/e

#

then when you run the ensemble, it should be the 'best pred_union_modelA'

#

even if it isn't chosen

stray elk Dec 19, 2018, 7:08 PM

#

yea, that is what i was wondering i would need to do

void anvil Dec 19, 2018, 7:08 PM

#

it's not really efficient computationally

#

but if you have unlimited time / resources that's what I'd do

#

not sure how long your training and stuff takes

#

and how much performance increase you get out of it

stray elk Dec 19, 2018, 7:08 PM

#

yea, this isn't time limited

void anvil Dec 19, 2018, 7:09 PM

#

but yeah basically just lazily gridsearch until you perfect each submodel

#

then train on the optimal models

#

then do the same for each layer

#

until you have your final output

stray elk Dec 19, 2018, 7:09 PM

#

so in that case, I would need to pass the params as an argument in the class instantiation

void anvil Dec 19, 2018, 7:09 PM

#

yeah

stray elk Dec 19, 2018, 7:09 PM

#

cool, figured i would probably need to do it that way

void anvil Dec 19, 2018, 7:09 PM

#

I mean you can do

#

    def stage1:
        for models in dic
            grid_search best  hyper params
        return predictions

    def stage 2 (x,y)
        train stage 1
        return ensemble predictions
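The pseudocode above could be fleshed out roughly like this. It is a toy, standard-library sketch, not the code either person in this chat is running: the "models" are plain functions, `grid_search`, `stage1`, `stage2` and `learners` are illustrative names, and stage 2 is a simple average standing in for a trained second-layer model.

```python
from itertools import product
from statistics import mean

def mae(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def grid_search(make_model, grid, X, y):
    """Try every hyperparameter combination and keep the lowest-MAE model."""
    names = list(grid)
    best_score, best_model = float("inf"), None
    for values in product(*(grid[name] for name in names)):
        model = make_model(**dict(zip(names, values)))
        # Assumption: scored on the training data to keep the sketch short;
        # real code would score on held-out folds.
        score = mae(y, [model(x) for x in X])
        if score < best_score:
            best_score, best_model = score, model
    return best_model

def stage1(learners, X, y):
    """Tune each learner separately; return predictions from its best model."""
    return {
        name: [grid_search(make_model, grid, X, y)(x) for x in X]
        for name, (make_model, grid) in learners.items()
    }

def stage2(learners, X, y):
    """Train stage 1, then combine its predictions (a plain average here)."""
    stage1_predictions = stage1(learners, X, y)
    return [mean(row) for row in zip(*stage1_predictions.values())]

# Two toy "models", each with a small hyperparameter grid.
learners = {
    "line": (lambda slope, intercept: (lambda x: slope * x + intercept),
             {"slope": [1, 2, 3], "intercept": [0, 1]}),
    "constant": (lambda value: (lambda x: value),
                 {"value": [3, 5, 7]}),
}
X = [0, 1, 2, 3, 4]
y = [2 * x + 1 for x in X]

tuned = stage1(learners, X, y)
ensemble = stage2(learners, X, y)
print(tuned)
print(ensemble)
```

The number of candidates is the product of the grid sizes, so a 10x10x10 grid means 1000 fits per learner, which is where the slowness comes from.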

stray elk Dec 19, 2018, 7:12 PM

#

thanks for helping out, would the predictions being returned in stage 1 be predictions for the best parameters?

void anvil Dec 19, 2018, 7:12 PM

#

yeah

#

you would grid search hyperparam for each model

#

then only return predictions from the best

#

it's going to be horribly slow likely

stray elk Dec 19, 2018, 7:13 PM

#

ok cool i see, thanks

#

idm about speed, its a personal project

void anvil Dec 19, 2018, 7:13 PM

#

because you'll train a fuck ton of models

#

depending on how in depth you do your gridsearch

stray elk Dec 19, 2018, 7:13 PM

#

i can always do random search

void anvil Dec 19, 2018, 7:13 PM

#

if you do a 10x10x10 you're training 1k models for each model then selecting the best

#

sure

#

just keep in mind you're just adding a ton of bloat

#

and it may or may not affect the results

#

might run a trial of

#

random guess 25 hyperparams for each

#

and 10 and 50

#

and compare the ensemble returns vs the normal ones

#

and see if you really get any improvements

stray elk Dec 19, 2018, 7:14 PM

#

thanks

#

also, I don't mean to bother you more, but while having this convo i have just noticed that nothing i do changes the mae of the ensemble model

#

meaning that something is up with the file, presumably

void anvil Dec 19, 2018, 7:15 PM

#

make sure your train / test splits

#

aren't cheating

stray elk Dec 19, 2018, 7:15 PM

#

my train/test split is 450/200

#

i just ran a test where all 4 models were doing simple ols regression, and the ensemble mae is still 3.15

#

This is one set of results:

📎 unknown.png

#

with ridge, ard and elastic net being the 3 models built into the ensemble

#

and this is a set of results when simply using 3 of the same:

📎 unknown.png

#

as you can see, the mae of the models are incredibly similar, yet the models being used are quite different

#

do you have any idea why this may be?

void anvil Dec 19, 2018, 7:40 PM

#

because the ensemble is able to pick up mostly the same from each set of outputs?

#

look at how correlated each model is to each other

#

if it's an easy data set and you get great results for each

#

you're probably only going to see marginal improvements

#

so if linear regression is .90 auc and ridgecv is .9001

#

it's only marginally better

#

and you'd only expect a marginally better result

#

by putting in a .9000 vs a .9001 model to the ensemble

#

unless the .9001 is highly uncorrelated to the other models plugged in whereas the .9000 is highly correlated

#

If you try with a much harder dataset, you can probably find substantial improvements

#

as you swap in / out

#

but if every model is similar

#

swapping isn't going to do fuck all

stray elk Dec 19, 2018, 7:48 PM

#

thats the thing, the models are all quite different

#

📎 unknown.png

void anvil Dec 19, 2018, 7:49 PM

#

you need to look at

stray elk Dec 19, 2018, 7:49 PM

#

also, i think its notable that the ensemble model is improving accuracy by just under 1 whole % point

void anvil Dec 19, 2018, 7:49 PM

#

colinearity between model predictions

#

because each set of 4 could be not colinear and be enough for the ensemble to get the same info

#

as the other set of 4
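A quick way to check how much the first-layer models overlap is to correlate their predictions pairwise. A minimal sketch with invented prediction vectors (the model names are borrowed from the conversation, the numbers are not real results):

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equally long lists of predictions."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical test-set predictions from three first-layer models.
predictions = {
    "ridge": [10.1, 12.0, 14.2, 15.9, 18.1],
    "ard": [10.0, 12.1, 14.0, 16.1, 18.0],
    "elastic_net": [10.3, 11.8, 14.1, 16.0, 17.9],
}

names = list(predictions)
correlations = {
    (a, b): pearson(predictions[a], predictions[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, r in correlations.items():
    print(pair, round(r, 4))
```

Correlations close to 1 suggest the second layer sees nearly the same signal whichever of these models it is given.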

stray elk Dec 19, 2018, 7:50 PM

#

ah ok i see what you mean, although wouldn't the different maes indicate that the collinearity isn't perfect?

void anvil Dec 19, 2018, 7:50 PM

#

not really

stray elk Dec 19, 2018, 7:50 PM

#

whereas the ensemble mae would suggest the colinearity was perfect

void anvil Dec 19, 2018, 7:50 PM

#

well

#

it's sort of saying that 'roughly the same amount of decision making material is present in both sets'

#

if you dump all 8 of them together and the mae doesn't change

#

then you know that both sets of 4 are passing, roughly, the same information

stray elk Dec 19, 2018, 7:52 PM

#

ok, i think i get what you are saying

#

so in this case, accuracy can still be gained by tuning the models separately and then putting them into the ensemble?

stray elk Dec 19, 2018, 8:15 PM

#

ok its taking a while to run and hasn't thrown an error yet, so im pretty sure i have got the tuning working

void anvil Dec 19, 2018, 8:22 PM

#

yeah basically

#

You know about pipelines?

#

'mlp': Pipeline([('transform', scaler.transform), ('clf', MLPClassifier())]),

is throwing
TypeError: All intermediate steps should be transformers and implement fit and transform. '<bound method StandardScaler.transform of StandardScaler(copy=True, with_mean=True, with_std=True)>' (type <class 'method'>) doesn't

stray elk Dec 19, 2018, 8:31 PM

#

yea, i have a pipeline as part of the ensemble builder function

void anvil Dec 19, 2018, 8:46 PM

#

no I mean

#

I'm trying to put one in

#

and it's messing up

#

I'm not sure where to drop the scaler.fit

#

I think I put it here?

```
    def train(self, X, y):
        for name, clf in self.learners.items():
            clf.fit(X,y)
```

changes to 

```
    def train(self, X, y):
        for name, clf in self.learners.items():
            scaler.fit(X,y)
            clf.fit(X,y)
```
?

#

nah still breaking
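The TypeError quoted above points at the likely cause: the pipeline step was given the bound method `scaler.transform`, while intermediate steps have to be objects that implement both `fit` and `transform` (the final step only needs `fit`). A sketch of how it might look with the scaler object itself as the step, using made-up data and assuming scikit-learn and NumPy are installed; no separate `scaler.fit` call is needed because `Pipeline.fit` fits the scaler before the classifier:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

learners = {
    "mlp": Pipeline([
        ("transform", StandardScaler()),  # the object, not scaler.transform
        ("clf", MLPClassifier(max_iter=500, random_state=0)),
    ]),
}

# Invented data on a large, unscaled range.
rng = np.random.RandomState(0)
X = rng.normal(loc=20000, scale=5000, size=(60, 3))
y = (X[:, 0] > 20000).astype(int)

for name, clf in learners.items():
    clf.fit(X, y)  # fits the scaler, transforms X, then fits the MLP

predictions = learners["mlp"].predict(X)
print(predictions[:10])
```

Learners that need no preprocessing can sit in the same dictionary as bare estimators, so the training loop stays unchanged.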

stray elk Dec 19, 2018, 8:55 PM

#

i have a pipeline, but i have only ever created this one so i don't have much experience

#

this is what mine currently looks like:

📎 unknown.png

#

in fact looking at that i would say i don't even need to use the ModelTransformer at the lin_reg_start, i could just use LinearRegression() on its own

#

well, turns out one of my tuned models is in fact less accurate than when left untuned:

📎 unknown.png

#

@void anvil by removing the ModelTransformer at the lin_reg_start, i got that exact error, so im guessing the first and last steps of a pipeline have to have a fit and transform method?

#

linear regression on its own has no transform method

void anvil Dec 19, 2018, 9:26 PM

#

yeah

#

makes sense

#

thanks

stray elk Dec 20, 2018, 4:47 PM

#

@void anvil Hey, sorry to bother you again, but i don't think the model always returning a mae of 3.15 is because of high colinearity between different models

void anvil Dec 20, 2018, 4:47 PM

#

no I mean they're probably not colinear

#

the combinations probably all give the same info to the second model

#

the sum of information from model group a is ~= model group b

stray elk Dec 20, 2018, 4:47 PM

#

when i comment out the preprocessing stages, RobustScaler and PolynomialFeatures, the resulting mae of the ensemble model is almost the exact same as just regular LinearRegression

#

suggesting to me that the feature union isn't quite functioning right

void anvil Dec 20, 2018, 4:48 PM

#

sure

#

or it could be that 3.15 is all you can learn

stray elk Dec 20, 2018, 4:49 PM

#

so this:

📎 unknown.png

void anvil Dec 20, 2018, 4:49 PM

#

most of your

#

models

#

don't need feature transform

stray elk Dec 20, 2018, 4:49 PM

#

📎 unknown.png

void anvil Dec 20, 2018, 4:49 PM

#

and will behave the same regardless of transform

stray elk Dec 20, 2018, 4:49 PM

#

the mae of the ensemble model which uses ridgecv, ardregression and elasticnetcv is almost the exact same as linear regression on its own

void anvil Dec 20, 2018, 4:49 PM

#

MLP and MNB are the two ones that need

#

transforms

#

And again, there might not be much to learn

stray elk Dec 20, 2018, 4:50 PM

#

wdym by feature transform?

void anvil Dec 20, 2018, 4:50 PM

#

15000, 20000, 25000 => -1.22, 0, 1.22

#

or => 0, 0.5, 1

#

that's what the scaling does
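For the three example values, the arithmetic of the two scalings can be checked by hand. A minimal standard-library sketch, assuming standardization divides by the population standard deviation (with the sample standard deviation the outer values would be -1 and 1 instead):

```python
from statistics import mean, pstdev

values = [15000, 20000, 25000]

# Standardization: subtract the mean, divide by the standard deviation.
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
min_maxed = [(v - lo) / (hi - lo) for v in values]

print([round(z, 2) for z in standardized])
print(min_maxed)
```

Both are linear rescalings, so the ordering and relative spacing of the values is unchanged.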

stray elk Dec 20, 2018, 4:51 PM

#

ok, the scaling

void anvil Dec 20, 2018, 4:51 PM

#

if you feed any of those 3 into a linear regression

#

you'll get the same result

#

same with RF

#

etc.

#

or about the same I guess

#

your data could exhibit linear relationships

stray elk Dec 20, 2018, 4:51 PM

#

its feeding 450 different rows of data for the training data

void anvil Dec 20, 2018, 4:51 PM

#

which is why linear is nearly as good as the ensemble

stray elk Dec 20, 2018, 4:52 PM

#

surely one would expect variation between LinearRegression and an ensemble of ARD, RidgeCV and ElasticNetCV?

void anvil Dec 20, 2018, 4:52 PM

#

run a linear regression, look at test statistics, etc.

stray elk Dec 20, 2018, 4:52 PM

#

especially given when done individually, those three return different maes than lr on its own

void anvil Dec 20, 2018, 4:52 PM

#

it might be linear

#

and it might be you can't really learn better than 3.15

stray elk Dec 20, 2018, 4:53 PM

#

is that a thing in ml? a 'maximum' that can be learned by the data at hand?

void anvil Dec 20, 2018, 5:01 PM

#

yeah

#

unless you want to way overfit

#

everything is modeled by y = ax1 + bx2... + randomness

#

more or less

#

you may be at the part that's just randomness
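The "part that's just randomness" can be simulated. In this sketch the coefficients, noise level and sample size are all invented; it generates y = a*x1 + b*x2 + noise and then scores the true coefficients, which gives the error floor that no fitted model can be expected to beat on fresh data:

```python
import random
from statistics import mean

rng = random.Random(42)
a, b, noise_sd = 2.0, -1.0, 1.0  # made-up "true" relationship

rows = []
for _ in range(10_000):
    x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
    y = a * x1 + b * x2 + rng.gauss(0, noise_sd)
    rows.append((x1, x2, y))

# Predicting with the true coefficients leaves only the noise as error.
floor_mae = mean(abs(y - (a * x1 + b * x2)) for x1, x2, y in rows)
print(round(floor_mae, 3))
```

For Gaussian noise the floor is about 0.8 times the noise standard deviation, so a MAE stuck at one value across very different models can simply be the noise level of the data.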

stray elk Dec 20, 2018, 5:05 PM

#

Tbh that makes sense, it’s just that a Mae of 3.15 seems high for the problem I’m dealing ei

#

With*

#

In fairness tho, I think that this is because of the data I’ve chosen to do this with, rather than flaws in the ml implementation

void anvil Dec 20, 2018, 5:08 PM

#

if you want to make some fake data

#

or run the same thing on another training set

#

you can verify it's the data vs what you did

stray elk Dec 20, 2018, 5:17 PM

#

Yea, I think that is how I’m going to do it in the evaluation

#

I will run the same thing on the entire set, hopefully get a more accurate result, and then use that to say it’s not the ml model but the data

stable egret Dec 21, 2018, 9:19 PM

#

I am looking at diving deeper and learning about data science, I am coming from a strictly javascript background for the past 10 years tho. Can anyone recommend good courses, resources or even projects I can see the source code of? I am more of a practical learner than a theoretical

sinful forge Dec 21, 2018, 9:46 PM

#

@lean ledge I guess this is the right place

#

Can I get some links? :)

proud jolt Dec 21, 2018, 9:47 PM

#

https://pt.coursera.org/learn/machine-learning

Coursera

Machine Learning | Coursera

Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, ...

#

https://www.edx.org/school/columbiax

edX

ColumbiaX

lean ledge Dec 21, 2018, 9:47 PM

#

And the Columbia one https://www.edx.org/course/machine-learning-columbiax-csmm-102x-0

edX

Machine Learning

Master the essentials of machine learning and algorithms to help improve learning from data without human intervention.

sinful forge Dec 21, 2018, 9:48 PM

#

Awesome thanks mate

lean ledge Dec 21, 2018, 9:48 PM

#

Mind you, I personally really don't like Andrew Ng's course because of how mathematically shallow it is, given machine learning is basically a field of maths, but other people here get annoyed at me if I say I don't recommend it.

sinful forge Dec 21, 2018, 9:48 PM

#

I'll keep your opinion in mind

#

@proud jolt you got them udemy links? I wouldn't mind accessing them too

proud jolt Dec 21, 2018, 9:58 PM

#

https://www.udemy.com/courses/search/?src=ukw&q=machine+learning

sinful forge Dec 21, 2018, 10:17 PM

#

Thanks bro

void anvil Dec 21, 2018, 11:35 PM

#

anyone know what's up

lone mist Dec 21, 2018, 11:36 PM

#

Probably doesn't like the ~

void anvil Dec 21, 2018, 11:36 PM

#

I've done that everywhere else

#

should I put the whole C:/filepath

#

?

lone mist Dec 21, 2018, 11:37 PM

#

I thought that the tilde wouldn't work at all, but maybe I am misremembering

#

worth a shot to use a full path though

topaz nacelle Dec 21, 2018, 11:37 PM

#

i don't think open expands ~ by default

void anvil Dec 21, 2018, 11:37 PM

#

yeah

#

it dumps if I put the whole filepath

#

pandas takes the ~/shortcut

topaz nacelle Dec 21, 2018, 11:37 PM

#

you'd have to use os.path.expanduser or pathlib
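Both options in a minimal sketch; the path is a hypothetical example, not the file from the screenshot:

```python
import os
from pathlib import Path

raw = "~/data/results.csv"  # hypothetical path

expanded = os.path.expanduser(raw)      # plain string with ~ replaced
expanded_path = Path(raw).expanduser()  # same idea, as a Path object

print(expanded)
print(expanded_path)
```

Either result can be passed to `open()`, which treats a literal `~` as an ordinary directory name.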

void anvil Dec 21, 2018, 11:37 PM

#

yeah cancer

#

that's fine

#

thanks

topaz nacelle Dec 21, 2018, 11:37 PM

#

"cancer"

void anvil Dec 21, 2018, 11:38 PM

#

When the package makers don't implement the lazy solution everyone else has

#

heaven forbid they don't copy/paste some code or put the warning that they don't

lone mist Dec 21, 2018, 11:39 PM

#

It's likely somewhere in the docs

#

But perhaps it could be clearer. I wont comment on that since I don't remember where it's documented

twilit current Dec 22, 2018, 1:47 AM

#

Hey there, I need some help with a pandas module question 😃

#

Can anyone here take a look at my problem? I've been working on it for a while, but not getting anywhere

#

I've posted my problem in #help-grapes

forest willow Dec 22, 2018, 6:01 PM

#

I'm thinking of taking part in a Code Camp and need to develop an app on Pattern Analysis and Data Visualization, it would be a great help if you guys provided me with some suggestions on it. Thanks!

reef bone Dec 22, 2018, 7:20 PM

#

Quick question about numpy: say I have an array that goes like a = [1, 2, 3] and i want to move all elements left by one step.

numpy.roll(a, -1, axis=0) will produce [2, 3, 1] which is good but I would prefer it not to wrap around, so I'd like to produce [2, 3, 0] instead. Is there a nice way to do this?

#

For now I pad the values with 0s manually but that's not nice 😬

sinful forge Dec 23, 2018, 3:10 AM

#

@forest willow if you find out anything please let me know or tag me because I'm really interested in learning anything data science!

midnight oracle Dec 23, 2018, 7:41 AM

#

```py
from subprocess import check_output

display = check_output(["ls"])
print(display)```

im getting an error on this code... please help

error is:

```py
Traceback (most recent call last):
  File "C:\Users\Omer Kural\Desktop\�mer\Python Projects\Projects\data_projectx.py", line 3, in <module>
    display = check_output(["ls"])
  File "C:\Users\Omer Kural\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "C:\Users\Omer Kural\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\Omer Kural\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Omer Kural\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified```
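A likely cause, judging from the `C:\Users\...` paths in the traceback: `ls` is not an executable on Windows, so `check_output` has nothing to launch. If the goal is only to list the current directory, the standard library can do that without a subprocess. A minimal sketch:

```python
import os

# List the current working directory without launching an external program.
entries = sorted(os.listdir("."))
for name in entries:
    print(name)
```

This behaves the same on Windows, Linux and macOS.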

kind sluice Dec 23, 2018, 8:21 AM

#

hey guys, can anyone recommend a good geospatial analysis tools (preferably for jupyter), what I'd like to see is basically being able to show heatmaps and scatterplots on a map given data (pandas dataframe of real estate stuff) and being able to interactively select regions/shapes so that I could filter those within the selected region and calc selective stats. Folium seems to be the most promising option but maybe there's something with simpler plotting functionality? I also tried geonotebook but it was broken for me.

chilly shuttle Dec 23, 2018, 10:54 AM

#

@kind sluice the closest thing is pyleaflet

dim slate Dec 23, 2018, 3:57 PM

#

@reef bone roll it, then change the last member to 0?

reef bone Dec 23, 2018, 5:44 PM

#

Yeah I ended up writing my own function that takes the axis and index difference as arguments, in my use case the array is multi dimensional and will sometimes have to roll by more than one, so I was looking for something more robust. Now I can one line it. I was surprised that np.roll will always wrap around as filling the "new" indices with 0s seems like something that would be useful more often than wrapping around. Thanks for the answer. @dim slate
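The function itself was not posted, so the following is only a guess at what such a helper could look like. `shift` and its `fill` parameter are made-up names; the sign convention follows `np.roll` (a negative offset moves elements toward index 0), and NumPy is assumed to be installed:

```python
import numpy as np

def shift(arr, offset, axis=0, fill=0):
    """Like np.roll along one axis, but vacated positions get `fill`."""
    arr = np.asarray(arr)
    out = np.full_like(arr, fill)
    n = arr.shape[axis]
    if abs(offset) >= n:
        return out  # everything was shifted out of the array

    src = [slice(None)] * arr.ndim
    dst = [slice(None)] * arr.ndim
    if offset > 0:
        src[axis] = slice(0, n - offset)
        dst[axis] = slice(offset, n)
    elif offset < 0:
        src[axis] = slice(-offset, n)
        dst[axis] = slice(0, n + offset)
    out[tuple(dst)] = arr[tuple(src)]
    return out

print(shift([1, 2, 3], -1))                  # [2 3 0]
print(shift([[1, 2], [3, 4]], 1, axis=1))    # [[0 1] [0 3]]
```

Slicing into a pre-filled array avoids the pad-then-trim step and works for any number of dimensions.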

dim slate Dec 23, 2018, 6:12 PM

#

Share? @reef bone

stray elk Dec 23, 2018, 6:17 PM

#

@void anvil sorry I cant help with your issue, I just wanted to say that i've been using the wrong metric to measure my model accuracy

#

mae was not good, im now using r2

#

i was getting low mae values, but the predictions were still wildly off the actual results, now that im using r2, all my tuning is working fine, and i've managed to predict the EU referendum within 0.2%
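The two metrics answer different questions: MAE is an error in the units of the target, while r2 compares the model against always predicting the mean. A toy sketch with invented numbers (not the referendum data) where MAE looks small yet r2 is zero:

```python
from statistics import mean

def mae(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [50.0, 50.5, 49.5, 51.0, 49.0]
y_pred = [50.0] * 5  # a "model" that always predicts the average

print(mae(y_true, y_pred))  # 0.6
print(r2(y_true, y_pred))   # 0.0
```

When the target barely varies, a small MAE says little on its own, which is why checking both numbers can be useful.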

gaunt axle Dec 23, 2018, 10:49 PM

#

Hi!
I've been trying to get a good answer to "why huffman codes is a greedy algorithm"
and I've found:
"The reason that this is a greedy algorithm is that at each stage we
perform a merge without regard to global considerations. We merely
select the two smallest trees." in "Data Structures and Algorithm
Analysis in C" by Mark Allen Weiss

But still, why would "merge the two smallest trees" be the optimal local solution?

feral lodge Dec 24, 2018, 1:28 AM

#

@gaunt axle Because in the Huffman algorithm, we want to create a binary tree where high-frequency symbols appear high up (near the root) -- higher up than low-frequency symbols.

When we merge two trees, we create a parent node to the trees, pushing them down one level. If we're feeling greedy, what trees look like good choices to push down one level? The trees whose nodes are associated with the lowest-frequency symbols! If we push those trees down, that means that the remaining trees, associated with more frequent symbols, will remain further up in the final tree, which is what we wanted.

That's why it's a greedy algorithm; we take a quick look at the available trees and push down (merge) the trees whose leafs are the lowest-frequency symbols. Your quote calls these trees "smallest", which is not a good choice of words IMO. They can be big and deep, it's just that the sum of their leafs' frequencies is low.
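The greedy step can be sketched with a heap. This version is an illustration rather than code from the book: `huffman_codes` is a made-up name, each "tree" is stored as a dictionary from symbol to the code built so far, and the frequencies are a common textbook example.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Repeatedly merge the two lowest-frequency trees (the greedy step)."""
    tiebreak = count()  # keeps the heap from ever comparing the dictionaries
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging pushes every symbol in both trees one level down,
        # i.e. makes each of their codes one bit longer.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freqs)
for sym in sorted(codes, key=lambda s: len(codes[s])):
    print(sym, freqs[sym], codes[sym])
```

The most frequent symbol ends up with a one-bit code because it is never chosen for a merge until the very last step.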

gaunt axle Dec 24, 2018, 1:30 AM

#

I understand better now, thanks :)

feral lodge Dec 24, 2018, 1:32 AM

#

This picture shows what i mean by the trees being "pushed down"

📎 HuffmanCodeAlg.png

heavy apex Dec 24, 2018, 9:25 AM

#

What should I expect from a Data Visualization class next semester? My data science class after finishing data analysis with spread sheets, so not sure how in depth it can really get.

desert oar Dec 24, 2018, 10:59 PM

#

R, Python, and/or D3.js code

#

Likely one of the first two, and possibly the latter

#

Basic statistics

#

Undergrad level? Masters?

terse pewter Dec 27, 2018, 10:20 AM

#

Good to probably familiarize yourself with matplotlib and numpy if you haven't used them already

candid pilot Dec 27, 2018, 11:37 AM

#

Hey everyone! So I recently decided to start my journey on machine learning.
I searched quite a bit, and built my first neural network! It was super fun!
But now I have a kind of "problem". I don't know how to actually use it...
I mean I have tested with random numbers that really don't mean anything. What if I want to test it with something like written numbers recognition, or simple voice commands recognition, how do I "convert" those things to numbers?
Maybe this is a silly question but I'm really new to machine learning...

#

I mean, they need to be converted to numbers right? So they can work with the weights and the activation function

spark summit Dec 27, 2018, 5:08 PM

#

definitely started with python with numpy, scipy, and matplotlib as a substitute for matlab/octave

#

it's like octave with OOP

light cloud Dec 27, 2018, 7:11 PM

#

Is anyone familiar with pyLDAvis? Having a deprecation issue

hardy crag Dec 27, 2018, 8:08 PM

#

@candid pilot depends on what you are doing. If you are doing image based tasks, the images you are using are already "numbers" in the way that for each pixel there is a value between 0 and 255 (in grayscale) and 3 values for a RGB image.

#

If you are doing natural language processing, e.g. translation between languages, you need to encode the "text" you are using into numbers.

#

For more common tasks there are already lots of ideas how to accomplish that, however it really depends on your data and your task
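Two tiny illustrations of what "already numbers" and "encode into numbers" can mean. The image and sentence are invented, and real projects typically use a library loader or tokenizer rather than hand-written lists:

```python
# A 2x3 grayscale "image": one brightness value (0-255) per pixel.
image = [[0, 128, 255],
         [64, 192, 32]]
scaled = [[pixel / 255 for pixel in row] for row in image]  # network-friendly 0..1

# Text: build a vocabulary, then replace each word by its index.
sentence = "yes no yes yes no".split()
vocab = {word: index for index, word in enumerate(sorted(set(sentence)))}
encoded = [vocab[word] for word in sentence]

def one_hot(index, size):
    """Vector of zeros with a single 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

targets = [one_hot(index, len(vocab)) for index in encoded]

print(scaled)
print(vocab, encoded)
print(targets)
```

One-hot vectors like `targets` are a common choice for the expected output of a small classifier, one position per possible word.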

candid pilot Dec 27, 2018, 8:11 PM

#

Oh ok thanks! So for example, if I want to use voice recognition, is there anything for that already created?

#

Like "Yes" or "No", is there anything I can use to "convert" that to numbers?

hardy crag Dec 27, 2018, 8:13 PM

#

by voice recognition do you mean finding words in sound files

#

?

candid pilot Dec 27, 2018, 8:15 PM

#

Yeah

hardy crag Dec 27, 2018, 8:16 PM

#

so afaik sound is already represented as a series of numbers

#

you might want to check out this https://www.kaggle.com/davids1992/speech-representation-and-data-exploration

#

it's a data exploration of an audio dataset

candid pilot Dec 27, 2018, 8:47 PM

#

Thanks!

carmine lava Dec 27, 2018, 8:53 PM

#

Darknet any one try

spark summit Dec 27, 2018, 9:07 PM

#

we don't discuss that here

sinful forge Dec 28, 2018, 1:42 AM

#

Lol

#

What about Skynet? :)

prime elm Dec 28, 2018, 4:39 AM

#

Proper topic i’d say

#

As long as skynet is a bunch of LSTMs huehue

placid snow Dec 28, 2018, 7:44 AM

#

#databases y'all seem to have a misplaced question there GWchadThink

scarlet salmon Dec 28, 2018, 2:21 PM

#

Hey guys, so I've got some simple stuff that I need to do for an assignment

#

I'm basically building a simple k means clustering program

#

    def kmeans(m,c):
        cluster1 = []
        cluster2 = []
        cluster3 = []
        for x in c:
            if x-m[0]>x-m[1] and x-m[2]:
                cluster1.append(x)
            if x-m[1]>x-m[0] and x-m[1]:
                cluster2.append(x)
            if x-m[2]>x-m[1] and x-m[2]:
                cluster3.append(x)

This is what I have so far which I'd expect to go through one iteration

#

To do: write something to store new means, go through multiple iterations, and terminate the function

#

Before I go any further I'd just like some confirmation that this should work as intended

small ore Dec 28, 2018, 5:05 PM

#

Did you mean to do if x-m[0]>x-m[1] and x-m[0]>x-m[2]: or if x-m[0]>x-m[1] > x-m[2]: instead? For each of those if conditions

#

@scarlet salmon
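What goes wrong in the original condition, shown on made-up numbers: Python reads `a > b and c` as `(a > b) and c`, so the bare `x - m[2]` is only tested for being non-zero. Picking the nearest mean also needs absolute distances and a smallest-wins comparison, sketched here:

```python
m = [2, 10, 20]  # hypothetical cluster means
x = 9

# Parsed as (x - m[0] > x - m[1]) and (x - m[2]):
# (7 > -1) is True, so the result is x - m[2] = -11, which is truthy.
first_attempt = x - m[0] > x - m[1] and x - m[2]

# Nearest mean: compare absolute distances and take the smallest.
distances = [abs(x - center) for center in m]
nearest = distances.index(min(distances))

print(first_attempt)  # -11
print(nearest)        # 1
```

So the first condition would have put 9 into cluster 1 (mean 2) even though the mean at index 1 (value 10) is closest.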

scarlet salmon Dec 28, 2018, 5:05 PM

#

Oh, sorry

#

I've solved this now

#

Here it is finished

#

import numpy as np
from itertools import chain
with open(r"C:\Users\Evan\Desktop\Patch 4 year 2\data.txt") as data:

    means = data.readline()
    means = [int(n) for n in means.split(", ")] 
    
    numbers1 = data.readlines()[0:] #reads all lines except first
    numbers = [elem.strip().split(';') for elem in numbers1]
    
    clusters = list(chain(*numbers)) #lists numbers
    clusters = map(int, clusters) #changes list to integers
    
    def kmeans(m,c):
        for counter in range (10):
            cluster1 = []
            cluster2 = []
            cluster3 = []
            for x in c:
                c1 = abs(x - m[0])
                c2 = abs(x - m[1])
                c3 = abs(x - m[2])
                if c1 < c2 and c1 < c3:
                    cluster1.append(x)
                elif c2 < c1 and c2 < c3:
                    cluster2.append(x)
                elif c3 < c1 and c3 < c2:
                    cluster3.append(x)
            print "mean = " + str(m[0])+ ' ' + str(cluster1)
            print "mean = " + str(m[1])+ ' ' + str(cluster2)
            print "mean = " + str(m[2]) + ' ' + str (cluster3)
            print "end of iteration"
            print '\n'
            if np.mean(cluster1) == m[0] and np.mean(cluster2) == m[1] and np.mean(cluster3) == m[2]:
                return
            m = []
            m.append(np.mean(cluster1))
            m.append(np.mean(cluster2))
            m.append(np.mean(cluster3))
        return
    
    kmeans(means, clusters)

placid snow Dec 28, 2018, 6:57 PM

#

You can technically dedent almost everything in that code

#

except means = data.readline()

desert oar Dec 29, 2018, 1:40 AM

#

what?

#

i still wouldnt do data = open() even in a script

#

just a bad habit

#

and asking for mistakes

#

especially since so much data science is done in notebooks

#

where you can have a process run for days depending on your setup

carmine plinth Dec 29, 2018, 9:47 AM

#

can you recommend framework for chatbot?

candid pilot Dec 29, 2018, 10:56 AM

#

When training a neural network, is it ok to train it in different loops and each loop as a target? For example: I'm doing a voice recognition, is it ok to train for each word in a different loop or should I mix everything?

hardy crag Dec 29, 2018, 1:17 PM

#

@candid pilot You should mix all the words. If you sort them the network will "forget" earlier words.
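In practice "mixing" usually means reshuffling the whole training set at the start of every pass. A minimal sketch with placeholder samples; the features, labels and the empty training step are all invented for illustration:

```python
import random

# Hypothetical (features, label) pairs, grouped by word as they might be loaded.
samples = [([0.1 * i, 0.2 * i], "yes") for i in range(5)] + \
          [([0.3 * i, 0.4 * i], "no") for i in range(5)]

rng = random.Random(0)
for epoch in range(3):
    order = samples[:]   # copy so the original list is untouched
    rng.shuffle(order)   # new random order every epoch
    for features, label in order:
        pass             # one training step on (features, label) would go here

print([label for _, label in order])
```

Each pass still sees every sample exactly once; only the order changes.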

candid pilot Dec 29, 2018, 1:17 PM

#

Ok thanks!

scarlet salmon Dec 30, 2018, 3:28 PM

#

Can anyone tell me why this plot doesn't show a line that fits the data well?

#

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq


time = np.arange(0.0,35.1,5.0)
pop = np.array([12,48,84,113,195,225,193,188])


def logistic(pars,t):
    r,K,y0 = pars
    return K*y0*np.exp(r*t)/(K+y0*(np.exp(r*t)-1))

def logistic_resid(pars,t,data):
    return logistic(pars,t)-data

# code to run the Levenberg-Marquardt algorithm

#initial values - can you adjust these to better guesses?
p0 = np.array([1.0,1.0,1.0])

lsq_out = leastsq(logistic_resid, p0, args=(time,pop))
# code to plot the data and fitted values goes here
plt.plot(time, pop, 'h', logistic([0.224119,207.214,13.2892],time))

#

The plot I get back is:

#

📎 6884da5c06.png

#

I feel like I'm making a very basic mistake

dire atlas Dec 30, 2018, 4:40 PM

#

plt.plot(time, pop, 'h')
plt.plot(logistic([0.224119,207.214,13.2892],np.arange(0,35,1)))

#

📎 log.png

#

logistic([0.224119,207.214,13.2892],time)

this only generates 8 values, and since you don't specify its corresponding x values it just uses the indices x=0,1,2...7

#

you could also just specify the x values like this

plt.plot(time, logistic([0.224119,207.214,13.2892],time))
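
#

Putting both versions side by side, as a sketch. The Agg backend and the 200-point grid are my own choices here, not anything from the original script:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def logistic(pars, t):
    r, K, y0 = pars
    return K * y0 * np.exp(r * t) / (K + y0 * (np.exp(r * t) - 1))

time = np.arange(0.0, 35.1, 5.0)
pop = np.array([12, 48, 84, 113, 195, 225, 193, 188])
pars = [0.224119, 207.214, 13.2892]  # the fitted values quoted above

fig, ax = plt.subplots()
ax.plot(time, pop, "h", label="data")

# Without x values matplotlib falls back to the indices 0..len(y)-1.
(index_line,) = ax.plot(logistic(pars, time), label="no x given")

# Passing x explicitly, on a finer grid, gives a smooth curve on the right scale.
t_fine = np.linspace(time[0], time[-1], 200)
(fit_line,) = ax.plot(t_fine, logistic(pars, t_fine), label="fit")
ax.legend()
```

index_line ends up squeezed into x = 0..7 while the data runs to 35, which is why the original curve looked wrong.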

scarlet salmon Dec 30, 2018, 4:53 PM

#

Ah, thank you

#

I wonder if you'd be able to help with the project I have going on that's led on from this

#

https://www.reddit.com/r/learnpython/comments/aaxb3w/fitting_given_data_to_a_model_that_has_both_odes/

r/learnpython - Fitting given data to a model that has both ODEs a...

1 vote and 0 comments so far on Reddit

#

It's detailed here (bit of a long post)

sweet ember Dec 30, 2018, 5:49 PM

#

Someone help pls. This is the titanic train dataset. what do i do?

📎 error.PNG

#

Solved it. Stupid me forgot to put the square brackets

scarlet salmon Dec 30, 2018, 6:35 PM

#

So I currently have this

#

from scipy.integrate import odeint
import numpy as np
from scipy.optimize import leastsq

time = np.array([0,168,336,504,672,840,1008,1176,1344,1512,1680,1848,2016,2184,2352,2520,2688,2856])
pop = np.array([2,27,43,36,39,32,27,22,13,10,14,14,4,4,9,3,3,1])
# defines the model equations
def amr_ode(x,t,params):
    B, G, Da = params
    S, R, A = x
    derivs = [0.5*(1-R+S/10**7)*0.6577*S-0.025*S-(B*S*R/R+S), 0.5*(1-G)*(1-R+S/10**7)*0.0000156*R-0.025*R+(B*S*R/R+S), -Da*5.6]
    return derivs

def amr_run(pars,t):
    B,G,Da,S0,R0,A0 = pars
    ode_params=[B,G,Da]
    ode_starts=[S0,R0,A0]
    out = odeint(amr_ode, ode_starts, t, args=(ode_params,))
    return out

def amr_resid(pars,t,data):
    B,G,Da,S0,R0,A0 = pars
    ode_params=[B,G,Da]
    ode_starts=[S0,R0,A0]
    out = odeint(amr_ode, ode_starts, t, args=(ode_params,))
    return amr_run(pars,t)-data

p0 = [0.00001, 0.5, 400, 1, 1, 1]
lsq_out = leastsq(amr_resid, p0, args=(time,pop))
lsq_out

#

For the problem indicated in the above reddit post

#

But I get this error: ValueError: operands could not be broadcast together with shapes (18,3) (18,)

scarlet salmon Dec 30, 2018, 7:02 PM

#

Okay I've fixed the above issue
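
#

In case anyone hits the same error: as far as I can tell it came from subtracting the (18,) data from the full (18, 3) odeint output. A stripped-down numpy version, with a made-up array standing in for the solver output:

```python
import numpy as np

n_times = 18
# Stand-in for the odeint result: one row per time point, one column per state (S, R, A).
out = np.arange(n_times * 3, dtype=float).reshape(n_times, 3)
data = np.ones(n_times)  # measured population, shape (18,)

try:
    out - data  # shapes (18, 3) and (18,) cannot be broadcast: trailing dims are 3 vs 18
except ValueError as err:
    message = str(err)

# Picking the single measured column gives matching (18,) shapes.
resid = out[:, 0] - data
```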

#

I now get a graph (yay) but the estimated line of best fit falls completely flat ( 😦 )

#

📎 687bb45b11.png

#

from scipy.integrate import odeint
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt

time = np.array([0,168,336,504,672,840,1008,1176,1344,1512,1680,1848,2016,2184,2352,2520,2688,2856])
pop2 = np.array([2,27,43,36,39,32,27,22,13,10,14,14,4,4,9,3,3,1])

# defines the model equations
def amr_ode(x,t,params):
    B, G, Da = params
    S, R, A = x
    derivs = [0.5*(1-R+S/10**7)*0.6577*S-0.025*S-(B*S*R/R+S), 0.5*(1-G)*(1-R+S/10**7)*0.0000156*R-0.025*R+(B*S*R/R+S), -Da*5.6]
    return derivs

# runs a simulation and returns the population size
def amr_run(pars,t):
    B,G,Da,S0,R0,A0 = pars
    ode_params=[B,G,Da]
    ode_starts=[S0,R0,A0]
    out = odeint(amr_ode, ode_starts, t, args=(ode_params,))
    return out[:,0] # we only return the population size - we don't worry about the substrates as they are not measured

# residual function. Note that the parameters for optimization include all of the starting values, but the parameters for the ODE do not
def amr_resid(pars,t,data):
    B,G,Da,S0,R0,A0 = pars
    ode_params=[B,G,Da]
    ode_starts=[S0,R0,A0]
    out = odeint(amr_ode, ode_starts, t, args=(ode_params,))
    return amr_run(pars,t)-data

p0 = [1e-05, 0.5, 400, 1, 1, 1]
lsq_out = leastsq(amr_resid, p0, args=(time,pop2))
plt.plot(time, pop2, 'h')
plt.plot(time, amr_run([0.000148593, 17.7663, 112260, 1.99715, -20.8516, -313.528], time))
plt.show()

#

If anyone would be so kind as to want to help out and wants a link to the full problem sheet I can send it via PM

small pumice Dec 30, 2018, 11:48 PM

#

Okay, I've got a rather long question here:

I am working on a project that uses neural networks and satellite data to predict wildfires. I am using the Google Earth Engine Javascript API and will use Keras to train a deep ANN. The network will take the temperature, humidity (if I can get the data), and vegetation (I might use NDVI if possible). (Just in advance, part of my question won't necessarily have to do with datasets and more with neural networks, I just want to get more done in one question).

I am using the MODIS satellite to find the temperature of given areas within a given timeframe using the Land Surface Temperature and Emissivity dataset. I am able to do this with the following code:

#

var dataset = ee.ImageCollection('MODIS/006/MOD11A1')
              .filter(ee.Filter.date('2018-12-10', '2018-12-23'));
var landSurfaceTemperature = dataset.select('LST_Day_1km');
var landSurfaceTemperatureVis = {
  min: 13000.0,
  max: 16500.0,
  palette: [
    '040274', '040281', '0502a3', '0502b8', '0502ce', '0502e6',
    '0602ff', '235cb1', '307ef3', '269db1', '30c8e2', '32d3ef',
    '3be285', '3ff38f', '86e26f', '3ae237', 'b5e22e', 'd6e21f',
    'fff705', 'ffd611', 'ffb613', 'ff8b13', 'ff6e08', 'ff500d',
    'ff0000', 'de0101', 'c21301', 'a71001', '911003'
  ],
};
Map.setCenter(6.746, 46.529, 2);
Map.addLayer(landSurfaceTemperature, landSurfaceTemperatureVis, 'Land Surface Temperature');
print(landSurfaceTemperature);

// map over the image collection and use server side functions
var tempToDegrees = landSurfaceTemperature.map(function(image){
  return image.multiply(0.02).subtract(273.15);
});
// print and add to the map
print('image collection in temp in degrees', tempToDegrees);
Map.addLayer(tempToDegrees, {min: -20, max: 40, palette: landSurfaceTemperatureVis.palette}, 'temp in degrees');

With this code, I can click on a specific area on the map and get a graph of the temperature within a specified timeframe. How would I go about turning this into a Python array, with the temperatures of 1 km squares with their respective coordinates? I also want to be able to find such array for humidity and vegetation.

Second, I am also using the Terra Thermal Anomalies & Fire Daily Global 1km MODIS dataset for my wildfire data. I want to combine this data with the temperature data to find whether a wildfire will occur in a 1 km square within a month. How can I turn this into an array that corresponds with the other array(s)?
Overall, I want to build a neural network that, for input data, takes the temperature, humidity, and vegetation in a given area and output the likelihood of a fire occurring in the area within a month.

hardy crag Dec 31, 2018, 12:15 AM

#

@scarlet salmon can you please explain the pops? why is p0 not in use?

scarlet salmon Dec 31, 2018, 12:18 AM

#

I thought p0 was in use in the least squares function to calculate estimates for the values I want to know

#

I'll be honest I think I'm in way over my head

hardy crag Dec 31, 2018, 12:19 AM

#

right, but why not also in the plotting?

scarlet salmon Dec 31, 2018, 12:20 AM

#

Since when I plot I want the original data against a predicted line of best fit

hardy crag Dec 31, 2018, 12:20 AM

#

(it doesn't really change anything if it's used, just curious where the numbers in plt.plot(time, amr_run([0.000148593, 17.7663, 112260, 1.99715, -20.8516, -313.528], time))
come from

#

)
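
#

For what it's worth, leastsq hands back the fitted parameters as the first element of its result, so they can go straight into the model instead of being retyped. A sketch using the earlier logistic example because it is lighter than the ODE one; the starting guess p0 is my own rough pick:

```python
import numpy as np
from scipy.optimize import leastsq

def logistic(pars, t):
    r, K, y0 = pars
    return K * y0 * np.exp(r * t) / (K + y0 * (np.exp(r * t) - 1))

def logistic_resid(pars, t, data):
    return logistic(pars, t) - data

time = np.arange(0.0, 35.1, 5.0)
pop = np.array([12, 48, 84, 113, 195, 225, 193, 188])

p0 = np.array([0.2, 200.0, 10.0])  # rough starting guess, chosen by eye for this sketch

# leastsq returns (fitted_parameters, status_flag) by default.
p_fit, ier = leastsq(logistic_resid, p0, args=(time, pop))

ss_start = np.sum(logistic_resid(p0, time, pop) ** 2)
ss_fit = np.sum(logistic_resid(p_fit, time, pop) ** 2)

# Feed p_fit straight into the model instead of retyping the printed numbers.
fitted_curve = logistic(p_fit, time)
```

Comparing ss_fit with ss_start is a cheap sanity check that the optimizer actually improved on the guess.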

scarlet salmon Dec 31, 2018, 12:20 AM

#

p0 is guesses at what the values may be

#

Those values come from least square estimating what the values are

hardy crag Dec 31, 2018, 12:29 AM

#

Is your goal to actually solve the differential equations or just fit the pop2 data?

scarlet salmon Dec 31, 2018, 12:31 AM

#

Mostly just to fit the pop2 data so I have values for B, G, Da

#

If you want to help out I can DM you the project sheet if you'd like

#

It explains things a bit better than I can typing over discord

hardy crag Dec 31, 2018, 12:37 AM

#

sure
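
#

One thing that might be worth checking in the meantime (a guess, since I have not seen the project sheet): operator precedence in amr_ode. The terms below are copied from the posted code; the parenthesised versions are only my assumption about what the model might intend.

```python
B, S, R = 1e-5, 2.0, 1.0  # arbitrary illustrative values

as_written = B * S * R / R + S      # parsed as ((B*S*R)/R) + S, i.e. B*S + S
with_parens = B * S * R / (R + S)   # assumption: what the model may intend

N_max = 10 ** 7
density_as_written = 1 - R + S / N_max      # parsed as (1 - R) + (S / N_max)
density_with_parens = 1 - (R + S) / N_max   # assumption: a logistic-style crowding term
```

If the sheet really does mean the parenthesised forms, the derivatives change a lot, which could be part of why the fit goes flat.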

sweet ember Dec 31, 2018, 1:22 PM

#

What have I missed here? Help pls

📎 error.PNG

dire atlas Dec 31, 2018, 1:55 PM

#

what are you trying to do?

#

plt.plot takes 2 arrays of the same length as input

sweet ember Dec 31, 2018, 3:00 PM

#

I am trying to take X as the features and y as the labels. This is the titanic dataset from kaggle. I want to evaluate the models I know and submit it

#

same length?

📎 error.PNG

hardy crag Dec 31, 2018, 3:15 PM

#

if titanic_data is a pandas DataFrame, you are trying to plot many columns at once

#

which obviously does not work, because you only have two dimensions

reef bone Dec 31, 2018, 3:27 PM

#

There should be an error at the end of the traceback

#

What does it say?

sweet ember Dec 31, 2018, 4:15 PM

#

I get this "AttributeError: 'NoneType' object has no attribute 'update'"

#

how do I do it then?

small pumice Dec 31, 2018, 6:17 PM

#

Hey could someone please help me with the question I asked yesterday? It’s for a project that I need to finish soon.

lapis sequoia Dec 31, 2018, 9:32 PM

#

anyone good with data analytics?
or know a thing for data i don't think R will be useful. I have 5 columns.

source address
destination address
protocol
source port
destination port

and sometimes destination port has a name instead of an int

#

this is the data in maltego

📎 DeepinScreenshot_select-area_20181231201313.png

desert oar Jan 1, 2019, 12:07 AM

#

What is your question @lapis sequoia

#

@small pumice what is your question

small pumice Jan 1, 2019, 12:08 AM

#

Just scroll up a bit

#

It’s kind of long

void anvil Jan 1, 2019, 3:46 AM

#

@small pumice set an array with geocoordinates [x,y] to be the grid value at [x,y] based off of whatever arbitrary value you want to choose as 0,0

#

then dump it into a single DF with an array with columns of whatever values you have + temp + the values in the 8 directions around the 1x1 square

#

or create values for each square in a time series df and add in lags

#

with thousands of predictor variables

#

assuming you have enough time
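
#

Rough numpy sketch of both ideas, assuming the values are already on a regular 2D grid (the 3x4 grid and the NaN padding at the edges are my own choices for illustration):

```python
import numpy as np

def neighbour_features(grid):
    """Return an (rows*cols, 9) array: each cell's value plus its 8 neighbours.

    Edge cells are padded with NaN where a neighbour falls outside the grid.
    """
    padded = np.pad(grid.astype(float), 1, constant_values=np.nan)
    rows, cols = grid.shape
    columns = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            window = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            columns.append(window.ravel())
    return np.column_stack(columns)

def add_lags(series, n_lags):
    """Stack a (time, cells) array with copies shifted back by 1..n_lags steps."""
    lagged = [series[n_lags - k:len(series) - k] for k in range(n_lags + 1)]
    return np.concatenate(lagged, axis=1)

# Hypothetical 3x4 temperature grid; index [0, 0] is the arbitrary origin cell.
temps = np.arange(12, dtype=float).reshape(3, 4)
features = neighbour_features(temps)
```

Column 4 of features is the cell itself; the other eight columns are the surrounding squares, ready to go into a DataFrame next to the other variables.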

sweet ember Jan 1, 2019, 4:27 AM

#

@hardy crag what do I do about it then?

vital bison Jan 1, 2019, 6:31 AM

#

so how many of you have appeared for GRE exams
im planning for MSDS

#

and get into data science

hardy crag Jan 1, 2019, 12:48 PM

#

@sweet ember you need to plot single columns. check out https://pandas.pydata.org/pandas-docs/stable/visualization.html

#

titanic_data.plot()

#

and see what happens maybe. or you could do

#

for col in titanic_data.columns:
    titanic_data.plot(x='col', y='survived')

#

(this doesn't really make sense if the columns have different data types)

sweet ember Jan 1, 2019, 1:29 PM

#

I got the output for the plot titanic_data.plot()

#

Thanks!

#

The other one didn't work

small ore Jan 1, 2019, 3:36 PM

#

@sweet ember By other one did you mean the for loop? Maybe try without the ' ' for col

sweet ember Jan 1, 2019, 3:39 PM

#

I tried it and got this

📎 error1.PNG

#

Also when I try to fit it in LinearRegression I get ValueError: could not convert string to float: 'Q'

small ore Jan 1, 2019, 4:53 PM

#

Unclean data? Also it helps if you can show the head. Titanic_data.head()
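
#

That 'Q' error usually means a text column went into the model; in the Kaggle file, Embarked holds C/Q/S. A minimal sketch of one-hot encoding with pd.get_dummies on a made-up four-row frame (column names assumed to match the Kaggle CSV):

```python
import pandas as pd

# Tiny made-up sample with the same kind of columns as the Kaggle file.
titanic_data = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# One-hot encode the text columns so every feature is numeric.
X = pd.get_dummies(titanic_data.drop(columns="Survived"), columns=["Sex", "Embarked"])
y = titanic_data["Survived"]
```

After this, X only has numeric or boolean columns, so an estimator's fit(X, y) no longer trips over strings.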

desert oar Jan 1, 2019, 5:12 PM

#

what do you expect to happen @sweet ember ?

#

survived takes 2 values, and you are plotting that against passenger id

#

it's oscillating between 1 and 0

#

and drawing a line between them

#

can you describe in words what kind of plot you are trying to create?

sweet ember Jan 1, 2019, 5:19 PM

#

Titanic.head()

📎 head.PNG

#

I actually want to plot the graph to see how many survived with respect to each column so that I can identify what model to use

desert oar Jan 1, 2019, 5:20 PM

#

what do you mean "how many survived with respect to each column"

#

can you give a few examples of columns and describe how the chart should look

small ore Jan 1, 2019, 5:21 PM

#

I think histograms are a better option but not really for every column. Doesnt make sense for passenger id for example

sweet ember Jan 1, 2019, 5:21 PM

#

I made this

📎 isthatit.PNG

#

I want to plot age against survived, price against survived etc

small ore Jan 1, 2019, 5:23 PM

#

I don't think a passenger id or a name would make sense for one of the axes to be plotted with the 'survived'

sweet ember Jan 1, 2019, 5:24 PM

#

unclean data but too scared to drop them as may lose valuable data

📎 error.PNG

desert oar Jan 1, 2019, 5:24 PM

#

i still dont understand what you expect here

#

do you want a histogram in each category (survived / perished)?

#

a violin plot? box plot?

#

horizontally jittered point cloud?

#

its just not clear how you expect to plot a continuous data set vs a categorical one

small ore Jan 1, 2019, 5:25 PM

#

something like:

plot_cols = ['Age', 'pclass', 'Sex'....]
for col in plot_cols:
   <plot functions here>

might help

desert oar Jan 1, 2019, 5:25 PM

#

and pandas is just making a guess

#

if you just want points like in that pairs matrix, use .plot.scatter instead of .plot

#

.plot is for lines, .plot.scatter is for points

#

but you will need to jitter said points on the categorical axis, otherwise it's just a line and you can't actually see the density of the data in each category
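
#

A sketch of the jitter step on its own, with random stand-in data instead of the real columns; the 0.15 width is just a value that tends to look fine, not anything special:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins for the Survived (0/1) and Age columns.
survived = rng.integers(0, 2, size=200)
age = rng.uniform(1, 80, size=200)

# Spread each point a little around its category so overlapping points become visible.
jitter_width = 0.15
x_jittered = survived + rng.uniform(-jitter_width, jitter_width, size=survived.shape)

# With matplotlib this could then be drawn as:
#   plt.scatter(x_jittered, age, alpha=0.4)
#   plt.xticks([0, 1], ["Did not survive", "Survived"])
```

Keeping the width well under 0.5 means the two categories never overlap on the axis.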

small ore Jan 1, 2019, 5:27 PM

#

@desert oar, by histogram, I meant number survived vs relevant columns

desert oar Jan 1, 2019, 5:27 PM

#

wait whose question am i answering lol

small ore Jan 1, 2019, 5:27 PM

#

His. Not mine 😃

sweet ember Jan 1, 2019, 5:29 PM

#

Yes I tried to create features as X but was not able to due to an attribute error

#

📎 eeee.PNG

desert oar Jan 1, 2019, 5:30 PM

#

back up

#

stop writing code

#

write down in words what you are trying to do

#

then write the code to achieve what you wrote down in words

sweet ember Jan 1, 2019, 5:31 PM

#

I want to do this with each column

📎 Kaggle_Titanic_02.png

desert oar Jan 1, 2019, 5:31 PM

#

ok great. so focus on how to do it for one column first

small ore Jan 1, 2019, 5:33 PM

#

You were trying to plot the 'survived' column as opposed to the number survived in a normal line plot

#

I suggest going through this: https://github.com/MicrosoftLearning/Principles-of-Machine-Learning-Python/blob/master/Module2/VisualizingDataForRegression.ipynb
Gives you a nice start on visualizing data

GitHub

MicrosoftLearning/Principles-of-Machine-Learning-Python

Principles of Machine Learning Python. Contribute to MicrosoftLearning/Principles-of-Machine-Learning-Python development by creating an account on GitHub.

desert oar Jan 1, 2019, 5:35 PM

#

@sweet ember https://stackoverflow.com/a/6873956/2954547

Stack Overflow

Plot two histograms at the same time with matplotlib

I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from
another file in the same histogram, so I do something like

n,bins,patchs = ax.hist(mydata1,100...

#

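plt.clf()  # pyplot has no clear(); clf() clears the current figure
ax = plt.gca()
survived = titanic['Survived'] == 1  # assuming Survived is stored as 0/1, build a boolean mask
ax.hist(titanic.loc[survived, 'Age'].dropna().values, alpha=0.5, bins=16, label='Survived')
ax.hist(titanic.loc[~survived, 'Age'].dropna().values, alpha=0.5, bins=16, label='Did not survive')
ax.legend()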

something like that should do what you want

#

now you have to figure out how to automate that for every column in the dataset

#

also .hist won't work with a column that's already categorical like gender

#

so you will have to think a bit more on how to generalize your plots to work with all columns in the data

#

and yes, name and passenger id are poor choices for this visualization method...
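
#

One possible way to generalize, as a sketch only: bin the continuous columns, leave the categorical ones alone, and count survivors per group. The max_levels cutoff and the tiny frame are made up for the example; real data would probably want a cutoff nearer 10.

```python
import numpy as np
import pandas as pd

def survival_breakdown(df, column, target="Survived", bins=16, max_levels=10):
    """Count survivors and non-survivors per bin (numeric) or per category (other)."""
    values = df[column]
    if pd.api.types.is_numeric_dtype(values) and values.nunique() > max_levels:
        # Continuous column: one shared set of bin edges for both outcomes.
        edges = np.histogram_bin_edges(values.dropna(), bins=bins)
        groups = pd.cut(values, edges, include_lowest=True)
    else:
        # Categorical column, or a numeric code with few levels like passenger class.
        groups = values
    return pd.crosstab(groups, df[target])

# Tiny made-up frame standing in for the Kaggle data.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 2, 1],
})

tables = {col: survival_breakdown(titanic, col, bins=4, max_levels=4)
          for col in ["Age", "Sex", "Pclass"]}
```

Each table has one row per bin or category and one column per outcome, so tables[col].plot.bar() gives grouped bars for any of them.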

sweet ember Jan 1, 2019, 5:38 PM

#

Thanks, I'll try that now and get back @small ore , @desert oar

#

So I'll just use age, gender and class then

haughty wind Jan 2, 2019, 3:40 AM

#

This is kind of more of a hardware question - I want to upgrade my ram from 8 GB to 16 or 24 GB, will the extra 8 GB really matter if I want to run neural nets and such or is it not going to be that useful?

reef bone Jan 2, 2019, 9:04 AM

#

It heavily depends on what you're trying to do.

will the extra 8 GB really matter if I want to run neural nets

They will matter if you're doing something that requires more than 8 GB. Ideally you would want to have all the data you're working with loaded in RAM, but depending on the dimensionality and overall size it might not be possible. 8 GB extra can make a big difference, but it's possible that as a beginner you won't have much use for it. Also keep in mind that we generally want to utilize GPUs for neural nets, the performance gain is substantial, so in this case the GPU's memory also matters.

desert oar Jan 2, 2019, 2:19 PM

#

I've hit double-digit GB memory usage just fitting an SVM

#

Gradient descent should in theory be much more memory efficient tho

#

Having a ton of RAM basically means you dont have to think all that hard about memory efficiency

#

Vs if you are RAM-constrained you gotta be careful with sparsity and stuff

#

In this day and age i would upgrade to 16 pretty much no matter what

#

And 24 would be a nice buffer, also add some future proofing as well

#

Especially if you plan to do stuff other than just machine learning on this machine

#

Basically 24gb frees you from having to care about memory usage day to day
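
#

Quick back-of-the-envelope helper if you want to check whether a dense array even fits; the one-million-by-2,000 dataset is a made-up example:

```python
import numpy as np

def dense_gib(n_rows, n_cols, dtype=np.float64):
    """Memory needed for a dense (n_rows, n_cols) array, in GiB."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2 ** 30

# Hypothetical dataset: one million rows, 2,000 features.
as_float64 = dense_gib(1_000_000, 2_000)
as_float32 = dense_gib(1_000_000, 2_000, np.float32)
```

That works out to about 14.9 GiB as float64 and half that as float32, before any copies a model makes while fitting.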

reef bone Jan 2, 2019, 2:58 PM

#

Yeah, it depends. 16 GB of RAM is useful to have, definitely. I'm currently training an LSTM on a chatlog corpus with 500,000 expressions, the np arrays holding the data aren't too big because i can use sparse matrices and sparse categorical crossentropy, but my GPU suffers and I get OOM errors when I try batch sizes above 32

haughty wind Jan 2, 2019, 6:44 PM

#

For GPU memory, is that the VRAM in the system? I know it's important if I want to run CNNs

small pumice Jan 2, 2019, 7:59 PM

#

Hi,
I’m trying to use Google Earth Engine to make a neural network that can predict whether a fire will occur in a given area. I want the neural network to take in temperature, humidity, and amount of vegetation in a given area and output whether a wildfire will occur in the area within a month. What kind of neural network should I use? I was thinking a deep network because it would be the easiest in terms of data preprocessing. Would a recurrent network also work well for this situation?

steady escarp Jan 2, 2019, 11:27 PM

#

Oh, nice! A data science channel!

#

That's what I want to do.

#

Also, are graphing modules usually a little tougher to install than other modules?

#

I've been working on a project that tracks bird behaviours, and want to graph those sightings.

#

It's pretty dang cool.

lapis sequoia Jan 2, 2019, 11:38 PM

#

Can I see it?

small pumice Jan 2, 2019, 11:45 PM

#

@steady escarp do you mean modules like Matplotlib? In that case, just use “pip install matplotlib”.

steady escarp Jan 2, 2019, 11:46 PM

#

Yeah, I've installed matplotlib. I just wanted the basemap module specifically. Matplotlib is pretty great so far.

#

I just want a map that uses lat and long coordinates.

small pumice Jan 3, 2019, 1:44 AM

#

Oh I get it

#

Physically graph it

#

@steady escarp You might want to look at gmplot:
https://www.google.com/amp/s/www.geeksforgeeks.org/python-plotting-google-map-using-gmplot-package/amp/

GeeksforGeeks

Python | Plotting Google Map using gmplot package - GeeksforGeeks

gmplot is a matplotlib-like interface to generate the HTML and javascript to render all the data user would like on top of Google Maps. Command… Read More »

steady escarp Jan 3, 2019, 2:07 AM

#

Oh, thats kinda neat. Using google maps. Maybe I can create a scatter plot using this. Thanks!

small pumice Jan 3, 2019, 3:04 AM

#

np

opal knot Jan 3, 2019, 7:13 AM

#

Hey guys, wanted to know something

#

I have been doing data analysis and machine learning for more than a year

#

And I can deploy full blown apps for these purposes (frontend, backend, deployment etc.)

#

How do i go from being a data analyst to a data scientist? Am I already technically a data scientist?

reef bone Jan 3, 2019, 7:24 AM

#

@haughty wind Yes, GPU memory is usually referred to as VRAM. Convolutional networks can be quite costly, usually you would downsample in between conv layers.
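
#

To give a feel for what downsampling buys you, here is 2x2 max pooling written out in plain numpy. It is a sketch of the operation itself, not how a framework implements it, and the 64x64x32 activation is made up:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2, :]  # drop an odd last row/column
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# Hypothetical activation from one conv layer: 64x64 pixels, 32 channels.
activation = np.random.default_rng(0).normal(size=(64, 64, 32)).astype(np.float32)
pooled = max_pool_2x2(activation)
```

Each pooling step cuts the activation to a quarter of its size, which is memory the next conv layer does not have to hold.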

sinful forge Jan 3, 2019, 3:19 PM

#

@reef bone you said you should utilise the GPU. If we're running the AI on a server in a datacentre, they don't usually have GPUs, right? So it would need to be CPU and RAM, no?

#

Also I'm very interested in your 500,000 expression corpus data set you have. Can we converse sometime on this?

reef bone Jan 3, 2019, 3:25 PM

#

I'm not sure what you mean by the first question

#

And yes feel free to ask anything, it's nothing spectacular, just a compilation of data I'm using at the moment for a project

sinful forge Jan 3, 2019, 3:30 PM

#

Oh I mean you said that it's best to run neural networks on GPU rather than RAM. That's fine I guess if you are using it on a desktop, but what if I needed more processing power or I wanted it running 24/7 off my PC? I'd use a datacentre dedicated server, no? They don't normally have GPUs in them, right?

#

Oh nevermind

#

https://www.enahost.com/gpu-dedicated-server

ENAHOST

GPU Dedicated server | ENAHOST

reef bone Jan 3, 2019, 3:34 PM

#

I've never used cloud computing services but I'm sure ones aimed at machine learning will offer powerful GPUs. It depends on the plan you have. You can refer to AWS for example (first one I checked).
https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html

Recommended GPU Instances - Deep Learning AMI

A GPU instance is recommended for most deep learning purposes. Training new models will be faster on a GPU instance than a CPU instance. You can scale sub-linearly when you have multi-GPU instances or if you use distributed training across many instances with GPUs.

sinful forge Jan 3, 2019, 3:34 PM

#

My info is old

#

Awesome thanks

reef bone Jan 3, 2019, 3:35 PM

#

Regardless, you would normally use the GPU for training. Once trained, inference isn't nearly as costly, so provided you can hold the model in RAM, it can be done even on regular computers.

sinful forge Jan 3, 2019, 3:35 PM

#

I'm still learning python. Takes time likebeer

#

Awesome thanks for the info!

reef bone Jan 3, 2019, 3:35 PM

#

pepwink

sinful forge Jan 3, 2019, 3:37 PM

#

I want to create an AI that's able to hold a text conversation and learn from me typing to it. Wonder how possible that is?

#

👌

reef bone Jan 3, 2019, 3:46 PM

#

It's a fun project to do and definitely possible, there's a lot of ways to approach that

#

For example, if you've used keras before, turning the example seq2seq model they have into a chatbot should be fairly simple, but I can't promise you it'll give good results

#

https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

#

They have a character level model for translation, using the keras embedding layer it should be fairly straight forward to turn this into a word level model

#

learn from me typing to it might be a bit difficult to do, usually you would either train on a publicly available dataset (cornell movie lines comes to mind, that one is quite popular), or make your own by scraping reddit comments for example

#

Oh, in fact that article shows how to turn it into a word level model

#

So that's something you can use for inspiration and maybe have a look at github and see what others have come up with

#

However, I probably wouldn't recommend this as a starter project, NLP and sequence modelling come with some difficulties that might be a little daunting for beginners, so if you're just starting maybe something like image classification would be a more fitting project

#

LSTMs are still a bit of magic to me as well

sinful forge Jan 3, 2019, 9:32 PM

#

Wow. Thank you so much @reef bone !!

full shard Jan 4, 2019, 12:34 PM

#

Hello, I've got the following problem - I'm trying to use supervised machine learning to differentiate between multiple classes of data. The data I am plugging in consists of, essentially, 500-point scatter plots - therefore, my feature vector per sample is all the x values, followed by the matching y values (each normalized to between 0 and 1). I am using AdaBoost as a multiclass classifier, and it works well with unseen data up to 3-4 classes. However, with 6-7 classes, as some of the classes are more similar (although they can be differentiated by eye), the classification breaks down. Would anyone have any suggestions as to what other multiclass classification methods to use? And is the x-y value feature vector a good option, or should I use the average and standard deviation of each variable instead? (turning the feature vector from a length of 1000 into a length of 4)
Thank you for any suggestions!
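
#

For reference, the two feature layouts I am comparing look roughly like this (random stand-in data; the function names are just for this sketch). One thing I noticed while writing it: if the points have no natural order, the raw vector changes when the same points are listed in a different order, while the summary does not.

```python
import numpy as np

def raw_features(x, y):
    """Current approach: all x values followed by the matching y values."""
    return np.concatenate([x, y])

def summary_features(x, y):
    """Alternative: mean and standard deviation of each variable."""
    return np.array([x.mean(), x.std(), y.mean(), y.std()])

# Made-up sample: 500 points, already scaled to [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = rng.uniform(size=500)

# Same points, listed in a different order.
order = rng.permutation(500)
same_raw = np.array_equal(raw_features(x, y), raw_features(x[order], y[order]))
same_summary = np.allclose(summary_features(x, y), summary_features(x[order], y[order]))
```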

chrome lily Jan 4, 2019, 5:16 PM

#

Context: A group of programs needs to be run on many servers, across which the programs can be distributed
Problem: For each task each server has a different execution time (example: CPU and RAM available)
Objective: Find a combination of servers which allows minimal execution time of a program/software (sum of the tasks)

📎 20181203_151004.jpg

#

Servidores = Server and Tarefas = Tasks

#

This is my first year project im pretty much new to python

#

I'd be forever grateful for any help

carmine lava Jan 4, 2019, 5:59 PM

#

Any one know how to train faster rcnn

#

Any tutorials

hardy crag Jan 4, 2019, 10:11 PM

#

@chrome lily what kind of class? optimization? programming? any more information available? I guess you should try to form an equation for transfer time and calculation time and then minimize the total time.

#

@carmine lava your question is rather broad. googling for 10 sec revealed this: https://www.analyticsvidhya.com/blog/2018/11/implementation-faster-r-cnn-python-object-detection/

Analytics Vidhya

Pulkit Sharma

Implementing Faster R-CNN in Python for Object Detection

Introduction Which algorithm do you use for object detection tasks? I have tried out quite a few of them in my quest to build ...

chrome lily Jan 4, 2019, 10:23 PM

#

@hardy crag
It's a programming class, artificial intelligence. My teacher gave me this assignment and I'm a bit lost on how to do it. I just started learning python a couple of weeks ago and was a little intimidated.

[Problem 1]
Context: A group of programs needs to be run on many servers, across which they can be organized/distributed
Problem: For each task each server has a different execution time (example: too much load on the server; CPU and RAM available)
Objective: Find the combination of servers which minimizes the execution time of a program (SUM of tasks)

[Problem 2]
Context: Many prediction programs must be executed and they generate different volumes of data.
Problem: Each program generates a different volume of data and the servers which execute them have limited storage capacity.
Objective: Determine the maximum number of programs which can run on a given server and determine the minimum set of servers needed to execute all the prediction programs.

hardy crag Jan 4, 2019, 10:27 PM

#

do you have an idea about how to solve these problems in general (without the programming part)? What part of your class would you think is best suited for this task?

chrome lily Jan 4, 2019, 10:35 PM

#

Basically we have a 2D cost matrix, with the values of the execution times of each server for each task

And I have an array with the same row number and columns as the previous one, but only have 1s and 0s

This makes the activation of a certain task-server pair

1 - enabled, 0 - disabled

Then we implement the genetic algorithm on top of that

I also took a pic today if it helps

#

📎 20190104_223527.jpg

#

Sorry about my phone quality

#

@hardy crag

hardy crag Jan 4, 2019, 10:49 PM

#

right, you have the theory down. I recommend setting up a python script that gets the matrices as input of some kind (e.g. txt or csv) and calls a class which contains the actual optimizer.

#

(if the input is always the same matrix, then you can just hardcode them)
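
#

Very rough skeleton of what that class could look like for Problem 1. Everything here is an assumption for illustration: the class name, the GA settings, the made-up cost matrix, and the simplification that each task runs on exactly one server (so a genome is a list of server indices, which is the same information as your 0/1 matrix with a single 1 per row).

```python
import random

class AssignmentGA:
    """Toy genetic algorithm: pick one server per task to minimise total time."""

    def __init__(self, cost, pop_size=30, mutation_rate=0.1, seed=0):
        self.cost = cost                      # cost[task][server] = execution time
        self.n_tasks = len(cost)
        self.n_servers = len(cost[0])
        self.pop_size = pop_size
        self.mutation_rate = mutation_rate
        self.rng = random.Random(seed)

    def total_time(self, genome):
        # genome[task] = chosen server index
        return sum(self.cost[t][s] for t, s in enumerate(genome))

    def to_mask(self, genome):
        # Same assignment written as the 0/1 activation matrix.
        return [[1 if s == chosen else 0 for s in range(self.n_servers)]
                for chosen in genome]

    def _random_genome(self):
        return [self.rng.randrange(self.n_servers) for _ in range(self.n_tasks)]

    def _child(self, a, b):
        cut = self.rng.randrange(1, self.n_tasks)            # one-point crossover
        child = a[:cut] + b[cut:]
        return [self.rng.randrange(self.n_servers)           # random-reset mutation
                if self.rng.random() < self.mutation_rate else gene
                for gene in child]

    def run(self, generations=50):
        population = [self._random_genome() for _ in range(self.pop_size)]
        for _ in range(generations):
            population.sort(key=self.total_time)
            parents = population[:self.pop_size // 2]         # keep the better half
            children = [self._child(*self.rng.sample(parents, 2))
                        for _ in range(self.pop_size - len(parents))]
            population = parents + children
        return min(population, key=self.total_time)

# Made-up 4 tasks x 3 servers cost matrix (rows: tasks, columns: servers).
cost = [[4, 2, 8], [3, 7, 5], [6, 1, 9], [2, 2, 3]]
ga = AssignmentGA(cost)
best = ga.run()
```

With a plain sum and no extra constraints, the true optimum is just the smallest entry in each row, which is handy for checking that the GA gets close before adding things like server capacity.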