#data-science-and-ml

velvet thorn Sep 7, 2020, 11:38 PM

#

I guess my PoV is that a lot of people treat DataFrames as slightly better 2D lists

#

which leads to avoiding a lot of the conveniences and tools that pandas gives you

#

e.g. vectorisation

#

anyway yeah @misty mica it also depends on what you want to do with the data after you're done with it

#

that'd influence storage concerns

#

just for completeness, it's worth noting that the SQL way to handle this (canonically) would be to create two separate tables

#

and in the second table each row would be an original filename ID, an index, and one part of the filename

misty mica Sep 7, 2020, 11:40 PM

#

I split it into a list so I could do some tag-like analysis, but I think just storing the string instead of a list is adequate.

velvet thorn Sep 7, 2020, 11:40 PM

#

if you did that you could just do df[(df['index'] == 1) & (df['filename'] == whatever)]

#

which is nice, but also adds cognitive overhead because now you have to juggle two tables
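
For reference, a minimal pandas sketch of the two-table layout described above. The sample filenames, the `_` delimiter, and the column names `filename_id`, `index`, and `part` are assumptions for illustration; the chat does not show the real data.

```python
import pandas as pd

# Hypothetical filenames; the real naming standard is not shown in the chat.
files = pd.DataFrame({
    "filename_id": [0, 1],
    "filename": ["projA_2020_raw", "projB_2019_clean"],
})

# Second table: one row per (filename_id, index, part).
parts = (
    files.assign(part=files["filename"].str.split("_"))
    .explode("part")
    .drop(columns="filename")
    .reset_index(drop=True)
)
parts["index"] = parts.groupby("filename_id").cumcount()

# Filter on position and value, then join back to the first table.
hits = parts[(parts["index"] == 1) & (parts["part"] == "2020")]
matched = files.merge(hits[["filename_id"]], on="filename_id")
print(matched["filename"].tolist())
```

The join back to `files` is the "juggling two tables" cost mentioned above.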

misty mica Sep 7, 2020, 11:41 PM

#

In general for most things I'll only care about the files that are formatted according to my standard example above, so I'm going to store those three fields and also filename as a string.

#

Thanks for the assistance!

velvet thorn Sep 7, 2020, 11:44 PM

#

yw

indigo obsidian Sep 7, 2020, 11:48 PM

#

speaking of snake_case vs camelCase, constantly going between R and Python for school is starting to destroy my sanity 😫

velvet thorn Sep 7, 2020, 11:48 PM

#

hm

#

I use TypeScript (camelCase) for my frontend and Python (snake_case) for my backend

#

it's p okay but

#

because API responses are returned in snake case

#

I ALSO have snake case variables in my TypeScript

#

🥴

misty mica Sep 7, 2020, 11:58 PM

#

That's funny, I've been wanting to use camel/pascal casing in python because my databases all tend to be snake case.

indigo obsidian Sep 7, 2020, 11:59 PM

#

is there anything inherently wrong with camelCase in python? or just a matter of good/bad practice?

velvet thorn Sep 7, 2020, 11:59 PM

#

is there anything inherently wrong with camelCase in python? or just a matter of good/bad practice?
@indigo obsidian snake case is preferred

#

but not inherently wrong, no

misty mica Sep 8, 2020, 12:02 AM

#

If everyone uses the same style guide it is nice, but not worth going to battle over in most organizations.

#

I mean, not worth fighting an existing standard that isn't your preferred, it's definitely worth having a standard.

velvet thorn Sep 8, 2020, 12:04 AM

#

yup

#

what sucks, though

#

is not having a standard.

#

😦

wheat pilot Sep 8, 2020, 1:26 AM

#

in a list of tuples how do i get the first index of the list but second element in the tuple

#

something like list = [(2.3, 1), (3.5, 0)]

#

and get out the 1

slate hollow Sep 8, 2020, 1:28 AM

#

@wheat pilot 1. this probably isn't the best place but since you asked
you would do smth like x[0][1]
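
A short sketch of that double indexing, using the list from the question. The variable is renamed to `pairs` here (an assumption, not the asker's name) so the built-in `list` is not shadowed.

```python
pairs = [(2.3, 1), (3.5, 0)]

first_tuple = pairs[0]          # first element of the list: (2.3, 1)
second_item = first_tuple[1]    # second element of that tuple
print(second_item)              # same value as pairs[0][1]
```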

indigo obsidian Sep 8, 2020, 1:33 AM

#

just wondering, where would be the best place to ask more fundamental questions like this?

tidal bough Sep 8, 2020, 2:22 AM

#

This is a python-specific question, really, so #python-discussion or a help channel.

dusty bough Sep 8, 2020, 2:45 AM

#

how to solve it

📎 20200908_104529.jpg

violet mesa Sep 8, 2020, 4:45 AM

#

Anyone know of a good textbook for Time Series Analysis in Python? Understanding ACF, PACF, ARIMA, SARIMA with a good depth on the formulae etc...

glacial rune Sep 8, 2020, 8:27 AM

#

What is the most performant way of inserting lots of records into a SQL database? It took me almost 100 seconds for 57k records using executemany()

indigo obsidian Sep 8, 2020, 9:04 AM

#

SqlBulkCopy? from what i understand executemany isn't a true bulk operation, and is actually inserting your rows one by one under the hood.
https://programmingwithmosh.com/net/using-sqlbulkcopy-for-fast-inserts/

#

BULK INSERT is another possible option

lapis sequoia Sep 8, 2020, 10:26 AM

#

@glacial rune : executemany() isn't a true bulk operation, so it tends to be pretty slow.

I found I get significant performance gains by composing a single operation as a massive string and sending it all in one go.

Since you only have 57k records, that may be your best option. Beware that you have to be careful about how you convert numerical data into strings to avoid character truncation.

Another option is to use pyodbc.cursor.fast_executemany. I've not tried this, it just looks promising.

https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API

SQLAlchemy added support for this feature

https://docs.sqlalchemy.org/en/13/changelog/migration_13.html#support-for-pyodbc-fast-executemany
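
A rough sketch of the "single large statement" idea, written with parameter placeholders instead of hand-built value strings. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs anywhere; the table name `records`, the helper `insert_in_chunks`, and the chunk size are assumptions. MySQL drivers generally use `%s` placeholders rather than `?`, and actual timings will depend on the database and network.

```python
import sqlite3

def insert_in_chunks(conn, rows, chunk_size=300):
    """Insert (id, value) rows using one multi-row INSERT per chunk."""
    cur = conn.cursor()
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One "(?, ?)" group per row; values stay parameterised, so no
        # manual number-to-string conversion is needed.
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        flat = [field for row in chunk for field in row]
        cur.execute(f"INSERT INTO records (id, value) VALUES {placeholders}", flat)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value REAL)")
rows = [(i, i * 0.5) for i in range(57_000)]
insert_in_chunks(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

Chunking keeps each statement under the server's parameter and packet limits while still cutting the number of round trips.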

glacial rune Sep 8, 2020, 10:46 AM

#

Thanks 😄 ultimately I will have like 30 million records... the db is on google cloud so I wonder if it would be faster to upload csv files

desert oar Sep 8, 2020, 12:20 PM

#

@glacial rune what database are you actually using?

#

they all have different features for this

glacial rune Sep 8, 2020, 12:27 PM

#

MySQL on Google Cloud

#

5.7

opaque isle Sep 8, 2020, 1:04 PM

#

I have a question. Normally is it possible for an image classifier built on CNN to give the count of something in an image. (e.g. the no of cats in an image)?

velvet thorn Sep 8, 2020, 1:22 PM

#

I have a question. Normally is it possible for an image classifier built on CNN to give the count of something in an image. (e.g. the no of cats in an image)?
@opaque isle yes

runic stream Sep 8, 2020, 2:04 PM

#

hey so i want to train a GRU network with an input of shape (748, 500, 12)
but i'm getting this error:

#

📎 unknown.png

#

📎 unknown.png

#

the model 👆 can someone please help?

cobalt jetty Sep 8, 2020, 2:06 PM

#

can you show a full picture of the error message?

#

I think you're messing something with x_train with regards to the shape.

runic stream Sep 8, 2020, 2:06 PM

#

📎 unknown.png

#

📎 unknown.png

#

these are the shapes

cobalt jetty Sep 8, 2020, 2:08 PM

#

my guess is that xtr has a wrong shape here.

📎 unknown.png

runic stream Sep 8, 2020, 2:09 PM

#

the first dim is the no. of examples, the next is the time steps(500 samples), and the last one is the features

cobalt jetty Sep 8, 2020, 2:10 PM

#

is it the pytorch GRU?

runic stream Sep 8, 2020, 2:10 PM

#

keras

#

📎 unknown.png

#

this is the network i'm trying to implement

cobalt jetty Sep 8, 2020, 2:17 PM

#

off the top of my head, I can't give a proper answer right now.

#

I do think it's a size mismatch between the input_size you're using and the size of xtr, though.

runic stream Sep 8, 2020, 2:28 PM

#

I think the GRU outputs five different sequences, each of which I have to pass through another Dense layer, but I don't know how to do that... maybe that may be the reason for the error...

#

I do think it's a size mismatch between the input_size you're using and the size of xtr, though.
@cobalt jetty input size i have given is (500,12) to the GRU, and xtr is of shape (748, 500, 12)

cobalt jetty Sep 8, 2020, 3:03 PM

#

I looked back at the error and it seems to arise from this function. I.e. your output shape and your y_train shape have a mismatch.

📎 unknown.png

runic stream Sep 8, 2020, 3:29 PM

#

sorry i tried everything it is not working still, i want a multiclass classification, and i think i'm doing something wrong here

I think the GRU outputs five different sequences, each of which I have to pass through another Dense layer, but I don't know how to do that... maybe that may be the reason for the error...
but i don't know how to solve that

safe tapir Sep 8, 2020, 3:32 PM

#

Anyone have experience with using eGPUs for DL? Most of the benchmarks I see online show a 10-30% performance hit for gaming. Should I expect similar for DL?

cobalt jetty Sep 8, 2020, 3:35 PM

#

Try inputting (5,) as an output shape rather than len(labels), @runic stream

runic stream Sep 8, 2020, 3:37 PM

#

📎 unknown.png

#

i guess the output units have to be an int,

willow karma Sep 8, 2020, 4:09 PM

#

Hey squad - hope everyone had a good weekend and is staying safe/healthy. I'm starting to dip my toes into sentiment analysis, and can imagine this work has been explored in so much detail that there are some good Python libraries that can basically "plug and play" with text that you feed it.

If this is the case, are there any libraries ya'll recommend I explore? I imagine there are easier alternatives to running e.g. CountVectorizer

slate hollow Sep 8, 2020, 5:34 PM

#

hey um does anyone know how much space building tensorflow from source takes? bc rn it has taken up frickin 30 gigs

desert oar Sep 8, 2020, 5:40 PM

#

@void anvil might have to just read the source code

#

or use subprocess

#

cat your_file.naf | corefgraph -l en_conll > output.naf

ugh, useless use of cat

#

and sudo pip install too

#

yeah you'd have to check the source code for how it works

#

good old academic software

#

no idea

#

let me know if you figure it out though @void anvil

#

i like having all this nlp stuff in my toolbox

crimson vector Sep 8, 2020, 6:18 PM

#

okay im really dumb and new to everything and would like some help. ive spent 2 days trying to figure out what is wrong with my neural network. im trynna do the handwritten digits thing (mnist) and my code is both super slow and the cost only goes up

#

can someone look it over and tell me where i am going wrong?

#

import numpy as np

def cross_entropy(output, y_target):
    return - np.sum(np.log(output) * y_target, axis=1)

def cost(output, y_target):
    return np.mean(cross_entropy(output, y_target))

def sigmoid(z):
    return 1 / (1 + np.exp(z * -1))

def sigmoid_deriv(z):
    return sigmoid(z) * (1 - sigmoid(z))

def softmax(z):
    return (np.exp(z.T) / np.sum(np.exp(z), axis=1)).T

m = 10000
y = np.zeros((m, 10))
x = np.zeros((m, 784))

file = open("data\\mnist_train.txt", 'r')
for i in range(m):
    line = file.readline()
    x_line = line[2:].split(',')
    x_line = np.array([int(i) for i in x_line]).reshape(1, 784)
    x[i] = x_line

    y_line = np.zeros((10, 1))
    y_line[int(line[0])] = 1
    y[i] = np.array(y_line.T)

y = y.reshape(m, 10).T
x = x.T
alpha = .01

W1 = np.random.rand(256, 784) * .01
b1 = np.zeros((256, 1))

W2 = np.random.rand(256, 256) * .01
b2 = np.zeros((256, 1))

W3 = np.random.rand(10, 256) * .01
b3 = np.zeros((10, 1))

for i in range(1000):
    # feed forward
    Z1 = np.dot(W1, x) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = softmax(Z3)

    # calculating gradients
    dz3 = A3 - y
    dw3 = np.dot(dz3, A2.T) / m
    db3 = np.sum(dz3, axis=1, keepdims=True) / m
    da2 = np.dot(W3.T, dz3)

    dz2 = np.multiply(da2, sigmoid_deriv(Z2))
    dw2 = np.dot(dz2, A1.T) / m
    db2 = np.sum(dz2, axis=1, keepdims=True) / m
    da1 = np.dot(W2.T, dz2)

    dz1 = np.multiply(da1, sigmoid_deriv(Z1))
    dw1 = np.dot(dz1, x.T) / m
    db1 = np.sum(dz1, axis=1, keepdims=True) / m

    # updating weights and biases
    W3 = W3 - alpha * dw3
    b3 = b3 - alpha * db3
    W2 = W2 - alpha * dw2
    b2 = b2 - alpha * db2
    W1 = W1 - alpha * dw1
    b1 = b1 - alpha * db1

#

ping me if u can help or something
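
One thing that may be worth checking in the code above (a guess, not a confirmed diagnosis): the activations are stored as (classes, samples), but `softmax` sums `np.exp(z)` over `axis=1`, which is the sample axis, so the outputs for a single sample do not sum to 1. `cross_entropy` sums over `axis=1` as well. A minimal sketch of column-wise versions, assuming the (10, m) layout used in the training loop; the function names and the random test data are illustrative only.

```python
import numpy as np

def softmax_columns(z):
    """Softmax over axis 0, so each column (one sample) sums to 1."""
    shifted = z - np.max(z, axis=0, keepdims=True)  # avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=0, keepdims=True)

def cost_columns(output, y_target):
    """Mean cross-entropy for (classes, samples) arrays."""
    per_sample = -np.sum(y_target * np.log(output), axis=0)
    return np.mean(per_sample)

rng = np.random.default_rng(0)
z = rng.normal(size=(10, 5))           # 10 classes, 5 samples
y = np.eye(10)[:, [3, 1, 4, 1, 5]]     # one-hot labels as columns
a = softmax_columns(z)
print(a.sum(axis=0))                   # every entry should be close to 1
print(cost_columns(a, y))
```

With this layout, `A3 - y` remains the gradient of the loss with respect to `Z3`, so the rest of the backward pass would not need to change for this particular fix.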

desert oar Sep 8, 2020, 6:54 PM

#

@void anvil what are you ultimately trying to do

sly mango Sep 8, 2020, 7:06 PM

#

Guys, I have created a small corpus of 1.8M sentences and 250K unique words in Spanish for NLP, but I really don't know where to post it. 😅

desert oar Sep 8, 2020, 7:08 PM

#

what is the output from a tool like this @void anvil ?

#

some kind of matrix of coreferences?

#

e.g. matrix C where Cij = 1 if entities i and j appear in the same doc?

desert oar Sep 8, 2020, 7:52 PM

#

hm

#

interesting

#

so you need to see coreferences of things like "BOILER" and "TECHNICIAN"?

lapis sequoia Sep 8, 2020, 7:56 PM

#

Anyone with TF lite experience need some help

desert oar Sep 8, 2020, 9:04 PM

#

interesting that coreference resolution is a separate task from "just" entity resolution

wheat pilot Sep 8, 2020, 9:12 PM

#

how do i standard scale a dataframe to have 0 mean and 1 stdev using sklearn?

#

i tried standardscaler and scale but when i manually check the returned data the mean is not 0

desert oar Sep 8, 2020, 9:18 PM

#

@wheat pilot it might be +/- some small amount due to floating point error

wheat pilot Sep 8, 2020, 9:19 PM

#

when i use df.mean() one row returns -3.552714e-16

#

so i guess that might be it

#

but then this is being used towards data preprocessing and my initial accuracy for a knn implementation is 1 but for a standard scaled is lower

#

and even lower for a min max scaled

#

shouldnt they be higher? @desert oar

desert oar Sep 8, 2020, 9:21 PM

#

do you have any idea how tiny 1e-16 is

wheat pilot Sep 8, 2020, 9:25 PM

#

yea super close to 0

#

i wasnt sure if it should be exact or not

#

but are my accuracies supposed to get worse?

#

with preprocessing

#

also i thought the way things were standard scaled is (x - mean) / stdev

desert oar Sep 8, 2020, 9:26 PM

#

yeah that is right

#

so you should have mean ~0 and sd ~1

wheat pilot Sep 8, 2020, 9:26 PM

#

but when i manually do that for one row of my dataset i get a different value

desert oar Sep 8, 2020, 9:26 PM

#

huh?

#

what do you mean "for one row"?

wheat pilot Sep 8, 2020, 9:27 PM

#

7.3,0.74,0.08,1.7,0.094,10.0,45.0,0.9957600000000001,3.24,0.5,9.8

#

this is one row of my dataframe

#

its mean is 7.2227054545455

#

and its stdev is 13.098395928751

desert oar Sep 8, 2020, 9:28 PM

#

you arent supposed to scale rows

#

you're supposed to scale columns

wheat pilot Sep 8, 2020, 9:28 PM

#

ohhhh

#

oh man

desert oar Sep 8, 2020, 9:28 PM

#

the mean of each column should be around 0, and the stddev of each column should be around 1
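
A minimal sketch of column-wise standardisation with NumPy, on made-up data. `StandardScaler` computes the same thing per column using the population standard deviation (`ddof=0`), whereas `DataFrame.std()` defaults to `ddof=1`, so a manual pandas check can come out slightly different from 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 500 rows, 3 columns on very different scales.
X = rng.normal(loc=[7.0, 0.5, 45.0], scale=[1.5, 0.1, 20.0], size=(500, 3))

col_mean = X.mean(axis=0)          # one mean per column
col_std = X.std(axis=0)            # population std (ddof=0)
X_scaled = (X - col_mean) / col_std

print(X_scaled.mean(axis=0))       # tiny values such as 1e-16, not exactly 0
print(X_scaled.std(axis=0))        # approximately 1 per column
print(X_scaled[0].mean())          # a single row's mean is not expected to be 0
```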

wheat pilot Sep 8, 2020, 9:28 PM

#

no wonder

#

ok so

desert oar Sep 8, 2020, 9:29 PM

#

the point is so that all the data is centered in roughly the same place and occupies roughly the same amount of "space"

wheat pilot Sep 8, 2020, 9:29 PM

#

what is the implementation using sklearn for this?

#

scale vs fit_transform

desert oar Sep 8, 2020, 9:29 PM

#

eh?

#

first of all what are your data types

#

how many columns

#

any missing values

#

etc

#

just so i know what you are dealing with

wheat pilot Sep 8, 2020, 9:30 PM

#

its a pandas dataframe i think

#

initially it has 12 columns and one of those is the label column

#

no missing values

#

in the start of my standard scaling def i removed the last column of the dataframe with
```python
xTrain = xTrain[xTrain.columns[:-1]];
xTest = xTest[xTest.columns[:-1]];
```

#

since i dont want to standardize the labels i think?

desert oar Sep 8, 2020, 9:32 PM

#

correct

#

well you can standardize those too, but you wouldnt want to do it here

wheat pilot Sep 8, 2020, 9:32 PM

#

yea

desert oar Sep 8, 2020, 9:32 PM

#

you dont need semicolons in python btw

wheat pilot Sep 8, 2020, 9:32 PM

#

oh yea

desert oar Sep 8, 2020, 9:32 PM

#

i assume you use javascript?

wheat pilot Sep 8, 2020, 9:32 PM

#

habit

#

all of my coursework up till this course has been java based

desert oar Sep 8, 2020, 9:33 PM

#

ah

#

good so you understand objects and stuff

wheat pilot Sep 8, 2020, 9:33 PM

#

we didnt have any introduction to python just an assignment on the topic of the course 😦

#

yeah a bit

desert oar Sep 8, 2020, 9:33 PM

#

all the columns are numeric right?

#

as in, they only contain numbers?

wheat pilot Sep 8, 2020, 9:33 PM

#

i get a little iffy on fundamentals

#

yea they are

#

except the header?

desert oar Sep 8, 2020, 9:33 PM

#

yeah python has some things in common with java and some things that are very different

wheat pilot Sep 8, 2020, 9:34 PM

#

each column has a name

desert oar Sep 8, 2020, 9:34 PM

#

yeah we dont care about the column names

#

pandas is smart enough not to mix those up with your data

#

x_train = x_train.iloc[:, :-1]
x_test = x_test.iloc[:, :-1]

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

wheat pilot Sep 8, 2020, 9:35 PM

#

i imported the preprocessing packet (not sure if its the right term) using
```python
from sklearn import preprocessing
```

desert oar Sep 8, 2020, 9:35 PM

#

("module" or "package")

#

(a "package" is just a "module" that contains other modules, called submodules)

#

the code i posted should "just work"

#

it will scale each column independently

wheat pilot Sep 8, 2020, 9:36 PM

#

do i have to use iloc instead of what i had?

desert oar Sep 8, 2020, 9:36 PM

#

no but its less typing

#

or if you know the column name of the label you can do

x_train = x_train.drop(columns=[label_colname])

pandas gives you a few different ways to perform similar operations, depending on what exactly you want

wheat pilot Sep 8, 2020, 9:38 PM

#

for the line with scaler = StandardScaler i need to use preprocessing.StandardScaler() right?

desert oar Sep 8, 2020, 9:38 PM

#

in your case yes

#

i usually write from sklearn.preprocessing import StandardScaler

#

they both work

wheat pilot Sep 8, 2020, 9:38 PM

#

i have to use min max scaler as well

#

so i imported the bigger package

desert oar Sep 8, 2020, 9:38 PM

#

using both is weird

#

from sklearn.preprocessing import StandardScaler, MinMaxScaler

this is one option

wheat pilot Sep 8, 2020, 9:39 PM

#

ooh even better

#

then i can just use StandardScaler() on its own?

desert oar Sep 8, 2020, 9:39 PM

#

i wouldnt recommend using both unless you know what you're doing and have a good reason to do it

wheat pilot Sep 8, 2020, 9:40 PM

#

using both minmax and standard?

desert oar Sep 8, 2020, 9:40 PM

#

min-max scaling (aka "normalizing") works best when the data has a logical "maximum" and "minimum" value

wheat pilot Sep 8, 2020, 9:40 PM

#

they are for two different definitions to see how preprocessing affects my accuracy

desert oar Sep 8, 2020, 9:40 PM

#

whereas shifting by mean and scaling by std dev (aka "standardizing") works best on unbounded data

#

yeah, dont use them on the same data

#

but if youre comparing then go for it

wheat pilot Sep 8, 2020, 9:41 PM

#

ah ok good i think its set up to use a new copy each time

desert oar Sep 8, 2020, 9:41 PM

#

the "function-only" versions like sklearn.preprocessing.minmax_scale don't preserve the values you need to re-apply the scaling later

#

whereas the class-based versions like sklearn.preprocessing.MinMaxScaler store the scaling parameters, which lets you do fit_transform on the training data and then just transform on the test data

wheat pilot Sep 8, 2020, 9:42 PM

#

where you named things scaled are there issues if i use
```python
xTrain = scaler.fit_transform(xTrain)
```

desert oar Sep 8, 2020, 9:42 PM

#

i dont like not having access to my original data

#

it's just more annoying

#

https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py this might be a good read for you

wheat pilot Sep 8, 2020, 9:43 PM

#

ah wait

#

on what u said about function only

#

for knn should i be using the same transform values on the test data?

#

or scaling test data separately

desert oar Sep 8, 2020, 9:44 PM

#

for knn should i be using the same transform values on the test data?
yes, you should do this

#

think about it practically: the test data is meant to simulate "out of sample" data. if the data is out of sample, where are you going to get the scaling parameters? nowhere. you have to use the parameters from the training data
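
A small sketch of what "use the parameters from the training data" looks like in practice, with toy one-column arrays that are assumptions for illustration. The test values are deliberately outside the training range to show that `transform` reuses the stored minimum and maximum rather than recomputing them.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[0.0], [4.0]])      # outside the training range on purpose

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)   # learns min=1, max=3
x_test_scaled = scaler.transform(x_test)         # reuses min=1, max=3

print(scaler.data_min_, scaler.data_max_)
print(x_test_scaled.ravel())
```

Scaled test values can fall outside [0, 1]; that is expected when the test data contains values the training data never saw.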

wheat pilot Sep 8, 2020, 9:45 PM

#

ohh

#

i see

#

my teaching assistant mentioned not doing the same process on the test data

#

but the assignment info made it seem like we were supposed to

#

but i think it meant same process as in same parameters as the training and not same code process

#

so for the same deal but a min max version i would just replace the standard scaler with a min max?

#

    xTrain = xTrain.iloc[:, :-1]
    xTest = xTest.iloc[:, :-1]
    scaler = MinMaxScaler()
    xTrain_scaled = scaler.fit_transform(xTrain)
    xTest_scaled = scaler.transform(xTest)
    return xTrain_scaled, xTest_scaled

desert oar Sep 8, 2020, 9:47 PM

#

yep

wheat pilot Sep 8, 2020, 9:48 PM

#

do you know anything about adding noisy features?

#

i think they might also be called irrelevant features

#

i implemented this but im not sure this is actually what i should be trying to do
```python
def add_irr_feature(xTrain, xTest):
    """
    Add 2 features using Gaussian distribution with 0 mean,
    standard deviation of 1.

    Parameters
    ----------
    xTrain : nd-array with shape n x d
        Training data
    xTest : nd-array with shape m x d
        Test data

    Returns
    -------
    xTrain : nd-array with shape n x (d+2)
        Training data with 2 new noisy Gaussian features
    xTest : nd-array with shape m x (d+2)
        Test data with 2 new noisy Gaussian features
    """
    # TODO FILL IN
    feature1_train = np.random.normal(0, 1, len(xTrain))
    feature2_train = np.random.normal(0, 1, len(xTrain))
    feature1_test = np.random.normal(0, 1, len(xTest))
    feature2_test = np.random.normal(0, 1, len(xTest))
    xTrain['irr_feat1'] = feature1_train
    xTrain['irr_feat2'] = feature2_train
    xTest['irr_feat1'] = feature1_test
    xTest['irr_feat2'] = feature2_test
    return xTrain, xTest
```

desert oar Sep 8, 2020, 9:49 PM

#

note that those features are uncorrelated with your "meaningful" features

#

is this part of your homework?

#

if not, sklearn has a nice routine for making fake classification data https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

wheat pilot Sep 8, 2020, 9:50 PM

#

yea it is

#

we never went over any code in class though

#

just the concept of knn

#

TT

#

im having trouble following what the classification thing means

desert oar Sep 8, 2020, 9:52 PM

#

what does the homework actually ask you to do?

wheat pilot Sep 8, 2020, 9:52 PM

#

i do understand im supposed to be making extra columns that are not "necessary" and may mess with results

#

Fill in the add irr feature function to add two irrelevant features to the training
and test data. The data for each column should be drawn from a Gaussian (normal)
distribution with 0 mean and standard deviation of 1.

desert oar Sep 8, 2020, 9:53 PM

#

seems like you did the right thing then

#

dont overthink that question

wheat pilot Sep 8, 2020, 9:53 PM

#

oh cool cool

#

when i run this though my accuracy went up for the one i thought it would go down

#

and it went down for the ones i thought should go up

desert oar Sep 8, 2020, 9:55 PM

#

well yeah you were using the scaling totally wrong lol

wheat pilot Sep 8, 2020, 9:55 PM

#

oh i mean with new code

#

my results are:
```
Test Acc (no-preprocessing): 1.0
Test Acc (standard scale): 0.8
Test Acc (min max scale): 0.7
Test Acc (with irrelevant feature): 1.0
```

#

actually this may be because i ran it on a small test set

desert oar Sep 8, 2020, 9:58 PM

#

use the same test and train set for all of those

#

dont re-draw each time

wheat pilot Sep 8, 2020, 9:58 PM

#

what do you mean

#

redraw?

desert oar Sep 8, 2020, 9:59 PM

#

make sure you use the same test/train split for all 4 of those methods

#

to get a fair comparison

wheat pilot Sep 8, 2020, 9:59 PM

#

oh my data is divided into 4 csv

#

xtraining and ytraining, xtest and ytest

desert oar Sep 8, 2020, 9:59 PM

#

thats fine

wheat pilot Sep 8, 2020, 9:59 PM

#

x has the data and y has labels

#

theres something preimplemented to get each thing

#

i think its pd.read_csv

desert oar Sep 8, 2020, 10:00 PM

#

thats fine

wheat pilot Sep 8, 2020, 10:01 PM

#

its running now but since my knn has some for loops it takes a few mins for each test

#

the csvs are 500x12

#

ah it still goes down

#

Test Acc (no-preprocessing): 0.8395833333333333
Test Acc (standard scale): 0.70625

#

so far

desert oar Sep 8, 2020, 10:05 PM

#

it might actually just be worse

#

depends on the data

#

seems unlikely but

wheat pilot Sep 8, 2020, 10:06 PM

#

Evaluate the accuracy of the model on the test dataset for the different preprocessing
techniques as a function of k. What conclusions can you draw with regards to the
different forms of preprocessing and the sensitivity to irrelevant features for this dataset?

#

i feel like its fishing for an answer about accuracy getting better since scale should usually matter for knn?

#

i ended up with Test Acc (no-preprocessing): 0.8395833333333333
Test Acc (standard scale): 0.70625
Test Acc (min max scale): 0.8
Test Acc (with irrelevant feature): 0.84375

desert oar Sep 8, 2020, 10:07 PM

#

yeah that would be my guess as well
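
To illustrate why scale usually matters for k-NN, here is a toy sketch with made-up numbers (two training points, one query, and a hypothetical helper `nearest`). The second feature is on a much larger scale, so it dominates the raw Euclidean distance; after standardising with the training statistics the nearest neighbour changes. This only shows that scaling changes the neighbours, not that accuracy must improve on any particular dataset.

```python
import numpy as np

train = np.array([[1.0, 1000.0],
                  [5.0, 1300.0]])
query = np.array([4.8, 1050.0])

def nearest(points, q):
    """Index of the row in `points` closest to `q` (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))

raw_nearest = nearest(train, query)

# Standardise both the training points and the query with training statistics.
mean, std = train.mean(axis=0), train.std(axis=0)
scaled_nearest = nearest((train - mean) / std, (query - mean) / std)

print(raw_nearest, scaled_nearest)
```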

wheat pilot Sep 8, 2020, 10:07 PM

#

im not sure what conclusions to draw as its opposite of my guess

desert oar Sep 8, 2020, 10:07 PM

#

just triple check for mistakes

#

worst case scenario you get it wrong

wheat pilot Sep 8, 2020, 10:08 PM

#

would you be able to skim through my knn to see if i had anything major causing this to be wrong?

desert oar Sep 8, 2020, 10:08 PM

#

i can, but i cant offer that much since this is homework

wheat pilot Sep 8, 2020, 10:08 PM

#

it works for the first data set that i had to write it for but this question uses that knn for a different set

#

https://pastebin.com/aFfxcJy2

#

https://pastebin.com/xBmb8d12

#

the first is my knn and the second is the preprocessing tests

lapis sequoia Sep 8, 2020, 10:27 PM

#

Are there any websites similar to “kaggle” that offers competition ?

wheat pilot Sep 8, 2020, 10:53 PM

#

@desert oar were you able to find anything?

verbal haven Sep 9, 2020, 12:06 AM

#

does someone know what this number -> 22/22 means when training a nn with tensorflow?
Epoch 94/100
22/22 [==============================] - 0s 750us/step - loss: 0.0123

#

i understand its related to the dataset size but i cant find the relation

desert oar Sep 9, 2020, 12:51 AM

#

is that a progress bar for stochastic gradient descent?

hasty grail Sep 9, 2020, 12:53 AM

#

Step 22 of 22 in the 94th epoch

#

Usually in TF a "step" is equivalent to one batch
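
As a sketch of the relation to dataset size: the step count is the number of batches per epoch, i.e. the sample count divided by the batch size, rounded up. The batch size of 32 is Keras's default for `fit` and the 700-sample dataset is a made-up figure; the actual values are not shown in the chat.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(700))
```

Under those assumptions, "22/22" would correspond to somewhere between 673 and 704 training samples.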

lean wharf Sep 9, 2020, 2:24 AM

#

Does anyone have an idea how I could recognise the steps annotated below using python? I've got the xy data in a dataframe at the moment

📎 unknown.png

hasty grail Sep 9, 2020, 2:25 AM

#

smooth out the curve using moving average then find the derivative at each point?

lean wharf Sep 9, 2020, 2:26 AM

#

Some sort of "if the difference between y and y+1 > n, print x and x+1"

verbal haven Sep 9, 2020, 2:26 AM

#

scipy.signal.find_peaks

hasty grail Sep 9, 2020, 2:27 AM

#

the difference between y and y+1
basically the derivative

lean wharf Sep 9, 2020, 2:28 AM

#

Aye. At the moment I've got something looking a little like this code wise

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


df = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
df.plot(x ='x', y='y', linewidth=0.2)

# Savitzky-Golay filter implementation
dataIIR = df
dataIIR['y'] = savgol_filter(df['y'], 101, 2)
dataIIR.plot(x ='x', y='y', linewidth=0.4)

#

So importing the data, and reducing the noise a little with filter

verbal haven Sep 9, 2020, 2:30 AM

#

'e:/Projects/HiringTest/submission/sample.txt' lol,
have you tried with find_peaks or similars? tweaking it, it should find the steps

lean wharf Sep 9, 2020, 2:30 AM

#

Trying now

#

Just doing some practice interview q's lol

lean wharf Sep 9, 2020, 2:48 AM

#

@verbal haven So say I want to find a point with difference of 1 between y and y+1, would that be height=1?

#

ie. jumps = find_peaks(dataIIR['y'], height=1.5)

#

I'm struggling to get to grips with understanding how the function works

hasty grail Sep 9, 2020, 2:51 AM

#

I think you want threshold instead of height

verbal haven Sep 9, 2020, 2:53 AM

#

yes its threshold

lean wharf Sep 9, 2020, 2:54 AM

#

Hmmm, so
```
jumps = find_peaks(dataIIR['y'], threshold=1)
print(jumps)
```
Doesn't seem to return any values as such

verbal haven Sep 9, 2020, 2:55 AM

#

it should return an array with a peak index

#

oh you mean with threshold = 1 ?

lean wharf Sep 9, 2020, 2:57 AM

#

So, I've got some data that looks like this

2.24756189047 2.70009589679
2.24831207802 2.85466124369
2.24906226557 2.85664726093
2.24981245311 2.84088991726
2.25056264066 5.23410679429
2.25131282821 5.01424916475
2.25206301575 4.81484599199
2.2528132033 5.25819389546
2.25356339085 4.99236143949

#

you can see the jump from 2.8 to 5.2

#

I'm looking to find where in the code that jump happens. I've put it as 1 just because it seems like a reasonable value to measure the number of jumps

hasty grail Sep 9, 2020, 3:22 AM

#

what does it return?

lean wharf Sep 9, 2020, 3:23 AM

#

So running starts with this

📎 unknown.png

hasty grail Sep 9, 2020, 3:23 AM

#

ok so it literally returns nothing

lean wharf Sep 9, 2020, 3:24 AM

#

Basically yes

arctic wedgeBOT Sep 9, 2020, 3:24 AM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lean wharf Sep 9, 2020, 3:26 AM

#

But I'm not sure why. As you can see above, there clearly is a difference of greater than 2. Even 1 returns nothing

hasty grail Sep 9, 2020, 3:28 AM

#

After messing around in #bot-commands I think I see the problem

#

The threshold checks both sides

#

if it's just a jump it wouldn't register it as a peak as the other side doesn't have a big enough jump

lean wharf Sep 9, 2020, 3:29 AM

#

Hmm I did think that might be a possibility. It looks for spikes not steps basically

#

I might need to do it manually in that case
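
One way to keep using `find_peaks` for steps, sketched under the assumption that a step in `y` shows up as a spike in the absolute first difference. The `y` values are the ones quoted earlier in the chat; `height=1` is an arbitrary cut-off. Note that `find_peaks` returns a tuple of `(indices, properties)`, so printing the raw result shows an array and a dict.

```python
import numpy as np
from scipy.signal import find_peaks

y = np.array([2.70009589679, 2.85466124369, 2.85664726093, 2.84088991726,
              5.23410679429, 5.01424916475, 4.81484599199, 5.25819389546,
              4.99236143949])

step_size = np.abs(np.diff(y))             # step_size[i] = |y[i+1] - y[i]|
peaks, properties = find_peaks(step_size, height=1)

print(peaks)                               # index i of each step from y[i] to y[i+1]
print(properties["peak_heights"])
```

Because `find_peaks` never reports the first or last sample as a peak, a step at the very start or end of the data would be missed by this approach.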

hasty grail Sep 9, 2020, 3:33 AM

#

try this

#

!e

import numpy as np

arr = np.array([2.24756189047, 2.70009589679,
2.24831207802, 2.85466124369,
2.24906226557, 2.85664726093,
2.24981245311, 2.84088991726,
2.25056264066, 5.23410679429,
2.25131282821, 5.01424916475,
2.25206301575, 4.81484599199,
2.2528132033, 5.25819389546,
2.25356339085, 4.99236143949])

x, y = arr[::2], arr[1::2]
print(f"x={x}")
print(f"y={y}")

print(x[np.gradient(y) >= 1])

arctic wedgeBOT Sep 9, 2020, 3:33 AM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lean wharf Sep 9, 2020, 3:34 AM

#

It returns: x=[2.24756189 2.24831208 2.24906227 2.24981245 2.25056264 2.25131283 2.25206302 2.2528132 2.25356339] y=[2.7000959 2.85466124 2.85664726 2.84088992 5.23410679 5.01424916 4.81484599 5.2581939 4.99236144] [2.24981245 2.25056264]

hasty grail Sep 9, 2020, 3:36 AM

#

if you only want a strict difference between consecutive elements, instead of np.gradient you can just do np.ediff1d

lean wharf Sep 9, 2020, 3:49 AM

#

Can't seem to get it to how I'd want

#

I've currently mocked up some code to possibly solve it:
```python
n = 0
for row_index, row in dataIIR.iterrows():
    np1 = row['y']
    diff = np1 - n
    if diff > 2:
        print(row_index)
    n = row['y']
```

#

But it's not returning anything. I'm trying to get it so it will print the row index if the difference between y and y+1 is greater than 2
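
A vectorised sketch of the same check using `Series.diff()`, on a made-up frame built from the `y` values quoted earlier. Two hedged observations about the earlier snippet: `dataIIR = df` makes both names refer to the same DataFrame, so assigning the filtered column also overwrites `df['y']`; and a wide smoothing window may spread a step over many rows so that no single row-to-row difference reaches the threshold. The rolling mean below is only a stand-in for the Savitzky-Golay filter.

```python
import pandas as pd

# Made-up frame using the y values quoted earlier in the chat.
df = pd.DataFrame({"y": [2.70009589679, 2.85466124369, 2.85664726093,
                         2.84088991726, 5.23410679429, 5.01424916475,
                         4.81484599199, 5.25819389546, 4.99236143949]})

# Independent copy: changes to `smoothed` leave the raw data in `df` intact.
smoothed = df.copy()
smoothed["y"] = smoothed["y"].rolling(3, min_periods=1).mean()

jump = df["y"].diff()           # jump[i] = y[i] - y[i-1], NaN for the first row
jump_rows = df.index[jump > 2].tolist()
print(jump_rows)
```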

velvet thorn Sep 9, 2020, 3:55 AM

#

how about

#

np.argmax(np.abs(np.diff(a)))

#

that's what I would do

lean wharf Sep 9, 2020, 4:07 AM

#

Sorry for what part? @velvet thorn

hasty grail Sep 9, 2020, 4:07 AM

#

!e

import numpy as np

arr = np.array([2.24756189047, 2.70009589679,
2.24831207802, 2.85466124369,
2.24906226557, 2.85664726093,
2.24981245311, 2.84088991726,
2.25056264066, 5.23410679429,
2.25131282821, 5.01424916475,
2.25206301575, 4.81484599199,
2.2528132033, 5.25819389546,
2.25356339085, 4.99236143949])

x, y = arr[::2], arr[1::2]
print(f"x={x}")
print(f"y={y}")

print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1])

arctic wedgeBOT Sep 9, 2020, 4:07 AM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

hasty grail Sep 9, 2020, 4:07 AM

#

This works?

#

The concatenation is to ensure that the result of ediff1d is the same length as the original array

lean wharf Sep 9, 2020, 4:10 AM

#

Give me a sec to wrap my head around it haha

hasty grail Sep 9, 2020, 4:12 AM

#

alternatively

#

x[np.ediff1d(y, to_begin=0) >= 1]
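A small sketch of why `np.gradient` returned two x values while a consecutive difference returns one, using the y values pasted above. `np.gradient` uses central differences in the interior, so a single jump is spread over the two samples either side of it. `np.diff(..., prepend=y[0])` is used here as an equivalent of the concatenate trick; note that it pads the input, whereas `to_begin` in `np.ediff1d` is prepended to the differences themselves.

```python
import numpy as np

# y values copied from the snippet above.
y = np.array([2.70009589679, 2.85466124369, 2.85664726093,
              2.84088991726, 5.23410679429, 5.01424916475,
              4.81484599199, 5.25819389546, 4.99236143949])

# Central differences: (y[i+1] - y[i-1]) / 2 in the interior.
central = np.flatnonzero(np.gradient(y) >= 1)

# Consecutive differences, padded at the front so the result has len(y) entries.
forward = np.flatnonzero(np.diff(y, prepend=y[0]) >= 1)

print(central)  # [3 4]  the jump shows up on both sides
print(forward)  # [4]    only the first sample after the jump
```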

lean wharf Sep 9, 2020, 4:15 AM

#

I think you're correct, I just need to try it with the full data file

#

so I'll need to convert the dataframe to a similar array

#

but data.to_numpy() gives it in the following format:

[4000 rows x 2 columns]
[[ 0.00000000e+00 -5.72766726e-03]
 [ 7.50187547e-04 -5.37550170e-03]
 [ 1.50037509e-03 -5.03534022e-03]
 ...
 [ 2.99849962e+00  5.02267064e+00]
 [ 2.99924981e+00  5.02299900e+00]
 [ 3.00000000e+00  5.02332816e+00]]

hasty grail Sep 9, 2020, 4:21 AM

#

take the first column as x and the second as y?

#

x, y = np.split(arr, 2, axis=1)

lean wharf Sep 9, 2020, 4:24 AM

#

print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]) should still work as intended as far as I can see right?

#

Ah, wait, hmm

hasty grail Sep 9, 2020, 4:28 AM

#

I think it should

lean wharf Sep 9, 2020, 4:31 AM

#

running into an issue that my dataFrames aren't keeping separate

#

It's returning nothing atm

#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter
from scipy.signal import find_peaks


data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
data.plot(x ='x', y='y', linewidth=0.2)

# Savitzky-Golay filter implementation
dataIIR = data
savgol_filter(dataIIR['y'], 101, 2)
dataIIR.plot(x ='x', y='y', linewidth=0.2)

# Locating jumps
arr = dataIIR.to_numpy()
x, y = np.split(arr, 2, axis=1)
print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1])

#plt.show()

#

Wait, think it was a bug

#

Fantastic, I think it's found the 3 steps!!

#

[[0.75018755]
 [1.50037509]
 [2.25056264]]

#

Thanks for all your help. I may be back in a few minutes with some more questions as to how I actually smooth the step between them, but I should be able to do that by appending the array with some values after doing some exponential smoothing

hasty grail Sep 9, 2020, 4:46 AM

#

np

lapis sequoia Sep 9, 2020, 5:31 AM

#

I don’t understand any of this but good work

lapis sequoia Sep 9, 2020, 6:06 AM

#

any idea what Im doing wrong here btw Im using PostgreSQL

c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", ("James",))

gives me 
Traceback (most recent call last):
  File "D:/Projects/DSaML/Main.py", line 21, in <module>
    c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", "James",)
psycopg2.errors.UndefinedParameter: there is no parameter $1
LINE 1: SELECT salary FROM EMPLOYEE WHERE name=$1

#

just ping me when you answer pls
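Nobody picked this up in the channel, so a hedged note: psycopg2 uses the Python DB-API `%s` (or `%(name)s`) placeholder style rather than PostgreSQL's native `$1`, so the `$1` appears to reach the server untouched and the server reports that no such parameter was supplied. A minimal sketch, assuming the table and column names from the message. `RecordingCursor` is a hypothetical stand-in so the snippet runs without a database; in real code the cursor would come from a psycopg2 connection.

```python
class RecordingCursor:
    """Hypothetical stand-in for a psycopg2 cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


def fetch_salary_query(cursor, name):
    # psycopg2 expects %s placeholders and a sequence (here a 1-tuple) of values.
    # Note the trailing comma: ("James",) is a tuple, ("James") is just a string.
    cursor.execute("SELECT salary FROM employee WHERE name = %s", (name,))


cursor = RecordingCursor()
fetch_salary_query(cursor, "James")
print(cursor.calls)
```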

lean wharf Sep 9, 2020, 6:59 AM

#

Does anyone know how I'd go about smoothing data between two points? I've got the sections I want to smooth coloured below:

📎 unknown.png

#

and my data as a dataframe

hasty grail Sep 9, 2020, 6:59 AM

#

did your moving average trick not work?

lean wharf Sep 9, 2020, 7:00 AM

#

I couldn't get it implemented without error

hasty grail Sep 9, 2020, 7:00 AM

#

code?

lean wharf Sep 9, 2020, 7:01 AM

#

2 secs

#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
from scipy import optimize

data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
plt.figure()
plt.plot(data['x'], data['y'], linewidth=0.2)

# Locating jumps
arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
print(stepIndex)

step1x = x[range(stepIndex[0]-100, stepIndex[0]+100)]
step2x = x[range(stepIndex[1]-100, stepIndex[1]+100)]
step3x = x[range(stepIndex[2]-100, stepIndex[2]+100)]

step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)]
step2y = y[range(stepIndex[1]-100, stepIndex[1]+100)]
step3y = y[range(stepIndex[2]-100, stepIndex[2]+100)]


def moving_avg(x, n):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[n:] - cumsum[:-n]) / float(n)

step1yMA = moving_avg(step1y, 3)
step2yMA = moving_avg(step2y, 3)
step3yMA = moving_avg(step3y, 3)

plt.plot(step1x, step1yMA)
plt.plot(step2x, step2yMA)
plt.plot(step3x, step3yMA)

# Savitzky-Golay filter implementation
dataIIR = data
dataIIR['y'] = signal.savgol_filter(data['y'], 101, 2)
plt.figure()
plt.plot(dataIIR['x'], dataIIR['y'], linewidth=0.2)

plt.show()

#

@hasty grail

#

So I haven't modified the dataframe with the new data yet, just plotted it, but it should still work. Instead, it prints the error:
ValueError: x and y must have same first dimension, but have shapes (200, 1) and (198,)

hasty grail Sep 9, 2020, 7:12 AM

#

which line?

#

I think that the issue is that the moving average calculation isn't defined for the first n-1 values

#

hence the difference in shape

lean wharf Sep 9, 2020, 7:14 AM

#

I don't understand what you mean by that sorry

velvet thorn Sep 9, 2020, 7:16 AM

#

I don't understand what you mean by that sorry
@lean wharf say you have an average of 3 values

#

over this data: [10, 5, 10, 15, 20, 15]

#

if you always want to have 3 values in the calculation

#

you'll end up with [25/3, 10, 15, 50/3] (4 values)

#

1-3, 2-4, 3-5, 4-6

#

if you wanted 4 values in the moving average, you'd have 3 values in the result
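The same example in NumPy, as a rough sketch: a window of 3 over 6 values gives 6 - 3 + 1 = 4 averages, which is why 200 samples with n=3 came out as 198. `np.convolve` with `mode="same"` is one way to get an output as long as the input, at the cost of edge values that are averaged against implicit zeros.

```python
import numpy as np

data = np.array([10, 5, 10, 15, 20, 15], dtype=float)
window = np.ones(3) / 3  # equal weights for a plain 3-point average

# Only positions where the full window fits: len(data) - 3 + 1 values.
valid = np.convolve(data, window, mode="valid")

# Same length as the input; the first and last values are biased towards zero.
same = np.convolve(data, window, mode="same")

print(valid)                  # [25/3, 10, 15, 50/3]
print(len(valid), len(same))  # 4 6
```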

lean wharf Sep 9, 2020, 7:20 AM

#

So my value of n is wrong?

#

I'm still not quite grasping the issue

#

Trying out a different implementation:

# Locating jumps

arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
print(stepIndex)

step1x = x[range(stepIndex[0]-100, stepIndex[0]+100)]
step2x = x[range(stepIndex[1]-100, stepIndex[1]+100)]
step3x = x[range(stepIndex[2]-100, stepIndex[2]+100)]

step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)]
step2y = y[range(stepIndex[1]-100, stepIndex[1]+100)]
step3y = y[range(stepIndex[2]-100, stepIndex[2]+100)]

def movingaverage(interval, window_size):
    window= np.ones(int(window_size))/float(window_size)
    return np.convolve(interval, window, 'same')

y_av1 = movingaverage(step1y, 10)
y_av2 = movingaverage(step2y, 10)
y_av3 = movingaverage(step3y, 10)
plt.plot(step1x, y_av1)
plt.plot(step2x, y_av2)
plt.plot(step3x, y_av3)

#

But now getting the error:
ValueError: object too deep for desired array

hasty grail Sep 9, 2020, 7:31 AM

#

what gm said

lean wharf Sep 9, 2020, 7:31 AM

#

I feel like there's a fundamental flaw in my understanding here

hasty grail Sep 9, 2020, 7:31 AM

#

no matter what the value of n is, you will have to define the first n-1 values of your moving average

#

otherwise you will always be a couple of values short and the arrays won't align to each other

lean wharf Sep 9, 2020, 7:32 AM

#

Can I do that by simply increasing the size of the data points I draw from?

#

ie. step1y = y[range(stepIndex[0]-101, stepIndex[0]+100)]

hasty grail Sep 9, 2020, 7:34 AM

#

ok the problem here is that you might get out-of-bound indices when you add/subtract from stepIndex

#

If I were you I would create a padded version of y before doing that

lean wharf Sep 9, 2020, 7:39 AM

#

Apologies again, I don't know what you mean by that. I'm new to these concepts

hasty grail Sep 9, 2020, 7:45 AM

#

window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

avg_values = []
for i in range(len(y)):
    window = y_padded[i:i+window_len]
    exp_sum = exp_kernel[~np.isnan(window)].sum()
    exp_avg = np.nansum(window * exp_kernel) / exp_sum
    avg_values.append(exp_avg)

#

something like this maybe

#

Made some errors, I have edited the code

#

oops the range should be from 1 to N instead of 0 to N-1

lean wharf Sep 9, 2020, 7:57 AM

#

Do I no longer need the step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)] functions then, to define the range?

hasty grail Sep 9, 2020, 7:58 AM

#

replace all that with what I wrote

lean wharf Sep 9, 2020, 7:58 AM

#

I'm struggling to make sense of this, it's a bit above my pay grade haha

#

But I'll give it a shot

hasty grail Sep 9, 2020, 7:58 AM

#

exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

This builds the kernel for calculating the moving average. It's equal to one in the center then exponentially falls off towards the sides

#

y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

This creates a padded version of y where the padded values are NaNs so they can be filtered out later

#

in the for loop, it takes a window from y_padded such that y[i] is in the center of the window

#

then, it is multiplied with the kernel to get the numerator

#

the denominator is the sum of the values (aka weights) in the kernel

topaz sparrow Sep 9, 2020, 8:00 AM

#

how did you get that colour in the text?

hasty grail Sep 9, 2020, 8:01 AM

#

The average is taken such that NaNs are not counted

#

!code

arctic wedgeBOT Sep 9, 2020, 8:01 AM

#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
• These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')

lean wharf Sep 9, 2020, 8:01 AM

#

Alright, that makes a bit more sense

topaz sparrow Sep 9, 2020, 8:02 AM

#

print ('hello world!')

#

?

hasty grail Sep 9, 2020, 8:02 AM

#

with ```

#

before python

#

on the same line

topaz sparrow Sep 9, 2020, 8:03 AM

#

python print ('Hello world!')

#

oof

hasty grail Sep 9, 2020, 8:03 AM

#

you can just copy the text from that bot message

topaz sparrow Sep 9, 2020, 8:03 AM

#

yeah

#

print('Hello world!')

#

oh

#

thanks

#

sir

hasty grail Sep 9, 2020, 8:03 AM

#

np

topaz sparrow Sep 9, 2020, 8:04 AM

#

I'm a beginner at python

hasty grail Sep 9, 2020, 8:04 AM

#

you should open a separate help channel if you're looking for Python-specific help

#

#❓｜how-to-get-help

topaz sparrow Sep 9, 2020, 8:04 AM

#

yeah

#

i knew that

#

thanks sir

#

😄

#

one last question, what's the python bot's prefix?

hasty grail Sep 9, 2020, 8:06 AM

#

an exclamation mark

topaz sparrow Sep 9, 2020, 8:06 AM

#

oh

#

thanks

lean wharf Sep 9, 2020, 8:06 AM

#

Traceback (most recent call last):
  File "e:/Projects/HiringTest/submission/assignment1.py", line 31, in <module>
    exp_sum = exp_kernel[~np.isnan(window)].sum()
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

hasty grail Sep 9, 2020, 8:07 AM

#

hmm can you print the shape of each variable?

lean wharf Sep 9, 2020, 8:11 AM

#

Is that possible without being able to run the code?

lapis sequoia Sep 9, 2020, 8:12 AM

#

any idea what Im doing wrong here btw Im using PostgreSQL

c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", ("James",))

gives me 
Traceback (most recent call last):
  File "D:/Projects/DSaML/Main.py", line 21, in <module>
    c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", "James",)
psycopg2.errors.UndefinedParameter: there is no parameter $1
LINE 1: SELECT salary FROM EMPLOYEE WHERE name=$1

lean wharf Sep 9, 2020, 8:12 AM

#

This is the total code thus far for continuity sake:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
from scipy import optimize

data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
plt.figure()
plt.plot(data['x'], data['y'], linewidth=0.2)

# Locating jumps
arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
# stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
# print(stepIndex)

window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

avg_values = []
for i in range(len(y)):
    window = y_padded[i:i+window_len]
    exp_sum = exp_kernel[~np.isnan(window)].sum()
    exp_avg = np.nansum(window * exp_kernel) / exp_sum
    avg_values.append(exp_avg)


# Savitzky-Golay filter implementation
dataIIR = data
dataIIR['y'] = signal.savgol_filter(data['y'], 101, 2)
plt.figure()
plt.plot(dataIIR['x'], dataIIR['y'], linewidth=0.2)

plt.show()

hasty grail Sep 9, 2020, 8:13 AM

#

you need to run the whole thing probably

lean wharf Sep 9, 2020, 8:13 AM

#

Yeah I'm doing so

#

Possible unbalanced tuple unpacking with sequence defined at line 785 of numpy.lib.shape_base: left side has 2 label(s), right side has 0 value(s)

hasty grail Sep 9, 2020, 8:14 AM

#

not sure which line of your code that is on

lean wharf Sep 9, 2020, 8:15 AM

#

Oh, didn't copy over

#

It's line 15, so x, y = np.split(arr, 2, axis=1)

hasty grail Sep 9, 2020, 8:18 AM

#

maybe that's just an error of the interpreter

#

as long as arr indeed has 2 columns it should be ok

lean wharf Sep 9, 2020, 8:19 AM

#

That's what I'm thinking also - I've ignored that specific message up until now - but it's not wanting to run

#

But it keeps throwing

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
at me

hasty grail Sep 9, 2020, 8:22 AM

#

yeah print their shapes

lean wharf Sep 9, 2020, 8:30 AM

#

It doesn't return anything. It won't run due to the index error

hasty grail Sep 9, 2020, 8:30 AM

#

comment it out then

lean wharf Sep 9, 2020, 8:32 AM

#

exp_kernel return (201,)

lavish finch Sep 9, 2020, 8:35 AM

#

hii, sorry random question! was curious if there's a way to write code to create columns in google sheets? i want to insert values from a different sheet into another by recognizing the same location names- is there a way to do that? i know how to do it in a Jupyter notebook, but don't know how to apply it to google sheets

hasty grail Sep 9, 2020, 8:38 AM

#

how about window?

lean wharf Sep 9, 2020, 8:40 AM

#

Ah, window is (201, 201)

hasty grail Sep 9, 2020, 8:41 AM

#

wait what

#

print y_padded.shape and y.shape

lean wharf Sep 9, 2020, 8:42 AM

#

(4200, 201)
(4000, 1)

#

hmm, y should be a single column

#

unless I'm mistaken

hasty grail Sep 9, 2020, 8:45 AM

#

oh I think I know why, the dimensions are not squeezed after split

#

maybe you should do the more straightforward x, y = arr[:, 0], arr[:, 1] instead
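A quick sketch of why the shapes went wrong, using a small made-up array rather than the real file. `np.split(arr, 2, axis=1)` keeps the second axis, so each piece has shape `(n, 1)`, and `np.pad` with a single `(before, after)` pair then pads both axes. That matches the `(4200, 201)` reported above for 4000 rows and 100 samples of padding on each side.

```python
import numpy as np

arr = np.arange(8, dtype=float).reshape(4, 2)  # made-up 4 rows x 2 columns

# np.split keeps the column axis: each piece is (4, 1), not (4,).
x_col, y_col = np.split(arr, 2, axis=1)
padded_2d = np.pad(y_col, (2, 2), constant_values=np.nan)

# Plain column indexing drops the axis, so padding only lengthens the vector.
x, y = arr[:, 0], arr[:, 1]
padded_1d = np.pad(y, (2, 2), constant_values=np.nan)

print(y_col.shape, padded_2d.shape)  # (4, 1) (8, 5)
print(y.shape, padded_1d.shape)      # (4,) (8,)
```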

lean wharf Sep 9, 2020, 8:47 AM

#

Awesome, they're all single column now

#

So, this should smooth the step theoretically now?

hasty grail Sep 9, 2020, 8:48 AM

#

try

lean wharf Sep 9, 2020, 8:49 AM

#

I'm plotting x against avg_values just to clarify?

hasty grail Sep 9, 2020, 8:54 AM

#

yeah

lean wharf Sep 9, 2020, 8:56 AM

#

So it's reduced the noise (the moving average filter is Fig2):

📎 unknown.png

hasty grail Sep 9, 2020, 8:57 AM

#

the end result is Fig3 right?

lean wharf Sep 9, 2020, 8:57 AM

#

Thanks for soldiering through it

#

Fig3 is the Savitzky-Golay filter

hasty grail Sep 9, 2020, 8:57 AM

#

ah

lean wharf Sep 9, 2020, 8:57 AM

#

The MA has maintained the step gradient better

hasty grail Sep 9, 2020, 8:57 AM

#

if you want it to be smoother you can try adjusting alpha

#

you can try to selectively smooth the graph only close to the points where the jump is significant

lean wharf Sep 9, 2020, 9:00 AM

#

that's what I was trying to do earlier with the +100/-100

#

and then somehow make it sine wave like to join the steps

#

I'm gonna call it a day though I think. Thanks again for all your help, it's been super appreciated! @hasty grail

hasty grail Sep 9, 2020, 9:07 AM

#

np

meager delta Sep 9, 2020, 11:02 AM

#

Does somebody know of a numpy function that takes, for example, 3 vectors and represents the first as a linear combination of the other two? (I need this for representing a face as a linear combination of its eigenfaces)

merry ridge Sep 9, 2020, 11:08 AM

#

numpy.linalg.solve?

#

Maybe that won't work depending on the rank of the matrix you construct from the two vectors

#

Looks like numpy.linalg.lstsq would be able to handle it for a general mx2 matrix to me.

mild topaz Sep 9, 2020, 12:37 PM

#

hii, i have image recognition model which recognizes passport and driving_licence images

#

as we know some countries have statewise driving_licence

#

how can i make my code handle whether a driving_licence is statewise or countrywise?
my image recognition model basically recognizes documents like "passports" and "driving_licence"
some countries issue driving_licence "statewise" and some countries issue it "countrywise"

#

as u know some countries has their driving_licence statewise

#

for e.g. "usa" , "australia", "india"
some country has "countrywise"
for e.g. "albania", "united_kingdom" etc

#

how i can make condition if country has state and user does not provide state name

#

so it should return "provide valid state name"
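One possible shape for that check, as a sketch only. The set of statewise countries is taken from the examples in the messages above and would need to be completed; the function name and the `"ok"` return value are hypothetical.

```python
# Countries named above as issuing licences per state (incomplete, for illustration).
STATEWISE_COUNTRIES = {"usa", "australia", "india"}


def check_licence_request(country, state=None):
    """Return an error message if a statewise country is given without a state."""
    country = country.strip().lower()
    has_state = bool(state and state.strip())
    if country in STATEWISE_COUNTRIES and not has_state:
        return "provide valid state name"
    return "ok"  # hypothetical success value


print(check_licence_request("usa"))           # provide valid state name
print(check_licence_request("usa", "texas"))  # ok
print(check_licence_request("albania"))       # ok
```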

#

my inputs this way

📎 unknown.png

feral spoke Sep 9, 2020, 12:47 PM

#

Guys I need some help with pandas

#

Suppose there are two columns with values in it.

#

I want to find the entry that has the highest difference in values.

#

How should I go around it?

#

nvm guys

keen pine Sep 9, 2020, 12:54 PM

#

df.loc[(df['A']-df['B']).idxmax()] , this should be useful.

feral spoke Sep 9, 2020, 12:54 PM

#

got the solution

#

thnx for the help tho @keen pine

meager delta Sep 9, 2020, 1:04 PM

#

Looks like numpy.linalg.lstsq would be able to handle it for a general mx2 matrix to me.
@merry ridge I need to find the linear combination of the vectors, that form the first one, so I can take the coefficients, and put them into a weight vector (If you are familiar with eigenfaces, this is used to reconstruct the main face from the eigen ones + the mean one)

merry ridge Sep 9, 2020, 1:04 PM

#

I don't see why that won't do what you're asking

keen pine Sep 9, 2020, 1:04 PM

#

@feral spoke dmi.

merry ridge Sep 9, 2020, 1:06 PM

#

Given a v_3 in span{v_1, v_2} linalg.lstsq finds the minimizer of the norm of Ax-b where A = [v1, v2]. and b = v3

#

the minimizer x is precisely the coefficients that satisfy x[0]v_1 + x[1]v_2 = v_3

#

Obviously if the norm is greater than some epsilon tolerance level, then v_3 is not in the span and there is no solution. I am assuming you already know that v_3 is in the set otherwise you will have to measure the norm at each step and do some error handling if you try to do this.

meager delta Sep 9, 2020, 1:15 PM

#

I will try it with two basis orthogonal vectors, and one other, which must be in the span, just to test for now

merry ridge Sep 9, 2020, 1:16 PM

#

That sounds like a good approach

meager delta Sep 9, 2020, 1:17 PM

#

It works

📎 unknown.png

merry ridge Sep 9, 2020, 1:17 PM

#

Great!

meager delta Sep 9, 2020, 1:24 PM

#

But what about this one? It seems that the function is returning me wrong coefficients... I mean those 2 vectors are linearly independent, so it has to return me the right solutions

📎 unknown.png

merry ridge Sep 9, 2020, 1:25 PM

#

I haven't used this function before. Let me load up a Jupyter notebook and see

#

1, 8 is a perplexing answer

#

Ok, it looks like when you type [[2,1] , [0,1]] that is inputting your vectors as the rows, not the columns

meager delta Sep 9, 2020, 1:39 PM

#

I think that I have to pass it like [[2, 0], [1, 1]]

#

And this way it would work

#

Like a T-Matrix

merry ridge Sep 9, 2020, 1:39 PM

#

So if you do 1*[2,0] + 8*[1,1] you get [10, 8] as required

#

Yeah, you basically need to transpose it
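A minimal sketch of the column layout, assuming the vectors from the screenshot were [2, 1] and [0, 1] with target [10, 8] (inferred from the numbers quoted above, since the image itself isn't available). `np.column_stack` puts each vector in its own column, which is what `np.linalg.lstsq` expects for `A @ x = b`.

```python
import numpy as np

v1 = np.array([2.0, 1.0])
v2 = np.array([0.0, 1.0])
target = np.array([10.0, 8.0])

# Each basis vector becomes a COLUMN of A, so A @ coeffs = c1*v1 + c2*v2.
A = np.column_stack([v1, v2])
coeffs, _, rank, _ = np.linalg.lstsq(A, target, rcond=None)

# Check the fit explicitly; a large error would mean target is not in the span.
error = np.linalg.norm(A @ coeffs - target)

print(coeffs)  # approximately [5. 3.]
print(error)   # approximately 0
```

Passing `np.array([v1, v2])` instead would use the vectors as rows, which is the layout that produced the 1, 8 answer.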

keen pine Sep 9, 2020, 1:40 PM

#

hi, i have a problem with my model in PyTorch Lightning. first, i wonder what the possible reason is for validation loss increasing while train loss decreases

📎 unknown.png

#

this has taken me 3 days. in the sample above, the validation set and train set are the same.

merry ridge Sep 9, 2020, 1:42 PM

#

From a pure linear algebra perspective, passing in [2,0] [1,1] feels very unnatural, but it makes sense

meager delta Sep 9, 2020, 1:44 PM

#

This way works completely fine, I can pass my vectors as array, and numpy will do the transposition, instead of manually changing

📎 unknown.png

merry ridge Sep 9, 2020, 1:45 PM

#

Yeah. I'm just spoiled by matlab notation so whenever I start manipulating vectors in python I turn stupid

meager delta Sep 9, 2020, 1:49 PM

#

Thank you for the help!

merry ridge Sep 9, 2020, 1:50 PM

#

Yep, good luck!

wintry oyster Sep 9, 2020, 4:14 PM

#

does sorting algorithms fit here?

tidal bough Sep 9, 2020, 4:14 PM

#

#algos-and-data-structs would be best.

keen saddle Sep 9, 2020, 4:24 PM

#

hey guys, if anybody has experience with pandas I have a question over in #help-corn
I'd be grateful for any help 😄

hoary agate Sep 9, 2020, 4:44 PM

#

Yo

hoary agate Sep 9, 2020, 5:06 PM

#

I started working last week, and i'm joining the I.T team as a minor

#

My boss really wants me to learn Python so i've taken a course to learn the basics of the language

#

I already worked with C# before so im mildly familiar with programming

#

But my boss has been demanding that i do some data scraping for him lately

#

And i don't really know how to do that

#

So i just wanted to ask if anyone is willing to help me learn a little bit about it

#

If no one is that is fine, it's not your job to take requests from randoms and i know it's demanding

#

I just really don't know how to start doing this

limber ledge Sep 9, 2020, 5:23 PM

#

Here's a guide to use Python for data scraping webpages: https://www.edureka.co/blog/web-scraping-with-python/

Edureka

Web Scraping With Python - A Beginner's Guide | Edureka

In this web scraping with Python tutorial, you will learn about web scraping and how data can be extracted, manipulated and stored in a file using Python.

hoary agate Sep 9, 2020, 5:25 PM

#

Thank you very much

#

I have no idea how i can repay you

boreal summit Sep 9, 2020, 5:26 PM

#

Hello everyone, is it possible to use ipython without installing Jupyter notebooks? I use vs code for data science.

#

I can send you some YouTube links to learn web scraping, and I have a PDF specifically for data scraping.

hoary agate Sep 9, 2020, 5:27 PM

#

Damn that's really kind of you

#

I'd really appreciate if you did

boreal summit Sep 9, 2020, 5:28 PM

#

📎 20200909_182801.jpg

#

I've the PDF on my PC.

#

📎 Screenshot_20200909-182906.jpg

hoary agate Sep 9, 2020, 5:30 PM

#

Where did you originally get it?

boreal summit Sep 9, 2020, 5:30 PM

#

Check out <<data school>> on YouTube, they can give you a good foundation for learning web scraping, after which you can move on to data scraping and web crawlers.

#

I downloaded it from YouTube to my phone.

#

Although, I've watched all of them

#

There's also this book I studied earlier this year, it gives a solid summary and foundation for web scraping.

#

Python projects for beginners by Connor Milliken.

#

That's the book title.

#

Do you use WhatsApp?

hoary agate Sep 9, 2020, 5:36 PM

#

I do, but i don't have a personal number

#

The company gave me the phone since i couldnt buy one, but i can only use it for work

boreal summit Sep 9, 2020, 5:38 PM

#

DM me, I'll send you the YouTube links and PDFs. I'll tell you where to read so you can go straight to web scraping and stuff.

flat quest Sep 9, 2020, 6:44 PM

#

@boreal summit u can just run ipython in the terminal and it'll work

Also vscode has support for jupyter notebooks, if u want to use that

boreal summit Sep 9, 2020, 6:45 PM

#

Yea, I use Jupyter notebooks in vs code and run code in the terminal. When I used Jupyter notebooks, I could just search ipython in the windows search bar and use it as a stand alone application. @flat quest

#

So that's why I was wondering if I could use ipython as a stand alone application without installing Jupyter notebooks.

#

I currently use notebooks in vs code for data science projects.

rugged shale Sep 9, 2020, 7:19 PM

#

Hello guys, I'm working on a project for my portfolio. I'm trying to predict how long it takes for a dog to be adopted from an animal shelter. After a couple of rounds of preprocessing and feature engineering, I thought it was time to do some encoding. But when it comes to the dog's breed, there are about 150 unique breeds. So, for this project, I'm thinking about encoding the 5 most frequent breeds as variables like 'isGermanSheperd' or 'isPoodle' and grouping the other, less frequent breeds in an 'otherRaces' variable. What do you guys think about this strategy? And are there other strategies you would suggest in this case?
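For what it's worth, the "top N plus other" idea can be sketched in a few lines of pandas. The function name, the `"Other"` label and the toy data are assumptions for illustration; nothing here says whether 5 is the right cutoff for the shelter data.

```python
import pandas as pd


def encode_top_categories(series, top_n=5, other_label="Other"):
    """One-hot encode the top_n most frequent values; lump the rest together."""
    top = series.value_counts().nlargest(top_n).index
    # Keep values that are in the top list, replace everything else.
    collapsed = series.where(series.isin(top), other_label)
    return pd.get_dummies(collapsed, prefix="is")


# Toy data: two frequent breeds and two rare ones.
breeds = pd.Series(["Poodle"] * 3 + ["Beagle"] * 2 + ["Husky", "Pug"])
encoded = encode_top_categories(breeds, top_n=2)
print(encoded.columns.tolist())
```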

dusk aspen Sep 9, 2020, 8:25 PM

#

hi, i need some help. i am trying to compare three images and find out which one is the closest to one of the images. like if i had two pictures, one that is red and the other one blue, and i had a third one that is close to fully blue, the program would decide which picture is the closest to the third picture. someone told me to make a siamese network but i dont know how

ivory panther Sep 9, 2020, 9:39 PM

#

I have a question about ANNs, not Python, and I would like to have your opinion. I have a multivariate time series that I used to train a multilayer perceptron ANN. When I use just one layer with 100 neurons I get a result like this, and when I increase the number of layers I get much worse results. Is this because I have a very small training set (around 50)?

📎 unknown.png

velvet thorn Sep 9, 2020, 11:07 PM

#

@gray sedge

#

use np.random.randint(1, 9, size=(3, 5))
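The original question isn't visible here, so just a small sketch of what that call produces: a 3 by 5 array of integers from 1 to 8, because the upper bound is exclusive. The `default_rng` variant is the newer NumPy interface and is shown only as an alternative; the seed is arbitrary.

```python
import numpy as np

# Legacy interface: low is inclusive, high is exclusive, so values are 1..8.
legacy = np.random.randint(1, 9, size=(3, 5))

# Newer Generator interface with the same bounds convention.
rng = np.random.default_rng(seed=0)
modern = rng.integers(1, 9, size=(3, 5))

print(legacy.shape, legacy.min() >= 1, legacy.max() <= 8)
print(modern.shape, modern.min() >= 1, modern.max() <= 8)
```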

gray sedge Sep 9, 2020, 11:08 PM

#

@velvet thorn I love you

velvet thorn Sep 9, 2020, 11:08 PM

#

thank you

gray sedge Sep 9, 2020, 11:09 PM

#

you just solved 40 minutes of frustration in 30 seconds you are a god

velvet thorn Sep 9, 2020, 11:09 PM

#

yeah, it can get confusing

#

part of solving these kinds of problems is knowing what to Google

#

I would probably have tried "generate random integers numpy"

#

yeah, that gives np.random.randint as the first result 🙂

gray sedge Sep 9, 2020, 11:12 PM

#

I have that part correct further up the page, but there was just too many things for me to comprehend

#

That was step # 2 of part E, have done like 90 others before getting to this one

velvet thorn Sep 9, 2020, 11:15 PM

#

it can be overwhelming, too

gray sedge Sep 9, 2020, 11:15 PM

#

@velvet thorn First time combining that with size in the same line, I overthought it. Thank you so much I may need advice again before this part is over, but I'll probably be able to google my way through it

velvet thorn Sep 9, 2020, 11:16 PM

#

sure

#

just ask here, I guess

#

don't tag me though please

gray sedge Sep 9, 2020, 11:17 PM

#

sorry about that

velvet thorn Sep 9, 2020, 11:18 PM

#

just now was fine

#

I mean, in the future

#

sorry for the misunderstanding

#

like in general don't tag anyone who hasn't replied to your new question, I guess

gray sedge Sep 9, 2020, 11:39 PM

#

That makes sense, I've done it when someones question goes unanswered for quite a few messages

#

I'm stuck again but I'm gonna pick it back up tomorrow I'm overwhelming myself

lean wharf Sep 10, 2020, 12:45 AM

#

Does anyone have any experience implementing a smoothstep function?

#

I'm looking to smooth the transition between the highlighted areas

📎 unknown.png

#

I've got some code that provides a value for the centre of the jumps as an array [ 999 1999 2999]

verbal haven Sep 10, 2020, 2:34 AM

#

what about applying a low pass filter on the slices?

#

ive never smoothed a signal with steps like that tbh

hasty grail Sep 10, 2020, 2:47 AM

#

Was yesterday's result not good enough?

flat quest Sep 10, 2020, 6:22 AM

#

i mean the general method for dealing with those kinds of time series is ARIMA @lean wharf

If you want the neural network to figure it out thats a whole nother matter

#

i can't quite remember, but I'm pretty sure i could use ipython from terminal without installing jupyter notebooks @boreal summit

It was a while back tho so i might be wrong

hasty grail Sep 10, 2020, 6:29 AM

#

https://discordapp.com/channels/267624335836053506/366673247892275221/753159254004990003

#

Slightly rewritten to be more efficient

window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))
exp_kernel = (1 - exp_alpha) ** np.abs(np.arange(-pad_left, pad_right + 1))

def get_avg_values():
    for i in range(len(y)):
        window = y_padded[i:i+window_len]
        mask = ~np.isnan(window)
        yield np.average(window[mask], weights=exp_kernel[mask])

avg_values = np.fromiter(get_avg_values(), float, count=len(y))
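If the Python loop is still too slow, the same weighted average can probably be written without a loop. This is a sketch under some assumptions: `window_len` is odd, `y` is 1-D, contains no NaNs and is at least `window_len` long. Convolving a vector of ones with the kernel gives the sum of the weights that actually overlap the data, which plays the role of the NaN mask. The function name is hypothetical.

```python
import numpy as np


def exp_weighted_smooth(y, window_len=201, exp_alpha=0.5):
    """Centered, exponentially weighted moving average with edge renormalisation."""
    y = np.asarray(y, dtype=float)
    if window_len % 2 == 0 or len(y) < window_len:
        raise ValueError("this sketch assumes an odd window no longer than y")
    half = window_len // 2
    # Weight 1 at the centre, falling off by (1 - alpha) per sample on each side.
    kernel = (1 - exp_alpha) ** np.abs(np.arange(-half, half + 1))
    numerator = np.convolve(y, kernel, mode="same")
    # Sum of the kernel weights that overlap real samples at each position.
    denominator = np.convolve(np.ones_like(y), kernel, mode="same")
    return numerator / denominator


demo = exp_weighted_smooth(np.linspace(0.0, 1.0, 50), window_len=11)
print(demo.shape)  # (50,)
```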

velvet thorn Sep 10, 2020, 7:26 AM

#

i can't quite remember, but I'm pretty sure i could use ipython from terminal without installing jupyter notebooks @boreal summit

It was a while back tho so i might be wrong
@flat quest yes, Jupyter relies on IPython

flat quest Sep 10, 2020, 7:45 AM

#

yes it does
but the question is if its also standalone.

Ah looks like it is. There's an ipython package u can just download

velvet thorn Sep 10, 2020, 8:10 AM

#

yes it does
but the question is if its also standalone.

Ah looks like it is. There's an ipython package u can just download
@flat quest yeah, that was what I meant to say

#

but I got distracted

#

🥴

lean wharf Sep 10, 2020, 10:11 AM

#

@hasty grail it did the job of smoothing the whole signal, but not joining the steps

#

I'm nearly getting there. I know I need something like a sigmoid function to connect them

#

What I've got thus far

```python
#_______________ Sigmoid function to smooth steps _______________
xs = np.array(x)
ys = np.array(avg_values)

diff = ys[1:] - ys[:-1]
indexBool = diff > 0.385 # Variable adjusted to fit number of steps
index = np.argwhere(indexBool).reshape(-1)

step1x = xs[(index[0]-100):(index[0]+100)]
step1y = ys[(index[0]-100):(index[0]+100)]
step2x = xs[(index[1]-100):(index[1]+100)]
step2y = ys[(index[1]-100):(index[1]+100)]
step3x = xs[(index[2]-100):(index[2]+100)]
step3y = ys[(index[2]-100):(index[2]+100)]

def sigmoid(x, mi, mx):
    return mi + (mx-mi)*(lambda t: (1+200**(-t+0.5))**(-1) )( (x-mi)/(mx-mi) )

# Alternative to sigmoid junction
def smoothclamp(x, mi, mx):
    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )


plt.figure()
plt.plot(xs, sigmoid(x, y[index[0]-100], y[index[0]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, sigmoid(x, y[index[1]-100], y[index[1]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, sigmoid(x, y[index[2]-100], y[index[2]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, ys)
plt.plot(step1x, step1y)
plt.plot(step2x, step2y)
plt.plot(step3x, step3y)

plt.show()
```

#

Which gives a plot like this:

📎 unknown.png

#

So I basically need those purple plots scaled down on the x axis

#

I'm just having some difficulties now where:

plt.plot(step1x, sigmoid(step1x, y[index[0]-100], y[index[0]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')

Only takes a "slice" of the data, and isn't scaled down to it if you get my meaning:

📎 unknown.png

#

I think what I actually want is

```python
plt.plot(xs, sigmoid(step1x, ys[index[0]-100], ys[index[0]+100]),'b-', lw=1, alpha=0.5, label='sigmoid')
```

but then we get a similar problem to yesterday where:

```
ValueError: x and y must have same first dimension, but have shapes (4000,) and (200,)
```

merry ridge Sep 10, 2020, 10:26 AM

#

It's not clear what you are actually trying to achieve. You just want to replace the colored regions by a smooth continuous function without a jump discontinuity?

lean wharf Sep 10, 2020, 10:26 AM

#

Yes basically

#

So replace these 200 values or so for y in each region with a smooth function

merry ridge Sep 10, 2020, 10:28 AM

#

It's not clear to me what the difficulty is. You can just choose basically any interpolating function you want and fix it to join those points

#

To me the laziest solution would be to just cut out say the yellow part and interpolate the start and the end of the yellow section by a straight line

#

and if that is not sufficiently smooth at the connecting points, pass the signal through a one dimension heat equation or some other mollifier to smoothen it out

#

in a neighborhood of the connecting points, not the whole signal

#

You could also try connecting it with spline or something, but it really depends on what properties you require in that section

lean wharf Sep 10, 2020, 10:39 AM

#

At this stage the laziest solution is best

#

I still want the 200 values in that region though, so I can slice them back into the original array

merry ridge Sep 10, 2020, 10:41 AM

#

so just write down a parametric equation of a line that intercepts those two points and evaluate it at 200 points?
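A minimal sketch of that straight-line bridge, assuming the signal is a 1-D NumPy array and the window around the jump is already known. The names `bridge_linear`, `start`, and `stop` are made up for illustration and do not come from the code above.

```python
import numpy as np

def bridge_linear(ys, start, stop):
    """Return a copy of ys with ys[start:stop] replaced by a straight line.

    The line runs from ys[start] to ys[stop - 1], so every sample outside
    the window is left untouched and the array keeps its length.
    """
    out = np.asarray(ys, dtype=float).copy()
    out[start:stop] = np.linspace(out[start], out[stop - 1], stop - start)
    return out

# Toy step signal: 0 for the first half, 1 for the second half.
ys = np.zeros(1000)
ys[500:] = 1.0

# Replace the 200 samples around the jump (hypothetical window).
smoothed = bridge_linear(ys, 400, 600)
```

Because the result has the same length as the input, it can be plotted against the original x values without the shape mismatch error.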

shadow quiver Sep 10, 2020, 10:55 AM

#

Hey guys. I'm training a model that in every epoch, the train data completely differs. But the model overfits. How is that possible?

#

Like this. I would be glad if you mention me for any suggestion

📎 unknown.png

lean wharf Sep 10, 2020, 11:04 AM

#

I was attempting to get the properties of a sigmoid function @merry ridge

#

https://en.wikipedia.org/wiki/Sigmoid_function

Sigmoid function

A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve. A common example of a sigmoid function is the logistic function shown in the first figure and defined by the formula: S(x) = 1/(1 + e^(-x)) ...

#

ie.

📎 unknown.png

merry ridge Sep 10, 2020, 11:08 AM

#

If you have that plot already then what is the problem?

lean wharf Sep 10, 2020, 11:08 AM

#

The black line was drawn to demonstrate sorry, should have specified

tidal bough Sep 10, 2020, 11:09 AM

#

like Hexicle said, just write down the equation and evaluate it at the points

lean wharf Sep 10, 2020, 11:10 AM

#

That's the bit I'm struggling with I think. I'm an EE engineer, I'm still a python novice relatively speaking

tidal bough Sep 10, 2020, 11:10 AM

#

to be more specific, write a function that takes an array of X and Y values, calculates the equation of a sigmoid function that passes through the first and the last of the points, and then evaluate it at each of the X points, and return the result

#

let's see...

merry ridge Sep 10, 2020, 11:10 AM

#

This is just high school math, it's function compositions to shift and scale

#

like take S(x), center it by function composition with S(x-0.7) or whatever the mid point of the deleted data is

#

etc

tidal bough Sep 10, 2020, 11:11 AM

#

well, there is a problem of the sigmoid function, strictly speaking, not passing through 0 and 1 ever 😛

lean wharf Sep 10, 2020, 11:11 AM

#

I understand the maths, it's being able to modify the values of the array

merry ridge Sep 10, 2020, 11:12 AM

#

Right, but you don't need it to pass through 0 or 1

#

Just restrict the domain and truncate it at some y values

#

then scaling the sigmoid will match it up for you

#

What I would do is to identify which indices your yellow parts contain then use a lambda function to plot a sigmoid over a linspace of 200 points in a list and then shove those values into where the yellow part was
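A rough sketch of that recipe, assuming the window indices are already known: evaluate a logistic curve on a restricted domain, rescale it so it hits the two end values exactly, and assign it back into the slice. `bridge_sigmoid`, `start`, `stop`, and `steepness` are hypothetical names chosen for this example.

```python
import numpy as np

def bridge_sigmoid(ys, start, stop, steepness=6.0):
    """Return a copy of ys with ys[start:stop] replaced by a sigmoid ramp."""
    out = np.asarray(ys, dtype=float).copy()
    y0, y1 = out[start], out[stop - 1]

    # Logistic curve on the restricted domain [-steepness, steepness].
    t = np.linspace(-steepness, steepness, stop - start)
    s = 1.0 / (1.0 + np.exp(-t))

    # The logistic never reaches 0 or 1, so rescale the truncated curve
    # to run from exactly 0 (first sample) to exactly 1 (last sample).
    s = (s - s[0]) / (s[-1] - s[0])

    # Scale to the signal's end values and put it back into the window.
    out[start:stop] = y0 + (y1 - y0) * s
    return out

# Toy step signal with a jump at index 500.
ys = np.zeros(1000)
ys[500:] = 1.0
smoothed = bridge_sigmoid(ys, 400, 600)
```

A larger `steepness` gives a sharper transition inside the same window; the end points stay pinned either way.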

lean wharf Sep 10, 2020, 11:16 AM

#

Hmm, I'll give that a shot

sly magnet Sep 10, 2020, 11:16 AM

#

I want to add L_0.5 loss in my model while training
i have written :: loss = tf.reduce_sum(tf.pow(tf.abs(self.Coef),0.5))
But its giving NaN Error!
Whereas its working perfectly with L1 loss and L2 loss
loss = tf.reduce_sum(tf.square(self.Coef))
loss = tf.reduce_sum(tf.abs(self.Coef))
the above 2 lines are working perfectly, but i want to use L_0.5 loss....How to do that?

tidal bough Sep 10, 2020, 11:24 AM

#

a-ha

lean wharf Sep 10, 2020, 11:24 AM

#

It's a wonderful sight, thanks for your help @tidal bough @merry ridge

📎 unknown.png

tidal bough Sep 10, 2020, 11:24 AM

#

@lean wharf check this out

📎 unknown.png

#

oh, lol, you made it too 😅

merry ridge Sep 10, 2020, 11:24 AM

#

Good work

lean wharf Sep 10, 2020, 11:25 AM

#

Yeah, had to use smoothclamp instead of sigmoid though

tidal bough Sep 10, 2020, 11:25 AM

#

here's mine:

import numpy as np
from scipy.special import expit # for single values, manual implementation is faster, but expit is better for arrays
def sig_approx(X,Y,x_scale=10):
    X = X.copy()
    middle = X[len(X)//2]
    X -= middle
    x_coeff = (2*x_scale)/(X[-1]-X[0])
    X = X*x_coeff
    return Y[0]+(Y[-1]-Y[0])*expit(X)

# plotting stuff:
%matplotlib widget
import matplotlib.pyplot as plt
#test case:
end = 10
X = np.linspace(0,end,100)
midpoint = X.shape[0]//2
Y = np.zeros(X.shape)
Y[midpoint:] = 1

#usage:
inds = slice(midpoint-10,midpoint+10)
Y[inds] = sig_approx(X[inds],Y[inds])
plt.plot(X,Y)

lean wharf Sep 10, 2020, 11:25 AM

#

Sigmoid gave me something a little like this

📎 unknown.png

merry ridge Sep 10, 2020, 11:26 AM

#

How are you scaling it?

lean wharf Sep 10, 2020, 11:26 AM

#

    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )

chrome laurel Sep 10, 2020, 11:26 AM

#

so i am watching a kind of outdated course on pandas, does anyone know what happened to ix[] and is there an equivalent?

merry ridge Sep 10, 2020, 11:26 AM

#

I mean how are you scaling the sigmoid

tidal bough Sep 10, 2020, 11:26 AM

#

How are you scaling it?
Such that the first X gets changed to -10, and the last one to 10.
and by the Y axis - by Y[-1]-Y[0]

merry ridge Sep 10, 2020, 11:26 AM

#

Sorry I mean Aromasin's plot

lean wharf Sep 10, 2020, 11:27 AM

#

diff = ys[1:] - ys[:-1]
indexBool = diff > 0.385 # Variable adjusted to fit number of steps
index = np.argwhere(indexBool).reshape(-1)

def smoothclamp(x, mi, mx): 
    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )

plt.plot(xl, sigmoid(yl, ys[index[0]-100], ys[index[0]+100]))

#

So the top set of code returns the point where the step happens

merry ridge Sep 10, 2020, 11:29 AM

#

What is the definition of your sigmoid function

lean wharf Sep 10, 2020, 11:29 AM

#

I didn't use sigmoid in the end, my sigmoid def was: def sigmoid(x, mi, mx): return mi + (mx-mi)*(lambda t: (1+200**(-t+0.5))**(-1) )( (x-mi)/(mx-mi) )

merry ridge Sep 10, 2020, 11:42 AM

#

I would have defined it differently, to be honest, I don't know why that notation even works

#

Define the signal as f(t) and the sigmoid as S(t) = exp(t)/(1+exp(t)).

#

Find the interval [a,b] that contains the yellow part

tidal bough Sep 10, 2020, 11:46 AM

#

1/(1+exp(-t)) is one less exponent 😉

merry ridge Sep 10, 2020, 11:46 AM

#

Then replace S(t) by S(t - (a+b)/2), which centers it on the midpoint of [a,b]. Call this function g(t). Then find a constant K such that Kg(b) = f(b) so that K = f(b)/g(b). Then use g(t)*f(b)/g(b)

lean wharf Sep 10, 2020, 11:47 AM

#

Yeah my implementation is scrappy as hell

merry ridge Sep 10, 2020, 11:48 AM

#

I don't even understand that implementation. You have a (mx-mi) on the left, and a "/mx-mi" on the right. Those would cancel? I don't understand the notation here

tidal bough Sep 10, 2020, 11:49 AM

#

@merry ridge The last parenthesis group is the argument passed to the lambda

merry ridge Sep 10, 2020, 11:49 AM

#

Oh, thanks

tidal bough Sep 10, 2020, 11:50 AM

#

so it's (mx-mi) * f( (x-mi) / (mx-mi) ), which is about right

lean wharf Sep 10, 2020, 11:51 AM

#

I could probably rewrite it like:

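import numpy as np
from scipy.special import comb  # binomial coefficient

def smoothstep(x, x_min=0, x_max=1, N=1):
    # Rescale x to [0, 1] and clamp anything outside the interval
    x = np.clip((x - x_min) / (x_max - x_min), 0, 1)

    # Sum the terms of the order-N smoothstep polynomial
    result = 0
    for n in range(0, N + 1):
        result += comb(N + n, n) * comb(2 * N + 1, N - n) * (-x) ** n

    result *= x ** (N + 1)

    return result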

tidal bough Sep 10, 2020, 11:51 AM

#

@lean wharf I highly suggest you generally split code into more lines - it's more readable for us, and believe me, you too are going to regret this in a week when you try to read the code and can't 🙂

merry ridge Sep 10, 2020, 11:51 AM

#

But if you are evaluating at (x-mi)/(mx-mi) that isn't what you want

lean wharf Sep 10, 2020, 11:51 AM

#

Yeah, I do generally, just code vomiting atm till it works

#

Where N is how smooth I want the curve

#

Probably a tad more legible

tidal bough Sep 10, 2020, 11:54 AM

#

that code looks like it'd be inefficient, if it works

#

I don't quite get what's happening, but you generally want to vectorize things when possible

lean wharf Sep 10, 2020, 12:03 PM

#

Yeah, I've read that vectorizing is more efficient in python but again I'm still relatively new to it
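To make "vectorize" concrete, here is a small sketch comparing an element-by-element Python loop with a single NumPy expression for the same cubic used in `smoothclamp`. The function names `smoothclamp_loop` and `smoothclamp_vectorized` are invented for this comparison.

```python
import numpy as np

def smoothclamp_loop(x, lo, hi):
    """Element-by-element version: one Python-level iteration per sample."""
    out = []
    for value in x:
        t = (value - lo) / (hi - lo)
        t = min(max(t, 0.0), 1.0)  # clamp t to [0, 1]
        out.append(lo + (hi - lo) * (3 * t**2 - 2 * t**3))
    return np.array(out)

def smoothclamp_vectorized(x, lo, hi):
    """Same maths, but each operation is applied to the whole array at once."""
    t = np.clip((np.asarray(x, dtype=float) - lo) / (hi - lo), 0.0, 1.0)
    return lo + (hi - lo) * (3 * t**2 - 2 * t**3)

x = np.linspace(-1.0, 2.0, 1001)
loop_result = smoothclamp_loop(x, 0.0, 1.0)
vec_result = smoothclamp_vectorized(x, 0.0, 1.0)
```

Both give the same values; the vectorized form pushes the per-element work into NumPy's compiled loops, which is usually where the speed-up comes from on large arrays.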

vague jetty Sep 10, 2020, 12:36 PM

#

I'm totally drawing a blank - what's the name for the method of determining the statistical significance of multiple variables on an output? It's not ANOVA, it's <<something>> <<something>> analysis

#

Nvm, it's principal component analysis

safe sparrow Sep 10, 2020, 1:07 PM

#

im trying to learn the exponent using tf.math.pow in Tensorflow Keras

#

my layer is created doing

#

class Dense_Power(Layer):
    def __init__(self, **kwargs):

        super(Dense_Power, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight('kernel',
                                      shape=(input_shape[1],),
                                      initializer=tf.keras.initializers.glorot_uniform(),
                                      trainable=True)

        # Create a trainable weight variable for this layer.
        self.power = self.add_weight('power',
                                      shape=(input_shape[1],),
                                      initializer=tf.keras.initializers.glorot_uniform(),
                                      trainable=True)

        super(Dense_Power, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        power_val = tf.math.pow(x, self.power)
        dot_prod = tf.linalg.matmul(power_val, self.kernel)
        return dot_prod

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1])

#

however i only get nan's in my training doing this

#

as it somehow invalidates all weights in my model

gritty jackal Sep 10, 2020, 2:21 PM

#

While reading the Hands-on Machine Learning with Scikit Learn, Keras and Tensorflow book, I came across this equation for the batch gradient descent partial derivative. In batch gradient descent, we try to minimize a prediction error by finding appropriate weights for our features. In order to find the weights, batch gradient descent calculates the partial derivative of the cost function as given in the equation below, where it uses random weights initialized at the start of the training, input values, etc. (I'm confused about what all the values are). In this equation I am not able to understand all the variables: theta is for the randomly initialized weights for sure, but what are xi and xij outside the braces? I believe yi should be the actual value of the dependent variable, but I need confirmation because it could also be the predicted value.

📎 unknown.png

#

Tried looking on the internet, but could not find the explanation of this same equation anywhere.

tidal bough Sep 10, 2020, 2:43 PM

#

@gritty jackal Fairly sure it's the actual value, because Theta^T @ X is how you get a prediction.

#

So (Theta^T @ x^i - y^i) is just the prediction error of this point.

gritty jackal Sep 10, 2020, 2:57 PM

#

@tidal bough alright that makes sense, but what about x outside the braces? which value is that?

tidal bough Sep 10, 2020, 2:58 PM

#

x^i_j is the jth component of the ith input point

gritty jackal Sep 10, 2020, 2:58 PM

#

and x^i is input values matrix right?

tidal bough Sep 10, 2020, 2:58 PM

#

it might be easier to see if you see how this equation is derived

#

the cost function is:

1/m * Sum(i from 1 to m)[(Theta^T @ X^i - y^i)^2]

#

Does that make sense so far?

gritty jackal Sep 10, 2020, 2:59 PM

#

yes

tidal bough Sep 10, 2020, 3:00 PM

#

now, we take the derivative with regards to Theta_j

#

that's a bit tricky, since it's a single component of the Theta row vector

#

but we can notice that if we were to expand the product:

Theta^T @ X^i

(for any i), we would notice that there's a single term involving Theta_j:

Theta_j * X^i_j

#

so it's only it that contributes to the derivative.

#

The derivative of the outer sum is just the sum of derivatives of the things that are summed over.

#

Each thing is:

(Theta^T @ X^i - y^i)^2

,the derivative of which is:

2 * (Theta^T @ X^i - y^i) * d/d(Theta_j) (Theta^T @ X^i - y^i)

#

Does that make sense? It's the derivative of a composite function rule, if I remember the name right:

d/dx (f^2 (x)) == 2*f(x)*d/dx (f(x))

merry ridge Sep 10, 2020, 3:04 PM

#

That is the chain rule.

gritty jackal Sep 10, 2020, 3:06 PM

#

yup , understood. @tidal bough Thank you so much for your time and efforts. Appreciated 👍

tidal bough Sep 10, 2020, 3:08 PM

#

And then, as I said before, only one of the components of the vector product contributes to the derivative, so that derivative on the right is just X^i_j

#

And so we obtain the right formula
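As a sanity check on the derivation, the resulting formula can be compared against a finite-difference gradient. This is only a sketch on random data; the sizes `m` and `n` and the helper `mse` are assumptions made for the example, and the cost is the 1/m-normalised sum of squares written above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                      # m points, n features (arbitrary sizes)
X = rng.normal(size=(m, n))       # row i is the input point x^i
y = rng.normal(size=m)            # y^i are the actual target values
theta = rng.normal(size=n)

def mse(theta):
    """Cost: 1/m * sum_i (theta^T x^i - y^i)^2."""
    return np.mean((X @ theta - y) ** 2)

# Derived formula: d(cost)/d(theta_j) = 2/m * sum_i (theta^T x^i - y^i) * x^i_j
analytic = (2.0 / m) * X.T @ (X @ theta - y)

# Central finite differences, one component of theta at a time.
eps = 1e-6
numeric = np.array([
    (mse(theta + eps * e) - mse(theta - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
```

If the derivation is right, `analytic` and `numeric` agree to within rounding error.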

gritty jackal Sep 10, 2020, 3:10 PM

#

Hmm, yes I got it now.

merry ridge Sep 10, 2020, 3:13 PM

#

If you want to read many pages on this equation, Cosma Shalizi has a very good and free book on advanced data analysis

#

http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/

royal tundra Sep 10, 2020, 3:14 PM

#

I have a 5gb csv with 11 columns and over 30M rows. I have to connect it to a MySQL db which I will then connect to AWS (I believe RDS) and then access in Tableau.
I am unfamiliar with MySQL and AWS. I know this is a python channel but can someone please help me set this up? I can pay you

gritty jackal Sep 10, 2020, 3:15 PM

#

@merry ridge Thanks 👍

#

I would surely take a look at that book

merry ridge Sep 10, 2020, 3:16 PM

#

He has another 400 pages on this topic (as if the first 5 chapters of that other book isn't enough) that is easier to read here: http://www.stat.cmu.edu/~cshalizi/TALR/

gritty jackal Sep 10, 2020, 3:17 PM

#

That's great

solar bluff Sep 10, 2020, 3:18 PM

#

Anyone ever created a pandas ExtensionArray and ExtensionDtype

brittle agate Sep 10, 2020, 3:33 PM

#

📎 lol.png

turbid halo Sep 10, 2020, 3:50 PM

#

damn this stuff looks hard lmfao

flat quest Sep 10, 2020, 4:16 PM

#

so true lol

Worst is when it keeps jumping up and down like it does with gans

upbeat cradle Sep 10, 2020, 4:30 PM

#

Hey, what’s the best way to get a list or dataframe of the counts of a certain field? I’ve created a DataFrame using value_counts but that value seems to store as the field Key which existed before and the value I want the counts of has no field name now

#

```python
values[:25]
```

Just shows something like:
          Key
AB     3
BC     2

#

Key is the field of AB, BC, etc

novel remnant Sep 10, 2020, 4:33 PM

#

np.unique with return_counts=True will return two arrays of the unique values and the count of each unique value. I don't understand exactly what you want but it might help.

upbeat cradle Sep 10, 2020, 4:37 PM

#

I'm just trying to be able to access the value_counts info as a string

#

I want the value_counts and the AB/BC

#

I can use values.iloc[0][0] to get the value, but not the key (AB/BC)

novel remnant Sep 10, 2020, 4:58 PM

#

AB and BC are the index of your new dataframe you can access them with values.index[0] for example.
If you want, AB and BC can become a new column of your dataframe by using values.reset_index(inplace=True, drop=False)
then you can have both the value name and value counts as columns of your dataframe which can be accessed with .loc.
Still not sure if that's what you're after
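A small sketch of both options, using a made-up `Key` column. The data and the column name `count` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data standing in for the real DataFrame.
df = pd.DataFrame({"Key": ["AB", "BC", "AB", "AB", "BC", "CD"]})

# value_counts returns a Series: the index holds the keys,
# the values hold how often each key occurs (most frequent first).
counts = df["Key"].value_counts()

top_key = counts.index[0]          # the key itself, e.g. "AB"
top_count = int(counts.iloc[0])    # how often it occurs

# Turn the index into an ordinary column so key and count sit side by side.
table = counts.rename_axis("Key").reset_index(name="count")
print(table)
```

Naming the index and the value column explicitly keeps the column names the same regardless of how the installed pandas version labels the result of `value_counts`.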

mental vortex Sep 10, 2020, 5:58 PM

#

@lofty meteor Sorry for late reply

#

You still there?

#

Actually, let's move to another channel

lofty meteor Sep 10, 2020, 5:58 PM

#

Yep

lapis sequoia Sep 10, 2020, 6:02 PM

#

Hello, I have a general question. I can show the code if someone wants details. I had a MATLAB program that I converted to Python because it was slow in MATLAB. The run time in MATLAB is 0.048 seconds per run and the run time in Python is 0.008 seconds. Now I am looping the program to verify the speed difference, but for some reason looping in Python is slower than looping in MATLAB.

Just to clarify, the loop variable isn't used in the body of the code at all.
In MATLAB I have
for i=1:3000
xx
end
where i is not part of the code,
and in Python I have
for i in range(3000):
xx
Basically I recorded how the run time grows: in MATLAB it doesn't grow linearly, while in Python it does. Is there a feature in MATLAB that skips some calculations when looping, which I could bring over to Python, since Python starts out faster?

#

in python the time to run the 100th loop is 100 * run time

#

but in matlab it's just 10x

tidal bough Sep 10, 2020, 6:04 PM

#

probably something specific to what you're doing.

lapis sequoia Sep 10, 2020, 6:05 PM

#

📎 untitled.png

#

but i converted the code from matlab to python code per code

#

sec

#

https://paste.pythondiscord.com/ayupanoxab.pl
https://paste.pythondiscord.com/lafuhudahu.py

#

https://paste.pythondiscord.com/sunizefeja.shell MATLAB

#

https://paste.pythondiscord.com/welukuxoku.shell MATLAB

#

well only thing is I just added the code in matlab to calculate run time

tidal bough Sep 10, 2020, 6:07 PM

#

oh dear

#

I can suggest using a debugger to see what's different on the 100th iteration from the 1st one, for instance

#

whether the execution time per iteration is constant or not shouldn't depend on the language

lapis sequoia Sep 10, 2020, 6:08 PM

#

ok that sounds great, how do i do that

tidal bough Sep 10, 2020, 6:08 PM

#

so something's probably not right

lapis sequoia Sep 10, 2020, 6:08 PM

#

well in python it is constant

#

but matlab it gets faster

#

for some reason

#

and i want to incorporate that

#

again my python isn't slowing down but time grows linearly

#

matlab grows below linearly

#

as shown in time of run vs number of iteration graph i created

tidal bough Sep 10, 2020, 6:09 PM

#

...how did you determine that it grows below-linearly?

#

that looks pretty linear to me

lapis sequoia Sep 10, 2020, 6:10 PM

#

if it was linear time to run 2 iteration = 2 * time to run 1 iteration

#

also it's clear at the end it's flattening

tidal bough Sep 10, 2020, 6:11 PM

#

not if it has a constant term

lapis sequoia Sep 10, 2020, 6:11 PM

#

yep you are right

#

but the main idea persists that python does first iteration for 0.008 seconds but the 3500th is 3500* 0.008= 28 seconds

#

while matlab starts with .05 for 1 iteration but does 3500 in 8 seconds

tidal bough Sep 10, 2020, 6:13 PM

#

📎 unknown.png

#

this looks pretty linear for me, again

#

you can do a linear regression and calculate the R^2 if you want, but it sure looks like a linear function + some noise.
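A sketch of that check with `np.polyfit`. The timings here are synthetic (a straight line plus noise), since the real measurements are only in the screenshots; swap in the recorded iteration counts and run times.

```python
import numpy as np

# Hypothetical measurements: total run time after a given number of iterations.
iterations = np.array([1, 500, 1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
rng = np.random.default_rng(1)
seconds = 0.002 * iterations + 0.05 + rng.normal(scale=0.01, size=iterations.size)

# Least-squares straight line: seconds ~ slope * iterations + intercept.
slope, intercept = np.polyfit(iterations, seconds, 1)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
predicted = slope * iterations + intercept
ss_res = np.sum((seconds - predicted) ** 2)
ss_tot = np.sum((seconds - seconds.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.5f} s/iter, intercept={intercept:.4f} s, R^2={r_squared:.5f}")
```

An R^2 very close to 1 means a straight line with an intercept explains the timings, and the intercept is the constant term being discussed.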

lapis sequoia Sep 10, 2020, 6:14 PM

#

true but that's beside the point

#

there are some calculations matlab doesn't have to redo

#

that python does

#

represented in the constant term

#

as you said

#

my point is that matlab is also slow for me in large iterations

#

and i switched to python because it's faster and it's faster in first 10 iterations

tidal bough Sep 10, 2020, 6:15 PM

#

but the main idea persists that python does first iteration for 0.008 seconds but the 3500th is 3500* 0.008= 28 seconds
that's not true if there's a constant term.
The right rule is:

(y3-y2)/(x3-x2) == (y2-y1)/(x2-x1)

which should be true for any three points for a linear function.
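A tiny numeric illustration of why the ratio test misleads while the slope test does not. The model `per_iteration * n + constant` and all the numbers are assumptions picked for the example, not the measured values.

```python
import math

def run_time(n_iterations, per_iteration, constant):
    """Total time modelled as per_iteration * n + constant."""
    return per_iteration * n_iterations + constant

# Without a constant term, time(3500) / time(1) is simply 3500.
ratio_no_overhead = run_time(3500, 0.008, 0.0) / run_time(1, 0.008, 0.0)

# With a constant term the same ratio is far smaller, yet growth is still linear.
ratio_with_overhead = run_time(3500, 0.002, 0.05) / run_time(1, 0.002, 0.05)

def slope(p, q):
    """Slope of the line through two (iterations, seconds) points."""
    return (q[1] - p[1]) / (q[0] - p[0])

# The slope between any two pairs of points is the same for a linear function.
points = [(n, run_time(n, 0.002, 0.05)) for n in (1, 100, 3500)]
slopes_match = math.isclose(slope(points[0], points[1]), slope(points[1], points[2]))

print(ratio_no_overhead, ratio_with_overhead, slopes_match)
```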

#

Anyway, so is the total running time slower in Python or not? I don't quite get what you're concerned about.

lapis sequoia Sep 10, 2020, 6:16 PM

#

it is slower

#

in python

#

anyone know how to turn this list into a dictionary?

["Max "Brown Eye" Scherzer, P (2008-)", "", "Season Pitching", "gamesPlayed: 9", "gamesStarted: 9", "groundOuts: 33", "airOuts: 46", "runs: 19", "doubles: 8", "triples: 1", "homeRuns: 6", "strikeOuts: 69", "baseOnBalls: 17", "intentionalWalks: 0", "hits: 50", "hitByPitch: 1", "avg: .255", "atBats: 196", "obp: .315", "slg: .398", "ops: .713", "caughtStealing: 0", "stolenBases: 5", "stolenBasePercentage: 1.000", "groundIntoDoublePlay: 1", "numberOfPitches: 866", "era: 3.40", "inningsPitched: 50.1", "wins: 4", "losses: 2", "saves: 0", "saveOpportunities: 0", "holds: 0", "earnedRuns: 19", "whip: 1.33", "battersFaced: 216", "gamesPitched: 9", "completeGames: 1", "shutouts: 0", "strikes: 560", "strikePercentage: 64.7", "hitBatsmen: 1", "balks: 0", "wildPitches: 3", "pickoffs: 0", "groundOutsToAirouts: 0.72", "winPercentage: .667", "pitchesPerInning: 17.2", "gamesFinished: 0", "strikeoutWalkRatio: 4.06", "strikeoutsPer9Inn: 12.34", "walksPer9Inn: 3.04", "hitsPer9Inn: 8.94", "runsScoredPer9: 5.01", "homeRunsPer9: 1.07", "inheritedRunners: 0", "inheritedRunnersScored: 0", "sacBunts: 0", "sacFlies: 2", "", ""]

#

there is no constant term

#

📎 unknown.png

#

this is the run time in python

#

for 3500

#

for 1 seconds iteration i will show u

#

📎 unknown.png

#

u can see range(1) compared to range (3500)

#

the ratio is 3500

#

for matlab most of the time is in the constant term, so the growth is less

#

do you understand my concern now?

#

while for matlab ratio is 8 seconds / .05 seconds = 160

#

growth

tidal bough Sep 10, 2020, 6:20 PM

#

for matlab, I'm getting:
0.3966 / 0.2016 * 101 ~= 198.69

#

so it also is exactly proportional to the number of iterations

#

so there's nothing weird going on here - they both scale with the number of iterations directly. The Python implementation is just slower in general.

lapis sequoia Sep 10, 2020, 6:21 PM

#

how is the python implementation slower when first iteration for python is 0.008 seconds

#

while first one with matlab is

#

like ~0.05

tidal bough Sep 10, 2020, 6:23 PM

#

sounds like either one first iteration isn't timed right, or there's some bug that causes all the later iterations to be slower

#

I'd check the former first.

lapis sequoia Sep 10, 2020, 6:23 PM

#

i just have
start=datetime.now()
before the start of the loop

#

and

#

at the end

#

print( datetime.now()-start)

#

of the loop

#

i mean after the loop ends

#

as you said because there is a constant in matlab
ratio of last iteration to first is (slope*100+constant)/(slope*1+constant), if the constant is high then ratio is small, i am not saying the slope isn't constant

#

i am saying the constant is a large component of the run time

#

while for python it's just directly proportional to the ratios

#

yeah i fitted regression and the time is
time=0.0017*iteration+.007

#

while for python it's simply time =0.008*iteration

#

so matlab grows at a lower rate but yes still linearly

#

so is my solution to try to rewrite the python code independent of the matlab code? i think by force imposing the format of the matlab code i probably am not utilizing all the features of python?

#

or maybe it has to do with spyder itself? cause some people told me spyder is not fast

crisp jewel Sep 10, 2020, 6:49 PM

#

what does sequence[:,:-1] return

desert oar Sep 10, 2020, 6:58 PM

#

@lapis sequoia use time.perf_counter for timing

#

Spyder shouldn't affect the speed of running python code

#

Python is doing a lot of work in the background, can't speak for matlab but it's probably doing less work as the runtime is more specialized

#

Eg the "first iteration" in Python also involves implicitly calling iter() which takes some overhead

#

Not to mention whatever memory allocation and garbage collection is being triggered intermittently
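A minimal timing sketch with `time.perf_counter`, recording each iteration separately so the first one can be compared with the rest. `work` is a stand-in for the real computation and the iteration count is arbitrary.

```python
import time

def work():
    """Placeholder for the body of the loop being measured."""
    return sum(i * i for i in range(1000))

n_iterations = 200
per_iteration = []

start_total = time.perf_counter()
for _ in range(n_iterations):
    start = time.perf_counter()
    work()
    per_iteration.append(time.perf_counter() - start)
total = time.perf_counter() - start_total

print(f"total:           {total:.4f} s")
print(f"first iteration: {per_iteration[0]:.6f} s")
print(f"mean iteration:  {sum(per_iteration) / n_iterations:.6f} s")
```

Comparing the first entry with the mean shows directly whether there is a one-off start-up cost or whether every iteration costs about the same.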

lapis sequoia Sep 10, 2020, 7:01 PM

#

i think one important factor is in matlab i can clear all variables except the ones i need and maybe that makes it faster?

#

while with python all variables stay in memory through the run

desert oar Sep 10, 2020, 7:01 PM

#

Matlab probably has a lot of optimizations that python does not have

#

That isn't likely to make a big difference but it might help trigger garbage collection at more regular intervals

lapis sequoia Sep 10, 2020, 7:01 PM

#

well the thing is everyone who used python for their calculations told me it's faster than matlab

#

but who knows i guess i have to know how to write a fast python code

#

rather than convert a matlab code to python

#

i mean i wrote the code in matlab and tried to convert it line per line and probably that's not the best thing

desert oar Sep 10, 2020, 7:02 PM

#

That I don't know, but in general porting code from one language and runtime to another is not a guarantee that you'll get a fair comparison

#

You posted both versions above?

lapis sequoia Sep 10, 2020, 7:03 PM

#

yes

#

well it says matlab is shell but it's just matlab, but because in matlab comment is %

#

but i meant the discord server xd

desert oar Sep 10, 2020, 7:04 PM

#

is there a "short version"

#

this is a lot of code

#

and what exactly are you concerned about

#

the matlab code seems faster per iteration than the python code?

#

    e=np.array([2e5]);   # Young's MODULUS OF ELASTICITY
    g=np.array([1e5]);   # MODULUS OF RIGIDTY 
    den=np.array([7850]); # MASS DENSITY

why are you creating all these length-1 arrays?

lapis sequoia Sep 10, 2020, 7:05 PM

#

because the form

#

has to accept multiple size

#

i mean i am coding it for running over various values

#

of e,g

#

etc

desert oar Sep 10, 2020, 7:06 PM

#

and what is it with matlab programmers and horrible variable names 😛

#

would it kill you to write density instead of den?

#

i dont understand the point of this though

#

what is igtyp supposed to be

lapis sequoia Sep 10, 2020, 7:07 PM

#

well this is finite element modelling, so igtyp is like whether u have a tower made of same material

#

or different material

#

if i have 3 types of material i would have [1 2 3]

#

and each value would correspond to a type

desert oar Sep 10, 2020, 7:08 PM

#

oh sorry i meant imtyp

#

rho = den[(imtyp-1).astype(int)]

lapis sequoia Sep 10, 2020, 7:08 PM

#

imtype is material property

#

igtype is geometric properties

desert oar Sep 10, 2020, 7:08 PM

#

it looks like you're just trying to broadcast a number into some shape, right?

#

since igtyp and imtyp are all 1?

lapis sequoia Sep 10, 2020, 7:09 PM

#

yeah but they need not be

#

basically i am drawing a beam

#

that beam could be uniform or it could have like increasing cross section

#

or like different types of metals

#

and each node represents a segment

desert oar Sep 10, 2020, 7:10 PM

#

right, but practically you're using imtyp to "expand" this density into a matrix of some size

#

is that right?

lapis sequoia Sep 10, 2020, 7:10 PM

#

yes

#

em=np.array(e[(imtyp-1).astype(int)])
gm=np.array(g[(imtyp-1).astype(int)])

#

em will be a 40x1 vector

#

let's say i had 4 types of element, e would have 4 elements, and i would have em(0)... em(9) equal to e(0)

#

then em(10) to em(19) equal to e(1) etc

#

my em would have each element as a sub of the possible building blocks defined in e

#

same for gm

desert oar Sep 10, 2020, 7:13 PM

#

ok. im not sure about matlab but in numpy you can just do this

DENSITY = 7850.0

...

rho = np.full(imtyp.shape, DENSITY)

#

or better yet

rho = np.full_like(imtyp, DENSITY)

lapis sequoia Sep 10, 2020, 7:14 PM

#

instead of den[(imtyp-1).astype(int)]

#

?

desert oar Sep 10, 2020, 7:14 PM

#

yes

#

should be somewhat more efficient

#

and also easier to read / less confusing

lapis sequoia Sep 10, 2020, 7:14 PM

#

i needed to use astype int

#

cuz it kept telling me things like it's floating or tuple or something

#

i don't remember

desert oar Sep 10, 2020, 7:15 PM

#

that's fine,

```python
rho = np.full_like(imtyp, DENSITY, dtype=int)
```

lapis sequoia Sep 10, 2020, 7:15 PM

#

i keep having to use as type int when i try to index

desert oar Sep 10, 2020, 7:15 PM

#

oh

#

i see

#

well you arent using any indexing here

#

all this is saying is, "make a new array in the same shape as imtyp and fill it with DENSITY"

#

the actual contents of imtyp are irrelevant

#

making an array of 1s just to "expand" numbers to arrays isn't necessary at all ever in numpy

lapis sequoia Sep 10, 2020, 7:16 PM

#

no the thing is let's say if imtype was [ 1 1 2 3 1 ] then it should have [ den(1) den(1) den(2) den(3) den(1) ]

#

would that code actuate that?

desert oar Sep 10, 2020, 7:16 PM

#

oh, no

#

but den is only 1 element

#

so that would be an error anyway...

lapis sequoia Sep 10, 2020, 7:17 PM

#

den can have as many elements as there are unique elements

#

of imtype

#

den has 1 element because in the simple example i am doing

desert oar Sep 10, 2020, 7:17 PM

#

ah, ok

#

in that case never mind

lapis sequoia Sep 10, 2020, 7:18 PM

#

basically this is like me calling each element an ID and associated a set of values to that ID

desert oar Sep 10, 2020, 7:18 PM

#

right

#

this is fine then

slender nymph Sep 10, 2020, 7:18 PM

#

hi folks

#

Simulate a portfolio of home insurance policies (5,000 homes insured).
The value of damages is distributed according to a Uniform law between $ 250,000 and $ 2.25 million.
An “accident” can occur with probability p. If this is the case, there is a probability q that the damage is the maximum possible (total loss). With probability 1-q, the loss is partial according to a Uniform distribution on (0,1).
You don't know what the liability loss could be, but it can be up to 10 times the value of the property.

#

what module i need for this?

desert oar Sep 10, 2020, 7:19 PM

#

numpy and maybe scipy @slender nymph
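A rough NumPy sketch of one reading of that problem. Several things here are assumptions: the Uniform range is taken as the insured home values, the partial loss is a Uniform(0, 1) fraction of the home value, `p` and `q` are arbitrary example values, and the liability part is left out because the statement does not specify its distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

n_policies = 5000
p = 0.01   # assumed probability of an accident per policy
q = 0.10   # assumed probability that an accident is a total loss

# Insured value of each home, Uniform between $250,000 and $2.25 million.
home_value = rng.uniform(250_000, 2_250_000, size=n_policies)

# Which policies have an accident, and which of those are total losses.
has_accident = rng.random(n_policies) < p
is_total_loss = rng.random(n_policies) < q

# Fraction of the home value lost: 1 for a total loss, Uniform(0, 1) otherwise.
partial_fraction = rng.uniform(0.0, 1.0, size=n_policies)
loss_fraction = np.where(is_total_loss, 1.0, partial_fraction)

# Property loss per policy: zero unless an accident occurred.
loss = np.where(has_accident, home_value * loss_fraction, 0.0)

print(f"accidents:  {has_accident.sum()}")
print(f"total loss: ${loss.sum():,.0f}")
```

Running this many times with different seeds gives a distribution of portfolio losses, which is usually the point of such a simulation.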

lapis sequoia Sep 10, 2020, 7:19 PM

#

the thing is
em=np.array(e[(imtyp-1).astype(int)])
gm=np.array(g[(imtyp-1).astype(int)])
rho=den[(imtyp-1).astype(int)];
sxi=mi[(igtyp-1).astype(int)];
a=aa[(igtyp-1).astype(int)];
sk=shp[(igtyp-1).astype(int)];
dx=xp[(n[1,0:]-1).astype(int)]-xp[(n[0,0:]-1).astype(int)]
dy=yp[(n[1,0:]-1).astype(int)]-yp[(n[0,0:]-1).astype(int)]
me having to use astype int all the time

desert oar Sep 10, 2020, 7:19 PM

#

@lapis sequoia ok, other than that i don't see anything too strange in your code. although

    f[tuple([fdof,0])]=f1

is weird

lapis sequoia Sep 10, 2020, 7:19 PM

#

is it slowing me down and how can i change it?

desert oar Sep 10, 2020, 7:19 PM

#

you can save it as another variable

lapis sequoia Sep 10, 2020, 7:19 PM

#

no i mean how can i make it normally accept indexing

slender nymph Sep 10, 2020, 7:20 PM

#

how can i simulate 5k home insurance policies?

lapis sequoia Sep 10, 2020, 7:20 PM

#

without having to write astype(int) all the time

desert oar Sep 10, 2020, 7:20 PM

#

imtyp_indexer = (imtyp - 1).astype(int)

em = e[imtyp_indexer]
gm = g[imtyp_indexer]
rho = den[imtyp_indexer]
...
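
A small self-contained sketch of the "ID to value" lookup being discussed, assuming 1-based IDs stored as floats (as they often are after porting from MATLAB). The density values and ID list are made up for illustration.

```python
import numpy as np

# Hypothetical data: three material types with 1-based IDs, as in MATLAB.
den = np.array([7850.0, 2700.0, 1000.0])      # one density per material ID
imtyp = np.array([1.0, 1.0, 2.0, 3.0, 1.0])   # float IDs, e.g. loaded from a file

# Convert to 0-based integer indices once, then reuse them everywhere.
imtyp_indexer = imtyp.astype(int) - 1

# Integer-array indexing picks den[id - 1] for every entry of imtyp.
rho = den[imtyp_indexer]
print(rho)
```

If the ID arrays are created or loaded with an integer dtype in the first place, the `astype(int)` call is not needed at all.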

lapis sequoia Sep 10, 2020, 7:21 PM

#

btw that tuple is because it refused to accept f[fdof,0]=f1

desert oar Sep 10, 2020, 7:21 PM

#

what is f

lapis sequoia Sep 10, 2020, 7:21 PM

#

it's a vector

#

array of float 64

desert oar Sep 10, 2020, 7:21 PM

#

oh i see, f=np.zeros((ntdof,1))

#

btw you can remove the trailing semicolons

#

python doesn't need them

#

it looks like you have them in some places but not others

lapis sequoia Sep 10, 2020, 7:21 PM

#

i know but i copy pasted from matlab

#

so

#

i mean i copy pasted the matlab code and tried removing some

#

i guess i can just do replace all

desert oar Sep 10, 2020, 7:22 PM

#

wait

#

oh

#

why is fdof an array?

#

and why is f1 an array?

lapis sequoia Sep 10, 2020, 7:22 PM

#

because there can be multiple forces

#

f is the acting force

desert oar Sep 10, 2020, 7:22 PM

#

f1 = np.array(100) this is a 0-dimensional array, not a 1-element vector, don't do this

lapis sequoia Sep 10, 2020, 7:22 PM

#

it can act on multiple component of the beam

desert oar Sep 10, 2020, 7:23 PM

#

i see

#

what about f1?

lapis sequoia Sep 10, 2020, 7:23 PM

#

f1 is the value of f

#

fdof is which part of the beam does it act on

#

basically it's like

desert oar Sep 10, 2020, 7:23 PM

#

but is that supposed to be an array too?

lapis sequoia Sep 10, 2020, 7:23 PM

#

i have to specify the location and value of f

#

yes same size as f1

desert oar Sep 10, 2020, 7:23 PM

#

ok

#

it doesn't work because you wrote np.array(100) instead of np.array([100])

lapis sequoia Sep 10, 2020, 7:24 PM

#

ah well i didnt test it for vector case yet

#

but good for letting me know

desert oar Sep 10, 2020, 7:25 PM

#

import numpy as np

nn=41
ntdof=nn*3

f1 = np.array([100])
fdof = np.array([122-1])
f = np.zeros((ntdof, 1))
f[fdof, 0] = f1

print(f)

lapis sequoia Sep 10, 2020, 7:26 PM

#

so i won't need tuple

desert oar Sep 10, 2020, 7:26 PM

#

np.array(100) is a weird array thing that has an empty shape, ()
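
A quick check of what the two spellings produce, as a sketch; the variable names `scalar_like` and `vector` are only for illustration.

```python
import numpy as np

scalar_like = np.array(100)    # 0-dimensional: shape (), ndim 0, size 1
vector = np.array([100])       # 1-dimensional: shape (1,), ndim 1, size 1

print(scalar_like.shape, scalar_like.ndim, scalar_like.size)
print(vector.shape, vector.ndim, vector.size)

# With 1-D arrays for both the positions and the values, plain
# comma-separated indexing assigns each value to its row in column 0.
f = np.zeros((123, 1))
fdof = np.array([122 - 1])
f[fdof, 0] = vector
```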

lapis sequoia Sep 10, 2020, 7:26 PM

#

if i do that?

tidal bough Sep 10, 2020, 7:26 PM

#

this code could use some variable names 😅

desert oar Sep 10, 2020, 7:26 PM

#

correct

#

@tidal bough i know i already said

#

it seems to be a plague among matlab programmers

#

every matlab programmer ive worked with writes code like this

#

as little whitespace as possible and as short variable names as possible

tidal bough Sep 10, 2020, 7:26 PM

#

I suddenly feel the urge to check my old Octave code 😅

lapis sequoia Sep 10, 2020, 7:26 PM

#

because we are engineers first and not experienced with programming practices

desert oar Sep 10, 2020, 7:27 PM

#

@slender nymph is this for school?

#

@lapis sequoia no its because you have bad role models who also write code like this

lapis sequoia Sep 10, 2020, 7:27 PM

#

yes exactly but majority of engineers are such xD

#

bad role models at programming

#

at least the ones i work with

desert oar Sep 10, 2020, 7:27 PM

#

@crisp jewel if sequence is a numpy array, that returns everything in the array except the last column

lapis sequoia Sep 10, 2020, 7:27 PM

#

obviously i can't speak worldwide

tidal bough Sep 10, 2020, 7:28 PM

#

...ehh, it was good enough 😛

📎 unknown.png

desert oar Sep 10, 2020, 7:28 PM

#

anyway, other than creating the imtyp_indexer variable i dont see anything really slow about this code

#

    totl=sum(xxl);
    glom=np.zeros((ntdof,ntdof));
    glost=np.zeros((ntdof,ntdof));
    for ie in range(ne):
        est=estif_frame(ndofe,ie,a,sxi,xxl,em,rho,theta);
        for id in range(ndofe):
            for jd in range(ndofe):
                igdof=ndof[id,ie]
                jgdof=ndof[jd,ie]
                glost[igdof.astype(int)-1,jgdof.astype(int)-1]=glost[igdof.astype(int)-1,jgdof.astype(int)-1]+est[id,jd]

im not sure if there is a better way to do this

#

iterating over arrays is slow

#

but there might not be a vectorized version
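
One possible partial vectorisation, offered as a sketch rather than a drop-in replacement: the two inner loops can be replaced by a single scatter-add per element with `np.add.at` and `np.ix_`. The sizes, the `ndof` table, and the random `element_matrices` (standing in for whatever `estif_frame` returns) are made-up assumptions.

```python
import numpy as np

# Hypothetical small problem: 3 elements, 4 DOFs per element, 8 global DOFs.
ntdof, ndofe, ne = 8, 4, 3
# ndof[local_dof, element] holds 1-based global DOF numbers, as in the chat.
ndof = np.array([[1, 3, 5],
                 [2, 4, 6],
                 [3, 5, 7],
                 [4, 6, 8]])

rng = np.random.default_rng(0)
# Stand-in for estif_frame(...): one (ndofe, ndofe) matrix per element.
element_matrices = rng.random((ne, ndofe, ndofe))

# Reference: the triple loop from the chat.
glost_loop = np.zeros((ntdof, ntdof))
for ie in range(ne):
    est = element_matrices[ie]
    for i_local in range(ndofe):
        for j_local in range(ndofe):
            i_global = ndof[i_local, ie] - 1
            j_global = ndof[j_local, ie] - 1
            glost_loop[i_global, j_global] += est[i_local, j_local]

# Same assembly with the two inner loops replaced by one scatter-add.
glost_vec = np.zeros((ntdof, ntdof))
for ie in range(ne):
    dofs = ndof[:, ie] - 1  # 0-based global DOFs of this element
    # np.ix_ builds the (ndofe, ndofe) grid of row/column targets;
    # np.add.at accumulates correctly even if an index repeats.
    np.add.at(glost_vec, np.ix_(dofs, dofs), element_matrices[ie])

print(np.allclose(glost_loop, glost_vec))
```

The loop over elements remains, so how much this helps depends on how expensive the per-element matrix computation is compared with the assembly.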

lapis sequoia Sep 10, 2020, 7:29 PM

#

well one way i can solve it is that i only do it once

#

but if iteration i >1

desert oar Sep 10, 2020, 7:29 PM

#

right, but thats changing the algorithm

lapis sequoia Sep 10, 2020, 7:29 PM

#

i use the same values of igdof

desert oar Sep 10, 2020, 7:29 PM

#

which is fine, but not a fair comparison w/ the matlab code

lapis sequoia Sep 10, 2020, 7:29 PM

#

because ndof will be the same my entire loop

desert oar Sep 10, 2020, 7:29 PM

#

sure

#

of course you can always try JIT compiling this with numba too

#

i do however recommend you read through PEP 8

#

!pep 8

arctic wedgeBOT Sep 10, 2020, 7:30 PM

#

**PEP 8 - Style Guide for Python Code**

Status: Active

Created: 05-Jul-2001

Type: Process

lapis sequoia Sep 10, 2020, 7:30 PM

#

C:\Users\hamad\Downloads\Frame_2D_EU123C_new.py:121: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.

#

btw when i did what u wrote

#

it told me this

desert oar Sep 10, 2020, 7:30 PM

#

a non-tuple sequence?

lapis sequoia Sep 10, 2020, 7:30 PM

#

when i removed tuple

desert oar Sep 10, 2020, 7:30 PM

#

what did you write

lapis sequoia Sep 10, 2020, 7:30 PM

#

and put []

desert oar Sep 10, 2020, 7:30 PM

#

no

lapis sequoia Sep 10, 2020, 7:30 PM

#

instead

desert oar Sep 10, 2020, 7:30 PM

#

wrong

#

look at what i wrote

lapis sequoia Sep 10, 2020, 7:31 PM

#

f1= np.array([100]) # load value

desert oar Sep 10, 2020, 7:31 PM

#

f[fdof, 0] = f1

it sounds like you did this

f[[fdof, 0]] = f1

lapis sequoia Sep 10, 2020, 7:31 PM

#

yeah

#

u r right

#

what's the difference btw?

#

it doesn't give the error anymore

desert oar Sep 10, 2020, 7:36 PM

#

@lapis sequoia with f[fdof, 0] you are indexing with fdof and 0. with f[[fdof, 0]] you are indexing with a single object, [fdof, 0]

#

numpy tries to be smart and infers that if you write f[[fdof, 0]] you mean f[(fdof, 0)], which is the same as f[fdof, 0]

#

the warning has to do with the fact that the default inference behavior is changing

#

however i recommend not relying on inference
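
A small sketch of the difference on a toy array (the array `f` here is made up and unrelated to the force vector above). Comma-separated indices and an explicit tuple are the same thing to Python, while a list is a single index object.

```python
import numpy as np

f = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

a = f[1, 0]      # two indices: row 1, column 0
b = f[(1, 0)]    # identical: Python passes the tuple (1, 0) either way
c = f[[1, 0]]    # one index object: a list, so rows 1 and 0 are selected

print(a, b)
print(c)
```

So `a` and `b` are the same single element, while `c` is a 2x2 array made of two whole rows.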

lapis sequoia Sep 10, 2020, 7:38 PM

#

ah ok

#

could something completely unrelated to the code cause the speed difference?

#

like whether spyder is installed in ssd or hdd etc

#

or the code i am running

desert oar Sep 10, 2020, 7:39 PM

#

spyder? no

#

i mean... maybe, if it has some kind of debugging features that are slowing down the interpreter

#

but probably not

tame pelican Sep 10, 2020, 7:43 PM

#

#Absolute
@cli.command()
@click.argument('N1', type=int)
@click.option('--num', is_flag=True, help='INTEGER')
def abs(n1):
    """Calculates absolute value."""
    answer = int(abs(n1))

    click.echo('abolsute value = {}'.format(answer))

code is above, im trying to make a command so it shows the absolute value but an error says its not valid.. ideas?

lapis sequoia Sep 10, 2020, 7:47 PM

#

i dont think it has to do with my code @desert oar
if I do this

from time import perf_counter
start= perf_counter()
for j in range(1):
    for i in range(j):
        x=1;

end= perf_counter()
delta=end-start

desert oar Sep 10, 2020, 7:47 PM

#

@tame pelican what is the error? it looks like you're missing the num parameter to abs
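
A hedged sketch of one way the command could be written, assuming the cause is the missing `num` parameter plus the function shadowing the built-in `abs`. The `cli` group is not shown in the question, so the one below is a stand-in, and the function name `absolute` is a hypothetical choice.

```python
import click


@click.group()
def cli():
    """Stand-in for the `cli` group, which the question does not show."""


# Register the command as "abs" while giving the function another name,
# so the built-in abs() stays reachable inside it.
@cli.command(name="abs")
@click.argument("n1", type=int)
@click.option("--num", is_flag=True, help="INTEGER")
def absolute(n1, num):
    """Calculates absolute value."""
    # `num` is accepted because every declared option is passed to the function.
    answer = abs(n1)
    click.echo("absolute value = {}".format(answer))
```

In a script, calling `cli()` at the bottom would make it runnable from the shell. A negative number has to be passed after `--` (for example `abs -- -5`) so that click does not read it as an option.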

lapis sequoia Sep 10, 2020, 7:47 PM

#

and compare to matlab, it's the same thing for 1 iteration python is faster

#

for 10000 iterations matlab is faster
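
For comparisons like this, a single `perf_counter` reading is quite noisy. A sketch using the standard-library `timeit` module, which repeats the measurement; the function name `empty_loops` and the loop sizes are illustrative choices.

```python
import timeit


def empty_loops(n):
    """Run the same kind of do-nothing nested loop as in the snippet above."""
    for j in range(n):
        for i in range(j):
            x = 1


# Repeat each measurement and keep the best time to reduce noise.
for n in (1, 100, 1000):
    best = min(timeit.repeat(lambda: empty_loops(n), number=1, repeat=5))
    print("n={:>5}: {:.6f} s".format(n, best))
```

A pure-Python loop like this measures interpreter overhead rather than numpy, so it says little about how array code will compare with MATLAB.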
