#data-science-and-ml


velvet thorn
#

I guess my PoV is that a lot of people treat DataFrames as slightly better 2D lists

#

which leads to avoiding a lot of the conveniences and tools that pandas gives you

#

e.g. vectorisation

#

anyway yeah @misty mica it also depends on what you want to do with the data after you're done with it

#

that'd influence storage concerns

#

just for completeness, it's worth noting that the SQL way to handle this (canonically) would be to create two separate tables

#

and in the second table each row would be an original filename ID, an index, and one part of the filename

misty mica
#

I split it into a list so I could do some tag-like analysis, but I think just storing the string instead of a list is adequate.

velvet thorn
#

if you did that you could just do df[(df['index'] == 1) & (df['filename'] == whatever)]

#

which is nice, but also adds cognitive overhead because now you have to juggle two tables
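A rough pandas sketch of the two-table idea described above, for illustration only. The filenames, the `_` separator, and the column names `file_id`, `index`, and `part` are assumptions; the real naming standard is not shown in this log.

```python
import pandas as pd

# Hypothetical filenames; the real naming standard is not shown in this log.
files = pd.DataFrame({
    "file_id": [0, 1],
    "filename": ["cat_2020_raw", "dog_2021_clean"],
})

# Second table: one row per (file_id, index, part).
parts = (
    files.assign(part=files["filename"].str.split("_"))
    .explode("part")
    .reset_index(drop=True)
)
parts["index"] = parts.groupby("file_id").cumcount()
parts = parts[["file_id", "index", "part"]]

# Files whose second part (index 1) equals "2021".
matches = parts[(parts["index"] == 1) & (parts["part"] == "2021")]
print(matches["file_id"].tolist())
```

The first table keeps one row per file; the second is the "long" table that the boolean filter runs against.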

misty mica
#

In general for most things I'll only care about the files that are formatted according to my standard example above, so I'm going to store those three fields and also filename as a string.

#

Thanks for the assistance!

velvet thorn
#

yw

indigo obsidian
#

speaking of snake_case vs camelCase, constantly going between R and Python for school is starting to destroy my sanity 😫

velvet thorn
#

hm

#

I use TypeScript (camelCase) for my frontend and Python (snake_case) for my backend

#

it's p okay but

#

because API responses are returned in snake case

#

I ALSO have snake case variables in my TypeScript

#

🥴

misty mica
#

That's funny, I've been wanting to use camel/pascal casing in python because my databases all tend to be snake case.

indigo obsidian
#

is there anything inherently wrong with camelCase in python? or just a matter of good/bad practice?

velvet thorn
#

is there anything inherently wrong with camelCase in python? or just a matter of good/bad practice?
@indigo obsidian snake case is preferred

#

but not inherently wrong, no

misty mica
#

If everyone uses the same style guide it is nice, but not worth going to battle over in most organizations.

#

I mean, not worth fighting an existing standard that isn't your preferred, it's definitely worth having a standard.

velvet thorn
#

yup

#

what sucks, though

#

is not having a standard.

#

😦

wheat pilot
#

in a list of tuples how do i get the first index of the list but second element in the tuple

#

something like list = [(2.3, 1), (3.5, 0)]

#

and get out the 1

slate hollow
#

@wheat pilot 1. this probably isn't the best place but since you asked
you would do smth like x[0][1]
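A short illustration of that indexing, using the values from the question. The variable name `pairs` is an assumption, chosen to avoid shadowing the built-in `list`.

```python
pairs = [(2.3, 1), (3.5, 0)]

# First tuple in the list, then the second element of that tuple.
second_of_first = pairs[0][1]
print(second_of_first)

# Tuple unpacking does the same for one entry.
_, label = pairs[0]

# Second element of every tuple.
labels = [p[1] for p in pairs]
print(labels)
```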

indigo obsidian
#

just wondering, where would be the best place to ask more fundamental questions like this?

tidal bough
dusty bough
violet mesa
#

Anyone know of a good textbook for Time Series Analysis in Python? Understanding ACF, PACF, ARIMA, SARIMA with a good depth on the formulae etc...

glacial rune
#

What is the most performant way of inserting lots of records into a SQL database? It took me almost 100 seconds for 57k records using executemany()

indigo obsidian
#

BULK INSERT is another possible option

lapis sequoia
#

@glacial rune : executemany() isn't a true bulk operation, so it tends to be pretty slow.

I found I get significant performance gains by composing a single operation as a massive string and sending it all in one go.

Since you only have 57k records, that may be your best option. Beware that you have to be careful about how you convert numerical data into strings to avoid character truncation.

Another option is to use pyodbc.cursor.fast_executemany. I've not tried this, it just looks promising.

https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API

SQLAlchemy added support for this feature

https://docs.sqlalchemy.org/en/13/changelog/migration_13.html#support-for-pyodbc-fast-executemany
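A minimal sketch of the "one big statement" idea, using the standard-library `sqlite3` module as a stand-in so it runs anywhere; it is not the MySQL setup from the question. The table `records`, its columns, and the chunk size are assumptions. Binding values through placeholders instead of formatting numbers into the SQL string also sidesteps the number-to-string conversion concern mentioned above. Most MySQL drivers use `%s` rather than `?` as the placeholder.

```python
import sqlite3

def insert_in_chunks(conn, rows, chunk_size=100):
    """Insert (id, value) rows using one multi-row INSERT per chunk."""
    cur = conn.cursor()
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One "(?, ?)" group per row, all sent in a single statement.
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        flat = [value for row in chunk for value in row]
        cur.execute(
            f"INSERT INTO records (id, value) VALUES {placeholders}", flat
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, value REAL)")
rows = [(i, i * 0.5) for i in range(1000)]
insert_in_chunks(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

For tens of millions of rows, a server-side bulk load from CSV is likely still the faster route; this only reduces the number of round trips.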

glacial rune
#

Thanks 😄 ultimately I will have like 30 million records... the db is on google cloud so I wonder if it would be faster to upload csv files

desert oar
#

@glacial rune what database are you actually using?

#

they all have different features for this

glacial rune
#

MySQL on Google Cloud

#

5.7

opaque isle
#

I have a question. Normally is it possible for an image classifier built on CNN to give the count of something in an image. (e.g. the no of cats in an image)?

velvet thorn
#

I have a question. Normally is it possible for an image classifier built on CNN to give the count of something in an image. (e.g. the no of cats in an image)?
@opaque isle yes

runic stream
#

hey so i want to train a GRU network with an input of shape (748, 500, 12)
but i'm getting this error:

#

the model πŸ‘† can someone please help?

cobalt jetty
#

can you show a full picture of the error message?

#

I think you're messing something with x_train with regards to the shape.

runic stream
#

these are the shapes

cobalt jetty
runic stream
#

the first dim is the no. of examples, the next is the time steps(500 samples), and the last one is the features

cobalt jetty
#

is it the pytorch GRU?

runic stream
#

keras

#

this is the network i'm trying to implement

cobalt jetty
#

off the top of my head, I can't give a proper answer right now.

#

I do think it's a size mismatch between the input_size you're using and the size of xtr, though.

runic stream
#

I think the GRU outputs five different sequences, each of which I have to pass through another Dense layer, but I don't know how to do that... maybe that's the reason for the error...

#

I do think it's a size mismatch between the input_size you're using and the size of xtr, though.
@cobalt jetty input size i have given is (500,12) to the GRU, and xtr is of shape (748, 500, 12)

cobalt jetty
#

I looked back at the error and it seems to arise from this function. I.e. your output shape and your y_train shape have a mismatch.

runic stream
#

sorry i tried everything it is not working still, i want a multiclass classification, and i think i'm doing something wrong here

I think the GRU outputs five different sequences, each of which I have to pass through another Dense layer, but I don't know how to do that... maybe that's the reason for the error...
but i don't know how to solve that

safe tapir
#

Anyone have experience with using eGPUs for DL? Most of the benchmarks I see online show a 10-30% performance hit for gaming. Should I expect similar for DL?

cobalt jetty
#

Try inputting (5,) as an output shape rather than len(labels), @runic stream

runic stream
#

i guess the output units have to be an int,
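The model itself is not shown in this log, so the following is only a guess at the mismatch. With a final layer of 5 units and categorical cross-entropy, Keras expects targets of shape `(748, 5)`; with sparse categorical cross-entropy it expects integer labels of shape `(748,)`. A NumPy sketch of building one-hot targets, where the class count of 5 and the random labels are assumptions:

```python
import numpy as np

num_classes = 5      # assumed from the "(5,)" suggestion above
num_examples = 748   # first dimension of xtr in the log

# Hypothetical integer labels 0..4, one per example.
rng = np.random.default_rng(0)
labels = rng.integers(0, num_classes, size=num_examples)

# One-hot targets: one row per example, one column per class.
y_onehot = np.eye(num_classes)[labels]
print(y_onehot.shape)
```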

willow karma
#

Hey squad - hope everyone had a good weekend and is staying safe/healthy. I'm starting to dip my toes into sentiment analysis, and can imagine this work has been explored in so much detail that there are some good Python libraries that can basically "plug and play" with text that you feed it.

If this is the case, are there any libraries ya'll recommend I explore? I imagine there are easier alternatives to running e.g. CountVectorizer

slate hollow
#

hey um does anyone know how much space building tensorflow from source takes? bc rn it has taken up frickin 30 gigs

desert oar
#

@void anvil might have to just read the source code

#

or use subprocess

#
cat your_file.naf | corefgraph -l en_conll > output.naf

ugh, useless use of cat

#

and sudo pip install too

#

yeah you'd have to check the source code for how it works

#

good old academic software

#

no idea

#

let me know if you figure it out though @void anvil

#

i like having all this nlp stuff in my toolbox

crimson vector
#

okay im really dumb and new to everything and would like some help. ive spent 2 days trying to figure out what is wrong with my neural network. im trynna do the handwritten digits thing (mnist) and my code is both super slow and the cost only goes up

#

can someone look it over and tell me where i am going wrong?

#
import numpy as np

def cross_entropy(output, y_target):
    return - np.sum(np.log(output) * y_target, axis=1)

def cost(output, y_target):
    return np.mean(cross_entropy(output, y_target))

def sigmoid(z):
    return 1 / (1 + np.exp(z * -1))

def sigmoid_deriv(z):
    return sigmoid(z) * (1 - sigmoid(z))

def softmax(z):
    return (np.exp(z.T) / np.sum(np.exp(z), axis=1)).T

m = 10000
y = np.zeros((m, 10))
x = np.zeros((m, 784))

file = open("data\\mnist_train.txt", 'r')
for i in range(m):
    line = file.readline()
    x_line = line[2:].split(',')
    x_line = np.array([int(i) for i in x_line]).reshape(1, 784)
    x[i] = x_line

    y_line = np.zeros((10, 1))
    y_line[int(line[0])] = 1
    y[i] = np.array(y_line.T)

y = y.reshape(m, 10).T
x = x.T
alpha = .01

W1 = np.random.rand(256, 784) * .01
b1 = np.zeros((256, 1))

W2 = np.random.rand(256, 256) * .01
b2 = np.zeros((256, 1))

W3 = np.random.rand(10, 256) * .01
b3 = np.zeros((10, 1))

for i in range(1000):
    # feed forward
    Z1 = np.dot(W1, x) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = softmax(Z3)

    # calculating gradients
    dz3 = A3 - y
    dw3 = np.dot(dz3, A2.T) / m
    db3 = np.sum(dz3, axis=1, keepdims=True) / m
    da2 = np.dot(W3.T, dz3)

    dz2 = np.multiply(da2, sigmoid_deriv(Z2))
    dw2 = np.dot(dz2, A1.T) / m
    db2 = np.sum(dz2, axis=1, keepdims=True) / m
    da1 = np.dot(W2.T, dz2)

    dz1 = np.multiply(da1, sigmoid_deriv(Z1))
    dw1 = np.dot(dz1, x.T) / m
    db1 = np.sum(dz1, axis=1, keepdims=True) / m

    # updating weights and biases
    W3 = W3 - alpha * dw3
    b3 = b3 - alpha * db3
    W2 = W2 - alpha * dw2
    b2 = b2 - alpha * db2
    W1 = W1 - alpha * dw1
    b1 = b1 - alpha * db1
#

ping me if u can help or something
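Nobody answered this in the log, so the following is only a sketch of one likely culprit. The activations are laid out as `(classes, examples)`, but `softmax` sums over `axis=1`, which normalises each class across all examples instead of each example across the 10 classes. `cross_entropy` also sums over `axis=1`. A column-wise version could look like this; the max-shift and the `eps` guard are additions for numerical safety, not part of the original code.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax for z of shape (classes, examples)."""
    shifted = z - z.max(axis=0, keepdims=True)  # avoids overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

def cost(output, y_target):
    """Mean cross-entropy; both arrays have shape (classes, examples)."""
    eps = 1e-12  # guards against log(0)
    return -np.mean(np.sum(y_target * np.log(output + eps), axis=0))

rng = np.random.default_rng(0)
z = rng.normal(size=(10, 4))
probs = softmax(z)
print(probs.sum(axis=0))  # each column should sum to 1
```

Two other things may be worth checking, assuming the file holds raw 0-255 pixel values: dividing the inputs by 255 so the sigmoids do not saturate, and initialising weights with `np.random.randn` so they are not all positive.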

desert oar
#

@void anvil what are you ultimately trying to do

sly mango
#

Guys, I have created a small corpus of 1.8M sentences and 250K unique words in Spanish for NLP, but I really don't know where to post it. 😅

desert oar
#

what is the output from a tool like this @void anvil ?

#

some kind of matrix of coreferences?

#

e.g. matrix C where Cij = 1 if entities i and j appear in the same doc?

desert oar
#

hm

#

interesting

#

so you need to see coreferences of things like "BOILER" and "TECHNICIAN"?

lapis sequoia
#

Anyone with TF lite experience need some help

desert oar
#

interesting that coreference resolution is a separate task from "just" entity resolution

wheat pilot
#

how do i standard scale a dataframe to have 0 mean and 1 stdev using sklearn?

#

i tried standardscaler and scale but when i manually check the returned data the mean is not 0

desert oar
#

@wheat pilot it might be +/- some small amount due to floating point error

wheat pilot
#

when i use df.mean() one row returns -3.552714e-16

#

so i guess that might be it

#

but then this is being used towards data preprocessing and my initial accuracy for a knn implementation is 1 but for a standard scaled is lower

#

and even lower for a min max scaled

#

shouldnt they be higher? @desert oar

desert oar
#

do you have any idea how tiny 1e-16 is

wheat pilot
#

yea super close to 0

#

i wasnt sure if it should be exact or not

#

but are my accuracies supposed to get worse?

#

with preprocessing

#

also i thought the way things were standard scaled is (x - mean) / stdev

desert oar
#

yeah that is right

#

so you should have mean ~0 and sd ~1

wheat pilot
#

but when i manually do that for one row of my dataset i get a different value

desert oar
#

huh?

#

what do you mean "for one row"?

wheat pilot
#

7.3,0.74,0.08,1.7,0.094,10.0,45.0,0.9957600000000001,3.24,0.5,9.8

#

this is one row of my dataframe

#

its mean is 7.2227054545455

#

and its stdev is 13.098395928751

desert oar
#

you arent supposed to scale rows

#

you're supposed to scale columns

wheat pilot
#

ohhhh

#

oh man

desert oar
#

the mean of each column should be around 0, and the stddev of each column should be around 1
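A small check of the column-wise behaviour, done by hand with pandas rather than sklearn. The column names and the random data are made up. After standardizing, each column's mean lands within floating point error of 0 and its standard deviation within error of 1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(7.0, 1.5, size=200),
    "b": rng.normal(45.0, 20.0, size=200),
})

# Column-wise standardizing; ddof=0 is the population std that StandardScaler uses.
scaled = (df - df.mean()) / df.std(ddof=0)

col_means = scaled.mean()
col_stds = scaled.std(ddof=0)
# Compare with a tolerance instead of == 0, because of floating point error.
print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
```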

wheat pilot
#

no wonder

#

ok so

desert oar
#

the point is so that all the data is centered in roughly the same place and occupies roughly the same amount of "space"

wheat pilot
#

what is the implementation using sklearn for this?

#

scale vs fit_transform

desert oar
#

eh?

#

first of all what are your data types

#

how many columns

#

any missing values

#

etc

#

just so i know what you are dealing with

wheat pilot
#

its a pandas dataframe i think

#

initially it has 12 columns and one of those is the label column

#

no missing values

#

in the start of my standard scaling def i removed the last column of the dataframe with
```python
xTrain = xTrain[xTrain.columns[:-1]];
xTest = xTest[xTest.columns[:-1]];
```

#

since i dont want to standardize the labels i think?

desert oar
#

correct

#

well you can standardize those too, but you wouldnt want to do it here

wheat pilot
#

yea

desert oar
#

you dont need semicolons in python btw

wheat pilot
#

oh yea

desert oar
#

i assume you use javascript?

wheat pilot
#

habit

#

all of my coursework up till this course has been java based

desert oar
#

ah

#

good so you understand objects and stuff

wheat pilot
#

we didnt have any introduction to python just an assignment on the topic of the course 😦

#

yeah a bit

desert oar
#

all the columns are numeric right?

#

as in, they only contain numbers?

wheat pilot
#

i get a little iffy on fundamentals

#

yea they are

#

except the header?

desert oar
#

yeah python has some things in common with java and some things that are very different

wheat pilot
#

each column has a name

desert oar
#

yeah we dont care about the column names

#

pandas is smart enough not to mix those up with your data

#
x_train = x_train.iloc[:, :-1]
x_test = x_test.iloc[:, :-1]

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
wheat pilot
#

i imported the preprocessing packet(not sure if its the right term) using
```python
from sklearn import preprocessing
```

desert oar
#

("module" or "package")

#

(a "package" is just a "module" that contains other modules, called submodules)

#

the code i posted should "just work"

#

it will scale each column independently

wheat pilot
#

do i have to use iloc instead of what i had?

desert oar
#

no but its less typing

#

or if you know the column name of the label you can do

x_train = x_train.drop(columns=[label_colname])

pandas gives you a few different ways to perform similar operations, depending on what exactly you want

wheat pilot
#

for the line with scaler = StandardScaler i need to use preprocessor.StandardScaler() right?

desert oar
#

in your case yes

#

i usually write from sklearn.preprocessing import StandardScaler

#

they both work

wheat pilot
#

i have to use min max scaler as well

#

so i imported the bigger package

desert oar
#

using both is weird

#
from sklearn.preprocessing import StandardScaler, MinMaxScaler

this is one option

wheat pilot
#

ooh even better

#

then i can just use StandardScaler() on its own?

desert oar
#

i wouldnt recommend using both unless you know what you're doing and have a good reason to do it

wheat pilot
#

using both minmax and standard?

desert oar
#

min-max scaling (aka "normalizing") works best when the data has a logical "maximum" and "minimum" value

wheat pilot
#

they are for two different definitions to see how preprocessing affects my accuracy

desert oar
#

whereas shifting by mean and scaling by std dev (aka "standardizing") works best on unbounded data

#

yeah, dont use them on the same data

#

but if youre comparing then go for it
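For reference, the two transforms being compared, written per column with $x_{\min}$, $x_{\max}$, $\mu$, and $\sigma$ taken from that column of the training data:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\qquad\text{(min-max scaling)}
\qquad
z = \frac{x - \mu}{\sigma}
\qquad\text{(standardizing)}
```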

wheat pilot
#

ah ok good i think its set up to use a new copy each time

desert oar
#

the "function-only" versions like sklearn.preprocessing.minmax_scale don't preserve the values you need to re-apply the scaling later

#

whereas the class-based versions like sklearn.preprocessing.MinMaxScaler store the scaling parameters, which lets you do fit_transform on the training data and then just transform on the test data

wheat pilot
#

where you named things scaled are there issues if i use
```python
xTrain = scaler.fit_transform(xTrain)
```

desert oar
#

i dont like not having access to my original data

#

it's just more annoying

wheat pilot
#

ah wait

#

on what u said about function only

#

for knn should i be using the same transform values on the test data?

#

or scaling test data separately

desert oar
#

for knn should i be using the same transform values on the test data?
yes, you should do this

#

think about it practically: the test data is meant to simulate "out of sample" data. if the data is out of sample, where are you going to get the scaling parameters? nowhere. you have to use the parameters from the training data
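What "use the parameters from the training data" amounts to, sketched by hand in NumPy. The arrays are random stand-ins for the real features. This mirrors what `fit_transform` followed by `transform` does conceptually.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(10.0, 3.0, size=(100, 2))
x_test = rng.normal(10.0, 3.0, size=(20, 2))

# "fit": learn the per-column parameters from the training data only.
mu = x_train.mean(axis=0)
sigma = x_train.std(axis=0)

# "transform": apply those same parameters to both sets.
x_train_scaled = (x_train - mu) / sigma
x_test_scaled = (x_test - mu) / sigma

print(x_train_scaled.mean(axis=0))  # ~0 by construction
print(x_test_scaled.mean(axis=0))   # close to 0, but not exactly
```

The test set's mean is not exactly 0 after scaling, and that is expected.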

wheat pilot
#

ohh

#

i see

#

my teaching assistant mentioned not doing the same process on the test data

#

but the assignment info made it seem like we were supposed to

#

but i think it meant same process as in same parameters as the training and not same code process

#

so for the same deal but a min max version i would just replace the standard scaler with a min max?

#
    xTrain = xTrain.iloc[:, :-1]
    xTest = xTest.iloc[:, :-1]
    scaler = MinMaxScaler()
    xTrain_scaled = scaler.fit_transform(xTrain)
    xTest_scaled = scaler.transform(xTest)
    return xTrain_scaled, xTest_scaled
desert oar
#

yep

wheat pilot
#

do you know anything about adding noisy features?

#

i think they might also be called irrelevant features

#

i implemented this but im not sure this is actually what i should be trying to do
```python
def add_irr_feature(xTrain, xTest):
    """
    Add 2 features using Gaussian distribution with 0 mean,
    standard deviation of 1.

    Parameters
    ----------
    xTrain : nd-array with shape n x d
        Training data
    xTest : nd-array with shape m x d
        Test data

    Returns
    -------
    xTrain : nd-array with shape n x (d+2)
        Training data with 2 new noisy Gaussian features
    xTest : nd-array with shape m x (d+2)
        Test data with 2 new noisy Gaussian features
    """
    # TODO FILL IN
    feature1_train = np.random.normal(0, 1, len(xTrain))
    feature2_train = np.random.normal(0, 1, len(xTrain))
    feature1_test = np.random.normal(0, 1, len(xTest))
    feature2_test = np.random.normal(0, 1, len(xTest))
    xTrain['irr_feat1'] = feature1_train
    xTrain['irr_feat2'] = feature2_train
    xTest['irr_feat1'] = feature1_test
    xTest['irr_feat2'] = feature2_test
    return xTrain, xTest
```
desert oar
#

note that those features are uncorrelated with your "meaningful" features

#

is this part of your homework?

wheat pilot
#

yea it is

#

we never went over any code in class though

#

just the concept of knn

#

TT

#

im having trouble following what the classification thing means

desert oar
#

what does the homework actually ask you to do?

wheat pilot
#

i do understand im supposed to be making extra columns that are not "necessary" and may mess with results

#

Fill in the add irr feature function to add two irrelevant features to the training
and test data. The data for each column should be drawn from a Gaussian (normal)
distribution with 0 mean and standard deviation of 1.

desert oar
#

seems like you did the right thing then

#

dont overthink that question

wheat pilot
#

oh cool cool

#

when i run this though my accuracy went up for the one i thought it would go down

#

and it went down for the ones i thought should go up

desert oar
#

well yeah you were usin the scaling totally wrong lol

wheat pilot
#

oh i mean with new code

#

my results are:
```python
Test Acc (no-preprocessing): 1.0
Test Acc (standard scale): 0.8
Test Acc (min max scale): 0.7
Test Acc (with irrelevant feature): 1.0
```

#

actually this may be because i ran it on a small test set

desert oar
#

use the same test and train set for all of those

#

dont re-draw each time

wheat pilot
#

what do you mean

#

redraw?

desert oar
#

make sure you use the same test/train split for all 4 of those methods

#

to get a fair comparison

wheat pilot
#

oh my data is divided into 4 csv

#

xtraining and ytraining, xtest and ytest

desert oar
#

thats fine

wheat pilot
#

x has the data and y has labels

#

theres something preimplemented to get each thing

#

i think its pd.read_csv

desert oar
#

thats fine

wheat pilot
#

its running now but since my knn has some for loops it takes a few mins for each test

#

the csvs are 500x12

#

ah it still goes down

#

Test Acc (no-preprocessing): 0.8395833333333333
Test Acc (standard scale): 0.70625

#

so far

desert oar
#

it might actually just be worse

#

depends on the data

#

seems unlikely but

wheat pilot
#

Evaluate the accuracy of the model on the test dataset for the different preprocessing
techniques as a function of k. What conclusions can you draw with regards to the
different forms of preprocessing and the sensitivity to irrelevant features for this dataset?

#

i feel like its fishing for an answer about accuracy getting better since scale should usually matter for knn?

#

i ended up with Test Acc (no-preprocessing): 0.8395833333333333
Test Acc (standard scale): 0.70625
Test Acc (min max scale): 0.8
Test Acc (with irrelevant feature): 0.84375

desert oar
#

yeah that would be my guess as well
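A toy illustration of why feature scale can change kNN results at all: the nearest neighbour of a query point can flip once the features are rescaled. All numbers here, including the assumed per-feature spread, are made up. Whether the flip helps or hurts accuracy depends on the data.

```python
import numpy as np

# Two hypothetical features on very different scales.
query = np.array([0.50, 40.0])
a = np.array([0.52, 60.0])  # close on feature 0, far on feature 1
b = np.array([0.90, 41.0])  # far on feature 0, close on feature 1

# Raw Euclidean distances are dominated by the large-scale feature.
raw_a = np.linalg.norm(query - a)
raw_b = np.linalg.norm(query - b)

# Divide each feature by an assumed spread before measuring distance.
spread = np.array([0.2, 20.0])
scaled_a = np.linalg.norm((query - a) / spread)
scaled_b = np.linalg.norm((query - b) / spread)

print("nearest (raw):", "a" if raw_a < raw_b else "b")
print("nearest (scaled):", "a" if scaled_a < scaled_b else "b")
```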

wheat pilot
#

im not sure what conclusions to draw as its opposite of my guess

desert oar
#

just triple check for mistakes

#

worst case scenario you get it wrong

wheat pilot
#

would you be able to skim through my knn to see if i had anything major causing this to be wrong?

desert oar
#

i can, but i cant offer that much since this is homework

wheat pilot
#

it works for the first data set that i had to write it for but this question uses that knn for a different set

#

the first is my knn and the second is the preprocessing tests

lapis sequoia
#

Are there any websites similar to "kaggle" that offer competitions?

wheat pilot
#

@desert oar were you able to find anything?

verbal haven
#

does someone know what this number -> 22/22 means when training a nn with tensorflow?
Epoch 94/100
22/22 [==============================] - 0s 750us/step - loss: 0.0123

#

i understand its related to the dataset size but i cant find the relation

desert oar
#

is that a progress bar for stochastic gradient descent?

hasty grail
#

Step 22 of 22 in the 94th epoch

#

Usually in TF a "step" is equivalent to one batch
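The relation to the dataset size, as a small sketch. Keras `fit` uses a batch size of 32 unless told otherwise, so 22 steps would correspond to somewhere between 673 and 704 training samples. The figure of 700 below is hypothetical, and `steps_per_epoch` is just a helper name for this example.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(700))
```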

lean wharf
#

Does anyone have an idea how I could recognise the steps annotated below using python? I've got the xy data in a dataframe at the moment

hasty grail
#

smooth out the curve using moving average then find the derivative at each point?

lean wharf
#

Some sort of "if the difference between y and y+1 > n, print x and x+1"

verbal haven
#

scipy.signal.find_peaks

hasty grail
#

the difference between y and y+1
basically the derivative

lean wharf
#

Aye. At the moment I've got something looking a little like this code wise

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


df = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
df.plot(x ='x', y='y', linewidth=0.2)

# Savitzky-Golay filter implementation
dataIIR = df.copy()  # copy so the original df is not overwritten
dataIIR['y'] = savgol_filter(df['y'], 101, 2)
dataIIR.plot(x ='x', y='y', linewidth=0.4)
#

So importing the data, and reducing the noise a little with filter

verbal haven
#

'e:/Projects/HiringTest/submission/sample.txt' lol,
have you tried with find_peaks or similars? tweaking it, it should find the steps

lean wharf
#

Trying now

#

Just doing some practice interview q's lol

lean wharf
#

@verbal haven So say I want to find a point with difference of 1 between y and y+1, would that be height=1?

#

ie. jumps = find_peaks(dataIIR['y'], height=1.5)

#

I'm struggling to get to grips with understanding how the function works

hasty grail
#

I think you want threshold instead of height

verbal haven
#

yes its threshold

lean wharf
#

Hmmm, so
```
jumps = find_peaks(dataIIR['y'], threshold=1)
print(jumps)
```

Doesn't seem to return any values as such
verbal haven
#

it should return an array with a peak index

#

oh you mean with threshold = 1 ?

lean wharf
#

So, I've got some data that looks like this

2.24756189047 2.70009589679
2.24831207802 2.85466124369
2.24906226557 2.85664726093
2.24981245311 2.84088991726
2.25056264066 5.23410679429
2.25131282821 5.01424916475
2.25206301575 4.81484599199
2.2528132033 5.25819389546
2.25356339085 4.99236143949
#

you can see the jump from 2.8 to 5.2

#

I'm looking to find where in the data that jump happens. I've put it as 1 just because it seems like a reasonable value to measure the number of jumps

hasty grail
#

what does it return?

lean wharf
hasty grail
#

ok so it literally returns nothing

lean wharf
#

Basically yes

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lean wharf
#

But I'm not sure why. As you can see above, there clearly is a difference of greater than 2. Even 1 returns nothing

hasty grail
#

After messing around in #bot-commands I think I see the problem

#

The threshold checks both sides

#

if it's just a jump it wouldn't register it as a peak as the other side doesn't have a big enough jump

lean wharf
#

Hmm I did think that might be a possibility. It looks for spikes not steps basically

#

I might need to do it manually in that case

hasty grail
#

try this

#

!e

import numpy as np

arr = np.array([2.24756189047, 2.70009589679,
2.24831207802, 2.85466124369,
2.24906226557, 2.85664726093,
2.24981245311, 2.84088991726,
2.25056264066, 5.23410679429,
2.25131282821, 5.01424916475,
2.25206301575, 4.81484599199,
2.2528132033, 5.25819389546,
2.25356339085, 4.99236143949])

x, y = arr[::2], arr[1::2]
print(f"x={x}")
print(f"y={y}")

print(x[np.gradient(y) >= 1])
arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lean wharf
#

It returns: x=[2.24756189 2.24831208 2.24906227 2.24981245 2.25056264 2.25131283 2.25206302 2.2528132 2.25356339] y=[2.7000959 2.85466124 2.85664726 2.84088992 5.23410679 5.01424916 4.81484599 5.2581939 4.99236144] [2.24981245 2.25056264]

hasty grail
#

if you only want a strict difference between consecutive elements, instead of np.gradient you can just do np.ediff1d

lean wharf
#

Can't seem to get it to how I'd want

#

I've currently mocked up some code to possibly solve it:
```python
n = 0
for row_index, row in dataIIR.iterrows():
    np1 = row['y']
    diff = np1 - n
    if diff > 2:
        print(row_index)
    n = row['y']
```

#

But it's not returning anything. I'm trying to get it so it will print the row index if the difference between y and y+1 is greater than 2

velvet thorn
#

how about

#

np.argmax(np.abs(a[1:] - a[:-1]))

#

that's what I would do

lean wharf
#

Sorry for what part? @velvet thorn

hasty grail
#

!e

import numpy as np

arr = np.array([2.24756189047, 2.70009589679,
2.24831207802, 2.85466124369,
2.24906226557, 2.85664726093,
2.24981245311, 2.84088991726,
2.25056264066, 5.23410679429,
2.25131282821, 5.01424916475,
2.25206301575, 4.81484599199,
2.2528132033, 5.25819389546,
2.25356339085, 4.99236143949])

x, y = arr[::2], arr[1::2]
print(f"x={x}")
print(f"y={y}")

print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1])
arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

hasty grail
#

This works?

#

The concatenation is to ensure that the result of ediff1d is the same length as the original array

lean wharf
#

Give me a sec to wrap my head around it haha

hasty grail
#

alternatively

#

x[np.ediff1d(y, to_begin=0) >= 1]

lean wharf
#

I think you're correct, I just need to try it with the full data file

#

so I'll need to convert the dataframe to a similar array

#

but data.to_numpy() gives it in the following format:

[4000 rows x 2 columns]
[[ 0.00000000e+00 -5.72766726e-03]
 [ 7.50187547e-04 -5.37550170e-03]
 [ 1.50037509e-03 -5.03534022e-03]
 ...
 [ 2.99849962e+00  5.02267064e+00]
 [ 2.99924981e+00  5.02299900e+00]
 [ 3.00000000e+00  5.02332816e+00]]
hasty grail
#

take the first column as x and the second as y?

#

x, y = np.split(arr, 2, axis=1)
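One thing to be aware of with `np.split(arr, 2, axis=1)` is that it returns two column arrays of shape `(n, 1)`, while plain column indexing gives 1-D arrays of shape `(n,)`. A small comparison on made-up data:

```python
import numpy as np

# Tiny stand-in for data.to_numpy(): first column x, second column y.
arr = np.array([[0.0, 2.7], [1.0, 2.8], [2.0, 5.2]])

x_col, y_col = np.split(arr, 2, axis=1)  # two (3, 1) column arrays
x, y = arr[:, 0], arr[:, 1]              # two (3,) 1-D arrays
print(x_col.shape, x.shape)

# Step detection on the 1-D arrays; to_begin=0.0 pads the first difference.
jumps = x[np.ediff1d(y, to_begin=0.0) >= 1]
print(jumps)
```

Both forms work for the boolean mask, but functions that require 1-D input will reject the `(n, 1)` columns.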

lean wharf
#

print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]) should still work as intended as far as I can see right?

#

Ah, wait, hmm

hasty grail
#

I think it should

lean wharf
#

running into an issue that my dataFrames aren't keeping separate

#

It's returning nothing atm

#
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter
from scipy.signal import find_peaks


data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
data.plot(x ='x', y='y', linewidth=0.2)

# Savitzky-Golay filter implementation
dataIIR = data
savgol_filter(dataIIR['y'], 101, 2)  # note: the filtered result is not assigned, so 'y' stays unfiltered
dataIIR.plot(x ='x', y='y', linewidth=0.2)

# Locating jumps
arr = dataIIR.to_numpy()
x, y = np.split(arr, 2, axis=1)
print(x[np.ediff1d(np.concatenate([[y[0]], y])) >= 1])

#plt.show()
#

Wait, think it was a bug

#

Fantastic, I think it's found the 3 steps!!

#
[[0.75018755]
 [1.50037509]
 [2.25056264]]
#

Thanks for all your help. I may be back in a few minutes with some more questions as to how I actually smooth the step between them, but I should be able to do that by appending the array with some values after doing some exponential smoothing

hasty grail
#

np

lapis sequoia
#

I don't understand any of this but good work

lapis sequoia
#

any idea what Im doing wrong here btw Im using PostgreSQL

c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", ("James",))

gives me 
Traceback (most recent call last):
  File "D:/Projects/DSaML/Main.py", line 21, in <module>
    c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", "James",)
psycopg2.errors.UndefinedParameter: there is no parameter $1
LINE 1: SELECT salary FROM EMPLOYEE WHERE name=$1
#

just ping me when you answer pls
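This went unanswered in the log. psycopg2 does not use `$1`-style placeholders; it uses `%s` (or `%(name)s`) and takes the values as a tuple or dict in the second argument. The traceback also shows `"James",` passed without the tuple parentheses. A sketch of the call shape follows, where `fetch_salary` and `RecordingCursor` are hypothetical helpers. The stand-in cursor is not psycopg2; it only records the call so the example runs without a database.

```python
def fetch_salary(cursor, name):
    """Run the query with a psycopg2-style %s placeholder."""
    cursor.execute("SELECT salary FROM EMPLOYEE WHERE name = %s", (name,))
    return cursor.fetchone()

class RecordingCursor:
    """Stand-in cursor (not psycopg2) that records what it was given."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))

    def fetchone(self):
        return None

cur = RecordingCursor()
fetch_salary(cur, "James")
print(cur.calls[0])
```

With a real connection, `cursor` would come from `conn.cursor()`.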

lean wharf
#

Does anyone know how I'd go about smoothing data between two points? I've got the sections I want to smooth coloured below:

#

and my data as a dataframe

hasty grail
#

did your moving average trick not work?

lean wharf
#

I couldn't get it implemented without error

hasty grail
#

code?

lean wharf
#

2 secs

#
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
from scipy import optimize

data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
plt.figure()
plt.plot(data['x'], data['y'], linewidth=0.2)

# Locating jumps
arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
print(stepIndex)

step1x = x[range(stepIndex[0]-100, stepIndex[0]+100)]
step2x = x[range(stepIndex[1]-100, stepIndex[1]+100)]
step3x = x[range(stepIndex[2]-100, stepIndex[2]+100)]

step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)]
step2y = y[range(stepIndex[1]-100, stepIndex[1]+100)]
step3y = y[range(stepIndex[2]-100, stepIndex[2]+100)]


def moving_avg(x, n):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[n:] - cumsum[:-n]) / float(n)

step1yMA = moving_avg(step1y, 3)
step2yMA = moving_avg(step2y, 3)
step3yMA = moving_avg(step3y, 3)

plt.plot(step1x, step1yMA)
plt.plot(step2x, step2yMA)
plt.plot(step3x, step3yMA)

# Savitzky-Golay filter implementation
dataIIR = data.copy()  # copy so the original data is not overwritten
dataIIR['y'] = signal.savgol_filter(data['y'], 101, 2)
plt.figure()
plt.plot(dataIIR['x'], dataIIR['y'], linewidth=0.2)

plt.show()
#

@hasty grail

#

So I haven't modified the dataframe with the new data yet, just plotted it, but it should still work. Instead, it prints the error:
ValueError: x and y must have same first dimension, but have shapes (200, 1) and (198,)

hasty grail
#

which line?

#

I think that the issue is that the moving average calculation isn't defined for the first n-1 values

#

hence the difference in shape

lean wharf
#

I don't understand what you mean by that sorry

velvet thorn
#

I don't understand what you mean by that sorry
@lean wharf say you have an average of 3 values

#

over this data: [10, 5, 10, 15, 20, 15]

#

if you always want to have 3 values in the calculation

#

you'll end up with [25/3, 10, 15, 50/3] (4 values)

#

1-3, 2-4, 3-5, 4-6

#

if you wanted 4 values in the moving average, you'd have 3 values in the result
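This also accounts for the 198 in the error: a window of n over N points gives N - n + 1 averages, so 200 samples with n = 3 leave 198. A minimal sketch with the numbers above, using `np.convolve` (the variable names are only illustrative):

```python
import numpy as np

data = np.array([10, 5, 10, 15, 20, 15], dtype=float)
n = 3
window = np.ones(n) / n

# 'valid' keeps only positions where the window fully overlaps the data,
# so the result has len(data) - n + 1 entries.
valid = np.convolve(data, window, mode="valid")

# 'same' zero-pads the edges and keeps len(data) entries,
# at the cost of biased values at both ends.
same = np.convolve(data, window, mode="same")

print(valid.shape, same.shape)
```

With `mode="same"` the output can be plotted against the original x values directly, but the first and last points are pulled towards zero.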

lean wharf
#

So my value of n is wrong?

#

I'm still not quite grasping the issue

#

Trying out a different implementation:

# Locating jumps

arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
print(stepIndex)

step1x = x[range(stepIndex[0]-100, stepIndex[0]+100)]
step2x = x[range(stepIndex[1]-100, stepIndex[1]+100)]
step3x = x[range(stepIndex[2]-100, stepIndex[2]+100)]

step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)]
step2y = y[range(stepIndex[1]-100, stepIndex[1]+100)]
step3y = y[range(stepIndex[2]-100, stepIndex[2]+100)]

def movingaverage(interval, window_size):
    window= np.ones(int(window_size))/float(window_size)
    return np.convolve(interval, window, 'same')

y_av1 = movingaverage(step1y, 10)
y_av2 = movingaverage(step2y, 10)
y_av3 = movingaverage(step3y, 10)
plt.plot(step1x, y_av1)
plt.plot(step2x, y_av2)
plt.plot(step3x, y_av3)



#

But now getting the error:
ValueError: object too deep for desired array

hasty grail
#

what gm said

lean wharf
#

I feel like there's a fundamental flaw in my understanding here

hasty grail
#

no matter what the value of n is, you will have to define the first n-1 values of your moving average

#

otherwise you will always be a couple of values short and the arrays won't align to each other

lean wharf
#

Can I do that by simply increasing the size of the data points I draw from?

#

ie. step1y = y[range(stepIndex[0]-101, stepIndex[0]+100)]

hasty grail
#

ok the problem here is that you might get out-of-bound indices when you add/subtract from stepIndex

#

If I were you I would create a padded version of y before doing that

lean wharf
#

Apologies again, I don't know what you mean by that. I'm new to these concepts

hasty grail
#
window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

avg_values = []
for i in range(len(y)):
    window = y_padded[i:i+window_len]
    exp_sum = exp_kernel[~np.isnan(window)].sum()
    exp_avg = np.nansum(window * exp_kernel) / exp_sum
    avg_values.append(exp_avg)
#

something like this maybe

#

Made some errors, I have edited the code

#

oops the range should be from 1 to N instead of 0 to N-1

lean wharf
#

Do I no longer need the step1y = y[range(stepIndex[0]-100, stepIndex[0]+100)] functions then, to define the range?

hasty grail
#

replace all that with what I wrote

lean wharf
#

I'm struggling to make sense of this, it's a bit above my pay grade haha

#

But I'll give it a shot

hasty grail
#
exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

This builds the kernel for calculating the moving average. It's equal to one in the center then exponentially falls off towards the sides

#
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

This creates a padded version of y where the padded values are NaNs so they can be filtered out later

#

in the for loop, it takes a window from y_padded such that y[i] is in the center of the window

#

then, it is multiplied with the kernel to get the numerator

#

the denominator is the sum of the values (aka weights) in the kernel

topaz sparrow
#

how did you get that colour in the text?

hasty grail
#

The average is taken such that NaNs are not counted
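The steps described above can be traced on a tiny input. This is only a sketch with a made-up four-point signal and a window of 3 instead of 201, so the numbers are easy to check by hand:

```python
import numpy as np

y = np.array([0.0, 0.0, 1.0, 1.0])   # made-up signal
window_len, exp_alpha = 3, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=np.nan)

# Kernel is 1 at the centre and decays by (1 - exp_alpha) per step: [0.5, 1, 0.5]
offsets = np.arange(-pad_left, pad_right + 1)
exp_kernel = (1 - exp_alpha) ** np.abs(offsets)

avg_values = []
for i in range(len(y)):
    window = y_padded[i:i + window_len]
    valid = ~np.isnan(window)
    # Only weights that sit on real samples count towards the denominator.
    weighted_sum = np.sum(window[valid] * exp_kernel[valid])
    avg_values.append(weighted_sum / exp_kernel[valid].sum())

print(avg_values)
```

The result has one value per input sample, so it lines up with x without any trimming.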

#

!code

arctic wedgeBOT
#

Discord has support for Markdown, which allows you to post code with full syntax highlighting. Please use these whenever you paste code, as this helps improve the legibility and makes it easier for us to help you.

To do this, use the following method:

```python
print('Hello world!')
```

Note:
• These are backticks, not quotes. Backticks can usually be found on the tilde key.
• You can also use py as the language instead of python
• The language must be on the first line next to the backticks with no space between them

This will result in the following:

print('Hello world!')
lean wharf
#

Alright, that makes a bit more sense

topaz sparrow
#
print ('hello world!')
#

?

hasty grail
#

with ```

#

before python

#

on the same line

topaz sparrow
#

python print ('Hello world!')

#

oof

hasty grail
#

you can just copy the text from that bot message

topaz sparrow
#

yeah

#
print('Hello world!')
#

oh

#

thanks

#

sir

hasty grail
#

np

topaz sparrow
#

I'm a beginner at python

hasty grail
#

you should open a separate help channel if you're looking for Python-specific help

topaz sparrow
#

yeah

#

i knew that

#

thanks sir

#

😄

#

one last question, what's the python bot's prefix?

hasty grail
#

an exclamation mark

topaz sparrow
#

oh

#

thanks

lean wharf
#
Traceback (most recent call last):
  File "e:/Projects/HiringTest/submission/assignment1.py", line 31, in <module>
    exp_sum = exp_kernel[~np.isnan(window)].sum()
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
hasty grail
#

hmm can you print the shape of each variable?

lean wharf
#

Is that possible without being able to run the code?

lapis sequoia
#

any idea what Im doing wrong here btw Im using PostgreSQL

c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", ("James",))

gives me 
Traceback (most recent call last):
  File "D:/Projects/DSaML/Main.py", line 21, in <module>
    c.execute("SELECT salary FROM EMPLOYEE WHERE name=$1", "James",)
psycopg2.errors.UndefinedParameter: there is no parameter $1
LINE 1: SELECT salary FROM EMPLOYEE WHERE name=$1
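The error is consistent with psycopg2 not understanding PostgreSQL's native `$1` placeholders: the driver expects `%s` (or `%(name)s`) markers plus a sequence of parameters. A minimal sketch of that style follows; `fetch_salary` and `_RecordingCursor` are hypothetical names, and the fake cursor only exists so the snippet runs without a database:

```python
def fetch_salary(cursor, name):
    """Run the query with psycopg2-style %s placeholders (hypothetical helper)."""
    # psycopg2 fills in %s itself; it does not translate PostgreSQL's $1 syntax.
    # The parameters must be a sequence, hence the one-element tuple.
    cursor.execute("SELECT salary FROM EMPLOYEE WHERE name = %s", (name,))
    return cursor.fetchone()


class _RecordingCursor:
    """Stand-in for a real psycopg2 cursor so the sketch runs without a database."""

    def execute(self, query, params=None):
        self.query, self.params = query, params

    def fetchone(self):
        return (50000,)  # made-up row, for illustration only


cur = _RecordingCursor()
row = fetch_salary(cur, "James")
```

With a real connection, `cur` would come from `conn.cursor()` and the call would be the same.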
lean wharf
#

This is the total code thus far for continuity sake:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
from scipy import optimize

data = pd.read_csv('e:/Projects/HiringTest/submission/sample.txt', sep = ' ')

# Data before filtering.
plt.figure()
plt.plot(data['x'], data['y'], linewidth=0.2)

# Locating jumps
arr = data.to_numpy()
x, y = np.split(arr, 2, axis=1)
# stepIndex = data.index[np.ediff1d(np.concatenate([[y[0]], y])) >= 1]
# print(stepIndex)

window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))

exp_kernel_left = ((1 - exp_alpha) ** np.arange(1, pad_left + 1))[::-1]
exp_kernel_right = (1 - exp_alpha) ** np.arange(1, pad_right + 1)
exp_kernel = np.concatenate([exp_kernel_left, [1], exp_kernel_right])

avg_values = []
for i in range(len(y)):
    window = y_padded[i:i+window_len]
    exp_sum = exp_kernel[~np.isnan(window)].sum()
    exp_avg = np.nansum(window * exp_kernel) / exp_sum
    avg_values.append(exp_avg)


# Savitzky-Golay filter implementation
dataIIR = data
dataIIR['y'] = signal.savgol_filter(data['y'], 101, 2)
plt.figure()
plt.plot(dataIIR['x'], dataIIR['y'], linewidth=0.2)

plt.show()
hasty grail
#

you need to run the whole thing probably

lean wharf
#

Yeah I'm doing so

#

Possible unbalanced tuple unpacking with sequence defined at line 785 of numpy.lib.shape_base: left side has 2 label(s), right side has 0 value(s)

hasty grail
#

not sure which line of your code that is on

lean wharf
#

Oh, didn't copy over

#

It's line 15, so x, y = np.split(arr, 2, axis=1)

hasty grail
#

maybe that's just an error of the interpreter

#

as long as arr indeed has 2 columns it should be ok

lean wharf
#

That's what I'm thinking also - I've ignored that specific message up until now - but it's not wanting to run

#

But it keeps throwing

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
at me
hasty grail
#

yeah print their shapes

lean wharf
#

It doesn't return anything. It won't run due to the index error

hasty grail
#

comment it out then

lean wharf
#

exp_kernel returns (201,)

lavish finch
#

hii, sorry random question! was curious if there's a way to write code to create columns in google sheets? i want to insert values from one sheet into another by matching location names. is there a way to do that? i know how to do it in a jupyter notebook, but don't know how to apply it to google sheets

hasty grail
#

how about window?

lean wharf
#

Ah, window is (201, 201)

hasty grail
#

wait what

#

print y_padded.shape and y.shape

lean wharf
#

(4200, 201)
(4000, 1)

#

hmm, y should be a single column

#

unless I'm mistaken

hasty grail
#

oh I think I know why, the dimensions are not squeezed after split

#

maybe you should do the more straightforward x, y = arr[:, 0], arr[:, 1] instead
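The shapes reported above, (4000, 1) and (4200, 201), fit this explanation: `np.split` keeps the column axis, and `np.pad` then pads both axes. A small sketch with a 4x2 stand-in array (the names are illustrative):

```python
import numpy as np

arr = np.arange(8, dtype=float).reshape(4, 2)  # stand-in for data.to_numpy()

# np.split keeps the split axis, so each piece is still 2-D: shape (4, 1)
x_col, y_col = np.split(arr, 2, axis=1)

# Plain column indexing drops that axis and gives 1-D arrays: shape (4,)
x, y = arr[:, 0], arr[:, 1]

# Padding a (4, 1) array pads rows AND columns, which is where the extra columns came from.
padded_2d = np.pad(y_col, (2, 2), constant_values=np.nan)
padded_1d = np.pad(y, (2, 2), constant_values=np.nan)

print(x_col.shape, x.shape, padded_2d.shape, padded_1d.shape)
```

Calling `y_col.squeeze()` or `y_col.ravel()` would be another way to get back to one dimension.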

lean wharf
#

Awesome, they're all single column now

#

So, this should smooth the step theoretically now?

hasty grail
#

try

lean wharf
#

I'm plotting x against avg_values just to clarify?

hasty grail
#

yeah

lean wharf
hasty grail
#

the end result is Fig3 right?

lean wharf
#

Thanks for soldiering through it

#

Fig3 is the Savitzky-Golay filter

hasty grail
#

ah

lean wharf
#

The MA has maintained the step gradient better

hasty grail
#

if you want it to be smoother you can try adjusting alpha

#

you can try to selectively smooth the graph only close to the points where the jump is significant

lean wharf
#

that's what I was trying to do earlier with the +100/-100

#

and then somehow make it sine wave like to join the steps

#

I'm gonna call it a day though I think. Thanks again for all your help, it's been super appreciated! @hasty grail

hasty grail
#

np

meager delta
#

Does somebody know of a numpy function, which takes for example 3 vectors, and represents the first as a linear combination of the others two (I need this for representing a face, as a linear combination of its eigenfaces)

merry ridge
#

numpy.linalg.solve?

#

Maybe that won't work depending on the rank of the matrix you construct from the two vectors

#

Looks like numpy.linalg.lstsq would be able to handle it for a general mx2 matrix to me.

mild topaz
#

hii, i have image recognition model which recognizes passport and driving_licence images

#

as we know some countries have statewise driving_licence

#

how can i make my code check whether a licence is statewise or countrywise?
my image recognition model basically recognizes documents like "passports" and "driving_licence"
some countries issue driving_licence "statewise" and some "countrywise"

#

as you know, some countries have their driving_licence statewise

#

for e.g. "usa", "australia", "india"
some countries have "countrywise"
for e.g. "albania", "united_kingdom" etc

#

how can i add a condition for when the country has states and the user does not provide a state name

#

so it should return "provide valid state name"

feral spoke
#

Guys I need some help with pandas

#

Suppose there are two columns with values in it.

#

I want to find the entry that has the highest difference in values.

#

How should I go around it?

#

nvm guys

keen pine
#

df.loc[(df['A']-df['B']).idxmax()] , this should be useful.
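A short sketch of that one-liner on made-up data. Whether "highest difference" means signed or absolute is not stated in the question, so both are shown; the column names `A` and `B` follow the message above:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 4, 7], "B": [3, 15, 6]})  # made-up values

# Signed difference: the row where A exceeds B by the most.
largest_signed = df.loc[(df["A"] - df["B"]).idxmax()]

# Absolute difference: the row where the two columns are furthest apart.
largest_abs = df.loc[(df["A"] - df["B"]).abs().idxmax()]

print(largest_signed.name, largest_abs.name)
```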

feral spoke
#

got the solution

#

thnx for the help tho @keen pine

meager delta
#

Looks like numpy.linalg.lstsq would be able to handle it for a general mx2 matrix to me.
@merry ridge I need to find the linear combination of the vectors, that form the first one, so I can take the coefficients, and put them into a weight vector (If you are familiar with eigenfaces, this is used to reconstruct the main face from the eigen ones + the mean one)

merry ridge
#

I don't see why that won't do what you're asking

keen pine
#

@feral spoke dmi.

merry ridge
#

Given a v_3 in span{v_1, v_2} linalg.lstsq finds the minimizer of the norm of Ax-b where A = [v1, v2]. and b = v3

#

the minimizer x is precisely the coefficients that satisfy x[0]v_1 + x[1]v_2 = v_3

#

Obviously if the norm is greater than some epsilon tolerance level, then v_3 is not in the span and there is no solution. I am assuming you already know that v_3 is in the set otherwise you will have to measure the norm at each step and do some error handling if you try to do this.

meager delta
#

I will try it with two basis orthogonal vectors, and one other, which must be in the span, just to test for now

merry ridge
#

That sounds like a good approach

meager delta
merry ridge
#

Great!

meager delta
#

But what about this one? It seems that the function is returning me wrong coefficients... I mean those 2 vectors are linearly independent, so it has to return me the right solutions

merry ridge
#

I haven't used this function before. Let me load up a Jupyter notebook and see

#

1, 8 is a perplexing answer

#

Ok, it looks like when you type [[2,1] , [0,1]] that is inputting your vectors as the rows, not the columns

meager delta
#

I think that I have to pass it like [[2, 0], [1, 1]]

#

And this way it would work

#

Like a T-Matrix

merry ridge
#

So if you do 1*[2,0] + 8*[1,1] you get [10, 8] as required

#

Yeah, you basically need to transpose it
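The row/column mix-up can be reproduced in a few lines. The screenshots are not part of the log, so the target vector `[10, 8]` is an assumption inferred from the `1, 8` answer discussed above:

```python
import numpy as np

v1 = np.array([2.0, 1.0])
v2 = np.array([0.0, 1.0])
target = np.array([10.0, 8.0])  # assumed target, inferred from the discussion

# lstsq solves A @ coeffs ~= target, so the basis vectors must be the COLUMNS of A.
A = np.column_stack([v1, v2])          # [[2, 0], [1, 1]]
coeffs, residuals, rank, _ = np.linalg.lstsq(A, target, rcond=None)

# Passing the vectors as rows solves a different system and gives [1, 8].
coeffs_rows, *_ = np.linalg.lstsq(np.array([v1, v2]), target, rcond=None)

print(coeffs, coeffs_rows)
```

Here `coeffs[0] * v1 + coeffs[1] * v2` reproduces the target, which is the property needed for the eigenface weights.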

keen pine
#

hi, i have a problem with my model in PyTorch Lightning. first, i wonder what could cause validation loss to increase while train loss decreases

#

this has taken me 3 days. in the sample above, the validation set and train set are the same.

merry ridge
#

From a pure linear algebra perspective, passing in [2,0] [1,1] feels very unnatural, but it makes sense

meager delta
#

This way works completely fine, I can pass my vectors as array, and numpy will do the transposition, instead of manually changing

merry ridge
#

Yeah. I'm just spoiled by matlab notation so whenever I start manipulating vectors in python I turn stupid

meager delta
#

Thank you for the help!

merry ridge
#

Yep, good luck!

wintry oyster
#

does sorting algorithms fit here?

tidal bough
keen saddle
#

hey guys, if anybody has experience with pandas I have a question over in #help-corn
I'd be grateful for any help 😄

hoary agate
#

Yo

hoary agate
#

I started working last week, and i'm joining the I.T team as a minor

#

My boss really wants me to learn Python so i've taken a course to learn the basics of the language

#

I already worked with C# before so im mildly familiar with programming

#

But my boss has been demanding that i do some data scraping for him lately

#

And i don't really know how to do that

#

So i just wanted to ask if anyone is willing to help me learn a little bit about it

#

If no one is that is fine, it's not your job to take requests from randoms and i know it's demanding

#

I just really don't know how to start doing this

limber ledge
hoary agate
#

Thank you very much

#

I have no idea how i can repay you

boreal summit
#

Hello everyone, is it possible to use ipython without installing Jupyter notebooks? I ask because I use vs code for data science.

#

I can send you some YouTube links to learn web scraping, and I have a PDF specifically for data scraping.

hoary agate
#

Damn that's really kind of you

#

I'd really appreciate if you did

boreal summit
#

I've the PDF on my PC.

hoary agate
#

Where did you originally get it?

boreal summit
#

Check out <<data school>> on YouTube, they can give you a good foundation for learning web scraping, after which you can move on to data scraping and web crawlers.

#

I downloaded it from YouTube to my phone.

#

Although, I've watched all of them

#

There's also this book I studied earlier this year, it gives a solid summary and foundation for web scraping.

#

Python projects for beginners by Connor Milliken.

#

That's the book title.

#

Do you use WhatsApp?

hoary agate
#

I do, but i don't have a personal number

#

The company gave me the phone since i couldnt buy one, but i can only use it for work

boreal summit
#

DM me, I'll send you the YouTube links and PDFs. I'll tell you where to read so you can go straight to web scraping and stuff.

flat quest
#

@boreal summit u can just run ipython in the terminal and it'll work

Also vscode has support for jupyter notebooks, if u want to use that

boreal summit
#

Yea, I use Jupyter notebooks in vs code and run code in the terminal. When I used Jupyter notebooks, I could just search ipython in the windows search bar and use it as a stand alone application. @flat quest

#

So that's why I was wondering if I could use ipython as a stand alone application without installing Jupyter notebooks.

#

I currently use notebooks in vs code for data science projects.

rugged shale
#

Hello guys, I'm working on a project for my portfolio. I'm trying to predict how long it takes for a dog to be adopted from an animal shelter. After a couple of rounds of preprocessing and feature engineering, I thought it was time to do some encoding. But when it comes to the dog's breed, there are about 150 unique breeds available. So, for this project, I'm thinking about encoding the 5 most frequent breeds like 'isGermanSheperd' or 'isPoodle' and the other, less frequent breeds in an 'otherRaces' variable. What do you guys think about this strategy? And in this case, are there other strategies you would suggest?
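The strategy described here (keep the most frequent categories, lump the rest together) can be sketched with pandas. The column name `breed`, the sample rows and `top_n = 2` are all made up for illustration:

```python
import pandas as pd

# Made-up sample; "breed" is a hypothetical column name.
df = pd.DataFrame(
    {"breed": ["Poodle", "Poodle", "German Shepherd", "Beagle", "Pug", "Poodle", "Beagle"]}
)

top_n = 2  # the message suggests 5; 2 keeps the toy example readable
top_breeds = df["breed"].value_counts().nlargest(top_n).index

# Keep frequent breeds, collapse everything else into a single "Other" level.
df["breed_grouped"] = df["breed"].where(df["breed"].isin(top_breeds), "Other")

# One indicator column per remaining level.
encoded = pd.get_dummies(df["breed_grouped"], prefix="is")
print(encoded.columns.tolist())
```

Other options for high-cardinality categories include frequency encoding or target encoding, which avoid choosing a cutoff but need care to prevent leakage.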

dusk aspen
#

hi, i need some help. i am trying to compare three images and find out which of two is closest to the third. for example, if i had one picture that is red and another that is blue, and a third that is almost fully blue, the program would decide which of the first two is closest to the third. someone told me to make a siamese network but i don't know how

ivory panther
#

I have a question about ANNs, not Python, and I would like your opinion. I have a multivariate time series that I used to train a multilayer perceptron. When I use just one layer with 100 neurons I get a result like this, and when I increase the number of layers I get much worse results. Is this because I have a very small training set (around 50)?

velvet thorn
#

@gray sedge

#

use np.random.randint(1, 9, size=(3, 5))
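The original question is not in the log, so this is only a sketch of what that call produces: a 3x5 array of integers where the lower bound is included and the upper bound is not.

```python
import numpy as np

values = np.random.randint(1, 9, size=(3, 5))  # integers 1..8; the upper bound 9 is excluded
print(values.shape)
```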

gray sedge
#

@velvet thorn I love you

velvet thorn
#

thank you

gray sedge
#

you just solved 40 minutes of frustration in 30 seconds you are a god

velvet thorn
#

yeah, it can get confusing

#

part of solving these kinds of problems is knowing what to Google

#

I would probably have tried "generate random integers numpy"

#

yeah, that gives np.random.randint as the first result 🙂

gray sedge
#

I have that part correct further up the page, but there was just too many things for me to comprehend

#

That was step # 2 of part E, have done like 90 others before getting to this one

velvet thorn
#

it can be overwhelming, too

gray sedge
#

@velvet thorn First time combining that with size in the same line, I overthought it. Thank you so much I may need advice again before this part is over, but I'll probably be able to google my way through it

velvet thorn
#

sure

#

just ask here, I guess

#

don't tag me though please

gray sedge
#

sorry about that

velvet thorn
#

just now was fine

#

I mean, in the future

#

sorry for the misunderstanding

#

like in general don't tag anyone who hasn't replied to your new question, I guess

gray sedge
#

That makes sense, I've done it when someone's question goes unanswered for quite a few messages

#

I'm stuck again but I'm gonna pick it back up tomorrow I'm overwhelming myself

lean wharf
#

Does anyone have any experience implementing a smoothstep function?

#

I've got some code that provides a value for the centre of the jumps as an array [ 999 1999 2999]

verbal haven
#

what about applying a low pass filter on the slices?

#

ive never smoothed a signal with steps like that tbh

hasty grail
#

Was yesterday's result not good enough?

flat quest
#

i mean the general method for dealing with those kinds of time series is ARIMA @lean wharf

If you want the neural network to figure it out thats a whole nother matter

#

i can't quite remember, but I'm pretty sure i could use ipython from terminal without installing jupyter notebooks @boreal summit

It was a while back tho so i might be wrong

hasty grail
#

Slightly rewritten to be more efficient

window_len, exp_alpha = 201, 0.5

pad_left, pad_right = window_len // 2, (window_len - 1) // 2
y_padded = np.pad(y, (pad_left, pad_right), constant_values=(np.nan, np.nan))
exp_kernel = (1 - exp_alpha) ** np.abs(np.arange(-pad_left, pad_right + 1))

def get_avg_values():
    for i in range(len(y)):
        window = y_padded[i:i+window_len]
        mask = ~np.isnan(window)
        yield np.average(window[mask], weights=exp_kernel[mask])

avg_values = np.fromiter(get_avg_values(), float, count=len(y))
velvet thorn
#

i can't quite remember, but I'm pretty sure i could use ipython from terminal without installing jupyter notebooks @boreal summit

It was a while back tho so i might be wrong
@flat quest yes, Jupyter relies on IPython

flat quest
#

yes it does
but the question is if its also standalone.

Ah looks like it is. There's an ipython package u can just download

velvet thorn
#

yes it does
but the question is if its also standalone.

Ah looks like it is. There's an ipython package u can just download
@flat quest yeah, that was what I meant to say

#

but I got distracted

#

🥴

lean wharf
#

@hasty grail it did the job of smoothing the whole signal, but not joining the steps

#

I'm nearly getting there. I know I need something like a sigmoid function to connect them

#

What I've got thus far

#_______________ Sigmoid function to smooth steps _______________
xs = np.array(x)
ys = np.array(avg_values)

diff = ys[1:] - ys[:-1]
indexBool = diff > 0.385 # Variable adjusted to fit number of steps
index = np.argwhere(indexBool).reshape(-1)

step1x = xs[(index[0]-100):(index[0]+100)]
step1y = ys[(index[0]-100):(index[0]+100)]
step2x = xs[(index[1]-100):(index[1]+100)]
step2y = ys[(index[1]-100):(index[1]+100)]
step3x = xs[(index[2]-100):(index[2]+100)]
step3y = ys[(index[2]-100):(index[2]+100)]

def sigmoid(x, mi, mx): 
    return mi + (mx-mi)*(lambda t: (1+200**(-t+0.5))**(-1) )( (x-mi)/(mx-mi) )

# Alternative to sigmoid junction
def smoothclamp(x, mi, mx): 
    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )


plt.figure()
plt.plot(xs, sigmoid(x, y[index[0]-100], y[index[0]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, sigmoid(x, y[index[1]-100], y[index[1]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, sigmoid(x, y[index[2]-100], y[index[2]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')
plt.plot(xs, ys)
plt.plot(step1x, step1y)
plt.plot(step2x, step2y)
plt.plot(step3x, step3y)

plt.show()
#

So I basically need those purple plots scaled down on the x axis

#

I'm just having some difficulties now where:

plt.plot(step1x, sigmoid(step1x, y[index[0]-100], y[index[0]+100]),'b-', lw=3, alpha=0.5, label='sigmoid')

Only takes a "slice" of the data, and isn't scaled down to it if you get my meaning:

#

I think what I actually want is
```python
plt.plot(xs, sigmoid(step1x, ys[index[0]-100], ys[index[0]+100]),'b-', lw=1, alpha=0.5, label='sigmoid')
```
but then we get a similar problem to yesterday where:
```
ValueError: x and y must have same first dimension, but have shapes (4000,) and (200,)
```
merry ridge
#

It's not clear what you are actually trying to achieve. You just want to replace the colored regions by a smooth continuous function without a jump discontinuity?

lean wharf
#

Yes basically

#

So replace these 200 values or so for y in each region with a smooth function

merry ridge
#

It's not clear to me what the difficulty is. You can just choose basically any interpolating function you want and fix it to join those points

#

To me the laziest solution would be to just cut out say the yellow part and interpolate the start and the end of the yellow section by a straight line

#

and if that is not sufficiently smooth at the connecting points, pass the signal through a one dimension heat equation or some other mollifier to smoothen it out

#

in a neighborhood of the connecting points, not the whole signal

#

You could also try connecting it with spline or something, but it really depends on what properties you require in that section

lean wharf
#

At this stage the laziest solution is best

#

I still want the 200 values in that region though, so I can slice them back into the original array

merry ridge
#

so just write down a parametric equation of a line that intercepts those two points and evaluate it at 200 points?
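A minimal sketch of that suggestion, on a toy step signal rather than the real data. The half-width of 100 samples matches the +/-100 used earlier; every name here is illustrative:

```python
import numpy as np

# Toy signal with one jump; stand-in for the real data.
x = np.linspace(0.0, 1.0, 1000)
y = np.where(x < 0.5, 0.0, 1.0)

jump = int(np.argmax(np.diff(y) > 0.5))   # index just before the step
half_width = 100
start, stop = jump - half_width, jump + half_width   # 200 samples are replaced

# Straight line from the value at `start` to the value at `stop`, one point per sample.
y_lin = y.copy()
y_lin[start:stop] = np.linspace(y[start], y[stop], stop - start, endpoint=False)

print(np.max(np.abs(np.diff(y_lin))))
```

Because the replacement has exactly `stop - start` values, it slots back into the original array without changing its length.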

shadow quiver
#

Hey guys. I'm training a model where the training data is completely different in every epoch, but the model still overfits. How is that possible?

lean wharf
#

I was attempting to get the properties of a sigmoid function @merry ridge

merry ridge
#

If you have that plot already then what is the problem?

lean wharf
#

The black line was drawn to demonstrate sorry, should have specified

tidal bough
#

like Hexicle said, just write down the equation and evaluate it at the points

lean wharf
#

That's the bit I'm struggling with I think. I'm an EE engineer, I'm still a python novice relatively speaking

tidal bough
#

to be more specific, write a function that takes an array of X and Y values, calculates the equation of a sigmoid function that passes through the first and the last of the points, and then evaluate it at each of the X points, and return the result

#

let's see...

merry ridge
#

This is just high school math, it's function compositions to shift and scale

#

like take S(x), center it by function composition with S(x-0.7) or whatever the mid point of the deleted data is

#

etc

tidal bough
#

well, there is a problem of the sigmoid function, strictly speaking, not passing through 0 and 1 ever 😛

lean wharf
#

I understand the maths, it's being able to modify the values of the array

merry ridge
#

Right, but you don't need it to pass through 0 or 1

#

Just restrict the domain and truncate it at some y values

#

then scaling the sigmoid will match it up for you

#

What I would do is to identify which indices your yellow parts contain then use a lambda function to plot a sigmoid over a linspace of 200 points in a list and then shove those values into where the yellow part was
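The recipe above (find the indices, evaluate a sigmoid on a linspace, write the values back) could look roughly like this. It is a sketch on a toy step signal; the truncation at +/-6 and all names are arbitrary choices, not something from the discussion:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1000)
y = np.where(x < 0.5, 0.0, 1.0)            # toy step signal

jump = int(np.argmax(np.diff(y) > 0.5))
start, stop = jump - 100, jump + 100        # the 200 samples to replace


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


# Evaluate the sigmoid on a truncated domain, then rescale so it runs exactly 0 -> 1.
t = np.linspace(-6.0, 6.0, stop - start)    # +/-6 is an arbitrary truncation choice
s = sigmoid(t)
s = (s - s[0]) / (s[-1] - s[0])

# Shift and scale to join the signal values at both ends of the region.
y_smooth = y.copy()
y_smooth[start:stop] = y[start] + (y[stop - 1] - y[start]) * s

print(np.max(np.abs(np.diff(y_smooth))))
```

The rescaling step is what handles the point that a sigmoid never reaches 0 or 1 exactly: after truncation it is stretched so the endpoints match.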

lean wharf
#

Hmm, I'll give that a shot

sly magnet
#

I want to add L_0.5 loss in my model while training
i have written :: loss = tf.reduce_sum(tf.pow(tf.abs(self.Coef),0.5))
But its giving NaN Error!
Whereas its working perfectly with L1 loss and L2 loss
loss = tf.reduce_sum(tf.square(self.Coef))
loss = tf.reduce_sum(tf.abs(self.Coef))
the above 2 lines are working perfectly, but i want to use L_0.5 loss....How to do that?
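A likely cause (an assumption, since the model itself is not shown) is the gradient: the derivative of |w|^0.5 is 0.5 * sign(w) * |w|^-0.5, which is 0 * inf at w = 0 and therefore NaN. The sketch below reproduces this in NumPy and shows a common workaround with a small epsilon; `l_half_grad` and `eps` are hypothetical names:

```python
import numpy as np


def l_half_grad(w, eps=0.0):
    """Gradient of sum((|w| + eps) ** 0.5) with respect to w."""
    return 0.5 * np.sign(w) * (np.abs(w) + eps) ** -0.5


w = np.array([0.0, 0.25, -1.0])

with np.errstate(divide="ignore", invalid="ignore"):
    raw = l_half_grad(w)                # 0 * inf at w == 0 -> nan

stabilised = l_half_grad(w, eps=1e-8)   # finite everywhere

print(raw, stabilised)
```

In TensorFlow the analogous change would be `tf.reduce_sum(tf.pow(tf.abs(self.Coef) + eps, 0.5))` with a small constant `eps`. L1 and L2 do not hit this because their gradients stay finite at zero.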

tidal bough
#

a-ha

lean wharf
tidal bough
#

oh, lol, you made it too 😅

merry ridge
#

Good work

lean wharf
#

Yeah, had to use smoothclamp instead of sigmoid though

tidal bough
#

here's mine:

import numpy as np
from scipy.special import expit # for single values, manual implementation is faster, but expit is better for arrays
def sig_approx(X,Y,x_scale=10):
    X = X.copy()
    middle = X[len(X)//2]
    X -= middle
    x_coeff = (2*x_scale)/(X[-1]-X[0])
    X = X*x_coeff
    return Y[0]+(Y[-1]-Y[0])*expit(X)

# plotting stuff:
%matplotlib widget
import matplotlib.pyplot as plt
#test case:
end = 10
X = np.linspace(0,end,100)
midpoint = X.shape[0]//2
Y = np.zeros(X.shape)
Y[midpoint:] = 1

#usage:
inds = slice(midpoint-10,midpoint+10)
Y[inds] = sig_approx(X[inds],Y[inds])
plt.plot(X,Y)
lean wharf
merry ridge
#

How are you scaling it?

lean wharf
#
    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )
chrome laurel
#

so i am watching a kind of outdated course on pandas. does anyone know what happened to ix[], and is there an equivalent?
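For reference, `.ix` was deprecated and later removed from pandas; its two behaviours are covered by `.loc` (labels) and `.iloc` (integer positions). A small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

by_label = df.loc["y", "b"]      # label-based, what .ix did when given labels
by_position = df.iloc[1, 1]      # position-based, what .ix did when given integers

print(by_label, by_position)
```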

merry ridge
#

I mean how are you scaling the sigmoid

tidal bough
#

How are you scaling it?
Such that the first X gets changed to -10, and the last one to 10.
and by the Y axis - by Y[-1]-Y[0]

merry ridge
#

Sorry I mean Aromasin's plot

lean wharf
#
diff = ys[1:] - ys[:-1]
indexBool = diff > 0.385 # Variable adjusted to fit number of steps
index = np.argwhere(indexBool).reshape(-1)

def smoothclamp(x, mi, mx): 
    return mi + (mx-mi)*(lambda t: np.where(t < 0 , 0, np.where( t <= 1 , 3*t**2-2*t**3, 1 ) ) )( (x-mi)/(mx-mi) )

plt.plot(xl, sigmoid(yl, ys[index[0]-100], ys[index[0]+100]))
#

So the top set of code returns the point where the step happens

merry ridge
#

What is the definition of your sigmoid function

lean wharf
#

I didn't use sigmoid in the end, my sigmoid def was: def sigmoid(x, mi, mx): return mi + (mx-mi)*(lambda t: (1+200**(-t+0.5))**(-1) )( (x-mi)/(mx-mi) )

merry ridge
#

I would have defined it differently, to be honest, I don't know why that notation even works

#

Define the signal as f(t) and the sigmoid as S(t) = exp(t)/(1+exp(t)).

#

Find the interval [a,b] that contains the yellow part

tidal bough
#

1/(1+exp(-t)) is one less exponent 😉

merry ridge
#

Then replace S(t) by S(t - (a+b)/2). Call this function g(t). Then find a constant K such that Kg(b) = f(b) so that K = f(b)/g(b). Then use g(t)*f(b)/g(b)

lean wharf
#

Yeah my implementation is scrappy as hell

merry ridge
#

I don't even understand that implementation. You have a (mx-mi) on the left, and a "/mx-mi" on the right. Those would cancel? I don't understand the notation here

tidal bough
#

@merry ridge The last parenthesis group is the argument passed to the lambda

merry ridge
#

Oh, thanks

tidal bough
#

so it's (mx-mi) * f( (x-mi) / (mx-mi) ), which is about right

lean wharf
#

I could probably rewrite it like:

from scipy.special import comb

def smoothstep(x, x_min=0, x_max=1, N=1):
    x = np.clip((x - x_min) / (x_max - x_min), 0, 1)

    result = 0
    for n in range(0, N + 1):
         result += comb(N + n, n) * comb(2 * N + 1, N - n) * (-x) ** n

    result *= x ** (N + 1)

    return result
tidal bough
#

@lean wharf I highly suggest you generally split code into more lines - it's more readable for us, and believe me, you too are going to regret this in a week when you try to read the code and can't 🙂

merry ridge
#

But if you are evaluating at (x-mi)/(mx-mi) that isn't what you want

lean wharf
#

Yeah, I do generally, just code vomiting atm till it works

#

Where N is how smooth I want the curve

#

Probably a tad more legible

tidal bough
#

that code looks like it'd be inefficient, if it works

#

I don't quite get what's happening, but you generally want to vectorize things when possible

lean wharf
#

Yeah, I've read that vectorizing is more efficient in python but again I'm still relatively new to it

vague jetty
#

I'm totally drawing a blank - what's the name for the method of determining the statistical significance of multiple variables on an output? It's not ANOVA, it's <<something>> <<something>> analysis

#

Nvm, it's principal component analysis

safe sparrow
#

im trying to learn the exponent using tf.math.pow in Tensorflow Keras

#

my layer is created doing

#
class Dense_Power(Layer):
    def __init__(self, **kwargs):

        super(Dense_Power, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight('kernel',
                                      shape=(input_shape[1],),
                                      initializer=tf.keras.initializers.glorot_uniform(),
                                      trainable=True)

        # Create a trainable weight variable for this layer.
        self.power = self.add_weight('power',
                                      shape=(input_shape[1],),
                                      initializer=tf.keras.initializers.glorot_uniform(),
                                      trainable=True)

        super(Dense_Power, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        power_val = tf.math.pow(x, self.power)
        dot_prod = tf.linalg.matmul(power_val, self.kernel)
        return dot_prod

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1])
#

however i only get nan's in my training doing this

#

as it somehow invalidates all weights in my model
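
One plausible cause, offered as a guess since the thread never resolves it: raising a negative number to a non-integer power has no real result, so floating-point libraries return NaN, and a single NaN in the forward pass spreads through the gradients to every weight. The sketch below uses NumPy rather than TensorFlow to show the rule in isolation; `signed_power` is a hypothetical workaround, not something from the original layer.

```python
import numpy as np

x = np.array([-2.0, 0.5, 4.0])
exponent = 0.5  # glorot_uniform draws non-integer values like this

# A negative base with a fractional exponent is not a real number.
with np.errstate(invalid="ignore"):
    raw = np.power(x, exponent)          # first entry is nan


def signed_power(values, power):
    """Hypothetical workaround: apply the power to |x| and restore the sign."""
    return np.sign(values) * np.power(np.abs(values), power)


safe = signed_power(x, exponent)         # finite everywhere
```

If the inputs can also be exactly zero, the gradient with respect to the exponent involves log(x), so a small positive floor on |x| may be needed as well.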

gritty jackal
#

While reading the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, I came across this equation for the partial derivative used in batch gradient descent. In batch gradient descent, we try to minimize the prediction error by finding appropriate weights for our features. To find the weights, batch gradient descent calculates the partial derivative of the cost function, as in the equation below, using the weights that were randomly initialized at the start of training, the input values, etc. (I'm confused about what all the values are). I can't work out all the variables in this equation: theta is the randomly initialized weights for sure, but what are xi and xij outside the braces? I believe yi should be the actual value of the dependent variable, but I need confirmation because it could also be the predicted value.

#

Tried looking on the internet, but could not find the explanation of this same equation anywhere.

tidal bough
#

@gritty jackal Fairly sure it's the actual value, because Theta^T @ X is how you get a prediction.

#

So (Theta^T @ x^i - y^i) is just the prediction error of this point.

gritty jackal
#

@tidal bough alright that makes sense, but what about x outside the braces? which value is that?

tidal bough
#

x^i_j is the jth component of the ith input point

gritty jackal
#

and x^i is input values matrix right?

tidal bough
#

it might be easier to see if you see how this equation is derived

#

the cost function is:

1/m * Sum(i from 1 to m)[(Theta^T @ X^i - y^i)^2]
#

Does that make sense so far?

gritty jackal
#

yes

tidal bough
#

now, we take the derivative with regards to Theta_j

#

that's a bit tricky, since it's a single component of the Theta row vector

#

but we can notice that if we were to expand the product:

Theta^T @ X^i

(for any i), we would notice that there's a single term involving Theta_j:

Theta_j * X^i_j
#

so it's only it that contributes to the derivative.

#

The derivative of the outer sum is just the sum of derivatives of the things that are summed over.

#

Each thing is:

(Theta^T @ X^i - y^i)^2

, the derivative of which is:

2 * (Theta^T @ X^i - y^i) * d/d(Theta_j) (Theta^T @ X^i - y^i)
#

Does that make sense? It's the derivative of a composite function rule, if I remember the name right:

d/dx (f^2 (x)) == 2*f(x)*d/dx (f(x))
merry ridge
#

That is the chain rule.

gritty jackal
#

yup, understood. @tidal bough Thank you so much for your time and effort. Appreciated 👍

tidal bough
#

And then, as I said before, only one of the components of the vector product contributes to the derivative, so that derivative on the right is just X^i_j

#

And so we obtain the right formula
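
The result of the derivation can be checked numerically. The sketch below assumes the mean-squared-error cost written above, uses made-up random data, and compares the analytic gradient (2/m) * Sum_i (Theta^T @ X^i - y^i) * X^i_j against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                      # made-up sizes: m points, n features
X = rng.normal(size=(m, n))       # row i is the input point X^i
y = rng.normal(size=m)            # y[i] is the actual target y^i
theta = rng.normal(size=n)        # randomly initialized weights


def cost(weights):
    """Mean squared prediction error, 1/m * Sum_i (Theta^T X^i - y^i)^2."""
    errors = X @ weights - y
    return np.mean(errors ** 2)


def gradient(weights):
    """Component j is 2/m * Sum_i (Theta^T X^i - y^i) * X^i_j."""
    errors = X @ weights - y
    return (2.0 / m) * (X.T @ errors)


eps = 1e-6
numeric = np.array([
    (cost(theta + eps * unit) - cost(theta - eps * unit)) / (2 * eps)
    for unit in np.eye(n)         # perturb one Theta_j at a time
])
analytic = gradient(theta)
```

The two vectors agree to within floating-point noise, which is what the chain-rule argument predicts.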

gritty jackal
#

Hmm, yes I got it now.

merry ridge
#

If you want to read many pages on this equation, Cosma Shalizi has a very good and free book on advanced data analysis

royal tundra
#

I have a 5gb csv with 11 columns and over 30M rows. I have to connect it to a MySQL db which I will then connect to AWS (I believe RDS) and then access in Tableau.
I am unfamiliar with MySQL and AWS. I know this is a python channel but can someone please help me set this up? I can pay you

gritty jackal
#

@merry ridge Thanks 👍

#

I would surely take a look at that book

merry ridge
gritty jackal
#

That's great

solar bluff
#

Anyone ever created a pandas ExtensionArray and ExtensionDtype

brittle agate
turbid halo
#

damn this stuff looks hard lmfao

flat quest
#

so true lol

Worst is when it keeps jumping up and down like it does with gans

upbeat cradle
#

Hey, what's the best way to get a list or dataframe of the counts of a certain field? I've created a DataFrame using value_counts, but the counts seem to be stored under the field Key, which existed before, and the values I'm counting have no field name now

#
values[:25]

Just shows something like:
          Key
AB     3
BC     2
#

Key is the field of AB, BC, etc

novel remnant
#

np.unique with return_counts=True will return two arrays of the unique values and the count of each unique value. I don't understand exactly what you want but it might help.

upbeat cradle
#

I'm just trying to be able to access the value_counts info as a string

#

I want the value_counts and the AB/BC

#

I can use values.iloc[0][0] to get the value, but not the key (AB/BC)

novel remnant
#

AB and BC are the index of your new dataframe; you can access them with values.index[0], for example.
If you want, AB and BC can become a new column of your dataframe by using values.reset_index(inplace=True, drop=False).
Then you have both the value name and the value counts as columns of your dataframe, which can be accessed with .loc.
Still not sure if that's what you're after
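
A minimal sketch of that suggestion on made-up data; the column name `count` is an arbitrary choice, and `rename_axis` is used so the result does not depend on how a given pandas version names the index:

```python
import pandas as pd

df = pd.DataFrame({"Key": ["AB", "BC", "AB", "AB", "BC"]})  # made-up data

counts = df["Key"].value_counts()      # Series: index = key, value = count

top_key = counts.index[0]              # the label of the most frequent key
top_count = int(counts.iloc[0])        # its count

# Turn the index into a regular column so both fields have names.
table = counts.rename_axis("Key").reset_index(name="count")
```

After this, `table["Key"]` and `table["count"]` are ordinary columns, and `str(top_key)` or `str(top_count)` gives either value as a string.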

mental vortex
#

@lofty meteor Sorry for late reply

#

You still there?

#

Actually, let's move to another channel

lofty meteor
#

Yep

lapis sequoia
#

Hello, I have a general question. I can show the code if someone wants details but the question is like, I had a MATLAB program that i converted to python because it was slow in matlab. the run time in matlab is 0.048 seconds per run, the run time in python is 0.008 seconds, now I am looping the program to verify the speed difference but for some reasons when I loop over python it's slower than looping over matlab.

Just to clarify the loop isn't represented in the body of the code at all
like I have for i=1:3000
xx
end
for matlab where i , is not part of the code
and in python I have
for i in range(3000):
    xx
basically I recorded run time for matlab growth and it doesn't grow linearly while python grows linearly. I just want to see if there is a feature used in matlab to like skip some calculations when looping? that i could import to python as it starts faster

#

in python the time to run the 100th loop is 100 * run time

#

but in matlab it's just 10x

tidal bough
#

probably something specific to what you're doing.

lapis sequoia
#

but i converted the code from matlab to python code per code

#

sec

#

well only thing is I just added the code in matlab to calculate run time

tidal bough
#

oh dear

#

I can suggest using a debugger to see what's different on the 100th iteration from the 1st one, for instance

#

whether the execution time per iteration is constant or not shouldn't depend on the language

lapis sequoia
#

ok that sounds great, how do i do that

tidal bough
#

so something's probably not right

lapis sequoia
#

well in python it is constant

#

but matlab it gets faster

#

for some reason

#

and i want to incorporate that

#

again my python isn't slowing down but time grows linearly

#

matlab grows below linearly

#

as shown in time of run vs number of iteration graph i created

tidal bough
#

...how did you determine that it grows below-linearly?

#

that looks pretty linear to me

lapis sequoia
#

if it was linear time to run 2 iteration = 2 * time to run 1 iteration

#

also it's clear on the end it's flattening

tidal bough
#

not if it has a constant term

lapis sequoia
#

yep you are right

#

but the main idea persists that python does first iteration for 0.008 seconds but the 3500th is 3500* 0.008= 28 seconds

#

while matlab starts with .05 for 1 iteration but does 3500 in 8 seconds

tidal bough
#

this looks pretty linear for me, again

#

you can do a linear regression and calculate the R^2 if you want, but it sure looks like a linear function + some noise.
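
A rough sketch of that check with NumPy. The timings are invented (2 ms per iteration plus a 50 ms fixed cost, with noise), so only the method carries over, not the numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up timings: 2 ms per iteration plus a 50 ms fixed cost, with noise.
iterations = np.arange(100, 3600, 100)
total_time = 0.002 * iterations + 0.05 + rng.normal(0.0, 0.01, iterations.size)

# Least-squares line: total_time ~ slope * iterations + intercept
slope, intercept = np.polyfit(iterations, total_time, 1)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
predicted = slope * iterations + intercept
ss_res = np.sum((total_time - predicted) ** 2)
ss_tot = np.sum((total_time - total_time.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

An R^2 close to 1 says the growth is linear; the fitted intercept is the constant term, and the slope is the cost per iteration.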

lapis sequoia
#

true but that's beside the point

#

there are some calculations matlab doesn't have to redo

#

that python does

#

represented in the constant term

#

as you said

#

my point is that matlab is also slow for me in large iterations

#

and i switched to python because it's faster and it's faster in first 10 iterations

tidal bough
#

but the main idea persists that python does first iteration for 0.008 seconds but the 3500th is 3500* 0.008= 28 seconds
that's not true if there's a constant term.
The right rule is:

(y3-y2)/(x3-x2) == (y2-y1)/(x2-x1)

which should be true for any three points for a linear function.

#

Anyway, so is the total running time slower in Python or not? I don't quite get what you're concerned about.

lapis sequoia
#

it is slower

#

in python

#

anyone know how to turn this list into a dictionary?

["Max "Brown Eye" Scherzer, P (2008-)", "", "Season Pitching", "gamesPlayed: 9", "gamesStarted: 9", "groundOuts: 33", "airOuts: 46", "runs: 19", "doubles: 8", "triples: 1", "homeRuns: 6", "strikeOuts: 69", "baseOnBalls: 17", "intentionalWalks: 0", "hits: 50", "hitByPitch: 1", "avg: .255", "atBats: 196", "obp: .315", "slg: .398", "ops: .713", "caughtStealing: 0", "stolenBases: 5", "stolenBasePercentage: 1.000", "groundIntoDoublePlay: 1", "numberOfPitches: 866", "era: 3.40", "inningsPitched: 50.1", "wins: 4", "losses: 2", "saves: 0", "saveOpportunities: 0", "holds: 0", "earnedRuns: 19", "whip: 1.33", "battersFaced: 216", "gamesPitched: 9", "completeGames: 1", "shutouts: 0", "strikes: 560", "strikePercentage: 64.7", "hitBatsmen: 1", "balks: 0", "wildPitches: 3", "pickoffs: 0", "groundOutsToAirouts: 0.72", "winPercentage: .667", "pitchesPerInning: 17.2", "gamesFinished: 0", "strikeoutWalkRatio: 4.06", "strikeoutsPer9Inn: 12.34", "walksPer9Inn: 3.04", "hitsPer9Inn: 8.94", "runsScoredPer9: 5.01", "homeRunsPer9: 1.07", "inheritedRunners: 0", "inheritedRunnersScored: 0", "sacBunts: 0", "sacFlies: 2", "", ""]
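
One way to do this, sketched on a shortened copy of the list: split each entry on the first ": " and skip entries that have no such separator (the name line, the "Season Pitching" header, and the empty strings). The variable names are made up.

```python
# Shortened, made-up excerpt of the list from the question.
lines = ["Season Pitching", "gamesPlayed: 9", "era: 3.40", "avg: .255", "", ""]

stats = {}
for line in lines:
    # partition returns (before, separator, after); separator is "" if absent
    key, separator, value = line.partition(": ")
    if separator:
        stats[key] = value
```

The values stay strings here; converting them with `float(value)` is a separate decision, since entries such as "inningsPitched: 50.1" may not be ordinary decimals.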

#

there is no constant term

#

this is the run time in python

#

for 3500

#

for 1 seconds iteration i will show u

#

u can see range(1) compared to range (3500)

#

the ratio is 3500

#

for matlab most of the time is in the constant term, so the growth is less

#

do you understand my concern now?

#

while for matlab ratio is 8 seconds / .05 seconds = 160

#

growth

tidal bough
#

for matlab, I'm getting:
0.3966 / 0.2016 * 101 ~= 198.69

#

so it also is exactly proportional to the number of iterations

#

so there's nothing weird going on here - they both scale with the number of iterations directly. The Python implementation is just slower in general.

lapis sequoia
#

how is the python implementation slower when first iteration for python is 0.008 seconds

#

while first one with matlab is

#

like ~0.05

tidal bough
#

sounds like either one first iteration isn't timed right, or there's some bug that causes all the later iterations to be slower

#

I'd check the former first.

lapis sequoia
#

i just have
start=datetime.now()
before the start of the loop

#

and

#

at the end

#

print( datetime.now()-start)

#

of the loop

#

i mean after the loop ends

#

as you said because there is a constant in matlab
ratio of last iteration to first is (slope*100+constant)/(slope*1+constant), if the constant is high then ratio is small, i am not saying the slope isn't constant

#

i am saying the constant is a large component of the run time

#

while for python it's just directly proportional to the ratios

#

yeah i fitted regression and the time is
time=0.0017*iteration+.007

#

while for python it's simply time =0.008*iteration

#

so matlab grows at a lower rate but yes still linearly

#

so is my solution to try to rewrite the python code independent of the matlab code? i think by force imposing the format of the matlab code i probably am not utilizing all the features of python?

#

or maybe it has to do with spyder itself? cause some people told me spyder is not fast

crisp jewel
#

what does sequence[:,:-1] return

desert oar
#

@lapis sequoia use time.perf_counter for timing

#

Spyder shouldn't affect the speed of running python code

#

Python is doing a lot of work in the background, can't speak for matlab but it's probably doing less work as the runtime is more specialized

#

Eg the "first iteration" in Python also involves implicitly calling iter() which takes some overhead

#

Not to mention whatever memory allocation and garbage collection is being triggered intermittently
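
A small sketch of timing with `time.perf_counter`, recording each iteration separately so that one-off costs in the first pass can be told apart from the steady-state cost. The `work` function is a made-up placeholder for the real computation:

```python
from time import perf_counter


def work():
    """Placeholder for the real per-iteration computation."""
    return sum(i * i for i in range(1000))


per_iteration = []
start_total = perf_counter()
for _ in range(200):
    t0 = perf_counter()
    work()
    per_iteration.append(perf_counter() - t0)
total = perf_counter() - start_total

first = per_iteration[0]
average = sum(per_iteration) / len(per_iteration)
```

Comparing `first` with `average` shows whether the first iteration is representative of the rest.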

lapis sequoia
#

i think one important factor is in matlab i can clear all variables except the ones i need and maybe that makes it faster?

#

while with python all variables stay in memory through the run

desert oar
#

Matlab probably has a lot of optimizations that python does not have

#

That isn't likely to make a big difference but it might help trigger garbage collection at more regular intervals

lapis sequoia
#

well the thing is everyone who used python for their calculations told me it's faster than matlab

#

but who knows i guess i have to know how to write a fast python code

#

rather than convert a matlab code to python

#

i mean i wrote the code in matlab and tried to convert it line per line and probably that's not the best thing

desert oar
#

That I don't know, but in general porting code from one language and runtime to another is not a guarantee that you'll get a fair comparison

#

You posted both versions above?

lapis sequoia
#

yes

#

well it says matlab is shell but it's just matlab, but because in matlab comment is %

#

but it i meant the discord server xd

desert oar
#

is there a "short version"

#

this is a lot of code

#

and what exactly are you concerned about

#

the matlab code seems faster per iteration than the python code?

#
    e=np.array([2e5]);   # Young's MODULUS OF ELASTICITY
    g=np.array([1e5]);   # MODULUS OF RIGIDITY 
    den=np.array([7850]); # MASS DENSITY 

why are you creating all these length-1 arrays?

lapis sequoia
#

because the form

#

has to accept multiple size

#

i mean i am coding it for running over various values

#

of e,g

#

etc

desert oar
#

and what is it with matlab programmers and horrible variable names 😛

#

would it kill you to write density instead of den?

#

i dont understand the point of this though

#

what is igtyp supposed to be

lapis sequoia
#

well this is finite element modelling, so igtyp is like whether u have a tower made of same material

#

or different material

#

if i have 3 types of material i would have [1 2 3]

#

and each value would correspond to a type

desert oar
#

oh sorry i meant imtyp

#
rho = den[(imtyp-1).astype(int)]
lapis sequoia
#

imtype is material property

#

igtype is geometric properties

desert oar
#

it looks like you're just trying to broadcast a number into some shape, right?

#

since igtyp and imtyp are all 1?

lapis sequoia
#

yeah but they need not be

#

basically i am drawing a beam

#

that beam could be uniform or it could have like increasing cross section

#

or like different types of metals

#

and each node represents a segment

desert oar
#

right, but practically you're using imtyp to "expand" this density into a matrix of some size

#

is that right?

lapis sequoia
#

yes

#

em=np.array(e[(imtyp-1).astype(int)])
gm=np.array(g[(imtyp-1).astype(int)])

#

em will be a 40x1 vector

#

let's say i had 4 types of element: e would have 4 elements, and i would have em(0)... em(9) equal to e(0)

#

then em(10) to em(19) equal to e(1) etc

#

my em would have each element as a sub of the possible building blocks defined in e

#

same for gm

desert oar
#

ok. im not sure about matlab but in numpy you can just do this

DENSITY = 7850.0

...

rho = np.full(imtyp.shape, DENSITY)
#

or better yet

rho = np.full_like(imtyp, DENSITY)
lapis sequoia
#

instead of den[(imtyp-1).astype(int)]

#

?

desert oar
#

yes

#

should be somewhat more efficient

#

and also easier to read / less confusing

lapis sequoia
#

i needed to use astype int

#

cuz it kept telling me things like it's floating or tuple or something

#

i don't remember

desert oar
#

thats fine,

rho = np.full_like(imtyp, DENSITY, dtype=int)

lapis sequoia
#

i keep having to use as type int when i try to index

desert oar
#

oh

#

i see

#

well you arent using any indexing here

#

all this is saying is, "make a new array in the same shape as imtyp and fill it with DENSITY"

#

the actual contents of imtyp are irrelevant

#

making an array of 1s just to "expand" numbers to arrays isn't necessary at all ever in numpy

lapis sequoia
#

no the thing is let's say if imtype was [ 1 1 2 3 1 ] then it should have [ den(1) den(1) den(2) den(3) den(1) ]

#

would that code actuate that?

desert oar
#

oh, no

#

but den is only 1 element

#

so that would be an error anyway...

lapis sequoia
#

den can have as many elements as there are unique elements

#

of imtype

#

den has 1 element because in the simple example i am doing

desert oar
#

ah, ok

#

in that case never mind

lapis sequoia
#

basically this is like me calling each element an ID and associated a set of values to that ID

desert oar
#

right

#

this is fine then

slender nymph
#

hi folks

#

Simulate a portfolio of home insurance policies (5,000 homes insured).
The value of damages is distributed according to a Uniform law between $ 250,000 and $ 2.25 million.
An “accident” can occur with probability p. If this is the case, there is a probability q that the damage is the maximum possible (total loss). With probability 1-q, the loss is partial according to a Uniform distribution on (0,1).
You don't know what the liability loss could be, but it can be up to 10 times the value of the property.

#

what module i need for this?

desert oar
#

numpy and maybe scipy @slender nymph
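
A rough starting sketch with NumPy alone. It rests on several assumptions: the Uniform($250,000, $2.25 million) draw is read as the insured value of each home, the values of p and q are invented, and the liability part of the problem is left out because the statement does not specify it.

```python
import numpy as np

rng = np.random.default_rng(42)

n_homes = 5000
p_accident = 0.02      # hypothetical value of p
q_total_loss = 0.10    # hypothetical value of q

# Insured value of each home (one reading of the problem statement).
home_value = rng.uniform(250_000, 2_250_000, size=n_homes)

# Does each home have an accident, and if so is it a total loss?
has_accident = rng.random(n_homes) < p_accident
is_total = rng.random(n_homes) < q_total_loss

# Fraction of the value lost: 1 for a total loss, Uniform(0, 1) otherwise.
partial_fraction = rng.uniform(0.0, 1.0, size=n_homes)
loss_fraction = np.where(is_total, 1.0, partial_fraction)

# Homes without an accident lose nothing.
property_loss = np.where(has_accident, home_value * loss_fraction, 0.0)
portfolio_loss = property_loss.sum()
```

Running this many times with different seeds gives a distribution of `portfolio_loss`, which is usually what such an exercise is after.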

lapis sequoia
#

the thing is
em=np.array(e[(imtyp-1).astype(int)])
gm=np.array(g[(imtyp-1).astype(int)])
rho=den[(imtyp-1).astype(int)];
sxi=mi[(igtyp-1).astype(int)];
a=aa[(igtyp-1).astype(int)];
sk=shp[(igtyp-1).astype(int)];
dx=xp[(n[1,0:]-1).astype(int)]-xp[(n[0,0:]-1).astype(int)]
dy=yp[(n[1,0:]-1).astype(int)]-yp[(n[0,0:]-1).astype(int)]
me having to use astype(int) all the time

desert oar
#

@lapis sequoia ok, other than that i don't see anything too strange in your code. although

    f[tuple([fdof,0])]=f1

is weird

lapis sequoia
#

is it slowing me down and how can i change it?

desert oar
#

you can save it as another variable

lapis sequoia
#

no i mean how can i make it normally accept indexing

slender nymph
#

how can i simulate 5k home insurance policies?

lapis sequoia
#

without having to write astype(int) all the time

desert oar
#
imtyp_indexer = (imtyp - 1).astype(int)

em = e[imtyp_indexer]
gm = g[imtyp_indexer]
rho = den[imtyp_indexer]
...
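
A sketch of the same lookup with the cast removed entirely, on the assumption that the repeated `astype(int)` is only needed because `imtyp` was created as a float array. The material IDs and property values below are made up:

```python
import numpy as np

# Material IDs are 1-based, as in the MATLAB original; values are made up.
imtyp = np.array([1, 1, 2, 3, 1], dtype=int)
den = np.array([7850.0, 2700.0, 4500.0])   # one density per material ID

material_index = imtyp - 1                 # already integer, no astype needed
rho = den[material_index]                  # density of each element
```

Each entry of `rho` is the density of the material whose ID appears at the same position in `imtyp`, which is the ID-to-value lookup described above.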
lapis sequoia
#

btw that tuple is because it refused to accept f[fdof,0]=f1

desert oar
#

what is f

lapis sequoia
#

it's a vector

#

array of float 64

desert oar
#

oh i see, f=np.zeros((ntdof,1))

#

btw you can remove the trailing semicolons

#

python doesn't need them

#

it looks like you have them in some places but not others

lapis sequoia
#

i know but i copy pasted from matlab

#

so

#

i mean i copy pasted the matlab code and tried removing some

#

i guess i can just do replace all

desert oar
#

wait

#

oh

#

why is fdof an array?

#

and why is f1 an array?

lapis sequoia
#

because there can be multiple forces

#

f is the acting force

desert oar
#

f1 = np.array(100) this is a 0-dimensional array (shape ()), not a length-1 array, don't do this

lapis sequoia
#

it can act on multiple component of the beam

desert oar
#

i see

#

what about f1?

lapis sequoia
#

f1 is the value of f

#

fdof is which part of the beam does it act on

#

basically it's like

desert oar
#

but is that supposed to be an array too?

lapis sequoia
#

i have to specify the location and value of f

#

yes same size as f1

desert oar
#

ok

#

it doesn't work because you wrote np.array(100) instead of np.array([100])

lapis sequoia
#

ah well i didnt test it for vector case yet

#

but good for letting me know

desert oar
#
import numpy as np

nn=41
ntdof=nn*3

f1 = np.array([100])
fdof = np.array([122-1])
f = np.zeros((ntdof, 1))
f[fdof, 0] = f1

print(f)
lapis sequoia
#

so i won't need tuple

desert oar
#

np.array(100) is a weird array thing that has an empty shape, (), i.e. zero dimensions
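
The difference between the two spellings, shown directly (the variable names are made up):

```python
import numpy as np

scalar_like = np.array(100)     # 0-dimensional
vector = np.array([100])        # 1-dimensional with one element

shapes = (scalar_like.shape, vector.shape)   # ((), (1,))
ndims = (scalar_like.ndim, vector.ndim)      # (0, 1)
sizes = (scalar_like.size, vector.size)      # (1, 1)
```

Both hold a single value, but only the second has an axis to index along.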

lapis sequoia
#

if i do that?

tidal bough
#

this code could use some variable names 😅

desert oar
#

correct

#

@tidal bough i know i already said

#

it seems to be a plague among matlab programmers

#

every matlab programmer ive worked with writes code like this

#

as little whitespace as possible and as short variable names as possible

tidal bough
#

I suddenly feel the urge to check my old Octave code 😅

lapis sequoia
#

because we are engineers first and not experienced with programming practices

desert oar
#

@slender nymph is this for school?

#

@lapis sequoia no its because you have bad role models who also write code like this

lapis sequoia
#

yes exactly but majority of engineers are such xD

#

bad role models at programming

#

at least the ones i work with

desert oar
#

@crisp jewel if sequence is a numpy array, that returns everything in the array except the last column
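
For a 2-D array this looks as follows; the array contents are made up:

```python
import numpy as np

sequence = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

# ":" keeps every row, ":-1" keeps every column except the last one.
all_but_last_column = sequence[:, :-1]
```

The result has the same number of rows and one column fewer, here shape (3, 3).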

lapis sequoia
#

obviously i can't speak worldwide

tidal bough
#

I suddenly feel the urge to check my old Octave code 😅
...ehh, it was good enough 😛

desert oar
#

anyway, other than creating the imtyp_indexer variable i dont see anything really slow about this code

#
    totl=sum(xxl);
    glom=np.zeros((ntdof,ntdof));
    glost=np.zeros((ntdof,ntdof));
    for ie in range(ne):
        est=estif_frame(ndofe,ie,a,sxi,xxl,em,rho,theta);
        for id in range(ndofe):
            for jd in range(ndofe):
                igdof=ndof[id,ie]
                jgdof=ndof[jd,ie]
                glost[igdof.astype(int)-1,jgdof.astype(int)-1]=glost[igdof.astype(int)-1,jgdof.astype(int)-1]+est[id,jd]

im not sure if there is a better way to do this

#

iterating over arrays is slow

#

but there might not be a vectorized version
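
One possible partial vectorization, offered as a sketch rather than a drop-in replacement: keep the loop over elements but replace the two inner loops with a single scatter-add via `np.add.at`. The connectivity table and element matrices below are invented stand-ins for `ndof` and the output of `estif_frame`.

```python
import numpy as np

rng = np.random.default_rng(0)
ne, ndofe, ntdof = 4, 6, 15                     # made-up sizes

# Hypothetical connectivity: 1-based global DOF numbers, shape (ndofe, ne),
# mirroring ndof[id, ie] in the original loop.
ndof = np.array([np.arange(3 * ie, 3 * ie + ndofe) + 1 for ie in range(ne)]).T
element_matrices = rng.normal(size=(ne, ndofe, ndofe))   # stand-in for estif_frame

# Reference: the triple loop from the original code.
glost_loop = np.zeros((ntdof, ntdof))
for ie in range(ne):
    for i in range(ndofe):
        for j in range(ndofe):
            glost_loop[ndof[i, ie] - 1, ndof[j, ie] - 1] += element_matrices[ie, i, j]

# Vectorized inner loops: scatter-add one whole element matrix at a time.
glost_vec = np.zeros((ntdof, ntdof))
for ie in range(ne):
    dofs = ndof[:, ie] - 1
    np.add.at(glost_vec, (dofs[:, None], dofs[None, :]), element_matrices[ie])
```

`np.add.at` is used instead of `+=` with fancy indexing because it accumulates correctly even if a DOF number repeats within an element. Whether it is actually faster for matrices this small would need measuring.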

lapis sequoia
#

well one way i can solve it is that i only do it once

#

but if iteration i >1

desert oar
#

right, but thats changing the algorithm

lapis sequoia
#

i use the same values of igdof

desert oar
#

which is fine, but not a fair comparison w/ the matlab code

lapis sequoia
#

because ndof will be the same my entire loop

desert oar
#

sure

#

of course you can always try JIT compiling this with numba too

#

i do however recommend you read through PEP 8

#

!pep 8

arctic wedgeBOT
#
**PEP 8 - Style Guide for Python Code**
Status: Active
Created: 05-Jul-2001
Type: Process

lapis sequoia
#

C:\Users\hamad\Downloads\Frame_2D_EU123C_new.py:121: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.

#

btw when i did what u wrote

#

it told me this

desert oar
#

a non-tuple sequence?

lapis sequoia
#

when i removed tuple

desert oar
#

what did you write

lapis sequoia
#

and put []

desert oar
#

no

lapis sequoia
#

instead

desert oar
#

wrong

#

look at what i wrote

lapis sequoia
#

f1= np.array([100]) # load value

desert oar
#
f[fdof, 0] = f1

it sounds like you did this

f[[fdof, 0]] = f1
lapis sequoia
#

yeah

#

u r right

#

what's the difference btw?

#

it doesn't give the error anymore

desert oar
#

@lapis sequoia with f[fdof, 0] you are indexing with fdof and 0. with f[[fdof, 0]] you are indexing with a single object, [fdof, 0]

#

numpy used to try to be smart and infer that if you write f[[fdof, 0]] you mean f[(fdof, 0)], which is the same thing as f[fdof, 0]

#

the warning has to do with the fact that this default inference behavior is changing

#

however i recommend not relying on inference
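
The distinction can be seen on a small array. This is a sketch with plain integers rather than the fdof array from the thread; the variable names are made up:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

element = a[1, 2]          # row 1, column 2
same_element = a[(1, 2)]   # identical: Python passes the tuple (1, 2) either way
rows = a[[1, 2]]           # a list is one index object: selects rows 1 and 2
```

The comma form and the explicit tuple pick out one element, while the list picks out two whole rows.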

lapis sequoia
#

ah ok

#

could something completely unrelated to the code cause the speed difference?

#

like whether spyder is installed in ssd or hdd etc

#

or the code i am running

desert oar
#

spyder? no

#

i mean... maybe, if it has some kind of debugging features that are slowing down the interpreter

#

but probably not

tame pelican
#
#Absolute
@cli.command()
@click.argument('N1', type=int)
@click.option('--num', is_flag=True, help='INTEGER')
def abs(n1):
    """Calculates absolute value."""
    answer = int(abs(n1))

    click.echo('absolute value = {}'.format(answer))

code is above, im trying to make a command so it shows the absolute value but an error says its not valid.. ideas?

lapis sequoia
#

i dont think it has to do with my code @desert oar
if I do this

from time import perf_counter
start= perf_counter()
for j in range(1):
    for i in range(j):
        x=1;

end= perf_counter()
delta=end-start

desert oar
#

@tame pelican what is the error? it looks like you're missing the num parameter to abs

lapis sequoia
#

and compare to matlab, it's the same thing for 1 iteration python is faster

#

for 10000 iterations matlab is faster