#data-science-and-ml
thank you very much for the help!
he uses the term dimension for everything so that kinda confused me
yeah, i too try not to get confused on that
arrays are confusing by themselves
what he meant is that if you were to plot it on a graph, the graph would be 100 dimensions
meaning 100 features
number of dimensions = number of features?
not really
input_dim means what is the shape of array that you are passing into the function
huh, then it's worth it to always type in input_dim = array.shape
and in ML generally we represent the stuff to learn in a 2D matrix
yeah so we can just do input_dim = array.shape ?
exactly, but instead you say, input_dim = array.shape[1:]
because the first dimension is generally the number of examples
for a dataset
so if the shape is (10, 5)
input_dim = 5
?
(not for my example, but for ml)
then what if the shape is (10, 5, 5)?
so input_dim = 25?
that looks simple now for shapes since i was always confused by stuff like "oh god what shape should i pass into there and how much neurons there should be"
no, the input dim is (5,5)
huh
so, input_dim always = array.shape[1:]
even if it's d100
as it seems at least
yep
since we dont mention the batch size
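A minimal sketch of the shape rule discussed above, assuming NumPy arrays whose first axis is the number of examples. The arrays and the helper name `per_example_shape` are made up for illustration.

```python
import numpy as np

# Made-up datasets: the first axis is always the number of examples.
tabular = np.zeros((10, 5))      # 10 examples, 5 features each
images = np.zeros((10, 5, 5))    # 10 examples, each a 5x5 grid

def per_example_shape(array):
    """Drop the leading batch axis and return the shape of one example."""
    return array.shape[1:]

print(per_example_shape(tabular))  # (5,)
print(per_example_shape(images))   # (5, 5)
```

The batch size never appears in the result, which is why the model definition does not need to know how many examples there are.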
@lapis sequoia read this
so @prisma verge what are we trying to predict?
okay thanks
so, from the csv i read and the pic what you sent....this seems like a regression problem
49 = round
3, 1, 2 = team results
and definitely not classification
Regression is the problem of predicting a continuous output quantity for an example.
seems like it
it's like predicting results for tv show actually, based on previous results
yeah, since we are not classifying anything
yes, exactly
just i've only built a bit of image classification networks and finetuning one about text generation
so i don't know how to handle this situation
and it'd be probably quite useful experience for my knowledge of ml
yeah, if my net works T^T
one more question xD
sure
```python
from keras import Sequential
from keras.layers import Dense
import numpy as np

# For a single-input model with 2 classes (binary classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=(200, 100)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

data = np.random.random((1000, 200, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
```
just wanted to tweak around with input dim to make my self understand
i get this error:-
but isnt input_dim = data.shape[1:]??
well, in this case its a dense layer right?
dense layers, well need to flatten the input
so dense layers dont support 3d?
nope, you will have to flatten it, what you can do is add a flatten layer before the dense layer
the problem with predicting the data imo is that there's not enough of dataset
just 47 rounds which is really not a lot for network
like this?
Flatten(input_shape=(200, 100))
?
remove the 1000
its input_shape u sure i gotta remove 1000?
yus
the way you are asking is like its a life and death situation
i am scared now xD
lmao xD
so the first element in the tuple is always the number of examples so i gotta remove it?
@prisma verge i too am confused......we dont have a value for 'y' instead we are just predicting x
yus
that thing above can also be:-
Flatten(input_dim=(200, 100))
?
oh okay, we can use the number of days and use multiple linear regression
@lapis sequoia yus
no wait, there might be a small issue
now i wonder whats the difference between input_dim and input_shape xD
be careful, there is apparently a subtle difference between them
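On the input_dim vs input_shape question: as far as I know, Keras treats `input_dim=n` as shorthand for `input_shape=(n,)`, so it expects a single integer, while `input_shape` takes the per-example tuple without the batch axis. The sketch below uses plain NumPy (not Keras) to show what a flatten step followed by a dense layer does to the shapes. The batch is made up and cut down to 100 examples to keep it light.

```python
import numpy as np

# Made-up batch with the same per-example layout as the snippet above: (200, 100).
data = np.random.random((100, 200, 100))

# Per-example shape, i.e. what input_shape would be (no batch axis).
input_shape = data.shape[1:]                  # (200, 100)

# What a flatten step does: keep the batch axis, merge everything else.
flat = data.reshape(data.shape[0], -1)        # (100, 20000)

# A dense layer is a matrix multiply plus bias on that flat vector.
weights = np.random.random((flat.shape[1], 32)) * 0.01
bias = np.zeros(32)
hidden = np.maximum(flat @ weights + bias, 0.0)   # relu, shape (100, 32)

print(input_shape, flat.shape, hidden.shape)
```

Without the flatten step a Keras Dense layer works on the last axis only, so the output would keep the extra 200 axis and no longer line up with labels of shape (n, 1).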
also, i've asked too much probably already, but may you comment the code for better understanding?
i'm aiming for understanding more than result :p
sure, let me see what i can come up with it in it
also, really thank you for all the help with this fuss
cool i didnt get an error thanks man
np
so, @prisma verge we will have to use linear regression where X will be the day number and Y will be the one of the features
huh, why one of the features though if there's three besides the days?
ah
gimme a little bit of time, i'll tag you if i manage to do it
that'd be really cool!
also, if this prediction model won't lie, i'll give you something from the wins, haha
xD
well, i got 32% accuracy xD
*34%
so, i got the accuracy up to 43.75
got it up to 50 xD
how much epochs?
welp, here is the thing @prisma verge
it turns out that there is something called Multivariate Multiple Regression
that is used to capture the relation between the multiple outputs
we might need that
but there's no that thing in keras
and there's very little info about it except universities and other very scientific resources :p
and I dont think keras or sklearn has that
aw man
that's sad
https://www.datacamp.com/community/tutorials/keras-r-deep-learning
this seems to be the thing
because it was linked in this question https://www.researchgate.net/post/Implement_Multivariate_Regression_by_Neural_Network_with_Tensorflow
so, bunch of relu's
i guess, that'd be interesting thing for you but too advanced for beginner like me :p
thing is
especially when i'm not that good at math
what we need is something to capture the relation between the outputs
is multivariate regression the only way to do that?
because separately, we are getting the outputs for each of them
but they are not like 1, 2 and 3
multivariate multiple regression
multiple means with multiple inputs
but multivariate multiple means there are multiple outputs too
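A rough sketch of fitting several outputs from the same input in one go, using ordinary least squares in NumPy. The data, weights and names here are all made up, and this is plain multi-output linear regression: it fits each output column with the same inputs but does not model how the outputs relate to each other, which is the part the fuller multivariate treatment adds.

```python
import numpy as np

# Made-up toy data: 6 rounds (one input feature) and 3 outputs per round.
rounds = np.arange(1, 7, dtype=float).reshape(-1, 1)     # shape (6, 1)
true_weights = np.array([[0.5, -1.0, 2.0]])               # hypothetical, shape (1, 3)
true_bias = np.array([1.0, 4.0, 0.0])
targets = rounds @ true_weights + true_bias               # shape (6, 3)

# Append a column of ones so the intercept is learned as an extra weight.
design = np.hstack([rounds, np.ones_like(rounds)])        # shape (6, 2)

# lstsq accepts a 2D target matrix and solves for all outputs at once.
coef, *_ = np.linalg.lstsq(design, targets, rcond=None)   # shape (2, 3)

prediction = np.array([[7.0, 1.0]]) @ coef                # predict round 7
print(prediction)
```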
... i'm not really sure how to handle that then
since it'd be no use for me since it's just too hard for my basic understanding, and it'd be probably a lot of sucking power from you
i am thinking of doing something really weird tho
i guess, 30 rounds heists-streak is my dream and will stay that :p
like, train a model on the number of days
and then train that model on other inputs
its like linking multiple regressions together
hmmm
hey, that may actually work
lets try that
like, instead of making whole sandwich, you make sandwich part by part
but it's still sandwich :p
woah, that's a lot
welp that didnt work
i used the previous model to predict the new models training values and used the old labels
well, the accuracy fell
... whoops
yeah, i think i see the issue
seems like Life (TM) logic doesn't work in ML world :p
its that, we dont have enough features
we only have one feature to predict stuff off
so, its not going to capture the output
... so, the stuff with multiple regression is needed
yeah
ah man, why everything nice in this world is so hard :p
or i might be running too many layers
apparently if your network becomes too deep, the accuracy starts falling
how much layers do you have actually?
3 hidden should be enough for every feature, no?
2 hidden in first, 1 in second
but it went to 50 in the past, so it's actually lower than some previous variant
hm
yeah, but for some reason the validation accuracy crashes when training hits 50
i think that overfits it in the middle
well, there is barely anything to dropout
that works when your training accuracy hits like 90 and all but validation is bad
here our training itself is bad
hm
```python
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import to_categorical

df = pd.read_csv('/content/gdrive/My Drive/m.csv')
dataset = df.values
X = dataset[1:, 0]
Y = dataset[1:, 1:]
#Y = dataset[1:, :]
'''min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
print(X_scale)'''

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

model1 = Sequential([Dense(1, activation='relu', input_shape=(1,)),
                     Dense(32, activation='relu'),
                     Dense(32, activation='relu'),
                     Dense(3, activation='tanh')])
model1.compile(optimizer='adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
hist = model1.fit(X_train, Y_train,
                  batch_size=1,
                  epochs=100,
                  validation_data=(X_val, Y_val))

predictions = model1.predict(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(predictions, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

model = Sequential([Dense(3, activation='relu', input_shape=(3,)),
                    Dense(32, activation='relu'),
                    Dense(3)])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
                 batch_size=1,
                 epochs=100,
                 validation_data=(X_val, Y_val))

print(model.predict(model1.predict([50])))
```
so, it can't achieve bigger 50% without math, stats, and tensorflow?
not that I know of
aw
well, I mean, thats the limit of my knowledge right there
I don't know what could increase it.
not like my knowledge is very better :p
more features, more examples, or sometimes better model
yes
if you want, you can do one thing
let's see how this heist goes and if predictions are bad or good
though, after second run the results are very different
we need to summon ML specialists!
so, the results will be different
i'll give you a function, to take the values and predict them right
gimme a sec, i need to check something lol
what does an activation do?
values after being multiplied by the matrix are passed through a function
that function is activation and it introduces non linearity
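A small NumPy sketch of why the activation matters, using made-up random weights: two layers with no activation in between collapse into a single matrix multiply, while putting relu between them breaks that.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 made-up examples, 3 features
w1 = rng.normal(size=(3, 5))
w2 = rng.normal(size=(5, 2))

def relu(z):
    return np.maximum(z, 0.0)

# Without an activation, two layers equal one layer with weights w1 @ w2.
two_linear_layers = (x @ w1) @ w2
one_linear_layer = x @ (w1 @ w2)
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With relu in between, the result is no longer that single linear map.
with_relu = relu(x @ w1) @ w2
print(np.allclose(with_relu, one_linear_layer))
```

So stacking layers only adds expressive power when something non-linear sits between them.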
ml shows best results until epoch 500 hits
then loss chanegs veeeery slowly
changes*
should i choose activations based on the situation?
sometimes they do matter, sometimes they dont
so, what's that function I wonder?
so i can give relu for now?
here
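```python
import numpy as np

def correct(list_values):
    # argsort gives indices from the smallest predicted value to the largest
    temp = np.argsort(list_values)
    names = ['Micheal', 'Franklin', 'Trevor']
    return [names[x] for x in temp[0]]
```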
pass in the prediction we made
and yes @lapis sequoia you may do so
instead of model.predict[50]?
Thanks bro i luv you [no homo]
yeah, the weird print() at the end right? pass its parameter into this function
and np!
works?
it goes through epochs
well, thats the model lol
well, i dont know why am i feeling happy even tho it didnt exactly work out
it still goes through epochs
where did you get this csv from tho?
i made it in google tables by copying it from the site
oh? what site was it?
lol reddit only
lol
but yeah, this guy does reddit giveaways which has 2/3 chances to win
i see
you gotta have 9 streak to get 5 euros steam giftcard
and chance of it working is 2.3%
so, since i'm interested in ml, i thought "hey that'd be nice to predict with machine capabilities!"
but then i got confused with what should i do
so, uhhh
wat?
that's what it outputs
lol
accuracy now doesn't bounce
so, what does this guy do?
it shows stuff in order "biggest chance > middle chance > lowest chance"?
yeah thats what the function does
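A quick check of the ordering, since `np.argsort` sorts ascending: the first name returned belongs to the smallest predicted value, not the biggest. The values below are one of the predictions quoted later in this log, and the names list matches the one in `correct()`.

```python
import numpy as np

names = ['Micheal', 'Franklin', 'Trevor']   # same column order as in correct()

def correct(list_values):
    # argsort returns indices from the smallest value to the largest
    order = np.argsort(list_values)
    return [names[i] for i in order[0]]

# Prediction values quoted later in this log.
prediction = np.array([[2.3735387, 2.3416874, 1.861764]])
print(correct(prediction))        # ['Trevor', 'Franklin', 'Micheal']

# Reverse the list to put the largest predicted value first.
print(correct(prediction)[::-1])  # ['Micheal', 'Franklin', 'Trevor']
```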
this guy does a giveaway series where you get more points the more your streak is
streak = heists without getting onto biggest team
so this time, it was to make a predictive model?
well, you can possible get as far with this model as far you can without a model lol
this ai is always bouncing within it's worth to choose trevor or michael
*possibly
maybe Franklin doesnt get in the biggest team a lot
xD, so, then i guess we are better off without a model xD
just make a random number generator
nah, it's your task to go on the team with least numbers or at least middle team
biggest team = arrest
i just thought that with all prediction stuff it's actually possible to do some predicts that have more than 20% chance of being true
nah boi
for that just use maths
you need a lot of data and features to do predictive modeling
good luck, and that was fun :p
@lapis sequoia moving here; Do you know of an efficient way to handle EQ?
@spark nimbus what do you exactly mean? Like frequency bands?
yup
since this is the most taxing part of my current implementation
```python
def bands_to_eq_size(self, frame: numpy.ndarray) -> numpy.ndarray:
    frame *= self.eq
    return frame / 1000

def transform(self, frame: numpy.ndarray) -> numpy.ndarray:
    fftified = self.fft(frame.copy())
    eq_applied = self.bands_to_eq_size(fftified)
    return self.ifft(eq_applied)

def process(self, audio: AudioSequence) -> AudioSequence:
    left, right = audio / 2
    new_left = []
    new_right = []
    for old, new in zip([left, right], [new_left, new_right]):
        for frame in old:
            new.append(frame.apply(self.transform, seq=True))
    return (
        left.new(
            numpy.concatenate(
                [f.audio
                 for f in new_left])) *
        right.new(
            numpy.concatenate(
                [f.audio
                 for f in new_right])))
```
here's my current implementation
Well I will work on that this evening. But I can't promise anything. Asking for an efficient way is quite relative.
Hmmm srry i dont have a pc right now. I am at work hahah
right now it's done frame-by-frame
Hmmm well
My way is to divide the frequency spectrum for human hearing into bands
So say 0 Hz to 22 kHz, the bands can be static or follow a logarithmic function
wdym by that?
Then on each band you can do fft
hmm
Well the thing is
sadly due to lack of knowledge in this area I have no clue what all that means
and there's no easy way to get into audio it seems
Welll uhm
Yeah my knowledge is also very low. It costs me a lot of time. But you have to firstly know what you want to achieve
Is it plotting? What do you need exactly? Because it can be peaks? Of a specific band plotted? Or you can have averages of bands
You really must tell exactly what you want because that will
basically applying FX
passing straight to pyaudio
So you have a wave file then you want to apply fx and send that to pyaudio?
Fx is quite complicated bro. You really need to understand basic fft and complex numbers to make an fx on it. To be honest i dont have a clue on how to do that. I know i can get a plot for freq peaks and i know how to calculate averages on freq bands but applying fx is quite complicated
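One possible way to cut the per-frame overhead, sketched with NumPy only: stack the frames into a 2D array, run one FFT over all of them, and multiply by a per-bin gain looked up from band edges. The function name `apply_eq`, the band edges and the gains are assumptions for illustration and are not part of the implementation above. Hard band edges with no windowing or overlap can cause audible artifacts, so treat this as a starting point.

```python
import numpy as np

def apply_eq(frames, sample_rate, band_edges_hz, band_gains):
    """Scale frequency bands of many frames at once.

    frames: array of shape (n_frames, frame_len), real-valued samples.
    band_edges_hz: ascending edges, e.g. [0, 200, 2000, 22050].
    band_gains: one linear gain per band (len(band_edges_hz) - 1 values).
    """
    frame_len = frames.shape[1]
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Map every FFT bin to a band index, then look up its gain.
    band_index = np.searchsorted(band_edges_hz, freqs, side="right") - 1
    band_index = np.clip(band_index, 0, len(band_gains) - 1)
    gains = np.asarray(band_gains)[band_index]
    spectrum = np.fft.rfft(frames, axis=1)       # all frames in one call
    return np.fft.irfft(spectrum * gains, n=frame_len, axis=1)

# Hypothetical usage: 8 frames of 1024 samples at 44.1 kHz.
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 1024))
edges = [0, 200, 2000, 22050]
unchanged = apply_eq(frames, 44100, edges, [1.0, 1.0, 1.0])
silenced = apply_eq(frames, 44100, edges, [0.0, 0.0, 0.0])
```

Because the input is real-valued, `rfft`/`irfft` only handle the non-negative frequencies, which roughly halves the work compared with a full complex FFT.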
@sand reef hey! what do you think, what will happen if you pass less data to the nn?
like, only last ten entries?
will the prediction become closer to "The Truth" (TM) because last entries are easier to judge from?
accuracy is 60% on latest entries btw
@sand reef so, NN doesn't work because it never gave me a result like this lol
hi there!
Hows everyone doing today?
I am part of the chamber of commerce committee for my local Indian community in my city
i am thinking about creating a census from the indian community so i can understand the demographic and the analysis of the demographic
so my committee can understand how to better serve the indian community
it would be easy to send a newsletter to everyone with a google form and to work with other indian leaders in my community to make a concerted effort to gather information about what part of india they are from , their type of professions, their families and household, and age group, and area they lived. of course, this is not about asking private information but more a general information
do you have any suggestions or ideas of how I could improve this project idea?
How do you want to improve it? Like what are your limitations? And what kind of improvement are you looking for?
it actually guessed the stuff right!
once
now i've reran it to see if it'll gonna guess it again
because i've added dropouts and third network to it
@sand reef imagine all of money we'll win if my way works!
Lol
"your car doesn't work? just add another engine to it" (C)
You added a 3rd network like how I added the second one?
yeah
Well, how I wish they would.
though this predict may have been an accident
Probably
network is still learning to see if it wasn't
Ppl would say calculate a p value for it
do i look like a mathematician?
But we have only 1 feature, wtf we gonna calculate?
we'd have more if those keras guys implemented multiple regression thing
and we could win thousand of 5 euro gift cards...
We are making features out of one feature
Well. Imma go read the pdf I was reading.
You also read this.
MAN
IT PREDICTED IT RIGHT
though it may have been reverse order because i don't remember what order it was
Read that and become a mathematician
IT MADE "michael - trevor - franklin"
AND MICHAEL WAS BIGGEST TEAM, TREVOR WAS MIDDLE, FRANKLIN WAS SMALLEST
DOES IT ACTUALLY WORK???
Yep. That was reverse
i've reran it again
So. Nope. I think that was luck.
i mean third time cannot be luck
In reverse?
by in reverse i mean it could have outputted franklin trevor michael but i don't remember since my memory is a goldfish's memory
i'm rerunning it
Well. If you do get 1000s worth of gift cards, send a couple thousand here
and if it outputs good result third time
i'll donate euro 2000 to you, get euro 2000 for me, and donate euro 1000 to somebody
:p
Sure!
that's if our magic ai actually works
Well. I gtg now. Time to sleep. It's kinda getting late now.
Yeah. I actually hope it works.
... it made it again
michael trevor franklin
predicting it again on new dataset
let's see if it'll tell the truth
... okay now it always outputs michael trevor franklin
even with new data
weird
it now outputs same results over and over
2.283496 2.372942 2.0555158
it says now that trevor will be least, franklin biggest, michael middle
xen0remind
(putting this to search for it later when heists end)
I will share something about data science, ML and DL. Maybe you guys know it already, but I guess this cheat sheet will be helpful for you
ML Structure
Deep Learning (Coursera Notes)
Thanks a lot @jagged stump !
450+ msgs since yesterday. This channel is suddenly rocking
the summer of "how do I get started on ml/dl?"
"I'm really interested in AI and ML" - literally every first and second year computer science student ever
But 3rd year?
And what do I learn to get into Computer Vision? I am seeing a lot of stuff and feeling overwhelmed and don't know where to begin from. I know how to use Regular conv NNs.
I thought I already told you what is expected out of a job
@sand reef
The list is completely different if you want to go for traditional computer vision over deep vision
If you need something else about documents I have so many
I can share guys
AI Basics
ML
Here so many cheatsheet
Holy...
Well. I guess I'll do something about it then
So, basically, ML and DL isn't as hyped up as it appears to be? @lean ledge
it is as hyped up as it appears to be in the public
just not the industry or at least just in the minority of industry
I have a question . What is the best way for real time brand detected ? Any idea?
CNN etc which one you advice
a large majority of major models used by large institutions are still more "traditional" machine learning, partly because they work very well still and partly because DL/ML is harder to "explain" to regulators than a regression type model
you would have to get into more details which would be cumbersome for explaining to less tech-savvy people
I have a question from ISLR.
This is from linear regression
These are the parameters we have to minimize, and we calculate them using this formula (above)
This is given formula for calculating the variance of the means. (above)
How are we able to derive those? (above)
```
What I know:
2. Standard error is measured for the "deviation" between the "means" taken from random samples and the true mean from a given dataset.
```
So, what am I missing here? I can't seem to derive this weird formula. Nor am I able to logically deduce this either.
ISLR First Printing (Page 66)
@sand reef well it kinda predicted it but trevor was middle not michael
franklin was biggest still though
Nice!
@sand reef I'm not sure which part you're having issues with
I need some help
there's three classes in my dataframe, how do I limit them to 100K rows each
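One way to cap every class at a fixed number of rows, sketched with pandas on a made-up frame. The column name `label` is hypothetical and the cap is 3 here instead of 100K so the result is easy to read.

```python
import pandas as pd

# Made-up frame; 'label' is a hypothetical name for the class column.
df = pd.DataFrame({
    "label": ["a"] * 5 + ["b"] * 3 + ["c"] * 4,
    "value": range(12),
})

rows_per_class = 3   # would be 100_000 in the question above

# head(n) on a groupby keeps the first n rows of every group.
limited = df.groupby("label").head(rows_per_class)

print(limited["label"].value_counts().to_dict())
```

This keeps the first rows of each class in their original order. If a random subset is wanted instead, shuffling the frame first with `df.sample(frac=1)` is one option.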
@silent swan how the standard error formulas are being derived for the square of the parameters.
ah, I would say for those you'll actually need to get into real statistics
like a standard regression textbook or regression chapter in a statistics textbook will derive them
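The formulas referred to as "(above)" were not preserved in this log. Assuming they are the least squares estimates and the standard error formulas from that page of ISLR, a sketch of the usual derivation goes like this. It assumes the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with the $x_i$ treated as fixed and the errors uncorrelated with common variance $\sigma^2$.

```latex
% Slope: a weighted sum of the y_i, because \sum_i (x_i - \bar{x}) = 0.
\hat\beta_1
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \sum_{i=1}^{n} c_i \, y_i,
  \qquad
  c_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2}

% Variance of a weighted sum of uncorrelated terms.
\operatorname{SE}(\hat\beta_1)^2
  = \operatorname{Var}(\hat\beta_1)
  = \sigma^2 \sum_{i=1}^{n} c_i^2
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Intercept: \bar{y} and \hat\beta_1 are uncorrelated since \sum_i c_i = 0.
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},
  \qquad
  \operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n},
  \qquad
  \operatorname{Cov}(\bar{y}, \hat\beta_1) = \frac{\sigma^2}{n} \sum_{i=1}^{n} c_i = 0

\operatorname{SE}(\hat\beta_0)^2
  = \operatorname{Var}(\bar{y}) + \bar{x}^2 \operatorname{Var}(\hat\beta_1)
  = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]
```

The only ingredients are the rule $\operatorname{Var}(\sum_i a_i y_i) = \sigma^2 \sum_i a_i^2$ for uncorrelated $y_i$ and the fact that the deviations $x_i - \bar{x}$ sum to zero.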
import pandas as pd
markets = ['ETH', 'BTC', 'ADA', 'ETC', 'EOS', 'NEO', 'BCHABC']
pair = 'USDT'
dfs = []
start_time = '2018-01-01'
end_time = '2019-06-15'
for market in markets:
df = pd.read_csv(market + pair + '.csv', parse_dates=True, index_col=0)
df = df.drop(columns=['Open', 'High', 'Low', 'Volume'])
df.rename(columns={'Close': market + pair + ' Close'}, inplace=True)
df = df.loc[start_time: end_time]
dfs.append(df)
there is like 7 columns.. each with the closing price of each coin.. is there a way i could add all of the prices and average it and make it a separate column called 'average'
there is
if i am not wrong, there should either be a mean function in pandas itself, if not, i think numpy will work as well
cuz, again, if i m not wrong, pandas was built on numpy
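A minimal sketch of the average column, assuming the per-market frames in `dfs` share a date index as in the loop above. The prices below are made up.

```python
import pandas as pd

# Made-up closing prices standing in for the per-market frames in `dfs`.
dates = pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"])
dfs = [
    pd.DataFrame({"ETHUSDT Close": [10.0, 20.0, 30.0]}, index=dates),
    pd.DataFrame({"BTCUSDT Close": [30.0, 40.0, 50.0]}, index=dates),
]

# Line the frames up side by side on their shared date index.
combined = pd.concat(dfs, axis=1)

# Row-wise mean across all closing-price columns.
combined["average"] = combined.mean(axis=1)

print(combined)
```

`mean(axis=1)` skips missing values by default, so a date missing from one market still gets an average of the markets that have it.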
so i have a function like this
```python
import time

def epochUpd(row):
    row.at['updated'] = time.strftime("%a, %b %d %Y", time.localtime(row.at['updated']))
```
and i have another function called clean which does something like this
def clean(df):
    for index,row in df.iterrows():
        epochUpd(row)
    df.to_csv('updated.csv',index=False)
Basically, I am trying to clean a data file, but I want to change every 'updated' column to show the value I created from the epochUpd(row)
Firstly, I don't think the iterrows() is the right way to go because I feel like it'll take a while and there could be a much easier way
Also, I'm not sure if the formatting is even near correct
Could I change the epoch function to
def epochUpd(df)
df.updated = time.strftime("%a, %b %d %Y", time.localtime(df.updated))
and then keep the for loop how it is in the clean function
nevermind, getting invalid syntax on that last function
but there's gotta be a way
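One loop-free way to do this, sketched with pandas on a made-up frame and assuming `updated` holds Unix timestamps in seconds. Note one difference from `time.localtime`: `pd.to_datetime(..., unit="s")` interprets the values as UTC rather than the machine's local timezone.

```python
import pandas as pd

# Made-up frame; 'updated' holds Unix timestamps in seconds.
df = pd.DataFrame({"id": [1, 2], "updated": [0, 86400]})

def clean(df):
    """Return a copy with 'updated' formatted as text, without a Python loop."""
    out = df.copy()
    # unit="s" reads the numbers as seconds since the epoch (UTC).
    out["updated"] = pd.to_datetime(out["updated"], unit="s").dt.strftime("%a, %b %d %Y")
    return out

cleaned = clean(df)
print(cleaned)
# cleaned.to_csv("updated.csv", index=False)  # same output step as the loop version
```

The rows yielded by `iterrows()` are generally copies, so assigning to them inside the loop may not change the frame at all. Assigning the whole column at once avoids that.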
may i know whats wrong with this code
```python
import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from keras import Sequential
from keras.layers import Dense, Flatten

x, y = load_digits().data, load_digits().target
x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=False)

model = Sequential()
model.add(Dense(64, activation=tf.keras.activations.relu))
model.add(Dense(64, activation=tf.keras.activations.relu))
model.add(Dense(10, activation="softmax"))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, batch_size=10)
print(model.predict(x_test[0]))
```
File "/Users/Kushi/PycharmProjects/test/misc/ML_test.py", line 19, in <module>
model.fit(x_train, y_train, epochs=3, batch_size=10)
File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training.py", line 952, in fit
batch_size=batch_size)
File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
exception_prefix='target')
File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
str(data_shape))
ValueError: Error when checking target: expected dense_3 to have shape (10,) but got array with shape (1,)
but in the load_digits dataset there are 10 class names so shouldnt i have dense 3 ten so that it gives me the probability of each one?
please ping me when helping
@sand reef
['Trevor', 'Franklin', 'Micheal']
[[2.3735387 2.3416874 1.861764 ]]
let's see if it works :p
if it'll guess biggest team third time
that'd be cool
@lapis sequoia you didn't give it that though, you gave it a single int
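To spell out the shape mismatch in that traceback: the last layer outputs 10 probabilities per example, and `categorical_crossentropy` expects each label in the same 10-wide form, but the digits targets are single integers. A NumPy sketch of one-hot encoding, with made-up labels:

```python
import numpy as np

# Made-up integer labels like load_digits().target: one class id per example.
y = np.array([0, 3, 9, 3])
num_classes = 10

# One-hot encode: row i is all zeros except a 1 at column y[i].
y_one_hot = np.eye(num_classes)[y]     # shape (4, 10)

print(y.shape, y_one_hot.shape)        # (4,) (4, 10)
print(y_one_hot.argmax(axis=1))        # [0 3 9 3]
```

In Keras itself, `keras.utils.to_categorical` does the same conversion, or the loss can be switched to `sparse_categorical_crossentropy`, which takes the integer labels directly. Separately, `predict` expects a batch, so `x_test[:1]` is probably what the last line wants rather than `x_test[0]`.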
HUH
IT DID GUESS THAT MICHAEL WILL BE BIGGEST
still weird with middle and smallest team
it mixes them up so trevor was middle
but yeah it predicts biggest team so that's good already
['Franklin', 'Micheal', 'Trevor']
[[2.1652157 1.9663016 2.2141366]]
let's see if next game will be right too
if yeah...
then that's a gem
Hey guys what is the best place to learn ML and Neural Networks?
dude first make a mind map what you need to start learning it because you need for example programming fundamentals, linear algebra, stats and later also calc
to properly understand it
cough no you don't cough
assuming you know more complex algebra and functions and how to use them
?
to really understand it yes you do
for building networks just for fun you can use keras
also there's the amazing Zero to Deep Learning course which covers all the stuff you need including math explanations
yes but building and understanding are two different places
Fullstack Deep Learning tutorials to go from zero to production covering all basics and advanced concepts.
Math was my best finals subject I understand it ez
but can you do linear algebra?
he's 13
why do you think so tho
... oh
Erm I am 14 now sur
happy birthday then!
I'm using datacamp
with source codes, jupyter notebooks, and stuff
i'm planning to go web infosec so
i'm just doing nns for fun
btw khan academy is good for math
like text generation, prediction, stuff
oohh sure
and it doesn't require much math, just basic array understanding
thats ez
and python and numpy syntax knowledge
I actually need to know a lot to do good DS
SQL, numpy, pandas, ML, math etc.
python
a lot of visualizing tools
yes but that's if you go for DS as a job
i'm building projects just to laugh them off with friends :p
oohh hahahahahaha
I want to have an own robotics company which is using AI to make them human like
i've made a thing that'd make poem-like neuromancer quotes
oooohhh that cool I guess
i also want to learn markov chains but my brain refuses to understand their realizations in python
whats that?
nix out of nowhere
oh no by mentioning scientific words you summon nix
sure
i have ben watching for around 10 minutes actually
hahahhaahha
because I dont convert to jpegs
i kinda get markov chains basics like
there's a bunch of words and every word has a chance to follow the other word
no
... but
that is not the deeper idea of markov chains
where can i get a simple explanation of MC
the deeper idea of markov chains is that you have a set of states and then you have a transitions probability from every state to every state
that is everything there is
the words are just a form of states
oh
so MC are just counting probabilities between transitions which can be any object like
weather, number
kinda like that?
counting probabilities?
Say. Has anyone heard of reservoir computing? What is it about?
compute probabilities
how do i say it
i can't say it with right words
that's not what i mean
the markov chain itself does not compute probabilities
it expects you to provide probabilities and states
yeah i get that
all the libraries you see for markov chains just do this for you
dozens
may i just get fancy text generator
because i love random stuff
and markov chains examples seem to be like human words but still with a bit of computer craziness
i mean you can easily implement stationary markov chains yourself
it gets a bit ugly with mth order markov chains because you have to look m states back for your new decision
is there a way to access printed information?
hmmm i thought it would be topical to data-science
but i don't understand markov chains and i just want to get more random stuff that look humanese but weird
sorry
dont worry
so for markov chains, suppose we have three states s1 s2 s3
the transition probabilities could for example be
0.4 for s1 -> s2
0.8 for s1 -> s3
0.2 for s3 -> s2
0.1 for s3 -> s2
0.9 s2 -> s1
all omitted ones are 0
so if you start out in s1 you check your probabilities and see you have to go to s3
s3 transitions to s2
and s2 transitions to s1 (and now we're lost in an endless loop)
so nix as you're here, can you tell me what this is Parametric representations of lines in linear algebra because the guy in khan academy only confuses me more
huh
idk that's why I ask the all mighty nix who knows mostly everything
that now seems easy
yes it is extremely easy @prisma verge
and yeah, nix seem to know everything
pretty much
praise nix
that's the title of the video, i think you should watch it to find out what it is
may we get :praiseNix: emote
yh
@olive willow is there a specific part of it? or the whole thing is confusing
also Id argue that a parametric representation of a line is (from a geometrical point of view)
a vector lets call it a
another vector lets call it b
and a variable scalar l
and then you could represent a line which goes from a in direction b like
a + l * b
but that is my school geometry thinking I dont really know what it is, it just sounds similar
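The a + l * b form above can be sketched in a few lines of plain Python. The helper names and the points are made up, and the same code works for vectors of any length, which is the appeal of the parametric form.

```python
def point_on_line(a, b, l):
    """Return a + l * b for vectors given as equal-length lists."""
    return [ai + l * bi for ai, bi in zip(a, b)]

def line_through(p, q):
    """Return (a, b) so that a + l * b passes through p at l=0 and q at l=1."""
    direction = [qi - pi for pi, qi in zip(p, q)]
    return p, direction

# Made-up points in 3D; nothing here depends on the number of dimensions.
a, b = line_through([1.0, 2.0, 0.0], [3.0, 6.0, 4.0])
print(point_on_line(a, b, 0.0))   # [1.0, 2.0, 0.0]
print(point_on_line(a, b, 1.0))   # [3.0, 6.0, 4.0]
print(point_on_line(a, b, 0.5))   # [2.0, 4.0, 2.0]
```

Letting l run over all real numbers traces out the whole line, so two vectors and one free parameter describe infinitely many points.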
and no I do not know everything in fact id consider myself horribly bad at almost everything related to data science
also, nix
from what i've heard, to get probability for each word you gotta divide the amount of words on amount of times this word appeared in the text
right?
its saying that a vector is a "parametric" representation of a line
"parametric" in this case means that it uses a few specific numbers, i.e. parameters, to represent a large collection of objects, in this case a line
but a vector represents an infinite amount of lines
i don't want to hear anything about infinitiness so im getting outta here
it would be infinite mostly
this isn't that @prisma verge
i mean a vector is just a direction, it doesnt define from where it starts just where it points
so you can choose every point in space
VECTOR IS JUST X AND Y CHANGE MY MIND
yh but if we would start at the origin
derez x and derez y
to make things easy
@olive willow @prisma verge the punch line is at around 10:00 in that video -- you can use y = mx + b to describe a line in R2, but a vector can be used to represent an arbitrary "line" in an arbitrarily-dimensioned space
hence a vector is a parametric representation of a collection of lines
an arbitrary "line" in an arbitrarily-dimensioned space ?
yeah, what's a line in 10 dimensional space?
may we just simplify that
there's a lot of dots and you can line to any of dot with x and y coordinates
daz vector now
a line in 10 dimensional space is a plane in 9 dimensional space
can have more than that
nix stop!
can have 3, 10, or even infinity
so instead of having to write special equations for each case
you use vectors
(and eventually matrices)
also i really respect nix and srl for knowing math because i like people that know math
in numpy
A vector is a thing that has a direction and magnitude
thats one way to think of it, yes
and you can place it anywhere you want if it has the same direction and magnitude
@prisma verge can you elaborate on your question about word probabilities
but to simplify things I would see a vector as a scalar of 2 base vectors
@olive willow he gives a good example at around 12 mins in the video
how do you characterize the line that goes through 2 specific points
gonna watch
uhhh, so, we were talking about markov chains and when i was reading about it i've read that probability of each word being added in the text = amount of words in this text divided by times this word appeared in the text
oh
i just want to have nice text generator, really
i love random stuff
then get yourself markov chain lib x and youre done
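For the word probabilities, a rough sketch of how a text generator could estimate them, assuming a first-order chain over words: the probability of word B following word A is the number of times the pair "A B" appears divided by the number of times A is followed by anything. The training text and function names are made up.

```python
import random
from collections import Counter, defaultdict

def build_chain(text):
    """Estimate P(next word | current word) by counting word pairs."""
    words = text.split()
    pair_counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        pair_counts[current][following] += 1
    chain = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        # probability = times this pair appeared / times `current` was followed by anything
        chain[current] = {word: count / total for word, count in counter.items()}
    return chain

def generate(chain, start, length, rng):
    """Sample up to `length` words, stopping early at a word with no successor."""
    words = [start]
    while len(words) < length and words[-1] in chain:
        options = chain[words[-1]]
        words.append(rng.choices(list(options), weights=list(options.values()))[0])
    return " ".join(words)

# Made-up training text.
chain = build_chain("the cat sat on the mat and the cat ran")
print(chain["the"])   # cat: 2/3, mat: 1/3
print(generate(chain, "the", 8, random.Random(1)))
```

Libraries for Markov text generation do essentially this counting for you, usually with extras like sentence boundaries and longer histories.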
i mean, it's not like i'm going in data science to know the theory
and, i guess simplified version of theory is good enough that was provided by you nix
haha
... oh
i mean
you can express this a lot more formal
with random variables and whatever
but it boils down to this idea
i hate formal things to be honest since i'm stupid and it's easier when you can compare stuff with real life instead of learning hundreds of new terms
idk how you ever expect to learn anything then
real-life examples are valuable and important
but if you want to get past the basic level, at some point you have to sit down and try to understand what's going on
this goes for many things, not just data science
i have no idea?
i mean, uh, i kinda get python syntax when i'm bad at explaining this at formal side
and i even get what i do in code (on high level, not low level tho)
i guess i can say "it just works"
... no, not about code, about how i'm learning stuff
Not understanding code will restrain you enormously in the long term
but i'm saying that i do get it
i'm just bad at getting things when they're explained with a lot of programming and math only terms
and that's weird
pretty new to data science, but currently trying to analyse some sensor output. i am making pandas plots and checking if the sensor behaves "nice" according to a temperature cycle over time. are there some tools or methods in pandas which might help me discover sensor "noise" or "shifts" in data? like for example gradually changing offset over time?
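One possible starting point for the sensor question, sketched on synthetic data because the real readings are not shown: a rolling mean over exactly one temperature cycle cancels the cycle and exposes a slowly changing offset, and the rolling standard deviation of the residual around a short smoothing window gives a rough noise level. The cycle length, window sizes, and all values below are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the sensor: a repeating cycle, a slow offset drift, and noise.
rng = np.random.default_rng(0)
n, period = 1000, 100
t = np.arange(n)
sensor = pd.Series(
    np.sin(2 * np.pi * t / period)      # temperature-driven cycle
    + np.linspace(0.0, 2.0, n)          # gradual offset shift
    + rng.normal(0.0, 0.05, n),         # measurement noise
    name="sensor",
)

# Averaging over one full cycle removes the cycle and leaves the slow baseline.
baseline = sensor.rolling(window=period).mean()
drift_estimate = baseline.iloc[-1] - baseline.dropna().iloc[0]

# Residual around a short centered mean approximates the high-frequency noise.
residual = sensor - sensor.rolling(window=5, center=True).mean()
noise_level = residual.rolling(window=period).std()

print(f"estimated drift: {drift_estimate:.2f}")
print(f"recent noise level: {noise_level.dropna().iloc[-1]:.3f}")
```

Plotting `baseline` and `noise_level` next to the raw series is usually enough to see whether the offset or the noise changes over time.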
Hi all, i'm not sure if this is the correct place to ask, but is there anyone who's versed with python Learning To Rank packages? Especially Pairwise ranking algorithms? I'm struggling quite a bit with implementing it in python.
Ranking packages?
Yes, for example:
@median siren you have to be more specific
what are you trying to rank
@lapis sequoia auto regressive methods
@lapis sequoia Apologies, i'll try to be more specific.
For example: consider the following table:
| Name | Rank | Points | Embedding |
|-------------|--------|---------|--------------|
| Man. City | 1 | 98 | [....] |
| Liverpool | 2 | 97 | [...] |
| Tot. Hots. | 3 | 93 | [...] |
Here, we try to predict the rank of a team, according to their embedding. So there is 1 query, namely the results of this season, and you want to predict the rank of a team based on embeddings. (Note: the embeddings can also be replaced with other features)
Well, to be frank I'm quite doubtful about that. Considering the goal is to "predict a rank", I assume the query is the rank, and you want to have the most accurate name with that query.
So in this case, if the query would be "1" You'd want Man. City to return. This considering the opposite doesn't hold true, I assume?. For example: When you query "Man. City." the result can't be "1", no?
what does the embedding represent..
It's a graph embedding, so the embedding represent the name in a graph.
I'm not able to understand the case here.. it's very cryptic..
you want to predict the rank of football clubs.. but what are the features exactly?
it's very rare for features other than text to be represented as embeddings..
That's not true. A lot of things use embeddings. It's less common to call them embeddings outside NLP but the idea of taking a representation, finding patterns in it and embedding it into a lower dimensional latent space that's more fundamental is a very common thread throughout ML
can't refer to all vector representations as embeddings..because they are specific task-restricted representations..
@lapis sequoia thanks, Ill look into that
@lapis sequoia tbh python has bad tools for time series analysis. R has much better stuff for decomposition, trends, etc
@lapis sequoia isn't that pretty much the definition of an embedding? an invertible mapping into a lower-dimensional space?
theres nothing specific to text in that
I don't know.. it feels very odd to me to use that term for anything other than text o.o
What makes text embeddings different from non text ones?
lets see.. in text embeddings there's cases like context free and context aware..
in other vector representations.. the features are represented in a certain space and can be reduced to another, just like embeddings that represent text..
features for a vector representation are often chosen for predictability.. taking into things like avoiding multicollinearity.. this is the opposite of context aware representations in text, which are representations that take into account their surrounding text..
the idea of embeddings is that the representation can be used for multiple related nlu/nlg tasks or in a related domain..
apparently they also use a term called Vector Embeddings to denote outputs from DL layers..
How does everyone have their python environments setup?
Good idea to just install anaconda?
pika pi?
I used to use anaconda for university.. then was doing everything on the cloud.. so.. make a script for launching a quick instance with everything I need
pikachu
I am using miniconda with conda-forge as source. Then I conda install or pip install what is needed and set this env for the project in PyCharm (or whatever IDE) settings. If I use a lot of packages, I am uploading a venv.yml export along with the git repo.
I just use anaconda works fine for me. I use conda environments as well.
@lapis sequoia Vector embeddings are just embeddings. They are represented as a vector, same as text embeddings. Also not just used for deep learning. Deep learning is just very often used for things that lower dimensionality. Autoencoders' entire point is to create embeddings and classification tasks naturally create embeddings. Also a lot of embeddings use some sort of contextual information. Context around in text is just another part of what makes the information's position. In images, it would be the larger context around some area. It's not that different, really.
@lapis sequoia doing a lag plot of a large dataset (some 600000ish rows) certainly took its time.. maybe I am doing this wrong. Even tried to slice out every other row to "half" the data but that still took ages.
@lapis sequoia, exactly as @lean ledge says. In my specific case I use a graph that I embed. A graph embedding can be seen as similar to a text embedding. The context of a word is referred to as the probability of two words occurring next to each other, or near each other in a sentence. In a graph, the context of a node is referred to as the probability of a node being a neighbour to another node.
Then, after that, I want to predict the ranking of those nodes based on their graph features, which in my case is represented in embeddings.
I'm still not sure I'm comfortable with people now using "embedding" to refer to any kind of hidden state
I'm referring to just any intermediate state in a neural network model
people have started referring to those as embeddings too
@desert oar that's entirely false lol
@void anvil ?
There are just as many, if not more, tools for TS in python and they're a lot faster
Do tell
Seriously it would be great to not have to call out to rpy2 for stuff like time series decomposition and arima
Statsmodels doesnt count
@silent swan What's wrong with use of the term embedding for that? It's generally only the last layer or two that are called embeddings, not any random hidden layer. It creates feature vectors that are separable according to a quality that we care about (which we used to classify) so it's honestly very much an embedding.
It's not purposefully made to be any kind of encoder but it certainly works that way given how neural networks seem to work experimentally (I'm not sure of any theoretical papers that exist on how semantically focused information comes out in later layers when training for a classification task etc. Would be nice to read if anyone's got one)
I'm looking for some help with numpy's bincount function
and trying to use bincount to make an azimuthal average but to ignore radii where data is completely missing but still have the missing value show up at that radius in the average
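A minimal sketch of one way to do this with `np.bincount`, assuming missing pixels are stored as NaN and radii are binned to integers (the function name and the test image are made up). Valid pixels are summed and counted per radius, and any radius with a count of zero is reported as NaN instead of being dropped.

```python
import numpy as np


def azimuthal_average(image, center, n_bins):
    """Mean of `image` per integer radius; radii with no valid data stay NaN."""
    y, x = np.indices(image.shape)
    r = np.hypot(x - center[0], y - center[1]).astype(int)  # integer radius bin
    valid = np.isfinite(image) & (r < n_bins)
    sums = np.bincount(r[valid], weights=image[valid], minlength=n_bins)
    counts = np.bincount(r[valid], minlength=n_bins)
    profile = np.full(n_bins, np.nan)
    # Only divide where at least one valid pixel exists; other bins keep NaN.
    np.divide(sums, counts, out=profile, where=counts > 0)
    return profile


yy, xx = np.indices((9, 9))
radius = np.hypot(xx - 4, yy - 4).astype(int)
image = radius.astype(float)      # each pixel holds its own radius bin
image[radius == 2] = np.nan       # pretend the whole r == 2 ring is missing
profile = azimuthal_average(image, center=(4, 4), n_bins=5)
print(profile)
```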
@granite bobcatotboi one pattern that I've seen for example, is where transfer learning is performed and the output of the model being reused/transferred is referred to as an embedding
depending on what you refer to as classification (sounds like you're more referring to the cases of few-class classification, otherwise even generation tasks are classification over the vocabulary), I can probably pull up some NLP papers for that
I would contrast "embedding" with "representation", which I think is more appropriate for this usage. On one hand, representation is broad enough to basically say nothing other than "contains information", but on the other hand that's probably more appropriate, because in these cases we're just taking some intermediate transformation of the input. In contrast, I think an embedding is a more specific term that says something more about the space/transformation being applied
@brazen wing sry for ping but if u mind can u help me fix it?
tomorrow is fine too
sure
you want it to give you an array of size 10
with a 1 for whatever the right category is. but your training data provides the y value as an int from 0 to 9
you just need to convert that from a 1d array to a 2d array with 1's in the corresponding int index
this is what i did
@lapis sequoia
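The conversion described above (integer labels 0 to 9 turned into rows with a single 1) can be sketched with plain NumPy. The label values here are made up; Keras also ships `keras.utils.to_categorical` for the same job.

```python
import numpy as np

labels = np.array([3, 0, 9, 1])        # integer classes, shape (4,)
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[labels]  # shape (4, 10)
print(one_hot.shape)
```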
'sup guys!
I'm developing a code to get into a javascript environment, then I want to scrape the data from the website using BeautifulSoup. The point is that I realized that there isn't any table in the environment, so I was wondering about how can I scrape the data from the website.
Any tips?
MWE:
```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
import pandas as pd
from tabulate import tabulate
import os

url = "https://scon.stj.jus.br/SCON/legaplic/toc.jsp?materia=%27Lei+8.429%2F1992+%28Lei+DE+IMPROBIDADE+ADMINISTRATIVA%29%27.mat.&b=TEMA&p=true&t=&l=1&i=18&ordem=MAT,@NUM"
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div/div/div[3]/div[2]/div/div/div/div[16]/a')
python_button.click()
driver.switch_to.window(driver.window_handles[-1])
python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div[1]/div/div[3]/div[2]/div/div/div/div[3]/div[2]/span[2]/a')
python_button.click()
driver.switch_to.window(driver.window_handles[-1])

datalist = []  # tables collected from the results page
pagina_de_resultados = BeautifulSoup(driver.page_source, 'lxml')
table = pagina_de_resultados.find_all('table')
df = pd.read_html(str(table), header=0)  # raises ValueError when the page has no <table>
datalist.append(df[0])
driver.quit()

result = pd.concat(datalist, ignore_index=True)
json_records = result.to_json(orient='records')
print(tabulate(result, headers=["Employee Name", "Job Title", "Overtime Pay", "Total Gross Pay"], tablefmt='psql'))
path = os.getcwd()
# os.path.join avoids "\f" in the file name being read as a form-feed escape
with open(os.path.join(path, "fhsu_payroll_data.json"), "w") as f:
    f.write(json_records)
```
Currently getting the following traceback:
```
  File "C:/Users/Pedro/PycharmProjects/data_scraping/data_scraping.py", line 29, in <module>
    df = pd.read_html(str(table), header=0)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\io\html.py", line 1094, in read_html
    displayed_only=displayed_only)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\io\html.py", line 916, in _parse
    raise_with_traceback(retained)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
    raise exc.with_traceback(traceback)
ValueError: No tables found
```
Hi, I wish to calculate the weighted average of n data points, while lowering 2 specific data points to value of 1 arbitrary and adjusting the other data points so that the ratio of the weight is kept as-well as the weighted average. How can I achieve this?
For example I have the following data points and weights:
value1 - 1.186, weight1 - 100
value2 - 1.294, weight2 - 50
value3 - 2.157, weight3 - 200
value4 - 3.235, weight4 - 150
average - 2.2
I wish to set value1 and value2 to 1 and adjust value3 and value4 to satisfy my condition as to keep the weights ratio and the weighted average.
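The question leaves some freedom, so the following is only a sketch under one interpretation: the weights stay unchanged, value1 and value2 are pinned to 1, and value3 and value4 are multiplied by a single common factor (which keeps their ratio) chosen so the weighted average stays the same. The variable names are made up for illustration.

```python
values = [1.186, 1.294, 2.157, 3.235]
weights = [100, 50, 200, 150]


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


target = weighted_average(values, weights)  # about 2.2

fixed = {0: 1.0, 1: 1.0}  # indices pinned to the value 1
free = [i for i in range(len(values)) if i not in fixed]

# Weighted sum that the free values must still provide after pinning the others.
fixed_sum = sum(fixed[i] * weights[i] for i in fixed)
free_sum = sum(values[i] * weights[i] for i in free)
scale = (target * sum(weights) - fixed_sum) / free_sum

adjusted = [fixed.get(i, values[i] * scale) for i in range(len(values))]
print(adjusted, weighted_average(adjusted, weights))
```

Other interpretations of "keeping the ratio" would lead to a different adjustment, so the common-factor choice should be checked against the actual requirement.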
@void anvil nice, thanks for that. doesn't cover everything i'd want but it's a start
still would be looking for seasonal decomposition
Just throwing this out there because I'm interested. Does anyone know how common it is for people to pair computer science degrees and biology degrees in college like in a double major? I have to pick my college major soon and I love both but am not sure if it's a thing that is possible.
@wary fox it is.. only if you're into bio statistics.. plan to go work for pharma or a clinical research organization..
or maybe consulting.. where you're advising those sorts of companies
@lapis sequoia interesting. So its not common for field biologist to have computer science degrees?
what's a field biologist.. :v
I'm really not sure of what other prospects biologists have.. aside from ones related to data science..
Oh ok, thanks for your input though! Its much appreciated.
@bitter pewter dont use a form then? you can just select the text you want using an id or name
Hey @brazen wing can you give me some light? Maybe a documentation link or any kind of guide... I'm kinda lost LOL newbie at Python
sure I'll try. but what data are you actually trying to get
Have you tried the code? It redirects you to a public database of a tribunal court. That database has some info on judgements. I want to get that info and copy it to a table. I'll try to show you! Give me a sec.
As you can see, I want to get the info in the "Processo", "Relator(a)" etc. fields
ill just get up the webpage
Sure! I'll get you the url
You should click on § 6º text
And then click in "41 documento(s) encontrado(s)"
ah ok i see
well first of all i don't think you are selecting this
when you click in selenium
Oh, it's going right to where I've pointed
It works well till I try to get the data
yeah its a bunch of divs
Yeah LOL
Sure. I've seen that we have docTexto as class and a bunch of divs for those fields. I'm not sure that there is a pattern in those div names.
Anyone know any ML groups?
There's an AI server, a data science server, /r/learnmachinelearning server
Channels on every major programming discord
I never liked any of them much though so your mileage may vary
Smart ML people seem to not use discord too much
@bitter pewter ok its the first two elements of class docTexto
so you can use driver.find_elements_by_class_name("docTexto")
and that will give you a list
of the elements
@lean ledge I am a beginner.
I'll use it with append?
Well you will need to get the text from them.
so it will be like
textList = driver.find_elements_by_class_name("docTexto")
Other beginners are not who you want to talk to as a beginner
and then textList[0].text will give you a string of Processo field
Yes, I am looking for a group that knows what they are doing.
I see! 0-X will be the position of the field I want
yeah thats right
Nice! How do I get those texts and make a table from them?
I want to get it as a spreadsheet
table = textList ?
well you will probably need to construct it with pandas
I see!
Np, I will study about pandas later
As you can see, I've kinda copied the code I'm using
I'm trying to understand it bit by bit
yes it did seem that way haha. an ambitious start
pandas is great though, so learning it will be useful
I'm trying to use it as my undergraduate thesis in Law
🤣
I'll make an analysis of a Court's decisions group
If it's ok, I've added you in my friends list. I won't bother you 🤣
yeah no problem. you can probably just message me in here anyway
but yeah if you just do a simple tutorial on pandas, it should be obvious how to make a table of strings then
no prob dude
Hi I made a little data mining / data science project.
Feel free to check it out.
https://github.com/tomg404/Zeit-Scraper
@proper sierra looks a little inspired by the SPIEGEL mining talk at 34C3 right?
@earnest prawn yes xd
hey this is pretty cool
Hi guys, may I know how I can transform this dataset in Python like I do in Excel?
like this on python
may i know the first pic is showing or keep loading ?
just keeps loading
Hi this is the first pic look like sorry for waiting
@strong flare you're using pandas right?
that just means you will have it in a csv
or you should have it in a csv
you can just import from the csv with pandas, and then you will need to create a new dataframe with the headings you want
right. do you know how to construct a dataframe?
i'm learning right now but i don't know why it shows this error
ValueError: x and y must have same first dimension, but have shapes (3174,) and (1509, 4)
can you help me in code 
well I would do a tutorial on pandas first to understand the basics
it will make it easier to understand
ok you can use my dataset to do the tutorial
i mean look one up in google
thanks bozo
man
both pandas and numpy docs
they really lack a "basic concepts" section
it all jumps right into technical stuff
the world needs a good coherent intro to this stuff
More basic than https://docs.scipy.org/doc/numpy/user/quickstart.html?
thats better than i remember
huh you know thats pretty good
it covers broadcasting rules
There's also one for pandas: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#min
10 minutes to pandas was the one i was thinking of
i remember when i started w/ pandas coming from R, i read 10 minutes to pandas and had absolutely no idea what was going on
selection by label? what?
the numpy one is serviceable because it starts by describing the data model
pandas expects you just figure out that theres this Index data structure, and that a Dataframe is logically a collection of Series, et al
I think the main problem in the pandas one is that they omit half of the example
Show the df with labels and then the result of a selection
ugh yes. they reuse the same example from the beginning and expect you to keep track
That way, you can see what's happening and learn the terminology
Yeah, it's one of the basic writing principles we were taught in our scientific writing course: Don't rely on your readers remembering things read on page 1 when they're reading page 7
good principle
Hey guys, I want to make my clean function more accessible and wanted to know if dropna(subset='COLUMNS') could be set to where it reads every column no matter what file you put through it
So if I have a csv of 15 columns, it'll sort through them, but then sort through another csv with 5
so just dropna and that's it
yes
well with ()
but in general you can get all the columns from a dataframe with df.columns
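A small sketch of the point above: calling `dropna()` with no `subset` already checks every column, whatever the file contains, so the same function works for a 15-column and a 5-column CSV. The function name and the two sample frames are made up for illustration.

```python
import numpy as np
import pandas as pd


def clean(df):
    """Drop every row that has a missing value in any column."""
    # Equivalent to df.dropna(subset=list(df.columns)), just shorter.
    return df.dropna()


wide = pd.DataFrame({"a": [1, np.nan, 3], "b": [4, 5, np.nan], "c": [7, 8, 9]})
narrow = pd.DataFrame({"x": [1.0, np.nan]})

print(clean(wide))
print(clean(narrow))
```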
Tutorial on the basics of Python's data frames (spread sheet) library, Pandas in this tutorial. Intro to statistical data analysis and data science. RELATED ...
Here's a good guide on Pandas dataframes
Ignore the "from Silicon Valley" crap marketing lol
Hey, guys! I'm trying to make the code to read 5 different pages content and gather it as a table in Pandas. How do I make the selenium to change the pages and read the content?
MWE:
```python
from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl import Workbook
import numpy as np
import pandas as pd

url = "https://scon.stj.jus.br/SCON/legaplic/toc.jsp?materia=%27Lei+8.429%2F1992+%28Lei+DE+IMPROBIDADE+ADMINISTRATIVA%29%27.mat.&b=TEMA&p=true&t=&l=1&i=18&ordem=MAT,@NUM"
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div/div/div[3]/div[2]/div/div/div/div[16]/a')  # Points to the provision (art. 17, § 6)
python_button.click()
driver.switch_to.window(driver.window_handles[-1])
python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div[1]/div/div[3]/div[2]/div/div/div/div[3]/div[2]/span[2]/a')  # Points to the search results
python_button.click()
driver.switch_to.window(driver.window_handles[-1])
textList = driver.find_elements_by_class_name("docTexto")  # Variable that pulls the data from the fields of the results list
resultados = BeautifulSoup(driver.page_source, 'lxml')
parse = resultados.find('div', {'id':'listadocumentos'})
paragrafoBRS = parse.find_all('div',{'class':'paragrafoBRS'})
header = []
content = []
for each in paragrafoBRS:
    header.append(each.find('h4', {'class':'docTitulo'}).text.strip())
    content.append(each.find(['div','pre'], {'class':'docTexto'}).text.strip())
df = pd.DataFrame([content], columns = header)
df.to_excel('dados.xlsx')  # Exports the data to an .xlsx file
driver.quit()
```
It already successfully exports the data to a .xlsx spreadsheet
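For the multi-page part of the question, one possible structure is sketched below. It deliberately leaves the browser out: `read_page` and `go_to_next_page` are hypothetical callables, and the fake pages only stand in for real ones. With Selenium, `read_page` would parse `driver.page_source` the way the MWE does, and `go_to_next_page` would click the site's "next page" link, whose locator is site-specific and not shown in the chat.

```python
import pandas as pd


def collect_pages(read_page, go_to_next_page, max_pages=5):
    """Read up to max_pages pages and stack them into one DataFrame."""
    frames = []
    for page_number in range(max_pages):
        frames.append(read_page())
        is_last = page_number == max_pages - 1
        # Stop early when there is no further page to move to.
        if is_last or not go_to_next_page():
            break
    return pd.concat(frames, ignore_index=True)


# Stand-in for the browser: three in-memory "pages" with one row each.
fake_pages = [pd.DataFrame({"Processo": [f"doc {i}"]}) for i in range(3)]
position = {"page": 0}


def read_page():
    return fake_pages[position["page"]]


def go_to_next_page():
    if position["page"] + 1 >= len(fake_pages):
        return False
    position["page"] += 1
    return True


result = collect_pages(read_page, go_to_next_page)
print(result)
```

Keeping the scraping and the paging as separate functions makes it possible to test the table-building logic without opening a browser.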
I'm trying to output a coordinates map but my lat and lon are switched and I think that's the reason for making my map behind it super small
as in my lat is on x axis and lon is y axis
I am setting geometry = [Point(xy) for xy in zip(df["lng"],df["lat"])]
Which I thought would do the trick, but the latitude is still x axis
Nevermind
I think I got it
it depends on how you defined your Point class
Also check out namedtuples
```python
import pandas as pd

df = pd.read_excel(r'data\mcd vs kfc eg.xlsx')
KFC = df['Restaurants'] == 'KFC'
KFC_Amount = df['Amount'] > 0
KFC_Data = df[KFC & KFC_Amount]
MCD = df['Restaurants'] == 'MCD'
MCD_Amount = df['Amount'] > 0
MCD_Data = df[MCD & MCD_Amount]  # was df[KFC & KFC_Amount], which selected the KFC rows again
KFC_Data = KFC_Data.groupby(['Date']).sum()
MCD_Data = MCD_Data.groupby(['Date']).sum()
```
Hi guys, may I know how I can join these two results (KFC_Data, MCD_Data) in one table with specific column names?
Do I have to convert these two into a new dataframe and join them together?
@onyx granite it can be date ? a column in the dataset
there should be a way to specify an index to be a datetime
check the docs and you should be able to join based on that
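A minimal sketch of joining two per-date results side by side, using a tiny made-up frame in place of the Excel file. The column names `Restaurants`, `Amount`, and `Date` come from the snippet above; the values are invented. Because both grouped results are indexed by `Date`, `pd.concat(..., axis=1)` lines them up on that index.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-03-01", "2019-04-01", "2019-04-01"]),
    "Restaurants": ["KFC", "MCD", "KFC", "MCD"],
    "Amount": [10.0, 20.0, 30.0, 40.0],
})

# One Series per restaurant, indexed by Date and named after the target column.
kfc = df[df["Restaurants"] == "KFC"].groupby("Date")["Amount"].sum().rename("KFC_Sales")
mcd = df[df["Restaurants"] == "MCD"].groupby("Date")["Amount"].sum().rename("MCD_Sales")

combined = pd.concat([kfc, mcd], axis=1)  # aligned on the shared Date index
print(combined)
```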
hi
i'm looking for a good plot
I have some groups names in a column and their correlation score in a column
what sort of plot would look good
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel(r'Data\mcd vs kfc eg11.xlsx')

KFC = df['Restaurants'] == 'KFC'
KFC_Amount = df['Amount'] > 0
KFC_Data = df[KFC & KFC_Amount]
# Renaming in place on a filtered slice is what can trigger SettingWithCopyWarning.
KFC_Data.rename(columns={
    'Amount': 'KFC_Sales',
    'Count': 'KFC_Count',
},
    inplace=True)
# groupby('Date') already makes Date the index, so no separate set_index is needed.
KFC = KFC_Data.groupby('Date').sum()

MCD = df['Restaurants'] == 'MCD'
MCD_Amount = df['Amount'] > 0
MCD_Data = df[MCD & MCD_Amount]
MCD_Data.rename(columns={
    'Amount': 'MCD_Sales',
    'Count': 'MCD_Count',
},
    inplace=True)
MCD = MCD_Data.groupby('Date').sum()

Combined = pd.concat([KFC, MCD], axis=1)

# Plot with differently-colored lines.
plt.plot(Combined.index, Combined.KFC_Sales, 'b-', label='KFC_Sales')
plt.plot(Combined.index, Combined.MCD_Sales, 'g-', label='MCD_Sales')
plt.plot(Combined.index, Combined.KFC_Count, 'r-', label='KFC_COUNT')
plt.plot(Combined.index, Combined.MCD_Count, 'y-', label='MCD_COUNT')
# Create legend.
plt.legend(loc='lower right')
plt.xlabel('Date')
plt.ylabel('KFC VS MCD')
plt.show()
```
Finally success 
but it still shows an error
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
To register the converters:
>>> from pandas.plotting import register_matplotlib_converters
>>> register_matplotlib_converters()
warnings.warn(msg, FutureWarning)
that's just a warning
@lapis sequoia thanks i will just ignore
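Ignoring the warning works here, but it can also be avoided. A likely cause, as an assumption based on the posted code, is calling `rename(..., inplace=True)` on a frame that was produced by boolean filtering. Taking an explicit `.copy()` of the filtered rows makes the intent clear to pandas. The small frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Restaurants": ["KFC", "MCD", "KFC"],
    "Amount": [10, 20, 0],
})

# .copy() makes kfc_data an independent frame rather than a slice of df.
kfc_data = df[(df["Restaurants"] == "KFC") & (df["Amount"] > 0)].copy()
kfc_data.rename(columns={"Amount": "KFC_Sales"}, inplace=True)

print(kfc_data)
print(df.columns.tolist())  # the original frame keeps its column names
```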
Does anyone have any good suggestions for economic forecasting libraries on python
I'm using ARIMA for time series but I feel it's lacking complexity for economics and I don't know how to fix that lol
Hi all Pros, May i know how can i select month on date ?
like i need to select March on the date listed below, how can i get it ? the dtype is ('<M8[ns]')
2019-03-01
2019-04-01
2019-05-01
@vestal lagoon You can check out facebook's prophet library. It's a bit more advanced but still quite intuitive. Time series forecasting is hard though and simple models often perform as well as more complex models so be aware that there might not be a magic bullet here.
@strong flare You can find the month of a datetime variable date as follows: date.month. If you have your dates as a column in a pandas data frame called df, you need the .dt accessor to get the month of each row, so a subset of the dataframe for March looks like this: df.loc[df['Dates'].dt.month == 3]
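A short runnable version of that selection, using the three dates from the question. The column name `Dates` follows the answer above; the `Amount` column is made up so there is something to show.

```python
import pandas as pd

df = pd.DataFrame({
    "Dates": pd.to_datetime(["2019-03-01", "2019-04-01", "2019-05-01"]),
    "Amount": [1, 2, 3],
})

# A single timestamp exposes .month directly; a whole column goes through .dt.
single_month = pd.Timestamp("2019-03-01").month
march = df.loc[df["Dates"].dt.month == 3]

print(single_month)
print(march)
```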
@polar acorn ok got it thank you
So I found this post with the exact same question I have, and the answer is very clear to me. However, I'm having a lot of issues implementing this, can anyone give me some directions on what I should and shouldn't do?
https://dsp.stackexchange.com/questions/24017/which-filter-for-an-audio-equalizer?rq=1
also @summer plover could the admins discuss the possibility of opening up a channel specifically for audio/video/other signal processing?
that is something we can talk about. could you make the same request in #community-meta as well?
sure
❤
@spark nimbus hmu with your signal processing questions
putting them mechatronic engineering skills to work
they have specific terms btw, not frequency domain vs time domain
okay so tl;dr I want to add an EQ for users to configure simply with sliders (similar to how V4A works if you've ever used it)
IIR and FIR
IIR will probably be better
You can treat them as multiple bandpass butterworth filters for which you change the gain
I what
note that I have practically zero knowledge of such terms
I know IIR and FIR and thats it
A band-pass filter, also bandpass filter or BPF, is a device that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range.
Basically the user changes these sliders (though idk how to extract the cutoff values used here) and it gets applied to my signal
Just make arbitrary cutoffs (take the entire 20-20khz range and split it up (possibly in log scale?)) and make a bandpass filter for each range
I couldn't find any code that does this at all sadly
any recommendation on how to do that?
also, how would I make a bandpass filter?
and do I pass it through all of them sequentially or in parallel and them sum them?
Bandpass filter is essentially a low pass and a high pass together
And parallel and then sum
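The suggestion above (log-spaced bands, one Butterworth band-pass per band, run in parallel, scale by the slider gain, then sum) can be sketched with SciPy. This is only an illustration under assumptions: five bands, second-order sections, a 44.1 kHz sample rate, and a made-up `equalize` function. A bank like this is not guaranteed to be perfectly flat when all gains are 1, so it is a starting point, not a finished EQ.

```python
import numpy as np
from scipy import signal


def equalize(samples, fs, gains, f_low=20.0, f_high=20000.0, order=2):
    """Scale len(gains) log-spaced frequency bands of `samples` and sum them."""
    # Band edges spaced evenly on a log scale between f_low and f_high.
    edges = np.geomspace(f_low, f_high, len(gains) + 1)
    output = np.zeros(len(samples))
    for gain, lo, hi in zip(gains, edges[:-1], edges[1:]):
        sos = signal.butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        # Each band is filtered from the original input (parallel), then summed.
        output += gain * signal.sosfilt(sos, samples)
    return output


fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz test tone

flat = equalize(tone, fs, gains=[1, 1, 1, 1, 1])
cut = equalize(tone, fs, gains=[1, 1, 0, 1, 1])  # mute the band containing 1 kHz


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


print(rms(tone), rms(flat), rms(cut))
```

The input can be a plain list of floats, since `sosfilt` converts it to an array internally.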
what's a low pass/high pass filter?
Anyone familiar with pyspark? i have a spark dataframe containing a column of arrays (A) and another column of structs (B). The array elements themselves are structs of the same structure as the dataframe struct column (B), what's the correct way to append B to arrays in A, row-wise?
I checked that one but it didn't seem like it worked the same nor was the code very readably structured
and it also doesnt specify the width
looking at the code, there's 5 filters and each is initialised with its parameters already made
I also tried checking the pulseeffects source but it's also pretty difficult to understand their implementation
https://github.com/twoz/pyEQ/blob/master/filters.py
defines all filter types, in particular as elliptic filters for the band-pass and high pass filters and butterworth filter for low pass. it uses scipy as a backend so it shouldnt be hard to figure out what each parameter is for with a little bit of documentation reading
so I wanted the bandpass filter right
is that the HPButter one?
HPButter -> butterworth high pass
uhh
In signal processing, a filter is a device or process that removes some unwanted components or features from a signal. Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal. Most often...
and now in a way I understand? ^^'
This gives basic background
high pass blocks low frequencies (high ones pass), low pass blocks high frequencies (low ones pass), bandpass blocks both low and high except in a band that passes
butterworth, chebyshev and elliptic are types of filters
so to make a bandpass I just add up a low pass and high pass?
wont that make a giant spike tho
\omega is angular frequency, |G| is the magnitude of the gain
this is adding up both
Where'd you get those graphs from?
it's an approximation I made in desmos using just log_e(x) and e^x
both converging to the same Y
- it's not what filter responses look like generally
- you dont add them up in the + sense of the word, you compose them. one after another. that's a convolution in the time domain, multiplication in s domain
a what
- you have control over the cutoff frequencies. you can shift them left and right
whats a time domain and s domain and convolution
f(t) is your signal
f(\omega) is your signal in frequency domain (after fourier transform)
f(s) is your signal in complex frequency domain (after laplace transform)
convolution is a mathematical operation involving shifting, multiplying and integrating two functions. according to the convolution theorem, a convolution in one domain (time or frequency) is a multiplication in the other domain
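The convolution theorem can be checked numerically on two short made-up sequences: convolving them directly in the time domain gives the same result as multiplying their Fourier transforms and transforming back. Both transforms are zero-padded to the full output length so the FFT product matches the linear convolution.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Time domain: shift, multiply, and sum.
direct = np.convolve(f, g)

# Frequency domain: multiply the spectra, then transform back.
n = len(f) + len(g) - 1  # length of the linear convolution
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(direct)
print(via_fft)
```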
Wish I could dumb it all down but it's taken me 2 semesters of classes in my engineering degree to get the hang of it all, it's not an easily approachable subject for a person without an electrical engineering background and maths is the field's bread and butter
It's mostly just terminology I need to get used to
If I see code I practically understand complex concepts immediately
Code doesnt help you learn a semester's worth of signal theory that underpins that code
My suggestion would be to just reuse someone else's code
But there's not even any code compatible with my project :(
The design is really hard to work with given my current framework
Just figuring out how to make it work without GUI is difficult enough already
Then I need to make it work from just a list of floats
Which part don't you understand