#data-science-and-ml
So if I set my unit=32
I'm guessing we have 32 different combinations of bs and cs
Right?
subject to the above caveat
but
"combination" is not really
an appropriate word
or rather, it's ambiguous
but, yes, in essence you have 32 (w, b) tuple-equivalents
which are independent
So what decides which 32 values of b,c you get?
what's backpropagation of error
kaggle
I would suggest
you pick up a book, or a video course, or something like that
it's important to have a theoretical foundation
Oh, the loss function?
it's the application of the chain rule, given the application of a loss function to a neural network's prediction vs ground truth, to successively update the weights (including bias) of preceding layers
the layer closest to the end is updated first
and then the weight updates are propagated backwards throughout the network
So the number of layers is the number of times the gradient is calculated and applied?
hm
you can think of it that way
but that is not always true
because you work at a higher level of abstraction than that
sometimes layers may incorporate multiple such mathematical operations
each of which requires one backpropagation step
consider for example
RNNs
Yeah, I'm going to go look up backpropagation a bit
I think I'm there with the individual components, I just don't have much intuition as to how it all ties together
Thanks 🙂
Can someone suggest me a good roadmap for deep learning? Thanks!
Heyo, does anyone here know how to download a .json file from an HTML link and convert it into a dataframe or CSV format?
use requests library?
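requests works for this (`requests.get(url).json()` returns the parsed JSON). The sketch below sticks to the standard library's `urllib.request` plus pandas so it needs one fewer dependency; `pandas.json_normalize` flattens nested JSON into columns and `to_csv` writes the file. The URL in the usage comment is a placeholder.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def json_to_frame(payload):
    """Flatten parsed JSON (a dict or a list of records) into a DataFrame.

    Nested objects become dotted column names, e.g. {"b": {"c": 2}} -> column "b.c".
    """
    return pd.json_normalize(payload)


# Usage (hypothetical URL):
# df = json_to_frame(fetch_json("https://example.com/data.json"))
# df.to_csv("data.csv", index=False)
```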
I know the derivation of the two equations above
but I'm not able to derive the third one (marked with 2 arrows)
it is for a one-hidden-layer NN
we are using sigmoid for the first layer and tanh for the output layer
this is the cost function J
does someone know how to derive it?
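The equation images aren't preserved, so this assumes the arrowed one is the hidden-layer gradient $dZ^{[1]}$ in the usual one-hidden-layer notation: $Z^{[1]} = W^{[1]}X + b^{[1]}$, $A^{[1]} = g^{[1]}(Z^{[1]})$, $Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}$, $m$ examples stored as columns, and $dZ^{[2]}$ the per-example derivative $\partial\mathcal{L}/\partial Z^{[2]}$ stacked column-wise (so the $1/m$ only appears in $dW$ and $db$). It is one more application of the chain rule, first through $Z^{[2]}$ and then through the hidden activation:

```latex
\begin{aligned}
dA^{[1]} &= W^{[2]\top} dZ^{[2]} \\
dZ^{[1]} &= dA^{[1]} \odot g^{[1]\prime}\!\left(Z^{[1]}\right)
          = W^{[2]\top} dZ^{[2]} \odot g^{[1]\prime}\!\left(Z^{[1]}\right) \\
g^{[1]} = \sigma \;&\Rightarrow\; g^{[1]\prime}\!\left(Z^{[1]}\right) = A^{[1]} \odot \left(1 - A^{[1]}\right) \\
dW^{[1]} &= \tfrac{1}{m}\, dZ^{[1]} X^{\top}, \qquad
db^{[1]} = \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
```

With a tanh output layer, $dZ^{[2]}$ itself picks up the factor $\tanh'(Z^{[2]}) = 1 - \left(A^{[2]}\right)^{\odot 2}$ from $dA^{[2]}$; the step from $dZ^{[2]}$ to $dZ^{[1]}$ is unchanged.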
hi, I'm having trouble converting this architecture into code
(2) The convolutional layer is followed by a max pooling layer. The pooling is 2x2 with stride 2.
(3) After max pooling, the layer is connected to the next convolutional layer, with 64 output feature maps. The convolution kernels are of 5x5 in size. Use stride 1 for convolution. The activation is ReLU.
(4) The second convolutional layer is followed by a max pooling layer. The pooling is 2x2 with stride 2.
(5) After max pooling, the layer is connected to another convolutional layer, with 128 output feature maps. The convolution kernels are of 5x5 in size. Use stride 1 for convolution. The activation is ReLU.
(6) After convolutional layer, there is fully connected layer with 3072 nodes and ReLU activation function.
(7) The fully connected layer is followed by another fully connected layer with 2048 nodes and ReLU activation function, then connected to the last fully connected layer with 10 output nodes (corresponding to the 10 classes). Use the SoftMax activation for the last layer.
so far i have:
```
keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
keras.layers.MaxPooling2D((2, 2), 2),
keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
keras.layers.MaxPooling2D((2, 2), 2),
keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
```
i don't know how to make a fully connected layer
or know if my input_shape arguments are correct
let me know 😎
@ me when you respond tysm
```
keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
keras.layers.MaxPooling2D((2, 2), 2),
keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
keras.layers.MaxPooling2D((2, 2), 2),
keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                    input_shape=(32, 32, 3)),
keras.layers.Flatten(),
keras.layers.Dense(3072, activation='relu'),
keras.layers.Dense(2048, activation='relu'),
keras.layers.Dense(10, activation='softmax')
])
```
something like this?
please help
but it doesn't go above 0.8
I have been training a hotword 5000 times
try changing hyperparameters
is that supposed to be VGG~ish?
maybe use bigger network
architecture description doesn't mention it, but it looks similar kinda
is it coursera
uni class
@lapis sequoia You can search it up on StackOverflow; it's a hardware reason. On a GPU, batch sizes that are powers of 2 can be computed efficiently (maybe because the work is split across CUDA cores in parallel?). In the end, it boils down to the GPU architecture and what Nvidia has adopted.
```
model = keras.Sequential([
    keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(3072, activation='relu'),
    keras.layers.Dense(2048, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10**(epoch / 10))
optimizer = keras.optimizers.SGD(
    learning_rate=0.01, momentum=0.0, nesterov=False, name="SGD"
)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
I ended up setting up my model like this. I'm not exactly sure if it's correct, but I really hope so, because training 20 epochs is taking forever even on Colab
what's the prob?
I'm not exactly sure if I set it up correctly and wanted to make sure before I spent the time training the model
specifically the input shape param
i did it mostly looking at a kaggle notebook and kinda guessing
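For comparison, a sketch that follows the written spec more literally. Differences from the posted model: the input shape is declared once at the start (it is only needed on the first layer), every convolution uses stride 1 as the spec says (the posted code uses 2 for the second and third), the third convolution has 128 feature maps, and `Flatten` bridges the conv stack to the dense layers. Assumptions: the filter count for item (1) isn't preserved, so 64 is a guess; `padding="same"` is kept from the posted code because the spec doesn't mention padding; and `sparse_categorical_crossentropy` assumes integer labels 0 to 9 (keep `categorical_crossentropy` if your labels are one-hot). Separately, a `LearningRateScheduler` callback only takes effect if it is passed to `fit(callbacks=[...])`.

```python
from tensorflow import keras


def build_svhn_cnn(num_classes=10):
    """Sketch of the described architecture for 32x32 RGB inputs (SVHN crops)."""
    return keras.Sequential([
        keras.Input(shape=(32, 32, 3)),  # input shape declared once, up front
        # (1) first conv layer; 64 filters is an assumption (spec line not preserved)
        keras.layers.Conv2D(64, (5, 5), strides=1, padding="same", activation="relu"),
        keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # (2) 32x32 -> 16x16
        keras.layers.Conv2D(64, (5, 5), strides=1, padding="same", activation="relu"),  # (3)
        keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # (4) 16x16 -> 8x8
        keras.layers.Conv2D(128, (5, 5), strides=1, padding="same", activation="relu"),  # (5)
        keras.layers.Flatten(),  # 8x8x128 feature maps -> vector of 8192
        keras.layers.Dense(3072, activation="relu"),  # (6)
        keras.layers.Dense(2048, activation="relu"),  # (7)
        keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_svhn_cnn()
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",  # integer class ids as labels
    metrics=["accuracy"],
)
```

Most of the parameters sit in the first dense layer (8192 x 3072 weights), which is a large part of why each epoch is slow.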
uni eh? what's the end aim? any baselines?
it's using the SVHN dataset, the Google Street View House Numbers
but it's more of a learning activity or something
first intro to NN
wow, that's not a first intro to NN
requirement is to train the model and plot loss functions
so is your knowledge of CNNs fully fleshed out?
not particularly, but that may be my fault i am a little behind
yea, I suggest you take things slow and learn the basics first
i have a little experience with them cause i took the andrew ng DL coursera course a few years back but it has been a while
andrew ng is the goat
better learn DL from the ground up
Andrew Ng's course is shit - it's just spoon-feeding you code
yeah i got that vibe when i took it
though I didn't complete it, so I might be biased
but learning NNs from the ground up is much better
see this, I ain't being spoon-fed... (he said you may not wonder how to derive it because it's complex... but I did wonder)
i think its good if you see deeper with the course side by side
I don't mean to flood the channel, but is this normal? I know it's early in training, but accuracy hasn't changed in 4 epochs while the loss is going down
I don't see much from ground up skimming over his syllabus - and one slide does not represent his entire course
it's just starting to overfit
o not good
run it over a few more epochs to see whether val accuracy increases
wait can we tell overfitting just by this?
sorry for the stupid question, but how's it even possible for the loss to decrease and the accuracy to stay the same? Isn't the loss function measuring accuracy in a way, by calculating error?
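Cross-entropy loss rewards confidence in the correct class, while accuracy only checks whether the highest-probability class is right. If the model grows more confident on samples it already gets right (or less confidently wrong on the others) without any prediction flipping, the loss drops and accuracy stays flat. A toy illustration with made-up probabilities:

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class is the true class."""
    return float(np.mean(probs.argmax(axis=1) == labels))


labels = np.array([0, 1, 1])
# epoch N: 3 samples, 2 classes; the third sample is misclassified
early = np.array([[0.55, 0.45], [0.40, 0.60], [0.70, 0.30]])
# epoch N+1: more confident on the first two, less wrong on the third, no argmax flips
later = np.array([[0.90, 0.10], [0.10, 0.90], [0.60, 0.40]])

print("loss:", cross_entropy(early, labels), "->", cross_entropy(later, labels))
print("accuracy:", accuracy(early, labels), "->", accuracy(later, labels))
```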
can someone help me🙄
@cobalt creek can i dm you?
ye maybe
can someone help with this
how?
What model are u using
ls hotword
I haven't used it, but roughly, hyperparameters are the settings you choose yourself (learning rate, batch size, number of epochs, ...) whose values affect accuracy
can someone help me please
how can i help anyway
why is this showing 2000 when my training partition is 65000? I'm confused, please help
is there some default value for batch size? I just set it to 1 and now I have 65000 on the counter
exactly what i was expecting
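If this is Keras's `Model.fit` (the log doesn't say which library), a batch size that isn't passed defaults to 32, and the progress counter shows batches rather than samples: ceil(65000 / 32) = 2032, which would explain the roughly 2000. A minimal sketch of that arithmetic:

```python
import math


def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches (progress-bar steps) in one epoch.

    The default of 32 mirrors what Keras's Model.fit uses when batch_size is not given.
    """
    return math.ceil(n_samples / batch_size)


def total_updates(n_samples, batch_size, epochs):
    """With plain mini-batch gradient descent, each batch is one backprop pass and one weight update."""
    return epochs * steps_per_epoch(n_samples, batch_size)


print(steps_per_epoch(65000))     # default batch size
print(steps_per_epoch(65000, 1))  # batch_size=1: one step per sample
```

The second helper also gives a rough answer to "how many times do the weights get recalculated": epochs times batches per epoch, under the plain mini-batch assumption.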
I've got an RL algorithm that chooses between buying, selling or holding things. How do I prevent it from choosing actions like buying when it has no money, or selling when it doesn't have anything? It can choose these things for a lot of iterations and get 0 reward, which I assume breaks everything
@drifting void this one
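One common approach (a swapped-in technique, not something from this thread) is action masking: compute which actions are legal in the current state and give the illegal ones a score of minus infinity before choosing, so they can never be picked. The buy/sell/hold order and the cash/price/holdings state below are hypothetical.

```python
import numpy as np

ACTIONS = ("buy", "sell", "hold")  # hypothetical action order


def valid_action_mask(cash, price, holdings):
    """True where the action is allowed in the current state."""
    return np.array([cash >= price, holdings > 0, True])


def masked_greedy_action(q_values, mask):
    """Pick the best-scoring action among the allowed ones only (e.g. for DQN)."""
    masked = np.where(mask, np.asarray(q_values, dtype=float), -np.inf)
    return int(np.argmax(masked))


def masked_softmax(logits, mask):
    """Action probabilities for a policy network; illegal actions get probability 0."""
    z = np.where(mask, np.asarray(logits, dtype=float), -np.inf)
    e = np.exp(z - z.max())  # exp(-inf) == 0, so masked actions vanish
    return e / e.sum()


mask = valid_action_mask(cash=0.0, price=10.0, holdings=0)
print(ACTIONS[masked_greedy_action([5.0, 3.0, 1.0], mask)])  # only "hold" is legal here
```

At least one action (here "hold") must always be legal, otherwise every score is minus infinity.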
If i have an image, and its mask, what operation do i need to apply the mask but leave the background white?
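A minimal NumPy sketch, assuming the image is an (H, W, 3) uint8 array and the mask is an (H, W) array where nonzero means "keep": broadcast the mask over the color channels and use `np.where` to choose between the image and a white canvas.

```python
import numpy as np


def apply_mask_white_background(image, mask):
    """Keep pixels where mask is nonzero; paint everything else white (255)."""
    keep = mask.astype(bool)[..., np.newaxis]  # (H, W, 1) broadcasts over the 3 channels
    white = np.full_like(image, 255)
    return np.where(keep, image, white)
```

For a float image scaled to [0, 1], white would be 1.0 instead of 255.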
Sorry, my lab got destroyed and I couldn't get the sample data...
So my case is the following. I am generating a lot of data in a form I choose, last version is something like that:
[
'0001c06e32a85a5d92c9cb784ff6a492df1d0055',
'00088f45a8bc798ceb2b5a37505f787fad19d9af',
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[350, 351, 352, 353, 354, 355, 356, 357, 99],
-9.5,
1.0
]
Since I have many of these, I do that in parallel and append to a file. I chose msgpack but now the file is so large that I cannot read it back...
I use Dask for other use case and it worked well with reading many parquet files. So maybe I should write in several parquet files instead of msgpack single file.
My question is how do you usually write big files and what do you use for reading and searching later on
can someone please help me
i am making a personal assistant and i want to add a hotword in it
please help
Hey this is a fairly basic question I think. I was told to post in here.
I currently have pandas 0.20.3 installed. I want 0.24 or newer. I tried updating pandas, but pip apparently only sees 0.20.3 as the newest version.
Currently working on corporate servers so I can't download anything. Anybody know how to get the new version of pandas in my situation?
What Python version do you have?
3.7.4
That's really strange, hmm
Probably just how it's all set up here
Try updating pip, perhaps. python -m pip install --upgrade pip
I've had weird behaviour from old pips
Hmm that's giving me an error in the prompt. Says unable to get local issuer certificate
try also pip install --upgrade pip, I guess, but that ends up badly for me sometimes
It gives me that same error
hmm, weird
might want to open a help channel
something is wrong with your pip, possibly
in a regression model, would you keep variables that have low correlation with the target variable?
imagine my classifier classifies melons and watermelons
how can I make it classify a melon colored red as a melon and not a watermelon?
that's what I mean
if I paint a melon red, like, manually in Photoshop
the cnn will think it is water melon, but it is actually a melon
or orange - lemon
Well, include in the training set such trick examples.
like my question is, how can i make it not rely that much on colors but on shape
hey guys i wrote my first blog. It would be great if you check it out https://www.analyticsvidhya.com/blog/2021/04/exploratory-analysis-using-univariate-bivariate-and-multivariate-analysis-techniques/
would it be good to randomly paint some images in the training set???
Possibly, yeah
like, idk, then maybe when he sees an orange, maybe it will think it is a lemon that has been painted
:/
how will it not mix up real and fake?
Include enough examples and eventually it will learn.
You could also just grayscale the image and so abandon matching on color entirely, but that might reduce accuracy on normal examples.
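A sketch of automating the "paint some training images" idea as augmentation: with some probability convert an image to grayscale, otherwise shuffle its RGB channels, so color alone stops being a reliable cue while shape is untouched. Channel shuffling is a swapped-in augmentation rather than something proposed above; the probability is arbitrary, and uint8 RGB arrays are assumed.

```python
import numpy as np


def to_grayscale_rgb(image):
    """Luminance-weighted grayscale, repeated to 3 channels so the input shape is unchanged."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = image[..., :3].astype(np.float64) @ weights  # (H, W, 3) @ (3,) -> (H, W)
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)


def color_augment(image, rng, p_gray=0.3):
    """Randomly drop color (grayscale) or scramble it (permute RGB channels)."""
    if rng.random() < p_gray:
        return to_grayscale_rgb(image)
    return image[..., rng.permutation(3)]


rng = np.random.default_rng(0)
# Apply per training sample, e.g. inside your data loader:
# augmented = color_augment(image, rng)
```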
i just made it right now
if you didn't have the original photos, it would be hard even for you to tell which is a lemon and which is an orange
I'd be fooled by that too, yeah
so there is not actual way?
and on black and white, since u have less data to analyze, u need more examples, right?
Hey, I'm trying to build an AI that predicts a 6th number for a given 4-number series. What is the best neural net I could use for that? (I heard that RNNs, specifically LSTMs, are good for the task)
hello, I need help with an AI in OpenCV. Does anyone know about this topic?
that's a very circumstantial question. If you have no limitations in processing power, data collection, etc., would you keep it all?
Feature engineering / selection is practically a field of its own within ML / AI, so I don't think it's a trivial question
Could anyone direct me to information on feeding a bayesian network distributions as inputs? All I see are on using bayesian networks to produce a distribution as an output
to be fair, by "producing a distribution" we usually mean "parameters of a predefined family of distributions"
so you can also say that your inputs follow a parameterized family and feed those in
Hi guys, how can I get the 8 peak values in each chart? I have these values saved in a txt file and a numpy dataFrame
How many times do neurons get backpropagated in a neural network?
im not sure if a beautifulsoup question belongs here, but #help-carrot
that would be a #web-development question.
Hey guys! I had a quick Numpy question. If I have an array such as
[
[0], [1],
[1, 0],
[1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]
]
And I wanted to fill the blank spaces of the matrix with some number, let's say 8, to get an 8 by 3 matrix such as
[
[0, 8, 8],
[1, 8, 8],
[1, 0, 8],
[1, 1, 8],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]]
How would I do something like this?
how is numpy letting you make a non-rectangular array like that?
It's not, in the first place; I have to specify dtype=object
it's an array of lists
why is your data like that?
anyway, this is probably what I would do
Well, it's actually a step in solving a question on my CS assignment, so yeah
I would find another way to approach the problem so that you end up with NaNs instead of an improper matrix.
!e
import numpy as np
data = [
[0], [1],
[1, 0],
[1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]
]
max_length = len(max(data, key=len))
repeated_element = 8
a = np.array([row + [repeated_element] * (max_length - len(row)) for row in data])
print(a)
@velvet thorn :white_check_mark: Your eval job has completed with return code 0.
001 | [[0 8 8]
002 | [1 8 8]
003 | [1 0 8]
004 | [1 1 8]
005 | [1 0 0]
006 | [1 0 1]
007 | [1 1 0]
008 | [1 1 1]]
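Following the suggestion to pad with NaNs rather than a sentinel like 8: preallocate a float matrix full of the fill value and copy each row into its prefix. The result has to be float, because NaN has no integer representation; functions such as `np.nanmean` then skip the padding.

```python
import numpy as np


def pad_ragged(rows, fill=np.nan):
    """Stack variable-length rows into a float matrix, padding short rows with `fill`."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill, dtype=float)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row  # copy the row into the left part; the rest stays `fill`
    return out


data = [[0], [1], [1, 0], [1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
print(pad_ragged(data))
```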
wow
neurons don't get backpropagated.
define "peak"
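If "peak" means a local maximum (an assumption), `scipy.signal.find_peaks` finds them, and sorting by height keeps the 8 tallest. `find_peaks` also accepts `height`, `distance` and `prominence` arguments to ignore small noise bumps.

```python
import numpy as np
from scipy.signal import find_peaks


def top_k_peaks(y, k=8):
    """Indices of the k tallest local maxima in y, returned in left-to-right order."""
    y = np.asarray(y, dtype=float)
    peaks, _ = find_peaks(y)                         # all local maxima
    tallest = peaks[np.argsort(y[peaks])[::-1][:k]]  # keep the k highest
    return np.sort(tallest)


# Per chart: idx = top_k_peaks(values, 8); heights = values[idx]
```

If a chart has fewer than k local maxima, all of them are returned.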
Sorry, the weights
How many times do they get recalculated
Thanks that helped 🙂
that would depend on the library
weights don't get backpropagated either...
...but if I understand the thrust of your question correctly
that would depend on the architecture of the network.
are you talking about resilience to adversarial examples?
I am solving this problem in help-cake, could you check it out? I am struggling with one thing
@velvet thorn In your code how would I replace the number 8 with something different?
Nm i got it
How can I check when a series of booleans like "True, True, True, True, False, False" changes from True to False?
compare to a slice
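"Compare to a slice" means lining the array up against itself shifted by one: position i is a True-to-False change when element i-1 is True and element i is False.

```python
import numpy as np


def true_to_false_positions(flags):
    """Indices i where flags[i-1] is True and flags[i] is False."""
    a = np.asarray(flags, dtype=bool)
    # a[:-1] is "previous", a[1:] is "current"; +1 converts back to the current index
    return np.flatnonzero(a[:-1] & ~a[1:]) + 1


print(true_to_false_positions([True, True, True, True, False, False]))
```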
Guys can you please help me with this question
I'm struggling to implement a sine wave predictor using LSTM in pytorch. If someone can help me understand why it's not working
wtf
how r ya guys
@bright aurora u there?
woao
highjshgsjdf
where did u learn pytorch=?
What's up
What’s up guys?
anyone free to look at my beginner data analysis code? I'd really appreciate the help
Would someone mind having a look at a short notebook where I'm toying with some data exploration/visualization? I wound up coming to almost the opposite conclusion I expected when I started and I'm wondering if I inverted something somewhere or made some really stupid mistake in sorting and filtering my data?
How does one typically use non-image data with image data when training models using pytorch? I have used pytorch for image classification, but never for image classification with non-image features. Any thoughts?
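One common pattern (a sketch, not the only option): a CNN branch turns the image into a feature vector, a small MLP handles the non-image features, and the two vectors are concatenated before the classifier head. The layer sizes are arbitrary placeholders, and your `Dataset.__getitem__` would return `(image, tabular, label)`.

```python
import torch
from torch import nn


class ImageWithTabular(nn.Module):
    """CNN branch for the image, MLP branch for extra features, fused by concatenation."""

    def __init__(self, n_tabular, n_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B, 16, 1, 1) for any input size
            nn.Flatten(),             # -> (B, 16)
        )
        self.mlp = nn.Sequential(nn.Linear(n_tabular, 16), nn.ReLU())
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, image, tabular):
        features = torch.cat([self.cnn(image), self.mlp(tabular)], dim=1)
        return self.head(features)


model = ImageWithTabular(n_tabular=5, n_classes=3)
logits = model(torch.zeros(4, 3, 32, 32), torch.zeros(4, 5))
```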
Sorry, asked this in help section, but since I'm in deep dung, and the question probably was a bit too specific, I'd ask it here if ok:
I did SVD on some pre-centered data,
# done in python
# T: amount of samples with time, kind of our "variable" with this type of data
# X: data put inside a np.ndarray
# X.shape = (T=109, N_Lat*N_Lon=alot)
# X has mean ~ 0
U, s, V = svd(X)
# mean ~ 0, std ~ var ~ 1
# but min ~ -2, max ~2.5
# retain three components
standardised_PCs = sqrt(T) * U[:, 0:3]
# Since standardised, I'd assume this would result in the correlation matrix, but...
standardised_PCs.T @ standardised_PCs
array([[ 1.09000000e+02, 1.45674989e-14, -8.57975238e-15],
[ 1.45674989e-14, 1.09000000e+02, 2.23983947e-14],
[-8.57975238e-15, 2.23983947e-14, 1.09000000e+02]])
The diagonals are equal to T rather than 1.
I feel like I misunderstand the approach or result somehow. Everywhere I look I feel they say you'd get a correlation matrix using standardised PCs
My reference for this approach is (eq. 16)
http://www.ehu.eus/eolo/pyclimate/downloads/matrix.pdf
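The columns of U from an SVD are orthonormal, so UᵀU = I and therefore (√T·U)ᵀ(√T·U) = T·I, which matches the 109 on the diagonal. A correlation (with population normalization) carries a 1/T factor, so the cross-product has to be divided by T to become the identity; if a reference normalizes by T−1 instead, scale by √(T−1) and divide by T−1. A small check, with random data standing in for the real field (an assumption):

```python
import numpy as np


def standardized_pcs(X, k):
    """Leading k principal-component time series of centered X (T x N), scaled to unit variance."""
    T = X.shape[0]
    # full_matrices=False avoids building an N x N matrix when N is large
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(T) * U[:, :k]


rng = np.random.default_rng(0)
X = rng.standard_normal((109, 500))
X -= X.mean(axis=0)         # center each grid point in time
pcs = standardized_pcs(X, 3)
T = X.shape[0]
corr = pcs.T @ pcs / T      # the 1/T turns the cross-product into a correlation matrix
print(np.round(corr, 10))
```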
Indirectly. How often a backpropagation is fired depends on your batch size, and your learning algorithm. So, you can pretend it happens once for every batch but there's exceptions too. Now that means number of epochs also affects it. And then finally you throw a gpu into the mix and it all goes to shit
Makes sense. Helps with the intuition anyway. Thanks.
I'm having a problem with Chinese and Korean words using seaborn+matplotlib, does anyone know how I can fix that?
Can't help you directly, but found this:
https://stackoverflow.com/questions/58172176/python-seaborn-plot-shows-data-names-as-ㅁㅁㅁㅁ-how-can-i-fix-this
ty
!eval [code]
Can also use: e
*Run Python code and get the results.
This command supports multiple lines of code, including code wrapped inside a formatted code
block. Code can be re-evaluated by editing the original message within 10 seconds and
clicking the reaction that subsequently appears.
We've done our best to make this sandboxed, but do let us know if you manage to find an
issue with it!*
what do you guys prefer for saving an ML model? I read about h5, pickle, YAML, JSON... which one should I prefer?
probably h5 or pickle, would prefer h5
storing giant arrays of numerical data in YAML or JSON is a crime against efficiency
like, how'd you encode them, as base64?
oh
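For scikit-learn models, pickle (or joblib) is the usual route; Keras models have their own `model.save(...)`, which writes HDF5 when given a `.h5` filename. Both store the numeric arrays in binary form instead of as text. A minimal pickle round trip on a toy model; only unpickle files you trust, and load with the same library versions you saved with.

```python
import pickle

from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# In-memory round trip; pickle.dump(model, f) / pickle.load(f) on a file opened in
# binary mode ("wb" / "rb") do the same thing on disk.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```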
does R perform better than python though
like in terms of computing speed
i assume it would
considering python needs so much dependencies
Bad assumption, you're assuming number of dependencies decides programming speed.
Is there a go-to lib for a/b testing?
I mean, wouldn't numpy be slower compared to if it were directly a Python thing?
I imagine most performance-driven stuff in python as well as r is basically a wrapper around lower level language functionality.
Hey guys, I need to make a Visualisation project in Tableau
I chose the London Underground, Bus and Overground usage data compared to daily covid cases. The data looks like this:
I need ideas for the visualisation
the tricky part is that I don't have dates, but rather time periods: "Date from" to "Date to". How can we handle that while visualizing?
Numpy and dask are two examples for instance
Any Data Engineering and Visualisation expert here?
No, numpy is actually going to be a lot faster than native Python or R (for context, numpy's core is written in C). This is kind of why the Python ecosystem is so strong: Python acts as the glue language while the heavy lifting is written in low-level languages. Otherwise Python wouldn't be dominating the data science space right now
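A quick way to see the point about compiled code: summing a million integers in an interpreted Python loop versus a single numpy call. Timings depend on the machine, so the sketch prints them rather than assuming a particular ratio.

```python
import timeit

import numpy as np

values = list(range(1_000_000))
array = np.arange(1_000_000, dtype=np.int64)  # explicit int64 so the sum can't overflow


def python_sum():
    total = 0
    for v in values:  # interpreted loop: per-element bytecode dispatch
        total += v
    return total


def numpy_sum():
    return int(array.sum())  # one call into numpy's compiled C loop


t_py = timeit.timeit(python_sum, number=5)
t_np = timeit.timeit(numpy_sum, number=5)
print(f"pure Python: {t_py:.3f}s   numpy: {t_np:.3f}s")
```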
It's a very fun read. I really like Jeff's story, need more people like him.
Both R and Python end up calling some C code (for datascience stuff), that C code probably involves calling BLAS/CBLAS (e.g. calling numpy) which will result in mostly identical speed. The overhead of Python and R for calling a C function may be different, but it's irrelevant to any data science task. For example, if python took say 0.2 milliseconds to call a C function which got the mean of 20000 data points and R did the same but with 0.1 milliseconds overhead, it would not matter since something like 99.9...% of the time is spent in the C function (actually computing the mean). So it's a micro-optimization at best. If you are worried about speed and want to get serious about it, consider learning C to make fast things. As it will probably result in you learning about the relative speeds of things in modern computing and the C community is more focused on such things while Python/R is focused on using the things made by those people to be productive without too much work (Systems programmers make the fast systems which Python and R programmers use for their specific use cases).
(People that know both Python and C are the engines of the Python community that let everyone be very productive with Python (and there are a lot of them -> python is very big / used everywhere))
yea, I was pretty taken aback when I learnt he was the Palm guy. You wouldn't think someone in neuroscience would dip into mobile phones/portables.
Yeah it's what makes me trust his opinions much more, he has actual experience making things (software especially, a bunch of people in AI never actually programmed which is strange (Often lacking a grasp of computational complexity and such)).
a bunch of people in AI never actually programmed which is strange
I wouldn't believe it lol. it's such a fundamental thing when working with heavy computation
Yeah, they put out a bunch of theory stuff (typically some crazy equations and such), but don't know whether such a thing could actually be computed efficiently. One cannot ignore the physical reality of implementing an idea.
hey y'all, could someone of you take pity on me and have a look at my problem that posted over here on r/learnpython? https://www.reddit.com/r/learnpython/comments/mxr8yw/merging_pandas_dataframes_how_can_i_split_my_big/
Can someone please help me with sklearn.preprocessing.OneHotEncoder. I can't figure out how to use its categories parameter.
Categories is what the encoder "learns" when you fit it. That is,
if your column had 3 options A, B, C, those are going to be the categories the encoder fits
I think you can also pass a list of your own categories, if you already have them or if you want to exclude unknowns
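`categories` takes one list per input column. Passing it fixes the category set up front instead of learning it from the data in `fit`; combined with `handle_unknown="ignore"`, values outside the list encode as all zeros at transform time. A small sketch:

```python
from sklearn.preprocessing import OneHotEncoder

# one inner list per input column; here a single column with the fixed set A, B, C
enc = OneHotEncoder(categories=[["A", "B", "C"]], handle_unknown="ignore")
enc.fit([["A"], ["B"]])  # "C" never appears in training but still gets its own column

print(enc.categories_)
print(enc.transform([["C"], ["D"]]).toarray())  # unknown "D" encodes as all zeros
```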
Hey! I just followed Tech With Tim's tutorial to create an AI playing Flappy Bird using the NEAT algorithm
everything works as intended and I now want to check if I understood correctly by coding a snake game
But I'm wondering about something: something the snake game has that Flappy Bird doesn't is collectibles
Basically If I have X snakes playing around at any given time but only one apple for them to eat, this will cause issues. My question is : should I give each snake its own Apple that other snakes can't eat ? In this case, should all the apples be at the same position (apple #1 will be at 54;60 for every snake then apple #2 at 100;100, etc) or will a random position for each snake work just fine ?
Thanks for your answers lol I'm only starting out with ML
you are starting ML with NEAT !??
someone who understand about asia fonts(cjk) can give me a hand?
i'm trying to set the Noto Sans CJK font family using seaborn
sns.set_style({'font.family':'NotoSansCJK-Medium.ttc'})
i tried this too
sns.set_style({'font.family':'Noto Sans CJK'})
idk if the problem is with the font or with the code
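`font.family` expects a family name that Matplotlib already knows, not a file name. A font file that isn't installed system-wide can be registered first and its family name read back from the file. The path in the usage comment is a placeholder; `.ttc` collections may not load on every Matplotlib version, so a single-language `.otf`/`.ttf` is the safer choice. Seaborn's `set_style`/`set_theme` reset the font settings, so this has to run after them.

```python
from matplotlib import font_manager, rcParams


def use_font_file(path):
    """Register a font file with Matplotlib and make its family the default.

    Returns the family name, which is what 'font.family' expects.
    """
    font_manager.fontManager.addfont(path)  # available in Matplotlib >= 3.2
    family = font_manager.FontProperties(fname=path).get_name()
    rcParams["font.family"] = family
    return family


# Usage (hypothetical path), after sns.set_style(...) / sns.set_theme(...):
# use_font_file("/path/to/NotoSansCJKsc-Regular.otf")
```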
Following along one of keras tutorials with my own data....really just trying to use datasets instead of a dataframe....but I keep getting the error that the model expects 3 inputs, but only receives 1 input tensor when it tries to fit the model to the dataset. Stackoverflow solution was to ensure that the second part of the tuple for the dataset needs to be the targets, which I have done....so not really sure what to do to resolve this. Anyone know how to resolve this?
```
metal = 'N_SiII'
dataset = tf.data.Dataset.from_tensor_slices((Dataframe[['N_H', 'Redshift', metal]].values, Dataframe[['Metallicity', 'Density']].values))

def get_train_and_test_splits(dataset, train_size, batch_size=1):
    train_dataset = (dataset.take(train_size).shuffle(buffer_size=train_size).batch(batch_size))
    test_dataset = dataset.skip(train_size).batch(batch_size, drop_remainder=True)
    return train_dataset, test_dataset

def run_experiment(model, loss, train_dataset, test_dataset):
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss=loss,
        metrics=[keras.metrics.RootMeanSquaredError()],
    )
    model.fit(train_dataset, epochs=num_epochs, validation_data=test_dataset)

run_experiment(baseline_model, mse_loss, train_dataset, test_dataset)
```
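If the model follows the Keras tutorial pattern of one named `keras.Input` per feature (an assumption; the model code isn't shown), it expects a dict of three tensors, while this dataset yields a single (n, 3) array, hence "expects 3 inputs, but received 1". A sketch of building the features as a dict keyed by input name; the tiny model is a stand-in, not the tutorial's. The same dict form also works for `model.predict`. Alternatively, change the model to a single `keras.Input(shape=(3,))`.

```python
import pandas as pd
import tensorflow as tf
from tensorflow import keras

FEATURES = ["N_H", "Redshift", "N_SiII"]
TARGETS = ["Metallicity", "Density"]


def make_dataset(df):
    # dict keyed by feature name -> matched to keras.Input layers with the same names;
    # df[[name]] keeps shape (n, 1) so each example matches Input(shape=(1,))
    features = {name: df[[name]].to_numpy(dtype="float32") for name in FEATURES}
    targets = df[TARGETS].to_numpy(dtype="float32")
    return tf.data.Dataset.from_tensor_slices((features, targets))


def build_model():
    """Stand-in model with one named input per feature."""
    inputs = {name: keras.Input(shape=(1,), name=name) for name in FEATURES}
    x = keras.layers.concatenate(list(inputs.values()))
    x = keras.layers.Dense(8, activation="relu")(x)
    outputs = keras.layers.Dense(len(TARGETS))(x)
    return keras.Model(inputs=inputs, outputs=outputs)
```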
ah alright
also woah you explained it very nicely 😳
anyone have a tip on what to learn/do after finishing 'ml for stanford' ?
Yes, is that a bad way to start? I think I understood the concepts
what are we actually doing in this L2 regularisation
i dont get the sigma part
are we squaring the numbers in weight parameter W and adding them?
oof, I just read about an "ML scientist" (not a Data Scientist) who doesn't know any aspect of DL or anything in NLP, CNNs etc., and is wondering why he got fired from his company
Is there a way to use kwargs for the parameters in scipy.stats distributions? I can't find them in the docs:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
Example: we are looking for parameters mu and sigma2 here
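`scipy.stats` distributions don't take `mu`/`sigma2` keywords. Every continuous distribution is parameterized by `loc` and `scale` (plus shape parameters for some families), and these can be passed as keyword arguments. For `norm`, `loc` is the mean and `scale` is the standard deviation, so a variance `sigma2` has to be square-rooted first:

```python
import math

from scipy import stats

mu, sigma2 = 1.5, 4.0
# loc = mean, scale = standard deviation (not the variance)
dist = stats.norm(loc=mu, scale=math.sqrt(sigma2))

print(dist.mean(), dist.var())
print(dist.pdf(mu))  # density at the mean
# the same keywords work on the functions directly, e.g. stats.norm.cdf(0.0, loc=mu, scale=2.0)
```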
the lambda/2m term
anyway yes just take the sum of the squares of the weights
ok but what it does??
it applies a penalty that scales with the magnitudes of the weights
so the purpose is to decrease the overfitting right?
uh.
overfitting i saw it
that is what it is commonly used for, yes
if we are adding a term, how is it penalising?
that’s the loss function
higher = worse
oh, so we are increasing the loss function so that dW and db increase?
sorry, didn’t understand that
i wanna know why are we adding it to loss function
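A sketch of the L2 term as it appears in a course-style cost, J = cross-entropy + (λ/2m)·Σ‖W‖². Large weights make the loss larger, and the penalty contributes (λ/m)·W to dW, so every gradient step also pulls each weight toward zero ("weight decay"). In this formulation biases are usually not regularized, so db is unchanged. The numbers below are arbitrary.

```python
import numpy as np


def l2_penalty(weights, lambd, m):
    """(lambda / 2m) * sum of squared entries of every weight matrix (biases excluded)."""
    return (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)


def l2_grad(W, lambd, m):
    """Extra term the penalty adds to dW: d/dW of (lambda/2m)*||W||^2 = (lambda/m) * W."""
    return (lambd / m) * W


W = np.array([[3.0, -4.0]])
lambd, m, lr = 0.5, 10, 0.1
penalty = l2_penalty([W], lambd, m)  # (0.5 / 20) * (9 + 16)
# one gradient step driven by the penalty alone: each weight shrinks toward zero
W_next = W - lr * l2_grad(W, lambd, m)
print(penalty, W_next)
```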
use alpha
xticks yticks
@somber prism
anyone knows how do we create a function like grid search
Well, you can use numpy's linspace and meshgrid (or a similar method) to generate the sets of parameters, then you evaluate the function on all of them and pick the best results.
i need to pass strings, trying with some for loops first may be function is not necessary
?
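When some parameters are strings, `itertools.product` over lists of candidates replaces `linspace`/`meshgrid`: it yields every combination, and the loop keeps the best score. For scikit-learn estimators, `GridSearchCV` (or `ParameterGrid`) already does this. The scoring function below is a made-up stand-in for training and evaluating a model.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate score_fn on every combination in param_grid; return (best_params, best_score).

    param_grid maps parameter name -> list of candidate values (numbers, strings, anything).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(kernel, C):
    # hypothetical stand-in: prefers "rbf" and C close to 1
    return {"linear": 1.0, "rbf": 2.0}[kernel] - abs(C - 1)


best, score = grid_search(toy_score, {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]})
print(best, score)
```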
@somber prism nvm, don't use alpha... it sets transparency, which helps show how dense overlapping points are... instead you can use plt.setp
you can rotate the labels through an angle and it would be much clearer
something like this :
the usage is something like this:
ok
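A sketch of the `plt.setp` approach for crowded x tick labels: rotate them and right-align each label so its end sits under its tick. `ax.tick_params(axis="x", labelrotation=45)` is an alternative for the rotation alone. The bar data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def rotate_xticklabels(ax, angle=45):
    """Rotate every x tick label on ax; ha='right' keeps each label's end under its tick."""
    plt.setp(ax.get_xticklabels(), rotation=angle, ha="right")


fig, ax = plt.subplots()
names = ["first long label", "second long label", "third long label"]
ax.bar(range(len(names)), [3, 5, 2])
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
rotate_xticklabels(ax, 45)
fig.tight_layout()  # make room for the rotated labels
```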
I have a program with face recognition by adding my custom database with images. Reading the video and doing face recognition. Libs that i am using are cv2, face_recognition. My question is can i use sklearn for classification report?
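Yes: `sklearn.metrics.classification_report` only needs two equal-length sequences of true and predicted labels, and string labels such as person names work. How you collect them from the cv2 / face_recognition loop is up to you; the names below are made-up placeholders.

```python
from sklearn.metrics import classification_report

# hypothetical per-face results: ground-truth name vs. the name the recognizer returned
y_true = ["alice", "bob", "alice", "unknown", "bob"]
y_pred = ["alice", "bob", "bob", "unknown", "bob"]

print(classification_report(y_true, y_pred, zero_division=0))
```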
Doing a bayesian regression. I fit the model with separated dataframes x_train, y_train of shapes (samples,3).
Trying to look at output distributions instead of the deterministic values from model.predict(). So I feed the model x_test. Get error that x_test has no rank.
Tried converting the dataframe to a series-oriented dictionary and it spits out the error "expected one input tensor and got 3".
Any suggestions on what I need to convert my testing dataframe to, to view the output distributions?
Within Unet there is something called the backbone_name parameter, which takes resnet, vggnet... I'm unable to understand the fundamental difference between Unet and (vggnet, resnet)... Aren't the latter also models like Unet?
base_model = Unet(backbone_name='resnet34', encoder_weights='imagenet')
anyone online??
help me out with this guys....
😩 😩
idk what I'm talking about 😄 my model failed to predict something because it wasn't colored the way it was supposed to be
hey
if I want to learn AI
but I don't exactly have the mathematical background to understand it all
but I still want to understand the math instead of just using pre-built pipelines and treating it as black box
where should I look
youtube
😄
BRUH oK thank you 😁
but seriously are there books or courses that could do the trick
you can look for something like "maths under a neural network" or something
and then building my own nn from scratch
ppl do this and implement everything on their own
h u h
alright that sounds like a good idea
im pretty sure there are, but i know none
oK then anyways thanks again :)
Pattern Recognition and Machine Learning by Bishop.
ISBN-10: 0387310738
You should know linear algebra and multivariate calculus for that book. There are tons of books for both of those things. For linear algebra try Linear Algebra Done Right. For multivariate calculus, idk, do whichever.
There are also links for the math in the pins.
Hey! I'm currently trying to apply the NEAT algorithm to a snake game I coded in Python. For now I already have the "base": I have a snake object with its own food, so I can spawn as many snakes as I want at once, and each snake will only be able to eat its own food. Now I'm wondering which inputs I should give each snake for the algorithm to work
the obvious ones are easy: position of the food, position of the head, current direction and current length of the snake
but for it to be efficient, the snake should know the position of each of its body parts so that it can avoid them properly
the problem is that the number of body parts can change, and from what I know, the number of inputs should be fixed. How should I proceed?
I've seen this on the web, but there the snake still doesn't have information about the position of its body
My goal is to make it learn to avoid self enclosing
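One common way to get a fixed-size input regardless of length (a swapped-in technique, not something from the tutorial) is to cast 8 rays from the head and, in each direction, report the inverse distance to the wall, to the nearest body segment and to the food (0 when not seen). That gives 24 inputs whatever the snake's length. Grid coordinates, `body` excluding the head, and y growing downward are assumptions.

```python
# 8 directions: up, up-right, right, down-right, down, down-left, left, up-left (y grows downward)
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def vision_inputs(head, body, food, width, height):
    """24 floats: per direction, 1/distance to wall, nearest body cell, food (0.0 if not seen)."""
    body_cells = set(body)
    inputs = []
    for dx, dy in DIRECTIONS:
        x, y = head
        dist = 0
        body_seen = food_seen = 0.0
        while True:
            x, y = x + dx, y + dy
            dist += 1
            if not (0 <= x < width and 0 <= y < height):
                break  # left the board: the wall is `dist` steps away
            if food_seen == 0.0 and (x, y) == food:
                food_seen = 1.0 / dist
            if body_seen == 0.0 and (x, y) in body_cells:
                body_seen = 1.0 / dist
        inputs.extend([1.0 / dist, body_seen, food_seen])
    return inputs


inputs = vision_inputs(head=(0, 0), body=[(1, 0)], food=(3, 0), width=5, height=5)
```

Closer objects give larger values, which tends to be easier for a small network than raw distances.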
any pandas experts in here willing to educate me?
if you have a question ask it and if someone can help they most likely will
Fair enough
Hey guys a question regarding anaconda..
I had initially installed Anaconda on a different drive, and now I have reinstalled Windows and deleted the .anaconda2 and .anaconda3 hidden folders inside AppData.
The problem is that now I don't know how to make it work with PyCharm
Maybe I should recreate all the envs, one by one, using the .yml files. But I can't find them inside the envs folder
What question do you have about pandas?
I am trying to calculate the average sales across months. I have a pivot table created with pandas, and if I were in Excel I would use SUMIF to aggregate each row, but I am new to Python. Any help would be greatly appreciated
show your data
as text
also if you have a question, just ask it.
no need for a preface
you can try https://www.deeplearningbook.org/
The data is very large and I am afraid I lack the ability to reduce it to a smaller understandable form
show a subset of the data
and an expected result
otherwise it's hard to help you
I understand, that's why I was looking more to pick someone's brain, so I can understand what happens to the dataframe when I apply a pandas function. I am 43 and teaching myself how to code
I can show you a screenshot of a pivot table in excel if that is helpful, i figured you wanted data you could manipulate
you can use aggregation in pandas too
for example
df.groupby(COLUMN).agg(function(s) to aggregate with)
did you mean to reply to something else
pivot table pretty much does the same, but I find groupby cleaner to read
I'm not really sure what "as sum if" is
maybe you can give me an example
of what you want to do
and what the shape of your data is like
same
SUMIF is an Excel function that does SUM when the IF is true
you guys type faster than i can think lol
!e
import pandas as pd
s = pd.Series([2, 5, 4, 3, 8]) # data
evens = s[s % 2 == 0] # evens
print(evens.sum())
@velvet thorn :white_check_mark: Your eval job has completed with return code 0.
14
Pretty much yes
excel has it in a single function
so there's no direct equivalent of SUMIF
so he probably needs to make a custom function
but you can apply filters
which is what I did
s[s % 2 == 0] this is basically "get me the subset of the data where the remainder when divided by 2 is 0"
and you can apply any condition in the same way
even something much more complex
Thank you for reading my mind i would not be able to respond fast enough to be helpful
I'm self-taught as well, i've been there trying to replicate some excel functionalities lol
pandas > excel though
especially for much larger files
again, it depends on what you want to do specifically
@merry frost the best way to get help is to include a subset or visual sample of your data and what result you expect to see
I have sales data with 13 different revenue types and 700+ reps I need to use historical data to create a sales goal
how is the data structured? the revenue types are columns and the reps rows?
so its a 700x13 table?
stupid question can i post a photo here?
it's not a stupid question, and yes, but it's not preferred.
I don't remember how people post their dfs here though lol
each row is an event they went to, with the date of the event, the revenue type, the rep name, and the number of new members.
so for example if you want the total revenue per rep (regardless of date)
you can do
data.groupby("rep").agg(sum)
that will give you the rep, and the sum of each revenue (assuming they are columns)
you can but
it's harder to read
so not everyone will
you can just do .sum() btw
how would i then take that and get the average monthly total for each rep in each revenue type
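One way to do this, assuming the long format described above (one row per event with a date, revenue type, rep and new-member count; the column names here are made up): sum per rep, type and calendar month first, then average those monthly sums.

```python
import pandas as pd

events = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03",
                            "2021-01-11", "2021-02-15", "2021-02-28"]),
    "revenue_type": ["x", "x", "x", "y", "y", "y"],
    "rep": ["ann", "ann", "ann", "ann", "bob", "bob"],
    "new_members": [3, 2, 4, 1, 5, 2],
})

# Step 1: total per rep, per revenue type, per calendar month
monthly = (
    events
    .groupby(["rep", "revenue_type", events["date"].dt.to_period("M")])["new_members"]
    .sum()
)

# Step 2: average those monthly totals for each rep / revenue type pair
avg_monthly = monthly.groupby(level=["rep", "revenue_type"]).mean()
print(avg_monthly)
```

A month with no events for a rep/type has no row at all, so it is left out of the average rather than counted as 0; if empty months should count, they would need to be filled in first.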
Ik, but i still find that cleaner :p
interesting. I would like to test it.
my guess is that .agg(sum) would use the builtin sum
which sums as Python objects
-> slow
whereas .agg('sum') would use C summation
it might be specialcased
no idea
what is a revenue type
is it a column?
yes
you could always pass np.sum, which is what i do haha
or is it a value in a column for each row
df.groupby('rep').mean()
which is
take all the rows
group them by representative
then take the mean of all remaining columns
!e
import pandas as pd
df = pd.DataFrame([['a', 5, 8], ['b', 3, 6], ['a', 2, 7], ['b', 1, 7], ['a', 4, 3], ['c', 2, 6]], columns=['rep', 'type_a', 'type_b'])
print(df.groupby('rep').mean())
@velvet thorn :white_check_mark: Your eval job has completed with return code 0.
       type_a  type_b
rep
a    3.666667     6.0
b    2.000000     6.5
c    2.000000     6.0
!e
import pandas as pd
df = pd.DataFrame([['a', 5, 8], ['b', 3, 6], ['a', 2, 7], ['b', 1, 7], ['a', 4, 3], ['c', 2, 6]], columns=['rep', 'type_a', 'type_b'])
print(df)
@velvet thorn :white_check_mark: Your eval job has completed with return code 0.
   rep  type_a  type_b
0    a       5       8
1    b       3       6
2    a       2       7
3    b       1       7
4    a       4       3
5    c       2       6
^ the original
revenue type is the first column?
no
then?
i think its a hash from the elastic search the data is exported from
Anyone have any idea of how to solve this? The wording is confusing to say the least. It doesn't help that I'm self taught.
sorry, i don't think i understood your question. Type is not the first column; the first column is the hash i spoke of ('id'). If you are asking which column the type is in, that would be the 5th column
but the first column is type
in your screenshot
Correct i cut 45 columns of useless information
Do I need GPU for image recognition model training?
not necessarily
yeah, I was taking reference from your screenshot
it depends on the complexity of your model. you rarely NEED a GPU, but it can help a lot.
Not sure if this helps but this is what I'm trying to accomplish
Goal = MidQuintile(MonthlySum(new membersByRepByType))
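Assuming "MidQuintile" means the middle of the distribution of each rep's monthly sums, a per-group quantile gets close. This sketch uses the median (q=0.5) as a stand-in; swap in another cut-off if the goal is defined differently. The monthly totals and column names are made up.

```python
import pandas as pd

# Hypothetical monthly totals of new members, already summed per rep, type and month
monthly = pd.DataFrame({
    "rep": ["ann"] * 5 + ["bob"] * 5,
    "revenue_type": ["x"] * 10,
    "month": ["2021-01", "2021-02", "2021-03", "2021-04", "2021-05"] * 2,
    "new_members": [4, 8, 6, 10, 2, 1, 3, 5, 7, 9],
})

# Goal per rep and revenue type: the 50th percentile of that rep's monthly sums
# (assumption: "MidQuintile" read as the median)
goal = monthly.groupby(["rep", "revenue_type"])["new_members"].quantile(0.5)
print(goal)
```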
I am sorry if this channel is inappropriate for this question :C
May i ask for recommendations for python packages that help extract or convert music into some sort of data?
how can i remove rows in pandas by the name of the column
i needed some help doing something specific with tensorflow
im really not sure how this all works since i didn't write the code, but this is the code, and what i want to do is serve it with a Flask API, as in i want it to take image data as input and return the output. How should i approach it? Should i build a model file and then somehow run it? If i were to simply import that into the flask main.py it would do a lot of computation on each request, so im not sure how to do it.
ive never worked with tf before
i mean yeah it's from Google Colab but i need to know what changes i need to make
I'd say step 1, get familiar with the code. You should be able to tell yourself what each line is doing before proceeding.
No point trying to build on top of something you don't understand, especially when the code is right there
um i have a question
being RLY new to trying to learn tensorflow, are there any good resources to actually learn the code and how it works?
i understand the basics of how neural nets work but other than premade tensorflow code i can't find any good resources for learning tensorflow
Are you trying to learn how tensorflow itself is implemented? Or make a network with just TF's basic building blocks?
im trying to learn how to use tensorflow to write neural nets
so i guess the second one u asked
im trying to learn how to actually use tensorflow to implement convolutional nets and GANs and such, but there's nothing i can find that explains what the code actually does, like what attributes do what etc
Take a look at that simple feed forward neural network example.
That code just uses basic concepts from TF, not an entire prebuilt network.
Prebuilt networks are really just a bunch of those basic units combined and made into a class.
Fundamentally, TF and Pytorch are just fancy automatic differentiation tools that make running stuff on the GPU (and CPU with threading and vector operations) easier (for the most part).
(I actually have my own which is very much like pytorch and it was not hard to make, the real gains from using pytorch or TF is that lots of other people have already made a bunch of models for you)
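To make "automatic differentiation tool" concrete, here is a toy sketch of reverse-mode autodiff on scalars: each value remembers its parents and local derivatives, and `backward()` applies the chain rule from the output back. The class and method names are made up, and this is only the core idea, not how TF or PyTorch are implemented internally.

```python
class Value:
    """Toy scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the graph so every node comes after the nodes it depends on
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        # Walk from the output backwards, accumulating gradients via the chain rule
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad


x = Value(3.0)
y = Value(4.0)
z = x * y + x      # z = xy + x, so dz/dx = y + 1 and dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)   # 15.0 5.0 3.0
```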
damn, that would've taken a lot of learning and math to do though?
Nothing that anybody who's really into ML wouldn't know.
I second using pre-built stuff like TF. Very easy to set up
ok i understand what u mean by TF code using basic blocks and adding them together to do bigger tasks,
what i don't get is just what each attribute and function actually does, and the tensorflow documentation isn't very good at explaining it at a level a beginner would understand
Give me a concrete example of what is causing you trouble.
for example, the Conv2DTranspose function, what does it actually do to make an image??
that makes no sense to me, 'cause as far as i know a transpose of a matrix is just rotating the matrix, why does that help?
it's used in a bunch of GAN code i saw to essentially morph the input data into an image
but i cant find a good explanation as to what its actually doing to the data
That has an animation
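Since the animation itself isn't included here: a transposed convolution gets its name because it is the transpose of the matrix that represents an ordinary strided convolution, not because it rotates anything. Operationally, each input value stamps a scaled copy of the kernel onto a larger output, and overlaps are summed. A naive single-channel numpy sketch (no padding, no bias; Keras' Conv2DTranspose adds channels, padding modes and learned kernels on top of this idea):

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=2):
    """Naive single-channel 2D transposed convolution with no padding.

    Each input value scatters a copy of the kernel, scaled by that value,
    into the output at stride-spaced positions; overlapping copies are summed.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))
# A 2x2 input becomes 4x4: with an all-ones 2x2 kernel and stride 2,
# every input value turns into a 2x2 block (a learned kernel would do more)
print(conv2d_transpose(x, kernel))
```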
ah thankyou
this helps this specific issue
guess i should search more on stack overflow when i get these questions
That's more of a general deep learning question rather than an TF question.
Note that a lot of things in DL and ML have terrible names. Like "convolution" does not make sense to begin with (but many misuses later and it just became accepted that it means a specific thing in the context of ML/image processing).
And even worse, very popular papers will use different definitions for the same word (even in the same context).
So it's important to kind of be on the same wave-length in terms of jargon to be able to quickly understand what is going on. This can only be done by having followed a bunch of projects and read a bunch of ideas. It's kind of like playing baseball and not knowing all the baseball terminology that was made up just for baseball. https://en.wikipedia.org/wiki/Glossary_of_baseball_terms
It's annoying to have to learn it all, but not really any way around it.
Unless you build an NLP AI around it 😏 ... Oh, wait.
Hey @rose thicket!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
you should ask your question
I just needed some guidance.
I am a beginner in this field
I'm done with python and numpy
But have no clue, what to do next
this question can be answered even by learners... not necessarily data scientists
So can you??
Another question regarding anaconda. I have exported multiple envs as .yml files and imported those files using Anaconda Navigator. But in every environment the packages are not installed. Should I do a conda install "something"?
Is there a reason you're using anaconda? I don't really know anyone who uses it and it's easier to get help if you use venv
i just stuck with it from the beginning. I wasn't having any problem until now, when I had to do a Win 10 reinstall on my machine
tried it just now but when i do conda env list it shows no installed packages.
I'll try to export my env as a requirements.txt
conda env list only shows whether the environment was installed, not the packages. If you want to see the packages you need to use conda list -n <environment name>
I need software that can recognize money. I'd like to work with OpenCV, or what do you recommend?
Hi, how do you check for values such as those in Dask dataframe:
df[(df['val0']==val0) & (df['val1']==val1)].compute()
the above is super slow so perhaps there's another way?
Why is it that you're using Dask, in this case?
Is the data larger than your RAM, or are you doing operations in parallel?
Yes it is a lot of data in parquet files
I want to add more data in case it doesn’t exist
I guess I shouldn’t be doing that with dask but rather use a different data structure to do the check
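If the goal is "append only the rows that aren't already there", checking one row at a time with a filter plus .compute() is the slow path. A batch alternative is an anti-join: merge the incoming rows against the existing keys and keep the ones with no match. Shown with plain pandas and made-up data; I haven't checked how well this maps onto a Dask/parquet setup.

```python
import pandas as pd

existing = pd.DataFrame({"val0": [1, 1, 2], "val1": ["a", "b", "a"]})
incoming = pd.DataFrame({"val0": [1, 2, 3], "val1": ["b", "b", "a"]})

# Left-merge on the key columns; indicator=True adds a "_merge" column saying
# whether each incoming row was also found in `existing`
merged = incoming.merge(existing.drop_duplicates(), on=["val0", "val1"],
                        how="left", indicator=True)

# Keep only the rows that exist in `incoming` alone
new_rows = merged.loc[merged["_merge"] == "left_only", ["val0", "val1"]]
print(new_rows)
```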
I'm wondering if you should be putting all this in a proper database and querying it.
I was considering that too. The data will be growing to (if I calculated that correctly) 20-30 million entries
I think there's a point at which you have as much RAM as you have.
FYI, I probably won't know if you've responded unless you ping/reply to me
I am not sure what you mean...
If you're working with more data than you can fit in live memory, the operation is going to be slowed down by disk reads.
Yes, true. I thought Dask would help here.
I should probably start using a database. It may be useful
I checked the docs for Dask and it said that you can use it to parallelize certain operations, but that it's not an alternative to a database
On the flip side, I had somehow never heard of Dask, so thanks for bringing that to my awareness!
BlackBerry 🤣 😂 "Innovation" 🤣🤣
Im dying
Guys, what all do i need to study under Stats for DS, anybody?
I would probably start with probability theory.
How do I decrease discount factor in reinforcement learning
Worked fine this time. Tried again with .yml file and all packages are there. Thanks guys 💪
guys can anyone assist me on how to train models efficiently if i have a low grade GPU and buying a new one is not an option
also i've tried google colab and looking for other suggestions
do we compute the cost after setting output values that are greater than 0.5 to 1 and the others to 0, or before that?
in NN
What data do I need to build a machine learning model that can predict future coronavirus case counts?
You have to know what factors account for the number of coronavirus cases, and then see if you can obtain reliable data for those factors
And which algorithm is best for my model?
in reference to binary classification
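The usual answer: compute the cost on the raw sigmoid probabilities, before thresholding. After rounding to 0/1, each example's cross-entropy is either 0 or infinite and the gradient is zero almost everywhere, so there is nothing to learn from. The 0.5 threshold is applied only when reporting predictions or accuracy. A minimal numpy sketch with made-up values:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy computed on predicted probabilities."""
    # Clip so log(0) can never happen
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])     # sigmoid outputs of the network

cost = binary_cross_entropy(y_true, y_prob)  # used for training / backprop
y_pred = (y_prob > 0.5).astype(int)          # thresholded only for reporting
accuracy = np.mean(y_pred == y_true)
print(cost, accuracy)
```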
You'll want to look into algorithms that look at data points in chronological order, rather than those that predict based on each data point in isolation.
Thanks sir
Time series predictions? RNN it is
Dask is cute, but a database should be the go to for more serious storage and processing of data beyond just some spreadsheets and things that fit in memory.
Apart from cloud options like GCP, nothing else. You could try doing CPU-only training too, if you can afford a beefier CPU
What would you suggest for someone pursuing a data science career? I'm a second-year student and I'm so confused about whether I want to be a software eng. or a data sci.
Thanks! Yes, I am looking into neo4j right now. Might be a good fit for my needs
Can someone help me with python code for AI face recognizer
Do we have a Chinese/Korean/Japanese data scientist here? . _. i'm having a problem using an Asian font in a data visualisation ; -;
=/ i'm hard stuck on this problem using seaborn
Change the font to one that supports the characters you need.
that's the problem, no matter what i try it doesn't change
technically, it's bigger than GPT-3 ^^
import seaborn as sns
from matplotlib.pyplot import show
import numpy as np
sns.set(font="Noto Sans CJK JP")
sns.heatmap(np.array([[1,2,3]]), annot=np.array([['ë', 'bădărău', 'いえ']]), fmt='')
show()
-scared american noises-
nah tbh that looks pretty impressive. especially because it seems they did it entirely with Chinese tech. No Tensorflow / pytorch, no nvidia, etc
i tried this but it doesn't work for me
i think i'll just give up, i tried to search everywhere for a solution, but nothing works
i was looking for a virtual GPU that could provide close enough if not equal to a physical GPU
if i run ur code:
i thought i had it
do you know where i can find the original one to install?
woah
No, just use another CJK font.
you are my new religion, ty for the help
an angel in my life
idk why but some fonts work for Chinese words and don't work for Korean words
import datetime
import pandas as pd

def cannex_format_over1y(url,product):
    curr_dt = datetime.datetime.today().strftime('%Y-%m-%d')
    # curr_dt = (datetime.datetime.today() - BDay(2)).strftime('%Y-%m-%d')
    curr_dt_str = datetime.datetime.today().strftime("%Y%m%d")
    df_html = pd.read_html(url,header=1)
    header = df_html[0].iloc[0]
    cols = ['Financial Institution'] # only forward fill on Financial Institution column
    # .copy() makes df its own DataFrame instead of a slice of df_html[0];
    # modifying that slice later is what triggers the SettingWithCopyWarning
    df = df_html[0].iloc[1:].copy()
    # ffill (like fillna) returns a new object: assign it back or the result is lost
    df[cols] = df[cols].ffill()
    df.columns = header
    df.insert(0, 'Date', curr_dt)
    # df.to_csv(csv_path)
    df.rename(columns={df.columns[5]: "1Y",
                       df.columns[6]: "2Y",
                       df.columns[7]: "3Y",
                       df.columns[8]: "4Y",
                       df.columns[9]: "5Y",
                       df.columns[10]: "6Y"},inplace = True)
    df.insert(loc=2,column='product',value=product)
    return df

cannex_format_over1y(gic_nonreg_1to6y_url,'Non-registered GIC').replace('-','')
hey guys i have some really simple code that i'm getting a warning on. I'm wondering if you guys can help me figure out what needs to change to avoid the warning
this is the warning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
return super().rename(
im struggling to understand what the issue is, but i assume it has to do with the
df[cols].fillna(method='ffill')
portion of the code
Getting a font that can do multiple will not be easy, but they exist.
A font only has the glyphs that the typographer created for the font. Making a font with glyphs for multiple languages is a ton of work.
Maybe ask reddit: https://www.reddit.com/r/typography/
Few fonts are free and even fewer are free and good.
wdym?
you have full control over your GPU in the cloud. There is nothing you can do physically that you can't do using a terminal 🤷
ikr - not even CUDA 👀 that's impressive af, but thank god the US bans these Chinese companies from operating
I mean, if you ask me that's a stupid idea
If I were in the CCP I'd say "The U.S is unreliable as a partner for this, let's throw bullshit cash at it and develop our own" and boom. goodbye Tensorflow / PyTorch AI monopoly of the U.S
well.. i found some fonts that have options for Chinese/Korean/Japanese... but when i try to make seaborn use them it doesn't use them . _.
that's just my take though
I trust no Chinese AI dev - especially when you see what they use AI for in China. it's literally like 1984, and all research done helps them 😞
i tried
sns.set(font="Arial Unicode MS")
sns.set(font="ArialUnicodeMS")
sns.set(font="arial-unicode-ms")
but none of those work
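The name passed to `sns.set(font=...)` has to match a family name that matplotlib has actually registered, which isn't always the name the OS shows. A small sketch for listing candidates (the keywords are just guesses at common CJK font names):

```python
from matplotlib import font_manager

def find_fonts(*keywords):
    """Return registered font family names containing any keyword (case-insensitive)."""
    names = {entry.name for entry in font_manager.fontManager.ttflist}
    return sorted(n for n in names if any(k.lower() in n.lower() for k in keywords))

# The exact string given to sns.set(font=...) should be one of these
print(find_fonts("CJK", "Gothic", "Hei", "Unicode"))
```

A font installed after matplotlib built its font cache may not show up until that cache is rebuilt.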
I have no doubt China does shady shit with AI, but some of those claims I find absurdly exaggerated (they would require AI beyond state-of-the-art)
like?
social credit stuff, which ended up being mistranslations and exaggerations
its pretty much a credit score lol
The documentaries I have seen are pretty demonstrative of those tech used
it's in prototype stage
one doc actually interviewed the Chinese guy making it - and he was answering those questions very carefully
he expressly stated that those technologies will "benefit" Chinese citizens
Idk, I still find it exaggerated. Think of it like this, from another perspective:
JP Morgan and all other U.S banks have credit scores. Literally your whole life, including employment, is based on this credit score, and this is not called "dystopian". Either all these types of scores are dystopian (they are) or none are, but cherry-picking because "X GOOD" "Y BAD" is annoying.
In fact, US credit score sounds worse than a social score lol.
Can you lose variance even though you do a full reconstruction with SVD? Like X = s @ V.T
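A full SVD reconstruction is exact up to floating-point error, so no variance is lost; variance is only lost when components are dropped. The reconstruction needs all three factors, X = U @ diag(s) @ V.T (or (U * s) @ V.T if `s` is meant as the score matrix). A numpy sketch on random, column-centred data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X = X - X.mean(axis=0)           # centre columns so "variance" has its usual meaning

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

X_full = U @ np.diag(s) @ Vt     # all components kept
scores = U * s                   # same as U @ np.diag(s)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # rank-k truncation

frac_kept = np.sum(s[:k] ** 2) / np.sum(s ** 2)
print(np.allclose(X, X_full))      # True
print(np.allclose(X, scores @ Vt)) # True
print(frac_kept)                   # share of the variance kept at rank k
```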
it's not only credit scores or anything - it's a lot of tracking tech too, which is perfectly plausible to build with the proper investment
I mean, just look at what NSA did in earlier times. no one could have believed that such resources would be poured just to track common people.
but this doesn't make the Chinese government dystopian. People willingly give this up.
I didnt see any U.S protests over the NSA leaks that basically all Facebook, Cisco, etc, have backdoors for the U.S gov.
the sad truth is: people give away privacy for convenience
you dont need government intervention when people give it away on their own
ngl, people do give up their privacy. but you would be wrong that there were no protests or any opposition
*no significant
if there was no change, the protest was irrelevant.
I would know that, living in a 3rd-world semi-dictatorship
anyways, I for one support US's mass surveillance
I dont. Neither chinese nor US. screw both
china's is really bad - but then you never know when USA might be too
well.....if it guaranteed safety for your kids 🤷
we go back to square 1 lol.
I know a lot of chinese. they dont care about privacy, HR or whatever, as long as they're safe and prosper
if giving up a part of your daily privacy can prevent some mass shootings (maybe with your family involved) would you pay the price?
I feel a lot of americans think the same
i would yes,
in fact i do
but i dont call that dystopian
no, but what china does is definitely wrong - and their research funds all go into that "dystopian" research
we are a bit off topic here i think, not in the domain of the channel
if you'd like we can continue discussing political perspectives via DM and not spam the channel
Did you read that research where some Chinese uni made a model to classify criminals based on their faces? with a lot of SOTA work, they got 85% accuracy in predicting criminals from their faces alone
may not be gov sponsored (didn't check), but still
everything is pretty much gov sponsored in China.
true. who would have even thought of making a model to do that unless specifically directed??
in any case, Chinese life is just ....depressing to say the least
I mean, i would try random crap if i was bored, but that was oddly specific
!ot
Off-topic channels
There are three off-topic channels:
• #ot0-psvm’s-eternal-disapproval
• #ot1-perplexing-regexing
• #ot2-never-nester’s-nightmare
Their names change randomly every 24 hours, but you can always find them under the OFF-TOPIC/GENERAL category in the channel list.
Please read our off-topic etiquette before participating in conversations.
https://www.technologyreview.com/2016/11/22/107128/neural-network-learns-to-identify-criminals-by-their-faces/
2016?? oh shit....
@grave frost we are off topic, if you want we can continue via DM, let's stop spamming here
some day later 🙂 Ive got Homework to do
sure! good luck man (y)
How do you even detokenize this dataset? 😐
i want words! i mean, I solved my problem, but can't check it
got it
from some issues xD
Hi, I am using implicit package (https://github.com/benfred/implicit) to create a recommender system. I am using the implicit least square algorithm.
I was able to make predictions for already existing users, or to find similar items, no prob. But I don't get how I can get predictions for a new user who was not in the input data. The idea is that I have a set of items (each one existing in the input data), and I want recommendations based on this set. I could get recommendations for each item and sum them up, but it doesn't feel right. This seems like a common usage, so I think I am missing something ^^'. Any ideas? Thanks 🙂
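I'm not sure which helper your version of implicit exposes for unseen users, so here is the underlying math as a numpy sketch instead: the standard "fold-in" step for implicit ALS (Hu, Koren & Volinsky) keeps the trained item factors fixed and solves only the new user's regularised least-squares problem. The item factor matrix would come from the trained model (implicit's AlternatingLeastSquares stores it as `item_factors`, I believe); `alpha` and `reg` are placeholder values, and the demo factors are made up.

```python
import numpy as np

def fold_in_user(item_factors, liked_items, alpha=40.0, reg=0.01):
    """Solve the implicit-ALS user step for one new user, item factors held fixed.

    item_factors : (n_items, k) array of trained item vectors
    liked_items  : indices of the items the new user interacted with
    alpha, reg   : confidence scaling and regularisation (placeholder values)
    """
    n_items, k = item_factors.shape
    Y = item_factors
    c = np.ones(n_items)              # confidence: 1 + alpha for interacted items
    c[liked_items] += alpha
    p = np.zeros(n_items)             # preference: 1 for interacted items, else 0
    p[liked_items] = 1.0
    A = Y.T @ (c[:, None] * Y) + reg * np.eye(k)   # Y^T C Y + reg * I
    b = Y.T @ (c * p)                              # Y^T C p
    return np.linalg.solve(A, b)

def recommend(item_factors, liked_items, n=3):
    user_vec = fold_in_user(item_factors, liked_items)
    scores = item_factors @ user_vec
    scores[liked_items] = -np.inf     # don't recommend items already in the set
    return np.argsort(-scores)[:n]

# Made-up factors: items 0-2 point one way, items 3-5 the other
item_factors = np.array([[1.0, 0.0], [1.0, 0.0], [0.9, 0.1],
                         [0.0, 1.0], [0.0, 1.0], [0.1, 0.9]])
print(recommend(item_factors, [0, 1], n=2))   # [2 5]
```

Summing per-item recommendation lists approximates this; the fold-in instead finds the single user vector that best explains the whole set at once.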
is anyone available to help me?
@lapis sequoia Possibly, what is it that you're trying to do?
anyone familiar with tensorflow? Having some issues getting logs to write for a customtensorboard
why is this returning 11
coins = 8
max = 0
while max < coins:
    # print(max)
    for i in costs:
        max +=i
max
because in the first iteration of the "while" loop the condition "max < coins" is true, so you go through all the elements in "costs" and add them to "max"; the result of 11 is expected 😄
The while loop only goes through one iteration.
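A sketch of the difference, with a hypothetical `costs` list (the real one wasn't shown) and `max` renamed to `total` so the built-in max() isn't shadowed: the while condition is only re-checked after the inner for loop has added every cost, whereas checking inside the for loop stops as soon as the total reaches `coins`.

```python
costs = [1, 3, 2, 4, 1]   # hypothetical: the real `costs` list wasn't shown
coins = 8

# Original logic: one pass of the for loop adds every cost before the
# while condition is checked again, so the result is sum(costs)
total_original = 0
while total_original < coins:
    for cost in costs:
        total_original += cost
print(total_original)   # 11

# Checking the condition before each addition stops once the total reaches coins
total_checked = 0
for cost in costs:
    if total_checked >= coins:
        break
    total_checked += cost
print(total_checked)    # 10
```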
anyone here use machine learning for finance?
Hello guys. I need help in understanding this dataset .
http://www.timeseriesclassification.com/description.php?Dataset=WordSynonyms
What are those features? And what is being classified?
Each case is a word. A series is formed by taking the height profile of the word
WordSynonyms remapped FiftyWords to 25 classes
But the data is the same (and I think flipped)
@lapis sequoia What do the classes tell and what is height profile of a word?
hi, i'm new here
does anyone want to help me with some code suggestions?
i'm working on a CNN project, and i have to prepare a dataset for my boss, who gave me 2 .HID files containing some specific image filenames from a big dataset of images. I've converted every line of the .HID file into an element of a list, and i have a dictionary with all image filenames. But to check whether the names in the .HID match the names in the dataset, i have to append the ".jpg" string to every element of the list, because the list elements are image filenames without the extension. Is my reasoning right? Can someone help me do this? The problem is that you can't concatenate list elements with a string...
import os
from PIL import Image

# Raw strings (r'...') stop Python from reading the backslashes in Windows paths
# as escape sequences; '\x' in a normal string literal is a SyntaxError
path_file = r'E:\Work\AU(13)-SottoCampioniA e B\SetA.HID'
path_image = r'E:\Work\AU13_face'
work_dir_tr = r'E:\Work\x'

image_file_names = os.listdir(path_image)

# Collect every name listed in the .HID file with the extension appended.
# ".jpg" is added to each string, not to a list (list + str is a TypeError);
# a set makes the membership test below fast.
wanted = set()
with open(path_file, 'r') as file1:
    for line in file1:
        for name in line.strip().split():
            wanted.add(name + ".jpg")
print(wanted)

wIDTH = 100
hEIGHT = 100
for i, image in enumerate(image_file_names):
    # Compare *this* file's name against the names from the .HID file
    if image in wanted:
        print(i, 'matched')
        # I = Image.open(os.path.join(path_image, image))
        # I = I.convert('L')
        # I = I.resize((wIDTH, hEIGHT), Image.BICUBIC)
        # I.save(os.path.join(work_dir_tr, image))
This looks like something that should go in one of the help channels
As it's not necessarily directly related to AI
yes, but nobody there answers me
wait :)
It took me two weeks but I've only just noticed that the reason my model isn't training isn't necessarily my data. It's my code for my model:
Orange is sklearn.linear_model.LinearRegression(), blue is my own OLS algorithm
kill me ;-;
idk how i didn't notice what a shit job it was doing
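Without seeing the code this is only a guess, but one common reason a hand-written OLS disagrees with sklearn's LinearRegression is a missing intercept: LinearRegression fits one by default (fit_intercept=True), so the design matrix needs a column of ones. A numpy sketch on synthetic, noise-free data:

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept term.

    Prepends a column of ones so the first coefficient is the intercept, then
    solves with np.linalg.lstsq (more stable than inverting X^T X directly).
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + X @ np.array([2.0, -1.0])     # noise-free, so the fit should be exact

intercept, slopes = ols_fit(X, y)
print(intercept, slopes)   # close to 3.0 and [2.0, -1.0]
```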
has anyone used hugging face before or is good with AI
I am trying to create a personal assistant app that would answer questions based on information I trained it on. Let's say I want to upload a book or a dataset of a lot of research papers; then when I ask a question it would answer it
is that possible without having context or something
just train it with some data?
is this what i am looking for?
or is this something else
Hey @lapis sequoia!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
import cv2
import numpy as np

def dibujar(mask, color):
    # OpenCV 4 returns (contours, hierarchy), OpenCV 3 returns (image, contours, hierarchy);
    # indexing with [-2] picks the contours in both versions
    contornos = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    print("ya deje de joder")  # debug print (Spanish, roughly "I quit messing around")
    for c in contornos:
        area = cv2.contourArea(c)
        if area > 3000:  # skip small blobs
            M = cv2.moments(c)
            if M["m00"] == 0:
                M["m00"] = 1  # avoid dividing by zero
            # centroid of the contour
            x = int(M["m10"] / M["m00"])
            y = int(M["m01"] / M["m00"])
            nuevoContorno = cv2.convexHull(c)
            # `frame` and `font` are the globals defined below
            cv2.circle(frame, (x, y), 7, (0, 255, 0), -1)
            cv2.putText(frame, '{},{}'.format(x, y), (x + 10, y), font, 0.75, (0, 255, 0), 1, cv2.LINE_AA)
            cv2.drawContours(frame, [nuevoContorno], 0, color, 3)

cap = cv2.VideoCapture(0)

# HSV ranges (OpenCV hue runs 0-179); Bajo = low, Alto = high,
# azul = blue, amarillo = yellow
azulBajo = np.array([100, 100, 20], np.uint8)
azulAlto = np.array([125, 255, 255], np.uint8)
amarilloBajo = np.array([15, 100, 20], np.uint8)
amarilloAlto = np.array([45, 255, 255], np.uint8)
# red wraps around hue 0/179, so it needs two ranges that are combined below
redBajo1 = np.array([0, 100, 20], np.uint8)
redAlto1 = np.array([5, 255, 255], np.uint8)
redBajo2 = np.array([175, 100, 20], np.uint8)
redAlto2 = np.array([179, 255, 255], np.uint8)
font = cv2.FONT_HERSHEY_SIMPLEX

while True:
    ret, frame = cap.read()
    if not ret:
        break  # no frame (camera unavailable or stream ended); avoids looping forever
    frameHSV = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    maskAzul = cv2.inRange(frameHSV, azulBajo, azulAlto)
    maskAmarillo = cv2.inRange(frameHSV, amarilloBajo, amarilloAlto)
    maskRed1 = cv2.inRange(frameHSV, redBajo1, redAlto1)
    maskRed2 = cv2.inRange(frameHSV, redBajo2, redAlto2)
    maskRed = cv2.add(maskRed1, maskRed2)
    dibujar(maskAzul, (255, 0, 0))
    dibujar(maskAmarillo, (0, 255, 255))
    dibujar(maskRed, (0, 0, 255))
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('s'):  # press "s" to quit
        break

cap.release()
cv2.destroyAllWindows()
Someone help me, i don't know what is wrong
Hello, Do you know some good sources for finding a plan for how to become a data scientist? I mean a real plan, not how to become a senior data scientist. I found some articles but there is too much to learn, you need a lifetime to learn this stuff. I started with linear algebra and also learned the basics for ANN, but there is much more. I need a good plan because I want to find a job in a year maybe.
Hello, I'm a noobie to ML and was learning about Decision Tree Regression and testing out something on my own. The Decision Tree algorithm for regression works in a way that, for each node of the tree, it iterates through all the values of all the features trying to find the split that decreases the SSR the most. At each iteration the algo considers only 2 points at a time, takes their average, makes the split at that average, then makes predictions using that split and calculates the SSR. It then selects the split which decreases the SSR the most. I was wondering: does the number of observations considered at the time of a split (i.e. 2 right now) affect the model in any way? I believe it's a trade-off between speed (the time the model takes to train) and the accuracy of the model. So I wrote a notebook to test whether this trade-off is significant enough to be considered. Can someone please go through this notebook and let me know if I'm just wasting my time doing silly & useless things, or whether I should continue this exploration? It'll be really valuable to me if someone gives feedback on this. Thanks
Here is the notebook : https://github.com/Noobie20/ML/blob/master/Regression/Decision Tree Regression/n_obs_split.ipynb
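One note on the premise, as far as I understand standard CART regression trees (and sklearn's implementation): the two neighbouring points only define where a candidate threshold sits, at the midpoint between consecutive sorted values. The SSR for that threshold is computed over all observations in the node, on both sides, so there isn't really a "2 observations per split" knob. A numpy sketch of the search on one feature, with made-up data:

```python
import numpy as np

def best_split(x, y):
    """Best threshold on a single feature for a regression tree node.

    Candidate thresholds are midpoints between consecutive distinct sorted x
    values; each candidate splits ALL observations into left/right groups,
    and the one with the lowest total sum of squared residuals (SSR) wins.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # no threshold between equal values
        threshold = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best[1]:
            best = (threshold, ssr)
    return best

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])
print(best_split(x, y))   # threshold 6.5 separates the two clusters
```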
is there a way to data mine facebook?
hi so if im predicting sales for a company, what's the best type of model to use for something like that?
used above, what does int64 do?
does it limit the numbers so that they are 64 bits and speed up the model?
I think it's more about your system architecture. What type is A2?
in that case, A2 > 0 returns a boolean array, and the call to np.int64 is converting bools to 64-bit ints.
sounds right to me
i did see something like that in the lecture
it looks like that line is just a fancy way of setting certain values in dA2 to 0
I think dA2[A2 <= 0] = 0 would have the same effect.
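Assuming A2 is a ReLU output and dA2 the gradient flowing into it (names taken from the discussion; the actual line was in a screenshot), `np.int64(A2 > 0)` just turns the boolean mask into 1s and 0s so it can be multiplied; it doesn't limit digits or speed anything up. A numpy sketch showing it matches the indexing version:

```python
import numpy as np

A2 = np.array([[0.5, 0.0], [1.2, 0.0]])     # e.g. activations after ReLU (>= 0)
dA2 = np.array([[0.3, -0.7], [0.1, 0.4]])   # incoming gradient

# (A2 > 0) is a boolean array; converting to int64 turns True/False into 1/0
mask = (A2 > 0).astype(np.int64)            # same values as np.int64(A2 > 0)
dZ2_fancy = dA2 * mask                      # gradient zeroed where A2 <= 0

# Equivalent indexing version
dZ2_plain = dA2.copy()
dZ2_plain[A2 <= 0] = 0

print(np.array_equal(dZ2_fancy, dZ2_plain))  # True
```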
he did it in a less fancy manner in forward propagation
why would he make it more fancy here lol
😆
he did this earlier
in forward prop of dropout
Can you see how that could be simplified?
no cuz i dont get what int64 is doing there
there is no int64 in that one
ya in this i understand everything
it's setting values less than the probability to 0 and leaving the others unchanged
so do you see how you could simplify it, knowing that dA2[A2 <= 0] = 0 is an alternative to the other one?
its sort of same i guess i understand
True == 1
False == 0
That’s why when you sum a series of true and false, you’ll get the total count of True values
i didnt follow
In Python 3.x True and False are keywords and will always be equal to 1 and 0.
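A quick sketch of that in practice: bool is a subclass of int, so summing a comparison counts how many elements satisfy it.

```python
import pandas as pd

flags = [True, False, True, True]
print(sum(flags))             # 3: True counts as 1, False as 0
print(isinstance(True, int))  # True: bool is a subclass of int

s = pd.Series([4, -1, 7, 0])
n_positive = (s > 0).sum()    # the comparison gives a boolean Series; summing counts the Trues
print(n_positive)             # 2
```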
guys
can anyone help me with opencv?
i wanna detect angular velocity
of a rotating object
Abyone?
*anyone
damn I hate pandas
I want to drop the second row, but it messes up the index
0 column_2
2 ....
3 ....
1 is missing from the index due to the deletion, causing a KeyError. anyone know what this problem is called and its solution?
You could always reset the index, but that's ill-advised since you want to be able to trace back the changes from the original dataset back to your actual work.
🤷 Im just double iterating over it now rather than indexing
In any case, if you want to go that way, https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html
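To make the KeyError concrete: .loc looks up index labels, and dropping a row removes its label. A sketch with a made-up frame showing both positional access with .iloc and renumbering with reset_index:

```python
import pandas as pd

df = pd.DataFrame({"column_2": ["a", "b", "c", "d"]})
df = df.drop(index=1)                # index labels are now 0, 2, 3
# df.loc[1] would raise KeyError here: label 1 no longer exists

# Option 1: look up by position instead of by label
print(df.iloc[1]["column_2"])        # "c" (the second remaining row)

# Option 2: renumber the labels 0..n-1; drop=True discards the old labels,
# leaving it out keeps them as an "index" column for tracing rows back
renumbered = df.reset_index(drop=True)
print(renumbered.loc[1, "column_2"]) # "c"
```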
something like that - I wanted to iterate over double columns, so I converted it to a dict to make everything easy to work with
no more pandas 🥳
ya but still it doesnt make sense to me about using int64
hey im working on a KNN algorithm that tries to predict whether a youtube video will trend based on the title. Does anyone know where i can get a dataset that includes trending and non trending? i can only find trending so far...
so if i have a program
that checks a csv file: if this input is found in column a, then go to the value next to it in column b, and check whether the next input the user types matches that.
but i dont know how to do that
any help?
that's what i am trying to do lol
You could probably do something trivial like this (not sure if it's the best Python way to do it)
# finding index of the "match"
idx = df[df["column"] == value].index
# retrieving the value(s) at those index labels in "column-b" (.loc takes [], not ())
val = df.loc[idx, "column-b"]
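A fuller sketch of the lookup-then-check flow described above, with the CSV inlined via io.StringIO so it runs standalone; the column names `a`/`b` and both helper functions are made up for illustration:

```python
import io
import pandas as pd

# Stand-in for the CSV file: column "a" holds prompts, column "b" the expected answers
csv_text = """a,b
apple,red
banana,yellow
grape,purple
"""
df = pd.read_csv(io.StringIO(csv_text))

def expected_answer(df, key):
    """Return the value in column b next to `key` in column a, or None if absent."""
    matches = df.loc[df["a"] == key, "b"]
    return matches.iloc[0] if not matches.empty else None

def check(df, key, answer):
    """True if `key` is in column a and `answer` matches the value next to it."""
    expected = expected_answer(df, key)
    return expected is not None and answer == expected

# In the real program key and answer would come from input()
print(check(df, "banana", "yellow"))   # True
print(check(df, "banana", "green"))    # False
```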
how to filter dates across multiple columns
Hi guys, I hope this is the right channel to ask for opinions about this: I want to create a whatsapp bot using Python that report to the users the status of delivery of their product. Any ideas of how can I do that?
df['Percent'] = clean_values(df['changesPercentage'])
Note: clean_values removes brackets and symbols from the number and converts the string to a float. Is there something wrong with the syntax?
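Hard to say without seeing clean_values, but if it was written for a single string, calling it on the whole column passes it a Series, which typically fails (float() can't convert a Series of several values, for instance). A sketch with a hypothetical clean_values and made-up values in a "(+1.25%)" style, showing an element-wise .apply and a vectorised .str alternative:

```python
import re
import pandas as pd

def clean_values(text):
    """Hypothetical stand-in: strip brackets/symbols like '(+1.25%)' and return a float."""
    return float(re.sub(r"[^0-9.\-]", "", text))

df = pd.DataFrame({"changesPercentage": ["(+1.25%)", "(-0.40%)", "(+3.00%)"]})

# A function written for ONE string has to be applied element by element
df["Percent"] = df["changesPercentage"].apply(clean_values)

# Vectorised alternative using the .str accessor
df["Percent2"] = (df["changesPercentage"]
                  .str.replace(r"[^0-9.\-]", "", regex=True)
                  .astype(float))
print(df)
```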
I am working on a unsupervised text sentiment project but this is the first time I am doing this. I got some feedback the last time I posted here but I still have some questions.
Currently I have a dataset to train the model but I don't know how to make the model.
- I have preprocessed the data (stemmed, lemmatized, removed stopwords).
- I have used word2vec.
- I used KMeans to create 2 clusters (but the clusters are not good, and I don't know what I can do).
- Now I don't know what to do.
you cannot believe how many hours it took me to realize that fit_transform is actually fitting and then transforming the DataFrame 🤣
I was applying that nonstop on test dataset
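The pattern in code, as a small sketch with toy numbers and `StandardScaler`: fit (learn the statistics) on the training data only, then call plain `transform` on the test data so it is scaled with the training mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training data, then scale it
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std; no refitting

print(scaler.mean_, X_test_scaled)
```

Calling `fit_transform` on the test set instead would silently rescale it with its own statistics, which leaks information and makes the two sets inconsistent.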
Oh boy hahahaha. I guarantee you're not doing that mistake again. Emotional trauma is the best teacher sometimes lol.
yeah, i was literally hugging the documentation at night and praying to it in hopes of finding an answer
I've spent about 5 days fighting with a streaming-read program I kinda need for work, to do some aggregations on huge datasets.
I read a bit about Haskell's lazy approach (https://wiki.haskell.org/Lazy_evaluation), and when combined with pandas it can deal with the data in chunks decently. Even so, it was running really slowly, so something was amiss. I have a padding function for UnicodeEncodeErrors that just prints a '?' for each invalid char it finds, but the issue was that I passed it the whole chunk of the DataFrame and it cast the chunk to str, instead of just the invalid value. Since almost every chunk had one weird char, I was casting everything to a string, printing that chunk char by char, and then casting it back to a pd.DataFrame. Reading 1 million records with 30ish columns took 52 minutes lol. I haven't fixed it yet, but hopefully it'll run in about 20 seconds if everything goes to plan.
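A sketch of the fix described above, under assumptions: the CSV text and column names are toy stand-ins, and the '?' replacement is done per string column with `Series.str.encode(..., errors="replace")` followed by `.str.decode(...)`, so numeric columns are never cast to str. In pandas the chunking itself comes from `read_csv(chunksize=...)`, which yields one DataFrame per chunk.

```python
import io

import pandas as pd


def replace_unencodable(df: pd.DataFrame, encoding: str = "ascii") -> pd.DataFrame:
    """Swap characters the target encoding can't represent for '?', only in string columns."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_string_dtype(out[col]):
            # Vectorized per column: encode with errors="replace", then decode back to str.
            out[col] = out[col].str.encode(encoding, errors="replace").str.decode(encoding)
    return out


# Hypothetical stand-in for the real file; chunksize keeps one chunk in memory at a time.
csv_text = "name,value\ncafé,1\nbar,2\n"
names, total = [], 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1):
    chunk = replace_unencodable(chunk)
    names.extend(chunk["name"])
    total += chunk["value"].sum()

print(names, total)  # ['caf?', 'bar'] 3
```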
Hi all, is it possible to have 2 y-axes for one x value? https://stackoverflow.com/questions/66545695/python-plotly-dash-question-custom-labels-and-color-based-on-values
I want 2 bars with px.bar.
@primal tulip that doesn't provide a bar example, nor one with plotly.express, unfortunately
sheesh, yeah, I've noticed string-like work in pandas with huge datasets is a bit slow; datetime modules are even slower.
I need help fitting multiple columns with a linear regression
LinearRegression
Like comparing X to different Ys (different columns)
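scikit-learn's `LinearRegression` accepts a 2-D target, so you can regress one X against several Y columns at once; this is equivalent to fitting one ordinary least-squares line per column. A sketch with two made-up target columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
# Two hypothetical target columns: y1 = 2x + 1 and y2 = -x + 3
Y = np.column_stack([2 * X[:, 0] + 1, -X[:, 0] + 3])

model = LinearRegression().fit(X, Y)  # Y of shape (n_samples, n_targets): one line per column
print(model.coef_.ravel(), model.intercept_)  # approximately [2, -1] and [1, 3]
```

If you need per-column scores (e.g. R^2 for each Y separately), looping over the columns and fitting one model each is just as valid.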
I have to do something like this in the near future for my program. I'll get back to you with the answer (if I have it) in case you haven't found it yet.
How is this momentum equation derived? I need the reasoning behind why the equations look like this...
?
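The equation itself was an image that isn't in the log, so this sketch assumes the common exponentially weighted form: v = beta*v + (1 - beta)*grad, then w = w - lr*v. Unrolling it gives v_t = (1 - beta)(g_t + beta*g_(t-1) + beta^2*g_(t-2) + ...), i.e. a weighted average of recent gradients: directions that stay consistent add up, while components that flip sign from step to step cancel out. Another common variant drops the (1 - beta) factor and folds it into the learning rate. A toy run on f(w) = w^2:

```python
def momentum_descent(grad, w0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum in the exponentially-weighted-average form (assumed)."""
    w, v = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        v = beta * v + (1 - beta) * g  # running average of recent gradients
        w = w - lr * v                 # step along the averaged direction
    return w


# f(w) = w**2 has gradient 2w and its minimum at w = 0
w_final = momentum_descent(lambda w: 2 * w, w0=5.0)
print(w_final)  # very close to 0
```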
I'm trying to plot information/facts about companies from a dictionary onto a timeline like this. How would I do that? At the moment I've got one item stored in the dictionary; it gets plotted on the graph, but it doesn't get labelled with its name, and I'd also like to annotate other info...
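A minimal matplotlib sketch of a labelled timeline: the `companies` dictionary and its `date`/`info` keys are hypothetical, so adapt them to your structure. Each entry gets a marker on a horizontal line plus an `annotate` call that writes the name and extra info next to it.

```python
import datetime as dt

import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line to get an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical company facts; swap in the real dictionary.
companies = {
    "Acme": {"date": dt.datetime(1999, 3, 1), "info": "founded"},
    "Globex": {"date": dt.datetime(2005, 6, 15), "info": "IPO"},
}

fig, ax = plt.subplots(figsize=(8, 2))
for i, (name, facts) in enumerate(companies.items()):
    x = mdates.date2num(facts["date"])  # dates as matplotlib's numeric date values
    ax.scatter([x], [0], zorder=3)
    offset = 15 if i % 2 == 0 else -25  # alternate above/below so labels don't collide
    ax.annotate(f"{name}\n{facts['info']}", xy=(x, 0), xytext=(0, offset),
                textcoords="offset points", ha="center")
ax.axhline(0, color="gray", linewidth=1)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.get_yaxis().set_visible(False)
# plt.show()
```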
does anyone know why using multiprocessing while using matplotlib opens up multiple plots
Quick question about plotting data
Say i've got this data:
You could easily just draw a straight line through it and say they're linearly correlated
but is this gap in the middle problematic?
If i plotted a straight line through both regions individually it would say that they're not linearly correlated
is plt.show() in the function?
If so, that's probably why
nope
it can be any function, even one with just a print, and it'll do the same thing
So nothing matplotlib-related is in the function?
Just the fact that you imported matplotlib causes the multiple plots?
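One likely cause, assuming you're on Windows or macOS, where the default multiprocessing start method is "spawn": each worker process re-imports your main script, so any top-level code (including plotting calls) runs again in every worker. Keeping the pool and the plotting inside an `if __name__ == "__main__":` guard makes them run only in the parent. A sketch, with `square` as a stand-in for the real work:

```python
import multiprocessing as mp


def square(x):
    return x * x


def main():
    # Everything that should run once (starting the pool, plotting) lives here.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    # e.g. plt.plot(results); plt.show() would go here, in the parent process only
    return results


if __name__ == "__main__":
    print(main())  # [0, 1, 4, 9, 16]
```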
depends on the data and context of your problem
2 data clusters seem pretty significant depending on your problem
like if you were in retail and it represented 2 different demographics
otherwise, you could probs use a linear model...just probs not the strongest is all
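A small illustration of the gap question, with made-up toy numbers: two clusters that show no trend internally can still produce a high overall correlation, which is exactly the case where a single fitted line may be describing the gap between the groups rather than a relationship within them.

```python
import numpy as np

# Toy data (made up for illustration): two clusters with no trend inside either one
x_a, y_a = np.array([0.0, 1.0, 2.0]), np.array([1.0, 0.0, 1.0])
x_b, y_b = np.array([10.0, 11.0, 12.0]), np.array([11.0, 10.0, 11.0])

r_all = np.corrcoef(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))[0, 1]
r_a = np.corrcoef(x_a, y_a)[0, 1]
r_b = np.corrcoef(x_b, y_b)[0, 1]
print(r_all, r_a, r_b)  # roughly 0.98 overall, about 0 within each cluster
```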
My Colab continuously crashes when I simply take the difference between the predicted values and the true values.
I worked around that by putting it in a loop to compute the difference of each element (wouldn't this take more RAM??).
Now it gets stuck when attempting to plot the histogram of those differences. Any tips for reducing the load? I honestly don't get why it's having a problem with plotting when it trains just fine.
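One common cause, offered as a guess since the code isn't shown: many `predict()` calls (Keras's included, for a single-output model) return shape (n, 1), while the true values are often shape (n,). Subtracting those broadcasts to an (n, n) matrix, so with, say, 100,000 rows you'd be allocating 10^10 floats (about 80 GB). Flattening first gives the element-wise difference the loop was computing, and that 1-D array is cheap to pass to `plt.hist`.

```python
import numpy as np

n = 5
y_true = np.arange(n, dtype=float)                         # shape (n,)
y_pred = (np.arange(n, dtype=float) + 0.5).reshape(-1, 1)  # shape (n, 1), like many predict() outputs

bad = y_true - y_pred           # broadcasts to (n, n): n*n elements instead of n
good = y_true - y_pred.ravel()  # flatten first: shape (n,)
print(bad.shape, good.shape)    # (5, 5) (5,)
```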
hello, can someone help me understand the non-decreasing property of R^2 for regression models? I really can't understand why the hell R^2 can never decrease when new predictors are added. I found this explanation on Stack Exchange. At the end of the answer, the author says "Or if extra estimated coefficient (β(p+1)) takes a nonzero value, the SSE will reduce." Why would the SSE necessarily decrease? Isn't it possible that the new combination of coefficients (β's) would make even worse predictions? What if the model, after adding a new predictor, makes worse predictions than the model without the "new predictor"? Because of the worse predictions, the SSE would increase, and as a result R^2 would decrease. Where am I wrong?
with some stretching and maybe some of it is redundant but this is pandas version haha https://pastebin.com/vzBKKTNy
@desert oar , just saying I solved it if you were curious hah
@late shell The basic idea is that the total sum of squares (SSTotal) equals the sum of squares of the individual factors + the sum of squares of their interactions (if any) + the sum of squares of error (SSE):
SST = (SS1 + SS2 + ... + SSn) + (SS12 + SS13 + ...) + SSE.
So when you add a new predictor, say n+1, it contributes SS(n+1) and its interactions with the others (if any), and since SSTotal stays constant, SSE cannot increase.
Thus in the R^2 formula, SSE either decreases or stays constant, so R^2 either increases or stays constant.
Hope it helped you.
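The step the Stack Exchange answer skips, stated directly: least squares picks the coefficients that minimize SSE over all possible values. The smaller model is just the bigger model with β(p+1) = 0, so that option is still available when fitting the bigger model; its minimum SSE can therefore only be equal or lower, and with SST fixed, R^2 = 1 - SSE/SST can't drop. This holds for the in-sample R^2 of OLS with an intercept; adjusted R^2 or R^2 on held-out data can go down. A quick numeric check with a pure-noise predictor on random toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 3 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)  # a predictor that is pure noise, unrelated to y

X_small = x1.reshape(-1, 1)
X_big = np.column_stack([x1, junk])

r2_small = LinearRegression().fit(X_small, y).score(X_small, y)
r2_big = LinearRegression().fit(X_big, y).score(X_big, y)
print(r2_small, r2_big)  # r2_big is never lower than r2_small on the training data
```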
Hey, I was wondering if this channel covers neural networks and machine learning, or if it's just for standard AI.
Sure, the channel description mentions ML and there are often ML people hanging out here.
can one CNN model be used for all types of images? 🤔 (for recognition)
For example, I have a model that does well on a dogs/cats/ducks dataset. Can I just swap the dataset for other images, without changing the underlying CNN model?
Sure, with no guarantees whether it will perform just as well or not.
Hello people... first time posting here... I am working on a model that predicts whether a person was arrested based on some variable information...
My target variable has multi-class data and I chose to convert the classification to numerical values prior to fitting the data to the model.
new_target_values = {'Arrest':0,
'Field Contact':1,
'Citation / Infraction':2,
'Offense Report':3,
'Referred for Prosecution':4}
I got this error: ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
Should I just do a binary classification, with arrest as 1 and the rest as 0, or should I try fitting a multiclass model?
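That error comes from metrics such as `f1_score`, `precision_score` and `recall_score`, whose default is `average="binary"`. Both options work; which one fits depends on whether the question is literally "arrested or not". A sketch of each with toy labels that use the mapping above:

```python
from sklearn.metrics import f1_score

# Toy labels using the mapping above (0 = Arrest, 1 = Field Contact, ...)
y_true = [0, 1, 2, 0, 4, 3]
y_pred = [0, 1, 1, 0, 4, 3]

macro_f1 = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # per-class F1 weighted by class frequency

# Binary alternative: Arrest vs. everything else, with Arrest as the positive class (1)
y_true_bin = [1 if y == 0 else 0 for y in y_true]
y_pred_bin = [1 if y == 0 else 0 for y in y_pred]
binary_f1 = f1_score(y_true_bin, y_pred_bin)  # default average="binary", pos_label=1
print(macro_f1, weighted_f1, binary_f1)
```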
Question: what applications is a random forest model good for?
it can sometimes be good for classification tasks.
what would be a good model to use to predict sales?
kinda general I know
if you wanna use classical ML, try some sort of boosting
Does anyone have a code snippet for multi-word keyword analysis?
Epoch 1/100
1250/1250 [==============================] - 1s 400us/step - loss: 128.7992
Epoch 2/100
1250/1250 [==============================] - 0s 397us/step - loss: 1.5939
Epoch 3/100
1250/1250 [==============================] - 1s 406us/step - loss: 1.4500
Epoch 4/100
1250/1250 [==============================] - 0s 385us/step - loss: 1.3226
Epoch 5/100
1250/1250 [==============================] - 0s 398us/step - loss: 1.1951
so uh, my `X_train` size is 40k, so why is it only 1250?
(ping 2 reply thx)
show your training code
wait uh
here
import tensorflow as tf
import numpy as np
keras = tf.keras
def func(inp: np.ndarray) -> np.ndarray:
    return np.array([inp[0] * 2, inp[1] + inp[0] * 3, inp[0] * 1 + inp[1] * 10, inp[1], inp[0] + inp[1]])
training = []
for x in range(200):
    for y in range(200):
        training.append([x, y])
X_train = np.array(training)  # 200 * 200 = 40,000 samples
y_train = np.array([func(x) for x in X_train])
model = keras.models.Sequential(layers=[
    keras.layers.Dense(5, input_shape=(2,)),
    # keras.layers.Dense(5)
])
# multiply the learning rate by e^-0.1 each epoch for the first 20 epochs
lr_decay = keras.callbacks.LearningRateScheduler(lambda e, lr: lr * np.exp(-0.1) if e < 20 else lr)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-3), loss=keras.losses.MeanAbsoluteError())
model.fit(X_train, y_train, epochs=100, callbacks=[lr_decay])
it's really rudimentary, but i'm just learning
default batch size is 32
40000 / 32 = 1250
also another question, the loss is the sum of the losses
but it's been a while
wait what
not a function applied over individual values in actual and predicted
i'm confused
you have an array of actual values, and an array of predicted values
and the loss function
does whatever it wants
so
in the case of mean absolute error
it depends on the loss function
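For mean absolute error specifically, the loss is a mean rather than a sum: Keras's `MeanAbsoluteError` averages |y_true - y_pred| over the last axis for each sample, and with the default reduction it then averages those per-sample values over the batch. The same arithmetic in NumPy, on toy numbers:

```python
import numpy as np

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 samples, 2 outputs each
y_pred = np.array([[1.5, 2.0], [2.0, 6.0]])

per_sample = np.mean(np.abs(y_true - y_pred), axis=-1)  # average over outputs: [0.25, 1.5]
batch_loss = per_sample.mean()                          # average over the batch: 0.875
print(per_sample, batch_loss)
```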
yw 👋
hey, uh, i have another question
when tweaking the parameters, why is it lr * gradient? wouldn't just a general direction be enough? (positive, negative, or 0)?
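A toy comparison on f(w) = w^2 (an assumed example, not from the thread): scaling by the gradient means the steps shrink automatically as the slope flattens near the minimum, while a sign-only step of fixed size keeps overshooting and ends up bouncing around it. Sign-based optimizers do exist (signSGD, Rprop), but they rely on other tricks such as adapting or decaying the step size.

```python
import numpy as np


def grad(w):
    return 2 * w  # gradient of f(w) = w**2, minimum at w = 0


lr, steps = 0.1, 100
w_gd = 1.05
w_sign = 1.05
for _ in range(steps):
    w_gd -= lr * grad(w_gd)               # step shrinks as the slope flattens near the minimum
    w_sign -= lr * np.sign(grad(w_sign))  # always a full 0.1 step, whatever the slope

print(w_gd, w_sign)  # w_gd is essentially 0; w_sign keeps bouncing around +/-0.05
```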
Hey guys, so i want to build a 3D model of a place for a project and i want to run an AI simulation through it based on customer shopping patterns. What is a good program for me to use which supports AI in the 3D model
the learning rate is a factor that fine-tunes how much your parameters change based on the observed gradient.
if you simply set it to 1, you have no flexibility, and your model might never reach (or take forever to reach) the global minimum of the cost function. Too high an LR can make you "fly over" the minimum, and too small an LR may take too long and use too many computational resources to converge to a solution
but just because a function is like really steep where the params are now, doesn't mean it's steep for a long time
But you don't (generally) know that, and neither does the algo.
i mean, take this hypothetical cost function: `/ ---/` <-- we're here. i mean it's steep, but that doesn't mean we should
oh ok
so it's just generally agreed upon, and it's worked for most models?
this is what LR does
Learning rate just determines the "step size", i.e. how large each jump is
and this is a good visualization of what happens with different learning rates
if you look at the right image it "jumps" over the minima because the step size (LR) is too large
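The three regimes in that picture can be reproduced on f(w) = w^2, where one gradient step multiplies w by (1 - 2*lr); the learning rates below are arbitrary picks.

```python
def descend(lr, w0=1.0, steps=20):
    """Plain gradient descent on f(w) = w**2 (gradient 2w); each step multiplies w by (1 - 2*lr)."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w


for lr in (0.01, 0.4, 1.1):
    print(lr, descend(lr))
# 0.01: still far from 0 (too small); 0.4: essentially 0;
# 1.1: |w| grows because every step jumps over the minimum and lands further away
```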
hey, for the sklearn housing dataset, I know the target variable is the mean house price, but what's the unit for that?
$100k or something?
Hi,
I am developing a recommendation system
I have a question...
Suppose we have a product list: how can we group that list so that variants of the same product end up together?
for example
we have
milk, 1L
milk, 500ml
milk, 2L
I want my system to treat these as the same product
any ideas?
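A simple rule-based sketch, assuming the variants differ only by a trailing size: strip the quantity with a regular expression and group on what's left. The unit list and helper names are hypothetical; messier catalogues would need fuzzy string matching or text embeddings instead.

```python
import re
from collections import defaultdict

# Strips a size such as ", 1L", " 500ml" or ",2kg" (the unit list here is an assumption)
SIZE_PATTERN = re.compile(r"\s*,?\s*\d+(?:\.\d+)?\s*(?:ml|l|kg|g)\b", re.IGNORECASE)


def base_name(product: str) -> str:
    """Remove the size/quantity part and normalize case and stray separators."""
    return SIZE_PATTERN.sub("", product).strip(" ,").lower()


def group_products(products):
    groups = defaultdict(list)
    for p in products:
        groups[base_name(p)].append(p)
    return dict(groups)


print(group_products(["milk , 1L", "milk, 500ml", "milk,2L", "bread, 500g"]))
# {'milk': ['milk , 1L', 'milk, 500ml', 'milk,2L'], 'bread': ['bread, 500g']}
```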
You're driving on the road. You ask, "Are we there yet?" I tell you, "No, but your destination is ahead." You ask, "How far are we?" I say, "Your destination is ahead." You say, "That's not very useful on its own, is it?" I say, "Yep. Too bad."
Here's the real kicker. My direction information felt incomplete when we even knew our destination. Now imagine the same scenario where we don't even know where the destination is.. Oh and we teleport randomly to different roads and keep asking the same question.....
Best part is, that assumes that we would even know we're there when we arrive. Which we don't. Sounds like fun
thank you
What is the difference between np.min(array) and array.min()? I timed it on a numpy.ndarray, and array.min() is a bit faster. Would've thought it would be the same speed.
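Both compute the same reduction. As far as I understand NumPy's implementation, `np.min` is a module-level wrapper that does some extra Python-level dispatch (so it can also accept lists and other array-likes) before reaching the same minimum reduction the method uses, which is why `array.min()` is a little faster. The gap is a fixed per-call overhead, so it mostly matters for small arrays. A quick way to check on your own machine (timings will vary):

```python
import timeit

import numpy as np

arr = np.random.default_rng(0).random(1000)

# Same result either way
same = np.min(arr) == arr.min()

t_func = timeit.timeit(lambda: np.min(arr), number=10_000)
t_method = timeit.timeit(lambda: arr.min(), number=10_000)
print(same, t_func, t_method)
```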
Hey everyone, I have a question about the credit risk notebook from pysurvival
the goal in this notebook is to predict how fast a loan is repaid
but at the end
we finish by plotting this graph that I don't understand
I'm not sure what the y-axis is, and I don't understand how the high-risk line can repay the loan faster than the low- and medium-risk lines
shouldn't it be the opposite?
Also I'm not sure I understand what "T=6.0" means (the actual time)
I looked at the code but it didn't help me much. Can you help me please?
Hello. Anybody with data science experience? I want Simpsons transcripts for a machine learning task. I want them all in .txt files, one per episode, named ep1.txt, ep2.txt, ep3.txt, ep4.txt ... and so on. I found a script dataset of the Simpsons here: https://www.kaggle.com/prashant111/the-simpsons-dataset?select=simpsons_script_lines.csv
but it is one CSV file that isn't split. How do I get the data from the Kaggle link into my format?
Can anybody give me a script to get the data into the format I want? Or is the data available in that format anywhere? I'd appreciate any sort of help!
@lapis sequoia do you know pandas?
What is the difference between the correlation coefficient and the p-value in relation to how good the regression model is?
I need some help with numpy's random.randn
why and what does it print
it returns an array where the elements have standard normal distribution
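A quick illustration of what `randn` returns: the arguments are the array's dimensions, and "standard normal" means samples from a normal (Gaussian) distribution with mean 0 and standard deviation 1. For new code, NumPy recommends `np.random.default_rng().standard_normal(...)` instead.

```python
import numpy as np

np.random.seed(0)              # only so the output is reproducible
small = np.random.randn(2, 3)  # a 2x3 array of draws from N(0, 1)
print(small.shape)             # (2, 3)

big = np.random.randn(100_000)
print(big.mean(), big.std())   # close to 0 and 1
```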
one uses the p-value to figure out if the model's performance is just the result of random chance (to simplify it a bit)
whereas the correlation coefficient is a measure of how strongly related two variables are.
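A small sketch of the difference using `scipy.stats.pearsonr`, which returns both numbers, on toy data: r measures how strong the linear relationship is, and the p-value tests the hypothesis that the true correlation is zero. With enough data even a weak r can have a tiny p-value, so the two answer different questions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = 2 * x + rng.normal(scale=5, size=x.size)  # toy data: linear trend plus noise

r, p = stats.pearsonr(x, y)
print(r, p)  # r: strength of the linear relationship; p: evidence against "no correlation"
```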
yes
is there a column that gives you the episode number?
yes there is
so you can keep selecting rows by episode number and write each slice of the data to file like you wanted.
yes, but how exactly would I implement that in code?
well, it's not much of a learning experience if I give you the code
pweez I'll be able to learn from the code
how much programming experience would you say you have?
If beginner is 0, Intermediate is 0.5, and expert is 1, I'd say I'm 0.7
but I'm new to dealing with csv files
what does standard normal distribution mean?
@serene scaffold are you there?
!docs pandas.DataFrame.groupby
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
how do I use pd.dataframe.groupby to achieve what I want?
what column are you trying to group things by?
while I'm averse to telling people to "just google it", I personally wouldn't be able to give you a better answer than an online resource.
?
the original dataset looks like this, preview it here pls: https://www.kaggle.com/prashant111/the-simpsons-dataset?select=simpsons_script_lines.csv
I just want files like 'ep1.txt' 'ep2.txt' and so on containing scripts like this:
Homer Simpson: Hello
Moe: Hello
*Character: Dialogue*
right, I know what the data looks like. I downloaded it so I can help you. but I'm not going to write the code for you
look at the data--if you want to handle each episode separately, what column gives you that information?
episode number
are you sure?
episode id
right
🙂
strictly speaking it is episode_id. the underscore is necessary
yes
wait
so you need to select rows by each episode id and write out each slice.
let me try to write the code
can I ask you if I have any problems while writing the code
I need to do homework but if you ping me I'll try to look at it
thank you
hello
my computer is very weak
it crashed because the csv file was too large
and repl.it can't load the csv file
let me see if there's an option
@lapis sequoia you can just read in a certain number of rows at a time, but that means you'll need to be appending to the outputted files
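A sketch of the chunked approach: the column names `episode_id`, `raw_character_text` and `spoken_words` are assumed from the Kaggle file, the helper name is hypothetical, and it assumes rows are already in script order within each episode. Each file is overwritten the first time an episode is seen in a run and appended to afterwards, so only one chunk is in memory at a time.

```python
import io
import os
import tempfile

import pandas as pd


def write_episode_scripts(csv_source, out_dir, chunksize=50_000):
    """Stream the CSV in chunks and append each spoken line to ep<episode_id>.txt."""
    os.makedirs(out_dir, exist_ok=True)
    started = set()  # episodes whose file was already (re)created in this run
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        chunk = chunk.dropna(subset=["raw_character_text", "spoken_words"])
        for episode_id, rows in chunk.groupby("episode_id", sort=False):
            mode = "a" if episode_id in started else "w"  # overwrite old output on first write
            started.add(episode_id)
            path = os.path.join(out_dir, f"ep{episode_id}.txt")
            with open(path, mode, encoding="utf-8") as f:
                for who, words in zip(rows["raw_character_text"], rows["spoken_words"]):
                    f.write(f"{who}: {words}\n")


# Demo on a tiny in-memory CSV; for the real data you would call
# write_episode_scripts("simpsons_script_lines.csv", "scripts") instead.
demo_csv = io.StringIO(
    "episode_id,raw_character_text,spoken_words\n"
    "1,Homer Simpson,Hello\n"
    "2,Marge Simpson,Hi\n"
    "1,Moe Szyslak,Hello\n"
)
with tempfile.TemporaryDirectory() as out_dir:
    write_episode_scripts(demo_csv, out_dir, chunksize=2)
    with open(os.path.join(out_dir, "ep1.txt"), encoding="utf-8") as f:
        ep1_text = f.read()
print(ep1_text)
```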
Yo, is it alright if I ask for some advice on how to do some down-and-dirty outlier detection in a t-SNE plot? I am currently evaluating a weird machine learning method I jury-rigged together and am trying to generate some evidence that what the system is flagging as abnormal is actually abnormal
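A quick-and-dirty option, as a sketch: run scikit-learn's `LocalOutlierFactor` on the 2-D t-SNE coordinates and see whether the points it flags match what your system flags. The toy embedding below is made up. One caveat: t-SNE distorts distances and densities, so it's usually more defensible to run the detector on the original features and use t-SNE only to visualize the result.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical t-SNE output: a 5x5 grid of "normal" points plus one far-away point.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
embedding = np.vstack([grid, [[20.0, 20.0]]])

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(embedding)     # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_  # larger = more isolated than its neighbours
print(np.where(labels == -1)[0])
```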
@serene scaffold Are you there?
I wrote my code
but was waiting for your homework to finish
Oh don't worry I wrote the code
but I need your help
a lil
are the episode ids random unique ids? or are they the number of the episode?
I wrote this, so I can get all the dialogues of a particular episode to write to my txt files: https://replit.com/@BleepLogger/freeprocess#main.py
Sorry to interrupt you Pinkie, hoping someone else can chime in. I'm trying to catch up on a course I'm doing, but I'm really stuck on the basics. At the moment I'm on this exercise:
Exercise 6:
Please import the famous Anscombe's quartet from seaborn. Then plot it with matplotlib, and calculate the means, variances, correlations and linear fitting coefficients. For linear regression, you can use the sklearn lib. Can you find a more concise way to plot the data?
And I'm given the code
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
anscombe = sns.load_dataset("anscombe")
print(anscombe)
# create subsets and subplots of the anscombe data
dataset_1 = anscombe[anscombe['dataset'] == 'I']
dataset_2= anscombe[anscombe['dataset'] == 'II']
dataset_3 = anscombe[anscombe['dataset'] == 'III']
dataset_4 = anscombe[anscombe['dataset'] == 'IV']
fig = plt.figure()
axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)
axes1.plot(dataset_1['x'], dataset_1['y'], 'o')
axes2.plot(dataset_2['x'], dataset_2['y'], 'o')
axes3.plot(dataset_3['x'], dataset_3['y'], 'o')
axes4.plot(dataset_4['x'], dataset_4['y'], 'o')
#linear regression model
regr = linear_model.LinearRegression()
regr.fit(dataset_1['x'].values.reshape(-1,1), dataset_1['y'].values.reshape(-1,1))
axes1.plot(dataset_1['x'].values.reshape(-1,1), regr.predict(dataset_1['x'].values.reshape(-1,1)), 'r')
plt.show()
I really barely have a clue how this code works. I understand it's plotting graphs at least, and I know the Anscombe's quartet datasets will have the same means, variances, medians, etc., but can anyone guide me through calculating those values? Would appreciate any help
I didn't receive much support from my lecturer since face-to-face teaching is not allowed :\
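A sketch for the calculation part (not the course's reference solution): group by the `dataset` column and compute each statistic per group. The helper name `summarize` is hypothetical; the demo types in dataset I of the quartet so it runs without downloading anything, and you would pass the whole `sns.load_dataset("anscombe")` frame instead to get all four rows.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def summarize(anscombe: pd.DataFrame) -> pd.DataFrame:
    """Mean, sample variance, correlation and least-squares line for each dataset group."""
    rows = {}
    for name, g in anscombe.groupby("dataset"):
        reg = LinearRegression().fit(g[["x"]], g["y"])  # g[["x"]] is already 2-D, no reshape needed
        rows[name] = {
            "mean_x": g["x"].mean(), "mean_y": g["y"].mean(),
            "var_x": g["x"].var(), "var_y": g["y"].var(),  # pandas .var() uses ddof=1
            "corr": g["x"].corr(g["y"]),
            "slope": reg.coef_[0], "intercept": reg.intercept_,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Offline demo with dataset I of the quartet
dataset_1 = pd.DataFrame({
    "dataset": ["I"] * 11,
    "x": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
})
summary = summarize(dataset_1)
print(summary.round(3))

# A more concise plot of all four panels with seaborn:
# sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)
```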
how complicated would something like this be to set up for a noob?
like, where would I find the code for an algorithm like this?
what are the possible relationships between correlation and causation?
@serene scaffold Are you online
yes
👆
⤴️ .
alright, what's next?
I have written the function to get the script of an episode from its ID. How do I use this?
right, so once you have all those CSVs, what do you want to do with them?
[PANDAS] Hello guys. I'm having a lot of trouble because I have no idea how to write code to print this:
The most popular girl's name and boy's name in every year (two records per year)
How do I do that? That's the Excel sheet I have read in. Liczba means count, Plec means sex, Imie means name and Rok means year.
And that's the code I was trying to do something with:
print(f"{df1.loc[(df1.groupby('Rok')) & (df1.Plec == 'M')]['Liczba'].idxmax()}")
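One way to get two rows per year, as a sketch: group by year and sex, then use `idxmax` to find the row with the largest count in each group. The toy rows and the "K"/"M" codes for `Plec` are assumptions about the sheet.

```python
import pandas as pd


def most_popular_names(df: pd.DataFrame) -> pd.DataFrame:
    """One row per (year, sex): the name with the highest count."""
    # idxmax gives the row label of the largest Liczba inside each (Rok, Plec) group
    idx = df.groupby(["Rok", "Plec"])["Liczba"].idxmax()
    return df.loc[idx, ["Rok", "Plec", "Imie", "Liczba"]]


# Toy rows; "K"/"M" as the Plec codes is an assumption
df1 = pd.DataFrame({
    "Rok": [2000, 2000, 2000, 2000, 2001, 2001],
    "Plec": ["K", "K", "M", "M", "K", "M"],
    "Imie": ["Anna", "Maria", "Jan", "Piotr", "Zofia", "Adam"],
    "Liczba": [10, 20, 15, 5, 7, 9],
})
result = most_popular_names(df1)
print(result)
```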
I don't want many csvs, I want many txts. I need them for a machine learning project, specifically, few-shot learning with EleutherAI's GPT-Neo.
I finished the code for generating all my txts. Can you please check my code, and correct and explain any errors? Also tell me how I can improve it, and why it isn't working if it isn't. Also let me know if it works as expected.
Here is the finished code: https://replit.com/@BleepLogger/freeprocess#main.py
@serene scaffold Are you there?
What do you want in the text files?
Just all the dialogue in a given episode as one continuous stream of text?
the scripts for all the episodes
yes
That's easy to do if you can fit the whole csv in your ram
did you check out my code
oh is the method used in my code fine
I can, I have 13 gb RAM
One moment
take as long as you want
np.full((25, 25), "white", dtype="object")
raises
ValueError: Object arrays cannot be loaded when allow_pickle=False
set allow_pickle to True
then it will work
where?
i'm not seeing that as an argument in np.full
i'm not using np.load tho
argument in np.load()
wait a sec
try the argument in np.full
or remove pickle data from your file
@lapis sequoia wait I looked it up
there is no allow_pickle argument in numpy.full()
@lapis sequoia your code can at least be greatly simplified
try opening an issue on GitHub
great! how?
okay, let's try a different solution then...what is the correct dtype for a string with variable length?
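For what it's worth, `np.full` with `dtype=object` works on its own; that `ValueError` is raised by `np.load`, whose `allow_pickle` defaults to `False`, when it reads an object array. A sketch showing the object-dtype array, a fixed-width unicode alternative (`"U10"`, an arbitrary width that truncates longer strings), and where the error actually comes from. NumPy 2.0+ also offers `np.dtypes.StringDType()` for true variable-length strings.

```python
import io

import numpy as np

grid = np.full((25, 25), "white", dtype=object)  # works: np.full itself never involves pickling
fixed = np.full((25, 25), "white", dtype="U10")  # fixed-width alternative, max 10 characters

buf = io.BytesIO()
np.save(buf, grid)  # object arrays are stored using pickle
buf.seek(0)
try:
    np.load(buf)  # allow_pickle defaults to False, which is where this ValueError comes from
    raised = False
except ValueError:
    raised = True

buf.seek(0)
loaded = np.load(buf, allow_pickle=True)  # only do this for files you trust
print(raised, loaded.shape, loaded[0, 0], fixed.dtype)
```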
import pandas as pd
df = pd.read_csv('simpsons_script_lines.csv')
episode_ids = df['episode_id'].unique()
for id_ in episode_ids:
...
see if you can go from there
oh there's a unique function
it's a method but yes
I thought I had to make it a list and then make it a set to make it unique
I understand it's not simple, but does my code at least run properly? Can you check the txt files that it generates and whether they make any sense?