#data-science-and-ml
was like
thinking about taking slices
across axes
and what shapes the results would be
Hey anyone got a chance to play around with DataSpell?
I feel like pandas is much more manageable for labeled data compared to numpy, but I heard that pandas is at times 20x slower
Woo, finally got around to rebuilding the main solve loop for my Keep Talking and Nobody Explodes bot, so it can now solve multiple concurrent modules at once rather than having to do them all one at a time
(Seemed relevant what with the vast majority of what it does being computer vision)
https://www.youtube.com/watch?v=ZSGCO4eFRJE
but at the end you will just execute a cell and continue working with edited variable
hence you won't feel it as much
that looks so rad!
Thanks :)
how did you end up achieving that? what libraries did you use? that's so impressive
OpenCV for all the Computer Vision stuff, the overlay is from wxPython, pynput for sending mouse input and mss for the screen capture
And tesseract for the OCR (at least when I couldn't be bothered to hack something more accurate together with opencv)
It does all the other (non-needy) modules too, though it takes a long time with the one with lots of words because OCR is slooow and it has to do a few passes to make sure it's right
https://www.youtube.com/watch?v=DvTNRo8tCqo
Keep Cheating and Nobody Explodes bot trying out 8 modules.
Still need to update a bunch of the drawing and text and its display in general, but I think the non-needy modules are all basically finished at this point
All written in Python, using OpenCV for the image processing/computer vision parts and a bit of wxPython for the display window ...
Still working on making the drawing output more often and more informatively
The vast majority of what opencv is being used for is calling inRange() on hsv images and finding contours. And some eroding and dilating, but almost all of it is just using that. The keypad with symbols on it uses matchShapes (and a ShapeContextDistanceExtractor when matchShapes is lying) but otherwise it's largely just testing for certain colours and sizes of resulting contours and then using handwritten logic to work out what that means
what does the word prior mean in terms of ML? I'm reading a paper in which they say that
The construction phase is prior-driven, not data-driven; data comes in only at the learning phase.
Please let me know if anyone can help and if more detail is needed. Thanks.
It didn't change for me, so I think it's read-only.
They mean the "prior probability" as in Bayesian statistics
anyone here familiar with sparse matrices and how they work
yeah that's what i thought so. but what do we mean by prior-driven and data-driven.
I've just talked with a friend of mine, and what he suggested is instead of giving data to our model to learn we give certain properties for granted.
(btw in this approach we use very little data, so the above suggestion kinda made sense to me.)
yes i think your friend's interpretation is reasonable
from Bayes' theorem we have P(θ|Y) ∝ P(Y|θ) * P(θ) where θ is our model parameter and Y is our data. if we don't have a lot of data, our estimates of P(θ|Y) will depend more strongly on our assumptions of P(θ)
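A minimal sketch of that point, assuming a coin-flip model with a Beta prior (the prior strength and the flip counts are made up for illustration). With a Beta(a, b) prior and binomial data, the posterior mean is (a + heads) / (a + b + flips), so a strong prior dominates when there are few flips:

```python
def posterior_mean(heads, flips, prior_a, prior_b):
    """Posterior mean of P(heads) under a Beta(prior_a, prior_b) prior."""
    return (prior_a + heads) / (prior_a + prior_b + flips)

# Hypothetical strong prior belief that the coin is fair: Beta(50, 50).
# Observed frequency of heads is 0.8 in both cases below.
small_data = posterior_mean(heads=8, flips=10, prior_a=50, prior_b=50)
large_data = posterior_mean(heads=800, flips=1000, prior_a=50, prior_b=50)

print(round(small_data, 3))  # stays close to the prior mean of 0.5
print(round(large_data, 3))  # moves most of the way towards 0.8
```

With 10 flips the estimate barely leaves the prior; with 1000 flips the data takes over.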
it's better to just ask your question, then if someone has an answer they can just answer without extra back-and-forth
"don't ask to ask", is the saying
well im running into issues with predictions after building a model
yeah that makes a lot more sense now. Thanks a lot for answering : )
my training data after preprocessing have different number of features to the test data, which raised an error during prediction stage
both test and training datasets were preprocessed the same way
this doesn't appear to be relevant to sparse matrices btw
can you show your full code?
at least how clf is defined
my training data after preprocessing have different number of features to the test data
basically, this should not ever happen
I posted this whole part on stackoverflow
the model worked during validation
so this works fine (predicting the training data)
clf.predict(X)
that is what I thought as well
You are not using sklearn correctly
you are creating new and separate transformers for each split
you don't want to do that
oh
after .fit-ing a transformer, it keeps the fitted state internally, then you just .transform on the other datasets
so the vectorizer function is not working correctly?
it cannot work correctly as-written
vectorizer = CountVectorizer(stop_words = 'english')
clf = LogisticRegression(C = 0.01, max_iter = 1000000, penalty = 'l2')
x_train = vectorizer.fit_transform(data_train)
clf.fit(x_train, y_train)
pred_train = clf.predict(x_train)
x_test = vectorizer.transform(data_test)
pred_test = clf.predict(x_test)
your code should look something like this
better yet, use a "pipeline" to automate the sequence of preprocessing and classifier fitting
from sklearn.pipeline import make_pipeline
clf = make_pipeline(
    CountVectorizer(stop_words = 'english'),
    LogisticRegression(C = 0.01, max_iter = 1000000, penalty = 'l2'),
)
# the pipeline fits the vectorizer and the classifier together, so it needs the labels too
clf.fit(data_train, y_train)
pred_train = clf.predict(data_train)
pred_test = clf.predict(data_test)
and does that work with both text features and numerical features?
because right now, I am vectorizing each text feature on its own first (though not done correctly), then hstacking them with the numerical features (in np.array form)
but what you are showing is that I just preprocess the whole dataset together?
or am I misunderstanding it?
if you only want to apply the CountVectorizer to some dataframe columns but not all, use https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer
and here are the Pipeline docs https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline
thank you
I added an answer on SO as well
Hi guys, can anyone help me with this?
Please I am stuck here. The task is on the picture. Trying to create an API for an image search. I am done with the ml part and the trained model has been assigned to a variable model
Check this https://datacarpentry.org/python-ecology-lesson/09-working-with-sql/index.html
This is db not sql file
thank you!!!!
I hope it helps
yo
I'm trying to work on my own StyleGAN encoder and would like to learn the basics leading up to this. Do I learn OpenCV, Tensorflow or Keras first?
is there any specific reason just to build an encoder?
I want to create Cartoons from images and videos.
Want to do motion detection and face recognition.
also want to turn images into videos.
maybe make my own anime
i have a lot of things I want to do
that's a whole GAN - I thought you meant the encoder separately
this is an example of something i want to pull off on my own
https://github.com/yuval-alaluf/restyle-encoder
Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699 - yuval-alaluf/restyle-encoder
@grave frost I want to work on this and improve it too.
this is almost one of the things I'd like to achieve.
somehow it is still not working
can i get advice on this?
am I not meant to hstack them?
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
ah sorry yeah
hm, the vectorizer might still not be emitting the full array
it shouldnt though
make sure you restart your notebook in case there are any typos or something
you can hstack them although i do think ColumnTransformer will be easier to work with
that code you wrote looks like it should work
X_test is 10000x8718 and X is 40000x31765
yeah I decided to hstack because I don't understand how ColumnTransformer works
yeah this definitely is not what you'd want
i would normally expect this to work...
let me see if there's a missing flag or something
thanks for helping out btw
https://replit.com/@maximum__/count-vectorizer#main.py
yeah, i can't reproduce the problem
this works as expected
right, hstack isn't the problem here
I may have found the problem
anyone here worked with GAN's?
Hey guys , how do I start learning ds
so I guess the order matters
Does ds include ai and ml
what do you mean by that?
you shouldn't even be able to run that code with those lines commented out
restart your damn notebook
and use [''] to get dataframe columns, don't use .
(what happens if you have a column called map?)
other_features = ["n_steps", "n_ingredients"]
features = df_train[other_features]
test_features = df_test[other_features]
name = vectoriser.fit_transform(df_train.name)
test_name = vectoriser.transform(df_test.name)
steps = vectoriser.fit_transform(df_train.steps)
test_steps = vectoriser.transform(df_test.steps)
ingr = vectoriser.fit_transform(df_train.ingredients)
test_ingr = vectoriser.transform(df_test.ingredients)
X = hstack([steps,ingr, name])
X_test = hstack([test_steps, test_ingr, test_name])
y = df_train.duration_label
now it works
restart your notebook anyway
it's highly likely that you just had some other variable name hanging around due to a typo
OH
that's the problem

you need a separate vectorizer for each set of features...
yeah hahahaha
don't re-use it
other_features = ["n_steps", "n_ingredients"]
features = df_train[other_features]
test_features = df_test[other_features]
name_vectoriser = CountVectorizer()
name = name_vectoriser.fit_transform(df_train.name)
test_name = name_vectoriser.transform(df_test.name)
steps_vectoriser = CountVectorizer()
steps = steps_vectoriser.fit_transform(df_train.steps)
test_steps = steps_vectoriser.transform(df_test.steps)
ingr_vectoriser = CountVectorizer()
ingr = ingr_vectoriser.fit_transform(df_train.ingredients)
test_ingr = ingr_vectoriser.transform(df_test.ingredients)
X = hstack([steps,ingr, name])
X_test = hstack([test_steps, test_ingr, test_name])
y = df_train.duration_label
columntransformer will be really useful here
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
clf = make_pipeline(
    make_column_transformer(
        ('passthrough', ['n_steps', 'n_ingredients']),
        # one separate CountVectorizer per text column
        (CountVectorizer(), 'name'),
        (CountVectorizer(), 'steps'),
        (CountVectorizer(), 'ingredients'),
    ),
    LogisticRegression(C=0.01, max_iter=1000000, penalty='l2'),
)
@cyan lantern something like this
when using np.mean for rows (axis=1), the array has to be flattened.
So, when using np.mean for columns (axis=0), I'd assume array has to be modified (not flattened - but... stood up?)
Is there a term for this?
why does it have to be flattened?
that's what it does by default
numpy knows how long each column is, so it can "step over" the right number of elements in the underlying flat array to do the computation
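A small sketch of what the axis argument does (the array here is made up). Without an axis, np.mean averages over the flattened array; with axis=0 or axis=1 nothing has to be reshaped, because NumPy walks the same underlying buffer using the strides:

```python
import numpy as np

# 2 rows, 3 columns; int64 so the strides below are the same on every platform
a = np.arange(6, dtype=np.int64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

overall = np.mean(a)            # no axis: mean of all six elements
col_means = np.mean(a, axis=0)  # collapse the rows -> one mean per column
row_means = np.mean(a, axis=1)  # collapse the columns -> one mean per row

# bytes to step to reach the next row / next column in the flat buffer
strides = a.strides

print(overall, col_means, row_means, strides)
```

The result for axis=0 has shape (3,) and for axis=1 shape (2,): the named axis is the one that disappears.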
aah okay, makes sense. Thanks
Please I am stuck here. The task is on the picture. Trying to create an API for an image search. I am done with the ml part and the trained model has been assigned to a variable model
Guys, can anybody tell me what all to study under pandas.
Like what all are important attributes and functions.
anyone know what that notation for matrix M means?
wdym notation?
Hi, so I want to make a program which predicts crypto prices, so what python library should I use for that?. (I am relatively new to ml, I've made a few simple ml projects)
Tensorflow throws an error IndexError: tuple index out of range when I add weights to my validation data
but if you see the last two code cells
the shapes are correct for both validation and training
```
SVM_clf_counts = Pipeline([('vect', CountVectorizer()),
                           ('clf', LinearSVC(C=0.1, max_iter=3000)),
                           ])
SVM_clf_counts.fit(X_train, y_train)
SVM_cnt_pred_tr = LR_clf_counts.predict(X_train)
SVM_cnt_pred_val = LR_clf_counts.predict(X_val)
SVM_cnt_pred_tst = LR_clf_counts.predict(X_test)
print("precision on training: ",precision_score(y_train, SVM_cnt_pred_tr, average='micro'))
print("precision on validation: ",precision_score(y_val, SVM_cnt_pred_val, average='micro'))
print("precision on testing: ",precision_score(y_test, SVM_cnt_pred_tst, average='micro'))
```
I don't understand what the error is in this code, can someone help
@sharp herald this is "block matrix" notation
P stands for "all the elements of P"
0 stands for "fill with 0s up to the correct dimensions"
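A sketch of the same idea in NumPy, assuming a made-up 2x2 block P and a 1x1 block Q (the actual matrix M from the question is in an image that is not part of this log). np.block assembles a matrix from sub-blocks, and the "0" entries become zero arrays of whatever shape makes the rows and columns line up:

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])  # hypothetical block
Q = np.array([[5]])     # hypothetical block

# block-diagonal layout: [[P, 0], [0, Q]]
M = np.block([
    [P,                           np.zeros((2, 1), dtype=int)],
    [np.zeros((1, 2), dtype=int), Q],
])

print(M)
```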
so i have this program that checks for the input in the csv file, and then if it is not there, then it writes, but the write part doesn't work for some reason. I also am not getting any errors
you're making predictions with LR_clf_counts, but you fitted SVM_clf_counts, so the LR_clf_counts is probably un-fitted. that's what the error means: you haven't fitted the vectorizer yet, so it has no vocabulary stored.
'w' mode means "overwrite the file if it exists". use 'a' mode to add lines to the end of the file.
also do not open a file for both reading and writing at the same time. you will make a big mess
wait so how do I not open it twice?
thanks
store the rows in a list, close the file for reading, modify the list as needed, then overwrite the file.
import csv
uuu = input('user: ')
uu = input('pass0: ')
u = input('pass1: ')
with open('test1.csv') as fp:
    rows = list(csv.reader(fp))
new_rows = []
for row in rows:
    if row == [uuu, uu]:
        print('nogood')
    else:
        new_rows.append([uuu, uu])
        print('end')
rows.extend(new_rows)
del new_rows
with open('test1.csv', 'w') as fp:
    csv.writer(fp).writerows(rows)
admittedly i don't understand what this code is supposed to do, but it looks more or less like what you wrote, but without the chance of messing up the files
note that i do not .append to rows - i .append to a new list. this is because you should never mutate something that you are iterating over
but it still doesn't work
what does "doesn't work" mean?
what happened, and what were you expecting?
note that this is also untested code written by a stranger on the internet, so it could be buggy or incorrect
ok
so i want it to first take an input, then i want it to read the csv, and if it is not in the csv then write it
but its not writing it
try (uuu, uu) instead of [uuu, uu]
i can't remember if csv rows are returned as tuples or lists. probably tuples, so use () and not [].
@desert oar that didn't make a difference
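For reference, csv.reader yields each row as a list of strings, so comparing against a list is the right shape. A minimal sketch of "append the row only if it is not already in the file", assuming a two-column CSV; the function name add_if_missing and the demo file are made up for illustration:

```python
import csv
import os
import tempfile

def add_if_missing(path, user, password):
    """Append [user, password] to the CSV unless that exact row already exists."""
    with open(path, newline='') as fp:
        rows = list(csv.reader(fp))  # every row is a list of strings
    if [user, password] in rows:
        return False
    # 'a' adds to the end of the file; 'w' would overwrite the whole file
    with open(path, 'a', newline='') as fp:
        csv.writer(fp).writerow([user, password])
    return True

# demo on a throwaway file so the original data is never touched
demo_path = os.path.join(tempfile.mkdtemp(), 'test2.csv')
with open(demo_path, 'w', newline='') as fp:
    csv.writer(fp).writerow(['alice', 'secret'])

first = add_if_missing(demo_path, 'bob', 'hunter2')   # row is new, gets written
second = add_if_missing(demo_path, 'bob', 'hunter2')  # row exists, nothing written
with open(demo_path, newline='') as fp:
    final_rows = list(csv.reader(fp))
print(first, second, final_rows)
```

The file is opened once for reading and closed before it is opened again for appending, so the two never overlap.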
does it never print nogood?
what actually does happen
and how is it different from your expectations?
it might help if you used https://repl.it and posted your code along with an example csv that shows the problem
for the sake of the demonstration, you should save to a different filename so i can see both the inputs and outputs
no it doesn't print nogood
what do you mean by this?
save to test2.csv instead of test1.csv, so that when i run your repl.it post i can re-run it as many times as i want, without overwriting the original file
ok
so it is alrady on replit
@desert oar https://replit.com/@27jkpatel/csv#test1.py
guys, using this api
How can i know the methods?
i havent found documentation anywhere
where is the docs?
are there any recommendations for resources to use for getting started with a.i and machine learning
id_train, X_train, y_train = ftrain_preprocessed['SentenceId'], ftrain_preprocessed['Phrase'], ftrain_preprocessed['Sentiment']
id_test, X_test, = ftest_preprocessed['SentenceId'], ftest_preprocessed['Sentiment']```
I keep getting this error
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Sentiment'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-97-bf964073328e> in <module>
1 id_train, X_train, y_train = ftrain_preprocessed['SentenceId'], ftrain_preprocessed['Phrase'], ftrain_preprocessed['Sentiment']
----> 2 id_test, X_test, = ftest_preprocessed['SentenceId'], ftest_preprocessed['Sentiment']
/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'Sentiment'```
Sentiment is not a key in ftest_preprocessed then
what do i need to fix in the code?
You're trying to get a nonexisting column of a dataframe.
i think understanding what u are doing first will help tho
Ahh I see, I'll try to see why it isn't there even tho it's supposed to be
thank you
u can print ftest_preprocessed.keys()
```
recurrent = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=(None, 1))
])
recurrent.compile(loss='mse', optimizer='nadam')
recurrent.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=40)
```
is this supposed to take forever
= for ever xd
```
import numpy as np
import tensorflow as tf
keras = tf.keras
# returns num_instances sequences, each of length len_
def gen_time_series(num_instances: int = 32, len_: int = 64):
    freq1, freq2, offset1, offset2 = np.random.rand(4, num_instances, 1)
    time = np.linspace(0, 1, len_)
    series = 0.5 * np.sin((time - offset1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offset2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(num_instances, len_) - 0.5)
    return series.reshape(series.shape + (1,)).astype(np.float32)
seq_len = 50
instance_num = 10 ** 5
train_amt = int(instance_num * 0.6)
val_amt = int(instance_num * 0.2)
raw_data = gen_time_series(instance_num, seq_len + 1) # +1 for the instance to predict
X_train, y_train = raw_data[:train_amt, :-1], raw_data[:train_amt, -1]
X_valid, y_valid = raw_data[train_amt:val_amt, :-1], raw_data[train_amt:val_amt, -1]
linear = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(seq_len, 1)),
    keras.layers.Dense(1)
])
linear.compile(loss='mse', optimizer='nadam')
linear.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=40)
```
does anyone know why even when i'm providing the validation data, it isn't showing?
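One thing worth checking here is the shape of the validation arrays. With the numbers above, train_amt is 60000 and val_amt is 20000, so raw_data[train_amt:val_amt] starts after it ends and selects nothing. A sketch with a zero-filled stand-in for raw_data (the stand-in array is an assumption; only the shapes matter):

```python
import numpy as np

seq_len = 50
instance_num = 10 ** 5
train_amt = int(instance_num * 0.6)  # 60000
val_amt = int(instance_num * 0.2)    # 20000

# stand-in with the same shape as the generated series
raw_data = np.zeros((instance_num, seq_len + 1, 1), dtype=np.float32)

# the slice from the question: start index 60000, stop index 20000 -> empty
empty_valid = raw_data[train_amt:val_amt, :-1]

# treating val_amt as a count rather than an end index gives 20000 rows
X_valid = raw_data[train_amt:train_amt + val_amt, :-1]
y_valid = raw_data[train_amt:train_amt + val_amt, -1]

print(empty_valid.shape, X_valid.shape, y_valid.shape)
```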
I'm on the hunt for a nice image annotation tool. Any recommendations?
Hi, I have recently started learning data science and have a question about pandas. What does describe give? I mean the 25%, 50% and 75% rows; basically I didn't understand those. The rest I understood, just those 3 I didn't get.
They are percentiles. Sort the data, find the value 25% into the list, that's the 25th percentile
These are sorted values..okay...so 25th perc will be (28+29)/2 i.e 29.5 correct
So what having 25th perc as 29.5...what does that mean compared to 29.5 ??
i mean values will be around,less or more than percentiles in Age..?
Another term for them is quartiles. They are cut points in the distribution such that a quarter of the values are below the 1st quartile (25%), half the values are below the 2nd quartile (50%), and so on
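A rough sketch of how those cut points can be computed by hand, assuming linear interpolation between neighbouring sorted values (which is what describe uses by default, as far as I know). The ages list is made up:

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted neighbours."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100      # fractional index into the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac

ages = [47, 22, 31, 25, 52, 28, 40, 29, 35]  # hypothetical Age column

q25, q50, q75 = (percentile(ages, q) for q in (25, 50, 75))
print(q25, q50, q75)
```

Roughly a quarter of the ages fall below q25, half below q50 (the median), and three quarters below q75.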
so lets say my 25th perc id 29.5 and 25th perc comes as 2.75 so i can say that either 2 or 3 values to left of 50th perc will be less than 25th perc value i.e 29.5 correct and same applies for 50 and 75
so ok this are percentiles not percent 😃
got it
thanks a lot @near cosmos 👍 😃
Would anyone know how to make it so the graphs are transparent? I haven't been able to figure out a way to do this and any help would be highly appreciated. Thank you.
oops, my own heatmap didnt upload
import os
import json
import sys
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.image as mpimg
import sqlite3
import seaborn as sns
map = sys.argv[1]
file = f"./{map}.csv"
df = pd.read_csv(file, header=None, usecols=[0,1])
print(df)
map_img = mpimg.imread(f'{map}.png')
hmax = sns.kdeplot(df[0], df[1], cmap="Reds", shade=True, bw=.15)
hmax.collections[0].set_alpha(0)
if 'metalworks' in map:
xmin = -3034
xmax = 3374
ymin = -6699
ymax = 4939
elif 'product' in map:
xmin = -2859
xmax = -171
ymin = -3668
ymax = 3776
elif 'process' in map:
xmin = -5222
xmax = 5216
ymin = -3146
ymax = 3128
plt.imshow(map_img, zorder=0, extent=[xmin, xmax, ymin, ymax],resample=False)
plt.savefig(f'{map} heatmap.png', dpi=1200, transparency=True)
plt.show()```
here is the relevant code btw
@steel hill you might have to define your own colormap that has transparency, or otherwise need to find a way to set the "alpha" channel for the colors to something less than 1
import tensorflow as wtf
Hi guys, I want to do time series model prediction but I am wondering how I can treat the skewed data here?
is log(UnitPrice) less skewed? that's usually a good place to start. you can also consider the more general family of box-cox transformations, or a transformation using the inverse hyperbolic sine function (https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions#Inverse_hyperbolic_sine), arcsinh(t*x)/t, where t=1 gives the standard arcsinh function
for the box-cox transformation you can refer to the scikit learn library.
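A small sketch of the arcsinh(t*x)/t transform using only the standard library; the function name ihs and the sample prices are made up. Unlike log, it is defined at zero and for negative values, and for large x it behaves like log(2x):

```python
import math

def ihs(x, t=1.0):
    """Inverse hyperbolic sine transform: arcsinh(t * x) / t."""
    return math.asinh(t * x) / t

prices = [0.0, 1.0, 10.0, 100.0, 10000.0]  # hypothetical skewed UnitPrice values
transformed = [ihs(p) for p in prices]

at_zero = ihs(0.0)        # defined at zero, unlike log
negative = ihs(-5.0)      # defined for negative values, mirrors ihs(5.0)
large = ihs(10000.0)      # close to log(2 * 10000) for large inputs

print([round(v, 3) for v in transformed])
```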
percentiles
numbers that leave x% data to the left side and the rest on the right
how can I use matplotlib in vs code
Like in anything else? Not sure what you're asking.
I am getting module not found when running my code @tidal bough
@tidal bough this is what I mean
are you running on a virtual env? if so check your installed packages. if not just install using pip.
no I dont think so
is there any way I can check if I am on a virtual environment?
I think you're using the default. I would suggest using notebooks instead when learning (colab notebook/kaggle/jupyter notebook/jupyterlab).
install matplotlib using pip install matplotlib.
then import matplotlib.pyplot as plt
i have already installed matplotlib (but my environment is the default one)
do i run that in the command palette?
One of the downsides of using the default is mixing up packages from your other projects, which may result in conflicts.
yes run it in the terminal
pip install matplotlib
import it in your python script using import matplotlib.pyplot as plt
i have some of my other projects which I made in the normal python environment, will that cause some problems?
Good evening has anyone read google mu zero paper here?
this is not working
ayyy don't abuse ma boi like that
yes; just plot to the same figure repeatedly.
can you give me a basic example or something to read?
https://stackoverflow.com/a/33050617
This gives some important info, and here's a working example:
import time
import matplotlib.pyplot as plt
fig = plt.figure()
for i in range(100):
    plt.scatter(i,i,figure=fig)
    time.sleep(0.1)
    plt.pause(0.01)
ok thanks
this is what i have so far
import matplotlib.pyplot as plt
import random
y_data = []
average = 0
for i in range(0, 60):
    y_data.append(random.randint(0, 100))
for i in y_data:
    average += i
print(y_data)
print((average/60))
plt.plot(y_data)
plt.ylabel("Jason's Gay Percentage")
plt.xlabel("Seconds")
plt.show()
kill this heretic! Those who don't believe in the gospel of Google are condemned to the worst depths of pytorch! 👺
to classify almost 1000 classes, how many images do i need per class?
1
theoretically
practically? as much as you can store
number of images itself isn't that important. what you need is various inputs for each class. if you have 100 data points for class 532, but they are all very similar to each other, that isn't much better than having just 1 data point for that class.
you also need to worry about overall class imbalance - if some classes are much more common than other classes, the model can get pretty good accuracy by simply never predicting the rare classes
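A tiny illustration of the imbalance point, with made-up label counts: a "model" that always predicts the most common class gets high accuracy while never finding the rare class.

```python
from collections import Counter

# hypothetical dataset: 950 examples of one class, 50 of another
labels = ["common"] * 950 + ["rare"] * 50

# baseline that ignores the input and always predicts the majority class
majority_label, _ = Counter(labels).most_common(1)[0]
predictions = [majority_label] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# recall on the rare class: how many rare examples were actually predicted as rare
rare_total = labels.count("rare")
rare_hits = sum(p == y == "rare" for p, y in zip(predictions, labels))
rare_recall = rare_hits / rare_total

print(accuracy, rare_recall)
```

This is why per-class metrics are worth looking at alongside overall accuracy.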
isnt amount of images = various inputs per class?
ah, u mean how different are between each other?
@desert oar you forgot data quality, noise, model architecture, gpu memory, bank account
i am building my own data set with google image search api, so i was wondering for a number
there is no specific number. more is better. if you have a lot of features in your data, you will need more data points to cover the feature space.
yeah, this is what i meant
with image classification, you can use data augmentation to help with this somewhat
is it just me, or is finding people with knowledge in multiple domains difficult AF?
or is signal processing + AI a niche job in general?
ye i know
also... is there any argument in keras to, lets say, augment data coloring it?
like if i have a red car, apart from rotating scaling flipping etc, paint it blue? or at least add some blue color to it? or something?
check out imgaug lib. it has enough augmentations to last you a lifetime
@lapis sequoia So, you need to peek the red car and make it blue?
from keras?
github - not integrated directly in keras
yeah like if i have an image, multiply it by some color
im pretty sure that's not how colored filters work - or do they?
i think yes
I would think scaling pixels to values for a particular color
like if blue is 0-10, then all image values would be scaled in that range
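A rough NumPy sketch of the "multiply the image by a colour" idea, assuming RGB uint8 images; the tint helper and the tiny test images are made up. Multiplication can only darken channels, so a grey car picks up the tint but a pure red car multiplied by pure blue goes black:

```python
import numpy as np

def tint(image, color):
    """Multiply an RGB uint8 image by an RGB colour given as 0-255 values."""
    scale = np.asarray(color, dtype=np.float32) / 255.0  # per-channel factor in [0, 1]
    tinted = image.astype(np.float32) * scale            # broadcasts over height and width
    return np.clip(tinted, 0, 255).astype(np.uint8)

gray_car = np.full((2, 2, 3), 200, dtype=np.uint8)  # stand-in for a grey image
red_car = np.zeros((2, 2, 3), dtype=np.uint8)
red_car[..., 0] = 255                               # stand-in for a pure red image

blue_from_gray = tint(gray_car, (0, 0, 255))  # keeps only the blue channel
blue_from_red = tint(red_car, (0, 0, 255))    # nothing left: red has no blue to keep

print(blue_from_gray[0, 0], blue_from_red[0, 0])
```

Turning red into blue needs a hue change rather than a multiply, which is the kind of thing colour-space augmenters handle.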
im sure opencv has this stuff
that said, the imgaug library does have this functionality already @lapis sequoia https://imgaug.readthedocs.io/en/latest/source/overview/color.html
maybe photoshop "color" blending mode is what i want
did someone need me
no i tagged the wrong person, sorry
oki
I suggest you go with something more advanced (not necessarily more complex) than python
Like Mathematica
It has a function that does what you need
this is how i wanted to augment the images
okey, ill take a look at imgaug
but since keras provides a generator u can pass to the fit method after
can i pass the analogue of imgaug?
i imagine keras gives you some way to write your own generator
keras ImageDataGenerator class has
brightness_range=None,
can i somehow add the color one?
Seriously, I suggest you write a short wolfram script that augments your data, put it in a folder, and then do all the ml in keras
i dont wanna write on disk all the augmented images
i only wanna have the basic ones, and during the train, provide the augmented ones, like u normally do with keras
You can keep them on memory and then pass all the data to python via mathlink
validation_data = validataion_gen,```
But, if you do the ml part on wolfram (faster and easier than keras) I think that you can do data augmentation on the fly
validataion_gen = data_gen.flow_from_directory
im not sure why learning mathematica is a better option than using a library in python if you're already using python for other stuff 🤷♂️
horizontal_flip = True,
vertical_flip = False,
brightness_range = (0.5, 1.6),
rotation_range = 11,
validation_split = 0.17)```
in the end, what u provide to fit, is a generator
cool that you can do neural networks in mathematica though
definitely a powerful tool
Mathematica has a stability and coherence (sorry for spelling) that no other has (Mathematica has been designed by the CEO since the '80s)
i remember i had a license for it when i was an undergrad, through my school. but i didnt really have a use for it then and didn't have the patience to learn the language
you spelled everything correctly
the ideal was to have exact the same thing as this, but with an extra option saying add_color = (255,0,0) or something xD
oh
look what i found
@desert oar By the way, I think that mathematica is better than python when doing research or training models, but I also think that the best comes when you take what you researched on mathematica and take it to python (or other languages for production)
from imgaug import augmenters as iaa
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# helper the snippet relies on: apply the wrapped augmenter to some images only (0.5 is an assumed probability)
sometimes = lambda aug: iaa.Sometimes(0.5, aug)
seq = iaa.Sequential([
    iaa.Fliplr(0.5), # horizontally flip
    # sometimes(iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05), per_channel=0.5)),
    iaa.OneOf([
        iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),
        iaa.Emboss(alpha=(0, 1.0), strength=(0, 2.0)),
        # iaa.Noop(),
        iaa.GaussianBlur(sigma=(0.0, 1.0)),
        # iaa.Noop(),
        iaa.Affine(rotate=(-10, 10), translate_percent={"x": (-0.25, 0.25)}, mode='symmetric', cval=(0)),
        # iaa.Noop(),
        # iaa.PerspectiveTransform(scale=(0.04, 0.08)),
        # # iaa.Noop(),
        # iaa.PiecewiseAffine(scale=(0.05, 0.1), mode='edge', cval=(0)),
    ]),
    sometimes(iaa.ElasticTransformation(alpha=(0.5, 3.5), sigma=0.25)),
    # More as you want ...
], random_order=True)
datagen = ImageDataGenerator(preprocessing_function=seq.augment_image)
I don't know why there are people in weird places encouraging niche languages to newbies for no good reason
can i somehow keep the default augment params?
what do you mean by this?
like the guys at one server trying to get someone to write a NN in FORTRAN and x86
i think here, when u call datagen, only the changes u wrote, like gaussian blur, sharpen, etc, will be applied to images
Mathematica is not niche, it is used in the largest universities and research institutions, it also has a completely different approach, that makes it easy to learn and fast (to use)
this ones are gone
But I think that's not the right channel to talk about
oh yeah, newbies should use mathematica instead of an already defined, well-maintained lib with a single language??
brb, gtg
Mathematica has been well defined since the 90s, and is maintained by an extremely large and professional team
it is also used in large scale production
(even alexa is in part powered by mathematica)
I think that mathematica is better than python when doing research or training models, but I also think that the best comes when you take what you researched on mathematica and take it to python (or other languages for production)
most research uses JAX and python tho? if it is indeed maintained by such a big team, it definitely doesn't convince many in research or in Applied ML
most products use a mixture of multiple languages 🤷
Oh no... Religious wars...
By the way, mathematica neural network framework is powered by mxnet
That is heavily used in production and research
Also many Fortune 500 companies actively use mathematica for research
what? mxnet?
Yes
is it even maintained?
Yes
But I think that wolfram has its own branch
I also think that in order to have a productive conversation you should take a look about who they are at Wolfram Research
What they did, what mathematica can do, etc.
I mean, I don't even have to argue how much of the industry uses mxnet
Sorry, but I don't even remember the point of the conversation
I'll make you a recap:
Mathematica is the world's fastest language (not performance, speed of coding)
It (with MatLab) is the industry standard for research
Top universities, companies and institutions use it
It powers large scale productions systems
i think the point is that this is a python server, and most newbies here can barely use python, let alone do serious machine learning or understand the math that goes into it, so recommending that they use mathematica instead is not really helpful to those people
I think you're right
its definitely an interesting topic though
i know there are people who really love mathematica
where have you seen it used in industry? finance?
i know it has some very powerful symbolic math capabilities
i definitely used it to try and figure out homework answers in college
didn't always though...
Now we have wolfram|alpha for that 😉
yup, very handy tool
it was also useful for quick plotting when i needed intuition about how a function ought to work
Definitely
Hello, I'm trying to code a multiple linear regression model myself without using any libraries except numpy. But even after a lot of epochs, my accuracy is stuck at 38%. Using sklearn's linear regression gives 94%. I'm guessing that while moving down the cost function using gradient descent, my algorithm is stuck at some local minimum. Is there any way I can confirm that, and if it turns out to be true, how can I get out of that local minimum and move towards the global minimum? Thanks
fwiw, you could mostly make the same arguments for python (or several other environments)
Well, python alone is a small, fast and flexible language
So, without packages it cannot do as many things as mathematica
the problem is
That since the developers of the packages are different
Many packages wouldn't fit together perfectly
So, while python is better for production
Mathematica is better for research
They have different purposes
calculate loss on a grid instead of doing a gradient search to look at the shape. or change your gradient descent parameters and try again and look for changes
If you have few points, you can also exactly calculate the correct (best fit) parameters by solving the normal equation.
True enough. My point was that your last three arguments also are arguments for python 1) it is an industry standard, 2) top places use it, 3) it is used in production systems
Oh yes, I don't think I explained myself correctly
I meant that mathematica was born for doing so
So it is designed to be more powerful in research
It also comes with a big set of tools
yeah, the best fit parameters calculated using sklearn are slightly off from my custom coded regression
I'm sorry I don't know what a grid search is, haven't reached that part yet, still a beginner
the point is that you don't need gradient descent. you can find the best fit parameters analytically. but my sense was that you were trying to understand gradient descent
Basically make up a whole bunch of parameters (say, equally distributed on a grid) and for each set of them, calculate the error
that'll allow you to look at what the error looks like depending on the params
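fwiw, a minimal sketch of that grid idea, assuming a one-feature model `y ≈ w*x + b` and made-up data. The names `x`, `y`, `w_grid` and `b_grid` are just for illustration, not from the original code:

```python
import numpy as np

# Made-up data for illustration: y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=50)

# Candidate parameters, equally spaced on a grid.
w_grid = np.linspace(0, 6, 61)
b_grid = np.linspace(-1, 5, 61)
W, B = np.meshgrid(w_grid, b_grid, indexing="ij")

# Mean squared error for every (w, b) pair, via broadcasting.
pred = W[..., None] * x + B[..., None]   # shape (61, 61, 50)
mse = ((pred - y) ** 2).mean(axis=-1)    # shape (61, 61)

# Grid point with the smallest error.
i, j = np.unravel_index(mse.argmin(), mse.shape)
best_w, best_b = w_grid[i], b_grid[j]
print(best_w, best_b, mse[i, j])
```

if you plot `mse` (e.g. as a heatmap) you should see a single bowl. if your gradient descent ends up far from the bottom of that bowl, that points to a bug or a bad learning rate rather than a local minimum.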
yes, exactly
oh okay, lemme try this, thanks
strong +1 recommendation to start with least squares
can someone explain the road map or share a useful link to learn data science and machine learning? I am a total beginner with no proper guide. I know C++ and Python. Should I learn Django, networking & other skills?
studying some math is a good idea; data science is about understanding and building better models
https://www.coursera.org/learn/machine-learning is a very popular free course. Sadly, it's not in Python (it uses Octave, basically free Matlab), but it mostly focuses on the internals of some common algorithms, so the language doesn't matter much.
(it also teaches you some of the linear algebra required if you don't know it)
umm, can you please elaborate a bit more.
so guys, how can i add another transformation to ImageDataGenerator from keras?
Hello? Can someone help me in basic python console?
i recommend starting by implementing least squares regression. it's easier to program, you get an exact result rather than a local optimum, and it's a good excuse to dig into the linear algebra and optimization problem a bit more deeply than gradient descent
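a rough sketch of what that could look like with only numpy, on made-up data (`X`, `y` and `true_w` are invented for the example). It solves the same problem as the normal equation, theta = (XᵀX)⁻¹Xᵀy, but lets `np.linalg.lstsq` do the solve instead of inverting the matrix by hand:

```python
import numpy as np

# Made-up data: 3 features, known weights, intercept 4.0, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.01, size=100)

# Add a column of ones so the intercept is learned as an extra weight.
Xb = np.column_stack([np.ones(len(X)), X])

# Least squares: minimise ||Xb @ theta - y||^2.
# lstsq is numerically safer than forming inv(Xb.T @ Xb) explicitly.
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(theta)  # intercept first, then one weight per feature
```

the result is also handy as a reference: a correct gradient descent loop on the same data should converge towards the same `theta`.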
would you have any idea how to get started on that? or perhaps maybe a repository of color maps that would have transparency?
Suggestions for building a dashboard? Bokeh? Plotly? Something else?
hey guys there is someone who can help me with python pandas
Just ask your question
i need to fix my timestamp in pandas, i get the wrong value when i do pd.to_datetime
i convert the timestamp to a UTC date
and it doesnt work well, when i do other things it just gives me an error
@teal wadi it helps if you share your code and the specific errors or unexpected output
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
hi guys, i have a question about keras
keras ImageDataGenerator class provides some of the basic transformations to increase data amount, but i am wondering how can i add my own transformation
In this case, i wanna change the color. Ive seen imgaug library has it, but i dont know how to use it with keras
Can someone help?
Hey @native ginkgo!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
import pyaudio
ModuleNotFoundError: No module named 'pyaudio'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:/Users/Vandana/Desktop/vansh coding/discord bot/alex.py", line 45, in <module>
File "c:/Users/Vandana/Desktop/vansh coding/discord bot/alex.py", line 28, in commandlistener
with sr.Microphone() as source:
File "C:\Users\Vandana\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\speech_recognition\__init__.py", line 79, in __init__
self.pyaudio_module = self.get_pyaudio()
File "C:\Users\Vandana\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\speech_recognition\__init__.py", line 110, in get_pyaudio
raise AttributeError("Could not find PyAudio; check installation")
AttributeError: Could not find PyAudio; check installation
PS C:\Users\Vandana> pip install PyAudio
Collecting PyAudio
Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: PyAudio
Building wheel for PyAudio (setup.py) ... error```
it is saying that pyaudio is not found on comp
and when i am trying to pip install
it is also showing error
also for linear regression the loss space has only one minimum, so you have some bug in the training loop
oh yeah, how could i forget this. Thanks a lot.
anyone free for me to message them? I have some general questions about a few topics regarding data science and the types of way to do them in python
Go on
any data science course you people recommend? i can see one in pinned messages from columbia's ML
hey guys, im new in data science and im having a hard time getting the newest googlebot string.. could someone help me with that please?
hi! can someone recommend me a good tutorial for generating text with keras ?
i need help with translating one of the old crypto hash function algorithms from C code to Python... can anyone help?
DM me if someone can help me with my pandas timestamp problem, i can share my code and hope for help
ValueError: shapes (1,3) and (4,4) not aligned: 3 (dim 1) != 4 (dim 0)
is the error I am getting again and again. I have checked my code and I know what it means, but I am unable to solve it
Try MIT's Intro to deep learning http://introtodeeplearning.com/
actually
is this what i want?
when u extend ur custom class from ImageDataGenerator, does it still have the augmentations from ImageDataGenerator?
Yes, right?
hey guys
so im planning to pick up ai and machine learning
but i have 0 experience with pandas whatsoever
or any other library needed
what book should i read?
guys, i have this Salary dataset. it has about 30 samples and only one feature, so its shape is (30, 2). i took the last 2 samples as test data. i tried fitting it and when i check the score for training i get 94, but when i check the score for testing i get -131. can someone explain why?
start with ml for stanford in coursera , then watch tutorials on yt
Hello! I want to try do a sentiment analysis project, I found this tutorial https://github.com/bentrevett/pytorch-sentiment-analysis
Can anyone familiar with the subject look at the tutorial and tell me if it's any good? Thanksies ❤️
okay, thanks. will check it out
it has 3k stars 🤷 so must be good
Brain wrenching question 🧠 : For calculating attention on a multi-sequence input (as in rather than having a single 1D sequence of tokens, we have a 2D/3D array of tokens that all are considered to be a single sequence) is there any such method/technique/research that has been done into this? I can't seem to find relevant stuff.
I am trying to learn machine learning, and I am confused about which path to go down. There is a Coursera course by Andrew Ng, but it does not teach it in Python. Then there is a playlist on machine learning by sentdex (the YouTuber).
https://youtube.com/playlist?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v
Other than that, there are machine learning modules like TensorFlow, Keras, PyTorch, etc. I am not sure which path to choose for moving ahead. You all are more experienced than me, what do you suggest?
Note: I am not learning this as a hobby. I want to get a master’s degree in Robotics, and my college does not teach any of this stuff (I am an Undergraduate currently).
The playlist is symbolic
tensorflow is like assembly, keras like c, and pytorch like python 🙂
is like, different levels
this didnt make any sense, can you explain more?
Hello, I just started learning some basic ML using sklearn. I was wondering if I could make a similar app to the one in the documentation ( https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html), but with my own dataset ? : )
That didn't exactly answer my question
Hi guys, this is my dataset and i need to get a new dataset by decade, adding up all the column values. Say 1960-1970 is a decade: 1960 should be the first value in the first column, and in the second column i need the sum of all values from 1960-1969... same for 1970 as the first column value, and the next column will have the sum of all values from 1970-1979.
I tried groupby
googled but i am not getting any results
is there any inbuilt method
Like This.... Like in vehicle theft 0 index has sum of values from 1960-1969..Like This
rip fcb
Use for loop in range
firstly find the index of last year of each decade using .loc then do sum within for loop
that should easily solve your problem
You mean index slice those 10 values and take its sum ?
ok... but any built-in method using groupby or agg.. anyone knows ?
will write a loop.. but asking in general ?
Why do you need to use that?
well you can use agg function with lambda
but basically it’s same
for simple line you could do agg + lambda + for in one line
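fwiw there is a loop-free way with groupby: integer-divide the year by 10 to get a decade key. A minimal sketch on made-up data (the column names `Year` and `Vehicle_Theft` are assumptions, swap in your own):

```python
import pandas as pd

# Made-up data: one row per year, 1960-1979.
df = pd.DataFrame({
    "Year": range(1960, 1980),
    "Vehicle_Theft": range(20),
})

# 1960..1969 -> 1960, 1970..1979 -> 1970
decade = (df["Year"] // 10) * 10

by_decade = (
    df.drop(columns="Year")     # don't sum the years themselves
      .groupby(decade)          # group rows by their decade key
      .sum()
      .rename_axis("Decade")
      .reset_index()
)
print(by_decade)
```

every other numeric column gets summed per decade, so this also covers the "sum all column values" part of the question.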
hi yall, i'm dealing with covid data analysis, i'm using this dataset https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/covid19/data/clean-outside-hubei.csv but i get this result (i dont understand why the shape has only one column?) sys:1: DtypeWarning: Columns (1,2,6,7,9,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,30,31,32) have mixed types. Specify dtype option on import or set low_memory=False. (2676313, 1)
import pandas as pd
df = pd.read_csv('data.tar\latestdata.tar').shape
print(df)
Oh yeah loop is giving me summed values for 1960 to 1969 and then i will append this summed value to index 1960 :sum(1960-1969) and so on for years....
but i have to create other frame and append the years and summed values for ranges and append... Thanks anyways 🙂
hey guys having abit of a roadblock using pandas here
print(df.sort_values(['Name','Speed']).iloc[0:15])
its sorting Name but not Speed
can i know why?
god this cuda and cudnn are killing me
3-5 months ago i tried installing and using them
i changed my drivers with them
i failed but my drivers were still cuda
my c drive storage died after installing them
and now when i try to use them i realize they are not updated and i need to reinstall
aghhh
Yup definitely not an efficient way of doing it but so long as it works, it’s fine lol
yeah man i too wanted to avoid that.. thats why i asked for any built-in or one-liner method.. but yeah not everything comes built in.. sometimes we too have to implement our own logic 😊😅
using Google image search api, how can i search for more than 1 param?
'imgType': 'lineart|photo',
this says it is not allowed
but 1 by 1 i can
Try using ascending=True in the brackets
in what way is it not sorting speed? I'm pretty sure what it's supposed to do is sort by name, and then sort by speed only when two values in Name are equal. All your names are different.
Yo I always think on weekdays that “I’m so excited to do my side project on the coming weekends” only to realize that I’m a lazy squidward lying on the bed on weekends
quick question.. does anyone here use lightfm at all ? I am trying to use the beta distribution as a normalization (https://www.reddit.com/r/statistics/comments/4svy2e/how_would_i_normalize_product_review_ratings/d5daucj/) and I am not really understanding the beta and alpha in this case
@upper spade
>>> df
  letter  number
0      a       5
1      z       6
2      a      10
3      a       1
>>> df.sort_values(['letter', 'number'])
  letter  number
3      a       1
0      a       5
2      a      10
1      z       6
lightfm allows an alpha and user_alpha, which are L2 penalties.. but I don't quite get how to get the alpha and beta described in the post
You know what that reminds me of how simple and efficient python is. That sort by multiple criteria stuff is not that simple in vba
or even in excel formula it needs something like RANK()+SUMPRODUCT()
this is pandas-specific functionality, though the key parameter for list.sort and sorted works similarly if you pass a tuple.
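for the plain-Python version, a small sketch using the same letter/number data as a list of tuples (the variable names are made up):

```python
rows = [("a", 5), ("z", 6), ("a", 10), ("a", 1)]

# Tuples compare element by element, so this sorts by letter first
# and falls back to number only when the letters are equal.
by_letter_then_number = sorted(rows, key=lambda row: (row[0], row[1]))
print(by_letter_then_number)
# [('a', 1), ('a', 5), ('a', 10), ('z', 6)]

# Negating a numeric field flips the order for that field only.
by_letter_then_number_desc = sorted(rows, key=lambda row: (row[0], -row[1]))
print(by_letter_then_number_desc)
# [('a', 10), ('a', 5), ('a', 1), ('z', 6)]
```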
Ooh
hello everyone!
I wanted to implement github repo : RankIQA based on Caffe
I am facing trouble in installing it in windows 10
I installed using Anaconda but im unable to import caffe
Can I install Caffe on Google Colab
Pls guide me as I sense the Caffe community is not very active, I commented on issues of the official repo but got no replies.
why are you using anaconda?
that might have been true in the past, but there's a lot more community support available if you don't use anaconda at all.
the data science ecosystem accommodates linux much better than Windows, but you should be able to get by on Windows if you have the C++ build tools installed.
!build
Microsoft Visual C++ Build Tools
When you install a library through pip on Windows, sometimes you may encounter this error:
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
This means the library you're installing has code written in other languages and needs additional tools to install. To install these tools, follow the following steps: (Requires 6GB+ disk space)
1. Open https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. Click Download Build Tools >. A file named vs_BuildTools or vs_BuildTools.exe should start downloading. If no downloads start after a few seconds, click click here to retry.
3. Run the downloaded file. Click Continue to proceed.
4. Choose C++ build tools and press Install. You may need a reboot after the installation.
5. Try installing the library via pip again.
thanks will try it
I think you can still pip install packages on colab.
guys, since google api forbids too many requests per day to it, can any of u help me? I am trying to create a big dataset
does anyone here work with recommendation engines? I have been trying to increase the fit of my model with various normalizations and it doesn't seem to do anything, was wondering if someone could help me out quick
hey i need help making a racial detection robot in python
This is...oddly specific
Hello! Here's my bachelor's thesis on privacy-preserving federated learning on decentralized data. I have made it open source now, and I would love it if anybody here could try it, give feedback, or contribute in any way. The goal is to make an open source library for doing secure federated learning using different privacy-preserving algorithms in an easy and efficient way.
It was written for Norwegian University of Science and Technology as a part of my degree in Computer Science.
Contact me if you want to know more about the research, and please ⭐ the project if you find it interesting!
https://github.com/dilawarm/federated
i made a ethereum and dogecoin comparison graph but ethereum seems to be very linear since it has such a high value compared to dogecoin so its hard to compare the both together. what potential improvements should i make on this graph? thanks for the feedback if so
also if youre wondering, this is purely experimental, i just wanted something to plot and decided to analyse cryptocurrencies
you could try minmax-scaling each crypto
to their corresponding ranges
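a minimal sketch of min-max scaling with pandas, assuming the prices sit in one DataFrame with a column per coin (the numbers below are made up):

```python
import pandas as pd

# Made-up prices, one column per coin.
prices = pd.DataFrame({
    "ETH": [2700.0, 2750.0, 2600.0, 2900.0],
    "DOGE": [0.30, 0.45, 0.25, 0.40],
})

# Rescale each column to [0, 1] using its own min and max,
# so both coins can share one y axis.
scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled)
```

note that after scaling the y axis no longer shows prices, only where each coin sits within its own range.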
but what exactly do you want to compare? Rate of change per time period?
i wanna compare their value throughout the past month
yeah basically
then compute that change
wdym
the change is simply change = x - x(-1)
i plot the data i just need to scale it
or percent change
think carefully about what you want to do, what exactly do you want to plot
that's ok, but the most important question is -what- is what you want to see
everything builds from there
yep
what syntax do i use for that
in relation to what?
because i cant have it going at constant rate up
to itself? to other?, etc
in relation to doge
because if i plot it individually
it shows a lot of movement
but when its in relation to doge it just goes up at a constant rate
shall i show a depiction of what i mean?
yes please because im not sure i get it
im thinking what you want is a common standard regression but i might be wrong
one sec
oh my bad
i meant dogecoin
ethereum is always linear, even when plotted individually
heres dogecoin when its plotted by itself
so you just want to plot them...together? no specifics analysis or anything?
i mean i do want analysis which is why i want to make it so i can see it going up or down
instead of it just going at a straight line
http://surpriselib.com/ so looking at this framework, it doesn't use gpu right?
Id say your best bet is to calculate the pct_change for each crypto
pct_change from x0 to x1 for each crypto, since that would be standardized
so i have to plot in the y axis value manually?
i got it from a csv file
you need to calculate pct change then
dang discord is not loading
or i could just analyse another set of data
because ethereum has a huge value difference
just normalize your data
its pretty trivial -> pct-change = (current_value - previous_value) / previous_value
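a small sketch of that formula next to the pandas built-in, on made-up prices (the `prices` frame and its values are invented for the example):

```python
import pandas as pd

# Made-up prices, one column per coin.
prices = pd.DataFrame({
    "ETH": [2000.0, 2100.0, 1995.0],
    "DOGE": [0.20, 0.25, 0.20],
})

# The formula by hand: (current - previous) / previous
previous = prices.shift(1)
manual = (prices - previous) / previous

# The same thing with the built-in method.
builtin = prices.pct_change()
print(builtin)
```

the first row is NaN because there is no previous value. after that both coins are on the same scale (e.g. 0.05 means +5%), whatever their absolute price.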
average has nothing to do with pct_change
I'm not, i'm just wondering why you mentioned something unrelated
as i said, I think I havent quite understood WHAT is what you want to see
i just needed to know how to make it so i can see 2 different datasets normally without it being too linear
your plot only looks linear because your Y axis is strange
makes sense
thing is
but your chart is a weird thing that goes from 2.7k to.... 2.7k?
and dogecoin has a pretty low price
at least ive never seen that
im analysing dogecoins price
do you have experience with financial analysis?
covariances? correlations? Betas?, etc?
Python expertise is not related to subject matter expertise.
if your financial expertise is solid then you just need to think of what kind of visualization you need for your analysis
sorry discord is being slow rn
not really im just a beginner hopping onto the data analysis rabbit hole
then you need to get some subject matter expertise before attempting to code it
i said this for a reason
not meant to be financial i just needed something to analyse
and decided to use cryptocurrencies as an example
Oh so you're just using financial data to learn how to plot in Python? is that correct?
yup
that would have been a lot faster to say lol
im just learning how to manipulate and plot csv files
ok 1st recommendation
Learn Pandas
what plotting library are you using? Matplotlib?
Do you know Pandas?
and pandas as well
i just used pandas to make it read the csv file so i can manage it
ok perfect
make a copy of your dataframe so you dont have to read it again in case of errors
good, so, before going to visualization, try tampering a bit with your data
df2 = df.copy()
on that 2nd dataframe, try calculating pct_change for each crypto
and plot that, NOT the values
you can also find their correlation, etc
maybe fit a linear model and calculate the RMSE (root mean squared error) of trying to predict ETH through Doge
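a rough sketch of both ideas with numpy only. The two return series are randomly generated stand-ins (`doge_ret` and `eth_ret` are made-up names), in practice they would be the pct_change columns:

```python
import numpy as np

# Stand-in data: ETH returns loosely follow DOGE returns plus noise.
rng = np.random.default_rng(2)
doge_ret = rng.normal(scale=0.05, size=200)
eth_ret = 0.5 * doge_ret + rng.normal(scale=0.02, size=200)

# Pearson correlation between the two series.
corr = np.corrcoef(doge_ret, eth_ret)[0, 1]

# Straight-line fit: eth_ret ~ slope * doge_ret + intercept
slope, intercept = np.polyfit(doge_ret, eth_ret, deg=1)
pred = slope * doge_ret + intercept

# Root mean squared error of that fit.
rmse = np.sqrt(np.mean((eth_ret - pred) ** 2))
print(corr, slope, intercept, rmse)
```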
most of the time i was just trying to find the proper csv file to use
and i am actually going to do predictions soon but thats for another time
for now im doing something as simple as plotting 2 datasets together
?
like
no...the pct_change
what does this mean?
like i plot a percentage of the value so it doesnt have a huge value difference
so i can actually see the graph move better
but i feel like that wouldnt be such a great idea anyways
thing is how do i use that syntax
what information do i need to put in it
actually i wont bother you and try to read the documentations to get a more vivid understanding
that's the correct approach for any code question
if something about the documentation is not clear, then you can specifically ask that and it will be much easier to help you 🙂
aight
sounds great
shall we dm just in case because i feel like we've flooded the chat way too much
anyways gonna go afk
oh btw sorry about the confusion, the reason why the y axis is messed up is because its values are made up of the ethereums prices, which is why it keeps going straight
guess i might need to set some index on the y axis
How do you add a softmax regression layer to an RNN model with autoencoder? using TensorFlow
I implemented the class to encode/decode and call
but I don't really understand how to take that output and add a layer on top of it
I generally recommend always looking at the raw values also. You'll learn things even if that's not the conventional view of the data.
how can I use numpy.isin for 2d arrays? like if I have a [[0,1], [1,2]] and I want to search for [[0,1]] so that it returns [True, False]?
Raw values should be observed? Not sure what this means aside from that
Obv you must always check your raw data first 😋
Yes, that's what I mean. For a newbie, maybe it's not obvious
can i block a response from the package in chatbotAI like it keeps asking about my family n i wanna remove that response
I have a basic question but not sure exactly how to ask but here goes. I have 2 data sets one set is a set of successful transactions and the others are where I had a failure. The failures don't seem to have a pattern that i can spot visually. How could I figure out what combinations of factors and the order of that combination that led to the failure programmatically or using ML?
Can anyone help?
use ==
!e
import numpy as np
a = np.array([[0, 1], [1, 2], [2, 3]])
b = np.array([0, 1])
print((a == b).all(axis=1))
@velvet thorn :white_check_mark: Your eval job has completed with return code 0.
[ True False False]
just call that output with another layer
do you have any ML experience
I am trying to get some experience with it, so this was basically my first project that I chose to try and understand.
Mainly trying to find out where to start looking to be able to solve this problem.
so that's a no?
i.e. you don't know how to perform EDA, preprocess data, model relationships, etc.
I am trying to go through a coursera course on ML, but yes the answer is no
That is what I am trying to do
Hey, is it possible to use a stacking algorithm on just 3 inputs?
Like, I have 3 inputs(predictions) [0.999854,0.9894, 0.97802734375] and somehow I need to get a better prediction that the best one, in this case 0.999854
any good data analyst here?
I'm working with RFM Analysis
-------------------------------------
| Quantity | UnitPrice | Invoice |
-------------------------------------
| 6 | 10.05 | I105 |
| -2 | 10.05 | I105 |
| 3 | 12.36 | I107 |
| -1 | 12.36 | I107 |
-------------------------------------
It seems I105 and I107 have returned their order
so i must count them in monetary analysis ?
hey anyone know a good docker container for data-science?
I don't know why I always manage to find weird use-cases that no one ever implements 😦
Anyone know anything about computing attention on multi-dimensional sequences?
i have a doubt on lstm neural networks.
From what I saw lstm is a great algorithm for time series forecast prediction.
But when dealing with something like the stock market we can't be sure when the market might go down or go up, then how come lstm makes such accurate stock prices predictions?
How is that done in practice?
Currently my autoencoder looks like this:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model

class AnomalyDetector(Model):
    def __init__(self):
        super(AnomalyDetector, self).__init__()
        # Encoder: compress the input down to 8 units
        self.encoder = tf.keras.Sequential([
            layers.Dense(64, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(16, activation="relu"),
            layers.Dense(8, activation="relu")])
        # Decoder: expand back to the 79 input features
        self.decoder = tf.keras.Sequential([
            layers.Dense(16, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(64, activation="relu"),
            layers.Dense(79, activation='sigmoid')
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
Do I just make the last decoder softmax activation? The article I'm implementing describes this as a separate step from the autoencoder
Hey @blissful heath!
It looks like you tried to attach file type(s) that we do not allow (). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
How do I turn this Text into a table? I opened TXT with Pandas, but it organized in a standard way, the big problem that this file does not have a delimiter. I thought of a logic according to the position where the characters of each line start (example: the main column comes from lines that have their first character in position 8, while the secondary column comes from lines where the characters start in position 15. But I am not able to develop the logic, can someone help me?
My goal
I have converted the model to a .tflite but am unable to predict using it
this is what i used to convert the model
```
converter = tf.lite.TFLiteConverter.from_keras_model(newmodel)
tflite_model = converter.convert()
with open('numeric_values-model.tflite', 'wb') as f:
    f.write(tflite_model)
```
and this is how i predicted in colab
```
to_predict=tf.constant(np.array([[2.0,2.0,6.0,6.2]]))
predictions=newmodel.predict(to_predict)
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
for prediction in predictions:
    index=np.argmax(prediction)
    print(SPECIES[index],prediction[index])
```
want to be able to use it with tflite
don't know how to do so
Has anyone seen this error before? I am using dask and I'm trying to use the CountVectorizer, but when I fit_transform the data I get an error. I tried a lot of stuff to solve it but no luck
```
smokingDeaths = fatalities[(fatalities['ICD10 Diagnosis'] == "All deaths which can be caused by smoking") & (fatalities['Sex'].isnull() != True)]
smokingDeathsMaleYears = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Male']['Year']:
    smokingDeathsMaleYears.append(each)
smokingDeathsFemaleYears = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Female']['Year']:
    smokingDeathsFemaleYears.append(each)
smokingDeathsMaleValues = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Male']['Value']:
    smokingDeathsMaleValues.append(each)
smokingDeathsFemaleValues = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Female']['Value']:
    smokingDeathsFemaleValues.append(each)
plt.plot(smokingDeathsMaleYears, smokingDeathsMaleValues, label = "Male")
plt.plot(smokingDeathsFemaleYears, smokingDeathsFemaleValues, label = "Female")
```
This is the code I've used to plot the graphs
but the lines are not getting plotted on the same scale for some reason
oh yeah thats similar to what happened to me
i still havent found a solution to it
the values in the y axis arent in proper order
@humble nest I found the problem with mine though
the Y values were actually strings
the problem was fixed when I converted them into integers
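for anyone hitting the same thing: matplotlib treats string values as categories and places them in order of appearance, which is why the axis looks scrambled. A minimal sketch of the conversion, on a made-up frame (the `Year` and `Value` columns mirror the ones above, the numbers are invented):

```python
import pandas as pd

# Made-up frame where the values were read in as text.
df = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Value": ["1200", "900", "1000"],
})

# Text sorts character by character, not by magnitude.
print(sorted(df["Value"]))  # ['1000', '1200', '900']

# Convert to real numbers; errors="coerce" turns anything
# unparseable into NaN instead of raising.
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
print(df["Value"].tolist())  # [1200, 900, 1000]
```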
OH
no wonder the code wasnt detecting it as being numbers
makes sense
thanks for the solution btw
Hello house, anyone has link to a telegram chatbot code? I was asked to build one for something, so I was hoping I could just edit the code and stuff.
Thanks. 🙏🏿
as a matter of code quality, I don't believe there's any benefit in this case to copying each data point over to a Python list. Simply passing expressions like smokingDeaths[smokingDeaths['Sex'] == 'Female']['Year'] to plt.plot should be sufficient, though you can also do list(smokingDeaths[...]['Year']) to get the same effect as your append for loops.
Of course
Thanks, I was thinking of doing it like that as well, but I had to change it back to storing them in variables to see what was causing the error after it didn't work out initially
smoking_death_female_year = smokingDeaths[smokingDeaths['Sex'] == 'Female']['Year'] would work
Right,
that does look more readable
But yeah I'm just trying to make something work at the moment to reach the deadline,
considering the fact that my code is probably not going to be read at all
is there a way to index what rows have the same values for X columns and dropping them from a dataframe?
Everything im seeing uses a loop which looks messy and can take ages depending on how large the df is
I think there is
Can't remember what though, but there should be a function for that in Pandas
Can you provide me with a csv as text (no screenshot) that I can copy and explain in more detail what you're trying to do?
Please ping me when you do this or I will not know that you have done it.
I'm not sure if you're still here. If you know how to use masks in pandas, this will help you figure it out: https://stackoverflow.com/questions/22701799/pandas-dataframe-find-rows-where-all-columns-equal
Hey @dapper halo!
It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
ah pooo it didnt like it
💀
You have to copy and paste it as text into the chat
How big is the file
I only need a sample
but you can always just take like 10 rows
work with it
apply it on the whole
yeah
whatever print(df.head().to_csv()) prints out basically
Unless you have 300 columns per row for some f'ed up reason
Nh,Redshift,Metallicity,Density,N_SiII,N_SiIII,N_SiIV
15,0.25,-2,-1,12,14,13.5
15.5,0.25,-1.5,-2,12,12,13.5
16,0.25,-2,-2.5,12,12,12
16.5,0.25,-3,-1.5,13.75,13,14
@serene scaffold
while I saw this, adding a mention to a message after the fact does not trigger a ping.
Man my incompetence is shining today
Don't worry about it. So, which are the columns in question?
the N_Six
id like to ignore Nh, Redshift, metallicity, and density.
Only focus on the last 3 or just any grouping
look into DataFrame.eq: pick one arbitrary column and see if the other two are equal to it. Since they all have to be the same, it doesn't matter which one you pick
yeah. I'll set the N_Six to some threshold of 12 or whatever. If all of those columns have the same value, the network can't train on them
also look into .all
thank ya thank ya. Ill check em out
the last three were N_Six...one is Silicon II, Silicon III, Silicon IV
x just meant which state of silicon. but yes
I see
Makes sense
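A rough sketch of the DataFrame.eq plus .all suggestion, using the sample rows pasted above. The variable names (cols, same, filtered) are just for illustration.

```python
import io
import pandas as pd

# The sample rows pasted earlier in the chat.
csv_text = """Nh,Redshift,Metallicity,Density,N_SiII,N_SiIII,N_SiIV
15,0.25,-2,-1,12,14,13.5
15.5,0.25,-1.5,-2,12,12,13.5
16,0.25,-2,-2.5,12,12,12
16.5,0.25,-3,-1.5,13.75,13,14
"""
df = pd.read_csv(io.StringIO(csv_text))

cols = ["N_SiII", "N_SiIII", "N_SiIV"]

# Compare every column in the group against the first one, row by row,
# then require that all comparisons in a row are True.
same = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

# Keep only the rows where the columns are NOT all equal.
filtered = df[~same]
print(filtered)
```

No loop is needed: the comparison and the row-wise reduction are both vectorized, so this should scale to large dataframes.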
@serene scaffold Is it just me or do you sound like you've worked as a professional recruiter once in your life
Why do people say that one can learn ML/AI without being good at math. That seems absolute BS. I'm trying to understand what the hell maximum likelihood is for logistic regression and it's taken me hours and I still can't quite wrap my head around it.
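For what it's worth, a small numpy sketch of what maximum likelihood means for logistic regression: pick the weights that make the observed labels most probable. The toy data, learning rate and iteration count are all made up for illustration.

```python
import numpy as np

def log_likelihood(w, X, y):
    """Sum of log-probabilities the model assigns to the observed labels."""
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted P(y = 1 | x)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: first column of ones is the intercept, second is one feature.
# The classes overlap on purpose so the maximum is finite.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]])
y = np.array([0, 0, 1, 0, 1, 1])

w = np.zeros(2)
start = log_likelihood(w, X, y)

# Gradient ascent: the gradient of the log-likelihood is X^T (y - p).
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - p)

end = log_likelihood(w, X, y)
print(start, end, w)
```

Minimizing the usual cross-entropy loss is the same thing as maximizing this log-likelihood, just with the sign flipped.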
Not to be too blunt, but yeah you do sound kinda Pro
Because StackOverFlow my brother
Also, I think everybody's good at math
they just might not know how to apply it though
If you can do Multiplication, Division, Subtraction, Addition
your brain's pretty capable of applying mathematical concepts to solve problems
A professional recruiter? No.
whoever says that doesn't know ML/AI at all then 🤷 easy way to weed out the scammers
StackOverflow my brother
if you find yourself having to deal with some math in ML/AI
just post it on SOF
Knowing everyone looks stuff up on SO makes me feel better about having to look up stuff on SO. But it also makes me worry that I do not truly understand what I am doing.
Most people who say that already have a background in math and feel it's somewhat trivial, or they're just bullshitters lol
but aside from that, most of the math (that I've seen) is just keeping a cool head and figuring out the logic of how things should be
for example, backpropagation is just partial derivatives (the chain rule) used to minimize the cost function at the end of the NN. Conceptually it's "simple", but applying or building that for M layers of N neurons...ugh
hi guys, if any of you is interested in helping me make a dataset, please ping me. The Google API doesn't allow too many requests per user, so it will be faster if some of you cooperate. The image dataset will be about Pokémon. I already have a script
what kind of dataset are you trying to create? maybe there's a way to request that information from Google
Hi, I'd like to get back in to programming for the purpose of being able to scrape data from websites. I've done a full semester of Intro to Compsci with Python five years ago. What's a good book or course that I can jump into?
btw, do you mean contacting Google?
gotta work first before one recruits
There's no reason to expect that they'd know anything about my employment history.
steler ignores me
is there any ai library that allows you to generate images from a dataset? Similar to how https://thispersondoesnotexist.com/ does it
build a generative model and produce those images
what library is the best for that? i heard opencv is only for rigid objects is that fine?
@serene scaffold:(
hi all!
Yes? I'm waiting to hear back from a friend who works at Google to ask if you can request data from them.
Hi guys, quick question as I am a bit confused. I want to predict UnitPrice, which is my target. Should I use StandardScaler and then do one-hot encoding?
trying to plot some data with python for the first time. it's a CSV file with 4 columns: datetime, a, b and c. i already learned how to work with 2 columns: just add "squeeze=True". but with 4 columns, how can i treat this as a time series and get a line plot with three different lines (which, to make it worse, have totally different maximum values)?
You load it into pandas and split the columns into Series, so you have 4 variables, one for each column
plt.plot(date, a)
plt.plot(date, b)
…
plt.show()
as pastafish said, you need a generative model for that. It's not something that can be done with OpenCv - generating human faces is not easy
@tiny flax
i'm interpreting that to mean i should write something like
series = read_csv('data.csv', header=0, index_col=0, parse_dates=True)
pyplot.plot(series[0], series[1])
pyplot.show()
which unfortunately gives KeyError: 0
No, that's not a Series, it's a DataFrame
Is generative model a library? Or what libraries support it
yeah i noticed 😛
it's alright perhaps someone else will come along
ah wait now i'm getting somewhere
index_col=0 means that i can just say .plot(data['a'])
no - generative models are a flavour of Neural Networks. it's not a library. if you want to understand them in-depth, I recommend you get started with ML (the pinned messages provide a good starting point)
each of the .plot() calls returns a Line2D object, which does not understand set_ylim() ... how is the latter now accessible?
I turned on my laptop yay
🙂
yeah, set_ylim is not a Line2D method; it belongs to the Axes
but then how do you access set_ylim after having done line = plt.plot(a)?
that works but is not specific to each line
problem is, a goes from 0 to 10, b goes from 0 to 12000 and c goes from 0 to 4096
in that case if I want to limit it I would select the series and select all values greater than a number
its a hassle but easier than to set a y limit
I think
no, multiple, independent y axes ..
Anybody know any similar discord servers for R?
perhaps you can try using subplots?
ah! found something! https://stackoverflow.com/questions/46011940/how-to-plot-two-pandas-time-series-on-same-plot-with-legends-and-secondary-y-axi
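A sketch of one way to get three independent y axes with plain matplotlib, assuming a DataFrame with a datetime index and columns a, b and c as described above. The data here is synthetic, and twinx is a swapped-in technique rather than what the linked answer does (that one uses a single secondary axis).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to see a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: a datetime index and three columns with very different ranges.
index = pd.date_range("2021-01-01", periods=48, freq="30min")
data = pd.DataFrame({
    "a": np.linspace(0, 10, 48),
    "b": np.linspace(0, 12000, 48),
    "c": np.linspace(0, 4096, 48),
}, index=index)

fig, ax_a = plt.subplots()
ax_b = ax_a.twinx()  # second y axis sharing the same x axis
ax_c = ax_a.twinx()  # third y axis
ax_c.spines["right"].set_position(("axes", 1.15))  # push the third axis outward so the two don't overlap

for ax, col, color in [(ax_a, "a", "tab:blue"), (ax_b, "b", "tab:orange"), (ax_c, "c", "tab:green")]:
    ax.plot(data.index, data[col], color=color, label=col)
    ax.set_ylabel(col, color=color)
    ax.set_ylim(0, data[col].max())  # set_ylim lives on the Axes, one per line here

fig.tight_layout()
```

Because each line gets its own Axes, set_ylim can be called per line, which is what the Line2D objects returned by plot() could not do.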
@exotic maple no, i also later need to group it by day and put each day in a subplot.
Seaborn is the easiest way to do faceting like that in python
so this thread covers loc vs iloc
is there a similar thread for those two vs the raw index operator
that just gives you columns
oh ok
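A quick sketch comparing the three, on a made-up dataframe. Note that the raw [] operator gives columns for a label, but falls back to rows for a slice or a boolean mask.

```python
import pandas as pd

# Made-up frame with string row labels so labels and positions are easy to tell apart.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

col = df["a"]             # [] with a label selects a column
by_label = df.loc["y"]    # .loc selects rows by index label
by_pos = df.iloc[1]       # .iloc selects rows by integer position
row_slice = df[0:2]       # [] with a slice selects rows by position
masked = df[df["a"] > 1]  # [] with a boolean mask filters rows
```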
How do you get the random_state in sklearn?
Coz on one random state I see 5% better accuracy so I wanted to get it in a variable
Like after training the model
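As far as I know, if random_state was left as None there is no way to recover the seed after training, so the usual approach is to set it yourself and keep track of it. A sketch with a synthetic dataset and an arbitrary choice of model (RandomForestClassifier and the seed range are assumptions, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data just so the sketch runs on its own.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Try a handful of explicit seeds and remember the score each one produced.
scores = {}
for seed in range(5):
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    model.fit(X_train, y_train)
    scores[seed] = model.score(X_test, y_test)

best_seed = max(scores, key=scores.get)

# The seed that was passed in stays available on the estimator.
last_seed = model.get_params()["random_state"]

# Refitting with the stored seed reproduces the same score.
rerun = RandomForestClassifier(n_estimators=20, random_state=best_seed)
rerun_score = rerun.fit(X_train, y_train).score(X_test, y_test)
print(best_seed, scores[best_seed], rerun_score)
```

A few percent of difference between seeds on one test split is often just noise, so it is probably worth checking with cross-validation before relying on a "lucky" seed.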
🗿
if you type #tweet
it should @ you and say undefined coz i dont have the python server up
made a tweet generator
#tweet
#tweet hello
bruh it not working @strange oriole
#tweet hello
@random aurora helloge man all when I us dorn tweet
dog
gay
conspiracy posting
prayn hfw
@random aurora undefined
#tweet hello
@dim olive hello are ifto you sent her frears esees is mo in make of Keemstar seesing a 12-tee.
@dim olive undefined
!ban 846208222225891329 selfbotting is against discord ToS
:incoming_envelope: :ok_hand: applied ban to @strange oriole permanently.
!ban 518944568302108712 it appears you are only here to help your friend with a selfbot. This is against ToS, we do not want this in our community.
:ok_hand: applied ban to @random aurora permanently.
👋
it is when you automate your user account in discord. It is against ToS
?
Why would that be against ToS? sounds random as hell
Maybe because then it'd be easier to spam though
What would automating your user account be useful for?
outside of just developing a bot....which its probably not their primary user account so thats just all it is for
Hello, I need your suggestions on my previous question about using a siamese neural network for multi-person face recognition
One idea that comes to mind: instead of using the SNN to compute similarity directly, I use the NN to extract features (from n variations of each image) and store them in a database. When a new image comes in, its features are extracted and I check similarity against the stored features in parallel?
Since, as far as I know, a general NN is also good at producing features
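A minimal sketch of that idea, assuming the network has already turned each face into a fixed-length feature vector. The random vectors stand in for real embeddings, and the names, the 128-dimension size and the 0.5 threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "database": one stored, unit-length embedding per known person.
names = ["alice", "bob", "carol"]
stored = rng.normal(size=(3, 128))
stored /= np.linalg.norm(stored, axis=1, keepdims=True)

def identify(embedding, stored, names, threshold=0.5):
    """Return (name, similarity) of the closest stored embedding, or (None, similarity) if too far."""
    embedding = embedding / np.linalg.norm(embedding)
    sims = stored @ embedding  # cosine similarity against every stored vector at once
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return names[best], float(sims[best])
    return None, float(sims[best])

# A new "image" of bob: his stored embedding plus a little noise.
query = stored[1] + 0.02 * rng.normal(size=128)
print(identify(query, stored, names))
```

The matrix product compares the query against all stored features in one step, which is the "in parallel" part; for a large database an approximate nearest-neighbour index would do the same job.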
I usually use selenium and BeautifulSoup for webscraping
Idk how Scrapy is different
Hey guys, we are using Pytorch. After a while, all class probabilities converge to close to 0.99, making the model a random number generator. Anyone know any ways to deal with this?
We have tried adding an L1 penalty (the sum of the absolute parameter values) to the loss, we have used batch normalization in the linear/convolutional layers, and we have added gradient norm clipping
ok
Hello everyone. I am new to Python. I have been coding in Java, C++ and on mainframes up till now. I am facing some issues using the pandas package in Python. I know how to do data manipulation in Java and C++ through loops, but in Python that takes a lot of time. So I switched to pandas and it is great! I have 31 rows and 5 columns in an Excel sheet. I want to divide the 15th row's data by the 1st row's, and so on, up till the 30th row's data is divided by the 15th row's. And write the output in the same file in the next columns; even the next sheet would do. Could you please help me out?
The simplest way that's also stupid fast, I'd say, is to create "shifted" columns in pandas. Take a look at the shift method
Then once shifted columns are created, just divide
No iteration needed, no loops needed, and then you can save the output as you prefer
Thanks @ripe forge for your input. If I understand correctly, I will have to make a copy of my data, shift it 15 rows down, and then divide the values of one dataframe by the other.
Yep exactly. There's a .shift method in pandas that makes this easy
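A sketch of the shift-then-divide approach on a made-up 31 x 5 frame. The offset of 14 assumes "row 15 divided by row 1"; the question is a little ambiguous about the exact offset, so adjust it to match the real sheet. Column names and the output filename are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the 31-row, 5-column sheet.
df = pd.DataFrame(
    np.arange(1, 31 * 5 + 1, dtype=float).reshape(31, 5),
    columns=list("ABCDE"),
)

offset = 14  # assumption: row 15 is divided by row 1, row 16 by row 2, and so on

# shift() moves every row down by `offset`, so each row lines up with the row it is divided by.
ratios = (df / df.shift(offset)).add_suffix("_ratio")

# Put the ratios next to the original columns; the first `offset` rows are NaN.
result = pd.concat([df, ratios], axis=1)

# result.to_excel("output.xlsx", index=False)  # placeholder filename; needs an Excel writer such as openpyxl
print(result.iloc[14])
```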
Guys, just one thing
If i have a pretrained model
But i dont have the data set it was trained with
But i wanna train it with more augmented data, can i with my own dataset?
Yes, that sounds a bit like what we do in transfer learning in any case.
The only caveat is, if the original data was also directly relevant to you, then each iteration with the new dataset may erode some of what the model learned from the older dataset.
Yeah, i would like to use the same dataset
You can mitigate this somewhat by freezing the initial layers, but I'd suggest doing both: very few iterations and freezing.
Maybe freeze everything except the last layer to begin
Oh. Then the original data isn't directly relevant to you is it?
What's the task for your model? Ie what are you trying to predict
It is
I was trying to predict pokemons
But sadly, if the pokemon is recolored, it fails to predict. So I made a generator that extends the Keras one and adds different colors to the image; the intent is to make the NN focus on the shape too
And the original inception data for your model is also on Pokémon?
Yes
Ah OK
Sec, let me see if i find it
I'm confused then, how'd you create modified images if you don't have original data?
Oh ok. OK then got it, all my original statements apply. Your model performance may deteriorate if you overdo this
Maybe you could consider an alternative, convert your input to greyscale and see if the model is able to predict Pokémon from that as is. I suppose this depends on how the original model was trained
Ie instead of having the model deal with coloured images, have a preprocessing that deals with it such that the model doesn't have to.
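A small sketch of that preprocessing step with numpy, assuming the model expects H x W x 3 RGB input, so the grey values are copied back into three channels. The function name is made up; the weights are the standard BT.601 luma weights.

```python
import numpy as np

def to_grayscale_3ch(image):
    """Convert an H x W x 3 RGB array to greyscale, kept as 3 identical channels."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    gray = image[..., :3] @ weights            # H x W
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)

# Random stand-in image with values in 0..255.
img = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3)).astype(np.float32)
out = to_grayscale_3ch(img)
print(out.shape)
```

Whether the pretrained model still recognises anything from grey input depends on how it was trained, so this is something to test rather than assume.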
Fair enough.
I bet original data was colored pokemons
Then the original must be using colours
So it pays attention to colors
So i thought about 2 ways
The first is retraining with this color modification augmented data
It would be ideal if you had original data
And the second, modifying the input layer to receive the mask of the pokemon as well
As the "shape" of the pokemon
I have another nn which returns the mask of an image. It is called u2net
Cuz when I was a child, I remember Pokémon had something like "who is this pokemon?" where only the shape was shown
And i was able to guess it
So maybe a nn can too :D
is there a module for generating responses to input based on training data?
yo guys i just finished learning pandas
took a lot of my brainpower
but not sure if i really get it yet
is there any sort of project or wtv that i can do
to know if i really get it
@serene scaffold hello dude, did u get any reply?
I have one question... idk if it will be possible but
I am trying to make a pokemon classifier
and i have 898 classes
but there are pokemons such as Primal or what ever, which are the same but different shape, w/e
The thing is i downloaded a model
with a 928 classes output, cuz for it, kyogre != primal kyogre
so it is on a different class
Can i remove the last output layer of this pretrained model and reduce it to 898 classes???
no, right?
yes, you can remove the last layer and add a new one with 898 outputs
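A rough Keras sketch of swapping the output layer, with the earlier layers frozen as suggested further up. The tiny model here is only a stand-in for the downloaded one, and replace_head is a made-up helper name; it assumes the pretrained model ends in a single Dense classification layer.

```python
from tensorflow import keras

def replace_head(model, num_classes):
    """Freeze the existing layers and attach a new softmax layer with num_classes outputs."""
    for layer in model.layers:
        layer.trainable = False  # keep the pretrained weights fixed

    # Output of the layer just before the old classification layer.
    features = model.layers[-2].output
    new_output = keras.layers.Dense(num_classes, activation="softmax", name="new_head")(features)
    return keras.Model(inputs=model.input, outputs=new_output)

# Tiny stand-in for the downloaded model with 928 output classes.
inputs = keras.Input(shape=(32,))
hidden = keras.layers.Dense(16, activation="relu")(inputs)
old_outputs = keras.layers.Dense(928, activation="softmax")(hidden)
old_model = keras.Model(inputs, old_outputs)

new_model = replace_head(old_model, 898)
new_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
new_model.summary()
```

The new layer starts with random weights, so it has to be trained on labelled data for the 898 classes before the predictions mean anything.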