#data-science-and-ml


velvet thorn
#

one thing that helped

#

was like

#

thinking about taking slices

#

across axes

#

and what shapes the results would be
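A small NumPy sketch of that idea: take slices and reductions across each axis and check the shapes that come back. The array and its (2, 3, 4) shape are made up for illustration.

```python
import numpy as np

# A made-up 3-D array: 2 blocks, 3 rows, 4 columns.
a = np.arange(24).reshape(2, 3, 4)

first_block = a[0]          # index axis 0 -> shape (3, 4)
one_row = a[:, 1]           # index axis 1 -> shape (2, 4)
one_col = a[:, :, 2]        # index axis 2 -> shape (2, 3)
kept_axis = a[:, 1:2]       # a length-1 slice keeps the axis -> shape (2, 1, 4)

# Reductions drop the axis they run over.
sum0 = a.sum(axis=0)        # shape (3, 4)
sum2 = a.sum(axis=2)        # shape (2, 3)

for name, arr in [("a[0]", first_block), ("a[:, 1]", one_row),
                  ("a[:, :, 2]", one_col), ("a[:, 1:2]", kept_axis),
                  ("a.sum(axis=0)", sum0), ("a.sum(axis=2)", sum2)]:
    print(name, arr.shape)
```

Guessing each shape before running the loop is a quick way to practise.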

lapis sequoia
#

Hey anyone got a chance to play around with DataSpell?

sly salmon
#

I feel like pandas is much more manageable for labeled data compared to numpy, but I heard that pandas is at times 20x slower

tall zinc
lapis sequoia
#

but at the end you will just execute a cell and continue working with edited variable

#

hence you won't feel it as much

tall zinc
#

Thanks :)

sly salmon
tall zinc
#

OpenCV for all the Computer Vision stuff, the overlay is from wxPython, pynput for sending mouse input and mss for the screen capture

#

And tesseract for the OCR (at least when I couldn't be bothered to hack something more accurate together with opencv)

#

It does all the other (non-needy) modules too, though it takes a long time with the one with lots of words because OCR is slooow and it has to do a few passes to make sure it's right
https://www.youtube.com/watch?v=DvTNRo8tCqo

Keep Cheating and Nobody Explodes bot trying out 8 modules.

Still need to update a bunch of the drawing and text and its display in general, but I think the non-needy modules are all basically finished at this point

All written in Python, using OpenCV for the image processing/computer vision parts and a bit of wxPython for the display window ...

#

Still working on making the drawing output more often and more informatively

tall zinc
# tall zinc OpenCV for all the Computer Vision stuff, the overlay is from wxPython, pynput f...

The vast majority of what opencv is being used for is calling inRange() on hsv images and finding contours. And some eroding and dilating but most all of it is just using that. The keypad with symbols on it uses matchShape (and a ShapeContextDistanceExtractor when matchShape is lying) but otherwise it's largely just testing for certain colours and sizes of resulting contours and then using handwritten logic to work out what that means

lapis sequoia
#

what does the word prior mean in terms of ML? I'm reading a paper in which they say that

The construction phase is prior-driven, not data-driven—-data comes in only at the learning phase.

Please let me know if anyone can help and if more detail is needed. Thanks.

desert oar
#

It didn't change for me, so I think it's read-only.

desert oar
cyan lantern
#

anyone here familiar with sparse matrices and how they work

lapis sequoia
# desert oar They mean the "prior probability" as in Bayesian statistics

yeah that's what i thought so. but what do we mean by prior-driven and data-driven.
I've just talked with a friend of mine, and what he suggested is instead of giving data to our model to learn we give certain properties for granted.
(btw in this approach we use very little data, so the above suggestion kinda made sense to me.)

desert oar
#

yes i think your friend's interpretation is reasonable

#

from Bayes' theorem we have P(θ|Y) ∝ P(Y|θ) * P(θ) where θ is our model parameter and Y is our data. if we don't have a lot of data, our estimates of P(θ|Y) will depend more strongly on our assumptions of P(θ)
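A rough worked example of that point, assuming a coin-flip model that is not from the paper: θ is the heads probability, the prior is Beta(a, b), and the data are a number of heads out of some flips. For that pairing the posterior is Beta(a + heads, b + flips - heads), so its mean has a closed form. The function name and the numbers are made up.

```python
def posterior_mean(prior_a, prior_b, heads, flips):
    """Mean of the Beta posterior for a coin's heads probability.

    With a Beta(prior_a, prior_b) prior and a binomial likelihood, the
    posterior is Beta(prior_a + heads, prior_b + flips - heads).
    """
    return (prior_a + heads) / (prior_a + prior_b + flips)


# Two opposite priors: one expects about 20% heads, the other about 80%.
for label, heads, flips in [("little data", 3, 4), ("lots of data", 750, 1000)]:
    low = posterior_mean(2, 8, heads, flips)
    high = posterior_mean(8, 2, heads, flips)
    print(label, round(low, 3), round(high, 3))
```

With 4 flips the two posteriors stay far apart, so the prior drives the answer. With 1000 flips they nearly agree, so the data does.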

desert oar
#

"don't ask to ask", is the saying

cyan lantern
#

well im running into issues with predictions after building a model

lapis sequoia
cyan lantern
#

my training data after preprocessing have different number of features to the test data, which raised an error during prediction stage

#

both test and training datasets were preprocessed the same way

desert oar
#

this doesn't appear to be relevant to sparse matrices btw

#

can you show your full code?

#

at least how clf is defined

#

> my training data after preprocessing have different number of features to the test data
basically, this should not ever happen

cyan lantern
#

I posted this whole part on stackoverflow

#

the model worked during validation

#

so this works fine (predicting the training data)

clf.predict(X)
cyan lantern
desert oar
#

You are not using sklearn correctly

#

you are creating new and separate transformers for each split

#

you don't want to do that

cyan lantern
#

oh

desert oar
#

after .fit-ing a transformer, it keeps the fitted state internally, then you just .transform on the other datasets

cyan lantern
#

so the vectorizer function is not working correctly?

desert oar
#

it cannot work correctly as-written

#
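from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

vectorizer = CountVectorizer(stop_words = 'english')
clf = LogisticRegression(C = 0.01, max_iter = 1000000, penalty = 'l2')

# fit the vectorizer on the training text only
x_train = vectorizer.fit_transform(data_train)
clf.fit(x_train, y_train)
pred_train = clf.predict(x_train)

# transform (not fit_transform) the test text with the already-fitted vectorizer
x_test = vectorizer.transform(data_test)
pred_test = clf.predict(x_test)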
#

your code should look something like this

#

better yet, use a "pipeline" to automate the sequence of preprocessing and classifier fitting

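from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    CountVectorizer(stop_words = 'english'),
    LogisticRegression(C = 0.01, max_iter = 1000000, penalty = 'l2'),
)

# fit needs the labels as well as the raw training text
clf.fit(data_train, y_train)

pred_train = clf.predict(data_train)
pred_test = clf.predict(data_test)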
cyan lantern
#

and does that work with both text features and numerical features?

#

because right now, I am vectorizing each text feature on its own first (though not done correctly), then hstacking them with the numerical features (in np.array form)

#

but what you are showing is that I just preprocess the whole dataset together?

#

or am I misunderstanding it?

cyan lantern
#

thank you

desert oar
#

I added an answer on SO as well

lavish bison
#

Hi guys, can anyone help me with this?

warm basin
#

Please I am stuck here. The task is on the picture. Trying to create an API for an image search. I am done with the ml part and the trained model has been assigned to a variable model

warm basin
thorn bobcat
#

yo

#

I'm trying to work on my own StyleGAN encoder and would like to learn the basics leading up to this. Do I learn OpenCV, Tensorflow or Keras first?

grave frost
#

is there any specific reason just to build an encoder?

thorn bobcat
#

Want to do motion detection and face recognition.

#

also want to turn images into videos.

#

maybe make my own anime

#

i have a lot of things I want to do

grave frost
#

that's a whole GAN - I thought you meant encoder separately

thorn bobcat
#

@grave frost I want to work on this and improve it too.

#

this is almost one of the things I'd like to achieve.

cyan lantern
desert oar
#

always show your code

cyan lantern
cyan lantern
desert oar
#

if you share your code as text it's a lot easier to read than as a screenshot

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

cyan lantern
#

ah sorry yeah

desert oar
#

hm, the vectorizer might still not be emitting the full array

#

it shouldnt though

#

make sure you restart your notebook in case there are any typos or something

cyan lantern
desert oar
#

you can hstack them although i do think ColumnTransformer will be easier to work with

#

that code you wrote looks like it should work

cyan lantern
#

X_test is 10000x8718 and X is 40000x31765

cyan lantern
desert oar
#

i would normally expect this to work...

#

let me see if there's a missing flag or something

cyan lantern
#

thanks for helping out btw

desert oar
#

this works as expected

cyan lantern
#

hmmm

#

even without hstack it is not right

desert oar
#

right, hstack isn't the problem here

cyan lantern
#

I may have found the problem

thorn bobcat
#

anyone here worked with GAN's?

cyan lantern
#

if I do this instead

#

it works fine

topaz epoch
#

Hey guys , how do I start learning ds

cyan lantern
#

so I guess the order matters

topaz epoch
#

Does ds include ai and ml

desert oar
#

you shouldn't even be able to run that code with those lines commented out

#

restart your damn notebook

#

and use [''] to get dataframe columns, don't use .

#

(what happens if you have a column called map?)
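A small sketch of that pitfall with made-up data. It uses a column called `count` rather than `map`, because `DataFrame.count` is a long-standing method, while whether `map` itself collides may depend on the pandas version.

```python
import pandas as pd

# Made-up frame with a column whose name matches a DataFrame method.
df = pd.DataFrame({"name": ["soup", "cake"], "count": [3, 5]})

# Bracket access always returns the column.
by_brackets = df["count"]

# Attribute access finds the DataFrame.count method first, not the column.
by_attribute = df.count

print(type(by_brackets).__name__)   # Series
print(callable(by_attribute))       # True
```

So `df.count` silently hands back a method where a column was expected, and `df["count"]` avoids the ambiguity.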

cyan lantern
# desert oar you shouldn't even be able to run that code with those lines commented out

other_features = ["n_steps", "n_ingredients"]
features = df_train[other_features]
test_features = df_test[other_features]

name = vectoriser.fit_transform(df_train.name)
test_name = vectoriser.transform(df_test.name)

steps = vectoriser.fit_transform(df_train.steps)
test_steps = vectoriser.transform(df_test.steps)

ingr = vectoriser.fit_transform(df_train.ingredients)
test_ingr = vectoriser.transform(df_test.ingredients)

X = hstack([steps,ingr, name])
X_test = hstack([test_steps, test_ingr, test_name])
y = df_train.duration_label
#

now it works

desert oar
#

restart your notebook anyway

#

it's highly likely that you just had some other variable name hanging around due to a typo

#

OH

#

that's the problem

#

you need a separate vectorizer for each set of features...

cyan lantern
#

yeah hahahaha

desert oar
#

don't re-use it
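A toy sketch of why re-using one vectorizer can break the feature counts. The earlier failing version isn't shown in the chat; this assumes it called fit_transform on every text column before any transform. The texts are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_names = ["apple pie", "banana bread"]
train_steps = ["mix bake", "mix chill"]
test_names = ["apple bread"]

# Re-using one vectorizer: the second fit_transform replaces the vocabulary.
shared = CountVectorizer()
name_train = shared.fit_transform(train_names)    # 4 columns: apple, banana, bread, pie
steps_train = shared.fit_transform(train_steps)   # refit -> 3 columns: bake, chill, mix
name_test_bad = shared.transform(test_names)      # encoded with the *steps* vocabulary

print(name_train.shape, name_test_bad.shape)      # column counts differ: 4 vs 3

# One vectorizer per text column keeps train and test aligned.
name_vec = CountVectorizer()
name_train_ok = name_vec.fit_transform(train_names)
name_test_ok = name_vec.transform(test_names)
print(name_train_ok.shape, name_test_ok.shape)    # both have 4 columns
```

Interleaving fit_transform and transform per column avoids the mismatch too, but only by accident of ordering. Separate vectorizers make it explicit.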

#
other_features = ["n_steps", "n_ingredients"]
features = df_train[other_features]
test_features = df_test[other_features]

name_vectoriser = CountVectorizer()
name = name_vectoriser.fit_transform(df_train.name)
test_name = name_vectoriser.transform(df_test.name)

steps_vectoriser = CountVectorizer()
steps = steps_vectoriser.fit_transform(df_train.steps)
test_steps = steps_vectoriser.transform(df_test.steps)

ingr_vectoriser = CountVectorizer()
ingr = ingr_vectoriser.fit_transform(df_train.ingredients)
test_ingr = ingr_vectoriser.transform(df_test.ingredients)

X = hstack([steps,ingr, name])
X_test = hstack([test_steps, test_ingr, test_name])
y = df_train.duration_label
#

columntransformer will be really useful here

cyan lantern
#

yeah I will try to learn that and redo it

#

thank you for the help

desert oar
#
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    make_column_transformer(
        ('passthrough', ['n_steps', 'n_ingredients']),
        (CountVectorizer(), 'name'),
        (CountVectorizer(), 'steps'),
        (CountVectorizer(), 'ingredients'),
    ),
    LogisticRegression(C=0.01, max_iter=1000000, penalty='l2'),
)

@cyan lantern something like this

sly salmon
#

when using np.mean for rows (axis=1), the array has to be flattened.
So, when using np.mean for columns (axis=0), I'd assume array has to be modified (not flattened - but... stood up?)
Is there a term for this?

desert oar
#

why does it have to be flattened?

sly salmon
#

that's what it does by default

desert oar
#

numpy knows how long each column is, so it can "step over" the right number of elements in the underlying flat array to do the computation
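A minimal sketch of that "stepping over", using a made-up 3x4 array. `strides` is the number of bytes NumPy moves in the flat buffer to reach the next element along each axis.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Bytes to step to reach the next row / next column in the flat buffer.
print(a.strides)              # (32, 8) for 8-byte integers in row-major order

col_means = a.mean(axis=0)    # one mean per column -> shape (4,)
row_means = a.mean(axis=1)    # one mean per row    -> shape (3,)
print(col_means, row_means)

# Transposing only swaps the strides; it is a view on the same memory.
print(a.T.strides)            # (8, 32)
print(np.shares_memory(a, a.T))
```

Neither mean needs the array to be flattened or "stood up" first. Only the step pattern changes.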

desert oar
#

yeah it won't internally re-allocate memory for the array

#

that'd be very inefficient

warm basin
#

Please I am stuck here. The task is on the picture. Trying to create an API for an image search. I am done with the ml part and the trained model has been assigned to a variable model

digital aurora
#

Guys, can anybody tell me what all to study under pandas.

#

Like what all are important attributes and functions.

sly salmon
#

does np.percentile sort the array implicitly?

#

well, I guess it has to.

sharp herald
#

anyone know what that notation for matrix M means?

rigid tendon
sharp herald
#

that representation

#

because P and Q are both matrices too

heavy bay
#

Hi, so I want to make a program which predicts crypto prices, so what python library should I use for that?. (I am relatively new to ml, I've made a few simple ml projects)

heady tide
#

Tensorflow throws an error IndexError: tuple index out of range when I add weights to my validation data

#

but if you see the last two code cells

#

the shapes are correct for both validation and training

limpid saddle
#
SVM_clf_counts = Pipeline([('vect', CountVectorizer()),
                   ('clf', LinearSVC(C=0.1, max_iter=3000)),
                  ])
SVM_clf_counts.fit(X_train, y_train)
SVM_cnt_pred_tr = LR_clf_counts.predict(X_train)
SVM_cnt_pred_val = LR_clf_counts.predict(X_val)
SVM_cnt_pred_tst = LR_clf_counts.predict(X_test)


print("precision on training: ",precision_score(y_train, SVM_cnt_pred_tr, average='micro'))
print("precision on validation: ",precision_score(y_val, SVM_cnt_pred_val, average='micro'))
print("precision on testing: ",precision_score(y_test, SVM_cnt_pred_tst, average='micro'))
#

I don't understand what the error is in this code, can someone help

desert oar
#

@sharp herald this is "block matrix" notation

#

P stands for "all the elements of P"

#

0 stands for "fill with 0s up to the correct dimensions"
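A sketch of how such a matrix could be built in NumPy. The picture isn't in the log, so the block-diagonal layout M = [[P, 0], [0, Q]] and the values of P and Q are assumptions for illustration.

```python
import numpy as np

# Made-up blocks of different sizes.
P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[5, 6, 7]])

# Assumed block-diagonal layout: M = [[P, 0], [0, Q]].
# Each 0 block takes whatever shape makes the rows and columns line up.
M = np.block([
    [P, np.zeros((P.shape[0], Q.shape[1]), dtype=int)],
    [np.zeros((Q.shape[0], P.shape[1]), dtype=int), Q],
])
print(M)
```

The top-right zero block is 2x3 and the bottom-left one is 1x2, which is what "fill with 0s up to the correct dimensions" means here.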

velvet linden
#

so i have this program that checks for the input in the csv file, and then if it is not there, then it writes, but the write part doesn't work for some reason. I also am not getting any errors

desert oar
desert oar
#

also do not open a file for both reading and writing at the same time. you will make a big mess

velvet linden
desert oar
velvet linden
#

sorry

#

I might be a bit dumb here but how do I store the rows in a list

desert oar
#
import csv

uuu = input('user: ')
uu = input('pass0: ')
u = input('pass1: ')

with open('test1.csv') as fp:
    rows = list(csv.reader(fp))

new_rows = []
for row in rows:
    if row == [uuu, uu]:
        print('nogood')
    else:
        new_rows.append([uuu, uu])
        print('end')
rows.extend(new_rows)
del new_rows

with open('test1.csv', 'w') as fp:
    csv.writer(fp).writerows(rows)
#

admittedly i don't understand what this code is supposed to do, but it looks more or less like what you wrote, but without the chance of messing up the files

#

note that i do not .append to rows - i .append to a new list. this is because you should never mutate something that you are iterating over

desert oar
#

what does "doesn't work" mean?

#

what happened, and what were you expecting?

#

note that this is also untested code written by a stranger on the internet, so it could be buggy or incorrect

velvet linden
#

ok

#

so i want it to first take an input, then i want it to read the csv, and if it is not in the csv then write it

#

but its not writing it
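A minimal sketch of that behaviour, assuming each row is just the entered values. `add_if_missing` and the demo file are hypothetical names, not from the original program.

```python
import csv
import os
import tempfile


def add_if_missing(path, row):
    """Append row to the CSV at path unless an identical row is already there.

    Returns True if the row was written, False if it already existed.
    """
    rows = []
    if os.path.exists(path):
        # Read everything first and close the file before writing.
        with open(path, newline="") as fp:
            rows = list(csv.reader(fp))

    # csv.reader yields each row as a list of strings, so compare lists.
    if list(row) in rows:
        return False

    with open(path, "a", newline="") as fp:
        csv.writer(fp).writerow(row)
    return True


# Demo on a throwaway file rather than the real test1.csv.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
print(add_if_missing(demo_path, ["alice", "secret"]))   # True: written
print(add_if_missing(demo_path, ["alice", "secret"]))   # False: already present
```

The membership check runs once over all rows, so the new row is written at most once instead of once per non-matching row.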

desert oar
#

try (uuu, uu) instead of [uuu, uu]

#

i can't remember if csv rows are returned as tuples or lists. probably tuples, so use () and not [].

velvet linden
#

@desert oar that didn't make a difference

desert oar
#

does it never print nogood?

#

what actually does happen

#

and how is it different from your expectations?

#

it might help if you used https://repl.it and posted your code along with an example csv that shows the problem

#

for the sake of the demonstration, you should save to a different filename so i can see both the inputs and outputs

velvet linden
desert oar
#

save to test2.csv instead of test1.csv, so that when i run your repl.it post i can re-run it as many times as i want, without overwriting the original file

velvet linden
#

ok

#

so it is already on replit

lapis sequoia
#

guys, using this api

#

How can i know the methods?

#

i havent found documentation anywhere

lapis sequoia
#

where is the docs?

haughty jackal
#

are there any recommendations for resources to use for getting started with a.i and machine learning

limpid saddle
#
id_train, X_train, y_train = ftrain_preprocessed['SentenceId'], ftrain_preprocessed['Phrase'], ftrain_preprocessed['Sentiment']
id_test, X_test, = ftest_preprocessed['SentenceId'], ftest_preprocessed['Sentiment']
#

I keep getting this error

#
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Sentiment'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-97-bf964073328e> in <module>
      1 id_train, X_train, y_train = ftrain_preprocessed['SentenceId'], ftrain_preprocessed['Phrase'], ftrain_preprocessed['Sentiment']
----> 2 id_test, X_test, = ftest_preprocessed['SentenceId'], ftest_preprocessed['Sentiment']

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Sentiment'
tidal bough
#

Sentiment is not a key in ftest_preprocessed then

limpid saddle
#

what do i need to fix in the code?

tidal bough
#

You're trying to get a nonexisting column of a dataframe.

lapis sequoia
#

i think understanding what u are doing first will help tho

limpid saddle
#

Ahh I see, I'll try to see why it isn't there even tho it's supposed to be

#

thank you

lapis sequoia
#

u can print ftrain_preprocessed.keys()

slate hollow
#
recurrent = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=(None, 1))
])
recurrent.compile(loss='mse', optimizer='nadam')
recurrent.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=40)

is this supposed to take forever
winged stratus
# slate hollow

40 epochs * ~88 seconds/epoch = 3520 seconds ~ almost 1 hour

#

oof

lapis sequoia
#

= for ever xd

slate hollow
#
import numpy as np
import tensorflow as tf

keras = tf.keras


# returns num_instances sequences, each of length len_
def gen_time_series(num_instances: int = 32, len_: int = 64):
    freq1, freq2, offset1, offset2 = np.random.rand(4, num_instances, 1)
    time = np.linspace(0, 1, len_)
    series = 0.5 * np.sin((time - offset1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offset2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(num_instances, len_) - 0.5)
    return series.reshape(series.shape + (1,)).astype(np.float32)


seq_len = 50
instance_num = 10 ** 5
train_amt = int(instance_num * 0.6)
val_amt = int(instance_num * 0.2)
raw_data = gen_time_series(instance_num, seq_len + 1)  # +1 for the instance to predict
X_train, y_train = raw_data[:train_amt, :-1], raw_data[:train_amt, -1]
X_valid, y_valid = raw_data[train_amt:val_amt, :-1], raw_data[train_amt:val_amt, -1]

linear = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(seq_len, 1)),
    keras.layers.Dense(1)
])
linear.compile(loss='mse', optimizer='nadam')
linear.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=40)

does anyone know why even when i'm providing the validation data, it isn't showing?
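One likely cause, judging only from the snippet: the validation slice `raw_data[train_amt:val_amt]` starts at 60000 and stops at 20000, so it is empty and Keras would have nothing to validate on. The sketch uses a zero-filled stand-in for `raw_data`, and the "likely intended" slice is a guess at what was meant.

```python
import numpy as np

instance_num = 10 ** 5
train_amt = int(instance_num * 0.6)   # 60000
val_amt = int(instance_num * 0.2)     # 20000

# Stand-in for raw_data: only the first axis matters here.
raw_data = np.zeros((instance_num, 51, 1), dtype=np.float32)

# Slice from the snippet: the start is past the stop, so it selects nothing.
X_valid_empty = raw_data[train_amt:val_amt, :-1]
print(X_valid_empty.shape)            # zero rows

# Likely intended: the val_amt rows that follow the training rows.
X_valid = raw_data[train_amt:train_amt + val_amt, :-1]
y_valid = raw_data[train_amt:train_amt + val_amt, -1]
print(X_valid.shape, y_valid.shape)
```

Printing the shapes of X_valid and y_valid before calling fit would catch this kind of off-by-a-slice mistake.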
near cosmos
#

I'm on the hunt for a nice image annotation tool. Any recommendations?

old grove
#

Hi, I have recently started learning data science and have a doubt in pandas. What does describe give? I mean the 25%, 50% and 75% rows, basically I didn't understand those. The rest I understood, just those 3 I didn't get.

near cosmos
old grove
old grove
#

i mean values will be around,less or more than percentiles in Age..?

near cosmos
#

Another term for them is quartiles. They are cut points in the distribution such that a quarter of the values are below the 1st quartile (25%), half the values are below the 2nd quartile (50%), and so on
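A tiny sketch with made-up ages, using `np.percentile` to get the same three cut points that `describe` reports. The values are unsorted on purpose, to show the input does not have to be sorted first.

```python
import numpy as np

ages = np.array([9, 1, 7, 3, 5, 2, 8, 4, 6])   # made-up values, unsorted on purpose

q1, q2, q3 = np.percentile(ages, [25, 50, 75])
print(q1, q2, q3)                 # 3.0 5.0 7.0

# Share of values strictly below each cut point (only roughly 25% / 50% on tiny samples).
print((ages < q1).mean())         # 2 of 9 values
print((ages < q2).mean())         # 4 of 9 values

# The input array is left in its original order.
print(ages[:3])                   # [9 1 7]
```

The 50% value is the median, and the 25% and 75% values bracket the middle half of the data.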

old grove
#

so ok these are percentiles not percent 😃

#

got it

#

thanks a lot @near cosmos 👍 😃

steel hill
#

Would anyone know how to make it so the graphs are transparent? I haven't been able to figure out a way to do this and any help would be highly appreciated. Thank you.

#

oops, my own heatmap didnt upload

#
import os
import json
import sys
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.image as mpimg 
import sqlite3
import seaborn as sns
map = sys.argv[1]
file = f"./{map}.csv"
df = pd.read_csv(file, header=None, usecols=[0,1])
print(df)
map_img = mpimg.imread(f'{map}.png') 
hmax = sns.kdeplot(df[0], df[1], cmap="Reds", shade=True, bw=.15)
hmax.collections[0].set_alpha(0)
if 'metalworks' in map:
    xmin = -3034
    xmax = 3374 
    ymin = -6699
    ymax = 4939
elif 'product' in map:
    xmin = -2859
    xmax = -171
    ymin = -3668
    ymax = 3776
elif 'process' in map:
    xmin = -5222
    xmax = 5216 
    ymin = -3146
    ymax = 3128
plt.imshow(map_img, zorder=0, extent=[xmin, xmax, ymin, ymax],resample=False)
plt.savefig(f'{map} heatmap.png', dpi=1200, transparent=True)
plt.show()
here is the relevant code btw
desert oar
#

@steel hill you might have to define your own colormap that has transparency, or otherwise need to find a way to set the "alpha" channel for the colors to something less than 1
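A rough sketch of the "colormap with transparency" idea: copy the built-in "Reds" colormap and ramp its alpha channel from 0 to 1. The name `transparent_reds` and the random demo data are made up, and passing it to the seaborn call in the snippet is untested.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Sample the built-in "Reds" colormap as a (256, 4) array of RGBA rows.
base = plt.get_cmap("Reds")
colors = base(np.linspace(0, 1, 256))

# Ramp the alpha channel: low values fully transparent, high values opaque.
colors[:, 3] = np.linspace(0, 1, 256)
transparent_reds = ListedColormap(colors)

# Demo on random data standing in for the real heatmap.
density = np.random.rand(20, 20)
fig, ax = plt.subplots()
ax.imshow(density, cmap=transparent_reds)
```

Drawn over a map image, low-density areas would then let the image show through. For a transparent figure background, the `savefig` keyword is `transparent=True`.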

lapis sequoia
#

import tensorflow as wtf

#

Hi guys, I want to do time series model prediction but I am wondering how I can treat the skewed data here?

desert oar
lapis sequoia
#

numbers that leave x% data to the left side and the rest on the right

marble citrus
#

how can I use matplotlib in vs code

tidal bough
marble citrus
#

I am getting module not found when running my code @tidal bough

#

@tidal bough this is what I mean

lapis sequoia
marble citrus
#

is there any way I can check if I am on a virtual environment?

lapis sequoia
#

that status bar below will show your current interpreter.

marble citrus
lapis sequoia
#

I think you're using the default. I would suggest using notebooks instead when learning (colab notebook/kaggle/jupyter notebook/jupyterlab).

#

install matplotlib using pip install matplotlib.
then import matplotlib.pyplot as plt

marble citrus
marble citrus
lapis sequoia
#

One of the downsides of using the default is mixing up packages from your other projects, which may result in some conflicts.

lapis sequoia
#

pip install matplotlib

#

import it in your python script using import matplotlib.pyplot as plt

marble citrus
charred skiff
#

Good evening has anyone read google mu zero paper here?

grave frost
lapis sequoia
#

who?

#

tensorflow as your boi?

#

ew

fresh zenith
#

is there a way

#

where i can make a constantly updating graph

#

in matplotlib?

tidal bough
#

yes; just plot to the same figure repeatedly.

fresh zenith
#

can you give me a basic example or something to read?
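A minimal sketch of "plot to the same figure repeatedly": keep one line object and replace its data each time a new value arrives. The helper names are made up, and `plt.pause` is one common way to let the window redraw, not the only one.

```python
import random

import matplotlib.pyplot as plt


def make_live_plot():
    """Create one figure with an empty line that will be updated in place."""
    fig, ax = plt.subplots()
    (line,) = ax.plot([], [])
    ax.set_xlabel("Seconds")
    return fig, ax, line


def push_sample(ax, line, y_data, value):
    """Append a value and redraw the existing line instead of plotting a new one."""
    y_data.append(value)
    line.set_data(list(range(len(y_data))), y_data)
    ax.relim()            # recompute the data limits from the updated line
    ax.autoscale_view()   # rescale the axes to fit them


if __name__ == "__main__":
    plt.ion()  # interactive mode, so drawing does not block the loop
    fig, ax, line = make_live_plot()
    y_data = []
    for _ in range(20):
        push_sample(ax, line, y_data, random.randint(0, 100))
        plt.pause(0.05)   # give the GUI a moment to redraw
```

matplotlib's animation module is another option if the update loop gets more involved.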

tidal bough
fresh zenith
#

ok thanks

#

this is what i have so far

#
import matplotlib.pyplot as plt
import random

y_data = []
average = 0

for i in range(0, 60):
    y_data.append(random.randint(0, 100))


for i in y_data:
    average += i

print(y_data)
print((average/60))

plt.plot(y_data)
plt.ylabel("Jason's Gay Percentage")
plt.xlabel("Seconds")
plt.show()
grave frost
lapis sequoia
#

to classify almost 1000 classes, how many images do i need per class?

grave frost
#

theoretically

#

practically? as much as you can store

desert oar
#

you also need to worry about overall class imbalance - if some classes are much more common than other classes, the model can get pretty good accuracy by simply never predicting the rare classes
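A toy illustration with made-up labels: a "model" that always predicts the common class still scores high accuracy while never finding the rare one.

```python
# Made-up labels: 950 examples of a common class, 50 of a rare one.
y_true = ["common"] * 950 + ["rare"] * 50

# A "model" that ignores its input and always predicts the majority class.
y_pred = ["common"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the rare class: how many rare examples were actually found.
rare_total = sum(t == "rare" for t in y_true)
rare_found = sum(t == "rare" and p == "rare" for t, p in zip(y_true, y_pred))
rare_recall = rare_found / rare_total

print(accuracy)      # 0.95
print(rare_recall)   # 0.0
```

That is why per-class metrics tend to be more informative than overall accuracy on imbalanced data.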

lapis sequoia
#

isnt amount of images = various inputs per class?

#

ah, u mean how different are between each other?

grave frost
#

@desert oar you forgot data quality, noise, model architecture, gpu memory, bank account

lapis sequoia
desert oar
#

there is no specific number. more is better. if you have a lot of features in your data, you will need more data points to cover the feature space.

desert oar
#

with image classification, you can use data augmentation to help with this somewhat

grave frost
#

is it just me, or is finding people with knowledge in multiple domains difficult AF?

#

or is signal processing + AI a niche job in general?

lapis sequoia
#

also... is there any argument in keras to, lets say, augment data coloring it?

#

like if i have a red car, apart from rotating scaling flipping etc, paint it blue? or at least add blue color to it? or something?

grave frost
#

check out imgaug lib. it has enough augmentations to last you a lifetime

grave breach
#

@lapis sequoia So, you need to peek the red car and make it blue?

grave frost
lapis sequoia
grave frost
lapis sequoia
#

i think yes

grave frost
#

I would think scaling pixels to values for a particular color

#

like if blue is 0-10, then all image values would be scaled in that range

desert oar
#

im sure opencv has this stuff

lapis sequoia
#

maybe photoshop "color" blending mode is what i want

hallow orbit
#

did someone need me

desert oar
#

no i tagged the wrong person, sorry

hallow orbit
#

oki

grave breach
#

Like Mathematica

grave breach
#

It has a function that does what you need

lapis sequoia
#

this is how i wanted to augment the images

#

okey, ill take a look at imgaug

#

but since keras provides a generator u can pass to the fit method after

#

can i pass the analogue of imgaug?

desert oar
#

i imagine keras gives you some way to write your own generator

lapis sequoia
#

keras ImageDataGenerator class has

#

brightness_range=None,

#

can i somehow add the color one?

grave breach
lapis sequoia
#

i dont wanna write on disk all the augmented images

#

i only wanna have the basic ones, and during the train, provide the augmented ones, like u normally do with keras

grave breach
#

You can keep them on memory and then pass all the data to python via mathlink

lapis sequoia
#
                              validation_data = validataion_gen,
grave breach
#

But, if you do the ml part on wolfram (faster and easier than keras) I think that you can do data augmentation on the fly

lapis sequoia
#

validataion_gen = data_gen.flow_from_directory

desert oar
lapis sequoia
#
                              horizontal_flip = True,
                              vertical_flip = False,
                              brightness_range = (0.5, 1.6),
                              rotation_range = 11,
                              validation_split = 0.17)
#

in the end, what u provide to fit, is a generator

desert oar
#

cool that you can do neural networks in mathematica though

#

definitely a powerful tool

grave breach
desert oar
#

i remember i had a license for it when i was an undergrad, through my school. but i didnt really have a use for it then and didn't have the patience to learn the language

#

you spelled everything correctly

lapis sequoia
#

the ideal was to have exact the same thing as this, but with an extra option saying add_color = (255,0,0) or something xD

#

oh

#

look what i found

grave breach
#

@desert oar By the way, I think that mathematica is better than python when doing research or training models, but I also think that the best comes when you take what you researched on mathematica and take it to python (or other languages for production)

lapis sequoia
#
from imgaug import augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5), # horizontally flip
    # sometimes(iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05), per_channel=0.5)),
    iaa.OneOf([
        iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),
        iaa.Emboss(alpha=(0, 1.0), strength=(0, 2.0)),
        # iaa.Noop(),
        iaa.GaussianBlur(sigma=(0.0, 1.0)),
        # iaa.Noop(),
        iaa.Affine(rotate=(-10, 10), translate_percent={"x": (-0.25, 0.25)}, mode='symmetric', cval=(0)),
        # iaa.Noop(),
        # iaa.PerspectiveTransform(scale=(0.04, 0.08)),
        # # iaa.Noop(),
        # iaa.PiecewiseAffine(scale=(0.05, 0.1), mode='edge', cval=(0)),
        
    ]),
    sometimes(iaa.ElasticTransformation(alpha=(0.5, 3.5), sigma=0.25)),
    # More as you want ...
], random_order=True)

datagen = ImageDataGenerator(preprocessing_function=seq.augment_image)
grave frost
#

I don't know why there are people in weird places encouraging niche languages to newbies for no good reason

lapis sequoia
#

can i somehow keep the default augment params?

desert oar
grave frost
#

like the guys at one server trying to get someone to write a NN in FORTRAN and x86

lapis sequoia
grave breach
lapis sequoia
grave breach
grave frost
lapis sequoia
#

brb, gtg

grave breach
#

it is also used in large scale production

#

(even alexa is in part powered by mathematica)

grave frost
#

> I think that mathematica is better than python when doing research or training models, but I also think that the best comes when you take what you researched on mathematica and take it to python (or other languages for production)
most research uses JAX and python tho? if it is indeed maintained by such a big team, it definitely doesn't convince many in research or in Applied ML

grave frost
grave breach
#

Oh no... Religious wars...

#

By the way, mathematica neural network framework is powered by mxnet

#

That is heavily used in production and research

#

Also many Fortune 500 companies actively use mathematica for research

grave frost
#

what? mxnet?

grave breach
#

Yes

grave frost
#

is it even maintained?

grave breach
#

Yes

#

But I think that wolfram has its own branch

#

I also think that in order to have a productive conversation you should take a look about who they are at Wolfram Research

#

What they did, what mathematica can do, etc.

grave frost
#

I mean, I don't even have to argue how much of the industry uses mxnet

grave breach
#

Sorry, but I don't even remember the point of the conversation

#

I'll make you a recap:
Mathematica is the world's fastest language (not performance, speed of coding)
It (with MatLab) is the industry standard for research
Top universities, companies and institutions use it
It powers large scale productions systems

desert oar
#

i think the point is that this is a python server, and most newbies here can barely use python, let alone do serious machine learning or understand the math that goes into it, so recommending that they use mathematica instead is not really helpful to those people

desert oar
#

its definitely an interesting topic though

#

i know there are people who really love mathematica

#

where have you seen it used in industry? finance?

grave breach
#

I'm a physics enthusiast

#

I mainly use it for it

desert oar
#

i know it has some very powerful symbolic math capabilities

#

i definitely used it to try and figure out homework answers in college

#

didn't always though...

grave breach
desert oar
#

yup, very handy tool

#

it was also useful for quick plotting when i needed intuition about how a function ought to work

grave breach
#

Definitely

late shell
#

Hello, I'm trying to code a multiple linear regression model myself without using any libraries except numpy. But even after a lot of epochs, my accuracy is stuck at 38%. Using sklearn's linear regression gives 94%. I'm guessing that while moving down the cost function using gradient descent, my algorithm is stuck at some local minima. Any way I can confirm that and if it turns out to be true how can I get out of that local minima and move towards global minima? Thanks

near cosmos
grave breach
#

Well, python alone is a small, fast and flexible language

#

So, without packages it cannot do as many things as Mathematica

#

the problem is

#

That since the developers of the packages are different

#

Many packages wouldn't fit together perfectly

#

So, while python is better for production

#

Mathematica is better for research

#

They have different purposes

near cosmos
tidal bough
#

If you have few points, you can also exactly calculate the correct (best fit) parameters by solving the normal equation.
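
A minimal sketch of the normal-equation approach mentioned here, assuming made-up data (the arrays `X` and `y` below are invented for illustration and are not the asker's dataset). It solves the linear system instead of inverting the matrix explicitly, which is usually the numerically safer choice:

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3*x1 - 1*x2 exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Normal equation: solve (X^T X) theta = X^T y rather than inverting X^T X.
theta = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

print(theta)
```

Comparing `theta` against the weights a hand-written gradient descent ends up with is one quick way to check whether the descent loop itself has a bug.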

near cosmos
# grave breach They have different purposes

True enough. My point was that your last three arguments also are arguments for python 1) it is an industry standard, 2) top places use it, 3) it is used in production systems

grave breach
#

Oh yes, I think I did not explain myself correctly

#

I meant that Mathematica was born for doing that

#

So it is designed to be more powerful in research

#

It also comes with a big set of tools

late shell
late shell
near cosmos
tidal bough
#

Basically make up a whole bunch of parameters (say, equally distributed on a grid) and for each set of them, calculate the error

#

that'll allow you to look at what the error looks like depending on the params
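
The grid idea can be sketched roughly as follows, assuming a toy one-feature dataset and hand-picked parameter ranges (both are assumptions, not the asker's data). It evaluates the mean squared error for every intercept/slope pair and reports where the error is smallest:

```python
import numpy as np

# Toy one-feature dataset (assumed for illustration): y = 1 + 2*x exactly.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x

# Candidate intercepts and slopes laid out on a regular grid.
intercepts = np.linspace(-1.0, 3.0, 41)
slopes = np.linspace(0.0, 4.0, 41)
B0, B1 = np.meshgrid(intercepts, slopes, indexing="ij")

# Mean squared error for every (intercept, slope) pair via broadcasting.
predictions = B0[..., None] + B1[..., None] * x   # shape (41, 41, 20)
mse = ((predictions - y) ** 2).mean(axis=-1)       # shape (41, 41)

# Location of the smallest error on the grid.
i, j = np.unravel_index(mse.argmin(), mse.shape)
best_intercept, best_slope = intercepts[i], slopes[j]
print(best_intercept, best_slope, mse[i, j])
```

For linear regression with a squared-error cost the surface is a single convex bowl, so a run that stalls is more likely caused by the learning rate, unscaled features, or a bug in the gradient than by a local minimum.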

late shell
desert oar
obtuse spindle
#

can someone explain the road map or share a useful link to learn data science and machine learning? I am a total beginner with no proper guide. I know c++ and python. Should i learn django network & other skills?

main kernel
tidal bough
#

(it also teaches you some of the linear algebra required if you don't know it)

late shell
lapis sequoia
#

so guys, how can i add another transformation to ImageDataGenerator from keras?

wanton bobcat
#

Hello? Can someone help me in basic python console?

desert oar
# late shell umm, can you please elaborate a bit more.

i recommend starting by implementing least squares regression. it's easier to program, you get an exact result rather than a local optimum, and it's a good excuse to dig into the linear algebra and optimization problem a bit more deeply than gradient descent

steel hill
silent current
#

Suggestions for building a dashboard? Bokeh? Plotly? Something else?

teal wadi
#

hey guys there is someone who can help me with python pandas

silent current
#

Just ask your question

teal wadi
#

i need to fix my timestamp on pandas i get wrong value when i do pd.to_date

#

i convert timestamp to date UTC

#

and it doesnt work well when i do other thing its just gives me an error

desert oar
#

@teal wadi it helps if you share your code and the specific errors or unexpected output

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

lapis sequoia
#

hi guys, i have a question about keras
keras ImageDataGenerator class provides some of the basic transformations to increase data amount, but i am wondering how can i add my own transformation
In this case, i wanna change the color. Ive seen imgaug library has it, but i dont know how to use it with keras
Can someone help?

arctic wedgeBOT
#

Hey @native ginkgo!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

native ginkgo
#
```
    import pyaudio
ModuleNotFoundError: No module named 'pyaudio'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:/Users/Vandana/Desktop/vansh coding/discord bot/alex.py", line 45, in <module>
  File "c:/Users/Vandana/Desktop/vansh coding/discord bot/alex.py", line 28, in commandlistener
    with sr.Microphone() as source:
  File "C:\Users\Vandana\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\speech_recognition\__init__.py", line 79, in __init__
    self.pyaudio_module = self.get_pyaudio()
  File "C:\Users\Vandana\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\speech_recognition\__init__.py", line 110, in get_pyaudio
    raise AttributeError("Could not find PyAudio; check installation")
AttributeError: Could not find PyAudio; check installation
PS C:\Users\Vandana> pip install PyAudio
Collecting PyAudio
  Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: PyAudio
  Building wheel for PyAudio (setup.py) ... error```
#

it is saying that pyaudio is not found on comp

#

and when i am trying to pip install

#

it is also showing error

winged stratus
late shell
languid falcon
#

anyone free for me to message them? I have some general questions about a few topics regarding data science and the types of way to do them in python

polar stag
#

any data science course you people recommend? i can see one in pinned messages from columbia's ML

oak violet
#

hey guys, im new in data science and im having a hard time getting the newest googlebot string.. could someone help me with that please?

oblique raft
#

hi! can someone recommend me a good tutorial for generating text with keras ?

lapis sequoia
#

i need help with translating one of the old crypto hash function algorithms from C code to Python... can anyone help?

teal wadi
#

i need to fix my timestamp on pandas i get wrong value when i do pd.to_date
i convert timestamp to date UTC
and it doesnt work well when i do other thing its just gives me an error

#

DM me if someone can help me with it i can share my code and hope for help

hard hound
#

ValueError: shapes (1,3) and (4,4) not aligned: 3 (dim 1) != 4 (dim 0)

#

Is the error I am getting again and again. I have checked my code and I know what it means, but I am unable to solve it

hard hound
lapis sequoia
#

hi guys, i have a question about keras
keras ImageDataGenerator class provides some of the basic transformations to increase data amount, but i am wondering how can i add my own transformation
In this case, i wanna change the color. Ive seen imgaug library has it, but i dont know how to use it with keras
Can someone help?

#

actually

#

is this what i want?

#

when u extend ur custom class from ImageDataGenerator, does it still have the augmentations from ImageDataGenerator?

#

Yes, right?

upper spade
#

hey guys

#

so im planning to pick up ai and machine learning

#

but i have 0 experience with pandas whatsoever

#

or any other library needed

#

what book should i read?

somber prism
#

guys i have this Salary dataset, it has like 30 samples and only one feature, so its shape is (30, 2). i took the last 2 samples as test data. i tried fitting it and when i see the score for training i get 94, but when i see the score for testing i get -131. can someone explain to me why?

somber prism
upper spade
#

oh i see

#

thanks man

frail oak
polar stag
grave frost
#

Brain wrenching question 🧠 : For calculating attention on a multi-sequence input (as in rather than having a single 1D sequence of tokens, we have a 2D/3D array of tokens that all are considered to be a single sequence) is there any such method/technique/research that has been done into this? I can't seem to find relevant stuff.

sterile stream
#

I am trying to learn Machine learning, and I am confused about which path to go down. There is a Coursera course by Andrew Ng, but it does not teach it in python. Then there is a playlist on machine learning by sentdex (the YouTuber).

https://youtube.com/playlist?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

Other than that, there are machine learning modules like TensorFlow, Keras, PyTorch, etc. I am not sure which path to choose for moving ahead. You all are more experienced than me, what do you suggest?

Note: I am not learning this as a hobby. I want to get a master’s degree in Robotics, and my college does not teach any of this stuff (I am an Undergraduate currently).

#

The playlist is symbolic

lapis sequoia
#

is like, different levels

regal trail
sterile stream
old grove
#

Hi guys, this is my dataset and I need to get a new dataset by decade, but I want to add up all the column values in it. Say 1960-1970 is a decade: I want 1960 as the first value in the first column, and in the second column the sum of all values from 1960-1969. The same for 1970: it is the first column value, and the next column has the sum of all values from 1970-1979.

#

I tried groupby

#

googled but i am not getting any results

#

is there any built-in method

#

Like This.... Like in vehicle theft 0 index has sum of values from 1960-1969..Like This

fierce hill
#

You can use .sum() at the end of your line

#

To add the values

lapis sequoia
#

rip fcb

old grove
#

tried groupby.sum

#

but it gives me corresponding value for that value and not sum

lapis sequoia
#

Use for loop in range

#

firstly find the index of last year of each decade using .loc then do sum within for loop

#

that should easily solve your problem

old grove
lapis sequoia
#

Yes

#

Make another dataframe for it

old grove
#

ok... but is there any built-in method using groupby or agg.. anyone know?

#

will write a loop.. but asking in general

lapis sequoia
#

Why do you need to use that?

#

well you can use agg function with lambda

#

but basically it’s same

#

for simple line you could do agg + lambda + for in one line
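
For the built-in route old grove asked about, one possible sketch uses `groupby` with a computed decade key instead of a loop. The frame and the column names `Year` and `Vehicle_Theft` below are assumptions, since the real dataset was only shared as an image:

```python
import pandas as pd

# Hypothetical yearly data; the column names "Year" and "Vehicle_Theft" are assumed.
df = pd.DataFrame({
    "Year": range(1960, 1980),
    "Vehicle_Theft": range(1, 21),
})

# Integer-divide by 10 and multiply back to map each year to its decade start.
decade = (df["Year"] // 10) * 10

# Sum every remaining column within each decade.
by_decade = (
    df.drop(columns="Year")
      .groupby(decade)
      .sum()
      .rename_axis("Decade")
      .reset_index()
)
print(by_decade)
```

Grouping by a Series that is aligned with the frame avoids building a second dataframe by hand; each row of `by_decade` holds the decade start and the summed values for that decade.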

south burrow
#
import pandas as pd
# raw string so the backslash in the Windows path is taken literally
shape = pd.read_csv(r'data.tar\latestdata.tar').shape  # (rows, columns)
print(shape)
old grove
#

but i have to create other frame and append the years and summed values for ranges and append... Thanks anyways 🙂

upper spade
#

hey guys having abit of a roadblock using pandas here

#

print(df.sort_values(['Name','Speed']).iloc[0:15])

#

its sorting Name but not Speed

#

can i know why?

lapis sequoia
#

god this cuda and cudnn are killing me

#

3-5 months ago i tried installing and using them

#

i changed my drivers with them

#

i failed but my drivers were still cuda

#

my c drive storage died after installing them

#

and now when i try to use them i realize they are not updated and i need to reinstall

#

aghhh

lapis sequoia
old grove
lapis sequoia
#

using Google image search api, how can i search for more than 1 param?

#

'imgType': 'lineart|photo',

#

this says it is not allowed

#

but 1 by 1 i can

fierce hill
serene scaffold
# upper spade its sorting Name but not Speed

in what way is it not sorting speed? I'm pretty sure what it's supposed to do is sort by name, and then sort by speed only when two values in Name are equal. All your names are different.

lapis sequoia
#

Yo I always think on weekdays that “I’m so excited to do my side project on the coming weekends” only to realize that I’m a lazy squidward lying on the bed on weekends

raven quiver
serene scaffold
#

@upper spade

>>> df
  letter  number
0      a       5
1      z       6
2      a      10
3      a       1
>>> df.sort_values(['letter', 'number'])
  letter  number
3      a       1
0      a       5
2      a      10
1      z       6
raven quiver
#

lightfm allows an alpha and user_alpha, which are L2 penalties.. but I don't quite get how to get the alpha and beta described in the post

lapis sequoia
#

or even in excel formula it needs something like RANK()+SUMPRODUCT()

serene scaffold
lapis sequoia
#

Ooh

strong dock
#

hello everyone!
I wanted to implement github repo : RankIQA based on Caffe
I am facing trouble in installing it in windows 10
I installed using Anaconda but im unable to import caffe
Can I install Caffe on Google Colab
Pls guide me as I sense the Caffe community is not very active, I commented on issues of the official repo but got no replies.

serene scaffold
strong dock
#

I thought it would be easier

#

I dont have much experience

#

to make it from source

serene scaffold
strong dock
#

should I dual boot ubuntu

#

?

serene scaffold
#

!build

arctic wedgeBOT
#

Microsoft Visual C++ Build Tools

When you install a library through pip on Windows, sometimes you may encounter this error:

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

This means the library you're installing has code written in other languages and needs additional tools to install. To install these tools, follow the following steps: (Requires 6GB+ disk space)

1. Open https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. Click Download Build Tools >. A file named vs_BuildTools or vs_BuildTools.exe should start downloading. If no downloads start after a few seconds, click click here to retry.
3. Run the downloaded file. Click Continue to proceed.
4. Choose C++ build tools and press Install. You may need a reboot after the installation.
5. Try installing the library via pip again.

strong dock
#

thanks will try it

stark zenith
#

I think you can still pip install packages on colab.

lapis sequoia
#

guys, since google api forbids too many requests per day to it, can any of u help me? I am trying to create a big dataset

raven quiver
#

does anyone here work with recommendation engines? I have been trying to increase the fit of my model with various normalizations and it doesn't seem to do anything, was wondering if someone could help me out quick

rotund lily
#

hey i need help making a racial detection robot in python

exotic maple
frank dock
#

Hello! Here's my bachelor's thesis on privacy-preserving federated learning on decentralized data. I have made it open source now, and I would love it if anybody here could try it, give feedback, or contribute in any way. The goal is to make an open source library for doing secure federated learning using different privacy-preserving algorithms in an easy and efficient way.
It was written for Norwegian University of Science and Technology as a part of my degree in Computer Science.
Contact me if you want to know more about the research, and please ⭐ the project if you find it interesting!
https://github.com/dilawarm/federated

GitHub

Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data - dilawarm/federated

humble nest
#

i made an ethereum and dogecoin comparison graph but ethereum seems to be very linear since it has such a high value compared to dogecoin so its hard to compare the two together. what potential improvements should i make on this graph? thanks for the feedback if so

#

also if youre wondering, this is purely experimental, i just wanted something to plot and decided to analyse cryptocurrencies

exotic maple
#

to their corresponding ranges

#

but what exactly do you want to compare? Rate of change per time period?

humble nest
#

i wanna compare their value throughout the past month

exotic maple
#

then compute that change

drowsy stag
humble nest
exotic maple
#

the change is simply change = x - x(-1)

humble nest
#

i plot the data i just need to scale it

exotic maple
#

or percent change

#

think carefully about what you want to do, what exactly do you want to plot

humble nest
#

how do i apply that to my code

#

yeah sorry im just a beginner to data visualization

exotic maple
#

that's ok, but the most important question is -what- is what you want to see

#

everything builds from there

humble nest
#

yep

humble nest
exotic maple
#

but the most important question is -what- is what you want to see

humble nest
#

i wanna see the movement of the crypto

#

when it goes down and when it goes up

exotic maple
#

in relation to what?

humble nest
#

because i cant have it going at constant rate up

exotic maple
#

to itself? to other?, etc

humble nest
#

in relation to doge

#

because if i plot it individually

#

it shows a lot of movement

#

but when its in relation to doge it just goes up at a constant rate

#

shall i show a depiction of what i mean?

exotic maple
#

yes please because im not sure i get it

#

im thinking what you want is a common standard regression but i might be wrong

humble nest
#

one sec

#

oh my bad

#

i meant dogecoin

#

ethereum is always linear, even when plotted individually

#

heres dogecoin when its plotted by itself

exotic maple
#

so you just want to plot them...together? no specifics analysis or anything?

humble nest
#

i mean i do want analysis which is why i want to make it so i can see it going up or down

#

instead of it just going at a straight line

exotic maple
#

your Y axis is a mess in that plot

#

why is it being generated like that

humble nest
#

nah its meant to be like that

#

because dogecoins value is pretty small

slate hollow
#

http://surpriselib.com/ so looking at this framework, it doesn't use gpu right?

exotic maple
#

I'd say your best bet is to calculate the pct_change for each crypto

#

pct_change from x0 to x1 for each crypto, since that would be standardized

humble nest
#

so i have to plot in the y axis value manually?

exotic maple
#

no?

#

just calculate pct_change on each (instead of values) and plot that

humble nest
#

i got it from a csv file

exotic maple
#

you need to calculate pct change then

humble nest
#

dang discord is not loading

#

or i could just analyse another set of data

#

because ethereum has a huge value difference

exotic maple
#

its pretty trivial -> pct-change = (current_value - previous_value) / previous_value
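
A small sketch of that formula in pandas, assuming invented prices and the column names `ETH` and `DOGE` (the real CSV was not shared). It computes the change by hand and with the built-in `DataFrame.pct_change`, which should agree:

```python
import pandas as pd

# Made-up prices on very different scales; the column names are assumptions.
prices = pd.DataFrame({
    "ETH": [2000.0, 2100.0, 1995.0, 2094.75],
    "DOGE": [0.30, 0.33, 0.297, 0.3267],
})

# Manual formula: (current_value - previous_value) / previous_value.
manual = (prices - prices.shift(1)) / prices.shift(1)

# pandas' built-in equivalent; the first row is NaN because it has no predecessor.
returns = prices.pct_change()

print(returns)
```

Because both columns are now unitless fractions, they can share one y-axis, for example with `returns.plot()` if matplotlib is installed.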

humble nest
#

load

#

aight

#

so i just plot the averages?

exotic maple
#

average has nothing to do with pct_change

humble nest
#

dude i told you i am new

#

so no need to have high expectations of me

exotic maple
#

I'm not, i'm just wondering why you mentioned something unrelated

#

as i said, I think I havent quite understood WHAT is what you want to see

humble nest
#

i just needed to know how to make it so i can see 2 different datasets normally without it being too linear

exotic maple
humble nest
#

makes sense

exotic maple
#

if your Y axis was bottom-top you would see ETHs movement

#

like a normal plot...

humble nest
#

thing is

exotic maple
#

but your chart is a weird thing that goes from 2.7k to.... 2.7k?

humble nest
#

and dogecoin has a pretty low price

exotic maple
#

at least ive never seen that

humble nest
#

im analysing dogecoins price

exotic maple
#

do you have experience with financial analysis?

#

covariances? correlations? Betas?, etc?

Python expertise is not related to subject matter expertise.

#

if your financial expertise is solid then you just need to think of what kind of visualization you need for your analysis

humble nest
#

sorry discord is being slow rn

humble nest
exotic maple
humble nest
#

not meant to be financial i just needed something to analyse

#

and decided to use cryptocurrencies as an example

exotic maple
#

Oh so you're just using financial data to learn how to plot in Python? is that correct?

humble nest
#

yup

exotic maple
#

that would have been a lot faster to say lol

humble nest
#

im just learning how to manipulate and plot csv files

exotic maple
#

ok 1st recommendation

#

Learn Pandas

#

what plotting library are you using? Matplotlib?

humble nest
#

yea

#

using matplotlib

exotic maple
#

Do you know Pandas?

humble nest
#

and pandas as well

#

i just used pandas to make it read the csv file so i can manage it

exotic maple
#

ok perfect

#

make a copy of your dataframe so you dont have to read it again in case of errors

#

good, so, before going to visualization, try tampering a bit with your data

#

df2 = df.copy()

#

on that 2nd dataframe, try calculating pct_change for each crypto

#

and plot that, NOT the values

#

you can also find their correlation, etc

#

maybe fit a linear model and calculate the RMSE (root mean squared error) of trying to predict ETH through Doge

humble nest
#

most of the time i was just trying to find the proper csv file to use

#

and i am actually going to do predictions soon but thats for another time

#

for now im doing something as simple as plotting 2 datasets together

exotic maple
#

ok ignore regression

#

do the pct_change thing

#

and plot that

humble nest
#

alright

#

but what percentage should i change it to

exotic maple
#

?

humble nest
#

like

exotic maple
#

you dont set it manually lol

#

you calculate it

humble nest
#

oh

#

then i plot the proportion of the value?

exotic maple
#

no...the pct_change

exotic maple
humble nest
#

like i plot a percentage of the value so it doesnt have a huge value difference

#

so i can actually see the graph move better

#

but i feel like that wouldnt be such a great idea anyways

humble nest
#

what information do i need to put in it

#

actually i wont bother you and try to read the documentations to get a more vivid understanding

exotic maple
#

if something about the documentation is not clear, then you can specifically ask that and it will be much easier to help you 🙂

humble nest
#

aight

#

sounds great

#

shall we dm just in case because i feel like we've flooded the chat way too much

#

anyways gonna go afk

#

oh btw sorry about the confusion, the reason why the y axis is messed up is because its values are made up of the ethereums prices, which is why it keeps going straight

#

guess i might need to set some index on the y axis

grim patrol
#

How do you add a softmax regression layer to an RNN model with autoencoder? using TensorFlow

#

I implemented the class to encode/decode and call

#

but I don't really understand how to take that output and add a layer on top of it

near cosmos
median ember
#

how can I use numpy.isin for 2d arrays? like if I have a [[0,1], [1,2]] and I want to search for [[0,1]] so that it returns [True, False]?

exotic maple
#

Obv you must always check your raw data first 😋

near cosmos
gray plover
#

can i block a response from the package in chatbotAI like it keeps asking about my family n i wanna remove that response

minor grove
#

I have a basic question but not sure exactly how to ask but here goes. I have 2 data sets one set is a set of successful transactions and the others are where I had a failure. The failures don't seem to have a pattern that i can spot visually. How could I figure out what combinations of factors and the order of that combination that led to the failure programmatically or using ML?

minor grove
#

Can anyone help?

velvet thorn
#

!e

import numpy as np

a = np.array([[0, 1], [1, 2], [2, 3]])
b = np.array([0, 1])

print((a == b).all(axis=1))
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

[ True False False]
velvet thorn
velvet thorn
minor grove
#

I am trying to get some experience with it, so this was basically my first project that I chose to try and understand.

#

Mainly trying to find out where to start looking to be able to solve this problem.

velvet thorn
#

i.e. you don't know how to perform EDA, preprocess data, model relationships, etc.

minor grove
#

I am trying to go through a coursera course on ML, but yes the answer is no

velvet thorn
#

hm

#

then I would suggest

#

you work on said course

minor grove
#

That is what I am trying to do

midnight charm
#

Hey, is it possible to use a stacking algorithm on just 3 inputs?

#

Like, I have 3 inputs (predictions) [0.999854, 0.9894, 0.97802734375] and somehow I need to get a better prediction than the best one, in this case 0.999854

eager timber
#

dnd ||read status||

#

hehehehehe

#

gotcha

hoary wigeon
#

any good data analyst here?

#

I'm working with RFM Analysis

#
-------------------------------------
| Quantity |   UnitPrice  | Invoice |
-------------------------------------
|    6     |      10.05   |   I105  |
|   -2     |      10.05   |   I105  |
|    3     |      12.36   |   I107  |
|   -1     |      12.36   |   I107  |
-------------------------------------
hoary wigeon
#

It seems I105 and I107 have returned part of their order

#

so i must count them in monetary analysis ?

lapis sequoia
#

hey anyone know a good docker container for data-science?

grave frost
#

I don't know why I always manage to find weird use-cases that no one ever implements 😦

#

Anyone know anything about computing attention on multi-dimensional sequences?

vital lodge
#

i have a doubt on lstm neural networks.

#

From what I saw lstm is a great algorithm for time series forecast prediction.

#

But when dealing with something like the stock market we can't be sure when the market might go down or go up, then how come lstm makes such accurate stock prices predictions?

grim patrol
# velvet thorn just call that output with another layer

How is that done in practice?
Currently my autoencoder looks like this:

class AnomalyDetector(Model):
    def __init__(self):
        super(AnomalyDetector, self).__init__()
        self.encoder = tf.keras.Sequential([
            layers.Dense(64, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(16, activation="relu"),
            layers.Dense(8, activation="relu")])

        self.decoder = tf.keras.Sequential([
            layers.Dense(16, activation="relu"),
            layers.Dense(32, activation="relu"),
            layers.Dense(64, activation="relu"),
            layers.Dense(79, activation='sigmoid')
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

Do I just make the last decoder softmax activation? The article I'm implementing describes this as a separate step from the autoencoder

arctic wedgeBOT
#

Hey @blissful heath!

It looks like you tried to attach file type(s) that we do not allow (). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

blissful heath
#

How do I turn this text into a table? I opened the TXT with Pandas, but it organized it in a standard way; the big problem is that this file does not have a delimiter. I thought of a logic based on the position where the characters of each line start (example: the main column comes from lines that have their first character in position 8, while the secondary column comes from lines where the characters start in position 15). But I am not able to develop the logic, can someone help me?

#

My goal
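
The position-based logic described here could look roughly like the sketch below. The sample lines are invented (the real file was only attached as an image), and the offsets 8 and 15 are taken from the description and counted from 1; both are assumptions:

```python
# Hypothetical sample built so main entries start at position 8 and
# secondary entries at position 15 (1-based), as described in the question.
raw = "\n".join([
    " " * 7 + "Fruit",
    " " * 14 + "Apple",
    " " * 14 + "Pear",
    " " * 7 + "Vegetable",
    " " * 14 + "Carrot",
])

MAIN_START = 8        # 1-based position of the first character of a main entry
SECONDARY_START = 15  # 1-based position of the first character of a secondary entry

rows = []
current_main = None
for line in raw.splitlines():
    if not line.strip():
        continue  # skip blank lines
    # 1-based position of the first non-space character on this line.
    start = len(line) - len(line.lstrip(" ")) + 1
    if start == MAIN_START:
        current_main = line.strip()
    elif start == SECONDARY_START and current_main is not None:
        # Pair each secondary entry with the most recent main entry.
        rows.append((current_main, line.strip()))

print(rows)
```

The resulting list of pairs can be handed to `pd.DataFrame(rows, columns=[...])`. If the file turns out to be a plain fixed-width table rather than an indented outline, `pandas.read_fwf` may be a simpler fit.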

dusky granite
#

I have converted the model to a .tflite but am unable to predict using it
this is what i used to convert the model

#
```
converter = tf.lite.TFLiteConverter.from_keras_model(newmodel)
tflite_model = converter.convert()

with open('numeric_values-model.tflite', 'wb') as f:
  f.write(tflite_model)
```
and this is how i predicted in colab
```
to_predict=tf.constant(np.array([[2.0,2.0,6.0,6.2]]))
predictions=newmodel.predict(to_predict)
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
for prediction in predictions:
  index=np.argmax(prediction)
  print(SPECIES[index],prediction[index])
```
want to be able to use it with tflite
don't know how to do so
raven knoll
#

Has anyone seen this error before? I am using dask and I'm trying to use the CountVectorizer, but when I fit_transform the data I get an error. I tried a lot of stuff to solve it but no luck

lapis sequoia
#
```
smokingDeaths = fatalities[(fatalities['ICD10 Diagnosis'] == "All deaths which can be caused by smoking") & (fatalities['Sex'].isnull() != True)]
smokingDeathsMaleYears = []

for each in smokingDeaths[smokingDeaths['Sex'] == 'Male']['Year']:
    smokingDeathsMaleYears.append(each)

    
smokingDeathsFemaleYears = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Female']['Year']:
    smokingDeathsFemaleYears.append(each)
    

smokingDeathsMaleValues = []
for each in smokingDeaths[smokingDeaths['Sex'] == 'Male']['Value']:
    smokingDeathsMaleValues.append(each)
    
smokingDeathsFemaleValues = []

for each in smokingDeaths[smokingDeaths['Sex'] == 'Female']['Value']:
    smokingDeathsFemaleValues.append(each)
plt.plot(smokingDeathsMaleYears, smokingDeathsMaleValues, label = "Male")
plt.plot(smokingDeathsFemaleYears, smokingDeathsFemaleValues, label = "Female")```
This is the code I've used to plot the graphs
#

but the lines are not getting plotted on the same scale for some reason

humble nest
#

oh yeah thats similar to what happened to me

#

i still havent found a solution to it

#

the values in the y axis arent in proper order

lapis sequoia
#

@humble nest I found the problem with mine though

#

the Y values were actually strings

#

the problem was fixed when I converted them into integers
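
A minimal sketch of that fix, assuming a small invented frame with a `Value` column that was read in as text (the real `fatalities` data is not shown here). Matplotlib treats string values as categories and places them in order of appearance, which is why the y-axis looks scrambled until the column is numeric:

```python
import pandas as pd

# Hypothetical frame where the numeric column arrived as strings.
df = pd.DataFrame({
    "Year": [2004, 2005, 2006],
    "Value": ["9800", "10200", "9900"],
})
print(df["Value"].dtype)  # a string/object dtype, not a numeric one

# Convert to numbers; anything unparseable becomes NaN instead of raising.
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
print(df["Value"].dtype)
```

Checking `df.dtypes` right after `read_csv` is a cheap way to catch this before plotting.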

humble nest
#

OH

#

no wonder the code wasnt detecting it as being numbers

#

makes sense

#

thanks for the solution btw

boreal summit
#

Hello house, anyone has link to a telegram chatbot code? I was asked to build one for something, so I was hoping I could just edit the code and stuff.

#

Thanks. 🙏🏿

serene scaffold
lapis sequoia
lapis sequoia
serene scaffold
lapis sequoia
#

Right,

#

that does look more readable

#

But yeah I'm just trying to make something work at the moment to reach the deadline,

#

considering the fact that my code is probably not going to be read at all

dapper halo
#

is there a way to index what rows have the same values for X columns and dropping them from a dataframe?

Everything im seeing uses a loop which looks messy and can take ages depending on how large the df is

lapis sequoia
#

I think there is

#

Can't remember what though, but there should be a function for that in Pandas

serene scaffold
#

Please ping me when you do this or I will not know that you have done it.

serene scaffold
# dapper halo is there a way to index what rows have the same values for X columns and droppin...

I'm not sure if you're still here. If you know how to use masks in pandas, this will help you figure it out: https://stackoverflow.com/questions/22701799/pandas-dataframe-find-rows-where-all-columns-equal

arctic wedgeBOT
#

Hey @dapper halo!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

dapper halo
#

ah pooo it didnt like it

lapis sequoia
#

💀

serene scaffold
dapper halo
#

probably why you said as text

#

yeah lmao

lapis sequoia
#

How big is the file

serene scaffold
#

I only need a sample

lapis sequoia
#

but you can always just take like 10 rows

#

work with it

#

apply it on the whole

#

yeah

serene scaffold
#

whatever print(df.head().to_csv()) prints out basically

lapis sequoia
#

Unless you have 300 columns per row for some f'ed up reason

dapper halo
#

Nh,Redshift,Metallicity,Density,N_SiII,N_SiIII,N_SiIV
15,0.25,-2,-1,12,14,13.5
15.5,0.25,-1.5,-2,12,12,13.5
16,0.25,-2,-2.5,12,12,12
16.5,0.25,-3,-1.5,13.75,13,14

@serene scaffold

serene scaffold
dapper halo
#

Man my incompetence is shining today

serene scaffold
dapper halo
#

the N_Six

lapis sequoia
#

N_six?

dapper halo
#

I'd like to ignore Nh, Redshift, Metallicity, and Density.

Only focus on the last 3 or just any grouping

lapis sequoia
#

Which one's N_six

#

what?

serene scaffold
# dapper halo the N_Six
  1. look into DataFrame.eq
  2. pick one arbitrary column and see if the other two are equal to it. since they all have to be the same, it doesn't matter which you pick
dapper halo
#

yeah. I'll set the N_Six to some threshold of 12 or whatever. If all of those columns have the same value the network cant train on em

serene scaffold
#

also look into .all
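A minimal sketch of that idea, assuming the three silicon columns are the ones named in the sample above (the variable names `si_cols`, `all_equal` and `filtered` are made up for illustration). It compares every column to the first one with `DataFrame.eq`, then collapses the row-wise result with `.all(axis=1)` into a boolean mask:

```python
import io

import pandas as pd

# The sample rows posted above, loaded from text instead of a file.
csv_text = """Nh,Redshift,Metallicity,Density,N_SiII,N_SiIII,N_SiIV
15,0.25,-2,-1,12,14,13.5
15.5,0.25,-1.5,-2,12,12,13.5
16,0.25,-2,-2.5,12,12,12
16.5,0.25,-3,-1.5,13.75,13,14
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns to compare; everything else is ignored.
si_cols = ["N_SiII", "N_SiIII", "N_SiIV"]

# Compare each column to the first one, row by row (axis=0 aligns on the index),
# then require every comparison in a row to be True.
all_equal = df[si_cols].eq(df[si_cols[0]], axis=0).all(axis=1)

# Keep only the rows where the columns are NOT all identical.
filtered = df[~all_equal]
print(filtered)
```

No loop is involved, so this should scale to large frames much better than iterating over rows.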

dapper halo
#

thank ya thank ya. Ill check em out

lapis sequoia
#

?

#

No idea what you guys just talked about but I hope it works out for you

dapper halo
lapis sequoia
#

Oh

#

three of the columns make up N_six?

#

I see

dapper halo
#

x just meant which state of silicon. but yes

lapis sequoia
#

I see

#

Makes sense

#

@serene scaffold Is it just me or do you sound like you've worked as a professional recruiter once in your life

late shell
#

Why do people say that one can learn ML/AI without being good at math? That seems like absolute BS. I'm trying to understand what the hell maximum likelihood is for logistic regression and it's taken me hours and I still can't quite wrap my head around it.
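One way to make the idea concrete, as a rough sketch rather than a full treatment: logistic regression assigns each label a probability `p = sigmoid(x · w)`, the log-likelihood adds up `log(p)` for the 1-labels and `log(1 - p)` for the 0-labels, and "maximum likelihood" just means picking the `w` that makes that sum as large as possible. The toy data, learning rate and step count below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """Sum of log-probabilities the model assigns to the observed labels."""
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit(X, y, lr=0.1, steps=500):
    """Plain gradient ascent on the log-likelihood (illustrative, not optimized)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (y - sigmoid(X @ w))  # gradient of the log-likelihood
        w += lr * grad / len(y)
    return w

# Toy data: a column of ones (intercept) plus one feature; labels overlap on purpose.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1, 1, 0], dtype=float)

w_hat = fit(X, y)
print("log-likelihood at w = 0:  ", log_likelihood(np.zeros(2), X, y))
print("log-likelihood after fit: ", log_likelihood(w_hat, X, y))
```

The second number should come out larger than the first: the fitted weights explain the observed labels better than the all-zero starting point does.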

lapis sequoia
#

Not to be too blunt, but yeah you do sound kinda Pro

lapis sequoia
#

Also, I think everybody's good at math

#

they just might not know how to apply it though

#

If you can do Multiplication, Division, Subtraction, Addition

#

your brain's pretty capable of applying mathematical concepts to solve problems

serene scaffold
grave frost
lapis sequoia
#

if you find yourself having to deal with some math in ML/AI

#

just post it on SOF

stark zenith
#

Knowing everyone looks stuff up on SO makes me feel better about having to look up stuff on SO. But it also makes me worry that I do not truly understand what I am doing.

exotic maple
#

but aside from that, most of the math (that i've seen) is just keeping a cool head and figuring out the logic of how things should be

#

for example backpropagation is just partial derivatives (the chain rule) used to minimize the cost function at the end of the NN. Conceptually it's "simple" but applying or building that for M layers of N neurons...ugh

cedar sun
#

hi guys, if any of u are interested in helping me make a data set pls ping me. The Google API doesn't allow too many requests per user, so it will be faster if some of u cooperate. The image data set will be about pokemons. I have a script already

serene scaffold
cedar sun
#

yeah, but u cant do infinite per day

#

using Google Image Search api

lapis sequoia
#

Hi, I'd like to get back in to programming for the purpose of being able to scrape data from websites. I've done a full semester of Intro to Compsci with Python five years ago. What's a good book or course that I can jump into?

cedar sun
bronze skiff
serene scaffold
cedar sun
#

steler ignores me

stuck swallow
bronze skiff
stuck swallow
#

what library is the best for that? i heard opencv is only for rigid objects is that fine?

cedar sun
#

@serene scaffold:(

solar nest
#

hi all!

serene scaffold
cedar sun
#

ah lol

#

okey okey

#

tyvm tho lol

lapis sequoia
#

Hi guys, quick question as I am a bit confused. I want to predict the UnitPrice, which is my target. Should I use StandardScaler and then do one-hot encoding?

solar nest
#

trying to plot some data with python for the first time. it's a CSV file with 4 columns: datetime, a, b and c. i already learned how to work with 2 columns: just add "squeeze=True". but with 4 columns, how can i treat this as a time series and get a line plot with three different lines (which, to make it worse, have totally different maximum values)?

tiny flax
#

You load it into pandas and split the columns into series, so you have 4 variables, one per column:
plt.plot(date, a)
plt.plot(date, b)

#

plt.show()

grave frost
solar nest
#

@tiny flax
i'm interpreting that to mean i should write something like

series = read_csv('data.csv', header=0, index_col=0, parse_dates=True)
pyplot.plot(series[0], series[1])
pyplot.show()

which unfortunately gives KeyError: 0

tiny flax
#

No, that's not a Series, it's a DataFrame

stuck swallow
tiny flax
#

Series is 1D

#

Im on my phone honestly typing code is hard

solar nest
#

yeah i noticed 😛

#

it's alright perhaps someone else will come along

#

ah wait now i'm getting somewhere

#

index_col=0 means that i can just say .plot(data['a'])
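A small sketch of what is going on, assuming the CSV layout described above (datetime, a, b, c); the values are made up. With `index_col=0` the datetime column becomes the index, and the remaining columns are addressed by their header names, which is why `series[0]` raised `KeyError: 0`:

```python
import io

import pandas as pd

# Made-up stand-in for data.csv with the four columns mentioned above.
csv_text = """datetime,a,b,c
2021-06-01 00:00:00,1,1200,400
2021-06-01 01:00:00,5,8000,2048
2021-06-01 02:00:00,9,11500,4000
"""
data = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, parse_dates=True)

print(list(data.columns))  # the column labels are strings, not integers
print(type(data.index))    # the first column was parsed into the index

try:
    data[0]  # looks for a column literally named 0
except KeyError as err:
    print("KeyError:", err)

series_a = data["a"]  # a 1-D Series, indexed by datetime
print(series_a)
```

Each of `data["a"]`, `data["b"]` and `data["c"]` is a Series that carries the datetime index with it, so it can be passed to a plotting call directly.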

grave frost
solar nest
#

each of the .plot() calls returns a Line2D object, which does not understand set_ylim() ... how is the latter now accessible?

tiny flax
#

I turned on my laptop yay

solar nest
#

🙂

tiny flax
solar nest
#

but then how do you access set_ylim after having done line = plt.plot(a)?

tiny flax
#

@solar nest

#

plt.ylim()

solar nest
#

that works but is not specific to each line

#

problem is, a goes from 0 to 10, b goes from 0 to 12000 and c goes from 0 to 4096

tiny flax
#

its a hassle but easier than to set a y limit

#

I think

solar nest
#

huh

#

nono i can't drop them

#

i need to scale them

#

there must be a way

tiny flax
#

like bigger step values?

#

in the graph i mean

solar nest
#

no, multiple, independent y axes ..
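One possible way to get independent y axes in matplotlib is `Axes.twinx()`, which creates a second Axes that shares the x axis but has its own y scale. The sketch below uses made-up data in the ranges mentioned above; the names `ax_a`, `ax_b`, `ax_c` and the spine offset of 1.15 are arbitrary choices. It also touches the earlier `set_ylim` question: limits belong to an Axes, not to the `Line2D` that `plot()` returns, and a line's Axes is reachable as `line.axes`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: a in 0..10, b in 0..12000, c in 0..4096.
idx = pd.date_range("2021-06-01", periods=5)
data = pd.DataFrame(
    {"a": [0, 3, 10, 6, 2], "b": [0, 4000, 12000, 9000, 500], "c": [0, 1024, 4096, 2048, 512]},
    index=idx,
)

fig, ax_a = plt.subplots()
ax_b = ax_a.twinx()  # second y axis, shares the x axis
ax_c = ax_a.twinx()  # third y axis
ax_c.spines["right"].set_position(("axes", 1.15))  # push it outward so it does not overlap ax_b

(line_a,) = ax_a.plot(data.index, data["a"], color="tab:blue", label="a")
(line_b,) = ax_b.plot(data.index, data["b"], color="tab:orange", label="b")
(line_c,) = ax_c.plot(data.index, data["c"], color="tab:green", label="c")

# Each Axes gets its own limits.
ax_a.set_ylim(0, 10)
ax_b.set_ylim(0, 12000)
ax_c.set_ylim(0, 4096)

ax_a.legend(handles=[line_a, line_b, line_c], loc="upper left")
fig.subplots_adjust(right=0.8)  # leave room for the offset axis
```

In an interactive session, `plt.show()` (without the Agg line) would display the figure.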

lapis sequoia
#

Anybody know any similar discord servers for R?

exotic maple
#

perhaps you can try using subplots?

solar nest
#

@exotic maple no, i also later need to group it by day and put each day in a subplot.

near cosmos
slate hollow
#

so this thread covers loc vs iloc

#

is there a similar thread for those two vs the raw index operator

velvet thorn
slate hollow
#

oh ok

tiny flax
#

How do you get the random_state in sklearn?

#

Coz on one random state I see 5% better accuracy so I wanted to get it in a variable

#

Like after training the model

strange oriole
#

i will setup the server rn wait a sec

#

but

simple epoch
#

🗿

strange oriole
#

if you type #tweet

#

it should @ you and say undefined coz i dont have the python server up

#

made a tweet generator

#

#tweet

random aurora
#

#tweet hello

strange oriole
#

@random aurora undefined

#

@random aurora undefined

random aurora
#

😢

#

#tweet hello

strange oriole
#

@random aurora undefined

#

@random aurora undefined

random aurora
#

bruh it not working @strange oriole

strange oriole
#

lol

#

try again

random aurora
#

ok

#

#tweet hello

strange oriole
#

@random aurora undefined

#

@random aurora undefined

random aurora
#

#tweet hello

strange oriole
#

@random aurora helloge man all when I us dorn tweet

dog

gay

conspiracy posting

prayn hfw

#

@random aurora undefined

random aurora
#

YOO!!

#

ITS WORKING

#

cool @strange oriole

dim olive
#

#tweet hello

strange oriole
#

@dim olive hello are ifto you sent her frears esees is mo in make of Keemstar seesing a 12-tee.

#

@dim olive undefined

dim olive
#

!ban 846208222225891329 selfbotting is against discord ToS

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @strange oriole permanently.

dim olive
#

!ban 518944568302108712 it appears you are only here to help your friend with a selfbot. This is against ToS, we do not want this in our community.

arctic wedgeBOT
#

failmail :ok_hand: applied ban to @random aurora permanently.

dapper halo
#

👋

lapis sequoia
#

👋

#

What's selfbotting

#

Are they pretending to be bots?

dim olive
#

it is when you automate your user account in discord. It is against ToS

lapis sequoia
#

?

#

Why would that be against ToS? sounds random as hell

#

Maybe because then it'd be easier to spam though

dapper halo
#

What would automating your user account be useful for?

#

outside of just developing a bot....which its probably not their primary user account so thats just all it is for

inland zephyr
#

Hello i need your suggestion about my previous question about siamese neural network for multiple person face recognition

#

The one idea that comes to mind: instead of using the SNN to compute the similarity directly, I use the NN to extract features (from n variations of each image) and store them in a database. Then, when a new image comes in, I extract its features and check the similarity against all the stored features in parallel?

#

Since, as far as I know, a general NN is also good at producing features
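A rough sketch of the lookup step in that idea, with random vectors standing in for network features (no real model is involved; `identify`, the 128-dimensional size and the 0.8 threshold are all made up for illustration). Stored features are compared to the query with cosine similarity, and one matrix product per person handles all stored variations at once:

```python
import numpy as np

def unit(v):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def identify(query, database, threshold=0.8):
    """Return the best-matching name, or None if nothing clears the threshold."""
    q = unit(query)
    best_name, best_score = None, threshold
    for name, embeddings in database.items():
        # One matrix-vector product compares the query to every stored variation.
        score = float(np.max(unit(embeddings) @ q))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

rng = np.random.default_rng(0)
# Pretend database: 3 stored feature vectors per person.
database = {
    "person_a": rng.normal(size=(3, 128)),
    "person_b": rng.normal(size=(3, 128)),
}

# A "new image" of person_a: one stored vector plus a little noise.
query = database["person_a"][0] + 0.05 * rng.normal(size=128)
# Someone who is not in the database at all.
stranger = rng.normal(size=128)

print(identify(query, database))
print(identify(stranger, database))
```

With real features the threshold would need tuning on held-out pairs; the value here only works because the stand-in vectors are random.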

coral kindle
#

I usually use selenium and BeautifulSoup for webscraping

#

Idk how Scrapy is different

jade chasm
#

Hey guys, we are using Pytorch. After a while, all class probabilities converge to close to 0.99, making the model a random number generator. Anyone know any ways to deal with this?

#

We have tried adding L1 norm by adding the number of parameters to the loss, we have used batch normalization in the linear/convolutional layers and we have added gradient norm clipping

boreal mulch
#

ok

abstract moon
#

Hello everyone. I am new to Python; I have been coding in Java, C++ and on mainframes up till now. I am facing some issues using the pandas package. I know how to do data manipulation in Java and C++ with loops, but in Python that takes a lot of time, so I switched to pandas and it is great! I have 31 rows and 5 columns in an Excel sheet. I want to divide the 15th row by the 1st row, and so on, until the 30th row is divided by the 15th row, and write the output to the same file in the next columns (the next sheet would also do). Could you please help me out?

ripe forge
#

The simplest way that's also stupid fast, I'd say, is to create "shifted" columns in pandas. Take a look at the shift method

#

Then once shifted columns are created, just divide

#

No iteration needed, no loops needed, and then you can save the output as you prefer
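A minimal sketch of the shift approach on made-up numbers. The `offset` of 14 pairs the 15th row with the 1st; that value is an assumption and should be adjusted to whatever pairing is actually wanted. The `_ratio` suffix is also just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Stand-in for the Excel data: 31 rows, 5 columns.
df = pd.DataFrame(
    np.arange(1, 31 * 5 + 1, dtype=float).reshape(31, 5),
    columns=list("ABCDE"),
)

# shift(offset) moves every row down by `offset` positions, so row i lines up
# with the original row i - offset. The first `offset` rows become NaN.
offset = 14
ratios = (df / df.shift(offset)).add_suffix("_ratio")

# Put the ratios in new columns next to the original data.
result = pd.concat([df, ratios], axis=1)
print(result.iloc[12:17])
```

From here `result.to_excel(...)` can write the combined frame back out, provided an Excel engine such as openpyxl is installed.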

abstract moon
#

Thanks @ripe forge for your input. If I understand correctly, I will have to make a copy of my data, shift it down by 15 rows, and then divide the values of one dataframe by the other.

ripe forge
#

Yep exactly. There's a .shift method in pandas that makes this easy

cedar sun
#

Guys, just one thing

#

If i have a pretrained model

#

But i dont have the data set it was trained with

#

But i wanna train it with more augmented data, can i with my own dataset?

ripe forge
#

Yes, that sounds a bit like what we do in transfer learning in any case.

#

The only caveat is, if this original data was also directly relevant to you, then each iteration with the new dataset may erode some of the learnings specific to the older dataset.

cedar sun
#

Yeah, i would like to use the same dataset

ripe forge
#

You can mitigate this somewhat by freezing the initial layers, but I'd suggest doing both: freezing and training for very few iterations.

cedar sun
#

But I don't have it

#

Freezing which ones?

ripe forge
#

Maybe freeze everything except the last layer to begin

cedar sun
#

Lol

#

Mm okey i will try

#

The model is inception

ripe forge
#

Oh. Then the original data isn't directly relevant to you is it?

#

What's the task for your model? Ie what are you trying to predict

cedar sun
#

It is

#

I was trying to predict pokemons

#

But sadly, if the pokemon is recolored, it fails to predict. So I made a generator that extends the Keras one and applies different colors to the image; the idea is to get the NN to focus on the shape too

ripe forge
#

And the original inception data for your model is also on Pokémon?

cedar sun
#

Yes

ripe forge
#

Ah OK

cedar sun
#

Sec, let me see if i find it

ripe forge
#

I'm confused then, how'd you create modified images if you don't have original data?

cedar sun
#

With my own images from the pokemons

#

:D

ripe forge
#

Oh ok. OK then got it, all my original statements apply. Your model performance may deteriorate if you overdo this

#

Maybe you could consider an alternative, convert your input to greyscale and see if the model is able to predict Pokémon from that as is. I suppose this depends on how the original model was trained

#

Ie instead of having the model deal with coloured images, have a preprocessing that deals with it such that the model doesn't have to.
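A small sketch of that preprocessing step in NumPy, assuming images arrive as height x width x 3 RGB arrays. The function name `to_greyscale_rgb` is made up; the weights are the standard BT.601 luma coefficients. The grey channel is copied three times so the result still has the shape a colour-trained model expects:

```python
import numpy as np

def to_greyscale_rgb(image):
    """Collapse RGB to luminance, then repeat it across 3 channels."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights, sum to 1
    grey = image[..., :3] @ weights             # shape: (height, width)
    return np.repeat(grey[..., np.newaxis], 3, axis=-1)

# Made-up 4x4 image with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))

grey_image = to_greyscale_rgb(image)
print(grey_image.shape)
```

Whether the model copes with such input depends on how it was trained, as noted above; this only removes colour as a cue.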

cedar sun
#

Nah, gray scale it fails

#

I tried

ripe forge
#

Fair enough.

cedar sun
#

I bet original data was colored pokemons

ripe forge
#

Then the original must be using colours

cedar sun
#

So it pays attention to colors

#

So i thought about 2 ways

#

The first is retraining with this color modification augmented data

ripe forge
#

It would be ideal if you had original data

cedar sun
#

And the second, modifying the input layer to receive the mask of the pokemon as well

#

As the "shape" of the pokemon

#

I have another nn which returns the mask of an image. It is called u2net

#

Cuz when I was a child, I remember pokemon had something like "who is this pokemon?" and only the shape was shown

#

And i was able to guess it

#

So maybe a nn can too :D

modern beacon
#

is there a module for generating responses to input based on training data?

upper spade
#

yo guys i just finished learning pandas

#

took alot of my brainpower

#

but not sure if i really get it yet

#

is there any sort of project or wtv that i can do

#

to know if i really get it

eager timber
#

hehehe the first thing i notice

cedar sun
#

@serene scaffold hello dude, did u get any reply?

cedar sun
#

I have one question... idk if it will be possible but

#

I am trying to make a pokemon classifier

#

and i have 898 classes

#

but there are pokemons such as Primal or what ever, which are the same but different shape, w/e

#

The thing is i downloaded a model

#

with a 928 classes output, cuz for it, kyogre != primal kyogre

#

so it is on a different class

#

Can i remove the last output layer of this pretrained model and reduce it to 898 classes???

#

no, right?

novel elbow