#data-science-and-ml

velvet thorn
#

wrong part

#

yeah

#

but if it’s not

#

then the brain has the same limitation

#

of course we know @ very small scales

grave frost
velvet thorn
#

the world is not deterministic

grave frost
#

its extremely complicated

velvet thorn
grave frost
#

it can't be mapped as a static function

grave frost
#

in the sense that the structure always changes

#

some connections break

#

some do not

#

its kinda contested BTW, and the research is pretty new. but we do apparently modify the brain in any case over time

velvet thorn
#

of course this happens

#

but

#

the point is that such changes

#

if they are deterministic, they can be modelled through the history of state, which the hypothetical function takes

#

and if they’re not, then those aspects are just random, which can also be theoretically modelled

grave frost
# velvet thorn yesyes

your argument is such that as long as humans do actions, it can be considered a function. if that's the case, then even behavior of quarks is a definite function, its output being the set of coordinates?

#

not everything can be modelled, and a deterministic view of the universe is pretty incorrect. Our brain is far from a function, an idea neuroscientists have often laughed at.

HTM hasn't achieved AGI mostly because its breakthrough ideas are very new, and it's only slowly picking up steam as a research topic. Maybe it won't lead us to AGI, but it's the closest thing we have got - the path with the least error, as compared to DL

cedar sun
#

guys, what are the ways to increase a model acc?
from the most newbie ones to the most advanced

exotic maple
#

Guys, I have a question. What kind of statistical test can I use to determine a categorical feature's importance on a regression task?

I've been reading a bit about it, and a one-way ANOVA (after turning the categorical features into dummies) seems like the most viable approach, but I'd like to be sure.

#

the TL;DR -> What statistic is best to match: Categorical Input -> Numerical Output

#

I tried using sklearn's f_regression and mutual_info_regression, but i'm not confident in the significance of these results
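
One-way ANOVA compares the mean of the numeric target across the levels of a single categorical feature. The groups can be taken straight from the raw column, so dummy encoding is not needed for the test itself. A minimal sketch, assuming SciPy is available; the `region`/`income` frame and its numbers are made up for illustration. Eta squared is added as a rough effect size, since a small p-value alone does not say how much of the target the feature explains.

```py
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical data: one categorical feature, one numeric target.
df = pd.DataFrame({
    "region": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "income": [10, 11, 12, 10, 11,
               20, 21, 19, 20, 22,
               30, 29, 31, 30, 28],
})

# One array of target values per category level.
groups = [g.to_numpy() for _, g in df.groupby("region")["income"]]
f_stat, p_value = f_oneway(*groups)

# Eta squared: share of the target's variance explained by the grouping.
grand_mean = df["income"].mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((df["income"] - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"F={f_stat:.1f}, p={p_value:.3g}, eta^2={eta_squared:.3f}")
```

ANOVA assumes roughly normal residuals and similar variances per group; if that looks doubtful, `scipy.stats.kruskal` is a rank-based alternative with the same calling pattern.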

blazing bridge
#

Hi, I am currently working a deep learning model for image colorization and have a pretty big dataset as well

#

Even though i switched the dataset, the results aren't very good at all

#

I am not sure how I can improve them

#

this is the dataset I am using

#

if anyone know how I can achieve good results please ping me

uncut barn
#
engine = create_engine('sqlite:///data.sqlite')
create_table_from_csv(engine,
                      "country-income.csv", # name of file
                      table_name = "country_income", # give a name to the table
                      fields = [ # all the columns in the csv file
                          ("region", "string"),
                          ("age", "integer"),
                          ("online_shopper", "string")],
                      create_id = True
                     )

How do I load the CSV file using Cubes, and create a JSON file for the data cube model, and create a data cube for the data?

#

ive done this part but dont know where to go from here

gritty spear
#

hi, anyone tried GPT with graph database ?

wintry crescent
#

How do I implement an ANN with Python for image recognition (not letters)?

cedar sun
grave frost
grave frost
gritty spear
#

@grave frost can you please give some references?

strong zephyr
serene scaffold
grand breach
#

i'm thinking of moving my anaconda dir from C to a different drive. if i create a symlink to the old directory after moving which is there in the PATH variable will everything work as expected?

#

or should i backup my base and other env and restore them later after re-installing?

primal tulip
grand breach
#

or maybe just move only my environments to get back some space?

primal tulip
#

You would definitely be able to do it in Linux. If Windows is not acting weird, in theory it all should be fine as well.

primal tulip
grand breach
#

i'm on windows

#

I was clearing up my C and saw my anaconda installation was occupying ~12gb so thought why not move it to other place..i did a conda clean --all to remove some unnecessary files

near cosmos
#

It's been a while for me on windows, but I'm skeptical this will work because windows aliases usually aren't invisible to programs in the way symlinks are (at least ~10 years ago)

#

You might try just moving the cache

primal tulip
#

You can always count that Windows will be weird and quirky, then.

sharp pawn
#

hey guys anyone know how to subtract sine with cosine waves and plot the resultant wave in numpy python?
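
A minimal sketch of one way to do this: sample both waves on the same `x` grid and subtract the arrays element-wise. The grid range and the helper name `plot_difference` are choices made here for illustration.

```py
import numpy as np

x = np.linspace(0, 2 * np.pi, 500)      # shared sample points
difference = np.sin(x) - np.cos(x)      # element-wise subtraction


def plot_difference(x, difference):
    """Plot both input waves and their difference."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    plt.plot(x, np.sin(x), label="sin(x)")
    plt.plot(x, np.cos(x), label="cos(x)")
    plt.plot(x, difference, label="sin(x) - cos(x)")
    plt.legend()
    plt.show()
```

Calling `plot_difference(x, difference)` draws the three curves. The result is itself a sine wave, since sin(x) - cos(x) = sqrt(2) * sin(x - pi/4).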

cedar sun
earnest hawk
#

Hello guys, I have a question. I'm programming a simple application with the mnist_784 dataset. The application should recognize drawn digits. I use SVC, and here is my problem: fitting the model takes a long time. I can wait 1 hour and nothing happens. I tried other models like KNN or a DummyClassifier, but the effect is the same.

#

There is my code

serene scaffold
# cedar sun then what are u asking exactly?

Are you asking how to improve model performance in general? Because if you're asking how to improve the performance of a specific model, I can't guess without knowing what that model is designed to do. What classes does it classify?

#

Don't forget the py

#

!code

earnest hawk
#
from sklearn.datasets import fetch_openml
import pandas
from sklearn.model_selection import GridSearchCV, cross_val_score, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

import py
from inne import jes

mnist = fetch_openml('mnist_784', version=1)

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

X, y = mnist["data"], mnist["target"]
some_digit = X.iloc[999]
some_digit_image = some_digit.values.reshape(28, 28)
plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
y = y.astype(np.uint8)
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
sgd_clf = SVC(gamma='auto',probability=False)
sgd_clf.fit(X_train, y_train)
while 1==1:
    py.Paint()
    obj = jes.init()
    print(sgd_clf.predict([obj]))
    some_digit_image = obj.values.reshape(28, 28)
    plt.imshow(some_digit_image, cmap="binary")
    plt.axis("off")
    plt.show()
cedar sun
serene scaffold
earnest hawk
serene scaffold
earnest hawk
#

oh sorry

serene scaffold
earnest hawk
#

The problem is with training the model. Here:
```py
sgd_clf.fit(X_train, y_train)
```

#

I checked it with a debugger

serene scaffold
earnest hawk
#

This dataset has 70k records. Interestingly, this problem did not occur earlier.
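
Some background that may explain the wait: scikit-learn's documentation notes that `SVC` fit time scales at least quadratically with the number of samples, so 60,000 rows of 784 unscaled pixel values can take a very long time. Two common first steps are scaling the features and fitting on a subset before committing to the full set. The sketch below illustrates both on the small `load_digits` set that ships with scikit-learn, used here only as a stand-in for `mnist_784`; the split sizes are arbitrary.

```py
from sklearn.datasets import load_digits
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Small 8x8 digits set bundled with scikit-learn (stand-in for mnist_784).
X, y = load_digits(return_X_y=True)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Scale pixels to [0, 1] before the RBF kernel sees them.
model = make_pipeline(MinMaxScaler(), SVC(gamma="scale"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

If even a `DummyClassifier` seems to hang, it may be worth timing `fetch_openml` and the `fit` call separately, since the download and parsing step can also be slow.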

lapis sequoia
#

can anybody help me with tensorflow?

grave frost
solid aurora
#

How can I vectorize this:

```py
images = ...  # an ndarray of 3-channel images, with dimensions (batch, height, width, rgb)
images2 = np.zeros(images.shape)
for i, image in enumerate(images):
    images2[i] = skimage.color.rgb2hed(image)
```

#

I know there's a way to run a transformation of each subdimension of an ndarray

#

but I can't recall what it's called

tidal bough
#

scipy.color?..

solid aurora
#

skimage*

tidal bough
#

I can't find any docs on it, huh

solid aurora
#

my bad

tidal bough
#

ah

solid aurora
#

ah

tidal bough
#

by the docs, it only needs the last dimension to be colors - there's no requirement of it being 3d

solid aurora
#

nice catch!

tidal bough
#

so you can cast it on the entire array

acoustic leaf
#

Does tensorflow work with python 3.9? In their official page it says it has been tested with python 3.8

#

but doesn't mention py3.9?

serene scaffold
acoustic leaf
#

so what do I do then? just reinstall python?

serene scaffold
acoustic leaf
#

i am on windows

austere swift
tidal bough
#

install a 3.8 Python, yes. If you're using something like pyenv (IIRC) you can even manage them automatically

austere swift
#

so it should work

acoustic leaf
#

I don't know how pyenv works tho. should I just download anaconda?

serene scaffold
acoustic leaf
#

so I can have 2 installations at once and switch between them with the py command?

austere swift
#

yes

acoustic leaf
#

so I am guessing it changes the default python system-wide?

acoustic leaf
#

thanks for the help guys. I guess I'll be looking into it.

static granite
#

hello. I want to write a program to detect vehicles in traffic.
Does anyone know any good books on this subject? The field of computer vision in general is also relevant

serene scaffold
static granite
#

live feed from the car

#

for example

serene scaffold
#

That isn't something that I know about, but "dashcam footage detect vehicles machine learning" might be a good Google query. But hopefully someone else who knows about that topic will show up here.

static granite
#

it doesnt have to be specifically that

#

im looking for books about image classification in general

lapis sequoia
#

Hey there everybody! I have a little question, do you absolutely need to be an advanced Python programmer (knowing beginner-level programming from A to Z, like lambda functions, sort(), etc., plus OOP, socket programming, concurrent programming, data structures & algorithms) to learn AI, ML & DL, or do you only need to learn the core concepts like the basics and OOP?

grave frost
grave frost
worn bough
# lapis sequoia Hey there everybody! I have a little question, do you absolutely need to be an a...

These are some advanced topics you're mentioning. Most of them don't relate to ML&DS in obvious ways. If you master the basics (I'd say the content of Automate The Boring Stuff) then you're ready to learn numpy and pandas and after that sklearn and pytorch/tensorflow. This might be an unpopular opinion among thoroughly trained people but the libraries keep getting easier to use and you learn most by just applying it in real life

grave frost
iron basalt
sleek sorrel
#

Hello guys, I am having problems pivoting a table with pandas in the Kiwi help room. Can someone help please?

flint mason
#

Is it worth it to learn excel for data analysis?

iron basalt
#

You should also have a solid grasp of the computational complexity of various data structures / algorithms to make an informed decision on whether something is a viable option or requires too much compute (even if you never use the specific structures / algorithms studied, just get used to estimating how fast things are).

thorn bobcat
#

anyone heard of background matting?

spare vale
serene scaffold
# flint mason Is it worth it to learn excel for data analysis?

you need to be able to work with tabular data in general, so if you were to learn excel, you'd probably learn a lot of the terminology surrounding data manipulation. But I only use excel (well, google sheets in this case) to put data on my work's google drive for my coworkers to look at.

thorn bobcat
#

thinking of introducing a gan to cut human interference since the original paper required human guidance.

#

also object detection to highlight the object rather than having a user create a box

gritty spear
#

Hi, I have been reading for the past 3 weeks now but i'm still at a loss. Can anyone please guide ?

cedar sun
#

eeeehm guys one thing, when using ImageDataGenerator

#

How many new imgs are returned?

#

or how can i control it?

gritty spear
#

i'm trying to use available libraries to generate texts from existing keywords. I have a few texts I want to feed in for the learning process but I'm a bit at a loss on where to start

cedar sun
#

use gpt3

gritty spear
#

i'm told GPT3 is not accessible to the public yet.

thorn bobcat
#

Use GPT2 then

serene scaffold
gritty spear
#

i mean the content quality

serene scaffold
gritty spear
#

what's the url for the api?

cedar sun
#

stelercus, do u have by chance any snippet using albumentations?

radiant kayak
#

Hello

serene scaffold
cedar sun
#

dw

#

from here

#

what augmentations do u think will be usefull?

visual violet
#

hello guys

#

i am stuck (again)

#

suppose i have this

#

the "prediction" column determines which cluster each object belongs to

#

so 0 means cluster 1

#

1 means cluster 2, and so on

#

now i want to see whether the object's classification has anything to do with its being clustered

#

but i have no idea how to proceed

#

but it seems a bit weird

serene scaffold
#

I guess that's fine, actually

visual violet
#

you see the annoying thing about drugs is

#

one ingredient can treat different things lmao

#

and there are sub divisions of classes

#

like there is an umbrella class, a sub class, an even subber class

serene scaffold
#

@visual violet can you do print(df.sample(axis=0, n=7).to_csv())?

#

remember that I can't do anything with screenshots

visual violet
#

very true

#

,Ingredient,predictions,classification
1397,NABUMETONE,0,Nonsteroidal Anti-inflammatory Drugs (NSAIDs)
1738,PROPRANOLOL HCL,0,propranolol hydrochloride
1801,LEVETIRACETAM,0,Seizure Disorders
733,DULOXETINE HYDROCHLORIDE,0,"Analgesics, Centrally-acting Nonopioid; Anxiolytics, Non-benzodiazepines; Fibromyalgia; Neuropathy/Neuralgia; Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs)"
1773,OXYBUTYNIN CHLORIDE,0,"Antispasmodics, Urinary"
1229,LABETALOL HYDROCHLORIDE,0,Beta Blockers
16,ZOLMITRIPTAN,2,Headache/Migraine

#

i hope this works

serene scaffold
#

this way you can see the frequency of each class in each cluster.
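
The one-liner being discussed is not preserved in this log. One common way to get the frequency of each class in each cluster is `pd.crosstab`, sketched below as an assumption about what was meant. The first three rows come from the sample posted above; the last two are hypothetical rows added so that one class appears twice.

```py
import io

import pandas as pd

csv_text = """,Ingredient,predictions,classification
1397,NABUMETONE,0,Nonsteroidal Anti-inflammatory Drugs (NSAIDs)
1229,LABETALOL HYDROCHLORIDE,0,Beta Blockers
16,ZOLMITRIPTAN,2,Headache/Migraine
9001,EXAMPLE DRUG A,0,Beta Blockers
9002,EXAMPLE DRUG B,1,Headache/Migraine
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Rows: classification, columns: cluster label, cells: number of ingredients.
counts = pd.crosstab(df["classification"], df["predictions"])
print(counts)
```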

visual violet
#

sheesh did you just do everything in one line of code

serene scaffold
#

but it's your job as the human to speculate as to why your feature selection resulted in which class ending up in which cluster.

visual violet
#

yeah i used selenium to get the classification hehe

#

even though there are 84 None values

serene scaffold
#

selenium?

#

wat

visual violet
#

i just asked you about it in the morning lol

serene scaffold
#

yeah but all I understood out of that was that you wanted to handle exceptions

visual violet
#

oh right

#

the ultimate goal is to find subclasses for each ingredient

#

which i have finally done

serene scaffold
#

good job!

visual violet
serene scaffold
visual violet
#

i tried it it does work

#

so i took a look at your code

#

i mean one line of code

#

it does put the number of classification for each cluster

#

but it doesn't sort for each cluster

#

for example

#

i know you can't read screenshot but i can't do anything else :((

#

how can i put the tricyclic on top of acne

serene scaffold
visual violet
#

it seems like tricyclic appears 32 times and acne appears 10 times

#

in cluster 0

serene scaffold
visual violet
#

do i run this code first

#

or the other one first

serene scaffold
visual violet
#

oh right

#

i have a non-python question

serene scaffold
#

what is the meaning of life? idk

visual violet
#

suppose i am reading a research paper and i like a piece of evidence

visual violet
#

but that evidence is linked to another research paper

#

do i cite the research paper i am reading

#

or the original source

serene scaffold
#

depends on what claim is made in the paper you're citing

visual violet
#

how about the evidence is just a fact

#

for example, in the research paper, Americans eat 3 cheeseburgers a day (python committee)

#

do i cite python committee?

serene scaffold
visual violet
#

yes

serene scaffold
#

then yes

visual violet
#

oh man

#

i like how in the background info section

#

the author links to many other research papers

serene scaffold
#

@visual violet you didn't save the result of the first statement to a variable

#

So it got thrown away. Pandas rarely modifies the source data.

visual violet
#

i thought the ingredient_list is already modified

#

my bad

serene scaffold
#

Nope. You should save it to a variable with a different name.
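
The statement in question is not shown in this log, so the sketch below assumes it was something like `sort_values`. Most pandas methods return a new object and leave the original untouched, which is why the result has to be assigned. The frame contents are hypothetical.

```py
import pandas as pd

ingredient_list = pd.DataFrame(
    {"Ingredient": ["B", "A", "C"], "predictions": [1, 0, 2]}
)

# The sorted copy is created and immediately thrown away.
ingredient_list.sort_values("Ingredient")

# Assigning the result keeps it; the original frame is still unsorted.
sorted_ingredients = ingredient_list.sort_values("Ingredient")

print(ingredient_list["Ingredient"].tolist())
print(sorted_ingredients["Ingredient"].tolist())
```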

visual violet
#

this maybe too much to ask

#

but you are sorting on the first column

serene scaffold
#

Ya

visual violet
#

can you sort over all three?

serene scaffold
#

It is.

visual violet
#

i mean it won't be a complete beautiful table like that

serene scaffold
#

It sorts on the first column, then the second, then the third

visual violet
serene scaffold
#

Looks right to me.

visual violet
#

i am glad that each cluster doesn't have the same classification popularity

#

i know what i just wrote doesn't make much sense to anybody

serene scaffold
#

So, you're glad that the clusters are largely disjoint with respect to what classes the instances have.

visual violet
#

oh yes

#

english has rejoined the chat

#

this is potentially very meaningful

#

so if your code does what you say

#

then the biggest count in column '2' is 18?

serene scaffold
#

My code always does what I say

visual violet
#

there is no bigger number

serene scaffold
#

That row is there because of the 40 in the zero column

#

If you want it to sort by the maximum value of each row, that's different

visual violet
#

is there a way to just break the column '2' away along with the ingredient and sort it by itself

serene scaffold
#

You can select only those rows where the two column is not 0

#

With loc
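
A sketch of the three options mentioned in this exchange, on a made-up count table (class names and most numbers are hypothetical; only the column labels 0, 1, 2 follow the discussion): sorting on the columns in order, sorting by each row's maximum, and pulling out column `2` on its own with `loc`.

```py
import pandas as pd

counts = pd.DataFrame(
    {0: [40, 32, 10, 0], 1: [0, 3, 0, 45], 2: [18, 0, 0, 7]},
    index=["class A", "class B", "class C", "class D"],
)

# Sort on column 0, then 1, then 2 (later columns only break ties).
by_columns = counts.sort_values([0, 1, 2], ascending=False)

# Sort by the largest value found anywhere in each row.
row_max = counts.max(axis=1)
by_row_max = counts.loc[row_max.sort_values(ascending=False).index]

# Keep only rows with a non-zero count in column 2, sorted on that column.
cluster_two = counts.loc[counts[2] != 0, 2].sort_values(ascending=False)

print(by_columns, by_row_max, cluster_two, sep="\n\n")
```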

#

Anyway I must go to sleep

visual violet
#

good night!

serene scaffold
#

Bye!

lapis sequoia
gritty spear
#

how do i feed in my custom text model for training for GPT and BERT ?

ashen sable
#

guys i install tensorflow using pip and then when i imported it it gives me this error

```
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
```
#

any explanation ?

eager cradle
#

how make columns wider

#

wo dat just looks ugly

velvet thorn
#

try sep=';'

eager cradle
#

I know sep, but I used dat only in array, how it should look here 🤔

winged stratus
#

i think you have delimiter issues in the csv file
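
The screenshot is not available here, so this is only a guess at the symptom: when a file uses `;` as its separator and is read with the default `,`, pandas puts every row into one wide column. A minimal sketch with made-up data:

```py
import io

import pandas as pd

raw = "name;age;city\nAna;31;Lisbon\nBo;27;Oslo\n"

# Default separator is ",", so each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing the real separator splits the fields as intended.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape, right.shape)
```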

winged stratus
eager cradle
#

@winged stratus thx a lot!

nova tapir
#

why does my bot say 'goodbye' to everything I say?. how can i fix that ?

arctic wedgeBOT
nova tapir
#
{"intents": [
  {"tag": "greetings",
  "patterns": ["hello", "hey", "hi", "good day", "greetings", "what's up?", "how is it going?"],
  "responses": ["Hello", "Hey!", "What can I do for you?","Hi", "Good day", "Greetings", "what's up?"]
  },

  {"tag": "goodbye",
  "patterns": ["cya", "See you later", "Goodbye", "I am leaving", "Have a good day", "bye", "cao", "see ya"],
  "responses": ["Sad to see you go :(", "Talk to you later", "Goodbye!","Bye","cao","cya","see ya","bye bye"]
  },

  {"tag": "age",
  "patterns": ["how old", "how old is trojan", "what is yor age", "how old are you", "age?"],
  "responses": ["My owner Trojan is 17 years old!", "17 years!", "I am 1"]
  },

  {"tag": "name",
  "patterns": ["what is your name", "what should I call you", "whats your name?", "who are you?"],
  "responses": ["You can call me Jane!", "I'm Jane!", "I'm AI Assistant of trojan!", "My name is Jane"]
  },

  {"tag": "hours",
  "patterns": ["When are you guys open", "what are your hours", "hours of operation"],
  "responses": ["24/7"]
  }

]}

here is the json file

vivid echo
#

Hey guys

#

Can you suggest me a walkthrough for PyTorch

#

I have experience in Keras and some Tensorflow 2.0 for deep learning

#

But I have not done any convolutional unsupervised systems

#

Only classification and regression supervised

gritty spear
#

@nova tapir how do you custom-train your models?

nova tapir
gritty spear
#

i'm trying to do something similar, but i'm still new in the field. reading few resources but still confused

#

I have custom text which i want to train in order to spin them to provide a new text in the same context

#

@nova tapir do you have few resources i can follow ?

jaunty yoke
#

Hello, does anyone here know of any packages that will allow me to categorise words?

#

For example, I can search for all prepositions in a list and it will return this

serene scaffold
#

otherwise, you need to know what word categories you have in mind.

#

(which happens to be my area of expertise, insofar as I can be considered an expert on anything)

cedar sun
#

can i use gpt3 as user?

#

like, normal user?

#

not the model

haughty wharf
#

Hi,

Is it possible to use a scatterplot on a pandas dataframe with 256 columns on the x-axis and having a 5 column identifier on the y-axis?

#

Here is what the dataframe looks like

lapis sequoia
#

if you've ever heard of matplotlib and pyplot and used them, what would you prefer on the basis of user friendliness ?

#

or even pls state the reason for superiority of one over other if more factors other than user-friendliness do exist and are important

lunar bison
#

Hello

I'd just like to let you all know i'm a lying, racist, and sexist scammer. I give no regard to anyone else, I'm just a idiotic kid who doesn't know crap and acts like a big man. I constantly spam slurs, and I love scamming people. I also violated multiple discord's terms of service. I'm a piece of garbage.

Please spread the word. I'm a scammer.
This is my ID: 841420280425611265

This account was hijacked by someone I scammed. They're the ones posting this. Deal with caution when dealing with me. Have a good day

lapis sequoia
#

bruh

haughty wharf
#

@lapis sequoia
So I've been trying to use matplotlib, but when i try running this in a jupyter notebook, the cell just freezes

for x, col in enumerate(phoneme_df2.columns):
    for y, ind in enumerate(phoneme_df2['g'].index):
        if phoneme_df2.loc[ind, col]:
            plt.plot(x, y, 'o', color='red')
            
plt.xticks(range(len(phoneme_df2.columns)), labels=phoneme_df2.columns)
plt.yticks(range(len(phoneme_df2)), labels=phoneme_df2['g'].index)
    
plt.show()
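
A likely reason for the freeze is that every `plt.plot(x, y, 'o')` call creates a separate artist, and a frame of about 4509 rows by 256 columns can produce over a million of them. A sketch of a vectorized alternative: find all truthy cells at once with `np.nonzero` and draw them with a single `scatter` call. The tiny frame and the helper name `plot_nonzero` are hypothetical, and the label column `g` is dropped before looking for non-zero cells, which the original loop did not do.

```py
import numpy as np
import pandas as pd

# Hypothetical stand-in for phoneme_df2: feature columns plus label column "g".
phoneme_df2 = pd.DataFrame(
    {"x.1": [0.0, 1.2, 0.0], "x.2": [3.1, 0.0, 0.5], "g": ["aa", "sh", "iy"]}
)

features = phoneme_df2.drop(columns="g")
rows, cols = np.nonzero(features.to_numpy())   # positions of all non-zero cells


def plot_nonzero(rows, cols, features, labels):
    """Draw every non-zero cell as one red dot using a single scatter call."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    plt.scatter(cols, rows, color="red", s=10)
    plt.xticks(range(features.shape[1]), labels=features.columns, rotation=90)
    plt.yticks(range(len(features)), labels=labels)
    plt.show()
```

`plot_nonzero(rows, cols, features, phoneme_df2["g"])` then draws the figure. With thousands of rows the y tick labels will overlap, so it may be better to leave them out.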
#

ohhh woops thought you were speaking to me

lapis sequoia
fiery pollen
#

hello everyone, i have a case and my time is very limited can someone help me? I would be really happy if you do this, if anyone thinks about it, can you write it privately?

ivory jewel
#

Hello everyone, if someone can help me understand a piece of code a bit more, I'd be extremely grateful.
I'm short on time, I have a thesis to write...
I'm dealing with CNN, I already have the architecture and the code, I would like some clarifications so I can make correspondences between the two (thanks in advance)

gritty spear
#

any insight?

misty flint
#

data viz is pretty cool

#

highly recommend cole's storytelling with data book

ivory jewel
#

I actually have a dataset with 3197 features, here are the 5 first rows. I found a code that I would like to understand. The model structure comes as :

  • Input layer;
  • 1D convolutional layer, consisting of 10 2x2 filters, L2 regularization and RELU activation function;
  • 1D max pooling layer, window size - 2x2, stride - 2;
    -Dropout with 20% probability;
    -Fully connected layer with 48 neurons and RELU activation function;
    -Dropout with 40% probability;
    -Fully connected layer with 18 neurons and RELU activation function;
    -Output layer with sigmoid function.
#

I'm having trouble counting the number of layers..

#

The code :

    # Architecture
    model = Sequential()
    model.add(Reshape((3197, 1), input_shape=(3197,)))
    model.add(Conv1D(filters=10, kernel_size=2, activation='relu', input_shape=(n_features, 1), kernel_regularizer='l2'))
    model.add(MaxPooling1D(pool_size=2, strides=2))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(48, activation="relu"))
    model.add(Dropout(0.4))
    model.add(Dense(18, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
thorn bobcat
#

Anyone worked with background_matting v2?

ivory jewel
#

My question is: what do the lines `model = Sequential()` and `model.add(Reshape((3197, 1), input_shape=(3197,)))` mean, and how can I calculate the number of outputs given the structure of the filters (the Conv1D for example)?

lapis sequoia
#

Is there any good course for mathematics for machine learning?

ripe forge
#

Model.add adds whatever layer is given to form the architecture layer by layer

#

Reshape must simply be a layer for reshaping the input received

grave frost
grave frost
#

from code tho, you have 8 layers. parameters would be printed out by model.summary()
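
For the output sizes asked about above, the arithmetic can be done by hand and compared against `model.summary()`. The sketch assumes the Keras defaults that the posted code does not override (`padding='valid'` and stride 1 for `Conv1D`). `Reshape((3197, 1))` only adds a channel axis, turning each sample from 3197 numbers into 3197 steps with 1 channel, which is the `(steps, channels)` layout `Conv1D` expects. In the code the kernel is a window of length 2, not 2x2.

```py
n_features = 3197
kernel_size, n_filters = 2, 10
pool_size, pool_stride = 2, 2

# Conv1D, 'valid' padding, stride 1: one output per full window position.
conv_len = n_features - kernel_size + 1                  # 3196 steps, 10 channels
conv_params = (kernel_size * 1 + 1) * n_filters          # weights + bias per filter

# MaxPooling1D halves the number of steps; Dropout changes nothing.
pool_len = (conv_len - pool_size) // pool_stride + 1     # 1598 steps

# Flatten joins steps and channels into one vector.
flat_len = pool_len * n_filters                          # 15980 values

dense48_params = flat_len * 48 + 48
dense18_params = 48 * 18 + 18
output_params = 18 * 1 + 1

total_params = conv_params + dense48_params + dense18_params + output_params
print(conv_len, pool_len, flat_len, total_params)
```

Reshape, MaxPooling1D, Dropout and Flatten have no trainable parameters, so only the convolution and the three dense layers contribute to the total.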

mint palm
#

what does this error mean

#

upper part of code :

serene scaffold
#

@mint palm please provide text as text

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

that being said, \mathrel{+} is not valid Python code, so something else must have been intended.

visual violet
#

my life is a lie

#

the methods to find k values for k-means are all different

#

i am so confused

coral kindle
#

Hello, I wanted to know what regularization methods were.

#

Methods to prevent overfitting like LASSO and Ridge?

acoustic leaf
#

How do I disable the cudart64_110.dll not found errors in Tensorflow?

#

btw I know my GPU isn't cuda-enabled. I just want to suppress these warnings
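
TensorFlow's C++ log output is filtered by the `TF_CPP_MIN_LOG_LEVEL` environment variable, which has to be set before `tensorflow` is imported for the first time. A minimal sketch; that the cudart message is logged at warning level, and is therefore hidden by `"2"`, is an assumption here.

```py
import os

# "0" shows everything, "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```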

cedar sun
#

do u know where can i apply cutmix augmentation?

low venture
#

Hello everyone , I'm learning image treatment with python and I would like to know if I can change the color of a specific pixel.
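
Yes: once an image is loaded as a NumPy array (which is what libraries such as OpenCV or Pillow with `np.asarray` give you), a pixel is just an array element indexed as `[row, column]`. A minimal sketch on a small synthetic RGB image; the size, the coordinates and the darkness threshold of 50 are arbitrary choices for illustration.

```py
import numpy as np

image = np.full((4, 6, 3), 255, dtype=np.uint8)   # white 4x6 RGB image

# Change single pixels: index is [row, column], value is (R, G, B).
image[1, 2] = (0, 0, 0)
image[2, 3] = (0, 0, 0)

# Recolor every dark pixel at once with a boolean mask.
dark = (image < 50).all(axis=-1)                  # shape (4, 6), True where dark
image[dark] = (255, 0, 0)                         # paint those pixels red

print(image[1, 2], int(dark.sum()))
```

The mask approach is the usual way to recolor a whole shape, such as a dark digit on a light background, without looping over pixels.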

quasi sparrow
#

Guys, I have a question, can anyone help me out please?
I am uploading a dataset from a csv file and I convert it to a pandas frame but it gets loaded as an object of datatype int64 but I need datatype of int32 for my model
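
A minimal sketch of two ways to get `int32` columns, using a made-up two-column CSV: convert after loading with `astype`, or ask `read_csv` for the dtype directly.

```py
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text))          # integer columns default to int64

df32 = df.astype("int32")                        # convert after loading
df32_on_read = pd.read_csv(io.StringIO(csv_text), dtype="int32")  # or while loading

print(df32.dtypes)
```

`astype` also accepts a dict such as `{"a": "int32"}` when only some columns should change.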

low venture
#

How can I change the color for the numbers 2?

thorn bobcat
#

to detect pixel values in number 2

#

looking for second opinion on this.

visual violet
#

. A
sequence composed of a series of nominal symbols from a
particular alphabet is usually called a temporal sequence, and
a sequence of continuous, real-valued elements, is known as a
time-series

#

i have 0 idea what this means

#

isn't it just y value over time?

#

i am very confused

thorn bobcat
#

Wo' ooh bo' ooh

visual violet
#

yes?

haughty wharf
#

Can someone here help me with trying to implement an LDA model on my dataset for a scatter plot?

I was following this Youtube video that used np.where to show separation between two classifications but the current data that Im working with has 5. The part of the code that's commented out, was the original code in the video.

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(phoneme_df2['g'] != 'aa' | phoneme_df2['g'] != 'dcl' | phoneme_df2['g'] != 'iy'
                     phoneme_df2['g'] != 'sh' | phoneme_df2['g'] != 'ao')

 #np.where(y_prob_lda > .5, 1, 0)
#

The dataset itself has 256 features, about 4509 instances, and 5 classifications
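
The `np.where(y_prob_lda > .5, 1, 0)` line from the video is a two-class trick: it thresholds the probability of one class. With five classes, `lda.predict` already returns the most likely label for each row, and `lda.transform` gives low-dimensional coordinates that can go straight into a scatter plot. A sketch on synthetic data, assuming scikit-learn; the cluster centers, sample counts and 8 features are made up, only the five label names come from the code above.

```py
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
labels = np.array(["aa", "ao", "dcl", "iy", "sh"])

# Synthetic stand-in: 40 samples per class around 5 well separated centers.
y = np.repeat(labels, 40)
centers = rng.normal(scale=5.0, size=(5, 8))
X = np.repeat(centers, 40, axis=0) + rng.normal(size=(200, 8))

lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X, y)

y_pred = lda.predict(X)      # one of the five labels per row
X_2d = lda.transform(X)      # 2 discriminant axes, usable as scatter x/y

print(X_2d.shape, (y_pred == y).mean())
```

Plotting `X_2d[:, 0]` against `X_2d[:, 1]`, colored by `y` or `y_pred`, then shows how well the classes separate.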

visual violet
#

your code looks complicated

mint palm
mint palm
#

i wanna do this

#

but how?

mint palm
#

`n_h, n_w, c` means `height, width, channels (for RGB and all)`

#

m means no. of examples

main fox
#

What are some interesting data sources with regularly updated data?
Like yahoo finance

nova tapir
#
import numpy as np 

def nonlin(x, deriv=False):
    if(deriv == True):
        return (x * (1-x))

    return 1 / (1+np.exp(-x))


X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])


y = np.array([[1],
              [0],
              [1],
              [1]])

np.random.seed(1)

syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

for j in range(60000):

    l0 = X 
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))

    l2_error = y - l2

    if (j % 10000) == 0:
        print("Error: " + str(np.mean(np.abs(l2_error))))

    l2_delta = l2_error * nonlin(l2, deriv = True)
    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * nonlin (l1, deriv = True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print("Output after training")
print (l2)
#

which is the correct visualization of this neural network code?

bold timber
#

How to convert type of string to integer?
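
For a single value, the built-in `int()` does the conversion. For a whole column, `pd.to_numeric` is a common choice. A minimal sketch with made-up values; `errors="coerce"` turns anything unparseable into NaN instead of raising.

```py
import pandas as pd

number = int("42")                                # single string to int

column = pd.Series(["1", "2", "x"])
converted = pd.to_numeric(column, errors="coerce")  # "x" becomes NaN

print(number, converted.tolist())
```

Because NaN is a float, a column with failed conversions ends up as `float64`; drop or fill those rows before casting to an integer type.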

primal tulip
primal tulip
indigo pelican
#

has anyone worked with TF-hub before? I tried 2-3 models but I have no idea what is the output they give me

sleek iron
#

Hi guys, is there any dedicated channel for opencv python?

fiery pollen
#

Hi I have 15 columns but describe give me just 1 column and other techniques too(graph correlation etc) how can i solve this problem?

bold timber
near summit
gritty spear
#

Hi, i need guidance. I want to generate text from a series of keywords. I want to train the model on a bunch of texts I already have. I came across huggingface but i don't know which procedure to follow. Are there any writeups that can help set me on the path? How do i turn the text into training data for a model?

ebon pier
#

hello i want to #ask
how to deal with he's and has for text preprocessing?

floral sky
#

hey i want to do some machine learning using tensor linear regression. but my problem is that i dont know how i can convert my string files into int

serene scaffold
fiery pollen
#

Yep I solved but I have new problem now

#

this is my date format

serene scaffold
#

I can't look at both of these screenshots at the same time, so please provide the data as text.

fiery pollen
#

well this is my date format

visual violet
#

another day another struggle to cluster

fiery pollen
serene scaffold
fiery pollen
#

ValueError: time data '5.03.2021 00:00' does not match format '%d-%m-%Y %H:%M' (match) and value error

serene scaffold
# fiery pollen

This screenshot has hyphens in it rather than periods. Note that if you need any additional help with this, I will not read any more screenshots of text.

fiery pollen
#

df['DateDim[Day]'] = pd.to_datetime(df['DateDim[Day]'], format='%d-%m-%Y %H:%M:%S')

#

df['DateDim[Day]'] = pd.to_datetime(df['DateDim[Day]'], format='%m-%d-%Y %H:%M') or this

serene scaffold
#
'%d-%m-%Y %H:%M:%S'  # This has hyphens where there should be dots
'5.03.2021 00:00'    # The actual strings you're trying to match has dots
#

I can't guess if 5 is a day or a month, as 5/3 and 3/5 are both valid day-month pairs.

fiery pollen
#

5 is day and 03 is month

serene scaffold
#

See if you can solve the problem with the hyphens.
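
Putting the two hints together (dots as separators, day before month), a format string that matches `'5.03.2021 00:00'` can be sketched like this. The second sample value is made up.

```py
import pandas as pd

dates = pd.Series(["5.03.2021 00:00", "17.11.2021 13:30"])

# Day.month.year with dots, then hours:minutes (no seconds in the data).
parsed = pd.to_datetime(dates, format="%d.%m.%Y %H:%M")

print(parsed)
```

The separators and fields in `format` have to mirror the strings exactly, which is why the hyphenated and `%S` variants raised `ValueError`.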

fiery pollen
#

yess you are great

ebon pier
#

I thought you were talking with yourself

visual violet
#

i think i graphed it wrong

#

can somebody please help?

#
0    0.0    0.350547    -0.389165    0.031171    0.131560    0.101988    0.012384    0.118384    -0.326644    0.159515    0.641205    -0.287578    0.295131    0.049982    -0.453871    -0.058566    0.084067    0.033252    -0.087150    -0.026975
1    0.0    -0.063362    0.228691    -0.177655    -0.035891    -0.194385    -0.225461    0.085287    -0.112722    -0.190376    -0.319231    0.316168    0.297476    -0.222511    -0.161768    -0.022497    -0.107356    0.343189    -0.142414    0.157067
2    0.0    -0.525495    0.044349    0.259054    0.032564    0.017787    0.109994    0.617328    1.539279    -0.704796    -0.155155    0.132843    -0.039865    -0.213152    0.298412    -0.391566    -0.107134    -0.313010    -0.238712    -0.138868
3    0.0    0.294616    -0.146452    -0.010603    -0.289189    0.518459    -0.348416    0.174120    0.197173    -0.207225    -0.202068    -0.067731    -0.098195    0.377949    -0.284327    0.140845    0.179972    -0.269980    -0.163283    0.055986
4    0.0    -0.286758    0.176156    -0.045746    -0.031385    -0.361086    0.691218    -0.348555    0.612737    -0.376248    0.030953    -0.105264    0.176193    -0.208051    0.025628    -0.079569    0.342263    -0.220916    0.133213    -0.003057
#

uhh

#

as you can see there are negative values

#

somehow the graph doesn't show negative values?

serene scaffold
visual violet
#
counter = 0 
figure(figsize=(15, 10), dpi=80)
for index, row in percentage_difference.iterrows():
    
    #line, = plt.plot(row, marker='o')
    line, = plt.plot(row)
    if predictions_pct[counter] == 0:
        line.set_color("b")    #blue
    if predictions_pct[counter] == 1:
        line.set_color("g")   #green
    if predictions_pct[counter] == 2:
        line.set_color("r")  #red
    if predictions_pct[counter] == 3:
        line.set_color("c")  #cyan
    if predictions_pct[counter] == 4:
        line.set_color("m")
    if predictions_pct[counter] == 5:
        line.set_color("y")
    if predictions_pct[counter] == 6:
        line.set_color("k")
    if predictions_pct[counter] == 7:
        line.set_color("pink")
    counter = counter + 1
plt.xlabel('Quarter')
plt.ylabel('Percentage difference')
plt.autoscale(enable=True, axis='x', tight=True)
serene scaffold
#

is predictions_pct the dataframe from before?

visual violet
#

predictions_pct shows what cluster

serene scaffold
#

what is it?

#

an array? a dataframe?

mint palm
#

If I do np.multiply(3d_matrix_1, 3d_matrix_2) on two arrays with the same dimensions, let's say (m, n, l), will I get a matrix of shape (m, n, 1)?
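
A short sketch addressing this question, with arbitrary example sizes: `np.multiply` is element-wise, so the result keeps the full (m, n, l) shape. Getting (m, n, 1) would need an explicit reduction over the last axis, shown here with a sum as one possible choice.

```python
import numpy as np

m, n, l = 2, 3, 4  # arbitrary example sizes
a = np.ones((m, n, l))
b = np.full((m, n, l), 2.0)

product = np.multiply(a, b)  # element-wise, same as a * b
print(product.shape)         # shape is unchanged: (m, n, l)

# Summing over the last axis is what collapses it to length 1.
reduced = product.sum(axis=-1, keepdims=True)
print(reduced.shape)
```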

visual violet
#

<class 'numpy.ndarray'>

serene scaffold
visual violet
#
model_pct = TimeSeriesKMeans(n_clusters=3, metric="dtw", max_iter=10)
predictions_pct = model_pct.fit_predict(percentage_difference)
serene scaffold
visual violet
#
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#

lol

#

i graph each row of the percentage_difference

#

then i set color accordingly

serene scaffold
#

nevermind, I see now

#

one moment

visual violet
#

the problem is there is no negative value in the y axis in the graph when there should be

serene scaffold
#

alright let me see

#

@visual violet

In [21]: df.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[21]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>
#

Not sure why the key is in the middle though

visual violet
#

how is your graph so much better than mine

#

i mean a lot better

serene scaffold
visual violet
#

are you coloring it according to the date?

serene scaffold
#

it's wrong though

#

let me see

visual violet
#

well at least you have the negative direction and i don't lol

serene scaffold
#

Yeah I had to transpose it first.

visual violet
#

please teach me your way

serene scaffold
#
In [23]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[23]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>
visual violet
#

one line of code??

#

you have to color code it though

#

so should be a bit longer

serene scaffold
#

yes. the method call in line 23 takes color= as a kwarg

#

do you have an array/series of which cluster each row belongs to?

visual violet
#

it is called predictions_pct

serene scaffold
#

so it predicted that each row is the same cluster...?

visual violet
#

yes! each row is an object

#

so one element in the array represent which cluster the corresponding row belongs to

serene scaffold
#

But it's all zero

visual violet
#

there are some 1 and 2 lol

serene scaffold
#

oh there are a few 1s

visual violet
#

that is why i complain the clustering doesn't work

serene scaffold
#

Do you know what color you want for 0, 1, and 2?

visual violet
#

but now i am already fucked, so i gotta keep going with the idea

#

excuse my language please

#

uhh i don't mind, make it red, blue, green

#

i got error like that when there is a logic error in my code

#

one time i merge dataframe with identical rows

#

python tried to do every single combinations

serene scaffold
#

@visual violet

# This is your array from before
In [32]: predictions = pd.Series([0, 1, 2, 1, 1])

In [33]: predictions.replace(dict(enumerate(['red', 'green', 'blue'])))
Out[33]: 
0      red
1    green
2     blue
3    green
4    green
dtype: object

In [41]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)
Out[41]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>
visual violet
#

line 33 won't change the array it self right?

serene scaffold
visual violet
#

can you please tell me what i did wrong?

#

like what is wrong with my logic

serene scaffold
fiery pollen
#

Unable to allocate 65.6 GiB for an array with shape (8803915165,) and data type float64

I'm getting this error. I think it could be solved with virtual memory, but do I really need to allocate 65 GB of virtual memory on disk? I already have 12 GB of RAM; do I need to add more?

serene scaffold
fiery pollen
#

yep probably it is.

#

I should get an output like this

#

plt.figure(figsize=(20,12))

mergings = linkage(rfm_scaled, method="complete", metric='euclidean')
dendrogram(mergings)
plt.show()

And this is my code; it is for an RFM analysis.

#

If I change the method, will the problem be solved?
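
A back-of-the-envelope check of the error message, assuming `rfm_scaled` is a 2-D observation matrix. In that case SciPy's `linkage` first builds the condensed pairwise distance array, which holds n(n-1)/2 values for n rows. The arithmetic below recovers n from the array length in the error.

```python
import math

n_values = 8_803_915_165      # array length from the error message
bytes_needed = n_values * 8   # float64 uses 8 bytes per value
gib = bytes_needed / 2**30

# Solve n * (n - 1) / 2 == n_values for the number of rows n.
n_rows = (1 + math.isqrt(1 + 8 * n_values)) // 2

print(f"{gib:.1f} GiB, {n_rows} rows")
```

If that assumption holds, the size depends only on the number of rows, so switching `method` is unlikely to shrink it. Clustering a sample of the rows, or aggregating them first, is the more usual way around it.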

visual violet
#

looks so much like clustering

#

but different lol

visual violet
#

bruh i possibly messed up both graphs

#

and yet my professor didn't tell me that

#

even though i sent him my code

#

i am sad

visual violet
#

because it assumes that it must have 3 clusters

serene scaffold
#

if you give more colors than there are clusters, however many extra will just never be used.

visual violet
#

'numpy.ndarray' object has no attribute 'replace'

serene scaffold
visual violet
#

yup i got it

serene scaffold
#

(a series is just the pandas version of a 1d array)
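
A small sketch of the same label-to-color lookup done directly on the NumPy array, as an alternative to wrapping it in `pd.Series`. The `predictions` values are a stand-in for `predictions_pct`, and `palette` is a name introduced here for illustration.

```python
import numpy as np

predictions = np.array([0, 1, 2, 1, 1])       # stand-in for predictions_pct
palette = np.array(["red", "green", "blue"])  # color for cluster 0, 1, 2

# Integer-array indexing looks up one color per row.
colors = palette[predictions].tolist()
print(colors)
```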

visual violet
#

wait where did you get the colors

#

you assigned colors = array.replaced... right?

#
colors = pd.Series(predictions_pct).replace(dict(enumerate(['red', 'green', 'blue'])))
percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)
#

@serene scaffold sorry for ping

serene scaffold
#

@visual violet do you have matplotlib installed?

visual violet
#

yup i plot multiple things before

serene scaffold
#

what version of pandas do you have?

visual violet
#

something is wrong lol

serene scaffold
# visual violet

you might need to switch the scale so that values on the y axis aren't evenly spaced

visual violet
serene scaffold
visual violet
#

matplotlib : 3.1.3

serene scaffold
#

in either case, look into how you can change the scale of the y axis

#

in particular, you probably want to make it logarithmic.

#

here's what I'm talking about: https://en.wikipedia.org/wiki/Semi-log_plot

In science and engineering, a semi-log plot/graph or semi-logarithmic plot/graph has one axis on a logarithmic scale, the other on a linear scale. It is useful for data with exponential relationships, where one variable covers a large range of values, or to zoom in and visualize that - what seems to be a straight line in the beginning - is in fa...

serene scaffold
# visual violet

if you try again with updated versions of pandas and matplotlib, you just need to add logy=True

#

!docs pandas.DataFrame.plot

arctic wedgeBOT
#

DataFrame.plot(*args, **kwargs)
Make plots of Series or DataFrame.

Uses the backend specified by the option `plotting.backend`. By default, matplotlib is used.
visual violet
#

the main focus is to make it look nice lol

serene scaffold
#

which will fix the scale of the y axis so your bottom lines aren't all scrunched together

visual violet
#

i should probably restart my computer

serene scaffold
visual violet
#

i tried to upgrade pandas

#

but it won't do that for some reason

serene scaffold
#

And what was the error message?

visual violet
#

i have to upgrade pip first

#

one sec

haughty wharf
#

I'm trying to implement an LDA model on my dataset of 256 columns/features and 4509 rows. The problem I'm facing now is that the dataset used in the tutorial has only 2 classes and I have 5.

I commented out the original statement from the tutorial and have been trying to work on it myself but haven't had any luck. Any ideas on how I can modify this?

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0) #np.where(y_prob_lda > .5, 1, 0)
serene scaffold
#

I'm worried that y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0 does something other than what you expected

haughty wharf
#

Also I'm finding it kinda hard to derive insights from a dataset where I don't even know the names of the columns

#

So any tips on that would be great too! right now Im learning the LDA model reduces the number of classifications to the important ones which will hopefully aid in figuring out how to Analyze this

serene scaffold
haughty wharf
#

I took the speaker and row column out of my dataframe

serene scaffold
#
In [21]: lda = LDA().fit(X, y)

In [23]: lda.predict(X)
Out[23]: array(['sh', 'iy', 'dcl', 'dcl'], dtype='<U3')

In [27]: lda.predict_proba(X)
Out[27]: 
array([[1.28273626e-01, 5.59261688e-03, 8.66133757e-01],
       [6.77497944e-07, 9.93583762e-01, 6.41556027e-03],
       [9.92678238e-01, 3.56884038e-09, 7.32175869e-03],
       [8.43266553e-01, 6.81605776e-06, 1.56726631e-01]])

In [30]: lda.predict_proba(X).argmax(axis=1)
Out[30]: array([2, 1, 0, 0])
#

I was using a subset of the data with only three classes.

haughty wharf
serene scaffold
haughty wharf
#

Ahhh I see

serene scaffold
#
In [33]: lda.classes_
Out[33]: array(['dcl', 'iy', 'sh'], dtype='<U3')

I assume the columns follow this scheme

#

however this just seems like a roundabout way of doing lda.predict
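
A minimal sketch of how the probability columns line up with `lda.classes_`, using the values copied from the session above. Only NumPy is needed; the fitted model itself is not reproduced here.

```python
import numpy as np

# Values copied from the session above (rows = samples, columns = classes).
classes = np.array(["dcl", "iy", "sh"])
proba = np.array([
    [1.28273626e-01, 5.59261688e-03, 8.66133757e-01],
    [6.77497944e-07, 9.93583762e-01, 6.41556027e-03],
    [9.92678238e-01, 3.56884038e-09, 7.32175869e-03],
    [8.43266553e-01, 6.81605776e-06, 1.56726631e-01],
])

# axis=1 picks the most probable class per row (one label per sample);
# axis=0 would instead pick one row index per class column.
labels = classes[proba.argmax(axis=1)]
print(labels)
```

The labels match the `lda.predict(X)` output shown earlier, which is why calling `predict` directly is the simpler route for any number of classes.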

serene scaffold
haughty wharf
#

So basically for each of these 4509 instances, we're trying to see how it matches to the corresponding g classification?

serene scaffold
haughty wharf
#

Phonemes are sounds specific to letters/words? right?

serene scaffold
#

it looks like the end goal is to be able to transcribe audio, though.

#

My background is in linguistics

visual violet
#

bruh is there anything that you can't do?

serene scaffold
haughty wharf
serene scaffold
#

Also if your y is just the g column of the dataframe, you don't need to .ravel() it.

serene scaffold
visual violet
#

what exactly is the version that i am trying to upgrade?

#

i can't seem to do it

haughty wharf
# serene scaffold anyway, the number of classes shouldn't matter, as `LinearDiscriminantAnalysis` ...

Here is where I'm at right now. This is where I left off in the video, so I'm just trying to figure out how to move forward.

le = LabelEncoder()
phoneme_df2['g'] = le.fit_transform(phoneme_df2['g'])
encoded_data = pd.get_dummies(phoneme_df2)


y = phoneme_df2['g'].values.reshape(-1, 1)
X= encoded_data.drop(['g'], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2, random_state = 42)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0) #np.where(y_prob_lda > .5, 1, 0)
serene scaffold
#

I actually have to head out for a bit

haughty wharf
#

And my understanding is LDA's are supposed to reduce the number of dimensions to the most relevant features depending on the data

haughty wharf
serene scaffold
#

!docs numpy.ndarray.ravel

arctic wedgeBOT
#

ndarray.ravel([order])
Return a flattened array.

Refer to [`numpy.ravel`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html#numpy.ravel "numpy.ravel") for full documentation.

See also

[`numpy.ravel`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html#numpy.ravel "numpy.ravel")equivalent function

[`ndarray.flat`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flat.html#numpy.ndarray.flat "numpy.ndarray.flat")a flat iterator on the array.
serene scaffold
#

@haughty wharf that

visual violet
#

i have finally updated the version lol

#

thx god

#

why does this thing label every single row lmao

#

i didn't even command it to do that

#
colors = pd.Series(predictions_pct).replace(dict(enumerate(['red', 'green', 'blue'])))
percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)
serene scaffold
#

@visual violet you forgot the logy part

#

Also I'm at the gym so I may or may not respond between sets

haughty wharf
visual violet
#

have a great workout my dude

#

i still can't fix it lol

serene scaffold
#

I'm already back from that

visual violet
#

oh

serene scaffold
#

As of like two minutes ago

#

Anyway what error message and what pandas version?

visual violet
#

'1.2.4'

#

there is really no error

#

when i put percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors, logy=True)

#

it shows

#

when i change to

#

percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors, logy=False)

#

it shows

#

@serene scaffold

visual violet
#

yeah when i set it to true, there is no graph at all

#

just label

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

cedar sun
#

when training a model, should u augment the validation data too?

visual violet
#

lmao exceeded maximum length

#

hmm let me think

haughty wharf
arctic wedgeBOT
#

Hey @visual violet!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

visual violet
#

lol

serene scaffold
visual violet
#

lol yes

serene scaffold
#

Just do fewer rows, I guess. How many are there?

visual violet
#

724 🙂

serene scaffold
visual violet
#

hmm still too big

haughty wharf
serene scaffold
#

Having long class names is a waste of milliseconds.

serene scaffold
haughty wharf
#

ohhh gotcha

cedar sun
#

can i make a model checkpoint to save model every 5 epochs, for example?

visual violet
#

250 seems to be the limit

#

what do you think?

cedar sun
#

ah with period

serene scaffold
visual violet
serene scaffold
#

which are necessary in this case.

visual violet
#

so i print the csv file?

serene scaffold
visual violet
#

got it

#

lol it looks so cool in pastebin

#

yet so useless when it comes to being clustered

serene scaffold
#

@visual violet I tried setting it to a symlog scale and got this

visual violet
#

so i tried out some graphing

#

the weird labeling is not because of the colors

#

hmm

#

how do you scale it so nicely though

#

i am starting to think it is because of my jupyter skin

#

the graph doesn't show fully

serene scaffold
# visual violet how do you scale it so nicely though
In [73]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[73]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>

In [74]: matplotlib.pyplot.yscale('symlog')

In [75]: matplotlib.pyplot.show()
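
A self-contained sketch of the symlog approach, using made-up values with both signs. A plain log axis cannot show zero or negative values, which is the likely reason `logy=True` drew nothing; `symlog` stays linear near zero and logarithmic further out. The `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up values with both signs, standing in for one row of the data.
y = np.array([0.35, -0.39, 0.03, 1.54, -0.70])

fig, ax = plt.subplots()
ax.plot(y)
ax.set_yscale("symlog")  # log-like away from zero, linear near zero
ax.set_xlabel("Quarter")
ax.set_ylabel("Percentage Difference")
fig.canvas.draw()
```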
cedar sun
#

should i augment data in validation?

visual violet
#

what is validation?

#

how do you augment data

serene scaffold
#

I was writing a paper on data augmentation for nlp but it hasn't gone anywhere

visual violet
#

are you hitting dead end?

#

or you can't find enough data

serene scaffold
#

well, it wasn't making the results better or worse

#

so... 🤷‍♂️

visual violet
#

well research do be like that sometimes

#

null result is not useless

cedar sun
#

should i augment data in validation?

visual violet
#

can i call you Steele ?

serene scaffold
visual violet
#

right

low wasp
visual violet
#

i still need to figure out how to remove the labels lol

#

but now it looks much more like a cluster

#

before it was quite weird

#

actually surprised how similarly the prices behave

cedar sun
haughty wharf
#

@serene scaffold I'm trying to follow this https://www.python-course.eu/linear_discriminant_analysis.php for LDA.

The first step looks like the code below.
Would my target feature be the 256 columns and my descriptive feature be column g?

# 0. Load in the data and split the descriptive and the target feature
df = pd.read_csv('data/Wine.txt',sep=',',names=['target','Alcohol','Malic_acid','Ash','Akcakinity','Magnesium','Total_pheonols','Flavanoids','Nonflavanoids','Proanthocyanins','Color_intensity','Hue','OD280','Proline'])
X = df.iloc[:,1:].copy()
target = df['target'].copy()
serene scaffold
serene scaffold
# visual violet

There might be a point at which there are just too many lines to effectively plot

haughty wharf
visual violet
#

you are not wrong

#

but it does show the cluster pretty well though, so i am pretty happy about it

serene scaffold
visual violet
#

you can see one giant clump

fiery pollen
#

Hi me again
df2=df[['DateDim[Day]', '[NetSales]']]

df2['DateDim[Day]'] = pd.to_datetime(df2['DateDim[Day]'])

plt.figure(figsize=(16,8))
plt.title('Sale History')
plt.plot(df2['[NetSales]'])
plt.xlabel('DateDim[Day]')
plt.ylabel('[NetSales]', fontsize=25)
plt.show()
That's my code, and the output looks like this

fiery pollen
#

and you know my date data looks like this

#

why does the program still not see it as a date?

cedar sun
#

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.

visual violet
#

do like data["Date"]= pd.to_datetime(data["Date"])

cedar sun
#

What does this mean? like, i thought period was used to specify number of epochs
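
A sketch of what the warning implies, under the assumption that `save_freq` is given as an integer: it then counts batches rather than epochs, so saving every N epochs means multiplying N by the number of batches per epoch. The helper name and the sample sizes below are hypothetical.

```python
import math

def save_freq_for_epochs(n_samples: int, batch_size: int, every_n_epochs: int) -> int:
    """Number of batches between saves so that a save happens every N epochs."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return every_n_epochs * steps_per_epoch

# Hypothetical sizes: 1000 training samples, batch size 32, save every 5 epochs.
save_freq = save_freq_for_epochs(1000, 32, 5)
print(save_freq)

# The result would then be passed along the lines of:
# tf.keras.callbacks.ModelCheckpoint("ckpt-{epoch:02d}.h5", save_freq=save_freq)
```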

visual violet
#

right now it doesn't know that those are dates

fiery pollen
#

same output like 500000 not date on xlabel

visual violet
#

how about plt.plot(df2['DateDim[Day]'],df2['[NetSales]'])

fiery pollen
#

not bad but it looks something wrong with data 😄

visual violet
#

now you can extend the y axis like i did lol

#

figure(figsize=(10, 15), dpi=80)

#

put ^ code first then everything else after for it to work

fiery pollen
#

thankss

cedar sun
#

different models of EfficientNet

#

Are only based on the dimensions of the input img?

haughty wharf
#

How do I know if theres a good classification distribution in my data?

serene scaffold
#

@haughty wharf having a confusion matrix that's just the diagonal means your model got everything right

#

Though you would want to evaluate that using different data than you used for training

serene scaffold
#

@haughty wharf you usually train on like 70% and evaluate on 30%, or something like that

#

Unless you want to cross validate, which is nice if you can afford that computationally.
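
To make the diagonal remark concrete, a small sketch with made-up labels for a held-out test set. It builds the confusion matrix by hand with NumPy and reads accuracy off the diagonal; the rows-are-true, columns-are-predicted layout is a convention chosen here.

```python
import numpy as np

# Hypothetical true and predicted class indices for a held-out test set.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)  # rows = true class, columns = predicted

accuracy = np.trace(cm) / cm.sum()  # correct predictions sit on the diagonal
print(cm)
print(accuracy)
```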

haughty wharf
#

Ahhh I see

haughty wharf
serene scaffold
#

@haughty wharf exploratory analysis with the training data? Remind me what the rows and columns mean?

haughty wharf
#

With the whole pandas dataframe, minus the speaker and row columns. I actually have some notes, let me see.

 A dataset was formed by selecting five phonemes for classification based on digitized speech from this database. The phonemes are transcribed as follows: "sh" as in "she", "dcl" as in "dark", "iy" as the vowel in "she", "aa" as the vowel in "dark", and "ao" as the first vowel in "water". From continuous speech of 50 male speakers, 4509 speech frames of 32 msec duration were selected, approximately 2 examples of each phoneme from each speaker. Each speech frame is represented by 512 samples at a 16kHz sampling rate, and each frame represents one of the above five phonemes. The breakdown of the 4509 speech frames into phoneme frequencies is as follows:

From each speech frame, we computed a log-periodogram, which is one of several widely used methods for casting speech data in a form suitable for speech recognition. Thus the data used in what follows consist of 4509 log-periodograms of length 256, with known class (phoneme) memberships.

The data contain 256 columns labelled "x.1" - "x.256", a response column labelled "g", and a column labelled "speaker" identifying the different speakers.

g is the phoneme label
haughty wharf
visual violet
#

anybody knows how to explain dataframe in words?

#

like does anybody have experience with describing a dataframe in words in a research paper

reef bone
#

a pandas dataframe?

#

you could have a look around the docs, they have to explain it somehow

#

the docs for the class say:

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

#

there will probably be a more lengthy description elsewhere, but it's difficult to navigate the docs on mobile 😬

visual violet
#

oh the way pandas doc describe their dataframe confuses me a lot lol

#

"Two-dimensional, size-mutable, potentially heterogeneous tabular data."

#

lol

#

what does this even mean

reef bone
#

it means that you have rows and columns (2 dimensions), the size can change (you can add and remove rows and columns), and can hold mixed types (ints, floats, timestamps, ...) in a single dataframe

twin fiber
#

hey I'm really hoping someone can help me, struggling to get correct(?) output from confusion matrix

#

I am building a model to infer sentiment from reviews. The model accuracy is listed at 97% and I am now trying to calculate the confusion matrix however it doesn't seem to output the correct information unless i'm misinterpreting it

#

this is what the matrix is outputting, can this be right with a 97% accurate model?

lapis sequoia
#

how do I split with multiple delimiters

velvet thorn
lapis sequoia
velvet thorn
#

it depends

#

re.split for the general case

lapis sequoia
#

A big block of text in this case

#

How do I separate the delimiters with regex?

velvet thorn
#

you use a regex that matches multiple characters

#

and it’ll split on any of them
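
A minimal sketch of that idea with a made-up sample string: a character class matches any one of the delimiters, and `+` collapses runs of them into a single split point.

```python
import re

text = "alpha, beta;gamma  delta|epsilon"  # made-up sample text

# A character class matches any one of the delimiters; "+" collapses runs.
parts = re.split(r"[,;|\s]+", text)
print(parts)
```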

lapis sequoia
#

Got it, thanks

twin fiber
#

out of 5000, unless i'm reading it wrong

#

it just seems high because the model is meant to be 97% accurate

velvet thorn
#

which seems about right

#

or FP, I forgot which axis is predicted

twin fiber
#

oh I see thank you

#

I was interpreting it wrong

velvet thorn
#

yw 👋

cedar sun
#

do u have any mixup implementation for keras?

visual violet
#

what representation method should i use

#

for a time-series data

#

i only have 20 columns and 725 rows

#

but i wanna know how to reduce the 'noise' 🙂

wanton sleet
#

x = np.linspace(0, 2 * np.pi, 400)

#

anyone know what this specific line does?

austere swift
#

did you check out the documentation for the function?

#

!d numpy.linspace

arctic wedgeBOT
#

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.

Returns *num* evenly spaced samples, calculated over the interval [*start*, *stop*].

The endpoint of the interval can optionally be excluded.

Changed in version 1.16.0: Non-scalar *start* and *stop* are now supported.

Changed in version 1.20.0: Values are rounded towards `-inf` instead of `0` when an integer `dtype` is specified. The old behavior can still be obtained with `np.linspace(start, stop, num).astype(int)`
wanton sleet
#

@austere swift so it goes from 0 to 360 degrees (since pi = 180 degrees) in 400 samples, right?

austere swift
#

If you’re in the context of radians, yes
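
A quick check of that reading, using the exact line from the question. The values are in radians, both endpoints are included by default, and the spacing is constant.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)

print(len(x))             # 400 samples
print(x[0], x[-1])        # both endpoints included: 0 and 2*pi
print(np.degrees(x[-1]))  # 2*pi radians expressed in degrees
step = x[1] - x[0]        # constant spacing of 2*pi / 399
```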

wanton sleet
#

Thanks

dense hinge
#

can someone recommend an ebook for me to get started with deep learning with pytorch?

thorny bolt
#

i need help with a bit of my code in which i'm training a model

#

and it is a bit urgent

novel elbow
#

If you need help, post the questions here (:

magic juniper
#

I have my neural network made and stuff, now how would I Utilize it to actually create an AI? please help.

dense hinge
novel elbow
#

that's a GitHub problem; sometimes it's not very reliable for viewing notebooks

dense hinge
#

yah just read the problem and found the site

#

thanks dude

#

@novel elbow is fastai better to learn than pytorch?

novel elbow
#

it's built on top of pytorch

dense hinge
#

too many abstractions though right?

#

if I wanted to do something manual it would be hard, is what I heard

novel elbow
#

depends on what you want to learn

#

then check the fastai course

#

in part 2 of the course they teach you how the library is built

#

so you can see all the inner and manual parts

dense hinge
#

oh ok thanks

thorny bolt
#

does anybody here participate in kaggle competitions regularly?

inland zephyr
#

Does anyone in here know the proper way to feed a tensorflow/keras Conv1D network from a pandas dataframe? I always have problems going from a 1D data structure in pandas to Conv1D

#

let's say i have n x 1000 feature data for train or test; i always have trouble with input_shape as [n, n_feature] or batch_input_shape as (n, n_feature)

#

I'm using this line: dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values)) and my first Conv1D layer looks like this: model.add(layers.Conv1D(filters=64,kernel_size=9,activation='relu',batch_input_shape = [None,15360, 1])). It gets stuck on ValueError: Input 0 of layer sequential_3 is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (15360, 1)

#

and this is my model structure:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_12 (Conv1D)           (None, 15352, 64)         640       
_________________________________________________________________
conv1d_13 (Conv1D)           (None, 15344, 64)         36928     
_________________________________________________________________
conv1d_14 (Conv1D)           (None, 15336, 64)         36928     
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 7668, 64)          0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 7668, 64)          0         
_________________________________________________________________
conv1d_15 (Conv1D)           (None, 7660, 64)          36928     
_________________________________________________________________
conv1d_16 (Conv1D)           (None, 7652, 64)          36928     
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 3826, 64)          0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 3826, 64)          0         
_________________________________________________________________
conv1d_17 (Conv1D)           (None, 3818, 64)          36928     
_________________________________________________________________
flatten_2 (Flatten)          (None, 244352)            0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 244353    
=================================================================
Total params: 429,633
Trainable params: 429,633
Non-trainable params: 0
_________________________________________________________________
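
One possible reading of the error above, offered as an assumption rather than a diagnosis: the model is receiving unbatched elements of shape (15360, 1) instead of (batch, 15360, 1). The sketch shows the reshape with NumPy only; `n_samples` is an arbitrary example value and the zeros array stands in for `df.values`.

```python
import numpy as np

n_samples, n_features = 8, 15360  # n_samples is an arbitrary example value
values = np.zeros((n_samples, n_features), dtype="float32")  # stand-in for df.values

# Conv1D expects (batch, steps, channels), so add a trailing channel axis.
values_3d = values[..., np.newaxis]
print(values_3d.shape)

# With tf.data, each sliced element is then (15360, 1) and still needs batching,
# e.g. tf.data.Dataset.from_tensor_slices((values_3d, targets)).batch(32)
```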

novel osprey
#

!d numpy

arctic wedgeBOT
#

numpy is the standard numerical array library for python, the successor to Numeric and numarray. numpy provides fast operations for homogeneous data sets and common mathematical operations like correlations, standard deviation, fourier transforms, and convolutions.

wooden cosmos
#

hey guys, i'm trying to load a pre-trained gensim Word2Vec model and i am experiencing this error:

UnpicklingError: invalid load key, '6'.

i got the model like this :
'blabla-10-300.w2v.model.bz2'

i tried multiple ways: i loaded it directly, i loaded it unzipped, i tried multiple methods to load it into gensim

and nothing seems to work

do you have a fix?

serene scaffold
#

@wooden cosmos try copying the error message into the chat

wooden cosmos
serene scaffold
#

Yay!

wooden cosmos
#

yeah, cool
but i ran into another problem - the model is trained on non-stemmed and non-lemmatized french words

wooden cosmos
serene scaffold
#

@eager heath, I choose you!

eager heath
#

Yes!

serene scaffold
#

French. Help.

eager heath
#

Tell me about it

serene scaffold
#

You know how there's like the normal form of a word, like the version that's used to look it up in a dictionary?

eager heath
#

I don't know what the form is called, but I know it exists, yes

serene scaffold
#

It's called the lemma.

#

It's usually the singular form of a word when it's the subject of a sentence.

wooden cosmos
#

the question is : is it better to train w2v on lemmatized and stemmed tokens or to use just plain words like in the text, without any preproc

#

and if i have a model which is not trained on lemmas and stems - should i try to retrain that thing or just stick with it?

eager heath
#

that would be a bit weird, we do have plural and gendered forms for most words

#

My blind guess is that you would get less accurate results because of all the possible spellings of a word

wooden cosmos
#

yeah

#

and also for the verbs -> "aller" could be spelled as "allez","allons","allait","allaient"

eager heath
#

Yeah

serene scaffold
#

Are there rules you can apply to certain types of words to get their base form?

eager heath
#

Yes, there should be

#

Although they would get really complicated quickly

#

We like exceptions of exceptions

desert oar
#

romance languages are generally easier to reduce to a base form than english, as far as i know

#

i know spanish you could probably do it in prolog, not many exceptions and even the exceptions are pretty "regular"

serene scaffold
#

there are two conjugations of verbs, those where you append -ed and those where a vowel changes internally. Like swim vs swam. But the latter category is shrinking over time.

dry hearth
desert oar
#

and those have had a lot of research behind them

grave frost
dry hearth
#

what other subreddits do you suggest i check out?

ember sapphire
#

how can i get a random vector [x, y, z] from a 3d numpy array with shape (255, 255, 3)

desert oar
ember sapphire
#

yes

#

the 3d array is conceptually a 2d array of (r, g, b) triples

#

i want a random triple

#

i looked at np.random.choice and some others but i can't see an easy way to do this even though it should be easy

desert oar
#
from random import randrange
i, j = randrange(255), randrange(255)

my_random_rgb = array[i, j, :]

no?

ember sapphire
#

lol sure

#

for some reason i thought there'd be some numpy function that did it directly

desert oar
#

random.choice specifically says it's for 1D

ember sapphire
#

yeah

#

there's random.Generator.choice

desert oar
#

same thing

#

i'm not sure if there's a version to select random "slices" like that

ember sapphire
#

doesn't seem like it

#

guess i'll just do it the way you suggested

desert oar
#
from numpy.random import default_rng

rng = default_rng()
i, j = rng.integers(0, 255, size=2, dtype=int, endpoint=False)

i think this is equivalent using the numpy rng

#

which you probably should do if you want to use the same random seed as your other numpy code

#

maybe you can "partially ravel" the array and then use choices

#

@ember sapphire ```python
from numpy.random import default_rng

rng = default_rng()

image = ... # 255 x 255 x 3
random_triple = rng.choice(image.reshape((-1,3)))
```

ember sapphire
#

thank you

#

that is nice

desert oar
#

the i,j version might be faster for what it's worth

#
In [76]: b = np.arange(255*255*3).reshape((255,255,3))

In [77]: rng = np.random.default_rng()

In [78]: def rand1(rng, array):
    ...:     i, j = rng.integers(0, 255, size=2, dtype=int, endpoint=False)
    ...:     return array[i, j, :]
    ...:

In [79]: def rand2(rng, array):
    ...:     return rng.choice(array.reshape((-1,3)))
    ...:

In [80]: %timeit rand1(rng, b)
23.6 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [81]: %timeit rand2(rng, b)
21.3 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#

not much different

#

!e ```python
from math import sqrt
import scipy.stats

n1 = 7
n2 = 7

m1 = 23.6
s1 = 3.03

m2 = 21.3
s2 = 2.10

welch_t = (m1 - m2) / sqrt(s1**2 + s2**2)
welch_df = ((n1-1)*s1**4 + (n2-1)*s2**4) / sqrt(s1**4 + s2**4)
welch_p = scipy.stats.t(df=welch_df).ppf(welch_t)

print(welch_p)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

0.3171214613223502
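
A note on the comparison above: the textbook Welch statistic divides each variance by its sample size, and the p-value is read from the tail of the t distribution rather than from `ppf`. Here is a minimal sketch using SciPy's `ttest_ind_from_stats` with `equal_var=False`, assuming the ± values printed by `%timeit` can be treated as sample standard deviations over 7 runs (an approximation, not something `%timeit` promises):

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the %timeit output above (microseconds).
result = ttest_ind_from_stats(
    mean1=23.6, std1=3.03, nobs1=7,
    mean2=21.3, std2=2.10, nobs2=7,
    equal_var=False,  # Welch's t-test: no equal-variance assumption
)
welch_t, welch_p = result.statistic, result.pvalue
print(welch_t, welch_p)
```

If the p-value comes out above the usual 0.05 threshold, that agrees with the "not much different" reading of the two timings.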
ember sapphire
#

another simple numpy question

#

so i have my 255x255 image

#

and i want to augment it so that it's 255x255x5 instead of 255x255x3

#

i want image[x, y] to become [x/255, y/255, *image[x, y]]

#

is there a way to do that in a vectorized fashion?

desert oar
#

wouldn't that be more than 5?

#

or do you literally want the index in the array?

ember sapphire
#

i want the index in the array

desert oar
#

as in, image[10, 30] should be (10/255, 30/255, r, g, b)?

ember sapphire
#

yes

desert oar
#

dare i ask, why?

ember sapphire
#

because that is the space in which i want to compute distances

#

im clustering pixels based on their location and their color

#

so my centroids for k-means need to be vectors in that 5d space

#
cluster[y, x] = np.argmin(np.linalg.norm(centroids - img[y, x], axis=1))
#

the goal is to be able to write that

desert oar
#

!eval there might be a nicer way to do it, but this appears to work

import numpy as np

# rgb 255x255 image
b = np.arange(255*255*3).reshape((255,255,3))

m, n = b.shape[:2]
i_broadcast = np.repeat(np.arange(m), n).reshape((m, n, -1))
j_broadcast = np.tile(np.arange(n), m).reshape((m, n, -1))
b_aug = np.concatenate((i_broadcast, j_broadcast, b), axis=2)

print(b_aug)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [[[     0      0      0      1      2]
002 |   [     0      1      3      4      5]
003 |   [     0      2      6      7      8]
004 |   ...
005 |   [     0    252    756    757    758]
006 |   [     0    253    759    760    761]
007 |   [     0    254    762    763    764]]
008 | 
009 |  [[     1      0    765    766    767]
010 |   [     1      1    768    769    770]
011 |   [     1      2    771    772    773]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/jenuwoguja.txt?noredirect

desert oar
#

if you have to do this for lots of images you could of course re-use the i_broadcast and j_broadcast over and over in a tight loop

#

i forgot to /255 but you get the idea
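
One possibly nicer way, as a sketch: `np.indices` builds the row and column index grids directly, so the coordinates can be normalised and stacked in front of the colour channels. The function name `augment_with_coords` is made up for illustration, and dividing by the image height and width (instead of a hard-coded 255) is an assumption about what is wanted.

```python
import numpy as np

def augment_with_coords(image):
    """Return an (h, w, 5) array: [row / h, col / w, r, g, b] per pixel."""
    h, w = image.shape[:2]
    rows, cols = np.indices((h, w))                    # each has shape (h, w)
    coords = np.stack((rows / h, cols / w), axis=-1)   # shape (h, w, 2)
    return np.concatenate((coords, image), axis=-1)    # shape (h, w, 5)

# Small non-square example so that swapped axes would be noticed.
img = np.arange(4 * 6 * 3).reshape((4, 6, 3))
out = augment_with_coords(img)
print(out.shape)
```

Using a non-square test image is a cheap way to catch mixed-up row and column counts, which a 255 x 255 image hides.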

ember sapphire
#

is it just me or is numpy miserable to work with

desert oar
#
x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0
y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0

b_aug = np.concatenate((x, y, b), axis=2)
#

it's just a reality when working within a language like python

#

you need to do as much in C as possible

#

which means you need custom C functions and lots of custom functionality that in a "fast" language you might just do in a for loop

ember sapphire
#

kind of defeats the purpose of using a high level language

desert oar
#

it's also a bit of a learning curve and an acquired taste

#

sort of, you the developer don't need to worry about allocating memory and strided array lookups and bytes and stuff

#

also if you really do need to write a for loop over a numpy array, numba can be magical

#
import numba
import numpy as np

def augment_with_coords_np(array):
    m, n = array.shape[:2]  # rows, columns
    x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0
    y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0
    return np.concatenate((x, y, array), axis=2)

@numba.njit
def augment_with_coords_nb(array_in):
    array_out = np.zeros((255, 255, 5))
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]
    return array_out
In [144]: %timeit augment_with_coords_np(b)
1.06 ms ± 93.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [146]: %timeit augment_with_coords_nb(b)
255 µs ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#

where b = np.arange(255*255*3).reshape((255,255,3))

#

numba is a lot faster here because it can be more algorithmically efficient, only a single nested loop and a single allocation, instead of lots and lots of looping and allocation + python function call overhead

#

if you use np.empty instead of np.zeros , the numba version is even faster

In [149]: %timeit augment_with_coords_nb(b)
155 µs ± 4.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ember sapphire
#

wow

serene scaffold
serene scaffold
#

how does that even happen?

desert oar
#

a lot faster

#

like i said, fewer passes over the data + no python function call overhead

#

at least, that is my understanding of why

serene scaffold
#

does numba only work with cpython?

desert oar
#

i think so, i've read some blog posts about using it with pypy but i think you need to build pypy from source with some patches, maybe?

#

maybe it's better now in 2021

serene scaffold
#
def augment_with_coords_np(array):
    x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0  # arange, repeat, reshape, divide; 4
    y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0    # same, basically. 4
    return np.concatenate((x, y, array), axis=2)                # 1

If I'm reading this right, this involves creating 9 arrays, only one of which gets returned. But I assume that within a numba-decorated function, the semantic requirement that intermediary arrays are created isn't there, yes?

#

that and you don't create intermediary arrays anyway

grave frost
serene scaffold
desert oar
#

yeah the arrays still need to get created

#

numpy has no way to optimize that away

dry hearth
desert oar
#

this is the same use case as numexpr in pandas

dry hearth
#

😦

grave frost
#

or buy/obtain a better GPU

serene scaffold
#

(when it's only having to deal with arrays)

desert oar
#

that i don't know, i'll try it

#

it doesn't work with nopython mode

#
In [156]: %timeit augment_with_coords_nb_np(b)
1.17 ms ± 79.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

this is the numpy version with numba.jit slapped on top of it

#

so it's actually slower than just doing it in plain python

#

or at least not any faster

#

note: this is all with numpy 1.20.2 under cpython 3.9.4 x86_64

#

using the pypi wheel, not conda

#

properties might be different in different situations of course

serene scaffold
#

so to benefit from numba, you can't be creating lots of extra arrays?

desert oar
#

to benefit from numba you need to be writing for loops over numpy arrays

#

not using high-level numpy functions

#
@numba.njit
def augment_with_coords_nb_pre(array_in, array_out):
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]

i think you could even require a pre-allocated array_out parameter to be filled

#

this is one possible optimization if you're writing a loop, you can allocate the memory once for the entire loop

serene scaffold
#

I'd rather it be something like

@numba.njit(array_out=np.zeros((255, 255, 5)))
def augment_with_coords_nb_pre(array_in):
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]
desert oar
#

interestingly it doesn't actually seem faster if you use the pre-allocated array

serene scaffold
#

but I guess that would fuck with the namespacing

desert oar
#
In [162]: @numba.njit
     ...: def augment_with_coords_nb_pre(array_in, array_out):
     ...:     for i in range(255):
     ...:         x = i / 255.0
     ...:         for j in range(255):
     ...:             y = j / 255.0
     ...:             array_out[i, j, 0] = x
     ...:             array_out[i, j, 1] = y
     ...:             array_out[i, j, 2] = array_in[i, j, 0]
     ...:             array_out[i, j, 3] = array_in[i, j, 1]
     ...:             array_out[i, j, 4] = array_in[i, j, 2]
     ...:

In [163]: c = np.empty((255, 255, 5))

In [164]: %timeit augment_with_coords_nb_pre(b, c)
179 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [165]: %timeit c = np.empty((255, 255, 5)); augment_with_coords_nb_pre(b, c)
193 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
serene scaffold
#

so the time savings is probably just from making the array in advance?

desert oar
#

what do you mean?

#

it's still significantly faster than the np.concatenate version

#

actually wait

#

it is faster than the numba version using np.zeros, but not faster than the numba version using np.empty

#

these differences aren't really statistically significant though

#

i'm surprised that pre-allocating isn't significantly faster, maybe there's additional overhead somehow, or i need to use the numba signature

serene scaffold
#

@desert oar On an unrelated note, I've been wanting to write an article on transition from general Python program design to data science Python design, and it's based on the idea that where general Python is mostly OOP and imperative (there are lots of data types and you use loops to read and write to different data structures), data science Python is more functional and less OOP (you mostly work with "rectangular" data structures, pretty much everything is a function that doesn't modify the underlying data (ie there are usually no (gasp) side effects)). Do you think I'm on the right track?

desert oar
#
In [167]: c = np.empty((255, 255, 5)); augment_with_coords_nb_pre(b, c)

In [168]: np.testing.assert_array_almost_equal(augment_with_coords_nb(b), c)

In [169]: %timeit augment_with_coords_nb_pre(b, c)
158 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [170]: %timeit augment_with_coords_nb_pre(b, c)
149 µs ± 1.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [171]: %timeit augment_with_coords_nb(b)
150 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [172]: %timeit augment_with_coords_nb(b)
158 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
lost umbra
desert oar
#

pandas also isn't really that "functional"

#

in fact i'd say it's not really functional at all other than its use of higher-order functions in some places (map, apply, etc)

serene scaffold
#

I'm thinking of both pandas and numpy

desert oar
#

same goes for numpy as for pandas

#

but pandas and numpy are both fairly object-oriented and only somewhat "functional"

#

numpy and pandas do both support some mix of views and copy-on-write, but it's mostly exposed directly to the user, rather than hidden away as optimization behind an immutable interface

#

the "no side effects" aspect of functional programming is mostly incidental by virtue of what people usually want to do with numpy and pandas: math

serene scaffold
#

Declarative is definitely more along the right lines. It wasn't part of my CS education, apparently.

#

I should get a refund.

desert oar
#

heh, sql is declarative

serene scaffold
#

my database class was weird

desert oar
#

i'm sure they talked more about database implementation than about programming language design though

serene scaffold
#

in my database class? the first half of the class was ACID, relational algebra and all those normal forms, and the second half was SQL and making a website

#

we didn't talk about the time complexity of different queries, which is kind of annoying

uncut barn
#
if True:
    tokens = [t for t in tokens if t not in set(stopwords.words('english'))]
#

can anyone help me understand what this code means

#

i mean the first part if True

#

what has to be True, and what makes this statement false?

serene scaffold
uncut barn
#

yes but i'm just trying to understand the if True part

#

i get the rest of the code

serene scaffold
uncut barn
#

ok so its just extra code

serene scaffold
ember sapphire
#

ok i must be doing something seriously wrong

uncut barn
#

so what happens if i do if False

#

would it still run

austere swift
#

do you know the concept of an if statement?

#

like what if does

uncut barn
#

yes check if a statement is true or not and executes the result if a condition is achieved

austere swift
#

it executes the block when the condition is evaluated to true

#

so if True will always be executed, since the condition is True

#

and inversely, if False will never be executed, since the condition is False
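
A tiny sketch of the point above: the body of `if True` always runs and the body of `if False` never does. The list name `ran` is only there to record which branches executed.

```python
ran = []

if True:
    ran.append("if True")   # always runs

if False:
    ran.append("if False")  # never runs

print(ran)  # ['if True']
```

A constant condition like this is usually a leftover switch for turning a step on or off by hand.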

ember sapphire
#
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import image

rng = np.random.default_rng()

img = image.imread('fruits_small.jpg')
h, w = img.shape[:2]

x = np.repeat(np.arange(h), w).reshape((h, w, -1)) / w
y = np.tile(np.arange(h), w).reshape((h, w, -1)) / h

augmented_image = np.concatenate((x, y, img), axis=2)
augmented_image = np.array(img)

plt.subplot(3, 3, 1)
plt.title('Original')
plt.imshow(img)

for plot, k in enumerate([4, 8, 16, 32, 64]):
    centroids = rng.choice(augmented_image.reshape((-1, 3)), size=k, replace=False)
    clusters = np.empty((h, w))

    print(centroids)

    while True:
        for y, x in np.ndindex(img.shape[:2]):
            v = augmented_image[y, x]
            clusters[y, x] = np.argmin(np.linalg.norm(centroids - v, axis=1))

        d = 0
        for i in range(k):
           c = augmented_image[clusters == i]
           new_centroid = c.mean(axis=0)
           d += np.linalg.norm(centroids[i] - new_centroid)
           centroids[i] = new_centroid

        if d == 0:
            break

    cluster_colors = [np.random.rand(3) for _ in range(k)]
    for i in range(k):
        img[clusters == i] = centroids[i]

    plt.subplot(3, 3, plot + 1)
    plt.title(f'k = {k}')
    plt.imshow(img)

plt.show()
#

is there anything obviously terrible here? running it on a 750x500 image is taking hours

#

it looks like it isn't even converging
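
A few things in the code above look like likely culprits, offered as guesses: `augmented_image = np.array(img)` replaces the 5-channel array with the plain image; if `img` is `uint8`, then `centroids - v` wraps around instead of going negative, so the distances are wrong; and the per-pixel Python loop is slow on a 750x500 image. Below is a minimal vectorized sketch of Lloyd's k-means under those assumptions. The function name `kmeans` and the `max_iter` / `tol` parameters are made up for illustration.

```python
import numpy as np

def kmeans(features, k, rng, max_iter=100, tol=1e-6):
    """Plain Lloyd's k-means on an (n_points, n_dims) array."""
    # Work in float so differences cannot wrap around as they would in uint8.
    features = np.asarray(features, dtype=np.float64)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    sq_norms = (features ** 2).sum(axis=1)[:, None]
    for _ in range(max_iter):
        # Squared distances from every point to every centroid, shape (n_points, k),
        # using |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
        d2 = sq_norms - 2.0 * features @ centroids.T + (centroids ** 2).sum(axis=1)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for i in range(k):
            members = features[labels == i]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[i] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stop once the centroids have essentially stopped moving
            break
    return centroids, labels

# Tiny demonstration: two obvious groups of 2-D points.
rng = np.random.default_rng(0)
points = np.array([[0.0, 0.0]] * 5 + [[10.0, 10.0]] * 5)
centroids, labels = kmeans(points, k=2, rng=rng)
print(labels)
```

For the image case this would be called with something like `augmented_image.reshape(-1, 5)`, and the returned labels reshaped back with `labels.reshape(h, w)`. Capping the iterations and using a tolerance avoids relying on the centroid shift being exactly zero.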

desert oar
#
stopwords_set = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stopwords_set]
uncut barn
#

ah so this way saves time?

desert oar
#

yes, and if you're doing text processing on a lot of data and/or have a big stopwords list, the time savings could add up
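
To spell out why this helps: in the original comprehension, `set(stopwords.words('english'))` sits in the `if` clause, so it is rebuilt for every token. A minimal sketch of building the set once and reusing it, where `STOPWORD_LIST` is a made-up stand-in for the NLTK list and `remove_stopwords` is a hypothetical helper:

```python
# Stand-in list; with NLTK this would be stopwords.words('english').
STOPWORD_LIST = ["the", "a", "is", "of", "and"]

def remove_stopwords(tokens, stopword_set):
    """Keep only tokens that are not in the pre-built stopword set."""
    return [t for t in tokens if t not in stopword_set]

stopword_set = set(STOPWORD_LIST)  # built once, reused for every document
docs = [["the", "cat", "is", "here"], ["a", "bag", "of", "words"]]
cleaned = [remove_stopwords(doc, stopword_set) for doc in docs]
print(cleaned)
```

Passing the set in as an argument keeps the helper easy to reuse across many documents without rebuilding anything.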

uncut barn
#

ahh so this is why my runtime was taking too long, thanks

desert oar
#

that probably isn't the only reason

#

you'd have to share more of your code

charred umbra
#

Does anyone know how to write an ML paper?

For background, I've trained an image-based biostatistical model that can identify COVID-19 at >99% accuracy. I extended the model to three other diseases. It trained on CT scans and x-rays. It successfully identified Coronavirus, Tuberculosis, Carcinoma, & Pneumonia at above 93% accuracy, specificity, sensitivity, and precision when tested through one-vs-all adaptations of 2x2 confusion matrices in cross validation.

The model is deep-learning based, on a semi-supervised platform. It used a convolutional neural network, a deep multilayer perceptron, an isolation forest, and a support vector machine to make diagnoses. It shows promising results, and I want to write the paper. Idk how these types of papers are written though.

grave frost
#

if its just for CV, not for actual conferences then I guess that doesn't really matter

charred umbra
#

Yeah ik, but theyre all written differently

#

so Im so confused

#

btw, Im just writing this paper as a mf school science fair project, so Im not trynna get compensated or anything

grave frost
#

I also wanted to write a paper too 😁 but writing even a decent paper requires a ton of knowledge and studying of previous methods - not to mention all the formality

charred umbra
#

Thing is that Im required to

#

I already did the actual experiment with the network and stuff

#

Now I just gotta put it into word form by summer's end

grave frost
#

if its only for a school project

#

then you just need to formalize whatever you have done - no need to write a full fledged paper

charred umbra
#

Yeah Im thinking I want to try and get the paper to ISEF, but idk if it's good enough though

grave frost
#

do they specifically ask for research papers? if not, then a document would be enough

charred umbra
#

In the past 2 years, at least one paper at my regionals fair that advanced to ISEF was a ML classifier for cancer. The aforementioned model can do cancer, as well as other diseases like tuberculosis and stuff

charred umbra
grave frost
#

seems like just a document formatting way

#

so if you put decent amount of formalism in it, you would be fine

charred umbra
#

ebic

lapis sequoia
#

I have a question for professional data scientists: in which contexts do you use maths, and in which contexts do you use coding? thank you 🙂

ember sapphire
#

my cost function is increasing from one iteration to the next

#

does that mean i have a problem in my implementation?

grave sparrow
#

So I am running into this dilemma that I have not found an answer for and its weird...
When concatenating the values of 2 columns at a row level, you have to do something ugly like

df['combined']=df['one'].astype(str)+' stuff ' + df['two'].astype(str)

It works and all but I can't help but feel like its a code smell

desert oar
#

and what are the actual datatypes here?

#

sometimes it is the best way to do something, but it's rare that this is actually what you want to do

grave sparrow
#

Can't say particularly. Basically generating an instruction based on values in two columns.

desert oar
#

the only other way to perform this particular task is with .apply over rows

#

if the column is already a string column, why astype(str)?

grave sparrow
#

They are strings.

desert oar
#

if there are nulls, you need to handle those differently

#

astype(str) will do the wrong thing for the most part

grave sparrow
#

I was getting weird errors.

desert oar
#

what errors

#

!e ```python
import pandas as pd
s1 = pd.Series(['a', 'b', None])
s2 = pd.Series(['x', None, 'z'])
print( s1.astype(str) + ' -> ' + s2.astype(str) )
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0       a -> x
002 | 1    b -> None
003 | 2    None -> z
004 | dtype: object
grave sparrow
#

Sry actually not weird.

desert oar
#

that probably isn't what you want

grave sparrow
#

Float > string coercion error

desert oar
#

so you have mixed data types

#

i.e. not strings

#

are there np.nan's in there?

#

!e ```python
import pandas as pd
import numpy as np
s1 = pd.Series(['a', 'b', np.nan])
s2 = pd.Series(['x', np.nan, 'z'])
print( s1.astype(str) + ' -> ' + s2.astype(str) )
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0      a -> x
002 | 1    b -> nan
003 | 2    nan -> z
004 | dtype: object
desert oar
#

again, probably not what you want

grave sparrow
#

Hmm how would you handle that?

#

Assuming the concatenated columns can potentially be null
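
One possible way to handle that, as a sketch: `Series.str.cat` lets you choose between propagating missing values and substituting a placeholder through its `na_rep` argument. This assumes both columns hold only strings or missing values; the column names `combined_na` and `combined_filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": ["a", "b", np.nan], "two": ["x", np.nan, "z"]})

# Option 1: a row with a missing value on either side stays missing.
df["combined_na"] = df["one"].str.cat(df["two"], sep=" stuff ")

# Option 2: treat missing values as empty strings.
df["combined_filled"] = df["one"].str.cat(df["two"], sep=" stuff ", na_rep="")

print(df)
```

Which option is right depends on whether a half-empty instruction is meaningful downstream. If it is not, keeping the result missing makes those rows easy to filter out later.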