#data-science-and-ml | Python | Page 321

velvet thorn Jun 18, 2021, 12:13 AM

#

wrong part

#

yeah

#

but if it’s not

#

then the brain has the same limitation

#

of course we know @ very small scales

grave frost Jun 18, 2021, 12:14 AM

#

velvet thorn then the brain has the same limitation

I previously said the brain's structure is kinda dynamic.

velvet thorn Jun 18, 2021, 12:14 AM

#

the world is not deterministic

grave frost Jun 18, 2021, 12:14 AM

#

it's extremely complicated

velvet thorn Jun 18, 2021, 12:14 AM

#

grave frost I previously said the brain's structure is kinda dynamic.

what does “dynamic” mean in this context

grave frost Jun 18, 2021, 12:14 AM

#

it can't be mapped as a static function

grave frost Jun 18, 2021, 12:15 AM

#

velvet thorn what does “dynamic” mean in this context

honestly, even I don't know neuroscience fully. I assume it's due to the voting mechanism that aggregates dreams, live experiences and memories to make different predictions each time

#

in the sense that the structure always changes

#

some connections break

#

some do not

#

it's kinda contested BTW, and the research is pretty new. but we do apparently modify the brain over time in any case

velvet thorn Jun 18, 2021, 12:16 AM

#

grave frost its kinda contested BTW, and the research is pretty new. but we do apparently mo...

yesyes

#

of course this happens

#

but

#

the point is that such changes

#

if they are deterministic, they can be modelled through the history of state, which the hypothetical function takes

#

and if they’re not, then those aspects are just random, which can also be theoretically modelled

grave frost Jun 18, 2021, 12:17 AM

#

velvet thorn yesyes

your argument is that as long as humans perform actions, those actions can be considered a function. if that's the case, then even the behavior of quarks is a definite function, its output being the set of coordinates?

#

not everything can be modelled, and a deterministic view of the universe is pretty incorrect. Our brain is far from a function; neuroscientists have often laughed at that idea.

HTM hasn't achieved AGI mostly because its breakthrough ideas are very new, and it's only slowly picking up steam as something people actually build on and research fully. Maybe it won't lead us to AGI, but it's the closest thing we've got: the path with the least error, compared to DL

cedar sun Jun 18, 2021, 2:00 AM

#

guys, what are the ways to increase a model acc?
from the most newbie ones to the most advanced

serene scaffold Jun 18, 2021, 2:27 AM

#

cedar sun guys, what are the ways to increase a model acc? from the most newbie ones to th...

what does the model do?

exotic maple Jun 18, 2021, 4:17 AM

#

Guys I have a question. What kind of statistical test can I use to determine a categorical feature's importance on a regression task?

I've been reading a bit about it, and a one-way ANOVA (after turning the categorical features into dummies) seems like the most viable approach, but I'd like to be sure.

#

the TL;DR -> What statistic is best to match: Categorical Input -> Numerical Output

#

I tried using sklearn's f_regression and mutual_info_regression, but i'm not confident in the significance of these results
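For a single categorical feature and a numeric target, a one-way ANOVA can be run directly on the target values grouped by category, with no dummies needed. A minimal sketch, assuming scipy and pandas are available; the helper name and the toy `color`/`price` data are hypothetical. Kruskal-Wallis is included as a rank-based alternative for when the target is far from normally distributed.

```python
import pandas as pd
from scipy import stats


def categorical_target_tests(df, feature, target):
    """Test whether a numeric target differs across the levels of one categorical feature.

    Returns (F, p) from one-way ANOVA and (H, p) from Kruskal-Wallis.
    """
    # One array of target values per category level
    groups = [g[target].to_numpy() for _, g in df.groupby(feature)]
    anova = stats.f_oneway(*groups)
    kruskal = stats.kruskal(*groups)
    return (anova.statistic, anova.pvalue), (kruskal.statistic, kruskal.pvalue)


# Hypothetical toy data where the target clearly depends on "color"
df = pd.DataFrame({
    "color": ["a"] * 6 + ["b"] * 6 + ["c"] * 6,
    "price": [1, 2, 3, 2, 1, 2, 10, 11, 12, 11, 10, 11, 20, 21, 22, 21, 20, 21],
})
(f_stat, p_anova), (h_stat, p_kw) = categorical_target_tests(df, "color", "price")
```

A small p-value only says the group means differ, not by how much; with many rows almost any difference becomes "significant", so it is worth looking at the per-group means as well.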

blazing bridge Jun 18, 2021, 6:40 AM

#

Hi, I am currently working on a deep learning model for image colorization and have a pretty big dataset as well

#

Even though i switched the dataset, the results aren't very good at all

#

I am not sure how I can improve them

#

https://www.kaggle.com/darthgera/colorization

#

this is the dataset I am using

#

if anyone knows how I can achieve good results, please ping me

uncut barn Jun 18, 2021, 8:58 AM

#

```py
from sqlalchemy import create_engine
from cubes.tutorial.sql import create_table_from_csv

engine = create_engine('sqlite:///data.sqlite')
create_table_from_csv(engine,
                      "country-income.csv",  # name of file
                      table_name="country_income",  # give a name to the table
                      fields=[  # all the columns in the csv file
                          ("region", "string"),
                          ("age", "integer"),
                          ("online_shopper", "string")],
                      create_id=True
                      )
```

How do I load the CSV file using Cubes, and create a JSON file for the data cube model, and create a data cube for the data?

#

I've done this part but don't know where to go from here
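A possible next step, hedged: Cubes describes cubes in a JSON model file, which can be written with Python's own `json` module. In the sketch below, the choice of `region` and `online_shopper` as dimensions and `age` as the measure is an assumption, as are the aggregate names; the key layout (top-level `"dimensions"` and `"cubes"`, per-cube `"measures"` and `"aggregates"`) follows my memory of the Cubes tutorial, so check it against the Cubes modeling docs for your version.

```python
import json

# Hypothetical Cubes model for the "country_income" table created above
model = {
    "dimensions": [
        {"name": "region"},
        {"name": "online_shopper"},
    ],
    "cubes": [
        {
            "name": "country_income",  # should match the table_name used above
            "dimensions": ["region", "online_shopper"],
            "measures": [{"name": "age", "label": "Age"}],
            "aggregates": [
                {"name": "age_avg", "function": "avg", "measure": "age"},
                {"name": "record_count", "function": "count"},
            ],
        }
    ],
}

model_json = json.dumps(model, indent=4)
with open("model.json", "w") as f:
    f.write(model_json)
```

From there, the Cubes tutorial loads the model through a `Workspace` (register an SQL store pointing at `sqlite:///data.sqlite`, import `model.json`, then get a browser for `country_income` and call `aggregate()` on it); treat those steps as pointers to look up rather than verified calls.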

gritty spear Jun 18, 2021, 10:05 AM

#

hi, anyone tried GPT with graph database ?

wintry crescent Jun 18, 2021, 10:37 AM

#

How do I implement an ANN in Python for image recognition (not letters)?

cedar sun Jun 18, 2021, 10:58 AM

#

serene scaffold what does the model do?

Classification

grave frost Jun 18, 2021, 11:25 AM

#

gritty spear hi, anyone tried GPT with graph database ?

Tabular dataset? try TAPAS

grave frost Jun 18, 2021, 11:26 AM

#

exotic maple I tried using sklearn's f_regression and mutual_info_regression, but i'm not con...

you can try naive ablation experiments to confirm their results
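A naive ablation experiment could look like this: refit the model with one feature removed and see how much the cross-validated score drops. A minimal sketch, assuming scikit-learn; the helper name, the `LinearRegression` default and the synthetic data are hypothetical. Each feature maps to a list of column indices, so all dummy columns of one categorical feature are dropped together.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def ablation_scores(X, y, feature_groups, model=None, cv=5):
    """Drop-one-feature ablation.

    feature_groups maps a feature name to the column indices it occupies in X.
    Returns the baseline CV score (R^2 for regressors) and, per feature,
    how much that score drops when the feature's columns are removed.
    """
    model = LinearRegression() if model is None else model
    baseline = cross_val_score(model, X, y, cv=cv).mean()
    drops = {}
    for name, cols in feature_groups.items():
        X_without = np.delete(X, cols, axis=1)
        score = cross_val_score(model, X_without, y, cv=cv).mean()
        drops[name] = baseline - score  # positive = the model got worse without it
    return baseline, drops


# Hypothetical synthetic data: y depends on x0 and on one categorical feature
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
noise = rng.normal(size=n)
category = rng.integers(0, 3, size=n)
dummies = np.eye(3)[category][:, 1:]  # drop-first one-hot encoding
y = 3 * x0 + 2 * dummies[:, 0] + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, noise, dummies])

baseline, drops = ablation_scores(X, y, {"x0": [0], "noise": [1], "category": [2, 3]})
```

Unlike a univariate test, this measures what the feature adds on top of the others, at the cost of one extra round of cross-validation per feature.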

gritty spear Jun 18, 2021, 11:44 AM

#

@grave frost can you please give some references?

strong zephyr Jun 18, 2021, 12:15 PM

#

Hello All, I thought I would share another tool you can use to visualize your data, let me know what you think
https://github.com/codemation/easycharts

serene scaffold Jun 18, 2021, 12:44 PM

#

cedar sun Classification

A lot of models do, so that isn't specific enough.

grand breach Jun 18, 2021, 12:57 PM

#

i'm thinking of moving my anaconda dir from C to a different drive. if i move it and create a symlink at the old location (the path that's in my PATH variable), will everything work as expected?

#

or should i back up my base and other envs and restore them after reinstalling?

primal tulip Jun 18, 2021, 1:13 PM

#

grand breach i'm thinking of moving my anaconda dir from C to a different drive. if i create ...

Wait for someone else to answer or confirm this, but a symlink should work without any issues. I haven't done it, that's why I'm not sure.

grand breach Jun 18, 2021, 1:15 PM

#

or maybe just move only my environments to get back some space?

primal tulip Jun 18, 2021, 1:15 PM

#

You would definitely be able to do it in Linux. If Windows is not acting weird, in theory it all should be fine as well.

primal tulip Jun 18, 2021, 1:15 PM

#

grand breach or maybe just move only my environments to get back some space?

If you want to do it manually, yeah.

grand breach Jun 18, 2021, 1:15 PM

#

i'm on windows

#

I was clearing up my C drive and saw my anaconda installation was taking ~12 GB, so I thought why not move it somewhere else. I did a conda clean --all to remove some unnecessary files

near cosmos Jun 18, 2021, 1:20 PM

#

It's been a while for me on windows, but I'm skeptical this will work because windows aliases usually aren't invisible to programs in the way symlinks are (at least ~10 years ago)

#

You might try just moving the cache

#

https://docs.anaconda.com/anaconda/user-guide/tasks/shared-pkg-cache/

primal tulip Jun 18, 2021, 1:35 PM

#

You can always count that Windows will be weird and quirky, then.

sharp pawn Jun 18, 2021, 1:47 PM

#

hey guys, anyone know how to subtract a cosine wave from a sine wave and plot the resulting wave with numpy?
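NumPy arrays subtract element-wise, so the difference wave is just `np.sin(t) - np.cos(t)` evaluated on a shared time axis. A minimal sketch; the one-period range and the 500 sample points are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# One period sampled at 500 points
t = np.linspace(0, 2 * np.pi, 500)
sine = np.sin(t)
cosine = np.cos(t)
difference = sine - cosine  # element-wise subtraction

fig, ax = plt.subplots()
ax.plot(t, sine, label="sin(t)")
ax.plot(t, cosine, label="cos(t)")
ax.plot(t, difference, label="sin(t) - cos(t)", linewidth=2)
ax.set_xlabel("t (radians)")
ax.legend()

if __name__ == "__main__":
    plt.show()
```

As a check, sin(t) - cos(t) = sqrt(2) * sin(t - pi/4), so the result is another sinusoid with amplitude sqrt(2), shifted by pi/4.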

cedar sun Jun 18, 2021, 2:01 PM

#

serene scaffold A lot of models do, so that isn't specific enough.

then what are u asking exactly?

earnest hawk Jun 18, 2021, 3:06 PM

#

Hello guys, I have a question. I'm programming a simple application with the MNIST 784 dataset. The application recognizes hand-drawn digits. I use SVC, and here is my problem: training the model takes a long time. I can wait 1 hour and nothing happens. I tried other models like KNN or a DummyClassifier, but the effect is the same.

#

Here is my code

serene scaffold Jun 18, 2021, 3:07 PM

#

cedar sun then what are u asking exactly?

Are you asking how to improve model performance in general? Because if you're asking how to improve the performance of a specific model, I can't guess without knowing what that model is designed to do. What classes does it classify?

#

Don't forget the py

#

!code

earnest hawk Jun 18, 2021, 3:08 PM

#

```py
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.svm import SVC

import py             # the asker's own drawing module (not shown in the thread)
from inne import jes  # the asker's own module that returns the drawn digit (not shown)

mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]

# Display one digit from the dataset as a sanity check
some_digit = X.iloc[999]
some_digit_image = some_digit.values.reshape(28, 28)
plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")

y = y.astype(np.uint8)
# First 60k rows are the standard MNIST training split, last 10k the test split
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Despite the name, this is an RBF-kernel SVC, not an SGDClassifier;
# fitting it on 60k raw-pixel rows is the slow step
sgd_clf = SVC(gamma='auto', probability=False)
sgd_clf.fit(X_train, y_train)

# Endless draw-and-predict loop; stop it with Ctrl+C
while True:
    py.Paint()
    obj = jes.init()
    print(sgd_clf.predict([obj]))
    some_digit_image = obj.values.reshape(28, 28)
    plt.imshow(some_digit_image, cmap="binary")
    plt.axis("off")
    plt.show()
```

cedar sun Jun 18, 2021, 3:08 PM

#

serene scaffold Are you asking how to improve model performance in general? Because if you're as...

so, i was asking about general techniques, but for my particular case, i'm the guy doing the pokemon classification

serene scaffold Jun 18, 2021, 3:09 PM

#

cedar sun so, i was asking general techniques, but for my particular case, im the guy doin...

I don't think it's possible to answer that question in general.

earnest hawk Jun 18, 2021, 3:09 PM

#

serene scaffold Are you asking how to improve model performance in general? Because if you're as...

okey let's start from the beginning

serene scaffold Jun 18, 2021, 3:10 PM

#

earnest hawk okey let's start from the beginning

The comment you're replying to was not directed at you.

earnest hawk Jun 18, 2021, 3:10 PM

#

oh sorry

serene scaffold Jun 18, 2021, 3:11 PM

#

earnest hawk Hello guys, i have a question. Actually i programming a simple application with ...

Which part is taking a long time? Because you have an infinite while loop at the end.

earnest hawk Jun 18, 2021, 3:12 PM

#

The problem is with training the model. Here:
```py
sgd_clf.fit(X_train, y_train)
```

#

I checked it with a debugger

serene scaffold Jun 18, 2021, 3:15 PM

#

earnest hawk The problem is with training model. Here ```py sgd_clf.fit(X_train, y_train) ```

if you have a lot of data, that's the expected bottleneck

earnest hawk Jun 18, 2021, 3:17 PM

#

This dataset has 70k records. Interestingly, this problem didn't occur earlier.
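Fit time for an RBF-kernel SVC grows much faster than linearly with the number of samples, so 60k rows can take a very long time; with `gamma='auto'` and raw 0-255 pixels, the kernel values can also end up nearly all zero, which tends to make training slower and accuracy worse. A common workaround, sketched here under assumptions (the helper name is hypothetical, the 10k subsample size is arbitrary, and dividing by 255 assumes 8-bit pixels): scale the pixels to [0, 1] and fit on a random subsample first to get a feel for timing.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def scale_pixels(a):
    # Map 8-bit pixel intensities (0-255) into [0, 1]
    return np.asarray(a, dtype=np.float64) / 255.0


def fit_scaled_svc_subsample(X, y, n_samples=10_000, seed=0):
    """Fit an RBF SVC on scaled pixels from a random subsample of rows (hypothetical helper)."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    model = make_pipeline(FunctionTransformer(scale_pixels), SVC(gamma="scale"))
    model.fit(X[idx], y[idx])
    return model


# On the thread's data this would be used as:
#     model = fit_scaled_svc_subsample(X_train, y_train)
#     print(model.score(X_test, y_test))

# Quick check on hypothetical synthetic "images": dark vs bright 784-pixel vectors
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.integers(0, 100, (100, 784)), rng.integers(150, 256, (100, 784))])
y_demo = np.array([0] * 100 + [1] * 100)
demo_model = fit_scaled_svc_subsample(X_demo, y_demo, n_samples=150)
```

If the subsample fits quickly, `n_samples` can be raised step by step to see how the fit time grows before committing to the full 60k.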

lapis sequoia Jun 18, 2021, 4:24 PM

#

can anybody help me with tensorflow?

grave frost Jun 18, 2021, 4:27 PM

#

earnest hawk This database have 70k records. Interestingly earlier this problem did not occur...

Don't you have to batch your data, unless you can hold everything in memory?

solid aurora Jun 18, 2021, 4:54 PM

#

How can I vectorize this?
```py
images = ...  # an ndarray of 3-channel images, with dimensions (batch, height, width, rgb)
images2 = np.zeros(images.shape)
for i, image in enumerate(images):
    images2[i] = skimage.color.rgb2hed(image)
```

#

I know there's a way to run a transformation of each subdimension of an ndarray

#

but I can't recall what it's called

tidal bough Jun 18, 2021, 4:55 PM

#

scipy.color?..

solid aurora Jun 18, 2021, 4:55 PM

#

skimage*

tidal bough Jun 18, 2021, 4:55 PM

#

I can't find any docs on it, huh

solid aurora Jun 18, 2021, 4:55 PM

#

my bad

tidal bough Jun 18, 2021, 4:55 PM

#

ah

#

solid aurora Jun 18, 2021, 4:56 PM

#

ah

tidal bough Jun 18, 2021, 4:56 PM

#

by the docs, it only needs the last dimension to be colors - there's no requirement of it being 3d

solid aurora Jun 18, 2021, 4:56 PM

#

nice catch!

tidal bough Jun 18, 2021, 4:56 PM

#

so you can call it on the entire array
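tidal bough's point generalizes: a per-pixel colour transform that only touches the last axis broadcasts over any leading (batch, height, width) dimensions, so the Python loop disappears. To run with NumPy alone, the sketch uses a hypothetical 3x3 matrix in place of `rgb2hed` (which, as far as I know, also applies a fixed 3x3 matrix to the last axis after a log step). The half-remembered "transform each sub-array" function is probably `np.apply_along_axis`; it gives the same result but still loops in Python internally, so broadcasting is the faster option.

```python
import numpy as np

# Hypothetical 3x3 per-pixel colour transform (YIQ-style coefficients)
M = np.array([[0.299, 0.587, 0.114],
              [0.596, -0.274, -0.322],
              [0.211, -0.523, 0.312]])


def transform_loop(images):
    # Original style: one image at a time
    out = np.zeros(images.shape)
    for i, image in enumerate(images):
        out[i] = image @ M.T
    return out


def transform_vectorized(images):
    # `@` treats the last axis as each pixel's channel vector and broadcasts
    # over (batch, height, width), so no Python loop is needed
    return images @ M.T


def transform_apply_along_axis(images):
    # Same result via apply_along_axis, which still iterates in Python
    return np.apply_along_axis(lambda px: M @ px, -1, images)


images = np.random.default_rng(0).random((8, 32, 32, 3))  # hypothetical batch
```

With `skimage` installed, the equivalent one-liner for the original question is simply `images2 = skimage.color.rgb2hed(images)`.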

acoustic leaf Jun 18, 2021, 5:02 PM

#

Does tensorflow work with python 3.9? On their official page it says it has been tested with python 3.8

#

but doesn't mention py3.9?

serene scaffold Jun 18, 2021, 5:03 PM

#

acoustic leaf Does tensorflow work with python 3.9? In their official page it says it has been...

In that case, I just wouldn't use it with 3.9 until they have a release that they say is 3.9-compatible.

acoustic leaf Jun 18, 2021, 5:03 PM

#

so what do I do then? just reinstall python?

serene scaffold Jun 18, 2021, 5:03 PM

#

acoustic leaf so what do I do then? just reinstall python?

what OS are you on? You can have more than one python version installed at a time.

acoustic leaf Jun 18, 2021, 5:03 PM

#

i am on windows

austere swift Jun 18, 2021, 5:03 PM

#

acoustic leaf Does tensorflow work with python 3.9? In their official page it says it has been...

it does have wheels for python 3.9

tidal bough Jun 18, 2021, 5:03 PM

#

install a 3.8 Python, yes. If you're using something like pyenv (IIRC) you can even manage them automatically

austere swift Jun 18, 2021, 5:04 PM

#

so it should work

acoustic leaf Jun 18, 2021, 5:04 PM

#

I don't know how pyenv works tho. should I just download anaconda?

serene scaffold Jun 18, 2021, 5:04 PM

#

acoustic leaf I don't know how pyenv works tho. should I just download anaconda?

definitely not

serene scaffold Jun 18, 2021, 5:05 PM

#

acoustic leaf i am on windows

if you install 3.8 from the Python website, you can use the py command to pick which version you use

acoustic leaf Jun 18, 2021, 5:08 PM

#

so I can have 2 installations at once and switch between them with the py command?

austere swift Jun 18, 2021, 5:08 PM

#

yes

acoustic leaf Jun 18, 2021, 5:08 PM

#

so I am guessing it changes the default python system-wide?

serene scaffold Jun 18, 2021, 5:09 PM

#

acoustic leaf so I am guessing it changes the default python system-wide?

no

acoustic leaf Jun 18, 2021, 5:10 PM

#

thanks for the help guys. I guess I'll be looking into it.

static granite Jun 18, 2021, 5:33 PM

#

hello. I want to write a program to detect vehicles in traffic.
Does anyone know any good books on this subject? The field of computer vision in general is also relevant

serene scaffold Jun 18, 2021, 5:41 PM

#

static granite hello. I want to write a program to detect vehicles in traffic. Does anyone know...

so it will be detecting which parts of an image are of a vehicle? or will it be a live video feed? something else?

static granite Jun 18, 2021, 5:41 PM

#

live feed from the car

#

for example

serene scaffold Jun 18, 2021, 5:43 PM

#

That isn't something that I know about, but "dashcam footage detect vehicles machine learning" might be a good Google query. But hopefully someone else who knows about that topic will show up here.

static granite Jun 18, 2021, 5:43 PM

#

it doesn't have to be specifically that

#

I'm looking for books about image classification in general

lapis sequoia Jun 18, 2021, 6:26 PM

#

Hey there everybody! I have a little question: do you absolutely need to be an advanced Python programmer (knowing everything from A to Z: beginner-level things like lambda functions and sort(), OOP, socket programming, concurrent programming, data structures & algorithms) to learn AI, ML & DL, or do you only need the core concepts like the basics and OOP?

grave frost Jun 18, 2021, 7:15 PM

#

static granite hello. I want to write a program to detect vehicles in traffic. Does anyone know...

look up YOLO models, use them for inference, count the number of vehicles. bingo!

grave frost Jun 18, 2021, 7:17 PM

#

lapis sequoia Hey there everybody! I have a little question, do you absolutely need to be an a...

nah, moderate python is more than enough for starters. you might have to level up a bit later on, but if you end up in industry, chances are you'd use simple stuff that doesn't require extreme knowledge of python

worn bough Jun 18, 2021, 7:59 PM

#

lapis sequoia Hey there everybody! I have a little question, do you absolutely need to be an a...

These are some advanced topics you're mentioning. Most of them don't relate to ML&DS in obvious ways. If you master the basics (I'd say the content of Automate The Boring Stuff), then you're ready to learn numpy and pandas, and after that sklearn and pytorch/tensorflow. This might be an unpopular opinion among thoroughly trained people, but the libraries keep getting easier to use and you learn the most by just applying them in real life

grave frost Jun 18, 2021, 8:04 PM

#

worn bough These are some advanced topics you're mentioning. Most of them don't relate to M...

it's not an unpopular opinion, just a wrong one. shallow knowledge won't get you anywhere significant if you don't put in the effort yourself

iron basalt Jun 18, 2021, 8:53 PM

#

lapis sequoia Hey there everybody! I have a little question, do you absolutely need to be an a...

If you want to be the person that creates the libraries used by others to do ML/AI/DL/DS stuff (e.g. pytorch, numpy, pandas, etc) then yes.

sleek sorrel Jun 18, 2021, 8:55 PM

#

Hello guys, I am having problems pivoting a table with pandas (posted in the Kiwi help room). Can someone help please?

flint mason Jun 18, 2021, 8:57 PM

#

Is it worth it to learn excel for data analysis?

iron basalt Jun 18, 2021, 8:57 PM

#

You should also have a solid grasp of the computational complexity of various data structures / algorithms to make an informed decision on whether something is a viable option or requires too much compute (even if you never use the specific structures / algorithms studied, just get used to estimating how fast things are).

thorn bobcat Jun 18, 2021, 9:11 PM

#

anyone heard of background matting?

spare vale Jun 18, 2021, 9:16 PM

#

flint mason Is it worth it to learn excel for data analysis?

A little, but it can't deal with large datasets.

serene scaffold Jun 18, 2021, 9:48 PM

#

flint mason Is it worth it to learn excel for data analysis?

you need to be able to work with tabular data in general, so if you were to learn excel, you'd probably learn a lot of the terminology surrounding data manipulation. But I only use excel (well, google sheets in this case) to put data on my work's google drive for my coworkers to look at.

thorn bobcat Jun 18, 2021, 10:37 PM

#

https://arxiv.org/pdf/2011.11961.pdf this is a paper on an algo called grabcut

#

thinking of introducing a gan to cut human interference since the original paper required human guidance.

#

also object detection to highlight the object rather than having a user create a box

gritty spear Jun 18, 2021, 11:16 PM

#

Hi, I have been reading for the past 3 weeks now but i'm still at a loss. Can anyone please guide ?

cedar sun Jun 18, 2021, 11:16 PM

#

eeeehm guys one thing, when using ImageDataGenerator

#

How many new imgs are returned?

#

or how can i control it?
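As far as I know, `ImageDataGenerator.flow()` doesn't produce a fixed number of new images: it returns an iterator that keeps yielding randomly augmented batches of the original data indefinitely, and how many you see per epoch is `steps_per_epoch * batch_size` (by default, roughly one pass over the data). The generator below is not the Keras API, just a hypothetical NumPy stand-in with the same behaviour (random horizontal flips only) to show where the count is controlled.

```python
import itertools
import math

import numpy as np


def augmented_batches(images, batch_size, seed=0):
    """Endless generator of randomly flipped batches (a stand-in for flow())."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:  # never stops on its own, like a Keras data iterator
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = images[order[start:start + batch_size]].copy()
            flip = rng.random(len(batch)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1]  # flip the width axis of (N, H, W[, C])
            yield batch


images = np.arange(10 * 4 * 4).reshape(10, 4, 4)  # 10 toy 4x4 "images"
batch_size = 3
steps_per_epoch = math.ceil(len(images) / batch_size)  # one pass over the data
# The caller decides how many batches to draw, as fit(steps_per_epoch=...) does
epoch = list(itertools.islice(augmented_batches(images, batch_size), steps_per_epoch))
```

So to get "more" augmented images, you train for more epochs or raise `steps_per_epoch`; each pass draws fresh random augmentations of the same originals.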

gritty spear Jun 18, 2021, 11:17 PM

#

i'm trying to use available libraries to generate texts from existing keywords. I have a few texts I want to feed in for training, but I'm a bit at a loss on where to start

cedar sun Jun 18, 2021, 11:19 PM

#

use gpt3

gritty spear Jun 18, 2021, 11:19 PM

#

i'm told GPT3 is not accessible to the public yet.

thorn bobcat Jun 18, 2021, 11:32 PM

#

Use GPT2 then

serene scaffold Jun 18, 2021, 11:58 PM

#

gritty spear i'm told GPT3 is not accessible to the public yet.

There's an API for working with it, but it's not open source yet, no.

gritty spear Jun 19, 2021, 12:06 AM

#

serene scaffold There's an API for working with it, but it's not open source yet, no.

what's the url pls? how is the delivery compared to GPT2?

#

i mean the content quality

serene scaffold Jun 19, 2021, 12:07 AM

#

gritty spear what's the url pls? how is the how is the delivery compared to GPT2?

I'm not sure. Sentence generation isn't part of my area within nlp.

gritty spear Jun 19, 2021, 12:08 AM

#

what's the url for the api?

cedar sun Jun 19, 2021, 12:09 AM

#

stelercus, do u have by chance any snippet using albumentations?

radiant kayak Jun 19, 2021, 12:15 AM

#

Hello

grave frost Jun 19, 2021, 12:22 AM

#

gritty spear what's the url pls? how is the how is the delivery compared to GPT2?

terrific

serene scaffold Jun 19, 2021, 1:24 AM

#

cedar sun stelercus, do u have by chance any snipper using albumentations?

idk what that is, sorry

cedar sun Jun 19, 2021, 1:30 AM

#

dw

#

from here

#

https://albumentations.ai/docs/api_reference/augmentations/transforms/

#

what augmentations do u think will be useful?

visual violet Jun 19, 2021, 2:14 AM

#

hello guys

#

i am stuck (again)

#

suppose i have this

#

the "predictions" column says which cluster each object belongs to

#

so 0 means cluster 1

#

1 means cluster 2, and so on

#

now i want to see whether an object's classification has anything to do with the cluster it ends up in

#

but i have no idea how to proceed

#

my initial strat is to do this https://matplotlib.org/stable/gallery/lines_bars_and_markers/barh.html

#

but it seems a bit weird

serene scaffold Jun 19, 2021, 2:33 AM

#

visual violet suppose i have this

are you sure you don't have ingredient and classification backwards? it seems like there's tons of unique values on either side.

#

I guess that's fine, actually

visual violet Jun 19, 2021, 2:34 AM

#

you see the annoying thing about drugs is

#

one ingredient can treat different things lmao

#

and there are sub divisions of classes

#

like there is an umbrella class, a sub class, an even subber class

serene scaffold Jun 19, 2021, 2:35 AM

#

@visual violet can you do print(df.sample(axis=0, n=7).to_csv())?

#

remember that I can't do anything with screenshots

visual violet Jun 19, 2021, 2:35 AM

#

very true

#

,Ingredient,predictions,classification
1397,NABUMETONE,0,Nonsteroidal Anti-inflammatory Drugs (NSAIDs)
1738,PROPRANOLOL HCL,0,propranolol hydrochloride
1801,LEVETIRACETAM,0,Seizure Disorders
733,DULOXETINE HYDROCHLORIDE,0,"Analgesics, Centrally-acting Nonopioid; Anxiolytics, Non-benzodiazepines; Fibromyalgia; Neuropathy/Neuralgia; Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs)"
1773,OXYBUTYNIN CHLORIDE,0,"Antispasmodics, Urinary"
1229,LABETALOL HYDROCHLORIDE,0,Beta Blockers
16,ZOLMITRIPTAN,2,Headache/Migraine

#

i hope this works

serene scaffold Jun 19, 2021, 2:39 AM

#

visual violet ,Ingredient,predictions,classification 1397,NABUMETONE,0,Nonsteroidal Anti-infla...

```py
df.groupby('classification')['predictions'].value_counts().unstack(fill_value=0)
```

#

this way you can see the frequency of each class in each cluster.
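Here is that one-liner run on the seven sample rows posted above, next to `pd.crosstab`, which builds the same classification x cluster count table. Only clusters 0 and 2 occur in this sample, so there is no column for cluster 1 here.

```python
import io

import pandas as pd

# The sample rows from the thread, exactly as posted
sample_csv = """\
,Ingredient,predictions,classification
1397,NABUMETONE,0,Nonsteroidal Anti-inflammatory Drugs (NSAIDs)
1738,PROPRANOLOL HCL,0,propranolol hydrochloride
1801,LEVETIRACETAM,0,Seizure Disorders
733,DULOXETINE HYDROCHLORIDE,0,"Analgesics, Centrally-acting Nonopioid; Anxiolytics, Non-benzodiazepines; Fibromyalgia; Neuropathy/Neuralgia; Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs)"
1773,OXYBUTYNIN CHLORIDE,0,"Antispasmodics, Urinary"
1229,LABETALOL HYDROCHLORIDE,0,Beta Blockers
16,ZOLMITRIPTAN,2,Headache/Migraine
"""
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Rows: classifications; columns: cluster ids; cells: how many rows fall in each pair
counts = df.groupby("classification")["predictions"].value_counts().unstack(fill_value=0)

# The same table via crosstab
ct = pd.crosstab(df["classification"], df["predictions"])
```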

visual violet Jun 19, 2021, 2:40 AM

#

sheesh did you just do everything in one line of code

serene scaffold Jun 19, 2021, 2:40 AM

#

but it's your job as the human to speculate as to why your feature selection resulted in which class ending up in which cluster.

visual violet Jun 19, 2021, 2:41 AM

#

yeah i used selenium to get the classification hehe

#

even though there are 84 None values

serene scaffold Jun 19, 2021, 2:41 AM

#

selenium?

#

wat

visual violet Jun 19, 2021, 2:41 AM

#

i just asked you about it in the morning lol

serene scaffold Jun 19, 2021, 2:41 AM

#

yeah but all I understood out of that was that you wanted to handle exceptions

visual violet Jun 19, 2021, 2:42 AM

#

oh right

#

the ultimate goal is to find subclasses for each ingredient

#

which i have finally done

serene scaffold Jun 19, 2021, 2:43 AM

#

good job!

visual violet Jun 19, 2021, 2:44 AM

#

serene scaffold ```py df.groupby('classification')['predictions'].value_counts().unstack(fill_va...

will this work for none value?

serene scaffold Jun 19, 2021, 2:44 AM

#

visual violet will this work for none value?

Do those manifest as NaNs?

visual violet Jun 19, 2021, 2:46 AM

#

i tried it, it does work
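One caveat, as far as I know: `groupby` silently drops rows whose key is `None`/`NaN` by default, so the None classifications won't raise an error, but they won't appear in the table either. Filling them with an explicit label first keeps them visible. A sketch on hypothetical toy rows; the `"Unknown"` label is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy data with one missing classification
df = pd.DataFrame({
    "classification": ["Beta Blockers", "Beta Blockers", None, "Seizure Disorders"],
    "predictions": [0, 1, 0, 2],
})

# Default behaviour: the row with None is left out of the groups
dropped = df.groupby("classification")["predictions"].value_counts().unstack(fill_value=0)

# Keep it by giving missing values an explicit label first
kept = (
    df.assign(classification=df["classification"].fillna("Unknown"))
    .groupby("classification")["predictions"]
    .value_counts()
    .unstack(fill_value=0)
)
```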

#

so i took a look at your code

#

i mean one line of code

#

it does put the number of classification for each cluster

#

but it doesn't sort for each cluster

#

for example

#

i know you can't read screenshots but i can't do anything else :((

#

how can i put tricyclic on top of acne

serene scaffold Jun 19, 2021, 2:54 AM

#

visual violet how can i put the trycylic on top of acne

do you just want to arbitrarily put that one on top, or is there a reason?

visual violet Jun 19, 2021, 2:55 AM

#

it seems like tricyclic appears 32 times and acne appears 10 times

#

in cluster 0

serene scaffold Jun 19, 2021, 3:01 AM

#

visual violet i know you can't read screenshot but i can't do anything else :((

```py
df.rename(axis=1, mapper=str).sort_values('0 1 2'.split(), ascending=False)
```
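The `sort_values` call above sorts lexicographically: by column `'0'` first, using `'1'` and then `'2'` only to break ties, which is why a row with a big value in `'2'` can still land near the bottom. A sketch on a hypothetical count table (made-up labels and numbers). The result is assigned to a new name, because `sort_values` returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

# Hypothetical classification x cluster count table (made-up numbers)
counts = pd.DataFrame(
    {0: [0, 5, 5, 40], 1: [3, 2, 7, 0], 2: [18, 9, 0, 1]},
    index=pd.Index(["D", "C", "B", "A"], name="classification"),
)

# Integer column labels -> strings, so they can be referred to as '0', '1', '2'
counts = counts.rename(axis=1, mapper=str)

# Descending by '0'; ties in '0' are broken by '1', then by '2'
ordered = counts.sort_values(["0", "1", "2"], ascending=False)
```

Here B and C tie on `'0'` (5 each), so `'1'` decides that B comes first, and D ends up last despite having the largest `'2'` count.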

visual violet Jun 19, 2021, 3:01 AM

#

do i run this code first

#

or the other one first

serene scaffold Jun 19, 2021, 3:02 AM

#

visual violet do i run this code first

it's intended to act upon the dataframe in the screenshot I replied to

visual violet Jun 19, 2021, 3:02 AM

#

oh right

#

i have a non-python question

serene scaffold Jun 19, 2021, 3:02 AM

#

what is the meaning of life? idk

visual violet Jun 19, 2021, 3:02 AM

#

suppose i am reading a research paper and i like a piece of evidence

serene scaffold Jun 19, 2021, 3:02 AM

#

visual violet suppose i am reading a research paper and i like a piece of evidence

burn it

visual violet Jun 19, 2021, 3:02 AM

#

but that evidence is linked to another research paper

#

do i cite the research paper i am reading

#

or the original source

serene scaffold Jun 19, 2021, 3:03 AM

#

depends on which claim in the paper you're citing

visual violet Jun 19, 2021, 3:04 AM

#

what if the evidence is just a fact

#

for example, the research paper says Americans eat 3 cheeseburgers a day (python committee)

#

do i cite python committee?

serene scaffold Jun 19, 2021, 3:06 AM

#

visual violet do i cite python committee?

was it the "python committee" that conducted the survey that determined that?

visual violet Jun 19, 2021, 3:06 AM

#

yes

serene scaffold Jun 19, 2021, 3:06 AM

#

then yes

visual violet Jun 19, 2021, 3:06 AM

#

oh man

#

i like how in the background info section

#

the author links to many other research papers

#

serene scaffold Jun 19, 2021, 3:20 AM

#

@visual violet you didn't save the result of the first statement to a variable

#

So it got thrown away. Pandas rarely modifies the source data.

visual violet Jun 19, 2021, 3:22 AM

#

i thought ingredient_list was already modified

#

my bad

serene scaffold Jun 19, 2021, 3:22 AM

#

Nope. You should save it to a variable with a different name.

visual violet Jun 19, 2021, 3:22 AM

#

this may be too much to ask

#

but you are sorting on the first column

serene scaffold Jun 19, 2021, 3:23 AM

#

Ya

visual violet Jun 19, 2021, 3:23 AM

#

can you sort over all three?

serene scaffold Jun 19, 2021, 3:23 AM

#

It is.

visual violet Jun 19, 2021, 3:23 AM

#

i mean it won't be a complete beautiful table like that

serene scaffold Jun 19, 2021, 3:23 AM

#

It sorts on the first column, then the second, then the third

visual violet Jun 19, 2021, 3:23 AM

#

serene scaffold Jun 19, 2021, 3:24 AM

#

Looks right to me.

visual violet Jun 19, 2021, 3:25 AM

#

i am glad that each cluster doesn't have the same classification popularity

#

i know what i just wrote doesn't make much sense to anybody

serene scaffold Jun 19, 2021, 3:25 AM

#

So, you're glad that the clusters are largely disjoint with respect to what classes the instances have.

visual violet Jun 19, 2021, 3:26 AM

#

oh yes

#

english has rejoined the chat

#

this is potentially very meaningful

#

so if your code does what you say

#

then the biggest count in column '2' is 18?

serene scaffold Jun 19, 2021, 3:27 AM

#

My code always does what I say

visual violet Jun 19, 2021, 3:27 AM

#

there is no bigger number

serene scaffold Jun 19, 2021, 3:27 AM

#

visual violet then the biggest count in column '2' is 18?

No

#

That row is there because of the 40 in the zero column

#

If you want it to sort by the maximum value of each row, that's different

visual violet Jun 19, 2021, 3:29 AM

#

is there a way to just break the column '2' away along with the ingredient and sort it by itself

serene scaffold Jun 19, 2021, 3:29 AM

#

You can select only those rows where the '2' column is not 0

#

With loc
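The two alternatives mentioned above, sketched on the same kind of hypothetical count table (made-up labels and numbers, columns already renamed to strings): ordering rows by their largest count in any cluster, and pulling out only cluster `'2'` with `loc` and sorting it on its own.

```python
import pandas as pd

# Hypothetical classification x cluster count table (made-up numbers)
counts = pd.DataFrame(
    {"0": [0, 5, 5, 40], "1": [3, 2, 7, 0], "2": [18, 9, 0, 1]},
    index=pd.Index(["D", "C", "B", "A"], name="classification"),
)

# 1) Order rows by their maximum count across all clusters
by_max = counts.loc[counts.max(axis=1).sort_values(ascending=False).index]

# 2) Keep only classifications that occur in cluster '2', sorted by that count
cluster2 = counts.loc[counts["2"] != 0, "2"].sort_values(ascending=False)
```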

#

Anyway I must go to sleep

visual violet Jun 19, 2021, 3:30 AM

#

good night!

serene scaffold Jun 19, 2021, 3:30 AM

#

Bye!

lapis sequoia Jun 19, 2021, 4:18 AM

#

grave frost nah, moderate python is more than enough for starters. you might have to upgrade...

@grave frost what do you mean by "Moderate Python"? Is it OOP and the basics?

gritty spear Jun 19, 2021, 4:54 AM

#

how do i feed in my custom text for training GPT and BERT?

ashen sable Jun 19, 2021, 5:24 AM

#

guys i installed tensorflow using pip, and when i import it i get this error:

```
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
```

#

any explanation ?

eager cradle Jun 19, 2021, 6:49 AM

#

how make columns wider

#

wo dat just looks ugly

velvet thorn Jun 19, 2021, 6:52 AM

#

eager cradle how make columns wider

well, the first problem you have is that that’s one column

#

try sep=';'

eager cradle Jun 19, 2021, 6:54 AM

#

I know sep, but I've only used it with arrays; how should it look here? 🤔

#

🤔

winged stratus Jun 19, 2021, 9:18 AM

#

i think you have delimiter issues in the csv file

winged stratus Jun 19, 2021, 9:19 AM

#

eager cradle I know sep, but I used dat only in array, how it should look here 🤔

try pd.read_csv('patient_data.csv', delimiter=';')
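This is why the data landed in one column: `read_csv` splits on commas by default, so a semicolon-separated file comes back as a single column until you pass `sep=';'` (`delimiter` is an alias for `sep`). A sketch with a small hypothetical file held in memory.

```python
import io

import pandas as pd

# Hypothetical small CSV that uses ';' as its delimiter
raw = "name;age;city\nAlice;34;Oslo\nBob;27;Lima\n"

one_column = pd.read_csv(io.StringIO(raw))      # default sep=',' -> everything in one column
split = pd.read_csv(io.StringIO(raw), sep=";")  # same as delimiter=';'
```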

eager cradle Jun 19, 2021, 9:20 AM

#

@winged stratus thx a lot!

nova tapir Jun 19, 2021, 10:57 AM

#

why does my bot say 'goodbye' to everything I say? how can i fix that?

arctic wedgeBOT Jun 19, 2021, 10:58 AM

#

Hey @nova tapir!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

nova tapir Jun 19, 2021, 11:01 AM

#

https://paste.pythondiscord.com/ovuxucibah.coffeescript

#

https://paste.pythondiscord.com/ujegifivoj.py

nova tapir Jun 19, 2021, 11:05 AM

#

nova tapir why does my bot say 'goodbye' to everything I say?. how can i fix that ?

i'm training it, but it just answers 'goodbye', 'bye' or something similar to whatever i say

#

{"intents": [
  {"tag": "greetings",
  "patterns": ["hello", "hey", "hi", "good day", "greetings", "what's up?", "how is it going?"],
  "responses": ["Hello", "Hey!", "What can I do for you?","Hi", "Good day", "Greetings", "what's up?"]
  },

  {"tag": "goodbye",
  "patterns": ["cya", "See you later", "Goodbye", "I am leaving", "Have a good day", "bye", "cao", "see ya"],
  "responses": ["Sad to see you go :(", "Talk to you later", "Goodbye!","Bye","cao","cya","see ya","bye bye"]
  },

  {"tag": "age",
  "patterns": ["how old", "how old is trojan", "what is yor age", "how old are you", "age?"],
  "responses": ["My owner Trojan is 17 years old!", "17 years!", "I am 1"]
  },

  {"tag": "name",
  "patterns": ["what is your name", "what should I call you", "whats your name?", "who are you?"],
  "responses": ["You can call me Jane!", "I'm Jane!", "I'm AI Assistant of trojan!", "My name is Jane"]
  },

  {"tag": "hours",
  "patterns": ["When are you guys open", "what are your hours", "hours of operation"],
  "responses": ["24/7"]
  }

]}

here is the json file
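The pasted training code isn't visible here, so this is only a sanity check on the data: a minimal scikit-learn bag-of-words classifier trained on the patterns from the JSON above. CountVectorizer + MultinomialNB is a swapped-in model, not whatever the bot actually uses, and `predict_tag` and its 0.3 threshold are hypothetical. If even this maps "hello" to greetings, the intents are fine and the bug is more likely in how the bot builds its inputs or picks the output tag.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Patterns copied from the intents JSON above (responses omitted)
intents = {
    "greetings": ["hello", "hey", "hi", "good day", "greetings", "what's up?", "how is it going?"],
    "goodbye": ["cya", "See you later", "Goodbye", "I am leaving", "Have a good day", "bye", "cao", "see ya"],
    "age": ["how old", "how old is trojan", "what is yor age", "how old are you", "age?"],
    "name": ["what is your name", "what should I call you", "whats your name?", "who are you?"],
    "hours": ["When are you guys open", "what are your hours", "hours of operation"],
}

texts = [p for patterns in intents.values() for p in patterns]
tags = [tag for tag, patterns in intents.items() for _ in patterns]

# Bag-of-words counts -> naive Bayes over intent tags
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, tags)


def predict_tag(message, threshold=0.3):
    """Return the most likely tag, or None when the model isn't confident enough."""
    probs = clf.predict_proba([message])[0]
    best = probs.argmax()
    return clf.classes_[best] if probs[best] >= threshold else None
```

The threshold matters for the "always says goodbye" symptom: in this sketch, a message with no known words gets the class priors as its probabilities, and the largest prior belongs to goodbye, the intent with the most patterns (8). Without a cutoff, unfamiliar input would fall through to goodbye every time.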

vivid echo Jun 19, 2021, 12:06 PM

#

Hey guys

#

Can you suggest me a walkthrough for PyTorch

#

I have experience in Keras and some Tensorflow 2.0 for deep learning

#

But I have not done any convolutional unsupervised systems

#

Only classification and regression supervised

austere swift Jun 19, 2021, 1:21 PM

#

vivid echo Can you suggest me a walkthrough for PyTorch

there's the tutorial walkthrough on PyTorch's site

#

https://pytorch.org/tutorials/beginner/basics/intro.html

#

https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

#

if you want a video form, sentdex has a pytorch tutorial playlist as well https://www.youtube.com/playlist?list=PLQVvvaa0QuDdeMyHEYc0gxFpYwHY2Qfdh

gritty spear Jun 19, 2021, 2:49 PM

#

@nova tapir how do you custom-train your models?

nova tapir Jun 19, 2021, 2:53 PM

#

gritty spear <@!322074548021231618> how do you custom-train your models?

what do you mean ?

gritty spear Jun 19, 2021, 2:53 PM

#

I'm trying to do something similar, but I'm still new to the field. I'm reading a few resources, but I'm still confused

#

I have custom texts that I want to train a model on, so that it can spin them into new text with the same context

#

@nova tapir do you have a few resources I can follow?

jaunty yoke Jun 19, 2021, 3:37 PM

#

Hello, does anyone here know of any packages that will allow me to categorise words?

#

For example, so that I can search a list for all the prepositions and get them back

serene scaffold Jun 19, 2021, 3:46 PM

#

jaunty yoke For example, I can search for all prepositions in a list and it will return this

you can use spaCy, which has a part-of-speech tagger

#

otherwise, you need to know what word categories you have in mind.

#

(which happens to be my area of expertise, insofar as I can be considered an expert on anything)
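A minimal sketch of the spaCy route, assuming spaCy and its small English model (`en_core_web_sm`) are installed. spaCy stores a coarse part-of-speech tag in `token.pos_`, and prepositions get the universal tag `ADP` (adposition). The helper name `words_with_pos` is made up; it only filters tokens, so it works on a spaCy `Doc` or on any objects with `.text` and `.pos_` attributes:

```py
from typing import Iterable, List


def words_with_pos(tokens: Iterable, pos: str = "ADP") -> List[str]:
    """Return the text of every token whose coarse part-of-speech tag equals `pos`.

    `tokens` can be a spaCy Doc or any iterable of objects with `.text` and `.pos_`.
    spaCy uses the universal tag "ADP" (adposition) for prepositions.
    """
    return [tok.text for tok in tokens if tok.pos_ == pos]
```

With spaCy loaded, usage would look like `words_with_pos(nlp("The cat sat on the mat"))` where `nlp = spacy.load("en_core_web_sm")`; the exact tags depend on the model, so check a few sentences by hand.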

cedar sun Jun 19, 2021, 4:00 PM

#

can i use gpt3 as user?

#

like, normal user?

#

not the model

haughty wharf Jun 19, 2021, 5:04 PM

#

Hi,

Is it possible to make a scatterplot of a pandas dataframe with its 256 columns on the x-axis and a 5-class identifier column on the y-axis?

#

Here is what the dataframe looks like

lapis sequoia Jun 19, 2021, 5:12 PM

#

if you've heard of matplotlib and pyplot and used them, which would you prefer in terms of user-friendliness?

#

or please state the reasons one is better than the other, if factors other than user-friendliness exist and matter

lunar bison Jun 19, 2021, 5:17 PM

#

Hello

I'd just like to let you all know i'm a lying, racist, and sexist scammer. I give no regard to anyone else, I'm just a idiotic kid who doesn't know crap and acts like a big man. I constantly spam slurs, and I love scamming people. I also violated multiple discord's terms of service. I'm a piece of garbage.

Please spread the word. I'm a scammer.
This is my ID: 841420280425611265

This account was hijacked by someone I scammed. They're the ones posting this. Deal with caution when dealing with me. Have a good day

lapis sequoia Jun 19, 2021, 5:19 PM

#

monkaW

#

bruh

haughty wharf Jun 19, 2021, 5:26 PM

#

@lapis sequoia
So I've been trying to use matplotlib, but when I try running this in Jupyter Notebook, the cell just freezes

import matplotlib.pyplot as plt

for x, col in enumerate(phoneme_df2.columns):
    for y, ind in enumerate(phoneme_df2['g'].index):
        if phoneme_df2.loc[ind, col]:
            plt.plot(x, y, 'o', color='red')

plt.xticks(range(len(phoneme_df2.columns)), labels=phoneme_df2.columns)
plt.yticks(range(len(phoneme_df2)), labels=phoneme_df2['g'].index)

plt.show()
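The cell most likely hangs because the nested loop calls `plt.plot` once per cell (256 columns × ~4,500 rows is over a million separate plot calls) and then sets thousands of tick labels. A minimal sketch that draws all the points in one `scatter` call instead, assuming the DataFrame passed in holds only the numeric feature columns (for example `phoneme_df2.drop(columns='g')`, if `g` is the only non-numeric column left) and that you want a dot wherever a value is non-zero, as the original `if` test does. The function name is hypothetical:

```py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_nonzero_cells(df: pd.DataFrame):
    """Draw one red dot per non-zero cell of a numeric DataFrame, using a single scatter call."""
    rows, cols = np.nonzero(df.to_numpy())  # row/column positions of every non-zero cell at once
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.scatter(cols, rows, s=2, color="red")  # x = column position, y = row position
    ax.set_xlabel("feature column")
    ax.set_ylabel("row")
    return fig, ax, rows, cols
```

Leaving the default ticks in place (instead of one label per row and column) also keeps the figure responsive.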

#

ohhh woops thought you were speaking to me

lapis sequoia Jun 19, 2021, 5:33 PM

#

KEK

fiery pollen Jun 19, 2021, 5:49 PM

#

Hello everyone, I have a case and my time is very limited; can someone help me? I would be really happy if you did. If anyone is interested, can you message me privately?

ivory jewel Jun 19, 2021, 5:53 PM

#

Hello everyone, if someone can help me understand a piece of code a bit better, I'd be extremely grateful.
I'm short on time; I have a thesis to write...
I'm working with a CNN. I already have the architecture and the code, and I would like some clarifications so I can match the two up (thanks in advance)

gritty spear Jun 19, 2021, 6:00 PM

#

any insight?

misty flint Jun 19, 2021, 6:09 PM

#

data viz is pretty cool

#

highly recommend cole's storytelling with data book

#

praise

ivory jewel Jun 19, 2021, 6:15 PM

#

I actually have a dataset with 3197 features; here are the first 5 rows. I found some code that I would like to understand. The model structure is as follows:

-Input layer;
-1D convolutional layer with 10 filters of size 2, L2 regularization and ReLU activation function;
-1D max pooling layer, window size 2, stride 2;
-Dropout with 20% probability;
-Fully connected layer with 48 neurons and ReLU activation function;
-Dropout with 40% probability;
-Fully connected layer with 18 neurons and ReLU activation function;
-Output layer with sigmoid function.

#

I'm having trouble counting the number of layers...

#

The code :

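    from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPooling1D, Reshape
    from tensorflow.keras.models import Sequential

    # Architecture
    model = Sequential()
    # (3197,) -> (3197, 1): 3197 steps with one channel, the input shape Conv1D expects
    model.add(Reshape((3197, 1), input_shape=(3197,)))
    # input_shape is only needed on the first layer, so it is not repeated here
    model.add(Conv1D(filters=10, kernel_size=2, activation='relu', kernel_regularizer='l2'))
    model.add(MaxPooling1D(pool_size=2, strides=2))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(48, activation="relu"))
    model.add(Dropout(0.4))
    model.add(Dense(18, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))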

thorn bobcat Jun 19, 2021, 6:18 PM

#

Anyone worked with background_matting v2?

ivory jewel Jun 19, 2021, 6:18 PM

#

My question is: what do `model = Sequential()` and `model.add(Reshape((3197, 1), input_shape=(3197,)))` mean, and how can I calculate the number of outputs given the structure of the filters (the Conv1D, for example)?
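On the second question: with Keras' default `padding='valid'`, a Conv1D with kernel size k turns a length-L sequence into length L - k + 1 (with one output channel per filter), and pooling with window p and stride s gives floor((L - p) / s) + 1. `Reshape`, `Dropout` and `Flatten` have no weights. A small pure-Python sketch of that arithmetic for the model above (the helper names are made up; `model.summary()` should report the same shapes and parameter counts):

```py
def valid_length(length: int, window: int, stride: int = 1) -> int:
    """Output length of a 1D convolution or pooling window with 'valid' padding."""
    return (length - window) // stride + 1


def layer_table(n_features=3197, filters=10, kernel_size=2, pool=2, dense_units=(48, 18, 1)):
    """List (layer, output shape without the batch dimension, trainable params) for the model."""
    rows = [("Reshape", (n_features, 1), 0)]  # (3197,) -> (3197, 1)
    conv_len = valid_length(n_features, kernel_size)
    # each filter has kernel_size * in_channels (= 1) weights plus one bias
    rows.append(("Conv1D", (conv_len, filters), (kernel_size * 1 + 1) * filters))
    pool_len = valid_length(conv_len, pool, stride=pool)
    rows.append(("MaxPooling1D", (pool_len, filters), 0))
    rows.append(("Dropout", (pool_len, filters), 0))
    fan_in = pool_len * filters
    rows.append(("Flatten", (fan_in,), 0))
    for i, units in enumerate(dense_units):
        rows.append(("Dense", (units,), fan_in * units + units))  # weights + biases
        fan_in = units
        if i == 0:
            rows.append(("Dropout", (units,), 0))  # Dropout(0.4) after the first Dense
    return rows


for name, shape, params in layer_table():
    print(f"{name:<13} {str(shape):<12} {params}")
print("total params:", sum(p for _, _, p in layer_table()))  # total params: 768019
```

It lists 9 layers: Conv1D outputs (3196, 10), pooling halves that to (1598, 10), and Flatten gives 15980 values feeding the first Dense layer, which holds almost all of the 768,019 parameters.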

lapis sequoia Jun 19, 2021, 6:20 PM

#

Is there any good course for mathematics for machine learning?

ripe forge Jun 19, 2021, 7:15 PM

#

ivory jewel My question is : what does the : model = Sequential() model.add(Reshape((3...

`Sequential()` just creates an empty model; you then define the layers in it one after another.

#

`model.add` appends the given layer, building the architecture layer by layer

#

`Reshape` is a layer that reshapes its input: here it turns each flat row of 3197 values into a (3197, 1) array, i.e. 3197 steps with 1 channel, which is the input shape Conv1D expects

grave frost Jun 19, 2021, 7:37 PM

#

lapis sequoia Is there any good course for mathematics for machine learning?

yeah, there's a book called Mathematics for Machine Learning. It's free; you can start with that

grave frost Jun 19, 2021, 7:37 PM

#

ivory jewel The code : ```python # Architecture model = Sequential() model.add(R...

run `model.summary()` to see all the details about your model, including each individual layer

#

from the code, though, you have 9 layers (Reshape, Conv1D, MaxPooling1D, Flatten, two Dropout and three Dense). The parameter counts are printed by model.summary()

mint palm Jun 19, 2021, 7:49 PM

#

what does this error mean

#

upper part of code :

#

serene scaffold Jun 19, 2021, 8:07 PM

#

@mint palm please provide text as text

#

!code

arctic wedgeBOT Jun 19, 2021, 8:07 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Jun 19, 2021, 8:07 PM

#

that being said, \mathrel{+} is not valid Python code, so something else must have been intended.

visual violet Jun 19, 2021, 8:31 PM

#

my life is a lie

#

the methods to find k values for k-means are all different

#

i am so confused
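They disagree partly because they measure different things: the elbow method watches how inertia (the within-cluster sum of squares) drops as k grows, while the silhouette score compares each point's distance to its own cluster with its distance to the nearest other cluster. A minimal scikit-learn sketch that computes both for a range of k on toy data (three made-up, well-separated blobs, so k = 3 is the expected answer):

```py
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: three well-separated 2D blobs
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # elbow method: look for where this stops dropping sharply
    silhouettes[k] = silhouette_score(X, km.labels_)  # in [-1, 1], higher is better

best_k = max(silhouettes, key=silhouettes.get)
print("inertia:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette:", {k: round(v, 3) for k, v in silhouettes.items()})
print("best k by silhouette:", best_k)
```

On real data the elbow is often blurry and the criteria can point to different k, so it's normal to treat them as hints and also check whether the clusters make sense.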

coral kindle Jun 19, 2021, 8:33 PM

#

Hello, I wanted to know what regularization methods were.

#

Methods to prevent overfitting like LASSO and Ridge?
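Yes, those are the two classic examples: both add a penalty on the size of the coefficients to the least-squares loss; Ridge (L2) shrinks them towards zero, while LASSO (L1) can set some of them exactly to zero. A small scikit-learn sketch on synthetic data where only the first two of ten features matter (the data and the `alpha` values are arbitrary choices for illustration):

```py
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# only features 0 and 1 influence the target; the other eight are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

models = {
    "ols": LinearRegression(),
    "ridge (L2)": Ridge(alpha=1.0),  # penalty term: alpha * sum(w ** 2)
    "lasso (L1)": Lasso(alpha=0.1),  # penalty term: alpha * sum(|w|), can zero out weights
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:<11}", np.round(model.coef_, 2))
```

The LASSO row should show zeros for most of the noise features, which is why it is also used for feature selection.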

acoustic leaf Jun 19, 2021, 9:25 PM

#

How do I disable the cudart64_110.dll not found errors in Tensorflow?

#

btw I know my GPU isn't cuda-enabled. I just want to suppress these warnings
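One commonly used option, assuming TensorFlow 2.x: the `cudart64_110.dll` message is logged by TensorFlow's C++ runtime at WARNING level and is harmless on a CPU-only machine, and the `TF_CPP_MIN_LOG_LEVEL` environment variable filters those logs. It only takes effect if it is set before TensorFlow is imported (in a notebook, restart the kernel and run this first). Note it hides all of TensorFlow's C++ warnings, not just the CUDA ones:

```py
import os

# "0" shows everything, "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
# Must run before TensorFlow is imported anywhere in this process.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```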

cedar sun Jun 19, 2021, 9:33 PM

#

do u know where can i apply cutmix augmentation?

low venture Jun 19, 2021, 9:53 PM

#

Hello everyone , I'm learning image treatment with python and I would like to know if I can change the color of a specific pixel.

quasi sparrow Jun 19, 2021, 9:53 PM

#

Guys, I have a question, can anyone help me out please?
I am uploading a dataset from a csv file and I convert it to a pandas frame but it gets loaded as an object of datatype int64 but I need datatype of int32 for my model
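Two options with pandas, assuming the columns really are integer-valued and fit in int32 (values above 2,147,483,647 would overflow): convert after loading with `astype`, or ask `read_csv` for the dtype up front. A sketch with a small in-memory CSV standing in for the real file (the column names are made up):

```py
import io

import numpy as np
import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"  # stand-in for the real CSV file

# Option 1: load, then convert (astype returns a new DataFrame)
df = pd.read_csv(io.StringIO(csv_text))
df32 = df.astype("int32")

# Option 2: request the dtype while reading (a dict allows per-column dtypes)
df32_direct = pd.read_csv(io.StringIO(csv_text), dtype="int32")

# Many models want a NumPy array rather than a DataFrame
X = df32.to_numpy()
print(df32.dtypes.tolist(), X.dtype)
```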

low venture Jun 19, 2021, 9:55 PM

#

How can I change the color for the numbers 2?
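If the image is loaded as a NumPy array (for example `np.array(Image.open(path))` with Pillow), a single pixel can be recolored by indexing `img[row, col]`, and every pixel matching a condition can be recolored at once with a boolean mask, so no classifier is needed when you already know which values to target. A sketch assuming a hypothetical 2D array `labels` holding a value per pixel, where you want every pixel whose value is 2 turned red; if the 2s are drawn digits inside a photo, you would first need some way to locate them:

```py
import numpy as np


def recolor(img: np.ndarray, mask: np.ndarray, color=(255, 0, 0)) -> np.ndarray:
    """Return a copy of an H x W x 3 image with every pixel where `mask` is True set to `color`."""
    out = img.copy()
    out[mask] = color  # a boolean mask of shape (H, W) selects whole RGB pixels
    return out


# single pixel: row 0, column 1 becomes green
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[0, 1] = (0, 255, 0)

# hypothetical per-pixel values; recolor every pixel whose value is 2
labels = np.array([[0, 2, 1],
                   [2, 0, 2]])
red_twos = recolor(img, labels == 2)
```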

thorn bobcat Jun 19, 2021, 10:01 PM

#

low venture How can I change the color for the numbers 2?

train a classifier for numbers 2.

#

to detect pixel values in number 2

#

looking for second opinion on this.

visual violet Jun 19, 2021, 10:16 PM

#

A sequence composed of a series of nominal symbols from a particular alphabet is usually called a temporal sequence, and a sequence of continuous, real-valued elements is known as a time series.

#

i have 0 idea what this means

#

isn't it just y value over time?

#

i am very confused

thorn bobcat Jun 19, 2021, 10:26 PM

#

Wo' ooh bo' ooh

visual violet Jun 19, 2021, 10:27 PM

#

yes?

haughty wharf Jun 19, 2021, 11:54 PM

#

Can someone here help me with trying to implement an LDA model on my dataset for a scatter plot?

I was following this Youtube video that used np.where to show separation between two classifications but the current data that Im working with has 5. The part of the code that's commented out, was the original code in the video.

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(phoneme_df2['g'] != 'aa' | phoneme_df2['g'] != 'dcl' | phoneme_df2['g'] != 'iy'
                     phoneme_df2['g'] != 'sh' | phoneme_df2['g'] != 'ao')

 #np.where(y_prob_lda > .5, 1, 0)

#

The dataset itself has 256 features, about 4509 instances, and 5 classifications
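Two likely problems in that `np.where` line: in Python, `|` binds more tightly than `!=`, so each comparison needs its own parentheses (and a `|` is missing before the second line); and `x != 'aa' | x != 'dcl'` would be true for every row anyway, because no value equals both. If the goal is a mask of rows belonging to the listed phoneme classes, `Series.isin` says that directly. A sketch on a toy Series standing in for `phoneme_df2['g']` (for the five-class prediction itself, `lda.predict` already returns one label per row):

```py
import numpy as np
import pandas as pd

g = pd.Series(["aa", "sh", "iy", "xx", "ao"])  # toy stand-in for phoneme_df2['g']

# Parenthesize each comparison before combining with | or &
explicit = (g == "aa") | (g == "dcl") | (g == "iy") | (g == "sh") | (g == "ao")

# Same mask, shorter
mask = g.isin(["aa", "dcl", "iy", "sh", "ao"])

# np.where turns the mask into 1/0 labels, like the commented-out tutorial line
as_int = np.where(mask, 1, 0)
```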

visual violet Jun 20, 2021, 1:33 AM

#

your code looks complicated

mint palm Jun 20, 2021, 4:03 AM

#

acoustic leaf How do I disable the `cudart64_110.dll not found` errors in Tensorflow?

check whether CUDA is in C:\Program Files\NVIDIA GPU Computing Toolkit\ ... if it isn't, install it

mint palm Jun 20, 2021, 4:04 AM

#

serene scaffold that being said, `\mathrel{+}` is not valid Python code, so something else must ...

i will try changing that

mint palm Jun 20, 2021, 4:34 AM

#

#

i wanna do this

#

but how?

mint palm Jun 20, 2021, 4:38 AM

#

mint palm

these are the dimensions

#

n_h, n_w, c are the height, width and number of channels (3 for RGB, etc.)

#

m means no. of examples

main fox Jun 20, 2021, 5:42 AM

#

What are some interesting data sources with regularly updated data?
Like yahoo finance

nova tapir Jun 20, 2021, 8:20 AM

#

import numpy as np 

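def nonlin(x, deriv=False):
    # Sigmoid activation. With deriv=True, x is assumed to already be a sigmoid
    # output s, and the derivative s * (1 - s) is returned.
    if deriv:
        return x * (1 - x)

    return 1 / (1 + np.exp(-x))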


X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])


y = np.array([[1],
              [0],
              [1],
              [1]])

np.random.seed(1)

syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

for j in range(60000):

    l0 = X 
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))

    l2_error = y - l2

    if (j % 10000) == 0:
        print("Error: " + str(np.mean(np.abs(l2_error))))

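    # backpropagation: scale each layer's error by the sigmoid slope at its output
    l2_delta = l2_error * nonlin(l2, deriv=True)
    l1_error = l2_delta.dot(syn1.T)  # share of the output error assigned to each hidden unit
    l1_delta = l1_error * nonlin(l1, deriv=True)

    # weight update (gradient step with an implicit learning rate of 1)
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)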

print("Output after training")
print (l2)

#

#

which is the correct visualization of this neural network code?
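The weight shapes determine the picture: `syn0` is 3×4, so three input neurons are fully connected to four hidden neurons, and `syn1` is 4×1, so those four hidden neurons feed a single output neuron (a 3-4-1 network). The third input column is always 1, so it effectively acts as a bias input. A tiny sketch, with a made-up helper name, that reads the layer sizes off the weight matrices:

```py
import numpy as np


def layer_sizes(*weights: np.ndarray) -> list:
    """Neurons per layer for a chain of dense weight matrices shaped (n_in, n_out)."""
    sizes = [weights[0].shape[0]]
    for w in weights:
        if w.shape[0] != sizes[-1]:
            raise ValueError("consecutive weight matrices must chain")
        sizes.append(w.shape[1])
    return sizes


syn0 = 2 * np.random.random((3, 4)) - 1  # input -> hidden
syn1 = 2 * np.random.random((4, 1)) - 1  # hidden -> output
print(layer_sizes(syn0, syn1))  # [3, 4, 1]
```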

bold timber Jun 20, 2021, 9:08 AM

#

How to convert type of string to integer?

primal tulip Jun 20, 2021, 9:16 AM

#

lapis sequoia Is there any good course for mathematics for machine learning?

Check out the ML and DL courses from Andrew Ng on Coursera.org. They take a more practical approach. There's a course I did called "Mathematics for programmers" or something along those lines, but I can't find it.

primal tulip Jun 20, 2021, 9:33 AM

#

Check these also. https://github.com/ertsiger/coursera-mathematics-for-ml

indigo pelican Jun 20, 2021, 9:34 AM

#

has anyone worked with TF-Hub before? I tried 2-3 models, but I have no idea what output they give me

sleek iron Jun 20, 2021, 10:23 AM

#

Hi guys, is there any dedicated channel for OpenCV Python?

fiery pollen Jun 20, 2021, 11:35 AM

#

Hi, I have 15 columns, but describe gives me just 1 column, and the same happens with other techniques (graphs, correlation, etc.). How can I solve this problem?

bold timber Jun 20, 2021, 11:42 AM

#

bold timber How to convert type of string to integer?

anyone can help me?
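For a single string, the built-in `int()` works; for a pandas column of numeric strings, `pd.to_numeric` converts the whole column, and `errors='coerce'` turns anything unparseable into NaN instead of raising. A short sketch with made-up values:

```py
import pandas as pd

print(int("42") + 1)  # 43

s = pd.Series(["10", "20", "oops", "40"])
nums = pd.to_numeric(s, errors="coerce")  # "oops" -> NaN, so the dtype becomes float64
clean = nums.dropna().astype(int)         # drop the bad rows, then cast to integers
print(clean.tolist())  # [10, 20, 40]
```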

near summit Jun 20, 2021, 11:57 AM

#

fiery pollen Hi I have 15 columns but describe give me just 1 column and other techniques too...

I think describe only summarizes numeric columns by default

gritty spear Jun 20, 2021, 12:32 PM

#

Hi, I need guidance. I want to generate text from a series of keywords, and I want to train the model on a bunch of texts I already have. I came across Hugging Face, but I don't know which procedure to follow. Are there any write-ups that can set me on the right path? How do I turn my texts into training data for a model?

ebon pier Jun 20, 2021, 12:34 PM

#

hello i want to #ask
how to deal with he's and has for text preprocessing?
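One simple approach is a lookup table applied with a regular expression before tokenizing. "he's" is genuinely ambiguous ("he is" vs "he has"), so a plain table has to pick one reading; the mapping below is a small hypothetical example, not a complete list, and it only handles straight apostrophes:

```py
import re

# Hypothetical, deliberately small mapping; "'s" is expanded to "is",
# which is wrong for cases like "he's gone" (= "he has gone").
CONTRACTIONS = {
    "he's": "he is",
    "she's": "she is",
    "it's": "it is",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
}

_pattern = re.compile(r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b", re.IGNORECASE)


def expand_contractions(text: str) -> str:
    """Replace known contractions (matched case-insensitively) with their lowercase expansion."""
    return _pattern.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)
```

For example, `expand_contractions("He's late and I can't wait")` gives `"he is late and I cannot wait"`.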

floral sky Jun 20, 2021, 12:44 PM

#

hey, I want to do some machine learning using linear regression in TensorFlow, but my problem is that I don't know how to convert the strings in my files into ints
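If the strings are categories rather than numbers written as text (city names, product types, and so on), casting them to int doesn't make sense; a common approach for linear regression is one-hot encoding, so the model doesn't assume an order between categories. A pandas sketch with a made-up DataFrame:

```py
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Rome", "Paris", "Oslo"], "size": [50, 70, 40, 65]})

# One 0/1 column per category; dtype=int keeps them numeric for TensorFlow/scikit-learn
encoded = pd.get_dummies(df, columns=["city"], dtype=int)
print(encoded.columns.tolist())  # ['size', 'city_Oslo', 'city_Paris', 'city_Rome']

# Alternative: integer codes (imply an arbitrary ordering, so better suited to tree models)
codes, uniques = pd.factorize(df["city"])
```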

serene scaffold Jun 20, 2021, 1:32 PM

#

fiery pollen Hi I have 15 columns but describe give me just 1 column and other techniques too...

Not all of your columns contain numerical data. Try calling .describe for individual columns.
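By default `DataFrame.describe()` only summarizes numeric columns; `include='all'` adds count/unique/top/freq for the others. If columns that should be numeric show up as `object` (numbers stored as text, stray placeholders like "n/a"), converting them with `pd.to_numeric` makes them appear in the numeric summary and in `corr()` too. A sketch with made-up data:

```py
import pandas as pd

df = pd.DataFrame({
    "price": ["1.5", "2.0", "n/a"],  # numbers stored as text -> dtype object
    "qty": [3, 4, 5],
    "shop": ["a", "b", "a"],
})

print(df.describe())               # only 'qty' appears
print(df.describe(include="all"))  # every column, with NaN where a statistic doesn't apply

df["price"] = pd.to_numeric(df["price"], errors="coerce")  # "n/a" -> NaN
print(df.describe().columns.tolist())  # ['price', 'qty']
```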

fiery pollen Jun 20, 2021, 1:49 PM

#

Yep I solved but I have new problem now

#

this is my date format

#

serene scaffold Jun 20, 2021, 1:52 PM

#

I can't look at both of these screenshots at the same time, so please provide the data as text.

fiery pollen Jun 20, 2021, 1:52 PM

#

well this is my date format

visual violet Jun 20, 2021, 1:53 PM

#

another day, another struggle to cluster

fiery pollen Jun 20, 2021, 1:53 PM

#

fiery pollen

and this is my code

serene scaffold Jun 20, 2021, 1:53 PM

#

fiery pollen and this is my code

!code

arctic wedgeBOT Jun 20, 2021, 1:53 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

fiery pollen Jun 20, 2021, 1:53 PM

#

ValueError: time data '5.03.2021 00:00' does not match format '%d-%m-%Y %H:%M' (match) and value error

serene scaffold Jun 20, 2021, 1:53 PM

#

fiery pollen

This screenshot has hyphens in it rather than periods. Note that if you need any additional help with this, I will not read any more screenshots of text.

fiery pollen Jun 20, 2021, 1:54 PM

#

df['DateDim[Day]'] = pd.to_datetime(df['DateDim[Day]'], format='%d-%m-%Y %H:%M:%S')

#

df['DateDim[Day]'] = pd.to_datetime(df['DateDim[Day]'], format='%m-%d-%Y %H:%M') or this

serene scaffold Jun 20, 2021, 1:55 PM

#

'%d-%m-%Y %H:%M:%S'  # This has hyphens where there should be dots (and expects seconds)
'5.03.2021 00:00'    # The actual string you're trying to match has dots

#

I can't guess if 5 is a day or a month, as 5/3 and 3/5 are both valid day-month pairs.

fiery pollen Jun 20, 2021, 1:56 PM

#

5 is day and 03 is month

serene scaffold Jun 20, 2021, 1:56 PM

#

fiery pollen 5 is day and 03 is month

Here's the mini-language for date formatting and parsing: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

#

See if you can solve the problem with the hyphens.

fiery pollen Jun 20, 2021, 2:01 PM

#

yess you are great
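For reference, a format that matches strings like `'5.03.2021 00:00'`: day first, dots as separators, hours and minutes but no seconds. `%d` also accepts a single-digit day such as `5`. A sketch with a stand-in column:

```py
import pandas as pd

df = pd.DataFrame({"DateDim[Day]": ["5.03.2021 00:00", "17.11.2020 13:45"]})

# dots between day, month and year; hours and minutes only, no %S
df["DateDim[Day]"] = pd.to_datetime(df["DateDim[Day]"], format="%d.%m.%Y %H:%M")
print(df["DateDim[Day]"].tolist())
```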

ebon pier Jun 20, 2021, 2:19 PM

#

I thought you were talking with yourself

visual violet Jun 20, 2021, 2:20 PM

#

i think i graphed it wrong

#

can somebody please help?

#

0    0.0    0.350547    -0.389165    0.031171    0.131560    0.101988    0.012384    0.118384    -0.326644    0.159515    0.641205    -0.287578    0.295131    0.049982    -0.453871    -0.058566    0.084067    0.033252    -0.087150    -0.026975
1    0.0    -0.063362    0.228691    -0.177655    -0.035891    -0.194385    -0.225461    0.085287    -0.112722    -0.190376    -0.319231    0.316168    0.297476    -0.222511    -0.161768    -0.022497    -0.107356    0.343189    -0.142414    0.157067
2    0.0    -0.525495    0.044349    0.259054    0.032564    0.017787    0.109994    0.617328    1.539279    -0.704796    -0.155155    0.132843    -0.039865    -0.213152    0.298412    -0.391566    -0.107134    -0.313010    -0.238712    -0.138868
3    0.0    0.294616    -0.146452    -0.010603    -0.289189    0.518459    -0.348416    0.174120    0.197173    -0.207225    -0.202068    -0.067731    -0.098195    0.377949    -0.284327    0.140845    0.179972    -0.269980    -0.163283    0.055986
4    0.0    -0.286758    0.176156    -0.045746    -0.031385    -0.361086    0.691218    -0.348555    0.612737    -0.376248    0.030953    -0.105264    0.176193    -0.208051    0.025628    -0.079569    0.342263    -0.220916    0.133213    -0.003057

#

uhh

#

as you can see there are negative values

#

somehow the graph doesn't show negative values?

serene scaffold Jun 20, 2021, 2:25 PM

#

visual violet somehow the graph doesn't show negative values?

please also share the code that made the graph

visual violet Jun 20, 2021, 2:25 PM

#

import matplotlib.pyplot as plt

# one color per cluster label: 0 -> blue, 1 -> green, 2 -> red, 3 -> cyan, ...
cluster_colors = ["b", "g", "r", "c", "m", "y", "k", "pink"]

plt.figure(figsize=(15, 10), dpi=80)
for counter, (index, row) in enumerate(percentage_difference.iterrows()):
    # each row is one object's values over the quarters; color it by its cluster
    plt.plot(row, color=cluster_colors[predictions_pct[counter]])
plt.xlabel('Quarter')
plt.ylabel('Percentage difference')
plt.autoscale(enable=True, axis='x', tight=True)
plt.show()

serene scaffold Jun 20, 2021, 2:26 PM

#

is predictions_pct the dataframe from before?

visual violet Jun 20, 2021, 2:26 PM

#

predictions_pct shows what cluster

serene scaffold Jun 20, 2021, 2:26 PM

#

what is it?

#

an array? a dataframe?

mint palm Jun 20, 2021, 2:27 PM

#

IF I DO np.multiply(3d_matrix_1, 3d_matrix_2)(same dimensions lets say (m, n, l)), will i get matrix of (m, n, 1)

visual violet Jun 20, 2021, 2:28 PM

#

serene scaffold Jun 20, 2021, 2:28 PM

#

visual violet <class 'numpy.ndarray'>

please share the data so that we can replicate this

visual violet Jun 20, 2021, 2:28 PM

#

model_pct = TimeSeriesKMeans(n_clusters=3, metric="dtw", max_iter=10)
predictions_pct = model_pct.fit_predict(percentage_difference)

serene scaffold Jun 20, 2021, 2:28 PM

#

mint palm IF I DO ``np.multiply(3d_matrix_1, 3d_matrix_2)``(same dimensions lets say ``(m,...

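It just does element-wise multiplication, so the result has the same shape as the inputs, (m, n, l); you only get (m, n, 1) if you then sum over the last axis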
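A quick check of the shapes (the sizes are arbitrary): `np.multiply` (the same as `*`) keeps the input shape, and a reduction such as `sum` is what collapses axes. Summing over the last axis with `keepdims=True` gives (m, n, 1), and summing everything gives a single number, as in a single convolution step:

```py
import numpy as np

m, n, l = 2, 3, 4
a = np.random.rand(m, n, l)
b = np.random.rand(m, n, l)

prod = np.multiply(a, b)                         # element-wise, same as a * b
per_position = prod.sum(axis=-1, keepdims=True)  # collapse the last axis only
total = prod.sum()                               # collapse everything to a scalar

print(prod.shape, per_position.shape, np.ndim(total))  # (2, 3, 4) (2, 3, 1) 0
```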

visual violet Jun 20, 2021, 2:29 PM

#

       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],

#

lol

#

i graph each row of the percentage_difference

#

then i set color accordingly

serene scaffold Jun 20, 2021, 2:31 PM

#

visual violet ``` 2016_Q1 2016_Q2 2016_Q3 2016_Q4 2017_Q1 2017_Q2 2017_Q3...

where do you actually put this data in the plot?

#

nevermind, I see now

#

one moment

visual violet Jun 20, 2021, 2:31 PM

#

the problem is there is no negative value in the y axis in the graph when there should be

serene scaffold Jun 20, 2021, 2:34 PM

#

alright let me see

#

@visual violet

In [21]: df.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[21]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>

#

#

Not sure why the key is in the middle though

visual violet Jun 20, 2021, 2:36 PM

#

how is your graph so much better than mine

#

i mean a lot better

serene scaffold Jun 20, 2021, 2:36 PM

#

visual violet how is your graph so much better than mine

idk, I just fumbled my way to this by testing stuff

visual violet Jun 20, 2021, 2:37 PM

#

are you coloring it according to the date?

serene scaffold Jun 20, 2021, 2:37 PM

#

it's wrong though

#

let me see

visual violet Jun 20, 2021, 2:37 PM

#

well at least you have the negative direction and i don't lol

serene scaffold Jun 20, 2021, 2:37 PM

#

Yeah I had to transpose it first.

#

visual violet Jun 20, 2021, 2:38 PM

#

please teach me your way

serene scaffold Jun 20, 2021, 2:38 PM

#

In [23]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[23]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>

visual violet Jun 20, 2021, 2:38 PM

#

one line of code??

#

you have to color code it though

#

so should be a bit longer

serene scaffold Jun 20, 2021, 2:38 PM

#

yes. the method call in line 23 takes color= as a kwarg

#

do you have an array/series of which cluster each row belongs to?

visual violet Jun 20, 2021, 2:39 PM

#

visual violet ```array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...

this

#

it is called predictions_pct

serene scaffold Jun 20, 2021, 2:40 PM

#

so it predicted that each row is the same cluster...?

visual violet Jun 20, 2021, 2:40 PM

#

yes! each row is an object

#

so one element in the array represent which cluster the corresponding row belongs to

serene scaffold Jun 20, 2021, 2:42 PM

#

But it's all zero

visual violet Jun 20, 2021, 2:42 PM

#

there are some 1 and 2 lol

serene scaffold Jun 20, 2021, 2:42 PM

#

oh there are a few 1s

visual violet Jun 20, 2021, 2:42 PM

#

that is why i complain the clustering doesn't work

serene scaffold Jun 20, 2021, 2:42 PM

#

Do you know what color you want for 0, 1, and 2?

visual violet Jun 20, 2021, 2:42 PM

#

but now i am already fucked, so i gotta keep going with the idea

#

excuse my language please

#

uhh i don't mind, make it red, blue, green

#

i got error like that when there is a logic error in my code

#

one time i merge dataframe with identical rows

#

python tried to do every single combinations

serene scaffold Jun 20, 2021, 2:48 PM

#

@visual violet

# Stand-in for your predictions_pct array
In [32]: predictions = pd.Series([0, 1, 2, 1, 1])

In [33]: colors = predictions.replace(dict(enumerate(['red', 'green', 'blue'])))

In [34]: colors
Out[34]: 
0      red
1    green
2     blue
3    green
4    green
dtype: object

In [41]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)
Out[41]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>

#

visual violet Jun 20, 2021, 2:54 PM

#

line 33 won't change the array itself, right?

serene scaffold Jun 20, 2021, 2:59 PM

#

visual violet line 33 won't change the array itself, right?

No: `replace` returns a new Series and leaves the original unchanged (unless you pass `inplace=True`), so you need to assign the result to a variable

visual violet Jun 20, 2021, 3:01 PM

#

can you please tell me what I did wrong?

#

like what is wrong with my logic

serene scaffold Jun 20, 2021, 3:02 PM

#

visual violet like what is wrong with my logic

I don't understand how it works.

fiery pollen Jun 20, 2021, 3:02 PM

#

Unable to allocate 65.6 GiB for an array with shape (8803915165,) and data type float64

I'm getting an error like this. I think it could be solved with virtual memory, but do I really need to allocate 65 GB of virtual memory on disk? I already have 12 GB of RAM; do I need to top it up?

serene scaffold Jun 20, 2021, 3:04 PM

#

fiery pollen Unable to allocate 65.6 GiB for an array with shape (8803915165,) and data type ...

Well, 65 GiB is more than 65 GB. Is it a sparse array?

fiery pollen Jun 20, 2021, 3:05 PM

#

yep probably it is.

#

I should get an output like this

#

plt.figure(figsize=(20,12))

mergings = linkage(rfm_scaled, method="complete", metric='euclidean')
dendrogram(mergings)
plt.show()

and this is my code; it's an RFM analysis

#

If I change the method, will the problem be solved?
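Changing `method` probably won't help: given raw observations, SciPy's `linkage` first computes the condensed pairwise distance matrix, which has n(n - 1)/2 entries whatever the method. The shape in the error, 8,803,915,165, is exactly n(n - 1)/2 for n = 132,695 rows, and at 8 bytes per float64 that is about 65.6 GiB; so the huge array is the table of distances, not your data. Common workarounds are clustering a random sample or switching to an algorithm that doesn't need all pairwise distances, such as k-means. A sketch (the helper names and the sample size are arbitrary assumptions):

```py
import numpy as np
from scipy.cluster.hierarchy import linkage


def condensed_size(n_rows: int) -> int:
    """Number of pairwise distances linkage needs for n_rows observations."""
    return n_rows * (n_rows - 1) // 2


n = 132_695
print(condensed_size(n), round(condensed_size(n) * 8 / 2**30, 1))  # 8803915165 65.6


def sample_linkage(data, sample_size: int = 5_000, seed: int = 0):
    """Hierarchical clustering on a random subset of rows to keep memory manageable."""
    data = np.asarray(data)  # also accepts a DataFrame such as rfm_scaled
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=min(sample_size, len(data)), replace=False)
    return idx, linkage(data[idx], method="complete", metric="euclidean")
```

With 5,000 sampled rows the distance table is about 12.5 million values (roughly 100 MB), and `dendrogram(mergings)` works on the result as before, after `idx, mergings = sample_linkage(rfm_scaled)`.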

visual violet Jun 20, 2021, 3:14 PM

#

looks so much like clustering

#

but different lol

visual violet Jun 20, 2021, 3:33 PM

#

bruh i possibly messed up both graphs

#

and yet my professor didn't tell me that

#

even though i sent him my code

#

i am sad

visual violet Jun 20, 2021, 3:38 PM

#

serene scaffold <@!354372432838000642> ```py # This is your array from before In [32]: predictio...

this won't work if i have 5 clusters right?

#

because it assumes that it must have 3 clusters

serene scaffold Jun 20, 2021, 3:41 PM

#

visual violet this won't work if i have 5 clusters right?

you just have to give as many colors in the enumeration as there are clusters.

#

if you give more colors than there are clusters, however many extra will just never be used.
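To avoid hard-coding the list, one option is to take one color per cluster from a Matplotlib colormap, so it adapts to however many clusters the model found. A sketch with stand-in cluster labels; `tab10` has 10 distinct colors (`tab20` has 20):

```py
import numpy as np
import matplotlib.pyplot as plt

predictions_pct = np.array([0, 3, 1, 4, 2, 0])  # stand-in for the model's cluster labels

n_clusters = int(predictions_pct.max()) + 1
cmap = plt.get_cmap("tab10")
palette = {k: cmap(k) for k in range(n_clusters)}  # cluster id -> RGBA tuple
colors = [palette[k] for k in predictions_pct]      # one color per row, in row order
```

The list can then be passed as `color=colors` to `percentage_difference.T.plot.line(...)`, as with the named colors.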

visual violet Jun 20, 2021, 3:46 PM

#

'numpy.ndarray' object has no attribute 'replace'

serene scaffold Jun 20, 2021, 3:46 PM

#

visual violet 'numpy.ndarray' object has no attribute 'replace'

you have to make it a series

visual violet Jun 20, 2021, 3:46 PM

#

yup i got it

serene scaffold Jun 20, 2021, 3:46 PM

#

(a series is just the pandas version of a 1d array)

visual violet Jun 20, 2021, 3:48 PM

#

wait where did you get the colors

#

you assigned colors = array.replaced... right?

#

colors = pd.Series(predictions_pct).replace(dict(enumerate(['red', 'green', 'blue'])))
percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)

#

@serene scaffold sorry for ping

serene scaffold Jun 20, 2021, 3:54 PM

#

@visual violet do you have matplotlib installed?

visual violet Jun 20, 2021, 3:55 PM

#

yup i plot multiple things before

serene scaffold Jun 20, 2021, 3:57 PM

#

what version of pandas do you have?

visual violet Jun 20, 2021, 3:58 PM

#

#

something is wrong lol

serene scaffold Jun 20, 2021, 3:59 PM

#

visual violet

you might need to switch the scale so that values on the y axis aren't evenly spaced

visual violet Jun 20, 2021, 3:59 PM

#

serene scaffold what version of pandas do you have?

1.0.1

serene scaffold Jun 20, 2021, 3:59 PM

#

visual violet 1.0.1

that's an older version.

visual violet Jun 20, 2021, 4:02 PM

#

matplotlib : 3.1.3

serene scaffold Jun 20, 2021, 4:03 PM

#

visual violet matplotlib : 3.1.3

that version is old as well

#

in either case, look into how you can change the scale of the y axis

#

in particular, you probably want to make it logarithmic.

#

here's what I'm talking about: https://en.wikipedia.org/wiki/Semi-log_plot

serene scaffold Jun 20, 2021, 4:09 PM

#

visual violet

if you try again with updated versions of pandas and matplotlib, you just need to add logy=True

#

!docs pandas.DataFrame.plot

arctic wedgeBOT Jun 20, 2021, 4:09 PM

#

pandas.DataFrame.plot

DataFrame.plot(*args, **kwargs)
Make plots of Series or DataFrame.

Uses the backend specified by the option `plotting.backend`. By default, matplotlib is used.

visual violet Jun 20, 2021, 4:09 PM

#

the main focus is to make it look nice lol

serene scaffold Jun 20, 2021, 4:09 PM

#

visual violet the main focus is to make it look nice lol

Yes, but you need an updated version of pandas and matplotlib to do what I said

#

which will fix the scale of the y axis so your bottom lines aren't all scrunched together
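One caveat, since this data contains negative percentage differences: a plain logarithmic axis (`logy=True` or `set_yscale('log')`) cannot show values that are zero or negative. Matplotlib's `'symlog'` scale is logarithmic away from zero and linear within `linthresh` of it, so negatives stay visible. A sketch on made-up data; the `linthresh=0.1` value is an arbitrary choice, and Matplotlib versions before 3.3 (such as 3.1.3) called this parameter `linthreshy`:

```py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# made-up rows of percentage differences; after transposing, each row becomes one line
df = pd.DataFrame(np.array([[0.0, 0.35, -0.39, 5.0],
                            [0.0, -0.53, 0.04, -3.0]]),
                  columns=["2016_Q1", "2016_Q2", "2016_Q3", "2016_Q4"])

ax = df.T.plot.line()
ax.set_xlabel("Quarter")
ax.set_ylabel("Percentage Difference")
ax.set_yscale("symlog", linthresh=0.1)  # linear in [-0.1, 0.1], logarithmic beyond
```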

visual violet Jun 20, 2021, 4:11 PM

#

i should probably restart my computer

serene scaffold Jun 20, 2021, 4:11 PM

#

visual violet i should probably restart my computer

That won't have the effect of updating the pandas and matplotlib versions

visual violet Jun 20, 2021, 4:16 PM

#

i tried to upgrade pandas

#

but it won't do that for some reason

serene scaffold Jun 20, 2021, 4:19 PM

#

visual violet i tried to upgrade pandas

did you do pip install -U pandas matplotlib?

#

And what was the error message?

visual violet Jun 20, 2021, 4:19 PM

#

i have to upgrade pip first

#

one sec

haughty wharf Jun 20, 2021, 4:21 PM

#

Im trying to implement an LDA model on my dataset of 256 columns/features and 4509 rows. The problem Im facing with now is that the dataset used in the tutorial is using only 2 classifications and I have 5.

I commented out the original statement from the tutorial and have been trying to work on it myself, but haven't had any luck. Any ideas on how I can modify this?

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0) #np.where(y_prob_lda > .5, 1, 0)

serene scaffold Jun 20, 2021, 4:22 PM

#

I'm worried that y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0 does something other than what you expected

haughty wharf Jun 20, 2021, 4:24 PM

#

serene scaffold I'm worried that `y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0` does something ...

Yeahh I've been trying a few things. One thing was following how he did it, and since I read that np.where uses bitwise operators I tried using or. But that didn't work. Is there a way to fix this? Im using this dataset https://web.stanford.edu/~hastie/ElemStatLearn/datasets/phoneme.data

#

Also, I'm finding it kind of hard to derive insights from a dataset where I don't even know the names of the columns

#

So any tips on that would be great too! Right now I'm learning that the LDA model reduces the data to its most important dimensions, which will hopefully help me figure out how to analyze this

serene scaffold Jun 20, 2021, 4:27 PM

#

haughty wharf Yeahh I've been trying a few things. One thing was following how he did it, and ...

sounds good. is speaker the class?

haughty wharf Jun 20, 2021, 4:28 PM

#

serene scaffold sounds good. is `speaker` the class?

Ahh I was told to ignore the speaker column. Column g is the class of different phonemes

#

I took the speaker and row columns out of my dataframe

serene scaffold Jun 20, 2021, 4:31 PM

#

haughty wharf Im trying to implement an LDA model on my dataset of 256 columns/features and 45...

for y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0), are you really just trying to figure out what the most probable class is for each row?

#

In [21]: lda = LDA().fit(X, y)

In [23]: lda.predict(X)
Out[23]: array(['sh', 'iy', 'dcl', 'dcl'], dtype='<U3')

In [27]: lda.predict_proba(X)
Out[27]: 
array([[1.28273626e-01, 5.59261688e-03, 8.66133757e-01],
       [6.77497944e-07, 9.93583762e-01, 6.41556027e-03],
       [9.92678238e-01, 3.56884038e-09, 7.32175869e-03],
       [8.43266553e-01, 6.81605776e-06, 1.56726631e-01]])

In [30]: lda.predict_proba(X).argmax(axis=0)
Out[30]: array([2, 1, 0])

#

I was using a subset of the data with only three classes.

haughty wharf Jun 20, 2021, 4:33 PM

#

serene scaffold for `y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0)`, are ...

That's one thing I was thinking about. Honestly I'm having a hard time figuring out how to go about analyzing this because I'm not even sure what the columns mean

serene scaffold Jun 20, 2021, 4:34 PM

#

haughty wharf Thats one thing I was thinking about. Honestly Im having a hard time figuring ou...

each column is a class and each row is an instance. The value represents the probability that that instance belongs to the class for that column.

haughty wharf Jun 20, 2021, 4:34 PM

#

Ahhh I see

serene scaffold Jun 20, 2021, 4:34 PM

#

In [33]: lda.classes_
Out[33]: array(['dcl', 'iy', 'sh'], dtype='<U3')

I assume the columns follow this scheme

#

however this just seems like a roundabout way of doing lda.predict

serene scaffold Jun 20, 2021, 4:36 PM

#

serene scaffold In [21]: lda = LDA().fit(X, y) In [23]: lda.predict(X) Out[23]: array(['s...

I should have done .argmax(axis=1)
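To see the difference between the two argmax calls, here's a minimal NumPy sketch that reuses the predict_proba output and classes_ from the session above (no scikit-learn needed). axis=1 picks the most probable class for each row, which is what lda.predict returns; axis=0 instead picks, for each class, the row where that class is most likely.

```python
import numpy as np

# predict_proba output from the session above: one row per instance,
# one column per class, in the order given by lda.classes_
proba = np.array([
    [1.28273626e-01, 5.59261688e-03, 8.66133757e-01],
    [6.77497944e-07, 9.93583762e-01, 6.41556027e-03],
    [9.92678238e-01, 3.56884038e-09, 7.32175869e-03],
    [8.43266553e-01, 6.81605776e-06, 1.56726631e-01],
])
classes = np.array(['dcl', 'iy', 'sh'])

# axis=1: most probable class for each row (instance)
per_row = proba.argmax(axis=1)
print(per_row)           # [2 1 0 0]
print(classes[per_row])  # ['sh' 'iy' 'dcl' 'dcl'], same as lda.predict(X)

# axis=0: for each class (column), the row where that class is most probable
per_column = proba.argmax(axis=0)
print(per_column)        # [2 1 0]
```

Nothing here depends on the number of classes: with all five phonemes the probability array just has five columns and the same argmax(axis=1) line works, so the binary-only np.where(y_prob_lda > .5, 1, 0) step from the tutorial can be replaced by lda.predict(X_test).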

haughty wharf Jun 20, 2021, 4:36 PM

#

So basically for each of these 4509 instances, we're trying to see how it matches to the corresponding g classification?

serene scaffold Jun 20, 2021, 4:36 PM

#

haughty wharf So basically for each of these 4509 instances, we're trying to see how it matche...

it appears that you're trying to learn what audio frequencies represent which phoneme. Are you familiar with phonemes?

haughty wharf Jun 20, 2021, 4:37 PM

#

Phonemes are sounds specific to letters/words? right?

serene scaffold Jun 20, 2021, 4:38 PM

#

haughty wharf Phonemes are sounds specific to letters/words? right?

they're the speech sounds in a natural language. They're only loosely related to letters, since spelling systems can be arbitrary.

#

it looks like the end goal is to be able to transcribe audio, though.

#

My background is in linguistics

visual violet Jun 20, 2021, 4:39 PM

#

bruh is there anything that you can't do?

serene scaffold Jun 20, 2021, 4:39 PM

#

serene scaffold they're the speech sounds in a natural language. They're somewhat related to let...

spelling systems, and more broadly, writing systems in general.

haughty wharf Jun 20, 2021, 4:40 PM

#

serene scaffold they're the speech sounds in a natural language. They're somewhat related to let...

Ohhh awesome! Guess I asked the right person about this haha

serene scaffold Jun 20, 2021, 4:41 PM

#

haughty wharf Im trying to implement an LDA model on my dataset of 256 columns/features and 45...

anyway, the number of classes shouldn't matter, as LinearDiscriminantAnalysis can generalize to an arbitrary number of classes. I think.

#

Also if your y is just the g column of the dataframe, you don't need to .ravel() it.

serene scaffold Jun 20, 2021, 4:43 PM

#

visual violet bruh is there anything that you can't do?

I actually don't know how LDAs work 🤷‍♂️

visual violet Jun 20, 2021, 4:43 PM

#

what exactly is the version that i am trying to upgrade?

#

i can't seem to do it

haughty wharf Jun 20, 2021, 4:43 PM

#

serene scaffold anyway, the number of classes shouldn't matter, as `LinearDiscriminantAnalysis` ...

Here is where I'm at right now. This is where I left off in the video, so I'm just trying to figure out how to move forward.

le = LabelEncoder()
phoneme_df2['g'] = le.fit_transform(phoneme_df2['g'])
encoded_data = pd.get_dummies(phoneme_df2)


y = phoneme_df2['g'].values.reshape(-1, 1)
X= encoded_data.drop(['g'], axis = 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2, random_state = 42)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train.ravel())

y_prob_lda = lda.predict_proba(X_test)[:,1]
y_pred_lda = np.where(y_prob_lda == 0, 0, 0 | y_prob_lda == 1, 1, 0) #np.where(y_prob_lda > .5, 1, 0)

serene scaffold Jun 20, 2021, 4:44 PM

#

I actually have to head out for a bit

haughty wharf Jun 20, 2021, 4:44 PM

#

And my understanding is that LDAs are supposed to reduce the number of dimensions to the most relevant features, depending on the data

haughty wharf Jun 20, 2021, 4:45 PM

#

serene scaffold I actually have to head out for a bit

Ahh gotcha, np, thanks for the breakdown though

haughty wharf Jun 20, 2021, 4:47 PM

#

serene scaffold Also if your `y` is just the `g` column of the dataframe, you don't need to `.ra...

What does ravel do exactly?

serene scaffold Jun 20, 2021, 4:55 PM

#

!docs numpy.ndarray.ravel

arctic wedgeBOT Jun 20, 2021, 4:55 PM

#

numpy.ndarray.ravel


ndarray.ravel([order])
Return a flattened array.

Refer to [`numpy.ravel`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html#numpy.ravel "numpy.ravel") for full documentation.

See also

[`numpy.ravel`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html#numpy.ravel "numpy.ravel")equivalent function

[`ndarray.flat`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flat.html#numpy.ndarray.flat "numpy.ndarray.flat")a flat iterator on the array.

serene scaffold Jun 20, 2021, 4:55 PM

#

@haughty wharf that
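A quick sketch of what .ravel() does to a y built the way it is in the snippet above (.values.reshape(-1, 1)). The label values are made up for illustration. scikit-learn estimators expect a 1-D y, and passing a column vector typically triggers a DataConversionWarning, which is why tutorials add .ravel().

```python
import numpy as np

# hypothetical encoded phoneme labels, shaped like phoneme_df2['g'].values.reshape(-1, 1)
y = np.array([0, 3, 1, 4, 2]).reshape(-1, 1)
print(y.shape)       # (5, 1): a 2-D column vector

y_flat = y.ravel()   # flatten back to 1-D
print(y_flat.shape)  # (5,)
print(y_flat)        # [0 3 1 4 2]

# If y is taken straight from the column (phoneme_df2['g'].values), it is
# already 1-D, so neither the reshape nor the ravel is needed.
```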

visual violet Jun 20, 2021, 4:56 PM

#

i have finally updated the version lol

#

thank god

#

why does this thing label every single row lmao

#

i didn't even command it to do that

#

colors = pd.Series(predictions_pct).replace(dict(enumerate(['red', 'green', 'blue'])))
percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors)

serene scaffold Jun 20, 2021, 4:59 PM

#

@visual violet you forgot the logy part

#

Also I'm at the gym so I may or may not respond between sets

haughty wharf Jun 20, 2021, 5:03 PM

#

arctic wedge

Ahhh got it thanks

visual violet Jun 20, 2021, 5:38 PM

#

have a great workout my dude

#

i still can't fix it lol

serene scaffold Jun 20, 2021, 5:38 PM

#

I'm already back from that

visual violet Jun 20, 2021, 5:38 PM

#

oh

serene scaffold Jun 20, 2021, 5:39 PM

#

As of like two minutes ago

#

Anyway what error message and what pandas version?

visual violet Jun 20, 2021, 5:49 PM

#

'1.2.4'

#

there is really no error

#

when i put percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors, logy=True)

#

it shows

#

when i change to

#

percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference', color=colors, logy=False)

#

it shows

#

@serene scaffold

serene scaffold Jun 20, 2021, 6:06 PM

#

visual violet percentage_difference.T.plot.line(xlabel='Quarter', ylabel='Percentage Differenc...

Set logy to true

visual violet Jun 20, 2021, 6:08 PM

#

yeah when i set it to true, there is no graph at all

#

just label

serene scaffold Jun 20, 2021, 6:10 PM

#

visual violet yeah when i set it to true, there is no graph at all

can you put the whole CSV in a pastebin?

#

!paste

arctic wedgeBOT Jun 20, 2021, 6:10 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

cedar sun Jun 20, 2021, 6:11 PM

#

when training a model, should u augment the validation data too?

visual violet Jun 20, 2021, 6:12 PM

#

lmao it exceeds the maximum length

#

hmm let me think

haughty wharf Jun 20, 2021, 6:16 PM

#

Use https://gist.github.com


arctic wedgeBOT Jun 20, 2021, 6:16 PM

#

Hey @visual violet!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

visual violet Jun 20, 2021, 6:17 PM

#

lol

serene scaffold Jun 20, 2021, 6:17 PM

#

visual violet lmao exceeed maximum length

Even for the paste bin?

visual violet Jun 20, 2021, 6:17 PM

#

lol yes

serene scaffold Jun 20, 2021, 6:18 PM

#

Just do fewer rows, I guess. How many are there?

visual violet Jun 20, 2021, 6:18 PM

#

724 🙂

serene scaffold Jun 20, 2021, 6:19 PM

#

visual violet 724 🙂

Try 400

visual violet Jun 20, 2021, 6:21 PM

#

hmm still too big

haughty wharf Jun 20, 2021, 6:22 PM

#

serene scaffold In [21]: lda = LDA().fit(X, y) In [23]: lda.predict(X) Out[23]: array(['s...

tried recreating this but got LDA is not defined

serene scaffold Jun 20, 2021, 6:23 PM

#

haughty wharf tried recreating this but got LDA is not defined

I imported that long class name, LinearDiscriminantAnalysis, as LDA (from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA)

#

Having long class names is a waste of milliseconds.

serene scaffold Jun 20, 2021, 6:23 PM

#

visual violet hmm still too big

300?

haughty wharf Jun 20, 2021, 6:23 PM

#

ohhh gotcha

cedar sun Jun 20, 2021, 6:23 PM

#

can i make a model checkpoint to save model every 5 epochs, for example?

visual violet Jun 20, 2021, 6:26 PM

#

250 seems to be the limit

#

what do you think?

cedar sun Jun 20, 2021, 6:26 PM

#

ah with period

serene scaffold Jun 20, 2021, 6:26 PM

#

visual violet 250 seems to be the limit

Sure, go ahead and send that link over

visual violet Jun 20, 2021, 6:27 PM

#

https://paste.pythondiscord.com/oturupeyat.lua

serene scaffold Jun 20, 2021, 6:27 PM

#

visual violet https://paste.pythondiscord.com/oturupeyat.lua

you have to do it in a print(...) statement or it won't do line breaks.

#

which are necessary in this case.

visual violet Jun 20, 2021, 6:27 PM

#

so i print the csv file?

serene scaffold Jun 20, 2021, 6:27 PM

#

visual violet so i print the csv file?

I just need this, but with linebreaks where a row ends.
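Why print(...) matters here: in a Python/Jupyter REPL, evaluating a string shows its repr, where line breaks appear as literal \n escapes, while print writes the actual newlines. A small sketch using the standard csv module to build CSV text (with pandas, df.to_csv() called without a path also returns the CSV as a string); the rows are made up.

```python
import csv
import io

# build a small CSV string in memory (hypothetical rows)
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ticker", "Q1", "Q2"])
writer.writerow(["AAA", "1.5", "-0.3"])
csv_text = buffer.getvalue()

# what the REPL shows when you just evaluate csv_text: newlines are escaped
print(repr(csv_text))  # 'ticker,Q1,Q2\nAAA,1.5,-0.3\n'

# what print shows: one CSV row per line, ready to paste
print(csv_text)
```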

visual violet Jun 20, 2021, 6:29 PM

#

https://paste.pythondiscord.com/avuqazaqow.css

#

got it

#

lol it looks so cool in pastebin

#

yet so useless when it comes to being clustered

serene scaffold Jun 20, 2021, 6:36 PM

#

@visual violet I tried setting it to a symlog scale and got this

visual violet Jun 20, 2021, 6:38 PM

#

so i tried out some graphing

#

the weird labeling is not because of the colors

#

hmm

#

how do you scale it so nicely though

#

i am starting to think it is because of my jupyter skin

#

the graph doesn't show fully

serene scaffold Jun 20, 2021, 6:41 PM

#

visual violet how do you scale it so nicely though

In [73]: df.T.plot.line(xlabel='Quarter', ylabel='Percentage Difference')
Out[73]: <AxesSubplot:xlabel='Quarter', ylabel='Percentage Difference'>

In [74]: matplotlib.pyplot.yscale('symlog')

In [75]: matplotlib.pyplot.show()
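One likely reason logy=True gave an empty plot while symlog works: percentage differences include zero and negative values, and a plain log scale can't represent those. A minimal NumPy sketch; the symlog_like helper is a simplified stand-in (sign times log10(1 + |y|)), not matplotlib's exact symlog transform, which has its own linear region around zero controlled by linthresh.

```python
import numpy as np

# hypothetical percentage differences, including zero and negatives
pct_diff = np.array([-99.0, -5.0, 0.0, 9.0, 250.0])

# plain log scale: log10 is undefined for values <= 0 (nan for negatives, -inf for 0)
with np.errstate(divide="ignore", invalid="ignore"):
    log_values = np.log10(pct_diff)
print(np.isfinite(log_values))  # only the two positive values are finite

def symlog_like(y):
    """Simplified symmetric log: keeps the sign, compresses large magnitudes,
    and is defined at zero. Not matplotlib's exact symlog transform."""
    return np.sign(y) * np.log10(1 + np.abs(y))

print(symlog_like(np.array([-99.0, 0.0, 9.0])))  # [-2.  0.  1.]
```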

cedar sun Jun 20, 2021, 6:41 PM

#

should i augment data in validation?

visual violet Jun 20, 2021, 6:41 PM

#

what is validation?

#

how do you augment data

serene scaffold Jun 20, 2021, 6:42 PM

#

visual violet how do you augment data

depends on the type of data

#

I was writing a paper on data augmentation for nlp but it hasn't gone anywhere

visual violet Jun 20, 2021, 6:42 PM

#

are you hitting a dead end?

#

or you can't find enough data

serene scaffold Jun 20, 2021, 6:42 PM

#

well, it wasn't making the results better or worse

#

so... 🤷‍♂️

visual violet Jun 20, 2021, 6:43 PM

#

well research do be like that sometimes

#

a null result is not useless

cedar sun Jun 20, 2021, 6:44 PM

#

should i augment data in validation?

visual violet Jun 20, 2021, 6:53 PM

#

can i call you Steele ?

serene scaffold Jun 20, 2021, 6:55 PM

#

visual violet can i call you Steele ?

You can call me Stelercus

visual violet Jun 20, 2021, 6:55 PM

#

right

low wasp Jun 20, 2021, 6:56 PM

#

cedar sun should i augment data in validation?

no

#

https://datascience.stackexchange.com/questions/41422/when-using-data-augmentation-is-it-ok-to-validate-only-with-the-original-images


visual violet Jun 20, 2021, 7:01 PM

#

i still need to figure out how to remove the labels lol

#

but now it looks much more like a cluster

#

before, it was quite weird

#

#

actually surprised how similarly the prices behave

cedar sun Jun 20, 2021, 7:06 PM

#

low wasp no

thanks
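A minimal NumPy sketch of the usual setup: split first, then augment only the training portion, so validation scores are measured on the original, unmodified samples. The random "images" and the horizontal-flip augmentation are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dataset: 10 grayscale "images" of 4x4 pixels with binary labels
images = rng.random((10, 4, 4))
labels = rng.integers(0, 2, size=10)

# 1. split first, so augmented copies of a sample never leak into validation
idx = rng.permutation(len(images))
train_idx, val_idx = idx[:8], idx[8:]
X_train, y_train = images[train_idx], labels[train_idx]
X_val, y_val = images[val_idx], labels[val_idx]

# 2. augment the training set only (here: append horizontally flipped copies)
X_train_aug = np.concatenate([X_train, X_train[:, :, ::-1]])
y_train_aug = np.concatenate([y_train, y_train])

# 3. the validation set stays as-is
print(X_train_aug.shape, X_val.shape)  # (16, 4, 4) (2, 4, 4)
```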

haughty wharf Jun 20, 2021, 7:09 PM

#

@serene scaffold I'm trying to follow this https://www.python-course.eu/linear_discriminant_analysis.php for LDA.

I'm seeing the first step looks like this:
Would my target feature be the 256 columns and my descriptive feature be column g?

# 0. Load in the data and split the descriptive and the target feature
df = pd.read_csv('data/Wine.txt',sep=',',names=['target','Alcohol','Malic_acid','Ash','Akcakinity','Magnesium','Total_pheonols','Flavanoids','Nonflavanoids','Proanthocyanins','Color_intensity','Hue','OD280','Proline'])
X = df.iloc[:,1:].copy()
target = df['target'].copy()

serene scaffold Jun 20, 2021, 7:10 PM

#

haughty wharf <@!253696366952316929> Im trying to follow this https://www.python-course.eu/lin...

I don't actually have experience using LDAs, so I might be able to get back to you

serene scaffold Jun 20, 2021, 7:11 PM

#

visual violet

There might be a point at which there are just too many lines to effectively plot

haughty wharf Jun 20, 2021, 7:12 PM

#

serene scaffold I don't actually have experience using LDAs, so I might be able to get back to y...

gotcha, no worries. i didn't think this would be so confusing. Most of the tutorials I've been looking at are working with two classes

visual violet Jun 20, 2021, 7:14 PM

#

you are not wrong

#

but it does show the cluster pretty well though, so i am pretty happy about it

serene scaffold Jun 20, 2021, 7:15 PM

#

haughty wharf gotcha no worries. i didnt think this would be so confusing. Most of the tutoria...

yeah, I might have been wrong about the number of classes not mattering. if I figure it out I'll let you know

visual violet Jun 20, 2021, 7:15 PM

#

you can see one giant clump

fiery pollen Jun 20, 2021, 7:16 PM

#

Hi me again
df2=df[['DateDim[Day]', '[NetSales]']]

df2['DateDim[Day]'] = pd.to_datetime(df2['DateDim[Day]'])

plt.figure(figsize=(16,8))
plt.title('Sale History')
plt.plot(df2['[NetSales]'])
plt.xlabel('DateDim[Day]')
plt.ylabel('[NetSales]', fontsize=25)
plt.show()
that's my code, and the output looks like this

haughty wharf Jun 20, 2021, 7:16 PM

#

serene scaffold yeah, I might have been wrong about the number of classes not mattering. if I fi...

Appreciate it man!

fiery pollen Jun 20, 2021, 7:16 PM

#

#

and my date data looks like this

#

so why doesn't the program see it in date format?

cedar sun Jun 20, 2021, 7:18 PM

#

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.

visual violet Jun 20, 2021, 7:18 PM

#

do like data["Date"]= pd.to_datetime(data["Date"])

cedar sun Jun 20, 2021, 7:18 PM

#

What does this mean? like, i thought period was used to specify the number of epochs
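As the warning says, ModelCheckpoint's save_freq takes either 'epoch' or an integer number of batches, so "every 5 epochs" has to be converted into a batch count. A small sketch of that arithmetic; the dataset size and batch size are made-up numbers, and the commented-out callback line assumes TensorFlow 2.x's tf.keras.callbacks.ModelCheckpoint.

```python
import math

def save_freq_for_epochs(n_samples, batch_size, every_n_epochs):
    """Number of batches that corresponds to `every_n_epochs` epochs.
    A final partial batch still counts as a step, hence the ceil."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return every_n_epochs * steps_per_epoch

# hypothetical numbers: 10,000 training samples, batch size 32
freq = save_freq_for_epochs(10_000, 32, every_n_epochs=5)
print(freq)  # 1565

# then, assuming TensorFlow 2.x:
# checkpoint = tf.keras.callbacks.ModelCheckpoint("ckpt/model_{epoch:02d}.h5", save_freq=freq)
```

If fit() is called with an explicit steps_per_epoch, use that value instead of the ceil.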

visual violet Jun 20, 2021, 7:19 PM

#

right now it doesn't know that those are dates

fiery pollen Jun 20, 2021, 7:20 PM

#

same output: numbers like 500000 on the x axis, not dates

visual violet Jun 20, 2021, 7:21 PM

#

how about plt.plot(df2['DateDim[Day]'],df2['[NetSales]'])

fiery pollen Jun 20, 2021, 7:24 PM

#

#

not bad but it looks something wrong with data 😄

visual violet Jun 20, 2021, 7:25 PM

#

now you can extend the y axis like i did lol

#

plt.figure(figsize=(10, 15), dpi=80)

#

put ^ code first then everything else after for it to work

fiery pollen Jun 20, 2021, 7:28 PM

#

thankss

cedar sun Jun 20, 2021, 7:59 PM

#

different models of EfficientNet

#

Are only based on the dimensions of the input img?

haughty wharf Jun 20, 2021, 8:24 PM

#

How do I know if theres a good classification distribution in my data?

serene scaffold Jun 20, 2021, 8:29 PM

#

@haughty wharf having a confusion matrix with all its counts on the diagonal means your model got everything right

#

Though you would want to evaluate that using different data than you used for training
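A minimal NumPy sketch of building a confusion matrix from true and predicted labels, using scikit-learn's convention of rows = true class, columns = predicted class. The label arrays are made up; in practice sklearn.metrics.confusion_matrix(y_test, y_pred) does this for you.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

y_true = np.array([0, 1, 2, 2, 1, 0])

# perfect predictions: every count lands on the diagonal
print(confusion(y_true, y_true, 3))

# one mistake: a true 2 predicted as 1 shows up off the diagonal at [2, 1]
y_pred = np.array([0, 1, 2, 1, 1, 0])
cm = confusion(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()  # fraction of counts on the diagonal
print(accuracy)
```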

haughty wharf Jun 20, 2021, 8:31 PM

#

serene scaffold <@277292825262030858> having a confusion matrix that's just the diagonal means ...

Like an entirely new dataset?

serene scaffold Jun 20, 2021, 8:32 PM

#

@haughty wharf you usually train on like 70% and evaluate on 30%, or something like that

#

Unless you want to cross validate, which is nice if you can afford that computationally.
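A NumPy-only sketch of both ideas: a shuffled 70/30 hold-out split, and k-fold cross-validation where every row is used for evaluation exactly once. The 4509 row count matches the phoneme dataset; scikit-learn's train_test_split and KFold are the usual ready-made versions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 4509  # size of the phoneme dataset discussed above

# hold-out: shuffle row indices, train on ~70%, evaluate on the rest
idx = rng.permutation(n_rows)
cut = int(0.7 * n_rows)
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))  # 3156 1353

# k-fold cross-validation: each fold takes one turn as the evaluation set
k = 5
folds = np.array_split(idx, k)
for i, eval_idx in enumerate(folds):
    fit_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # fit the model on rows fit_idx, then score it on rows eval_idx
    assert len(fit_idx) + len(eval_idx) == n_rows
```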

haughty wharf Jun 20, 2021, 8:34 PM

#

Ahhh I see

haughty wharf Jun 20, 2021, 8:36 PM

#

serene scaffold <@277292825262030858> you usually train on like 70% and evaluate on 30%, or some...

Okayy. I guess I'll pause on the LDA analysis for now. I also wanted to do some exploratory analysis, but I'm having a hard time figuring out how to. The only thing I've done so far in that regard was a pie chart displaying the phoneme distribution in the data

serene scaffold Jun 20, 2021, 8:38 PM

#

@haughty wharf exploratory analysis with the training data? Remind me what the rows and columns mean?

haughty wharf Jun 20, 2021, 8:40 PM

#

With the whole pandas dataframe, minus the speaker and row columns. I actually have some notes, let me see.

 A dataset was formed by selecting five phonemes for classification based on digitized speech from this database. The phonemes are transcribed as follows: "sh" as in "she", "dcl" as in "dark", "iy" as the vowel in "she", "aa" as the vowel in "dark", and "ao" as the first vowel in "water". From continuous speech of 50 male speakers, 4509 speech frames of 32 msec duration were selected, approximately 2 examples of each phoneme from each speaker. Each speech frame is represented by 512 samples at a 16kHz sampling rate, and each frame represents one of the above five phonemes. The breakdown of the 4509 speech frames into phoneme frequencies is as follows:

From each speech frame, we computed a log-periodogram, which is one of several widely used methods for casting speech data in a form suitable for speech recognition. Thus the data used in what follows consist of 4509 log-periodograms of length 256, with known class (phoneme) memberships.

The data contain 256 columns labelled "x.1" - "x.256", a response column labelled "g", and a column labelled "speaker" identifying the different speakers.

g is the phoneme label

haughty wharf Jun 20, 2021, 8:42 PM

#

serene scaffold <@277292825262030858> exploratory analysis with the training data? Remind me wha...

the rows are 4509 instances, and there are 259 columns (x.1-x.256 are the frequency measurements/predictor features, column g is the phoneme response, and column speaker identifies the different speakers, but I was told to ignore that column)

visual violet Jun 20, 2021, 9:43 PM

#

does anybody know how to explain a dataframe in words?

#

like, does anybody have experience describing a dataframe in words in a research paper?

reef bone Jun 20, 2021, 10:04 PM

#

a pandas dataframe?

#

you could have a look around the docs, they have to explain it somehow

#

the docs for the class say:

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

#

from here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas-dataframe

#

there will probably be a more lengthy description elsewhere, but it's difficult to navigate the docs on mobile 😬

visual violet Jun 20, 2021, 10:22 PM

#

oh, the way the pandas docs describe their dataframe confuses me a lot lol

#

"Two-dimensional, size-mutable, potentially heterogeneous tabular data."

#

lol

#

wat does this even mean

reef bone Jun 20, 2021, 10:36 PM

#

it means that you have rows and columns (2 dimensions), the size can change (you can add and remove rows and columns), and can hold mixed types (ints, floats, timestamps, ...) in a single dataframe

twin fiber Jun 20, 2021, 10:58 PM

#

hey I'm really hoping someone can help me, struggling to get correct(?) output from confusion matrix

#

I am building a model to infer sentiment from reviews. The model accuracy is listed at 97% and I am now trying to calculate the confusion matrix however it doesn't seem to output the correct information unless i'm misinterpreting it

#

this is what the matrix is outputting, can this be right with a 97% accurate model?

#

velvet thorn Jun 20, 2021, 11:22 PM

#

twin fiber this is what the matrix is outputting, can this be right with a 97% accurate mod...

why don’t you think so?

lapis sequoia Jun 20, 2021, 11:22 PM

#

how do I split with multiple delimiters

velvet thorn Jun 20, 2021, 11:24 PM

#

lapis sequoia how do I split with multiple delimiters

in pandas?

lapis sequoia Jun 20, 2021, 11:24 PM

#

velvet thorn in `pandas`?

anything

velvet thorn Jun 20, 2021, 11:24 PM

#

lapis sequoia anything

what do you mean anything

#

it depends

#

re.split for the general case

lapis sequoia Jun 20, 2021, 11:24 PM

#

A big block of text in this case

#

How do I separate the delimiters with regex?

velvet thorn Jun 20, 2021, 11:25 PM

#

you use a regex that matches any of the delimiters (e.g. a character class like [,;])

#

and it’ll split on any of them
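For example, with the standard re module, a character class like [,;|] matches any one of those single-character delimiters, and alternation (a|b) handles multi-character ones. The sample text and delimiters are made up.

```python
import re

text = "apples, pears;bananas|cherries -- plums"

# split on any of ",", ";" or "|", plus the two-character "--",
# and swallow the whitespace around each delimiter
parts = re.split(r"\s*(?:[,;|]|--)\s*", text)
print(parts)  # ['apples', 'pears', 'bananas', 'cherries', 'plums']
```

The group is non-capturing ((?:...)) on purpose: with a capturing group, re.split would also return the delimiters themselves.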

lapis sequoia Jun 20, 2021, 11:26 PM

#

Got it, thanks

twin fiber Jun 20, 2021, 11:27 PM

#

velvet thorn why don’t you think so?

just curious because it says there are 4100 false positives

#

out of 5000, unless i'm reading it wrong

#

it just seems high because the model is meant to be 97% accurate

velvet thorn Jun 20, 2021, 11:30 PM

#

twin fiber just curious because it says there are 4100 false positives

not sure why the display is in scientific notation, but I see 10000 TN, 410 FN, and 9600 TP

#

which seems about right

#

or FP, I forgot which axis is predicted
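The mix-up came from scientific notation: a cell displayed as 4.1e+02 means 4.1 × 10², i.e. 410, not 4100. A small sketch of reading such values and computing accuracy from a 2×2 matrix laid out the way scikit-learn's confusion_matrix does for labels 0/1 (rows = true, columns = predicted, so [[TN, FP], [FN, TP]]); the counts are hypothetical, not the ones from the screenshot.

```python
# scientific notation: mantissa times a power of ten
print(float("4.1e+02"))  # 410.0
print(float("1e+04"))    # 10000.0

def accuracy_from_cm(cm):
    """cm = [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    (tn, fp), (fn, tp) = cm
    return (tn + tp) / (tn + fp + fn + tp)

# hypothetical counts for illustration
cm = [[950, 30],
      [20, 1000]]
print(round(accuracy_from_cm(cm), 3))  # 0.975
```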

twin fiber Jun 20, 2021, 11:31 PM

#

oh I see thank you

#

I was interpreting it wrong

velvet thorn Jun 20, 2021, 11:32 PM

#

yw 👋

cedar sun Jun 21, 2021, 1:34 AM

#

do u have any mixup implementation for keras?

visual violet Jun 21, 2021, 3:47 AM

#

what representation method should i use

#

for a time-series data

#

i only have 20 columns and 725 rows

#

but i wanna know how to reduce the 'noise' 🙂

wanton sleet Jun 21, 2021, 3:49 AM

#

x = np.linspace(0, 2 * np.pi, 400)

#

anyone know what this specific line does?

austere swift Jun 21, 2021, 4:12 AM

#

did you check out the documentation for the function?

#

!d numpy.linspace

arctic wedgeBOT Jun 21, 2021, 4:12 AM

#

numpy.linspace


numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.

Returns *num* evenly spaced samples, calculated over the interval [*start*, *stop*].

The endpoint of the interval can optionally be excluded.

Changed in version 1.16.0: Non-scalar *start* and *stop* are now supported.

Changed in version 1.20.0: Values are rounded towards `-inf` instead of `0` when an integer `dtype` is specified. The old behavior can still be obtained with `np.linspace(start, stop, num).astype(int)`

wanton sleet Jun 21, 2021, 4:39 AM

#

@austere swift so it goes from 0 to 360 degrees (since pi = 180 degrees) in 400 evenly spaced samples, right?

austere swift Jun 21, 2021, 4:42 AM

#

If you’re in the context of radians, yes
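A quick check of what that line produces: 400 evenly spaced values from 0 to 2π inclusive (endpoint=True by default), so the spacing is 2π/399 rather than 2π/400. Converting with np.degrees confirms the 0 to 360 reading.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)

print(len(x))       # 400
print(x[0], x[-1])  # both endpoints are included: 0.0 and 2*pi
step = x[1] - x[0]
print(np.isclose(step, 2 * np.pi / 399))  # True: 399 gaps between 400 points

degrees = np.degrees(x)
print(degrees[0], degrees[-1])  # 0.0 and (approximately) 360.0

# typical use: sample a function over one full period
y = np.sin(x)
```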

wanton sleet Jun 21, 2021, 4:50 AM

#

Thanks

dense hinge Jun 21, 2021, 4:52 AM

#

can someone recommend an ebook for me to get started with deep learning with pytorch?

thorny bolt Jun 21, 2021, 5:24 AM

#

i need help with a bit of my code in which i'm training a model

#

and it is a bit urgent

novel elbow Jun 21, 2021, 5:48 AM

#

If you need help, post the question here (:

novel elbow Jun 21, 2021, 5:48 AM

#

dense hinge can someone recommend an ebook for me to get started with deep learning with pyt...

https://github.com/fastai/fastbook

magic juniper Jun 21, 2021, 6:45 AM

#

I have my neural network made and stuff, now how would I Utilize it to actually create an AI? please help.

dense hinge Jun 21, 2021, 7:13 AM

#

novel elbow https://github.com/fastai/fastbook

why can't i see the ipynb file on github?
it gives "Sorry, something went wrong. Reload?"

novel elbow Jun 21, 2021, 7:13 AM

#

that's a github problem, it's sometimes not very reliable for viewing notebooks

#

you can use https://nbviewer.jupyter.org/ and paste the notebook url

dense hinge Jun 21, 2021, 7:14 AM

#

yah just read the problem and found the site

#

thanks dude

#

@novel elbow is fastai better to learn than pytorch?

novel elbow Jun 21, 2021, 7:15 AM

#

it's built on top of pytorch

dense hinge Jun 21, 2021, 7:15 AM

#

too many abstractions though right?

#

if I wanted to do something manual it would be hard, is what I heard

novel elbow Jun 21, 2021, 7:15 AM

#

depends on what you want to learn

#

then check the fastai course

#

in part 2 of the course they teach you how the library is built

#

so you can see all the inner and manual parts

dense hinge Jun 21, 2021, 7:18 AM

#

oh ok thanks

thorny bolt Jun 21, 2021, 7:29 AM

#

does anybody here participate in kaggle competitions regularly?

inland zephyr Jun 21, 2021, 9:03 AM

#

Does anyone here know the proper way to feed a tensorflow/keras conv1d network with a pandas dataframe? I always have problems going from 1D pandas data structures to conv1d

#

let's say i have n x 1000 feature data for training or testing; i'm always confused about whether to use input_shape with [n,n_feature] or batch_input_shape with (n,n_feature)

#

i'm using this line: dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values)), and my 1st Conv1D layer looks like this: model.add(layers.Conv1D(filters=64,kernel_size=9,activation='relu',batch_input_shape = [None,15360, 1])). It fails with ValueError: Input 0 of layer sequential_3 is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (15360, 1)

#

and this is my model structure:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_12 (Conv1D)           (None, 15352, 64)         640       
_________________________________________________________________
conv1d_13 (Conv1D)           (None, 15344, 64)         36928     
_________________________________________________________________
conv1d_14 (Conv1D)           (None, 15336, 64)         36928     
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 7668, 64)          0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 7668, 64)          0         
_________________________________________________________________
conv1d_15 (Conv1D)           (None, 7660, 64)          36928     
_________________________________________________________________
conv1d_16 (Conv1D)           (None, 7652, 64)          36928     
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 3826, 64)          0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 3826, 64)          0         
_________________________________________________________________
conv1d_17 (Conv1D)           (None, 3818, 64)          36928     
_________________________________________________________________
flatten_2 (Flatten)          (None, 244352)            0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 244353    
=================================================================
Total params: 429,633
Trainable params: 429,633
Non-trainable params: 0
_________________________________________________________________
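The error says the first Conv1D layer wants 3-D input, (batch, steps, channels), but got 2-D data. One common fix, sketched below in NumPy under the assumption that each dataframe row is one 15360-sample signal: add a trailing channel axis, then batch the tf.data.Dataset (elements from an unbatched from_tensor_slices have no batch dimension). The tiny array sizes are stand-ins, and the TensorFlow lines are left as comments.

```python
import numpy as np

# hypothetical stand-in for df.values: 6 signals with 20 samples each
n_rows, n_features = 6, 20
values = np.random.default_rng(0).random((n_rows, n_features))
targets = np.arange(n_rows) % 2

# Conv1D expects (batch, steps, channels): add a channel axis of size 1
X = values[..., np.newaxis]
print(X.shape)  # (6, 20, 1)

# A tf.data pipeline also needs an explicit batch, e.g. (assuming TensorFlow 2.x):
# dataset = tf.data.Dataset.from_tensor_slices((X, targets)).batch(32)
# and the first layer then takes input_shape=(n_features, 1), without the batch size:
# layers.Conv1D(64, kernel_size=9, activation="relu", input_shape=(n_features, 1))
```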

novel osprey Jun 21, 2021, 10:20 AM

#

!d numpy

arctic wedgeBOT Jun 21, 2021, 10:20 AM

#

numpy

numpy is the standard numerical array library for python, the successor to Numeric and numarray. numpy provides fast operations for homogeneous data sets and common mathematical operations like correlations, standard deviation, fourier transforms, and convolutions.

wooden cosmos Jun 21, 2021, 11:00 AM

#

hey guys, i'm trying to load a pre-trained gensim Word2Vec model and i am experiencing this error:

UnpicklingError: invalid load key, '6'.

i got the model like this :
'blabla-10-300.w2v.model.bz2'

i tried multiple ways: i loaded it directly, i loaded it unzipped, i tried multiple methods to load it into gensim

and nothing seems to work

do you have a fix?

serene scaffold Jun 21, 2021, 12:39 PM

#

@wooden cosmos try copying the error message into the chat

wooden cosmos Jun 21, 2021, 12:40 PM

#

serene scaffold <@488001720845336586> try copying the error message into the chat

thx, i already fixed it

serene scaffold Jun 21, 2021, 12:40 PM

#

Yay!

wooden cosmos Jun 21, 2021, 12:41 PM

#

yeah, cool
but i ran into another problem - the model is trained on non-stemmed and non-lemmatized french words

wooden cosmos Jun 21, 2021, 12:41 PM

#

serene scaffold Yay!

that seems weird to me, what do you think about it?

serene scaffold Jun 21, 2021, 12:43 PM

#

wooden cosmos that seems weird to me, what do you think about it?

I don't know enough about french to know why they would have made that decision

#

@eager heath, I choose you!

eager heath Jun 21, 2021, 12:43 PM

#

Yes!

serene scaffold Jun 21, 2021, 12:43 PM

#

French. Help.

eager heath Jun 21, 2021, 12:43 PM

#

Tell me about it

serene scaffold Jun 21, 2021, 12:43 PM

#

You know how there's like the normal form of a word, like the version that's used to look it up in a dictionary?

eager heath Jun 21, 2021, 12:44 PM

#

I didn't know what it was called, but yes, it exists

serene scaffold Jun 21, 2021, 12:44 PM

#

It's called the lemma.

#

It's usually the singular form of a word when it's the subject of a sentence.

wooden cosmos Jun 21, 2021, 12:45 PM

#

the question is: is it better to train w2v on lemmatized and stemmed tokens, or to use plain words as they appear in the text, without any preprocessing?

#

and if i have a model that was not trained on lemmas and stems, should i try to retrain it or just stick with it?

eager heath Jun 21, 2021, 12:46 PM

#

that would be a bit weird, we do have plural and gendered forms for most words

#

My blind guess is that you would get less accurate results because of all the possible spellings of a word

wooden cosmos Jun 21, 2021, 12:47 PM

#

yeah

#

and also for the verbs -> "aller" could be spelled as "allez","allons","allait","allaient"

eager heath Jun 21, 2021, 12:50 PM

#

Yeah

serene scaffold Jun 21, 2021, 12:54 PM

#

Are there rules you can apply to certain types of words to get their base form?

eager heath Jun 21, 2021, 1:05 PM

#

Yes, there should be

#

Although they would get really complicated quickly

#

We like exceptions of exceptions
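A toy sketch of what "rules plus exceptions" looks like for French -er verbs. The suffix list and the irregular-form table are hypothetical and deliberately tiny; a real French lemmatizer relies on a full morphological dictionary. Note how the blind suffix rule also mangles a non-verb like "table", which is exactly where the exceptions of exceptions come from.

```python
# Hypothetical exception table: irregular forms that no suffix rule can handle.
IRREGULAR = {
    "vais": "aller", "vas": "aller", "va": "aller", "vont": "aller",
    "irai": "aller", "ira": "aller",
}

# Hypothetical present/imperfect endings of regular -er verbs, ordered so that
# longer endings are tried first ("-aient" before "-ait", "-ions" before "-ons").
ER_ENDINGS = ["aient", "ions", "iez", "ons", "ez", "ait", "ais", "ent", "es", "e"]


def lemmatize_er_verb(word):
    """Toy lemmatizer for French -er verbs: check exceptions, then strip a suffix."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for ending in ER_ENDINGS:
        # keep at least a two-letter stem so short words are left alone
        if word.endswith(ending) and len(word) > len(ending) + 1:
            return word[: -len(ending)] + "er"
    return word


print([lemmatize_er_verb(w) for w in ["allez", "allons", "allait", "allaient", "vont", "table"]])
# ['aller', 'aller', 'aller', 'aller', 'aller', 'tabler']
```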

desert oar Jun 21, 2021, 1:23 PM

#

romance languages are generally easier to reduce to a base form than english, as far as i know

#

for Spanish, i know you could probably do it in Prolog: there aren't many exceptions, and even the exceptions are pretty "regular"

serene scaffold Jun 21, 2021, 1:29 PM

#

desert oar romance languages are generally easier to reduce to a base form than english, as...

English is actually pretty easy. The exceptions to our pluralization rules are when we retain the plural forms of other languages

#

there are two conjugations of verbs, those where you append -ed and those where a vowel changes internally. Like swim vs swam. But the latter category is shrinking over time.

dry hearth Jun 21, 2021, 1:34 PM

#

Can someone help me with an error I'm having running a BERT model on Colab? Colab's RAM is getting depleted, what am I missing here?
https://www.reddit.com/r/MachineLearning/comments/o4uxc5/p_help_error_due_to_colab_ram_depletion_when/

desert oar Jun 21, 2021, 1:36 PM

#

serene scaffold there are two conjugations of verbs, those where you append `-ed` and those wher...

fair enough, although the "standard" nltk stemmers do miss a lot of specific cases

#

and those have had a lot of research behind them

grave frost Jun 21, 2021, 1:54 PM

#

dry hearth Can someone help me with an error I'm having running a BERT model on Colab? Cola...

you won't get any replies on r/machinelearning, and did you try googling your error?

dry hearth Jun 21, 2021, 2:01 PM

#

grave frost you won't get any replies on r/machinelearning, and did you try googling your er...

yeah but nothing relevant that is helping 😦

#

what other subreddits do you suggest i check out?

ember sapphire Jun 21, 2021, 2:02 PM

#

how can i get a random vector [x, y, z] from a 3d numpy array with shape (255, 255, 3)

desert oar Jun 21, 2021, 2:02 PM

#

ember sapphire how can i get a random vector [x, y, z] from a 3d numpy array with shape (255, 2...

you want a random slice of the array?

ember sapphire Jun 21, 2021, 2:02 PM

#

yes

#

the 3d array is conceptually a 2d array of (r, g, b) triples

#

i want a random triple

#

i looked at np.random.choice and some others, but i can't see an easy way to do this even though it should be easy

desert oar Jun 21, 2021, 2:04 PM

#

from random import randrange
i, j = randrange(255), randrange(255)

my_random_rgb = array[i, j, :]

no?

ember sapphire Jun 21, 2021, 2:04 PM

#

lol sure

#

for some reason i thought there'd be some numpy function that did it directly

desert oar Jun 21, 2021, 2:05 PM

#

random.choice specifically says it's for 1D

ember sapphire Jun 21, 2021, 2:05 PM

#

yeah

#

there's random.Generator.choice

desert oar Jun 21, 2021, 2:05 PM

#

same thing

#

i'm not sure if there's a version to select random "slices" like that

ember sapphire Jun 21, 2021, 2:05 PM

#

doesn't seem like it

#

guess i'll just do it the way you suggested

desert oar Jun 21, 2021, 2:07 PM

#

from numpy.random import default_rng

rng = default_rng()
i, j = rng.integers(0, 255, size=2, dtype=int, endpoint=False)

i think this is equivalent using the numpy rng

#

which you probably should do if you want to use the same random seed as your other numpy code

#

maybe you can "partially ravel" the array and then use choice

#

@ember sapphire
```python
from numpy.random import default_rng

rng = default_rng()

image = ... # 255 x 255 x 3
random_triple = rng.choice(image.reshape((-1,3)))
```

ember sapphire Jun 21, 2021, 2:12 PM

#

thank you

#

that is nice

desert oar Jun 21, 2021, 2:13 PM

#

the i,j version might be faster for what it's worth

#

In [76]: b = np.arange(255*255*3).reshape((255,255,3))

In [77]: rng = np.random.default_rng()

In [78]: def rand1(rng, array):
    ...:     i, j = rng.integers(0, 255, size=2, dtype=int, endpoint=False)
    ...:     return array[i, j, :]
    ...:

In [79]: def rand2(rng, array):
    ...:     return rng.choice(array.reshape((-1,3)))
    ...:

In [80]: %timeit rand1(rng, b)
23.6 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [81]: %timeit rand2(rng, b)
21.3 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

#

not much different
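If you need several random pixels at once (say, to seed k-means), both approaches generalize with fancy indexing. A sketch, assuming a random stand-in image: independent row/column draws can repeat a pixel, while sampling flat indices without replacement and converting them with `np.unravel_index` guarantees distinct pixels.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is reproducible

# Stand-in for a 255 x 255 RGB image (hypothetical data)
image = rng.integers(0, 256, size=(255, 255, 3))
h, w = image.shape[:2]
k = 8

# Option 1: k independent (row, col) draws; the same pixel may come up twice
rows = rng.integers(0, h, size=k)
cols = rng.integers(0, w, size=k)
pixels_with_replacement = image[rows, cols]       # shape (k, 3)

# Option 2: k distinct pixels, by sampling flat indices without replacement
flat = rng.choice(h * w, size=k, replace=False)
r, c = np.unravel_index(flat, (h, w))
distinct_pixels = image[r, c]                     # shape (k, 3)

print(pixels_with_replacement.shape, distinct_pixels.shape)
```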

#

!e ```python
from math import sqrt
import scipy.stats

n1 = 7
n2 = 7

m1 = 23.6
s1 = 3.03

m2 = 21.3
s2 = 2.10

welch_t = (m1 - m2) / sqrt(s1**2 + s2**2)
welch_df = ((n1-1)*s1**4 + (n2-1)*s2**4) / sqrt(s1**4 + s2**4)
welch_p = scipy.stats.t(df=welch_df).ppf(welch_t)

print(welch_p)
```

arctic wedgeBOT Jun 21, 2021, 2:28 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

0.3171214613223502
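For comparison, a sketch of a two-sided Welch's t-test on the same summary numbers. Two things differ from the snippet above: `ppf` is the inverse CDF, so feeding it a t statistic does not give a p-value, and the degrees of freedom should come from the Welch-Satterthwaite formula. To stay dependency-free, the sketch integrates the t density numerically with Simpson's rule; with SciPy available, `scipy.stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)` does the same in one call. Keep in mind that timeit's "std. dev." is over only 7 run averages, so this is a rough check at best.

```python
import math


def welch_t_test(m1, s1, n1, m2, s2, n2, steps=2000):
    """Two-sided Welch's t-test from summary stats (mean, sample std. dev., sample size)."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of the means
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

    # Student's t density with df degrees of freedom (log-gamma for numerical stability)
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the area under the density between 0 and |t| (steps is even)
    b = abs(t)
    h = b / steps
    area = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3

    p = 1 - 2 * area   # density is symmetric, so P(|T| >= |t|) = 1 - 2 * area
    return t, df, p


# timeit summaries from above: 23.6 ± 3.03 µs vs 21.3 ± 2.10 µs, 7 runs each
t, df, p = welch_t_test(23.6, 3.03, 7, 21.3, 2.10, 7)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2f}")
```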

ember sapphire Jun 21, 2021, 2:53 PM

#

another simple numpy question

#

so i have my 255x255 image

#

and i want to augment it so that it's 255x255x5 instead of 255x255x3

#

i want image[x, y] to become [x/255, y/255, *image[x, y]]

#

is there a way to do that in a vectorized fashion?

desert oar Jun 21, 2021, 3:04 PM

#

wouldn't that be more than 5?

#

or do you literally want the index in the array?

ember sapphire Jun 21, 2021, 3:05 PM

#

i want the index in the array

desert oar Jun 21, 2021, 3:05 PM

#

as in, image[10, 30] should be (10/255, 30/255, r, g, b)?

ember sapphire Jun 21, 2021, 3:05 PM

#

yes

desert oar Jun 21, 2021, 3:05 PM

#

dare i ask, why?

ember sapphire Jun 21, 2021, 3:06 PM

#

because that is the space in which i want to compute distances

#

im clustering pixels based on their location and their color

#

so my centroids for k-means need to be vectors in that 5d space

#

cluster[y, x] = np.argmin(np.linalg.norm(centroids - img[y, x], axis=1))

#

the goal is to be able to write that

desert oar Jun 21, 2021, 3:10 PM

#

!eval there might be a nicer way to do it, but this appears to work

import numpy as np

# rgb 255x255 image
b = np.arange(255*255*3).reshape((255,255,3))

m, n = b.shape[:2]
i_broadcast = np.repeat(np.arange(m), n).reshape((m, n, -1))
j_broadcast = np.tile(np.arange(n), m).reshape((m, n, -1))
b_aug = np.concatenate((i_broadcast, j_broadcast, b), axis=2)

print(b_aug)

arctic wedgeBOT Jun 21, 2021, 3:10 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [[[     0      0      0      1      2]
002 |   [     0      1      3      4      5]
003 |   [     0      2      6      7      8]
004 |   ...
005 |   [     0    252    756    757    758]
006 |   [     0    253    759    760    761]
007 |   [     0    254    762    763    764]]
008 | 
009 |  [[     1      0    765    766    767]
010 |   [     1      1    768    769    770]
011 |   [     1      2    771    772    773]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/jenuwoguja.txt?noredirect

desert oar Jun 21, 2021, 3:11 PM

#

if you have to do this for lots of images you could of course re-use the i_broadcast and j_broadcast over and over in a tight loop

#

i forgot to /255 but you get the idea
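A possibly tidier way to build the coordinate channels, sketched with `np.indices`, which returns the row-index and column-index grids directly. It divides by the image height and width (both 255 here), so it also handles non-square images. One thing worth knowing for the k-means use case: the coordinates land in [0, 1) while RGB stays in 0-255, so color will dominate Euclidean distances unless one of them is rescaled.

```python
import numpy as np


def augment_with_coords(image):
    """Return an (h, w, 2 + c) float array whose pixel [i, j] is (i/h, j/w, *image[i, j])."""
    h, w = image.shape[:2]
    rows, cols = np.indices((h, w))                    # two (h, w) integer grids
    coords = np.stack((rows / h, cols / w), axis=-1)   # shape (h, w, 2)
    return np.concatenate((coords, image.astype(np.float64)), axis=-1)


b = np.arange(255 * 255 * 3).reshape((255, 255, 3))
b_aug = augment_with_coords(b)
print(b_aug.shape)   # (255, 255, 5)
```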

ember sapphire Jun 21, 2021, 3:15 PM

#

is it just me or is numpy miserable to work with

desert oar Jun 21, 2021, 3:15 PM

#

x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0
y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0

b_aug = np.concatenate((x, y, b), axis=2)

#

it's just a reality when working within a language like python

#

you need to do as much in C as possible

#

which means you need custom C functions and lots of custom functionality that in a "fast" language you might just do in a for loop

ember sapphire Jun 21, 2021, 3:16 PM

#

kind of defeats the purpose of using a high level language

desert oar Jun 21, 2021, 3:16 PM

#

it's also a bit of a learning curve and an acquired taste

#

sort of, you the developer don't need to worry about allocating memory and strided array lookups and bytes and stuff

#

also if you really do need to write a for loop over a numpy array, numba can be magical

#

import numba
import numpy as np

def augment_with_coords_np(array):
    m, n = array.shape[:2]
    x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0
    y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0
    return np.concatenate((x, y, array), axis=2)

@numba.njit
def augment_with_coords_nb(array_in):
    array_out = np.zeros((255, 255, 5))
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]
    return array_out

In [144]: %timeit augment_with_coords_np(b)
1.06 ms ± 93.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [146]: %timeit augment_with_coords_nb(b)
255 µs ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#

where b = np.arange(255*255*3).reshape((255,255,3))

#

numba is a lot faster here because it can be more algorithmically efficient, only a single nested loop and a single allocation, instead of lots and lots of looping and allocation + python function call overhead

#

if you use np.empty instead of np.zeros, the numba version is even faster

In [149]: %timeit augment_with_coords_nb(b)
155 µs ± 4.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

ember sapphire Jun 21, 2021, 3:24 PM

#

wow

serene scaffold Jun 21, 2021, 4:05 PM

#

ember sapphire kind of defeats the purpose of using a high level language

isn't having all the iteration abstracted away more high-level than writing loops?

serene scaffold Jun 21, 2021, 4:07 PM

#

desert oar import numba import numpy as np def augment_with_coords_np(array): ...

am I reading that right? the numba one was faster? what the fuck?

#

how does that even happen?

desert oar Jun 21, 2021, 4:08 PM

#

a lot faster

#

like i said, fewer passes over the data + no python function call overhead

#

at least, that is my understanding of why

serene scaffold Jun 21, 2021, 4:09 PM

#

does numba only work with cpython?

desert oar Jun 21, 2021, 4:10 PM

#

i think so, i've read some blog posts about using it with pypy but i think you need to build pypy from source with some patches, maybe?

#

maybe it's better now in 2021

serene scaffold Jun 21, 2021, 4:12 PM

#

def augment_with_coords_np(array):
    m, n = array.shape[:2]
    x = np.repeat(np.arange(m), n).reshape((m, n, -1)) / 255.0  # arange, repeat, reshape, divide; 4
    y = np.tile(np.arange(n), m).reshape((m, n, -1)) / 255.0    # same, basically. 4
    return np.concatenate((x, y, array), axis=2)                # 1

If I'm reading this right, this involves creating 9 arrays, only one of which gets returned. But I assume that within a numba-decorated function, the semantic requirement that intermediary arrays are created isn't there, yes?

#

that and you don't create intermediary arrays anyway

grave frost Jun 21, 2021, 4:14 PM

#

dry hearth yeah but nothing relevant that is helping 😦

reducing batch size is literally the first thing that comes up

serene scaffold Jun 21, 2021, 4:16 PM

#

serene scaffold def augment_with_coords_np(array): x = np.repeat(np.arange(m), n).resh...

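def augment_with_coords_np(array):
    m, n = array.shape[:2]
    # note: dividing the concatenated result also scales the RGB channels by 1/255
    return np.concatenate((
        np.repeat(np.arange(m), n).reshape((m, n, -1)),
        np.tile(np.arange(n), m).reshape((m, n, -1)),
        array), axis=2
    ) / 255.0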

I think this is the same?

desert oar Jun 21, 2021, 4:19 PM

#

yeah the arrays still need to get created

#

numpy has no way to optimize that away
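NumPy won't fuse those steps for you, but you can cut most of the temporaries by hand: allocate the output once and let broadcasting fill the coordinate channels from small 1-D ranges, instead of materializing full-size `repeat`/`tile` arrays and concatenating them. A sketch; the function name and `out` parameter are illustrative, not from any library.

```python
import numpy as np


def augment_with_coords_inplace(image, out=None):
    """Write (i/h, j/w, *image[i, j]) into an (h, w, 2 + c) array, allocating only if needed."""
    h, w, c = image.shape
    if out is None:
        out = np.empty((h, w, 2 + c))
    # (h, 1) and (1, w) ranges broadcast across each full (h, w) channel,
    # so no full-size index grids are ever created.
    out[:, :, 0] = np.arange(h)[:, None] / h
    out[:, :, 1] = np.arange(w)[None, :] / w
    out[:, :, 2:] = image          # copies (and casts to float) the original channels
    return out
```

Passing the same `out` buffer for every image in a loop avoids the per-call allocation too.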

dry hearth Jun 21, 2021, 4:20 PM

#

grave frost reducing batch size is literally the first thing that comes up

already tried that

desert oar Jun 21, 2021, 4:20 PM

#

this is the same use case as numexpr in pandas

dry hearth Jun 21, 2021, 4:20 PM

#

😦

grave frost Jun 21, 2021, 4:20 PM

#

dry hearth already tried that

if you can't get it to work with a batch size of 1, then you have no choice other than reducing the number of model parameters

#

or buy/obtain a better GPU

serene scaffold Jun 21, 2021, 4:24 PM

#

desert oar yeah the arrays still need to get created

even in numba? or can numba figure out the intended semantics?

#

(when it's only having to deal with arrays)

desert oar Jun 21, 2021, 4:24 PM

#

that i don't know, i'll try it

#

it doesn't work with nopython mode

#

In [156]: %timeit augment_with_coords_nb_np(b)
1.17 ms ± 79.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

this is the numpy version with numba.jit slapped on top of it

#

so it's actually slower than just doing it in plain python

#

or at least not any faster

#

note: this is all with numpy 1.20.2 under cpython 3.9.4 x86_64

#

using the pypi wheel, not conda

#

properties might be different in different situations of course

serene scaffold Jun 21, 2021, 4:33 PM

#

so to benefit from numba, you can't be creating lots of extra arrays?

desert oar Jun 21, 2021, 4:33 PM

#

to benefit from numba you need to be writing for loops over numpy arrays

#

not using high-level numpy functions

#

@numba.njit
def augment_with_coords_nb_pre(array_in, array_out):
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]

i think you could even require a pre-allocated array_out parameter to be filled

#

this is one possible optimization if you're writing a loop, you can allocate the memory once for the entire loop
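A shape-generic variant of the pre-allocated kernel, sketched under two assumptions: the output buffer is allocated once by the caller and reused across a (hypothetical) batch of same-sized images, and coordinates are divided by the image height/width instead of a hard-coded 255. If numba isn't installed, the `try`/`except` falls back to running the same loop as plain Python, which is correct but slow.

```python
import numpy as np

try:
    from numba import njit
except ImportError:          # no numba: run the same loop as plain (slow) Python
    def njit(func):
        return func


@njit
def augment_into(array_in, array_out):
    """Fill array_out[i, j] with (i/h, j/w, *array_in[i, j]) for an (h, w, c) input."""
    h, w, c = array_in.shape
    for i in range(h):
        x = i / h
        for j in range(w):
            array_out[i, j, 0] = x
            array_out[i, j, 1] = j / w
            for k in range(c):
                array_out[i, j, 2 + k] = array_in[i, j, k]


# Hypothetical batch of same-sized images; the output buffer is allocated once
images = [np.random.default_rng(s).integers(0, 256, size=(40, 30, 3)) for s in range(3)]
out = np.empty((40, 30, 5))
for img in images:
    augment_into(img, out)
    # ... consume `out` here; the next iteration overwrites it ...
```

With numba present, the first call includes JIT compilation, so benchmark after a warm-up call.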

serene scaffold Jun 21, 2021, 4:36 PM

#

I'd rather it be something like

@numba.njit(array_out=np.zeros((5, 5)))
def augment_with_coords_nb_pre(array_in):
    for i in range(255):
        x = i / 255.0
        for j in range(255):
            y = j / 255.0
            array_out[i, j, 0] = x
            array_out[i, j, 1] = y
            array_out[i, j, 2] = array_in[i, j, 0]
            array_out[i, j, 3] = array_in[i, j, 1]
            array_out[i, j, 4] = array_in[i, j, 2]

desert oar Jun 21, 2021, 4:36 PM

#

interestingly it doesn't actually seem faster if you use the pre-allocated array

serene scaffold Jun 21, 2021, 4:36 PM

#

but I guess that would fuck with the namespacing

desert oar Jun 21, 2021, 4:36 PM

#

In [162]: @numba.njit
     ...: def augment_with_coords_nb_pre(array_in, array_out):
     ...:     for i in range(255):
     ...:         x = i / 255.0
     ...:         for j in range(255):
     ...:             y = j / 255.0
     ...:             array_out[i, j, 0] = x
     ...:             array_out[i, j, 1] = y
     ...:             array_out[i, j, 2] = array_in[i, j, 0]
     ...:             array_out[i, j, 3] = array_in[i, j, 1]
     ...:             array_out[i, j, 4] = array_in[i, j, 2]
     ...:

In [163]: c = np.empty((255, 255, 5))

In [164]: %timeit augment_with_coords_nb_pre(b, c)
179 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [165]: %timeit c = np.empty((255, 255, 5)); augment_with_coords_nb_pre(b, c)
193 µs ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

serene scaffold Jun 21, 2021, 4:37 PM

#

so the time savings is probably just from making the array in advance?

desert oar Jun 21, 2021, 4:37 PM

#

what do you mean?

#

it's still significantly faster than the np.concatenate version

#

actually wait

#

it is faster than the numba version using np.zeros, but not faster than the numba version using np.empty

#

these differences aren't really statistically significant though

#

i'm surprised that pre-allocating isn't significantly faster, maybe there's additional overhead somehow, or i need to use the numba signature

serene scaffold Jun 21, 2021, 4:43 PM

#

@desert oar On an unrelated note, I've been wanting to write an article on the transition from general Python program design to data science Python design. It's based on the idea that general Python is mostly OOP and imperative (there are lots of data types, and you use loops to read and write to different data structures), while data science Python is more functional and less OOP (you mostly work with "rectangular" data structures, and pretty much everything is a function that doesn't modify the underlying data, i.e. there are usually no (gasp) side effects). Do you think I'm on the right track?

desert oar Jun 21, 2021, 4:43 PM

#

In [167]: c = np.empty((255, 255, 5)); augment_with_coords_nb_pre(b, c)

In [168]: np.testing.assert_array_almost_equal(augment_with_coords_nb(b), c)

In [169]: %timeit augment_with_coords_nb_pre(b, c)
158 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [170]: %timeit augment_with_coords_nb_pre(b, c)
149 µs ± 1.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [171]: %timeit augment_with_coords_nb(b)
150 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [172]: %timeit augment_with_coords_nb(b)
158 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

lost umbra Jun 21, 2021, 4:44 PM

#

can someone help me with a plotly.py problem?

desert oar Jun 21, 2021, 4:44 PM

#

serene scaffold @desert oar On an unrelated note, I've been wanting to write an arti...

kind of. pandas itself is quite object-oriented. the main difference is transitioning from imperative (for loops) to declarative (vectorized operations)

#

pandas also isn't really that "functional"

#

in fact i'd say it's not really functional at all, other than its use of higher-order functions in some places (map, apply, etc.)
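To make the imperative-vs-declarative distinction concrete, a small sketch using min-max scaling of each column (the data is made up): the loop spells out how to visit every element, while the vectorized expression states what the result is and leaves the looping to NumPy's compiled code.

```python
import numpy as np

data = np.array([[1.0, 200.0],
                 [2.0, 300.0],
                 [3.0, 400.0]])   # made-up example data

# Imperative: spell out *how* to visit every element
scaled_loop = np.empty_like(data)
for j in range(data.shape[1]):
    lo, hi = data[:, j].min(), data[:, j].max()
    for i in range(data.shape[0]):
        scaled_loop[i, j] = (data[i, j] - lo) / (hi - lo)

# Declarative/vectorized: state *what* the result is; NumPy runs the loops in C
lo, hi = data.min(axis=0), data.max(axis=0)
scaled_vec = (data - lo) / (hi - lo)

print(np.allclose(scaled_loop, scaled_vec))   # True
```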

serene scaffold Jun 21, 2021, 4:45 PM

#

I'm thinking of both pandas and numpy

desert oar Jun 21, 2021, 4:45 PM

#

same goes for numpy as for pandas

#

but pandas and numpy are both fairly object-oriented and only somewhat "functional"

#

numpy and pandas both support some mix of views and copies, but it's mostly exposed directly to the user, rather than hidden away as an optimization behind an immutable interface

#

the "no side effects" aspect of functional programming is mostly incidental by virtue of what people usually want to do with numpy and pandas: math
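A quick illustration of views and copies being the user's responsibility in NumPy: basic slicing returns a view that shares memory with the original, while integer-array ("fancy") indexing returns an independent copy.

```python
import numpy as np

a = np.arange(6)
view = a[1:4]          # basic slice: a view onto a's buffer
copy = a[[1, 2, 3]]    # fancy indexing: a new, independent array
view[0] = 100          # writes through to a[1]
copy[1] = -1           # does not touch a

print(a.tolist())                                              # [0, 100, 2, 3, 4, 5]
print(np.shares_memory(a, view), np.shares_memory(a, copy))    # True False
```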

serene scaffold Jun 21, 2021, 4:49 PM

#

Declarative is definitely more along the right lines. It wasn't part of my CS education, apparently.

#

I should get a refund.

desert oar Jun 21, 2021, 4:49 PM

#

heh, sql is declarative

serene scaffold Jun 21, 2021, 4:50 PM

#

my database class was weird

desert oar Jun 21, 2021, 4:50 PM

#

i'm sure they talked more about database implementation than about programming language design though

serene scaffold Jun 21, 2021, 4:51 PM

#

in my database class? the first half of the class was ACID, relational algebra and all those normal forms, and the second half was SQL and making a website

#

we didn't talk about the time complexity of different queries, which is kind of annoying

uncut barn Jun 21, 2021, 5:05 PM

#

if True:
    tokens = [t for t in tokens if t not in set(stopwords.words('english'))]

#

can anyone help me understand what this code means

#

i mean the first part if True

#

what has to be True, and what makes this statement false?

serene scaffold Jun 21, 2021, 5:08 PM

#

uncut barn i mean the first part if True

it returns a copy of tokens that doesn't have any stopwords. Do you know what a stopword is?

uncut barn Jun 21, 2021, 5:11 PM

#

yes but i'm just trying to understand the if True part

#

i get the rest of the code

serene scaffold Jun 21, 2021, 5:11 PM

#

uncut barn yes but i'm just trying to understand the if True part

if True is pointless

uncut barn Jun 21, 2021, 5:12 PM

#

ok so it's just extra code

serene scaffold Jun 21, 2021, 5:12 PM

#

uncut barn ok so its just extra code

I guess so. if True blocks will always get entered

ember sapphire Jun 21, 2021, 5:12 PM

#

ok i must be doing something seriously wrong

uncut barn Jun 21, 2021, 5:13 PM

#

so what happens if i do if False

#

would it still run

austere swift Jun 21, 2021, 5:13 PM

#

do you know the concept of an if statement?

#

like what if does

uncut barn Jun 21, 2021, 5:14 PM

#

yes, it checks if a statement is true or not and executes the block if the condition is met

austere swift Jun 21, 2021, 5:14 PM

#

it executes the block when the condition is evaluated to true

#

so if True will always be executed, since the condition is True

#

and inversely, if False will never be executed, since the condition is False

ember sapphire Jun 21, 2021, 5:15 PM

#

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import image

rng = np.random.default_rng()

img = image.imread('fruits_small.jpg')
h, w = img.shape[:2]

x = np.repeat(np.arange(h), w).reshape((h, w, -1)) / w
y = np.tile(np.arange(h), w).reshape((h, w, -1)) / h

augmented_image = np.concatenate((x, y, img), axis=2)
augmented_image = np.array(img)

plt.subplot(3, 3, 1)
plt.title('Original')
plt.imshow(img)

for plot, k in enumerate([4, 8, 16, 32, 64]):
    centroids = rng.choice(augmented_image.reshape((-1, 3)), size=k, replace=False)
    clusters = np.empty((h, w))

    print(centroids)

    while True:
        for y, x in np.ndindex(img.shape[:2]):
            v = augmented_image[y, x]
            clusters[y, x] = np.argmin(np.linalg.norm(centroids - v, axis=1))

        d = 0
        for i in range(k):
           c = augmented_image[clusters == i]
           new_centroid = c.mean(axis=0)
           d += np.linalg.norm(centroids[i] - new_centroid)
           centroids[i] = new_centroid

        if d == 0:
            break

    cluster_colors = [np.random.rand(3) for _ in range(k)]
    for i in range(k):
        img[clusters == i] = centroids[i]

    plt.subplot(3, 3, plot + 1)
    plt.title(f'k = {k}')
    plt.imshow(img)

plt.show()

#

is there anything obviously terrible here? running it on a 750x500 image is taking hours

#

it looks like it isn't even converging
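A few things in the posted code likely explain both symptoms. `imread` on a JPEG gives a `uint8` array, so `centroids - v` wraps around instead of going negative, and `centroids` stays `uint8`, so each new mean is truncated on assignment while `d` measures the untruncated difference, which means it will rarely hit exactly 0. The line `augmented_image = np.array(img)` also replaces the 5-channel array with a plain copy of the image, so the coordinates are never used. And the per-pixel Python loop over roughly 375,000 pixels runs on every iteration. Below is a vectorized sketch of Lloyd's algorithm that works in float64, compares the centroid shift against a tolerance, and keeps the old centroid if a cluster goes empty (the function name and defaults are illustrative):

```python
import numpy as np


def kmeans(points, k, rng, max_iter=100, tol=1e-6):
    """Lloyd's k-means on an (n, d) array; returns (centroids, labels)."""
    points = np.asarray(points, dtype=np.float64)     # float math: no uint8 wrap-around
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    sq_norms = (points ** 2).sum(axis=1)[:, None]      # (n, 1), reused every iteration
    for _ in range(max_iter):
        # ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2, giving an (n, k) matrix in one shot
        d2 = sq_norms - 2 * points @ centroids.T + (centroids ** 2).sum(axis=1)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for i in range(k):                             # loop over k clusters, not n pixels
            members = points[labels == i]
            if len(members):                           # empty cluster: keep old centroid
                new_centroids[i] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                                # tolerance instead of == 0
            break
    return centroids, labels
```

For the image, something like `pixels = augmented_image.reshape(-1, augmented_image.shape[2])`, then `centroids, labels = kmeans(pixels, k, rng)` and `clusters = labels.reshape(h, w)` should replace the per-pixel loop.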

desert oar Jun 21, 2021, 5:33 PM

#

uncut barn if True: tokens = [t for t in tokens if t not in set(stopwords.wo...

this also wastefully rebuilds the stopwords set on every iteration of the comprehension

#

stopwords_set = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stopwords_set]

uncut barn Jun 21, 2021, 5:35 PM

#

ah so this way saves time?

desert oar Jun 21, 2021, 5:35 PM

#

yes, and if you're doing text processing on a lot of data and/or have a big stopwords list, the time savings could add up
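A rough way to see the difference yourself, sketched with a made-up stand-in stopword list so it runs without NLTK's corpus download. The `if` clause of a comprehension is evaluated once per element, so the first version rebuilds the set for every token (and in the original, `stopwords.words('english')` is called per token too), while the second builds it once.

```python
import timeit

stopword_list = ["the", "a", "an", "and", "or", "of", "to", "in", "is", "it"] * 20  # stand-in
tokens = "the cat sat on the mat and it was happy in the sun".split() * 200


def filter_rebuild(tokens):
    # set(...) is evaluated again for every token in the comprehension
    return [t for t in tokens if t not in set(stopword_list)]


def filter_hoisted(tokens):
    stopword_set = set(stopword_list)     # built once per call
    return [t for t in tokens if t not in stopword_set]


assert filter_rebuild(tokens) == filter_hoisted(tokens)
print("rebuild:", timeit.timeit(lambda: filter_rebuild(tokens), number=10))
print("hoisted:", timeit.timeit(lambda: filter_hoisted(tokens), number=10))
```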

uncut barn Jun 21, 2021, 5:37 PM

#

ahh so this is why my runtime was taking too long, thanks

desert oar Jun 21, 2021, 5:42 PM

#

that probably isn't the only reason

#

you'd have to share more of your code

charred umbra Jun 21, 2021, 6:17 PM

#

Does anyone know how to write an ML paper?

For background, I've trained an image-based biostatistical model that can identify COVID-19 at >99% accuracy. I extended the model to three other diseases. It was trained on CT scans and x-rays. It successfully identified Coronavirus, Tuberculosis, Carcinoma, & Pneumonia at above 93% accuracy, specificity, sensitivity, and precision when tested through one-vs-all adaptations of 2x2 confusion matrices in cross-validation.

The model is a deep-learning-based, semi-supervised pipeline. It uses a convolutional neural network, a deep multilayer perceptron, an isolation forest, and a support vector machine to make a diagnosis. It shows promising results, and I want to write the paper. Idk how these types of papers are written though.

grave frost Jun 21, 2021, 6:18 PM

#

charred umbra Does anyone know how to write an ML paper? For background, I've trained a image...

not to be discouraging, but there are literally hundreds of papers out there already reporting 99% accuracy

#

if it's just for your CV, not for actual conferences, then I guess that doesn't really matter

charred umbra Jun 21, 2021, 6:18 PM

#

Yeah ik, but they're all written differently

#

so I'm so confused

#

btw, I'm just writing this paper as a mf school science fair project, so I'm not trynna get compensated or anything

grave frost Jun 21, 2021, 6:20 PM

#

I also wanted to write a paper too 😁 but writing even a decent paper requires a ton of knowledge and studying of previous methods - not to mention all the formality

charred umbra Jun 21, 2021, 6:21 PM

#

Thing is, I'm required to

#

I already did the actual experiment with the network and stuff

#

Now I just gotta put it into word form by summer's end

grave frost Jun 21, 2021, 6:21 PM

#

if it's only for a school project

#

then you just need to formalize whatever you have done - no need to write a full fledged paper

charred umbra Jun 21, 2021, 6:22 PM

#

Yeah, I'm thinking I want to try and get the paper to ISEF, but idk if it's good enough

grave frost Jun 21, 2021, 6:23 PM

#

do they specifically ask for research papers? if not, then a document would be enough

charred umbra Jun 21, 2021, 6:24 PM

#

In the past 2 years, at least one paper at my regional fair that advanced to ISEF was an ML classifier for cancer. The aforementioned model can do cancer, as well as other diseases like tuberculosis and stuff

charred umbra Jun 21, 2021, 6:24 PM

#

grave frost do they specifically ask for research papers? if not, then a document would be e...

they want a paper in the APA format

grave frost Jun 21, 2021, 6:26 PM

#

sounds like that's just a document formatting requirement

#

so if you put a decent amount of formalism in it, you'd be fine

charred umbra Jun 21, 2021, 6:40 PM

#

ebic

lapis sequoia Jun 21, 2021, 6:42 PM

#

I have a question for professional data scientists: in which contexts do you use maths, and in which do you use coding? thank you 🙂

ember sapphire Jun 21, 2021, 6:52 PM

#

my cost function is increasing from one iteration to the next

#

does that mean i have a problem in my implementation?
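With plain Lloyd's k-means, the objective (sum of squared distances to the assigned centroids) cannot go up: the assignment step picks the closest centroid for every point, and the update step moves each centroid to its cluster mean, which minimizes squared error within that cluster. So a rising cost does point to a problem. Two common suspects: measuring cost with unsquared distances (the mean does not minimize those, so that sum can rise) and integer/`uint8` arithmetic. A sketch of a debugging check that records the cost after each half-step and asserts it never increases (function names are illustrative):

```python
import numpy as np


def sse(points, centroids, labels):
    """k-means objective: sum of squared distances to each point's assigned centroid."""
    return float(((points - centroids[labels]) ** 2).sum())


def lloyd_with_cost_check(points, centroids, n_iter=20):
    points = np.asarray(points, dtype=np.float64)
    centroids = np.array(centroids, dtype=np.float64)   # float copy, safe to modify
    costs = []
    for _ in range(n_iter):
        # (n, k, d) broadcast: fine for small data; chunk it for large images
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        costs.append(sse(points, centroids, labels))        # after assignment step
        for i in range(len(centroids)):
            if np.any(labels == i):
                centroids[i] = points[labels == i].mean(axis=0)
        costs.append(sse(points, centroids, labels))        # after update step
    # allow a tiny relative tolerance for floating-point noise
    for before, after in zip(costs, costs[1:]):
        assert after <= before + 1e-9 * max(1.0, before), "cost increased: likely a bug"
    return centroids, costs
```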

opaque stratus Jun 21, 2021, 7:20 PM

#

https://cdn.discordapp.com/attachments/441989398465085442/856166865231413258/RDT_20210620_1541032087122900829955148.jpg

grave sparrow Jun 21, 2021, 7:32 PM

#

So I am running into this dilemma that I have not found an answer for, and it's weird...
When concatenating the values of 2 columns at a row level, you have to do something ugly like

df['combined']=df['one'].astype(str)+' stuff ' + df['two'].astype(str)

It works and all, but I can't help but feel like it's a code smell

desert oar Jun 21, 2021, 7:32 PM

#

grave sparrow So I am running into this dilemma that I have not found an answer for and its we...

indeed it is, why are you concatenating stringified versions of your data at all?

#

and what are the actual datatypes here?

#

sometimes it is the best way to do something, but it's rare that this is actually what you want to do

grave sparrow Jun 21, 2021, 7:33 PM

#

Can't say particularly. Basically generating an instruction based on values in two columns.

desert oar Jun 21, 2021, 7:33 PM

#

the only other way to perform this particular task is with .apply over rows

#

if the column is already a string column, why astype(str)?

grave sparrow Jun 21, 2021, 7:33 PM

#

They are strings.

desert oar Jun 21, 2021, 7:33 PM

#

if there are nulls, you need to handle those differently

#

astype(str) will do the wrong thing for the most part

grave sparrow Jun 21, 2021, 7:34 PM

#

I was getting weird errors.

desert oar Jun 21, 2021, 7:34 PM

#

what errors

#

!e ```python
import pandas as pd
s1 = pd.Series(['a', 'b', None])
s2 = pd.Series(['x', None, 'z'])
print( s1.astype(str) + ' -> ' + s2.astype(str) )
```

arctic wedgeBOT Jun 21, 2021, 7:34 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0       a -> x
002 | 1    b -> None
003 | 2    None -> z
004 | dtype: object

grave sparrow Jun 21, 2021, 7:34 PM

#

Sry actually not weird.

desert oar Jun 21, 2021, 7:34 PM

#

that probably isn't what you want

grave sparrow Jun 21, 2021, 7:35 PM

#

Float > string coercion error

desert oar Jun 21, 2021, 7:35 PM

#

so you have mixed data types

#

i.e. not strings

#

are there np.nan's in there?

#

!e ```python
import pandas as pd
import numpy as np
s1 = pd.Series(['a', 'b', np.nan])
s2 = pd.Series(['x', np.nan, 'z'])
print( s1.astype(str) + ' -> ' + s2.astype(str) )
```

arctic wedgeBOT Jun 21, 2021, 7:35 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0      a -> x
002 | 1    b -> nan
003 | 2    nan -> z
004 | dtype: object

desert oar Jun 21, 2021, 7:35 PM

#

again, probably not what you want

grave sparrow Jun 21, 2021, 7:36 PM

#

Hmm how would you handle that?

#

Assuming the concatenated columns can potentially be null
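One option, sketched with the same toy Series as above: `Series.str.cat` concatenates element-wise and, by default, leaves the result missing wherever either input is missing (instead of producing the string "nan"), while `na_rep` substitutes a placeholder. Which behavior is right depends on what a missing value should mean in the generated instruction.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['a', 'b', np.nan])
s2 = pd.Series(['x', np.nan, 'z'])

# Default: a row with a missing value in either column stays missing
strict = s1.str.cat(s2, sep=' -> ')
print(strict.tolist())        # ['a -> x', nan, nan]

# na_rep: substitute a placeholder for missing values instead
filled = s1.str.cat(s2, sep=' -> ', na_rep='?')
print(filled.tolist())        # ['a -> x', 'b -> ?', '? -> z']
```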