#data-science-and-ml

distant needle
#

Tkinter is what it is. I don't have a problem with Tkinter. More so, I have a problem with the complete and utter lack of clarity managing these figure objects with matplotlib. Quite infuriating there isn't a simple ".destroy()" method to delete the associated figure object immediately

misty flint
#

as far as i remember, there isnt
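
For reference, the pyplot interface does have `plt.close(fig)`, which unregisters a figure from pyplot so it can be garbage-collected once nothing else references it. A minimal sketch, assuming the figure was created through `pyplot`; a figure embedded in a Tkinter window would presumably also need its Tk widget destroyed, which is not shown here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
fig_number = fig.number

# Unregister the figure from pyplot; memory is released once no
# other references to `fig` remain.
plt.close(fig)

still_open = plt.fignum_exists(fig_number)
print(still_open)
```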

distant needle
#

yeah big wtf pikachu face right now

misty flint
lapis sequoia
#

does anyone know basic machine learning ?

wheat island
#

i made pytorch

exotic maple
uncut orbit
#

i do

wheat island
uncut orbit
#

sklearn

wheat island
#

what lib

uncut orbit
#

bit of tensorflow

wheat island
#

o

#

is tensorflow hard

uncut orbit
#

not too much

wheat island
#

o

#

on god?

#

i quit b4 i even started when i looked at the docs

uncut orbit
#

lmao

#

it can be complicated

#

don't look at the docs

#

look online

wheat island
#

m

uncut orbit
#

for some code

wheat island
#

ur online

#

teach me tensorflow plz

uncut orbit
#

and its meaning

lapis sequoia
uncut orbit
#

i hv this notebook i wrote

#

in colab

lapis sequoia
#

I hate you whoever texted me

wheat island
#

LMOOAO

#

noo:(

exotic maple
lapis sequoia
#

We all can relate sir

exotic maple
#
from parallel_universe_2 import wife_&_kids

Traceback: wife_&_kids not found
uncut orbit
#

LMAO

lapis sequoia
#

Sad

#

So sad

uncut orbit
#

LOVE THE JOKES

exotic maple
#

the big sad

lapis sequoia
#

Might not be just jokes

uncut orbit
#

huh

#

never thought that

lapis sequoia
lapis sequoia
#

Wow

uncut orbit
#

code is valuable

#

its on classification tho

misty flint
misty flint
uncut orbit
#

Thx

#

Love my expression on the emoji lmao

harsh trellis
#

umm

#

can someone tell me

#

what is the meaning of skewness?

#

how can i observe the skewness, i.e. whether the data is right-skewed or not
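
Skewness measures how asymmetric a distribution is: a positive value means a longer tail on the right (right-skewed), a negative value a longer tail on the left. A minimal sketch using only the standard library; the sample lists are made up for illustration, and `skewness` here is the population (Fisher-Pearson) coefficient rather than any particular library's implementation:

```python
from statistics import mean, median


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive
print(skewness(symmetric))     # zero for perfectly symmetric data
# A quick informal check: in right-skewed data the mean tends to exceed the median.
print(mean(right_skewed), median(right_skewed))
```

A histogram of the column is the usual visual check; the number above is just a compact summary of the same thing.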

restive flare
grave frost
raven knoll
#

Hey, I am new to datascience. I am currently following a minor about bigData and I have a project, but I cannot figure out how I need to solve this problem. I have currently only learned the basics like pandas, webscraping and a bit of classifiers like KNN.

My project is to create a program that can predict if a hotel review is positive or negative. Every row of data I have has a positive and negative review, but I don't know how to start. Can anyone help me out?

primal tulip
#

Why not use pd.Series() instead?

harsh trellis
solid kindle
#

How to export pandas dataframe into text clipboard that I could paste into script as string and reimport as pandas dataframe again? this is for MVE
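
One way this could be done is to serialize the frame to CSV text, paste that text into the script as a string, and read it back through `io.StringIO`. A minimal sketch; the column names and values are placeholders, not data from the thread:

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

# Serialize to CSV text; this is the string that would be pasted into the script.
csv_text = df.to_csv(index=False)
print(csv_text)

# Rebuild the frame from the embedded string.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```

`df.to_clipboard()` can place a similar text form on the system clipboard directly, and `df.to_dict()` output can be pasted into `pd.DataFrame(...)` if a Python literal is preferred.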

serene dragon
#

hi

#

Is there any way to leave a header empty in a df when i use multiple headers?

#

level_0 will be always present

#

but for levels_1 to 4 i want to have an empty field when there is no name present

serene dragon
#
for i, columns_old in enumerate(df_value.columns.levels):
    columns_new = np.where(columns_old.str.contains('Unnamed'), '', columns_old)
    df_value.rename(columns=dict(zip(columns_old, columns_new)), level=i, inplace=True)
#

got it

primal tulip
serene dragon
untold cove
#

Can anyone tell me how to add labels/text or a second yaxis from another Col to this: Do you know how I can add this legend or text to my px.bar, need to display the text for each of my values:

def SetColor(y):
        if(y <= 1):
            return "red"
        elif(y <= 2):
            return "orange" 
        elif(y <= 3):
            return "yellow"
        elif(y <= 4):
            return "lightgreen"
        elif(y <= 5 or y <= 6):
            return "green"
        elif(y <= 7):
            return "darkgreen"
        elif(y <= 8):
            return "silver"
        elif(y <= 9):
            return "gold"
def Setlabel(y):
        if(y <= 1):
            return "Very low (1)"
        elif(y <= 2):
            return "Low (2)" 
        elif(y <= 3):
            return "Below average (3)"
        elif(y <= 4):
            return "Average (4)"
        elif(y <= 5 or y <= 6):
            return "Average (5, 6)"
        elif(y <= 7):
            return "Above average (7)"
        elif(y <= 8):
            return "High (8)"
        elif(y <= 9):
            return "Very High (9)"


px.bar(filtered_df, x=filtered_df["ID"], y=filtered_df["Score"]).update_traces(marker = dict(color=list(map(SetColor, filtered_df['Score']))))


This doesn’t work:
##update_layout(legend = dict(list(map(Setlabel, SetColor, filtered_df['Score']))))
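
The two functions above share the same thresholds, so one option is to keep the thresholds, colors and labels in parallel lists and look the score up with `bisect`. This is only a sketch of that refactoring; `score_style` is a hypothetical helper name, and it does not by itself solve the Plotly legend question:

```python
from bisect import bisect_left

# Upper bounds copied from the branches of SetColor / Setlabel.
THRESHOLDS = [1, 2, 3, 4, 6, 7, 8, 9]
COLORS = ["red", "orange", "yellow", "lightgreen",
          "green", "darkgreen", "silver", "gold"]
LABELS = ["Very low (1)", "Low (2)", "Below average (3)", "Average (4)",
          "Average (5, 6)", "Above average (7)", "High (8)", "Very High (9)"]


def score_style(score):
    """Return (color, label) for a score, or None above the last threshold."""
    i = bisect_left(THRESHOLDS, score)  # first threshold with score <= threshold
    if i == len(THRESHOLDS):
        return None  # same as the original functions falling through
    return COLORS[i], LABELS[i]


print(score_style(5))
print(score_style(9.5))
```
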
hoary wigeon
#

Hello

#

I need help with SCRAPING,

I manually saved the webpage in html format from browser.
Im able to retrieve dataframe from MANUALLY SAVED method.

I saved webpage using request url module in html format.
But i cannot retrieve dataframe from that.

Both look the same, but I don't know why this happens.
Thanks if you helped me.

austere swift
#

how do you want to get a dataframe from the html

#

is there a table in the webpage?

broken stratus
#

depends on u scape

#

*depends on how u scape

broken stratus
hoary wigeon
#

hold on,

it was very easy with pandas to read_html(mannualysave.html)
but not with request method

grave frost
untold cove
#

@grave frost have you used PlotlyExpress?

grave frost
untold cove
#

It doesn’t behave the same, take this for example:


=dict(list(Setlabel(bro))))
ValueError: dictionary update sequence element #0 has length 1; 2 is required.

That was with a basic data structure.’bro’
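
That `ValueError` is what `dict()` raises when it is handed items that are not key/value pairs. A minimal sketch, assuming the function returned a single label string (the actual contents of `bro` are not shown in the thread): `list()` of a string yields one-character items, each of length 1, while `dict()` wants pairs such as those produced by `zip`.

```python
label = "Low (2)"

try:
    dict(list(label))  # list(label) -> ['L', 'o', 'w', ...], not pairs
except ValueError as exc:
    error_text = str(exc)
    print(error_text)

# dict() works once each element is a (key, value) pair.
scores = [1, 2, 3]
labels = ["Very low (1)", "Low (2)", "Below average (3)"]
score_to_label = dict(zip(scores, labels))
print(score_to_label)
```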

grave frost
#

you can try by using an example dict with sample data to better explain your issue here

#

or use a help-channel

untold cove
#

I got it working with the function anyway, doing it under the color trace not by updating traces. Haven’t managed to work out the second y axis tho or verify the data but I’m assuming adding the second y as a trace will resolve it.

#

Trust me a basic dict and values was the first thing I tried with the df

misty flint
#

i feel like every time work with numpy i end up having to reshape something

hollow sentinel
#

that is numpy for you

misty flint
undone lotus
#

Hello, wanted to know if I could get some assistance in using dynamic pivot in a query using TSQL. Please let me know if this is the right channel to ask this question or point me to the correct channel

misty flint
undone lotus
misty flint
#

ah yes, high degree polynomial regression, the ability to make anything fit anything

misty flint
#

funfetti-flavored

keen gull
#

Hi, Im programming a mini mathematics/scientific dice rolling game

#

its only a couple lines but I want it to be able to count the median of the sums

#

if someone dms i would be ecstatic

hollow sentinel
#

just post the code here

#

we will all take a look

keen gull
#

okay, code is in swedish though

#

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
    tarning1 = random.randint(1,6)
    tarning2 = random.randint(1,6)
    forsokSumma = tarning1 + tarning2
    tarningsSumma += forsokSumma
    print(tarning1,tarning2,forsokSumma,"\t",tarningsSumma)
print("Summan är",tarningsSumma)

#

if you run this in python, it will ask you how many times do you want to roll

#

and you can choose a number and it will give you the total sum

#

what if I want it to give the sum divided by the number chosen

#

tarning means dice

#

summa is sum

grave frost
#

!e

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
 tarning1 = random.randint(1,6) 
 tarning2 = random.randint(1,6)
 forsokSumma = tarning1 + tarning2
 tarningsSumma += forsokSumma
 print(tarning1,tarning2,forsokSumma,"\t",tarningsSumma)
print("Summan är",tarningsSumma)
arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

keen gull
#

anything @hollow sentinel ?

arctic wedgeBOT
#

Hey @thin remnant!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

thin remnant
#

I'm having a csv file that has 32 cols

#

but all data is in the first col

#

how to separate it over the cols
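
A common cause of everything landing in one column is that the file uses a delimiter other than a comma. A minimal sketch, assuming the file is semicolon-separated (the real delimiter is not stated in the thread); the sample text is made up:

```python
import io

import pandas as pd

raw = "name;age;city\nAnna;31;Lund\nOlof;27;Umea\n"

# Default sep="," finds no commas, so each line becomes a single column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing the actual delimiter splits the data into separate columns.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape, right.shape)
```

With a real file, the path goes where `io.StringIO(raw)` is, and `sep` is set to whatever character separates the values.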

misty flint
#

open with excel

#

@keen gull

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
 tarning1 = random.randint(1,6) 
 tarning2 = random.randint(1,6)
 medelSumma = (tarning1 + tarning2)/antalForsok
 tarningsSumma += medelSumma
 print(tarning1,tarning2,medelSumma,"\t",tarningsSumma)
print("Genomsnittet är",tarningsSumma)
#

idk swedish so i used google translate

#

sorry if the words are wrong

#

lol

keen gull
#

ur a god bro wtf

hasty mountain
#

Guys, I have an excel file that got many spread sheets. One shows info about "Win rate with X character", other has data about "Win rate when X character follows Y route", and so on. They have different lengths. Can I just make a DataFrame with them? Or would it be better to just separate them all in different files and use them as different datasets?

keen gull
#

can I dm u if i need more help?

misty flint
#

uhh ill probably be busy later so just post here

hasty mountain
keen gull
#

okay thank you ❤️

misty flint
#

np

tidal bough
hasty mountain
#

For example: If I got a dataset that shows "Win rate when playing with X character"
and another that shows "Win rate when using Y item"
What if I want to predict "Win rate when playing with X character and using Y item"?

tidal bough
#

You can make a single dataset out of them if you can figure out how to make them all, well, the same kind of data

hasty mountain
#

I see

tidal bronze
#

yo pandas is struggling to read a csv file with a bit less than half a million rows, is that normal?

#

It did issue me a warning about mixed dtypes but I then specified them

keen gull
#

@misty flint it didnt work, its still only showing the result

#

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
    tarning1 = random.randint(1,6)
    tarning2 = random.randint(1,6)
    medelSumma = (tarning1 + tarning2)/antalForsok
    tarningsSumma += medelSumma
    print(tarning1,tarning2,medelSumma,"\t",tarningsSumma)
print("Genomsnittet är",tarningsSumma)

misty flint
#

ah i forgot to change the last line

keen gull
#

i changed it

#

to medelSumma

misty flint
#

yeah

#

thats it

keen gull
#

but its a bit off, by like 0.15

misty flint
keen gull
#

hmm

#

its off depending on the number, not constant

misty flint
#

probably this line

tarningsSumma += medelSumma

#

not at my comp anymore so i cant check lol

keen gull
#

hmm i tried changing to only +, only =, only - and *

#

none worked

#

wait i just realized that it now shows numbers with decimals? its a dice roll so it cant be anything other than the numbers 1-6

misty flint
misty flint
#

the "Genomsnittet"

keen gull
#

yes the average can be in decimals but not the actual results

#

but when it shows u the sequence e.g 2 5 8, it shows now 2 5 1.3

misty flint
#

i think i dont understand the question. sorry bud

keen gull
#

for example, 1.6 is fine for it to have decimals

#

but here, you can see in line 3 that it says 5 6 2.2, that means the dice rolled 2.2, which isnt possible

misty flint
#

your code is definitely different than mine i think lol

#

oh wait i see the problem

keen gull
#

I just want it to show the median at the results

misty flint
#

i misunderstood the initial problem

keen gull
#

its the last line right_

#

?

#

that should be edited

#

thats my fault, im not that good at explaining

misty flint
keen gull
#

I was thinking:
print("Genomsnittet är", (tarning1 + tarning2)/antalForsok)

#

yes average

misty flint
#

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
 tarning1 = random.randint(1,6) 
 tarning2 = random.randint(1,6)
 medelSumma = (tarning1 + tarning2)/2
 tarningsSumma += medelSumma
print(tarning1,tarning2,medelSumma,"\t",medelSumma)
print("Genomsnittet är",medelSumma)

keen gull
#

the result divided by the number the person chose

misty flint
#

oh man this is hard to do on mobile lol

keen gull
#

im having trouble on a laptop xD

misty flint
keen gull
#

let me try to formulate it

#

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
tarningsSumma = 0
for n in range(0,antalForsok):
    tarning1 = random.randint(1,6)
    tarning2 = random.randint(1,6)
    forsokSumma = tarning1 + tarning2
    tarningsSumma += forsokSumma
    print(tarning1,tarning2,forsokSumma,"\t",tarningsSumma)
print("Summan är",tarningsSumma)

#

the original code shows the sum of all the dices you rolled

#

so if u chose 3 and got 1, 2, and 3 you would get the sum 6

#

i want it to take 6 and divided it by 3, the number chosen

#

which should give 2

grand monolith
#

dudes im having a slight problem, I have a (10, 1561) feature matrix but when i do feature[i] its shape is (1561, )
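
Indexing a 2-D array with a single integer drops that axis, which is why `feature[i]` comes back as `(1561,)`. A minimal sketch of two ways to keep the row two-dimensional, using a zero-filled stand-in for the real feature matrix:

```python
import numpy as np

feature = np.zeros((10, 1561))  # stand-in for the real feature matrix
i = 3

row_1d = feature[i]                 # integer index drops the first axis
row_2d = feature[i:i + 1]           # a length-1 slice keeps it
row_2d_alt = feature[i].reshape(1, -1)  # or reshape back explicitly

print(row_1d.shape, row_2d.shape, row_2d_alt.shape)
```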

misty flint
#

then place x wherever you want it

keen gull
#

wdym place x wherever?, I just started coding 2 days ago sorry xD

misty flint
#

uhh just place it at the end and you will see what im talking about

#

print(x)

grand monolith
#

anyone here good at numpy?

keen gull
misty flint
#

that x is different than the x above it

keen gull
#

LMFAO HOW

#

omg i see it

#

36/5 isnt 1.8 unfortunately 😭

misty flint
keen gull
#

PLSSS im a nooob

#

🤣

misty flint
#

ah you want THAT number

keen gull
#

yeah xD

#

what is THAT in english xD

misty flint
#

change it to

x = tarningsSumma/antalForsok

keen gull
#

now it only shows sum

keen gull
misty flint
#

you deleted

print(x)

keen gull
#

LMFAOO

misty flint
keen gull
#

FINALLY

#

HOLY SHIT

misty flint
keen gull
#

so all I had to do was add those two lines?

misty flint
#

yes, you can even change the last one to

#

print("Genomsnittet är", x)
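
Putting the pieces of this exchange together, the average is the running total divided by the number of rolls, computed once after the loop. A minimal sketch with English names; `roll_sums` is a hypothetical helper, the number of rolls is fixed instead of read with `input()`, and the generator is seeded only so the run is repeatable. Note that the thread says "median" but describes the mean, so both are shown:

```python
import random
from statistics import mean, median


def roll_sums(n_rolls, rng):
    """Return the sum of two six-sided dice for each of n_rolls throws."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]


rng = random.Random(0)
sums = roll_sums(5, rng)

total = sum(sums)
average = total / len(sums)  # same as tarningsSumma / antalForsok

print("Sum:", total)
print()  # blank line between the two results
print("Average:", average)
print("Median:", median(sums))
```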

keen gull
#

can i kiss u

misty flint
#

np dude

keen gull
#

how can i make an empty space between the sum and the average thingy

#

between the last two

misty flint
#

just do

print()
in between the two lines

keen gull
#

oh makes sense

misty flint
keen gull
#

oh i love you bro

misty flint
astral path
#

hey y'all im gonna be doing a soundcloud network analysis project to recommend new users to listeners and I have to combine it with another dataset for an assignment. Any ideas for what I could combine it with for some interesting insights?

uncut orbit
#

whats in ur first dataset

#

the columns

keen gull
#

@misty flint dont kill me but...

#

import random
antalForsok = int(input("Hur många gånger vill du kasta?"))
raknare = 0
for n in range(0, antalForsok):
    tarning1 = random.randint(1,6)
    tarning2 = random.randint(1,6)
    forsokSumma = tarning1 + tarning2
    if forsokSumma == 9:
        raknare += 1
    print(tarning1, tarning2, forsokSumma, "\t", raknare)
print("Antal gånger summan 9 slogs:", raknare)

#

this is a new code

#

and i wanna make it find the likelihood of getting the sum 15
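
Besides counting hits in a simulation, the exact likelihood can be found by enumerating all 36 outcomes of two dice. A minimal sketch; `sum_probability` is a hypothetical helper name. Two six-sided dice can total at most 12, so a target of 15 comes out as zero:

```python
from fractions import Fraction
from itertools import product


def sum_probability(target, sides=6):
    """Exact probability that two fair dice with `sides` faces sum to target."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(hits, len(outcomes))


print(sum_probability(9))
print(sum_probability(15))
```

The simulated estimate would be `raknare / antalForsok`, which should approach the exact value as the number of rolls grows.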

astral path
# uncut orbit whats in ur first dataset

I don't know yet but it should be things like genre of songs, other artists an artist is following/who follows them, songs #, song length, external social media links, etc...

uncut orbit
#

popularity?

#

of the song out of ten

astral path
#

that will be one metric yea

#

ah

#

yeah i could do that

#

the assignment needs an external dataset though, nothing i could scrape from soundcloud

uncut orbit
#

hmm

astral path
#

im thinking i could combine it with data i scrape from their twitter links in their bios but not many artists do that

uncut orbit
#

prolly not

#

how big is the data set

#

the rows

astral path
#

i have no idea yet, im planning out the architecture of the project first

uncut orbit
#

oh

#

i dont think i should be doing this much cuz of the rules

astral path
#

Hmm im not sure which part of that applies?

uncut orbit
#

rule 5 sorry

astral path
#

Ah ok this isn't a solution, this part is ungraded

#

My prof only grades on our actual analysis

uncut orbit
#

oh

#

then lets see

#

i cant think

astral path
#

Same im having a hard time w this

uncut orbit
#

maybe loudness of the song?

#

beats per measure?

#

whats your target column

keen gull
#

guys, what is it called when you say "what two numbers added together ranging from 1-12 are equal to 9"

#

how do you turn that into a python line or what is it even called mathematically

#

so for example, 1 and 8, 2 and 7, 3 and 6, 4 and 5
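
This is essentially listing the pairs of integers in a range that add up to a target, sometimes called a "two-sum" search. A minimal sketch for the 1-12 range and target 9 from the message above, counting each unordered pair of distinct numbers once:

```python
from itertools import combinations

target = 9

# combinations() yields each pair (a, b) with a < b exactly once.
pairs = [(a, b) for a, b in combinations(range(1, 13), 2) if a + b == target]

print(pairs)
```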

astral path
uncut orbit
#

your target column is what you would define as y...its what you are trying to predict

astral path
#

ah ok

#

it would be other similar users

uncut orbit
#

so on that train of thought

#

what do other users do?

astral path
#

they're all music artists with the same feature types and stuff

grave frost
astral path
#

so like given an artist as input, it would produce a list of artists who are related

#

and i'm looking for ways to implement external datasets in here

grave frost
astral path
#

do i anything about music?

grave frost
# astral path do i anything about music?

I can't identify your tone 🤷 but If I were you, I would use the time key signature, bpm, and indices of rests in songs to create features and cluster artists via them. (you could take median of that most prob, average doesn't seem appropriate)

#

or you could construct an artificial feature concatenating indices of rest and their distance along with time key in a formula and use that

astral path
#

hey sorry i'm in a zoom but i'll be back in a sec

fiery maple
#

Anyone using Kedro here?

umbral raptor
#

Hello all, nice to meet you.

fiery maple
#

Hhahahha an impostor

umbral raptor
fiery maple
#

Yes, I think if we search BackPropa* on the usernames, thousands of them would appear on results

umbral raptor
#

@fiery maple So Kedro? No I don't feel comfortable with this type of frameworks

lavish tundra
#

someone know how to set gridline color on a seaborn lineplot graphic?

exotic maple
#

you can always find the matplotlib object directly

#

i think for grid it was... plt.gcf().ygrids
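
For the record, the gridline color can be set through the Axes object with `ax.grid(...)`. A minimal sketch using a plain Matplotlib Axes; `sns.lineplot(...)` returns the same kind of Axes object, so the `ax.grid` call should carry over, though seaborn itself is not imported here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # with seaborn: ax = sns.lineplot(...)
ax.plot([0, 1, 2], [1, 3, 2])

# Turn the grid on and style it; color accepts any Matplotlib color spec.
ax.grid(True, color="red", linewidth=0.5)

grid_color = ax.yaxis.get_gridlines()[0].get_color()
print(grid_color)
plt.close(fig)
```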

lavish tundra
#

i dont use matplotlib . _.

exotic maple
#

havent used matplotlib in a while

lavish tundra
#

i was thinking if i dropped seaborn and only used matplotlib it could be better for performance, cause i only need graphics like this:

exotic maple
#

you can use both

#

but for that id probably just use mpl

#

for something prettier definitely seaborn, much easier

lavish tundra
#

idk, i'm trying to use less imports, cause i have a big list rn . _.

#

i'm afraid my bot could have problems with this

grave frost
#

or you can do import math and use it like math.sqrt()

undone heron
#

hey guys, I need some help with Tableau/Power BI stuff, is this the right place?

hollow sentinel
#

I think we only help with Python libraries for data science/ML

grave frost
#

For discussion of scientific python, matplotlib, statistics, machine learning and related topics.

iron basalt
#

I'm guessing @undone heron is trying to run a python script in Power BI.

serene scaffold
velvet thorn
#

most of the overhead of MPL is from drawing

bitter harbor
#

If I wanted to drop rows in a df where the column 'victory_status' is equal to 'outoftime' would I not just do: df.drop(index=np.where(df.data["victory_status"] == "outoftime"))?

serene scaffold
bitter harbor
iron basalt
#

You can flip it around: keep all of the rows where 'victory_status' is not 'outoftime'.

serene scaffold
#

!paste If you copy and paste enough rows for me to try it, I will try it or look for other solutions.

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

bitter harbor
serene scaffold
#

I'm not sure why it didn't occur to me sooner

bitter harbor
#

I'm ngl I have no clue what a mask is

serene scaffold
#

df = df[df['victory_status'] != 'outoftime']

#

you use the mask, df['victory_status'] != 'outoftime', to get a boolean series of what you want

#

and then you use that as a mask

#

which is what the outer df[...] does
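
A small runnable version of the masking idea, on a made-up frame (the `turns` column is hypothetical and only there to show a second condition). It also shows a `drop`-based form that gives the same result, since `drop` expects index labels rather than the tuple that `np.where` returns:

```python
import pandas as pd

df = pd.DataFrame({
    "victory_status": ["mate", "outoftime", "resign", "outoftime"],
    "turns": [40, 61, 25, 90],  # hypothetical extra column
})

# Boolean Series: True for the rows to keep.
mask = df["victory_status"] != "outoftime"
kept = df[mask]

# Conditions combine with & / |; each one needs its own parentheses.
kept_long = df[(df["victory_status"] != "outoftime") & (df["turns"] > 30)]

# Equivalent using drop: pass the index labels of the rows to remove.
dropped = df.drop(index=df.index[df["victory_status"] == "outoftime"])

print(kept)
```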

iron basalt
bitter harbor
#

oh ok ya I've used that but only in R

#

is that just a pandas thing?

serene scaffold
iron basalt
bitter harbor
#

huh I'll definitely have to remember that that's hella useful

iron basalt
#

Think of it like doing a search, where you keep narrowing it down with each filter.

bitter harbor
iron basalt
#

You're using python and you are worried about memory (use c instead)? But ok, you can just apply multiple filters at once, and yes you have all those resulting rows duplicated, but it's much less error prone since you are not modifying state (original df unchanged). If you have a memory issue, then fix it at that time. Until then it's premature optimization.

#

Generally I doubt you will have a memory issue, but if you do, consider using a DB since it will store most of the data on disk for you (well all, but it will keep some in memory for faster access).

lean ledge
#

That's actually amazing for scientific and engineering applications

iron basalt
#

I use dearpygui for all kinds of things now, like telemetry.

lean ledge
#

If I ever find some free time I might remake the features of MATLAB's Control systems toolbox or something using that

bitter harbor
#

well honestly I think using a db would solve most of my issues at this point but my prof wants us to use pandas
I get what you're saying about not needing micro-optimisations but at the same time creating multiple dfs seems unnatural to me and I like micro-optimisations 🤷‍♂️

iron basalt
#

Yeah it has all of dear imgui's stuff including the raw draw commands which you can use to make custom widgets.

iron basalt
#

8GB is a lot, unless you are dealing with like raw video.

bitter harbor
#

4 but ya I see what you're saying

iron basalt
#

Also you have virtual memory too, so it will use the disk also.

#

(All modern operating systems do, as long as you don't cause pages to fly in and out really fast you basically have unlimited memory)

warm seal
#

Quick question: I'm trying to run a correlation test between an ordinal DV and a continuous IV. Stuck between choosing Spearman's or Kendall's. Any advice?

lavish tundra
#

i was thinking about switching from seaborn to plotpy, what do u guys think? do u guys have a favorite for performance and beautiful visualization?

misty flint
#

anything but matplotlib

#

please

lavish tundra
#

why matplotlib? looks like i dont need it

#

i mean i could use plotpy only

exotic maple
misty flint
exotic maple
#

I love how I've changed from dumb social networks to github repos and from bullshit bookmarks to python or ds articles o.O

#

I think someone shared this here earlier, and it looks amazing:

misty flint
uncut orbit
#

i wish people talked here more

#

there can be so many great discussions on this topic

lean ledge
#

There's like 3-5 people who actually know ML stuff here that are regulars and they generally have better things to do than talk all day

iron basalt
#

I'm coding and reading, I have already been distracted too much here xd, but Raggy's stuff was p cool so kind of worth it.

uncut orbit
#

huh thats true

misty flint
uncut orbit
#

alr

misty flint
#

rn im looking at starting an ocr project

#

so looking into that

iron basalt
#

Right now i'm a bit more in a reading phase so I can just drop in, but when I go back to a coding / engineering phase I will pretty narrowly focused on it.

misty flint
#

im just here for the vibes

#

and to learn by osmosis

#

🧠

uncut orbit
#

im here for the data science and the ai

#

its fun

misty flint
#

ive just started grad school coming from a dif field so im mainly in the learning phase

uncut orbit
#

all of it

#

neural nets mainly

#

GANS

#

CNNs

#

all that good stuff

#

opencv is fun to work with

#

its awesome what you can do with it

misty flint
#

oh nice

#

one of my last projects was with opencv. it was cool

#

our prof wants us to train a CNN for this project

uncut orbit
#

oh

#

u using keras?

misty flint
#

yeah thats the plan. still reading more before actually training anything

#

probs will start tomorrow morning tho

#

we have a meeting with the TA in the afternoon

uncut orbit
#

cool

#

i wish i had assignments like those

#

but im not in college yert

#

*yet

misty flint
#

noice. bet you know more than me tho

#

regarding this stuff. maybe not other things lol

misty flint
uncut orbit
#

i definitely will

#

this stuff makes dopamine in my brain

misty flint
#

haha nice expression

uncut orbit
#

thx

misty flint
#

60000 by 785

#

lets see if excel lags

#

hmmmmm

#

theres something weird

misty flint
#

theres stuff up to rows 300k in excel but when i read the file with python, it says its shape is only 60k by 785

placid drum
#

hello, XML question:

when do you use
<device name="SEP12345"></device>

and when do you use
<device>SEP12345</device>

tidal bronze
#

I did a df.groupby("column").count()

#

how come a bunch of values are 0?

#

shouldn't every value be at least 1?

#

Otherwise they wouldn't exist no?
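
One possible explanation, not confirmed in the thread: `count()` reports the number of non-null values in each remaining column, not the number of rows, so a group whose other columns are all missing shows 0. A minimal sketch on made-up data comparing it with `size()`:

```python
import pandas as pd

df = pd.DataFrame({
    "column": ["a", "a", "b"],
    "value": [1.0, 2.0, None],  # the only "b" row has a missing value
})

counts = df.groupby("column").count()  # non-null values per column
sizes = df.groupby("column").size()    # rows per group

print(counts)
print(sizes)
```

If the grouping column is categorical, categories with no rows at all can also appear with a count of 0, depending on the `observed` setting.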

summer fractal
#

Hello, is anyone familiar with using Google Data Studio? I've got an issue with custom data not displaying correctly in charts?

obtuse sable
#

What's a good resource for a beginner to learn about neural networks and implement one in PyTorch? I want to implement a binary classifier and compare it to a logistic regression model in sklearn

lapis sequoia
#

whats data science?

lean ledge
#

TL;DR buzzword for statistics and machine learning with coding

grave frost
#

!resources @obtuse sable

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold
misty flint
serene scaffold
#

I have a program that calculates precision, recall, and f1 scores, and it breaks if there's a class with 0 tps. in that case should I set all three scores to zero, or would one usually use nan?
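
Either convention can be made explicit by routing every division through a helper with a configurable fallback. A minimal sketch; `prf` and `safe_div` are hypothetical names, and the `zero_division` argument is modelled loosely on the parameter of the same name in scikit-learn's metric functions:

```python
import math


def prf(tp, fp, fn, zero_division=0.0):
    """Precision, recall and F1 with an explicit value for 0/0 cases."""

    def safe_div(num, den):
        # den == 0 means the score is undefined; use the chosen fallback.
        return num / den if den else zero_division

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


print(prf(0, 2, 3))                           # zeros by default
print(prf(0, 0, 0, zero_division=math.nan))   # or NaN when asked
```

Zero keeps averages computable, while NaN makes the undefined case visible; which one fits depends on how the scores are aggregated afterwards.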

grave frost
#

Anyone know what is the technical term for the multi-branch networks (the ones where we would concatenate layers)?

lean ledge
#

Multi output is sorta common too

tidal bronze
#

so I have a subset of a df, columns (productID and orders), there are 5 unique product ids, when I try to do a histplot with hue=productID, all productIDs from the original df show up

lapis sequoia
#

Hey! So I am just starting with python

tidal bronze
#
common = agg_df.sort_values(by=["sh_18_Ln_PalletsEquivalent"], ascending=False).index.tolist()[:5]
df_most_ordered = df[df["sh_ItemId"].isin(common)]
sns.histplot(data=df_most_ordered,
             x = "sh_18_Ln_PalletsEquivalent",
             hue="sh_ItemId")
lapis sequoia
#

Question... modules are synonymous of libraries?

tidal bronze
#

a module is another python script

lapis sequoia
#

Excellent thanks!!

tidal bronze
#

my question is why are all the original productIDs showing up

#

where they aren't even part of the dataframe I am basing the plot on
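
One common cause, assuming `sh_ItemId` has a categorical dtype (the thread does not say): filtering rows keeps the full category list, and plotting libraries may build the legend from the categories rather than from the values present. A minimal sketch on made-up data showing the leftover categories and how to drop them:

```python
import pandas as pd

df = pd.DataFrame({
    "sh_ItemId": pd.Categorical(["A", "B", "C", "A", "B"]),
    "sh_18_Ln_PalletsEquivalent": [1, 2, 3, 4, 5],
})

subset = df[df["sh_ItemId"].isin(["A", "B"])].copy()
before = list(subset["sh_ItemId"].cat.categories)  # still lists "C"

# Drop categories that no longer occur in the subset.
subset["sh_ItemId"] = subset["sh_ItemId"].cat.remove_unused_categories()
after = list(subset["sh_ItemId"].cat.categories)

print(before, after)
```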

lapis sequoia
#

yo

#

i am trying out a beginner project

#

as u can see, the data is the documents

#

and i really dont understand how it predicted both "that kud" and "HOHOHO" as belonging to the same cluster in this case

#

is there anything that i should fix there? i dont think that it's actually legit

#

Hey guys I'm working with matplotlib subplot animations and I'm running into some issues, could someone please just tell me what is wrong with this code? I don't get the logic behind it.

import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

figure = plt.figure()
n = 1000

x1 = np.random.normal(-2.5, 1, 10000)
x2 = np.random.gamma(2, 1.5, 10000)
x3 = np.random.exponential(2, 10000) + 7
x4 = np.random.uniform(14, 20, 10000)
x = [x1, x2, x3, x4]


def plot_hist(curr):
    if curr == n:
        a.even_source.stop()
    for i in range(len(axs)):
        axs[i].cla()
        axs[i].hist(x[i], bins=100)


fig, ((top_left, top_right), (bottom_left, bottom_right)) = plt.subplots(2, 2, sharex=True)
axs = [top_left, top_right, bottom_left, bottom_right]

a = animation.FuncAnimation(figure, plot_hist, interval=100)
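
A few things in the snippet above look likely to cause trouble: `plt.figure()` and `plt.subplots()` create two separate figures and the animation is attached to the empty one, `even_source` is presumably meant to be `event_source`, and `curr` is never used, so every frame draws the same histogram. A minimal sketch of one way to restructure it, under those assumptions; the sample size, bin count and frame step are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for an interactive window
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

n = 1000
rng = np.random.default_rng(0)
x = [
    rng.normal(-2.5, 1, n),
    rng.gamma(2, 1.5, n),
    rng.exponential(2, n) + 7,
    rng.uniform(14, 20, n),
]

# One figure only: the same `fig` holds the axes and drives the animation.
fig, axes = plt.subplots(2, 2)
axs = list(axes.flat)


def plot_hist(curr):
    """Redraw each histogram using only the first `curr` samples."""
    for ax, data in zip(axs, x):
        ax.cla()
        ax.hist(data[:curr], bins=20)


# A finite `frames` sequence with repeat=False ends the animation on its own,
# so no manual event_source.stop() call is needed.
a = animation.FuncAnimation(fig, plot_hist, frames=range(10, n + 1, 10),
                            interval=100, repeat=False)
```

In an interactive session, `plt.show()` after the last line would display it.
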
grave frost
#

Is there any specific advantage that architecture offers?

lean ledge
#

Multi channel is multiple branches as inputs

lean ledge
#

Multiple output branches are basically multiple networks that share input parameters

grave frost
lean ledge
#

Means you can do less processing because you're likely to be learning the same lower level features in early layers anyway

#

See: Mask RCNN

#

It's basically Faster RCNN that shares its early parameters with a segmentation branch

grave frost
#

tbh that's pretty creative

#

but it doesn't do much to push SOTA, does it?

lapis sequoia
#

i am pretty unsure tbh

grave frost
#

you can use cosine similarity or simple euclidean distance to check whether the vectors are indeed correct and debug that way
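
Both checks are a few lines of plain Python. A minimal sketch with toy vectors; real document vectors would come from whatever vectorizer the project uses:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.dist(u, v)


print(cosine_similarity([1, 2], [2, 4]))   # parallel vectors
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal vectors
print(euclidean_distance([0, 0], [3, 4]))
```

Two documents that land in the same cluster would be expected to have a noticeably higher cosine similarity to each other than to documents in other clusters.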

#

most prob there is little to no correlation

#

you can visualize them too so that its easy to understand and looks pretty too

lapis sequoia
#

i dont think i can visualize the datas tho

grave frost
#

I mean the vectors

lapis sequoia
#

they are in texts

#

oh?

#

how

grave frost
#

there are plenty of tutorials online

lapis sequoia
#

cool

#

thx

grave frost
#

I think there was a guy recently who posted the same article here, but you would have to search for it

lapis sequoia
#

the prediction makes more sense now

grave frost
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

pseudo raptor
#

Does anyone know a good tutorial to start learning tensorflow in python?

grave frost
#

Linear Support Vector Machine is widely regarded as one of the best text classification algorithms.
google got this ^^^

lean ledge
grave frost
#

I had a question - can anyone suggest some method where I can incorporate document vectors with TF-Idf? it would be pretty easy with word vectors with simple scalar multiplication. but what approach could we use in documents??

austere swift
austere swift
#

or the tutorials

pseudo raptor
lapis sequoia
#

hi, i want some help. So i am unsure of the type of machine learning algorithm i should use, i hope yall know the type of learning that suits my project.

a) i want it to be able to categorize sentences, such as "chitchat" and "task" and even sub categories based on any pattern that it can find from the sentences. ( main objective )

b) i want to let it be able to be trained by this way. I first define the categories, then i put some sentences in the category. it tries to predict the category of test sentence and i can tell the correct category if it's wrong ( i think i can do the same with supervised learning for the "telling the correct category" part )
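
What is described here is supervised text classification. A minimal sketch of that workflow with scikit-learn, combining TF-IDF features with the linear SVM mentioned earlier in the channel; the sentences and the two labels are made-up examples, and a usable model would need far more data per category:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Step 1: define categories and give a few example sentences for each.
sentences = [
    "how was your weekend",
    "lol that is so funny",
    "good morning everyone",
    "what is your favorite movie",
    "remind me to buy milk tomorrow",
    "set an alarm for seven",
    "send the report to my boss",
    "book a table for two tonight",
]
labels = ["chitchat"] * 4 + ["task"] * 4

# Step 2: turn text into TF-IDF vectors and fit a linear SVM on them.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)

# Step 3: predict the category of an unseen sentence.
predicted = model.predict(["remind me to call the dentist"])[0]
print(predicted)
```

Correcting a wrong prediction amounts to adding that sentence with its right label to the training lists and fitting again.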

grave frost
#

For me, TF just adds a lot of native support which makes it easier to do stuff that uses other Google products (like GCP, Tfrecords, TPU, etc.) having native support is a big thing that prevents TF people from going to PyTorch. I wanted to do PyT with TPU and that was a mess of errors. with TF - literally 5 lines. Same with model parallelism - its just easier.
Thats why a lot of people stick with TF because it just makes life easy 🤷 Tho my aim is to be familiar with pytorch too by the time I end undergrad or smthing

austere swift
#

for me it depends on the complexity of the project

#

i like using keras with tensorflow for simple projects

#

since its a lot easier

#

but for more complex stuff i use pytorch

#

i've never really used raw tensorflow with no keras

lapis sequoia
#

@grave frost @austere swift what do u think of my project?

grave frost
#

there is no need to use raw TF when keras works

austere swift
#

yeah

grave frost
#

its only for researchers and power users I guess

austere swift
#

as for TPU afaik thats like one of the biggest advantages that TF has

#

pytorch just doesn't really like TPUs

#

but I train on local servers rather than kaggle or colab

#

so I use GPUs anyways

grave frost
#

ye, XLA f-ing sucks. that thing produces a shit-ton of errors that are basically the exact opposite of what the error says

#

never. again.

grave frost
#

rich boi

#

I would never be able to take the brave step with local hardware (but then I don't really have $$)

austere swift
#

my dad gets grants for research stuff, so thats how i get funded

#

I'm the one who mostly uses the hardware though

#

my dads a professor and physicist at a university

grave frost
#

that is pretty cool

austere swift
#

yeah

grave frost
#

so whats the config?

austere swift
#

this is the main server i use

#

4 rtx 6000s, dual xeon 6242s

#

384gb ram

#

i also have a second one which is similar except it has dual 4210s and a single rtx 6000

#

but i'm soon gonna be upgrading it to have an A40, then moving the rtx 6000 from it to the one with 4 rtx 6000s

grave frost
#

dude, please stop. my potato computer would crash seeing such expensive hardware

austere swift
#

so its gonna be one with an A40, and one with 5 rtx 6000s

grave frost
#

do you do kaggle?

austere swift
#

kaggle competitions?

#

yeah sometimes

grave frost
#

ye

#

so do you use your rig to train massive models?

austere swift
#

yeah

grave frost
#

cuz I doubt you wouldn't at least come in the top 5 then

austere swift
#

sometimes i just use it to train multiple models at once too

#

one on each gpu, or 2 each using 2 GPUS

#

or any config like that

#

mix and match :)

grave frost
#

lucky you

#

experimentation must be a breeze when you don't have to think about resources

#

its all just writing the code

austere swift
#

yeah but the code is more complicated when you have to make it take advantage of the GPUs

#

coding for multi gpu is more complicated

grave frost
#

still

austere swift
#

yeah lol its nice

grave frost
#

you can't experiment with that code unless you have multi-gpus

austere swift
#

but the power bills are 📈 📈 📈

grave frost
#

just train it at the night 🙂

austere swift
#

it draws too much power to be plugged into one circuit

#

so i have an extension cord to have one of the psus connected to a different circuit

#

cus it has 2 psus

grave frost
#

nice

austere swift
#

yeah

#

it tripped the breaker 3 times before i figured that out

exotic maple
lapis sequoia
#

it's to classify sentences into sub categories

#

but it's ok now

#

i figured that it wouldnt work

exotic maple
#

its not that it wouldnt work. Its just you need to clearly define what you want lol

#

sounds more like an NPL kind of problem though.

#

NLP*

spare vine
#

any ideas why this mnist dataset has more than the expected 70,000 rows? https://www.openml.org/d/554

misty flint
#

ahhh i messed up preprocessing my data

#

just now caught it

uncut orbit
#

nah you're not

#

there must have been more data

spare vine
#

thanks for checking it. i just realised that it is 70,000 but i was also counting the metadata lines

uncut orbit
#

oh

spare vine
#

that's why i'm dumb 😛

uncut orbit
#

oh

spare vine
#

numberofinstances is 70,000 == number of rows. number of columns is 785

lapis sequoia
#

this is why i just gonna stop trying it

#

i wanted it to work like a personal assistant

#

the problem is

#

it wouldnt understand the way to execute the commands

#

such as

#

turn on my fan

#

every command has to be coded

#

i can actually use ml for the part i mentioned

#

the problem is executing my commands

austere swift
#

oh nvm you already figured it out

grave frost
#

@lapis sequoia it would be much more helpful if you could condense your issue down to a single paragraph in one message

hollow sentinel
#

Is bruher oui oui baguette

boreal summit
#

For those who are into web scraping and selenium.

#

I'm reading this book, and I need to have Gecko to run some code.

#

I've been searching how to download Gecko driver but I couldn't find much information like the .exe file itself.

#

I already have Firefox installed on my laptop, so do I just use the Firefox installation as the driver or is there a way to download Gecko driver's.exe file?

#

Thanks.

#

Also, I'm on windows.

spare vine
#

find the win zip (either 32 or 64 based on your PC, most likely 64)

boreal summit
misty flint
#

ive told the dataframe to add 10 to every number in the 0th index

#

why does it add it to the first index

sudden ether
#

how can i make sense of Stanford's Stanza outputs?

misty flint
#

aha

#

why is this not 0??

#

this is the culprit

exotic maple
#

Impressive

That dataframe is the least explicative table i've ever seen in my life lol

misty flint
#

its the worst

#

also interesting note

#

if you try to read a csv on google colab before its fully uploaded, it still reads it

#

it just reads the rows it has currently

#

so like, half the data DoggoKek

#

idk why colab doesnt return an error

#

was supposed to be ~400k rows but only 200k were in the dataframe

keen pivot
#

Which pdf library should one use to read PDFs?

#

is there a "best one" or standard one most folks use?

exotic maple
#

PDF is a pain to work with

#

which library works best depends on what you want to do

misty flint
#

pypdf is nice

#

also you know what they should make in the future?

#

some kind of way to estimate how long a model will take to train

#

based on how big the dataset is, etc., etc.

#

im here staring at the rotating circle

#

like

keen pivot
misty flint
#

pypdf

keen pivot
#

I'm trying it right now and I'm not finding it working well

#

I'm about to check out PDFminer.six

misty flint
#

rip

#

works for me

keen pivot
#

what does your code look like?

#

maybe I'm doing something wrong

exotic maple
#

i literally converted it to a different file type because it was easier lol

#

I used this

#

but if your data is sensitive...well, no idea tbh

misty flint
#

um im training a model rn so...lol

#

also i have a meeting with the ta where im supposed to show this model

keen pivot
#

^^

#

Good luck!

misty flint
#

thanks. the model wont train in time bc i waited last minute

#

look into ocr maybe

#

tesseract was the library we used for our team project

keen pivot
#

.extractText() just doesn't seem to work

keen pivot
exotic maple
#

maybe it's one of those beautiful PDFs that store the data as an image

#

-pukes-

keen pivot
#

might be

#

but I can definitely highlight the text

#

in the pdf... so the text must be in there right?

#

i guess I can copy over another pdf and test it out

exotic maple
#

should be. Unfortunately i dont think i can help you anymore thou 😦

keen pivot
#

Looks like It works with another, simpler PDF.

#

The text seems to be broken up in some really weird points... hmm

#

There's definitely bits missing...

#

okay so pyPDF doesn't work, and when it does it misses text

#

buuuut

#

when i use pdfminer the text comes out for both example pdfs

#

it works like a charm

#

the only issue i think is just it's a lot... harder to use

molten hamlet
#

How can I use sklearn PCA with a pandas table that has string values?

#

car model ValueError: could not convert string to float: 'alfa-romero'

keen pivot
exotic maple
#

for example you can use sklearn's OneHotEncoder

exotic maple
#

PDFs are unbearable

lean ledge
#

The PCA of categorical data is weird, you might also just want to not include that in your PCA.
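
If the string column does stay in, the encoding step mentioned above could look roughly like this. It is a sketch, not the asker's actual pipeline: the rows are made up in the spirit of the car dataset from the error message, and scaling the numeric columns is an added assumption so that no single feature dominates the components.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows; "make" is the string column that PCA cannot convert to float.
cars = pd.DataFrame(
    {
        "make": ["alfa-romero", "audi", "bmw", "audi"],
        "horsepower": [111, 102, 101, 115],
        "price": [13495, 13950, 16430, 17450],
    }
)

preprocess = ColumnTransformer(
    [
        ("make", OneHotEncoder(handle_unknown="ignore"), ["make"]),
        ("numeric", StandardScaler(), ["horsepower", "price"]),
    ],
    sparse_threshold=0,  # always return a dense array, which PCA expects
)

pipeline = make_pipeline(preprocess, PCA(n_components=2))
components = pipeline.fit_transform(cars)
print(components.shape)
```

As noted above, components built from one-hot columns are hard to interpret, so dropping the categorical column before PCA remains a valid alternative.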

misty flint
keen pivot
misty flint
keen pivot
#

Its high-level function makes it real easy

misty flint
#

ill have to look into this

#

to see if we can also use this for our project

#

thanks for the link

#

hmmm

#

WARNING:tensorflow:Learning rate reduction is conditioned on metric val_acc which is not available. Available metrics are: loss,categorical_accuracy,auc,val_loss,val_categorical_accuracy,val_auc,lr

#

the model is still training so...

#

why doesnt the TF documentation have a list of the warnings

#

oh well ig the learning rate reduction just wont happen

keen pivot
#

What are you building?

misty flint
#

a basic CNN to read handwritten characters

exotic maple
#

and then later encode your categorical?

austere swift
atomic swan
#

Does anyone know how to resolve matplotlib graphing frequency on the x axis? My chart is the wrong way round

lean ledge
exotic maple
#

what is a good feature selection process for categoricals? PCA, MDS and t-SNE probably wouldnt work with those right?

atomic swan
#

seriously... nobody knows?

#

in the data science channel

lean ledge
exotic maple
#

Chi squared? I have never used χ² like that. Interesting. I'll read about it. Thanks!

lean ledge
#

It's just like usual hypothesis testing honestly

misty flint
misty flint
#

your problem could be different things

#

maybe the data isnt sorted right

#

maybe your variable names are wrong

#

etc.

misty flint
#

what do people do when their model is still training

#

just chill

#

grab lunch?

#

etc.

exotic maple
#

or troll in this discord :v

misty flint
#

i see you speak from personal experience

#

how wise

exotic maple
#

time to Khan Academy 🃏

misty flint
#

theres a good open source textbook

#

if youre a textbook guy

#

i just read chapters 5 and 6 and they were pretty decently written

#

on the shorter side for textbooks too

#

or ig you could watch the videos too

#

if youre a video guy

exotic maple
#

I'd prefer classrooms for math classes but in these beautiful times

#

videos will do

misty flint
exotic maple
#

honestly khan academy is pretty good; i just need a refresher

misty flint
#

ye

#

the book is just if you need more specifics ig

#

i make my students that i tutor watch khan academy

exotic maple
#

that's an improvement over college teachers that give you a book and say "let me know if you dont get something" hyperlemon

misty flint
#

and then they never respond to their email if you actually need them

hollow sentinel
#

I emailed my writing professor like a week ago about an assignment bc I couldn’t find it

#

She never even responded

misty flint
#

when youre waiting for a model to train for your project, so you end up working on your other project

#

moral of the story:

#

dont procrastinate kids

iron basalt
#

During training I think about the model and then realize that I have a (logic / not immediately noticeable) bug and all that time spent was for nothing.

#

Or the model is not doing well and I can't tell if it's a bug or if my experimental model is just bad.

exotic maple
#

nah bro

#

the best is when you perform a grid search with 3 parameters, 5 variations for each parameter, but you forget to set the scoring to something other than accuracy (which is what I needed)

#

and you waste 1 hour of your life watching your PC burn

misty flint
#

could be worse tho

#

you couldve wasted even more time with a larger one

exotic maple
#

my friend just sent me the greatest piece of heresy ive ever seen

import pandas as np
import numpy as pd
hollow sentinel
#

The real menaces don’t even import pandas as pd

#

they just import pandas

arctic wedgeBOT
#

@exotic maple I'm going to secretly execute pd, np = np, pd under the hood to switch it back. I will not be mocked!

twilit pilot
#

This is my pandas dataframe ```
image label
0 [146, 151, 156, 158, 160, 154, 144, 131, 127, ... butterfly
1 [156, 156, 156, 157, 158, 159, 160, 160, 160, ... butterfly
2 [146, 198, 172, 200, 168, 226, 186, 183, 192, ... butterfly
3 [66, 57, 53, 51, 42, 63, 95, 123, 139, 121, 77... butterfly
4 [48, 110, 212, 226, 232, 248, 144, 186, 190, 1... butterfly
... ... ...
26174 [157, 150, 122, 131, 149, 151, 162, 190, 132, ... squirrel
26175 [128, 119, 99, 73, 58, 132, 108, 110, 97, 88, ... squirrel
26176 [172, 162, 151, 108, 109, 115, 132, 174, 183, ... squirrel
26177 [16, 89, 112, 109, 65, 30, 46, 97, 98, 118, 8,... squirrel
26178 [97, 106, 94, 101, 55, 121, 129, 77, 35, 18, 8... squirrel

[26179 rows x 2 columns]
```
All the `images` are 1d arrays of length 2500 (all the same length, type="uint8"). I made my
```py
X = df['image']
y = df['label']
```
and I am trying to use an `sklearn.svm.SVC()` model, and this is the error I get:
```py
model = SVC()
model.fit(X, y)
```

#
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\sohan\OneDrive\Documents\ProgrammingProjects\ImageClassification\main.py", line 22, in <module>
    model.fit(X_train, y_train)
  File "C:\Users\sohan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\svm\_base.py", line 160, in fit
    X, y = self._validate_data(X, y, dtype=np.float64,
  File "C:\Users\sohan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\sohan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\arrays\numpy_.py", line 211, in __array__
    return np.asarray(self._ndarray, dtype=dtype)
  File "C:\Users\sohan\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
#

Can someone please help me with this error

serene scaffold
#

@twilit pilot do you understand what is meant by TypeError: only size-1 arrays can be converted to Python scalars?

#

Refer to this:

>>> import numpy as np
>>> arr = np.array([1])
>>> arr
array([1])
>>> int(arr)
1
>>> arr = np.array([1, 2, 3])
>>> int(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only size-1 arrays can be converted to Python scalars
>>> arr = np.array([[1]])
>>> int(arr)
1
twilit pilot
misty flint
velvet thorn
#

vs what shape it is

twilit pilot
#

@velvet thorn does it have to be a 2d array?

fair stream
#

How I can add a link to a word?

velvet thorn
twilit pilot
twilit pilot
#

ok

#

wait

#

@velvet thorn ```py
import os
import cv2
import pickle
import numpy as np
import pandas as pd
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Load the dataset

with open('data/dataframe.txt', 'rb') as infile:
    df = pickle.load(infile)

"""
image label
0 [[146, 151, 156, 158, 160, 154, 144, 131, 127,... butterfly
1 [[156, 156, 156, 157, 158, 159, 160, 160, 160,... butterfly
2 [[146, 198, 172, 200, 168, 226, 186, 183, 192,... butterfly
3 [[66, 57, 53, 51, 42, 63, 95, 123, 139, 121, 7... butterfly
4 [[48, 110, 212, 226, 232, 248, 144, 186, 190, ... butterfly
... ... ...
26174 [[157, 150, 122, 131, 149, 151, 162, 190, 132,... squirrel
26175 [[128, 119, 99, 73, 58, 132, 108, 110, 97, 88,... squirrel
26176 [[172, 162, 151, 108, 109, 115, 132, 174, 183,... squirrel
26177 [[16, 89, 112, 109, 65, 30, 46, 97, 98, 118, 8... squirrel
26178 [[97, 106, 94, 101, 55, 121, 129, 77, 35, 18, ... squirrel

[26179 rows x 2 columns]
"""

# X, y, X_train, X_test, y_train, y_test

X = df['image']
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Creating and training model

model = SVC()
model.fit(X_train, y_train) # <-- Error here
print(model.score(X_test, y_test))
```

brave owl
#

Hello Everyone, I'm new to ML and NumPy, I've a basic doubt posted at #help-cupcake , It'll be nice if you can help me out, Thank You.

iron basalt
twilit pilot
#

i mentioned it above

iron basalt
#

I don't see it in your code posted though.

twilit pilot
iron basalt
#

Which scikit version number?

twilit pilot
iron basalt
#

oh uh, I never gave a df directly to scikit-learn before, seems incorrect.

twilit pilot
#

converting it to numpy gave the same error

#

this has become very frustrating for me now 😅

iron basalt
#

Might as well upgrade to 0.24 first to make sure

twilit pilot
#

ok ill try that although there shouldnt be that big of a difference

iron basalt
#

How are you converting the columns to a numpy array

#

Your code posted is missing a bunch of things.

#

@twilit pilot

twilit pilot
#

@iron basalt I didn't show it in the code, but the pandas.Dataframe has a .to_numpy() method

iron basalt
#

Show the real code

fading burrow
#

can you show which line is causing the error?

#

the first occurence of the exception i mean.

twilit pilot
twilit pilot
# iron basalt Show the real code
```py
import os
import cv2
import pickle
import numpy as np
import pandas as pd
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Load the dataset
with open('data/dataframe.txt', 'rb') as infile:
    df = pickle.load(infile)

"""
                                                   image      label
0      [146, 151, 156, 158, 160, 154, 144, 131, 127,...  butterfly
1      [156, 156, 156, 157, 158, 159, 160, 160, 160,...  butterfly
2      [146, 198, 172, 200, 168, 226, 186, 183, 192,...  butterfly
3      [66, 57, 53, 51, 42, 63, 95, 123, 139, 121, 7...  butterfly
4      [48, 110, 212, 226, 232, 248, 144, 186, 190, ...  butterfly
...                                                  ...        ...
26174  [157, 150, 122, 131, 149, 151, 162, 190, 132,...   squirrel
26175  [128, 119, 99, 73, 58, 132, 108, 110, 97, 88,...   squirrel
26176  [172, 162, 151, 108, 109, 115, 132, 174, 183,...   squirrel
26177  [16, 89, 112, 109, 65, 30, 46, 97, 98, 118, 8...   squirrel
26178  [97, 106, 94, 101, 55, 121, 129, 77, 35, 18, ...   squirrel

[26179 rows x 2 columns]
"""

# X, y, X_train, X_test, y_train, y_test
X = df['image'].to_numpy()
y = df['label'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Creating and training model
model = SVC()
model.fit(X_train, y_train) # <-- Error here
print(model.score(X_test, y_test))
```
It's essentially the same thing
iron basalt
#

Where is the flatten?

twilit pilot
iron basalt
#

X is [[[]], [[]], ...]

#

Needs to be [[], [], [], ....]

twilit pilot
iron basalt
#

So you flattened it and loaded the flattened one

fading burrow
#

it seems like your function is getting an array when it should be getting numbers

#

if your data is (1,2,2), then the function will receive arrays instead of numbers

iron basalt
#

what does x and y print like after becoming a numpy array.

twilit pilot
#
[array([146, 151, 156, ..., 149, 144, 142], dtype=uint8)
 array([156, 156, 156, ..., 141, 142, 141], dtype=uint8)
 array([146, 198, 172, ..., 204, 192, 190], dtype=uint8) ...
 array([172, 162, 151, ..., 199, 199, 198], dtype=uint8)
 array([ 16,  89, 112, ..., 240, 240, 240], dtype=uint8)
 array([ 97, 106,  94, ..., 253, 253, 253], dtype=uint8)]
#

that is X

fading burrow
#

and y?

twilit pilot
#

wait

#
['butterfly' 'butterfly' 'butterfly' ... 'squirrel' 'squirrel' 'squirrel']
#

@iron basalt @fading burrow

iron basalt
#

dtype?

twilit pilot
#

uint8

#

X dtype = uint8

iron basalt
#

and y?

twilit pilot
#

y is just a string

#

you can use label encoder, but will still get the same error

iron basalt
#

numpy arrays have multiple ways to store strings, idr if scikit-learn accepts all of them

#

the error is probably x, should look more like this: [[...], [...], ...] but you have something strange going on with [ndarray(...), ...]

twilit pilot
#

so you want me to convert to regular list?

#

ok ill try

iron basalt
#

well one big 2d numpy array

#

not list of numpy arrays

fading burrow
#

i don't think that's the issue but you can try

twilit pilot
#

yea ill try rn

iron basalt
#

I'm just trying to match everything as much as possible.

twilit pilot
#

i can send you the whole thing if you want

#

so you can test it on your own computer

#

@iron basalt

iron basalt
#

sure

twilit pilot
#

ok gimme a little while

iron basalt
#

I don't need the whole thing, just some of the data

#

well, you have it pickled.

twilit pilot
#

its late for me rn, so i will head for bed. you can try running the code on your computer or editing it to make it work. Good Night!

iron basalt
#

I'm pretty sure the error is that X thing

misty flint
iron basalt
#

X is a numpy array with dtype "object"

#

because it contains multiple numpy arrays

twilit pilot
misty flint
#

all the examples have a different kind of input for X

#

at least for sklearn svm svc

iron basalt
#

The lesson here is to not store images as arrays in pandas, usually people have the dataset in some other format

twilit pilot
#

what format

misty flint
#

i will note this so i do not make the same mistake

iron basalt
#

like let's say you want to load mnist

#

you just use an mnist loader

#

If you have images and are making your own dataset

misty flint
#

and yes i remember the image dataset i was working with was NOT stored with np arrays

iron basalt
#

instead of storing the image data in the table, store paths to the image files, and use those to load all of them

#

combine into one thing

twilit pilot
#

but when i put into model, i need numpy array

iron basalt
#

Or do what many datasets like mnist do and store multiple images in one file

#

makes it easier

#

yea you do, and you still can

#

you just need some extra work now to make this X something like array([[...], [...], ...], dtype=np.uint8)

twilit pilot
#

so an np array of regular arrays?

iron basalt
#

yup got it running

#
import pickle
import numpy as np
import pandas as pd
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Load the dataset
with open('help_dataframe.txt', 'rb') as infile:
    df = pickle.load(infile)

"""
                                                   image      label
0      [146, 151, 156, 158, 160, 154, 144, 131, 127,...  butterfly
1      [156, 156, 156, 157, 158, 159, 160, 160, 160,...  butterfly
2      [146, 198, 172, 200, 168, 226, 186, 183, 192,...  butterfly
3      [66, 57, 53, 51, 42, 63, 95, 123, 139, 121, 7...  butterfly
4      [48, 110, 212, 226, 232, 248, 144, 186, 190, ...  butterfly
...                                                  ...        ...
26174  [157, 150, 122, 131, 149, 151, 162, 190, 132,...   squirrel
26175  [128, 119, 99, 73, 58, 132, 108, 110, 97, 88,...   squirrel
26176  [172, 162, 151, 108, 109, 115, 132, 174, 183,...   squirrel
26177  [16, 89, 112, 109, 65, 30, 46, 97, 98, 118, 8...   squirrel
26178  [97, 106, 94, 101, 55, 121, 129, 77, 35, 18, ...   squirrel

[26179 rows x 2 columns]
"""

# X, y, X_train, X_test, y_train, y_test
X = df['image'].to_numpy()
X = np.stack(X) # <- this stacks all the arrays in the array creating a 2d array
y = df['label'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Creating and training model
model = SVC()
model.fit(X_train, y_train) # <-- no longer errors now that X is one stacked 2d array
print(model.score(X_test, y_test))
#

Now if you print X you see something like:

#
[[123 125 123 ... 137 108 132]
 [ 96 104 101 ... 125 157  83]
 [147 143 147 ... 133 139 150]
 ...
 [231 232 227 ... 141 151 177]
 [225 222 217 ... 199 201 199]
 [115 116 116 ...  71  68 110]]
twilit pilot
#

so a 2d array what shape?

iron basalt
#

let me give you an example

twilit pilot
#

ok

iron basalt
#
>>> import numpy as np
>>> a = np.array([np.arange(3), np.arange(3), np.arange(3)])
>>> a
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
>>> a = np.empty(3, object)
>>> a[:] = [np.arange(3), np.arange(3), np.arange(3)]
>>> a
array([array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2])], dtype=object)
>>> np.stack(a)
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
>>> 
#

got it now?

twilit pilot
#

oooooohhhhhhh

#

i see

#

gosh man thanks!!

iron basalt
#

pandas to_numpy keeps each cell's entry as its own numpy array, so the result is a 1d array with dtype object

twilit pilot
#

i see

#

i didn't know this

iron basalt
#

Always check the types

twilit pilot
#

thanks to you i also learned something new thanks man!

iron basalt
#

Type errors in python can trickle down to later parts of the code

#

It's one of the big issues with dynamically typed languages

twilit pilot
#

i see

iron basalt
#

They are convenient, but can cause more errors.

twilit pilot
#

yes true

misty flint
iron basalt
#

Or more specifically, errors in other parts of the code, even though the error is elsewhere (error propagation)

#

So you as a python programmer need to error backprop 😉

misty flint
#

then you have to search like a pirate

twilit pilot
#

@iron basalt Thanks for taking your time to fix my error man I really appreciate it! Have a great rest of your day!

iron basalt
#

This is also a software engineering issue on scikit-learn's end, it should have assertions for the dtype of the inputs and shapes. As it can only really work with a certain subset of all dtypes possible (and shapes).

#

The assertion would have triggered and made the issue immediately obvious.

#

Assertions == good, use them everywhere to check things (check pre-conditions).

#

Yea I don't see any of the code dealing with the input being "object", only when it's specifically a string object. Just "object" will pass all checks and so it will error later on in numpy (looking inside the fit function on github).

#

The better approach would be to flip it around. Only allow a specific set of types, instead of having a bunch of checks for different ones and do conversions and stuff just for those (less error prone).

#

(Overly generic code, while not actually being fully generic)
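
The kind of pre-condition check described above can also be done on the caller's side before handing data to a model. A minimal sketch with numpy only; `check_features` is a hypothetical helper, not part of scikit-learn, and the allowed set (2-D, numeric dtype) is an assumption chosen to match the SVC case from this thread:

```python
import numpy as np


def check_features(X):
    """Hypothetical pre-condition check to run before model.fit(X, y)."""
    assert isinstance(X, np.ndarray), f"expected ndarray, got {type(X).__name__}"
    assert X.ndim == 2, f"expected 2-D (n_samples, n_features), got ndim={X.ndim}"
    assert np.issubdtype(X.dtype, np.number), f"expected numeric dtype, got {X.dtype}"
    return X


# Rebuild the problem case: a 1-D object array whose cells are arrays.
rows = [np.arange(3, dtype=np.uint8) for _ in range(4)]
nested = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    nested[i] = row

try:
    check_features(nested)
    failed = False
except AssertionError as exc:
    failed = True
    print("rejected:", exc)

# After stacking, the same data passes the check.
stacked = check_features(np.stack(nested))
print(stacked.shape, stacked.dtype)
```

With a check like this the failure points at the shape of `X` directly, instead of surfacing later as a conversion error deep inside numpy.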

misty flint
harsh trellis
#

guys

#

i have a question: why do we use scaling on a dataset? if my data is not skewed but has outliers, it's not like scaling would get rid of them, right?

#

there are scalers like standard scalers and min-max scalers

#

can anyone help me?

ripe forge
#

Scaling isn't used to deal with outliers per se, or to deal with skewness. So your question needs to simply boil down to, what does scaling do.

#

And the answer there is, it depends on your model architecture. So tree based models don't necessarily need scaling since they create decision points from the data itself

#

But for deep learning and linear regression scaling offers unique benefits. For deep learning it helps with model fitting because the activation functions operate best within a certain range, and the weights are also initialized with a certain expectation for input scales*

#

For linear regression, scaling allows you to meaningfully compare coefficients across two features, which you can't do without scaling.

#

There might be other reasons too, but those are the ones I can think of off the top of my head
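
To make the point about outliers concrete: both common scalers are linear maps applied per column, so they change the units but not the shape of the distribution, and an outlier stays an outlier. A small numpy sketch with made-up numbers (the two columns stand for hypothetical features such as age and income):

```python
import numpy as np

# Two made-up features on very different scales; the last income is an outlier.
X = np.array(
    [
        [25.0, 30_000.0],
        [35.0, 52_000.0],
        [45.0, 61_000.0],
        [55.0, 120_000.0],
    ]
)

# Standardization: zero mean and unit variance per column.
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: each column mapped onto [0, 1].
minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(minmax.min(axis=0), minmax.max(axis=0))
```

After either transform the two columns live on comparable scales, which is what makes coefficients comparable and gradient steps better balanced; the largest income is still the largest value in its column.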

tidal bronze
#

If I were to use a box-cox transformation, is it possible to reconvert the results back to the original units? If so how could I do that?

#

@ripe forge somewhat related question

#

would this be it?

ripe forge
#

Yes.
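
For reference, the Box-Cox transform has a closed-form inverse, so results can be mapped back to the original units as long as the λ used for the forward transform is kept. A minimal numpy sketch, assuming strictly positive data and a known λ. The function names mirror `scipy.special.boxcox` and `scipy.special.inv_boxcox`, which do the same job; the sample values are made up:

```python
import numpy as np


def boxcox(x, lam):
    """Box-Cox transform for strictly positive x and a known lambda."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse transform: map Box-Cox values back to the original units."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


original = np.array([1.0, 2.5, 10.0, 40.0])
for lam in (0, 0.5, -0.3):
    restored = inv_boxcox(boxcox(original, lam), lam)
    print(lam, np.allclose(restored, original))
```

If λ was estimated from the data (for example by `scipy.stats.boxcox`, which returns it alongside the transformed values), that estimate has to be stored, since the inverse is only defined for the same λ.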

tidal bronze
#

awesome sorry for bothering you for nothing 😘

harsh trellis
#

@tidal bronze for that you need to keep a copy of the dataset so you don't lose it accidentally, or else you can restart the kernel and re-run the cell

#

@ripe forge i see, and does it work on AdaBoost and XGBoost, or other ensemble models?

tidal bronze
#

thanks for the tip

scenic ravine
#
df.loc[df_pl['placement_ts'] >= pd.to_datetime('2018')]

error Invalid comparison between dtype=datetime64[ns, pytz.FixedOffset(330)] and Timestamp

help please!

tidal bronze
#

it seems the two sides of the comparison don't match: the column is timezone-aware (fixed +05:30 offset) and the timestamp you compare it with has no timezone
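
One way around that error is to build the cutoff in the same timezone as the column. A sketch with made-up data; only the column name `placement_ts` comes from the snippet above, and the cutoff is written out as `2018-01-01`, which is what `pd.to_datetime('2018')` parses to:

```python
import pandas as pd

# Made-up data with a fixed +05:30 offset, like the column in the error.
df = pd.DataFrame(
    {
        "placement_ts": pd.to_datetime(
            ["2017-12-31 23:00:00+05:30", "2018-06-01 10:00:00+05:30"]
        )
    }
)

# Build the cutoff in the same timezone as the column instead of leaving it naive.
cutoff = pd.Timestamp("2018-01-01", tz=df["placement_ts"].dt.tz)

recent = df.loc[df["placement_ts"] >= cutoff]
print(len(recent))
```

The original line also indexes `df` with a mask built from `df_pl`; if those are meant to be the same frame, using one name for both avoids index-alignment surprises.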

lean ledge
#

Scaling makes it so that the size of gradient updates is more fair to all dimensions

#

Otherwise your large steps combined with some gradient noise would make optimisation incredibly hard for many smaller scaled dimensions, or smaller learning rates wouldn't make meaningful progress on other dimensions

tidal bronze
#

using seaborn how could I make the last bars darker?

hoary wigeon
#

Hello

#

I want to retrieve tables from html file.

My code : https://paste.pythondiscord.com/moyiyowatu.py
Running this python script will copy the html code from the url mentioned in the code in current directory with name = f'{today_date}.html'
After saving the code, I want the table of that copied html file.

tidal bough
#

Sounds like you want to change the aspect ratio.

misty flint
lavish tundra
mighty cobalt
#

Hello everyone, need a little here

#

am trying to manipulate discord profile pics ran into a problem

#

how do i read gif's from Url's using cv and urllib

tidal bough
#

.set_aspect(1/2) for 1:2 height:width

lavish tundra
#

how i can know the right proportion?

tidal bough
#

WDYM right?

misty flint
mighty cobalt
#

how do i combine all frames and save them as Gif?

tidal bough
#

What library are you using? PIL has a guide on that, IIRC.

mighty cobalt
#

if asking me cv2

bronze jacinth
#

hello im new to machine learning and i just had a few doubts about a project that im doing

#

not really understanding the final output of the project (involves svm)

#

help?

lavish tundra
#

looks like it draws the plot first, and only after centering the plot does it put the legend on top of it. i'll see if there's a way to put the legend on the side, in case that fixes it

tidal bough
hoary wigeon
#

i tried that 3 time in help

#

and twice here

lapis sequoia
#

i need help

#

help me

#

@hoary wigeon hi

#

help me

hoary wigeon
#

How may i help you ?

viscid quest
#

@hoary wigeon can you see the error in this code?

#

I mean there is an error in the last command

raven halo
#

hello

viscid quest
#

Hye there.

misty flint
raven halo
misty flint
#

you should copy and paste instead of taking a picture like that

lapis sequoia
#

import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia

listener = sr.Recognizer()
engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

def talk(text):
    engine.say(text)
    engine.runAndWait()

def take_command():
    try:
        with sr.Microphone() as source:
            print('listening...')
            voice = listener.listen(source)
            command = listener.recognize_google(voice)
            command = command.lower()
            if 'alexa' in command:
                command = command.replace('alexa', '')
                print(command)
    except:
        pass
        return command

def run_alexa():
    command = take_command()
    print(command)
    if 'play' in command:
        song = command.replace('play', '')
        talk('playing ' + song)
        pywhatkit.playonyt(song)
    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')
        talk('Current time is ' + time)
    elif 'who the heck is' in command:
        person = command.replace('who the heck is', '')
        info = wikipedia.summary(person, 1)
        print(info)
        talk(info)
    elif 'date me' in command:
        talk('fucker')
    elif 'are you single' in command:
        talk('I am in a relationship with wifi')
    elif 'joke' in command:
        talk(pyjokes.get_joke())
    else:
        talk('Please say the command again.')

while True:
    run_alexa()

raven halo
#

:O

raven halo
#
@pájthon
lapis sequoia
#

D:\Users\Nadia\pyton\python.exe D:/Users/Nadia/PycharmProjects/alexa.py/mine_alexa.py
listening...
Traceback (most recent call last):
File "D:\Users\Nadia\PycharmProjects\alexa.py\mine_alexa.py", line 59, in <module>
run_alexa()
File "D:\Users\Nadia\PycharmProjects\alexa.py\mine_alexa.py", line 36, in run_alexa
if 'play' in command:
TypeError: argument of type 'NoneType' is not iterable
None

Process finished with exit code 1

#

errors
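
The `None` in that traceback means `take_command()` finished without returning a string, which is what happens when the only `return` sits in a branch that did not run. A hedged sketch of a version that always returns a string; `recognize` is a hypothetical stand-in for the microphone and speech-to-text step so the example runs without audio hardware:

```python
def take_command(recognize):
    """Return the spoken command in lower case, or "" if nothing was understood.

    `recognize` is a stand-in for the microphone + speech-to-text step; it is
    a hypothetical callable that returns a string or raises on failure.
    """
    try:
        command = recognize().lower()
    except Exception as exc:  # narrow this to the recognizer's own errors
        print(f"could not understand audio: {exc}")
        return ""
    # Strip the wake word if present; always return a string.
    return command.replace("alexa", "").strip()


def failing_recognizer():
    raise RuntimeError("no speech detected")


print(repr(take_command(lambda: "Alexa play some jazz")))
print(repr(take_command(failing_recognizer)))
```

With an empty string as the fallback, checks like `'play' in command` simply evaluate to false and the final `else` branch asks for the command again. Separately, the pasted script calls `pyjokes.get_joke()` without importing `pyjokes`.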

serene scaffold
#

@raven halo what are you talking about? please keep this channel on-topic

serene scaffold
raven halo
#

sorry

tidal bough
serene scaffold
# lapis sequoia yes

have you made a web app of any kind before? if you want it to run on Alexa devices, the actual AI component is abstracted away by Amazon.

bronze jacinth
spice shuttle
#

Hello. I searched and think I've selected the right topic for this question. I'm new to Python and wrote a while loop that ultimately produces a number stored in a variable called 'cycles' after each round of cycles is completed. How can I store these 'cycles' values and then print them after all rounds are completed? Here's what I have so far, but I keep getting an array with the same values for each cycle, ex. [7, 7, 7]
results = []
for i in range(totalGames):
    results.append(cycles)

tidal bough
#

What you probably want instead is to store a copy of the current cycles:
results.append(cycles.copy())

spice shuttle
#

Add that copy outside of the while loop?

tidal bough
#

Not sure what you mean.

spice shuttle
#

I thought it'd make more sense to share it.

tidal bough
#

Why, every game, throw away all results and replace them with tons of copies of the latest one?

#

For that matter, you seem to have been planning to add them to cycleslist, not result.

#

Just, each iteration of the while-loop, do cycleslist.append(cycles) at the end.
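
A sketch of that suggestion, since the shared code is not visible here. `play_game` and its stop condition are made up purely so the example runs; the part that matters is that the append happens once per game, right after that game's count is known:

```python
import random


def play_game(rng):
    """Stand-in for one game: count loop cycles until a made-up stop condition."""
    cycles = 0
    while True:
        cycles += 1
        if rng.random() < 0.25:  # hypothetical end-of-game condition
            return cycles


rng = random.Random(0)
total_games = 3
cycles_list = []
for _ in range(total_games):
    cycles = play_game(rng)
    cycles_list.append(cycles)  # record this game's count before the next game starts

print(cycles_list)
```

Appending in a separate loop after all games have finished only ever sees the final value of `cycles`, which is how a list like `[7, 7, 7]` comes about.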

spice shuttle
#

The assignment is to create a list of how many cycles for each game.

#

Thanks!!