#data-science-and-ml


undone flare
#

because the default axis is set to 0, that means it is row

silk axle
#

Ah yeah, thanks

#

Rather than dropping 40 and keeping 20, is there a way to specify which 20 to keep? @undone flare

#

So basically remove everything except what's passed

undone flare
#

you could just create a new dataframe with those 20 columns

undone flare
silk axle
#

Ended up doing a different approach and instead only reading the columns from the csv that I want (using usecols kwarg)

#

Seemed bad to read the entire database when I only need a fraction of it

undone flare
#

that works too
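
A minimal sketch of the two options discussed above (build a new frame from the wanted columns, or only read those columns via `usecols`). The CSV text and the column names `a` to `d` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column names are placeholders.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"
wanted = ["a", "c"]

# Option 1: read everything, then keep only the wanted columns.
full = pd.read_csv(io.StringIO(csv_text))
subset = full[wanted]

# Option 2: only parse the wanted columns in the first place.
lean = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

print(subset.columns.tolist())
print(lean.columns.tolist())
```

Both give the same frame here; the second avoids parsing the columns that would be thrown away anyway.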

undone flare
#

skewness before transformation

```
ph : 0.04891026669821542
Hardness : -0.08517383101708786
Solids : 0.595449442721807
Chloramines : 0.01296659647911324
Sulfate : -0.04652296251790013
Conductivity : 0.26666972862929905
Organic_carbon : -0.020002726567027108
Trihalomethanes : -0.051383722200829214
Turbidity : -0.03302682552748457
```
after log transformation

```
ph : -2.2032213464172155
Hardness : -0.8250213149082217
Solids : -1.230858118768609
Chloramines : -1.069749910885117
Sulfate : -0.692747912780153
Conductivity : -0.20033687775898243
Organic_carbon : -0.9940495304526159
Trihalomethanes : -1.2119564041594677
Turbidity : -0.702269975309455
```

#

the goal is to make something look like a normal distribution right?
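
A rough sketch of why the numbers above got worse: a log transform tends to help columns that are strongly right-skewed, and tends to push columns that are already close to symmetric toward negative skew. The data below is synthetic (seeded random draws), not the water dataset from the chat.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic columns: one strongly right-skewed, one roughly symmetric.
df = pd.DataFrame({
    "right_skewed": rng.lognormal(mean=0.0, sigma=1.0, size=5000),
    "symmetric": rng.normal(loc=10.0, scale=1.0, size=5000),
})

before = df.skew()
after = np.log(df).skew()

print(before)
print(after)
```

Under these assumptions it can make sense to transform only the columns whose skew is clearly away from zero, rather than every column.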

frosty ore
#

Anyway to install older versions of tensorflow like 2.3.1?

#

Without having to compile from source.

unborn glacier
#

pip install tensorflow==2.3.1

#

That's also the way you're supposed to format requirements.txt files, with the exact version listed

frosty ore
#

Of course

#

Go ahead and try pip install tensorflow==2.3.1 Let me know how it works.

hoary wigeon
#

Hello everyone

#

i need help with deploying my ML model

undone flare
hoary wigeon
#

i have successfully deployed my application heroku

#

for tweet sentiment analysis

#

It collects a specified number of tweets, analyzes them, and generates reports

#

when i tried using 900 tweets it exceeded the memory limit (over 640 MiB)

#

when i tried with 600 tweets i faced server timeout, delay in response

#

how can i avoid it ?

undone flare
#

different dynos offer different max ram

#

are you using a free one?

hoary wigeon
#

yes

#

check this

#

for now im using 300 tweets

hoary wigeon
#

if not ram

#

so that app can work with 600 tweets

undone flare
#

I don't know, haven't used heroku much

hoary wigeon
#

im asking about heroku now

#

vectorizer does take time ?

undone flare
#

Count Vectorizer or Tfidf?

hoary wigeon
undone flare
hoary wigeon
#

i didn't set stop words in tfidf

undone flare
#

ah alright

#

I don't think Tfidf is slow, I think nltk is slow

hoary wigeon
#

actually twitter api is slow in sending response

#

then selecting part from json and creating a dict

#

converting dict to df

#

and vectorizing it, applying model

#

is taking time

undone flare
#

can you not load json to pandas df?

hoary wigeon
#

contains too much unwanted data

undone flare
#

hmm

hoary wigeon
#

have a look

#

that's just one tweet result

arctic wedgeBOT
#

Hey @hoary wigeon!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

undone flare
#

yikes

hoary wigeon
#

yep

#

so we cant do anything ?

undone flare
#

not to my knowledge, maybe someone has done this same thing

hoary wigeon
#

its just one tweet and like this im using 300 tweets

#

and creating a dataframe

undone flare
#

that's gonna take some time ye

lapis sequoia
#

How would you put the legend outside to the right?

plt.legend(loc='center right')

has no effect.

hardy hornet
#

I've run my tsv file by Pandas on Jupyter

#

but i dont know why it appeared like this

#

anyone know to fix this?

lapis sequoia
hardy hornet
#

but it is true file in my computer

lapis sequoia
#

put the absolute path

hardy hornet
#

yepp, i have done it

fading wigeon
#

What sort of analysis would you use to figure out how close two groups of numbers are to one another?

hardy hornet
fading wigeon
#

Basically there is a "real" number and two algorithms that "guess" what that number is

#

and I have a ton of data

#

Trying to determine which algo is better

#

Oh, pandas has a .corr() method, I'll just use that, haha
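
A small sketch of one caveat with using `.corr()` for this: correlation measures how well the guesses move together with the real number, not how close they are. The numbers are invented, and mean absolute error is added here as a comparison metric that was not mentioned in the chat.

```python
import pandas as pd

df = pd.DataFrame({
    "real": [10.0, 20.0, 30.0, 40.0],
    "algo_a": [11.0, 19.0, 31.0, 39.0],  # close to the real values
    "algo_b": [20.0, 40.0, 60.0, 80.0],  # always double the real value
})

# Correlation of each column with the real values.
corr = df.corr()["real"]

# Mean absolute error of each algorithm against the real values.
mae = df[["algo_a", "algo_b"]].sub(df["real"], axis=0).abs().mean()

print(corr)
print(mae)
```

Here `algo_b` correlates at least as well as `algo_a` while being far further off, so an error metric is probably the safer way to decide which algorithm is better.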

lapis sequoia
#

Anyone know how I can move my matplotlib legend outside to the right of my graph?

lapis sequoia
silk axle
#

You can't move it outside

#

Or at least not with the default options

#

There might be a different method

lapis sequoia
#

You can. I think it's done with a subplot.

#

bbox_to_anchor
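
A minimal sketch of the `bbox_to_anchor` approach, assuming a single axes and made-up line data. `loc` says which point of the legend box is pinned, and `bbox_to_anchor` says where (in axes coordinates) that point goes.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="y = x^2")
ax.plot([0, 1, 2], [0, 1, 2], label="y = x")

# Pin the legend's center-left point just past the right edge of the axes.
legend = ax.legend(loc="center left", bbox_to_anchor=(1.02, 0.5))

# bbox_inches="tight" keeps the outside legend from being cropped on save.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
```

With only `loc='center right'` the legend stays inside the axes, which is likely why it seemed to have no effect.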

somber prism
#

anyone know a good metrics for multiclass clf ?

gentle epoch
#

I need an opinion about pandas

#

can I ask it here?

serene scaffold
gentle epoch
#

I'm still rather new to programming in general and this is just my second program. I'm using pandas to parse a csv file and pick specific cells with iloc

#

I haven't read the whole documentation yet, but rather posts on blogs and Stack Overflow

#

my question is, is defining the type of data contained in each column with dtype={} really necessary?

#

@serene scaffold

serene scaffold
#

can you show what you've written so far?

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

gentle epoch
#

here's a sample

#
```py
from os import path
import numpy as np
import random
import pandas as pd
from datetime import datetime as dt
from datetime import timedelta as td



def rollDice(numDice):
    results = []
    for x in range(numDice):
        results.append(random.choice(range(1,7)))

    return sum(results)

print(rollDice(3))


def HabMod_Table(path, rowN, colN):
    a = pd.read_csv(path)

    return a.iloc[rowN,colN]


table_path = R"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\12 - Habitability Modifiers Table.csv"
path = table_path

ovrl_wrldType = HabMod_Table(path,1,1)

print(type(ovrl_wrldType))
```
#

and here's the table:

```
"No atmosphere or Trace atmosphere"                                               ,0
"Non-breathable atmosphere, Very Thin or above, Suffocating, Toxic, and Corrosive",-2
"Non-breathable atmosphere, Very Thin or above, Suffocating and Toxic only"       ,-1
"Non-breathable atmosphere, Very Thin or above, Suffocating only"                 ,0
"Breathable atmosphere (Very Thin)"                                               ,1
"Breathable atmosphere (Thin)"                                                    ,2
"Breathable atmosphere (Standard or Dense)"                                       ,3
"Breathable atmosphere (Very Dense or Superdense)"                                ,1
"Breathable atmosphere is not Marginal"                                           ,1
"No liquid-water oceans, or Hydrographic Coverage 0%"                             ,0
"Liquid-water oceans, Hydrographic Coverage 1% to 59%"                            ,1
"Liquid-water oceans, Hydrographic Coverage 60% to 90%"                           ,2
"Liquid-water oceans, Hydrographic Coverage 91% to 99%"                           ,1
"Liquid-water oceans, Hydrographic Coverage 100%"                                 ,0
"Breathable atmosphere, climate type is Frozen or Very Cold"                      ,0
"Breathable atmosphere, climate type is Cold"                                     ,1
"Breathable atmosphere, climate type is Chilly, Cool, Normal, Warm, or Tropical"  ,2
"Breathable atmosphere, climate type is Hot"                                      ,1
"Breathable atmosphere, climate type is Very Hot or Infernal"                     ,0
```
serene scaffold
#

doesn't seem like pandas is necessary for this.

gentle epoch
#

agreed

#

but it was the first method I found when googling "how to pick a cell in a csv with python" so here we are

serene scaffold
#

well, looks like you figured it out lemon_hyperpleased

gentle epoch
#

yeah

serene scaffold
#

Also, your code is not following pep8

gentle epoch
#

what's pep8?

serene scaffold
#

the style guide. you should never have variableNamesLikeThis

#
# not pep8
def rollDice(numDice):
    results = []
    for x in range(numDice):
        results.append(random.choice(range(1,7)))
    
    return sum(results)

# pep8
def roll_dice(num_dice):
    results = []
    for _ in range(num_dice):
        results.append(random.choice(range(1, 7)))

    return sum(results)
#
# not pep8
def HabMod_Table(path, rowN, colN):
    a = pd.read_csv(path)
    
    return a.iloc[rowN,colN]


# pep8
def hab_mod_table(path, row_n, col_n):
    a = pd.read_csv(path)
    return a.iloc[row_n, col_n]
gentle epoch
#

I see

#

generally, when should I use dtype={}?

#

because pandas is too awesome to run tables any other way from now on

#

@serene scaffold

serene scaffold
gentle epoch
serene scaffold
gentle epoch
#

I see

#

also, it's reading numbers as str for some reason lol

#

so I guess I should just always use dtype

#

how annoying

serene scaffold
#

I rarely do. I'd have to look at the source file and the code to understand why your numbers are being inferred as strings.
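
One way to look into this without reaching for `dtype` right away is to check what pandas inferred and which cells refuse to convert. The sketch below uses a made-up table in which a single cell (`unknown`) is not a number; that one cell is enough to stop the whole column from being parsed as float.

```python
import io

import pandas as pd

# Hypothetical table: "low" has one cell that is not a number.
csv_text = (
    "low,high,label\n"
    "0.01,0.5,Very Thin\n"
    "unknown,0.8,Thin\n"
    "0.81,1.2,Standard\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # "high" is float64, "low" is not (it holds strings)

# Find the cells that refuse to convert, then coerce them to NaN.
as_numbers = pd.to_numeric(df["low"], errors="coerce")
offenders = df.loc[as_numbers.isna(), "low"].tolist()
print(offenders)
df["low"] = as_numbers
```

Once the offending cells are known, fixing the file is usually simpler than forcing a `dtype` on every read.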

chilly geyser
#

I'm able to get proper ints and floats from really simple examples.

A bit more 'adversarial' example might be csvs that store floats with surrounding ""

gentle epoch
#
```
9E10000000               ,0.009                     ,Trace
0.01                     ,0.5                       ,Very Thin
0.51                     ,0.8                       ,Thin
0.81                     ,1.2                       ,Standard
1.21                     ,1.5                       ,Dense
1.51                     ,10                        ,Very Dense
11                       ,9E10000000                ,Superdense
```
#

all values in this table are being read as strings

chilly geyser
#

Those don't seem really...proper 👀

#

I think it'd be hard for a computer to know they are floats

gentle epoch
#

I mean

#

how is 1.51 not proper? legit question

sonic scaffold
#

What does it mean by ambiguous, i got ValueError saying the truth value of a Series is ambiguous

chilly geyser
#

I don't think space-padding in csvs is common, you generally get them really dense

hdr,hdr2
0,1.523523
234,4.5234
23,666.3453
chilly geyser
#

So the elements of pandas.Series can be truthy or falsey

gentle epoch
chilly geyser
#

And you should try to ask for the truthiness of the elements of the Series instead

sonic scaffold
chilly geyser
gentle epoch
# serene scaffold I rarely do. I'd have to look at the source file and the code to understand why ...

code

```py
import pandas as pd
from datetime import datetime as dt
from datetime import timedelta as td

from pandas.io.parsers import read_csv



def atmo_pressure_table(path, rowN, colN):
    a = pd.read_csv(path)

    return a.iloc[rowN,colN]


table_path = R"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\3 - Atmospheric Pressure Categories Table.csv"

ovrl_wrldType = atmo_pressure_table(table_path,5,0)

print(type(ovrl_wrldType))
```
sonic scaffold
chilly geyser
#

You generally want an all or some or on the entire Series

sonic scaffold
#

Yeah i want to get out the values that are False

chilly geyser
#

Or some and of some kind, of the elements of the series
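
A minimal sketch of what the error is about, using a made-up boolean Series. Python cannot turn a whole Series into a single `True`/`False`, so you either reduce it with `.any()` / `.all()` or use it as a mask to pick elements.

```python
import pandas as pd

s = pd.Series([True, False, True, False])

# `if s:` would raise "The truth value of a Series is ambiguous".
try:
    bool(s)
except ValueError as err:
    print(err)

# Reduce to a single bool instead...
print(s.any(), s.all())

# ...or use the Series as a mask; `~` inverts it to pick the False entries.
false_positions = s[~s].index.tolist()
print(false_positions)
```

The same error often shows up when combining conditions with `and` / `or` instead of `&` / `|`.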

gentle epoch
#

@chilly geyser padding removed

#
```
9E10000000,0.009,Trace
0.01,0.5,Very Thin
0.51,0.8,Thin
0.81,1.2,Standard
1.21,1.5,Dense
1.51,10,Very Dense
11,9E10000000,Superdense
```
chilly geyser
gentle epoch
#

still being read as string

chilly geyser
#

Hmm that's a good question

#

I didn't know it doesn't understand float notation xEy

#

I think that might be a cause

sonic scaffold
#

This should target the elements idk why it's giving me that error

gentle epoch
#

the result

```
PS H:\01 Libraries\Documents\Tosh0kan Studios\Coding> & C:/Users/Tosh0kan/AppData/Local/Programs/Python/Python39/python.exe "h:/01 Libraries/Documents/Tosh0kan Studios/Coding/tester.py"
1.51
<class 'str'>
```
sonic scaffold
chilly geyser
gentle epoch
#

wait

#

there are limits to scientific notation?

chilly geyser
#

Lmao yes

gentle epoch
#

over which, it just becomes a string?

chilly geyser
#

It's stored in a limited memory space

gentle epoch
#

whaaaaaa?! lmao

#

oh

#

I see

chilly geyser
#

There's an IEEE standard for this (IEEE 754), and the limit for a double (float64) is about ±1.8E308

chilly geyser
#

I'm not too sure about InF, Inf, inF, etc within the csv, can test that

gentle epoch
#

InF, Inf, inF,
what are these?

#

ah

#

infinity

chilly geyser
#

oh, they all seem to work, so I think it's auto .lower()-ing them or something

#

Yeah there are a lot of 'obvious things' done in pd I think

#

It's almost too convenient

gentle epoch
#

because like

#

these tables are for a tabletop RPG

#

and I'm making a program to automate rolling them

chilly geyser
#

is there a difference between inf and those numbers you were using

gentle epoch
#

I didn't know how to refer to inf in a csv

#

so I used a ludicrous number lol

chilly geyser
#

ah, so this should work

gentle epoch
#

I put 9E10000000 in those two places because in the table is "0.009 or less"

chilly geyser
#

I think there are dedicated functions for inf, like np.isinf()

gentle epoch
#

so I just needed a really ridiculous number that would never come up

chilly geyser
#

Oh, that will certainly fail that, no issue with infinity order checking

gentle epoch
#

this is the original table

chilly geyser
#

Functionally (to me?) positive infinity is just an entity that is greater than any number, and should(?) error if compared against another positive infinity

#

Yeah >10 seems like you can go inf on it

#

less than 0.01, you can use 0? or just -inf

gentle epoch
#

I don't think it can read negative inf

#

I put -inf on the table

#

but it's reading print(ovrl_wrldType < 0) as false

#

when it should be true

#

so I'll go with just 0
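
A small sketch of the `inf` / `-inf` idea, assuming a cut-down, made-up version of the pressure table with a header row. If a comparison like `< 0` comes back `False`, checking the column dtype is probably the first thing to look at, since the value may still be a string.

```python
import io
import math

import pandas as pd

# Hypothetical version of the pressure table with open-ended bounds.
csv_text = (
    "low,high,label\n"
    "-inf,0.009,Trace\n"
    "0.01,0.5,Very Thin\n"
    "11,inf,Superdense\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

low = df.iloc[0, 0]
high = df.iloc[2, 1]
print(low < 0, math.isinf(high))
```

With real infinities in the table, range checks such as `low <= roll <= high` work for the open-ended rows without needing a "ridiculous number".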

proven sigil
proven sigil
prime hearth
#

hello, i would like to know if ML certificates catch an employer's eye more than someone who didn't get one but has projects to showcase that they know ML?

I am currently pursuing a degree in CS, but school doesn't teach ML, so I'm learning it on my own.

quasi sparrow
#

Hey guys, question about ML

#

What do you call a machine learning task that uses its output for another machine learning task?

gentle epoch
#

is there any difference between:

```py
import pandas as pd

def csv_parser(path):
    a = pd.read_csv(path)

    return a

teeburu = csv_parser(R"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\1 - Overall Type Table.csv")

print(type(teeburu.iloc[0,1]))
print(type(teeburu.iloc[0,0]))
```

and: 
```py
import pandas as pd

def csv_parser(path):
    a = pd.read_csv(path)
    
    return a

teeburu = csv_parser(R"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\1 - Overall Type Table.csv")

teeburu_df = pd.DataFrame(data=teeburu)

print(type(teeburu_df.iloc[0,1]))
print(type(teeburu_df.iloc[0,0]))```
quasi sparrow
#

I think the second one is redundant

#

You don't need to specify it as a dataframe if you have already opened the CSV with pandas.

#

Pandas will load it as a dataframe

gentle epoch
quasi sparrow
#
import pandas as pd
teeburu=pd.read_csv(r"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\1 - Overall Type Table.csv",header=None,sep=';')

print(type(teeburu_df.iloc[0,1]))
print(type(teeburu_df.iloc[0,0]))
#

I normally read CSV documents like this

#

you can choose if you want headers or not and the sep is dependent on what your CSV uses to separate the fields (columns) in each row

#

It could be comma, or this ;

#

oops, like this:

import pandas as pd
teeburu=pd.read_csv(r"H:\01 Libraries\Documents\Tosh0kan Studios\Coding\GURPS Space\Tables\1 - Overall Type Table.csv",header=None,sep=';')

print(type(teeburu.iloc[0,1]))
print(type(teeburu.iloc[0,0]))
#

if the file does not load, check for relative or absolute path.
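
A quick note on Windows paths like the ones in this channel: in a normal string literal a backslash starts an escape sequence, so a path should be written as a raw string (`r"..."`) or with doubled backslashes. The shortened path below is only for illustration.

```python
raw = r"H:\01 Libraries\Tables"       # backslashes kept literally
escaped = "H:\\01 Libraries\\Tables"  # same string, escaped by hand
plain = "H:\01 Libraries"             # "\01" is read as an octal escape (chr(1))

print(raw == escaped)
print(repr(plain))
```

The `plain` version silently points at a different file name, which usually surfaces as a file-not-found error.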

quasi sparrow
gentle epoch
#

alright

#

thank you

gentle epoch
strange portal
#

How can I do a very simple reinforcement learning, I already know a little about ML

quasi sparrow
#

you'll end up with a matrix of just features but no description/headers

gentle epoch
#

yep

#

here

#

header : int, list of int, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file,
#

straight from the documentation

quasi sparrow
#
feature1|feature2|feature3|
---------------------------
0934828|349823849|348238943|
---------------------------
4327823|323848484|378327474|
---------------------------

if set to False it is loaded like this:

---------------------------
0934828|349823849|348238943|
---------------------------
4327823|323848484|378327474|
---------------------------
#

If set to zero, then you are telling pandas row 0 is the header row

gentle epoch
#

I'm not asking about false. I was asking about setting header to 0

#

I don't think I ever asked anything about false

quasi sparrow
#

Oh yeah. Set to zero points to the zero row as header row
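
A minimal sketch of the difference between the default header handling and `header=None`, using a made-up two-column CSV separated by `;`.

```python
import io

import pandas as pd

csv_text = "feature1;feature2\n10;20\n30;40\n"

# Default (header="infer", same as header=0 here): first line becomes column names.
with_header = pd.read_csv(io.StringIO(csv_text), sep=";")

# header=None: no line is treated as a header, columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), sep=";", header=None)

print(with_header.shape, with_header.columns.tolist())
print(no_header.shape, no_header.columns.tolist())
```

With `header=None` on a file that does have a header line, the names end up as the first data row, which also turns numeric columns into strings.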

strange portal
#

I did something amazing with neural network, can I send it to you?

#

the code

gentle epoch
strange portal
#

😦

prime hearth
#

sorry repost :
hello, i would like to know if ML certificates catch an employer's eye more than someone who didn't get one but has projects to showcase that they know ML?

I am currently pursuing a degree in CS, but school doesn't teach ML, so I'm learning it on my own.

strange portal
prime hearth
#

you can just post it here

#

but i not sure what you want me to do

strange portal
visual field
#

Today I was using pandas to import a csv into sqlite3 with ';' as the separator, and noticed that all 2000 records imported except for 3, where one of the columns contained a semicolon and so created an extra column. I'm trying to find a way to import the csv data while ignoring the extra semicolon. any thoughts or ideas?

prime hearth
#

@strange portal oh sorry, it just my question is different, i was wondering about if ML certificate are really worth it when it comes down to internships

#

because i already have projects to showcase that i know ML, but there are other interns who have a Coursera certificate in ML

strange portal
prime hearth
#

but yeah thats cool to see, you can share git repo here

strange portal
strange portal
grave frost
grave frost
serene scaffold
iron basalt
#

(And I recommend you don't and instead start out with simple tabular implementations)

iron basalt
# strange portal ok, thanks

The book I linked is written by the people that came up with the stuff in the first place (RL, in its modern form).

#

It's not a very hard read, although it does require some math.

quiet vault
#

I am using keras and it is training models on my cpu. Is there a way to make it use my gpu?

lapis sequoia
#

spank it

desert oar
gentle epoch
#

btw, i noticed something weird

#

I've been messing with themes in vscode

#

and for some reason, vscode thinks iloc is a variable, rather than part of a function, in regards to coloring

#

it works fine

#

it just the color

ripe forge
#

Well, iloc isn't a function is it.

#

Ie. You don't use it with round brackets.
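
A tiny sketch of that point, with a made-up frame: `df.iloc` is an attribute that hands back an indexer object, and the actual lookup happens when that object is subscripted with square brackets.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Accessing the attribute returns an indexer object tied to this frame.
indexer = df.iloc
print(type(indexer).__name__)

# The lookup itself uses square brackets: [row position, column position].
value = indexer[0, 1]
print(value)
```

That is presumably why the editor colors it like a property rather than like a method call.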

desert oar
#

what was that library that was some kind of wrapper around numba, jax, torch, and a few other python-optimizer tools?

#

transonic

#

!pypi transonic

arctic wedgeBOT
copper loom
#

im using pytorch ....i have a folder of images that i want to pass to model one by one and get outputs ? custom dataloader just seems too much to do all i want is to pass files one by one to a model

gentle epoch
tawdry lily
vast elk
#

Hello!, does anyone have ideas about data analytics business in pandemic era that has social impacts?

*sorry for broken English, TYSM

bold timber
#

Hi, I have a question: is a recommender system part of machine learning?

desert oar
undone flare
#

So this is a classification problem, I fixed the skewed data, scaled the data but still can't get better results what else can I do?

#

also weirdly enough scaled and non scaled data give me the same results

fathom tiger
undone flare
#

Thanks for the suggestions

fathom tiger
#

That’s great having domain knowledge… you can spend time on feature engineering, it would help your models a lot. Cheers 🥂

lapis sequoia
#

Has anyone here applied ML to heavy-tail data? I have a few questions

desert oar
#

in a linear model, if you scale x up by 10, the parameter just scales down by 10 to compensate
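
A small numerical sketch of that statement, using plain least squares on synthetic data. Note the assumption: this is an unregularized fit. scikit-learn's `LogisticRegression` applies L2 regularization by default, so there the compensation is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=200)


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)
    return slope, intercept


slope, intercept = fit_line(x, y)
slope_scaled, intercept_scaled = fit_line(10 * x, y)

# The slope shrinks by 10, so the predictions do not change.
predictions = slope * x + intercept
predictions_scaled = slope_scaled * (10 * x) + intercept_scaled
print(slope, slope_scaled)
```

That would explain why scaled and unscaled data can give the same results for some models.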

#

how many classes do you have?

#

what are the f1 scores, precision, recall, log or brier score, etc.?

undone flare
desert oar
#

classes, not features

undone flare
#

I don't know what you mean by classes, do you mean the categories the model outputs?

desert oar
#

yes

#

those are usually called "classes"

#

the term "categories" is usually applied to features, as in "categorical features"

undone flare
#

2 (water is potable or not)

desert oar
#

oh, so it's binary

#

that's easier then

undone flare
#

yea I think the features are making it harder

desert oar
#

why do you think that?

undone flare
#

tried many models with different hyper params but still can't get the mean score above a certain point

desert oar
#

are you expecting higher accuracy? sometimes data is noisy or the features just aren't that tightly related to the outcome

undone flare
#

features just aren't that tightly related to the outcome
I believe that's the case

desert oar
#

are these ppm values?

#

not that it matters, maybe there isn't any interesting feature engineering to be done here

undone flare
#

different units ppm, μg/L and mg/L

undone flare
desert oar
#

how is potability determined?

#

70% out-of-sample accuracy based on those 8 of features actually sounds kind of good, but i'm not a water treatment expert either

undone flare
#

mtft is the standard I think

#

also I think hard water doesn't really make drinking water unsafe

weary summit
#

Hi
I have two numpy arrays: data, which has shape (3, n), and points, which has shape (3, m)

Until now, there was a for loop over n which produced the following output:
for i in range(n):
    diff = data[:, i, newaxis] - points  # shape is (3, m)

meaning, subtract points from each column of data, n times.
I want to do that in a vectorized fashion, so that the result has shape (n, 3, m)

How can I achieve that?

tulip girder
#

Minecraft ai bot possible?

tidal bough
#

Oh, you can probably use einsum.
EDIT: ah, sadly no, doesn't support subtraction

#

you want res[i,j,k] = data[i,j] - points[i,k], I believe, where i is in range(3), j in range(n), k in range(m).

I think you might be able to broadcast data and points both to the output shape and subtract them like that.
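
A sketch of the broadcasting idea, with small random arrays standing in for the real data. The index formula above yields shape (3, n, m); to get the requested (n, 3, m), the column axis of `data` has to come first.

```python
import numpy as np

n, m = 4, 5
rng = np.random.default_rng(0)
data = rng.normal(size=(3, n))
points = rng.normal(size=(3, m))

# Loop version from the question, stacked into shape (n, 3, m).
looped = np.stack([data[:, i, np.newaxis] - points for i in range(n)])

# Broadcast version: (n, 3, 1) - (1, 3, m) -> (n, 3, m).
vectorized = data.T[:, :, np.newaxis] - points[np.newaxis, :, :]

# Variant matching res[i, j, k] = data[i, j] - points[i, k]: shape (3, n, m).
by_formula = data[:, :, np.newaxis] - points[:, np.newaxis, :]

print(vectorized.shape, by_formula.shape)
```

The two vectorized results hold the same numbers and differ only in axis order, so `by_formula.transpose(1, 0, 2)` is another way to reach the (n, 3, m) layout.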

wispy bay
#

hello! so i'm trying to make a speech recognition ai (kind of) using the google speech recognition module and it keeps showing this error: Traceback (most recent call last): File "C:\Users\rorop\Desktop\ai.py", line 10, in <module> text=r.recognize_google(audio_data) File "C:\Users\rorop\Desktop\speech_recognition\__init__.py", line 822, in recognize_google assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data" AssertionError: ``audio_data`` must be audio data

#

here's my code: ` import speech_recognition as sr
from speech_recognition import AudioFile

r=sr.Recognizer()

audio=AudioFile('vs.wav')

audio_data=r.record
type(audio_data)
text=r.recognize_google(audio_data)
print(text)`

#

Can anybody help me please?

#

Thank you

#

I've been trying to fix this for days

#

I'm also quite new to python

#

I hope someone helps soon! 🙂

bold timber
#

What is the difference between splitting with train_test_split and cross-validation?

polar rapids
#

Hi. I want to learn data science. Where to start? I know Python up to average.
Of course, a basic question is what does a data scientist do?

PS:
I hope I have not violated the rules of society regarding my questions :)

copper loom
#

Does anyone have idea about onnx operators ?

late shell
#

Hey, there's a dataset I'd like to use, but It's 900+ mb, I don't want it on my local disk. Is there a way to use the dataset for CNN training without having to download it?

undone flare
#

uh oh

#

you can just create a kaggle notebook and add the data if you don't want to download the data set

shadow frigate
#

Hello 👋 in Pytorch, can I somehow measure a loss on a subset of a tensor? say I have

pred=[1,2,3,0]
labels=[1,1,3,nan]

Is it possible to tell the loss to only consider the first 3 values, without modifying the tensors? Possibly by passing a mask for those values to ignore ( mask=[1,1,1,0] in the example). I can't filter out the samples to ignore before sending them through the NN because I'd have to modify a large part of the model to do so.

#

I didn't think of this situation when I prepared the model blobDerp

bold timber
#

Hi, I have a question, how to evaluate RecommenderSystem?

lapis sequoia
#

What kind of recommender system?

wheat spire
#

Hi there, I am starting to prepare data science seriously from the beginning. And am gonna complete whole data science within 6 to 12 months. If there is anyone who would like me join me, it would be great as we can study together. 🙂

flat hollow
#

what are you going to use as learning resource?

wheat spire
#

I have enrolled a course in a website

#

Also, as usual Youtube has a lot of content

undone flare
#

I have one hot encoded data and I want to revert this to just digits as in an 1D array with digits [0, 0, 0, 2, 5, 9...]

#

how can I achieve this?

#

do I check if the value is one and return it? I think there is probably more efficient way
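
A minimal sketch using `argmax`, which returns the position of the 1 in each row. The one-hot array here is built from made-up labels, and it assumes column k really corresponds to digit k, which is worth verifying for the actual dataset.

```python
import numpy as np

labels = np.array([0, 0, 0, 2, 5, 9])
one_hot = np.eye(10, dtype=int)[labels]  # shape (6, 10), one 1 per row

# The position of the 1 in each row is the class index.
digits = one_hot.argmax(axis=1)
print(digits)
```

This avoids looping over the values and works the same way on a DataFrame via `df.to_numpy().argmax(axis=1)`.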

late shell
late shell
undone flare
#

so I made it a dataframe and it looks like one hot encoded

umbral ferry
#

In xgboost, how is it determining which feature will be the root of the tree? I have a few continuous variables and many categorical which I one hot encoded

desert oar
#

there are other split-finding algorithms implemented in xgboost, but the others are all just approximations to the exact algorithm

umbral ferry
#

like Gini impurity or gain? or is that unrelated to determine which to split at

desert oar
#

think of a node in a decision tree as containing data points, not as containing a split on a feature. the edges between nodes are the splits.

#

yep, that's it. the algorithm just finds the split point with the greatest gain for each feature, then splits on the best feature
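
A simplified sketch of that exact greedy search, not xgboost's actual code. It uses the gain formula from the XGBoost paper without the gamma penalty, squared-error loss, and toy data; `best_split`, `useful`, and `noise` are names invented for this example. Missing values, `min_child_weight`, and the histogram approximations are all left out.

```python
import numpy as np


def best_split(feature, grad, hess, reg_lambda=1.0):
    """Scan every threshold of one feature and return (gain, threshold)."""
    order = np.argsort(feature)
    feature, grad, hess = feature[order], grad[order], hess[order]
    g_total, h_total = grad.sum(), hess.sum()
    parent = g_total**2 / (h_total + reg_lambda)

    best_gain, best_threshold = 0.0, None
    g_left = h_left = 0.0
    for i in range(len(feature) - 1):
        g_left += grad[i]
        h_left += hess[i]
        if feature[i] == feature[i + 1]:
            continue  # cannot split between identical values
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (
            g_left**2 / (h_left + reg_lambda)
            + g_right**2 / (h_right + reg_lambda)
            - parent
        )
        if gain > best_gain:
            best_gain = gain
            best_threshold = (feature[i] + feature[i + 1]) / 2
    return best_gain, best_threshold


# Toy data: the target depends on `useful` only.
rng = np.random.default_rng(0)
useful = rng.uniform(0, 10, size=200)
noise = rng.uniform(0, 10, size=200)
y = (useful > 6).astype(float)

# Squared-error loss with an initial prediction of 0.5: grad = pred - y, hess = 1.
grad = 0.5 - y
hess = np.ones_like(y)

results = {
    name: best_split(col, grad, hess)
    for name, col in {"useful": useful, "noise": noise}.items()
}
root_feature = max(results, key=lambda name: results[name][0])
print(root_feature, results[root_feature])
```

The feature with the highest best-gain becomes the split at that node, and the same search is repeated on each child's subset of rows.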

umbral ferry
#

here's the first tree of my model

undone flare
#

I keep getting

```
ValueError: Expected 2D array, got 1D array instead:
array=[6 9 3 9 0 5 8 2 5 9 4 9 7 1 3 3 0 5 0 7 0 8 3 6 9 2 7 3 5 9 8 5 4 6 4 6 3
 1 9 2 7 7 3 1 1 2 0 7 8 9 1 9 6 2 1 0 6 8 2 8 8 7 2 7 5 9 2 3 6 4 1 1 5 7
 4 9 9 4 3 8 8 9 2 0 9 0 0 4 1 5 5 4 7 4 7 4 2 2 8 7 2 0 9 0 2 1 7 8 8 7 2
 8 3 3 2 2 6 1 5 5 5 0 1 5 8 2 6 5 1 0 3 1 9 9 8 3 8 9 2 2 2 6 2 6 6 1 6 2
 5 4 9 2 1 2 6 2 6 6 1 1 7 5 9 8 6 2 4 7 6 9 8 7 2 9 1 6 7 6 0 6 1 7 4 8 4
 3 2 2 4 2 8 6 8 3 2 0 8 8 8 5 4 7 0 8 2 4 2].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```
can anyone tell why? I tried many things but still end up with this
umbral ferry
#

ahhh so it's gain, ok

undone flare
#

shapes

umbral ferry
#

and it will compare gains of various features, and if it's a continuous features, it will find which split in the continuous feature has the highest gain

#

and use the largest gain as splitting point

desert oar
# umbral ferry here's the first tree of my model

i think that visualization is just a little weird. unless i'm really badly misunderstanding something, NTON<585 is supposed to be the splitting criterion after the first node. it is not "the" first node as such.

umbral ferry
#

the first node would contain all the data right, I'd call that the 0th node, so it decided to split using NTON at the first

#

and then it's the same for further nodes, where it will compare gain from each feature using the new subset of data points?

undone flare
#

if I make y a list then also it gives the same thing

umbral ferry
#

on and on until it reaches max depth, reaches minimum data points in the node, or gets pruned based on what regularization I use

desert oar
#

.values isn't recommended anymore, the pandas docs suggest .to_numpy() instead

undone flare
desert oar
#

that's... weird

#

what does this df look like

#

oh, the one-hot encoded data you posted in the screenshot above?

undone flare
#

I have two npy files

undone flare
#

X had 3 dimensions so I reduced it to two by X.reshape(2062, 64*64)

undone flare
grave frost
undone flare
#

nope npy files

grave frost
#

tried just converting it to 2d

#

its literally what the error says

undone flare
#

y doesn't need to be 2D does it?

#

(2062,) is the dimension

grave frost
#

oh no my bad

#

y can be 1d

desert oar
#

LogisticRegression y should only be 1d actually

undone flare
#

yea

grave frost
#

x has to be 2d

undone flare
#

I don't know why it's giving me this error

desert oar
#

i don't think they even support multiclass-onehot or multilabel

grave frost
#

because its an image

undone flare
#

I tried doing it with digits dataset of sklearn and it worked but doesn't wanna work with this data

grave frost
#

BTW what is there in the npy files? why dont they give images in jpegs?

desert oar
#

X should probably have shape like (n_images, image_height * image_width), no?

grave frost
desert oar
#

it looks like they're flattening it to 2d

grave frost
#

(img_height, img_width, channel)

undone flare
grave frost
#

check your vars

undone flare
#

reshape

X = X.reshape(2062, 64 * 64)
X.shape
#

This is the image

#

this is when X has 3 dimensions but you can't provide array with 3 dims to LogisticRegression

lapis sequoia
#

Hi ! What prerequisites do I need to get started in ml ?

umbral ferry
#

Good knowledge of pandas will really help

#

other than that it's just curiosity and decent reading comprehension lol

grave frost
#

because that's not what you are passing

undone flare
grave frost
# undone flare .

clf1 = LogisticRegression(random_state=42).fit(X_train, y_train.values.ravel())
y_pred1 = clf1.predict(X_test)
clf1.score(y_test, y_pred1)
you are obviously doing some more processing, since you don't pass X?
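
A likely culprit in the quoted snippet is the last line: `score()` expects features and true labels, so `clf1.score(y_test, y_pred1)` hands a 1-D label array in as `X`, which matches the array of digits printed in the error. The sketch below uses scikit-learn's bundled digits dataset as a stand-in for the `.npy` files, and `max_iter=5000` is an arbitrary choice to let the solver converge.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # X: (n_samples, 64), y: (n_samples,)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# score() takes features and true labels, and predicts internally.
acc_from_score = clf.score(X_test, y_test)

# accuracy_score() compares true labels with predictions you already have.
acc_from_metric = accuracy_score(y_test, y_pred)

# Passing two 1-D label arrays to score() is expected to raise the 2D/1D error.
try:
    clf.score(y_test, y_pred)
except ValueError as err:
    print(type(err).__name__)

print(acc_from_score, acc_from_metric)
```

So the shapes of `X` and `y` can be fine and the error still appears, because it comes from the scoring call rather than from `fit`.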

desert oar
#

curious what's considered "good" out-of-sample accuracy on this problem, i assume it's in the high 90s

undone flare
#

What was I doing wrong tho? The dimensions were wrong?

desert oar
#

not sure. take a look at how i did it maybe and compare

raw temple
#

Hi, I'm looking at coding to implement an AR model for time series and in this image, can anyone tell me what does the -100 signify in the code
train_data = df['Consumption'][:len(df)-100]

desert oar
#

indexing or slicing with a negative number means "count from the end"

undone flare
desert oar
#

!eval @raw temple```python
import numpy as np
import pandas as pd

x_py = list(range(10))
x_np = np.array(x_py)
x_pd = pd.Series(x_np, index=list('abcdefghij'))

print(x_py[:-3])
print()
print(x_np[:-3])
print()
print(x_pd.iloc[:-3])
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [0, 1, 2, 3, 4, 5, 6]
002 | 
003 | [0 1 2 3 4 5 6]
004 | 
005 | a    0
006 | b    1
007 | c    2
008 | d    3
009 | e    4
010 | f    5
011 | g    6
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/vorubegiya.txt?noredirect

desert oar
undone flare
#

Hmm I have the same shapes

desert oar
#
In [95]: x.iloc[:10,:5]
Out[95]:
pixel           0,0       0,1       0,2       0,3       0,4
image_num
0          0.466667  0.474510  0.478431  0.482353  0.486275
1          0.596078  0.607843  0.619608  0.631373  0.643137
2          0.588235  0.603922  0.619608  0.631373  0.643137
3          0.556863  0.568627  0.584314  0.600000  0.611765
4          0.580392  0.576471  0.592157  0.607843  0.615686
5          0.517647  0.529412  0.552941  0.615686  0.635294
6          0.427451  0.439216  0.454902  0.474510  0.490196
7          0.564706  0.576471  0.588235  0.603922  0.615686
8          0.498039  0.509804  0.521569  0.537255  0.549020
9          0.501961  0.517647  0.533333  0.545098  0.564706
undone flare
#

Yea same

#

I will try again tomorrow thanks for the help

raw temple
#

@desert oar thanks for your help. so if I only have 55 points of data, then would I just change the value to 50?

#

or maybe -15? as I want all points included until the last 15?

#

is that what it means?
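
A small sketch of what the slice does with 55 points, using a plain list as a made-up stand-in for `df['Consumption']`. It assumes the goal is to train on everything except the last 15 points.

```python
# 55 made-up data points standing in for df['Consumption'].
data = list(range(55))

holdout = 15  # number of trailing points to keep out of training

train_a = data[:len(data) - holdout]  # the form used in the tutorial code
train_b = data[:-holdout]             # equivalent negative-index form
test = data[-holdout:]                # the last 15 points

print(len(train_a), len(train_b), len(test))  # 40 40 15
print(train_a == train_b)                     # True

# With only 55 points, len(data) - 100 is -45, so the original slice
# would silently keep just the first 10 points.
print(len(data[:len(data) - 100]))            # 10
```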

desert oar
raw temple
rigid zodiac
#

Hi everyone, pretty random question on neural networks. I saw some sample code on Towards Data Science that has model.add(Dense(64, activation=tf.nn.relu, kernel_initializer='uniform', input_dim = input_dim)) # fully-connected layer with 64 hidden units, where 64 is the number of layers.

Why do they choose 64 layers? Does it hurt if I choose more layers than that?

unborn glacier
#

I think you mean the number of nodes?

chilly geyser
#

64 is definitely the number of nodes
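
To make the units-versus-layers point concrete, here is a rough sketch of how the number of units drives the size of one fully connected layer. The `input_dim = 20` value is an assumption for illustration; the original snippet does not say what it is.

```python
def dense_param_count(input_dim, units):
    """Trainable parameters in one fully connected layer with a bias per unit."""
    return input_dim * units + units

input_dim = 20  # hypothetical number of input features

print(dense_param_count(input_dim, 64))   # 1344
print(dense_param_count(input_dim, 128))  # 2688
```

Doubling the units doubles the weights of that single layer; adding layers means adding more `Dense(...)` calls.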

unborn glacier
#

There are general rules on how to determine the number of nodes, and layers. Here's an explanation:
https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

chilly geyser
rigid zodiac
#

thanks yall

lusty stag
#

how do you guys define class imbalance?
I have 10 classes of observations
max class has 1100 observations
min class has 750 observations
should I consider oversampling or keep it as it is?

fallen trellis
#

I want to plot 2 classes with barely any variance but a big interclass difference

#

Any way to do this? Boxplots fail miserably

desert oar
#

what are you trying to show? maybe you just want a table

fallen trellis
#

Loading times

desert oar
fallen trellis
#

left with precompilation, right without

#

A table could work but Id like to have a nice looking figure

#

and a logarithmic axis is definitely not nice, sadly

desert oar
#

make 3 plots:

1. average loading time as a bar chart with some kind of error bar showing that the errors are small relative to the difference between groups
2-3. kernel density plot or histogram for each group, separately, or faceted together so you can compare the distributions without worrying about scale
fallen trellis
#

I feel like I should just summarize the results textually..

desert oar
#

and yeah, it's almost never bad to include a table with some combination of mean, std dev, median, min, max, 25%, 75%, 10%, 90%

#

(in general i wouldn't recommend using tuples for "array-like" things)

#

!eval ```python
import pandas as pd

data = pd.DataFrame({
    'without_precomp': [90, 91, 92],
    'with_precomp': [10, 11, 12],
})

def p25(x):
    return x.quantile(0.25)

def p75(x):
    return x.quantile(0.75)

table = data.agg([
    'mean',
    'std',
    'min',
    p25,
    'median',
    p75,
    'max',
]).transpose()

print(table)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |                  mean  std   min   p25  median   p75   max
002 | without_precomp  91.0  1.0  90.0  90.5    91.0  91.5  92.0
003 | with_precomp     11.0  1.0  10.0  10.5    11.0  11.5  12.0
fallen trellis
#

great

#

thanks you, as always

#

Ill see to get that stuff visualized, as well

molten hamlet
#

Hey guys, I stumbled upon this problem: I use a Deep Q algorithm (reinforcement learning) and my outputs (actions), as you can see, are the same for every input, which is very bad. Can someone help me identify the cause, or name the problem I'm facing? Is this due to the learning rate? The loss function? Too small a model? I used Huber and MAE loss functions, but nothing works in the long run...

thorny willow
#

does anyone know a doc or guide that explains the installation process of keras and tensorflow in anaconda on linux? I'm facing a few problems

#

or, if you're willing, provide all the lines of code here?

fallen trellis
#

@desert oar sorry for the ping, but regarding my loading-time testing:

With the std calculated, can I use e.g., chi² to say that "std is not relevant so the difference between mean_with and mean_without is equal to the the compilation time"?

desert oar
fallen trellis
#

Ok, just want to make sure my prof doesnt bicker at me

#

But I think thats a valid argument

rigid zodiac
#

has anyone done RNN, CNN or ConvLSTM? I need some help with how to set it up. I got my optimal gridsearch and epoch but idk how to use those deep learning models

agile jolt
#

can someone explain this 2 rows

molten hamlet
agile jolt
#

hahaha, thanks but i meant like.. what does data = [trace] and line 32 mean?

slow vigil
#

Anyone here have a lot of experience with Spark?

velvet thorn
slow vigil
#

Well I have to pull a lot of data in from an api, and I have to use a ton of separate requests due to the API limitations. I want to write all the data to a parquet file, and I think the way to do that is each time I do an api call I append the returned data to a spark dataframe that gets written into a parquet file at the end. I'm new to spark so I'm pretty sure I should be using partitions to write the file since the dataframe will be too large to fit in memory. I guess I'm just wondering if I have the idea right, and also wondering if there is a fast way to make concurrent api calls with Spark since I know it's basically designed for big data ingestion

#

Everything I can find is all about doing one api call or working with one JSON file that already exists

#

So I'm having trouble visualizing how the pieces go together in my situation

#

I have to pay for the API access so I'm trying to get this all sorted out without having access to the data

velvet thorn
#

in general

#

for that kind of stuff

#

you want to do async IO

#

as for the partitioning...

#

Spark will do that for you

#

(basically)
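
A minimal sketch of the async IO idea with the standard library. `fetch_ticker` is a hypothetical placeholder that sleeps instead of calling a real API, and the concurrency limit of 5 is an assumption standing in for whatever rate limit the API imposes.

```python
import asyncio

async def fetch_ticker(ticker, semaphore):
    """Hypothetical stand-in for one API request; sleeps instead of using the network."""
    async with semaphore:  # cap the number of requests in flight
        await asyncio.sleep(0.01)
        return {"ticker": ticker, "rows": []}

async def fetch_all(tickers, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_ticker(t, semaphore) for t in tickers]
    # gather() returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)

results = asyncio.run(fetch_all(["AAPL", "MSFT", "GOOG"]))
print([r["ticker"] for r in results])  # ['AAPL', 'MSFT', 'GOOG']
```

In real use the sleep would be replaced by a call through an async HTTP client, which is outside what this sketch shows.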

#

do you have a cluster or what?

slow vigil
#

No it's all local, but I have probably about 20-30 gigs of data at least

#

and I will probably move it to a cloud service eventually

#

So am I right about just adding everything to a dataframe?

#

Should I even be using Spark? I do want to use the parquet format that's why I'm looking at it

#

Like I just don't understand how it goes about it. In my head I picture the data getting added to the dataframe as it comes in after each api call, then once the dataframe reaches a certain size it writes it to a parquet file... and then what? it basically starts a new dataframe and once that reaches the same size it appends it to the same parquet file?

#

It's stock data, so I have to loop over a list of tickers and do multiple api calls for each ticker, and there are thousands of tickers.

velvet thorn
#

like dask

velvet thorn
#

when it comes in?

slow vigil
#

json

velvet thorn
#

I would say

#

store it in memory first

#

when you hit a certain limit

#

write that to disk

#

then

#

you'll have multiple dataframes

#

once you're done with the API

#

concatenate them
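
The buffer, flush, concatenate pattern described above could look roughly like this. It is a sketch only: it writes JSON Lines part files with the standard library as a stand-in for Parquet, and the record layout and `batch_size` are made up.

```python
import json
import tempfile
from pathlib import Path

def write_in_batches(records, out_dir, batch_size):
    """Buffer records in memory and flush a numbered part file every batch_size records."""
    out_dir = Path(out_dir)
    buffer, paths = [], []

    def flush():
        path = out_dir / f"part-{len(paths):05d}.jsonl"
        path.write_text("\n".join(json.dumps(r) for r in buffer))
        paths.append(path)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            flush()
    if buffer:  # write whatever is left after the last API call
        flush()
    return paths

# Made-up records standing in for API responses.
fake_records = [{"ticker": "AAPL", "close": 100 + i} for i in range(25)]

with tempfile.TemporaryDirectory() as tmp:
    parts = write_in_batches(fake_records, tmp, batch_size=10)
    part_names = [p.name for p in parts]
    # "Concatenate" step: read every part back in order.
    combined = [json.loads(line) for p in parts for line in p.read_text().splitlines()]

print(part_names)     # ['part-00000.jsonl', 'part-00001.jsonl', 'part-00002.jsonl']
print(len(combined))  # 25
```

Only one batch is ever held in memory at a time, which is the point when the full dataset is 20-30 GB.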

#

ALTERNATIVELY

#

you can use spark-streaming

#

I haven't

#

but I'm fairly sure

#

it would work here?

slow vigil
#

interesting

#

This looks similar to websockets or something

#

Thank you for your help

quiet vault
#

Would Cuda version 11.4 work with the newest version of Tensorflow?

quiet vault
#

windows

#

the website says 11.2

#

but i cant get it

#

because the version is too old for my gpu

#

so im asking if anyone has tried using 11.4

serene scaffold
quiet vault
#

no

#

im just asking before i download

#

wait

#

wait

#

no

#

i read it wrong

#

yes

serene scaffold
serene scaffold
#

yes, you did get an error message? if so, show.

quiet vault
#

i read ur message wrong

#

i did get an error

serene scaffold
#

if you're asking for help that's in any way related to an error message, always show the error message.

quiet vault
#

will do in the future, sorry about that

serene scaffold
#

no problem

quiet vault
serene scaffold
#

Please do text next time.

quiet vault
#

i cant copy paste it

serene scaffold
#

Anyway, try installing and running tensorflow and see what happens.

quiet vault
#

but it hasnt installed yet

serene scaffold
#

did you try to install it?

quiet vault
#

i cant

serene scaffold
#

what happened when you tried?

quiet vault
#

i cant

#

it doesnt give me an option

#

look at the pic

serene scaffold
#

that's not how you install tensorflow

quiet vault
#

i have tensorflow installed. im talking about cuda

#

i cant install cuda because of the error message

serene scaffold
#

try doing something with tensorflow so we see what error message you get from tensorflow.

#

like python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

quiet vault
#

I did not get a code breaking error

#

But I got the usual warnings

serene scaffold
#

can you show the warnings?

quiet vault
#

2021-08-11 19:50:14.363291: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-08-11 19:50:14.364295: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-08-11 19:50:14.365360: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-08-11 19:50:14.366408: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-08-11 19:50:14.367435: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-08-11 19:50:14.368440: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2021-08-11 19:50:14.369441: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-08-11 19:50:14.370436: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-08-11 19:50:14.370627: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-08-11 19:50:14.473931: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

serene scaffold
#

thank you

#

can you do pip freeze | grep tensorflow

quiet vault
#

grep : The term 'grep' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:14

+ pip freeze | grep tensorflow
+              ~~~~
    + CategoryInfo          : ObjectNotFound: (grep:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
quiet vault
serene scaffold
#

yes but I guess windows doesn't have grep

#

do you know the three-number version number of tensorflow that you have?

quiet vault
#

yeah, ive never heard of grep before

serene scaffold
#

should be like x.y.z

serene scaffold
#

if you do pip freeze I guess you can just look for tensorflow
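
A shell-independent way to get the same answer, sketched with the standard library (Python 3.8+). `installed_version` is a helper name invented for this example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("tensorflow"))
print(installed_version("definitely-not-a-real-package-xyz"))  # None
```

This works the same on Windows and Linux, since it does not rely on `grep` being available.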

quiet vault
#

2.6.0

serene scaffold
#

great, let me see

#

@quiet vault can you use 2.5?

ebon walrus
#

@quiet vault

quiet vault
#

bruh

serene scaffold
#

!mute 314448333739524096 investigating

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @ebon walrus until <t:1628733351:f> (59 minutes and 59 seconds).

quiet vault
quiet vault
serene scaffold
#

which I realize doesn't solve your problem

quiet vault
#

yeah

#

i just want to get cuda in the first place

#

should i just try to use 2.6.0 tensorflow with 11.4 cuda

#

see what happens

serene scaffold
quiet vault
#

no

#

i was asking if someone had tried it

#

before installing

serene scaffold
#

it's unlikely that anyone will have tried what you did with the exact same OS, tensorflow version, cuda version, and remember that they did it with those exact versions.

quiet vault
#

true

#

well

#

im gonna be the first

#

epic

serene scaffold
#

Truly a pioneer joe_salute

quiet vault
#

Well

#

My power went out

#

Right before installation finished

#

Hope that didn’t fuck anything up

serene scaffold
#

I mean I suppose there are worse cases but suffice to say you won't have any data loss.

quiet vault
#

Yeah

#

Still not back tho 😦

sour spindle
#

How is this for my first time making a AI stock predictor?

unborn glacier
#

Did you train it on that data???

#

Or is that data that it's never seen before?

silk rune
#

or better yet, how did you train it lol

sour spindle
unborn glacier
#

K what's your bitcoin wallet I'd like to buy your code

sour spindle
sour spindle
unborn glacier
#

Lol I'm kidding anyway

sour spindle
#

Oh

#

Ok

unborn glacier
#

But yeah if that's legit you should continue testing and use it if it works into the future

silk rune
sour spindle
#

Yeah it was a hassel and it took me like 5 days to make

unborn glacier
#

5 days only??

sour spindle
#

Yeah

silk rune
#

so is it based on training or does it look at parameters and make predictions as time goes on

sour spindle
#

It takes some closing prices from the present closing price and the closing 2 days before and then predicts the next days price

silk rune
#

ooo

sour spindle
#

I just need to make it scalable for some bots i will make

silk rune
#

yeah, it looks promising already so thats awesome

sour spindle
#

Thanks.

unborn glacier
#

So what's apple gonna be tomorrow?

sour spindle
#

I havent tested it on present data yet because i am sceptical of my code

quiet vault
unborn glacier
#

Yeah good to have a skeptical attitude haha

sour spindle
quiet vault
#

ah

#

nice job

sour spindle
#

Thanks

unborn glacier
#

When you say 60 parameters do you mean stuff like the weather and other stock data and things?

sour spindle
#

Just stock data only

#

I might also make an article reader and add it in

quiet vault
#

one more question from me

sour spindle
#

Ok

quiet vault
#

How did u make this graph? Did you use walk forward validation?

#

two questions i guess
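
For anyone unfamiliar with the term, walk-forward validation means every test window comes strictly after the data used for training, and the training window grows as time moves on. A minimal sketch with made-up sizes:

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) where the test window always follows the training data."""
    start = initial_train
    while start + test_size <= n_samples:
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size

splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in splits:
    print(train_idx[-1], test_idx)
# 3 [4, 5]
# 5 [6, 7]
# 7 [8, 9]
```

Because no training index is ever later than a test index, the model cannot be evaluated on days it has effectively already seen.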

sour spindle
#

??

unborn glacier
#

And you're sure you haven't fed it like the next days google stock price to predict the same day's apple price? (I.e. giving it future data)

sour spindle
#

No

#

The testing data is from 2018 to 2020

quiet vault
#

dog in the fog be getting interrogated rn

silk rune
#

lol

unborn glacier
#

I mean if someone makes an accurate stock predictor that's like a billion dollar tool so...

#

I worked at a finance company and they basically laughed at the idea that was even possible

sour spindle
#

I just need to edit something in my code one sec and i will come back with a new graph. Just to make sure

quiet vault
#

Just by the way, I was working on something similar and I thought I had the perfect model with these results:

#

but

#

i realized that tensorflow was reusing backend graphs which meant it was cheating kind of

sour spindle
#

I have no idea what i am looking at. I just watched some tutorials and then made this from scratch

quiet vault
#

its the results of some testing

#

u know what

#

nvm

unborn glacier
quiet vault
#

yeah

#

well for me this was not surprising

#

i was using a basic neural network

#

just like 2 dense layers

#

well

serene scaffold
#

@quiet vault did you figure out your thing?

quiet vault
#

ah

#

kind of

serene scaffold
quiet vault
#

i came here originally to ask a question but i got distracted

#

the power came back

#

and then i did everything and i think it work

serene scaffold
#

You had the power all along bb

quiet vault
serene scaffold
#

No I mean
You have the power.
Like as a person

quiet vault
#

oh

#

lol

#

anyway

#

2021-08-11 21:49:13.705324: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-11 21:49:14.151069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3993 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5
2021-08-11 21:49:14.333000: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-11 21:49:15.356801: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8202

#

this comes up when i import tensorflow

#

is everything ok

serene scaffold
#

Oh I'm going to sleep but hopefully someone knows

quiet vault
#

nonono

#

pls

unborn glacier
#

If you google that error "None of the MLIR Optimization Passes are enabled" it says it's fine

#

So...

quiet vault
#

aight

#

epic

serene scaffold
quiet vault
#

This means that i am the first person to make tensorflow 2.6.0 work with cuda 11.4

sour spindle
#

This is gonna take awhile to train

unborn glacier
#

Are you using a local GPU or colab or AWS or something?

sour spindle
#

Colab

sudden canyon
#

Statistics question here. I was thinking about architecting a code running/code contest system where students submit solutions to various problems, and I wanted to calculate how many cores I'd need to allocate for a particular contest.

Let's say that I have S students solving problems concurrently. This is a perfect system, so each student's workflow is like this: they are working on the code for I seconds, then they submit the code for testing and wait for the testing system to run all the test. After the student gets the results, they start the next iteration and so on.

Each problem has M tests in it that it runs (a test consists of running a program with a certain input and checking that the output matches the predefined one). Each test takes T_avg on average and T_worst in the worst case to run (there's a hard upper limit, but many submissions will run very quickly). I have C cores at my disposal, in other words I can run C total tests in parallel.

If I'm willing to accept that students will wait for X seconds for the results of the test, how many cores (C) do I need?

unborn glacier
#

Why not simulate it to get a pretty solid estimation

#

I mean it's definitely a solvable stats problem, but it's pretty trivial to run a simulation

undone flare
#
ValueError: Expected 2D array, got 1D array instead:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
undone flare
#

However it works on your model

#

like

base_model = LogisticRegression(max_iter=100)
legacy_random_state = np.random.RandomState()
tune_model = HalvingRandomSearchCV(
    base_model,
    param_distributions={
        "C": scipy.stats.expon(scale=100),
        "class_weight": ["balanced", None],
    },
    cv=3,
    n_jobs=3,
    random_state=legacy_random_state,
    verbose=1,
)

tune_model.fit(X_train, y_train)
pred_train = tune_model.predict(X_train)
score_train = tune_model.score(X_train, y_train)

pred_test = tune_model.predict(X_test)
score_test = tune_model.score(X_test, y_test)

print("Train:", score_train)
print("Test:", score_test)
#

so I don't even know what I am doing wrong
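
One likely cause, going by the `clf1.score(y_test, y_pred1)` call quoted earlier in the thread: `score` expects features and true labels, so passing the 1D `y_test` where `X` belongs triggers exactly this "Expected 2D array" error. A sketch on the sklearn digits dataset (not the original data) showing both the working call and the failing one:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # X: (1797, 64), y: (1797,)
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=42)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# score() takes features and true labels, and predicts internally.
acc_from_score = clf.score(X_test, y_test)
# accuracy_score() takes true labels and predictions.
acc_from_metric = accuracy_score(y_test, y_pred)
print(acc_from_score == acc_from_metric)  # True

# Passing 1D labels where features are expected reproduces the error.
try:
    clf.score(y_test, y_pred)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message is not None)  # True
```

That would also explain why the tuned model above works: it calls `score(X_test, y_test)`.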

unborn glacier
# sudden canyon Statistics question here. I was thinking about architecting a code running/code ...
import random
import scipy.stats as stats
from collections import Counter


S = 100 #students
I = 200 #time per problem
sI = 30 #std.dev of I
maxI = 3600 #maximum time
minI = 60 #minimum time
M = 10 #number of problems
T = 3 #typical time
Tf = 30 #timeout time
pTf = 3 #percent chance the student's code times out

a, b = minI, maxI
mu, sigma = I, sI
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)

max_cores = []
for i in range(100):
    all_students = []
    for student in range(S):
        in_use = []
        for problem in range(M):
            problem_time = T
            if random.random() * 100 < pTf:  # pTf percent chance of a timeout
                problem_time = Tf
            solve_time = int(dist.rvs())
            if in_use==[]:
                in_use+=(list(range(solve_time,solve_time+problem_time)))
            else:
                in_use+=(list(range(in_use[-1]+solve_time,in_use[-1]+solve_time+problem_time)))
        all_students+=in_use
    max_cores.append(Counter(all_students).most_common(1)[0][1])

print(max_cores)
#

PAINFULLY slow and not optimized, but it works!

sudden canyon
#

downloading scipy

unborn glacier
#

That gives the max cores to never have a collision, if you have a queue or something it would change the complexity

#

Also I think there is a way to have more python instances than cores, just that everyone's code would slow down a bit

undone flare
#

100 cases tho rite? ._.

sudden canyon
#

well, there's no reason to run 7 processes on 4 cores, right

#

(in this context)

unborn glacier
#

Well it's better to be sharing a core with someone in an infinite while loop than to be queuing behind them

sudden canyon
#

but it will have the same throughput, or maybe a lower throughput, right?

unborn glacier
#

Yes

#

But like if 10 people use up all 10 available cores, for example, and are all running while loops that last forever, I'll never get the chance to run something as simple as print("hello")

#

But if I have an available thread on a shared cpu, then my print("hello") will take one nanosecond longer to run, but I won't have to wait forever to start it

sudden canyon
#

the issue is, I'll have several machines, not one machine

unborn glacier
#

Yeah, I'm not really sure how you'd route traffic effectively

sudden canyon
#

The numbers are something like:

maximum test time = 5s
test cases per problem = 20
students = 1000 * N
unborn glacier
#

1000 * N ? Like you have multiple thousands of students?

sudden canyon
#

yeah, that's the theoretical idea

#

I'm not making anything practical yet, just wondering how many servers that would need

unborn glacier
#

So now the question becomes, how many cores do you need to run my poorly optimized code for that many students

#

Lol

sudden canyon
#

write another simulation to find out

unborn glacier
#

Haha

#

Honestly you might be able to generalize the simulation results to an approximation

sudden canyon
#

Not sure if my logic is correct, but in a perfect world:
```
N = number of students
i = iteration time
t = time waiting in queue
c = number of cores
r = time to run a test
M = tests per problem
```
The system can process `capacity = c/r` tests per second, while students submit `throughput = N * M / (i + t)` tests per second.
`c/r = N * M / (i + t)`
`c = N * M * r / (i + t)`
So if
```py
N = 1000
i = 300  # 5 minutes
t = 10   # 10 s wait on average
r = 5    # 5 s per test
M = 20
```
then I need 323 cores (ouch)
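
The same steady-state estimate as a small function, so other values can be plugged in. This is only the back-of-the-envelope model from the message above; the function and argument names are made up.

```python
import math

def cores_needed(n_students, tests_per_problem, test_seconds, iteration_seconds, wait_seconds):
    """Steady-state estimate: c = N * M * r / (i + t), rounded up to whole cores."""
    # Tests submitted per second across all students.
    demand = n_students * tests_per_problem / (iteration_seconds + wait_seconds)
    return math.ceil(demand * test_seconds)

print(cores_needed(1000, 20, 5, 300, 10))  # 323
print(cores_needed(1000, 20, 5, 300, 60))  # 278
```

Tolerating a longer wait lowers the estimate only slightly, because the iteration time dominates the denominator.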

inland zephyr
#

i want to ask about keras loss function. There is a categorical cross-entropy and sparse categorical cross-entropy. What is the different on both of them, and what suitable function i need to use if the data are slightly imbalance (2:1)?

unborn glacier
#

Was 323 what you calculated theoretically?

sudden canyon
#

yeah, from these numbers on my ""model""

unborn glacier
#

The simulation predicts a very similar number, so your theory is correct!

sudden canyon
unborn glacier
#

-> [310, 293, 280, 293, 318, 309, 294, 323, 297, 301]

#

Weird it actually has 323 as one of the values

unborn glacier
# inland zephyr i want to ask about keras loss function. There is a categorical cross-entropy an...
odd falcon
#

why is it giving such an error? can someone help me?

austere swift
#

what is the line where that error shows up?

iron basalt
# inland zephyr i want to ask about keras loss function. There is a categorical cross-entropy an...
This vector (a) is dense:  [0.1, 0.5, 0.3, 1.0, 0.8, 0.6, 0.1, 0.7]
This vector (b) is sparse: [0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 1.0]
The vector b can be compressed. It can instead be stored like this:
nnz = [0.4, 1.0]
indices = [4, 7]
Where "nnz" is the non-zero values, and indices is the indices of those non-zero values.
Consider adding b to a (changing a in-place). When using the non-compressed form of b, there are n iterations, where n is the length of the two vectors (8 iterations).
Now consider adding b to a, but this time using the compressed form of b. Adding the zero values from b to a is pointless as it leaves those values unchanged.
So there are m iterations where m is the number of non-zero values in b (2 iterations). This provides a very large speedup with large enough sizes and if b is sparse enough (e.g. 1% non-zero).
In addition, since only the non-zero values of b are stored and b has mostly zero values, there is a large reduction in memory usage for b.
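
The addition described above, sketched in plain Python with the same two vectors:

```python
a = [0.1, 0.5, 0.3, 1.0, 0.8, 0.6, 0.1, 0.7]  # dense
b = [0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 1.0]  # sparse, uncompressed

# Compress b into its non-zero values and their positions.
indices = [i for i, v in enumerate(b) if v != 0.0]
nnz = [b[i] for i in indices]

# Uncompressed form: one addition per element (8 iterations).
dense_sum = [x + y for x, y in zip(a, b)]

# Compressed form: one addition per non-zero value (2 iterations).
sparse_sum = list(a)
for i, v in zip(indices, nnz):
    sparse_sum[i] += v

print(indices, nnz)             # [4, 7] [0.4, 1.0]
print(dense_sum == sparse_sum)  # True
```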
#
Now consider having categories of color: red, blue, green.
Each category can be stored as one-hot encoding:
red = [1, 0, 0]
blue = [0, 1, 0]
green = [0, 0, 1]
Each of these can be stored in a compressed form. While for only 3 categories and batch size 1 this does not provide much benefit, for larger sizes there are gains to be had.
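
In Keras naming specifically, the "sparse" in sparse categorical cross-entropy refers to the label format: integer class indices instead of one-hot vectors. The loss value is the same either way. A plain-Python sketch with made-up probabilities:

```python
import math

probs = [0.1, 0.2, 0.7]    # hypothetical predicted probabilities for red, blue, green

one_hot_label = [0, 0, 1]  # "green" as a one-hot vector
index_label = 2            # "green" as an integer class index

# Categorical cross-entropy: sum over all classes of -y * log(p).
cce = -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

# Sparse categorical cross-entropy: look up the true class directly.
scce = -math.log(probs[index_label])

print(math.isclose(cce, scce))  # True
```

So the choice between the two follows from how the labels are stored, not from how imbalanced the classes are.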
inland zephyr
#

My classes are coded 0 and 1, and the proportion of class 0 to class 1 is 2:1. I'm using the sparse loss with 2 outputs at the Dense layer. With sparse, the accuracy is higher than with the dense one. Of course, with dense I only use 1 output, and with sparse I use 2.

iron basalt
inland zephyr
#

Its binary but i hot encode my class as 0 and 1 (in integer)

iron basalt
#

If it's binary classification there is no need for one-hot encoding.

#

It's just one output, true or false

inland zephyr
#

I.m sorry to clarify the last statement, I mean from the source data, the csv dataset, the class is set on 0 and 1 in integer (although class 0 and 1 are come from the different csv file)

iron basalt
#

So you have multiple classes and you have one-hot encoded each?

inland zephyr
#

yup

iron basalt
#

what is this "proportion" then?

inland zephyr
#

2:1

iron basalt
#

of what

inland zephyr
#

2 from class 0 and 1 from class 1

#

so if theres 14 class 0 so there would be 12 for class 1

iron basalt
#

I thought there was more than 2 classes.

inland zephyr
#

so the case is there is normal condition data which labeled as 1, and sick one as 0. There are 2 rows different between class 0 and 1 which class 0 bigger than 1 at proportion

iron basalt
#

What do you mean by class? Category? Sub-category? Or something else.

inland zephyr
#

its category

iron basalt
#

So the proportion is the amount of each class in the dataset (labelled)?

inland zephyr
#

there is only one dataset (combine of class 0 and 1 in separated CSV), which has 40 class 0 or sick and 36 class 1 or healthy

iron basalt
#

So it's binary classification.

#

Two categories, sick and not sick.

#

True/False

#

Use binary cross-entropy loss.

inland zephyr
#

thanks @iron basalt now i know how to do. But there is small issue... about the net i used and the class to feed the CNN. I read since i have two class, i need to define my net

            model = Sequential([
            InputLayer(input_shape=(f_leng,1)),
            Conv1D(filters= 128,kernel_size=3,activation='relu'),
            Conv1D(filters=128,kernel_size=5,activation='relu'),
            MaxPool1D(pool_size=10),
            Conv1D(filters= 256,kernel_size=3,activation='relu'),
            Conv1D(filters=256,kernel_size=5,activation='relu'),
            MaxPool1D(pool_size=2),
            Dropout(rate=0.3),
            Flatten(),
            Dense(2,activation='softmax')
        ])
            model.compile(loss = tf.keras.losses.BinaryCrossentropy(),optimizer='Adam', metrics=["accuracy"])

like this, since using Dense(1,activation='softmax') gives a bad result (it always fails to classify the 2nd class). But when I feed the labels with
trainY = trainY.reshape(trainY.shape[0], 1, 1) (since trainY is a 1D array),
it always gives the error ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1)). This error does not happen when I use sparse categorical.

austere swift
#

i'm assuming from the error that you have index labels so you need to use sparse categorical

austere swift
austere swift
inland zephyr
#

and my class is 0 and 1... which arranged in 1D array

austere swift
#

one hot would be [1, 0] or [0, 1]

inland zephyr
#

ow so thats the problem

#

is it okay if using sparse categorical
for index labels things?

iron basalt
austere swift
#

you can't use categorical with index labels anyways

iron basalt
#

You can either have 2 outputs with softmax, or 1 output with sigmoid, both should work. I prefer the sigmoid route because it's less computation being done.

#

For the 2 output softmax version, sparse can be used but you won't really gain anything from it.

#

As for keras API specific stuff, idk, I have not used it in a long time.

austere swift
#

yeah i was discussing it from the point of view that you don't wanna change the model, if you want to change the model to use sigmoid then you can change the loss function as well

iron basalt
#

To explain the difference between the two, with softmax you would get something like [0.3, 0.7] as output (probability of each class), and with sigmoid you would just get output of like [0.7]

#

You know the probability of the other class is just 1.0 - 0.7
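
A quick numeric check of that equivalence, as a sketch with a made-up logit. It also shows why a single-output softmax cannot work: softmax over one value is always 1.0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

z = 0.8  # hypothetical logit for class 1
p1 = sigmoid(z)
two_way = softmax([0.0, z])

print(math.isclose(two_way[1], p1))        # True
print(math.isclose(two_way[0], 1.0 - p1))  # True
print(softmax([z]))                        # [1.0]
```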

austere swift
#

in most cases people usually go the sigmoid route for binary stuff and softmax for more than that

inland zephyr
#

so for binary one, its pretty costly in performance with softmax since the sigmoid one are enough

iron basalt
#

Softmax is for when you have 3 or more because then it can give you something like [0.1, 0.2, 0.7]

austere swift
#

^

iron basalt
#

Sigmoid does not work for that

austere swift
iron basalt
#

Yeah, but when I say work, I mean also work well. A lot of things "work" in ML

austere swift
#

yeah

iron basalt
#

Technically you could have 3 sigmoid outputs and get away with it on simple stuff.

austere swift
#

because softmax will always have the results add up to 1, while sigmoid they will not

odd falcon
austere swift
#

so softmax will be like the probabilities of each class, which will add up to 1

odd falcon
#

it's define function train model

austere swift
inland zephyr
iron basalt
#

The classes are 0 and 1 so they are directly the targets in the case of using 1 sigmoid output.

#

They act as the probabilities

inland zephyr
#

and now it works, just waiting for the result...

iron basalt
#

probability of sick is 1 or 0

inland zephyr
#

the proba for sick is 0 and healthy is 1
so far it works with sigmoid and the sparse one

#

actually in this project i combine it with signal processing since the source data is an ECG record. I store it as CSV because it's the proper way to represent the data

inner pebble
#

Hello everyone,
I have a question regarding regex.
I'd like to check if a variable follows a certain pattern before converting it from str to date.
The pattern is this one:
2022-05-20 13:21:29

I can't manage to write the necessary regex to check it.
How would you proceed?
Thanks.

#

I tried this:
\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}

#

oops, never mind, the problem was not coming from my regex, it works 👍

velvet thorn
inner pebble
#

Because I have some bad strings which don't follow the correct format

#

I realised it when attempting to mutate str to date

#

but I wanted to extract a list of all the strings with the wrong format

#

Do you know a better way to proceed?

velvet thorn
#

with try-except
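
something like this, as a minimal sketch (the sample strings and the `split_by_format` helper are made up). It tries to parse each string and keeps the ones that fail:

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"  # matches strings like 2022-05-20 13:21:29

def split_by_format(values, fmt=FORMAT):
    """Return (parsed datetimes, strings that could not be parsed)."""
    parsed, bad = [], []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, fmt))
        except ValueError:
            bad.append(value)  # keep the offending string for inspection
    return parsed, bad

samples = ["2022-05-20 13:21:29", "20/05/2022", "2022-13-01 00:00:00"]
parsed, bad = split_by_format(samples)
print(bad)  # ['20/05/2022', '2022-13-01 00:00:00']
```

note the last sample would pass a \d{4}-\d{2}-\d{2}... regex but still isn't a real date (month 13), which is one reason to let the parser do the checking.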

inner pebble
#

I don't know this, I'm gonna check it 🙂 thanks

#

This is great. What if I use assert instead of a try-except? Would I see the incorrect format appearing?

chilly geyser
# sudden canyon Statistics question here. I was thinking about architecting a code running/code ...

Not sure if I should ping, but pinging anyway.
You can try
https://en.wikipedia.org/wiki/M/M/c_queue
If you assume times are Markovian,
else
https://en.wikipedia.org/wiki/M/G/k_queue
(for which you're less likely to get analytical solutions)
There's also G/M/k queues, which should relate to M/G/k queues (but don't ask me how)

G/G/k sounds very hopeless, you might want to approximate (IIRC - heavy traffic approximation might make the problem analytically easier), or even just use the raw M/M/k queues for approximation.

In queueing theory, a discipline within the mathematical theory of probability, the M/M/c queue (or Erlang–C model) is a multi-server queueing model. In Kendall's notation it describes a system where arrivals form a single queue and are governed by a Poisson process, there are c servers, and job service times are exponentially distributed. It is...

In queueing theory, a discipline within the mathematical theory of probability, an M/G/k queue is a queue model where arrivals are Markovian (modulated by a Poisson process), service times have a General distribution and there are k servers. The model name is written in Kendall's notation, and is an extension of the M/M/c queue, where service ti...

#

The problem is kind of different given that you have finite students, but infinite students with a distribution of visiting times might still be a good idea
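
For the plain M/M/c case, the standard Erlang C formula is easy to sketch. The rates below are made-up numbers, and this assumes Poisson arrivals and exponential service times, which may not hold for your setup:

```python
from math import factorial

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving job has to wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    utilisation = offered_load / servers        # rho = a / c
    if utilisation >= 1:
        raise ValueError("unstable queue: need arrival_rate < servers * service_rate")
    tail = offered_load ** servers / factorial(servers) / (1 - utilisation)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    return tail / (head + tail)

def mean_wait(arrival_rate, service_rate, servers):
    """Mean time a job spends waiting in the queue, excluding its own service."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)

# Hypothetical load: 1 submission per second, each worker finishes 1 job per second.
print(erlang_c(1.0, 1.0, 2))   # chance a submission has to queue with 2 workers
print(mean_wait(1.0, 1.0, 2))  # average queueing delay in seconds
```

You can then loop over `servers` to see how many workers you need before the waiting probability drops below whatever you find acceptable.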

vital widget
#

hi um, idk if this is the right place to ask but, how much python should i learn to be able to start learning AI/ML stuff?

austere swift
#

a basic understanding should be fine, you can learn more advanced stuff as you go on

undone flare
somber prism
#

can someone explain to me what the class_weight parameter is in sklearn clf algorithms

#

it has options like either None or balanced

undone flare
somber prism
#

i tried it on an imbalanced dataset, after changing the parameter to 'balanced' my f1 score dropped

undone flare
#

if your dataset is imbalanced try using a technique like SMOTE

somber prism
#

yeh going to try that now

undone flare
somber prism
undone flare
#

oh binary classification okay

somber prism
undone flare
#

well this param is only required if you have multiclass labels

undone flare
undone flare
#

then set it to weighted

#

also want the average?

#

bcuz that is what weighted will do

vital widget
grave frost
#

you can use class_weight for n classes, be it 2 or 10

#

it's useful for baselines if you don't want to do any augmentation

#

but its impact on accuracy is kinda inconsistent

#

try training once without the param and once with it to see if there is any difference in accuracy scores (setting the seed ofc) @somber prism if you don't see any major difference, use some augmentation
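
for reference, 'balanced' in sklearn weights each class by n_samples / (n_classes * class_count). A plain-Python sketch of that formula on made-up labels:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

y = [0] * 90 + [1] * 10  # made-up imbalanced labels: 90 negatives, 10 positives
weights = balanced_class_weights(y)
print(weights)  # the rare class gets the larger weight
```

so mistakes on the rare class cost more during training, which usually trades some precision for recall. That may be why f1 moves in either direction.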

undone flare
grave frost
lapis sequoia
#

where do i learn data science for free

carmine tide
#

Hello! I have some data for x and y axes and I also have their errors. The thing is that I can't find a way to fit a curve that takes into consideration both x and y errors. The function curve_fit only considers y error. I also tried using odr but it doesn't give me the correct curve. I have searched a lot but I can't find anything that works, so I would appreciate some help. Thanks in advance

chilly geyser
#

How do you know it's not giving you the 'correct curve'?

late shell
#

Hello, I have a dataset of X-ray images that fall under one of the classes : covid or non-covid. The assignment requires me to perform EDA on these images. Can someone help me with this. Except plotting the mean and S.D of the pixels of each image in a scatter plot, I don't know what kind of EDA can I run on images.

carmine tide
sudden canyon
#

thanks

frozen hound
#

This is the appropriate channel for TF questions I assume?

chilly geyser
chilly geyser
#

And like, if anyone is well-experienced and willing to answer

austere loom
#

Anyone using the new VSCode notebooks?

umbral ferry
#

So I know on xgboost, you can use a few different metrics to determine feature importance, but is there a way to determine the importance of a set of features? For example, could you pick a certain feature, and then look at all the features the algo decided to split on immediately after your picked feature, and compare the gain? Like say every time it split on "Color" and then "Size", the gain from both of them is larger on average than "Color" and then "Weight". And then you'd interpret that as those two features interacting with each other somehow

desert oar
#

well, if you add the averages it won't be quite the same

#

but you can sum the total_gains for each of the features, and sum the number of splits (called weight in xgboost) for each of the features, and divide to get an average

umbral ferry
#

I'm not sure if that would have the effect I'm looking for, which is determining the relationship between two (or more) features. I'm less concerned about the absolute value, and more about for this specific feature, which feature in combination is the most important

#

isn't total_gain/weight the same as just gain?

#

gain is the average gain per split (according to docs)
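
a tiny sketch of the difference, with made-up numbers shaped like the dicts I'd expect from `get_score(importance_type='total_gain')` and `get_score(importance_type='weight')`. For one feature total_gain / weight is just gain, but for a set of features the pooled average is not the sum of the per-feature averages:

```python
# Made-up importances; feature names follow the example above.
total_gain = {"Color": 120.0, "Size": 300.0, "Weight": 40.0}
weight = {"Color": 10, "Size": 15, "Weight": 8}  # number of splits per feature

def pooled_gain(features):
    """Average gain per split over every split that used any of the features."""
    return sum(total_gain[f] for f in features) / sum(weight[f] for f in features)

per_feature_gain = {f: total_gain[f] / weight[f] for f in total_gain}

print(per_feature_gain["Color"], per_feature_gain["Size"])  # 12.0 20.0
print(pooled_gain(["Color", "Size"]))                       # 420 / 25
```

this only pools the features though, it says nothing about whether they were split one right after the other, so it doesn't measure interaction by itself.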

undone flare
#

does anyone have a tutorial on image classification with machine learning (no CNN)

undone flare
#

thanks

chilly geyser
grave frost
undone flare
#

so I was trying to explore more

grave frost
undone flare
#

didn't wanna do CNN just yet

grave frost
#

I believe SVM with a bit of preprocessing does good

undone flare
#

I think it can be better

#

like in the 90+

grave frost
#

it's overkill - MLPs are better

undone flare
#

I am doing sign lang digits classification, is CNN overkill?

grave frost
#

no

#

I recommend you learn the basics of NNs first rather than jumping straight to CNNs

undone flare
#

I am

grave frost
#

well, you are using SVM in one and CNNs in another 🤷

undone flare
#

No?

#

ANN

grave frost
#

yes, calling it MLPs is maybe more accurate

#

"Artificial Neural Network" isn't very specific

undone flare
#

alright my bad

grave frost
#

cool - for images, CNNs are the only archs (bar some esoteric ones)

#

ViT is not something you would use unless you want to win or something

#

it would be overkill anyways. CNNs are always the best for images

sonic scaffold
#

I'm getting an error while making a box plot for my series

#

ValueError: The number of FixedLocator locations (2), usually from a call to set_ticks, does not match the number of ticklabels (1).

#

Can someone help pls

undone flare
#

I don't know why I did mae

vague stratus
#

Also I would suggest using log_loss available in sklearn as the loss function

undone flare
#

0-9 are digits

vague stratus
undone flare
vague stratus
#

The benefit: let's say the test set had only 30 8s and 40 9s, then without normalising the 9 cell will be brighter than the 8 cell even if the accuracy in both classes is the same

chilly geyser
vague stratus
# undone flare

you have normalised the pixel values, which you should; i am saying you could also normalise by the number of samples in each class when making the confusion matrix so that it's not misleading
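
a minimal sketch of what i mean, in plain Python with a made-up 2-class matrix (rows are true classes). If i remember right, sklearn's confusion_matrix also takes normalize='true' for the same thing:

```python
def normalise_rows(matrix):
    """Divide each row by its total so every true class sums to 1."""
    normalised = []
    for row in matrix:
        total = sum(row)
        normalised.append([count / total if total else 0.0 for count in row])
    return normalised

# Made-up counts: 30 eights and 40 nines, both classified 90% correctly.
cm = [[27, 3],
      [4, 36]]
cm_norm = normalise_rows(cm)
print(cm_norm)  # both diagonal entries come out as 0.9
```

now the diagonal shows per-class recall, so the 8 and 9 cells get the same colour.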

undone flare
#

oh how to do that?

desert oar
#

i think PCA + SVM is the "classic" MNIST solution

#

or something like preprocessing to binarize the data

#

i think you could do something similar with sign language

#

use some kind of image processing to extract an "outline"

#

then PCA + SVM or keep using RBF

vague stratus
#

Well we are avoiding CNNs so does that mean we are avoiding NNs too?

desert oar
#

true a boring feedforward NN could be in play

vague stratus
#

Because I have tried using autoencoders and then SVM; it works well too

desert oar
#

"let's pretend it's 2008 again"

undone flare
#

this is what I got

vague stratus
vague stratus
desert oar
undone flare
#

nope

desert oar
#

a linear SVM wouldn't be much better than ridge regression, if at all

undone flare
#

yea

undone flare
desert oar
#

use "SVM + RBF" in the graph so you don't forget (and other people don't wonder about it)

#

and yeah gradient boosting could be a good option

desert oar
#

i'm not surprised the random forest didn't do well... feature splitting doesn't make sense on pixels, they're too "specific"

#

you need to extract bigger features like with PCA

#

then you can try something like ensembling the SVM-RBF with the PCA+RF setup

#

this is why CNNs are so cool, they learn useful features from "fine-grained" data like this

#

and why deep learning works so well on this kind of data, where the data is very "high resolution"

undone flare
#

I was going to learn about CNNs, think this is the right time?

quiet vault
#

ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (None, 12)
X.shape = (118, 12, 1)
Y.shape = (118,)
The input shape of the model is input_shape=(12, 1)
Does anyone know why I am getting this error?

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv1D, Dense, Flatten, MaxPooling1D

n_steps, n_features = 12, 1  # matches X.shape[1:] above

model = Sequential()
model.add(Conv1D(128, 7, activation='relu', input_shape=(n_steps, n_features)))
model.add(Conv1D(128, 7, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=3, padding='same'))
model.add(Conv1D(256, 5, padding='same'))
model.add(Conv1D(256, 5, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=3, padding='same'))
model.add(Conv1D(512, 3, padding='same'))
model.add(Conv1D(512, 3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=3, padding='same'))
model.add(Conv1D(512, 1, padding='same'))
model.add(Conv1D(512, 1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=3, padding='same'))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Here is the code for the model
can someone help

grave frost
unborn glacier
# quiet vault ValueError: Input 0 of layer sequential is incompatible with the layer: : expect...

You need to define the input shape with one of these methods, I think

import numpy as np
import tensorflow as tf

# With explicit InputLayer.
model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(4,)),
  tf.keras.layers.Dense(8)])
model.compile(tf.optimizers.RMSprop(0.001), loss='mse')
model.fit(np.zeros((10, 4)),
          np.ones((10, 8)))

# Without InputLayer and let the first layer to have the input_shape.
# Keras will add a input for the model behind the scene.
model = tf.keras.Sequential([
  tf.keras.layers.Dense(8, input_shape=(4,))])
model.compile(tf.optimizers.RMSprop(0.001), loss='mse')
model.fit(np.zeros((10, 4)),
          np.ones((10, 8)))

#

That's just an example not the exact code

quiet vault
#

i fixed

#

it was weird

#

thanks tho

grave frost
#

most probably you put the wrong shapes

quiet vault
#

maybe

glad mulch
#

question: i have data that looks like this. When i convert it to datetime it's in the format Year-01-01 but i want it to be Year-12-31, how would i go about doing that?
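
one way to sketch it with just the standard library, assuming the column only holds years (the values here are made up): build the date directly instead of parsing and then shifting it.

```python
from datetime import date

def year_end(year):
    """Return December 31 of the given year (accepts an int or a numeric string)."""
    return date(int(year), 12, 31)

years = [2018, 2019, "2020"]  # made-up values standing in for the year column
print([year_end(y).isoformat() for y in years])
# ['2018-12-31', '2019-12-31', '2020-12-31']
```

in pandas i believe adding pd.offsets.YearEnd(0) to the parsed datetimes does the same job, but i haven't checked it against this data.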

iron basalt
#

Note that MNIST is a trivial task and does not at all represent a real computer vision task.

#

Just about anything will work on it. Also some digits are mislabeled so keep that in mind. Don't expect 100% accuracy ever.

#

Because it is so simple, it does make for a good bug check.

#

Fashion MNIST and other datasets that use the MNIST name are more of a real task and you will notice the large drop in accuracy.

glad mulch
#

my solution to above for anyone who cares

rigid zodiac
#

Hey guys, sorry to keep asking this type of question. So this is what I have
```
import numpy as np

def settles_after_spike(c, i, other):
    # Row i spikes on 'ay' and `other`, row i+1 is lower, rows i+2..i+7 stay under 20.
    ay, ot = c['ay'], c[other]
    return (abs(ay.iloc[i]) >= 50 and abs(ot.iloc[i]) >= 70
            and ay.iloc[i + 1] < abs(ay.iloc[i])
            and ot.iloc[i + 1] < abs(ot.iloc[i])
            and all(abs(ay.iloc[i + k]) < 20 and abs(ot.iloc[i + k]) < 20
                    for k in range(2, 8)))

c['cat'] = np.nan
cat = c.columns.get_loc('cat')
# Stop 7 rows early so the i+7 look-ahead stays in range; those rows keep NaN.
for i in range(len(c) - 7):
    hit = settles_after_spike(c, i, 'az') or settles_after_spike(c, i, 'ax')
    c.iloc[i, cat] = 1 if hit else 0
```

How can I also set c['cat'].iloc[i+1] and so on up to i+7 to 1?
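
One possible approach, sketched on a plain list so it runs without the dataframe (`mark_windows` and the sample flags are made up): first compute the 0/1 hits as before, then spread each hit over the next rows in a second pass. Doing it in the same loop would not work, because the else branch of a later iteration would overwrite the 1s with 0.

```python
def mark_windows(hits, window=8):
    """Given 0/1 hit flags, also mark the window-1 rows after each hit."""
    marked = [0] * len(hits)
    for i, hit in enumerate(hits):
        if hit:
            end = min(i + window, len(hits))  # don't run past the last row
            marked[i:end] = [1] * (end - i)
    return marked

hits = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # made-up: one detection at row 1
print(mark_windows(hits))  # rows 1..8 become 1, the rest stay 0
```

On the dataframe the same slice idea should be c.iloc[i:i+8, c.columns.get_loc('cat')] = 1 for each hit row i.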

umbral ferry
#

so my model is fitting really really well to the training data, but it's also fitting well to the test data (RMSE on train is 1, on test is 6), is that ok? Or do I want to reduce how well it fits the training data?

modest mulch
#

Hiya, does anyone know the simplest way to model a "gesture is not among known gestures" label in hand gesture recognition? A threshold is one way but it's not really that robust. I thought of estimating uncertainty using Bayesian neural networks but was wondering if there happened to be a simpler, fairly robust method?
Would using sigmoid instead of softmax for the last layer help? If all the probabilities are less than 0.5, it means either there was no gesture, or the gesture is not known well enough?
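
To make the idea concrete, a minimal sketch of the decision rule with per-class sigmoid outputs (the gesture names, scores and the 0.5 cut-off are made up). It is still a threshold, so I wouldn't assume it is robust: a network can be confidently wrong on inputs it has never seen.

```python
def predict_gesture(probs, names, threshold=0.5):
    """Return the best-scoring gesture, or 'unknown' if no score reaches the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return names[best] if probs[best] >= threshold else "unknown"

names = ["fist", "palm", "peace"]  # hypothetical gesture labels
print(predict_gesture([0.10, 0.92, 0.30], names))  # palm
print(predict_gesture([0.20, 0.35, 0.10], names))  # unknown
```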

modest mulch
#

I would have to modify the dataset, and retrain the model

hasty mountain
#

Hey guys, I want to read a video using matplotlib.image, can someone give me an idea on how to do that?
I've tried using imageio, which can open a reader and then iterate through it to get the frames as arrays of pixels. However, I gave up on this library because it doesn't return a proper array that I can use in my algorithms.

Here's the code I've used so far:

import imageio

data = imageio.get_reader(r'video_sample.mp4', 'ffmpeg')

# enumerate yields (frame index, frame pixels) for each frame in the video
for frame, rgb in enumerate(data):
    X = rgb
    y = frame

I'm out of ideas on how to iterate through a video using matplotlib.image

velvet thorn
#

because each successive learner increases variance and decreases bias

#

if you start with strong learners you’re probably going to get mad overfitting

grave frost
grave frost
#

it simply wraps each item you are iterating over in a tuple, with its count as the first elem

#

at least, that's what I understand ¯_(ツ)_/¯

hasty mountain
velvet thorn
#

ah okay

#

wait hold up

velvet thorn
#

which you can use as if it were a numpy array

grave frost
#

does that provide any useful information?

velvet thorn
grave frost
hasty mountain
grave frost
#

can you print the shapes?

velvet thorn
#

np.asarray

velvet thorn
grave frost
velvet thorn
#

🙏

hasty mountain