#data-science-and-ml


hearty tusk
#

are your scores maybe below 0.7, since you are setting the y limits to 0.7-1.01?

icy pine
#

Hello, fellow coders.

I'm putting together a team of python users to make a downloadable AI assistant (kind of like Siri, Cortana or Alexa) that you can download on your computer. All in python.

I think this isn't a one-man project so I need some team members. Please contact me if you have experience regarding this area (I'm new to this but I'm a fast learner) or if you have any questions. I'm very new to this but It's a project I definitely want to undertake because it seems overall like a fun project, especially since I'm only a teen.

What I'm expecting or hoping for the final result to be (I will update it, fix it, and add more features as we go too) I'm trying to make it able to tell weather, time, math calculations, mini-games, looking on the web, youtube music, and recent news, all using voice commands and speaking in voice that should sound somewhat natural. I'm also trying to make some sort of machine learning so the AI can learn more about you and slightly change its questions and statements to fit your personality.

If you think this is impossible or I'm having high hopes and I am a complete idiot, please feel free to tell me, since I'm open to judgement and improvement.

You can DM me at DarkMist#0074.

Note: I'm not offering payment of any kind or anything. I am just hoping that this will be a fun experience to everyone and a wonderful project. I will make like a poster of everyone in the team with their names and contribution and everything to kind of honor them and thank them for their help. This is a TEAM, by the way, not a company or a giant corporation, so I will probably accept a max of 15 members or so.

Thank you for reading. It should have taken a ton of time unless you are Mr Howard Berg. Let me know if you have questions!

DarkMist

arctic wedgeBOT
#

6. Do not post unapproved advertising.

grave frost
#

and there are multiple projects that have made similar things, check them out too

quasi sparrow
#

Does anybody know of a workaround to train a model for XGBoost regression on multi-output?

#

The library XGBoost currently does not support multi-output regression.

grave frost
#

nvm they dont have it either

#

is it for a kaggle comp?

grizzled barn
#

Anyone here involved with AI projects and know a good place to start? Not necessarily learning about what it is, but how to actually make projects involving it.

serene scaffold
grizzled barn
lapis sequoia
#

I have a function like the following

def myfunc(c, h, alpha, beta, delta):
    # perform some calculations
    return s, t, x, y, z

where input parameters are

c = 0.53
h = 0.07
alpha = 0.6
beta = 1
delta = 0.8

The alpha, beta, delta inputs are initial values in the range from 0 to 1. I would like to adjust these input values such that the outputs s, t, and the sum of x, y, z are close to some values such as

s = 0.34
t = 0.20
sum(x, y, z) = 0.45

Is there an optimization function in SciPy or other Python package that would do something like this?

chilly geyser
#

If you can shift everything to a single objective then yes, I think you can use one of the ready-made ones

#

If you are doing multi-objective optimization, I'm not too sure if any are ready-made

tidal bough
#

try just minimizing something like:

c = 0.53
h = 0.07
def cost(alpha, beta, delta):
    s, t, x, y, z = myfunc(c, h, alpha, beta, delta)
    return (s-0.34)**2 + (t-0.20)**2 + (0.45 - (x+y+z))**2

with scipy.optimize for a starter

#

I think it even has multiobjective ones

lapis sequoia
#

@tidal bough So something like this:

from scipy.optimize import minimize
c = 0.53
h = 0.07
def cost(params):
    # minimize passes the parameters as a single array, so unpack it here
    alpha, beta, delta = params
    s, t, x, y, z = myfunc(c, h, alpha, beta, delta)
    return (s-0.34)**2 + (t-0.20)**2 + (0.45 - (x+y+z))**2
x0 = [0.6, 1, 0.8]
res = minimize(cost, x0, method='Nelder-Mead', tol=1e-6)
tidal bough
#

Yeah, basically
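
Since alpha, beta and delta are supposed to stay between 0 and 1, a bounded method may be a better fit than plain Nelder-Mead. A minimal sketch, assuming a made-up `myfunc` (the real one isn't shown, so this stand-in is purely hypothetical) and the `L-BFGS-B` method:

```python
import numpy as np
from scipy.optimize import minimize

c, h = 0.53, 0.07

def myfunc(c, h, alpha, beta, delta):
    # Hypothetical stand-in for the real calculation, for illustration only
    s = c * alpha
    t = h + 0.2 * beta * delta
    x, y, z = 0.1 * alpha, 0.2 * beta, 0.3 * delta
    return s, t, x, y, z

def cost(params):
    # minimize() hands over all parameters as one array
    alpha, beta, delta = params
    s, t, x, y, z = myfunc(c, h, alpha, beta, delta)
    return (s - 0.34) ** 2 + (t - 0.20) ** 2 + (0.45 - (x + y + z)) ** 2

x0 = np.array([0.6, 1.0, 0.8])
bounds = [(0.0, 1.0)] * 3  # keep alpha, beta, delta inside [0, 1]
res = minimize(cost, x0, method="L-BFGS-B", bounds=bounds)
print(res.x, res.fun)
```

If the three targets matter unequally, multiplying each squared term by a weight is the usual tweak.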

radiant kayak
#

Hi

serene scaffold
rancid widget
real torrent
#

Is it possible to change the location of the origin in a 2D Matplotlib plot?

velvet thorn
#

if you mean in the mathematical sense

#

look into ax.axis/plt.axis

stoic hill
#

hello guys, can anyone help me? i am trying to run code in google colab on a dataset containing 29 lakh (2.9 million) records and fit a random forest classifier to it. when i run it, my session crashes with a memory error, even though i am running it with 16GB ram and a GPU. any way to run it?

ornate jasper
#

Hi

arctic wedgeBOT
#

Hey @stoic hill!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

austere swift
#

send the colab link

stoic hill
#

yeh ok

#

when i try to fit it crashes

austere swift
#

random forest doesnt use gpu btw

#

so try using a non gpu session with maybe more ram

stoic hill
#

ik even on cpu with 16gb of ram it crashes

#

i think the only problem is the ram

#

and i dont have that;(

austere swift
#

i have a server with more than enough ram to run that, and it's not currently loaded with much, so if you send me the data i can run it

#

if you're ok with that

stoic hill
#

sure i just want the model

#

file

late shell
#

Hello, I'm just getting started with CNN, and I was wondering what would happen if I throw in a (greyscale) image (flattened into an array) to normal feed forward NN. Wouldn't the network learn the weights to classify the image or what would happen?

cinder barn
#

what IDE do y'all use for tensorflow?

#

anyone use vscode here?

chilly geyser
#

A lot of people use vsc

serene scaffold
cinder barn
#

Anybody know why I get this error when I run the code

serene scaffold
#

One of the skills you'll develop is identifying the salient part of error messages. Often you can find an exact solution by googling the salient part.

dire echo
#

pip install tensorflow

Required satisfaction

Import tensorflow

ERROR: 
Module "Tensorflow" not found
#

And thats why i use mobile ide ;D

cinder barn
#

Yea I wish installing tensorflow was miles easier

#

It's a roadblock that prevents many beginners from starting to learn

dire echo
#

My favorite AI module is

#

Random

#

;|

#

Lol

dire echo
#

I mean is google sooo

hoary wigeon
#

i need help

#

This is first time I'm training LogisticRegression Model over 0.95*1.6 Million rows and 0.5 Million columns of data with penalty='elasticnet', l1_ratio=0.5, solver=saga

#

How much time it can take ?

#

it already took 4 hr and is still in progress, i want to track progress to see if it is really working or just stuck...

lament halo
#

What's up! Does anyone know quant trading?

long shard
#

Correlation matrix captures linear relation between 2 features in a dataset. how to capture non linear relations between features? And how to address/eliminate them?
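
One common first step is a rank correlation. A small sketch on synthetic data (the column names and the `exp` relationship are made up for illustration), comparing Pearson with Spearman via `DataFrame.corr`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=500)
df = pd.DataFrame({"x": x, "y": np.exp(x)})  # monotonic but non-linear

# Pearson measures linear association, Spearman works on ranks
pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]
print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

Spearman only picks up monotonic relationships. For something like a U-shape, a mutual information estimate (for example `sklearn.feature_selection.mutual_info_regression`) is probably the better tool.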

dull turtle
#

hello

#

how can i separate a pandas dataframe

#
```
                  4
0  02-03-2020 09:19
1  02-03-2020 09:20
2  02-03-2020 09:20
3  02-03-2020 09:21
4  02-03-2020 09:21
5  02-03-2020 09:22
6  02-03-2020 09:22
7  02-03-2020 09:23
8  02-03-2020 09:23
```
#

how can i separate the date and time in the above data?
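
One way to do this is to parse the strings with `pd.to_datetime` and then use the `.dt` accessor. A minimal sketch, assuming the column is labelled `4` as in the printout and that the strings are day-month-year (that ordering is a guess):

```python
import pandas as pd

df = pd.DataFrame({4: ["02-03-2020 09:19", "02-03-2020 09:20", "02-03-2020 09:21"]})

# Assumption: day-month-year; swap %d and %m if the data is month-first
ts = pd.to_datetime(df[4], format="%d-%m-%Y %H:%M")
df["date"] = ts.dt.date  # datetime.date objects
df["time"] = ts.dt.time  # datetime.time objects
print(df)
```

If plain strings are enough, `df[4].str.split(" ", expand=True)` gives two text columns without any parsing.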

grave breach
#

you just have to get cuda up and running

granite star
#

hello

#

I got a dataset from Kaggle about water potability and the data set gives potability as 0 or 1 (like True or False)

#

it has these attributes: ph Hardness Solids Chloramines Sulfate Conductivity Organic_carbon Trihalomethanes Turbidity, and I wonder how I can apply multiple linear regression to it

#

actually it doesnt have to be linear regression

#

but i think i have trouble with 0 or 1 value of potability. When i try to apply multiple linear regression to it, it gives me absurd results

granite star
#

I am very new to data science and thank you in advance for your help

#

oh I found the solution. I think I must use binary classification not regression 😅

ocean swallow
#

are there any memory leaks in numpy? My pipeline data is made up of 40 images in total and about 1000 objects that hold array views of those 40 images, about 100 MB in total as JPEG. My memory consumption increases dramatically compared to how much it should actually be (10-14 gigabytes of pipeline data; with only the libraries loaded it is about 3 GB).

#

aren't numpy images memory efficient since it uses view

young valve
#

Hi, is it okay to conduct an elbow test using a data frame with no normalized variables (i.e all dummies), or is it better to include the data frame with normalized variables?

#

Would there be any significant differences? Thank you

lapis sequoia
#

Hi

#

I have made a Jarvis like
To make my things easier like opening an app or listening to songs

#

Voice recognition

#

Can someone help me to put apps

#

def JARVIS(self):
    wish()
    while True:
        self.query = self.STT()
        if 'good bye' in self.query:
            sys.exit()
        elif 'open google' in self.query:
            webbrowser.open('www.google.co.in')
            speak("opening google")
        elif 'open youtube' in self.query:
            webbrowser.open("www.youtube.com")
        elif 'play music' in self.query:
            speak("playing music from pc")
            self.music_dir = "./music"
            self.musics = os.listdir(self.music_dir)
            os.startfile(os.path.join(self.music_dir, self.musics[0]))

#

How do I put spotify or any other app

quasi sparrow
#

Does anybody know of a good example on sklearn.model_selection.TimeSeriesSplit?

#

I'm trying to use this method along with a model

grave breach
remote fossil
#

when comparing models should you keep hyperparameters the same or optimise for each

ancient fog
#

help

#

i need help

#

HOW DO I DO THIS 8-PUZZLE THING

quasi sparrow
#

Sorry I was not clear. I'm working on a boosted trees model and using XGBoost implementation for Python.
XGBoost can only predict one target, so I'm using scikit_learn multioutput regression as a wrapper to train a model with 3 target outputs.

multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(max_depth=3, n_estimators=100, n_jobs=2,
                           objective='reg:squarederror', booster='gbtree',
                           random_state=42, learning_rate=0.05)).fit(x_train, y_train)

Time series must be validated using walk forward validation . I want to use the scikit implementation on my problem but I can't find a good example online on how to implement this validation on my model.
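
A rough sketch of how `TimeSeriesSplit` can drive walk-forward validation around a `MultiOutputRegressor`. The data here is synthetic, and `GradientBoostingRegressor` is only a stand-in so the example runs without XGBoost installed; swapping in `xgb.XGBRegressor` should work the same way:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                    # synthetic features, oldest row first
Y = X[:, :3] + 0.1 * rng.normal(size=(200, 3))   # three synthetic targets

tscv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, test_idx in tscv.split(X):
    # every training fold ends before its test fold starts
    model = MultiOutputRegressor(
        GradientBoostingRegressor(max_depth=3, n_estimators=100,
                                  learning_rate=0.05, random_state=42)
    )
    model.fit(X[train_idx], Y[train_idx])
    pred = model.predict(X[test_idx])
    scores.append(mean_squared_error(Y[test_idx], pred))

print(scores)
```

The rows have to be sorted by time before splitting, since the splitter only looks at row order.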

cinder barn
#

I stg I've been trying to setup tensorflow for over a year

#

Tomorrow is the day I will finally finish

grave breach
#

the problem is with cuda

#

tensorflow is set up

grave frost
#

look it up for a better explanation

slate hollow
#

doesn't the normal maxpool also do it over the whole input

blissful nymph
#

can someone do me a favor and translate this keras nn to pytorch?

model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))
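
A possible PyTorch equivalent, as a sketch only. `n_features` and `n_classes` are placeholder values standing in for `len(train_x[0])` and `len(train_y[0])`, which aren't shown here:

```python
import torch
from torch import nn

n_features = 20  # placeholder for len(train_x[0])
n_classes = 5    # placeholder for len(train_y[0])

model = nn.Sequential(
    nn.Linear(n_features, 128),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(64, n_classes),
    nn.Softmax(dim=1),  # mirrors the Keras softmax output layer
)

model.eval()  # switch dropout off for inference
with torch.no_grad():
    probs = model(torch.randn(4, n_features))
print(probs.shape)
```

When training with `nn.CrossEntropyLoss`, the final `nn.Softmax` layer is normally left out, because that loss expects raw logits.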
dire echo
#

!e

import random as r
print(r.randint(1, 100))
#what come out will be decision
arctic wedgeBOT
#

@dire echo :white_check_mark: Your eval job has completed with return code 0.

75
dire echo
#

there, the most simple AI i can think of

austere swift
#

ai involves intelligence

#

that's random

#

lol

#

technically the simplest ai you can do is using one parameter, so linear regression

austere swift
#

why not keep it in keras?

arctic wedgeBOT
#

Hey @young valve!

It looks like you tried to attach file type(s) that we do not allow (.html). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

proven sigil
#

Pyspark:

from pyspark.sql import functions as F

def remove_null_columns(df, label_col, null_threshold=0.8):
    total_rows = df.count()
    cols_to_drop = []
    for c in df.columns:
        if c == label_col:
            continue
            
        null_values = df.select(F.count(F.when(F.isnull(c), c)).alias(c)).collect()[0][0]
        if null_values / total_rows > null_threshold:
            cols_to_drop.append(c)
    
    df = df.drop(*cols_to_drop)
    return df

df = remove_null_columns(df, label_col)
print(len(df.columns))

I'm removing columns which have more than 80% null values. How to optimise this code?

velvet thorn
#

this part

#

is the problem

#

you're calling collect once per column

#

you should write a query that selects the null percentage for all columns, filters out the ones that fall above the threshold, and then collect that

#

then drop

#

alternatively you can write it as a select but that's a bit more complex

#

I wouldn't recommend that

proven sigil
#

Thank you so much!

velvet thorn
#

yw 👋

cinder barn
#

Why doesn't it print the chart

brazen jackal
cinder barn
#

omg

cinder barn
uncut barn
#

does anyone know how to open NDPI files in python?

indigo skiff
#

hey guys anyone familiar with text generation pipelines? Need quick help to understand it more

desert oar
#

can you be more specific @indigo skiff ?

desert oar
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

weak sentinel
#

Does anyone know how efficiency of pd.cumsum() scales as compared to np.cumsum()

#

I found someone on StackOverflow saying pandas was faster, but my understanding was that pandas is just a layer built on top of numpy

chilly geyser
#

I think Numpy is likely to be faster, but actually, just test it out

weak sentinel
#

My dataset I'm currently developing with is really small, but when this goes into prod and uses live data it's going to be thousands of rows

#

I'll try a test though

#

Do you suggest %timeit?

chilly geyser
#

Thousands of rows is generally not performance critical to me, unless you are doing >quadratic stuff

#

But well, you can make fake data with 50000 rows and see

grand lion
#

How would I plot data on a United States Map by coordinate?

chilly geyser
#

!e

from timeit import repeat
setup=(
"""
from numpy.random import default_rng
from pandas import DataFrame
x = default_rng().standard_normal(size=(30000, 20))
df = DataFrame(x)
"""
)
print(repeat("x.cumsum(axis=0)", setup, number=10, repeat=5))
print(repeat("df.cumsum()", setup, number=10, repeat=5))
arctic wedgeBOT
#

@chilly geyser :white_check_mark: Your eval job has completed with return code 0.

001 | [0.19380801357328892, 0.1688365377485752, 0.17779459105804563, 0.20028469525277615, 0.2744292030110955]
002 | [0.2766013741493225, 0.28942475002259016, 0.25081070279702544, 0.2581129721365869, 0.2731569781899452]
chilly geyser
#

@weak sentinel Minibot 'benchmark' seems to say np is slightly faster.
I also tested with colab, with numpy being also slightly faster

chilly geyser
#

I further tested with C++ with compiler optimizations - it's a lot faster if you go that route, so there's that if what you're doing is somehow performance critical

indigo skiff
cinder barn
visual heart
#

Hello

grave frost
timber skiff
#

Hey, anyone know why my OLS trend is so whacky on my plotly scatter plots?

#

Like, in mid-April it jumps up when all of the observations were actually below average

blissful nymph
quasi sparrow
#

What's the incentive of publishing articles on medium?

#

"towards data science" blog, to be more specific

blissful nymph
#

@quasi sparrow money and probably reputation

desert oar
#

TDS has a lot of clout nowadays

#

lots of people subscribe to it

#

otherwise, medium is just a blogging platform with some social media elements

quasi sparrow
#

Yeah, it's hard to navigate TDS. Most of the examples are toy programs.

candid wraith
#

hey how do i get time.sleep(60) to stop all functions and read the script logically so i can make my script stop where i want it to ?

austere swift
blissful nymph
#

Dunno i use pytorch quite a bit

#

not tensorflow as much

austere swift
#

that's like asking someone to help move your stuff into a new house because your old house has a leak

#

lol

blissful nymph
#

true

austere swift
#

what's the issue with tensorflow?

#

like what error do you get when you run it

blissful nymph
#

2021-08-02 10:20:52.830264: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-08-02 10:20:52.830756: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

austere swift
#

did you install cuda

wooden rapids
#

hey guys any references to intuitively understand json/html/css parsing

#

so far its a lot of brute force try and try again

austere swift
#

there's already prebuilt modules in python for html and json parsing

#

also, that doesn't really sound like it would belong in this channel

wooden rapids
#

oh sorry - which channel is more appropriate? for context im in a data science course so i thought that this is something a lot of native users could speak about

cinder barn
#

I gave up

#

and just used google colab

austere swift
cinder barn
#

so much easier

cinder barn
austere swift
#

yeah cuda 11.2

#

that's the latest version tensorflow accepts

cinder barn
#

I downloaded 11.4

austere swift
#

there's your issue :)

cinder barn
#

thanks bro

#

Could you send me the link to download 11.2

#

I could only find 11.4

austere swift
#

you have to go to the archives to see it

austere swift
#

the update 1 part doesn't matter

sterile prawn
#

Some lstm generated tweets:

#

Heya really tweeting are understanding upon with are you and of it you here and.
I am murder of rogers time as shorter with burgers and with me and here once blahhh.
Stop crazy twitter very hit turkey little homemade turkey upset his food haircut.
Mite goodmorning because roof with development yay amp twittering a his paparazzi.
Willieday tweet guys cousin are ian getting hopes and i im gonna for with so im here yet xxx.
Even sooo the shame create home food visit hit your massive myself starbucks.
Dats sounded yay hence hopes proud brit of ease you movies pain like you they are on at tomorrow.
Lauren hate with bugs wouldnt yet doing word and there do to about that and cya.
Iranelection alot the rumor recorded torture approach printed are of it even love and he are around yay.
Httptwitpic doing jenna and does perfect news line political newcastle while going on your song and in proud dani

#

made with an LSTM autoencoder and dcgan

serene scaffold
#

do you have the source code? I'd love to see it.

sterile prawn
#

totes this is more for learning

#

about gans and seq2seq autoencoders

#

i'd fine tune gpt-2 if i was like going for quality tweets

serene scaffold
sterile prawn
#

and the gan is literally the keras dcgan

#

just plugged into the latent space

sterile prawn
#

but gpt is a bit overbearing rn

serene scaffold
#

the problem is that there's more to NLP than generating text.

sterile prawn
#

ofc

#

this is legit just text gen which is what gpt-2 is FOR

#

so ofc it would work

serene scaffold
#

right. and it's still very interesting lemon_hyperpleased

sterile prawn
sterile prawn
#

it isn't AGI for NLP

serene scaffold
#

what is AGI?

sterile prawn
#

artificial general intelligence

chilly geyser
#

I don't even know how it's so high up when stackoverflow is even better, especially when the result is quite pertinent

cinder barn
#
# Imports
import cv2
print("imported cv2")

# Loading pre-trained data
trainedFaceData = cv2.CascadeClassifier('FaceDetection/haarcascade_frontalface_default.xml')
print("loaded pre-trained data")

# launch webcam
webcam = cv2.VideoCapture(1)
print("Webcam launched")

# loop all frames
while True:
    sucessFrameRead, frame = webcam.read()
    if not sucessFrameRead:
        break  # stop when no frame could be read from the webcam
    # Converting to grayscale
    grayscaleImg = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    print("Converted to grayscale")
    # Detecting faces
    faceCoordinates= trainedFaceData.detectMultiScale(grayscaleImg)

    # Print location of face
    print(faceCoordinates)

    # Draw rectangle around face
    for (x, y, w, h) in faceCoordinates:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 1)


    # show image (window name, and what you want to show)
    cv2.imshow('Face Detector app', frame)
    print("Displaying image")
    
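    # wait 1 ms for a key press and stop the loop when any key is pressed
    if cv2.waitKey(1) != -1:
        break

# release the webcam and close the window
webcam.release()
cv2.destroyAllWindows()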
grave frost
#

plus they are SOTA, so its hard to argue with that ¯\_(ツ)_/¯

sterile prawn
#

yea i only use lstms cuz they are easier

pearl violet
#

how about music and AI, there is someone very good at it here?

serene scaffold
sterile prawn
#

very badly

#

but basically

#

encode music as text -> generate text w/ lstm -> decode bacc to musak

#

carykh has great vid on it

proven sigil
# velvet thorn you should write a query that selects the null percentage for all columns, filte...
def remove_null_columns(df, label_col, null_threshold=config['null_threshold']):
    all_features = df.columns
    if label_col in all_features:
        all_features.remove(label_col)
    df.createOrReplaceTempView('remove_null_columns')
    query = 'select ' + ', '.join(['count(`%s`) * 1.0 / count(*) as `%s`'%(i, i) \
                                   for i in all_features]) + ' from remove_null_columns'
    non_null_count = spark.sql(query).collect()[0].asDict()
    columns_to_drop = [k for k, v in non_null_count.items() if (1 - v) > null_threshold]
    
    df = df.drop(*columns_to_drop)
    return df

new_df = remove_null_columns(df, label_col, 0.1)
print(len(new_df.columns))

Does this look good?

#

It runs much faster compared to earlier. Should I change anything further?

proven sigil
#

I don't understand

#

Got it

#

Updated

serene scaffold
velvet thorn
#

I would suggest

#

you refrain from writing your own SQL

#

you can do that with the Spark DSL

#

but that's not a huge problem

velvet thorn
serene scaffold
#
df.loc[:, df.isna().sum() / len(df) < THRESHOLD]
#

yes?

proven sigil
#

haha

desert oar
#

@velvet thorn i imagine the sql version would be a lot faster, no?

#

otherwise you end up doing a for loop over columns with a collect-ing operation (count) in each iteration of the loop

#

or is there some pyspark magic i don't know about

velvet thorn
#

what they did originally

#

and I said

#

do it once, with one collect

#

it's been like a 2 years since I worked with PySpark

#

but

#

you can defo express it with the Spark DSL

desert oar
#

how would you do that with the DSL? you can't count without "collecting"

#

maybe the scala version lets you do some map/filter stuff over columns

velvet thorn
#

like you write a query that counts the nulls in each column, collect that, then drop

#

not sure if I'm expressing myself properly

desert oar
#

oh, i see

#

yeah easy enough

velvet thorn
#

df.select((F.count(F.isnull(F.col(col)) / len(df) < 0.8).alias(col) for col in df.columns)?

#

something like that?

#

I don't really remember but that should work

#

wait as is reserved in Python right

desert oar
#

yeah it's .alias in pyspark

velvet thorn
desert oar
#

F.count

velvet thorn
#

but it's been a LONG time since I did any sort of Spark

desert oar
#

that was it

velvet thorn
#

bet it's like version 2.8 already

#

3.2?

desert oar
#

yeah 3.something

velvet thorn
#

3.1.2

#

wow

desert oar
#
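from pyspark.sql import functions as F

def null_frac(df, colname):
    # fraction of null rows: mean of a 0/1 "is null" indicator
    return F.mean(df[colname].isNull().cast("int"))

# one select over all columns, so only a single job is collected
col_null_fracs = df.select(
    *(null_frac(df, c).alias(c) for c in df.columns)
).first()

bad_columns = [c for c, f in col_null_fracs.asDict().items() if f > 0.8]
df = df.drop(*bad_columns)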
#

@proven sigil ☝️

#

it's probably good to stay fresh on pyspark

#

i haven't used it in over a year

dire echo
#

Natural language...

#

can we use unused human brain as cpu, LOL

#

It also come with free big hardrive and built in learning module

halcyon vale
#

Tokenization: Subword Tokenization splits words into smaller parts based on the most commonly occurring sub strings. Word Tokenization splits a sentence on spaces as well as applying language specific rules to try to separate parts of meaning even when there are no spaces. Subword Tokenization provides a way to easily scale between character tokenization i.e. using a small subword vocab and word tokenization i.e using a large subword vocab and handles every human language without needing language specific algorithms to be developed. On my Journey of Machine Learning and Deep Learning, I have read and implemented from the book Deep Learning for Coders with Fastai and PyTorch. Here, I have read about Word Tokenization, Subword Tokenization, Setup Method, Vocabulary, Numericalization with Fastai, Embedding Matrices and few more topics related to the same from here. I have presented the implementation of Subword Tokenization and Numericalization using Fastai and PyTorch here in the snapshot. I hope you will gain some insights and work on the same. I hope you will also spend some time learning the topics from the Book mentioned below. Excited about the days ahead !!
https://www.linkedin.com/posts/thinam-tamang-3b12831a2_300daysofdata-66daysofdata-machinelearning-activity-6828204194089041920-gyVw

๐Ÿ† Day 233 ofย #300DaysOfData!

๐Ÿ“‘ Tokenization :
Subword Tokenizationย splits words into smaller parts based on the most commonly occurring sub strings...

arctic wedgeBOT
#

@raven steeple Please don't try to ping @everyone or @here. Your message has been removed. If you believe this was a mistake, please let staff know!

ebon rock
#

Hey folks! I am a data science aspirant and I have been learning SQL from the past month for a data analyst role.

#

I know JOINs, CASE, CTEs, and aggregate functions, as I have used all of these to solve problems on HackerRank.

#

What is the next step?

#

What more do I need to know?

uncut barn
#

does anyone know how to check if the pixels have more than 8 bits per channel for a given image?

sterile prawn
#

if its above what an 8 bit channel img should be

#

it has 10 bits?

#

like the size in mem of the img idk

uncut barn
sterile prawn
#

oh

uncut barn
#

and the only thing I can get out of it are the dimensions

sterile prawn
#

isn't that a C library?

uncut barn
#

no python has it too

sterile prawn
uncut barn
#

which are only (width, height)

sterile prawn
#

that's it

#

so ONLY with the width and height

#

can you tell if it has more than 8 bits

uncut barn
#

but they're color imgs

#

that i dont know

sterile prawn
#

well is there any difference in the width and height between 8 bit and 10 bits

#

?

uncut barn
#

I converted it to a thumbnail and that turned this to an array and gave 3 dimensions, last being 3 which is colour

sterile prawn
#

so

#

(width, height, channels)?

uncut barn
uncut barn
sterile prawn
#

ok

#

so i dont think your problem can be resolved

#

with just width, height and channels

#

look for someone who knows openslide
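
Once the pixels are available as a NumPy array, the dtype and the largest value give a hint about bit depth. A small sketch with synthetic arrays standing in for decoded images (`bits_needed` is a helper made up for this example):

```python
import numpy as np

def bits_needed(pixels: np.ndarray) -> int:
    """Smallest bit depth that can hold the largest value in the array."""
    return max(int(pixels.max()).bit_length(), 1)

# Synthetic stand-ins for decoded images: (height, width, channels)
img8 = np.full((4, 4, 3), 255, dtype=np.uint8)
img10 = np.full((4, 4, 3), 1023, dtype=np.uint16)

for img in (img8, img10):
    # storage bits per channel, then bits actually used by the data
    print(img.dtype, img.dtype.itemsize * 8, bits_needed(img))
```

This only says something about the array you actually have. If the reader library already converted the image to 8-bit (a thumbnail, for example), the original depth is no longer visible, and a dark 10-bit image can look like it needs fewer bits.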

abstract falcon
#

Can anybody share a good resource on Entity Extraction Model..??

desert oar
# ebon rock What more do I need to know?

that sounds like a pretty good basis for data analysis. maybe also take a look at window functions. otherwise you know more than enough to get started, and you should start focusing on other things like excel skills, light-duty data processing with python, basic command line stuff, and probability/statistics

random elk
#

Hello, everyone! I'm a beginner python programmer who is coming from a project management background and looking to transition into a data analyst career. Could you help me build a roadmap of which skills to develop? So far my python projects are very basic and I wanted to build projects that get slowly more complex.
So far I know the basics of the language and I've played a little bit with some analytics concepts in my last project. I don't want to write a giant text wall here with all my questions and curiosities, but here is my github and I would really appreciate some recommendations on what to work on: https://github.com/renatolew

#

I am sorry if this is not the right way to ask this. This is my first time participating in a programming community and I don't know how things work yet

sterile prawn
#

how about if you want to get into machine learning

#

i see you already built a recipe analyzer

#

how about an RNN/LSTM to write original recipes?

#

but for a roadmap

#

i'd say learn SQL, numpy, pandas, then get into scikit, machine learning, then finally learn tensorflow, keras, deep learning

#

all while making projects

random elk
#

Thank you. Do you recommend any specific projects for me?

sterile prawn
#

to start - idk its really up to you?

#

but your recipe thing is a great example

#

for my getting started with neural nets

random elk
#

One of my issues has been finding good projects, because the last two I tried were way more complex than I anticipated and I got stuck pretty quickly

sterile prawn
#

i experimented with the iris dataset

#

or i built a simple webscraper

#

an LSTM to generate text

#

all pretty easy with plenty of examples

random elk
#

Thank you. I'll look into it!

modern pine
#

How about a stock prediction ai bot?

#

Any examples for the code?

sterile prawn
#

with simple LSTMs

#

though an idea i had

#

was to run an LSTM on wall street journal article headlines

#

and use that to predict how the DOW/ S & P 500 would perform in the next week

#

if anyone wants to try that

modern pine
#

Oic...with LSTMs ....how about DNN?

#

Fully connected deep neural network

sterile prawn
#

so lstms yeet DNNs

inland zephyr
#

Hello, I want to ask about making a custom matrix with pandas.
I want to make a plot similar to a confusion matrix, but showing the average distance between the actual and predicted class. The data is available here https://paste.pythondiscord.com/ehogipafip.css but I cannot find good advice on how to do this.

#

i want to make it like this so i can analyze which entities are closely related to each other
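
A pivot table with a mean aggregate gives that kind of matrix. A minimal sketch, assuming one row per prediction with columns named `actual`, `predicted` and `distance` (hypothetical names and values, since the pasted data may be laid out differently):

```python
import pandas as pd

# Hypothetical layout: one row per prediction
df = pd.DataFrame({
    "actual":    ["a", "a", "b", "b", "c", "c"],
    "predicted": ["a", "b", "b", "c", "c", "a"],
    "distance":  [0.0, 1.0, 0.0, 2.0, 0.0, 4.0],
})

# rows = actual class, columns = predicted class, cells = mean distance
matrix = df.pivot_table(index="actual", columns="predicted",
                        values="distance", aggfunc="mean")
print(matrix)
```

Pairs that never occur come out as NaN. The result can be passed to something like `seaborn.heatmap(matrix, annot=True)` to get a confusion-matrix style plot.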

sterile prawn
#

gl

quiet vault
#

I am doing a walk forward validation to evaluate a model. To get the most accurate picture of how the model really performs on new data, I retrain the model every timestep. I am testing for 7 timesteps and repeating the walk forward validation 3 times. The model seems to be getting better every time it goes through the 7 days. Could it be possible that the model has past knowledge and is basically "cheating"?

#

I am working with keras.

acoustic halo
quiet vault
#

no

#

Its time series data

#

Thats why I have walk forward validation

#

So just to be safe, is there a way to restart models completely?

acoustic halo
#

You can still (and definitely should) have a separate validation set

quiet vault
#

how

acoustic halo
#

Have you got a sample of your dataset?

quiet vault
#

yes

#

I have 131 datapoints

#

how many do u need?

acoustic halo
#

like 5

quiet vault
#

-0.10000228881836648
0.6800003051757955
0.5600013732910085
-0.8400001525878906
-0.7400016784667969
0.9099998474121094

acoustic halo
#

Okay gimme a minute, I need to check some code where I have done something similar before

quiet vault
#

alright

acoustic halo
#

So if you have 131 datapoints, you could only use the first 100 for training and testing

#

Then at the end of each epoch, validate on the remaining 31

quiet vault
#

That is not a very accurate way to test though

#

Walk forward validation is a great way for testing models with time series data because it represents how I would make predictions with the model

#

So the problem here is not really the way of testing.

#

I'm looking for a way to completely delete a model

acoustic halo
#

Either way, the same still applies, you train the model on the first 100 or so datapoints, then once you have finished, you can use those 100 to try and predict the final 30 or however many you want to use as a final benchmark, otherwise there is no way to know the actual performance of the model, this would apply to both sliding windows and expanding windows

#

Not sure what you mean by deleting the model, since if you are saving them, they are usually just saved in a .h5 file

lapis sequoia
#

Statement: We cannot determine overfitting based on one hypothesis only

  • Why is that? Isn't it the case that if we have a hypothesis with very low E_in and have a high E_out that its a sign of overfitting? Why can't we in virtue of that conclude that H1 is overfitting the data?
wicked basin
#

how do i start the basics of machine learning

quiet vault
#

I don't think you are understanding what walk forward validation is

#

The point is not to get the best model

#

it is to test to see how well the model performs

#

To get the best understanding of the variance of predictions, I am running the walk forward validation 3 times. After every time step, I want to make a new model and train it with the data it has. Then make a prediction and compare it with the answer
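The procedure described here can be sketched roughly as follows. `MeanModel` is a toy stand-in invented for this example (it is not a Keras model); the point is only that a new model object is created inside the loop, so nothing learned at one step can leak into the next.

```python
from statistics import mean

class MeanModel:
    """Toy stand-in for a real model: predicts the mean of what it was fit on."""
    def __init__(self):
        self.value = None

    def fit(self, history):
        self.value = mean(history)
        return self

    def predict(self):
        return self.value

def walk_forward(series, n_test, make_model):
    """Evaluate on the last n_test points, refitting a new model each step."""
    errors = []
    for i in range(len(series) - n_test, len(series)):
        model = make_model()      # brand-new object, no state carried over
        model.fit(series[:i])     # only data strictly before step i
        errors.append(abs(model.predict() - series[i]))
    return errors

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = walk_forward(series, n_test=2, make_model=MeanModel)
print(errors)
```

With a real network, `make_model` would presumably be a function that builds and compiles a new model on every call, rather than calling `fit` again on an existing one.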

quiet vault
wicked basin
quiet vault
#

np

#

it does not go into deep learning right away which is good

#

at the end it does introduce u to neural networks

flint musk
#

If logistic regression is classified between two features, and KNN is classified between more than two features, between how many features a decision tree classified?

sterile prawn
#

this is where the tweet lstm is now:

#

So greece calories shooting tunnel tragedy and lovin crazy touching.
Just dreading full marathon is as or pool rock move spain beers.
Taking up drunken other tragedy high projects is alot less that no if i feel maybe even worry.
Was into a decisions chilled apples lol but take dream that sore workout i.
Just dreading full thunderstorm is fun or pool rock move spain rehearsal.
Here at my appropriate soft more animation today i was a drunken and.
Just off in class is id miss central places meet downtown weeks.
Was up the disc murder a misery comcast blogging pirates.
Chilling reality tonight october possibly on your annoying theme music for flowers.
Iranelection greece murder ideal loud and is i lovin with touching.

quiet vault
#

better than I can write

inland zephyr
#

i want to ask about the ArcFace embedding algorithm, specifically the output vector. Are the vector values normalized between 0 and 1 or not? i need to decide which similarity method to use for inference

sterile prawn
#

idk

#

i really dk

acoustic halo
quiet vault
#

Sure. Can you wait like half an hour?

#

I'm in a ranked game lol

sterile prawn
#

the lstm is an autoencoder lstm with dcgan if anyone wants to give it a shot themselves

acoustic halo
#

Yeah no worries, ranked siege is way more important

quiet vault
#

yes, sorry

quiet vault
acoustic halo
#

Okay it was what I thought it was, nevermind

umbral ferry
#

is there any benefit to over fitting? maybe it can give you some insight into your data like in unsupervised learning

unborn glacier
# umbral ferry is there any benefit to over fitting? maybe it can give you some insight into yo...

Yes, the ability to over-fit suggests that your model is sufficiently complex to handle the data. If you just do a single variable linear regression on complicated data, you'll never be able to get a proper fit to, lets say, sinusoidal data. So if you are concerned that you don't have enough layers or you didn't choose a complex enough ML model, the ability to over-fit suggests that it is complex enough but you need more data, or data that better predicts the test samples

#

It can also give you an idea of when to stop training, because if you are over-fitting, you've gone too far

umbral ferry
#

you can tell when you've overfit?

acoustic halo
#

Yes, in super simple terms if your model seems to be doing really well on the training data, but then does worse on new data it has never seen before, it's likely because it has overfit to the training data

umbral ferry
#

what do you mean by doing well on training data? I thought you just kind of feed the training data in and out pops a model

#

I am sort of seeing on my test data, that most predictions are pretty close to the actual, but a few are far off

#

I think that points to over fitting possibly

acoustic halo
#

Okay so let's say you train a model on your training data

#

Then after training you test your model on the training data again and get 99% correct predictions

#

Then you test your test data in the model, which only predicts correctly 30% of the time

#

That suggests the model has learnt the training data so well that it doesn't generalize to new data, aka it has overfitted
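The train-versus-test comparison above could look something like this in scikit-learn. The data is synthetic and deliberately noisy, and an unrestricted decision tree is used only because it memorizes training data easily; none of this comes from the original question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
# Labels depend weakly on the first feature, plus a lot of noise.
y = (X[:, 0] + rng.normal(scale=2.0, size=400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No depth limit, so the tree is free to memorize the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A large gap between the two numbers is the symptom being described; limiting `max_depth` would be one way to shrink it.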

umbral ferry
#

oooh ok

#

ty

dusty cloud
#

Hi guys, when it comes to faster python code, whats the difference between numba and transonic?

stark bough
#

hello Everyone!

#

🙂

#

i am new in this group glad to be here

desert oar
cyan sun
#

If any of you are feeling extra generous tonight, mind joining me in #☕help-coffee? need a pandas Q answered, thx

lapis sequoia
#

Anyone have some good tips for how I could speed up my yolov5 model? Using 720p images for my input data and only two classes, I have about 3000 training images and 10% of those are for validation

What can I do to speed up inference no matter how small?

tidal sonnet
#

So like, I've been trying to figure out how many heights are within one standard deviation for this given set... but I seem to be doing something wrong, in a process I thought was fairly straightforward

#

code:

from math import sqrt
players = [180, 172, 178, 185, 190, 195, 192, 200, 210, 190]

mean = sum(players) / len(players)

pre_variant = [(number - mean) * (number - mean) for number in players]

variance = sum(pre_variant) / len(pre_variant)

std = sqrt(variance)

valid = [player for player in players if player in range(int(mean-std), int(mean+std))]

print(len(valid))
#

Find the mean, then the variance which is the average of the squares of the difference of each value and the mean, find the standard_deviation which is the square root of the Variance, and then find all numbers in that range...
Where did I go wrong?

#

I tried doing this same method with different data, another question that I knew the correct answer for, and it worked fine

#

Nvm... I got it to work, was a problem with my last list comp

#
from math import sqrt
data = [180, 172, 178, 185, 190, 195, 192, 200, 210, 190]

mean = sum(data) / len(data)

pre_variance = list(map(lambda number: (number - mean) * (number - mean), data))

variance = sum(pre_variance) / len(pre_variance)

std = sqrt(variance)

result = list(filter(lambda number: number > (mean - std) and number < (mean + std), data))

print(len(result))  

The result
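For comparison, here is the same calculation using the standard library's `statistics` module. It also shows why the first attempt miscounted: truncating the bounds with `int()` moves the lower bound from about 178.6 down to 178, so the player at 178 is wrongly included.

```python
from statistics import fmean, pstdev

data = [180, 172, 178, 185, 190, 195, 192, 200, 210, 190]

mean = fmean(data)
std = pstdev(data)            # population standard deviation (divides by n)
low, high = mean - std, mean + std

within = [x for x in data if low < x < high]
print(len(within))            # 6

# The first attempt: int() truncates both bounds and range() only holds integers.
truncated = [x for x in data if x in range(int(low), int(high))]
print(len(truncated))         # 7
```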

white venture
#

What exactly is TensorFlow? And why do many say that it makes it easy to do ML when looking at it makes my head hurt?

acoustic halo
#

Many use something like keras which provides a simpler interface for tensorflow, or pytorch instead

#

Tensorflow can be difficult to understand if you are not already familiar with the concept of tensors

white venture
acoustic halo
#

Theres also a bunch of other resources in the pinned messages

#

Codecademy has a decent course as well if you have access to that

limpid osprey
#
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv('/content/student-por.csv')
print(data)
X =  data.drop(columns=['grade'])
print(X)
y = data['grade']
print(y)
model = DecisionTreeClassifier()
model.fit(X , y)
model.predict([[18, 2, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0, 4, 3, 4, 1, 1, 3, 4]])

What am i doing wrong here

ValueError                                Traceback (most recent call last)
<ipython-input-33-4fc4f82f40e4> in <module>()
      9 print(y)
     10 model = DecisionTreeClassifier()
---> 11 model.fit(X , y)
     12 model.predict([[18, 2, 2, 0, 1, 0, 0, 0, 1, 1, 0, 0, 4, 3, 4, 1, 1, 3, 4]])

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    875             sample_weight=sample_weight,
    876             check_input=check_input,
--> 877             X_idx_sorted=X_idx_sorted)
    878         return self
    879 

/usr/local/lib/python3.7/dist-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    171 
    172         if is_classification:
--> 173             check_classification_targets(y)
    174             y = np.copy(y)
    175 

/usr/local/lib/python3.7/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168                       'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)
    170 
    171 

ValueError: Unknown label type: 'continuous'

This works with other data i have but not with this one for some reason
current data : https://paste.pythondiscord.com/pogoqinuko.apache
old data : https://paste.pythondiscord.com/ayazepezoz.apache

Is it that it can only train with 2 parameters
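The traceback points at the target column `y`, not at the number of features: `check_classification_targets` rejects a target that looks continuous (for example, floats with decimals). Below is a small sketch that reproduces the error and shows two possible ways around it. The numbers and the pass mark of 12 are made up, and whether the real `grade` column is actually fractional is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1], [2], [3], [4]])
y_float = np.array([10.5, 12.0, 13.5, 15.5])   # non-integer floats

try:
    DecisionTreeClassifier().fit(X, y_float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: treat the target as a number and use a regressor.
reg = DecisionTreeRegressor(random_state=0).fit(X, y_float)

# Option 2: turn the target into discrete classes first
# (hypothetical pass mark of 12).
y_class = (y_float >= 12).astype(int)
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)

print(reg.predict([[2]]), clf.predict([[2]]))
```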

late shell
#

Hello, I'm just getting started with keras and was trying out this code:

model = keras.Sequential(name='shit', layers=[
    keras.Input(shape=(2,)),
    keras.layers.Dense(3, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])

But I get this error :
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.

What does it mean by "Functional model" and why am I getting this error?

acoustic halo
#

Because as it says, you need to use InputLayer, not Input

#

you could also just do keras.layers.Dense(3, activation='relu', input_shape=(2,))

#

And skip defining the input layer

#

Functional models in keras are just models more complex than sequential models, eg:

austere swift
#

a functional model is a model that is made kind of like this:

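inputs = keras.Input(shape=(2,))  # shape and layer sizes here are just examples
x = keras.layers.Dense(3, activation='relu')(inputs)
x = keras.layers.Dense(3, activation='relu')(x)
# ...
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)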
late shell
#

Cool 👍 , thanks a lot @acoustic halo & @austere swift

acoustic halo
#

Actually got a functional API question myself, could you implement NEAT with it by creating layers with single inputs and outputs to effectively act as single nodes?

dusk spear
#

Hello, I've got an error while running a NN. Can you please help me?

#

It's giving me this error

#

WARNING:tensorflow:Model was constructed with shape (None, 300, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 300, 1), dtype=tf.float32, name='gru_9_input'), name='gru_9_input', description="created by layer 'gru_9_input'"), but it was called on an input with incompatible shape (None, 14, 1).

#

I think it's something to do with the way I've imputed the data, but I cannot figure it out

grave frost
acoustic halo
#

That's what I was thinking, not sure what the performance would be of having potentially hundreds of layers, even if they are small

grave frost
#

well, you can always use their profiler to optimize them further if they take too much time

#

though use pytorch or jax then cuz it would be more controllable

acoustic halo
#

Not that I plan on doing this, it's mostly just a thought experiment

#

But it also makes me wonder if there are any papers that apply a method like NEAT to layers as opposed to individual nodes

glacial sparrow
#

I have a dataset (100k,52) with labelled anomalies and I'm trying iforest on various variations of the dataset. So far none returns anything sensible. When I plot the feature space in 2D with PCA or TSNE with different colours for normal and anomaly points, must I be able to visually confirm anomaly regions? In my case normal and anomaly points are mixed e.g. with PCA normal points form a circle and anomaly points are scattered towards the middle or with TSNE everything seems to be mixed altogether

carmine tide
#

Hello! So I am not sure if this is the right room to ask my question but I didn't find a more appropriate one. I want help with fitting a curve on data with errors on both the x and y axes. From what I've read scipy's curve_fit cannot deal with x errors (correct me if I'm wrong). I tried using odr but I think that it didn't give me the correct curve. I could be wrong and it could actually be the best fit curve but I would appreciate a second opinion. Thanks!
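It is true that `curve_fit` only takes uncertainties on y (its `sigma` argument), while `scipy.odr` accepts both. For a second opinion on an ODR fit, a minimal example on data with a known answer can help check the setup. Everything below (the straight-line model, the data and the error sizes) is made up for illustration.

```python
import numpy as np
from scipy import odr

def line(beta, x):
    """Model to fit; scipy.odr passes the parameter vector first."""
    return beta[0] * x + beta[1]

# Made-up measurements lying exactly on y = 2x + 1, with assumed uncertainties.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
sx = np.full_like(x, 0.1)   # standard deviations of x
sy = np.full_like(y, 0.2)   # standard deviations of y

data = odr.RealData(x, y, sx=sx, sy=sy)
fit = odr.ODR(data, odr.Model(line), beta0=[1.0, 0.0]).run()

print(fit.beta)      # fitted [slope, intercept]
print(fit.sd_beta)   # their estimated standard errors
```

If a known-answer case like this comes back right, the same setup with the real model function and a sensible `beta0` is at least wired correctly; a poor starting guess is a common reason for a fit that looks wrong.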

wheat sun
#

I was trying to make an ML algorithm that predicts the value of for example f(x) = 2x for a specific input to x (overkill but I'm trying out the power of ML) so that when I fit this array into the model (DecisionTreeRegressor),

0    0   0
1    1   2
2    2   4
3    3   6
4    4   8
5    5  10

it should hopefully predict a y value of 40 for x = 20, 60 for 30, 90 for 45, so on and so forth.

However, when I try to predict 6 for example with the model trained on the array above, it returns 10 and not 12. It goes for any number higher than the numbers in the array.

Can another model solve this issue?

length = 50
slope = 2
b = 0
step = 1

# Initialize array
data = {'x': [(i * step) for i in range(length)], 'y': [(i * step) * slope + b for i in range(length)]}
df = pd.DataFrame(data)

# Define
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(random_state=1)

y = df.y
X = df.x
X = X.values.reshape(-1, 1)

# Fit
model.fit(X, y)

# Predict
print(model.predict([[10]]))

Thanks!

sterile prawn
#

Anyone here ever do any roguelike stuff with Python that can be read by a lotta shit

there is nothing against it, but it is really screechy

@nekitdev it's already there for you to call it like a regular pattern..maybe if u can help me grab this guys token he keeps logging on my account on steam just saying it out loud for me to learn python?
im new in python that technically it is for
You are not allowed to use that to run in the background while other code is where it doesnt work now
and the webdriver in your code without seeing the code
lmao i use repl .it
that would be a great choice 😄
You need to use requires some notion of OOP and to be compatible with both, Linux and Windows support multiple IPs on same NIC
How do you become better than him including me 😂😅

#

markov chains = somewhat realistic python discord messages

#

When i look at to get a simple 1 or 0. 1 means it is assigned to a class
None
I don't appreciate the tone you're taking with cepo. Do you understand what you're asking for is .... Twisted indeed .... badum tsss
how to do a project skskkss
Isn't there a nice project to do with said bot, Selenium is a testing tool, not for scraping
how do i call a function inside a for loop breaks
well pandas complains about the python language. This python course i'm auditing just got a quick question if u have a point where all the keys in the dictionary, instead you should loop backward so if someone asked me to teach him, he didn't take it seriously
if i want to do something on button click, not refresh or redirect the page, instead; update the content in the github student program, which is easier to read i think
It'll work for all of these errors
** does anyone know of the styling guidelines

desert oar
wheat sun
#

You're looking at a tree of possibilities, but they're only limited to the trained values or something

desert oar
#

also you might want to practice using numpy/pandas for working with data more efficiently:

length = 50
slope = 2
b = 0
step = 1

data = pd.DataFrame({'x': np.arange(0, length, step)})
data['y'] =  data['x'] * slope + b
wheat sun
#

That makes intuitive sense

#

So the np.arange function kinda functions exactly like the range function with a min, max and step?

desert oar
#

and for the sake of the exercise: if you know that the underlying function has the form f(x) = ax + b, what model is definitely the best choice for learning this function f?
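To make the point concrete, here is a small comparison on the same kind of data (y = 2x for x from 0 to 49). It is only an illustration: the tree can return nothing beyond the target values it saw in training, while a linear model recovers the slope and extrapolates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X = np.arange(50).reshape(-1, 1)   # x = 0..49
y = 2 * X.ravel()                  # y = 2x

tree = DecisionTreeRegressor(random_state=1).fit(X, y)
linear = LinearRegression().fit(X, y)

x_new = [[100]]                    # far outside the training range
tree_pred = tree.predict(x_new)[0]
linear_pred = linear.predict(x_new)[0]

print(tree_pred)     # 98.0, the largest y seen in training
print(linear_pred)   # approximately 200.0
```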

arctic wedgeBOT
#

numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)
Return evenly spaced values within a given interval.

Values are generated within the half-open interval `[start, stop)` (in other words, the interval including *start* but excluding *stop*). For integer arguments the function is equivalent to the Python built-in *range* function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use [`numpy.linspace`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace "numpy.linspace") for these cases.
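A quick illustration of the points in that docstring, with arbitrary example values:

```python
import numpy as np

# Integer arguments: same values as range(), but returned as an ndarray.
a = np.arange(0, 10, 2)
print(a, list(range(0, 10, 2)))

# Arithmetic applies to the whole array at once, no list comprehension needed.
print(a * 2 + 1)

# For non-integer spacing, linspace takes the number of points instead of a
# step and includes the endpoint by default.
b = np.linspace(0.0, 1.0, 5)
print(b)
```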
wheat sun
desert oar
#

sure, i'm just checking for understanding 🙂

wheat sun
#

I knew it was overkill, I just wanted to check what ML can do

desert oar
#

in that case a tree isn't a great idea either. you'll want something that can learn "complicated" functions - random forest, gradient boosting, neural network

#

if you have the ability to arbitrarily create test inputs and outputs, you might do even better with a gaussian process model

#

depends on the situation

wheat sun
#

So a tree is better for let's say bool values or limited-choice values like gender, eye color, country of origin, etc

desert oar
#

yeah. i don't think i've ever seen a single regression tree used in serious work

#

they used classification trees quite a bit when i worked in insurance

#

i guess a single regression tree isn't bad if you want to "cut" the target into categories/levels, but you don't know what those categories/levels should be

#

pretty specific use case

wheat sun
#

I just started kaggle's intro ML course today and it was the first model in the tutorial

desert oar
#

what, a regression tree?

#

can you link it?

wheat sun
desert oar
#

it's definitely better if you have more features: that means you have more splits, and more possible values the tree can predict

#

i think this tutorial is meant to show off scikit-learn moreso than decision trees

#

but price is probably an okay situation to use a decision tree: you don't really care about the exact price, and it probably makes sense to group prices into "levels" anyway

wheat sun
#

Yeah right

#

I used a regression tree for the Titanic competition and got a 74% score

desert oar
#

i thought the titanic competition was a binary classification task?

#

i remember doing it in ~2015 when i was first learning python and branching out from "traditional" stats and econometrics

wheat sun
#

Yeah, the target variable is survival

desert oar
#

a classification tree is a sensible intuitive choice for that problem

#

personally i came from the statistics world, so i didn't even consider it and went right for logistic regression (which is also a sensible choice, for other reasons)

wheat sun
#

I just wanted to dive right in to doing ML stuff with whatever code I learned

desert oar
#

it's also worth considering what the advantages and disadvantages of a decision tree and logistic regression are

wheat sun
#

I still don't know what kinds of ML models there are

desert oar
#

i'd argue that the decision tree is a lot easier to understand. but knowing stats and probability is very valuable for doing serious ML or more general data science work

wheat sun
desert oar
#

that's good

#

i think i found a better YT channel for stats.. i will try to find it

#

there are pretty much 3 main categories of ML models in common use: trees and ensembles of trees, neural networks, and statistical models (especially linear ones). many other types exist, but these are the 3 that you will see over and over.

wheat sun
#

From what I understand, neural networks have layers and utilize linear algebra for getting data from one layer to the next

#

Are trees in ML models represented by actual tree data structures?

serene scaffold
wheat sun
#

Hmm

serene scaffold
#

yes, my understanding is that they are acyclic directed graphs.

#

(which is a specific kind of tree--trees can be undirected)

umbral ferry
#

so I trained my model trying to get it to overfit (just to see how it performs), and maybe as expected, it performs well on the training data, and worse but also well on test data (never before seen)

#

is that ok? like the only thing that matters is how well it does on never before seen data?

acoustic halo
#

I mean, it's okay as long as you are happy with it, but reducing the overfitting may make the result better on the test data

desert oar
#

a "tree" data structure holds data - a decision tree is an algorithm, and you store various parameters that make the algorithm work

desert oar
umbral ferry
#

actually, I'm adding parameters to reduce overfitting, and it's really not doing anything

#

huh

acoustic halo
#

reduce parameters, don't increase them

wheat sun
#

I meant to ask how ML models store what they get from fitting

#

Like, it concluded that if sex == male, not survived and if sex == female, survived

desert oar
# wheat sun I meant to ask how ML models store what they get from fitting

the scikit-learn decision tree estimator classes are written in python (the low-level tree building is in Cython), you could read through it if you're curious https://github.com/scikit-learn/scikit-learn/blob/82df48934eba1df9a1ed3be98aaace8eada59e6e/sklearn/tree/_classes.py#L445-L494
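For a look at what a fitted tree actually stores, without reading the source, the `tree_` attribute and `export_text` can be inspected directly. The data below is a toy example with a single made-up `sex` feature, not the real Titanic set.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: one feature, sex (0 = male, 1 = female).
X = [[0], [0], [0], [1], [1], [1]]
y = [0, 0, 0, 1, 1, 1]          # 0 = did not survive, 1 = survived

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Human-readable view of the learned rules.
print(export_text(model, feature_names=["sex"]))

# The same information as parallel arrays, one entry per node.
t = model.tree_
print(t.node_count)        # 3: one split node and two leaves
print(t.feature[0])        # 0: the root splits on column 0
print(t.threshold[0])      # 0.5: go left if sex <= 0.5
print(t.children_left[0], t.children_right[0])  # indices of the child nodes
```

So the "if sex == male" rule ends up stored as a feature index plus a threshold at each split node, with the class counts kept at the leaves.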

wheat sun
#

The __init__ method is shorter than I expected

old grove
#

What is covariance? Can anyone explain with an example? All I know is that if one variable increases and the second also increases, the covariance is positive, but what if one moves up and the other down? Can anyone explain simply what covariance is and how it differs from correlation?

desert oar
desert oar
old grove
desert oar
chilly geyser
#

Covariance takes variance units

#

Pearson correlation measures linear correlation

old grove
#

ok

old grove
desert oar
#

covariance is unbounded
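A small numeric illustration of these points, using made-up numbers: when one variable goes up while the other goes down the covariance is negative, its size depends on the units, and the Pearson correlation stays between -1 and 1 regardless.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.0, 6.0, 4.0, 2.0])   # goes down as x goes up

cov = np.cov(x, y)[0, 1]          # sample covariance
corr = np.corrcoef(x, y)[0, 1]    # Pearson correlation
print(cov, corr)                  # negative covariance, correlation of -1

# Rescale x (say metres to centimetres): covariance scales, correlation does not.
cov_scaled = np.cov(100 * x, y)[0, 1]
corr_scaled = np.corrcoef(100 * x, y)[0, 1]
print(cov_scaled, corr_scaled)
```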

old grove
#

ok

desert oar
#

@fading burrow where did you get that diagram?

#

@fading burrow is this a question about neural networks or about pooling covid tests?

#

or something else

#

the calculations for that table appear to be in the source paper https://www.medrxiv.org/content/10.1101/2020.04.06.20052159v1

#

i see

#

i think p in that table is the expected frequency of positive results

#

skimming the paper, it sounds like they derived this table from numerical simulations

#

they have an appendix with some derivations though

#

...but i think the appendix is missing

#
#

yeah there are some derivations and formulas in that appendix

shut dock
grave frost
torpid scarab
#

hello

#

Anyone knows any good site with project ideas on AI? searching hard for my thesis

#

Thanks

sterile prawn
#

but for a phd thesis? i wouldn't look for an idea on a website

#

what do you want to do?

#

nlp?

#

computer vision?

torpid scarab
#

MSc

sterile prawn
#

generative ai?

#

ok its good to start with an area of machine learning

#

regression?

#

neural net architectures?

torpid scarab
#

ye computer vision or sound i guess..I would love to create some easy hardware too, like connect it with arduino

desert oar
#

i still wouldn't look too hard for a masters thesis in a python discord server

sterile prawn
#

yea

#

we are not big brain ai people

sterile prawn
desert oar
#

there are also online communities more specifically focused on that field

#

that said, some hardware stuff could be interesting. not all theses have to be "implement an AI"

torpid scarab
#

ye...professors gave us some cases, didnt like any too much 😛

desert oar
#

your contribution could be "i got AI stuff to run on this tiny arduino and it's something that other people will find useful, here is the source code"

#

and your thesis wouldn't be "here is my cool machine learning model", it would be "here is how i got XYZ to run on an embedded system"

#

but it depends on your skillset

sterile prawn
#

if you want an ml thing in audio/computer vision - how about a model that generates audio from a silent video (with lip movements) that's a few-shot learner

#

so you give it a few short clips of someone speaking

#

and it can figure out the rest

torpid scarab
#

I just have this idea of thesis would be something I like you know..trying to think ideas but I get stuck on implementation side..like how I ll manage to collect data

desert oar
#

collecting data is always the hardest part

sterile prawn
#

just an example

torpid scarab
sterile prawn
#

alright

#

i thought "what's a cross between audio and computer vision"

#

lip sync audio generation

torpid scarab
sterile prawn
#

what's state of the art? "generating new audio from lip sync video after training on a specific speaker"

#

how could that be improved? "rather than having to train on an individual speaker, make the training few-shot so it could work on anyone"

torpid scarab
#

that's a cool idea

#

Do you have anything of this coolness for AI and environment I like? I would love to use it for dunno maybe predict wildfires or detect stray animals and try to protect them..seems hard to find data though

sterile prawn
#

📝 The paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"...

#

i believe this is SOTA

sterile prawn
# torpid scarab Do you have anything of this coolness for AI and environment I like? I would lov...

📝 The paper "Fire in Paradise: Mesoscale Simulation of Wildfires" is available here:
http://computationalsciences.org/publications/haedrich-2021-wildfires.html

#

seems like a good starting point

torpid scarab
#

thank you! both!

sterile prawn
#

ok good luck!

#

@inland zephyr what are you saying lol

inland zephyr
#

hello, i need help with keras.backend methods.
I am trying to validate my model's results by looping 100 times over different train/valid/test combinations. I wrapped my model training and evaluation in one function and call it inside a for loop with different data

def train_and_test(trainX, trainY, validX, validY, testX, f_leng):
    tf.keras.backend.reset_uids()
    tf.keras.backend.clear_session()
    stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, mode='min')
    model = Sequential()
    ...
    model.compile(loss='sparse_categorical_crossentropy', optimizer='Adam', metrics=["accuracy"])
    print(model.summary())
    with tf.device('/device:GPU:0'):
        history = model.fit(x=trainX,
                            y=trainY,
                            validation_data=(validX, validY),
                            epochs=50,
                            shuffle=True,
                            batch_size=10,
                            callbacks=[stop_early])
    ypred = model.predict(x=testX)
    ypred = ypred.argmax(axis=-1)
    return history, ypred

with the flow like this:

for each wavelet method:
  tf.keras.backend.reset_uids()
  tf.keras.backend.clear_session()
  for each level:
      set train x,train y
      call train_and_test function
  create pandas summary

I'm glad it works well, but I'm pretty suspicious when the result jumps drastically from 70% to 90% at each decomposition level. It is the same for the other kinds of wavelet I used.

#

lmao i'm pretty happy but suspicious with my CNN result

#

see, both curves are nearly identical even though the wavelets are different

#

i'm pretty afraid that the same model keeps learning from its existing weights in every loop iteration rather than starting from a fresh untrained model

#

wait i found the issue

#

i think the problem lies in how i define the model

#

dunno if i'm wrong, but does building the model layer by layer with .add versus defining it with a list have a different effect on the model definition?

umbral ferry
#

I know a good test/train split is 80/20, but if I decrease my test size and get better results on my test set, does that mean that a smaller test set is better for my specific dataset?

unborn glacier
#

It's probably just random chance that it's better

#

Or the fact that now you have more training data

#

You want a wide range of samples in your test set (hence making it 20% of the data) so that you can get a realistic picture of what it will do in the real world

umbral ferry
#

I'm also confused, say you did all the tuning and stuff, how do you go about generating the final model? do you run it a bunch of times until you get a low scoring metric? Do you train it on 100% of the data or a larger fraction?

#

how do you shape the best deliverable

unborn glacier
#

I think time is best spent trying a range of model types to get the best result, not fine-tuning the metrics with the same model

#

The point of the test data is to give you a preview of what it might do in the real world, but if you keep iterating you can get an artificial good result on the test data

#

Which it sounds like you may have done

#

That's also why people do test-train-validate, but if you have very limited amounts of data, it's probably not worth doing that
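A three-way split like this can be done with two calls to scikit-learn's `train_test_split`. The 60/20/20 proportions and the dummy data below are only an example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# First cut off 20% as the final test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# ...then take 25% of the remaining 80% (20% overall) for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_valid), len(X_test))   # 60 20 20
```

The validation part is what gets looked at repeatedly while tuning; the test part is kept aside until the end.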

#

One place where fine-tuning might be worthwhile is in things like batch-size, as I think that can help eliminate over-fit

umbral ferry
#

I'm getting quite good fit on my training, and worse but still good on my test lol

inland zephyr
#

i use a train/valid/test split even if the data is very limited. the reason is i need validation data to check whether the model performs about the same on both train and valid... which means checking whether the model is overfitted or not

unborn glacier
#

It's not the end of the world to have slight overfit, it's typical to have a model perform better on the train than the test. You don't want crazy large differences though

umbral ferry
#

I have large differences 😬, RMSE on train is 2, test is 6.5

inland zephyr
#

when i use a 50-20-30 scheme for the train/test/valid split with random configuration, sometimes my model overfits or underfits on validation after training... and that is reflected in the test result...

#

This is what happens in my case after 50 epochs: even when the train acc is almost 100, the validation acc is very low, around 50 or 60, at the lowest val loss, and the test result follows the valid result and underperforms... but it does not happen frequently

#

which scare me anytime

uncut orbit
sterile prawn
#

i just use 100% of data for training

uncut orbit
#

its been an hour

uncut orbit
sterile prawn
#

oh that was a joke

#

now and then im too lazy to make a validation callback

#

and regret it later

uncut orbit
#

lmao

#

i took 10000 images of fake people

#

10000 images of real people

#

and its taking 10000 years to finish

grave frost
#

BART usually takes about 1-2 years

uncut orbit
lapis sequoia
# uncut orbit its been an hour

an hour for 4 epochs, each epoch takes approx. 827s. You wait around 3308s for 4 epochs (an hour), to reach 50 epochs, 41350s. 38042s you must wait, around 10.5 hours.

You must wait around 10 hours and 30 minutes for this to complete, correct me if i am wrong pwease.
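
(The arithmetic above can be checked with a few lines. This is only a sketch: it assumes every epoch takes the same ~827 s, and the function name `remaining_training_time` is made up for illustration.)

```python
def remaining_training_time(seconds_per_epoch, epochs_done, epochs_total):
    """Estimated seconds left, assuming a constant time per epoch."""
    return seconds_per_epoch * (epochs_total - epochs_done)


elapsed = 827 * 4  # seconds spent on the first 4 epochs
remaining = remaining_training_time(827, epochs_done=4, epochs_total=50)

# Split the remaining seconds into whole hours and minutes.
hours, leftover = divmod(remaining, 3600)
print(f"elapsed: {elapsed} s, remaining: {remaining} s "
      f"(about {hours} h {leftover // 60} min)")
```

In practice epoch times drift a bit, so treat the result as a rough estimate.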

uncut orbit
#

i will definitely

lapis sequoia
#

(Since posting)

chilly geyser
#

Well that's certainly average case

#

And I think that's a reasonable prediction

uncut orbit
#

but i dont know why i put the epochs so high because now its probably going to overfit

lapis sequoia
#

Mmm

uncut orbit
#

before i was running with 200 deepfakes and real people each

#

the score was ok

waxen veldt
#

whats the best way to find a mentor

#

also quick question

#
# lets say I have a list 
lst = [1, 2, 3, 4, 5]
data = pd.DataFrame({'Numbers': [3, 42, 5, 345, 36]})
#

I want to filter for the rows that have any of the numbers in lst

#

how can i do that?

#

the desired rows that I would want has the numbers, 3 and 5

#
# My solution 
data[data['Numbers'].apply(lambda x: x in lst)]
#

But i feel like there would be a faster way

iron basalt
#
data[data['Numbers'].isin(lst)]
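
(A runnable version of the same idea, using the `lst` and `data` from the question. `Series.isin` builds a boolean mask in one vectorized call, so there is no Python-level lambda per row.)

```python
import pandas as pd

lst = [1, 2, 3, 4, 5]
data = pd.DataFrame({'Numbers': [3, 42, 5, 345, 36]})

# Boolean mask: True where the value appears in lst.
mask = data['Numbers'].isin(lst)
filtered = data[mask]
print(filtered)
```

The mask can also be inverted with `~mask` to keep the rows that are not in the list.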
waxen veldt
#

ahhhhhhhhhhhh

#

i seee

#

thanks

#

also what are good visualizations to see correlations between two categorical columns?

iron basalt
#

A table.

waxen veldt
#

im familiar with visualizations such as pointplot, violinplot, barplot, etc but those are all for categorical columns and also numerical columns

waxen veldt
iron basalt
#

(With coloring, red = high)

waxen veldt
#

hmm

#

i'm thinking countplot with a hue as the other categorical

#

but is there a better one

waxen veldt
iron basalt
#

rows are one column, cols are the other

#

cross

waxen veldt
#

yeah so cross tab

#

but is there like visualizations?

#

cuz a table isn't a visualization

iron basalt
#

It is, but there are others

#

Can do a graph, close nodes are highly correlated.

#

Graphs are useful when there are many components and you want an overview from a distance

#

Can zoom out

exotic maple
#

Isnt a heatmap prolly the best for crosstabs kind of data?

iron basalt
#

(But a very small cell size in a matrix works too)

#

Yea, colored table.

#

Color is pre-cognitive so it helps a lot.

#

The graph approach can give you insight into clusters.

#

A couple of things all correlated with each other.

waxen veldt
#

bet thanks

#

so for the heatmap

iron basalt
#

Graph renderings are kinda hard to find though compared to a simple table.

waxen veldt
#

would you first create a cross tab and then call sns.heatmap()

#

or pivot table with aggfunc=np.count if that will even work
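
(A minimal sketch of the crosstab-then-heatmap route. The column names and values are invented for illustration. `pd.crosstab` counts co-occurrences by default, so no `aggfunc` is needed; as far as I know there is no top-level `np.count` in NumPy. The plotting helper imports seaborn lazily and is only an assumption about how you would want the figure to look.)

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame({
    'payment': ['card', 'cash', 'card', 'card', 'cash', 'card'],
    'category': ['food', 'food', 'toys', 'food', 'toys', 'toys'],
})

# Rows are one column, columns are the other, cells are counts.
table = pd.crosstab(df['payment'], df['category'])
print(table)


def plot_heatmap(counts):
    """Draw the count table as a colored grid (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(counts, annot=True, fmt='d', cmap='Reds')
    plt.show()
    return ax
```

Calling `plot_heatmap(table)` should give the "colored table" described above, with darker red for higher counts.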

lapis sequoia
#

hey can someone help me out. I'm new to tensorflow and am getting a dimension error on validation data.

basically im using an ImageDataGenerator on my training data. when I try to evaluate the model on the validation data, however, it throws an error.

here is the error, and im guessing it has to do with the output, since it's a 10x1 array output.
ValueError: Shapes (None, 10, 2) and (None, 10) are incompatible

#

model.fit(datagen.flow(x=x_train, y=y_train, shuffle=True, batch_size=32), epochs=1,
callbacks=[callback], validation_data=(x_test, y_test))

#

here is my model.fit line, i bet its something here

#

could someone help

#

the error occurs at the end of the epoch

junior lintel
#

‘‘’py
Test’’’

lapis sequoia
#

wow ok, im just stupid nvm

lapis sequoia
#

What am I doing wrong? Why is the target variable and its column duplicated?

dummies = pd.get_dummies(df2, columns= ['payment_type','category_name'])
  
df3 = pd.concat([df2,dummies], axis='columns')
X = df3.loc[:,df3.columns != 'price']
# Target
y = df3['price']
serene scaffold
#

also, the definition of X could just be X = df3.drop('price', axis='columns')
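
(On the duplication itself: when `get_dummies` is given a whole DataFrame plus `columns=`, it returns the untouched columns together with the new dummy columns. Concatenating that result back onto `df2` therefore repeats `price`. A small sketch with made-up rows:)

```python
import pandas as pd

# Hypothetical stand-in for df2.
df2 = pd.DataFrame({
    'price': [10.0, 12.5, 7.0],
    'payment_type': ['card', 'cash', 'card'],
    'category_name': ['food', 'toys', 'food'],
})

dummies = pd.get_dummies(df2, columns=['payment_type', 'category_name'])
# dummies already holds 'price' plus the encoded columns.

df3 = pd.concat([df2, dummies], axis='columns')
print(list(df3.columns).count('price'))  # 'price' now appears twice

# Skipping the concat avoids the duplicate.
X = dummies.drop('price', axis='columns')
y = dummies['price']
```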

lapis sequoia
#

Hey not sure if this is the right place but I've been reading around but couldn't find 1 exact answer.

I am reading large files in python and looking for the fastest way to do so.

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

I don't tbh
I've had my code running for 5 hours and it processed 1.85mil lines, every line I am checking a list with a length of 26k if that item in the list is equal to the line of the file

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

Like multiprocessing?

serene scaffold
#

yes

lapis sequoia
#

I got 4 cores

serene scaffold
#

if processing each line doesn't depend on knowing about previous lines, you can do it in parallel
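
(Before reaching for multiprocessing, it may be worth checking the lookup itself. `x in some_list` scans the list item by item, so 26k items per line adds up; a `set` does a hash lookup instead. This is a sketch that assumes each line is compared for exact equality, as described above. The names `count_matches`, `wanted` and `lines` are made up.)

```python
def count_matches(lines, wanted):
    """Count lines that exactly equal one of the wanted items."""
    wanted_set = set(wanted)  # built once; membership tests are then O(1) on average
    return sum(1 for line in lines if line.rstrip("\n") in wanted_set)


wanted = [f"item{i}" for i in range(26_000)]
lines = ["item5\n", "nope\n", "item25999\n"]
print(count_matches(lines, wanted))
```

With a real file, something like `with open(path) as f: count_matches(f, wanted)` streams the file line by line without loading it all into memory.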

lapis sequoia
#

It's using about 25% of my CPU

serene scaffold
#

so maybe you could let it run overnight with three cores 🤷‍♂️

lapis sequoia
#

hmm i could look into it

#

I just don't want it to kill my CPU over time lol

#

I have no idea if it will I'm kinda new to this

serene scaffold
#

@lapis sequoia if each process uses at most 25% then running three instances will give you some clearance.

#

But it's up to you to know what the peak memory usage is

lapis sequoia
#

alright thx

odd falcon
#

hello everyone

#

excuse me, has anyone worked with lstm before?

odd falcon
#

help me, understand lstm, please!!

royal crest
native bay
#

ok so the full form is long short-term memory. it's mostly used for generating text like quotes, or you can also make maths questions with it

#

and it is smart enough to also understand the grammar used in a sentence once you have enough data

#

LSTM or long short term memory is a special type of RNN that solves traditional RNN's short term memory problem. In this video I will give a very simple explanation of LSTM using some real life examples so that you can understand this difficult topic easily. Also refer to following blogs to explore math and understand few more details.

http://c...

#

this video will be more helpful 🙂

ruby patio
#

i have seen it

scarlet python
#

@ruby patio great!

ruby patio
grave frost
native bay
grave frost
# ruby patio yeah bro

start with the official paper on arxiv, check out yannic kilcher or stack overflow if you have any doubts - or post them here

#

karpathy also wrote a ton of blogs on it, you can see them also

carmine tide
#

Hello, can someone help me a bit with curve fitting? I try fitting a curve on data with x and y errors using odr but it doesn't give me the correct curve. Thanks!

red mortar
#

how many GBs of data is recommended for machine learning (I'm choosing between 8 and 16)? I usually use Google colab for ML because of the free GPU, however i realized that in the future if i am doing things with larger datasets, google colab might not work because uploading the dataset to drive takes a really long time sometimes.

#

16gb data seems like overkill because i probably won't even do machine learning locally for the next few years, so i probably won't get it, but i am just wondering what other people think

grave frost
solid lintel
#

If you guys are looking for implementation of Machine Learning algorithms on python, I've made a github repository which you can follow (https://github.com/vanshhhhh/Hands-On-Machine-Learning)
If you find this helpful please do give it a star on github 🙂

GitHub

This repository contains the implementation of all the Machine Learning algorithms like Regression, Classification, Clustering etc. All of this has been Implemented in python - GitHub - vanshhhhh/H...

undone flare
#

Does this mean as n_student increases, posttest decreases?

rigid zodiac
#

Hi guys, I have a quick question. If I have a data frame with values from 1 to 10, and I'm trying to create overlapping groups like
(0,1,2), (1,2,3), (2,3,4) .... (8,9,10), how can I do it?

chilly geyser
desert oar
#

or do you just need to perform a calculation on 3 rows at a time?

#

don't make us guess!

rigid zodiac
#

idk what is it called, like mathematical term. My goal is for a list of data ranging from 1 to 10, I will create kinda a partition or group like
Group1: (1,2,3)
group2: (2,3,4)
etc.

desert oar
#

yes, and what are you trying to do with those groups? how do you want to store them? do you even need to store them, or do you just want to perform a calculation on them?

#

are these dataframe rows? index values? etc

rigid zodiac
#

Once I have those groups, I will use them as the positions to pick out data. Example: data['c'].iloc[list(group1)]

#

idk whether it is possible to do that or not

desert oar
#

and what do you want to do with that data?

#

so i ask again - what are you actually trying to do?

rigid zodiac
#

Thank you, I will look in to those documents. Reason why I do that is:
on the fall data that I have, I realize that the fall usually happen on the second smallest data. and the next 14 data of it has to be less than the 2nd smallest, specifically the data #2 up to #14 has to be between 0 and 20.

#

that's why I want to get the sequence of data then isolate it,

desert oar
#

!eval @rigid zodiac if you really just want the overlapping tuples, you can do something like this

from itertools import islice

def infinite_windows(window_size):
    start = 0
    while True:
        yield tuple(range(start, start+window_size))
        start += 1

windows = list(islice(infinite_windows(5), 5))
print(windows)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8)]
desert oar
#

but it sounds like you should probably be using DataFrame.rolling, i just don't really understand what you're trying to do
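
(For reference, a minimal sketch of what `rolling` does with a window of 3. The column name `c` and the numbers are just placeholders, and `max` stands in for whatever calculation is actually needed per window.)

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Each output row summarizes the current row and the two rows before it.
# The first two rows have no full window yet, so they come out as NaN.
df['rolling_max'] = df['c'].rolling(window=3).max()
print(df)
```

This covers the same overlapping groups (rows 0-2, 1-3, 2-4, ...) without building the index tuples by hand.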

desert oar
#

!eval @rigid zodiac you might also want your "windows" to be (start, stop) pairs, which you can use with df.iloc[start : stop]:

window_size = 5
n_windows = 5
window_bounds = [(start, start+window_size) for start in range(n_windows)]
print(window_bounds)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
desert oar
#

!eval as opposed to the way i did it before:

window_size = 5
n_windows = 5
window_bounds = [(start, start+window_size) for start in range(n_windows)]
window_indices = [list(range(start, stop)) for start, stop in window_bounds]
print(window_indices)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]
rigid zodiac
desert oar
#

i'm just showing a couple different ways to do the same thing

#

it's a good exercise to figure out what these different versions do

odd patio
#

Are there any other packages besides pyautogui and opencv?
I want a command like locateOnScreen in pyautogui

rigid zodiac
odd patio
#

So are there any other packages that do this?

desert oar
#

!eval here's another one @rigid zodiac :

from collections.abc import Iterator
from itertools import count, islice

def infinite_windows(window_size: int) -> Iterator[tuple[int, int]]:
    for window_start in count():
        window_stop = window_start + window_size
        yield (window_start, window_stop)

def window_to_indices(window: tuple[int, int]) -> list[int]:
    start, stop = window
    return list(range(start, stop))

window_size = 5
n_windows = 5
windows = infinite_windows(window_size)
windows = islice(windows, n_windows)
windows = map(window_to_indices, windows)
windows = list(windows)
print(windows)
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[[0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]
desert oar
rigid zodiac
agile jolt
#

hi, im new to jupyter notebook so i'm wondering what's wrong in here

#

solved it, nvm!

inland zephyr
#

hello, I need a suggestion about this case. When I run my CNN model, the training loss goes down but the validation loss rises every epoch. I'm using 38 samples for training and 12 for validation. I know this is heavily overfitted.

Epoch 1/50
1/1 [==============================] - 5s 5s/step - loss: 0.7031 - accuracy: 0.4000 - val_loss: 6.6536 - val_accuracy: 0.5000
Epoch 2/50
1/1 [==============================] - 0s 260ms/step - loss: 0.0332 - accuracy: 1.0000 - val_loss: 15.3232 - val_accuracy: 0.5000
Epoch 3/50
1/1 [==============================] - 0s 239ms/step - loss: 6.5698e-04 - accuracy: 1.0000 - val_loss: 25.3745 - val_accuracy: 0.5000
Epoch 4/50
1/1 [==============================] - 0s 244ms/step - loss: 3.0219e-06 - accuracy: 1.0000 - val_loss: 36.1942 - val_accuracy: 0.5000
Epoch 5/50
1/1 [==============================] - 0s 262ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 47.4359 - val_accuracy: 0.5000
Epoch 6/50
1/1 [==============================] - 0s 257ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 58.8482 - val_accuracy: 0.5000
Epoch 7/50
1/1 [==============================] - 0s 255ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 70.2395 - val_accuracy: 0.5000
Epoch 8/50
1/1 [==============================] - 0s 273ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 81.4671 - val_accuracy: 0.5000
Epoch 9/50
1/1 [==============================] - 0s 252ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 92.4239 - val_accuracy: 0.5000
Epoch 10/50
1/1 [==============================] - 0s 268ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 103.0329 - val_accuracy: 0.5000
#

i cannot add more data because the dataset is limited in size (actually it is 1D data)

odd falcon
#

help me find documentation on implementing sentiment analysis using a CNN with pytorch 🥸

sterile prawn
#

if you can't add any more data

#

then this will keep happening

#

try using a model with less parameters

#

maybe a simple one-layer perceptron

#

the fewer parameters

#

the more some info will be extracted from the data

#

or try finetuning a larger network

#

either way more data helps

inland zephyr
#

nvm, I have found the problem

#

one class has the same proportion of samples as the other class (since it's binary classification), but the samples of that problematic class are repeated and identical due to a preprocessing issue

#

so even with 12 samples, it only ever showed 2 distinct ones because of the repetition

shrewd cradle
#

Hello, any idea how to extract last date of the week from yyyyww format data?
for example: My dataset has 201501 and I need last date from 1st week of 2015 i.e. 4-1-2015
using python^
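
(One possible approach, assuming the week numbers follow ISO 8601, where weeks run Monday to Sunday. That assumption matches the example given, since ISO week 1 of 2015 ends on 4 January 2015. `date.fromisocalendar` needs Python 3.8 or newer; the function name `last_day_of_iso_week` is made up.)

```python
from datetime import date


def last_day_of_iso_week(yyyyww: int) -> date:
    """Return the Sunday that closes the given ISO week, e.g. 201501."""
    year, week = divmod(yyyyww, 100)  # 201501 -> (2015, 1)
    return date.fromisocalendar(year, week, 7)  # ISO weekday 7 is Sunday


print(last_day_of_iso_week(201501))
```

If the dataset uses a different week convention (for example weeks starting on Sunday), the result would be off by a day or a week, so it is worth spot-checking a few known dates.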

thorny coral
#

how hard is it to learn ML and AI

trim stream
odd falcon
#

how do I feed text data into a CNN?

earnest herald
#

I'm getting this error:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (128, 20, 20)
and I know how to fix it:

currentStates = currentStates.reshape(-1, *env.STATE_SHAPE) # env.STATE_SHAPE = (20, 20, 1)
currentQsList = self.model.predict(currentStates)

and currentStates.shape returns (128, 20, 20, 1) as expected so why is it saying Full shape received: (128, 20, 20)
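
(A quick NumPy check of the reshape on its own. It does give a 4-D batch, so one guess, and it is only a guess, is that the error comes from a different `predict` or `fit` call that still receives the 3-D array. `STATE_SHAPE` mirrors `env.STATE_SHAPE` from the message; the helper `as_batch` is hypothetical.)

```python
import numpy as np

STATE_SHAPE = (20, 20, 1)  # stand-in for env.STATE_SHAPE

current_states = np.zeros((128, 20, 20))
batch = current_states.reshape(-1, *STATE_SHAPE)
print(batch.shape)


def as_batch(states, state_shape=STATE_SHAPE):
    """Reshape one state or many states into (n, *state_shape)."""
    return np.asarray(states).reshape(-1, *state_shape)
```

Routing every array through a helper like `as_batch` right before each model call would make it easier to spot which call is still passing `(128, 20, 20)`.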

earnest herald
#

I'm convinced this is unsolvable

lapis sequoia
#

hey anyone know a GOOD text to speech engine that sounds like a google home?

sterile prawn
#

with work defo doable

#

but you need some math knowledge

#

some programming knowledge

#

a dash of natural aptitude doesn't hurt

#

and most of all persistence

#

lots and lots of persistence

desert oar
#

@earnest herald are you running this in a notebook?

earnest herald
desert oar
#

what is self.model?

#

is it trying to do some batching or something?

#

i assume self.model is some sklearn-style wrapper around a keras model, but i have no idea what its predict method does

#

if you can share what libraries you are using and the full traceback, maybe someone can help

earnest herald
#

I'm on my phone now, can we speak in DM?

desert oar
#

i'd rather not

#

you can @ me when you get back to a computer

#

i'm on here most days

#

can't guarantee an answer though

earnest herald
#

okay,

#

self.model is a keras Sequential model with 1 Conv2d layer and has input shape 20,20,1

#

the predict method takes a batch of states (captures of a game in array form) and returns the output layer's values

#

but as a batch since currentStates is a batch of states

desert oar
#

Can you show the actual code

grizzled barn
#

Was thinking of expanding my knowledge into the machine learning/Ai category. Anyone have any tips beforehand/stuff I should know?

royal crest
#

linalg

rough otter
#

im not very experienced with unsupervised learning-if input data is not labeled then how do you determine whether a model is accurate or not?

grand breach
#

Is keras a wrapper around tensorflow

serene scaffold
#

@grand breach yes

vital compass
#

which modules are important to learn data science

serene scaffold
# vital compass which modules are important to learn data science
  • numpy: doing math, especially in large batches
  • pandas: manipulating tabular data
  • sklearn: lots of data science tools and models to work with
  • pytorch or tensorflow: deep learning stuff
  • matplotlib: data visualization

but focus on learning data science in general and doing projects. don't try to "learn libraries".

undone flare
#

Can I share my kaggle notebook? If anyone could tell me how to improve it and what bad practices I should avoid doing?

#

It's a simple linear regression problem

serene scaffold
undone flare
serene scaffold
undone flare
#

hmm?

serene scaffold
#

or did you write the whole thing? I thought it was like a template or something

undone flare
#

no no I wrote the whole thing

serene scaffold
#

ahhh

#
def score(y_test, y_pred):
    """Helper function for evaluation metrics."""
    print(f"""Explained Variance: {explained_variance_score(y_test, y_pred) * 100:.2f}%
MAE: {round(mean_absolute_error(y_test, y_pred), 2):.2f}""")

I find this difficult to read

#

maybe make variables and then put those in the f string?
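
(Roughly what that could look like. It is a sketch of the suggestion, not the notebook's actual code: the metric values are computed first and the f-strings only do the formatting.)

```python
from sklearn.metrics import explained_variance_score, mean_absolute_error


def score(y_test, y_pred):
    """Helper function for evaluation metrics."""
    explained = explained_variance_score(y_test, y_pred) * 100
    mae = mean_absolute_error(y_test, y_pred)
    print(f"Explained Variance: {explained:.2f}%")
    print(f"MAE: {mae:.2f}")


score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
```

The separate `round(...)` call is dropped because the `:.2f` format already rounds to two decimals.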

undone flare
#

yea, will do that

serene scaffold
#

For all the cells where you go over the value counts for each feature, it might be interesting to show both the counts and the percent share
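
(For example, something along these lines puts counts and percent share side by side. The series here is invented; in the notebook it would be one of the feature columns.)

```python
import pandas as pd

# Hypothetical categorical feature.
feature = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'], name='feature')

summary = pd.DataFrame({
    'count': feature.value_counts(),
    'percent': feature.value_counts(normalize=True) * 100,
})
print(summary)
```

Both columns are indexed by category label, so pandas lines them up automatically.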

undone flare
#

thanks, will do

serene scaffold
#

pretty good, I think 👍

undone flare
#

should I also consider the standard deviation or mse in this case?

serene scaffold
rigid zodiac
#

I need some help. I'm trying to create a loop like the following, but it just keeps running forever.

for i in range(len(c)):
    if (c['ay'].iloc[i] == second_smallest(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == second_smallest(c['az'])) and (c['az'].iloc[i] < -20):
        for j in range(1, len(c)):
            if (c['ay'].iloc[j] < abs(c['ay'].iloc[i])) and (c['az'].iloc[j] < abs(c['az'].iloc[i])): # frame 1 after minimum
                 for k in range(2,len(c)):
                    if (abs(c['ay'].iloc[k]) < 10 ) and (abs(c['az'].iloc[k]) < 10): # frame 2
                        for n in range(3, len(c)):
                            if (abs(c['ay'].iloc[n]) < 10 ) and (abs(c['az'].iloc[n]) < 10):# frame 3 
                                for m in range(4, len(c)):
                                    if (abs(c['ay'].iloc[m]) < 10 ) and (abs(c['az'].iloc[m]) < 10):# frame 4 
                                        for b in range(5, len(c)):
                                            if (abs(c['ay'].iloc[b]) < 10 ) and (abs(c['az'].iloc[b]) < 10):# frame 5 
                                                for v in range(6, len(c)):
                                                    if (abs(c['ay'].iloc[v]) < 10 ) and (abs(c['az'].iloc[v]) < 10):# frame 6 
                                                        for h in range(7, len(c)):
                                                            if (abs(c['ay'].iloc[h]) < 10 ) and (abs(c['az'].iloc[h]) < 10):# frame 7 
                                                                c['cat'].iloc[h] = 1
serene scaffold
#

what is this supposed to do?

winged stratus
#

omg

serene scaffold
#

there must be a better way

winged stratus
#

please don't tell me you had to write this by hand

rigid zodiac
#

i use copy and paste

#

well my logic is if I get the second smallest, then if the next number is less than the second smallest.... and the subsequence 6 more number is ranging between 0 and 20 then the categorical is 1

#

but god forbid the pc didnt think like I do

#

like it will follow this logic i >j>k>n>m>b>v>h

rigid zodiac
acoustic halo
rigid zodiac
#

c has 67 data

acoustic halo
#

Because it looks like it runs O(n^8)

#

Yeah thats why lol

rigid zodiac
#

is there any better way to do this? I've been googling my ass off for like the entire week

acoustic halo
#

What exactly is it doing?

rigid zodiac
#

it will categorize whether object will fall or not

rigid zodiac
# acoustic halo What exactly is it doing?

so for the 1st line, I'm trying to say: if there exists a second smallest value in y and z, and the one right after it is less than its absolute value, and the absolute values of the 6 frames after that are between 0 and 20, then cat = 1

#

Like this image

#

but in the y and z acceleration. I also have to do like 3 more condition, similar with it as a fail proof. Because sometime we dont have a second smallest. we have smallest

acoustic halo
#

So let me see if I understand, you want to label data as 1 if it is a second smallest value, and the following 6 points don't go above 20?

rigid zodiac
acoustic halo
#

Okay well your nested for loops are super unnecessary because it repeats itself for example lets just look at a bit of it:

    if (abs(c['ay'].iloc[m]) < 10 ) and (abs(c['az'].iloc[m]) < 10):# frame 4 
        for b in range(5, len(c)):
            if (abs(c['ay'].iloc[b]) < 10 ) and (abs(c['az'].iloc[b]) < 10):# frame 5

Lets say m is 10 and b is 11, on the next m loop, it checks b = 11 again
#

You could just have a single for loop, starting at second_smallest, and ending at the number of points you want to check

#

Infact give me a minute and I'll rewrite it to show you

#
for i in range(len(c)):
    if (c['ay'].iloc[i] == second_smallest(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == second_smallest(c['az'])) and (c['az'].iloc[i] < -20)
    and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+1] < abs(c['az'].iloc[i])) # frame 1 after minimum
    and (abs(c['ay'].iloc[i+2]) < 10 ) and (abs(c['az'].iloc[i+2]) < 10) # frame 2
    and (abs(c['ay'].iloc[i+3]) < 10 ) and (abs(c['az'].iloc[i+3]) < 10)# frame 3 
    and (abs(c['ay'].iloc[i+4]) < 10 ) and (abs(c['az'].iloc[i+4]) < 10)# frame 4 
    and (abs(c['ay'].iloc[i+5]) < 10 ) and (abs(c['az'].iloc[i+5]) < 10)# frame 5 
    and (abs(c['ay'].iloc[i+6]) < 10 ) and (abs(c['az'].iloc[i+6]) < 10)# frame 6 
    and(abs(c['ay'].iloc[i+7]) < 10 ) and (abs(c['az'].iloc[i+7]) < 10):# frame 7 
        c['cat'].iloc[h] = 1
#

You can probably find a way to shorten the condition as well using another loop, but I'll leave that to you to figure out. This gets rid of the nested for loops

#

And I think should accomplish what you want to achieve

rigid zodiac
#

thank you so much let me try it

acoustic halo
#

then you can alter the giant condition with a for j in range 7 to shorten it

acoustic halo
#

You probably need to fix the indentation / newlines

#

It may be that the comments are cutting up the condition

#
for i in range(len(c)):
    if (c['ay'].iloc[i] == second_smallest(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == second_smallest(c['az'])) and (c['az'].iloc[i] < -20) and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+1] < abs(c['az'].iloc[i])) and (abs(c['ay'].iloc[i+2]) < 10 ) and (abs(c['az'].iloc[i+2]) < 10) and (abs(c['ay'].iloc[i+3]) < 10 ) and (abs(c['az'].iloc[i+3]) < 10) and (abs(c['ay'].iloc[i+4]) < 10 ) and (abs(c['az'].iloc[i+4]) < 10) and (abs(c['ay'].iloc[i+5]) < 10 ) and (abs(c['az'].iloc[i+5]) < 10) and (abs(c['ay'].iloc[i+6]) < 10 ) and (abs(c['az'].iloc[i+6]) < 10) and(abs(c['ay'].iloc[i+7]) < 10 ) and (abs(c['az'].iloc[i+7]) < 10):
        c['cat'].iloc[h] = 1
rigid zodiac
acoustic halo
#

Which should be 1?

rigid zodiac
#

the ay at -310

acoustic halo
#

is -310 the second smallest in ay?

rigid zodiac
#

yeah

acoustic halo
#

Okay but look at the conditions

#

and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])

#

c['ay'].iloc[i+1] is 207.523

#

Which is obviously bigger

rigid zodiac
#

yeah that's why from the second line I switch it back to its absolute value and compare the rest

acoustic halo
#

Yeah so you need to do the same here then

rigid zodiac
#

it has to pair with the az in other to work

winged stratus
#

looks like those ands can get an help from all

rigid zodiac
#

because some time ay happen, without az or ax, the whole thing will consider as fall

winged stratus
#

so
if all([
    c['ay'].iloc[i] == second_smallest(c['ay']),
    c['ay'].iloc[i] < -20,
    ...
]):

acoustic halo
#

a loop would be better since the conditions are basically the same with incremented indices but yeah

winged stratus
#

and put the c['ay'].iloc into a separate function

#
fn = c['ay'].iloc
fn(i), fn(i+1), ...
#

this should also help

rigid zodiac
#

so what will it look like?

winged stratus
#

and as spagoose said, it should be a loop

winged stratus
undone flare
#

I can safely drop a few categorical rows which are null if I have a big data set right? and it doesn't remove any of the unique values
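
(One way to check that instead of assuming it. Dropping rows that are null in one column cannot remove a category from that same column, but it can remove the only row holding a rare value in another column. The column names, data and the helper `categories_lost_by_dropna` are all made up for this sketch.)

```python
import pandas as pd


def categories_lost_by_dropna(df, subset, check_cols):
    """Drop rows with nulls in `subset` and report categories that vanish."""
    cleaned = df.dropna(subset=subset)
    lost = {}
    for col in check_cols:
        before = set(df[col].dropna().unique())
        after = set(cleaned[col].dropna().unique())
        if before - after:
            lost[col] = before - after
    return cleaned, lost


df = pd.DataFrame({
    'school_type': ['public', None, 'private', 'public'],
    'teaching_method': ['standard', 'experimental', 'standard', 'standard'],
})
cleaned, lost = categories_lost_by_dropna(
    df, subset=['school_type'], check_cols=['school_type', 'teaching_method']
)
print(lost)
```

An empty `lost` dict means every category survived the drop.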

rigid zodiac
acoustic halo
#

@rigid zodiac I realised the code i sent still has the h variable in it

#

So obviously you need to remove that

acoustic halo
#

i+7 no?

#

Thats how you were doing it originally

#
for i in range(len(c)):
    condition_name = all([c['ay'].iloc[i] == second_smallest(c['ay']),
    c['ay'].iloc[i] < -20,
    c['az'].iloc[i] == second_smallest(c['az']),
    c['az'].iloc[i] < -20])
    
    for j in range (1, 8):
        condition_name &= (c['ay'].iloc[i+j] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+j] < abs(c['az'].iloc[i]))
        
    if condition_name:
        c['cat'].iloc[i+7] = 1
#

But it could be i if thats where you wanted the label to be

rigid zodiac
#

it said out of bound.

#

for the previous code that you send it work. I will see whether it work for the other data

#
for i in range(len(c)):
    if (c['ay'].iloc[i] == second_smallest(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == second_smallest(c['az'])) and (c['az'].iloc[i] < -20) and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+1] < abs(c['az'].iloc[i])) and (abs(c['ay'].iloc[i+2]) < 10 ) and (abs(c['az'].iloc[i+2]) < 10) and (abs(c['ay'].iloc[i+3]) < 10 ) and (abs(c['az'].iloc[i+3]) < 10) and (abs(c['ay'].iloc[i+4]) < 10 ) and (abs(c['az'].iloc[i+4]) < 10) and (abs(c['ay'].iloc[i+5]) < 10 ) and (abs(c['az'].iloc[i+5]) < 10) and (abs(c['ay'].iloc[i+6]) < 10 ) and (abs(c['az'].iloc[i+6]) < 10) and(abs(c['ay'].iloc[i+7]) < 10 ) and (abs(c['az'].iloc[i+7]) < 10):
        c['cat'].iloc[i] = 1
acoustic halo
#

I'm just trying to wing it in notepad, so theres bound to be errors, you should easily be able to resolve something like an out of bounds error though

undone flare
elfin storm
#

i have to make a system map on good health and well being i want some idea or examples

rigid zodiac
rigid zodiac
undone flare
acoustic halo
#

You dont have an elif?

rigid zodiac
# undone flare rows sorry

well my ultimate goal is just id the initial fall. So i think, if all of the condition is satisfy then the ML algorithm will say object fall

rigid zodiac
acoustic halo
#

Depends on what your trying to do exactly, a break will exit the for loop

rigid zodiac
#

well because sometime fall will happen if we have the second smallest. Occasionally, it will happen if it is a smallest

#

So I have

c['cat'] = np.nan
for i in range(len(c)):
    if (c['ay'].iloc[i] == second_smallest(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == second_smallest(c['az'])) and (c['az'].iloc[i] < -20) and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+1] < abs(c['az'].iloc[i])) and (abs(c['ay'].iloc[i+2]) < 10 ) and (abs(c['az'].iloc[i+2]) < 10) and (abs(c['ay'].iloc[i+3]) < 10 ) and (abs(c['az'].iloc[i+3]) < 10) and (abs(c['ay'].iloc[i+4]) < 10 ) and (abs(c['az'].iloc[i+4]) < 10) and (abs(c['ay'].iloc[i+5]) < 10 ) and (abs(c['az'].iloc[i+5]) < 10) and (abs(c['ay'].iloc[i+6]) < 10 ) and (abs(c['az'].iloc[i+6]) < 10) and(abs(c['ay'].iloc[i+7]) < 10 ) and (abs(c['az'].iloc[i+7]) < 10):
        c['cat'].iloc[i] = 1
        


    elif (c['ay'].iloc[i] == min(c['ay'])) and (c['ay'].iloc[i] < -20) and (c['az'].iloc[i] == min(c['az'])) and (c['az'].iloc[i] < -20) and (c['ay'].iloc[i+1] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+1] < abs(c['az'].iloc[i])) and (abs(c['ay'].iloc[i+2]) < 10 ) and (abs(c['az'].iloc[i+2]) < 10) and (abs(c['ay'].iloc[i+3]) < 10 ) and (abs(c['az'].iloc[i+3]) < 10) and (abs(c['ay'].iloc[i+4]) < 10 ) and (abs(c['az'].iloc[i+4]) < 10) and (abs(c['ay'].iloc[i+5]) < 10 ) and (abs(c['az'].iloc[i+5]) < 10) and (abs(c['ay'].iloc[i+6]) < 10 ) and (abs(c['az'].iloc[i+6]) < 10) and(abs(c['ay'].iloc[i+7]) < 10 ) and (abs(c['az'].iloc[i+7]) < 10):
        c['cat'].iloc[i] = 1
#

I was just wondering do I need a "break" in order for that to jump to the elif

acoustic halo
#

no, no break needed for that

#

but you have two conditions that have the same result sooo

chilly geyser
#

Please refactor...

#

Why is the code like that?

rigid zodiac
#

agh, so it will be (c['ay'].iloc[i] <= second_smallest(c['ay']))

acoustic halo
#

Yeah you need to simplify the conditions, I only suggested that so you could see how I altered it from your original code

rigid zodiac
# chilly geyser Why is the code like that?

because I'm trying to say that if the second smallest. and satisfy those condition that I set. And the one right after that has to be < abs of the secondsmallest, and the next 7 data has to be between 20 and 0... then it has to be fall

chilly geyser
#

um what

#

I don't understand

#

I pasted it into VSC and I still don't understand

rigid zodiac
#

wait you mean my issue or your lol

chilly geyser
#

Right

#

This feels like one of those trend-change rules of thumbs

acoustic halo
#

it is basically

chilly geyser
#

Would be better if you can show an example dataframe

acoustic halo
#

But @rigid zodiac this simplifies what i said originally:

for i in range(len(c)):
    condition_name = all([c['ay'].iloc[i] == second_smallest(c['ay']),
    c['ay'].iloc[i] < -20,
    c['az'].iloc[i] == second_smallest(c['az']),
    c['az'].iloc[i] < -20])
    
    for j in range (1, 8):
        condition_name &= (c['ay'].iloc[i+j] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+j] < abs(c['az'].iloc[i]))
        
    if condition_name:
        c['cat'].iloc[i] = 1
chilly geyser
#

why is second_smallest a dataframe too

acoustic halo
#

Then you just add any extra conditions to that

#

I would be tempted to remove the i loop as well, but idk exactly what your labelling rules are

rigid zodiac
acoustic halo
#

condition_name is just the name of the variable I picked because I don't know what a label of 1 represents, you should rename it

#

But add it to the same variable with an OR, since the result is the same, the label 1

rigid zodiac
acoustic halo
#

Lets say you add your new alternate condition, the result is the same : c['cat'].iloc[i] = 1 if its true

#

So why make a new condition, just make the original => original condition OR new condition

#

You can make a new condition if you want and its easier for you to read, the end result is the same, but it's less code otherwise

acoustic halo
#

@rigid zodiac like this for example:

for i in range(len(c)):
    condition_name = all([
    (c['ay'].iloc[i] == second_smallest(c['ay']) and c['az'].iloc[i] == second_smallest(c['az'])) or (c['ay'].iloc[i] == min(c['ay']) and c['az'].iloc[i] == min(c['az'])),
    c['ay'].iloc[i] < -20,
    c['az'].iloc[i] < -20])
    
    for j in range (1, 8):
        condition_name &= (c['ay'].iloc[i+j] < abs(c['ay'].iloc[i])) and (c['az'].iloc[i+j] < abs(c['az'].iloc[i]))
        
    if condition_name:
        c['cat'].iloc[i] = 1
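
(Pulling the thread together, here is a hedged sketch of the same rule as a function. It keeps the thresholds from the original nested loop: frame 1 below the absolute value of the dip, frames 2 to 7 within ±10. It stops early so `i + 7` never runs out of bounds, and it writes the labels in one go instead of chained `.iloc` assignment. `second_smallest` is not shown anywhere in the chat, so the version below is an assumed stand-in, and `label_falls` and its parameter names are made up.)

```python
import numpy as np
import pandas as pd


def second_smallest(values: pd.Series) -> float:
    """Assumed stand-in for the helper used in the chat."""
    return values.nsmallest(2).iloc[-1]


def label_falls(c, drop_limit=-20, settle_limit=10, n_after=7):
    """Return a 'cat' Series: 1 where the fall rule matches, NaN elsewhere."""
    ay = c['ay'].to_numpy()
    az = c['az'].to_numpy()
    second_y, second_z = second_smallest(c['ay']), second_smallest(c['az'])
    min_y, min_z = ay.min(), az.min()

    cat = np.full(len(c), np.nan)
    # Stop n_after rows before the end so ay[i + j] is always valid.
    for i in range(len(c) - n_after):
        at_second = ay[i] == second_y and az[i] == second_z
        at_min = ay[i] == min_y and az[i] == min_z
        is_dip = (at_second or at_min) and ay[i] < drop_limit and az[i] < drop_limit
        if not is_dip:
            continue
        # Frame 1 after the dip must be below the dip's absolute value.
        first_ok = ay[i + 1] < abs(ay[i]) and az[i + 1] < abs(az[i])
        # Frames 2..n_after must have settled close to zero.
        settled = all(
            abs(ay[i + j]) < settle_limit and abs(az[i + j]) < settle_limit
            for j in range(2, n_after + 1)
        )
        if first_ok and settled:
            cat[i] = 1
    return pd.Series(cat, index=c.index, name='cat')


# Tiny made-up example: one sharp dip at row 2, then the signal settles.
c = pd.DataFrame({'ay': [1, 2, -50, 5, 3, 2, 1, 0, 1, 2, 1, 0],
                  'az': [1, 2, -50, 5, 3, 2, 1, 0, 1, 2, 1, 0]})
c['cat'] = label_falls(c)
print(c)
```

A further condition, for example pairing `ay` with `ax`, could be added as another boolean next to `at_second` and `at_min`.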
rigid zodiac
#

so that for ay and az right? what if I want to add the second condition for the ay and ax within that

acoustic halo
#

Yeah, but like I said, if you are not confident in doing it that way, do it however is easiest for you to understand

rigid zodiac
real wigeon
#

hello, I tried to google this but am having issues

#

im trying to use sqlalchemy to query my db

#

im just trying to get all the values for a specific column

undone flare
#
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
model = RidgeCV(alphas=np.arange(0, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')
model.fit(X_train, y_train)
print('alpha: %f' % model.alpha_)
any better way? cuz this takes way too long
#

I guess my mistake trying to do 0.01

chilly geyser
#

Yeah that's 100 cases
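
(To put a number on it: 100 alphas times 10 splits times 3 repeats is 3000 model fits. One cheaper option, sketched below on synthetic data, is a coarse log-spaced grid first and a finer search around the winner afterwards. The grid bounds are an arbitrary choice and `X_train` / `y_train` here are random placeholders. Leaving `cv` at its default is another option, since `RidgeCV` then uses its efficient leave-one-out scheme.)

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)

# Cost of the original search: every alpha is fitted on every fold.
fine_grid = np.arange(0, 1, 0.01)
n_fits_fine = len(fine_grid) * cv.get_n_splits()
print(f"fits with the fine grid: {n_fits_fine}")

# Coarse grid: 9 values spread over several orders of magnitude.
coarse_grid = np.logspace(-3, 1, 9)

# Synthetic placeholder data, only so the sketch runs end to end.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

model = RidgeCV(alphas=coarse_grid, cv=cv, scoring='neg_mean_absolute_error')
model.fit(X_train, y_train)
print(f"alpha: {model.alpha_:f}")
```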

#

You could probably throw it to some online host

undone flare
#

I- great