#data-science-and-ml


wispy wolf
#

bonus point if you can guess what I'm doing

desert oar
#

I am on mobile right now but i can try another time

wispy wolf
#

No worries!

#

I've just realised I never write my own pandas, I always refactor working pandas

mint palm
#

Question about MobileNet:
I'm referring to the MobileNet variant that has residual connections and a bottleneck layer (consisting of expansion, depthwise conv and projection).
My question:
Why does this NN architecture, apart from being memory- and computation-efficient, also have the advantage of learning more complex and richer functions?

#

I think it is because the multiple convs and filters keep the function from overfitting, and the features that actually need to be learned become more pronounced after the filters in the bottleneck are applied.

hoary wigeon
#

Can anyone help me with multicollinearity?

#

Multicollinearity (TotalBsmtSF, GarageArea) >> SalePrice

desert oar
#

I wouldn't worry about it unless you are fitting with OLS and you see high VIFs

chilly geyser
#

From the corr diagram all the areas are heavily related, and house ages are also related. (for hopefully obvious reasons)

While I wouldn't worry about it, I think using just one feature from each of the two groups to predict SalePrice would use the data most efficiently
Given n=9 candidate features, I would run automated AIC-based exhaustive subset selection on it - it should terminate within a reasonable time

hoary wigeon
#

cool then

chilly geyser
#

Just search for lowest(or highest if that's what your system uses) AIC

#

Typical modern computer should be able to do 2^9 fits without issues
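
A minimal sketch of the exhaustive AIC search described above, assuming ordinary least squares with Gaussian errors, where AIC can be computed up to an additive constant as `n * ln(RSS / n) + 2k`. The synthetic data and the helper names `ols_aic` and `best_subset_by_aic` are illustrative and not from the discussion. It tries the 2^9 - 1 non-empty subsets.

```python
from itertools import combinations

import numpy as np


def ols_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # coefficients plus the noise variance
    return n * np.log(rss / n) + 2 * k


def best_subset_by_aic(X, y):
    """Fit every non-empty subset of columns and keep the lowest AIC."""
    p = X.shape[1]
    best = (np.inf, ())
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            aic = ols_aic(X[:, list(cols)], y)
            if aic < best[0]:
                best = (aic, cols)
    return best


# Synthetic data: 9 candidate features, columns 0 and 1 nearly collinear
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + rng.normal(size=200)

best_aic, best_cols = best_subset_by_aic(X, y)
print(best_cols, round(best_aic, 2))
```

Lower AIC is better under this convention; some packages report a differently scaled or signed criterion, which is the caveat raised above.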

solid kindle
#

JupyterLab 3.0.16 sometimes jumps the position of the code cell on the screen when I click on it (it autoscrolls a few lines). It often happens when I click on the output and then on the code, which is pretty annoying. Is there a setting to change that, or is it a bug that I should file on the Jupyter GitHub?

glacial sparrow
#

when training and evaluating Isolation Forest on a labelled dataset i.e. 95% inliers 5% outliers, the minority class percentage should be the contamination ratio right?

robust snow
#

What's the recommended FPS, video resolution, & video size for video input for OpenCV's VideoCapture?

#

Right now I'm trying a vehicle tracking program using my own video file (1080p, 30 FPS), but it stutters so much and the centroid is all over the place.

opaque stratus
#

Basically, I am using various BERT models to classify a certain section of patient's medical documents (electronic health records) -- however, much of the valuable information is written in a sort of 'note' format, and less so refined English.
BERT is extremely good at understanding a language, however, I can't help but think this task relies heavily on a set of key words rather than a complex understanding of the clinical language
This can be highlighted by the fact that
while the various BERT models I implement do indeed perform very well (high f1 scores across the board), there is not much difference between them -- even though some have been pretrained on entirely different material
and some even employ different pretraining techniques
Does anyone know any NLP models that would be better suited for this? -- please @ me if so ! ty

deft harbor
#

@opaque stratus have you tried fine tuning your Bert model?

opaque stratus
#

all fine-tuned on the downstream task i mentioned above

cerulean stream
#

hi
so I'm using matplotlib in different areas of one program and my issue is that the plots interfere with each other, e.g.
after one plot is saved, the plots that are made after it use the previous plot as the background MegaThonk
any idea how I would fix that?

opaque stratus
cerulean stream
#

rooThink but matplotlib is in the channel description so

opaque stratus
#

oh

#

thought this was a help channel

#

my bad

#

but now nobody will see my message above =[

deft harbor
#

Have you tried gpt2

#

I've seen some people have success adding gnns to document based problems, but I'm not sure how to set that up without working through the specific data

tepid cobalt
lapis sequoia
#

How can I see a neural network's properties after loading it from a pickle? Like the features used, number of layers, etc.

quasi sparrow
#
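import pandas as pd

def classifier_data(data, time_horizon):
    # Shift the features by time_horizon rows; the vacated rows become NaN
    lagged_features = data.shift(time_horizon, axis=0)
    data = data.add_suffix('_target')
    data_formatted = pd.concat([lagged_features, data], axis=1)
    # dropna() returns a new frame; drop the NaN rows and keep the result
    data_formatted = data_formatted.dropna(axis='index')
    return data_formatted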
#

How can I drop the first or last rows of my dataframe?

#

The code works but it's not efficient, because dropna iterates over all rows to find the rows with NaN, but I already know where the NaNs are.
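
Because `shift` only introduces NaN in the first `time_horizon` rows (for a positive horizon), those rows can be sliced off by position instead of being found with `dropna`. A minimal sketch, assuming a positive `time_horizon` and no other missing values in the data; the toy frame is made up.

```python
import numpy as np
import pandas as pd


def classifier_data(data, time_horizon):
    """Pair each row's targets with the features from time_horizon rows earlier."""
    lagged_features = data.shift(time_horizon)
    targets = data.add_suffix("_target")
    combined = pd.concat([lagged_features, targets], axis=1)
    # shift() leaves NaN only in the first time_horizon rows, so slice them off
    return combined.iloc[time_horizon:]


df = pd.DataFrame({"a": np.arange(5.0), "b": np.arange(5.0) * 10})
out = classifier_data(df, 2)
print(out)
```

For a negative horizon the NaN rows sit at the end instead, so the slice would be `combined.iloc[:time_horizon]`.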

vague stratus
inland zephyr
#

I need help with some TensorFlow things. I successfully ran this model with Keras Tuner

def GetModel(hp):
    f_choice = hp.Choice('num_filters',values=[16,32, 64, 128,512],default=16)
    k_size = hp.Choice('kernelsize',values=[3,7,11,13,15],default = 3)
    max_k_size = hp.Choice('pool_kernelsize',values=[2,3,5,7,11],default = 2)
    dropouts = hp.Choice('dropout',values=[0.2,0.3,0.5,0.7,0.8],default=0.3)
    model = Sequential()
    model.add(InputLayer(input_shape=(15360,1)))
    model.add(Convolution1D(filters= f_choice,kernel_size=    k_size ,activation=tf.nn.leaky_relu,strides=1))
    model.add(Convolution1D(filters= f_choice,kernel_size=    k_size ,activation=tf.nn.leaky_relu,strides=1))
    model.add(Dropout(rate=dropouts))
    model.add(MaxPooling1D(pool_size=max_k_size,strides =2))
    model.add(Convolution1D(filters= f_choice,kernel_size=    k_size ,activation=tf.nn.leaky_relu,strides=1))
    model.add(Dropout(rate=dropouts ))
    model.add(MaxPooling1D(pool_size=max_k_size,strides = 2))
    model.add(Flatten())
    model.add(Dense(64,activation=tf.nn.leaky_relu))
    model.add(Dense(2,activation='softmax'))
    model.compile(loss = 'sparse_categorical_crossentropy',optimizer='Adam', metrics=["accuracy"])
    return model

and got the params from Hyperband, which are 128 for num_filters, 3 for kernelsize, 2 for pooling and 0.3 for dropout. But with the same model in a different function, I get the error TypeError: unsupported operand type(s) for %: 'ListWrapper' and 'int'

#

I have run the same process before (I'm using the Google Colab free tier for this) with tf 2.5.0. This is the Hyperband process:

def hyperband_test(trainX, trainY,order):
    print(order)
    seeds = 20
    stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10,mode='max')
    model = GetModel
    tuner = Hyperband(model,max_epochs=55,objective='val_accuracy',seed=seeds,executions_per_trial=2,factor=5,hyperband_iterations=1,directory='DIR')
    print(tuner.search_space_summary())
    tuner.search(trainX,trainY, epochs=50, validation_split=0.4,callbacks=[stop_early])
    best_hps=tuner.get_best_hyperparameters()[0]
    print(best_hps.get('num_filters'),
              best_hps.get('kernelsize'),
              best_hps.get('pool_kernelsize'),
              best_hps.get('dropout'))
    filter = best_hps.get('num_filters')
    kernel = best_hps.get('kernelsize')
    pooling = best_hps.get('pool_kernelsize')
    dropouts = best_hps.get('dropout')
    return filter,kernel,pooling,dropouts
inland zephyr
hexed walrus
cedar bluff
#

How to unite different queries results?

serene scaffold
#

!docs pandas.concat

arctic wedgeBOT
#

```
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
```
Concatenate pandas objects along a particular axis with optional set logic along the other axes.

Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.
inland zephyr
cedar bluff
#

It doesn't work, I don't know why exactly
What does concat return?

serene scaffold
inland zephyr
cedar bluff
#

Hmm

hexed walrus
waxen veldt
#

pandas help

serene scaffold
waxen veldt
#

I have a categorical column mostly filled with integers with occasional lettering. I want to encode this data for my ML algorithm but don't know if I should use Label Encoding Or One-Hot Encoding.
If I use One-Hot Encoding, I will get way too many columns since I have so many distinct labels. And if I do label encoding, I will run into the problem of the ML algorithm treating the tickets as hierarchical
which one would be best to use?
Or better question, how should I encode this categorical column

serene scaffold
waxen veldt
#

mm not really

serene scaffold
#

or are they just identifiers?

#

oh

waxen veldt
#

they are just identifiers

serene scaffold
#

what algorithm do you plan to use? because that's a lot of classes.

waxen veldt
#

actually since this is the titanic dataset from kaggle, the letter probably represents the floor level

serene scaffold
#

what are you trying to predict?

waxen veldt
serene scaffold
waxen veldt
#

here is more info

serene scaffold
#

you want to select features that are relevant to who lived or died.

waxen veldt
#

and that floor is implicit in the ticket values

waxen veldt
serene scaffold
waxen veldt
#

im a bit new sorry

#

i see.

#

if i were to check the correlation between a column and my y-labels, whats the best way to do that?

serene scaffold
#

if there's a way you can parse what deck their cabin was on out of another column, you can add that as a new column.

#

actually their cabin number is given, but it's missing for most rows. So you probably can't use that.

waxen veldt
#

i dropped that column

#

i dropped name and cabin

#

columns

serene scaffold
#

Yeah, you don't want to use their name as a feature, either.

waxen veldt
#

my intuition is that sex won't matter either.

#

but like what if there is some weird correlation?

serene scaffold
#

men were a lower priority for the life boats. I would definitely keep that feature.

waxen veldt
#

so you basically have to know past data to decide which features are relevant

#

so if you were to get a new dataset, you would proceed to research about the topic at hand?

serene scaffold
waxen veldt
#

do you think i should just drop the tickets column and see how the model performs?

serene scaffold
waxen veldt
#

hmm

#

okay so pclass, sex, age, sibsp, parch, fare, and embarked are the columns i would select

#

i don't even think i need the id column

serene scaffold
#

but it shouldn't be a feature for sure 😄

waxen veldt
#

but the dataframe is already an index no?

#

so its basically double counting

#

like when you do pd.read_csv() it automatically creates an index for all of the rows

serene scaffold
serene scaffold
#

an index is assigned automatically, but you can instead pick a column that you want to use to index.
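
A small sketch of the two options, assuming the Kaggle Titanic column name `PassengerId`; the three CSV rows are made up. By default `read_csv` adds a 0-based row index and keeps the id as an ordinary column, while `index_col` turns the id into the index so it is no longer among the feature columns.

```python
import io

import pandas as pd

csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n3,1,3\n"

# Default: a fresh 0..n-1 index, PassengerId stays a regular column
default_index = pd.read_csv(io.StringIO(csv_text))

# index_col: PassengerId becomes the index instead of a column
id_as_index = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

print(default_index.columns.tolist())
print(id_as_index.columns.tolist())
```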

waxen veldt
#

or is it better to set the id column as the index

#

i see

proud oasis
#

@serene scaffold has all I want to say so I don't have more input, Tawsif 🙂

proud oasis
#

Maybe you can search how the ticket numbers work and get the deck information from that or something, you can't really use those ticket numbers in raw format

waxen veldt
proud oasis
#

eh not really, I learned this stuff in school

acoustic halo
#

Just use one-hot

serene scaffold
#

or, don't encode them at all

waxen veldt
acoustic halo
#

label encoding assumes all the categories exist on a scale so thats out

#

Thats fine

serene scaffold
#

if there's a category with way too many possible values, the model isn't going to learn anything

waxen veldt
serene scaffold
#

just skip that feature

proud oasis
#

you don't have to use all data, if you use too much data maybe it's prone to overfitting

serene scaffold
acoustic halo
#

1000 features is not really that many in the grand scheme of things

waxen veldt
serene scaffold
waxen veldt
#

between the feature and your y_label

serene scaffold
serene scaffold
waxen veldt
serene scaffold
#
     lived  died
1    20     5
2    10     10
3    3      15

It would look like that
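
One way to build a table of that shape is a contingency table with `pd.crosstab`. This is a sketch on a made-up frame; `Pclass` and `Survived` are the Kaggle Titanic column names, and the counts here are not the real ones.

```python
import pandas as pd

# Toy stand-in for the Titanic training data (values are invented)
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
        "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    }
)

# Raw counts per (class, outcome) pair
counts = pd.crosstab(df["Pclass"], df["Survived"])

# Same table as row-wise proportions, i.e. survival rate per class
rates = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")

print(counts)
print(rates.round(2))
```

If the rows of the proportion table look very different from each other, the feature carries information about the label.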

acoustic halo
#

Is the 1000 features actually just the unique id?

waxen veldt
#

i got more questions but ill hold off, ill try it out thanks for the help.

serene scaffold
waxen veldt
#

when you are doing one hot encoding, is it good to call drop_first=True for the pd.get_dummies column? I get that it reduces the number of columns you have, but doesn't it also lose info about the columns?

#

I.e if you call drop_first=True for the Sex column (pretending you didn't know the other sex), you would have no idea that "male" exists

acoustic halo
#

But you would have the second column still

#

So if a person is not female in the remaining column, you would know they are male

vague stratus
waxen veldt
acoustic halo
#

You're trying to one-hot encode the sex right?

#

Theres not really any difference between using one-hot and label encoding when there are only two options, except that label means one less input

waxen veldt
acoustic halo
#

If they exist on a scale, label encoding

#

Otherwise one hot with 4 or 5 columns

#

where with 4 columns 0,0,0,0 would be the fifth category

rich fox
#

what part of openai playground homepage would redirect to a page that looks like this

acoustic halo
#

@waxen veldt So if you have 3 categories for instance you could use this:

```
cow  [1, 0, 0]
goat [0, 1, 0]
pig  [0, 0, 1]
```
or
```
cow  [1, 0]
goat [0, 1]
pig  [0, 0]
```
#

I prefer the first option for more than 2 categories, but ultimately it's personal preference
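
Both encodings can be produced with `pd.get_dummies`; a short sketch on a made-up `animal` column. Note that `drop_first=True` drops the first category in sorted order, so here `cow` becomes the all-zeros row, whereas the hand-written example above uses `pig` as the all-zeros category. Either choice carries the same information.

```python
import pandas as pd

animals = pd.Series(["cow", "goat", "pig", "goat"], name="animal")

# One column per category
full = pd.get_dummies(animals, dtype=int)

# First category dropped; a row of all zeros now means "cow"
reduced = pd.get_dummies(animals, drop_first=True, dtype=int)

print(full)
print(reduced)
```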

remote fossil
waxen veldt
remote fossil
#

How do you read the 3d dimensions? E.g. 4x4x512, Im guessing last is neurons in layer

acoustic halo
lapis sequoia
#

idky as soon as i imported pynput.keyboard and run i get this error

acoustic halo
# waxen veldt makes sense

This is why I prefer using all the categories in one-hot, for your sex column, you could use your drop first and rename the column something like isMale or isFemale depending on which you drop

lapis sequoia
serene scaffold
acoustic halo
#

You have doubly imported controller, one from pynput.mouse then again from keyboard

#

It's trying to use the keyboard Controller

serene scaffold
#

@lapis sequoia you could do import ... as ... to give them different names. Like MouseController

lapis sequoia
#

i did this

#

but now i get this error

acoustic halo
#

Now your keyboard controller will use the mouse

serene scaffold
#

btw, I'm not sure that this is on-topic

acoustic halo
#

Yeah definitely not

lapis sequoia
acoustic halo
#

Moving a mouse is not ai

serene scaffold
lapis sequoia
#

ok

#

is there a channel for that?

serene scaffold
lapis sequoia
#

ok thank you

grave frost
#

plus LM's usually pick up a lot of things like medical stuff too

acoustic halo
#

@opaque stratus If you think it is more reliant on keywords, just try an n-gram based model, it can outperform bert in a lot of cases where there is no structured language

grave frost
#

even if its keyword based, BERT still manages to do it - hence "what does it matter"

acoustic halo
#

But he's asking if there is a model better suited for unstructured language

grave frost
#

hmmm....you seem familiar pithink

grave frost
acoustic halo
#

Better accuracy? What else lol

grave frost
flat hollow
#

Hi there, look for some help with the function scipy.stats.ttest_rel and my data. I have a dataframe with 4 columns and 2064 rows. The rows are separated into chunks of 86, so there is a TON of categories (see the screenshot to get an idea). I would like to find the ttest value between the group and individual columns for v_p and k_trans, but each input to ttest_rel should only contain the rows for each category (86 at a time). I could do this by for-looping over all the different categories in the multiindex, but I was hoping there would be a vectorised way of achieving the same thing. Any ideas of how to proceed?

quasi sparrow
#

So you have a three dimensional data set?

#

4 columns and 2064 rows times 86?

nova tapir
#

can someone explain?

flat hollow
# quasi sparrow 4 columns and 2064 rows times 86?

actually I realised it's 72 different row-categories, with each category having 28.66... values on average because patient_catg has a variable number of rows per label. If I ignore that one, then it's 24 different row-categories with 86 rows per category

#

2064 is the total number of rows
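
A sketch of one loop-free way to get the paired t statistic per category: the statistic is the mean of the paired differences divided by their standard error, and both pieces are plain `groupby` aggregations. The column names `category`, `group_v_p` and `individual_v_p` are hypothetical stand-ins for the real multi-index layout, and the data is random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_categories, rows_per_category = 24, 86

df = pd.DataFrame(
    {
        "category": np.repeat(np.arange(n_categories), rows_per_category),
        "group_v_p": rng.normal(size=n_categories * rows_per_category),
    }
)
df["individual_v_p"] = df["group_v_p"] + rng.normal(0.1, 0.5, size=len(df))

# Paired differences, then mean / sample std / count per category
diff = df["group_v_p"] - df["individual_v_p"]
stats = diff.groupby(df["category"]).agg(["mean", "std", "count"])

# Paired t statistic: mean(d) / (std(d, ddof=1) / sqrt(n))
t_stat = stats["mean"] / (stats["std"] / np.sqrt(stats["count"]))
print(t_stat.head())
```

This yields only the statistic; p-values would additionally need the t distribution with `count - 1` degrees of freedom. With only 24 or 72 categories, a plain `groupby(...).apply(...)` that calls `scipy.stats.ttest_rel` per group is also unlikely to be a bottleneck.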

flat hollow
# nova tapir can someone explain?

isn't it just plugging numbers into a calculator? like if you do J(theta + epsilon) = 1.01^3 and J(theta - epsilon) = 0.99^3, then subtract and divide by 2*epsilon = 0.02, you just get the answer as 3.0001

#

and derivative of J(theta) = theta^3 is 3*theta^2, with theta = 1 that's equal to 3 trivially
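
The arithmetic above, written out as a two-sided difference check. This sketch assumes theta = 1 and epsilon = 0.01, which is what the quoted values 1.01^3 and 0.99^3 imply; the function names are illustrative.

```python
def numerical_gradient(f, theta, epsilon):
    """Two-sided (central) difference approximation of f'(theta)."""
    return (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)


def J(theta):
    return theta ** 3


# (1.030301 - 0.970299) / 0.02, approximately 3.0001
approx = numerical_gradient(J, 1.0, 0.01)

# Analytic derivative 3 * theta^2 at theta = 1
exact = 3 * 1.0 ** 2

print(approx, exact)
```

The gap of about 0.0001 between the two is the approximation error, which equals epsilon^2 for this cubic.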

quasi sparrow
#

I'm not sure. Sorry I can't be of more help but I don't understand the dataset

#

I have a question too.

I'm training a regressor; the dataset is a combination of 4 systems that produce the same data output. I concatenated the 4 datasets to have more samples. Can I train a model without testing data, using only training and validation data?
Would it be a good practice to use only 3 systems to train data and then test the model with data coming from the fourth system just to test transfer learning?

#

Should I test the data using cross-validation?

potent cobalt
#

does anyone have a book they can recommend about time series forecasting with NNs?

serene scaffold
undone vigil
#

Hello, I'm new to machine learning. I'm creating a crawler that should classify whether a given URL is an article link or a section link. My approach is to decide whether a web page is an article based on the structure of its Document Object Model (DOM), because an article page would always have a title and content, while a section page has the opposite DOM structure.

I'm not yet sure if I should use random forest algorithm to classify a web page or other algorithm to perform a DOM analysis to classify if its an article link or section link. Any thoughts?

serene scaffold
undone vigil
serene scaffold
undone vigil
#

Some links have # and others don't have #. That's why I try not to base it on links.

#

Also when I collect relative links I clean all the links such as replacing from https to http, and disallowing fragments(#) to avoid duplicate links.

serene scaffold
undone vigil
#

An article would always have a title and content.

#

That's why I'm not yet sure on what algorithm should I use for this web page classification.

serene scaffold
twin token
# quasi sparrow I have a question too. I'm training a regressor, the dataset is a combination o...

Well, technically you can, but it is much better to have that third hold-out test set to test your model. Remember that the model is tuned against your validation set and might overfit to that specific data. The test set is therefore your way of simulating how the model would perform, and thus generalize, in a real-world setting. What you can do, however, is make the three splits and, once you have found the best model, fit a final model on all the data, but that only makes sense if the model is to be deployed in a production environment. If it is for a paper, it is not an option, I guess.
Regarding your second question: it depends on your case and data, but in general I would say no. I would rather just split your data into train, validation and test sets by random sampling. A random split should avoid any systematic bias that might occur, e.g. between the systems. Is there any particular reason you want to use one system's output data as a test set?

twin token
willow gull
#

Serious question: Is it possible to code an AI capable of sentience in python? if so, I'd like to try

#

also where would i get started neural networking

#

it all looks really cool

#

idk

serene scaffold
#

@willow gull this is a philosophical question about what counts as sentience, but lots of cutting edge AI is written in python, so it's just a matter of your answer to that question.

willow gull
#

yeh

mystic harbor
#

hey @hoary spear, please try to speak in English 😅

#

!rule 4

arctic wedgeBOT
#

4. Use English to the best of your ability. Be polite if someone speaks English imperfectly.

hoary spear
#

My friend ask me
Something

mystic harbor
hoary spear
#

Ok

#

Thank you

mystic harbor
#

No worries 👍

iron basalt
acoustic halo
#

Surely this is the right place to ask AI related questions considering we don't have a philosophy channel

#

And @willow gull, what you are thinking of is called a general artificial intelligence, and no, making one wouldn't be feasible without a supercomputer that far outperforms anything that currently exists and decades of research

iron basalt
#

I'm strongly of the opinion that we have more than enough compute power (way more than enough). We are just doing it wrong.

acoustic halo
#

I would say a brain does outperform a supercomputer, yes

iron basalt
#

Based on much simpler life having complex behavior that we fail to replicate.

acoustic halo
#

And that creating a human-like general AI would require the simulation of a brain, which is currently impossible

dire echo
#

Yeah

#

It will be around for 20-30 years later by predictions

#

And ASI will come in ???

iron basalt
#

I believe the biggest problem is not compute power, but rather that we are trying to catch up with millions of years of evolution (so much data and so many iterations, and reality is much more complex than any simulation).

#

In many ways, we are data bound.

dire echo
#

Also how do we make the AI think like human

#

Human thinking is so complex

#

Thanks to God, we can think better

iron basalt
dire echo
#

What beyond ASI lemon_thinking

iron basalt
#

ASI implies that it surpasses humans, but it does not need to surpass humans to do any human task.

dire echo
#

Imagine when AI can think sooooo fast that i run out of "o"

iron basalt
#

Also it's very hard to tell what it even means to "surpass" at that point.

dire echo
#

Like it can learn a entire new galaxy on 1 sec

iron basalt
#

It can't do physically impossible things.

dire echo
dire echo
#

Maybe

iron basalt
#

So you can assume it will find something like that, but based on what? How is that better than the prediction that it can't do things that seem very physically impossible?

dire echo
#

or instead learning what have learned they just... learn new things that human have wall to reach

#

Like when space become big so the expanding speed goes vroom vroom then humans will probaly stuck

iron basalt
#

ASI will know things humans can't understand, but just because a worm can't understand calculus does not mean that we can do physically impossible things.

dire echo
iron basalt
#

(just based on what we know, which is all we can go off of, everything else is just hopeful fantasy)

nova tapir
#

does anyone know, why the code is not working? it gives "error: I: conversion of 1.0001 to octave_idx_type value failed" error

grave frost
# iron basalt In many ways, we are data bound.

I wouldn't see it too much of an issue; seeing that even if we have an intelligence that can demonstrate near-human thinking capability virtually, why it won't be able to learn as a robot from our surroundings too

acoustic halo
# grave frost how?

Well, it's hard to make any real direct comparison, because a brain is so vastly different from the von Neumann architecture that a computer uses. You can't really say the brain is better in terms of computing performance since they work so differently, so my claim that it does outperform is dubious. That said, at being a human brain, the human brain does indeed outperform any supercomputer, unless said supercomputer is capable of perfectly emulating a brain (which currently hasn't happened)

#

TL;DR You can't really compare a computer to a brain, only to an emulation of a brain running on a supercomputer, and we can't do that.

tender hearth
# iron basalt In many ways, we are data bound.

if we could figure out how reasoning in the brain works we wouldn't be so data bound. trained ML models are just software encoded instincts, and instincts develop with experience thus the need for data

untold night
#

Hi, I am working on a time series model which should predict after how many days or months vaccination will be complete in India. I am applying the NeuralProphet model. I am able to predict vaccination doses for the future, but I am a bit confused about how I can predict the day or month on which vaccination will be completed according to the population of India. Here is my notebook: https://colab.research.google.com/drive/1DOaNNunweNo_GDs6GU6fQuaZCCTdbXmK?usp=sharing please help me with how I should proceed further and whether my approach is correct.

grave frost
# acoustic halo Well, it's hard to make any real direct comparison, because a brain is so vastly...

according to neuroscience, most of a neuron's processes are for its life-support capabilities, since neurons are among the most valuable cell types in the body. for the 150k cortical columns, even replicating 10x the amount would be doable on a supercomputer due to its sparse nature, which makes computations a piece of cake. the real problem is processing the SDRs (basically sparse matrices), but we don't even need FPGAs for that - current CPUs are really efficient for sparse ops.

#

overall, it's more about the understanding and the need for a universal framework that accounts for the processes in our brain, like an equivalent to Newton's 3 laws for physics. previous attempts at making a framework were not very good, but newer attempts like HTM and SNNs are good for a start

#

previous attempts tried to explain the data from their experiments, which has been concluded to be not the best approach for complex systems - as demonstrated by general relativity.

lavish tundra
#

guys, i have two columns in a DataFrame where the cells of the columns are lists []
and i need to get the index and the name of the column that contains a given value, like the img below

i tried: db[['wts_id', 'wtb_id']].apply(lambda x: x.map(set(['123456']).issubset)) to get this img but idk how i can get the index and the column where the True is, or how to do an if statement to check whether there is any True in this DataFrame, i'm kinda bad at using apply and map =/
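
A sketch of one way to go from that boolean frame to index and column labels, using `np.where` on the underlying array. The column names come from the question; the frame contents and index labels are made up.

```python
import numpy as np
import pandas as pd

db = pd.DataFrame(
    {
        "wts_id": [["111"], ["123456", "222"], []],
        "wtb_id": [["333"], [], ["123456"]],
    },
    index=["a", "b", "c"],
)

target = "123456"

# Boolean frame: True where the cell's list contains the target value
mask = db[["wts_id", "wtb_id"]].apply(
    lambda col: col.map(lambda ids: target in ids)
)

# "Is there any True at all?" check, then row/column positions of each True
if mask.to_numpy().any():
    rows, cols = np.where(mask.to_numpy())
    hits = [(mask.index[r], mask.columns[c]) for r, c in zip(rows, cols)]
    print(hits)
```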

spiral ridge
#

@lavish tundra

#

Maybe this helps

lavish tundra
#

unfortunately it dont ; -;

#

its too basic

lapis sequoia
#

Hi, what's the best way to make a shelve file into a pandas df?

The shelve file is in the form
key:dict

and i want the df to be made of the keys and values from the dict.

here's what i am using now but it's very very slow

import shelve
import tqdm

with shelve.open('dbfile') as db:
    for key in tqdm.tqdm(keys):
        df = df.append(db[key],ignore_index=True)
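
The loop above is slow because `DataFrame.append` copies the whole frame on every call (it was also removed in pandas 2.0). A sketch of the usual alternative: read all records first, then build the frame once. The shelf contents and the temporary path are made up so the example is self-contained.

```python
import os
import shelve
import tempfile

import pandas as pd

# Build a small throwaway shelf of key -> dict records
path = os.path.join(tempfile.mkdtemp(), "dbfile")
with shelve.open(path) as db:
    db["row1"] = {"x": 1, "y": 2.5}
    db["row2"] = {"x": 3, "y": 4.5}

# Read every record once into a plain dict
with shelve.open(path) as db:
    records = {key: db[key] for key in db.keys()}

# One construction call: shelf keys become the index, dict keys the columns
df = pd.DataFrame.from_dict(records, orient="index")
print(df.sort_index())
```

If the shelf keys should be a regular column rather than the index, `df.reset_index()` moves them there.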
woeful estuary
#

hello, can someone help me with darknet yolov4 on google colab?

#

i'm getting an error and it makes the kernel crash,

#

ping me if anybody can help

serene scaffold
woeful estuary
#

alr

#

this is what happen

#

give me a sec, ill put in the code cell that i run when this happen

#
# import darknet functions to perform object detections
from darknet import *
# load in our YOLOv4 architecture network
network, class_names, class_colors = load_network("cfg/yolov4-obj.cfg", "data/obj.data", "mydrive/yolov4/backup/yolov4-obj_1000.weights")
width = network_width(network)
height = network_height(network)

# darknet helper function to run detection on image
def darknet_helper(img, width, height):
  darknet_image = make_image(width, height, 3)
  img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  img_resized = cv2.resize(img_rgb, (width, height),
                              interpolation=cv2.INTER_LINEAR)

  # get image ratios to convert bounding boxes to proper size
  img_height, img_width, _ = img.shape
  width_ratio = img_width/width
  height_ratio = img_height/height

  # run model on darknet style image to get detections
  copy_image_from_bytes(darknet_image, img_resized.tobytes())
  detections = detect_image(network, class_names, darknet_image)
  free_image(darknet_image)
  return detections, width_ratio, height_ratio
#

i re run it again but still end up with the error

deft harbor
#

OOM?

#

Seems to crash when trying to load the weights, if I'm reading the order of events correctly

woeful estuary
#

i found a github issue and it says it can come from ipython

#

but i tried the solution and it didn't work

tacit wigeon
#

ill drop my question in here as its less of a direct question and more of a small discussion of my options in my situation:

im writing a script that takes 16-bit grayscale .tiff images as input and is supposed to run some analysis on them
the images contain a bright spot and some background noise which should be relatively weak, compared to the spot

i want to do cross sections that go through the brightest area, plot these and display a colourmapped version of the image, where i drew in straight lines of where the cross section is happening

i already have all of that up and running but the code is in desperate need of optimization
if the intensity is just a little on the lower end, the algorithm is no longer capable of finding the brightest area and places the cross section in the weirdest places
and if the intensity is just a bit too high, the spot in the colourmapped image will no longer have a clearly visible maximum, but just a big yellow smudge at the center (making it hard for me to estimate how well the algorithm nailed the brightest area)

#

this is the section of my code that looks for the brightest area

i didnt just go ahead and look for the maximum as the background noise can create random "hot pixels" where a single pixel has a peak value that i dont want the script to bother with

#

so i went with averaging areas of high intensity (the algorithm would fail if multiple spots were present but there is only ever one to be analyzed)

#

imgarray is a 2D numpy array that came from the .tiff and can have values from 0 to 65k (all int)

#

and this is my colourmapping which could use some improvements too

#

btw for the threshold i tried values between 70% and 99% and no value was good for all input images
though 70% and 80% usually yield the best results if the intensity is sufficiently high

#

is there an efficient algorithm or function that lets me search for the square shaped area in an image of a given size, that contains the highest overall intensity?
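
One standard technique for this is a summed-area table (integral image): after two cumulative sums, the total of any square window comes from four lookups, so every window position can be scored at once. This is a sketch on a synthetic image, not the poster's code; `brightest_square` and the test image are made up.

```python
import numpy as np


def brightest_square(img, size):
    """Return (row, col, total) of the size x size window with the largest sum."""
    # Summed-area table with a zero row and column prepended
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0, dtype=np.float64), axis=1)
    # Sum of every window from four corner lookups
    sums = (
        sat[size:, size:]
        - sat[:-size, size:]
        - sat[size:, :-size]
        + sat[:-size, :-size]
    )
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return int(row), int(col), float(sums[row, col])


rng = np.random.default_rng(0)
img = rng.integers(0, 500, size=(64, 64)).astype(np.uint16)
img[10, 50] = 65535          # isolated hot pixel
img[30:38, 20:28] += 20000   # the actual bright spot

row, col, total = brightest_square(img, 8)
print(row, col, total)
```

The returned row and column are the top-left corner of the window. Because the score is a sum over the whole window, a single hot pixel contributes far less than a genuine spot of the same size.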

viral moat
#

Hey guys anyone active here ?

acoustic halo
tacit wigeon
#

yeah i could also think of brute forcing it but it would slow down the code significantly

#

and idk if i have much to gain from it

acoustic halo
#

If you just start with a square the size of the image and slowly decrease it until you reach peak brightness

#

It should end up with the optimum solution

#

unless you do have two large blobs

stray mortar
#

What is “Letter Distribution” and what is “Word Distribution” in an NLP dataset while performing exploratory data analysis (EDA)?

acoustic halo
#

letter distribution is letter based n-grams

#

word distribution is word n-grams

stray mortar
#

can you tell me how to perform it i am new to data analysis and this dataset is my first nlp dataset

acoustic halo
#

i mean it depends on what you are trying to do but effectively its counting the frequency of each letter/word

stray mortar
#

if i share my dataset can u help me, just with an example on column 2, index 0

#

because i am stuck on this issue from 2 days

acoustic halo
#

Show an example

#

And tell us what you actually want to do

stray mortar
#

one minute

#

this is the dataset

#

and this is the task

acoustic halo
tacit wigeon
#

particularly these annoying hot pixels

acoustic halo
tacit wigeon
#

usually 1x1

acoustic halo
#

just check the pixels left and right , above and below

#

if >2 are dark then its noise

tacit wigeon
#

good idea! ty!

acoustic halo
# stray mortar and this is the task

You could literally just count the occurrence of each word/letter for each tweet, and figure out whether they mostly occur in negative or positive tweets

#

You'd likely be better off using words

stray mortar
acoustic halo
#

Letters on their own probably wont reveal much insight, for example does the letter "e" have a positive or negative sentiment?

#

What about the word "bad", that generally is used in negative sentiments
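
A minimal sketch of that counting idea, using only the standard library. The tweets and labels below are invented for illustration, and the tokeniser is a crude regex, so treat it as a starting point rather than proper NLP preprocessing.

```python
import re
from collections import Counter

def word_counts_by_label(texts, labels):
    """Return {label: Counter of lower-cased words} for parallel lists of texts and labels."""
    counts = {}
    for text, label in zip(texts, labels):
        words = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(label, Counter()).update(words)
    return counts

# hypothetical data standing in for the real tweet column and sentiment column
tweets = ["this phone is bad", "bad service, really bad", "great phone, love it"]
labels = ["negative", "negative", "positive"]
counts = word_counts_by_label(tweets, labels)
print(counts["negative"].most_common(3))
```

Swapping the regex for `list(text.lower())` would give the letter distribution instead of the word distribution.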

stray mortar
#

ohhhh

#

can you tell me how i can perform it

acoustic halo
#

Have you ever made a machine learning model before?

#

or done statistical analysis?

stray mortar
acoustic halo
#

then you should probably start by looking at an NLP basics course

cold sequoia
#

would suggest you go through a bit of supervised learning before you go through NLP and sentiment analysis

grave breach
#

Remember to treat the special chars right

quasi sparrow
#

Anyone familiar with XGBoost?

candid wraith
#

Good resources for building a dashboard?

quasi sparrow
quasi sparrow
# stray mortar this is the dataset

Send me a DM if you decide to use BERT. I did a project on bert and it took me 2 weeks to figure it out, lol.
Maybe I can save you some headaches.

twin token
undone holly
junior matrix
#

I am getting Linear regression accuracy as 100 percent...is that possible? Or maybe I messed up with data sets?

twin token
undone holly
#

oooooo

twin token
uncut orbit
junior matrix
#

I had done a dumb mistake... figured it out 🙂...now it's showing around 90 percent accuracy

quasi sparrow
# twin token Yup

Thanks for the link!
When using XGBoost for regression, do you process the data as lagged features? Or do you just slice the columns out of the dataset and set it to the target

#

It's confusing because some tutorials are shifting data into lagged features and other just separate dataset into features and targets and train the model directly

twin token
quasi sparrow
#

Yes, it is a time series problem.
The model is a wind turbine and I have temperature, wind speed, power generated, position of vane, etc. I want to build a regression model to predict power output with a time horizon of 12 hours.

#

I did this, I have a function that shifts the data and concatenates the two dataframe on axis one.

#

So I end up with Dataframe ( [Features (not shifted),Targets(Features but shifted)])

#

All the sample code I found online are toy codes predicting boston housing prices but they don't do sliding window to preprocess data, which is confusing.
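
For a forecasting setup like this one, the sliding-window step is usually what separates it from the housing-price toy examples. A hedged sketch with pandas follows, assuming evenly spaced rows (so a 12-hour horizon is 12 rows of hourly data); the column names and the helper `make_supervised` are invented for illustration.

```python
import pandas as pd

def make_supervised(df: pd.DataFrame, target: str, horizon: int, lags: int):
    """Build lagged feature columns and a target shifted `horizon` steps into the future."""
    parts = {}
    for lag in range(lags):
        for col in df.columns:
            parts[f"{col}_lag{lag}"] = df[col].shift(lag)
    X = pd.DataFrame(parts, index=df.index)
    y = df[target].shift(-horizon).rename(f"{target}_t+{horizon}")
    # shifting leaves NaNs at both ends; drop those rows
    data = pd.concat([X, y], axis=1).dropna()
    return data.drop(columns=y.name), data[y.name]

# hypothetical turbine data
df = pd.DataFrame({"wind": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   "power": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})
X, y = make_supervised(df, target="power", horizon=2, lags=2)
print(X.join(y))
```

The resulting `X` and `y` can go straight into a tabular regressor; the model itself never sees the time axis, only the lagged columns.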

willow gull
#

where can I get started making AI?

twin token
willow gull
#

ok then

quasi sparrow
willow gull
#

are there any resources i can use to begin my ai learning adventure?

#

anything

iron basalt
iron basalt
willow gull
#

so uh

#

imma just look something up and hope it works

iron basalt
#

Basically, learning to learn is very important step for the robot which is still unsolved.

willow gull
#

speak english please i just woke up KEKA

grave breach
#

When we start living we don't learn by looking at data

#

We learn by interacting

#

Doing so we learn how to learn

#

For example, reading and understanding

willow gull
#

heh

iron basalt
# grave breach We learn by interacting

That is data, but yes, there is a feedback loop not present in the type of machine learning most are used to working with which makes it a very different problem (and much more difficult).

willow gull
#

this whole conversation is confusing ngl imma just sit back and eat my grapes

grave breach
grave breach
#

To start overcoming this I had to design a whole new algorithm

willow gull
#

so about how accurate can an AI be?

iron basalt
grave breach
#

That's not the first thing to do

#

Reasoning is a skill

willow gull
#

i just wanted some resources PepeCry

grave breach
#

By the way

#

What kind of resources you need?

willow gull
#

something just to start making an AI
i learn quick so that should be all I need

willow gull
#

k ill look it up rq

grave breach
#

Yes

willow gull
#

mk

iron basalt
grave breach
#

Alone cannot lead to something "cool"

iron basalt
#

Entropy maximization, etc

grave breach
iron basalt
grave breach
#

My problem was that they weren't flexible enough

#

Running a "search" for everything is "expensive"

#

I designed something that could adapt in a couple of frames

#

(if the task is simple)

#

And, if there are not too many layers to handle

iron basalt
#

Have you done such a task in real life on a real robot? There is this funny thing where ML algorithms that work in simulations do not translate at all to reality even in a very simple setup (by that I mean, untrained, learning only from reality).

grave breach
#

Some days I "ship" multiple version of the algorithm

#

I'm working really fast in this stage

iron basalt
#

(And setup)

grave breach
#

But there's currently an annoying bug

#

Have to rethink a function

#

What kind of "experiments" have you been doing?

#

@iron basalt

#

Or, are you just trying to figure out where to start?

iron basalt
grave breach
#

Cool

iron basalt
#

All the gym stuff, but those are much easier than real life stuff like that (but a good sanity check on a new algorithm).

grave breach
#

I agree

#

I have a couple of bots

#

It would be fun mounting the algorithm on them

#

I'll do it

iron basalt
#

We also specialize in machine learning that is very fast and needs little memory, while still outperforming other, much more costly algorithms.

#

So we run our algorithms on stuff like the pi zero.

grave breach
#

What do you mean by "we" are you part of a research team or something?

grave breach
iron basalt
#

Yes, but I don't feel like revealing identity at this time.

grave breach
#

Don't worry

#

I don't care about that

#

Just curious

iron basalt
#

Gotta use the little integrated GPU to at least downscale.

grave breach
#

(I mean, making a learning algorithm work with images on a pi zero)

quasi sparrow
iron basalt
#

Yea ours does online learning on a pi zero at about 70 fps. It required not using the python pi camera library because it was too slow (made our own).

#

But the model is of course very small. Which kind of helps us push the algorithm forward though. Do more with less.

#

While also still running it on a big beefy machine to see how it scales.

grave breach
#

I wish you best luck

#

If you have something cool to say send me a dm, always open to learn something cool

grave frost
grave breach
#

@grave frost By the way, I think that this might interest you: https://www.epfl.ch/research/domains/bluebrain/

EPFL's Blue Brain Project is a Swiss brain research Initiative led by Founder and Director Professor Henry Markram. The aim of Blue Brain is to establish simulation neuroscience as a complementary approach alongside experimental, theoretical and clinical neuroscience to understanding the brain, by building the world's first biologically detailed...

#

It aims to recreate a virtual "biological" brain

#

It is something big

grave breach
grave frost
#

it really doesn't matter what theory we take for WBE - its just different paths for the same goal

iron basalt
# grave frost if we do develop a close enough simulation of a brain - what's stopping it from ...

So you know how a biological organism must obtain food to stay alive? All interesting behavior comes as the result of that need. A robot has no such need by default so one would need to craft such a need that would lead to something interesting. Which is not very easy to do. But other than that, yeah sure. But I don't think we need tons of compute power to even do this. We don't need to simulate it at such a fine detail level. We only need the motifs (and maybe a few details like micro columns, grid cells, etc).

grave frost
#

the more important thing is to keep the funding flowing 🤣

grave frost
#

basically how humans do it

iron basalt
#

But if you think it can work, please try it.

#

(And maybe write something about how it went, and if you think you just did it wrong, or the idea was wrong)

#

(Or maybe it needs a little something more)

potent ember
#

hello guys, need a little help. anyone here know about datasets in h5 files? need to finish a project and dont have much time, a little help is appreciated

grave frost
# iron basalt Not sure how easy that would actually be to implement, nor if it would give the ...

basically how DL's probability distribution represents the "confidence" of predictions? A reward for creating a model of the world (like how balls bounce, how carpets roll, how the sun appears from a certain point, how the notes in a music piece change etc.) and predicting it reasonably correctly & confidently would give it a positive reward and would endow human-like behavior.

That is, if its brain is reasonably complex enough to at least match human-level abilities

grave frost
iron basalt
#

Hence stuff like the T-maze. Can it use the world model to navigate the maze?

#

I currently believe that anything that forces the robot to switch locations will result in interesting behavior simply because getting from A to B is difficult in a real world environment, especially a dynamically changing one.

#

(And will make it use its world model)

grave frost
iron basalt
#

Well we have multiple versions, thats the RL one.

iron basalt
# grave frost gridworld?

In behavioral science, a T-maze (or the variant Y-maze) is a simple maze used in animal cognition experiments. It is shaped like the letter T (or Y), providing the subject, typically a rodent, with a straightforward choice.
T-mazes are used to study how the rodents function with memory and spatial learning through applying various stimuli. Start...

grave frost
#

does sound like a great test for a POC

iron basalt
#

(and more complex mazes too)

grave frost
#

what fundamental theory are you using for the "brain"?

iron basalt
#

Similar to Jeff's stuff.

#

But more, uh, practical.

grave frost
iron basalt
#

Jeff Hawkins

grave frost
#

something proprietary/new?

iron basalt
#

It's new and proprietary yes

grave frost
#

well, do send us a paper here!

iron basalt
#

When it does that maze like it's nothing I will def. send you something to look at.

#

Until then, we change stuff so rapidly (daily), that it would be a bit of a waste of time to write a long thing about it (since it can become completely outclassed the next day).

grave frost
#

but the basics are near/like HTM? so what would be different?

iron basalt
#

It's more like Jeff's newer stuff. Or actually we bounce off of each other*

#

We cite each other*

grave frost
#

ooh, sounds pretty interesting 😉 now I am curious about your real identity, but that's for another time ig 🙂

#

note to self: check Jeff's citations for patterns

#

anyways, do update us on any new progress lemon_pleased

tranquil yarrow
#

Does anyone know how I might utilize my GPU to train tensorflow models and speed up the training process?

vague stratus
vague stratus
tranquil yarrow
#

is that available via pip?

vague stratus
#

Also first of all you need to install NVIDIA drivers

tranquil yarrow
#

Just latest drivers or drivers specifically for ML?

vague stratus
vague stratus
tranquil yarrow
#

thanks.

#

Gotta put this 3080 to work for more than just games!

austere swift
tranquil yarrow
austere swift
tranquil yarrow
#

Now if this gosh darn training would stop running on my CPU so I can test my GPU!

#

I didn't realize 5 epochs would take so long.

austere swift
#

why don't you cancel the training?

#

cpu training does take forever

tranquil yarrow
austere swift
tranquil yarrow
#

hmmmm....

austere swift
#

thats the standard keyboard interrupt key for cancelling any python programs

tranquil yarrow
#

Not a bad idea

austere swift
#

and even most non-python programs as well

tranquil yarrow
austere swift
#

yeah its the same on windows as well

#

ctrl c to cancel whatever's running

tranquil yarrow
austere swift
#

it should default to gpu

tranquil yarrow
#

okay thanks

austere swift
#

i wouldn't recommend using IDLE for advanced stuff btw

#

it would be better to switch to a proper editor/ide like vscode or pycharm or something

tranquil yarrow
#

setting up / training a DL model

austere swift
#

yeah linux is much better for that stuff imo

tranquil yarrow
#

Yeah, I'm being a little stupid right now.

solar rock
#

I am using Pandas to pump data into a csv. Is it at all possible to format a column in the dataframe using split? What do I need to Google by chance?
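
A likely search term here is "pandas str.split expand". A small sketch, with a made-up `name` column standing in for the real data:

```python
import pandas as pd

# hypothetical frame; the real column and delimiter will differ
df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# split on the first space only and spread the pieces into two new columns
df[["first", "last"]] = df["name"].str.split(" ", n=1, expand=True)

# to_csv without a path returns the CSV text, handy for checking before writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` instead writes the same content to disk.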

tranquil yarrow
#

@austere swift Do you know how long it should take to train this MNIST dataset with 5 epochs using something like an RTX 3080?

tranquil yarrow
#

This is going to take a while. I did not expect it to take this long.

#

It's just the basic tensorflow tutorial.

austere swift
#

on my A6000 (which is about equivalent to a 3090 in terms of compute speed but double the vram) it takes like 4s per epoch

#

thats on a basic dense model though

tranquil yarrow
#

Then it seems like there's something wrong.

austere swift
#

how long is it taking?

tranquil yarrow
#

This will take over an hour at this rate.

austere swift
#

how big is the model

#

also check task manager and see if your gpu memory is being filled

tranquil yarrow
#

It does appear that the GPU is using almost all 10GB of memory

austere swift
#

ok so it is using gpu

#

what's the model size?

tranquil yarrow
#

But it's those low-res MNIST images.

austere swift
#

that should not take that long

#

thats smaller than the one i was saying took 4s per epoch

tranquil yarrow
#

yeah, and a 3080 isn't that much less performant than a 3090

austere swift
austere swift
#

4s

#

per epoch

#

so it should be 4s on your 3080

#

are you running the code exactly how it is?

tranquil yarrow
#

Wait, why does it say "Utilization: 0%", but then the graph shows it almost full?

tranquil yarrow
austere swift
#

windows shows utilization for different gpu "engines"

#

the cuda engine isnt on there

#

in most cases

#

so it wont show it

tranquil yarrow
#

oh okay

#

Well, I know my GPU can handle some massive 3D files, so I don't think it's a GPU hardware issue.

austere swift
#

what cuda and cudnn versions

tranquil yarrow
#

latest cuda

#

let me check

austere swift
#

dont use latest

#

latest is 11.3 which tf doesnt like

tranquil yarrow
#

11.4.0

austere swift
#

use 11.2

tranquil yarrow
#

wtf

#

okay

austere swift
austere swift
tranquil yarrow
#

It came out in june

austere swift
#

TensorFlow supports CUDA 11.2 (TensorFlow >= 2.5.0)

tranquil yarrow
#

okay

#

does it matter which version of 11.2 I use?

austere swift
#

no

#

sub versions don't matter

tranquil yarrow
#

okay

austere swift
#

like 11.2 vs 11.2.1 doesnt make a difference

tranquil yarrow
#

Okay I'll just download latest 11.2 then

austere swift
#

yep

tranquil yarrow
#

I did install / re-install the latest drivers AFTER installing CUDA. Would this wipe out the CUDA installation?

austere swift
#

drivers shouldnt affect it

#

cuda is separate

tranquil yarrow
#

okay

tranquil yarrow
austere swift
#

just keep to what the page says tbh

#

DL gpu dependencies are a bitch to mess with

tranquil yarrow
#

the problem is the link takes me to the latest version

austere swift
#

you can go to the version archive

#

and get the older version

tranquil yarrow
#

yeah okay i found it

#

@austere swift Do you know if it's going to be a problem if there is an 11.4 folder in addition to the 11.2 folder inside the CUDA directory?

austere swift
#

there shouldnt be

tranquil yarrow
#

okay

austere swift
#

i personally have 4 cuda installs

tranquil yarrow
#

So tf should just get the proper version?

austere swift
#

yes

tranquil yarrow
#

okay

#

alright, let's test this bad boy out

#

Okay, well, I guess we'll see if everything is hunky dory in a moment.

austere swift
#

the epochs should take like less than 10s

tranquil yarrow
austere swift
#

no that's normal

tranquil yarrow
#

okay

grave frost
#

@iron basalt a gedankenexperiment - suppose you put a healthy human baby in a building full of insane people and suppose it's brought up completely by them. do you think despite being normal and having no defects, the baby would still be like them and act like them?

if you think from a human perspective, it seems the only reward we got early on was replication. if we learnt to say something the way parents say then the baby would usually get a human equivalent of reward.

We have knowledge distillation for approximating algos. Suppose your model can 'learn' enough from its 'parent' in a compute-efficient manner (just like knowledge distillation). Then, taking the parent to be a sufficiently intelligent algo, say a GA, reproducing the GA's moves in the fewest tries would deserve a huge reward.

it kind of feels like cheating, but suppose it sees enough mazes and learns to replicate its so-called smart parent well enough. Wouldn't it generalize pretty well to new mazes, hopefully enough to do online learning and understand enough about maze topology to at least solve it at a PoC level?

tbh it depends on the algo you would end up using, but I think having a critic/parent is a simple way to do this in a computationally efficient way: the model evolves complex policies from its parent and is rewarded for successfully mimicking the parent well enough to navigate the maze and get the cheese.
I would think derivatives of this raw approach might serve you well for having a simple model that can learn with a simple reward that basically defines "intelligent", at least at a naive level.

BTW sorry for the long wall of text 😅 just had a suggestion that might be helpful

tranquil yarrow
#

@austere swift Still not done with first epoch yet.

#

Steps get progressively slower

#

on step 1291 out of 1875

austere swift
#

What other specs of your system?

tranquil yarrow
#

Ryzen 7 5800X CPU, RTX 3080, 32 MB DDR4 3200 RAM. nvme 1TB SSD

#

Python version 3.9

royal crest
#

32 MiB sweatDuck

tranquil yarrow
tranquil yarrow
#

yeah, 32 mb ram

royal crest
#

32 mb sweatDuck

austere swift
#

I'm pretty sure you have that wrong lol

royal crest
#

millibits or megabytes

#

either way that's not a lot

tranquil yarrow
#

megabytes

austere swift
#

are you sure

royal crest
tranquil yarrow
#

100% sure, yes

austere swift
#

Screenshot task manager

#

Because I guarantee you're wrong

royal crest
#

yeah screenshots please

#

htop etc

tranquil yarrow
#

GB

austere swift
#

Lol yeah

tranquil yarrow
#

gigabytes my bad

austere swift
#

Anyways that should run fine

tranquil yarrow
#

Yeah, I don't know what the effing problem is.

#

It runs AAA games at ultra settings perfectly fine

austere swift
#

I was just double checking it wasn't like a 10 year old dell optiplex or something stupid like that

tranquil yarrow
#

lol no

#

I've got like 15-20 tabs open in Edge, but that shouldn't be the issue.

#

It's only using 10.3 GB of system ram atm

royal crest
#

jesus

tranquil yarrow
royal crest
#

okay i've scrolled up to read the fullest of the context

#

i think nvidia-smi is the only way to know for sure if your model is indeed using your GPU

tranquil yarrow
#

Is that a python or CMD command?

#

My GPU memory is almost getting maxed out.

royal crest
#

hmm so it is being used

tranquil yarrow
#

apparently

#

This is absurdly long, though. Nearly an hour and not even one epoch

royal crest
#

real solution: use linux

tranquil yarrow
#

I could switch over to Linux. I'm thinking of trying that. I have Ubuntu installed on this machine.

#

I'd have to download all the drivers again, though.

royal crest
#

well it's a one stop shop if you have linux

#

thanks to package managers

tranquil yarrow
#

Okay, eff Windows. I'm going to get onto Ubuntu.

#

Windows should still be able to run this, though.

#

I don't like not being able to do this in Windows also.

royal crest
#

it's just a matter of how much hassle you're willing to go through to get it working on 🪟

tranquil yarrow
royal crest
tranquil yarrow
#

It doesn't even recognize my main monitor

royal crest
#

Anyways i think we are getting off-topic to this channel here

#

for linux related stuff #unix exists!

#

my favourite channel

tender hearth
tranquil yarrow
#

into the Python REPL

#

Python 3.9

#

I have not fixed it yet. I'm installing CUDA drivers in Linux right now.

tranquil yarrow
austere swift
#

i've had a ton of random bugs in windows that just were gone in linux

tranquil yarrow
#

It would probably still be running in Windows if I hadn't stopped it.

dire echo
#

Ios lemon_unamused

#

Disgastang

drifting ermine
inland zephyr
#

i want to ask about the EarlyStopping strategy. Some people say to monitor val_accuracy with a max objective and some use val_loss with a min objective. In the original hyperband paper they use val_loss as the early-stopping objective, but most tutorials around the web use val_accuracy. What are the drawbacks, and what's your suggestion? I'm using val_loss with patience 10 to follow the original implementation.

real perch
#

there won't be a big difference in using val loss or val accuracy as the early-stopping metric

#

the thing is, they are kind of loosely correlated, and mostly whenever val loss increased, accuracy decreases (not a hard rule, but most of the time this happens)

inland zephyr
#

but as far as i know, the loss can be bigger than 1 but cannot be 0, and accuracy must lie between 0 and 1. I ran some tests using both, and val_accuracy tends to give a higher outcome in accuracy and F1 compared with val_loss.

real perch
#

then ig you can go with val-accuracy. a lot depends on the nature of the dataset for these kinds of things, so...
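
To see how the two choices can disagree, here is a small framework-free sketch of patience-based early stopping. It only imitates the idea behind Keras' `EarlyStopping(monitor=..., mode=..., patience=...)`; the helper `early_stop_epoch` and the metric histories are invented for illustration.

```python
def early_stop_epoch(history, mode="min", patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), 0-based; stop_epoch is None if training never stops early."""
    best = None
    best_epoch = -1
    for epoch, value in enumerate(history):
        improved = (
            best is None
            or (mode == "min" and value < best - min_delta)
            or (mode == "max" and value > best + min_delta)
        )
        if improved:
            best, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            # no improvement for `patience` epochs in a row
            return epoch, best_epoch
    return None, best_epoch

# hypothetical run where the loss starts rising while accuracy still creeps up
val_loss = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
val_acc = [0.60, 0.70, 0.72, 0.74, 0.75, 0.75]
loss_result = early_stop_epoch(val_loss, mode="min", patience=2)
acc_result = early_stop_epoch(val_acc, mode="max", patience=2)
print(loss_result, acc_result)
```

In this made-up run, monitoring the loss stops earlier and picks an earlier best epoch than monitoring accuracy, which is one plausible reason the two strategies can end up with different final scores.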

inland zephyr
#

i lack a scientific reference for this...

#

and also about using Batch Normalization instead of dropout

fathom slate
#

Can a fresher with intermediate python along with average-level dsa get a job as a data engineer? given that all the backend stuff will be learned. what are the projects that one have to do, I mean whether python projs or projects with backend stuff like that in kaggle.com. please share your opinion

serene scaffold
#

@fathom slate what is a fresher?

eager imp
#

hey, i need some pointers for a keras model i've build that i want to train with a generator

#

i'm using tf.keras.utils.Sequence for the generator

#
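import numpy as np
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):
    def __init__(self, data: np.ndarray, batch_size: int = 1):
        # `scaler` is assumed to be defined elsewhere (e.g. a scikit-learn scaler)
        self.data = scaler.fit_transform(data)
        self.batch_size = batch_size
        # the original also set self.labels = labels, but `labels` was never defined
    def __len__(self):
        # number of batches per epoch, hard-coded for now
        return 7
    def __getitem__(self, idx):
        # returns two samples and a dummy label for each; the result is 2D: (2, n_features)
        return np.array(self.data[idx:idx+2]), np.array([1, 1])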
#

for now it should just return two samples with a single label to make it work

#

the issue i run in is ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (None, None)

#

what's happening here?
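
The message says the first layer wants input with at least 3 dimensions, typically `(batch, steps, channels)`, while the generator hands over a 2D `(batch, features)` array. Without seeing the model this is a guess, but if the first layer is something like a `Conv1D`, adding a trailing channel axis in `__getitem__` is the usual fix. A minimal sketch with NumPy only; `add_feature_axis` is a hypothetical helper:

```python
import numpy as np

def add_feature_axis(batch: np.ndarray) -> np.ndarray:
    """Turn a (batch, steps) array into (batch, steps, 1) for layers that expect 3D input."""
    return np.expand_dims(batch, axis=-1)

# hypothetical batch of two samples with 16 values each
batch = np.zeros((2, 16), dtype="float32")
reshaped = add_feature_axis(batch)
print(batch.ndim, reshaped.shape)
```

The model's `input_shape` would then need to match the last two dimensions, here `(16, 1)`.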

proven sigil
#

hi

#

anyone here use dask?

fathom slate
serene scaffold
serene scaffold
fathom slate
#

I think you got me wrong here. To restate: "a graduate who's just completed his/her bachelor's programme and is applying for their first job would be considered a fresher."

grave frost
#

ay, that's an indian term

#

yea, kaggle would serve you well even if you can't score well in the comps

velvet thorn
#

indeed it is

serene scaffold
grave frost
#

you would need to be quite competitive if you are indeed in India, due to the low demand and high supply for Indian jobs, especially in DS, which almost everyone is doing

fathom slate
#

Thank you for A2A

grave frost
round mango
#

yoooo

#

im data scientist

#

u can ask me anything

#

i work at google

#

BOIS

proven sigil
lapis sequoia
#

why using a cnn instead of a capsnet?

grim oar
#

guys i get this error

Unable to allocate 288. MiB for an array with shape (23, 1640597) and data type int64

What can i do

#

All i did was try to upload a dataset to pandas

void anvil
#

Hey guys, say I have a table of people with multiple datapoints such as their hobbies, favorite store, and favorite car. What algorithm is the best to rank the closest matching people to any record in the table
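
For purely categorical attributes like these, one simple baseline (not necessarily the best algorithm) is to treat each person as a set of `attribute:value` tags and rank by Jaccard similarity. The names and data below are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_matches(people: dict, name: str):
    """Return (other_name, similarity) pairs sorted from most to least similar."""
    target = people[name]
    scores = [(other, jaccard(target, attrs))
              for other, attrs in people.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# hypothetical table rows encoded as sets of tags
people = {
    "ann": {"hobby:chess", "store:ikea", "car:volvo"},
    "bob": {"hobby:chess", "store:ikea", "car:ford"},
    "cat": {"hobby:golf", "store:aldi", "car:ford"},
}
ranking = rank_matches(people, "ann")
print(ranking)
```

For larger tables, one-hot encoding the columns and using a nearest-neighbours search gives the same kind of ranking without comparing every pair in Python.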

grim oar
#

Windows

#

This only happens with pycharm

#

When I do it with Jupyter lab I don't get any errors but the program just crashes

undone flare
#

what python version 32 bit or 64 bit?

grim oar
#

64 bit

#

I think I am running out of memory

undone flare
#

can you try changing the dtype to uint8 or would that mess up some data

grim oar
#

Let me see if they will work

#

How do you generally work with large data sets

#

The one I am working with is around 4gb

undone flare
#

let me see if I get this

#
np.zeros((156816, 36, 53806), dtype='uint8')

-------------------------------------------------------------------
MemoryError                       Traceback (most recent call last)
<ipython-input-26-9392083af14d> in <module>
----> 1 np.zeros((156816, 36, 53806), dtype='uint8')

MemoryError: Unable to allocate 283. GiB for an array with shape (156816, 36, 53806) and data type uint8

hmm

#

let me try something

grim oar
#

Wow 283 gb

#

Mine stop functioning at mb lol

#

Maybe I need more ram

undone flare
#

works now

#

:)

#

I changed it to always overcommit mode

#

idk how to do that on windows tho

#

on linux it's echo 1 > /proc/sys/vm/overcommit_memory

grim oar
#

hmmm

#

what would happen if i run out of memory?

undone flare
grim oar
#

is there any way to change it in windows?

undone flare
#

that SO post should have it for windows

grim oar
#

i was just reading through that

#

alright ill do both

#

making it into uint8

undone flare
#

I don't think making it uint8 would do anything tho u can try

grim oar
#

I was trying to follow the method, but it doesnt give me the option run as admin

undone flare
#

um if you right click does it not show Run as Administrator?

grim oar
#

But it works normally

#

So I'll try it with that

#

Yeah it doesn't give me the option when I right click

undone flare
#

¯\_(ツ)_/¯

grim oar
#

maybe i can break the data down into smaller chunks
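
Chunking is something pandas supports directly through the `chunksize` argument of `read_csv`, so the whole file never has to sit in memory at once. A hedged sketch, using an in-memory CSV in place of the real 4 GB file; the helper name and column names are made up.

```python
import io
import pandas as pd

def column_sum_in_chunks(source, column: str, chunksize: int) -> float:
    """Sum one column of a CSV while holding only `chunksize` rows in memory at a time."""
    total = 0.0
    # usecols limits parsing to the column that is actually needed
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        total += chunk[column].sum()
    return total

# hypothetical small CSV; a file path works the same way
csv_text = "a,b\n1,10\n2,20\n3,30\n4,40\n5,50\n"
total_b = column_sum_in_chunks(io.StringIO(csv_text), "b", chunksize=2)
print(total_b)
```

Loading only the needed columns with `usecols` and passing narrower types through `dtype` are the other two easy ways to shrink the footprint.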

#

thank you for your help @undone flare I really appreciate it. I am not very good at tech haha

undone flare
#

or you can remove something which won't help you much in data analysis

grim oar
#

right I am just using a correlation matrix and removing all the attributes that dont make much sense

grave frost
quasi sparrow
#

Guys, I have a question, I seem to have the concepts mixed.

#

When talking about regression, regression estimates based on the correlation of the features while time-series estimates based on extrapolation, correct?

#

Then, if regression uses correlation, why it is considered in machine learning a "regression" problem and not a classification problem.

#

I know it's super basic but I'm a little bit rusty.

#

is this why when training LSTM networks, 2D data must be converted into a 3D dataframe? To work the problem as regression in one time step and as a time-series problem in multiple time steps?
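
On the shape part of the question: Keras recurrent layers take input shaped `(samples, timesteps, features)`, so a 2D table of `(rows, features)` is usually cut into overlapping windows first. A minimal NumPy sketch of that reshaping; `to_windows` is a hypothetical helper and the data is a toy array.

```python
import numpy as np

def to_windows(data: np.ndarray, timesteps: int) -> np.ndarray:
    """Stack overlapping windows of a (rows, features) array into (samples, timesteps, features)."""
    n = data.shape[0] - timesteps + 1
    return np.stack([data[i:i + timesteps] for i in range(n)])

# toy table: 6 time steps, 2 features
data = np.arange(12).reshape(6, 2)
windows = to_windows(data, timesteps=3)
print(windows.shape)
```

With `timesteps=1` each sample is a single row, which is the "one time step" case mentioned above.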

lapis sequoia
#

hey guys

#

in my jupyter notebook

#

it's not showing gpu

#

i already downloaded nvidia cuda

#

also tensorflow

#

show gpu - 0

#

i have RTX gpu

#

pls treat as urgent

#

@bitter kayak no one is replying : (

#

it's urgent please

#

i also have lot's of work today

#

i'm in stress

uncut orbit
grave frost
#

use colab my dude

lapis sequoia
lapis sequoia
lapis sequoia
bronze skiff
#

try pytorch-- it comes with its own cuda binaries so it should work even without you installing cudatoolkit

#

unless of course you are missing drivers for the card itself-- at which point, download the drivers

tranquil yarrow
# austere swift windows is shit

Ubuntu wouldn't boot today after installing the CUDA drivers. Managed to get it resolved (at least for now), but it seems like the common thread here is Nvidia. The immortal words of Linus Torvalds never rang truer than now.

lapis sequoia
tepid rapids
#

im working on a project involving using reinforcement learning on battlegrounds with hearthstone. im trying to code an environment for it in openai gym but i cant figure out how to deal with mouse control. anyone got any advice?

lapis sequoia
#

you'll recommend

tranquil yarrow
lapis sequoia
#

i'll follow the video

#

it's comfortable

tepid rapids
#

it's a masters project that i have no intention of releasing to the public. would this be a problem? i never considered the ethical issues...

tranquil yarrow
#

If it breaks terms of service, it's not really something people are supposed to be chatting about in here.

tepid rapids
#

i see

#

understandable. thank you anyway.

tranquil yarrow
#

Maybe you should consider a project that can't be so readily used for unethical purposes.

tepid rapids
#

its fairly late to change. it's due in september

#

i might be forced to though

tranquil yarrow
#

I'm just saying your project may have more future value if it had some market potential. I know people make money off game hacks, but it's not really something you can talk about out in the open in a lot of places.

tepid rapids
#

well my other idea involved detecting network intrusions using a system log. This seemed like it had already been done numerous times though

cold mantle
#

Can someone help me in #help-pineapple, just me being a idiot, but i dont know how to fix

elfin trellis
#

need some advice for object detection modeling. Which is better for large datasets: Gluoncv, pytorch, or tensorflow?

civic summit
#

i wanted to build a ordinal log regression model but i cant find a working guide. i tried using from statsmodels.miscmodels.ordinal_model import OrderedModel for statsmodel but it seems like it is no longer supported. Anyone know how i can run a ordinal log reg in python?

civic summit
#

even from mord import LogisticAT doesnt work

lapis sequoia
#

anyone here know how to write good airflow pipelines? do you have a repo you can share? thanks!

royal crest
#

i have 2 GPUs they show up as gpu0 and gpu1

royal crest
#

with an in-game overlay

#

Blizzard doesn't consider that cheating

lapis sequoia
royal crest
#

solve what

#

if you're using tf try tf.config.list_physical_devices('GPU')

#

if it shows you anything but an empty list then you're good to go

#

and ensure that your tf is built with cuda

#

check this by doing tf.test.is_built_with_cuda()
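
Both checks can be wrapped into one small function and run in the notebook. This is only a sketch: `gpu_report` is a made-up helper, and it returns `None` when TensorFlow is not importable so it is safe to run anywhere.

```python
def gpu_report():
    """Return what TensorFlow reports about CUDA and GPUs, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        # True only if this TensorFlow build was compiled with CUDA support
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # list of visible GPU devices; an empty list means none were found
        "gpus": tf.config.list_physical_devices("GPU"),
    }

report = gpu_report()
print(report)
```

If `built_with_cuda` is true but the GPU list is empty, the usual suspects are the driver or a CUDA/cuDNN version that does not match the installed TensorFlow release.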

lapis sequoia
royal crest
#

have you checked if your tf is built with cuda?

gloomy crater
#

Does anyone have experience with Apache spark (Pyspark). Have some small general doubt

#

Or if anyone is currently working as a Data Scientist or for a company that's into retail analysis ?

lapis sequoia
#

i am new to ML

#

pls help

royal crest
#

What would you like us to do?

mystic edge
#

Hey guys, I am using Gekko to try to solve a Mixed Integer Optimisation problem for a hobby of mine and it has worked somewhat well I think, as in the answer is at least reasonable. But when I try to read the debug output... well, because I only learned it at a surface level I am not sure how to read it, and some things seem weird? That is mostly when I increase the domain the optimisation has to go over. Regardless, does anyone have a solid article that would point me in the right direction for understanding the results below?

And the weird things are for example:

#

It is a maximisation problem, so why is Iter 2's obj lower (less negative) than the first one?

#

and also, if I take one of the equations, which just says a couple of variables have to be lower than X (where values higher than X make no change to the answer), and raise X to a very high number, say from 100 to 100000, the obj gets even lower. Still the solver only shows me 2 iters

#

But then if I increase the Max_Iter option to 1000000, it goes back to giving me the previous answer, although still only showing 2 iters and with the first iter having a higher (more negative) obj than the second one

short heart
#

Anybody encountered this?

```
NotImplementedError: Layer ModuleWrapper has arguments in `__init__` and therefore must override `get_config`.
```

From what I've read it can pop up if there's a custom layer, but I don't think I'm using any

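```py
# Import everything from tensorflow.keras; mixing in the standalone keras
# package is what triggers the ModuleWrapper error.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(128, 3, activation='relu', kernel_initializer='he_uniform', input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
model.add(Conv2D(64, 3, activation='relu', kernel_initializer='he_uniform'))
model.add(Conv2D(32, 3, activation='relu', kernel_initializer='he_uniform'))
#model.add(MaxPooling2D())
model.add(GlobalAveragePooling2D())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=Adam(learning_rate=0.01),  #,momentum=0.9, clipvalue=0.5),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```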
#

I found that the problem is in GlobalAveragePooling2D, but I don't see how it's a custom layer

#

Solved: I was mixing up layers and models from keras and tf.keras

solar rock
#

Anyone have any good resources on formatting a column in a Pandas data frame? I have a column where each row is a list and I want to extract a value from each list item.
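
One possible approach, sketched under the assumption that every row of the column holds a plain Python list (the column name `tags` and the sample data are made up for illustration): the `.str` accessor can index into each list, and `explode` gives one row per list item.

```python
import pandas as pd

# Hypothetical data: each row of "tags" is a Python list
df = pd.DataFrame({"id": [1, 2, 3], "tags": [["a", "b"], ["c"], []]})

# .str[i] indexes element-wise into each list; lists that are too short yield NaN
df["first_tag"] = df["tags"].str[0]

# explode() produces one row per list item (an empty list becomes a single NaN row)
long_df = df[["id", "tags"]].explode("tags")
```

If the value you need is not at a fixed position, `df["tags"].apply(some_function)` with your own function is the more general fallback.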

lavish tundra
#

guys, I have an image like this and I'm trying to make it clear and easy for tesseract to read and convert to a text string. I tried resizing it, but it looks like what could help is changing the DPI, and I can't find anyone using PIL or OpenCV to change the DPI of an image. Can someone help me with that?

#

i tried to use opencv to do some things with this image, but nothing works

grand breach
#

How do i resolve conflicts between pip and conda?

This is usually due to pip uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using conda list, and fix the environment by ensuring
only one version of each package is installed (conda preferred)

royal crest
grand breach
#

i'm in the base env

royal crest
#

or are we talking system's pip and conda clashing

grand breach
#

and trying to run conda pack

grand breach
#

i'm trying to pack base env

#

basically trying to move my conda installation to a different drive

#

@royal crest how do i fix this? update the two conflicting pkgs via conda?

#

this github issue here (https://github.com/conda/conda-pack/issues/98) says: " the package cache no longer has a copy of the packages in the environment. So unfortunately, the same mechanism conda requires to clone an environment is needed be conda-pack to turn it into an archive"

lean pebble
#

Who can help with YOLOv3? I need to get the coordinates of the middle of the object it finds in a picture. Idk how to train an AI, but I need to train one, download it to my PC and use it in my bot

waxen veldt
#

what is the point of factorplots when you can individually call violinplot, barplot, boxplot, etc

#

and all of them have the hue argument right? so what would be the point

short heart
#

Is it better to fit small chunks of data with higher batch size or bigger chunks of data with small batch size for gpu

short heart
#

I won't be able to just fit 560000 images at once, so I only fit 10k and move on to the next 10k and so on

lean pebble
#

I have all the pictures that are needed

grave breach
#

Could you explain better?

nova cliff
#

Am I allowed to ask questions in here?

grave breach
#

Yes

nova cliff
#

Ok

#

I am trying to scrape data off of a website

#

I'm using requests and bs4 (Beautiful Soup) and I'm getting Cloudflare errors when sending the request

#

Tried having my useragent set to chrome, any thoughts?

grave breach
#

I don't think that's the right channel

nova cliff
#

Oh, sry

grave breach
#

No problem

#

Try asking in an open help channel

misty flint
#

has anyone worked with FHIR

lean pebble
# grave breach Could you explain better?

Right now I'm building a system that automatically closes ads, and I need to train an AI to find the cross and then click in the middle of it. Please help me make the AI, download it to my computer and use it.

#

How can I train the bot, download it and use it on my PC?

grave breach
#

Wait, I think that using ai could cause problems

#

Crosses aren't just used for ads, they're also used in general-purpose UI

#

It's better to download images of the crosses commonly used in ads

#

(like the one in youtube)

#

and use them as templates for matching in OpenCV

#

But it could be better to handle the thing by manipulating the page's source

#

@lean pebble

lean pebble
#

Ok

#

Idk how to use OpenCV

#

Because I didn't use ai

opaque stratus
#

Hello! I am trying to find the optimal language model for a document classification task. These documents are long, however, the important sentiment is just a few sentences or less. I tried 6 BERT models, and all showed similar high F1 scores, which implies that the domain-specific pretraining of these models did not affect performance. This tells me that the optimal model would be able to really understand a small set of key words and phrases rather than need a complex understanding of the domain-specific language. Anyone have any suggestions or ideas of where to find this model? please @ me if so, thank you

late shell
#

Hello, I am just getting started with NNs, and wanted to code up a simple NN using just numpy. The network architecture looks like the picture below. But I'm stuck at updating the parameters in the gradient descent part, i.e.:

```py
W1 = W1 - alpha * dC_dW1
b1 = b1 - alpha * dC_db1

W2 = W2 - alpha * dC_dW2
b2 = b2 - alpha * dC_db2
# Where W1 & b1 represent the matrices of weights and biases in layer 1 respectively,
# W2 & b2 represent the matrices of weights and biases in layer 2 respectively.
```
Since there are 9 weights and 4 biases, do I have to calculate the equation for the derivative of the cost function with respect to each of those 9 weights and 4 biases, and then vectorize them? That's 13 equations to code up, is this really how it's done if you want to code it from scratch?
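
A minimal sketch of how the 13 derivatives usually collapse into four matrix expressions. The picture is not available here, so this assumes a 2-3-1 network (which gives 6 + 3 = 9 weights and 3 + 1 = 4 biases), sigmoid activations, and a mean squared error cost; the data is random and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))                          # 2 features, 5 samples (one per column)
Y = rng.integers(0, 2, size=(1, 5)).astype(float)    # made-up targets

# Assumed 2-3-1 architecture: 6 + 3 weights, 3 + 1 biases
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

def forward(W1, b1, W2, b2):
    A1 = sigmoid(W1 @ X + b1)      # hidden activations, shape (3, m)
    A2 = sigmoid(W2 @ A1 + b2)     # output activations, shape (1, m)
    return A1, A2

def cost(W1, b1, W2, b2):
    _, A2 = forward(W1, b1, W2, b2)
    return 0.5 * np.mean(np.sum((A2 - Y) ** 2, axis=0))

def gradients(W1, b1, W2, b2):
    m = X.shape[1]
    A1, A2 = forward(W1, b1, W2, b2)
    # Chain rule, one layer at a time; each line covers a whole matrix of parameters
    dZ2 = (A2 - Y) * A2 * (1 - A2)                   # (1, m)
    dC_dW2 = dZ2 @ A1.T / m                          # (1, 3)
    dC_db2 = dZ2.sum(axis=1, keepdims=True) / m      # (1, 1)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)               # (3, m)
    dC_dW1 = dZ1 @ X.T / m                           # (3, 2)
    dC_db1 = dZ1.sum(axis=1, keepdims=True) / m      # (3, 1)
    return dC_dW1, dC_db1, dC_dW2, dC_db2

alpha = 0.1
before = cost(W1, b1, W2, b2)
for _ in range(500):
    dC_dW1, dC_db1, dC_dW2, dC_db2 = gradients(W1, b1, W2, b2)
    W1 = W1 - alpha * dC_dW1
    b1 = b1 - alpha * dC_db1
    W2 = W2 - alpha * dC_dW2
    b2 = b2 - alpha * dC_db2
after = cost(W1, b1, W2, b2)
```

Each gradient has the same shape as the parameter it updates, so the update lines from the question work unchanged. A different activation or cost only changes the `dZ2` and `dZ1` lines.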
grave breach
opaque stratus
#

clinical notes

opaque stratus
#

it is just all coming down to a few sentences of information in them

#

like

#

general BERT trained on general english text outperforms BERT trained on clinical documents

serene scaffold
opaque stratus
#

thank you

rigid zodiac
#

Dumb question: if we have a small dataset, like 15 samples, and we feed it to an ML algorithm, will the accuracy be like 100%, or something else?

opaque stratus
#

15 samples is not enough for a model to learn

#

even if you doubled that you could see a huge improvement!

rigid zodiac
#

that's what I was afraid of lol. For categorical variables, what ML model do you recommend?

tranquil yarrow
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @magic dune until <t:1627066391:f> (9 minutes and 59 seconds) (reason: newlines rule: sent 114 newlines in 10s).

atomic tide
#

!unmute 555944200047296513

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: pardoned infraction mute for @magic dune.

#

:x: There's no active mute infraction for user @magic dune.

atomic tide
#

Please use this service to paste large amounts of code:

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

magic dune
#

Won't happen again

noble sand
magic dune
umbral ferry
#

whenever I use df.describe() and specify the percentiles, it always includes the 50th percentile. I don't want it to include that, how do I make it not?
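
One workaround, sketched on made-up data: let `describe()` add the median and drop the `50%` row afterwards, or call `quantile()` directly when only the percentiles are needed.

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 101)})  # illustrative data

# describe() may add the median ("50%") on its own, so drop that row afterwards;
# errors="ignore" keeps this safe if the row is not there
summary = df.describe(percentiles=[0.1, 0.9]).drop(index="50%", errors="ignore")

# Alternative: ask only for the quantiles you want
quantiles = df["x"].quantile([0.1, 0.9])
```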

austere swift
#

so what do you mean by your figures don't get sized appropriately

noble sand
magic dune
#

help pls

austere swift
noble sand
steel ravine
magic dune
noble sand
magic dune
#

but this guy just asked a question in the channel

steel ravine
#

Ok

umbral ferry
#

which scoring function should I use with SelectKBest when I have categorical input and numerical output?

mint palm
#

Why do we use residual nets in NNs? It doesn't make sense to me how they improve the NN if you're just teleporting data to skip some layers.

bronze skiff
#

gradients like teleports tho

edgy hull
#

Is there a great book related to machine learning as pdf form or anything?

magic dune
finite leaf
#

Hi Everyone - I'm new to Python and am tinkering with pandas. I have two datasets with a primary key - foreign key relationship

#

On the dataset with the foreign key, I want to lookup data in the linked table and do a calculation

#

And then add that as a column in the original dataset

ripe forge
#

look at pd.merge or .join for joins

#

i find pd.merge interface easier to use

finite leaf
#

ok

#

I'll look up those docs

#

thanks!

#

A follow on question - my keys are named dissimilarly

#

it looks like merge takes a single key, finds that on both tables, and joins on that

unborn glacier
unborn glacier
finite leaf
#

If I were doing SQL you can say something like INNER JOIN Foo ON Bar.ID=Foo.BarID

unborn glacier
#

Ohhh you mean the column containing keys has a different name

finite leaf
#

yes 🙂

unborn glacier
#

Lol I thought your actual keys were different haha

mint palm
unborn glacier
#

I think the `left_on` / `right_on` params do that
pd.merge(df1, df2, left_on='UserName', right_on='UserID')
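
A small sketch of the same idea applied to the SQL example from the question, including the calculated column. The table contents and the `unit_price` / `quantity` columns are hypothetical.

```python
import pandas as pd

# Hypothetical tables: bar holds the primary key ID, foo holds the foreign key BarID
bar = pd.DataFrame({"ID": [1, 2], "unit_price": [2.5, 4.0]})
foo = pd.DataFrame({"BarID": [1, 1, 2], "quantity": [3, 1, 2]})

# Roughly: FROM foo INNER JOIN bar ON bar.ID = foo.BarID
merged = foo.merge(bar, left_on="BarID", right_on="ID", how="inner")

# Calculation using the looked-up column, stored as a new column
merged["total"] = merged["quantity"] * merged["unit_price"]
```

Both key columns (`BarID` and `ID`) are kept in the result; drop one of them if you don't need it.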

finite leaf
#

ah - I'll give that a try

#

That worked!

#

Thanks!

timber skiff
serene scaffold
timber skiff
#

While it's on my mind, do you know the best way to make a helper column that checks for nan in a different column, and fills in based off whether or not it's occupied?

serene scaffold
timber skiff
#

My case is the large set is production data, the medium set is shipment data, and the small set is consumption data. I'd like a column called "stage" with the value produced, shipped, or consumed based off whether it shows up

serene scaffold
#
In [4]: df
Out[4]: 
          0         1         2
0  0.471273  0.800411  0.899211
1  0.214583  0.581962  0.752713
2  0.611297  0.909228  0.658377
3  0.512379  0.329779  0.843706
4  0.323153  0.020822  0.234330

In [5]: df[0].isna()
Out[5]: 
0    False
1    False
2    False
3    False
4    False
Name: 0, dtype: bool
frank flare
#

I'm a noob. Idk how to code properly and definitely dunno about NN. Is there any online server or something where I could just add my dataset and it will be able to make predictions ?

serene scaffold
velvet thorn
timber skiff
#

if column A is 20 rows, B is 15, and C is 10.....

#

what's the best way to create a column D that has 3 possible values, "A is present", "A and B are present", "All are present"

velvet thorn
#

I suppose

#

that means that for all rows for which B and C are present, A is present

#

and for all rows for which C is present, B is present

timber skiff
#

exactly

velvet thorn
#

okay then

timber skiff
#

it steps down

velvet thorn
#

(df['A'].notna() * 4 + df['B'].notna() * 2 + df['C'].notna()).map({7: 'all', 6: 'A and B', 4: 'A'}) should work

#

wait hold up

velvet thorn
#

how can they have different sizes

timber skiff
#

in an outer merge

velvet thorn
#

ah

#

but

#

you have joined them already

#

right

timber skiff
#

where the presence of data indicates its stage

#

and i want a label for stage. Maybe df["all"] = df[df["C"].notna()]

#

then df["all"] = "all" .... then i'll append them ill play around

serene scaffold
timber skiff
#

it definitely doesn't

#

i'll look into df.apply or lambda

warm swallow
#

maybe this will work for you:

import math

def is_present(a, b, c):
  # b and c are NaN when the row has no match in that table
  if not math.isnan(b) and not math.isnan(c):
    return "all are present"
  elif not math.isnan(b) and math.isnan(c):
    return "a and b are present"
  else:
    return "a is present"

# apply the check row by row
df['D'] = df.apply(lambda x: is_present(x['A'], x['B'], x['C']), axis=1)
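
A vectorized alternative, as a sketch: `np.select` with `notna()` avoids the row-by-row `apply` and also works when the columns are not floats. The data is made up, and the stage labels follow the produced / shipped / consumed naming from the question.

```python
import numpy as np
import pandas as pd

# Made-up result of an outer merge: B and C are missing for earlier stages
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 20.0, np.nan],
    "C": [100.0, np.nan, np.nan],
})

# Conditions are checked in order; the first one that is True wins
conditions = [df["C"].notna(), df["B"].notna()]
labels = ["consumed", "shipped"]
df["stage"] = np.select(conditions, labels, default="produced")
```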
timber skiff
#

that is awesome!

#

thank you!

warm swallow
#

ofc! i love lambda functions.