#data-science-and-ml

1 messages Ā· Page 391 of 1

agile cobalt
#

ah... well, that is a fair bit more complicated then

gilded bobcat
#

Ah shucks

#

I know for logistics, if its negative ---> ok prob 0, if its pos ---> ok prob 1.

agile cobalt
#

Do you have an overall understanding how Adaboost (or Bagging Boosting Ensemble) models work overall?

gilded bobcat
#

I'd like to think so!

#

I guess this gets to the interpretability part, where adaboost def brought up my predictive power, but I cant turn around and tell someone "yeah sex really matters, idk which way but it matters!!" ?

agile cobalt
#

oh wait, I misspoke - Adaboost is a Boosting ensemble, not a bagging ensemble

gilded bobcat
#

Side question -- Adaboost still bags in the sense that we're resampling each time?

agile cobalt
#

in a nutshell,
you make a model that underfits the data, with a high error
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
...

agile cobalt
#

The sklearn documentation describes the feaure_importances_ as

The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance
so you could just say "This has a Gini score of xyz", but it is a fairly arbitrary metric.

It is not a simple decision function, thus it is not easy to say how much "weight" exactly each parameter has

gilded bobcat
#

Interesting

#

So for boosting its like these high important variables reduced the iterative error the most across models (or something like this)

acoustic peak
# gilded bobcat Side question -- Adaboost still bags in the sense that we're resampling each tim...

According to the sklearn documentation for AdaBoostClassifier:

An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
Looks like it doesn't resample on each iteration.

gilded bobcat
#

Interesting so AdaBoost takes the whole sample given and works with it

#

Whereas Randomforest would bag

#

?

acoustic peak
#

Yup, I believe so

gilded bobcat
#

Cool! Ty

#

I guess my initial question still stands (if it even makes sense), if I wanted to explain this to someone, how can I tell how the features are important for classification?

#

Like is it older ages that are important for being adopted or younger ages?

acoustic peak
#

Conceptually, it's neither. In your case, age is a significant factor for predicting adoption, but it does not say whether older animals are adopted more than younger animals. Something like Shapley values may give you that information.

gilded bobcat
#

interesting ill look into it, thank you!

#

May I ask a final question to you? No worries if not!

acoustic peak
#

Sure!

gilded bobcat
#

Awesome ty! So i've been practicing decision trees as you can tell and I built my own dataset using adoption data. I started with a simple decision tree and found an accuracy of 72% (I just testing depths of 1 thru 10 and picked the best). After I used the SAME training data and made random forest and boosted trees, both get to 71%, but they never do better than my first simple decision tree. It makes me think I messed up, how could the simple single tree out preform all these complex models that I fine tuned?!

acoustic peak
#

that's a really hard question to answer without seeing all of the data and code... are those accuracies from the training or hold-out data set?

gilded bobcat
#

Hold out dataset

#

like at the end of the day its a percent difference, but still

#

Like I did a train_test_split, trained over train, predicted over test

misty flint
acoustic peak
#

@gilded bobcat yeah, it could just be that you have a ton of unexplained variance, but try looking at other performance metrics (recall, F1, ROC AUC, etc) and your class balance

#

I'm also a little suspicious that one feature has 10x higher importance than all of the others

gilded bobcat
#

Yeah... If I build out a tree of depth 3, then only Age is the important feature.

#

Maybe I need to take a step back and start with a very simple model and try over

tacit basin
misty flint
#

its not even all of them in the pic DoggoKek

tacit basin
#

That's even more tools lol

misty flint
#

he does this like every year

#

and then you can see which AI startups die and stuff

mild dirge
#

Oh basically like specific field vs multiple fields ig

misty flint
#

ugh i feel like i cant decide what i like/want to do

#

traditional DS work vs. DE vs. MLE vs. some flavor of SWE with ML

#

i hope i can decide by end of summer

#

when i graduate kekHands

#

does anyone know why they say you need 2-3 data engineers for every data scientist?

#

like i get that there needs to be ETL and data warehouses for DS but like

#

it seems like a 1:1 ratio might also be fine?

mild dirge
#

Recently looked at some code I found on kaggle using Pytorch. they made a model for image classification. Was wondering if there was a bug in this code:
https://paste.pythondiscord.com/ajepebanug
They define self.pool2 = nn.MaxPool2d(2, 2) once in the init. In the forward pass they use it multiple times. Is this incorrect, as it basically trains the same layer between multiple layers?

#

I assume you have to make a new MaxPool2d instance for every pooling layer you use in the network right?

desert oar
#

there's nothing to train/learn, so it's probably just for convenience

#

@mild dirge ā˜ļø

mild dirge
#

oh right, it's maxpool haha

#

don't mind me

desert oar
#

fwiw it's a weird and confusing thing to do in a tutorial

#

for exactly that reason

mild dirge
#

Yeah, it wasn't really a tutorial, more so a code showcase. But we copied part of the architecture design. Spent like 2 hours making a diagram of it:

desert oar
mild dirge
low bronze
#

hello

#

is there a way I can combine all of this to make it simple instead of making multiple dataframes

#
    df1 = working_df.drop('Embeddings', axis = 1)
    df2 = df1.sort_values(['Match %'], ascending=False)
    return df2[df2['Match %'] >= 40.00] 
thin palm
#

Hello, I have a dataframe that has two columns "Names" and "Original Name", i have 5 original names that are NaN, how can I apply a lambda function (or any) to assignn the missing data in original name to be equal to the Name??

#

for example a Russian name spelt in English is there but in the "Original Name" it's spelt in Russian text -> but it's missing. How do I tell the Original Name to be set to the "Name"

#

```nan_in_col = data.apply(lambda x: data['original_name'] = data.name if data[data['original_name'].isna()] == True)````

#

figured it out
nan_selection.fillna({'original_name':nan_name.name })

tacit basin
low bronze
#

thank you so much

languid stratus
#

If I have a classifier getting 5% accuracy on a data set I know to be binary 50/50 split….

#

Surely it’s then detecting something

#

Is there a likely reason or is this just a fluke?

ashen sable
#

can we make a bot which knows everything?

#

like an active data inserting sort of feature so it learns?

lapis sequoia
#

hello, off topic, but does anyone know if kaggle has dark mode, they only have it for their jupyter notebooks

dusty ivy
#

how can I create a canvas 35x35 pixels in python because I have a task to create a paint canvas that is 35x35 pixels only is it possible to create a larger canvas that is 35x35 pixels only ?

#

I want to create a vowel recognition using with image processing

tacit basin
lapis sequoia
#

XD

tacit basin
#

I hate it too. Most likely there's some way to add custom css to dark reader to show notebooks properly. Then that would be a nice solution.

lapis sequoia
#

I see. okay, thanks for answering :D. I'll try looking

fleet musk
#

hi guys

#

so i am getting a new Windows laptop

#

i have been studying python for a month, and today learnt about using Conda to create virtual environments

#

so when Im installing Anaconda freshly, should I or should I NOT, select to enable/install Conda to PATH

#

same question for Python installation, should i enable Path during 1st time install?
because I can install it later anyway while creating virtual environments, no?

tacit basin
fleet musk
#

another question

#

if i have installed conda, and specific python, say 3.7
and wanna code in pycharm/vscode

#

how do i ensure that it is using 3.7 from my virtual environment, and not the main python, say 3.9

tacit basin
fleet musk
#

oh cool. what about the packages

#

if i installed numpy in VE1 and pandas in VE2,
how does pycharm know which VE to look for, for the pandas package?

steady basalt
#

It gets confusing when ur in mac haha

tacit basin
steady basalt
#

I can’t keep track of direct

#

I only launch out of conda

#

For this reason

#

Maybe if I get a new computer I’ll do it right

tacit basin
steady basalt
#

It’s messy

#

It’s hard to understand

fleet musk
#

say i have created a Virtual environment - VE1 - which has numpy package and Python 3.7

#

and another environment VE2 - which has pandas, and python 3.5

#

so when i open Pycharm
how/where do i select so that My working environment is VE1

night sequoia
#

Hey guys , I am currently reading "Hand's on Machine Learning" book by Aurelion Geron and I have planned to share the things I have learnt in the book , in this notebook I have added some important things I learnt in the book along with the code. Please give your honest feedback and should I continue with the upcoming chapters ....

amber lark
#

Hi guys can someone please look at this and tell me what am I doing wrong?

serene scaffold
amber lark
#

in a sec

serene scaffold
#

I can't help, unfortunately.

amber lark
#

why?

mild dirge
#

dtype should be float 32

#

not int8

amber lark
#

how to fix it?

mild dirge
#

convert to float 32 lol

#

theyre rgb values of 0 to 256

#

they should be normalized/standardized

#

and close to 0-1 range

amber lark
#

i tried to do
train_images = train_images . 255.0

#

and it didn't work

desert oar
# amber lark

!paste stelercus was asking for actual text, and not screenshots. see below for pasting code and other output:

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar
#

it's expecting a lot of people to ask them to squint at a tiny screenshot and manually type out dozens of numbers

amber lark
#

not working

mild dirge
#

Pretty sure you should just give it a tensor

#

not a ndarray

amber lark
#

what does it mean?

mild dirge
#

not sure what library you are using, you haven't told us yet

#

sklearn, tf, pytorch?

amber lark
#

tf

mild dirge
#

what is train labels?

#

did you convert those too?

#

pretty sure that needs to be an array of size n_labelsx1 (2d)

#

currently it's an array of size n_labels (1d)

amber lark
#

the train_labels is the y_true of the images

#

like what they are

mild dirge
#

right

#

they're probably in the wrong shape

amber lark
#

how can i fix that ? šŸ˜…

mild dirge
#

seems that 1d tensor can work too, so should be fine I guess

#

what error are you getting after converting the train images to float32 in range of 0 to 1 ish?

amber lark
#

TypeError: only size-1 arrays can be converted to Python scalars

mild dirge
#

?

#

Please give a full traceback

amber lark
#

can i sent a screen shot?

#

send

mild dirge
#

yeah

amber lark
mild dirge
#

that means your trainining images aren't all the same shape

#

are they all rgb or some grayscale?

amber lark
#

sorry for bothering you for this I m new to all the deep learning stuff and it is hard for me to understand the dimensions staff

mild dirge
#

Yeah it's confusing haha, reading a whole book on pytorch rn, just finished the tensor chapter

amber lark
mild dirge
#

The error is because it's isn't a rectangular multidimensional shape

#

Like some rows or columns are different size

#

For instance when some images are grayscale (1 channel) instead of rgb (3 channels)

#

What datatype is train_images?

#

a list of ndarrays?

amber lark
#

I used cv2 to change all the images to rgb ones to I think that the problem is the width and height

mild dirge
#

They must all be the same width and height too yes

amber lark
mild dirge
#

type(train_images) is ndarray?

amber lark
#

yep

mild dirge
#

Eh honestly no clue then

#

try train_images = train_images.astype('float32')

dusty valve
#

mhm yes thanks you pycharm.

amber lark
#

WORKING!!!! I was just needed to fix the width and heights for all images

mild dirge
#

nice

amber lark
#

thanks @mild dirge you the king

mild dirge
#

you might want to look into dataloaders from pytorch

#

they can save you this headache

amber lark
#

i will thank you

mild dirge
#

you can use transformations with them

#

that automatically normalize and reshape etc.

rugged oasis
#

Hi! I wanted to make an multi agent simulation kind of thing so can anyone help me with what library or application should i use? I would like my ai code to be in python

mild dirge
#

multi agent simulation kind of thing is pretty broad haha

nova totem
#

Guys, good morning, could someone help me, I'm trying to replace missing value of a specific column, but it doesn't replace, I tried other methods, but when I check with isnull, it still shows missing data.

  # take the average of the values 0 and 1 of Age
media_idade_benigno = mamografia.Idade[mamografia['Severidade'] == 0].mean()
media_idade_maligno = mamografia.Idade[mamografia['Severidade'] == 1].mean()
print(media_idade_benigno)
print(media_idade_maligno)
#fill in missing data
mamografia.Idade[mamografia['Severidade'] == 0].fillna(media_idade_benigno, inplace = True)
mamografia.Idade[mamografia['Severidade'] == 1].fillna(media_idade_maligno, inplace = True)

#check for missing value
mamografia.isnull().sum()

rugged oasis
next phoenix
urban prism
#

I have a tflite model and to be honest, I'm already having an issue understanding how to run inference with it. When I load up my tflite model and call interpreter.invoke() as in TFLite docs, my code stops running. Cannot even print. Thought this might be a local issue but I get the same result in Kaggle

bold timber
#

Hi I have a question: In this code, what the meaning of {i+1:4}?

urban prism
#

It is to show the number of the current epoch but padded to fit 1000 in there. @bold timber

#

You can try running it yourself.

i = 0
print(f"Epochs: {i + 1 : 4}")
i = 9
print(f"Epochs: {i + 1 : 4}")
i = 99
print(f"Epochs: {i + 1 : 4}")
i = 999
print(f"Epochs: {i + 1 : 4}")
Epochs:    1
Epochs:   10
Epochs:  100
Epochs:  1000
serene scaffold
#

as an undergrad, I helped develop a library with a CLI where you can specify the location of the training data and all the hyperparameters at the command line, and it trains a model and saves it. and then you can use the same CLI to load that model and predict. and several people told me that this was remarkably well-packaged. I didn't believe them at the time, but now I realize how low that bar is.

#

like, the bar is on the ground, and anyone can just roll around on it.

serene scaffold
desert oar
#

it helps to remember that : is not a valid python syntax construct, outside of indexing with []. so the : is used to separate the python expression from the format expression

pearl heart
#

hello

soft lance
#

Hello everyone, I have a quick and probably silly question: I'm loading a colored image from COCO dataset through pytorch's DataLoader, then I try to plot it, and instead of using 3 channels to compose a colored image, it plots every grayscale component 3 times along every axis. How to fix that?

#

I'm probably missing something obvious

desert oar
#

so you need to reorder the axes

soft lance
#

thank you very much, i will try it

desert oar
lapis sequoia
#

hi

soft lance
desert oar
#

maybe just mess with it / experiment on smaller arrays so you can print them out to see what's happening

#

that's pretty much what i do when i need to do complicated array manipulations

soft lance
#

thanks, i'll try to do that

#

and maybe some other question, regarding pytorch DataLoader: COCO images have different sizes, so I can create batches of size 1, but with the higher batch size it just can't concatenate

#

what's the right way to work with this and not reinvent the wheel?

wicked grove
#

@desert oar hello i was reading about this instead of cross entropy losspyscce=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False,reduction=losses_utils.ReductionV2.AUTO,name='sparse_categorical_crossentropy') scce(tf.Tensor([0 2 1]), output, sample_weight=tf.constant([0.3, 0.7])).numpy() i am not getting what i should put for y_true and y_pred i keep getting an error

#

how can i put y_pred before the model has been compiled?

runic zodiac
#

i have made an opencv file to detect face i want to send the name of the faces detected into django how can i do it.What my open cv does is i have some pre loaded images and it detects it and appends the name into a list i want that data in my django app to save it in a server how do i do that

urban prism
#

Does anyone here know how to use Tensorflow 2 Object Detection API? I'm desperate at this point

serene scaffold
#

@urban prism try asking your question

plush jungle
#

I'm trying to get a basic untrained transformer to use for sequence prediction and I've got the pytorch example working

#

but once it's trained, I can't get it to predict

#

in their code, they forward pass through the transformer like this

output = model(data, src_mask)```
#

and then that outputs a matrix

#

of shape [9, 20, 28782]

#

I know 20 is the batch size

#

and 28,782 is the vocabulary size

thin palm
#

if I have 305 rows, each representing astronauts, how do I plot each of those on a x axis without being over crowded?

plush jungle
#

but what I want to do is pass it text, and have it complete the text with a prediction

tacit basin
tacit basin
plush jungle
#

so my thought was I could pass model() a string and then cover it with a mask for however many characters I want it to predict

#

and then convert the resulting tensor into a string

tacit basin
thin palm
#

Any advice on how to deal with 111 Names that need to be plotted?

ionic relic
#

can someone give me some potential ai research topics or questions for an extended essay (research essay of 4000 words) intermediate level

nova totem
#

If you want to send the notebook, I'll just let you know that it's in Portuguese

urban prism
#

I'm using TensorFlow Object Detection API and had exported the model to kaggle/input/id-out/fine_tuned_model. I'm trying to run evaluation on my saved_model. Unfortunately I cannot run model.evaluate(generator_object) and it seems that I have to use !python /kaggle/models/research/object_detection/model_main_tf2.py --pipeline_config_path=/kaggle/models/research/object_detection/efficientdet_d2_coco17_tpu-32/pipeline.config --model_dir=kaggle/input/id-out/model/train --checkpoint_dir=/kaggle/input/id-out/fine_tuned_model/checkpoint --run_once=True in my terminal. Evaluation part of the configuration file looks something like this

  label_map_path: "/kaggle/input/id-out/labels(4).pbtxt"
  shuffle: false
  num_readers:1
  num_epochs: 1
  #queue_capacity: 150
  #sample_1_of_n_eval_examples: 80
  #sample_1_of_n_examples: 80
  #min_after_dequeue: 100
  tf_record_input_reader {
    input_path: "/kaggle/input/id-out/tfrecs/file_??-512.tfrec"
    input_path: "/kaggle/input/id-out/tfrecs/file_17-176.tfrec"
      
  }``` (Commented out parts because I have been trying different flags to make it work)
TFRecords files have 8880 samples in total. Running the Python command mentioned above makes my memory load up to 16 GBs and then it restarts the kernel stating that the notebook tried to allocate more memory than available. How can I make it evaluate all my samples without overloading the memory?
#

Last part of the output of that command:

I0329 14:20:12.925074 139994487641920 model_lib_v2.py:966] Finished eval step 0
W0329 14:20:13.111861 139994487641920 deprecation.py:339] From /opt/conda/lib/python3.7/site-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
I0329 14:57:09.680602 139994487641920 model_lib_v2.py:966] Finished eval step 100
I0329 15:33:06.951502 139994487641920 model_lib_v2.py:966] Finished eval step 200
I0329 16:09:41.167769 139994487641920 model_lib_v2.py:966] Finished eval step 300
I0329 16:45:51.399167 139994487641920 model_lib_v2.py:966] Finished eval step 400
I0329 17:19:54.271977 139994487641920 model_lib_v2.py:966] Finished eval step 500
#

Then memory allocation issue happens

fleet musk
thin palm
#

Does anyone know how I could adjust my code to have each plot represent a country? Right now it produces the same plot each row.

sns.set(font_scale = .5)

# flatten axes for easy iterating
for i, ax in enumerate(axes.flatten()):
#     sns.boxplot(x= data.iloc[:, i],  orient='v' , ax=ax)
    # plot astronauts who have spent more than 4380 hours (6 months) in space or more
    bx = sns.scatterplot(x="name", y="total_hrs_sum",
                hue="nationality", 
                palette="Paired",
                sizes=(10, 200), 
                linewidth=0,
                data=long_hours,
                ax=ax)
#

long_hours['nationality'] has 10 values

amber lark
#

Does someone know how to get into a generations deep learning like that mario ai that learn from every death?

tacit basin
thin palm
thin palm
steady basalt
#

With thin bars and tiny tiny writing

#

Put the text at a 80 degree angle maybe

#

Histogram

tacit basin
steady basalt
#

I’d use thin bars and small angled text

tacit basin
#

you can also print vertically

steady basalt
#

To be honest it’s more practical to have a sorted table

#

Going from high to low hours

tacit basin
#

something like that

#

then it will be tall šŸ˜‰

steady basalt
#

Good idea

tacit basin
#

this is minitab, but i am sure in matplotlib you can do similar, so you start with hightest number of hours and once you reach say 80% cumulative you group rest of the names as other

bold timber
#

I have a cudatoolkit 11.1.1, but why it can't be used?

thin palm
inland mantle
#

Would a cert in deep learning and machine learning be needed because I have no experience

inland mantle
#

For an internship at the least

grave frost
#

not really

#

projects do count though

tacit basin
inland mantle
#

Can school projects be allowed on resume

grave frost
#

school projects like?

#

I got a fairly interesting one in an AI music startup, and that was through projects alone

inland mantle
#

Would you recommend undergraduate research on deep learning

tacit basin
inland mantle
#

Or is it too advanced

grave frost
#

undergrad research is chill

#

you aren't really expected to do wonders in NIPS or top-notch conferences, publishing papers itself becomes a pretty sizeable achievement

#

I'd say its great for an internship, but often engineers don't have the acumen to read papers. whereas projects look pretty interesting and engaging to them

inland mantle
#

Oh I’m doing several rn

#

I need to finish the hard ones but I’m only in hs

#

When would you recommend me to start looking for an internship

#

And get one for the matter

tacit basin
hybrid mica
#

after creating a polynomial regression model with sklearn, can you see the polynomial formula generated?

mild dirge
#

I assume you can use np.linspace to generate some points, classify them and draw a line through it? @hybrid mica

hybrid mica
#

does sklearn offer a method for that purpose though?

mild dirge
#

It would literally be 1 line to generate the points, and 1 more to draw a line through it

desert oar
steady basalt
#

If I’m not going to learn R, is there a similar stats package for python?

#

Sklearn breaks a lot of stats rules

stone marlin
#

Sklearn breaks a lot of stats rules...? In what way?

agile cobalt
#

statsmodel has stats in the same so you might like it
jokes aside, maybe do check it out

stone marlin
#

Statsmodel is great, agreed.

agile cobalt
#

just generating a linear space and plotting it is pretty standard though

stone marlin
#

This is not exactly what you want, but it's sort'a close.

agile cobalt
#

(5 hours ago)

urban prism
#

I have a TFLite model. Everything seems to work until I run interpreter.invoke(). Then the code stops running. The kernel doesn't interpret the lines that come after it. What might be the issue?

stone marlin
#

Thanks for this link etrotta, that was an interesting read and I didn't know you could do some of that!

severe pulsar
#

so I have this in a very simple keras model

#

model = Sequential()
model.add(keras.layers.CategoryEncoding(num_tokens=47, output_mode = "one_hot"))
model.add(Dense(47))

#

and i'm trying to feed in a series of length 2 arrays into the one-hot encoder, with the goal of getting 2 entries of length 47 into the next layer (i also changed dense to 94). however im getting the error

#

When output_mode is not 'int', maximum supported output rank is 2. Received output_mode one_hot and input shape (None, 2), which would result in output rank 3.

#

does anyone know how to resolve this?

bold timber
plush jungle
#

the pytorch transformer example code has this function that turns a string into a tensor

#
def data_process(raw_text_iter: dataset.IterableDataset) -> Tensor:
    """Converts raw text into a flat Tensor."""
    data = [torch.tensor(vocab(tokenizer(item)), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))
desert oar
plush jungle
#

but there's no reverse function that turns the tensor back into a string

#

and the transformer outputs tensors, not strings

#

so I need to convert them if I want it to generate predictive text

desert oar
#

this is torchtext i assume?

plush jungle
desert oar
plush jungle
#
>>> vocab
Vocab()```
#

this vocab?

desert oar
#

so for some one-hot encoded vector v, something like vocab[torch.argmax(v)]

desert oar
#

it says that the Vocab class supports __getitem__, which means it supports indexing with []

plush jungle
#

the problem is i don't think it's one-hot

#

if I pass it "a"

#

it returns
[8]

#

"abc" returns

desert oar
#

pass to what exactly?

plush jungle
#

[ 8, 458, 378]

desert oar
#

that sounds like you messed up and it's one-hot encoding letters

plush jungle
#

the data_process() function

desert oar
#

as in, you passed it a string instead of a list of strings, or some equivalent

plush jungle
#

oh

desert oar
#

the curse of strings being iterable

plush jungle
#

lol

grave frost
#

or maybe that's just me. Eric Hallahan from eleuther from instance, is an undergrad

steady basalt
thin palm
#

if I have a list of countries how do I make a for loop to plot each country as subplots?

#
for i, ax in enumerate(axes.flatten()):
#     sns.boxplot(x= data.iloc[:, i],  orient='v' , ax=ax)
    # plot astronauts who have spent more than 4380 hours (6 months) in space or more
    bx = sns.scatterplot(x="name", y="total_hrs_sum",
                hue="nationality", 
                palette="Paired",
                sizes=(10, 200), 
                linewidth=0,
                data=long_hours,
                ax=ax)```
plush jungle
#
TypeError: __getitem__(): incompatible function arguments. The following argument types are supported:
    1. (self: torchtext._torchtext.Vocab, arg0: str) -> int```

nvm I realized it was vocab.lookup_token() that I wanted
severe pulsar
#

Can someone explain why I get this

#

InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,46] and labels shape [1472]

#

when im trying to run a very simple model

#

model = Sequential()
model.add(tf.keras.layers.Flatten())
model.add(Dense(94, activation = 'relu'))
model.add(Dense(46))
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam')
model.fit(to_categorical(inputs_array), to_categorical(outputs_array), epochs = 5, verbose = 1)

#

actually how do i write this as code markdodwn in discord

#

never mind forced it to work

grave frost
thin palm
#

if I have 10 countries and want to plot each country unique stat how do I do this instead of manually doing it?

belgium = long_hours[long_hours['nationality'] == 'Belgium']```
#

is there an easier way?

stark zenith
#

@thin palm could do long_hours['nationality'].value_counts() ?

#

maybe, unless you have other columns and that's why you're filtering the whole df

thin palm
#

what's more interesting to the data science community? A Graph showing astronauts who spent 6 or months or more on missions showing the total amount of time each astronaut spent in space and outside the space shuttle.

OR

each country's male v women time spent on astronauts?

lapis sequoia
#

"Assuming that the data collector makes an entry error when collecting data, it can
be ensured that the error occurred in the Price and Power columns, but it is not
sure which car’s information the error lies on. Please try to explore the error by
visualization to identify how many errors there are and try to fix it."

#

Hi, what plot can I use for this?

karmic valley
#

is it correct practice to work out standard deviation of an average column. e.g. i have 4 columns for for different people data and 5th column is average column of those 4. can i still work out standard deviation of column 5 - does it represent SD of all patients

mild dirge
#

Take the SD of all datapoints then

urban prism
#

I'm trying to evaluate my Tensorflow Object Detection API model. My config file's eval_config is something like:

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
  num_examples: 8880
}

eval_input_reader: {
  label_map_path: "/kaggle/input/id-out/labels(4).pbtxt"
  shuffle: false
  num_readers:1
  num_epochs: 1
  #queue_capacity: 150
  #sample_1_of_n_eval_examples: 80
  sample_1_of_n_examples: 80
  #min_after_dequeue: 100
  tf_record_input_reader {
    input_path: "/kaggle/input/id-out/tfrecs/file_??-512.tfrec"
    input_path: "/kaggle/input/id-out/tfrecs/file_17-176.tfrec"
      
  }

It has 8880 images and when I try to run it with `!python /kaggle/models/research/object_detection/model_main_tf2.py --pipeline_config_path=/kaggle/models/research/object_detection/efficientdet_d2_coco17_tpu-32/pipeline.config --model_dir=kaggle/input/id-out/model/train --checkpoint_dir=/kaggle/input/id-out/fine_tuned_model/checkpoint --num_workers=1``it seems like it only evaluates with 56 images, given the output: https://paste.pythondiscord.com/upuzinusat
How can I arrange this in a way that it evaluates with all 8880 images and doesn't run out of memory -which restarts the Kaggle kernel that I'm working on

desert oar
thin palm
#

what would you do if a column had only 1 missing value?

agile cobalt
#

investigate why it is missing
if it would be simple enough to, find the actual value and fill in manually
if it's an input error, it wouldn't be viable to get the real value, and that data point does not seems to matter all that much, possibly drop it, but mention that you did so somewhere

thin palm
desert oar
thin palm
desert oar
#

that's a bad newbie habit

desert oar
#

right. it's categorical data

thin palm
desert oar
#

a lot of beginners learning data science from a "programming-first" perspective forget that they are actually working with real data that represents real things, not just pushing around 1s and 0s for fame and glory

#

it's good to avoid getting caught in that. data is data, code is code. we only use the latter in service of the former.

thin palm
desert oar
#

good! and it sounds like you've made progress too

thin palm
#

I would for sure do a mean to give each company an idea of how much each mission costs.. but 77% is way too much

desert oar
#

i thought you were talking about a field with one missing value

thin palm
#

would it be safe to say that I can impute the missing values with a mean? I like this because it's the cost in millions for each space mission.

desert oar
desert oar
#

i suppose you can impute that with the mean, but 77% is a lot. you have to start to wonder if the missingness is non-random, e.g. is the missingness spread unevenly across countries, years, etc.

#

if all the usa missions have a cost, but all the russian and japanese missions do not, then you're basically extrapolating from the us space program to the russian and japanese space programs

#

is that a valid thing to do? hard to say without knowing more about space programs

#

maybe the cost is a function of duration, or the mass of the payload, or the type of mission (shuttle, soyuz, etc)

#

the launch vehicle and launch location probably are really important factors, etc

#

tldr you can impute with the mean, but alway keep in mind what you are losing by doing something the "simple" way

#

but maybe it doesnt matter! it's probably better than throwing out the field entirely... or is it? depends on what you even want to know

#

maybe mission cost is something you might want to try to forecast/infer with a model

thin palm
thin palm
desert oar
steady basalt
runic zodiac
#

How to get output from an open cv programto other python program

urban lance
#

Does anyone have experience with plotly?
I'm trying to make a bar chart where the height of each label should be corresponding quantity.
Instead everything is 1 high as each label is unique

tacit basin
runic zodiac
#

In my program it recognises the faces and puts their name in a list i want that list in my other program so how to do that

misty flax
lapis sequoia
#

Is there any help-channel or something alike for spark issues ?(silly question maybe but I'm new here)

serene scaffold
#

@lapis sequoia you can ask about spark in this channel

long locust
#

Hello, please stick to the current topic of discussion instead of making irrelevant posts. Thanks!

lapis sequoia
#

Ok, well. I'm trying to write a Spark code to make a pictures classification. I managed to import them, make the train/test databases in dataframes, import the spark-deep-learning package from databricks and it crashs when I call the fit(train_df) in the pipeline. It seems to be a py4j issue : Py4JJavaError: An error occurred while calling o51.transform. : java.lang.NoClassDefFoundError: org/tensorframes/ShapeDescription (and a loooong succession of cryptics lines follow this first one).

runic zodiac
lapis sequoia
#

I doubt. raphael@ubuntu18046:~$ pip3 show py4j
Name: py4j
Version: 0.10.7
Summary: Enables Python programs to dynamically access arbitrary Java objects
Home-page: https://www.py4j.org/
Author: Barthelemy Dagenais
Author-email: barthelemy@infobart.com
License: BSD License
Location: /home/raphael/.local/lib/python3.6/site-packages

#

I do have py4j installed on my machine 🤷

#

And Java as well. raphael@ubuntu18046:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

soft lance
#

Hello everyone, can you please consult me on the topic of semantic/instance segmentation? I want to build a pytorch model that produces masks for every object class on the image (whether seperate "instances" or just "semantic" classes, doesn't matter). What would be the easiest architecture to achieve such masking? I scanned through the code of the known solutions (mask-r-cnn, yolac) but was scared off. I wonder what are the solutions that are easier to understand and replicate. Thank you very much.

desert oar
misty flax
lapis sequoia
#

And maybe, one day, I will finally nail it.

desert oar
lapis sequoia
#

Well, I tried with java 8 and java 11 (with respectively the proper path directory given in the .bashrc) and in the two cases the same error occurred. As far I know there is no other versions available.

desert oar
#

i have no idea how that stuff works

urban lance
#

How do you make subplots of multiple dataframes with plotly (like this)

#

online I only see code to do this with the columns of a single dataframe

serene scaffold
urban lance
#

I have a dictionary containing the dataframes

each dataframe is a sub dataframe of the original (grouped by the values of a column)

For instance: all fruits are in 1 dataframe, all cars are in another dataframe, all animals in another and so on

serene scaffold
urban lance
#

I want the plot to tell something about each of the different groups over time

serene scaffold
#

so is each group going to be a different color?

urban lance
#

yes

#

like in the image above

#

except the code I have is for columns of a single dataframe rather than for each dataframe

serene scaffold
#

so you're aggregating each group in some way? because this only has one line per color.

urban lance
#

nope, just like this

df_rent = dataset[dataset["type"] == "RENT"]```
#

df_rent would be one of the sub dataframes

#

I looped though all the dataframes in the dictionary and I can make line plots of them seperately

serene scaffold
urban lance
#

but I want it in a single plot

#

with different colors for each dataframe

#

when I try to put them into one column, this is the error I get

serene scaffold
#

it would need to be copy/pastable

urban lance
serene scaffold
#

I can't assist any further, then. Sorry!

urban lance
urban lance
woeful falcon
#

just created a working "neuron" which can basically train and detect a & b, in any linear equation like ax+by (ex. 3x+2y), given some example inputs & outputs
thinking of making some kind of a "neural net" program that can train and play tic tac toe, tho that will be a bit larger
anyone's got any ideas for a simple project

frank acorn
#

If you did a calculator as your first project, this is going to be familiar
Make a scientific calculator? One that could solve harder math like simultaneous equations, ratios, inequalities, etc (maybe make it draw graphs too)

#

Don’t know if that’s what you want though

serene scaffold
#

so in that sense, creating a NN for it is sort of overkill. but it might be a pretty good learning experience.

#

@tropic edge we're going to mute you if you keep making random statements ("Do you love God?") in a bunch of channels

lost zodiac
#

I just had a really random (and potentially stupid) thought, but is it possible to use Huffmans Algorithm/Coding to clean and compress data?

serene scaffold
#

it sounds like you might be conflating compression with density

lost zodiac
#

because i was trying to figure out a way of involving Algo and Datastructures with Data Science

serene scaffold
#

compression is for making files that you aren't actively using take up less space on the hard drive. that's just a general computer thing.

in the context of data science, you can have dense or sparse representations of data. sometimes you can make a sparse representation more dense. but not by using a compression algorithm.

serene scaffold
lost zodiac
serene scaffold
serene scaffold
#

if you have ten categories of things, you could represent each thing as a vector with ten elements, and each category gets an index

#
A: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
B: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
C: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
... etc
#

This would be a sparse representation, because each instance is represented by a bunch of zeros, and only a single 1.

#

and it just gets more and more sparse the more categories there are.

lost zodiac
#

alright, would dense representation just mean the opposite

serene scaffold
#

right. a dense representation is when information is represented using less space. but you can't achieve that with compression, because when you compress a file, what it represents is no longer there until it's decompressed.

lost zodiac
#

ahhhh ok that makes sense

#

you had mentioned that graphs can be used

#

in the data science - alg and db relation

serene scaffold
#

right. neural networks are conceptualized as graphs.

lost zodiac
#

alright, so would you say it could also be applied to something "simple" like a web scraper and fake news detection

#

or is it a little higher level

serene scaffold
#

you can use a web scraper to obtain data for ML, I guess. I don't think there's a connection beyond that.

lost zodiac
#

other than graphs, is it possible to use another other theory in my specific case

#

or am i overthinking again haha

ornate isle
#

Hey guys, anyone with experience in PySpark? I need help with converting a python library to a UDF.

serene scaffold
#

there's no such thing as "overthinking"

#

not all ideas are useful, necessarily, but the amount of time you spent thinking about it isn't the problem.

serene scaffold
ornate isle
#

Where do I look for such a discord?

serene scaffold
#

you're welcome to wait for an answer here, of course.

lapis sequoia
woeful falcon
# serene scaffold creating a network that plays tic tac toe is special because every possible tic ...

yes, so I was thinking about an implementation which will store all the game-states in a dict with a weight, two agents will play the game to train, starting with an empty dict, will add a state to it if it is not there, and weights will be added once a game is finished,

so during the next game agents will select a move based on weights in the dict-of-game-states, though I wonder if it is even a neural network, like it will probably be the one with 3 layers, each with 1 - 5,478 - 1 neurons respectively šŸ˜…

and the problem is there would be no unique weights assigned to the same "state" when coming from different previous "states", creating a sort of "general weight for each state" mess,

for branching weights, one would probably need a 'real' NN, though i really wanna try & see if the former could perform well

bold timber
#

how to use GPU in pytorch?

tacit basin
mild dirge
#

for most tensors you can do .to(device)

#

where device='cuda'

#

or 'cpu' to go back to cpu

tacit basin
#

Oh sorry missed that noway said pytorch...

bold timber
mild dirge
#
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
#

try this

bold timber
#

I don't understand why it doesn't work

mild dirge
#

No clue sorry

sand stone
#

hey guys, any of you is experienced with speech recognition or text to speech in python? Preferably both, but 1 person with sr powers and another with tts powers is ok

mild dirge
#

ask your question, not if there are experts that can help @sand stone

#

If someone can help they might respond

sand stone
#

I actually don't have questions, I wanna know, since I'm still a beginner and I have robotic ambitions

serene scaffold
thin palm
#

at that point I'd be left with all object types and what would I be able to do with it?

#

I could make some plots showing which Space companies reported costs and which didn't

steady basalt
#

Well if u don’t have costs for entire nations what good is it

#

It’s got multiple companies ?

#

But only american

thin palm
#
       'Cost', 'Status Mission'],
      dtype='object')  ```
#

so if the 'location' has different cities in USA, how would I group these as one for USA? I could do this for each different country and see which countries provide cost data

#

How do I group by certain strings in pandas? If I find the location is "Las Vegas, Neveda, USA" and another is 'Los Angeles,Calfironia, USA", how do I tell pandas to group these puppies based off USA match

narrow saffron
brave sand
#

what would the first steps be for solving the LunarLander v2?

grave frost
#

step 1: land the spacecraft

#

step 2: profit?!

misty flint
brave sand
misty flint
#

dont look at me. im only here for the scenery kekHands

agile cobalt
#

it would be better to first split and explode expand the address into separate columns for each of them, but you could just use str.endswith("USA") for now

#

!e ```py
import pandas as pd
df = pd.DataFrame({"XYZ": [1, 2], "address": ["LA, NV, USA", "SF, IDK, USA"]})
df[["city", "state", "country"]] = df["address"].str.split(expand=True)
print(df)

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your eval job has completed with return code 0.

001 |    XYZ       address city state country
002 | 0    1   LA, NV, USA  LA,   NV,     USA
003 | 1    2  SF, IDK, USA  SF,  IDK,     USA
agile cobalt
#

(you could drop address after)
edit; also in that case you should have split by ", " instead of the default šŸ˜…

thin palm
#

this was easier lol

mighty spoke
#

Hi when I run my code I don't know why its not running and giving the output (i'm trying to solve some coupled ODE's) is anyone able to take a quick look? many thanks ```import scipy.integrate
import numpy as np
from matplotlib import pyplot as plt

#defining constants
C=0.86
h_bar = 1.05457266e-34
Ye =[ 6/12,26/56] #carbon-12 nuclei and iron-56 nuclei
c= 2.99792458e8#speed of light
G=6.67259e-11#gravitational constant
me = 9.1093897e-31#mass of electron
mp = 1.6726231e-27 #mass of proton
rh0_0 = (mpme**3c**3)/(3np.pih_bar)#natural unit for density

#define scaling functions

R0Val=[]#natural unit of length
#cycling through the 2 values of Ye the number of electrons per nucleon
for i in Ye:
R0=np.sqrt(C3imec**2/4np.piGmprh0_0)
R0Val.append(R0)

#defining first order ODE's
def rhs3(x, p):
dpdx = np.zeros_like(p)
M = p[0]
q = p[1]
g=q2/3/(3*(1+q2/3)1/2)#gamma factor us a function of q
dpdx[0] = 3qx
2
dpdx[1] = -CqM/g*x**2
return dpdx#return the two coupled 1st order ODE's

sol = scipy.integrate.solve_ivp(rhs3, [0,1],[0,1], dense_output=True)
"""
while the value of the density is larger than a very
small value (q > 10āˆ’10) the program is allowed to run. The small value(aprrox zero) is chosen such
that the second boundary condition of the density at the WD’s surface tending to zero is
satisfied
"""

x=np.linspace(0,1,1000)

#m=sol.sol(x)[0, :]((4/3)np.piR0Val**3rh0_0)#mass of the white dwarfs
#rh0=sol.sol(x)[1, :]*rh0_0#density of the white dwarfs
M=sol.sol(x)[0, :]
q=sol.sol(x)[1, :]
print(M)```

agile cobalt
#

well, which output are you getting, if any?

#

also......... you do realise that you'll likely run into floating point precision issues with values as small as e-34 right?

mighty spoke
#

ohh yh that must be why its taking such a long time to run

thin palm
#

is this a good infographic?

desert oar
mighty spoke
odd meteor
#

Ask your question and people will respond to you; hopefully. "Don't ask a question to ask a question"

Just like Nike, Just Do It āœŒļø

neon imp
#

What's demand for MLE like in the US

#

I keep thinking "AI is ovrsatured"

#

but my friends keep getting hired as bonafide machine learning engineers haha

#

is AI actually not oversatured and there's a lot of demand for decent "software engineers who know what a training and test set are?"

misty flint
#

curious. why do you think its oversaturated

serene scaffold
#

I'm interested to know as well.

lapis sequoia
#

I have no personal experience. But I read entry level roles are oversaturated because of the popularity of the space. And ML, DS being buzz words. So I think that's what they are reffering to

inland mantle
#

lemon_thinking it’s not really over saturated, I think because it’s like the hype nowadays

tacit basin
misty flint
#

all other positions in tech that arent entry-level are pretty hot rn

edgy briar
#

guys can i ask you for some feed back?

#

im using pickle to storage my data but im also using git to get control of my versions. so as git cant restore or track binary files as pickle creates im made an file called backup of the data of the AI in a .txt file so if the AI gets broken i can restore his behabiour just copyn and paseting the data of the text file into the source code and deleting the binary file of pickle. you tink this is a good solution? or i have to made someting more complex to make easy the restore of the AI to a older version?

lapis sequoia
#

R or python? 🤪 @misty flint

neon imp
#

I should probably say, maybe let's cut it down to

#

"The job market in the United States for MLEs with 2-10 years of software engineering experience."

tacit basin
neon imp
#

My perception was that MLE pays worse

#

because it's oversaturated by bushy tailed ML enthusiasts

#

compared to straight general software engineer or data engineer

#

But I'm not very sure about my assumptions

misty flint
#

i say that half-jokingly kekHands

#

no but if your industry is academia/pharmaceuticals/bioinformatics/traditional stats, you are more likely to use R

#

if not...then python is your best bet RunFail

lapis sequoia
misty flint
lapis sequoia
#

I could have slept or taken class.

#

I am taking class

misty flint
#

i am without context kekHands

#

and semi-confused kekHands

safe elk
safe elk
misty flint
misty flint
#

i do like kolv's cats tho

#

also

#

its like

#

almost midnight here

#

so no coffee atm

#

or else i will die tomorrow morning

#

should probs sleep soon

#

people talk about MLE here but i also wished we talked about data engineering more too

#

since you usually need DE in order for DS to have more impact

tacit basin
edgy briar
#

@misty flint what is the main funtion of your perceptron?

#

i mean what is the task the perceptron has to learn

misty flint
edgy briar
#

not yet

misty flint
#

i am so confused tonight

edgy briar
#

dont be sad we can do it

#

you can do it

#

i belive in you

misty flint
#

id rather talk about data engineering kekHands

edgy briar
#

what about market exchange data?

misty flint
#

im not really interested in finance tho kekHands

edgy briar
#

because i have a game which is a bitcoin simulator

#

i tough i can made in the future a perceptron learn how to play my game

#

just to practice how to make a good perceptron

#

@misty flint what kind of data enginiering you used work ?

misty flint
#

oh thats what you mean

#

why dont you use reinforcement learning

#

that could be fun

misty flint
edgy briar
#

yea i will run the perceptron for hours and see how much it can survive

misty flint
#

just wanted to talk about DE is all

edgy briar
#

i dont know what DS mean

misty flint
misty flint
edgy briar
#

facinating

#

what kind of data you work?

misty flint
#

lets not talk about that kekHands

#

look at this chart instead DoggoKek

#

jk

edgy briar
#

ok you fell sad about it ok

misty flint
#

its super blurry

misty flint
misty flint
#

this is better

edgy briar
#

ok im just an mecatronic/informatic enginiering but i want to become a robotics enginier

misty flint
#

nice. thats intense stuff

edgy briar
#

yea thats why im here im searching for feed back about my work

misty flint
edgy briar
#

yea i will used but whit a preceptron format

#

and i will make some escenarios to get easyest learn about an specific task

#

my main problem is not that, the main problem for now is the correct storage of data and version control

misty flint
#

i...have never heard of combining perceptron with reinforcement learning...good luck kekHands

edgy briar
#

well tell us what is this image you send

#

its a diagram of how to make a product from the farm to the final consumer?

sweet venture
#

Hi, if someone can help me !
My problem is in french im gonna try to translate it.
I have two equations :
x(t) = p1 + sin(p2 * t + p3)
y(t) = p4 + sin(p5 * t + p6)
with x(t) et y(t) the position of the satellite at instant t (0;2*Pi).
p1,...,p6 are the parameters I have to find to anticipate the movements of the satellite at an instant t given.
I have a list in a folder : "position_sample.csv" which give me some positions at t given.
My job is to create an genetic algorithm which can find a good combination of p1,...,p6 that give the right trajectory of the satellite.
To simplify the exercise, each parameters are included in [-100;100]

#

Im a beginner I take any help

edgy briar
#

ok im also a beginer but it doesnt sounds too dificult

#

have you made any of the perceptron already_

sweet venture
#

I tried but .. looks like nothing

edgy briar
#

i made a proyect similar in the university but it was a robot

#

in my case it has to calibrate his sensors whit data of the past

sweet venture
#

I know I have to write some function called fitness mutation crossover but im lost

edgy briar
#

this problem can be solve whit machine learnig but i tink its too much for htis you can make an simple algoritm to play hot and cold and get valid values to solve this problem

sweet venture
#

yes I do have to write a simple algorithm

#

but still I dont know how to proceed

edgy briar
#

or make this in a little perceptron if you want this have a full automatic calibration for any kind of curve

#

you can solve this whiout much math

#

you have to solve this whit an AI or can you use any kind of algoritm?

sweet venture
#

I have to solve this with an algorithm

edgy briar
#

ok make an simple hot and cold algoritm

#

its simple and it work perfect for this problems

#

its less proccesing heavie for the computer

#

and you can develop in 20 minutes

#

whit AI this can take days in develop

#

just training the AI

sweet venture
#

Hm I misspoke, I have to use AI but u sure it takes that much time ? bc I have to do it for tomorrow

#

I mean we have 1 day to do it

edgy briar
#

it tecniclly posible doing in 1 day but just whit a lot of practice and knowing exactlly what are you doing

sweet venture
#

<

edgy briar
#

in our case i recomend make an simple algoritm that learn how to calibrate his exit whit out much math as an AI

sweet venture
#

I want to try this

#

its called cold and hot ?

edgy briar
#

is easier as using trigonometry insted sinusoidal waves

#

well in my school i called the algoritm hot and cold

sweet venture
#

How do you create an algorithm that learn how to calibrate his exit

edgy briar
#

its because this algoritm predicts where the thing whill be and if it fail it changes his value

sweet venture
#

its probably easy for you but idk

edgy briar
#

i made this

#

how many inputs of data you have?

#

how many variables?

sweet venture
#

6 parameters p1,..,p6, time t and positions x,y i guess ?

edgy briar
#

in you case it could be actual_potition(x,y,z), acceleration, and speed

#

you are using only x y?

#

ok

sweet venture
#

there is no notion of speed here

#

only position

#

I have to predict the position at instant t

edgy briar
#

and you give multyples "pictures" to calibrate the algoritm?

sweet venture
#

to do it I have to find the 6 parameters

#

or an approach

edgy briar
#

so i tink it would be

#

you have time t and x,y as input data

#

so you have to predict the next input data as exit

sweet venture
#

yes

edgy briar
#

ok and the p1 ......p6 are you hiden layer of "knowledge" in the neurons of the perceptron (wheigts)

sweet venture
#

sry i didnt understand your sentence

sweet venture
#

oh i understand

#

I have a folder yes

#

with some knowlege

#
  • P1,...p6 are included in [-100,100]
edgy briar
#

ok then you normalitation of data is in the range of -100,100

sweet venture
edgy briar
#

you can use some kind of perceptron maker but i tink they doesnt accept values biger than 1 and the normalitation default is crap

sweet venture
#

I rely on you sry

edgy briar
#

im tinking if you can make this work whit out normalize the data

#

and you have only 1 day to finish this?

sweet venture
#

yes

edgy briar
#

or you didint make your homework of the last mont?

#

this is like an work it takes at least 3 days for a new student

#

i can made this in 1 day because im graduated

#

but i tink its too much for only 1 person

#

ok i found the example i was searching for

sweet venture
#

Not gonna lie, I had 5 days to do it >< but still Im completly lost even with 5 days

edgy briar
#

the good thing i this exaple can do the work you want the bad thing is you have to normalize the data whit anoter algoritm

#

ok 5 days sounds more undestandable

#

but still hard and i even wasnt in that class

#

ill search for my perfect and very polished normalitation algoritm

sweet venture
#

Would be more easier if the teacher gave us some example before giving this but nevermind

steady basalt
#

I got perceptron next semester

#

Isn’t it like 20 lines of code or less?

#

Sure only takes 1-2 days

edgy briar
#

yea but this is for tomorrow

steady basalt
#

Rip

sweet venture
#

<

edgy briar
#

ok i found the code but my comand line is deat because i was playing whit some controlators so i have restart some configurations ill be ofline some time

sweet venture
#

ok

edgy briar
#
from sklearn import preprocessing#ayuda a preprocesar los datos de entrada
from sklearn.preprocessing import MinMaxScaler
#input_data = np.array([2.1, -1.9, 5.5],#xyz seeker1
#                      [-1.5, 2.4, 3.5])#xyz seeker2

# input_data = np.array([2.1, -1.9, 5.5],
#                       [-1.5, 2.4, 3.5],
#                       [0.5, -7.9, 5.6],
#                       [5.9, 2.3, -5.8])
input_data = ([13, 50, 200],
              [802, 23, 38],
              [0, 0, 1240])#1240 es el valor maximo para calibrar el resto de numeros inferiores
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print ("\nMin max scaled data:\n", data_scaled_minmax)
print (input_data[0])
print (input_data[1])

#create NumPy array
data = np.array([[13, 50, 200, 802, 23, 38, 0, 1240]])

#normalize all values in array
data_norm = (data - data.min())/ (data.max() - data.min())

print (data_norm)```
#

this is all the code you ill need to normalize ina perfect form any kind of data

#

the bad is you have to manually put the data in the array or change the code to read the data from you .csv file

sweet venture
#

I don't have to use the equations ?

edgy briar
#

no. just enter the data into the input data

#

thats how perceptrons works

#

you give him data and it gives you data in exchange all cooked and ready to use

sweet venture
#

How do I collect the 6 parameters p1,..,p6

edgy briar
#

but that file you have its ahiter the trining data or the usuall data it will manage and the perceptron have to calibrate himself to acmoplish his task

sweet venture
#

Im gonna try it, thx

edgy briar
#

in data = np.array([[13, 50, 200, 802, 23, 38, 0, 1240]])

#

data = np.array([[p1, p2, p3, p4, p5, p6, t]])

sweet venture
#
import numpy as np
from sklearn import preprocessing#ayuda a preprocesar los datos de entrada
from sklearn.preprocessing import MinMaxScaler
#input_data = np.array([2.1, -1.9, 5.5],#xyz seeker1
#                      [-1.5, 2.4, 3.5])#xyz seeker2

# input_data = np.array([2.1, -1.9, 5.5],
#                       [-1.5, 2.4, 3.5],
#                       [0.5, -7.9, 5.6],
#                       [5.9, 2.3, -5.8])
input_data = ([13, 50, 200],
              [802, 23, 38],
              [0, 0, 1240])#1240 es el valor maximo para calibrar el resto de numeros inferiores
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print ("\nMin max scaled data:\n", data_scaled_minmax)
print (input_data[0])
print (input_data[1])

#create NumPy array
data = np.array([[p1, p2, p3, p4, p5, p6, t]])

#normalize all values in array
data_norm = (data - data.min())/ (data.max() - data.min())

print (data_norm)
#

what does input_data = ([13, 50, 200], [802, 23, 38], [0, 0, 1240]) do ?

#

p1, p2, p3, p4, p5, p6, t are unknown

edgy briar
#

well in the actuall state its no goin to work because you have to give values to a p1 p2 ...

sweet venture
#

random values

edgy briar
#

if you want but the perceptron will not lear aniting because they are random

#

i mistaked that its no p1 p2 p3 it has to be x,y,t

sweet venture
#

ok x,y are the equations with p1 .. p6 in it

edgy briar
#

those lines are just to normalize the data into numbers betwen 0 to 1

sweet venture
#

oh x y from the data I already know

#

mb

edgy briar
#

xy are the input data but p1 to p6 are the neurons

#

inside p1 ...... p6 exist values betwen 0 to 1 and those are the knoledge of the AI

#

all that conbinated gives you an exit of data it will be x,y,t of the future

sweet venture
#

well I do have to install sklearn, i do it then i try

edgy briar
#

Yeah

#

I think also numpy

edgy briar
#

ok im back and is very funny the error i haved

sweet venture
#
with open('position_sample.csv') as f:         
    data=np.empty(shape=90)
    lire=csv.reader(f,delimiter=';')                           
    for ligne in lire:                            
        data = np.append(ligne)
#

it doesnt work with numpy I try to find a solution

edgy briar
#

are you using which comand to run it?

sweet venture
#

im using spyder

edgy briar
#

are you using python3?

sweet venture
#

yes

urban prism
#

I have this TF Lite object detection model. It seems like it returns a (1,50000) shaped array. I'm having a hard time understanding it's output. The code:

import cv2
import numpy as np
import tensorflow as tf
im = cv2.imread("/path/to/image")
im = cv2.resize(im, (768, 768))
im = np.array(im,dtype=np.uint8)
im=np.expand_dims(im,0)
interpreter = tf.lite.Interpreter(model_path="/kaggle/input/id-out/model_quant_tl.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_ =   tf.convert_to_tensor(im)
input_tensor = input_[tf.newaxis, ...]
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], input_)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data) # [[65504. 65504. 65504. ... 42720.  4148. 14856.]]
print(output_data.shape) # (1, 50000)

Can someone explain this output_data to me? I'm trying to plot the image with output detection boxes

edgy briar
#

wait im having troubles to test the normalitation algoritm

sweet venture
#

sure

edgy briar
#

i tink this is a old vertion

sweet venture
#

But I think u shouldn't waste more of your time

sharp rain
#

Does anyone know how to do interpolation from two value which is non-linear

#

I have nine equation of different temp, but If i want about 16.5 temp, how can i interpolation between 16 and 18 trend line?

tacit basin
sharp rain
#

I fell tired about handling the missing value in machine learning🄲

tacit basin
#

if its pandas fillna doesn't work?

serene scaffold
#

it might not be as straightforward as replacing them all with 0 or the mean, in their case

tacit basin
#

i see

hybrid mica
#

do optical character recognition libraries use ai?

serene scaffold
#

yes

sand stone
blissful bone
serene scaffold
sand stone
#

wait seriously?

serene scaffold
#

Yes

sand stone
#

I'm sorry then, didn't know. If not recrute, then how about knowing people, or assistance request

serene scaffold
#

You can ask questions related to tts, yes.

sand stone
#

does anyone know a trick to optimize heavy tts architectures to work in realtime and on the cpu instead of the gpu?

sharp rain
#

Do anyone know how to do more better in Optimization-curve fitting, since now my cuvre only have one partten🤣

serene scaffold
sand stone
#

for deploiment

sharp rain
#

My way to do curve fitting, is about using the coef from most data line, and the intercept from them.

serene scaffold
#

How quickly do you need to be able to generate ten seconds of audio for it to be viable?

sand stone
#

for example, if I want to impliment a model for a screen reader, which requires the voice to be extremely responsive, introducing no lag or delay what so ever, no matter how much the size of the text is, just like how responsive espeak, rhvoice or flite are, and those aren't neural based tts systems

serene scaffold
#

How realistic does the voice need to sound?

#

Also, remember that "use", "deploy", and "implement" are not the same things. It doesn't sound to me like you're trying to implement a model.

tacit basin
sharp rain
#

Since, the less data in somewhere, so there are some curve is weird, I have done curve fitting, but not exactly perfect I think , as all the curve shape and coef is the same lol

#

BTW, i think statsmodels is more better, there are more information that give you in summary model

sand stone
#

well, stelerkos, the voice needs to instantly generate the audio and speak it after request. The voice sound is already fixed, once I find a cool voice I just want to be able to optimize it

#

need*

serene scaffold
#

@sand stone you're telling me about the responsiveness. I need to know how realistic you need for it to sound.

#

The more realistic you need for it to sound, the less likely it is that there's any way you could possibly synthesize it fast enough for your use case on a CPU

tacit basin
sand stone
#

no need for overly emotional voices, just a simple reading speech, like reading a story, that kind of intonation. Can I make a model using the realtime voice cloning toolkit to make a voice model with only six or less sentences of myself, then somehow modify the hyperparams to disallow it to vary the speech for each utterance of the same string, I don't want it to change how it reads stuff the more I give it the same text. Then can I take/export that little model and go from there?

serene scaffold
#

only six sentences of yourself? it would take significantly more audio to generate something that sounds realistic even for completely monotone speech.

sharp rain
tacit basin
sand stone
#

stelerkos, I'm not taslking about training a standard tts model with a crafted dataset that the more you give it the better it'll sound, I'm talking about a tts toolkit called realtime voice cloning. It can generate speech with your voice with only a few audio files

#

so I hypothesized that maybe the models generated with that toolkit are much smaller, and much easier to integrate within a realtime requiring application like screen reader use

serene scaffold
#

I don't think it's possible to achieve this with the constraints you have specified. That's probably all I can contribute.

sharp rain
#

@tacit basin All the curve in the same shape, cause the coef is the same, but in reality, the coef is little diferent

sand stone
#

do you even know about real time voice cloning? I'm talking about the toolkit

serene scaffold
#

I'm not familiar with whatever that toolkit is, but I work as a computational linguist and have been on a project for creating TTS systems that are indistinguishable from a real person (without emotion) for several months.

sand stone
#

absolutely not to be rude, but it says in the docs/manual/whatever you call it, that it can generate a tts voice similar to the wave files, with only a matter of minutes of audio

#

so stelerkos, can we voicechat about this?
seams like you have some expertees

serene scaffold
#

I can't voice chat; sorry.

shy pewter
#

hi

sand stone
#

ah no prob, I'm sorry for sounding rude in that message, I absolutely didn't mean to, mister greekson

#

and hi

shy pewter
#

lol people are talking some advance things

serene scaffold
#

I'm not actually Greek. A couple of mods changed our names to greek letters for the lulz.

sand stone
#

lol

serene scaffold
#

Do you have a link for this toolkit? I should look into it. I'm a bit skeptical as to how well it might perform.

sand stone
#

oh I see, well cause I'm learning greek and am interested in it, I can also type english with greek characters if you're interested

serene scaffold
#

We should use Latin letters so that everyone can read what we're saying. Also, how would you transliterate H?

shy pewter
#

what is greek

serene scaffold
sand stone
#

sorry about that, just thought I'd translate quickly

shy pewter
#

im not

sand stone
#

oh, sorry then

shy pewter
serene scaffold
#

ana darastu al-arabiya mataa ana kuntu taalib 3alam al-lughaat

#

(attempted meaning: I studied Arabic when I was a linguistics student.)

shy pewter
#

what

#

ooo

#

nice

sand stone
#

don't type that way, it's confusing, especially the numbers used for letters

#

but nice

#

here's your linky mister stelerkos

serene scaffold
#

I don't use Arabic often or well enough to use an Arabic keyboard.

sand stone
#

yeah I understand, I can speak english just fine :D\

#

oh by the way I apologize again for my rudeness

sand stone
#

oh handyyyyyyyy

shy pewter
sand stone
#

python discord creators, if you're reading this, you're geniouses

serene scaffold
#

.source bm

strange elbowBOT
#
Command: bookmark

Send the author a link to target_message via DMs.

Source Code
serene scaffold
#

looks like it's not clear from the commit history who first created this.

sand stone
#

well it makes a voice that can speak anything you give it, when you provide it an audio file so it learns the voice's characteristics and speak anything you give it provided that the language is supported

#

woah really, stelerkos? but isn't qorintin j or don't know how to spell the name the one who made it?

serene scaffold
#

the command is older than the first commit in that file, so it must have been moved from somewhere else.

shy pewter
sand stone
#

if there is a pretrained model for that language which is supported, yes, it'll generate. I'm not sure if it can do crosslingual voice cloning or not

#

maybe it was cloned from somewhere else and forked and whatnot

shy pewter
#

i had a question. is that possible to make a voice assisstant just like google assisstant but runs on my own servers and has machine learning so it keeps learning for me

sand stone
#

I think there's something called lione or leon or I don't know what it's called, it has many modules that customize it and you can run it on your own servers

lone drum
#

hello i am trying np.where conditio for two conditions

#

my code ```python
whole_new['status'] = np.where(whole_new['time_hm']==whole_new['time_hm'].iloc[0], 'entry', 'hold')
whole_new['status'] = np.where(whole_new['time_hm']==whole_new['time_hm'].iloc[-1], 'squareoff', 'hold')

mild dirge
#

I'm reading multiple things online. Is stochastic gradient descent always updating after 1 train sample, or after a subset of the training data?

serene scaffold
mild dirge
#

Ah alright thx

#

I see it called mini-batch gradient descent for a subset, and distinguished from SGD

#

like here in some course slides

serene scaffold
#

they might be synonyms. one of the downsides of the democratization of data science is that anyone can go online, write an article, and call things whatever they want. there are several concepts in NLP that I want to rename, but I don't want to be the one to dilute the vocabulary.

mild dirge
#

Yeah that's True

#

It's kinda annoying, because at my uni we get repeated the same info many times over, and everyone starts calling stuff different and making different distinctions between names

serene scaffold
#

maybe SGD does usually refer to one sample tangerine_think

desert oar
#

but I think nowadays people only make the distinction for educational purposes

serene scaffold
#

interesting

#

from Wikipedia (emphasis mine):

[SGD] can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).

desert oar
#

yeah

serene scaffold
#

boo

#

now I'll have to figure out what they meant every time I read something that mentions it

desert oar
#

that's why I said "traditional" because in practice that's how people use it

serene scaffold
#

it's not traditional if it's no longer being passed down 😠

desert oar
#

because it's silly and arbitrary to distinguish

#

at least as far as i know, the batchsize=1 case doesn't have any special properties that the batchsize>1 case doesn't have

#

so there's not much point in distinguishing, except if you're teaching the concept for the first time and you want to start with batchsize=1 for simplicity

#

maybe i'm wrong and the batchsize=1 case is special mathematically, but if so i have no intuition for it and i've never heard of it before

serene scaffold
#

for batchsize>1, doesn't there need to be some reducing operation that would be idempotent for batchsize=1?

desert oar
#

which is already a "reducing" operation

#

you are just varying the number of points you use to compute the loss

#

batches smaller than the full dataset are basically approximations of the full gradient

#

also don't forget that it's the gradient with respect to the parameters, not to the input data. so it's not like the size and shape of the gradient vector is changing as you change the batch size

modest shuttle
#

Is Prophet Library a type of artificial intelligence?

urban lance
#

why does something like this work:

df["col"] == b

While this doesn't:

df["col"] in range(50000, 250000)
misty flint
modest shuttle
modest shuttle
south condor
#

Hope it helps beginners šŸ™‚

misty flint
#

plenty of venn diagrams about this

#

as well as debates on definitions

misty flint
#

no sorry

#

im not comfortable dming strangers

modest shuttle
modest shuttle
desert oar
#

"machine learning" is a general category of problem and also a field of study. in general, machine learning is anything where a machine or algorithm learns to perform a task without human supervision by learning from data. statistical modeling and analysis is often used to solve machine learning problems.

"deep learning" is just a buzzword for "neural networks with a lot of layers".

"AI" is anything that mimics human intelligence. not all AI even uses "machine learning", for example an "expert system" with a lot of hard-coded rules could be considered a form of AI. conversely, not everything that uses machine learning is AI.

#

prophet is a machine learning algorithm that uses statistical inference. it is not deep learning. it is not AI by itself, but could certainly be used in an AI-like system if it made sense to do so.

#

@modest shuttle does that help?

modest shuttle
desert oar
serene scaffold
inland mantle
#

Andrew ng is the best at teaching ML

#

I do not think anyone else is a better teacher

misty flint
#

you should check out his newest initiative

#

data-centric ai

#

pretty dope

misty flint
#

there was also an IEEE article i think

#

if thats more your cup of tea

#

ā€œIn many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn.ā€

desert oar
desert oar
misty flint
#

actually bayesian methods are gaining in popularity

#

idk if its just hype or what

desert oar
#

i think theyve been gradually on the rise across fields

misty flint
#

since the one bayesian podcast i listened to said many of these causal methods have only been applied to small datasets and they werent really used for these use cases

desert oar
#

i also think that theres still a lot of active research and development on the techniques

#

right, exactly

misty flint
#

yeah its needed too lol

desert oar
#

i think there's still a lot of "todo" work in bridging between the traditional social science applications of causal inference and causal ML

misty flint
#

like where do the statistical assumptions fall apart and how robust are they, etc.

desert oar
#

i think it's also gaining popularity because the promise of bayesian inference is precisely what people are looking for

misty flint
#

interesting interesting

desert oar
#

today's machine learning techniques are in a similar position to traditional statistical frequentist techniques

#

there is no coherent framework right now for making principled decisions that incorporate uncertainty, measurement error, etc.

#

the promise of bayesian inference is such a framework

#

the need and desire for such a thing has been growing over the last several years, with all the media attention about "bad AI"

misty flint
#

thats true

desert oar
#

and in the last ~10 years the software and math methods for doing bayesian inference have improved by a huge amount

misty flint
desert oar
#

nice, i never even saw that

misty flint
#

yeah its pretty dope

#

obv you have to set your assumptions beforehand

#

but thats something you can work with a SME/domain expert with

desert oar
#

i remember working through the beginning of the koller coursera course on probabilistic graphical models but never had the time to commit to the full course because i was busy with my actual undergrad classes

#

now it's not free anymore 😢

misty flint
desert oar
#

i feel like podcasts have too low of an information : content ratio

misty flint
#

the hosts each specialize in different bayesian schools

misty flint
#

its def not a sit down activity

desert oar
#

its like they are the worst of both worlds, i am sitting there trying to learn something but it's 50% casual banter, or i'm trying to hang out and not think about work but it's 50% work talk

misty flint
#

nah dont use it to sit and learn something

#

thats what courses are for

#

this is more just eavesdropping on peoples' convos

desert oar
#

yeah i can't do the thing where i listen to podcasts about data science while i'm commuting, i need to not think about work in my free time

misty flint
desert oar
#

i burned out once, it sucked

#

not going there again

misty flint
#

understandable

#

that was me in my previous life/profession

#

but theres also podcasts about everything

#

may or may not be addicted to them RunFail

#

one of my faves is 99% invisible

#

its about architecture and design DoggoKek

desert oar
#

oh ive heard of that one, didnt know what it was about

misty flint
#

yeah its pretty interesting since we dont typically think about the physical world around us

#

at least i dont until that podcast lol

inland mantle
robust jungle
#

does anyone know if there is a good method for video recognition? Example: You want to find out if a person is tossing a ball upwards. Dropping the ball and holding it in place do not count

#

could you just stitch the frames together?

mild dirge
#

you probably want some recurrent convolutional network

#

Or maybe if you can locate the object, you could just use the bounding box coords to see if the ball is going upwards or downwards

lapis sequoia
#

How can I reduce the length of bars in an histogram. Basically, change the scale of Y axis.

iron basalt
# misty flint my bayesian knowledge comes from this podcast: https://podcasts.apple.com/us/pod...

I think it's best to first learn about classic Bayesian methods like belief networks and belief propagation (https://en.wikipedia.org/wiki/Bayesian_network https://en.wikipedia.org/wiki/Belief_propagation ). Then if you want to incorporate causal methods you can learn about structured causal models (https://en.wikipedia.org/wiki/Causal_model ). The key idea is to understand the ladder of causation (also Judea Pearl's work in general). (https://arxiv.org/pdf/2106.07635.pdf if you want to go further)

A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of...

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node (or variable), conditional on any observed nodes (or variables). Belief propagation is com...

In the philosophy of science, a causal model (or structural causal model) is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.
They can allow some questions to be answered from existing...

robust jungle
mild dirge
#

Not really experienced with sequential data tbh, so I'm probably not the best person to answer :/

robust jungle
#

appreciate it

robust jungle