#data-science-and-ml
1 messages Ā· Page 391 of 1
Ah shucks
I know for logistics, if its negative ---> ok prob 0, if its pos ---> ok prob 1.
Do you have an overall understanding how Adaboost (or Bagging Boosting Ensemble) models work overall?
I'd like to think so!
I guess this gets to the interpretability part, where adaboost def brought up my predictive power, but I cant turn around and tell someone "yeah sex really matters, idk which way but it matters!!" ?
oh wait, I misspoke - Adaboost is a Boosting ensemble, not a bagging ensemble
Side question -- Adaboost still bags in the sense that we're resampling each time?
in a nutshell,
you make a model that underfits the data, with a high error
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
you make another model, that tries to predict the previous model's error, but also (intentionally) under fitting
...
I'm not sure enough to answer that confidently
The sklearn documentation describes the feaure_importances_ as
The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance
so you could just say "This has a Gini score of xyz", but it is a fairly arbitrary metric.
It is not a simple decision function, thus it is not easy to say how much "weight" exactly each parameter has
Interesting
So for boosting its like these high important variables reduced the iterative error the most across models (or something like this)
According to the sklearn documentation for AdaBoostClassifier:
An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
Looks like it doesn't resample on each iteration.
Interesting so AdaBoost takes the whole sample given and works with it
Whereas Randomforest would bag
?
Yup, I believe so
Cool! Ty
I guess my initial question still stands (if it even makes sense), if I wanted to explain this to someone, how can I tell how the features are important for classification?
Like is it older ages that are important for being adopted or younger ages?
Conceptually, it's neither. In your case, age is a significant factor for predicting adoption, but it does not say whether older animals are adopted more than younger animals. Something like Shapley values may give you that information.
interesting ill look into it, thank you!
May I ask a final question to you? No worries if not!
Sure!
Awesome ty! So i've been practicing decision trees as you can tell and I built my own dataset using adoption data. I started with a simple decision tree and found an accuracy of 72% (I just testing depths of 1 thru 10 and picked the best). After I used the SAME training data and made random forest and boosted trees, both get to 71%, but they never do better than my first simple decision tree. It makes me think I messed up, how could the simple single tree out preform all these complex models that I fine tuned?!
that's a really hard question to answer without seeing all of the data and code... are those accuracies from the training or hold-out data set?
Hold out dataset
like at the end of the day its a percent difference, but still
Like I did a train_test_split, trained over train, predicted over test
@gilded bobcat yeah, it could just be that you have a ton of unexplained variance, but try looking at other performance metrics (recall, F1, ROC AUC, etc) and your class balance
I'm also a little suspicious that one feature has 10x higher importance than all of the others
Yeah... If I build out a tree of depth 3, then only Age is the important feature.
Maybe I need to take a step back and start with a very simple model and try over
That's a lot of tool š
for now. people are saying there will be consolidation in the future but who knows 
its not even all of them in the pic 
That's even more tools lol
yep yep you found matt turck's chart
he does this like every year
and then you can see which AI startups die and stuff

what is horizontal ai?
Oh basically like specific field vs multiple fields ig
ugh i feel like i cant decide what i like/want to do
traditional DS work vs. DE vs. MLE vs. some flavor of SWE with ML

i hope i can decide by end of summer
when i graduate 
does anyone know why they say you need 2-3 data engineers for every data scientist?

like i get that there needs to be ETL and data warehouses for DS but like
it seems like a 1:1 ratio might also be fine?

Recently looked at some code I found on kaggle using Pytorch. they made a model for image classification. Was wondering if there was a bug in this code:
https://paste.pythondiscord.com/ajepebanug
They define self.pool2 = nn.MaxPool2d(2, 2) once in the init. In the forward pass they use it multiple times. Is this incorrect, as it basically trains the same layer between multiple layers?
I assume you have to make a new MaxPool2d instance for every pooling layer you use in the network right?
there's nothing to train/learn, so it's probably just for convenience
@mild dirge āļø
Yeah, it wasn't really a tutorial, more so a code showcase. But we copied part of the architecture design. Spent like 2 hours making a diagram of it:
looks nice, what application did you use for the diagram?
hello
is there a way I can combine all of this to make it simple instead of making multiple dataframes
df1 = working_df.drop('Embeddings', axis = 1)
df2 = df1.sort_values(['Match %'], ascending=False)
return df2[df2['Match %'] >= 40.00]
Hello, I have a dataframe that has two columns "Names" and "Original Name", i have 5 original names that are NaN, how can I apply a lambda function (or any) to assignn the missing data in original name to be equal to the Name??
for example a Russian name spelt in English is there but in the "Original Name" it's spelt in Russian text -> but it's missing. How do I tell the Original Name to be set to the "Name"
```nan_in_col = data.apply(lambda x: data['original_name'] = data.name if data[data['original_name'].isna()] == True)````
figured it out
nan_selection.fillna({'original_name':nan_name.name })
Yeah you could chain all these methods
out_df = (working_df
.drop('Embeddings', axis = 1)
.sort_values(['Match %'], ascending=False)
.query("'Match %' >= 40.00")
)
Something like that. Didn't test it. On mobile now
thank you so much
If I have a classifier getting 5% accuracy on a data set I know to be binary 50/50 splitā¦.
Surely itās then detecting something
Is there a likely reason or is this just a fluke?
can we make a bot which knows everything?
like an active data inserting sort of feature so it learns?
hello, off topic, but does anyone know if kaggle has dark mode, they only have it for their jupyter notebooks
how can I create a canvas 35x35 pixels in python because I have a task to create a paint canvas that is 35x35 pixels only is it possible to create a larger canvas that is 35x35 pixels only ?
I want to create a vowel recognition using with image processing
Yep that's annoying. I use dark reader extension for dark mode everywhere. But with kaggler notebooks you don't see cursor wit it. So there the notebook dark mode which is fine but the the other kaggler pages are not dark. :(
yeah, its like a flash bang when I switch kaggle notebook and the courses
XD
I hate it too. Most likely there's some way to add custom css to dark reader to show notebooks properly. Then that would be a nice solution.
I see. okay, thanks for answering :D. I'll try looking
hi guys
so i am getting a new Windows laptop
i have been studying python for a month, and today learnt about using Conda to create virtual environments
so when Im installing Anaconda freshly, should I or should I NOT, select to enable/install Conda to PATH
same question for Python installation, should i enable Path during 1st time install?
because I can install it later anyway while creating virtual environments, no?
yeah, you want to add anaconda and python to path, so you can launch these from any directory. most likely you will not use system python a lot if you use conda, but it stills good to have it on path i think.
ok thanks
another question
if i have installed conda, and specific python, say 3.7
and wanna code in pycharm/vscode
how do i ensure that it is using 3.7 from my virtual environment, and not the main python, say 3.9
in vscode you can select which python install you want to use, the same in pycharm. it's fine.
oh cool. what about the packages
if i installed numpy in VE1 and pandas in VE2,
how does pycharm know which VE to look for, for the pandas package?
It gets confusing when ur in mac haha
if you need two packages for one say app, you need to install them in one env. they are two camps, some people use just base env and install all there, it's good for interactive development as all packages are available.
some ppl crate separate env for each project, this is good for production i guess. but both options are fine.
I canāt keep track of direct
I only launch out of conda
For this reason
Maybe if I get a new computer Iāll do it right
why is it confusing?
no. my question is different
say i have created a Virtual environment - VE1 - which has numpy package and Python 3.7
and another environment VE2 - which has pandas, and python 3.5
so when i open Pycharm
how/where do i select so that My working environment is VE1
Hey guys , I am currently reading "Hand's on Machine Learning" book by Aurelion Geron and I have planned to share the things I have learnt in the book , in this notebook I have added some important things I learnt in the book along with the code. Please give your honest feedback and should I continue with the upcoming chapters ....
Hi guys can someone please look at this and tell me what am I doing wrong?
you'll get better answers when you share text as text rather than as screenshots. I suspect that one of train_images or train_labels is an array with arrays as elements, rather than a proper n-dimensional array. If you print the arrays out and show them, I can give a more informed answer.
I can't help, unfortunately.
why?
how to fix it?
convert to float 32 lol
theyre rgb values of 0 to 256
they should be normalized/standardized
and close to 0-1 range
!paste stelercus was asking for actual text, and not screenshots. see below for pasting code and other output:
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
it's expecting a lot of people to ask them to squint at a tiny screenshot and manually type out dozens of numbers
train_images = np.array(train_images, dtype=np.float32)
not working
what does it mean?
tf
what is train labels?
did you convert those too?
pretty sure that needs to be an array of size n_labelsx1 (2d)
currently it's an array of size n_labels (1d)
how can i fix that ? š
seems that 1d tensor can work too, so should be fine I guess
what error are you getting after converting the train images to float32 in range of 0 to 1 ish?
TypeError: only size-1 arrays can be converted to Python scalars
yeah
that means your trainining images aren't all the same shape
are they all rgb or some grayscale?
sorry for bothering you for this I m new to all the deep learning stuff and it is hard for me to understand the dimensions staff
Yeah it's confusing haha, reading a whole book on pytorch rn, just finished the tensor chapter
they all rgb not so sure about the size i will fix it
oh lol
The error is because it's isn't a rectangular multidimensional shape
Like some rows or columns are different size
For instance when some images are grayscale (1 channel) instead of rgb (3 channels)
What datatype is train_images?
a list of ndarrays?
I used cv2 to change all the images to rgb ones to I think that the problem is the width and height
They must all be the same width and height too yes
ndarrays
type(train_images) is ndarray?
yep
mhm yes thanks you pycharm.
WORKING!!!! I was just needed to fix the width and heights for all images
nice
thanks @mild dirge you the king
i will thank you
Hi! I wanted to make an multi agent simulation kind of thing so can anyone help me with what library or application should i use? I would like my ai code to be in python
multi agent simulation kind of thing is pretty broad haha
Guys, good morning, could someone help me, I'm trying to replace missing value of a specific column, but it doesn't replace, I tried other methods, but when I check with isnull, it still shows missing data.
# take the average of the values 0 and 1 of Age
media_idade_benigno = mamografia.Idade[mamografia['Severidade'] == 0].mean()
media_idade_maligno = mamografia.Idade[mamografia['Severidade'] == 1].mean()
print(media_idade_benigno)
print(media_idade_maligno)
#fill in missing data
mamografia.Idade[mamografia['Severidade'] == 0].fillna(media_idade_benigno, inplace = True)
mamografia.Idade[mamografia['Severidade'] == 1].fillna(media_idade_maligno, inplace = True)
#check for missing value
mamografia.isnull().sum()
Well i am kind of new to ai and ml so if you have any other kind of projects i could start with?
Found this. This course covers everything you need to know for Python and it's libraries : https://t.co/H6Jqf7wLY2
I have a tflite model and to be honest, I'm already having an issue understanding how to run inference with it. When I load up my tflite model and call interpreter.invoke() as in TFLite docs, my code stops running. Cannot even print. Thought this might be a local issue but I get the same result in Kaggle
Hi I have a question: In this code, what the meaning of {i+1:4}?
It is to show the number of the current epoch but padded to fit 1000 in there. @bold timber
You can try running it yourself.
i = 0
print(f"Epochs: {i + 1 : 4}")
i = 9
print(f"Epochs: {i + 1 : 4}")
i = 99
print(f"Epochs: {i + 1 : 4}")
i = 999
print(f"Epochs: {i + 1 : 4}")
Epochs: 1
Epochs: 10
Epochs: 100
Epochs: 1000
as an undergrad, I helped develop a library with a CLI where you can specify the location of the training data and all the hyperparameters at the command line, and it trains a model and saves it. and then you can use the same CLI to load that model and predict. and several people told me that this was remarkably well-packaged. I didn't believe them at the time, but now I realize how low that bar is.
like, the bar is on the ground, and anyone can just roll around on it.
>>> f"{3:4}"
' 3'
it's the f-string formatting thing. i+1 is a Python expression, : means "this is the beginning of f-string formatting stuff", and then the 4 is leading whitespace. I found this confusing myself because I thought the order of operations was i + (1:4)
it helps to remember that : is not a valid python syntax construct, outside of indexing with []. so the : is used to separate the python expression from the format expression
hello
Hello everyone, I have a quick and probably silly question: I'm loading a colored image from COCO dataset through pytorch's DataLoader, then I try to plot it, and instead of using 3 channels to compose a colored image, it plots every grayscale component 3 times along every axis. How to fix that?
I'm probably missing something obvious
apparently matplotlib expects the shape to be (width, height, colordepth) https://stackoverflow.com/a/28917091/2954547
so you need to reorder the axes
maybe use torch.permute https://pytorch.org/docs/stable/generated/torch.permute.html#torch.permute
thank you very much, i will try it
for plain numpy, you have np.moveaxis and np.transpose https://stackoverflow.com/a/57438393/2954547
hi
ok, thank you so much
didn't help unfortunately, the array has the correct size but the image is even more obscured
hm, looks like the data was improperly reshaped
maybe just mess with it / experiment on smaller arrays so you can print them out to see what's happening
that's pretty much what i do when i need to do complicated array manipulations
thanks, i'll try to do that
and maybe some other question, regarding pytorch DataLoader: COCO images have different sizes, so I can create batches of size 1, but with the higher batch size it just can't concatenate
what's the right way to work with this and not reinvent the wheel?
@desert oar hello i was reading about this instead of cross entropy losspyscce=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False,reduction=losses_utils.ReductionV2.AUTO,name='sparse_categorical_crossentropy') scce(tf.Tensor([0 2 1]), output, sample_weight=tf.constant([0.3, 0.7])).numpy() i am not getting what i should put for y_true and y_pred i keep getting an error
how can i put y_pred before the model has been compiled?
i have made an opencv file to detect face i want to send the name of the faces detected into django how can i do it.What my open cv does is i have some pre loaded images and it detects it and appends the name into a list i want that data in my django app to save it in a server how do i do that
Does anyone here know how to use Tensorflow 2 Object Detection API? I'm desperate at this point
@urban prism try asking your question
I'm trying to get a basic untrained transformer to use for sequence prediction and I've got the pytorch example working
but once it's trained, I can't get it to predict
in their code, they forward pass through the transformer like this
output = model(data, src_mask)```
and then that outputs a matrix
of shape [9, 20, 28782]
I know 20 is the batch size
and 28,782 is the vocabulary size
if I have 305 rows, each representing astronauts, how do I plot each of those on a x axis without being over crowded?
but what I want to do is pass it text, and have it complete the text with a prediction
It was some time ago when i used pycharm but there was option to select conda env for sure
so my thought was I could pass model() a string and then cover it with a mask for however many characters I want it to predict
and then convert the resulting tensor into a string
Which columns have nulls? Did you replace all nulls?
Any advice on how to deal with 111 Names that need to be plotted?
can someone give me some potential ai research topics or questions for an extended essay (research essay of 4000 words) intermediate level
So the code I made runs normally. But when you confirm that you performed a swap with isnull(), it returns that it still has missing values
If you want to send the notebook, I'll just let you know that it's in Portuguese
I'm using TensorFlow Object Detection API and had exported the model to kaggle/input/id-out/fine_tuned_model. I'm trying to run evaluation on my saved_model. Unfortunately I cannot run model.evaluate(generator_object) and it seems that I have to use !python /kaggle/models/research/object_detection/model_main_tf2.py --pipeline_config_path=/kaggle/models/research/object_detection/efficientdet_d2_coco17_tpu-32/pipeline.config --model_dir=kaggle/input/id-out/model/train --checkpoint_dir=/kaggle/input/id-out/fine_tuned_model/checkpoint --run_once=True in my terminal. Evaluation part of the configuration file looks something like this
label_map_path: "/kaggle/input/id-out/labels(4).pbtxt"
shuffle: false
num_readers:1
num_epochs: 1
#queue_capacity: 150
#sample_1_of_n_eval_examples: 80
#sample_1_of_n_examples: 80
#min_after_dequeue: 100
tf_record_input_reader {
input_path: "/kaggle/input/id-out/tfrecs/file_??-512.tfrec"
input_path: "/kaggle/input/id-out/tfrecs/file_17-176.tfrec"
}``` (Commented out parts because I have been trying different flags to make it work)
TFRecords files have 8880 samples in total. Running the Python command mentioned above makes my memory load up to 16 GBs and then it restarts the kernel stating that the notebook tried to allocate more memory than available. How can I make it evaluate all my samples without overloading the memory?
Last part of the output of that command:
I0329 14:20:12.925074 139994487641920 model_lib_v2.py:966] Finished eval step 0
W0329 14:20:13.111861 139994487641920 deprecation.py:339] From /opt/conda/lib/python3.7/site-packages/object_detection/utils/visualization_utils.py:617: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
I0329 14:57:09.680602 139994487641920 model_lib_v2.py:966] Finished eval step 100
I0329 15:33:06.951502 139994487641920 model_lib_v2.py:966] Finished eval step 200
I0329 16:09:41.167769 139994487641920 model_lib_v2.py:966] Finished eval step 300
I0329 16:45:51.399167 139994487641920 model_lib_v2.py:966] Finished eval step 400
I0329 17:19:54.271977 139994487641920 model_lib_v2.py:966] Finished eval step 500
Then memory allocation issue happens
thanks. will check it out
Does anyone know how I could adjust my code to have each plot represent a country? Right now it produces the same plot each row.
sns.set(font_scale = .5)
# flatten axes for easy iterating
for i, ax in enumerate(axes.flatten()):
# sns.boxplot(x= data.iloc[:, i], orient='v' , ax=ax)
# plot astronauts who have spent more than 4380 hours (6 months) in space or more
bx = sns.scatterplot(x="name", y="total_hrs_sum",
hue="nationality",
palette="Paired",
sizes=(10, 200),
linewidth=0,
data=long_hours,
ax=ax)
long_hours['nationality'] has 10 values
Does someone know how to get into a generations deep learning like that mario ai that learn from every death?
wdym that 111 names needs to be ploted? what's on x and y axes?
so y axis a count of how many hours each astronaut spent in space
what is swap with isnull?
X axis I'd like the names to be
Could u do a super condensed bar chart lol
With thin bars and tiny tiny writing
Put the text at a 80 degree angle maybe
Histogram
do you want to print all of them? just make the graph looooong. or you may want to print pareto like with other names for small number of hours
Iād use thin bars and small angled text
you can also print vertically
To be honest itās more practical to have a sorted table
Going from high to low hours
Good idea
this is minitab, but i am sure in matplotlib you can do similar, so you start with hightest number of hours and once you reach say 80% cumulative you group rest of the names as other
I have a cudatoolkit 11.1.1, but why it can't be used?
I'm down for that, but how do I do it with spacing and all that
Would a cert in deep learning and machine learning be needed because I have no experience
like needed for what?
For an internship at the least
Can school projects be allowed on resume
school projects like?
I got a fairly interesting one in an AI music startup, and that was through projects alone
Would you recommend undergraduate research on deep learning
if you type !nvidia-smi what do you see?
Or is it too advanced
undergrad research is chill
you aren't really expected to do wonders in NIPS or top-notch conferences, publishing papers itself becomes a pretty sizeable achievement
I'd say its great for an internship, but often engineers don't have the acumen to read papers. whereas projects look pretty interesting and engaging to them
Oh Iām doing several rn
I need to finish the hard ones but Iām only in hs
When would you recommend me to start looking for an internship
And get one for the matter
start now and see what feedback you get and work on that
Same for masters lol
after creating a polynomial regression model with sklearn, can you see the polynomial formula generated?
I assume you can use np.linspace to generate some points, classify them and draw a line through it? @hybrid mica
does sklearn offer a method for that purpose though?
It would literally be 1 line to generate the points, and 1 more to draw a line through it
scikit-learn doesn't generally offer high-level convenience methods for analyzing fitted models
If Iām not going to learn R, is there a similar stats package for python?
Sklearn breaks a lot of stats rules
Sklearn breaks a lot of stats rules...? In what way?
statsmodel has stats in the same so you might like it
jokes aside, maybe do check it out
Statsmodel is great, agreed.
for the formula itself, you can get the coef_ and intercept_ of most if not all linear models, and I think that some transformers support methods to revert the transformation but I am not sure if that applies to all
just generating a linear space and plotting it is pretty standard though
Yeah; I agree, this is not as nice as R stuff, but it works pretty well. https://scikit-learn.org/stable/modules/learning_curve.html has an example of how it might look if you plot it with matplotlib.
[Code: https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html]
This is not exactly what you want, but it's sort'a close.
oh hey, that was finally merged!
https://github.com/scikit-learn/scikit-learn/pull/16061
(mentioning it since it's a plotting thing - mostly if not exclusively for categorical values though iirc)
(5 hours ago)
https://scikit-learn.org/stable/inspection.html might be worth checking out
I have a TFLite model. Everything seems to work until I run interpreter.invoke(). Then the code stops running. The kernel doesn't interpret the lines that come after it. What might be the issue?
Thanks for this link etrotta, that was an interesting read and I didn't know you could do some of that!
so I have this in a very simple keras model
model = Sequential()
model.add(keras.layers.CategoryEncoding(num_tokens=47, output_mode = "one_hot"))
model.add(Dense(47))
and i'm trying to feed in a series of length 2 arrays into the one-hot encoder, with the goal of getting 2 entries of length 47 into the next layer (i also changed dense to 94). however im getting the error
When output_mode is not 'int', maximum supported output rank is 2. Received output_mode one_hot and input shape (None, 2), which would result in output rank 3.
does anyone know how to resolve this?
like this
the pytorch transformer example code has this function that turns a string into a tensor
def data_process(raw_text_iter: dataset.IterableDataset) -> Tensor:
"""Converts raw text into a flat Tensor."""
data = [torch.tensor(vocab(tokenizer(item)), dtype=torch.long) for item in raw_text_iter]
return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))
i wouldn't say it breaks stats rules, but it's not really oriented for statistics users. at least not in the way R/Statsmodels are
but there's no reverse function that turns the tensor back into a string
and the transformer outputs tensors, not strings
so I need to convert them if I want it to generate predictive text
it looks like this is one-hot encoding words. so you can get the word by finding the index of the column that has a 1 in it, and then using that to look up the word in the vocab
this is torchtext i assume?
yep
it looks like you can use vocab[i] to get the ith word
so for some one-hot encoded vector v, something like vocab[torch.argmax(v)]
https://pytorch.org/text/stable/vocab.html i am looking here
it says that the Vocab class supports __getitem__, which means it supports indexing with []
the problem is i don't think it's one-hot
if I pass it "a"
it returns
[8]
"abc" returns
pass to what exactly?
[ 8, 458, 378]
that sounds like you messed up and it's one-hot encoding letters
the data_process() function
as in, you passed it a string instead of a list of strings, or some equivalent
oh
the curse of strings being iterable
lol
not to be downer, but I've often seen masters students publishing in conferences
or maybe that's just me. Eric Hallahan from eleuther from instance, is an undergrad
They have but itās not required
if I have a list of countries how do I make a for loop to plot each country as subplots?
for i, ax in enumerate(axes.flatten()):
# sns.boxplot(x= data.iloc[:, i], orient='v' , ax=ax)
# plot astronauts who have spent more than 4380 hours (6 months) in space or more
bx = sns.scatterplot(x="name", y="total_hrs_sum",
hue="nationality",
palette="Paired",
sizes=(10, 200),
linewidth=0,
data=long_hours,
ax=ax)```
ok I'm confused. if I do this
vocab(['certainty', 'the'])```
I get
```py
[12272, 1]```
but then why can't I do this?
```py
vocab[[12272, 1]]```
TypeError: __getitem__(): incompatible function arguments. The following argument types are supported:
1. (self: torchtext._torchtext.Vocab, arg0: str) -> int```
nvm I realized it was vocab.lookup_token() that I wanted
Can someone explain why I get this
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,46] and labels shape [1472]
when im trying to run a very simple model
model = Sequential()
model.add(tf.keras.layers.Flatten())
model.add(Dense(94, activation = 'relu'))
model.add(Dense(46))
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam')
model.fit(to_categorical(inputs_array), to_categorical(outputs_array), epochs = 5, verbose = 1)
actually how do i write this as code markdodwn in discord
never mind forced it to work
ofc its not, but that still shows the excellence of a candidate
if I have 10 countries and want to plot each country unique stat how do I do this instead of manually doing it?
belgium = long_hours[long_hours['nationality'] == 'Belgium']```
is there an easier way?
@thin palm could do long_hours['nationality'].value_counts() ?
maybe, unless you have other columns and that's why you're filtering the whole df
well if I plot it'll put all the countries on one plot
what's more interesting to the data science community? A Graph showing astronauts who spent 6 or months or more on missions showing the total amount of time each astronaut spent in space and outside the space shuttle.
OR
each country's male v women time spent on astronauts?
^
"Assuming that the data collector makes an entry error when collecting data, it can
be ensured that the error occurred in the Price and Power columns, but it is not
sure which carās information the error lies on. Please try to explore the error by
visualization to identify how many errors there are and try to fix it."
Hi, what plot can I use for this?
is it correct practice to work out standard deviation of an average column. e.g. i have 4 columns for for different people data and 5th column is average column of those 4. can i still work out standard deviation of column 5 - does it represent SD of all patients
Take the SD of all datapoints then
I'm trying to evaluate my Tensorflow Object Detection API model. My config file's eval_config is something like:
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
batch_size: 1
num_examples: 8880
}
eval_input_reader: {
label_map_path: "/kaggle/input/id-out/labels(4).pbtxt"
shuffle: false
num_readers:1
num_epochs: 1
#queue_capacity: 150
#sample_1_of_n_eval_examples: 80
sample_1_of_n_examples: 80
#min_after_dequeue: 100
tf_record_input_reader {
input_path: "/kaggle/input/id-out/tfrecs/file_??-512.tfrec"
input_path: "/kaggle/input/id-out/tfrecs/file_17-176.tfrec"
}
It has 8880 images and when I try to run it with `!python /kaggle/models/research/object_detection/model_main_tf2.py --pipeline_config_path=/kaggle/models/research/object_detection/efficientdet_d2_coco17_tpu-32/pipeline.config --model_dir=kaggle/input/id-out/model/train --checkpoint_dir=/kaggle/input/id-out/fine_tuned_model/checkpoint --num_workers=1``it seems like it only evaluates with 56 images, given the output: https://paste.pythondiscord.com/upuzinusat
How can I arrange this in a way that it evaluates with all 8880 images and doesn't run out of memory -which restarts the Kaggle kernel that I'm working on
what about vocab[12272]? glad you found lookup_token() though
what would you do if a column had only 1 missing value?
investigate why it is missing
if it would be simple enough to, find the actual value and fill in manually
if it's an input error, it wouldn't be viable to get the real value, and that data point does not seems to matter all that much, possibly drop it, but mention that you did so somewhere
well its a take home assigment on astronauts data, and I could technically google their history and find it but this seems a bit uneccessary
are they expecting you to do something like median imputation?
well it's an object so I did 2 different things. One column I did a 'most_frequent' and another missing column I just found the information online since it was accessible
what do you mean by "an object". do not conflate pandas data types with the actual type of the data
that's a bad newbie habit
object representing Text
right. it's categorical data
gotcha makes more sense
a lot of beginners learning data science from a "programming-first" perspective forget that they are actually working with real data that represents real things, not just pushing around 1s and 0s for fame and glory
it's good to avoid getting caught in that. data is data, code is code. we only use the latter in service of the former.
well I'm working with 2 different data sets and part of the assigment is to create meaninful insights and so far i've done that! Now working with the second data set which is a bit more vague
good! and it sounds like you've made progress too
If i have 77% of missing values on one column, normally I'd drop this (30% is my threshold) but this is valuable column. It's showing the cost of each space mission. What would you do?
I would for sure do a mean to give each company an idea of how much each mission costs.. but 77% is way too much
setting a threshold for missingness across all fields usually only makes sense if you have a lot of features and don't have the time to read about and understand each one, or you lack information about them and there's nothing else you can do
i thought you were talking about a field with one missing value
Correct, but this is a new data set š
would it be safe to say that I can impute the missing values with a mean? I like this because it's the cost in millions for each space mission.
wait, the field missing 77% of the data is the cost of the mission?
i suppose you can impute that with the mean, but 77% is a lot. you have to start to wonder if the missingness is non-random, e.g. is the missingness spread unevenly across countries, years, etc.
if all the usa missions have a cost, but all the russian and japanese missions do not, then you're basically extrapolating from the us space program to the russian and japanese space programs
is that a valid thing to do? hard to say without knowing more about space programs
maybe the cost is a function of duration, or the mass of the payload, or the type of mission (shuttle, soyuz, etc)
the launch vehicle and launch location probably are really important factors, etc
tldr you can impute with the mean, but alway keep in mind what you are losing by doing something the "simple" way
but maybe it doesnt matter! it's probably better than throwing out the field entirely... or is it? depends on what you even want to know
maybe mission cost is something you might want to try to forecast/infer with a model
why would launch location and detail of the rocket matter so much? I just don't see how I can make an EDA on this data.. because my best bet was how much each company spends on the mission and if the mission was a success or not
the goal is to just tell them a story and provide analysis of the data given.. no models are needed (though I could go above and beyond to make one)
just guessing at ideas, really. the point is that you want to constantly stay focused on the data as data, avoid algorithmic mechanical thinking in tasks like this, unless it's something that needs to be automated
no no these are solid ideas!
This guys got it. If you miss 77% of data in a field which is dependant on different economies itās super unreliable
How to get output from an open cv programto other python program
Does anyone have experience with plotly?
I'm trying to make a bar chart where the height of each label should be corresponding quantity.
Instead everything is 1 high as each label is unique
You can save image to disk and read it with other script
In my program it recognises the faces and puts their name in a list i want that list in my other program so how to do that
you can just export the list as a file and then read it in your other program
Is there any help-channel or something alike for spark issues ?(silly question maybe but I'm new here)
@lapis sequoia you can ask about spark in this channel
Hello, please stick to the current topic of discussion instead of making irrelevant posts. Thanks!
sorry
Ok, well. I'm trying to write a Spark code to make a pictures classification. I managed to import them, make the train/test databases in dataframes, import the spark-deep-learning package from databricks and it crashs when I call the fit(train_df) in the pipeline. It seems to be a py4j issue : Py4JJavaError: An error occurred while calling o51.transform. : java.lang.NoClassDefFoundError: org/tensorframes/ShapeDescription (and a loooong succession of cryptics lines follow this first one).
i want that information in my django project tho
is it a missing package?
I doubt. raphael@ubuntu18046:~$ pip3 show py4j
Name: py4j
Version: 0.10.7
Summary: Enables Python programs to dynamically access arbitrary Java objects
Home-page: https://www.py4j.org/
Author: Barthelemy Dagenais
Author-email: barthelemy@infobart.com
License: BSD License
Location: /home/raphael/.local/lib/python3.6/site-packages
I do have py4j installed on my machine š¤·
And Java as well. raphael@ubuntu18046:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
Hello everyone, can you please consult me on the topic of semantic/instance segmentation? I want to build a pytorch model that produces masks for every object class on the image (whether seperate "instances" or just "semantic" classes, doesn't matter). What would be the easiest architecture to achieve such masking? I scanned through the code of the known solutions (mask-r-cnn, yolac) but was scared off. I wonder what are the solutions that are easier to understand and replicate. Thank you very much.
no idea... good luck 
then you can deploy the inference part of the application as a rest api, your django project would send the request and get the result back from the response
I already tried a lot of stuff but thanks anyway. Any encouragement is good to take.
And maybe, one day, I will finally nail it.
if it makes you feel better, it seems like a java problem and not a python problem
Well, I tried with java 8 and java 11 (with respectively the proper path directory given in the .bashrc) and in the two cases the same error occurred. As far I know there is no other versions available.
ok thanks a lot
i mean it might be a problem with the java package setup
i have no idea how that stuff works
How do you make subplots of multiple dataframes with plotly (like this)
online I only see code to do this with the columns of a single dataframe
what is your data model? why are there multiple dataframes, and what is in them?
I have a dictionary containing the dataframes
each dataframe is a sub dataframe of the original (grouped by the values of a column)
For instance: all fruits are in 1 dataframe, all cars are in another dataframe, all animals in another and so on
are the columns the same for each of them? if so, it would be more idiomatic to have one dataframe, and have each of these groups (cars, animals, etc) as a level of indexing.
I want the plot to tell something about each of the different groups over time
so is each group going to be a different color?
yes
like in the image above
except the code I have is for columns of a single dataframe rather than for each dataframe
so you're aggregating each group in some way? because this only has one line per color.
nope, just like this
df_rent = dataset[dataset["type"] == "RENT"]```
df_rent would be one of the sub dataframes
I looped though all the dataframes in the dictionary and I can make line plots of them seperately
instead of that (or making the dict of dataframes at all), do dataset = dataset.set_index('type', append=True)
but I want it in a single plot
with different colors for each dataframe
when I try to put them into one column, this is the error I get
I know that. I would need a sample of the dataset to experiment further.
it would need to be copy/pastable
I'm afraid that won't be possible
I can't assist any further, then. Sorry!
fig.add_trace(px.line(y=df["omloopsnelheid"], x=df[key]))
fig.update_layout({"title": f"Omloopsnelheid {list(my_dict.keys())[i]}",
"xaxis": {"title": "Number of days"},
"yaxis": {"title": "Quantity"}})
fig.show()
I understand that it's difficult to help without it
just created a working "neuron" which can basically train and detect a & b, in any linear equation like ax+by (ex. 3x+2y), given some example inputs & outputs
thinking of making some kind of a "neural net" program that can train and play tic tac toe, tho that will be a bit larger
anyone's got any ideas for a simple project
If you did a calculator as your first project, this is going to be familiar
Make a scientific calculator? One that could solve harder math like simultaneous equations, ratios, inequalities, etc (maybe make it draw graphs too)
Donāt know if thatās what you want though
creating a network that plays tic tac toe is special because every possible tic tac toe game is known and can be stored in memory.
so in that sense, creating a NN for it is sort of overkill. but it might be a pretty good learning experience.
@tropic edge we're going to mute you if you keep making random statements ("Do you love God?") in a bunch of channels
I just had a really random (and potentially stupid) thought, but is it possible to use Huffmans Algorithm/Coding to clean and compress data?
compression and cleaning are pretty much unrelated. you also can't use compressed data until it's decompressed.
it sounds like you might be conflating compression with density
nah it was just me overthinking
because i was trying to figure out a way of involving Algo and Datastructures with Data Science
compression is for making files that you aren't actively using take up less space on the hard drive. that's just a general computer thing.
in the context of data science, you can have dense or sparse representations of data. sometimes you can make a sparse representation more dense. but not by using a compression algorithm.
graphs are used extensively in both.
wdym by "dense and sparse representations of data"
are you familiar with one-hot encoding? that's probably the quintessential sparse representation.
not at all ā¹ļø
if you have ten categories of things, you could represent each thing as a vector with ten elements, and each category gets an index
A: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
B: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
C: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
... etc
This would be a sparse representation, because each instance is represented by a bunch of zeros, and only a single 1.
and it just gets more and more sparse the more categories there are.
alright, would dense representation just mean the opposite
right. a dense representation is when information is represented using less space. but you can't achieve that with compression, because when you compress a file, what it represents is no longer there until it's decompressed.
ahhhh ok that makes sense
you had mentioned that graphs can be used
in the data science - alg and db relation
right. neural networks are conceptualized as graphs.
alright, so would you say it could also be applied to something "simple" like a web scraper and fake news detection
or is it a little higher level
you can use a web scraper to obtain data for ML, I guess. I don't think there's a connection beyond that.
other than graphs, is it possible to use another other theory in my specific case
or am i overthinking again haha
Hey guys, anyone with experience in PySpark? I need help with converting a python library to a UDF.
there's no such thing as "overthinking"
not all ideas are useful, necessarily, but the amount of time you spent thinking about it isn't the problem.
you might also see if there's a Discord specifically for Apache's ecosystem. People don't really get experience with that unless they work for a company that uses it, and a lot of people here are hobbyists.
Where do I look for such a discord?
wherever you found this one, idk 
you're welcome to wait for an answer here, of course.
I searched the public servers for you, but couldn't find it. You might like:
https://stackoverflow.com/questions/tagged/apache-spark
https://lists.apache.org/list.html?user@spark.apache.org If your question cannot be found there, you could post on StackOverflow or e-mail user@spark.apache.org
yes, so I was thinking about an implementation which will store all the game-states in a dict with a weight, two agents will play the game to train, starting with an empty dict, will add a state to it if it is not there, and weights will be added once a game is finished,
so during the next game agents will select a move based on weights in the dict-of-game-states, though I wonder if it is even a neural network, like it will probably be the one with 3 layers, each with 1 - 5,478 - 1 neurons respectively š
and the problem is there would be no unique weights assigned to the same "state" when coming from different previous "states", creating a sort of "general weight for each state" mess,
for branching weights, one would probably need a 'real' NN, though i really wanna try & see if the former could perform well
how to use GPU in pytorch?
You need a lib that supports GPU, like pytorch, Numba, rapids, xgboost, etc
for most tensors you can do .to(device)
where device='cuda'
or 'cpu' to go back to cpu
Oh sorry missed that noway said pytorch...
I've already written like that, but it's doesn't work. This is my Nvidia. Can you guide me?
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
try this
I've already to written like this
I don't understand why it doesn't work
No clue sorry
hey guys, any of you is experienced with speech recognition or text to speech in python? Preferably both, but 1 person with sr powers and another with tts powers is ok
ask your question, not if there are experts that can help @sand stone
If someone can help they might respond
I actually don't have questions, I wanna know, since I'm still a beginner and I have robotic ambitions
if you don't have questions, then why are you wanting to get the attention of those who are knowledgeable in these areas?
so then you advise to just drop the cost column?
at that point I'd be left with all object types and what would I be able to do with it?
I could make some plots showing which Space companies reported costs and which didn't
Well if u donāt have costs for entire nations what good is it
Itās got multiple companies ?
But only american
here's the data:
'Cost', 'Status Mission'],
dtype='object') ```
so if the 'location' has different cities in USA, how would I group these as one for USA? I could do this for each different country and see which countries provide cost data
How do I group by certain strings in pandas? If I find the location is "Las Vegas, Neveda, USA" and another is 'Los Angeles,Calfironia, USA", how do I tell pandas to group these puppies based off USA match
try to extract the city, state and country from the string instead e.g.
['Las Vegas', 'Neveda', 'USA']
what would the first steps be for solving the LunarLander v2?

thatās all? no q table or anything?
dont look at me. im only here for the scenery 
i need a dataframe though
it would be better to first split and explode expand the address into separate columns for each of them, but you could just use str.endswith("USA") for now
!e ```py
import pandas as pd
df = pd.DataFrame({"XYZ": [1, 2], "address": ["LA, NV, USA", "SF, IDK, USA"]})
df[["city", "state", "country"]] = df["address"].str.split(expand=True)
print(df)
@agile cobalt :white_check_mark: Your eval job has completed with return code 0.
001 | XYZ address city state country
002 | 0 1 LA, NV, USA LA, NV, USA
003 | 1 2 SF, IDK, USA SF, IDK, USA
(you could drop address after)
edit; also in that case you should have split by ", " instead of the default š
usa = data[data['Location'].str.contains("USA")]
this was easier lol
Hi when I run my code I don't know why its not running and giving the output (i'm trying to solve some coupled ODE's) is anyone able to take a quick look? many thanks ```import scipy.integrate
import numpy as np
from matplotlib import pyplot as plt
#defining constants
C=0.86
h_bar = 1.05457266e-34
Ye =[ 6/12,26/56] #carbon-12 nuclei and iron-56 nuclei
c= 2.99792458e8#speed of light
G=6.67259e-11#gravitational constant
me = 9.1093897e-31#mass of electron
mp = 1.6726231e-27 #mass of proton
rh0_0 = (mpme**3c**3)/(3np.pih_bar)#natural unit for density
#define scaling functions
R0Val=[]#natural unit of length
#cycling through the 2 values of Ye the number of electrons per nucleon
for i in Ye:
R0=np.sqrt(C3imec**2/4np.piGmprh0_0)
R0Val.append(R0)
#defining first order ODE's
def rhs3(x, p):
dpdx = np.zeros_like(p)
M = p[0]
q = p[1]
g=q2/3/(3*(1+q2/3)1/2)#gamma factor us a function of q
dpdx[0] = 3qx2
dpdx[1] = -CqM/g*x**2
return dpdx#return the two coupled 1st order ODE's
sol = scipy.integrate.solve_ivp(rhs3, [0,1],[0,1], dense_output=True)
"""
while the value of the density is larger than a very
small value (q > 10ā10) the program is allowed to run. The small value(aprrox zero) is chosen such
that the second boundary condition of the density at the WDās surface tending to zero is
satisfied
"""
x=np.linspace(0,1,1000)
#m=sol.sol(x)[0, :]((4/3)np.piR0Val**3rh0_0)#mass of the white dwarfs
#rh0=sol.sol(x)[1, :]*rh0_0#density of the white dwarfs
M=sol.sol(x)[0, :]
q=sol.sol(x)[1, :]
print(M)```
well, which output are you getting, if any?
also......... you do realise that you'll likely run into floating point precision issues with values as small as e-34 right?
ohh yh that must be why its taking such a long time to run
is this a good infographic?
not bad. you could also display this data in a 2x2 table
Hi its just taking so long to run so not giving anything atm
Ask your question and people will respond to you; hopefully. "Don't ask a question to ask a question"
Just like Nike, Just Do It āļø
What's demand for MLE like in the US
I keep thinking "AI is ovrsatured"
but my friends keep getting hired as bonafide machine learning engineers haha
is AI actually not oversatured and there's a lot of demand for decent "software engineers who know what a training and test set are?"
I'm interested to know as well.
I have no personal experience. But I read entry level roles are oversaturated because of the popularity of the space. And ML, DS being buzz words. So I think that's what they are reffering to

itās not really over saturated, I think because itās like the hype nowadays
MLE it's more than just model training. It depends on the org, but engineer in MLE usually stands for pipelines, production, etc. Data scientist, ML researchers on the other hand usually develop models i think.
i think thats one component but i also think entry-level positions across the board are less than normal due to covid-affected companies being risk-adverse rn
all other positions in tech that arent entry-level are pretty hot rn


guys can i ask you for some feed back?
im using pickle to storage my data but im also using git to get control of my versions. so as git cant restore or track binary files as pickle creates im made an file called backup of the data of the AI in a .txt file so if the AI gets broken i can restore his behabiour just copyn and paseting the data of the text file into the source code and deleting the binary file of pickle. you tink this is a good solution? or i have to made someting more complex to make easy the restore of the AI to a older version?
R or python? 𤪠@misty flint
That's true yeah.
My perception was that the market for what you've described, "Software engineer in ML" was fairly saturated.
Not as saturated as those for academically trained ML statisticians... but still saturated.
Just an enormous tide of bright eyed people keen to be involved in machine learning.
I should probably say, maybe let's cut it down to
"The job market in the United States for MLEs with 2-10 years of software engineering experience."
Is that bcs these ML jobs pay more than a 'regular' se jobs? Or there's some other reason?
My perception was that MLE pays worse
because it's oversaturated by bushy tailed ML enthusiasts
compared to straight general software engineer or data engineer
But I'm not very sure about my assumptions
R is for research, Python is for products 
i say that half-jokingly 
no but if your industry is academia/pharmaceuticals/bioinformatics/traditional stats, you are more likely to use R
if not...then python is your best bet 


Drink your coffee lmao
There is also julia
shhhhhhh
you cant tell me you understand what kolv is saying 
i do like kolv's cats tho

also
its like
almost midnight here
so no coffee atm

or else i will die tomorrow morning
should probs sleep soon
people talk about MLE here but i also wished we talked about data engineering more too

since you usually need DE in order for DS to have more impact
As Rex said, it depends. š
edit: and here a bit biased towards snake
@misty flint what is the main funtion of your perceptron?
i mean what is the task the perceptron has to learn
i dont have a perceptron...

not yet
id rather talk about data engineering 
what about market exchange data?
im not really interested in finance tho 
because i have a game which is a bitcoin simulator
i tough i can made in the future a perceptron learn how to play my game
just to practice how to make a good perceptron
@misty flint what kind of data enginiering you used work ?

oh thats what you mean
why dont you use reinforcement learning
that could be fun
i dont really do DE at work; i mostly do DS stuff
yea i will run the perceptron for hours and see how much it can survive
just wanted to talk about DE is all
i dont know what DS mean
like that hyper-buzzword: "The Modern Data Stack" 
Data Science
ok you fell sad about it ok
its super blurry
i dont feel sad - i just wanted to talk about Data Engineering 
ok im just an mecatronic/informatic enginiering but i want to become a robotics enginier
nice. thats intense stuff
yea thats why im here im searching for feed back about my work
oh...thats what you meant. i still think for your use-case, you should check out reinforcement learning; its also used in robotics too
yea i will used but whit a preceptron format
and i will make some escenarios to get easyest learn about an specific task
my main problem is not that, the main problem for now is the correct storage of data and version control
i...have never heard of combining perceptron with reinforcement learning...good luck 

well tell us what is this image you send
its a diagram of how to make a product from the farm to the final consumer?
Hi, if someone can help me !
My problem is in french im gonna try to translate it.
I have two equations :
x(t) = p1 + sin(p2 * t + p3)
y(t) = p4 + sin(p5 * t + p6)
with x(t) et y(t) the position of the satellite at instant t (0;2*Pi).
p1,...,p6 are the parameters I have to find to anticipate the movements of the satellite at an instant t given.
I have a list in a folder : "position_sample.csv" which give me some positions at t given.
My job is to create an genetic algorithm which can find a good combination of p1,...,p6 that give the right trajectory of the satellite.
To simplify the exercise, each parameters are included in [-100;100]
Im a beginner I take any help
ok im also a beginer but it doesnt sounds too dificult
have you made any of the perceptron already_
I tried but .. looks like nothing
i made a proyect similar in the university but it was a robot
in my case it has to calibrate his sensors whit data of the past
I know I have to write some function called fitness mutation crossover but im lost
this problem can be solve whit machine learnig but i tink its too much for htis you can make an simple algoritm to play hot and cold and get valid values to solve this problem
or make this in a little perceptron if you want this have a full automatic calibration for any kind of curve
you can solve this whiout much math
you have to solve this whit an AI or can you use any kind of algoritm?
I have to solve this with an algorithm
ok make an simple hot and cold algoritm
its simple and it work perfect for this problems
its less proccesing heavie for the computer
and you can develop in 20 minutes
whit AI this can take days in develop
just training the AI
Hm I misspoke, I have to use AI but u sure it takes that much time ? bc I have to do it for tomorrow
I mean we have 1 day to do it
it tecniclly posible doing in 1 day but just whit a lot of practice and knowing exactlly what are you doing
<
in our case i recomend make an simple algoritm that learn how to calibrate his exit whit out much math as an AI
is easier as using trigonometry insted sinusoidal waves
well in my school i called the algoritm hot and cold
How do you create an algorithm that learn how to calibrate his exit
its because this algoritm predicts where the thing whill be and if it fail it changes his value
its probably easy for you but idk
6 parameters p1,..,p6, time t and positions x,y i guess ?
in you case it could be actual_potition(x,y,z), acceleration, and speed
you are using only x y?
ok
there is no notion of speed here
only position
I have to predict the position at instant t
and you give multyples "pictures" to calibrate the algoritm?
so i tink it would be
you have time t and x,y as input data
so you have to predict the next input data as exit
yes
ok and the p1 ......p6 are you hiden layer of "knowledge" in the neurons of the perceptron (wheigts)
sry i didnt understand your sentence
Is DE fun? š
oh i understand
I have a folder yes
with some knowlege
- P1,...p6 are included in [-100,100]
ok then you normalitation of data is in the range of -100,100
you can use some kind of perceptron maker but i tink they doesnt accept values biger than 1 and the normalitation default is crap
I rely on you sry
im tinking if you can make this work whit out normalize the data
and you have only 1 day to finish this?
yes
or you didint make your homework of the last mont?
this is like an work it takes at least 3 days for a new student
i can made this in 1 day because im graduated
but i tink its too much for only 1 person
ok i found the example i was searching for
Not gonna lie, I had 5 days to do it >< but still Im completly lost even with 5 days
the good thing i this exaple can do the work you want the bad thing is you have to normalize the data whit anoter algoritm
ok 5 days sounds more undestandable
but still hard and i even wasnt in that class
ill search for my perfect and very polished normalitation algoritm
Would be more easier if the teacher gave us some example before giving this but nevermind
love u
I got perceptron next semester
Isnāt it like 20 lines of code or less?
Sure only takes 1-2 days
yea but this is for tomorrow
Rip
<
ok i found the code but my comand line is deat because i was playing whit some controlators so i have restart some configurations ill be ofline some time
ok
from sklearn import preprocessing#ayuda a preprocesar los datos de entrada
from sklearn.preprocessing import MinMaxScaler
#input_data = np.array([2.1, -1.9, 5.5],#xyz seeker1
# [-1.5, 2.4, 3.5])#xyz seeker2
# input_data = np.array([2.1, -1.9, 5.5],
# [-1.5, 2.4, 3.5],
# [0.5, -7.9, 5.6],
# [5.9, 2.3, -5.8])
input_data = ([13, 50, 200],
[802, 23, 38],
[0, 0, 1240])#1240 es el valor maximo para calibrar el resto de numeros inferiores
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print ("\nMin max scaled data:\n", data_scaled_minmax)
print (input_data[0])
print (input_data[1])
#create NumPy array
data = np.array([[13, 50, 200, 802, 23, 38, 0, 1240]])
#normalize all values in array
data_norm = (data - data.min())/ (data.max() - data.min())
print (data_norm)```
this is all the code you ill need to normalize ina perfect form any kind of data
the bad is you have to manually put the data in the array or change the code to read the data from you .csv file
I don't have to use the equations ?
no. just enter the data into the input data
thats how perceptrons works
you give him data and it gives you data in exchange all cooked and ready to use
How do I collect the 6 parameters p1,..,p6
but that file you have its ahiter the trining data or the usuall data it will manage and the perceptron have to calibrate himself to acmoplish his task
Im gonna try it, thx
in data = np.array([[13, 50, 200, 802, 23, 38, 0, 1240]])
data = np.array([[p1, p2, p3, p4, p5, p6, t]])
import numpy as np
from sklearn import preprocessing#ayuda a preprocesar los datos de entrada
from sklearn.preprocessing import MinMaxScaler
#input_data = np.array([2.1, -1.9, 5.5],#xyz seeker1
# [-1.5, 2.4, 3.5])#xyz seeker2
# input_data = np.array([2.1, -1.9, 5.5],
# [-1.5, 2.4, 3.5],
# [0.5, -7.9, 5.6],
# [5.9, 2.3, -5.8])
input_data = ([13, 50, 200],
[802, 23, 38],
[0, 0, 1240])#1240 es el valor maximo para calibrar el resto de numeros inferiores
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print ("\nMin max scaled data:\n", data_scaled_minmax)
print (input_data[0])
print (input_data[1])
#create NumPy array
data = np.array([[p1, p2, p3, p4, p5, p6, t]])
#normalize all values in array
data_norm = (data - data.min())/ (data.max() - data.min())
print (data_norm)
what does input_data = ([13, 50, 200], [802, 23, 38], [0, 0, 1240]) do ?
p1, p2, p3, p4, p5, p6, t are unknown
well in the actuall state its no goin to work because you have to give values to a p1 p2 ...
random values
if you want but the perceptron will not lear aniting because they are random
i mistaked that its no p1 p2 p3 it has to be x,y,t
ok x,y are the equations with p1 .. p6 in it
those lines are just to normalize the data into numbers betwen 0 to 1
xy are the input data but p1 to p6 are the neurons
inside p1 ...... p6 exist values betwen 0 to 1 and those are the knoledge of the AI
all that conbinated gives you an exit of data it will be x,y,t of the future
well I do have to install sklearn, i do it then i try
ok im back and is very funny the error i haved
with open('position_sample.csv') as f:
data=np.empty(shape=90)
lire=csv.reader(f,delimiter=';')
for ligne in lire:
data = np.append(ligne)
it doesnt work with numpy I try to find a solution
are you using which comand to run it?
im using spyder
are you using python3?
yes
I have this TF Lite object detection model. It seems like it returns a (1,50000) shaped array. I'm having a hard time understanding it's output. The code:
import cv2
import numpy as np
import tensorflow as tf
im = cv2.imread("/path/to/image")
im = cv2.resize(im, (768, 768))
im = np.array(im,dtype=np.uint8)
im=np.expand_dims(im,0)
interpreter = tf.lite.Interpreter(model_path="/kaggle/input/id-out/model_quant_tl.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_ = tf.convert_to_tensor(im)
input_tensor = input_[tf.newaxis, ...]
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], input_)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data) # [[65504. 65504. 65504. ... 42720. 4148. 14856.]]
print(output_data.shape) # (1, 50000)
Can someone explain this output_data to me? I'm trying to plot the image with output detection boxes
wait im having troubles to test the normalitation algoritm
sure
i tink this is a old vertion
But I think u shouldn't waste more of your time
Does anyone know how to do interpolation from two value which is non-linear
I have nine equation of different temp, but If i want about 16.5 temp, how can i interpolation between 16 and 18 trend line?
you would need to approximate your temp with some non linear function, then you would be able to find out temp value for given input
I fell tired about handling the missing value in machine learningš„²
if its pandas fillna doesn't work?
it might not be as straightforward as replacing them all with 0 or the mean, in their case
i see
do optical character recognition libraries use ai?
yes
just curious to know if I can recrute somebody
IntroThis is a part of the series of blog posts related to Artificial Intelligence Implementation. If you are interested in the background of Ā the story or how it goes: #1) How to scrape Google Local Results with Artificial Intelligence? #2) Real World Example of Machine Learning on Rails #3) AI
We don't allow that.
wait seriously?
Yes
I'm sorry then, didn't know. If not recrute, then how about knowing people, or assistance request
You can ask questions related to tts, yes.
does anyone know a trick to optimize heavy tts architectures to work in realtime and on the cpu instead of the gpu?
Do anyone know how to do more better in Optimization-curve fitting, since now my cuvre only have one parttenš¤£
For training or for deployment?
for deploiment
My way to do curve fitting, is about using the coef from most data line, and the intercept from them.
How quickly do you need to be able to generate ten seconds of audio for it to be viable?
for example, if I want to impliment a model for a screen reader, which requires the voice to be extremely responsive, introducing no lag or delay what so ever, no matter how much the size of the text is, just like how responsive espeak, rhvoice or flite are, and those aren't neural based tts systems
How realistic does the voice need to sound?
Also, remember that "use", "deploy", and "implement" are not the same things. It doesn't sound to me like you're trying to implement a model.
you can try something like this: https://iq.opengenus.org/polynomial-regression-using-scikit-learn/
In this article, we have implemented polynomial regression in python using scikit-learn and created a real demo and get insights from the results.
thx, this article only introduce polynomial regression, but i already done that, now the issue is about how to fitting the eqution
Since, the less data in somewhere, so there are some curve is weird, I have done curve fitting, but not exactly perfect I think , as all the curve shape and coef is the same lol
BTW, i think statsmodels is more better, there are more information that give you in summary model
well, stelerkos, the voice needs to instantly generate the audio and speak it after request. The voice sound is already fixed, once I find a cool voice I just want to be able to optimize it
need*
@sand stone you're telling me about the responsiveness. I need to know how realistic you need for it to sound.
The more realistic you need for it to sound, the less likely it is that there's any way you could possibly synthesize it fast enough for your use case on a CPU
using the example code, like this would work for you?
#predictig the result of linear regression model.
lin_reg.predict( array([ [6.5] ]) )
no need for overly emotional voices, just a simple reading speech, like reading a story, that kind of intonation. Can I make a model using the realtime voice cloning toolkit to make a voice model with only six or less sentences of myself, then somehow modify the hyperparams to disallow it to vary the speech for each utterance of the same string, I don't want it to change how it reads stuff the more I give it the same text. Then can I take/export that little model and go from there?
only six sentences of yourself? it would take significantly more audio to generate something that sounds realistic even for completely monotone speech.
This is a regression model code, for predict the value, like input the value, then predict it
yes correct. im guessing you want something else?
stelerkos, I'm not taslking about training a standard tts model with a crafted dataset that the more you give it the better it'll sound, I'm talking about a tts toolkit called realtime voice cloning. It can generate speech with your voice with only a few audio files
so I hypothesized that maybe the models generated with that toolkit are much smaller, and much easier to integrate within a realtime requiring application like screen reader use
I don't think it's possible to achieve this with the constraints you have specified. That's probably all I can contribute.
@tacit basin All the curve in the same shape, cause the coef is the same, but in reality, the coef is little diferent
do you even know about real time voice cloning? I'm talking about the toolkit
I'm not familiar with whatever that toolkit is, but I work as a computational linguist and have been on a project for creating TTS systems that are indistinguishable from a real person (without emotion) for several months.
absolutely not to be rude, but it says in the docs/manual/whatever you call it, that it can generate a tts voice similar to the wave files, with only a matter of minutes of audio
so stelerkos, can we voicechat about this?
seams like you have some expertees
I can't voice chat; sorry.
hi
ah no prob, I'm sorry for sounding rude in that message, I absolutely didn't mean to, mister greekson
and hi
lol people are talking some advance things
I'm not actually Greek. A couple of mods changed our names to greek letters for the lulz.
lol
Do you have a link for this toolkit? I should look into it. I'm a bit skeptical as to how well it might perform.
oh I see, well cause I'm learning greek and am interested in it, I can also type english with greek characters if you're interested
We should use Latin letters so that everyone can read what we're saying. Also, how would you transliterate H?
what is greek
the language spoken in š¬š·
sorry about that, just thought I'd translate quickly
im not
oh, sorry then
its fine
ana darastu al-arabiya mataa ana kuntu taalib 3alam al-lughaat
(attempted meaning: I studied Arabic when I was a linguistics student.)
don't type that way, it's confusing, especially the numbers used for letters
but nice
here's your linky mister stelerkos
I don't use Arabic often or well enough to use an Arabic keyboard.
yeah I understand, I can speak english just fine :D\
oh by the way I apologize again for my rudeness
.bm
oh handyyyyyyyy
wow. can you give this thing a sample of any type of voice and it will turn any sound to the sample provided?
python discord creators, if you're reading this, you're geniouses
.source bm
looks like it's not clear from the commit history who first created this.
well it makes a voice that can speak anything you give it, when you provide it an audio file so it learns the voice's characteristics and speak anything you give it provided that the language is supported
woah really, stelerkos? but isn't qorintin j or don't know how to spell the name the one who made it?
the command is older than the first commit in that file, so it must have been moved from somewhere else.
can you use a sample of any other language
if there is a pretrained model for that language which is supported, yes, it'll generate. I'm not sure if it can do crosslingual voice cloning or not
maybe it was cloned from somewhere else and forked and whatnot
i had a question. is that possible to make a voice assisstant just like google assisstant but runs on my own servers and has machine learning so it keeps learning for me
I think there's something called lione or leon or I don't know what it's called, it has many modules that customize it and you can run it on your own servers
hello i am trying np.where conditio for two conditions
my code ```python
whole_new['status'] = np.where(whole_new['time_hm']==whole_new['time_hm'].iloc[0], 'entry', 'hold')
whole_new['status'] = np.where(whole_new['time_hm']==whole_new['time_hm'].iloc[-1], 'squareoff', 'hold')
I'm reading multiple things online. Is stochastic gradient descent always updating after 1 train sample, or after a subset of the training data?
a subset. which I guess could be 1, but I'm not sure that it ever is.
Ah alright thx
I see it called mini-batch gradient descent for a subset, and distinguished from SGD
like here in some course slides
they might be synonyms. one of the downsides of the democratization of data science is that anyone can go online, write an article, and call things whatever they want. there are several concepts in NLP that I want to rename, but I don't want to be the one to dilute the vocabulary.
Yeah that's True
It's kinda annoying, because at my uni we get repeated the same info many times over, and everyone starts calling stuff different and making different distinctions between names
maybe SGD does usually refer to one sample 
Traditionally it does
but I think nowadays people only make the distinction for educational purposes
interesting
from Wikipedia (emphasis mine):
[SGD] can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
yeah
boo
now I'll have to figure out what they meant every time I read something that mentions it
that's why I said "traditional" because in practice that's how people use it
it's not traditional if it's no longer being passed down š
because it's silly and arbitrary to distinguish
at least as far as i know, the batchsize=1 case doesn't have any special properties that the batchsize>1 case doesn't have
so there's not much point in distinguishing, except if you're teaching the concept for the first time and you want to start with batchsize=1 for simplicity
maybe i'm wrong and the batchsize=1 case is special mathematically, but if so i have no intuition for it and i've never heard of it before
for batchsize>1, doesn't there need to be some reducing operation that would be idempotent for batchsize=1?
don't forget that the gradient is the gradient of the loss function
which is already a "reducing" operation
you are just varying the number of points you use to compute the loss
batches smaller than the full dataset are basically approximations of the full gradient
also don't forget that it's the gradient with respect to the parameters, not to the input data. so it's not like the size and shape of the gradient vector is changing as you change the batch size
Is Prophet Library a type of artificial intelligence?
why does something like this work:
df["col"] == b
While this doesn't:
df["col"] in range(50000, 250000)
this was my issue as well 
um its a library for forecasting
So Is it only Data Science?
Can't we relate it to artificial intelligence?
bro i think you need to revisit how loose the definitions of data science and AI are currently

plenty of venn diagrams about this
as well as debates on definitions
can i dm?
it's text message
I know definitions but As far as I know Prophet is neither machine learning nor deep learning,
so it is not AI?
you could say that it is machine learning
"machine learning" is a general category of problem and also a field of study. in general, machine learning is anything where a machine or algorithm learns to perform a task without human supervision by learning from data. statistical modeling and analysis is often used to solve machine learning problems.
"deep learning" is just a buzzword for "neural networks with a lot of layers".
"AI" is anything that mimics human intelligence. not all AI even uses "machine learning", for example an "expert system" with a lot of hard-coded rules could be considered a form of AI. conversely, not everything that uses machine learning is AI.
prophet is a machine learning algorithm that uses statistical inference. it is not deep learning. it is not AI by itself, but could certainly be used in an AI-like system if it made sense to do so.
@modest shuttle does that help?
So,
In the end, I did not understand whether Prophet is artificial intelligence or not?
i addressed this directly.
prophet is a machine learning algorithm that uses statistical inference. it is not deep learning. it is not AI by itself, but could certainly be used in an AI-like system if it made sense to do so.
Can you give an example?
I have blessed this comment
Andrew ng is the best at teaching ML
I do not think anyone else is a better teacher
there was also an IEEE article i think
if thats more your cup of tea
āIn many industries where giant data sets simply donāt exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn.ā
you are asking about an example of an AI application that you'd use prophet in? anything that requires time series forecasting. be creative š
and suddenly bayesian inferece gets its day in the sun

actually bayesian methods are gaining in popularity
idk if its just hype or what
i think theyve been gradually on the rise across fields
since the one bayesian podcast i listened to said many of these causal methods have only been applied to small datasets and they werent really used for these use cases

i also think that theres still a lot of active research and development on the techniques
right, exactly
yeah its needed too lol
i think there's still a lot of "todo" work in bridging between the traditional social science applications of causal inference and causal ML
like where do the statistical assumptions fall apart and how robust are they, etc.
i think it's also gaining popularity because the promise of bayesian inference is precisely what people are looking for
interesting interesting
today's machine learning techniques are in a similar position to traditional statistical frequentist techniques
there is no coherent framework right now for making principled decisions that incorporate uncertainty, measurement error, etc.
the promise of bayesian inference is such a framework
the need and desire for such a thing has been growing over the last several years, with all the media attention about "bad AI"
thats true
and in the last ~10 years the software and math methods for doing bayesian inference have improved by a huge amount
i think the only i have to share about this is this cool library https://causalnex.readthedocs.io/en/latest/04_user_guide/04_user_guide.html

nice, i never even saw that
yeah its pretty dope
obv you have to set your assumptions beforehand
but thats something you can work with a SME/domain expert with
i remember working through the beginning of the koller coursera course on probabilistic graphical models but never had the time to commit to the full course because i was busy with my actual undergrad classes
now it's not free anymore š¢
my bayesian knowledge comes from this podcast: https://podcasts.apple.com/us/podcast/casual-inference/id1485892859
i feel like podcasts have too low of an information : content ratio
the hosts each specialize in different bayesian schools
yeah i mean, podcasts are just a medium to listen to when commuting or working out for me

its def not a sit down activity
its like they are the worst of both worlds, i am sitting there trying to learn something but it's 50% casual banter, or i'm trying to hang out and not think about work but it's 50% work talk
nah dont use it to sit and learn something
thats what courses are for
this is more just eavesdropping on peoples' convos

yeah i can't do the thing where i listen to podcasts about data science while i'm commuting, i need to not think about work in my free time

understandable
that was me in my previous life/profession

but theres also podcasts about everything
may or may not be addicted to them 
one of my faves is 99% invisible
its about architecture and design 
oh ive heard of that one, didnt know what it was about
yeah its pretty interesting since we dont typically think about the physical world around us
at least i dont until that podcast lol
I swear they do not even know what "AI" is
does anyone know if there is a good method for video recognition? Example: You want to find out if a person is tossing a ball upwards. Dropping the ball and holding it in place do not count
could you just stitch the frames together?
you probably want some recurrent convolutional network
Or maybe if you can locate the object, you could just use the bounding box coords to see if the ball is going upwards or downwards
How can I reduce the length of bars in an histogram. Basically, change the scale of Y axis.
I think it's best to first learn about classic Bayesian methods like belief networks and belief propagation (https://en.wikipedia.org/wiki/Bayesian_network https://en.wikipedia.org/wiki/Belief_propagation ). Then if you want to incorporate causal methods you can learn about structured causal models (https://en.wikipedia.org/wiki/Causal_model ). The key idea is to understand the ladder of causation (also Judea Pearl's work in general). (https://arxiv.org/pdf/2106.07635.pdf if you want to go further)
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of...
Belief propagation, also known as sumāproduct message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node (or variable), conditional on any observed nodes (or variables). Belief propagation is com...
In the philosophy of science, a causal model (or structural causal model) is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.
They can allow some questions to be answered from existing...
Or it's not possible
true, but the idea is that it figures out the animation to detect
Not really experienced with sequential data tbh, so I'm probably not the best person to answer :/
just looked it up, seems like it may work
appreciate it
from a quick search: xlim() and ylim() change the range of the axes, you could use that to set a scale



