#data-science-and-ml
hey, i was wondering how could i make labels and test images ? because ever since i started learning they're given to me
can someone help me with my lstm model, this is what it predicts
lstm_5 = Sequential([
    tf.keras.layers.InputLayer(input_shape=[n_past, n_dims]),
    # ADDING 1st LSTM LAYER
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    # ADDING 2nd LSTM LAYER
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.2),
    # DENSE OUTPUT LAYER
    tf.keras.layers.Dense(1)
])
lstm_5.compile(loss='mse', optimizer="adam")
multivariate = lstm_5.fit(mat_X_train, mat_y_train,
                          validation_split=0.2, shuffle=True,
                          verbose=0, batch_size=batch_size, epochs=200)
# FORECASTING ON VALIDATION SET
multivariate_prediction = lstm(lstm_5, validation_index)
# SCALING OUTPUT TO MINMAXSCALER FITTED TO TRAINING CURRENT VOLUME
multivariate_prediction_scaled = scale(vol_scaler, multivariate_prediction)
have i used too many epochs or are my units incorrect
Has anyone tried the book The hundred page ML book by Andrey Burkov?
any kind soul can help to give some feedback to a self-check assignment from a regression textbook? Unfortunately I have to pay for the answer and I am not willing to, hopefully someone can let me know if there's any glaring issue
Any newbies want to do some NumPy practice? I was thinking about doing some live coding.
So question. i want to have it so an AI does things repeatedly (wake up, go to the window, open fridge, sleep, repeat) but doing so brings down a variable called happiness. then i want another map like (wake up then leave) but i want the AI to choose this when happiness is at a certain point, while also having a chance to not choose it. Ik i could do something like
if happi == 0:
    map1 = False
    map2 = True
while map2:
    ...  # other code stuff
but i want the AI to choose map2 instead of doing map1. I am wondering if this is possible?
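A minimal sketch of one way to do this, assuming happiness is a number between 0 and 100. The routine names, the `threshold`, the `leave_chance`, and the rule that leaving resets happiness are all made up for illustration; the only real idea is to draw a random number once happiness is low enough.

```python
import random

# Hypothetical routines; the step names are placeholders.
ROUTINE_HOME = ["wake up", "go to the window", "open fridge", "sleep"]
ROUTINE_LEAVE = ["wake up", "leave"]


def choose_routine(happiness, threshold=30, leave_chance=0.7, rng=random):
    """Return the routine for the next cycle.

    Above the threshold the agent always stays home. At or below it,
    the agent leaves with probability `leave_chance`, otherwise it
    repeats the home routine anyway.
    """
    if happiness <= threshold and rng.random() < leave_chance:
        return ROUTINE_LEAVE
    return ROUTINE_HOME


def simulate(cycles=20, happiness=100, cost=10, seed=0):
    """Run a few cycles; the home routine lowers happiness each time."""
    rng = random.Random(seed)
    history = []
    for _ in range(cycles):
        routine = choose_routine(happiness, rng=rng)
        history.append(routine)
        if routine is ROUTINE_HOME:
            happiness = max(0, happiness - cost)
        else:
            happiness = 100  # assumption: leaving resets happiness
    return history
```

Setting `leave_chance` to 1.0 makes the switch deterministic, and anything lower gives the "chance to not choose it" behaviour.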
this doesn't really sound like AI in the sense that this channel is concerned with AI. If something just follows the same steps repeatedly, that's just a regular program.
Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF) #🤡help-banana
that is a good point but if it was possible i was going to see if i could use this in my project SLOAM as a way to emulate a form of emotion that is caused by repetitive actions. SLOAM is a Self Learning Optical Auditory Machine. it will learn from well optical and auditory responses.
Hello Everyone,
I have a data science use case, (I am not super beginner tho), however I need some guidance in where to start/look for.
I have a dataset of people with some work information (department, skills, responsibilities, etc.) and also the training they have taken.
I need to train a system in order to be able to suggest to new employees or even existing employees which training they should take based on the model.
for example if I switched from a Junior to a Senior range, then the system would recommend which trainings I should follow.
any pointer in the right direction would be useful
Just a quick question. What makes you believe your clustering "didn't make sense"?
Your observation/answer to this will determine the kind of solution I'll suggest you try.
I made new datasets by their clusters and did a profile report on them.
I compared the values of every feature and they didn't differ all that much
for ex. "days_since_last_visit" ranged from 0-13 in one cluster and from 0-7 in another
While I would expect some division
It could possibly be the problem of exploding gradient.
Anyone know autocorrelation function (ACF) and partial autocorrelation function (PACF)
Which clustering algorithm did you use? KMeans?
no not KMeans, it wasn't well suited for my problem
I used Chi2 distance with gaussian mixture models and hierarchical clustering (My data was/is a count of the occurrences within a time interval)
I need some help with the autocorrelation function (ACF) and partial autocorrelation function (PACF)
im an econ student, have done time series last year, if that helps
@upper spindle omg life savior
wasnt the best at it, but i understood acf and pacf partially
Do you have 30 mins or smth #🤡help-banana
Hmmm I have not used the chi-square distance before ... Probably it's the case of not using the right number of clusters. I usually would use KMeans to get the right number of clusters and re-confirm using Silhouette plot or dendrogram from hierarchical clustering.
Alternatively, try using DBSCAN algorithm for your clustering and check if it's much better suited for the kind of data you have.
DBSCAN had even worse results 😅
I've now tampered some more with the features, tomorrow I'll see if it worked or not
If your data is high-dimensional, try decomposing it with t-SNE before applying your preferred clustering algorithm. Maybe you'll get a more customer-friendly result that way.
Nonetheless, may the force be with you 😊
I read that t-SNE might find false patterns
It doesn't find patterns. It's used for dimensionality reduction.
I mean that it might not be able to reconstruct the data in 2d
If the data was in 2d you wouldn't have need to decompose the data in the first place.
You can use PCA as well but t-SNE is better than PCA.
i do remember model giving nice values(and decreasing loss more and more)
so it could be a vanishing gradient too right?
learning that i can replace pandas' default matplotlib viz library with the plotly one has greatly improved my mood

whoever came up with that feature. i am glad. 
hey, i am stuck with assigning weights to my multinomial one-vs-rest logistic regression with sklearn. I know that I can assign weights but how would I go about it in a multilabel setting with e.g. seven possible outcomes / labels?
Anyone familiar with bootstrapping (sampling with replacement)
Which platform is good for data science colab or jupyter
i'm VERY NEW to tensorflow
and I want to make a rock paper scissor game using tensorflow (basically just detect rock paper or scissor)
I've trained my model but i wonder how i can trigger a function when the CONFIDENCE is above 70% or something like that, like does it return a list with data that I can check for
||or maybe it already returned something i can check for like a confidence level but i just don't know how to access it educate me pls||
anyone have tutorials or an article about this matter?
there's a ref : this project uses a tensorflow trained model to check for a laughing face and trigger "LOSE"
https://github.com/andypotato/do-not-laugh
**
tl;dr : wanna make a rock paper scissors game using my already trained model
how can i check for the confidence score so i can use it to trigger win/lose
please suggest what i need to know and what i need to learn
**
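A rough sketch of the thresholding step, assuming the classifier ends in a softmax layer so that `model.predict(batch)` gives one row of per-class probabilities per image. The array below is a hard-coded stand-in for that row, and the label order in `CLASS_NAMES` is an assumption that has to match whatever order was used in training.

```python
import numpy as np

# Assumed label order; it must match the order used during training.
CLASS_NAMES = ["rock", "paper", "scissors"]


def decide(probs, threshold=0.7):
    """Return (label, confidence) if the top class clears the threshold,
    otherwise (None, confidence)."""
    probs = np.asarray(probs, dtype=float)
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    if confidence >= threshold:
        return CLASS_NAMES[best], confidence
    return None, confidence


# Stand-in for `model.predict(batch)[0]`, i.e. one row of softmax outputs.
example = [0.05, 0.85, 0.10]
label, confidence = decide(example)
if label is not None:
    print(f"detected {label} ({confidence:.0%})")
```

The win/lose logic can then hang off the returned label, and frames where the label is `None` can simply be skipped.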
any way to make pytesseract.image_to_string() faster?
Guys, how do you showcase a Tableau workbook when you no longer have Tableau?
there is tableau community or something like that which is free
because fuck notebooks, that's why
Apologies for the audio distortion in the beginning of the video: I am a robot.
In this video, I'll show you how to fix the error message "The kernel appears to have died. It will restart automatically." in JupyterNotebook if you've recently updated anaconda.
My issue is with MATPLOTLIB, and was fixed by typing in the anaconda terminal: conda...
bless
i created a new environment and then just copy pasted that and it worked
You can also try UMAP: https://umap-learn.readthedocs.io/en/latest/ @urban lance
This talk will present a new approach to dimension reduction called UMAP. UMAP is grounded in manifold learning and topology, making an effort to preserve the topological structure of the data. The resulting algorithm can provide both 2D visualisations of data of comparable quality to t-SNE, and general purpose dimension reduction. UMAP has been...
what would be the best method of getting the second column given the first column
Is this in pandas?
yes
why the hate for notebooks 
i mean theyre obv not meant for dev or production
I would separate the text column by the space delimiter. Then write a lambda function to multiply the newly created numeric column based on whether the first column is minus or not
but theyre decent for experiments
Does this help or do you want to see code?
honestly they can be kinda annoying tbh

i dont think the code is necessary
thats along the lines of wht i was thinking as well
Snap I started writing it. I’m on my phone and it’s not the most efficient but it gets the job done:
df[['col1', 'col2']]=df['string'].str.split(' ', expand=True)
df['col2']=pd.to_numeric(df['col2'])
df['col1']=df['col1'].apply(lambda x:-1 if x=='Minus' else 1)
df['final_col']=df['col1']*df['col2']
thanks
do cpr
HAHAHAHHA
that made me almost spit out my coffee
good one
CPR failed, trying paddles
thanks lol
honestly they can be kinda annoying tbh
I don't get the hate towards notebooks, they're lightweight useful and pretty nice. if one wants to go hardcore, just use git with sublime text?
Crashing a kernel > crashing your computer
facts
if you mean to say its more convenient to crash a kernel
Lol yea, for standard development they might not be as useful. But for handling big data it’s a lifesaver to run on a kernel
I find colab more useful. the fact that I can totally mess my env up by downloading 10 versions of torch and do a simple reset blows my mind
I like colab but with their pricing lately you might as well just open up a Databricks account
a big F U to docker nerds, nerding over why learning 100 commands and debugging errors is better than colab
I doubt it
colab pro+ is heavily undercutting its competition
Maybe, depending on if you are using it a ton. I have a databricks account with aws cloud and never go over 50 a month for my personal stuff

takes notes

Just be supperrrrr careful to terminate clusters when you're done
so...its just AWS but with a better UI
Pretty much
True but the data storage
i mean i think 50/mo is cheap
Heaven
well, depends
Spark sql pass throughs with hive storage
Google's TPUs are dirt cheap, and AWS doesn't do much for pricing for A100s
It’s not terrible
But if you're training all bloody month go with colab, although I think they have ways of limiting usage
I meant multiple projects
I suppose
Colab is for experimentation anyways, which is why I love the colab+kaggle combo
got the data https://cloud-gpus.com/
Cloud GPU Price and Feature Comparison
GCP is just objectively cheap all around. it has a pretty UI, helpful support and plenty of integrations
I think google is trying to capture space in the market, they’ll prolly jack up prices
Just don’t use Microsoft lol
¯_(ツ)_/¯
I doubt it
I think they're just getting more competitive with thinner margins - and their TPUs ofc
Maybe
TPUs are a pain to get working, but they just outperform every single GPU out there no biggie
they're criminally underrated IMO
(but I hope it stays that way lol)
the nonlinear execution order and discouragement of modularity.
I would say that they are everywhere now due to being part of Apple's new SOCs. But in typical Apple fashion they are very closed off in terms of access. You have to use their own tools to do specific operations on it while in reality it's pretty generic and could probably support something like OpenCL.
The thing is that that kind of thing where you can visualize stuff and have code everywhere in random spots is better served by functional reactive programming / declarative programming.
Since it all readjusts as you change anything then.
(see Enso)
(or shader graphs in Blender or flow graphs in Unreal, etc)
regardless, I've decided to position myself as the anti-notebook, anti-jupyter guy to spread awareness of their limitations/issues.
I agree, they are kind of like a bad repl in a way.
Rather than what they actually want to be which is this pipes and filters thing.
@serene scaffold This is just a personal question. Because you are a data scientist with linguistics, do you work on NLP?
how would you go about counting objects (with tensorflow for example) in a video where the camera is moving randomly ?
As an example , say the camera is pointing towards an object and counts it and afterwards it moves away from that object facing anywhere else away from the object and later faces again to the same object but you would not add it to the count since its already been counted.
i have an image and i want python to work out the average pixel intensity below the blue line and average pixel intensity above blue line from image. I know code to work out average pixel intensity of full image. But dont know how to do pixel intensity below blue line and above. IF you have any ideas please help. Or another way is to get python to split image into 2 - one with everything below blue line and other with everything above line then i can do my code. But i dont know how to do that
just @ me if you have any thoughts
fourier transformation

jk
idk anything about signal processing
even tho i took one class
made me avoid it more 
Got a machine learning question for you guys
I have an airbnb data set that has prices of properties throughout the year (300+ entries for a single ID). Applying ml on the data would result in heavy overfitting and not capture the true goal of measuring price change throughout the year. I know in R there is lm.clustering which accounts for multiple entries, is there any equivalent in python? @tender hearth
Yes
stelercus i know you might be biased and may or may not be able to look into the future, but do you think specializing in NLP is good in terms of future job growth or should i try to specialize in another set of ML algos 🔮

okay no problem thanks for at least replying
i joke but seriously maybe look into the direction of signal processing bc im like 75% sure your answer is at least in that direction
there is probably something in cv2 that can help segment this image
heck you can probably even do it in photoshop, if you don't need to automate it for more than one image
then you can literally just look at the pixel rgb values
of course you can use gimp if you don't want to use adobe proprietary shitware 🙂
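One possible sketch with plain NumPy, assuming the line is roughly horizontal, is the only strongly blue thing in the picture, and the image is already an RGB array (with `cv2.imread` the channels come back as BGR, so convert first). The colour thresholds are guesses and the synthetic image is just there so the snippet runs without a file.

```python
import numpy as np


def split_means(rgb, blue_min=150, other_max=100):
    """Mean grey intensity above and below a roughly horizontal blue line.

    `rgb` is an (H, W, 3) uint8 array in R, G, B order. The colour
    thresholds are guesses and will need tuning for a real image.
    """
    rgb = np.asarray(rgb).astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    is_line = (b >= blue_min) & (r <= other_max) & (g <= other_max)

    grey = rgb.mean(axis=2)
    rows = np.arange(rgb.shape[0])[:, None]   # (H, 1) row indices
    has_line = is_line.any(axis=0)            # columns that contain the line
    # First and last line row in each column (argmax finds the first True).
    top = is_line.argmax(axis=0)
    bottom = rgb.shape[0] - 1 - is_line[::-1].argmax(axis=0)

    above = (rows < top) & has_line
    below = (rows > bottom) & has_line
    return grey[above].mean(), grey[below].mean()


# Synthetic test image: bright region, blue line on row 4, dark region.
img = np.zeros((10, 8, 3), dtype=np.uint8)
img[:4] = 200
img[5:] = 50
img[4] = (0, 0, 255)
above_mean, below_mean = split_means(img)
```

Working per column means the line does not have to be perfectly straight, as long as it crosses the whole width.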
can someone help in #help-potato
specialising in ML algos is a really bad idea
unless you are postdoc
my experience is that it's very hard to find work as a specialist
there are too many postdoc kaggle grandmasters for rent
I think people have an extremely tilted idea of how competitive the post doc grind is and how much talent there is in the market atm
ML algorithm work is being advanced by huge research teams, and the impact of a lone ranger in the field is becoming super low
It's as high as ever if you think outside the box, but if you want to do incremental improvements then yeah that is already done in parallel by large teams.
(Actually applies to more than just ML)
(incremental vs different axis)
I think you underestimate how hard it is to do research outside a research group
of highly motivated peers
but that's sorta not industry
I think in terms of industry the applications of NLP are extremely saturated right now kinda. The capacity for NLP projects is maxxed out.
I don't but I know it's hard for many to do so. Solo is not only hard for motivation, but also confidence. You may fail and have wasted your time or not, but if you can get over that fear it's fine.
To be successful in research you need to know 100 other successful researchers
just to help you course correct etc
and learn of opportunity
and yeah, motivation confidence etc
Eh, not exactly. You have one more trick up your sleeve and that is the results. If it clearly works (and you made sure it actually works and are not tricking yourself) then it works.
Yes, but the low hanging fruit is very picked
in the 21st century every research area has a small army of wannabes assaulting the easy stuff
so you need quite the edge to tackle stuff.
and the network and mentorship is part of that
It's actually not because of again the axis issue. The focus is different. There are many things in ML for which there are only a handful of people working on it.
It's just not even being attempted often.
like?
Online learning, causal modelling, non-backpropagation based methods. These things have relatively few people working on them.
There are still many, but these big teams in the ML world are often focused on their specific stuff which others follow making it seem like that is all there is.
And that is a losing strategy because they are already doing it, and in a large group.
Hmm, online learning I think is not very compatible with large scale backprop
also to be frank unsupervised online learning in industry is very...
risky
Yeah online learning is an example where incremental improvements by just using backprop does not work.
but yes I guess so
Risky, but it must also happen, because an AGI can do online learning.
But that's research, it's risky and you will probably fail, but that is how all interesting things happen. Not being afraid of failure is the first step. It's why kids are considered creative, they lack that fear in their shielded environment.
That's a very pure research mentality haha
the flipside is that successful research requires extremely insane feedback loops to be done successfully
Having a group to work with is of course much better, but if there is not such group, and you know that it must be done (e.g. for AGI), then it is what it is. You need to start that group, someone has to do it.
So you need to be on top of your game to really do any thing, just because the intellectual capital being deployed is significant so you have to be really cutting edge to find delta
Artificial general intelligence talk kinda doesn't motivate me much
that entire field has skipped so much foundational knowledge
about neuroscience
we know so little about problem solving techniques that don't involve Von Neumann state machines
This is why our approach is based on modern neuroscience and results. If it does not work, even with a nice pretty theory, it goes into the bin. There are things however that we know we must have, like online learning.
but it's clear that we will need to use models that are not von neumann state machines to get anywhere close
We also are very into non-von neumann machines.
yea, but that is pure research.
The reason for the neuroscience is that while it may not be necessary, it's the current real world example out there of it working (the human brain).
I don't doubt it's valuable, but you must bring a pure research mindset to it
neuroscience is very necessary
we need other models of computation haha
depending on your definition of neuroscience
For non-research, yeah you can join a big group and help with the incremental improvements. And maybe just keep tabs on the pure research people's stuff.
Well non research means that its commercial applications
if that makes sense
and commercially all this stuff is just miles and miles off
hello all
i'm now in progress on my personal project with image classification with cnn. i want to build an airplane tail classifier, starting with 20 different airlines, where each airline provides 20 different tail images at 120x120 px, manually cropped using PS (i don't even care about the aircraft type). what i want to ask is: is it feasible to do this with only 20 images per airline, which i planned to split for train and test?
thats not what i actually meant sorry but i understand your concern
allg
Not exactly, there are already non-backprop methods in use and have been long before deep learning actually. And they can out perform them.
5000 images per class is my rule of thumb
Especially when it comes to computation cost**
but you can cheat and go with a few hundred if you're really good at oversampling techniques
where are you coming from then?
i know with a cnn more images could help
secret 
well where do you think about going
I think the tl;dr of a MLE career is Kaggle good
Pump the weights in the Kaggle gym it's really real world relevant
So I have two pieces of advice
#1 don't use photoshop for labelling
maybe i will bring the sampling to albumentations or keras preprocessing to help
invest the effort up front to configure a labelling tool
#2 get a bit more source data, say 50 distinct original images for each
Don't
the world doesn't need more pms
I'm really serious
what is PM? project manager?
in this context yea
the world doesnt need a lot of things but it still gets it. ill do what makes me happy. i wouldve continued my past life if i didnt care about my happiness 
I think the hardest lesson I've learnt in life that happiness doesn't get solved by the easy path
most happiness is hard earnt
if you've got a good path ahead of you for the hard yards of that domain sure, but just keep that in mind
I am of the view that it's very hard to succeed in any domain in this industry if you can't succeed at the technical IC path
i know myself and my personality and i would suffer if i went down the IC route
everyone is different. dif strokes for dif folks.
Perhaps, but I think the fundamental thing you need is the ability to focus and achieve difficult hard work
I think the skills to be a successful IC are a lot less rare than people think
well, a lot less specialised
but what happens is that highly successful ICs are highly successful because of a range of skills that are successful in any domain
i dont doubt your words, but i think success in life looks dif for everybody, yknow?
hence they are given 10 direct reports because they're highly successful people
I think the output of people's success
is different for everyone
I think the inputs of success are extremely similar for everyone
To add to that and the thing about research. I could have done web development or some "normal" programming job, but I would have been miserable. Instead of having a relatively easy route I do AGI research (although we actually have immediate applications) which is risky and niche.
i think we are talking past each other. lets just say i wouldve stayed in my original path had i cared about what others thought and what society thought of "having a successful career" meant

but to each their own, you do bring up some valid points.
That's fair Squiggle haha. I was trying to think from a industry mindset
AGI research is of course extremely interesting as long as you do it right
I mean that's the thing, you're being kinda mysterious about what you want so I can't give you advice
I'm just saying that you need to figure out what you want and then you need to really work on improving your input goods to success kinda
but I can't give more advice to you really without knowing more, I know you said ML nat lang stuff but yea
My experience in life is just that
successful people are rarely bad at a particular area
The successful researchers I know are often surprisingly solid full stack devs,
and im saying that success in life =/= success in career
thats what i mean
you get me?
You may be onto something with that, myself and my circle of researchers all do stuff from web to simulation to ml and more.
anyway, im getting off-topic for this channel, so ill see myself out 
(often out of need because we need to simulate things, have a web UI for it, etc. Or because people now in ML often come from other things such as game dev)
"jack of all trades master of none" - lame, self limiting phrase, "jack of all trades master of one" - much better.
(a large math knowledge base is probably the underlying thing here (and/or physics))
and im saying that success in life =/= success in career
thats what i mean
you get me?
I understand, but I completely disagree
The #1 factor to success IMO is not having a mentality of.
"I'm not good at that and that's fine"
Most successful people are "Jack of all Trades AND Master of Something"
oh that's what you already said squiggle haha
jack of all master of one
Yeah it's what can be seen when looking at pretty much all famous researchers (and lesser known ones). For example Newton was good at much more than just physics.
But that was his one master thing.
I don't like Newton or older researchers as an example
back in the 19th century you could be a weird shit and still successful
for various different reasons.
I mean yeah, Newton and such are weird.
Modern successful researcher just looks like Alan Kay kinda
Just... they're not bad at stuff.
Yeah. Well, the thing is there is of course always some stuff that is out of your hands, but using that as an excuse is not going to help you.
The thing is though, you can often see such people, successful or not, doing many different things.
And often just as a sort of problem along the way to trying to solve a different problem.
It really comes down to whether or not you throw up hands when there is a problem or you constantly push for a solution.
in a way
I think the biggest attribute is investing in your personal productive capacity kinda
I think it's less about charging through problems
because there are a lot of failed people who pushed really hard on bad problems
Survivorship bias is a really strong thing
I think it's about investing in your personal capacity to do really good work, and to find and identify really important and profitable work to do
and there's a lot involved in doing that successfully for obvious reasons
The thing is that it's a multi-armed bandit problem and it's really hard to tell whether to keep going or not. At some point the whole different approach thing kicks in for some, and others just keep pushing the same way. Knowing more about seemingly unrelated things may just happen to give you the alternative solution and math happens to be generic enough to often allow for such bridges.
I just would try to not worry about the problem
rather than the process
be process minded
Always, be very process minded
Don't sweat about the win sweat about having an amazingly good process
I agree.
breadth of knowledge is also a very good point
is 87% training accuracy considered underfitting?
I want to print this in google colab, can anyone help??
i just saw this on net
fig, ax = plt.subplots(figsize=(8, 8))
ax.matshow(con_mat, cmap=plt.cm.Blues, alpha=0.3)
for i in range(con_mat.shape[0]):
    for j in range(con_mat.shape[1]):
        ax.text(x=j, y=i, s=con_mat[i, j], va='center', ha='center', size='xx-large')
plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
Im getting this error on my CNN deep learning code. Any help? I tried to fix it but Im stuck. ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 24), found shape=(None, 8)
Hi
Anyone familiar with fitting a logistic regression model using 70%-30% of the data for training-testing the model? Report AUC
This feels like another homework question, Jessica.
hm seems like you are either giving the wrong input or you need to edit architecture a bit more, why dont you share more details.
try ctrl v or paste on notepad first then copy paste again
When my pipeline gets really convoluted, it'd be nice to see all the steps graphically and be able to inspect intermediate results
scikit learn pipelines have graphical representation: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#using-the-prediction-pipeline-in-a-grid-search
My model get emotional damage for now
Epoch 2/500
5/5 [==============================] - 1s 116ms/step - loss: 33.5003 - accuracy: 0.0507 - false_negatives: 139.0000 - categorical_crossentropy: 33.5003 - val_loss: 236863552.0000 - val_accuracy: 0.8100 - val_false_negatives: 57.0000 - val_categorical_crossentropy: 236863568.0000
Epoch 3/500
5/5 [==============================] - 1s 116ms/step - loss: 7.6320 - accuracy: 0.0300 - false_negatives: 131.0000 - categorical_crossentropy: 7.6320 - val_loss: 361702784.0000 - val_accuracy: 0.8233 - val_false_negatives: 53.0000 - val_categorical_crossentropy: 361702784.0000
Epoch 4/500
5/5 [==============================] - 1s 123ms/step - loss: 2.8214 - accuracy: 0.0343 - false_negatives: 127.0000 - categorical_crossentropy: 2.8214 - val_loss: 139447792.0000 - val_accuracy: 0.8100 - val_false_negatives: 57.0000 - val_categorical_crossentropy: 139447792.0000
Epoch 5/500
5/5 [==============================] - 1s 142ms/step - loss: 4.0497 - accuracy: 0.0379 - false_negatives: 131.0000 - categorical_crossentropy: 4.0497 - val_loss: 51209328.0000 - val_accuracy: 0.8233 - val_false_negatives: 53.0000 - val_categorical_crossentropy: 51209328.0000
I need to tweak it more ...
Hey guys how would you go about using groupby only on rows that are "1T" apart in a pandas timeseries ?
So in my mind it would be something like this :
deltaTimeThreshold = np.timedelta64(1, 'm')
Not valid code obviously : df = df.groupby('sn')(if df.date -df.date.shift() <= deltaTimeThreshold)
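One common way to sketch this, assuming the goal is to treat consecutive rows of the same `sn` that are at most one minute apart as one group: mark the rows where the gap exceeds the threshold, then take a cumulative sum of those marks to get a run id. The sample frame, the `value` column, and the `run` column name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "sn": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-01 00:01", "2022-01-01 00:05",
        "2022-01-01 00:00", "2022-01-01 00:01",
    ]),
    "value": [1, 2, 3, 4, 5],
})

threshold = pd.Timedelta(minutes=1)
df = df.sort_values(["sn", "date"])

# True where the gap to the previous row of the same sn exceeds the threshold.
# The first row of each sn has a NaT gap, which compares as False.
new_run = df.groupby("sn")["date"].diff() > threshold

# Cumulative count of breaks within each sn gives a run id.
df["run"] = new_run.astype(int).groupby(df["sn"]).cumsum()

# Any aggregation can now use (sn, run) as the key.
result = df.groupby(["sn", "run"])["value"].sum()
print(result)
```

Here the third row of `a` is four minutes after the second, so it starts a new run.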
on the topic of groupby
does a groupby sum change nan to 0 🤔
if a group has only nans, I want it to remain nan in the sum
df = df.groupby(["x", pd.Grouper(key="y", freq="M")]).agg({
    "feature": "sum",
})
ex:
features = [nan, nan, nan, nan, nan]
output : nan
features = [nan, nan, 0.14, 0, nan]
output : 0.14
https://stackoverflow.com/questions/26145585/pandas-aggregation-ignoring-nans @urban lance using nansum if you want to ignore that nan
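For the "all-NaN group should stay NaN" case, a small sketch using the `min_count` argument of pandas `sum`, which requires at least that many non-NaN values before a number is returned. The toy frame below just mirrors the two examples from the question; the time-based `pd.Grouper` is left out to keep it short.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["a"] * 5 + ["b"] * 5,
    "feature": [np.nan] * 5 + [np.nan, np.nan, 0.14, 0, np.nan],
})

# Default behaviour: NaNs are skipped, so an all-NaN group sums to 0.0.
default = df.groupby("x")["feature"].sum()

# With min_count=1 an all-NaN group stays NaN.
kept = df.groupby("x")["feature"].sum(min_count=1)

# Same idea inside a dict-style agg, via a lambda on each group's Series.
same = df.groupby("x").agg({"feature": lambda s: s.sum(min_count=1)})

print(default, kept, same, sep="\n\n")
```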
i actually was trying to run a NN model from github, but i am getting this error .......all i changed was remove the encoding part as i had the encoded dataset for the same
could some be able to help at #help-pretzel
i assume that either you have changed the model a bit, or the data.
its simple shape error
yes but i can even show you my code, all i did is remove encoding
yeah showing code will help.
also given the error already, what is the shape of your X and y?
s_label=[]
s_image=[]
fig = plt.figure(figsize=(5, 20))
k = 0
sample = paths.list_images(r'this is the path')
for s in sample:
    s_label = s.split(os.path.sep)[-2]
    s_image = cv2.imread(s)
    s_image = np.array(s_image, dtype="float") / 255.0
    fig.add_subplot(2, 5, (k + 1))
    plt.imshow(s_image)
    plt.axis='off'
    plt.title=s_label
plt.savefig(r'this is also a path', bbox_inches='tight')
I want to plot 10 images in a 2x5 grid but when i run this, i only get the last image
the structure look like this, but only the last folder shown in the plot
hm i think i have solved this before. hold on.
you want to save all of them right? its out of loop.
put in in loop.
it works
but now the axis make it annoying
nvm
fig = plt.figure()
k = 0
r = 2
c = 5
i = 1
sample = paths.list_images(r'')
print(sample)
for s in sample:
    plt.subplot(r, c, i)
    plt.title(s.split(os.path.sep)[-2])
    plt.axis('off')
    s_img = cv2.imread(s)
    plt.imshow(cv2.cvtColor(s_img, cv2.COLOR_BGR2RGB))
    i = i + 1
plt.savefig(r'', bbox_inches='tight')
and the output seems fine
cheers!

i don't know, even though each tail looks very distinct, i still cannot make the model predict all of them easily
visually
it was easy to classify them since I carefully chose the easy ones and left out the hard-to-separate patterns
import tensorflow.keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils.vis_utils import plot_model
seed = 9
numpy.random.seed(seed)
data = pd.read_csv("C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv")
X = data.iloc[:, 0:8]
Y = data.iloc[:, 8]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.001, random_state=seed)
# create model
model = Sequential()
model.add(Dense(8, input_dim=8, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(3, activation='tanh'))
model.add(Dense(3, activation='softmax'))
print(model.summary())
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
history = model.fit(X_train, Y_train, validation_split=0.3, epochs=16, batch_size=128)
# evaluate the model
scores = model.evaluate(X_test, Y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
plot_model(model, to_file='model.png')
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
my code^
i bet your dimensions of Y are (something, 1)
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils import plot_model
seed = 9
numpy.random.seed(seed)
# load datasets
#csv files were filtered based on the data.
input_file = "C:\\XXX...csv"
test_file = "C:\\XXX.csv"
dataset = pd.read_csv(input_file).values
# read training data
datasetTest = pd.read_csv(test_file).values
# split into input (X) and output (Y) variables
X = dataset[:,0:8].astype("int32")
Y = dataset[:,8]
XT = datasetTest[:,0:8].astype("int32")
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.001, random_state=seed)
# create model
model = Sequential()
model.add(Dense(8, input_dim=8, init='normal', activation='relu'))
model.add(Dense(4, init='normal', activation='relu'))
model.add(Dense(3, init='normal', activation='tanh'))
model.add(Dense(3, init='normal', activation='softmax'))
print(model.summary())
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
history = model.fit(X_train, Y_train, validation_split=0.3, epochs=16, batch_size=128)
# evaluate the model
scores = model.evaluate(X_test, Y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
plot_model(model, to_file='model.png')
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
(removed some lines to fit in)
dataset is 63168 by 9
i repeat, what is the shape of your Y?
model.add(Dense(3, init='normal', activation='softmax'))
your model expects output of vector of 3 not 1.
no hold on listen.
you are not using encoded y right?
so your encoded y has the shape of (63168, 3) that is why it works for that, and not for this.
whats the shape of encoded_Y can you print it?
yes i will verify
i said print encoded_Y.shape 
i removed my encoded
okay...okay do you first of all understand what is the issue?
yeah
and why removing encoding affects your code?
i think i should try printing his encoded Y too
i will know if error is after that or somewhere else
hm you can use to_categorical
I'm guessing those 2 purple clusters are classified as being of the same cluster right? 😬
Converts a class vector (integers) to binary class matrix.
Y = to_categorical(Y)
this will convert your Y in (something, 3) so then you're good.
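for anyone following along, here is a rough plain-numpy sketch of what that conversion does to the shape. the labels are made up, and `np.eye` indexing is only a stand-in for `to_categorical`, not how keras implements it:

```py
import numpy as np

# made-up integer class labels with shape (5,), standing in for Y
y = np.array([0, 2, 1, 2, 0])

# row i of the identity matrix is the one-hot vector for class i
num_classes = int(y.max()) + 1
one_hot = np.eye(num_classes, dtype="float32")[y]

print(y.shape)        # (5,)
print(one_hot.shape)  # (5, 3)
```

a softmax layer with 3 units outputs one row of 3 numbers per sample, so the targets need that same (something, 3) shape for categorical_crossentropy.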
@mint palm
ok thank you....i will try all that you said
alright, ping me here if you're still stuck.
well there must be 2 clusters and so they aren't all visible in 2d space (being covered by other datapoints)
I'm using Kmeans so how does this work 🤔
my input is a distance matrix of my dataset
depends on how you show your dots you need to show different colors for different clusters Im not sure if you did that.
of course I did
oh lol no offense. how many dimensions are there actually?
if it's 2 dimensional then by the definition of kmeans, there are 2 possibilities, something is wrong with your algo or something is wrong with your visualization.
this is what I input
probably the plot then, I thought I finally had something because I clearly saw 3 clusters now :/
been trying to cluster this data for almost 2 weeks 😅
this is the same data just with different x-y values
the elbow method tells me the optimal number of clusters is 3
and this is what happens when I input the original data set
what is that zombie infestation
okay that seems like k means output.
;_)
it is
but kmeans isn't optimal for my dataset;
I tried with chi1 distance and just counts of my dataset
i was gonna say zombie infestation is not a part of #data-science-and-ml or something but then i got the context lol
this is what the data looks like in that case
clusters I made looked fine on the 2d plot
lol
but when I did a profile report, I saw that the clusters didn't make sense
I know what you're all thinking, outliers! haha
I'm not sure what do you mean by profile report here?
I made a different dataset for each cluster, then looked at the values for all features but they weren't really different
for ex:
in cluster 1, the num_visits ranges from 1-27
and in cluster 2 they'd range from 1-24
and cluster 3: 1-32
I tried with count values + chi2 distance and hierarchical clustering/GMM
and redid the feature engineering so I had the right data from kmeans
should add that the data is normalized
@lapis sequoia
hm yeah I've been thinking about it, but I'm lost.
i cannot understand whats the issue
the clusters don't make sense so something must be off with the feature engineering
I'm trying to predict where in the customer journey a certain user is so the data should make sense 🤔
(tried with 3 datasets from different companies)
all the same results
i want to work out the gradient of colour from this image. so basically im thinking of creating a line or box that starts at e.g. y=0 and goes up until it hits the black colour and it should find pixel colours in terms of whiteness at all points along that line. so it should go from high white pixel values to medium grey pixel values to blackish pixel values. i can use some more basic software like imagej to draw a line and plot pixel values but the problem is in my image region of interest there are lots of random pixels with completely different greyness colours so the graph would have lots of noise (go from very high pixel whiteness to sudden low pixel whiteness). i want some way to kind of smooth the image or change odd pixels to have the same colour as their neighbouring pixels. for smoothing i cant smooth much because essentially i want to get an accurate plot of changing pixel intensity across the line. could anyone suggest some code
wait what - TPUs merging with Apple Silicon?
idk how they're exactly closed - their speciality is bf16 ops, that's what they're designed to do and Jax, Pytorch or TF works very well IMO
Will double encoding do anything at all?
for what
for encoding string to integers
for unique representation of set of possible value of any attribute
so i've built this classification model on the fruits360 dataset, and it's pretty accurate, but i'm not sure how to match up folder names to the numbered classes that pytorch outputs. does anyone have experience with this or know how to get that classification? thanks https://colab.research.google.com/drive/1WWtTrG57chcm2xf5bHlP7NCiFE_YpHin?usp=sharing
it depends on the algorithm, but generally speaking, you can't just assign words to arbitrary integers, as this tells the algorithm that a word with a higher number is "more" than another word, which makes no sense.
i am talking about this
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
^ that and the paste bin are the only way I will look at code.
import tensorflow.keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import numpy
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from keras.utils.vis_utils import plot_model
seed = 9
numpy.random.seed(seed)
input_file = "C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv"
test_file = "C:\\Users\\rahul\\PycharmProjects\\pythonProject1\\complete.csv"
dataset = pd.read_csv(input_file).values
# read training data
datasetTest = pd.read_csv(test_file).values
# split into input (X) and output (Y) variables
X = dataset[:,0:8].astype("int32")
Y = dataset[:,8]
XT = datasetTest[:,0:8].astype("int32")
print(Y)
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
print(dummy_y)
(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.5, random_state=seed)
what's the algorithm?
now see
so you're using a neural network, built with keras
what is the model intended to do?
i have a dataset thats basically all strings
so i have to predict an output that falls in one of the 3 category
for which i use onehot representation
"strings". just saying that you have strings is uninformative--what kind of strings? what do they represent? where do they come from?
and what are the categories?
```
1,1,1,1,0,2,1,5,1
1,1,1,1,1,2,1,5,1
1,1,1,1,2,2,1,5,1
1,1,1,1,3,2,1,5,1
1,1,1,1,4,2,1,5,1
1,1,1,1,5,2,1,5,1
1,1,1,1,6,2,1,5,1
1,1,1,1,7,2,1,5,1
1,1,1,1,8,2,1,5,1
1,1,1,1,9,2,1,5,1
1,1,1,1,10,2,1,5,1
1,1,1,1,11,2,1,5,1
```
this is a sample
after you encode them?
encoded sample^
Is there a good format to save a table with images? I'll typically plot a data frame and save each plot to a file, but it might be nice to just save the whole table to a single file which I can open/annotate. The question is, is there a format which already has visualizers, which allow you to filter/maybe annotate?
you can pickle the dataframe and load it again if you need to generate additional figures later, if that's what you mean.
yes columns are weekdays, network types, etc etc
don't say "etc etc". I can't possibly guess what they are unless you tell me.
remember: I know nothing about what you're trying to do. only you do
I want a file format which I can open with a GUI, to examine plots (probably sorting/filtering) and maybe annotate them
Something a little better than opening a folder full of plots with the file explorer
#data-science-and-ml message can anyone help me
I've cooked up something like this, just wondering if there's something mature and off-the-shelf out there
I'm sorry that you haven't gotten help with that question after all this time. It's looking unlikely that anyone can help with that, so you might need to do some more investigation on your own and come back with a more pointed question later.
what are the categories you're predicting for?
I should really get back to work, but I would recommend reading about feature encoding. you have to be intentional about how you represent each feature for the network, or it won't understand what you're telling it to do.
like my weekday attribute is 0 to 6......for sunday to saturday
time is 0 to 23
like this so on
actually the main problem was....when i run this code it says 100% accuracy lol
has anyone had any experience with openai gym? i installed in python 3.9 and tried running sample code from the docs on the openai gym site but the example isn't popping up for me...
here's the code:
```py
import gym

env = gym.make('CartPole-v1')
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action) # take a random action
    if done:
        observation = env.reset()
env.close()
```
look at various edge detection methods like sobel or canny. if those arent fine-detailed enough, maybe look at specific ones or adjust/tweak them to fit your needs.
i know matlab does this type of stuff really well, but i think you might be able to get by with python's opencv library
you might have to mess around with kernel sizes
Okay I'll try to read up on this. Can I ask you again once I read up
im not a CV person, so im not the best person to ask sorry. i really didnt like my image processing/feature engineering class 
My X train input is this shape (700227, 8) and my y train input is this shape (700227, 11). Here is my model architecture:
model.add(Dense(2000, activation='relu', input_dim=24))
model.add(Dense(1500, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(800, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(400, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(12, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
My error is: ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 24), found shape=(None, 8)
haha no worries!
is there any matlab discord server
this is a part from github code
why are there separate input_file and test_file
he later does split data set into X_train, Y_train, and X_test , Y_test
Hello all, I am trying to perform invoice data extraction from an image of invoice and export the data into an excel file. But I want to extract only a few fields from the invoice and not the entire invoice. Can anyone please advise how can I do that ?
import csv
import pytesseract  # pip install pytesseract
from PIL import Image  # pip install Pillow

# set pytesseract.pytesseract.tesseract_cmd to be the path to your tesseract engine executable
# (where you installed tesseract from above urls)
# and start doing it

# your saved images on desktop
list_with_many_images = [
    "PartI_Data/Img1.PNG",
    "PartI_Data/Img1.PNG",
    "PartI_Data/Img1.PNG"
]

# create a function that returns the text
def image_to_str(path):
    """ return a string from image """
    return pytesseract.image_to_string(Image.open(path))

# now pure action + csv part
# csv.writer quotes the OCR text, so commas and newlines in it don't break the rows
with open("images_content.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ImagePath", "ImageText"])
    for image_path in list_with_many_images:
        writer.writerow([image_path, image_to_str(image_path)])
and whats the error?
!code first of all
hi so i had one doubt, if i apply LabelEncoder on already encoded dataset, will it change anything?
i think NO
I am not aware about LabelEncoder, what is it?
its like giving unique value to represent a string like in dataset containing sunday to saturday we can substitute 0 to 6
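a small sketch for the "encode twice" question. `label_encode` is a hypothetical helper built on `np.unique`, assuming (as far as I know) that LabelEncoder also maps the sorted unique values to 0..n-1:

```py
import numpy as np

def label_encode(values):
    # sorted unique values become classes 0..n-1
    classes, codes = np.unique(values, return_inverse=True)
    return codes

days = np.array(["sun", "mon", "tue", "mon", "sun"])
once = label_encode(days)
twice = label_encode(once)
print(np.array_equal(once, twice))  # True: codes that are already 0..n-1 map to themselves

# integers that are not already 0..n-1 do get remapped
sparse = np.array([1, 5, 9, 5])
print(label_encode(sparse))
```

so under that assumption, re-encoding a column that is already 0..n-1 changes nothing, but it also adds nothing.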
well leaving that aside for a while, my main issue is i am getting 200% accuracy on running github code without any change
200% ACCURACY!!!!
what in the world
lol
seems like private
Hello, I'm beginner in AI,
What is best methods for fake image(modified) detection? (without CUDA)
the best method is going to involve CUDA, just so you know.
hm isn't cuda just for GPU?
yes
I know that but i don't have CUDA GPU
so involve as in to make it faster right?
do you have any other suggestion?
what are the fake images you're trying to detect, anyway?
if test_size=0.001, thats like...0.1% you do realise how much test data you are giving right?
photoshopped and etc...
also, you can experiment with GPU computation using google colab.
^for certain hours
just detecting if an image has passed through photoshop in some way is probably going to be too broad.
yes thats just about 64 examples, i dont get what input_file and test_file mean
input_file is ok
but if he used train_test_split then why he need test_file
oh yes so if you have different testdata why are you bothering to split train data?
i am not bothering, the git user is!
ok so lets see, I'll assume he has just split train data to get some scores, but if you do look below, he has commented out the part
#from sklearn.metrics import confusion_matrix
#y_pred_keras = model.predict_classes(XT)
#csv = open("C:\\DeepSlice\\5G\\output.csv", "w")
#"w" indicates that you're writing strings to the file
#pd.DataFrame(y_pred_keras).to_csv("C:\\DeepSlice\\5G\\output.csv")
#cm = confusion_matrix(Y_test, y_pred_keras, labels=[0, 1, 2])
which is for...testing i assume
now why he did above thing, well thats the thing I am unaware of.
so we are stuck
you* are
i also tried to train for test_size=0.5,
it still gives 100 %accuracy
i mean... it literally depends on dataset and the problem, just for the sake of argument, give it like 0.1?
also just ping me here if you ask another question, I'm reading a novel.
in 5/16 epoch it had 100% acc
lmao
i mean, i dont know what the problem is, maybe it could be solved by some linear function for all i know.
cant be....an author mentioned after applying CNN +LSTM it gave 95% accu
how can it be 100 with linear function lmao
as i said, i am not aware of even the problem.
yeah i understand, i am not complaining, you are fair at your place
How come the solution got (20,20) would someone mind helping me with this solution
that isn't really fit for this channel
but either way, it's just (2+2, 3+1) = (4,4) then (4*5, 4*5) = (20, 20).
You could've asked in a general help channel though (#❓|how-to-get-help)
could anyone help me with something. i want to get the pixel whiteness (in terms of greyscale) from bottom to top along a single imaginary line in my image. so i get many values of numbers
if the image is already a 2d array of grayscale values, then the pixel "whiteness" along a given line would just be one row-column of the array.
oh okay, would you be able to help me with the code
I can't do that rn. sorry.
sure, no problem. would i be able to come back at a time that suits you to get more help
It's never guaranteed that I'm available at any particular time. But if you loaded the image with PIL, you can read it into a numpy array. If you get a 3d array, that means one of the dimensions is RGB values, so you'd have to look into how to convert that to a 2d array of grayscale values.
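a minimal sketch of that idea, using a synthetic array in place of a real image so it runs on its own. the luma weights are one common choice for RGB to grayscale, and the column index `x` and the 5-pixel band are arbitrary assumptions:

```py
import numpy as np

# synthetic stand-in for np.asarray(Image.open(path)):
# shape (rows, cols, 3), white at the top fading to black at the bottom
height, width = 100, 60
ramp = np.linspace(255, 0, height).astype(np.uint8)
rgb = np.repeat(ramp[:, None, None], width, axis=1).repeat(3, axis=2)

# 3d -> 2d: weighted sum of the R, G, B channels
gray = rgb @ np.array([0.299, 0.587, 0.114])

# vertical line at column x, read from the bottom row up to the top row
x = 30
profile = gray[::-1, x]

# to tame odd pixels without blurring along the line,
# take the median over a few neighbouring columns instead
band = gray[::-1, x - 2 : x + 3]
smooth = np.median(band, axis=1)
```

the median is taken across the width of the band, so the change in intensity along the line itself is left alone.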
They call them neural cores or whatever, but the thing is that you can't just write programs for them without using their high level ML library which is Pytorch-like (and can convert Pytorch, TF models). Which is limiting because for those who want to run their Pytorch models and such there are missing functions (and you just have to hope that Apple adds the missing stuff), and for those that want to just get as much compute as possible out of the SOC, they would have to now hack on this high level API rather than just being able to generate instructions for it directly. People are already actively reverse engineering it, but there is not really any (good) reason for Apple to make it this painful.
(The neural cores are more or less just CUDA-like cores ripped out of the GPU)
In addition, in the high level API, you can't control where the ANN runs. Apple's driver decides it dynamically and can place it either on the CPU, GPU, or neural cores. This might sound nice, but in practice programmers often know where they want it to run and the driver will just make everything worse by trying to be smart (many have already run into this issue). It would be fine if that was the default setting and you can still force it to run where you want.
Apple hardware has not really been that good for scientific computing. It partly stems from the apple/nvidia rift and NV being completely ahead of the game wrt CUDA, but also historically the higher end apple hardware has been designed to be really good at photo/video work
i say this as a huge apple fanboy who just got an m1 pro macbook
i think they are broadening things tho with the arm/m1 stuff
I do agree but you don't have these restrictions on XLA at all, so I don't see any basis for arguing - they're pretty flexible for training large DL models. Its not like anything is closed source 🤔 one could still compile and execute custom operations
its still very early days for desktop ARM. As I see it building massive deep learning models is still a very niche thing and most orgs are going to either do it via cloud or dedicated specialized hardware
also ipython nb are for sharing/documenting research output for other ppl, they arent good as units of computation imo
stuff like anaconda are obsolete and shouldnt be used IMO
it seems to me if you want to do serious GPU compute, you want to use CUDA cause thats what all the libraries support best, which means you're going to use Nvidia hw, which rules out apple.
I am not sure how correct that view is
Not sure if XLA supports the neural cores on Apple's SOCs, nor if it ever will since it's closed off. Can't just throw LLVM at it (LLVM has to be allowed to target it in the first place). Also some want to do non-ml stuff with the neural cores, because it's some compute power that would otherwise be wasted if they are not doing any ml.
there hasn't been any information released about the neural cores AFAIK
We know that you can use Apple's core-ml lib or whatever it's called which is Pytorch-like and convert Pytorch / TF models to itself. Unofficially people have been reverse engineering it for some time.
if u want to use other ppls built models coreml is good imo
its a decent solution to a tricky problem
and I still don't see how it is relevant to the original point - TPUs are pretty customizable w/ Jax, and nothing like Apple's SOC at all. they have plenty of information available and work directly with XLA as well as the Jax team
I thought you wanted more info on TPUs being a thing on Apple silicon.
I don't see any operation you can't do with jax, just that it won't be any faster or optimized unless its precision-agnostic
TPUs are Google's specific terminology for them, but it's more or less the same thing as Apple's neural cores. 16 bit floats and all.
uhh, no I was actually looking for an elaboration of this 🙂
TPUs are much more open, but the general hardware idea of them could be considered to be everywhere now that they are in Apple silicon.
And will continue to be everywhere like how a GPU is now.
well, again AFAIK fundamentally TPU structure is pretty different to other hardware alternatives
its more about the hardware architecture 🤔 rather than direct customizations of the chips themselves - they have different memory systems and other complex stuff which I didn't get
TPUs do two things well, fast low precision floats for matrix multiplies, and convolutions. They were also designed to fit nicely into their data center racks. Other than that, they are basically just stripped down GPU cores.
Apple's neural cores do these two things well also.
And are also stripped down GPU cores.
So other than name, TPU vs neural core, they are more or less the same thing. My guess is that the rename was to both avoid confusion with Google's stuff and to sell it better.
There are probably differences between the two, but they both have the same goal of those fast low precision float operations. And both come from having previously used the GPU and so they are still GPU-like to save R&D time.
Hello, I am trying to use plot_predict with python 3.10.3 and statsmodels 0.13.2. My advisor ran the exact same code and it worked but when I run it I get the following error. I have tried uninstalling and reinstalling everything 3 times with python 3.9.11, 3.9.9 and statsmodels 0.12.2 which is the version that the advisor uses. None of it is working, how can I get it to work?
Thank you
fig, ax = plt.subplots()
ax = adtrain.loc['2020-05-02':].plot(ax=ax)
fig = result_whole.plot_predict(start = '2021-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
plt.show()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [7], in <cell line: 6>()
4 ax = adtrain.loc['2020-05-02':].plot(ax=ax)
5 ## fig = result_whole.plot_predict(start = '2020-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
----> 6 fig = result_whole.plot_predict(start = '2021-05-02', end = "2022-02-14", dynamic=True, ax=ax, plot_insample=False)
7 plt.show()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\statsmodels\base\wrapper.py:34, in ResultsWrapper.__getattribute__(self, attr)
31 except AttributeError:
32 pass
---> 34 obj = getattr(results, attr)
35 data = results.model.data
36 how = self._wrap_attrs.get(attr)
AttributeError: 'ARIMAResults' object has no attribute 'plot_predict'
If I try to install statsmodels 0.12.2 with the current python version I get a very long error: ``` note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.```
I found this solution ( https://stackoverflow.com/questions/71009659/note-this-error-originates-from-a-subprocess-and-is-likely-not-a-problem-with ) but Idk what plugin to take from the site. The one from the stackoverflow solution does not work.
How do I change the learning rate of my CNN using tensorflow?
I'm not a tensorflow user, but isn't it one of the parameters when you go to compile the model, or something like that?
can you show the code that creates the model? so I have an entry point for where I should be looking in the docs.
cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D())
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=255, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.fit(x=train_set,validation_data=test_set,epochs=25)
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))
opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt)
I found this.
ok thx
tbh I'm not clear on what an "optimizer" is. is it a way of representing backpropagation as an object?
has OOP gone too far?
no, it's the actual optimization algorithm
i guess you could say it's an implementation of the strategy pattern, if you want to think about it in OO terms
basically it just changes the weight update algorithm
although in principle you could have l-bfgs or something like that, i don't think there's a "stochastic" l-bfgs version, so you'd have to set batch size = training set
interesting. so these are algorithms for updating the weights that are "smarter" than backprop?

so backprop is always backprop
but how the weights are updated, is the optimizer
so SGD is the basic algorithm
gradient descent that conforms to a defined probabilistic distribution?!
what do you mean
just expanding out "SGD" to have the definition of "stochastic" in it.
oh
I'm still listening, if there is more that you were planning to say.
sorry i got pulled into a dota match!
ping me in an hour lol
tldr you can have fancier weight updates than sgd
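to make that concrete, a toy sketch: the gradient comes from the same place in both cases (here a hand-written `grad` standing in for backprop), and only the update rule differs. the loss, learning rate and beta are made-up values:

```py
import numpy as np

def grad(w):
    # gradient of the toy loss 0.5 * ||w||^2; stands in for what backprop returns
    return w

def sgd_step(w, g, lr=0.1):
    # plain SGD: step against the gradient
    return w - lr * g

def momentum_step(w, g, velocity, lr=0.1, beta=0.9):
    # SGD with momentum: keep a running velocity and step along it
    velocity = beta * velocity - lr * g
    return w + velocity, velocity

w_sgd = np.array([1.0, -2.0])
w_mom = w_sgd.copy()
v = np.zeros_like(w_mom)
for _ in range(3):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_mom, v = momentum_step(w_mom, grad(w_mom), v)

print(w_sgd, w_mom)
```

adam and friends follow the same pattern with more bookkeeping per weight.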
!remind 1h lick the salt
Your reminder will arrive on <t:1647485281:F>!
Is simulated annealing in category of machine learning?
I suppose it's kind of like unsupervised learning
its an optimization algorithm
are you playing dota or not?!
Can be algorithm and unsupervised learn same time?
It's not a category, it's a method.
yes. informally, unsupervised learning is when you don't tell it what the answers are.
There are various algorithms for unsupervised learning.
@serene scaffold an optimization algorithm in general is an algorithm for finding a local or global maximum or minimum
so gradient descent is a general category of optimization algorithms
Okay a method of unsupervised learning
Unsupervised is the type of algorithm, not a specific one. There are several.
Huh then not machine learn
simulated annealing is another optimization algorithm
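a bare-bones sketch of simulated annealing on a 1d function, just to show it is an optimizer with no learning from data involved. the step size, starting temperature and cooling rate are arbitrary assumptions:

```py
import math
import random

def simulated_annealing(f, x0, steps=2000, temp=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for _ in range(steps):
        candidate = x + rng.uniform(-1.0, 1.0)
        fc = f(candidate)
        # always accept improvements; accept worse moves with probability exp(-delta / temp)
        if fc < fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        # cool down so that worse moves become less and less likely
        temp *= cooling
    return best_x, best_fx

best_x, best_fx = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
print(best_x, best_fx)
```

the early willingness to accept worse moves is what lets it escape local minima on bumpier functions than this one.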
There are categories of machine learning, and there are categories of optimization.
@ionic palm you are probably looking for k-means clustering as an unsupervised algorithm
Zachary never made this about unsupervised learning. I did.
I was referring to this
Can be algorithm and unsupervised learn same time?
So you pick which category of machine learning you want / need. Then it will probably involve some optimization problem which can be solved by picking a method from some category of optimization algorithms.
Wtf is unsupervised algorithm
no labels
say if you have a dataset of cats and dogs - in unsupervised learning, you just lump together photos that look like cats together and photos of dogs together, but you don't know which one of the either sets are cats or dogs
this is an example btw
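a tiny k-means sketch of that "lump similar things together" idea, with two made-up blobs of 2d points standing in for the photos. the init (first and last point) is a simplification for the demo:

```py
import numpy as np

rng = np.random.default_rng(0)
# two made-up blobs of 2d points; no labels are given to the algorithm
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2)),
])

# start from two of the points as centroids
centroids = points[[0, -1]].copy()
for _ in range(10):
    # assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    # move each centroid to the mean of the points assigned to it
    centroids = np.array([points[assignment == k].mean(axis=0) for k in range(2)])

print(assignment)
```

the output is just group 0 and group 1. nothing tells you which group is "cats", which is the point of the example above.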
what was your actual question going to be, anyway, @ionic palm?
Is simulated annealing in category of machine learning?
Now i understand: simulated annealing is an optimization method of unsupervised learning
😀 It's a type of Machine Learning. So apparently there are basically 5 types of ML (some even say 4 types)
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Self-Supervised Learning
Reinforcement Learning
if you ever feel like writing an explanation for each in one comment, let me know and I'll pin it.
Alright I'll do that.
no pressure. I say I'm gonna do stuff and then decide not to increasingly lately.
Im sorry for asking that much, thank you so much
nah, we like talking about this stuff.
it's long debugging sessions that are more draining for us
Sure! It's 3:00 am here so I'll do that perhaps tonight or tomorrow
There are more, but these are the most known, so I think this list is fine. Some like to say there are only 3 (supervised, unsupervised, rl). But as time goes on there will probably be even more.
(some don't even have proper names yet beyond temporary made up names)
(you could argue there is just two, unsupervised (external input only (and those generated by itself)), and addition input / structure / supervised (e.g. labels, embedded vector spaces, knowledge graphs, reward signals, etc (can be hand crafted)))
(but you could also go more extreme and say it's all just inputs, which is not very useful because it does not distinguish anything (how many clusters do I choose?))
Here's your reminder: lick the salt
as much as people seem to dislike the idea of full stack DS, i really think there is power in being able to prototype stuff, especially "data apps"
thats my hot take for tonight 
@misty flint if full stack were applied to DS, I would think it should mean "a data scientist who is also a software engineer"
Because there are a lot of data scientists who shit out terrible code. Especially if they only use notebooks.
There's my hot take as well.
Why would I hire someone that can only do DS versus someone that can do DS and write an app?
(Assuming they do DS just as well)
I'm not sure what rex means by disliking the concept of full stack ds

im trying to understand that myself
i think those who are more "A-type Analyst" data scientists dislike the concept of full stack is what it seems like
There are a lot of concepts I hate as well. Including Keurig and Applebee's.
but the "B-type Builder" data scientists obv are all for it
(X_train, X_test, Y_train, Y_test) = train_test_split(X, dummy_y, test_size=0.001, random_state=seed)
does this shuffle the dataset before dividing between train and test?
@mint palm the docs probably specify if the partitions are random, arbitrary, or deterministic.
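as far as I know, train_test_split has shuffle=True by default and rounds the test count up. a plain-numpy sketch of shuffle-then-split under those assumptions (`shuffled_split` is a hypothetical helper, not sklearn's implementation):

```py
import math
import numpy as np

def shuffled_split(n_samples, test_size, seed):
    # shuffle the row indices first, then cut off the test portion
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)
    return order[n_test:], order[:n_test]

train_idx, test_idx = shuffled_split(63168, 0.001, seed=9)
print(len(train_idx), len(test_idx))  # 63104 64
```

with a fixed seed the partition is random but reproducible, which is what random_state=seed is for.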
i think full-stack DS can prove business value to average companies much easier than someone who doesnt; i think maybe as this field develops more they can help add ML features to apps and such (outside of data-driven orgs)
otherwise you have business people always asking DS whats their business value
since many dont truly understand the concept of R&D

and experimentation
"what did you do this quarter?"
"...our experiments were inconclusive"
"..."

I'm not sure who would be opposed to this. If you can do more work for me then sure, go ahead.
I am new to the data science world and want to ask some questions. If I want to be a Business Analytics Specialist, is it better to focus on the ML aspects, or do I need to learn something else? Right now I am still an undergrad student in math. Thank you!
.
T-shaped for life
Is this your chart?
Sadly.. yes
Business Analytics doesnt necessarily use ML. they might use some simpler statistical models like linear regression, etc; usually these "business" titles require some domain expertise in the same industry (i.e. real estate experience, logistics, etc.)
Simulated annealing is outside all of that, in another bubble, under "optimization".
I am so sorry
What are you sorry for?
Getting any kind of set chart with intersections, subsets, and such will be difficult and not really work for this.
Not everything fits in those.
as humans, we like to put everything into boxes 
It's more like pick type of ML, then pick algorithm in that type, and that may involve annealing, etc, which is its own separate thing.
ML Type + Algorithm = ML?
What you had is kind of like putting linear algebra under chemistry. Chemistry makes use of it, but it's not a subset of it, linear algebra exists on its own.
ML is any program that "learns", which basically means it starts out with some base amount of knowledge and gets more data over time.
Note that this is super loose and can pretty much include any program that stores inputs or information about those inputs.
A really simple example is a program that takes two numbers as input, X and Y.
The program then stores those as pairs.
And if you give it one, it can give you the other.
It learned to associate them.
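The pair-storing program described above can be sketched in a few lines. This is only a toy illustration: the class name `PairMemory` and its methods are made up here, and a plain dictionary lookup stands in for the "learning".

```python
class PairMemory:
    """Toy 'learner' that memorises (x, y) pairs and recalls either side."""

    def __init__(self):
        self._x_to_y = {}
        self._y_to_x = {}

    def learn(self, x, y):
        # Store the association in both directions.
        self._x_to_y[x] = y
        self._y_to_x[y] = x

    def recall(self, value):
        # Try the value as an x first, then as a y; None means "never seen".
        if value in self._x_to_y:
            return self._x_to_y[value]
        return self._y_to_x.get(value)


memory = PairMemory()
memory.learn(3, 9)
memory.learn(4, 16)
print(memory.recall(3))   # 9
print(memory.recall(16))  # 4
print(memory.recall(5))   # None
```

Everything it "knows" is exactly what it was shown. It cannot guess the partner of 5, and that gap is where inference from stored data comes in.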
The complexity of ML is how to do more with less. And how to infer stuff based on what was stored.
And also what if your input is noisy / not exact? Etc.
What if the input is too complex to really deal with directly (e.g. an image)?
There is still confusion about the job responsibilities and capabilities of ML and DS
(how do you store the relevant stuff about it?)
ML is about machine learning, nothing more and nothing less. DS often involves standard statistics, forecasting, business stuff. DS can make use of ML since ML happens to also often make use of statistics to function well.
DS is more like, I have this job, and I need to pick the right tools and such. That might include picking an ML based tool.
from my previous code
i hate that big gap between rows... now find the way to tighten the gap
i thought an MLE was just the implementer of a model...
Yeah they implement the models, and then some DS somewhere might find it applicable to their task.
However, many DS are also often MLEs, etc. People are not limited to one thing, it's just their job title.
oh now i get the point
also this field is so new that the boundaries tend to blur quite often
DS in title
full stack Data and Modelling is the real job description
and it differs per company too
Yes, all of this is fuzzy sets really, so don't worry too much about it.
The clearer distinction is between DS and DA
DS could also handle job of MLE
and sometimes DA too
Yes, DS is broad.
for companies, i think in general for a "data team" if they can cover the majority of the skills between all the roles, i think thats sufficient for most business use cases
(in part due to companies not knowing what DS is and often just want a statistician that knows Python or R)
obv if you are a data-driven SaaS company, thats very dif then
(but also acts somewhat as an accountant? idk, it's weird)
(and for some reason reports to the CFO. strange)
at least could operate Excel fluently
I rarely find a job title that uses 'Machine Learning Engineer'. sometimes it's also business analytics. and many people don't know what Business Analytics means. That's why I am often confused, haha
i read the docs,
random_state = int is for reproducing same division
shuffle = bool is for shuffling the dataset
but i didnt use shuffle, and its still shuffled
i heard about this on a podcast today and it comes from business just not understanding data teams 
I find an MLE job
and it need higher degree like Master or PhD
I mean really it's a nebulous "handles data" person. Which often involves statistics, some spreadsheets (or getting stuff from databases / any tables), and some graphing.
(from what the companies know / POV)
many times companies look for graduate degrees for their DS roles too

Many companies will ask for PhD but there is not really enough competition / supply so if you apply you may get accepted anyhow (without any degree even).
They will often make the requirements much bigger than needed.
yknow what i heard about that on a separate podcast
that particular element as well as increasing number of years allows companies to do one critical thing for job postings
decrease the amount of job applicants

dunno if thats actually true but thats what the podcast guest advocated for
hes a director-level so maybe 
interesting

i do want to try out airflow sometime
get bit familiar with it
see if i can use it for this one project
can anyone suggest me a good YouTube video on reinforcement learning ?
where did you draw this?
i need a nice platform, I'm using draw.io but I dont know...it seems okay.
please can anyone help me with these 2 csv files? I want a single row from each, containing only numeric data, so that I can convert it again and use it to test my model. I am getting this type of csv as features extracted by a package. So please help me out with this. Thank you
I tried but it gives weird output, and I'm unable to open t2.csv using pandas or to convert the strings to float
each you mean c(...) ?
based on the input can you show what you want as output?
like 0.0406, 0.0363, 0.0278, 0.0206, 0.1041, -0.0145, -6e-04, 0.0654, 0.04, 0.086, 0.0775, 0.0018, 0.0285, 0.109, 0.0569, 0.0169, 0.0484, 0.161, 0.0248, 0.0696, 0.0285, 0.0367, 0.0438, 0.0269, 0.0758, 0.0389, 0.0049, 0.0367, 0.0325, 0.0796, 0.0778, 0.0334, 0.0589, 0.0939, 0.0919, 0.026, 0.0331, 0.0943, 0.0247, 0.0616, 0.014, 0.0314, 0.0409, 0.0419, 0.0949, 0.0409, 0.0249, 0.0614, 0.0345, 0.066, 0.0485, 0.0438, 0.031, 0.0688, 0.1064, 0.0406, 0.0488, 0.0868, 0.0314, 0.0431, 0.0329, 0.0514, 0.0432, 0.0533,
0.0747, 0.0552, 0.0489, 0.0638, 0.045, 0.0484, 0.031, 0.0579, 0.0085, 0.0498, 0.1074, 0.0454, 0.0442, 0.0902, 0.0173, 0.0316, 0.0124, 0.0327, 0.0582, 0.0438, 0.116, 0.0352, 0.029, 0.0849, 0.0306, 0.0418, 0.0375, 0.0412, 0.0265, 0.0628, 0.0717, 0.0515, 0.0487, 0.0904, 0.0454, 0.0399, 0.0316, 0.0517, 0.0459, 0.0304, 0.0781, 0.0454, 0.0245, 0.0442, 0.0446, 0.0643, 0.0492, 0.0501, 0.0239, 0.0616, 0.0838, 0.0381, 0.0484, 0.1091, 0.0281, 0.0469, 0.0461, 0.0619, 0.0493, 0.0503, 0.0629, 0.0572, 0.0522, 0.0611, ...... in a single row without c() for t2.csv and t3.csv
separate output for t2 and t3
like c(1,2,3,4),c(5,6,7,8,..)
should give output as 1,2,3,4,5,6,7,8...
what is t2 and t3?
not the most elegant solution, but you can:
split on ","
replace "c(" with ""
replace ")" with ""
change all items to float
okay I have to try this, if you have code ref you can share
its ugly but should work 😉
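t2a = t2.split(",")  # split the raw file text on commas
t2b = [item.replace("c(", "") for item in t2a]  # drop the R-style "c(" prefix
t2c = [item.replace(")", "") for item in t2b]  # drop the closing parentheses
t2d = [float(item) for item in t2c[3:]]  # skip the first 3 items, convert the rest to float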
also asking in the general channel may help, but probably simplify the input a bit, just to show the format
can we perform t2.split() directly on csv
i meant to read file first
by which func
with open(file) as f:
    t2 = f.read()
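A slightly more robust take on the split/replace steps above, as a sketch only. It assumes the file text is a run of R-style `c(...)` groups holding plain numbers; the function name `parse_r_vectors` and the sample string are made up for illustration, and the real t2.csv / t3.csv layout may differ (headers, quotes, `NA` values).

```python
import re


def parse_r_vectors(text):
    """Return every number found inside c(...) groups, flattened into one list."""
    numbers = []
    # Each match is the comma-separated content between "c(" and ")".
    for group in re.findall(r"c\(([^()]*)\)", text):
        for token in group.split(","):
            token = token.strip()
            if token:
                # float() raises ValueError on non-numeric tokens such as "NA".
                numbers.append(float(token))
    return numbers


sample = "c(1, 2, 3, 4),c(5, 6, 7, -6e-04)"
flat = parse_r_vectors(sample)
print(", ".join(str(x) for x in flat))
```

With a real file, the string passed in would be the text read via `open(...)` as shown above, once for t2 and once for t3.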
@lapis sequoia
this is the unencoded matrix
i found it
the green box in Y(prediction)
Y can be either eMBB, URLLC, mMTC
as you can see the rest0(X[0:8,:]) are categorical data, thats why author might have encoded it
Hey ,wanted to begin ML ,don't know where to start and if i should to algorithms and data structures and dwell into competitive coding first
Also i don't comprehend uni level math
like the stuff in andrew Ng
hm?
yes
yes what?
I'm sorry I don't even remember what was the question. it was all yesterday.
ok, the problem was:
is it ok to encode X before solving
yesterday i only had the encoded data but now i have this description of the dataset
How come the 2nd question got an answer of 77mm? Would someone mind interpreting it?
Yes, it does. By default, it shuffles your sample observations before splitting them into train and test sets. You can also disable shuffling prior to splitting.
train_test_split(x, y, test_size =0.15, shuffle=False, random_state = 2022)
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
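A quick way to see this for yourself, as a sketch on made-up toy arrays (the data and the 0.2 test size are just for illustration). It relies on `shuffle=True` being the default of `train_test_split`; `random_state` only fixes which shuffle you get.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Default behaviour: shuffle=True, so rows are permuted before the split.
# A fixed random_state makes that permutation reproducible across calls.
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# shuffle=False keeps the original order: the test set is the tail of the data.
_, X_test_ordered, _, _ = train_test_split(X, y, test_size=0.2, shuffle=False)

print(X_test_a.ravel(), X_test_b.ravel())  # same rows both times
print(X_test_ordered.ravel())              # last two rows, in order
```

So seeing shuffled output without ever passing `shuffle` is expected; passing `shuffle=False` is what turns it off.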
Where I'm from, you basically need to be good in SQL, Excel, PowerBI and/or Tableau. Any other skill like python, etc is an added advantage.
So, you really don't need to focus on ML to excel in your job as a Business Analyst.
Hey, I have a doubt: I was training a model and split my data into a train part and a validation part, and I normalized both. But I have another dataset for testing that simulates "real data"; should I normalize it too?
@lone merlin hello dude, do you know any good Discord communities on mathematics? I'm stuck on linear algebra, coordinate geometry, and random processes, all in the context of computer vision and image processing problems. Please, would you mind helping me find what I'm desperately looking for? I took the CV and IP course, which is all about mathematical and physical underpinnings
I think a large number of people who recently got into ML had at some point used the popular Andrew NG's Machine Learning course on Coursera.
You can start from there and see if that works for you. However, if you happen to be like me, practically dozing off each time I went beyond 20 mins in Andrew Ng's course, then feel free to drop the course and try Udemy, DataQuest, DataCamp, Books, Bootcamp, etc.
I guess the moral of my story is, try as many resources as possible before you eventually settle on one, and don't hesitate to drop any material that doesn't work for you.
Only after you've gained some form of experience in learning ML would you truly appreciate using Hackathons to validate and reinforce what you've learned.
All the best ✌️
@odd meteor @catcatgurl hello dude, do you know any good Discord communities on mathematics? I'm stuck on linear algebra, coordinate geometry, and random processes, all in the context of computer vision and image processing problems. Please, would you mind helping me find what I'm desperately looking for? I took the CV and IP course, which is all about mathematical and physical underpinnings
Hi Patrick, what is sauce for the goose is sauce for the gander. To ensure equity and fairness, and to give your new data an equal playing ground, yes it's necessary!
Assume you have done all the preprocessing and wrapped it in a pipeline; even your new test dataset is supposed to pass through the same furnace (pipeline) to get it into its most useful state for your ML model.
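"The same pipeline" here usually means the same fitted preprocessing: the normalization statistics come from the training split and are then reused, unchanged, on validation and on new data. A minimal sketch of that idea with a hand-rolled scaler (the class name `StandardScalerSketch` and the numbers are made up for illustration; in practice a library scaler would play this role):

```python
from statistics import mean, pstdev


class StandardScalerSketch:
    """Standardises values using statistics learned from the training data only."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [10.0, 12.0, 14.0, 16.0]
valid = [11.0, 15.0]
new_data = [13.0, 20.0]  # stands in for the "real data" test set

scaler = StandardScalerSketch().fit(train)   # fit once, on train only
train_scaled = scaler.transform(train)
valid_scaled = scaler.transform(valid)       # reuse the train statistics
new_scaled = scaler.transform(new_data)      # and again for new data
print(new_scaled)
```

Refitting the scaler on the new data would put it on a different scale from what the model saw during training.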
Hey guys, I'm a beginner in python and I have some module installation problems that I don't know how to solve. Can any of you help?
Or is this not the place to ask
Unfortunately, I don't know any. However, I'm sure Salt and Stelercus would have a better answer to your question.
What's the name of the module and what error message did you get?
The module name is pywhatkit
The error message is line 3, in <module>import pywhatkit
Forgive my ignorance and laziness to use google, what's the work of this library?
wrong go to #python-discussion and you need to install this module
run pip install pywhatkit
1. Did you successfully install this library on your machine?
2. If answer to #1 is yes, then try to add the location where this library was downloaded in your machine to your PATH
3. Alternatively, if you use Anaconda, you can also use conda to install the library.
point to the error and pycharm would suggest you to install it
OK
It doesn't matter. If you used PyCharm to install the library, check if it actually downloaded the library into your default python environment or into a new one entirely.
But i've tried installing it in both pycharm and command prompt and they both say it has been installed.
Ok
I will retry
Oh and thanks for your help.
I don't know if there's any other better way to visualise a 3d dataset other than probably making it interactive. That way you can interact, rotate, and view each dimension of the data with ease.
Isn't this plotly? 😀
If not, try using plotly or cufflinks to plot and visualize the data
hello peoples
what clustering methods are great for hyperbolic data?
yeah you're quite right) the only improvements are changing the alpha value, size and color
@serene scaffold @digital radish plz any of u help me with this problem
@odd meteor no, plotly is good for zooming live, but I need this plots to include in my paper, so
@serene scaffold...
anyone here have any experience with pyttsx3 for text to speech? Currently weighing different options for a voice engine for my AI but for windows pyttsx3 uses sapi5 engine which brings robotic voices. Other txt to speech engines I have found need a file to read from to turn it to speech. Does anyone have a recommendation for natural sounding voices using pyttsx3 or a free API that can achieve the same thing?
anyone work with gpt3 before
If you want to use the plot on paper, I don't suppose the interactivity of the plot would still be possible. If that's the case, then you can visualize your 3D data using Seaborn, Plotly, Cufflinks, etc.
You only need to get the 3d plot then copy it and paste it ( or download and upload the image) on your MSWord or Notion.
@exotic thicket it's impolite to ping random people to draw attention to your question. Please refrain in the future--this is a warning.
@lapis sequoia did you see what I just said to pari?
oh i apologise, i had a question about where did they draw the diagram.
I'll delete it if you want.
no; never delete a message in which you pinged someone
damn, i just..did.
now they've been pinged and they'll never be able to figure out where it came from.
anyway, you can ping someone if they've already engaged with your specific question.
no they have not exactly engaged. I asked them before a while and seen them online now, so wanted to ask, but hm, I'll wait when they are actually typing or something.
shouldn't that be velocity? also as much better in terms of what? I can look if we can save in some lossless way or not.
high definition was possible in matlab as much i remember.
it's fine with matplotlib only; how can seaborn improve the look when it only sets the style? Anyway
no it's the velocity of the magnetic field, not speed velocity)
ah alr, my bad lol.
it's ok
so yeah what do you mean by improve here?
good looking, I added dpi property, increased the size of scatter points, added (but commented this line) that changed the angle of view
i used one hot encoding on all Y.
is it correct?
changed transparency value
you could perhaps include more than one angles?
Are you just interested in the aesthetics? Or are you more interested in the quality of .png file generated from each of those visualization libraries?
If Matplotlib is okay you can use that as well.
aesthetics depends on data, but this is the data I got from my research, so here's the picture as it is
https://jakevdp.github.io/PythonDataScienceHandbook/04.12-three-dimensional-plotting.html
this has some interesting ways to show them
I'd argue it depends on customization of your plot not the data. We simply cannot panel-beat the data just for aesthetics.
But then again, nobody's ever gotten an A++ from just plotting a picturesque dreamscape 😀
I personally think, you are good to go with what you have already (I mean the initial image you sent.) I honestly think it's looking nice.
i've already read the official documentation) to find out some ways
!otn a panel beat the data
:ok_hand: Added panel-beat-the-data to the names list.
yeap!) only useful command(method) ax.view_init to rotate and get the better angle of view
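For reference, the tweaks mentioned in this thread (dpi, marker size, transparency, view angle) can be combined roughly like this. It is a sketch on random placeholder data, not the actual research data, and the specific values (size 30, alpha 0.6, elev 20, azim 35, dpi 300) are arbitrary choices.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, enough for writing image files
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 200))  # placeholder points

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")

# s controls marker size, alpha controls transparency, c colours points by z.
ax.scatter(x, y, z, s=30, alpha=0.6, c=z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

# Rotate the camera: elevation and azimuth are given in degrees.
ax.view_init(elev=20, azim=35)

# A high dpi gives a raster image sharp enough for print.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)
```

For a paper, a file path would replace the in-memory buffer, and calling `view_init` with a few different angles before each `savefig` gives the multiple views suggested earlier.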
G'day everyone, not sure if it would be able to be done or not, but is it possible to create a VERY simple machine learning algorithm using ONLY numpy?
Something that can categorise messages as spam or not for example
If so, please ping/pm me 🙂
sure you can. hm you can use naive bayes for spam detection for example.
That's pretty cool
I'm not sure what to look for, do you think that you'd be able to nudge me in the right direction @lapis sequoia ?
Would it also be possible to do the following:
1. Export the programs "learning" to a text file of some sort
2. Read this file when a "classify" function is run?
I assume this would be better rather than training the bot every time you want to classify something?
Anyone with experience in (geo)pandas that wants to help me figuring out why my plot does not show? #help-cookie
Hi, does anyone here use kaggle, want to ask for their opinions on the cost on data since I don't have any unlimited internet plans
You can train model and then use it for inference correct
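To make the "train once, then load for inference" idea concrete, here is a rough sketch of a multinomial naive Bayes spam filter using NumPy for the maths and the standard library's `json` for the text file. All function names, the whitespace tokenizer and the four training messages are made up for illustration; a real filter would need far more data.

```python
import json
import os
import tempfile

import numpy as np


def tokenize(text):
    return text.lower().split()


def train(messages, labels, alpha=1.0):
    """Fit naive Bayes; labels are 1 for spam and 0 for not spam."""
    vocab = sorted({word for msg in messages for word in tokenize(msg)})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((2, len(vocab)))  # word counts per class
    for msg, label in zip(messages, labels):
        for word in tokenize(msg):
            counts[label, index[word]] += 1
    labels = np.asarray(labels)
    class_counts = np.array([(labels == 0).sum(), (labels == 1).sum()])
    log_prior = np.log(class_counts / class_counts.sum())
    smoothed = counts + alpha  # Laplace smoothing avoids log(0)
    log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    # Plain lists so the model can be written out as JSON text.
    return {"vocab": vocab,
            "log_prior": log_prior.tolist(),
            "log_likelihood": log_likelihood.tolist()}


def save_model(model, path):
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    with open(path) as f:
        return json.load(f)


def classify(model, text):
    index = {word: i for i, word in enumerate(model["vocab"])}
    log_likelihood = np.array(model["log_likelihood"])
    scores = np.array(model["log_prior"], dtype=float)
    for word in tokenize(text):
        if word in index:  # words never seen in training are ignored
            scores += log_likelihood[:, index[word]]
    return int(np.argmax(scores))  # 1 = spam, 0 = not spam


messages = ["win free money now", "free prize claim now",
            "are we meeting for lunch", "see you at the meeting tomorrow"]
labels = [1, 1, 0, 0]

path = os.path.join(tempfile.gettempdir(), "spam_model.json")
save_model(train(messages, labels), path)  # train once and export

restored = load_model(path)                # later: just read the file
print(classify(restored, "claim your free money"))
print(classify(restored, "lunch meeting tomorrow"))
```

The sketch assumes both classes appear in the training data; with a class missing, the prior would contain log(0).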
I use kaggle from time to time
I am thinking of signing up to kaggle to learn pytorch on their jupyter notebooks. i was afraid on whether it would cost money to use their gpu but someone told me that there aren't any charges.
I would like to ask, does kaggle need a constant internet connection to code
You can run code in so called commit mode. This means it's run in the background and results are saved. You get 38 GPU hours per week and some tpu hours as well. Cpu hours unlimited i think. Max session time is 12 hrs
and just to be sure, if I go over the 38 hours, it just slows down with no hidden cost right? and the datasets that I download for the training are saved in the notebooks and not locally?
If you want to use GPU above 38 hrs they try to sell gcp cloud. Kaggle is a google company. But once the 38 hrs are used you will not be able to use GPU compute in that month. You can access your data.
There are more free GPU options: google colab, paperspace gradient, AWS sagemaker studio lab
thank you for answering all my questions 😄 . I actually was scared of hidden costs because of a reddit post where the comments were saying google cloud deep learning something can have hidden charges
Can you share the link?
9 votes and 26 comments so far on Reddit
You don't need credit card to use kaggle so they can't charge you :)
here's the link, not sure if the google cloud in this one is the same used for their other services
ahhhh, thats great to hear
Yeah for gcp you need to provide cc, so you need to be careful with what use. If you run out of free credit your card will be charged
But kaggle is data science competition platform. They provide free GPU hours so ppl can learn or start competitions, but it's not used as paid cloud computing resource as for example gcp, AWS, Azure
that is the thing that scares me, I was googling around remote services to practice ML and google cloud deep learning something came up, thought I would look at reddit before I try the account. so when kaggle came up, I thought it might have similar problems
that is wonderful to hear, thank you so much for taking the time to answer
i'll setup an account and try it only for learning data science.
Nice. Kaggle also has nice free courses (with certs) as well. Once you create an account you will get access to compute, datasets, competitions, courses, discussion forums. It's a great platform.
See you on some competition leaderboard soon. Good luck!
yeah, the certificates dont interest me much but they are a plus 😄
...thank ...you ...i ...will...try
@lapis sequoia
Can anyone give me a good educational documentation regarding heatmapping etc. With matplotlib?
Ive found this but it isnt well explained
I’d normally recommend seaborn for that, but matplotlib is really straightforward; it only takes one line of code @lapis sequoia
Nevermind, opened your document to find something that looks far from the heatmap I am used to
maybe look up Choropleth charts?
(and relevant documentation / stackoverflow questions for whichever libraries you use)
Anyone else find the titanic Kaggle impossible to beat 80% accuracy?
I see scores even higher
the Titanic dataset is difficult because there really aren't that many data points.
Have you done the space titanic too?
It’s better
It’s about a spaceship called titanic
No they disappear

Thanks
100 % scores are cheating
why was i pinged?
ahhhhh
endless query optimizations

if only people had all their requirements listed at the beginning
tragic
this is the real data scientist life

Sorry I was desperate at that moment..
Ah you guys.. how long does RFE take to run on a dataset with 8000 rows and 8 features
On Kaggle CPUS
It’s been trying to select 3 features for 8 minutes now
Step=1
logistic regression estimator
Okay it’s been running for 25 mins now, and on my macs cpu 15 mins
It’s sklearn btw
Is there even any point in doing this or should it just be manual selection based on a correlation heatmap
okay my laptops about to set on fire holy shit
AAA HELP
living on the edge :))
Bro, i am on 170%
Did i do something wrong?
its been running for 30 mins
no, more actually
does this feature selection only work on data with same data type or wat?
gpu memory 15.7 out of 15.9, almost the dreaded out of memory error lol
the gpu accelerator doesnt even work with mine
is it normal for feature selection to run for hours? when theres only 8 features in total?
okay well, im actually running with one hot encoded version so quite alot more
not sure why you select features from X, after you split to X_train
not sure if this will speed up massively as this is still 70% of data, but it's a correct way not to use test data to set up features etc
I have read before that its better to do this way
because it's a bit cheating, in practice you don't know the test data, that's the reason to split
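A sketch of the split-first ordering suggested above, on synthetic data shaped like the 8000 x 8 set from the question (the real data, its one-hot columns and its split ratio are not known here, so `make_classification`, the 70/30 split and `max_iter=1000` are all assumptions). The scaler and the RFE selector are fitted on the training split only and then applied to the test split.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset: 8000 rows, 8 features.
X, y = make_classification(n_samples=8000, n_features=8,
                           n_informative=3, random_state=0)

# Split first, so the test rows play no part in choosing features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Scaling is fitted on train only; it also tends to help the solver converge.
scaler = StandardScaler().fit(X_train)

selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=3, step=1)
selector.fit(scaler.transform(X_train), y_train)

# Apply the already-fitted scaler and selector to both splits.
X_train_sel = selector.transform(scaler.transform(X_train))
X_test_sel = selector.transform(scaler.transform(X_test))
print(selector.support_)  # boolean mask of the kept features
```

With `step=1`, RFE refits the estimator once per eliminated feature, so the number of fits grows with the column count. A one-hot encoded table with many columns will therefore take noticeably longer than 8 raw features.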

