#data-science-and-ml
But even prior to that, how would I know to get the b column?
Because as of now, I have no idea where the value is.
oh I assumed that you only wanted to perform a search on the b column
if there are multiple columns then you can convert the entire thing to a NumPy array
nonzero would still work
So essentially we would need to scan across all columns for it?
yes
Ahh okay.
i have a question about making a Sierpiński triangle with matplotlib
hi everyone, i have a JSON question, hope this is the right channel to ask.
I need to parse the JSON from a string. it goes like this:
<!-- something something some changing text {"prodID":"20XE","isenrollment":false}} something something some changing text -->
is there a way to just get the JSON part without declaring something something parts?
yes you can use regex
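A minimal sketch of the regex approach, assuming the JSON object is flat (no nested braces). `extract_json` is a made-up helper name, and the surrounding comment text is shortened from the example above:

```python
import json
import re


def extract_json(text):
    """Return the first flat {...} object found in text, parsed as a dict."""
    # Non-greedy match from the first "{" to the nearest "}".
    # This only works for objects that contain no nested braces.
    match = re.search(r"\{.*?\}", text)
    if match is None:
        return None
    return json.loads(match.group(0))


raw = '<!-- something something {"prodID":"20XE","isenrollment":false}} something -->'
product = extract_json(raw)
print(product)
```

For nested JSON a regex is not enough; `json.JSONDecoder().raw_decode` starting at the first `{` is probably the safer route.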
hi how do i activate a conda virtual environment in colab that persists across all cells?
!source activate command works only for that particular cell
Can we search for index by values in pandas DataFrame?
@heady hatch why do you want to do that?
that's my first question
second question: do you want the original index, or a numeric one?
@velvet thorn Because I have a dataframe consisting of unique values throughout, and I need to look up the index of those unique values.
eg
a ('asdf', 1) ('fdsa', 2) ('qwert', 3)
b ('zxcv', 4) ('vcxz', 5) ('qsqdqw', 6)
c ...
original index, but I'm assuming I could just map it back using the numerical one.
One solution I've found was
df.isin([value]).any(axis=1)
This gives me a boolean mask marking the rows where the value exists, which I can then use to get the index.
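A small sketch of both ideas from this thread: the `isin` mask and the "convert to NumPy and use nonzero" scan across all columns. The frame below is made up and uses plain strings instead of tuples to keep the comparison simple:

```python
import numpy as np
import pandas as pd

# Made-up frame; the cell values are plain strings for illustration.
df = pd.DataFrame(
    {"a": ["asdf", "zxcv"], "b": ["fdsa", "vcxz"], "c": ["qwert", "qsqdqw"]},
    index=["row1", "row2"],
)
value = "vcxz"

# Boolean mask: True for every row that contains the value in any column.
row_mask = df.isin([value]).any(axis=1)
matching_index = df.index[row_mask].tolist()

# np.nonzero on the element-wise comparison gives positional (row, column) pairs,
# which are then mapped back to the original labels.
rows, cols = np.nonzero((df == value).to_numpy())
locations = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(matching_index, locations)
```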
But I think looking up values to transform them isn't efficient. I think it would be better to transform the values beforehand before turning it into a dataframe.
Originally what I needed to do is count all the values in a list, then turn that count into a dataframe.
But then later on, the objective changed into count all values plus some metadata.
So I was thinking since I've already made the dataframe, I could just go back and insert the metadata at the particular values.
Apparently it's a whole other mess. hahaha
hm
@heady hatch okay, wait
originally you said "by column", didn't you?
like can you give a realistic example?
Hmm what do you mean by column?
Oh I think it was Darklight's solution of scanning by column.
I'm not really sure what the code you posted represents
(also...why are there tuples in your DF...?)
@velvet thorn
A realistic example, hmm.
So let's say we have a list of features.
#features for category
[['feat_1', 'feat_2', 'feat_3', 'feat_1', 'feat_3'], ['feat_1', 'feat_3', 'feat_4', 'feat_5']]
# turning into count
[{'feat_1': 2, 'feat_2': 1, 'feat_3': 2}, {'feat_1': 1, 'feat_3': 1, 'feat_4': 1, 'feat_5': 1}]
df
category1 ('feat_1', 2) ('feat_2', 1) ('feat_3', 2) (None)
category2 ('feat_1', 1) ('feat_3', 1) ('feat_4', 1) ('feat_5', 1)
But now I need to go back into each feature and add some metadata.
Yup.
...is it supposed to be like this?
>>> df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 3], [2, 1], [2, 3], [2, 4], [2, 5]], columns=['category', 'feature'])
>>> df
   category  feature
0         1        1
1         1        2
2         1        3
3         1        1
4         1        3
5         2        1
6         2        3
7         2        4
8         2        5
>>> df.groupby('category').count()
          feature
category
1               5
2               4
(I meant more something like this btw)
like something that can be executed
df_csv.loc[df_csv['Type 1'] == 'Fire'] I have this, but I only want to include those that are Flying Fire pokemon, so how can I do it?
@undone flare show data
ok
no images please
then?
text
@velvet thorn
Something like this, this was generated via random ascii lowercase letters 100 times for 3 times.
In [34]: pd.DataFrame(results)
Out[34]:
0 1 2 3 4
0 (d, 8) (v, 7) (q, 6) (u, 5) (m, 5)
1 (c, 9) (u, 7) (e, 6) (d, 6) (q, 6)
2 (n, 8) (b, 7) (z, 7) (o, 7) (p, 6)
although I'm going to assume that you have a Type 1 and Type 2 column
yes
accordingly, I believe you want df_csv[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2'] == 'Flying')].
assuming 'Fire' can only be in 'Type 1'
...
HAHAHA
Wait
Don't do it.
@heady hatch 🥴
why are there tuples in your DataFrame
that is Bad
🥴
not Bad, but still Bad
no I mean
It's terrible, but the people who wanted me to do this wanted the data like this.
Or hmm do you have any other suggestions?
do I have to do df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]?
Depending on what you want.
@undone flare that's literally what I typed
without the .loc
and assuming that 'Fire' can only be in 'Type 1'
if you can have Flying/Fire in that order then you need to add a bit more
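A sketch of the "bit more" needed when the two types can appear in either order. The rows are made up; only the column names `Type 1` and `Type 2` come from the conversation:

```python
import pandas as pd

# Made-up rows; names are placeholders.
df_csv = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type 1": ["Fire", "Flying", "Fire", "Normal"],
    "Type 2": ["Flying", "Fire", None, "Flying"],
})

# Each comparison gets its own parentheses because & and | bind tighter than ==.
fire_then_flying = (df_csv["Type 1"] == "Fire") & (df_csv["Type 2"] == "Flying")
flying_then_fire = (df_csv["Type 1"] == "Flying") & (df_csv["Type 2"] == "Fire")

fire_flying = df_csv[fire_then_flying | flying_then_fire]
print(fire_flying)
```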
ok
@heady hatch uh...
@heady hatch to store the data differently?
what's the first element in the tuple
[Counter({'j': 7,
'f': 3,
'y': 8,
'b': 2,
'c': 3,
'm': 8,
's': 6,
'z': 6,
'r': 3,
'a': 3,
'h': 6,
'd': 5,
'w': 1,
'p': 4,
'g': 2,
'i': 5,
'u': 6,
'q': 5,
'o': 3,
'n': 3,
'l': 2,
'k': 4,
'x': 3,
'v': 1,
't': 1}),
Counter({'i': 6,
'e': 5,
'f': 5,
'v': 6,
'g': 4,
'o': 2,
'x': 4,
'q': 1,
'm': 2,
'k': 4,
'y': 3,
'w': 4,
'a': 3,
'r': 3,
'z': 9,
'd': 3,
's': 4,
'h': 5,
'n': 2,
'l': 2,
'p': 8,
'c': 5,
't': 2,
'b': 5,
'u': 2,
'j': 1}),
Counter({'d': 2,
'x': 3,
'b': 5,
'k': 4,
'i': 6,
't': 5,
'v': 9,
'm': 5,
's': 3,
'a': 4,
'z': 5,
'p': 3,
'r': 4,
'o': 9,
'q': 5,
'l': 5,
'c': 3,
'e': 4,
'u': 1,
'g': 4,
'n': 1,
'f': 1,
'h': 4,
'j': 3,
'y': 1,
'w': 1})]
So the data is something like this.
It's a count of certain values.
and they want the top 50, each as a column.
Not the top 50 of alphabetical characters but top 50 of something else.
The first element of the tuple is the key in the count, the second value is the count itself.
This gives me error df_csv.loc[(df_csv['Type 1'] == 'Fire') & (df_csv['Type 2' == 'Flying'])]
And there are a thousand something of these counts.
What kind of error are you getting?
The placement of ] was wrong
@undone flare yes, because I told you to look at the code that I wrote
not edit what you wrote...
as I said, you shouldn't be using .loc
@heady hatch wait, go back
so each individual Counter instance, when stored in the DataFrame, should have something to identify it?
i.e. a count from one is distinguishable from a count from another
It'll be identified by another value, which will be their index.
yeah
I'm on mobile so I can't type code. But something like
{'value': Counter(...)}
And the value will be the index.
that's what you want in the result
what I mean is
IDEALLY
you would have a DataFrame with three columns
category, character, count
not that tuple mess 🥴
Hahaha I've maintained two data frames: one from before the tuple mess, and the other as the output that the other people want.
@velvet thorn I am learning rn
But then I needed to edit the tuples which started this whole journey.
which is why I'm telling you not to use it
Lmeow
I'm just saying
it'd be a lot easier to add metadata
with the three-column DataFrame
add one more column, done
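A rough sketch of the three-column layout (category, feature, count), built from the feature lists posted earlier. The `metadata` frame and its `group` values are invented for illustration:

```python
from collections import Counter

import pandas as pd

counts_by_category = {
    "category1": Counter(["feat_1", "feat_2", "feat_3", "feat_1", "feat_3"]),
    "category2": Counter(["feat_1", "feat_3", "feat_4", "feat_5"]),
}

# One row per (category, feature) pair instead of tuples inside cells.
long_df = pd.DataFrame(
    [
        {"category": category, "feature": feature, "count": count}
        for category, counter in counts_by_category.items()
        for feature, count in counter.items()
    ]
)

# Metadata is just one more column; the "group" values here are made up.
metadata = pd.DataFrame(
    {"feature": ["feat_1", "feat_2", "feat_3"], "group": ["x", "y", "x"]}
)
long_df = long_df.merge(metadata, on="feature", how="left")

# A "top N per category" view can still be derived for display (N = 2 here).
top2 = (
    long_df.sort_values("count", ascending=False, kind="stable")
    .groupby("category")
    .head(2)
)
print(long_df)
```

The wide tuple view can then be generated from `top2` only at export time, so the editable data stays in the long form.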
I think the reason they wanted it is because they're not familiar with Python and they want to visually understand the counts.
create a visualisation then
I'll let them figure that out and I'll update you tomorrow on what happens.
Going to head to bed, good night and thanks again.
yw!
Hello everyone, I've been having a little issue. I'm unable to import datasets from sklearn. I'm getting a "URLopen error (error no 11001) getaddrinfo failed"
@boreal summit HUH.
are you running behind a proxy?
like are you in school or something
or somewhere that restricts what sites you can visit
Dru, that has to do with your internet
Check and try again
No, I'm running it on vs code.
Sorry, I had to go do something real quick.
@velvet thorn @brazen canyon it's on vs code.
Running jupyter on vs code.
running it on vscode has nothing to do with your internet, read the questions above again....
How would you guys break this down: np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)? What should the result of that shape be? Is it a 7x7 matrix or something else?
np.zeros((7,7)) This will give 7x7 matrix
I'm also not connected to the internet.
I think I got it..is this a 4D tensor then?
yea I think so
@boreal summit you need to be
the datasets are downloaded
if you're accessing them for the first time
@lapis sequoia it's 4D
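A quick way to check the shape question yourself. `channels = 3` is an assumed value, since the original snippet does not say what `channels` is:

```python
import numpy as np

channels = 3  # assumed value for illustration
grid = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)

print(grid.ndim)         # number of axes
print(grid.shape)        # size along each axis
print(grid[0, 0].shape)  # each cell of the 7x7 grid holds a (channels, 2) block
```

So it can be read as a 7x7 grid where every cell contains a `channels` x 2 array, rather than a plain 7x7 matrix.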
@velvet thorn ooh, I never knew. I thought they come with the installation. Thanks for the tip.
How can I get this only for Bug?
@boreal summit np! the thing is some of the datasets are a bit larger
and many people will never use them
@velvet thorn true, that's a valid reason.
@undone flare filter where Type 1 == 'Bug' before doing the groupby()
df_xlsx[df_xlsx['Type 1'] == 'Bug].groupby(['Type 1']).count()['count']
'Bug' - i missed the closing quote mark
thx
and put Type 2 in the groupby too
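A sketch of that suggestion on made-up rows. It uses `.size()` to count rows per group, which is a swap for the `.count()['count']` in the original line and does not need a separate `count` column:

```python
import pandas as pd

# Made-up rows; only the column names come from the conversation.
df_xlsx = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Type 1": ["Bug", "Bug", "Bug", "Fire", "Bug"],
    "Type 2": ["Flying", "Poison", "Flying", "Flying", None],
})

# Filter first, then group by both type columns.
bug_only = df_xlsx[df_xlsx["Type 1"] == "Bug"]
bug_counts = bug_only.groupby(["Type 1", "Type 2"]).size()
print(bug_counts)
```

Note that rows with a missing `Type 2` are dropped by `groupby` by default.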
Hello guys
lets say I want to do this
if condition meets put 1 else put 0 in the row
how do i add ELSE to this
data.income = data.income.replace('>50K',1)
data.income.apply(lambda x: 1 if x == '>50K' else 0)
lambda?
or you can use np.where()
how do i do it with np.where?
np.where(data.income == '>50K', 1, 0)
uh
data['income'] = (data['income'] == '>50K').astype(int)
in general, don't use apply if there's another method
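The three options from this exchange side by side, on a tiny made-up column. The label `'>50K'` is taken from the `replace` line above; the other label is assumed:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"income": [">50K", "<=50K", ">50K", "<=50K"]})

# Vectorised: compare once, then pick 1 or 0 per element.
via_where = np.where(data["income"] == ">50K", 1, 0)

# Vectorised: a boolean Series cast to int gives the same 1/0 values.
via_astype = (data["income"] == ">50K").astype(int)

# Element-wise Python call per row; works, but is usually the slowest.
via_apply = data["income"].apply(lambda x: 1 if x == ">50K" else 0)

data["income"] = via_astype
print(data)
```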
How would I order the labels in the x axis of a graph using seaborn
@whole vortex they should be ordered by default
So my data contains a date/time datatype and I've created a new column to retrieve and show the specific day based on these date/time values
That works well and good however when the graph is displayed, the days are ordered randomly
I have 6 of these graphs btw
Ideally I want to start with Monday and end with sunday, do you or anyone here know if there's a way to custom order the labels here
ah, okay
this is a bit different
they're not ordered randomly
they're ordered in increasing order of value
you need to order the source data
I haven't done anything to change the data's order
I've only been adding data to the pre-existing rows and analysing it all in different ways
That reference is what I used to be able to create 6 separate graphs with the data I had and to present them nicely
I'm not restricted to seaborn, I've just been sticking to it because it looks nice 😬
I don't mind trying something new? To be honest, I think this is an aesthetic problem and not really needed but I think it'd be nicer to have the days ordered
sorry got distracted
@whole vortex okay I don't normally use Seaborn (don't like the abstractions)
there's probably a way
but I don't know what it is
in matplotlib
Aha, don't worry, you're volunteering
I would just process the data manually
because that's basically the result of a groupby, right
and feed that directly to ax.plot
because then I would be able to control the order of the data
This is going to be interesting. I'm quite new to data science as a whole so still figuring some things out
I did come across something earlier regarding ordering the days but didn't manage to apply it
{row,col,hue}_order : lists
Order for the levels of the faceting variables. By default, this will be the order that the levels appear in data or, if the variables are pandas categoricals, the category order.
this might help
check that out
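One way to act on that, sketched with pandas only: make the day column an ordered categorical so that grouping (and, per the quoted docs, the faceting order) follows Monday to Sunday. The timestamps and values are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-10-18", "2020-10-12", "2020-10-14", "2020-10-12"]
    ),
    "value": [4, 1, 3, 2],
})

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# An ordered categorical sorts by the given category order, not alphabetically.
df["day"] = pd.Categorical(
    df["timestamp"].dt.day_name(), categories=day_order, ordered=True
)

per_day = df.groupby("day", observed=True)["value"].sum()
print(per_day)
```

Alternatively, the same `day_order` list could be passed to the `{row,col,hue}_order` parameter quoted above.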
@velvet thorn does matplotlib have a facetgrid equivalent
I'm unsure how I'd go about this yet
yes, take a look at subplots() on the matplotlib documentation
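A minimal sketch of a facet-grid-like layout with `plt.subplots`, assuming six panels as mentioned earlier. The panel names and values are invented; in practice they would come from a groupby:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Made-up values for six panels.
panels = {f"panel {i + 1}": [(i + d) % 7 for d in range(7)] for i in range(6)}

fig, axes = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(10, 5))
for ax, (title, values) in zip(axes.flat, panels.items()):
    # The x order follows the order of the days list passed in.
    ax.plot(days, values, marker="o")
    ax.set_title(title)
fig.tight_layout()
```

Because the data is passed in explicitly, the order on the x axis is whatever order the lists are in.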
can anyone help me with matplotlib? I am doing a Sierpiński triangle
i have a value, temperature, and i want to make a graph in matplotlib with 0 C to 50 C, and i want my temperature to show on that graph, how will i make this?
What type of graph do you want
In this short guide, you'll see how to plot a Line chart in Python using Matplotlib. Example is also included for demonstration.
i see, thanks
but the data is different from the examples
i have one data and i want to display it between a range
so a straight line
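A small sketch of the "single value on a fixed range" idea: draw one horizontal line at the reading and pin the y axis to 0 to 50. The reading of 25 is an assumed value:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

temperature_c = 25  # single reading, assumed for illustration

fig, ax = plt.subplots(figsize=(4, 5))
line = ax.axhline(temperature_c, color="red")  # straight line at the reading
ax.set_ylim(0, 50)                             # fixed range: 0 C to 50 C
ax.set_ylabel("Temperature (C)")
ax.set_xticks([])                              # there is no meaningful x axis
```

With only one reading a bar (`ax.bar`) might read better than a line; that is a matter of taste.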
https://pythonforbiologists.com/ @left vault out of my realm of stuff I know but this might be helpful
Hey! I want to learn how to make an RNN, but I can't find anything that doesn't require TensorFlow. However, I am unable to install tensorflow through pip: I get an error that a lot of other people seem to get, but none of the alternative command lines work.
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
I've tried lots of the .whl files that I've seen suggested as solutions, but either pip says it's unsupported or it just results in another large error. Any ideas on an actual fix?
I'm on Windows 10 64bit, I wish to use my GPU, I just updated to 3.8.0 to see if that might fix it despite the fact Tensorflow is supposed to support Python 3.5 and up, I'm on the latest version of pip... let me know if you need any more information
hello
I am in my 1st sem of DSA
please suggest what I should be learning out of class
@flint arrow do you like courses or books
you didn't answer my question lmao
Can someone pls help me understand bias in a neural network
say the bias is 1, does it act like another input and have a weight for each output/node
or does it just add 1 to each node.
Ive seen both, and idk which is correct or if both are and when to use one over the other
@obtuse skiff each neuron always has its own bias
but there are two ways to represent that
one bias per layer and one weight per neuron
or simply one bias per neuron
output = w * a + b, which is equivalent to w * (a + b / w).
although you can have one bias per layer
but that would make it harder to fit
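A tiny numerical check that the two pictures of a bias (add `b` after the weighted sum, or treat it as an extra input fixed at 1 with its own weight per neuron) give the same output. All values are random and the names are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=3)       # activations from the previous layer
W = rng.normal(size=(2, 3))  # one row of weights per neuron in this layer
b = rng.normal(size=2)       # one bias per neuron

# View 1: add the bias after the weighted sum.
out_add = W @ a + b

# View 2: append a constant 1 to the inputs and put the biases in an extra
# weight column, so each neuron has its own weight on that constant input.
a_ext = np.append(a, 1.0)
W_ext = np.hstack([W, b[:, None]])
out_ext = W_ext @ a_ext

print(np.allclose(out_add, out_ext))
```

The `w * (a + b / w)` rewrite above only makes sense for a single nonzero scalar weight, whereas the extra-input view works for whole layers.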
@serene scaffold here i am
there you are indeed
let's see if we can figure out what this article is saying: https://sebastianraschka.com/faq/docs/lda-vs-pca.html
Both LDA and PCA are linear transformation techniques: LDA is supervised whereas PCA is unsupervised (PCA ignores class labels).
by the way, any time you have a general question about machine learning, a lot of people who know way more about the subject than me hang out here.
in this particular channel.
I read it before. From my perspective, PCA looks at the covariance between two different sets of data and then tries to standardize it?
@serene scaffold ok thanks for the info
@smoky bobcat is this right by any chance? like Tries to standardise data by looking at covariance matrix between different data @serene scaffold
I only learned about LDAs recently so I'm trying to wrap my head around all this myself
hmm
I'll see if another staff member can more effectively answer this question.
ook
i think that LDA is more about classification while PCA is more about standardisation
@smoky bobcat ...what do you mean by that?
@velvet thorn i mean that LDA tries to classify the data in different portions while PCA tries to get all the data at the same level. correct me if im wrong, im not an expert just a noobie trying to understand
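A NumPy-only sketch that may make the supervised/unsupervised difference from the article concrete. It uses the textbook first principal component (top eigenvector of the covariance) and the two-class Fisher form of LDA; the data is synthetic and chosen so the two methods disagree:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes: large shared spread along x, class separation along y.
class0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(200, 2))
class1 = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(200, 2))
X = np.vstack([class0, class1])

# PCA: unsupervised, sees only X. The first component is the direction of
# largest variance, i.e. the top eigenvector of the covariance matrix.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pca_dir = eigvecs[:, np.argmax(eigvals)]

# LDA (two-class Fisher form): supervised, uses which class each row is in.
# Direction = inverse within-class scatter times the difference of class means.
Sw = np.cov(class0, rowvar=False) + np.cov(class1, rowvar=False)
lda_dir = np.linalg.solve(Sw, class1.mean(axis=0) - class0.mean(axis=0))
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", pca_dir)
print("LDA direction:", lda_dir)
```

Here PCA picks (roughly) the x axis because that is where the variance is, while LDA picks (roughly) the y axis because that is what separates the classes. Neither step is standardisation; scaling features is a separate preprocessing choice.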
this is pretty vague, but what would be the best way to plot this kind of data? i just want to plot a few things like temp and humidity. i am getting this data from an api
can someone help?
@tropic junco your question is pretty vague.
as you noticed
also the data is very chunky
like maybe if you shared your ultimate objective
it'd be easier to help you
"I just want to plot a few things" <- what did you set out to do originally?
i just want to plot a graph for temperature and humidity
but i am getting confused as how to do it, as i cant plot one time values
wdym?
you said "this kind of data"
so I assume you have more like that
so just extract temperature and humidity
now you have 2 1D arrays
scatterplot them against each other
i mean, if i have temp given 25 C , how do i plot it between a range of 0 C to 50 C
oh
Hey @velvet thorn :^) They wanted to separate out the number in the tuple as its own column now.
eg
('a', 1) ... -> ('a', 1) (1)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Hahahaha
I'M DYING
🥴
so who wants you to do this
I'm guessing they don't have DB experience
it just shows an empty graph to me
hahaha I think they wanted to do this so they can visually see how the data breakdown.
@tropic junco show code
nvm, i realized it will be useless to plot a graph of one time values rather than comparing it with past ones
like a graph of the temperatures in the past week
you mean linear discriminant analysis, right?
@velvet thorn yes sir
Idk if this is the right channel, but is anyone here fammiliar with likelihood ratios?
@hollow sentinel I meant I am in a course and its ok.
but u can suggest me some other courses too.
how can you create graphs with sqlite query?
or, what is the best way to create a graph for the user's messages i get from my discord bot?
@flint arrow python for data science and machine learning bootcamp
Udemy
right..thank you.
Is there a more efficient way to do batched scatter operations (in TensorFlow terms) in NumPy? Currently I have something like this:
>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> indices = [4, 2, 4, 3, 1]
>>> np.add.at(a, (np.arange(5), indices), 1)
>>> print(a)
[[0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0.]]
I'm also interested in ways to parallelize a loop that calls the batched scatter operation in each iteration.
The use case is building a histogram from a dataset (in practice the dataset is a generator instead of a NumPy array because it doesn't fit in memory).
import numpy as np
n_samples, n_positions, n_bins = 1024, 256, 100 # Real situation: (~64k, ~64k, ~256)
hist_per_position = np.zeros((n_positions, n_bins), dtype=int)
idx_dataset = np.random.randint(n_bins, size=(n_samples, n_positions))
for bin_indices in idx_dataset:
    np.add.at(hist_per_position, (np.arange(n_positions), bin_indices), 1)
Hey guys how do i mark code here?
If i need to post same code here?
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
print('Hello world!')
There you go.
Hello guys I have a question I'm trying to gather a bit of information for a project and I'm looking into an image classification problem, where I have for example different animals and the program needs to be able to classify the animals with best accuracy? What would you guys recommend me to look into considering I would want to test multiple algorithms and see what would be most accurate for such a problem should I use MachineLearning or Deep Learning and what tools should I learn or libraries?
If I use keras would I be able to specificy which algorithm I want it to use or how exactly does it work
Well I just want someone to direct me a bit tbh, the algorithm to classify the pictures
If you want to use other types of Machine Learning methods (such as KMeans) you may want to take a look at sklearn
like different algorithms will give different accuracy
should I be using ML or DL?
what would be easier ?
DL is a subset of ML
In your case, I do recommend using DL
There are plenty of tutorials on Keras that you can search online
I know its a subset but I don't understand then if I use ML would then ML automatically use DL behind the scenes
Then I can specify in Keras whether I want it to use CNN, RNN or other algorithms?
Hi everyone, as one of the authors of the open-source framework github.com/dstackai/dstack I'd like to kindly share with the community what my friends and I are currently doing to help use ML models in applications.
In today's blog post we wrote about how one can run ML models on live data to build interactive reports with our open-source library https://blog.dstack.ai/run-ml-model-on-live-data-to-build-interactive-reports If this is something relevant to your work, we'd appreciate your feedback!
Hi guys! I'm looking for projects which utilises the concept of digital twins. I'm doing a research for a school assignment and thus would like to see what has been done already.
data-science, I am looking at SymPy for calculating integrals from calculus. I am struggling with some fundamentals for calculating the area under the curve. Can anyone help?
Hello all, I am looking for some help with plotly.
I want to make 2 vertically stacked graphs that share the same range slider (and also the x-axis).
This is my code so far:
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0.2, 0.8]) fi - 5a075f09
Too long to paste
@rich silo plotly provides a library called dash, which has that capability. Maybe you could take a look at it?
Bind interactivity to the Dash Graph component whenever you hover, click, or select points on your chart.
@grave path Sorry for the late response, you basically build your model block by block so it's highly customizable.
@hasty grail its okay thanks a lot man I'll just have to figure out whether I will do it in ml or dl considering the time frame I have and which is less complex as Im learning ML right now
anyone can suggest me a really good dataset to work on for a uni coursework?
@smoky bobcat Kaggle is a great source for datasets
@smoky bobcat uspto has some good sets https://developer.uspto.gov/data?MURL=data
it should just be pip3 install pep8
I don't know if this is the place to ask but should I learn SQL before starting a data science course or am I able to learn it while doing it?
you dont really need sql for data science
Oh, do I need any other language understanding besides Python 3 or should I be set to dive in and have fun with it
no you can just go straight in lol
woot thank you, have a good one
@smoky bobcat depends on what you want to do
uni coursework
yes but like do you want tabular data?
what do you mean
numbers
that would be tabular data
its been all day im searching for a good dataset as i need to start working on something asap
@austere swift oh okay
tabular data is basically just any data thats in the form of tables, like different features of something
Here's something real basic.
@heady hatch bro, cant do the most basic ones like these, these are used as example is in uni lectures
Oh man.
Hey guys, i'm a recent grad of computational physics, and for this year i've studied lots of python, data science tools like numpy, pandas, scikit-learn, data visualization, basic sql, machine learning fundamentals and algorithms with scikit-learn, and neural networks with keras tensorflow. I'm doing some projects, but I feel like it would be better for me to ask an experienced person on some tips, so that I know i'm not just wasting time. What more should I learn, what projects should I make, and is there something that i'm missing?
well if you havent gone into the deep math of neural networks and machine learning i highly recommend you do since thatll help you make much better models and will make your life a whole lot easier
as for projects that's really up to you and what you wanna do
since you're into computational physics you can do some projects of machine learning in physics
there were a few papers I've heard of that used machine learning and deep learning for CFD simulations and it made them wayy more efficient in terms of processing and speed, you can try to replicate those
Yeah, I did go into the maths, they really are useful. Some holes here and there but I intend to make an implementation on each and everyone of them soon enough to make sure I learned. Yesterday I did a little project on computational physics, went well actually.
I'll try looking up for them
I've been wanting to know what else should I learn to get started on the career as a junior DS. I've heard that stuff like Azure spark is important
Does anyone know a platform for people looking for a mentee?
Apparently its all paid, and I don't have much money atm. And I only want some guide/roadmap, I don't need someone to teach me something in specific.
@errant cargo Hmm anything you're looking for specifically?
To say that you're not wasting time, and what you should learn, depends on what your final goal is.
Do you want to get a job? Do you want to go back into academia? etc etc.
Getting a job first for sure
Okay, DS have different requirements and definitions at different companies.
Do you know what kind of company and what kind of ds they're looking for?
I think it's good that you have a good pool of skills to refine.
Now the next step would probably be looking for particular company to understand what skills they're looking for.
makes sense
Because how's your analytical skills?
then work on the stuff they require
In terms of EDA, i believe that a few more notebooks and it'll be really decent
There are some statistical concepts that I need to learn that my uni didn't cover, but that's fine
Let's say a company asks you to break down why their user engagement is decreasing by 10% over the past few weeks.
EDA is nice and helpful in many things but not really helpful if you can't get some insight that will help with the solution.
itll depend on what do I have to work with
But I guess that what I have to work with depends on me as well
Say, maintaining a data base
Here's some of the definition of DS I've come across.
- Hard ML researchers
- DS for products/decisions
- DS, that's a senior version of DA
- Some combination of DA + DE, maybe MLE
Probably many more.
Hard ML researchers usually look for graduate degrees in actual ML.
DS for products and decisions is dealing with the question I asked above.
senior version of DA is also that but I suppose adding ML to the mix.
Sometimes company doesn't have infrastructure so they ask you to do the data engineering too.
That would be something i would have to work on a lot if they ask me
since I dont have a CS curriculum, just a computational physics
I think if you want a direction for the next step to take, talk to people who are actually working and ask them what their company is like and what their data scientists are like.
I think having some kind of comfort with programming is nice.
I've been interested in IBM recently, so i'll try that first
Which then helps you ease into what company might be actually looking for.
I'll try to find some then
Atm i'm just developing skills that I know that i'll use as a Data Scientist, but I havent looked into the gritty details yet
Which now would be the moment
Thanks a lot, would definitely help
Not to be mean but to play devil's advocate. How do you know you'll use them as a data scientist?
Unless you've had data scientist experience already, I'm curious of what you're using as your ground of evidence.
Everywhere that I looked it mentioned
I'm mostly learning from books that are focused on data science
Data Science Tools for python, hands on machine learning with scikit-learn and keras
I don't have any fancy degree in ML and pretty much everything is self taught.
Currently working as a NLP engineer.
It's rough
There are some data scientists I've come across who don't touch ML at all.
Thats cool, i've been wanting to learn a bit on NLP
Which then kinda makes me question why are you learning sklearn and tf/pt if you're not going to use them on your job.
But I'm digressing.
I think asking industry people for their experience is a much better metric.
Because you get to see what they're working with and what they're looking for.
yeah, nothing better than people actually working on it
Although I would guess that it would depend a lot on the job that they're doing
Mhm.
So I would have to ask more than one person
thanks a lot for the help though, def helped
yeah, it did
Since you're working already, would you recommend me getting an intern before trying to apply as a DS?
Yea, unless you have some kind of connection to the company.
Or maybe sometimes they're okay with you just having academia experience.
I think that part depends on how well you sell yourself in terms of job search + interview.
yeah
In any case, maybe its good to do 2 months or 3 of internship just to fixate the stuff i learned
thanks for the talk bud
Ye, update us. Would love to hear your progress.
Yeah, same for you
how do I balance a dataset?
You can under, over, or combine under and over sampling.
@heady hatch u good good in this ml stuff?
lol u work?
You should ask your ds questions. hahaha
lol
Hey guys question on fine tuning gpt2.
Let's say I'm trying to generate stories, would it be better to fine tune it on the whole story text or the stories broken down into sentences?
Never mind, figured out a direction to head towards!
anyone use Kaggle on here?
I have an integer variable that is 20201015.
How do i convert it to datetime while maintaining the format of yyyymmdd?
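A sketch with the standard library only. A `datetime` object has no display format of its own, so the `yyyymmdd` form is something you apply when parsing and again when turning it back into text:

```python
from datetime import datetime

date_int = 20201015

# Parse: integer -> string -> datetime, using the yyyymmdd pattern.
parsed = datetime.strptime(str(date_int), "%Y%m%d")

# Format: datetime -> yyyymmdd string again, for display or export.
as_text = parsed.strftime("%Y%m%d")

print(parsed, as_text)
```

For a whole pandas column, `pd.to_datetime(column.astype(str), format="%Y%m%d")` should do the same parsing step.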
@proper swift yeah it's a great resource
@hollow sentinel could you get this csv file for me, https://www.kaggle.com/crawford/80-cereals?select=cereal.csv
ive forgot my password and my reset email hasnt come through yet :/
you can't download it yourself?
oh
print(val_y.head())
anyone know why it's saying invalid syntax
I don't see it
nope still wrong
why are you printing the df?
bc Kaggle asked
oh lol
# print the top few validation predictions
print(iowa_model.predict(val_X.head())
# print the top few actual prices from validation data
val_y.head()
confusion
why is kaggle so stupid
idk why it's wrong too
nvm copy pasting from the answer key fixed it for some reason
Is this the channel for stuff related to machine learning and AI?
yessir
If so...
Can you guy recommend any tutorials to learn neural networks? I've watched the series about neural networks and the series about machine learning by tech with Tim
Dunno if you know him
But now I feel kinda stuck in what to do next
no problem
Hello all, I am looking for some help with plotly.
I want to make 2 vertically stacked graphs that share the same range slider (and also the x-axis).
This is my python code so far:
https://controlc.com/5a075f09
sure
import seaborn
import matplotlib.pyplot
import numpy
import pandas
import requests
import re
import parse
from parse import *
import pandas as pd
#Pull Database, from Site
DB = requests.get("https://www.milehighcomics.com/cgi-bin/genresearch.cgi?title=SUPERM").text
#Global Variables, to pull from
lines = DB.split("\n")
#Create Easy Dataframe to Confirm Conditions
data = pandas.DataFrame({
"Store": "Mile High Comics",
"Comics": lines ,
"Comics": lines ,
"Comics": lines ,
"Comics": lines
})
#Change Data Frame size to display The entire Data frame
print("This is working?!?!")
#Fuctions, which search the scraped Site
def BatmanMap(line):
for line in lines:
return 1 if search("Batman", line) else 0
def WWMap(line):
for line in lines:
return 1 if search("Wonder Woman", line) else 0
def GLMap(line):
for line in lines:
return 1 if search("Green Lantern", line) else 0
def FMap(line):
for line in lines:
return 1 if search("Flash", line) else 0
#Mapping to the Data Frame
data["Wonder Woman"] = data["Comics"].map(WWMap)
data["Green Lantern"] = data["Comics"].map(GLMap)
data["Batman"] = data["Comics"].map(BatmanMap)
data["Flash"] = data["Comics"].map(FMap)
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(data)
The project is to eventually graph a bar chart detailing something. What I chose was superheroes appearing in Superman titles on this comic store's site. However, I suck at coding and it doesn't seem to be working. I added in the pd.set_option, but ever since that was added it just makes 2 dataframes: one which is the entire site's source, and another where it is correctly formatted but doesn't work (because I suck).
So right now I'm just trying to get it to where pandas formats the dataframe I want and doesn't repost the site's source....
I don't know if this is the issue you're having but I think your dataframe is initialized with the same column rewriting itself.
data = pandas.DataFrame({
"Store": "Mile High Comics",
"Comics": lines ,
"Comics": lines ,
"Comics": lines ,
"Comics": lines
})
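A possible cleaned-up sketch of the same idea. Two things stand out in the posted code: a dict keeps only one "Comics" key, and each `*Map` function loops over the global `lines` and returns on the first iteration, so it ignores the row it was given. The `lines` below are made up and stand in for `DB.split("\n")`, so nothing is downloaded here:

```python
import pandas as pd

# Stand-in for DB.split("\n"); these lines are made up for illustration.
lines = [
    "SUPERMAN/BATMAN #1",
    "SUPERMAN AND WONDER WOMAN #3",
    "ACTION COMICS #500",
]

heroes = ["Batman", "Wonder Woman", "Green Lantern", "Flash"]

# One "Comics" column; the scalar "Store" value is broadcast to every row.
data = pd.DataFrame({"Store": "Mile High Comics", "Comics": lines})

for hero in heroes:
    # Plain substring test per row; case=False because the made-up lines are upper case.
    data[hero] = data["Comics"].str.contains(hero, case=False, regex=False).astype(int)

print(data)
```

With one 0/1 column per hero, `data[heroes].sum()` gives the counts a bar chart would need.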
Any recommendations on a finite difference book that gives examples in Python?
how can i make a bar graph, with my x axis like - [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2], basically i want to make a graph based on the occurences of same elements
@tropic junco I'm not sure what you're talking about with the xaxis, but you can look into how to make a histogram.
i see
i did it :)
Congratulations!
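For anyone landing here later, a sketch of one way to do it: count the occurrences first, then draw one bar per distinct value. The `Agg` backend line is only there so the example runs without a display.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt

values = [1, 1, 1, 2, 3, 3, 2, 1, 5, 4, 1, 1, 2, 4, 2, 3, 1, 2]

counts = Counter(values)                 # element -> number of occurrences
labels = sorted(counts)
heights = [counts[k] for k in labels]

fig, ax = plt.subplots()
ax.bar(labels, heights)
ax.set_xlabel("value")
ax.set_ylabel("occurrences")
```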
what does index_col do in pd.read_csv()
It sets the index as the column you want.
k
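A small sketch of the difference, using made-up CSV text read from memory instead of a file:

```python
import io

import pandas as pd

csv_text = "name,calories\nCheerios,110\nTrix,110\n"   # made-up rows

default = pd.read_csv(io.StringIO(csv_text))                    # RangeIndex 0, 1, ...
indexed = pd.read_csv(io.StringIO(csv_text), index_col="name")  # "name" becomes the index

print(default.index.tolist())
print(indexed.index.tolist())
```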
is this the chat for aperture science
good morning
Traceback (most recent call last):
File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
resp = resource(*args, **kwargs)
File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
return self.dispatch_request(*args, **kwargs)
File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
resp = meth(*args, **kwargs)
File "E:\demo3\recDoc1.py", line 283, in post
print("{}: {:.2f}%".format(label1, predictions1 * 100))
TypeError: unsupported format string passed to numpy.ndarray.__format__
Hi, I'm trying to deploy my custom keras flask app which has a size of about 2.3gb and due to these heavy size constraints, I don't think it is possible to use heroku or netlify to deploy it. Is there any alternative?
*free alternative
Or even a budget alternative
Google VM??
@mild topaz The error is pretty self-explanatory - you passed an incorrect datatype
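To make that concrete, a sketch assuming `predictions1` is a NumPy array holding a single score (the label and value here are made up; the real array may have a different shape, in which case the indexing changes):

```python
import numpy as np

label1 = "cat"                       # hypothetical label
predictions1 = np.array([0.8731])    # hypothetical model output, shape (1,)

# "{:.2f}".format(predictions1 * 100) raises TypeError:
# a whole array cannot take a float format spec.
score = float(predictions1[0])       # pull out one Python float first
message = "{}: {:.2f}%".format(label1, score * 100)
print(message)
```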
hi, can someone help me and tell me what this means:
ValueError: exog does not have full column rank.
Did you google your error first?
a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())
yeah, and I didn't find a solution
import pandas as pd
import numpy as np
from linearmodels import PanelOLS
#read data
data = pd.read_excel("familyfirms.xlsx")
#drop NaN
data.dropna(inplace=True)
#Log Tobin's Q
data['logQ'] = np.log(data['Q'])
#Log age
data['logage'] = np.log(data['agefirm'])
#Log assets
data['logassets'] = np.log(data['assets'])
df = data.set_index(['company','year'])
a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
print(a.fit())
data
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-313-bb26f941ef72> in <module>
21 df = data.set_index(['company','year'])
22
---> 23 a=PanelOLS(dependent=df['logQ'],exog=df[['founderCEO','logassets','logage','bs_volatility']],time_effects=True)
24 print(a.fit())
25
~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights, entity_effects, time_effects, other_effects, singletons, drop_absorbed)
1038 drop_absorbed: bool = False,
1039 ) -> None:
-> 1040 super(PanelOLS, self).__init__(dependent, exog, weights=weights)
1041
1042 self._entity_effects = entity_effects
~\anaconda3\lib\site-packages\linearmodels\panel\model.py in __init__(self, dependent, exog, weights)
242 )
243 self._original_index = self.dependent.index.copy()
--> 244 self._validate_data()
245 self._singleton_index: Optional[NDArray] = None
246
~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _validate_data(self)
381 w = w / w.mean()
382 self.weights = PanelData(w)
--> 383 rank_of_x = self._check_exog_rank()
384 self._constant, self._constant_index = has_constant(x, rank_of_x)
385
~\anaconda3\lib\site-packages\linearmodels\panel\model.py in _check_exog_rank(self)
343 rank_of_x = matrix_rank(x)
344 if rank_of_x < x.shape[1]:
--> 345 raise ValueError("exog does not have full column rank.")
346 return rank_of_x
347
ValueError: exog does not have full column rank.
that's the error
i dont understand why
everything seems okay
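The traceback shows the check that fails: `matrix_rank(x) < x.shape[1]`, i.e. at least one regressor is a linear combination of the others (a duplicated, rescaled or constant-like column, for example). A sketch of how one might hunt for it with NumPy; the numbers and the `logassets_x2` column are made up to force the problem:

```python
import numpy as np
import pandas as pd

# Made-up regressors; "logassets_x2" is an exact multiple of "logassets".
exog = pd.DataFrame({
    "founderCEO": [1, 0, 1, 0, 1],
    "logassets": [2.0, 3.5, 1.0, 4.0, 2.5],
    "logage": [1.1, 0.7, 2.3, 1.9, 0.4],
})
exog["logassets_x2"] = 2 * exog["logassets"]

rank = np.linalg.matrix_rank(exog.to_numpy(dtype=float))
print(rank, exog.shape[1])   # rank below the column count means a redundant column

# Drop one column at a time; if the rank does not fall, that column is a
# linear combination of the remaining ones.
redundant = [
    c for c in exog.columns
    if np.linalg.matrix_rank(exog.drop(columns=c).to_numpy(dtype=float)) == rank
]
print(redundant)
```

Running the same check on `df[['founderCEO','logassets','logage','bs_volatility']]` should point at the column(s) to drop or recompute.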
Any leads on chatbot powered by Generative Models? Even any git repo link will do.
@golden saffron DO you want to make one, or do you want to use some pre-existing model?
@grave frost I want to make one, but need some reference and guidance. I have already made few rule based and context based bots.
Yes, my next target is to make a Bot who can interact with user.
Yeah, but do you know Machine Learning?
Something like GPT - 3.
Are you actually trying to make GPT-3 ? I am confused
Yes, ML, NLP, RL I know. What to use RL for the chatbot.
Not exactly GPT - 3 but as I mentioned above some RL based chatbot.
You can't use RL in a chatbot 🤦
@grave frost imagine lol
First, I recommend brushing up on the basics of ML and NLP before diving into chatbots
Why not. every conversation will be at one state, there would be some information available about that user that can be used for the conversation
so what do u want exactly?
hey!
hi
guys i am new to programming any good resource to learn data science
That state can tell me the interests of the user, at least gender, age, etc., that can be used in conversation.
@unborn wraith sure do you know the maths already?
@golden saffron ok , just tell us for what task is the chatbot for
or do you need some calculus, linear algebra and stuff too
@grave frost I think you are mistaken about RL. RL is all about having a state, an option to be picked, and a reward.
Why can't that be applied to chatbots?
@golden saffron You can research about that. Bottom line is that it would produce a random bag of words
please tag me
@misty cargo nope
@unborn wraith oh ok then i recommend starting with calculus, you can find courses on mit open courseware for both LA and Calculus
that applies to probability and statistics too, mit has pretty good courses
Dude, that would come up with a set of meaningful responses that the RL will have to select.
@unborn wraith Just watch 3b1b's YouTube videos and they would give you an extremely good base
e.g. a Hi can be responded by Hi, how are you
@golden saffron bro, it doesn't work like that
or by Hello, whats up
after that i suggest
can anyone provide me a link?
You would have to provide a whole skeleton for RL to fill it up with below avg accuracy
https://www.coursera.org/learn/machine-learning Stanford Machine Learning (Andrew NG)
http://work.caltech.edu/lectures.html Caltech courses that are great
https://www.fast.ai/ EVERYTHING FROM FAST.AI
@grave frost, can you explain where you actually used RL, and what your understanding of it is?
@unborn wraith https://developers.google.com/machine-learning/crash-course Google's crash course. Very good, with interactive things and 2 min videos. Google whatever you don't get or ask it here. It is for absolute beginners
thanks
Yw
thanks
@unborn wraith np
I just want to explore RL in a chatbot to understand the user and have a more meaningful and rewarding conversation.
if it fails that will be perfectly fine.
bro, you can explore ofc, no one is stopping you, but you will have to research a bit to find out how it can be used. The way you described it is not how it is to be done
Okay that means you don't know RL.
also guys i came talking here just because there was some requirement of sending 50 messages or smth
ohk, I am not saying anything now 
if you need help im free to help
any tips for ds?
@undone flare depends on what you want to do (in general)
data analysis..
@undone flare don't jump to deep learning right off
yea I am not
most problems can be solved with scikit in like light speed
even tho you may not like it lol
@undone flare yeah, there is a site called kaggle.com with great datasets. There is something called EDA - Exploratory Data analysis. Find a dataset you like (There are tons of real world ds and are pretty great). You can check the EDA others have done and try to learn the libs....
I learned the basics of NumPy and currently learning Pandas and I am using the Pokemon Dataset from Kaggle
great! It's a pretty good place to start your DS journey.
Should I take up the udemy bootcamp course?
@grave frost Anyways you don't know much, still thanks for the info.
@undone flare Take free things. There are a lot of free things available and, most important, take up an internship.
@grave frost the course u gave me is that for maths?
@golden saffron If you keep saying things like that, you would be reported to server sooner or later. We are not getting paid to help you - it's completely voluntary. Don't get too frazzled up about these things and google things first instead of harassing others
Internships will be very helpful in learning real-life issues in data science.
@golden saffron just wanted your guys opinion
alright
thx
@unborn wraith It tries to explain things intuitively - without maths. That's why I liked it since it helps to grasp concepts easily and then explore the maths side of it
ok thanks
@undone flare When you feel like it
is it worth to learn ml and data science now?
@undone flare If you are getting bored of EDA and other things, just do some courses to help you get started. Try simple things first (perceptron and linear regression are good first projects, though the names may sound heavy)
@unborn wraith yes
after learning some basic modules?
@undone flare maths(calculus, linear algebra, probability & statistics), then classical ml and then deep learning
Also @undone flare, try working with Spark and the cloud as well, since many real-life datasets live in the cloud and are processed with Spark ML libraries.
@misty cargo I still need to learn calculus
is Kaggle Competitions good?
@undone flare Youtube it. Easiest way to learn something fast
@undone flare i suggest mit open courseware
@undone flare yup but focus on the ones marked with #knowledge
Okay, thx guys very helpful
Any reason why? lol
can still participate in the lower end ones, like $500 or so
I only know numpy and pandas so I will just skip competitions for now lol
@grave frost Look bro, I asked for some suggestions. If you don't know, it's fine. No need to panic and rant. Generative models don't rely on pre-defined responses. They generate new responses from scratch. When we have multiple GMs they will give multiple responses, and I am just exploring RL here. Your responses to my question were simply idiotic. The above definition is the textbook definition of a GM.
@undone flare S'ok - you will get there eventually
@golden saffron Man, just read what I posted above. I didn't doubt RL or GM. All I said is that your approach to using RL/GM in NLP is very wrong and you should research that.
Right now you are just raging that on why it wouldn't work and saying that I am not fit to answer. If you think so, just ignore me. Why would you keep pinging me after that??
please keep it civil, both of you.
hEaTeD
I don't think the Kaggle mini courses are helpful they're kind of cookie cutter
Is that analogy supposed to be obvious? I don't do much baking
oh that just means it's really simple
I'm kind of scared of Ng's course bc it's not in Python
so I'm doing Kaggle instead
What? What is it in - Julia, MATLAB or something?
That's surprising, should have been converted to python by now
yeah but I found a github that does everything in python
so that's good
I think the google crash course is good too
yeah, but not much coding (at least not in the start)
I like courses that make me code from the start
I didn't like Columbia's course bc it was so focused on theory that it was boring
it's why I liked the Python for Data Science and Machine Learning Bootcamp so much
yep, especially in ML, the theory-practical balance is just too bad. There are vids explaining complex things in 4 points and then there are people who explain it all by pretty advanced code. sad
He has other important work too, besides making new courses
Plus there are plenty of others too, so it's not like there isn't much choice
I think matlab is great - it doesn't even require coding for most tasks
Like regression is done from GUI
Just a few values here and there, load the database, a couple of drop-downs and boom, your regression is done. And it handles some pretty complex graphs up to 3D (in old versions) too
Ng has people do linear regression by hand
no using scikit-learn
ooooooooh spooky
there's nothing wrong with doing linear regression by hand
I know I just never did it before
yeah
can someone explain what underfitting is? What does it mean to perform poorly on training data
because the real ML stuff happens always "by hand"
@ebon lynx I disagree- in the world where every other guy uses scikit-learn and Keras, it just isn't ML "by-hand" anymore - more like a glorified version that involves programming for people is a trend, so they can secure a good job
@grave frost I do scikit learn + keras but I still feel like I'm missing out
a lot of real world problems require knowing how to program the solutions
That's my point
lol @ebon lynx i still don't get it
oh ok
Few understand how it all actually works
cybersecurity is not a niche field
oh
neither is ML
Though ML is kinda
Coz the people who truly understand it have PhDs - years of experience and studying to get to be the experts
yep
well yeah but the ones who aren't script kiddies are hard to find
oh
Even you can learn a great deal about it in a few weeks (and implement it if coding skills are good)
The thing is to just have knowledge about the methods involved
idk I just found ML more interesting
me too
I find the lack of interpretability of ML models very interesting, which is one of the reasons why I delved into it
Hello,
Can someone tell me what I need to change so that every row isn't replaced by NaN, please?
.fillna would be a good method to look into
it's just that my list comprehension replaces every variable other than the ones with kBtu by NaN and I don't know why
how much of your dataset is NaN
there aren't before my last code line
@hushed wasp to find the correct columns, try the function .filter(like="kBtu")
that will give you only the columns with that in the name of the column
for the location of the columns it "works" just the Nan replacement I don't know how to solve
df = df[df[[c for c in df.columns if c.endswith('(kBtu)')]] >= 0]
it's this last line which gives me so much nan
O
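A sketch of what is probably going on in that last line: indexing a DataFrame with a boolean DataFrame keeps the original shape and writes NaN wherever the mask is not True, and columns that are missing from the mask count as not True. Reducing the mask to one boolean per row filters rows instead. The frame below is made up:

```python
import pandas as pd

# Made-up frame with two "(kBtu)" columns and one other column.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "Electricity(kBtu)": [10.0, -5.0, 3.0],
    "Gas(kBtu)": [1.0, 2.0, 0.0],
})

kbtu_cols = [c for c in df.columns if c.endswith("(kBtu)")]

# Same pattern as the original line: NaN everywhere the mask is not True,
# including the whole "name" column.
masked = df[df[kbtu_cols] >= 0]

# One boolean per row: keep rows where every kBtu column is non-negative.
filtered = df[(df[kbtu_cols] >= 0).all(axis=1)]
print(filtered)
```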
@hollow sentinel are you still confused about underfitting?
underfitting is where the model hasn't learned enough from the training data
right?
but like what does it mean to not learn enough
@hollow sentinel let's focus on a classic model, linear regression.
How do you know when a linear regression is doing badly?
the mean squared error
And what does that tell you?
how close a regression line is to a set of points
Right right, and in terms of prediction this means how good or bad your prediction is.
So where does underfitting and overfitting come in?
Looking solely at underfitting first.
Let's think about the relationship between weight and the height.
Let's first assume there's a linear relationship between the two.
where f(x) = y, and x = weight and y = height.
Meaning we're trying to use weight to predict height.
How does that theory sound to you?
Do you think it makes sense that if people's weight increases, their height will increase in some linear fashion too?
Hey guys, wondering if i could get a solution to a small problem i'm having with pandas
I'm trying to use "read_html" on a URL, but the URL is behind a login screen. Even when I log in with bs4, pandas doesn't recognise that it has access past the login screen. Is there another way to do this?
Yes @heady hatch
@hollow sentinel okay, now think about what happens if the algorithm predicts the average height for every weight.
Meaning f(x) = avg.
How would you describe this algorithm in terms of complexity and the quality of prediction?
Is there anything wrong with the algorithm? What's going to happen with the MSE?
idk
@hollow sentinel there is an actual physical relationship between two populations of data (features and target). a model is one "guess" (based on mathematical rules) at that relationship, which we can evaluate.
naturally, we do not have access to the whole population, but only a subset (the datasets that we perform training on)
we say a model is "underfit" when the actual relationship is much more complex than that represented by the model
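To put numbers on the weight/height example, here is a sketch with made-up data that really is linear plus noise. The constant model f(x) = avg is too simple for that relationship, so its MSE stays high even on the data it was fitted to, which is what underfitting looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
weight = np.linspace(50, 100, 50)                          # made-up weights
height = 100 + 0.8 * weight + rng.normal(0, 2, size=50)    # made-up linear relation plus noise


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


# Model 1: f(x) = avg, ignores weight entirely.
pred_avg = np.full_like(height, height.mean())

# Model 2: least-squares straight line.
slope, intercept = np.polyfit(weight, height, deg=1)
pred_line = slope * weight + intercept

print(mse(height, pred_avg), mse(height, pred_line))
```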
Got it
hey guys
I have this dataframe
and I want to make it like this
do you have any idea?
i don't get what you mean
oh wait i think i see it now
you wanna sum all the ones that have the same id?
YEAH
i think i remember there being a function for this but i don't remember what it was called
wait no it was just a groupby
df.groupby(['id']).sum()
df.groupby(['name', 'id']).sum()
I know this, but it gives me the name just one time
@tawny oak elaborate
did you do
what I said?
YEAH
show the result
it's supposed to be like that
.reset_index()
nope
actually, no
df.groupby(['name', 'id'], as_index=False).sum()
.reset_index works too though
p sure you didn't use it right
name id minutes
0 A 11 3
1 A 13 3
yw
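A runnable sketch of the two variants, with made-up rows chosen to reproduce the output above:

```python
import pandas as pd

# Made-up rows shaped like the example in the chat.
df = pd.DataFrame({
    "name": ["A", "A", "A", "A"],
    "id": [11, 11, 13, 13],
    "minutes": [1, 2, 1, 2],
})

indexed = df.groupby(["name", "id"]).sum()                 # keys become a MultiIndex
flat = df.groupby(["name", "id"], as_index=False).sum()    # keys stay as ordinary columns

print(flat)
```

With the MultiIndex version the repeated name is only hidden in the printout; `as_index=False` (or `.reset_index()`) writes it out on every row.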
Wanted Ideas for metric: what's a good substitute for false positive rate in a one-class object detection algorithm?
The kicker is: it would be important for this to be model agnostic. And truly capture the essence of "how likely is my model to falsely predict another object where none exists"
Any suggestions or even partial ideas welcome.
How come 1 vs 0 wouldn't work for the metric?
Where it detects the class or it doesn't.
The issue with object detection is that it's not a binary detection. There's the problem with localization as well (where in the image is an object detected). As such, when it doesnt predict a box, it's doing a good job out of an amazingly large number of candidate boxes that were never predicted.
So we don't really compute true negatives for object detection (and if we did it wouldn't be model agnostic anyways) thus leading me to this issue.
Anyone into Kaggle can DM me we could form a team
I was actually thinking of per pixel binary detection.
During inference, you would predict whether the pixel is part of the object you're trying to detect or not.
in terms of localization, it would be part of the extraction to localize on where it thinks the object is, then within the ROI detect the object.
Then you can have an average precision rate of how well it recognize the pixels.
hey
I have a pandas series which type is string
the series is like this
10:30
02:45
I want it hour:minute
could I change data type?
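Two options that might fit, sketched on the values above: real time-of-day objects, or timedeltas if the values are durations.

```python
import pandas as pd

s = pd.Series(["10:30", "02:45"])

as_time = pd.to_datetime(s, format="%H:%M").dt.time    # datetime.time objects
as_delta = pd.to_timedelta(s + ":00")                  # durations, easy to add or compare

print(as_time.tolist())
print(as_delta.dt.total_seconds().tolist())
```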
yo so i am a beginner in ML and neural networks and i am currently trying to create a face recognition neural network with tensorflow and keras
i have finally figured out how to bring the data into the right shape
but my accuracy is 0.00 sth xDDD
how do i find out which loss functions i should use, which activation functions and how many denselayers
cause i guess thats why i have such a low accuracy xDD
or how many epochs i do need
lol
Perhaps you can help with this question
How do I keep using the return of a function instead of it only being available inside the function
It used the one defined outside the function if that makes sense
hmm I see what you mean
you should store it in a variable and then do the conditions
if you know what I mean
yeah I just thought about that let me try it
after the return dataset you should store that in a variable and then use it
after?
I stored the dataset inside another variable and then returned the new variable
ah I see what you mean ill try that
yea otherwise they are overriding each other
Legend 
Worked?
yeah I see this it just calls the function everytime to have that result
yeah it did ❤️
nice
cheers mate
np.eye() and np.identity() are same right
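For the square case they give the same matrix; `np.eye` just takes extra arguments. A quick check:

```python
import numpy as np

same = np.array_equal(np.eye(3), np.identity(3))   # same square identity matrix
print(same)

# np.eye additionally accepts a column count and a diagonal offset.
wide = np.eye(2, 3)
shifted = np.eye(3, k=1)
print(wide)
print(shifted)
```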
will it be easy switching from web dev to ai
@lapis sequoia Hey mate I have done web dev for a while not the best in it but have built some websites and now I'm doing a bit of machine learning its not the hardest I have realized so far because some stuff in ML are repetitive but then again my experience is limited in both
Just go for it
ight thx
@lapis sequoia if you have the basic knowledge of python which I am assuming you have because you were doing web dev so it will not be that hard
how do I know if I should use StandardScaler or MinMaxScaler?
@grave path It is one of those design decisions that is frequently not very obvious to me going into a problem. You can always try both, but it adds a lot of compute if you keep trying every combination of methods. I frequently will research the model I am planning to build and see if the documentation recommends normalization or standardization.
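It can help to see what each one does to the same feature. A NumPy sketch of the per-feature formulas that scikit-learn's StandardScaler and MinMaxScaler (with its default 0 to 1 range) apply; the data is made up and includes one outlier on purpose:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # made-up feature with one outlier

standardized = (x - x.mean()) / x.std()           # zero mean, unit variance
min_maxed = (x - x.min()) / (x.max() - x.min())   # squeezed into [0, 1]

print(standardized.round(2))
print(min_maxed.round(2))
```

Note how the outlier pushes the other min-max values close to 0, which is one reason to look at the data before choosing.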
Thank you very much
I'm working on a jupyter notebook in pycharm, but it takes up excessive amounts of memory. Does anyone know how to solve this?
@spark nimbus Directly use the Python terminal. No need for Jupyter in PyCharm.
What I prefer is make a different folder and shift+right click and then install all the libraries and jupyter notebook and run the jupyter notebook from that folder itself so it is easier to keep track of stuff
Note : This is just my opinion
@golden saffron No I mean, this is meant as interactive documentation
but this kinda keeps happening every so often
@spark nimbus maybe allocate more memory to PyCharm
oof
hey guys
how would i turn this into a dictionary:
i really want to convert size_name to a dictionary
the output looks like 20511552:10
where the number after the colon is the size
nvm i found it
It is sort of confusing to me to rename something over the size variable inside the loop of sizes. Maybe the initial size variable in for size in sizes should be named differently?
0 10 condo A
1 24 duplex D
2 32 home D
3 25 duplex A
4 65 condo A
how do I turn it in to this^
Price Type City AVG
0 10 condo A 37.5
1 24 duplex D 24.0
2 32 home D 32.0
3 25 duplex A 25.0
4 65 condo A 37.5
i tried groupby type, city and .agg price to mean
And what did that give you?
incompatible index of inserted column with frame index
import pandas as pd
list_values=[[10, 'condo', 'A'],
[24, 'duplex', 'D'],
[32, 'home', 'D'],
[25, 'duplex', 'A'],
[65, 'condo', 'A']]
df_values = pd.DataFrame(list_values, columns=['Price', 'Type', 'City'])
df_values.groupby(by=['Type', 'City'], as_index=False).agg({'Price': 'mean'})
This outputs:
Type City Price
0 condo A 37.5
1 duplex A 25.0
2 duplex D 24.0
3 home D 32.0
Then you need to rename price as 'AVG' and then left join this dataset back on to your original dataset on Type and City.
I always forget if I need to use merge or join. I think one uses the index and the other columns, but I was being a little loose in my language :/
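A sketch of both routes on the same data. `merge` matches on columns by default and `join` matches on the index by default, so `merge` is the natural fit here; `transform` skips the join altogether because it returns one value per original row.

```python
import pandas as pd

df_values = pd.DataFrame(
    [[10, "condo", "A"], [24, "duplex", "D"], [32, "home", "D"],
     [25, "duplex", "A"], [65, "condo", "A"]],
    columns=["Price", "Type", "City"],
)

# Route 1: aggregate, rename, then merge back on the key columns.
avg = (
    df_values.groupby(["Type", "City"], as_index=False)
    .agg({"Price": "mean"})
    .rename(columns={"Price": "AVG"})
)
merged = df_values.merge(avg, on=["Type", "City"], how="left")

# Route 2: transform keeps the original row order and index, so no merge is needed.
df_values["AVG"] = df_values.groupby(["Type", "City"])["Price"].transform("mean")

print(merged)
```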
i got what you meant tho, and it was useful!
practically the last piece to my puzzle
What determines the default values of an ndarray when created like this ?
a = np.ndarray((height,width,3),dtype=np.uint8)
is there any book for image processing? but not for opencv, I know that, but some more advanced stuff, detecing peoples or creating haar contours
@alpine bay I wrote that code and it is random
@molten hamlet what do you mean you wrote that code?
Hey @indigo skiff!
It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.
Feel free to ask in #community-meta if you think this is a mistake.
Hey guys, I need help with an assignment which is due within the next few hours. I wanted to check if I'm doing it all right. It's an introductory-level master's assignment which asks us to apply DFS, BFS, uniform cost search, best-first search and Algorithm A functions, along with a few more interesting questions. I am not able to attach the assignment. Reading time would be 4-6 mins; please could someone have a look? Any help would be really appreciated. I am looking for someone I can discuss this with quickly. I am a new member, so please excuse me if I'm not asking this in the right place. Unfortunately, since I am a new member, I am not able to use the voice chat function, so would anyone want to volunteer and have a quick discussion please? Thanks again everyone.
@alpine bay just print(a) few times and you will see
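To add some detail: calling `np.ndarray(shape, dtype=...)` directly allocates memory without filling it, the same as `np.empty`, so the values are whatever bytes were already there. If defined contents are needed, something like this sketch (dimensions made up) is the usual route:

```python
import numpy as np

height, width = 2, 3   # made-up dimensions

garbage = np.empty((height, width, 3), dtype=np.uint8)     # uninitialized, arbitrary contents
zeros = np.zeros((height, width, 3), dtype=np.uint8)       # every element 0
white = np.full((height, width, 3), 255, dtype=np.uint8)   # every element 255

print(zeros.sum(), white.min())
```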
@indigo skiff create sample problems and code up some stubs. Internet has Bfs and Dfs free for the picking.
@spare lotus What are you trying to do?
@spark dirge are you available for quick discussion? please check message
deta
?
Hey - hopefully, this is the right channel: This is encoded with "ISO-8859-1" What is happening here, and how can I avoid that? (Left Dataset CSV - Right Side Output)
@verbal jetty try encoding='latin1'
Thank you @velvet thorn . but unfortunately the same result
you sure
the encoding is correct?
can you
show
all the arguments
to pd.read_csv
other than filename
@velvet thorn
numbers
got it - works with utf-16
Hello everyone!
i am wondering how to extract validation data from this
self.__Dir_Data = tf.keras.preprocessing.image_dataset_from_directory(self.__Dir_Path ,validation_split = 0.1 ,subset="training", seed = 1, labels='inferred', label_mode='int' ,batch_size=32 ,image_size=(124, 124))
I have following statement, which is getting from specified directory whole Train Data
And I store inside Dir Data the Train Data from directory, Is it possible to extract for example 10%-20% of images to separate Validation Data?
and it is the same for labels
hey
can someone help me iterate into a list?
i think i need to add an if statement to increase right?
wait
Well, looking at the code, I expect it to add the same number 1000 times
its never re-evaluated
yeah
its not called 1000times
only the list it
is*
sorry
the range
dont i just need to do somthing like
data * 1000
and add it to the list
should be correct?
the only thing being its saying array
better use a help channel for this
if you guys have data such as "time, Rates per minute, Rate of penetration, Torque, and Weight"
what type of algorithm do you suggest I use? I was thinking of just plotting the data then being like "when weight spiked at this time, the Rates per minute were increased"
I heard doing that is a type of algorithm called "linear regression"
you guys got any other suggestions ?
Well, plotting alone doesnt have a lot to do with linear regression
linear regression is finding the line, that fits the data (blue points) best

so if you plot your data and the points are arranged like this, linear regression is probably a good model
if it tilts slightly as time goes by
i.e. exponential,
What do you recommend I use then?
well, in most cases you would want to transform your data
and do linear regression after that
so if it looks exponential, you'd take the log
and do linear regression with the transformed values
Ohh, I see. Thanks for your explanation
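A sketch of that log-transform trick on made-up, noise-free data of the form y = a * exp(b * t): taking the log turns it into a straight line, which ordinary linear regression can fit.

```python
import numpy as np

t = np.arange(10, dtype=float)
y = 3.0 * np.exp(0.5 * t)          # made-up exponential data, a = 3.0 and b = 0.5

# log(y) = log(a) + b * t is a straight line in t, so fit that instead.
b, log_a = np.polyfit(t, np.log(y), deg=1)
a = np.exp(log_a)

print(round(a, 3), round(b, 3))
```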
What library should I use for reading MySQL in pandas?
import matplotlib.pyplot as plt
x = [-2, -1, -1, 1, 2, 3, 4]
y = [0, 0, -1, 1, 1, 0, 0]
plt.xlim(-1.5, 3.5)
plt.ylim(-1.5, 1.5)
plt.plot(x, y)
and then some labels too somewhere in there
how can I remove a row in my dataframe if it contain all NaN value? (using pandas)
I tried this but doesn't work:
.dropna(how='all', axis=0)
Just wrote a new blog:
https://greatexpectations.io/blog/data-tests-failed-now-what/
TLDR: The job isn't done after you build your data pipeline tests. This blog goes through the processes necessary after a test fails.
You think all you need to do to secure your data pipeline is implementing some tests and all your data problems are solved? Unfortunately, it's not quite that easy...
I have one doubt
I have taken a gender classification model
I have columns as 'names' and 'gender'
But for better training, I trained using columns 'starts by vowel/consonant', 'ends by vowel/consonant', 'long/short size'
I trained it using a decision tree classifier
And I saved the model
Now, I sent the model to someone
He knows only that the dataset had 'name' and 'gender' columns
So, he calls predict(test['name'])
Will it return the right answer? I mean, will it return gender?
Or does he have to pass it as test[['starts with consonant/vowel', 'ends with consonant/vowel', 'short/long word']]
Please ping me while saying solution to me
And please do provide me a solution
Any opinions on Dagster vs. Prefect?
Hello, I am using pandas dataframes with the following short snippet: https://pastebin.pl/view/691d9cc0
I am getting the error
rsi.py:17: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I have tried using the .loc method, but It hasn't worked thus far. it says I'm making a copy, but I'm not sure how or where.
@shy mesa pandas methods create copies
they don't modify inplace
you need to reassign to the original variable or add inplace=True
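A short sketch of the difference on a made-up frame whose middle row is entirely NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

df.dropna(how="all", axis=0)         # result is discarded, df still has 3 rows
rows_before = len(df)

df = df.dropna(how="all", axis=0)    # keep the returned frame
print(rows_before, len(df))
```

Row 0 survives because only one of its values is NaN; `how="all"` drops a row only when every value is missing.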
@smoky fractal you're doing it at the start
symbolData = symbolData.tail(bars)
which is equivalent to symbolData.iloc[-5:]
anyway
your code could be improved a lot IMO
ah, how can I isolate only the most recent x rows?
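One common pattern, sketched with made-up prices (the `close` and `change` column names are assumptions, since the pastebin code isn't shown here): keep using `.tail(bars)`, but take an explicit copy before adding columns.

```python
import pandas as pd

symbolData = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})  # made-up prices
bars = 3

# .tail(bars) keeps the last `bars` rows; .copy() makes the result an independent
# frame, so adding columns to it should no longer trigger SettingWithCopyWarning.
recent = symbolData.tail(bars).copy()
recent["change"] = recent["close"].diff()

print(recent)
```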