#data-science-and-ml
key phrases
But not a set number, and the order matters for when and where to split
Hey guys, not sure if this is the right place to ask for help, but here it goes: I want to make a small "app" to read data from a database. That I know how to do, but my problem is that I want to show it on a graphic, anyone free to help me?
!rank
Iterating over range(len(...)) is a common approach to accessing each item in an ordered collection.
for i in range(len(my_list)):
    do_something(my_list[i])
The pythonic syntax is much simpler, and is guaranteed to produce elements in the same order:
for item in my_list:
    do_something(item)
Python has other solutions for cases when the index itself might be needed. To get the element at the same index from two or more lists, use zip. To get both the index and the element at that index, use enumerate.
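A minimal sketch of the two alternatives named above. The lists `names` and `scores` are made up for illustration.

```python
names = ["ada", "grace", "alan"]
scores = [90, 85, 77]

# enumerate yields (index, element) pairs, so no manual indexing is needed
indexed = [(i, name) for i, name in enumerate(names)]

# zip pairs up the elements that sit at the same position in each list
paired = [(name, score) for name, score in zip(names, scores)]
```

Both helpers stop at the end of the shortest input, so `zip` quietly ignores extra items if the lists differ in length.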
Anyone here familiar with DVC ? I got an ML project (gonna be creating a CNN for a final), and we're wondering if it's worth learning to use versus just using regular git
I'm not familiar with some of the things they're pointing out in their Getting Started tutorial, so I'm a little lost with how we can integrate this with our project
@steel palm
i need to write a script to download a pandas df
like i have a flask app, it queries mysql db, places the result in a pandas df
i then convert it to xls, because my goal is to have the user be able to download that xls file
idk how flask works so idk how youd get them to download the file but you can convert the df to xls by using df.to_excel()
Hello guys, is someone here willing to help me make appending to a list from csv.reader faster? I profiled my code and 90% of the time is in an if-else block with type checks and then appending to a list. It is kinda annoying as i have around 500k values
@hollow ferry Maybe use pandas.read_csv() instead? Then do type conversions one column at a time, and export contents to lists if that's how you want them
i am supposed to use csv.reader only :/ but i dont think its a problem with csv reader, i think i am just looping too much and appending too much
Ah. Well what's the code?
```
import csv
from io import TextIOWrapper

import numpy as np

# opened_csv is the binary file object opened from inside the zip archive
reader = csv.reader(
    TextIOWrapper(
        opened_csv,
        'iso8859-2'),
    delimiter=';',
    quotechar='"')
all_rows = list(reader)
for row in all_rows:
    for i, value in enumerate(row):
        if data_types[i] == float:
            try:
                region_data[1][i].append(
                    float(value.replace(',', '.')))
            except ValueError:
                region_data[1][i].append(np.nan)
        elif data_types[i] == int:
            try:
                region_data[1][i].append(int(value))
            except ValueError:
                region_data[1][i].append(-999999)
        else:
            region_data[1][i].append(value)
```
i have to do those replacements and type checks there... also data_types is a list with the desired types matching the column order
What's with the TextIOWrapper? Also, you don't need to make a list of all rows, you can iterate over the reader:
for row in reader:
oh i am opening csv inside zip files and without that wrapper it screamed some bytes error
yeah you were right with that unnecessary conversion to list, but still it wont make much difference
maybe around 5%
or my computer has warmed up and it's no diff
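One possible restructuring of the loop above, offered as a sketch rather than a guaranteed speedup: pick one converter per column up front, so the type comparison no longer runs for every value. The helper names (`to_float`, `to_int`, `read_columns`) are made up here, `math.nan` stands in for `np.nan` to keep the example stdlib-only, and the sample data is invented.

```python
import csv
import io
import math

def to_float(value):
    # Decimal commas become decimal points; unparsable values become NaN
    try:
        return float(value.replace(",", "."))
    except ValueError:
        return math.nan

def to_int(value):
    try:
        return int(value)
    except ValueError:
        return -999999  # same sentinel as in the snippet above

def identity(value):
    return value

CONVERTERS = {float: to_float, int: to_int}

def read_columns(text_stream, data_types):
    """Read a ';'-delimited stream into one converted list per column."""
    # The type lookup happens once per column, not once per value
    converters = [CONVERTERS.get(t, identity) for t in data_types]
    columns = [[] for _ in data_types]
    reader = csv.reader(text_stream, delimiter=";", quotechar='"')
    for row in reader:  # iterate the reader directly, no intermediate list
        for column, convert, value in zip(columns, converters, row):
            column.append(convert(value))
    return columns

# Hypothetical two-row sample with one bad int and one bad float
sample = io.StringIO("1;2,5;abc\nx;oops;def\n")
cols = read_columns(sample, [int, float, str])
```

Whether this helps with 500k values would need profiling on the real data; the try/except cost for bad values stays the same.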
umm
how would you
do backpropagation for this
import numpy
from matplotlib import pyplot as plt

def NN(m1, m2, w1, w2, b):
    z = m1*w1 + m2*w2 + b
    return sigmoid(z)

def sigmoid(x):
    return 1/(1+numpy.exp(-x))

w1 = numpy.random.randn()
w2 = numpy.random.randn()
b = numpy.random.randn()
given that the input is a list of lists with two elements
??
self.w_h -= self.eta * delta_w_h
self.b_h -= self.eta * delta_b_h
self.w_out -= self.eta * delta_w_out
self.b_out -= self.eta * delta_b_out
z = dot(x, w) = dot(wT, x)
o = step(z)
dot = x1*w1+x2*w2
derivative = o * (1. - o)
[[3,1.5,1],
I can only type now
@final garnet I can only type now sorry
hmm
sorry
umm
sorry I have to go
I can still type
tho
@final garnet
@sweet owl
from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
thanks
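For the single-neuron `NN(m1, m2, w1, w2, b)` from the question, backpropagation reduces to one chain-rule step. The sketch below assumes a squared-error loss and rows shaped `[m1, m2, target]`. The data, the learning rate `eta`, and the helper names are all hypothetical, and the weights start at zero instead of `randn()` only so the run is repeatable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(m1, m2, w1, w2, b):
    # Same computation as NN() in the question
    return sigmoid(m1 * w1 + m2 * w2 + b)

def train_step(sample, w1, w2, b, eta=0.1):
    """One gradient-descent update on a single [m1, m2, target] sample."""
    m1, m2, target = sample
    o = forward(m1, m2, w1, w2, b)
    # Loss L = 0.5 * (o - target)**2, so dL/dz = (o - target) * o * (1 - o)
    delta = (o - target) * o * (1.0 - o)
    # dz/dw1 = m1, dz/dw2 = m2, dz/db = 1
    return (w1 - eta * delta * m1,
            w2 - eta * delta * m2,
            b - eta * delta)

def total_loss(data, w1, w2, b):
    return sum(0.5 * (forward(m1, m2, w1, w2, b) - t) ** 2
               for m1, m2, t in data)

# Hypothetical training data: [m1, m2, target]
data = [[3, 1.5, 1], [2, 1, 0], [4, 1.5, 1], [3, 1, 0]]
w1, w2, b = 0.0, 0.0, 0.0  # fixed start so the run is repeatable
initial_loss = total_loss(data, w1, w2, b)
for _ in range(2000):
    for sample in data:
        w1, w2, b = train_step(sample, w1, w2, b)
final_loss = total_loss(data, w1, w2, b)
```

The `o * (1 - o)` factor is the same sigmoid derivative as the `derivative = o * (1. - o)` line posted above.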
Hi, I am trying to web scrape data and this is my first project. Here is the HTML I am trying to scrape.
<tr>
<td class="posterColumn">
<span data-value="47" name="rk">
</span>
<span data-value="8.459750370323722" name="ir">
</span>
<span data-value="1.1610432E12" name="us">
</span>
<span data-value="1166711" name="nv">
</span>
<span data-value="-2.5402496296762784" name="ur">
</span>
<a href="/title/tt0482571/">
<img alt="The Prestige" height="67" src="https://m.media-amazon.com/images/M/MV5BMjA4NDI0MTIxNF5BMl5BanBnXkFtZTYwNTM0MzY2._V1_UY67_CR0,0,45,67_AL_.jpg" width="45"/>
</a>
</td>
<td class="titleColumn">
47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">
(2006)
</span>
</td>
I bolded the title I want to get. Here is my code:
def movieLinks(content, classTitle):
    table = content.find('table', {'class': classTitle})
    rows = table.find_all('tr')
    for row in rows:
        cells = row.find_all('td')
        if len(cells) > 1:
            movie_link = cells[1].find('a')
            print(movie_link.get('title'))
This is the output:
Christopher Nolan (dir.), Christian Bale, Hugh Jackman
I do not want this information though. I just want the movie title. Any idea of how to do this?
you using bs4?
In this video we walk through web scraping in Python using the beautiful soup library. We start with a brief introduction to HTML & CSS and discuss what web scraping is. Next we start getting into the basics of the beautiful soup library. This includes how to load a webpage, t...
this might be helpful
In this Python Programming Tutorial, we will be learning how to scrape websites using the BeautifulSoup library. BeautifulSoup is an excellent tool for parsing HTML code and grabbing exactly the information you need. So whether you're pulling down headlines from news sites, sc...
I've still been using Kaggle datasets lol
I am following a tutorial and it had me using beautiful soup for a couple parts but not this part. I just don't know how to identify the title portion
idk haha maybe someone else knows
I will watch these videos though. Maybe it will contain the answer. Thanks for the help
no problem
I'm getting this ValueError: could not convert string to float: 'Pentax Optio 430RS'
when i try to run this lm.fit(X_train,y_train)
i thought I dropped the model column
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
I thought I dropped the models Idk why it's freaking out now
send help pls
@half mountain
```
from bs4 import BeautifulSoup

test = ''' <td class="titleColumn">
47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">
(2006)
</span>
</td> '''
title = []
parse = BeautifulSoup(test, "html.parser")
cells = parse.find_all('td')
for x in cells:
    title.append(x.get_text(separator=','))
```
it will get you all the text values including the title
if the structure of the whole html is the same you could easily remove the unwanted data so that in the end you only have left the titles
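A possible shortcut, assuming bs4 and the row structure shown earlier: the movie name is the text inside the `<a>` tag, while `.get('title')` returns the tag's `title` attribute (the cast list). The table class `chart` and the trimmed HTML below are stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical version of the row from the question
html = '''<table class="chart"><tr>
<td class="posterColumn"><a href="/title/tt0482571/"><img alt="The Prestige"/></a></td>
<td class="titleColumn">47.
<a href="/title/tt0482571/" title="Christopher Nolan (dir.), Christian Bale, Hugh Jackman">
The Prestige
</a>
<span class="secondaryInfo">(2006)</span>
</td></tr></table>'''

soup = BeautifulSoup(html, "html.parser")
titles = []
for row in soup.find("table", {"class": "chart"}).find_all("tr"):
    cells = row.find_all("td")
    if len(cells) > 1:
        link = cells[1].find("a")
        # get_text reads the tag's contents; strip=True trims the whitespace
        titles.append(link.get_text(strip=True))
```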
if anyone figures it out lmk
@hollow sentinel because you're trying to convert a string to a float, so instead maybe try to use a LabelEncoder, for example, to convert string values to numerical values so it can be used in your model
idk how to do that
@hollow sentinel for example see: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
@chrome barn what does it mean to normalize values
@hollow sentinel see https://www.askpython.com/python/examples/normalize-data-in-python for an example and little explanation
i'm confused on how to use it lmao
ValueError: bad input shape (1038, 13)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(cameras)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
and what is the full error you get now
ValueError: bad input shape (1038, 13)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df["camera"] = le.fit_transform(cameras)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape
i don't get the df["camera"] = le.fit_transform(cameras) line
cameras is the entire dataframe
which column was giving the error before about the string to float error
that is the column that you use the label encoder on
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit_transform(cameras["Model"])
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape
it was the model column
but i still have the same error
ValueError: could not convert string to float: 'Pentax Optio 430RS'
le.fit_transform(cameras["Model"]) this doesn't do anything on its own, you have to assign the result back to the dataframe
cameras['model_label'] = le.fit_transform(cameras["Model"])
then drop the Model column
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
i dropped the model column before in my code
should i delete that line to drop it there
what is the code that you dropped it with?
cameras.drop(["Model"],axis=1)
it didn't drop it
cameras = cameras.drop(["Model"],axis=1)
or cameras.drop(["Model"],axis=1, inplace=True)
AttributeError: 'numpy.ndarray' object has no attribute 'drop'
your dataframe is probably not a pandas dataframe object anymore but has been converted to a numpy array
uhhhhhhhhh
your using jupyter notebook to work in
yes
how does the whole code look like because this way it is hard to understand what you are doing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
cameras = pd.read_csv("camera_dataset.csv")
cameras.head()
cameras.tail()
cameras.isnull().sum()
cameras= cameras.drop(["Model"],axis=1, inplace=True)
sns.countplot(x=cameras["Max resolution"],data=cameras)
sns.jointplot(x="Max resolution", y="Low resolution", data=cameras)
sns.lmplot(x="Max resolution", y="Low resolution", data=cameras)
sns.heatmap(cameras.corr())
sns.pairplot(cameras)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
cameras["model_label"] = le.fit_transform(cameras["Model"])
cameras.drop(["model"],axis=1)
y = cameras["Price"]
X = cameras.drop(["Price"],axis=1)
#y.shape
#X.shape
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
sorry it's long
there are 3 options on how to do a drop in a dataframe
cameras = cameras.drop(["Model"],axis=1)
cameras.drop(["Model"],axis=1, inplace=True)
cameras.drop(columns=['Model'], inplace=True)
cameras= cameras.drop(["Model"],axis=1, inplace=True) replace with cameras.drop(["Model"],axis=1, inplace=True)
also your trying to remove the model twice
first in row 9 and then later again in row 18
you can only drop a column once
also since the model column is dropped remove the whole label encoded
encoder
because you got the string error before because you didn't remove the model column properly from the dataframe
KeyError: "['Model'] not found in axis"
i have that from when I tried to remove it in the first place
cameras.drop(["Model"],axis=1, inplace=True)
use that not cameras= cameras.drop(["Model"],axis=1, inplace=True)
remove the cameras = part
cameras.drop(["Model"],axis=1, inplace=True)
that's what I'm using
nope i'm still getting KeyError: "['Model'] not found in axis"
is the dataset from a public source?
yep kaggle
can you link the url to the kaggle dataset that your using so i know for sure that we are looking at the same dataset
give me a moment
@hollow sentinel i had this issue before, try doing cameras.drop([cameras.columns[0]], axis=1, inplace=True)
did you reload your notebook again?
because probably your trying to delete the model column while it is already deleted
could be that
KeyError: 'Model'
import pandas as pd
cameras = pd.read_csv('camera_dataset.csv')
cameras.drop(["Model"],axis=1, inplace=True)
which means it's no longer there
works for me drops the model column
should i close the notebook and reopen it?
when i had the issue it was because the name of the column had some sort of whitespace or encoded character, so when i called it on the column name from df.columns it worked
yeah his issue is probably that he already removed the column
yes
oh i think i got it lmao
the key error was bc i was trying to do a seaborn countplot with the Model column
F
ayyyyy it works thank you guys
y'all are the best
your initial error with the string to float was caused by not removing the Model column properly in your code
did you also remove the label encoder in your code
cameras = cameras.drop(["Model"],axis=1)
cameras.drop(["Model"],axis=1, inplace=True)
cameras.drop(columns=['Model'], inplace=True)
remember choose 1 for dropping in the future
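A small sketch of why mixing the styles fails: `drop(..., inplace=True)` returns `None`, so writing `cameras = cameras.drop(..., inplace=True)` replaces the dataframe with `None`. The two-row frame is made up for illustration.

```python
import pandas as pd

cameras = pd.DataFrame({"Model": ["A", "B"], "Price": [100.0, 250.0]})

# Style 1: drop returns a new frame, so keep the return value
without_model = cameras.drop(columns=["Model"])

# Style 2: inplace=True mutates the frame and returns None
scratch = cameras.copy()
result = scratch.drop(columns=["Model"], inplace=True)
```

Dropping a column that is already gone raises `KeyError`, which is also what re-running a notebook cell after the first drop produces.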
got it
and yes the label encoder is gone
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
UHHHHHHH
theres probably nans in there
.isnull().sum() returns how many NANs there are right
yeah
there are a few nan's not that many
yeah if theres only a few you can just drop them
or fillna
well that would be bad in this case
cus you cant really average the columns and just assume the specs of a camera
but since there arent many then dropna wouldnt affect it much
that makes sense
it will go outside the scope but you could look at the model and see if the nan values have a comparable camera type to make a prediction from for the nan
but in this case it won't matter to drop them
what is a heatmap and a pairplot supposed to show
like the darker the color is in the heatmap means ...
also to take into account next time daspecito you also have 0.0 values in your columns for example in weight
from a common sense the weight of a camera is not 0 but now your putting it in the model
0.0 then means probably it was also a nan values but somebody before you already put those nan values at 0
so in your current value you have nan classified as empty and 0.0
value=dataset
like what it weighs a human is 100kg like that
oh you mean like a column in the dataframe
i mean like weight in general in machine learning
oh no i am talking about your dataset
and since your doing linear regression the 0.0 values in your dataset that don't make sense are essentially nan values and will affect the model
but that is for next time just try to get the model/code to work for now
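A sketch of the cleanup discussed above, assuming pandas and NumPy: count the NaNs, treat implausible zeros as missing, then drop the incomplete rows. The column name `Weight` and the four rows are hypothetical stand-ins for the camera dataset.

```python
import numpy as np
import pandas as pd

cameras = pd.DataFrame({
    "Weight": [320.0, 0.0, np.nan, 180.0],
    "Price": [179.0, 229.0, 149.0, 399.0],
})

# How many values are missing per column
missing_per_column = cameras.isnull().sum()

# A weight of 0.0 is not physically plausible, so mark it as missing too,
# then drop every row that still has a missing value
cleaned = cameras.assign(
    Weight=cameras["Weight"].replace(0.0, np.nan)
).dropna()
```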
anyone have a favorite or obscure pandas tricks? I just learned about read_clipboard() and to_clipboard() the other day. Curious what other cool and lesser known tricks are out there
there was a whole post of obscure tricks including those on reddit r/learnpython a couple weeks back
can someone explain the following line of code? it will create a list with the moving average, x being the list and n the number of added values: pd.Series(x).rolling(window=n).mean().iloc[n-1:].values
@rigid phoenix this is mostly done via the rolling method it seems. Did you read the pandas documentation on that? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
@neat basin do you have an easy explanation for rolling window calculations?
I'm pretty new to pandas myself, maybe one of the seasoned people can explain it as I haven't used it myself
okay thanks for your help
@rigid phoenix basically
it's like an overlapping groupby
so like say you have 10 rows
and your window size is 3
and you want to calculate the rolling mean
you'll get a result with 8 rows
first row is the mean of original row 1, 2, 3, second row is the mean of original row 2, 3, 4, and so on
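The same idea as a plain-Python sketch, to show what the pandas one-liner computes. With default settings, `rolling(window=n).mean()` keeps all rows and puts NaN in the first `n-1` positions, which is why the original line slices with `.iloc[n-1:]`. The function name `moving_average` is made up here.

```python
def moving_average(x, n):
    """Mean of every full window of n consecutive values in x."""
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]

rows = list(range(1, 11))          # 10 rows: 1..10
means = moving_average(rows, 3)    # windows (1,2,3), (2,3,4), ...
```

With 10 rows and a window of 3 there are 8 full windows, matching the description above.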
Hi, I was wondering if I could get some tips on how to merge a dataframe based on id and date but with a twist.
I need the same dataframe to have rain data from the nearest station, and if there are dates with no rain data then I need to fill in those rows with data from a secondary station.
I've tried combine_first, mixing merge with isna() and a lot of other stackoverflow stuff but no luck so far.
Below I have some example code that will hopefully help.
import pandas as pd
level = pd.DataFrame(
{
"date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05"],
"id": ["asset1", "asset1", "asset1", "asset1", "asset1"],
"level": [0.1, 0.25, 1.2, 1.3, 0.5]
}
)
df2 = pd.DataFrame(
{
"date": ["2020-10-01", "2020-10-02", "2020-10-04"],
"id": ["usgs1", "usgs1", "usgs1"],
"rain": [0.1, 0.2, 0]
}
)
df3 = pd.DataFrame(
{
"date": ["2020-10-01", "2020-10-03", "2020-10-05"],
"id": ["nws1", "nws1", "nws1"],
"rain": [0.5, 1.0, 0.89]
}
)
rain = pd.concat([df2, df3])
rel_table = pd.DataFrame(
{
"id": ["asset1"],
"nearest_rg": ["usgs1"],
"secondary_rg": ["nws1"]
}
)
solution = pd.DataFrame(
{
"date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05"],
"id_level": ["asset1", "asset1", "asset1", "asset1", "asset1"],
"level": [0.1, 0.25, 1.2, 1.3, 0.5],
"nearest_rg": ["usgs1", "usgs1", "usgs1", "usgs1", "usgs1"],
"secondary_rg": ["nws1", "nws1", "nws1", "nws1", "nws1"],
"id_rg": ["usgs1", "usgs1", "nws1", "usgs1", "nws1"],
"rain": [0.1, 0.2, 1.0, 0, 0.89]
}
)
display(
"Level",
level,
"Rain",
rain,
"Relation Table",
rel_table,
"Solution",
solution
)
this would probably be better for a help channel
@rare portal
intermediate = (
    level.merge(rel, on='id')
    .merge(rain.rename(columns={'id': 'nearest_rg'}), how='left', on=['date', 'nearest_rg'])
    .merge(rain.rename(columns={'id': 'secondary_rg'}), how='left', on=['date', 'secondary_rg'])
)
intermediate.drop(columns=['rain_x', 'rain_y']).assign(rain=intermediate['rain_x'].combine_first(intermediate['rain_y']))
I renamed rel_table to rel
it's a bit messy since I just did it but you can probably clean it up a little
@velvet thorn Thanks! Looks interesting. I'll try it out and report back soon.
edit: Reporting back and it works like a charm!
@velvet thorn the real MVP today
So, was learning seaborn recently and something that is really bothering me is that they discontinued the .annotate() method and the integrated display for statistical significance values. Have they replaced this with anything so I can still have my plots display p-value or is this functionality just completely gone?
@neat basin @velvet thorn is always a MVP
damn right
hahaha
yeah, I'm fairly certain linear regression isn't supposed to look like that
hahaha
so we have an answer then you can't predict price from Release date Max resolution Low resolution Effective pixels Zoom wide (W) Zoom tele (T) Normal focus range Macro focus range Storage included Weight (inc. batteries) Dimensions
maybe try eliminating the outliers at the 5k and 8k Y test
you can use .set_xlim()
so try setting the axis to .set_xlim([0,2000]) just to see if the model is working at all
here is an example
hmmmm
so just do plt.set_xlim()
AttributeError: module 'matplotlib.pyplot' has no attribute 'set_xlim'
hmmmm
maybe it's bc i didn't put an index
this is what I get for playing phasmophobia and trying to do python at the same time
no problem
@hollow sentinel here is the doc to figure it out https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.set_xlim.html
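The AttributeError above comes from the fact that `set_xlim` is a method on an Axes object, not a function in `matplotlib.pyplot`. A minimal sketch, with made-up points and the non-interactive `Agg` backend chosen only so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([100, 500, 5000], [120, 480, 900])  # hypothetical points

# Limit the x-axis on the Axes object
ax.set_xlim([0, 2000])
# The pyplot-level equivalent for the current axes is plt.xlim(0, 2000)
```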
I'm not entirely paying attention so probably someone who isn't just running off 2 brain cells can help
alright thank you
haha yeah i'm new to this stuff I just decided to practice linear regression
it's pretty neat
right
though seaborn is better for statistical plotting
definitely
i heard matplotlib gives you more control
I think seaborn is prettier
it does, but there is only so much control someone needs
lmplots from seaborn are great for linear reg
oh yeah
oh yeah
yeah I could've used the max resolution to predict the min resolution
but i wanted to do price
It might not be a regression type problem
actually i take that back idk
Hey, I'm trying to do something for my school project. I had to choose between tkinter and matplotlib or pyqt and pyqtgraph and I chose the second option. I managed to get the mysql connection working and make the graphic read data from there but now my problem is how to update the data since I'm going to use an ESP32 to import the data to the database. Basically I want to make a live updating graphic. Any idea how I can do that?
Python Programming tutorials from beginner to advanced on a massive variety of topics. All video and text tutorials are free.
maybe that's helpful idk I've never tried that before
I think pyqtgraph is more suitable for my project
oh that's a graphing library i'm not familiar with
are you allowed to use seaborn?
i like seaborn lol
I can use whatever I want, but one guy here told me that those two might be better for me
Since I'm a beginner in python
oh ok
since you said you wanted to do PyQt5
I'm open to any tool I could use since the main goal of this project is to learn
that might be good
I followed the second link, in fact, I used part of the code
nice
And the mysql part I did myself since I already worked with mysql before
do all the plots look like you're still running DOS for pyqtgraph?
Works like a beauty... My problem is the "self update" part
@neat basin I didn't understand your question, sorry
it was a joke because the plot looks like it's being rendered on a submarine radar
lmao
Now it's all I can see
if it works and no one else needs to see that it looks like someone had a stroke when opening MS paint then you're good
why i prefer seaborn lol
I'll take a look at seaborn
seaborn is built for statistical plotting, what you using it for here?
seaborn is actually stupid easy, but yeah you're doing fine for dynamic plotting
still have no idea how to do annotations now that they got rid of the function
It looks like it's hard to find someone that worked with pyqtgraph already ._.
they don't display p-values by default anymore and that irks me
F
rip, maybe ask a helper?
F indeed
try going to a help channel
just go to python help: available and click a channel underneath it
Occupied
And how I use it? xD
just ask your question
there is an entire channel showing you how to use it
Thank you guys, I'll see what I can do
Where is a good place to learn about neural networking?
I learn better if I learn how the components work and how to change them rather than being told to make a certain project and then using those skills to develop my own thing
hi i need some help with google colab and YOLO object detection
@heavy light like the base theory?
I can reco a book or two
when i train the darknet model my google colab webpage completely freezes after a few iterations and i cant do anything and the page shows as not responding
@wheat seal fair warning though, that's a relatively advanced topic
not really sure how many people will be able to help you
I would but I need to go soon
Essentially, yes, but I don't want so basic and esoteric that I don't understand what's going on
@heavy light try Deep Learning With Python
by Francois Chollet
he made keras
I'm not familiar with keras though I'll take your word for it
@heavy light have you worked with any DL libraries?
@wheat seal the notebook itself freezes?
like you stop getting output?
yes
that's unusual
I'm not really sure how Google Colab works but
are you hitting memory limits or something?
I have not, in fact this is one of my first ventures out of the basic libraries not counting one I had with libtcod while trying to make a roguelike
@heavy light what's your mathematical background like
im not sure
hm
is it because of the output that darknet gives?
my best guess is that you're running out of memory or some other resource...?
because it gives a lot of output
but I can't say with any certainty
ok
not being a Google Colab user
thanks
if that's the case
try a less verbose mode
and see if it works?
@wheat seal to test this
ok ill do that
I'm in basic algebra but I can understand more complex stuff
Don't fuck around in school is my lesson
@heavy light before you get into deep learning
I think linear algebra, discrete mathematics, statistics, and calculus would all be good to know
Hm okay
to add to that, a lot of the stats involved depend on your model and i/o
so what you'll need to learn will vary from project to project
Okay
Well, where can I find more advanced projects? All I've been seeing is the basic "make the computer pick a number between 0 and 1 that's closest to 2" or something
imo, prediction with real world variables (pop., economy, etc, etc) and image/voice recgn (voice is definitely a bit more science oriented and a bit more advanced)
Good because I don't really want image or voice recognition in whatever I choose to develop
why not?
@heavy light wait people actually made something like that
pick a number from 0 to 1 thing?
most common example is having a list ei: [0, 1, 1, 1] = 1 and predicting the result
I haven't seen any other basic examples actually
It's just not the kind of thing I wanna do, as I'm more into chatbots and essentially computers you can communicate with and teach via input rather than, like, verbal input
btw @velvet thorn i am clearing the output every few mins of the training session i started before i asked for help here and it works great now thanks
@bitter harbor that's the one ye
the project idea that got me into this was a voice recgn program that would parse speech into code/text/commands
NLP's a bit more advanced definitely
avg loss at 420
at iteration 113
haha funny cause 4+2=6 and 6 sounds like sticks
lmao
The difficult thing with chatbots is they have to be able to recognize context
Yeah
That's what I'm interested in doing actually
I wanna see the struggles of designing a chat bot that actually works
@wheat seal is colab giving you any sort of error
so it isn't timing out anymore?
I mean if it works it works ¯\_(ツ)_/¯
Considering that and 'common' issues it seems like you're just running out of mem
im not too sure, I'd assume it would vary depending on the model but I found this from a YOLOv2 article:
"0.451929 avg is the average loss error, which should be as low as possible. As a rule of thumb, once this reaches below 0.060730 avg, you can stop training."
Ok thanks
@woven jacinth look at the 2nd line after import statement. You have accidentally put r infront of the file
@woven jacinth ...it says the file's not found
@supple pond p sure that is correct
if you don't use raw strings for paths
you need to double backslash for directory separators
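A short sketch of the point about raw strings. The Windows path is hypothetical and chosen so that the unescaped version silently turns into control characters.

```python
# Raw string: backslashes are kept literally
raw_path = r"C:\new\table.csv"

# Regular string with doubled backslashes: same result
escaped_path = "C:\\new\\table.csv"

# Regular string with single backslashes: \n and \t become
# a newline and a tab, so the path is no longer the intended one
unescaped_path = "C:\new\table.csv"
```

Forward slashes or `pathlib.Path` are other common ways to avoid the problem.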
i dont understand this part.. how do these tensorflow shape changers work and the syntax of the code
@lapis sequoia did you change the column names properly?
that's the only possible cause of that error
Ehh, it's weird
I am getting a plot I plotted earlier when I try to plot the RPM, dunno what's going on
with this error
"Models must be owned by only a single document, ColumnDataSource(id='1209', ...) is already in a doc"
lmao, odd bug. After searching for a couple mins, the solution was "reload the page"
Any idea what this caret shape is saying. He's talking about joint probabilities and marginalizing. I've never seen the notation before though.
I'm pretty sure the correct notation would be p(s1 = sunny, s0 = sunny) no caret needed unless it is pointing to something else. Like marginal probability is equal to the sum of the joint probabilities
I think given is |
And I don't think given would work here since p(x,y) = p(x|y)p(y), so p(s1) = sum(p(s1|s0)p(s0)) for all s0 would be the correct one rather than what he has here
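A numeric sketch of that marginalization, with made-up probabilities for a two-state weather chain: the joint is p(s0, s1) = p(s1|s0)p(s0), and summing the joint over s0 gives the marginal p(s1).

```python
# Hypothetical prior over today's weather, p(s0)
prior = {"sunny": 0.5, "rainy": 0.5}

# Hypothetical transition probabilities, transition[s0][s1] = p(s1 | s0)
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

# Joint probability p(s0, s1) = p(s1 | s0) * p(s0)
joint = {
    (s0, s1): transition[s0][s1] * prior[s0]
    for s0 in prior
    for s1 in transition[s0]
}

# Marginal p(s1): sum the joint over every value of s0
marginal_s1 = {
    s1: sum(joint[(s0, s1)] for s0 in prior)
    for s1 in ("sunny", "rainy")
}
```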
@velvet thorn also, the code you gave me did not work, sadly
my code here https://paste.pythondiscord.com/wocimumuja.py
```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\recDoc1.py", line 266, in post
    if labels == predictions:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
plz ping me if u have any suggestions or solutions
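A sketch of what the traceback is pointing at, assuming `labels` and `predictions` are NumPy arrays (the pasted code is not shown here, so that is a guess). `==` on arrays compares element by element, and `if` cannot reduce that result to a single True or False. The values are invented.

```python
import numpy as np

labels = np.array([1, 0, 1])
predictions = np.array([1, 0, 0])

# Element-wise comparison gives an array of booleans, not one bool
elementwise = labels == predictions

# Pick the reduction that matches the intent
all_match = bool(np.array_equal(labels, predictions))   # every element equal?
any_match = bool((labels == predictions).any())         # at least one equal?
accuracy = float((labels == predictions).mean())        # fraction equal
```

Which reduction is right depends on what the `if` on line 266 is meant to check.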
I have this data which is week day frequency. I want it to be plotted as simple as just weeks with number of times it's frequency (for example monday is 36446)
I also don't get this histogram, like what is 2.00 ?
you need to swap the axes around, and the 2.00 is a count of your weekday names: you have 2 weekdays, friday and saturday, that each have about 30k+ counts, so it is plotted as frequency 2.00
same for tuesday and wednesday little over 34k+ each
Hi guys, I have a table that looks like this (attached). I want to change the 'quantity' part of the MultiIndex into a sub-level on the columns. How can I do this?
@lapis sequoia huh why not
I tried it on a small dataset and it did
what happened
aha figured it out
flatten the df
df2.pivot_table(index=cols, columns='quantity')
ez
i need help converting strings to utf8 binary
all the methods i use leave out 0's and it doesnt work
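A likely cause of the missing zeros is `bin()`, which drops leading zeros. A sketch that pads every UTF-8 byte to 8 bits (the function name `to_utf8_bits` is made up here):

```python
def to_utf8_bits(text):
    """Encode text as UTF-8 and render each byte as exactly 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

bits = to_utf8_bits("Hi")   # 'H' is 0x48, 'i' is 0x69
unpadded = bin(0x48)[2:]    # only 7 digits, the leading zero is gone
```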
how can i make operations for every row in df like column1 / column2?
there are several options easiest will be:
df['column3'] = df['column1'] / df['column2']
oh thanks
Turns out the symbol means join/and. So it's the same as p(x,y)
you need to swap the axes around. the 2.00 counts your weekday names: you have 2 weekdays, friday and saturday, that each have 30k+ counts, so they are plotted as frequency 2.00
@chrome barn
I'm halfway through understanding this, if you explain this a little bit more I will get it buddy
like why I have 2 friday?
that's exactly what I don't get.
I want it to be simple just 7 bars
@trail kite you don't have 2 fridays in the graph, look closely at the graph and what it tells you
what is the horizontal axis in the graph referring to....
ah okay, it should be reverse then, right? as you said
I simply want it to be like this, if I reverse it, it's fixed right?
i don't know if it is fixed but yes currently the amounts you have are horizontal and need to go to the vertical axis
you know what sucks? when I chain functions, when hover on them, I can't see the function docs, like parameters and such (vscode - jupyter)
which parameters do I give it to reverse it?
it looks like you're trying to use the pandas plotting function. I don't use it to create my graphs, so look in the documentation of the pandas plotting
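One way to get the "just 7 bars" result is to count the weekday names first and plot the counts, instead of drawing a histogram of the counts. A rough sketch with invented data, assuming the weekdays live in a Series of names:

```python
import pandas as pd

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical sample; the real data would be the weekday column of the frame.
days = pd.Series(["Monday", "Friday", "Monday", "Sunday", "Friday", "Monday"])

# One count per weekday, in calendar order, with 0 for days that never occur.
counts = days.value_counts().reindex(weekday_order, fill_value=0)

# counts.plot(kind="bar") would then draw one bar per weekday (needs matplotlib).
```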
today I just learned about the UCI Machine Learning Repository
I think I like Kaggle more since I can't see what other people did with the data set
@velvet thorn i printed out the table before and after, the code went through but the table remained the same
@hollow sentinel I managed to make the pyqtgraph update if there is a change in the database, works like I wanted/needed it ❤️
(Sorry for the ping)
@unreal lily haha no problem man happy for you
Hey guys question on dealing with large json file.
So how do you deal with a json file that has a weird format in it?
The json file I'm working with has it in the format of
{
"random_key": [
rest of the records
]
}
https://www.codeproject.com/Questions/1280095/Data-display-on-strange-format-how-to-convert-it-t @heady hatch idk man tbh
idk what to do with json other than json loads/json dumps
hahaha @hollow sentinel , love your enthusiasm.
After talking to a few people, it seems like there's no way around it unless I can get the file without that random key.
I'm currently trying to do a streaming method to see if I can somewhat get around it.
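If the file fits in memory, the unknown top-level key does not have to be known in advance. A minimal sketch, assuming the object has exactly one key whose value is the list of records (the sample JSON is invented):

```python
import json

# Hypothetical payload with an unpredictable top-level key.
raw = '{"random_key": [{"id": 1}, {"id": 2}]}'

data = json.loads(raw)

# Take the first (and only) value without naming the key.
records = next(iter(data.values()))
```

For files too large to load at once, an incremental parser such as the third-party `ijson` package is one option, though that is a suggestion rather than something tried in this thread.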
@heady hatch haha i know a little bit of JSON files from freshman year, but I've been sticking with good ol CSVs from Kaggle
Oh, I don't want to come off discouraging. But do be careful not to get too conditioned to Kaggle.
It's not the best representation of Data Science and work.
I love Kaggle, but it's a quick slice of what the work could be like.
Part of the issue is the data is often really clean.
But it is a good place for understanding the different techniques that other people use.
@heady hatch yes i'm learning Selenium and bs4 so I can scrape my own data like a big boi
Not to give a cheeky answer, but depends on where I need it from.
amazing
:^)
built different
To give you an example. Recently I wanted to do a project on folklores and fables.
So I looked for websites that hosted that kind of text data.
I try to mainly focus on NLP.
But I'm also working with computer vision due to work.
I want to learn it but I want to focus on the basics first
i wanna be comfortable using the basic machine learning algorithms
Get your basics in.
do so many linear regressions i can do it in my sleep
hahaha
Algorithm is one thing, but pay attention to what the model is doing.
Such as overfitting, underfitting, validation, data distribution.
yeah i keep having myself refer to old code i'm not sure if that's good or bad
It's a point of reference until you build your muscles.
Another thing to learn is don't worry so much about getting the right and wrong answer.
haha yes
DS/ML is super malleable.
And I think it's heavily empirically driven.
I mean there's a lot of research behind it too.
But you'll probably get most of your experience from playing with the data and models than reading about it.
haha yes my ML research professor paid for the udemy course i'm taking now
Oh!
It's okay, he's a forever teacher and you're a forever student.
College is just formality. Pay it back to him with good ML work.
i'm always gonna be a student hahaha it's machine learning
:^)
@heady hatch do you recommend any books to learn machine learning
although the books tend to put me to sleep
hahaha
If books put you to sleep, I'd stick with MOOC.
There's Google Crash Course.
once i finish the Portilla course I wanna do the Ng course
Ng is more focused on the theory behind machine learning
Portilla is more execution
I love Ng's course.
that's the seal of approval I need
and then I'll do the google course after bc i'm built different
and mostly bc i hate myself but that's for another day
what date time format is this (and timezone I guess more specifically)
2020-10-26 19:59:52
Can I use R with python?
I think that's the "yyyy-MM-dd HH:mm:ss" format. Can't tell the timezone without more information.
https://www.quora.com/Can-I-run-R-in-Python @slender nymph
very cool
just trying to figure out how to handle these date time values with python and mysql
im querying a mysql db and trying to change the way the user inputs time data
very nice
i already have it set to the default syntax that mysql uses, but obviously i don't want the user to have to follow mysql, i want python to convert it so that mysql can read it
To clarify you want python to convert it to insert into MySQL?
the other way, im pulling data from mysql
the user input currently is set to match mysql syntax. but that's not really intuitive for the user
Ty @hollow sentinel
since it is yyyy-mm-dd hh:mm:ss
im trying to get it to mm-dd-yyyy hh:mm (user time zone)
all users will be est
To convert it into EST, that will require manual computation. Or maybe there's a library.
theres def a lib somewhere
But datetime should be able to do what you're looking for.
Along with converting it to EST.
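A rough sketch of the round trip with only the standard library. It assumes the MySQL values are stored in UTC (the thread does not say) and uses a fixed UTC-5 offset for EST, so it ignores daylight saving time; the function names are made up:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset EST; this does NOT switch to EDT in summer.
EST = timezone(timedelta(hours=-5), "EST")

MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"   # yyyy-mm-dd hh:mm:ss
USER_FORMAT = "%m-%d-%Y %H:%M"       # mm-dd-yyyy hh:mm


def mysql_utc_to_est_text(mysql_text):
    """Format a MySQL datetime string (assumed UTC) for an EST user."""
    stored = datetime.strptime(mysql_text, MYSQL_FORMAT).replace(tzinfo=timezone.utc)
    return stored.astimezone(EST).strftime(USER_FORMAT)


def est_text_to_mysql_utc(user_text):
    """Turn user input in EST back into a MySQL-style UTC string."""
    local = datetime.strptime(user_text, USER_FORMAT).replace(tzinfo=EST)
    return local.astimezone(timezone.utc).strftime(MYSQL_FORMAT)
```

If daylight saving matters, `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) can replace the fixed offset.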
thx
I'm optimizing a TensorFlow project I found online and so far I've improved its speed by over 600% but I'm not sure where to stop. Is 0.5 seconds to identify a couple letters in a small picture considered slow for CNNs?
Yup
It was written by someone who didn't even know list comprehension but somehow made an impressive cnn
basically one line optimized for loops
yeah i just googled it
Hard to say if it's good or not.
Because the tf project could be badly written and you've just optimized it. In regards to 0.5 seconds to identify couple letters, I don't know if there's a benchmark for recognition latency, but it would be useful if there's some kind of baseline for comparison.
Maybe even to the project's constraint.
Because let's say you need the identification to be less than 0.1 seconds. Then 0.5 seconds is pretty quick relative to a human or even the original project, but for the results you're trying to achieve, it's still not feasible.
On the other hand, I will give you kudos for optimizing projects. Great work.
Hm I see what you're saying
With the end goal of using it with async the 0.1 would of course be perfect but in the short time I posted that question I printed some durations of different functions, and alas it's the run method which takes the complete bulk of the duration
Hi I was wondering if someone could help me out with the regex library.
new_seo = pd.read_csv(path + r".*us-" + str(date) + r".*")
the string should be grabbing something like the below:
organic.Positions-us-20201026-2020-10-27T18_26_04Z.csv
When I run the code, I keep getting the file is not found
I think it might be because it's reading path + .*....
in python, r'string' is raw string.
It's often used in regex to not have to escape characters.
I think you might be thinking of glob.
You can make it into a Path object and then use Path.glob(f'*us-{str(date)}*'), maybe something like that.
ok thanks i'll read into the glob library
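A sketch of the glob idea above: `pd.read_csv` needs one concrete path, so the wildcard is resolved first. The helper name `find_report` is hypothetical, and the temporary folder only stands in for the real `path`:

```python
import tempfile
from pathlib import Path


def find_report(folder, date):
    """Return the first CSV in `folder` whose name contains 'us-<date>', or None."""
    matches = sorted(Path(folder).glob(f"*us-{date}*.csv"))
    return matches[0] if matches else None


# Demo in a throwaway directory with a file named like the one in the question.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "organic.Positions-us-20201026-2020-10-27T18_26_04Z.csv").touch()
    found = find_report(folder, 20201026)
    found_name = found.name
```

The returned `Path` can then be handed to `pd.read_csv(found)`; glob patterns use `*` where a regex would use `.*`.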
tell me how it goes, I want to get into tensorflow soon I hope
@drifting hemlock no problem
I'm just going for a general understanding of machine learning by doing the udemy course
I'll come back and do personal projects later
Which course? I kinda want to start with https://www.fast.ai
if you want a general overview of machine learning Python for Data Science and Machine Learning Bootcamp by Jose Portilla is pretty good
it's on Udemy
also Andrew Ng on coursera I've heard a lot of good things about
there's also mini Kaggle courses you can take
I've been meaning to take this for a while https://www.udemy.com/course/deep-learning-tensorflow-2, I've heard good things about it
maybe i'll look at it after I finish the Ng course
so how come when i do this, a shows up as a tensor with a value of 0.85, but when i do a.item() it returns 0.8500000238418579
there should be no like hidden super tiny decimals because the function error() just returns the differences between each value in x and y
i'm just testing it in the repl
and idk why its not exactly 0.85
whats data science?
statistics machine learning basically just analyzing data or something i dont know i aint good at it
I need some help on building ML pipeline
I figured it out it was just the precision, when i used double precision it worked
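The extra digits are what 0.85 looks like after rounding to a 32-bit float, which is the default tensor dtype in PyTorch. A small sketch that reproduces the number with only the standard library (no tensors involved):

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


stored = as_float32(0.85)
print(stored)  # 0.8500000238418579
```

The tensor repr rounds to a few decimals, so it shows 0.85, while `.item()` returns the stored value as a full-precision Python float.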
Interesting.
I was trying to follow along and trying to replicate it in tf. I wonder if it's just a thing in pt.
Or maybe I'm doing something wrong.
@lapis sequoia What do you need help with?
me just now realizing how i didnt need the function at all and i could just do sum(abs(x-y))
We're all learning new things.
lol its always the basics that i just overlook
@heady hatch
I research the basic steps to build ML pipeline yet ain't quite sure, so let me break it down to multiple questions if possible. In Data preparation, what is the best practice for it? Do people usually build this step as a microservice then send the data to next step? Are there any frameworks / libraries available already?
Depending on the scale of the project and how it's broken down. Some people even have different definitions for ML pipelines.
So one question I have is what is yours.
Or if you're unsure here is how I structure my projects.
I usually set up two pipelines, one for data and one for ml.
data pipelines deal with extraction, transformation, and load.
then from the ML pipeline, it takes the data and gets it ready to be modeled. for text and images, it would be preprocessed, shuffled, split, batched etc.
Then sent it off for training.
Similarly with tabular data.
How would unsupervised ml pipeline be different to supervised ?
Depending on the unsupervised problem.
Clustering
One thing that comes to mind is algorithm difference.
which then makes the data prep different.
Oh I guess something that comes to mind is testing and debugging.
I'm not quite sure how you'll test unsupervised machine learning pipelines.
For me there are 4 major steps in ML pipeline.
1.) Data Preparation which would define Train/Test and Featurization
2.) Build and Train the model including hyper-parameter tuning, testing and validation.
3.) Deployment
4.) Monitoring
Am I thinking too narrow ?
I was thinking using mlflow to abstract 2 and 3 if possible
That sounds like the basics of it, but there are a lot to think about too.
Such as experiment tracking.
Agree. Do you break down part of the pipeline as its own microservice or just integrate it in large code base
I've never used mlflow but maybe for things like tracking and hyperparam search. But it also depends if I can build my own.
like for data preparation, if the data is couple gig, it would be a bit of hassle to send it to a microservice and wait for it to return.
Unless you have some other definition for microservice.
Make sense
Right now I am still trying to grasp the idea of building a basic ML pipeline and integrate it to my personal projects, so I am really new to this area
Lots of exploration, what I'm saying is probably not best practice.
i see
I'm a big fan of building stuff on my own before using libraries.
btw, are they all implemented in Python? I know Tensorflow has support in Java, but do you see people using it on Java?
Within my circle, it's all python for ML.
I think it's largely because of the library support.
so usually it would be packaged up to have it as a backend.
If the model is too big, then we'll use a distilled version.
Will building your model in keras decrease the performance of the model? I know you can use purely tensorflow for it
I don't see any decrease in performance.
I usually use tf + keras, or in v2, just tf since it swallowed keras.
right
I usually use keras because of the abstraction.
Because I think calculating stuff myself will introduce a lot of human errors.
Yea that's why I prefer Keras too it makes my life lazy
hahaha
Do you need to use pandas for featurization
For text and images, I use pandas for visualization and exploration.
Maybe for tabular data.
But sometimes it gets too big, and I have to work with Dask.
To answer your question, no.
I see.
Probably I have enough idea to continue researching on my own on building the pipeline.
Thanks for the insight.
Yea definitely.
I was just going to touch up on when I build classical ml pipelines
I use sklearn or library pipelines.
What are the library pipelines you would recommend?
oh sorry I meant like how SpaCy has their own pipeline objects.
same with sklearn.
Instead of writing one with Pandas.
NGL SpaCy has strong features for data preparation
Maybe there are ones specifically just for pipelining.
the reason why I use the ML library pipeline is for robustness.
Let's say
You want to just fit it on training data, and transform it on train and test.
when you do it yourself, you might fit it on both training and testing.
which is why pipelines from respective libraries are useful, since they take care of it for you.
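To make the fit-on-train, transform-on-both point concrete, here is a minimal scikit-learn sketch. The synthetic dataset and the choice of `StandardScaler` plus `LogisticRegression` are only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real tabular dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),   # fitted on the training data only
    ("clf", LogisticRegression()),
])

# fit() learns the scaling statistics and the model from X_train alone.
pipe.fit(X_train, y_train)

# score() reuses the already-fitted scaler to transform X_test.
test_accuracy = pipe.score(X_test, y_test)
```

Because the scaler lives inside the pipeline, there is no separate step where it could accidentally be fitted on the test split.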
right, that's why I found out mlflow which can make your life lazy on training, testing and deploying your model
Does MLFlow allow you to add parts from other services?
Such as for hyperparameter searches.
I should try using MLflow. hahaha
It would clear things up.
seems you can tune your hyper with mlflow
https://github.com/mlflow/mlflow/tree/master/examples/hyperparam
but yeah, SpaCy or NLTK is used for data preparation right? Because it has all the features which abstract transforming texts to vectors
They can, I believe yes.
Previously I did it by hand, I'll look into how I can use SpaCy or NLTK.
Part of the reason is also because models require the data to be in certain format but SpaCy will make it only in SpaCy format.
Like I don't know if they produce attention masks and segment masks.
what would be the best way with numpy to take a 7 value rolling average of a list to create another list of the same length?
This is the code I'm currently using:
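from statistics import mean  # assumed source of mean(); not shown in the original paste

def smooth_increases(increases_list):
    vals = []       # sliding window of the most recent values
    new_vals = []   # smoothed output, same length as the input
    right = 0
    while True:
        vals.append(increases_list[right])
        if len(vals) > 7: vals.pop(0)  # keep at most 7 values (was 8)
        new_vals.append(int(mean(vals)))
        right += 1
        if right == len(increases_list):
            return new_vals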
increases_list is just a list of ints
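One possible NumPy version, sketched under the assumption that a trailing window is wanted and that the first few outputs should average whatever values are available so far. The function name `rolling_mean` is made up:

```python
import numpy as np


def rolling_mean(values, window=7):
    """Trailing rolling mean with the same length as the input."""
    arr = np.asarray(values, dtype=float)
    csum = np.cumsum(arr)

    # Sum of the last `window` values ending at each position.
    window_sums = csum.copy()
    window_sums[window:] = csum[window:] - csum[:-window]

    # Early positions only have 1, 2, ... values available.
    counts = np.minimum(np.arange(1, len(arr) + 1), window)
    return window_sums / counts


smoothed = rolling_mean([1, 2, 3, 4], window=2)
```

Wrapping the result in `.astype(int)` would mimic the truncation in the loop version. `pandas.Series.rolling(7, min_periods=1).mean()` is an alternative if pandas is already in use.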
working on making a simple cli / interactive cli / module for making covid graphs
does anyone know if this book is worth reading ? "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"
i finished andrew ng's course in ML and i want to keep learning.
what do you guys think ?
It is a really good introductory book with plenty of references to other valuable articles.
@idle root you can also try Harvard's Machine Learning course on edx. IBM offers a Data Science course on Coursera too. If you want to strengthen your fundamentals you can try Python for Data Science and Machine Learning by Jose Portilla. The Google Crash Course on machine learning looks interesting too
wait guys
is coursera free
or do you have to finish it in a week and they charge money
ohhhh you need to pay for a certificate that's lame
if you don't understand deep learning clap your hands
can anyone explain why the MinMaxScaler is needed? Portilla just used it in his lecture and I don't get why
anyone good with machine learning?
just ask your question lol
I can't remember exactly why you need to standardize, but minmaxscaler scales each feature separately
yes if they want to
ok.
So, from my understanding, a neural network receives inputs, and then calculates the dot product of the inputs and the weights. Then it adds the result by the biases. Then it repeats at the next set of weights and biases until it reaches the end, which is the output. Is that correct or am I missing anything?
it repeats the weights/biases for each layer in the network not just for the next 'set'
so it multiplies and adds all the weights and biases in the network?
not all together no
so, it multiplies the weights by the input, then adds the biases, then multiplies the result by the next set of weights, and then adds the next set of biases, until it reaches the output?
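That description can be sketched in a few lines of NumPy. This is only an illustration for a plain fully connected network: the layer sizes, random weights and the sigmoid activation are arbitrary choices, not taken from any course mentioned here:

```python
import numpy as np


def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers):
    """Run the input through each (weights, biases) pair in order."""
    activation = x
    for weights, biases in layers:
        # Weighted sum plus bias, then the activation function.
        activation = sigmoid(activation @ weights + biases)
    return activation


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 4)), np.zeros(4)),  # 3 inputs -> 4 hidden units
    (rng.normal(size=(4, 2)), np.zeros(2)),  # 4 hidden units -> 2 outputs
]
output = forward(np.array([0.5, -1.0, 2.0]), layers)
```

Each layer's output becomes the next layer's input; the biases are separate learned values added to the weighted sum.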
lol i'm learning neural networks rn and i don't get it
same
it takes the dot product of the input and the weights, then adds the biases
then it repeats with that layer's output * the next set of weights for each layer
there's also an activation function involved but my knowledge is kinda lacking I haven't done it in a while
ik, the activation function really gets me stuck
I'm pretty sure the weighted sum plus the bias gets pushed through the activation function to keep it within [0,1]/[-1,1] (depending on the function)
then it repeats with that layer's output * the next set of weights for each layer
@bitter harbor this part is also kinda confusing
is there any links you recommend?
I'd suggest watching 3b1b's series' on networks/calculus/linear alg
to get the basics
Python for Data Science and Machine Learning is good too
it has a 5 hour section on neural networks
idk how new you are to this material tho
I've only watched 3b1b, the rest of what I learnt was through research so I can't confirm
lol I might be able to do research in December
so I'm studying up now
taking as many machine learning courses as possible
I've found it's sometimes easier to take a super complicated model and pick it apart until you understand each component individually
ok
i don't know which one i like more neural nets or NLP
nlp doesn't seem fun
how come
I feel like I never understood NLP properly since i haven't done an actual project in it yet
I mean nvm syntactic analysis which is its own challenge, lexical semantics (meaning of individual words in context) seems like an absolute pain
your words seem like zarglar to me
what's zarglar
a fictional alien language
star trek?
no, came from a video from jabrils
oh lmao
NLP is hard but neural nets are even harder imo
but i just started looking at it
so ofc it's gonna be hard
nlp is a subfield of nn's
tl as an example of lexical semantics, detecting which homonym is used in speech depending on context
neural networks?
and then you get into tone and intention
yes
yes
they aren't so bad until you get into super specific models
you should probably learn the math before anything else lol
i tried
it's apparently a lot harder to understand what's going on
lmao that's not going to help
what level math is neural networks?
high
well I'm a first year uni student and im pretty sure I can't take linear alg until my 3rd year
well it's a 300 course
I'm not sure about when multivariable optimization is taught
you guys are way older than me
hey I'm not old :{
:((((
3b1b literally taught me lin alg tho
i'll look at his videos
legitimately take notes
@keen bear we're joking
k
yes
i hope I can learn it though
dw everyone is here to help
I started machine learning maybe 3 weeks ago and the whole channel helped me
guys what is feature engineering explained simply is it the same as cleaning data?
Portilla keeps saying it and idk
it's pulling features from data with domain knowledge, in other words, knowing what variables present themselves in the env you're working with
so manual optimization?
idek
same I just looked at the wiki page
Is it still recommended to use the 1st version of tensorflow using the following commands?
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
why would you do that?
I've been following along with ML with Scikit and Tensorflow
nice
and to make the code work I need to convert to the previous version.
try to find a different course then
there's plenty with tf2
kinda the same concept as learning py 2.x to learn python
haha, kind of
idk if it's to the same extreme tho
it teaches you the fundamentals tho'.. there is a page with the mappings from the old version to the new one. It's quite annoying to go back and forth.
ya again i'd suggest finding a different one
if it's just fundamentals it'd probably be a good idea to learn the up to date fundamentals
yeap, I will as soon as I finish this one >.> 100 more pages to go
there's a good tensorflow course on udemy
there might be more on coursera or edx idk i haven't looked
is it worth it, I'd rather go for the hands-on-ml2
I've heard good things about it
I'll look into it, thanks
no problem
casual 4.6 rating
that's pretty good
I might do that after the Ng course
or I could do the google crash course on machine learning
I'm curious. I've never taken these courses other than Ng. What do they teach?
which course?
Just in general. Like let's say the tf ones.
pretty sure they teach how to implement tensorflow like I saw part of the course and they were teaching how to code a stock trading bot
Do they teach it from scratch as in instead of doing a dense layer, they show you by dot product this and add this, you'll get this other thing?
idk about that since I haven't taken it yet/don't understand NNs
I've just heard good things
hahaha sorry to put you on the spot.
nah no problem
Ahh, please update me. I'm curious.
I wonder what knowledge I'm lacking since most of my knowledge comes from reading notebooks and playing with code.
from the first minute or so of the course it seems like it goes through basic + advanced stuff
If you don't mind me asking, what's basic and what's advanced?
well like the basics so the lin alg, calc, stats stuff like that
advanced would be specifics of models, where they should be used, etc, etc
Ahh okay okay, ty ty.
I should take an algorithm and data structure course too
Bc I never took that class I only took my intro CS class
@lapis sequoia that course actually came with a reason not to code along and I feel like the last point is relevant
has anyone used optuna here?
I've only watched 3b1b, the rest of what I learnt was through research so I can't confirm
@bitter harbor 3b1b really helped, I just watched the series. Thanks a lot!
how to fix this?
I wanna submit this exam and be done once and for all with exams for at least next 10 days
First i made a java chat app (sockets + web + android + desktop) then they told me to learn python for 10 days in order to do finals in probability and statistics and now i am burnt out, meanwhile i need to finish my paid projects so i can actually pay the uni...life is hard these days
that means data in the csv file has different types in the same column, either clean up the data or specify the dtype in read_csv()
also it's just a warning so you might not need to do anything if you aren't using any data from that column
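A small sketch of the "specify the dtype" route, with an inline CSV standing in for the real file; the column names are invented:

```python
import io

import pandas as pd

# Hypothetical file where the 'code' column mixes numbers and text.
csv_text = "code,price\n1,9.5\n2,3.0\nx,4.25\n"

# Telling pandas up front that 'code' is text avoids per-chunk type guessing.
df = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
```

With a real file the call would be `pd.read_csv("your_file.csv", dtype={...})`, using whichever column the warning names.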
when you define X and y when you're doing neural nets why do you have to put .values
I don't remember them doing that for lin reg, log reg, k nearest neighbors, k means clustering, etc.
does neural nets just need a .values?
and why is Portilla using a MinMaxScaler
he's just built different ig
@hollow sentinel
Regarding scalers, it's really context dependent.
So understanding it like a tree.
-> Do you need a scaler?
-> Standard or MinMax or some other scaler like log?
-> Standard if you need it on a z scale, minmax if you need it on a 0 to 1 scale.
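The two branches of that tree, written out by hand in NumPy as a sketch. The feature values are invented; scikit-learn's `MinMaxScaler` and `StandardScaler` apply the same formulas per column:

```python
import numpy as np

# One hypothetical feature column, e.g. loan amounts in thousands.
feature = np.array([10.0, 20.0, 30.0, 50.0])

# Min-max scaling: map the smallest value to 0 and the largest to 1.
min_max = (feature - feature.min()) / (feature.max() - feature.min())

# Standard scaling: subtract the mean, divide by the standard deviation.
standard = (feature - feature.mean()) / feature.std()
```

Scaling is commonly used before neural nets so that features measured on very different ranges contribute on a comparable scale during gradient-based training.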
Regarding the .values and nn, could you give more information on what you're referring to?
dang I can't find where he used .values in the video
it's not in his coding examples
F
what is an epoch lol
An epoch is a pass, forward + backward propagation.
over the training data
Over the network.
hahaha I think Portilla and I are probably referring to the same thing, looking at it differently.
You can configure the dataset such that it might never encounter certain values of the training data.
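A toy sketch of how epochs and batches relate, with no real model involved. The numbers and the helper name `iterate_minibatches` are made up:

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples` of at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


training_samples = list(range(10))  # stand-in for 10 training examples
num_epochs = 3
batches_seen = 0

for epoch in range(num_epochs):
    # One epoch = one full pass over the training data, batch by batch.
    for batch in iterate_minibatches(training_samples, batch_size=4):
        # A real loop would do the forward pass, loss, backward pass and update here.
        batches_seen += 1
```

With 10 samples and a batch size of 4 there are 3 batches per epoch, so 3 epochs means 9 weight updates in this setup.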
can you explain overfitting simply
Yea to understand overfitting and underfitting, you'll need to understand bias + variance.
but I'll use a more relatable example.
this is why fundamentals are important
Imagine yourself as an algorithm.
You've come to learn from the data that cookies taste like chicken.
Thus in the future, you'll predict cookies tasting like chicken.
However you, as a human, knows this isn't really true.
You have overfitted on the data where cookies taste like chicken.
ok cool
sorry @bitter harbor i fall asleep doing the math behind machine learning
I've been doing it tho
When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to bias and error due to variance. There is a tradeoff between a model's ability to minimize bias and variance. Understanding these two types of error ca...
I think that's one of the articles that's been linked quite often regarding bias and variance and how it relates to overfitting and underfitting.
alright I'll check it out
In context of ML, overfitting is when the model gets too complex while under is not enough complexity.
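A rough numerical illustration of the complexity point, assuming noisy data generated from a straight line and polynomial fits of degree 1 and 9 (all choices here are arbitrary, and NumPy may warn that the degree-9 fit is poorly conditioned):

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_line(x):
    """Samples from y = 2x plus Gaussian noise."""
    return 2 * x + rng.normal(scale=0.2, size=x.shape)


x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train, y_test = noisy_line(x_train), noisy_line(x_test)


def mse(degree, x, y):
    """Fit a polynomial on the training data, then measure error on (x, y)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


train_error = {d: mse(d, x_train, y_train) for d in (1, 9)}
test_error = {d: mse(d, x_test, y_test) for d in (1, 9)}
```

The degree-9 polynomial can pass through all 10 training points, so its training error is never worse than the line's. Comparing `test_error` for the two degrees shows whether that flexibility actually helped on unseen points.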
That's how you've been describing Portilla and Ng right?
yep
Portilla is more applicable while Ng is more theoretical.
Eh.
At least now you have applicable context when learning the theory.
Because unless you have a good grasp of the math foundation, I think learning theory is kinda rough.
But if you can supplement it with applicable experience, it might not be as bad.
it's pinned here that columbia's ML course is better than Ng's
Because now when Ng says this, you can think about how in your previous projects you might have encounter such a thing.
yep
I wouldn't get too sticky about the course quality and etc.
As long as they're able to teach it and you can follow along while doing your own research, I think it's a mission accomplished.
Because you want to spend time on the ML stuff, not on course ratings.
i just want to understand ML
My advice is play and experiment.
yeah I'll probably end up doing multiple courses
All the theory and books you read is just a piece of the puzzle.
machine learning books scare me
I like courses more
layer my neural nets like lasagna
what
nothing
what's the question
so i am simply trying to change the value of a row and column. pretty easy, right?
but heres whats happening
it just doesnt change
when i created a variable to copy that value over
and tried to change it
it successfully changed
so a little confused
if my issue is unclear, i can show a little code
symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]]
= (float(numHigherChanges/numDays))
that's what I am trying to do
I also tried this
series = symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]]
series[0] = float(numHigherChanges/numDays)
The latter worked for the series, but not for the symbolsDf when I tried
symbolsDf[symbolsDf[SYMBOL]==symbol][changeLbls[dateIndx][percentIndx]][0]
= (float(numHigherChanges/numDays))
I also read about some method called set_values, but apparently the interpreter says that series doesn't have a method like that
I am aware that the values are tuples
And I might want to consider replacing the entire tuple
It's just a weird concept to me
Some clarification, what's the latter you're referring to?
The line with the series?
the latter is the second block of code
the second approach to changing the values which was done in both the second and third blocks of code
What does your data look like?
sec
okay I assume
symbolsDf[SYMBOL]==symbol is a row mask
@velvet thorn yes
what is changeLbls[dateIndx][percentIndx]?
it's a list of labels
so i have int indexes to get the correct index
so it just comes out to be a specific column
ill simplify
symbolsDf.loc[symbolsDf[SYMBOL]==symbol][columnName]
then that should be df.loc[row_indexer, column_indexer]
not sure if I mentioned this before but I would suggest you use snake case names for your variables
ahh
and strongly suggest you have spaces around your operators
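A minimal sketch of the single `.loc[row_indexer, column_indexer]` assignment, using an invented frame and snake_case names in place of the originals:

```python
import pandas as pd

# Hypothetical stand-in for symbolsDf.
symbols_df = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "pct_higher": [0.0, 0.0, 0.0],
})

num_higher_changes, num_days = 3, 4

# One .loc call with both indexers writes into the original frame.
row_mask = symbols_df["symbol"] == "BBB"
symbols_df.loc[row_mask, "pct_higher"] = num_higher_changes / num_days
```

Chained indexing such as `df[mask][column] = value` assigns into a temporary copy, which would explain why the original frame never changed.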
Hi
hello
Hey guys, any quick way to get groups with their values in Pandas?
ie.
dataframe.groupby -> return a dictionary with key and values of their groups and respective values.
I was able to find
df.groupby.groups
But it ends up being in index form instead of their actual values.
haha as if I understand Pandas well enough to help
that might be helpful
idk
sns.boxplot(x="loan_status", y="loan_amnt",data=df)
anyone know why this is so slow
sike it's completely fine
Hey guys, any quick way to get groups with their values in Pandas?
ie.
dataframe.groupby -> return a dictionary with key and values of their groups and respective values.
@heady hatch why do you want that?
@velvet thorn I need to compare across groups to count intersections. Any suggestions on how to go about it?
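One possible approach, sketched with an invented frame: iterate over the grouped column to build a dict of sets, then intersect the sets directly. The column names `group` and `value` are placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 2, 2, 3, 1],
})

# Iterating a grouped column yields (group name, Series of values) pairs.
values_by_group = {
    name: set(values) for name, values in df.groupby("group")["value"]
}

# Values that appear in both groups.
shared = values_by_group["a"] & values_by_group["b"]
shared_count = len(shared)
```

`df.groupby("group")["value"].apply(list).to_dict()` gives lists instead of sets if duplicates or order matter.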