#data-science-and-ml
yes this should work
`from nba_api.stats.endpoints import playercareerstats
# Anthony Davis
career = playercareerstats.PlayerCareerStats(player_id='203076')
player_df = career.get_data_frames()[0]
player_df.loc[player_df.SEASON_ID == '2019-20', :]`
like this?
wow brilliant, it work!!
nice
yeah i saw the error haha
hurray 
i was working on similar thing when i saw your question
I think this was the easy part how can I get this row and other rows and join them?
so try to take that last line
this_year = player_df.loc[player_df.SEASON_ID == '2019-20',:]
type(this_year)
im guessing its also dataframe but what does that say
ML questions get posted in this channel once in a while, so here's mine:
When creating AIs for playing board games, it's common to take advantage of symmetry to reduce the number of possible states. How is that actually done in practice? I had to take advantage of symmetry in one case before(not ML, just a metaheuristic optimization task), but I achieved rather meager results. Is there some sort of hashing algorithm for a 2d array of values that is invariant under rotations/reflections?
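One common approach (not something confirmed in this channel) is to skip a special invariant hash and instead map every board to a canonical representative of its symmetry class, then hash that. A minimal sketch, assuming a square board stored as a NumPy array; `symmetries` and `canonical_key` are hypothetical helper names:

```python
import numpy as np

def symmetries(board):
    """Yield the 8 rotations/reflections (dihedral group D4) of a square 2-D array."""
    for k in range(4):
        rotated = np.rot90(board, k)
        yield rotated
        yield np.fliplr(rotated)

def canonical_key(board):
    """Return a hashable key shared by all 8 symmetric variants of the board."""
    board = np.asarray(board)
    # Use the lexicographically smallest flattened variant as the representative.
    return min(tuple(s.ravel().tolist()) for s in symmetries(board))

board = np.array([[1, 0, 0],
                  [0, 2, 0],
                  [0, 0, 0]])
seen = {canonical_key(board)}
# A rotated copy maps to the same key, so it is recognised as already seen.
print(canonical_key(np.rot90(board)) in seen)
```

This costs 8 array transforms per lookup, which is usually cheap next to a search or a network evaluation.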
@solemn hull what if i want to get 3 months of 3rd quarter using similar code ?
@solemn hull Im not sure I quite get it what you are trying to say
so you can build a function to parse each specific player/dataframe, then iterate through the players or days etc
I think I need to learn more Python, I'll try to understand
DF1 = DF.loc[DF['Month'] == '07',:]
DF1
i also want 08 and 09
```python
from nba_api.stats.endpoints import playercareerstats

# Anthony Davis
def get_player_current_year(player_id):
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    player_df = career.get_data_frames()[0]
    return player_df.loc[player_df.SEASON_ID == '2019-20', :]

player_results = []
for player_id in ['203076', ...]:
    player_results.append(get_player_current_year(player_id))
print(player_results)
```
So I can put different IDs at the same time and I would get the row I want?
so i think pandas has specific syntax for multiple conditionals.. no idea if this will work but
DF1 = DF.loc[DF['Month'] in ['07', '08', '09'],:]```
it will call the api for each player, get the year 2019-20 then build up a list
and at the end print the list.. there is probably a better way to do it though, im a pandas newb
and yeah carly, it will get only that row for each player
Hmm. Maybe DF.loc["07"<=DF['Month']<="09",:]? Not quite the same, mind.
@solemn hull oh sorry, my doubt was a separate thing
i think they are strings so comparison wont compare the digits
not related to Carly's
no worries
I'm asking in general: I want the rows for three values, 07, 08 and 09, which are the 3 months of the 3rd quarter
DF1 = DF.loc[DF['Month'] == '07',:]
DF1
if i give this it will only return the month of July
did you try the above DF['Month'] in ['07', '08', '09']
yes is an error
ah xD
what you definitely can do is
DF1 = DF.loc[(DF['Month'] == '07') | (DF['Month'] == '08') | (DF['Month'] == '09'),:]
shame the other way doesn't work, though
there's probably a way to make it work.
freakin pandas, eating up all the bamboo and making strange syntaxes 🐼
oh yes it worked. what did | do here? @tidal bough
| is or
so we cant simply pass or ?
try and see
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
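The error comes from Python's `in`, `or` and chained comparisons asking for a single True/False, while a Series holds one value per row. A minimal sketch of the element-wise alternatives, using a made-up stand-in for `DF` (the `Sales` column and its values are placeholders, not data from the chat):

```python
import pandas as pd

# Hypothetical stand-in for DF, with months stored as zero-padded strings
DF = pd.DataFrame({"Month": ["06", "07", "08", "09", "10"],
                   "Sales": [10, 20, 30, 40, 50]})

# Series.isin tests each element for membership and returns a boolean mask
q3 = DF.loc[DF["Month"].isin(["07", "08", "09"]), :]

# Series.between is a working form of the chained "07" <= x <= "09" idea;
# zero-padded strings of equal length sort in the same order as the numbers
q3_between = DF.loc[DF["Month"].between("07", "09"), :]

print(q3)
```

Both masks select the same rows here; `isin` is the more general choice when the wanted values are not a contiguous range.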
I see, thanks. I saw that guy's question and it was similar to what I was doing, otherwise im a noob myself lol
heh, i think im gonna go learn the basics
@solemn hull It worked but the data got a weird look, is there a way to get the data and make it look 'pretty'?
for that, i think you need to use the pandas method of joining data rows.. someone said merge before.
Yeah, I will take a look but you definitely helped a lot!
or instead of print, you could do
for result in player_results:
print(result)
that way its not printing a list but each specific item individually...
awsum glad u got it working
u can convert it to dataframe if u want tabular form
pd.DataFrame
Another question, can we plot trend graph for 2 separate values in same graph ?
I want to give format to the data to display it nicer ultimately
@muted oyster | is bitwise OR, which for Series is overloaded to act elementwise.
or isn't really the same thing
Something like this but in a noob way, since thats from the official page
@tidal bough is it only for pandas or used in other libraries too ?
try
```python
import pandas
from nba_api.stats.endpoints import playercareerstats

# Anthony Davis
def get_player_current_year(player_id):
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    player_df = career.get_data_frames()[0]
    return player_df.loc[player_df.SEASON_ID == '2019-20', :]

player_results = pandas.DataFrame()
for player_id in ['203076', ...]:
    player_results.append(get_player_current_year(player_id))
# if you're using jupyter you can call display()
display(player_results)
```
@muted oyster Well, having | work elementwise on Series is just a Pandas thing.
the operator itself is of course used often when working with bits.
ok I understood. thx : -)
@solemn hull I got no result from that
😱
lol, dang.. ok, i guess go back to list []
```python
import pandas
from nba_api.stats.endpoints import playercareerstats

# Show full frames instead of abbreviating with '...'
pandas.set_option('display.max_rows', None)
pandas.set_option('display.max_columns', None)
pandas.set_option('display.width', None)
pandas.set_option('display.max_colwidth', None)

# Anthony Davis
def get_player_current_year(player_id):
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    player_df = career.get_data_frames()[0]
    return player_df.loc[player_df.SEASON_ID == '2019-20', :]

player_results = []
for player_id in ['203076', ...]:
    player_results.append(get_player_current_year(player_id))
for result in player_results:
    print(result)
```
@dire pollen
thats supposed to remove the abbreviating '...' stuff
Oh I see, well anyways thank you for your help I will try to take a look about the other stuff!
np 
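A likely reason the `pandas.DataFrame()` attempt printed nothing: `DataFrame.append` never modified the frame in place (it returned a new one), and it was removed entirely in pandas 2.0. A minimal sketch of the usual pattern, with made-up placeholder rows standing in for what `get_player_current_year` returns (the second player ID and the `PTS` values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the one-row frames returned per player
rows = [
    pd.DataFrame({"PLAYER_ID": [203076], "SEASON_ID": ["2019-20"], "PTS": [100]}),
    pd.DataFrame({"PLAYER_ID": [999999], "SEASON_ID": ["2019-20"], "PTS": [200]}),
]

# Collect the frames in a list first, then combine them once at the end
player_results = pd.concat(rows, ignore_index=True)
print(player_results)
```

The combined frame prints as one table, which also takes care of the "pretty" output question.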
Terminology question : I came across the term "interval" for a column data type. (for context, this terminology is used in sas documentation). Does interval data refer to continuous data?
Could be referring to timestamp data
usually interval refers to the interval between two given dates
or whatever time periods are required
pandas has a "time period" data type
I have a dataframe like this which i want to convert into:
this
like a state wise counts of closed and open and its total at the end
@muted oyster groupby count unstack
@velvet thorn can u look over #help-apple
DF2.groupby(['State', 'Final_Status' == 'Open' | 'Final_Status' == 'Closed']).size().unstack(fill_value=0)
do u mean like this ? but its giving error
I tried something like this:
but is giving total closed and open values for all states and not individually
ok I figured out to get closed and open in rows and sort of this code worked:
DF3 = DF2.groupby('State')['Final_Status'].value_counts()
DF3 = pd.DataFrame(DF3)
DF3
can i get the values in columns ?
like closed and open in columns instead of rows
ok figured it out lol
thanks buddy @velvet thorn
groupby just 'State' actually, but yeah
I added .unstack().fillna(0) so it worked
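Put together, the groupby / count / unstack idea looks roughly like this. It is a sketch on a made-up stand-in for `DF2` (the states and statuses are placeholders), with the total column the original question asked for:

```python
import pandas as pd

# Hypothetical stand-in for DF2
DF2 = pd.DataFrame({
    "State": ["NY", "NY", "CA", "CA", "CA"],
    "Final_Status": ["Open", "Closed", "Closed", "Closed", "Open"],
})

# Count statuses per state, then pivot the status level into columns
counts = (DF2.groupby("State")["Final_Status"]
             .value_counts()
             .unstack(fill_value=0))
counts["Total"] = counts.sum(axis=1)
print(counts)

# pd.crosstab builds the same table, including totals, in one call
totals = pd.crosstab(DF2["State"], DF2["Final_Status"],
                     margins=True, margins_name="Total")
print(totals)
```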
i'm pretty inexperienced in ds/ml coming from an econ background. i want to fit a supervised learning model to associate bodies of text with items from a list of shorter texts. in the training set i know which large texts should be associated with the short texts and in the out-of-sample dataset i have groupings in each list
does that make sense
my data look like this https://gist.github.com/weverett96/31b30a1cb201bf9fe357d0ed5c3ec860
Fund names and strategies from Eaton Vance filing. - dataexample.py
where the matched pairs are 'name' and 'strategy'
then in the unmatched set i want to associate strategies with the 'name' fields
is this tutorial outdated?
https://www.youtube.com/watch?v=wypVcNIH6D4
is it?
it's from 2019, so hardly
df.sparse.to_dense() is returning sparse not found? Am I missing something?
hmm
https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html
Sparse accessor
New in version 0.24.0.
Pandas provides a .sparse accessor, similar to .str for string data, .cat for categorical data, and .dt for datetime-like data. This namespace provides attributes and methods that are specific to sparse data.
Maybe you have an old version?
check pandas.__version__
Nope, not renaming my dataset anywhere
It is actually the output value of a multilabel classification which is in sparse format and I need to convert it into a dense matrix for the metrics
Weird thing is that it worked before but when I got back to it and tried running it again, it's returning this error
Tried it, no luck! The function works alone in a separate instance though
This is so weird
Hi, this is probably an annoying question, but I have to ask: Does anyone recommend any book to learn Machine Learning? I've looked online but there's just so much stuff!! It's hard to distinguish between hyped stuff, oversimplified things and the actually useful things. So I was looking for something to hook into that would get me through things. I come from a physics background and I'm comfortable with Python (if it helps in some way)
@keen root I personally:
- Did this amazing coursera course:https://www.coursera.org/learn/machine-learning as an overview of the field.
- Am now doing the Practical Reinforcement Learning course from this specialization (just because that's what I'm interested in): https://www.coursera.org/specializations/aml
For reading material, I found useful the materials the AI discord suggests:
MACHINE LEARNING
Before you start specialising in any particular field, it's important to learn the core theory of Machine Learning for a broad exposure to ideas and techniques that you can likely apply to any field.
Core
• Bishop - Pattern Recognition and Machine Learning
  - Also check out Model-Based Machine Learning by the same author
• Tibshirani, Friedman, Hastie - The Elements of Statistical Learning
• ColumbiaX on edX - Machine Learning
The first course is free. The ones from the Advanced specialization aren't, but coursera's audit mode allows free access to basically everything from the course except quizzes for some reason (programming assignments are available).
@tidal bough Thank you, that's amazing, I'll follow the first course, it seems to be quite complete. However, it is based on MATLAB/Octave; will it be crucial for understanding the contents if I've never worked with them?
ooh, I finished that specialization on coursera, not all courses are equally good, but overall I learned a lot
Also, did you find it important to follow some book at the same time?
nah, if you want to know something google is your friend.
but I do recommend to supplement the material in courses by looking things up whenever you're curious or confused about something
@keen root I've never worked with Octave before that course. I didn't find it hard to learn - it's very nice in its native support of matrix and vector calculations.
I didn't read any ML books until my Practical RL course. The first course provides its own materials, which are quite enough.
Got it, thank you
The Elements of Statistical Learning is absolutely fantastic
@keen root there are lot of books from OReilly
yes i started with Head First Python brain friendly
Also, I recommend that you guys check out StatQuest with Josh Starmer on youtube
He covers basic statistics and ML. The channel is amazing for beginners and experts alike.
sure, anytime! I'm new to everything and everything helps : )
and also most of the O'Reilly books are available in pdfs just a google search would do
@keen root
That's great, thank you :)
Can I say joint distribution instead of multivariate distribution, or is it better to say multivariate distribution for multi-dimensional distributions and joint distribution for the joint distribution of different random variables?
if i pass this
DF3.set_index('Date').plot();
I get a plot of very small size, how can i enlarge it ?
i guess i should ask in help section
got it, but had to change it to something much messier
Anyone here with experience with LSTM layers in keras?
Im not sure how to interpret the shapes, input and output
@pearl crystal either is fine, multivariate distribution implies that the random variables covary - so it's really the same thing as explicitly saying that the distributions are joint. a multivariate distribution with zero covariance wouldn't really be multivariate, it would just be a collection of univariate distributions
you can use .plot(figsize=(width, height)) where width and height are in inches
or you can just use matplotlib and build the graph yourself
which will probably end up looking better
hey I'm trying to make a classifier to identify if a burger is burger king or mcdonalds. Can some people help me build a dataset? 1 is the worst, and 100 is the best
- Can you give me a rating on a scale of 1-100 on how good a burger king burger tastes?
- Can you give me a rating on a scale of 1-100 on how healthy a burger king burger is?
- Can you give me a rating on a scale of 1-100 on how good a mcdonalds burger tastes?
- Can you give me a rating on a scale of 1-100 on how healthy a mcdonalds burger is?
Anyone willing to take a few seconds and think back to when they've had a burger would be great!
@plucky cairn yes I plot it using matplotlib. It's messy bcoz i wanted 12 lines on graph and for every line i had to copy that code 12 times and passing every column in it.
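For the 12-lines case, copying the plotting code per column should not be necessary: `DataFrame.plot` draws one line per numeric column. A minimal sketch on a made-up stand-in for `DF3` (column names and values are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Hypothetical stand-in for DF3: a Date column plus 12 numeric columns
dates = pd.date_range("2020-01-01", periods=30, freq="D")
data = np.arange(30 * 12).reshape(30, 12)
DF3 = pd.DataFrame(data, columns=[f"col{i}" for i in range(12)])
DF3.insert(0, "Date", dates)

# One call draws a line per column; figsize is (width, height) in inches
ax = DF3.set_index("Date").plot(figsize=(12, 6))
ax.set_ylabel("value")
```

The returned `ax` is a regular matplotlib Axes, so labels, legends and limits can still be adjusted afterwards.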
Interesting. Of all the scipy solvers, only DOP853, supposedly a very precise RK solver, has any problems with this equation.
this is dy/dt = 1/y - 1/(1-y) + 10*abs(y-0.5) + np.cos(t/10), from y(0)=0.6
anyone want to rate burger king and mcdonalds burgers?
@tidal bough this looks interesting. What are the legends about? As you mentioned, is DOP853 the only one? Or also LSODA?
And what are these things actually ?
I only really know how the RK ones work
Close-up of the diverging interval
RK23 is actually oscillating a bit too
RK45 oscillates less
the rest are nigh-perfect
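The plots themselves are not in the log, but a comparison like this can be reproduced roughly as follows. This is a sketch: the time span and the default tolerances are assumptions, since the original settings were not posted.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # dy/dt = 1/y - 1/(1-y) + 10*|y - 0.5| + cos(t/10)
    return 1 / y - 1 / (1 - y) + 10 * np.abs(y - 0.5) + np.cos(t / 10)

methods = ["RK23", "RK45", "DOP853", "Radau", "BDF", "LSODA"]
t_span = (0.0, 100.0)  # assumed interval; the original one was not stated

# Solve the same initial value problem, y(0) = 0.6, with every built-in method
results = {m: solve_ivp(rhs, t_span, [0.6], method=m) for m in methods}

for name, sol in results.items():
    print(f"{name:7s} steps={sol.t.size:5d} y(end)={sol.y[0, -1]:.6f} ok={sol.success}")
```

Plotting `sol.t` against `sol.y[0]` for each entry gives the kind of overlay discussed above; tightening `rtol` and `atol` is the usual first thing to try when one method drifts.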
hey what is a good way to learn python for machine learning if I have absolutely no experience with coding at all
pls ping
someone uses nginx here?
@polar berry https://www.coursera.org/learn/machine-learning-with-python, or the Advanced ML specialization on coursera.
But first, you'd have to learn Python in general. See !resources.
anyone know any good datasets for training very basic classifiers?
What kind of classifiers?
Like, any ones? Check out the Titanic dataset, it's a classic.
oh god, I found a really unforgiving equation
dy/dt= y**2 - 50/(1-y)**2 + 50*np.cos(t/5) - 10*np.sin(t/10)
@tidal bough where is resources?
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
@tidal bough which one is the best?
@polar berry do you really expect them to be able to tell
@velvet thorn idk bro
gonna use codecademy
https://www.coursera.org/learn/python-for-applied-data-science-ai
https://www.coursera.org/learn/machine-learning-with-python
which of these courses should i do after
IBM's always a good choice
Thought this is kinda relevant, since it's using numpy with large amounts of data;
Just wondering, I've got two arrays, both of the same shape. They look like this;
```python
a = [[1,2],[6,4,2]]
b = [[3,4],[5,3,4]]
```
Both a and b are numpy arrays, and I was wondering how I'd go about adding them together, to get a result like so:
```python
[[4,6],[11,7,6]]```
Would this be possible?
```python
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(58, 78, 3)))
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(196, activation='softmax'))
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
Isn't my shape supposed to get smaller per layer in a CNN? If so, then why do I get this error:
```python
OOM when allocating tensor with shape[479232,1024] and type float on
```
What's with the input shape of `479232`?
@desert parcel did you read what I said above
@velvet thorn I just did
It does return those
when you said you should return loss.item() and the other stuff
@bitter harbor they're both IBM?
You're asking about courses that I bet not many people here have looked at; look at the reviews, as it'll probably mostly be up to your/the general opinion
@desert parcel huh
no, it's literally returning a string
don't you see the .format call?
@fervent bridge you cut out half the error...
anyway, a CNN does decrease the size of the input (channels last) along the 2nd and 3rd dimensions (assuming you don't have padding), while increasing the size of the 4th dimension (assuming the number of filters increases)
but then you flatten.
(which is a prerequisite for passing to a dense layer if you want to work with all the dimensions, but)
say you have an input image of size 640x480x3
after going through the first three convolutional layers (3x3, no padding) it would end up being 634x474x128 = 38466048.
in a vanilla CNN the operation that decreases the size of the image is not really convolution, but pooling.
look up MaxPooling2D
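The `479232` in the error can be reproduced by hand, which also shows how much pooling would help. A small sketch in plain Python; `conv_out` is a hypothetical helper, and the 2x2 pooling layers are an illustration rather than part of the posted model:

```python
def conv_out(size, kernel=3, stride=1):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

h, w = 58, 78                      # input_shape=(58, 78, 3) from the posted model
for filters in (32, 64, 128):
    h, w = conv_out(h), conv_out(w)
    print(f"Conv2D({filters}): {h} x {w} x {filters}")

flat = h * w * 128                 # 52 * 72 * 128
print("Flatten:", flat)
print("Dense(1024) weights:", flat * 1024)

# Same stack with a hypothetical 2x2 max-pool after each convolution
h, w = 58, 78
for filters in (32, 64, 128):
    h, w = conv_out(h), conv_out(w)
    h, w = conv_out(h, kernel=2, stride=2), conv_out(w, kernel=2, stride=2)
pooled_flat = h * w * 128
print("Flatten with pooling:", pooled_flat)
```

So `479232` is the flattened feature count going into `Dense(1024)`, and that one weight matrix is what runs out of memory.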
@thin solstice how can those be arrays?
they're not the right shape
unless you're saying they're object arrays (print(a.dtype)), which is not what numpy should be used for, in general
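Since the rows have different lengths, one option is to keep the outer container a plain list and add row by row. A minimal sketch using the values from the question:

```python
import numpy as np

a = [[1, 2], [6, 4, 2]]
b = [[3, 4], [5, 3, 4]]

# Each pair of rows has matching length, so each pair adds as a normal array
result = [np.add(row_a, row_b) for row_a, row_b in zip(a, b)]
print([r.tolist() for r in result])   # [[4, 6], [11, 7, 6]]
```

If the rows can be padded to a common length instead, a regular 2-D array and a plain `a + b` would be faster on large data.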
@velvet thorn oh.. Now I See
yes.
which is why I said you really would benefit from some more work on your fundamentals
don't try to dive into DS (and DL) so quickly...
not really sure if it has support for type hints
I don't have an IDE installed on my machine
but you can try mypy
but anyway, I'm not going to argue with you about your learning path...?
yeah sounds fair lol
the last thing I'll say is that that would have been a trivial error to debug
I'm good in a sense that
not that I mind solving this kind of problem since I love procrastinating
I know what to do ut sometimes
but*
but sometimes I just don't look too into detail like return and print
I just assumed they were interchangeable well... until now of course
that...is a very scary statement
how long have you spent using Python
about 4 months
hm.
but I don't really know anymore lol
well, I hope it works out for you
haven't been keeping track
It worked
so all my issues with my code
was because I was returning a string
ohhhh no wonder I couldn't get any where
it's like
trivial way to prevent such errors
you're missing a function name
yeah I know lol
I use this in my functions
or whatever they are used in
I don't need to keep track of it too much
How to join 2 df's using pandas on the basis of a column but the column data matches partially is that possible
hello work (df a) ------- hello world data.
not really.
not without a fair bit of processing
that's quite a high-level problem
@velvet thorn in that case, oops
can anybody help
You may be able to incorporate something like Levenshtein distance as a conditional check whether to join a specific column, but I think it would be kinda awkward depending on the structure of the two dataframes.
let me give you an example to be more specific
@desert parcel I'm honestly impressed that you got away with that for so long
you cannot just use Levenshtein distance or some other difference metric
or rather, not alone
I would suggest some form of clustering
then join on the cluster IDs
which is why I said "not without a fair bit of processing"
In one df there is a column named model with the value Galaxy s2
in the other df there is a column named model with the value Galaxy s2 a
I want to match these 2
yes, we understand the problem.
it's not a simple problem
it is not difficult to find the distance between two rows given a specific column.
@velvet thorn you are taking it in a complex manner
wait, let me think a bit about it
just use some form of string metric
@graceful ice do you understand why I say it is not simple...?
@bitter harbor lol so am I
I learnt some selenium and other stuff
would these be three different columns
no, "hello work" is the value in a column in one dataframe, and "hello world" is the value in an identically named column in the other dataframe
it was a pretty poorly formatted example TBH
yikes ya I thought it was comparing 'words' not the phrase
would it work if you checked if the indices of the characters in the column were equal on both objects+/had the same spacing (ei (data) > data - different indices but the letters are the same spacing)
and the thing is
(I'm not so sure about this)
because their use of terminology isn't very clear
but they might want to join the rows
as opposed to match them
I did it
and the thing is...where do you stop?
because with a high enough threshold any number of strings can be matched
df['PartialModel'] =df['Model'].apply(lambda x: difflib.get_close_matches(x, invoiceDf['Model'])[0] if len(difflib.get_close_matches(x, invoiceDf['Model']))>0 else "Unknown")
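The one-liner above calls `difflib.get_close_matches` twice per row. The same idea can be written as a small helper, sketched here with made-up model names (only "Galaxy s2" and "Galaxy s2 a" come from the chat; `closest` and the 0.6 cutoff, which is difflib's default, are illustrative). As noted in the discussion, the cutoff decides where matching stops, so results need checking by eye.

```python
import difflib

models = ["Galaxy s2", "iPhone 6", "Pixel 3"]                     # hypothetical left-hand values
invoice_models = ["Galaxy s2 a", "iPhone 6 plus", "Nokia 3310"]   # hypothetical right-hand values

def closest(name, candidates, cutoff=0.6):
    """Return the best fuzzy match above the cutoff, or "Unknown" if there is none."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else "Unknown"

# Map each left-hand value to its best right-hand candidate
mapping = {m: closest(m, invoice_models) for m in models}
print(mapping)
```

With pandas, the helper can be applied as `df['Model'].apply(closest, candidates=list(invoiceDf['Model']))` and the resulting column used as the join key.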
read this
good for you then
@velvet thorn thanks for your help
@graceful ice yw
Hello all! I had a question involving route optimization and distance matrices.
It's similar to the traveling salesman problem but if you had multiple salesmen
@lofty scarab So no overlapping of the salesmen?
http://matrixmultiplication.xyz/
what's numpy's name for this function?
An interactive matrix multiplication calculator for educational purposes
you can use the @ operator
thanks! :)
or np.matmul
wait... it seems to only return an array with one value?..
lemme show you what I've got...
>>> a = np.array([0.0019, -0.01])
>>> ht = np.array([[-0.09],[0.04]])
>>> a@ht
array([-0.000571])
# shouldn't this array be shaped as (2,2)?
@pale thunder ^
since here on this website, matrix multiplication of two arrays returns an array shaped as 2,2
& this is what I get when I try that same thing in python:
>>> A = np.array([1,2])
>>> B = np.array([3,4])
>>> A@B
11
>>>
try np.matmul
*!!!
In [17]: A = np.array([[1,2]])
In [18]: B = np.array([[3],[4]])
In [19]: A @ B
Out[19]: array([[11]])
In [20]: B @ A
Out[20]:
array([[3, 6],
[4, 8]])
this is why linear algebra is hard
yeah haha
you need to have a row and a column, so they need to be 2D.
I'm only in year 10 and I'm struggling to wrap my head around matrix multiplication :P
good luck!
haven't done anything like this in school lol, thanks! :)
id suggest watching 3b1b's series on it
another useful thing is transpose, which you do as A.T
@thin solstice try with B = np.array([[3],[4]])
oh already said. sorry.
also, np.array([3,4]).T
that transposes ("turns the other way") a 2D array; on a 1D array like this one .T changes nothing, so make it 2D first (e.g. with .reshape(-1, 1))
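The shape rules from this exchange in one place, as a short sketch using the same numbers:

```python
import numpy as np

A = np.array([1, 2])
B = np.array([3, 4])

print(A @ B)                 # 1-D @ 1-D is the dot product: 11
print(A.T.shape)             # (2,)  .T leaves a 1-D array unchanged

col = A.reshape(-1, 1)       # shape (2, 1): a column
row = B.reshape(1, -1)       # shape (1, 2): a row
print(col @ row)             # (2, 1) @ (1, 2) gives a 2x2 matrix
print(np.outer(A, B))        # same 2x2 result without reshaping
```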
Someone is using plotly? I wonder if there is a way to not use browser to plot
Matplotlib has always been the go-to plotting library for all my use cases.
Heck, even pandas has a plotting method.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
Hey ppl can anyone tell me why everyone uses Jupyter Notebook for data science? I use Spyder but if that many people uses Jupyter it has to be a reason right?
@covert rover Main advantage for me is the cell structure.
It really is convenient. It's a nice balance between running entire programs, and running single lines of code (REPL).
So you can have a cell with imports, one that calculates stuff, one that plots stuff...
And if you want to plot with different settings, you just change the plotting cell and rerun that cell without recalculating the data.
@tidal bough that's convincing
thanks!
How would I go about testing image recognition code that I wrote using OpenCV?
Are there any libraries/tools that can help with this?
I haven't found any good resources online
The only thing I can think of is just to have a folder with a bunch of test images and some json file with text or numbers or whatever that I expect to find in each of them. Is there a better way?
Data pipeline
anyone knows how to export a pandas dataframe to csv? Im reading the doc but I got this 'list' object has no attribute 'to_csv'
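The error message suggests the object is a Python list rather than a DataFrame. Assuming it is a list of DataFrames (a guess, since the code was not posted), a minimal sketch is to combine them first and then export; the column names here are placeholders:

```python
import io
import pandas as pd

# Hypothetical list of DataFrames, e.g. frames collected in a loop
frames = [pd.DataFrame({"id": [1], "pts": [10]}),
          pd.DataFrame({"id": [2], "pts": [20]})]

# A list has no to_csv; combine into one DataFrame first
combined = pd.concat(frames, ignore_index=True)

buffer = io.StringIO()              # pass a filename such as "out.csv" to write to disk
combined.to_csv(buffer, index=False)
print(buffer.getvalue())
```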
@deft harbor could you elaborate?
Decision tree prototype. I mean, it's functioning and predicting; the plotting is the prototype part
that's not a tree, it has cycles
nooo
I can't draw arrows, I'm learning plotly :d
its all one direction starting from want
In this video we continue on the topic of Lipschitz continuity by presenting a paper which proposes a projection method to enforce it! If you enjoy this video consider watching others which I have on the topic! I would love to have discussion here or on the comment section, the goal of this youtube channel is to create knowledge and interesting discussions in this area of ML.
Video: https://www.youtube.com/watch?v=9kxhEdiTwek
Paper: https://arxiv.org/abs/1804.04368
Abstract: We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant---for multiple p-norms---of a feed forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.
@raven mulch has this technique been adopted at all? it's interesting but i haven't heard of it before
if i have a dataset with a lot of missing values and i want to calculate (cramers) correlation, is it important to impute the missing values first?
you need to do something with them. either drop them or impute them @lapis sequoia
Similar techniques have had great success with GANs
imputation is kind of a can of worms but maybe you can get away with mean/median/mode imputation
ty
And experimental section shows very promising results with feed forward nets and conv nets
looks like you're interested in regularization, i see you have a video on another "obscure" technique
My main area of interest is ML security
That's what I do research in at my uni
But I'm interested in this stuff too yeah
Which is quite related
i suspect regularization would be an important topic in that area
Yep
very interesting
Hi.
Recently I've started exploring graph-like data (complete beginner). Does anyone have a resource recommendation for modelling 'labeled property graph' data? I want to learn how to properly represent such data in python.
Nowadays, should we use criterion like AIC to compare models?
AIC= 2k-2ln(L)
We can compare models based on the accuracy in test data and utilize cross validation techniques. So, why do we need these absurd criterion?
@pearl crystal the criteria aren't absurd. they are meant for cases where you don't necessarily have enough data, or good enough data, to use cross validation or a train/test split
also they use different goodness of fit criteria, in this case the likelihood of the model
that said, there are some nice asymptotic results relating model fit criteria like AIC DIC and WAIC with LOOCV
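To make the formula concrete, here is a small sketch that computes AIC = 2k - 2ln(L) for two polynomial fits under a Gaussian error model. The data are synthetic and the helper names are hypothetical; the lower AIC is the preferred model.

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def gaussian_log_likelihood(residuals):
    """Maximised log-likelihood of i.i.d. Gaussian errors with MLE variance RSS/n."""
    n = residuals.size
    sigma2 = np.sum(residuals ** 2) / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # synthetic linear data

for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    k = degree + 2            # polynomial coefficients plus the noise variance
    print(degree, aic(gaussian_log_likelihood(residuals), k))
```

No held-out data is needed here, which is the situation described above: the 2k term penalises extra parameters in place of a test set.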
This video is part of a lecture course which closely follows the material covered in the book, "A Student's Guide to Bayesian Statistics", published by Sage, which is available to order on Amazon here: https://www.amazon.co.uk/Students-Guide-Bayesian-Statistics/dp/1473916364
...
in a lot of today's machine learning problems, you don't usually need these criteria. and not all of them are actually good criteria. but to call them "absurd" is imo ignorant of their intended purpose
@desert oar
Ben Lambert is a great and expert data scientist. I have seen some of his videos. They were perfect. thanks
indeed
one place you still see AIC used is in time series modeling
although its not necessarily ideal there either
but in time series work it's often much harder to cross-validate or otherwise hold out test data
I do not know why his videos in youtube do not have enough views
he's a pretty well respected researcher, so he probably just doesn't spend effort promoting his work
i agree i really like his content
anyone here use eta-squared before?
matplotlib's imshow() on a 3-d array treats the third axis as RGB, right?
As 10 seconds of googling show, yes:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html
The image data. Supported array shapes are:
(M, N): an image with scalar data. The values are mapped to colors using normalization and a colormap. See parameters norm, cmap, vmin, vmax.
(M, N, 3): an image with RGB values (0-1 float or 0-255 int).
(M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
if shape[2] is more than 4, imshow raises an error (invalid shape for image data) rather than ignoring the extra channels.
@tidal bough
I am having trouble understanding this: https://www.statsmodels.org/stable/generated/statsmodels.robust.robust_linear_model.RLM.html#statsmodels.robust.robust_linear_model.RLM
exog : array_like
A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant.
I googled nobs array and got nothing.
Here is an example of this array being constructed...
nsample = 50
sig = 0.5
x = np.linspace(0, 20, nsample)
X = np.column_stack((x, np.sin(x), (x-5)**2, np.ones(nsample)))
beta = [0.5, 0.5, -0.02, 5.]
y_true = np.dot(X, beta)
y = y_true + sig * np.random.normal(size=nsample)
X is exog the nobs array
that code comes from this example:
https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html
To restate my request for help: What is nobs? how are the 4 elements in each array element for nobs used? Any suggestions on something I could read to inform myself?
nobs is the number of observations as per the text you posted
and I'm unsure what beta is in the snippet as it is unused
in the code you posted, nobs = 50, k = 4
the k comes from the 4 elements in the argument to np.column_stack
oh, ignore beta. it is used to calculate y_true and I accidentally left out the line of code that shows how it was used
added that line back
what is k?
number of regressors? What do I need to read to better understand that?
also looking at this https://www.statsmodels.org/stable/endog_exog.html it appears exog/endog are the generic terms it uses for x/y or input/output
regressors are basically what you would call features
my remaining confusion is about the four k elements... 1: x, 2: sin(x), 3: (x-5)^2, 4: 1
hey hey hey... I wouldn't call them anything. I don't even understand what they are
what do you mean by "you would call features"
the independent variables
the "things" that make up the observations
like you had a pandas dataset, you'd have some column "output", and 4 other columns that would correspond to these
these would be used to predict the output
i'm not exactly a stats major so pardon my lack of proper terminology
Ok... so how do those 4 features apply to the plot that was generated?
x is raw data. that part is easy.
i'm guessing the OLS curve is the one generated from the model ?
yes
if so, then i'm guessing it'd be something like OLS would try to fit y = ax + b sin(x) + c (x-5)^2 + d
That is what I was worried about, because that means that OLS was given a ton of data about what the plot should look like. So, what work is OLS actually doing if most of the curve is already defined?
the model predicted the a, b, c, d
Interesting. OK! That helps a ton!
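A small sketch of what was just described, using plain NumPy least squares instead of statsmodels (an assumption made to keep the example dependency-light). Each column of X is one regressor, and OLS only estimates the weights a, b, c, d; the shapes of the basis functions are supplied by whoever builds X.

```python
import numpy as np

rng = np.random.default_rng(0)
nsample = 50
x = np.linspace(0, 20, nsample)

# Design matrix: one row per observation (nobs = 50), one column per regressor (k = 4).
X = np.column_stack((x, np.sin(x), (x - 5) ** 2, np.ones(nsample)))
beta_true = np.array([0.5, 0.5, -0.02, 5.0])
y = X @ beta_true + 0.5 * rng.normal(size=nsample)

# Ordinary least squares: the a, b, c, d that minimize ||X @ beta - y||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat  # the fitted curve: a*x + b*sin(x) + c*(x-5)^2 + d
```

Swapping the columns of X for x, x**2, x**3 would turn the same call into a polynomial fit.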
do you have the code that gets the result of the model, and plot it ?
Often times, a person's confusion is more about their lack of ability to view the problem from the right perspective.
It is completely copy and paste from the example I linked above.
ah yeah i see it now
Except I naively swapped the data out with a Google option chain volatility smile expecting to get a nice curve fit without changing much in the code. Finally figured out that it wasn't working because I wasn't defining the k parameters.
But, I think you gave me enough of a hint... I have an idea of what I need to do now.
this part shows the a/b/c/d
and if we do graph it, we can see it matches the graph from above
does the number of k parameters define the number of orders of a polynomial equation that is used?
yes
cool
if you wanted to use a polynomial that is
so, it wouldn't be y = ax + b sin(x) + c (x-5)^2 + d it would be y = ax^3 + b sin(x)^2 + c (x-5)^2 + d?
or not.
it is not how many orders then, it just defines an equation.
you could make it a polynomial or not, your choice
no this model accepts anything apparently, you'd have to replace sin(x) , (x - 5)**2 etc with actual x ** 2, x ** 3 etc
@modest rune exactly
cool. I think I get it. Thanks!
@odd yoke Thanks! I was able to make progress! Have more to learn now, but I was able to get a decent curve fit.
It was a straight line before, now it is curvy and fitty! all in one
nice
okey, I've got a question about something...
I'm trying to write a neural network library, and I've got something so far. it works at predicting, it's got weights & biases, a mutate function, etc. it's fully functional if you use a genetic algorithm to train it, but personally I'd like to incorporate backpropagation, however I'm having some trouble
Hey @thin solstice!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
there's my code, and it seems to be very strange during the training process;
```python
if __name__ == '__main__':
    n = Network([2, 3, 1])
    tests = [
        [[1, 0], [1]],
        [[0, 1], [1]],
        [[1, 1], [0]],
        [[0, 0], [0]]
    ]
    for i in range(2500):
        test = random.choice(tests)
        print('\n')
        print(test)
        print(n.feedforward(test[0]))
        n.train(test[0], test[1])
        print(n.feedforward(test[0]))
    for test in tests:
        print(test, n.feedforward(test[0]))
```
I'm attempting to teach it the XOR problem, but I'll send a sample of what happens when it is run..
this is during training;
```
[[1, 0], [1]]
[0.43385151]
[0.44138086]
[[1, 1], [0]]
[0.44142151]
[0.43554815]
[[0, 0], [0]]
[0.43553338]
[0.42969438]
```
the first line is the test, second line is the network's guess, and the third line is (hopefully) the improved network's guess
and as you can tell, it is improving, but only per question
by the end of the 2,500 training examples, it seems like it hasn't learnt a thing, apart from making all the answers equal for some odd reason
```
[[1, 0], [1]] [0.45336777]
[[0, 1], [1]] [0.45343231]
[[1, 1], [0]] [0.45340774]
[[0, 0], [0]] [0.45339234]
```
the first list is the test, and the second list is the network's guesses
as you can see, they seem to converge to 0.45
I've been trying to follow along with this, but clearly I messed something up somewhere along the way, but I can't figure out what I've done wrong..
https://www.youtube.com/watch?v=tlqinMNM4xs&list=PLRqwX-V7Uu6aCibgK1PTWWu9by6XFdCfh&index=18
any help is appreciated greatly, and please @ me in replies, thanks :)
( I moved my question to #help-peanut )
a quick question, how do i get max value of a column along with corresponding row element?
max value I know, DF['column'].max()
DF['column'].idxmax()
this gives index of the max value but i want value from another column which falls in same row as max value
idk if someone will understand what im trying to say
Ok got it, sometimes just need to revisit the basics:
DF[DF.Column == DF.Column.max()]
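For reference, a sketch of both routes on made-up data: `idxmax` returns the index label of the first maximum, which `.loc` can use to read any other column, while the boolean mask keeps every row that ties for the maximum.

```python
import pandas as pd

# Toy data; column names are invented for this sketch.
df = pd.DataFrame({"player": ["A", "B", "C"], "points": [12, 30, 21]})

# Route 1: label of the max row, then read another column from that row.
top_label = df["points"].idxmax()
top_player = df.loc[top_label, "player"]

# Route 2: boolean mask, returns all rows that share the max value.
top_rows = df[df["points"] == df["points"].max()]
```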
Hello, how are you guys? I want to learn data science and artificial intelligence, and I know that I have to start learning linear algebra, differentiation and integration, statistics, probabilities, and data analysis. Is there anything more I should learn?
I want to make every cell that's >0.4 yellow and I'm trying to do it like this:
```python
df.style.applymap(lambda x: 'background-color: yellow' if x > 0.4 else '')
```
@lapis sequoia no, that's everything
@lapis sequoia I'd suggest basic neural network architecture
Hi Guys,
I have a pandas dataset with a datetime field and a value field.
I would like to get the sum of the records week by week, in such a way that all the records before that week are also included in the sum.
Week 1 should have sum of values for week 1 dates
Week 2 should have sum of values for week 1 dates+week 2 dates
Week 3 should have sum of values for week 1 + week 2 + week 3 dates
and so on
Your help will be very much appreciated
Thanks in advance
@viral scroll sort, groupby, sum, cumsum
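Spelled out, that recipe might look like the sketch below. The column names `date` and `value` and the sample rows are assumptions; `freq="W"` bins by weeks ending on Sunday.

```python
import pandas as pd

# Toy data; 2020-08-03 is a Monday.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-08-03", "2020-08-05", "2020-08-12", "2020-08-20", "2020-08-21"]
    ),
    "value": [1, 2, 3, 4, 5],
})

# Sum per calendar week, then accumulate so each week includes all earlier weeks.
weekly = (
    df.sort_values("date")
      .groupby(pd.Grouper(key="date", freq="W"))["value"]
      .sum()
)
running = weekly.cumsum()
```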
Hello, so I had this question about neural networks,
when we merge outputs from 2 different layers, we usually use 'add' layer
In keras there are many such layers like 'add', 'multiply', 'average', etc.
does anybody have a practical explanation of which one to use to merge when ?
Im working with a dataframe in pandas,
I dont know how to search by a specific year
Basically my question is In 2016, which person sold the most in each category?
#group = df.groupby(["Category", "person"]).sum()
#group."Ship Date"].to_datetime()
#total_sales = group["Units Sold"].groupby(level=0, group_keys=False)
#total_sales.nlargest(1)
But how do I group by the specific year as well
the data type of the date column is datetime64[ns]
df.groupby(['Ship Date'].dt.year)
not working
@velvet thorn use resample instead
df.resample('1Y', on='date')
I think
Off the top of my head
hey there
I'm having a hard time understanding sync and async..
I looked up simple explanations, it says : sync is when request 1 -> response, before you run request2..
async is request1 and request2 get executed at the same time without waiting for either to complete
I don't have anything I personally do to correlate this with, so this explanation isn't useful..
anyway, I'm ultimately trying to understand this in relation to model training:
Synchronous training has all worker training on different subsets of input data and incrementally combines results. In asynchronous training, workers operate independently and update variables asynchronously
@desert oar
@lapis sequoia this belongs in #async-and-concurrency, also id appreciate it if i wasnt randomly pinged
Hello fam,
Hey guys, if anyone is good with pandas, I am having some issues with duplicate values that I've tried to describe out in #help-popcorn channel, not sure if there is a more appropriate channel to post this too so apologies if this isn't the place
I am trying to wrap my head around something with regards to surface fitting. Libraries like scikit-learn and statsmodels provide the tools to fit a curve, but not the tools to fit a surface (3D surface). I get the feeling that, given 3 axes, X, Y, and Z, there is a way to curve fit Z with respect to X but do it for every value of Y, then separately curve fit Z with respect to Y and do it for every value of X, and then somehow combine those curve fits to form a surface fit.
Like I mentioned above, scikit-learn and statsmodels libraries have lots of curve fitting algorithms but no surface fitting algorithms.
I think scikit has a few surface fit functions, but not for the vast majority of their curve fit algorithms (ex. OLS, RLM, LOESS, LOWESS, etc.)
Many curve fitting methods work on any number of dimensions. It's weird if scikit can't do it, lemme check...
They might and maybe they simply lack examples showing how to do it with an extra dimension.
Being a noob in this area, I am certain I don't understand much of the documentation.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
Can't this do multidimensional?
This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).
In fact, it looks like the first example is that case:
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge()
and this constructs polynomial features, even for multidimensional data: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures
And this tutorial shows using them together, though only on 1d data: https://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html
I looked, didn't see an example of them using that regression for surface fitting. That doesn't mean it can't though. I think there is a high probability what I want to do is supported and easy, I just am missing a piece of the mental puzzle
I think you are right. I bet I can pass a properly dimensioned array to pull this off. But, I guess I don't quite get how to do that... An example would be sweet. I am a bit surprised I can't find one if it is supposedly easy.
What is special about the ridge model though?
@velvet thorn use resample instead
@desert oar depends on what you wanna do I guess?
like if you wanted to transform instead of aggregate you couldn't resample
ridge sounds like an esoteric statistical model.
I see it very often at work, alongside lasso and elasticnet
lemme try LinearRegression
it's a scarily simple way to regularize linear models, and it generally doesn't cost anything other than adding a few characters to your code to specify you want to use ridge
ridge sounds like an esoteric statistical model.
@bitter fiber it is mega common
at least IME
i have the same experience
My data has lots of outliers, so i was hoping to use something stable like lowess or robust linear models
not working
@jovial oriole what do you mean not working
actually .groupby(pd.Grouper(key='Ship Date', freq='Y')) would have been more appropriate
I've used the facebook ai model prophet since 2016 > for time series specifically
@velvet thorn I got it in the end , I did
dfbyyear2014= df[df['Order Date'].dt.strftime('%Y') == '2014']
So I reworked the dataframe, basicly a pre applied filter
way better than any other model i've tried
@modest rune
Yup, it works.
https://repl.it/repls/FlawlessBlueModules#main.py
As you can see, it correctly finds out the coefficients:
[ 0.00000000e+00 1.00000000e+00 1.11022302e-15 3.33066907e-16
1.00000000e+00 -1.00000000e+00]
wait, or does it
that's not correct at all, lol
@velvet thorn I got it in the end , I did
dfbyyear2014= df[df['Order Date'].dt.strftime('%Y') == '2014']
So I reworked the dataframe, basicly a pre applied filter
@jovial oriole that's not grouping by though
well, actually it's not completely wrong
it estimated 0 + X + XY-Y^2
real answer is 2 + X + XY - Y^2
and I'm not sure how it missed the bias
...or maybe it's the model as a whole that does it?
group = dfbyyear2014.groupby(["Category", "person"]).sum()
the intercept is in lin.intercept_ @tidal bough
uh, does it ?
group = dfbyyear2014.groupby(["Category", "person"]).sum()
@jovial oriole okay, so you actually wanted to filter and then groupby I guess
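Putting the thread together, a sketch of "filter to one year, then group, then pick the top seller per category". Column names follow the question (`Category`, `person`, `Ship Date`, `Units Sold`); the rows are invented.

```python
import pandas as pd

# Toy data; note Bob's large 2015 sale, which must not count toward 2016.
df = pd.DataFrame({
    "Category": ["Pants", "Pants", "Pants", "Shirts", "Shirts", "Shirts"],
    "person": ["Ann", "Bob", "Ann", "Ann", "Bob", "Bob"],
    "Ship Date": pd.to_datetime([
        "2016-01-05", "2016-03-01", "2016-07-09",
        "2016-02-02", "2016-02-03", "2015-12-31",
    ]),
    "Units Sold": [5, 7, 4, 3, 2, 50],
})

# 1. Filter: .dt.year works directly on a datetime64 column.
in_2016 = df[df["Ship Date"].dt.year == 2016]

# 2. Group: total units per (category, person).
totals = in_2016.groupby(["Category", "person"], as_index=False)["Units Sold"].sum()

# 3. Pick: the row with the largest total within each category.
best = totals.loc[totals.groupby("Category")["Units Sold"].idxmax()]
```

Comparing `.dt.year` to an integer avoids the string round trip through `strftime`.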
coef_ has 6 elems, it fits a x2y2 + b x2y + c xy2 + d xy + e x + f y
looks fine to me
then we have the + g with the intercept
still kinda misleading, since coef_ also has it, but it's always 0
@tidal bough no, the bias is not in the coefficient vector
I would guess they just both happen to be 0
@odd yoke
coef_ has 6 elems
that's the problem, they correspond to 1, x, y, x^2, xy and y^2
and yet the first of these is actually always 0, while the actual bias is in intercept_
Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
That's the output of PolynomialFeatures
(that's exactly my case)
ah, okay, because you have an explicit constant feature
ah I see, PolynomialFeatures has a include_bias parameter, and so does LinearRegression (fit_intercept)
lin = LinearRegression(fit_intercept=False) fixes it
ah, that makes sense
the slightly more efficient way is probably include_bias=False in the features.
(I'm assuming that to add the constant to the result array is cheaper than having the input have one more column).
Thanks yall. I'm afraid I need some time to process what you all are saying. It seems you have confirmed it is easy and possible, I just need to figure out the confusion in my head. If I can't make sense of things, I might come back with sample input data and some code tomorrow or Monday.
@modest rune Basically, the general idea is that you can do polynomial curve fitting with only linear regression by generating tons of new features (like, if you have arrays of X and Y coordinates, you also generate arrays of the products X*Y, X**2, Y**2 (for order=2)) and then fitting a linear model to this 5-dimensional data. PolynomialFeatures for the former, LinearRegression for the latter (or something with regularization like Ridge)
So in general, you just do
lin = LinearRegression()
model = make_pipeline(PolynomialFeatures(degree), lin)
And then pass it the input and output: the input an array of shape (n, m), the output of shape (n, k), where:
n is the number of points - in my case, a total of 10000 points.
m is the number of dimensions of each input point - in my case, 2.
k is the number of dimensions of each output point. In my case, it's 1 (a plain 1d array of length n works too). Having it >1 is the same as having several models with the same inputs, but predicting different parameters of the output.
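A self-contained sketch of that recipe, fitted to the surface from earlier in the thread (z = 2 + x + xy - y^2). The sample size and random inputs are assumptions; `include_bias=False` leaves the constant term to `LinearRegression`'s intercept so it is not duplicated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n_points = 1000

# Input: one row per point, one column per coordinate -> shape (n_points, 2).
XY = rng.uniform(-1, 1, size=(n_points, 2))
x, y = XY[:, 0], XY[:, 1]
z = 2 + x + x * y - y ** 2  # noise-free target surface

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(XY, z)

lin = model.named_steps["linearregression"]
# Degree-2 feature order without the bias column: x, y, x^2, xy, y^2
coefficients = lin.coef_
intercept = lin.intercept_
```

Replacing `LinearRegression()` with `Ridge(alpha=...)` in the pipeline gives the regularized variant mentioned above; for outlier-heavy data a robust estimator could take the same slot.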
@tidal bough thankyou so much!
@desert oar I dont think it particularly belongs in async, because it's about model training strategies.. and ok
i'm trying to translate natural language text into text for a program..anyone know where I can get started?
is it possible to make a dictionary where the keys increase by an increment
say
a = 5
...
Results in dict = {1: None, 2: None, 3: None, 4: None, 5: None}
@winter citrus you can use google translate api
@bitter fiber ridge regression is L2 regularization, if that's something you are familiar with
@prime elm ...you want all the values to be None?
>>> dict.fromkeys(range(5))
{0: None, 1: None, 2: None, 3: None, 4: None}
adapt as necessary
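Adapted to the keys 1 through `a` from the question, a minimal sketch:

```python
a = 5

# range(1, a + 1) yields 1..a inclusive; every key maps to None.
d = dict.fromkeys(range(1, a + 1))

# Equivalent dict comprehension, handy if the default value should vary per key.
same = {k: None for k in range(1, a + 1)}
```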
inputs = np.array([
[1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12],
[11, 22, 33, 44, 55, 66],
[100, 200, 330, 400, 500, 123],
[99, 123, 33, 32, 12, 44],
[9999, 123123, 123123, 444343, 5555, 66699]
], dtype='float32')
targets = np.array([
[4], [6], [8], [10], [12], [14]
], dtype='float32')
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs.shape)
print(targets.shape)
train_ds = TensorDataset(inputs, targets)
train_dl = DataLoader(train_ds, shuffle=True)
There is a problem with converting the inputs into tensors from numpy arrays
Because it says it expected a numpy array but a tensor was given instead
But having only one row works just fine
tensor([[ -2.3799],
[ -6.7324],
[ -23.5662],
[-145.1938],
[ -52.2061],
[1322.5454]]
Some of my predictions still have negative values even though I used mse_loss
never mind I figured it out
when computing limits, is it possible to do it all with factoring/one function, or do other methods have to be implemented?
I have an error with the final line in this file
Error output:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-55-90c5585d3b40> in <module>()
      1 opt = torch.optim.Adam(model.parameters(), lr=7)
----> 2 fit(5, model, loss_fn, opt, train_dl, eval_dl, accuracy)

3 frames
<ipython-input-49-afd130f584e4> in forward(self, xb)
     18
     19     def forward(self, xb):
---> 20         xb = xb.reshape(-1, 784)
     21         outputs = self.linear(xb)
     22         return outputs

RuntimeError: shape '[-1, 784]' is invalid for input of size 200
```
I have tried stack overflow but the solutions that are covered are part of a more advanced model and I am unable to follow along with it.
And there may be a few lines of code that are not needed so don't mind those too much
ping me btw
yeah
at least I think so
doesn't it just well
reshape a tensor
into a different shape
Hi. I have already watched "Udemy_The_Data_Science_Course_2020_Complete_Data_Science_Bootcamp_2020". It was simple and I think it was for beginners and at a shallow level. Could you suggest me better online course (deep knowledge) to become a data scientist? I have M.S. degree in artificial intelligence, thanks.
I do not know where I can ask similar questions about it, here or another channel
do you understand
@velvet thorn yeah I think so
yeah so
how can you reshape a tensor of shape (200) into one of shape (1, 784)?
it doesn't make sense
they have different numbers of elements
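The rule behind that error, sketched with NumPy for brevity (an assumption; the question uses PyTorch, whose `reshape` follows the same element-count rule): `reshape(-1, 784)` only works when the total number of elements is a multiple of 784.

```python
import numpy as np

# A batch of 100 MNIST-sized images: 100 * 1 * 28 * 28 = 78400 elements.
batch = np.zeros((100, 1, 28, 28))
flat = batch.reshape(-1, 784)  # -1 is inferred as 78400 / 784 = 100

# 200 elements cannot be split into rows of 784, so this fails.
not_images = np.zeros(200)
try:
    not_images.reshape(-1, 784)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

An input of size 200 reaching `forward` therefore suggests that something other than the image batch was passed in, which fits the eventual fix of a missing argument.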
so then
it's the MNIST dataset though
So I don't think that's the case
unless there are multiple instances
or I made an error
so what can I do then
instead of making it into 784
I just change 784 to 200?
But the MNIST dataset is a 1x28x28
changing it from (-1, 784) to (-1, 200) just gave a matrix multiplication error
@velvet thorn
I have this endpoint code to get the average stock closing price given a stock name, month, and year
@app.route('/stock=<stock>/date=<date>/average', methods=['GET'])
def average(stock, date):
    if request.method == 'GET':
        # a set of known tickers; avoids shadowing the built-in name `dict`
        stocks = {'FB', 'AAPL', 'NFLX', 'GOOG'}
        if stock not in stocks:
            return "This stock does not exist. List of stocks (case sensitive): \nFB \nAAPL \nNFLX \nGOOG \n"
        try:
            dt = datetime.datetime.strptime(date, "%Y-%m")
        except ValueError:
            # the example must match the "%Y-%m" format parsed above
            return "Please enter a valid year and month \nExample: 2020-12 \n"
        df['date'] = pd.to_datetime(df['date'])
        by_stock_month_year = df[(df["company_ticker"] == stock) & (df['date'].dt.month == dt.month) & (df['date'].dt.year == dt.year)]
        if by_stock_month_year.empty:
            return "There is no available price for that date \n"
        prices = by_stock_month_year["closing_price"]
        data = {}
        data['price'] = round(prices.mean(), 2)
        return json.dumps(data, indent=2)
    else:
        return "Only GET methods are supported \n"
For this csv file
company_ticker,date,closing_price
AAPL,1989-09-19,1.54
AAPL,1989-09-20,1.59
AAPL,1994-12-08,1.28
AAPL,2019-11-15,265.76
GOOG,2004-08-19,49.98
GOOG,2004-08-20,53.95
GOOG,2019-11-15,1334.87
Is there a way to make this cleaner
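One possible way to tidy this, offered as a sketch rather than the only answer: pull the pandas logic into a plain function so the route only handles validation and responses, and parse the dates once at load time. The function name `average_close` is hypothetical; the CSV rows are the ones posted above.

```python
import io

import pandas as pd

CSV = """company_ticker,date,closing_price
AAPL,1989-09-19,1.54
AAPL,1989-09-20,1.59
AAPL,1994-12-08,1.28
AAPL,2019-11-15,265.76
GOOG,2004-08-19,49.98
GOOG,2004-08-20,53.95
GOOG,2019-11-15,1334.87
"""

# Parse dates once when loading instead of on every request.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

def average_close(frame, ticker, year_month):
    """Mean closing price for `ticker` in the 'YYYY-MM' month, or None if no rows match."""
    period = pd.Period(year_month, freq="M")
    mask = (frame["company_ticker"] == ticker) & (frame["date"].dt.to_period("M") == period)
    prices = frame.loc[mask, "closing_price"]
    return None if prices.empty else round(float(prices.mean()), 2)
```

The route body then shrinks to: check the ticker, call `average_close(df, stock, date)`, and map `None` to the "no available price" message. Since the route is registered with `methods=['GET']`, the inner `request.method` check is likely redundant.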
784 is 1 * 28 * 28
@velvet thorn I understand why it's 784 but I have no idea what to do next
Someone in SO helped me out
the solution worked
the problem was
I left out an argument in a function loss_batch
Hi, I am trying to scrape emails from yelp by crawling into individual listing. Using bs4 and selenium for it but not able to scrape them. Where do I ask this?
that might be against yelp terms of service, in which case we can't help with that on this server @sweet ember
!rules 5
5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious/inappropriate or be for graded coursework/exams.
how much should i have this?
net = tflearn.fully_connected(net, 12)
and of what size?
it claims to be higher-level than TF itself, but doesn't TF have its own Sequential class that allows building models the same way?
is tflearn separate from tf?
wdym
it's a wrapper over TF, basically
TFlearn is a modular and transparent deep learning library built on top of Tensorflow. It was designed to provide a higher-level API to TensorFlow in order to facilitate and speed-up experimentations, while remaining fully transparent and compatible with it.
@calm wagon you should probably find some existing simple implementation/guide and see how they do it
speed-up experimentations ok
is it possible to build a face-recognition using Tensorflow
that, or just guess. 3 layers of 100 neurons or something.
is it possible to build a face-recognition using Tensorflow
well, yes, this is one of the things ML tends to be used for
that does seem redundant tho
!tempmute 739406136981192784 1d Be silent.
:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until 2020-08-24 16:54 (23 hours and 59 minutes).
Is Tensorflow specifically made for face-recognition stuff like that?
tf is specifically made for machine learning yes
Tensorflow is a pretty low-level framework for machine learning and neural networks.
It's not, like
from tensorflow import FaceRecognition
FaceRecognition().recognize(faces)
TensorFlow or Opencv which one is great for face-recognition?
for a simple solution, google gives me https://pypi.org/project/face-recognition/
opencv can't do recognition on its own
ik
whereas tf can train on images
Thanks @desert oar I was just trying projects on webscraping/crawling for my github. I ll try with someother site
cvlib can detect faces
and then max pooling extracts numbers from each frame separately
for example, image (32x32x1)
image -> convolution of 10 filters -> result is 10 x (30, 30, 1)
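The 32 to 30 shrink in that example can be checked with the usual output-size formula. This sketch assumes a 3x3 kernel, stride 1 and no padding (the chat does not state them, but they are what turns 32 into 30); the helper names are made up.

```python
def conv2d_output_shape(height, width, n_filters, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution: (size + 2*padding - kernel) // stride + 1."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # One output channel ("frame") per filter.
    return (out_h, out_w, n_filters)

def max_pool_output_shape(height, width, channels, pool=2):
    """Non-overlapping max pooling shrinks each channel independently."""
    return (height // pool, width // pool, channels)

conv_shape = conv2d_output_shape(32, 32, n_filters=10)
pooled_shape = max_pool_output_shape(*conv_shape)
```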
@sweet ember wikipedia.org is a good place to start. you also dont need selenium for that which makes it a lot simpler
Recommendations for best way to write dataframes and numpy arrays to a file? I assume numpy and pandas has builtin functionality to do this. Should I use those or is there something better?
Whatever direction I go, I'd like something that can handle large datasets and I can be reasonably confident won't break as I upgrade my library versions over the years.
Human readable files would be a plus, but not at the cost of huge file sizes.
Looking at pandas file IO documentation, it seems they give lots of options. Which one is the best fit?
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
I assume pickling is a bad idea because pickle files are likely to fail to load if you try to load a pickle file saved in a previous library version?
Recommendations for best way to write dataframes and numpy arrays to a file?
numpy has save, which saves to numpy's own binary format (.npy; savez writes .npz archives)
I saw that. I am leaning that direction. But, I think I would prefer something that works with numpy AND pandas.
specifically, I think I would have to use a different function to save a dataframe to file. Which isn't the end of the world, but not ideal.
The idea that json is human readable is very attractive... maybe the files wouldn't be too big in json. If I had to guess, the largest files I would save would be maybe 1 billion doubles.
honestly, a dataframe/array in json will probably not be very readable
^^
by the way, if they are 2d, you can use just csv
or excel's version
@tidal bough
Good point, I hadn't even contemplated what it would look like.
Interesting point from this stack exchange discussion: "Another useful point is that although ASCII CSV encoding isn't very efficient, using a file compression utility (like zip, gzip, etc.) on your ascii file will typically bring the file size down to something similar to the size of a binary file."
https://scicomp.stackexchange.com/questions/8404/binary-vs-ascii-file-size
.npy is binary, .npz is compressed @tidal bough btw
so german if you care about efficiency/compression i'd suggest looking at numpy's file types
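A short sketch of the options discussed here, written to a temporary directory. File names are arbitrary; `np.savez_compressed` is the NumPy call that actually compresses (plain `np.savez` only bundles arrays), and pandas infers gzip compression from the `.gz` suffix.

```python
import os
import tempfile

import numpy as np
import pandas as pd

arr = np.arange(12, dtype="float64").reshape(3, 4)
df = pd.DataFrame(arr, columns=list("abcd"))

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "arr.npy")
    npz_path = os.path.join(tmp, "arrs.npz")
    csv_path = os.path.join(tmp, "frame.csv.gz")

    np.save(npy_path, arr)                  # single array, uncompressed binary
    np.savez_compressed(npz_path, arr=arr)  # zip archive of named arrays, compressed
    df.to_csv(csv_path, index=False)        # text, gzip inferred from ".gz"

    arr_back = np.load(npy_path)
    with np.load(npz_path) as archive:
        arr_from_npz = archive["arr"]
    df_back = pd.read_csv(csv_path)
```

CSV stays readable after decompression and works for both libraries, at the cost of slower parsing and text-formatted floats; the `.npy`/`.npz` formats are exact and fast but NumPy-specific.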
while(True)
if (cvlib.object == 'person') AND (cvlib.wields == 'baseball bat'):
police_state.send_swat_team()
hahaha! I just received an email from the Central Intelligence Agency asking me to code their new Auto-Policing robots
hahaha
what did it label the books?
while(True)
if (cvlib.object == 'person') AND (cvlib.bottles >= 2) AND (cvlib.ethnicity != 'russian'):
bar.refuse_service()
@molten hamlet here, I fixed it for you
And these examples are why I tell my brother that AI is going to cause a disaster at some point.
xD
you can use slavic instead, im not russian
russian can understand polish but we can't understand theirs :d
Petty Officer Dirk: "General Dukes, the AI has detected the launch of 56 thermonuclear warheads. But, I am pretty sure it is just a bunch of pencils that fell out of a teacher's satchel. The AI is recommending a counter-offensive."
General Dukes: "Son, trust the AI, launch the counter-offensive."
@molten hamlet you could be American for all I know. I didn't mean to insinuate you were Russian. Was only making a joke that Russians can hold their liquor.
nah chill
im fine
I love that korean soju
its cheap in korea
but not cheap here due to import
should I know something specific about detecting road signs or keeping a car on the road between the line and the edge
got an interview tomorrow
Can't help you with that. Maybe someone else can chime in.
Nope, zero experience with machine learning. Other than that I have started trying to get better at curve fitting.
"dates": "[[\"2004-08-20\", 53.95], [\"2019-11-15\", 1600.63]]"
Does anyone know how to get rid of that weird formatting on the dates
@modest rune you have no idea how close that was to actually happening. Soviet early warning system got confused by sun reflecting off clouds and assumed nato had launched a first strike.
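The escaped quotes in that output look like JSON that was serialized twice. The code that produced it is not shown, so this is only a guess at the cause; the sketch below reproduces the symptom and two ways around it.

```python
import json

rows = [["2004-08-20", 53.95], ["2019-11-15", 1600.63]]

# Symptom: the inner dumps() turns the list into a string, and the outer
# dumps() then escapes every quote inside that string.
double_encoded = json.dumps({"dates": json.dumps(rows)})

# Fix 1: put the Python list into the dict and serialize once.
clean = json.dumps({"dates": rows})

# Fix 2: if the value already arrives as a JSON string (for example from a
# to_json call), parse it back into Python objects before the final dumps().
already_json = json.dumps(rows)
repaired = json.dumps({"dates": json.loads(already_json)})
```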
@untold hare wow.
I think a similar thing happened with America's early warning system.
Lots of incidents yeah. There is a good book about this lemme see if I can find it. Basically a must read if you do data science and ML for defense companies
Cool! Thanks, I might kindle that.
Do it
Hey, can someone help me fix this:
```
print("Train data:")
for i in tqdm(range(0, X_train_windowed.shape[0] - seq_len+1)):
    X_train_Conv_LSTM[i] = current_seq_X
    y_train_Conv_LSTM[i] = y_train[i + seq_len - 1]

X_train_Conv_LSTM.shape = (262, 3, 50, 50, 3), current_seq_X.shape = (1, 3, 50, 50, 3)
y_train_Conv_LSTM.shape = (262, 1), y_train.shape = (264,)

cupy\core\core.pyx in cupy.core.core.ndarray.__setitem__()
cupy\core\_routines_indexing.pyx in cupy.core._routines_indexing._ndarray_setitem()
cupy\core\_routines_indexing.pyx in cupy.core._routines_indexing._scatter_op()
cupy\core\_kernel.pyx in cupy.core._kernel.ufunc.__call__()
cupy\core\_kernel.pyx in cupy.core._kernel._get_out_args()
ValueError: Out shape is mismatched
```
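One plausible reading of the shapes above: each slot `X_train_Conv_LSTM[i]` has shape (3, 50, 50, 3), while the value being assigned carries an extra leading axis of length 1. NumPy tolerates that; the traceback suggests this CuPy version did not. The sketch below uses NumPy and smaller images so it runs anywhere, and how `current_seq_X` is built is an assumption, since the question does not show it.

```python
import numpy as np

seq_len = 3
frames = np.zeros((264, 8, 8, 3), dtype="float32")  # stand-in for X_train_windowed
labels = np.arange(264)                             # stand-in for y_train

n_seq = frames.shape[0] - seq_len + 1               # 264 - 3 + 1 = 262
X_seq = np.zeros((n_seq, seq_len) + frames.shape[1:], dtype=frames.dtype)
y_seq = np.zeros((n_seq, 1), dtype=labels.dtype)

for i in range(n_seq):
    # Assumed construction: a window with a leading batch axis, shape (1, 3, 8, 8, 3).
    current_seq = frames[i:i + seq_len][np.newaxis]
    # Drop the leading axis so the value matches the slot shape exactly.
    X_seq[i] = current_seq[0]
    # Label of the last frame in the window, written into the single column.
    y_seq[i, 0] = labels[i + seq_len - 1]
```

With CuPy arrays the same `current_seq[0]` indexing should apply, since it keeps the value's shape identical to the destination slice.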
is there some library that lets you create an index on a column in a pandas dataframe that isnt the index of the dataframe?
e.g. some data structure that keeps a sorted collection of rows, or a hash table, and does binary search or a hash lookup to find the dataframe rows that you want (or whatever other index implementation is out there, trees etc)
df = pd.DataFrame(...)
product_category_index = ColumnIndex(df['product_category'], algorithm='b-tree')
df_pants = df.iloc[product_category_index('pants')]
something like that
would be a fun project if nobody has done this already
Could someone help me in the help voice channel?
is there some library that lets you create an index on a column in a pandas dataframe that isnt the index of the dataframe?
@desert oar hm.
not possible in general, because the index would need to update with the DataFrame
like you could hack it, but it'd be prone to breaking with pandas updates
and the caller would re-index as desired
and the caller would re-index as desired
@desert oar then what would the benefit over normal pandas indexing be
since filtering is at worst linear, and index-building is at best linear
for big datasets where you already have an index but need to do repeated lookups on non-index fields, or a variety of fields
not that uncommon in my work
I see
fair enough
okay I'm going to need you to stop talking about use cases
because I don't think I need another side project
lol
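from abc import ABCMeta, abstractmethod
from bisect import bisect_left
from typing import Any, Callable, Generic, List, Optional, Sequence, TypeVar
_T = TypeVar("_T")
class BaseIndex(Generic[_T], metaclass=ABCMeta):
    def __init__(self, data: Sequence[_T]):
        self.data = data
    @abstractmethod
    def lookup(self, val: _T) -> Optional[int]:
        pass
class BinsearchIndex(BaseIndex[_T]):
    sort_key: Optional[Callable[[_T], Any]]
    order: List[int]  # positions into data, sorted by key
    keys_sorted: List[Any]
    def __init__(self, data: Sequence[_T], sort_key: Optional[Callable[[_T], Any]] = None):
        super().__init__(data)
        self.sort_key = sort_key
        # keep the original positions so lookup() returns a row number usable with df.iloc
        self.order = sorted(range(len(data)), key=lambda i: self._key(data[i]))
        self.keys_sorted = [self._key(data[i]) for i in self.order]
    def _key(self, val: _T) -> Any:
        return val if self.sort_key is None else self.sort_key(val)
    def lookup(self, val: _T) -> Optional[int]:
        # https://docs.python.org/3/library/bisect.html#searching-sorted-lists
        key = self._key(val)
        i = bisect_left(self.keys_sorted, key)
        if i >= len(self.keys_sorted) or self.keys_sorted[i] != key:
            return None
        return self.order[i]  # position of the first match in the original data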
i slapped this together, not sure if it actually works
what do you see this line doing though
df_pants = df.iloc[product_category_index('pants')]
yeah
looking it up in the index in < O(n) time
then looking it up in the dataframe in O(1) time
I mean
idk if it actually works that way
what's the expected output
the column pants sorted by the value of product_category?
i.e. df.sort_values(by='product_category')['pants']?
it would be equivalent to df.loc[df['product_category'] == 'pants']
wouldn't that just be df[df['product_category'] == 'pants']
but okay I get it
if you say that the index doesn't need to change with the DataFrame
then it seems to me that you could just use a dict
where the keys are unique values of the given category and the values are row numbers
which would reduce lookups to constant time
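A minimal sketch of the dict idea described above, assuming the index is built once and not updated when the data changes. The names `build_dict_index`, `categories` and `category_index` are made up for illustration; a plain list stands in for the DataFrame column.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def build_dict_index(values: Iterable[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each distinct value to the row positions where it occurs."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for position, value in enumerate(values):
        index[value].append(position)
    return dict(index)


# Stand-in for df['product_category'].
categories = ["pants", "shirts", "pants", "hats", "shirts", "pants"]
category_index = build_dict_index(categories)

# Average-case constant-time lookup of every row position for one value.
pants_rows = category_index.get("pants", [])
print(pants_rows)  # [0, 2, 5]
```

With a real DataFrame, the resulting list of positions could be passed to `df.iloc[...]`. Building the dict is still one linear pass, so it only pays off when the same column is looked up repeatedly.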
thats what i was thinking too
that was on my TODO list
you could use a B-tree or whatever
but yeah a dict is easy
also this doesnt support range index lookups (yet)
eg if there is more than 1 row with that value
and obviously something like this is kinda useless except on pretty large dataframes
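For the missing piece mentioned above (several rows sharing a value, or a range of values), one possible approach is to keep the sorted keys next to their original positions and use both `bisect_left` and `bisect_right`. This is only a sketch; `SortedPositionsIndex`, `lookup_all` and `lookup_range` are hypothetical names, and a plain list stands in for the column.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Sequence


class SortedPositionsIndex:
    """Sorted copy of a column plus the original position of each entry."""

    def __init__(self, values: Sequence[Any]):
        # Positions into `values`, ordered by the value they point at.
        self.order = sorted(range(len(values)), key=lambda i: values[i])
        self.keys = [values[i] for i in self.order]

    def lookup_all(self, value: Any) -> List[int]:
        """Every original position whose value equals `value`."""
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return sorted(self.order[lo:hi])

    def lookup_range(self, low: Any, high: Any) -> List[int]:
        """Every original position with low <= value <= high."""
        lo = bisect_left(self.keys, low)
        hi = bisect_right(self.keys, high)
        return sorted(self.order[lo:hi])


prices = [30, 10, 20, 10, 40, 20]
price_index = SortedPositionsIndex(prices)
print(price_index.lookup_all(20))        # [2, 5]
print(price_index.lookup_range(10, 20))  # [1, 2, 3, 5]
```

Each lookup costs two binary searches plus the size of the result, at the price of an O(n log n) sort up front.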
heh have fun
indeed
how big are your dataframes?
honestly I don't think I've ever been at the point that this would be a necessary optimisation
not that big anymore
but ive worked on problems with > 1bn rows in memory
or where the lookups just needed to be faster than they were
and you needed to index on arbitrary columns
such that a multi-level index wouldn't have worked?
@velvet thorn thats an interesting option, i still like this separate index idea though
im curious if it can actually produce any speed improvements on bigger datasets
and a new project is born
I'm working on a project generating guitar hero charts based on tablature but honestly I don't think I'll ever finish
using ML?
Does it even need ML for that?
does anyone know what nan means when you're calculating your loss?
Hmm alright
yeah
due to numerical error, some numbers just get smaller than epsilon
too big, as distinct from that, usually comes when your learning rate is too high
so gradient descent becomes gradient ascent
hmm well I have that right now
I messed around with different lrs
but it didn't work after changing it differently
Immediately
then it's not that
Then what could it be
too big, as distinct from that, usually comes when your learning rate is too high
@velvet thorn not this
the other stuff we said
huh
so my loss could be too large?
but becomes nan after a while
(and you see it going up real quick)
that suggests that your learning rate is too high
because your model's parameters bounce out of the valley of low loss into the skies of float overflow
but if your loss starts out as nan
that implies that the problem is something else
e.g. division by 0 somewhere
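To make the "loss goes up real quick, then becomes nan" case concrete, here is a toy sketch using plain gradient descent on f(x) = x**2. The function name `gradient_descent` and the two learning rates are assumptions chosen for illustration; a real model behaves less cleanly, but the mechanism is the same.

```python
import math
from typing import List


def gradient_descent(lr: float, steps: int, x0: float = 1.0) -> List[float]:
    """Minimise f(x) = x**2 and return the loss after every step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2.0 * x       # derivative of x**2
        x = x - lr * grad    # plain gradient descent update
        losses.append(x * x)
    return losses


# lr = 0.1 shrinks x by a factor of 0.8 per step, so the loss keeps falling.
stable = gradient_descent(lr=0.1, steps=1100)
# lr = 1.5 multiplies x by -2 per step: the loss grows, overflows to inf,
# and the update then computes inf - inf, which is nan.
diverged = gradient_descent(lr=1.5, steps=1100)

print(stable[-1] < stable[0])                 # True
print(any(math.isinf(l) for l in diverged))   # True
print(math.isnan(diverged[-1]))               # True
```

A loss that is nan from the very first step does not fit this pattern, which is why the conversation points to other causes such as a division by zero or nan values in the input.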
because your model's parameters bounce out of the valley of low loss into the skies of float overflow
@velvet thorn what does that mean
do you know how gradient descent works?
yeah
then you should understand that...?
My english isn't the best lol
An increasing gradient requires a low learning rate right?
and a decreasing gradient is the opposite of that
basically if you adjust your weights by too much each iteration it is possible that you will "bounce" to the other side of the loss landscape
So you wanna check my code? Could it be because of the way I added in my input data?
increasing loss in the process
So you wanna check my code? Could it be because of the way I added in my input data?
@desert parcel no thank you
Because I've never done it this way before
hard to say, could be a few things
well I did check that there are no issues, but I'm just wondering
Because I'm only trying to predict one thing
yes, so it should be 1D, right
Oh yeah
(9, 1) is different from (9,)
TBH I don't have much experience with Torch so I don't know how it would handle such things
but it's at least a little strange IMO
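A small sketch of why `(9, 1)` versus `(9,)` can matter, using NumPy as a stand-in because the Torch behaviour is not confirmed in this conversation. The arrays `preds` and `targets` are made-up values; the point is only that elementwise arithmetic between the two shapes broadcasts to a 9x9 matrix instead of pairing entries one to one.

```python
import numpy as np

preds = np.arange(9, dtype=float).reshape(9, 1)  # shape (9, 1)
targets = np.arange(9, dtype=float)              # shape (9,)

# (9, 1) minus (9,) broadcasts to (9, 9): every prediction against every target.
diff_broadcast = preds - targets
# Flattening the predictions pairs entry i with entry i, shape (9,).
diff_aligned = preds.reshape(-1) - targets

mse_broadcast = float((diff_broadcast ** 2).mean())
mse_aligned = float((diff_aligned ** 2).mean())

print(diff_broadcast.shape)  # (9, 9)
print(diff_aligned.shape)    # (9,)
print(mse_aligned)           # 0.0
```

Here the predictions equal the targets, so the aligned error is zero while the broadcast version reports a positive error, which would silently distort a loss.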
Well, but I have two 2D tensors, that should be fine, right?
I'm trying to set up an upstream for tensorflow by doing
git remote add upstream git@github.com:tensorflow/tensorflow.git
however when i try to run
git pull upstream master
I get the error seen in the screen shot. If anyone knows what im doing wrong please lmk. Sorry if im intruding in a conversation
well I meant that I have two 2D tensors multiplying them together should be fine
well I meant that I have two 2D tensors multiplying them together should be fine
@desert parcel okay, you have kind of lost me
which two tensors are you multiplying together
preds and targets
I'm trying to set up an upstream for tensorflow by doing
git remote add upstream git@github.com:tensorflow/tensorflow.git
however when i try to run
git pull upstream master
I get the error seen in the screen shot. If anyone knows what im doing wrong please lmk. Sorry if im intruding in a conversation
@random perch you can't pull directly from the TF repo
preds = model(inputs)
@desert parcel ah, okay
that seems reasonable
could be something else in the data
hard to say from here
just experiment a little
@random perch you can't pull directly from the TF repo
@velvet thorn How do I update my forked repo to match the TF repo
okay it's been a while since I actually forked a repo
so I don't wanna tell you the wrong thing that I'm not sure about
Lol i'm not even familiar with git
ite bet
actually
I feel like what I said might be wrong
about not being able to pull directly
hm
let me try
Hmm
it's a different issue
well then I'm not sure how to proceed then
Git can't access your credentials
are you using Windows or *nix
oh okay I think I get it
do you mean unix?
Im using mac
you're trying to connect using SSH
so unix
the SIMPLEST way to fix this
is
do this instead
git remote add upstream https://github.com/tensorflow/tensorflow.git
oh mm
although I would suggest you look into setting up SSH keys
yeah that actually might work lol
no Mac experience, sorry
what do u use
Ubuntu
well with my code the tensors seem to be working fine
why is there
a nan
in the input?
if you have nan inputs of course the output will be nan too
oh yeah
well
I mean
git remote add upstream https://github.com/tensorflow/tensorflow.git
@velvet thorn 10/10 it worked ty!
yw
I'm not sure how to put this in a non-condescending/offensive way
but this is really basic debugging
so...yeah...