#data-science-and-ml
Yeah, thank you so much
I'm still new to this library
numpy is so much easier
You're welcome, np
One quick question: can I ask why, to print only certain columns, it's
print(df[['MA1', 'MA2']]) and not
print(df['MA1', 'MA2'])
df[col] is the way to select a subseries inside of the dataframe
df[[cols]] is the way to select a subdataframe
In the first case, the argument is a string, in the second case it's a list
thank you!
same. I hate pandas with a passion. It appears to be the standard people use, but I try to avoid it at all costs and just use numpy or just straight python if possible
I hate it but I’ve just learned to accept the terrible ergonomics.
Yeah, it sucks, I've never seen anything like it
interesting, i find pandas to be the only thing I enjoy doing in python.
To expand on that explanation, the reason df['MA1', 'MA2'] doesn't get you two columns is because ('MA1','MA2') (a tuple of two strings, yes) is a valid column name in pandas. Yeah.
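A minimal sketch of the three cases discussed above (string key, list key, tuple key). The frame and its values are made up; only the column names MA1 and MA2 come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question above.
df = pd.DataFrame({"MA1": [1.0, 2.0], "MA2": [3.0, 4.0], "Close": [5.0, 6.0]})

one_col = df["MA1"]            # string key -> one column, returned as a Series
two_cols = df[["MA1", "MA2"]]  # list key -> several columns, returned as a DataFrame

# df["MA1", "MA2"] is the same as df[("MA1", "MA2")]: a single key that is a tuple.
# No column has that label here, so the lookup raises KeyError.
try:
    df["MA1", "MA2"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# With two-level column labels, the same tuple key names one real column.
df_multi = pd.DataFrame({("MA1", "MA2"): [1, 2]})
tuple_col = df_multi["MA1", "MA2"]
```

So the extra brackets are not decoration: the outer pair is the indexing operator and the inner pair builds the list of labels.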
it just comes across to me as unintuitive and I don't really see it offering any advantage over say numpy. Everything pandas does in terms of data organization and modification can easily be done using just python (no dependencies required), and while various libraries like numpy, scipy, matplotlib, etc. work well together, the same cannot be said for pandas.
i mean, you can replicate pandas's capabilities in numpy, but with quite a lot of effort - either structured arrays, or an array per column. Also, the moment you want a groupby, problems will start.
(people who dislike pandas might want to take a look at polars, though - it is similar but makes some different choices, like not having indexes)
Idk, maybe because I'm new to it, but it's just a whole new world to me
I personally prefer how matlab organizes and treats data. Love its syntax and usage with linear algebra
I was told i could do this, but i did some research and found for a lot of data, pandas is a lot more efficient than numpy
Having worked almost exclusively with pandas for more than a year, I can say that it's very natural to me and it has so many capabilities
It's not made for linear algebra, just data analysis and transformation
I am facing an error in my code. can anyone help me to remove this error?
send it these guys r good
I am facing error in line 144 of SHAP summary plot. https://paste.pythondiscord.com/CT5A
I like numpy for numpy stuff and sql for everything else: pandas angers me because it is less capable than both. (I’m conflating sql and dbs but you get the idea)
I literally can't ask for help with my code when smth doesn't work, cus it's such a mess
are you sure? if your code is messy but functional, then it should be fine
pandas is literally numpy internally, the things that you can do with a typical pandas dataframe is mostly a superset of what you can do with the underlying numpy array (including access the underlying array if needed). if the "data frame" concept isn't useful for you then you don't need to use it. but it has a long successful history among statisticians and other data analysts in python, as well as R for many years before pandas came out
df[thing] selects the "thing", whatever that thing is. if it's a string or a list of strings, it selects a column or multiple columns. hence you need df[[col1, col2]]. it's completely logical, don't be thrown off by people who hate pandas because they didn't bother to learn how it works.
pandas does have a few big design flaws however, e.g. you can write df[thing] for boolean masks as well
that is, sometimes df[thing] is df.loc[:, thing] and sometimes it's df.loc[thing, :] depending on what thing is. imo that's bad design, and df[thing] should be reserved for only one of those cases. personally i always use it for the former and flatly reject any code that uses it for the latter.
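A small sketch of that ambiguity, on a made-up two-column frame: the same `df[...]` syntax picks a column when given a label and picks rows when given a boolean mask.

```python
import pandas as pd

# Hypothetical frame, just to show the two meanings of df[...].
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

by_label = df["x"]         # label key: column selection, like df.loc[:, "x"]
by_mask = df[df["x"] > 1]  # boolean mask key: row selection, like df.loc[mask, :]

explicit_column = df.loc[:, "x"]
explicit_rows = df.loc[df["x"] > 1, :]
```

Writing the `.loc` form for row filters makes the intent visible at the call site, which is the style being argued for above.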
Maybe ill learn to love pandas, but right now its a pain in the ass
the docs honestly aren't great. that's the worst part.
the reference docs are good, but the "howto" material is kind of chaotic.
it's somewhat easier if you already know data frames from R
Aight well Corey Schafer it is
who?
ort.InferenceSession.get_inputs() equivalent in cv2.dnn.Net?
i don't recommend learning programming from youtube in general
Does matplotlib numpy django and pandas
he might be good, but my overall trust in "youtube programming educators" is very low
Eh if you think that, i learnt all my file handling and matplotlib from him
in any case we have some very experienced pandas users here. feel free to ask or search stackoverflow
my teachers hate stackover flow
why?
I've started reading that part recently, it's indeed a mess
Idk, I think they think of it kind of like the Wikipedia for programming
it just blurts out the answer with no explanation
but I learn it myself if I really have to
imagine learning from that and that only, back in 2015. what's a little unnerving is that the core tutorial and howto material is largely unchanged since then. i once tried to start revising it but i felt a little out of touch with what the core devs wanted for it and gave up. i'd need to be in closer contact w/ someone on the core team to make good progress on it.
i spent a while on it though. i should have saved what i did.
i think there are some good books on using pandas
the good answers do provide an explanation
at least, i always try to provide an explanation with mine
I feel like you just like helping people
props to you
if i didn't have R background knowledge i don't know how i'd have learned pandas tbh
i do indeed. i wouldn't be here otherwise
Well i probably should start using R, but no clue where to start
nah, don't spend time on it. learn one thing at a time
I didn't have any R experience but I could probably have been way more efficient
learning another programming language is low value among the many other things out there to be learned
For me, it's not SO (or GPT) that's the issue... it's how people are using it. Instead of asking: "What's the API for XYZ? What parameters does it take? How should I use it?", people jump straight to: "here's how you get the top 5 results from a dataframe".
And thus, they get caught in a vicious cycle of solution seeking, rather than understanding.
But the thing is, in my experience, thats exactly how we are taught 'this is how you get XYZ' 'This is how you plot'
It's even worse with copilot
Oh, that's interesting.. I haven't tried, but I can totally imagine.
the only major advantage I've found R to have is that it has a lot of ready-made libraries for very common data analysis in various scientific fields. Otherwise, from my experience, you can use both interchangeably
I REALLY WANT CO PILOT
i emailed Github but they havent responded :(
You don't even need to write a proper google search or prompt
It just fills your code
Yea
Can't you just buy the subscription?
All you have to do is give a function the actual name of the function and it fills the function
You can but im trying to go through my uni account coz i might be able to get it for free
Oh, right
In my uni it's only available for the Engineering, Comp Sci and Stats students
im neither so i have to send a personal request
I thought they would give out licenses to every student, that's surprising
Honestly with the amount of python on my course i expected it too
But GOOD NEWS I got pycharm pro
so thats good at least
Like I'm pretty sure the free azure credits is available for students regardless of major
Azure credits?
is this for VS?
it's not, Pandas has switched to Arrow internally
What does pycharm provide that VSCode doesn't?
Pandas' Syntax is godawful though. It's the only data frame library I've used that feels wrong and I've used several in multiple languages.
For cloud services
I have no clue, i havent really used any IDE's apart from python IDLE and Pycharm
so i just grew up with pycharm
Polars is a better option, both performance and syntax wise, but it definitely doesn't tie in as well with the data ecosystem
used it for all my coding career so far
No exactly, but pandas does support both numpy and arrow backends... but there are many workflows that require numpy... so I'm more curious what happens longer term with the broader community.
Pandas will remain king because of sunk cost
I didn't expect pandas to be such a controversial topic lol
But personally I'm not writing any new code in Pandas
Unless I really have to for compatibility reasons
Quick question, i had an issue now where my training on Google colab stopped after idling with T4 graphics card
Was training a massive CNN but at the point it wasn't using T4
Do you guys know what the cooldown is?
I am using free Google colab
eh it's really preference. I prefer to reinvent the wheel half the time and try to do everything in python so there are zero dependency concerns (ugly bloated code, but I like it). There r tons of libraries that all do the same thing, all just comes down to preference
Lately, I've just been gutting pandas code and replacing with duckdb sql, it's been much cleaner and more flexible. but, I love sql.
Polars does look cool, I might try it out for personal projects
not completely. it's still provisional and numpy backend will not be removed
more verbose than pandas but worth learning. main downside right now is lack of indexes and devs hostility to them
Polar bears are more deadly than pandas
what disadvantages does this come with? just the lack of natural joins?
this has convinced me to use polars
filtering on things other than the sorting key
I don't really understand - filtering is much nicer in polars
Lack of indexes is amazing
They're horrible
do you have example of what you mean?
Is there something equivalent to query in polars?
Anyone maybe has experience with this? 😅
you can get an SQL context, or you can just chain filters/selects. you don't really need it
Makes sense
I've been wondering: does anyone know the technical reason xlsx files are so slow to load on python/pandas?
Don’t know your specific issue and never benchmarked this in pandas, as they’re acceptable for me, but: xlsx files are zips of xml files, which tend to be large and slow to parse for very large datasets. Plus the shared strings are stored in a separate table on the side, which leads to a lot of lookups.
is RNN a viable model for anything? or Transformers have been the norm?
Am I wasting time and effort in deep diving RNNs?
afaik, RNNs are still relevant for time series data
But other than that, it's mostly still a hot topic for research because there is a lot of untapped potential
@civic elm should I do khan linear and stats or do Coursera on the math 🤔
hard to beat MIT 18.06 for an intro to linear algebra
because the xlsx format is relatively slow to parse. if you used the underlying library (openpyxl) and did it yourself with for loops, it'd be just as slow or slower.
Do model inferences get slowed down when you do A, B, C, D in a cycle? Do I instead use multiprocessing and queues to speed this up, or will it just cause resource starvation?
MIT that good for linear algebra 
dr strang is a legend
what do you mean by "A, B, C, D"?
Model inferences
A is face detector
B is face recognition
C is magik
D is more magik
unless all the models are being trained simultaneously like in a GAN, i would not try to train them all simultaneously
that seems like a giant waste of effort to get it working right, and it doesn't seem like you actually get anything out of it. if anything it's worse because you can't adjust the training processes individually
Not train, its for inference only
Wouldn't be too much work probably https://stackoverflow.com/a/45231035
@desert oar btw i wanted to ask whether yunet would be a decent enough replacement for retinaface.
Its built right into opencv so no extra deps
i wouldn't know, i have no practical experience with facial recognition in particular
is that because you're a lump of rock with a light bulb inside you?
i see. does B input require A output? or are they all completely independent?
yes, the entire concept of a "face" as you humans understand it is alien to my species, and i still have family members who don't really get it
so i try to stay away from those problems at work. no good intuition for it
Fascinating
Yes
🤔
Its a pipeline
so you need the output from each step as the input to the next step?
The question is whether I should operate it concurrently, in batches or serially right now?
i see, you are asking about running the pipeline on multiple images
that's actually a very good question
Its actually a video
But video no good for my model
Need to break it down to frames
i see, is the model pipeline sequential across frames? or can you batch/chunk/re-order arbitrarily and it won't change the results?
Uhh i don't think it depends on previous results, if that's what u mean
okay. instinctively i would imagine that if i need to analyze something in a video i would want some window of past frames as input to my current inference. that would limit how you can run this pipeline
Fyi its YUNet > SFace > INSwapper > (some ONNX upscaler that opencv can run, not decided)
i think in general the answer to this question is why ML engineers get paid the big bucks. but i think the short answer is that it depends on what hardware you have available and how the models are implemented.
its not like that, but i dunno if the model internally does some stuff like that
if the underlying implementation is already multithreaded/parallel, you can probably do ok by running it serially. you wouldn't want to combine that with parallelism in your application because everything will get gunked up
I am super new (3 days old) to this stuff
you would know if it did, i think. it sounds like you're planning to use pre-trained off the shelf models to analyze each video frame as a separate image, so it's probably not a concern
Yea
you might want to check the opencv docs to see if it says anything about threading or parallelism
it's very easy to run into situations where parallelization actually slows things down because your program is spending too much time sending data between processes
The bigger concern here is that I don't run into resource starvation
Most likely RAM / VRAM
Serial pipeline won't cause it
so if opencv has a way to run each inference in threads or processes, i recommend starting there and benchmarking
But if I parallelize it, what little control do I have over how much RAM the model chooses to eat?
eg numpy's linear algebra is already multithreaded through its BLAS backend (OpenBLAS or MKL)
probably just inference batch size
i would start by just running everything serially and profiling + looking into threads/processes within opencv
Yea I'd need to run the serial pipeline with multiprocessing + queue anyways because it shouldn't block the gui
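A rough sketch of the "serial pipeline off the GUI thread" idea, assuming the stages can be called one after another per frame. It uses a worker thread rather than multiprocessing to keep the example short, and `detect_faces`, `recognize` and `swap` are hypothetical stand-ins, not OpenCV or ONNX calls. The bounded input queue is one way to cap how many frames sit in RAM at once.

```python
import queue
import threading

# Hypothetical stand-ins for the real stages (detector, recognizer, swapper).
def detect_faces(frame):
    return ("faces", frame)

def recognize(faces):
    return ("ids", faces)

def swap(ids):
    return ("swapped", ids)

STOP = None  # sentinel that tells the worker to finish

def worker(frames_in, results_out):
    """Run the stages serially for each frame, away from the caller's thread."""
    while True:
        frame = frames_in.get()
        if frame is STOP:
            results_out.put(STOP)
            break
        results_out.put(swap(recognize(detect_faces(frame))))

frames_in = queue.Queue(maxsize=8)  # bounded: the producer blocks instead of piling up frames
results_out = queue.Queue()
thread = threading.Thread(target=worker, args=(frames_in, results_out), daemon=True)
thread.start()

for frame in range(3):  # integers stand in for decoded video frames
    frames_in.put(frame)
frames_in.put(STOP)
thread.join()

processed = []
while True:
    item = results_out.get()
    if item is STOP:
        break
    processed.append(item)
```

A real GUI would poll `results_out` from its event loop instead of joining the thread. Whether threads are enough or separate processes are needed depends on whether the inference calls release the GIL, so that part is worth benchmarking.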
which makes me wonder, how does gradio achieve non blocking UI when everything is happening in the main thread?
are you sure everything is happening in the main thread?
That's what I have seen in many gradio apps
Or maybe they do use something after all?
I have definitely not seen any code using queues
is plotly optimal for large datasets?
idk if consumes much memory just to make a chart
optimal? I dunno if anything is optimal, but I do some large stuff with it.. but for complex diagrams, make sure to statically render rather than interactive.
matplotlib is still the baseline everything is compared against. Generally, for large datasets, the first step should be reducing the complexity of the plot, whether from quantizing/sampling/aggregation/smoothing/whatever
How do u mathematically determine whether a distribution is skewed or not
I’ve tried plotting the histogram, and it looks skewed but I heard that median is better than mean for skewed graphs and so I tried comparing the mean and median results and the median result actually gave a result that leaned more towards the skew
Does that mean my data isnt actually skewed?
skew and kurtosis
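For a concrete number rather than eyeballing the histogram, a short sketch with pandas on made-up values with a long right tail. `Series.skew()` gives the sample skewness (positive means a longer right tail) and `Series.kurt()` gives excess kurtosis.

```python
import pandas as pd

# Made-up sample with one large value, i.e. a long right tail.
values = pd.Series([1, 1, 2, 2, 3, 3, 4, 20])

skewness = values.skew()         # > 0: right-skewed, < 0: left-skewed, near 0: roughly symmetric
excess_kurtosis = values.kurt()  # tail heaviness relative to a normal distribution
mean = values.mean()
median = values.median()
```

Here the mean (4.5) is pulled toward the tail while the median (2.5) stays with the bulk of the data, which is why the median is the more robust fill value for skewed columns.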
Oh I get it now. Imputing with median is not meant to address the skew. It’s just supposed to make the imputed value more robust to it
Does anyone know if there is a straight-forward method of adding hover-text to a seaborn generated line plot? The plot is busy enough that a legend is not useful.
I've experimented with all of the major plotting libraries, and Id rather stick with the seaborn/matplotlib ecosystem if possible
Hi, this may be the wrong channel but, generating an image with pytorch takes a very long time, with CUDA enabled:
with autocast("cuda"):
image = model("An image of a hand with a ball of ice levitating above it.")
This takes about 4 minutes with my RTX 2070S
Is there something I need to enable in windows11?
hello everyone, I'm looking to try and make an app for cameras in vehicles so that it can immediately detect and count the amount of passengers inside. Are there known examples of this that I can study from? I'm very new to AI/ML, I only managed to make a custom YOLOv4 model to do palm oil fruit classification deployed in an android app (I put the model as a .tflite inside the app itself) recently. I'm thinking can I use YOLO models for this passenger detection & counting? And if I want to have the AI model to be in a web API/cloud to be consumed through a website, are there examples on how to do that?
it's very typical to deploy a model in the cloud like you describe. often you can just do it with a basic web framework like fastapi or flask. but there are also platforms that can do it for you
can you mention some of those platforms?
and another question. How's everyone's opinions on ML.NET?
sagemaker can do it for example. or mlflow
MLflow gives you many features (and complexity!) that you may (or may not) need
For the easiest case I'd start out with a simple container running your model with sanic / fastAPI, maybe CI/CD to easily update the model
I'll try that. thanks
since there's a lot of YOLO models now, which one is the best one for detecting and counting just one type of object?
I made a simple neural network to predict y = 2x + 1
but the output is off by 0.002 
what is the reason for this?
the dataset contains no noise? ie: every entry in the dataset is exactly y = 2x+1 or there is some error value?
since it's easier than having multiple objects etc, what I did was take yolov8n and swap the head for something simpler
Hi everyone, I am facing this error in the code line 144 "IndexError: index 2 is out of bounds for axis 1 with size 1" the code is https://paste.pythondiscord.com/CT5A
there's no noise in the dataset
```py
import numpy as np

def gen_data(start, stop):
    # builds y = 2x + 1 exactly, with no noise
    x = np.array([])
    y = np.array([])
    for i in range(start, stop):
        x = np.append(x, i)
        y = np.append(y, (i*2)+1)
    return x, y

X, y = gen_data(1, 100)
```
and what is the architecture of the model?
I'm pretty new to AI/ML, so how did you change the head? If you don't mind explaining
if you didn't already know, the way you've written this code is incredibly inefficient
I pulled the yolo repo and wrote a new model in pytorch using the yolo backbone and the new head. but if you just need yolo you can use it directly. in my case I had some issues with inference time and overheating, so I needed a lighter model.
```py
import tensorflow as tf
from tensorflow import keras

model = tf.keras.Sequential([
    keras.layers.Dense(units=1, input_shape=[1]),
    keras.layers.Dense(units=5),
    keras.layers.Dense(units=10),
    keras.layers.Dense(units=5),
    keras.layers.Dense(units=1),
])
```
I didn't really think much about it, just messing around

i just realized that np.append returns a copy
right I really have to learn about how to improve inference time too, lots of homework. Thank you
i'm working on yolo for an edge device so that's important for me
can you share the whole code? i'm not seeing where this is going wrong
maybe i should use a regular python array and cast it to a numpy array when returning?
a list is not a "python array"--it's a list.
you could also do this
x = np.arange(start, stop)
y = (x * 2) + 1
oh ok thanks
So lets say I've made some basic neural network - and now I wish to download the model. Am I essentially downloading the (now adjusted) weights and biases of* the currently trained model?
download it? where from?
download, or just save?
I think save is a more appropriate word
"The thing that will enable me to continue training it later" maybe?
if you save the model, the information that gets written to your hard drive is some representation of the weights and biases, yes.
depending on your requirement you may also choose to save the state of the optimizer (to pick up where you left off) as well as model configuration
I think I've never seen a "downloadable model", only the weights and biases. Then you have to rebuild its architecture in your code 
if you save the model configuration as well then you don't have to, for example .h5 models
saving just a state dict for the parameters only is quite useful tho, so it's popular
And I could simply feed the machine those saved weights and biases, instead of the initial, random weights and biases?
Ah, I haven't gotten much into optimizers yet. So I'm not sure what this means
I'm just trying to figure out how I should construct my program - I'm just building a simple neural network with NumPy as a starting project
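For a NumPy-only network, one simple way to "save the model" is to write the weight and bias arrays to a single `.npz` file and load them back in place of the random initial values. A minimal sketch, assuming a made-up two-layer shape; the names `W1`, `b1`, `W2`, `b2` are illustrative.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a tiny 1 -> 4 -> 1 network.
params = {
    "W1": rng.normal(size=(1, 4)),
    "b1": np.zeros(4),
    "W2": rng.normal(size=(4, 1)),
    "b2": np.zeros(1),
}

path = os.path.join(tempfile.mkdtemp(), "model.npz")
np.savez(path, **params)  # one file, one named array per parameter

# Later, possibly in another run: start from the saved values instead of random ones.
with np.load(path) as saved:
    restored = {name: saved[name] for name in saved.files}
```

The architecture (layer sizes, activations) is not in the file, so the code that rebuilds the network has to agree with the shapes that were saved.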
is there a way I can avoid doing a full scan on a pandas dataframe when filtering?
I want to get the first 5k filtered rows (when there are 500k), I dont want pandas to keep filtering the dataframe once it found 5k rows, is there a way to do this? and are there any alternatives?
need more context before i can comment on this.
what kind of context you want to know?
what is the data that you are filtering on?
what kind of "filter" is it?
what is your data's cardinality?
why first 5k?
how often do you need this?
does data change?
what performance do you have now and what do you expect?
hmmm thats alot of questions, my question is fairly simple, can I do this:
from more_itertools import take
big_data: Iterable[...]
filterd_generator = (x for x in big_data if predicate(x))
print(take(5000, filterd_generator))
instead of:
filterd_list = [x for x in big_data if predicate(x)]
print(take(5000, filterd_list))
you see the difference?
pandas does the latter.
I want to do the former
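In plain Python the early-stopping version is easy to get with the standard library: `itertools.islice` stops pulling from the generator once it has enough items. A sketch with a made-up predicate that records its calls so the early stop is visible.

```python
from itertools import islice

def first_n(iterable, predicate, n):
    """Return the first n items that satisfy predicate, without scanning the rest."""
    return list(islice((x for x in iterable if predicate(x)), n))

calls = []  # records every value the predicate was asked about

def is_even(x):
    calls.append(x)
    return x % 2 == 0

result = first_n(range(1_000_000), is_even, 3)
# result is [0, 2, 4] and the predicate ran only for 0..4, not for a million values
```

This works on any iterable of rows, but it gives up pandas' vectorized comparisons, so it only pays off when the predicate is expensive or matches are found early.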
i don't ask question just for the sake of asking question, it's all for the ultimate goal of helping you.
if you simply require an answer to that, then no, you can't do that as far as i know.
I don't know of a way to do that in pandas (but doesn't mean it cant be done) without some sort of iteration, and iteration is generally an antipattern with pandas.
thanks, thats what I wanted to know 🙂
You could map, for instance, accumulate and perhaps throw an exception when bucket is "full"
Or, perhaps do the filtered list over smaller windows of the data
and what if the sliced window dosent have enough to fulfill the 5k threshold
(which is just an example, but it could be any number)
you'd just keep moving the window until you fill.
ie: check first 1million, then next 1 mill, etc
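The moving-window idea can be sketched in pandas like this: apply the vectorized filter to one slice at a time and stop as soon as enough rows have been collected. `first_n_matching` and its parameters are hypothetical names, and `predicate` is assumed to take a DataFrame chunk and return a boolean Series.

```python
import pandas as pd

def first_n_matching(df, predicate, n, chunk_size=100_000):
    """Filter df window by window and stop once n matching rows are found."""
    parts = []
    found = 0
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        hits = chunk[predicate(chunk)].iloc[: n - found]  # never keep more than needed
        parts.append(hits)
        found += len(hits)
        if found >= n:
            break
    return pd.concat(parts) if parts else df.iloc[0:0]

# Tiny made-up example: first 5 even values, scanning 4 rows at a time.
data = pd.DataFrame({"x": range(1000)})
subset = first_n_matching(data, lambda c: c["x"] % 2 == 0, n=5, chunk_size=4)
```

If a window does not contain enough matches, the loop simply moves on to the next one, and if the whole frame has fewer than `n` matches it returns what it found.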
I see
I'm thinking of using Polars with their lazy API theyre supposed to have this functionality
but I dont want to add that dependency to my project
but, what kind of condition do you have where this is important? Like, df['col'] == something is not an expensive operation, and df[condition].head(5000) is only returning the first 5000 rows
(to be specific: I don't know where you could do df['col'] == something but only return the first 5000 indices that match the condition in a single operation)
col.isin(list_of_vals) for 3 columns, and a very expensive: df.col.apply(lambda lst: my_set.intersection(lst))
the latter is for a filter, not creating a new column
for what it's worth, what I do is:
```py
import duckdb
filtered_df = duckdb.execute("select * from df where col in (?) limit 5000", [list_of_vals])
```
but, I'm a sql guy.
duckdb has got to have a query optimizer
indeed.
tag me if you have any query questions, this is my jam.
I do actually
if a column is a list of integers, can you actually check that each row is a subset of another list of strings?
this is the sort of things that i wanted out of you, because without this it's pretty hard to point out alternative ways for achieving the same thing in a hopefully more efficient way.
Can you give an example? I don't follow.
yeah its a bit confusing:
df = pd.DataFrame({'col': [(1, 2, 3), (1, 2), (1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 5)]})
# I want to find all the rows that contain: 1, 2, 3, and 4.
to_keep = {1, 2, 3, 4}
>>> df['col'].apply(lambda x: to_keep.issubset(x))
Out[150]:
0 False
1 False
2 False
3 True
4 False
Name: col, dtype: bool
mask = df['col'].apply(lambda x: to_keep.issubset(x))
df = df[mask]
>>> df
Out[154]:
col
3 (1, 2, 3, 4)
This is my preferred approach (given your statement that pandas is too slow for your filter/limit):
```py
import duckdb
import pandas as pd
df = pd.DataFrame({'col': [(1, 2, 3), (1, 2), (1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 5)]})
duckdb.execute("select * from df where col = ? limit 1000", [(1,2,3,4)]).df()
```
if looking for set intersection, need to get a little cleverer (function added to make this simpler):
```sql
CREATE OR REPLACE FUNCTION "@>"(haystack, needle) AS (
  select c == len(needle)
  from (select count(*) c from (SELECT UNNEST(haystack) INTERSECT SELECT UNNEST(needle)))
);
select col, col @> [1,2,5] b from df where b = True
```
note that pandas does the latter because it doesn't have a lazy query engine or a query optimizer. note also that you sometimes need/want to just use plain python instead of doing everything inside pandas. python for loops can be reasonably fast if you build them carefully.
i am gathering that you have an array/list-valued column, and you want to find the first row where the array/list contains some certain values?
the right solution definitely depends on how much data you have, memory vs. cpu constraints, etc. but that duckdb unnest operation above looks very elegant
you could also consider re-encoding your data as an integer bitfield and using binary &. that's a good leetcode trick for lookups on fixed-size sets
to be honest I dont understand how it works, but it looks interesting, I will test it tomorrow
just out of curiosity, in the line:
select col, col @> [1,2,5] b from df
what do the numbers 1, 2, and 5, reppresent?
I like slr's bitfield suggestion, that's probably going to be the optimal approach tbh
[1,2,5] is your to_keep list
oh so I can change that. and I can add any number of items in the array, correct?
yup
got it thanks
as I learn more, I realize that pandas is very inefficient for this kind of filtering, and that's why I think a simple Python generator expression might be faster, although I haven't tried it yet
not just the first row, but the first N rows
I'm not familiar with whatever you mentioned, would you be able to link some example?
the only optimization that I did was dictionary encoding, so instead of having sets of strings, I now have sets of integers, which are lighter
sounds like the issue is more about your data modelling than pandas itself?
- you should not store tuples, lists and other arbitrary objects in pandas dataframes
- you should avoid using `apply` as much as possible
seriously, don't go around complaining about pandas performance if you're using apply(). That alone kills any benefits you might hope to get from using pandas.
if I only had a cent for each time somebody said that, I would be very wealthy
but seriously, whats the alternative?
and about the previous thing you mentioned about not doing a full lookup: Yeah, pandas is not the right tool for that job
^
ehm coughs in pandas performance
pandas is good for medium sized datasets
if it's large enough to justify stopping early on the example case you gave, you might as well use an actual database instead
Imagine you used a bit mask to represent your tuple: where each bit represents a unique member of the tuple. So, 0001 means (1) and 0101 means (1,3), got that part?
pandas is good for ~~medium~~ small sized datasets
ok?
Then, array & mask == mask means that all mask entries are in the input array
And this is a very efficient vectorizable operation
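A small sketch of the bitmask idea using the tuples from the earlier example. It follows the convention described above (value `v` sets bit `v-1`, so `0101` means `(1, 3)`); `encode` is a hypothetical helper, and with `int64` this only works for values up to 63.

```python
import numpy as np

def encode(values):
    """Pack small positive integers into one int: value v sets bit v-1."""
    bits = 0
    for v in values:
        bits |= 1 << (v - 1)
    return bits

rows = [(1, 2, 3), (1, 2), (1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 5)]
encoded = np.array([encode(r) for r in rows], dtype=np.int64)  # done once, up front

mask = encode({1, 2, 3, 4})              # the to_keep set
contains_all = (encoded & mask) == mask  # one vectorized pass, no apply()
```

Only the row holding `(1, 2, 3, 4)` passes, matching the `issubset` result shown earlier. The encoding step is still a Python loop, so the win comes from doing it once and running many cheap lookups afterwards.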
@left tartan I'm not following you at all, can you link some tutorial or something that I can read up on how this stuff works?
I don’t know off the top of my head any tutorials, just a quick google shows this one perhaps https://towardsdatascience.com/understanding-bitmask-for-the-coding-interview-b1643f4b0e24
(Altho I hate that site for requiring account)
I miss the days bypass paywall actually worked on that site
in Pandas, if i've got a column with numbers, 0's and NaN's, and i want to replace all the numbers with 1's, leave the 0's as 0 and leave the NaN's as NaN, can i just replace it like df['Boo'] = [1 if df.loc[i, 'Boo'] > 0 else 0 for i in df.index]
or would that also changes the NaN's
i think it's reasonable to have array-valued data in general. pandas however does not have any optimization for it
storing this data as sets might also help
df = pd.DataFrame({'things': [{'a', 'b'}, {'b', 'c', 'e'}]})
important_things = {'q', 'c'}
df['has_important_things'] = df['things'].map(lambda s: bool(s & important_things))
there are a couple of things going on here. yes, pandas has no support for "partial" or "lazy" filtering, and yes i suspect that a plain python loop might be faster (which you can implement in a single pass).
no, if and else are not "vectorized" over pandas series
@umbral charm
boo_is_notna_notzero = df['Boo'].notna() & (df['Boo'] != 0)
df.loc[boo_is_notna_notzero, 'Boo'] = 1
Yea tried it and failed miserably
But i found this solution df['Boo'] = pd.notna(TSLA['Boo']).astype(int)
doesnt change the NaN's
took me a good 4 mins to realise not to use the & symbol but and instead
did you see the code i posted directly above?
notna returns True if the value is not null, and False if null
so that will set both 0 and 1 to True, then astype(int) converts True to 1
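To make the difference concrete, a sketch on a made-up `Boo` column comparing the mask assignment with the `notna().astype(int)` shortcut. Note that element-wise logic on a Series needs `&`, since Python's `and` does not work element by element.

```python
import numpy as np
import pandas as pd

# Made-up column: numbers, zeros and a NaN.
df = pd.DataFrame({"Boo": [3.0, 0.0, np.nan, -2.5, 0.0]})

# Mask approach: only non-null, non-zero entries become 1; 0 and NaN are untouched.
nonzero = df["Boo"].notna() & (df["Boo"] != 0)
masked = df["Boo"].copy()
masked.loc[nonzero] = 1

# notna().astype(int) only answers "is there a value?": zeros become 1 and NaN becomes 0.
flags = df["Boo"].notna().astype(int)
```

`masked` ends up as 1, 0, NaN, 1, 0 while `flags` is 1, 1, 0, 1, 1, so the shortcut is only equivalent if the column has no zeros and the NaN's are allowed to turn into 0.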
Thank you!
I want to put my own dataset into this recommendation system. anyone know how i can replace the classic movie lens dataset?https://github.com/microsoft/recommenders/blob/main/examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb
Check the format of the input data and transform your own dataset to fit that format
Suppose we have a dataframe with a column that has True's and False's. i want to know the index of all the True values. However, if there is more than 1 True together, i only want the index of the first one. how would i do this? I'm so lost on iterating through columns
Can you share an example of what you mean?
i have a column called 'Boo' in a dataframe called df. this column Boo is full of True and False values (Boolean) but mostly Falses (90%). i want to find out at what index the True values occur, but if there is more than 1 True value together (so like True on index 95 and True on index 96) i only want to return the first True index (in this case 95)
i cant figure that out. can i dm?
My DMs are not open sorry
Would this still leave The True values by itself alone?
or would it drop themm too
It would just leave 2 rows, one with true and one with false
And it keeps the first instance of both
Alternatively if you really just want that one true index, you can do df.query("Boo == True").index[0]
I was going to suggest something like: a cumsum() of boo == False, then grouping and computing cumcount, and then keeping only those where cumcount()==1 (eliminating any runs)
I see, but i dont think its what im looking for
my dataframe only consists of 1 column and 1 index
i understand. it looks like the dataset is being imported from another folder. but this folder isnt part of my download. can i import another way?
from recommenders.datasets import movielens
i want another column where the True values are copied over, but if there is more than 1 True value together, i want it to be True only at the first index of that run
I'm not sure I understand, it would be something like None,... Until index 95 where it's equal to 95 and then 95 at index 96 because it's still true?
Have you installed the recommenders library?
It seems to be a dataset that is part of the library
Example:
1 False False
2 True True
3 False False
4 True True
5 True False
6 False False
7 True True
8 False False
9 True True
10 True False
11 True False
12 False False
no the dataset i want is not a part of that library
its not from the same project. this is the example from the project:
from recommenders.datasets import movielens
Referencing the notebook, it looks like the format is userid itemid rating
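A minimal sketch of getting an interaction log into that three-column shape, assuming pandas and assuming that the number of times a user touched an item can stand in for a rating (that is an assumption for illustration, not something the notebook prescribes). The raw column names `user` and `item` and the sample rows are made up; `userID`, `itemID` and `rating` are the names assumed here for the target format.

```python
import pandas as pd

# Hypothetical raw log: one row per interaction, no explicit rating.
events = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u2"],
    "item": ["a", "b", "a", "a", "c"],
})

# Count interactions per (user, item) pair and use the count as the "rating".
ratings = (
    events.groupby(["user", "item"])
    .size()
    .reset_index(name="rating")
    .rename(columns={"user": "userID", "item": "itemID"})
)

print(ratings)
```

Whether a count is a sensible rating depends on what an interaction means in your data, so treat this as a starting point only.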
You see how it's basically the same columns until there come consecutive Trues in the first column, in which case i only need the 2nd column to produce one True for the start of that consecutive run
That could work
So my solution is; calculate cumcount() over False. Then cumcount over Trues for each group from step 1. Then eliminate any count > 1
but wouldnt that just make like
True True
True False
True True
True False
if there were 4 together
Idk what my max True's are consecutively
True true = false
True false = false
False true = true
False false = false
Not A and B
Oh easier; just drop where lag() == True and Val == true
lag and val? what r these
(Lag=shift)
I just mean; compare boo to previous boo, using shift. Like df['boo'] == df['boo'].shift()
I have to say, I appreciate boo over foo
And maybe & df['boo'] otherwise you’d drop consecutive falses
I would just use series.diff() == 1
I’m assuming they want to eliminate runs, not just single changes
.diff() would be
False => False : 0
True => True : 0
True => False : -1
False => True : 1
their (original) question was identifying where it goes from False => True
That could work
It definitely does
sounds like at some point you tried to remove entire runs of 2+ consecutive Trues?
mhm
max 11 consecutives
how can i achieve this format if i dont have a rating?
i think the dataset i have has the user and item id's, however
Op said more or less: more than one true together. Maybe there were two questions in thread tho
What kind of information do you have other than that?
Does it represent clicks? Or buying?
tweets dataset
Did this and something fishy happened
https://gyazo.com/e59f20d99d51bfb073ee597054733a92 this is with my df
this is with your series.diff
https://gyazo.com/511dbeb486e7fe49568cf2099a253920
did you do == 1 or != 0
== 1
Does the link represent a user posting?
show what exactly you did? (code)
this is where i got it https://ktype.net/wiki/research:articles:progress_20110209
it looks like weights are calculated
and i think thats the rating
oh wait .diff() with bools seems to be just XOR
sorry, you'll have to .astype(int).diff() instead of just .diff()
you can specify np.int8 instead of int if you want
Or you can add an and with the same column if you want to stay in full boolean for some reason
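A small sketch of both suggestions on the example column from earlier, assuming pandas and a boolean Series called `boo` (the name and the 1-based index are just taken from the example). The first version stays in booleans by comparing each value with the previous one; the second casts to int before calling `.diff()`.

```python
import pandas as pd

boo = pd.Series(
    [False, True, False, True, True, False,
     True, False, True, True, True, False],
    index=range(1, 13),
)

# Boolean version: True now, and not True on the previous row.
prev = boo.shift(fill_value=False).astype(bool)
run_start = boo & ~prev

# Integer version: a False => True step shows up as a difference of +1.
# Note: this one cannot flag a run that starts on the very first row,
# because the first difference is NaN.
run_start_diff = boo.astype(int).diff() == 1

starts = run_start[run_start].index.tolist()
print(starts)
```

To keep the result next to the original column, something like `df['start'] = run_start` would do it.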
Works like a charm holy shit
How do you guys think of these
seen similar problems a few times in the past
all you people seem like proper smart
if you haven't yet, check out the pandas User Guides and take a look over all the different functions in the documentation, or at least ones that catch your eye
be working for apple or some shi
I don't have the time to read the full article but I'm pretty sure this is an n-gram representation, not a recommender dataset
StackOverflow is also pretty useful if you know how to search effectively
I love stackover flow, apart from the fact answers are a decade old, they still work somehow.
Surprisingly enough, they often edit their answer to correct it if a new version breaks it
Hey fellas, quick question about gradients:
I'm using MSE as my cost function
Now I'm trying to calculate the gradient, but I'm at a bit of an intuitive crossroad:
On one hand, the gradient should consist of all of my weights, each one being its own variable (So in my case of a 28x28 image, 784 variables)
On the other hand, the gradient of MSE is just:
2/n * (prediction_vector - target_vector)
And my prediction has 10 variables.
What am I missing?
you should just about never calculate the gradient yourself, but rather leave it up for the library you're using to determine it for you (pytorch, tensorflow, jax etc)
This is on purpose
take a look at https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html - while being focused on torch, it explains the concept in general
And also not entirely the point of the question - there's clearly a knowledge gap here
I've read this
backpropagation takes the gradient of the loss with respect to the output of an operation and propagates it back to the operation's inputs
I know...
the gradient with respect to what?
Pardon? I'm not sure I understand
Gradient descent is what I'm after
yeah, you want the vector of partial derivatives with respect to each parameter in your model
So, if I have 784 pixels as my input, and 0 layers (e.g., just input/output) - would the relevant gradient be a column vector of size 784?
Well, just a vector of shape (1,784) or whatever.
yes, if you're treating them as 784 individual features
using the chain rule + rearranging terms to get the usual backpropagation formula
which loss function you are using does not influence this part at all btw
+1
yes, but specifically the gradient with respect to the parameters of the model
I'm not sure what "with respect" means in this context
the loss function is usually something like loss(prediction(parameters), data)
so you need the chain rule to get at the gradient with respect to the parameters
def MSE(prediction, ideal): is what I have
it means that's what you're treating as the "input" to your function
i'm talking about the math, forget the code
Oh, apologies
if you have an expression like f(x,y,z) = ax + by + cz then you're implying that x,y,z are the "inputs" to the function
Right
so the gradient of the function with respect to x,y,z would be the vector of partial derivatives with respect to each input
Right
but you could also talk about the gradient with respect to a,b,c, reversing the roles of x,y,z and a,b,c
Huh, the scalars?
it's just math jargon to specify which variables are "inputs" and which aren't
I mean if you calculate the gradient relative to the scalars you'd just get nonsense no?
let's assume they're all scalars, and no
a and x are identical here except that one is treated as an "input" and the other is treated as given
but you can just swap the symbols
when you say "with respect to", it's telling me which symbols represent "inputs"
Oh, so if you "calculate a gradient relative to a,b,c" are you assuming those are the inputs now?
right. so if i have an expression like loss(prediction(parameters), data) the gradient with respect to prediction(parameters) is different from the gradient with respect to parameters
i think you're thinking you need the former, but you need the latter
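To make the former/latter distinction concrete, here is a hedged NumPy sketch for the no-hidden-layer setup from the original question: 784 pixels in, 10 outputs, MSE loss. The weight matrix `W`, the random input `a` and the one-hot target `y` are made up for illustration. The gradient with respect to the prediction has 10 entries, while the gradient with respect to the parameters has one entry per weight, and the chain rule links the two.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(784)                  # input pixels (fixed data)
y = np.zeros(10)
y[3] = 1.0                           # one-hot target (fixed data)
W = rng.normal(scale=0.01, size=(10, 784))  # parameters we optimize


def loss(W):
    """MSE between the linear prediction W @ a and the target y."""
    return np.mean((W @ a - y) ** 2)


p = W @ a                            # prediction, shape (10,)

# Gradient of the loss with respect to the prediction: 2/n * (p - y).
grad_p = 2.0 / p.size * (p - y)      # shape (10,)

# Chain rule: dL/dW[i, j] = dL/dp[i] * dp[i]/dW[i, j] = grad_p[i] * a[j].
grad_W = np.outer(grad_p, a)         # shape (10, 784)

print(grad_p.shape, grad_W.shape)
```

So the `2/n * (prediction - target)` formula is only the first link of the chain; multiplying by the input gives the per-weight gradient that gradient descent actually uses.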
Can you maybe explain what prediction(parameters) is exactly? I might be missing the point here
thinking about this properly kind of requires you to flip around what the "inputs" are
the prediction that you produce is literally a function of the parameters of the model + the data
Right
i guess i should write it loss(prediction(parameters, x), y)
loss being our loss function I assume?
yes
Just trying to make sure I understand you
thinking about this as an optimization problem requires you to flip around what you understand the inputs to be
the optimization problem treats the data as fixed. x and y are handed to us as-is and do not change.
Right
we are now interested in finding the parameters (model weights, coefficients, whatever) that minimize loss
that is, we are minimizing the loss as a function of the parameters
so when we talk about the gradient of the loss function, we are talking about the gradient of the loss function with respect to the model parameters
Again, "model parameters" = weights/biases?
yes
also i'd suggest maybe start with something simpler than images. imagine linear regression on a dataset like iris. 4 features, 1 response variable, all continuous numeric data.
forget even the notion that linear regression is a special case of a "neural network". just think about minimizing loss as a smooth differentiable function of some parameters
Yeah I do somewhat regret not going with something simpler, but this has been a learning experience all-in-all
It'd be a rather shame to stop now as I've sunk a few hours into this now
Ah, this is the basics as far as I'm concerned
it's something i never had the discipline to do when i was younger and i'm still paying for it 10+ years later
Fair enough. I'll keep this in mind
Can I perhaps type out a concrete example (relating to what I'm trying to program) to see if I got the memo? I'll make it brief.
maybe? but it seems more like a matter of understanding the math than writing out code
Oh no, I'm not talking about the code
But I'm not entirely sure this is a math issue either. Intuitively this does make sense to a certain degree, and I do understand what you're saying
I'll try to type it out
Try a single input and output (no hidden layers). Can code that without any libraries. Then 2 inputs, 1 output.
See if it can learn some logic gates.
i was suggesting iris but that'd work too
coming from a (social) science background i find the idea of learning logic gates abstract and very unlike anything i'd expect to encounter in real work
So I have my loss function, which takes 10 outputs and does a nice little trick with them to calculate the loss itself:
f(x_1, ..., x_10) = MSE(x_1, ..., x_10)
But each of those x values were given to me by another function:
x_1 = w_1*a_1 + w_2*a_2 + ... + w_784*a_784 (a representing the pixels here, w the weights)
Is this correct so far?
10 outputs? you have 10 images, or something else?
This sounds complicated for some reason
a single image that's 28x28 and 10 possible numbers (The image represents a digit, so something between 0 and 9)
ah, mnist
Right
why are you using MSE on this?
I'm simply following 3blue1brown's video
I used RMSE but I was too lazy to calculate that gradient
Rather, a watered down version* no layers for now
it's just one step in the chain rule. i'd suggest working through it. that's essential and worth drilling until it's natural.
Just input, weights, and output
I'm not even sure what the bias does here
yeah i think this needs to be dialed back
from what i remember that 3b1b video is meant to be illustrative and relatively nontechnical
Right
Nevertheless, I wanted to see if I understood what was being said - and generally speaking, considering the fact that I've gotten rid of the layers, I figured this should be fairly simple
And yet, there's a lot of gaps in my knowledge
there are still some complicating factors here that i'd like to strip out
Can you perhaps explicitly type what I should do? There's a bit of a language barrier problem here (for me) I suspect
so let's dial it back to something simpler. imagine a single continuous output like body mass, and 3 inputs: height, waist size, and chest size.
yeah. let's say we are interested in whether we can determine body mass from those 3 measurements
Alright
so we propose a simple model of the form y = b1*x1 + b2*x2 + b3*x3 + b0, where the xs are the 3 measurements and y is body mass
this is the standard linear regression model. among many many other things, we can interpret it as 1 input layer and 1 output layer.
Right, similar to what I'm trying to do
it just sets the y intercept
Like a +C with antiderivatives?
that's what the machine learning people call "bias" because it kind of resembles bias in an electrical circuit. it's unrelated to the statistical term bias. statisticians call it an "intercept" to avoid the confusion & because it's literally the y intercept.
imagine setting all x1, ..., x3 to 0. then what's y?
just b0
right
if you didn't have b0, that forces y to be 0 as well when all xs are 0, which forces the entire line/plane you fit to pass through the origin, which is restrictive and makes your model worse for no benefit
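A quick way to see this numerically, as a sketch with made-up data: points that lie exactly on y = 2x + 5, fitted once with an intercept column and once without, using NumPy's least-squares solver.

```python
import numpy as np

x = np.linspace(0, 10, 50)
y = 2.0 * x + 5.0                      # made-up data with a true intercept of 5

A_no_b0 = x[:, None]                   # model y = b1*x (forced through the origin)
A_with_b0 = np.column_stack([x, np.ones_like(x)])  # model y = b1*x + b0

coef_no_b0, *_ = np.linalg.lstsq(A_no_b0, y, rcond=None)
coef_with_b0, *_ = np.linalg.lstsq(A_with_b0, y, rcond=None)

mse_no_b0 = np.mean((A_no_b0 @ coef_no_b0 - y) ** 2)
mse_with_b0 = np.mean((A_with_b0 @ coef_with_b0 - y) ** 2)

print(coef_no_b0, mse_no_b0)
print(coef_with_b0, mse_with_b0)
```

Without b0 the fit has to tilt the slope to compensate for the missing offset, so the error stays well above zero; with b0 it recovers the line.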
Not sure I understand why, but feel free to skip this if it's not crucial
it's worth thinking about. having good geometric intuition for the math can help a lot
I do agree, I'm just not sure what this does in the context of ML
let me draw a picture
Sure thing
i'm attempting to build up some kind of foundation quickly so that you can proceed in your study 🙂
You probably couldn't figure but I do have some academic mathematical background ^^; It's just hard for me to process math in English for whatever reason
So you might be able to skim on some explanations
Much appreciated
ok, i'll keep going and hopefully you can work up to understanding why the b0 is useful
let's assume for now that it's useful and that we usually want it
Sure thing
so we have our simple model y = b1*x1 + b2*x2 + b3*x3 + b0
now we want to find b0, ..., b3 that produce the best line/plane to describe this relationship
the relationship could be totally wrong, but we want to produce the best possible estimate among all relationships of this shape
we do so by coming up with a loss function and minimizing that
just to avoid messy notation, let's call our model prediction p, so we have the following task:
minimize
l(p, y) with respect to b0, ..., b3 where p = b1*x1 + b2*x2 + b3*x3 + b0
l is the cost function
y is the body mass
Mostly typing this for myself
so how do we do that? we note that p is differentiable with respect to the bs, so as long as l is differentiable and convex, we have the whole wide world of convex differentiable optimization techniques available to us
Sorry, convex?
In mathematics, a real-valued function is called convex if the line segment between any two distinct points on the graph of the function lies above the graph between the two points. Equivalently, a function is convex if its epigraph (the set of points on or above the graph of the function) is a convex set. A twice-differentiable function of a si...
basically, it's a bowl, and there is a bottom of the bowl. we need to find the bottom of the bowl.
well in this case the whole thing is, but yeah the real life loss surfaces are enormously complicated
Right
we aren't always guaranteed to have a global minimum. gradient descent only finds a global minimum under certain nice conditions, otherwise it finds a local minimum and we hope it's a good one
Right
in this particular case there happens to be an exact analytical solution (which you'll spend quite a lot of time reasoning about in a statistics class, it turns out to be just an orthogonal projection), but you can also use gradient descent, so that's what we'll use because it's what neural networks use
orthogonal projection
Oh god, those are relevant to statistics? :(
Nevermind, sidetracked
yes, linear algebra is essential in stats and machine learning
Hi, I was hoping to get some advice here.
I am working on a digital text sentiment analysis tool in python. I was hoping to achieve this using machine learning and an amazon review dataset.
First of all, I'm not sure what type of model i will need to create (eg Linear Regression Model) so I could use some help deciding that.
Second of all, I have a 100gb file full of reviews and im not sure of the best way to go about importing and training on this data.
Thanks in advance
To machine learning - sure. I was just hoping to be done with it in my remaining math-oriented academic courses
Nevermind though. Gradient descent. Sure
for gradient descent, we need the gradient. but be careful: we specifically want the gradient of l with respect to the bs
Yes
remember, we are trying to minimize l over all bs
so we treat l as a function of the bs
does that make sense?
So you get the partial derivative of (for example) b_1 * x_1 where b_1 is the variable, so its just... b_1?
I'm probably jumping the gun
Better to just understand by example, perhaps
no, this was my next question for you. it's calculus time. what's the gradient?
The gradient is the vector of partial derivatives with respect(?) to the variables you're looking for
the partial derivative of b1 * x1 with respect to b1 is x1
Oh, right
yeah sorry. i mean it's time to compute it
use the chain rule
3x + 2y -> 3 partial derivative with x
Yeah I forgot
All clear
So the gradient would be (x_1, x_2, x_3) since there are no duplicate b's or whatever
that's the gradient of p with respect to b1, b2, b3, yes (and the partial derivative with respect to b0 is just 1)
Right
let's assume l is (p - y)^2. and p = b1*x1 + b2*x2 + b3*x3 + b0 as before. what's the gradient of l with respect to the bs?
Err just a moment, I need to go back to the original equation
Err... isn't that 0
Because p = y?
Or am I misreading
Probably misreading, you probably want me to use the chain rule with f(x) = x^2
i adjusted the notation. p is our prediction, y is the true body mass in the dataset. x and y are given to us and we treat them as fixed
Ah apologies
i need to go make dinner. ponder this for now, because i think it's the core of what you were struggling with originally
Could be. I think something clicked at the very least
Bon appetit~
Oh and, many thanks for your patience and help of course*
i strongly suggest working through the actual calculation here to get an analytical closed-form expression for the gradient
it's a drill that should feel easy
Indeed, but its crazy how quickly the human mind forgets things - I finished calc2 less than a month ago haha
I'll work through this
so:
l = (p - y)^2
l = (b1*x1 + b2*x2 + b3*x3 + b0 - y)^2
the partial derivative with respect to b1 would be, err...
f(g(x))' = f'(g(x)) * g'(x) ->
f(u) = u^2, g(b1) = (p - y) ->
2*(b1*x1 + b2*x2 + b3*x3 + b0 - y) * x1 =
2*x1*(b1*x1 + b2*x2 + b3*x3 + b0 - y)? (partial derivative with respect to b1)
Will pop something similar into wolfram real quick just to sanity check
Looks about right. I'll ponder on this for a little while longer
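For anyone who wants the same sanity check without Wolfram, here is a sketch in NumPy that compares the chain-rule result 2*x_i*(p - y) against a finite-difference estimate. The measurement values, the target `y` and the starting coefficients are made up.

```python
import numpy as np

x = np.array([1.7, 0.9, 1.1])      # hypothetical height, waist, chest
y = 70.0                           # hypothetical true body mass
b = np.array([10.0, 20.0, 30.0])   # b1, b2, b3
b0 = 5.0


def loss(b, b0):
    """Squared error l = (p - y)^2 with p = b1*x1 + b2*x2 + b3*x3 + b0."""
    p = b @ x + b0
    return (p - y) ** 2


p = b @ x + b0
grad_b = 2.0 * x * (p - y)         # analytic partials w.r.t. b1, b2, b3
grad_b0 = 2.0 * (p - y)            # analytic partial w.r.t. b0

# Central finite differences, one coefficient at a time.
eps = 1e-6
numeric = np.array([
    (loss(b + eps * np.eye(3)[i], b0) - loss(b - eps * np.eye(3)[i], b0)) / (2 * eps)
    for i in range(3)
])

print(grad_b, numeric, grad_b0)
```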
Thanks again, on the offchance you're reading this
hi there, I trained a model on a good bit of text-based data and the model seems to give really odd results. Even when I copy and paste a sample of the training data into the model to be predicted, it returns 1 despite that sample having been labeled 0 when the model was trained. Is this a symptom of overfitting? I didn't add any sort of dropout or regularization in TensorFlow, so I suspect it may be, but would that cause the model to not even be able to identify data that was in its training data?
Overfitting would mean on the contrary that samples from the training set are almost always classified correctly
Do you have a loss curve to check or something?
I'm looking for a way to determine the probability (something akin to a p value) of getting a particular set of residuals (i.e. chi2) from a set of fitted solutions to my model (non linear least squares). I've seen a number of tests (e.g. pearsons chi squared test) but don't know which one is correct, and these also don't appear to be %s either
You'd need to define the model you're using, the labels you're trying to classify, and what you'd define as reasonable results
I'd suggest using already available datasets such as Fashionpedia
And if you want good performance, the best way would probably be to use a pretrained computer vision model and use transfer learning
Sadly not sorry, I have the loss value which I believe to be 1.4~
correct. see page 4 for a slightly clearer way to write this https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf
actually this document is very good and i suggest working through it
it seems right at your level
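Putting the pieces of the body-mass example together, here is a hedged sketch of plain batch gradient descent on the linear model with squared-error loss, checked against the exact least-squares solution. The data is synthetic, and the learning rate and step count are arbitrary choices that happen to work for this data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (height, waist, chest) -> body mass.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the last coefficient plays the role of b0.
Xb = np.column_stack([X, np.ones(len(X))])

b = np.zeros(4)          # b1, b2, b3, b0
learning_rate = 0.1
for _ in range(500):
    p = Xb @ b                                   # predictions
    grad = 2.0 / len(y) * Xb.T @ (p - y)         # gradient of mean squared error w.r.t. b
    b -= learning_rate * grad                    # step downhill

# Exact analytical solution for comparison.
b_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(b)
print(b_exact)
```

Each row of `Xb.T @ (p - y)` is the sum over samples of x_i*(p - y), i.e. the single-sample partial derivative from the derivation above, averaged over the dataset.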
are these continuous or discrete data? it sounds like you want the joint distribution of the set of residuals for a given dataset
discrete
which unless i am misunderstanding your intention, is just whatever error distribution is built into your model
ah. what kind of model?
it still sounds like you want something along the lines of an error distribution, which is pretty much exactly what most statistical models try to estimate
I've seen sometimes in their fits people report they got a chi2=1.5 with a 0.01% p value. Similar to how in F tests you report a p value (except there it's the probability of increasing adjustable parameters gives you a better fit)
here I'm looking more for the probability of my current fit given my data and model
and solutions
that just sounds like P(Y = y | X = x, Θ = θ) right?
I'm afraid I don't quite understand this terminology. From my reading it's more P(x^2|v) where v is degrees of freedom
i am very literally talking about the probability distribution of the random variable Y | X = x, Θ = θ
are you interested in the probability of your exact model predictions, among all possible model predictions?
I believe so
what kind of model is this?
it's a custom non linear model
unless I'm misunderstanding ur question
this sounds like a hard task in general unless your model is parametric with a specific data distribution
i'd be tempted to solve this by simulation
generate realistic data, fit the model, repeat many times
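A rough sketch of that simulation idea, under loudly stated assumptions: Gaussian measurement errors with a known `sigma`, and a quadratic fitted with `np.polyfit` standing in for the custom non-linear model. None of this comes from the actual model being discussed; swap in your own fit routine and error model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
sigma = 0.1                       # assumed known measurement error


def fit_and_chi2(y):
    """Fit the stand-in model and return (coefficients, chi-squared)."""
    coef = np.polyfit(x, y, deg=2)
    resid = y - np.polyval(coef, x)
    return coef, np.sum((resid / sigma) ** 2)


# Stand-in for the real observations.
y_obs = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=sigma, size=x.size)
coef_obs, chi2_obs = fit_and_chi2(y_obs)

# Simulate data from the fitted model, refit, and record chi-squared each time.
n_sim = 2000
chi2_sim = np.empty(n_sim)
for i in range(n_sim):
    y_fake = np.polyval(coef_obs, x) + rng.normal(scale=sigma, size=x.size)
    _, chi2_sim[i] = fit_and_chi2(y_fake)

# Fraction of simulated fits at least as bad as the observed one.
p_value = np.mean(chi2_sim >= chi2_obs)
print(chi2_obs, p_value)
```

The simulated chi-squared values give an empirical reference distribution, so the observed value can be ranked against it without relying on a textbook formula.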
I'm more so looking for given this chi2 from my minimized solution, how unique is this chi2? Can I get this by another random set of solutions?
I wonder how do you sort and refine a large amount of data for image classification/computer vision models? Are there automatic image labelling tools?
if you rely on a tool to create your dataset automatically, any models trained from that data will perform, at best, only as well as those tools do.
the best 'quality' datasets are typically labelled manually, by a lot of people hired specifically for that (see: human annotators)
for some purposes, you can just use images from Bing's API and alike, but typically you should prefer using curated datasets if any exist for the task you're trying to do
I see so there's no going around annotating manually for the best/cleanest datasets huh
is that why people keep saying AI/ML is like 90-99% spent on the data and only the remaining for the actual model? 
if you haven't yet, take a look at ImageNet and all the work that went behind the dataset used by it
part of, but not all of it
not just collecting/labelling data, but also dealing with issues like missing data, making sure you didn't misunderstand anything, checking some statistical properties sometimes
this field is really hard....
hello! My model, shown below, seems to be suffering from what I can only assume is overfitting, even after retraining it with some L2 regularization and a 0.5 dropout (disabling 50% of neurons during training). I'm not sure what I'm doing wrong here, but whenever the model is tested on any text it returns something like 0.998~, while it seems to perform very well on the training data: when training samples are passed in, it gets them correct. Here's my model
```py
import tensorflow as tf

# tokenizer and max_seq_length are defined earlier (not shown)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        input_dim=len(tokenizer.word_index) + 1,  # vocabulary size
        output_dim=128,
        input_length=max_seq_length),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01))
])
```
and here are the epochs and final loss/accuracy (I'm not sure why the final accuracy is so high)
```
loss: 0.0272 - accuracy: 0.9967
loss: 0.0134 - accuracy: 0.9992
loss: 0.0113 - accuracy: 0.9996
loss: 0.0096 - accuracy: 1.0000
loss: 0.0097 - accuracy: 0.9998
loss: 0.0095 - accuracy: 0.9999
loss: 0.0091 - accuracy: 1.0000
loss: 0.0091 - accuracy: 1.0000
loss: 0.0092 - accuracy: 1.0000
loss: 0.0090 - accuracy: 1.0000
Loss: 0.021111026406288147, Accuracy: 0.996656596660614
```
Beginner here. Have some experience with using Tensorflow and Keras though at a novice level. What's one thing I can do to go to the next level?
What do the labels of your training data look like? My first instinct is that the class 1 label is overrepresented
Not sure what beginner level means. If you want to improve your ML skills, you can:
A) Train/fine-tune more complex models
B) Use your models on "real" use cases
C) Reimplement the basic bricks from scratch to learn how they work
It's not an exhaustive list and it really depends on what kind of skills you want/need to develop
I haven't seen the process behind ImageNet, but I've been doing some (personal) researches on dataset labeling around that (and exactly to make my own datasets)
You should try taking a look at Unsupervised Learning and, especially, Self-Learning (which may provide you with better results).
This blog post may also help you:
https://lilianweng.github.io/posts/2021-12-05-semi-supervised/
When facing a limited amount of labeled data for supervised learning tasks, four approaches are commonly discussed.
Pre-training + fine-tuning: Pre-train a powerful task-agnostic model on a large unsupervised data corpus, e.g. pre-training LMs on free text, or pre-training vision models on unlabelled images via self-supervised learning, and the...
In a nutshell, there's no escape from having to manually label your dataset, but you can spare some work and anti-inflammatories if you can make a model (and a method, maybe? Like SimCLR?) that's able to properly learn from few labeled samples and generate good quality pseudolabels (or labels automatically generated) for the rest of your dataset.
I'll look into it. To be fair, I'm only looking into it bcs I'm just studying all this alone, surely companies have human annotators to do the labeling.
Poor guys...
Anyone a pro in Plotly Dash?
I wanted to know:
1 - is it true that all callbacks get called automatically at the start, when the app is booted? If so, then in what order? Can that be checked/changed somehow?
2 - does that mean that there's no point in setting the value of a parameter/property inside the layout definition, if that parameter/property is the output of a callback, because it'll immediately be replaced by the output of the first automatic callback call?
I have an RL question. In this video https://youtu.be/my207WNoeyA?list=PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv&t=242, i understand previously that:
- a function taking state s and action a can be mapped as f(s_t, a_t) = r_{t+1}
- RL is mostly based on a loop State -> Action -> Reward
However i don't understand this, i don't understand how (and what) the transition probability is, even though i understand that the current action picked from a state determines the reward.
There exist environments which aren't deterministic - the same action in the same state may result in varied next states.
right
but i don't understand the text and the math notation in the video ive just linked above
like i don't know how the information on this page ties into the contents discussed previously
This slide?
yes this one. i don't get it
That's the distribution over possible next states - if you're in state s and take action a of the allowed ones in that state, you may end up in state s' with reward r with probability p(s',r | s,a). The equation on the bottom is just rewriting the same thing - it's defined as the probability that S_t, the state at time t, and R_t, the reward at time t, are s' and r respectively, conditional on the state at time t-1 being s and the action taken at time t-1 being a.
i have a few questions on this:
- what does the | mean
- so does p(s',r | s,a) really just mean "i do something at state s', which is a (allowed by the state), which gives me reward r at state s'?
idk what the lower rewrite is
i mean the second part of the p(s',r | s,a)
That's the "conditional" notation - e.g. P(A|B) is "probability A happens, conditional on B having happened"
so does p(s',r | s,a) really just mean "i do something at state s', which is a (allowed by the state), which gives me reward r at state s'?
If you do action a at state s, you can, in the general case, get any reward and end up in any state - and that's governed by a probability distribution. Specifically, the probability of getting reward r and ending up in state s' is p(s',r | s,a).
E.g. if your environment is fully deterministic, then for each s,a, there'll be just one specific s',r pair the probability of which will be 1, and the probabilities of all other states-rewards pairs will be 0.
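A toy sketch of what p(s',r | s,a) looks like as data, using only the standard library. The two states, the single action and all the probabilities are invented for illustration. One state-action pair is stochastic (the "spin a wheel" case) and the other is deterministic (a single outcome with probability 1).

```python
import random

# dynamics[(s, a)] lists every possible ((s_next, reward), probability) outcome.
dynamics = {
    ("s0", "go"): [(("s1", 1.0), 0.8), (("s0", 0.0), 0.2)],  # stochastic
    ("s1", "go"): [(("s0", 0.0), 1.0)],                      # deterministic
}


def p(s_next, r, s, a):
    """Probability of landing in s_next with reward r, given state s and action a."""
    return sum(
        prob
        for (sn, rw), prob in dynamics[(s, a)]
        if sn == s_next and rw == r
    )


def step(s, a, rng):
    """Sample one (next_state, reward) pair according to the dynamics."""
    outcomes, probs = zip(*dynamics[(s, a)])
    return rng.choices(outcomes, weights=probs, k=1)[0]


rng = random.Random(0)
print(p("s1", 1.0, "s0", "go"))   # probability of one specific outcome
print(step("s0", "go", rng))      # one random draw from the stochastic pair
print(step("s1", "go", rng))      # always the same outcome
```

For every (s, a) the probabilities over all (s', r) pairs add up to 1, which is what makes it a distribution.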
im not sure if i understand you correctly:
That's the "conditional" notation - e.g. P(A|B) is "probability A happens, conditional on B having happened"
does it mean "the probability of A happening after B happens"?
and what you mean in your second part here is that after taking an action, the state and reward is like random but the chances of a SPECIFIC state and reward occuring is whatever is on the other side of the equal sign of p(s',r | s,a)? like a "spin a wheel" where the wheel has like sections with different colour?
and what you mean here is that if the game is deterministic, for action s,a will always yield s',r? so p(s',r | s,a) = 1 and everything else 0?
Well, sure. Of course, not the same pair for every state (or that'd be a very boring game where any action in any state gets you the same reward and gets you into the same state).
wdym same pair
does it mean "the probablility of A happening after B happens"?
Sure. That's probability-theory notation.
but in other words, should i replicate action a at state s, i'll yield the same rewards every time right?
nice, thanks for your clarification :D
im learning RL basics since im implementing AlphaZero (i set up the search alg and NN already, just need to implement the training loop since the paper implies i know this alr)
can't wait to learn this and implement training loop and train on my new GPU (excited)
Yup, and will end up in the same next state.
probably just ask here, I am not always online.
sure
but like a lot of times my question is ignored or get pointed to an SO link
:/
what exactly is expected return? is it the sum of all future (anticipated) rewards from current time step t, all the way to future final time step T? https://youtu.be/a-SnJtmBtyA?list=PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv&t=65
well, sure.
then the thing is
i don't understand why discounted reward exists and why it's useful lol
it says here but i don't get it https://youtu.be/a-SnJtmBtyA?list=PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv&t=145
because if you don't do any discounting, for many games the expected reward is clearly infinity no matter what you do, so not much to optimize.
well yeah but i don't get how discounting it would ever help
discounting (with γ below 1 and rewards that don't blow up) will make the reward finite always, even if the game will be infinite.
no? like the sum of decaying exponential progression being finite.
not a pro, used it a couple of times, actually still getting up to speed with it this week after not using it for years.
re. 1)
https://dash.plotly.com/advanced-callbacks#when-a-dash-app-first-loads
yes, it is called automatically
order is determined by a dependency tree (i think of it as a DAG 🤷♂️ )
see https://community.plotly.com/t/what-is-the-execution-order-of-callbacks/6858/2 for an answer from the author himself.
no comment on 2)
so my questions are:
- how does making it discounted make it possible for continuous action (ik youve explained it but i still don't get it)
- wouldn't making the discounted reward make it "less accurate"? like the agent is getting less reward (sad)
- what does the equation here mean https://youtu.be/a-SnJtmBtyA?list=PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv&t=196
Well, that's just the equation for discounted reward - it's expected reward, except the terms are multiplied by 1, γ, γ^2, γ^3, ... - a decaying exponential progression. It's a math fact that if you take most series (those that don't grow unboundedly, or even do grow unboundedly but not exponentially fast) and construct a "discounted" series like that, the sum of it will be finite. So this makes the reward finite even for infinite games.
wouldn't making the discounted reward make it "less accurate"? like the agent is getting less reward (sad)
kind of, yeah, but this is more of a philosophical point. Note that you can make γ arbitrarily close to 1 if you want the agent to consider the future more - as long as it's below 1, the discounting will work.
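A tiny sketch of the finiteness argument in plain Python: with a reward of 1 at every step, the undiscounted sum grows with the length of the game, while the discounted sum levels off near 1 / (1 - γ). The reward sequence and the γ values are made up.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[k] * gamma**k, computed back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0] * 1000              # a long game paying 1 per step

undiscounted = sum(rewards)         # keeps growing with the game length
g_09 = discounted_return(rewards, 0.9)
g_099 = discounted_return(rewards, 0.99)

print(undiscounted, g_09, g_099)
```

Pushing γ closer to 1 raises the ceiling (10 for γ = 0.9, 100 for γ = 0.99), which is the sense in which the agent "considers the future more".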
then i have more questions in addition to the original ones:
- why is there a expected return in the first place? even for episodic tasks? why (and how) would the AI make the most of all anticipated rewards? i know the reason is along the lines of not being as myopic and focused solely on short term gains
- what would the agent do with the discounted expected return?
what would the agent do with the discounted expected return?
the value itself? nothing, it just should be maximized - so in practice, RL usually consists of getting a good idea of what sets of actions will give what rewards in the long run, then doing them.
why is there a expected return in the first place? even for episodic tasks? why (and how) would the AI make the most of all anticipated rewards?
I don't really understand what you mean by these.
hey guys, what are ways i can analyze audio data (mp3files), in such a way that the resulting extracted data would always be in the same shape, so it would be suitable for machine learning purposes. Thanks for any answers.
I have 3 callbacks that depend on each other 1->2->3. The 1st callback creates a global variable, and the 3rd one uses it to create a table. But for some reason, when u load the app - it doesn't work, there's nothing there, no table. The strange part is that if I then relaunch the app, then it does work (p.s. doing this in a jupyter notebook, so variables carry over from one app launch (cell execution) to the next, but when I reset the kernel - it doesn't work again). So all callbacks do work, the one that creates the global variable does work, and so does the one that uses it, but something must be wrong with the order, even tho it should be correct. I just can't get it
how should the reward be maximised when the expected return is not really used?
i was just asking about the purpose of expected return (and then i said that i think i know the reason of having them, yk for you to check and elaborate on my understanding)
Suppose I have a function, f(x) = -x^2 + 5. I want to maximize it. I calculate the symbolic derivative of it, f'(x) = -2x. I note that the derivative has one zero, at x=0, where it crosses from positive to negative. Hence, f(x) is maximized at x=0. There, I maximized my function, without ever explicitly calculating its values at any points.
similarly, you can introduce the concept of "discounted reward" to talk about "what actions to take to maximize this", but that doesn't necessarily mean you're actually going to be evaluating that function; maybe you'll just analytically determine the best strategy from analyzing it.
Well, sure, it's just how we formalize looking into the future. An agent that doesn't do that will only care about getting the best reward at the current state, which isn't necessarily the same as maximizing reward in the long run.
so the expected return in this case would be the function and you'd try and maximise by finding the derivative, setting to 0 and solving?
Kind of. It'd be a much more complicated function - a function of your strategy (what actions you take depending on the state), and you'd want to find the optimal strategy. In almost no real games will you be able to just derive the optimal strategy (for instance, because you might not even know the form of the reward). But as you'll probably soon see in the course, that still allows you to derive some important properties the optimal actions must have.
can i do this using grad descent?
strategies are often discrete if you choose/are able to represent them numerically. otherwise they're algorithms. neither lends themself to differentiation
Ehh, sure, for some kinds of games you can "just" numerically optimize a very high-dimensional function. But, well, remember how a strategy is basically "what action you take in a state"? Well, the state space isn't necessarily discrete, it may be continuous. So you're finding a function that optimizes a certain value, and that's getting very complicated to represent numerically.
i think i had something similiar happened to me before, can't remember how i fixed it.
do you have a minimal reproducible example?
i see. this still sounds really confusing :/
is this channel not so beginner friendly. 🃏
it is if you ask beginner-friendly questions 😛
Well, consider the fact that this formalism is, arguably, powerful enough to describe any agent, humans included, so if acting optimally was trivial, your existence would be very boring indeed 😛
now i have an existential crisis 🫠
that is optimal in some sense, i.e. the worst possible
questioning my purpose as a simple being when i don't understand how decisions and rewards are supposed to be maximised 🫠
decision theory does cause that as a side effect 🙂
It's easy to always make the optimal decision when you know the state and position of every particle in existence and you can predict accurately human behavior
citation needed; i think there'll be some issues even then :p
Citation: Albert Einstein: "God does not play dice".
qualifications of Albert Einstein:
- hero of many anecdotes
- being famously wrong about quantum mechanics
Hi, I was hoping to get some advice here.
I am working on a digital text sentiment analysis tool in python. I was hoping to achieve this using machine learning and an amazon review dataset.
First of all, I'm not sure what type of model i will need to create (eg Linear Regression Model) so I could use some help deciding that.
Second of all, I have a 100gb file full of reviews and im not sure of the best way to go about importing and training on this data.
Thanks in advance
Can ML engineers also work as data scientist? Do they have to do anything extra to be a data scientist?
there's no consistency in what all these job titles actually mean.
uhh
uhh?
When I searched for machine learning engineer positions, I couldn't find any for some companies. That's why I wondered whether I could apply for a data scientist position. (I'm not searching for a job rn, I've just finished my freshman year and want to go through this field)
there aren't regulations around who is allowed to call their employees ML engineers or data scientists. and there isn't really even a consensus around what a "data scientist" is. you'll have to look at the job description and requirements to get a sense for what the job actually involves.
have you started looking for internships for next summer and beyond?
I've just completed my first year at university, dont have any internships yet
right, you probably wouldn't have gotten one this summer
yeah, can't apply for one now
So, it's better to look at job descriptions instead of job titles, right?
pretty much
Gotcha, thanks a lot
at least in the context of AI/ML/DS positions
I see, will keep that in mind
Also, don't overspecialize early: build a strong foundation. You'll need broad skills... not just "ML" skills... to thrive in any position.
- I've seen a lot of posts saying things like: "I don't need to learn XYZ because all I want is AI/ML"
may you give an example?
I dunno, yesterday someone was saying something about not needing to learn anything about web development because they didn't want to be a front-end developer.
And someone else said they didn't like data analysis but wanted to do AI/ML, which I thought was hilarious.
100 % agree with Stel
Additionally, just give it time tbh. Enjoy life, enjoy school, take courses that you like and do internships
I want to create a conversational chatbot that can generate text like GPT-3 or GPT-4 and can be trained with custom data, where should i start?
what is wrong with my tacotron2 training model this is supposed to be spongebob😭
kill it
😂
I mean it sounds like him at the beginning
I need to fix it because I want to stream ai sponge
so you're trying to make a synthetic voice of spongebob. what was the input for that audio? "hahahaha"?
what was the total duration of your training data?
welp
it's cursed
you might have to keep training it.
but it might also be that you don't have enough data, or that the quality isn't pristine enough
I don't think so it's just giving results like this no matter the training time
12 hours is quite long and it should at least be understandable
I just want to understand the issue
I've heard of people running tacotron for weeks
Have you trained it from scratch?
yes
Then Stelercus is probably right
I need to train it for longer?
> assuming Stelercus could be partially wrong
Tacotron 2 was originally trained on... I think...around 40.000 audio samples?
And for quite some time... I don't remember the details...been a while since I've read the paper 
wish they had named it sushitron
Why?
Oh
i've seen someone training a model using 10 audio samples and for 1 hour and it's understandable
Wish my Audio GAN would work without killing my GPU 
on youtube
Fine-tuned the model
They probably used a pre-trained model and applied training on their custom data
I've also used a pre-trained model on 150 audio samples and it worked quite fine after 2~3 hours
omg
so how long do you think I need to train it
is it possible to train spongebob voice on pre trained model?
On pre-trained model, you may need around 2~3 hours...maybe less, since you got a reasonable dataset size
I am using this command ```
python train.py --output_directory=outdir --log_directory=logdir
```
how can I use pre trained model
You need to have the model weights already downloaded
where can I get one
Maybe Tacotron2's GitHub will have the pretrained weights
can you please tell me what is the prompt to use the pretrained model
I can't give more details, though. I always thought using pre-trained models was boring...specially since those tech companies tend to make their models GitHub a bit confusing...
They might have the prompt in the README
ok
uberduck is very expensive if I need to stream 24/7
$120/day
anyway thank you for the information
wtf
This sounds like the stuff of nightmares
Good afternoon, I'm reading up on layer normalization. The concept checks out and makes sense, given that it combats issues like gradient explosion. What bothers me is that I'd have to constantly subtract the mean from my layer and then divide by its std, so I'm permanently altering the values in the layer so that they center around zero. Don't I run into the danger of having too many zeros and subsequently killing the net?
Would appreciate if sb could link me a resource to help me understand the concept better
https://proceedings.neurips.cc/paper_files/paper/2018/file/905056c1ac1dad141560467e0a99e1cf-Paper.pdf probably the most in-depth analysis
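A small sketch that may help with the "too many zeros" worry, using made-up activations and a hypothetical `layer_norm` helper (the learnable gain and bias of a real layer-norm layer are left out). Normalizing makes the mean of the layer zero and its variance one; the individual values stay spread out around zero rather than collapsing to it.

```python
import math


def layer_norm(values, eps=1e-8):
    """Normalize one layer's activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance + eps)
    return [(v - mean) / std for v in values]


activations = [0.5, 2.0, -1.0, 3.5, 10.0]  # made-up example activations
normalized = layer_norm(activations)
print(normalized)
print(sum(normalized) / len(normalized))  # close to 0: the mean, not each entry
```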
Well, front end development doesn't really interest me either, but backend development. I studied some html and css though
Yah, that's all I'm saying... you should still study it a little, not ignore it.
Right, thanks for the suggestions 🙏
Appreciate you!
hey !
first second or so is passable
is this the way to use a pretrained model to train a tacotron2 model: ```
python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
```
anyone here attending IAIM'2023?
Any one here had an internship
If so what projects do the require
Or what good resume
helpppp
i am using ssh
lsof -i :<port_number>
prints nothing
i tried to run this code before it ran "FINE", but now i get this
98 means port is busy, how to change it
it's a near 50/50 split
I believe there are more 0 entries than there are 1
im also not sure why model.evaluate gives it a 0.99, I use model.predict on the testing data and most of it is wrong
What loss are you using?
Did you process both datasets the same?
As in you normalized the test data as well f.e.
And you didn't accidentally flip the labels at some point
I posted my model code above, (#data-science-and-ml message), as far as I know the preprocessing was identical and when I print off some entries at random with their respective label it seems to be in check
So when does it provide "wrong" results then?
Only with .predict but not with .evaluate?
Here's the loss and accuracy of my model during training. The last printout is the loss and accuracy returned by model.evaluate
On the test data right?
yes
What is the shape of your dataset you pass to predict and to evaluate?
the values on the x axis were just a list of strings and on the y it was a numpy array of integers marking the flags. Both single dimensional
if that's what you're asking
hm, the model seems to perform well on the test data, however, when it receives data that isn't really positive nor negative like hey! it will always form a bias to negative and return 0.99 as far as I can tell
Good evening guys is anyone familiar with batchNormalization?
be sure to ask your actual question right out of the gate
When discussing reduced chi2, is the sum of squared residuals normalized against degrees of freedom or number of fitted data points? Because I've seen both used. Or is it only number of fitted data points if data points >>>> number of adjustable parameters
die = pd.DataFrame([1, 2, 3, 4, 5, 6])
trial = 10000
sum = [die.sample(2, replace = True).sum().loc[0] for i in range(trial)]
freq = pd.DataFrame(sum)[0].value_counts()
print(freq.sort_index())
Relfreq = freq.sort_index() / trial
Relfreq.plot(kind = 'bar')
plt.show()
is there a way i can make this faster, it takes about 4 seconds to do, but i need to go up to 1 million trials
maybe use numba or sometin
maybe ```py
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
trial = 10000
s = np.sum(np.random.choice(a, size=(trial, 2), replace=True), axis=1)
```
you have no idea how much i want to use numpy
but i have to use pandas in this task
alright can somebody take a look at it ```py
import numpy as np


class BatchNormalization():
    def __init__(self):
        """
        Please note that I'm substituting gamma and beta for weight and bias to make this module
        compatible with the rest of the library.
        """
        self.weight = 1
        self.bias = 0

    def forward(self, inputs):
        """
        Subtracting from the input its mean before dividing by the standard deviation of the input.
        Finally multiplying it by the self.weight parameter and adding self.bias to it.
        """
        self.inputs = inputs
        self.mean = np.mean(inputs, axis=0)
        self.variance = np.var(inputs, axis=0)
        self.stdDev = np.sqrt(self.variance + 1e-8)
        self.normalizedInputs = (inputs - self.mean) / self.stdDev
        return self.weight * self.normalizedInputs + self.bias

    def backward(self, gradient):
        """
        Backpropagation through the layer. We first compute the gradients of the loss with respect to
        the normalized inputs, variance, and mean. Then we apply the chain rule to derive dweight,
        dInput and dbias. As per usual, dbias is just the gradient, as the derivative of the sum
        operation is one.
        """
        N, D = gradient.shape
        dNormalizedInputs = gradient * self.weight
        dVariance = np.sum(dNormalizedInputs * (self.inputs - self.mean) * -0.5 * (self.variance + 1e-8)**(-1.5), axis=0)
        dMean = np.sum(dNormalizedInputs * -1 / self.stdDev, axis=0) + dVariance * np.mean(-2 * (self.inputs - self.mean), axis=0)
        dInput = dNormalizedInputs / self.stdDev + dVariance * 2 * (self.inputs - self.mean) / N + dMean / N
        self.dweight = np.sum(gradient * self.normalizedInputs, axis=0)
        self.dbias = np.sum(gradient, axis=0)
        return dInput
```
well, find a way to avoid iterating over trials.
Yes
thats the plan
perhaps ```py
die = pd.DataFrame([1, 2, 3, 4, 5, 6])
trial = 10000
samples = die.sample((trial * 2), replace=True).values
samples_2_2 = samples.reshape(trial, 2)
sums = samples_2_2.sum(axis=1)
print(samples)
print(samples_2_2)
print(sums)
```
die.sample(trial*2) gives you 20000 samples, then you reshape it to trial,2, then sum each row
Any easy way of sharing a jup notebook, since uploading here isn't allowed?
https://filetransfer.io/data-package/lPNgCRmp#link
So, not necessarily minimal, but should be reproducible, I think. If u just run the whole notebook, then the last cell will give the error in the first screenshot. If u then open the app by clicking the link, and simply close it again, and rerun the last cell, then you'll get what I expect immediately. When you open the link for the first time, there is no table at the top right. If you then close the link and reopen it, without running any extra cells, the table will suddenly appear (as the variable is now properly initialized and filled, as seen by the last cell of the notebook). I don't understand this behavior.
Ignore the terrible formatting, that's how it's "supposed" to be (haven't worked on it yet). I added some thicc markdown so you could navigate my extremely messy code more easily. Sorry about that.. 😅 I can't seem to figure out why it doesn't work on the first start. I think it might have something to do with order of initial execution of callbacks, but honestly no idea. After you reopen the link, everything works as it should (when you change the sector dropdown, the ticker dropdown options change, and when you change those, the table changes)
is anybody here presently working on object detection?
Yes
nope but i'd love to hear some about it
Hello, I'm new to the machine learning field. I recently wrote the foundation of a neural network in C++. Currently I'm stuck at implementing the backpropagation method, and I just wanted to ask you guys if you have a good source where I can learn the math behind it.
edit:
I should note that this a supervised nn
The backpropagation is basically the Chain Rule from calculus.
There's one or another trick, like having to transpose the weights matrices when doing the chain rule, but in general it's just the chain rule, beginning at the loss function and going backwards until you get to the first layer.
Take a look at my references. The code itself is in python, but some references are more generalistic. They might help you:
cool, thanks! ill have a look
A class about chain rule will also be a must. I don't recommend the ones I've used because they're not in english
does anyone have any resources that show examples of how RNN hidden states encode patterns?
for example, my NLP professor mentioned that RNNs can learn open/close parenthesis and showed how the hidden state could encode that kind of thing, but it was a while ago and I don't remember it as well as I wish I did
is colab memory usage garbage?
Maybe it was based on this article? https://nlp.stanford.edu/~johnhew/rnns-hierarchy.html
that looks like exactly what he was talking about, thanks!
hello cool people I need help choosing what to use for sentiment analysis
I am currently in a node environment and am using natural, but results are trash no matter how much I preprocess the data
I can probably get python to run in there, but what do I use?
the data comes from what a user has written in an obsidian note
so could be pretty much anything
Check out huggingface, it has good documentation and a hub to select different pretrained models
okay many thanks, didn't know about it
https://huggingface.co/blog/sentiment-analysis-python#2-how-to-use-pre-trained-sentiment-analysis-models-with-python seems beginner-friendly enough
OpenCV seems to have a really shitty ONNX reader
I've asked about this before but I didn't follow up on the discussion, so I would like to ask again. Has there been any attempt to make a speech recognition model capable of distinguishing monologue and dialogue? So far I've found that it's possible and have a basic understanding of what I'm trying to achieve in a step by step which in this case determine how many person speaking in a given audio > figure out their segment in the audio > lastly convert the speech to text accordingly.
Would implementing NLP instead be better though?
I don’t know but reminds me of this thing I read a while back from Google: I think this is it: https://cloud.google.com/speech-to-text/docs/multiple-voices
Ah wonderful, I didn't know google provided such package ^^. much obliged.
I will try this out
Curious how it works out for you, let me/us know! I haven’t tried that feature
Oh it's nothing special, I wanted to build this model for audio only podcast so that people with impaired hearing can understand it with given context of monologue and dialogue
I don't know the formal term for "deaf" people, but to me this term felt condescending for some reason, so I'll refer it as impaired hearing even though I know it's not accurate (I'm ESL)
There is also a pytorch toolkit that is specialized for this kind of task if you want to have a more hands-on approach
https://github.com/pyannote/pyannote-audio
Hi! I am working on a project for work, where I have to use an LLM and fine-tune it based on user input. I have been asked to provide a general system configuration for a multi-server setup where this system would run for me to tinker with it and test it. I would like to get some suggestions for this. It will probably be a CPU-only server cluster, but GPU-based system recommendations are welcome. The systems would run some flavour of Linux, suggestions are welcome for that as well. I would also like to know how I go about using multiple systems to train a single model at the same time.
are you talking about how to set up the system (users, permissions, etc) or physically what hardware to buy, or both?
Hi any datascience substacks you read once in a while?
Have you already watched Andrej Karpathy's micrograd?
hi everyone
i was wondering, can we use ai ml in automating API testing?
if yes how ?, i would like to do a small implementation of it
Hello, people, I was recommended to ask here, but the question is not specifically related to python.
I am again working with Reinforcement Learning.
This time I use Neural Networks for the Q-Table.
My Agent is playing a game against another independent Agent. The Reward policy in the middle of the game always produces 0, and in the end of the game the Reward is either -1 for losing, or +1 for winning. This reward gets backpropagated over all the states achieved in the game.
But here is my question:
If the Q-Table were just a Lookup Table - Q-Value adjustment would be as easy as going over every accumulated state and performing the adjustment.
However when the Q-Table is a Neural Network: Adjusting Q-Value for one state changes the whole network. In which order should the adjustments be made? Reversed order from the end of the game to the beginning, or from beginning to the end?
haven't seen anything like this, but i imagine you could use some kind of AI to perform fuzzing or other kinds of "exploratory" testing
okie, thanks for this, will research
just the configuration
Also, how to use multiple systems to perform the training, if possible
It'd just be an ubuntu system or something and I would just run my code on the servers. I just want the config that can handle that
(They also asked this in #1136612354914799647)
Oh, nice reference. You had me at Atari!
The idea is that you don't just use the sample at a given time, but take multiple random previous samples and train the model on that
That way you can also use the same sample multiple times, and the batches are not as "correlated"
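The idea described above could be sketched roughly like this. The class name `ReplayBuffer`, the transition layout and the dummy transitions are assumptions for illustration, not taken from any particular library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random draws mix old and new transitions, so a batch is less correlated
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=1000)
for step in range(50):  # dummy transitions standing in for real game steps
    buffer.push(state=step, action=step % 3, reward=0.0, next_state=step + 1, done=False)
batch = buffer.sample(8)
print(batch)
```

Because sampling does not remove anything, the same transition can show up in several training batches.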
This is not relevant in my case, I am asking about adjusting the NN in my mentioned way.
Thank you for the reference tho
I think it is relevant to your question, you ask in what order you feed the data into your model for training right? @near basin
More on the context (should probably mentioned it before):
I make a simple RL for playing Tic-Tac-Toe. The "experience replay" sounds promising, except it does not fit the use case here, because the Agent has "no" experience on a made step, and makes this "experience" only when the game is finished
So taking random batches won't really help
What algorithm are you using to update the model, is it online or offline?
I am not very advanced in this, what is the difference between these two?
There are online methods that don't require future states to update the model
Such as deep-Q learning (probably most popular method)
It is online
So the experience replay buffer would be perfectly viable
This is what I am doing, except earlier I only had models, where after each step the model is updated
With deep-Q learning you can also update the model after each step
but not necessarily with the trajectory from that step
But with random trajectories from the previous 1000 or so steps
I am reading https://towardsdatascience.com/deep-q-learning-tutorial-mindqn-2a4c855abffc right now, based on your recommendations
I will report later if I have any questions
Sure
This was a tutorial I used when I did a RL project
But it is with pytorch, not tensorflow
I do not care about implementation, only theory. Because I do not even make this project in python
It helped me because code is a very exact way to describe how an algorithm works. So even if you don't care about the implementation, it may be good to look at still 🙂
Hhmmm, my biggest mistake for now was, that I was using only one network, instead of Main+Target pair
It helps with stability, but iirc I updated the target model after every 10 steps, and it still worked, so it is not 100% necessary for simpler problems
Wdyt, if I use main+target, then using the target network I will play one round, then update the main network using the saved states and then put the weights to the target network. Sounds like a plan, doesn't it? But will this approach work?
Then, I don't have to care about in which direction the backpropagation is going
In that case both networks would be the same
As you replace the target network with the policy network after updating the policy network every round
Typically you use the policy network to make decisions (but also add some random decisions to explore the state space)
And the target network is just used to calculate the temporal difference target
So this value is predicted with the target network
Yeah
Was talking about this
And the Q(S_t, A_t) is the policy network
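To make the split between the two networks concrete, here is a minimal sketch. The "networks" are toy stand-in functions returning made-up action values, and the helper names `td_target` and `td_error` are hypothetical; a real setup would call the actual models instead.

```python
def td_target(reward, next_state, done, target_q, gamma=0.99):
    """Temporal-difference target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward  # no future value after a terminal state
    return reward + gamma * max(target_q(next_state))


def td_error(state, action, reward, next_state, done, policy_q, target_q, gamma=0.99):
    """Difference between the target and the policy network's current estimate."""
    target = td_target(reward, next_state, done, target_q, gamma)
    return target - policy_q(state)[action]


# Toy stand-ins for the two networks: each maps a state to a list of action values.
policy_q = lambda state: [0.1, 0.4]
target_q = lambda state: [0.2, 0.5]

print(td_error("s0", 1, 0.0, "s1", False, policy_q, target_q, gamma=0.9))
```

With a reward that is 0 mid-game and +1 or -1 at the end, the `done` branch is where the final outcome enters; earlier states only see it through the target network's estimates.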
Alright thanks! Is there any '/thank' command for helping points like in [World of Coding] server?
Good afternoon I've been trying to get my convolutional Layer to run but it is performing poorly compared to the mlp implementation. This is my test set up i know the learningrate decay is kinda aggressive but it yields the best result
```py
from framework.nn.Dense import DenseLayer, FlattenLayer
from framework.nn.ActivationFunction import ReLU, Softmax
from framework.nn.Loss import CategoricalCrossEntropyLoss
from framework.nn.Metrics import Metrics
from framework.nn.Optimizer import Adam
from framework.visualProcessing.convolution import Convolution
from framework.nn.Utils import sparseToOneHotEncoded, visualize, shuffle
from sklearn import datasets
import matplotlib.pyplot as plt

digits = datasets.load_digits()
bilder = digits.images
tar = digits.target
bilder, tar = shuffle(bilder, tar)
bilder, tar = bilder, tar
tar = sparseToOneHotEncoded(tar, 10)
batchSize = 64
conv = Convolution((batchSize, 8, 8))
f = FlattenLayer()
l1, l2, l3 = DenseLayer(64, 128), DenseLayer(128, 128), DenseLayer(128, 10)
relu1, relu2, softmax = ReLU(), ReLU(), Softmax()
loss, optim = CategoricalCrossEntropyLoss(), Adam(lernrate=5e-3, lernRateDecay=1e-2)
acc, l, lr = [], [], []
from tqdm import tqdm
for i in tqdm(range(1000)):
    for step in range(len(bilder) // batchSize):
        batchX = bilder[step * batchSize:(step + 1) * batchSize]
        batchY = tar[step * batchSize:(step + 1) * batchSize]
        #convOutput = f.forward(conv.forward(batchX))
        l1Output = relu1.forward(l1.forward(batchX.reshape(batchSize, -1)))
        l2Output = relu2.forward(l2.forward(l1Output))
        l3Output = softmax.forward(l3.forward(l2Output))
        if i % 10 == 0:
            acc.append(Metrics.accuracyClassifier(l3Output, batchY))
            l.append(loss.calculate(l3Output, batchY))
            lr.append(optim.getLearningRate)
        l3grad = l3.backward(softmax.backward(loss.backward(l3Output, batchY)))
        l2grad = l2.backward(relu2.backward(l3grad))
        l1grad = l1.backward(relu1.backward(l2grad))
        #conv.backward(f.backward(l1grad))
        optim.learningRateDecay()
        optim.step(l3)
        optim.step(l2)
        optim.step(l1)
        #optim.step(conv)
visualize(acc, l, lr, optim)
```
This is the run with the conv layer
this is without so it pretty obvious that the net has memorized the data set(i believe 2k images)
but this only proves that my backward pass in the conv layer is messed up as i just added the conv layer ontop of the other net
here's the conv net https://paste.pythondiscord.com/FYVA
would appreciate if sb could check if i implemented the backward pass correctly(cuz there must be an logical error which is messing up my net)
I guess Tic-Tac-Toe is not a good example for you to learn from, since it's a solved game and you would only need the minimax algorithm. It makes sense to try the RL methods on things that have uncertainty. I would recommend this book: Reinforcement Learning: An Introduction by Richard S. Sutton. This is what I used to learn.
I am using universal sentence encoder tensorflow, How can I speed it up, its currently only using CPU not GPU for some reason
the only thing that will make any noticable difference is getting it onto the GPU, but I only know how to do that in pytorch.
I have a rtx 2060 super
It says that the current version is more cpu bound something like that, one sec let me show you
2023-08-03 14:51:19.521843: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I am not actually learning, just having fun
but thank you
My point is that the question isn't "how can I speed it up?" because we already know the answer. The real question is "how do I move this computation to the GPU?". And I'm not sure how to help you with that.
Even if that TensorFlow binary is optimized to use the available CPU instructions, the available CPU instructions won't be enough for what you're trying to do.
oka
can you help me with another problem please 👀
I have to know what the problem is before I can decide that.
recommenders = {}

@shared_task
def process_question(question, instance_id):
    if instance_id in recommenders:
        print("Using existing recommender")
        answer, recommender = use_main(
            question, instance_id, recommender=recommenders[instance_id]
        )
    else:
        print("Creating new recommender")
        answer, recommender = use_main(question, instance_id)
        recommenders[instance_id] = recommender
    return {"answer": answer}
I am using Django and this is my API route. How can I share the recommenders dictionary between all the workers? The recommenders dict holds key-value pairs whose values are instances of this class:
class SemanticSearch:
    def __init__(self):
        self.use = hub.load("./Universal Sentence Encoder/")
        self.fitted = False

    def fit(self, data, batch=1000, n_neighbors=5):
        self.data = data
        self.embeddings = self.get_text_embedding(data, batch=batch)
        n_neighbors = min(n_neighbors, len(self.embeddings))
        self.nn = NearestNeighbors(n_neighbors=n_neighbors)
        self.nn.fit(self.embeddings)
        self.fitted = True

    def __call__(self, text, return_data=True):
        inp_emb = self.use([text])
        neighbors = self.nn.kneighbors(inp_emb, return_distance=False)[0]
        if return_data:
            return [self.data[i] for i in neighbors]
        else:
            return neighbors

    def get_text_embedding(self, texts, batch=1000):
        print("Generating embeddings...")
        embeddings = []
        for i in range(0, len(texts), batch):
            text_batch = texts[i : (i + batch)]
            emb_batch = self.use(text_batch)
            embeddings.append(emb_batch)
        embeddings = np.vstack(embeddings)
        return embeddings
I don't know enough about django to give you an informed answer; try asking in #web-development
it more of a python thing ig than django but okay, thanks
I am using GridSearchCV and for some reason it thinks an accuracy score of 96.47 is better than 96.57???
Can someone explain
Not sure this is the right place to ask, if not please point me in the right direction. So I have a list with 6911 values, it looks like the attached image. I want to make a new list every time the value drops by x amounts so I can do a regression analysis and calculate the slope on each list. Where do I start? What do I need to learn to do something like this?
when I do 'C': [1, 0.2236, 0.1] it gives me that 0.1 is the best with 96.47
when I do 'C': [1, 0.2236] it will tell me that 0.2236 is the best with 96.57.
I tried it multiple times and it gives me the same result
What's the source data, a dataframe? You need to define the conditions on which you want to partition the data. ie: "(current value - last value) / last_value < -10%". Once you can define the formula, then you can label the dataframe, and compute a regression on each label.
Currently just a list, but I should absolutely import it to a dataframe to speed up the process.
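A minimal sketch of that idea on a plain list, in case it helps: start a new segment whenever the relative change from the previous value falls below a threshold, then fit a straight line to each segment. The 10% threshold is taken from the example condition above; the function names `split_on_drops` and `segment_slopes` and the sample values are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def split_on_drops(values, max_drop=0.10):
    """Start a new segment whenever a value falls by more than max_drop
    (as a fraction of the previous value)."""
    if not values:
        return []
    segments = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if prev != 0 and (cur - prev) / prev < -max_drop:
            segments.append([cur])      # big drop: open a new segment
        else:
            segments[-1].append(cur)    # otherwise extend the current one
    return segments

def segment_slopes(segments):
    """Least-squares slope of each segment against its own index 0..n-1."""
    slopes = []
    for seg in segments:
        if len(seg) < 2:
            slopes.append(float("nan"))  # a single point has no slope
            continue
        x = np.arange(len(seg))
        slope, _intercept = np.polyfit(x, seg, deg=1)
        slopes.append(float(slope))
    return slopes

# Made-up sample data, not the 6911-value list from the question.
values = [10.0, 11.0, 12.0, 6.0, 7.0, 8.0, 9.0, 4.0]
segments = split_on_drops(values, max_drop=0.10)
slopes = segment_slopes(segments)
print(segments)
print(slopes)
```

If the drop is meant as an absolute amount rather than a percentage, the condition inside `split_on_drops` would become `cur - prev < -x` instead.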
So if I'm using:
- MSE as my loss function l
- Sigmoid as my activation function o
- Some input layer a and an output layer p (and nothing else)
in order to find the partial derivative of w_alpha (some random weight) would it be right to do:
(l(p,t))' = 1/m * Sigma(1,m)[(o(w_1*a_1 + w_2*a_2 + ... + w_n*a_n) - t)^2]'?
Trying to understand how to reach the gradient but I can't understand it for the life of me.
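One way to check a derivation like this is to compare it against a finite-difference estimate. Under the setup described (no bias, a single sigmoid output p = o(w . a), MSE averaged over m samples), the chain rule gives d l / d w_alpha = (2/m) * Sigma(i=1..m)[(p_i - t_i) * p_i * (1 - p_i) * a_i,alpha], using the fact that the sigmoid's derivative is o(z) * (1 - o(z)). The sketch below assumes NumPy; the random data and the helper names `mse_loss` and `mse_grad` are placeholders for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(w, A, t):
    """Mean squared error of sigmoid(A @ w) against targets t."""
    p = sigmoid(A @ w)
    return np.mean((p - t) ** 2)

def mse_grad(w, A, t):
    """Analytic gradient of mse_loss with respect to every weight."""
    p = sigmoid(A @ w)
    # Chain rule: dl/dp = 2(p - t)/m, dp/dz = p(1 - p), dz/dw_alpha = a_alpha
    return (2.0 / len(t)) * (A.T @ ((p - t) * p * (1.0 - p)))

# Placeholder data: 8 samples, 3 inputs, binary targets.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))
t = rng.integers(0, 2, size=8).astype(float)
w = rng.normal(size=3)

analytic = mse_grad(w, A, t)

# Central finite differences, one weight at a time.
eps = 1e-6
numeric = np.array([
    (mse_loss(w + eps * e, A, t) - mse_loss(w - eps * e, A, t)) / (2 * eps)
    for e in np.eye(len(w))
])

max_diff = float(np.max(np.abs(analytic - numeric)))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_diff)
```

If the two gradients agree to several decimal places, the hand derivation is very likely right; a large gap usually points to a missing factor such as the 2 from the square or the p(1 - p) from the sigmoid.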
Does it always give the same accuracy? I.e., is your model using a fixed random seed?
ye 42
always same accuracy
gridsearch just thinks 96.47 is better than .57
same result, 96.47 is better
ok
[LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear]
@twilit tundra it shows this
That's very weird, it should output something like this:
i tried both verbose = 3 and 4
[CV 1/5] END C=1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.946 total time= 0.0s
[CV 2/5] END C=1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 3/5] END C=1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 4/5] END C=1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 5/5] END C=1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 1/5] END C=0.1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 2/5] END C=0.1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 3/5] END C=0.1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 4/5] END C=0.1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 5/5] END C=0.1, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 1/5] END C=0.2336, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 2/5] END C=0.2336, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
[CV 3/5] END C=0.2336, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 4/5] END C=0.2336, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.949 total time= 0.0s
[CV 5/5] END C=0.2336, class_weight=None, multi_class=ovr, penalty=l1, solver=liblinear, tol=0.0001;, score=0.947 total time= 0.0s
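For reference, GridSearchCV ranks candidates by their mean score across the folds, not by any single fold or by a separate test split. A quick sketch that averages the fold scores printed above (copied from the log, which rounds them to three decimals, so candidates that tie here may still differ at full precision):

```python
from statistics import mean

# Per-fold accuracies copied from the verbose log above (rounded to 3 decimals).
fold_scores = {
    1: [0.946, 0.947, 0.949, 0.949, 0.947],
    0.1: [0.947, 0.947, 0.949, 0.949, 0.947],
    0.2336: [0.947, 0.947, 0.949, 0.949, 0.947],
}

mean_scores = {C: mean(scores) for C, scores in fold_scores.items()}
for C, m in mean_scores.items():
    print(f"C={C}: mean CV accuracy = {m:.4f}")
```

At this precision C=1 averages slightly lower than the other two, which are indistinguishable from each other in the rounded log.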
hmm
Did your 96.47 come from evaluating on a fixed train/test split? Then it makes sense that CV would rank the candidates differently
yeah
i think
ye it is
20 80 split
so should I still take C=1 as the best parameter
or C=0.1
According to the cv, it would be 0.1
ok but the accuracy score is lower when I call .score
On the 20/80 split?
ye
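A rough sketch of why the two numbers can disagree, assuming scikit-learn is installed. The data here is synthetic (the original dataset is not shown), and the penalty is left at its default rather than the l1 setting from the log, so the actual numbers will not match the ones in this conversation. The point is only that `best_score_` is a mean over CV folds of the training data, while `.score(X_test, y_test)` is one measurement on one held-out split, so small differences between them do not have to agree in sign.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; the real dataset is not shown).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

search = GridSearchCV(
    LogisticRegression(solver="liblinear", random_state=42),
    param_grid={"C": [1, 0.2236, 0.1]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Mean accuracy over the 5 folds for each candidate: this is what the search ranks by.
for C, cv_mean in zip(
    search.cv_results_["param_C"], search.cv_results_["mean_test_score"]
):
    print(f"C={C}: mean CV accuracy = {cv_mean:.4f}")

print("best by CV:", search.best_params_, round(search.best_score_, 4))

# Accuracy of the refit best model on the single held-out 20% split.
test_accuracy = search.score(X_test, y_test)
print("held-out accuracy:", round(test_accuracy, 4))
```

With differences as small as 0.1 percentage points, either choice of C is probably within noise; the CV mean is usually the more stable number to pick by, since it averages over several splits.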