#data-science-and-ml
For
[[-1.0, 0.0, 1.0], [1.0, 0.0, -2.0], [-1.0, -1.0, 2.0]]
dot [[26], [20], [970]]
I'm getting
[[944.0], [-1914.0], [1894.0]]
but it should be
944
1966
1894
It's like a 3x3 matrix dot 3x1 matrix
!e no, your program is right
import numpy as np
A = np.array([[-1.0, 0.0, 1.0], [1.0, 0.0, -2.0], [-1.0, -1.0, 2.0]])
b = np.array([[26], [20], [970]])
print(A@b)
@tidal bough :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | [[ 944.]
002 | [-1914.]
003 | [ 1894.]]
I am generating a plot of a graph with the networkx library
How can I make the spaces between my nodes larger in a circular draw
r1 + r2 = d1
r2 + r3 = d2
r3 + r4 = d3
r4 + r5 = d4
knowing that r1 = 2r5 how is the linear equation matrix constructed? (assuming r1 = 2R and r5 = R)
2R + r2 = d1
r2 + r3 = d2
r3 + r4 = d3
r4 + R = d4
would it be
[2 1 0 0]   [R ]   [d1]
[0 1 1 0] x [r2] = [d2]
[0 0 1 1]   [r3]   [d3]
[1 0 0 1]   [r4]   [d4]
or should I ignore r1 = 2r5
either you solve for r1 manually like that and get a 4x4 matrix, yeah, or you rewrite r1 = 2r5 as r1 - 2r5 = 0 and then you have a 5x5 matrix.
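A minimal sketch of the two formulations described above, using NumPy. The right-hand-side values in `d` are made up for illustration (they are not from the conversation); both systems should recover the same radii.

```python
import numpy as np

# Hypothetical right-hand side, chosen so that R = 1, r2 = 3, r3 = 2, r4 = 4.
d = np.array([5.0, 5.0, 6.0, 5.0])

# 4x4 form: substitute r1 = 2R and r5 = R, unknowns are (R, r2, r3, r4).
A4 = np.array([
    [2.0, 1.0, 0.0, 0.0],   # 2R + r2 = d1
    [0.0, 1.0, 1.0, 0.0],   # r2 + r3 = d2
    [0.0, 0.0, 1.0, 1.0],   # r3 + r4 = d3
    [1.0, 0.0, 0.0, 1.0],   # r4 + R  = d4
])
x4 = np.linalg.solve(A4, d)

# 5x5 form: keep (r1, ..., r5) and add r1 - 2*r5 = 0 as a fifth equation.
A5 = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, -2.0],
])
x5 = np.linalg.solve(A5, np.append(d, 0.0))

print("4x4 solution (R, r2, r3, r4):", x4)
print("5x5 solution (r1..r5):       ", x5)
```

The 4x4 version is smaller, while the 5x5 version keeps every original unknown visible; which is preferable probably depends on how the result is used afterwards.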
And for the equation to have an answer, A has to have an inverse, right?
If A has an inverse, then a solution exists, but I don't think the opposite has to be true - it's generally https://en.wikipedia.org/wiki/Rouché–Capelli_theorem
In linear algebra, the Rouché–Capelli theorem determines the number of solutions for a system of linear equations, given the rank of its augmented matrix and coefficient matrix. The theorem is variously known as the:
Rouché–Capelli theorem in English speaking countries, Italy and Brazil;
Kronecker–Capelli theorem in Austria, Poland, Croatia, Ro...
A not having an inverse doesn't mean AX = B doesn't have an answer?
No. Consider A = [[1,1],[0,0]], b = [[1],[0]]. A has determinant 0 and hence is noninvertible, yet A x = b has infinitely many solutions.
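The rank comparison from the Rouché–Capelli theorem can be checked numerically. This is a rough sketch: `classify_system` is a name invented here, and `np.linalg.matrix_rank` uses a numerical tolerance, so results on badly conditioned matrices should be treated with care.

```python
import numpy as np

def classify_system(A, b):
    """Classify A x = b by comparing rank(A) with rank([A | b])."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    n_unknowns = A.shape[1]
    if rank_A < rank_Ab:
        return "no solution"            # b is outside the column space of A
    if rank_A == n_unknowns:
        return "unique solution"
    return "infinitely many solutions"  # consistent, but with free variables

print(classify_system([[1, 1], [0, 0]], [[1], [0]]))  # the example above
print(classify_system([[1, 1], [0, 0]], [[1], [1]]))  # same A, inconsistent b
print(classify_system([[1, 0], [0, 1]], [[1], [2]]))  # invertible A
```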
i think it's pretty common to do things like sentiment analysis etc. using simple models on top of pre-trained word vectors
i've certainly done it for text classification. word vectors basically just acting as dimension reduction at that point.
Yeah that's pretty much what happens
Tomorrow I'm gonna see if there's a threshold where the feed forward doesn't work
Presumably it won't work in cases where the text is more complex
And the context window is larger
Like, the transformer blocks were just getting in the way. Once I made the model super small it found a way to skip forward the attention heads and just use the embedder module + feed forward
What I did next was to just delete the transformer blocks altogether. The embedder + feed forward converged crazy quick
Haven't checked these things but
I think the embeddings themselves will come grouped into regions, negative words to one side and positive words to the other
And the only thing the feed forward does is count them in the input
Inverse on
[2, 1, 0, 0],
[0, 1, 1, 0],
[0, 0, 1, 1],
[1, 0, 0, 1]
gives back
[[-0.0, -0.0, -0.0, -0.0], [1.0, -0.0, -0.0, -0.0], [-1.0, 1.0, -0.0, -0.0], [1.0, -1.0, 1.0, -0.0]]
when tested with numpy I got
[[ 1 -1 1 -1]
[-1 2 -2 2]
[ 1 -1 2 -2]
[-1 1 -1 2]]
numpy version:
import numpy as np
import numpy.linalg as alg

def main():
    matrix = np.array([
        [2, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 1],
    ])
    print(f'A = \n{matrix}')
    # Round before casting: astype(int) alone truncates, so 0.9999999 would become 0.
    inv = np.rint(alg.inv(matrix)).astype(int)
    print(f'det(A) = {alg.det(matrix)}')
    print(f'A^-1 = \n{inv}')
    print(f'det(A^-1) = {alg.det(inv)}')

if __name__ == '__main__':
    main()
what that tells me is that the actual order and structure of the words is much less important than simply knowing which combinations of words are present in the sentence
that makes sense as well if you think about what the model is doing, it's basically just regression on top of embeddings at that point. it's also an illustration of why learning the embeddings along with the model is better than using pre-trained if possible
I reckon that it's true for small phrases. But when you get to essay type texts the model is gonna need to do almost like summarization internally
I instinctively assumed you were looking at smaller phrases because that's what everybody does with sentiment analysis self-study projects, but I shouldn't have made that assumption
however I don't think it's necessarily invalid even on longer documents as long as they are well separated by their vocabulary. Consider book reviews for example
It's small phrases at the moment, but I'm gonna need to find a dataset with long texts
This is one of the tasks that I'll use to make the ablation study on the transformer
I bet I could build a book review sentiment classifier with > 50% accuracy by just looking for words in some fixed-size neighborhood of "bad" and "good" in a pretrained fasttext embedding space
Perhaps, depends on the complexity of the text. If it's thesis type of text for example, with an opinion on some geopolitical matter, a transformer is likely needed since it requires actual conceptual understanding
remember what transformers do: they construct a new sequence of vectors such that each individual vector in the new sequence represents its own context in the original sequence
so transformers only improve your model if context is important
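To make the "each output vector represents its own context" idea concrete, here is a small sketch of single-head scaled dot-product self-attention in NumPy. The sizes and random weights are placeholders, and it leaves out masking, positional encoding, multiple heads and the feed-forward sublayer.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (T, T): position i scored against position j
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                    # each output row mixes all value vectors

rng = np.random.default_rng(0)
T, d = 4, 8                                        # toy sequence length and embedding size
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.sum(axis=1))
```

If the task can be solved from the individual token vectors alone, the mixing step adds little, which is consistent with the point above.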
right, it will definitely depend on the task
but I wouldn't say it's just a matter of length, more about the subtlety of ideas involved in the text
Yeah I'd say so too
Gonna need to scrape the web for datasets
This is actually a good exercise to get to speed with all the NLP tasks
Sentiment analysis, machine translation, topic classification, and a couple others I don't recall
I'm gonna use them to compare my variant, and then replicate the MetaFormer study but for NLP
I haven't found anyone doing it yet, don't know why
Hi, i have the following code to segment the blood vessels of the eye:
import cv2

def vessel_segmentation(image_path):
    im_bgr = cv2.imread(image_path)  # OpenCV loads images in BGR order
    if im_bgr is None:
        raise FileNotFoundError(f'could not read {image_path}')
    # Extract green channel (index 1 in both BGR and RGB)
    im_green = im_bgr[:, :, 1]
    # CLAHE enhancement
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    im_enh = clahe.apply(im_green)
    # Negative, so the vessels become bright
    im_gray = cv2.bitwise_not(im_enh)
    # Top-hat transform keeps bright structures smaller than the structuring element
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))
    im_top = cv2.morphologyEx(im_gray, cv2.MORPH_TOPHAT, se)
    # Otsu thresholding picks the threshold itself, so the value passed in is ignored
    _, im_thre = cv2.threshold(im_top, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return im_thre
i have the following result:
but should be something like this:
Hey if I transpose a dataframe, shouldn't the size of the df always be the same? In my program I lose 3 columns when I do it?
if you transpose a dataframe, the shape will go from (a, b) to (b, a)
if you appear to be losing data, there must be more going on.
Q:
I have a discord bot, and, for every guild,
I have an image banning method using embeddings of the image, comparing to stored banned embeddings.
Currently,
the structure is just every embedding in one tensor
and it works, but I can't have the same embeddings for every guild.
I could simply store an element of the tensor that's the guild_id,
but that feels like an antipattern,
and that I should use the Pony ORM database I have, and then have the guild_id-tensor pairs.
Is there a preexisting standard for this sort of application?
I have seen lots of videos about creating object detectors using a camera or webcam
But can't we do the same thing on our PC or mobile screen?
maybe check #discord-bots ?
anyone knows a good literature review on transformers ? I'm looking for something done in 2023
Demo of a CLI tool I built over the weekend that connects with Google's Gemini LLM and use it with your files.
It lets you add your own custom commands as well, so you can further enhance the CLI or use it to interact with the LLM and your files however you want.
Does anyone know how to make a weather forecast? Maybe it's impossible.
let me know if you find one
it's possible, but whether it's something you can do at home versus a serious multi-year research project depends a lot on the scale and scope that you intend
if you're trying to forecast the temperature at a single location, you can do OK with traditional timeseries methods, but your predictions will at best only be interpretable as an average
beyond that, you're getting into meteorology, not just statistics or machine learning
for what i know you can try LSTM for some decent results
Hey guys I have these 3 problem statements , how do I forward with these problem?
PS1 - Anonymise user identities in large databases to ethically employ machine learning in understanding customer trends and behaviour without violating their rights.
PS2 - Create a blockchain-powered platform allowing users to lease their information to social media services, with the assurance that the data will not be retained upon exiting the service.
PS3 - Develop an AI based solution to offer timely insights into current global hacking trends, prioritising potential threats based on their likelihood of targeting specific enterprises.
For sentiment analysis, I can't have the embedder module be learnable, it's gonna overfit every single time. I looked around and people seem to start with a pre trained one and stick a feed forward on top.
So I decided to just train one by getting a transformer to first do next token prediction.
But at that point might as well just let the next token prediction be the sentiment. I have a bit of padding on the sequences and the last token is the classification
The expectation is that the transformer will be obliged to learn syntax as it does on the normal next token pred. It's training rn, will see if it works out. At least it's not overfitting
What's the context here? Are you expected to do all three? Or to explain how you'd approach them? Is this just an essay question?
50 epochs in with the next token prediction way, no overfitting and passed a similar test as this one
Does anyone know if in this code the binomial function only counts whether something is a 0 or 1? These values refer to heads and tails respectively
the "binomial distribution" describes the number of "successes" in a sequence of independent "experiments" or "trials". the classic example is the number of heads in a sequence of coin flips. this code generates random numbers according to that distribution. each draw from that distribution simulates 1000 coin flips with a 70% chance of heads and returns the number of heads.
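A short sketch of what such a draw looks like with NumPy's `Generator.binomial`. The seed and the number of draws are arbitrary choices made here, not taken from the code being discussed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Five draws; each one is the number of successes in 1000 trials with p = 0.7.
draws = rng.binomial(n=1000, p=0.7, size=5)
print(draws)

# With n = 1 each draw is a single trial, so every value is either 0 or 1.
flips = rng.binomial(n=1, p=0.7, size=10)
print(flips)
```

So the function returns a 0/1 value only when `n` is 1; with a larger `n` it returns a count somewhere between 0 and `n`.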
i haven't seen that technique before, seems worth exploring. you might need to "clip" the final token to only have nonzero score for the classes and 0 for actual words
I've allocated 5 special tokens, one for each sentiment, which I'm assigning to the last position in the sequence
I think it is working, the limitation will be the dataset, which won't include stuff I can come up with, like sarcasm or the entire phrase being positive and then end with "jk, it was the opposite"
Can you explain how the values of the numbers generated mean success or failure?
Btw if instead of a coin, can we still use the binomial distribution to check success in a die for instance
Well, it seems like it worked, gonna run over some stats after a well deserved break
But this will do, I can train both the transformer and the metric tensor net, it gives me clear performance metrics, etc
Next task will be summarization
each number generated is the number of successes. if you have binomial(1000,0.7) and get 670, that's 670/1000 heads
as for the dice, you have to do some prep work yourself in defining what counts as a "success"
for a standard 6-sided fair die, you'd think of which sides represent a success. then for a single roll of the die, this determines the value of p, the probability of success
in general you need 6 "categories" for a die. success/fail as in binomial is only 2. the more general case of >2 categories is called "multinomial".
but yes if you can interpret some outcome of the die roll as "success" and other outcomes as "failure" (eg a saving throw in D&D) then yes you can use the binomial distribution
and by the way, binomial with exactly 1 trial has a special name: the Bernoulli distribution. a binomial(n, p) draw is the sum of n independent draws from a Bernoulli(p) distribution
and likewise for multinomial. a multinomial draw with n trials is the sum of n independent (one-hot) draws from a categorical distribution
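The die case can be sketched with the standard library alone. The success rule used here (a roll of 5 or 6 on a fair six-sided die) is an arbitrary example, which gives p = 2/6 per roll.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

def roll_is_success(threshold=5):
    """One Bernoulli trial: roll a fair d6, success if the roll is >= threshold."""
    return random.randint(1, 6) >= threshold

n_trials = 1000
# Summing the Bernoulli outcomes gives one binomial(n_trials, p) draw.
successes = sum(roll_is_success() for _ in range(n_trials))
p = 2 / 6
print(f"{successes} successes in {n_trials} rolls (expected about {n_trials * p:.0f})")
```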
Wikipedia articles for probability distributions are usually interesting reference points, even though most other stats articles on Wikipedia are not great
What does it mean when the validation loss is lower than the training loss
I don't think I have data leak, it only happens on certain hyper parameters
is it robust to resampling? could be just weird luck
Resampling ?
I'm shuffling the training data on every epoch
Have a running average for both sets
Ah they just flipped
like, re-splitting train and test
Oh, I haven't tried. The dataset readme instructs to use the indicated split for comparison with literature
Now it's overfitting ah
is it just a train/test split, or train/validation/test?
The labels they gave are train/test/dev
okay, so you're hopefully not using test for the loss curve during training
I'm training on train and using test for validation
is that what they say to do? normally "test" is reserved for checking at the very end
I decreased the size of the model and now the val is larger
the names are confusing and disagree with common english usage
Uhm, I just need a split to follow along to check when the loop is overfitting
search in pages like Springer, Mendeley
there's a lot of works in that topics bro
Thanks, I'll check it out
But I reckon it might be picking up on some pattern that is more pronounced in the validation set
right, use dev for that
Bit confusing terminology
But shouldn't matter right
a bit? very confusing
As long as there's no leakage
yes, as long as the set you use to "follow along" during training is not the one you use for final score
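One way to picture the split roles: dev drives decisions made during training (such as when to stop), and test is touched once at the end. A rough sketch with a made-up dev-loss curve; `best_epoch` and `patience` are names chosen here for illustration.

```python
def best_epoch(dev_losses, patience=3):
    """Return the epoch with the lowest dev loss, stopping the scan once
    `patience` consecutive epochs pass without improvement."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # early stop: dev loss has stopped improving
    return best_idx

# Made-up dev-loss curve: improves, then starts to overfit.
dev_losses = [0.90, 0.70, 0.62, 0.60, 0.63, 0.66, 0.71, 0.58]
chosen = best_epoch(dev_losses, patience=3)
print("keep checkpoint from epoch", chosen)
# The test split would then be evaluated once, on that checkpoint only.
```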
I have selected the PS1 as my statement , so i need suggestions or advice on how do I proceed?
You didn't answer my questions.
These are 3 problem statements given to me I have to choose one
Ok, now that you've chosen one, what are you expected to do?
Anonymise user identities in large databases to ethically employ machine learning in understanding customer trends and behaviour without violating their rights.
Is this just an essay question? A coding assignment? Etc
I need to implement a machine learning model for a given database which understands customer trends and behaviour without disclosing their details
Its a hackathon
I don't think so-
The issue is entirely about the database, so possibly databases, but most database people probably know less about embeddings than most ML people.
@lapis sequoia the problem was in the learn function: I was doing self.weights[j][i] instead of self.weights[layer_i][i][j], so changing one weight would change a different one instead, and it couldn't calculate the cost for the weight it was actually adjusting
I'm on one block with 2 heads
Kind of insane that this minuscule model is threatening to memorize the data ._.
hey, how opencv subtract works?
i don't get it
Aaaah back to regularization
With any data science type question, first step (imo) is to understand the data. Basic EDA stuff. Determine what cleansing, transformation, normalization, etc. is needed to prepare the data. Come up with a train/test split strategy. Determine your X and y variables, etc.
Increased model size, included L1 and L2, model size affects LR schedule which might've actually been messing up the other loops
But at some point I'm gonna have to do data augmentation or find myself a larger set
hey guys i need help saving architecture from autokeras
i trained a text classifier and i just want to save architecture not the already trained model
is there a way to do this?
@agile owl it's not that it gets used the most times... usually you're sliding the model forward and doing N tests, not a single test
Ie; train on 2019-2022, and test against 2023
Or start at 2010-2015, then walk forward
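A small sketch of what "walking forward" could look like as code. The function name, the window sizes and the `expanding` switch are illustrative choices, not something stated in the chat.

```python
def walk_forward_splits(years, train_size, test_size=1, expanding=False):
    """Yield (train_years, test_years) pairs that move forward in time."""
    start = 0
    while start + train_size + test_size <= len(years):
        # Sliding window drops old years; expanding window always starts at the first year.
        train_start = 0 if expanding else start
        train = years[train_start:start + train_size]
        test = years[start + train_size:start + train_size + test_size]
        yield train, test
        start += test_size

years = list(range(2010, 2024))  # 2010..2023
for train, test in walk_forward_splits(years, train_size=6):
    print(f"train {train[0]}-{train[-1]} -> test {test[0]}")
```

Each test window only ever sees a model trained on earlier years, which gives N out-of-sample checks instead of a single one.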
how do you get any sort of signal against a recent trend then
if you want behavior from a high rate environment
etc.
You want to train on 2023 and see how it would've performed against 2020?
let's say you're trading equity indices right
now equity indices have different correlations to rates/inflation in different historical periods
(We haven't gotten there yet, but Monte Carlo is also a topic to discuss)
let's say you went from a low inflation environment where inflation was associated with higher returns on equities
but now you're in a high inflation environment and the opposite is true
and the last time you had something like this was 10 years ago
what is your sliding window gonna do
In your case, you're asking how well a strategy would've worked in different market regimes?
right
if you use too short a window, for instance
and any window is probably too short given how far back our datasets go in finance
the bias of your dataset depends on the timeframe right
if you made a trading bot in 2008-2009 to trade treasuries and it just always bought, that would be the right thing
if you made a trading bot including 2012-2021 and tried to use it in 2022 and it just always bought bonds
you'd just get blown up by the Fed
You could certainly test a model against some historical era, it's just somewhat inevitable that you're overfitting tho. This is where, perhaps, I'd Monte Carlo it rather than test against market actual
sure but implicit in the montecarlo design is that you understand the market dynamics well enough to produce a better sample than historical conditioned on the current environment
which is a big claim
Yah, absent a multiverse, what's the alternative?
pretending that history repeats itself
that's the necessary axiom to any of this anyway right
Not precisely, the necessary axiom is that there are patterns, but not that the patterns exhibit the same order/etc
But I feel you, enjoying talking about this (few folks here engage on this topic)!
sure me too
so it's hard to think about the assumptions we are implicitly making about what historical behavior means about future behavior sometimes
My favorite pres on this topic: https://www.davidhbailey.com/dhbtalks/battle-quants.pdf
YouTube version: https://youtu.be/e3h9xf3p1DE?feature=shared
I think it often goes unsaid what people are actually assuming with respect to that
my favorite thing that has no real theoretical basis but gets used by everyone is implied to realized volatility ratios
Yah, the entire market is so circular
one is future looking and the other is past looking
but everyone uses them in every asset class
what people should look at is the implied volatility from X days ago vs the realized volatility
but that doesn't even answer the same question
lol, that's interesting to people like us. But to the market players, they have zero hindsight
also the meaning of realized volatility is completely different if you're delta hedging vs not
but yeah who cares
low ratio good high ratio bad
Whelp, off to dinner, nice chat!
yup u 2
I installed miniconda but when I type python I get the non conda version. My terminal also doesn't detect the conda command. Do I need to add this manually to my environment variables? I'm asking because during the installation process it said doing so was not recommended.
is there a reason you want to use *conda? because my suggestion is to not.
I thought it was the de facto virtual environment manager for python
-some- people use conda, but there are multiple package managers. And conda is certainly not the most popular.
That said; In some data science circles, Conda is well entrenched.
For a discussion of this: https://www.reddit.com/r/Python/comments/10bxkjp/what_are_people_using_to_organize_virtual/
HELP idk whats wrong
is conda the same as Anaconda ?
conda was designed to solve certain problems for data scientists, not the whole python community. but in my opinion, those problems have since largely been solved for all of python, so conda only serves to create a barrier between its users and the rest of the python community.
Do print(df.columns.tolist()) and put the text in the chat. As a general rule, please do not post screenshots of text.
why?
Because they are harder to read and can't be copied from.
oh, sorry then
helping people often involves googling error messages or running segments of their code. and it's rude to expect people to retype stuff by hand.
no i'm not expecting people to repeat my code, i only want an explanation or some guidance
i apologize again
Don't worry about it for now. Just run print(df.columns.tolist()) and put the resultant text in the chat.
our professor said in order to display a column with all its rows in the data frame we should do df.loc[: , 'wanted column name'].head() , the print(df.columns.tolist()) is an equivalent to this ? and why didn't this code work , why it said keyerror and what does it mean
No, I'm asking you to run the code and show the result; it's information that I need to be able to help you
ohhh
okey then wait
But it appears that 'England' is not the name of a column in your dataframe
exactly it is thats the problem
thats why i was confused
wanna show u my dataframe ?
can i do a ss ?
I wanted you to run print(df.columns.tolist()) and put the text in the chat.
['A', 'B', 'c', 'D'] this is what appeared, i guess it is the D column named england
this is weird tho
Okay, so there's no column named England. But you expected there to be one with that name. Should the England column have been there when the dataframe was initially created?
so this gives me the names of the my dataframe columns? and u wanted to check if england was one of them ?
It might help if you show the code that creates the dataframe, and everything that comes after it
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Yes
yes , wanna show u my dataframe ?
Copy and paste all the code and put it in the paste bin, per the instructions a few messages up.
I want to see the code, in this instance. Not the dataframe.
whats a paste bin
Please read this message
i think i have 2 df
It looks like you have two dataframes: data_frame and df.
and data_frame is based on whatever defra_consumption.csv is.
yes, sorry i'm not thinking straight
That's okay
i didn't understand this
Look at lines 3 and 4. You define file_path as the location for a certain CSV file. And then data_frame is a dataframe that represents whatever is in file_path
so whatever columns are in defra_consumption.csv will be the columns for data_frame (but not df). It might be that 'England' is one of them.
make sense, @sterile flare?
yes true
exactly
great
this is what i realized after remembering i have 2 dfs
thank you so much, you're the sweetest
and sorry if i was rude
No worries, now you know for the future
I appreciate y'all's input on the conda advice. Ima just run with it and see how it goes.
Hi guys, im programming a python script to segment retinal blood vessel, the current result is this:
I've applied CLAHE and another pre processing
how do i remove the circle and the small particles???
you could use object detection and mask over the pixels where they are detected
What is often said about time series is that ML models aren't made to predict the future, we just use them to do so
this model is gonna get trained, whether it likes it or not
I am attempting to optimize a baseline fitting algorithm for signal preprocessing by instantiating it as a SKLearn Custom Estimator and using GridSearchCV. How can I define a custom score to optimize the baseline fit, i.e. maximise fit function smoothness and intersections with the signal function? Also, is that a rational way of defining an optimal fit?
The end goal is a SKLearn Pipeline with several preprocessing steps prior to deconvolution through model fit optimization.
is this vanilla SGD?
@wooden sail this is one for you
I'm batching the dataset and reshuffling it, using Adam as the optimizer
loss/train is large because I'm adding l1
I'm too used to using drop out to ever look at train loss
when I started this thing I didn't know about dropout, it's not coded into the transformer yet
I was mostly intrigued by how large the learning rate is and how the val loss isn't dropping that much at the end
But honestly, it's a meaningless observation on my part, they could be reasonable numbers for your domain
it's a valid observation, it has been overfitting every time
Any method to calculate the standard error of c, for a best-fit line given by the equation log(y) = m*log(x) + log(c)?
at some point val will plateau and start to grow
I'm using the LR schedule originally proposed in the 2017 paper, where the transformer was introduced
tho the formula doesn't seem to work as intended for small dimensions like these
I adapted it so that it defaults to 1e-3 if it tries to output larger values
but the intended behaviour is that it starts super small, grows up to 1e-3 or something, and then decays with the inverse square root of the step
oh that's solid then. The most important thing is having a decent lr to start with. If the paper suggests one, roll with that
I might need to modify it tho, I don't think they tested it for these values
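For reference, a sketch of the warmup schedule from the 2017 transformer paper, with the cap described above added as an optional argument. The `d_model` values and the default `warmup_steps=4000` used in the demo are assumptions for illustration; `max_lr` is the chat's adaptation, not part of the paper's formula.

```python
def transformer_lr(step, d_model, warmup_steps=4000, max_lr=None):
    """lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)

    Linear warmup for `warmup_steps` steps, then inverse-square-root decay.
    `max_lr` optionally caps the result (an adaptation, not in the paper).
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    lr = d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    return min(lr, max_lr) if max_lr is not None else lr

for d_model in (512, 16):
    peak = transformer_lr(4000, d_model)
    capped = transformer_lr(4000, d_model, max_lr=1e-3)
    print(f"d_model={d_model}: peak lr = {peak:.2e}, capped = {capped:.2e}")
```

Because the peak scales with `d_model**-0.5`, a very small model gets a noticeably larger peak rate than the 512-dimensional setup, which may explain why the unmodified formula misbehaves at small dimensions.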
this is a pretty broad topic and the score depends a lot on your application. kinda rough without further context. regarding smoothness and intersections with the data, you can achieve this by having an L2 error term, which measures how good the fit is at points you specify, and then an additional term measuring how large the derivative is, trying to minimize both. this is also roughly the idea behind classical methods like splitting the signal with low and high pass filters first
its a chromatographic signal with gradient elution producing a curved baseline which i need to remove. ideally after correction, a maximal number of peak bases intercept with zero.
the baseline fitting algorithm essentially smoothes windows of the signal until the baseline is estimated, the hyperparameters to optimize are window size and number of smoothing iterations
so as I understand it, i would mask the signal to find the "zero" points, i.e. local minima, and see how well the estimation fits them? Is the size of the derivative the smoothness metric? I thought minimizing the second derivative would be the way to go.
I was thinking of scoring by measuring how many zero points existed in the signal minus the baseline, avoiding the need to find local minima
the size of the derivative is a measure of a type of smoothness, yes. there are several kinds of smoothness, not just the basic "is the function even differentiable". many of the definitions involve finding a bound for how much the function changes if you change the input
and indeed, for spectral data like yours, one tends to look for the local minima and try to make those points zero, meaning the baseline function doesn't need to pass through all data points exactly
there are many papers discussing this topic if you just look up "baseline fitting algorithm" on google scholar, most of them on spectrographic data
sadly you'll probably find that in your data, even after optimization, probably 0 points will be exactly 0, so you'll have 0 or only very few zero crossings
1d peak-finding is not very complicated though, you could almost consider this an input to your procedure and just calculate them ahead of time. this would tell you which points in the domain to include into your cost function
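A rough sketch of such a cost function: squared misfit at precomputed anchor points plus a smoothness penalty. It uses the second difference as the roughness measure, which is one of several options mentioned in this thread. The synthetic signal, the anchor indices and `smooth_weight` are all made up for illustration.

```python
import numpy as np

def baseline_cost(signal, baseline, anchor_idx, smooth_weight=1.0):
    """Lower is better: squared misfit at anchor points plus a roughness penalty."""
    signal = np.asarray(signal, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    misfit = np.sum((signal[anchor_idx] - baseline[anchor_idx]) ** 2)
    roughness = np.sum(np.diff(baseline, n=2) ** 2)  # discrete second derivative
    return misfit + smooth_weight * roughness

# Synthetic example: curved baseline plus one Gaussian peak.
x = np.linspace(0.0, 10.0, 201)
true_baseline = 0.05 * x ** 2
signal = true_baseline + 3.0 * np.exp(-((x - 5.0) ** 2) / 0.1)
anchors = np.array([0, 40, 160, 200])  # indices assumed to lie on the baseline

good = baseline_cost(signal, true_baseline, anchors)
bad = baseline_cost(signal, np.zeros_like(x), anchors)
print(good, bad)
```

Grid-search tools that treat larger scores as better, as scikit-learn's `GridSearchCV` does, would need the negative of this cost as the score.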
absolutely. the baseline correction is actually an attempt to increase the accuracy of scipy.signal.peak_widths by removing convolution caused by presence of baseline.
This is true. My question is however, how to implement optimisation through SKLearn grid search, as I want to construct the processing pipeline in SKLearn with XGBoost classification modeling on the processed signals. I get the feeling im on the right track though - define the metrics, say L2 norm and derivative minimization, and implement as a custom scorer for the grid search
sadly the specifics of sklearn are beyond me, i don't use it. but that sounds about right
sklearn's grid search says something about defining the estimator with a score function, which i presume entails defining a function for the baseline, a score, and having sklearn fiddle with the baseline function's parameters through grid search
fair dues. btw this is the algorithm I'm applying:
made a small change in the output data, removed all regularization, it's looking promising
too soon tho
but gonna let it cook
is there a particular framework you recommend working in for problems such as this? Thus far everything had been written as python classes from scratch, based around pandas, with scipy.optimize for curve fitting
reduced the model size by half and adapted the LR scheduler to have its intended behavior, I'm getting there for sure
like, it's not baaad
it's very sensitive to punctuation changes, so there's quite a bit of data augmentation that can be done here
can anyone provide some references to federated learning and differential privacy
I think the lesson I'm gonna take from this one is that I should define my end goal more clearly. I'm trying to make the model not overfit, but I don't even know if the value b4 overfit is good or not.
Hi, I'm implementing a python script using opencv that segments the retinal blood vessels, the current result is this one:
how do i remove the small points????
Apply an erosion kernel
But also, try to not produce them in the first place
which kernel size???
No idea, you have to experiment with it
can anyone help me with pytorch? torch.cuda.is_available() returns false even though i installed cuda
Uhm, run nvidia-smi in the terminal
it opened then closed immediately
Shouldn't open anything at all, should just print out a table reporting the status of the GPU
Which OS r u using
windows
Uhm
it opened a console then closed
Literally never debugged this on windows, but I expect funky non sensical behavior like this
Maybe try to go via the Linux subsystem thing
Torch doesn't use the system's cuda, it uses a bundled one. How did you install torch?
i really should switch to linux
using pip
pip install
If you just did pip install torch on Windows, that'd get you the CPU-only version.
Yeah Linux is a good choice since almost all production servers are Ubuntu
I think that's actually how I've been installing it tho
Gonna check
i used this command .\Scripts\pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html then it gave me error saying it does not exist then i used the one in the website pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
I just do pip install torch
i am following a guide to install a Text to spech thingy
huh, and that lets you use the GPU? that's weird; the builds on pypi aren't even the right size (the ones with bundled CUDA are way bigger).
for some reason they say (the guide) you need an older version of pytorch
You should really set up some sort of robust hyperparameter tuning regime
--index-url https://download.pytorch.org/whl/cu118 should be the right way, yeah. Maybe you didn't uninstall torch before that, and so pip didn't replace the CPU one with the GPU one?..
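To see which build actually ended up in the environment, something along these lines can help. `describe_torch_install` is a helper name invented here; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are the real PyTorch attributes, and the import is guarded so the snippet also runs where torch is missing.

```python
import importlib

def describe_torch_install():
    """Report which PyTorch build is importable in the current environment."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,           # CPU-only wheels often end in "+cpu"
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }

print(describe_torch_install())
```

If `built_with_cuda` is `None`, the CPU wheel is still installed, and uninstalling torch before reinstalling from the CUDA index is probably the next thing to try.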
I did this as a student as well, fiddle with tons of hyperparameters for ages. It's not time efficient, best to set this all up, run it, sleep and check the results in the morning
i used this command .\Scripts\pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html then it gave me error saying it does not exist
torch 1.8.0 only supports up to python 3.9, probably that's why these versions didn't work
Uhm, might be because colab comes with a lot of pre installed stuff
Yeah so true, honestly it even gets to not being healthy
I got a bit of infra setup, I can build on top of it
I'm thinking of using the GitHub actions API to programmatically start several training loops
Instead of manually triggering them
Yeah, that's the way to go
Well, at least some variant of it
Before I do experiments at work nowadays I think about what I want to evaluate etc. and then build something ad hoc to automate training / hyperparams etc.
but it's probably better to use mlflow, tensorboard, optuna, ... for this
The infra I have saves me on GPU compute. I could setup on kaggle/colab, but the free tiers will just run out over night
I think kaggle connects to Google cloud but all our credit is on AWS
But yeah, lesson about knowing the goal is well learned here
Gonna run over the equations for the cross entropy in the context of my output data (which is kinda funky) and make a Fermi estimate of what I would consider a good result
Or just an actual estimate
But so, this basically means that I need to get used to having an overnight delay between setting up experiment and getting the results
yes and no
Hmmm
I can only speak of my personal experiences but I usually think "okay this is my task" and then I draw out a schematic about how I'll try and evaluate what I want to do, what types of models, what types of metrics and I code this all up.
Then I might run experiment A manually a few times to see if it runs and try and make sense of the initial results, afterwards I run the experiment pipeline.
As it's running I prepare experiment B and repeat.
Yeah that sounds reasonable
There's quite a lot of stuff tho. So there's dropout, L1 and L2 regularization, the various LR schedules that themselves may have several parameters, and then there's potentially 3 models to compare across various sizes with quite a bit of parameters themselves
And this is just the first task, I'd wanna do this for at least 3 or 4, which I think is what they did in the MetaFormer study
And I haven't even gotten to the data
Yeah, hence why you should parametrize it and use some optimizer.
It's a nasty problem. One of the first things they teach you in intermediate ML courses is that you train models to solve a problem that is usually convex. On top of this you have an argmin of the loss with respect to the hyperparameters, Loss = F(hyperparameters), but this isn't a convex problem whatsoever.
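A minimal sketch of the "parametrize it and use some optimizer" idea, with plain random search swapped in as the optimizer. The `validation_loss` function is a made-up stand-in for "train a model and return its validation loss", and the parameter ranges are assumptions chosen only for illustration:

```python
import random


def validation_loss(dropout, weight_decay, lr):
    # Hypothetical stand-in for a full training run; a real version would
    # train the model with these settings and return the validation loss.
    return (dropout - 0.2) ** 2 + (weight_decay - 1e-4) ** 2 + (lr - 3e-3) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best set seen so far."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "dropout": rng.uniform(0.0, 0.5),
            "weight_decay": 10 ** rng.uniform(-6, -2),  # log-uniform
            "lr": 10 ** rng.uniform(-5, -1),            # log-uniform
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=50)
print(best_loss, best_params)
```

Random search makes no convexity assumption at all, which is why it is a common baseline before reaching for something like optuna.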
I feel like this image seems so unprofessional in a paper on NLG
I think this whole exercise is gonna be good for me, even if it does not come out as the perfect workflow at first.
I do have some ideas on how to make this hyper parameter stuff searchable with gradient descent. But I'm sure a lot of people have tried it b4 me
No you're right, do it like this, reflect on it and improve
that's not really true with reinforcement learning. in a sense, a robot learning to stand up and walk is learning to predict the future
and you aren't predicting the future per se you are predicting the best action given the current state
Can anyone tell me how np.random.binomial(1000, 0.7, 500) is different from np.random.binomial(1000, 0.7)? Is the size parameter different from the number of times?
I agree that could be clearer
the question with reinforcement learning is if the environment admits information about the reward
the number of trials is a distribution parameter but the number of samples is not
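To make the trials vs. samples distinction concrete, a small sketch using the same two calls from the question:

```python
import numpy as np

# n=1000 trials and p=0.7 are parameters of the distribution itself.
one_draw = np.random.binomial(1000, 0.7)

# size=500 only says how many independent draws to return;
# each draw is still a count of successes out of 1000 trials.
many_draws = np.random.binomial(1000, 0.7, 500)

print(np.ndim(one_draw), many_draws.shape)
```

So the first call gives a single number somewhere around 700, and the second gives an array of 500 such numbers.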
I know what conditional probability is, but man it will be great if anyone of you could please help me interpret the equations
looking at cross entropy loss is not a good way to do it, I care more about the percentage of correct guesses
random chance is on the order of 1e-5
cross entropy loss is basically telling you whether the probability distribution of guesses fits the data. i wouldn't write it off so quickly
what about it specifically?
In this case it's not the full picture. I should even be separating the loss into two components. One for replicating the input the other to predict the next token, which is the classification
All this time I thought I needed to reduce model size, now I see that I get better results by increasing it
see how val actually increased, but the percentage of correct guesses got better
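A toy illustration of how loss and accuracy can move in opposite directions. The probabilities below are made up, and it uses a binary simplification rather than the actual next-token setup:

```python
import numpy as np


def cross_entropy(p_true):
    # Mean negative log-probability assigned to the correct class
    return float(-np.mean(np.log(p_true)))


def accuracy(p_true):
    # Binary case: a guess is correct when the true class gets more than half the mass
    return float(np.mean(np.asarray(p_true) > 0.5))


# Probability given to the correct class for four validation examples (made up)
earlier = [0.6, 0.6, 0.4, 0.4]   # hesitant model, 2 of 4 correct
later = [0.9, 0.9, 0.9, 0.001]   # 3 of 4 correct, but one very confident miss

for name, p in [("earlier", earlier), ("later", later)]:
    print(name, accuracy(p), round(cross_entropy(p), 3))
```

One confidently wrong guess is enough to push the average loss up even though more guesses are correct, so it is worth tracking both numbers.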
Anonymise user identities in large databases to ethically employ machine learning in understanding customer trends and behaviour without violating their rights.
Any idea on how to go forward for this problem statement?
I suggested using Federated Learning + Differential Privacy earlier. Have you looked in them?
I also shared a github repo on research paper implementation of Federated Learning.
Try checking that as well.
If you however don't fancy the idea of reading a research paper and using the code implementation of that paper to learn a new topic, then I'll suggest taking your time to go through this nice detailed blog and tutorial from Flower.
https://flower.dev/docs/framework/tutorial-series-what-is-federated-learning.html
It's not compulsory you combine Federated Learning & Differential Privacy though.
Both concepts address privacy concerns in the context of data-driven tasks using different approaches.
You can just implement either Differential Privacy or Federated Learning and you'll still be on point as well.
anyone here good in advanced math?
Honestly at this point I'll be happy with an overfit
Ok training set is getting to 50% correct guesses
So I think I'm getting somewhere
Is this a school project? If yes, then I think, maybe using the approach I suggested earlier might be an overkill.
A more straightforward concept is really just anonymizing information about the identity of people captured in your dataset.
Anonymize their name, address, county, and any other sensitive information about the person in the data.
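A minimal sketch of that simpler route, assuming the identifying columns are already known. The column names, the records and `SECRET_KEY` are all hypothetical. Strictly speaking, keyed hashing like this is pseudonymization rather than full anonymization, since rare combinations of the remaining columns can still identify someone:

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep it out of the dataset
PII_COLUMNS = ("name", "address", "county")  # assumed list of identifying columns


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a stable keyed hash so rows can still be linked."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


records = [
    {"name": "Alice Example", "address": "1 Main St", "county": "Kent", "purchases": 3},
    {"name": "Bob Example", "address": "2 Side St", "county": "Kent", "purchases": 7},
]

anonymized = [
    {k: pseudonymize(v) if k in PII_COLUMNS else v for k, v in row.items()}
    for row in records
]
print(anonymized[0])
```

The same input always maps to the same token, so customer-level trends survive while the raw identities do not appear in the training data.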
Omg validation loss just started converging out of nowhere around 75% correct guesses on the train set
It was at like 12 randomly jumping around, now it's under 1 and dropping
Honestly I gotta just let it cook
I did look into them, especially the recommendation systems, and this is for a hackathon so we need to implement a technique which is suitable for the above problem statement.
So what I initially thought is: we have a large dataset, and I would just search on Google for the entities which are Personally Identifiable, then check the similarity of those against the columns of the dataset. And I am stuck here as to what to do
when would one use matplotlib compared to plotly
i dont know if i should stick to matplotlib or learn plotly
If you prefer better aesthetics and interactivity of plot
my notebook tab just crashed, : D........
I think almost everyone started with Matplotlib and Seaborn before learning either Plotly, Bokeh, Cufflinks for creating interactive plot.
Are you running the experiment in your local machine or cloud?
Yes this was the case for me too
I was just wondering since i feel fluent enough with matplotlib to learn something more complicated
But there would be no point learning it if its worse / inefficient
It was kaggle
I'm gonna give it a rest, tomorrow I'll implement the cloud infra stuff so I don't have to babysit notebooks
Hackathon hmmm...
If I were in your team, I'd be using differential privacy or Federated learning lol at least to have a shot at winning the money.
Are winners gonna be announced based on the concept used or solely based on model performance on a specific evaluation metric?
It's mostly the concept and accuracy, plus I just need a roadmap of how I can do it in order to submit an abstract. Because this is my first time diving into differential privacy
Oh I see. Having to randomly move your mouse every 30 mins in order to keep the notebook active
Well, I still prefer Kaggle to free tier of Colab
Colab ran out yesterday for me. I think I got the right idea of what I want to do so I'll just have spot instances training with diff hyper parameters
But damn, this really shouldn't be so hard, it's just a classification problem
I'm afraid I might not be of much help at this time; since I've not worked on any project where I had to implement differential privacy yet.
However, I'm sure there are more knowledgeable people here with much experience in Differential Privacy who can be of help.
If I were you, I'd go down the rabbit hole of checking research papers with code implementations on this same topic or even learning from YouTube or something. (Devote 1 or 2 days and you'll have a lot to write in the abstract you're expected to submit)
Hopefully, your team wins this Hackathon. All the best
I don't think Plotly is complicated though. Since you're confident enough with Matplotlib, you shouldn't struggle with Plotly.
The syntax of Plotly is somewhat similar to Seaborn
Hi can you tell me what can i do to improve in python and what are the projects can I do
!kindling has a list of projects you can do
The Kindling projects page on Ned Batchelder's website contains a list of projects and ideas programmers can tackle to build their skills and knowledge.
Thank you
cat, N,
I know conditional prob, but what the heck are these chained "="
You mean ~ ?
That's the notation for saying that a given random var follows a given dist
no the conditional prob equation
What about it ? The || ?
Yeah that's a good question
also the double arrow?
This paper is nutz.
5% contribution, 95% flashy equations
Double arrow is not any standard notation I am aware of
Anything that's not standard they have to define it
And its preferable not to use anything that's not standard
no supplementary, nothing
Yeah that's kinda wild, I wonder if it becomes standard notation around some specialized niche
Maybe follow the closest related citation
See if they define it there
Its like those fancy restaurants
yeah i was checking that one only
found it
but it's a bigger nightmare
thanks for the suggestion
yup integration of kl loss from one of the term
The double arrow thing, maybe it relates to the concept of clustering somehow idk
i will try to find.
Is the pytorch documentation an open source thing ? I really wanna contribute to it if it is
I see, they are generated from the docstrings
I wonder if they're open to mods on this stuff, I can make them easier to understand
the || i think is just part of their KL notation, it might signify something else but i would just read it as part of the KL divergence operator/function
"Cat" is the categorical distribution, Bernoulli with >2 categories or multinomial with n=1 trial
that double arrow notation is new to me, it might be defined in reference 37
i agree that it appears to denote some kind of clustering structure, but i can only guess as to what it means
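A small sketch of the "Cat is a multinomial with n=1 trial" reading, using made-up probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = [0.2, 0.5, 0.3]  # made-up category probabilities

# One multinomial trial gives a one-hot vector; its argmax is the sampled category.
one_hot = rng.multinomial(1, probs)
category = int(np.argmax(one_hot))

# Drawing category indices directly follows the same distribution.
samples = rng.choice(len(probs), size=10_000, p=probs)
freq = np.bincount(samples, minlength=len(probs)) / samples.size

print(one_hot, category, freq)
```

With two categories this reduces to a Bernoulli draw, and the empirical frequencies end up close to `probs`.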
Hi, I'm trying to make a model that converts bullet points into full sentences. I'm not sure how to structure my dataset. I currently just have a .txt file with something like:
Input:
- finished data collection
- started cleaning data
Output:
I finished the data collection and I started cleaning the data.
Is this good enough? I've seen some people use json format for this, what's best?
Uhm, I'd just pickle a python object, or put it into a parquet file or an SQLite file
What matters most from an ML perspective is that the outputted sentences are useful. There is no "best" output format. It's just a matter of how you plan to use them downstream.
Ok, thank you!
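For what the JSON variant could look like, here is a minimal sketch that stores one input/output pair per line (JSON Lines). The field names `input` and `output` are just an assumption, not a required schema:

```python
import json

examples = [
    {
        "input": ["finished data collection", "started cleaning data"],
        "output": "I finished the data collection and I started cleaning the data.",
    },
]


def dump_jsonl(rows):
    """Serialize one example per line."""
    return "\n".join(json.dumps(row) for row in rows)


def load_jsonl(text):
    """Parse the text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


text = dump_jsonl(examples)
print(text)
print(load_jsonl(text) == examples)
```

Compared to a free-form .txt file, this keeps each example unambiguous even when an output sentence contains dashes or newlines.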
What kind of model is it btw?
Presumably there are examples that are more intricate than "I ... And I ... And I ..."
I'm thinking of a sequence to sequence model.
So I'm in this research hub and we have to send weekly emails about our progress through that week. I forget to send the emails most of the time. I usually type down my hours and what I completed. I'm making this so that it can take the notes I have, convert them into full sentences and have the email be sent out automatically
True, but I'm making this model just for this use lmao
I definitely would not use pickle if I could avoid it
~ means distributed as
like x ~ N means x is normally distributed
you know the big fancy N
does anyone have a pdf of this book? Pandas for Everyone: Python Data Analysis, 2nd Edition by Daniel Chen
i'd really appreciate it
i made the wrong O'Reilly account and cannot access it but i have hw due
in a few hours
nvm i got it now
@serene scaffold can I ask you a question about markov chains? (I saw you were a Computational Linguist). I know it can create pseudo-realistic sentences but sometimes they don't flow very well. Are there any solutions for this type of markov chain problem or is that just a known limitation? (Sorry to bother.)
that's a known limitation. generating text with markov chains is a primitive form of technologies like ChatGPT.
which obviously doesn't have that particular limitation.
I see, but using Transformers is more expensive because of training time, right?
or am I wrong?
if you can generate text with markov chains, all the data you have about which tokens can be chained together is, collectively, a language model.
I'm not aware of a non-arbitrary point at which a language model becomes a "large" language model. but creating transformer-based LLMs from scratch is quite computationally expensive, yes.
you might enjoy this reddit post I wrote when I created a markov chain language model for my homework
I therefore submit to all of you one of those most statistically probable passages to appear in the Book of Mormon.
Ha, I'm even funnier than I remember.
lol
I have been having a lot of fun with markov chain
And wondered if there is any fix. thanks for answering all my questions super informatively!
of course 
if the "problem" is that your chains are producing sentences that seem to switch sentence structure partway (you can see a lot of examples of that in the reddit post), the only solution really is to increase the n for the ngrams
or decrease the temperature. I guess.
but both of these will just make the text more similar to passages from the training data.
ya
markov chains with ngrams aren't sophisticated enough to produce things that are "new"
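To see what "increase the n" does, here is a minimal word-level sketch. The toy corpus and function names are made up, and `n` here is the number of context words, so the model effectively stores (n+1)-grams:

```python
import random
from collections import defaultdict


def build_model(tokens, n):
    """Map every n-word context to the list of words that followed it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n):
        model[tuple(tokens[i:i + n])].append(tokens[i + n])
    return model


def generate(model, seed_context, length, rng):
    """Extend the seed by repeatedly sampling a follower of the last n words."""
    out = list(seed_context)
    n = len(seed_context)
    for _ in range(length):
        followers = model.get(tuple(out[-n:]))
        if not followers:  # context never seen in training: stop
            break
        out.append(rng.choice(followers))
    return out


corpus = "the cat sat on the mat and the cat ran to the mat".split()
model = build_model(corpus, n=2)
print(" ".join(generate(model, ("the", "cat"), 8, random.Random(0))))
```

Every window of n+1 generated words is copied from the corpus, which is exactly why a larger n reads more fluently but also reproduces the training text more closely.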
ya what are the best sources for this?
I dunno
fair enough!
I'm trying to run the example code for d3graph from here: https://erdogant.github.io/d3graph/pages/html/Edge properties.html
... but whenever I execute, I just get a "File not found" error from Firefox, saying the temporary file it's trying to create doesn't exist.
"Firefox can't find the file at /tmp/tmpog5luy4x/d3graph.html"
...any ideas what to do here?
Yeah just listing it out, I sometimes use pickle temporarily, but even that I should avoid tbh
I'm gonna try to design a generalized pipeline
Only one input, which will be in json form
I gotta brainstorm
here's the initial version of the game plan
this is similar to what I already have, the only difference is that I'm extending it so that train.py can be selected
there's possibly a much better way though
if I get an AWS AMI with self hosted runners pre-installed
I need to read up on it
https://github.com/machulav/ec2-github-runner#example
this is awesome if it actually works
looks almost done, but it's not stopping the instance
this gonna be epic
this would be amazing if it works out
and it worked out, this is great, im very happy rn
hello, ive discovered LM studio, where u can download free open source models and run them locally. Is there a way to download them and import them on my own python script? with keras or tensorflow?
Hi
Guys, which topic would you think is more interesting to work on for a Master Thesis ? I'm kinda on the fence about that
DeepStereoBrush â Depth Map Interpolation Using Deep Learning
Neural Networks Optimization for Edge/Mobile Computing (such as CLIP network, etc..)
Hi, I have two questions about dummy variables and feature selection in machine learning.
First, so I know that to avoid the dummy variable trap, I should drop a column from the dummy variables. So now I see some people who say that you don't need to drop a column from your dummy variable because Sklearn will do it automatically.
I read some articles that said, "You don't have to do this because the sci-kit learn library automatically removes one of the variables for you in the following code."
# X is the training dataset and we are using Sci-Kit Learn
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [3])], remainder = 'passthrough')
X = np.array(ct.fit_transform(X))
print(X)
Other people said that you need to do it manually, and when I searched and looked at the OneHotEncoder parameters on the Sklearn website, I found that
drop : {'first', 'if_binary'} or an array-like of shape (n_features,), default=None
so the default of drop is None, not 'first'.
So, does scikit-learn actually take care of the dummy variable trap and remove a column, or should I do it manually? I'm confused.
Second, I read an article that says, "Backward Elimination is irrelevant in Python, because the Scikit-Learn library automatically takes care of selecting the statistically significant features when training the model to make accurate predictions."
So again, should I do feature selection manually? or Sklearn takes care of it?
Thanks.
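For the first question, a quick check on a made-up column is probably the easiest way to settle it. This sketch compares the default encoder with `drop="first"`:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical column with three distinct values
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Default (drop=None): one column per category, nothing is removed
full = OneHotEncoder().fit_transform(colors).toarray()

# drop="first": the first category is dropped explicitly
reduced = OneHotEncoder(drop="first").fit_transform(colors).toarray()

print(full.shape)     # 3 columns
print(reduced.shape)  # 2 columns
```

So, consistent with the default quoted from the docs, nothing is dropped unless you ask for it. Whether dropping matters depends on the model: it is mainly an issue for unregularized linear models.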
Tbh I think it's only you that can decide this, because what's interesting to me might be boring to you, and vice versa.
Pick what you really have interest in and perhaps, narrow it down to availability of nice / approachable professors who are specialised in that same field in your school.
Hello, I have to generate a 'random on a grid' grid, every mesh has to be occupied by a point, can someone help ?
Here is my code in the case of a regular grid :
import numpy as np

def generate_gridded(dim, nb_pts):
    # Regular grid with about sqrt(nb_pts) points along each axis
    dx, dy = dim
    n = int(np.sqrt(nb_pts))
    x = np.linspace(0, dx, n)
    y = np.linspace(0, dy, n)
    X, Y = np.meshgrid(x, y)
    return X.flatten(), Y.flatten()
Thank you
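One possible reading of "random on a grid" is a jittered grid: split the domain into cells and draw one uniformly random point inside each cell, so every cell is occupied. A minimal sketch under that assumption (the function name and `seed` parameter are made up):

```python
import numpy as np


def generate_random_on_grid(dim, nb_pts, seed=None):
    """One uniformly random point inside each cell of an n x n grid."""
    rng = np.random.default_rng(seed)
    dx, dy = dim
    n = int(np.sqrt(nb_pts))           # n cells along each axis
    cell_w, cell_h = dx / n, dy / n
    ix, iy = np.meshgrid(np.arange(n), np.arange(n))
    # Cell origin plus a random offset in [0, 1) of the cell size
    x = (ix + rng.random((n, n))) * cell_w
    y = (iy + rng.random((n, n))) * cell_h
    return x.flatten(), y.flatten()


x, y = generate_random_on_grid((10.0, 5.0), 16, seed=0)
print(x.shape, y.shape)
```

As in the regular version, the point count is rounded down to a perfect square.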
why do some ROC curves look like the left one, and some like the right one?
is the right one using a heavier discretisation of thresholds?
and due to rectangular interpolation, it looks choppy?
one job per hyperparameter set
@past meteor what do you think ? I literally just setup my set of hyper parameters as a matrix in the GitHub workflow and it runs them sequentially (or concurrently if possible)
It doesn't really handle fault tolerance though, if AWS takes away the spot instance it just kinda fails
Still working on how to handle it
I think I can possibly check for unfinished jobs and try to schedule them, but idk yet
definite improvement! Good job. I admire your dedication
thanks!
here's how the workflow is looking if you're curious
ideally I think I'm gonna do 1 workflow per experiment instead of having a bunch of input parameters
Do all of them fail as soon as the first fails or what?
no, I'm disabling that on fail-fast: false
but I worry about the case where the spot instance is removed by aws
I can make this sort of "idempotent"
all jobs run when I run the workflow, but they will first check if that set of hyper parameters were already trained
yeah that's how it's gonna go
and I can set it to dispatch a totally separate workflow
that checks for failed jobs and restarts this one in case of it
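A rough sketch of the "check if that set of hyperparameters was already trained" step. It assumes results are written as small JSON files to some directory that survives the instance, and `train_fn` is a hypothetical placeholder for the real training entry point:

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_id(hparams):
    """Stable identifier for a hyperparameter set, independent of key order."""
    canonical = json.dumps(hparams, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_once(hparams, results_dir, train_fn):
    """Train only if no result exists yet for this hyperparameter set."""
    marker = Path(results_dir) / f"{run_id(hparams)}.json"
    if marker.exists():
        return json.loads(marker.read_text())  # finished earlier: skip training
    result = train_fn(hparams)
    marker.write_text(json.dumps(result))      # written only after success
    return result


with tempfile.TemporaryDirectory() as tmp:
    fake_train = lambda hp: {"val_loss": 0.5, "lr": hp["lr"]}  # placeholder
    print(run_once({"lr": 1e-3, "dropout": 0.1}, tmp, fake_train))
    print(run_once({"dropout": 0.1, "lr": 1e-3}, tmp, fake_train))  # reuses the result
```

Because the marker is only written after training succeeds, a job killed mid-run leaves nothing behind and gets retrained on the next dispatch.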
After making a model , what do we do?
kick back and relax
pour another coffee
No what I meant was how to use that model
depends on the model
what model did you make?
Okay how do you deploy and integrate the model
same answer
I didnt get you
the way you deploy a model depends on the model
Doesnt every model need to deployed on the cloud
no. but even if they did, the way that you deploy it depends on the model
Could you give me an example?
one technique is to have the model in a docker container that has a REST API.
What are the other ways
there's probably lots more ways than even I know about. it really depends on lots of different factors.
this is like asking "how do people deploy software" without being any more specific.
Hmmm My bad
Is REST a framework?
if you want to make the question more specific, I or someone else might be able to help
No, it's a style of API. you can make them with FastAPI, among other frameworks.
WHat about flask and django?
people who want to make REST APIs are switching from flask to FastAPI, from what I understand.
I wouldn't try to make a REST API with django.
Ahhhhh sorry mate if I didnt comprehend my question properly , such a newbie trying to figure out things
No problem. You are welcome to look through my early message history.
django is still a great choice if you know you need a database and don't mind doing a little bit of DIY work when it comes to processing inputs and formatting outputs
however I think there is a library that makes doing JSON CRUD stuff easier with django
"django rest framework" i think it's called
or django ninja
that one is new to me. i haven't used django since 2017
I used django rest framework (DRF) for the backend of an app I never finished and it's soul crushing
Perhaps you should ask the question?
What I should learn at first
Kaggle.com/learn is a good start
Oooh that's super cool
You have been learning by this?
Me? No, but I've been using this stuff for a while
It's a nice intro because it doesn't try to teach too much at once, I recommend it a lot
Hi folks, in pandas, why is this the case:
import pandas as pd
df = pd.read_csv("...")
df.groupby('country').size() # calling the size() function on the group
def size_filter(grp):
    return grp.size() > 2
df.groupby('country').filter(size_filter) # error on size is not callable
Also notably, when I do type(grp) from within that filter function, it's a Series. I'm just not sure what I'm operating on fundamentally in my filter function - why is it a Series? Which Series is it?
when you do df.groupby('country').size(), then df.groupby('country') is a DataFrameGroupBy object, and size is a method of DataFrameGroupBy.
When you call DataFrameGroupBy.filter, and you pass a function to that method, the function is applied to a DataFrame for each group.
not a DataFrameGroupBy
First paragraph makes sense. But the function is being applied to a Series, not a DF, no? If that's the type being printed
Thanks for getting back to me
can you create an entirely self-contained example, where df is defined in terms of (for example) a dict literal?
one way to do this is to do print(df.head().to_dict('list')) with the actual dataframe that you have.
but it needs to have enough unique values for the country column to be interesting.
yeah just double checked, I was calling type on the wrong thing - pitfall of notebooks. I see it's a DataFrame now
Well, that certainly clears up a lot - thank you
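For anyone hitting the same error, a self-contained sketch with a made-up frame. Inside `filter` each group is a plain DataFrame, where `size` is an integer attribute (rows times columns) rather than a method, so `len(grp)` is the row count you want:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["FR", "FR", "FR", "DE", "DE", "JP"],
        "value": [1, 2, 3, 4, 5, 6],
    }
)

# On the GroupBy object, size() is a method returning rows per group
sizes = df.groupby("country").size()


def size_filter(grp):
    # grp is a DataFrame holding the rows of one country
    return len(grp) > 2


kept = df.groupby("country").filter(size_filter)
print(sizes)
print(kept)
```

Only the rows from countries with more than two entries survive the filter.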
I appreciate that you recognize the pitfall of notebooks 
(I use notebooks for some things, for as much as I shit on them.)
Useful for visual representations of things during exploration, but yeah, makes keeping track of things very difficult
is there any better models than Black Scholes in terms of call option price?
For what? The people who typically answer questions in this channel are scientists. So we don't think of models in terms of their monetization schemes.
Ah ok mb
That model you mentioned. What does it do?
So the model, which I've recently heard about while researching on YouTube: the Black-Scholes model is a pricing model used for the valuation of stock options. I would say it looks good to me as it takes into account the volatility of the price, the risk-free rate, etc.
However, I would like to know if there's any better model I could use or should I stick with learning this new model
Black-Scholes is a very fundamental model and worth learning. A lot of things are based on or derived from it.
It's one of those: everyone uses it to some extent, and even if there are some flaws and weaknesses, it's still pretty good.
The interesting thing about it is that implied volatility is derived from it (since you can observe the current price, you can solve for ivol)
interesting, anyways I'll learn this new model and hopefully know how I can implement or apply it to my future projects
It sounded interesting to me, but I wanted to know if it's worth it. Now I know it's worth it, so thank you so much
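To make the implied volatility remark concrete, a minimal sketch for a European call on a non-dividend-paying stock. The inputs are made-up numbers, and the bisection solver is just one simple way to invert the formula:

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Find the sigma that reproduces an observed price, by bisection.

    Works because the call price increases with sigma.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


price = bs_call(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)
print(price, implied_vol(price, S=100.0, K=100.0, T=1.0, r=0.05))
```

Feeding the model price back in recovers the volatility it was computed with. With a real market price in its place, the result is the implied volatility.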
does anyone know a good implementation of Kendall's W concordance coefficient?
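In case a small hand-rolled version is good enough, here is a sketch of the textbook formula W = 12 S / (m^2 (n^3 - n)) for m raters and n items. It assumes no tied scores within a rater, so there is no tie correction:

```python
import numpy as np


def kendalls_w(scores):
    """Kendall's W for an (m raters x n items) array without ties."""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    # Rank items within each rater (1 = lowest score); assumes no ties
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


agree = [[1, 2, 3], [10, 20, 30], [0.1, 0.2, 0.3]]  # identical rankings
oppose = [[1, 2, 3], [3, 2, 1]]                      # reversed rankings
print(kendalls_w(agree), kendalls_w(oppose))
```

Identical rankings give W = 1 and two exactly reversed rankings give W = 0. Data with ties would need average ranks plus the usual correction term.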
anyone here used Quarto on VS? I need some assistance with running Quarto
So I was able to compare dxy , spy , vix but I want to use python to build a model using this because well the dxy acted as the open interest while vix declined sending spy up
Anybody want to help me also finish a good project? It's turning the COT report into the DMI indicator. Basically taking the values of the DMI to spit out a bias and trend confirmation by acting as a live COT report
Because both have 3 variables
How is pitfall of a notebook if you can explain that would be great
Restart and re-run all helps a lot
It's user error still, but easier to make mistakes when you might have modified a variable later down in the notebook and you run cells further up
Nice bro
Thank you brother
Do you guys think a DMI can be used in python to spit a bias out based on the values, since they are numerical? Using the values makes it easier to configure true or false faster and more reliably without too many outliers tbh
hard to take a thinkscript and turn it into python
Starting to enjoy it, I'm taking my 3 1/2 market experience and just trying to replicate my manual trading
yes true, restarting and running all helps if possible, sometimes some cells run a long time, so that may not be the best option, but for most cases a good option i think
Of course yeah, at the end of the day it's my mistake
i mean the way notebooks work it's easier to do it than say using scripts as in scripts restart and run all is default mode, where in notebooks is only an option. i use notebooks daily and end up in mess often. i need to remind myself to keep notebooks code clean and tidy, but... lol
almost there
can someone give me good advice on a Colab alternative that can run in the background? (Colab Pro seems kinda expensive to me; people on reddit say its compute units run out)
so if anyone here has a good alternative that is cheap and fast, please recommend it by tagging me
how long do you need a session to last?
That's exactly what I'm coding rn
I'm running ec2 spot instances using GitHub actions
I'm getting 10-30% of the original price
on the Google Colab free plan with a T4 GPU it takes 11 hours to complete, but since it's the free plan the session expires
well, I'm also experimenting, but I think 11 hours would be the one (1x T4 GPU on Google)
can run multi gpu if available
i was thinking free paperspace but it's 6hrs session only, for paid ones will be longer, maybe cheaper than colab, you can check their pricing https://www.paperspace.com/pricing
Paperspace offers a wide selection of low-cost GPU and CPU instances as well as affordable storage options. Browse pricing.
also I'm very confused about what these compute units of Colab are, like how they work
i remember this
i once created an account on this XD
each gpu on colab will have different compute units cost, like A100 will be most expensive, t4 cheapest i think
many people on reddit are saying their compute units run out in a day. if the units get used up, do they come back the next day or is that just the end?
ec2 from amazon?
and does this has gpu or multiple cpus?
Has everything you need, it just takes a fair bit of work to setup everything
Inputted the wrong parameter for the 1000th time
I'm so distracted rn
I'm even gonna train this on CPU tbh, what's the problem in it lasting all night if it will only cost me like one dollar
Better than having it run in 3h and costing triple
The relative scale of that should not line up
typically, the cost of getting enough cpu cores to match the GPU (assuming the actual math that would be done on the GPU is the limiting factor) would result in the cpu cost version being much higher
hello, i'm working on a CNN project using keras
It does when there's a shortage of the cheaper stuff
My result when I trained the model was val_accuracy 98.06
and my accuracy was 95.11
I keep getting my quota requests wrong too
So I'm always operating on a subset of what's available
what do your other metrics say? I.e. F1, Recall, Precision?
i think i dont use
my model is like this
i'm using reduce on plateau too
reduce lr on plateau
I saw someone saying it's good to use
Try the reverse, starting with small LR, build up for a bit, then exponential decay
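A framework-free sketch of that schedule: linear warmup from a small learning rate up to a peak, then exponential decay. The default values are arbitrary assumptions:

```python
def lr_at(step, peak_lr=1e-3, warmup_steps=500, decay_rate=0.999):
    """Learning rate for a given step: linear warmup, then exponential decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Multiply by decay_rate once per step after warmup
    return peak_lr * decay_rate ** (step - warmup_steps)


for step in (0, 100, 499, 500, 2000):
    print(step, lr_at(step))
```

In Keras, a function like this could be wrapped in a `LearningRateScheduler` callback, though that callback works per epoch, so the constants would need rescaling.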
If I have a 2D array of values [[1, 2, 3, 4, 5, 6], [2, 4, 5, 1 , 1, 2]], is there way I can get a minimum of each column such that the values are above 2?
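One way to do this with NumPy, as a sketch: replace everything that is not above 2 with infinity, then take the column-wise minimum. Columns with no value above 2 come out as `inf`, which you may want to handle separately:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [2, 4, 5, 1, 1, 2]])

# Keep values above 2, turn everything else into +inf so it never wins the min
masked = np.where(a > 2, a, np.inf)
col_min = masked.min(axis=0)

print(col_min)  # inf, 4, 3, 4, 5, 6 (first column has no value above 2)
```

`np.ma.masked_less_equal(a, 2).min(axis=0)` is an alternative that masks such columns instead of returning infinity.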
Hello all, slightly off topic but I wonder if there's anybody aware of any app out of there, which in a similar manner like Duolingo, can support training in small chunks on data science challenges.
The idea is to get help developing those skill sets without digging into projects of any sort, or writing little scripts without any support in confirming the correct implementation
the experiment is literally gonna take 24h to run
waiting on GPU quota again
I think aws makes it hard on purpose so that people don't overuse the spot instances
you get a certain number of compute units depending on the subscription; there is Colab Pro and Pro Plus i think. the compute is for the full month, so if you use it all in one day, which you can easily do with an A100, then you don't have any till next month. you can pay for more compute units additionally each month, either 100 or 500 compute units, for like 10 and 50 dollars.
Thanks @tacit basin
yeah for A100 gpu you get like 20 something hours per month on most expensive plan pro plus
good thing about it it's that they are usually available, on some other clouds that's not the case
lol just woke up
what is the best library for GANs right now
A good amount of time the next thing I need is to correlate the volume and compare assets to the vix for low volatility or high volatility
Do you mean generative AI instead of GANs ?
Can you add more clarity or further details to your original question.
GANs is just like any other type of NN. It's like asking which library is best for RNN or CNN.
I don't see how that's not a valid question
there are libraries for RNNs and CNNs
but yeah I guess you could say generative AI
I was using this app to learn Deutsch around 2020. It was all fun and nice till they updated the app and introduced a gamified style of learning. I don't know if things have changed now
they have admitted that they are not an education company but an entertainment company
as a linguist, my professional opinion is duolingo bad
Of course, no question asked here is deemed invalid
What I inferred from your original question was:
/Which library (framework) is best for GANs./
And by "library", if you were referring to Keras, PyTorch, TensorFlow etc... then, I don't think there's a specific framework that's better than the other in that regards.
my favorite foreign language is korean and they had the absolute worst korean lessons that's how I realized it was a scam
there's also libraries that have a bunch of networks already implemented like stable baselines 3
for use in particular contexts like reinforcement learning
Well I love it, specifically the gamification thing, it beats doing 0, which is my most likely situation if I don't use it
At least I like the weekly rank competition they have. I hope they've not removed that feature
After some time I'm gonna branch out, voice chat, see movies in German, etc
model keeps overfitting
what is it about sentiment analysis that makes it so easy to overfit
this stuff has been a huge success tho
I really just need aws to, for the love of god, give me access to that juicy spot gpu already
been playing with quotas for almost a month
okay, I think I'm going to look for a larger dataset and data augmentation techniques, I refuse to believe that the transformer can't perform this task
What instances are you trying to use?
We have almost never any issue getting on demand instances, I dont think you'll ever get them on spot though
I'd be happy with a p2.xlarge
I'm trying spot
They do list it tho
yeah but most of the time they are never available enough
Normally when we do training runs we have retries on our scripts to spawn instances because we need to check other availability zones for available instances on demand
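For anyone curious, a rough sketch of that retry idea. It assumes a hypothetical `launch(zone)` callable that raises when a zone has no capacity; in real code that callable would wrap whatever cloud API call you use. `NoCapacityError`, `launch_with_retries` and the zone names are made up for illustration.

```python
import time

class NoCapacityError(Exception):
    """Hypothetical error raised when a zone has no free instances."""

def launch_with_retries(launch, zones, rounds=3, wait_seconds=0.0):
    """Try each zone in turn, for several rounds, until one launch succeeds."""
    for attempt in range(rounds):
        for zone in zones:
            try:
                return launch(zone)
            except NoCapacityError:
                continue  # this zone is full, try the next one
        time.sleep(wait_seconds * (attempt + 1))  # back off before the next round
    raise NoCapacityError(f"no capacity in {zones} after {rounds} rounds")

# Stand-in launcher for demonstration: only the third zone has capacity.
def fake_launch(zone):
    if zone != "zone-c":
        raise NoCapacityError(zone)
    return f"instance-in-{zone}"

result = launch_with_retries(fake_launch, ["zone-a", "zone-b", "zone-c"])
print(result)
```

Passing the launcher in as a function keeps the retry logic separate from any particular provider SDK.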
Okay, I'm gonna search Google for a bit on this GPU shortage thing, see what I can come up with
It can boot up any ec2 on demand or spot instance
nah I mean like your ML lib
i.e. PyTorch, since they are what are interacting with the hardware doing the math
Oh, I don't know what you mean by TRN, I assumed it was some instance type
Is it like TPU type of thing
there is TRN1 which is AWS' tpu thing
I'm using pytorch, don't know if it supports TPU, but I assume it does
you might have a better time getting some spot instances on those perhaps
Oh that is clutch
and still have a decent speedup
I'm gonna test my code on TPU using colab, thanks for the tip !
I got a GPU spot instance
where can i learn low level working of LSTM
I know the states/gates and stuff
but i wanna know how embedding and iterations run though
What would you suggest otherwise, and linguistically speaking to learn a new language?
you need to understand structurally how the language differs from your native language. and you can build your vocabulary by writing sentences and speaking them aloud, and talking to native speakers.
long-short term memory
Clear, I was referring mostly to the process as using a supportive app. Do you have any preference, or anything better than Duolingo that you would recommend?
I don't have anything in mind--sorry
Thanks anyway
Does anyone have suggestions for a course on AI and more specifically LLMs? I don't mean creating a model, but topics should include:
- Running a model LOCALLY and not just sending data to OpenAI
- RAG
- How to differentiate and pick a model to implement in the chain
- LangChain (or other relevant libraries)
- Fine-tuning (maybe?)
- Ideally the course would also have a community to ask further questions
I'm doing some of this now, but it mostly feels like guessing. So I'd like to fill in some of my many knowledge gaps.
This is so much work, y so much setup for this I don't get it
Y r people publishing 10GB sized images
What is life
I might've dropped the ball when defining the storage for the last AMI tho
Doesn't matter, might as well go over now rather than under
Storage is supposed to be cheap anyway
The instance I managed to catch is AMD based, which is why I'm still on this. Amazon Linux AMIs come with stuff for Nvidia
And it failed
AMI got corrupted for sure
Need to repeat from the original
Gonna give it a rest, it's getting late
But it's a matter of time, tomorrow I'll finally have GPU
hey 2 questions, should I learn how to use pytorch or tensorflow and what's a good video to get me into it?
Most people prefer pytorch. But don't think of it as "learning pytorch". You're learning about neural network theory, and applying it with pytorch
oh ok thanks, and is there like a video a lot of people recommend or should I just start watching everything?
Whatever you watch, keep in mind that you'll learn nothing from just passively watching. Take notes and apply everything.
Is there a LaTeX or MathJax bot available to render math for this channel?
.help latex
`.latex <query>`
*Renders the text in latex and sends the image.*
helpful embed, but yes - there it is
you can experiment with it #sir-lancebot-playground if you like, as well, especially as the resulting images do not have delete or revision features if you have incorrect latex input
.latex \LaTeX
Yay
Now I can be happy
Thanks
So I decided to enroll in a machine learning course focusing on neural networks. I don't know if it's just me, but I thought I was very comfortable with multivariable calculus, and this notation is really killing me. For example they wrote that given a model for a neural network X with depth N, the model
.latex $Y^i = F^i(\mathbf{X}) = f( \sum_{i_N} w_{N j_N}^i f ( w_{{N-1}, j_{N-1} }^{j_N} \ldots f(w_{1,j_1}^{j_0} X_{j_0})))$
I'm just going to compile it on my side and paste it as an image I guess
With some loss function:
Clearly the derivative with respect to the weights depends only on the outermost nested function, so that
So my main confusion is that I have no idea how people are able to chew through this much notational complexity and just conclude something about the form of the partial derivative in such a blasé manner. Do people just not really care about the fine details? It took me nearly an hour to carefully keep every subscript and superscript in my head, understand what the equation was trying to do, and then apply the chain rule.
This is aimed at an upper senior undergraduate level, so it's not exactly ML for babies I guess. But I was kind of expecting a little bit more hand holding with respect to the computation.
This is a good place to start
Welcome to the most beginner-friendly place on the internet to learn PyTorch for deep learning.
All code on GitHub - https://dbourke.link/pt-github
Ask a question - https://dbourke.link/pt-github-discussions
Read the course materials online - https://learnpytorch.io
Sign up for the full course on Zero to Mastery (20+ hours more video) - https:/...
shout out
I have a question about Bidirectional RNN. How does Bidirectional RNN work when there's a sentence like "I am ___ hungry, and I can eat half a pig."? Can Bidirectional RNN be used to fill in the blank?
As a CS grad student, I made the mistake of taking a stats class that was for both stats majors and CS majors. Terrible mistake: the CS students were left in the dust in the first week, never keeping up with the high speed notational complexity that the stats folks were comfortable with.
Thanks! I'll take a look!
i've been really sad lately because i'm struggling in the math course (probability, linear algebra, discrete math, but mainly probability) in my data science program. i feel inadequate and i was wondering if you guys have any tips or advice on how to improve my understanding and skills in those subjects? are there any youtube videos or anything that can help me understand how different math problems apply to different tasks in data science projects? I think if i understand how i'm going to use them in a job setting, it will help me learn better
i think i always struggled a little with probability even when i took stats courses before
sorry to say that engineering maths are baby maths in undergrad :p
chain rule is the name of the game
UNLIMITED POWER
notational inside baseball is really exposed for the obfuscation it is when you compare papers to their code
I have absolutely no problem with graduate level statistical notation so if this is comfortable for you in CS I'll just have to get used to it.
It is particularly annoying that this course uses a subscript in some cases and a superscript in other cases to denote the same thing, such as the index of the current epoch. It is making it very difficult for me to be able to just ignore a symbol that isn't of interest at the moment because those symbols appear in multiple inconsistent locations.
Great day everybody.
I would love your opinion in understanding what could be the best approach in determining the impact in web site traffic changes given changes on the page.
I basically have historical data, and I know the point in time when changes occurred.
I'm not confident that causal impact is the right direction, also because there's no other way to compare/confirm the impact.
What is your take/advice?
Hello, I need to do work that checks whether the game "beat saber" is being played or not. To do this, I separated some images of him standing still or playing, but the still images are very similar, does this interfere with the model?
If they are not like this, the database becomes too small
It will likely overfit if all images of one category are very similar
If they are all in the same position, the model could, for example, guess very well by checking just a single pixel in your data
Whereas you want it to learn that it is not playing when the position does not change
Is it better then for me to take similar images even if the database gets smaller?
it's a dataset, not a database.
the size of a dataset is typically measured in the number of instances, not the size of each instance. you can make them smaller if they're still useful for what you're trying to do after that.
all my threads are doing their duty
PPO marches on in its inevitable but lengthy quest for convergence
it does, you can reduce it by doing label smoothing
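If label smoothing is new to you, here is a minimal sketch with plain numpy. It assumes one-hot labels and a made-up smoothing value `epsilon=0.1`; most frameworks have their own built-in option for this, so treat it as an illustration of the idea only.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Move epsilon of the probability mass from the true class to a uniform spread."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / n_classes

# Two samples, two classes (e.g. "playing" vs "standing still").
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
smoothed = smooth_labels(labels, epsilon=0.1)
print(smoothed)
```

The targets stop being exactly 0 or 1, so the model is penalized less for not being fully certain about near-duplicate images.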
honestly it still learns
people trained models on random labels and they still learn stuff
you could also train model and see what images the loss is the biggest on after training and remove those
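A small sketch of that last idea, assuming you already have predicted class probabilities from a trained model. The numbers below are made up, and `drop_fraction` is an arbitrary choice for illustration.

```python
import numpy as np

def per_sample_cross_entropy(probs, labels):
    """Cross-entropy of each sample given predicted class probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, 1.0))

def keep_lowest_loss(losses, drop_fraction=0.25):
    """Return sorted indices of the samples kept after dropping the highest-loss fraction."""
    n_drop = int(len(losses) * drop_fraction)
    order = np.argsort(losses)  # ascending: lowest loss first
    return np.sort(order[: len(losses) - n_drop])

# Made-up predictions from an already trained model (4 samples, 2 classes).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4]])
labels = np.array([0, 1, 1, 0])  # the third sample looks mislabeled
losses = per_sample_cross_entropy(probs, labels)
kept = keep_lowest_loss(losses, drop_fraction=0.25)
print(kept)
```

It is worth eyeballing the dropped samples before deleting them, since a high loss can also mean a hard but valid example.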
Super proud of myself
what are you doing to that poor computer, monte carlo sims ?
vectorized reinforcement learning envs
this computer is living up to its vocation
it was given 64 virtual threads to be used
I feel bad for all the computers that are never used to their potentials, doing nothing but opening chrome tabs and copying memory around for stupid youtube video browsing
with gpu and stuff
it was a lot of work because AMD has very bad ML support on AWS
in the end I found an available nvidia instance
so I didnt even manage to make the amd stuff work
their latest image is outdated, theres no aws ami for it, etc etc
when u get the sus-est error ever
I feel like I'm doing what I was doing b4 but now at an industrial level
Can spawn hundreds of training loops in dozens of GPU machines
Only limited by AWS quotas
And money ofc, even with spot the burn rate can become large
nice. I was long TY today
TY ?
ten year treasury future
And I can track my loops on the go
ML=infra, all else is EDA
That's the lesson I'm taking
how do I take a standard normal distribution and transform it into something that looks like a square root or log shape what function can i use
Oh sorry, yeah that's nice 🔥 that's what I'm working on building in my script: a model that takes DXY, 10Y and VIX to create the scenario for the market sentiment
That's cool
chat gippity to the rescue
I'm very tired rn, that thing took me 2 days to make
Didn't even have lunch today
That's an interesting question
You want a transform on a gaussian that transforms it into a sqrt or log shape, I never encountered that problem
It's actually easy tho
Think point wise
You're solving an equation at every point
Like you want
gauss * f = sqrt
f = sqrt / gauss
Something like that
yeah I see what you mean but I'd have to pick out points and do the math by hand
I'm not that smart
this is a great use for chat gpt
No it's literally just dividing the samples from one by the other
easily verifiable
As long as no zeros
return boxcox((self._get_return() - self.rate / 252) / self.return_volatility)
this is the reward function now
I expect a better mean variance ratio out of sample with this let's see what happens
Hi, would anyone be willing to teach me ML using python?
You good in python already ?
depends on what part of python: django no, but I know most of the needed stuff
If I want to go into data science, should I major in CS: ML or stats: data science
both
But a double major would be painful
I'm actually using Yeo-Johnson with a lambda of -1 that's basically what I wanted
I actually don't think it's trivial if it's named after someone tbh
will be interesting to see what happens with different values of lambda
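For reference, a minimal numpy sketch of the Yeo-Johnson transform as it is usually defined, so you can play with different lambdas. The sample values are made up. `scipy.stats.yeojohnson` offers the same transform if you would rather not hand-roll it.

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Yeo-Johnson power transform; unlike Box-Cox it accepts negative inputs."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lmbda != 0:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if lmbda != 2:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

samples = np.array([-2.0, -1.0, 0.0, 1.0, 3.0])
transformed = yeo_johnson(samples, lmbda=-1.0)
print(transformed)
```

With lambda = -1 the positive side flattens out (it can never exceed 1) while the negative side gets stretched, which is the compress-the-gains, amplify-the-losses shape discussed above.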
I would like to start with ML but I am not good enough at maths yet
Only one problem
but I am pretty much enjoying data science
its kinda easy
and it gives you fun
imo
Dude, solve it numerically by sampling both functions, dividing one array by the other, and fitting a cubic spline to it
Easy
those equations are probly the result of the same procedure
But with the analytic expressions themselves instead of their samples
Which also does not look hard to do
I wouldn't say it's trivial, but it's also not hard imo
yea I skipped real analysis sue me
my real analysis was insane
the professor decided that he wanted to summarize the entire math field and teach it to 2nd year students
The memes were insane
Like dude was straight up teaching differential geometry
Which was only gonna be useful to the 20% of the class who would've eventually gone to MSc in physics
Sorry you triggered me by mentioning real analysis
._.
hi fellow data scientists
hoi fish
I'm eating a snack cuz otherwise I can't sleep
my original idea was to just multiply everything less than 0 by 2 and everything greater than 0 by 0.5
which probably would have worked but I never tried it
when ur algorithm makes a scientific breakthru
(actually these jumps are just an artefact of convolution smoothing of the episode rewards, the negative outliers make those big depressions)
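A tiny demo of that effect with made-up rewards and a flat moving-average kernel (the window size of 5 is arbitrary): one bad episode drags down every smoothed point whose window contains it.

```python
import numpy as np

def moving_average(values, window):
    """Smooth with a flat kernel; 'valid' keeps only fully overlapped positions."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

rewards = np.ones(20)   # made-up episode rewards, all equal to 1
rewards[10] = -9.0      # a single negative outlier
smoothed = moving_average(rewards, window=5)
print(smoothed)
```

The single outlier shows up as a five-point-wide depression in the smoothed curve, and the curve jumps back up as soon as the window slides past it.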
Thank you!!
going into data science majors, is taking statistics in highschool more valuable than a CS class? I feel like I can more easily learn python and other tools outside of school than learn stats on my own
are you sure you'll actually be majoring in data science? or computer science?
See if you can find the start of 30 days of ML on Kaggle.
It starts "basic"-ish
ok yeah I've just realized that I don't have enough knowledge of the industry to understand what I'd like/dislike
so, I think I'm going to play it safe with CS
then, If I find that I want to specialize on a certain thing, maybe I'll get a minor in it, or switch to it for my masters?
feel like that's the most logical way to go about it at least for now, I like looking at the whole picture, probably don't need to though
anyone have an example of using GANs to generate samples from correlated time series
if not I guess that's going to be my next project
Milan has 16gb gpu at 7 cents
on aws
I've been applying LR as a function of the epoch, should I be doing it as a function of the current batch ?
the model overfits no matter what I change, culprit is data for sure, tho I think freezing the tokenizer and positional encoding would help a lot
increasing the distance between output and the tokenizer seems to help a lot
which is not intuitive since the number of parameters grows, so it should overfit more easily
my working hypothesis is that making the model grow that way slows down the convergence by a bit, so the final values on the loss val end up being shifted
tfw you're not sure if you're gonna run out of RAM or not
so the actual graph that I need is a loss/val vs loss/train
so there's two paths here
- find a larger dataset and use that + data aug
- modify the training procedure so that positional encoding is determined analytically and tokenizer is pre-trained
I'm tempted to tinker with the model, but experience has taught me that data is king, there's like a good chance that changing the dataset to higher quality stuff will make loss val converge in 10 nano seconds to the planck scale ._.
Hello,
I have equations like these (32 of them) that I need to solve:
i_6 + i_22 = i_3 + 83
i_12 = i_26 + i_7 - 114
i_16 = i_18 - i_5 + 51
i_30 - i_8 = i_29 - 77
i_20 - i_11 = i_3 - 76
..........................
I have tried to use sympy, but its been 12 hours and the program is still running, am I doing something wrong ?
from sympy import symbols, Eq, solve
symbols_list = ['i_'+str(i) for i in range(32)]
vars_list = symbols(symbols_list)
equations = [
vars_list[29] - vars_list[5] + vars_list[3] - 70,
vars_list[2] + vars_list[22] - vars_list[13] - 123,
.....
vars_list[1] + vars_list[21] - vars_list[11] - vars_list[18] - 43
]
solution = solve(equations, vars_list)
for var in vars_list:
    print(f"{var}: {solution[var]}")
yes
you're trying to solve it symbolically
the best approach is to translate the problem into a matrix equation
Ax = B
then use numpy or scipy to solve it
can even solve it by hand
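A minimal sketch of the matrix route with numpy, using a made-up system of three equations in three unknowns (the real one would be 32 by 32, built the same way). The coefficients below are illustrative, not taken from the actual problem.

```python
import numpy as np

# Hypothetical system in three unknowns (a, b, c), already moved to "A x = B" form:
#    a + b - c =   83
#   -a + b - c = -114
#   -a + b + c =   51
A = np.array([[ 1.0, 1.0, -1.0],
              [-1.0, 1.0, -1.0],
              [-1.0, 1.0,  1.0]])
B = np.array([83.0, -114.0, 51.0])

x = np.linalg.solve(A, B)  # requires a square, invertible A
print(x)
```

If A turns out to be singular or non-square, `np.linalg.solve` raises an error; `np.linalg.lstsq` is the usual fallback in that case.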
thanks
Can numpy or something else do that for me? Translate equations like ax+b=c+d into ax=c+d-b?
you can probly do that with sympy
but then the solving itself gotta be a numerical approach, there's just too many equations
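One way that could look: sympy's `linear_eq_to_matrix` does the rearranging, then numpy does the numerical solve. The three equations here are made up and only mimic the style of the question.

```python
import numpy as np
from sympy import Eq, linear_eq_to_matrix, symbols

i_0, i_1, i_2 = symbols("i_0 i_1 i_2")

# Made-up equations written the way they appear in the question.
equations = [
    Eq(i_0 + i_1, i_2 + 10),
    Eq(i_1, i_2 + i_0 - 4),
    Eq(2 * i_2, i_0 + 6),
]

# sympy only rearranges here; it does not solve anything symbolically.
A_sym, b_sym = linear_eq_to_matrix(equations, [i_0, i_1, i_2])

# Convert the exact sympy matrices to float arrays for the numerical solver.
A_num = np.array(A_sym.tolist(), dtype=float)
b_num = np.array(b_sym.tolist(), dtype=float).ravel()
solution = np.linalg.solve(A_num, b_num)
print(solution)
```

The rearranging step is cheap even for 32 unknowns; it was the symbolic `solve` call that was slow.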
the best is doing that yourself on paper
you'd have to read the documentation of available solvers and then it's up to you to prepare the problem in a compatible way
sadly i'm not trolling you, that's why people go learn this in uni
it's easier than it looks, after some practice it will be second nature
I reckon most people who studied this can transform it into matrix form right from the equations you wrote without modifying them
you already have them in matrix form, just gotta move a few coefficients around
i really do suggest you grab a pencil and a piece of paper and write it down, it won't take you long
is it possible to train a model with sine function without using LSTM? from what i see it doesnt work beyond training dataset
Try using Taylor features and a small learning rate
its using single value input and output, how am i supposed to use taylor series
instead of feeding x, feed [x, x**2, x**3, etc]
you should also normalize it
y = x % 2pi something of the sorts
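A sketch of how those two suggestions combine. To keep it short it swaps the neural net for a plain least-squares fit on the polynomial features, so it only shows that the feature engineering is what makes extrapolation work. The degree of 7 and the ranges are arbitrary choices.

```python
import numpy as np

def taylor_features(x, degree=7):
    """Wrap x into [-pi, pi) and return the columns [w, w**2, ..., w**degree]."""
    w = np.mod(x, 2 * np.pi) - np.pi
    return np.stack([w ** k for k in range(1, degree + 1)], axis=1)

# Fit on one period only.
x_train = np.linspace(0.0, 2 * np.pi, 200)
weights, *_ = np.linalg.lstsq(taylor_features(x_train), np.sin(x_train), rcond=None)

# Evaluate far outside the training range.
x_test = np.linspace(20.0, 30.0, 200)
max_error = np.max(np.abs(taylor_features(x_test) @ weights - np.sin(x_test)))
print(max_error)
```

Because of the modulo, every test input lands inside the range seen in training, so the model never actually has to extrapolate.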
CNN will work
You will need to send like 100 last values as input
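For the "last 100 values" setup, a minimal sketch of building the input windows with numpy. It assumes numpy 1.20 or newer for `sliding_window_view`; the series is a made-up sine wave.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window=100):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs = sliding_window_view(series[:-1], window)
    targets = series[window:]
    return inputs, targets

series = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))
inputs, targets = make_windows(series, window=100)
print(inputs.shape, targets.shape)
```

Each row of `inputs` is one training sample and the matching entry of `targets` is the next value to predict.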
hmm
how do you expect it to learn sine at all just given a single datapoint
How to host a low traffic deep learning model?
So i want to host a deep learning model, an A10 seems good enough for my needs. I am not expecting a lot of traffic, so paying hourly for aws ec2 or ecs doesn't seem like the best way? Can anyone guide me to alternatives that charge on a per API call basis?
um so guys i wanna develop an ai application to detect if the user is looking at his computer what tools should i use and how do i do it?
if it can learn linear equations, why not sine
I think he means the network f is of the form y=f(x) where x is a single real number as opposed to a vector
Hi there, just curious, how do y'all modularize/organize your code in Jupyter notebooks?
For context, I have recently been given 2 problems to solve using different types of machine learning models (i.e. classification and regression models) and I am having difficulty splitting the code into individual functions that can be placed in another Python file (so as to avoid having the Jupyter notebook become too cluttered with long sections of code).
Hi, guys. Can you tell me some python libraries for a beginner developer?
What do you plan on developing using Python? The libraries you might need to use depend on what exactly you are trying to develop
data science
I see, that is a category of projects that can be done in Python, are you trying to do some data analysis on a dataset? Or are you trying to do something else?
If you are doing data analysis, the following libraries might be useful to you:
numpy
pandas
matplotlib (Used for plotting charts and visualizing data)
However, this is just a general list of libraries as I am not sure what exactly in data science you are trying to do
I wanna make ai for sorting flowers
for example
dataset: rose(500img), sunflower(500img), chamomile(500img)
thx
have a nice day
when i will make this, i will tell you about this
No problem
Thanks, same to you
Sure
xd
Just bumping this question in case anyone is able to respond to it
Here's how that went ❤️❤️
i tried your suggestion, it fails to train that way
Increase model complexity, add a couple more layers
Keep learning rate small, I've done this before and LR was the final bullet
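```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

# n_feat: number of past values fed in per sample (defined elsewhere)
model = Sequential()
model.add(LSTM(10, input_shape=(n_feat, 1)))
model.add(Dense(10))
model.add(Dense(1))
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
```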
used this, any suggestions?
Also note that the Taylor series of sine doesn't have all orders, only the odd ones
x, x**3, x**5, etc
omitting even powers is cheating :p