#numerical-analysis
1 messages · Page 26 of 1
I would assume round to even. But I guess depends on implementation?
and did you mean 1/y is less than machine epsilon or y less than machine epsilon?
y
do we round to nearest for y?
if i can figure that out question is ez, since 1/y is allowed to have more decimal digits than the mantissa
of the the two numbers being dividied
What do you mean by that? If it rounds to even or nearest for 1/y? Again, I would assume that depends on implementation. I think default 754 is to even
its a rounding method when u cant represent number w floating point precison
ok so you mean smallest possible number of the data type? ```
julia> floatmin(Float64)
2.2250738585072014e-308
julia> eps(Float64)
2.220446049250313e-16
oh wait i figured question out im dumb
julia> 1/(1E-400)
Inf
đ
xD the question was asking about cancellation error the whole time, sleep deprivation hits a mfer diff and make dem illiterate
would DFAs and regex conversion fit into here?
I just need to check if this NFA describes (ab)*(ba)*+a(a)*
I hand verified most of the cases but I'm concerned I missed something
@brave crypt I'm not completely familiar with that notation, but if + is an union then aren't you missing the "a" word?
Aw dammit, okay I have an older variation but I thought it didn't really make sense
and yes + is union, its notation from Miruoka
As a deterministic automaton it doesn't make sense because q0 has two paths with the same input, I think it might be missing an $\epsilon$ path between q3 and q2, but it's been a while since I studied this stuff
nanashi
Oh, do NFAs need to abide by that condition as well?
thats what I was assuming from the start
oh
okay then this would be acceptable as an NFA conversion?
Do you not require to make explicit the empty string paths?
my prof wasn't really clear on that, and the book wasn't making it to clear if they wanted empty strings either
i see it in explicit calculations when they do some abstract stuff, but otherwise there is no other mention
I remember it being more or less a must because not every path was connected by empty strings, but I'd probably lie if I asserted anything, I don't think I'll be of much help
Sorry
alright, well thank you for reminding me of that bit about NFAs, I'll think I'll go with the second variation and see if there is much else to be done
aight i've got one more for checking over if anyone else is interested, I tried to use some minimizing techniques from JFLAP to get it down a bit more:
$((a^*b^a^)^b)^$
Manzareh
I spent some time today optimizing the second and the first regexes to see if it would be easier to check my work and all I was able to get down to was $(ab)^(ba)^ \cup a^*$ and $(a^b)^$
Manzareh
I used a regex equality algorithm (this one: https://bakkot.github.io/dfa-lib/regeq.html) and did some minor commutative madness but the first was the one I couldn't figure how to get any shorter
it appears with JFLAP the generation of NFAs with the regex to NFA converter tends to get less noisy (with all the empty states) as the regex gets shorter which mostly lines up with what I've read, however I still don't really find either of these solutions particularly satisfying
is there a way of figuring out what equation would describe a path traced out by an object moving through a vector field? (in my case the vector field would be very similar to a gravitational well thingy) https://media.discordapp.net/attachments/712782199732174888/889961685102764103/unknown.png
here's an equation I was able to come up with for such a vector field, idk if it's even correct, I'm assuming it is lol
[b-a] is the vector from a to b
R is the radius of the circle
Hi does anyone here know lambda calculus?
Lambda calculus should go in #foundations probably
Got it, thanks.
EmilB
EmilB
Anyone good with secant method that can help to determine the radius of a cone with surface area $S=1200 m^2$ and height $ h=20m$? Im not sure what my function should be in this case
EmilB
I don't know if this is the good channel to ask, I don't know if there's one (the CS discord doesn't seem to have one either)
I have this exercise, I have the intuitive idea but I really struggle with properly writing it down and formulating the argument
Let $s, k \in K<<A^>>$ be proper series. Show that $(s + t)^ = s^(ts^)^*$
Philémon
These are formal series
And a proper series is a series over a semiring $K$, so a function $s : A^* \to K$ such that $s(\epsilon) = 0_K$
Philémon
You might have better luck in #advanced-analysis
hi
anyone familiar with taylor series here?
i wanted to know what kind of projects can be done under the said topic
What sorts of projects are you interested in
#advanced-analysis might be a better fit depending on what your interests are
What level of math are you at
school grad
i mean college level
or maybe eulers method
not necessarily taylors
Bisection method or central difference, forward or backward difference also..
You there @wide spear ?
how big is a "mini project"
not that big,
u could refer this
its more or less like an investigation or a research about a particular topic that could be used in engineering applications
If that is your teacher, maybe do something on finite volume methods? he seems to be into fluid mechnaics
ok
while doing the project
eulers method
not necessarily taylors
Bisection method or central difference, forward or backward difference also..
based on these topics
or related to computational
I have ideas related to my research, but I don't have much time to explain stuff and I don't know your background. But I can give you easy to digest references
tq
One idea would be to describe the spectral distribution of the Laplacian discretized with FDM https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors_of_the_second_derivative using the theory of generalized locally Toeplitz sequences
I am pretty sure your teacher has never heard of that đ
(it is kind of an advanced form of Fourier analysis)
but easy to learn the basics for somebody with like msc background
just tell me if it sounds interesting and i can give you references
Other people should come with suggestions đ
You can do perturbation analysis for a differential equation
This also touches upon Taylor series which you mentioned earlier
Say you have a differential equation $u''+\eps u=u^2$ or something which has a solution $u(t,\eps)$ and then you solve first for $\eps=0$ and then perform a Taylor series expansion of $u$ around $\eps=0$
Angetenar
And then you can compare this with what you get numerically
Hi everyone. I was wondering if anyone's got any resource recommendations for transitioning from applied math graduate school into industry. Graduating in May and I'm starting the job search!
I do most work in algorithm optimization and numerical solutions to continuous problems. Mostly the former
What most interests me are jobs in computer systems (I'd love to work for Intel as an Algorithm Engineer for instance), or aerospace, or digital signal processing
Ok if you want to work in aerospace, brush up finite element stuff
what country?
United States
aerospace is 99% fvm not aware of anybody using fem (except for structural computations)
https://boards.greenhouse.io/boomsupersonic/jobs/4017480004 is a job posting for plane design
If you want to work at Intel on something like MKL then you should be familiar with how algorithms work in hardware contexts
This is great information, first off thank you
on signal processing, i can recommend petre stoica's spectral analysis of signals and yonina eldar's compressed sensing book (i forget the name)
This is some stuff that Intel people are doing
I would say if you want to work in aerospace it is almost expected to have a PhD, at least from my experience
If Intel interests you, then Nvidia and AMD are also doing similar things
I'd say working in that realm (Intel) is my top pick
Nvidia is more GPU focused, do you have experience with cuda and stuff
I do not
I have a class this semester where I can read up on basically anything I want, so if you have any recommendations for what I could read into for that area I'd be grateful!
Also communication avoiding algorithms are a big deal at Intel and co
Okay, I'm going to look into that and their MKL. Thank you for the info!
Any ideas on how to start this? Does it just mean we need k terms of the Taylor series to get a reasonable estimate of the root?
As for how to start it - one method is to just write x_(n+1) - r in terms of x_n and r and expand f(x_n) and f'(x_n) as taylor polynomials with enough terms that you get something non-zero; stuff will start to cancel out and you'll be left with the quadratic convergence required
(I'd use r = 0 WLOG tbh)
Thank you I will try this
np
@past pebble you're solidly in qfin right? maybe you can help @tardy ember
hes a phd math looking into qfin industry
what's the context? were posts deleted or on a different channel?
oh no I just thought to introduce @tardy ember to someone familiar with qfin industry
Hello!
hello
i should say though that qfin is very varied and the term itself thus rather meaningless (not to mention that finance itself is all about detail so often you'll find that "good firms" exist in the same sense as "good universities", i.e. comparison makes more sense at a team/group level)
Interesting, that makes sense
well, e.g. AQR is very academic (lots of staff having PhDs) and publish plenty of whitepapers (probably not much in the peer reviewed space), but more from an econ perspective
and I don't think there's enough math to keep a math PhD entertained in the smart beta/factor investing land, though I'm ignorant of the deets
lopez de prado was there for a bit trying to bring ML to the space, but I guess it didn't work
lopez do prado, wherever he is now, has some neat results on hitting times and analytical optimal entry/exit
ive seen aqr names on a bunch of arxiv papers so it sounds like a place for phds yeah
Makes sense
For what it's worth I'm largely accepting that I'll probably enter into qfin (or really anything other than academia) on its terms rather than my own if that makes sense. Whether I find the work as interesting as my current stuff, it does still seem like somewhat interesting work, pays well, etc. Hence why I'd also consider e.g. trading.
sure, but there's plenty of firms doing very different things is all, finance is all about fine detail and finding a niche and being good at it
people often think, and this is even more true on all the internet forums, that finance is all about stocks, etfs, or exchange traded products
uhh wheres that quant shop with lots of ex-msri peeps, some still publishing non-quant papers
always funny seeing a rando result in eg protein folding coming out of a quant
i believe e.g. de shaw (the person) is doing protein folding himself nowadays, not finance
DE Shaw has a research division
There's also Renaissance
Which was founded by Jim Simons
and as for trading, there's a vast spectrum there too. one could just execute (for a PM/signal), or you could be taking views of the market (so being a PM, or being part of designing the strategy), or something in between or both; and again one might ask if a quant trader is someone guiding trading algorithms, or someone trading high touch illiquid derivatives
@past pebble sorry I was out getting food
So right now I'm looking at trading and/or QR internships mostly. I can't quite do AQR or SIG until summer 2023
Also don't know coding or stats so I don't know if research is at all accessible in the near future
Been looking at Jane Street, also today was told about De Shaw so maybe that too
I'll keep these names in mind
how can ODEs like y'(x) = y(sin(x)) for y(0) = y0 be solved?
solved numerically
are there algorithms for such ODEs?
itsmeyash31

They probably don't mean y\times sin(x) on the right, but rather that y depends on sin(x) or sth
The latter is not immediately sensible to me though
ah
a closed form sol will exist in this case?
No it won't
Why would a closed form solution exist
Anyways
I don't see what the issue is
You just solve it numerically
`but how do you solve it numerically? I don't think it is that straightforward
Forward euler?
I can write up the details later but you just get a time marching procedure
y'(0) = y(sin(0) = y(0)
y(delta t) = y(0) + y'(0)*delta t + O(delta t^2)
y'(delta t) = y(sin(delta t)) = y(delta t + O(delta t^3)) ~ y(delta t)
and so on
that could be wrong but that's my first instinct to check
If sin(t) is not a point that you have explicitly you can just compute it with linear/polynomial interpolation
^ + linear/polynomial interpolation
hi everyone, I'm trying to simulate a physic system in which I smoothly transport a particle by slowly changing the hamiltonian. I'm using the scipy package on python with eigh function to diagonalize. When I take the inner product between neighbors states (that should be nearly identical) I get negative signs in some cases, like in here:
product:(0.9966026884156858-0.05932605607477859j), psi1: (0.032943245023489+0j), psi2: (0.03339802473783493+0j)
product:(0.9966084381126055-0.0584749753051199j), psi1: (0.03339802473783493+0j), psi2: (0.03403795510530472+0j)
product:(0.9969452119066843-0.05402623734400029j), psi1: (0.03403795510530472+0j), psi2: (0.03485994413495064+0j)
product:(-0.9975294263339476+0.04629996657310793j), psi1: (0.03485994413495064+0j), psi2: (-0.035831790417432785+0j)
product:(0.9981950162523199-0.03622141578551043j), psi1: (-0.035831790417432785+0j), psi2: (-0.036901943385601044+0j)
product:(0.9987735107759269-0.024920471350837484j), psi1: (-0.036901943385601044+0j), psi2: (-0.03800549358129501+0j)
product:(0.9991503307680749-0.013362420884015513j), psi1: (-0.03800549358129501+0j), psi2: (-0.0390763133764267+0j)
psi1 and psi2 are the first coordinate of each eigenvector
any ideas?
Coding questions go in #computing-software
Hi everyone, I had a question about the nelder-mead algorithm. I know that its quite slow but I'm having trouble figuring out its order. I don't think its 0, 1, 2 since its quite slow so I think its 3 or greater?
By order, do you mean runtime or error
Sorry I should've clarified, I mean error
Not very much is known about the convergence of Nelder-Mead
There are some results in very special cases but overall, the picture remains fuzzy
Oh I see, thanks! Is this the same case for its runtime?
I mean usually with runtime, you just multiple # of iterations * cost/iteration
And we can't say anything about # of iterations
I guess I also have a follow up, unrelated question. I'm being asked what optimization methods might be useful when we are stuck in ravines where the condition number of the Hessian is large. The options are preconditioning, gradient descent, newton's method, and conjugate gradient.
I've ruled out gradient descent because the gradient changes way too rapidly in such situations for gradient descent to be efficient
And more generally I think first-order methods struggle in situations like this. But I'm not sure about the other three
By ravines do you mean local minima
I think the best thing to do here is precondition
Precondition is a pretty vague thing but generally it refers to techniques that make the problem easier by reducing the condition number
One way you can do this is transform the function being optimized by stretching/squeezing to make the level sets look more like circles
So the condition number of the Hessian will decrease as the gradient won't change as rapidly
Oh I see. I can select multiple answers here, so do you think the other methods could also be useful? I don't think newton's would be good here since second-order methods require the inversion of a hessian and if the hessian is very ill-conditioned then that might be tough
I'm not sure about conjugate gradient though
I mean in general, you don't have any guarantee of convergence with any of the 3 if you have non-unique minima right
yeah
They don't have any ways of getting out of local minima
If you were doing ML I would suggest hand tuning hyperparameters in your SGD
Lol
Hmm then yeah I think you're right preconditioning might be the best
thank you!
I actually just found this https://stats.stackexchange.com/questions/386091/why-newtons-method-is-not-sensitive-to-ill-conditioned-hessian
which seems to indicate that newton's could be good here haha
That doesn't help Newton get out of local minima though
Would anyone be willing to try to explain aliasing in regards to interpolatory quadrature and the Discrete Fourier Transform? I've gone to my professor on the subject twice now and his explanations aren't getting through to me
any recommendations on introductory expositions on Hilbert transform?
I mean, what are you looking for other than the definition?
Unrelated question. In the paper
Kaprzyk, S. "An algebraic formulation of the simplex linear method for multiple integrals over the d-dimensional domain." Computer Physics Communications 183.1 (2012): 20-25.
Kaprzyk finds that integrals in N-D of the form
equal
The context is basically that the former integral is useful for piecewise rational interpolation for integration in n-dimensional spaces, tessellated by tetrahedra
**My question is: ** is there a deeper meaning to a formula like this? Like, what's the role in this special structure for multidimensional integrals? I get the Jacobian determinant comes about for a change of variables, but this formula seems to have so much structure (and is a ratio of determinants!)
In the paper, he derives the formula via a proof by induction
but I'm afraid the intuition escapes me, despite the fact that the highly structured form of the result seems like there should be a deeper meaning or some intuition
For example, since determinants are involved, I imagine a clever change of variables should yield a Jacobian determinant times some easily integrable function that produces the results, but I can't see how to do that easily (what change of variables? how to keep track of all the integration ranges? etc)
Looks like Cramer's rule
What is r^(d) ? You showed us a formula for r_0 and then one for r, how are we supposed to link the two together?
Sorry, you're right. r^(d) is (1/d!) r_0^(d)
What about phi_d?
It's the d-th integral of phi_0
Here's the additional pics to above:
I mean, the part that blows my mind is that the resulting structure is independent of the integrand, so long as the function's argument is of the linear combination form
It isn't though?
The right hand column of the determinant in the numerator is very dependent on the integrand
Yes. The result isn't fully independent, I meant the structure of the result, the fact that there's a closed-form solution regardless. I edited the above comment to reflect that
so I didn't do it all the way but I think the trick here is just to develop the determinant along the last column of phi_d, so you get a sum of the integrals phi_d.
To get that sum from the r_0 I think you just need to make multiple changes of variables u_d = everything between the parenthesis of phi_0 and then u_i = ... and so on iteratively
Uh wait, I guess you only need to do the change of variables once for u_d and proceed by induction if you suppose the formula true for d-1 variables z1,...,z_(d-1)
I only did a cursory check with the first change of variable but you can already see the vandermonde determinant start to appear.
Yeah, that's what the author does in the paper. I can see how one would reproduce the result algebraically; I suppose my question at its heart is, is there a geometric intuition behind the result? Maybe a path forward to deriving it without induction?
I'm not sure there's straightforward geometrical intuition. I guess both determinant and integral represent a volume so looking at what volume they represent could be something to explore, but that's not going to be something you think of when seeing the integral, it'd rather be an interpretation after the fact. I suggest trying to visualize all the volumes involved with simple examples, in geogebra for instance.
As for the "how do we get the intuition to find this formula", I think it's not about geometrical intuition but more of an analytical one: you just have to see that you can do a linear change of variables for each individual integral and see what you get, then t'recognize a pattern.
Maybe someone with better understanding of volumes than me will be able to give you a better answer though
I'm probably missing something very obvious, but in the last line he claims that the fact that N'_k(r) = 0 implies that N'(r) = 0. I don't see how that follows
https://www.whitman.edu/documents/Academics/Mathematics/burton.pdf
Here's a nice book https://bookstore.ams.org/gsm-29 it is intro grad level but I'm not sure you can really do much with Hilbert transform at a lower level
@brittle atlas appreciate it. intro grad level is great.
no problem! beyond that, probably something like Stein Singular Integrals book or another book on singular integrals would have more stuff
if the Duoandikoetxea book isn't what you're looking for, you could try asking in the analysis channel also
I'm stuck on trying to show that there is a householder matrix for any 2 vectors with the same length in C^M
okay so as I was typing this I think I just got an idea.
I'll type it anyways if it's a dud
Anyone have any tips of how to approach this, or if my idea is correct?
Are you using v for the householder matrix when you should be using u?
I'm using u, but I wrote my algebra out in v (so I did divide out v^*v)
What I basically did was just expand out $$Fx = (I - 2 \frac{vv^*}{v^*v})x$$ and got that $Fx = \alpha y$ only when $$2\frac{v^*x}{v^*v} = 1$$
dk
So I tried showing that can be reduced to something true, i.e. 0 = 0. I was able to reduce it to $${\bar \alpha}y^*x = \alpha x^*y$$
dk
So I just need to show that for a given x, there exists a y vector such that above holds, which works when x and y are real multiples (which I'm thinking is technically different from the scaling done by $\alpha$ (i.e. since I assumed $x \neq \alpha y$), since $\alpha = e^{i\theta}$ is a complex number when at least $\theta \neq 0$
dk
<@&286206848099549185> 
@opal mason I will take a closer look today
whzup
Hi ! Does anyone have experience with well quasi-orders ?
I have an exercise about decidability of certain problems that uses WQOs
I'm usually terrible at computability but I think I found something convincing, so if someone is up to talk about it and shatter my dreams, don't hesitate to ping me đ
Maybe that's more of a CS thing but I'm trying nonetheless
You might also try in #foundations for questions about computability
Thanks, I'll try it !
how can i compute least squares with cholesky factorization?
Can someone tell me what im doing wrong in this process?
im supposed to get this matrix returned by the function : [6., -2., 2.] but instead im getting this [1. 0. 0. 0.]
Marius von Hagen
So you should replace rhand with np.dot(Ain.T,rhand)
(Also np.matmul(A,B) is preferred over np.dot(A,B) for matrix multiplication)
Also you should be multiplying A^TA, not AA^T
Thanks alot, it works
@opal mason look at theorem 2.1.13 in http://www.cse.zju.edu.cn/eclass/attachments/2015-10/01-1446086008-145421.pdf
You should be able to adapt this proof to your problem
I already submitted, but thank you! I'll take a look.
hmm link doesn't work for me
Itâs just very slow
this is for cryptography class so I hope it's the right channel
but I need help with figuring out the probabilities
don't really get it because isn't every deck going to be unique
so wouldn't the probability be 1/6 for every deck
ty
when would we want to find the pseudoinverse(Moore penrose) of a non square matrix ?
To solve least squares
Why is it generally a bad idea to use the inverse to calculate least squares ? as in why do we use cholesky or LU instead of just doing inverse?
The normal equations are ill conditioned
Calculating the inverse is always bad
Also because of floating point reasons
why cant we just use pseudo invese for ill conditioned? is it because it is slower than factorization?
Calculating the pseudo inverse is very slow because the SVD is slow
i see, so in rough terms its all about optimization
"Computing the inverse requires more arithmetic operations than computing an LU factorization."
Especially when doing this for the normal equations like EmilB was doing you can do even better than LU with Cholesky
"We do not address the question of computational
efficiency, but we do note that there is evidence that using the inverse is
sometimes preferable from the performance perspective [3]."
đ
just arguing against "always bad"
In general I agree đ
Ok sure
If you follow [3] then they present 2 cases where A^-1 is faster than LU
Anyways
You should never solve the normal equations in the first case
Hi
I was wondering guys
How can I solve a system of equation with module operation, like for example
In fact, the algorithm does not need to find the solution, it needs to determine if there is a solution or not
I do not know if a regular Gaussian elimination would work and in case it does, I do not know how to perform the modification to it
"Interpolation is not often something thatâs useful in machine learning, but rather something that we often use AI for. Feeding data to a computer and allowing it to make educated guesses for us, especially when it comes to millions of lines of data, is helpful in many different fields. "
Why is it not useful for machine learning? And i suppose by AI they mean neural networks, so why is interpolation good for NN?
with a deep enough network and enough data, the underlying structure can be inferred from the rough data
what's more, interpolating the input data incorrectly can make the network interpolate and extrapolate worse
interpolation and extrapolation don't generate any new information. they just exploit what is already there. the network already does this
so in laymans terms why is interpolation not good for ML , and more common in Deeplearning?
because at best it does nothing, and at worst it produces incorrect data
i'm not quite sure what you mean by machine learning there
interpolation is often an optimization task that can be approached in several ways. deep learning is one of them.
Would you please tell me a little bit more?
why would you use interpolation instead of just linear regression?
linear regression does exactly what the name implies. it assumes the data follows a linear relationship. this is simply not true for most data, and so if you interpolate in such a simple fashion, you make mistakes
the problem is if you take this data and then give it to the network as if it were the correct data (which it isn't)
you then teach your network that it should do this linear interpolation, when it could do much better if you didn't confuse it
So, the linear operations should be redifined to work in Z/2Z ?
Sorry, I think I do not understand. It is being a little bit confusing to me
If R is not considered a finite field, I do think it is not, then I do not have experience with Gaussian elimination over finite fields
Now I get it
There is a way to know if a solution exists without performing the gaussian elimination?
The concept is amazing, it blew my mind
May I ask why did you have to work with this?
Me too hahahaha
"Given N friends, and given M relations of the form A -> B where A and B are friends, find if there is a partition of at most size 2 such that each member of the partition have an odd number of friends inside the partition"
Is it working in O(n^3)?
Would you mind sharing it with me?
No problem, thanks in advance
I suppose this isn't directly related but maybe someone will benefit... looking for something related to numerical methods/ hpc internships e.g. the NVIDIA SWE and MathWorks that entertain undergrad applications. Anyone know of anything good/ worth looking into
stuff similar to this
It looks pretty interesting
Thanks for the sharing
@brave crypt
This is not the correct place for this
You should ask about this in #advanced-number-theory
Can anyone help with interpreting Least squares(OLS) in matrix form, im so lost.
What is your question
So i have this dataset right, where prmres is the dependent variable and the others are the independent. How would set this data up in matrices for least squares ?
Ok so
Given $m$ data points ${y_i}$ measured at $m$ places ${x_i}$, the goal is to determine parameters ${\beta_i}$ so that $f(x,\beta)=\sum_{i=1}^n\beta_i\varphi_i(x)$ and ideally, $y_i=f(x_i,\beta)$ but we cannot have this because there are too many $y_i$ and $x_i$ for a linear relationship so we want to minimize the residual $r_i(\beta)=y_i-f(x_i,\beta)$
Marius von Hagen
To convert this into the matrix form, we take $X_{ij}=\varphi_j(x_i)$ and $y$ as the vector of $y_i$ so we have $\beta=(X^TX)^{-1}X^Ty$
Marius von Hagen
$\varphi_j$ is some function (possibly nonlinear) of $x$
Marius von Hagen
can you give an example?
In your case, X will be the tall skinny vector formed out of oms, konk, and nypr, and y will be the vector pmres
i get that but idk how to represent X in matrix form
$X=\begin{bmatrix}5.355&2&0\5.551&2&0\5.62&1&0\\vdots&\vdots&\vdots\end{bmatrix}$
Marius von Hagen
is X a coefficient matrix? @wide spear
Coefficients of what?
ok so i have this python code to solve a least squares solution, it just uses np.linalg.lstq .
the arrays in the pic above corresponds to the this picture, where area is the independent variable
so what im asking if this is correct
Sure
are you knowledgeable on Stata? Do you know where the least squares solution is given in Stata?
No
alright thanks for the help man, i need to validate my python answers in some other program
Actual coding questions go in #computing-software
actually i think it is wrong, i read up on it and they say its called a "Design Matrix" and the whole first column should be 1's
what else do you know about the model? that sounds like one of the variables is known to be scaled by 1
is this like a polynomial? are the quantities just scaled and added?
the model you have built there says that your data is of the form area = a bedrooms + b age + c price
and you're solving for a, b, and c
having an extra column of ones could be interpreted as either knowing something else about the variables, or maybe having an extra parameter to denote a base area that doesn't depend on the other variables
say area = a bedrooms + b age + c price + (1) d, and d is this base area
Okay, so I seem to be needing to efficiently evaluate Vieta's formulas: https://en.wikipedia.org/wiki/Vieta's_formulas
Imagine I have the roots r1, r2, ... of the polynomials mentioned:
In particular, I have these roots, and I want to know if there's a quicker way of evaluating this (getting the rhs) than just straightforwardly
for k=1,2,3, ... say up to 7
you mean you want to compute both the LHS and the RHS? cuz the RHS is pretty simple
I have these $r_{i_j}$
imagine I don't know these a_{n-k}/a_n
right, so you want the LHS
yes, and I'm hoping there may be a quicker way to evaluate it than just as given
based on these Vieta's formulas I would think there might be?
i don't think so
it's likely gunna be the bottleneck to what I'm doing so I'm hoping for some way đ
i would almost suggest you do the first few small polys with the sum and product definition
and then for large polynomials, actually find the coefficients and use the RHS
ya I'll be doing it for like k=1,2
but "actually find the coefficients" seems to be done with the LHS?
yea ok
so (x - x1)(x-x2)... is the poly
and the coefficients of the poly are what is used in the rhs
the computer does poly multiplication with convolutions, which should be rather efficient
you can compare it if youd like, but this way you save yourself using 2 loops and the cost is that of repeated convolutions
so since it's on the computer, x is a number, it seems like I'd have to do it for like multiple x's and compute the coefficients via finite differences
what do you actually know about the polynomials
earlier you said you have the roots
if you have the roots, there's no need to do finite differences for anything here
polynomial manip is weirdly far from my numerical experience
but I found this: https://stackoverflow.com/questions/33067755/how-to-efficiently-find-coefficients-of-a-polynomial-from-its-roots
so I think I'm good
yes
its beta 0
if you say so đ the name beta 0 alone means nothing to me out of context lol
https://en.wikipedia.org/wiki/Design_matrix but yes my question was not the best, i shoulda asked how to format the linear regression model as a matrix and not just the data
In statistics and in particular in regression analysis, a design matrix, also known as model matrix or regressor matrix and often denoted by X, is a matrix of values of explanatory variables of a set of objects. Each row represents an individual object, with the successive columns corresponding to the variables and their specific values for that...
if you can provide the data model, we can set up the model matrix
it sounds like the assumption is that the data behaves like y = a x1 + b x2 + c x3 + x4
you can do this in two steps. first, find the general solution in terms of all the free parameters (if any). if there are free parameters, you can then solve an optimization problem to make the sum of the coefficients zero (or some other constraint) if it's not evident how to do so at a glance
can you only plot regression with more than 1 feature by using a 3d plot?
This is more of a theoretical question to something involved applied computation math, but I can't find a suitable channel so I'll drop it here>
My question is on subdifferentials
https://www.stat.cmu.edu/~ryantibs/convexopt-F16/scribes/subgrad-scribed.pdf
So one of the examples they give in this PDF about subgradients is the max of 2 convex, differentiable functions. They claim for x such that f_1(x) = f_2(x), the subgradient at f(x) is a convex combination of the gradients of f_1 and f_2. Geometrically this is easy to understand, but how do I go about proving this? The case when f_1(x) /= f_2(x) is pretty much trivial, but I'm not sure how to do the equals case.
The example in consideration:
preferably I'd like a hint for both directions (showing that it's only a subgradient if it's a cvx combination of the gradients), but a proof sketch/actual proof will do as well
first, if they are equal, they have the same subgradient
next, a sketch in R that you can go over to generalize to R^n đ
we have the convex combination x1 a + x2 (1 - a)
let x2 = x1 + c, for some constant c
then the combination is rewritten as x1 a + (x1 + c) (1 - a) = x1 a + x1 - a x1 + c - ca = x1 + c(1-a)
we know 0 <= a <= 1 due to the definition of a convex comb
this means the result ranges from x1 to x2, i.e. either [x1,x2] or [x2,x1]
this is pretty much the same definition of a subgradient itself, since that is a range of slopes, so to speak, for which the subtangents lie below the graph
i.e. if the subgradient is [y1,y2] and x1 and x2 are in this set, then [x1,x2] is a subset of [y1,y2]
and graphically it's the region between two lines
idk if that helps you, i just woke up lol
so I'm kind of lost 
So for the 2D case, $\xi = \lambda\nabla f_1({\bf x}) + (1-\lambda)\nabla f_2({\bf x})$
Then if I suppose $\nabla f_2({\bf x}) = \nabla f_1({\bf x}) + {\bf c}$, so $c$ is some vector here
and then aaaaaa
I get $\nabla f_1({\bf x}) + {\bf c}(1-\lambda)$
mhm
dk
parentheses
wait so is this the sketch for showing a convex combination of the gradients is a subgradient?
or the other way around
you mean cnv comb of gradients?
It's really late here and my thought is very slow 
I'm still failing to see where I get the subgradient from the last expression
oh i guess ur right, yeah
how would I show that every subgradient at that point is a convex combination of gradients then? So starting with some subgradient, showing that it's a convex comb of grads
I feel like I'm missing something 
hmm
probably backwards
start with the subgrad at x being an interval
and show that a subset can be expressed as a cnvx comb
i don't see a good way of showing the endpoints of the interval are the grads of f1 and f2 right now, tho
Anyone know why centers for rbf neural networks are chosen using k means clustering? I'm going to use an rbf neural network to approximate some non linear function and I was thinking of using gradient descent to minimize the error and treat the centers as extra parameters just like the weights. Is there any reason why that approach would be inefficient compared to k means clustering?
Would it lead to overfitting because the algorithm might now assign the centers numerical values that best fit the data, but may not be the actual point about which the data is centered?
Kind of a simple problem but I have no idea with computational stuff. I have a simplical cone (Given 3 linearly independent vectors $v_1,v_2,v_3$ in $\mathbb R^3$, the simplical cone spanned by them is the set of all linear combinations with positive coefficients), and I wanna find out in an efficient manner if a point lies inside (Including on the boundary). The points I wanna know if are inside are specifically integer points if that helps. I've thought that, this cone is actually the intersection of the 3 half spaces given by the planes spanned by every 2 vectors from $v_i$, but i'm not sure how to find which part of the half-space I should even take (And therefore which orientation of the cross product I should take). I also thought there might be an easier way to find this out using recession directions but I haven't really fleshed that thought out
ShiN
I'm heading out now but I'll answer if it is unanswered when I get back
Thanks!
the straight up naive way is to set the vectors as columns of a matrix, invert the mat, and multiply by the coordinate vector
you'll get the coords in the basis of the v_i and then just check if they're all positive
this ignores the points being integers tho
Oh nice, I didn't consider that since I was too focused on the half-space idea đ
There might be a better way but at least this works haha
Thanks
Hi could someone help and point me in the right direction to solve this problem:
I want to write a python script that does the following:
- Isolate a square root term on the LHS.
- Square both sides of the equation.
- Add and subtract terms accordingly to get 0 on LHS.
- Isolate another square root term on the LHS. [ i.e. step 1) ]
- Repeat step 2).
- Repeat step 3).
- Repeat step 1).
ETC . . .
Would this be "solving equations iteratively in Python" ? (I have very little experience with code.)
(And I know I can just tell WolframAlpha to solve this for me, but it would be cool to this piece of a problem I am working on all by myself, in a way.)
Another solution: truncate the cone at a point past your point of interest, then determine if the point of interest is inside this closed convex polyhedron
This is a classical problem that you will find many algorithms for
is the cengage edition of sipsers introduction to the theory of computation fine
like does it have errors in it?
i bought the book for $20 new when the main published edition is for like $200
Might be better asked in #book-recommendations ; and I think Cengage is the authorised publisher for both hardcover and paperbacks.
More theoretical, but any ideas on how to show this is convex? (or counterexample?)
if i'm picturing it correctly geometrically, is this just a shifted quadratic, with lower bound of 1, which should be convex?
I've tried just a bunch of algebraic manipulation and I'm just not getting anywhere
Has someone here worked with polyhedra in sage? there's something dumb that is driving me crazy and I can't find any documentation on

so when you ask for a vertex representation of a polytope it gives you a list that's like 'A vertex at (x,y,z), A vertex at (x_2,y_2,z_2)' etc.
All I want is that, but without the words, and in vector form. Just a list of the vectors making up the vertices
is that too much to as kfor
I have a homology theory with coefficients in GF(2) that I ultimately need to find representatives for generating sets of the homology groups. I'm guessing that this is a fairly standard problem (compute the homology groups) that there might be a python package for? A brief search shows that I might be able to find some routine for computing the Smith normal form instead?
*The boundary maps are sparse finite dimensional matrices
Is computing the Smith normal form the standard route? / Is there a package that can do that for a very sparse boundary maps with dimension ~10k x 10k ?
If you don't get a response here you might also want to ask in #point-set-topology
I would ping max but max is no longer with us
differentiate with respect to x twice
i think given that A is pos def, x is any vector in R^n
so it's convex
hmm
nvm they gave the cond that the result is >= 1
the derivative will be $$A(x - e_1) + (x - e_1)^TA = A(x-e_1)+ (A(x-e_1))^T$$
the second derivative will be $A+ A^T= 2A$ which is the hessian, and is positive semi-definite so youâre function is convex.
c squared
but you can first look for the condition for a vector v, and then let v = x-e_1
you sure that works? they're asking for the convexity of the set, not the func
oh whoops. misread
2x - y = 0
x^3 + x - y = 0
am exact ans is (1, 2)
if my guess is (2, 3) and after my first iteration calculations, my values are (1.5, 3.5)
there's must be a mistake in my calculation
Hi friends I have no idea where to ask this question but maybe you guys have an answer: Is it possible to calculate partial derivatives of an pde even though it has no analytical solution? Like for example the 3dim Navier Stokes Equations?
Doesn't this only pertain to functions not sets
Turns out there was a very simple counterexample and I was too fixed on showing convexity
Which method are using to solve iteratively?
i mean, technically functions and variables are sets. but anyway, i just misread ur question and the advice i gave shows that ur function is convex. not that the set in question is convex
yea i was working through it. i got it to almost work out but there were some terms that just didnât want to disappear
You'd probably want to use a sympletic solver right
Equations coming from physics usually conserve energy
Given an array $a$ of $n \leq 1000$ integers with $1 \leq a_i \leq 250$, compute the number of distinct subarrays of $a$.
I have a solution using rolling hash that seems to work, but what is the deterministic solution?
I feel it should use string algorithms
IlIIllIIIlllIIIIllll
hey all
i have a problem i wanna write an algorithm for
but im v confused on how to start
Best to ask in the CS server, linked in #old-network
what would happen if you defined a new type of automataon saying that each state needs to have >=1 outward arrows for each letter in the alphabet
would any new special properties arise
For this question on a modified Newton's method problem, I can't comment on stackexchange, but in the last line of Gordon's answer, isn't f(r) = 0, so he is dividing by zero? My own answer came out quite a bit differently, I got the same s value but C = f''(r)e0/[2f'(r)]
Suppose I wanted to make a linear fit to a dataset with vector input and output, by minimizing the least square error. Then the least square error equation would be
$$E = \frac{1}{2}\sum_i(W\vec{x}^{(i)} - \vec{y}^{(i)})^2$$
And I have to find the weight matrix $W$ that would minimize this. Can't I treat this as a colelction of linear regression problems? Like the least squared error for the j'th component of the output would be
$$\frac{1}{2} \sum_i ( \vec{w}_j \cdot \vec{x}^{(i)} - y^{(i)}_j )^2$$
And each component equation is independent from the next. So can I treat these as simple linear regression problems and find the rows of the weight matrix from the one step formula of linear regression that used the pseudoinverse (I forgot the exact formula) ?
Abrar
I think I can but I see other people use methods like gradient descent or particle swarm optimization to get to the minimum instead for this sort of problems with vector outputs so I'm confused.
Hi Guys, I have a scenario, I want to turn it into a mathematical model,
The scenario:
I have a car that makes a reservation for the network resources in advance (use it in the future), That is, when the car reaches the coverage of the Base station (as you can see in the attached Foto), for example, the car has been reserved resources for a while, however, this reserved time may change (update) before the car enters the coverage of Base station due to uncertainty in traffic and movement. Updating the reservation takes the following steps:
-
cancel partial (overbook): If the reserved time is more than the time car needs it right now.
-
Cancel all and new reservation (underbook): If the reserved time is less than the time car needs it right now.
Cancel Partial:
In the cancel partial there are different strategies, for example:
a) If 70% of the reservation is overbooked, should we cancel it all and reserve a new reservation, or cancel it partially why?
b) If 50% of the reservation is overbooked, should we cancel it all and reserve a new reservation, or cancel it partially why?
c) If 20% of the reservation is overbooked, should we cancel it all and reserve a new reservation, or cancel it partially why?
There are cancelation strategies and fees in our model:
Peak-On (i.e. Traffic flow is very high in this Base station coverage):
- Cancellation of reservation before entering the car the base Station coverage. The cancellation fee is 5%.
- Cancellation of the reservation after entering the car the base Station coverage. The cancellation fee is 20%.
Peak-Off (i.e. Traffic flow is very low in this Base station coverage):
- Cancellation of reservation before entering the car the base Station coverage. The cancellation fee is 0%.
- Cancellation of the reservation after entering the car the base Station coverage. The cancellation fee is 5%.
I'm going to be honest, this question doesn't really fit here
I have a question,
In class I've proved the following inequality
Where Ax=b and Ax^=b^
Now I want to come up with a proof that there exists a b thats not b^ s.t. the equality holds
but I'm not sure where to start
I can see that I can easily get the equality if I assume |A^-1| |b-b^|=|A^-1(b-b^)| and |A| |x|=|Ax|
but I'm sure theres a stronger assumption I can make about b
any help is appretiated
You can consider vectors that maximize the operator norms, but I don't remember the outline of the proof
hi
can someone suggest some projects to be done with newton raphson method
a unique one infact
where you explain the project objective
methodology
problem statement
and algorithm
Bump
https://financialdata.tiiny.site/ I pasted the data that I got using python. My problem is that the PCA gives me 49 eigenvalues and I have 93 variables. Isn't that strange ? I should have as much eigenvalues as variables no ?
you have the data arranged as a matrix, yeah?
No one is going to do your homework for you
Why rows
Isnât it supposed to be related to columns ?
google rank
well, PCA deals with the eigenvalues of the gram matrix. you could build that either as A^TA or AA^T
A^TA
regardless, since the matrix is 49 x 93, apparently, then it can at most have rank 49
Matrix here is 93 93
Why 94 -49 ?
as a lazy proof, you can use the SVD to prove this
if the original matrix is 49 x 93, let's call it A
then it has an SVD such that A = USV^H, where U is an orthonormal mat of size 49 x 49, S has the singular values along its main diagonal and is of size 49 x 93, and V is orthonormal and of size 93 x 93
then A^H A = V S^H U^H U S V^H, and sice U is orthonormal, U^H U = I
then S^H and S are diagonal and you get a diagonal matrix D = S^H S of size 93 x 93 with at most 49 nonzero entries
and A^H A = V D V^H, all matrices square of size 93 x 93, V orthonormal, and D a diagonal matrix, so this is the eigenvalue decomposition of A^H A
and it has the same rank as A, and the eigenvalues are the absolute value squared of the singular of A
the rank of A is <= min(number of rows, number of columns)
so the rank of A^H A follows the same property
Whats H ?
hermitian transpose
complex conjugate transpose
not needed here if the data is all real, so just replace it with a regular transpose
How you know 49
because of how diagonal matrices multiply
try this
take $\begin{bmatrix} a && 0 && 0 && 0 \\ 0 && b && 0 && 0 \end{bmatrix}$
Edd
call that S
and calculate S^T S to convince yourself
this is easy to generalize to arbitrary sized diagonal matrices
hii
does someone know about NR method
newton raphson
and thomas algorithm
@prime kraken ?
@wide spear
If you have a question, ask it
We aren't here to do your project for you
anything where one has to optimize something
We aren't here to design your project either
still don't get it
they are diagonal yes but i don't the link with non zero
dude write it on a piece of paper and do it
and here's a related proof, too
oh i get it
it's because S is 49,93
so maximum u have 49 non zero entry right
for D
"S has the singular values along its main diagonal and is of size 49 x 93"
Someone could use a linear algebra refresher
how do you know that ?
that is has the same rank as a ?
no
here's a quick proof
i get his proof
ut now i'm stuck at the end
how did he get dim = dim and rand = rand from the first equality xd
The methods in question are PCA, EM algorithm, Neural networks and Naive Bayes
Obiviously for PCA there is a positive effect as we want to find underlying correlations. For the other techniques, I am less familiar with them.
Perhaps for neural networks we can consider correlation as a feature to work with, so it would not really matter.
I have a question about matrix decomp
so I know how to decompose a matrix A = L*U
where L is a lower triangular matrix and U is an upper triangular matrix
Now let A = [4,-1,-1;-1,4,-1;-1,-1,4]
I was able to decompose it into LU but one of the question asks me to decompose it to Ltrans(L)
But I don't know any method of doing that
Any help would be appretiated
See Cholesky decomposition
can anyone help with gaussian quadrature

i need to estimate this integral using the gaussian-legendre quadrature formula
And the question is "Would you consider the result of Gaussian quadrature in this case a good approximation? If no, why do you think, this integral is difficult to approximate? "
but idk what to make of it
so the polynomial : (1)/(x**2+0.0001) and im integrating with range [0,1] and degree 2 and i get the result : 23.82841
So that begs the question which i dont know the answer to: Why is this a bad estimation?
to get a result of 156 i would need the degree to be n = 100, that wouldnt make sense tho as the degree is the degree of the polynomial
where are you seeing a polynomial
Hi, I was sort of confused about this question and wanted to see what others had to say
My thinking is that as the model parameter increases, the training error decreases but the test error will increase too due to overfitting
For this reason I was thinking that the first one is probably not correct but, the second one sort of makes sense to me as well
Hi. Just curious. Would this be the right channel for numerical analysis stuff?
Yes
Could questions on automata theory work here? I don't quite understand how NFAs are made.
Best to ask in the CS server
I have a question to do with LaGrange interpolations. better to ask here or #advanced-analysis ?
Fine here
okay i think i figured it out but i was just having trouble reasoning it out this derivative. im still not sure if i did it correctly. Here is part of my work
im trying to find omega prime given omega.
this is the prompt:
Ok that looks fine
okay i was mostly concerned with the making sure everything was accounted for properly in the product rule expansion.
Coding questions go in #computing-software
how do I build this block tridiagonal matrix in matlab? I was able to build T but not A
Update: I was able to build a block diagonal matrix will all matrix T's. I do not know how to include the identity matrix yet
Hi quick question: What norm should be used for the banach fix point theorem? I have a fix point iteration of the form $x^{(k+1)} = \Phi(x^{[k)}) = Ax^{(k)} + b$ where A is a matrix and b a vector, now i wanted to do:
$||\Phi(x) - \Phi(y) || = ||A(x-y) || \leq ||A|| \cdot ||x-y||$, i would like $||A||$ to be my contraction number but for some norms i get $||A|| \geq 1$
Gewisser Fler
Whatever norm works
You'll get convergence in whatever norm you choose
And on fin dim vector spaces, all norms are equivalent
Easiest is to use Kronecker products. What you have there is 2d Laplace so just ```
n1=4;n2=4;
L1=toeplitz([-2 1 zeros(1,n1-2)],[-2 1 zeros(1,n1-2)])
L1 =
-2 1 0 0
1 -2 1 0
0 1 -2 1
0 0 1 -2
L2=toeplitz([-2 1 zeros(1,n2-2)],[-2 1 zeros(1,n2-2)]);
L2D=kron(L1,eye(n2))+kron(eye(n1),L2);
Modified so you can have different `n` in each dimension
This worked thank you!
I'm doing a PCA
I'm a little bit lost to know if I should remove outlier ?
and how to know if they are outlier
robust PCA is a big deal, since covariance matrices are susceptible to them
but sophisticated techniques are beyond your reach right now
you could try something simple
vectors whose norm differs too much from that of the others, for example, could be discarded
or entrywise computing the quartiles, too
@prime kraken i computed quartiles to find outlier
but idk if it's usefull to delete them
it is
Can I send you my notebook ?
no
ok x)
@prime kraken
for example for the column the mean is 36 and the max is 156
if he is considered as an outlier using quartiles, should i remove it ?
it's not a value that is crazy but it's also considered as an outlier by the strategy so idk
it also depends on what the data represents, so i can't comment
oh ok
Given the error amount (higher_order-lower_order)*step_size, what is the proper way to accept or reject a step in adaptive Runge Kutta? And what is the formula to adjust the step_size for the next step? Everybody on the internet says something different...
I have the iterator like this:
const auto result = method_type ::apply (problem_, step_size_);
const auto evaluation = error_evaluator_type::evaluate(problem_, step_size_, result);
if (evaluation.accept)
{
problem_.value = result.value;
problem_.time += step_size_;
step_size_ = evaluation.next_step_size;
}
else
{
step_size_ = evaluation.next_step_size;
return operator++(); // Retry with the adapted step size.
}
now trying to implement error_evaluator_type::evaluate which returns a boolean saying whether to accept or reject the current step and the next step size, but stuck AF:
struct error_evaluator
{
constexpr static auto absolute_tolerance = 1e-6;
constexpr static auto relative_tolerance = 1e-3;
template <typename problem_type, typename time_type, typename result_type>
static error_evaluation<time_type> evaluate(
const problem_type& problem ,
const time_type step_size,
const result_type& result )
{
time_type error = NORM(result.error / (absolute_tolerance + relative_tolerance * MAX(problem.value, result.value)));
return {error < time_type(1), step_size / error};
}
};
Boost.Odeint uses a weird scheme that I have not seen in any book: https://www.boost.org/doc/libs/1_63_0/libs/numeric/odeint/doc/html/boost_numeric_odeint/odeint_in_detail/steppers.html#boost_numeric_odeint.odeint_in_detail.steppers.controlled_steppers
I just want the classic step size adaptation method
Accept in error < threshold (decided by you)
Formula for step size: double if error is very small, half if error is large
Or whatever amounts you want
I want to make a comparison of the convergence rate of different matrices for conjugate gradient vs gaussian elimination. I have the expression $\frac{|| x_k- x^* ||}{||x^*||} \sim (1- \frac{1}{\sqrt{K_A})})^k$ for the convergence of the CG method. Im wondering if there is a similar expression to be obtained for the convergence rate of gaussian elimination involving condition numbers s.t I may compare use the condition numbers to evaluate which is the quickest method?
I believe $\frac{|| x_k- x^* ||}{||x^*||} \sim (1- \frac{1}{K_A)})^k$ holds for steepest descent, but does it work for A = LU?
here $x^$ represents the true solution of $Ax^ = b$ and $x_k$ is the solution of conjugate gradient at the k-th iteration.
The run time of Gaussian elimination doesn't depend on the condition number because it isn't an iterative method
well I believe the gaussian elimination in this context is LU decomposition
ahh, yeah that makes sense. But then the best comparison would be to state the time complexity for the different methods I guess
Yes
right thanks
not sure if this is the correct place to ask but
how do I turn a max min objective function into a linear program?
@wide spear sorry for the ping. I kept finding different answers for the time complexity for the conjugate gradient. There is a paper titled "An Introduction to the Conjugate Gradient Method without the agonizing pain" which lists the time complexity as $O(m \sqrt{K_A})$ where m is the number of nonzero entries in the mtx. Then if the number of unknowns is equal to $1, N=1$, there would be nothing to gain, but as N increases ($N \geq 2$) the CG algorithm will always converge faster. Does this sound reasonable?
Fredrikpiano
yes, Shewchuck. N is just to denote the dimension i .e N = 10 generates a 10x10 matrix in the matlab program I am running to generate some matrices which I want to compare the convergence rate for. I choose the infinity norm for when I am computing the condition numbers (||A^-1|| ||A||_\infty), but I could have chosen other norm as well right?
Usually the condition number refers to the ratio of the largest and smallest eigenvalue
Clearly there is nothing to be gained when N=1
But you haven't said what you're comparing CG to
Comparing it to LU decomposition with time complexity O(n^3)
Ok
one can relate condition numbers to a mtx being singular as well right? If the condition number is very large for example
Yes
I also have the following exercise Im working on from my teachers lecture notes
Im thinking I should show that A has one lin. independent column and work from there
if anyone could help or give me hints it would be much appreciated...
yes, i think we are choosing an i
such that a_i will be minimized
and then we "maximize" by choosing a x
I think the first step would be to replace the entire
with "A"
and then try to translate this minimiztion equation into a set of constraints
but im not sure how
Wait are you minimizing the sum of the a_i
What does minimize the a_i mean
You can rewrite that sum as Ax, yes
i believe so
Why do you want this as a linear program
What constraints do you have on A aside from Ax<=1
this is the only constraint
and this is an exercise, the rest of the numbers were fairly straightforward
I see
Wait
We don't minimize abs(ai1)+abs(ai2)+... right
We minimize ai1+ai2+...?
Then there is no minimizer A
You can make the sum as small as you want
Ok that makes more sense I think
so first i minimize the sum, and i pick the biggest "component"
Yeah
I'll assume that A is fixed then
Then you have m linear programs to minimize a_i^Tx
Then you have one linear program to maximize c^Tx where x is a vector with entries summing to 1 and c is the vector of minimizers of the a_i^Tx
And you use the constraint when minimizing a_i^Tx
I think
ok so you're suggesting i use two linear programs?
Yes
Can someone remind me about how random matrix theory is related to compressed sensing?
For example in the matrix completion problem you need to assume the entries are missing randomly. So you are basically dealing with Bernoulli random variables. Then you have a matrix of Bernoulli variables. Somehow properties about random matrices of 0 and 1s is relevant somehow. So I think random matrix theory comes in here. But I wanna know the core results that are relevant
you're interested in matrices that satisfy the restricted isometry property, then
looks good
Thank you so much I think the tools from probability chapter is exactly what I was looking for
Would this channel fit for Automata theory?
No, you should ask about automata theory in the CS server, linnked in #old-network
đ
hi friends possible to compute cholesky factorization on non square matrix? đ
^matlab
thank you â€ïž
anybody here know machine learning?
You might have better luck asking in the ML server linked in #old-network but you can always try asking here
alr, thanks
just going to ask here in case:
alr, so i have some data:
X1 X2 Label
0.4 0.4 Red
0.8 0.8 Red
0.2 0.4 Blue
0.4 0.2 Blue
0.8 0.2 Blue
0.2 0.8 Blue
and i have the following kernel K(x,y) = y^T x / ||y|| ||x|| and i am tasked with mapping each data point in the original feature space to the new feature space representation implicit by K(x, z).
im kind of unsure what to do...
just to make sure im understanding, the original feature space is the set of ordered triplets that look like
(x1, x2, label) correct?
Anyone can help me with calculating norm of inner products?
I've got u,v in V
V is a space of inner products
| | u | | =3 ; | |u+v| |=4 | |u-v| |=6
how do I find v?
or | |v| |
#linear-algebra is more appropriate I guess
use the associated inner product that induces the norm
for example, $\langle u, u \rangle = \Vert u \Vert^2 = 9$
Edd
#linear-algebra indeed
indeed đ
how does it work with | | u + v | | =4 -> | | u + v | | ^2 = 16?
does it give the option to break it?
i'm using RStudio and i'm trying to analyse a dataset. one of the variables in it is "product_brand" where each entry is a brand name e.g. "APPLE". i want to make a histogram of a different variable in the dataset but i only it to consider data from a specific brand, does anyone know what i would write?
ah sorry
Sage, Magma, and Mathematica will compute smith normal forms for integer matrices
Does heun's method give the same solutions as Euler's method? (Specifically the forward euler)
I'm a little confused on this, because Heun's is considered a 2nd order method, so I don't see how it would give the same answer as Euler's. But if it's just an improved Euler's method, shouldn't it give the same answer?
No it doesn't give the same answer as Euler's method
How does the green part works?
I search around the internet and find "finite difference approximation" but how?
you can read about it on almost any google search about finite difference approx :p
In numerical analysis, finite-difference methods (FDM) are a class of numerical techniques for solving differential equations by approximating derivatives with finite differences. Both the spatial domain and time interval (if applicable) are discretized, or broken into a finite number of steps, and the value of the solution at these discrete poi...
Intuitively think maybe like this: f'(x)=df/dx Here, df is a small difference in f and dx is a small difference in x. So, df=f(x1)-f(x2) where x1 and x2 are close, and then dx=x1-x2. f'(x)=df/dx is only true as the difference goes to zero and for finite differences you only have that f'(x) is approximated by df/dx
In your case you are interested in f'(x) for x=p_{n-1} and you have that p_{n-2} is a point close to p_{n-1}. Hence, just plug in your values to approximate the derivative f'(p_{n-1}), that is df=f(p_{n-1})-f(p_{n-2}) and dx=p_{n-1}-p_{n-2}
That makes sense, thank you!
A question for seasoned data scientists. Letâs say we have a dataset and we suspect it has a lower-dimensional underlying structure. Obviously, weâd like to work with fewer dimensions. But all dimensionality reduction algorithms I know of presuppose the structure of the manifold: PCA thinks of the data as an ellipsoid; Isomap only works if you happened to correctly guess the topology you embed the data into, considering the usual guess is a flat plane, it isnât very useful most of the time; t-SNE thinks of the data as a collection of globular clusters, and tries to fit anything into that framework; and so on. Iâm curious, is there an algorithm that can learn the underlying manifold directly? As an atlas? Or, even better, the algorithm to extract not only the topology of the data space, but also its curvature in every point?
Isnât it just a slightly better version of t-SNE?
Based on the paper, it seems like they aren't assuming anything about the underlying manifold
anyone knows about the redlich-kwong equation of state
i wanted to know if that eqn could be solved using numerical methods?
From Wikipedia: ```
This equation only implicitly gives Z as a function of pressure and temperature, but is easily solved numerically, originally by graphical interpolation, and now more easily by computer. Moreover, analytic solutions to cubic functions have been known for centuries and are even faster for computers.
No idea, I have never heard of this equation.
ask your teacher
hello! i have asked a question in #help-14 regarding finite difference approximation. im not sure how to tackle this problem.
Can you repost your question here and close the question channel
Hello! I currently have a problem with using finite difference approximation for solving the eigenvalue problem of a periodic boundary condition. Right now I only have the second-order forward and backward approximations for f'(x) and central approximation for f''(x). From there, I don't know where to proceed. As far as I know I should set up a banded symmetric matrix A?
The eigenvalue problem is as follows:
y''(x) = -lambda y
y(0) = y(1)
y'(0) = y'(1)
here it is @wide spear
from what I know about FDA, i'm supposed to have some equation that looks like
$$
\begin{bmatrix} 2 & 1 & 0 & 0 \ 1 & 2 & 1 & 0 \ 0 & 1 & 2 & 1 \ 0 & 0 & 1 & 2 \end{bmatrix}
\begin{bmatrix} y_1 \ y_2 \ y_3 \ y_4 \end{bmatrix} = -\lambda
\begin{bmatrix} y_1 \ y_2 \ y_3 \ y_4 \end{bmatrix}
$$
$$
\begin{bmatrix} 2 & 1 & 0 & 0 \ 1 & 2 & 1 & 0 \ 0 & 1 & 2 & 1 \ 0 & 0 & 1 & 2 \end{bmatrix}
\begin{bmatrix} y_1 \ y_2 \ y_3 \ y_4 \end{bmatrix} = -\lambda
\begin{bmatrix} y_1 \ y_2 \ y_3 \ y_4 \end{bmatrix}
$$
gristCollector
there we go lol
thats what i believe the matrices look like for dirichlet boundaries
but im not sure how this is going to look like in periodic boundaries
With periodic boundary conditions you'll get a 1 in the top right and bottom left corners of the matrix
The [1 2 1] wraps around, in a sense
Also don't some of those need to be negative
Shouldn't it be 1 -2 1
oh right, because $y(0) = y(1)$, which means $y_1 = y_4$
gristCollector
oh right, should it?
ah yeah it is whoops
it is supposed to be -2
but is that the only change to this matrix?
Yes, that's the only change
ohhhh
ok that makes a lot of sense actually
and now i just use the same methods as i use in dirichlet to calculate for the eigenvalues?
Sure
i see, thank you very much!
and eigenvalues of the circulant are f(t_j)=-2+2cos(t_j) for t_j=(j-1)2*pi/n j=1,...,n
may i ask why this is the case? our lectures just passed over this tbh and i didnt understand how
Because circulant matrices form a matrix algebra that can be diagonalized by, for example, the Fourier matrix, and in that case the eigenvalues are given by sampling the symbol (function f here, which generates the matrices (the elements of the matrix are given by the Fourier coefficients of f)), by for example that grid t_j. So, all eigenvalues of all circulant matrices are given in the same way.
i see
im still trying to understand the last bit about fourier coefficients but i think i get the gist of it
If you have Dirichle BC instead the grid is just t_j=j*pi/(n+1) instead (same symbol f)
f(t)=-2+2cos(t)=exp(-i*t)-2+exp(i*t) then Fourielr coefficients are f_{-1}=1 f_0=-2 and f_1=1 and that is the elements on the upper diagonal, main diagonal, and lower diagonal of the matrix (and in the circulant case for periodic BC you just add 1 oin top right, bottom left corners)
Dirichlet -> $T_n(f)$ and Periodic -> $C_n(f)$ and Neumann -> $T_n(f)+R_n$ where $T_n$ is a generated Toeplitz matrix $C_n$ is a circulant and $R_n$ is a low-rank matrix (only 1 in top left and bottom right corners)
Sven-Erik
(they all share the same symbol, eigenvalues are just given by different grids)
-Âż
ok i think i understand it
In engineering the matrices that diagonalizes these matrices (when combining different BC) are usually denoted dct-* and dst-* (discrete cosine and discrete sine transformes where *=1,2,3...)
Anyone here know about proximal operators
any help with how to get started on this stochiastic calc problem
i'll post it there
Hello everybody.
anyone has the demonstration or a paper that discusses the rate of convergence of stochastic gradient descent when the gradient is L-Lipschitz continues? I can't find it in any paper
Thank you. it just say SGD does not improve when we further assume f has Lipschitz gradient. So I conclude that there is no proof yet under this condition.
there are several papers on related topics though, just look it up on google scholar
there's other stuff here https://www.lpsm.paris/pageperso/picard/gradsto_pointdevue.pdf
Thank you. This powerpoint contains a reference which has a demonstration under this condition.
đ
Hey! I want to model a spiral transfer with a low thrust engine from an orbit of 2000 km to 40 000 km. Any advice on how to model this? I'm thinking express the equation of motions in cylindrical coordinates:
[
\frac{d^{2}r}{dt^2}-r\left(\frac{d\theta}{dt}\right)^2 = -G\frac{M}{r^2}
]
for the radial part, and for the tangential part:
[
r\frac{d^{2}\theta}{dt^2}+2\frac{dr}{dt}\frac{d\theta}{dt}=\frac{F}{m}
]
where (F) is the force of the engine then. I suppose I could just numerically solve this in MATLAB? How can I rewrite this system in a form that I can feed to a numerical method, like \texttt{ode45} in MATLAB?
Victor H
To feed this into a pre-built solver, rewrite this as a system of 1st order equations
Yeah, I'm struggling a bit with how haha, I totally forgot lol sorry.
How would I start off? If I remember correctly you usually express the second derivative by introducing an extra variable, and use that as an intermediate variable... I think
Write $x=\frac{dr}{dt}$ and $y=\frac{d\theta}{dt}$ so your differential equations become [\frac{dx}{dt}-ry^2=-G\frac{m}{r^2}] and [r\frac{dy}{dt}+2xy=\frac{F}{m}] and [y=\frac{d\theta}{dt}] and [x=\frac{dr}{dt}] so you have [\frac{d}{dt}\begin{bmatrix}\theta\r\x\y\end{bmatrix}=\begin{bmatrix}y\x\ry^2-\frac{Gm}{r^2}\\frac1r\qty(\frac{F}{m}-2xy)\end{bmatrix}]
éæŻć
You're such a fking chad
Thanks mate, I really appreciate it
I'll try to fix this, and inevitably fail and probably ask again. But damn bro, thanks a whole bunch
it seems ok ig
if anybody here knows anything about ML or something similar and is willing to help, id really appreciate it. pls ping me if you do.
so i have the following question i need help with. im pretty sure the weights matrix has dimension 2 (its a square 2 by 2 matrix) and the biases are also two dimensional.
im pretty confused on part (b) though
Chain rule
no, no, like. i dont know how all of these things are supposed to come together. i can do chain rule just fine, i just dont even know what to apply it to
to get from the input layer to the middle layer
is this on the right track?
Sure
so then what? just calculate partial derivatives?
Yeah
sick. so happy i was doing that right. thanks
wait so like
optimisation of the svm hard margin thing
they say set b = -1 and seem to negate the t and x
i thought it would be optimised by setting to b = 1 as the constraint no?
confused why they selected the A matrix as each instance of -t^ix^i and not just postive
guys, how would i go about finding eigenvector of a matrix with eigenvalue 1?
in code?
The power method is sufficient assuming that 1 is a maximal eigenvalue, or the inverse power method if itâs a minimal eigenvalue.
But if this not the case, thatâs okay, there are surely other iterative methods that are slipping my mind rn.
If you donât care about time complexity, finding nonzero x that solves (A-I)x=0 would work
i know
but power method wont get me as many marks đ precisely cuz of the time complexity
ill use both tho
thnx
im basically trying to find steady state vector for a markov chain
Yeah, doing (A-I)x=0 is only funky because A-I is singular
hm yeah, do you have an expected time complexity?
idk tbh đ
So given a method that has a region of absolute stability D (think about some methodsâ stability regions here such as Eulerâs), as you allow h->0, in order for your method to converge, does h\lambda_k need to be in this region D. In other words, can there be an eigenvalue with h\lambda_k outside of D without the method diverging (hint: this criterion is a part of what makes a method conditionally stable, think again about Eulerâs method)
just the power method will be considered comparitively inelegant to (A-I)x = 0 i guess
wait, u are saying power method is faster than finding eigenvector using (A-I)x = 0?
Well it depends on if 1 is a maximal eigenvalue
doing (A-I)x=0 would require Gaussian Elimination of A-I I would think
yep
thats O(n^3)
mhm
power method for a large matrix is..
Isnât that just doing Ax for multiple x? so thatâs n^2
then some number of iterations
complexity of dot product is O(n^3)
right?
tho i guess if its a vector with matrix it could be O(n^2), not sure
Nonetheless neither method is fast
Is there anything known about A?
or any property it possesses
indeed
idk what to make of that tbh
thats why these numerical methods are kind of tricky
how fast would a large matrix converge?
eh, i might just do the power method cuz im probably gonna run out of time
yeah rate of convergence is |\lambda_1/\lambda_2|, so you would need to know how large 1 is in comparison to the next largest eigenvalue
its a markov chain
so.. i think
wait lemme just use eig
these are the eignvals in this case
this is just a dummy version tho
actual matrix is something like this
ah yeah, so power method should converge quite nicely as long as you have similar ratios to that
each row adds up to one

