#linear-algebra
2 messages · Page 279 of 1
B1 = (1,1,1,0,-1)
a different basis
Then this B1 is a coordinate reprrsentation of some vector b1
Is it true that $B_1 = [b_1]_e$?
criver
Where _e indicates the canonical basis
well what is [b1]_e? it means that b1 = a1 * v1 + a2 * v1... where v_i is the standard base
sure then, I guess? it changes nothing basically right?
e.g. $B_i \ne [b_i]_C$
criver
please don't tell me to sandwhich xd
$I [v]_B = [v]_C$
wait
criver
what is $$[b_i]_B?$$
blackmamba[they/them]
(1,0,0,0,0)
yeah
So
I brings me from B to C
don't I need to multiply by (1, 0, 0, 0, 0)
and then I get $$[b_1]c = (1, 0, 0, 0, 0)$$
blackmamba[they/them]
and what does that mean
it means b1 = 1c1 + 0c2 ....
so b1 = c1
$$[I]^B_C*[b_1]_B = [b_1]_c$$
blackmamba[they/them]
I think that's incorrect because while $[b_1]_B = [1, 0, 0, 0, 0]$
criver
and this my matrix
that's the definition of the transition tho
So it requires a transformation like a covariant vector
For contravariant vectors, not covariant ones
do you agree this property holds for all w?
wait, one thing at a time
I'm not sure what contravariants are, but this is partly what the solution shows so it is probably correct...
if w is a contravariant vector then sure, if v is a covariant vector then no
Let me give you an example
Say you're in 1D
And you do a transformation that does [v]_C = 2 * [v]_B
That is, the coordinates become twice larger
So the vector coord [v]_B=2 becomes the vector coord [v]_C= 4
the transformation matrix is the identity one here
it's not it's 2
So it looks like your vector became twice longer in my example
I get your point w/ the issue
the thing is I wrote it to be true for identity transformations
But it is the same vector, so it should be the same length
Now let the basis vector b1 = 1 in the canonical basis
In C b1 is 1/2
In order to compensate for the coordinates becoming twice larger
From there you see that the vectors (i.e. contravariant vectors) transform inthe opposite wsy of the basis vectors
for the identity, the opposite is the identity
So nothing changes
But for other transformations its the reverse
You have to be careful how you transform vectors and covectors
The rules for transforming them are different
And the rules are such so that length and angle measurements of the same vectors agree between different coordinate systems
I really don't like video editing.
if A is anti-symmetrical(A=-A^T) does that mean that over R all of its eigenvalues are 0?
because:
$$Av = xv => -A^Tv=xv => (-A^Tv)^T=(xv)^T $$
$$=> -v^TA=xv^T => -v^TAv=xv^Tv => $$
$$-v^Txv=xv^Tv => x=-x => x=0 over R$$
blackmamba[they/them]
it's 0 or imaginary only eigenvalues
question is my proof correct or did I mess up the matrix multiplications
is the Jacobian limited to 2 dimensions?
no
Jacobian is just abs value of the derivative
whatever size the derivative is matrix wise doesn't change
I have a vector function and want to take the matrix derivative
well some people call the first derivative matrix the Jacobian
is the result some form of jacobian?
yes
by vector function you mean (x(u,v), y(u,v), z(u,v))?
you can get a non-square matrix e.g. for surfaces in 3D
but then you can take sqrt(det[J^TJ])
but is the Jacobian still 2D?
$$Av = xv => -A^Tv=xv =>transpose (-A^Tv)^T=(xv)^T $$
$$=> -v^TA=xv^T =>{"multiply by v"} -v^TAv=xv^Tv => $$
$$-v^Txv=xv^Tv => x=-x => x=0 over R$$
what do you mean is the Jacobian 2D?
fk
you can define it for an arbitrary dim vector function
if I have M vector derivative of N vector
can anyone check this?
the jacobian is MxN
I transposed it and in the second row multiplied both sides by v
so you have: (x_1(y_1, ..., y_M), ..., x_N(y_1, ..., y_M))?
sure you get a matrix, that will not be square if M!=N
sure, thats not my issue
whether the Jacobian is defined as MxN or NxM depends on convention
it depends on how you like to order you element
now if I have a PxQ matrix derivative of R vector function, is the Jacobian PxQxR
wdym O vector functions?
length, bad naming
I still don't understand what that means
a function that outputs a vector of length R
but that was already the case:
x(y) = (x_1(y_1,...,y_M), ..., x_N(y_1,...,y_M))
you feed this an M-dim y vector
and it spits out a N-dim x vector
so wdym vector of length R?
isn't N the R that you are referring to
to be sure, x(y) is a vector with N elements
so taking the M different derivatives of the N elements yields you the NxM or MxN Jacobian matrix
I see, let me rephrase. Jacobian of a function with matrix input and vector output
matrix input can be vectorized
m = vec(M)
then you're back to the original thing
but by all means, you can form a holor with a 3rd dimension if you want
a 3-d array
sure you can produce something that is NxMxR
or some permutation of those
it's just a derivative
its not a jacobian then though right
$\frac{\partial x}{\partial M}$
criver
I haven't seen it anywhere, but you can generalize jacobian to mean that I guess
I don't think it matters does it?
thanks, the reason im asking this is because im trying to generalize the gauss-newton algorithm
criver
it's just terminology, it shouldn't have any bearing on whether you're able to generalize an algorithm
as mentioned you can vectorize the matrix
then you can probably apply the original algorithm as is
thats a good idea, thx
$(\lambda+\overline{\lambda})\langle x, x\rangle = \langle \lambda x, x \rangle + \langle \overline{\lambda} x, x \rangle = \ \langle Ax, x \rangle + \langle x, \lambda x\rangle = -\langle x, Ax \rangle + \langle x, Ax\rangle = 0$
criver
@magic light
in the above x is an eigenvector corresponding to the eigenvalue lambda (e.g. we can assume x!=0), that is lambda * x = A * x, then from the above it follows that lambda + conjugate(lambda) = 0
but that's possible only if lambda is 0 or imaginary
the inner product is <u,v> = v^H u
for the first equality I have used linearity, for the second I have used that lambda * x = A * x if x is an eigenvector, and I have also used the conjugation property of the inner product when moving scalars between the two slots, for the third equality I have used that the adjoint of A is -A, and that lambda * x = A * x if x is an eigenvector and lambda is its corresponding eigenvalue
x is eigenvalue
eigenvector in my case
this thing
$\langle u, v\rangle = \overline{v}^Tu$
criver
it's the standard inner product for C^n
another way is to simply see that the characteristic polynomial is only solveable for eigenvalues 0(in R)
some other vector space
Are the square and parallelogram both can be written in the form of $t_1E_1+t_2E_2$ ?
Potato
@grave garden no, points in the square take the form t1E1+t2E2. the solution shows that points in the image of the square take the form t1(1,1)+t2(-1,2)
Given a vector space V, can we know how many linear map are there ?
a map is completely determined by where you send the basis, now think through
So they are the same number as the maximal set of linear independent ?
those are definitely some words coming out of your mouth
...are you like, working with a vector space over a finite field or something?
Yes
Is it wrong ? 💀
this message here?
it's nonsensical.
I see
Im having a hard time with linear mapping 
What are we trying to do when talking about describing the image of a subset under a mapping ?
Are we trying to find $W(F(x,y))=1, (W(x,y)=x^2+y^2)$?
Potato
How to find number of solutions of linear system of equations over finite field?
by solving 🙈
Give a system of equations, say 3 with 3 variables, it has a unique set of solution when the determinant of matrix of constant ( i forgot its name ) is ≠ 0?
So $a≠\frac{3}{2}$ and b belongs to R ?
Potato
Can anyone recommend me good book for vector space. I am not able to get it
A few books you could look into:
- Linear Algebra by Friedberg/Insel/Spence
- Linear Algebra Done Wrong by Treil, available for free from the author's webpage
- Linear Algebra Done Right by Axler; this one is a bit infamous for unconventional presentation of determinants
Ok
have you already put this system in RREF?
matrix representation of L_A should be equal to A right ?
<@&286206848099549185>
but when i substituted some value coordinate matrix of L_A(v)=Av wrt gamma is equal to [L_A]_gamma * (coordinate matrix of v wrt gamma) . the thing i am not getting is matrix representation of L_A is not A but still i get the same ans
Yes i got two equations but idk how to solve them
show your equations
$x_1+x_2+x_5=1 & x_3+x_4+x_5=4$
BLツ
\\ for a newline
okay so x_1 and x_3 are pivot variables and everything else is free
$x_1+x_2+x_5=1 \ x_3+x_4+x_5=4$
BLツ
Yes
how many values can each of the free variables take?
5 and 5 for x_1 and x_3 respectively
i said free not pivot
you mentioned precisely the variables you shouldn't have mentioned
and you could have even just said "5" and nothing else
By solving this both two i got $x_1+x_2+4x_3+4x_4+3=0$ so if I choose any three of them then i got remaining one right?
BLツ
Number of free variable are 3 then?
yes, there are 3 free variables.
or
as you would like to put it,
Yes, Number of free variable are 3
So we need to check possibility of values for free variable
you're overthinking it again
any assignment of values to the free variables, of which there are 5^3 = 125, will lead to exactly one solution of the system
Ohh for every choice we get unique solution
I mean assignment of values to the free variable
Then number of solutions will be 125 right
yes
if two vectors are linearly dependent can I say u = xv? for u, v dependent?
x is some scalar
or is that wrong 🤔
it's correct
great - so I solved this question really differently than the professor using this fact
if the norm of u, v, w = 1 and <u, v> = <u, w> = <v, w> = 0, can I say that u, v, w are linearly independent?
because, if I assume they are not(for example u and v are not) then u = xv and then <xv, v> = 0 => x<v,v>=0 but since the norm of v is 1, then <v, v> = 1 as well, meaning x=0 => u=v=0 which is a contradiction to the norm being 1?
Since a * u+ b * v = 0 for (a,b) != (0,0)
Without loss of geeraity assume a!=0, the u= -b/a * v
Let me see
it's proof by contradiction(i obviously simplified)
But that's 3 vectors
yeah I just wrote for 2 here
Then the assumption must be u = a * v + b * w
well if u is not dependent on v and v is not dependent on w that means u is not depenednt on w
that's not how it works
thereexists u
Such that u != a *v and u != b * w but u = a * v + b * w
The definition for linear independence
Is that sum_ia_i* v_i =0 <-> a_1 = ... = a_n = 0
well yeah, but if we show that u is independent of v, and v is independent of w, then u is independent of both v and w no?
But it's the same idea
No
u and v ay be linearly idepedent on their own
v and w may be,and u ad w may be
But that doesn't mean that u,v,w are
So the subsets being independent does't mean that te whole set is
The logic should go: let u,v,w be lin dependent
Then there exists (a,b) != (0,0)
such that
u = a * v + b * w
Now 0 = <u,v> = a * <v,v> + b * <v,w> = a * |v|^2
And the inner product with w too
You get that either v is 0, or a is 0
hmm
And from the second that either w is 0 or b is zero
Let's do that for n vectors
Suppose they are linearly dependent
Then there is an a_i !=0 such that
I didn't know proving that all subsets of (u, v, w) are independent except the set itself isn't enought o show the entire set is independent
$\sum_{i=1}^n a_i v_i = 0$
criver
Now assume that $\langle v_i, v_j\rangle = 0, i\ne j$
criver
Then
$\langle v_i, \sum_{j=1}^n a_j v_j \rangle = a_i \langle v_i, v_i \rangle = 0$
criver
So either a_i is 0 or v_i is zero
If v_i is zero, then sure,it islinearlydepenfent withevery vector
thanks
If v_i is not zero however thena_i = 0, and we assume a_i !=0 -> contradiction
So te only way for $v_1, ..., v_n$ to be liearly dependent with $\langle v_i, v_j\rangle = 0, i\ne j$ is for one or more of the vectors to be zero
criver
If you remove the zeroes, then you get a linearly independent set
The geometric intuition is that you can have 3 vectors lying in the same plane, with none parallel, but they are linearly dependent since they are in the same plane and every one of them can thus be written as a linear combination of the other two.
If two are parallel then clearly they are linearly dependent. But if none are parallel and they are all in the same plane , then any two of those form a basis for the plane, so the third one can be expressed as their linear combination.
Got it.
Another question where I did something differently than the professor
actually it's wrong
now I noticed
lol it's so long
if I show DimV=n and v1, v2... vn are linearly independent, can I say that they span the space V by definition?
or is it not enough?
acutally just thought of a counter example to that... damnit
what is the counterexample?
say V is the space of anti-symmetrical matrices 3x3, dimV=3
but the 3x3 matrices
1, 0, 0
0....
0, 1, 0
0...
0, 0, 1
0...
don't span V
even though they're linearly independent
could be more of an English language question, but I simply don't understand this 😂 I don't even know how to ask.
F^n is a set of n lists ..... of elements of F?
i think it's like F^2 = {(0), (0, 1), (0, 1, 2)}
oh
maybe not
it's actually
{(0, 1, 2), (1, 0, 2)...}
all the permutations of {0, 1, 2}
It's very beggining of linalg book. I just came back to it, because I want to properly understand linalg. In uni I just knew how to solve majority of problems and it carried me. But it was more like mechanical skill rather than understanding theory/proofs
those matrices aren't skew symmetric to begin with
exactly, they are linearly independent of the same dimension but don't span the space
i mean, they aren't even in the space...
yeah...
i think you have to adjust your original statement
F^n is a set of n-tuples with coordinates in F
to be about subspaces of a larger space
because any dimV linearly independent vectors in V do in fact span V and are a basis
if v1... vn are from a subspace of V
Set F = R then it's R^n vectors, set F = C, then it's C^n vectors
your statement is something like "if U is a subspace of V of dimension n, then any n linearly independent vectors in V span U", which is wrong, because if you take a vector not in U, then its instantly wrong
I realized the statement is wrong after writing it, I just thought if I take any n independent vectors (v1... vn) then they span any space of dimension n which is clearly wrong
I'm trying to rephrase it somehow for myself, so I can verify I understand the concept, but I can't. I'll give it few more minutes
my point is you have to specify where your vectors are from
from the space? from some ambient space?
What are elements of R^3?
please show the question
I know answer is {(x,y,z): x,y,z E R}.
So.... all possible combinations of x,y,z which are within R? x,y,z combinations being tuples
T:V->W is bijective(I think that's the word), v1... vn are in V
prove:
v1... vn are a basis in V iff T(v1)....T(vn) are a basis in W
So lets assume v1... vn is a basis and look at
a1T(v1)... anT(vn)=0
if we prove a1...an = 0 then T(vi) are linearly independent
by transformation properties:
T(a1v1... anvn) = 0
and since it is bijective then a1v1... anvn = 0 and thus the only solution is a1=a2=...=an=0
The only difference with F is that the field doesn't have to be R
that's the first direction
e.g. it can be C or Q
@subtle walrus
well, actually that just shows they're linearly independent
I'm not sure why they span the space
I guess because T(v1) are in W
because any n linearly independent vectors inside a space span it
Write the definition of linear independence
otherwise, you could find one vector not in the span and extend the set to make a set of n+1 linearly independent vectors
Ok thanks. I was very unclear with wording and firstly didnt understand if it means there are 3 tuples, or tuples consist of 3 elements. Although I could reverse engineer it from R^3.
A basis is a set of linearly independent vectors that span the space, if you have a bijection then you need the same number of vectors, and it remains to be shown only that they are lin indep.
A n-tuple is just an ordered list of n elements: (a_1,...,a_n)
the other direction though:
assume T(v1)...T(vn) are independent
and look at
a1v1...anvn=0
T(a1v1... anvn)= T(0) = 0
a1T(v1)...anT(vn)=0 => a1=a2...=an=0
so v1... vn are linearly independent, and because T(v1)...T(vn) span W then DimW=n and DimW=DimV=> DimV=n and thus a1v1... anvn also span the space, right? @subtle walrus
more or less
you need to know the dimension is n before you can conclude that T(v_1), ..., T(v_n) span your space
alternatively if you have one direction, the other is exactly the same but you replace T with T^{-1}
sorry, I assumed T(v1)...T(vn) span the space
or more specifically, that they are a basis
so they are a minimal span, so the dimension is n
i think you have all the right ideas, but you should maybe think about this again tomorrow or so
tomorrows the test 🙂
thx
Bro njose
blackmamba[they/them]
do I write the matrix as
1 -1
7 -3
or
1 7
-1 -3
whats the image of the standard basis?
it matter if you don't want Tv to mean v^TA^T
I'd suggest sticking to Av, it has the added benefit that it hints that v is a contravariant vector
T(1, 0) and so on
so T(1, 0) = [1, 7]
so [1, 7] would be the column
oh that's the first one
yes
ur welcome
the main reason for doing that is
this way Tv=Av
ie if ur matrix A is made this way then applying T is the same as left multiplying by A
Right.
Since diagonalizing can be seen as the characteristic polynomial having a root, can you always diagonalize a matrix in a field extension?
@night wren diagonalizable doesnt mean that. it means an eigenbasis exists
Concretely, consider $\begin{bmatrix}0&1\0&0\end{bmatrix}$. Is characteristic polynomial is just $\lambda^2$ which definitely has a root, but it's not diagonalizable.
Troposphere
But if a matrix has n distinct eigenvalues, then it is diagonalizable, right? If I have a characteristic polynomial of degree n with n-1 roots and is not diagonalizable, can I consider the matrix in a field extension where the polynomial has n roots (suppose they are distinct)?
Yes, if there's any field extension where the characteristic polynomial has as many distinct roots as the side length of the matrix, then it will be diagonalizable over the extension field.
(It can never happen that there is exactly one root missing, though).
whats the best way to formally prove this? https://cdn.discordapp.com/attachments/935224727746805792/941797584656298044/Screen_Shot_2022-02-11_at_2.46.08_PM.png
for part a
If we have V as a euclidean vector space, and T an orthogonal transformation on V, how do I show that T has at least one real eigenvalue without using a characteristic poly/complex conjugate argument when dim V = 3
Guys is linear map always injective ?
nope
Is "linear map" and "linear transformation" the same statement ?
Yes.
I misread about kernel 
if anyone has any advice on part a that would be amazing ❤️
T is injective iff ker(T)={0}
and its surjective too if T:V->V 🙂
no
Being an operator doesn't make it surjective
T is surjective iff Im(T)=V
for T: V->V
I think they were trying to say an injective operator on a finite dim space is surjective
one-one operator on finite dimensional vector space => surjective as welll
Jinx
🙈
bump
dim{0}=0 or 1 ?
0
I see thanks !
the empty set is a basis for {0} 🤓
I thought 0 is the basis for {0} so its dim is 1
nah {0} is linearly dependent
Ohhh
char poly will have degree 3, now what can you say about the root
"without using char poly"
Potato
Do you still consider the same mapping ?
yeah ok I didn't see that, the argument will be a little involved tho @sleek sundial
one thing you can try is show det(A-xI) is a continuous function on x from R -> R and then show that large enough -ve value of x will give you -ve det and large enough +ve x will give you +ve det so from continuity it must be zero somewhere
(still uses the char poly but implicitly
)
I can show that $v_0+u$ is indeed the solution, but how do I prove that there is no other solution that is not in the form $v_0+u$ ?
Potato
if v0+v is a solution with v not in the kernel, then L(v0+v) = w+(some non zero vector) != w therefore can't be a solution
Ahhh I see
here, i was asked to prove that identity written in the last line of the second image. red markings are from my professor. what makes this invalid? (apologies for low quality btw, can get a better version if need be)
should i just have been more clear with the whole cofactor expansion thing in the beginning?
I can’t use that argument for this unfortunately
Thx I’ll try that
@night wren
But if a matrix has n distinct eigenvalues, then it is diagonalizable, right?
yes, try proving this
can I consider the matrix in a field extension where the polynomial has n roots (suppose they are distinct)?
yes, an example is the pi/2 rotation matrix which isnt diagonalizable over R but IS over C (eigenvalues i,-i)
how do I solve 40?
39 was straightforward because the first 2 columns are multiples, no clue about 40 though
perhaps you could apply the definition of linear (in)dependence
i'm trying to reduce it to RREF and see if that helps
but i'm having trouble with fractions mixed with variables
if you concatenate the vectors of a linearly dependent set into a matrix, its determinant will be 0. think u can do it from there?
sure
i have a feeling you're taking a suboptimal route if you are going with row reduction
yuck!
i mean first off why decimals
Not lookin good
ok so like
do you explicitly want to do matrixbashing on this
or are you ok with a method that is not matrixbashing
your equations should look like this:
-2x + y - z = 0
z = 0
x - 3y + rz = 0
that's the equation x * (first vector) + y * (second vector) + z * (third vector) = 0
do you understand how this came about?
yeah
do you understand how to proceed with solving this system of equations?
||you should get that it only has the trivial solution no matter the value of r||
oh wait I can plug in z
right?
Wait nvm that gets rid of the r
So does that mean that there is no value for which it is linearly dependent?
Ok thank you @dusky epoch
any hint .
That x•v/v•v thing looks like the component of x on v
Potato
By the given $P(v)+Q(v)=I$
Potato

Ohh it is an identity mapping 💀
I take it as identity element
Is the third condition optional ?
I didn't see they use it
6/2 (1+2) =
why is the product of elementary matrices a regular matrix?
define 'regular matrix'
every elementary matrix is invertible
the product of invertible matrices is invertible
thanks Ann :)
T is housholder reflection
you can assume v=e1 and the see how a T looks like
V is arbitrary ,i tried with v=e1 , i got the answer, but i want to know the proper solution by considering v as arbitray
Yes but denominator is misssing
proper way would be to see that T² = I
meaning only possible eigen values are +1 or -1
now -2(v'x)/(v'v)x means we are subtracting twice the component along v
means GM of eigen value is -1 and so rest of the egen values are 1
Along x?
therefore trace = n-2, det = -1 and orthogonal as well (check symmetric)
no we are flipping along v
in other words reflection about the hyperplane perpendicular to v
I have solution ,but i am not getting one line
Let me send that
Is the last option the answer ?
there is another way tho, we know that eigen values of $M= \frac{vv^T}{v^Tv}$ are one 1 and rest 0. so now T=(1-2x)•M so eigen values will be one -1 and rest 1
yes
which line?
Span v
all vectors in span(v) has eigen value -1 not 1
they even showed it, probably a typo
How?
T(v)=v-2v=-v
I get this part

yeah than eigen value is -1 not 1
Just find the ....
which one exactly 😐
oh "ust find the eigenspace"
they showed eigen space of -1 is span(v) and eigen space of 1 is span(v) \perp
Yup
This is what i am not getting
the 2nd part?
ok are you able to see why span(v) perp has eigen value 1?
like just plug in x•v=0
then every vector perpendicular to v is in the eigen space E_1, so dim E1= dim v\perp which is n-1 (here n=3 hence 2)
Why they considering thete are two eigenspace on is sapn V and other sapnv perpendicular?
maybe an alternative approach helps
we can rearrange T(x) to write it in matrix-vector form in the canonical basis, and maybe this helps you out
Guys, is showing bijection enough to show that a mapping is isomorphism ?
notice that $T(x) = x - 2 \frac{x \cdot v}{v \cdot v}$ can be rewritten as $Ix - 2v \left( \frac{x \cdot v}{v \cdot v} \right)$
Edd
so far we've only added in an identity matrix and moved around a scalar, so nothing funny there
then, we notice that $x \cdot v = v^T x$
Edd
lol no
so that $T(x) = Ix - 2 \frac{v v^T x}{v^T v}$
Edd
and now we factor out the x from the right
definitely this is the approach I would recommend
$T(x) = (I - 2\frac{vv^T}{v^Tv})x$
Edd
u need to show linearity as well
in the context of VS
now you notice that $I - 2 \frac{vv^T}{v^Tv}$ is a symmetric matrix
Edd
and this means it is diagonalizable in an orthogonal basis
now pair this together with what ryu already told you regarding v being an eigenvector with eigenvalue -1
I see, i've been doing just bijection for past several exercise 

Do I need to show linearity of $(GF)^{-1}$ as well or just $GF$ ?
no
Potato
bijectivity and linearity is enough
no
Just wondering, can one determine how many of the mapping are there ?
Wait can we use the fact that eigenvalues of I- A matrix is 1- eigenvalues of A?
I know this fact
but do you know why?
(it's easy to show if you read what i wrote above)
you noticed that vv^T is a symmetric matrix diagonalizable in some orthogonal basis, so let's say c vv^T = QDQ^T
since Q is orthogonal, QQ^T = I
so I + c vv^T = QQ^T + c QDQ^T = Q(I + cD)Q^T
this also immediately establishes you'll be working with orthonormal eigenvectors, one of which is v
so what ryu says follows
I have to solve this on my notebook first. Thank you for the help both of you
I need someone to check whether my logic is correct because there's something bothering me when solving a problem.
I have the minimisation problem
$\min_x -v^Tx, , 1^Tx = C, , x\in[0,1]^n$
criver
I take the negated gradient: v
And project it onto the plane with normal 1/sqrt(N)
$\vec{d} = \vec{v} - \frac{1}{n}(\vec{v} \cdot \vec{1})\vec{1}$
criver
Thus if I have an initial guess, for example x^0_i = C / n, the gradient descent step will look like
$\vec{x}^1 = \vec{x}^0 + \gamma \vec{d}$
criver
And to take into account the [0,1] constraint I do
$\vec{x}^1 =max(\vec{0}, min(\vec{1}, \vec{x}^0 + \gamma \vec{d}))$
criver
What bothers me is whether x^1 satisfies 1^T x^1 = C
if you did it right, yes
since you're adding a contribution orthogonal to 1^T
On the other hand my argument for the clamping is that the clamping is equivalent to projecting out the e_i component
of d
this is a valid formulation of projected gradient descent, yes
What bothered me is that if I bring gamma large enough
Then I would get a solution of 1s and 0s except where d has a 0 component
1s and 0s don't necessarily sum up to an arbitrary constant C in [0,1] however.
So is it always the components where d is 0 that make up for the fractional part?
Or is there some flaw in this clamping that in fact violates the 1^Tx = C constraint
hmmm
Notably I would get 0s where d_i < 0, and 1s where d_i > 0
And x^0_i = C/n where d_i = 0
That seems to be the optimum according to the above
but that's part of the problem, right? by doing gradients you are now considering a proximal form
and it may very well be that the distance you can move is 0
this parameter gamma cannot be arbitrary
it can only be as large as the feasible region allows
I think it can, since v is constant
yes but you have constraints
equality and inequality
so no
otherwise the solution would just be infinity in the entries
inf* v
i.e. the gradient doesn't change, except when it meets a [0,1] constraint, but at that point this is equivalent to removing the i-th component of the gradient
So that's where the clamping comes from
it doesn't remove the element though, if you simply do that you change the direction
and it's no longer the direction of the gradient
here's how I think about it
Let's say I do the largest possible step gamma
such that one [0,1] constraint comes into play
without loss of generality let that be x_i = 0
now I project away this from the gradient
but that's equivalent to setting d_i = 0
Since $d' = d - (d \cdot e_i)e_i$
criver
ah, indeed
but yes, this also means this is no longer a direction of steepest descent
or possibly no longer feasible in the other constraint
with respect to the constraints it is the steepest descent direction
what makes more sense here is to instead use this result to cap gamma
with respect to the inequality constraint, yes
but it is no longer a feasible point in the first place
since as you noted, you are ignoring the equality constraint
so while all of the above seems plausible to me, I was wondering whether I am overlooking something regarding this 0,1 solution with occasional C/n terms that supposedly satisfies 1^T x^* = C
yes, that you're ignoring the 1^T x = C entirely
you can add it to the original cost function with a lagrange multiplier
or you can use the inequality constraints to cap gamma
instead of truncating
because ATM you're projecting onto the feasible set of the ineq constraint completely ignoring the equality one
lagrange multiplier doesn't help here, I tried it, linear independence of constraints doesn't hold for kkt
then simply cap gamma
So min(max is indeed not equivalent to doing the steps by capping gamma
the new x^i at each iter can be treated as a ray that intersects a hyperplane given by one or more of the inequality constraints
you can find the intersection of this ray and hyperplane analytically
and parametrize it in terms of gamma
I am still trying to reason about it
Since dot(d, 1) = 0 even if I set some components of d to zero
So maybe they are equivalent after all
My idea is that I should not be able to escape the plane 1^Tx = C, if x^0= C/n is on the plane and 1^Td = 0
But 1^Td = 0, even if I zero out some components
that much should be true, as long as d remains orthogonal to 1^T, yeah
so, as long as you are taking gradients only on the hyperplane with normal 1^T
So afterall the min(max should provide the minimum?
I can't shake off the feeling that I am overlooking something
hmmm
I'll try to get a proof by induction and come back later I guess
i'm not sure what you meant about not satisfying lin indep for kkt btw
My worry arose from the optimum being made up of 1s and 0s and potentially some C/n
Then their sum is numel(sgn(d)>0) + numel(d==0) * C / n and that must be equal to C
So either the $max(0,min(1, x^0 + \gamma d))$ is an incorrect expression or the above indeed sum up to C.
criver
what about the kkt stuff
The constraints are linearly dependent, it doesn't give anything useful.
what i mean is that you can take the gradient of -v^Tx + c 1^Tx
and adjust c as needed
I tried this, but you get v = -c
And it doesn't account for the other constraints
I also tried it with a quadratic term, it makes things harder as then I need to slide along geodesics on a hypersphere
I'll focus on trying to prove/disprove more formally the min max thing for now
that's not quite right, c is a scalar and v is a vector. you'd need to do an orthogonal projection of v onto the vector of 1s
and yes, another term would be needed for the inequality constraints
c us a scalar but c * 1 is a vector
When you add all the terms they are linearly dependent
Same thing happens in the quadratic ||x||^2=C^2 case
It's essentially a linear programming problem in the linear case, but my issue is not with that it's with proving/disproving the min max optimum. I'll work on it and try to post something more coherent when I've worked out the details.
at any rate, yeah, doing the projection as you did doesn't appear to work in general
take an example in R^3 where you start with x0 = 1/3[1,1,1]^T, with C = 1
you can built an orthogonal basis for the hyperplane with normal [1,1,1]^T by using the basis vectors [-1,1,0]^T and [1/2, 1/2, -1]^T
I need help to find this equation. m = 2/3, passing through (-4, -6). I checked the answer and it is in general form (Ax + By + C = 0). I am unsure as to how I'm supposed to get from m = 2/3, passing through (-4, -6) to Ax + By + C = 0 . <@&286206848099549185>
and let's say you add the second vector of this basis to x0. after clampling, the vector no longer satisfies the equality constraint
please check #prealg-and-algebra
ty
on the other hand, one can make a sort of trust region by instead checking that for a given step size, you are still within the feasible region
i think that makes more sense here
enforce the ineq through the step size instead of through projections
(i'm not sure this converges to the true solution btw, but your current method has very simple counterexamples showing it doesn't work)
solution according to min max is 1,1,0 which sure enough doesn't satisfy the constraint
yep
my geometric interpretation of this is "how parallel can we get to v"
with some constraints on needing a specific component in a given direction and not being able to add any other component that is too large
so there must also be some way to reformulate it as a convex combination of v^T and 1^T
$\gamma^{i+1} = \inf {\gamma : x^i_j + \gamma d^i_j = 1, d^i_j > 0} \cup {\gamma : x^i_j + \gamma d^i_j = 0, d^i_j < 0}$
criver
because if I take steps iteratively then I assume I can show by induction that it will stay in the plane, it's just that this process is not equivalent to the min max thing I had
Yes we want to maximize the dot product with v after all, but the constraints make this harder
well, i already presented one approach to you
do you have any example for which you know the true solution btw?
Not really, no.
Thanks a lit for the counterexample though
At least I won't be stuck trying to prove something that isn't true
i would really try a little longer with kkt and replace the ineq by a barrier function
or try the gamma stuff
That feels off, since this is a linear programming problem, so there are more efficient ways to solve it than through penalizers
I think that this iterative gamma thing is a special case of the simplex method
well sure, you can always use the tried and true solution approach
should be able to transform the equality constraint into an inequality one or backwards with some extra variables
but this is pretty much the same as doing KKT
I guess I am just deriving a simple version for the special case that I have yes. Thanks a lot for the counterexample again. I'll just stick to picking the largest gamma that hits a constraint, descent, update gradient, then rinse and repeat.
is anyone here able to help me rq, itll probably be easy for you
if not i understand
i'll give you a hand cuz the other channel seems dead
ty
notice that in slope intercept form we have y = mx + d
yes
and in standard form we have ax + by + c = 0
yep
so let's do some arithmetic on them and see
first, we can write -mx + y - d = 0
this is already very close
what sometimes is done is that one notices that m = rise/run, let's call them p/q
so that -p/q*x + y - d = 0
and we can multiply both sides of the equation by q
so that now we have -px + qy - qd = 0
which is of the form ax + by + c = 0
holy shit your smart
this only works when q is not zero, so when we don't have a vertical line
okay
if you didn't understand any of the details, you can also try opening a help channel, maybe you'll have better luck there
check out #❓how-to-get-help
i think i can work with this, thank you this helps a ton
I realized where my mistakes was working through your example btw. Zeroing out a component of the gradient can make it so that 1^Td != 0 as you said. So after each zeroing out I have to project the gradient back onto the plane with normal 1. Now everything makes sense. Thank you.
It was my gradients leaving the feasibility plane that was the issue, so I had to project back.
i thought about that as well, but it's possible that projecting back can make the inequality be violated again... i think?
you'd have to check this
it may or may not be necessary to iterate projection -> clamping -> projection ->...
Yeah it needs to be projected onto the lower dim hyperplane orthogonal to both ei and 1
otherwise it gets stuck
but you'll notice that this depends on the length of the original gradient vector, too
so in some sense this is similar to directly picking gamma
(in a very handwavy sense
)
Basically at step k, my gradient needs to be in the hyperplane orthogonal to $1, e_{i_1}, \ldots, e_{i_k}$
criver
i'm not sure i see why
where the index i_j is the index of the constraint that comes into effect at step j
So it projects into a smaller and smaller space until I get a single point, or it will stop before if d has some zeros at the very beginning
The point is that d^k eventually becomes 0, so no step is possible
sounds ok
Should lead to a global minimum for convex sets I believe
and a hyperplane cutout by a hyperube should be convex afaik
intersection of convex sets should be convex, yea
The only nasty thing is that the set $1, e_{i_1}, \ldots, e_{i_k}$ is not orthonormal, although very close to it
criver
right, but it shouldn't be all that complicated to handle in code
start with 1 in your set, and add in new vectors with gram schmidt as needed
I can always just fix the 1 I guess
Since the rest are orthonormal
And the orthogonal complement remains unchanged
actually, i don't think this is needed. since all you need is to take away components parallel to these vectors.
yeah
I wonder whether this orthogonalization will mess up with the length of the projected vector though
i.e. $v_{\perp} = v - v_{\parallel}$
criver
Won't the length of v parallel be affected
yeah, I don't think it should matter provided the projection to parallel has the proper denominators, or the basis is orthonormal
Ok that's great. Thanks a lot again. It was very helpful.
cool
could someone explain this?
what definition of rank does your book use
number of nonzero rows in RREF / number of basic variables
and do you know how this related to the number of linearly independent columns in a matrix?
(e.g. through the so-called "pivots")
if it's linearly independent, the nullity is 0
i don't think that's the correct thing to consider here
ok
unless you can say how the nullity relates to the number of (non)zero rows in RREF
why do I have a bad feeling about this 
lol
Ik that nullity is n-rank but I don't think that helps here bc there are no values
Edd correct me here but $\m{1 & 0 & 1 \ 0 & 1 & 1}$ has LD columns and rank = 2 but it's equal to m=2?
is this a true/false question? cuz otherwise...
ffs
then ryu already gave you the answer

that moment when your linalg book be trippin
the book says this
everything there is true
so it implies that it would also be linearly dependent if the rank of A is GREATER than m
Therefore the question is false
what do you mean by this?
oh wait
wrong row
one sec
ok nvm I'm confused again
ignore what I said
alright I get what Ryu is saying
Thanks @lavish jewel @zinc timber
@lavish jewel Turns out my minimisation problem had a trivial solution actually not requiring gradient descent.
If I sort the v_i in ascending order, and let C = m + delta, where delta in [0,1). Then I need to set x_i = 1 for the i corresponding to the m largest elements of v. And x_j = \delta for the m+1 largest. And then everything else should be 0.
This satisfies the constraint, maximizes the energy. And if there are other optimal solutions it is in the case where some v_i are equal.
Reminds me of m-term best function approximation wrt L2 in an orthonormal basis but with quantisation constraints [0,1] and total sum constraints.
i.e. I think $\min_{x\in[0,1]^n, , 1^Tx = C} ,|v - x|^2$ has the same minimum
criver
except for when some v_i are equal
aha, i see
The m-largest v in the L2 energy need to be >1 though, and the rest <0, and something else for the m+1st. So I guess it's not exactly the same.
The question says to "Find the general solution of the system in parametric vector form" so is final answer correct??
hello
,w rref[[1,1,1,1,1,1],[1,2,3,4,5,6],[1,0,-1,-2,-3,-4]]
Yeah I got that but I'm wondering the part of parametric vector form
the vector for x3 should be (1,-2,1,0,0)
Ok i fixed it
its good beside that
So vector x with the columns is my final answer correct?
i hace a question
The moment of a force F is defined as M = F * r. A force given by the expression 4i + j N is applied at point (2,1)m. Find the moment with respect to the origin
since vector r is (2,1) and the force is (4,1), we cross product the 2d vector, which formula is (a,b)*(c,d) = (ad-bc)k. vector r * vector f = (2,1) * (4,1) = (2-4)k = -2k.
thus, the moment of force with respect to the origin is -2k?
Ok thank you
please help me with a vector question
Given the vectors u1 = 2i-j+k; u2 = i+3j-2k; u3 = -2i+j-3k; and u4 = 3i+2j+5k, find the value of the scalars a, b, and c in which u4 = au1+bu2+cu3
plug them in
just like the following
2ai-aj+ak+bi+3bj-2kb-2ci+cj-3ck
then you need to get a common feild out like the following
i(2a+b-2c)+j(-a+3b+c)+k(a-2b-3c)
and then you just create three equations
2a+b-2c = 3
-a+3b+c = 2
a-2b-3c = 5
and solve them
Is there an easy way to invert a matrix ??
There are a couple of algorithms here:
https://en.wikipedia.org/wiki/Invertible_matrix
In linear algebra, an n-by-n square matrix A is called invertible (also nonsingular or nondegenerate), if there exists an n-by-n square matrix B such that
A
B
=
B
A
=
I
n
...
use python
Wolfram alpha?
no, why use something free when you can go out of your way and use python?
sorry ryu you know i had to do it
i should have said simple 
Hey quick question regarding definitions. If say A is a linearly dependent set of vectors but A is also a subset of B, does that mean that B is also linearly dependent?
Yes.
Because a linear relation between vectors from A is then necessarily also a linear relation between vectors from B.
Okay great, thank you for the explanation
If i have a strong positive autocorelation how do i make adujstments to my lag? and where to I modify parameters?
Is an eigenspace a subspace of the range space of a square matrix?
wow the first questions of LADW are already killin me
@grim cliff yes, u can show this
Let V be a linear subspace of $C^n$ and t an automorphism of $C^n$, and $E_{\lambda,t}$ be the eigenspace of t associated to the eigenvalue $\lambda$.
I read somewhere that
$V\cap t(V)\cap E_{\lambda,t}=0$ iff $V\cap E_{\lambda,t}$=0
Does anybody have an idea why this is?
fajitas
the if direction is easy, since intersection is symmetric and associative
in the only if direction, think first of what V cap t(V) is
See Im confused cause couldn't it be V\cap t(V)=0?
what is an automorphism?
Well it's an invertible endomorphism
what does this tell you about its image and kernel
The kernel is trivial
But I'm not sure about the image
rank nullity theorem
I know the image has the same dimension
invertible maps are surjective
Is the conclusion I'm supposed to get that t(V)=V?
yes
yes
still, you raise a valid point in that the problem should be tackled separately when t(V) cap V = 0, as this means V is the 0 vector space
but you quickly reach that the statement holds
Ahhh okay I think I see what you mean
Thanks so much guys,
If I can probe one last thing, what about t(V)\cap E?
Is this nontrivial as E\subset Image(T)?
sure, but i think you don't need need to worry about that
since you can rearrange the expression to check first V cap E
idt this needs to be done separately?
it doesn't need to, no, but if they were concerned
it is easy to show it still works
Okay thank you so much
it p much goes
$$\brc0=V\cap tV\cap E=V\cap V\cap E=V\cap E$$
RokabeJintaro
w/o having to think whether any caps are 0
The only thought that messes me up is imagining V as a line through the origin. Then t(V)=V iff V is an eigenspace, right?
But not every line going through the origin is an eigenspace
all subspaces contain the origin
if they are 1D, you can imagine them as lines passing through the origin
also no, because an eigenspace can be associated to the 0 eigenvalue
then t(V) = 0
or are you talking only about automorphisms
Where have I gone wrong?
Ignore the stuff under the question btw that’s old
Oh my the angles are around the wrong way
Smh
careful with the magnitudes @wintry steppe
notice they tell you the direction and magnitude of the vectors separately
so double check whether they need to be scaled before you add the forces
ah nvm i just noticed you had already corrected that lol
quick one: does $\lambda \in \lambda(A) \implies \lambda \in \lambda(A^T)$?
Sydd
What other differences are there between positive symmetric matrices and stochastic matrices other than the entries of the stochastic matrices are probabilities between 0 and 1?
Yes:
\begin{multline*}
p_{A^\top}(x) = \abs{xI - A^\top} = \abs{(xI)^\top - A^\top} = \abs{(xI - A)^\top} = \abs{xI - A} \ = p_A(x)
\end{multline*}
So the characteristic polynomial of $A$ is equal to that of its transpose.
You can actually prove that $A\sim A^\top$ but that’s more complicated.
How do we know $\det(M^T)=\det(M)$ without using the result I want to prove?
Sydd
It’s a pretty well known identity
you can use the definition of det, $$det(A) = \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1\sigma(1)}A_{2\sigma(2)}\cdots A_{n\sigma(n)}$$
\def\sign{\operatorname{sign}}
Like:
\begin{multline*}
\det(M^\top) = \sum_{\sigma\in S_n} \sign(\sigma)\cdot\prod_{i=1}^n[M^\top]{i, \sigma(i)} = \sum{\sigma\in S_n} \sign(\sigma)\cdot\prod_{i=1}^n[M]{\sigma(i), i} = \
= \sum{\sigma\in S_n} \sign(\sigma)\cdot\prod_{i=1}^n[M]{i, \sigma^{-1}(i)} = \sum{\sigma\in S_n} \sign(\sigma^{-1})\cdot\prod_{i=1}^n[M]{i, \sigma(i)} \\sum{\sigma\in S_n} \sign(\sigma)\cdot\prod_{i=1}^n[M]_{i, \sigma(i)} \
= \det(M)
\end{multline*}
Makes sense, thanks
Ignoring all of the mistakes I made, that’s the idea
That’s a funky font
I am stuck in something slightly related: if $A$ is square and symetric is clear that $|A|2=\sqrt{\lambda{\max}(A^T A)}=\sqrt{\lambda_{\max}( A^2)}= \lambda_{\max}(A)$
I think this should be true for unsymetric A as well, but I couldn't rigorously prove this
Or it isn't?
This is the norm of A? So like sqrt(tr(A A^T))?
what does the trace has to do here?
Different norm, ignore that
Is $\lambda_{\max}$ the maximum eigenvalue?
yes
Sydd
Well, I’m a bit confused why this is right, because if the spectrum of A is say {-1, -2} and the spectrum of A^2 is {1, 4} the sqrt of the maximum eigenvalue of A^2 is 2, but the maximum eigenvalue of A is -1
You're right, I think that's bad... But the should the 2-norm of A be the biggest eigenvalue in module?
It should be the maximum absolute value of A’s eigenvalues I think
Anyways, regarding your original q, why should this be true for non symmetric matrices?
It is true that $|A|_2= \sigma_1$ in general, where this is the first (biggest) singular value. The thing is I am a bit lost on how the singular values and eigenvalues of $A$ are related (when $A$ is square)
Sydd
You should probably wait for someone who knows more than me to answer, but I’m not sure if there always is a relation like this
Like you have one if A is symmetric (which you showed)
this here is bad because of what you mentioned about negative eigenvalues etc
but something like that
like this maybe
Yeah
trying to figure out if this is the same as the 2-norm in general

