#numerical-analysis
1 messages · Page 23 of 1
you look at the rgb values and see by which amount each of the 3 layers was modified it
and you do the inverse operation
(assuming it was the same transformation applied to every pixel)
i.e. you open it in paint or photoshop or w/e and edit it back
im pretty sure it was
on photoshop, levels say wing is the whitest thing
not "whitest thing"
100% sure KNOWN to be white BEFORE adding in the extra color
because some pixels could turn white with this color transformation, and then you're stuck
okey, again xD lets assume wing is 100% white before adding purple
lets see what i get
anyway, the easiest approach is to augment the dataset with colored images so that the network learns to ignore color
why are you so stubborn
i give up on this one
dude, just assume this is true for the wing. Whats next?
whats next? i already have my white pixel
like, blending mode substract?
i think substract isnt
i mean, makes sense
since i chose wing as white, so the color of the wing - the color of the wing = 0 = black
but i guess this is not what u want
OH OH
OH
HOLD
is not 100%
but is close
divide
idk if thise 2 are exactly the same
but first one is dividing the color of the wing, the most white one
and the second picture is
color of the wing, invert it, and color dodge
The Color Dodge blend mode divides the bottom layer by the inverted top layer.
which yeah, is the same 🙂
so... if i had a real pixel which is 100% white on the original image, will i get the same exactly image?
any pixel for which you know the original color and the transformation did not produce clipping
white works best
and... there is still no way to know which pixels was white after a transformation, right?
or there is if i know the transformation?
xd
need two out of three from (input, transformation, output)
and even then, it will only work SOME of the time
HA
Okey, i got the input and the output
how can i know the transformation
xDDD
3000iq
it's what you just did
the operations you did in whatever software you used are the inverse transformation
what i did was copy the color of the wing on the purple image, and divide it to the purple image
so i have to multiply the image but the purple pixel on the image?
well, i cant find it exactly. Can i, somehow, with python for example, guess the relation (or anything) between input and output?
doing it in python is tough because there is rescaling and shifting going on, too, which your software was hiding from you
just multiplying can go out of the range of the pixel
But it you have things where nearby pixels get blurred together
Then it quickly becomes infeasible
i can rescale them on python before trying to find it
give it a shot, then
but also what dantalion said. this'll work only for this color stuff
but, idk what may i do between both images to see whats the transformation
not for general aberrations on the images
my guess is that it can be represented as an affine transformation
yeah, but even if it will work only for this color, the transformation is gonna be the same. I mean, if it is a multiply, i it will be always be a multiply, just different values
but the transformation is always gonna be the same
So im working on this problem
is my sketch clear and thought process
im currently stuck on how to solve for shock
The shock trajectory is where characteristics intersect
Hmmm
I'm not sure that shape is correct
What does LeVeque say
Isn't there a formula for the shock speed
is this it
Time for my monthly appearance on this channel... Is anyone familiar with Monte Carlo integration? In particular w/ the crude mc. I'm trying to figure out what's gonna be the variance of my experiment.
I'm using a book to guide my studies and this is what it says
but how exactly is sigma^2 calculated?
hold up
that's the entire page
what if I don't know the value of the integral of f(x)?
Do you not know what f is?
I do, but for the purpose of this exercise I have to assume that I don't know the value of the integral. Ideally, I should have a formula for the variance that does not rely on it
You want to compute the variance of an estimator for a function in a way that doesn't involve the function?
...yes, or something like that
Does that make sense?
well I know it doesn't, but this is so confusing I may be formulating my questions wrong
for reference, a few weeks ago I was doing some MC simulations to estimate a value for pi and found the very same roadblock - how do I find the variance if I have to assume I don't know its true value? Eventually I figured I could see the experiment as Bernoulli trials and the variance was given by p(1 - p)/n where p is the estimated proportion and n the number of trials (and this variance does not depend on the true value of pi)
I'm also very not smart at stats and probability
So I could totally be misunderstanding you
well I figured I could ask in #probability-statistics but then I think the problem will be on the applied math side. I asked some of my colleagues if they came up with something
I'll let you know once I get somewhere if you're interested
Yeah this is an appropriate place to ask
i suspect it should actually be something like p is the true proportion, and we plug in hat{p} since this is the best we can do
yeah, I'm reading into that rn and it seems the variance for the sample proportion is p(1-p)/n
I tried running some simulations using that and the standard error seems to be about 1 digit away from the "true" error
Well I think I should try with a function that is easy to integrate analytically to make sure lol but overall I'm confident this is the way to go
also, you should be able to estimate sigma from the sample
Let’s say I have a sequence $x_1, x_2, \ldots,x_{n-1}$. I want to compute the DFT of every initial sequence, so for each $k<n$ I want to compute the DFT of the sequence $x_1, x_2,\ldots,x_k$.
Unfortunately, $n = 1024$ and I need to do this ~10,000 times, so I need to speed this up significantly if it’s going to be feasible at all.
StellaAthena
So we already have the optimization from updating xi_n-1
Then, at step n
The computation cost is
- Update xi_{n-1}, which is (n-1) multiplications and (n-1) additions
- Compute the last entry of xi_n which is new, which is n multiplications and n additions
For a total of 4n-2 computation at each step
@glossy yoke
It’s neither, it’s for replicating a novel transformer architecture
@wide spear A computation here means an addition of multiplication of real numbers right?
Not of matrices
Yes one flop
You should not actually be performing any matrix multiplication
Wait
I am inting
DFT_{n+1} does not have DFT_n as a submatrix

Sorry I lied
That’s what I had thought lol
We can use the fact that the roots of unity are structured though
Hmmm. Except the coefficients don’t line up. For k = 2, x_1 is multiplied by -1. But for k = 4, x_1 is multiplied by i
Wait
x_j gets multiplied by e^(2πi j/k) for every j < k < n
Does it mess things up if you zero pad the initial sequences?
Yeah the coefficients don't match up
Well ok
I'm assuming that O(n^2log(n)) is too slow for your purposes?
Also, all the roots of unity are algebraically independent
If the coefficients are coprime
Or something
So you can't really do a snappy multiplication to fix them
Just get more compute 

Or maybe, is it possible to zero pad everything to length 1024, compute DFT, then transform the output so that it is as if you didn't zero pad?
Why does padding to length 1024 do anything?
Why wouldn't you directly compute the smaller DFT?
We have to compute e^(2pi i j/k) for many i j k, but if k is always 1024, this sounds good
I don't think computing these is the bottleneck
Zero padding and computing DFTs would be O(n^2) time/flops (assuming exp is a constant number of flops) I think (algorithm is basically to do a prefix sums thing)
Yes (I am assuming here you mean dft as fft), but if we let X be the output of a DFT, then the optimization comes from computing X_i simultaneously for all k (under the notation/variables from Stella's post above), where we use the naive DFT algorithm. As opposed to computing each DFT/FFT individually
#include <math.h>
#include <complex.h>
void initial_seqs_zero_pad_dfts(int n, double *x, double complex *dfts)
{
// dfts[n*k+i] stores the ith DFT coefficient of x1,...,xk,0,...,0
for (int i = 0; i < n; i++)
{
double complex prefix_sum = 0;
for (int k = 0; k < n; k++)
{
prefix_sum += x[k] * cexp(2*M_PI*I*k*i/n);
dfts[n*k+i] = prefix_sum;
}
}
}
(not sure if this runs/is entirely correct, but this is what i was thinking)
why would it be?
Two for loops
so n^2 right, and theres no implicit row/column multiplication
Or
We are doing matrix-vector multiplication n times
Doesn't this code compute the dft of a single vector
i wish i knew dft, all i look was the code 
also for this slide very minor thing but should it be "given u_0,...u_n-k-1" or something?
rather than u_0,...,u_k-1
nvm im being dumb again
k is looping over 0... n so this will compute all dfts (although it is dfts of zero padded sequences)
dang too many ppl blackbox dft
at the most it's basic fourier analysis
you should really learn it
8da I have no clue what your code is doing
Computing a dft naively is a matrix-vector multiply
Which is two nested loops
i'm pretty sure doing it with fft is still faster
there's also no point to the zero padding
indeed
lots of numeric methods and numeric analysis. also optimization, sigproc and compressed sensing have required me to do a lot of so-called "scientific computing", some differential equations, etc

so my job is pretty much pimping/making new imaging methods for high dimensional data sets. read papers, pretend i understand them, modify them or make new stuff, then code it
it is basically just, DFT_n is indeed not a submatrix of DFT_{n+1}, but with zero padding, it is true
this would be n^2, whereas doing n independent ffts would be n^2 log n. but whether zero padding is good/worthwhile depends on if it is possible to obtain the DFT of the unpadded signal from the DFT of the padded signal, or that the DFT of the padded signal is otherwise useful.
remember zero padding does not create any new info
it just interpolates between the points you already have
if you're just doing 0 padding, there is nothing new in the DFT output
also, doing all the fft operations in parallel is also a thing. you can do the whole alg over slices of a 3d array
if you do the matrix multiplications, it's gonna be the complexity of multiplying 2 matrices, no matter how you do it
but this is like saying every matrix multiplication is O(n^3) (or whatever fancy strassen algo you use), disregarding the structure of the matrices
well, the fft does precisely that, doesn't it? exploit the structure of the matrix
but your code is only multiplying
that is O(n^3)
fft exploits the symmetry in the matrix
why is it n^3? there are 2 loops
and each loop carries out a dot product?
you could say that the inner loop computes a dot product, or that it computes n dot products
dot products are sums over n elements
it's another loop
it can be done very efficiently, yes, but it anyway has a cost
if you ignore this, then the fft is even faster
it is computing n dot products, in the sense that it is computing dot products for n DFTs
this would make the fft code order n log n
(when applied to a matrix)
you can just code the two things yourself and compare how long they take to run
naive DFT over dense matrices is never a good idea
it's precisely what you were against here
this
oh, what's more, i've had cases where i need to compute only a select few fourier coefficients, and tried doing so with a rectangular DFT matrix keeping only the desired rows
it was still faster to do a larger fft and then just subsample that
(over multidimensional matrices, ofc)
i checked my code, it works correctly for x = [1,1,1,1] (as in, it computes DFT([1,0,0,0]), DFT([1,1,0,0]), DFT([1,1,1,0]), DFT([1,1,1,1]). checked with wolfram. actually, it is off by a factor of 2, for some reason)
interesting
different people software packages normalize the dft and idft, and fft, and ifft, differently
how optimized was your DFT code?
very, cuz it has to work with matrices of several terabytes on my laptop
but it's better for you to convince yourself, go try it out
🙂 i'll be surprised if fft is faster, for sufficiently few "select fourier coefficients" (also what did you mean by multidimensional matrices?)
sure, if you keep only 1, i'd be hard pressed to think that an fft would be faster than a single dot product
but it was for a compressed sensing application, so the number of coefficients never dropped below some threshold, some percentage of the total
by multidimensional i meant that the fft'd matrix was a 5d array
some tensor to map from 3d to 2d
interesting, from a quick look a "few coefficients DFT" is performing slower than a "simple" fft i found online, i will have to take a closer look at the code...
Is this hell, how do you expand K3
likei know but K2 has like 7-8 terms
wait i think its not too bad
since u can basically ignore quite a few K2 terms
told ya. it would have to be a super degenerate case for a subsampled dft to be faster than an fft
the fft doesn't even take dot products
lots of symmetry to exploit
but i haven't looked at the code, maybe it is being cheaty with eg inlined functions
and even faster if you need more than one coefficient, since the dft matrix is vandermonde and also you can compute some columns from the previous ones by factorizing them
since it's vandermonde equispaced... at least for regularly sampled vectors
as soon as you even have to create the full dft matrix, without even having to reach the multiplication part, your code is already slower
oh, i didn't create a full dft matrix, just the parts i need (well, not that this speaks well to its performance but)
i mean all the elements in each row
oh
like you wanna transform a vector of length n
the fft does not need to compute n distinct components
i would say the rule of thumb is to always use fft, unless you have a really special case in which the transformed array has super nice properties. and then you still wouldn't use a dft, but rather a modified dft specifically for your purpose
i should have studied fft better when i was in college 🙂 , i am having a hard time reasoning as to when fft should be faster, i will need some time to think through this...
normal dft will never be faster than fft
for sure
Yes it is hell, it gets even worse for rk4, and it gets even worse for vector rk methods
i spent 2 hours texing my derivation lol
Sounds about right
Imagine knowing how to write advanced mathematica scripts

look
being lazy has a benefit
which is forcing yourself to use mathematica effectively
anyone good at triangle ineq business
i have a scheme and I want t oshow its total variation diminishing
Oh my
😫
Sorry what is the scheme you are working with?
this form
What is D and C
yeah if i can somehow triangle ineq
then maybe reindex
i juust dont know how to deal with these sigmas and triangle ineq
but the TV derivation is staring at my face with those ineq
if i can get C+D to be less than 1 and have TV on the right side of the ineq
if any scheme can be written in the form above, then its TV diminishing
so prob the entire class, i cant use example
my attempt was to write (v^n+1 _ m+1 ) - (v^n+1 _m)
as you can see
and then take abs val
Which of the terms can you replace with TV?
Can't you rewrite the left hand side as a TV as well
I think that some of these other terms can be rewritten as TV as well
First factor out all the constants
Yes
i messed up the second term
Don't you get some more TV
if i add 1 to m my last term becomes C_m+1/2 |v^n||_TV
eh
this seems wrong
things are canceling out
and i end up with
Norm(v^n+1) TV= Norm(v^n)TV
but i want <= not =
i think thats why i wanted t ouse triangle ineq
How annoying
I think in the end you will want to end up with TV(v^n+1)=(C+D)TV(v^n) or something
Anyways it's all going to be these cancellations and stuff
right
i will use the fact that C+D <= 1
and then TV(v^n+1) <= TV(v^n)
its just applying triangle ineq before i reindex i think
without giving it away do u know how i can triangle ineq here ? @wide spear
ok 😄
But I think you should be able to figure this out
You have mismatched parentheses
The last line
Oh my bad
I see
I don't think that's what you want
Isn't this what you got before
TV(v^n)
well
i did the triangle ineq
or at least I hope i did it right
so the <= is there
if i can show (D_m+1/2 - D_m+1/2 - C_m+1/2 + C_m+1/2 + 1) <= 1
then i shyould be good right
i mean these terms cancel and im left with 1
Don't you have $TV(v^{n+1})\leq(D-D-C+C+1)TV(v^n)$?
Taboo Epiphany Garden: Salem
yes
So this just gives $TV(v^{n+1})\leq TV(v^n)$
Taboo Epiphany Garden: Salem
so my question is
and here is why i think i messed up somewhere
im suppose to use the fact that C + D <= 1 some where
So
In pdes
A big thing is finding conserved quantities
Or other types of bounds
But how do these translate to numerical schemes?
Like if you have a pde and you've proven that some quantity is conserved, how can you design a scheme around this?
For hamiltonian odes you have symplectic integrators
But that's just for the energy
What if something more complicated is conserved?
The most naive way would just be to have some numerical scheme and check the conserved quantity at each step
And if it has deviated too much, then reduce the grid size/time step
But this isn't really satisfactory
The same way that symplectic integrators are really good for hamiltonian systems
If you have a non-computable set A, and if you define a function f(x), where $f(x) = 1$ if $ x \in A$ and $f(x) = 0$ if $x \not\in A$, then is f considered total computable?
rustyimpact
i think u will see more chance of people helping if u also post in #proofs-and-logic since its computability theory
i mean wrong channel
uh
Yes this is better in foundations
in irl is it common for the need to solve ODE rapidly?
What do you mean by rapidly
A system of odes that has 10k entries?
eh maybe yea or just 10k different ones
Most industrial applications are at least this large
ic ic
you often do stuff like separation of variables, too. for stuff that needs a large frequency window, this means solving the same differential equation for each frequency bin, which means even more odes.
I apologize for the newbie question.
I am reading a paper, and I am confused why the network is giving so many outputs.
Wouldn't there be only one output?
It gives four outputs
One at each resolution
I would guess that those four are combined
To give a single output
Sorry, I didn't understand this.
Would you please elaborate?
Well you down sample a bunch of times right
Or
You down sample 3 times
So there are four resolutions that the network is working at
The original and the three new ones
And then for each of these four you output one thing
Ok
Yes
But what X, i, j r and W represent?
X is the input
i is the i-th entry of the output
j is the variable you sum across
r is the dilation
W is the convolutional kernel/filter
And may I know in the batch normalization what does gamma and beta represent?
Not sure
One is probably the sd before normalization and the other is probably the mean before normalization
Would this be the appropriated channel for optimization-related stuff?
Just curious, btw. Hehe
I would argue it could fit in #advanced-analysis depending on the problem. But either way I agree that this is probably the best channel
I don't know where to start 
p_L = p_max and p_R = 0 was for 10.2.3
but here they give p_L = 300 and p_R = 0
like what is it asking for, simulate as in implement?
hm i found this
would the lax wendroff scheme in that link work
Is lax wendroff monotone?
Anyone here know about compressed sensing?
I'm vaguely aware of high dimensional linear regression, at the level that I may or may not be able to solve exercises from a textbook, if that counts
Convert your data into various flowrates of bitrex based on spectral density https://en.wikipedia.org/wiki/Denatonium and use your nose from there
Denatonium, usually available as denatonium benzoate (under trade names such as Denatrol, BITTERANT-b, BITTER+PLUS, Bitrex or Aversion) and as denatonium saccharide (BITTERANT-s), is the most bitter chemical compound known, with bitterness thresholds of 0.05 ppm for the benzoate and 0.01 ppm for the saccharide.
It was discovered in 1958 during ...
What
Don't actually do that, I've just seen people visualize big matrices like this. And thought heck what if you did that with smell.


Edd's job is smelling

@rapid ridge what sort of boundary/interface condition are you imposing?
so its two ideal fluids with different desnities, and at the interferance they have the same pressure, the fluids have different velocotu fields and different velocity potentials
and the interferance is at y = n(x,t)
so ive basicly gotten to this point
but idk where to go from here
no they dont, they are seperate, independant of z coordiante, ideal fluids
What dynamics
What equation governs the evolution of the boundary
thats what im trying to find
im trying to find the dynamic boundary condition
well dynamic surface condition
well we usually do it with sruface conditions between air and the fluid, and we get
so im trying to find a simular thing but instead of the surface of a fluid with the atmosphere
the fluid with another fluids
where they have different densities
but pressure equals at the their boundary
yeah i know but i was giving and example of what im trying to find
im trying to find soething of this form
but instead of between the fluid and air between two fluids
ive literllay given you all the infomation
i dont think its the complex
i just dont know how to go about it
but dw if u dont know
Well, how do you go about it in the air-fluid case
Do you have gravity in your case?
yeah
In this derivation are you assuming that the air has no pressure/velocity
the pressure on the surface = air pressure
and i think this is just descrbing particals below the surface
The derivation you've presented is describing the surface eta
I think the only thing different in your case is u1 and u2 being different
It's Bernoulli's equation which gives the dynamic boundary condition between the air and fluid. If the fluid is like water which is way denser than air you get this. The surface motion is a wave, solvable analytically in the linear case. Maybe some nonlinear too, but I'm not familiar with those. Gravity is the restoring force for this type of wave just like a pendelum. Draw a picture of what your scenario is at the start so you get the BCs right, a deep water wave has very different dynamics from a finite water depth. But deep water is a limiting case of finite water depth if you've done everything right up to that point.
The type of wave motion which most people are familiar with are waves that occur on the free surface of water. For example, the ripples that occur when a small rock is dropped into the water or the waves that can be seen breaking on a beach (Pinery Provincial Park on Lake Huron, below). In this type of wave motion the restoring force is gravity ...
They specifically want fluid-fluid interface
That can be done with these techniques too you need another dynamic boundary between those two fluids. Those are internal waves, they occur through the stratification of the ocean. It can be done in a continuous way too like it is there, or in the case of like a tank with oil which separates into different portions due to density it's a little easier. Because then you know where exactly the height is the internal wave occurs.
https://uwaterloo.ca/applied-mathematics/future-undergraduates/what-you-can-learn-applied-mathematics/continuum-and-fluid-mechanics/internal-gravity-waves
Internal gravity waves are waves that occur in density stratified fluids in the presence of a gravitational field. They arise as gravitational restoring forces act on vertically displaced fluid. Internal gravity waves are all around us, but difficult to see. They are ubiquitous in stratified lakes, in the ocean, and in the atmosphere. They are e...
O shoot that's the wrong link. I have to find the one with the derivation.
Internal waves are waves which, as the name suggests, occur in the interior of a fluid. You may be asking what sort of restoring force makes waves in the interior of a fluid possible, or you may question how important waves which do not occur on the surface of a fluid are. The answer is that just like surface waves, gravity is the restoring forc...
There you go, I despise the Waterloo website for that course
gucci thanks!!!
Yo this is late but I'm so glad you brought up that picture.
I'm confused by x=psis
I know it means x is approx. sparse in some basis. But I feel like it should be psix=s where psi is something like the DFT matrix.
The way brunton has it written seems like psi is the IDFT matrix instead?
Why do you feel like that
they're right, btw
at any rate, the way it's written, it's ambiguous whether the signal is sparse in its native domain or in spectral domain
so psi could be either a DFT or an IDFT mat
DFT vs IDFT does not make that much of a diff in my experience
As long as you keep track
If you start with IDFT you DFT later on
If you start with DFT you IDFT later on
It's all the same
the thing is
Up to some signs
if you're doing an inverse problem, you don't need the opposite operation
Sensible
still, it's the same up to conjugation and anyways you probably want either a real signal (as in real vs imag vs analytic) or an envelope, so it wouldn't matter that much
gonna take an abs later down the line
I saw this in an early CS paper by candes
Wth does it even mean to say that you know some fourier coefficients f hat already?
It only makes sense to me to have partial measurements in the original domain
you're mixing up the two matrices
there's a compression matrix and there's a dictionary matrix in which the desired vector is sparse
say we have a signal vector s that is sparse in a dictionary D
then in s = Dx, x is sparse
but you usually don't measure s directly, you'd rather exploit the sparsity and use CS
here they call that omega, the c ompression mat
so omega s = omega D x
Well if F is the DFT matrix,
Don't you get the fourier representation of data x by multiplying the DFT?
That's why I think it's
$F*x=y$
fajitas
i already agreed with you above regarding that
it depends on which domain has a sparse representation
for images, sparsity occurs in the spectral domain
so as you said, it would be sparse in the IDFT dictionary
but you measure with a compression matrix. this would be a subsampled DFT matrix
image = IDFT * x, where x is the sparse spectral decomposition of the image
My bad I didn't read what you said yet
so you can now apply a compression mat
C * image = C * IDFT * x
if you choose C as a DFT mat, then C * IDFT is a selection matrix with zeros almost everywhere and 1's along a jagged diagonal (with gaps)
though it would probably be easier to just puch out holes from the image directly, since using a DFT is almost like inverting the original dictionary. if it can be done, nice. but normally it's not so easy
I think I finally understand it now lol
Hmmm maybe I should read more about dictionary learning. I'm coming from matrix completion and never learned compressed sensing precisely so I have some gaps I'm trying to fill right now.
well
cs makes the assumption you already have the dictionary
if you need both, you have a bloind problem. many of these can't be solved at all
you should be fine even without dictionary learning
Ahhh okay I think I see what you mean.
In MC you sampld random entries of the matrix. I'm assuming in CS you have a similar idea but for a vector.
Yet I keep seeing weird measurement matrices like gaussian or randomly 1&-1. Is there a measurement matrix that just map a vector to it's partial observations in it's original domain.
Forgive me if you have already clearly answered this lol
a subsampling matrix, yeah
take an identity matrix and throw away random rows
this works of the signal is sparse in the original domain for a dictionary of something like point spread functions, of these PSFs are so large that you can observe a good chunk of them even if you throw away most of the info
i'll give you an example from my work
when doing radar and ultrasound imaging, a single reflector gives something like a hyperbola that is observable by several sensors
if you already know that this shape is a hyperbola ahead of time, you need only very few observations to find out where the hyperbola is
this is equivalent to parameter estimation on a grid
Hmmm 👀. So the discarded rows correspond to unsamplex entries of the partially obzerved vector?
yep
instead of throwing away rows from the identity, you can also just set them to 0
same thing
Ahhh I see
I'm sorry for being dummy, I think you may have answered this already but I wanna be doubly sure
If I wanna use CS on an image that is sparse in the fourier domain, what is my explicit optimization problem?
let's see
let $\boldsymbol{C} \in \mathbb{C}^{m \times n}$ be the compression matrix, while $\boldsymbol{b} \in \mathbb{R}^n$ is the complete vectorized image. assume we know the image $\boldsymbol{b}$ is sparse in the frequency domain, so that if we use an IDFT matrix $\boldsymbol{D} \in \mathbb{C}^{n \times n}$ as a dictionary, we get that $\boldsymbol{b} = \boldsymbol{Dx}$ for a sparse vector $\boldsymbol{x} \in \mathbb{C}^{n \times n}$ that represents the image's spectral lines
Eρρa
then one would want to solve the problem $\min_{\boldsymbol{x}} \Vert \boldsymbol{x} \Vert_0 \text{
s.t. } \boldsymbol{Cb} = \boldsymbol{CDx}$
Eρρa
Ohhhhhhhh now it's starting to make sense
You explained it better than brunton's video lol
so we want the sparsest x that, when mapped through the dictionary, gives us the image
and because the image is sparse in this domain, there is still unique recovery from a smaller number of observations
and yeah, i've seen that guy's video before. it wasn't very good 😛
lots of other videos of his are great tho
btw you'd never go for the original $\ell_0$ problem
Eρρa
unless it's super small
Yeah you use l1 right
you'd use the convex relaxation with $\ell_1$ and possibly solve the langrage form instead
Eρρa
my favorite methods for this are FISTA and STELA
Hmmm I've never heard of those, I'll look into them. I'm used to just using proximal operators for MC
right
fista is fast iterative shrinkage/thresholding algorithm
iterative shrinkage/thresholding (ISTA) is the result of using the proximal of the l1-norm (this gives soft thresholding) and using gradient descent (which is a proximal of the 1st order taylor approx of the 2-norm)
Ahhh interesting, I'm even more curious now lol
stela is soft-thresholding (as above) + exact line search for the updates
and fista is ista with nesterov acceleration
to be completely fair, the stela "line search" is also more of a momentum kinda thing, like in nesterov
By acceleration do you mean that there is some sophisticated mechanism to decide the step length?
I've heard of momentum before in NNs like rprop and Adams but know a bit less about it lol
one of the characteristics of vanilla gradient descent is a "zig-zag" path toward the minimum
acceleration usually introduces "momentum"
at each gradient step, you don't follow the negative gradient, but rather a convex combination of the current gradient and the old one
this gives a smoother curve, instead of a path that changes direction by 90° at every step
as for the step length, us lazy people look for the lipschitz constant of the gradient of the cost function and just use 1/that constant as a fixed step size
in ML you can just let the network learn it for each step
Ah I see what you mean. This sounds very very fancy lol
How does nestorov acceleration differ from something like the update direction used in CG?
afaik, CG uses conjugate directions instead of directions that are perpendicular w.r.t. the previous error
so the CG ones are conjugate using a symmetric matrix other than the identity. they don't actually form 90 deg angles
Yes
in nesterov, you take vectors that do form 90 deg angles w.r.t. the previous step, and take a convex combination
I mean you sort of implicitly do this with cg
Because of the krylov sub space thing going on
you're right, i was just reviewing CG rn on wikipedia
the other difference that comes to mind is that nesterov's accel uses a fixed weight for the momentum
the way you combine the old and new directions is always the same, and does not depend either on the matrix nor on the update vectors
that's a little weird, but the dude showed some weird asymptotic optimality for a sequence of weights

Interestingly, the guy who made RNNs Michael I. Jordan was pressed on a podcast about what his favorite bit of math is (maybe worded different) and he mentioned nestrov acceleration
there's several papers that attempt to interpret how it works
black magic acceleration

Spooky
this is a nice read on it https://faculty.wharton.upenn.edu/wp-content/uploads/2016/07/nesterov_nips(1).pdf
i'll admit i've read the first 2 pages several times, but never the whole thing lol
ANy1 have this book?
Numerical Solutions for Partial Differential Equations: Problem Solving Using Mathematica
trying to find a scheme I can use to simulate traffic flow
how do i plot the z-component graph?
apparently it's plotted like this
but i don't understand whats going on
Your Complete Guide to All 27 Blend Modes in Photoshop! Learn the science behind each blending mode and how they work. In this tutorial, we will go through several examples to illustrate the uses and also the math that goes behind it.
Starting with the concept of Blend Mode groups and how each group has a common property, we will dive deeper to...
what are the maths on this blend mode?
Video is timed for the blend mode i want
Hi.
This is homework. I'm supposed to find a polynomial of degree 3 or less which best approximates the function $x \mapsto |x|$ uniformly on $[-1,1]$ (I mean $\sup$-norm). Any hints or ideas? I know I'm supposed to find a polynomial $p(x)$ and 5 points (pairwise distinct) that exhibit an alternating sign behavior, as in $x_1, x_2, x_3, x_4, x_5$ in $[-1,1]$ such that $R(x_i) = K_0 \sigma (-1)^i$, where $R(x) = |x| - p(x)$ $K_0$ is some number and $|\sigma| = 1$ ($K_0, \sigma$ don't depend on the $i$'s in the previous statement).
I'm trying to assume (hopelessly) things like "maybe x_1 = -1, x_5 = 1, x_3 = 0$, ... see what happens, etc.
phao
Oh.... I got it!!!
If I consider $p(x) = x^2 + a$, then for $a=0$, the graph of $p$ for $a=0$ is "below" the graph of the abs function. Intuitively, The largest between the two is for $x=\pm 1/2$. Intuitively, it's clear I can keep increasing $a$ gradually to
phao
reduce this distance at $x=\pm 1/2$ and increase at the extremes ($-1,1$) and at $0$. There will be a particular "critical" a in which all these five values will be equal in absolute value
phao
the five values meaning the difference between $p(x) = x^x + a$ and $|x|$ at these five points: $-1, 1, 0, -1/2, 1/2$
phao
Then it's a matter of finding this critical $a$, which is a simple calculation and verifying that the intuition that this will give the alternating sign behavior will work..
phao
what are the maths on color blend mode?
Video is timed for the blend mode i want
Your Complete Guide to All 27 Blend Modes in Photoshop! Learn the science behind each blending mode and how they work. In this tutorial, we will go through several examples to illustrate the uses and also the math that goes behind it.
Starting with the concept of Blend Mode groups and how each group has a common property, we will dive deeper to...
Linear interpolation probably
what's it called, didn't seem to be at any specific place
you can find most formulas if you google the names of the blend modes specifically
saturation?
can someone explain this slide? Why did they choose a_k = 1/sqrt(k). Doesn't the series sum of a_k^2 still diverge
it looks to me like they mixed up 2 approaches?
they gave the definition of the square summable step sizes, but later they use the diminishing but not summable stepsize
those 2 should both converge to the true solution, but as you said, the one shown here is not square summable
makes sense, and the diminishing but not summable stepsize even though don't satisfy the condition in first line, still satisfies $f_k^\star-f^\star \to 0$ right?
Anticipation
yeah
thanks
@prime kraken hey do you have time for another compressed sensing question?
a few mins to spare, sure
I was wondering if you had an idea about what the name of the measurement matrices that describe observing single elements.
As opposed to gaussian measurements and one-bit measurements
In other words, what would the projection operator $P_\Omega$ from matrix completion be in compressed sensing?
fajitas
nope, no idea if that has a name
hmm
P_omega is the projection onto some subspace where only the indices in omega are nonzero, yeah?
if so, this would be a "subsampling matrix"
you can do a projection of this kind with a row-subsampled identity matrix
if you're studying this from a more mathematical perspective, i'd just call it a projection. in engineering, depending on the context, you might go with selection or subsampling
Yeah!
Thank you for that insight. I might ask another question in the future but I'll let you go now lol
Yeetus
Ok and what is A
And you have no constraints on the LP?
No no no for your problem
Do you not have a specific LP you are applying this theorem to?
Ok
What is p in the theorem
It doesn't have any additional meaning?
Ok
So what is your A and b for this problem
Well, what have A and b been before?
Right A and b are the constraint matrix/vector
So $A=\begin{bmatrix}1&2\1&-1\end{bmatrix}$ and $b=\begin{bmatrix}5\2\end{bmatrix}$ so you have $Ax\leq b$
Go_A
Ok so how do these fit in to the theorem
Yes
So what do you think x is
Ok so next you need to find u and v
Ok sure
Well, you can compute b-Ax right
u needs to be perpendicular to that
So then you can find u
Yes
So u can be anything
I think
I feel like you are missing something about p
Hey guys, I'm a little stuck on this question, do you think using induction to prove this would be correct? I could do it based on the cardinality of the set I i think
We're going to need more details
Now that I'm looking at previous questions I don't think this would be the right channel for this question sorry.
Ok
finite automata questions go here
Oh, okay. So I was thinking that I could prove this is true as a base case where |I| = 1
This can be seen since Λ(S₁) ⊆ Λ(S₁)
So we can assume this holds for |I| = n where n is fixed but arbitrary.
Now we need to show this holds for |I| = n+1
Suppose set K ⊆ I where it contains every element except for the last and set W ⊆ I that only contains the last element in I
So Λ((∪ₖ∈K Sₖ) ∪ Sᵥᵥ) ⊆ (∪ₖ∈KΛ(Sₖ)) ∪ Λ(Sᵥᵥ)
We know that Λ(S ∪ T) ⊆ Λ(S) ∪ Λ(T) which is very similar to the above
where S = ∪ₖ∈K Sₖ and T = Sᵥᵥ
So Λ(∪ᵢ∈I Sᵢ) ⊆ ∪ᵢ∈IΛ(Sᵢ) holds for |I| = n+1
This is how I was thinking it could be solved.
One thing I wanted to confirm is is Fourier transform based on stone weiestrass using trig functions?
Or nah
Since it’s an algebra that vanished no where and whatever the second condition is
Separates points
And also Fourier series is just a periodic version of Transform? Or what’s a correct idea of their difference
fourier series is for periodic functions, transform is for square integrable functions
Oh I’m being kind of stupid
i don't see the relationship to stone-weierstrass
Cuz any interval would make the period the interval
So approximating on the interval means outside the interval it’s periodic
Hm and for square integrable functions I guess the difference is it could go to infinity?
So using Fourier series won’t work because of that
what goes to infinity
The function to be appeoximated is defined in unbounded set?
Also I guess even with Fourier series feel like stone weiestrass is not the most general conditions since doesn’t the series even approximates discontinuous functions
I think you're asking about how the Fourier transform is defined and why it works
For both the Fourier series on an interval and the Fourier transform on R, you want to define it for functions in L^2(T) or L^2(R)
To do this, you first define it for a dense subset
And then extend it to the whole space
Yea kind of and Our prof would be covering fft for 3 lectures but I’m just want to learn more considering the wide applications
The Stone-Weierstrass part is not important
You can take a Fourier transform of any L^2 function
You can also define it for distributions
Ic
The Discrete Fourier Transform can be viewed as a Fourier transform on the integers
You might be interested in harmonic analysis on locally compact abelian groups
Which is the general abstract theory
So would discrete Fourier be some numerical scheme for the Fourier transform?
The FFT is a particular implementation of the DFT
Lol my current knowledge is a 3b1b video 
discrete fourier and discrete "time" fourier are for periodic and aperiodic funcs on a discrete domain
You can read Stein and Shakarchi's book on fourier analysis
Oh ok thx
After that you can pick a book on signal processing (Edd probably has a suggestion for this)
I’ll just read the first part cuz seems like later parts get involved
With heavy analysis 
Yes, you need to do analysis to study the Fourier transform
Hi, what is meant by step size in the runge-kutta method?
dt
ah ok
i am stuck with this problem, i applied the 2nd order Runge-Kutta method and got
$x_{1}=(\frac{\lambda h }{2}+1)(\frac{\lambda }{2}+1)$
zammie
I was thinking to write the solution in form
$x_{k+1}=Ax_{k}$
zammie
What does the update look like for newton's method
zammie
this is what I used to calculate x1
basically this is not correct
$b_{1} = b_{2} = \frac{1}{2}$
zammie
$c_{2} = 1$
zammie
$a_{21} = \frac{1}{2}$
zammie
this is what's needed for the 2nd order method ?
I assume you dont need the $t_{0} + c_{2}h$ part of the function as the ODE hasnt got t in it
zammie
so I calculated in x only
The correct update should be $x_{k+1}=\frac{x_k}{2}+\frac12f(t_k+h,x_k+\frac12hk_1+\frac12hk_2)$
Horror of the Stars
So you need to solve for k_2 implicitly
Because $k_2=f\qty(t_k+h,x_k+\frac12hk_1+\frac12hk_2)$
Horror of the Stars
$k_{1} = \lambda$
zammie
is this wrong?
should the k1 and k2 there be k0 and k1?
I mean it doesn't really matter how you index them as long as you're consistent
k1 is not lambda
Well
It is for t=0
so neglecting t and finding the solution in x only is the wrong approach
$k_{1} = f(t_{0}, x_{0}) = f(x_{0}) = \lambda x_{0}$
zammie
is there a way to generalise it for any t?
isn't the derivative at x=0 $\lambda (1)$
Sure
zammie
which is just lambda
Sure
why doesn't k1 = lambda then
k1=lambda for t=0
but t doesn't matter right
right
So k1=f(x) is not always lambda
i thought $x_{0}$ and x(0) were the same thing
zammie
Sure
x_k is the numerical solution at k*h and x(t) is the true solution at time t
They are they same at k=0/t=0
What is k2
the formula or?
Well, what's the formula for it
if I go by your formula it's
in the formula from my lectures, k2 is only on one side
That's probably because the butcher tableau for that is different
a_22 is not zero
In the problem you sent
Yes the butcher tableau describes an implicit method
we only went through this very briefly in lecture so I am quite lost
$k_{2} = f(x_{k} + \frac{h}{2}k_{1} + \frac{h}{2}k_{2})$
zammie
like this?
Ok but what is f
Ok and what is the formula for f
$f(t, x) = \frac{\dot{x}}{\lambda}$
zammie
f(x)=lambda*x
So what is k2
Ok
one sec
Can you solve for k2 now
yes
Like this?
Yes and what is k1
f(x0)
Horror of the Stars
ah ok
Replace x1 and x0 with x_k+1 and x_k but yes
Yes
Ok can you simplify this
It doesn't matter if you can't I guess
Ok now you have the actual problem, showing that this converges to 0 for all h>0 if Re(lambda)<0
I see now
x_k+1 = 0
actually wait
disregard what i just said
It should not be 0
yep
$x_{k+1} = \frac{x_{k}(2+\lambda h)}{2- \lambda h}$
zammie
Can you do the problem then
this converges to 0 because the top is always negative and the bottom is always positive?
i dont know
What if lambda is complex
It is not the conjugate
should I put it into complex and try to rationalise
That might work
+?
the denominator?
oh right should bem ultiply
rationalising this I don't think I'm getting anywhere
multiply top and bottom by 2 +ah + bhi
the top is complex and the bottom is real
We want to show $\lim_{k\to\infty}\abs{x_k}=0$. We have that $\abs{x_{k+1}}=\abs{x_k\frac{2+\lambda h}{2-\lambda h}}=\abs{x_k}\frac{\abs{2+\lambda h}}{\abs{2-\lambda h}}$ so we want to show that $\frac{\abs{2+\lambda h}}{\abs{2-\lambda h}}<1$. We know that $h>0$ and $\Re(\lambda)<0$ so for $\lambda=a+bi$ we have $\frac{\sqrt{(2+ah)^2+b^2}}{\sqrt{(2-ah)^2+b^2}}$ and $(2+ah)^2<(2-ah)^2$ so $\abs{x_{k+1}}<\abs{x_k}$
Horror of the Stars
In particular $C=\abs{\frac{2+\lambda h}{2-\lambda h}}<1$ is a constant so $\abs{x_k}<C^k\abs{x_0}$
Horror of the Stars
So xk converges to 0
how did you get to here?
If $z=a+bi$ then $\abs{z}=\sqrt{a^2+b^2}$
Horror of the Stars
why do we take the absolute value of everything?
We want to show that $x_k$ converges to 0. What does it mean for a sequence to converge?
Horror of the Stars
@wide spear then in this question, did we need the fact that x(0) = 1?
No
could it possibly help in anyway to get a quicker solution?
is there a way to show that the u's are linear independent? Or it just a probability thing with sampling
not only that actually but also show its pairwise orthogonal
Maybe for orthogonal you can do product to sum identities or something + the formula that eg sum of cosines is the real part of the sum of complex exponentials (euler's formula)?
product to sum identity?
oh
you know, the one from highschool, like cos(a)cos(b) = 1/2 * (cos(a-b) + cos(a+b)), or whatever
ye
oh ok i think i see how to do it, thanks
and yea once i prove orthogonal then its indep
you could also try properties of even and odd functions
Anyone here know about dual methods?
In optimization?
Yeah
To my understanding they try to find a saddle point (provided strong duality holds) by minimizing the primal variables then maximizing the dual variables
Usually I see them iterate as
- Update primal
- Update dual
But I'm curious if you can do
- Update dual
- Update primal
Sure
Sweeeeeet
Hi, is anyone familiar with left-preconditioned GMRES and knows where to start when trying to find an expression for the residual norm?
Currently trying to implement left-preconditioned GMRES but I need to compute the residual norm to stop iteration.
If you have $x_n$ as your solution at step $n$ then the residual norm is $\norm{b-Ax_n}$
Hellscape of Eternity
for a question c, why do we always assume the unit normal to be pointing in the opposite side of the force?
it's a convention for the sign of the energy and direction of forces
to tell apart forces and energies within a system and exerted on the system from outside
so do we always assume the unit normal to be pointing the opposite direction of the force like does it always hold
i origonally thought it was (-1,0,0) lol
and got all the signs wrong
if the force comes from the outside, sure
but the direction of the normal is not unique in general
physical coordinate systems are usually taken to be righthanded, with normals pointing outwards
your text book probably talks about this somewhere near the beginning of the chapter where surface normals were introduced
hmm ok thanks, ill have to read my notes again
oh lol
yeah
does explain the force acting on the opposite side of the foce

