#Math Reading Group


sullen estuary
#

A reading group for learning the Math found in ML papers (or not)

Official Github Repo: https://github.com/irregular-rhomboid/EAI-Math-Reading-Group
Playlist of past recordings: https://www.youtube.com/playlist?list=PLvtrkEledFjodK_UKua2h4exNMcWNGJrs


#

<@&1098008274361663629> Hey all. Since we have a newly christened math RG thread, I'm reposting the announcement for this Sunday's session on gradient dualization. (see #1089231591785639946 message)

I just created an event for it: https://discord.com/events/729741769192767510/1309611044754427944

@sinful forge and @royal osprey will be talking about the recent work on gradient dualization and operator norms.

The recommended reading, in order of difficulty is

Please be aware that the recording of this session will only be available for a limited amount of time, so if you are interested in the topic, the best option is to attend the session live.

reef estuary
sullen estuary
reef estuary
#

@sullen estuary

  1. I remember the mods saying that the reading group would be recorded and the recording later taken down. May I ask for a link to the recorded video?

  2. Also, one statement used to motivate the use of the spectral norm for optimizers was:
    ||del Y|| <= ||del W|| * ||X||

where ||del W|| is the spectral norm and the others are L2 norms. Can you cite this, or give some context for it?

Thank you very much
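
For context on the bound in item 2: for a linear layer Y = W X, an update del W changes the output by del Y = del W X, and the spectral norm is by definition the largest factor by which a matrix can stretch a vector in L2 norm, so the inequality holds for every input. Below is a minimal numerical sketch, assuming a single input vector x and a small made-up update matrix. The helper names are hypothetical, and power iteration is just one simple way to estimate the spectral norm.

```python
import math
import random

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def l2(x):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

def spectral_norm(A, iters=500):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = l2(w)
        if n == 0.0:
            return 0.0
        v = [wi / n for wi in w]
    return l2(matvec(A, v))

# Made-up weight update del W (2 outputs, 3 inputs); purely illustrative.
dW = [[2.0, -1.0, 0.5],
      [0.0, 1.5, -0.5]]
sigma = spectral_norm(dW)

# Check ||del W x|| / ||x|| <= ||del W||_spectral on random inputs.
random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    ratio = l2(matvec(dW, x)) / l2(x)
    worst_ratio = max(worst_ratio, ratio)

print(f"spectral norm ~ {sigma:.4f}, largest observed ratio = {worst_ratio:.4f}")
```

The largest observed ratio approaches the spectral norm only when x lines up with the top right singular vector of del W, which is where the later question about alignment comes from.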

sullen estuary
reef estuary
#

@sullen estuary Does this imply that the more alignment there is between the input and the weights, the better Shampoo will work?

dusky marsh
#

When you say alignment, do you mean an inner product or what?

reef estuary
#

Yes

reef estuary
unique axle
#

alignment was heavily discussed in https://arxiv.org/abs/2407.05872

dusky marsh
#

The good regulator is a theorem conceived by Roger C. Conant and W. Ross Ashby that is central to cybernetics. Originally stated that "every good regulator of a system must be a model of that system", but more accurately, every good regulator must contain a model of the system. That is, any regulator that is maximally simple among optimal regula...

tulip gazelle
#

@sinful forge @royal osprey what's a good pedagogical source for the details of the norm duality and constructions with different layers we were discussing at the end of the call?

sterile gate
# dusky marsh Reminds me of https://en.m.wikipedia.org/wiki/Good_regulator

(sorry if this is a very beginner question, but) isn't this basically Solomonoff induction? If the simplest algorithm that can control outputs by manipulating inputs must be an accurate model of the system, the simplest algorithm that can predict outputs based on inputs must also be an accurate model of the system, no?

dusky marsh
#

Regulating means controlling or manipulating.

#

The existence of that theorem is the only thing I know about cybernetics so don't lean on my knowledge too much.

sterile gate
#

fair enough!

tulip gazelle
sullen estuary
tulip gazelle
#

One of the worst things about chronic migraines is what it does to my memory. I spent five hours yesterday reinventing [the analysis I did off-the-cuff](#research message) a month ago >.>

#

On the bright side I now strongly believe that empirical investigation into the distribution of ||Tv||/||v|| in a trained model using semantically meaningful inputs is a good idea and have written this fact down in three locations so as to (hopefully) not forget it again.
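
A rough sketch of what such an investigation could look like, assuming T is a single weight matrix and the inputs are a list of vectors. Both are random placeholders here, so the numbers say nothing about a real trained model; `gain` is a hypothetical helper name.

```python
import math
import random
import statistics

def gain(T, v):
    """Return ||T v|| / ||v|| in the L2 norm for a nonzero vector v."""
    Tv = [sum(t * vi for t, vi in zip(row, v)) for row in T]
    return math.sqrt(sum(y * y for y in Tv)) / math.sqrt(sum(vi * vi for vi in v))

random.seed(0)
d_in, d_out = 8, 8

# Placeholder for a trained weight matrix; in practice it would come from a checkpoint.
T = [[random.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
     for _ in range(d_out)]

# Placeholder for real activations; random Gaussians are NOT semantically meaningful inputs.
inputs = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(2000)]

# Summarise the empirical distribution of the gains.
gains = sorted(gain(T, v) for v in inputs)
q1, median, q3 = statistics.quantiles(gains, n=4)
print(f"min={gains[0]:.3f} q1={q1:.3f} median={median:.3f} "
      f"q3={q3:.3f} max={gains[-1]:.3f}")
```

Comparing these quantiles with the spectral norm of T would show how much of the worst-case bound typical inputs actually use.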

dusky marsh
# sterile gate fair enough!

It might be accurate to say that a model regulates inputs by manipulating them until they are accurate outputs.

A model could be considered a map of how to turn a map of one part of the territory, the input data, into a different map of the territory, the output data.

sullen estuary
#

cc @sinful forge @royal osprey

gloomy cloud
#

what are the extra insights of this talk, for people who have already read the papers?

sullen estuary
gloomy cloud
#

my current thought about orthonormalization replacing Adam is that all of its benefits come from the rows and columns being equal-sized (like Adafactor). I feel that pairs of rows being orthogonal is a downside

#

a mental testcase is a 2-input, 2-output model with a 2x2 matrix. the first channel of the input is scaled by 3x, and the first channel of the output is scaled by 3x. Adam can solve this perfectly. but if you orthonormalize the gradients, there is a lot of extra noise
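
One way to make the two update directions in this test case concrete, as a sketch under stated assumptions: the gradient G below is hypothetical (a well-conditioned base matrix with its first row and first column each scaled by 3), the elementwise sign stands in for an Adam step with no history, and orthogonalization is done with a plain Newton-Schulz iteration rather than any particular optimizer's implementation.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor U V^T of a full-rank G (Newton-Schulz)."""
    fro = math.sqrt(sum(g * g for row in G for g in row))
    X = [[g / fro for g in row] for row in G]  # singular values now lie in (0, 1]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X

def sign(G):
    """Elementwise sign: a simplified stand-in for an Adam step with no history."""
    return [[math.copysign(1.0, g) if g != 0 else 0.0 for g in row] for row in G]

# Hypothetical gradient mimicking the 3x-scaled first input and output channels.
base = [[1.0, 0.5], [0.5, 1.0]]
scale = [3.0, 1.0]
G = [[scale[i] * base[i][j] * scale[j] for j in range(2)] for i in range(2)]

Q = orthogonalize(G)
S = sign(G)
print("G =", G)
print("sign(G) =", S)
print("orthogonalized G =", [[round(q, 4) for q in row] for row in Q])
```

For this G the orthogonalized update is the identity matrix (G is symmetric positive definite, so its polar factor is I), whereas the sign update keeps the off-diagonal entries at full magnitude. Whether that difference helps or hurts in training is the open question here; the sketch only makes the two directions concrete.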

sullen estuary
#

Hey all, I'm thinking of restarting the reading group after New Year's. The goal would be to get into optimization (classical and stochastic), but before that I want to do some category theory as a (mostly) shitpost. The question is whether y'all would prefer to get the basics from a textbook (I have found a good one already) or directly read some of the Cats4AI papers

#

<@&1098008274361663629>

sullen estuary
# sullen estuary
Poll: Which way, western man?

Winning answer: Category theory for ML papers first (6 of 8 votes)

sullen estuary
#

Well, the people have spoken. I'll prepare some papers to read in the coming days

sullen estuary
#

Alright <@&1098008274361663629> Next meeting on <t:1736715600>. We will be talking about some Category Theory.

The recommended reading is the following

Table of Contents. Part One: Category: The Essence of Composition; Types and Functions; Categories Great and Small; Kleisli Categories; Products and Coproducts; Simple Algebraic Data Types; Functors; Functo…

white obsidian
sullen estuary
sullen estuary
#

<@&1098008274361663629> The meeting is about to start. I've moved it to another VC to avoid interfering with the other RG

jovial sand
#

#abstractalgebra #monoids #some1

Monoids are everywhere in mathematics, but what are they? And why are they so useful? This video uses a simple example to show you exactly what the 4 rules of monoids are all about. I made this video for the 2021 Summer of Math Exposition. Enjoy!

#

Associativity of the binary operation is the necessary condition that ensures the corresponding ternary operation behaves consistently.

#

Since associativity makes parallel reduction possible, identifying monoids goes hand-in-hand with identifying opportunities for parallelism. e.g. MapReduce etc.

jovial sand
#

Monoids being the simplest associative structure (set + associative binary operation + identity).
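
To illustrate the point above about associativity enabling parallel reduction, here is a small sketch. `chunked_reduce` is a hypothetical helper, and a thread pool stands in for real parallel workers; the idea is only that partial results over chunks can be combined in order when the operation is associative with an identity.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def chunked_reduce(op, identity, items, chunk_size):
    """Reduce each chunk independently, then combine the partial results.

    Matches a sequential left fold for any monoid (op associative, identity
    neutral), even if op is not commutative, because chunk order is preserved.
    """
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda chunk: reduce(op, chunk, identity), chunks))
    return reduce(op, partials, identity)

# String concatenation with "" is a monoid: associative, but not commutative.
letters = ["a", "b", "c", "d", "e"]
sequential = reduce(operator.add, letters, "")
parallel = chunked_reduce(operator.add, "", letters, chunk_size=2)
print(sequential, parallel, sequential == parallel)

# Subtraction is not associative, so chunking changes the answer.
numbers = [10, 3, 2, 1]
seq_sub = reduce(operator.sub, numbers, 0)
par_sub = chunked_reduce(operator.sub, 0, numbers, chunk_size=2)
print(seq_sub, par_sub, seq_sub == par_sub)
```

The second case shows why the monoid laws matter: without associativity, regrouping the work across chunks is no longer safe.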

sullen estuary
#

But yeah, associativity is what you want for parallel computing

jovial sand
#

Neat. This is pretty cool

#

credit: Tom Leinster

jovial sand
#

operad Δ (finite probability spaces)

pale orbit
#

I just joined the Discord and this reading group is so cool! Is the next session already scheduled?

sullen estuary
#

Alright. <@&1098008274361663629> Sorry for taking so long. Next meeting on <t:1742763600>. We will be talking about the theory of optimization.

The main source will be chapter 5 of Francis Bach's recent book: https://www.di.ens.fr/~fbach/ltfp_book.pdf

To go deeper you can look at the relevant chapters of Boyd and Vandenberghe's Convex Optimization https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf and Bubeck's review https://arxiv.org/abs/1405.4980

Have a nice week!

#

Now for the less good news. You may have noticed that the time between sessions has been getting longer. This is in large part because I have much less time for this reading group than I previously did. While I would like for this reading group to continue, it will have to be at a reduced pace unless people volunteer to prepare sessions/take over from me.

sullen estuary
#

forgot to create the event

sullen estuary
#

<@&1098008274361663629> the meeting is starting now

slow parcel
dusky marsh
#

https://arxiv.org/abs/2109.03920

Interpretability is almost like trying to solve the inverse problem of identifying an objective function from observing a model's parameters and outputs, except instead of finding an objective function that describes the cost of incorrect outputs, we're trying to find a utility function that contains concepts humans recognize as familiar.

dusky marsh
#

We're looking to solve the inverse problem, but with the objective function we discover parameterized in a way that's semantically meaningful to humans, so rather than as something like (yhat-y)^2, instead as "this model likes turning static into pictures of dogs".

sullen estuary
#

<@&1098008274361663629> Hey y'all. I finally got around to uploading the last recording. As I said at the end of it, I just don't have enough time or energy to run this reading group in any serious capacity anymore, so this was pretty much the last Math reading group.

It was fun to organize, and I used it as an excuse to learn/brush up on things on more than a few occasions, and I thank everyone who participated in this group.

tulip gazelle
#

Thank you for all your work @sullen estuary ❤️

true field
#

Hey, why don't we have voice channels or video channels

#

to study together?