Math Reading Group | EleutherAI

sullen estuary Nov 22, 2024, 8:02 PM

#

A reading group for learning the Math found in ML papers (or not)

Official Github Repo: https://github.com/irregular-rhomboid/EAI-Math-Reading-Group
Playlist of past recordings: https://www.youtube.com/playlist?list=PLvtrkEledFjodK_UKua2h4exNMcWNGJrs


GitHub

GitHub - irregular-rhomboid/EAI-Math-Reading-Group: Resources from ...

Resources from the EleutherAI Math Reading Group. Contribute to irregular-rhomboid/EAI-Math-Reading-Group development by creating an account on GitHub.

YouTube

Math Reading Group

Recorded meetings of the Math reading group. See https://github.com/irregular-rhomboid/EAI-Math-Reading-Group for slides, and the corresponding thread on the...

#

<@&1098008274361663629> Hey all. Since we have a newly christened math RG thread, I'm reposting the announcement for this Sunday's session on gradient dualization (see #1089231591785639946 message).

I just created an event for it: https://discord.com/events/729741769192767510/1309611044754427944

@sinful forge and @royal osprey will be talking about the recent work on gradient dualization and operator norms.

The recommended reading, in order of difficulty, is:

Please be aware that the recording of this session will only be available for a limited amount of time, so if you are interested in the topic, the best option is to attend the session live.

reef estuary Nov 24, 2024, 7:22 PM

#

sullen estuary <@&1098008274361663629> Hey all. Since we have a newly christened math rg thread...

What is the prerequisite for this?

sullen estuary Nov 24, 2024, 7:24 PM

#

reef estuary What is the prerequisite for this?

If you want the bare minimum beyond what's typically covered when learning ML, it's:

The notion of a dual space from linear algebra https://en.wikipedia.org/wiki/Dual_space
Normed spaces in finite dimensions, and the notion of a "topological dual" https://en.wikipedia.org/wiki/Normed_vector_space
Matrix/operator norms https://en.wikipedia.org/wiki/Matrix_norm
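A small NumPy sketch of the matrix/operator norms linked above (assuming NumPy; the example matrix is arbitrary). It computes the closed forms of three common induced norms and checks them against both NumPy's built-ins and the definition sup ‖Ax‖/‖x‖ on random vectors.

```python
import numpy as np

def induced_norms(A: np.ndarray) -> dict:
    """Closed forms for three common induced (operator) matrix norms."""
    abs_A = np.abs(A)
    return {
        # l1 -> l1: largest absolute column sum
        "1->1": abs_A.sum(axis=0).max(),
        # l2 -> l2 (spectral norm): largest singular value
        "2->2": np.linalg.svd(A, compute_uv=False).max(),
        # linf -> linf: largest absolute row sum
        "inf->inf": abs_A.sum(axis=1).max(),
    }

A = np.array([[1.0, -2.0], [3.0, 4.0]])
norms = induced_norms(A)

# Cross-check against NumPy's built-in matrix norms
assert np.isclose(norms["1->1"], np.linalg.norm(A, 1))
assert np.isclose(norms["2->2"], np.linalg.norm(A, 2))
assert np.isclose(norms["inf->inf"], np.linalg.norm(A, np.inf))

# Empirical check of the definition: ||Ax||_p / ||x||_p never exceeds the induced norm
rng = np.random.default_rng(0)
for p, key in [(1, "1->1"), (2, "2->2"), (np.inf, "inf->inf")]:
    xs = rng.standard_normal((1000, 2))
    ratios = np.linalg.norm(xs @ A.T, p, axis=1) / np.linalg.norm(xs, p, axis=1)
    assert ratios.max() <= norms[key] + 1e-12
```

The spectral norm is the one that matters for the gradient dualization discussion: it is the operator norm when both input and output spaces carry the L2 norm.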

reef estuary Nov 25, 2024, 10:10 AM

#

@sullen estuary

I remember the mods saying that the reading group would be recorded and then taken down. May I ask for a link to the recorded video?
Also, one statement used to motivate the use of the spectral norm for optimizers was:
$\|\Delta Y\|_2 \le \|\Delta W\|_{2 \to 2} \, \|X\|_2$

where $\|\Delta W\|_{2 \to 2}$ is the spectral norm and the other norms are L2 norms. Can you cite this, or give some context for it?

Thank you very much

sullen estuary Nov 25, 2024, 10:17 AM

#

reef estuary <@125340496951377920> 1) I remember mods say that the reading group will be re...

I still need to edit and upload the recording. Coming soon™️
You can find that in one of the first papers listed in the announcement. This is basically a consequence of the definition of operator (matrix) norms: since $\Delta Y = \Delta W X$, the definition gives $\|\Delta Y\| \le \|\Delta W\|_{\mathrm{op}} \, \|X\|$ directly. The spectral norm is the operator norm when the norms on the input and output spaces are both the L2 norm.
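A quick numerical check of that bound, as a sketch assuming a hypothetical linear layer Y = W X with random NumPy stand-ins for the update dW and the input. It also shows that the bound is tight (attained at the top right singular vector) and that the same argument extends to a batch of inputs with the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear layer Y = W X: a weight update dW changes the output by dY = dW X
d_out, d_in = 4, 3
dW = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)

spectral = np.linalg.norm(dW, 2)  # l2 -> l2 operator norm = largest singular value
dy = dW @ x

# ||dY||_2 <= ||dW||_{2->2} * ||x||_2 follows from the definition of the operator norm
assert np.linalg.norm(dy) <= spectral * np.linalg.norm(x) + 1e-12

# The bound is tight: equality when x is the top right singular vector of dW
_, s, Vt = np.linalg.svd(dW)
v_top = Vt[0]  # unit vector
assert np.isclose(np.linalg.norm(dW @ v_top), s[0])

# For a batch X (columns are inputs): ||dW X||_F <= ||dW||_2 * ||X||_F
X = rng.standard_normal((d_in, 8))
assert np.linalg.norm(dW @ X, "fro") <= spectral * np.linalg.norm(X, "fro") + 1e-12
```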

reef estuary Nov 25, 2024, 2:10 PM

#

@sullen estuary Does this imply that the more alignment there is between the input and the weights, the better Shampoo will work?

dusky marsh Nov 25, 2024, 3:24 PM

#

When you say alignment, do you mean an inner product or what?

reef estuary Nov 25, 2024, 3:25 PM

#

Yes

reef estuary Nov 25, 2024, 3:34 PM

#

reef estuary <@125340496951377920> Does this imply that the more alignment is there between ...

Or rather: the more alignment there is between input and weights, the harder a time Adam has, but Shampoo will make it work?

unique axle Nov 25, 2024, 4:02 PM

#

alignment was heavily discussed in https://arxiv.org/abs/2407.05872

arXiv.org

Scaling Exponents Across Parameterizations and Optimizers

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parame...

dusky marsh Nov 25, 2024, 4:42 PM

#

Reminds me of https://en.m.wikipedia.org/wiki/Good_regulator

Good regulator

The good regulator is a theorem conceived by Roger C. Conant and W. Ross Ashby that is central to cybernetics. Originally stated that "every good regulator of a system must be a model of that system", but more accurately, every good regulator must contain a model of the system. That is, any regulator that is maximally simple among optimal regula...

tulip gazelle Nov 25, 2024, 8:05 PM

#

@sinful forge @royal osprey what's a good pedagogical source for the details of the norm duality and constructions with different layers we were discussing at the end of the call?

sterile gate Nov 25, 2024, 10:04 PM

#

dusky marsh Reminds me of https://en.m.wikipedia.org/wiki/Good_regulator

(Sorry if this is a very beginner question, but) isn't this basically Solomonoff induction? If the simplest algorithm that can control outputs by manipulating inputs must be an accurate model of the system, then the simplest algorithm that can predict outputs from inputs must also be an accurate model of the system, no?

dusky marsh Nov 25, 2024, 10:22 PM

#

sterile gate (sorry if maybe very beginner question but) isn't this basically solomonoff indu...

It isn't accurate in general to say that a map regulates the territory. Maybe you had something else in mind. There might be a different way to describe things that makes that work.

#

Regulating means controlling or manipulating.

#

The existence of that theorem is the only thing I know about cybernetics so don't lean on my knowledge too much.

sterile gate Nov 25, 2024, 10:26 PM

#

fair enough!

tulip gazelle Nov 26, 2024, 3:51 AM

#

tulip gazelle <@702353318084608031> <@287214596484235265> what's a good pedagogical source for...

I'm currently using "Modular Duality in Deep Learning" btw

sullen estuary Nov 26, 2024, 5:58 AM

#

tulip gazelle I'm currently using "Modular Duality in Deep Learning" btw

That's a good starting point imo. There's also "Old Optimizer, New Norm: An Anthology", which is very pedagogical.

tulip gazelle Nov 26, 2024, 6:19 AM

#

One of the worst things about chronic migraines is what they do to my memory. I spent five hours yesterday reinventing the analysis I did off-the-cuff (#research message) a month ago >.>

#

On the bright side I now strongly believe that empirical investigation into the distribution of ||Tv||/||v|| in a trained model using semantically meaningful inputs is a good idea and have written this fact down in three locations so as to (hopefully) not forget it again.
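One way to start on that investigation, sketched under loud assumptions: T and the inputs below are random stand-ins (hypothetical), not a trained model or semantically meaningful data. The useful baseline is that every ratio ‖Tv‖/‖v‖ is bounded by the smallest and largest singular values of T, so real inputs can be compared against where they fall inside that interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained weight matrix (random here, not trained)
T = rng.standard_normal((64, 32)) / np.sqrt(32)
# Hypothetical stand-in for semantically meaningful inputs (random Gaussian vectors)
V = rng.standard_normal((10_000, 32))

ratios = np.linalg.norm(V @ T.T, axis=1) / np.linalg.norm(V, axis=1)
s = np.linalg.svd(T, compute_uv=False)

# Each ratio lies in [sigma_min, sigma_max]; the lower bound is valid because
# T has at least as many rows as columns
assert s.min() - 1e-9 <= ratios.min() and ratios.max() <= s.max() + 1e-9

print(f"sigma_min={s.min():.3f}  sigma_max={s.max():.3f}")
print("ratio quantiles (5%, 50%, 95%):", np.quantile(ratios, [0.05, 0.5, 0.95]).round(3))
```

With real activations in place of V, a distribution concentrated near sigma_max would suggest inputs aligned with the top singular directions, which is the kind of alignment discussed earlier in the thread.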

dusky marsh Nov 26, 2024, 3:25 PM

#

sterile gate fair enough!

It might be accurate to say that a model regulates inputs by manipulating them until they are accurate outputs.

A model could be considered a map of how to turn a map of one part of the territory, the input data, into a different map of the territory, the output data.

sullen estuary Nov 27, 2024, 6:16 AM

#

<@&1098008274361663629> The recording is finally here. Y'all can thank @pliant peak for having recorded on his end.

Note that it is unlisted, and will be removed after a week. https://youtu.be/ZZPHm8eASOE

YouTube

EleutherAI

Math Reading Group - Gradient Dualization - Jeremy Bernstein and Fr...


#

cc @sinful forge @royal osprey

gloomy cloud Nov 27, 2024, 6:32 AM

#

what are the extra insights of this talk, for people who have already read the papers?

sullen estuary Nov 27, 2024, 7:17 AM

#

gloomy cloud what are the extra insights of this talk, for people who have already read the p...

Not much, I would say. There was some back and forth between Franz and Jeremy, around the middle of the talk, about some of the questions that had been discussed previously.

slow parcel Nov 27, 2024, 8:59 PM

#

sullen estuary <@&1098008274361663629> The recording is finally here. Y'all can thank <@7413680...

Thanks for posting this!

gloomy cloud Nov 27, 2024, 9:01 PM

#

My current thought about orthonormalization replacing Adam is that all of its benefits come from the rows and columns being equal-sized (like in Adafactor). I feel that pairs of rows being orthogonal is a downside.

#

A mental test case is a 2-input, 2-output model with a 2x2 matrix. The first channel of the input is scaled by 3x, and the first channel of the output is scaled by 3x. Adam can solve this perfectly, but if you orthonormalize the gradients, there is a lot of extra noise.
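A sketch of that test case under a simplifying assumption (hypothetical): the channel rescaling is modeled as multiplying the gradient's first row and first column by 3, i.e. G → D G D with D = diag(3, 1). Adam's very first step is m/√v = g/|g| = sign(g) (ignoring eps), which is unchanged by positive per-row/per-column scaling, while the orthonormalized update U Vᵀ (computed here exactly via SVD for clarity) rotates to a different direction.

```python
import numpy as np

def orthonormalize(G: np.ndarray) -> np.ndarray:
    """Replace G = U S V^T by U V^T, i.e. set every singular value to 1."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def adam_first_step(G: np.ndarray) -> np.ndarray:
    """Adam's first update direction: m / sqrt(v) = g / |g| = sign(g), ignoring eps."""
    return np.sign(G)

G0 = np.array([[1.0, 1.0], [0.0, 1.0]])  # hypothetical gradient before rescaling
D = np.diag([3.0, 1.0])                   # first output row and first input column scaled by 3
G_scaled = D @ G0 @ D                     # = [[9, 3], [0, 1]]

# Elementwise (Adam-style) direction ignores positive row/column scaling
assert np.array_equal(adam_first_step(G0), adam_first_step(G_scaled))

# Orthonormalized direction does not: the rotation angle changes
print(orthonormalize(G0))        # [[2, 1], [-1, 2]] / sqrt(5)
print(orthonormalize(G_scaled))  # [[10, 3], [-3, 10]] / sqrt(109)
```

This only illustrates the invariance difference in a single step; it is not a claim about the full training dynamics of either optimizer.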

sullen estuary Dec 16, 2024, 3:14 PM

#

Hey all, I'm thinking of restarting the reading group after New Year's. The goal would be to get into optimization (classical and stochastic), but before that I want to do some category theory as a (mostly) shitpost. The question is whether y'all would prefer to get the basics from a textbook (I have already found a good one) or directly read some of the Cats4AI papers.

#

<@&1098008274361663629>

sullen estuary Dec 17, 2024, 3:15 PM

#

sullen estuary

Poll: Which way, western man?
Winning answer (ID 2): Category theory for ML papers first
Votes: 6 of 8

sullen estuary Dec 17, 2024, 7:57 PM

#

Well, the people have spoken. I'll prepare some papers to read in the coming days

sullen estuary Dec 25, 2024, 12:07 PM

#

Alright <@&1098008274361663629> Next meeting on <t:1736715600>. We will be talking about some Category Theory.

The recommended reading is the following

For general background on Category Theory, I'd suggest either Awodey's Category Theory or Milewski's Category Theory for Programmers. Awodey is aimed at both math and CS people; I'd recommend reading chapters 1, 2 and 3 and skimming chapter 5. Consult at your own leisure.
Categorical Foundations of Gradient-Based Learning: a short paper on applications to ML
The simple essence of automatic differentiation
A category theory framework for Bayesian learning

Bartosz Milewski's Programming Cafe

Bartosz Milewski

Category Theory for Programmers: The Preface

Table of Contents Part One Category: The Essence of Composition Types and Functions Categories Great and Small Kleisli Categories Products and Coproducts Simple Algebraic Data Types Functors Functo…

arXiv.org

Categorical Foundations of Gradient-Based Learning

We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a varie...

arXiv.org

The simple essence of automatic differentiation

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution. This paper develops a simple, generalized AD algorithm...

#

https://discord.gg/eleutherai?event=1321449804009701377

white obsidian Dec 27, 2024, 11:26 AM

#

sullen estuary Alright <@&1098008274361663629> Next meeting on <t:1736715600>. We will be talki...

Hi! What is the time zone for the meeting?

sullen estuary Dec 27, 2024, 11:39 AM

#

white obsidian Hi! What is the time zone for the meeting?

The time stamps should automatically display in your local timezone

sullen estuary Jan 12, 2025, 8:58 PM

#

<@&1098008274361663629> The meeting is about to start. I've moved it to another VC to avoid interfering with the other RG

jovial sand Jan 13, 2025, 3:26 AM

#

A wee bit of intuition - https://www.youtube.com/watch?v=fRJMggrpxRU&list=PLffJUy1BnWj2dBiTZgQ1IDetIQQveRxYO&index=1

YouTube

All Angles

What is a monoid? | #SoME1

#abstractalgebra #monoids #some1

Monoids are everywhere in mathematics, but what are they? And why are they so useful? This video uses a simple example to show you exactly what the 4 rules of monoids are all about. I made this video for the 2021 Summer of Math Exposition. Enjoy!



#

Associativity of the binary operation is exactly the condition that makes the induced ternary operation well defined: (a·b)·c = a·(b·c), so a·b·c means the same thing however it is bracketed.

#

Since associativity makes parallel reduction possible, identifying monoids goes hand in hand with identifying opportunities for parallelism, e.g. MapReduce.
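A minimal sketch of why associativity is what licenses parallel reduction: a tree-shaped (pairwise) reduction, the shape a parallel reduce would use, agrees with a sequential left fold for an associative operation like addition, but not for a non-associative one like subtraction. The `tree_reduce` helper is hypothetical and runs sequentially here; each pair in a round could be handed to a different worker.

```python
from functools import reduce
from operator import add, sub

def tree_reduce(op, xs):
    """Pairwise (tree-shaped) reduction; assumes xs is non-empty.
    Element order is preserved, so commutativity is not assumed."""
    xs = list(xs)
    while len(xs) > 1:
        # Combine neighbours; each pair is independent of the others
        paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # odd element out is carried to the next round
            paired.append(xs[-1])
        xs = paired
    return xs[0]

data = [1, 2, 3, 4]
# Associative op: left fold and tree reduction agree
assert reduce(add, data) == tree_reduce(add, data) == 10
# Non-associative op: the bracketing changes the answer
print(reduce(sub, data))       # ((1 - 2) - 3) - 4 = -8
print(tree_reduce(sub, data))  # (1 - 2) - (3 - 4) = 0
```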

jovial sand Jan 13, 2025, 4:03 AM

#

Monoids being the simplest associative structure (set + associative binary operation + identity).

sullen estuary Jan 13, 2025, 7:12 AM

#

jovial sand Monoids being the simplest associative structure (set + associative binary opera...

You can actually remove the identity to get a semigroup
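A small sketch of what dropping the identity costs in practice: with a monoid, folding an empty input is well defined (it returns the identity), while a semigroup fold needs at least one element. The helper names `mconcat` and `sconcat` are borrowed from Haskell's conventions for illustration; the Python functions themselves are hypothetical.

```python
from functools import reduce
from operator import add

def mconcat(op, identity, xs):
    """Fold with a monoid: the identity makes the empty fold well defined."""
    return reduce(op, xs, identity)

def sconcat(op, xs):
    """Fold with a semigroup: there is no identity, so input must be non-empty."""
    xs = list(xs)
    if not xs:
        raise ValueError("a semigroup fold needs at least one element")
    return reduce(op, xs)

assert mconcat(add, 0, []) == 0      # (int, +, 0) is a monoid
assert sconcat(max, [3, 1, 4]) == 4  # (int, max) is a semigroup: ints have no least element
```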

#

But yeah, associativity is what you want for parallel computing

jovial sand Jan 14, 2025, 4:04 AM

#

Neat. This is pretty cool

#

credit: Tom Leinster

jovial sand Jan 14, 2025, 4:30 AM

#

operad Δ (finite probability spaces)
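A sketch of composition in the operad Δ, assuming the standard definition: given p in Δₙ and qᵢ in Δ_{kᵢ}, the composite p ∘ (q₁, …, qₙ) is the distribution (p₁q₁₁, …, p₁q₁ₖ₁, …, pₙqₙₖₙ). Shannon entropy interacts with this composition through the chain rule H(p ∘ (q₁, …, qₙ)) = H(p) + Σᵢ pᵢ H(qᵢ), which is the property Leinster's operadic treatment of entropy builds on. The function names are illustrative.

```python
import math

def compose(p, qs):
    """Operadic composition in Delta: p in Delta_n, qs[i] in Delta_{k_i}
    -> (p_1 q_11, ..., p_1 q_1k1, ..., p_n q_nkn) in Delta_{k_1 + ... + k_n}."""
    assert len(p) == len(qs)
    return [p_i * q_ij for p_i, q in zip(p, qs) for q_ij in q]

def entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

p = [0.5, 0.5]
qs = [[0.5, 0.5], [1.0]]
r = compose(p, qs)  # [0.25, 0.25, 0.5]

# Chain rule: H(p o (q_1, ..., q_n)) = H(p) + sum_i p_i H(q_i)
lhs = entropy(r)
rhs = entropy(p) + sum(p_i * entropy(q) for p_i, q in zip(p, qs))
assert math.isclose(lhs, rhs)  # both equal 1.5 bits
```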

sullen estuary Jan 20, 2025, 11:03 PM

#

https://youtu.be/xMkOx5Gr0rA

YouTube

EleutherAI

Math Reading Group - Category Theory (12/01/2025)


pale orbit Jan 24, 2025, 11:22 AM

#

I just joined the Discord, and this reading group is so cool! Is the next session already scheduled?

sullen estuary Jan 24, 2025, 11:57 AM

#

pale orbit I entered now to the discord and this reading group is so cool! Is the next sess...

Soon™️

sullen estuary Feb 17, 2025, 8:20 PM

#

Alright. <@&1098008274361663629> Sorry for taking so long. Next meeting on <t:1742763600>. We will be talking about the theory of optimization.

The main source will be chapter 5 of Francis Bach's recent book: https://www.di.ens.fr/~fbach/ltfp_book.pdf

To go deeper you can look at the relevant chapters of Boyd and Vandenberghe's Convex Optimization https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf and Bubeck's review https://arxiv.org/abs/1405.4980

Have a nice week!

arXiv.org

Convex Optimization: Algorithms and Complexity

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by ...

#

Now for the less good news. You may have noticed that the time between sessions has been getting longer. This is largely because I have much less time for this reading group than I used to. While I would like this reading group to continue, it will have to be at a reduced pace unless people volunteer to prepare sessions or take over from me.

sullen estuary Feb 23, 2025, 1:20 PM

#

https://discord.gg/eleutherai?event=1343210871559557245

#

forgot to create the event

sullen estuary Mar 23, 2025, 9:02 PM

#

<@&1098008274361663629> the meeting is starting now

slow parcel Mar 23, 2025, 10:05 PM

#

@sullen estuary This is the line search in TRPO. Page 14 and 15. https://arxiv.org/pdf/1502.05477
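For reference, a generic backtracking line search sketched on a toy function. This is the textbook Armijo variant, not TRPO's exact procedure: as far as we can tell from the linked pages, TRPO shrinks the step exponentially along its search direction and accepts a step based on its surrogate objective and KL constraint rather than the Armijo condition used here. The function name and default parameters are illustrative assumptions.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, direction, t0=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Generic Armijo backtracking: shrink t until
    f(x + t d) <= f(x) + c * t * <grad f(x), d>."""
    fx = f(x)
    slope = float(np.dot(grad_f(x), direction))  # negative for a descent direction
    t = t0
    for _ in range(max_iter):
        if f(x + t * direction) <= fx + c * t * slope:
            return t
        t *= shrink  # exponentially decaying step size
    return 0.0  # no acceptable step found: reject the update

f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2 * x
x = np.array([1.0])
t = backtracking_line_search(f, grad_f, x, -grad_f(x))
print(t)  # 0.5
```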

dusky marsh Mar 23, 2025, 10:06 PM

#

https://arxiv.org/abs/2109.03920

Interpretability is almost like trying to solve the inverse problem of identifying an objective function from observing a model's parameters and outputs, except instead of finding an objective function that describes the cost of incorrect outputs, we're trying to find a utility function that contains concepts humans recognize as familiar.

arXiv.org

Inverse Optimization: Theory and Applications

Inverse optimization describes a process that is the "reverse" of traditional mathematical optimization. Unlike traditional optimization, which seeks to compute optimal decisions given an objective and constraints, inverse optimization takes decisions as input and determines an objective and/or constraints that render these decisions approximate...

dusky marsh Mar 23, 2025, 10:29 PM

#

We're looking to solve the inverse problem, but with the objective function we discover parameterized in a way that's semantically meaningful to humans, so rather than as something like (yhat-y)^2, instead as "this model likes turning static into pictures of dogs".

sullen estuary Apr 8, 2025, 8:23 PM

#

https://youtu.be/NAio-8To7EU

YouTube

EleutherAI

Math Reading Group - Convex Optimization (23/03/2025)


#

<@&1098008274361663629> Hey y'all. I finally got around to uploading the last recording. As I said at the end of it, I just don't have enough time or energy to run this reading group at any serious capacity anymore, so this was pretty much the last Math reading group.

It was fun to organize, and I used it as an excuse to learn/brush up on things on more than a few occasions, and I thank everyone who participated in this group.

tulip gazelle Apr 11, 2025, 6:37 PM

#

Thank you for all your work @sullen estuary ❤️

true field Feb 6, 2026, 5:46 PM

#

Hey, why don't we have voice or video channels

#

to study together?
