I'm working on a method for detecting changes in the environment dynamics in RL. My idea is to detect whether the dynamics change within a trajectory. For example, we take a snippet of the trajectory and want to know whether it could have been generated by the environment dynamics that we learned. For this task we can generate many examples with the real dynamics, but we can't generate any examples with changed dynamics. So we can treat this as a binary classification problem where we want to determine whether the trajectory in the snippet could have been generated by the dynamics or not. Note that we don't know the dynamics and have to learn them. What approach can be used to train this? Usually, for binary classification tasks where data is available from only one class, we might use one-class methods. From my understanding, these are used for anomaly detection (AD). However, AD often assumes anomalies are rare. For this problem, that may not be the case. In fact, a trajectory provided at inference might have a change in the dynamics near the beginning, in which case the rest of the trajectory follows the changed dynamics. What other approaches could be used for training in this scenario? I have thought about synthetically generating trajectories with changed dynamics: we don't know how the dynamics work, but we could generate trajectories by skipping states, mixing two trajectories, etc. These would correspond to some possible changes but not to all of them. Perhaps they can be somewhat useful. Are there any other ideas?
#Methods for training binary classification models when data is only available from one class
So I'm not familiar with RL, I'm just gonna brainstorm here if that's cool. I'm also pinging @sage hawk as they're more familiar with RL than me. But in general, we can frame this as "how far is this deviating from the current dynamics?"
One approach: if you have some sort of vector representation of the overall dynamics, you can compute cosine similarity and set a threshold to start with. One-class methods don't necessarily require anomalies to be rare; the anomalies just have to be far enough from the normal output that they don't look like it in vectorized form.
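A minimal sketch of the cosine-similarity check, assuming we already have a reference embedding of the normal dynamics and an embedding of the snippet under test. The vectors and the `0.9` threshold are made up for illustration; in practice the threshold would have to be tuned on held-out normal data.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

def looks_normal(embedding: np.ndarray, reference: np.ndarray,
                 threshold: float = 0.9) -> bool:
    """True if the snippet embedding is close enough to the reference."""
    return cosine_similarity(embedding, reference) >= threshold

# Hypothetical embeddings, not produced by any real model.
reference = np.array([1.0, 0.0, 1.0])
close = np.array([0.9, 0.1, 1.1])
far = np.array([-1.0, 1.0, 0.0])

print(looks_normal(close, reference))  # similar direction
print(looks_normal(far, reference))    # very different direction
```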
One other fun idea, though, is that once you have a network that can generate the normal ones, you can also analyze which parameters have an outsized effect. I don't know if this works, to be honest, but in SpQR (https://arxiv.org/abs/2306.03078), it's found that the reason compression of weights often fails is that about 1% of the weights have an outsized impact on the final output. Thus, by compressing all other weights but keeping these important weights the same, we can achieve near-lossless compression. If we take that idea, find those weights, and then purposefully edit them, it might be a parameterizable synthesizer that can generate all kinds of changed dynamics in a controlled manner. I'm not sure how to implement this, but that's the rough idea.
Thanks for giving some ideas. I presume that for your first idea, you want to capture the dynamics with an embedding and then compare the embeddings. My problem with this approach is that a snippet won't contain the entire trajectory. If I train on the entire trajectory, then it may be difficult. A small snippet may not contain enough information about the environment dynamics. Potentially we can combine multiple snippets together and then pass that through an MLP that can create a dense embedding. However, how can that be trained? Normally, we might train this using contrastive techniques but we can't do that in this case. That's why I thought that a network should directly take a representation of the sequence and decide if it can be generated by the dynamics or not.
In terms of your second idea, I actually was thinking about something similar. My original idea was something like a transformer that can understand the relationship between states and then an MLP that takes these representations and labels the sequence of representations as either true or false. So in this idea I was thinking that perhaps we can use a model that is being trained to produce sequences by injecting some bias. If we assume that the dynamics is a black box function, y=f(x), perhaps adding bias would result in y' = f(x+b). Perhaps following your idea, we can change the weights by adding some bias to these important weights
or we can just change the bias in layers where these weights are located
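A rough sketch of the y' = f(x + b) idea, assuming we have some learned one-step model `f`. Here `f` is only a stand-in (a fixed 2x2 matrix followed by `tanh`), not a trained dynamics model, and `rollout` and `input_bias` are hypothetical names. It just shows that rolling the same model out with an input bias gives a trajectory that starts at the same state and then drifts away.

```python
import numpy as np

# Stand-in weights, NOT a trained model (assumption for illustration only).
W = np.array([[0.5, -0.3],
              [0.2,  0.4]])

def f(x: np.ndarray) -> np.ndarray:
    """Stand-in for a learned one-step dynamics model y = f(x)."""
    return np.tanh(W @ x)

def rollout(x0: np.ndarray, steps: int, input_bias=None) -> np.ndarray:
    """Roll the model out; with a bias b each step computes f(x + b)."""
    b = np.zeros_like(x0) if input_bias is None else input_bias
    states = [x0]
    for _ in range(steps):
        states.append(f(states[-1] + b))
    return np.stack(states)

x0 = np.array([0.1, -0.2])
normal = rollout(x0, steps=5)
shifted = rollout(x0, steps=5, input_bias=np.array([0.3, 0.0]))
print(normal.shape, shifted.shape)
```

Whether trajectories perturbed this way resemble real dynamics changes is an open question; the bias only explores changes the stand-in model can express.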
Just wondering, would multiple snippets make up a bulk of the trajectory?
one trajectory consists of the starting state, intermediate states and the goal state. A snippet is just a portion of the trajectory. It's possible to have overlapping snippets, potentially to help with training and to have more data. But in general yes, multiple snippets can make up the trajectory if they are all from that trajectory
if we were to take snippets from different trajectories and combine them then it's not guaranteed
Multi-input sounds possible, but I haven't had a formulated idea yet. How are these states represented? (dimension and dtype I mean)
each state is a vector. The values correspond to different properties. For example, in the Mountain Car problem (this is an environment from Gymnasium, formerly OpenAI Gym) the state is represented by 2 values: position and velocity. The action is a discrete number from 0 to 2. Hence, the action can perhaps be one-hot encoded and the state is just that 2-dimensional vector. So the state dtype is float and the action is an integer. Potentially, in other environments, actions can be vectors of floats too
also, in some environments states can also be represented as integers if there are a few discrete states
but atm I'm focusing on the case of states being vectors of floats and actions being integer numbers (potentially vectors of floats)
@wild hare I didn't make the message as a reply, not sure if you saw it, my bad
Just saw it now. Sorry, I was away. But dang, this is far from what I'm familiar with. I was trying to think of a multi-input network and creating an auto-encoder or something so that there's a translation going on, but now I see how varied these can be, and it'd be quite a hassle to set this up
I don't think you should be worried. We can just think of one of the cases. And we can also potentially make them in the same format. The discrete ones we first one-hot encode and then the states we normalise. We can then concatenate the two vectors and have inputs.
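A small sketch of that input format, assuming Mountain Car-like data (2 float state values, 3 discrete actions). `encode_transitions` is a hypothetical helper, and the normalisation statistics here come from the same toy array only for brevity; they would normally be computed on the training set.

```python
import numpy as np

def encode_transitions(states, actions, n_actions, state_mean, state_std):
    """Normalise float states, one-hot encode integer actions, concatenate."""
    states = np.asarray(states, dtype=np.float64)
    actions = np.asarray(actions, dtype=np.int64)
    normalised = (states - state_mean) / state_std
    one_hot = np.eye(n_actions)[actions]      # shape (T, n_actions)
    return np.concatenate([normalised, one_hot], axis=1)

# Toy values shaped like Mountain Car: (position, velocity) and an action in {0, 1, 2}.
states = np.array([[-0.50, 0.00], [-0.49, 0.01], [-0.47, 0.02]])
actions = np.array([2, 2, 0])
mean, std = states.mean(axis=0), states.std(axis=0) + 1e-8
inputs = encode_transitions(states, actions, 3, mean, std)
print(inputs.shape)  # one row per time step: 2 state values + 3 action slots
```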
this also discusses options - https://quoraengineering.quora.com/Unifying-dense-and-sparse-features-for-neural-networks
I don't think that input would be a problem since each network will be trained for just the specific environment. The interesting thing is how to train it. Would a one class approach be the best or inferring threshold from validation data or the idea with generating the dynamics using the model that we train to understand the dynamics
If you have time, you can probably try both, just to see what's the difference. If time is constrained and you already have validation data, definitely the validation data. If you can frame it to become a self-supervised learning model, then you probably can get good embeddings out of that model to compare similarity already. The same model can still be used for everything you mentioned too
I'm just not sure how you're gonna frame it like that lol
#7 best model for Atari Games on atari game (Human World Record Breakthrough metric)
LOL
Not sure how relevant but still good to see
the current idea I sorta have is a BERT-style model that understands the dynamics inside a sequence. Its output is then passed into an MLP that decides whether the true dynamics or false dynamics generated that sequence. Perhaps I can integrate such a classifier directly into the model and not need a separate one. For example, BERT has a CLS token that can be used. I wanted to avoid using transformers and perhaps use something else. I found this other architecture called gMLP, which is interesting since it sorta does what a transformer does but with MLPs. Idk if it has advantages over a transformer, but they found it to be comparable in performance. I think it has slightly fewer parameters.
Hence, I can get an embedding of each state from the transformer, which is basically a localised embedding. Then I can perhaps create a snippet embedding using an MLP. But that snippet embedding doesn't contain all the dynamics of the environment. In fact, two snippets of the same trajectory may have quite different embeddings, since one snippet may encode just the dynamics of one action being taken in the environment. Ideally, to learn the dynamics, we have to observe many actions in many states
You can concatenate embeddings of all snippets and then perform self-attention too. I know you say you don't want full transformers, but at least this part is doable. This may be helpful to determine the pattern for overlap vs. not overlap automatically too
wouldn't that be similar to just using the entire sequence as input in the first place?
I just thought you couldn't get the entire sequence in the first place, and had to split into snippets
I can. My idea for splitting comes from time series where we often don't use the entire sequence but a window (sliding window) approach
Also this means the model can be smaller
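For reference, a minimal sketch of the sliding-window split, assuming a trajectory stored as an array of shape `(T, state_dim)`. `sliding_snippets`, `window` and `stride` are hypothetical names; a stride smaller than the window gives overlapping snippets.

```python
import numpy as np

def sliding_snippets(trajectory, window: int, stride: int = 1) -> np.ndarray:
    """Cut a (T, state_dim) trajectory into snippets of `window` steps."""
    trajectory = np.asarray(trajectory)
    if window > len(trajectory):
        raise ValueError("window is longer than the trajectory")
    starts = range(0, len(trajectory) - window + 1, stride)
    return np.stack([trajectory[s:s + window] for s in starts])

# Toy trajectory with 10 time steps and 2 state values per step.
trajectory = np.arange(20, dtype=float).reshape(10, 2)
snippets = sliding_snippets(trajectory, window=4, stride=2)
print(snippets.shape)  # (number of snippets, window, state_dim)
```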
Do you know anyone that is familiar with physics?
I can perhaps find time to ask, but based on what you've described, it's also possible to find ideas based on pure time series techniques (perhaps augmented with neural networks)
I don't think a pure approach is viable. Usually we can say that time series are somewhat deterministic. Or we model them like that. In RL dynamics can be non-deterministic so it's different imo.
for example, a friend's team was trying to predict the time taken to get from point A to point B from CCTV footage. It was hard because there are all sorts of angles, and the team wanted the client to annotate data and then use transformers. I mentioned the Kalman filter, because I thought it would probably be helpful in that case, and he later figured out a neural-network-assisted Kalman filter
But after all, dynamics are, physics
In some environments yes, in others not really. In general, the dynamics is a probability function
Very broadly speaking
It doesn't mean physics doesn't deal with probabilities though, but if you don't like the idea of exploring that angle, that's fine too
Maybe I can give an idea of how dynamics can change in one environment
It's again the Mountain Car environment
One variable of it is force. The higher the force, the faster the car can go in this environment, per se
So if we have sequence
s1 -> s2 -> s3 -> s4 -> s5 -> s6
In the original dynamic
That is, this sequence corresponds to 6 time points
Now if we were to increase the force, the sequence may change to
s1 -> s3 -> s5 -> s7 -> s9 -> s11
Like skipping states
We can generate such examples
But we can't do the opposite which would be if the second sequence was the original one and we were to reduce the force
Like, we can't generate all possible examples
I call this example a proxy of the dynamical change because it kind of resembles one
But we can't generate all such important examples
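A sketch of how such proxy examples could be generated, assuming trajectories are arrays with one state per row and ignoring actions for simplicity. `skip_states` mirrors the s1 -> s3 -> s5 example, and `splice` is the "mixing two trajectories" idea; both are hypothetical helpers and only cover a small slice of the possible dynamics changes.

```python
import numpy as np

def skip_states(trajectory, step: int = 2) -> np.ndarray:
    """Keep every `step`-th state, mimicking faster motion."""
    return np.asarray(trajectory)[::step]

def splice(traj_a, traj_b, cut: int) -> np.ndarray:
    """First `cut` states from traj_a, then traj_b from index `cut` onwards."""
    return np.concatenate([np.asarray(traj_a)[:cut], np.asarray(traj_b)[cut:]], axis=0)

# States labelled 1..11 so the result can be compared with s1 -> s3 -> ... -> s11.
original = np.arange(1, 12).reshape(-1, 1)
proxy = skip_states(original, step=2)
print(proxy.ravel().tolist())

# Mixing: start like `original`, then continue with another trajectory.
other = np.arange(101, 112).reshape(-1, 1)
mixed = splice(original, other, cut=4)
print(mixed.ravel().tolist())
```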
Can we perhaps have a neural network learn a dynamics function and learn how to parameterise it? For example, can we learn f() and that for a given set of examples the parameters were x1 and x2. Then perhaps we can just change these parameters but perhaps not possible
It can learn f(x1, x2) but not f() and x1, x2 separately
Technically, as you'd know, neural networks don't learn a function. They approximate one. I don't fully grasp your limitations, partly because I'm unfamiliar with the mechanisms; otherwise, off the top of my head are questions like
- Exactly why can't this be reverse engineered if the force is known?
- If the force is unknown, is it really the sequence we're trying to predict, or the force?
- How is this not physics related? Are we sure that, out of all the physics math out there, there isn't anything to model around this to approximate it? (And here I don't mean a deterministic formula; there are various research papers on physics-informed neural networks that demonstrate how much more effective they are)
the idea of the project is to build OOD dynamics detection that would work in many environments. Usually, we don't have access to the underlying environment mechanics. We query the environment which tells us what the next state is. Hence, we can't really reverse engineer it. We can approximate the dynamics from observations but we can't really reverse engineer it.
In the example environment I gave, the force is unknown, so we have observations. However, we don't know the underlying mechanics so we can't approximate the force. All we can hope to do is to have a model that can approximate the dynamics with the default force. When that force changes, the dynamics will be different. If it is a prediction model, then it should make more erroneous predictions with the changed dynamics. However, a prediction model is probably not ideal because it may only predict one possible trajectory. In a non-deterministic environment, that would be a problem. Hence, we want something discriminative that can look at a trajectory and say that it thinks that given the dynamics it has observed, that trajectory is possible or not. This is difficult to do.
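One possible way to make a prediction model tolerate noise is to score how plausible each observed transition is instead of comparing against a single predicted trajectory. The sketch below is an assumption-heavy illustration, not the method discussed above: it uses a simplified re-implementation of Mountain Car-like dynamics with added Gaussian noise, a least-squares model on generic features, and a standardised one-step error as the score. All function names and the noise level are hypothetical.

```python
import numpy as np

GRAVITY = 0.0025  # MountainCar-style constant

def step(pos, vel, action, force, rng, noise=1e-4):
    """Simplified MountainCar-like transition with Gaussian noise on velocity."""
    vel = vel + (action - 1) * force - np.cos(3 * pos) * GRAVITY
    vel = float(np.clip(vel + rng.normal(0.0, noise), -0.07, 0.07))
    pos = float(np.clip(pos + vel, -1.2, 0.6))
    if pos <= -1.2 and vel < 0:
        vel = 0.0
    return pos, vel

def collect(n_episodes, force, rng, horizon=200):
    """Random-policy transitions: states, actions, next states."""
    S, A, S2 = [], [], []
    for _ in range(n_episodes):
        pos, vel = rng.uniform(-0.6, -0.4), 0.0
        for _ in range(horizon):
            a = int(rng.integers(3))
            npos, nvel = step(pos, vel, a, force, rng)
            S.append((pos, vel)); A.append(a); S2.append((npos, nvel))
            pos, vel = npos, nvel
    return np.array(S), np.array(A), np.array(S2)

def features(S, A):
    """Generic features: cubic polynomial in position, velocity, one-hot action."""
    p, v = S[:, 0], S[:, 1]
    return np.column_stack([p, p**2, p**3, v, np.eye(3)[A]])

def fit(S, A, S2):
    """Least-squares model of the state change, plus per-dimension residual std."""
    X, Y = features(S, A), S2 - S
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma = (Y - X @ W).std(axis=0)
    return W, sigma

def score(S, A, S2, W, sigma):
    """Mean squared standardised one-step error; larger means less plausible."""
    z = ((S2 - S) - features(S, A) @ W) / sigma
    return float(np.mean(np.sum(z**2, axis=1)))

rng = np.random.default_rng(0)
W, sigma = fit(*collect(20, 0.001, rng))                  # learn from default force
score_normal = score(*collect(2, 0.001, rng), W, sigma)   # same dynamics
score_changed = score(*collect(2, 0.002, rng), W, sigma)  # doubled force
print(score_normal, score_changed)
```

The score is computed per transition, so it can also be averaged over a snippet rather than a whole trajectory. A Gaussian residual is a strong assumption; environments with multi-modal transitions would need a richer density model.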
What I'm thinking about at the moment is to learn the relationship of the states, then use that to learn an embedding that can understand this relationship. Then given a starting state and this embedding learn a decoder that can reconstruct this sequence (even autoregressively). Then we can compare the two sequences. Idea is that we have one model that understands the sequence and can build an embedding that encodes the dynamics and another model that given a starting state (or states) we can generate the same sequence. If this works, that would mean the embedding understands the dynamics inside the snippet.
we may then proceed to trying to find a boundary of the embeddings and claim that embeddings outside this boundary correspond to OOD dynamics
However, I'm not sure what would be a good way to find this boundary
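One simple option for the boundary, offered as a sketch rather than a recommendation: fit a Gaussian to embeddings of normal snippets, use the squared Mahalanobis distance as the score, and take the threshold as a high quantile of the scores on held-out normal data. The embeddings below are random stand-ins, and the `0.99` quantile is an arbitrary choice that fixes the false-alarm rate on normal data at roughly 1%.

```python
import numpy as np

def fit_gaussian(embeddings: np.ndarray):
    """Mean and inverse covariance of the normal embeddings."""
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + 1e-6 * np.eye(embeddings.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(x: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of x from the mean."""
    d = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 4))           # stand-in normal embeddings
val = rng.normal(size=(500, 4))              # held-out normal embeddings
shifted = rng.normal(loc=4.0, size=(200, 4)) # stand-in OOD embeddings

mean, cov_inv = fit_gaussian(train)
threshold = np.quantile(mahalanobis_sq(val, mean, cov_inv), 0.99)

flag_rate_val = float(np.mean(mahalanobis_sq(val, mean, cov_inv) > threshold))
flag_rate_shifted = float(np.mean(mahalanobis_sq(shifted, mean, cov_inv) > threshold))
print(flag_rate_val, flag_rate_shifted)
```

This only needs data from the normal class, which fits the setting here, but it assumes the embeddings of normal snippets form a single roughly elliptical cluster.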
we also discussed the idea of generating sequences with OOD dynamics somehow