#Recursive template dependency

130 messages Β· Page 1 of 1 (latest)

prime light
#
template <auto S2next>
struct SM1 {
    static void state1() {
        return state2();
    }
    static void state2() {
        return S2next();
    }
};

template <auto S2next>
struct SM2 {
    static void state1() {
        return state2();
    }
    static void state2() {
        return S2next();
    }
};

SM1<SM2::state1> sm12; // error: SM2 requires template argument 
SM1<SM2<SM1::state1>> sm12; // error: SM1 requires template argument 

Since both templates depend on each other, that's obviously not gonna work as is. I could write a combined SM1and2 class having all functions, but that doesn't let me compose the classes arbitrarily (I'll have SM3, and 4 and so on).
What design do I need in order for that to work?

One way to get something equivalent would be to manually combine SM1 and SM2 into another class:

struct Combined {
  struct SM1 {
    static void entry() { return A(); }
    static void A() { return SM2::entry(); }
  };

  struct SM2 {
    static void entry() { return B(); }
    static void B() { return SM1::entry(); }
  };
};

However, I would like to be able to compose Combined: template <class firstSM, class secondSM> class Combined; using MyCombined = Combined<SM1, SM2>;

orchid carbonBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

dreamy lava
#

Someone correct me if im wrong, but not sure thats gonna work. Its gonna be an infinitely long template specialization, no?

#

also, why would you need this?

prime light
#

I got SM1, that does stuff and at the end of state2 calls another function. That function can be something like itself (SM2) which also needs a function to call at some point. And if I want to have a (mutually) recursive algorithm, I need SM1 to call SM2 and SM2 to call SM1, hence my issue

dreamy lava
#

and then forward declare the classes ofc

prime light
dreamy lava
#

?

#

maybe im missunderstanding something about what you want

prime light
#

You can remove SM2 from my example, and just try to instantiate SM1 with anything that depends on SM1 as its template argument

dreamy lava
#

i just see know why you would neccessarily make this a template if you dont know at compile-time what type you need

prime light
prime light
# dreamy lava i just see know why you would neccessarily make this a template if you dont know...

The user of the whole thing will know at compile time what SM1 and SM2 (or their equivalenet in the new design) are, but I want them to be able to compose any classes together

Actual use case is something like this : I have some tree search algorithm represented as a tail recursive state machine, and I want to have that search algorithm sometimes call another search algorithm. In order for the whole thing to continue executing upon the nested algorithm returning, the nested algorithm has to know which function to call. And because the nested algorithm can be used with many enclosing algorithms, or even as an enclosing algorithm itself, it has to be templated

dreamy lava
#
struct Base
{
    Base* Next;

    virtual void Foo() = 0;

    void CallNext()
    {
        Next->Foo();
    }
};

struct One : public Base
{
    virtual void Foo() override
    {
        std::cout << "One\n";
    }
};

struct Two : public Base
{
    virtual void Foo() override
    {
        std::cout << "Two\n";
    }


};

int main()
{
    One o;
    Two t;
    o.Next = &t;
    t.Next = &o;

    o.CallNext();
    t.CallNext();

    return 0;
}

// COUT
two
one
prime light
# dreamy lava ``` struct Base { Base* Next; virtual void Foo() = 0; void CallNex...

Yeah that would do the job in the case I described, but I don't want to pay for virtual calls or indirection when they shouldn't be there.

The code I'd want (after instantiation of the templates) is something equivalent to the following.

struct CombinedSM {
  void state_SM1_1() { return state_SM1_2(); }
  void state_SM1_2() { return state_SM2_1(); }
  void state_SM2_1() { return state_SM2_2(); }
  void state_SM2_2() { return state_SM1_1(); }
};

Except I really don't want to write that myself because I have a bajillion different combinations of SMs

dreamy lava
prime light
# dreamy lava yeah, it was just an idea. its good to be sceptic about virtual calls. i gotta g...

I also need to have some members or typedef dependent on the other SMs, hence my template requirement.
Regarding premature optimization, I'm in a domain where every bit of performance is beneficial, so there's never a point where I'll say "alright this is fast enough" (although if the effort required to gain 0.1% in the SM thing is huge I might apply my effort elsewhere). These functions will be called about 100million times a second (when I do the Combined SM thing), so yeah any overhead here will actually hurt

#

also I'm doing C++ so I want my 0 cost abstractions goddammit 😑

dreamy lava
#

damn, what you working on?

#

some ai stuff?

prime light
#

yup

#

chess engine

dreamy lava
#

ah okay, nvm then lol

#

but all the unique states and next state will be able to be known at compile time, right?

prime light
#

Yes

dreamy lava
#

so every state should be able to call every state's function?

#

if you somehow knew the type of the baseclass ptr, you could just cast it to the type to avoid vtable

#

but yeah, there has to be some way to make templates kinda work

#

atleast, intuition tells me that yamikek

#

but not exactly like you did ofc.

#

man, why did you post this. im too interested to sleep peeposad

prime light
#

Each algorithm (represented as a state machine) has an entry state and an exit state. When entered (always in the same state), it does its thing, runs its own states, and then it's done. When it's done, it needs to return control to the caller. Except I don't do regular calls (because stack overflow) but tail calls with manual "call stack" return address setting. So when the SM is done, it pops the stack and calls whatever the "return address" is (the next state). The caller however could have called the nested SM from different states, and thus the nested SM could call different caller SM states upon being done.

Something kinda like that (in very rough definitely not valid c++)


template <Next>
struct SM1 {
  static void state_entry() {
    if (cond) { return state_1(); }
    else { return state_4(); }
  }
  static void state_1() {
    push_to_call_stack({.state_after_return = SM1::state_3});
    return Next::entry_state();
  }
  static void state_2() {
    push_to_call_stack({.state_after_return = SM1::state_4});
    return Next::entry_state();
  }
  static void state_3() {
    return state_4();
  }
  static void state_4() {
    auto frame = pop_call_stack();
    return call_state(frame.state_after_return);
  }
};

//SM2 is basically the same thing as SM1 (maybe with different states and a different graph of transitions)

SM1<SM2> sm;
sm.entry_state();
// Trace would be something like this:
SM1::entry
SM1::1 // we push SM1::3 as the return address on the "call stack"
  SM2::entry
  SM2::X // push SM2::Y to the stack
    SM1::entry
    SM1::4 // pop stack, call SM2::Y
  SM2::Y // we pop  the "call stack" and call whatever is at the return address (here SM1::3
SM1::3
SM1::4 // pop the stack, call nothing because we're done

(I have the call stack part working, my only issue is the recursive circular dependency SM1<SM2<SM1<SM2<SM1<....

prime light
prime light
prime light
orchid carbonBOT
#

@prime light Has your question been resolved? If so, type !solved :)

sullen pelican
prime light
sullen pelican
#

no, it doesn't

sullen pelican
# sullen pelican this sentence makes little sense to me

In order for the whole thing to continue executing upon the nested algorithm returning, the nested algorithm has to know which function to call.
like this makes no sense, if the "nested algorithm" returns, why does it need to know what the next thing to do actually is
that's the responsibility of the caller

sullen pelican
#

it just sounds like you're reinventing function calls

prime light
sullen pelican
#

based on your description I assume you have no local and the whole state is in some sort of global which I already dislike by default

#

but assuming that your "tail recursive transformation" requires a fundamentally recursive setup where the callee has to know the algorithm it "transforms" (no clue how you phrase that properly) and the parameter of the transformation must fundamentally be templated, then you must properly define what kind of mutual recursion you want to have

#

because at this point your description implies that you mus resolve the infinite chain of template parameter, no question asked, because there is never a point where you go back to a previously "known state"

#

for one SM1 is not a type/class and cannot be used, and assuming it was it would be a fundamentally distinct type from SM1<SM2>

#

assuming SM2 is a standalone type

#

so when you refer to "SM1" what kind of SM1 do you actually refer to

#

when you deal with recursion, there must exist a "base case"

#

without understanding of what you call "sm1" or where you "base case" stands in all this it's about impossible to provide suggestion

prime light
#

I want to be able to define some SM, maybe like that:

  template <auto Next>
  struct SM1 {
    static void entry() { return A(); }
    static void A() { return Next::entry(); }
  };
  template <auto Next>
  struct SM2 {
    static void entry() { return B(); }
    static void B() { return Next::entry(); }
  };

That I could instantiate like this:

struct Foo { static void entry() {} };
SM1<Foo> sm1_foo;
SM2<Foo> sm2_foo;

That part is easy.

But then I also want to be able to somehow "compose" different SMs, maybe like this:

using CombinedSM1_SM2 = Combine<SM1, SM2>;

so that I end up with something equivalent to this "manually instantiated" version of Combined<SM1, SM2> (the end result I wanna get, or something equivalent to it : I'm using structs here, but that's not required. I also don't care about having actual nested classes (I think)):

struct CombinedSM1_SM2 {
  struct SM1 {
    static void entry() { return A(); }
    static void A() { return SM2::entry(); }
  };

  struct SM2 {
    static void entry() { return B(); }
    static void B() { return SM1::entry(); }
  };
};
sullen pelican
#

the "combination" sounds like it should have an "entry point"

#

and whichever "entry" between the nested sm1/sm2 you try to use, it's infinitely recursive and senseless

prime light
# sullen pelican ok but to a large extent I have to ask *why* and how you hope to make this work

The goal is to be able to split a recursive algorithm into multiple steps, implement it as a state machine so that I can either run 1 step at a time or as many as possible, suspend/resume the algorithm, and test that the proper transitions are used. And then, combine different recursive algorithms state machines together to make a combined algorithm.
If you're familiar with tree searches / chess engines, the idea is that I can run a minimax algorithm, but once some condition is met (depth>X or leaf reached) run another algorithm from there (maybe MCTS, or whatever).

If what you're confused about is the transformation to a state machine of the recursive algorithm: In the linked file is an example of different transformations. I started from enumerate_paths (L110) implemented in the conventional recursive fashion, and then transformed it into an iterative version (L126), then a state machine (L209), then a tail recursive state machine (L306).

I do have locals and the state isn't global, but I didn't include any state or anything as to distill my issue to the simplest possible example.

prime light
# sullen pelican still doesn't make a lot of sense as far as I'm concerned, how do you use that "...

CombinedSM1_SM2::SM1::entry() for example.
But yeah you're right, it would need an entry point for it to be easier to call. I'm okay with writing the above example if I can't find a way to do that. But that's the next step of the process.
In the examples I gave they are infinitely recursive, but in practice I do have base case where the thing stops (dependent on the members of the SM). I didn't include them to make the example simpler.

sullen pelican
#

the "combination" operation makes no sense as a result of that and that's specifically what you're asking about

sullen pelican
prime light
sullen pelican
# prime light The goal is to be able to split a recursive algorithm into multiple steps, imple...

The goal is to be able to split a recursive algorithm into multiple steps, implement it as a state machine so that I can either run 1 step at a time or as many as possible, suspend/resume the algorithm, and test that the proper transitions are used. And then, combine different recursive algorithms state machines together to make a combined algorithm.
initially it just felt weird to want this "transformation" but if you have suspend/resume requirements then I guess, depending on the exact context you're in I can see why you'd want a "state machine" to deal with that

If you're familiar with tree searches / chess engines, the idea is that I can run a minimax algorithm, but once some condition is met (depth>X or leaf reached) run another algorithm from there (maybe MCTS, or whatever).
if this is your main motivation for the suspend/resume however I wouldn't have taken this approach as a first choice

If what you're confused about is the transformation to a state machine of the recursive algorithm
no that's not what I found confusing, the confusing thing was what semantics you expected out of your "combined" thing

I do have locals and the state isn't global
since you're using clang::musttail you pretty much have what I called (in an overly simplified way) no locals and only "global" state
the example in the file you shared uses a class instance to group the relevant pieces of information to that one instance, but my original remark was that any state you need to share across your states and entry points and whatnot have constraints as to where they can be, and you pretty much need a "side channel"

#

anyway I didn't look in detail at the file you shared, my first instinct is that I probably wouldn't have written the iterative version that way, and that I definitely would have tried to avoid a "state machine"

sullen pelican
prime light
#

the infinite template recursion or compute recursion? the compute one is definitely not there in an actual use of this

sullen pelican
#

what I'm trying to say is that I didn't realize how the infinite recursion between the functions was indeed something you could "ignore", in that a more proper setup wouldn't exhibit that issue

#

the compute recursion

prime light
#

πŸ‘

sullen pelican
#

anyway if the result of combined is meant to be akin to what you manually wrote, then it should be fairly easy to solve I think

#

well now that's a "fun" one, the thing I threw together breaks on gcc trunk because it apparently recognizes clang::musttail

#

but isn't happy with it

#

it's whatever actually, in the minimal example the attribute does nothing as everything is just optimized out

prime light
#

I've had quite a few issues with musttail. You'll notice I implemented a version that relies on longjmp to avoid the stack overflow, as musttail isn't really reliable atm

sullen pelican
#

right I forgot to mention it, but you're setting up a bench for this right?

#

anyway I have to leave

prime light
#

A performance benchmark? Yup. state machine implementations are the fastest (in the file I sent earlier) whether it's with clang or gcc (on my i5-12400). Except for the recursive implementation when Game is compiled with gcc -O3, where it somehow is 3 times faster than anything else (hence the pragma push -O2 for Game).

time ./g++.out 11 11 rec
real    0m0,713s
time ./g++.out 11 11 iter
real    0m1,428s
time ./g++.out 11 11 sm
real    0m0,853s
time ./g++.out 11 11 tail
real    0m0,527s
time ./g++.out 11 11 jump
real    0m0,544s

time ./clang++.out 11 11 rec
real    0m0,903s
time ./clang++.out 11 11 iter
real    0m0,535s
time ./clang++.out 11 11 sm
real    0m0,278s
time ./clang++.out 11 11 tail
real    0m0,313s
time ./clang++.out 11 11 jump
real    0m0,333s
prime light
prime light
# sullen pelican anyway I didn't look in detail at the file you shared, my first instinct is that...

I wrote the iterative version that way so it would be quite generic and would let me see if applying it to a more complex algorithm would slow it down or not. If you also had this "constraint" in mind, I'm interested as to how you would write it (pseudo-code or just plain english is fine).
Why avoid state machines for that?
(there's probably an "optimized way" to write it to take advantage of the specifics of the Game, but that's not what I'm after)

#

I definitely failed to properly communicate what I was trying to achieve here. How would you have explained it?

#

I'll try to fit your solution into my use case and see if it works (before marking this thread as solved), but either way thanks a lot πŸ™‚

orchid carbonBOT
#

@prime light Has your question been resolved? If so, type !solved :)

prime light
#

!solved

orchid carbonBOT
#

Thank you and let us know if you have any more questions!

This thread is now set to auto-hide after an hour of inactivity

sullen pelican
#

though I guess your main issue right now is more about ensuring the stack doesn't explode

#

but point stands

#

if the recursive version has issue with tco and explodes the stack it's not viable, but that shouldn't be a factor as to how you try to bench the other things

sullen pelican
#

and I just don't like state machines at all, aside from academic and theoretical stuff

#

in practice about most state machines I've had the pleasure to have to work with were horrendous

#

unless I have a reason to, I just wouldn't default to using them

#

exploring perf and tradeoffs, especially if you have the time to go through the thing can be fine, but depending on the scale of it, it just favours state sharing and entities that are more global than they would otherwise need to be, which easily makes maintenance and evolution more of a pain

prime light
# sullen pelican I don't see why you'd want to make the thing overly generic, especially if you w...

Game has an unsigned int as its state. I can iterate over the indices of the bits which are 0. That's it. (enumerate_paths(Game, x) should return factorial(x)). Given how simple it is, I suspect GCC found a way to do some cursed optimization on it. That's why I set this part to O2 and the rest to O3. I'll bench it with other games and see if the "issue" persists. As for now, my goal was to bench the transformed recursive algorithm mechanisms and not the whole thing so it's fine.

The stack overflow issue arises from the state machine itself being recursive, not the algorithm. At most the original recursive implementation would be 64 calls deep with this current game (and given that the size of the tree is factorial in the depth, that's definitely not an issue).

The iterative version is UB? What? how?

sullen pelican
#

to a large extent point still stands, for a proper bench you'd use the target level of optimization and add a footnote for why certains entries are relevant/not relevant

#

to a certain extent the different transforms are meant to be theoretically equivalent in the end result, if you bench it's usually against some compiler/codegen/target system

#

vs practical issues and oversight which makes certain optimization impossible

#

I guess this is more of a validation of a maybe-viable-in-general transform and less of a direct optimization problem

#

so you're trying to get an idea of average gain/loss of performance

#

but that makes it even weirder imo

prime light
#

I want the properties of the tail recursive state machine (testable transitions, suspend/resume, composable). And I wanna make sure it doesn't absolutely tank performance.

sullen pelican
#

The stack overflow issue arises from the state machine itself being recursive, not the algorithm.
I'd have to reread it

The iterative version is UB?
I don't have it on hand but I'm fairly certain you had an array on which you decrement .begin which is never valid

prime light
#

I got this (the end condition of the algorithm). The only time where frame would point before begin would be right there, so I should never dereference something before begin

        --frame;
        if (frame < stack.begin()) { return; }
sullen pelican
#

it's not a matter of dereferencing

#

that iterator is UB, period

prime light
#

Oh, I'll have to reverse the order of these two lines then

sullen pelican
#

I don't quite think you got what I meant

#
Frame* frame = stack.begin() - 1;

is already in UB land

prime light
#

Ah yes there's that one too. Forgot about it :D

prime light
sullen pelican
#

I'd have to read the thing, but then it's not just a matter of swapping the two lines

#

it's not clear if that would end up doing the same thing

#

well I guess you meant changing the comparison as well before decrementing

prime light
#

Yup

#

and I'll just handle the initialization part as a special case, or have the used part of the stack start at [1] instead of 0. No big deal, but I didn't know it was UB

sullen pelican
#
constexpr bool operator == (BitScanIterator<T>) const { return 0 == bits; };

thinkythonk

prime light
sullen pelican
#

"needing" it is normal depending on where you used it and what constraints was on the thing you used it with
copy-pasting the code for the sentinel comparison is the odd part

hasty swift
# prime light I remember having copy pasted that because having the == operator with itself wa...

just one thing to add for this since I didn't notice anyone explain what your original error was about, it had nothing to do with the types depending on one another (even though getting it to build as-is in your OP would have led to a stack overflow in the execution of the calls). the error: SM2 requires template arg error was really just telling you that the innermost SM2::state1 didn't have a template argument specified. Since SM1 and SM2 are templated, even when calling a static function, the type isn't instantiated without specifying the template argument (i.e. SM2<Foo>::state1).

You can get something like the original snippet working with some minor adjustments: https://godbolt.org/z/KKjGxsxnG
Or if you make the objects callable (functors), you can do something like this: https://godbolt.org/z/haGh5jrnP
Using that 2nd approach you can even pass in lambdas as your callable as another approach to a similar problem: https://godbolt.org/z/Ea1c34vEn

And just to be clear, this is really just to help explain what the compiler was trying to tell you in that error message. The snippet SlyBach looks perfect if what you were really after was the combined struct, I can't compete with that haha. Use their example for your project, the links above are just for clarity about your error and how to approach the structure of this type of general concept in case if it matters for anything you're trying to work into whatever you're building.

prime light
# hasty swift just one thing to add for this since I didn't notice anyone explain what your or...

The issue was with mutually recursive calls. The solution you gave me would be perfect for arbitrary long chains of alternating calls, but not for infinitely long ones (which is what recursion requires). I was indeed after what SlyBach provided :p

It seems like I really need to work on my communication skills when it comes to explaining the problems I'm having, as basically everyone had trouble understanding what I was looking for.

I very much appreciate you writing this thoughtful and educative answer though, it would have helped a lot if that was what I was after! πŸ™‚

hasty swift
#

yup all good, mostly just posted it to explain that first error and to show workarounds. glad you got the solution you needed earlier πŸ‘

prime light
# sullen pelican https://godbolt.org/z/brebjf5bf

@dreamy lava in case you haven't followed the thread, I'm pinging you so you can see what the solution to the problem was since you were interested
(I hope I didn't forget to disable the reply ping :x)