#Mutable Iterator for disjoint pairs of non-consecutive elements

67 messages ยท Page 1 of 1 (latest)

sonic adder
#

I would like to code a function that iterates over a slice, and operates on two values for which the indices differ in one specific (but configurable) bit. It should be possible to parallelize this iteration, ideally using Rayon.

For example assume my slice has length 8. If this bit were bit 0 (the least significant bit), then the pairs of values would have the indices (0, 1), (2, 3), (4, 5) and (6, 7), so that this could be solved by using chunks_mut.
However if it's bit 1, these pairs would be (0, 2), (1, 3), (4, 6), (5, 7) and chunks_mut does not work.

I could use (and have used) nested chunks which I then split and combined with zip. This works, but is ugly and the performance depends on the index, and performance in this code is critical to me, so this is not a good solution.
Instead if I manually create a loop that iterates over the indices using some bitshifts, the code becomes much simpler and faster, but now I need to use unsafe code to write to borrow multiple locations mutably at the same time (note that this is fine, because no index is used more than once).

I would like to avoid unsafe code though, or if I use it, make sure that I actually need it.
So my question is:
a) Do you know of a simple and performant way to implement this in safe Rust, or a crate that implements this type of functionality?
b) If I do implement this in unsafe Rust, can I simply create mutable pointers, as long as I make sure that no index is used more than once?

#

Mutable Iterator for non-consecutive pairs of elements

#

Mutable Iterator for disjoint pairs of non-consecutive elements

deep lodge
#

i'm not aware of any general method of getting an iterator that yields an arbitrary permutation of mutable references into a slice though

sonic adder
#

I'm currently testing it in a single thread, but I want to be able to run on multiple threads, I forgot to mention that in my initial post, I'll edit it.

deep lodge
#

you wouldn't be able to use get_disjoint_mut to eg implement an iterator, since even if it allows you to get mutable references in a single call, you won't be able to perform another get_disjoint_mut call until the borrows from the previous call have been released. to see why this would make it incompatible with Iterator, consider that eg Iterator::collect would need to be able to collect into a vec that holds all mutable references at once, but get_disjoint_mut wouldn't let you get more than a fixed number of references at a time

sonic adder
#

Right, I was testing with ugly manual code where I iterate over an index range, this wouldn't work for a proper clean solution.

deep lodge
#

i can't personally see a way to create an iterator that would allow you to get mutable references in an arbitrary order other than using <[T]>::as_mut_ptr and then doing something like

fn next(&mut self) -> Option<Self::Item> {
    let i = self.indices.next()?;
    let p = self.ptr.wrapping_add(i);
    unsafe { &mut *p }
}

where you must ensure that it's not possible for self.indices iterator to yield the same index multiple times, nor an index that's larger than or equal to the length of the slice, and you must also ensure that you are tracking the lifetime of the slice correctly

#

i'd probably define the iterator type as something like

struct SlicePermIterMut<'a, T> {
    indices: Indices,
    ptr: *mut T,
    _marker: PhantomData<&'a mut [T]>,
}

impl<'a, T> SlicePermIterMut<'a, T> {
    fn new(v: &'a mut [T]) -> Self {
        Self {
            indices: Indices::new(v.len()),
            ptr: v.as_mut_ptr(),
            _marker: PhantomData,
        }
    }
}

impl<'a, T> Iterator for SlicePermIterMut<'a, T> {
    type Item = &'a mut T;
    // ...
}
#

Indices being a separate type so it can be independently unit tested to ensure that it never yields the same index or an out of bounds index

#

and then you can pretty easily chunk the yielded elements into pairs without any unsafe, using a separate iterator adaptor on top

#

keeping the amount of unsafe code to a minimum in a context where it's easy to audit what it's doing

sonic adder
#

Thank you, this seems to be exactly the info I was looking for!

ensure that you are tracking the lifetime of the slice correctly
Is this what the PhantomData field is for? What's the advantage of storing the pointer and this marker over simply storing a mutable reference to the slice?

sonic adder
#

Can I write something like this?

// Calculate index0 and index1

let element0 = &self.state[index0] as *const T as *mut T;
let element1 = &self.state[index1] as *const T as *mut T;
// SAFETY: index0 != index1
unsafe { Some((element0.as_mut().unwrap(), element1.as_mut().unwrap())) }

Building the pointer directly from the slice should ensure that it's in bounds and within the lifetime.
Also is there a way to write this more concisely, without these casts?

deep lodge
# sonic adder Thank you, this seems to be exactly the info I was looking for! > ensure that y...

yeah exactly. the reason has to do with some nuances with how lifetimes and pointer invalidation works with mutable references.

consider a type like this, which just yields a reference to each element in turn (bounds checking skipped for brevity):

struct Iter<'a, T> {
    slice: &'a mut [T],
    i: usize,
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a mut T;
    fn next(&mut self) -> Option<Self::Item> {
        let ptr = self.slice.as_mut_ptr();
        let item = unsafe { &mut *ptr };
        self.i += 1;
        Some(item)
    }
}

so what are we doing when we call self.slice.as_mut_ptr()? well, as_mut_ptr takes &mut self, meaning that you need to take a mutable borrow to the entire slice whenever you call it. it's effectively equivalent to calling:

let ptr = <[T]>::as_mut_ptr(&mut *self.slice);

the pointer we get will thus be tied to the lifetime of the reference that we used to create it, and the pointer follows all the same rules as the equivalent reference

as a simpler example, consider:

let mut a = 42;
let a_ref1 = &mut a;
let a_ref2 = &mut a;
println!("{}", *a_ref1);

we know that this will, of course, not compile, because we can't create another borrow (a_ref2) while a mutable borrow (a_ref1) is still alive. but what if we did the same but using a raw pointer instead?

let mut a = 42;
let a_ref1 = &raw mut a;
let a_ref2 = &mut a;    
println!("{}", unsafe { *a_ref1 });

the above code will in fact compile, however it is undefined behaviour, as it violates the same rules that the borrow checker was complaining about in the first version

the general rule to remember is: creating a mutable reference &mut a invalidates all existing borrows to a, including raw pointers

#

so to bring it back to the slice example, consider this code:

let mut a = [10, 20, 30, 40, 50];
let a_ref: &mut [i32] = &mut a;

let ptr1 = a_ref.as_mut_ptr().wrapping_add(1);
let ref1 = unsafe { &mut *ptr1 };

let ptr2 = a_ref.as_mut_ptr().wrapping_add(2);
let ref2 = unsafe { &mut *ptr2 };

*ref1 = 22;
println!("{a:?}");

this is also undefined behaviour, because when we call a_ref.as_mut_ptr() a second time, we are taking another mutable reference to the slice (essentially calling (&mut *a_ref).as_mut_ptr() as above), which invalidates the existing ptr1, and hence also ref1 because they borrow from the same slice. for our iterator what this means is that when we call self.slice.as_mut_ptr() in next() a second time, it invalidates the mutable reference that the previous next() call returned

the way to avoid this is to make sure that you only ever create a mutable reference to the slice once to get a pointer to it, and then never construct a mutable pointer to the full slice again until all borrows derived from the pointer have been released. so to correct the example above, we should only call as_mut_ptr once:

let mut a = [10, 20, 30, 40, 50];
let a_ref: &mut [i32] = &mut a;

// our single call to as_mut_ptr - after this, we only use this pointer
let base_ptr = a_ref.as_mut_ptr();

let ptr1 = base_ptr.wrapping_add(1);
let ref1 = unsafe { &mut *ptr1 };

let ptr2 = base_ptr.wrapping_add(2);
let ref2 = unsafe { &mut *ptr2 };

*ref1 = 22;
println!("{a:?}");
deep lodge
# sonic adder Can I write something like this? ```rust // Calculate index0 and index1 let el...

this is instant ub. you are never, ever allowed to take a shared reference &T and convert it to &mut T. in fact the compiler will even detect simple cases of this as unconditional ub and give a hard error:

let a = 42;
let a_ref = unsafe { &mut *(&a as *const i32 as *mut i32) };
println!("{a_ref}");

// error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
#

the only case where it is sound to cast a *const T to a *mut T and create a mutable reference to that, is if the *const T was itself originally derived from a mutable reference. but you may never write to a pointer that was derived from a shared reference, nor create a mutable reference from it

sonic adder
#

Or is it the same and the compiler just can't detect my version?

deep lodge
sonic adder
sonic adder
deep lodge
#

i'm telling you, that code is unsound

#

you cannot cast a reference like that

#

please trust me

sonic adder
#

Sorry, I'm not trying to argue, I believe you, just trying to understand ๐Ÿ˜…

deep lodge
#

no worries

#

but yeah my recommendation is still to call as_mut_ptr() a single time, and then use that to construct any later mutable references

#

as explained in my mini-essay above whenever you have a minute to read through it

#

it's often instructive to run tests of small snippets like this through miri (available in the playground via tools > miri), it can help you gain some intuition for what kind of conversions are allowed

sonic adder
#

Sorry, I should've read that more carefully before skipping to your response to my code.

sonic adder
deep lodge
#

essentially any shared reference &T gets tagged with a read-only tag, and that tag persists even if you cast it to a &mut T, and it's ub to write to a pointer tagged as read only, or create a mutable reference out of a read-only pointer

deep lodge
# sonic adder I believe this is what I am doing, as I do have a mutable reference to the slice...

also to clarify here, what i'm saying is that this is sound:

let a = &mut 42;
let a_const = a as *mut i32 as *const i32;
let a_mut = unsafe { &mut *(a_const as *mut i32) };
println!("{a_mut}");

because you're taking a mutable reference and casting it to *const T, and then going back to mutable

this, however, is unsound:

let a = &mut 42;
let a_const = &*a as *const i32;
let a_mut = unsafe { &mut *(a_const as *mut i32) };
println!("{a_mut}");

because we turn the reference into a shared reference before we turn it into a raw pointer.

essentially, you're free to cast back and forth between *mut T and *const T without losing; however you cannot go back and forth between &mut T and &T, because the very act of turning into a shared reference tags the borrow as read only

sonic adder
# deep lodge also to clarify here, what i'm saying is that this is sound: ```rs let a = &mut ...

After reading through your messages again, I think I was able to get it to work, thank you very much!
My code works, has good performance and runs without complaints from Miri.

However then I wanted to parallelize it using Rayon, which I was able to get to work, but now there are Miri errors.

While this added quite a bit of code, the only additional unsafe code I needed was implementing Send and Sync for my Iterator struct, so I assume my problem is there:

pub struct CustomIter<'a, T> {
    range: Range<usize>,
    target_bit: u32,
    ptr: *mut T,
    // TODO: Is [T] necessary here? Can I replace it by T or maybe even ()?
    _marker: PhantomData<&'a [T]>,
}

// SAFETY: TODO: CustomIter has no interior mutability, so this should be fine?
unsafe impl<'a, T: Send> Send for CustomIter<'a, T> {}
unsafe impl<'a, T: Sync> Sync for CustomIter<'a, T> {}

I hoped this should be fine since my struct has no interior mutability, and all my struct members are already send and sync, except for the raw pointer. But even there, and according to the Rustonomicon, "one could argue that it would be "fine" for [it] to be marked as thread safe". But I guess I'm missing something?

Also I was wondering if it's actually necessary for PhantomData to use a T slice, or if I could also write PhantomData<&'a T>.

Here is my code on Rust playground, together with an example function that triggers the Miri error: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=c4f5a1422f81eca8d1427f0e2ed3c5f7

deep lodge
# sonic adder After reading through your messages again, I think I was able to get it to work,...

just going to answer your question about PhantomData first before i have a look at the miri issues

the short answer is: in this case, technically, PhantomData<&'a ()> would be fine; however, it's very important to understand why it's fine, and i'd encourage you to stick to PhantomData<&'a mut [T]> (important: including the mut!) as a good habit

the long answer is: when you have a PhantomData<T> in a type, you are asking rust to treat the type as if it contained a T, even though it doesn't. this affects:

in this case, we can pretty easily see that all types are Unpin already, since *[mut/const] T and &[mut] T are always Unpin regardless of T, and the raw pointer will automatically make the type !Send and !Sync, so the auto traits are less of a concern. the more subtle yet important part is the variance of your type

so if you're not familiar, variance controls whether you're able to shorten/lengthen the lifetime of a type. for instance, if you have a &'long str, you're allowed to turn that into a &'short str for any lifetime 'short that's shorter than 'long. this is called covariance: you can shorten the lifetime 'a of &'a T. (hopefully that's intuitive to you)

#

now let's say that we have a slice containing &'static str: [&'static str]. what can we say about the variance of &'a [&'static str] and &'a mut [&'static str]? note that we have two lifetimes at play here: the lifetime of the reference to the slice itself, and the lifetimes of the str references stored inside the slice.

we can test this pretty easily: let's define two lifetimes 'long and 'short, and see if we can turn a &'long [&'static str] into a &'short [&'short str]:

fn test<'short, 'long: 'short>(strings: &'long [&'static str]) {
    let _s: &'short [&'short str] = strings;
}

this does in fact compile! &'a [&'b str] is covariant with respect to both 'a and 'b. more generally, &'a T is covariant with respect to 'a and any lifetimes in T. this should be pretty intuitive, i think: an immutable reference doesn't allow you to mess with the original data in any way, and shortening a lifetime doesn't allow you to keep the reference around for longer than you should

now let's see about the &mut [&str] case:

fn test<'short, 'long: 'short>(strings: &'long mut [&'static str]) {
    // works! we can shorten the outer lifetime
    let _s: &'short mut [&'static str] = strings;
    // fails :( we're not allowed to shorten the inner lifetime
    let _s: &'short mut [&'short str] = strings;
}

so why are we not allowed to shorten the inner lifetime? the reason is that it would allow something like this:

fn test<'short, 'long: 'short>(
    strings: &'long mut [&'static str],
    short_str: &'short str,
) {
    let shortened: &'short mut [&'short str] = strings;
    shortened[0] = short_str; // oops
}

in other words, it would allow us to insert a string with a non-'static lifetime into a slice that should only contain 'static strings. hence, &'a mut ['b str] is covariant with regards to 'a, but invariant with regards to 'b.

#

now consider your CustomIter<'a, T>. it yields the items &'a mut T which reference a slice &'a mut [T]. so what should the variance of CustomIter be with respect to 'a and T?

well, say we have a [&'static str] slice as above, so our iterator is CustomIter<'a, &'static str>. we are yielding mutable references, &'a mut &'static str, into the slice, making it vitally important that we cannot shorten the inner lifetime, or we would be able to write data with a shorter lifetime back into the slice. hence, it must not be possible to turn a CustomIter<'a, &'static str> into a CustomIter<'a, &'short str>. in other words, while CustomIter<'a, T> may be covariant with respect to the outer lifetime 'a, it must be invariant with respect to T

now as it turns out, *mut T is also invariant with respect to T, so in this specific case we don't technically need to do anything more to ensure its invariance, and it's technically enough to just include a PhantomData<&'a ()> to mark 'a as covariant. however, in my opinion, unless you know very clearly from the get-go exactly what the variance requirements are for your type, it's better and easier to simply take another type that your type should "act like", and just include it in a PhantomData. in your case, your iterator should "act like" a &'a mut [T], since it allows accessing mutable references just like a slice would, which you can signal with PhantomData<&'a mut [T]>. it doesn't technically make a difference compared to eg PhantomData<&'a mut T>, or having one PhantomData<*mut T> field and another PhantomData<&'a ()> field, but this way you don't have to worry as much that you got the variance right

#

(sorry for the wall of text, i didn't originally mean for it to be this long... if you don't have time atm just read the short answer, but i think it's important to understand this properly if you're going to be writing unsafe code)

#

gonna have a look at the miri issues now

deep lodge
# sonic adder After reading through your messages again, I think I was able to get it to work,...

oooooh. i don't think this is an issue with your code actually. it looks like crossbeam itself violates the rules of stacked borrows by converting an &Entry reference to a &Local, where Entry is the first field of Local. in c it's a common pattern to go from a FullStruct* pointer to a Header* pointer, where Header has the same field(s) as the start of the full struct, and then back to FullStruct* again, but in rust you can't go from &FullStruct to &Header and back to &FullStruct under what's called stacked borrows. from what i understand the newer borrowing model, tree borrows, is supposed to allow this, but it doesn't seem like miri's tree borrows mode is complete enough to support checking this code either

note that violating a rule of a specific model like stacked borrows doesn't necessarily make the code undefined behaviour. while it's often ub, afaik this particular rule is a bit of a grey area and i belieeeve that rustc attempts not to break code that does this? and given how popular crossbeam is, i'd assume that if there was actual ub happening there, people would be much louder about it. neither stacked borrows nor tree borrows perfectly describe rustc's current assumptions, though generally you want to play it safe by conforming to stacked borrows, since stacked borrows is generally stricter than what rustc actually requires

https://github.com/crossbeam-rs/crossbeam/issues/545 seems to be the relevant upstream issue for this. it's likely that you're fine here, but it's very unfortunate that this prevents testing the code in miri...

#

however, honestly, i don't think you need to be implementing ParallelIterator here. generating the references itself is fast enough that it's not something that needs to be parallelised. i'd recommend just using par_bridge() instead, which allows you to use a single thread to read from the iterator, but then work on the yielded references from multiple threads

#

in general i'd highly encourage you to keep the code that deals with unsafe as small as possible. the more code you have around any unsafe usage, the harder it becomes to audit it to ensure that all assumptions are upheld. it's better to have a very small safe abstraction that encapsulates the unsafe code, and then build upon that using other, entirely safe types. that's also why i suggested using a separate iterator to generate the indices so that it can be tested independently - it allows you to look at the unsafe code and go "well as long as the index iterator is correct, then this is trivially sound", instead of having to consider both the index generation and the unsafe usage together

sonic adder
sonic adder
deep lodge
#

oh wow

#

that is one large slice

sonic adder
sonic adder
sonic adder
deep lodge
deep lodge
#

also for the record, even just this results in the same miri error:

use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;

fn main() {
    let mut v = [1; 16];
    v.par_iter().for_each(|_| {});
    println!("{v:?}");
}

so definitely not your code that's the issue

sonic adder
#

Oh wait, I ran it in debug mode ๐Ÿ˜…

deep lodge
#

hmmm yeah that's unfortunate. i wonder if it's because par_bridge doesn't have any specialisation for ExactSizeIterator

#

would be interesting to profile to see where the bottleneck is

#

oh dear it's spending the overwhelming majority of time on acquiring and releasing a mutex

#

that is... unfortunate design

deep lodge
#

seems that using channels just makes the problem worse :( i guess there's just too much overhead in having to send so many small items over to another thread so frequently, really does need to be possible to pass as large chunks as possible to each individual thread

#

would likely be less of an issue if each worker had to spend more time on each individual item so there's less contention, but this is somewhat of a pathological case for par_bridge

sonic adder
#

I guess that's the price for having par_bridge work for every time of iterator.

Anyways, thanks you so much for your help, my code works as I hoped and I learnt a lot.
Now I can hopefully go back to needing only safe Rust for a long time ๐Ÿ˜„