Extend Array/Paging | Rust Programming Language Community | Page 1

steel cairn Sep 13, 2024, 12:05 PM

#

Hey there,
for a small concurrent container which basically only has

pub fn push(&self, value: T);
pub fn iter(&self) -> impl Iterator<Item = T>;
pub fn pop(&mut self) -> Option<T>;

as its API, I'm looking for ways to manage its actual memory.

What I need is some form of paging or growing in blocks, probably how std::vec::Vec does it: I start with some capacity, say 100, if the 100th element has been pushed, I grow for another 100 etc.

The point is: I need the memory to be addressable by an index (or pointer) or an offset, i.e. anything which can be atomically incremented.

Ignoring the concurrent aspect, a naive implementation would be something like this:

struct CStack<T> {
   data: Vec<T>,
   block_size: usize,
   next_elem: usize,
}
// ... 

fn grow(&mut self) {
   self.data.extend_from_slice(&vec![T::default(); self.block_size]);
}

fn push(&mut self, value: T) {
    if self.next_elem >= self.data.len() {
      self.grow()
    }

    // this is crucial, access by index or offset or anything which can be atomically incremented:
    self.data[self.next_elem] = value;
}

So, a few things feel off here:

T needs to have impl Default. I would rather dip and learn about MaybeUninit if this is the right direction?
Vec seems to be the wrong choice. I'm not interested in its capacity, since the structure manages its capacity on its own. Something with Box<[T]> maybe? Or more raw?

Might this be a good time to learn about allocation in Rust?

The project is to learn more about low-level implementations, concurrency and sprinkles of unsafe (I'm reading "Rust Atomics and Locks" by Mara Bos and just playing around with ideas, trying to understand other stuff on the way as well).

Thanks in advance!

jolly stratus Sep 13, 2024, 12:18 PM

#

steel cairn Hey there, for a small concurrent container which basically only has ```rust pu...

Well if you need performant data structure then I would suggest sticking with Vectors because they tend to be contiguous in memory and so iteration over them are fast due to cpu prefetching.

Also, why do you need a grow function? because Vectors can automatically grow if they don't have space in them.

Also, if you do want to implement your own self allocatable and manageable data structure I think you can use something equivalent to c++ buffers in rust because if you study llvm in depth you will realize they use the same technique to create their own data structures. So yes I would suggest using this approach like using a c++ buffer like data structure in rust.

I hope this helps 🙂

jolly stratus Sep 13, 2024, 12:19 PM

#

steel cairn Hey there, for a small concurrent container which basically only has ```rust pu...

Also, I would suggest this resource if you want to learn even more in depth about atomics in rust:

https://www.youtube.com/watch?v=rMGWeSjctlY

Again I hope this helps 🙂

YouTube

Jon Gjengset

Crust of Rust: Atomics and Memory Ordering

In this episode of Crust of Rust, we go over Rust's atomic types, including the mysterious Ordering enum. In particular, we explore the std::sync::atomic module, and look at how its components can be used to implement concurrency primitives like mutexes. We also investigate some of the gotchas and sometimes counter-intuitive behaviors of the ato...

▶ Play video

sharp trench Sep 13, 2024, 12:19 PM

#

jolly stratus Well if you need performant data structure then I would suggest sticking with `V...

concurrent container
This requirement tends to make it impossible to stick to Vec

#

Well, unless you're ok with locking, but usually "concurrent container" implies lockfree

jolly stratus Sep 13, 2024, 12:22 PM

#

sharp trench > concurrent container This requirement tends to make it impossible to stick to ...

Ok I see, maybe I would suggest hybrid data structure approach then. Again if they really want to have a very performant data structure 🙂

sharp trench Sep 13, 2024, 12:23 PM

#

steel cairn Hey there, for a small concurrent container which basically only has ```rust pu...

Word of warning: There is no good way to resize a lockfree concurrent container. If you're ok with locks it's easy, but then all you need is Mutex<Vec<T>>.

Concurrent list-like data structures usually come in two variants

ones with a fixed capacity, so you preallocate all the possible space ahead of time and refuse to ever increase the capacity (think of a Vec where push can fail because it's too full)
linked lists: You can append to a linked list in a nice concurrent lockfree way.

You can hybridize the two approaches and have a linked list of chunks, where each chunk is the size of the sum of all previous chunks (so adding a chunk doubles your capacity), which is the hybrid approach neon-mmd is suggesting

jolly stratus Sep 13, 2024, 12:23 PM

#

steel cairn Hey there, for a small concurrent container which basically only has ```rust pu...

Also, I would suggest this talk to you this may help you with writing your own high performant data structure 🙂 :

https://www.youtube.com/watch?v=XKODaZgKcnE

YouTube

Rust

Writing Performant Concurrent Data Structures by Adrian Alic - Rust...

Writing performant concurrent data structures in Rust can be challenging, and often requires subversion of Rust's ownership guarantees. In this talk I'll be using a multi-producer single-consumer queue as a case study for writing performant code, crafting safe abstractions, sniffing out undefined behaviour & concurrency issues, and avoiding pitf...

▶ Play video

sharp trench Sep 13, 2024, 12:27 PM

#

steel cairn Hey there, for a small concurrent container which basically only has ```rust pu...

As for your actual questions:

MaybeUninit is a fine direction to go to learn this, yes.
Vec is an ok choice. Box<[T]> is slightly better, but the ideal choice is to manage allocations yourself: ideally you want blocks to look something like this

struct Block<T> {
  next: Option<NonNull<Self>> //next in the linked list
  len: usize, //number of init elements
  capacity: usize, //capacity of this one chunk
  total_capacity: usize //sum of the capacities of every chunk up to the current one, included. You can also recompute it whenever `.len()` is called
  data: [T; 0] //this isn't actually the data, you manually manage the allocation to have a `capacity`-sized chunk of space for `T`s starting here
}

jolly stratus Sep 13, 2024, 12:27 PM

#

sharp trench Word of warning: There is no good way to resize a lockfree concurrent container....

Also, there is the approach which is similar but this has a limitation that the requirement is if you do reads more than writes then you can use ArcSwap instead of Mutex which is lockfree. 🙂

sharp trench Sep 13, 2024, 12:27 PM

#

That's... true, but we're moving increasingly further from what people usually mean by "concurrent collection"

steel cairn Sep 13, 2024, 12:28 PM

#

sharp trench Word of warning: There is no good way to resize a lockfree concurrent container....

The goal would be somewhat lock-free. New elements just get "pushed" concurrently on existing memory, if the index of the elements which needs to get pushed is >= capacity, a spin-lock waits until it has grown... something along these lines. But working this out is part of the fun. ; )

sharp trench Sep 13, 2024, 12:29 PM

#

steel cairn The goal would be somewhat lock-free. New elements just get "pushed" concurrentl...

Get an actual mutex, they already include spinlocks internally that run for a while to hope the lock isn't very contested. There's no good reason to ever use a spinlock if you have an alternative.

#

Also, what you have here is equivalent to SpinLock<Vec<T>>

#

The issue with locking on resize is that you then must also lock on every single access

#

Since you need to ensure the resize doesn't conflict with an access

jolly stratus Sep 13, 2024, 12:30 PM

#

sharp trench As for your actual questions: - `MaybeUninit` is a fine direction to go to learn...

Also, I would suggest they could use something more equivalent to c++ buffers in rust I have seen this approach being used in llvm alot like they have custom structures for small vectors, etc. So yes if they want to really control allocation I think may be they can look into this. 🙂

sharp trench Sep 13, 2024, 12:30 PM

#

Alright, stop there

#

You're going increasingly off topic

jolly stratus Sep 13, 2024, 12:32 PM

#

sharp trench Also, what you have here is equivalent to `SpinLock<Vec<T>>`

Well I will point out the only limitation here is spinlocks are notorious for the slow lock/unlock speed I believe. 🙂

jolly stratus Sep 13, 2024, 12:33 PM

#

sharp trench Alright, stop there

Hmmm... Idk how I got offtopic though well I was just suggesting about using something similar to c++ buffers. But still sorry if it is. 🙂

steel cairn Sep 13, 2024, 12:33 PM

#

sharp trench Get an actual mutex, they _already_ include spinlocks internally that run for a ...

Probably I will get there... I was wondering that if for all new elements < capacity this is just a relaxed loading instead of an Acquire, which I would need for a mutex, or? If it is bigger, I could add a fence and "start spinning".

sharp trench Sep 13, 2024, 12:33 PM

#

jolly stratus Well I will point out the only limitation here is spinlocks are notorious for th...

Well, no, being fast at locking/unlocking is the one thing they're good at.

#

They just thrash an entire CPU core

sharp trench Sep 13, 2024, 12:34 PM

#

jolly stratus Hmmm... Idk how I got offtopic though well I was just suggesting about using som...

We're discussing concurrent data structures. I have no idea how you went from there to "data structures optimized for a small amount of elements"

#

But it's definitely not the same topic.

steel cairn Sep 13, 2024, 12:34 PM

#

jolly stratus Also, I would suggest this talk to you this may help you with writing your own h...

Thanks for the link and ideas neon-mmd. But indeed, some are a bit too far for now. : )

jolly stratus Sep 13, 2024, 12:36 PM

#

sharp trench We're discussing concurrent data structures. I have _no idea_ how you went from ...

Nah I mean't like they could use buffers to implement well there own data structure I was just giving an example of their one of the data structures they implement in this style. Like I had no intention to discuss about data structures optimized for small amounts 😅 .

sharp trench Sep 13, 2024, 12:36 PM

#

steel cairn Probably I will get there... I was wondering that if for all new elements < capa...

I was wondering that if for all new elements < capacity this is just a relaxed loading instead of an Acquire, which I would need for a mutex, or?
From a formal perspective, this sentence does not make sense.

It's a salad of concepts related to atomics and concepts realated to mutexes, and the way you're choosing the atomic ordering is not correct according to the standard for atomics. That, or I don't understand what you're trying to express, which is entirely possible

#

I'm definitely not sure what you're doing here, at least

#

What is "this" in "this is just a relaxed loading"

#

Why are you choosing how to load based on how many elements there are

steel cairn Sep 13, 2024, 12:53 PM

#

I see. Let me try to formalize. Put aside how exactly memory is allocated for a moment, just on when to lock and how to lock.

If the container has a fixed size, I could get along with a single atomic, which I fetch_add(1, Ordering::Relaxed) and check if the returned index is smaller then the capacity. If so, I can write at that index, otherwise, I cannot.

Now, I thought about using an atomic for the capacity and the criteria if the index is smaller as the capacity as a lock criteria:

let next_elem = len.fetch_add(1, Ordering::Relaxed);
if next_elem == capacity.load(Ordering::Relaxed) {
    fence(Acquire);
    // .. extend memory
    capacity.fetch_add(BLOCK_SIZE, Ordering::Release);
}

while next_elem > capacity.load(Ordering::Relaxed) {
    fence(Acquire);
    // ...  spin
}

// write at next_elem

And assuming here, that the blocks themselves aren't touched while the memory of the container is extended, hence a write into an "old" block while a "new" block is allocated is fine.

Something along these lines.

#Extend Array/Paging