RAMPU - computer within RAM | Turing Complete | Page 1

keen marsh May 16, 2025, 12:59 PM

#

Have you ever wondered whether only RAM without any additional gates can make a useful computer? I hope not.

RAMPU uses only RAM to implement a basic instruction set.

- 126 4-bit registers + 126 4-bit registers for secondary outputs
- up to 14 user-defined operations, 4b x 4b -> 8b. By default: MOV, AND, NAND, OR, SHL, SHR, ADD
- every instruction is also a jump based on the low 4 bits of the operation output
  - up to 16 user-defined jump conditions
- 64b instructions (32b instruction + 2x 16b jump addresses)
- memory access via 4 memory registers (making up a 16b address)
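
To make the "operation is a table, instruction is a jump" idea concrete, here is a rough Python sketch. It is not the actual RAMPU memory layout: the table indexing `(a << 4) | b`, the split of the 8-bit result into a primary (low) and secondary (high) nibble, and the names `build_lut`, `execute`, and `JUMP_IF_ZERO` are all assumptions made for illustration.

```python
# Assumed layout: one 256-entry table per operation, indexed by (a << 4) | b,
# each entry holding an 8-bit result. The real spec.isa layout may differ.

def build_lut(fn):
    """Tabulate a 4b x 4b -> 8b operation."""
    return [fn(a, b) & 0xFF for a in range(16) for b in range(16)]

LUTS = {
    "AND": build_lut(lambda a, b: a & b),
    "NAND": build_lut(lambda a, b: ~(a & b) & 0xF),
    "OR": build_lut(lambda a, b: a | b),
    "ADD": build_lut(lambda a, b: a + b),  # overflow lands in the high nibble
}

# A jump condition is modelled as a 16-entry table over the low 4 result bits
# (hypothetical representation of a "user-defined jump condition").
JUMP_IF_ZERO = [True] + [False] * 15

def execute(op, a, b, condition, target_true, target_false):
    """Look up the result, then pick one of the two jump addresses."""
    result = LUTS[op][(a << 4) | b]
    primary = result & 0xF    # assumed to go to the normal register
    secondary = result >> 4   # assumed to go to the secondary-output register
    next_ip = target_true if condition[primary] else target_false
    return primary, secondary, next_ip
```

For example, `execute("ADD", 9, 8, JUMP_IF_ZERO, 0x0100, 0x0200)` looks up 9 + 8 = 0x11, so both nibbles are 1 and the second address is taken.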

#

Since this is a bizarre architecture, here is the ISA spec and a CLC2 solution in 4 KiB as a proof of concept.
Obvious spoilers (the CLC2 solution is very similar to the normal one).

📎 CLC2.asm

📎 spec.isa

keen marsh May 31, 2025, 1:31 PM

#

RAMPU216
more memory efficient, to fit CLC2 into only 512B memory

Uses 16-bit vectors of 8 2-bit values.

The 16-bit instructions are mainly designed to:

- read 2 registers
- vectorwise compute 2 results
- possibly shift one result by 2 bits
- save into 2 registers
- possibly repeat if the shift register is not zero

This allows a nice implementation of 16-bit ADD, SUB, and CMP in a single instruction.
It will take several ticks, but for unsigned values the shift register will quickly become zero.
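
One way to read the steps above for ADD, as a Python sketch: the two vectorwise results are the per-digit sum and the per-digit carry, the carry is the result that gets shifted by 2 bits, and the instruction repeats until the carry vector is zero. This is my guess at the mapping, not a transcription of the real instruction encoding; `digits`, `pack`, `add_step`, and `add16` are illustrative names.

```python
MASK16 = 0xFFFF

def digits(x):
    """Split a 16-bit value into 8 two-bit digits, least significant first."""
    return [(x >> (2 * i)) & 0b11 for i in range(8)]

def pack(ds):
    """Inverse of digits(): reassemble 8 two-bit digits into 16 bits."""
    out = 0
    for i, d in enumerate(ds):
        out |= (d & 0b11) << (2 * i)
    return out

def add_step(a, b):
    """One tick: digitwise sum and digitwise carry, carry moved up one digit."""
    pairs = list(zip(digits(a), digits(b)))
    partial = pack([(x + y) & 0b11 for x, y in pairs])
    carry = pack([(x + y) >> 2 for x, y in pairs])
    return partial, (carry << 2) & MASK16

def add16(a, b):
    """Repeat the step until the shifted carry vector is zero."""
    ticks = 0
    while b:
        a, b = add_step(a, b)
        ticks += 1
    return a, ticks
```

The worst case is a carry rippling through all 8 digits, e.g. `add16(0xFFFF, 1)` takes 8 ticks and wraps to 0; most inputs finish in far fewer.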

#

and SSDPU216 for completeness

coarse flicker May 31, 2025, 9:14 PM

#

so does this basically just rely on a bunch of LUTs?

keen marsh May 31, 2025, 9:30 PM

#

coarse flicker so does this basically just rely on a bunch of LUTs?

The first one is entirely LUTs.

RAMPU216 still mostly is. All operations, jump conditions, etc. are done via 96B LUTs.
Then there is an 8B scratch space, which is essentially a temporary LUT for selecting the next IP etc.
And there are a few cases where I emulated NOR/NOT by a read to invalid memory ((bit_to_negate << 16) | address_with_set_bit), but that could have been a LUT as well.
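
The invalid-read trick, sketched in Python under the assumption that a read outside the RAM returns 0 (the `read` helper and the 512-byte size are stand-ins, not the game's component):

```python
MEM_SIZE = 512  # bytes, matching the 512B variant

def read(memory, address):
    """Assumed semantics: an out-of-range read yields 0."""
    return memory[address] if 0 <= address < len(memory) else 0

def emulated_not(memory, bit_to_negate, address_with_set_bit):
    """If the bit is 1 the address leaves valid memory and reads 0;
    if the bit is 0 the address is unchanged and reads the stored 1."""
    address = (bit_to_negate << 16) | address_with_set_bit
    return read(memory, address)

memory = bytearray(MEM_SIZE)
memory[0x10] = 1  # any valid address known to hold a set bit
```

With that setup, `emulated_not(memory, 0, 0x10)` gives 1 and `emulated_not(memory, 1, 0x10)` gives 0.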

coarse flicker May 31, 2025, 9:30 PM

#

presumably this won't be very cheap after scoring gets fixed, but it is cool

keen marsh May 31, 2025, 9:35 PM

#

coarse flicker presumably this won't be very cheap after scoring gets fixed, but it is cool

Depends. If the cost is 1 gate per byte, it is still only a 512 gate score for an essentially 16-bit architecture, and if it is more, the RAM will dominate the cost of most architectures anyway.
The delay is not great, though.

coarse flicker May 31, 2025, 9:38 PM

#

pipelined RAM?

keen marsh May 31, 2025, 9:58 PM

#

coarse flicker pipelined RAM?

Maybe. The main problem with RAM is that writes are always sequential.
At minimum you need to write the IP, 1 register, and memory, which is already 15 delay without doing anything.
So at best you could get ~3 times less delay with a long pipeline.

I definitely won't try it before the score change

#

And the pipelining state won't fit into 64B, which is the current max width of a single write, so a few more writes would be necessary, making it unusable.
Maybe once we can write more at once.

keen marsh Jun 1, 2025, 3:09 AM

#

keen marsh Depends, if the cost will be 1 gate per byte, it is still only 512 gate score fo...

In a few places I needlessly depended on a later value when an earlier one works as well.
And minor pipelining: updating not only the IP, but also writing the instruction associated with the given IP next to it.

Another 10 delay can be saved by reorganizing the LUTs and using another 16B for LUTs so I don't need the decompression step.
So 50 delay is achievable with this design. Still not great, but close to the best that can be done with RAM.

fluid stump Sep 25, 2025, 6:49 PM

#

keen marsh and SSDPU216 for completeness

what

scenic scroll Oct 2, 2025, 10:47 PM

#

wow
