#RAMPU - computer within RAM

keen marsh

Have you ever wondered whether only RAM without any additional gates can make a useful computer? I hope not.

RAMPU uses only RAM to implement a basic instruction set.

  • 126 4-bit registers, plus another 126 4-bit registers for secondary outputs
  • up to 14 user-defined operations (4b x 4b -> 8b); by default: MOV, AND, NAND, OR, SHL, SHR, ADD
  • every instruction is also a jump, based on the low 4 bits of the operation output
    • up to 16 user-defined jump conditions
  • 64b instructions (a 32b instruction + 2x 16b jump addresses)
  • memory access via 4 memory registers (together making up a 16b address)
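
To make "only RAM" a bit more concrete, here is a minimal Python sketch (not the actual RAMPU ISA) of how one instruction can be executed purely by table lookups: the operation result comes from a precomputed 4b x 4b -> 8b LUT, and the low 4 bits of that result index a jump-condition LUT that picks one of the two jump addresses. The table layout, the opcode order, the three example conditions, the operation semantics (e.g. what SHL shifts by) and the names `OP_LUT`, `COND_LUT` and `step` are all assumptions for illustration.

```python
# Hypothetical sketch of a LUT-only instruction step in the spirit of RAMPU.
# Assumed layout (not from the ISA spec): op LUT index = (op << 8) | (a << 4) | b,
# jump-condition LUT index = (cond << 4) | low4.

MASK4 = 0xF

# Default 4b x 4b -> 8b operations; exact semantics are assumptions.
OPS = {
    "MOV":  lambda a, b: b,
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: ~(a & b) & MASK4,
    "OR":   lambda a, b: a | b,
    "SHL":  lambda a, b: (a << b) & 0xFF,   # a shifted left by b (assumed)
    "SHR":  lambda a, b: a >> b,            # a shifted right by b (assumed)
    "ADD":  lambda a, b: a + b,
}
OP_NAMES = list(OPS)

def build_op_lut() -> bytearray:
    """Precompute every 8-bit result so executing an op is a single read."""
    lut = bytearray(len(OP_NAMES) * 256)
    for op, name in enumerate(OP_NAMES):
        fn = OPS[name]
        for a in range(16):
            for b in range(16):
                lut[(op << 8) | (a << 4) | b] = fn(a, b) & 0xFF
    return lut

# Example jump conditions: map the low 4 bits of a result to "take jump 1?".
CONDS = [
    lambda low: False,      # 0: always take jump address 0
    lambda low: low == 0,   # 1: zero
    lambda low: low != 0,   # 2: non-zero
]

def build_cond_lut() -> bytearray:
    lut = bytearray(len(CONDS) * 16)
    for c, fn in enumerate(CONDS):
        for low in range(16):
            lut[(c << 4) | low] = 1 if fn(low) else 0
    return lut

OP_LUT = build_op_lut()
COND_LUT = build_cond_lut()

def step(op: int, a: int, b: int, cond: int,
         jumps: tuple[int, int]) -> tuple[int, int]:
    """Execute one instruction using only table reads.

    Returns (8-bit result, next IP). Presumably the low nibble of the result
    goes to a register and the high nibble to a secondary-output register.
    """
    result = OP_LUT[(op << 8) | (a << 4) | b]
    take = COND_LUT[(cond << 4) | (result & MASK4)]
    return result, jumps[take]
```

For example, `step(OP_NAMES.index("ADD"), 9, 8, 1, (0x10, 0x20))` returns `(17, 0x10)`: the low 4 bits of 17 are 1, so the hypothetical "zero" condition fails and the first jump address is taken.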

Since this is a bizarre architecture, here is the ISA spec and a CLC2 solution in 4 KiB as a proof of concept.
Obvious spoilers: the CLC2 solution is very similar to the normal one.

keen marsh

RAMPU216
More memory-efficient, to fit CLC2 into only 512 B of memory.

It uses 16-bit vectors of eight 2-bit values.

The 16-bit instructions are mainly designed to:

  • read 2 registers
  • vectorwise compute 2 results
  • possibly shift one result by 2 bits
  • save into 2 registers
  • possibly repeat if the shift register is not zero
    This allows a nice implementation of 16-bit ADD, SUB, and CMP in a single instruction.
    It takes several ticks, but for unsigned values the shift register quickly becomes zero.
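
The "repeat while the shift register is non-zero" idea can be modelled as a carry-save adder over the eight 2-bit lanes. This Python sketch assumes a per-lane LUT that returns (sum mod 4, carry), with the carries shifted up by one lane (2 bits) and fed back as the second operand until they are all zero; the names `ADD_LUT`, `lanewise_add` and `add16` and the exact loop are hypothetical, not the RAMPU216 encoding.

```python
# Hypothetical model of a RAMPU216-style 16-bit ADD built from 2-bit lanes.
# Each pass looks up every lane independently (no carry propagation), then
# shifts the carries one lane up and repeats while they are non-zero.

LANES = 8
MASK16 = 0xFFFF

# Per-lane LUT: index (a << 2) | b -> (sum mod 4, carry bit)
ADD_LUT = [((a + b) & 0b11, (a + b) >> 2) for a in range(4) for b in range(4)]

def lanewise_add(x: int, y: int) -> tuple[int, int]:
    """One vectorwise pass: returns (lane sums, lane carries), both 16-bit."""
    sums = carries = 0
    for lane in range(LANES):
        a = (x >> (2 * lane)) & 0b11
        b = (y >> (2 * lane)) & 0b11
        s, c = ADD_LUT[(a << 2) | b]
        sums |= s << (2 * lane)
        carries |= c << (2 * lane)
    return sums, carries

def add16(x: int, y: int) -> tuple[int, int]:
    """Repeat passes until the shifted carries are zero.

    Returns (16-bit sum, number of passes).
    """
    passes = 0
    while True:
        x, carries = lanewise_add(x, y)
        passes += 1
        # Shift carries into the next lane; a carry out of the top lane is dropped.
        y = (carries << 2) & MASK16
        if y == 0:
            return x, passes
```

`add16(0x1234, 0x0101)` needs one pass because no lane produces a carry, while `add16(0x00FF, 0x0001)` needs five because the carry ripples through four lanes, so in this model the tick count grows with the longest carry chain.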

And SSDPU216, for completeness.

coarse flicker

so does this basically just rely on a bunch of LUTs?

keen marsh
(replying to coarse flicker)

The first one is entirely LUTs.

RAMPU216 still mostly is. All operations, jump conditions, etc. are done via 96 B of LUTs.
Then there is 8 B of scratch space, which is essentially a temporary LUT for selecting the next IP, etc.
And there are a few cases where I emulated NOR/NOT by reading from invalid memory ((bit_to_negate << 16) | address_with_set_bit), but those could have been LUTs as well.
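
The invalid-read trick can be sketched like this. It assumes that a read outside the 16-bit address space returns 0; with a 1 stored at some address, placing the input bit(s) above bit 15 reads back 1 only when every input is 0, which is NOR (and NOT for a single input). `Ram`, `ONE_ADDR`, `not_gate` and `nor_gate`, and giving each NOR input its own high bit, are hypothetical choices for illustration.

```python
# Hypothetical model of NOT/NOR via reads from invalid memory.
# Assumption: reading an address >= 2**16 is invalid and yields 0.

ADDR_BITS = 16

class Ram:
    def __init__(self) -> None:
        self.data = bytearray(1 << ADDR_BITS)

    def read(self, addr: int) -> int:
        if addr >> ADDR_BITS:   # any bit set above the 16-bit range
            return 0            # assumed behaviour of an invalid read
        return self.data[addr]

ram = Ram()
ONE_ADDR = 0x0042               # hypothetical address holding a set bit
ram.data[ONE_ADDR] = 1

def not_gate(bit: int) -> int:
    """(bit_to_negate << 16) | address_with_set_bit: valid only if bit == 0."""
    return ram.read((bit << ADDR_BITS) | ONE_ADDR)

def nor_gate(*bits: int) -> int:
    """Each input gets its own bit above bit 15; any 1 makes the read invalid."""
    high = 0
    for i, bit in enumerate(bits):
        high |= bit << (ADDR_BITS + i)
    return ram.read(high | ONE_ADDR)
```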

coarse flicker

presumably this won't be very cheap after scoring gets fixed, but it is cool

keen marsh
coarse flicker

pipelined RAM?

keen marsh
(replying to coarse flicker)

Maybe. The main problem with RAM is that writes are always sequential.
At minimum you need to write the IP, one register, and memory, which is already 15 delay before doing anything useful.
So at best you could get roughly 3 times less delay with a long pipeline.

I definitely won't try it before the score change

And the pipelining state won't fit into 64 B, which is the current maximum width of a single write, so a few more writes would be necessary, making it unusable.
Maybe once we can write more at once.

keen marsh
(replying to keen marsh: "Depends, if the cost will be 1 gate per byte, it is still only 512 gate score fo...")

In a few places I needlessly depended on a later value when an earlier one works just as well.
And minor pipelining: updating not only the IP, but also writing the instruction associated with the given IP next to it.

Another 10 delay can be saved by reorganizing the LUTs and using another 16 B for LUTs, so I don't need the decompression step.
So 50 delay is achievable with this design: still not great, but close to the best that can be done with RAM.

fluid stump
scenic scroll

wow