#0

faint orchid
#

So like HBM3 dies?

#

Except not expensive

#

Don't really see how regular ram would help

#

Or like M2?

heady eagle
#

L2 and L3 caches are getting bigger, and with tiles/chiplets they will continue to grow. Intel and other semiconductor design companies have considered and tried this. It just doesn't make sense to put DRAM on package except for very specific workloads and use cases.

faint orchid
#

Apple M2, not the M.2 form factor

#

Sorry for confusion

rugged fjord
#

That’s not really how caches work…also not sure why this is in community support.

faint orchid
#

No worries, should have specified

#

But yeah, it has on-substrate memory

#

Just asked if you meant similar to it

rugged fjord
#

Even going back to when some CPUs had eDRAM on them, it wasn't massively faster than the standard DRAM of the time. I think it was like ~50GB/s when DDR4 was 34GB/s in dual channel.
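For reference, the 34GB/s dual-channel figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the message doesn't name a speed grade, so DDR4-2133 (2133 MT/s on a 64-bit channel) is an assumption here, and `peak_bandwidth_gbs` is a made-up helper name.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bits, channels):
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

# Assumption: DDR4-2133, 64-bit channels, dual channel.
ddr4_dual = peak_bandwidth_gbs(2133, 64, 2)
print(f"DDR4-2133 dual channel: {ddr4_dual:.1f} GB/s")

# Ratio of the ~50GB/s eDRAM figure quoted above to that result.
edram_gbs = 50
print(f"eDRAM advantage: {edram_gbs / ddr4_dual:.2f}x")
```

So under those assumptions the eDRAM had well under a 2x bandwidth edge, which fits the "not massively faster" point.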

#

It would likely take a complete arch change to support an L4/victim cache again.

faint orchid
#

Not all Arm chips do, just small SoCs where the entire system memory is on substrate

#

Yeah I know you're just describing a SoC

rugged fjord
#

Megabandwidth + low latency = cache...which is why it's expensive.

#

The higher the bandwidth, the worse the latency gets if you want it at commodity prices.

#

Apple's memory subsystem feeds both the CPU and GPU cores, which is why they get ~300GB/s memory bandwidth, but it comes at the cost of locking memory configurations into SKUs and limiting the total addressable market you can sell to. Intel would struggle to pick the winning CPU configurations to meet market demands.

#

Also, if your memory is on package, that means you aren't able to route out additional memory lanes without sacrificing something else.

#

As an example, you would have to nearly double the memory controller domain.

heady eagle
#

The big thing with tiles is you want it to act like a single die.

faint orchid
#

Desktops usually use more lanes than that

#

Mobile maybe

heady eagle
#

It's not as simple as you are making it seem, JNT

faint orchid
#

Would be very different from the current design

#

All SoCs I know of have all RAM on substrate and none outside

#

No odd hybrid

#

Either way I don't really see the point of it or the advantages

rugged fjord
#

As a single point of contention: if you could make a 1GB GDDR6X die instead of a 2GB die, you would get roughly 90GB/s maximum throughput out of it, but the latencies would be god-awful. At a 256MB address, DDR5 typically sits around 90ns while GDDR6X sits around 270ns. So the memory manufacturers would have to do some magic to make this work as well.
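A rough sanity check on those numbers, as a sketch only. The 32-bit per-device interface and the 21 and 22.4 Gbps per-pin rates are assumptions (the message gives neither), and `device_bandwidth_gbs` is a hypothetical helper; the latency figures are the ones quoted above.

```python
def device_bandwidth_gbs(gbps_per_pin, data_pins=32):
    """Peak throughput of one memory device in GB/s (assumes a 32-bit interface)."""
    return gbps_per_pin * data_pins / 8

# Assumed GDDR6X per-pin data rates, in Gbps.
for rate in (21.0, 22.4):
    print(f"{rate} Gbps/pin -> {device_bandwidth_gbs(rate):.1f} GB/s per device")

# Latency figures quoted in the message, in nanoseconds.
ddr5_latency_ns = 90
gddr6x_latency_ns = 270
latency_penalty = gddr6x_latency_ns / ddr5_latency_ns
print(f"GDDR6X latency penalty vs DDR5: {latency_penalty:.0f}x")
```

Under those assumptions a single device lands in the 84 to 90GB/s range, but at about three times the DDR5 latency, which is the trade-off being described.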

atomic walrus
#

1 GB nano RAM on consumer CPU (not cache, sorry)

heady eagle
#

It is all cache all the way down to bulk storage

#

Just a matter of bandwidth and latency

atomic walrus
#

4 GB nano RAM on CPU (not cache, sorry)