L2 and L3 caches are getting bigger, and with tiles/chiplets they will continue to grow. Intel and other semiconductor design companies have thought about this and tried it. It just doesn't make sense to put DRAM on package except for very specific workloads and use cases.
That’s not really how caches work…also not sure why this is in community support.
JNT has lots of business suggestions for Intel and leaves them here
No worries, should have specified
But yeah it has on substrate memory
Just asked if you meant similar to it
Even going back to when some CPUs had eDRAM on them, it wasn't massively faster than the standard DRAM of the time. I think it was like ~50GB/s when DDR4 was 34GB/s in dual channel.
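For anyone wondering where a figure like 34GB/s comes from, here is a rough sketch of the usual peak-bandwidth arithmetic. It assumes the DDR4 number above refers to DDR4-2133 with standard 64-bit channels; the function name is made up for illustration, and real sustained bandwidth is lower than this theoretical peak.

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits, channels):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9

# Assumption: DDR4-2133, 64-bit channel width, two channels.
ddr4_dual = peak_bandwidth_gbs(2133, 64, 2)
print(f"DDR4-2133 dual channel: {ddr4_dual:.1f} GB/s")

# The ~50 GB/s eDRAM figure is taken from the post above, not computed here.
edram_gbs = 50.0
print(f"eDRAM advantage: {edram_gbs / ddr4_dual:.2f}x")
```

So under those assumptions the eDRAM was only about one and a half times the peak bandwidth of plain dual-channel DDR4.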
It would likely take a complete arch change to support an L4/victim cache again.
Not all Arm chips do, just small SoCs where the entire system memory is on substrate
Yeah I know you're just describing a SoC
Megabandwidth + low latency = cache...which is why it's expensive.
The higher the bandwidth, the worse the latency gets if you want it at commodity prices.
Apple's subsystem feeds both the CPU and GPU cores, which is why they get ~300GB/s memory bandwidth, but it comes at the cost of SKUing and limiting your total addressable market. Intel would struggle to pick the winning CPUs to meet market demand.
Also, if your memory is on package, that means you aren't able to route out additional memory lanes without sacrificing something else.
As an example, you would have to nearly double the memory controller domain.
The big thing with tiles is you want it to act like a single die.
It's not as simple as you are making it seem, JNT
Would be very different to current design
All SoCs I know of have all RAM on substrate and none outside
No odd hybrid
Either way I don't really see the point of it or the advantages
As a single point of contention, if you could make a 1GB GDDR6X die instead of a 2GB die, you would get roughly 90GB/s maximum throughput out of it, but the latencies would be god awful. At a 256MB address, DDR5 typically sits around 90ns, while GDDR6X sits around 270ns. So the memory manufacturers would have to do some magic here as well.
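To put those latency numbers in context, here is a rough sketch of the textbook average-access-time estimate (cache latency plus miss rate times memory latency). The 90ns and 270ns figures are the ones quoted above; the 10ns cache latency and 95% hit rate are made-up placeholders, not measurements of any real CPU.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Average access time: every access pays cache_ns, misses also pay memory_ns."""
    return cache_ns + (1.0 - hit_rate) * memory_ns

# Hypothetical placeholders, chosen only for illustration.
HIT_RATE = 0.95
CACHE_NS = 10.0

# Memory latencies as quoted in the post above.
ddr5_avg = average_access_ns(HIT_RATE, CACHE_NS, 90.0)
gddr6x_avg = average_access_ns(HIT_RATE, CACHE_NS, 270.0)

print(f"DDR5 backing memory:   {ddr5_avg:.1f} ns average")
print(f"GDDR6X backing memory: {gddr6x_avg:.1f} ns average")
```

Even with most accesses hitting cache, the slower backing memory still drags the average up noticeably, and the gap widens quickly as the hit rate drops.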
1 GB nano RAM on consumer CPU (not cache, sorry)
It is all cache all the way down to bulk storage
Just a matter of bandwidth and latency
4 GB nano RAM on CPU (not cache, sorry)