#Sega Dreamcast
skmp started adding support for "Fastmem" a while ago, but I couldn't get it to compile / work.
I think it's possible to get the software emu running fast enough for playable speeds on the FPGA side, but we'll see.
oic, skmp did already get fastmem into the code, but I don't think I knew how to tweak the mmap stuff to work with it...
Oh great, most of the files were on my other PC. I'll have to figure out how to cross-compile for ARM again. lol
@pseudo tinsel do you have an idea why ElectronAsh was struggling with the fastmem compilation?
Any hints?
Honestly can't remember where we got up to, but I've been scrolling back through my DMs with skmp.
I think the emu was essentially "running" with fastmem support, but I hadn't figured out how to hook up the mmap stuff again, so it can talk to the PVR thing on the FPGA side.
(I'm wary of calling it a "core" atm. lol)
The Discord search feature is rather for the extreme tech crowd 😁
I get a weird feeling I need to set up something else, to cross-compile for ARM.
Like I do for Main MiSTer, but we'll see.
Can't believe how much stuff I've forgotten, in only about six months.
export PATH=$PATH:/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin
export CC='/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-gcc'
cmake -DCMAKE_TOOLCHAIN_FILE=tc.cmake ..
Without that stuff, it will basically compile the emu for your host system, so it just compiled it for x86 / Linux. lol
i do not recall anything at this point
but fastmem is trickier
Ahh, you have to delete the CMakeFiles folder, if it was already configured previously (to compile for x86, not ARM).
Then run the thingy again.
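A quick way to confirm the cross-compile really targeted ARM (and didn't silently fall back to the host) is to have the binary report its own target. This is only a sketch: `target_arch()` is a made-up helper, not something in reicast/minicast, and it relies on the usual GCC/Clang predefined macros.

```cpp
#include <string>

// Hypothetical helper: reports which architecture the compiler targeted,
// based on GCC/Clang predefined macros.
std::string target_arch() {
#if defined(__arm__)
    return "arm (32-bit)";   // what the arm-none-linux-gnueabihf toolchain should give
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";         // a stale CMake cache would typically land here
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}
```

Printing this once at startup would show straight away whether the old x86 configuration was still being used.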
My brain has already gone to mush.
Some older code.
Call to mmap_setup() used to be here, in _vmem.cpp...
Well there's a surprise. lol
Missing some Linux header stuffs.
Ok, that was just because I uncommented the mmap stuff in mister_if.cpp, but it got moved into _vmem.cpp a while ago.
Ok, that's strong deja vu.
I added the pvr_regs.h include to mister_if, and fixed the other errors.
And I'm starting to remember that this shm_unlink thing was quite hard to fix.
ChatGippity...
shm_open() and shm_unlink() live in librt on ARM Linux toolchains.
Your code compiled fine, but the linker was not told to link -lrt.
In CMakeLists.txt...
target_link_libraries(${APP_NAME}.elf PRIVATE stdc++ rt)
Added the "rt" part.
Removed it again.
ChatGippity gave a way to fix the _vmem thing, without using posix stuff.
Because it said "MiSTer doesn't need it", whatever that means. lol
It also says to NOT try to "FastMEM" the ARM/FPGA shared mem, but I don't think it quite knows what we were trying to do.
It did compile, at least.
In posix_vmem.cpp...
// Allocates memory via a fd on shmem/ashmem or even a file on disk
/*
static int allocate_shared_filemem(unsigned size) {
    int fd = -1;
    #if defined(_ANDROID)
    // Use Android's specific shmem stuff.
    fd = ashmem_create_region(0, size);
    #else
        #if HOST_OS != OS_DARWIN
        fd = shm_open("/dcnzorz_mem", O_CREAT | O_EXCL | O_RDWR, S_IREAD | S_IWRITE);
        shm_unlink("/dcnzorz_mem");
        #endif
        // if shmem does not work (or using OSX) fallback to a regular file on disk
        if (fd < 0) {
            string path = get_writable_data_path("/dcnzorz_mem");
            fd = open(path.c_str(), O_CREAT|O_RDWR|O_TRUNC, S_IRWXU|S_IRWXG|S_IRWXO);
            unlink(path.c_str());
        }
        // If we can't open the file, fallback to slow mem.
        if (fd < 0)
            return -1;
        // Finally make the file as big as we need!
        if (ftruncate(fd, size)) {
            // Can't get as much memory as needed, fallback.
            close(fd);
            return -1;
        }
    #endif
    return fd;
}
*/
// MiSTer does not use POSIX shared memory or file-backed fastmem.
// We return a dummy fd and use MAP_ANONYMOUS instead.
static int allocate_shared_filemem(unsigned size)
{
    (void)size;
    return -1; // signals "no fd backing"
}
Not quite sure what that does, and what it might break.
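For reference, this is roughly what "no fd, use MAP_ANONYMOUS" means on Linux. It's a minimal sketch, not the actual _vmem / posix_vmem code; `alloc_anonymous()` and `free_anonymous()` are hypothetical names. One likely consequence (an assumption, based on how fd-backed fastmem usually works): two anonymous mappings never share pages, so any RAM mirror that depends on mapping the same fd at several addresses can't be built this way.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reserve `size` bytes of zero-filled memory with no
// file descriptor behind it. Returns nullptr on failure.
uint8_t* alloc_anonymous(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? nullptr : static_cast<uint8_t*>(p);
}

// Release a region obtained from alloc_anonymous().
void free_anonymous(uint8_t* p, std::size_t size) {
    if (p) munmap(p, size);
}
```

Writing through one such mapping is never visible through another, which is the difference from the shm_open() path in the commented-out code.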
Oww. Stupid server bot. lol
It culled the "links", but those were just my notes, for getting the software emu to run.
Shove it, Server Turtle. lol
Fastmem does help — but not enough on MiSTer by itself
Fastmem mainly accelerates:
SH4 RAM loads/stores
Instruction fetch
Texture cache reads
Tight interpreter/JIT loops
On desktop x86:
Big TLB
Aggressive OoO
Fast branch prediction
Fastmem can give 20–40% speedups.
On MiSTer’s Cortex-A9, Fastmem gains are limited
MiSTer specs (DE10-Nano):
ARM Cortex-A9 (dual-core, but single core used)
Out-of-order, but only dual-issue with a small reorder window
Small TLB
Weak branch predictor
Far less speculation than a desktop x86
What that means:
Fastmem saves function calls ✔
But memory latency dominates ❌
Page table walks are expensive
mprotect / guard pages hurt
So even with:
PVR disabled
No rendering
Full Fastmem
You still won’t hit full speed in many titles.
Fastmem on MiSTer: what it can realistically give you
From real-world measurements on A9-class ARM cores:
Change | Speedup
Fastmem vs safe mem | ~10–20%
Disable bounds checks | ~5–10%
Inline load/store helpers | ~5%
Remove virtual calls | ~5%
Stacked:
~20–30% total
Helpful, but not transformational.
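To make the "Fastmem saves function calls" point concrete, here is a toy comparison of the two access styles. It is not reicast's code: `Memory`, `read32_slow()` and `read32_fast()` are invented for illustration, only the 16 MB main RAM in area 3 is modelled, and addresses are forced to 4-byte alignment to keep the sketch safe.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t RAM_SIZE = 16u * 1024 * 1024;      // 16 MB main RAM
constexpr uint32_t RAM_MASK = (RAM_SIZE - 1) & ~3u;   // wrap + force 4-byte alignment

// Toy memory model (illustrative only).
struct Memory {
    std::vector<uint8_t> ram = std::vector<uint8_t>(RAM_SIZE);
};

// "Safe" path: every access decodes the region before touching memory.
uint32_t read32_slow(const Memory& m, uint32_t addr) {
    uint32_t area = (addr >> 26) & 7;   // SH4 area bits
    if (area == 3) {                    // area 3 = system RAM (0x0C000000)
        uint32_t v;
        std::memcpy(&v, &m.ram[addr & RAM_MASK], sizeof v);
        return v;
    }
    return 0xFFFFFFFFu;                 // other areas are not modelled here
}

// Fastmem-style path: the caller already knows the host base pointer,
// so a load is just base + masked guest address.
uint32_t read32_fast(const uint8_t* ram_base, uint32_t addr) {
    uint32_t v;
    std::memcpy(&v, ram_base + (addr & RAM_MASK), sizeof v);
    return v;
}
```

A real fastmem setup goes further (it maps the whole guest address space and lets a fault handler catch non-RAM accesses), but the saving per access is the same idea: no dispatch, just pointer arithmetic.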
fastmem brings around 2x perf
I'd consider it pretty "transformational" if it makes it even 20% faster. lol
Nice.
I need to try it, if I can remember how to even get it all running.
My MiSTer box is kinda hooked up to a projector atm, but at least it's a laser projector, with no evil lamps.
I'm having a hard time remembering what we even did / tested last time. Even looking back on old DMs.
But I'm slowly getting there.
lol. ffs
This is the old one, that was on the SD card from many months ago...
Aaaaand... the latest code that I just screwed up...
The older fastmem version seems a LOT faster, even just the way the text scrolls past at the start.
If that second value is the "FPS", then the older code looks fine.
OK, so. I can’t tell which of the two older versions still worked with the crusty PVR “core”.
But apparently it’s not the core that’s in the “DC_MiSTer” folder on this PC. lol
I’ll have a look on the other PC.
Found the old backups, and can see how it works again now.
Shoved it onto Mega.
Sorry for the rushed video.
I was trying to remember how we set up the controls. skmp made the controls work via the ssh console. lol
It renders faster than I remember, but it looks like frame sync isn’t enabled atm.
This is also the non-fastmem reicast.
OK, yeah, the old fastmem version doesn't trigger frame rendering, so that's where we were up to last time.
Arrows on I,J,K,L
A = Keyboard V.
B = Keyboard C.
Yes, yes, I know it's crappy. lol
It's doing about 3-7 FPS.
How the emu triggers a frame render on the FPGA side...
case (ra_state)
0: begin
    draw_last_tile <= 1'b0;
    // Triggered from the MiSTer menu, or after a PVR dump load...
    if (ra_trig_reg) begin
        ra_trig_reg <= 1'b0;
        ra_state <= 8'd1; // Start rendering the FRAME.
    end
    else begin // Else, keep polling for the magic number in DDR3 / PVR regs copy...
        trig_pvr_update <= 1'b1; // Trigger the PVR reg update.
        ra_state <= 8'd200;
    end
end
200: if (!trig_pvr_update && !pvr_reg_update) begin
    if (TEST_SELECT != 32'h00000000) ra_state <= 8'd1; // Start rendering the FRAME.
    else ra_state <= 8'd0;
end
The "TEST_SELECT" PVR register isn't likely to be used by any commercial games.
Then it all gets triggered from the reicast/minicast emu, in mister_if.cpp...
// Write the magic number to the evil PVR register. ;)
//
// parameter TEST_SELECT_addr = 16'h0018; // RW Test - writing this register is prohibited.
//
pvr_regs[ 0x18 ] = 0xCA;
pvr_regs[ 0x19 ] = 0xFE;
pvr_regs[ 0x1a ] = 0xBA;
pvr_regs[ 0x1b ] = 0xBE;
// Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
memcpy(vram+offs_8meg, pvr_regs, pvr_RegSize);
// Trigger the PVR reg copy on the core, then render the frame...
// (the core should then clear these to 0x00, after the frame is rendered.)
vram[ (offs_8meg-8)+0 ] = 0xCA;
vram[ (offs_8meg-8)+1 ] = 0xFE;
vram[ (offs_8meg-8)+2 ] = 0xBA;
vram[ (offs_8meg-8)+3 ] = 0xBE;
vram[ (offs_8meg-8)+4 ] = 0xCA;
vram[ (offs_8meg-8)+5 ] = 0xFE;
vram[ (offs_8meg-8)+6 ] = 0xBA;
vram[ (offs_8meg-8)+7 ] = 0xBE;
I barely remember doing any of this.
So my challenge for the next week - Try to get some of the fixes from the sim version into the DC "core".
So the graphics don't look as crap.
That's if the new stuff will even fit.
I basically re-use a single instance of the interpolation block now, to pre-calc some stuff at the start of each polygon (per-tile).
Like the ARGB and Gouraud shading stuff.
Which currently takes 7 clock cycles, which isn't too bad, but saves on a ton of logic.
Currently using 83% of the FPGA logic, and 47% BRAM.
Which is a lot.
= very inefficient atm.
DE10 is too weak sauce.
Can't even add the updated intri_calc, nor the interp module.
Awake again. Only got about 40 minutes sleep. sigh
I tried tweaking the interp module last night, to save on logic for some dividers. It didn’t work. lol
Quartus used exactly the same amount of logic for that module, whether it was a combinational assignment, or a clocked / register assignment.
So it’s going to be tough to get the renders to look any better atm. A few more tricks to try.
Added the newer code for the texture address module / colour blender, and saved a few percent logic.
But now the core is unstable once again, and only renders about ten frames before freezing. It’s a marginal timing thing.
I think it looks better overall, but hard to tell unless I can get it into a game.
Ok, it’s just as crusty as before. lol
So I next need to fix the real issue with reading from DDR3, because it's clearly corrupting a lot of stuff.
For reference, a typical Taxi render looks like this in the sim atm...
(part of the road missing, etc. because I have to break some things to start to fix / add others.)
I'm up to about the fifth Quartus recompile now.
Still not running yet. I just need to get to the next stage.
Quartus is taking about 35 minutes per compile, and there's no easy way to debug some of the trigger logic otherwise.
I can't even easily test it in Verilator, because the reicast/minicast emu runs on the ARM / DE10.
The turnaround time of Quartus compiles is often so long, it's what kills motivation.
Some of the time, I can't even remember what I changed last. lol
I'm basically trying to paste in parts of the newer code (from the sim version) into the core, to try to improve the renders.
And also hopefully speed it up quite a bit.
Then I can look again at getting the fastmem version of minicast working.
The current issue. ^
Isn't there some Windows running on the Dreamcast? How is that emulated? Just side loaded like a BIOS?
The ISP pulsed DDRAM_RD, so there was a read pending / it was waiting for the DDR controller to respond by asserting DDRAM_DOUT_READY, but it never does.
Not sure. I saw a vid yesterday about somebody running "Windows 98" on Dreamcast, but it was clickbait. lol
It was just an old KOS example, showing a GUI that was made to look like Windows 98.
I can't remember if any homebrew got actual Windows running on it. I don't think so?
The Windows CE logo on the Dreamcast is a bit of a misnomer, as it's not like it ever ran a full Windows CE OS.
Just parts of the API / SDK, to make it "easier" for devs to write games, AFAIK.
Man, waiting for Quartus is killing me. lol
Half the reason I built this 5900X with 32GB of RAM was to try to compile faster.
I often wonder how fast a Threadripper might do Quartus compiles, but then Quartus sucks for taking advantage of multi-threading.
Yeah, but parts of DirectX run actually on the Dreamcast?
Because some games need that?
I guess the DirectX thing is more like a translation layer, tbh. Not sure.
The non-Katana games?
ie. it just let the programmers use the same / similar set of API function calls, etc., to help port / write stuff?
But probably only a smaller subset of what a typical DirectX graphics card was capable of, even back then.
It's my main goal this week, to just get the renders looking better on MiSTer. Even if it's still slow.
It can only get faster, if I can implement some cache stuff, to do Burst Transfers from DDR3.
(I don't yet need to write back to DDR3, because my framebuffer is in the old-people SDRAM.)
The reicast/minicast emu basically writes all the params and textures into VRAM, like it normally would.
Then I trigger a render on the FPGA side, by writing to the PVR "TEST_SELECT" register.
The FPGA renders one frame, using the data already in VRAM.
Then it clears the TEST_SELECT register, just so it doesn't re-trigger the same frame.
Then it writes some stuff near the end of VRAM, to tell minicast to write the next stuff to VRAM, and that repeats.
I probably don't have it synced atm, meaning minicast will be constantly writing stuff to VRAM as fast as it can, even if the FPGA hasn't finished rendering the full frame.
Which is what causes the chonky vertical line gaps and other weirdness in the recent vid.
I purposely didn't have frame sync on, to allow minicast to run faster, else it was taking forever to actually get into a game.
void rend_start_render(u8* vram) {
    // kick off render
    //printf("rend_start_render\n");
    //SetREP(20 * 1000 * 1000); // in 20 mhz = 10 ms at 200 mhz
    SetREP(20); // in 20 mhz = 10 ms at 200 mhz
    // Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
    memcpy(vram+offs_8meg, pvr_regs, pvr_RegSize);
    // Write the magic number to the evil PVR register, to trigger a PVR update and Frame render ;)
    //
    // parameter TEST_SELECT_addr = 16'h0018; // RW Test - writing this register is prohibited.
    //
    // The new core should clear the TEST_SELECT reg (internally) after the frame is drawn.
    //
    pvr_regs[ 0x18 ] = 0xCA;
    pvr_regs[ 0x19 ] = 0xFE;
    pvr_regs[ 0x1a ] = 0xBA;
    pvr_regs[ 0x1b ] = 0xBE;
    // Trigger the PVR reg copy on the core, then render the frame...
    // (the core should then clear these to 0x00, after the frame is rendered.)
    /*
    vram[ (offs_8meg-8)+0 ] = 0xCA;
    vram[ (offs_8meg-8)+1 ] = 0xFE;
    vram[ (offs_8meg-8)+2 ] = 0xBA;
    vram[ (offs_8meg-8)+3 ] = 0xBE;
    vram[ (offs_8meg-8)+4 ] = 0xCA;
    vram[ (offs_8meg-8)+5 ] = 0xFE;
    vram[ (offs_8meg-8)+6 ] = 0xBA;
    vram[ (offs_8meg-8)+7 ] = 0xBE;
    */
    // flush vram contents here from cache
    // call out to hw to render
}
void rend_end_render() {
    // wait for render to end
    // interrupts get fired automatically
    //while ( emu_vram[ (offs_8meg-8)+0 ] != 0x00) {}
    FrameCount++;
    //printf("rend_end_render\n");
}
The core doesn't actually check for the "magic number" in the TEST_SELECT reg atm, it just triggers whenever the reg != 0.
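If frame sync gets switched back on, the emulator side of the handshake could look something like this. It's a sketch under assumptions: `arm_trigger()`, `frame_pending()` and `wait_for_frame()` are hypothetical names, the pointer is assumed to be whatever copy of the register the core actually clears, and the reads are assumed to bypass (or invalidate) the ARM cache. The wait is bounded so a hung core can't hang the emulator as well.

```cpp
#include <cstdint>

constexpr unsigned TEST_SELECT_OFFSET = 0x18;  // matches TEST_SELECT_addr in the Verilog

// Emulator side: arm the trigger by writing the magic bytes.
void arm_trigger(volatile uint8_t* regs) {
    regs[TEST_SELECT_OFFSET + 0] = 0xCA;
    regs[TEST_SELECT_OFFSET + 1] = 0xFE;
    regs[TEST_SELECT_OFFSET + 2] = 0xBA;
    regs[TEST_SELECT_OFFSET + 3] = 0xBE;
}

// The core only tests "!= 0", so any non-zero byte means a frame is pending.
bool frame_pending(const volatile uint8_t* regs) {
    for (unsigned i = 0; i < 4; ++i)
        if (regs[TEST_SELECT_OFFSET + i] != 0) return true;
    return false;
}

// Spin until the core has cleared the register, giving up after max_spins polls.
// Returns true if the frame finished, false on timeout.
bool wait_for_frame(const volatile uint8_t* regs, unsigned max_spins) {
    for (unsigned i = 0; i < max_spins; ++i)
        if (!frame_pending(regs)) return true;
    return false;
}
```

Calling `wait_for_frame()` from `rend_end_render()` would stop minicast from overwriting VRAM mid-frame, at the cost of the emulator stalling for as long as each render takes.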
Everything is very exciting though. Really looking forward to the performance gain. And how much it will be.
I don't think it will be that much faster yet. But yeah, anything would be an improvement. lol
The BIG gains won't happen until I can get some cache stuff working.
Even a more stable 10-15 FPS would be almost playable.
20 FPS is more playable than I think a lot of people realise. I mean, we had that on many N64 games. 😛
Stable 15 fps would be more than some N64 games had 😁
10-15 FPS isn't that far from 25 fps
(What PAL Resident Evil Code Veronica runs at I believe)
But it's always good to see you back ElectronAsh and coding away
Well, two minds like one 😁
lol
I still think the N64 was almost intentionally crippled, by Nintendo’s cost-saving.
I’m convinced SGI were aiming for 30 FPS in most games, at the full 640x480 (480i).
But they were forced to cram almost everything into the ONE chip, which was apparently one of the largest (in silicon area) for the 350nm (0.35 micron) process at the time.
(Steve Shepherd worked on the chip. He wrote a great article about the challenge they had trying to cram it all on one chip.)
The cost savings moved them to a B-tier supplier. The PSX annihilated them
But then I think I asked him on TwitterX about it years ago, and I think he said they always intended to use one chip, if possible, so maybe not.
It just seems like - if you are seriously struggling to fit a design within a specific chip area, what do you think they would look at culling first?…
Yep… the on-chip memory. lol
Hence the Texture cache ended up a measly 4KB.
If they had split the design across even two chips, and had separate Main RAM and VRAM, it could have been so much better.
Anyway, I guess it was often about getting the console “usable” back then, without it costing the company a larger fortune to build each one.
Sorry, I’m a bit bored. lol
I really hate this “limbo” stage of waiting for Quartus compiles, knowing that it probably still won’t work on the next try.
I wrote this article in 1996 when the Nintendo 64 was released. It doesn’t appear to be available on the internet any longer, so I’m reposting for historical purposes. I’ve worked on some amazing teams in my career, but this one was truly special.
If you read some of that article about how much they struggled, it sounds to me like the engineers would have just asked Nintendo / NEC if they could split the design.
And if it was that hard to make the design fit the RCP, it also seems like that’s a good reason why the Texture cache and some other stuff were so small.
Anyway, just a theory of mine. We’ll probably never know. lol
Sony sold two times as many PSX units as N64 and GameCube were sold combined.
Yep, and even as a big fan of the N64, even I would admit that the PSX graphics have actually held up quite well, especially in racing games etc.
Because... frame rate matters. lol
A higher frame rate can make up for a LOT of other issues.
I was never a fan of the wobbly graphics of the PSX, although it was amazing to see in 1995 or so, before most of us had a Voodoo 1 or whatever.
I felt like N64 was the first to at least do 3D "properly".
ie. non-wobbly meshes, and perspective-correct textures.
So it felt more "solid" and grounded, in a way.
But it was always a bit too blurry. lol
Years later, we find out that the final "VI Blur" was a kind of leftover from when it runs in 640x480 mode.
Since the last linear interp of the VI seems to be intended for 640 pixels per line.
When most games only rendered at 320x240, that linear interp would basically cause it to blur every-other pixel.
Having said all that, it looked a LOT better on a half-decent CRT TV than it does on most modern TVs.
(unless you have the money for a RetroTink 4K or similar)
My current MiSTer setup, btw. 😉
It’s a long story. lol
I had to just hit Ctrl-Z on Quartus, to undo my last changes.
And try to get back to where it was before, just with a slightly updated ISP parser.
Also compiling the older minicast code, without fastmem. So I have a basis to work from.
Woah. I forgot there was a time when it didn’t and just thought of the VMU beep as a part of the boot up process in my mind. Heh.
lol
True.
I don't remember ever replacing the battery in mine.
I had the light gun, for HOTD2 etc.
I think I had the fishing rod, too, but I wasn't a huge fan of fishing.
I wish we'd known at the time, that most GD drives could be made a lot quieter, just by adding some silicone grease to the laser sled gears.
OK, I think the issue with the core freezing, is actually a side-effect of me trying to speed up the rendering.
As it starts to do more read requests from DDR3, it was causing DDRAM_BUSY to go High.
Which I did have some checks for, but it was likely getting stuck in a kind of race condition.
Last compile took 40 minutes, so it's painfully slow to test things.
What I should really do, is try to write a proper sim for the DDR interface.
Including the gaps in non-contiguous bursts, write support, variable latency, etc.
Man, I've forgotten SO much about all of this.
It's not even using the standard SDRAM any more. lol
The framebuffer is in DDR3, alongside normal VRAM.
(kind of closer to what the DC does.)
I think the freezing is caused by me doing too many read or write requests, without checking to see if DDRAM_BUSY is Low.
I do some checks for it, but my logic is still not quite right, so sometimes it will send too many Writes, causing the DDR controller to lock up.
But at least it might not be core instability, as such. It's just that each time I get it running faster, it's doing more DDR requests, so failing sooner.
Yep, it's getting stuck in ra_state 13.
13: if (tile_accum_done) begin
    if (opb_word[31:29]==3'b111 || poly_drawn) begin
        ra_state <= 8'd9; // Read the next Prim TYPE entry.
    end
end
Either waiting for tile_accum_done, or poly_drawn, or both.
tile_accum_done comes from the ISP parser...
// Write pixel to Tile ARGB buffer.
56: if (!vram_wait) begin
    fb_addr <= my_addr[23:1];
    fb_writedata <= {pix_565, pix_565, pix_565, pix_565};
    fb_byteena <= (both_buff) ? ((!FB_W_SOF1[22]) ? 8'b11110000 : 8'b00001111) : 8'b11111111;
    fb_we <= 1'b1;
    //if (!vram_wait) begin
    if (y_ps[4:0]==5'd31 && x_ps[4:0]==5'd31) begin // Last pixel written...
        tile_wb <= 1'b1;
        tile_accum_done <= 1'b1; // Tell the RA we're done.
        isp_state <= 8'd0; // Back to idle state
    end
    else begin // Not on the last Tile pixel yet...
        x_ps[4:0] <= x_ps[4:0] + 5'd1;
        if (x_ps[4:0]==5'd31) y_ps[4:0] <= y_ps[4:0] + 5'd1;
        isp_state <= 8'd51; // Jump back.
    end
    //end
end
I moved the "if (!vram_wait)" to the top earlier.
So that it couldn't spam the DDR controller with the fb_we pulses, without checking for vram_wait (Low) first.
But that didn't help. lol
The tile_accum_done flag, was from when I was starting to implement the proper Tile Accumulation buffer.
ie. one of the internal buffers on PVR, which the final RGB pixels get rendered to.
A completed tile would then get written back to VRAM (on the DC) via Burst transfer. It's the only method that really makes sense, so it can render fast enough.
Quartus is so SLOOWWWW. lol
// Wait for current poly to be written to the Tag buffer in the ISP.
12: begin
    if (poly_drawn) begin
        if (ra_cont_last) draw_last_tile <= 1'b1;
        ra_vram_addr <= ra_vram_addr + 4; // Go to next WORD in OL.
        ra_vram_rd <= 1'b1;
        ra_state <= 10;
    end
end
// Wait for the ISP/TSP to render the final pixels to the tile.
13: if (tile_accum_done) begin
    //if (opb_word[31:29]==3'b111 || poly_drawn) begin
    ra_state <= 8'd9; // Read the next Prim TYPE entry.
    //end
end
Commented out some stuff. Works in the sim (still lets it run, and doesn't break stuff), now to see if it works on the FPGA.
Many hours gone, and I'm pretty much back as I was before, but trying to kill this bug.
'cos I can't have it freezing after a handful of frames. Can't make progress that way.
You can see the "render_poly" and "poly_drawn" flags there.
I should rename those, because that doesn't actually render the final pixels.
It just registers the Tags for the current polygon (triangle) in the Tag Buffer, assuming any pixels of the triangle pass the inTri and depth_compare tests.
Once all of the triangles in the current tile have been processed into the Tag buffer, the "render_to_tile" signal is what kicks off the final pixel output.
atm, the final pixel colours get written directly to the framebuffer in DDR3.
I didn't get far enough with implementing the proper Tile Accumulation Buffer before, due to all of the other issues.
You can see there are three clock ticks between each DDRAM_WE (write) pulse, though.
Multiply that by the number of pixels on the whole screen (640x480 = 307,200), you can imagine why it's so slow atm.
At 100 MHz, it would be able to write to all pixels on the screen within 3.072 milliseconds.
(not including the rest of the ISP/TSP processing to actually render the tiles first.)
Since it's taking basically four clock cycles between each Write, it would take 12.288 milliseconds to update the entire screen.
Which would obviously massively impact the overall frame rate, once you include all of the other processing.
So the (eventual) goal, is to try to output a new pixel value on EVERY clock tick.
Thinking more about it, that only leaves around 13.59 milliseconds for the PVR to do all of the other processing for each frame, if it is to hit 60 FPS.
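The numbers above can be reproduced with a couple of one-liners. The helper names `frame_write_ms()` and `frame_budget_ms()` are made up for this sketch; the inputs (640x480, 100 MHz, one or four clocks per write, 60 FPS) are the ones quoted in the messages.

```cpp
// Time to write every pixel of a frame once, in milliseconds.
constexpr double frame_write_ms(unsigned width, unsigned height,
                                unsigned clocks_per_pixel, double clock_hz) {
    return (static_cast<double>(width) * height * clocks_per_pixel) / clock_hz * 1000.0;
}

// Total time available per frame at a given frame rate, in milliseconds.
constexpr double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}
```

With one write per clock, `frame_write_ms(640, 480, 1, 100e6)` gives 3.072 ms; with four clocks per write it gives 12.288 ms. Subtracting the one-per-clock figure from `frame_budget_ms(60.0)` (about 16.67 ms) leaves roughly 13.59 ms for everything else.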
Fairly sure I just found the REAL bug. sigh
In the RA parser, I wasn't checking to see if the ISP was in the idle state before pulsing render_to_tile.
And the ISP will only see the render_to_tile pulse in the idle state.
Which means it was never kicking off the render.
Which in turn means the ISP was never pulsing tile_accum_done when done, which was locking up the RA.
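The lost-pulse failure is easy to reproduce in a tiny software model, which can be quicker than waiting on a 40-minute compile. This is only an illustration in C++, not the Verilog: `Isp` and `run()` are invented, and the ISP is reduced to "busy for N ticks, then idle and willing to sample render_to_tile".

```cpp
// Toy ISP: ignores render_to_tile unless it is idle (illustrative model only).
struct Isp {
    int busy_cycles = 0;    // >0 while still finishing the previous job
    bool started = false;   // set once a render request has been accepted
    bool idle() const { return busy_cycles == 0 && !started; }
    void tick(bool render_to_tile) {
        if (busy_cycles > 0) { --busy_cycles; return; }  // pulse is not sampled here
        if (render_to_tile) started = true;
    }
};

// Simulate `cycles` clock ticks with the RA sending ONE single-cycle pulse.
// wait_for_idle=false models the bug; true models the fix.
// Returns true if the ISP accepted the request.
bool run(int isp_busy_cycles, bool wait_for_idle, int cycles) {
    Isp isp;
    isp.busy_cycles = isp_busy_cycles;
    bool pulsed = false;
    for (int c = 0; c < cycles; ++c) {
        bool pulse = false;
        if (!pulsed && (!wait_for_idle || isp.idle())) {
            pulse = true;
            pulsed = true;
        }
        isp.tick(pulse);
    }
    return isp.started;
}
```

`run(3, false, 10)` returns false (the pulse lands while the ISP is busy and is never seen), while `run(3, true, 10)` returns true. Holding the request high until it is acknowledged would be another way to get the same effect.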
I need to learn how to code this stuff better.
With a lot of help from ChatGippity, I got the fastmem version of reicast/minicast to compile and "run".
But with no rendering atm, because I had to roll back the Verilog again. sigh. lol
Fastmem does appear to let the emulator hit 60 FPS, at least in the BIOS menu.
Trying Daytona on it now, just to see what frame rate it hits. To be clear, there is no graphics rendering at all atm, until I get this next Quartus compile done.
And even then, it will be very slow.
AFAIK, the first value shown above is the frame time. So 16.66ms etc.
And the second value is the FPS.
I guess we can say the emulator still struggles a lot on MiSTer. lol
And that's without any kind of graphics atm.
It only hits about 15-20 FPS in Daytona.
MarioKart 64 port won't even run, it just crashes.
emu_vram = vram;
// Write the magic number to the evil PVR register. ;)
//
// parameter TEST_SELECT_addr = 16'h0018; // RW Test - writing this register is prohibited.
//
// Write magic into PVR regs (emulator-side)
pvr_regs[0x18] = 0xCA;
pvr_regs[0x19] = 0xFE;
pvr_regs[0x1A] = 0xBA;
pvr_regs[0x1B] = 0xBE;
// ---- DEBUG / BRING-UP: full VRAM copy ----
memcpy((void*)VRAM_BASE, (const void*)emu_vram, offs_8meg);
// Trigger the PVR reg copy on the core, then render the frame...
// (the core should then clear these to 0x00, after the frame is rendered.)
*(volatile uint32_t*)(VRAM_BASE + offs_8meg - 8) = 0xCAFEBABE;
*(volatile uint32_t*)(VRAM_BASE + offs_8meg - 4) = 0xCAFEBABE;
// Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
// Copy regs into FPGA-visible DDR
memcpy((void*)(VRAM_BASE + offs_8meg), pvr_regs, pvr_RegSize);
arm_cache_flush((void*)VRAM_BASE, (void*)(VRAM_BASE + HW_FPGA_VRAM_SPAN));
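One detail worth noting: the older snippet wrote the magic as bytes CA FE BA BE, while this one stores 0xCAFEBABE as a 32-bit word, which on the little-endian ARM ends up in memory as BE BA FE CA. Since the core only checks for non-zero it makes no difference today, but it would if the core ever compared against the exact value. A small sketch to show the difference (all three helper names are hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Byte-by-byte store, as in the older mister_if.cpp snippet.
void write_magic_bytes(uint8_t* p) {
    p[0] = 0xCA; p[1] = 0xFE; p[2] = 0xBA; p[3] = 0xBE;
}

// Single 32-bit store, as in the newer snippet (memcpy avoids alignment issues).
void write_magic_word(uint8_t* p) {
    const uint32_t v = 0xCAFEBABEu;
    std::memcpy(p, &v, sizeof v);
}

// True when the host stores the least significant byte first.
bool is_little_endian() {
    const uint16_t x = 1;
    uint8_t b;
    std::memcpy(&b, &x, 1);
    return b == 1;
}
```

On a little-endian host the two buffers come out byte-reversed relative to each other, so the FPGA side would need to pick one convention if it starts matching the magic number.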
Looks like we’ve been eating good in here recently!
I bet that sucker has the hardware to run a Dreamcast core (if it is ever unlocked)
Possibly, it does have quite fast DDR3, compared to MiSTer.
IIRC, DDR3 on MiSTer runs at 400 MHz.
ie. Two 16-bit chips in parallel, so 32-bits wide...
400 MHz * 2 (DDR) = 800 MT/s (mega-transfers per second).
800 MT/s * 4 (bytes per clock edge) = 3.2 GBytes/sec.
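The same peak-rate arithmetic as a reusable one-liner, for comparing other boards. `peak_gbytes_per_sec()` is just an illustrative helper; it assumes two transfers per clock (DDR) and ignores refresh, bank conflicts and controller overhead, so real throughput will be lower.

```cpp
// Theoretical peak DDR bandwidth in GBytes/sec (decimal GB).
// clock_mhz is the memory clock, bus_bits the total data bus width.
constexpr double peak_gbytes_per_sec(double clock_mhz, unsigned bus_bits) {
    return clock_mhz * 2.0 * (bus_bits / 8.0) / 1000.0;
}
```

For the MiSTer figures above, `peak_gbytes_per_sec(400.0, 32)` works out to 3.2.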
On the Ana 3D, there are two sets of RAM chips.
The three chips = Alliance AS4C32M16D3-10BCN. - 512 Mbit (64 MByte), 16-bit wide, 933 MHz DDR3.
Two other chips = Alliance AS4C128M16D3C-93BCN - 2 Gbit (256 MByte), 1.5V, DDR3-2133 (1,066 MHz clock).
And it's likely using the full 48-bit bus for the first three.
32-bit bus for the fast pair.
So, peak transfer rates...
AS4C32M16D3-10BCN * 3 @ 933 MHz
933 * 2 (DDR) = 1,866
1,866 * 6 (48-bit bus = 6 bytes transferred on each clock edge) = 11.196 GBytes/sec.
AS4C128M16D3C-93BCN * 2 @ 1,066 MHz
1,066 * 2 (DDR) ≈ 2,133
2,133 * 4 (32-bit bus = 4 bytes transferred on each clock edge) = 8.532 GBytes/sec. :o
I can't remember if the Cyc 10 can actually run the DDR3 as fast as DDR3-2133, but it's still impressive.
I also don't know how fast the Cyclone 10 (GX) can run complex cores, 'cos the DC SH4 runs at 200 MHz, which I think would still be hard to hit.
The next biggest hurdle for running other stuff on the Ana 3D is figuring out the DDR3 and SDRAM pin mapping.
I didn't get too far with it yet.
DDR3 mapping is far more flexible than I thought. A lot of signals can be placed on any FPGA pin (within a specific bank), I thought it was far more strict than that.
And common layouts on Cyc 10 dev boards tend to use both the top and bottom banks.
But with the way stuff is placed on the Ana 3D motherboard, it makes more sense that the three DDR3 chips would be on the top two banks.
Then the other pair on the left-hand one or two banks.
Leaving the bottom two banks for the old-people SDRAM, and cart slot, STM32 comms, etc.
I'm not interested in messing with their code, I just want to run my own stuff + MiSTer cores. lol
What I can say is that the JTAG header uses 1.8 Volt logic levels.
So the el-cheapo $6 USB Blaster clone dongles won't cut it.
The one I first tried actually did have a buffer, presumably for voltage translation, but it couldn't talk to the FPGA at all.
I had to buy a new one to get it to work.
Then about three days spent, trying to figure out how the hell to get the old Hamsterworks HDMI stuff to work with the high-speed transceivers on the Cyc 10 GX.
'cos you can't just assign signals directly to those pins, you have to go through the Intel / Altera Transceiver IP block(s).
But I got that part working, after some struggling. I might put the template Quartus project on github soon.
The newer Quartus 25 does have an IP block for doing 4K HDMI output, btw, including HDMI 2.0, etc.
But it requires a licence file to compile with it, else it won't generate the programming files.
Actually, there is one way to confirm the pin mapping for DDR3 and SDRAM...
Remove the chips from the board. lol
Then you can inject a test signal (1 kHz or whatever) onto each pad, then see it in SignalTap.
That's how I did it for the SuperNT, but that uses TSOP RAM chips, so you have access to the pins.
I figured out most of the pin mapping between the cart slot and buffer IC, but not between the buffer and FPGA yet.
Dude, you rule
I figured based on the specs.
I was moreso expecting the edge cases with the PS1, SEGA Saturn, and Nintendo 64 being cleaned up long before the Dreamcast with what we know.
@rain obsidian I sent you a PM
Upscaled to 4K ...
4/1
Oh god dammit
Did you by any chance tag me, then delete it again, once you realised? 😆
I'm not a huge fan of this specific date, btw. lol
In other news, I've been trying to get that Voodoo core running.
But I'm missing something critical, so it never tried to access DDR3 at any point.
When it was culling less of the core, it was using about 60-70% of the FPGA.
So I doubt it would ever fit alongside the ao486 core, at least not on the DE10.
But, if this works (it looks very complete), and hits a high enough clock freq for decent frame rates, I'd for sure look into designing a PCI card from it.
I haven't completely abandoned looking at the Ana 3D again either. It's just not high on the list atm.
I just spent over £400 on PCBs, mainly for the boards which let me retrofit the laser from an Epson projector into JVC projectors.
Since these are from the Epson Education / Business projectors, they are super bright, like 6,200 Lumens.
But that class of Epson projector sucks at black levels and native contrast. lol
The JVCs are basically some of the best in the World (vs price) for that stuff.
And it looks fecking incredible already.
And that's only on the smaller HD350 JVC, which is 1080p.
The main source of light on the black bars now, is just from the reflection off the magnolia walls. lol
The HD350 was only £93 on eBay.
The donor Epson L630U (6,200 Lumens, 1080p LCD) for the laser, was £100.
I just had to reverse-engineer the serial commands to start up the laser, then designed a control board for it.
Then, a Mean Well 48V PSU (cranked up a bit) powers the laser, and takes the place of the old lamp + ballast.
The Epson laser uses a chonky Blue laser diode array.
That gets focussed onto a phosphor wheel, to generate the "Yellow" (Red + Green) light.
Part of the Blue light is bounced back on to the dichroic filter / mirror.
I had to chop off part of the RGB beam splitter on the JVC, to get the laser to fit.
New control board(s) already built by JLC.
Just waiting for them to finish the power board, for the JVC.
That will let me ditch the old JVC power supply completely, and put the new power board + Mean Well PSU in its place.
('cos the Mean Well is bolted on top atm, which means I can't put the top lid on. lol)
So, now you can see why I struggle to get each individual project done. 😛
The prototype control board had some issues. I kept blowing up the pins on the STM32, due to the Tacho and/or PWM signals from the fans, spiking up to 12V.
Hopefully that's fixed now, as well as adding some chonkier switching regs, for the phosphor wheel motor, laser driver logic, 12V fans, 3V3 STM32 supply.
The first Epson I bought was a 1470UI, which is one of those ultra short-throw things, often used in classrooms.
But the laser is slightly older tech. Not as efficient, quite a bit larger. I fit that into a JVC DLA-X500R, which has 4K e-shift, and even better native contrast.
A very kind person on another Discord, designed a fan holder for me.
It was the first time I'd printed something actually useful, on my 3D printer.
But the first layer made an "interesting" shape. lol...
(which possibly belongs on the "spicy off-topic" channel. lol)
Nope that wasn’t me haha. I just dropped that video link in here without tagging anybody
judging by the size of the core it's about 66% full, if even that, so that is also quite promising
I hope it's not an April Fools thing, especially since the core is from 28/3, but who knows 😛
No no, I'm not the willing one
Where did you find it? It looks like it's just sitting on your Google Drive.