#Sega Dreamcast

rain obsidian
#

(ie. allowing the software emu part on the ARM side to run as fast as I could, with no graphics being rendered, it still wasn't amazing).

#

skmp started adding support for "Fastmem" a while ago, but I couldn't get it to compile / work.

#

I think it's possible to get the software emu running fast enough for playable speeds on the FPGA side, but we'll see.

#

oic, skmp did already get fastmem into the code, but I don't think I knew how to tweak the mmap stuff to work with it...

#

Oh great, most of the files were on my other PC. I'll have to figure out how to cross-compile for ARM again. lol

rich kindle
#

@pseudo tinsel do you have an idea why ElectronAsh was struggling with the fastmem compilation?

#

Any hints?

rain obsidian
#

I think the emu was essentially "running" with fastmem support, but I hadn't figured out how to hook up the mmap stuff again, so it can talk to the PVR thing on the FPGA side.

#

(I'm wary of calling it a "core" atm. lol)

rich kindle
#

The Discord search feature is rather for the extreme tech crowd 😁

rain obsidian
#

I get a weird feeling I need to set up something else, to cross-compile for ARM.

#

Like I do for Main MiSTer, but we'll see.

#

Can't believe how much stuff I've forgotten, in only about six months.

#
export PATH=$PATH:/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin
export CC='/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-gcc'
cmake -DCMAKE_TOOLCHAIN_FILE=tc.cmake ..
#

Without that stuff, it will basically compile the emu for your host system, so it just compiled it for x86 / Linux. lol
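
A quick way to catch the "oops, it built for the host" case is to have the binary report what it was compiled for. This is only a sketch: `build_target_arch()` is a made-up helper name, not something in reicast/minicast, and it relies on the usual GCC/Clang predefined macros.

```cpp
// Hypothetical helper (not part of reicast/minicast): reports the
// architecture the compiler was targeting, using predefined macros.
constexpr const char* build_target_arch() {
#if defined(__arm__)
    return "arm (32-bit)";      // what an arm-none-linux-gnueabihf build should report
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86_64";            // a host build on a typical PC ends up here
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#else
    return "unknown";
#endif
}
```

Printing that string once at startup (e.g. with `std::puts(build_target_arch())`) would show straight away whether the toolchain file was actually picked up.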

pseudo tinsel
#

but fastmem is trickier

rain obsidian
#

Ahh, you have to delete the CMakeFiles folder, if it was already configured previously (to compile for x86, not ARM).

#

Then run the thingy again.

#

My brain has already gone to mush.

#

Some older code.

#

Call to mmap_setup() used to be here, in _vmem.cpp...

#

Well there's a surprise. lol

#

Missing some Linux header stuffs.

#

Ok, that was just because I uncommented the mmap stuff in mister_if.cpp, but it got moved into _vmem.cpp a while ago.

#

Ok, that's strong deja vu.

#

I added the pvr_regs.h include to mister_if, and fixed the other errors.

#

And I'm starting to remember that this shm_unlink thing was quite hard to fix.

#

ChatGippity...

#
shm_open() and shm_unlink() live in librt on ARM Linux toolchains.

Your code compiled fine, but the linker was not told to link -lrt.
#

In CMakeLists.txt...

#
target_link_libraries(${APP_NAME}.elf PRIVATE stdc++ rt)
#

Added the "rt" part.

rain obsidian
#

Removed it again.

#

ChatGippity gave a way to fix the _vmem thing, without using posix stuff.

#

Because it said "MiSTer doesn't need it", whatever that means. lol

#

It also says to NOT try to "FastMEM" the ARM/FPGA shared mem, but I don't think it quite knows what we were trying to do.

#

It did compile, at least.

#

In posix_vmem.cpp...

#
// Allocates memory via a fd on shmem/ahmem or even a file on disk
/*
static int allocate_shared_filemem(unsigned size) {
    int fd = -1;
    #if defined(_ANDROID)
    // Use Android's specific shmem stuff.
    fd = ashmem_create_region(0, size);
    #else
        #if HOST_OS != OS_DARWIN
        fd = shm_open("/dcnzorz_mem", O_CREAT | O_EXCL | O_RDWR, S_IREAD | S_IWRITE);
        shm_unlink("/dcnzorz_mem");
        #endif

        // if shmem does not work (or using OSX) fallback to a regular file on disk
        if (fd < 0) {
            string path = get_writable_data_path("/dcnzorz_mem");
            fd = open(path.c_str(), O_CREAT|O_RDWR|O_TRUNC, S_IRWXU|S_IRWXG|S_IRWXO);
            unlink(path.c_str());
        }
        // If we can't open the file, fallback to slow mem.
        if (fd < 0)
            return -1;

        // Finally make the file as big as we need!
        if (ftruncate(fd, size)) {
            // Can't get as much memory as needed, fallback.
            close(fd);
            return -1;
        }
    #endif

    return fd;
}
*/

// MiSTer does not use POSIX shared memory or file-backed fastmem.
// We return a dummy fd and use MAP_ANONYMOUS instead.
static int allocate_shared_filemem(unsigned size)
{
    (void)size;
    return -1; // signals "no fd backing"
}
#

Not quite sure what that does, and what it might break.
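
For what it's worth, here is a rough sketch of what "no fd backing, use MAP_ANONYMOUS" usually means at the mmap level. The names `backing_map_flags()` and `map_backing()` are invented for illustration and are not the functions in posix_vmem.cpp. Two caveats: the commented-out original treats a -1 return as "fall back to slow mem", so what the stub really does depends on how the caller handles -1; and an anonymous private mapping can't be mapped again at a second address, which (as far as I can tell) is what an fd-backed mapping is normally used for.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper: pick mmap flags depending on whether a backing
// fd exists. With an fd, MAP_SHARED lets the same memory be mapped at
// several addresses; without one, only a private anonymous block is possible.
constexpr int backing_map_flags(int fd) {
    return (fd >= 0) ? MAP_SHARED : (MAP_PRIVATE | MAP_ANONYMOUS);
}

// Hypothetical helper: map `size` bytes read/write.
// Returns nullptr on failure instead of MAP_FAILED.
inline void* map_backing(int fd, std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   backing_map_flags(fd), (fd >= 0) ? fd : -1, 0);
    return (p == MAP_FAILED) ? nullptr : p;
}
```

Calling `map_backing(-1, size)` gives plain zero-filled RAM; whether that is enough for fastmem depends on whether the emu still needs mirrored views of the same memory.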

#

Oww. Stupid server bot. lol

#

It culled the "links", but those were just my notes, for getting the software emu to run.

#

Shove it, Server Turtle. lol

#
Fastmem does help — but not enough on MiSTer by itself

Fastmem mainly accelerates:
SH4 RAM loads/stores
Instruction fetch
Texture cache reads
Tight interpreter/JIT loops
On desktop x86:
Big TLB
Aggressive OoO
Fast branch prediction
Fastmem can give 20–40% speedups.
#
On MiSTer’s Cortex-A9, Fastmem gains are limited
MiSTer specs (DE10-Nano):
ARM Cortex-A9 (dual-core, but single core used)
In-order execution
Small TLB
Weak branch predictor
No speculation magic
What that means:
Fastmem saves function calls ✔
But memory latency dominates ❌
Page table walks are expensive
mprotect / guard pages hurt
So even with:
PVR disabled
No rendering
Full Fastmem
You still won’t hit full speed in many titles.
#
Fastmem on MiSTer: what it can realistically give you

From real-world measurements on A9-class ARM cores:

Change                     Speedup
Fastmem vs safe mem        ~10–20%
Disable bounds checks      ~5–10%
Inline load/store helpers  ~5%
Remove virtual calls       ~5%

Stacked:

~20–30% total

Helpful, but not transformational.
pseudo tinsel
#

fastmem brings around 2x perf

rain obsidian
#

I'd consider it pretty "transformational" if it makes it even 20% faster. lol

#

Nice.

#

I need to try it, if I can remember how to even get it all running.

#

My MiSTer box is kinda hooked up to a projector atm, but at least it's a laser projector, with no evil lamps.

rain obsidian
#

But I'm slowly getting there.

#

lol. ffs

#

This is the old one, that was on the SD card from many months ago...

#

Aaaaand... the latest code that I just screwed up...

#

The older fastmem version seems a LOT faster, even just the way the text scrolls past at the start.

#

If that second value is the "FPS", then the older code looks fine.

#

OK, so. I can’t tell which of the two older versions still worked with the crusty PVR “core”.

#

But apparently it’s not the core that’s in the “DC_MiSTer” folder on this PC. lol

#

I’ll have a look on the other PC.

rain obsidian
#

Found the old backups, and can see how it works again now.

#

Shoved it onto Mega.

#

Sorry for the rushed video.

#

I was trying to remember how we set up the controls. skmp made the controls work via the ssh console. lol

#

It renders faster than I remember, but it looks like frame sync isn’t enabled atm.

#

This is also the non-fastmem reicast.

#

OK, yeah, the old fastmem version doesn't trigger frame rendering, so that's where we were up to last time.

#
Arrows on I,J,K,L
A = Keyboard V.
B = Keyboard C.
#

Yes, yes, I know it's crappy. lol

#

It's doing about 3-7 FPS.

#

How the emu triggers a frame render on the FPGA side...

#
    case (ra_state)
        0: begin
            draw_last_tile <= 1'b0;
            // Triggered from the MiSTer menu, or after a PVR dump load...
            if (ra_trig_reg) begin
                ra_trig_reg <= 1'b0;
                ra_state <= 8'd1;        // Start rendering the FRAME.
            end
            else begin    // Else, keep polling for the magic number in DDR3 / PVR regs copy...
                trig_pvr_update <= 1'b1;    // Trigger the PVR reg update.
                ra_state <= 8'd200;
            end
        end
        
        200: if (!trig_pvr_update && !pvr_reg_update) begin
            if (TEST_SELECT != 32'h0000000) ra_state <= 8'd1;    // Start rendering the FRAME.
            else ra_state <= 8'd0;
        end
#

The "TEST_SELECT" PVR register isn't likely to be used by any commercial games.

#

Then it all gets triggered from the reicast/minicast emu, in mister_if.cpp...

#
    // Write the magic number to the evil PVR register. ;)
    //
    // parameter TEST_SELECT_addr = 16'h0018; // RW  Test - writing this register is prohibited.
    //
    pvr_regs[ 0x18 ] = 0xCA;
    pvr_regs[ 0x19 ] = 0xFE;
    pvr_regs[ 0x1a ] = 0xBA;
    pvr_regs[ 0x1b ] = 0xBE;
    
    // Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
    memcpy(vram+offs_8meg, pvr_regs, pvr_RegSize);
    
    // Trigger the PVR reg copy on the core, then render the frame...
    // (the core should then clear these to 0x00, after the frame is rendered.)
    vram[ (offs_8meg-8)+0 ] = 0xCA;
    vram[ (offs_8meg-8)+1 ] = 0xFE;
    vram[ (offs_8meg-8)+2 ] = 0xBA;
    vram[ (offs_8meg-8)+3 ] = 0xBE;
    vram[ (offs_8meg-8)+4 ] = 0xCA;
    vram[ (offs_8meg-8)+5 ] = 0xFE;
    vram[ (offs_8meg-8)+6 ] = 0xBA;
    vram[ (offs_8meg-8)+7 ] = 0xBE;
#

I barely remember doing any of this.
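
To jog the memory, the layout used by that snippet can be written down as constants. This is a sketch only: it assumes `offs_8meg` is exactly 8 MiB (the name suggests it, but the value isn't shown here), and the `k...` names are made up for illustration.

```cpp
#include <cstddef>

// Assumed value of offs_8meg: 8 MiB of VRAM.
constexpr std::size_t kOffs8Meg = 8u * 1024u * 1024u;

// The 8 trigger bytes sit in the last 8 bytes of VRAM...
constexpr std::size_t kTriggerFirstByte = kOffs8Meg - 8;
constexpr std::size_t kTriggerLastByte  = kOffs8Meg - 1;

// ...and the PVR register copy starts directly above VRAM.
constexpr std::size_t kPvrRegsCopy = kOffs8Meg;

// True if a byte offset lies inside the 8 MiB VRAM window.
constexpr bool in_vram(std::size_t offset) {
    return offset < kOffs8Meg;
}

static_assert(in_vram(kTriggerLastByte), "trigger bytes live inside VRAM");
static_assert(!in_vram(kPvrRegsCopy), "register copy lives above VRAM");
```

So under that assumption the trigger bytes are at 0x7FFFF8..0x7FFFFF, and the register copy begins at 0x800000.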

rain obsidian
#

So my challenge for the next week - Try to get some of the fixes from the sim version into the DC "core".

#

So the graphics don't look as crap.

#

That's if the new stuff will even fit.

#

I basically re-use a single instance of the interpolation block now, to pre-calc some stuff at the start of each polygon (per-tile).

#

Like the ARGB and Gouraud shading stuff.

#

That currently takes 7 clock cycles, which isn't too bad, and it saves a ton of logic.

rain obsidian
#

Currently using 83% of the FPGA logic, and 47% BRAM.

#

Which is a lot.

#

= very inefficient atm.

rain obsidian
#

DE10 is too weak sauce.

#

Can't even add the updated intri_calc, nor the interp module.

rain obsidian
#

Awake again. Only got about 40 minutes sleep. sigh

#

I tried tweaking the interp module last night, to save on logic for some dividers. It didn’t work. lol

#

Quartus used exactly the same amount of logic for that module, whether it was a combinational assignment, or a clocked / register assignment.

#

So it’s going to be tough to get the renders to look any better atm. A few more tricks to try.

rain obsidian
#

Added the newer code for the texture address module / colour blender, and saved a few percent logic.

#

But now the core is unstable once again, and only renders about ten frames before freezing. It’s a marginal timing thing.

#

I think it looks better overall, but hard to tell unless I can get it into a game.

#

Ok, it’s just as crusty as before. lol

#

So I next need to fix the real issue with reading from DDR3, because it's clearly corrupting a lot of stuff.

#

For reference, a typical Taxi render looks like this in the sim atm...

#

(part of the road missing, etc. because I have to break some things to start to fix / add others.)

rain obsidian
#

I'm up to about the fifth Quartus recompile now.

#

Still not running yet. I just need to get to the next stage.

#

Quartus is taking about 35 minutes per compile, and there's no easy way to debug some of the trigger logic otherwise.

#

I can't even easily test it in Verilator, because the reicast/minicast emu runs on the ARM / DE10.

#

The turnaround time of Quartus compiles is often so long, it's what kills motivation.

#

Some of the time, I can't even remember what I changed last. lol

#

I'm basically trying to paste in parts of the newer code (from the sim version) into the core, to try to improve the renders.

#

And also hopefully speed it up quite a bit.

#

Then I can look again at getting the fastmem version of minicast working.

#

The current issue. ^

rich kindle
#

Isn't there some Windows running on the Dreamcast? How is that emulated? Just side loaded like a BIOS?

rain obsidian
#

The ISP pulsed DDRAM_RD, so there was a read pending / it was waiting for the DDR controller to respond by asserting DDRAM_DOUT_READY, but it never does.

#

Not sure. I saw a vid yesterday about somebody running "Windows 98" on Dreamcast, but it was clickbait. lol

#

It was just an old KOS example, showing a GUI that was made to look like Windows 98.

#

I can't remember if any homebrew got actual Windows running on it. I don't think so?

#

The Windows CE logo on the Dreamcast is a bit of a misnomer, as it's not like it ever ran a full Windows CE OS.

#

Just parts of the API / SDK, to make it "easier" for devs to write games, AFAIK.

#

Man, waiting for Quartus is killing me. lol

#

Half the reason I built this 5900X with 32GB of RAM was to try to compile faster.

#

I often wonder how fast a Threadripper might do Quartus compiles, but then Quartus sucks for taking advantage of multi-threading.

rich kindle
#

Yeah, but parts of DirectX run actually on the Dreamcast?

#

Because some games need that?

rain obsidian
#

I guess the DirectX thing is more like a translation layer, tbh. Not sure.

rich kindle
#

The non-Katana games?

rain obsidian
#

ie. it just let the programmers use the same / similar set of API function calls, etc., to help port / write stuff?

#

But probably only a smaller subset of what a typical Direct X graphics card was capable of, even back then.

#

It's my main goal this week, to just get the renders looking better on MiSTer. Even if it's still slow.

#

It can only get faster, if I can implement some cache stuff, to do Burst Transfers from DDR3.

#

(I don't yet need to write back to DDR3, because my framebuffer is in the old-people SDRAM.)

#

The reicast/minicast emu basically writes all the params and textures into VRAM, like it normally would.

#

Then I trigger a render on the FPGA side, by writing to the PVR "TEST_SELECT" register.

#

The FPGA renders one frame, using the data already in VRAM.

#

Then it clears the TEST_SELECT register, just so it doesn't re-trigger the same frame.

#

Then it writes some stuff near the end of VRAM, to tell minicast to write the next stuff to VRAM, and that repeats.

#

I probably don't have it synced atm, meaning minicast will be constantly writing stuff to VRAM as fast as it can, even if the FPGA hasn't finished rendering the full frame.

#

Which is what causes the chonky vertical line gaps and other weirdness in the recent vid.

#

I purposely didn't have frame sync on, to allow minicast to run faster, else it was taking forever to actually get into a game.
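
For when frame sync does get turned back on, the wait could look roughly like this. It's a sketch under assumptions: `frame_done()` and `wait_for_frame()` are invented names, the marker byte is assumed to be cleared to 0x00 by the core when the frame is finished, and the ARM side is assumed to read that byte uncached (or to invalidate the cache before polling). The poll limit is there so a stuck core can't hang the emu forever.

```cpp
#include <cstdint>

// The core is expected to clear the marker to 0x00 once the frame is drawn.
constexpr bool frame_done(std::uint8_t marker_byte) {
    return marker_byte == 0x00;
}

// Hypothetical helper: poll the marker byte until the core clears it,
// giving up after max_polls reads. Returns true if the frame finished.
inline bool wait_for_frame(const volatile std::uint8_t* marker,
                           unsigned long max_polls) {
    for (unsigned long i = 0; i < max_polls; ++i) {
        if (frame_done(*marker)) {
            return true;   // core has cleared the marker
        }
    }
    return false;          // timed out; the core may be stuck
}
```

The trade-off is the one mentioned above: with the wait enabled the renders should stop tearing, but the emu can then only run as fast as the FPGA finishes frames.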

#
void rend_start_render(u8* vram) {
    // kick off render
    //printf("rend_start_render\n");
    //SetREP(20 * 1000 * 1000); // in 20 mhz = 10 ms at 200 mhz
    SetREP(20); // in 20 mhz = 10 ms at 200 mhz
    
    // Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
    memcpy(vram+offs_8meg, pvr_regs, pvr_RegSize);
    
    // Write the magic number to the evil PVR register, to trigger a PVR update and Frame render ;)
    //
    // parameter TEST_SELECT_addr = 16'h0018; // RW  Test - writing this register is prohibited.
    //
    // The new core should clear the TEST_SELECT reg (internally) after the frame is drawn.
    //
    pvr_regs[ 0x18 ] = 0xCA;
    pvr_regs[ 0x19 ] = 0xFE;
    pvr_regs[ 0x1a ] = 0xBA;
    pvr_regs[ 0x1b ] = 0xBE;

    // Trigger the PVR reg copy on the core, then render the frame...
    // (the core should then clear these to 0x00, after the frame is rendered.)
    /*
    vram[ (offs_8meg-8)+0 ] = 0xCA;
    vram[ (offs_8meg-8)+1 ] = 0xFE;
    vram[ (offs_8meg-8)+2 ] = 0xBA;
    vram[ (offs_8meg-8)+3 ] = 0xBE;
    vram[ (offs_8meg-8)+4 ] = 0xCA;
    vram[ (offs_8meg-8)+5 ] = 0xFE;
    vram[ (offs_8meg-8)+6 ] = 0xBA;
    vram[ (offs_8meg-8)+7 ] = 0xBE;
    */

    // flush vram contents here from cache
    // call out to hw to render
}


void rend_end_render() {
    // wait for render to end
    // interrupts get fired automatically
    
    //while ( emu_vram[ (offs_8meg-8)+0 ] != 0x00) {}
    
    FrameCount++;
    //printf("rend_end_render\n");
}
#

The core doesn't actually check for the "magic number" in the TEST_SELECT reg atm, it just triggers whenever the reg != 0.
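
That "!= 0" check is probably just as well, because the byte order is easy to get wrong. A small sketch, assuming the register is read as a little-endian 32-bit word (`le32()` and `core_would_trigger()` are made-up names for illustration):

```cpp
#include <cstdint>

// Assemble four bytes into a 32-bit word, lowest address first (little-endian).
constexpr std::uint32_t le32(std::uint8_t b0, std::uint8_t b1,
                             std::uint8_t b2, std::uint8_t b3) {
    return  std::uint32_t(b0)
         | (std::uint32_t(b1) << 8)
         | (std::uint32_t(b2) << 16)
         | (std::uint32_t(b3) << 24);
}

// Bytes in the order the emu writes them: 0x18=CA, 0x19=FE, 0x1a=BA, 0x1b=BE.
constexpr std::uint32_t kTestSelectAsWritten = le32(0xCA, 0xFE, 0xBA, 0xBE);

// Read as a little-endian word this is 0xBEBAFECA, not 0xCAFEBABE.
static_assert(kTestSelectAsWritten == 0xBEBAFECAu, "byte-wise write order");

// Model of the current core behaviour: any non-zero value triggers a render.
constexpr bool core_would_trigger(std::uint32_t test_select) {
    return test_select != 0;
}
```

So if the core ever starts comparing against the full magic number, the byte-wise writes and a single 32-bit store of 0xCAFEBABE on little-endian ARM would need to agree on one order.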

rich kindle
#

Everything is very exciting though. Really looking forward to the performance gain. And how much it will be.

rain obsidian
#

I don't think it will be that much faster yet. But yeah, anything would be an improvement. lol

#

The BIG gains won't happen until I can get some cache stuff working.

#

Even a more stable 10-15 FPS would be almost playable.

#

20 FPS is more playable than I think a lot of people realise. I mean, we had that on many N64 games. 😛

rich kindle
#

Stable 15 fps would be more than some N64 games had 😁

edgy pilot
#

10-15 FPS isn't that far from 25 fps todd (What Pal Resident Evil Code Veronica runs at I believe)

But it's always good to see you back ElectronAsh and coding away

rich kindle
rain obsidian
#

lol

#

I still think the N64 was almost intentionally crippled, by Nintendo’s cost-saving.

#

I’m convinced SGI were aiming for 30 FPS in most games, at the full 640x480 (480i).

#

But they were forced to cram almost everything into the ONE chip, which was apparently one of the largest (in silicon area) for the 350nm process at the time.

#

(Steve Shepherd worked on the chip. He wrote a great article about the challenge they had trying to cram it all on one chip.)

rich kindle
#

The cost savings moved them to a B-tier supplier. The PSX annihilated them

rain obsidian
#

But then I think I asked him on TwitterX about it years ago, and I think he said they always intended to use one chip, if possible, so maybe not.

#

It just seems like - if you are seriously struggling to fit a design within a specific chip area, what do you think they would look at culling first?…

#

Yep… the on-chip memory. lol

#

Hence the Texture cache ended up a measly 4KB.

#

If they had split the design across even two chips, and had separate Main RAM and VRAM, it could have been so much better.

#

Anyway, I guess it was often about getting the console “usable” back then, without it costing the company a larger fortune to build each one.

#

Sorry, I’m a bit bored. lol

#

I really hate this “limbo” stage of waiting for Quartus compiles, knowing that it probably still won’t work on the next try.

#

If you read some of that article about how much they struggled, it sounds to me like the engineers would have just asked Nintendo / NEC if they could split the design.

#

And if it was that hard to make the design fit the RCP, it also seems like that’s a good reason why the Texture cache and some other stuff were so small.

#

Anyway, just a theory of mine. We’ll probably never know. lol

rich kindle
#

Sony sold twice as many PSX units as the N64 and GameCube sold combined.

rain obsidian
#

Yep, and even as a big fan of the N64, even I would admit that the PSX graphics have actually held up quite well, especially in racing games etc.

#

Because... frame rate matters. lol

#

A higher frame rate can make up for a LOT of other issues.

#

I was never a fan of the wobbly graphics of the PSX, although it was amazing to see in 1995 or so, before most of us had a Voodoo 1 or whatever.

#

I felt like N64 was the first to at least do 3D "properly".

#

ie. non-wobbly meshes, and perspective-correct textures.

#

So it felt more "solid" and grounded, in a way.

#

But it was always a bit too blurry. lol

#

Years later, we find out that the final "VI Blur" was a kind of leftover from when it runs in 640x480 mode.

#

Since the last linear interp of the VI seems to be intended for 640 pixels per line.

#

When most games only rendered at 320x240, that linear interp would basically cause it to blur every other pixel.

#

Having said all that, it looked a LOT better on a half-decent CRT TV than it does on most modern TVs.

#

(unless you have the money for a RetroTink 4K or similar)

#

My current MiSTer setup, btw. 😉

#

It’s a long story. lol

#

I had to just hit Ctrl-Z on Quartus, to undo my last changes.

#

And try to get back to where it was before, just with a slightly updated ISP parser.

#

Also compiling the older minicast code, without fastmem. So I have a basis to work from.

distant wagon
#

Woah. I forgot there was a time when it didn’t and just thought of the VMU beep as a part of the boot up process in my mind. Heh.

rain obsidian
#

lol

#

True.

#

I don't remember ever replacing the battery in mine.

#

I had the light gun, for HOTD2 etc.

#

I think I had the fishing rod, too, but I wasn't a huge fan of fishing.

#

I wish we'd known at the time, that most GD drives could be made a lot quieter, just by adding some silicone grease to the laser sled gears.

rain obsidian
#

OK, I think the issue with the core freezing, is actually a side-effect of me trying to speed up the rendering.

#

As it starts to do more read requests from DDR3, it was causing DDRAM_BUSY to go High.

#

Which I did have some checks for, but it was likely getting stuck in a kind of race condition.

#

Last compile took 40 minutes, so it's painfully slow to test things.

#

What I should really do, is try to write a proper sim for the DDR interface.

#

Including the gaps in non-contiguous bursts, write support, variable latency, etc.

rain obsidian
#

Man, I've forgotten SO much about all of this.

#

It's not even using the standard SDRAM any more. lol

#

The framebuffer is in DDR3, alongside normal VRAM.

#

(kind of closer to what the DC does.)

#

I think the freezing is caused by me doing too many read or write requests, without checking to see if DDRAM_BUSY is Low.

#

I do some checks for it, but my logic is still not quite right, so sometimes it will send too many Writes, causing the DDR controller to lock up.

#

But at least it might not be core instability, as such. It's just that each time I get it running faster, it's doing more DDR requests, so failing sooner.

#

Yep, it's getting stuck in ra_state 13.

#
        13: if (tile_accum_done) begin
            if (opb_word[31:29]==3'b111 || poly_drawn) begin
                ra_state <= 8'd9;    // Read the next Prim TYPE entry.
            end
        end
#

Either waiting for tile_accum_done, or poly_drawn, or both.

#

tile_accum_done comes from the ISP parser...

#
// Write pixel to Tile ARGB buffer.
56: if (!vram_wait) begin
    fb_addr <= my_addr[23:1];
    fb_writedata <= {pix_565, pix_565, pix_565, pix_565};
    fb_byteena <= (both_buff) ? ((!FB_W_SOF1[22]) ? 8'b11110000 : 8'b00001111) : 8'b11111111;
    fb_we <= 1'b1;
    
    //if (!vram_wait) begin
        if (y_ps[4:0]==5'd31 && x_ps[4:0]==5'd31) begin    // Last pixel written...
            tile_wb <= 1'b1;                                    
            tile_accum_done <= 1'b1;    // Tell the RA we're done.
            isp_state <= 8'd0;            // Back to idle state
        end
        else begin        // Not on the last Tile pixel yet...
            x_ps[4:0] <= x_ps[4:0] + 5'd1;
            if (x_ps[4:0]==5'd31) y_ps[4:0] <= y_ps[4:0] + 5'd1;
            isp_state <= 8'd51;    // Jump back.
        end
    //end
end
#

I moved the "if (!vram_wait)" to the top earlier.

#

So that it couldn't spam the DDR controller with the fb_we pulses, without checking for vram_wait (Low) first.

#

But that didn't help. lol

#

The tile_accum_done flag, was from when I was starting to implement the proper Tile Accumulation buffer.

#

ie. one of the internal buffers on PVR, which the final RGB pixels get rendered to.

#

A completed tile would then get written back to VRAM (on the DC) via Burst transfer. It's the only method that really makes sense, so it can render fast enough.

rain obsidian
#

Quartus is so SLOOWWWW. lol

#
        // Wait for current poly to be written to the Tag buffer in the ISP.
        12: begin
            if (poly_drawn) begin
                if (ra_cont_last) draw_last_tile <= 1'b1;
                ra_vram_addr <= ra_vram_addr + 4;    // Go to next WORD in OL.
                ra_vram_rd <= 1'b1;
                ra_state <= 10;
            end
        end
        
        // Wait for the ISP/TSP to render the final pixels to the tile.
        13: if (tile_accum_done) begin
            //if (opb_word[31:29]==3'b111 || poly_drawn) begin
                ra_state <= 8'd9;    // Read the next Prim TYPE entry.
            //end
        end
#

Commented out some stuff. Works in the sim (still lets it run, and doesn't break stuff), now to see if it works on the FPGA.

#

Many hours gone, and I'm pretty much back as I was before, but trying to kill this bug.

#

'cos I can't have it freezing after a handful of frames. Can't make progress that way.

#

You can see the "render_poly" and "poly_drawn" flags there.

#

I should rename those, because that doesn't actually render the final pixels.

#

It just registers the Tags for the current polygon (triangle) in the Tag Buffer, assuming any pixels of the triangle pass the inTri and depth_compare tests.

#

Once all of the triangles in the current tile have been processed into the Tag buffer, the "render_to_tile" signal is what kicks off the final pixel output.

#

atm, the final pixel colours get written directly to the framebuffer in DDR3.

#

I didn't get far enough with implementing the proper Tile Accumulation Buffer before, due to all of the other issues.

#

You can see there are three clock ticks between each DDRAM_WE (write) pulse, though.

#

Multiply that by the number of pixels on the whole screen (640x480 = 307,200), you can imagine why it's so slow atm.

#

At 100 MHz, with one pixel written per clock, it would be able to write to all pixels on the screen within 3.072 milliseconds.

#

(not including the rest of the ISP/TSP processing to actually render the tiles first.)

#

Since it's taking basically four clock cycles between each Write, it would take 12.288 milliseconds to update the entire screen.

#

Which would obviously massively impact the overall frame rate, once you include all of the other processing.

#

So the (eventual) goal, is to try to output a new pixel value on EVERY clock tick.

#

Thinking more about it, that only leaves around 13.59 milliseconds for the PVR to do all of the other processing for each frame, if it is to hit 60 FPS.
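
The numbers above can be checked with a few lines. This is just the arithmetic from the messages, using the same assumptions (640x480, 100 MHz, one framebuffer write per pixel); the function names are made up.

```cpp
// Pixels on a full 640x480 screen.
constexpr double kScreenPixels = 640.0 * 480.0;   // 307,200

// Time to write `pixels` pixels, given clocks spent per pixel write.
constexpr double fill_time_ms(double pixels, double clocks_per_pixel,
                              double clock_hz) {
    return pixels * clocks_per_pixel / clock_hz * 1000.0;
}

// Total time available per frame at a given frame rate.
constexpr double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}

// What is left for everything else once the screen fill is paid for.
constexpr double remaining_ms(double fps, double clocks_per_pixel,
                              double clock_hz) {
    return frame_budget_ms(fps)
         - fill_time_ms(kScreenPixels, clocks_per_pixel, clock_hz);
}
```

With one clock per pixel, `fill_time_ms(kScreenPixels, 1, 100e6)` comes to about 3.07 ms and `remaining_ms(60, 1, 100e6)` to about 13.59 ms. At four clocks per pixel the fill alone is about 12.29 ms, leaving only around 4.4 ms of a 60 FPS frame for everything else.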

rain obsidian
#

Fairly sure I just found the REAL bug. sigh

#

In the RA parser, I wasn't checking to see if the ISP was in the idle state before pulsing render_to_tile.

#

And the ISP will only see the render_to_tile pulse in the idle state.

#

Which means it was never kicking off the render.

#

Which in turn means the ISP was never pulsing tile_accum_done when done, which was locking up the RA.
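
The lock-up is easy to reproduce in a toy model. The sketch below is not the real RA/ISP logic: it's a made-up cycle-by-cycle simulation in which the ISP stays busy for a few cycles, only notices a one-cycle render_to_tile pulse while idle, and takes four cycles (an arbitrary number) to render before reporting tile_accum_done.

```cpp
// Toy model: does tile_accum_done ever arrive within max_cycles?
//   ra_checks_idle  - whether the RA waits for the ISP to be idle before pulsing
//   isp_busy_cycles - how long the ISP is still busy at the start
constexpr bool tile_done_arrives(bool ra_checks_idle, int isp_busy_cycles,
                                 int max_cycles = 64) {
    bool isp_idle    = (isp_busy_cycles <= 0);
    int  busy_left   = isp_busy_cycles;
    bool pulse_sent  = false;   // the RA only ever pulses once
    bool rendering   = false;
    int  render_left = 0;

    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        // RA side: send the single-cycle render_to_tile pulse.
        bool pulse = false;
        if (!pulse_sent && (!ra_checks_idle || isp_idle)) {
            pulse = true;
            pulse_sent = true;
        }

        // ISP side.
        if (rendering) {
            if (--render_left == 0) return true;   // tile_accum_done
        } else if (isp_idle) {
            if (pulse) {                            // pulse seen only when idle
                rendering = true;
                isp_idle = false;
                render_left = 4;
            }
        } else if (--busy_left == 0) {
            isp_idle = true;                        // any earlier pulse was lost
        }
    }
    return false;                                   // the RA would wait forever
}
```

In this model `tile_done_arrives(false, 3)` is false (the pulse is lost while the ISP is busy), `tile_done_arrives(true, 3)` is true, and `tile_done_arrives(false, 0)` is also true, which fits the bug only showing up once the timing shifted.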

#

I need to learn how to code this stuff better.

rain obsidian
#

With a lot of help from ChatGippity, I got the fastmem version of reicast/minicast to compile and "run".

But with no rendering atm, because I had to roll back the Verilog again. sigh. lol

#

Fastmem does appear to let the emulator hit 60 FPS, at least in the BIOS menu.

#

Trying Daytona on it now, just to see what frame rate it hits. To be clear, there is no graphics rendering at all atm, until I get this next Quartus compile done.

#

And even then, it will be very slow.

#

AFAIK, the first value shown above is the frame time. So 16.66ms etc.

#

And the second value is the FPS.

#

I guess we can say the emulator still struggles a lot on MiSTer. lol

#

And that's without any kind of graphics atm.

#

It only hits about 15-20 FPS in Daytona.

#

MarioKart 64 port won't even run, it just crashes.

rain obsidian
#
    emu_vram = vram;

    // Write the magic number to the evil PVR register. ;)
    //
    // parameter TEST_SELECT_addr = 16'h0018; // RW  Test - writing this register is prohibited.
    //
    // Write magic into PVR regs (emulator-side)
    pvr_regs[0x18] = 0xCA;
    pvr_regs[0x19] = 0xFE;
    pvr_regs[0x1A] = 0xBA;
    pvr_regs[0x1B] = 0xBE;
    
    // ---- DEBUG / BRING-UP: full VRAM copy ----
    memcpy((void*)VRAM_BASE, (const void*)emu_vram, offs_8meg);
           
    // Trigger the PVR reg copy on the core, then render the frame...
    // (the core should then clear these to 0x00, after the frame is rendered.)
    *(volatile uint32_t*)(VRAM_BASE + offs_8meg - 8) = 0xCAFEBABE;
    *(volatile uint32_t*)(VRAM_BASE + offs_8meg - 4) = 0xCAFEBABE;

    // Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
    // Copy regs into FPGA-visible DDR
    memcpy((void*)(VRAM_BASE + offs_8meg), pvr_regs, pvr_RegSize);

    arm_cache_flush((void*)VRAM_BASE, (void*)(VRAM_BASE + HW_FPGA_VRAM_SPAN));
floral vale
#

Looks like we’ve been eating good in here recently!

rain obsidian
#

I got a bit sidetracked.

ripe stump
#

Sega Dreamcast

floral vale
rain obsidian
#

IIRC, DDR3 on MiSTer runs at 400 MHz.
ie. Two 16-bit chips in parallel, so 32-bits wide...
400 MHz * 2 (DDR) = 800 MT/s (mega-transfers per second).
800 MT/s * 4 (bytes per clock edge) = 3.2 GBytes/sec.
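
The same sum as a one-liner, in case other bus widths or clocks need plugging in. It's only the peak theoretical figure (clock x 2 for DDR x bus width in bytes), and `peak_mbytes_per_sec()` is a made-up name.

```cpp
// Peak theoretical DDR transfer rate in MBytes/sec:
// two transfers per clock, bus_bits / 8 bytes per transfer.
constexpr unsigned long peak_mbytes_per_sec(unsigned long clock_mhz,
                                            unsigned long bus_bits) {
    return clock_mhz * 2 * (bus_bits / 8);
}

// MiSTer DDR3 as described above: 400 MHz clock, 32-bit bus.
static_assert(peak_mbytes_per_sec(400, 32) == 3200, "3.2 GBytes/sec");
```

Real throughput will be lower once refresh, bank activation, and gaps between bursts are included.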

#

On the Ana 3D, there are two sets of RAM chips.

#

The three chips = Alliance AS4C32M16D3-10BCN. - 512 Mbit (64 MByte), 16-bit wide, 933 MHz DDR3.
Two other chips = Alliance AS4C128M16D3C-93BCN - 2 Gbit (256 MByte), 1.5V, 2133MHz DDR3.

#

And it's likely using the full 48-bit bus for the first three.

#

32-bit bus for the fast pair.

#

So, peak transfer rates...

#
AS4C32M16D3-10BCN * 3 @ 933 MHz

933 * 2 (DDR) = 1,866
1,866 * 6 (48-bit bus = 6 bytes transferred on each clock edge) = 11.196 GBytes/sec.
#
AS4C128M16D3C-93BCN * 2 @ 2,133 MT/s

DDR3-2133 is already the data rate (2,133 MT/s, from a ~1,066 MHz clock), so there's no extra * 2 (DDR) here.
2,133 * 4 (32-bit bus = 4 bytes transferred on each clock edge) = 8.532 GBytes/sec.
#

I can't remember if the Cyc 10 can actually run the DDR3 as fast as 2,133 MHz, but it's still impressive.

#

I also don't know how fast the Cyclone 10 (GX) can run complex cores, 'cos the DC SH4 runs at 200 MHz, which I think would still be hard to hit.

#

The next biggest hurdle for running other stuff on the Ana 3D is figuring out the DDR3 and SDRAM pin mapping.

#

I didn't get too far with it yet.

#

DDR3 mapping is far more flexible than I thought. A lot of signals can be placed on any FPGA pin (within a specific bank); I thought it was far more strict than that.

#

And common layouts on Cyc 10 dev boards tend to use both the top and bottom banks.

#

But with the way stuff is placed on the Ana 3D motherboard, it makes more sense that the three DDR3 chips would be on the top two banks.

#

Then the other pair on the left-hand one or two banks.

#

Leaving the bottom two banks for the old-people SDRAM, and cart slot, STM32 comms, etc.

#

I'm not interested in messing with their code, I just want to run my own stuff + MiSTer cores. lol

#

What I can say is that the JTAG header uses 1.8 Volt logic levels.

#

So the el-cheapo $6 USB Blaster clone dongles won't cut it.

#

The one I first tried actually did have a buffer, presumably for voltage translation, but it couldn't talk to the FPGA at all.

#

I had to buy a new one to get it to work.

#

Then about three days spent, trying to figure out how the hell to get the old Hamsterworks HDMI stuff to work with the high-speed transceivers on the Cyc 10 GX.

#

'cos you can't just assign signals directly to those pins, you have to go through the Intel / Altera Transceiver IP block(s).

#

But I got that part working, after some struggling. I might put the template Quartus project on github soon.

#

The newer Quartus 25 does have an IP block for doing 4K HDMI output, btw, including HDMI 2.0, etc.

#

But it requires a licence file to compile with it, else it won't generate the programming files.

#

Actually, there is one way to confirm the pin mapping for DDR3 and SDRAM...

#

Remove the chips from the board. lol

#

Then you can inject a test signal (1 kHz or whatever) onto each pad, then see it in SignalTap.

#

That's how I did it for the SuperNT, but that uses TSOP RAM chips, so you have access to the pins.

#

I figured out most of the pin mapping between the cart slot and buffer IC, but not between the buffer and FPGA yet.

floral vale
#

Dude, you rule

valid idol
#

😠

polar goblet
#

I figured based on the specs.

#

I was moreso expecting the edge cases with the PS1, SEGA Saturn, and Nintendo 64 being cleaned up long before the Dreamcast with what we know.

vale tide
#

@rain obsidian I sent you a PM

floral vale
#

Ways to Support the Channel
https://www.youtube.com/channel/UCEozS0uaZibXKTQSu10XgSw/join
https://www.patreon.com/PixelCherryNinja
Buy a Octopus TR Fightstick (affiliate link)
https://www.trfightstick.com/?ref=PCN
Click link then use code "PCN" for $25 Off

Join the Pixel Cherry Ninja Gaming Discord
https://discord.gg/5W9pCy2nXa

Upscaled to 4K ...

jolly oriole
#

4/1

floral vale
#

Oh god dammit

rain obsidian
#

I'm not a huge fan of this specific date, btw. lol

#

In other news, I've been trying to get that Voodoo core running.

#

But I'm missing something critical, so it never tries to access DDR3 at any point.

#

When it was culling less of the core, it was using about 60-70% of the FPGA.

#

So I doubt it would ever fit alongside the ao486 core, at least not on the DE10.

#

But, if this works (it looks very complete), and hits a high enough clock freq for decent frame rates, I'd for sure look into designing a PCI card from it.

#

I haven't completely abandoned looking at the Ana 3D again either. It's just not high on the list atm.

#

I just spent over £400 on PCBs, mainly for the boards which let me retrofit the laser from Epson projectors into JVC projectors.

#

Since these are from the Epson Education / Business projectors, they are super bright, like 6,200 Lumens.

#

But that class of Epson projector sucks at black levels and native contrast. lol

#

The JVCs are basically some of the best in the World (vs price) for that stuff.

#

And it looks fecking incredible already.

#

And that's only on the smaller HD350 JVC, which is 1080p.

#

The main source of light on the black bars now, is just from the reflection off the magnolia walls. lol

#

The HD350 was only £93 on eBay.

#

The donor Epson L630U (6,200 Lumens, 1080p LCD) for the laser, was £100.

#

I just had to reverse-engineer the serial commands to start up the laser, then designed a control board for it.

#

Then, a Mean Well 48V PSU (cranked up a bit) powers the laser, and takes the place of the old lamp + ballast.

#

The Epson laser uses a chonky Blue laser diode array.

#

That gets focussed onto a phosphor wheel, to generate the "Yellow" (Red + Green) light.

#

Part of the Blue light is bounced back on to the dichroic filter / mirror.

#

I had to chop off part of the RGB beam splitter on the JVC, to get the laser to fit.

#

New control board(s) already built by JLC.

#

Just waiting for them to finish the power board, for the JVC.

#

That will let me ditch the old JVC power supply completely, and put the new power board + Mean Well PSU in its place.

#

('cos the Mean Well is bolted on top atm, which means I can't put the top lid on. lol)

#

So, now you can see why I struggle to get each individual project done. 😛

#

The prototype control board had some issues. I kept blowing up the pins on the STM32, due to the Tacho and/or PWM signals from the fans, spiking up to 12V.

#

Hopefully that's fixed now, as well as adding some chonkier switching regs, for the phosphor wheel motor, laser driver logic, 12V fans, 3V3 STM32 supply.
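
For a feel of the numbers involved: one common way to keep a 12V fan signal inside what a 3V3 pin tolerates is a plain resistor divider. This is only a generic sketch and not necessarily what the new board does. The 10k / 3.3k resistor values are example picks.

```python
# Sketch: resistor divider arithmetic for bringing a 12 V fan signal down to
# a level a 3.3 V microcontroller pin can accept.
# Assumption: resistor values are illustrative, not taken from the actual board.

def divider_out(v_in, r_top_ohms, r_bottom_ohms):
    """Output voltage of an unloaded resistor divider."""
    return v_in * r_bottom_ohms / (r_top_ohms + r_bottom_ohms)

def within_limit(v_out, v_max=3.3):
    """True if the divided voltage stays at or below the pin's supply rail."""
    return v_out <= v_max

if __name__ == "__main__":
    v = divider_out(12.0, 10_000, 3_300)
    print(f"{v:.2f} V, safe: {within_limit(v)}")
```

A divider only handles the steady-state level, so fast spikes above 12V would still need a clamp or series protection on top.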

#

The first Epson I bought was a 1470UI, which is one of those ultra short-throw things, often used in classrooms.

#

But the laser is slightly older tech. Not as efficient, quite a bit larger. I fit that into a JVC DLA-X500R, which has 4K e-shift, and even better native contrast.

#

A very kind person on another Discord, designed a fan holder for me.

#

It was the first time I'd printed something actually useful, on my 3D printer.

#

But the first layer made an "interesting" shape. lol...

rain obsidian
#

(which possibly belongs on the "spicy off-topic" channel. lol)

floral vale
vale tide
dense shard
#

wait no its backwards NO WTF

#

ARTGHGH why is that the first ONE

vale tide
#

I hope it's not an April Fools thing, especially since the core is from 28/3, but who knows 😛

austere shuttle
#

Who's willing to find out what this does

#

Never Gonna Give You Up?

ripe stump
austere shuttle
#

No no, I'm not the willing one

sinful monolith
#

Where did you find it? It looks like it's just sitting on your Google Drive.