#Sega Dreamcast
The GPU does technically have "four" Address busses for VRAM.
A0, A1, B0, and B1.
I'm certain that the "A" and "B" busses can have a different address, for when it's reading the Vertex / Colour params from one 4MB half (A), whilst writing back to the Framebuffer in the opposite half (B).
That A/B order then swaps on each completed frame.
But I think the A0+A1 pair might just be a duplicate of the SAME logical signals.
And the same for the B0+B1 pair.
The main reasons for that are: It helps make the PCB routing easier.
And also, it allows the HOLLY chip to drive up to four SDRAM chips per A/B bus.
(four chips per bus on NAOMI, two chips per bus on DC.)
It doesn't matter too much if it can't have four different addresses when reading textures in "64-bit" mode, because that can be done once the data is already pre-read into the texture cache.
I will hook up the Oscilloscope soon, to put this theory to rest.
The four resistor arrays on the underside of the DC mobo (at least on the VA1) carry the WE_N, RAS_N, CAS_N, and CS_0 signals.
But there are only two sets of those signals. One for bus A, one for bus B.
Which again strongly suggests HOLLY can output a different address for each 32-bit half of the data.
(but not four different addresses, one for each 16-bit part of the data.)
This stuff can be important for how the core is structured, as it can help to understand the inner workings somewhat.
I assume you can’t fuse the (16x2)x2 bit buses into (32x1)x2? Or am I misunderstanding FPGAs and programming for them?
@valid idol On MiSTer, the DDR RAM is 64-bit wide, so you have to access a whole 64-bit word on each clock cycle anyway.
That made it very confusing, with the way PVR2 reads parameters as 32-bit wide, but textures as 64-bit wide.
I just had to visually rethink how the VRAM is laid out, then it was easier.
But there are problems with that, because you can't then read from the 32-bit "half" whilst writing back to a different address in the other 32-bit "half". lol
So I'm still using SDRAM as the Framebuffer atm, so I can write directly to that, whilst reading params and textures from the VRAM dump in DDR3.
Oh right, I forgot, this is a hybrid emulator.
You can easily split and combine busses within the FPGA, but the problem with RAM is that you are limited to how many Bytes/Words of data you can read on each clock tick.
Not really related to the hybrid thing, as such.
"hybrid" emulation in this context, is just where the stripped-down reicast is running most of the emu on the ARM side.
But the GPU renderer is all running on the FPGA side.
Both can access VRAM from the same area of DDR3, though.
Not Flycast?
Is Flycast too heavy for the ARM cpu?
So the emulator does the SH4 CPU stuff, which then does all of the 3D calcs, and writes the textures and display lists into VRAM.
Then a flag is set, to tell the FPGA side that everything is ready within VRAM, so the FPGA can render the frame.
Flycast is based on reicast, AFAIK.
reicast still isn't running at full-speed on the ARM on the DE10, because the ARM cores are a bit sucky, even vs a Rasp Pi 1.
That's why I was wondering- wow that's bad.
skmp did add "fastmem" support to it, but I couldn't get it to compile/run at the time.
I can't remember if the Zero (W) is better or worse than the Pi 1, time to wiki
ie. the emu is still not fast enough for "realtime" at all times, even if I completely disable the (currently very slow) FPGA renderer.
And I tried overclocking the ARM core(s), which only helped a little bit.
Keep in mind that the ARM cores on the DE10 (dual-core) don't have any form of GPU to speak of.
So not even any NEON instructions, to help speed up things like video compression. At least I don't think so.
IIRC, even mt32-pi struggled to run smoothly on the ARM, with other stuff going on.
I mean the MUNT + Fluidsynth thing.
Hence, Sorg preferred going with the mt32-pi project, which is obv all external to MiSTer.
skmp doesn't actually own a MiSTer setup yet, btw.
He said he'd be able to get more done if he had one.
I was in the process of sorting out a JAMMIX setup to send to him, before I went on holiday.
Checked and wow, the ARM core in the DE-10 Nano must be quite bad if the Raspberry 1 (which is worse than the Zero) can beat it.
I think it was at least a bit worse than a Pi 2. Not sure, now.
ARM on MiSTer runs at about 800-900 MHz by default.
Can only overclock to about 1 GHz or so, before it starts having problems.
DDR3 runs at 400 MHz, I think?
Basically 32-bit (4 Bytes) wide, so that means roughly 3.2 GBytes/sec... peak.
(400 MHz x 2, because DDR... times 4 Bytes = 3.2 GBytes/sec with Bursts.)
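The DDR3 arithmetic above can be checked with a quick sketch. The function name and parameters are made up for illustration, and the inputs are just the figures quoted in the chat (the 400 MHz clock was itself a guess), so treat the result as a theoretical ceiling rather than a measured number.

```python
def peak_bytes_per_sec(clock_hz, bus_width_bits, transfers_per_clock=1):
    """Theoretical ceiling: transfers per second times bytes per transfer."""
    return clock_hz * transfers_per_clock * (bus_width_bits // 8)


# Figures as quoted in the chat: 400 MHz, double data rate, 32-bit (4-byte) bus.
ddr3_peak = peak_bytes_per_sec(400_000_000, 32, transfers_per_clock=2)
print(f"DDR3 peak: {ddr3_peak / 1e9:.1f} GBytes/sec")  # prints "DDR3 peak: 3.2 GBytes/sec"
```

Real throughput will be lower, since the DDR controller and burst setup add overhead that this ignores.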
The DDR3 didn't overclock much at all before crashing. lol
https://forums.raspberrypi.com/viewtopic.php?t=321102 So barely better in terms of clock speed than the RP1, but that prolly doesn't matter given that the Pi prolly has at least some form of GPU.
Probably more to do with the DDR controller on the FPGA, rather than the DDR3 itself.
I can't remember exactly which type of ARM cores the DE10 uses, but it essentially uses the same toolchain as for the Rasp Pi (1 / 2).
Can 1 GHz be relied on for a public core?
Erm, not sure. I haven't done much testing at all.
I think it was at 1 GHz most of the time when I did test, though, and it seemed fairly stable.
Just couldn't really get much overclock from the DDR3, without it crashing.
Might even be possible to do some of the "emu side" calcs on the FPGA, to speed things up.
But that would be WAY over my head. lol
Is the only benefit to dual SDRAM a faster speed for reading the SDRAM?
Single SDRAM module is only 16-bit wide.
And runs at say 140 MHz max, for most people.
Which means around ~240 MBytes/sec, with Burst transfers.
Dual SDRAM obv adds the second module, which can be run in parallel (same address) as the first.
So doubles the bandwidth to around 480 MBytes/sec.
Or, some cores can choose to output a different address to each module, depending on how they need it.
Can also have different timings for when bursts start, on each module, so overlapping, and other tricks.
So either you have two people trading off on jobs,
or they can do different jobs,
but not both.
Only uber devs like Jotego properly understand the in-depth SDRAM voodoo tricks. lol
It took me years to understand the basics of SDRAM, and I'm still learning.
You could also do things like reading from one SDRAM module, whilst writing that data to the other, for fast transfers of blocks of data.
Machines like 3DO did something similar, but that was built into the specific type of VRAM they used.
(very similar to what was used on the MD/Genny.)
VRAM on the DC / NAOMI runs at 100 MHz.
32-bit wide accesses for Params / vertex data, or writing back to the Framebuffer.
So that would be around 400 MBytes/sec, peak.
And 64-bit wide when accessing texture data, so 800 MBytes/sec, peak.
So even Dual SDRAM obv isn't fast enough for DC textures.
(VRAM on the DC is the same type of SDRAM as the MiSTer SDRAM module, just running at 100 MHz.)
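To put the SDRAM and VRAM numbers side by side, here is a rough sketch using only the clocks and bus widths mentioned above. The helper name is hypothetical, and it computes the raw ceiling (clock times bus width). The ~240 MBytes/sec quoted for a single module is lower than that ceiling, which I assume allows for refresh and row-activate overhead.

```python
def peak_mbytes_per_sec(clock_mhz, bus_width_bits):
    """Raw ceiling in MBytes/sec, ignoring refresh and row-activate overhead."""
    return clock_mhz * bus_width_bits // 8


vram_params = peak_mbytes_per_sec(100, 32)    # 32-bit param / framebuffer accesses
vram_textures = peak_mbytes_per_sec(100, 64)  # 64-bit texture accesses
single_sdram = peak_mbytes_per_sec(140, 16)   # one 16-bit MiSTer module at 140 MHz
dual_sdram = 2 * single_sdram                 # two modules run in parallel

for name, value in [
    ("DC VRAM, 32-bit", vram_params),
    ("DC VRAM, 64-bit", vram_textures),
    ("Single SDRAM (raw)", single_sdram),
    ("Dual SDRAM (raw)", dual_sdram),
]:
    print(f"{name:20s} {value} MBytes/sec")
```

Even with the raw ceiling, dual SDRAM stays below the 800 MBytes/sec needed for 64-bit texture reads, so the conclusion above holds either way.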
First part of the new RLE code seems to be working.
Well, not tested properly yet, as I have more to do.
It has to do a kind of Histogram thing, where it counts up how many of the same Tag value it finds in the buffer.
Then it can output the RLE stuff, grouped by Tag (polygon), which is exactly what I need.
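The "Histogram thing" can be pictured with a small software model. This is only a Python sketch of the idea, not the actual FPGA logic, and the tile layout is made up. Only the per-Tag counts (447 and 577) are borrowed from the log output further down.

```python
from collections import Counter


def tag_histogram(tag_buffer):
    """Count how many entries of the tile's Tag buffer hold each Tag value."""
    return Counter(tag_buffer)


# Hypothetical 32x32 tile: 447 pixels of tag 0x006, then 577 pixels of tag 0x007.
tile = [0x006] * 447 + [0x007] * 577
hist = tag_histogram(tile)

for tag in sorted(hist):  # lowest Tag value first
    print(f"tag: 0x{tag:03X} cnt: {hist[tag]:04d}")
```

A histogram alone loses the position of each pixel, which is why the runs still have to be generated separately.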
But I'll still need to use TWO Tag buffers / Z buffers.
Always seems to need a few steps back, to take a step forward. lol
It could be tested with one Tag buffer atm, but it would be even slower than before.
Is that why I haven't seen any references to Dual SDRAM here, then?
It's not that I couldn't take advantage of Dual SDRAM for DC, it's just that I'm currently testing it by reading the VRAM directly from DDR3 atm.
Which is quite fast, ofc.
I'm only using a Single SDRAM module for the display / Framebuffer.
Since the DC pretty much renders all games as 16-bit colour, a single SDRAM module was fine for that.
That would leave a second SDRAM module for AICA (sound) eventually, but we might never get that far. lol
I mean, I did start testing with writing finished tiles back to a Framebuffer in DDR3.
"pretty much"?
Not sure now, very tired.
IIRC, almost all games render as 16-bit, with maybe some dithering added during display?
Too tired to even remember if it can render 24-bit. lol
I think it can, but only when using a palette.
(so most of the time, 24-bit was only used for displaying still images on menus etc.)
Just wasn't too much need to display games as 24-bit back then, especially on the crusty CRT TVs most people had.
Gonna sleep soon. Awake half the night again, and it's nearly 5am in Mordor.
Goodnight!
The RLE module was updated, so it also emits the start Row and Column for each RLE run.
That is so the texture unit will be able to texture only the pixels within each run, but also skip over pixels to the next run.
It does that for each Tag (polygon) within the tile.
The RLE module will then signal that it has finished, and then the tile can be flushed from the ARGB buffer into the Framebuffer.
Oh yeah, I might need to store the tilex and tiley values, too.
I forgot that it renders each Primitive TYPE, one after the other, into the tile buffer.
But that's good, because it will let me handle the colour blending for Translucent prims later.
(Opaque and Translucent can also have a Volume Modifier type, but that doesn't directly change the pixel colour.)
Ohh, I see...
RLE Start. Prim Type: 0 (Opaque)
tilex: 19 tiley: 03 tag: 0x006 cnt: 0447 row: 18 col: 01
tilex: 19 tiley: 03 tag: 0x007 cnt: 0577 row: 00 col: 00
The RLE thingy always outputs the Tag values, starting with the LOWEST value first.
Which might be OK, but we'll see.
Also a slight bug there, where it thinks the run of tag 06 starts in Column 1.
That could just be the delay from when the Tag buffer takes an extra clock cycle to update the output.
ChatGippity...
I'm not sure this is working correctly, for non-contiguous runs of Tag values?
ChatGPT said:
You're absolutely right to raise this — the current implementation groups all instances of the same tag together, even if they occur in non-contiguous runs. That’s fine for global tag histograms, but not valid for proper RLE (Run-Length Encoding).
sigh
Quite complex.
Gah!
RLE Start
tilex: 0 tiley : 1 Prim Type : 0 (Opaque)
tag: 0x002 cnt: 0005 row: 31 col: 00
tag: 0x001 cnt: 0027 row: 00 col: 04
tag: 0x002 cnt: 0005 row: 00 col: 31
tag: 0x001 cnt: 0027 row: 01 col: 04
tag: 0x002 cnt: 0005 row: 01 col: 31
tag: 0x001 cnt: 0027 row: 02 col: 04
tag: 0x002 cnt: 0006 row: 02 col: 31
tag: 0x001 cnt: 0026 row: 03 col: 05
tag: 0x002 cnt: 0006 row: 03 col: 31
tag: 0x001 cnt: 0026 row: 04 col: 05
tag: 0x002 cnt: 0006 row: 04 col: 31
tag: 0x001 cnt: 0026 row: 05 col: 05
tag: 0x002 cnt: 0006 row: 05 col: 31
Now it's outputting the RLE runs right away, but NOT grouping them. lol
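For comparison, "true" RLE in scan order looks something like this in software. It's a minimal Python sketch, assuming the Tag buffer is a flat row-major list and that runs are allowed to wrap from the end of one row into the next, which is what the log above appears to show. The function name is made up.

```python
TILE_W = 32


def rle_scan_order(tags, width=TILE_W):
    """Emit (row, col, tag, count) runs in the order they appear in the buffer."""
    runs = []
    start = 0
    for i in range(1, len(tags) + 1):
        # Close the current run at the end of the buffer or when the tag changes.
        if i == len(tags) or tags[i] != tags[start]:
            runs.append((start // width, start % width, tags[start], i - start))
            start = i
    return runs


# Two hypothetical rows: cols 0-3 are tag 2, cols 4-30 are tag 1, col 31 is tag 2.
row = [2] * 4 + [1] * 27 + [2]
runs = rle_scan_order(row + row)
for r, c, tag, cnt in runs:
    print(f"tag: 0x{tag:03X} cnt: {cnt:04d} row: {r:02d} col: {c:02d}")
```

The tag 2 run that starts at column 31 of row 0 has a count of 5, because it carries on into columns 0-3 of the next row. Nothing is grouped by Tag here, which is the problem described above.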
I love chat gpt's overuse of the phrase "You're absolutely right to raise this"
Yeah. It's amazing how it seems to INSTANTLY know what's wrong with the code, but only when you point it out. lol
I'm hoping the next version will be able to spot those issues itself.
Again, I get that it's the user input that trains the model, but it still seems to know the exact problem (and how to fix it) right away.
Looking a bit better.
ChatGPT said this is a kind of hybrid between "True RLE" (which would output in order of how the values are found), and Tag Grouping.
Still a slight issue with the row and col stuff being delayed by one clock, but it's getting closer.
OK, so my Tag buffer does have a delay of two clock cycles, every time the Row value changes.
ie. Row pointer increments... Tag buffer sees the change on the next clock... Data output gets updated on the next clock.
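That two-clock delay can be modelled with two registers: one holding the address the Tag buffer has latched, and one holding its data output. This is a Python sketch of the timing only, with made-up names and data, assuming both stages are simple registers updated on the same clock edge.

```python
def simulate_tag_buffer(rows_data, row_requests):
    """Return (cycle, row_sel, data_seen) with a two-clock read latency."""
    addr_reg = None   # row address latched by the Tag buffer
    data_out = None   # registered data output
    trace = []
    for cycle, row_sel in enumerate(row_requests):
        trace.append((cycle, row_sel, data_out))  # what the RLE logic sees now
        # Clock edge: data output updates from the OLD latched address,
        # then the buffer latches the new row pointer.
        data_out = rows_data[addr_reg] if addr_reg is not None else None
        addr_reg = row_sel
    return trace


rows_data = {0: "row0 tags", 1: "row1 tags"}
trace = simulate_tag_buffer(rows_data, [0, 0, 0, 1, 1, 1])
for cycle, row_sel, seen in trace:
    print(f"cycle {cycle}: row_sel={row_sel} data_out={seen}")
```

In this model row_sel changes to 1 on cycle 3, but the row 1 data is not visible until cycle 5, so anything reading the buffer needs a wait state after each row change.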
Otherwise, the RLE thing seems to be mostly working.
I'll have to separate the texture_address module and other stuff into a TSP module next, which will take a while.
The RLE module is currently VERY SLOW, though.
It's due to the EMIT stage of the logic using WAY too many clock cycles atm.
Every time the Tag entry changes, it has to scan all 1024 entries again, to find the next match. That's crappy.
Apparently this is a much harder problem than I ever gave it credit for. lol
At least the delay issue from earlier is fixed. But there's no point if it's going to be this painfully SLOW.
RLE Start
tilex: 00 tiley : 00 Prim Type : 0 (Opaque)
row: 00 col: 00 tag: 0x001 cnt: 0544
row: 17 col: 01 tag: 0x001 cnt: 0031
row: 18 col: 01 tag: 0x001 cnt: 0031
row: 19 col: 01 tag: 0x001 cnt: 0031
row: 20 col: 02 tag: 0x001 cnt: 0030
row: 21 col: 02 tag: 0x001 cnt: 0030
row: 22 col: 02 tag: 0x001 cnt: 0030
row: 23 col: 02 tag: 0x001 cnt: 0030
row: 24 col: 03 tag: 0x001 cnt: 0029
row: 25 col: 03 tag: 0x001 cnt: 0029
row: 26 col: 03 tag: 0x001 cnt: 0029
row: 27 col: 03 tag: 0x001 cnt: 0029
row: 28 col: 04 tag: 0x001 cnt: 0028
row: 29 col: 04 tag: 0x001 cnt: 0028
row: 30 col: 04 tag: 0x001 cnt: 0028
row: 31 col: 05 tag: 0x001 cnt: 0026
row: 17 col: 00 tag: 0x002 cnt: 0001
row: 18 col: 00 tag: 0x002 cnt: 0001
row: 19 col: 00 tag: 0x002 cnt: 0001
row: 20 col: 00 tag: 0x002 cnt: 0002
Back to the older method, which is a lot faster.
But it means the RLE Tags are output in the order they are found in the Tag Buffer...
RLE Start
tilex: 00 tiley : 00 Prim Type : 0 (Opaque)
row: 00 col: 00 tag: 0x001 cnt: 0544
row: 17 col: 00 tag: 0x002 cnt: 0001
row: 17 col: 01 tag: 0x001 cnt: 0031
row: 18 col: 00 tag: 0x002 cnt: 0001
row: 18 col: 01 tag: 0x001 cnt: 0031
row: 19 col: 00 tag: 0x002 cnt: 0001
row: 19 col: 01 tag: 0x001 cnt: 0031
row: 20 col: 00 tag: 0x002 cnt: 0002
row: 20 col: 02 tag: 0x001 cnt: 0030
row: 21 col: 00 tag: 0x002 cnt: 0002
row: 21 col: 02 tag: 0x001 cnt: 0030
I'll just have to wait until it says that the whole tile's worth of Tags have been processed, then group the Tags in the TSP.
The Daytona render is back up to ~22 FPS theoretical (at 100 MHz).
Even ChatGippity says it's complex. lol
I thought this would be the key to getting the frame rates up, but it turns out that most things with this project are hard.
Oww. I really thought that was working then. lol
tilex: 00 tiley : 00 Prim Type : 0 (Opaque)
row: 00 col: 00 tag: 0x001 cnt: 0545
row: 17 col: 01 tag: 0x002 cnt: 0001
row: 17 col: 02 tag: 0x001 cnt: 0031
row: 18 col: 01 tag: 0x002 cnt: 0001
Stupid ChatGippity.
I'm done for today.
I think it's close, and I'm sure it can be done, but my brain is fried again, even when GPT is helping.
sigh
It was "grouping by tags", but only within each ROW, if that makes sense.
I need it to group by tags for the entire 1024 Tag buffer array.
I KNOW this is doable. Just have to keep chipping away at it.
I just need it to output the RLE, grouped by Tag, like earlier.
But that was SUPER slow, so would never have worked.
I had to start a new ChatGippity session, pasting in my current code
As it kind of got stuck in a loop, and was generating code from stuff it wrote about an hour ago. lol
You shouldn't need a bitmask for all visited pixels.
It should be possible to just output RLE values for the current Tag, until the end of the tile (row_sel==31 and col_sel==31).
Then, rewind to the position where the first Tag change was, and repeat the process until the end of the Tag buffer.
Repeat, until the total run count == 1023.
ChatGPT said:
You're absolutely right — thanks for pointing that out.
Since you only need to:
Focus on one tag at a time per scan of the tile,
Remember the location where the tag first changed, and
Repeat the scan from that point forward for the next tag,
There's no need for a full pixel-level bitmask. Just a few key registers (like the first tag, tag change point, and run counters) are enough.
I KNOW I'm right. lol
I'm just too lazy to figure it all out by myself.
I think it works, after about the 20th time of asking GPT different methods.
Grouped by TAG.
But also handles non-contiguous runs of Tag values, which often tend to switch back to the first Tag.
Still quite slow, but oh well.
Only ~13 FPS now, instead of ~23 FPS.
This is because it now has an INITIAL_SCAN phase, to count the number of different Tags in the tile, or something.
I can't leave it there. Still trying to speed it up.
omg, I forgot how annoying it is using ChatGPT / LLMs. lol
Just when you think it's on the right track, it f*cks something else up
Nope.
Definitely giving up for today.
It reminded me once again, that while LLMs like ChatGPT are awesome for certain things, they are also simultaneously shit. lol
I know what I need the code to do. I'll just have to try writing it in my own style.
1. Read the FIRST tag value.
2. Only generate runs for pixels with the SAME tag value, until the last pixel (1024) is reached.
If the Tag value changes, store the position (and Tag) of the FIRST time the tag changed, ignoring the rest.
3. Jump back to the position where that first Tag changed, and repeat the same process, generating runs for the SAME Tag value.
4. Once 1024 pixels have been processed, we know all Tags should have been processed -> Emit the RLE values for each Tag group in turn.
One optimization for that, would be to start emitting RLE output as soon as the first run of Tags ends.
(assuming the first contiguous Tag is shorter than the entire tile. Sometimes the whole tile will be covered by the same Tag / polygon.)
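The steps above can be sketched in software like this. It is a Python model, not the Verilog, and it makes one assumption that isn't in the steps: it keeps a small set of Tags that are already finished, so that a later pass doesn't rewind to a Tag it has already emitted (e.g. when the buffer switches back to the first Tag). That is still far smaller than a per-pixel bitmask. All names are hypothetical.

```python
def rle_grouped_by_tag(tags, width=32):
    """Return [(tag, [(row, col, count), ...]), ...] grouped in first-seen order."""
    groups = []
    done = set()       # tags already emitted (assumption, see lead-in)
    n = len(tags)
    start = 0
    processed = 0
    while processed < n:
        current = tags[start]          # step 1: read the first tag
        runs = []
        next_start = None
        i = start
        while i < n:                   # step 2: scan to the last pixel
            if tags[i] == current:
                j = i
                while j < n and tags[j] == current:
                    j += 1
                runs.append((i // width, i % width, j - i))
                processed += j - i
                i = j
            else:
                # Remember only the FIRST position holding a not-yet-handled tag.
                if next_start is None and tags[i] not in done:
                    next_start = i
                i += 1
        done.add(current)
        groups.append((current, runs))
        if next_start is None:
            break
        start = next_start             # step 3: jump back and repeat
    return groups                      # step 4: every pixel is accounted for


# Tiny hypothetical "tile", 4 pixels wide, where tag 1 keeps coming back.
groups = rle_grouped_by_tag([1, 1, 2, 1, 2, 2, 1, 3], width=4)
for tag, runs in groups:
    for row, col, cnt in runs:
        print(f"row: {row:02d} col: {col:02d} tag: 0x{tag:03X} cnt: {cnt:04d}")
```

The cost is one extra pass over the remainder of the buffer per distinct Tag, so a tile with many Tags gets slow, which matches the speed complaints above.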
ChatGPT just CAN'T do it. lol
It keeps changing between two different strategies, but still gets it wrong.
It does the first part OK, but then doesn't group by TAG. sigh
And the other method is WAY slower.
I can't believe I'm still looking at this today. lol
Back to something half-working again, but incredibly slow.
Wow, that's the first time it has had good output, AND doesn't slow down the rendering by any appreciable amount.
It's back to getting a calculated ~23 FPS.
Haven't done a git push in a very long time. Doing now...
Easier to see, in the last (lower-right) tile rendered.
It's not quite right, but still.
A run of 17 Tags, with the value 05.
Then it repeats a value of 05, one time, which is kinda wrong.
Almost there. I knew it could be done, it just needs refining / debugging now.
The RLE values will eventually get shoved into a FIFO, so the TSP knows which pixels to shade.
The different Tag values are used to access the param buffer, to grab the texture info, and re-load the Vertex values for the poly.
The vertex values, along with the depth values from the Z buffer, are interpolated in the TSP, to generate the texture address and RGB shading stuffs.
Almost fixed.
RLE Start
tilex: 19 tiley : 14 Prim Type : 0 (Opaque)
row: 00 col: 00 tag: 0x000 cnt: 1023
PRIM Done !
RLE Start
tilex: 19 tiley : 14 Prim Type : 3 (Translucent)
row: 00 col: 00 tag: 0x005 cnt: 0017
row: 00 col: 17 tag: 0x004 cnt: 0001
row: 00 col: 18 tag: 0x000 cnt: 1005
PRIM Done !
ChatGippity keeps removing the wait state, for when row_sel gets incremented.
Due to the delay needed for the Tag buffer / Z buffer to update.
Actually, that looks correct now, for that last tile.
A row of 17 Tags, with value 05.
Then one Tag of 04.
Then 1005 Tags with value 00.
It's also still outputting RLE based on each row, but whatever. I can work with that.
Typical Ash run: Pain, resignation, perseverance, progress 💞💞
As a quick test before sleep, I just disabled the rendering in the sim, to see what the max speed would be, with only the Tag buffer and RLE stuff in place.
The Daytona render hits 61 FPS (at 100 MHz), which is a tiny bit faster than my notes from before, when it was closer to 60 FPS.
So that's a good sign that the first stage of the ISP stuff is OK.
The rest will come down to efficient use of Burst transfers, to read the texture data from DDR.
The BIOS menu render was 98 FPS in my notes. It's currently 112 FPS with rendering disabled.
Hydro Thunder title was 101 FPS in my notes / comments, it's now 193 FPS. lol
But this is all cheating, of course, as it's not doing ANY Codebook reads, texture reads, UV interp, shading, colour blending, transparency.
It's just to see whether the "front end" stuff is fast enough.
Back again. Sort of.
I've been brainstorming about the next logical challenge...
The code for RLE seems to work well so far, and I've made it a bit more code-efficient, by removing duplicate sections of code.
So the RLE "runs" of Tag, Count, Start Row, and Start Column will get fed to the TSP via a FIFO.
The thing is, the TSP will need both the Params+Vertex+Colour info from the Param Cache, plus ALL of the Z values from the Z-buffer.
I could just allow the TSP to read the Z values from the Z-buffer one at a time during rendering.
But that ties up the Z-buffer, which means it can't be used for the next tile until the TSP has finished the current one.
The RLE logic does have to read each Tag in turn, for all 1,024 Tags in the buffer.
During that time, I could just be writing the (per-Tag) Params and Z values to the TSP.
Then the TSP should have everything it needs to texture the tile pixels, freeing up the Param cache and Z-buffer, allowing the ISP to process the next tile.
It's just a shame to have to duplicate the Param cache and Z-buffer in the TSP, but from watching Mr RTLengineering's vid again, it looks like the real TSP (on the chip) does that anyway.
One good thing is, the Param cache in the TSP can be a lot smaller than in the ISP, because the RLE logic basically re-numbers the Tags, based on whichever it processes first.
Which is another concept that's a bit hard to explain. lol
An example of a Tag buffer, after the HSR (Hidden Surface Removal)...
Where the Tag values were just a counter that incremented once for each incoming polygon (or partial polygon).
Only the pixels of the polygon which satisfied the depth compare get written to the Tag buffer, overwriting any previous Tags / "pixels".
(the Z value for each pixel gets overwritten as well).
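As a separate toy illustration of that Tag / Z overwrite (with made-up values, not the ones from the screenshot), here is a Python sketch. It assumes a "greater or equal wins" depth compare with the Z-buffer cleared to 0.0. The real compare mode is a per-polygon setting, so that choice is only an assumption, as are all the names.

```python
def hsr_tile(polygons, num_pixels):
    """polygons: list of {pixel_index: z} coverage maps, in submission order."""
    tag_buf = [0] * num_pixels     # 0 = nothing drawn yet
    z_buf = [0.0] * num_pixels     # assumed cleared to "far"
    for tag, coverage in enumerate(polygons, start=1):  # tag = incrementing counter
        for pixel, z in coverage.items():
            if z >= z_buf[pixel]:          # assumed depth compare mode
                tag_buf[pixel] = tag       # overwrite the previous Tag...
                z_buf[pixel] = z           # ...and its Z value
    return tag_buf, z_buf


# Four pixels: polygon 1 covers pixels 0-2, polygon 2 covers pixels 1-2.
tag_buf, z_buf = hsr_tile(
    [{0: 0.5, 1: 0.5, 2: 0.5}, {1: 0.8, 2: 0.2}],
    num_pixels=4,
)
print(tag_buf)  # prints [1, 2, 1, 0]
print(z_buf)    # prints [0.5, 0.8, 0.5, 0.0]
```

Polygon 2 only wins pixel 1, where it passes the compare. Pixel 2 keeps Tag 1, and pixel 3 was never covered.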
The job of the RLE logic, is to scan through the completed Tag buffer, and produce the RLE "runs" (Tag + Count) to feed to the TSP for texturing.
So instead of the Tags being 23, 24, 24, 24, 1D in the example above,
the RLE logic re-numbers those to 1, 2, 2, 2, 3, etc.
Incoming row from the Tag buffer...
23, 24, 24, 24, 1D, 1F, 1F, 1F, 20
RLE renumbering...
1, 2, 2, 2, 3, 4, 4, 4, 5
RLE output...
Start Row = 0, Start Col = 0, Tag = 1, Count = 1
Start Row = 0, Start Col = 1, Tag = 2, Count = 3
Start Row = 0, Start Col = 4, Tag = 3, Count = 1
Start Row = 0, Start Col = 5, Tag = 4, Count = 3
Start Row = 0, Start Col = 8, Tag = 5, Count = 1
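The renumbering example above can be reproduced with a short Python model (a sketch only, with hypothetical names). Tags are renumbered from 1 in the order they are first seen, and then the runs are generated from the renumbered row.

```python
def renumber_and_encode(row_tags, row=0):
    """Renumber tags in first-seen order, then run-length encode the row."""
    mapping = {}
    renumbered = []
    for tag in row_tags:
        if tag not in mapping:
            mapping[tag] = len(mapping) + 1
        renumbered.append(mapping[tag])

    runs = []
    start = 0
    for i in range(1, len(renumbered) + 1):
        if i == len(renumbered) or renumbered[i] != renumbered[start]:
            runs.append((row, start, renumbered[start], i - start))
            start = i
    return renumbered, runs


incoming = [0x23, 0x24, 0x24, 0x24, 0x1D, 0x1F, 0x1F, 0x1F, 0x20]
renumbered, runs = renumber_and_encode(incoming)
print(renumbered)  # prints [1, 2, 2, 2, 3, 4, 4, 4, 5]
for r, c, tag, cnt in runs:
    print(f"Start Row = {r}, Start Col = {c}, Tag = {tag}, Count = {cnt}")
```

Because the new numbers only go as high as the count of distinct Tags in the tile, a lookup keyed on them can be much smaller than one keyed on the original Tag values.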
You can't really just instantly "grab" the data from the Param buffer and Z-buffer and copy over to the TSP.
'cos it's not like C/C++, where you can just do that in one line of code. You actually need to think about where the data flows in the FPGA, and write a new module for it.
Sure, it's not always that hard to just copy n paste the instantiation for an existing module.
But I also need to think about the amount of logic available on the FPGA, and the DE10 is already running low. lol
I guess the absolute worst-case scenario for RLE is that every single "pixel" in the Tag buffer is unique.
So up to 1,024 (32x32) Tag values.
Which would be incredibly rare in a game, I think, but still.
The reason the TSP needs the Vertex params and Z values at all, is just to do the U,V interpolation for texturing, and then divide that by Z, for perspective-correction.
(and do the interp for RGB shading, and eventually Gouraud.)
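For anyone wondering what "interpolate, then divide by Z" means in practice, here is a generic perspective-correct texturing sketch in Python. It is the textbook method, not necessarily how the core (or the real chip) does it. It assumes the per-vertex Z acts as 1/w, uses barycentric weights instead of plane equations, and interpolates the per-pixel Z itself where the core would read it from the Z-buffer. All names are made up.

```python
def barycentric(px, py, a, b, c):
    """Barycentric weights of point (px, py) for triangle a, b, c (x, y pairs)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    w_b = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    w_c = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return 1.0 - w_b - w_c, w_b, w_c


def perspective_uv(px, py, verts):
    """verts: three (x, y, z, u, v) tuples, with z treated as 1/w (assumption)."""
    weights = barycentric(px, py, *[(x, y) for x, y, _, _, _ in verts])
    # Interpolate u*z and v*z linearly across the triangle...
    u_z = sum(w * u * z for w, (_, _, z, u, _) in zip(weights, verts))
    v_z = sum(w * v * z for w, (_, _, z, _, v) in zip(weights, verts))
    # ...and divide by the per-pixel z (the core would read this from the Z-buffer).
    z = sum(w * vz for w, (_, _, vz, _, _) in zip(weights, verts))
    return u_z / z, v_z / z


flat = [(0, 0, 1.0, 0.0, 0.0), (4, 0, 1.0, 1.0, 0.0), (0, 4, 1.0, 0.0, 1.0)]
tilted = [(0, 0, 1.0, 0.0, 0.0), (4, 0, 0.5, 1.0, 0.0), (0, 4, 1.0, 0.0, 1.0)]
u_flat, v_flat = perspective_uv(2, 0, flat)
u_tilted, v_tilted = perspective_uv(2, 0, tilted)
print(u_flat, u_tilted)
```

With equal Z at all three vertices the midpoint of the edge gets u = 0.5. With the second vertex further away, the same pixel gets a u of about 0.33, which is the perspective correction at work.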
I could maybe just do the UV interp in parallel with the RLE stage.
This is all the stuff the Param buffer stores atm, per Tag (incoming to the ISP)...
// output [31:0]
.isp_inst_out(isp_inst_out), .tsp_inst_out(tsp_inst_out), .tcw_word_out(tcw_word_out),
// output [47:0]
.vert_a_x_out( vert_a_x_out), .vert_a_y_out(vert_a_y_out), .vert_a_z_out(vert_a_z_out), .vert_a_u0_out(vert_a_u0_out), .vert_a_v0_out(vert_a_v0_out),
.vert_a_base_col_0_out(vert_a_base_col_0_out), .vert_a_off_col_out(vert_a_off_col_out),
.vert_b_x_out( vert_b_x_out), .vert_b_y_out(vert_b_y_out), .vert_b_z_out(vert_b_z_out), .vert_b_u0_out(vert_b_u0_out), .vert_b_v0_out(vert_b_v0_out),
.vert_b_base_col_0_out(vert_b_base_col_0_out), .vert_b_off_col_out(vert_b_off_col_out),
.vert_c_x_out( vert_c_x_out), .vert_c_y_out(vert_c_y_out), .vert_c_z_out(vert_c_z_out), .vert_c_u0_out(vert_c_u0_out), .vert_c_v0_out(vert_c_v0_out),
.vert_c_base_col_0_out(vert_c_base_col_0_out), .vert_c_off_col_out(vert_c_off_col_out),
Aaand, I think I just spotted a bug. lol
I'm currently using the Z values from the Param buffer during rendering.
But those Z values are from the three points of the triangle, NOT interpolated across the surface.
Which would explain why the textures are wonky.
Actually, I think that's correct.
The UV interp blocks do use the Z values from the three points of the triangle.
The UV Clamp / Clip module uses the Z values from the Z-Buffer, to do the final divide for Perspective Correction.
That makes things a bit easier.
Would be nice if I could just pre-calc the final UV values (texture coords) as the RLE block runs.
But the problem is, it would also need to load the params from the Param cache for each Tag as well.
I think this is doable.
It would essentially be doing some of the "rendering" stage during RLE.
But we would be directly storing the final UV values after Z-interp.
OK, yep, I'm already addressing the Tag/Z-buffer during RLE.
And that will be outputting the (original) Tag values, loading the params from the Param buffer, and doing the UV interp with Z.
So I just need to store the final u_flipped and v_flipped values, to send to the TSP.
(as well as the ISP/TSP/TCW params words from the Param buffer)
That should be all the TSP needs to do the texturing. The interp for RGB shading / Gouraud can happen later.
Ooh, I might not need to "store" this stuff before texturing.
This IS the texturing info.
And the RLE block scans through each "pixel" of the Tag buffer anyway.
If the UV interp is happening in parallel with the RLE, then that's all the TSP will need.
I mean, this is almost the same as doing the "direct" rendering, like before. lol
I clearly need to think about it some more.
The main aim of the RLE, was to group the Tags together, to avoid re-reading the Params and texture chunks for each "run" of pixels.
Yep, can't do this in parallel with the RLE.
(and can't group all Tag values together, until the RLE has processed all 1,024 "pixels" in the Tag buffer.)
OK, so the info I need to send to the TSP, can be done during the "emit" stage of RLE...
RLE Start
(Send the ISP/TSP/TCW, Vertex, and RGB shading Params to the TSP here).
tilex: 05 tiley : 07 Prim Type : 0 (Opaque)
row: 00 col: 00 tag: 0x4D7 cnt: 0000
row: 02 col: 14 tag: 0x336 cnt: 0002
row: 14 col: 11 tag: 0x0DB cnt: 0001
row: 16 col: 16 tag: 0x0FC cnt: 0001
row: 17 col: 17 tag: 0x102 cnt: 0001
Would need to keep shoving new params into the TSP FIFO every time there's a new Tag "run", actually.
So when rle_valid goes high, the Params can be put alongside the RLE stuff, within the same "word"
Nope, no need to send the Param stuff twice, for times when there are multiple runs with the same Tag (polygon) value.
So the params would be a separate input word into the TSP buffer, with a flag set to denote that.
I think this will create minimal overheads.
Since the whole point of the RLE stage is to group by Tag, but also minimize the amount of data sent to the TSP.
Sending the params via the TSP FIFO is a good idea, and possibly what the real chip does.
'cos as soon as all of the Params and RLE runs have been sent, the TSP can start texturing the pixels, and the ISP can continue with the next tile.
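The FIFO idea above (params sent once per Tag, flagged as a different kind of word from the runs) could look like this in a software model. This is only a Python sketch. The word layout, the field names, and the placeholder param contents are all made up, and the real FIFO word format hasn't been decided in the chat.

```python
def build_tsp_fifo(groups, param_buffer):
    """groups: [(tag, [(row, col, count), ...])], already grouped by Tag."""
    fifo = []
    for tag, runs in groups:
        # One param word per Tag, flagged so the TSP can tell it from a run.
        fifo.append({"is_param": True, "tag": tag, "params": param_buffer[tag]})
        for row, col, count in runs:
            fifo.append({"is_param": False, "row": row, "col": col, "count": count})
    return fifo


# Hypothetical data: tag 1 has two runs, tag 2 has one.
param_buffer = {1: "ISP/TSP/TCW + verts for poly 1", 2: "ISP/TSP/TCW + verts for poly 2"}
groups = [(1, [(0, 0, 17), (1, 0, 12)]), (2, [(0, 17, 15)])]
fifo = build_tsp_fifo(groups, param_buffer)

for word in fifo:
    print(word)
```

Here tag 1's params go in once even though it has two runs, so the FIFO holds five words instead of six.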
But still might need to duplicate the Z-buffer in the TSP, so it can do the interp.
Can't get around keeping the original Z-buffer, as that interp gets done on 32 pixels at once.
So that's the main speed-up for HSR. It does the Z-interp and depth-compare on 32 pixels (a whole tile row), so is about as fast as it gets.
And I can't transfer those Z values to the TSP until the ISP has finished processing the whole tile.
(plus the RLE needs to do the Tag grouping)
But I can transfer the Z values one-at-a-time during the RLE scan.
That can all be shoved through the TSP FIFO as well.
I have quite a lot of work to do. Need to think about it a bit more first.
Only a small update tonight.
Spent about two hours, just putting the stuff for the TSP into a separate module, then hooking everything up again.
The next thing to do is to add a second Z-buffer in the TSP module.
Then transfer the Z values from the main buffer into that one, during the RLE "scan" phase.
Since the RLE scan has to go through all 1,024 "pixels" of the Tag buffer anyway.
At 100 MHz, and 300 tiles (640x480 res), the RLE scan takes (close to) 3.072 milliseconds.
Which does eat into the frame render time, but there might be a few ways to optimize that later. I have a feeling the real chip didn't do it much faster than that.
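The 3.072 ms figure falls out of the tile count and clock speed. A quick Python check, assuming the scan reads exactly one Tag per clock with no wait states (the real logic has a small delay per row), and using a 60 Hz frame purely as a reference point:

```python
TILE_SIZE = 32
CLOCK_HZ = 100_000_000

tiles = (640 // TILE_SIZE) * (480 // TILE_SIZE)   # 20 x 15 tiles
cycles_per_frame = tiles * TILE_SIZE * TILE_SIZE  # one clock per Tag (assumption)
scan_ms = cycles_per_frame / CLOCK_HZ * 1000
share_of_60hz = scan_ms / (1000 / 60)

print(f"{tiles} tiles, {cycles_per_frame} cycles, {scan_ms:.3f} ms per frame")
print(f"about {share_of_60hz:.0%} of a 60 Hz frame")
```

So the scan alone uses a bit under a fifth of a 60 Hz frame, before any texturing is done.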
Once the proper texture cache is in place, it will pre-read chunks of texture into it.
It would be possible later (if there's enough FPGA logic left), to render more than one pixel at a time, because you can access more than one pixel from the internal texture cache.
That should give a big speed increase.
The ISP now contains the main state machine for Vertex / colour param parsing, the float_to_fixed instances, the Parameter Buffer, inTri_calc (32 Tags at once), interp_z (32 values at once), Z-buffer, and RLE block.
The TSP contains the U interp, V interp, UV clamp/clip module, texture address, palette, codebook cache, colour blending, tile ARGB buffer, and tile writeback logic.
Now I can see why (and how) the texture stuff was often kept separate, on graphics cards since the 3dfx Voodoo cards.
(and way before that, in the mid-80s, on SGI machines, etc.)
Nothing exciting to show for screenshots atm, as it looks exactly the same as before.
Which was kind of the idea tonight. lol
And I still never got around to hooking up the o'scope to a DC mobo, to confirm whether or not it can output four different VRAM addresses at once.
I now tend to agree with skmp that it doesn't.
It can definitely output TWO separate addresses, though, for each 32-bit half of VRAM.
(reading Vertex and Colour params from one half, Texture chunks from BOTH halves, then writing to the framebuffer in the opposite half.)
VQ textures are all messed-up again, because of course they are.
The missing chunks are down to some issue in my recent code, which was aiming to skip tile rows to speed things up.
And the half-res rendering, I think was meant to mimic how it writes to the Framebuffer on the MiSTer test "core".
(due to me not being able to figure out exactly how to tweak ASCAL to make it display the full res.)
What's it been, like two years now? lol
Is that Link as an owl with rosacea
The last visual memory card you'll ever need for your Dreamcast. But it's not just for your Dreamcast. The VMUPro has the ability to run applications and 8-bit console emulators turning it into a small handheld. Enjoy your favourite games from consoles like the Game Boy & Game Boy Color, NES, Game Gear and Master System on its amazing Backlit IP...
I liked the other memory card pros
This should be cool too
I loved my mem card pros till they all stopped working 😦
I never had such bad luck in my life with a product
I’m still wondering what the VM2 runs on?
I couldn’t find any info on its CPU on the product page.
That kinda makes me not want to buy them, given how that very much looks like beyond mere bad luck, and into shoddy build quality.
vm2 is fine, looking forward to finally having a matching wireless dreamcast “dreamcon”
If one was to buy a modded DC in the US, what is "the" store, or two, to do so
Stone Age Gamer might have what you're looking for
I need to mod mine
Sorry to hear that. I think mine is like the first they ever made, and it works great still
I have a pretty old school mod. Works great! I got it back when the options were very limited...
The basic "usb gdrom"
According to this Reddit Post, mine is the worst 🤣
I remember the guy who I bought it off was kind of a "personality"
What's his name
"Dr. Mnemo" he went by
The Reddit Post is critical, but I honestly never had an issue and would never play PAL games or use the fishing rod so no biggy.
I have the MODE on my Sega Saturn, and it's really flashy/nice/perfect so I assume must be same on the Dreamcast. I'd get that one if I had to choose.
Jeeeeeezus the prices. Are ODEs really $299 ?
Oh really wtf
Just get a gdemu from Ali
Should be less than £50
I think there was an issue though with a specific game.
I need a Flippydrive equivalent for the Dreamcast
Ye thats on the 5.15 model, if u get a 5.20 model I havent had any issues with any game
The way its flex cable acts as a pass-through for the original optical drive is genius, I would love to see a similar method used for PS2 if possible
so damn tired of OPL and its compatibility quirks
5.20 gdemu clone works great!
Had no problem fitting essentially a 1g1r set on a 400GB micro SD with plenty of room to spare
I assumed the clone GD Emus from AliExpress were so cheap, there wasn't too much hurry to try to get my version finished.
But it looks like there is always a market for alternatives.
I just can't seem to get out of this mental rut atm.
ohhh wait, I responded to this and missed the fact that flippydrive allows the disc drive to function as normal …that would be awesome for DC 🙏
Oh, I just saw Tito's video (Macho Nacho Production) where he's modding a Dreamcast.
He is applying a new heat sink. I am always into preserving my consoles to last longer. I might just do this since it seems easy.
Has anyone done this before?
One thing I wouldn't mind doing is recapping the smaller capacitors on it as well
I think I recapped some of the caps, but not all.
I will have to check later.
Also installed a quiet fan mod ... so far!
Definitely worth doing. Especially if you can do it yourself.
I remember being really hyped for the Dreamcast at the time (and for good reason)..
But when I finally got one, and realized that the PAL models were basically all VA1 - they had no heat pipe cooling. 😦 lol
Tito is somebody I would love to send a "DC Pro" motherboard to, if I ever get around to building the new one.
It would have built-in HDMI, the FPGA, and the extra RAM necessary for running NAOMI games.
(games would be selectable via the On-screen menu, which could also be output via RGB, a bit like how MiSTer does it.)
I'm just in such a huge mess again in my two rooms here.
It reached critical mass again, and I need to get rid of more stuff to even be able to move, to tidy the rest. lol
(Optoma projector, small bookshelf speakers, Onkyo AVR with Atmos, four B&W in-wall speakers, too many DLP projectors, Yamaha rack sampler, high-speed camera from Dexter's Tech Lab that I hung onto for ten years, etc.)
Dreamcast rules
Yeah the Dreamcast in my setup is the only original retro gaming system I have.
When I was a kid I sold my NES at a garage sale...
Sold my SNES on eBay.
Those were the only two systems I owned before the Dreamcast. I told a buddy of mine, who I played a lot of Soul Calibur and Gauntlet Legends with, that he could have the Dreamcast while I was away in another country for six years.
When I moved back to my country, I asked for the Dreamcast back and here it is in my awesome collection... the ONLY original console from my youth which I still own.
And I bought the Dreamcast on 9/9/99
We were able to reserve one somehow. Honestly it blew me away.
The N64 had never really impressed me (sorry, it didn't) but the Dreamcast made me think "OK, I can finally dig 3D console games now"
Soul Calibur is simply one of the best in the history of gaming. Period.
Also played the hell out of games like Shenmue and yeah, just such an awesome system. I even loved the design of the controller, the vmu thing was fun, and the system itself looked so sleek.
Was sad to see it die suddenly tho. I remember when I got to the video store and they were really liquidating all the games, for very cheap. I bought a bunch of them.
All from $5 to $20. I don't think there was a game there that was over $25.
Just purchased it https://www.amazon.ca/dp/B09DC772PR
Apparently this thermal pad is quite a bit better than the ones Sega used even when they were brand new.
Yeah. Mine just arrived today.
nice.
BUMP
Gouraud shading in Verilog now.
I realize the textures are all bumpy atm, but I confirmed that's due to lack of Z precision.
This was before I fixed most of the issues with Quads / sprites...
Before anyone gets over-excited, and I have to say this EVERY time (lol)... this is not anywhere near a "core".
This is just rendering single frames from the VRAM dumps from a tweaked reicast emulator.
Then shoving the 200 or so BMP files into a GIF Maker online.
But I managed to get the Daytona render back up to around 27 FPS (calculated, assuming 100 MHz core)...
Ignoring the broken big poly near the camera, and lack of logo.
And slightly wrong sky texture.
Still not rendering by Tag order / RLE spans yet.
Which should give another speed boost.
Even if this never makes it into a future core, I want to try hooking it up to an Amiga or something, for the lolz.
Why an Amiga?
Because the Amiga would have just enough power to do some basic 3D calcs, then some DMA to write to the DC GPU.
And because it would be funny. 😛
I just think it could also be neat for democoders and homebrew devs to write for a "new" style of GPU.
you could always design an isa card … new powervr for all!
The thought did cross my mind.
I could probably do PCI, from what I learned with the MiSTer PCI stuff.
“Please subscribe me to your mailing list” 😀
Hooooray, we're back on the @rain obsidian train!
He is back in the hood!
You probably know it as the Katana, not sure if you heard they renamed it
Oh thank you, you're right. Time flies, haha
@rain obsidian what is the range that prim-tag can be? My initial thought was past a certain number of triangles within a tile, or when it was given a translucent triangle, it would render out what it had.
12-bit prim_tag atm, at least on the sim.
And quite a few params stored per-triangle atm.
Probably possible to do the RGBA and UV interp precalcs as the ISP runs.
Which would mean only needing to store those RGBA UV values (and deltaS), rather than storing the vertex stuff for every triangle.
And the ISP currently doesn't "evict" stuff from the Tag buffer, if newer triangles completely overwrite the pixels of the older stuff.
(depth compare, etc.)
So it is very wasteful, tbh.
The param buffer in the TSP can probably be made a lot smaller, as the average number of unique Tags (per-tile) is often lower than what gets shoved into the ISP.
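A rough software model of the Tag buffer point above, as a sketch only. It assumes a 32x32 tile, one depth value per triangle, and a "larger depth value wins" compare (all assumptions for illustration, not the actual Verilog). The names `run_isp` and `visible_tags` are hypothetical.

```python
TILE = 32  # assumed tile size in pixels (32x32)

def run_isp(triangles):
    """Fill a per-tile Tag buffer.

    triangles: list of (pixels, depth), where pixels is a list of (x, y)
    positions covered inside the tile and depth is a single value per
    triangle (a simplification; real depth varies per pixel).
    The list index is used as the prim_tag.
    """
    tag = [[None] * TILE for _ in range(TILE)]
    depth = [[float("-inf")] * TILE for _ in range(TILE)]
    for prim_tag, (pixels, z) in enumerate(triangles):
        for x, y in pixels:
            # Assumed compare mode: a larger value is closer to the camera.
            if z > depth[y][x]:
                depth[y][x] = z
                tag[y][x] = prim_tag
    return tag

def visible_tags(tag):
    """Set of prim_tags that still own at least one pixel in the tile."""
    return {t for row in tag for t in row if t is not None}

full_tile = [(x, y) for y in range(TILE) for x in range(TILE)]
corner = [(x, y) for y in range(4) for x in range(4)]

# Three triangles submitted, but only the second one ends up visible.
submitted = [(full_tile, 0.1), (full_tile, 0.5), (corner, 0.2)]
survivors = visible_tags(run_isp(submitted))
print(len(submitted), "submitted,", len(survivors), "still visible")
```

In this toy case, params for all three triangles would be stored if nothing is evicted, even though only one Tag is needed by the time the tile is shaded.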
Some of this is quite hard to explain. lol
But I've been trying to tidy up the code a bit, this past week or so.
All in that "new_branch".
I need to ditch the older branch soon.
RA parser just reads the initial stuff which tells it what type of object (triangles, quads, opaque or translucent etc.) goes into each tile.
ISP Parser contains most of the rendering stuff.
Most texturing now moved into the TSP module, in a separate file.
Thinking about it, I have a minimum of 32 clock cycles when processing each triangle.
Which is more than enough time to do the interp precalcs for R,G,B,A, and UV.
I guess it still ends up storing almost as much info, though. lol
Currently stores this, for every incoming triangle (per-tile)...
input [31:0] isp_inst_in,
input [31:0] tsp_inst_in,
input [31:0] tcw_word_in,
input [47:0] vert_a_x_in,
input [47:0] vert_a_y_in,
input [47:0] vert_a_z_in,
input [47:0] vert_a_u0_in,
input [47:0] vert_a_v0_in,
input [31:0] vert_a_base_col_0_in,
input [31:0] vert_a_off_col_in,
Plus the same, for verts B and C.
Vert D is only used for quads.
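To put a number on that port list, here is a quick tally of the bits stored per triangle. The widths are copied from the list above. The buffer size assumes one slot for every possible 12-bit prim_tag value, which is an assumption for illustration only.

```python
# Bit widths copied from the port list above (per-triangle words).
PER_TRIANGLE_BITS = {"isp_inst": 32, "tsp_inst": 32, "tcw_word": 32}

# Bit widths for one vertex (the list shows vert A; B and C are the same).
PER_VERTEX_BITS = {
    "x": 48, "y": 48, "z": 48, "u0": 48, "v0": 48,
    "base_col_0": 32, "off_col": 32,
}

def triangle_bits(num_verts=3):
    """Total stored bits for one primitive with num_verts vertices."""
    return (sum(PER_TRIANGLE_BITS.values())
            + num_verts * sum(PER_VERTEX_BITS.values()))

def param_buffer_bytes(tag_bits=12, num_verts=3):
    """Buffer size if every possible prim_tag value had its own slot."""
    return (1 << tag_bits) * triangle_bits(num_verts) // 8

print("bits per triangle:", triangle_bits(3))
print("bits per quad:    ", triangle_bits(4))
print("bytes for 4096 triangle slots:", param_buffer_bytes(12, 3))
```

That works out to 1008 bits (126 bytes) per triangle, so a full 4096-entry buffer would be roughly 504 KB under these assumptions.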
For colour interp, I'd need to store FDDX, FDDY, and small_c for each of Base colour A,R,G,B, and Offset colour A,R,G,B.
So that's already 24 values, which need to be maybe 16-bit wide (each).
For texturing, it needs FDDX, FDDY, and small_c for U, and the same for V.
So that's another six values, some of which need to be fairly wide, like at least 32-bit.
So that's, erm, about 30 values, some at least 32-bit wide.
For every single incoming triangle. lol
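The FDDX / FDDY / small_c trio is a plane equation per attribute, so the count above can be sketched like this. This is a floating-point illustration only: it ignores perspective correction and fixed-point widths, and the function names are hypothetical.

```python
def plane_setup(v0, v1, v2):
    """Return (fddx, fddy, c) so that attr(x, y) = fddx*x + fddy*y + c.

    Each vertex is (x, y, attr), where attr is any per-vertex value
    (one colour channel, U, or V).
    """
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = v0, v1, v2
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        raise ValueError("degenerate triangle")
    fddx = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / area
    fddy = ((x1 - x0) * (a2 - a0) - (x2 - x0) * (a1 - a0)) / area
    c = a0 - fddx * x0 - fddy * y0
    return fddx, fddy, c

def interp(plane, x, y):
    """Evaluate the plane equation at pixel (x, y)."""
    fddx, fddy, c = plane
    return fddx * x + fddy * y + c

# Count of precalculated values per triangle, as tallied above.
COLOUR_CHANNELS = 8   # Base A,R,G,B + Offset A,R,G,B
TEXTURE_COORDS = 2    # U and V
VALUES_PER_PLANE = 3  # FDDX, FDDY, small_c
total_values = (COLOUR_CHANNELS + TEXTURE_COORDS) * VALUES_PER_PLANE

red_plane = plane_setup((0, 0, 10), (4, 0, 50), (0, 4, 90))
print(red_plane, interp(red_plane, 1, 1), total_values)
```

Once the three values exist for an attribute, each pixel only needs two multiplies and two adds (or incremental adds while stepping along a row), which is why doing the setup during the ISP pass is attractive.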
One good point is, I'm storing the vertex / shading / texturing params for ALL triangles read into the ISP atm, from VRAM.
Including the triangles which might not contribute any pixels to the Tag buffer.
So there's a saving to be made there, to reduce the max number of params stored.
Tried to tweak it, so it only increments prim_tag when each triangle has "visible" pixels in the Tag buffer.
It didn't quite work out.
Trying to add a Texture viewer.
Starting to see part of the SEGA logo in there.
But this will only work for uncompressed textures atm.
So many games use palettes and VQ textures, it's not showing much atm.
But interesting to see, and it can be improved in the C code.
Started adding a texture viewer.
I'm guessing it's upside-down, because 0,0 for the Y coord is at the lower-left for textures. lol
Interesting how the logo stuff is all shoved into a single texture.
Supposedly it is 256x256, ARGB 4444, uncompressed, but using a twiddled address.
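For reference, unpacking one ARGB 4444 texel for a viewer could look like the sketch below. It assumes alpha sits in the top nibble and blue in the bottom nibble, and expands each 4-bit channel to 8 bits by nibble replication. The function name is made up for this example.

```python
def argb4444_to_rgba8888(texel):
    """Unpack a 16-bit ARGB 4444 texel into 8-bit (r, g, b, a)."""
    a = (texel >> 12) & 0xF
    r = (texel >> 8) & 0xF
    g = (texel >> 4) & 0xF
    b = texel & 0xF

    def expand(nibble):
        # Replicate the nibble so 0x0 -> 0x00 and 0xF -> 0xFF.
        return (nibble << 4) | nibble

    return expand(r), expand(g), expand(b), expand(a)

print(argb4444_to_rgba8888(0xF80F))
```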
Gonna try to fix the upside-down.
Texture viewer is probably half-res atm, because, well...
That whole thing with the DC reading textures from VRAM via the 64-bit wide path.
And now I'm confused between this, and how it's done on the hacked reicast emu.
One of them just places each half of VRAM (4MB each) contiguously.
The other interleaves the lower and upper halves in each 64-bit word.
The latter actually makes a lot more sense for how to store it in DDR3.
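The two layouts can be converted between with a small address shuffle. This sketch assumes 8MB of VRAM split into two 4MB halves, interleaved at 32-bit granularity, with the lower half in the low 32 bits of each 64-bit word. Those details and the function names are assumptions for illustration.

```python
VRAM_SIZE = 8 * 1024 * 1024
HALF_SIZE = VRAM_SIZE // 2  # 4MB per half

def contiguous_to_interleaved(addr):
    """Byte address in the 'two 4MB halves back to back' layout ->
    byte address in the 'halves interleaved per 64-bit word' layout."""
    half = (addr // HALF_SIZE) & 1   # which 4MB half
    offset = addr % HALF_SIZE        # byte offset inside that half
    word = offset // 4               # 32-bit word index inside the half
    byte = offset % 4
    return word * 8 + half * 4 + byte

def interleaved_to_contiguous(addr):
    """Inverse of contiguous_to_interleaved."""
    word = addr // 8
    half = (addr // 4) & 1
    byte = addr % 4
    return half * HALF_SIZE + word * 4 + byte

# Word 0 of each half lands in the same 64-bit word.
print(contiguous_to_interleaved(0x000000), contiguous_to_interleaved(0x400000))
```

With the interleaved layout, one 64-bit DDR read returns the matching 32-bit word from both halves at once, which fits the 64-bit texture path.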
Inverting Y, when writing to the screen, but then it puts the texture at the bottom of that whole image.
Not quite what I wanted. lol
Oww. lol
HOTD2 title screen, is drawn in strips of tall-ass textures.
That one sort of makes sense, but then it has to grab the PRESS START texture from a larger sheet. Interesting.
(also shows my decoding isn't quite right yet, because I'm only reading half of each 64-bit word.)
Yeah, bottom left texture origin is some evil Silicon Graphics holdover from academics.
Once again finding out that working with textures with twiddled addresses, is hard.
It really twiddles my brain, because of how the bits of the address get interleaved.
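A minimal sketch of that bit interleaving (Morton / Z-order) for a square texture. It assumes the Y bits land in the even bit positions and the X bits in the odd ones. That bit order is an assumption here, and swapping it transposes the texture, so it is worth checking against a known-good decoder.

```python
def spread_bits(value):
    """Move bit n of value to bit 2n (0b111 -> 0b010101)."""
    out = 0
    bit = 0
    while value:
        out |= (value & 1) << (2 * bit)
        value >>= 1
        bit += 1
    return out

def twiddle(x, y):
    """Texel index for (x, y) in a square twiddled texture.
    Assumed order: y0, x0, y1, x1, ... from the least significant bit up."""
    return spread_bits(y) | (spread_bits(x) << 1)

# Walk the first 2x2 block, then the start of the next block along X.
for (x, y) in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]:
    print((x, y), "->", twiddle(x, y))
```

Multiply the index by the texel size (2 bytes for ARGB 4444) to get the byte offset from the texture base.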
Still awake, just.
Been watching YT mostly (Mat Armstrong, etc).
But just got uncompressed textures to display OK.
The jumbled-up textures are VQ-compressed.
That will take longer to figure out.
But VQ basically squeezes 32 Texels in each 64-bit Word.
(Each 1-BYTE index into the VQ Codebook represents a block of FOUR texels.)
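The "32 texels per 64-bit word" figure falls out like this. The sketch assumes a 256-entry codebook where each entry holds four 16-bit texels (one 2x2 block), and that index bytes are taken from the low byte of the word upwards. The byte order and the function name are assumptions.

```python
INDICES_PER_WORD = 8    # 64 bits / 8 bits per index
TEXELS_PER_ENTRY = 4    # one 2x2 block per codebook entry

def decode_vq_word(word64, codebook):
    """Expand one 64-bit word of VQ indices into 32 texels.

    codebook: sequence of 256 entries, each a sequence of 4 texel values.
    The texels come out block by block; placing each 2x2 block on screen
    (including any twiddling of block positions) is not handled here.
    """
    texels = []
    for i in range(INDICES_PER_WORD):
        index = (word64 >> (8 * i)) & 0xFF  # assumed: low byte first
        texels.extend(codebook[index])
    return texels

# Dummy codebook: entry n is four copies of the value n.
dummy_codebook = [(n, n, n, n) for n in range(256)]
decoded = decode_vq_word(0x0706050403020100, dummy_codebook)
print(len(decoded), decoded[:8])

# Size of the codebook itself, with 16-bit texels.
codebook_bytes = 256 * TEXELS_PER_ENTRY * 2
```

So each 64-bit read yields 512 bits worth of 16-bit texels, an 8:1 ratio before counting the codebook.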
Wow, they even have a texture for the SOLE of the Zombie's shoes. lol
(the irony)
Sole/Soul.
Was a bit bored.
So I thought I'd try some digital Archaeology.
The sky texture is probably from some CD-ROM in the 90s, like many of the first-party Nintendo games.
But interesting to see it's possible to find it.
It was one of the first hits on a Google reverse image search, but I didn't see any other matches yet.
And it looks like they probably edited it for use in Daytona, so they could have a larger section of blue sky.
Crazy Taxi clouds...
Couldn't find on Google yet.
Anyway (lol)...
I confirmed the main hurdle for higher frame rates atm is purely the couple of extra clock cycles being used to draw every pixel.
(If that rendering loop can write a pixel on every clock cycle, it could do the whole 640x480 image in just over 3ms. Two clock cycles, then it takes 6ms. Three clock cycles, and it takes 9ms, and so-on. So the number of clocks-per-pixel needs to be as low as possible there. That will happen eventually, when the proper texture cache is in place.)
I can skip some of those states, and although it messes up the renders a fair bit, it gets the FPS back up to about 37-60 FPS.
So definitely doable.
I really want to get enough Verilog in place, to make it render faster on the DE10.
It won't easily run above about 30 MHz without proper timing fixes (which I don't really know how to do).
But if I can get it running with the hacked reicast at even 15-20 FPS on MiSTer, it would be a good proof-of-concept.
Is the DE-25 Nano likely too weak for hybrid Dreamcast to work at full speed?
I would keep the possibility of a SD 320x240 render in mind. Maybe that would be the trick to keep the intended fps. And some might even prefer a "scanline" look.
How much more intense is 480i V.S. 240p?
And would it even be possible to render only half the scan lines of 480p without breaking everything?
Not sure how much faster the new ARM cores are, vs the DE10 Nano, tbh.
But I would reckon it could run the emulator part fast enough. Dunno.
reicast on the DE10 ARM I think was something like 60-70% full-speed.
(ie. even when letting the FPGA side act as the GPU, and completely "unlocking" it, so the FPGA wasn't holding up the ARM side.)
skmp did add Fastmem support to the emu on ARM, but I couldn't get it to compile.
I think it might be possible to have the emu running full-speed on the DE10, but don't quote me on that.
I don't know if rendering only half the lines, or 320x240 would help too much.
There is far more time "wasted", simply trying to process all of the triangles atm.
Even when doing the visibility and depth-compare checks on 32 "pixels" at once, that's still a lot of clock cycles for every single incoming triangle.
And that's 32 clock cycles per-triangle, per-tile, too.
So when a triangle overlaps more than one tile (which is quite often), it uses another 32 clock cycles for each of those tiles.
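A back-of-envelope way to estimate that ISP cost. This assumes 32x32 tiles and counts every tile touched by a triangle's bounding box, which over-counts compared with exact binning. The helper names are hypothetical.

```python
TILE = 32             # assumed tile size in pixels
CYCLES_PER_TILE = 32  # from the text: 32 clocks per triangle, per tile

def tiles_touched(xmin, ymin, xmax, ymax):
    """Number of tiles overlapped by an inclusive pixel bounding box."""
    tiles_x = xmax // TILE - xmin // TILE + 1
    tiles_y = ymax // TILE - ymin // TILE + 1
    return tiles_x * tiles_y

def isp_cycles(bounding_boxes):
    """Total ISP clocks for a list of (xmin, ymin, xmax, ymax) boxes."""
    return sum(tiles_touched(*box) * CYCLES_PER_TILE for box in bounding_boxes)

small = (0, 0, 31, 31)        # fits in one tile
straddling = (16, 16, 47, 47) # same size, but crosses a tile corner
fullscreen = (0, 0, 639, 479) # covers all 20x15 tiles

print(isp_cycles([small]), isp_cycles([straddling]), isp_cycles([fullscreen]))
```

The same 32x32 pixel triangle costs four times as many clocks when it straddles a tile corner, which is why the overlap case adds up so quickly.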
But the ISP does seem about as fast as it needs to be atm.
If I disable the rendering (TSP) part completely, the ISP was hitting about 50-70 FPS for a typical "frame"
Writing one pixel per clock cycle (at 100 MHz) means it takes 3.072ms to render the full 640x480 screen.
To hit 60 FPS, the total of course needs to be 16.67ms, or below.
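The frame-time numbers above as a quick calculation. This only models the pixel-writing loop, nothing else in the frame, and the function name is just for this example.

```python
def fill_time_ms(width=640, height=480, clocks_per_pixel=1, clock_hz=100e6):
    """Time to write every pixel of one frame, in milliseconds."""
    return width * height * clocks_per_pixel / clock_hz * 1e3

FRAME_BUDGET_MS = 1000 / 60  # about 16.67ms for 60 FPS

for clocks in (1, 2, 3):
    print(clocks, "clock(s) per pixel ->", round(fill_time_ms(clocks_per_pixel=clocks), 3), "ms")

# Upper bound on clocks-per-pixel that still fits the 60 FPS budget,
# if pixel writes were the only cost.
max_clocks = int(FRAME_BUDGET_MS // fill_time_ms())
print("max clocks per pixel for 60 FPS:", max_clocks)
```

In practice the budget is tighter, since ISP work and texture fetches share the same frame time.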
I think my brain is starting to wake up a bit today.
A friend started working on some Laserdisc game stuff for MiSTer a couple of years back.
Wait, why can VGA support 480p, while SCART can’t?
And got it to start decoding some frames, on a Verilator sim.
Which really makes me want to do a DVD movie core for MiSTer. lol
SCART was only ever intended for 15 kHz TVs, etc.
You can technically pass 480p via SCART, ofc, it's just wires and a plug, but it's super non-standard.
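The "15 kHz" part is the horizontal line rate. A quick check with nominal NTSC-style timings (525 total lines per frame, including blanking), which are assumed here for illustration:

```python
def line_rate_hz(total_lines_per_frame, frames_per_second):
    """Horizontal scan rate: lines drawn per second."""
    return total_lines_per_frame * frames_per_second

# 480i: 525 lines per interlaced frame, about 29.97 frames per second.
rate_480i = line_rate_hz(525, 30000 / 1001)

# 480p: 525 lines per progressive frame, about 59.94 frames per second.
rate_480p = line_rate_hz(525, 60000 / 1001)

print(round(rate_480i), "Hz for 480i,", round(rate_480p), "Hz for 480p")
```

480p needs roughly double the line rate, around 31.5 kHz, which a 15 kHz TV's deflection circuitry is not built to sync to, whatever connector carries the signal.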
Hence why the PS2 does Sync on Green?
I think there were some devices in the 2000s which passed 480i and maybe even 720p via SCART
Component, I mean, but via SCART pins.
I had a Ferguson TV in the 90s, which supported S-Video via SCART.
With Luma via the Composite pin, and Chroma via "Red". Just a separate Input option on the menu.
This is actually quite a complex subject. lol
YPbPr via SCART.
What the hell, sure.
Maybe, but I'm not sure.
Not quite sure what might have used that, but it rings a bell that something did.
I mean, it's really just wires in a cable.
And, get this...
Technically, RGB is a "Component" signal. lol
Just means the signal is broken up into more channels.
"Component" video usually refers to the YPbPr cables.
(YCbCr is the digital equivalent)
It's a fecking minefield of different specs, displays, cables, scalers, and spacebats.
Luma+Sync, Blue V.S. Luma, Red V.S. Luma.
YPbPr=G
I love video!
So, they could have made SCART carry 480p, but didn’t due to not wanting to go outside of SCART spec?
Screenshot of friend's MPEG2 attempts, from a couple of years back.
Which, in itself, was/is quite impressive.
I'll never forget buying my first DVD player from a second-hand shop, around 1999.
It was a Panasonic DVD A150.
First DVD was probably Goldeneye, then The Matrix.
Hearing Dolby Digital 5.1 in the home for the first time ever, with a half-decent system.
The comparison to Pro Logic was mindblowing.
I'm still fairly hooked on home cinema, with a projector in BOTH rooms. lol
And have been slowly building my own AV receiver, with HDMI (4K passthrough), 11 amp channels, and Atmos / DTS-X decoding.
Then Samsung was doing a deal on a DVD player with three movies. Can't remember the price, but possibly £99.
Ghostbusters, and a couple of other movies.
But I didn't like the player too much.
Then a Toshiba SD-2109.
Right, I’m new enough to fail to have the experience to properly grasp how much of a step up this must have been from VHS (or VCD if you’re from some parts of Asia)
https://youtu.be/cvwuAKi1ZB4
I’ve never watched an actual VHS tape.
Then a Pioneer DV-545.
But the Component output on it was horrific. lol
Like, really very strange, grainy output, with bad Chroma.
(Chroma decoding was weird. Hard to explain)
Early deinterlacers were pretty horrible, even with movies with 2:2 pulldown. lol
I never really had Laserdisc stuff until the late 2000s.
Even then, it was a weird Pioneer player from the early 90s, which didn't have a front display.
Almost the same model as used for the original Domesday Project in the 80s.
And no decoding for the newer "digital" audio formats.
But I repaired a Pioneer LD player for a friend about six years ago, and ended up watching the whole box of about 20 films. lol
There's just something about watching movies on some of those older formats.
But for sure, DVD was "better" overall.
Generally more reliable, better image and sound, menus, etc.
LD players were often also quite noisy, and had the pause when switching sides.
Could you tell a proper difference in video quality with the setup you had?
Ironically, early DVD players sometimes also had the short pause when switching layers. The better production houses would hide the pause in a scene transition.
I think the signal on a laserdisc was a Composite one?
Well, I guess I only really had VHS and Betamax before getting DVD.
But had a few nice Panasonic S-VHS decks.
And DVD was for sure "better" quality video and audio, all-round.
Also, I tried watching some DVD rips recently, when I bought my Zidoo Z1000 Pro player.
And, man, DVD looks SO bad on modern displays / projectors, vs a CRT. lol
Most people don't seem to realize that older consoles and DVD were designed for CRTs.
With the quirks of CRTs, the image was tweaked to look as good as possible on them.
480p to 4K is quite a scale.
Yeah, even I wasn't completely convinced that Laserdisc was really an "analog" format...
It still uses pits and lands on the disk, but with a big difference...
With DVD / Blu-Ray etc., the pits and lands basically have equal spacing (sort of).
But on a Laserdisc, the pits and lands were variable spacing, as they literally represented the Wavelengths (basically encoding the Zero-crossings) of the original analog signal.
Then they added the Digital audio stuff on a subcarrier freq.
So then "digital" really just means "Binary data, encoded / modulated onto an Analog waveform".
Karl's old code doesn't compile atm.
Well, the Verilator part, I mean.
But I'll definitely have a look at this later, and see if I can get it decoding some frames.
The other obvious use for this, is for the Dreamcast new GD/CD/DVD drive project.
That's if I ever get around to finishing the PCB for it.
I still need to build the new DC mobo, and new GD Emu boards.
I've had most of the parts and PCBs sat here for over 9 months now, due to severe burn-out.
Long story short-ish, I was working on a PCB to replace the GD drive completely.
Using the chipset from common Philips DVD player mechs.
I bought a Marantz DVD player with the same type of mech.
(not quite the same as here, but similar)...
Started reverse-engineering the serial commands that it uses for low-level access to the CD/DVD sectors.
In theory, if I can just get it to plonk the laser at the right place, I could eventually read a Dreamcast GD disk.
The "High Density" area on a GD disk really is just the same as a standard ISO9660 CD-ROM track.
Just burned at a higher speed on the factory master.
So crams in a bit more data than a standard CD.
Then, a future intention was to add DVD playback to the Dreamcast, because it would be hilarious.
The board above is just the bare CD/DVD drive part, though.
It would need to work in conjunction with the FPGA on the GD Emu board.
So probably too over-complex, tbh, and might never happen now.
The first GD Emu prototype worked very well, though.
The newer one adds an STM32 and USB 2.0 High Speed.
Which would be faster, and be able to support an internal SSD, etc.
First proto...
Used a big chunk of code from Marcus Comstedt, but he was using a small ICE40 FPGA.
microSD, and HDMI output.
But the HDMI will require a custom flex cable to be designed / made.
To hook up to the Video and Audio DACs on the DC mobo.
Which would be ideal, because I already have the basic OSD stuff for the FPGA.
So it could display a menu on-screen at any time, to swap disks, and reset the console.
Meaning you'd never need to boot into an actual menu program, just to change games.
Only the small-ish MAX10 FPGA onboard, though, which makes the higher-level menu stuff a PITA to code, even for a softcore CPU.
The new version was again, maybe a bit too ambitious. lol
PCBs have been sat here for many months.
But I guess the only real addition is the STM32, and USB 2.0 PHY.
Also changed to the ADV7511 HDMI chip.
'cos it has a hidden secret vs the ADV7513 (used on the DE10)...
The ADV7511 supports 24-bit RGB video via only 12 pins.
(plus clock and syncs)
Because I needed every single FPGA IO pin possible.
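The pin saving comes from sending half of each pixel on each clock edge (DDR). The sketch below only shows the idea. Which 12 bits go out on which edge is set by the chip's input mode, so the split used here (low 12 bits first) is purely an assumption, and the helper names are hypothetical.

```python
def split_ddr(rgb24):
    """Split a 24-bit pixel into two 12-bit transfers.
    Assumed order: low 12 bits on the first edge, high 12 bits on the second."""
    first_edge = rgb24 & 0xFFF
    second_edge = (rgb24 >> 12) & 0xFFF
    return first_edge, second_edge

def join_ddr(first_edge, second_edge):
    """Rebuild the 24-bit pixel on the receiving side (same assumed order)."""
    return (second_edge << 12) | first_edge

# Pin counts: data bus plus clock, HSYNC, VSYNC and data enable.
CONTROL_PINS = 4
pins_sdr = 24 + CONTROL_PINS
pins_ddr = 12 + CONTROL_PINS

pixel = 0xABCDEF
print(split_ddr(pixel), pins_sdr, pins_ddr)
```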
Oh I'm quoting mate. Resident evil code Veronica runs at 25fps. That's 5 less frames to create xD
(Always great to see so much documentation and information on your findings.
I'm also timeextension so I'm about to create the website post saying Dreamcast core has been confirmed xD
Bear in mind, that's only the "emu without the GPU" as well.
Ah it's all good. I'll ignore that part in my website post
With the GPU on the FPGA side, if I, or anyone else can make it fast enough. lol
lol
I think we probably had enough "clickbait" a while back, with Pixel Ninja and VGEs vids. 😛
I tried to make it VERY clear that I was not even close to finishing PVR2, let alone the "entire rest of the Dreamcast".
Il write a tiny tiny message at the bottom saying "In order to get any video from this core. You need to sync perfectly a playthrough of desired game on YouTube and play along"
Although, I just saw some of my old posts (on Simulant discord) from Nov 2024, and I was only getting calculated frame rates of around 13 FPS back then, on the Daytona render.
That's up to 40 FPS right now, which is pretty good.
With a few glitches, ofc.
I think it's very interesting to see the whole development side of it being so extensively documented. Via your screenshots and comments.
If I can even get 15 FPS in some games (using the hacked reicast on the ARM side), that would be almost playable.
But unless somebody wants to collab, or somebody else starts working on a core, it's gonna be very rough for a while.
The only 'collab' I can do is give you words of wisdom. Here they are:
You're doing good mate. Keep it up 👍
Thanks. 😉
I've just been having some quite serious depression and stuff recently, so not working even close to full-steam.
Probably doesn't help that I'm STILL typing whilst SLOUCHING on the bed.
I moved the rooms around, so I had a proper armchair in one room, and this room just with the bed.
But I still have the 4K Samsung in here, so ended up almost as bad as before. sigh
I now have serious neck problems, blurred vision, bad constant tinnitus, slight leg weakness, yet again.
You'd think I'd be able to break the habit, but it's tough.
Well I hope you feel/get better soon boss!
Thanks.
I would put this TV back in the armchair room, but I forgot I bought a second projector screen for in there. lol
Never realized how serious it is, not having a proper chair / posture.
Do you have any idea on why there are scratches in the video?
Disc is scratched I reckon
(JK)
lol
Yeah, it's just some timing delay for when the texture address module does the calcs, reads the texture from (simulated) VRAM, then does the colour blending.
Before, I added some extra wait states to the Verilog, so it hid those glitches.
EDIT: But that was one of the main causes of the slower frame times. 'cos it's "clock cycles X pixels drawn".
EDIT2: At 100 MHz, it takes 3.072ms simply to even write the pixels to the 640x480 framebuffer, regardless of all the other stuff that needs to happen.
EDIT3: So if you can do all of the main rendering loop in a single "state", it takes ~3ms per frame. If you add just one extra state, it takes 6ms, and so-on.
It's all fixable, whilst keeping the higher frame rates / lower frame times.
But that can all stay as future work, because doing it "properly" involves pipelining some stuff.
And I'd need some help with that.
The renderer really is the dumbest bit of Verilog atm.
I typed “pipe” and I got a flute.
Just scans through each pixel within the current tile, whilst the interp blocks figure out the RGB / Gouraud shading, and the texture address module does its thing.
lol
Audio Video Recorder?
That's pretty cool!
Oh.
I was pointing the phone / camera at the projector screen a lot, but it was just to demonstrate that there are three speakers behind it. lol
It has an ARM SoC module on it, with an Allwinner chip.
Which was intended for music/movie streaming/apps.
But I might even ditch it on the next version. It's a PITA to sort out the software for it.
Most people tend to use a Firestick, AppleTV, PC, console, or external Media Box anyway these days.
The main aim was to get the Atmos / DTS-X (and legacy) audio decoding working.
That mainly works now, but yeah, tons of other stuff to figure out, like IR or Bluetooth remote.
It actually has 12 power amp channels built-in.
Class D, using the TPA3255 chip.
Which in theory gives around 140 Watts (RMS) per-channel, into 4 Ohms.
But I don't have a way to properly test that atm.
I also had to find an HDMI switch chip which...
A: Does 4K60 passthrough, with Audio extraction.
B: Is commonly available.
C: Has a reasonable Datasheet / pinout / schematic available.
D: Has some source code / Flash images available.
Which is much harder than you might expect. lol
Eventually found a demo board on Aliexpress, for the SIL9777 chip.
They only sent me the bare PCB (sigh), but then also sent the source code, which is what I really needed.
Cirrus DSP chip does all of the audio decoding. Same chip used in my Onkyo and Anthem AVRs.
So I had to reverse-engineer the basic SPI config commands for that, too.
Anyway, Dreamcast...
Yeah I use the Nvidia shield 2019 pro for all my streaming. (And Dreamcast emulation 😉 to keep it on topic haha)
I heard quite a few peeps in the home cinema groups / servers use the shield, yeah.
Never used one myself, but heard good things about them.
I'm very glad I bought the Zidoo, as it was only around £160 (second-hand) on Feebay.
Doesn't do 8K, but meh. I can't imagine too many people even to this day have an 8K TV. lol
Doesn't seem THAT long ago (maybe 6-7 years) that I even bought my first 4K TV and monitor.
I wanted something that supported the most things out of the box. It doesn't support AV1. People do complain about how slow the UI is. You can debloat (which is what I did) as well as remove the stock launcher and ads, and replace it with a new launcher.
People do suggest the apple TV 4k device. But I believe that doesn't do proper lossless audio passthrough.
Ugoos am8+ I think is the name? Can support all of the above. But you have to flash coreELEC to it and only use Kodi. Which is fine.
I think if I was going to do it again, I'll probably get the am8. Just because my new TV has android TV built in. So I don't need the shield's android TV stuff.
(Kodi also has a great add-on called PlexMod4Kodi, which basically gives you the front end of Plex and access to your Plex server, with the back end of Kodi (the amazing Kodi player). I had to do this for a while, as I had TrueHD 7.1 issues with the Plex app. It worked fine with Kodi, so I assumed it was Plex's fault. To cut a long story short, it was my fault. I messed with some developer settings on the shield. Once I sorted that out, Plex was fine.)
But yeah Dreamcast stuff. My favourite Dreamcast game has to be Resident Evil Code Veronica. Although I am partial to the Biohazard 2 value pack. (Japanese version of RE2) As that comes with a playable demo for Biohazard Code Veronica 🙂
The Zidoo still has a few firmware quirks, like audio not being output after unpausing a vid.
And some of the newer Beta firmware caused slower UI, etc.
But overall, I really just wanted something mostly-reliable, with fairly fast UI, and no extra apps.
It technically can run Android apps, but even that would often crash-out. lol
And I don't think Nutflicks even worked with DD+ / Atmos, which sucks.
We have four 4K Firesticks in the house, and I've been very impressed with them overall.
After struggling with trying to run Kodi etc. on Rasp Pis for years, it was never a good experience.
I do have an iPhone again now, but just Apple in general, I'm not a big fan of. lol
Some stuff on Apple TV looks worth a watch, but then I barely even watch Nutflicks nor Amazon these days.
And cancelled Disney+, shortly after watching the finale of S2 of Mandalorian.
But, I have to say for anyone even mildly interested in Star Wars, by God, that ending.
I nearly cried. lol
No spoilers, please.
That was a really enjoyable conversation about DVD to read through earlier ❤️
Slamy is working on mpeg1 decoding for the cd-I core, although I think he is using a soft risc-v processor with some special optimizations
Might have some overlap with a dvd core/ld core?
I wouldn't dive into mpeg2 before someone has mpeg1 working, it also may be beyond what can be done on the DE-10 Nano
Just bringing it up, and we are in a channel dedicated to the “may be beyond the de-10 nano” already 😉
Awww, I miss it saying “press the fart button”
I have a Ugoos am8+ with coreELEC to take advantage of the full enhancement layer of Dolby Vision for my 4k rips. Love that thing
I think this core might even play MPEG1. tbh, I hadn't tried it yet.
No idea how much logic the MPEG2 thing takes up.
The core itself was from a University project, I think from around 2007.
So far, it just seems to "work", at least in the sim. You just feed in each byte from the file, as long as the "busy" signal is Low, and it spits out the frames.
The big problem is - their original Xilinx dev board used ZBT RAM, which is basically like Synchronous SRAM.
And it needs a minimum of about 4MBytes to decode SD (NTSC/PAL, etc.)
And around 16MB to decode up to 1080p.
Unless we can figure out when the MPEG2 core needs to do Burst reads/writes, it could prove very tricky to get working with SDRAM or DDR.
Not too bad, thanks. I just still have this terrible habit of slouching whilst watching YouTube or coding etc.
Which is causing all kinds of quite serious health problems, including blurred vision, anxiety, leg weakness, etc.
It's my own doing, but REALLY hard to break the habit.
I think I need to just ditch the TV from the bedroom, and force myself to use the room with the armchair.
Dayum, I never thought of that. I even watched the Retro World panel video from last year...
Where they (including Kitrinx) were talking about the CD-i core, and MPEG1 support.
sounds like a plan 😛
I will try an MPEG1 stream on this core/sim thingy tonight.
i need to do something about my weight... will go to surgery next month if it all goes according to plan
If anyone can get the MPEG core working with SDRAM or DDR, it's probably slamy.
I have trouble with leaving the house, and doing all the "normal" things, like taking the driving test.
But I've been to the gym six times, and I'm desperate to get back to it.
My brother was taking us originally, but said he can't afford the petrol (it's two miles away), nor the extra food (fair enough. lol)
Both my brother and nephew said it is quite far to walk. By that time, you lose all motivation to use the gym itself.
And then people would ask the obvious question: "Why not just go on the walks?" lol
Hard to explain, but in the very few times I went to the gym, it just "worked" for me.
I immediately started feeling better, in almost every way.
And didn't have to do too much in each session to feel it.
I only did a brisk walk on the treadmill for about 20 minutes, then 1KM on the rowing machine, then some bicep curls etc.
And after the third time, I started feeling so much better.
i know a lot of people that love the gym tbh
sadly i just feel horrible after going 😛
And that was only me starting off. I'm incredibly unfit, so I couldn't go too mad at first, or I'm genuinely at risk of a heart attack.
Maybe you did a bit too much in each session?
I always had the water bottle near me, constantly drinking.
Then had a protein shake just after.
I did have the muscle aches for a few days after each gym session, so it was definitely doing something.
But I should have kept going, at least a few times a week, for years.
Kind of hard to get to places when you don't have a license. lol
I could probably just about afford a taxi each time, but it's not cheap.
(even for only a 4-mile round-trip. And we don't have Uber here, because I live in Mordor.)
We have Uber Eats, though, which is half my problem. lol
Nice.
I was never overly into cars before, but I've been a bit obsessed with car and engine channels recently.
Mat Armstrong is an awesome channel.
Some of his road trip vids are getting close to the quality of (old) UK Top Gear.
And he mainly restores supercars, etc, which makes it more interesting.
The retail price of a lot of the "Supercars" seems very overpriced, tbh. lol
I took driving lessons from the age of 17, probably about five different instructors.
But then I would have what I would perceive as a "bad" lesson, just because I didn't do reverse parking on the first attempt.
Procrastination, etc, it's not good.
So I never once took the test.
Probably about 70 or more lessons in total, over many many years.
If I don't do it soon, it will never happen.
Surprisingly hard to find any old MPEG1 / VideoCD test files online now, btw.
I mean, they are out there, but it's not like the first stuff that pops up on Google any more. 🙁
Half the stuff labelled as "MPEG" is an MP4, which is often encoded with H.264 etc.
Wish I still had some of my old VideoCDs.
Supposedly MPEG1, but VLC shows it as MPEG 1/2, which is not helping. lol
Extracted the Elementary Stream, using...
ffmpeg -i test-mpeg.mpg -vcodec copy -an stream.m1v
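One way to double-check what the extracted stream really is, as a hedged sketch: an MPEG2 video elementary stream has a sequence extension (start code `00 00 01 B5`, with extension ID 1 in the top four bits of the next byte) directly after the sequence header (`00 00 01 B3`), while MPEG1 does not. The function name is made up, and it only looks at the first sequence header.

```python
START_PREFIX = b"\x00\x00\x01"
SEQ_HEADER = b"\x00\x00\x01\xb3"


def guess_mpeg_video_version(es):
    """Return 1 or 2 for a video elementary stream, or None if no sequence header is found."""
    pos = es.find(SEQ_HEADER)
    if pos < 0:
        return None
    # Find the first start code after the sequence header.
    nxt = es.find(START_PREFIX, pos + len(SEQ_HEADER))
    if nxt >= 0 and nxt + 4 < len(es):
        is_extension = es[nxt + 3] == 0xB5
        is_sequence_extension = (es[nxt + 4] >> 4) == 0x1
        if is_extension and is_sequence_extension:
            return 2
    return 1
```

Usage would be something like `guess_mpeg_video_version(open("stream.m1v", "rb").read())` on the file produced by the ffmpeg command above.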
Core decodes DVD Video just fine, btw.
Well, that was the Dolby City trailer, from a VOB file, with the ES extracted.
Still takes around 7 seconds of sim time, just to render each MPEG2 video frame.
Trying the MPEG1 sample now...
Nope, doesn't seem to work, sadly.
So the core likely is only for MPEG2.
I just thought it was possible it was backwards-compat.
BBL
Or, maybe not back. lol
Discord was down for a while.
I don't know what they are using for servers, but I reckon it might be the ZX Spectrum.
Not to be confused with "MPEG"...
The MPEG2 files for Dragon's Lair are on archive.
I can understand both sides of the discussion for a DVD "core".
There are lots of ways to play older video formats these days. The HPS on the DE10 might even be OK for MPEG1. Not sure.
(not including things like MiSTerCast, ofc, that's cheating. lol)
But, I always try to look at the bigger picture (excuse the pun) with MiSTer / FPGA cores.
You never quite know which projects might help with future stuff, or even existing ones.
So if somebody wants to work up something which is a bit "out there", and might not seem hugely practical on the face of it... let them.
That's not to say that I won't have my own opinions on some of the ideas. lol
But I think it should be encouraged, if somebody seems to have the enthusiasm to work on something new.
I mean, look at the amount of crap I took, for even starting to work on some Dreamcast PVR stuff.
Even when I tried to make it VERY clear that I don't really know what I'm doing, couldn't/can't promise an entire core, didn't make any promises on any part of it, etc.
After the Pixel Ninja and VGE vids went out, I saw some ridiculous comments from a few people, that I was the one "clickbaiting" them, or promising far more than I can deliver, when I never promised it in the first place. sigh
I just work on what interests me, and try to learn something new.
Anyway, I'm not sure how much of the MPEG2 core applies to MPEG1 decoding. Maybe completely different, especially in the way the incoming stream gets parsed.
It’s not your fault that YouTubers that you’ve never talked to overhyped the project, why the hell would people think that?
Obviously stuff can be transcoded to different formats, but that doesn't really help cores like 3DO, PS1, and CD-i, which had their own MPEG1 decoder carts.
Not sure, but one person actually apologized after I explained things, and even offered to delete their reply, which they did.
So the code for CD-i MPEG-1 won’t work for Saturn?
Good for them.
I was already very hesitant about posting at all about the PVR stuff, until it was at least doing half-decent renders of single frames. Then I posted on TwitterX about it, because I wanted to get hold of Simon Finney (one of the main PVR designers).
And his account was only accepting DMs from friends, IIRC. Including his bluesky account.
It probably would, tbh. Like with the MPEG2 decoder, you pretty much just shove the data into a FIFO, and the decoder does the rest.
The issue with Saturn is space, right?
So I would think if somebody finishes an MPEG1 decoder, it would be usable for the other cores, with some tweaks, and depending on how much extra stuff the original FMV carts had.
Yeah, Saturn core is very full now. I even had to disable the CD stuff whilst working on testing STV. lol
Hence, ST-V is now a separate core, which in hindsight was the right choice.
EDIT: I mean, it's practically the same core, but with some stuff like CD disabled, and compiled from the same Quartus project, just from a different QPF file.
So that is obviously another problem for adding MPEG1/2 to certain cores.
I imagine PS1 is quite full, too, but I haven't compiled it in a while.
IIRC from one of the early PS1 cores, though, it was quite efficient on the amount of logic used. At least from what you might expect.
The PS1 was a far more "straightforward" design vs the Saturn.
Like, not having about seven different custom chips. lol
And PS1 only having ONE main CPU, if you don't count the small MCU used by the CD block, etc.
(the "leaked" N64 RCP Verilog, btw, when compiled for the Cyclone V C5G board, was using 55,000 Logic Elements. 😮 )
That didn't even include the R4300 CPU.
Uses a combination of FPGA and software. No idea how much of each. lol
External logic receives the H.264 NAL stream from a FIFO (which is external memory), decodes it as YUV 4:2:0, then stores it into the external memory. OsenLogic OSD10 includes a stream parser (extensions and CAVLC), IDCT/IQuant (residual), internal predictor, and deblocking filters. The parser can work without the CPU, which means the PL end can do all the work.
PL = Programmable Logic, IIRC.
Which is what the newer Xilinx SoC chips call "the FPGA stuff".
Yep, on Xilinx Zynq, PL = Programmable Logic (FPGA), and PS = Processing System (ARM stuff).
What deepseek said...
Since your core decoding pipeline (VLD, IQ, IDCT, MC) is already working for MPEG2, adding MPEG1 support is a very manageable task. The challenge is less about the core algorithms and more about controlling the data path and parsing the different stream formats correctly. Good luck
"manageable" lol
Ah yes I remember that discussion. Have you checked out the dvd player core thread on this discord. I think I remember people linking to various projects.
Actually I think it's all the same information you're linking to here
Ahh, see - I never saw that channel before. lol
'cos it's kind of hidden, until I post on there.
Even if I switch back to this channel, the DVD Player Core one disappears.
I'll probably try to keep future stuff about the MPEG2 thing on that channel.
For "Side Quest 2" last night, I was looking at trying to decode the metadata from Dolby Atmos streams.
I think it will take some effort to get the MPEG2 thing running on MiSTer.
Because the original dev board the Uni students had, used ZBT RAM, which is like super fast SRAM.
It also had a 64-bit bus.
To get the required (peak) transfer rates needed for the decoder, we'd have to figure out when to do Burst transfers.
That goes for either SDRAM or DDR3.
With the old-people SDRAM, it takes basically 8 clock cycles to read/write any random 16-bit Word.
(with a single module).
So even if the SDRAM itself is running at say 128 MHz, the "core" can only access it at a max of 16 MHz.
The only way to get the full performance is to do Burst transfers, which read/write an entire row (probably 1,024 or 2,048 bytes) at the full 128 MHz.
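The arithmetic behind that, as a small sketch. The "8 clocks per random access" figure is from the messages above; treating it as 7 clocks of overhead plus 1 data clock, and paying that overhead once per burst, is a simplified model assumed here for illustration, not measured SDRAM timing.

```python
def random_access_rate_mhz(sdram_clock_mhz, cycles_per_access=8):
    """Effective word rate when every access pays the full per-access cost."""
    return sdram_clock_mhz / cycles_per_access


def burst_efficiency(burst_words, overhead_cycles=7):
    """Fraction of clocks that move data when the overhead is paid once per burst."""
    return burst_words / (burst_words + overhead_cycles)


def bandwidth_mb_s(sdram_clock_mhz, efficiency, bytes_per_word=2):
    """Approximate bandwidth in MB/s for a 16-bit wide SDRAM."""
    return sdram_clock_mhz * bytes_per_word * efficiency


print(random_access_rate_mhz(128))                 # 16.0 (MHz, random 16-bit words)
print(bandwidth_mb_s(128, burst_efficiency(1)))    # 32.0 (MB/s, one word per access)
print(bandwidth_mb_s(128, burst_efficiency(512)))  # roughly 252 MB/s for a 1,024-byte row
```

So under this model, a full-row burst gets about eight times the bandwidth of single-word accesses, which is why the decoder's access pattern matters so much.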
It's a secret hidden channel that only appears when you accept the worthiness of a dvd player core into your heart ❤️
Bumperoo! So I just heard about the ModRetro M64. Would that thing be closer to the power needed to run a Dreamcast core?
Playing Sonic Adventure with a Nintendo 64 controller, makes sense.
/s
You ran Reicast on the Linux side with software rendering?
no, scroll back through this channel. he's been making a dc gpu on the de10, they've at least got the emu talking to it. not sure anything is full speed at this point tho
Allo.
Not quite - skmp tweaked reicast, so it can compile and run on the ARM side.
reicast then did all the normal stuff, to write the necessary textures and vertex info into VRAM.
That then kicks off the "GPU" on the FPGA side, to render each frame.
But even with the rendering completely disabled, reicast itself wasn't quite running at full speed.
The ARM cores aren't especially great on the Cyc V SoC. Probably not even quite Rasp Pi 2 level?
skmp did do a version with "Fastmem" added, but I couldn't get that to cross-compile at the time, and haven't tried since.
Also, it probably goes without saying, my crusty PVR2 stuff on the FPGA side was very very slow. lol
This is how slow it was running a while ago.
Maybe 5-6 FPS on the BIOS menu, but way slower in games. But then the PVR2 stuff could only manage to run at about 20 MHz before having timing issues...
#dev-talk message
None of that was taking advantage of DDR3 Burst transfers, though. That should give a significant speed boost.
(there was also an issue with it displaying both the previous and new frame at the same time, as I couldn't quite figure out how to tweak ASCAL to display the framebuffer from the weird interleaved memory mapping PVR2 uses.)
I have no idea how much faster the Agilex 3 / Agilex 5 might run complex cores at.
But a good friend (and excellent FPGA dev) reckons it could be up to 2.0 to 2.5x faster than the Cyc V typically manages.
(assuming you can increase the mem bandwidth of the core enough, too, and depending on the specific core, ofc.)
That could mean running some existing (ao486, minimig) cores at closer to 200 MHz on Agilex? Not sure.
I tried compiling the PS1 core in Quartus 25.1 the other day. It didn't go well. lol
The newer Quartus seems WAY more strict about certain things. It gave me about 280+ Errors, which would normally be considered Warnings on Quartus 17.
And I have no clue if those can be downgraded to Warnings again, so it could take some time to get the MiSTer framework and cores "ported"?
i noticed you can place orders on DE25nano now, no idea on shipping dates though, i wonder would that be enough to do it?
it has 2x ARM Cortex A76 at 1.5GHz, and 2x ARM Cortex A55 at 1.2GHz
RPI3 has A53 cores so should be faster than an RPI3 in theory
RPI4 is quadcore A72
Have you run it at 1.2GHz?
Can all DE-10 Nanos run at 1.2GHz with proper cooling?
All the ones I've tried can
All mine
I think I tried overclocking the ARM cores to about 1 GHz, but it was unstable above that.
Not sure, though. I might have been trying to push it further. It was far more unstable when trying to boost the mem timings.
That was just using the usual overclock script, or a bash command.
Dreamcast baby!!!
A way to compile the MK64 Dreamcast port, but you have to supply your own ROM.
It was one of the few ways they could make it easier for people to compile, while avoiding most of the legal stuff. lol
GIF was made from frames rendered on the Verilog thingy.
It seems to work quite well, probably because it has far less geometry than a typical Dreamcast game.
It was rendering at a calculated 90 FPS, for most of the frames.
So I've been trying to add the proper cache stuff today, to see if I can get the speed up on the FPGA test "core".
The calculated FPS on the sim assumes 100 MHz on the FPGA, though. It will take (me) a long time to get it anywhere near that fast.
The old test core managed about 30 MHz, before it started falling apart.
But if a port like MK64 can do a theoretical 90 FPS on the sim (assumes 100 MHz), then at 30 MHz it might manage close to 20 FPS.
(which I would consider "playable".)
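The scaling behind that estimate, as a tiny sketch. It assumes FPS scales linearly with the FPGA clock, which ignores memory stalls and anything else that does not speed up with the core clock, so treat the result as an upper bound.

```python
def scaled_fps(sim_fps, sim_clock_mhz, real_clock_mhz):
    """Linearly rescale a simulated frame rate to a different core clock."""
    return sim_fps * real_clock_mhz / sim_clock_mhz


print(scaled_fps(90, 100, 30))  # 27.0
print(scaled_fps(90, 100, 20))  # 18.0
```

Pure linear scaling gives 27 FPS at 30 MHz, so "close to 20 FPS" leaves some margin for the memory side not keeping up.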
However (lol)...
The build of reicast for the ARM on MiSTer wasn't running too great, even with the PVR thing completely unlocked.