#Sega Dreamcast

rain obsidian
#
    // 610-0374-96   1998     317-5041-COM   ST-V
    ROM_PARAMETER( ":315_5881:key", "05272d01" )
#

Booo!, again. ^

#

Evil

#

I really just want to see a 3D game running. lol

#

I did it wrong - I copied the original ROMs, not the Byteswapped. Oops.

surreal sluice
rain obsidian
#

Nope. Black screen. What a surprise.

#

Protection IC, probably.

surreal sluice
#

Dang

rain obsidian
#

With the Stv1061 BIOS, it does get to the spinning logo at least.

#

But meh. No dice.

#

Not even an Andrew Dice Clay.

#

315-5881 encryption chip, looks like it was used on tons of NAOMI carts, too.

#

So likely fairly complex.

#

I won't be attempting to write that any time soon. lol

#

Didn't realize there are SO many puzzle games on ST-V.

#

It's puzzle games, all the way down.

#

Of course it's Mahjong.

surreal sluice
#

Shienryu is great try that

#

Bet the good ones are copy protected 😉

faint oxide
mossy falcon
#

there probably are decrypted versions on the internet

rain obsidian
#

Mr Clay was quite popular on late-night TV, even here in Mordor, UK in the late 80s. lol

#

And Doug Stanhope, Bill Hicks, George Carlin, Richard Pryor, etc.

#

Nice.

#

And I can Coin-up!

#

Now I see where Karl Jobst gets his ideas from.

#

ffs, I pressed the Test button by mistake. lol

#

There seems to be an invisible shadow person moving around.

#

But not the guy on the left.

#

That was again using the epr-23603 BIOS.

#

I think MAME just chooses the latest version of the BIOS that matches the game Region?

#

Most of these games will be Japan region, obviously.

#

I got rsgun wrong too, btw.

#

I didn't see it had an offset for where the SH2 ROM gets loaded.

#

For Groove On Fight, I had to duplicate the SH2 ROM four times.

ripe stump
#

Ash just casually building an stv core nbd

rain obsidian
#

Guardian Force, not working yet. 😦

#

Black screen of death.

#

Ooh.

#

Can't start the game, though. Very likely the protection IC thing.

#
ROM_START( rsgun )
    STV_BIOS

    ROM_REGION32_BE( 0x3000000, "cart", ROMREGION_ERASE00 ) /* SH2 code */
    ROM_LOAD16_WORD_SWAP( "mpr20958.7",   0x0200000, 0x0200000, CRC(cbe5a449)
    ROM_LOAD16_WORD_SWAP( "mpr20959.2",   0x0400000, 0x0400000, CRC(a953330b)
    ROM_LOAD16_WORD_SWAP( "mpr20960.3",   0x0800000, 0x0400000, CRC(b5ab9053)
    ROM_LOAD16_WORD_SWAP( "mpr20961.4",   0x0c00000, 0x0400000, CRC(0e06295c)
    ROM_LOAD16_WORD_SWAP( "mpr20962.5",   0x1000000, 0x0400000, CRC(f1e6c7fc)

    // 610-0374-96   1998     317-5041-COM   ST-V
    ROM_PARAMETER( ":315_5881:key", "05272d01" )
ROM_END
lime mango
#

mame does list it as 'game start protection'

rain obsidian
#

The main "fix" to get that running, was staring at those offsets a bit longer.

#

Ahh, OK.

#

Each one of those ROM chunks is 4MB.

#

But often, the SH2 code is smaller.

#

For rsgun, the SH2 code ROM is only 2MB.

#

So I had to duplicate that twice, to get the offsets correct for the other four ROMs.
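The padding arithmetic above can be sketched roughly like this. It assumes every slot in the joined cart file is 4MB, and `repeats_to_fill` is a made-up helper name, not something from MAME or the core.

```python
SLOT_SIZE = 4 * 1024 * 1024  # assumed: 4MB per ROM slot in the joined cart file


def repeats_to_fill(rom_size: int, slot_size: int = SLOT_SIZE) -> int:
    """How many back-to-back copies of a ROM fill one slot exactly."""
    if rom_size <= 0 or slot_size % rom_size != 0:
        raise ValueError("ROM size must divide the slot size evenly")
    return slot_size // rom_size


print(repeats_to_fill(2 * 1024 * 1024))  # rsgun's 2MB SH2 ROM: 2 copies
print(repeats_to_fill(1 * 1024 * 1024))  # a 1MB SH2 ROM: 4 copies
```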

#

Cleared the Backup mem, using the Test menu.

#

Hoping with 0 Credits, it might do an attract demo.

#

Not quite.

#

So forget that one for now. Needs protection chip stuffs.

#

VF Remix, no dice.

#

No VF Kids. No giant heads.

#

Trying to avoid the MACHINE_NOT_WORKING stuff now.

fossil flame BOT
#
GAME( 1995, vfremix,   stvbios, stv,      stv,      stv_state,   init_vfremix,    ROT0,   "Sega",                         "Virtua Fighter Remix (JUETBKAL 950428 V1.000)", MACHINE_IMPERFECT_SOUND | MACHINE_IMPERFECT_GRAPHICS | MACHINE_NOT_WORKING )
rain obsidian
#

Works with STV1061 BIOS.

#

But can't coin up.

#

Can't move around the game Test menu either.

#

So maybe some IO port stuff holding it off.

#
    // do strict overwrite verification - maruchan and rsgun crash after coinup without this.
    // cottonbm needs strict PCREL
    // todo: test what games need this and don't turn it on for them...
    m_maincpu->sh2drc_set_options(SH2DRC_STRICT_VERIFY|SH2DRC_STRICT_PCREL);
    m_slave->sh2drc_set_options(SH2DRC_STRICT_VERIFY|SH2DRC_STRICT_PCREL);
#

Not sure if that's only related to the DRC (Dynamic Recompiler) stuff in MAME. It might not apply to the core.

#

Very attractive demo.

#

It's just a shame most of these are on the Saturn. lol

#

Can play the game fine, but I died twice in the first 20 seconds.

little basin
rain obsidian
#

lol

#

I can't do the bullet hell shooter thing.

#

The highest "difficulty" level for me, is roughly Mario 64.

little basin
#

I like em, but suck at em

rain obsidian
#

Maybe Portal 2.

#

I played Portal 2 around 2015, having never played the first one.

#

I couldn't stop playing, and completed it in 7 hours, without a break. lol

little basin
#

original was great but 2 such a massive step up

rain obsidian
#

That's pretty much the last time I played a game, all the way through.

little basin
#

you can still DL Narbacular Drop, the uni project that led to valve hiring the creators to make portal

#

i really enjoyed both the talos principle games

rain obsidian
#

That's as far as it gets. lol

#

Black screen after that.

#

But fixed the game Test menu. I had the offsets wrong again

#
ROM_START( diehard ) /* must use USA, Europe or Taiwan BIOS */
    STV_BIOS
    ROM_DEFAULT_BIOS( "us" )

    ROM_REGION32_BE( 0x3000000, "cart", ROMREGION_ERASE00 ) /* SH2 code */
    ROM_LOAD16_BYTE( "fpr19119.13",               0x0000001, 0x0100000, CRC(de5c4f7c)
    ROM_RELOAD_PLAIN ( 0x0200000, 0x0100000 )
    ROM_RELOAD_PLAIN ( 0x0300000, 0x0100000 )
    ROM_LOAD16_WORD_SWAP( "mpr19115.2",    0x0400000, 0x0400000, CRC(6fe06a30)
    ROM_LOAD16_WORD_SWAP( "mpr19116.3",    0x0800000, 0x0400000, CRC(af9e627b)
    ROM_LOAD16_WORD_SWAP( "mpr19117.4",    0x0c00000, 0x0400000, CRC(74520ff1)
    ROM_LOAD16_WORD_SWAP( "mpr19118.5",    0x1000000, 0x0400000, CRC(2c9702f0)
ROM_END
little basin
#

just having a read about st-v, honestly hadnt heard of it until this thread

rain obsidian
#

Really? lol

#

There are a few arcade things I don't know about either, to be fair.

#

I even forgot that the Lindbergh and Ringworm were a thing.

#

(RingEdge)

#

Core is weird.

#

I only got Die Hard to show the Winners screen twice.

#

Usually after re-loading the FPGA from SnargleFlap, then choosing the correct BIOS (epr17952a).

#

But when it crashes, it really crashes.

#

SignalTap shows it jumped to some random address, then reads garbage until the end of memory, then halts.

#

Once it crashes, even loading a new BIOS won't get the core running again.

#

Not even the Saturn BIOS.

#

In general, that means something in the core isn't being reset correctly.

#

Done for tonight.

#

ST-V gave me a headache.

#

Wasn't able to get Shienryu running a second time, for some reason.

little basin
#

cheers for sharing what you're working on, always interesting to read about progress

golden cradle
#

I know it's a long shot, but just seeing Dreamcast listed among the mister-cores section, that's just wow. Impossible is nothing? I recall seeing MiSTer stuff about 5 years ago that people were saying SNES wasn't possible but look at everything that's happened. Hopeful not hopeful but still hopeful yet not expecting it but who knows 😛 N64 was the impossible so DC would be insane?

rain obsidian
#

The biggest hurdle for a Dreamcast FPGA core (even on a larger FPGA), is the 200 MHz clock freq that the SH4 runs at.

#

Obviously the entire rest of the console is a mammoth task, too. lol

#

But I made some progress with the PowerVR2 stuff.
Hoping to improve the SDRAM stuff soon, so it can get close to the theoretical frame times (on the DE10) that the sim does.

#

But a whole lot of other stuff is missing from the GPU atm, like the Tile Accelerator.

#

And the graphics are garbled atm, due to my crappy float-to-fixed stuff.

#

I don't even know if just the GPU alone will fit on the DE10 yet, once all of the proper speed-up logic is added.

#

If it can eventually be finished, though, I'd be happy to design a new FPGA board just to play Dreamcast and many existing cores.

#

It will... probably be a bit expensive. lol

#

I'm always hopeful.

#

Years ago, when MiSTer was only just starting (and I was talking to Sorg a lot), quite a few people said Saturn, PS1, and N64 were all impossible on the DE10.

#

So never say never, but anything beyond N64 took a huge jump in complexity and clock freq.

#

I don't think most of these mid/low-level FPGAs like the Cyc V and Cyc 10 will be able to handle much above 100 MHz with complex cores.

#

Even if you can somehow get DC to fit.

#

The thing about using an external SH4 is - the bus between the SH4 and HOLLY "GPU" runs at 100 MHz.

#

AFAIK, it's only the SH4 internally that runs at 200 MHz.

#

I think even the SH4 main SDRAM runs at "only" 100 MHz, too.

#

Compulsory SH4 + SDRAM hat board photo...

#

Just to show it is possible. lol

#

Still messing with ST-V.

#

Something something crash something, soundRAM...

#
map(0x05a00000, 0x05afffff).rw(FUNC(stv_state::saturn_soundram_r), FUNC(stv_state::saturn_soundram_w));
#

Trying to boot Cotton Boomerang again.

#

Could be related to the way I disabled the Saturn CD and YGR chip logic?

#

Works again, on the good old stv1061 BIOS.

#

But still can't Coin-up.

#

Also on stv1061 BIOS.

#

I fixed the ROM duplication / offsets thing.

#

I can Coin-up this time, but no Start.

#

And can't use the game's test menu.

#

Something funky with the IO stuff still, or some PORTG Counter, or some other crap it's waiting for.

#

SignalFlap shows it still running code, during that test menu...

#

And it's also reading from the IO chip logic...

#

System, P1, P2, then repeats.

#

Gonna change it back, to replicate the 8-bit port stuff across each 32-bit Word.

#

'cos if the games are using the ODD addresses for the IO stuff, they might also be reading bits [15:8] of the data (which currently always read back as 0x00), not bits [7:0].

#

Which will make it think tons of buttons are being pressed at once.
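A small sketch of why the upper byte lanes matter. It assumes the inputs are active-low (0 means pressed), which is what the "tons of buttons pressed" remark implies; the helper names are made up for illustration.

```python
def replicate_byte(port_value: int) -> int:
    """Copy an 8-bit port value into all four byte lanes of a 32-bit word."""
    b = port_value & 0xFF
    return b | (b << 8) | (b << 16) | (b << 24)


def pressed_buttons(lane_byte: int) -> int:
    """Count pressed buttons, assuming active-low inputs (0 = pressed)."""
    return bin(~lane_byte & 0xFF).count("1")


idle = 0xFF                      # assumed idle state: nothing pressed
low_only = idle                  # value only on bits [7:0], other lanes read 0x00
mirrored = replicate_byte(idle)  # value repeated on every byte lane

print(pressed_buttons((low_only >> 8) & 0xFF))  # 8: every button looks pressed
print(pressed_buttons((mirrored >> 8) & 0xFF))  # 0: nothing pressed
```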

rain obsidian
#

Very very often getting stuck whilst polling SoundRAM.

#

Including in the inbuilt factory test menu.

#

(when doing the Clock Change test.)

#

stv1061 BIOS again.

#

Start button no worky.

golden cradle
#

Amazing work 👏

rain obsidian
#

Thanks. 😉

#

Games like Guardian Force are constantly polling this port...

#

Which is PORTF.

#
PORTF_CS ? {4{P4_CONT}}        :    // 0x0b. P4 / Extra 6-button Layout.
#

And used for Player 4 in many games, but also as a kind of Kick harness input, by the looks of it.

#

That could also be why the Start button doesn't work in some games.

#

Some of them might be using the "Kick" inputs as the Start buttons.

#

Can't easily trigger buttons for extra players on MiSTer, without doing evil keyboard mapping stuff.

#

Or, plugging in extra joypads.

#

I mean, I could add extra buttons to the CONF_STR, so they appear in the menu.

#

Then map those to P4_CONT.

#

A lot of info needs to be collected for each game.

#

All of the MRAs generated.

#

Which BIOS each game prefers to use.

#

Which have protection / decryption ICs, etc.

#

Grdforce seems to work in MAME, with the usual P1 Start button?

#

Hard to control the tank, as two buttons are used to swivel the turret. An interesting control scheme. lol

#

And a nice looking game.

#

I think from the makers of the Cotton games?

rich kindle
surreal sluice
rich kindle
#

Tell this to my LCD which can just be rotated clockwise.

#

But yes, the Saturn STGs seem to need CCW altogether.

rain obsidian
#

A bit finicky to get games running.

#

Quite a few seem to work better when using the stv1061 BIOS.

#

Some with their recommended BIOS that MAME uses.

#

In the case of most Japan region games, that would be epr23603.ic8

polar goblet
#

One particular game I'm interested in with the ST-V is Taisen Tanto-R Sasissu.

It's the obscure fourth installment in the Puzzle & Action series that was never ported anywhere else.

rain obsidian
#

I think I also had to byteswap the BIOS files first.

#

Also added .bin to the end of all files.

#

And making each Cart file is currently a bit of a chore.

#

When a BIOS file has the correct byteswap for loading, it should show the SEGA string in cleartext...
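A rough way to automate that check. This is only a sketch: it searches the whole image for the SEGA string rather than a known header offset, and `swab16` / `byte_order` are made-up helper names.

```python
def swab16(data: bytes) -> bytes:
    """Swap each pair of bytes, like `dd conv=swab` (even length required)."""
    if len(data) % 2:
        raise ValueError("length must be even")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def byte_order(data: bytes) -> str:
    """Classify a ROM image by how the SEGA string appears in it."""
    if b"SEGA" in data:
        return "cleartext"
    even = data[: len(data) & ~1]  # drop a trailing odd byte, if any
    if b"SEGA" in swab16(even):
        return "byteswapped"
    return "unknown"


print(byte_order(b"\x00\x00SEGA ST-V "))          # cleartext
print(byte_order(swab16(b"\x00\x00SEGA ST-V ")))  # byteswapped
```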

#

Also need to always have the Cartridge option set to ST-V, before you load a BIOS or Cart...

#

If you don't add the .bin to the end of each file (even if it has to be .ic8.bin etc.), the MiSTer file browser will hide the files.

#

I don't think the ST-V games read the Region from the menu option. They just depend on the version of BIOS being used.

#

The Region in the menu is just for Saturn, AFAIK.

#

Lots of stuff to confirm, not many games work.

#

And when it crashes, it REALLY crashes. lol

#

Usually have to reload the core, to try running a different game.

#

'cos when the SH2(s) crash, it's like they're not being properly Reset again.

#

Even if a game boots, it might not Coin-Up, or the Start button won't work.

#

Some games are fussy about having the Backup RAM cleared first, which can be done by hitting the Test button, and using the Test/Service menu.

#

Current joystick mapping...

#
(joystick_0, meaning Player 1.)

joystick_0[0] = Right
joystick_0[1] = Left
joystick_0[2] = Down
joystick_0[3] = Up
joystick_0[4] = Saturn A     -> ST-V B1.
joystick_0[5] = Saturn B     -> ST-V B2.
joystick_0[6] = Saturn C     -> ST-V B3.
joystick_0[7] = Saturn Start -> ST-V B4.

Then the upper bits of joystick_0...

joystick_0[8]  = Saturn R -> ST-V Coin 1.
joystick_0[9]  = Saturn X -> ST-V Coin 2.
joystick_0[10] = Saturn Y -> ST-V Service (no toggle?)
joystick_0[11] = Saturn Z -> ST-V Service 1.
joystick_0[12] = Saturn L -> ST-V Start P1.
joystick_0[13] = ST-V Start P2.
joystick_0[14] = ST-V Multi-Cart select?
joystick_0[15] = ST-V Pause.
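
For debugging, the bit layout above can be decoded with something like the sketch below. It assumes a set bit means "pressed" in `joystick_0`; the labels are informal and `decode_joystick` is a made-up helper.

```python
# Bit positions copied from the mapping above; the names are informal labels.
JOYSTICK_BITS = [
    "Right", "Left", "Down", "Up",
    "B1", "B2", "B3", "B4",
    "Coin 1", "Coin 2", "Service", "Service 1",
    "Start P1", "Start P2", "Multi-Cart select", "Pause",
]


def decode_joystick(word: int) -> list[str]:
    """Return the names of all bits set in a 16-bit joystick word."""
    return [name for bit, name in enumerate(JOYSTICK_BITS) if word & (1 << bit)]


print(decode_joystick((1 << 8) | (1 << 12)))  # ['Coin 1', 'Start P1']
```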
#

Games like Die Hard seem to crash badly, as it jumps to some wrong code, then the SH2(s) just carry on reading garbage, until the Address wraps.

#

Cart BINs larger than 32MB probably won't work anyway atm.

#

The main reason some cart BINs are larger than 32MB, is because there's no proper MRA support yet.

#

So the MAME ROMs have to be padded when joining into a single file, so the offsets end up correct in memory.

#

Example mapping from the MAME code...

#
ROM_START( dnmtdeka )
    STV_BIOS

    ROM_REGION32_BE( 0x3000000, "cart", ROMREGION_ERASE00 ) /* SH2 code */
    ROM_LOAD16_BYTE( "fpr19114.13",               0x0000001, 0x0100000, CRC(1fd22a5f)
    ROM_RELOAD_PLAIN ( 0x0200000, 0x0100000 )
    ROM_RELOAD_PLAIN ( 0x0300000, 0x0100000 )
    ROM_LOAD16_WORD_SWAP( "mpr19115.2",    0x0400000, 0x0400000, CRC(6fe06a30)
    ROM_LOAD16_WORD_SWAP( "mpr19116.3",    0x0800000, 0x0400000, CRC(af9e627b)
    ROM_LOAD16_WORD_SWAP( "mpr19117.4",    0x0c00000, 0x0400000, CRC(74520ff1)
    ROM_LOAD16_WORD_SWAP( "mpr19118.5",    0x1000000, 0x0400000, CRC(2c9702f0)
ROM_END
#

That expects the SH2 code ROM (fpr19114.13) to be non-byteswapped. Assuming it already has the SEGA string in "cleartext".

#

And the fpr ROM is only 1MB, but the next ROM starts at the 4MB offset.

#

So you have to repeat the fpr ROM four times, before joining the other ROMs on.

#

And the other four ROMs need to be Byteswapped beforehand!

#

To Byteswap atm, I'm just using "dd" in WSL / Ubuntu...

#
dd if=mpr19115.2 of=mpr19115_swap.2 conv=swab
#

Then I'm joining all the ROMz using a DOS Prompt, because I'm too lazy to figure out how to do it in Linux.

lime mango
#

cat 1 2 3 4 5 6 > merged

rain obsidian
#
COPY /B fpr19114.13+fpr19114.13+fpr19114.13+fpr19114.13  + mpr19115_swap.2 + mpr19116_swap.3 + mpr19117_swap.4 + mpr19118_swap.5  dnmtdeka.bin
#

Thanks. lol

#

I knew it should be fairly straightforward in Linux, I just didn't bother Googling it. lol

#
cat fpr19114.13 fpr19114.13 fpr19114.13 fpr19114.13 mpr19115_swap.2 mpr19116_swap.3 mpr19117_swap.4 mpr19118_swap.5 > dnmtdeka.bin
#

^ Note how I repeated the SH2 ROM four times, as it's only 1MB. It also wasn't pre-byteswapped.

#

Then the other four Byteswapped ROMs are joined on.
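The dd + cat steps above could be rolled into one script. A minimal sketch, assuming 4MB slots, a program ROM that is left as-is and repeated to fill the first slot, and data ROMs that all get byteswapped; `build_cart` and `build_cart_file` are made-up names.

```python
from pathlib import Path

SLOT = 4 * 1024 * 1024  # assumed 4MB per slot, matching the MAME offsets above


def swab16(data: bytes) -> bytes:
    """Swap each pair of bytes, like `dd conv=swab`."""
    if len(data) % 2:
        raise ValueError("ROM length must be even")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def build_cart(program_rom: bytes, data_roms: list[bytes], slot: int = SLOT) -> bytes:
    """Repeat the program ROM to fill one slot, then append swapped data ROMs."""
    if not program_rom or slot % len(program_rom) != 0:
        raise ValueError("program ROM size must divide the slot size")
    image = program_rom * (slot // len(program_rom))
    for rom in data_roms:
        image += swab16(rom)
    return image


def build_cart_file(program_path: str, data_paths: list[str], out_path: str) -> None:
    """File-based wrapper around build_cart."""
    program = Path(program_path).read_bytes()
    data = [Path(p).read_bytes() for p in data_paths]
    Path(out_path).write_bytes(build_cart(program, data))
```

Usage would be along the lines of `build_cart_file("fpr19114.13", ["mpr19115.2", "mpr19116.3", "mpr19117.4", "mpr19118.5"], "dnmtdeka.bin")`, using the original (not pre-swapped) mpr files.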

#

Basically, it's a very crusty "core" patch atm. If it was a dog, it would be put to sleep.

#

Haven't actually tested dnmtdeka yet. Trying it now...

#

Oh, and many ROMs need the correct settings in the Test menu...

#

^ If the settings are wrong, the game name often won't even appear in the EACH GAME TEST list.

#

This was set to 1P and ALONE before...

#

Many games only work in Multi-Cart mode, even though it usually just repeats the same ROM four times atm.

#

"EACH GAME TEST" triggers the setup/diag menu specific to each game.

#

You don't usually need that, but some games need the Backup RAM cleared before they will run.

#

Backup RAM isn't currently saved to SD card, so the game has to wipe it to defaults.

#

When it reads the initial cart header correctly, it should show something like this...

#

As soon as you Coin-up, it usually changes to a different animation.

#

Which I think means it has loaded a chunk of Cart boot code into RAM OK.

#

Alas - I can't get this game to run. lol

#

Not with the epr-23603.ic8 BIOS, nor with stv1061.

#

And it has properly crashed each time...

#

The cart BIN ends up as 20MB, which is obv below the 32MB.

#

No idea what's causing most of the crashes atm.

#

Trying stv1061 again, and a Backup clear.

#

The game's own test menu works.

#

Triggered from "EACH GAME TEST".

#

We'll have us some Violence Mode.

#

Sometimes worth a try, using the game's own Backup Clear option.

#

And leaving it with no credits, for the anim to timeout, and (hopefully) jump into the game attract demo.

#

Aaaaaand, nope.

surreal sluice
#

Always choose violence

rain obsidian
#

lol

#

VFREMIX super crashes.

#

Cotton BM only loaded when using the stv1061 BIOS, and I did a Backup Clear from the test menu as well.

#

Didn't add Credits. I just left it to run.

#

It does Coin-Up, but Start won't work.

#

Fairly sure that uses the P4 inputs for extra buttons, including Start.

#
ROM_START( sasissu )
    STV_BIOS

    ROM_REGION32_BE( 0x3000000, "cart", ROMREGION_ERASE00 ) /* SH2 code */
    ROM_LOAD16_BYTE( "epr20542.13",               0x0000001, 0x0100000, CRC(0e632db5)
    ROM_RELOAD_PLAIN( 0x0200000, 0x0100000)
    ROM_RELOAD_PLAIN( 0x0300000, 0x0100000)
    ROM_LOAD16_WORD_SWAP( "mpr20544.2",    0x0400000, 0x0400000, CRC(661fff5e)
    ROM_LOAD16_WORD_SWAP( "mpr20545.3",    0x0800000, 0x0400000, CRC(8e3a37be)
    ROM_LOAD16_WORD_SWAP( "mpr20546.4",    0x0c00000, 0x0400000, CRC(72020886)
    ROM_LOAD16_WORD_SWAP( "mpr20547.5",    0x1000000, 0x0400000, CRC(8362e397)
    ROM_LOAD16_WORD_SWAP( "mpr20548.6",    0x1400000, 0x0400000, CRC(e37534d9)
    ROM_LOAD16_WORD_SWAP( "mpr20543.1",    0x1800000, 0x0400000, CRC(1f688cdf)
ROM_END
#

Linux...

#
dd if=mpr20544.2 of=2 conv=swab
dd if=mpr20545.3 of=3 conv=swab
dd if=mpr20546.4 of=4 conv=swab
dd if=mpr20547.5 of=5 conv=swab
dd if=mpr20548.6 of=6 conv=swab
dd if=mpr20543.1 of=1 conv=swab
#

My output filenames are just 2,3,4,5,6,1, because again, lazy.

#

epr ROM stays non-byteswapped, but needs to be repeated four times, since it's only 1MB...

#
cat epr20542.13 epr20542.13 epr20542.13 epr20542.13 2 3 4 5 6 1 > sasissu_stv.bin
#

My friend hates me using scp to copy files to MiSTer, for some reason? lol

#

Nope. 😦

#

Doesn't work with the stv1061 BIOS either, even after clearing Backup RAM.

#

Just before the crash. ^

#

Just seems to be reading garbage, and the SH2 address incrementing.

toxic ingot
rain obsidian
#

I can't, it doesn't give me permission to even post there?

#

I only found Saturn the other day, after doing a search for ST-V.

#

After I tried to post, the whole channel name even disappeared. lol

#

oic, it's kind of back now.

#

Might be hard to keep track of all of this on there now, though.

forest echo
#

Let's keep all your STV work here, best not derail that channel that is focussed on srg320's work as it will just get messy and confusing for people

rain obsidian
#

Yeah.

#

This channel was pretty much created for me to post my random ramblings on anyway.

#

At least for Dreamcast, originally.

#

OK, so that last game was reading some proper cart ROM data before it crashed.

#

Making me wonder if there is an alignment issue.

#

'cos I don't get why MAME has a load offset of 1, for most games?...

#
ROM_START( sasissu )
    STV_BIOS

    ROM_REGION32_BE( 0x3000000, "cart", ROMREGION_ERASE00 ) /* SH2 code */
    ROM_LOAD16_BYTE( "epr20542.13",               0x0000001, 0x0100000, CRC(0e632db5)
#

As usual with MAME, confusing AF.

#

I get that a lot of arcade boards used a pair of 8-bit ROMs on a 16-bit wide bus.

#

So you have Odd and Even bytes split between the ROM chips.

#

I don't get why most of the ST-V SH2 ROM gets loaded with an offset of 1.

#

If the Byteswapping was incorrect, I'm sure it wouldn't even show the game name on the first screen.
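On the offset-of-1 question: as far as I understand MAME's ROM macros, ROM_LOAD16_BYTE writes one ROM byte to every second address, so an offset of 1 puts the whole ROM on the odd addresses (the low byte of each big-endian 16-bit word), and ROM_RELOAD_PLAIN then adds ordinary contiguous copies. The sketch below only mimics that behaviour with two made-up helpers; it is not MAME code, and worth checking against the MAME source.

```python
def load16_byte(region: bytearray, data: bytes, offset: int) -> None:
    """Place each ROM byte at every second address, starting at `offset`."""
    region[offset : offset + 2 * len(data) : 2] = data


def reload_plain(region: bytearray, data: bytes, offset: int) -> None:
    """Place a contiguous copy of the ROM at `offset`."""
    region[offset : offset + len(data)] = data


rom = bytes([0x11, 0x22, 0x33, 0x44])
region = bytearray(16)           # stands in for the erased "cart" region
load16_byte(region, rom, 1)      # odd addresses only: 00 11 00 22 00 33 00 44
reload_plain(region, rom, 8)     # plain copy: 11 22 33 44
print(region.hex(" "))
```

If that reading is right, the first 2MB of MAME's layout (ROM bytes on odd addresses only) is not the same thing as two plain copies of the ROM, which might be worth ruling out as a cause of the crashes.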

surreal sluice
flint oasis
#

^ Hope that is some sort of help or at least not a distraction. I appreciate your posts here, regardless of system or topic. It shows possibilities and is inspiring.

maiden granite
#

usually i run the template's clean.bat file in between each compile, but with signaltap you wouldn't want to do that unfortunately

rain obsidian
#

Not a whole lot better, but after a code tidy-up.

rain obsidian
#

Cheating version, using C code for the UV calcs...

pseudo tinsel
#

animations look pretty! Tried a DCA3 dump yet? 😛

rain obsidian
#

That's a fair bit better.

#

I got the Z_FRAC_BITS thing working again, so I can assign more fractional bits to the Z values than to the verts.

#

I think I still have something wrong with dividing the UV coords by Z, as the textures shouldn't be as wobbly as that.
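For reference, the textbook perspective-correct approach is to interpolate u/z, v/z and 1/z linearly, then divide per pixel. The sketch below assumes `z` is plain view-space depth and uses barycentric weights; it may not match how the core or the real hardware represents depth.

```python
def perspective_uv(weights, verts):
    """Perspective-correct (u, v) at a pixel.

    weights: three barycentric weights summing to 1.
    verts:   three (u, v, z) tuples, z being view-space depth (assumed).
    """
    inv_z = sum(w / z for w, (_, _, z) in zip(weights, verts))
    u_over_z = sum(w * u / z for w, (u, _, z) in zip(weights, verts))
    v_over_z = sum(w * v / z for w, (_, v, z) in zip(weights, verts))
    return u_over_z / inv_z, v_over_z / inv_z


verts = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 1.0)]
u, v = perspective_uv((0.5, 0.5, 0.0), verts)
print(u, v)  # u is about 0.25, not the 0.5 a plain linear blend would give
```

Skipping the divide, or dividing by a Z that has lost precision, tends to show up as exactly that kind of texture swim.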

#

But you can see the increased Z precision on the decals on the car.

#

Minimal Z-fighting now.
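A quick illustration of why extra fractional bits on Z help. The bit counts (8 and 16) and the two depth values are made up for the example; only the idea of a separate Z_FRAC_BITS comes from the messages above.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    """Convert a float to fixed-point with the given number of fractional bits."""
    return int(round(value * (1 << frac_bits)))


z_decal, z_body = 0.5001, 0.5002  # two surfaces very close together in depth

# With 8 fractional bits both depths collapse to the same integer (Z-fighting).
print(to_fixed(z_decal, 8), to_fixed(z_body, 8))    # 128 128
# With 16 fractional bits they stay distinct, so the depth test can order them.
print(to_fixed(z_decal, 16), to_fixed(z_body, 16))  # 32775 32781
```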

rain obsidian
#

Some of the ridiculous numbers involved (right-hand window).

#

"sim" generally means the C code.

#

"core" means the Verilog model.

#

And FX1 / float etc., are the Verilog fixed-point verts, represented as floats. lol

#

To see how close they match up with the pure float versions.

#

Some of the values, like for sim Aa and sim Ba, need more than 36 bits.

#

(and a sign bit)

#

I'm already using either 48-bit or 64-bit for most of that stuff in the Verilog, so something else is clearly wrong.

rain obsidian
#

That's the best that has looked in a while.

#

Menu icons look a lot smoother with Gouraud shading.

#

But I need to fix the colour blending there.

#

Kasumi looking mostly OK.

#

Sanic has his face back.

#

Foghorn's a bit rough.

#

Looney Startline is a bit mushed, but at least more stuff is visible.

#

Texture maybe a bit stretched here. The colour bits are likely a separate issue, to do with the codebook cache.

#

(and all the textures involving text are missing atm.)

#

HOTD2 Title screen always looks fairly nice, due to the high-res texture.

#

Lack of fractional bits mean the sky texture is missing.

surreal sluice
#

DC? What happened to STV? 😜

rain obsidian
#

And the raindrops kind of cut through stuff atm. It needs the ARGB tile buffer, to fix transparencies.

#

STV was last year. 😉

#

Zombies are still zombified.

#

Quad/sprite textures need fixing. It used to involve the "fourth" vertex in the interp calcs, but I don't know why it would need to.

When the second half of a quad gets rendered as a triangle anyway, it would be wasting logic to add a virtual vert to the interp.

#

Rayman a bit rough.

#

These transparent "light" textures never did work. Not sure which effect it uses.

surreal sluice
#

What is this on? I assume you can’t build a DC core on DE10 because the fabric is too slow (among other problems)?

#

Some fancy fpga dev board?

rain obsidian
#

And 18 Wheeler has gone a bit Brokeback Mountain.

#

This is just the verilator sim atm, but most of this Verilog can be run on FPGA, in theory.

#

I have a very rough DE10 "core" already, which can render single frames as well, but very slowly.

#

Like, 12 seconds per FRAME, slowly.

surreal sluice
#

So like N64

crude bloom
#

5 frames a min is enough for anyone

rain obsidian
#

Gonna be very hard to fit this with all of the new speed-up stuff, unless I can find somebody to help with the maths.

#

'cos I'm convinced there would be ways to simplify the maths calcs, to minimize the use of multipliers and divides.

#

And reduce the super wide busses for those calcs.

#

N64. lol

#

atm, the frame times in the sim are very good.

#

I'm very happy with that part, so far.

#

Theoretically, around 40-60 FPS in most renders.

#

So like, 25ms to ~16.7ms frame times, basically.

#

That would only ever run that fast on the FPGA if: The logic fits the FPGA in the first place, and if it can take proper advantage of SDRAM burst transfers.

#

atm, the test "core" is using the old-people MiSTer SDRAM controller. Which limits the core freq to be 8 times slower than the SDRAM clock freq.

#

The SDRAM on MiSTer runs at a max of 166 MHz. On most people's boards, though, it's only reliable up to about 128 MHz or so.

#

So that means a max core clock atm, of only 16 MHz.
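The arithmetic from the last few messages, as a tiny sketch. The divide-by-8 ratio and the clock figures come from the text; the function names are made up.

```python
SDRAM_TO_CORE_RATIO = 8  # from the text: core clock is 8x slower than SDRAM


def max_core_clock_mhz(sdram_mhz: float) -> float:
    """Core clock allowed by the current SDRAM controller."""
    return sdram_mhz / SDRAM_TO_CORE_RATIO


def frame_time_ms(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


print(max_core_clock_mhz(128))                    # 16.0
print(max_core_clock_mhz(166))                    # 20.75
print(frame_time_ms(40), round(frame_time_ms(60), 1))  # 25.0 16.7
```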

#

And none of the 32x pixel ISP calc stuff fits the FPGA atm either, hence the ~12+ second render times.

surreal sluice
#

That’s neat, would be a fun tech demo either way, and what you learn maybe can go into a mister2 or whatever

rain obsidian
#

And no Tile Accelerator done yet, which means I can't feed it raw display list data, and have it generate the PVR display lists.

#

Which means continuing to use full 8MB VRAM dumps, for every frame render.

#

Normally, the CPU would just write all of the textures to VRAM first (loading from disk or whatever).

#

Then it only needs to send say 16KB or so of data to the TA, to have the TA translate the vertex data, and render the next frame.

#

So, once the textures are loaded, the amount of data the CPU sends to PVR is quite minimal.

#

Unless it has to swap out textures, etc.

rain obsidian
#

tbh, it's one of the main reasons I preordered the Analogue 3D. lol

#

(I already had a custom Quartus template for the SuperNT working years ago, so I could in theory run various MiSTer cores on it.)

#

IIRC, the Ana 3D has around 220,000 LEs, so twice as much as the DE10 Nano.

#

But a Cyclone 10. Which still isn't exactly the most cutting-edge silicon around atm.

#

Meaning, it's not likely to run complex MiSTer cores much faster than the Cyc V on the DE10, AFAIK.

#

But the Ana 3D is likely to have very fast DDR RAM, I would imagine.

#

(just read between the lines with that last one. lol)

pseudo tinsel
#

hardware wise I have the ultra64 board

#

do you think it'd fit it @rain obsidian ?

#

;p

#

i got reicast running in one of the cpu cores

#

already

rain obsidian
#

Not sure which board that is? Xilinx / AMD, yep?

#

You could try shoving the crusty core onto that board...

#

It's fairly generic code. I think I even have it inferring most (or all) of the BRAM blocks.

#

ie. rather than using the Intel/Altera/Quartus IP blocks.

#

But the PLLs are using the IPs.

#

The main slow-down with the test core atm, is that SDRAM controller.

#

If you have faster RAM on your board, or about 8MB of on-chip mem (lol), it would help a lot.

#

But yeah, no TA yet. Unless you can have reicast transfer the whole 8MB VRAM over to the FPGA fast enough.

#

(or do a diff, against the previous VRAM in reicast, and only write those changes to the FPGA side.)
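The diff idea could look something like this on the emulator side. A sketch only: the 4KB block size is arbitrary and `changed_blocks` is a made-up helper, not part of reicast.

```python
def changed_blocks(prev: bytes, curr: bytes, block_size: int = 4096):
    """Yield (offset, data) for each block of `curr` that differs from `prev`."""
    if len(prev) != len(curr):
        raise ValueError("VRAM snapshots must be the same size")
    view_prev, view_curr = memoryview(prev), memoryview(curr)
    for offset in range(0, len(curr), block_size):
        chunk = view_curr[offset : offset + block_size]
        if chunk != view_prev[offset : offset + block_size]:
            yield offset, bytes(chunk)


prev = bytes(16)
curr = bytearray(prev)
curr[5] = 0x01  # pretend the game touched one byte of VRAM
print(list(changed_blocks(prev, bytes(curr), block_size=4)))
# [(4, b'\x00\x01\x00\x00')]
```

Only the yielded blocks would need to be written to the FPGA side for that frame.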

#

Or, I guess capture whatever the TA writes back to VRAM, and just send that.

#

I have no idea how often a typical DC game would swap out larger textures, but it might not be too often?

#

The test core also doesn't write back to any framebuffer in VRAM.

#

It simply writes the output pixels direct to SDRAM.

#

SDRAM is only being used for the framebuffer / video display.

#

The actual vert params and textures are being read from DDR3 atm.

#

Oh yeah, and it has to load the VRAM dump into DDR3 first, ofc.

#

Using the MiSTer framework.

#

So you'd also need a way to do that.

#

It's a bit of a mess. lol

#

As you know, half the battle with the FPGA stuff, is just getting the data to the right places fast enough.

#

I just did a tweak, which I think is helping fix some of those missing polys...

#

Before...

#
wire signed [47:0] cross_term2_p0 = (y_ps_sub_fy1 * fx2_sub_fx1) >>>FRAC_BITS;
wire signed [47:0] cross_term2_p1 = (y_ps_sub_fy2 * fx3_sub_fx2) >>>FRAC_BITS;
wire signed [47:0] cross_term2_p2 = (y_ps_sub_fy3 * fx1_sub_fx3) >>>FRAC_BITS;
#

After...

#
wire signed [63:0] y1_mult_fx2 = y_ps_sub_fy1 * fx2_sub_fx1;
wire signed [63:0] y2_mult_fx3 = y_ps_sub_fy2 * fx3_sub_fx2;
wire signed [63:0] y3_mult_fx1 = y_ps_sub_fy3 * fx1_sub_fx3;

wire signed [63:0] cross_term2_p0 = y1_mult_fx2>>>FRAC_BITS;
wire signed [63:0] cross_term2_p1 = y2_mult_fx3>>>FRAC_BITS;
wire signed [63:0] cross_term2_p2 = y3_mult_fx1>>>FRAC_BITS;
#

ie. Doing the mult separately first.

#

I think what might be going on, is Verilator (or Verilog in general) is defaulting to a max width for each value, if you're doing certain calcs as a direct assign.

#

Which will of course be limiting the max those values can be, which is dumb.

#

I guess it has to choose a maximum, somewhere along the line, if you don't specify it?

#

Thinking it's defaulting to 32-bit, most of the time.

#

So doing the mult using implicit 64-bit results seems to force it.

#

I need to Google more on that later.
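My understanding of the Verilog sizing rules (worth checking against the standard) is that operands in an expression like this get extended to the widest operand or the assignment target, whichever is larger. If so, the "before" version would multiply in 48 bits rather than 32, and the upper bits of a large product are lost before the shift. The sketch below emulates that in Python; the operand values are made up to force the overflow.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Keep only the low `bits` bits and reinterpret them as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def cross_term(a: int, b: int, frac_bits: int, width: int) -> int:
    """(a * b) >>> frac_bits, with the product held in `width` bits."""
    return wrap_signed(a * b, width) >> frac_bits


FRAC_BITS = 8
a = b = 1 << 24  # two large 32-bit deltas (illustrative values)

print(cross_term(a, b, FRAC_BITS, 48))  # 0: the 2**48 product wrapped away
print(cross_term(a, b, FRAC_BITS, 64))  # 1099511627776, i.e. 2**40
```

That would suggest the fix is the wider 64-bit wires, not the fact that the multiply was moved onto its own line.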

#

In general, it seems best to split up certain calcs in Verilog.

#
wire signed [31:0] y_ps_sub_fy1 = (y_ps<<FRAC_BITS) - FY1;
wire signed [31:0] y_ps_sub_fy2 = (y_ps<<FRAC_BITS) - FY2;
wire signed [31:0] y_ps_sub_fy3 = (y_ps<<FRAC_BITS) - FY3;

wire signed [31:0] fx2_sub_fx1 = FX2 - FX1;
wire signed [31:0] fx3_sub_fx2 = FX3 - FX2;
wire signed [31:0] fx1_sub_fx3 = FX1 - FX3;
#

The initial calcs are only the deltas, too. So the results from those should be fairly small, hence 32-bit results.

#

All of this stuff needs to be characterized, to find the typical values.

#

And make sure it works correctly when the values go negative, etc.

#

That's the kind of thing I really struggle with. It's bad enough working with fixed-point to begin with. lol

#

Oh maybe not. Still looks like crap. lol

rain obsidian
#

Tried to clamp the maximum vert values, to not go beyond 2000, or 4,000,000, or whatever.

#

It didn't work out.

#

Seems to be mainly positive values that go outside the edges of the screen.

#

So x_ps>639.

#

Or y_ps>479.

#

I think this is an example of the giant poly near the camera, where it's not writing any prim tags, because all inTri bits stay low...

#

Highest magnitude vert value isn't actually that huge...

#

Even after shifting it, to add (at least 8) fractional bits for fixed-point...

#

It still fits within 32 bits.

#

Finally.

#

I don't know why I have to keep doing "proofs", to know that it's the inTri thing not working right. lol

#

I just forced all inTri bits high, for that specific poly at the lower-right, and it works.

#

And it turns out it wasn't a poly with really huge vert values either, so it's some issue with signed values or something.
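As a reference to compare against, here is a generic edge-function inside test in fixed-point. It is a textbook sketch, not the core's actual inTri logic: FRAC_BITS = 8 is assumed, and since Python integers never overflow, the post-multiply shift is left out (it does not change the sign away from the edges).

```python
FRAC_BITS = 8  # assumed number of fractional bits for vertex coordinates


def to_fixed(v: float) -> int:
    return int(round(v * (1 << FRAC_BITS)))


def edge(ax, ay, bx, by, px, py) -> int:
    """Signed area term: which side of edge A->B the point P lies on."""
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)


def in_tri(tri, x_ps: int, y_ps: int) -> bool:
    """True if pixel (x_ps, y_ps) is inside the triangle, for either winding."""
    (x1, y1), (x2, y2), (x3, y3) = [(to_fixed(x), to_fixed(y)) for x, y in tri]
    px, py = x_ps << FRAC_BITS, y_ps << FRAC_BITS
    e = [
        edge(x1, y1, x2, y2, px, py),
        edge(x2, y2, x3, y3, px, py),
        edge(x3, y3, x1, y1, px, py),
    ]
    return all(v >= 0 for v in e) or all(v <= 0 for v in e)


# A big poly with verts well off-screen, including negative coordinates.
big = [(-100.0, -100.0), (2000.0, -100.0), (-100.0, 2000.0)]
print(in_tri(big, 639, 479))    # True: the screen corner is covered
print(in_tri(big, 1500, 1500))  # False
```

Comparing the signs from a model like this with the Verilog signals for the failing poly might help narrow down where the signed handling goes wrong.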

rain obsidian
#

They weren't kidding about the TA using a basic bounding-box check.

#

I think it really does draw a rectangle around each triangle, to say whether it thinks the triangle might be within the current tile.

#

So it still loads the params for that triangle, even if no part of it will even appear within the final tile. lol

#

I get why they did it, but it's still surprising.

#

eg. The big triangle that's behind the Daytona logo in the first pic above (part of the road)...

#

Imagine a rectangle drawn around that triangle, which extends to its verts.

#

That rectangle (bounding box) overlaps the tile (highlighted in red) in the image above.

#

The ISP still needs to check that, even though no part of the triangle actually exists within that tile.
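
#

Rough C++ sketch of that bounding-box check, just to pin the idea down. The `Vert` / `Box` types and function names are made up for illustration (not from the core or reicast), and I'm assuming 32x32 tiles and plain integer screen coords.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical types, for illustration only.
struct Vert { int32_t x, y; };                        // screen-space pixel coords
struct Box  { int32_t min_x, min_y, max_x, max_y; };

// Axis-aligned rectangle that just touches all three verts.
Box tri_bounds(const Vert& a, const Vert& b, const Vert& c) {
    return { std::min({a.x, b.x, c.x}), std::min({a.y, b.y, c.y}),
             std::max({a.x, b.x, c.x}), std::max({a.y, b.y, c.y}) };
}

// True if the rectangle overlaps the 32x32 tile at (tile_x, tile_y).
// Says nothing about whether the triangle itself covers any pixel there.
bool box_hits_tile(const Box& bb, int tile_x, int tile_y) {
    const int32_t x0 = tile_x * 32, y0 = tile_y * 32;
    const int32_t x1 = x0 + 31,     y1 = y0 + 31;
    return bb.min_x <= x1 && bb.max_x >= x0 &&
           bb.min_y <= y1 && bb.max_y >= y0;
}
```

A big triangle covering the top-left half of a 640x480 screen still "hits" the bottom-right tile with this test, even though none of its pixels land there. That's the wasteful part.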

#

Triangle highlighted in Green.

#

Bounding box in Red.

#

@pseudo tinsel Does that sound roughly right to you, based on what you know about the TA ?

#

I know the Sega Bible PDF says something similar.

#

It's just strange seeing it in action. lol

#

It was loading the params for that triangle, even for tiles up the mountains, at the top-left of the bounding box.

#

I suppose that's the only logic-efficient way to do the tile binning stage.

pseudo tinsel
#

yeah it creates tile bounding boxes

rain obsidian
#

I knew it said it, I just... it's hard to accept that it's quite so wasteful. lol

pseudo tinsel
#

well it does help localize

rain obsidian
#

The deferred rendering thing is more involved than I thought.

pseudo tinsel
#

i suspect there's some form of early rejection

#

in core

rain obsidian
#

ie. you kind of have to gather all the info first, taking a fair bit of processing time, so you can extract it all again during texturing.

#

I suppose that is kind of the meaning of "deferred". lol

#

Yeah.

pseudo tinsel
#

yup

rain obsidian
#

Thing is, the frame times are quite decent on the sim now, so not all of the detail matters too much.

#

It's just neat seeing the effects of what they said in the PDF.

#

I never really looked into it before in the sim, all this time.

#

Quite hard to visualize stuff that doesn't appear on-screen. hehe

#

Thanks for the confirmation. It will help with the TA stuff later. 😉

tired spire
#

That looks like a lot of progress, really cool! 👍

rain obsidian
#

Thanks.

#

I tried to fix the inTri thing again.

#

Claude AI confirmed that it is due to overflow in the calcs, and it makes sense...

#

Since it's doing successive addition for the calcs, 32 times.

#

The final values become pretty huge.

#

Then again, it shouldn't mean the lower bits of inTri stay low?
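
#

A minimal C++ sketch of the successive-addition idea, assuming inTri is based on a standard edge function (that's my assumption, and the names here are hypothetical, not the ones in inTri_calc). It shows that stepping gives the same answer as the direct multiply when there are enough bits, plus a helper to check whether a value would survive a given signed register width.

```cpp
#include <cstdint>

// Edge function for the edge (x0,y0)->(x1,y1), evaluated at (px,py).
// The sign of the result says which side of the edge the point is on.
int64_t edge_direct(int64_t x0, int64_t y0, int64_t x1, int64_t y1,
                    int64_t px, int64_t py) {
    return (px - x0) * (y1 - y0) - (py - y0) * (x1 - x0);
}

// Same value at (px + steps, py), reached by adding the per-pixel delta
// `steps` times instead of multiplying again.
int64_t edge_stepped(int64_t x0, int64_t y0, int64_t x1, int64_t y1,
                     int64_t px, int64_t py, int steps) {
    int64_t e = edge_direct(x0, y0, x1, y1, px, py);
    const int64_t delta_x = (y1 - y0);     // change in E for +1 pixel in X
    for (int i = 0; i < steps; ++i) e += delta_x;
    return e;
}

// Would `v` fit in a signed register that is `bits` wide?
bool fits_signed(int64_t v, int bits) {
    const int64_t lim = int64_t{1} << (bits - 1);
    return v >= -lim && v < lim;
}
```

The thing to watch is that the product of two fixed-point values needs roughly the sum of their widths. Two coords of around 2000 with 8 fractional bits each already give a product that doesn't fit in 32 bits, even though each coord on its own does.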

#

Anywho, I'll just have to figure that out later, and try to move on.

#

I reverted to older simplified code for inTri_calc, though, and it got rid of most of the gaps between triangles.

#

Old inTri_calc, from when I was trying to find the overflow...

#

New...

#

A bit neater. lol

dense shard
#

Great progress dude, it’s cool popping in here and seeing all the awesome stuff you’re doing! 🥳

ripe stump
#

I don’t feel so good, tails

surreal sluice
#

Yeah fun to watch the DC research stuff. Any chance you can summarize the STV stuff for srg320? Maybe he can see if it’s at all possible to get the games over the finish line when he gets through the remaining Saturn work

rain obsidian
#

Yep, I might get in touch with Sergey soon.

#

But he'll likely be able to see the changes on github right away, and I honestly think he'll know how to fix most of the issues.

#

Most of the uber devs clearly do chip design as a profession. So they tend to be able to look at some code, and fix it quickly.

#

vs me, struggling for months on a single maths problem, like the one above. lol

surreal sluice
rain obsidian
#

Slightly less glitchy.

#

Or at least, the larger missing polys are more often shown in plain sight, instead of being drawn right across the car etc.

#

And the glitches on the right-hand side of the screen aren't quite as bad.

#

Anyway, I really do need to stop messing with GIFs now, and start on the TA or SDRAM stuff.

#

Frame times are still enough to hit 50 FPS, for most of these frames.

#

The inTri calc could probably be done on 16 pixels at a time, instead of 32.

#

Meaning it would take 64 clock cycles per-tile (and per-prim type), instead of 32.
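
#

The arithmetic, as a tiny C++ helper (assuming a 32x32 tile, and that the whole tile gets swept once per prim type; the function name is made up).

```cpp
// Clock cycles to sweep one 32x32 tile when `pixels_per_clock`
// inTri results are produced on every clock.
constexpr unsigned cycles_per_tile(unsigned pixels_per_clock) {
    return (32u * 32u) / pixels_per_clock;
}
```

So 32 pixels per clock gives 32 cycles, and 16 pixels per clock gives 64.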

#

It might not make too big of a negative impact on the frame times, but yeah, that's all on the TODO list.

#

I really want to try getting the speed-up stuff running on the FPGA, even if the core will render slowly until the SDRAM stuff is done.

#

I just don't think it will fit the FPGA, without sacrificing a lot of vertex and Z precision.

#

I still think there are ways to simplify the maths further. I was thinking of sending a Tweet-X to Matt at Stand Up Maths. lol

#

Claude AI has been very helpful, but I keep on running out of free questions.

#

I hate to admit even using "AI" to help with this, but it's more about helping look at the code a different way.

#

Claude gets quite a lot of things wrong, until you point it out.

#

Gonna have a sleepy. zzzzz

rain obsidian
#

Back again.

#

You know what, most of these issues do seem to exist in reicast itself.

#

No idea why that should be. It might even be something I messed with a while back, but I don't think so?

#

Gonna try some different games, to try to get some "clean" VRAM captures.

#

Other glitches on the emu...

rain obsidian
#

Not the most dynamic looking game. lol

rain obsidian
#

Months of work, and all I've created is a slightly-broken GIF generator. lol

ripe stump
#

I’m sorry, the Start what?

#

that’s assault.

rain obsidian
#

Same thing, but without the gaps in the polys on Sonic, etc.

#

And slightly higher FRAC_BITS for the verts.

low widget
rain obsidian
#

Stumbled upon VGE's video on the ST-V again.

#

Caused me to have another think about why some carts aren't working yet.

#
wire CART_ID_SEL = (AA[23:1] == 24'hFFFFFF>>1) && ~ACS1_N;
#
                    if (CART_ID_SEL) begin
                        case (MODE)
                            3'h1: ABUS_DO <= 16'hFFFF;            // ROM 2M.
                            3'h2: ABUS_DO <= 16'hFF5A;            // DRAM 1M.
                            3'h3: ABUS_DO <= 16'hFF5C;            // DRAM 4M.
                            3'h4: ABUS_DO <= 16'hFF21;            // BACKUP Mem.
                            3'h5: ABUS_DO <= 16'hFFFF;            // ST-V. TODO!
                            default: ABUS_DO <= 16'hFFFF;
                        endcase
                    end
#

The Saturn core already had this readback thingy for the "Cart ID".

#

I think that CART_ID_SEL logic would place the ID in the last two bytes of the ABUS Cart range.

#

So at 0x4FFFFFF ?
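
#

Quick C++ model of that CART_ID_SEL compare, to sanity check the address. The CS1 range of 0x04000000-0x04FFFFFF is my assumption about the SH-2 map, and `in_cs1` just stands in for ~ACS1_N.

```cpp
#include <cstdint>

// Mirrors: (AA[23:1] == 24'hFFFFFF>>1) && ~ACS1_N
// `addr` is the full byte address; `cs1_active` stands in for ~ACS1_N.
bool cart_id_sel(uint32_t addr, bool cs1_active) {
    const uint32_t aa_23_1 = (addr >> 1) & 0x7FFFFFu;   // AA[23:1], 23 bits
    return cs1_active && aa_23_1 == (0xFFFFFFu >> 1);
}

// Assumption: A-Bus CS1 covers 0x04000000-0x04FFFFFF.
bool in_cs1(uint32_t addr) {
    return addr >= 0x04000000u && addr <= 0x04FFFFFFu;
}
```

With that, only the last 16-bit word of the range (byte addresses ending in FFFFFE and FFFFFF) selects the ID.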

#

From MAME ST-V...

#
void stv_state::install_common_protection()
{
    m_maincpu->space(AS_PROGRAM).install_read_handler(0x4fffff0, 0x4ffffff, read32sm_delegate(*this, FUNC(stv_state::common_prot_r)));
    m_maincpu->space(AS_PROGRAM).install_write_handler(0x4fffff0, 0x4ffffff, write32s_delegate(*this, FUNC(stv_state::common_prot_w)));
}
#

MAME installs read/write handlers on the 16-byte range at the top of that same region (0x4fffff0-0x4ffffff), which includes those two byte addresses.

fossil flameBOT
#
uint32_t stv_state::common_prot_r(offs_t offset)
rain obsidian
#

Presumably, that is for the more common copy-protection used in some games.

#

Other games have their own specific protection, and so have specific functions in MAME.

#

Maybe some games are just expecting to read back specific values from that address, so refuse to boot further.

#

But so far, it seems to be missing or garbaged code that has been causing it to properly crash.

pseudo tinsel
#

gifs look fun!

#

its starting to look like plausible renders

late cargo
#

I thought the ST-V was sooo last year! smugnep

surreal sluice
#

2025 the year of STV!

potent warren
#

is the dream alive?

rain obsidian
#

Not yet.

rain obsidian
#

Tried tweaking the PVR test "core", to display the framebuffer direct from DDR3.

#

After seeing the Saturn core using mainly DDR3 for everything, I think I could do the same for the PVR thing.

#

But I'll need to use Burst transfers a lot, to cancel out the DDR latency.

#

FB display didn't work, though.

#

I tripped up on the usual BYTE vs WORD address thing...

#
wire [28:0] DDRAM_BASE = (32'h32000000 >>3);

wire [31:0] FB_R_SOF1 = pvr_ptr[ 'h50>>2 ];
wire [31:0] FB_R_SOF2 = pvr_ptr[ 'h54>>2 ];

assign FB_EN     = |status[3:2];
assign FB_BASE   = DDRAM_BASE + (status[3]) ? FB_R_SOF2 : FB_R_SOF1;
assign FB_WIDTH  = 12'd640;
assign FB_HEIGHT = 12'd480;
assign FB_FORMAT = 5'b0_0_100;
assign FB_STRIDE = 14'd640;
assign FB_FORCE_BLANK = 0;
#

(DDRAM_BASE was defined near the bottom of the file originally. I forgot it uses the 64-bit WORD address.)
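
#

Rough C++ sketch of the BYTE vs 64-bit WORD address thing. I'm assuming FB_BASE wants a BYTE address, and that the SOF regs are BYTE offsets into VRAM; the helper names are made up. Also worth noting: `DDRAM_BASE + (status[3]) ? FB_R_SOF2 : FB_R_SOF1` groups as `(DDRAM_BASE + status[3]) ? FB_R_SOF2 : FB_R_SOF1`, since `?:` binds more loosely than `+` in Verilog (same as in C++), so the base never actually got added.

```cpp
#include <cstdint>

// The DDR controller uses one address per 64-bit (8-byte) word.
constexpr uint32_t byte_to_word64(uint32_t byte_addr) { return byte_addr >> 3; }
constexpr uint32_t word64_to_byte(uint32_t word_addr) { return word_addr << 3; }

// Framebuffer base as a BYTE address: convert the word base back to
// bytes first, then add the (byte) start-of-frame offset.
constexpr uint32_t fb_base_byte(uint32_t ddr_base_word, uint32_t sof_byte_offset) {
    return word64_to_byte(ddr_base_word) + sof_byte_offset;
}
```

For the base used here, 0x32000000 in bytes is 0x06400000 in 64-bit words.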

#

The (hopeful) fix...

#
wire [23:0] FB_R_SOF1 = pvr_ptr[ 'h50>>2 ][23:0];
wire [23:0] FB_R_SOF2 = pvr_ptr[ 'h54>>2 ][23:0];

assign FB_EN     = |status[3:2];
assign FB_BASE   = (DDRAM_BASE<<3) + (status[3] ? FB_R_SOF2 : FB_R_SOF1);
assign FB_WIDTH  = 12'd640;
assign FB_HEIGHT = 12'd480;
assign FB_FORMAT = 5'b0_0_100;
assign FB_STRIDE = 14'd640;
assign FB_FORCE_BLANK = 0;
#

Those regs get loaded from the PVR reg dump file, via the OSD.

#

The 8MB VRAM dump gets loaded into DDR3, starting at DDRAM_BASE.

#
"O[3:2],Display DDR FB,Off,FB1,FB2;",
#

status[3:2] bits used to enable/disable, or change which FB gets displayed.

#

The DC writes to one framebuffer, whilst displaying the other, so double-buffered.

#

And those buffers are split between each half of VRAM.

#

(but I think the address in each half can still be different.)

#

It's also a way they could maximize the bandwidth at the time.

#

Since it will be mainly reading 32-bit parameters from one half of VRAM.

#

Then reading 64-bit textures from BOTH halves, and then burst-write the completed tile to the framebuffer.

#

Just realizing this won't quite work...

#

The VRAM dump file gets loaded into the upper and lower halves of each 64-bit word.

#

So it puts the upper and lower 4MB chunks side-by-side in memory.

#

Which is "easier" for everything else.

#

But ASCAL will be trying to read it as the full 64-bit word, reading 16-bit pixels from those words.

#

It will end up displaying half of the pixels from half of VRAM, and half from the other. lol

#

So I guess two pixels from the framebuffer, and two "random" ones from elsewhere in the opposite half of VRAM.
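
#

A minimal sketch of that side-by-side layout in C++, to show where a framebuffer byte ends up. Assumptions: each 64-bit DDR word holds one 32-bit word from each 4MB bank at the same offset, and the lower bank sits in the low 32 bits (that last part is a guess). The function name is hypothetical.

```cpp
#include <cstdint>

// Byte offset into the interleaved DDR image, for a byte at `offset`
// inside bank `half` (0 = lower 4MB, 1 = upper 4MB).
uint32_t interleaved_offset(uint32_t half, uint32_t offset) {
    const uint32_t word = offset >> 2;      // 32-bit word index within the bank
    const uint32_t lane = offset & 3u;      // byte within that 32-bit word
    return (word << 3) + (half ? 4u : 0u) + lane;
}
```

Two consecutive 16bpp pixels (bank offsets 0 and 2) stay next to each other, but the following pair (offsets 4 and 6) lands at 8 and 10, with four bytes from the other bank in between. It also means any offset within a bank effectively doubles once it's in DDR.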

#

Great, now just a black screen, when I enable the Framebuffer thing.

#

Not 100% sure what FB_STRIDE does?

#
// FB_STRIDE either 0 (rounded to 256 bytes) or multiple of pixel size (in bytes)
#

I found a core which just sets Stride to the same as the image width.

#

It was displaying garbage from DDR3 earlier, so can't be too far off.

#

I'm sure the VRAM dumps contain the pre-rendered Framebuffer(s) from reicast.

#

"Cheating" again, in the sim.

#

So that's displaying the framebuffer that reicast had already rendered into VRAM.

#

Again, no idea why I forgot about this...

#

Proves the reicast dump had corrupted polys to begin with. sigh

#

How it should look...

#

How it's going...

#

This one isn't too terrible...

rain obsidian
#

Supposed to be displaying the framebuffer of the Zombies (pre)-render.

#

That might even be a texture. Not sure yet.

#

During loading of the Sonic VRAM dump. ^

#

So yeah, probably partial textures, and part data.

#

lol

#

That was loading the Rayman thing first, then loading the PVR regs for a different file.

#

So the PVR regs are being loaded, but something else isn't quite right.

#

I knew it was likely to double up the image, and have garbage every two pixels.

#

But I hoped it would be at the start of the frame at least.

#

I think the stride thing probably was correct before.

#

And now I've set it to 0, it defaults to a stride value of 256.

#

Which would explain why some images are repeated five times.

#

(256 * 5 = 1280, but 16BPP)

#

Yet another compile.

#
wire [23:0] FB_R_SOF1 = pvr_ptr['h50>>2][23:0];
wire [23:0] FB_R_SOF2 = pvr_ptr['h54>>2][23:0];

wire [23:0] FB_OFFSET = status[3] ? FB_R_SOF2 : FB_R_SOF1;

assign FB_EN     = |status[3:2];    // Enable the FB display if either status bit is set.
assign FB_BASE   = (DDRAM_BASE<<3) + FB_OFFSET[22:0];
assign FB_WIDTH  = 12'd640;
assign FB_HEIGHT = 12'd480;
assign FB_FORMAT = 5'b0_0_100;
assign FB_STRIDE = 14'd640;
assign FB_FORCE_BLANK = 0;
#

Oh yeah, I think the stride is in BYTEs, so needs to be 640*2.
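
#

The stride sum as a small C++ sketch, using the FB_FORMAT[2:0] codes (011=8bpp, 100=16bpp, 101=24bpp, 110=32bpp). Function names are made up, and it assumes lines are packed with no padding.

```cpp
#include <cstdint>

// Bytes per pixel for an FB_FORMAT[2:0] code. Returns 0 for unknown codes.
uint32_t bytes_per_pixel(uint32_t fmt) {
    switch (fmt & 7u) {
        case 3: return 1;   // 8bpp (palette)
        case 4: return 2;   // 16bpp
        case 5: return 3;   // 24bpp
        case 6: return 4;   // 32bpp
        default: return 0;
    }
}

// Stride = distance in BYTES from the start of one line to the next.
uint32_t stride_bytes(uint32_t width_px, uint32_t fmt) {
    return width_px * bytes_per_pixel(fmt);
}
```

So 640 pixels at 16bpp is a stride of 1280, and at 32bpp it would be 2560.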

#
Framebuffer-related signals (ifdef MISTER_FB):
   - FB_EN (output): Enables the use of the framebuffer in DDRAM.
   - FB_FORMAT (output [4:0]): Specifies the format of the framebuffer:
     - Bits [2:0]: 011=8bpp(palette), 100=16bpp, 101=24bpp, 110=32bpp
     - Bit [3]: 0=16bits 565, 1=16bits 1555
     - Bit [4]: 0=RGB, 1=BGR (for 16/24/32 bit modes)
   - FB_WIDTH, FB_HEIGHT (output [11:0] each): Specify the dimensions of the framebuffer.
   - FB_BASE (output [31:0]): Base address of the framebuffer in memory.
   - FB_STRIDE (output [13:0]): Stride of the framebuffer (0 or multiple of pixel size in bytes).
   - FB_VBL, FB_LL (input): Vertical Blank and Load Line signals.
   - FB_FORCE_BLANK (output): Forces the framebuffer output to be blank.
pseudo tinsel
#

yeah fb stuff is a bit w/e

#

also write out is very flexible w/ the Y scaler

floral vale
#

Love seeing these updates!

rain obsidian
#

Had to add more tweaks, so I can change things from the menu.

#
wire [23:0] FB_R_SOF1 = pvr_ptr['h50>>2][23:0];
wire [23:0] FB_R_SOF2 = pvr_ptr['h54>>2][23:0];

wire [23:0] FB_OFFSET = status[3] ? FB_R_SOF2 : FB_R_SOF1;

assign FB_EN     = |status[3:2];    // Enable the FB display if either status bit is set.
assign FB_BASE   = (DDRAM_BASE<<3) + FB_OFFSET[22:0];
assign FB_WIDTH  = 12'd640;
assign FB_HEIGHT = 12'd480;
// Declared before first use, so FB_FORMAT doesn't pick up an implicit 1-bit net.
wire [2:0] bpp = (status[7:6]==2'd0) ? 3'b011 :        // 8bpp
                 (status[7:6]==2'd1) ? 3'b100 :        // 16bpp
                                        3'b110;        // 32bpp

assign FB_FORMAT = 5'b1_0_000 | bpp;    // [4] 0=RGB 1=BGR. [3] 0=16bit 565, 1=16bit 1555. [2:0] 011=8bpp(palette) 100=16bpp 101=24bpp 110=32bpp.
assign FB_STRIDE = 14'd640 <<status[12:11];    // Either 0 (rounded to 256 bytes) or multiple of pixel size (in bytes)
assign FB_FORCE_BLANK = 0;
#

That's as good as it got on the last compile. lol

#

Important to get this working, as it means ASCAL will take care of the framebuffer display.

#

Then I can start working on getting burst transfers working, for vertex/param reading, and for writing a tile back to the FB.

#

(using the proper FB_W_SOF1 offset, too)

#

That also means doing away with the need for SDRAM for the framebuffer display.

#

Which in turn means being able to run the core WAY faster.

#

It was only 16MHz before. The DDR controller can accept a much faster clock of 50+ MHz.

#

For tile burst writes, though, it begs the question...

#

Does the real DC arrange the framebuffer in order of tile pixels, or in a more linear order?

#

Because it makes more sense to be able to burst write all 1,024 pixels of a tile at once, into a whole VRAM (SDRAM) page.

#

If the framebuffer address is arranged in a linear fashion, it would mean having to burst only 32 pixels' worth (one tile row), then having to pause for a few clocks to activate the next SDRAM row, then write the next 32 pixels, and so-on.

#

That would add up to quite a few wasted clock cycles for every tile written.

#

This is another thing that could be answered, by hooking up the Logic Analyzer to the DC's VRAM.

#

I guess during framebuffer reads (for display via the DAC), it would still need to read back 32 pixels at once, from each tile.

#

Say it takes four clock cycles to activate each new SDRAM row.

#

That would mean 128 wasted clock cycles per tile write.

#

300 tiles (640x480), so 38,400 clock cycles.

#

Actually, not too bad. lol

#

At 100 MHz, that would be 384 microseconds added, per-frame.

#

(0.384 milliseconds)
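
#

Same sums as a C++ sketch, so the numbers can be changed easily. The four-clock row activate is just the guess from above, and the function names are made up.

```cpp
#include <cstdint>

constexpr uint32_t TILE = 32;   // tile is 32x32 pixels

constexpr uint32_t tiles_per_frame(uint32_t w, uint32_t h) {
    return (w / TILE) * (h / TILE);
}

// One row activation per tile row, if the framebuffer is linear.
constexpr uint32_t wasted_clocks_per_tile(uint32_t activate_clocks) {
    return TILE * activate_clocks;
}

constexpr uint64_t wasted_clocks_per_frame(uint32_t w, uint32_t h,
                                           uint32_t activate_clocks) {
    return uint64_t{tiles_per_frame(w, h)} * wasted_clocks_per_tile(activate_clocks);
}

// Clock count to microseconds, for a clock given in MHz.
constexpr double clocks_to_us(uint64_t clocks, double mhz) {
    return clocks / mhz;
}
```

For 640x480 with four clocks per activate: 300 tiles, 128 clocks per tile, 38,400 clocks per frame, and 384 microseconds at 100 MHz.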

#
Linear FB order...

Tile
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
00000000 11111111 22222222 33333333 44444444 55555555 66666666 77777777 88888888
#
Tile FB pixel order...
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
22222222 22222222 22222222 22222222 22222222 22222222 22222222 22222222 22222222 22222222
etc.
#

Number = Tile ID.

#

(assume it's actually 32 pixels per tile row)
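
#

The two orderings as address maths, in C++. This is only a sketch of the two options; I don't know which one the real PVR2 uses, and the function names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t TILE = 32;   // tile is 32x32 pixels

// Pixel index in a conventional linear (scanline-order) framebuffer.
constexpr uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width) {
    return y * width + x;
}

// Pixel index if each 32x32 tile were stored as one contiguous
// run of 1,024 pixels, tile after tile.
constexpr uint32_t tiled_index(uint32_t x, uint32_t y, uint32_t width) {
    const uint32_t tiles_per_row = width / TILE;
    const uint32_t tile_id = (y / TILE) * tiles_per_row + (x / TILE);
    return tile_id * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}
```

In the linear layout, going from the end of one tile row to the start of the next skips 609 pixels on a 640-wide frame. In the tiled layout it's just the next index, so the whole tile could go out as one burst.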

#

A bit confused.

#

I'm only getting those images, due to loading a different PVR regs file.

#

And somehow landing roughly in the correct place to display the framebuffer from Rayman etc.

#

The last image was with it set to 32BPP, but with a stride of 2560.

#

So that's probably the "correct" setting for the alignment.

#

But I need to tweak ASCAL, to interpret only the upper or lower 16-bits of each 32-bit chunk.

#

No idea why the start of the frame is so far off.

#

Oh yeah. That does make some sense. lol

#

Since each 64-bit word contains twice the number of pixels.

#

I need to adjust the SOF (Start Of Frame) address to account for that.

rain obsidian
#

Yep, I think this next compile might work.

#

Need to do FB_R_SOF1 <<1, basically.

#

You have to double the framebuffer offset, because each 64-bit word contains twice the number of pixels you need.

#

Or basically each 32-bit half of each word, contains the upper and lower 4MB of VRAM. It's a Dreamcast quirk.

#

When I say this compile might "work", I just mean that it might display the framebuffer OK.

#

But this is the pre-rendered framebuffer that reicast did, before dumping the 8MB of VRAM to a file.

#

The test core isn't writing its own rendered tiles to this framebuffer yet.

#

Makes for an interesting texture viewer, though.

#

That looks like maybe some of the decoded frames from the Sonic intro FMV?

ripe stump
rain obsidian
#

Don't tempt them. lol

ripe stump
#

This shit is so cool, ash

rain obsidian
#

Thanks.

#

Colour decoding not fixed yet, but at least the offset is correct.

#

Have to think of the 64-bit WORD address as if it's the 32-bit WORD address.

#

So it's controlling the address for both halves of VRAM at once.

#

Then you have to select the upper or lower 32-bits from each WORD you read back.

#

When it's set to display as 16-bit colour, there are twice as many pixels per word.

#

So you end up with odd and even pairs of pixels, from each of the two framebuffers. Hard to explain. lol

#

You can kind of see it on Sanic's glove.

#

A bit like interlacing artifacts, but horizontally.

#

16bpp ^

#

^ 32bpp, but trying to decode 16bpp from that.

#

The code/logic in ASCAL is too overcomplex for my brain.

#

Grabulosaure is a maths wizard, so it's no wonder.

#
            WHEN OTHERS => -- 32bpp
                IF (N_DW=128 AND (acpt MOD 4)=0) OR (N_DW=64 AND (acpt MOD 2)=0) THEN
                    shift_v:=dr & dr(15 DOWNTO 0);
                ELSE
                    shift_v:=shift(32 TO N_DW+15) & dr(31 DOWNTO 0);
                END IF;
#

I might have to ask Grabulosaure to tweak this. lol

#

It's complex, because he added support for all kinds of pixel formats, and data widths for DDR read/write.

#

I might need to kludge it enough to display nice colours, even if it's showing both framebuffers as "interleaved" atm.

#
WHEN "10" => -- 32bpp
    --RETURN shift(32 TO 119) & pix.r & pix.g & pix.b & x"00";
    RETURN shift(16 TO 119) &
        pix.g(4 DOWNTO 2) & pix.r(7 DOWNTO 3) &    -- TESTING!! Interpret 32-bit as 16bpp, for Dreamcast FB display.
        pix.b(7 DOWNTO 3) & pix.g(7 DOWNTO 5);
#

The original was reading Red, Green, Blue as 8-bits each, then ignoring the last byte.

#

The framebuffer in mem is 16-bit colour, 565 format (usually).

#
Original 32bpp...
[31:24] = Red.
[23:16] = Green.
[15:08] = Blue.
[07:00] = ignored.
#
What I need...

{[31:27], 3'b000} = Red.
{[26:21], 2'b00}  = Green.
{[20:16], 3'b000} = Blue.
8'h00             = Padding / ignore.
#

Or, reading the lower 16 bits instead, for the other 4MB half of VRAM.
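
#

The 565 to 888 expansion from the "What I need" layout, as a C++ sketch (not how ASCAL does it internally; names are made up). It shows both the zero-padded version and the bit-replicated version, plus picking the upper or lower 16 bits out of a 32-bit word.

```cpp
#include <cstdint>

struct Rgb888 { uint8_t r, g, b; };

// Zero-padded expansion: {R5,000} {G6,00} {B5,000}.
Rgb888 rgb565_pad(uint16_t p) {
    return { static_cast<uint8_t>(((p >> 11) & 0x1F) << 3),
             static_cast<uint8_t>(((p >>  5) & 0x3F) << 2),
             static_cast<uint8_t>(( p        & 0x1F) << 3) };
}

// Bit-replicated expansion: top bits repeated into the low bits,
// so full-scale 565 maps to full-scale 888.
Rgb888 rgb565_replicate(uint16_t p) {
    const uint32_t r = (p >> 11) & 0x1F, g = (p >> 5) & 0x3F, b = p & 0x1F;
    return { static_cast<uint8_t>((r << 3) | (r >> 2)),
             static_cast<uint8_t>((g << 2) | (g >> 4)),
             static_cast<uint8_t>((b << 3) | (b >> 2)) };
}

// Pick the 16-bit pixel from the upper or lower half of a 32-bit word.
uint16_t pick_half(uint32_t word, bool upper) {
    return static_cast<uint16_t>(upper ? (word >> 16) : (word & 0xFFFFu));
}
```

With zero padding, white comes out as F8/FC/F8 rather than FF/FF/FF, so the replicated version is probably the nicer one to aim for.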

#
RETURN shift(32 TO 119) & pix.r(31 downto 27)&"000" & pix.r(26 downto 21)&"00" & pix.r(20 downto 16)&"000" & x"00";
#

No idea.

#

hmm. More complex than that.

#

VERY hard to make sense of the ASCAL code.

#

I'll just have to try some tweaks, and keep doing compiles.

#

It's super wasteful of time, but I don't have much brainpower tonight anyway, so will watch YT as it compiles.

pseudo tinsel
#

ASCAL?

rain obsidian
#

Still not right. Gonna take me a while to fix this.

#

The lower image is the crappo-render, from the "core".

rain obsidian
#

Written by Grabulosaure, quite a few years ago now.

#

Originally, Sorg was using the Altera/Intel IP block "VIP" video scaler.

#

Which also meant you had to have a patch for Quartus to use it.

#

(you also need the patch, to be able to use SignalTap, but yeah.)

#

Grab's scaler is so much better than the basic Intel one, as it has had all kinds of features added since.

#

But it's very complex, and written in VHDL, which makes it much harder for me to understand.

#

No idea why the framebuffer(s) on that Toy Commander VRAM dump would contain completely different frames to the vertex data, btw. lol

#

ASCAL has the option to force it to display a framebuffer from DDR3.

#

It can also do that using PAL8 mode, with a separate bus input for writing to the palette.

#

When the Linux framebuffer is enabled (F11 or something, from the main menu core, can't remember), it forces ASCAL to display Linux instead of your own chosen one.

rain obsidian
#

I accidentally created a Dreamcast slideshow viewer, for 3D glasses.

low widget
#

Now, if you made it red and blue, I might have some 3D glasses in my drawer

rain obsidian
#

I just did quite a few tweaks of the core, copying code from the sim version.

#

So it will probably all explode, but we'll see.

#

I haven't tried fixing the colour decoding yet, so it will stay a green+magenta nightmare for a while longer.

#

One of the fixes in the code, was the masking/clamping logic for texture UV.

#

Which was causing the repeated smaller textures on some stuff, like the "KEEP OUT" texture on the screenshot above.

#

(but that's actually the reicast render, which was doing it correctly anyway, aside from the missing poly at the lower-left.)

#

Plus, I added the Z_FRAC_BITS thing to the core, to increase the Z precision, hopefully.

#

Probably not too noticeable if this next compile does work, as the renders on the FPGA look horrible atm. lol

#

It's making more sense, why the deferred rendering thing is a good idea...

#

It means there is less time spent by the chip reading/writing VRAM, as the Hidden Surface Removal essentially removes almost all overdraw.

#

Dividing the image into tiles is a very good idea, too, which is why all modern GPUs do that.

#

It also means that once each tile is rendered (internally), it can be written back to the framebuffer in a single burst.

#

Which again, maximizes the use of Burst transfers and VRAM bandwidth.

#

(technically, I guess rendering the transparent stuff is kind of doing overdraw, but not really.)

surreal sluice
#

I bet someone tripping on acid in 2000 had this Dreamcast experience

rain obsidian
#

Never done it, sadly.

#

My only real chemical education, was during seeing Foo Fighters, live at Glastonbury.

#

#cottonmouth

#

#widepupils

#

One of the best nights of my life. lol

#

Another trippy one…

ripe stump
#

What kind of frame rate are you getting on the De10nano? Fps or spf?

rain obsidian
#

That's kind of a giant todo list atm. lol

#

Because I'm using the old SDRAM controller for the framebuffer.

#

And that controller doesn't use SDRAM burst transfers at all.

#

Which means that for any random Word you want to read or write, it takes around 8 SDRAM clock cycles.

#

The SDRAM itself can only run at around 128 MHz or so.

#

Which in turn means the core clock can only run at 16 MHz currently.

#

(to guarantee it can read/write a Word in SDRAM, and the core will expect the data to be read/written within ONE core clock cycle)
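
#

The clock maths as a C++ sketch, using the rough numbers above (128 MHz SDRAM, around 8 clocks per random Word). The second function assumes one Word access per pixel, which is my simplification, and both names are made up.

```cpp
// Fastest core clock that still gets one SDRAM access done per core clock.
constexpr double max_core_mhz(double sdram_mhz, unsigned clocks_per_access) {
    return sdram_mhz / clocks_per_access;
}

// Time in milliseconds to touch every pixel of a frame once,
// at one non-burst access per pixel.
constexpr double fb_write_ms(unsigned w, unsigned h,
                             unsigned clocks_per_access, double sdram_mhz) {
    return (double(w) * h * clocks_per_access) / (sdram_mhz * 1000.0);
}
```

128 / 8 gives the 16 MHz core clock. Under the one-access-per-pixel assumption, a 640x480 frame would cost roughly 19 ms just in framebuffer writes, before any rendering.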

#

If that makes sense.

#

So the core/renderer is only running at 16 MHz, which is obviously super slow.

ripe stump
#

It makes enough sense.

rain obsidian
#

Not only that, but the core doesn't currently contain any of the speed-up stuff that the sim version does, because it will use up tons of FPGA logic atm.

#

I need to rewrite parts of it, so Quartus properly infers RAM blocks for some stuff, and doesn't use too much logic.

#

The sim version Verilog can process 32 "pixels" at once atm, to update the Tag buffer.

#

I can also simulate RAM blocks of basically any size I need.

#

The FPGA on the DE10 Nano only has around 600-700 KBytes of on-chip mem, so you have to be careful how much you use.

#

So the sim version will be the "Big Daddy" version, to get as much working as possible.

#

Then that stuff will gradually filter down to the FPGA version, assuming any of it will even fit. lol

#

I really screwed up the core just now, and it takes nearly 28 minutes per compile. lol

#

It's a serious killer to development.

#

I need to just "emulate" the typical DDR3 latency in the sim.

#

So I can test everything on the sim first, which takes mere SECONDS to "compile".

#

The core (at 16MHz, with no other speed-up logic) currently takes around 3-4 seconds, just to render one frame.

#

The sim is now hitting 50-70 FPS on average, for typical frame times, but that's based on it (eventually) working at 100 MHz on the FPGA.

#

Which is the ultimate goal, because that's what the real PVR2 chip runs at.

#

I'd be happy with even 20 FPS on the FPGA atm.

#

But no chance of doing real animations using the core, until I can get the TA done.

#

Or, capture whatever the TA in reicast writes to VRAM for each frame, then send that data to the core.

#

Would be a lot like how we tested laxer's PS1 GPU years ago...

#

(although he had the PS1 GPU quite far along at that point, so I was able to send it raw GPU commands, captured from an emulator. I can't even remember which emulator we used. lol)

#

This one looked WAY better, as I would wait for Vblank before writing each frame, so it would lock it to 60 FPS...

#

Or that might have been 30 FPS.

#

Anyway, that's the next goal for the core, but it's a serious killer, waiting for Quartus compiles.

#

It's probably the main reason cores take so long to write.

#

But the truth is - you can simulate almost anything, including SDRAM and DDR3 timings.

#

Even with the core running at only 16 MHz, the sim still takes longer to render a frame, but that's fine.

#

Since the turnaround time for dev is SOOOOO much faster in the sim. lol

maiden granite
#

sim is great to a point, then you get synth/sim mismatch and you realize you can't rely on just developing in sim very quickly

#

to avoid synth/sim mismatch can be laborious

rain obsidian
#

That's true - compared with HDL, it's SO much easier to write a few lines of C code, for most stuff. lol

#

And easier to "access" the data you want in C.

#

You can kind of access stuff from different modules in Verilog / SV as well, but it's not the same.

#

For the vast majority of cores, you have to route signals and busses around, if you want to access specific variables / register values.

#

Synth issues are definitely a thing, especially timing constraints.

#

Which is something I never learned to do properly.

#

But that's usually only an issue when a core becomes unstable, as there isn't enough "slack" for certain signals to propagate.

#

(in time for the next clock cycle)

#

So probably better to just sim as much as I can, including emulating the SDRAM or DDR stuff.

#

And try to tweak the logic for some stuff, so it infers RAMs properly in Quartus, etc.

#

At least it should be quite straightforward to render to an RGB tile buffer first.

#

Then burst-write that to the framebuffer in DDR3.

#

That alone should give a big speed boost, as well as ditching the SDRAM, so I can run the whole core a lot faster.

#

Looks like I was looking in the wrong place in ASCAL, to modify the colour decoding.

#
    FUNCTION shift_opix (shift  : unsigned(0 TO N_DW+15);
                                             format : unsigned(5 DOWNTO 0)) RETURN type_pix IS
    BEGIN
        CASE format(3 DOWNTO 0) IS
            WHEN "0100" => -- 16bpp 565
                RETURN (b=>shift(8 TO 12) & shift(8 TO 10),
                          g=>shift(13 TO 15) & shift(0 TO 2) & shift(13 TO 14),
                          r=>shift(3 TO 7) & shift(3 TO 5));
            WHEN "1100" => -- 16bpp 1555
                RETURN (b=>shift(9 TO 13) & shift(9 TO 11),
                          g=>shift(14 TO 15) & shift(0 TO 2) & shift(14 TO 15) & shift(0),
                          r=>shift(3 TO 7) & shift(3 TO 5));
            WHEN "0101" | "0110" =>  -- 24bpp / 32bpp
                --RETURN (r=>shift(0 TO 7),g=>shift(8 TO 15),b=>shift(16 TO 23));
                RETURN (b=>shift(8 TO 12) & shift(8 TO 10),                                            -- TESTING !!!!!!
                          g=>shift(13 TO 15) & shift(0 TO 2) & shift(13 TO 14),
                          r=>shift(3 TO 7) & shift(3 TO 5));
            WHEN OTHERS =>
                RETURN (r=>shift(0 TO 7),g=>shift(8 TO 15),b=>shift(16 TO 23));

        END CASE;
    END FUNCTION;
#

ASCAL has logic for decoding colour, for both the input and output stuff.

#

The input, ie. when you shove RGB plus Syncs into ASCAL, from a core.

#

But in Framebuffer mode, it's not doing that. It's just reading the pixels directly from DDR3, then scaling and displaying that.

#

The code above is hopefully what I needed to tweak.

#

I probably still have it wrong, but I'll just have to keep doing more compiles, until I get further.

#

It's basically doing...

#
wire [7:0] blue  = {shift[12:08], shift[10:08]};
wire [7:0] green = {shift[15:13], shift[02:00], shift[14:13]};
wire [7:0] red   = {shift[07:03], shift[05:03]};
#

Ahh, Verilog.

#

So much nicer. lol

#

Not sure how the bit order works with the VHDL shift function, but hey.
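
#

On the bit order: `shift` is declared as `unsigned(0 TO N_DW+15)`, and with an ascending range like that, index 0 is the leftmost (most significant) bit. So `shift(8 TO 12)` isn't the same bits as `shift[12:08]` on a normal descending Verilog vector. Here's a C++ sketch that transcribes the 565 branch literally with that numbering, applied to the first 16 bits of `shift`. Which byte of the pixel actually ends up in `shift(0 TO 7)` depends on how ASCAL fills it, which I haven't worked out, so treat this as a model of the indexing only.

```cpp
#include <cstdint>

// Bits a..b (inclusive, a <= b) of a 16-bit value, using VHDL "0 TO 15"
// numbering, where index 0 is the most significant bit.
constexpr uint32_t slice_to(uint16_t v, int a, int b) {
    return (v >> (15 - b)) & ((1u << (b - a + 1)) - 1u);
}

struct Rgb888 { uint8_t r, g, b; };

// Literal transcription of the "16bpp 565" branch of shift_opix.
Rgb888 opix_565(uint16_t v) {
    const uint32_t r = (slice_to(v, 3, 7)   << 3) | slice_to(v, 3, 5);
    const uint32_t g = (slice_to(v, 13, 15) << 5) | (slice_to(v, 0, 2) << 2)
                     |  slice_to(v, 13, 14);
    const uint32_t b = (slice_to(v, 8, 12)  << 3) | slice_to(v, 8, 10);
    return { static_cast<uint8_t>(r), static_cast<uint8_t>(g),
             static_cast<uint8_t>(b) };
}
```

With this numbering, red comes from the low five bits of the first byte, blue from the top five bits of the second byte, and green is split across the two, which looks like a byte-swapped 565 word.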

#

But that's decoding 565 format, supposedly.

#

(I needed to copy that into the 32-bit section, so the shifts etc. are correct for 32-bit, but it decodes 16-bit colour from that.)

rain obsidian
#

One issue mostly solved.

#

A different problem created. lol

#

Did NOT know Claude can do this. 😮

#

Super handy.

#

I think I see now, finally.

#

That shift_opack function, selects the group of bits from the whole 128-bit wide data from DDR3.

#

The shift_opix part I modified just now, does the actual pixel colour decoding, from the data supplied from shift_opack.

#

(I would imagine "opix" means "output pixel")

#

I should be able to comment some of that out, leaving only this...

#
shift_v:=dr & dr(15 DOWNTO 0);
#

So it only selects the lower 16 bits of each 32-bit word.

#

Actually, that might still be wrong. lol

#

I'm trying to get rid of the interleaving of the two different framebuffers.

#

Since they are technically placed alongside each other, within each 64-bit word.

#

That might actually be two pixels from each buffer.

#

The vertical lines just look super skinny, as I also had to set the stride value to 2560 (four times 640).

#

Don't know. I'll just have to try it.

#

I know this is super slow work, but it's the only way I know how.

#

New Avatar prequel, confirmed.

ripe stump
#

Is this a sim or on device?

rain obsidian
#

Hard to explain. lol

#

On the FPGA, but... very importantly, this is displaying the Framebuffer that was already pre-rendered by reicast.

#

I can switch back to what the FPGA itself renders, but the colours are messed up atm...

#

Displaying the frame that reicast already rendered is obviously "cheating".

#

But I'm just trying to display it for now, so I can then do the FPGA render into that framebuffer, like it's meant to be.

#

Then I can ditch SDRAM altogether, which I'm using purely for the FPGA framebuffer atm.

#

Whenever you see a photo of the LCD, it's on MiSTer.

#

I can do the same thing in the sim now.

#

This frame took the sim 9 seconds to render...

#

If I hit the SOF1 button, it switches to displaying the frame within the VRAM dump that reicast had already pre-rendered.

#

Which is also good for comparisons, ofc.

#

Although the frame from reicast is usually one frame behind the new vertex data.

#

You can kind of see at the lower left, where the missing chunk of road is moved further back, on the sim render version.

#

So the top Daytona image is rendered by the Verilog itself, using only the vertex params and texture data.

#

The bottom image was what reicast had already done, before the 8MB of VRAM was dumped to a file.

#

I'm very pleased with how things are lining up so far.

#

If you flip between those two frames, you can see it move slightly.

#

No texture filtering on the Verilog version, ofc.

#

And no transparencies, for certain types. That's something that got "broken" when I moved to using a proper Tag buffer.

#

The Tag buffer was responsible for some of the biggest (theoretical) speed-ups so far.

#
    //load_vram_dump("_rayman_level");     // 112.35 FPS   112.17 FPS with CB cache.
    //load_vram_dump("_xtreme_intro");     //  17.25 FPS    36.04 FPS with CB cache.
    //load_vram_dump("_daytona_intro");    //  26.76 FPS    44.00 FPS with CB cache.
    load_vram_dump("_daytona_behind");     //  39.46 FPS    60.36 FPS with CB cache.
#

Then I added the Codebook cache, and it improved it further.

#

Well, for most of the renders. lol

#

I will try to be very clear about what this is, going forward.

#

The main goal atm is to just get the Verilog to render tiles directly into VRAM in DDR3, similar to the real PVR2.

#

The stride value of 2560 also makes sense now...

#
The stride value is in BYTES.
But the framebuffer is 16BPP (two bytes per pixel).
So the stride value would normally be 1280.

But, I have VRAM as the two 4MB halves, split across each 64-bit Word from DDR3...

DDR Data [63:0]...

[63:48]=Upper_4MB,[47:32]=Upper_4MB, [31:16]=Lower_4MB,[15:00]=Lower_4MB.
#

Two pixels from the Upper 4MB half of VRAM, two from the Lower 4MB half.

#

Four pixels per 64-bit word. 8 Bytes.
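
#

Rough C++ sketch of that stride arithmetic, assuming the layout above (each 64-bit DDR3 word holds two 16BPP pixels from each 4MB half, and a scanline lives entirely in one half). The constant and function names are made up for illustration.

```cpp
#include <cstdint>

constexpr uint32_t kBytesPerDdrWord      = 8;  // 64-bit DDR3 word
constexpr uint32_t kPixelsPerHalfPerWord = 2;  // two 16BPP pixels from each 4MB half

// Stride for a normal, tightly packed framebuffer.
constexpr uint32_t packed_stride_bytes(uint32_t width_px, uint32_t bytes_per_px) {
    return width_px * bytes_per_px;
}

// Stride when one scanline only uses its own half of every 64-bit word:
// every two pixels of the line cost a whole 8-byte DDR3 word.
constexpr uint32_t interleaved_stride_bytes(uint32_t width_px) {
    return (width_px / kPixelsPerHalfPerWord) * kBytesPerDdrWord;
}

static_assert(packed_stride_bytes(640, 2) == 1280, "normal 16BPP stride");
static_assert(interleaved_stride_bytes(640) == 2560, "stride with the split VRAM layout");
```

So the 2560 is just 320 DDR3 words per line, times 8 bytes per word.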

#

^ Frame rendered by the Verilog.

#

Switching over to the frame pre-rendered by reicast, which is still contained in the VRAM dump...

#

I know it's missing every other pixel now, but progress.

#

Or probably every two pixels.

#

Crusty reicast frame display, in PS1 mode...

#

Crustier Verilog rendered frame...

pseudo tinsel
rain obsidian
#

So the weird interleaving in some screenshots was from it displaying two pixels from the lower FB, then two from the upper FB.
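
#

A small C++ sketch of pulling the two halves apart, going by the DDR Data [63:0] layout noted earlier. Which pixel of each pair comes first on screen is a guess here, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Assumed layout of one 64-bit DDR3 word:
//   [63:48], [47:32] = two 16BPP pixels from the upper 4MB half
//   [31:16], [15:00] = two 16BPP pixels from the lower 4MB half
// Element 0 is the lower-numbered bit field; the on-screen order is not confirmed.
inline std::array<uint16_t, 2> lower_half_pixels(uint64_t ddr_word) {
    return {static_cast<uint16_t>(ddr_word & 0xFFFF),
            static_cast<uint16_t>((ddr_word >> 16) & 0xFFFF)};
}

inline std::array<uint16_t, 2> upper_half_pixels(uint64_t ddr_word) {
    return {static_cast<uint16_t>((ddr_word >> 32) & 0xFFFF),
            static_cast<uint16_t>((ddr_word >> 48) & 0xFFFF)};
}
```

Displaying all four fields of each word in a row gives the "two from one FB, two from the other" pattern; displaying only one half gives a clean image from that framebuffer.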

#

I've just added the logic to hopefully get it to write the (verilog) renders back to VRAM in DDR3, instead of using SDRAM.

#

Once that's working, I should be able to ditch the SDRAM, then run the core at a much higher freq.

#

At least 50 MHz, I reckon.

#

It's only at 16 MHz atm, due to the SDRAM, and the crusty SDRAM controller.

#

Figuring out the addresses for DDR can be interesting.

#
wire [28:0] DDRAM_BASE = (32'h32000000 >>3);    // 800MB. (DDRAM_BASE is the 64-bit WORD address!)

// Limit the write/read addresses to 4MB!
wire [28:0] dl_word_addr   = DDRAM_BASE + ioctl_addr[21:2];
wire [28:0] vram_word_addr = DDRAM_BASE +  vram_addr[21:2];
wire [28:0] fb_word_addr   = DDRAM_BASE +  FB_R_SOF1[21:2] + fb_addr[21:1];
#

DDR data in/out is 64-bit wide.

#

You can use most of the upper 512MB of DDR for FPGA stuffs.

#

The whole lower 512MB (AFAIK) is used for the ARM/Linux.

#

ASCAL usually puts its own framebuffers (for upscaling) at the 512MB point in DDR.

#

So I've shoved the DC 8MB VRAM up at 800MB.

#

Then need to figure out how many LSB bits of the addresses to ditch. It gave me a headache again.
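
#

A C++ model of the address slicing above, handy for sanity-checking before a build. It only mirrors the bit slices from the Verilog. The 1GB map (lower 512MB for Linux, upper 512MB for the FPGA) is taken from the chat and treated as an assumption, and the function names are made up.

```cpp
#include <cstdint>

constexpr uint32_t kVramBaseByte = 0x32000000;          // 800MB, same as DDRAM_BASE
constexpr uint32_t kVramBaseWord = kVramBaseByte >> 3;  // 64-bit WORD address

// Mirrors DDRAM_BASE + vram_addr[21:2]: keep a 4MB byte range and drop the
// two LSBs, so each 32-bit VRAM word lands in its own 64-bit DDR3 word.
constexpr uint32_t vram_word_addr(uint32_t vram_byte_addr) {
    return kVramBaseWord + ((vram_byte_addr >> 2) & 0xFFFFF);
}

// Mirrors DDRAM_BASE + FB_R_SOF1[21:2] + fb_addr[21:1].
constexpr uint32_t fb_word_addr(uint32_t fb_r_sof1, uint32_t fb_addr) {
    return kVramBaseWord + ((fb_r_sof1 >> 2) & 0xFFFFF) + ((fb_addr >> 1) & 0x1FFFFF);
}

// Assumption: 1GB of DDR3, with only the upper 512MB safe for the FPGA to touch.
constexpr bool in_fpga_region(uint32_t word_addr) {
    return (static_cast<uint64_t>(word_addr) << 3) >= 0x20000000ULL &&
           (static_cast<uint64_t>(word_addr) << 3) + 8 <= 0x40000000ULL;
}

static_assert(kVramBaseWord == 0x6400000, "0x32000000 >> 3");
static_assert(in_fpga_region(vram_word_addr(0x3FFFFC)), "top of VRAM stays above Linux");
```

With that slicing, the 4MB address range covers 8MB of DDR3 (0x32000000 to 0x327FFFFF), since every 64-bit word carries 32 bits from each half.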

#

It is entirely possible to stomp over Linux memory in DDR, if you're not careful. lol

#

Then the DE10 will usually crash after loading the core, and the OSD menu freezes.

#

The really low-res screenshots were me trying to ditch the upper pair of pixels from the VRAM dump FB.

#

At some point soon, I might even have to blank the reicast framebuffers from the VRAM dumps.

#

So people don't get the wrong idea, and think I'm trying to fake the core. lol

pseudo tinsel
#

ah so you've made the '64 bit' view the native one?

rain obsidian
#

Hard to explain without a vid.

#

Yep, native reicast Framebuffer is displayed first.

#

But kind of interleaved, because 64-bit data, but 32-bit lower and upper 4MB within that same 64-bit word.

#

Now I just got writeback working, so it immediately gets overwritten with the Verilog render.

#

The pixel order is swapped again atm, but oh well.

#

The same image is being written to SDRAM at the same time, but I'm about to disable that.

#

It does mean that the core will only work with HDMI for the time being, but that's fine.

#

(since it won't have a normal RGB video output from the core, from when it was using SDRAM as the Framebuffer.)

#

I believe it is still possible to get RGB/VGA output from a core like that, by using ASCAL to display at a 15 kHz or 31 kHz mode, and route to the RGB port.

#

Tell a lie, the core was only running at 15 MHz all this time. Not even 16 MHz.

#

But now I can try ditching all of that, bypass the PLL, and use the DE10's 50MHz clock directly...

#
wire clk_sys = CLK_50M;    // SDRAM Framebuffer disabled now - Writing direct to DDR3.
                           // So we can finally speed up the core!
/*
wire clk_sys;
wire clk_ram;
wire locked;

pll pll
(
    .refclk(CLK_50M),
    .rst(0),
    .outclk_0(clk_sys),
    .outclk_1(clk_ram),
    //.reconfig_to_pll(reconfig_to_pll),
    //.reconfig_from_pll(reconfig_from_pll),
    .locked(locked)
);
*/
#

clk_sys was the original 15 MHz.

#

clk_ram was the original 120 MHz for the SDRAM.

#

Core is using 61% atm. Be interesting to see what happens with the higher clock freq.

#

(Quartus generally needs to be told the clock freq of an incoming oscillator, via the SDC file. From that, it will figure out the various signal propagation delays, and will try harder to reach that freq, so compilation might take longer.)

#

Apparently not.

#
Error (15836): inclk[3] port of Clock Select Block "hdmi_clk_sw" is driven by FPGA_CLK2_50~input, but must be driven by a PLL's output clock; clock pins should be moved to inclk[0] or inclk[1]
#

It gets angry, if the main core clock isn't driven from a PLL.

#

Fairly sure the Clock Select block was added for Direct Video mode.

#

(native core video output, directly via HDMI, for the cheapo HDMI-to-RGB dongles -> CRT.)

#

I just had a brainfart...

#

If I can get reicast or similar DC emu to run on the ARM, it could write directly to VRAM in DDR3, from the Linux side.

#

So would be super fast to load textures etc.

#

Then it would just need to write the usual stuff that the TA writes, then trigger the render.

rain obsidian
#

Core running at 50 MHz now, but getting stuck.

#

It's doing the texture read, but sometimes getting stuck, waiting for the data to arrive.

#

It waits for DDRAM_DOUT_READY.

#

It's not like DDRAM_BUSY ever goes high.

#

But this has happened on previous builds, at the much lower freq of 15 MHz.

#

So it's like the DDR controller isn't always seeing the Read request.

inland valve
#

@rain obsidian
Hello,
I tried this morning to run the core Saturn_STV_20241231, but I get a black screen.
I followed your instructions to create the game ROM for sasissu_stv and tested with several BIOS files from the MAME archive.
In the OSD, the Cartridge option is correctly set to ST-V.
Could you please help me identify what might be causing the issue?