#Sega Dreamcast
Core running at 30 MHz again. I tried it at 40 MHz, but there were quite a few pixel glitches, and vert lines.
A lot of the slow-down is actually the emulator atm, as the above video is with the emu not waiting for the FPGA to finish each frame
skmp did a tweak of the emu, to enable the use of "Fastmem".
But I haven't tried that yet.
ARM core(s) are at the default 800 MHz atm.
I did run the Overclocking scripts before, but I don't know if they continue working after a power cycle. I guess not?
I just tried the 1.2 GHz overclock script, and it instantly crashed. lol
Trying 1 GHz.
```sh
/root# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
1000000
1000000
```
Nope, those overclock settings don't stick, after loading the core.
Just had to run the script from the shell.
Trying C Taxi.
Running at about 28% full speed atm.
(at the BIOS logo)
Some weirdness going on. I think frame sync is still enabled.
Did a make clean.
That's more like it. Emu runs at 100% speed, at the BIOS logo.
In fact, maybe a bit too fast. lol
40% at the Sega logo.
Around 8-16% in-game.
So not great... yet.
The ARM cores aren't as fast as I was hoping.
I'll try the Fastmem version now...
oic, it's on a separate branch.
```sh
export CC='/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-gcc'
export PATH=$PATH:/opt/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin
cmake -DCMAKE_TOOLCHAIN_FILE=tc.cmake ..
```
oww
Might need to add -lrt in CMakeLists
set(CMAKE_CXX_FLAGS "-lrt")
It should be clear by now - I have no idea what I'm doing, when it comes to compilers. lol
Last build didn't work.
librt still missing.
And it would have to be the ARM version, I would think?
Same as the old version, so maybe the whole vmem_posix thing isn't meant to be enabled, for the ARM build?
Doh! lol
Compiling.
booo!
${swrl}/hw/mem/_vmem.cpp
Needs to use the erm, embedded version of _vmem, rather than the x86/x64 thingy. Dunno.
Nope
That _vmem file was already included, elsewhere in the CMakeLists file.
```cpp
int main(int argc, char* argv[])
{
#if HOST_OS == OS_LINUX
    void common_linux_setup();
    common_linux_setup();
#endif
    set_user_config_dir(".");
    set_user_data_dir(".");
    add_system_config_dir(".");
    add_system_data_dir(".");
    ParseCommandLine(argc, argv);
    cfgOpen();
    libswirl_init();
    libswirl_loop(argc == 1 ? "" : argv[1]);
    return 0;
}
```
Old version doesn't have that common_linux_setup stuff.
So I'll give up for now, and give skmp a chance to take a look.
Actually, I'll try using the new _vmem files in the old build.
Maybe not. Way too many changes...
Gonna leave it for tonight, sorry.
Was just hoping to see how fast the emu ran with Fastmem.
(with or without the FPGA rendering running)
DC logo, with the ARM set at the default 800 MHz...
```
VREG = 03 ARMRST 00
ARM7: Invalidating cache
SetWindowText: reicast git/n - 35.46 - 28.19 - V: 16.90 (1.41, NTSC480i59.94) R: 11.93+0.00 VTX: 0.00 , MIPS: 0.00
SetWindowText: reicast git/n - 11.38 - 87.80 - V: 52.63 (1.01, NTSC480i59.94) R: 51.64+0.00 VTX: 0.00 , MIPS: 0.00
SetWindowText: reicast git/n - 11.29 - 88.56 - V: 53.09 (1.03, NTSC480i59.94) R: 51.10+0.00 VTX: 0.00 , MIPS: 0.00
```
ARM set to 1 GHz...
```
VREG = 03 ARMRST 01
VREG = 03 ARMRST 01
VREG = 03 ARMRST 00
ARM7: Invalidating cache
SetWindowText: reicast git/n - 30.19 - 33.12 - V: 19.85 (1.37, NTSC480i59.94) R: 14.39+0.00 VTX: 0.00 , MIPS: 0.00
SetWindowText: reicast git/n - 9.18 - 108.83 - V: 65.24 (1.01, NTSC480i59.94) R: 64.24+0.00 VTX: 0.00 , MIPS: 0.00
SetWindowText: reicast git/n - 9.11 - 109.76 - V: 65.79 (1.03, NTSC480i59.94) R: 63.30+0.00 VTX: 0.00 , MIPS: 0.00
```
I think the second number is the percentage speed, vs a real DC?
So that's a decent speed-up. About what is expected for the 25% increase in ARM clock freq.
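A quick check of that, as a C++ sketch: it pulls the first two numbers out of `SetWindowText` lines like the ones above and compares the speed-up with the clock increase. It assumes the second field is the emulated speed as a percentage of a real DC (the guess above, not confirmed here), and `parse_speed` is a hypothetical helper, not part of reicast.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helper: extracts the first two numeric fields from a reicast
// status line such as
//   "SetWindowText: reicast git/n - 11.38 - 87.80 - V: 52.63 ..."
// Assumption: the second field is the emulated speed as a % of a real DC.
bool parse_speed(const char* line, double* first, double* percent) {
    return std::sscanf(line, "SetWindowText: reicast git/n - %lf - %lf",
                       first, percent) == 2;
}

// Ratio of measured emulation speed between two runs.
double speedup(double percent_before, double percent_after) {
    return percent_after / percent_before;
}

// Ratio of the ARM clock frequencies (same units for both).
double clock_ratio(double mhz_before, double mhz_after) {
    return mhz_after / mhz_before;
}
```

Feeding it the steady-state lines (87.80 at 800 MHz, 108.83 at 1 GHz) gives a speed-up of about 1.24 against a clock ratio of 1.25, so in this test the emu scales almost linearly with the ARM clock.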
Need to do some profiling of the emu, when running on the ARM.
Don't get me wrong, it's amazing to see it running at all.
But I'm wondering which part of the code takes the longest to execute.
Seeing that there's no actual rendering being done by the emu, and no sound stuff enabled.
Or maybe the ARM is still there, as it has to give feedback to the games, to keep them happy?
I think the basic summary is - the ARM cores on the Cyc V suck. lol
We'll just have to call it "Lazy Taxi".
Oh dear...
```sh
/media/fat/reicast# ../Scripts/mister_mem_oc.sh
Current frequency is 800000 KHz
Frequency successfully set to 1000000 KHz.
/media/fat/reicast# ../Scripts/mister_mem_timings.sh
***** BEFORE *****
tCL=7, tRP=6, tRCD=6, tRAS=14
tRFC=120, tFAW=15, tRRD=3, tAL=0, tCW=7
tWTR=4, tWR=6, tREFI=3120
tCCD=4, tMRD=4, tRC=20, tRTP=3
Min Power Save Cycles=0, tXPDLL=3, tXS=512
```
mister_mem_oc seemed to work.
mister_mem_timings, not so much.
The whole screen went wonky, then it rebooted. lol
A nice extra speed boost.
(DC logo)
So that's with both the ARM and DDR3 set to 1 GHz.
But not the mem timings, as that probably only works for slower mem freq?
I spoke too soon…
Makes me wonder about whether a heatsink for the RAM would help.
ARM core(s) overclocked to 1 GHz. Memory left at 800 MHz, else it explodes.
FPGA renders not in-sync with the emu, for the above vid. ^
Scenes with mostly flat-shaded polys render quite fast now.
Emu in-sync (waits for) FPGA renders...
So the FPGA is still a fair bit slower than the emu is running.
Maybe ~10 FPS, at 30 MHz.
It should be in-sync, but it now looks like it's displaying the backbuffer.
Hence the gaps in the HUD, etc.
The emu is running at 30% speed on that BIOS menu, with "wait for FPGA render" turned off.
5.12 milliseconds (per frame) are being wasted atm, waiting for the Tile writeback.
When it could be double-buffered, so it can render the next Tile whilst writing the previous one.
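The 5.12 ms figure lines up with a simple model, assuming a 640×480 frame split into PVR2's 32×32 tiles (20 × 15 = 300 tiles), the 512-clock burst per tile writeback mentioned further down, and the 30 MHz core clock. The sketch below also estimates what ping-pong (double-buffered) tile buffers would save; the per-tile render cost is a made-up parameter, so treat those numbers as illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of tiles covering a frame (partial tiles rounded up).
uint32_t tile_count(uint32_t width, uint32_t height, uint32_t tile = 32) {
    return ((width + tile - 1) / tile) * ((height + tile - 1) / tile);
}

// Time spent purely on tile writebacks, in milliseconds.
double writeback_ms(uint32_t tiles, uint32_t cycles_per_tile, double clk_hz) {
    return tiles * static_cast<double>(cycles_per_tile) / clk_hz * 1000.0;
}

// Total clocks to render + write back a frame.
// Single buffer: each tile renders, then stalls while it is written back.
// Double buffer: tile N+1 renders while tile N is written back, so after the
// first render each step costs max(render, write), plus the final writeback.
uint64_t frame_cycles(uint32_t tiles, uint32_t render, uint32_t write,
                      bool double_buffered) {
    if (tiles == 0) return 0;
    if (!double_buffered)
        return static_cast<uint64_t>(tiles) * (render + write);
    return render + static_cast<uint64_t>(tiles - 1) * std::max(render, write) + write;
}
```

Whenever rendering a tile takes at least as long as writing one back, the double-buffered version hides all but the last writeback, i.e. it saves (tiles - 1) × 512 clocks, or just over 5.1 ms per frame at 30 MHz.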
Trying direct FB writes, bypassing the Tile ARGB buffer.
Doesn't work too well on the sim. lol
Fixed.
I was hopping over too many states in isp_state, trying to get stuff to run faster.
But forgetting that it skipped most of the FB writes, because I'm bypassing the Tile ARGB buffer.
Had to restart the Quartus compile.
This is just a test, to see how much faster it can render, when it's not using the Tile buffer.
So it won't be able to do Alpha blending etc. but will write the pixel colour directly to the FB.
With the in-sync thing enabled, the BIOS menu is doing around 12% fullspeed.
Which is pretty good, considering it's at 30 MHz.
The end goal being 100 MHz, which will be tough, but we'll see.
Takes forever to get into Sanic.
Because the FMV plays so slowly.
The FMV won't actually display anything yet, because I haven't implemented the YUV thing.
You can see how much slower it is, with textures enabled.
So the texture cache is the next thing on the list.
lol
I really dislike unskippable intros.
With textures disabled, it's doing roughly 5-12 FPS.
With textures on, it's only about 3-4 FPS.
Sleeping soon. zzzzz
I know you keep saying the Dreamcast goal is 100 MHz, but considering the original CPU was 200 MHz, do you think it's possible to run at full speed?
Currently only the GPU is being recreated in FPGA, with the rest running on a cut-down emulator on the ARM side. The GPU's original speed was 100 MHz.
The FPGA is mostly filled with the GPU, so fitting both GPU and CPU is out of the question for MiSTer.
Thus the only chance will be a hybrid approach with the FPGA just being the GPU and perhaps the sound chip. And the CPU and the rest being emulated by the ARM cores (a mini version of reicast currently running on them)
If this turns out to be too slow I suggested a "half-rez" SD mode, but ElectronAsh isn't convinced yet of its necessity.
(which is a good sign)
If a possible SD mode is too slow, then DC is simply not feasible at all on MiSTer. But the PowerVR2 research could be used for a next-gen MiSTer.
Yeah I assumed this was more of a research / experimental thing that could maybe pay off on a bigger board but I haven’t been following that long
There should be a feature where the Dreamcast core makes obnoxious disc drive noises.
I want my games to be grinded into dust for the authentic Dreamcast experience.
Only way that might happen, is if some of the uber devs could collab on this.
As there might be ways of getting an SH4 core to execute more instructions in parallel.
Where it could then run at a lower freq, and still be fast enough for most games.
But right now, the SH4 is never going to fit alongside the GPU, sound, and all of the other logic.
Unless somebody knows how to massively reduce the logic used for the inTri and interp blocks.
I'm convinced it's possible, I just suck at maths.
Not even ChatGPT knows quite how to do it. lol
I do think quite a few people would be interested in a custom FPGA platform, with a slightly larger/newer FPGA (like Agilex), and a real SH4 CPU + SDRAM.
But at that point, you almost might as well run a Dreamcast or NAOMI. lol
The same FPGA platform would be able to run tons of the existing cores, though.
It just needs enough IO, vs the DE10.
The SH4 "hat" board I built, uses up both 40-pin GPIO headers, and even a few of the Arduino header signals.
So it can't even be used alongside an SDRAM module.
(although the hat board does have its own SDRAM for the SH4 to use.)
I really do think the GPU logic could be shrunk down a lot, I just don't know enough about the maths etc., to say for sure.
On the face of it, there really isn't THAT much other active logic for the GPU.
I did have that starting to run a bit of BIOS code, but it would need tons of other registers and logic, to get it to run much further.
ie. to the point of it starting to write display lists into VRAM.
I looked on the MAME debugger a few weeks back, to see what it accessed first.
I can't remember exactly what now. lol
!
Isn't the memory bandwidth needed also above what MiSTer FPGA can provide?
There should be plenty of bandwidth with DDR3. The main hurdle is the initial latency.
You can only really mitigate the latency by adding caches for the most frequently-accessed stuff, like textures.
So you Burst-read say 32 Words of data at a time, into the cache.
You still have that initial handful of clock cycles of latency, but after that, the data is read pretty much contiguously from that small chunk of DDR3.
Once the data is in the cache, it's basically zero latency.
ie. The address changes, then the data is available on the very next clock cycle.
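A rough cycle-count model of that idea: a miss pays the initial latency plus one clock per word of the burst, and a hit costs one clock. The latency (7 clocks) and line size (32 words) come from the numbers in this thread; the direct-mapped layout and the `BurstCache` class are assumptions for illustration, not how the core's caches are actually built.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped cache model: each line holds line_words 64-bit
// words, fetched with one DDR3 burst read.
class BurstCache {
public:
    BurstCache(uint32_t lines, uint32_t line_words, uint32_t latency)
        : tags_(lines, UINT32_MAX), line_words_(line_words), latency_(latency) {}

    // Clocks spent to access one word at word address 'addr'.
    uint32_t access(uint32_t addr) {
        uint32_t line = addr / line_words_;
        uint32_t slot = static_cast<uint32_t>(line % tags_.size());
        if (tags_[slot] == line) return 1;   // hit: data on the next clock
        tags_[slot] = line;                  // miss: refill the whole line
        return latency_ + line_words_;       // initial latency, then one word per clock
    }

private:
    std::vector<uint32_t> tags_;
    uint32_t line_words_;
    uint32_t latency_;
};

// Total clocks to read 'count' consecutive words starting at 'start'.
uint64_t sequential_read_cycles(BurstCache& c, uint32_t start, uint32_t count) {
    uint64_t total = 0;
    for (uint32_t i = 0; i < count; ++i) total += c.access(start + i);
    return total;
}
```

Reading 64 consecutive words from a cold cache costs 2 × 39 + 62 = 140 clocks in this model, versus roughly 64 × 7 = 448 if every word paid the full latency.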
DDR3 on the DE10 usually runs at 400 MHz, and is 32-bit wide.
400 MHz * 4 Bytes at a time = 1,600 MB/sec.
But it's DDR, so transfers data on both the rising and falling edges of the clock.
So the peak transfer rate is 3,200 MB/s (3.2 GBytes/sec !!)
(or about 2.98 Gibibytes/sec, i.e. ~3,052 Mebibytes/sec, or whatever the stupid name is for it now. lol)
The SDRAM used as VRAM on the Dreamcast runs at 100 MHz.
16MB main memory, 8MB graphics memory and 2MB sound on Dreamcast, right?
Only half is read/written at a time, for parameters (Vertex and other data), so 400 MB/s.
For textures, it's read as the full 64-bit wide data, so 800 MB/s, peak.
Yep.
NAOMI is 32MB main, 16MB VRAM, 8MB sound.
I guess they needed quite a lot for sound samples on NAOMI, as it can't use CD/GD audio?
So, plenty of bandwidth for DC.
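The peak-bandwidth figures above, as a small C++ helper. These are theoretical peaks only (clock × bus width × transfers per clock); real throughput is lower once latency, refresh and bus turnaround are counted.

```cpp
#include <cassert>
#include <cmath>

// Peak bandwidth in MB/s (10^6 bytes per second).
double peak_mb_per_s(double clock_mhz, unsigned bus_bits, unsigned transfers_per_clock) {
    return clock_mhz * (bus_bits / 8.0) * transfers_per_clock;
}

// Convert MB/s (10^6 bytes) to GiB/s (2^30 bytes).
double mb_to_gib(double mb_per_s) {
    return mb_per_s * 1e6 / (1024.0 * 1024.0 * 1024.0);
}
```

`peak_mb_per_s(400, 32, 2)` gives the 3,200 MB/s DDR3 figure (about 2.98 GiB/s via `mb_to_gib`), while `peak_mb_per_s(100, 64, 1)` gives the 800 MB/s texture peak of the DC's VRAM: a 4× margin, which fits the point that latency, not raw bandwidth, is the hurdle.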
The latency of DDR seems to be a bit sucky, though.
With the core running at 30 MHz atm, it's taking roughly 5-7 clock cycles of latency to read any random word from DDR3.
For every single word (64-bit).
If the core eventually runs at 100 MHz (which could be very hard), those 7 (max) clock cycles would turn into about 23 (7 × 100 / 30 ≈ 23).
(from the POV of the core, vs DDR3 running at the same 400 MHz)
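If the DDR3 latency is a fixed amount of wall-clock time (an assumption; the controller may behave differently, as discussed just below), converting it between core clocks is just a ratio:

```cpp
#include <cassert>
#include <cmath>

// Latency in nanoseconds for a number of core clocks at core_mhz.
double cycles_to_ns(double cycles, double core_mhz) {
    return cycles * 1000.0 / core_mhz;
}

// The same wall-clock latency expressed in clocks of another core frequency,
// rounded up, because a partial clock still costs a whole one.
unsigned rescale_cycles(double cycles, double from_mhz, double to_mhz) {
    return static_cast<unsigned>(std::ceil(cycles * to_mhz / from_mhz));
}
```

7 clocks at 30 MHz is about 233 ns, which works out to about 23.3 clocks at 100 MHz, so 24 whole clocks in practice.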
I don't know why it sucks quite as much as that. lol
Or I could be way off, and the DDR controller will handle it, and the latency isn't that bad.
The old-people (SDR) SDRAM module needs 8 clock cycles to read/write any random word.
But each module is only 16-bit wide.
Yeah, it's weird. I lack knowledge on how MiSTer works, but DDR3 at 400 MHz should theoretically be about 10 ns; must be something to do with how it's shared on MiSTer.
So maybe the DDR3 is always roughly 7-8 clocks max, even at a faster core freq. It's a good question.
What we do know is, even with ARM Linux sharing DDR3, the BUSY signal almost never gets asserted.
I can Burst write tiles into the DDR3 Framebuffer, for 512 clock cycles, and only see BUSY asserted for one or two clock cycles.
Still, with the core at only 30 MHz, though.
I really need to do some more calcs on how to speed it up, though.
I didn't think it would be quite as hard as this, to get closer to the same speed as the sim.
I need to add a texture cache next, even if it's quite a simple one.
The Codebook cache is pretty much already in place, and it Burst-reads 256 Words at a time (64-bit).
Every time the texture Base address changes, as that's where the VQ Codebook sits.
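The reload rule for the Codebook cache can be sketched like this. A PVR2 VQ codebook is 256 entries of 2×2 16-bit texels (8 bytes each), which is where the 256 × 64-bit burst comes from. `read_burst` is a hypothetical stand-in for the DDR3 burst read, and the class is a software model, not the core's Verilog.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Software model of a VQ codebook cache: 256 entries, each one 64-bit word
// holding a 2x2 block of 16-bit texels.
class CodebookCache {
public:
    using BurstRead = std::function<void(uint32_t addr, uint64_t* dst, uint32_t words)>;

    explicit CodebookCache(BurstRead read_burst) : read_burst_(std::move(read_burst)) {}

    // Called whenever a polygon's texture base address is known.
    // Reloads only if the base differs from the cached one.
    void set_texture_base(uint32_t base) {
        if (valid_ && base == base_) return;      // same texture: keep the codebook
        read_burst_(base, entries_.data(), 256);  // one 256-word burst
        base_ = base;
        valid_ = true;
        ++reloads_;
    }

    uint64_t entry(uint8_t index) const { return entries_[index]; }
    unsigned reloads() const { return reloads_; }

private:
    BurstRead read_burst_;
    std::array<uint64_t, 256> entries_{};
    uint32_t base_ = 0;
    bool valid_ = false;
    unsigned reloads_ = 0;
};
```

Consecutive polygons that share a texture then cost no codebook traffic at all; only a change of base address triggers the 256-word burst.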
The Parameter Cache isn't really a cache at all right now.
It just blindly stores the vertex params for each triangle. The Tag just increments on each incoming triangle within the current tile.
(so the Tag can be used as the address for the param buffer.)
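A minimal model of that parameter buffer: the Tag is just a counter reset at the start of each tile, so it doubles as the write address, and the TSP later looks the params back up by Tag. The struct fields and the 256-entry depth are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kParamDepth = 256;  // assumed per-tile capacity

// Hypothetical per-triangle parameters as seen by the ISP/TSP.
struct TriParams {
    float x[3], y[3], z[3];  // vertex positions
    float u[3], v[3];        // texture coords (for the TSP)
    uint32_t tex_base;       // texture base address
};

// Per-tile parameter buffer: the Tag is the arrival index of the triangle
// within the current tile, so it is also the buffer address.
class ParamBuffer {
public:
    void start_tile() { next_tag_ = 0; }

    // Stores a triangle and returns its Tag, or -1 if the buffer is full.
    int32_t push(const TriParams& p) {
        if (next_tag_ >= kParamDepth) return -1;
        params_[next_tag_] = p;
        return static_cast<int32_t>(next_tag_++);
    }

    // The TSP uses the Tag from the Tag/Z buffer to fetch the params back.
    const TriParams& lookup(uint32_t tag) const { return params_[tag]; }

private:
    std::array<TriParams, kParamDepth> params_{};
    uint32_t next_tag_ = 0;
};
```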
Both the ISP and TSP need to know some of the param word stuff, and the verts for each triangle.
The ISP needs to know them, to do the inTri and Z interp calcs.
The TSP needs to know the texture base address, the X,Y, and UV of each vert, and reads the resulting Z values from the Tag/Z buffer.
Another pair of interp blocks for UV.
And then even more, for Gouraud, which I haven't even attempted to fit in the FPGA yet. lol
I need to bite the bullet soon, and just add the texture cache. Probably using the Codebook cache as the basis. It's practically the same thing.
The Sega PDF suggests how many words it pre-reads for each type of data.
And it's not very much. Maybe 32 Words at a time, usually.
VQ compression can fit 32 texels per 64-bit word, so it does a lot to reduce the amount of data you need to read for each texture.
I guess only TWO 64-bit Words to be read, for an 8x8 texture, using VQ.
Largest texture size on PVR2 is 1024x1024, apparently.
Which is ridiculous, even with VQ. lol
That would need to read 32,768 Words (64-bit), even with VQ enabled.
256KBytes.
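The VQ numbers check out: each index byte selects a 2×2 texel block from the codebook, so a 64-bit word of indices covers 8 × 4 = 32 texels. A quick helper, ignoring mipmaps and the 2 KB codebook itself (simplifications to keep it short):

```cpp
#include <cassert>
#include <cstdint>

// Texels covered by one 64-bit word of VQ indices:
// 8 index bytes per word, each selecting a 2x2 (4-texel) codebook entry.
constexpr uint32_t kTexelsPerVqWord = 8 * 4;

// 64-bit words of index data for a width x height VQ texture
// (codebook and mipmaps not included).
uint32_t vq_index_words(uint32_t width, uint32_t height) {
    uint64_t texels = static_cast<uint64_t>(width) * height;
    return static_cast<uint32_t>((texels + kTexelsPerVqWord - 1) / kTexelsPerVqWord);
}

// The same amount in bytes.
uint32_t vq_index_bytes(uint32_t width, uint32_t height) {
    return vq_index_words(width, height) * 8;
}

// For comparison: an uncompressed 16-bit texture, in 64-bit words (4 texels per word).
uint32_t raw16_words(uint32_t width, uint32_t height) {
    return static_cast<uint32_t>((static_cast<uint64_t>(width) * height + 3) / 4);
}
```

So VQ cuts the 1024×1024 case from 2 MB (uncompressed 16-bit) to 256 KB of indices, plus the 2 KB codebook.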
An example of the DDRAM_BUSY signal, during a Tile -> FB writeback.
OK, so it was three clock cycles, out of 512. lol
Actually a small bug there, because I should be holding in the same state, until BUSY goes Low again.
The DDR controller supposedly takes care of Burst Writes now.
By bunching the writes together (if they are on consecutive addresses).
So you just keep writing as fast as possible, and the controller mostly takes care of it.
For Burst reads, you have to decide how much contiguous data you want to request at a time.
Which is easy for some stuff like the Codebook cache, as that's always reading 256 Words.
But vertex params and textures are quite variable-length.
That's better.
It now pauses in the same isp_state, if the BUSY signal is High.
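A cycle-level sketch of that fix: the write state only advances when BUSY is low; otherwise it holds and re-presents the same word on the next clock. The BUSY pattern is supplied by the caller, and the function name is hypothetical; this is a C++ model, not the actual isp_state logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Writes 'words' consecutive words, one attempt per clock.
// busy[c] is the DDRAM_BUSY level on clock c (treated as low past the end).
// A write is only accepted on a clock where BUSY is low; otherwise the state
// machine holds in the same state with the same address and data.
// Returns the number of clocks taken.
uint64_t burst_write_clocks(uint32_t words, const std::vector<bool>& busy) {
    uint64_t clock = 0;
    uint32_t written = 0;
    while (written < words) {
        bool is_busy = clock < busy.size() && busy[clock];
        if (!is_busy) ++written;  // word accepted: advance the address
        ++clock;                  // BUSY high: hold and retry next clock
    }
    return clock;
}
```

With BUSY high for three clocks during a 512-word writeback (as in the capture above), this takes 515 clocks instead of 512.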
I can't tell if that has made much difference on-screen, as I keep messing with the ASCAL thing.
And now it has vertical scanlines again.
But the BUSY signal gets asserted far less often than I thought.
Sometimes it barely happens at all, through many frames of renders.
Although the Framebuffer writes are happening on alternate clock cycles atm, so it could go faster.
Anywho. Texture cache.
Gonna try to hook something up.
FPGA around 90% full... and it couldn't compile.
And that wasn't even with the texture cache block enabled, because it just gave a black screen when I tried it last night.
So not sure what else would have changed so much to make it fail to compile.
I removed some stuff from SignalTap, as that usually helps.
Also had to add some waits for "z_clear_wait", as it was often trying to write to the Tag/Z buffer during a clear.
It has to clear all of the Z values, at the start of each new tile.
So Z starts at zero, then it gets compared (for each pixel) to each incoming (visible pixel) triangle Z value.
It does the "32 pixels at once" thing for the Tag/Z buffer, so only takes 32 clock cycles to do the clear.
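In software terms, the tile's Tag/Z buffer behaves roughly like the sketch below: clearing one 32-pixel row per clock means a 32×32 tile clears in 32 clocks, then each visible pixel of an incoming triangle is compared against the stored Z. The "greater wins" compare is an assumption here (PVR2 depth is 1/W-style, so larger usually means closer, but the compare mode is selectable per polygon), and the types are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kTile = 32;

struct TagZTile {
    std::array<std::array<float, kTile>, kTile> z{};
    std::array<std::array<uint32_t, kTile>, kTile> tag{};

    // Clears one row (32 pixels) per "clock"; returns the clocks used.
    int clear() {
        int clocks = 0;
        for (int row = 0; row < kTile; ++row) {
            z[row].fill(0.0f);
            tag[row].fill(0);
            ++clocks;
        }
        return clocks;
    }

    // Depth test for one pixel of an incoming triangle (assumed GREATER compare).
    // Returns true if the triangle wins the pixel and its Tag is stored.
    bool test_and_set(int x, int y, float tri_z, uint32_t tri_tag) {
        if (tri_z > z[y][x]) {
            z[y][x] = tri_z;
            tag[y][x] = tri_tag;
            return true;
        }
        return false;
    }
};
```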
I really do need to emulate Burst transfers on the sim soon.
So I can keep the sim and Quartus version 100% in-sync.
So then, I should be able to check everything in the sim, fix most of the worst graphical glitches, etc.
I don't know if I can finish this, tbh.
But I also don't think it could be re-written very easily. There aren't too many ways to do it.
Just have to keep on chipping away at it.
But you can almost guarantee, if I get to a point where it runs quite fast on the FPGA, and the renders look half-decent, an uber dev will just release a full Dreamcast core. lol
'cos stuff like that has happened many times in the past.
(Arkanoid is one example, but I was happy to help get that done.)
thank you for your time in this 🙂
I will carry on, and see what I can do.
It's just frustrating atm, as I know what is possible now, but it's hard to get everything implemented, especially with it running so low on FPGA resources, and the very long compile times.
Right now, Quartus is having trouble again, and the compile has been going for nearly 90 minutes.
That usually means it won't finish, so I might as well cancel it, and figure out some stuff I can remove temporarily.
This is why I really want a MiSTer setup with a larger FPGA.
I kind of sort of have one already, but with only 77K LEs.
Using a QMtech Cyc V module.
I also have a Cyc IV module with 150K LEs, but haven't tried compiling cores for that yet.
The problem with both modules is - there is no DDR mem.
Only on the Orange Pi, which isn't directly shared with the FPGA.
So none of the usual stuff like ASCAL can work.
Hell, I'd even replace the Cyc V on a DE10 with a larger one.
Which might not be quite as bad as it sounds.
I appreciate the diligence and hard work you put into this community! I just hope the MiSTer FPGA platform evolves over time, so that its FPGA chips become bigger and MAYBE (idk exactly how cores work) could one day easily run Dreamcast, PS2 and Xbox.
Likewise, it's been amazing seeing what you can do, and how you push the limits of the MiSTer.
Thanks. 😉
I still believe a LOT of logic could be saved, in the interp and inTri blocks.
Just needs the right dev, and somebody who actually knows maths, rather than me. lol
Also on the todo list, is to make a note of all of the min and max float values for X,Y,Z, for each of the example renders.
So I can figure out how many fractional and integer bits it needs, and where there could be an overflow etc.
I knew it was going to be quite hard to work with fixed-point numbers, as floats can represent a VERY wide range.
But so far, the ranges don't seem to be too extreme.
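One way to turn those min/max logs into bit widths, as a sketch: the largest magnitude sizes the integer part (plus a sign bit if the range goes negative), and the smallest step you need to resolve sizes the fractional part. `RangeTracker` and the choice of resolution are assumptions, not something the core does today.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Tracks the observed range of a float value (e.g. X, Y or Z across test renders).
struct RangeTracker {
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
    void add(double v) { lo = std::min(lo, v); hi = std::max(hi, v); }
};

// Integer bits (plus a sign bit if the range goes negative) needed to hold
// every value in [lo, hi] without overflow.
int integer_bits(const RangeTracker& r) {
    double mag = std::max(std::fabs(r.lo), std::fabs(r.hi));
    int bits = (mag < 1.0) ? 0 : static_cast<int>(std::floor(std::log2(mag))) + 1;
    return bits + (r.lo < 0.0 ? 1 : 0);
}

// Fractional bits needed so one LSB is no larger than 'resolution'.
int fraction_bits(double resolution) {
    return static_cast<int>(std::ceil(-std::log2(resolution)));
}
```

For example, a screen X range of [0, 639.5] needs 10 integer bits; with 4 fractional bits for 1/16-pixel precision, that's a 14-bit unsigned value.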
Currently working on testing an HP Z800 Dual-Xeon workstation.
I've not owned any Workstation-class stuff before.
And no ECC memory, not since I had the Octane and O2, around 2008.
It's not really worth trying to repair the original Z800 PSU.
Almost all of the rails are 12V anyway, then a low-current -12V, and 5V Standby.
I might not have a 12V PSU with enough current to even get the machine to POST atm, but it's worth a try.
I'm just curious to see if an older dual-socket Xeon can still be useful for Quartus or whatever. Probably not. lol
Poor Dreamcast sat on the bench. I haven't even tested the AtomisWave LAN adapter yet.
And the DVD drive from the Marantz DVD player, which I was intending to try reading GD-ROMs with.
Too many projects.
I also think your 'hat' thing is an interesting idea
It gives me Sega CD/32x vibes.
Imagine if one could just make a hat and connect 1 or 2 Raspberry Pi Zeros for extra performance.
(I have no clue how any of this works, and I'm probably chatting absolute rubbish)
interesting stuff, but yeah just realized I don't really know what the other cores do with their video out but I guess it has to end up in the DDR3 🙂
Very unlikely to be quite enough to get the Dual Xeon to post, but I'll try without the CPUs in first.
yeah it's important to try to limit all the calculations to as few bits as possible. You can even see this happening on modern GPUs
No, it's absolutely possible, and a Rasp Pi could probably be made to DMA data fast enough into VRAM on the FPGA.
The main issue there is, having enough IO pins to do everything.
Each 40-pin header on the DE10 has 36 IOs.
(then two Grounds, 3V3, and 5V)
Which might be just about enough for a 32-bit bus, Clock, and a few control signals.
So that could be one way of getting more ARM power to run the emu.
It's just... it's never going to feel quite the same as having the vast majority of it on the FPGA.
Which isn't necessarily a bad thing, but you know what I mean. lol
Yep, on almost all cores, their native video (usually RGB, Syncs, and a Data Enable) gets fed to ASCAL, as well as to the RGB "DAC" pins on the GPIO header.
ASCAL has two or three Framebuffers in DDR3, to do the upscaling (or downscaling) etc.
But ASCAL also has the option of displaying an existing Framebuffer in DDR3, which is what I'm using atm.
Normally, that feature would often be used for displaying the Linux framebuffer from the ARM side.
The ARM just has to plonk the Framebuffer at a certain address, then ASCAL set up to read from that address.
ASCAL supports 16-bit, 24-bit, 32-bit?, and Paletted colour, when reading the Framebuffer.
I'm using 16BPP for PVR2, but kind of read from each 32-bit word. It's still not quite right, hence I only see half the horizontal resolution atm.
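For reference, a 32-bit word read from a 16BPP framebuffer holds two pixels, so both halves have to be emitted to get the full horizontal resolution. This sketch assumes little-endian packing (left pixel in the low half) and RGB565, which is one of the PVR2 framebuffer formats; it's an illustration, not the ASCAL or core logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Rgb888 { uint8_t r, g, b; };

// Expands one RGB565 pixel to 8 bits per channel (replicating the top bits).
Rgb888 rgb565_to_888(uint16_t p) {
    uint8_t r5 = (p >> 11) & 0x1F, g6 = (p >> 5) & 0x3F, b5 = p & 0x1F;
    return { static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
             static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
             static_cast<uint8_t>((b5 << 3) | (b5 >> 2)) };
}

// Unpacks a row of 32-bit words into 2 pixels per word.
// Using only one half of each word is what halves the horizontal resolution.
std::vector<Rgb888> unpack_row(const std::vector<uint32_t>& words) {
    std::vector<Rgb888> out;
    out.reserve(words.size() * 2);
    for (uint32_t w : words) {
        out.push_back(rgb565_to_888(static_cast<uint16_t>(w & 0xFFFF)));  // left pixel
        out.push_back(rgb565_to_888(static_cast<uint16_t>(w >> 16)));     // right pixel
    }
    return out;
}
```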
After quite a lot of hassle - the Xeon "POSTed" earlier.
But with the measly 16 Amp PSU, it gets quite hot within a few minutes, so I didn't want to leave it on.
I ordered an HP DPS-800 PSU instead, which is known to fit in the old case.
Lots of RAM errors showed up, but I don't know if it just needs "training" for that.
Could probably run the whole thing off a car battery for quite a while. lol
It's almost all 12V rails, plus an Amp or two for 5V Standby.
And a -12V rail, which is probably only used for the old-people serial port, and certain sound cards with opamps on, etc.
I doubt the speed of the Dual Xeons will be super impressive, but I don't know.
Just interesting to own a "Workstation" class machine, even if it's old.
Could someone please explain to me exactly what's being discussed here? I gather it's partially using the ARM and may run slow? Would it be too slow for use like a normal core? Could this become a real Dreamcast core on a DE-10 successor?
Groundwork for a Dreamcast GPU FPGA recreation. The CPU is emulated on the ARM side. All experimental.
Bump
He bumped!!! 😊
hehe
so I am back home
I should in theory be able to load up the core in the tools
it kinda does and kinda doesn't 😛
The waitress in my favourite heavy metal pub in my home town was from Greece. She educated us, thus all my Greek is due to her.
haha
Not for ARM, at least.
I like Dreamcast
I think I probably need to tidy up this craphole of a room, and the other room, before I look into DC again. lol
It hasn't been this untidy in quite a while. I thought I was making progress, until I got burnout a few months back.
Adventure!
GameCube
Zx Spectrum
What's not to love
You have a very nice collection
tbh, I haven't powered up most of that stuff in the past 2-3 years.
There's a Vectrex at the back, but it has a fault on the logic board.
Same, I mostly just use my Mister now and my PC. Rest of my stuff just collects dust now.
It was fully working originally, but I helped out a friend many years ago (easily 8-9 years ago now), by sending him my working board, and he sent his faulty one back. I never got around to repairing it.
Exactly.
And, tbh, I rarely play games either now. I find projects more rewarding overall, that's during the times when I don't have burnout.
Good job the detail isn't great in the photo, as it's like an anti-doxxing filter. lol
Although, there is one game that's keeping one of my older consoles connected up to my TV: Resident Evil Code: Veronica on the GameCube. (Plus I like how easy it is to mod a GameCube. I have the GCloader + a gcvideo device for a beautiful digital out picture.)
Ironically, it also has a Dreamcast release. 😉
But yeah, it's always nice to see the more uncommon/non-mainstream consoles/retro stuff in people's collection. That Vectrex does look nice.
Last time this bench was this "tidy" was about four months ago...
So that's all in the same room, with the bench, and bed.
Each of the two rooms are so small, I can lay flat, and touch both walls. lol
Actually, I need to blur an address on that last one.
Haha, don't worry. All I saw was your very blue co op card.
It's why I don't really send pictures of my collections or anything. Knowing my luck, I'll leave some personal info lying around (which has a high chance of happening).
That co-op card was my squeegee for the solder paste.
I have so many PCB projects to build, and almost zero motivation.
New GD Emu, boards for the DIY AV Receiver, etc.
I'd be gutted if I ended up breaking my co-op card though. No discounts.
lol
I only got it for Download fest last year.
Used it probably twice, and never since.
In fact, I think I forgot to even use the card.
These could actually be quite nice rooms, if I bothered to keep them tidy. sigh
Could do with new curtains, too.
I think this year will be the year I'll learn how to do some soldering.
Collecting retro games/consoles and soldering go hand in hand. I've got probably 5 spare PS1s I could practice modchipping on once I feel confident enough.
Then I could actually diagnose and attempt to fix my old Dreamcast. Unfortunately it's no longer dreaming anymore. (Powers fine, no video out)
Used to work though
Least it works though!
I do find soldering quite therapeutic at times.
Also, getting into using BGA chips, and using solder stencils, isn't half as bad as you might think. 😉
My first attempt at using a soldering iron was to desolder caps and do the TSOP (bridge 2 solder points) on my OG Xbox.
The only thing is, when using a stencil, is you really want to make 100% sure you have ALL of the components ready.
Needless to say, it didn't go well at all.
You can obv manually solder quite a lot after the fact, but not really with the stencil, and you'll already have paste / solder on some pads, unless you mask off the stencil.
I first did that TSOP thing probably around 2005, I think?
I've never owned an Xbox modchip - I only ever did the TSOP, and then the Font exploit.
And it was always fine, for what I used.
Easy to load ISOs on the HDD, and use XBMC etc.
The OG Xbox was my main media player, until about 2012. lol
Soldering really is mostly about prep, using (extra) flux where needed, and a half-decent soldering station.
I pretty much only need to use a small chisel tip for most stuff, on the Metcal.
It's like a 1.78mm wide chisel tip, or something.
And 0.5mm diam solder. Nothing ever larger than that.
And a lot of people think they can use a high-power "60 Watt" iron, and it'll be fine, but it's often not.
Especially the cheapo mains irons I started with, which got wayyyy too hot, to the point where the tip was almost glowing red.
And they would char up instantly. That's no good.
And they had a big screw holding the tip in, which would melt connectors etc. lol
Yeah. I think I had some rubbish flux, a bargain bin £10 eBay special soldering iron and just no skill.
My plan is to learn on boards you can buy from AliExpress, then, once I feel confident, buy a few cheap cartridge-based sports games that use a battery for saving (as I'd mostly be replacing batteries), and start replacing them. I'll probably pick up the Pinecil V2 soldering iron. Looks good.
Definitely best learning on junk boards you don't care too much about.
For a lot of SMD component removal, I use the hot air station.
Just have to be patient with it, and don't set it much higher than say 350C.
(I have to set mine fairly high, as I don't think the temp display is very accurate.)
And don't try to prise off chips etc. Unless they have a small spot of glue underneath (Saturn SDRAM, for example), they should lift off with almost no effort once the solder on all pins is molten.
Yeah, sweet. Thank you for your tips. Much appreciated
The KSGER soldering stations look fine, for the money. I almost bought one the other month, but there is a new version now, and I wasn't sure which type (brand) of tips people tend to go for now.
I only ever switch to a larger tip, for really heavy-duty wire soldering, or large ground planes / heatsinks.
(on the Metcal, I mean)
But the Metcal is getting very old now. The PSU on my previous one already failed years ago, and is still in pieces. lol
I'm sure the KSGER T12 (or newer) would do fine.
But, ironically, the KSGER is already a kind of clone of other stuff, but there are clones-of-clones.
ie. versions which look similar, but with not-so-good build quality, especially in the PSU.
Some don't even hook up Mains earth to the tip.
(which isn't great for ESD, but it does help prevent shorting stuff, if you forget to turn the power off to the board. lol)
(should obv never do that anyway)
Super common fault on the DC, is just the connector from the PSU to the mobo.
If it's been powered off / unplugged from the Mains for quite some time, it should be safe enough to take apart.
Take the few screws out of the PSU, don't lose the plastic insulator sheet underneath it.
Clean the pins that poke up from the motherboard, using IPA, or contact cleaner.
tbh, I generally even scrape the pins on all sides a bit, then clean with IPA.
But you need to be careful not to allow any metal shavings onto any of the PCBs.
Sweet thank you for the suggestion. Hopefully I can fix it as it's been region free modded to play my Japanese Resident Evil imports.
Then, when you plug the PSU back on (don't forget the plastic sheet underneath. lol) - kind of shove the PSU on and off the pins a few times, then put the screws back.
Oh, and if you need to remove the motherboard for any reason, be very careful that some of the screws are longer than the others.
Specifically the ones that hold the metal part of the GD drive to the top metal shield of the mobo
Else, this can happen. See if you can spot it. lol
Hint: Just below the RGB/AV port, and just to the left of the SH4 CPU.
I feel quite motivated tonight, to tidy the rooms.
But I can't, because it's nearly midnight. sigh
#sleepingpatterns
I bought a CPAP machine years ago, which I'm sure would help, but I never got on with it.
It's probably how a dog feels when it hangs its head out of a car window, with the car doing 70 MPH.
Air being forced down you when you try to breathe out. lol
And yes, I am a bit bored tonight.
Need to quietly move a few bits around, like the two or three DVD players I bought recently.
With a view to getting rid / reselling them.
Buying DVD players in 2025? Not Blu Ray or 4K?
It's a long story. lol
This long...
I do think it's possible to read the High Density track on a GD disk.
Using the Marantz DVD player mech I bought a few months back.
But as usual, I didn't quite get to that point, of hooking up the serial connections, and FPGA.
If I can just do a proof-of-concept for that, I'll finish the PCB design for the "new" CD/GD/DVD drive for the DC.
Unfortunately, it has to use some fairly old chips, but probably still quite a lot of new-old stock available.
Could eventually lead to a Dreamcast that can play DVDs, too.
(and read DVDs, for larger homebrew, like DCA III.)
He must have been playing Ys!
@rain obsidian hey, did you know about this? I just found out!
https://x.com/robertdalesmith/status/1889087198890496160?s=46&t=qZDZkf9otfQfj9v44WsIZA
Yep, there is a custom BIOS which is patched to always enable this.
Or I think to select it from the Settings menu?
Japanese Cake BIOS, I think?
I keep updating ScummVM, unfortunately I don't think there is a way to get rid of screen tearing without modifying the framebuffer to support double-buffering.
Allo.
I did wonder about that again recently.
The PVR "core" thing can actually sync the emu to Vsync, sort of...
When the emu writes to the framebuffer in DDR3, I check the last few bytes on the FPGA side.
That triggers the FPGA to start rendering a frame, then I write some bytes back to DDR3, which tells the emu that the frame is done.
It's possible something like that could work for ScummVM and other stuff.
But you'd have to be fairly sure the emu can always run fast enough to render frames at > 60 FPS, maybe?
// Write the magic number to the evil PVR register. ;)
//
// parameter TEST_SELECT_addr = 16'h0018; // RW Test - writing this register is prohibited.
//
pvr_regs[ 0x18 ] = 0xCA;
pvr_regs[ 0x19 ] = 0xFE;
pvr_regs[ 0x1a ] = 0xBA;
pvr_regs[ 0x1b ] = 0xBE;
// Copy the PVR regs directly ABOVE the 8MB VRAM (in DDR3).
memcpy(vram+offs_8meg, pvr_regs, pvr_RegSize);
// Trigger the PVR reg copy on the core, then render the frame...
// (the core should then clear these to 0x00, after the frame is rendered.)
vram[ (offs_8meg-8)+0 ] = 0xCA;
vram[ (offs_8meg-8)+1 ] = 0xFE;
vram[ (offs_8meg-8)+2 ] = 0xBA;
vram[ (offs_8meg-8)+3 ] = 0xBE;
vram[ (offs_8meg-8)+4 ] = 0xCA;
vram[ (offs_8meg-8)+5 ] = 0xFE;
vram[ (offs_8meg-8)+6 ] = 0xBA;
vram[ (offs_8meg-8)+7 ] = 0xBE;
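void rend_end_render() {
// wait for render to end
// interrupts get fired automatically
// NOTE: emu_vram needs to be read through a volatile pointer here, else the compiler may hoist the load out of the loop and spin forever.
while ( emu_vram[ (offs_8meg-8)+0 ] != 0x00) {}
FrameCount++;
//printf("rend_end_render\n");
}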
It's quite rough, but it works. Obviously the FPGA renders WAYYYYY slower than the emu can run atm, though.
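A minimal sketch of the same handshake, assuming the flag/register layout from the snippet above. `pvr_kick`, `pvr_wait_done`, and the spin limit are hypothetical names for illustration, not code from the actual emu. The main points: the flag is armed only after the register copy (as in the original), and the wait loop reads through a `volatile` pointer with a timeout instead of spinning forever.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as in the snippet above: the regs are copied just above the 8 MB
 * VRAM area, and an 8-byte "go" flag sits just below it. */
#define OFFS_8MEG (8u * 1024u * 1024u)
#define FLAG_OFFS (OFFS_8MEG - 8u)

static const uint8_t kMagic[4] = { 0xCA, 0xFE, 0xBA, 0xBE };

/* Emu side: copy the PVR regs first, then arm the flag LAST, so the core never
 * sees the flag before the register block is complete. Depending on how the
 * DDR3 is mapped, a barrier such as __sync_synchronize() may also be needed
 * between the two loops on real hardware. */
void pvr_kick(volatile uint8_t *vram, const uint8_t *regs, size_t reg_size)
{
    for (size_t i = 0; i < reg_size; i++)
        vram[OFFS_8MEG + i] = regs[i];
    for (size_t i = 0; i < 8; i++)
        vram[FLAG_OFFS + i] = kMagic[i & 3];   /* CA FE BA BE CA FE BA BE */
}

/* Wait until the core clears the first flag byte. The volatile pointer forces
 * a fresh load on every pass, and max_spins bounds the wait so a hung core
 * doesn't freeze the emu. Returns false on timeout. */
bool pvr_wait_done(const volatile uint8_t *vram, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++)
        if (vram[FLAG_OFFS] == 0x00)
            return true;
    return false;
}
```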
At first I was quite excited about ScummVM on MiSTer, because I thought the upscaler would be used.
Is it possible to have software running on the ARM side which uses the FPGA scaler?
ScummVM technically kind of already did/does use the ASCAL scaler.
ASCAL can be set to display a framebuffer in DDR3.
And the DDR3 is shared between the ARM Linux and FPGA side.
So Scumm running on the Linux side just renders frames to an area of DDR3, and ASCAL displays that.
So ASCAL can do the upscaling / downscaling stuff.
It's just they are not in-sync, so you end up with screen tearing, unfortunately.
I do think it's fixable, but I don't think I'll personally have time to take a look.
I'd also have to find the code in ScummVM which does the final rendering, and see if it could be made to wait for the FPGA/ASCAL to finish reading the previous frame.
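The usual fix for that kind of tearing is double-buffering: draw into a back buffer, hand it to the scaler, and wait for a vsync before reusing the old front buffer. A minimal sketch, assuming a hypothetical shared struct with a scanout index the scaler reads and a vsync counter the FPGA bumps each frame. None of these names exist in ScummVM or the MiSTer framework as-is, and it assumes ASCAL only latches a new buffer address during vblank.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state between the ARM app and the FPGA side. */
typedef struct {
    uint32_t *buf[2];               /* two framebuffers in shared DDR3      */
    volatile uint32_t scanout;      /* which buffer the scaler displays     */
    volatile uint32_t vsync_count;  /* incremented by the FPGA every vblank */
} flip_chain;

/* The app only ever draws into the buffer that is NOT being scanned out. */
uint32_t *back_buffer(flip_chain *fc)
{
    return fc->buf[fc->scanout ^ 1u];
}

/* Hand the finished back buffer to the scaler, then wait (bounded) for the
 * next vblank. Assumes the scaler latches the new index during vblank, so once
 * the counter moves, the old front buffer is free to draw into. Returns false
 * if no vblank was seen within max_spins polls. */
bool present(flip_chain *fc, unsigned long max_spins)
{
    uint32_t start = fc->vsync_count;
    fc->scanout ^= 1u;
    for (unsigned long i = 0; i < max_spins; i++)
        if (fc->vsync_count != start)
            return true;
    return false;
}
```

On a timeout the caller could just carry on and accept one torn frame, rather than stalling the game.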
Let's keep in mind that a lot of the original games that ScummVM plays did have screen tearing (on the DOS platform, anyway), as I remember. So for me, I can still enjoy them...
So is it possible to have the same filters as the cores?
The channel is way too empty without Ash 👾
I hope Ash is doing OK. ❤️
Is he just taking a break or did something happen?
Dunno
probably fine. he works in bursts of extreme inspiration
That seems like his MO. 😎
@rain obsidian, grabulosaure showed git activity recently (IntV branch), so maybe he is back from AWOL
Not doing too great, tbh.
I had another bout of what seems to be the same kind of low folate thing.
Leg weakness, shortness of breath, brain fog, blurry vision, dizziness.
Not half as bad as a few years back, but still not fun.
I had to start on Folic Acid again, Iron, and multivits, etc.
Gradually getting better, I think. But it's hard to think.
This all started happening after either the second vax shot, or after covid itself.
AFAIK, I might have only had covid once, which was at Download fest in 2021, about six months after the vax.
Trying not to make this a "political" thing, but I'm not sure what to think any more about any of it.
I lost my eldest brother to Covid in December 2021, which has obviously also had a very negative impact on all of us.
And no, he didn't have the vax. He refused it.
And I feel like I still can't freely talk about this online, else certain people will say "told you so" about him not having the vax, or "told you so" about me (and the three other people in this household, and most of the rest of the family) having it.
And I still have the trip to Thailand in just over a week, which I'm just about well enough to go on.
So I've just been trying my best to not let the depression drag me down, trying to drink more water, and eat a bit better.
Still not exercising, which is a huge problem. (EDIT: Quite hard to do, when you have problems with going outside, and when your legs have been buckling under you for weeks.)
I've been like this, pretty much since about four months ago, when I had a sudden burst of energy, and got about five PCB projects sent off at once.
I haven't built more than one of them since.
Could even be due to this, as I had Hypoglycemia as a kid, and have Asperger's etc.
Just interesting to me that I never had anything quite like this, until after the vax / covid era.
A lot of the symptoms feel VERY similar to what I had a few weeks after the second vax. Coincidence? We'll probably never know.
Might even be triggered as part of long covid.
I'm not back from Thailand until the middle of April, if I survive. lol
So I'll be even quieter until then.
Hang in there Ash, Covid was f'd up. I lost my grandad to it. Not everyone gets political about it, screw people who do IMO 🙂
I really hope things get better for you Ash, I lost family to covid and a close friend now has long covid. Don’t need to get political to say that covid times were pretty awful.
Get well soon Ash, we appreciate what you do here but your health matters more!
Take care and do what you gotta for you. 🙏 Things will get better and of course, Dreamcast some day too 😎😅
take care of yourself, ash. ❤️
@rain obsidian I joke around but I love your work ethic and posts. Looking forward to seeing them again when you feel better. ❤️
Hope you feel better man
Hope you get better soon mate 🙂
Sending good vibes to ElectronAsh.
I hope you have a good trip and start feeling better soon, Ash!
Aye, take it easy mate, rest as much as you can, get well soon!
welcome back, buddy
missed you my guy, hope you’re feeling better
You're back! We missed you! Hope you got the rest you needed. 🫂
Welcome back @rain obsidian! Hope you're doing well and health is good! ✌️
How was Thailand?
Thanks, all.
Sorry for the late reply, I was still recovering. lol
But, also, after the first full day of rest, I was helping my brother on the allotment plots again, dismantling and moving a greenhouse.
Songkran was... nuts.
But in a good way.
Very crowded at some points, but we really got into it on the second day in Bangkok.
The first day of Songkran, we were actually in Lamai, on Samui.
It did really help with my mental health, to get me out of a rut.
But I need to work on my diet and exercise now, especially to get rid of the belly.
I think that was the street parallel to Khaosan Road?
Hard to remember exactly where we were, as we stayed in four different hotels.
Bangkok for a few days, then Korat, visited Isaan, then back to Korat for 6 days.
Then down to Pattaya via coach.
Then back to Bangkok, to get a flight to Samui
We missed the earthquake by about four days.
We were in Pizza Hut in "The Mall" in Korat when it happened.
My sis-in-law said she thought the building was moving, then we immediately noticed it swaying slightly, and everyone was dizzy, including the staff.
Checked our phones, and saw the quake in Myanmar and Bangkok. Couldn't believe it.
When we stayed in Bangkok again for the last few days, the lintel in my room had a giant crack along it.
Very glad we missed the main quake, tbh.
Somehow I didn't have a repeat of the leg weakness etc. when I was away. Maybe just due to moving around every day, and taking Vit B complex.
And I was drinking at least 7 bottles of Leo beer almost every day for a month. lol
Only added a few cocktails to that on certain nights, not too often. Only really got properly drunk about three times.
We went via Air China for the main flights, via Beijing.
Flights and staff were fine, except the first Beijing to Bangkok leg, where the air con wasn't working at ALL, in the middle part of the plane.
It was genuinely one of THE worst experiences of my entire life. lol
Couldn't breathe for hours, I was passing out, lips tingling etc.
Bloody plane just would NOT go below the clouds for hours. I had my first real panic attack in the airport.
Aside from that, it was just the usual travelling thing of not getting enough sleep before flights, then not being able to sleep on the plane.
(crappy 737 for that short leg, but they thankfully fixed the AC for the flight back to Beijing. Then it was a much newer A350 back to London.)
Anyway, Dreamcast...
Not sure when I'll get back to working on it, but I do intend to at some point.
To make any real progress, I'd ideally need to collab with somebody.
Is a Dreamcast core even a possibility? Or do you mean for a potential successor to the DE10-Nano?
I guess that this experiment would rather feature a hybrid approach: high-level emulation on the ARM side, and low-level emulation of the PowerVR on the FPGA side.
But it's not even a given that the MiSTer can emulate the PowerVR alone.
But it is an experiment that nobody dared before. The outcome is unknown.
I think it's more just laying the groundwork for a next-gen MiSTer
After all it is open source. If something can get working on the MiSTer, then good. If next gen consoles profit from the work, then even better.
What would be the best you’re hoping for on this core? Nvm
"Send nudes"? j/k enjoy songkhan, I was in Bangkok during it once, and was kinda insane lol
I just got a spam email from Terasic about a new board:
https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=166&No=1373&PartNo=2
It would need to be bridged with a raspberry pi or something over the GPIO to do a lot of the file stuff we offload to the ARM side. I don't know if all the extra logic would be as helpful as components to help with 3d processing.
the memory situation is a pretty big downgrade from the current MiSTer
Looks like they have a SOC attachment that about doubles the price:
https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=67&No=1290
not relevant for the atum a3 nano
It's not specifically listed in the compatibility list, but that might be because it's too new. It was listed under compatible attachments on the Atum 3 page.
the atum3 nano doesn't have an fmc connector
Omg everyone is posting that everywhere
Terasic spam works lmao
lol true
The agilex 3 is kinda the cheaper budget line compared to the agilex 5 which will be the cyclone v successor more or less.
I predict both the price and feature set of the "de25-nano" are going to be disappointing
Same. Especially for us yanks due to tariffs.
there's also the even more disappointing possibility that this is what they consider the de10-nano equivalent
Nah
Doesn't even have hps
The de10-nano having hps and ddr3 positioned it very strategically.
interestingly the agilex 7 dev board has hdmi 2.1 output
Perhaps the solution is to come up with the specs we need and the next time and every time Terasic sends out spam we have everyone reply with that spec list.
they already know
They need a reminder. Every time they contact us about a board we can't use, we contact them about what we want. As much as you don't like having this flood of responses on the discord, they will not abide a flood of emails.
Please no mail campaigns. They’re aware of us of course and are happy to sell de10-nanos to more people, but we are not their target market in the slightest. They have project goals and priorities that don’t necessarily line up with what the MiSTer needs, and that’s ok.
and they've directly spoken to developers
Alexey (Sorg), the MiSTer project lead, is already involved and has written a technical review of the de25-standard to help influence its development.
https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=&No=1365#contents
lol everyone in that lineup are department heads and professors of highly regarded universities. And then it’s Sorg.
Who’s the best dressed out of all of them lol
That is by far the sharpest suit.
LOL
I really want to work on DC some more, but a bit stuck again.
I also really want to see how the uber devs organize a project.
e.g. Do they still keep paper notes and diagrams?
I also need to hook up a DC mobo to the new o'scope, to see the basic VRAM timings.
I think that could give some good info on the internal state machines of PVR2.
ie. how much it pre-fetches (on average) for textures and vertex params.
How often it fetches the framebuffer for displaying via the DAC, and what the burst lengths are.
That could give me more pointers on how to structure the core stuffs.
OK, so writing to the tag buffer is about as fast as it's going to get.
Where it processes 32 "pixels" worth at once (a whole tile row).
It also writes the params for each Tag value to the param buffer.
It's the texturing part which is slow atm, due to the DDR / SDRAM latency.
So I need to separate the TSP stuff next.
Then figure out exactly which params need to be shoved into the TSP.
The "Texture Address" module contains most of the TSP stuff really, but it currently resides within the ISP Parser file / module.
You're doing God's work sir
Param Buffer stores these values atm, per Tag...
ISP Instruction Word.
TSP Instruction Word.
TCW (Texture Control Word).
X,Y,U0,V0,Base Colour,Offset Colour, for Verts A,B,C.
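To get a feel for how much data that is, here's a sketch of one Param Buffer entry as a C struct, using exactly the fields listed above. The field widths are assumptions (32-bit floats for X/Y/U/V, packed 32-bit ARGB colours); the core's actual layout may pack things differently. `pb_vertex`, `pb_entry`, and `param_cache_bytes` are hypothetical names.

```c
#include <stdint.h>

/* One vertex: X, Y, U0, V0, Base Colour, Offset Colour.
 * Widths are assumed, not taken from the core. */
typedef struct {
    float    x, y;
    float    u0, v0;
    uint32_t base_col;    /* assumed packed ARGB8888 */
    uint32_t offset_col;  /* assumed packed ARGB8888 */
} pb_vertex;

/* One Param Buffer entry, stored per Tag. */
typedef struct {
    uint32_t  isp_word;   /* ISP Instruction Word */
    uint32_t  tsp_word;   /* TSP Instruction Word */
    uint32_t  tcw;        /* Texture Control Word */
    pb_vertex v[3];       /* Verts A, B, C        */
} pb_entry;

/* Bytes needed to hold the params for num_tags Tags within one tile. */
unsigned long param_cache_bytes(unsigned num_tags)
{
    return (unsigned long)num_tags * sizeof(pb_entry);
}
```

With those assumptions it comes to 84 bytes per Tag, so e.g. 256 Tags in a tile would need 21,504 bytes.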
Segata Sanshiro?
During the texturing phase, the outputs from the param buffer go via the Z,U,V interp modules, UV clamp, then to the Texture Address module.
Most of that just to generate the UV coords for the texture look-up.
The texture address module does the VRAM address generation, then decodes the colour info from that data.
(Texture Address module also contains the palette, Codebook cache, and final colour blending logic.)
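Before the address generation, the UV clamp stage folds the interpolated coords back into the texture. A minimal sketch of the three usual wrap behaviours for a power-of-two texture, working in integer texel coords. The `wrap_mode` names are hypothetical, loosely modelled on the TSP's per-axis clamp/flip controls, not the core's actual signals.

```c
#include <stdint.h>

/* Hypothetical per-axis wrap modes. `size` must be a power of two. */
typedef enum { WRAP_REPEAT, WRAP_CLAMP, WRAP_MIRROR } wrap_mode;

int32_t wrap_coord(int32_t t, int32_t size, wrap_mode mode)
{
    switch (mode) {
    case WRAP_CLAMP:
        /* Stick to the edge texel outside [0, size-1]. */
        return t < 0 ? 0 : (t >= size ? size - 1 : t);
    case WRAP_MIRROR: {
        /* Position within a 2*size period; the second half runs backwards. */
        int32_t m = t & (2 * size - 1);
        return m < size ? m : 2 * size - 1 - m;
    }
    case WRAP_REPEAT:
    default:
        /* Cheap modulo for power-of-two sizes (two's complement for negatives). */
        return t & (size - 1);
    }
}
```

In logic, each mode is just a bit mask, a compare, or an inversion of the low bits, so none of them should need an extra clock on their own.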
Wondering how much of that module could be done using LUTs.
ie. small "ROMs" for various combinations of UV input bits.
This would be one of the first candidates...
// NOTE: Need to add 3 to tex_u_size in all of these LUTs, because the mipmap table starts at a 1x1 texture size, but tex_u_size==0 is the 8x8 texture size.
case (tex_u_size+3)
0: mipmap_byte_offs_norm <= 20'h6; // 1x1 texture
1: mipmap_byte_offs_norm <= 20'h8; // 2x2
2: mipmap_byte_offs_norm <= 20'h10; // 4x4
3: mipmap_byte_offs_norm <= norm_offs_1024[05:0]; // 20'h30; // 8x8
4: mipmap_byte_offs_norm <= norm_offs_1024[07:0]; // 20'hb0; // 16x16
5: mipmap_byte_offs_norm <= norm_offs_1024[09:0]; // 20'h2b0; // 32x32
6: mipmap_byte_offs_norm <= norm_offs_1024[11:0]; // 20'hab0; // 64x64
7: mipmap_byte_offs_norm <= norm_offs_1024[13:0]; // 20'h2ab0; // 128x128
8: mipmap_byte_offs_norm <= norm_offs_1024[15:0]; // 20'haab0; // 256x256
9: mipmap_byte_offs_norm <= norm_offs_1024[17:0]; // 20'h2aab0; // 512x512
10: mipmap_byte_offs_norm <= norm_offs_1024[19:0]; // 20'haaab0; // 1024x1024
endcase
// mipmap table mux (or zero offset, for non-mipmap)...
mipmap_byte_offs <= (!is_mipmap) ? 0 :
(vq_comp) ? (mipmap_byte_offs_norm>>3) : // Note: The mipmap byte offset table for VQ textures is just mipmap_byte_offs_norm[]>>3.
(is_pal4 | is_pal8) ? (mipmap_byte_offs_norm>>1) : // Note: The mipmap byte offset table for PAL4 or PAL8 is just mipmap_byte_offs_norm[]>>1.
mipmap_byte_offs_norm;
It took me a very long time to simplify that. lol
It used to be three separate case blocks.
wire [19:0] norm_offs_1024 = 20'haaab0;
reg [19:0] mipmap_byte_offs_norm;
//reg [19:0] mipmap_byte_offs_vq; // The VQ mipmap offset table is just norm[]>>3, so I ditched the table.
//reg [19:0] mipmap_byte_offs_pal; // The palette mipmap offset table is just norm[]>>1, so I ditched the table.
Could replace the norm_offs_1024[05:0] stuff with the constant value.
I guess it would already be inferred as a small "ROM" block, tbh. Just in registers atm.
The type of mipmap offset "table" is chosen based on whether the texture is VQ-compressed, or uses a palette, or is uncompressed.
So that could for sure be combined into a small ROM / LUT.
(mipmapping isn't implemented at all yet, but I still need to calc the start offset for the texture data.)
If a texture is VQ-compressed, it also adds an offset of 2,048, since it stores the Codebook before the actual texture data.
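The whole start-offset calculation can be checked in software. A sketch in C following the Verilog above: the norm table comes from slicing 0xAAAB0, VQ and paletted textures shift it (>>3 and >>1), and VQ adds the 2,048-byte Codebook (256 entries x 8 bytes). `mip_offs_norm` and `tex_data_offs` are hypothetical names; as in the Verilog, PAL4 and PAL8 share the >>1 path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of the top mip level within a mipmapped 16bpp texture, indexed
 * by tex_u_size+3 (0 = 1x1 ... 10 = 1024x1024; valid range 0..10). Entries
 * 3..10 are the low 2*idx bits of 0xAAAB0, i.e. the norm_offs_1024 slices. */
static uint32_t mip_offs_norm(unsigned idx)
{
    static const uint32_t small[3] = { 0x6, 0x8, 0x10 };
    if (idx < 3)
        return small[idx];
    return 0xAAAB0u & ((1u << (2 * idx)) - 1u);
}

/* Start offset (in bytes) of the texel/index data for the largest mip level. */
uint32_t tex_data_offs(unsigned tex_u_size, bool is_mipmap, bool vq_comp, bool is_pal)
{
    uint32_t offs = 0;                      /* non-mipmapped: starts at 0 */
    if (is_mipmap) {
        uint32_t norm = mip_offs_norm(tex_u_size + 3);
        offs = vq_comp ? (norm >> 3)        /* VQ table = norm >> 3     */
             : is_pal  ? (norm >> 1)        /* PAL table = norm >> 1    */
             :           norm;              /* uncompressed 16bpp       */
    }
    if (vq_comp)
        offs += 2048;                       /* Codebook is stored first */
    return offs;
}
```

For example, a mipmapped 256x256 (tex_u_size 5) 16bpp texture starts at 0xAAB0, the paletted one at 0x5558, and the VQ one at 2048 + 0x1556.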
Oh shit, he’s back at it
and we love him for it
lol
Not quite at the level I was months ago, but I'll keep chipping away at it when I can.
It was a huge boost, when skmp got the emu working on the ARM side, for sure.
Currently trying to further simplify / combine the logic here.
Because every time it does a <= assign, it's adding latency to this module.
Which will be making the renders look more crappy atm.
If you stare at the code for long enough, you can start to see more patterns for combining things.
For example, I could probably get rid of the extra shifts in some places, like the (12'd2048 + mipmap_byte_offs)<<2 in the VQ path.
Need to try combining these.
// mipmap table mux.
mipmap_byte_offs <= (!is_mipmap) ? 0 : // Non-Mipmapped. Zero offset.
(vq_comp) ? (mipmap_byte_offs_norm>>3) : // Note: The mipmap byte offset table for VQ textures is just mipmap_byte_offs_norm[]>>3.
(is_pal4 | is_pal8) ? (mipmap_byte_offs_norm>>1) : // Note: The mipmap byte offset table for PAL4 or PAL8 is just mipmap_byte_offs_norm[]>>1.
mipmap_byte_offs_norm; // Uncompressed?
// Twiddled or Non-Twiddled.
twop_or_not <= (vq_comp) ? ((12'd2048 + mipmap_byte_offs)<<2) + twop :
(is_pal4 || is_pal8 || is_twid) ? (mipmap_byte_offs>>1) + twop : // I haven't figured out why this needs the >>1 yet. Oh well.
mipmap_byte_offs + non_twid_addr;
If the texture is non-mipmapped, that overrides the texture start offset, and just starts at 0.
If it's VQ-compressed (which apparently always uses mipmaps?), then add the offset from the table, with a shift.
Else, 4BPP or 8BPP paletted, so different shift.
Else, uncompressed, so no shift.
The actual clamped U/V coords get shoved into the "twop" calc.
"twop_or_not" gets shifted yet again, to generate the final texture look-up address...
// Shift twop_or_not, based on the number of nibbles, bytes, or words to read from each 64-bit vram_din word.
texel_word_offs <= (vq_comp) ? (twop_or_not)>>5 : // VQ = 32 TEXELS per 64-bit VRAM word. (1 BYTE per FOUR Texels).
(is_pal4) ? (twop_or_not)>>4 : // PAL4 = 16 TEXELS per 64-bit word. (4BPP).
(is_pal8) ? (twop_or_not)>>3 : // PAL8 = 8 TEXELS per 64-bit word. (8BPP).
(twop_or_not)>>2; // Uncomp = 4 TEXELS per 64-bit word (16BPP).
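The same split in C, returning both the 64-bit word to fetch and which texel slot sits inside it. The enum and function names are hypothetical; the shifts are the ones from the Verilog above.

```c
#include <stdint.h>

typedef enum { FMT_VQ, FMT_PAL4, FMT_PAL8, FMT_16BPP } tex_fmt;

/* log2 of the number of texels held in one 64-bit VRAM word. */
static unsigned texels_per_word_log2(tex_fmt f)
{
    switch (f) {
    case FMT_VQ:   return 5;  /* 8 index bytes x 4 texels each = 32 */
    case FMT_PAL4: return 4;  /* 4bpp  -> 16 texels */
    case FMT_PAL8: return 3;  /* 8bpp  -> 8 texels  */
    default:       return 2;  /* 16bpp -> 4 texels  */
    }
}

/* Split a texel index into the 64-bit word to fetch and the texel slot within
 * it. For VQ, slot>>2 is the index byte within the word, and slot&3 picks one
 * of the four texels of that 2x2 Codebook entry. */
void texel_to_word(uint32_t texel, tex_fmt f, uint32_t *word, uint32_t *slot)
{
    unsigned sh = texels_per_word_log2(f);
    *word = texel >> sh;
    *slot = texel & ((1u << sh) - 1u);
}
```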
I feel like a lot of this could be combined into one LUT. lol
I'll try to combine the first block of code/logic first. I'll have to stare at it for a few hours.
Not quite right.
Oh, apparently you can have non mip-mapped VQ textures.
Or something.
Old code...
Ahh, it was due to 16-bit uncompressed textures for the Daytona logo etc.
But it can use a twiddled or non-twiddled texture address.
Took me all that time, just to ditch the "twop_or_not" logic.
But I still managed to break some of the textures more.
Wall texture is very rough now.
We are so back! 🤘
Not quite at full steam. lol
I worked on it for quite a few hours last night, and got hardly anywhere.
I couldn't fix the issue with the wall textures (VQ + mipmap) without reverting to the older code.
I'm once again reaching the limits of my knowledge. I'm happy to get this far, as I didn't think I'd even manage to render anything at first.
It could be doable to get the speed of PVR fast enough to be playable, if the emulator part on ARM was improved as well.
But it might never hit "full speed" on the DE10, even with the hybrid emu approach.
Something simpler that you may be interested in, as I know you like Sega and MIDI: the Saturn had a MIDI adaptor, and a couple of titles supported plugging in a MIDI keyboard. That could potentially be supported in the core with a USB MIDI keyboard (Main already supports MIDI keyboards etc.)
I am also looking for someone who can read MAME code and comments to figure out what chips are actually inside the Casio Loopy that are used for playing games (and not the printer parts). The main CPU is an SH1, and I have a suspicion the other parts may not be as complex as previously thought. Interestingly, the console is described as basically being a Casio keyboard under the hood, which, if true, is quite funny. Would be good to figure out what is actually in this thing. 🙂
https://discord.com/channels/647909397477195803/1363955274020290621
“twop”? Is that something from the DE-10 Nano’s side or the Dreamcast’s side?
Wait, I wonder what will happen to the VMU here? Like, will it be able to be seen with DS emulator-like settings on positioning it, or will it just be not visible at all?
Also would SNAC be able to handle the data load of VMU communications for a Dreamcast controller?
Prolly none of these questions matter, given that you aren’t even hoping to get more than what sounds like the GPU on the FPGA, and aren’t hoping for the core to run at more than half speed.
twop, I believe, just means "Twiddle Operation".
Twiddling is the swapping of certain bits. In this case, it's the texture address.
They interleave the bits of the U and V coord values, which changes how the texture is accessed from VRAM.
It makes the texture data more localized, even when a texture is rotated on-screen.
ie. It makes the accesses more contiguous, so it doesn't have to skip so many lines of texture in VRAM.
This helps keep the data within the same VRAM "page", which in turn helps with burst transfers.
(if you go beyond the current VRAM page, it incurs extra clock cycles, to activate the next page / row.)
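A small sketch of the twiddle for square power-of-two textures. Which coordinate lands in bit 0 (V here) is an assumption in this sketch, and non-square textures need extra handling that isn't shown. The second function counts how many DRAM pages an 8x8 texel block touches, linear vs twiddled, using a made-up page size; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interleave the bits of U and V (Morton / Z-order) for a square texture of
 * 2^log2_size texels per side: V bit b -> index bit 2b, U bit b -> bit 2b+1. */
uint32_t twiddle(uint32_t u, uint32_t v, unsigned log2_size)
{
    uint32_t idx = 0;
    for (unsigned b = 0; b < log2_size; b++) {
        idx |= ((v >> b) & 1u) << (2 * b);
        idx |= ((u >> b) & 1u) << (2 * b + 1);
    }
    return idx;
}

/* Count distinct DRAM pages touched by the 8x8 texel block at (u0, v0) of a
 * 16bpp texture, stored either row by row (linear) or twiddled. page_bytes is
 * an illustrative page size, not the real VRAM's. */
unsigned pages_touched(uint32_t u0, uint32_t v0, unsigned log2_size,
                       bool twiddled, uint32_t page_bytes)
{
    uint32_t seen[64];
    unsigned n = 0;
    for (uint32_t dv = 0; dv < 8; dv++) {
        for (uint32_t du = 0; du < 8; du++) {
            uint32_t u = u0 + du, v = v0 + dv;
            uint32_t texel = twiddled ? twiddle(u, v, log2_size)
                                      : (v << log2_size) + u;
            uint32_t page = (texel * 2u) / page_bytes;   /* 2 bytes per texel */
            unsigned k = 0;
            while (k < n && seen[k] != page)
                k++;
            if (k == n)
                seen[n++] = page;                        /* first visit */
        }
    }
    return n;
}
```

With a 2 KB page and a 1024x1024 16bpp texture, an 8x8 block touches 8 pages when stored linearly (one per row), but only 1 page when twiddled.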
I've been wanting for ages, to hook up a real DC to the o'scope, to check the basic VRAM timings for stuff.
But I haven't got my brain back into "fine soldering" mode yet. lol
tbh, things like the VMU are wayyyyy secondary to everything else atm.
But the VMU could just be added as an on-screen overlay later.
Or, real DC joypads could be hooked up via a simple SNAC adapter. It's only a few pins.
I'm still trying to figure out an efficient-ish way to do the Tag sorting, before the info gets sent to the TSP for texturing.
But it looks like any way I choose is going to use a lot of FPGA logic.
It's quite a tricky problem, and I'm not used to that kind of thing with coding.
The Tag buffer just represents which primitives (polygons) exist within the current tile.
And obviously which pixels of the tile each polygon covers.
You can't really send any of that info to the texturing unit until ALL of the polygons within the tile have been processed.
Since that is how it does the HSR (Hidden Surface Removal).
It does process 32 "pixels" in parallel, though, so gets a nice speed boost from that.
The problem is, to make VRAM access efficient, it needs to group together all of the same Tag values for texturing, so it doesn't have to keep re-reading the VQ Codebook or palette, nor vertex info from VRAM.
Quite a hard thing to do, if you think about it. Especially trying to do it without too much logic.
I already generate a bitmask for each new polygon written to the Tag buffer.
That bitmask denotes which pixels of each tile row the polygon (or part of it) is visible in.
For each new incoming polygon, I would have to write a new bitmask, but also negate the bitmasks of any previous polygon Tags that get overwritten.
That would mean updating (up to 32) 32-bit bitmasks at once.
Which would really need to be done in parallel as well.
I guess that wouldn't be too bad, if it was only working on one row at a time.
Even then, once the whole tile is processed (in the Tag buffer), the TSP would still need to increment through each Tag + Bitmask, to do the texturing.
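A software sketch of the per-row Tag buffer write, assuming a float depth per pixel and a simple "greater-or-equal wins" compare as a stand-in for the ISP's selectable depth compare modes. On the FPGA all 32 lanes evaluate in one clock; the loop is just the C equivalent. `tag_row` and `isp_write_span` are hypothetical names. The returned mask is the new Tag's bitmask, and each older Tag's mask would then be updated with `old & ~written`.

```c
#include <stdint.h>

#define TILE_W 32

/* One row of the tile's Tag buffer, plus its depth values. */
typedef struct {
    uint16_t tag[TILE_W];
    float    depth[TILE_W];
} tag_row;

/* Write one polygon span into a row. in_tri has a bit set for each pixel the
 * polygon covers; z[] holds its interpolated depth per pixel. Returns the mask
 * of pixels that passed the depth compare and were written. */
uint32_t isp_write_span(tag_row *row, uint32_t in_tri,
                        const float z[TILE_W], uint16_t new_tag)
{
    uint32_t written = 0;
    for (unsigned x = 0; x < TILE_W; x++) {
        if (((in_tri >> x) & 1u) && z[x] >= row->depth[x]) {
            row->depth[x] = z[x];
            row->tag[x]   = new_tag;
            written |= 1u << x;
        }
    }
    return written;
}
```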
Well, Dreamcast games are in 4:3. You could render the screen in 16:9... Obviously keep the DC game in 4:3, but place a small window with the VMU screen in any corner, I would guess.
I'm not a dev so I'm just taking a wild guess here
No easy way to skip to active bits in the bitmask either, unless it uses an RLE scheme.
Yes, that's true. You could overlay stuff outside of the 4:3 image.
That would only really work on the HDMI output though.
Unless you output to the CRT as anamorphic or something. lol
Yeah CRT would be tricky.
You'd probably just have to overlay it in a corner of the screen
If you allow the option of positioning it wherever you'd like, CRT gamers could put it in the best spot on a case-by-case basis.
I think the only real way to solve this Tag sorting problem, is to use a kind of linked list.
Since it won't always have 32 different Tags per row.
But can have up to 32.
It would need a way to denote how many of the 32 entries are active.
If it's only say 5 unique tags in the current row, it can say "OK, we're done for this row", then move to the next.
Or, screw all of that, and just hope a proper texture cache can keep the speed high enough. lol
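One way to do that "kind of linked list" without pointers is a compacted array of (Tag, mask) entries plus an active count. This is a hypothetical sketch, not how PVR2 does it: each new span strips its pixels from the older entries, entries that go empty are dropped (the "free entry" case), and the TSP only has to walk `count` entries per row. Since the masks stay disjoint, a 32-pixel row can never need more than 32 entries.

```c
#include <stdint.h>

#define TILE_W 32

/* Hypothetical per-row list of (Tag, pixel mask) pairs. */
typedef struct {
    uint16_t tag[TILE_W];
    uint32_t mask[TILE_W];
    unsigned count;                       /* number of live entries */
} row_list;

/* Add a new span (the mask of pixels it actually won): evict those pixels
 * from every existing entry, compact away entries that end up empty, then
 * append the new entry. */
void row_list_add(row_list *rl, uint16_t tag, uint32_t written)
{
    if (written == 0)
        return;                           /* fully hidden: nothing to add */
    unsigned out = 0;
    for (unsigned i = 0; i < rl->count; i++) {
        uint32_t m = rl->mask[i] & ~written;   /* evict overwritten pixels */
        if (m) {                               /* keep entries still visible */
            rl->tag[out]  = rl->tag[i];
            rl->mask[out] = m;
            out++;
        }
    }
    rl->tag[out]  = tag;
    rl->mask[out] = written;
    rl->count = out + 1;
}
```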
Random wireless light gun photo, FTW.
Using the "AliExpress Special" light gun board.
oh hi @rain obsidian
I don't implement tag sorting myself
instead i have a larger fpu cache
hi
Yeah, it's because I can only read one 64-bit word of texture per clock tick.
Even at 100 MHz, if it spends too long re-reading the Codebook or params, the overheads are too great.
Fastest it can render a 640x480 image at 100 MHz, texturing only one pixel at a time...
640 x 480 = 307,200 pixels
307,200 x 10 ns (one clock at 100 MHz) = 3,072,000 ns = 3.072 milliseconds.
I guess that's 325 FPS. lol
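The same back-of-envelope calc as a C function, with clocks-per-pixel as a parameter so slower cases can be compared. It ignores tile setup, DDR latency, and cache misses, so it's an upper bound. `best_case_fps` is a hypothetical name.

```c
/* Best-case frame rate if one pixel finishes every clocks_per_pixel clocks
 * and nothing else (tile setup, DDR latency, cache misses) costs any time. */
double best_case_fps(unsigned width, unsigned height,
                     double clocks_per_pixel, double clock_hz)
{
    double clocks_per_frame = (double)width * height * clocks_per_pixel;
    return clock_hz / clocks_per_frame;
}
```

At 640x480 and 100 MHz, 1 clock per pixel gives about 325.5 FPS; at 4 clocks per pixel it drops to about 81 FPS.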
But yeah, it's tricky to get decent frame rates without Tag sorting.
I've tried to think of all kinds of ways to approach it.
But most ideas involve using too much FPGA logic.
Latest ChatGPT suggestion...
Which is pretty much what I tried before, but couldn't get working.
I don't think it would help much, keeping a bitmask for which pixels each Tag relates to in a tile row.
Because the TSP would still need to check each bitmask bit to know whether to write the pixel or not.
The pixels in a tile row aren't necessarily contiguous (per Tag / polygon) either, of course.
Since parts of previous poly spans can be overwritten by new ones, leaving sporadic pixels from the previous tag(s).
However...
With the older Verilog, I was getting very good (theoretical) frame rates in the sim.
Since it had no latency for accessing random Words from VRAM etc.
But mainly, when I added the leading zeros / trailing zeros thing.
That allowed it to jump to the starting pixel of a span, but then it was also using a full Z-buffer, which is cheating.
The only other way, would be to store a list of all of the X/Y coords of all pixels relating to each Tag.
Which could be up to 1,024 values (32x32 pixel tile)... per Tag.
And I currently have to store the vertex params for about 256 (or more) Tags in the param cache.
So that's a LOT of data. lol
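The leading/trailing-zeros trick, sketched in C: count-trailing-zeros jumps straight to the first pixel of each span in a 32-bit row mask, instead of testing all 32 bits. `__builtin_ctz` is a GCC/Clang builtin (undefined for 0, hence the checks); in Verilog the equivalent would be a priority encoder. `mask_spans` is a hypothetical name.

```c
#include <stdint.h>

/* Walk the runs of set bits in a 32-pixel mask. Writes each span's first
 * pixel and length into start[]/len[] (at most 16 spans fit in 32 bits) and
 * returns the number of spans. */
unsigned mask_spans(uint32_t mask, unsigned start[16], unsigned len[16])
{
    unsigned n = 0;
    while (mask) {
        unsigned s = (unsigned)__builtin_ctz(mask);   /* first set pixel */
        uint32_t rest = ~(mask >> s);                  /* zeros = the span */
        unsigned l = rest ? (unsigned)__builtin_ctz(rest) : 32u - s;
        start[n] = s;
        len[n] = l;
        n++;
        /* Clear everything below s+l; bits below s were already zero. */
        mask &= (s + l >= 32) ? 0u : ~((1u << (s + l)) - 1u);
    }
    return n;
}
```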
OK, so...
Ignoring transparent stuff for now, the inTri thing is boolean, right?...
ie. either the Tag gets written to the Tag buffer (per-pixel), or it doesn't, depending on the Depth Compare.
Trying to figure out if I can just send a Tag buffer row directly to the TSP.
Then let it overwrite pixels in the framebuffer, based on the inTri result.
No, probably not.
That's not really "deferred" rendering at all.
that's also in the hw
Do we know if the real HW can actually read texels from four different addresses at once?
Since there are four address busses.
(would only be a 16-bit wide value from each VRAM chip, ofc.)
Another example of the Tag buffer.
Which represents the lower-right tile being drawn on Sanic.
You can kind of spot the pattern of Sanic's spike, and right hand.
A "Tag" is simply a value which increments once for each incoming polygon (or partial polygon) within the current tile.
With Tag values up to 0xB6 (182) in the example above, that's quite a lot of crap to store in the param cache.
(the param cache stores the ISP and TSP params, Vertex X,Y,Z,U,V, Base Colour, and Offset Colour, for every tag within a tile. 😮 )
I basically need an algo which could determine which pixels in the Tag buffer relate to a given Tag value, so it can efficiently group them together.
And then ofc do the same for all of the other Tag values in the buffer.
They did apparently use an RLE approach on newer PVR stuff. Not sure if PVR2 did.
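A generic run-length encoding of one Tag buffer row, as a sketch of the idea (not PowerVR's actual format): each run is a (Tag, start, length) triple. `tag_run` and `rle_tag_row` are hypothetical names.

```c
#include <stdint.h>

#define TILE_W 32

typedef struct {
    uint16_t tag;
    uint8_t  start;   /* first pixel of the run */
    uint8_t  len;     /* number of pixels       */
} tag_run;

/* Run-length encode one 32-pixel Tag buffer row. Returns the run count (1..32). */
unsigned rle_tag_row(const uint16_t tags[TILE_W], tag_run runs[TILE_W])
{
    unsigned n = 0;
    for (unsigned x = 0; x < TILE_W; x++) {
        if (n && runs[n - 1].tag == tags[x]) {
            runs[n - 1].len++;            /* same Tag: extend the current run */
        } else {
            runs[n].tag   = tags[x];      /* Tag changed: start a new run */
            runs[n].start = (uint8_t)x;
            runs[n].len   = 1;
            n++;
        }
    }
    return n;
}
```

Runs are grouped by position, not by Tag, so a Tag that gets split by another polygon shows up as two runs. Sorting the runs by Tag afterwards would let the TSP fetch each Tag's params only once.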
I keep coming back to the linked-list thing.
It already calculates the inTri bits for all 32 pixels in a tile row, within one clock cycle.
But when a new poly span comes along that overwrites part of an old one, it needs to "evict" those inTri bits from the list
(and if all of the inTri bits of an existing span get overwritten, it needs to mark that as a free entry.)
Only got as far as boosting the theoretical frame rate last night, from 15.6 to around 23 FPS.
And that was just skipping some states in the ISP parser.
So there isn't much point looking into Tag sorting and other stuff, until I can figure out how to do a pipeline.
Due to the latency of processing each thing, it's taking too many states to output a new pixel.
I need to figure out exactly what needs to pass between each stage, then do a pipeline thing.
Well I'm pretty sure Resident Evil Code Veronica (PAL version) runs at 25fps. Theoretically we only need 2 more frames (/JK)
It's always great to read your write ups and seeing pictures. All that seems so technical to me.
So it will take a hit of a few clock cycles at the start of processing a tile, but then it will be a contiguous stream of pixels written to the ARGB tile buffer (per Tag, ignoring any wait times for texture cache misses / DDR latency).
(and then double-buffered, so it can flush the finished tile to DDR via burst transfer, whilst the opposite tile buffer gets written to for the next tile.)
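A quick way to see why the pipeline matters: compare clocks for a 32x32 tile when every pixel walks through several FSM states, vs. a pipeline that pays its latency once and then outputs one pixel per clock. Both functions are simple hypothetical models that ignore texture-cache and DDR stalls; the state and stage counts are made up.

```c
/* Every pixel walks through all FSM states before the next one starts. */
unsigned long fsm_clocks(unsigned long n_pixels, unsigned states_per_pixel)
{
    return n_pixels * states_per_pixel;
}

/* A `depth`-stage pipeline: the first pixel pays the full latency, then one
 * pixel comes out every clock. */
unsigned long pipeline_clocks(unsigned long n_pixels, unsigned depth)
{
    return n_pixels ? depth + n_pixels - 1 : 0;
}
```

For a 1,024-pixel tile with 8 states (or 8 stages), that's 8,192 clocks vs 1,031.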
Oh, and cleaned up the render a bit, but the wall textures are still messed up, after I tweaked the texture address module.
I think I'm gonna struggle with any of it from this point, tbh.
It was amazing seeing skmp's emulator running on MiSTer, though.
Even with the very low frame rates, mainly due to my crusty "GPU".
I might have to pause working on this again, and get some PCB projects built.
Including the new GD Emu + HDMI board.
Whilst helping with the allotments, so I didn’t get much done in the rooms. lol
I only got as far as clearing some of the stuff from the floor, into a box. sigh
Love the random PSU haha.
Really hard not to dox yourself these days, with all the mail-order stuff. Oh well. lol
Oh yeah, I bought a US Dreamcast PSU a while ago.
To do some testing on noise + heat, when the 12V rail isn't being used.
I need to be sure I don't get it mixed up with the 240V ones. lol
Seeing all that electronic stuff always reminds me that I need to learn how to solder.
One day I will. Got a lot of stuff planned first.
Ah that makes sense.
Bits of poor Game Gears everywhere, too.
DIY 3-chip DLP projector probably won't ever happen at this rate.
I never did manage to get ANY pixels shifted from an FPGA onto the DMD.
But at least my own DMD board does work, only driven by a commercial projector directly (Optoma).
I think we've done quite well in this little house.
And I helped move most of it. lol
Don't really EVER want to build a single-pane glass greenhouse again.
OH MY GOD
It was terrifying. My brother already dropped some panes of glass on his ankle ('cos he didn't wait for me one morning.)
That picture
No way
The black and white
In your pub
My old barber's used to have a wall covered in pictures like that. He had the same one, iirc.
What a blast from the past.
Nice to see the Guinness on tap too haha.
Helped build the conservatory (and pick it all up), the summerhouse for the bar, the bar itself, the other summerhouse, all of the sheds, the planters, everything in the bar, moved the wood and slabs, bought the corner sofa for the decking two weeks ago.
Taps have never really worked. lol
Brother bought a beer cooler a few years ago, but it had a coolant leak right away. sigh
I reckon you just need some classic Wetherspoons plates and you're all good to go.
Looks amazing
He rarely drinks spirits, and this is the first time in about 18 months that we've even bought a few bottles, for his birthday last weekend.
So basically, I'm the main person who has drunk the spirits in the bar... about eight times over. lol
Just over a long time.
The only thing that makes my eye twitch is the fact he decided to offset the shelves at the back of the bar.
I don't get it. I never did. lol
He wanted to put a different amp horizontally, on the left-hand side.
But there are SO many other ways to control the amp if it was stashed under the bar etc., or just buy a smaller one.
I like symmetry. lol
I used to really hate Guinness in my teens, but I love it now.
(planters and stuff not finished yet. We only just moved them from another plot yesterday. Rolled my ankle... again. lol)
The summerhouse on the plot has a faux-leather sofa, TV, and stereo, with Mission floorstanding speakers. lol
Just don't tell them.
We also somehow managed to get the allotment plot directly behind our garden.
And they even let us put a few steps up, and a gate. 😮
That's also very good
Then my brother managed to get the plot next to that, so it's almost like we have a 60ft square garden. lol
So yeah, that's another reason why I struggle to get retro / electronics projects done.
Don't really want to do much more heavy lifting atm.
That does make sense
I didn't decorate the bar, though. I didn't have much say in it tbh. lol
The decorations look amazing TBF. That black and white sign really stands out.
That corner sofa, btw...
The day before, we saw one at a garden centre... They were asking £1,500 for it, almost the same.
My brother had a favourite search set up on Faceplant Marketplace.
We got it for £100. lol
Now that's a deal and a half right there.
Yeah, it does look pretty good now. We managed to get the bar built, about two days before the first lockdown in 2020.
So both good and bad memories, but it for sure helped us get through it.
Right, I might actually have a Guinness in a sec. lol
Catch you all soon.
Take care mate.
Place looks awesome, also love your bar.
no, afaik, no
all textures are addressed in 64-bit units
I assume it always reads from the texture cache in 64-bit units, and that the texture cache can provide 4x 64-bit units for bilinear
if it misses, then it refills
same for the VQ cache
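The 64-bit fetch idea above can be modelled as a toy cache, just to reason about hit/miss counts. This is only a sketch under assumptions: one 64-bit word per line, direct-mapped, and word addresses already computed from texel coordinates. `TextureCache64`, `read` and `read_bilinear` are hypothetical names, not the real PowerVR2 cache organisation (which is itself only assumed in the messages above).

```python
from typing import List


class TextureCache64:
    """Toy direct-mapped texture cache; each line holds one 64-bit word (assumption)."""

    def __init__(self, vram: List[int], num_lines: int = 16) -> None:
        self.vram = vram                      # backing store, one entry per 64-bit word
        self.num_lines = num_lines
        self.line_addr = [-1] * num_lines     # word address held by each line (-1 = empty)
        self.line_data = [0] * num_lines
        self.misses = 0

    def read(self, word_addr: int) -> int:
        line = word_addr % self.num_lines     # direct-mapped: the address picks the line
        if self.line_addr[line] != word_addr:
            self.misses += 1                  # miss: refill this line from VRAM
            self.line_addr[line] = word_addr
            self.line_data[line] = self.vram[word_addr] & 0xFFFF_FFFF_FFFF_FFFF
        return self.line_data[line]

    def read_bilinear(self, word_addrs: List[int]) -> List[int]:
        """Fetch the (up to) four 64-bit words covering a 2x2 bilinear footprint."""
        return [self.read(a) for a in word_addrs[:4]]
```

Reading the same four words twice costs four cold misses the first time and none the second, which is the whole point of keeping a small cache between the TSP and VRAM.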
Yes, Christmas 2038.
So we’ll be getting the Dreamcast core at the same time John Wick’s car is repaired, then.
I think I might finally have figured out a method for Tag sorting.
I couldn't find a better image atm, but you get the idea.
When the HSR process is done, you end up with a full Tag buffer.
Each Tag value just relates to a specific polygon (or partial).
But during the HSR processing, the Tags won't always be in contiguous spans/rows, because earlier Tag values get overwritten whenever a later polygon span covers the same pixels and those pixels pass the Depth-compare test.
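A minimal sketch of how the Tag buffer ends up non-contiguous, assuming a 32x32 tile, one constant depth value per span, and a "larger z wins" compare. The hardware's compare mode is configurable, so treat that as an assumption; `new_tile` and `hsr_write_span` are hypothetical helpers.

```python
TILE = 32  # 32x32 pixels per tile, i.e. 1,024 Tags


def new_tile():
    """Empty Tag buffer (Tag 0 = nothing) and depth buffer (0.0 = farthest)."""
    tags = [[0] * TILE for _ in range(TILE)]
    depth = [[0.0] * TILE for _ in range(TILE)]
    return tags, depth


def hsr_write_span(tags, depth, row, x0, x1, z, tag):
    """Write one polygon span [x0, x1) on `row` at constant depth `z`.

    Assumption: larger z = nearer (a 1/w-style value) with a strict
    greater-than compare; only pixels that pass get the new Tag.
    """
    for x in range(x0, x1):
        if z > depth[row][x]:
            depth[row][x] = z
            tags[row][x] = tag
```

Writing Tag 1 across row 0, then a nearer Tag 2 over pixels 8..15, leaves that row as 1s, 2s, then 1s again, so Tag 1 is no longer one contiguous span. A farther span written afterwards changes nothing.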
For every incoming polygon, I currently store the ISP/TSP/TCW, Vertex, and Colour params in a buffer/cache.
Which is obviously very wasteful, but I'll figure that out later.
Even storing only 256 sets of params gives half-decent renders.
Anywho, for doing the Tag sorting, I just have to start rendering pixels from the FRONT Tag, then work front-to-back.
For each row in the Tag buffer, I'll also store a bitmask (inTri word) for which pixels in the row the Tag is active in.
So then I can just work my way from the "front" Tag layer backwards.
Each time a pixel is filled in (in the Tile ARGB buffer), its bit gets cleared from the bitmask of the next Tag down (ANDing with the NOT of the filled bits).
Once all of the bitmask bits in a row are zero, it will just skip that row, until ALL rows are zero, then it knows it's rendered all pixels within the tile.
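The front-to-back pass can be sketched in software, assuming each Tag has 32 inTri row words (bit x = pixel x) and that a front-to-back order of the Tags is already known. This variant keeps one "still unshaded" mask per row instead of rewriting each Tag's own bitmask, which has the same effect of never shading a pixel twice. It does not handle intersecting polygons, where no single per-polygon order exists. `shade_front_to_back` is a hypothetical name.

```python
from typing import Dict, List, Tuple

ROWS = 32
FULL_ROW = (1 << 32) - 1  # all 32 pixels of a row still unshaded


def shade_front_to_back(order: List[int],
                        in_tri: Dict[int, List[int]]) -> Tuple[List[List[int]], int]:
    """Shade each pixel once, taking Tags front-most first.

    order  -- Tag values, front-most first (assumed to be known already)
    in_tri -- Tag -> 32 row bitmasks, bit x set = Tag covers pixel x
    Returns (32x32 grid of the Tag that shaded each pixel, Tags visited).
    """
    remaining = [FULL_ROW] * ROWS            # 1 bits = pixels not shaded yet
    shaded_by = [[0] * 32 for _ in range(ROWS)]
    visited = 0
    for tag in order:
        if not any(remaining):
            break                            # whole tile done: skip remaining Tags
        visited += 1
        for row in range(ROWS):
            if remaining[row] == 0:
                continue                     # row finished: skip it
            hits = in_tri[tag][row] & remaining[row]
            bits = hits
            while bits:
                x = (bits & -bits).bit_length() - 1  # lowest set bit = pixel x
                shaded_by[row][x] = tag              # stand-in for the ARGB write
                bits &= bits - 1                     # clear that bit
            remaining[row] &= ~hits & FULL_ROW       # AND-NOT: mark pixels done
    return shaded_by, visited
```

If the front Tag covers the whole tile, the loop stops after that one Tag and everything behind it is never touched.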
Kind of hard to explain, but at least now I have something to work with.
Need to go out now, to help my brother fix the brakes on his van.
Oh, could also store a 32-bit bitmask for which rows of the Tag buffer contain active pixels of the current Tag. That way, the TSP should be able to skip whole rows.
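The per-Tag row mask could be derived from the same 32 inTri words, with bit r set when row r has any active pixels. The TSP would then only visit rows whose bits are set. A small sketch; `row_occupancy` and `active_rows` are hypothetical helper names.

```python
from typing import List


def row_occupancy(in_tri_rows: List[int]) -> int:
    """Bit r of the result is set if row r has any active pixel for this Tag."""
    mask = 0
    for r, word in enumerate(in_tri_rows):
        if word:
            mask |= 1 << r
    return mask


def active_rows(row_mask: int) -> List[int]:
    """Row indices the TSP would actually visit (set bits, lowest first)."""
    rows = []
    while row_mask:
        low = row_mask & -row_mask        # isolate the lowest set bit
        rows.append(low.bit_length() - 1)
        row_mask ^= low                   # clear it
    return rows
```

In hardware this is just a 32-input OR per row word, so it could plausibly be generated while the inTri words are being written.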
I’m still in the van. lol
For transparent polys, I think that can work, too. Just calculating the pixel colour as it works through each layer.
Failing that, it would have to render from the ‘back’ Tag layer, but that’s harder.
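For translucent layers, working front-to-back is the standard "under" compositing order: keep a running colour plus how much of whatever is behind still shows through, and stop once the pixel is effectively opaque. This sketch assumes plain alpha "over" blending with colours in 0..1; other blend modes may not reorder this cleanly. `composite_front_to_back` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_front_to_back(layers: Sequence[Tuple[RGB, float]],
                            background: RGB) -> List[float]:
    """Blend translucent layers front-most first ("under" operator).

    layers     -- (rgb, alpha) pairs, front-most first, values in 0..1
    background -- opaque colour already behind all the layers
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                  # fraction of what's behind still visible
    for rgb, alpha in layers:
        for i in range(3):
            color[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-6:
            break                        # effectively opaque: deeper layers can't show
    for i in range(3):
        color[i] += transmittance * background[i]
    return color
```

Two 50% layers (red in front of blue) over green give (0.5, 0.25, 0.25), the same result as blending back-to-front with "over".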
And now, as always when I get a eureka moment, I’m stuck bleeding the brakes on a Ford POS.
I think I might be able to reuse the inTri logic during the TSP stage. At the same time, checking each inTri bitmask to generate the ‘per row’ bitmask.
Capri? XR3i, 1978 Cortina?
@halcyon creek It's actually a Ford Transit, but the "joke" is from Men In Black (1997)...
A great moment in automotive movie history.
https://discord.com/channels/647909397477195803/1062880066100531210 Has anyone actually decapped the VMU?
Not sure if the CPU has been, but that isn't required to make a core of it. There's a half-done one someone did as a uni project. It "just" needs someone with the skill level to handle a CPU to pick it up.
Is the Sega Mega-Genesis-Drive too complicated to make an accurate core of without decapping or something?
Or was the old "Genesis" core just based off of a REALLY bad guess on the hardware?
The old Genesis core was excellent
The new core is a really inefficient way to code a core, doing a wire-by-wire recreation, but it did fix a few timing bugs. There's a reason Sorg didn't want to port over the SMS core done this way.
is the current SMS core based on the genesis core?
No, it is its own thing
oh it's SMS and game gear that are the same core, yeah?
Yeah
A Nuked SMS core would lose a lot of things that the current SMS core does; it would be two steps forward, three steps back
Yeah, that's a weird balance. I imagine the wire-by-wire recreation is the reason that the core is so full, and doesn't have room for more stuff
Yeah, the Megadrive core is really inefficient because it is wire-by-wire, and as a result it is much bigger than it needs to be, so it takes up way more space than the old one. Being coded that way also makes it much harder to add extra features to, before even considering the lack of space left
People are really forgetting that accuracy is always the priority when it comes to the MiSTer project.
If you really want to pump games with extra options, just boot up a software emulator instead.
The VMU cannot be more complicated than the Sega Genesis, so ritual sacrifice (decapping) of one for a hyper-accurate core doesn’t seem too far-fetched to me.
The blocker to there being a VMU core isn't a lack of info on the chip, there is a data sheet for it out there and it is already in Mame.
https://github.com/mamedev/mame/blob/master/src/devices/cpu/lc8670/lc8670.cpp
The reason there isn't a core is because nobody with the skillset to make the core is interested, or hasn't decided they want to spend their time on it.
If this Sanyo CPU is from the late 90s, then it may not be possible to decap it and get good pictures, as by the mid 90s chips had more layers stacked on top of each other, so it isn't feasible to decap them and photograph all of the layers. Unlike chips from the 80s, which were one layer.
Even if someone did a decap, and it worked, and good photos were taken, someone with the skills would then need to trace the whole thing out, which takes a long time. Then someone would need to turn that into FPGA code.
The end result, if even possible, would be less efficient and harder to read than building a core using the datasheet, other resources like Mame and probing the hardware.
For the SMS core, it would be better if someone were to fix the two remaining test fails, possibly using the Nuked core as a reference, than to throw it out and port over the Nuked one, losing all the additional mappers, GG support, and peripherals, as well as extra features like cheats, which would be convoluted to try to add into the Nuked core.
I'm still stuck on this stupid problem.
I managed to speed up the render times on the sim yesterday, from around 15 FPS up to 26 FPS.
But most of that was "cheating", by removing unneeded states, etc.
I don't think there is any simple way to generate the RLE output from the polygon spans on-the-fly (as the spans get written to the Tag buffer).
Without it using a LOT of logic.
But maybe it's not too bad to just store the inTri bits for all 32 rows in some BRAM.
That would require 128 Bytes worth, per Tag.
A render like the Daytona one has a max Tag value per-tile of around 600.
So call it 768 Tags in BRAM, times 128 Bytes = 96 KBytes.
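The BRAM sizing above works out like this (32 rows x 32 bits per Tag); the helper name is just for illustration.

```python
def in_tri_bram_bytes(num_tags: int, rows: int = 32, bits_per_row: int = 32) -> int:
    """Bytes needed to keep every row's inTri word for `num_tags` Tags."""
    return num_tags * rows * bits_per_row // 8


per_tag = in_tri_bram_bytes(1)          # 32 rows x 32 bits = 128 bytes
budget = in_tri_bram_bytes(768)         # ~600 Tags in Daytona, rounded up to 768
print(per_tag, budget, budget // 1024)  # 128 98304 96
```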
It would still need to render the "Tags" in front-to-back order.
Because if it was back-to-front (ie. processing from the first incoming polygon to the last), there's no simple way of updating ALL of the previous inTri bitmasks when spans get overwritten by newer polys.
Converting the finished Tag buffer (example above) to RLE could be doable as well, but I think it would take too many clock cycles.
If I temporarily disable the final rendering output in the sim, the Daytona render gets a frame time which would hit 70 FPS, but that's only if the core could hit 100 MHz.
Basically, the whole reason to group the Tag values together is so that you only need to read the texture into cache ONCE.
(well, you have to read another chunk of the same texture if need-be, but you get the idea.)
With the current scheme of just rendering each Tag value as it appears in the Tag buffer, it would obviously be even slower on the FPGA, due to the DDR / SDRAM latency etc. That RAM is only really efficient when doing longer Burst transfers.
I tried looking through multiple patents to see if I could find how they optimized things.
But didn't find too much detail on the Tag sorting part.
Maybe doing the RLE thing on the Tag buffer isn't so bad.
As it takes a lot of clock cycles just to read in a chunk of texture or VQ Codebook anyway.
That could be enough time to offset the time taken for the RLE thing.
(RLE is important, not only for sending less data to the TSP, but so the TSP can quickly skip over pixels and rows which are not covered by the current Tag / polygon.)
Doing the RLE conversion on a 32-bit inTri bitmask looks fine. ChatGippity can help with that.
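RLE on a single 32-bit inTri word just means emitting a (start, length) pair for each run of set bits, which by construction can never produce a zero length. A minimal reference model, assuming bit 0 is the leftmost pixel, that a Verilog version could be checked against; `rle_row` is a hypothetical name.

```python
from typing import List, Tuple


def rle_row(mask: int, width: int = 32) -> List[Tuple[int, int]]:
    """Runs of set bits in an inTri word as (start_pixel, length) pairs."""
    runs = []
    x = 0
    while x < width:
        if (mask >> x) & 1:
            start = x
            while x < width and (mask >> x) & 1:
                x += 1                     # extend the run while bits stay set
            runs.append((start, x - start))
        else:
            x += 1                         # skip uncovered pixels
    return runs
```

An empty row gives no runs at all, so the TSP can skip it, and a fully covered row collapses to a single (0, 32) run.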
But I don't think there are many ways around storing the 32 inTri words for every incoming poly / tag.
It's sort of currently doing this.
But with a single Tag value assigned to each triangle (ie. the same value across all active pixels of the triangle).
Obviously any incoming triangles that pass the depth test will overwrite the previous triangle's pixels.
(I do also store the Z values for each pixel. The Tag buffer is a combined Tag / Z buffer really.)
I can't use the Z values to help with the RLE thing, because that would be a LOT of data.
What we do know, is the Tag value of the last poly spans written to the Tag buffer.
I can't think of a better way, than to render from the "front" polygon, then working front-to-back.
(marking off how many pixels in each row have been shaded, until all pixels are done.)
This is probably obvious to most of you, but code from ChatGpt rarely works as stated. lol
I get that it's the users that "train" the language model, but it still has a way to go yet.
It'll be wild when it gets there though. It's only a matter of time.
Tried some ChatGippity code earlier. Didn't work.
Well, didn't quite work.
It outputs RLE length = 0, which should never happen.
Eventually gets down to a row where the "02" tags start.
But even those don't combine RLE values into one group. lol
I think this might be a doable method, as it could start the process whilst the TSP is still fetching a chunk of texture for the first Tag (polygon) found.
I need to test a smaller part of code from ChatGippity first.
Just need it to output valid RLE values for the stuff currently in the (completed) Prim-Tag buffer.
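A software reference for RLE over one row of the completed Prim-Tag buffer might look like this: consecutive equal Tags merge into one (tag, start, length) run, and a new run is only opened when the Tag changes, so zero lengths can't occur. Handy for comparing against sim output; `rle_tag_row` is a hypothetical name and the run format is an assumption.

```python
from typing import List, Tuple


def rle_tag_row(row: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse a row of Tag values into (tag, start, length) runs."""
    runs: List[Tuple[int, int, int]] = []
    for x, tag in enumerate(row):
        if runs and runs[-1][0] == tag:
            t, start, length = runs[-1]
            runs[-1] = (t, start, length + 1)   # same Tag: extend the current run
        else:
            runs.append((tag, x, 1))            # Tag changed: open a new run
    return runs
```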
Might be enough logic spare later, to have it do that on more than one value at a time.
I realize this might never get close to "full speed" on MiSTer, btw, even with skmp's port of the emu part on the ARM.
But if I could even get it running well enough to hit 15-20 FPS in most games, at say 50 MHz, that would be a great proof-of-concept.
And hopefully inspire others to collab, and aim at Dreamcast for a future FPGA board maybe.
Failing that, Robert might just release a full Dreamcast core out-of-the-blue. lol
Makes sense, given the VMU has only around 25 official games for it, the games are still made under the assumption that you have a Dreamcast (so not all of the games work well standalone), and I doubt homebrew would expand that much more, else the VMU thread wouldn’t be dead, and a core would be talked about more.
What you could do is some research into what games are on the VMU, make a list, highlight the best ones, note what homebrew was made for it, and do a write-up in the VMU thread. If a good case is made that there is interesting stuff on there, then there is more chance of a dev deciding to pick it up.
I could be wrong but I don't think the CPU in it is used elsewhere, but would be worth double checking that.
Still working on this.
I have a slightly different strategy.
I asked ChatGippity to write some Verilog that simply generates RLE output from the 1,024 Tags in the Tag buffer.
For some tiles, all 1,024 pixels will have the same Tag value (like some of the "sky" texture tiles in Daytona, for example).
That would take a minimum of 1,024 clock cycles to determine that, so that's the worst-case latency for when the first RLE value gets output.
But, you can offset that latency...
By just having two Tag buffers, and doing the double-buffering thing.
So you would incur that 1,024 clock cycle delay at the start of processing, but then the ISP would be able to continue processing the NEXT tile whilst the TSP is doing the texturing for the current tile.
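A rough cycle model of the double-buffering idea, assuming the ISP (HSR plus the 1,024-cycle RLE scan) and the TSP each take a fixed number of cycles per tile, and that the ISP can't reuse a Tag buffer until the TSP has finished the tile in it. Both function names are hypothetical.

```python
from typing import List


def serial_cycles(isp: List[int], tsp: List[int]) -> int:
    """One Tag buffer: the ISP and TSP take turns on every tile."""
    return sum(i + t for i, t in zip(isp, tsp))


def ping_pong_cycles(isp: List[int], tsp: List[int]) -> int:
    """Two Tag buffers: the ISP fills one tile while the TSP shades the previous one.

    isp[n] / tsp[n] are cycle costs for tile n (assumed fixed per tile).
    Tile n uses buffer n % 2, so the ISP can't start tile n until the TSP
    has finished tile n - 2, which used the same buffer.
    """
    isp_done: List[int] = []
    tsp_done: List[int] = []
    for n, (i, t) in enumerate(zip(isp, tsp)):
        start = isp_done[n - 1] if n >= 1 else 0
        if n >= 2:
            start = max(start, tsp_done[n - 2])   # wait for the buffer to free up
        isp_done.append(start + i)
        tsp_start = max(isp_done[n], tsp_done[n - 1] if n >= 1 else 0)
        tsp_done.append(tsp_start + t)
    return tsp_done[-1] if tsp_done else 0
```

With made-up costs of 1,024 ISP cycles and 2,000 TSP cycles per tile, three tiles take 9,072 cycles serially but 7,024 double-buffered, so only the first tile's ISP latency is exposed.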
The only problem with all of this, is how to handle the translucent polys. lol
Once I have the ISP outputting RLE reliably, I can then separate the texture_address unit, and shove all of that into its own TSP module.
Oh, and the RLE code from ChatGippity does apparently group the Tags together, which is exactly what I need.
(that way, the TSP only needs to read the chunks of texture for ONE tag value at a time, which will prevent it wasting time re-reading stuff from VRAM.)
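Grouping the runs by Tag for the whole tile gives the TSP a list of spans per Tag, so it can fetch one Tag's texture chunk and shade all of that Tag's spans before moving on. A sketch, assuming a 32x32 tile of Tag values; Tags come out in first-appearance order, which is just one possible ordering, and `group_runs_by_tag` is hypothetical.

```python
from typing import Dict, List, Tuple


def group_runs_by_tag(tile: List[List[int]]) -> Dict[int, List[Tuple[int, int, int]]]:
    """Map each Tag to its (row, start, length) spans across the whole tile."""
    groups: Dict[int, List[Tuple[int, int, int]]] = {}
    for y, row in enumerate(tile):
        x = 0
        while x < len(row):
            tag = row[x]
            start = x
            while x < len(row) and row[x] == tag:
                x += 1                                   # walk to the end of this run
            groups.setdefault(tag, []).append((y, start, x - start))
    return groups                                        # dicts keep insertion order
```

A "sky" tile where all 1,024 pixels share one Tag collapses to a single entry with 32 full-row spans, so the texture only needs to be fetched for that one Tag.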
Depending on how the texture cache is written, it would be possible later, to render two, four, or more pixels at once.
Then dump the finished Tile ARGB back to DDR.
I know this project has been going on for a very long time already, but I don't just want to let it die (yet).
I'll continue to work on it when I can.