#Sega Dreamcast
Don't get it.
Can't see what changed, as I changed too much code at once. lol
I'm getting strong Deja Vu with this.
It gets confusing, with the address shifts you need, to convert between BYTE and Word (16-bit, 32-bit, 64-bit) addressing.
I'll just have to remove the extra two LSB bits from opb_word, and add poly_addr to SignalTap, as I'm sure that's where the issue is atm.
Oh, I suppose it already is the BYTE address now.
Not sure how that happens, but must be some difference between the sim and FPGA versions.
ie...
opb_word is 0x80166780.
We take the lower 21 bits from that [20:0], and I can see it's already the BYTE address.
So no need to shift.
poly_addr <= (PARAM_BASE&24'hf00000) + opb_word[20:0];
It actually gets added to some bits from PARAM_BASE as well, to kind of mask which 1MB block of VRAM the data is in.
I REALLY want to see this drawing something tonight, so I can sleep a bit better.
Something more fundamental is going on, like a shifted address in the PVR module.
I might have to leave it for tonight, and have a look tomorrow.
OK, so the shift of the lower 21 bits of opb_word was correct.
But it was using the wrong thing for the poly addr, due to a different bug.
Still awake.
Can't get it to compile with everything in place.
It complains about not having enough registers for uninferred mem.
But I did both the Z-buffer and param buffer as proper IP blocks now.
The frustrating thing is, I can't tell for sure which part is using the most logic until it compiles.
And it refuses to get past the error.
I started merging the two projects (Verilator sim, and Quartus project).
This is what it looks like in the sim now.
Which is more than I was hoping for, because now I can debug that.
A tweak...
Temporarily disabled the vram_wait emulation.
Which suggests something in the code isn't waiting for that at all, and expecting the data too soon.
Back to using registers for the param buffer...
A few glitches, and transparency disabled atm.
Because the core now outputs 16BPP for writes to the framebuffer, to match what the Quartus version is doing.
So no Alpha value is being blended with the existing framebuffer stuff in the sim atm.
Alpha kludge...
The rear window on the car wasn't being rendered properly anyway, due to other issues.
I'm basically trying to copy parts from each project, to eventually merge them.
So that any changes in the sim should work on the FPGA.
(aside from stuff like PLLs and BRAMs, which have to be instantiated using IP blocks in Quartus.)
Got rid of the vertical lines...
Quartus is being a proper bitch.
I think it's the Codebook cache causing most of the issues atm.
It also needs to be re-written, to infer using BRAM correctly.
Are you the guy that makes the awesome shaders for RetroArch?
Unfortunately not. You probably refer to Sonkun 😜
That's it lol. Either way, a pleasure to meet you sir 😀
I gave up trying to fix that code last night, and went back to the old one.
You can see now, how the first proper opb_word has the lower part set to 0x990.
And that IS the (32-bit) WORD address, so it was correct to shift it left by two bits, to get the BYTE address.
So that's the start offset in VRAM, for the first triangle of the first primitive.
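A minimal sketch of that WORD-to-BYTE conversion, for anyone following along. The module name and port widths are assumptions made for illustration, not taken from the actual core; only PARAM_BASE, opb_word, poly_addr and the 24'hf00000 mask come from the messages above.

```verilog
// Hypothetical helper: turn a 32-bit WORD offset into a VRAM BYTE address.
module word_to_byte_addr (
    input  wire [23:0] param_base,  // PARAM_BASE register (byte address)
    input  wire [31:0] opb_word,    // lower 21 bits hold the WORD offset
    output wire [23:0] poly_addr    // resulting BYTE address
);
    // Each 32-bit word is 4 bytes, so appending two zero LSBs is the same
    // as shifting the word offset left by two bits.
    wire [22:0] byte_offset = {opb_word[20:0], 2'b00};

    // Keep only the block-select bits of PARAM_BASE, then add the offset.
    assign poly_addr = (param_base & 24'hF00000) + {1'b0, byte_offset};
endmodule
```

With a WORD offset of 0x990, byte_offset comes out as 0x2640.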
At least now, I know what needs to be done. I can start adding each part more slowly.
I thought about even setting Quartus to use a much larger FPGA in the same Cyclone V family, JUST to get it to compile. lol
It's a bit dumb, that it can give you an error about using too much logic, but it won't give you a rough estimate of how much that is.
(I realize it can't give an exact figure until it has got all the way through the Fitter process, but still.)
You know you've done something wrong, when Quartus has been running for an HOUR, and not moved. lol
This was me trying to be clever, and do the Z-clear faster...
if (clear_pend) begin
    z_buffer[ clear_cnt ] <= 32'h00000000;
    z_buffer[ clear_cnt+10'd511 ] <= 32'h00000000;
    clear_cnt <= clear_cnt + 10'd1;
    if (clear_cnt==10'd511) begin
        clear_done <= 1'b1;
        clear_pend <= 1'b0;
    end
end
else begin
    if (clear_z) begin
        clear_cnt <= 10'd0;
        clear_pend <= 1'b1;
    end
    if (inTriangle && depth_allow && !z_write_disable) z_buffer[ z_buff_addr ] <= z_in;
end
But I don't think it can clear both halves of the array at once, and infer that as two BRAMs.
Quartus isn't quite smart enough to infer that, so it's probably trying to implement the Z-buffer using all REGISTERS.
Which is amazingly wasteful.
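For comparison, a sketch of a clear that keeps to one write per clock, which is the pattern synthesis tools generally recognise as a RAM with a single write port. The module name, the 1024-entry depth and the separate read port are assumptions for illustration. Whether Quartus actually maps it to BRAM also depends on the device and the read-during-write settings.

```verilog
// Sketch only: one write per clock, so the array has a single write port.
module z_buffer_1w (
    input  wire        clock,
    input  wire        clear_z,      // pulse to start a clear
    input  wire        wr_en,        // normal Z write enable
    input  wire [9:0]  wr_addr,      // normal Z write address
    input  wire [31:0] z_in,
    input  wire [9:0]  rd_addr,
    output reg  [31:0] z_out,
    output reg         clear_done
);
    reg [31:0] z_buffer [0:1023];    // assumed depth
    reg [9:0]  clear_cnt  = 10'd0;
    reg        clear_pend = 1'b0;

    // Mux address/data/enable first, so there is exactly ONE write statement.
    wire        we   = clear_pend | wr_en;
    wire [9:0]  addr = clear_pend ? clear_cnt    : wr_addr;
    wire [31:0] data = clear_pend ? 32'h00000000 : z_in;

    always @(posedge clock) begin
        if (we) z_buffer[addr] <= data;
        z_out <= z_buffer[rd_addr];  // registered read

        clear_done <= 1'b0;          // default; overridden on the last entry
        if (clear_pend) begin
            clear_cnt <= clear_cnt + 10'd1;
            if (clear_cnt == 10'd1023) begin
                clear_pend <= 1'b0;
                clear_done <= 1'b1;
            end
        end
        else if (clear_z) begin
            clear_cnt  <= 10'd0;
            clear_pend <= 1'b1;
        end
    end
endmodule
```

The clear takes twice as many clocks as the two-writes-per-clock attempt. Splitting the buffer into two separately declared arrays, each with its own single write, would be one way to get the speed back.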
I genuinely think that waiting for Quartus compiles is harder than actually writing a core in the first place.
I reverted that code, and re-ran the compile. Wasting time doing it this way.
I need to do everything on the sim first.
It was the Codebook cache hogging all of the registers.
I've started slowly adding other parts of the code.
The CB cache can be fixed last.
Really strange, but the problem might actually be in the texture_address module.
I'll get there in the end. I just need to keep modifying the code until it compiles.
Then I can figure out which parts it's OK with.
If I can just get the Tag buffer stuff working (and fitting), that would be awesome.
The Codebook cache improved the sim frame times by about 30-100%...
This is painfully slow, but I can't think of how else to approach it.
It's rendering the same as before atm, with some template code added.
Trying the Codebook cache next...
Been loving all the Dreamcast core work @rain obsidian. Don't understand most of it, but keep pushing!
Thanks, @steady grail 😉
It's mostly a constant battle of will power. lol
ie. I know most of what needs to be done now, and how I want it to work.
But struggling with Quartus compiles, as all of the code changes are quite complex, and quite a lot needs to be modified to fit the FPGA.
Failed with the same error again.
So either the Codebook cache just needs to be re-written (probably), or it's allowing other logic to work now, which simply won't fit the FPGA.
Back to the old non-cached Codebook for now.
The more important thing, is to get the Param buffer and Tag buffer stuff working.
That's where the biggest speed-up is.
I might have to buy yet another new FPGA dev board soon.
It makes sense to use a larger FPGA first, so you can see what fits, and what needs improving / trimming / culling.
I honestly don't think the speed-up logic needs THAT much.
The buffers are pretty much just some small-ish BRAM blocks with a bit of logic around them.
Another example of why the DDR latency sucks...
And that's not even fetching texels nor rendering pixels at that point. lol
It's only reading in the vertex params for the first polygon.
It might be tricky to figure out burst transfers for that, since the number of Words it reads is variable, based on various flags in the first few words.
Found one of my main screw-ups...
reg isp_switch;
always @(posedge clock or negedge reset_n)
if (!reset_n) begin
    isp_switch <= 1'b0;
end
else begin
    if (render_poly || render_to_tile) isp_switch <= 1'b1;
    if (poly_drawn || tile_accum_done) isp_switch <= 1'b0;
end
assign vram_addr = (isp_switch) ? isp_vram_addr_out : ra_vram_addr;
assign vram_rd = (isp_switch) ? isp_vram_rd : ra_vram_rd;
assign vram_wr = (isp_switch) ? isp_vram_wr : ra_vram_wr;
assign vram_dout = isp_vram_dout;
I don't have a proper arbiter for access to memory yet.
ie. where each module asserts a Request signal, then waits for access.
I just have the isp_switch flag.
It starts off as 0 after reset.
Which lets the RA (Region Array) parser read from VRAM.
The RA then pulses render_poly or render_to_tile, which flips isp_switch to 1.
That allows the signals from the ISP to control VRAM access.
Then, either the poly_drawn or tile_accum_done pulse from the ISP parser flips isp_switch back, so the RA has control again.
The problem was...
"render_to_tile" and "tile_accum_done" weren't added originally. Oops. lol
Those were extra signals I added, for the new Tag buffer (Hidden Surface Removal) stuff.
It didn't break things in the sim, because the sim C code can always supply the data to the Verilog model instantly.
So, when the ISP parser was immediately trying to assert vram_rd, the isp_switch flag was still at 0.
So the DDR controller never saw the Read request, so would obviously never reply by setting DDRAM_DOUT_READY.
(Data out Ready).
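A rough sketch of what the "proper arbiter" mentioned above could look like for two clients. This is not code from the core: the module, the ra_req / isp_req / ra_gnt / isp_gnt names and the tie-break priority are all assumptions for illustration.

```verilog
// Hypothetical two-client arbiter; signal names are illustrative only.
module vram_arbiter (
    input  wire clock,
    input  wire reset_n,
    input  wire ra_req,    // Region Array parser wants VRAM
    input  wire isp_req,   // ISP parser wants VRAM
    output reg  ra_gnt,
    output reg  isp_gnt
);
    always @(posedge clock or negedge reset_n)
    if (!reset_n) begin
        ra_gnt  <= 1'b0;
        isp_gnt <= 1'b0;
    end
    else begin
        // Drop a grant once its owner stops requesting.
        if (ra_gnt  && !ra_req)  ra_gnt  <= 1'b0;
        if (isp_gnt && !isp_req) isp_gnt <= 1'b0;

        // Hand out a new grant only when nobody currently holds one.
        // ISP wins ties here (an arbitrary choice).
        if (!ra_gnt && !isp_gnt) begin
            if (isp_req)     isp_gnt <= 1'b1;
            else if (ra_req) ra_gnt  <= 1'b1;
        end
    end
endmodule
```

The grants would then drive the muxes in place of isp_switch, and a client would only assert vram_rd once it sees its grant. Each client would also need to hold its request until its last data word has arrived, so a reply can't land after ownership has changed.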
I'm probably on the 7th Quartus compile of the night.
I NEED to get this further along now. It's getting ridiculous. lol
I guess I could just do some of those signals as an OR.
But I tried that last night, and it didn't work, 'cos other stuff was broken.
Great.
It's back to drawing one tile of garbage, in the corner of the screen again. lol
Aaaand, I was missing another signal.
No wonder people get burnout.
"tile_accum_done" was being output by the isp_parser module, and the input existed on the ra_parser as well.
But the output from the ISP parser wasn't being used. sigh
Compiles are taking about 27 minutes atm, and using 72% of the FPGA.
A bug that's a bit too complex to explain. lol
Basically, the ISP parser was triggering one extra vram_rd pulse on the final pixel of the tile.
But it was jumping back to idle state at the same time, which switched access of VRAM back to the RA parser.
So the RA parser was receiving data that the ISP had requested, which screws up everything else down the line.
Progress.
Ironically, some of the textures are drawn, but flat-shaded stuff is wrong.
Found a few other bugs.
The Z buffer wasn't being cleared before each new tile.
And the depth_compare modules were getting the full output from the Z buffer, but that includes the prim_tag and Z value.
So the compare probably wasn't working right.
On the sim, the Z value is 48-bit.
On the FPGA atm, it's set to be only 32-bit.
assign prim_tag_out = z_prim_out[40:32];
assign z_out = z_prim_out[31:0];
I was using the full z_prim_out for the depth compare, basically.
Renders still look the same.
Using 77% of the FPGA now.
17% BRAM (on-chip mem)
Quite a few of the renders get stuck atm.
It does have some kind of structure in the models, but the colours are obviously flooding whole tiles atm.
isp_state runs up to state 49, reading in the vertex params from VRAM.
prim_tag increments for each primitive (which can be a collection of triangles).
pcache_write writes the vertex params for the triangle to the buffer.
The short burst at the bottom of the screen is when it's doing the inTri and Z-compare on the new triangle, vs what's in the Z-buffer from the previous triangle.
It does that on 32 pixels at once, as it's the only way they would have got the speed necessary.
So there are 32 depth_compare modules.
And the inTri module also does the edge equation stuff on 32 pixels at once, so a whole tile row.
So it can quickly check down all 32 rows, and will write the prim_tag number to any "pixels" in the buffer, if those pixels pass the Z-compare and inTri checks.
wire [31:0] z_write_allow = (inTri & depth_allow) | {32{z_clear_busy}}; // inTri & depth_allow Bitwise AND.
inTri is 32-bits.
Each bit represents whether a pixel of the triangle exists in each column of the current tile row.
depth_allow is also 32-bits, and comes from the 32 depth compare instances.
Which is just this...
always @(*) begin
    case (depth_comp)
        0: depth_allow = 0; // Never.
        1: depth_allow = (IP_Z < old_z); // Less.
        2: depth_allow = (IP_Z == old_z); // Equal.
        3: depth_allow = (IP_Z <= old_z); // Less or Equal
        4: depth_allow = (IP_Z > old_z); // Greater.
        5: depth_allow = (IP_Z != old_z); // Not Equal.
        6: depth_allow = (IP_Z >= old_z); // Greater or Equal.
        7: depth_allow = 1; // Always
    endcase
end
old_z is the previous Z value read from the Z-buffer, which was written by the previous triangle (or not).
IP_Z is the new incoming Z value, from the Z-interp module. That is also calculated for all 32 pixels at once. I forgot about that. lol
There is an issue, though, now that I've implemented the Z/Tag buffer using BRAM.
I need to read the OLD values from the Z-buffer, do the depth compare, then write the new values back to the same row.
But there's usually a delay of one clock cycle to get valid data out of BRAM, after the address (row) changes.
You can choose to disable the "Registering" of the output data on the BRAM block.
Which means it will spit out data as quickly as it can from the SRAM cells.
But it's not ideal to not register that data, because it takes finite time for the values to "settle", and all of the bits to become valid.
I might get away with it, though, as I'm constantly incrementing the address each time.
But then it would have incremented to the next address after doing the compare on the previous clock.
The BRAM for the Param buffer isn't so timing-critical.
Well, it is. lol
But it means I'm having to add extra clock cycles when new params need to be loaded from the buffer.
I wish I knew how to draw decent diagrams of this stuff. I might have to ask a dumb AI again.
The isp_parser code from the FPGA, running on the sim...
But with a register-based Z-buffer, for obvious reasons. lol
The fact it's rendering at all on the FPGA, and with somewhat correct colours.
And the fact the isp_parser code is mostly doing something.
Points to an issue with the Z buffer, I guess.
The prim_tag and Z values are stored alongside each other. So the Z-buffer is dual-duty.
Testing it with the register-based Z buffer in Quartus, but it will probably explode.
Viewing part of the param buffer.
It tries to render parts of some scenes.
The right-hand part, I mean. lol
So I'm confident the parameter buffer is working OK.
It has to be a Z buffer issue.
I think it is just the fact that writing to the Z buffer might need to be done in pairs of clock cycles.
Like a read-check-write thing.
Added an extra state, so it has chance to do the Z-compare first.
Then does the write in isp_state 90.
// Write triangle spans to Z / Tag buffer, checking 32 "pixels" at once for inTri AND depth_compare.
50: begin
    if (y_ps[4:0]==5'd31) begin
        isp_vram_addr <= isp_vram_addr_last;
        isp_state <= 8'd48; // Loop, to check next PRIM.
    end
    else isp_state <= 8'd90;
end
90: begin
    y_ps <= y_ps + 11'd1;
    isp_state <= 8'd50; // Z-buff write is allowed in this state. Jump back.
end
wire trig_z_row_write = isp_state==8'd90 && y_ps<=(tiley_start+31) && !z_write_disable;
I've also added the Z buffer mem to the In-System Memory Content Editor.
It's hard to debug all of this on the sim. I just have to keep going with the compiles.
Some renders aren’t completely terrible.
Yeah, I think it has to be the Z-buffer thing.
So it's doing the compare, but almost always ending up as "true".
So forces a write to all pixels in each row.
'cos the same colour (or texture) generally stays within each tile.
So it's the same tag value(s) within each tile, too.
I think the param buffer is working OK, at least.
Just to show the terrible state of the "core" so far. ^
And I have to mention this every time - the frame(s) you are seeing as the VRAM dump loads, are from reicast.
The FPGA logic then overwrites that frame, badly. lol
Trying to figure out why things are shifted, though.
Maybe also from it grabbing the wrong Z values for everything.
It's probably comparing the Z values from the previous row, to the current row.
Then writing to the current, and incrementing again.
I can't figure out what that would look like, but probably something like the above.
Definitely better, but still shifted slightly.
Could just be a lack of Z precision as well atm. I kept it as 32-bit values instead of 48-bit, to try to save on logic.
Tag / Z-buffer stuffs
The prim_tag is in the first three nibbles.
Then the Z values in the lower eight nibbles (32-bits).
The inTri calc code from Quartus works fine in the sim, so it's not that.
The interp block, not so much. But then it's meant for 32-bit instead of 48-bit...
Still compiling.
Trying the sim Interp code on the FPGA.
Since there was obviously a big difference. lol
I trimmed down the widths of the registers / wires in the interp block.
So it still gave (mostly) decent renders in the sim, but would save logic on the FPGA.
With lots of notes.
About the same as before.
But pretty much ruled out the interp block(s) now.
And sort of ruled out the inTri block, as I tested that on the sim.
But...
The sim is using 48-bit values for a lot of stuff, and the FPGA version mostly 32-bit.
So could just be that, but then it was rendering quite well before, when it was only processing one pixel at a time.
(it still does process one pixel at a time for the final output, but now it does it after the Hidden Surface Removal, using the Tag buffer.)
And the sim is also doing a Clear Z at the start of each tile, so it's not that.
The Sonic frame only renders in about 750 milliseconds atm.
And the core is only running at 16 MHz again, so I could be more sure about the timings working.
That has to be a Z-compare issue really.
I should really be happy that I got this far last night, vs it not doing anything at all. lol
Aaand, I think I just found the issue...
These are the 32 new BRAM blocks for the Z / Prim buffer.
I have the logic in place for the counter that Clears the Z (and prim) values to zero, at the start of each Tile.
But, I'm missing the mux, which actually routes the "zeros" into the BRAM blocks. lol
That would obviously explain most of the problem.
Gonna shove some muxes in there.
Can't really get away with fewer muxes, unless I do a kind of AND with a bitwise invert.
Yep, 32 muxes has surely got to use less logic than a crap-ton of bitwise logic.
Also trying this...
// Write triangle spans to Z / Tag buffer, checking 32 "pixels" at once for inTri AND depth_compare.
50: begin
    if (y_ps[4:0]==5'd31) begin
        isp_vram_addr <= isp_vram_addr_last;
        isp_state <= 8'd48; // Loop, to check next PRIM.
    end
    else isp_state <= 8'd90;
end
90: begin // Z-buff write is allowed in this state.
    isp_state <= isp_state + 8'd1;
end
91: begin
    y_ps[4:0] <= y_ps[4:0] + 5'd1;
    isp_state <= 8'd50; // Jump back.
end
y_ps starts off at the top of the Tile.
So bits [4:0] should be zero when it first gets to state 50.
It has to do a read of the Z / Tag buffer first, so it can do the depth-compare on the old Z values.
So it does the read in state 50, then can do the write-back in state 90 (which is triggered by other logic elsewhere).
Then in state 91, it increments y_ps, then jumps back to state 50.
It won't be incrementing y_ps in states 50 and 90, so that should be correct for doing the writeback to the same row of Z values.
(y_ps[4:0] is used to select the row in the Z / Tag buffer.)
But the main thing is the Z buffer should get cleared properly now. That's if the design fits.
I think I have a solution for saving at least one clock cycle above.
Just storing the Z values for the current row in some regs.
So then they would be delayed by one clock cycle.
Which means it can do the depth compare on those values, vs the new incoming ones.
Anyway, compiling.
Might be asleep soon. I'll post here if the renders improve.
Nope, that won't work either.
Because it's using BRAMs now, you can't have two different addresses at the same time.
So I might have to change it to use Dual-ported RAMs.
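A sketch of the simple dual-port form (one read port, one write port), which is the usual way to read one row while writing another back. The module and port names are assumptions, and the 42-bit by 32-entry size is borrowed from the z_mem description elsewhere in this log.

```verilog
// Sketch of a simple dual-port RAM: one read port, one write port.
module z_mem_dp (
    input  wire        clock,
    input  wire [41:0] data,
    input  wire [4:0]  rdaddress,   // row being read for the next compare
    input  wire [4:0]  wraddress,   // row being written back
    input  wire        wren,
    output reg  [41:0] q
);
    reg [41:0] mem [0:31];

    always @(posedge clock) begin
        if (wren) mem[wraddress] <= data;
        q <= mem[rdaddress];   // returns the OLD data if both addresses match
    end
endmodule
```

With separate addresses, the read of row N+1 can overlap the write-back of row N, instead of spending an extra state on each row.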
Almost the same, just without as many gaps.
Oh well.
Definitely leaving it there for today, I've been compiling and testing all night again.
It is drawing textures mostly OK.
The background texture on the menu was always a bit weird, due to a separate issue.
It is almost like the inTri test is always giving "True" for all the bits within each tile.
Anyway, I'll figure it out.
Still awake, again.
Found the bug with it not rendering all the tests.
Now that it's hooked up to the DDR controller, I have to wait for DDRAM_DOUT_READY, after requesting a Read of each word.
I forgot to do that for when it reads the VQ Codebook, in the actual tile rendering state.
So anything which uses VQ textures was getting stuck, which a LOT of games use.
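The "request, then wait for the data-ready flag" pattern can be sketched as a tiny state machine. This is only an illustration: the module and port names are made up, the address path is left out, and the busy handling assumes the controller behaves like an Avalon-style waitrequest, which may not match the real DDR interface exactly.

```verilog
// Hypothetical single-word read helper; port names are illustrative.
module ddr_read_word (
    input  wire        clock,
    input  wire        reset_n,
    input  wire        start,          // pulse: begin a read
    input  wire        ddr_busy,       // controller cannot accept a command yet
    input  wire        ddr_dout_ready, // read data is valid this clock
    input  wire [63:0] ddr_dout,
    output reg         ddr_rd,         // read request to the controller
    output reg  [63:0] word,           // captured data
    output reg         done            // pulse: word is valid
);
    localparam S_IDLE = 2'd0, S_REQ = 2'd1, S_DATA = 2'd2;
    reg [1:0] state;

    always @(posedge clock or negedge reset_n)
    if (!reset_n) begin
        state  <= S_IDLE;
        ddr_rd <= 1'b0;
        done   <= 1'b0;
        word   <= 64'd0;
    end
    else begin
        done <= 1'b0;
        case (state)
            S_IDLE: if (start) begin
                ddr_rd <= 1'b1;
                state  <= S_REQ;
            end
            S_REQ: if (!ddr_busy) begin       // command accepted
                ddr_rd <= 1'b0;
                state  <= S_DATA;
            end
            S_DATA: if (ddr_dout_ready) begin // do NOT move on before this
                word  <= ddr_dout;
                done  <= 1'b1;
                state <= S_IDLE;
            end
            default: state <= S_IDLE;
        endcase
    end
endmodule
```

The point is just that the consumer only advances on done, however many clocks the controller takes to reply.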
Not that it fixes the main problem with the garbage tiles, but I'll be working on that later.
Rest, you must
Yoda is not fond of seagulls. Full-length version of the song first seen here: https://youtu.be/UkiI2vM2lfA
nice progress!!!!!!!!!!
Been a while since I watched this it’s so damn catchy
Back again.
The render from the last compile…
It must be copying the previous row in the Z buffer to the next row. Freaky hall-of-mirrors style thing.
Anything that uses VQ textures is way worse…
One of the main bugs before...
wire [31:0] z_write_allow = (inTri & depth_allow) | {32{z_clear_busy}}; // inTri & depth_allow Bitwise AND.
wire write_allow = (trig_z_row_write | z_clear_busy);
I had that write_allow signal, which is kind of the master enable for writes.
trig_z_row_write goes High during isp_state==50, or at least it used to.
That gets OR'ed with z_clear_busy, so I can force writes during Z-clear (zeroing).
But, I somehow forgot to actually use that write_allow signal when hooking up the BRAMs. lol
So it was always allowed to write to the BRAMs via the z_write_allow bits.
The fix...
wire [31:0] z_write_allow = (inTri & depth_allow & {32{trig_z_row_write | z_clear_busy}}); // inTri & depth_allow Bitwise AND.
The weird {32{ }} thing, is a replicating operator.
So it does the OR of the trig_z_row_write and z_clear_busy flags first, then replicates it 32 times.
So it can then be AND'ed with the 32 inTri and 32 depth_allow bits.
The issue now, is how it's actually doing the row select for the Z buffer.
And the whole read-modify-write thing is wrong.
(Z buffer / prim_tag buffer. Same thing. The values are merged, and written to the same "pixel" in the buffer.)
I love seeing these updates, but I wish I understood any of it lol
I tried the old sim code version of the Z buffer update…
And it’s an even worse hall-of-mirrors thing, but also “cleaner”.
I have to be very clear on that each time - but the good half of the screen was just the pre-existing render from reicast.
If I don't mention that when people are scanning past, some people lose their minds, and start doing YT vids on it. lol
But I know that you know. 😉
Oh no i get that, Still just to see it producing a render is nice
OK, that's weird.
When I try running the Menu render again, it's all completely messed up.
Which is how most other renders look atm.
But it shouldn't do that after a Reset.
So something isn't being reset correctly in the core.
I don't think the FPGA clears the BRAMs to zero at start-up either, but I'll check that.
OK, even weirder.
I re-loaded the core (from SignalFlap / USB JTAG), and now both Menu and Menu 2 render "OK" again.
Mem screen renders look way worse than you'd expect.
So it's like - if the second thing you load is similar to the first (Menu and Menu 2), it renders "OK".
Larger textured areas are still somewhat OK.
But I'm not 100% sure that's even overwriting the reicast render in that section. lol
The logical thing, is that the Tags are being messed up.
The Z buffer values being corrupted isn't quite as bad, if it's only copying from the previous row etc.
It would likely give the weird hall-of-mirrors thing.
Well, actually mainly the Tags would do that.
Might need to separate the Tag buffer and Z-Buffer.
It has to read the existing Z values for each row, then update those Z values but also the Tags at the same time.
(for any row which contains some pixels of the new triangle, it will write both the Tags and the new Z values for those pixels.)
inTri bits are High, for any tile row pixels which the triangle spans.
depth_compare bits are High, of course only for triangle pixels where the depth-compare result is True.
(It generally starts off with the Tag and Z values all cleared in the buffer, at the start of rendering each new tile.)
Then it writes the Tags and Z values for the first triangle/prim.
Since it's comparing the Z values of that first triangle to "zeros" in the buffer, it will always draw that whole triangle.
Technically, I'm supposed to be rendering the Background poly first, but I haven't quite got that working yet. lol
The PVR2 has two registers, which give you the VRAM address for where to find the Background poly params.
And the second reg to give the depth of that poly.
I think rendering the Background poly might even be mandatory, as it basically helps clear the Z/Tag buffer, and also sets the starting minimum depth value for all tile pixels.
Yep, it's definitely screwing up completely, if I load a different VRAM dump, then re-load the Menu one.
Can only really be doing that, if the Z buff isn't being cleared.
Clears it OK there, or seems to.
I'll have to just implement the FPGA version of the Z buffer in registers, so I can test it in the sim.
Should be quite easy to do.
z_mem z_mem_inst_0 ( .clock( clock ), .data( (z_clear_busy) ? 42'd0 : {prim_tag_in,z_in_col_0} ), .address( row_sel_mux ), .wren( z_write_allow[0 ] ), .q( z_col_0 ) );
32 copies of that.
Which is just a 1-port RAM block.
With a clock, and registered output data.
It has 42-bit wide data in/out busses.
"data" is the input bus, "q" is the output bus. Don't ask. lol
wren is ofc the Write Enable input.
And address selects which one of the 32 entries.
module z_mem_thing (
    input clock,
    input [41:0] data,
    input [4:0] address,
    input wren,
    output reg [41:0] q
);
reg [41:0] z_mem [0:31];
always @(posedge clock) begin
    if (wren) z_mem[ address ] <= data;
    q <= z_mem[ address ];
end
endmodule
That's, pretty much it. lol
reg [41:0] z_mem [0:31];
Just says to make a 2D array of registers, each register is 42 bits wide, and there are 32 of them.
42 bits, because I'm currently using a 10-bit prim_tag value, and 32-bit Z value.
So those just get concatenated together, on the "data" input port.
{prim_tag_in, z_in_col_0}
input [9:0] prim_tag_in, // prim_tag
input [31:0] z_in_col_0, // IP_Z[0]
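As an aside, the "32 copies" could be written as a generate loop rather than 32 hand-written instances. A sketch, assuming the z_mem_thing module listed above is in the same project, and assuming the per-column Z inputs and outputs are packed into flat buses (z_in_flat and z_col_flat are made-up names, not signals from the core):

```verilog
// Sketch: 32 column RAMs built with a generate loop.
module z_columns (
    input  wire               clock,
    input  wire               z_clear_busy,
    input  wire [9:0]         prim_tag_in,
    input  wire [32*32-1:0]   z_in_flat,     // column c is bits [32*c +: 32]
    input  wire [4:0]         row_sel_mux,
    input  wire [31:0]        z_write_allow, // one write enable per column
    output wire [32*42-1:0]   z_col_flat     // column c is bits [42*c +: 42]
);
    genvar c;
    generate
        for (c = 0; c < 32; c = c + 1) begin : col
            z_mem_thing z_mem_inst (
                .clock  ( clock ),
                .data   ( z_clear_busy ? 42'd0 : {prim_tag_in, z_in_flat[32*c +: 32]} ),
                .address( row_sel_mux ),
                .wren   ( z_write_allow[c] ),
                .q      ( z_col_flat[42*c +: 42] )
            );
        end
    endgenerate
endmodule
```

Each instance is then reachable as col[0].z_mem_inst, col[1].z_mem_inst and so on in the hierarchy.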
Well, whadya know. lol
Verilator RULEZ, man.
It's not exactly the same, but close enough to the same problem.
Imagine how long it would take me to debug this, with (now) 29-minute Quartus compiles. :p
Tweaked the logic for Z-clear, and yeah...
Missing the last line of each tile, but I don't care.
old...
wire [31:0] z_write_allow = (inTri & depth_allow & {32{trig_z_row_write | z_clear_busy}});
New...
wire [31:0] z_write_allow = (z_clear_busy) ? 32'hffffffff : (inTri & depth_allow & {32{trig_z_row_write}});
So that FORCES all z_write_allow bits High, when z_clear_busy is High.
Else, it does the other stuff. lol
I wasn't sure about how the replication of the trig_z_row_write OR'ed with z_clear_busy would work before.
I guess it doesn't.
The new code is a bit more human-readable anyway.
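One possible reading of why the old expression didn't clear properly: the replication itself behaves as expected, but the replicated flag is AND'ed with inTri and depth_allow, so during a clear only the columns that also pass both tests get written. A side-by-side sketch (the module wrapper and the allow_old / allow_new names are just for illustration):

```verilog
// Sketch comparing the two gating expressions side by side.
module z_write_gate (
    input  wire [31:0] inTri,
    input  wire [31:0] depth_allow,
    input  wire        trig_z_row_write,
    input  wire        z_clear_busy,
    output wire [31:0] allow_old,
    output wire [31:0] allow_new
);
    // Old: the clear flag can only ENABLE columns that already pass
    // inTri & depth_allow, so it cannot force a write on its own.
    assign allow_old = inTri & depth_allow & {32{trig_z_row_write | z_clear_busy}};

    // New: the clear overrides everything and writes all 32 columns.
    assign allow_new = z_clear_busy ? 32'hffffffff
                                    : (inTri & depth_allow & {32{trig_z_row_write}});
endmodule
```

With z_clear_busy High and inTri all zeros, allow_old is 32'h00000000 while allow_new is 32'hffffffff.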
And, guess what... another Quartus compile.
And it really is taking ~29 minutes now. Mah poor 3700X.
Zoom in, helps you see that it's the first row of each tile which isn't being rendered / written correctly.
So probably just this off-by-1 thing...
// Write triangle spans to Tag buffer, checking 32 "pixels" at once for inTri AND depth_compare.
50: begin
    y_ps <= y_ps + 11'd1;
    if (y_ps[4:0]==5'd31) begin
        isp_vram_addr <= isp_vram_addr_last;
        isp_state <= 8'd48; // Loop, to check next PRIM.
    end
end
wire trig_z_row_write = isp_state==8'd50 && (y_ps<=(tiley_start+31)) && !z_write_disable;
When trig_z_row_write is true, and isp_state==50, y_ps has already incremented by 1.
Sort of.
Or maybe not. It's one of the most confusing things about HDL.
Because many times, you don't see the result of a calc until the next clock cycle.
When using registers / clocked logic, that is.
The trig_z_row_write thing is a pure combinational logic thing.
But all of its inputs are from registered logic.
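A stripped-down sketch of that register-versus-wire timing, using made-up names (row, step, is_row0) rather than anything from the core:

```verilog
// Sketch: a wire built from a register sees the register's CURRENT value.
module nba_demo (
    input  wire       clock,
    input  wire       step,     // when high, increment on this clock edge
    output reg  [4:0] row,      // registered counter
    output wire       is_row0   // combinational, derived from row
);
    initial row = 5'd0;

    always @(posedge clock)
        if (step) row <= row + 5'd1;   // new value appears AFTER the edge

    // During the clock cycle in which step is high, row (and so is_row0)
    // still holds the old value. Both change together after the edge.
    assign is_row0 = (row == 5'd0);
endmodule
```

So any logic that uses is_row0 in the same cycle as the increment is still looking at the un-incremented row.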
Slightly clearer code from earlier...
old...
wire [31:0] z_write_allow = (z_clear_busy) ? 32'hffffffff : (inTri & depth_allow & {32{trig_z_row_write}});
new...
wire [31:0] z_write_allow = (z_clear_busy) ? 32'hffffffff :
(!trig_z_row_write) ? 32'h00000000 :
(inTri & depth_allow);
If-Else (conditional) operator. So useful.
If it's doing a Z-Clear, force all z_write_allow bits High (0xFFFFFFFF) Else...
If trig_z_row_write is Low, force all bits Low, Else...
Allow the bitwise AND, of the 32 inTri and 32 depth_allow bits.
The gaps between triangles are back. Oh well.
Some textures kinda disappeared. lol
The Gap Between Triangles is the name of my prog rock band
Nice.
Between Two Ferns (tm)
It VERKS !…
Mostly…
Progress.
Doing the git commit.
Still won't render very fast, until DDR burst transfers are added.
In fact, the Daytona render is taking about 7 seconds atm, at 16 MHz.
Considering it takes roughly 6-7 clock cycles to access any random Word in DDR3, for EVERY single pixel.
You can imagine the speed-up when it can burst transfer whole chunks of say 128 Words at once, on every clock.
640x480 = 307,200 pixels.
6 clocks (generous) per pixel atm = 1,843,200 clock cycles.
At 16 MHz, each clock tick is 62.5 ns.
So 115,200,000ns for the full frame, or 115.2ms.
Clearly it's not even THAT fast yet. lol
Anywho, yeah - a ridiculously long time to read each texel, and process each pixel, due to DDR3 latency.
Obvious issue with every-other Z buffer row, too.
sim looks OK, in comparison.
I think that's the whole read-modify-write thing, for the Z buf.
Simpler scenes look OK.
Definitely rendering the polys, but hidden behind blue eyes... I mean behind the lines.
I found another big bug.
The original code was doing "direct" rendering, on each triangle/primitive at a time.
The new code does the deferred rendering thing, of processing the triangles into Tags and Z-values first.
Then it renders the pixels, by fetching the texels, or doing the colour shading etc.
And, if the Texture and VQ bits are both set, and if the texture base address has changed vs the previous time, it also reads the Codebook from VRAM.
But I still had the old code in place, which was reading the Codebook for every triangle.
Which is wrong, because that would probably have been reading in the Codebook for textures that don't tally with the current Tag in the tag buffer.
Also didn't help that I wasn't always allowing the Texture or Codebook address to be output to DDRAM either.
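The "only re-read the Codebook when the texture base changes" check could be sketched like this. All of the names and the 21-bit address width are assumptions for illustration, not the core's actual signals.

```verilog
// Hypothetical "reload only on change" check; names and widths are assumptions.
module cb_reload_check (
    input  wire        clock,
    input  wire        reset_n,
    input  wire        params_valid,  // pulse: params for a new Tag were loaded
    input  wire        texture,       // Texture bit
    input  wire        vq_comp,       // VQ bit
    input  wire [20:0] tex_base,      // texture base address
    output wire        cb_reload      // high when the Codebook must be fetched
);
    reg [20:0] last_base;
    reg        last_valid;            // low until a Codebook has been loaded

    wire uses_vq = texture & vq_comp;
    assign cb_reload = params_valid & uses_vq &
                       (!last_valid | (tex_base != last_base));

    always @(posedge clock or negedge reset_n)
    if (!reset_n) begin
        last_base  <= 21'd0;
        last_valid <= 1'b0;
    end
    else if (cb_reload) begin
        last_base  <= tex_base;       // remember which Codebook is cached
        last_valid <= 1'b1;
    end
endmodule
```

Tying the check to the Tag currently being rendered, rather than to each incoming triangle, is what keeps the cached Codebook in step with the pixels being drawn.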
@rain obsidian Take a break man, stellar work, but enjoy other things in life! 🙂
Yeah, I should take a break, really.
I do get overly-obsessed sometimes.
But I feel like I could really do something with this.
It is starting to affect my health, which isn't good.
I don't think I've even left the house in about three months.
(I don't often go out anyway, but you know what I mean.)
This has been running for 46 minutes now.
And that message means it probably won't finish compiling, as it's struggling to fit the FPGA.
Which also usually means - I'm getting closer to stuff working. lol
Because the more bugs fixed, the more the rest of the logic starts kicking in.
I need to figure out why it's ditching every other line.
Sim renders quite well, with the same exact z_buff module...
And with the same param_buffer code.
Much better.
Found some more dumb bugs.
Let’s just pretend this one doesn’t exist…
76% of the FPGA logic used.
24% of the on-chip mem.
ASCAL (the HDMI upscaler module) uses about 6,690 Logic Elements, out of the total 110,000.
So that's really very efficient.
Karen isn’t looking too bad…
I think the gaps between the polys are the clock delay between reading and writing back the Z values / Tags.
As it can't really happen during the final rendering of each tile, because that always renders the full tile.
(ignoring any other bugs with missing or overdrawn lines. lol)
Most of the horiz lines fixed.
Old...
New...
But some vert lines maybe a bit worse. lol
Dayum.
It's like Verilog Whack-A-Mole.
I thought it was The Gap between my Ears
Debugging the FPGA stuff in the sim again...
I NEED to merge the projects soon, somehow.
16-bit textures are fine in the sim.
But VQ is messed up.
Erm, probably need to wordswap the data.
wire [63:0] isp_vram_din = vram_din;
//wire [31:0] isp_vram_din = (!isp_vram_addr_out[22]) ? vram_din[31:0] : vram_din[63:32];
In the sim, it supplies the full 64-bit data for ISP param reads.
Can't remember how that even works.
On the FPGA, it loads the 8MB VRAM dumps as two halves.
Writing each half into the lower then upper 32-bits of each 64-bit word.
The sim doesn't currently do that, so it's another thing on the todo list.
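Rough sketch of that two-halves packing, for when I get round to doing it in the sim. The function name is made up, and which half lands in which 32 bits is an assumption based on the "lower then upper" description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a VRAM dump (given as 32-bit words) into 64-bit words.
// Assumption: word i of the first half goes to bits [31:0] and word i
// of the second half goes to bits [63:32] of 64-bit word i.
std::vector<uint64_t> interleave_vram_halves(const std::vector<uint32_t>& dump) {
    assert(dump.size() % 2 == 0);  // need two equal halves
    const std::size_t half = dump.size() / 2;
    std::vector<uint64_t> out(half);
    for (std::size_t i = 0; i < half; ++i) {
        out[i] = static_cast<uint64_t>(dump[i]) |
                 (static_cast<uint64_t>(dump[half + i]) << 32);
    }
    return out;
}
```

With that layout, bit 22 of the byte address (the 4MB boundary) picks the lower or upper 32 bits, which is what the commented-out isp_vram_din line does.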
CRIZY TIXY
Sponsored by Gillette.
twelve oclock shadow
Just a lot of smaller polys, yep.
So the Tag changes more quickly, and there is a clock delay / timing bug atm.
So it can't update the params fast enough, to grab the correct shading colour etc.
Or just a processing delay, meaning the colour is getting updated one or two clocks after the actual framebuffer write pulse.
still dreamcast
The parts that rendered look good though, heh
Who doesn’t love flipped normals? Heh
I'm in another limbo state atm.
Even in the sim, it can be tricky sometimes, figuring out how to debug it.
So it's WAYYY harder to debug on the FPGA via SignalTap.
atm, this is happening.
And I don't quite get why, but likely to do with the Codebook stuff.
This was after I pasted in most of the FPGA code into the sim.
They're not 1:1, though. I have quite a lot of work to do, to make them both the same.
Then I can finally merge the projects, and just have the sim in a folder alongside the MiSTer "core".
That frame took so long to render now, it would only hit around 14 FPS at 100 MHz. lol
With a fake DDR latency delay added, of only ONE clock cycle.
The real FPGA/DDR has typical latencies of around 6-8 clock cycles, which is why it sucks atm.
(the sim version is also reading the Codebook for every single pixel that uses VQ textures atm, so that's also super slow.)
I just re-enabled the checks for whether the pixels are textured, or using VQ.
And now the frame time is a bit lower, and hits 15.4 FPS.
But the missing textures for all of the mountains? That's very weird.
An aside Ash, did you ever try to get the Jag CD BIOS to load on the core? Am idly curious if it showed any signs of life
Don't think I ever tried it?
I tried the CD32 BIOS on the Minimig core a while back.
All I had to do there, was to either tweak the main MiSTer code, or swap the two halves of the ROM around, or something.
Since the code in the Minimig core didn't support loading the CD32 BIOS by default, due to how the banks were arranged.
The Jag CD BIOS might do something, but it depends if it locks up if there's no CD hardware in place.
Yeah that is what I was thinking too, it may be looking for the CD and just not do anything, but maybe it shows something
electronash@Ryzen7:/mnt/c/linux_temp/Jaguar_MiSTer$ sshpass -p '1' scp jag_cd_bios.bin [email protected]:/media/fat/Games/Jaguar
Doesn't do nowt.
I tried loading it as a "Cart" at first.
Which gave the expected red screen on the main Jag BIOS.
Then tried copying it as a .rom, then loading as a BIOS, and just a black screen.
I'm not sure how or where the CD BIOS would get mapped atm.
Oh man, you caught me at a time where I'm not intending to do much on DC tonight. lol
So I might take a brief look at the Jag CD thing.
Hah, well whatever tickles your fancy, I was just curious if it showed any signs of life and looks like it doesn't
Weird seeing some of my comments, from about four years ago. lol
The original Jag core didn't even load a "full" BIOS.
Only Greg's very trimmed-down tiny bit of code, to just launch the cart code.
And I don't think it was able to launch most homebrew at the time, as those often had a different start address.
The start address is usually read from the cart header.
CD-Rom emulation, chip codename Butch (the HW engineer was definitely obsessed with T&J somehow ...)
void jaguarcd_state::jagcd_gpu_dsp_map(address_map &map)
{
console_base_gpu_map(map);
map(0x800000, 0x83ffff).r(FUNC(jaguarcd_state::cd_bios_r));
map(0xdfff00, 0xdfff3f).rw(FUNC(jaguarcd_state::butch_regs_r), FUNC(jaguarcd_state::butch_regs_w));
}
void jaguar_state::m68020_map(address_map &map)
{
map(0x000000, 0x7fffff).ram().share("sharedram");
map(0x800000, 0x9fffff).rom().region("maincpu", 0);
map(0xa00000, 0xa1ffff).ram().share("mainram");
Trying to figure out where the CD BIOS needs to get loaded.
It doesn't look like it has a normal Cart header, put it that way.
I searched for a 68K NOP (0x4E71), and they exist in the CD BIOS.
ie. the endianness seems to be correct for loading as-is.
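The NOP check can be done with a quick scan rather than by eye. This is only a sketch (names are made up): it counts 4E 71 and the byte-swapped 71 4E at even offsets, so you can see which order dominates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct NopCounts {
    std::size_t big_endian;    // bytes 4E 71 at an even offset
    std::size_t byte_swapped;  // bytes 71 4E at an even offset
};

// Count 68K NOP opcodes (0x4E71) in both byte orders.
NopCounts count_68k_nops(const std::vector<uint8_t>& rom) {
    assert(rom.size() % 2 == 0);  // a 68K ROM image is whole 16-bit words
    NopCounts c{0, 0};
    for (std::size_t i = 0; i + 1 < rom.size(); i += 2) {
        if (rom[i] == 0x4E && rom[i + 1] == 0x71) ++c.big_endian;
        if (rom[i] == 0x71 && rom[i + 1] == 0x4E) ++c.byte_swapped;
    }
    return c;
}
```

A handful of hits could be coincidence in data, so it's a hint about byte order rather than proof.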
But I don't know if the CD BIOS overrides the normal BIOS, or if the CD BIOS gets mapped higher up in memory, by virtue of some of the cart address pins?
The code above, suggests it just gets loaded in place of the normal BIOS.
Anywho, I'll just hook up SignalFlap, and see what it shows.
This isn't even compiling now. No idea what I did.
OK, yep. Wrong folder. lol
The recent Jag core rework is in the root of my C drive.
And apparently I already had SnargleFlap set up. So this might be an even older core.
GR has done quite a bit on it in the last week
Ahh, OK.
I might as well abandon this one, then. It might not even boot. lol
Looks like it's the ancient core.
Oh, still the wrong folder. sigh
No idea what my brain is doing tonight.
Doesn't boot.
Grabbing the latest code.
I just deleted the two very old Jag cores from my github.
No point keeping any of that now.
The latest core by GR and kitrinx seems to have a lot of changes / fixes anyway.
So I forked the "rework" branch from GR's, and will work from that.
Need to compile first. I kept my old SignalFlap file.
Has a small EEPROM on it, too, near the lower-right.
93C46
Similar/same as on ST-V.
Well, probably in a fair few Jag carts, too? lol
Can't quite remember.
I probably shouldn't say this so early-on, but there really aren't THAT many commands for the CD stuff...
void jaguarcd_state::butch_regs_w(offs_t offset, uint32_t data, uint32_t mem_mask)
Hah, that's interesting it shows a sign of life there
map(0xdfff00, 0xdfff3f).rw(FUNC(jaguarcd_state::butch_regs_r16), FUNC(jaguarcd_state::butch_regs_w16));
Apparently that's the only address (range) for where the Butch regs are mapped.
Which makes things a lot easier. lol
(easier, from the POV of research. Actually getting it working is another matter.)
Normal BIOS, has the 68K Reset Vector...
Which is where MAME jumps to at start-up.
But the Jag CD BIOS doesn't seem to have that, and gets mapped at 0x802000, at start-up...
I think that probably is a Cart header, after all.
But it gives the red screen of death, if loaded like a Cart in the core atm.
New core compiled...
But it's unstable.
Logo didn't show up, the second time I loaded it.
sigh
That's REALLYYYY annoying to work on.
Can't rely on the result, then can't rely on anything.
I'll just have to compile again, with tweaked settings, and some stuff removed from SignalFap.
All I wanted to do, was see if the CD BIOS did anything, or tried to read any of the Butch regs. lol
It's funny how even the Jag CD still shares a few things with "newer" CD/DVD drives.
The "DALAS" chip (or family) was used for many years, for the RF frontend.
Including on the Marantz DVD board that's sat on the desk in front of me.
I guess the RF amp stuff didn't need to change much, over the years.
We don't have to worry about any of that for Jag CD on the core, though. We're only interested in the higher-level commands, and loading the data in.
There's ya CD BIOS.
256KB
Hard to spot the extra data line at first, but it goes to pin 30.
Seems to have its own specific Chip Select. hmm
Maybe that's the ROMCSL1 thing, in the core.
I mean this with complete sincerity and with complete respect but how the hell do you have this much energy.
It’s like you’re going 1000 miles a minute just cranking through shit lol.
lol
It's just how I work - I post stuff on here mainly because it also helps me keep track.
And it's easier for me to scroll through after, to look at the diagrams etc.
I'm a big fan of copy n paste. 😛
void jaguar_state::console_base_map(address_map &map)
{
map(0x000000, 0x1fffff).mirror(0x200000).rw(FUNC(jaguar_state::shared_ram_r16), FUNC(jaguar_state::shared_ram_w16));
map(0xe00000, 0xe1ffff).rom().region("mainrom", 0);
// CD map...
console_base_map(map);
map(0x800000, 0x83ffff).rom().region("cdbios", 0);
map(0xdfff00, 0xdfff3f).rw(FUNC(jaguarcd_state::butch_regs_r16), FUNC(jaguarcd_state::butch_regs_w16));
ROM_START( jaguarcd )
ROM_REGION16_BE( 0x20000, "mainrom", 0 )
ROM_LOAD16_WORD( "jagboot.rom", 0x00000, 0x20000, CRC(fb731aaa) SHA1(f8991b0c385f4e5002fa2a7e2f5e61e8c5213356) )
ROM_REGION32_BE( 0x600000, "cart", ROMREGION_ERASE00 )
// TODO: cart needs to be removed (CD BIOS runs in the cart space)
ROM_REGION16_BE(0x40000, "cdbios", 0 )
ROM_SYSTEM_BIOS( 0, "default", "Jaguar CD" )
ROMX_LOAD( "jag_cd.bin", 0x00000, 0x040000, CRC(687068d5) SHA1(73883e7a6e9b132452436f7ab1aeaeb0776428e5), ROM_GROUPWORD | ROM_BIOS(0) )
ROM_SYSTEM_BIOS( 1, "dev", "Jaguar Developer CD" )
ROMX_LOAD( "jagdevcd.bin", 0x00000, 0x040000, CRC(55a0669c) SHA1(d61b7b5912118f114ef00cf44966a5ef62e455a5), ROM_GROUPWORD | ROM_BIOS(1) )
ROM_REGION16_BE( 0x1000, "waverom", 0 )
ROM_LOAD16_WORD("jagwave.rom", 0x0000, 0x1000, CRC(7a25ee5b) SHA1(58117e11fd6478c521fbd3fdbe157f39567552f0) )
ROM_END
Oh yep. There's the jump vector, in the Jag CD BIOS.
I still don't get how it's loaded, though.
Whether it gets mapped like a normal cart, or not.
Can't tell until the fecking core works. grrrr
You're tearing me apart, Quartus.
This core is haunted.
Is Quartus struggling to compile?
Not struggling, just giving crappy unstable builds again.
But this core might always have that trouble, until somebody can do proper timing closure on it.
And I have no idea how to do that.
You can do a "Report Top Failing Paths" in TimeQuest.
But I don't yet know how to fix most of the issues it suggests.
Timing constraints stuff is hard to learn.
Ah yeah, my understanding is that is still a big issue getting a stable core made
A lot of the issue is also due to how the core was "written".
Greg wrote scripts, to help translate the original chip Netlists into Verilog.
With some interpretation and extra glue logic added.
Which means the code is NOT very human-readable.
And apparently having the Netlists was never a magic bullet. lol
First part of boot. The 68K reads the Reset Vector from the second 32-bit word in the BIOS ROM.
Are you just needing a "stable" build of GR's current core as is?
Then jumps to that addr, which is usually 0xE00000.
And that uses xromcsl_0, as the Chip Select.
(csl = Chip Select, active-Low)
Core is STILL unstable.
Don't know if I can work on it, when it's that bad.
It won't boot again.
Sometimes it's even because SignalTap is enabled, and part of the core, which makes it unstable again.
Can never quite tell.
Completely crashes, reading random junk.
I just needed TEN minutes, to try to capture a CD thing. sigh. lol
SO annoying.
Main BIOS reading the Cart header.
So I reckon the CD BIOS does just work like a Cart.
But gets disabled if a Cart is inserted, if that makes sense.
I just needed a FEW times when the stupid core would work, and I could maybe have seen if the CD BIOS is even accessing anything. lol
You can see why dev of certain cores is very slow at times.
Multiple devs have worked on the Jag core, including Greg, myself, kitrinx, Greyrogue, and now Mazamars312.
And between us, we still can't get a stable core. lol
map(0xdfff00, 0xdfff3f).rw(FUNC(jaguarcd_state::butch_regs_r16)
Time to get Aggressive.
It consistently runs the main BIOS far enough to try to read the Cart header each time.
But clearly has a problem with stability, accessing the Tom and Jerry chips.
I can't even find the same Timing Recommendation thing on Google - that's how hard timing constraints are in Quartus. lol
Wow, that code was rough. lol
From way back when.
I thought most of my crappy code had been replaced by now, but apparently not.
We'll see if Murray has done any new tweaks soon, which can be back-ported to the MiSTer version.
I think he might already have done a MiSTer version, though, as he did the Single-SDRAM thing for the Pocket already.
And Pixel Ninja was showing the Jag core on the live stream. Not sure if that was only the Pocket version, or MiSTer as well?
Yep, so xromcsl_1 is asserted for CART reads.
So I do think the CD BIOS gets read just like a cart.
It just gets stuck if it can't read the regs on the Butch chip.
case 8: //DS DATA
switch((m_butch_regs[offset] & 0xff00) >> 8)
{
case 0x03: // Read TOC
{
if(!m_cdrom->exists()) // No disc
{
m_butch_cmd_response[0] = 0x400;
m_butch_regs[0] |= 0x2000;
m_butch_cmd_index = 0;
m_butch_cmd_size = 1;
return;
}
uint32_t jaguarcd_state::butch_regs_r(offs_t offset)
{
switch(offset*4)
{
case 8: //DS DATA
//m_butch_regs[0] &= ~0x2000;
return m_butch_cmd_response[(m_butch_cmd_index++) % m_butch_cmd_size];
}
return m_butch_regs[offset];
}
Command responses are just some small packets read back.
Else, it allows reading of the other regs.
If I can get even a half-stable core, I might be able to spoof enough of the regs to get the shiny thing to display.
Atari Jaguar CD Startup (No Power)
This is the Atari Jaguar startup when there is no power connected to the Jaguar CD.
lol
Never knew the CD BIOS detects whether the power is plugged into the drive.
But what would be the point of that, wouldn’t it need power to even detect if it has power?
No, the logic stuff gets most of the power via the cart slot / Jaaaaag.
9V etc. needed for the motors / lens tracking coils, etc.
Sorry, guys.
Core still unstable.
Can't do it.
It's a piece of shit.
Can't even get it to boot now.
Takes almost 20 minutes per compile.
Gonna mess with MAME for a bit, then ragequit the Jag core for tonight.
No need to be sorry. Always fun to ride along on your work. Exciting
I just hate that I don't know how to fix the timing issues. lol
No worries dude. GR is making good progress, hopefully he can get it stable soon.
Yeah.
First accesses of the CD regs, in MAME.
[0x2c]: ? (used at start-up)
Oh thanks, MAME code.
So helpful.
The first time it accesses different regs, just before the logo appears...
Can't copy n paste from the MAME debugger. Never could.
[0x00]: irq register
(R)
-x-- ---- ---- ---- CD uncorrectable data error pending
--x- ---- ---- ---- Response from CD drive pending
---x ---- ---- ---- Command to CD drive pending
---- x--- ---- ---- Subcode data pending
---- -x-- ---- ---- Frame pending
---- --x- ---- ---- CD data FIFO half-full flag pending
(W)
---- ---- -x-- ---- CIRC failure irq
---- ---- --x- ---- CD module command RX buffer full irq
---- ---- ---x ---- CD module command TX buffer empty irq
---- ---- ---- x--- Enable pre-set subcode time-match found irq
---- ---- ---- -x-- Enable CD subcode frame-time irq
---- ---- ---- --x- Enable CD data FIFO half full irq
---- ---- ---- ---x set to enable irq
[0x04]: DSA control register
[0x0a]: DSA TX/RX data (sends commands with this)
[0x10]: I2S bus control register
[0x14]: CD subcode control register
[0x18]: Subcode data register A
[0x1C]: Subcode data register B
[0x20]: Subcode time and compare enable
[0x24]: I2S FIFO data
[0x28]: I2S FIFO data (old)
[0x2c]: ? (used at start-up)
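To turn those "-x--" layout strings into actual masks without counting dashes by hand, something like this works. It's only a helper sketch (the function name is made up) and assumes the leftmost character is bit 15.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Convert a 16-bit layout string such as "--x- ---- ---- ----" to a mask.
// Leftmost character is bit 15; spaces are ignored; 'x' marks a set bit.
uint16_t pattern_to_mask(const std::string& pattern) {
    uint16_t mask = 0;
    int bits = 0;
    for (char ch : pattern) {
        if (ch == ' ') continue;
        mask = static_cast<uint16_t>((mask << 1) | (ch == 'x' ? 1 : 0));
        ++bits;
    }
    assert(bits == 16);  // the table rows are all 16 bits wide
    return mask;
}
```

"Response from CD drive pending" comes out as 0x2000, which tallies with the `m_butch_regs[0] |= 0x2000` in the Read TOC case.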
Looks like it might write the commands to the upper byte of each word, which is weird.
case 8: //DS DATA
switch((m_butch_regs[offset] & 0xff00) >> 8)
{
Core broken.
Giving up.
Displayed the logo once, then never again.
With a few random colour dots on the screen.
I just needed ONE boot with the CD BIOS loaded as a Cart, but it wouldn't even get that far.
My old build from the 16th December runs fine.
But it doesn't have SnargleFlap enabled, so I can't use it to debug.
Will try disabling stuff like Power Up Don't Care.
As it seemed to help a bit, years ago.
"All I had to do there, was to....." You make it sound so simple, and I know it certainly isn't!!
I mean to get to the next step. FPGA dev is a voyage of discovery. lol
The Jag core is so haunted, even the DSE (Design Space Explorer) doesn't like it.
Back to DC land for a while.
Trying to fix the gaps between the ears.
That screenshot is quite a good demo of the Tag buffer.
It's part-way through rendering the pixels, based on that Tag buffer.
You can see how each Tag number relates to each triangle in the orange thing.
There's no obvious gap between Tags 17 and 16 there.
So it must be just the actual pixel rendering stage that has the delay.
When it renders the pixels, using those Tags, it needs to fetch the vertex / shading / texturing params for each triangle VERY quickly.
So it doesn't have to waste say 24-43 clock cycles fetching those params from VRAM again.
Hence, the parameter buffer atm just blindly stores the params for ALL triangle tags that get written to the Tag buffer.
Even if some of those tag numbers won't appear in the Tag buffer by the end of processing, because they have been "overdrawn".
Which is kind of the whole point. lol
ie. That's how it does the hidden-surface removal, so it doesn't waste tons of time when actually rendering the pixels for each tile.
The Tag buffer (and Z buffer, combined) get processed as 32 pixels at a time, so that's where the main speed-up is.
It does the inTri calcs (triangle pixel visibility along each row), and depth-compare, on all 32 pixels at once.
So processing each triangle into the Tag buffer only takes 32 clock cycles, as it moves down each row.
The Tag buffer then represents exactly which pixels you need to render, and for which triangles, with zero "overdraw" after that point.
// Write triangle spans to Z / Tag buffer, checking 32 "pixels" at once for inTri AND depth_compare.
50: if (!z_clear_busy) begin
y_ps[4:0] <= y_ps[4:0] + 5'd1;
isp_state <= 8'd90;
end
That's all that is needed in the ISP state machine, to process each row of the Tag buffer.
Other logic elsewhere, contains the 32 bits of inTri, and 32 copies of the depth_compare module.
wire trig_z_row_write = (isp_state==8'd50) && !z_write_disable;
(it actually jumps to isp_state 90, 91, 92 in real code, to give some extra time for processing delays.)
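For anyone following along, here's a software model of what one of those row writes does. It's a sketch, not the core's code: the struct and function names are made up, the loop does serially what the FPGA does for all 32 pixels in one go, and "larger Z wins" is an assumption about the depth-compare mode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kTileWidth = 32;

struct TileRow {
    float    z[kTileWidth];    // current depth per pixel
    uint16_t tag[kTileWidth];  // current triangle tag per pixel
};

// in_tri: bit x is set if pixel x of this row is inside the triangle.
// tri_z:  the triangle's depth at each of the 32 pixels.
// Returns a mask of the pixels that were actually written.
uint32_t write_row(TileRow& row, uint32_t in_tri,
                   const std::vector<float>& tri_z, uint16_t tri_tag) {
    assert(tri_z.size() == static_cast<std::size_t>(kTileWidth));
    uint32_t written = 0;
    for (int x = 0; x < kTileWidth; ++x) {
        const bool inside = ((in_tri >> x) & 1u) != 0;
        const bool passes = tri_z[x] > row.z[x];  // assumed compare mode
        if (inside && passes) {                   // inTri AND depth_compare
            row.z[x] = tri_z[x];
            row.tag[x] = tri_tag;
            written |= (1u << x);
        }
    }
    return written;
}
```

Pixels that fail either test keep their old Z and Tag, which is where the "free" hidden-surface removal comes from.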
Once all triangles within the current tile have been processed into the Tag buffer, the actual pixel rendering happens.
Which ended up being a far more complex bit of code than I wanted it to be. lol
Mainly due to the sucky latency of DDR3, and having to re-read the Codebook for any triangles that use VQ textures.
It also checks whether the Tag number has changed, as it traverses the pixels in each tile row.
(no need to keep fetching the triangle / shading / texturing params, if we're still within the same triangle.)
Also no need to keep fetching a new texel from VRAM, if the triangle doesn't have the "texture" flag set.
And that's pretty much it, but getting to this point took me about a year. lol
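The "has the Tag changed?" check is simple enough to model. This sketch (made-up function name) just counts how many param fetches a row would need if you only re-fetch when the tag differs from the previous pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count parameter fetches for one tile row, re-fetching only when the
// tag differs from the previous pixel's tag.
std::size_t count_param_fetches(const std::vector<uint16_t>& row_tags) {
    assert(row_tags.size() <= 32);  // a tile row is 32 pixels wide
    std::size_t fetches = 0;
    bool have_prev = false;
    uint16_t prev = 0;
    for (uint16_t t : row_tags) {
        if (!have_prev || t != prev) {
            ++fetches;  // new triangle: params must be loaded
            prev = t;
            have_prev = true;
        }
    }
    return fetches;
}
```

A row covered by one big triangle costs one fetch; a row where the tag flips on every pixel costs 32, which is why scenes with lots of small polys hurt.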
All modern GPUs use a similar tile-based approach, as it does make a lot of sense.
But they added tons of other fancy stuff, obviously, like fragment shaders.
IIRC, a "fragment", in the modern GPU sense, would be a bit like the Tag buffer, but referring only to one triangle, and containing ALL of the info to render that triangle.
I think a fragment even stores the per-pixel stuff for shading/texturing?
Fixed the gaps.
It was also just a processing delay...
// Wait for valid Texel data.
52: if (vram_valid) begin
isp_state <= isp_state + 1'd1;
end
53: begin
isp_state <= isp_state + 1'd1;
end
// Write the pixel to the Framebuffer.
54: if (!vram_wait) begin // Make sure DDRAM is NOT busy, before doing the Write.
if (y_ps[4:0]==5'd31 && x_ps[4:0]==5'd31) begin // On the last (lower-right) pixel of the tile...
x_ps[4:0] <= x_ps[4:0] + 5'd1;
tile_accum_done <= 1'b1; // Tell the RA we're done.
isp_state <= 8'd0; // Back to idle state.
end
I had to add the extra isp_state 53, to give the other logic time to do the calcs for texturing etc.
There are ways to mitigate that extra delay time, by doing pipelining stuff, but that scares me.
Also a blatant off-by-one bug in isp_state 54.
Which is very likely causing the small dots on the FPGA version, due to the last pixel of each tile not being written to.
Since it's only checking for y_ps[4:0] and x_ps[4:0] both == 31, but not allowing the actual pixel write in that state.
(on the FPGA, it has some extra delays, due to using the BRAMs, etc. The Verilator sim code currently doesn't properly emulate the DDR3 latency either.)
FPGA actually has 2 or 3 pixels of extra delay, hence why everything is shifted within each tile.
Trying to figure out why the mountains aren't being rendered in the sim.
Both the mountains and road textures use VQ compression.
The roads are displayed OK, so it can't really be a Codebook problem.
Mostly fixed the missing stuff now.
And Kasumi lost her beard.
Before...
After...
Now she's ready for the Paris Olympics.
It's all down to logic issues, in the main Tag fetch / param fetch / pixel rendering.
Since it checks to see if the Tag value has changed, and whether it needs to read the Codebook for a new texture.
But that was blocking it from doing the pixel render.
So, when the Tag changes rapidly on each pixel (like on her face), it wasn't allowing the actual pixel writes.
Will try. hehe
I kinda broke it again.
But now I know that when you see the tiles looking like this, it's usually because FRAC_BITS is too high.
Since I'm limiting the fixed_point values to 32-bit atm, it's not quite enough for anything above FRAC_BITS=11.
It looks VERY similar to when the inTri and depth_compare AND thing was broken.
As the integer part of the fixed-point values overflow, and inTri always thinks the triangle pixels are "visible" throughout the whole tile.
So whatever the last triangle was that gets written to the Tag buffer, smashes the whole buffer / tile.
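To put numbers on the headroom: with signed 32-bit fixed point, every extra fractional bit halves the integer range. The sketch below only shows that trade-off (function names are made up). It doesn't try to reproduce the exact FRAC_BITS=11 limit, since that depends on the intermediate terms in the real inTri math.

```cpp
#include <cassert>
#include <cstdint>

// Integer magnitude limit for signed 32-bit fixed point with frac_bits
// fractional bits: values must lie in [-2^(31-frac_bits), 2^(31-frac_bits)).
int64_t fixed32_int_limit(int frac_bits) {
    assert(frac_bits >= 0 && frac_bits <= 31);
    return int64_t{1} << (31 - frac_bits);
}

// True if integer_value can be held without overflowing the integer part.
bool fits_fixed32(int64_t integer_value, int frac_bits) {
    const int64_t lim = fixed32_int_limit(frac_bits);
    return integer_value >= -lim && integer_value < lim;
}
```

As an illustration only, a product of two screen coordinates like 640*480 = 307200 fits with 11 fractional bits (limit 1048576) but not with 14 (limit 131072).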
Also, once again had timing issues with the core, and it kept getting stuck in ra_parser state 12.
Even though there's NO conditional logic in that state for it to get stuck on.
looks playable to me
lol
I'm stuck in one of those really weird catch-22 things in the code atm.
I can't get good renders in the sim atm, after pasting in the code from the FPGA version.
The problem is, the whole delay thing, between when you update something, and when you receive the valid result.
I need it to iterate through each "pixel" at a time in the Tag buffer.
So that will have a delay of one clock cycle before the actual Tag value gets output.
Then I need to check to see if the Tag has changed since the previous time, so I don't waste time reading the entire Codebook or new texel for every Tag/pixel.
Can't seem to get it right again, without it looking like a scene from The Blue Oyster Bar.
Using the old-people Codebook logic is super slow, and not much better...
So I'm stuck again. The core has crappy timings, like the Jag core, so is now unstable.
And I'm not sure how I've managed to screw up the sim logic.
Totally confused now.
I tried cloning the sim repo from yesterday, and the code is quite different from what's in Quartus atm.
And the sim version is using the codebook cache, and I forgot how half of that works.
The sim thing was rendering quite fast, but with gaps between polys again. The fix from above didn't work.
It's all a real mess, tbh. I need to somehow reconcile all of the differences, and merge the projects.
Reverted to the Quartus project from yesterday.
I can once again see why people say "commit often" when using github.
Quite a bit better.
And this time, I remembered to do the git commit. lol
Just over four seconds for the Daytona render, at 16 MHz.
So at 96 MHz, it should do about 1.5 FPS?
1.33 FPS
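The arithmetic behind those two numbers, as a sketch. It assumes the cycle count per frame stays the same at the higher clock, which won't be quite true once real DDR latency is involved.

```cpp
#include <cassert>

// Scale a measured frame time at one clock to an FPS figure at another,
// assuming the number of clock cycles per frame does not change.
double fps_at_clock(double seconds_measured, double mhz_measured, double mhz_target) {
    assert(seconds_measured > 0 && mhz_measured > 0 && mhz_target > 0);
    const double cycles_per_frame = seconds_measured * mhz_measured * 1e6;
    return (mhz_target * 1e6) / cycles_per_frame;
}
```

Exactly 4 seconds at 16 MHz is 64M cycles, so 1.5 FPS at 96 MHz. "Just over four seconds" read as about 4.5 s (my guess at the timing) gives 72M cycles and the 1.33 FPS figure.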
That's starting to get closer to what I'd expect, before the Burst transfers stuff is added.
Which should make a BIG difference.
Rendered in ~1.5 seconds…
ship it
Awake again.
Jag CD.
With GreyRogue's latest core. I'm hoping it's more stable.
I can see now, how the Butch chip selects the Cart, or selects the CD BIOS.
The GROM1 signal is the Chip Select for the Cart slot on the CD drive itself.
The ROM1 signal is the usual Chip Select from the Jag mobo.
ROM1 goes to the Butch chip (near the bottom of the screen).
Then the Butch chip can select either the GROM1 (cart slot) or PROMCSL (CD BIOS).
And it knows whether there is a Cart in the slot, in a similar way to how the Jag mobo does it...
The "Cart present" signal gets grounded directly when you insert a cart.
The Jag mobo actually uses that to enable the main 5V switching regulator. Without the cart inserted, you don't get any signs of life from the console.
The Butch chip uses that simply to know if there's a cart in the slot on the CD drive.
But, when I try to load the CD BIOS as if it's a cart, it always gives the red screen of death.
The CD BIOS even seems to have a header, like a cart ROM does.
And it appears to have the correct byte swap.
But...
Many carts are accessed in either 16-bit or 32-bit mode.
IIRC, the Jag BIOS detects that by reading a few bytes in the cart, and can figure out how the PROM is connected.
From what I can tell from the MAME code, the CD BIOS really should just get mapped like a cart, at 0x800000.
That is the range where xromcsl1 should also get asserted (Low).
(xromcsl0 is for selecting the main BIOS on the mobo, and that gets mapped to 0xE00000, sort of.)
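Putting the address ranges from the MAME map together with the Butch cart/BIOS selection, my current understanding looks roughly like this. It's a sketch with made-up names: only the 256KB window from the MAME "cdbios" map is modelled for the 0x800000 space, and the "cart wins if present" rule is my reading of the schematic, not confirmed.

```cpp
#include <cassert>
#include <cstdint>

enum class Select { None, MainBios, CartSlot, CdBios, ButchRegs };

// Decode a 68K address into the thing that should respond.
Select decode_68k_address(uint32_t addr, bool cart_present) {
    assert(addr <= 0xFFFFFFu);  // the 68000 has a 24-bit address bus
    if (addr >= 0xE00000u && addr <= 0xE1FFFFu) return Select::MainBios;   // xromcsl0
    if (addr >= 0xDFFF00u && addr <= 0xDFFF3Fu) return Select::ButchRegs;
    if (addr >= 0x800000u && addr <= 0x83FFFFu) {                          // xromcsl1
        // Assumption: Butch routes ROM1 to the cart slot (GROM1) when a
        // cart is present, otherwise to the CD BIOS (PROMCSL).
        return cart_present ? Select::CartSlot : Select::CdBios;
    }
    return Select::None;
}
```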
I'm wondering if the CD BIOS needs to get mirrored multiple times, so it can jump to code higher up in memory.
Not sure why it would need to do that, but anyway.
I haven't seen any accesses of the Butch regs yet.
If I enable the cart Checksum bypass, it goes to a black screen.
Then sits at this address...
Jag CD BIOS has a jump vector in the same place as a typical cart...
Gonna check MAME.
Interesting. MAME is reading this from Cybermorph. Which is the four bytes just before the jump vector thingy...
OK, then after the Jag logo disappears (after the checksum?), it does read the jump vector, then immediately takes the Jump...
The Jag CD BIOS jumps directly to that point on start-up, which is weird...
ie. not to 0xE00000, where the main BIOS is.
Then it does read the CD BIOS vectors, but not in the same way?
There's a Jag CD Dev BIOS, too, but that just has lots of 0xFF at the start, and then what seems to be similar code to the normal CD BIOS?
Very weird.
Deja Vu, from many years ago.
look at the univ.bin patch file
at offset $400 you will see 4 bytes : 04 04 04 04
this means that the image is for 32bits cartridge
change them to 00 00 00 00 give you a patch for 8 bits cartridges.
02 02 02 02 stands for 16 bits cartridges (don't know if it works).
So now I think I know what's happening.
The Jag core isn't set up to work with 8-bit wide Cart ROMs atm.
Probably.
Oh yeah, that's why they repeat the same number four times.
So it still appears, even if the cart ROM is being read as 8-bit, 16-bit, or 32-bit wide.
lol
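Quick sketch for checking what width a ROM image claims, going by the 00 / 02 / 04 values in that old post. The function name is made up, and the 16-bit value is the one the poster wasn't sure about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the four repeated width bytes at offset 0x400 of a cart image.
// Returns the bus width in bits, or 0 if the bytes are not a recognised
// repeated value.
int cart_bus_width(const std::vector<uint8_t>& rom) {
    constexpr std::size_t kWidthOffset = 0x400;
    assert(rom.size() >= kWidthOffset + 4);
    const uint8_t b = rom[kWidthOffset];
    for (std::size_t i = 1; i < 4; ++i) {
        if (rom[kWidthOffset + i] != b) return 0;  // bytes must all match
    }
    switch (b) {
        case 0x00: return 8;
        case 0x02: return 16;
        case 0x04: return 32;
        default:   return 0;
    }
}
```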
// 32-bit cart mode...
//
assign cart_q1 = (!abus_out[2]) ? DDRAM_DOUT[63:32] : DDRAM_DOUT[31:00];
I knew it was in there somewhere.
Gonna take some figuring out.
Looks like the cart ROM is now read from SDRAM.
So I'll probably have to tweak the address there.
Actually, maybe I can just patch the CD BIOS to be read as 32-bit.
Heyyy
That was it.
No real need to tweak the core for reading in 8-bit mode, but I might add the option anyway.
Just need to patch those four bytes at 0x400 in jag_cd.bin, then enable the cart Checksum bypass.
(unless the checksum in the ROM can be updated. I'm not sure where that is.)
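The patch itself is tiny. A sketch with a made-up function name; it only touches the four bytes at 0x400 and leaves any checksum alone, so the Checksum bypass is still needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Force the four width bytes at 0x400 to 0x04 (32-bit cart).
// Returns true if any byte was changed.
bool patch_cart_width_to_32bit(std::vector<uint8_t>& rom) {
    constexpr std::size_t kWidthOffset = 0x400;
    assert(rom.size() >= kWidthOffset + 4);
    bool changed = false;
    for (std::size_t i = 0; i < 4; ++i) {
        if (rom[kWidthOffset + i] != 0x04) {
            rom[kWidthOffset + i] = 0x04;
            changed = true;
        }
    }
    return changed;
}
```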
First accesses of the Butch regs...
All initial Butch reg accesses...
First writes...
Which tally with MAME.
Looks like the commands get written to the upper Byte, for some reason.
And the first two commands are...
0x70 Set DAC mode (?)
0x17 Clear Error
[0x00]: irq register
(R)
-x-- ---- ---- ---- CD uncorrectable data error pending
--x- ---- ---- ---- Response from CD drive pending
---x ---- ---- ---- Command to CD drive pending
---- x--- ---- ---- Subcode data pending
---- -x-- ---- ---- Frame pending
---- --x- ---- ---- CD data FIFO half-full flag pending
(W)
---- ---- -x-- ---- CIRC failure irq
---- ---- --x- ---- CD module command RX buffer full irq
---- ---- ---x ---- CD module command TX buffer empty irq
---- ---- ---- x--- Enable pre-set subcode time-match found irq
---- ---- ---- -x-- Enable CD subcode frame-time irq
---- ---- ---- --x- Enable CD data FIFO half full irq
---- ---- ---- ---x set to enable irq
[0x04]: DSA control register
[0x0a]: DSA TX/RX data (sends commands with this)
0x01 Play Title (?)
0x02 Stop
0x03 Read TOC
0x04 Pause
0x05 Unpause
0x09 Get Title Len
0x0a Open Tray
0x0b Close Tray
0x0d Get Comp Time
0x10 Goto ABS Min
0x11 Goto ABS Sec
0x12 Goto ABS Frame
0x14 Read Long TOC
0x15 Set Mode
0x16 Get Error
0x17 Clear Error
0x18 Spin Up
0x20 Play AB Min
0x21 Play AB Sec
0x22 Play AB Frame
0x23 Stop AB Min
0x24 Stop AB Sec
0x25 Stop AB Frame
0x26 AB Release
0x50 Get Disc Status
0x51 Set Volume
0x54 Get Maxsession
0x70 Set DAC mode (?)
0xa0-0xaf User Define (???)
0xf0 Service
0xf1 Sledge
0xf2 Focus
0xf3 Turntable
0xf4 Radial
[0x10]: I2S bus control register
[0x14]: CD subcode control register
[0x18]: Subcode data register A
[0x1C]: Subcode data register B
[0x20]: Subcode time and compare enable
[0x24]: I2S FIFO data
[0x28]: I2S FIFO data (old)
[0x2c]: ? (used at start-up)
case 8: //DS DATA
switch((m_butch_regs[offset] & 0xff00) >> 8)
{
case 0x03: // Read TOC
{
For sure seems to use the upper Byte, when writing Commands.
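So a decode of the DSA data write would look something like this. Sketch only: the names are made up, treating the lower byte as a parameter is my assumption, and the name table only has a few entries from the command list above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct DsaCommand {
    uint8_t opcode;  // upper byte of the 16-bit write
    uint8_t param;   // lower byte (assumed to be the command parameter)
};

// Split a 16-bit DSA data write into opcode and parameter, using the
// same (value & 0xff00) >> 8 extraction as the MAME code.
DsaCommand split_dsa_write(uint32_t reg_value) {
    assert(reg_value <= 0xFFFFu);  // assuming a 16-bit write
    return { static_cast<uint8_t>((reg_value & 0xff00u) >> 8),
             static_cast<uint8_t>(reg_value & 0x00ffu) };
}

// Name lookup for a few opcodes from the command list.
std::string dsa_command_name(uint8_t opcode) {
    switch (opcode) {
        case 0x02: return "Stop";
        case 0x03: return "Read TOC";
        case 0x17: return "Clear Error";
        case 0x18: return "Spin Up";
        case 0x70: return "Set DAC mode (?)";
        default:   return "unknown";
    }
}
```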