#rpi open firmware

dusk fiber
#

@twilit oasis you were interested in the open rpi firmware a few days ago but had to run, do you want to learn more about it today?

twilit oasis
#

At work currently. I've been busy.

dusk fiber
#

ah, just ping me in here whenever you're free

dusk fiber
#

for every model of rpi, the VPU always starts executing a maskrom after reset

#

for bcm2835, that rom is 18384 bytes
for bcm2836, 18384 bytes again, but 2 parts have changed
for bcm2837, 30528 bytes, a complete overhaul
for bcm2711, 14480 bytes, stripped back heavily

#

the 283{5,6,7} maskroms can all load the next stage from ~8 sources

#

1st/2nd/4th in the boot order are all bootcode.bin on fat on an SD card
but they differ in which controller (sdhost vs sdhci) and bus width (4bit vs 8bit) is used

#

3rd is raw NAND flash

#

5th is SPI flash

#

6th is the dwc usb controller
bcm283{5,6} (pi0-pi2) only support device mode

bcm2837(pi3, pi02) support both host and device
there are separate OTP bits to allow host and device
if both host&device are allowed, it will query the OTG_ID pin and decide at runtime

#

in usb-host mode, it supports the lan9514 NIC found on the rpi, and will then boot from either usb-storage or tftp over the lan9514

#

7th in the boot order is i2c-slave, you just fire off a giant blob with raw i2c writes, not SMbus style

#

8th is something called MPHI, but i have no idea what it is

#

bcm2711 stripped it back massively, to just:
1: recovery.bin on an SD card
2: SPI flash
3: usb-device mode

#

@twilit oasis that all make sense so far?

twilit oasis
#

Yeah

dusk fiber
#

for the pi0-pi3 lineup, the default way to boot, is with bootcode.bin on sd/usb/tftp

#

for the bcm2711 lineup, its SPI flash all the way, at this stage

#

the pi3 maskrom had many usb-host bugs, and being rom, you cant exactly fix them

twilit oasis
#

Honestly SPI flash does simplify some things.

dusk fiber
#

as a hack, they came up with the bootcode.bin only mode of booting

#

an SD card, with only bootcode.bin, would bypass rom bugs, and be able to usb-host boot the rest of the way

#

and as a side-benefit, this worked on the pi0-pi2 lineup

#

the pi4 basically just made that the only way to boot

#

throw bootcode.bin onto SPI flash, and now its basically the same setup

#

and you can update it at any time to fix bugs or add features

#

for the pi0-pi3 family, the official bootcode.bin can detect how it was loaded, and will try to load the next stage from the same source, after it brings dram online

#

but the special bootcode.bin only mode, means that if it came from SD, and cant find anything, it will try usb-host next

#

for the bcm2711, the bootcode.bin in SPI will consult the bootconf.txt file in SPI, which has a BOOT_ORDER= string

#

BOOT_ORDER=0xf4241
this says to try SD first
then usb-storage on the pcie xhci
then network
then usb-storage on pcie xhci again
then loop
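
To make the right-to-left reading of that value concrete, here is a minimal C sketch that decodes a BOOT_ORDER one nibble at a time, lowest nibble first. The mode names (1 = sd, 2 = network, 3 = rpiboot, 4 = usb-msd, 5 = bcm-usb-msd, e = stop, f = restart) follow my reading of the public bootloader docs and should be treated as assumptions; `boot_mode_name` and `boot_order_describe` are hypothetical helper names, not anything from the firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Nibble meanings are assumptions based on the public bootloader docs. */
static const char *boot_mode_name(unsigned nibble)
{
    switch (nibble) {
    case 0x1: return "sd";
    case 0x2: return "network";
    case 0x3: return "rpiboot";
    case 0x4: return "usb-msd";
    case 0x5: return "bcm-usb-msd";
    case 0xe: return "stop";
    case 0xf: return "restart";
    default:  return "other";
    }
}

/* Walk BOOT_ORDER from the lowest nibble upwards, writing "a,b,c" into out. */
static void boot_order_describe(uint32_t order, char *out, size_t out_len)
{
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    while (order != 0) {
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? "," : "", boot_mode_name(order & 0xf));
        if (n < 0 || (size_t)n >= out_len - used)
            break; /* buffer too small, stop with a truncated string */
        used += (size_t)n;
        order >>= 4;
    }
}
```

Fed 0xf4241, this produces `sd,usb-msd,network,usb-msd,restart`, which lines up with the order described above.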

#

thats far better than before, when you had to configure things in write-once OTP

twilit oasis
#

Yeah, although personally I'm partial to config pins.

dusk fiber
#

you can do that too!

#

bootconf.txt supports gpio conditionals

twilit oasis
#

Hmm interesting

dusk fiber
#
[all]
BOOT_UART=1
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
DHCP_TIMEOUT=45000
DHCP_REQ_TIMEOUT=4000
TFTP_FILE_TIMEOUT=30000
TFTP_IP=192.168.2.15
TFTP_PREFIX=0
SD_BOOT_MAX_RETRIES=3
NET_BOOT_MAX_RETRIES=5

gpio=21=ip,pu

[gpio21=0]
BOOT_ORDER=0x3

[gpio21=1]
BOOT_ORDER=0x5241

[none]
FREEZE_VERSION=1
#

gpio=21=ip,pu changes pin 21 to input with pullup

#

[gpio21=0] means the following statements only apply if the pin is somehow low

twilit oasis
#

At this point I'm just relieved it doesn't have the brainrot that is UEFI.

dusk fiber
#

so a jumper from gpio21 to gnd, can change the BOOT_ORDER
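
A rough sketch of how the conditional sections in that config could be evaluated, assuming only the `[all]`, `[none]` and `[gpio21=N]` headers from this example matter. `effective_boot_order` is a hypothetical helper, not the real bootloader parser, and the pin level is passed in as an argument rather than read from hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the BOOT_ORDER that ends up in effect for a given gpio21 level.
 * Later matching assignments override earlier ones; `fallback` is returned
 * when no active section sets BOOT_ORDER.
 */
static uint32_t effective_boot_order(const char *const *lines, size_t count,
                                     int gpio21_level, uint32_t fallback)
{
    int active = 1;            /* settings before any header always apply */
    uint32_t order = fallback;

    assert(lines != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *line = lines[i];
        int want;

        if (strcmp(line, "[all]") == 0)
            active = 1;
        else if (strcmp(line, "[none]") == 0)
            active = 0;
        else if (sscanf(line, "[gpio21=%d]", &want) == 1)
            active = (want == gpio21_level);
        else if (active && strncmp(line, "BOOT_ORDER=", 11) == 0)
            order = (uint32_t)strtoul(line + 11, NULL, 0);
    }
    return order;
}
```

With the pullup enabled the pin reads 1 and BOOT_ORDER=0x5241 wins; with the jumper to gnd it reads 0 and the result is 0x3.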

#

the maskrom also has gpio config

#

you can configure a certain gpio pin, to disable loading recovery.bin from SD when in a certain state

#

and the same, for booting from SPI

#

fully configurable, for both which pin, and which level is "disable"

#

thats mainly for use on the CM4, when the emmc and SPI are soldered on, and might have bad firmware

#

forcing them off causes it to fall back to usb-device mode, and you can re-flash it

#

for all models, the rom loads a .bin file into the 128kb VPU L2 cache, and executes it (with optional signature checking)

#

that .bin then brings dram online, and loads a .elf file

#

and the .elf then brings the rest of the system up, and loads linux

#

with the pi0-pi3 lineup, you have 3 choices over where to get your bootcode.bin

#

the closed bootcode.bin also brings dram online, and then loads start.elf and executes it

#

using start_file=msd.elf or just renaming .elf files, you can also get the closed .bin to load other things

#

you then have 3 choices again, for which .elf you load

#

msd.elf is meant for compute modules, but also works on the zeros
the rpi will use the dwc to emulate a usb MSD, and expose the sd/emmc, for flashing an OS

#

start.elf is the official firmware, with all of the services you expect from an rpi (hw encode/decode, camera, 3d, and everything)

#

and lk-overlay can also produce an lk.elf file, and one configuration will go on to loading linux

#

the biggest difference and compatibility issue here, is that start.elf comes with a fixup.dat, relocation data, for loading start.elf to a diff memory addr

#

and the open loader doesnt support that, and has never been tested with the closed start.elf binary

#

but the open lk.elf is compatible with both open&closed .bin loaders

#

so you can use the closed bootcode.bin for netboot, but then the open lk.elf, and speed up your development cycles

#

or go pure open source, with lk-overlay for both bootcode.bin and lk.elf

#

@twilit oasis that whole mess make sense?

twilit oasis
#

It's long, but yeah.

dusk fiber
#

and this is the boot graph for the old bcm2711 firmware

#

i'm not able to bring the dram up, or the arm up

#

so, while i can stick open source in at any stage, thats a dead-end

#

but, when RPF decided to add https booting to the SPI firmware, it went over the 128kb size limit

#

so they split the bootloader into 2 pieces

#

the new bootcode.bin only does dram init, and loading of bootmain.elf from spi, and has lost the ability to boot from anything else

#

and the new bootmain.elf then does the actual loading part of a bootloader

#

https booting is also called network install

#

the SPI flash will just download a ~64mb boot.img file over the internet, check an rsa signature on it, and then treat that as the fat partition on SD

#

within the default image, is the normal start4.elf firmware, linux, and an initrd
and within the initrd, is the rpi imager

#

so, you can just hit shift while booting, and the rpi imager will launch, and you can install any distro to your sd/usb

#

and i say "default image", because its fully configurable in bootconf.txt

#

HTTP_HOST=, HTTP_PATH= can be used to get boot.img from elsewhere

#

IMAGER_REPO_URL= changes the path to the json file that rpi-imager uses for a distro list

#

so you can use the stock boot.img, but provide a custom list of distros

#

combine that with gpio conditionals, and you can force it into net-install mode with a jumper

#

and boom, you now have a (network reliant) factory reset, on whatever product you bake a CM4 into

#

hold a button while booting, select an OS from the list, hit write, done

twilit oasis
#

Well minimizing proprietary blobs is definitely preferable.

dusk fiber
#

yeah, thats where replacing the blobs comes in

#

with the old rpi-open-firmware project, you can boot the pi2/pi3 without any blobs present (if you ignore the maskrom)

#

but you dont get any video, and changing the codebase often crashes things, its got to fit within 128kb, and is oddly fragile at times

#

so i grabbed this existing kernel (LK), and ported it to the VPU

#

that instantly gave me threading, mutexes, blocking thread primitives, irq handlers, ext2, tga, and more

twilit oasis
#

That's impressive.

dusk fiber
#

i then took bits of code from rpi-open-firmware, and ported them over, giving me lpddr2, arm, power, and clock drivers

#

with LK, you define a project file like this

#

each module, has its own rules.mk that follows a similar scheme

#

the target, says to load target/rpi3-vpu/rules.mk, which follows the same scheme again

#

TARGET := rpi3-vpu and BOOTCODE := 1 say that the resulting lk.bin must be under 128kb, and is compatible with the maskrom loading, just rename it to bootcode.bin and this will run on startup

#

platform/bcm28xx/otp is the OTP driver, so you can read things like the hw revision and serial#

#

platform/bcm28xx/rpi-ddr2 is the lpddr2 driver, so it can bring ram online

#

platform/bcm28xx/sdhost is an SD driver, so you can access storage

#

app/vc4-stage1 is the bootloader app, it uses otp/sdhost to mount an ext4 partition, load /boot/lk.elf, and then executes it

#

and every module, is free to declare more modules it depends on, and sources to add to the kernel

#

lib/elf allows parsing elf files
lib/fs is the filesystem core
lib/fs/ext2 is the ext2/4 driver
lib/partition adds MBR support
lib/lua is part of a test, for running lua at boot

#

https://github.com/librerpi/lk-overlay/blob/master/app/vc4-stage1/stage1.c

bdev_t *sd = rpi_sdhost_init();
partition_publish("sdhost", 0);
ret = fs_mount("/root", "ext2", "sdhostp1");
ret = fs_open_file("/root/boot/lk.elf", &stage2);
ret = elf_open_handle(stage2_elf, fs_read_wrapper, stage2, false);
void *entry = load_and_run_elf(stage2_elf);
fs_close_file(stage2);
arch_chain_load(entry, 0, 0, 0, 0);
#

and boom, with just that (and error handling), you have a bootloader!

#

but, its only able to load VPU binaries, and it just jumps to the entry-point defined in the ELF header
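
For a feel of what "the entry-point defined in the ELF header" means, here is a minimal sketch that pulls `e_entry` out of a raw little-endian ELF image. It is not the lib/elf code from LK; `read_le` and `elf_entry_point` are hypothetical names, and only the standard header layout (magic at offset 0, class at byte 4, data encoding at byte 5, `e_entry` at offset 0x18) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian value of `len` bytes. */
static uint64_t read_le(const uint8_t *p, size_t len)
{
    uint64_t v = 0;

    for (size_t i = len; i-- > 0; )
        v = (v << 8) | p[i];
    return v;
}

/*
 * Store e_entry in *entry and return 0, or return -1 if the buffer does not
 * look like a little-endian ELF32/ELF64 image.
 */
static int elf_entry_point(const uint8_t *img, size_t len, uint64_t *entry)
{
    assert(img != NULL && entry != NULL);
    if (len < 0x20 || memcmp(img, "\x7f" "ELF", 4) != 0)
        return -1;
    if (img[5] != 1)                 /* EI_DATA: 1 = little-endian */
        return -1;
    if (img[4] == 1)                 /* EI_CLASS 1 = ELF32, 4-byte e_entry */
        *entry = read_le(img + 0x18, 4);
    else if (img[4] == 2)            /* EI_CLASS 2 = ELF64, 8-byte e_entry */
        *entry = read_le(img + 0x18, 8);
    else
        return -1;
    return 0;
}
```

A real loader also has to copy the PT_LOAD segments into place before jumping; this only shows where the jump target comes from.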

#

so we are free to add much more expensive modules into the code

#

stage2 will bring various hw blocks online, including the arm core, and execute an arm payload on that

#

this runs on the arm core, and loads rpi2.dtb and zImage from /boot on ext4

#

on the LK side of things, i have uart/vec(ntsc)/dpi/3d/pwm/sd all working

#

when linux is on the arm side, it has uart/usb-host/sd, and it can get a dumb framebuffer by routing thru LK

#

the main features that are missing and would be possible to solve, are:

  • config files
  • raw camera access (just bayer frames)
  • i2s/spi/i2c on linux
  • maybe 3d on linux, with some driver mangling
#

the harder stuff, that should still be possible:

  • usb-host on the vpu side (netboot/usb boot)
  • hdmi/dsi init
#

and then the basically impossible stuff, is h264/mpeg2/jpeg accel, and the whole ISP

#

bcm2711 is much further behind, no dram init, and i havent gotten the arm core to start either, so you're entirely reliant on closed blobs

#

@twilit oasis any questions after all of that?

dusk fiber
#
0x3f102070 00 00 90 00 0c 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f102080 18 02 43 00 00 0c 01 00  86 00 18 00 40 00 00 00  |..C.........@...|
] hexdump 0x7e102070 32
0x7e102070 00 00 90 00 0c 08 00 00  00 00 00 00 00 00 00 00  |................|
0x7e102080 38 02 47 00 00 1c 01 00  8e 04 18 00 40 00 00 00  |8.G.........@...|
dusk fiber
#

some progress

#

i just ignored that PLLH isnt locking, and tried using it anyways, as the ref-clk for PWM/audio

#

its hovering around 311mhz

dusk fiber
#

and its working!!

#

there were 4 DIG registers, no clue what they do
but i noticed, linux doesnt touch them, ever

#

so i commented that out, and boom, its come to life!

dusk fiber
#

@kindred raptor i got PLLH working last night!

kindred raptor
#

Nice!

dusk fiber
#

i started by just ignoring the fact that it doesnt lock, and trying to use it anyways

#

some math with the divisors, revealed it was running at around 300mhz

#

and after more messing around, i eventually got it to run at the desired freq
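
The "math with the divisors" can be sketched roughly like this. The formula (crystal times an integer-plus-fraction multiplier, divided by a pre-divider, with a 20-bit fraction) is how I remember Linux's clk-bcm2835 driver computing a PLL rate, so treat both the formula and the 19.2 MHz crystal in the example as assumptions; `pll_rate_hz` is a hypothetical helper and the numbers below are not the values from this debugging session.

```c
#include <assert.h>
#include <stdint.h>

#define PLL_FRAC_BITS 20 /* assumed width of the fractional multiplier */

/* rate = xosc * (ndiv + fdiv / 2^20) / pdiv, computed in integer math */
static uint64_t pll_rate_hz(uint64_t xosc_hz, uint32_t ndiv, uint32_t fdiv,
                            uint32_t pdiv)
{
    uint64_t rate;

    assert(pdiv != 0);
    rate = xosc_hz * (((uint64_t)ndiv << PLL_FRAC_BITS) + fdiv);
    return (rate / pdiv) >> PLL_FRAC_BITS;
}
```

For example, a multiplier of 56.25 (ndiv 56, fdiv 0x40000) on a 19.2 MHz reference works out to 1080 MHz.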

kindred raptor
#

Does it lock now?

dusk fiber
#

yep

kindred raptor
#

So, why did it refuse to lock?

dusk fiber
#

there was a set of 4 registers, DIG0, DIG1, DIG2, DIG3

#

my code was setting them, the same as it did with the other PLL's

#

but linux never touches the DIG registers, ever

#

i commented that block out, and it started working

kindred raptor
#

What were the DIG registers supposed to do again?

dusk fiber
#

no clue!

#

linux doesnt touch them, and i cant remember where i got that code from

#

and the headers dont say anything

#
  *REG32(A2W_PLLA_DIG3) = A2W_PASSWORD | 0x0;
  *REG32(A2W_PLLA_DIG2) = A2W_PASSWORD | 0x400000;
  *REG32(A2W_PLLA_DIG1) = A2W_PASSWORD | 0x5;
  *REG32(A2W_PLLA_DIG0) = A2W_PASSWORD | div | 0x555000;
#

it was doing a bunch of magic numbers like this

#
  *REG32(A2W_PLLA_ANA3) = A2W_PASSWORD | KA(2);
  *REG32(A2W_PLLA_ANA2) = A2W_PASSWORD | 0x0;
  *REG32(A2W_PLLA_ANA1) = A2W_PASSWORD | (prediv ? ANA1_DOUBLE : 0) | KI(2) | KP(8);
  *REG32(A2W_PLLA_ANA0) = A2W_PASSWORD | 0x0;
#

and i think the KA/KI/KP vars, are part of the digital PID loop

kindred raptor
#

ic

dusk fiber
#

i assume that changing those, will adjust how fast the PLL locks, and how stable its clock is

#

but i would need a spectrum analyzer and other fancy hw, to do anything with that

#

so, in theory....

#

#define SCALER_DISPECTRL_SECURE_MODE_SET 0x80000000

#

i just set bit 31 in 0x7e40000c, and the security problem i had goes away

#

(or clear it, will need testing)

#

and now that PLLH is working, the linux kms drivers should just work

#

and boom, full 2d and 3d accel

dusk fiber
#

ok, lets see, first i need to turn the arm code back on....

#
] whatareyou
i am aarch64 with MIDR_EL1 0x410fd034 in EL 1
#

ok, arm core is running a repl

dusk fiber
#
] hexdump 0xffffffffc0400000 32
0xc0400000 7f 00 0c 80 00 00 17 00  76 72 64 64 00 00 3f 81  |........vrdd..?.|
0xc0400010 00 00 00 00 00 00 00 00  00 00 00 00 76 72 64 64  |............vrdd|
#

0x7e40000c has bits 16-21, 24, and 31 set

#

so SECURE_MODE is 1
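
A small sketch for going back and forth between "bits 16-21, 24, and 31" and the raw register value. `format_set_bits` is a hypothetical helper; the only thing taken from the thread is the bit 31 mask for SECURE_MODE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SCALER_DISPECTRL_SECURE_MODE_SET 0x80000000u /* bit 31, from the define above */

/* Write the set bits of `value` into `out` as ranges, e.g. "16-21,24,31". */
static void format_set_bits(uint32_t value, char *out, size_t out_len)
{
    size_t used = 0;
    int bit = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    while (bit < 32) {
        int first, last, n;

        if (!((value >> bit) & 1u)) {
            bit++;
            continue;
        }
        first = bit;
        while (bit < 32 && ((value >> bit) & 1u))
            bit++;
        last = bit - 1;
        if (first == last)
            n = snprintf(out + used, out_len - used, "%s%d",
                         used ? "," : "", first);
        else
            n = snprintf(out + used, out_len - used, "%s%d-%d",
                         used ? "," : "", first, last);
        if (n < 0 || (size_t)n >= out_len - used)
            return; /* buffer too small, keep what fits */
        used += (size_t)n;
    }
}
```

A value of 0x813f0000 formats as `16-21,24,31`, and masking it with `SCALER_DISPECTRL_SECURE_MODE_SET` is nonzero, matching the reading above.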

#

ok, now linux doesnt boot anymore

dusk fiber
#

interesting

#

*REG32(SCALER_DISPECTRL) &= ~SCALER_DISPECTRL_SECURE_MODE;

#

just running this, breaks linux

#

but if i run that from the VPU, it doesnt hang

#

ok, so at /soc/gpu you have:

                gpu {
                        compatible = "brcm,bcm2835-vc4";
                        status = "disabled";
                };
#

that is the master for the whole drm

#

ive turned on a bunch of gpu stuff....

#
[  103.230789] raspberrypi-firmware soc:firmware: Request 0x00030066 returned status 0x80000001
[  103.240922] vc4-drm soc:gpu: [drm] Couldn't stop firmware display driver: -22
PANIC: Asynchronous SError Interrupt

Entering kdb (current=0xffffff800b408000, pid 748) on processor 0 due to NonMaskable Interrupt @ 0xffffffc00815eb64
[0]kdb> 
#

oh wow, that actually worked this time

#
 panic+0x198/0x374
 nmi_panic+0xb4/0xbc
 arm64_serror_panic+0x78/0x84
 do_serror+0x30/0x7c
 el1h_64_error_handler+0x38/0x50
 el1h_64_error+0x64/0x68
 vc4_hvs_bind+0xf8/0x560 [vc4]
 component_bind_all+0x110/0x260
#

@granite sandal in case you missed it, ive moved my spammy chat over to this thread

dusk fiber
#

as best as i can tell, its vc4_hvs_upload_linear_kernel() that faulted, while writing to the display list

#

which means the hvs security is not disabled, and the problem remains

granite sandal
#

i'm on a bug hunt today will go back and read it

dusk fiber
#

Security to block the arm from using the 2d core

kindred raptor
#

o_O

#

Why would one want that? To prevent memory access from the "wrong" context?

dusk fiber
#

The 2d core can read all ram via dma

#

That defeats all DRM schemes

#

Your crypto keys cease to be safe

kindred raptor
#

So you block the 2D core while doing what? In a game console context (think We Hope Nintendo Will Use Our IC), it wouldn't make any sense. I guess it was made to prevent DMA while decoding DRM'd video?

dusk fiber
#

Or force all video thru an RPC call into the secure kernel

#

Opengl was over that RPC channel

#

There are signs that a secure video frame can be used as opengl texture in a secure manner

kindred raptor
#

(replying to dusk fiber: "Opengl was over that RPC channel")

So, you put your display list in a buffer somewhere, you call an OS function, it reads & hopefully somehow in spite of the halting problem sanitizes it, and then sends it to the videocore?

dusk fiber
#

The list can't loop so no halting problem

#

No branching

dusk fiber
#

if you just implement a basic parser, you can decode that, substitute virtual addresses for physical addresses (and fault on violation), and then forward it on

kindred raptor
#

Makes sense!

dusk fiber
#

but the rpi firmware instead implements the dispmanx api

#

it doesnt expose the raw hw api itself, but a proper 2d graphics api

#

vc_dispmanx_resource_create() creates an image in gpu memory

#

vc_dispmanx_resource_write_data() copies image data from linux to gpu memory

#

vc_dispmanx_update_start() and vc_dispmanx_update_submit_sync() wrap a bunch of changes, to make the whole group atomic

#

vc_dispmanx_element_add() creates a sprite using a previously allocated resource as its backing image

#

other functions exist, to change the resource behind a sprite, or to change the parameters on a sprite

#

and RPF (at my request) added a function that lets you get the physical address of a resource in gpu memory, so you can dma directly into it and skip vc_dispmanx_resource_write_data()

#

the dispmanx stack then keeps track of all of the resources(images) and elements(sprites), and generates the display list automatically

kindred raptor
#

That's... another way to ensure you don't get your tivobox owned

#

(As long as you properly sanitize inputs)

dusk fiber
#

yep

#

there is a dedicated mmu between the arm core(s) and the rest of the system

#

so you could ban linux from ever touching gpu ram or mmio

#

if linux wants something, it must ask the gpu for help

#

but the rpi firmware hasnt made use of any of these tricks

#

the security was off from day 1

#

so all thats left, is design tricks, that imply it could have done this at one time

#

like the firmware having a secure/non-secure split, and well sanitized rpc calls into the secure half

#

including one rpc call that just lets you write anywhere in ram 😛

dusk fiber
#

the original pi1 firmware, looked like it came right out of a cable box

#

it has mention of tv remote buttons, games, channels

#

they basically just took STB firmware, added an app to run linux on the co-processor, and shipped it 😛

#

and over time, they removed unused parts, and the codebase evolved

#

to speed up testing, i added a line to the bootloader

#

dlist_memory[0] = 0x1234;

#

in theory, that will cause the same fault linux is having, but way sooner in the boot process

kindred raptor
#

What fault was linux having again? I may have missed it in the scrollback :P

dusk fiber
#

async external abort

#

ok, that sorta worked

#
[    0.000000] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.21 #1-NixOS
[    0.000000] Hardware name: Raspberry Pi 3 Model B rev 1.2, with open firmware (DT)
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
...
[    0.000000] Kernel panic - not syncing: Asynchronous SError Interrupt
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.21 #1-NixOS
[    0.000000] Hardware name: Raspberry Pi 3 Model B rev 1.2, with open firmware (DT)
[    0.000000] Call trace:
...
[    0.000000]  el1h_64_error+0x64/0x68
[    0.000000]  setup_arch+0x168/0x5d4
[    0.000000]  start_kernel+0xa4/0x79c
[    0.000000]  __primary_switched+0xbc/0xc4
#

my bootloader somehow has that error masked

#

so the instant linux turns exception handling on, boom

#

i can kinda see why this isnt handled like an OOPS

#

the handler ran seconds!! after the fault was triggered (thats the async part)

#
325         parse_early_param();
326 
327         /*
328          * Unmask asynchronous aborts and fiq after bringing up possible
329          * earlycon. (Report possible System Errors once we can report this
330          * occurred).
331          */
332         local_daif_restore(DAIF_PROCCTX_NOIRQ);
#

325 is what allowed me to see this
and 332 feels like where it caught the exception

#

arch/arm64/include/asm/daifflags.h:#define DAIF_PROCCTX_NOIRQ (PSR_I_BIT | PSR_F_BIT)

#

DAIF, page 392

#

Interrupt Mask Bits

#

bit 8, A, SError interrupt mask bit.

#

that matches the error msg
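
To tie the numbers in that panic together, here is a small decoding sketch. It assumes the usual arm64 layout: the exception class sits in bits 31:26 of the ESR value (0x2f being SError), and D/A/I/F are bits 9/8/7/6 of the saved pstate. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ESR_EC_SHIFT  26
#define ESR_EC_MASK   0x3fu
#define ESR_EC_SERROR 0x2fu /* exception class for an SError interrupt */

#define PSR_F_BIT (1u << 6)
#define PSR_I_BIT (1u << 7)
#define PSR_A_BIT (1u << 8)
#define PSR_D_BIT (1u << 9)

/* Extract the exception class field from an ESR value. */
static unsigned esr_exception_class(uint64_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* Render the mask bits the way the kernel log does: upper case = masked. */
static void daif_string(uint32_t pstate, char out[5])
{
    assert(out != NULL);
    out[0] = (pstate & PSR_D_BIT) ? 'D' : 'd';
    out[1] = (pstate & PSR_A_BIT) ? 'A' : 'a';
    out[2] = (pstate & PSR_I_BIT) ? 'I' : 'i';
    out[3] = (pstate & PSR_F_BIT) ? 'F' : 'f';
    out[4] = '\0';
}
```

The logged code 0xbf000002 decodes to class 0x2f, and pstate 0x600000c5 renders as `daIF`: IRQ and FIQ still masked, SError (A) already unmasked, which is the state after `local_daif_restore(DAIF_PROCCTX_NOIRQ)`.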

#
invalid exception, which 0x13
iframe 0xffff00000008dec0:
x0  0xffff00000003d1c8 x1  0x        ffffffff x2  0x            1234 x3  0xffff000000044000
x4  0x        696f6820 x5  0x        696f6820 x6  0x               0 x7  0x           48841
x8  0xffff00000003d508 x9  0xffff00000002a000 x10 0xffff00000008dff0 x11 0x        ffffffc8
x12 0xffff00000008e030 x13 0xffff00000008e030 x14 0xffff00000004f000 x15 0xffff000000082430
x16 0x               1 x17 0x8030200002211214 x18 0x               0 x19 0xffff00000003d1c8
x20 0xffff000000032fa8 x21 0x               0 x22 0xffff000000032000 x23 0x               0
x24 0xffff000000032fe0 x25 0xffff000000032fb8 x26 0xffff000000032bd8 x27 0x               0
x28 0xffff0000000447a0 x29 0xffff00000008e010 lr  0xffff0000000058c4 usp 0xf3fbed2923810090
elr 0xffff00000000c73c
spsr 0x        60000205
stack trace:
0xffff00000000c73c
0xffff000000005efc
0xffff000000004ebc
0xffff0000000034fc
panic (caller 0xffff0000000033c0): die
HALT: spinning forever... (reason = 9)
#

bingo

kindred raptor
#

How in the world would loading linux trigger that?

dusk fiber
#

the problem is that the bootloader didnt unmask this error

#

so the bootloader triggered the error (via dlist_memory[0] = 0x1234;) and left it pending

#

and once linux got the console up, it unmasked the error, and instantly blew up

kindred raptor
#

So, you set a pointer to the display list to somewhere that doesn't make sense

#

Which caused a pending SError

dusk fiber
#

any write to the displaylist, even a normally valid one, will fault like this

#

the arm just doesnt have permission to use the 2d core

kindred raptor
#

O.o

#

But SError shouldn't be generated by a request failing the MMU

#

AFAIK

dusk fiber
#

its not the arm mmu

kindred raptor
#

Ah

#

So, if secure_mode is on

#

that works on the VC MMU

dusk fiber
#

the peripheral itself is deciding to allow or not

#

so we are down into the axi layer

kindred raptor
#

This has to do with the secure_mode thing?

dusk fiber
#

that was my theory

#

if secure is on, only requests from the VPU can write

#

if secure is off, anybody can write

kindred raptor
#

That would sound right

dusk fiber
#
SCALER_DISPECTRL: 0x13f0000
invalid exception, which 0x13
#

bit 31 is the secure flag

#
SCALER_DISPECTRL: 0x813f0000
invalid exception, which 0x13
#

it faults when both on and off ......

kindred raptor
#

guess you are missing something

dusk fiber
#

yep

kindred raptor
#

Or the ARM peripheral always has non-secure permissions

#

Saying "the ARM peripheral" has very cursed energy hans

dusk fiber
#

the same write works when running under the closed firmware

#

this tells you what each bit in 0x813f0000 means

#

first, we have PANIC_CTRL in bits 0-6, thats 0

#

then we have BUSY_STATUS in bits 8-31????

#

then Y_BUSY in bits 9-31???

#

overlap much??

#
SCALER_DISPECTRL: 0x813f0000
SCALER_DISPECTRL: 0xfdff007f
...
SCALER_DISPECTRL: 0xfdff007f
invalid exception, which 0x13
#

if i write all f's, then some bits dont stick

#

the bits that do stick:
0-6
16-24
26-31

#

yeah, ive got no clue

#

too much overlap in this register, and video still works even with garbage written to it

#

well, one last idea, which also failed

#
*REG32(SCALER_DISPCTRL) &= ~SCALER_DISPCTRL_ENABLE; // disable HVS
...
*REG32(SCALER_DISPCTRL) = SCALER_DISPCTRL_ENABLE; // re-enable HVS
#

before i disable (it may have already been off), it was at 0x813f0000

#

after disabling, writes are silently ignored

#

after re-enabling, it accepts the write and can clear that secure_mode bit

dusk fiber
#

my next potential target, there are flags in here, like FULLPERI and peripherals on

#

if i remove FULLPERI then it faults early in boot, and doing as little as reading the clock will hang the arm

#

but the uart is working

#

if i remove peripherals on, nothing changes

dusk fiber
#

i'm getting a SLVERR when i try to drive the 2d core

#

a SLVERR is represented by the BRESP bits being 0b10

#
#define ARM_C0_BRESP1    0x00000004
#define ARM_C0_BRESP2    0x00000008

static const uint8_t g_BrespTab[] = { 
        0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x18, 0x0,
        0x10, 0x14, 0x10, 0x1C, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
        0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x10, 0x14, 0x18, 0x1C, 0x10, 0x14, 0x10, 0x0,
        0x10, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
        0x10, 0x14, 0x18, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x18, 0x0 
};
#

there is a magic table of ~65 BRESP flags in the code.....

#

the main broadcom mmu, between arm and vpu, has 64 pages, of 16mb each......

#

if i decode that array, i get:

#
0: 0x4   OKAY
1: 0x5 EXOKAY
2: 0x4   OKAY
3: 0x5 EXOKAY
4: 0x4   OKAY
5: 0x5 EXOKAY
6: 0x4   OKAY
7: 0x5 EXOKAY
8: 0x4   OKAY
9: 0x7 DECERR
10: 0x6 SLVERR
11: 0x7 DECERR
12: 0x6 SLVERR

14: 0x4   OKAY
15: 0x5 EXOKAY
16: 0x4   OKAY
17: 0x7 DECERR
18: 0x4   OKAY
19: 0x5 EXOKAY
20: 0x4   OKAY
21: 0x5 EXOKAY
22: 0x4   OKAY
23: 0x5 EXOKAY
24: 0x4   OKAY
25: 0x5 EXOKAY
26: 0x4   OKAY

28: 0x4   OKAY
29: 0x5 EXOKAY
30: 0x4   OKAY
31: 0x7 DECERR
32: 0x6 SLVERR
33: 0x7 DECERR
34: 0x4   OKAY
35: 0x5 EXOKAY
36: 0x6 SLVERR
37: 0x7 DECERR
38: 0x4   OKAY
39: 0x5 EXOKAY
40: 0x4   OKAY

42: 0x4   OKAY
43: 0x5 EXOKAY
44: 0x6 SLVERR
45: 0x5 EXOKAY
46: 0x4   OKAY
47: 0x5 EXOKAY
48: 0x4   OKAY
49: 0x5 EXOKAY
50: 0x4   OKAY
51: 0x5 EXOKAY
52: 0x4   OKAY
53: 0x5 EXOKAY
54: 0x4   OKAY

56: 0x4   OKAY
57: 0x5 EXOKAY
58: 0x6 SLVERR
59: 0x5 EXOKAY
60: 0x6 SLVERR
61: 0x5 EXOKAY
62: 0x4   OKAY
63: 0x5 EXOKAY
64: 0x4   OKAY
65: 0x5 EXOKAY
66: 0x4   OKAY
67: 0x5 EXOKAY
68: 0x6 SLVERR
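
The decode above can be reproduced with a few lines of C. This sketch assumes the two defines mark bits 2 and 3 of each table entry as the AXI response (0b00 OKAY, 0b01 EXOKAY, 0b10 SLVERR, 0b11 DECERR), that the printed code is simply the entry shifted right by 2, and that 0x0 entries are unused padding. What the 0x10 bit means is not known here. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARM_C0_BRESP1 0x00000004
#define ARM_C0_BRESP2 0x00000008

/* The value shown in the decoded listing: the table entry shifted down by 2. */
static unsigned bresp_code(uint8_t entry)
{
    return entry >> 2;
}

/* Name of the AXI response held in bits 2-3 of a table entry. */
static const char *bresp_name(uint8_t entry)
{
    static const char *const names[] = { "OKAY", "EXOKAY", "SLVERR", "DECERR" };

    return names[(entry & (ARM_C0_BRESP1 | ARM_C0_BRESP2)) >> 2];
}

/* Print every non-zero entry; returns how many entries were printed. */
static size_t decode_bresp_table(const uint8_t *tab, size_t count)
{
    size_t printed = 0;

    assert(tab != NULL);
    for (size_t i = 0; i < count; i++) {
        if (tab[i] == 0)
            continue; /* assumed to be an unused slot */
        printf("%zu: 0x%x %6s\n", i, bresp_code(tab[i]), bresp_name(tab[i]));
        printed++;
    }
    return printed;
}
```

Running `decode_bresp_table(g_BrespTab, sizeof g_BrespTab)` over the array from the firmware source should give the same listing, with the gaps at the 0x0 entries.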
dusk fiber
#

i think the write isnt succeeding

#

and the error is telling you so

#

but its an async error, and can happen a dozen opcodes late

kindred raptor
#

But how were you driving the display list?

dusk fiber
#

the VPU can write to the HVS just fine

#

but the ARM cant

kindred raptor
#

Or were you directly executing stuff on the VPU?

#

(that's what you did. ok.)

dusk fiber
#

one of the DSI ports has this same problem, and the rpi engineers never did fix it

#

so they have an RPC to forward every read/write

kindred raptor
#

My guess is that this is the reason why they use RPC for calls to the VPU too

dusk fiber
#

exactly

kindred raptor
#

if the VPU has been designed in such a way so as to refuse writes from the ARM peripheral

#

Then, it is how it is

dusk fiber
#

i assume for the DSI, its just hard-wired like that

#

but clearly, its a config switch on the HVS

kindred raptor
#

There's some sort of peripheral inside the silicon driving the DSI, right?

#

But wait, you said that from the proprietary firmware, writes to the VPU from the ARM peripheral succeed? O.o

dusk fiber
#

yeah

#

if you boot the closed firmware, and then write to the HVS from arm, it just works

#

thats how the kms drivers in linux work

dusk fiber
#

yep

kindred raptor
#

You just went and picked the strangest and most undocumented platform to work with

dusk fiber
#

😄

kindred raptor
#

Bizarro world where phrases such as "The ARM Peripheral" become normal

#

The Twilight Pi

dusk fiber
#

i think the older videocore SoC's, just lacked an arm entirely

#

the VPU ran the entire show

#

and one of the rpi engineers has said, "lets throw an arm core in there, we might need it some day"

#

and from the design, it seems to pre-date or not trust the arm secure vs non-secure stuff

#

so they are using the VPU as the master, and ARM as the untrusted slave

dusk fiber
#

it pre-dates the foundation

kindred raptor
#

Most probably they just got a broadcom part with an ARM core

#

(Or the pi engineers' for that matter)

dusk fiber
#

one of the broadcom engineers picked the bcm2835 to make the rpi, and start the foundation

#

and the foundation was started by a group of broadcom employees

#

if i remove the ARM_C0_FULLPERI flag, then the arm can still access the UART, but it cant access the clock

#

that seems like some major isolation

#

perfect for a hostile application

kindred raptor
#

Yeah, my guess is that it was a preexisting IC. I doubt somebody went and started what sounds like essentially a skunkworks project by designing custom silicon as a first step

#

:P

dusk fiber
#

yep

#

the arm core being added, was a broadcom employee idea, and they later became an rpi engineer

kindred raptor
#

TIL debian 12 was released 2 months ago.

#

I must dist-upgrade

#

</offtopic>

#

Re: Your AXI woes: What if you asked in the raspi forums?

dusk fiber
#

ive asked the rpi engineers repeatedly, as to why i'm getting slave error

#

the answer is always power domains, "the hvs isnt on"

#

ssuuuurrreeee, thats definitely offf

#

and as usual, they only post a single reply to the thread, and then go silent

kindred raptor
#

maybe there's something specific you need to do to fully bring it up?

dusk fiber
#

not sure what else there would be to turn on

kindred raptor
#

AXI interconnect related?

dusk fiber
#

the arm can read any hvs register

#

so its already got a path

kindred raptor
#

hmm... I guess there's some permissions register?

dusk fiber
#

yep

#

so, i have ~3 solutions to this

#

1: keep searching, until i find the magic permission register
2: implement an RPC like DSI has, so the linux driver can relay every write thru the firmware
3: implement my own fkms, and use my custom display stack

#

2 has the best chances of getting hdmi

#

but 2&3 mean you need a custom build of linux to get the new drivers

#

but now that PLLH is up, i could take another stab at hdmi bringup

#

i have an old platform/bcm28xx/vc4-hdmi i started, what happens if i turn that on....

#

fixed the compile errors, and nothin, but thats to be expected

#

it only writes to 3 registers

#

now i get to read this, and try to figure out the hw....

#

so most of the hdmi runs off the HSM clock, which ive set to 100mhz

#

but the PHY directly uses PLLH_PIX

#

found a nice reset function, time to implement!

dusk fiber
#

vc4_hdmi_set_timings() looks interesting

#

ok, now what timings does my display actually need

#

boots stock raspios

#
root@raspberrypi:/sys/kernel/debug/dri/0# cat hdmi_regs 
             HDMI_VERTA0 = 0x00302400
             HDMI_VERTA1 = 0x00302400
             HDMI_VERTB0 = 0x00000026
             HDMI_VERTB1 = 0x00000026
              HDMI_HORZA = 0x00006500
              HDMI_HORZB = 0x0f81c030
#

the raw register values

#
root@raspberrypi:/sys/kernel/debug/dri/0# cat state
crtc[101]: pixelvalve-2
        mode: "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x68 0x5
#

the modeline

#

the timing from EDID

#

VERTA: 0x302400

#

yep, matches

#

VERTB: 0x26
good

#
/* Horizontal pack porch (htotal - hsync_end). */
# define VC4_HDMI_HORZB_HBP_MASK
#

somebody made the same typo, repeatedly, in the linux src

#
HORZA: 0x6500
HORZB: 0xf81c030
#

i now have all of the timing params, and can program the hdmi block...
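
Here is a sketch of how the modeline maps onto those four registers. The field positions are assumptions taken from my reading of Linux's vc4_regs.h (HORZA: vsync polarity bit 14, hsync polarity bit 13, active pixels in bits 12:0; HORZB: back porch 29:20, sync 19:10, front porch 9:0; VERTA: sync 24:20, front porch 19:13, active lines 12:0; VERTB: back porch 8:0). They do reproduce the register values dumped above. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct hdmi_mode {
    unsigned hdisplay, hsync_start, hsync_end, htotal;
    unsigned vdisplay, vsync_start, vsync_end, vtotal;
    int hsync_positive, vsync_positive;
};

static uint32_t hdmi_horza(const struct hdmi_mode *m)
{
    return ((m->vsync_positive ? 1u : 0u) << 14) |
           ((m->hsync_positive ? 1u : 0u) << 13) |
           (m->hdisplay & 0x1fffu);
}

static uint32_t hdmi_horzb(const struct hdmi_mode *m)
{
    uint32_t back, sync, front;

    assert(m->htotal >= m->hsync_end && m->hsync_end >= m->hsync_start &&
           m->hsync_start >= m->hdisplay);
    back  = m->htotal - m->hsync_end;       /* horizontal back porch */
    sync  = m->hsync_end - m->hsync_start;  /* hsync pulse width */
    front = m->hsync_start - m->hdisplay;   /* horizontal front porch */
    return ((back & 0x3ffu) << 20) | ((sync & 0x3ffu) << 10) | (front & 0x3ffu);
}

static uint32_t hdmi_verta(const struct hdmi_mode *m)
{
    uint32_t sync  = m->vsync_end - m->vsync_start;  /* vsync pulse width */
    uint32_t front = m->vsync_start - m->vdisplay;   /* vertical front porch */

    return ((sync & 0x1fu) << 20) | ((front & 0x7fu) << 13) |
           (m->vdisplay & 0x1fffu);
}

static uint32_t hdmi_vertb(const struct hdmi_mode *m)
{
    return (m->vtotal - m->vsync_end) & 0x1ffu;      /* vertical back porch */
}
```

Plugging in the 1280x1024 modeline (1280 1328 1440 1688 / 1024 1025 1028 1066, both syncs positive) gives 0x6500, 0x0f81c030, 0x302400 and 0x26.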

#

registers programmed, but no image

#

vc4_hdmi_encoder_pre_crtc_configure() calls what i just implemented, and the other things it does are likely useful

#
        /*   
         * As stated in RPi's vc4 firmware "HDMI state machine (HSM) clock must
         * be faster than pixel clock, infinitesimally faster, tested in
         * simulation. Otherwise, exact value is unimportant for HDMI
         * operation." This conflicts with bcm2835's vc4 documentation, which
         * states HSM's clock has to be at least 108% of the pixel clock.
         *
         * Real life tests reveal that vc4's firmware statement holds up, and
         * users are able to use pixel clocks closer to HSM's, namely for
         * 1920x1200@60Hz. So it was decided to have leave a 1% margin between
         * both clocks. Which, for RPi0-3 implies a maximum pixel clock of
         * 162MHz.
         *
         * Additionally, the AXI clock needs to be at least 25% of
         * pixel clock, but HSM ends up being the limiting factor.
         */
#

@kindred raptor a juicy bit of comment that nicely explains things!

kindred raptor
dusk fiber
#

trying to

#
> (48+112+248+1280) * (1+3+38+1024)
1,799,408
> (48+112+248+1280) * (1+3+38+1024) * 60.02
108,000,468.16000001
#

each frame has ~1.8 million pixels, and at 60.02fps, thats ~108mhz pixel clock

#

which agrees with EDID

#

so, HSM has to be faster than the 108mhz pixel clock (plus that 1% margin)
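
#

the arithmetic above as a small sketch. the 1% margin comes from the linux comment quoted earlier; the function names and the milli-hertz refresh parameter are invented here to keep everything in integer math.

```c
#include <assert.h>
#include <stdint.h>

/* Total pixels clocked out per frame, blanking included. */
uint64_t pixels_per_frame(uint32_t htotal, uint32_t vtotal) {
  return (uint64_t)htotal * vtotal;
}

/* Pixel clock in Hz for a refresh rate given in millihertz (60.02 Hz = 60020). */
uint64_t pixel_clock_hz(uint32_t htotal, uint32_t vtotal, uint32_t refresh_mhz) {
  return pixels_per_frame(htotal, vtotal) * refresh_mhz / 1000;
}

/* Smallest HSM rate that leaves a 1% margin over the pixel clock. */
uint64_t hsm_min_rate_hz(uint64_t pixel_hz) {
  assert(pixel_hz > 0);
  return pixel_hz / 100 * 101;
}

/* Non-zero when the HSM clock is fast enough for this pixel clock. */
int hsm_is_fast_enough(uint64_t hsm_hz, uint64_t pixel_hz) {
  return hsm_hz >= hsm_min_rate_hz(pixel_hz);
}
```

for a 108mhz pixel clock this gives a floor of 109.08mhz, so a 100mhz HSM fails the check and 125mhz passes.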

#

clock_set_hsm(MHZ_TO_HZ(100), 5);

#

that is not

#

clock_set_hsm(MHZ_TO_HZ(125), PERI_PLLC_PER);

ref: 500000000, target: 125000000, divisor(f): 4.000000, divisor(fixed): 0x4000
] measure_clocks
clock #22(hsm) is 125000000
#

perfect!
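
#

the `divisor(fixed): 0x4000` in that log is 4.0 with 12 fractional bits. a minimal sketch of that conversion, assuming the 12-bit fraction holds in general (it is only inferred from this one log line) and using a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Convert ref/target into a fixed-point divisor with 12 fractional bits.
 * The 12-bit fraction is an assumption taken from the log line above. */
uint32_t clock_divisor_fixed(uint64_t ref_hz, uint64_t target_hz) {
  assert(target_hz > 0 && target_hz <= ref_hz);
  return (uint32_t)((ref_hz << 12) / target_hz);
}
```

500mhz down to 125mhz comes out as 0x4000, matching what the firmware printed.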

#

and the axi clock (core? not sure) needs to be at least 27mhz, easy

#

oh, and interesting....

#

if the axi clock has to be 25% of the pixel clock
and every pixel is 4 bytes
then that would imply its moving 128 bits over axi?

#

but that also implies the axi is being loaded constantly, by every pixel displayed

#

maybe they mean the HVS load....

#

clk_set_rate(vc4_hdmi->pixel_clock, pixel_rate);

#

ok which clock is this.....

#

vc4_hdmi->pixel_clock = devm_clk_get(dev, "pixel");

#

the "pixel" clock in device-tree

#
                        clocks = <&firmware_clocks 9>,
                                 <&firmware_clocks 13>;
                        clock-names = "pixel", "hdmi";
#

which is firmware clock 9

#
                        firmware_clocks: clocks {
                                compatible = "raspberrypi,firmware-clocks";
                                #clock-cells = <1>;
                        };
#

RPI_FIRMWARE_PIXEL_CLK_ID

#

i'm just going to blindly assume, its PLLH_PIX

#

so, if i want that at 108mhz, and the PLL bottoms out at 600

#
> 108*6
648
#

then i need 648mhz and /6
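
#

the same choice as a sketch: pick the smallest integer divider that keeps the PLL at or above its floor. the 600mhz floor is the figure from this chat; any upper limit on the PLL or restrictions on which dividers the hw accepts are ignored here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest integer divider so that pixel_hz * divider >= vco_min_hz. */
uint32_t pll_min_divider(uint64_t pixel_hz, uint64_t vco_min_hz) {
  assert(pixel_hz > 0);
  return (uint32_t)((vco_min_hz + pixel_hz - 1) / pixel_hz); /* ceiling division */
}
```

for 108mhz and a 600mhz floor that returns 6, so the PLL runs at 648mhz.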

#

that seems to have worked

kindred raptor
dusk fiber
#

ive not encountered ACLK anywhere in the rpi source

#

so i'm not sure which divider its running on

#
        if (pixel_rate > 297000000)
                bvb_rate = 300000000;
        else if (pixel_rate > 148500000)
                bvb_rate = 150000000;
        else
                bvb_rate = 75000000;

        ret = clk_set_min_rate(vc4_hdmi->pixel_bvb_clock, bvb_rate);
kindred raptor
#

Ah, it's just a generic signal name. I doubt you'll find it in the source :P

dusk fiber
#

curious, what is the bvb clock....

kindred raptor
#

It's the term the protocol spec uses (ACLK)

dusk fiber
#

any idea about the BRESP stuff?

kindred raptor
dusk fiber
#

but what about the bresp table?

kindred raptor
#

I guess there's a register with all bresp somewhere?

dusk fiber
#

it writes all of those BRESP values into a single register, with a short delay after each write

kindred raptor
#

Where's this code even from?

dusk fiber
#

lines 33-56

#
#define ARM_C0_BRESP1    0x00000004
#define ARM_C0_BRESP2    0x00000008
#define ARM_C0_BOOTHI    0x00000010
#

you can use these defines, to decode that table

kindred raptor
dusk fiber
#

that runs on the VPU side, before the arm is enabled

kindred raptor
#

Ah, so ARM control0 is not the ARM control register?

dusk fiber
#

its a control register for the "arm peripheral"

#

involved in configuring and turning it on

#

#define ARM_C0_AARCH64 0x00000200
if you set this bit in control0, the arm starts in aarch64 mode

#

if that bit is clear, it starts in armv7 mode

kindred raptor
#

I wonder how they derived that table....

dusk fiber
#

thats a neat compatibility thing, bcm2836 era firmware (pi2) could run on a bcm2837 (pi3), and not even be aware of the aarch64 core

#

and it will just run in armv7 mode

kindred raptor
#

Well, it looks like it's... writing stuff to said control register, and updating it based on how the arm core previously responded

dusk fiber
#

i dont think its even caring how the arm responds

#

its just blasting a bit array of 3bit ints at the arm

kindred raptor
#

Then what are ARM_C0_BRESP1?

#

If not the BRESP line from ARM?

dusk fiber
#

my best guess, its an array of pre-defined BRESP answers, for various ranges of memory

#

so if you step out of line, you get an error from this table

#

my theory, is that the arm peripheral will take this list, treating ARM_CONTROL0 like a FIFO, and store it on some internal ram

#

and for every access to ram, it looks up the right slot in this array, and returns the BRESP value in that slot

#

and that then lets the VPU firmware configure what you can and cant do from the arm

#

for example, one index in that array, could be the BRESP for any write to the HVS

kindred raptor
#

ARM_CONTROL0 is defined as a register in the headers

dusk fiber
#

yep

kindred raptor
#

And other things with the prefix ARM_C0 map to things that should be memory-mapped registers

dusk fiber
#
#define ARM_CONTROL0  HW_REGISTER_RW(ARM_BASE+0x000)
#define ARM_C0_SIZ128M   0x00000000
#define ARM_C0_SIZ256M   0x00000001
#define ARM_C0_SIZ512M   0x00000002
#define ARM_C0_SIZ1G     0x00000003
#define ARM_C0_BRESP0    0x00000000
#define ARM_C0_BRESP1    0x00000004
#define ARM_C0_BRESP2    0x00000008
#define ARM_C0_BOOTHI    0x00000010
#

HW_REGISTER_RW will cast the int into a pointer, and then de-reference the pointer

#

so you can just do ARM_CONTROL0 = 0x123 to write to MMIO
and foo = ARM_CONTROL0 to read from MMIO

#

ARM_C0_* are then constants for various flags within that register

#
#define ARM_C0_JTAGMASK  0x00000E00
#define ARM_C0_JTAGOFF   0x00000000
#define ARM_C0_JTAGBASH  0x00000800 // Debug on GPIO off
#define ARM_C0_JTAGGPIO  0x00000C00 // Debug on GPIO on
kindred raptor
#

Ah, nevermind

#

yeah

#

ok

dusk fiber
#

JTAGMASK is all bits in the jtag enum
JTAGOFF disables jtag
JTAGBASH allows bit-banging jtag from the vpu
JTAGGPIO allows jtag on the gpio header

kindred raptor
#

Sorry, I misread the C code

#

And imagined a macro somewhere that did a read from said addresses

dusk fiber
#
// ARM JTAG BASH
//
#define AJB_BASE 0x7e2000c0

#define AJBCONF HW_REGISTER_RW(AJB_BASE+0x00)
#define   AJB_BITS0    0x000000
#define   AJB_BITS4    0x000004
#define   AJB_BITS8    0x000008
...
#define   AJB_ENABLE   0x000800
#define   AJB_HOLD0    0x000000
#define   AJB_HOLD1    0x001000
#define   AJB_HOLD2    0x002000
#define   AJB_HOLD3    0x003000
#define   AJB_RESETN   0x004000
#define   AJB_CLKSHFT  16
#define   AJB_BUSY     0x80000000
#define AJBTMS HW_REGISTER_RW(AJB_BASE+0x04)
#define AJBTDI HW_REGISTER_RW(AJB_BASE+0x08)
#define AJBTDO HW_REGISTER_RW(AJB_BASE+0x0c)
#

this appears to be a hw accelerated jtag bit-banging peripheral

#

so you can wiggle the jtag lines of the arm core directly from the VPU, without involving external pins

kindred raptor
dusk fiber
#

#define HW_REGISTER_RW(addr) (*(volatile unsigned int *)(addr))

#

the missing piece of the magic
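
#

a small sketch of how that macro gets used. to keep it runnable on a PC, the register is backed by an ordinary array instead of real MMIO; `fake_arm_block`, `FAKE_ARM_CONTROL0` and `select_jtag_mode()` are invented for this example, only the macro and the `ARM_C0_JTAG*` values come from the headers above.

```c
#include <assert.h>
#include <stdint.h>

#define HW_REGISTER_RW(addr) (*(volatile unsigned int *)(addr))

/* Hypothetical stand-in for the ARM control block, so no real MMIO is touched. */
static unsigned int fake_arm_block[4];
#define FAKE_ARM_BASE     ((uintptr_t)fake_arm_block)
#define FAKE_ARM_CONTROL0 HW_REGISTER_RW(FAKE_ARM_BASE + 0x000)

#define ARM_C0_JTAGMASK 0x00000E00u
#define ARM_C0_JTAGOFF  0x00000000u
#define ARM_C0_JTAGGPIO 0x00000C00u

/* Read-modify-write: replace only the jtag field, keep every other flag. */
void select_jtag_mode(unsigned int mode) {
  assert((mode & ~ARM_C0_JTAGMASK) == 0);
  unsigned int value = FAKE_ARM_CONTROL0;          /* plain read from the "register" */
  value = (value & ~ARM_C0_JTAGMASK) | mode;
  FAKE_ARM_CONTROL0 = value;                       /* plain assignment is the write */
}
```

because the macro expands to a dereferenced volatile pointer, the register reads and writes look like normal variable accesses, and the compiler is not allowed to cache or drop them.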

#

VC4_HD_VID_CTL_ENABLE

#

that looks like the master hdmi enable flag

#

hidden under vc4_hdmi_encoder_post_crtc_enable()

kindred raptor
#

So, ARM_CONTROL0's 2 bits left to the MSB are the BRESP?

#

According to those flags?

dusk fiber
#

4(0b0100) is BRESP1
8(0b1000) is BRESP2

#

a C from the bresp table, is just 4|8, both

#

so if i assume BRESP1 is BRESP[0] and so on...

#

then 0=OKAY
4=EXOKAY
8=SLVERR
C=DECERR
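
#

that guess written out as a decoder. the OKAY/EXOKAY/SLVERR/DECERR numbering is the standard AXI response encoding; treating bits 2 and 3 of ARM_CONTROL0 as BRESP[0] and BRESP[1] is only the assumption made above, and the enum and function names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_C0_BRESP1 0x00000004u
#define ARM_C0_BRESP2 0x00000008u

/* Standard AXI write-response codes. */
enum axi_resp { AXI_OKAY = 0, AXI_EXOKAY = 1, AXI_SLVERR = 2, AXI_DECERR = 3 };

/* Pull the assumed BRESP field out of a value written to ARM_CONTROL0. */
enum axi_resp bresp_from_control0(uint32_t control0) {
  uint32_t code = (control0 & (ARM_C0_BRESP1 | ARM_C0_BRESP2)) >> 2;
  return (enum axi_resp)code;
}

/* Human-readable name, handy when dumping the whole table. */
const char *axi_resp_name(enum axi_resp resp) {
  static const char *const names[4] = {"OKAY", "EXOKAY", "SLVERR", "DECERR"};
  assert(resp >= AXI_OKAY && resp <= AXI_DECERR);
  return names[resp];
}
```

other flags such as BOOTHI (0x10) are masked off first, so a table entry of 0x1C still decodes as DECERR.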

kindred raptor
dusk fiber
#

could be possible

kindred raptor
#

Say that your previous write to the register was 0x00000010

#

If you had a SLVERR, EXOKAY or DECERR

dusk fiber
#

i did notice, messing with that table has no effect on the pi3

kindred raptor
#

You'd now have 0x00000004, 0x00000008, 0x0000000c

dusk fiber
#

including just not writing the table entirely

kindred raptor
#

So, by or'ing the previous value to the register, it's ensuring those bits get... cleared?

#

This seems completely pointless, since those should be just input lines

#

The VideoCore is the controller

#

Unless what you want to do is write those exact values, for some reason, and just want to clear the register

#

Some magic sequence they came up with?

dusk fiber
#

ive got 2 theories

#

1: its programming a set of constant replies, each for a different chunk of ram

#

2: its forcing the BRESP bits temporarily, to flush any pending transactions

kindred raptor
#

forcing the BRESP
Those are probably input lines from the ARM

#

Unless the videocore is an AXI slave to the ARM

dusk fiber
#

i think its both

#

there is an axi slave port on the arm, for these control registers

#

and there is an axi master port on the arm, for it to do normal things to ram/mmio

#

the arm axi master, goes thru a custom MMU, that mangles some bits of the addr first, and then goes into the main interconnect

#

and the arm axi slave, is always on the interconnect

#

that custom mmu, is configured via the arm axi slave, along with other things

kindred raptor
#

Then this code is not very good, it expects the transactions to come after the delay they have set

#

Does not check them at all

#

Forces the bresp bits

#

Clears the past bresp bits

#

And forces them again

#

Without checking at all what comes in

#

I have no idea with what kind of blind luck they came up with the values

dusk fiber
#

my 2nd theory, is that the arm has outstanding axi transactions, possibly from before the axi master was enabled

#

and it will jam up if it doesnt get a reply

#

so its forcing some fake replies, to flush the pending ones

#

but other axi masters have a proper flush flag

kindred raptor
#

Then why respond with OK, EXOK, OK, EXOK, ...., DECERR

#

pattern?

dusk fiber
#

no clue

kindred raptor
#

And why does 0x10 need to remain high?

#

throughout all of this?

dusk fiber
#

the 0x10 drops low after every batch of 13 writes

#

scroll over to the right more

kindred raptor
#

Indeed

#

But what does 0x10 control

dusk fiber
#

something called BOOTHI

kindred raptor
#

BOOT High?!

#

IDK

#

All I know is that this code is held together by blind luck and timing

dusk fiber
#

same

#

and you should see how pullups are configured 😛

kindred raptor
#

I wonder what'd happen if you messed with the PLL before doing that song and dance

dusk fiber
#

it reeks of "oh, we forgot about clock domains"

kindred raptor
#

Does that code affect other pis other than pi3?

dusk fiber
#

ive not tried messing with BRESP on the other models

#
void gpio_apply_batch(struct gpio_pull_batch *batch) {
  for (enum pull_mode mode = 0; mode <=2; mode++) {
    if (batch->bank[mode][0] | batch->bank[mode][1]) {
      *REG32(GPIO_GPPUD) = mode;
      udelay(500);
      *REG32(GPIO_GPPUDCLK0) = batch->bank[mode][0];
      *REG32(GPIO_GPPUDCLK1) = batch->bank[mode][1];
      udelay(500);
      *REG32(GPIO_GPPUDCLK0) = 0;
      *REG32(GPIO_GPPUDCLK1) = 0;
      *REG32(GPIO_GPPUD) = 0;
      udelay(500);
    }
  }
}
kindred raptor
#

o.O

dusk fiber
#
offset  name
0x94    GPPUD       GPIO pin pull up/down enable
0x98    GPPUDCLK0   GPIO pin pull up/down enable clock 0
0x9c    GPPUDCLK1   GPIO pin pull up/down enable clock 1
kindred raptor
#

is it... pulsing the clock?

dusk fiber
#

GPPUD, master pull-up/pull-down enable, see further notes
GPPUDCLKn, gpio pullup/down clock enable

to change the pullup config:

  • write the desired mode to GPPUD (off=0, down=1, up=2)
  • delay for 150 clock cycles
  • write a 1 to the bits of GPPUDCLKn that correspond to GPIO pins you want to modify the state of
  • delay for another 150 clock cycles
  • write a zero to GPPUD
  • write 0 to GPPUDCLKn
#

yes

kindred raptor
#

Who wrote that piece of doc?

dusk fiber
#

i did

kindred raptor
#

How did you even figure that out :P

dusk fiber
#

other docs

#

and source

kindred raptor
#

Ah, there's official doc?

dusk fiber
#

each bit in GPPUDCLK0 and GPPUDCLK1 seems to directly wire to a 2bit latch, that isnt in any clock domain

#

and GPPUD goes to the input of all 64 latches

#

to set the pulls, you use GPPUD to present the desired mode (off, up, down), to every latch

#

then you use GPPUDCLK0 and GPPUDCLK1 to manually strobe the write enable on some of the latches
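
#

a toy software model of that theory, just to make the data flow concrete. it is not a description of the real silicon: the latch array, the "capture on writing a 1" behaviour and all names are assumptions, and the delays between steps are left out.

```c
#include <assert.h>
#include <stdint.h>

enum pull_mode { PULL_OFF = 0, PULL_DOWN = 1, PULL_UP = 2 };

struct gpio_pull_model {
  uint32_t gppud;    /* value currently presented to every latch */
  uint8_t latch[64]; /* per-pin 2-bit latch (write-only on the real 283x) */
};

/* Writing GPPUDCLKn: each set bit strobes that pin's latch, capturing GPPUD. */
void model_write_gppudclk(struct gpio_pull_model *m, int bank, uint32_t mask) {
  assert(bank == 0 || bank == 1);
  for (int bit = 0; bit < 32; bit++) {
    if (mask & (1u << bit))
      m->latch[bank * 32 + bit] = (uint8_t)(m->gppud & 3);
  }
}

/* The documented sequence for a single pin, minus the delays. */
void model_set_pull(struct gpio_pull_model *m, int gpio, enum pull_mode mode) {
  assert(gpio >= 0 && gpio < 64);
  m->gppud = mode;                                       /* present the mode */
  model_write_gppudclk(m, gpio / 32, 1u << (gpio % 32)); /* strobe one latch */
  model_write_gppudclk(m, gpio / 32, 0);                 /* release the strobe */
  m->gppud = 0;                                          /* remove the mode */
}
```

pins whose GPPUDCLK bit stays 0 keep whatever they latched earlier, which is why the batch function above can group pins by mode and strobe each group once.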

dusk fiber
#

those up/down regs, seem to be raw flip-flops, without any clock domain

#

each bit of GPPUDCLK0 is driving the clock of each flip-flop in the gpio 0-31 range

kindred raptor
#

...ok.

#

I wonder why they didn't just give them the same clock as the CPU and use a clock enable pin

dusk fiber
kindred raptor
#

Like Normal People would :P

dusk fiber
#

official doc, from the datasheet

#

and from those bit values, i feel like bit0 is directly linked to the pull-down flipflop
and bit1, the pullup flipflop

#

and 11 would set both pulls and start a fight!

dusk fiber
# kindred raptor Like Normal People would :P

they did fix things in the pi4:

for the bcm2711/rpi4:
pullup control signal is 2 bits wide (mask of 0x3)
the register offset (which register) is gpio/16
the bit-shift within that register is (gpio % 16) * 2

looks to be AVR style, just a giant block of bits, just like the function select ones
2 bits per pin, off=0, up=1, down=2, 16 pins per 32bit reg, 4 registers in total?
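
#

the index math from those notes as a sketch. the registers are passed in as a plain array so it runs anywhere; on real hardware they would be the four pull-control registers, whose addresses are deliberately not filled in here. function and enum names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* bcm2711 encoding from the notes above: off=0, up=1, down=2. */
enum pull2711 { PULL2711_OFF = 0, PULL2711_UP = 1, PULL2711_DOWN = 2 };

/* Set the 2-bit pull field for one gpio inside a block of 4 registers. */
void bcm2711_set_pull(uint32_t regs[4], unsigned gpio, enum pull2711 mode) {
  assert(gpio < 64);
  unsigned reg = gpio / 16;          /* which register */
  unsigned shift = (gpio % 16) * 2;  /* bit position inside that register */
  regs[reg] = (regs[reg] & ~(0x3u << shift)) | ((uint32_t)mode << shift);
}

/* Read the field back, which only works because the 2711 regs are r/w. */
enum pull2711 bcm2711_get_pull(const uint32_t regs[4], unsigned gpio) {
  assert(gpio < 64);
  return (enum pull2711)((regs[gpio / 16] >> ((gpio % 16) * 2)) & 0x3u);
}
```

gpio 17 lands in register 1 at bit 2, so setting it to pull-up writes 0x4 into that register.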

#

the 2711 datasheet

#

the other major difference, is that 283x pulls, are write-only

#

but the bcm2711 pulls, are r/w

#
] gpio_dump_state
GPIO00   IN  HIGH | HIGH    IN GPIO32
GPIO01   IN  HIGH | HIGH    IN GPIO33
GPIO02   IN  HIGH |  LOW    IN GPIO34
GPIO03   IN  HIGH | HIGH    IN GPIO35
GPIO04   IN  HIGH | HIGH    IN GPIO36
GPIO05   IN  HIGH | HIGH    IN GPIO37
GPIO06   IN  HIGH | HIGH    IN GPIO38
GPIO07   IN  HIGH | HIGH    IN GPIO39
GPIO08   IN  HIGH |  LOW    IN GPIO40
GPIO09   IN   LOW |  LOW    IN GPIO41
GPIO10   IN   LOW |  LOW  ALT0 GPIO42
GPIO11   IN   LOW |  LOW    IN GPIO43
GPIO12   IN   LOW | HIGH    IN GPIO44
GPIO13   IN   LOW | HIGH    IN GPIO45
GPIO14 ALT0  HIGH | HIGH    IN GPIO46
GPIO15 ALT0  HIGH | HIGH    IN GPIO47
GPIO16   IN   LOW |  LOW    IN GPIO48
GPIO17   IN   LOW |  LOW    IN GPIO49
GPIO18   IN   LOW |  LOW    IN GPIO50
GPIO19   IN   LOW |  LOW    IN GPIO51
GPIO20   IN   LOW |  LOW    IN GPIO52
GPIO21   IN   LOW |  LOW    IN GPIO53
GPIO22   IN   LOW |  LOW       GPIO54
GPIO23   IN   LOW |  LOW       GPIO55
GPIO24   IN   LOW |  LOW       GPIO56
GPIO25   IN   LOW |  LOW       GPIO57
GPIO26   IN   LOW |  LOW       GPIO58
GPIO27   IN   LOW |  LOW       GPIO59
GPIO28   IN   LOW |  LOW       GPIO60
GPIO29  OUT  HIGH |  LOW       GPIO61
GPIO30   IN   LOW |  LOW       GPIO62
GPIO31   IN   LOW |  LOW       GPIO63
#

i also have this debug cmd, that can print every pin state

kindred raptor
dusk fiber
#

on the bcm2711, it also has an arrow, for current pull direction

#

the official datasheet for the 2711

kindred raptor
dusk fiber
#

the 2835 datasheet

kindred raptor
#

ah ok

#

found it

dusk fiber
#

both datasheets are guilty of lying by omission

#

2835 claims gpio 0-27 alt2, is a reserved mode

#

the datasheet entirely omits gpio 0-15 alt3/alt4
the wiki says alt3 is AVEOUT, and alt4 is AVEIN

#

which appears to be a 12bit parallel video in/out port

#

the datasheet claims gpio 46-53 alt0 are "internal"
the wiki reveals that 46/47 alt0 are i2c
and 48-53 alt0 are an SD interface

#

all very useful things to know

#

hmmm, i wrote to all of the hdmi control regs, i think its enabled, but my monitor is having zero reaction to it

#
        vc4_hdmi_recenter_fifo(vc4_hdmi);
        vc4_hdmi_enable_scrambling(encoder);
#

unless, its one of these?

#

recenter never finishes

#

feels like a clock is missing

#

HDMI_FIFO_CTL: 0x64627573

#

bit 5, fifo reset, is still set...

#
> Buffer("64627573","hex").toString("ascii")
'dbus'
#

wait, thats not right

#
] hexdump 0x7e808000 128
0x7e808000 69 6d 64 68 69 6d 64 68  69 6d 64 68 f0 00 00 00  |imdhimdhimdh....|
0x7e808010 69 6d 64 68 20 04 00 00  01 01 01 01 00 00 00 00  |imdh ...........|
0x7e808020 69 6d 64 68 00 00 00 00  69 6d 64 68 00 00 00 00  |imdh....imdh....|
0x7e808030 69 6d 64 68 69 6d 64 68  00 00 00 c0 69 6d 64 68  |imdhimdh....imdh|
0x7e808040 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x7e808050 00 00 00 00 00 00 00 00  00 00 00 00 69 6d 64 68  |............imdh|
0x7e808060 69 6d 64 68 69 6d 64 68  a5 01 00 00 69 6d 64 68  |imdhimdh....imdh|
0x7e808070 69 6d 64 68 69 6d 64 68  69 6d 64 68 69 6d 64 68  |imdhimdhimdhimdh|
#
] hexdump 0x7e902000 128
0x7e902000 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902010 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902020 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902030 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902040 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902050 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902060 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
0x7e902070 73 75 62 64 73 75 62 64  73 75 62 64 73 75 62 64  |subdsubdsubdsubd|
#

@kindred raptor aha, this is what it looks like, when an axi slave is disabled!

#

writes silently ignored, reads return a 32bit constant

#

and because of LE vs BE

the first block is hdmi
the second block is dbus
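
#

a tiny sketch for spotting those filler words. it rebuilds the 32bit value from the little-endian bytes in the hexdump and prints it most-significant byte first, which is what turns `subd` back into `dbus`. both function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a 32-bit register value from 4 little-endian bytes of a hexdump. */
uint32_t word_from_le_bytes(const uint8_t bytes[4]) {
  return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
         ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Render a register value as a 4-character tag, most significant byte first. */
void axi_tag_from_word(uint32_t word, char out[5]) {
  assert(out != 0);
  out[0] = (char)((word >> 24) & 0xff);
  out[1] = (char)((word >> 16) & 0xff);
  out[2] = (char)((word >> 8) & 0xff);
  out[3] = (char)(word & 0xff);
  out[4] = '\0';
}
```

run on the HDMI_FIFO_CTL value 0x64627573 it gives `dbus`, so the "bit 5 still set" reading was just part of the filler pattern, not a real flag.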

kindred raptor
dusk fiber
#

probably

#

so the problem then, is that hdmi is still disabled at some level

#
] hexdump 0x7e910000 128
0x7e910000 80 00 00 08 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x7e910010 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x7e910020 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x7e910030 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x7e910040 00 00 00 00 00 00 00 00  30 65 76 61 30 65 76 61  |........0eva0eva|
0x7e910050 30 65 76 61 30 65 76 61  30 65 76 61 30 65 76 61  |0eva0eva0eva0eva|
0x7e910060 30 65 76 61 30 65 76 61  30 65 76 61 30 65 76 61  |0eva0eva0eva0eva|
0x7e910070 30 65 76 61 30 65 76 61  30 65 76 61 30 65 76 61  |0eva0eva0eva0eva|
#

it will also return that 32bit constant, for any undefined register

#

in here, i can see an 0x08000080 at the first slot, a bunch of nulls, and then ave0 repeating

#

other headers refer to this as AVE_IN_BASE, which implies its part of the 12bit parallel video capture interface

dusk fiber
#

back

dusk fiber
#

had an idea, on another thing to probe

#

do the same hexdump, under linux and the official firmware

#

and see how the hdmi block differs

#
root@raspberrypi:~# /home/clever/rpi-tools/utils/ramdumper -m -a 0x3f808000 -l 128
starting at 0x3f808000 (1016MB)
0x3f808000 69 6d 64 68 69 6d 64 68  69 6d 64 68 01 02 00 00  |imdhimdhimdh....|
0x3f808010 69 6d 64 68 20 04 00 00  01 01 01 01 00 00 00 00  |imdh ...........|
0x3f808020 69 6d 64 68 00 00 00 00  69 6d 64 68 00 00 00 00  |imdh....imdh....|
0x3f808030 69 6d 64 68 69 6d 64 68  00 00 00 c0 69 6d 64 68  |imdhimdh....imdh|
0x3f808040 20 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ...............|
0x3f808050 00 00 00 00 00 00 00 00  00 00 00 00 69 6d 64 68  |............imdh|
0x3f808060 69 6d 64 68 69 6d 64 68  04 54 00 00 69 6d 64 68  |imdhimdh.T..imdh|
0x3f808070 69 6d 64 68 69 6d 64 68  69 6d 64 68 69 6d 64 68  |imdhimdhimdhimdh|
#

yep, i can read that first block as before

#
root@raspberrypi:~# /home/clever/rpi-tools/utils/ramdumper -m -a 0x3f902000 -l 1024
starting at 0x3f902000 (1017MB)
0x3f902000 00 06 00 00 00 00 00 00  06 00 00 00 00 00 00 00  |................|
0x3f902010 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902020 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902030 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902040 00 00 00 00 00 00 00 00  00 00 00 80 00 34 d0 9c  |.............4..|
0x3f902050 00 10 00 00 80 00 13 00  00 00 00 00 41 40 00 00  |............A@..|
0x3f902060 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902070 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902080 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0x3f902090 88 c6 fa 00 03 00 00 00  00 00 00 00 03 04 00 21  |...............!|
0x3f9020a0 00 00 00 00 00 00 00 00  00 00 00 08 00 00 00 00  |................|
0x3f9020b0 00 00 00 00 f8 24 01 01  f8 24 01 01 83 00 00 00  |.....$...$......|
0x3f9020c0 28 b0 0c 00 00 65 00 00  30 c0 81 0f 00 24 30 00  |(....e..0....$0.|
0x3f9020d0 26 00 00 00 00 24 30 00  26 00 00 00 00 00 00 00  |&....$0.&.......|
0x3f9020e0 00 00 00 00 00 00 00 00  fc ff 00 00 d5 63 8d 50  |.............c.P|
0x3f9020f0 5c 6f 82 96 be c3 d4 ea  f5 ff 4c 00 00 00 00 00  |\o........L.....|
#

but the hdmi block, returns proper data, not dbus

dusk fiber
#

@kindred raptor the VPU is a dual-core processor, with L1 and L2 caches
it has 32 scalar registers, of 32bits each
and 8x8x8bit of vector registers

#

much like arm, the top few scalar registers have a special purpose
r25 is sp
r26 is lr (link register)
r30 is the status register
r31 is the pc

kindred raptor
#

(Actually, even the nintendo 64 MIPS CPU thing had a cache. I do not know why I expected the videocore not to have one. lol.)

dusk fiber
#

out of reset, it will be executing a maskrom

#

that rom will zero out the entire L2 cache using vector writes

kindred raptor
#

efficient bringup ROM :P

dusk fiber
#

i think its due to the cache-as-ram hack

#

if you write to an entire cacheline at once, the write just goes into the cache, and never bothers dram

#

if you get a cache hit,(read or write) it just works, and never bothers dram

#

if you cache miss, it goes to dram, and 💥 wheres the ram?? 😄

#

to prevent 💥 , you need to pre-fill the entire L2 cache with nulls

#

so when a read happens anywhere, its a cache hit

#

that rom will then try to load bootcode.bin into the L2 cache, from one of ~8 sources

#

1/2/4 are all bootcode.bin on fat on SD
but they differ in which sd controller is used, and if its 4bit or 8bit mode

#

3 is raw nand flash

#

5 is SPI flash

#

6 on the pi0-pi2, is usb-device
6 on the pi3/pi02, can be device, host, or both

#

7 is i2c-slave

dusk fiber
#

@kindred raptor the bcm2835 is weird, in that the arm lacks a dedicated L2 cache!

so the VPU L2 cache is sort of loaned to the arm

dusk fiber
#

somewhat, you still had to manage the arm L1

#

and there was more latency, vs a proper arm L2

#

linux has also never been aware of that VPU L2 cache

#

its just transparently treated as normal ram

dusk fiber
#

@leaden junco let me re-read the bcm2711 datasheet on interrupts, and see what it says, and if i missed something....

kindred raptor
#

(👋)

dusk fiber
#

so, top-left, we have the arm generic timer, and its 5 timer irq's, repeated for each core

kindred raptor
#

Is this the peripherals datasheet?

dusk fiber
#

yep

#

then we have the ARM_LOCAL block, which i think is a broadcom custom peripheral, that is only visible to the arm, and it has 19 irq's

#

then we have the ARMC block, which has the VPU<->ARM mailboxes, and some of the legacy irq stuff from the bcm2835

#

then you have 62 peripheral interrupts from the VPU
plus 2 irq's from the ethernet/pcie

#

and then 57??? irq's from the ethernet/pcie, lol

#

all of that funnels into the gic, which masks/routes things, and generates an IRQ+FIQ pair for each core

#

things like irq 57 in this chart, are what ruin micro-kernels on the bcm2711

#

you cant route ttyAMA0 to one process, and ttyAMA1 to another process using standard GIC code

#

you have to taint your micro-kernel and make it a little less micro, by adding support for the non-standard irq handling

#

it kind of defeats the point of using a GIC, and shows that broadcom didnt fully trust the GIC when designing the chip

#

ah, and this is the "57" interrupts from ethernet/pcie

#

4 to emulate legacy pci
1 for pcie msi
2 for ethernet
1 for the internal xhci i think?
no clue what avs is

#

that secure ethernet irq is fishy

#

it smells of IPMI

#

The secure IRQ output (which is only useful for the VPU and not the CPU) from the ETH_PCIe block is routed to VC
peripheral IRQ 63

#

like the VPU can configure genet to send certain packets to the VPU, and the arm/vpu can share the genet

#

it then sort of repeats the info, but this time using GIC lingo

#

the legacy irq/fiq and all of the per-core timers, get routed to PPI's (private peripheral interrupts, one set per core)

#

the PMU (profiling) interrupts are not routed to PPI's, so the GIC can decide which core to interrupt when core0 has had too many L1 misses

#

which can limit the taint on the profiling data

#

and then all of the other irq sources are routed to normal SPI's

#

so this table is required, if you want to configure the gic correctly (if you were writing/porting a kernel)

#

@leaden junco i think i see where my misunderstanding started

the legacy irq controller, with its limited irq routing, is itself an irq source for the gic!

but the gic also gets all of the interrupts directly, and can then route those better

leaden junco
#

Great! So it is actually re-routing the interrupts and not just showing as such

#

Follow-up question: how many PCIe interrupts does the SoC have? The MSI interrupt, I think

dusk fiber
#

the 4 legacy inta/intb/intc/intd from pci
plus a single msi interrupt

to figure out what the msi meant, you would then have to ask the pcie controller

leaden junco
#

I'm wondering if spreading the PCIe interrupts can have some performance increase for NVMe array

dusk fiber
#

the gic treats MSI as a single interrupt source

leaden junco
#

Ah ok

dusk fiber
#

so all MSI's have to go to a single cpu core

leaden junco
#

Because if I attach the NVMe drive I can see it is creating four queue interrupts

dusk fiber
#

in theory, broadcom could have exposed the gic to pci-e dma, and then gic MSI's could have been used

#

i dont know if that works

#

oh yeah, thats a thing nvme does, each cpu core gets its own command/reply queue

#

so different cores can issue commands, without a global mutex

#

and thanks to my new 32core desktop, /proc/interrupts is now unreadable, lol

leaden junco
#

But the catch here is that on RPI they are all using CPU0

#

which makes it pointless to create multiple queues

dusk fiber
#

on my 8 core laptop, i can see that they are allocated a bit strangely

#
 125:          0          0    3067684          0          0          0          0          0   PCI-MSI 2097152-edge      nvme0q0
 126:          0          0          0          0  128860067          0          0          0   PCI-MSI 2097153-edge      nvme0q1
 127:          0   63475567          0          0          0          0          0          0   PCI-MSI 2097154-edge      nvme0q2
 128:          0          0   63272082          0          0          0          0          0   PCI-MSI 2097155-edge      nvme0q3
 129:          0          0          0   63109892          0          0          0          0   PCI-MSI 2097156-edge      nvme0q4
 130:          0          0          0          0          0   63635747          0          0   PCI-MSI 2097157-edge      nvme0q5
 131:          0          0          0          0          0          0   63428409          0   PCI-MSI 2097158-edge      nvme0q6
 132:          0          0          0          0          0          0          0   60445457   PCI-MSI 2097159-edge      nvme0q7
leaden junco
#

I was messing around with the kernel code to see if there is any performance difference

#

I guess the only issue I see is nvme0q0 is on the same CPU as nvme0q3

dusk fiber
#

queues 0 and 3 go to core 2
queue 1 goes to core 4
queue 2 goes to core 1
queue 4 goes to core 3
queue 5 goes to core 5
queue 6 goes to core 6
queue 7 goes to core 7

#

the multiple queues serves 2 purposes

#

1: you can issue a command to the nvme, by just turning interrupts off, and writing to the queue for your current core, no need to get any locks

leaden junco
dusk fiber
#

2: the reply goes back to the core that just scheduled the job, so the L1 cache is ready to resume whatever just asked for the data

leaden junco
#

This is the forum page about the topics

dusk fiber
#

you're losing 2, but not 1

kindred raptor
dusk fiber
#

no clue

#

just my laptop being weird

dusk fiber
kindred raptor
dusk fiber
#

so you can see how simple it is to use

kindred raptor
#

how about the C part?

dusk fiber
#

only 2 functions have any real logic, one to generate the display list for a sprite, and one to copy it into the hw

#

the rest is all just allowing properties to flow both ways

#

and in theory, it can handle dual-monitor on the entire vc4 lineup

kindred raptor
#

Yeah, I recognized the display list stuff

dusk fiber
#

the biggest implementation difference from little-kernel, is the active sprite list

#

LK has its own z-order sorted array, with functions to add/remove sprites, which lets multiple modules share the hw seamlessly

#

while the circuitpython version, just expects a python list of sprites, and python code must keep track

#

circuitpython is far more single-threaded, so it doesnt really need that flexibility, and you can always add it easily in python

#

and it made the resulting code massively simpler

#

it also helped to not have any legacy code, and to basically start from a clean slate, with everything i learned from the previous version

#

tomorrow, i need to look into TileGrid objects, and see if they can fully replace my custom Sprite class

leaden junco
#

@dusk fiber A dumb question, do you know what the reserved memory blocks at
2eff2000-2effffff : reserved
and
01150000-0154ffff : reserved
01550000-0186ffff : Kernel data

are used for?

dusk fiber
#

heh, i just happen to be writing a 2-3 page forum post on that!

dusk fiber
leaden junco
#

Lol

#

Yeap

dusk fiber
#

information overload time! 😄

#

from a design perspective, i would also say the VPU's reloc heap is better than linux's CMA heap

#

and linux could be improved to better utilize the CMA

#

are you aware of how both cma and the reloc heap work?

leaden junco
#

I treat CMA as a contiguous block, so if you really need one you can request it without trying to move the pages.
But I'm not sure how the VPU's heap works

#

hmm did you have arm_peri_high=1 in the config.txt

dusk fiber
#
root@pi400:~# grep arm_peri /boot/config.txt 
arm_peri_high=1
#

yep

#

for the linux CMA, its a special region that can only have CMA or movable pages, so when linux does need that contiguous block, it can move/push things out of the way, and allocate a large slice, like your 7mb dma buffer

leaden junco
#

In the git issue

"OK, it seems to work. I tested ethernet, USB 3, and display output.

However, the firmware still doesn't report the last 64M of RAM as usable:"

"That makes sense. The code that generates the contents of the memory node is unaware of the arm_peri_high flag, so is always carving out the final 64MB for the peripherals. That's an easy fix."

#

So I think the 64M with arm_peri_high=1 is going to be fixed soon

dusk fiber
#

ah, that sounds like a bug, but the VPU firmware still needs some of that ram

leaden junco
#

Also I wonder if libcamera can still work with arm_peri_high=1

dusk fiber
#

it should

leaden junco
#

Because that memory address is sent to VPU to write buffer

dusk fiber
#

only the MMIO window moves, and device-tree automates all of that

#

the dma is still in the lower 1gig

leaden junco
#

Ahhh I see, it is weird that my config file doesn't have the arm_peri_high=1 setting

dusk fiber
#

i think with linux cma, once you allocate an object, you permanently have that chunk of ram, until you free the object

#

and to improve the chances of allocating well, it might over-allocate/align, so your 7mb buffers might turn into 8mb buffers

#

without that, you can get free space fragmentation

#

where you have 8 holes, all 1mb in size, but no 8mb hole
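
A minimal sketch of that fragmentation case. The bitmap and the `free_runs` / `can_allocate` helpers are made up for illustration; this is not how linux tracks CMA pages internally.

```python
from itertools import groupby

def free_runs(bitmap):
    """Return the length of every run of free (False) slots in a bitmap."""
    return [len(list(run)) for used, run in groupby(bitmap) if not used]

def can_allocate(bitmap, size):
    """A contiguous allocation only succeeds if one single hole is big enough."""
    return any(run >= size for run in free_runs(bitmap))

# 16 slots of 1mb each, every other slot is still in use
bitmap = [True, False] * 8

runs = free_runs(bitmap)
total_free = sum(runs)      # free space summed over all holes
largest_hole = max(runs)    # the only number that matters for a contiguous request
```

Here `total_free` is 8 slots but `largest_hole` is 1, so `can_allocate(bitmap, 8)` fails even though enough memory is free in total.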

#

this is where the VPU's relocatable heap comes in to save the day (palmos also had the same feature)

#

all objects are movable, even dma objects!

leaden junco
#

what is VPU's relocatable heap and how it works?

dusk fiber
#

this api and the few that follow give a rough idea

#

first, you have to allocate some memory, and you get a handle back

#

when you want to access the memory, you must call the lock function to get its current physical addr

#

then you can read/write, or do dma

#

when you're done, call unlock, and never touch that address again

#

any unlocked object can be freely moved, to defrag the free space

dusk fiber
#

if you want to read the buffer, call lock again, to learn its new address, do some access, and then unlock
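
The alloc / lock / unlock cycle described above can be modelled with a toy heap. This is only a sketch of the idea: the `RelocHeap` class, its method names and the first-fit policy are assumptions for illustration, not the firmware's real API or layout.

```python
class RelocHeap:
    """Toy model of a handle-based relocatable heap."""

    def __init__(self, size):
        self.size = size
        self.objects = {}      # handle -> [offset, length, lock_count]
        self.next_handle = 1

    def _first_fit(self, length):
        # walk the objects in address order, looking for a gap that is big enough
        cursor = 0
        for offset, obj_len, _ in sorted(self.objects.values()):
            if offset - cursor >= length:
                return cursor
            cursor = offset + obj_len
        return cursor if self.size - cursor >= length else None

    def alloc(self, length):
        offset = self._first_fit(length)
        if offset is None:
            self.compact()                 # defrag the free space, then retry once
            offset = self._first_fit(length)
            if offset is None:
                raise MemoryError("heap full")
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = [offset, length, 0]
        return handle

    def free(self, handle):
        del self.objects[handle]

    def lock(self, handle):
        obj = self.objects[handle]
        obj[2] += 1
        return obj[0]                      # current address, only valid until unlock

    def unlock(self, handle):
        self.objects[handle][2] -= 1

    def compact(self):
        # slide every unlocked object down; locked objects stay where they are
        cursor = 0
        for obj in sorted(self.objects.values()):
            if obj[2] == 0:
                obj[0] = cursor
            cursor = max(cursor, obj[0] + obj[1])
```

Usage: allocate four 2-unit objects in an 8-unit heap, free the first and third, then ask for 4 units. The request only fits because the two remaining unlocked objects get moved, which is why a caller must lock again to learn the new address.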

#

with this command, you can dump the entire relocatable heap

leaden junco
# dusk fiber and to improve the chances of allocating well, it might over-allocate/align, so ...

Yeap
[13229.767017] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.767229] cma: cma_alloc(): returned 00000000dd0e929c
[13229.770434] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.770581] cma: cma_alloc(): returned 00000000527e6c2a
[13229.773537] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.773702] cma: cma_alloc(): returned 000000001ecc5d6d
[13229.776005] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.776133] cma: cma_alloc(): returned 00000000c191821f

#

I think it is aligning to 4K pages
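
For reference, the numbers in that log can be decoded like this. A sketch assuming 4 KiB pages, and assuming `align` is a page order (which is how the kerneldoc for mainline `cma_alloc()` describes it) rather than a byte count. The helper name is made up.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def describe_cma_alloc(count, align_order, page_size=PAGE_SIZE):
    """Turn cma_alloc()'s count/align arguments into byte sizes."""
    size_bytes = count * page_size
    align_bytes = (1 << align_order) * page_size
    return size_bytes, align_bytes

size_bytes, align_bytes = describe_cma_alloc(count=1904, align_order=8)
print(f"{size_bytes / 2**20:.2f} MiB per buffer, aligned to {align_bytes // 2**20} MiB")
```

Under those assumptions, `count 1904` is a buffer of a bit over 7 MiB, and `align 8` asks for 2^8 pages, i.e. 1 MiB alignment.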

dusk fiber
#

in my case, you can see a 39mb hole
a 512 byte object
a ~24kb object
a bunch of 740 byte objects
and more...

#

it doesnt need to force alignment on anything, and can just move things to defrag the free space

dusk fiber
#

there it is

#

Generally, pointers will be sanitized when kernel.kptr_restrict is non-zero.
(from a stack overflow answer)

leaden junco
#

Ah I was looking at the count

dusk fiber
#

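try setting that to 0 (its a sysctl, so in /boot/cmdline.txt it would be sysctl.kernel.kptr_restrict=0), reboot, and check dmesg for cma_alloc again; if plain %p pointers are still hashed, no_hash_pointers on the cmdline turns the hashing off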

#

basically, linux is protecting itself against various exploits
so its censoring all pointers in dmesg, by hashing them

#

makes debug impossible, but also makes certain exploits impossible

leaden junco
#

But I'm also trying to use cma_debug_show_areas()

#

There is a debug function in cma.c

#

but like.... no examples on how to use it

dusk fiber
#

000000001ecc5d6d just screams red flags, because it claimed to be 8 aligned, but its not aligned

leaden junco
#

I think cma allocating memory is fine, but I just want to really maximize the area

dusk fiber
#

there is also one other trick you can do

leaden junco
#

Which is very interesting:
# dmesg | grep CMA
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000000056000000, size 960 MiB

dusk fiber
#

you have 7gig of ram the vpu/gpu cant access

#

have a dedicated thread, that just copies frames from cma to normal ram, as fast as possible

#

dont process anything

#

the sooner you get it out of cma, the better
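
A rough sketch of that copy-out thread. Everything here is a stand-in: the queues, the fake 16-byte frames and the `copier` name are made up, and a real pipeline would be copying out of mmap'd dma-bufs rather than python bytearrays.

```python
import queue
import threading

def copier(cma_frames, out_frames):
    """Drain frames from the scarce CMA side into ordinary memory, with no processing."""
    while True:
        frame = cma_frames.get()
        if frame is None:             # sentinel: capture has stopped
            break
        out_frames.put(bytes(frame))  # copy out; the CMA buffer could be requeued now

cma_frames = queue.Queue(maxsize=4)   # stands in for the small pool of CMA buffers
out_frames = queue.Queue()            # unbounded, lives in normal ram

worker = threading.Thread(target=copier, args=(cma_frames, out_frames))
worker.start()
for i in range(3):
    cma_frames.put(bytearray([i]) * 16)   # fake 16-byte "frames"
cma_frames.put(None)
worker.join()
```

The point of the design is that the thread holds each CMA buffer only for the length of one copy, and any slow processing happens later on the copies in `out_frames`.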

leaden junco
#

Yes but.....

#

This is the catch, the memory is not cached so memcpy is slow

leaden junco
#

I think it is, based on the blog

dusk fiber
#

but checking the dt he posted, i see the problem

#

he doesnt have the address range defined properly

#

so linux is allocating the cma pool beyond the lower 1gig

#

where dma cant reach

#

ive also heard that the cmdline.txt method breaks the same way, because it doesnt support the range definition

dusk fiber
#

if CONFIG_CMA_DEBUG is enabled, that function will do things
if disabled, that function is a no-op...

dusk fiber
#

cma_alloc() will call it automatically, but only if an allocation fails, and warnings are allowed

leaden junco
#

So that even if he is able to allocate 960M that starting address won't work for libcamera

dusk fiber
#

@no_warn: Avoid printing message about failed allocation

leaden junco
#

Oh I did recompile the kernel with CMA_DEBUG

dusk fiber
#

you could move the call to cma_debug_show_areas() outside of that if condition

#

then it will print on every allocation, successful or not

leaden junco
#

Ah actually yes let me try that

dusk fiber
#

that lets you access the entire 8gig range of the system

#

problem is, accessing them, i dont think linux just exposes a usable api

dusk fiber
#

possibly

#

follow that, but keep in mind that high-peri has moved the mmio, and read the bcm2711 datasheet to confirm how the 2711 differs

leaden junco
#

Got it, for now I'll leave the high-peri mode off

dusk fiber
#

ive only dealt with dma on the vc4 era SoC's so far (pi0-pi3), mainly for pwm audio

leaden junco
#

I don't really see the benefit of high-peri though

#

like CMA is still limited to the lower 1G anyways

dusk fiber
#

the main benefit, is that you get another ~64mb of memory in the lower 1gig

leaden junco
#

I guess... yeah If I really want to maximize the area

dusk fiber
#
root@pi400:~# cat /proc/iomem 
47e201000-47e2011ff : serial@7e201000
#

here, you can see that my PL011 uart (ttyAMA0) is at 0x4_7e20_1000

#

which is just the raw VPU bus addr (7e) with a 0x4_0000_0000 offset, that can only be reached in 64bit mode

#

but, if i turn high peri off, and reboot...

#

fe201000-fe2011ff : serial@7e201000

dusk fiber
#

the MMIO window is at the top of the 32bit space, just below 4096mb
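
The two /proc/iomem lines above boil down to a fixed offset from the VPU bus address. A small sketch, where the base constants are read off those two lines and the `bus_to_arm` helper is made up; it only covers the main 0x7e...... peripheral window.

```python
VPU_PERI_BASE = 0x7E00_0000         # peripheral base as seen on the VPU bus
ARM_LOW_PERI_BASE = 0xFE00_0000     # arm_peri_high=0
ARM_HIGH_PERI_BASE = 0x4_7E00_0000  # arm_peri_high=1

def bus_to_arm(bus_addr, high_peri):
    """Translate a 0x7e...... bus address into the arm physical address."""
    base = ARM_HIGH_PERI_BASE if high_peri else ARM_LOW_PERI_BASE
    return base + (bus_addr - VPU_PERI_BASE)

uart_high = bus_to_arm(0x7E20_1000, high_peri=True)
uart_low = bus_to_arm(0x7E20_1000, high_peri=False)
```

In practice device-tree does this translation, so drivers never need to hardcode either base.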

#

so its reachable by any 32bit kernel, but its not actually in the way of the 1gig window

#

so, its more about arm memory, than the 1gig window

#

40000000-fbffffff : System RAM with low-peri

#

40000000-ffffffff : System RAM with high-peri

#

yep, bingo, i'm now short 64mb of ram

#
root@pi400:~# free -m
              total        used        free      shared  buff/cache   available
Mem:           3807         169        3302          37         334        3531
Swap:            99           0          99

with low peri

#
pi@pi400:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:           3870         165        3388          23         317        3613
Swap:            99           0          99

high peri

#

63mb difference in total memory, lets assume its just a rounding error
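
Checking the two System RAM ranges by hand gives the same answer. A quick sketch; `range_mib` is just a helper name for this example.

```python
MIB = 1 << 20

def range_mib(start, end_inclusive):
    """Size of an inclusive /proc/iomem style range, in MiB."""
    return (end_inclusive - start + 1) / MIB

low_peri = range_mib(0x4000_0000, 0xFBFF_FFFF)    # 40000000-fbffffff
high_peri = range_mib(0x4000_0000, 0xFFFF_FFFF)   # 40000000-ffffffff
extra = high_peri - low_peri
```

The ranges differ by exactly 64 MiB, so the 63mb reported by `free -m` is consistent with rounding.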

dusk fiber
#

but a lot of drivers assume the cache is just always coherent (like x86, and other arm boards), so they are missing that code

#

checking the source, cma_alloc() and cma_release() are the main interface, and there is nothing for actually mapping it

#

which makes sense, other things can map it after allocating

#

oh, interesting, mm/cma_debug.c

leaden junco
#

So I can see two others besides the camera using CMA: ethernet, and NVMe for HMB

#

which makes sense

dusk fiber
#

anything in /sys/kernel/debug about cma?

#

all i can find is:

root@pi400:/sys/kernel/debug/dma_buf# cat bufinfo 

Dma-buf Objects:
size            flags           mode            count           exp_name        ino     
03145728        00000000        00080005        00000005        drm     00021531
        Exclusive fence: drm_sched v3d_render signalled
        Attached Devices:
        47ec00000.v3d
Total 1 devices attached


Total 1 objects, 3145728 bytes
root@pi400:/sys/kernel/debug/dma_buf# 

because i havent enabled cma debug

leaden junco
#

Nope

#

There is also a sysfs interface

#

but that only gives you the successful and failed alloc count

#

pi@camera6:~ $ cat /sys/kernel/mm/cma/reserved/alloc_pages_fail
0
pi@camera6:~ $ ls /sys/kernel/mm/cma/reserved/
alloc_pages_fail alloc_pages_success
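
Those counters are easy to poll from a script. A sketch, assuming the area is called `reserved` as in the listing above and that every file in the directory holds a single integer; the function name is made up.

```python
from pathlib import Path

def read_cma_counters(area="reserved", root="/sys/kernel/mm/cma"):
    """Read every counter file for one CMA area into a dict of ints."""
    area_dir = Path(root) / area
    return {
        f.name: int(f.read_text())
        for f in sorted(area_dir.iterdir())
        if f.is_file()
    }
```

Calling `read_cma_counters()` before and after a capture run shows whether any allocation failed in between, but not where the free space is.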

dusk fiber
#

oh, interesting, the debugfs api, lets you just directly alloc and release

#

with just echo alone, no proper c api

#

oh, i should maybe start at libcamera or unicam...

#

do you remember/know where the dma_buf gets allocated again? i forgot

leaden junco
#

[ 58.622167] cma: number of available pages: 13@9331+29@9411+2@9470+10@11158+2@11198+2@11230+2@11262+20@11564+2@11614+2@11646+2@11678+2@11710+2@11742+2@11774+20@12076+2@12126+2@12158+2@12190+2@12222+34@12254+144@14192+144@16240+144@18288+144@20336+144@22384+144@24432+144@26480+144@28528+144@30576+144@32624+144@34672+144@36720+144@38768+144@40816+144@42864+144@44912+144@46960+144@49008+144@51056+144@53104+144@55152+144@57200+144@59248+144@61296+144@63344+144@65392+144@67440+144@69488+144@71536+144@73584+144@75632+212@76076+107732@76588=> 112562 free of 184320 total pages

#

with 14 x 2 buffers for libcamera too
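
That "number of available pages" line is easier to read once it is parsed. A sketch, assuming each `N@S` entry means N free pages starting at bit S, and assuming 4 KiB pages; the function name and the returned keys are made up.

```python
import re

def parse_cma_free(line, page_size=4096):
    """Parse the 'number of available pages' line printed by cma_debug_show_areas()."""
    runs = [(int(n), int(start)) for n, start in re.findall(r"(\d+)@(\d+)", line)]
    free_pages = sum(n for n, _ in runs)
    largest = max(n for n, _ in runs)
    return {
        "holes": len(runs),
        "free_pages": free_pages,
        "largest_hole_mib": largest * page_size / 2**20,
    }

sample = "cma: number of available pages: 13@9331+29@9411+2@9470=> 44 free of 184320 total pages"
summary = parse_cma_free(sample)
```

The largest hole is the number to watch, since it bounds the biggest contiguous buffer that can still be allocated without moving anything.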

dusk fiber
#

vc4_bo.c in the 2d side, vc4_free_object() will check if its an imported object (a gem object pointing to a buffer something else made)

#

if it was imported, it will just destroy the gem object (the wrapper around another type of buffer)

#

if it wasnt imported, it goes into a cache, so the 2d subsystem can reuse it without a free/alloc sequence

leaden junco
#

libcamera\src\libcamera\pipeline\raspberrypi\dma_heaps.cpp

#

UniqueFD DmaHeap::alloc(const char *name, std::size_t size)
{
    int ret;

    if (!name)
        return {};

    struct dma_heap_allocation_data alloc = {};

    alloc.len = size;
    alloc.fd_flags = O_CLOEXEC | O_RDWR;

    ret = ::ioctl(dmaHeapHandle_.get(), DMA_HEAP_IOCTL_ALLOC, &alloc);
    if (ret < 0) {
        LOG(RPI, Error) << "dmaHeap allocation failure for "
                        << name;
        return {};
    }

    UniqueFD allocFd(alloc.fd);
    ret = ::ioctl(allocFd.get(), DMA_BUF_SET_NAME, name);
    if (ret < 0) {
        LOG(RPI, Error) << "dmaHeap naming failure for "
                        << name;
        return {};
    }

    return allocFd;
}

dusk fiber
#

and where did dmaHeapHandle_ come from?

#
pi@pi400:~ $ ls -l /dev/dma_heap/
total 0
crw-rw---- 1 root video 253, 1 Sep  4 02:40 linux,cma
crw-rw---- 1 root video 253, 0 Sep  4 02:40 system
#

ah, one of these, thats what i was looking for

#

so basically, you open linux,cma, and then you can issue a DMA_HEAP_IOCTL_ALLOC to allocate a dma_buf within the CMA

#

and you pass it a pointer to a dma_heap_allocation_data to describe the allocation request

#
drivers/dma-buf/dma-heap.c:     struct dma_heap_allocation_data *heap_allocation = data;
include/uapi/linux/dma-heap.h: * struct dma_heap_allocation_data - metadata passed from userspace for
include/uapi/linux/dma-heap.h:struct dma_heap_allocation_data {
include/uapi/linux/dma-heap.h: * Takes a dma_heap_allocation_data struct and returns it with the fd field
include/uapi/linux/dma-heap.h:                                struct dma_heap_allocation_data)
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c:     struct dma_heap_allocation_data data = {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c:     struct dma_heap_allocation_data_smaller {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c:                               struct dma_heap_allocation_data_smaller);
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c:     struct dma_heap_allocation_data_bigger {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c:                               struct dma_heap_allocation_data_bigger);
#

which only shows up in 3 places in linux
1: the implementation
2: the headers
3: a tool for testing it
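
The same allocation request can be sketched from python, which makes the struct layout explicit. Assumptions: the field order follows `struct dma_heap_allocation_data` in include/uapi/linux/dma-heap.h, the ioctl number is built the asm-generic way (as on arm and x86), and the caller has access to the heap device (group video in the listing above). The `cma_alloc_dmabuf` name is made up.

```python
import fcntl
import os
import struct

# struct dma_heap_allocation_data { __u64 len; __u32 fd; __u32 fd_flags; __u64 heap_flags; }
ALLOC_FMT = "=QIIQ"
ALLOC_SIZE = struct.calcsize(ALLOC_FMT)

def _iowr(type_char, nr, size):
    """_IOWR() as defined in asm-generic/ioctl.h."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr

DMA_HEAP_IOCTL_ALLOC = _iowr("H", 0, ALLOC_SIZE)

def cma_alloc_dmabuf(size, heap="/dev/dma_heap/linux,cma"):
    """Ask the CMA dma-heap for a buffer and return the new dma_buf fd."""
    heap_fd = os.open(heap, os.O_RDWR | os.O_CLOEXEC)
    try:
        req = bytearray(struct.pack(ALLOC_FMT, size, 0, os.O_RDWR | os.O_CLOEXEC, 0))
        fcntl.ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, req)  # kernel fills in the fd field
        _, buf_fd, _, _ = struct.unpack(ALLOC_FMT, req)
        return buf_fd
    finally:
        os.close(heap_fd)
```

The returned fd can then be mmap'd or handed to another driver as an imported buffer; closing it releases the allocation once nothing else holds a reference.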

leaden junco
#

So what I read is that "buffers allocated using the CMA dma-heap are cached."

#

Now the question is I need to swap the v4l2 buffer with dma-heap

dusk fiber
#

i think behind the scenes, everything will be dma_buf objects from cma

dusk fiber
#

DmaHeap::alloc() will return the new dma-buf object (as an fd)

leaden junco
#

I should look into where v4l2 gets its buffers from.... hmm

dusk fiber
#

yeah

#

i was expecting to find that in libcamera, but dont see it immediately

#

./drivers/media/platform/bcm2835/bcm2835-unicam.c

#
        .vidioc_reqbufs                 = vb2_ioctl_reqbufs,
        .vidioc_create_bufs             = vb2_ioctl_create_bufs,
        .vidioc_prepare_buf             = vb2_ioctl_prepare_buf,
        .vidioc_querybuf                = vb2_ioctl_querybuf,
        .vidioc_qbuf                    = vb2_ioctl_qbuf,
        .vidioc_dqbuf                   = vb2_ioctl_dqbuf,
        .vidioc_expbuf                  = vb2_ioctl_expbuf,
#
static const struct vb2_ops unicam_video_qops = {
        .wait_prepare           = vb2_ops_wait_prepare,
        .wait_finish            = vb2_ops_wait_finish,
        .queue_setup            = unicam_queue_setup,
        .buf_prepare            = unicam_buffer_prepare,
        .buf_queue              = unicam_buffer_queue,
#

vb2_ioctl_create_bufs seems like the best bet

#

drivers/media/common/videobuf2/videobuf2-v4l2.c:EXPORT_SYMBOL_GPL(vb2_ioctl_create_bufs);

#

vb2_create_bufs

#

vb2_core_create_bufs

#

__vb2_queue_alloc

#

__vb2_buf_mem_alloc

#

call_ptr_memop and now i'm lost

leaden junco
#

I think unicam isn't the one to look at

#

like this should be just calling to VC

dusk fiber
#

if (!(q->io_modes & VB2_DMABUF) || !q->mem_ops->attach_dmabuf ||

#

ah

leaden junco
#

and moving the data with CMA buffer

dusk fiber
#

when using libcamera, linux will be driving the unicam peripheral directly

#

and raw bayer frames land in memory entirely under linux's control, with no interaction from the firmware (other than clock setup, and power gating)

#

if you then want to do bayer->yuv, or awb, you have to run thru the ISP, which requires firmware support

#

q->mem_ops = &vb2_dma_contig_memops;

#

aha, unicam just inherits the generic contiguous memory ops

#

vb2_dc_alloc()

#

based on the non_coherent_mem flag, it uses the coherent or non-coherent allocator

#

V4L2_MEMORY_FLAG_NON_COHERENT, that looks like something userland could set!

#
static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
{
        struct vb2_queue *q = buf->vb->vb2_queue;

        buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
#

ah, but then they switch up what the "non" means (non-contiguous, not non-coherent), and ruin everything, so unicam cant use this path

#
DMA_ATTR_SKIP_CPU_SYNC
----------------------

By default dma_map_{single,page,sg} functions family transfer a given
buffer from CPU domain to device domain. Some advanced use cases might
require sharing a buffer between more than one device. This requires
having a mapping created separately for each device and is usually
#

aha, this sounds like what i said earlier, about cache maintenance!

#

transferring ownership of the buffer between the cpu and device, and keeping the cpu cache in sync
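
For a dma-buf fd, userland has its own way to bracket cpu access with that ownership transfer: `DMA_BUF_IOCTL_SYNC` from include/uapi/linux/dma-buf.h. This is a separate mechanism from the kernel-side `DMA_ATTR_SKIP_CPU_SYNC` attribute quoted above. A sketch, assuming the asm-generic ioctl encoding; the `cpu_read_access` helper is made up.

```python
import contextlib
import fcntl
import struct

# flag values from include/uapi/linux/dma-buf.h
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2

# _IOW('b', 0, struct dma_buf_sync), where the struct is a single __u64 flags field
DMA_BUF_IOCTL_SYNC = (1 << 30) | (8 << 16) | (ord("b") << 8) | 0

def _sync(dmabuf_fd, flags):
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))

@contextlib.contextmanager
def cpu_read_access(dmabuf_fd):
    """Bracket cpu reads of a mapped dma-buf so the cache is synced around them."""
    _sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)
    try:
        yield
    finally:
        _sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)
```

Usage would be `with cpu_read_access(fd): data = mapping[:]`, where `mapping` is an mmap of the same fd.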

#

but its now 3:42 am, i should get to bed

leaden junco
#

Yeah same for me, I'll look at the stuff you mentioned! Thanks a lot!