#rpi open firmware
At work currently. I've been busy.
ah, just ping me in here whenever you're free
Well, I'm on a commute now.
for every model of rpi, the VPU always starts executing a maskrom after reset
for bcm2835, that rom is 18384 bytes
for bcm2836, 18384 bytes again, but 2 parts have changed
for bcm2837, 30528 bytes, a complete overhaul
for bcm2711, 14480 bytes, stripped back heavily
the 283{5,6,7} maskroms can all load the next stage from ~8 sources
1st/2nd/4th in the boot order are all bootcode.bin on fat on an SD card
but they differ in which controller (sdhost vs sdhci) and bus width (4bit vs 8bit) is used
3rd is raw NAND flash
5th is SPI flash
6th is the dwc usb controller
bcm283{5,6} (pi0-pi2) only support device mode
bcm2837(pi3, pi02) support both host and device
there are separate OTP bits to allow host and device
if both host&device are allowed, it will query the OTG_ID pin and decide at runtime
in usb-host mode, it supports the lan9514 NIC found on the rpi, and will then boot from either usb-storage or tftp over the lan9514
7th in the boot order is i2c-slave, you just fire off a giant blob with raw i2c writes, not SMbus style
8 is something called MPHI, but i have no idea what it is
bcm2711 stripped it back massively, to just:
1: recovery.bin on an SD card
2: SPI flash
3: usb-device mode
@twilit oasis that all make sense so far?
Yeah
for the pi0-pi3 lineup, the default way to boot, is with bootcode.bin on sd/usb/tftp
for the bcm2711 lineup, its SPI flash all the way, at this stage
the pi3 maskrom had many usb-host bugs, and being rom, you cant exactly fix them
Honestly SPI flash does simplify some things.
as a hack, they came up with the bootcode.bin only mode of booting
an SD card, with only bootcode.bin, would bypass rom bugs, and be able to usb-host boot the rest of the way
and as a side-benefit, this worked on the pi0-pi2 lineup
the pi4 basically just made that the only way to boot
throw bootcode.bin onto SPI flash, and now its basically the same setup
and you can update it at any time to fix bugs or add features
for the pi0-pi3 family, the official bootcode.bin can detect how it was loaded, and will try to load the next stage from the same source, after it brings dram online
but the special bootcode.bin only mode, means that if it came from SD, and cant find anything, it will try usb-host next
for the bcm2711, the bootcode.bin in SPI will consult the bootconf.txt file in SPI, which has a BOOT_ORDER= string
BOOT_ORDER=0xf4241
this says to try SD first
then usb-storage on the pcie xhci
then network
then usb-storage on pcie xhci again
then loop
thats far better than before, when you had to configure things in write-once OTP
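A rough sketch of how that BOOT_ORDER value reads: one hex digit per boot mode, tried from the lowest digit up. The digit-to-mode table is recalled from the Raspberry Pi bootloader configuration docs, so treat it as an assumption, and `decode_boot_order` is a made-up helper name.

```python
# Boot-mode digits, as recalled from the bootloader docs (assumption).
BOOT_MODES = {
    0x1: "SD card",
    0x2: "network",
    0x3: "rpiboot (usb-device)",
    0x4: "usb-storage",
    0x5: "usb-storage on the SoC usb controller",
    0xE: "stop",
    0xF: "restart (loop)",
}

def decode_boot_order(value):
    """Return the boot modes in the order they are tried (lowest digit first)."""
    steps = []
    while value:
        nibble = value & 0xF
        steps.append(BOOT_MODES.get(nibble, "unknown (0x%x)" % nibble))
        value >>= 4
    return steps

for step in decode_boot_order(0xF4241):
    print(step)
```

For 0xf4241 this prints SD card, usb-storage, network, usb-storage, restart (loop), matching the order described above.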
Yeah, although personally I'm partial to config pins.
Hmm interesting
```
[all]
BOOT_UART=1
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
DHCP_TIMEOUT=45000
DHCP_REQ_TIMEOUT=4000
TFTP_FILE_TIMEOUT=30000
TFTP_IP=192.168.2.15
TFTP_PREFIX=0
SD_BOOT_MAX_RETRIES=3
NET_BOOT_MAX_RETRIES=5
gpio=21=ip,pu
[gpio21=0]
BOOT_ORDER=0x3
[gpio21=1]
BOOT_ORDER=0x5241
[none]
FREEZE_VERSION=1
```
gpio=21=ip,pu changes pin 21 to input with pullup
[gpio21=0] means the following statements only apply if the pin is somehow low
At this point I'm just relieved it doesn't have the brainrot that is UEFI.
so a jumper from gpio21 to gnd, can change the BOOT_ORDER
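To make the jumper trick concrete, here is a minimal model of how those conditional sections could be resolved. It is a simplified sketch, not the bootloader's real parser: it assumes `[all]` always applies, `[none]` never applies, `[gpioN=V]` applies only when pin N reads level V, and it ignores the `gpio=` pin setup line. `resolve` is a hypothetical name.

```python
def resolve(config_text, gpio_levels):
    """Return the key=value settings that apply for the given GPIO levels."""
    settings = {}
    active = True
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "all":
                active = True
            elif name == "none":
                active = False
            elif name.startswith("gpio") and "=" in name:
                pin, level = name[4:].split("=", 1)
                active = gpio_levels.get(int(pin)) == int(level)
            else:
                active = False  # filters this sketch does not model
            continue
        if active and "=" in line:
            key, value = line.split("=", 1)
            if key != "gpio":  # pin setup lines such as gpio=21=ip,pu are skipped
                settings[key] = value
    return settings

CONFIG = """
[all]
BOOT_UART=1
gpio=21=ip,pu
[gpio21=0]
BOOT_ORDER=0x3
[gpio21=1]
BOOT_ORDER=0x5241
[none]
FREEZE_VERSION=1
"""

print(resolve(CONFIG, {21: 0})["BOOT_ORDER"])  # jumper from gpio21 to gnd
print(resolve(CONFIG, {21: 1})["BOOT_ORDER"])  # pin left on its pullup
```

With the jumper fitted the sketch picks BOOT_ORDER=0x3, and without it BOOT_ORDER=0x5241.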
the maskrom does also have gpio config too
you can configure a certain gpio pin, to disable loading recovery.bin from SD when in a certain state
and the same, for booting from SPI
fully configurable, for both which pin, and which level is "disable"
thats mainly for use on the CM4, when the emmc and SPI are soldered on, and might have bad firmware
forcing them off causes it to fall back to usb-device mode, and you can re-flash it
for all models, the rom loads a .bin file into the 128kb VPU L2 cache, and executes it (with optional signature checking)
that .bin then brings dram online, and loads a .elf file
and the .elf then brings the rest of the system up, and loads linux
with the pi0-pi3 lineup, you have 3 choices over where to get your bootcode.bin
https://github.com/librerpi/rpi-open-firmware compiles down to an all-in-one bootcode.bin that brings dram online, and then executes linux on the arm core
https://github.com/librerpi/lk-overlay is highly modular, and one configuration of it will compile down to a bootcode.bin that brings dram on, and runs a .elf on the VPU
the closed bootcode.bin also brings dram online, and then loads start.elf and executes it
using start_file=msd.elf or just renaming .elf files, you can also get the closed .bin to load other things
you then have 3 choices again, for which .elf you load
msd.elf is meant for compute modules, but also works on the zeros
the rpi will use the dwc to emulate a usb MSD, and expose the sd/emmc, for flashing an OS
start.elf is the official firmware, with all of the services you expect from an rpi (hw encode/decode, camera, 3d, and everything)
and lk-overlay can also produce an lk.elf file, and one configuration will go on to loading linux
the biggest difference and compatibility issue here, is that start.elf comes with a fixup.dat, relocation data, for loading start.elf to a diff memory addr
and the open loader doesnt support that, and has never been tested with the closed start.elf binary
but the open lk.elf is compatible with both open&closed .bin loaders
so you can use the closed bootcode.bin for netboot, but then the open lk.elf, and speed up your development cycles
or go pure open source, with lk-overlay for both bootcode.bin and lk.elf
@twilit oasis that whole mess make sense?
It's long, but yeah.
and this is the boot graph for the old bcm2711 firmware
i'm not able to bring the dram up, or the arm up
so, while i can stick open source in at any stage, thats a dead-end
but, when RPF decided to add https booting to the SPI firmware, it went over the 128kb size limit
so they split the bootloader into 2 pieces
the new bootcode.bin only does dram init, and loading of bootmain.elf from spi, and has lost the ability to boot from anything else
and the new bootmain.elf then does the actual loading part of a bootloader
https booting is also called network install
the SPI flash will just download a ~64mb boot.img file over the internet, check an rsa signature on it, and then treat that as the fat partition on SD
within the default image, is the normal start4.elf firmware, linux, and an initrd
and within the initrd, is the rpi imager
so, you can just hit shift while booting, and the rpi imager will launch, and you can install any distro to your sd/usb
and i say "default image", because its fully configurable in bootconf.txt
HTTP_HOST=, HTTP_PATH= can be used to get boot.img from elsewhere
IMAGER_REPO_URL= changes the path to the json file that rpi-imager uses for a distro list
so you can use the stock boot.img, but provide a custom list of distro's
combine that with gpio conditionals, and you can force it into net-install mode with a jumper
and boom, you now have a (network reliant) factory reset, on whatever product you bake a CM4 into
hold a button while booting, select an OS from the list, hit write, done
Well minimizing proprietary blobs is definitely preferable.
yeah, thats where replacing the blobs comes in
with the old rpi-open-firmware project, you can boot the pi2/pi3 without any blobs present (if you ignore the maskrom)
but you dont get any video, and changing the codebase often crashes things, its got to fit within 128kb, and is oddly fragile at times
so i grabbed this existing kernel, and ported it to the VPU
that instantly gave me threading, mutexes, blocking thread primitives, irq handlers, ext2, tga, and more
That's impressive.
i then took bits of code from rpi-open-firmware, and ported them over, giving me lpddr2, arm, power, and clock drivers
https://github.com/librerpi/lk-overlay/blob/master/project/vc4-stage1.mk
```
LOCAL_DIR := $(GET_LOCAL_DIR)
TARGET := rpi3-vpu
MODULES += \
	app/vc4-stage1 \
	platform/bcm28xx/otp \
	platform/bcm28xx/rpi-ddr2 \
	platform/bcm28xx/sdhost \

GLOBAL_DEFINES += BOOTCODE=1 NOVM_MAX_ARENAS=2 NOVM_DEFAULT_ARENA=0
GLOBAL_DEFINES += WITH_NO_FP=1
BOOTCODE := 1
WERROR := 0
```
with LK, you define a project file like this
each module, has its own rules.mk that follows a similar scheme
the target, says to load target/rpi3-vpu/rules.mk, which follows the same scheme again
TARGET := rpi3-vpu and BOOTCODE := 1 say that the resulting lk.bin must be under 128kb, and is compatible with the maskrom loading, just rename it to bootcode.bin and this will run on startup
platform/bcm28xx/otp is the OTP driver, so you can read things like the hw revision and serial#
platform/bcm28xx/rpi-ddr2 is the lpddr2 driver, so it can bring ram online
platform/bcm28xx/sdhost is an SD driver, so you can access storage
app/vc4-stage1 is the bootloader app, it uses otp/sdhost to mount an ext4 partition, load /boot/lk.elf, and then executes it
https://github.com/librerpi/lk-overlay/blob/master/app/vc4-stage1/rules.mk
MODULE_DEPS += \
	lib/elf \
	lib/fs \
	lib/fs/ext2 \
	lib/lua \
	lib/partition \

MODULE_SRCS += \
	$(LOCAL_DIR)/stage1.c \
and every module, is free to declare more modules it depends on, and sources to add to the kernel
lib/elf allows parsing elf files
lib/fs is the filesystem core
lib/fs/ext2 is the ext2/4 driver
lib/partition adds MBR support
lib/lua is part of a test, for running lua at boot
https://github.com/librerpi/lk-overlay/blob/master/app/vc4-stage1/stage1.c
bdev_t *sd = rpi_sdhost_init();
partition_publish("sdhost", 0);
ret = fs_mount("/root", "ext2", "sdhostp1");
ret = fs_open_file("/root/boot/lk.elf", &stage2);
ret = elf_open_handle(stage2_elf, fs_read_wrapper, stage2, false);
void *entry = load_and_run_elf(stage2_elf);
fs_close_file(stage2);
arch_chain_load(entry, 0, 0, 0, 0);
and boom, with just that (and error handling), you have a bootloader!
but, its only able to load VPU binaries, and it just jumps to the entry-point defined in the ELF header
https://github.com/librerpi/lk-overlay/blob/master/project/vc4-stage2.mk
so now, we just start all over, but now the 128kb size limit isnt at play
so we are free to add much more expensive modules into the code
stage2 will bring various hw blocks online, including the arm core, and execute an arm payload on that
and currently, the arm payload is a 3rd build of LK!
https://github.com/librerpi/lk-overlay/blob/master/project/rpi2-test.mk
this runs on the arm core, and loads rpi2.dtb and zImage from /boot on ext4
on the LK side of things, i have uart/vec(ntsc)/dpi/3d/pwm/sd all working
when linux is on the arm side, it has uart/usb-host/sd, and it can get a dumb framebuffer by routing thru LK
the main features that are missing and would be possible to solve, are:
- config files
- raw camera access (just bayer frames)
- i2s/spi/i2c on linux
- maybe 3d on linux, with some driver mangling
the harder stuff, that should still be possible:
- usb-host on the vpu side (netboot/usb boot)
- hdmi/dsi init
and then the basically impossible stuff, is h264/mpeg2/jpeg accel, and the whole ISP
bcm2711 is much further behind, no dram init, and i havent gotten the arm core to start either, so you're entirely reliant on closed blobs
@twilit oasis any questions after all of that?
0x3f102070 00 00 90 00 0c 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f102080 18 02 43 00 00 0c 01 00 86 00 18 00 40 00 00 00 |..C.........@...|
] hexdump 0x7e102070 32
0x7e102070 00 00 90 00 0c 08 00 00 00 00 00 00 00 00 00 00 |................|
0x7e102080 38 02 47 00 00 1c 01 00 8e 04 18 00 40 00 00 00 |8.G.........@...|
some progress
i just ignored that PLLH isnt locking, and tried using it anyways, as the ref-clk for PWM/audio
its hovering around 311mhz
and its working!!
there were 4 DIG registers, no clue what they do
but i noticed, linux doesnt touch them, ever
so i commented that out, and boom, its come to life!
@kindred raptor i got PLLH working lastnight!
Nice!
i started by just ignoring the fact that it doesnt lock, and trying to use it anyways
some math with the divisors, revealed it was running at around 300mhz
and after more messing around, i eventually got it to run at the desired freq
Does it lock now?
yep
So, why did it refuse to lock?
there was a set of 4 registers, DIG0, DIG1, DIG2, DIG3
my code was setting them, the same as it did with the other PLL's
but linux never touches the DIG registers, ever
i commented that block out, and it started working
What were the DIG registers supposed to do again?
no clue!
linux doesnt touch them, and i cant remember where i got that code from
and the headers dont say anything
*REG32(A2W_PLLA_DIG3) = A2W_PASSWORD | 0x0;
*REG32(A2W_PLLA_DIG2) = A2W_PASSWORD | 0x400000;
*REG32(A2W_PLLA_DIG1) = A2W_PASSWORD | 0x5;
*REG32(A2W_PLLA_DIG0) = A2W_PASSWORD | div | 0x555000;
it was doing a bunch of magic numbers like this
*REG32(A2W_PLLA_ANA3) = A2W_PASSWORD | KA(2);
*REG32(A2W_PLLA_ANA2) = A2W_PASSWORD | 0x0;
*REG32(A2W_PLLA_ANA1) = A2W_PASSWORD | (prediv ? ANA1_DOUBLE : 0) | KI(2) | KP(8);
*REG32(A2W_PLLA_ANA0) = A2W_PASSWORD | 0x0;
and i think the KA/KI/KP vars, are part of the digital PID loop
ic
i assume that changing those, will adjust how fast the PLL locks, and how stable its clock is
but i would need a spectrum analyzer and other fancy hw, to do anything with that
so, in theory....
#define SCALER_DISPECTRL_SECURE_MODE_SET 0x80000000
i just set bit 31 in 0x7e40000c, and the security problem i had goes away
(or clear it, will need testing)
and now that PLLH is working, the linux kms drivers should just work
and boom, full 2d and 3d accel
ok, lets see, first i need to turn the arm code back on....
] whatareyou
i am aarch64 with MIDR_EL1 0x410fd034 in EL 1
ok, arm core is running a repl
] hexdump 0xffffffffc0400000 32
0xc0400000 7f 00 0c 80 00 00 17 00 76 72 64 64 00 00 3f 81 |........vrdd..?.|
0xc0400010 00 00 00 00 00 00 00 00 00 00 00 00 76 72 64 64 |............vrdd|
0x7e40000c has bits 16-21, 24, and 31 set
so SECURE_MODE is 1
ok, now linux doesnt boot anymore
interesting
*REG32(SCALER_DISPECTRL) &= ~SCALER_DISPECTRL_SECURE_MODE;
just running this, breaks linux
but if i run that from the VPU, it doesnt hang
ok, so at /soc/gpu you have:
gpu {
compatible = "brcm,bcm2835-vc4";
status = "disabled";
};
that is the master for the whole drm
ive turned on a bunch of gpu stuff....
[ 103.230789] raspberrypi-firmware soc:firmware: Request 0x00030066 returned status 0x80000001
[ 103.240922] vc4-drm soc:gpu: [drm] Couldn't stop firmware display driver: -22
PANIC: Asynchronous SError Interrupt
Entering kdb (current=0xffffff800b408000, pid 748) on processor 0 due to NonMaskable Interrupt @ 0xffffffc00815eb64
[0]kdb>
oh wow, that actually worked this time
panic+0x198/0x374
nmi_panic+0xb4/0xbc
arm64_serror_panic+0x78/0x84
do_serror+0x30/0x7c
el1h_64_error_handler+0x38/0x50
el1h_64_error+0x64/0x68
vc4_hvs_bind+0xf8/0x560 [vc4]
component_bind_all+0x110/0x260
@granite sandal in case you missed it, ive moved my spammy chat over to this thread
as best as i can tell, its vc4_hvs_upload_linear_kernel() that faulted, while writing to the display list
which means the hvs security is not disabled, and the problem remains
i'm on a bug hunt today will go back and read it
What is SECURE_MODE?
I assumed it was the cause of my problem
Security to block the arm from using the 2d core
The 2d core can read all ram via dma
That defeats all DRM schemes
Your crypto keys cease to be safe
So you block the 2D core while doing what? In a game console context (think We Hope Nintendo Will Use Our IC), it wouldn't make any sense. I guess it was made to prevent DMA while decoding DRM'd video?
Or force all video thru an RPC call into the secure kernel
Opengl was over that RPC channel
There are signs that a secure video frame can be used as opengl texture in a secure manner
So, you put your display list in a buffer somewhere, you call an OS function, it reads & hopefully somehow in spite of the halting problem sanitizes it, and then sends it to the videocore?
for the 2d core, you basically just have a flat list of sprites
each sprite has 1 to 3 pointers into ram, at predictable offsets
and optionally 1/4/5 pointers into the display list, for scaling/palette stuff
if you just implement a basic parser, you can decode that, substitute virtual addresses for physical addresses (and fault on violation), and then forward it on
Makes sense!
but the rpi firmware instead implements the dispmanx api
it doesnt expose the raw hw api itself, but a proper 2d graphics api
vc_dispmanx_resource_create() creates an image in gpu memory
vc_dispmanx_resource_write_data() copies image data from linux to gpu memory
vc_dispmanx_update_start() and vc_dispmanx_update_submit_sync() wrap a bunch of changes, to make the whole group atomic
vc_dispmanx_element_add() creates a sprite using a previously allocated resource as its backing image
other functions exist, to change the resource behind a sprite, or to change the parameters on a sprite
and RPF (at my request) added a function that lets you get the physical address of a resource in gpu memory, so you can dma directly into it and skip vc_dispmanx_resource_write_data()
the dispmanx stack then keeps track of all of the resources(images) and elements(sprites), and generates the display list automatically
That's... another way to ensure you don't get your tivobox owned
(As long as you properly sanitize inputs)
yep
there is a dedicated mmu between the arm core(s) and the rest of the system
so you could ban linux from ever touching gpu ram or mmio
if linux wants something, it must ask the gpu for help
but the rpi firmware hasnt made use of any of these tricks
the security was off from day 1
so all thats left, is design tricks, that imply it could have done this at one time
like the firmware having a secure/non-secure split, and well sanitized rpc calls into the secure half
including one rpc call that just lets you write anywhere in ram 😛
I'm guessing they based it off a reference design from the manufacturer
the original pi1 firmware, looked like it came right out of a cable box
it has mention of tv remote buttons, games, channels
they basically just took STB firmware, added an app to run linux on the co-processor, and shipped it 😛
and over time, they removed unused parts, and the codebase evolved
to speed up testing, i added a line to the bootloader
dlist_memory[0] = 0x1234;
in theory, that will cause the same fault linux is having, but way sooner in the boot process
What fault was linux having again? I may have missed it in the scrollback :P
async external abort
ok, that sorta worked
[ 0.000000] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.21 #1-NixOS
[ 0.000000] Hardware name: Raspberry Pi 3 Model B rev 1.2, with open firmware (DT)
[ 0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
...
[ 0.000000] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.21 #1-NixOS
[ 0.000000] Hardware name: Raspberry Pi 3 Model B rev 1.2, with open firmware (DT)
[ 0.000000] Call trace:
...
[ 0.000000] el1h_64_error+0x64/0x68
[ 0.000000] setup_arch+0x168/0x5d4
[ 0.000000] start_kernel+0xa4/0x79c
[ 0.000000] __primary_switched+0xbc/0xc4
my bootloader somehow has that error masked
so the instant linux turns exception handling on, boom
i can kinda see why this isnt handled like an OOPS
the handler ran seconds!! after the fault was triggered (thats the async part)
325 parse_early_param();
326
327 /*
328 * Unmask asynchronous aborts and fiq after bringing up possible
329 * earlycon. (Report possible System Errors once we can report this
330 * occurred).
331 */
332 local_daif_restore(DAIF_PROCCTX_NOIRQ);
325 is what allowed me to see this
and 332 feels like where it caught the exception
arch/arm64/include/asm/daifflags.h:#define DAIF_PROCCTX_NOIRQ (PSR_I_BIT | PSR_F_BIT)
DAIF, page 392
Interrupt Mask Bits
bit 8, A, SError interrupt mask bit.
that matches the error msg
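A small sketch of the mask bits involved, assuming the standard AArch64 PSTATE layout (D is bit 9, A is bit 8, I is bit 7, F is bit 6). `masked` is an illustrative helper, not a kernel function.

```python
# AArch64 PSTATE/SPSR exception mask bits.
DAIF_BITS = {"D": 1 << 9, "A": 1 << 8, "I": 1 << 7, "F": 1 << 6}

def masked(pstate):
    """Return the letters of the exception classes that are currently masked."""
    return [name for name, bit in DAIF_BITS.items() if pstate & bit]

# DAIF_PROCCTX_NOIRQ = PSR_I_BIT | PSR_F_BIT: IRQ and FIQ stay masked,
# so restoring it leaves A (SError) and D (debug) unmasked.
DAIF_PROCCTX_NOIRQ = DAIF_BITS["I"] | DAIF_BITS["F"]

print(masked(DAIF_PROCCTX_NOIRQ))
print(masked(0x600000C5))  # pstate value from the panic log
```

Both calls report only I and F as masked, which lines up with the `daIF` in the pstate line: SError was unmasked when the panic fired.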
invalid exception, which 0x13
iframe 0xffff00000008dec0:
x0 0xffff00000003d1c8 x1 0x ffffffff x2 0x 1234 x3 0xffff000000044000
x4 0x 696f6820 x5 0x 696f6820 x6 0x 0 x7 0x 48841
x8 0xffff00000003d508 x9 0xffff00000002a000 x10 0xffff00000008dff0 x11 0x ffffffc8
x12 0xffff00000008e030 x13 0xffff00000008e030 x14 0xffff00000004f000 x15 0xffff000000082430
x16 0x 1 x17 0x8030200002211214 x18 0x 0 x19 0xffff00000003d1c8
x20 0xffff000000032fa8 x21 0x 0 x22 0xffff000000032000 x23 0x 0
x24 0xffff000000032fe0 x25 0xffff000000032fb8 x26 0xffff000000032bd8 x27 0x 0
x28 0xffff0000000447a0 x29 0xffff00000008e010 lr 0xffff0000000058c4 usp 0xf3fbed2923810090
elr 0xffff00000000c73c
spsr 0x 60000205
stack trace:
0xffff00000000c73c
0xffff000000005efc
0xffff000000004ebc
0xffff0000000034fc
panic (caller 0xffff0000000033c0): die
HALT: spinning forever... (reason = 9)
bingo
Doesn't this mean that there was a memory bus error?
How in the world would loading linux trigger that?
the problem is that the bootloader didnt unmask this error
so the bootloader triggered the error (via dlist_memory[0] = 0x1234;) and left it pending
and once linux got the console up, it unmasked the error, and instantly blew up
So, you set a pointer to the display list to somewhere that doesn't make sense
Which caused a pending SError
any write to the displaylist, even a normally valid one, will fault like this
the arm just doesnt have permission to use the 2d core
its not the arm mmu
This has to do with the secure_mode thing?
that was my theory
if secure is on, only requests from the VPU can write
if secure is off, anybody can write
That would sound right
SCALER_DISPECTRL: 0x13f0000
invalid exception, which 0x13
bit 31 is the secure flag
SCALER_DISPECTRL: 0x813f0000
invalid exception, which 0x13
it faults when both on and off ......
guess you are missing something
yep
Or the ARM peripheral always has non-secure permissions
Saying "the ARM peripheral" has very cursed energy 
the same write works when running under the closed firmware
this tells you what each bit in 0x813f0000 means
first, we have PANIC_CTRL in bits 0-6, thats 0
then we have BUSY_STATUS in bits 8-31????
then Y_BUSY in bits 9-31???
overlap much??
SCALER_DISPECTRL: 0x813f0000
SCALER_DISPECTRL: 0xfdff007f
...
SCALER_DISPECTRL: 0xfdff007f
invalid exception, which 0x13
if i write all f's, only some bits stick:
bits 0-6
16-24
26-31
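For reading register dumps like these, a throwaway helper that lists which bit ranges are set can save some squinting. `bit_ranges` is a hypothetical name; the only thing taken from the chat is that bit 31 is the secure flag.

```python
def bit_ranges(value, width=32):
    """Return the set bits of value as a list of (low, high) inclusive ranges."""
    ranges = []
    start = None
    for bit in range(width + 1):
        is_set = bit < width and (value >> bit) & 1
        if is_set and start is None:
            start = bit
        elif not is_set and start is not None:
            ranges.append((start, bit - 1))
            start = None
    return ranges

SECURE_MODE = 1 << 31  # bit 31, per the define quoted in the chat

print(bit_ranges(0x813F0000))  # value read at boot
print(bit_ranges(0xFDFF007F))  # value read back after writing all f's
print(bool(0x813F0000 & SECURE_MODE))
```

The first value has bits 16-21, 24, and 31 set; the read-back after writing all f's has bits 0-6, 16-24, and 26-31 set.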
yeah, ive got no clue
too much overlap in this register, and video still works even with garbage written to it
well, one last idea, which also failed
*REG32(SCALER_DISPCTRL) &= ~SCALER_DISPCTRL_ENABLE; // disable HVS
...
*REG32(SCALER_DISPCTRL) = SCALER_DISPCTRL_ENABLE; // re-enable HVS
before i disable (it may have already been off), it was at 0x813f0000
after disabling, writes are silently ignored
after re-enabling, it accepts the write and can clear that secure_mode bit
my next potential target, there are flags in here, like FULLPERI and peripherals on
if i remove FULLPERI then it faults early in boot, and doing as little as reading the clock will hang the arm
but the uart is working
if i remove peripherals on, nothing changes
i'm getting a SLVERR when i try to drive the 2d core
a SLVERR is represented by the BRESP bits being 0b10
#define ARM_C0_BRESP1 0x00000004
#define ARM_C0_BRESP2 0x00000008
static const uint8_t g_BrespTab[] = {
0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x18, 0x0,
0x10, 0x14, 0x10, 0x1C, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x10, 0x14, 0x18, 0x1C, 0x10, 0x14, 0x10, 0x0,
0x10, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
0x10, 0x14, 0x18, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x18, 0x0
};
there is a magic table of ~65 BRESP flags in the code.....
the main broadcom mmu, between arm and vpu, has 64 pages, of 16mb each......
if i decode that array, i get:
0: 0x4 OKAY
1: 0x5 EXOKAY
2: 0x4 OKAY
3: 0x5 EXOKAY
4: 0x4 OKAY
5: 0x5 EXOKAY
6: 0x4 OKAY
7: 0x5 EXOKAY
8: 0x4 OKAY
9: 0x7 DECERR
10: 0x6 SLVERR
11: 0x7 DECERR
12: 0x6 SLVERR
14: 0x4 OKAY
15: 0x5 EXOKAY
16: 0x4 OKAY
17: 0x7 DECERR
18: 0x4 OKAY
19: 0x5 EXOKAY
20: 0x4 OKAY
21: 0x5 EXOKAY
22: 0x4 OKAY
23: 0x5 EXOKAY
24: 0x4 OKAY
25: 0x5 EXOKAY
26: 0x4 OKAY
28: 0x4 OKAY
29: 0x5 EXOKAY
30: 0x4 OKAY
31: 0x7 DECERR
32: 0x6 SLVERR
33: 0x7 DECERR
34: 0x4 OKAY
35: 0x5 EXOKAY
36: 0x6 SLVERR
37: 0x7 DECERR
38: 0x4 OKAY
39: 0x5 EXOKAY
40: 0x4 OKAY
42: 0x4 OKAY
43: 0x5 EXOKAY
44: 0x6 SLVERR
45: 0x5 EXOKAY
46: 0x4 OKAY
47: 0x5 EXOKAY
48: 0x4 OKAY
49: 0x5 EXOKAY
50: 0x4 OKAY
51: 0x5 EXOKAY
52: 0x4 OKAY
53: 0x5 EXOKAY
54: 0x4 OKAY
56: 0x4 OKAY
57: 0x5 EXOKAY
58: 0x6 SLVERR
59: 0x5 EXOKAY
60: 0x6 SLVERR
61: 0x5 EXOKAY
62: 0x4 OKAY
63: 0x5 EXOKAY
64: 0x4 OKAY
65: 0x5 EXOKAY
66: 0x4 OKAY
67: 0x5 EXOKAY
68: 0x6 SLVERR
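The listing above can be reproduced with a few lines. This sketch assumes the two BRESP bits sit at bits 2 and 3 of each entry (matching ARM_C0_BRESP1 and ARM_C0_BRESP2) and uses the standard 2-bit AXI response encoding; what the hardware actually does with the table is still unknown. As pasted, the table has 70 entries, and the zero entries are skipped in the output.

```python
ARM_C0_BRESP1 = 0x00000004
ARM_C0_BRESP2 = 0x00000008

AXI_RESP = ["OKAY", "EXOKAY", "SLVERR", "DECERR"]  # 2-bit AXI response codes

G_BRESP_TAB = [
    0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x18, 0x0,
    0x10, 0x14, 0x10, 0x1C, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
    0x10, 0x14, 0x10, 0x1C, 0x18, 0x1C, 0x10, 0x14, 0x18, 0x1C, 0x10, 0x14, 0x10, 0x0,
    0x10, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x0,
    0x10, 0x14, 0x18, 0x14, 0x18, 0x14, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x18, 0x0,
]

def decode(entry):
    """Split one table entry into (entry >> 2, AXI response name)."""
    resp = (entry & (ARM_C0_BRESP1 | ARM_C0_BRESP2)) >> 2
    return entry >> 2, AXI_RESP[resp]

for index, entry in enumerate(G_BRESP_TAB):
    if entry == 0:
        continue  # zero entries are skipped, as in the listing above
    shifted, name = decode(entry)
    print("%d: 0x%x %s" % (index, shifted, name))
```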
What I find especially strange/cursed, is that, if you mask serror, the write succeeds, even though the peripheral returns a SLVERR.....
i think the write isnt succeeding
and the error is telling you so
but its an async error, and can happen a dozen opcodes late
But how were you driving the display list?
one of the DSI ports has this same problem, and the rpi engineers never did fix it
so they have an RPC to forward every read/write
My guess is that this is the reason why they use RPC for calls to the VPU too
The RPi engineers can't fix something that's baked into the silicon
exactly
if the VPU has been designed in such a way so as to refuse writes from the ARM peripheral
Then, it is how it is
i assume for the DSI, its just hard-wired like that
but clearly, its a config switch on the HVS
There's some sort of peripheral inside the silicon driving the DSI, right?
But wait, you said that from the proprietary firmware, writes to the VPU from the ARM peripheral succeed? O.o
yeah
if you boot the closed firmware, and then write to the HVS from arm, it just works
thats how the kms drivers in linux work
then it's probably something like that
yep
You just went and picked the strangest and most undocumented platform to work with
😄
Bizzaro world where phrases such as "The ARM Peripheral" become normal
The Twilight Pi
i think the older videocore SoC's, just lacked an arm entirely
the VPU ran the entire show
and one of the rpi engineers has said, "lets throw an arm core in there, we might need it some day"
and from the design, it seems to pre-date or not trust the arm secure vs non-secure stuff
so they are using the VPU as the master, and ARM as the untrusted slave
I don't think it was the Pi Foundation's idea
it pre-dates the foundation
Most probably they just got a broadcom part with an ARM core
(Or the pi engineers' for that matter)
one of the broadcom engineers picked the bcm2835 to make the rpi, and start the foundation
and the foundation was started by a group of broadcom employees
if i remove the ARM_C0_FULLPERI flag, then the arm can still access the UART, but it cant access the clock
that seems like some major isolation
perfect for a hostile application
Yeah, my guess is that it was a preexisting IC. I doubt somebody went and started what sounds like essentially a skunkworks project by designing custom silicon as a first step
:P
yep
the arm core being added, was a broadcom employee idea, and they later became an rpi engineer
TIL debian 12 was released 2 months ago.
I must dist-upgrade
</offtopic>
Re: Your AXI woes: What if you asked in the raspi forums?
ive asked the rpi engineers repeatedly, as to why i'm getting slave error
the answer is always power domains, "the hvs isnt on"
an open source start.elf, demoing both the 2d and 3d cores at the same time
-rwxr-xr-x 1 root root 2.0M Sep 7 00:52 start.elf
entire program and image data is all in an open source start.elf
cpu is running at 216mhz on an rpi1
cpu usage is very low, due to hw acceleration
ssuuuurrreeee, thats definitely offf
and as usual, they only post a single reply to the thread, and then go silent
maybe there's something specific you need to do to fully bring it up?
not sure what else there would be to turn on
AXI interconnect related?
hmm... I guess there's some permissions register?
yep
so, i have ~3 solutions to this
1: keep searching, until i find the magic permission register
2: implement an RPC like DSI has, so the linux driver can relay every write thru the firmware
3: implement my own fkms, and use my custom display stack
2 has the best chances of getting hdmi
but 2&3 mean you need a custom build of linux to get the new drivers
but now that PLLH is up, i could take another stab at hdmi bringup
i have an old platform/bcm28xx/vc4-hdmi i started, what happens if i turn that on....
fixed the compile errors, and nothin, but thats to be expected
it only writes to 3 registers
now i get to read this, and try to figure out the hw....
so most of the hdmi runs off the HSM clock, which ive set to 100mhz
but the PHY directly uses PLLH_PIX
found a nice reset function, time to implement!
vc4_hdmi_set_timings() looks interesting
ok, now what timings does my display actually need
boots stock raspios
root@raspberrypi:/sys/kernel/debug/dri/0# cat hdmi_regs
HDMI_VERTA0 = 0x00302400
HDMI_VERTA1 = 0x00302400
HDMI_VERTB0 = 0x00000026
HDMI_VERTB1 = 0x00000026
HDMI_HORZA = 0x00006500
HDMI_HORZB = 0x0f81c030
the raw register values
root@raspberrypi:/sys/kernel/debug/dri/0# cat state
crtc[101]: pixelvalve-2
mode: "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x68 0x5
the modeline
https://gist.github.com/cleverca22/bfcd710475ccf277b817bb73473dac61
Detailed Timing Descriptors:
DTD 1: 1280x1024 60.020 Hz 5:4 63.981 kHz 108.000 MHz (376 mm x 301 mm)
Hfront 48 Hsync 112 Hback 248 Hpol P
Vfront 1 Vsync 3 Vback 38 Vpol P
the timing from EDID
VERTA: 0x302400
yep, matches
VERTB: 0x26
good
/* Horizontal pack porch (htotal - hsync_end). */
# define VC4_HDMI_HORZB_HBP_MASK
somebody made the same typo, repeatedly, in the linux src
HORZA: 0x6500
HORZB: 0xf81c030
i now have all of the timing params, and can program the hdmi block...
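A sketch of unpacking the raw register values back into timings. The bit positions are recalled from the linux vc4 register header rather than copied from it, so treat the layout as an assumption; it does reproduce the EDID numbers above when fed the dumped values. `field` and `decode_timings` are made-up names.

```python
def field(value, high, low):
    """Extract bits high..low (inclusive) from a register value."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)

def decode_timings(verta, vertb, horza, horzb):
    """Unpack the HDMI timing registers into named fields (assumed layout)."""
    return {
        "vactive": field(verta, 12, 0),
        "vfront": field(verta, 19, 13),
        "vsync": field(verta, 24, 20),
        "vback": field(vertb, 8, 0),
        "hactive": field(horza, 12, 0),
        "hpol_positive": bool(field(horza, 13, 13)),
        "vpol_positive": bool(field(horza, 14, 14)),
        "hfront": field(horzb, 9, 0),
        "hsync": field(horzb, 19, 10),
        "hback": field(horzb, 29, 20),
    }

# Values from the hdmi_regs dump under stock raspios.
timings = decode_timings(0x00302400, 0x00000026, 0x00006500, 0x0F81C030)
for name, value in timings.items():
    print(name, value)
```

Under that layout the dump decodes to 1280x1024 with Hfront 48, Hsync 112, Hback 248, Vfront 1, Vsync 3, Vback 38, and both polarities positive.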
registers programmed, but no image
vc4_hdmi_encoder_pre_crtc_configure() calls what i just implemented, and the other things it does are likely useful
/*
* As stated in RPi's vc4 firmware "HDMI state machine (HSM) clock must
* be faster than pixel clock, infinitesimally faster, tested in
* simulation. Otherwise, exact value is unimportant for HDMI
* operation." This conflicts with bcm2835's vc4 documentation, which
* states HSM's clock has to be at least 108% of the pixel clock.
*
* Real life tests reveal that vc4's firmware statement holds up, and
* users are able to use pixel clocks closer to HSM's, namely for
* 1920x1200@60Hz. So it was decided to have leave a 1% margin between
* both clocks. Which, for RPi0-3 implies a maximum pixel clock of
* 162MHz.
*
* Additionally, the AXI clock needs to be at least 25% of
* pixel clock, but HSM ends up being the limiting factor.
*/
@kindred raptor a big juicy comment that nicely explains things!
Ah, you're bringing up hdmi now?
trying to
> (48+112+248+1280) * (1+3+38+1024)
1,799,408
> (48+112+248+1280) * (1+3+38+1024) * 60.02
108,000,468.16000001
each frame has ~1.8 million pixels, and at 60.02fps, thats ~108mhz pixel clock
which agrees with EDID
so, HSM has to be at least 108mhz
clock_set_hsm(MHZ_TO_HZ(100), 5);
that is not
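The same arithmetic as a sketch, with the two rules from the linux comment applied: a 1% margin for HSM over the pixel clock, and AXI at 25% or more of the pixel clock. The function name is illustrative, and the margins are only as good as that comment.

```python
def pixel_clock_hz(hactive, hfront, hsync, hback,
                   vactive, vfront, vsync, vback, refresh_hz):
    """Pixel clock = total pixels per frame (including blanking) * refresh rate."""
    htotal = hactive + hfront + hsync + hback
    vtotal = vactive + vfront + vsync + vback
    return htotal * vtotal * refresh_hz

# 1280x1024 timings from the EDID above.
pix = pixel_clock_hz(1280, 48, 112, 248, 1024, 1, 3, 38, 60.02)

hsm_min = pix * 1.01  # 1% margin, per the quoted comment
axi_min = pix * 0.25  # 25% rule, per the quoted comment

hsm_current = 100e6   # what HSM was set to at this point in the chat
print("pixel clock: %.3f MHz" % (pix / 1e6))
print("HSM needs at least %.2f MHz, 100 MHz ok: %s" % (hsm_min / 1e6, hsm_current >= hsm_min))
print("AXI needs at least %.2f MHz" % (axi_min / 1e6))
```

With a 100 MHz HSM the check fails, which is why the clock has to be raised before this mode can work.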
clock_set_hsm(MHZ_TO_HZ(125), PERI_PLLC_PER);
ref: 500000000, target: 125000000, divisor(f): 4.000000, divisor(fixed): 0x4000
] measure_clocks
clock #22(hsm) is 125000000
perfect!
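The divisor line in that output looks like a fixed-point value with 12 fractional bits (4.0 becomes 0x4000). That width is inferred from the printed numbers, not taken from the clock driver source, and `clock_divisor` is a hypothetical helper.

```python
FRAC_BITS = 12  # assumption: 12 fractional bits, consistent with 4.0 -> 0x4000

def clock_divisor(ref_hz, target_hz):
    """Return (float divisor, fixed-point divisor) for dividing ref down to target."""
    div = ref_hz / target_hz
    return div, round(div * (1 << FRAC_BITS))

div, fixed = clock_divisor(500_000_000, 125_000_000)
print("divisor(f): %f, divisor(fixed): 0x%x" % (div, fixed))
```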
and the axi clock (core? not sure) needs to be at least 27mhz, easy
oh, and interesting....
if the axi clock has to be 25% of the pixel clock
and every pixel is 4 bytes
then that would imply its moving 128 bits over axi?
but that also implies the axi is being loaded constantly, by every pixel displayed
maybe they mean the HVS load....
clk_set_rate(vc4_hdmi->pixel_clock, pixel_rate);
ok which clock is this.....
vc4_hdmi->pixel_clock = devm_clk_get(dev, "pixel");
the "pixel" clock in device-tree
clocks = <&firmware_clocks 9>,
<&firmware_clocks 13>;
clock-names = "pixel", "hdmi";
which is firmware clock 9
firmware_clocks: clocks {
compatible = "raspberrypi,firmware-clocks";
#clock-cells = <1>;
};
RPI_FIRMWARE_PIXEL_CLK_ID
i'm just going to blindly assume, its PLLH_PIX
so, if i want that at 108mhz, and the PLL bottoms out at 600
> 108*6
648
then i need 648mhz and /6
that seems to have worked
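the divider pick above, as a tiny C sketch. the 600mhz floor is just the number from this chat, and the function names are hypothetical. the real PLLH setup has more knobs than this, so treat it as the arithmetic only:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest integer divider that keeps the PLL at or above its minimum rate. */
uint32_t pll_divider_for(uint64_t pixel_hz, uint64_t pll_min_hz) {
    assert(pixel_hz > 0);
    uint64_t div = (pll_min_hz + pixel_hz - 1) / pixel_hz; /* ceiling division */
    return div == 0 ? 1 : (uint32_t)div;
}

/* PLL rate that results from that divider. */
uint64_t pll_rate_for(uint64_t pixel_hz, uint64_t pll_min_hz) {
    return pixel_hz * pll_divider_for(pixel_hz, pll_min_hz);
}
```

for a 108mhz pixel clock and a 600mhz floor, that gives /6 and 648mhz, same as the manual math.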
FWIW, the AXI bus does define a global clock signal (ACLK)
ive not encountered ACLK anywhere in the rpi source
so i'm not sure which divider its running on
if (pixel_rate > 297000000)
bvb_rate = 300000000;
else if (pixel_rate > 148500000)
bvb_rate = 150000000;
else
bvb_rate = 75000000;
ret = clk_set_min_rate(vc4_hdmi->pixel_bvb_clock, bvb_rate);
Ah, it's just a generic signal name. I doubt you'll find it in the source :P
curious, what is the bvb clock....
It's the term the protocol spec uses (ACLK)
any idea about the BRESP stuff?
It's just a line that provides the write response
but what about the bresp table?
I guess there's a register with all bresp somewhere?
it writes all of those BRESP values into a single register, with a short delay after each write
Where's this code even from?
lines 33-56
#define ARM_C0_BRESP1 0x00000004
#define ARM_C0_BRESP2 0x00000008
#define ARM_C0_BOOTHI 0x00000010
you can use these defines, to decode that table
this runs on the arm side?
that runs on the VPU side, before the arm is enabled
Ah, so ARM control0 is not the ARM control register?
its a control register for the "arm peripheral"
involved in configuring and turning it on
#define ARM_C0_AARCH64 0x00000200
if you set this bit in control0, the arm starts in aarch64 mode
if that bit is clear, it starts in armv7 mode
I wonder how they derived that table....
thats a neat compatibility thing, bcm2836 era firmware (pi2) could run on a bcm2837 (pi3), and not even be aware of the aarch64 core
and it will just run in armv7 mode
Well, it looks like it's... writing stuff to said control register, and updating it based on how the arm core previously responded
i dont think its even caring how the arm responds
its just blasting a bit array of 3bit ints at the arm
my best guess, its an array of pre-defined BRESP answers, for various ranges of memory
so if you step out of line, you get an error from this table
my theory, is that the arm peripheral will take this list, treating ARM_CONTROL0 like a FIFO, and store it on some internal ram
and for every access to ram, it looks up the right slot in this array, and returns the BRESP value in that slot
and that then lets the VPU firmware configure what you can and cant do from the arm
for example, one index in that array, could be the BRESP for any write to the HVS
ARM_CONTROL0 is defined as a register in the headers
yep
And other things with the prefix ARM_C0 map to things that should be memory-mapped registers
#define ARM_CONTROL0 HW_REGISTER_RW(ARM_BASE+0x000)
#define ARM_C0_SIZ128M 0x00000000
#define ARM_C0_SIZ256M 0x00000001
#define ARM_C0_SIZ512M 0x00000002
#define ARM_C0_SIZ1G 0x00000003
#define ARM_C0_BRESP0 0x00000000
#define ARM_C0_BRESP1 0x00000004
#define ARM_C0_BRESP2 0x00000008
#define ARM_C0_BOOTHI 0x00000010
HW_REGISTER_RW will cast the int into a pointer, and then de-reference the pointer
so you can just do ARM_CONTROL0 = 0x123 to write to MMIO
and foo = ARM_CONTROL0 to read from MMIO
ARM_C0_* are then constants for various flags within that register
#define ARM_C0_JTAGMASK 0x00000E00
#define ARM_C0_JTAGOFF 0x00000000
#define ARM_C0_JTAGBASH 0x00000800 // Debug on GPIO off
#define ARM_C0_JTAGGPIO 0x00000C00 // Debug on GPIO on
JTAGMASK is all bits in the jtag enum
JTAGOFF disables jtag
JTAGBASH allows bit-banging jtag from the vpu
JTAGGPIO allows jtag on the gpio header
Sorry, I misread the C code
And imagined a macro somewhere that did a read from said addresses
// ARM JTAG BASH
//
#define AJB_BASE 0x7e2000c0
#define AJBCONF HW_REGISTER_RW(AJB_BASE+0x00)
#define AJB_BITS0 0x000000
#define AJB_BITS4 0x000004
#define AJB_BITS8 0x000008
...
#define AJB_ENABLE 0x000800
#define AJB_HOLD0 0x000000
#define AJB_HOLD1 0x001000
#define AJB_HOLD2 0x002000
#define AJB_HOLD3 0x003000
#define AJB_RESETN 0x004000
#define AJB_CLKSHFT 16
#define AJB_BUSY 0x80000000
#define AJBTMS HW_REGISTER_RW(AJB_BASE+0x04)
#define AJBTDI HW_REGISTER_RW(AJB_BASE+0x08)
#define AJBTDO HW_REGISTER_RW(AJB_BASE+0x0c)
this appears to be a hw accelerated jtag bit-banging peripheral
so you can wiggle the jtag lines of the arm core directly from the VPU, without involving external pins
yeah, for some reason I thought they were pointers to some address somewhere :(
#define HW_REGISTER_RW(addr) (*(volatile unsigned int *)(addr))
the missing piece of the magic
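a small host-side sketch of how that macro behaves. the macro shape and the ARM_C0_* values are the ones quoted above; fake_control0 and FAKE_ARM_CONTROL0 are hypothetical stand-ins so this can run on a normal pc instead of poking real MMIO:

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the firmware macro: cast an address to a pointer, then dereference it. */
#define HW_REGISTER_RW(addr) (*(volatile uint32_t *)(addr))

#define ARM_C0_SIZ1G   0x00000003u
#define ARM_C0_AARCH64 0x00000200u

/* Hypothetical stand-in for the real register, so no hardware is needed. */
static uint32_t fake_control0;
#define FAKE_ARM_CONTROL0 HW_REGISTER_RW((uintptr_t)&fake_control0)

uint32_t demo_control0(void) {
    FAKE_ARM_CONTROL0 = ARM_C0_SIZ1G;     /* a plain assignment becomes a store */
    FAKE_ARM_CONTROL0 |= ARM_C0_AARCH64;  /* read-modify-write to set one flag */
    return FAKE_ARM_CONTROL0;             /* a plain use becomes a load */
}
```

on real hardware the address would be the register's bus address instead of a local variable.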
VC4_HD_VID_CTL_ENABLE
that looks like the master hdmi enable flag
hidden under vc4_hdmi_encoder_post_crtc_enable()
So, ARM_CONTROL0's 2 bits left to the MSB are the BRESP?
According to those flags?
4(0b0100) is BRESP1
8(0b1000) is BRESP2
a C from the bresp table, is just 4|8, both
so if i assume BRESP1 is BRESP[0] and so on...
then 0=OKAY
4=EXOKAY
8=SLVERR
C=DECERR
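that guess as a C decoder. the OKAY/EXOKAY/SLVERR/DECERR order is the standard AXI encoding, but mapping ARM_C0_BRESP1 to BRESP[0] is only the assumption from this chat, and bresp_name is a made-up helper:

```c
#include <assert.h>
#include <stdint.h>

#define ARM_C0_BRESP_SHIFT 2     /* assumption: ARM_C0_BRESP1 (0x4) is BRESP[0] */
#define ARM_C0_BRESP_MASK  0x3u

/* Name the AXI write response encoded in bits 2-3 of a control0 value. */
const char *bresp_name(uint32_t control0) {
    static const char *const names[4] = { "OKAY", "EXOKAY", "SLVERR", "DECERR" };
    return names[(control0 >> ARM_C0_BRESP_SHIFT) & ARM_C0_BRESP_MASK];
}
```

other bits in the value (like BOOTHI at 0x10) are ignored by the mask.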
Silly guess, but, what if it's doing that song and dance and clearing the transaction status flag from whatever was previously done?
could be possible
Say that your previous write to the register was 0x00000010
If you had a SLVERR, EXOKAY or DECERR
i did notice, messing with that table has no effect on the pi3
You'd now have 0x00000004, 0x00000008, 0x0000000c
including just not writing the table entirely
So, by or'ing the previous value to the register, it's ensuring those bits get... cleared?
This seems completely pointless, since those should be just input lines
The VideoCore is the controller
Unless what you want to do is write those exact values, for some reason, and just want to clear the register
Some magic sequence they came up with?
ive got 2 theories
1: its programming a set of constant replies, each for a different chunk of ram
2: its forcing the BRESP bits temporarily, to flush any pending transactions
forcing the BRESP
Those are probably input lines from the ARM
Unless the videocore is an AXI slave to the ARM
i think its both
there is an axi slave port on the arm, for these control registers
and there is an axi master port on the arm, for it to do normal things to ram/mmio
the arm axi master, goes thru a custom MMU, that mangles some bits of the addr first, and then goes into the main interconnect
and the arm axi slave, is always on the interconnect
that custom mmu, is configured via the arm axi slave, along with other things
Then this code is not very good, it expects the transactions to come after the delay they have set
Does not check them at all
Forces the bresp bits
Clears the past bresp bits
And forces them again
Without checking at all what comes in
I have no idea with what kind of blind luck they came up with the values
my 2nd theory, is that the arm has outstanding axi transactions, possibly from before the axi master was enabled
and it will jam up if it doesnt get a reply
so its forcing some fake replies, to flush the pending ones
but other axi masters have a proper flush flag
no clue
something called BOOTHI
I wonder what'd happen if you messed with the PLL before doing that song and dance
it reeks of "oh, we forgot about clock domains"
Does that code affect other pis other than pi3?
ive not tried messing with BRESP on the other models
void gpio_apply_batch(struct gpio_pull_batch *batch) {
for (enum pull_mode mode = 0; mode <=2; mode++) {
if (batch->bank[mode][0] | batch->bank[mode][1]) {
*REG32(GPIO_GPPUD) = mode;
udelay(500);
*REG32(GPIO_GPPUDCLK0) = batch->bank[mode][0];
*REG32(GPIO_GPPUDCLK1) = batch->bank[mode][1];
udelay(500);
*REG32(GPIO_GPPUDCLK0) = 0;
*REG32(GPIO_GPPUDCLK1) = 0;
*REG32(GPIO_GPPUD) = 0;
udelay(500);
}
}
}
o.O
offset name
0x94 GPPUD GPIO pin pull up/down enable
0x98 GPPUDCLK0 GPIO pin pull up/down enable clock 0
0x9c GPPUDCLK1 GPIO pin pull up/down enable clock 1
is it... pulsing the clock?
GPPUD, master pull-up/pull-down enable, see further notes
GPPUDCLKn, gpio pullup/down clock enable
to change the pullup config:
- write the desired mode to GPPUD (off=0, down=1, up=2)
- delay for 150 clock cycles
- write a 1 to the bits of GPPUDCLKn that correspond to GPIO pins you want to modify the state of
- delay for another 150 clock cycles
- write a zero to GPPUD
- write 0 to GPPUDCLKn
yes
Who wrote that piece of doc?
i did
How did you even figure that out :P
Ah, there's official doc?
each bit in GPPUDCLK0 and GPPUDCLK1 seems to directly wire to a 2bit latch, that isnt in any clock domain
and GPPUD goes to the input of all 64 latches
to set the pulls, you use GPPUD to present the desired mode (off, up, down), to every latch
then you use GPPUDCLK0 and GPPUDCLK1 to manually strobe the write enable on some of the latches
page 89
those up/down regs, seem to be raw flip-flops, without any clock domain
each bit of GPPUDCLK0 is driving the clock of each flip-flop in the gpio 0-31 range
...ok.
I wonder why they didn't just give them the same clock as the CPU and use a clock enable pin
Like Normal People would :P
official doc, from the datasheet
and from those bit values, i feel like bit0 is directly linked to the pull-down flipflop
and bit1, the pullup flipflop
and 11 would set both pulls and start a fight!
they did fix things in the pi4:
for the bcm2711/rpi4:
pullup control signal is 2 bits wide (mask of 0x3)
the register offset (which register) is gpio/16
the bit-shift within that register is (gpio % 16) * 2
looks to be AVR style, just a giant block of bits, just like the function select ones
2 bits per pin, off=0, up=1, down=2, 16 pins per 32bit reg, 4 registers in total?
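a sketch of that bcm2711 layout in C. the register count and encoding are the ones from these notes, the function names are hypothetical, and it works on a plain array so it can be tried on a host. on real hardware regs would point at the pull registers in the gpio block:

```c
#include <assert.h>
#include <stdint.h>

enum pull_2711 { PULL_OFF = 0, PULL_UP = 1, PULL_DOWN = 2 };

#define PULL_REGS 4   /* 16 pins per 32-bit register, 2 bits per pin */

/* Read-modify-write the 2-bit pull field for one gpio. */
void set_pull_2711(volatile uint32_t regs[PULL_REGS], unsigned gpio,
                   enum pull_2711 mode) {
    assert(gpio < PULL_REGS * 16);
    unsigned index = gpio / 16;          /* which register */
    unsigned shift = (gpio % 16) * 2;    /* where in that register */
    uint32_t v = regs[index];
    v &= ~(UINT32_C(3) << shift);
    v |= (uint32_t)mode << shift;
    regs[index] = v;
}

/* The 2711 pulls are readable, so the current mode can be fetched back. */
enum pull_2711 get_pull_2711(const volatile uint32_t regs[PULL_REGS],
                             unsigned gpio) {
    assert(gpio < PULL_REGS * 16);
    return (enum pull_2711)((regs[gpio / 16] >> ((gpio % 16) * 2)) & 3);
}
```

no strobing or delays needed, unlike the 283x sequence.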
the 2711 datasheet
the other major difference, is that 283x pulls, are write-only
but the bcm2711 pulls, are r/w
] gpio_dump_state
GPIO00 IN HIGH | HIGH IN GPIO32
GPIO01 IN HIGH | HIGH IN GPIO33
GPIO02 IN HIGH | LOW IN GPIO34
GPIO03 IN HIGH | HIGH IN GPIO35
GPIO04 IN HIGH | HIGH IN GPIO36
GPIO05 IN HIGH | HIGH IN GPIO37
GPIO06 IN HIGH | HIGH IN GPIO38
GPIO07 IN HIGH | HIGH IN GPIO39
GPIO08 IN HIGH | LOW IN GPIO40
GPIO09 IN LOW | LOW IN GPIO41
GPIO10 IN LOW | LOW ALT0 GPIO42
GPIO11 IN LOW | LOW IN GPIO43
GPIO12 IN LOW | HIGH IN GPIO44
GPIO13 IN LOW | HIGH IN GPIO45
GPIO14 ALT0 HIGH | HIGH IN GPIO46
GPIO15 ALT0 HIGH | HIGH IN GPIO47
GPIO16 IN LOW | LOW IN GPIO48
GPIO17 IN LOW | LOW IN GPIO49
GPIO18 IN LOW | LOW IN GPIO50
GPIO19 IN LOW | LOW IN GPIO51
GPIO20 IN LOW | LOW IN GPIO52
GPIO21 IN LOW | LOW IN GPIO53
GPIO22 IN LOW | LOW GPIO54
GPIO23 IN LOW | LOW GPIO55
GPIO24 IN LOW | LOW GPIO56
GPIO25 IN LOW | LOW GPIO57
GPIO26 IN LOW | LOW GPIO58
GPIO27 IN LOW | LOW GPIO59
GPIO28 IN LOW | LOW GPIO60
GPIO29 OUT HIGH | LOW GPIO61
GPIO30 IN LOW | LOW GPIO62
GPIO31 IN LOW | LOW GPIO63
i also have this debug cmd, that can print every pin state
where's this from?
on the bcm2711, it also has an arrow, for current pull direction
the official datasheet for the 2711
And this one?
the 2835 datasheet
both datasheets are guilty of lying by omission
2835 claims gpio 0-27 alt2, is a reserved mode
https://elinux.org/RPi_BCM2835_GPIOs
this wiki reveals that 0-27 alt2 is DPI, a very easy to use display mode
the datasheet entirely omits gpio 0-15 alt3/alt4
the wiki says alt3 is AVEOUT, and alt4 is AVEIN
which appears to be a 12bit parallel video in/out port
the datasheet claims gpio 46-53 alt0 are "internal"
the wiki reveals that 46/47 alt0 are i2c
and 48-53 alt0 are an SD interface
all very useful things to know
hmmm, i wrote to all of the hdmi control regs, i think its enabled, but my monitor is having zero reaction to it
1584 vc4_hdmi_recenter_fifo(vc4_hdmi);
1585 vc4_hdmi_enable_scrambling(encoder);
unless, its one of these?
recenter never finishes
feels like a clock is missing
HDMI_FIFO_CTL: 0x64627573
bit 5, fifo reset, is still set...
> Buffer("64627573","hex").toString("ascii")
'dbus'
wait, thats not right
] hexdump 0x7e808000 128
0x7e808000 69 6d 64 68 69 6d 64 68 69 6d 64 68 f0 00 00 00 |imdhimdhimdh....|
0x7e808010 69 6d 64 68 20 04 00 00 01 01 01 01 00 00 00 00 |imdh ...........|
0x7e808020 69 6d 64 68 00 00 00 00 69 6d 64 68 00 00 00 00 |imdh....imdh....|
0x7e808030 69 6d 64 68 69 6d 64 68 00 00 00 c0 69 6d 64 68 |imdhimdh....imdh|
0x7e808040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x7e808050 00 00 00 00 00 00 00 00 00 00 00 00 69 6d 64 68 |............imdh|
0x7e808060 69 6d 64 68 69 6d 64 68 a5 01 00 00 69 6d 64 68 |imdhimdh....imdh|
0x7e808070 69 6d 64 68 69 6d 64 68 69 6d 64 68 69 6d 64 68 |imdhimdhimdhimdh|
] hexdump 0x7e902000 128
0x7e902000 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902010 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902020 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902030 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902040 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902050 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902060 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
0x7e902070 73 75 62 64 73 75 62 64 73 75 62 64 73 75 62 64 |subdsubdsubdsubd|
@kindred raptor aha, this is what it looks like, when an axi slave is disabled!
writes silently ignored, reads return a 32bit constant
and because of LE vs BE
the first block is hdmi
the second block is dbus
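a quick C sketch for decoding those marker words, so the hexdump byte order and the register value line up. both helper names are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Write the four bytes of a register word as text, most significant byte first. */
void word_to_tag(uint32_t word, char out[5]) {
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (24 - 8 * i)) & 0xff);
    out[4] = '\0';
}

/* Assemble a little-endian word from bytes in hexdump (memory) order. */
uint32_t le_bytes_to_word(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

feeding it 0x64627573 gives "dbus", and the bytes 69 6d 64 68 from the first hexdump come out as "hdmi".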
I feel returning SLVERR, and then letting the controller decide what to do is a More Right thing to do
probably
so the problem then, is that hdmi is still disabled at some level
] hexdump 0x7e910000 128
0x7e910000 80 00 00 08 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x7e910010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x7e910020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x7e910030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x7e910040 00 00 00 00 00 00 00 00 30 65 76 61 30 65 76 61 |........0eva0eva|
0x7e910050 30 65 76 61 30 65 76 61 30 65 76 61 30 65 76 61 |0eva0eva0eva0eva|
0x7e910060 30 65 76 61 30 65 76 61 30 65 76 61 30 65 76 61 |0eva0eva0eva0eva|
0x7e910070 30 65 76 61 30 65 76 61 30 65 76 61 30 65 76 61 |0eva0eva0eva0eva|
it will also return that 32bit constant, for any undefined register
in here, i can see an 0x08000080 at the first slot, a bunch of nulls, and then ave0 repeating
other headers refer to this as AVE_IN_BASE, which implies its part of the 12bit parallel video capture interface
^
time for some lunch, bbl
back
had an idea, on another thing to probe
do the same hexdump, under linux and the official firmware
and see how the hdmi block differs
root@raspberrypi:~# /home/clever/rpi-tools/utils/ramdumper -m -a 0x3f808000 -l 128
starting at 0x3f808000 (1016MB)
0x3f808000 69 6d 64 68 69 6d 64 68 69 6d 64 68 01 02 00 00 |imdhimdhimdh....|
0x3f808010 69 6d 64 68 20 04 00 00 01 01 01 01 00 00 00 00 |imdh ...........|
0x3f808020 69 6d 64 68 00 00 00 00 69 6d 64 68 00 00 00 00 |imdh....imdh....|
0x3f808030 69 6d 64 68 69 6d 64 68 00 00 00 c0 69 6d 64 68 |imdhimdh....imdh|
0x3f808040 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ...............|
0x3f808050 00 00 00 00 00 00 00 00 00 00 00 00 69 6d 64 68 |............imdh|
0x3f808060 69 6d 64 68 69 6d 64 68 04 54 00 00 69 6d 64 68 |imdhimdh.T..imdh|
0x3f808070 69 6d 64 68 69 6d 64 68 69 6d 64 68 69 6d 64 68 |imdhimdhimdhimdh|
yep, i can read that first block as before
root@raspberrypi:~# /home/clever/rpi-tools/utils/ramdumper -m -a 0x3f902000 -l 1024
starting at 0x3f902000 (1017MB)
0x3f902000 00 06 00 00 00 00 00 00 06 00 00 00 00 00 00 00 |................|
0x3f902010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902040 00 00 00 00 00 00 00 00 00 00 00 80 00 34 d0 9c |.............4..|
0x3f902050 00 10 00 00 80 00 13 00 00 00 00 00 41 40 00 00 |............A@..|
0x3f902060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
0x3f902090 88 c6 fa 00 03 00 00 00 00 00 00 00 03 04 00 21 |...............!|
0x3f9020a0 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 |................|
0x3f9020b0 00 00 00 00 f8 24 01 01 f8 24 01 01 83 00 00 00 |.....$...$......|
0x3f9020c0 28 b0 0c 00 00 65 00 00 30 c0 81 0f 00 24 30 00 |(....e..0....$0.|
0x3f9020d0 26 00 00 00 00 24 30 00 26 00 00 00 00 00 00 00 |&....$0.&.......|
0x3f9020e0 00 00 00 00 00 00 00 00 fc ff 00 00 d5 63 8d 50 |.............c.P|
0x3f9020f0 5c 6f 82 96 be c3 d4 ea f5 ff 4c 00 00 00 00 00 |\o........L.....|
but the hdmi block, returns proper data, not dbus
@kindred raptor the VPU is a dual-core processor, with L1 and L2 caches
it has 32 scalar registers, of 32bits each
and 8x8x8bit of vector registers
much like arm, the top few scalar registers have a special purpose
r25 is sp
r26 is lr (link register)
r30 is the status register
r31 is the pc
(Actually, even the nintendo 64 MIPS CPU thing had a cache. I do not know why I expected the videocore not to have one. lol.)
out of reset, it will be executing a maskrom
that rom will zero out the entire L2 cache using vector writes
efficient bringup ROM :P
i think its due to the cache-as-ram hack
if you write to an entire cacheline at once, the write just goes into the cache, and never bothers dram
if you get a cache hit,(read or write) it just works, and never bothers dram
if you cache miss, it goes to dram, and 💥 wheres the ram?? 😄
to prevent 💥 , you need to pre-fill the entire L2 cache with nulls
so when a read happens anywhere, its a cache hit
that rom will then try to load bootcode.bin into the L2 cache, from one of ~8 sources
1/2/4 are all bootcode.bin on fat on SD
but they differ in which sd controller is used, and if its 4bit or 8bit mode
3 is raw nand flash
5 is SPI flash
6 on the pi0-pi2, is usb-device
6 on the pi3/pi02, can be device, host, or both
7 is i2c-slave
@kindred raptor the bcm2835 is weird, in that the arm lacks a dedicated L2 cache!
so the VPU L2 cache is sort of loaned to the arm
free cache coherence! :>
somewhat, you still had to manage the arm L1
and there was more latency, vs a proper arm L2
linux has also never been aware of that VPU L2 cache
its just transparently treated as normal ram
@leaden junco let me re-read the bcm2711 datasheet on interrupts, and see what it says, and if i missed something....
(👋)
so, top-left, we have the arm generic timer, and its 5 timer irq's, repeated for each core
Is this the peripherals datasheet?
yep
then we have the ARM_LOCAL block, which i think is a broadcom custom peripheral, that is only visible to the arm, and it has 19 irq's
then we have the ARMC block, which has the VPU<->ARM mailboxes, and some of the legacy irq stuff from the bcm2835
then you have 62 peripheral interrupts from the VPU
plus 2 irq's from the ethernet/pcie
and then 57??? irq's from the ethernet/pcie, lol
all of that funnels into the gic, which masks/routes things, and generates an IRQ+FIQ pair for each core
things like irq 57 in this chart, are what ruin micro-kernels on the bcm2711
you cant route ttyAMA0 to one process, and ttyAMA1 to another process using standard GIC code
you have to taint your micro-kernel and make it a little less micro, by adding support for the non-standard irq handling
it kind of defeats the point of using a GIC, and shows that broadcom didnt fully trust the GIC when designing the chip
ah, and this is the "57" interrupts from ethernet/pcie
4 to emulate legacy pci
1 for pcie msi
2 for ethernet
1 for the internal xhci i think?
no clue what avs is
that secure ethernet irq is fishy
it smells of IPMI
The secure IRQ output (which is only useful for the VPU and not the CPU) from the ETH_PCIe block is routed to VC
peripheral IRQ 63
like the VPU can configure genet to send certain packets to the VPU, and the arm/vpu can share the genet
it then sort of repeats the info, but this time using GIC lingo
the legacy irq/fiq and all of the per-core timers, get routed to PPI's (per processor interrupts)
the PMU (profiling) interrupts are not routed to PPI's, so the GIC can decide which core to interrupt when core0 has had too many L1 misses
which can limit the taint on the profiling data
and then all of the other irq sources are routed to normal SPI's
so this table is required, if you want to configure the gic correctly (if you were writing/porting a kernel)
@leaden junco i think i see where my misunderstanding started
the legacy irq controller, with its limited irq routing, is itself an irq source for the gic!
but the gic also gets all of the interrupts directly, and can then route those better
Great! So it is actually re-routing the interrupts and not just showing as such
Follow up question: how many PCIe interrupts does the SoC have? the MSI interrupt I think
the 4 legacy inta/intb/intc/intd from pci
plus a single msi interrupt
to figure out what the msi meant, you would then have to ask the pcie controller
I'm wondering if spreading the PCIe interrupts can have some performance increase for NVMe array
the gic treats MSI as a single interrupt source
Ah ok
so all MSI's have to go to a single cpu core
Because if I attach the NVMe drive I can see it is creating four queue interrupts
in theory, broadcom could have exposed the gic to pci-e dma, and then gic MSI's could have been used
i dont know if that works
oh yeah, thats a thing nvme does, each cpu core gets its own command/reply queue
so different cores can issue commands, without a global mutex
and thanks to my new 32core desktop, /proc/interrupts is now unreadable, lol
But the catch here is that on RPI they are all using CPU0
which makes no sense if it creates multiple queues
on my 8 core laptop, i can see that they are allocated a bit strangely
125: 0 0 3067684 0 0 0 0 0 PCI-MSI 2097152-edge nvme0q0
126: 0 0 0 0 128860067 0 0 0 PCI-MSI 2097153-edge nvme0q1
127: 0 63475567 0 0 0 0 0 0 PCI-MSI 2097154-edge nvme0q2
128: 0 0 63272082 0 0 0 0 0 PCI-MSI 2097155-edge nvme0q3
129: 0 0 0 63109892 0 0 0 0 PCI-MSI 2097156-edge nvme0q4
130: 0 0 0 0 0 63635747 0 0 PCI-MSI 2097157-edge nvme0q5
131: 0 0 0 0 0 0 63428409 0 PCI-MSI 2097158-edge nvme0q6
132: 0 0 0 0 0 0 0 60445457 PCI-MSI 2097159-edge nvme0q7
I was messing around with the kernel code to see if there is any performance difference
I guess the only issue I see is nvme0q0 is on the same CPU as nvme0q3
queues 0 and 3 go to core 2
queue 1 goes to core 4
queue 2 goes to core 1
queue 4 goes to core 3
queue 5 goes to core 5
queue 6 goes to core 6
queue 7 goes to core 7
the multiple queues serves 2 purposes
1: you can issue a command to the nvme, by just turning interrupts off, and writing to the queue for your current core, no need to get any locks
2: the reply goes back to the core that just scheduled the job, so the L1 cache is ready to resume whatever just asked for the data
This is the forum page about the topics
you're losing 2, but not 1
why are both queues on the same core and one is not utilized?
@kindred raptor oh, incase you missed it:
https://www.youtube.com/watch?v=4PQLIjj4i1I
i now have the hw sprites on the pi-zero working under circuitpython
https://github.com/adafruit/circuitpython/pull/8349 adds vc4 sprite support to circuitpython
code.py (in the PR) is the demo code
Nice! I guess you had to write some driver code in cython?
https://github.com/cleverca22/circuitpython/blob/broadcom-2d-accel/ports/broadcom/bindings/videocore/code.py has the python side of that demo
so you can see how simple it is to use
how about the C part?
https://github.com/adafruit/circuitpython/pull/8349 is the c side
only 2 functions have any real logic, one to generate the display list for a sprite, and one to copy it into the hw
the rest is all just allowing properties to flow both ways
and in theory, it can handle dual-monitor on the entire vc4 lineup
Yeah, I recognized the display list stuff
the biggest implementation difference from little-kernel, is the active sprite list
LK has its own z-order sorted array, with functions to add/remove sprites, which lets multiple modules share the hw seamlessly
while the circuitpython version, just expects a python list of sprites, and python code must keep track
circuitpython is far more single-threaded, so it doesnt really need that flexibility, and you can always add it easily in python
and it made the resulting code massively simpler
it also helped to not have any legacy code, and to basically start from a clean slate, with everything i learned from the previous version
tomorrow, i need to look into TileGrid objects, and see if they can fully replace my custom Sprite class
@dusk fiber A dumb question, do you know what the reserved memory blocks at
2eff2000-2effffff : reserved
and
01150000-0154ffff : reserved
01550000-0186ffff : Kernel data
are used for?
heh, i just happen to be writing a 2-3 page forum post on that!
i assume thats your forum thread i just posted a 2 page reply onto
information overload time! 😄
from a design perspective, i would also say the VPU's reloc heap is better than linux's CMA heap
and linux could be improved to better utilize the CMA
are you aware of how both cma and the reloc heap work?
I treat CMA as a contiguous block, so if you really need one you can request it without trying to move the pages.
But I'm not sure how the VPU's heap works
hmm did you have arm_peri_high=1 in the config.txt
root@pi400:~# grep arm_peri /boot/config.txt
arm_peri_high=1
yep
for the linux CMA, its a special region that can only have CMA or movable pages, so when linux does need that contiguous block, it can move/push things out of the way, and allocate a large slice, like your 7mb dma buffer
In the git issue
"OK, it seems to work. I tested ethernet, USB 3, and display output.
However, the firmware still doesn't report the last 64M of RAM as usable:"
"That makes sense. The code that generates the contents of the memory node is unaware of the arm_peri_high flag, so is always carving out the final 64MB for the peripherals. That's an easy fix."
So I think the 64M with arm_peri_high=1 is going to be fixed soon
ah, that sounds like a bug, but the VPU firmware still needs some of that ram
Also I wonder if libcamera can still work with arm_peri_high=1
it should
Because that memory address is sent to VPU to write buffer
only the MMIO window moves, and device-tree automates all of that
the dma is still in the lower 1gig
Ahhh I see, it is weird that my config file doesn't have the arm_peri_high=1 settings
i think with linux cma, once you allocate an object, you permanently have that chunk of ram, until you free the object
and to improve the chances of allocating well, it might over-allocate/align, so your 7mb buffers might turn into 8mb buffers
without that, you can get free space fragmentation
where you have 8 holes, all 1mb in size, but no 8mb hole
this is where the VPU's relocatable heap comes in to save the day (palmos also had the same feature)
all objects are movable, even dma objects!
what is the VPU's relocatable heap and how does it work?
this api and the few that follow give a rough idea
first, you have to allocate some memory, and you get a handle back
when you want to access the memory, you must call the lock function to get its current physical addr
then you can read/write, or do dma
when you're done, call unlock, and never touch that address again
any unlocked object can be freely moved, to defrag the free space
I see, so that is where VPU is moving the address table
if you want to read the buffer, call lock again, to learn its new address, do some access, and then unlock
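a toy model of that handle/lock/unlock idea in C. to be clear, this is not the VPU firmware's heap or its api. every name here (reloc_alloc, reloc_lock and so on) is invented for illustration, and the heap is a 64 byte array with a bump allocator so the whole thing fits in one screen:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_SIZE 64   /* toy sizes, purely illustrative */
#define MAX_OBJS  8

struct reloc_obj { size_t off, len; int used, locks; };

uint8_t heap[HEAP_SIZE];
struct reloc_obj objs[MAX_OBJS];
size_t top;            /* first free byte after the last object */

/* Slide every unlocked object down, in address order; locked ones stay put. */
void reloc_compact(void) {
    size_t cursor = 0, floor = 0;
    for (;;) {
        struct reloc_obj *next = NULL;
        for (int i = 0; i < MAX_OBJS; i++) {
            struct reloc_obj *o = &objs[i];
            if (o->used && o->off >= floor && (!next || o->off < next->off))
                next = o;
        }
        if (!next)
            break;
        floor = next->off + next->len;   /* original end of this object */
        if (next->locks == 0 && next->off != cursor) {
            memmove(heap + cursor, heap + next->off, next->len);
            next->off = cursor;
        }
        cursor = next->off + next->len;
    }
    top = cursor;
}

/* Returns a handle (not a pointer), or -1 if there is no room even after compacting. */
int reloc_alloc(size_t len) {
    if (len == 0 || len > HEAP_SIZE)
        return -1;
    if (top + len > HEAP_SIZE)
        reloc_compact();
    if (top + len > HEAP_SIZE)
        return -1;
    for (int i = 0; i < MAX_OBJS; i++) {
        if (!objs[i].used) {
            objs[i] = (struct reloc_obj){ .off = top, .len = len, .used = 1, .locks = 0 };
            top += len;
            return i;
        }
    }
    return -1;
}

/* The address is only valid between lock and unlock. */
void *reloc_lock(int h)  { assert(objs[h].used); objs[h].locks++; return heap + objs[h].off; }
void reloc_unlock(int h) { assert(objs[h].locks > 0); objs[h].locks--; }
void reloc_free(int h)   { assert(objs[h].used && objs[h].locks == 0); objs[h].used = 0; }
```

allocate 24+16+24 bytes, free the two 24 byte objects, and a 40 byte request still succeeds: the 16 byte object gets moved to the bottom, and the next lock on it returns a different address with the same contents. this toy only reclaims space by compacting, it does not reuse holes below a locked object.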
with this command, you can dump the entire relocatable heap
Yeap
[13229.767017] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.767229] cma: cma_alloc(): returned 00000000dd0e929c
[13229.770434] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.770581] cma: cma_alloc(): returned 00000000527e6c2a
[13229.773537] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.773702] cma: cma_alloc(): returned 000000001ecc5d6d
[13229.776005] cma: cma_alloc(cma 00000000c236aafb, count 1904, align 8)
[13229.776133] cma: cma_alloc(): returned 00000000c191821f
I think it is aligning to 4K pages
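for reading those log lines: as far as i know, cma_alloc() takes count in pages and align as a power-of-two order of pages, not bytes. a quick C sketch of the conversion, assuming 4K pages (the helper names are made up):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4K pages on this kernel */

/* Size of a cma_alloc() request, given its page count. */
uint64_t cma_request_bytes(uint64_t count_pages) {
    return count_pages * PAGE_SIZE_BYTES;
}

/* Alignment of a cma_alloc() request, given its page order. */
uint64_t cma_align_bytes(unsigned order) {
    assert(order < 32);
    return (uint64_t)PAGE_SIZE_BYTES << order;
}
```

so "count 1904, align 8" would be a ~7.4mb request, aligned to 1mb.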
in my case, you can see a 39mb hole
a 512 byte object
a ~24kb object
a bunch of 740 byte objects
and more...
it doesnt need to force alignment on anything, and can just move things to defrag the free space
i think a security flag is messing with those numbers, let me find it...
there it is
Generally, pointers will be sanitized when kernel.kptr_restrict is non-zero.
(from a stack overflow answer)
Ah I was looking at the count
try setting that to 0 in /boot/cmdline.txt, reboot, and check dmesg for cma_alloc again
basically, linux is protecting itself against various exploits
so its censoring all pointers in dmesg, by hashing them
makes debug impossible, but also makes certain exploits impossible
But I'm also trying to use cma_debug_show_areas()
There is a debug function in cma.c
but like.... no examples on how to use it
000000001ecc5d6d just screams red flags, because it claimed to be 8 aligned, but its not aligned
I think cma allocating memory is fine, but I just want to really maximize the area
also someone is able to allocate 960M: https://www.marcusfolkesson.se/blog/contiguous-memory-allocator/
there is also one other trick you can do
Which is very interesting:
# dmesg | grep CMA
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000000056000000, size 960 MiB
you have 7gig of ram the vpu/gpu cant access
have a dedicated thread, that just copies frames from cma to normal ram, as fast as possible
dont process anything
the sooner you get it out of cma, the better
i dont think that was done on an rpi, the address is wrong
I think it is, based on the blog
2 solutions for that
1: fix linux, to allow a dma_buf to be cached, and evict the cache at the right time
2: use one of the 64bit capable dma cores, to just burst copy it without the arm being involved
ah yeah, the blog does say its for the rpi
but checking the dt he posted, i see the problem
he doesnt have the address range defined properly
so linux is allocating the cma pool beyond the lower 1gig
where dma cant reach
ive also heard that the cmdline.txt method breaks the same way, because it doesnt support the range definition
let me check that...
if CONFIG_CMA_DEBUG is enabled, that function will do things
if disabled, that function is a no-op...
Yeah this is what I was wondering before
this
cma_alloc() will call it automatically, but only if an allocation fails, and warnings are allowed
So that even if he is able to allocate 960M that starting address won't work for libcamera
@no_warn: Avoid printing message about failed allocation
Oh I did recompile the kernel with CMA_DEBUG
exactly, but thats separate from high peripherals mode
but it will only print that debug, if an allocation fails, and no_warn=0
I think these two are the way to go
you could move the call to cma_debug_show_areas() outside of that if condition
then it will print on every allocation, successful or not
Ah actually yes let me try that
https://datasheets.raspberrypi.com/
bcm2711-peripherals.pdf
page 65, the dma4 engines
that lets you access the entire 8gig range of the system
problem is, accessing them, i dont think linux just exposes a usable api
Can this be an example I can follow for bcm2711?
https://github.com/fandahao17/Raspberry-Pi-DMA-Tutorial
possibly
follow that, but keep in mind that high-peri has moved the mmio, and read the bcm2711 datasheet to confirm how the 2711 differs
Got it, for now I'll leave the high-peri mode off
ive only dealt with dma on the vc4 era SoC's so far (pi0-pi3), mainly for pwm audio
I don't really see the benefit of high-peri though
like CMA still limited to the lower 1G anyways
the main benefit, is that you get another ~64mb of memory in the lower 1gig
I guess... yeah If I really want to maximize the area
root@pi400:~# cat /proc/iomem
47e201000-47e2011ff : serial@7e201000
here, you can see that my PL011 uart (ttyAMA0) is at 0x4_7e20_1000
which is just the raw VPU bus addr (7e) with a 0x4_0000_0000 offset, that can only be reached in 64bit mode
but, if i turn high peri off, and reboot...
fe201000-fe2011ff : serial@7e201000
oh, oops
the MMIO window is at the top of the 32bit space, just below 4096mb
so its reachable by any 32bit kernel, but its not actually in the way of the 1gig window
so, its more about arm memory, than the 1gig window
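The two /proc/iomem lines above come down to a fixed offset added to the VPU bus address. A rough sketch of that arithmetic follows. The offsets are inferred only from the two outputs in this chat, not taken from the datasheet, and `bus_to_arm` / `needs_64bit` are made-up names for illustration:

```python
# Offsets inferred from the two /proc/iomem outputs in this chat (assumption,
# not checked against the bcm2711 datasheet).
HIGH_PERI_OFFSET = 0x4_0000_0000  # 0x7e201000 -> 0x4_7e20_1000
LOW_PERI_OFFSET = 0x8000_0000     # 0x7e201000 -> 0xfe20_1000


def bus_to_arm(bus_addr: int, high_peri: bool) -> int:
    """Translate a 0x7e...... VPU bus address into the ARM physical address."""
    return bus_addr + (HIGH_PERI_OFFSET if high_peri else LOW_PERI_OFFSET)


def needs_64bit(arm_addr: int) -> bool:
    """True if the address is above the 32bit space (unreachable by a 32bit kernel)."""
    return arm_addr > 0xFFFF_FFFF


if __name__ == "__main__":
    for mode in (True, False):
        addr = bus_to_arm(0x7E201000, high_peri=mode)
        print(f"high_peri={mode}: PL011 at {addr:#x}, 64bit only: {needs_64bit(addr)}")
```

This is also why the DMA tutorial linked earlier needs adjusting when high-peri is on: any hardcoded MMIO base moves with the offset.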
40000000-fbffffff : System RAM with low-peri
40000000-ffffffff : System RAM with high-peri
yep, bingo, i'm now short 64mb of ram
root@pi400:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            3807         169        3302          37         334        3531
Swap:             99           0          99
with low peri
pi@pi400:~ $ free -m
               total        used        free      shared  buff/cache   available
Mem:            3870         165        3388          23         317        3613
Swap:             99           0          99
high peri
63mb difference in total memory, lets assume its just a rounding error
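A quick check of that number from the two System RAM ranges, as a small sketch (`range_mib` is a made-up helper; it just parses a `start-end : name` line the way /proc/iomem prints it):

```python
def range_mib(line: str) -> float:
    """Size in MiB of a /proc/iomem style 'start-end : name' line (end is inclusive)."""
    span = line.split(":", 1)[0].strip()
    start, end = (int(part, 16) for part in span.split("-"))
    return (end - start + 1) / (1024 * 1024)


low_peri = range_mib("40000000-fbffffff : System RAM")
high_peri = range_mib("40000000-ffffffff : System RAM")

if __name__ == "__main__":
    print(f"low-peri:  {low_peri:.0f} MiB")
    print(f"high-peri: {high_peri:.0f} MiB")
    print(f"difference: {high_peri - low_peri:.0f} MiB")
```

The ranges differ by exactly 64 MiB, so the 63 reported by free -m is consistent with the rounding guess.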
ideally, linux can just let you map the dma_buf objects with the cache enabled
but every time a peripheral completes a write (unicam frame done), linux issues a cache eviction for that range, to ensure it stays coherent
and for peripheral writes, when you initiate dma, the driver has to pre-flush the cache
but a lot of drivers assume the cache is just always coherent (like x86, and other arm boards), so they are missing that code
checking the source, cma_alloc() and cma_release() are the main interface, and there is nothing for actually mapping it
which makes sense, other things can map it after allocating
oh, interesting, mm/cma_debug.c
So I can see two others besides the camera are using CMA: ethernet, and NVMe for HMB
which makes sense
anything in /sys/kernel/debug about cma?
all i can find is:
root@pi400:/sys/kernel/debug/dma_buf# cat bufinfo
Dma-buf Objects:
size flags mode count exp_name ino
03145728 00000000 00080005 00000005 drm 00021531
Exclusive fence: drm_sched v3d_render signalled
Attached Devices:
47ec00000.v3d
Total 1 devices attached
Total 1 objects, 3145728 bytes
root@pi400:/sys/kernel/debug/dma_buf#
because i havent enabled cma debug
Nope
There is also a sysfs interface
but that only gives you the successful and failed alloc count
pi@camera6:~ $ cat /sys/kernel/mm/cma/reserved/alloc_pages_fail
0
pi@camera6:~ $ ls /sys/kernel/mm/cma/reserved/
alloc_pages_fail alloc_pages_success
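If you want to watch those counters while libcamera runs, something like this would do it. It is only a sketch: it assumes the directory layout shown in the `ls` above (one directory per CMA area, holding `alloc_pages_success` and `alloc_pages_fail`), and `read_cma_stats` is a made-up name:

```python
from pathlib import Path


def read_cma_stats(base: str = "/sys/kernel/mm/cma") -> dict:
    """Return {area_name: {counter_name: value}} for every CMA area under base."""
    stats = {}
    root = Path(base)
    if not root.is_dir():
        return stats  # kernel built without the CMA sysfs interface
    for area in sorted(root.iterdir()):
        if not area.is_dir():
            continue
        counters = {}
        for name in ("alloc_pages_success", "alloc_pages_fail"):
            counter = area / name
            if counter.is_file():
                counters[name] = int(counter.read_text().strip())
        stats[area.name] = counters
    return stats


if __name__ == "__main__":
    for area, counters in read_cma_stats().items():
        print(area, counters)
```

Polling it before and after starting the camera shows whether any allocation failed, but as noted it says nothing about where in the pool the buffers landed.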
oh, interesting, the debugfs api, lets you just directly alloc and release
with just echo alone, no proper c api
oh, i should maybe start at libcamera or unicam...
do you remember/know where the dma_buf gets allocated again? i forgot
[ 58.622167] cma: number of available pages: 13@9331+29@9411+2@9470+10@11158+2@11198+2@11230+2@11262+20@11564+2@11614+2@11646+2@11678+2@11710+2@11742+2@11774+20@12076+2@12126+2@12158+2@12190+2@12222+34@12254+144@14192+144@16240+144@18288+144@20336+144@22384+144@24432+144@26480+144@28528+144@30576+144@32624+144@34672+144@36720+144@38768+144@40816+144@42864+144@44912+144@46960+144@49008+144@51056+144@53104+144@55152+144@57200+144@59248+144@61296+144@63344+144@65392+144@67440+144@69488+144@71536+144@73584+144@75632+212@76076+107732@76588=> 112562 free of 184320 total pages
with 14 x 2 buffers for libcamera too
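That debug line is a list of free runs in `count@start_page` form, joined by `+`, followed by the totals. A small sketch for pulling it apart (`parse_cma_areas` is a made-up name, the sample line is shortened with invented totals, and the MiB figure assumes 4 KiB pages):

```python
import re


def parse_cma_areas(line: str):
    """Parse a 'cma: number of available pages: ...' line.

    Returns (runs, free, total), where runs is a list of (page_count, start_page).
    """
    body = line.split("pages:", 1)[1]
    runs_part, summary = body.split("=>", 1)
    runs = [(int(n), int(start)) for n, start in re.findall(r"(\d+)@(\d+)", runs_part)]
    free, total = map(int, re.search(r"(\d+) free of (\d+) total", summary).groups())
    return runs, free, total


if __name__ == "__main__":
    # Shortened sample in the same format as the dmesg line above (totals made up).
    sample = "cma: number of available pages: 13@9331+29@9411+2@9470=> 44 free of 100 total pages"
    runs, free, total = parse_cma_areas(sample)
    count, start = max(runs)  # largest contiguous free run
    print(f"{free}/{total} pages free, largest run {count} pages at page {start}")
    print(f"largest run = {count * 4 / 1024:.2f} MiB, assuming 4 KiB pages")
```

On the real line above, the largest run is the last one, 107732 pages starting at page 76588, and the many 144-page gaps at a regular 2048-page stride are the leftover holes between the buffers.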
vc4_bo.c in the 2d side, vc4_free_object() will check if its an imported object (a gem object pointing to a buffer something else made)
if it was imported, it will just destroy the gem object (the wrapper around another type of buffer)
if it wasnt imported, it goes into a cache, so the 2d subsystem can reuse it without a free/alloc sequence
libcamera\src\libcamera\pipeline\raspberrypi\dma_heaps.cpp
`UniqueFD DmaHeap::alloc(const char *name, std::size_t size)
{
    int ret;
    if (!name)
        return {};
    struct dma_heap_allocation_data alloc = {};
    alloc.len = size;
    alloc.fd_flags = O_CLOEXEC | O_RDWR;
    ret = ::ioctl(dmaHeapHandle_.get(), DMA_HEAP_IOCTL_ALLOC, &alloc);
    if (ret < 0) {
        LOG(RPI, Error) << "dmaHeap allocation failure for "
                        << name;
        return {};
    }
    UniqueFD allocFd(alloc.fd);
    ret = ::ioctl(allocFd.get(), DMA_BUF_SET_NAME, name);
    if (ret < 0) {
        LOG(RPI, Error) << "dmaHeap naming failure for "
                        << name;
        return {};
    }
    return allocFd;
}`
and where did dmaHeapHandle_ come from?
pi@pi400:~ $ ls -l /dev/dma_heap/
total 0
crw-rw---- 1 root video 253, 1 Sep 4 02:40 linux,cma
crw-rw---- 1 root video 253, 0 Sep 4 02:40 system
ah, one of these, thats what i was looking for
so basically, you open linux,cma, and then you can issue a DMA_HEAP_IOCTL_ALLOC to allocate a dma_buf within the CMA
and you pass it a pointer to a dma_heap_allocation_data to describe the allocation request
drivers/dma-buf/dma-heap.c: struct dma_heap_allocation_data *heap_allocation = data;
include/uapi/linux/dma-heap.h: * struct dma_heap_allocation_data - metadata passed from userspace for
include/uapi/linux/dma-heap.h:struct dma_heap_allocation_data {
include/uapi/linux/dma-heap.h: * Takes a dma_heap_allocation_data struct and returns it with the fd field
include/uapi/linux/dma-heap.h: struct dma_heap_allocation_data)
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c: struct dma_heap_allocation_data data = {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c: struct dma_heap_allocation_data_smaller {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c: struct dma_heap_allocation_data_smaller);
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c: struct dma_heap_allocation_data_bigger {
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c: struct dma_heap_allocation_data_bigger);
which only shows up in 3 places in linux
1: the implementation
2: the headers
3: a tool for testing it
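For poking at this without building libcamera, the same ioctl can be issued from a few lines of Python. This is a sketch under assumptions: the struct layout (`len`, `fd`, `fd_flags`, `heap_flags`) and the `'H'` magic are taken from include/uapi/linux/dma-heap.h, the ioctl number is built with the asm-generic `_IOWR` encoding that arm and x86 use, and `dma_heap_alloc` is a made-up name:

```python
import fcntl
import os
import struct

# struct dma_heap_allocation_data { __u64 len; __u32 fd; __u32 fd_flags; __u64 heap_flags; }
ALLOC_STRUCT = struct.Struct("=QIIQ")


def _iowr(magic: str, nr: int, size: int) -> int:
    """asm-generic _IOWR(): direction read|write in bits 30-31, size in bits 16-29."""
    return (3 << 30) | (size << 16) | (ord(magic) << 8) | nr


DMA_HEAP_IOCTL_ALLOC = _iowr("H", 0, ALLOC_STRUCT.size)


def dma_heap_alloc(size: int, heap: str = "/dev/dma_heap/linux,cma") -> int:
    """Allocate a dma_buf of `size` bytes from the given heap, return its fd."""
    heap_fd = os.open(heap, os.O_RDWR | os.O_CLOEXEC)
    try:
        request = bytearray(ALLOC_STRUCT.pack(size, 0, os.O_RDWR | os.O_CLOEXEC, 0))
        fcntl.ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, request)  # kernel fills in fd
        return ALLOC_STRUCT.unpack(request)[1]
    finally:
        os.close(heap_fd)


if __name__ == "__main__":
    buf_fd = dma_heap_alloc(4 * 1024 * 1024)
    print("got dma_buf fd", buf_fd)
    os.close(buf_fd)
```

Going by the `ls -l` output above, the caller needs to be root or in the video group to open the heap node.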
So what I read is that "buffers allocated using the CMA dma-heap are cached."
Now the question is I need to swap the v4l2 buffer with dma-heap
i think behind the scenes, everything will be dma_buf objects from cma
Basically I'm still referencing this one: https://forums.raspberrypi.com/viewtopic.php?t=352554
DmaHeap::alloc() will return the new dma-buf object
I should look into where v4l2 gets its buffers from.... hmm
yeah
i was expecting to find that in libcamera, but dont see it immediately
./drivers/media/platform/bcm2835/bcm2835-unicam.c
.vidioc_reqbufs = vb2_ioctl_reqbufs,
.vidioc_create_bufs = vb2_ioctl_create_bufs,
.vidioc_prepare_buf = vb2_ioctl_prepare_buf,
.vidioc_querybuf = vb2_ioctl_querybuf,
.vidioc_qbuf = vb2_ioctl_qbuf,
.vidioc_dqbuf = vb2_ioctl_dqbuf,
.vidioc_expbuf = vb2_ioctl_expbuf,
static const struct vb2_ops unicam_video_qops = {
    .wait_prepare = vb2_ops_wait_prepare,
    .wait_finish = vb2_ops_wait_finish,
    .queue_setup = unicam_queue_setup,
    .buf_prepare = unicam_buffer_prepare,
    .buf_queue = unicam_buffer_queue,
vb2_ioctl_create_bufs seems like the best bet
drivers/media/common/videobuf2/videobuf2-v4l2.c:EXPORT_SYMBOL_GPL(vb2_ioctl_create_bufs);
vb2_create_bufs
vb2_core_create_bufs
__vb2_queue_alloc
__vb2_buf_mem_alloc
call_ptr_memop and now i'm lost
and moving the data with CMA buffer
when using libcamera, linux will be driving the unicam peripheral directly
and raw bayer frames land in memory entirely under linux's control, with no interaction from the firmware (other than clock setup, and power gating)
if you then want to do bayer->yuv, or awb, you have to run thru the ISP, which requires firmware support
q->mem_ops = &vb2_dma_contig_memops;
aha, unicam just inherits the generic contiguous memory ops
vb2_dc_alloc()
based on the non_coherent_mem flag, it uses the coherent or non-coherent allocator
V4L2_MEMORY_FLAG_NON_COHERENT, that looks like something userland could set!
static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
{
    struct vb2_queue *q = buf->vb->vb2_queue;

    buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
ah, but then they switch up what the "non" refers to (non-coherent turns into non-contiguous), and ruin everything, so unicam cant use this path
DMA_ATTR_SKIP_CPU_SYNC
----------------------
By default dma_map_{single,page,sg} functions family transfer a given
buffer from CPU domain to device domain. Some advanced use cases might
require sharing a buffer between more than one device. This requires
having a mapping created separately for each device and is usually
aha, this sounds like what i said earlier, about cache maintenance!
transferring ownership of the buffer between the cpu and device, and keeping the cpu cache in sync
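From userland, the matching handoff for a mapped dma_buf is the DMA_BUF_IOCTL_SYNC ioctl, which brackets cpu access with a start and an end call. A minimal sketch, assuming the flag values and `'b'` magic from include/uapi/linux/dma-buf.h and the asm-generic `_IOW` encoding. `cpu_access` is a made-up helper, and whether this actually triggers the right cache maintenance for buffers from the pi's CMA heap is not something confirmed in this chat:

```python
import fcntl
import struct
from contextlib import contextmanager

# Flag values as in include/uapi/linux/dma-buf.h
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2

# _IOW('b', 0, struct dma_buf_sync), where the struct is a single __u64 flags field
DMA_BUF_IOCTL_SYNC = (1 << 30) | (8 << 16) | (ord("b") << 8) | 0


def dma_buf_sync(buf_fd: int, flags: int) -> None:
    """Issue one DMA_BUF_IOCTL_SYNC call on a dma_buf fd."""
    fcntl.ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, struct.pack("=Q", flags))


@contextmanager
def cpu_access(buf_fd: int, read: bool = True, write: bool = False):
    """Hand the buffer to the cpu for the duration of the with-block."""
    direction = (DMA_BUF_SYNC_READ if read else 0) | (DMA_BUF_SYNC_WRITE if write else 0)
    dma_buf_sync(buf_fd, DMA_BUF_SYNC_START | direction)
    try:
        yield
    finally:
        dma_buf_sync(buf_fd, DMA_BUF_SYNC_END | direction)
```

Usage would be along the lines of `with cpu_access(fd): data = mapping[:]`, where `mapping` is an mmap of the same fd, so the cpu only touches the frame while it owns the buffer.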
but its now 3:42 am, i should get to bed
Yeah same for me, I'll look at the stuff you mention! Thanks a lot!