#entire logs is about nvidia


weak notch
#

how can i fix the logs

```
sudo dmesg
[ 1738.030637] NVRM: No NVIDIA devices probed.
[ 1738.031345] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
[ 1739.285272] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1739.285278] NVRM: GPU 0000:08:00.0 is already bound to vfio-pci.
[ 1739.288921] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[ 1739.288923] NVRM: This can occur when another driver was loaded and
               NVRM: obtained ownership of the NVIDIA device(s).
[ 1739.288924] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 1739.288925] NVRM: No NVIDIA devices probed.
[ 1739.289564] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
[ 1739.535654] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1739.535661] NVRM: GPU 0000:08:00.0 is already bound to vfio-pci.
[ 1739.539388] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[ 1739.539390] NVRM: This can occur when another driver was loaded and
               NVRM: obtained ownership of the NVIDIA device(s).
[ 1739.539391] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
```
quick narwhal
#

what

#

.s litany

buoyant tree BOT
# quick narwhal .s litany

Please follow the Standard Litany when asking a question:
What was your environment? What was your operating system, configuration?
What did you do? What did you run or test? Where?
What actually happened? What were the exact results, complete log contents, exact error messages?
What did you expect? What were you aiming to achieve? What result were you looking for?

Vague or superficial questions will yield vague or superficial answers. False information leads to false solutions.

Also see this similar guide on how to ask smart questions.

weak notch
#
```
linux /vmlinuz-linux-zen
initrd /amd-ucode.img
initrd /initramfs-linux-zen-vm.img
options root=UUID=72fc446c-0b05-491d-9b02-969cb75fc03a rw iommu=pt pcie_acs_override=downstream,multifunction pcie_port_pm=off split_lock_detect=off vfio-pci.ids=10de:2188,10de:1aeb rd.driver.blacklist=nouveau,nvidia,nvidia_drm,nvidia_modeset,i2c_nvidia_gpu modprobe.blacklist=nouveau,nvidia,nvidia_drm,nvidia_modeset,i2c_nvidia_gpu
```
sharp zodiac
weak notch
#

it works perfectly fine

#

i just want logs to be normal not all about nvidia

#
```
linux /vmlinuz-linux-zen
initrd /amd-ucode.img
initrd /initramfs-linux-zen.img
options root=UUID=72fc446c-0b05-491d-9b02-969cb75fc03a rw iommu=pt pcie_acs_override=downstream,multifunction pcie_port_pm=off split_lock_detect=off
```
i also have this
i want to be able to boot with both gpus working on linux or boot with nvidia gpu working on vms
sharp zodiac
#

binding a GPU to VFIO really confuses the nvidia driver, so if you want a "cleaner" dmesg you can also blacklist the nvidia driver itself, but then you can't hotswap.

#

once you have something to swap binds, you can just use this when nvidia has control of the device:
LD_PRELOAD="" __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json [command]

weak notch
#

isn't this a blacklist?

sharp zodiac
#

it is. I'm not sure why that's not working, but I assume you don't actually want anything on the nvidia side blacklisted.

#

if you want to hotswap GPUs you need nvidia drivers loaded.

weak notch
#

i have amd gpu and nvidia gpu

sharp zodiac
#

I assumed as much

weak notch
#

i want to be able to boot with both working on linux
and other boot option to boot with nvidia gpu disabled for linux and nvidia gpu working for vms

sharp zodiac
#

oh, I see. That might be genuinely weirder to setup than hotswapping.

#

why not just swap without rebooting?

weak notch
sharp zodiac
#

some of them hotswap quite reliably.

weak notch
sharp zodiac
#

interesting. I can hotswap a 1080ti with an AMD on the host. Can you show the script you used?

weak notch
#

the gpu is working in sway

#

to make the gpu not work i have to close sway

#

when i close sway there is no linux

sharp zodiac
#

huh

weak notch
#

how do you hotswap

#

can you make me example script

sharp zodiac
#

I get the impression you have the GPUs installed in the wrong order here

weak notch
sharp zodiac
# weak notch can you make me example script

I can't because I don't know the PCIe addresses on your system. You want to use the unbind functionality on the driver in /sys/bus tree, change your setup so vfio does or does not take early control of the device, and then run a PCIe device scan again (echo 1 > /sys/bus/pci/rescan)
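
A minimal sketch of the unbind-and-rescan sequence described above, assuming the GPU sits at 0000:08:00.0 (the address shown in the dmesg output at the top of the thread). The function names and the `SYSFS_ROOT` override are made up for illustration; the override only exists so the logic can be tried against a scratch directory before pointing it at the real /sys as root.

```bash
#!/bin/bash
# Hypothetical helpers; run as root when they point at the real /sys.
# SYSFS_ROOT is an illustrative override, not a kernel or distro convention.

# Unbind whatever driver currently owns the device, if any.
pci_unbind() {
    local addr="$1"
    local dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr"
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    else
        echo "$addr: no driver bound" >&2
    fi
}

# Ask the kernel to rescan the PCI bus so a loaded driver can claim the device.
pci_rescan() {
    echo 1 > "${SYSFS_ROOT:-/sys}/bus/pci/rescan"
}
```

On a real system this would be called as root, for example `pci_unbind 0000:08:00.0 && pci_rescan`. Which driver picks the device up afterwards depends on what is loaded at that moment.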

sharp zodiac
#

if you cannot unbind the nvidia driver without your system hanging then there might be some other fundamental issue at play

sharp zodiac
# weak notch

you need to experiment a bit first to see what works. This stuff is a bit hacky so I can't tell you what will work on your system:
sudo rmmod vfio_pci vfio_pci_core vfio_iommu_type1 to get VFIO to stop taking control of the GPU (since that's the state it's booting in)
echo 1 | sudo tee /sys/bus/pci/rescan to rescan the bus and see if nvidia obtains the device. If yes, then the VFIO->nvidia swap worked. Usually this never crashes. If nvidia isn't loaded, do sudo modprobe -a nvidia_modeset nvidia_uvm nvidia.

#

you have two options for removing a driver from a device, and that's simply unloading the driver, or unbinding it and telling the driver to not take control of the device.

weak notch
sharp zodiac
#

you need to think for yourself a bit here.

#
  1. whatever you have blacklisted in the kernel options you listed earlier isn't working, as nvidia is clearly loaded from your dmesg logs. I can't tell you why that is, it likely has something to do with your bootloader and how you're updating it.
#
  2. you don't need to blacklist anything immediately, as vfio is taking early control of your device. This is good, because you want to make sure your AMD gpu is always used for display on the host (this should always be the case if it's the primary GPU, although 'primary' is more of a BIOS detail).
#
  3. you need to test if nvidia can take control of the device and relinquish it gracefully, without a system hang. Try with modprobe and see what happens, and read dmesg.
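
One way to check the first point is to compare what the running kernel was actually booted with against what is loaded. The sketch below is only an illustration: `blacklisted_modules` and `nvidia_loaded` are hypothetical helper names, and the optional arguments exist so the functions can be tried on sample input. Keep in mind that a modprobe blacklist only stops automatic loading; something that requests the module by name can still load it.

```bash
#!/bin/bash
# Print the modules named by modprobe.blacklist= in a kernel command line.
# Reads /proc/cmdline unless a command-line string is passed as $1.
blacklisted_modules() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    local word
    for word in $cmdline; do
        case "$word" in
            modprobe.blacklist=*)
                # Strip the key, then print one module per line.
                echo "${word#modprobe.blacklist=}" | tr ',' '\n'
                ;;
        esac
    done
}

# Succeeds if the nvidia module is currently loaded.
# /proc/modules lists loaded modules, one per line, name first.
nvidia_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}"
}
```

If `blacklisted_modules` prints nothing on the VM boot entry, the options line never reached the kernel, which would point at the bootloader config rather than at the initramfs.
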
weak notch
#

ok so i am trying to fix this issue to be able to read logs for #1414757329286598786

sharp zodiac
#

I have no idea why you think that's related.

weak notch
#

its full of nvidia

sharp zodiac
#

is nvidia spamming dmesg?

#

because that shouldn't happen

weak notch
#

this why i made this support

sharp zodiac
#

it usually spits that out like twice or three times

#

okay well then just modprobe nvidia

weak notch
sharp zodiac
#

I don't see what the problem is

weak notch
#

i want nvidia to be like never installed when i boot with vfio-pci

sharp zodiac
#

blacklisting or modprobe'ing the driver isn't going to make a big difference

weak notch
#

how can i make 2 kernels
one that has no nvidia drivers and one that does?

sharp zodiac
#

I don't know why your driver blacklist isn't working. That's on your bootloader config.

sharp zodiac
weak notch
#

i have created 2 initramfs

sharp zodiac
#

no.

weak notch
#

hm i spent long time making pacman hook to create other initramfs

sharp zodiac
#

why are you overcomplicating this

#

its literally just a driver blacklist

weak notch
#
```
#!/bin/bash
set -e

# List of kernel preset files to patch
KERNEL_PRESETS=(
    /etc/mkinitcpio.d/linux.preset
    /etc/mkinitcpio.d/linux-zen.preset
    /etc/mkinitcpio.d/linux-lts.preset
    /etc/mkinitcpio.d/linux-hardened.preset
    /etc/mkinitcpio.d/linux-vfio.preset
)

# Loop through each preset
for preset in "${KERNEL_PRESETS[@]}"; do
    [[ -f "$preset" ]] || continue

    echo "Patching $preset..."

    # Add 'vm' to PRESETS if missing (with quotes)
    if ! grep -q "'vm'" "$preset"; then
        sed -i "/^PRESETS=/ s/)/ 'vm')/" "$preset"
    fi

    # Add vm_* lines if missing
    if ! grep -q "^vm_image=" "$preset"; then
        cat <<EOF >> "$preset"

# vm preset
vm_image="/boot/initramfs-$(basename "$preset" .preset)-vm.img"
vm_config="/etc/mkinitcpio-vm.conf"
EOF
    fi
done

echo "All presets patched."
```
so i don't need this?
sharp zodiac
#

unless you made a custom kernel in the first place and compiled these modules into the kernel itself, which I highly doubt

weak notch
# sharp zodiac ??? why are you doing this

vm initramfs MODULES=(vfio_pci vfio vfio_iommu_type1 amdgpu)
regular initramfs MODULES=(vfio_pci vfio vfio_iommu_type1 amdgpu nvidia nvidia_modeset nvidia_uvm nvidia_drm)

sharp zodiac
#

yeah, this is why I'm saying to just have one config and change the setup later in the boot process or just hotswap at runtime. You only need VFIO to load early, and you never need to remove that from your initramfs

#

nvidia does not need to load early because it's not trying to grab a primary GPU that you need for basic display

#

period.

#

it's a surprisingly flexible driver

weak notch
#

if nvidia did not load early i can't use the gpu

sharp zodiac
#

that's why you use modprobe. I told you to test it right now using those commands to see if that works properly.

weak notch
#

i don't want modprobe

#

i don't want to run any command

sharp zodiac
#

why

weak notch
#

i want it to be all automatic

sharp zodiac
#

you can make it into a fucking macro if you wanted

#

is it not easier to run a single script for swapping than rebooting your entire computer with a different initramfs?

weak notch
#

rebooting is easier

#

as logging out is broken

sharp zodiac
#

you don't even need to log out

weak notch
#

I CAN'T

#

understand

#

the gpu is using everything

#

display games everything

#

to remove its drivers everything has to be closed

sharp zodiac
#

you need to fix that, then

weak notch
#

fix what

#

make gpu not work for anything?

#

this not what i want

sharp zodiac
#

your compositor should not be grabbing anything but your primary gpu

weak notch
#

this is not how i configured it

#

i configured it to work on all gpus

sharp zodiac
#

you are just making life harder for yourself

weak notch
#

come on

#

all i wanted is to use davinci resolve

#

it does not work on amd

sharp zodiac
#

I'm basically telling you there's a way to set up your compositor and system in a way where you can use a single script to hotswap your nvidia gpu from vfio to the nvidia driver, and then subsequently run applications as needed to use the nvidia card instead of the amd card. This is how you should set it up, because it then delegates compositing and less intensive rasterizing for simple apps to your AMD gpu, and more intense stuff to your Nvidia card.

#

without a log out or reboot

weak notch
#

everything else on amd

sharp zodiac
#

that would cover that case as well

#

you would run davinci resolve with LD_PRELOAD="" __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json %command% once the GPU is controlled by nvidia, and then you would just do nothing for every other app.
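
The long command above could be wrapped once so it does not have to be retyped. This is a sketch only: `nvidia_run` is a hypothetical name, and the variables are copied from the message above rather than checked against any particular driver version.

```bash
#!/bin/bash
# Run one command with the NVIDIA offload variables from the message above.
# Anything started without this wrapper keeps the default environment.
nvidia_run() {
    env \
        LD_PRELOAD="" \
        __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
        "$@"
}
```

Usage would look like `nvidia_run some-application`. Because the variables are passed through `env` for that one process, they never leak into the shell or the compositor.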

#

or, if you wanted it on VFIO, you'd run a script

#

it would swap

#

and then you'd boot your VM and you can do whatever you want in the VM.

weak notch
sharp zodiac
#

okay so you're already doing that

#

wait, but your env?

#

SERIOUSLY

weak notch
sharp zodiac
#

THE WHOLE POINT IS TO SELECTIVELY USE THE NVIDIA GPU, NO?

weak notch
#

Exec=/usr/bin/nvidia-gpu-run /opt/resolve/bin/resolve %u

weak notch
sharp zodiac
#

that env is forcing everything to run on nvidia.

#

hell that's probably why nvidia is spamming your dmesg dude

weak notch
#

this is script

#

this is not env

#

its bash script

#

that you use to run apps

sharp zodiac
#

okay

#

then why

#

are you bothering

#

with initramfs

weak notch
#

i need the gpu to be on vms

sharp zodiac
#

okay, fine, I will tell you what's wrong with your initramfs and you can reboot (for literally no reason) to do this: your regular initramfs should not be loading vfio, and because of the order, vfio will always take control of the GPU. Just remove it.

weak notch
#

i need the vm to watch some videos

#

so i use the vm to record the videos
then reboot and boot with nvidia gpu on linux to edit the videos

sharp zodiac
#

that's fine, I get it, you're just making this way more complicated than it needs to be

#

the reboot and initramfs configs aren't needed

weak notch
#

i heard from 1 youtuber that there is something i can put

#

that makes the gpu not work for anything except resolve

sharp zodiac
#

I told you what was wrong with your regular initramfs

weak notch
#

idk how to do that

sharp zodiac
#

just fix that

weak notch
#

wait

#

you told me to undo what i did

#

right?

#

the 2 initramfs

sharp zodiac
#

I told you not to bother with the rebooting nonsense at all

weak notch
#

ok

sharp zodiac
#

but you're not listening

weak notch
#

so you want me to undo that

#

wait let me do it

#

i am doing what you say

sharp zodiac
#

you can absolutely reboot for all of this if you want to

weak notch
#

no

#

i am doing what you want

sharp zodiac
#

I'm just telling you that it is actually possible to hotswap

weak notch
#

no reboot = no reboot

sharp zodiac
#

if you want to try the hotswap approach run the modprobe stuff to confirm it actually works

#

which I said earlier

weak notch
#

they did work before

#

when i was using kde because logout was working on kde

#

so look why is logout not working?

#

once i log out it re opens sway

#

it just re opens it

sharp zodiac
#

that is likely something to do with your DM.

#

what DM are you using

weak notch
#

do i need to unload vfio-pci?

#

i think that i do

sharp zodiac
#

you need to unload vfio first, yes, also from your logs it was clear both vfio and nvidia were already loaded.

#

man, I'm really losing patience here

weak notch
sharp zodiac
#

you seem like someone who just wants to follow a bunch of steps without thinking, and it's clear you've watched a couple of tutorials and blindly put your system in a weird state. Your initramfs setup isn't doing what you think it's doing, and your session issues with logging out are likely due to how you configured sway to run

sharp zodiac
# weak notch

you need to unbind the device first before running modprobe in this case

weak notch
#

it freezes on this

sharp zodiac
#

... because you probably removed vfio

sharp zodiac
#

if you want to run the VM, you need it controlled by VFIO.

weak notch
#

wait is it modprobe -r

#

or rmmod

sharp zodiac
#

literally does the same thing

#

read the manpage

#

removing VFIO may have allowed nvidia to take control of the GPU. That means the host can now use it. Test it.
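
A quick way to test which driver owns the card after each step is to read the `driver` symlink in sysfs. The sketch below assumes the address 0000:08:00.0 from the dmesg output earlier; `pci_current_driver` and the `SYSFS_ROOT` override are illustrative names, not an existing tool.

```bash
#!/bin/bash
# Print the name of the driver bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override for trying this outside the real /sys.
pci_current_driver() {
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver's directory; its last component is the name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Calling `pci_current_driver 0000:08:00.0` should report `vfio-pci` before the swap and `nvidia` after it, if the swap worked. `lspci -k` shows the same information under "Kernel driver in use".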

#

to do the inverse, you need to unbind the driver from the PCIe device, remove the driver, reload vfio, and then scan the PCIe bus. It's possible something else could trigger the bus scan, so that might not be needed.
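
For the inverse direction, a sketch of a different route than the module unload and reload described above: the kernel's per-device `driver_override` file pins a device to one driver, and `drivers_probe` asks the kernel to bind it again. This assumes vfio-pci is already loaded and the script runs as root; `pci_bind_vfio` and `SYSFS_ROOT` are hypothetical names, and nothing here has been tried on the system in this thread.

```bash
#!/bin/bash
# Hand a PCI device to vfio-pci without unloading the nvidia modules.
# Run as root; vfio-pci must already be loaded.
pci_bind_vfio() {
    local addr="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local dev="$root/bus/pci/devices/$addr"

    # Release the device from its current driver, if it has one.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Tell the kernel that only vfio-pci may claim this device...
    echo vfio-pci > "$dev/driver_override"

    # ...then ask it to probe the device again.
    echo "$addr" > "$root/bus/pci/drivers_probe"
}
```

Each function of the card (video, audio, and so on) has its own address and would need the same treatment. Anything still holding the GPU open can make the unbind step hang, so the earlier point about keeping the compositor off the nvidia card still applies.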

weak notch
#

oh so look

#

i found out why its not working

#

in use right?

#

sudo lsof /dev/nvidia*
says
lact
sway

#

sway opened on the gpu

#

it has to be from tty wait

sharp zodiac
#

you can forcefully unbind it, it's possible it's not sway in this case. Use echo [address] > /sys/bus/pci/devices/[address]/driver/unbind

#

its not the tty.

#

that's just not how that works.

#

the only risk with unbind is that you can leave the gpu in a bad state, but nvidia GPUs tend to play nice. Not sure about yours though.

#

regardless, sway should only be using the AMD gpu.

#

you should also be displaying everything on the AMD gpu, and the AMD gpu should be considered your primary gpu by the bios. Sometimes that's not configurable and boils down to which physical slot they are in.

weak notch
#

i don't understand what address i should put

#

pci_0000_08_00_0?

#

or 0000_08_00_0

weak notch
sharp zodiac
#

why are you using underscores

#

its literally in the list if you just bothered to use ls
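
The underscore form is how libvirt names node devices, while /sys/bus/pci/devices uses colons and a dot. A small sketch of the conversion, with `nodedev_to_pci_addr` as a hypothetical helper name:

```bash
#!/bin/bash
# Convert a libvirt node device name to the address used under /sys/bus/pci.
# Input that does not match the pci_DDDD_BB_SS_F pattern is printed unchanged.
nodedev_to_pci_addr() {
    echo "$1" | sed -E 's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}

nodedev_to_pci_addr pci_0000_08_00_0   # prints 0000:08:00.0
```

The printed value is what goes in place of [address] in the unbind command, and it should match one of the entries shown by `ls /sys/bus/pci/devices`.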

#

🤦‍♂️

sharp zodiac
weak notch
#

ok wait brb i will go to eat then when i will be back i will use ls and try to fix it

sharp zodiac
#

I'm done helping you

weak notch
#

ok

#

thanks for giving me hope

sharp zodiac
#

like this is all possible but I can't magically re-trace everything you've done on your system nor am I going to walk you through this step by step

#

VFIO stuff is always finicky and you need to just understand what is actually happening

weak notch
#

i know that you said that you are done helping me but i think i am done doing what you wanted

#

start.sh

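```
#!/bin/bash
# start.sh: hand the nvidia gpu over to vfio-pci for use in vms
set -x

# Stop the services that keep the gpu open
sudo systemctl stop coolercontrold
sudo systemctl stop lactd
sudo systemctl stop greetd

# Kill whatever is still running on tty1 (the sway session started by greetd)
pkill -t tty1

sleep 2

# Unload the nvidia modules, dependents first
sudo rmmod i2c_nvidia_gpu
sudo rmmod nvidia_drm
sudo rmmod nvidia_uvm
sudo rmmod nvidia_modeset
sudo rmmod nvidia

# Detach every function of the card from the host
sudo virsh nodedev-detach pci_0000_08_00_0
sudo virsh nodedev-detach pci_0000_08_00_1
sudo virsh nodedev-detach pci_0000_08_00_2
sudo virsh nodedev-detach pci_0000_08_00_3

sudo modprobe vfio-pci

# Bring the services and the login screen back
sudo systemctl start coolercontrold
sudo systemctl start lactd
sudo systemctl start greetd
```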
#

stop.sh

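```
#!/bin/bash
# stop.sh: give the nvidia gpu back to the host
set -x

# Stop the services that keep the gpu open
sudo systemctl stop coolercontrold
sudo systemctl stop lactd
sudo systemctl stop greetd

# Kill whatever is still running on tty1 (the sway session started by greetd)
pkill -t tty1

sleep 2

# Unload vfio-pci so it lets go of the card
sudo modprobe -r vfio-pci

# Reattach every function of the card to the host
sudo virsh nodedev-reattach pci_0000_08_00_0
sudo virsh nodedev-reattach pci_0000_08_00_1
sudo virsh nodedev-reattach pci_0000_08_00_2
sudo virsh nodedev-reattach pci_0000_08_00_3

# Load the nvidia modules again
sudo modprobe nvidia
sudo modprobe nvidia_modeset
sudo modprobe nvidia_uvm
sudo modprobe nvidia_drm
sudo modprobe i2c_nvidia_gpu

# Bring the services and the login screen back
sudo systemctl start coolercontrold
sudo systemctl start lactd
sudo systemctl start greetd
```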
#

on boot nvidia gpu works on linux
i can run start.sh in tty2 to get gpu in vms
then later i can run stop.sh in tty2 to get gpu back in linux

#

greetd seems to run everything at tty1 so it has to be killed as my user