#netdata service fails after each update due to overwriting the systemd unit

livid lichen
#

the updater seems to put systemd units into /usr/lib/systemd/system/ with an ExecStart=/usr/sbin/netdata -D $EXTRA_OPTS.
suffice to say, my netdata is in /opt, and there's no /usr/sbin/netdata

Running kickstart in --dry-run apparently thinks that there is a netdata install in /, but there is not.
- Found an existing netdata install at /, but could not determine the install type. Usually this means you installed Netdata through your distribution’s regular package repositories or some other unsupported method.
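
The mismatch described above (a unit whose ExecStart= points at a binary that is not there) can be checked mechanically. The following is a minimal sketch, not anything shipped with Netdata: the helper names `execstart_binary` and `check_unit` are made up for illustration, and the demo runs against a scratch file rather than a real unit.

```shell
#!/bin/sh
# Sketch: report whether the binary named in a unit file's ExecStart= exists.
# The helper names below are hypothetical, not part of Netdata or systemd.

# Print the executable path from the last non-empty ExecStart= line,
# ignoring systemd's optional prefix characters (-, @, :, +, !).
execstart_binary() {
    sed -n 's/^ExecStart=[-@:+!]*\([^ ]\{1,\}\).*/\1/p' "$1" | tail -n 1
}

# Print "ok <path>" or "missing <path>" for the given unit file.
check_unit() {
    bin=$(execstart_binary "$1")
    if [ -n "$bin" ] && [ -x "$bin" ]; then
        echo "ok $bin"
    else
        echo "missing $bin"
    fi
}

# Demo on a scratch file so no real unit is read or changed.
scratch=$(mktemp -d)
printf '[Service]\nExecStart=%s/usr/sbin/netdata -D $EXTRA_OPTS\n' "$scratch" \
    > "$scratch/netdata.service"
check_unit "$scratch/netdata.service"   # the binary does not exist in the scratch dir
rm -rf "$scratch"
```

On a real host the same check could be pointed at /usr/lib/systemd/system/netdata.service, the path mentioned in the first message.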

steep mortar
#

You can override systemd units by creating a drop-in such as /etc/systemd/system/netdata.service.d/config.conf (systemd only reads drop-in files ending in .conf) with the correct ExecStart= value
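
A drop-in of the kind suggested here might be written as in the sketch below. The assumptions: the file name `override.conf` is arbitrary (systemd only requires the `.conf` suffix), the binary path /opt/netdata/usr/sbin/netdata is inferred from the `--prefix=/opt/netdata/usr` configure option shown later in the thread, and `write_dropin` is a hypothetical helper. The demo writes into a scratch directory, not /etc.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that overrides ExecStart for netdata.service.
# write_dropin is a hypothetical helper; file name and binary path are assumptions.
write_dropin() {
    dropin_dir=$1    # e.g. /etc/systemd/system/netdata.service.d
    netdata_bin=$2   # e.g. /opt/netdata/usr/sbin/netdata (assumed location)
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the value inherited from the packaged unit;
    # without it systemd would see two ExecStart= lines for the service.
    printf '[Service]\nExecStart=\nExecStart=%s -D $EXTRA_OPTS\n' "$netdata_bin" \
        > "$dropin_dir/override.conf"
}

# Demo against a scratch directory so nothing under /etc is touched.
scratch=$(mktemp -d)
write_dropin "$scratch/netdata.service.d" /opt/netdata/usr/sbin/netdata
cat "$scratch/netdata.service.d/override.conf"
rm -rf "$scratch"
```

After writing the file to the real drop-in directory, `systemctl daemon-reload` followed by `systemctl restart netdata` would be needed for the override to take effect.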

livid lichen
#

yea, I can also just symlink the correct service to systemd

#

that does not change the fact that the updater breaks my shit every time it runs because it can't correctly determine where the fuck shit is installed

#

I'd rather get the root cause fixed than work around the symptoms

steep mortar
#

Well, once you create that file, the updater should not touch it. My bet is that the path is a fixed value passed at build time and it is not changed for your build.

cursive snow
#

don't you have leftovers of a previous install or something ?

livid lichen
#

yea, saw that as well

cursive snow
#

anyway you can use kickstart.sh --install-prefix with your path

livid lichen
#

but no, never installed it through native packages

#

only kickstart

#
Version: netdata v1.38.0-18-ga86d03b5e
Configure options:  '--prefix=/opt/netdata/usr' '--sysconfdir=/opt/netdata/etc' '--localstatedir=/opt/netdata/var' '--libexecdir=/opt/netdata/usr/libexec' '--libdir=/opt/netdata/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--enable-cloud' '--without-bundled-protobuf' '--disable-dependency-tracking' 'CFLAGS=-static -O2 -I/openssl-static/include -pipe' 'LDFLAGS=-static -L/openssl-static/lib' 'PKG_CONFIG_PATH=/openssl-static/lib/pkgconfig'
Install type: kickstart-static
    Binary architecture: x86_64
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES
    ACLK:                       YES
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         YES
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    YES
    IPMI:                    NO
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 NO
    Prometheus Remote Write: YES
Debug/Developer Features:
    Trace Allocations:       NO
cursive snow
#

you could try to find any *netdata* file or directory which is not in /opt

livid lichen
#

did, came up empty

#

no ref in yum either

cursive snow
#

trying to take a look at the updater script

#

/opt/netdata is referenced multiple times, but only as the last case checked

#

so it must be finding a file somewhere before that

#

like /etc/netdata or something

#

do you remember the command you used to try to find netdata files ?

livid lichen
#

find / -type d -name *netdata*

#

although

#

I missed one apparently

#

the socket isn't in /opt/netdata/var/run/

#

it's in /run

cursive snow
#

hm I don't have an install on /opt but I'm not shocked about the socket staying in /run

deep bay
#

I think this may be my problem too... totally vanilla install from script.

cursive snow
#

apparently the updater uses the NETDATA_PREFIX env var

#

which should be set somewhere before either by the install script or something

#

or in the xx/etc/netdata/.environment I guess ?

#

indeed should be somewhere there

livid lichen
#
# Created by installer
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
CFLAGS="-static -O2 -I/openssl-static/include -pipe"
LDFLAGS="-static -L/openssl-static/lib"
MAKEOPTS="-j2"
NETDATA_TMPDIR="/tmp"
NETDATA_PREFIX="/opt/netdata"
NETDATA_CONFIGURE_OPTIONS=" --enable-cloud --without-bundled-protobuf --disable-dependency-tracking"
NETDATA_ADDED_TO_GROUPS=" docker nginx varnish haproxy adm nsd proxy squid ceph nobody"
INSTALL_UID="0"
NETDATA_GROUP="netdata"
REINSTALL_OPTIONS=" --disable-telemetry"
RELEASE_CHANNEL="nightly"
IS_NETDATA_STATIC_BINARY="yes"
NETDATA_LIB_DIR="/opt/netdata/var/lib/netdata"
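
For anyone wanting to confirm what prefix is recorded in their own .environment file, the value can be read without sourcing the file (sourcing would also overwrite PATH, CFLAGS and the rest in the current shell). This is a minimal sketch; `read_netdata_prefix` is a hypothetical helper and the demo uses a shortened copy of the file pasted above.

```shell
#!/bin/sh
# Sketch: print the NETDATA_PREFIX recorded in a Netdata .environment file.
# read_netdata_prefix is a hypothetical helper, not part of the updater.
read_netdata_prefix() {
    # Match the assignment with or without surrounding double quotes and
    # print only the value; if the key appears twice, the last one wins.
    sed -n 's/^NETDATA_PREFIX="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1" | tail -n 1
}

# Demo on a scratch copy of (part of) the file shown in the thread.
scratch=$(mktemp -d)
cat > "$scratch/.environment" <<'EOF'
# Created by installer
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
NETDATA_PREFIX="/opt/netdata"
RELEASE_CHANNEL="nightly"
EOF
read_netdata_prefix "$scratch/.environment"   # prints /opt/netdata
rm -rf "$scratch"
```

On the install discussed here the real file would be /opt/netdata/etc/netdata/.environment.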
deep bay
#

Would anyone be able to list the steps for me to correct this on Debian 11?

#

I'm a netdata newb.

livid lichen
#
→ stat /opt/netdata/etc/netdata/.environment
  File: ‘/opt/netdata/etc/netdata/.environment’
  Size: 625             Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2124816     Links: 1
Access: (0644/-rw-r--r--)  Uid: (  997/ netdata)   Gid: (  994/ netdata)
Context: system_u:object_r:usr_t:s0
Access: 2023-02-08 15:04:19.967734933 +0000
Modify: 2023-02-08 04:33:04.553827208 +0000
Change: 2023-02-08 04:33:04.553827208 +0000
 Birth: -
#

I mean

#

it was set in place properly

cursive snow
#

indeed

deep bay
#

Is the updater going to keep breaking it?

livid lichen
#

just don't install it as a monolithic pile of junk into /opt and you should be fine

deep bay
#

I didn't specify an install location, I just ran the kickstart script.

cursive snow
#

@livid lichen it looks like the kickstart script also searches for binaries to define ndprefix: https://github.com/netdata/netdata/blob/master/packaging/installer/kickstart.sh#L860

#

can you extend the find

#

find / -name "*netdata*" -not -path "/opt/*" i guess

livid lichen
#

oh, good find

#

that might actually be the culprit

#

because I symlinked it to /usr/bin at some point for exactly the same reason
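
A stray symlink like this is exactly what a `-type d` search cannot see, since a symlink to a binary is not a directory. The sketch below reproduces that in a scratch tree; `find_outside_opt` is a hypothetical wrapper around the find command suggested above, and the directory layout is invented for the demo.

```shell
#!/bin/sh
# Sketch: why a directories-only search misses a symlinked netdata binary.
# find_outside_opt is a hypothetical helper; the scratch layout is made up.

# List entries named *netdata* under ROOT, skipping everything below ROOT/opt.
# Extra find predicates (for example: -type d) can be passed after ROOT.
find_outside_opt() {
    root=$1; shift
    find "$root" -name '*netdata*' ! -path "$root/opt/*" "$@"
}

scratch=$(mktemp -d)
mkdir -p "$scratch/opt/netdata/usr/sbin" "$scratch/usr/bin"
: > "$scratch/opt/netdata/usr/sbin/netdata"
ln -s "$scratch/opt/netdata/usr/sbin/netdata" "$scratch/usr/bin/netdata"

echo "directories only:"
find_outside_opt "$scratch" -type d   # prints nothing: the symlink is not a directory
echo "any file type:"
find_outside_opt "$scratch"           # prints the usr/bin/netdata symlink
rm -rf "$scratch"
```

Quoting the pattern (`'*netdata*'`) also keeps the shell from expanding it against the current directory before find sees it.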

cursive snow
#

ah so we probably know why the kickstart script does that, but not the updater

#

the updater should use the environment file only though

livid lichen
#

disregard

#

yeeted the link

#

kickstart still needs the --install otherwise it looks at /

cursive snow
livid lichen
#

rest checks out

cursive snow
#

do you have /opt/netdata/etc/netdata/.install-type ?

livid lichen
#

yes

#
INSTALL_TYPE='kickstart-static'
PREBUILT_ARCH='x86_64'
#

which is correct

#

the file is also readable

#

so line 882 can't possibly fail

#

the thing is

#

I don't have environment in the tmpdir that was created

#

err sorry

#

wrong block

#

install-type

cursive snow
#

judging by the detection function, it may be finding something in dpkg

livid lichen
#

it can't

#

it's completely void of anything netdata related

cursive snow
shut acorn
#

@livid lichen hey guys, sorry if it's been mentioned in the thread, what distribution are you running?

livid lichen
#

centos 7

cursive snow
#

hi there 🙂

livid lichen
#

also no

shut acorn
#

hey 🙂

livid lichen
#

rpm comes up empty as well

#

I'm tempted to nuke this entire thing and deal with Docker's issues instead

shut acorn
#

Ok, will try to reproduce this and see..

livid lichen
#

I've also just yeeted the socket and /run/netdata

#

kickstart is still getting confused

shut acorn
#

Can you check the output of grep netdata /etc/passwd ?

livid lichen
#

netdata:x:997:994:netdata:/opt/netdata:/sbin/nologin

shut acorn
#

ok, that's ok, thanks

cursive snow
#

I tried kickstart in a CentOS Docker container, saying no to the yum install: it does install to /opt, and a subsequent dry-run successfully finds the install :/

livid lichen
#

:/

shut acorn
#

Can you run rpm -q netdata ?

livid lichen
#

empty

#

pretty sure this issue is due to some exotic fuckup from earlier updates

shut acorn
#

Something must be left behind, perhaps, yes... Can you run this: PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/cm/.local/bin:/home/cm/bin:/usr/local/bin:/usr/local/sbin command -v netdata ?

#

sorry for the /home/cm, if possible replace with your home although I don't think it matters

livid lichen
#

empty

#
→ find / -maxdepth 2 -name *netdata*
/opt/netdata
#
→ find / -type d -name *netdata*
/opt/netdata
/opt/netdata/etc/netdata
/opt/netdata/var/run/netdata
/opt/netdata/var/cache/netdata
/opt/netdata/var/lib/netdata
/opt/netdata/var/log/netdata
/opt/netdata/usr/share/netdata
/opt/netdata/usr/libexec/netdata
/opt/netdata/usr/lib/netdata
#

(I cleaned up /tmp just before that)

shut acorn
#

ok, it must be some kind of bug, can't reproduce it though locally on a centos 7 here.....

livid lichen
#

ah

#

now that I removed all the remnants from /tmp

#

kickstart works

#

however, running it still produces a faulty systemd service

shut acorn
#

by "works" you mean it detects the /opt install but the service is wrong?

livid lichen
#

yes

#

so basically

#

we're back at the initial issue that prompted me to create this thread in the first place

#

if I delete the netdata.service from /usr/lib/systemd/system/, will kickstart recreate it?

shut acorn
#

it should yes, will do the same from scratch

#

well, you mean deleting just the service file? No then, I don't think it will

#

yeah, i think i see it now, the service file indeed is problematic

#

ok, we will fix it, not sure though why it has come up now..

cursive snow
shut acorn
#

perhaps, yes

livid lichen
#

where is the service generated?

#

I can't seem to find it

shut acorn
#

the service files are in system/

#

(in our source)

#

basically in the static install they are in: /opt/netdata/usr/lib/netdata/system/

#

we have one for v235 of systemd, and one other for >235

#

but the v235 one is problematic, since it's statically listing /usr/sbin/netdata

cursive snow
#

that's a good catch ^^

livid lichen
#

that explains it

#

I'm hardstuck on 219
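
To see which side of that split a given host falls on, the systemd version can be parsed from the first line of `systemctl --version`. The sketch below is illustrative only: both helper names are hypothetical, the labels `v235` and `default` are invented, and the cut-off (versions up to and including 235 get the v235 unit) is an assumption based on the description in this thread, not on the installer source.

```shell
#!/bin/sh
# Sketch: classify a systemd version against the 235 split described above.
# Helper names, labels and the exact cut-off are assumptions for illustration.

# Read `systemctl --version` output on stdin and print the numeric version.
parse_systemd_version() {
    sed -n '1s/^systemd \([0-9][0-9]*\).*/\1/p'
}

# Print which unit variant would apply under the assumed cut-off.
unit_variant_for() {
    if [ "$1" -le 235 ]; then
        echo "v235"
    else
        echo "default"
    fi
}

# Demo with canned input, so systemctl itself is not required.
printf 'systemd 219\n+PAM +AUDIT +SELINUX\n' | parse_systemd_version   # prints 219
unit_variant_for 219   # prints v235
unit_variant_for 252   # prints default
```

On a live system the first helper would be fed with `systemctl --version | parse_systemd_version`.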

astral vigil
#

The above-mentioned fix is now merged, so it should go out in tonight’s nightlies.

Additionally, we’re probably going to have a v1.38.1 patch release next week including this and a handful of other bug fixes (I’m trying to push internally for us to do more patch releases so that people using stable releases get bug fixes like this quicker).

livid lichen
#

I'm not on nightly, but I've disabled auto update due to this