that seems correct behaviour to me if | Home Assistant | Page 1

honest moon Jun 30, 2023, 12:25 AM

#

No, this running total sums differences between the data points, not the data points themselves. i.e. the 1st data point should never yield a result to be added to the running total, it merely primes the starting point for the 1st difference. Only when we retrieve the 2nd data point we can determine the 1st difference to be added to the running total. In the cases I've observed however the 1st data point's value was added to the running total, in some cases leading to an abnormal high running total for specific time periods. It depends on the actual value in that 1st data point how well it can be seen: if the 1st data point's value is close to 0, you'll hardly notice it on the running total. If however f.e. it is a value from an energy meter that has been running for years before that 1st data point, the difference will be significant. Since I only switched from Domoticz over to HA in March of this year, most of my energy meters started at non-zero values in HA and I see this happen a lot.

#

I have to admit though that I may have jumped to conclusions in reporting this to occur in all standard energy cards: I do sometimes notice abnormal spikes when using the standard energy cards too, but I could not yet exactly determine what causes these. It became clear to me what exactly was happening when I was working with MindFreeze's Sankey Chart Card from HACS (https://github.com/MindFreeze/ha-sankey-chart). Because this card can show the data in much detail, the results of this card can easily be verified by simply adding the numbers up and I saw the results didn't add up in some cases.
It could thus be that my observation applies only to that specific sankey card. However, the spikes in the standard energy cards always happen when selecting the same entities and period that I can see the error in the sankey card occurring on. i.e. it seems to me that either the erroneous code is shared between the sankey card and the standard cards or it has been copied over by MindFreeze from the standard card's code. I'm a professional on MS-SQL and have some (decades old) C experience and a little bit of C++ experience (also from 20+ years ago), but 'modern' front-end languages like javascript and python are rather new to me, so I have some difficulties reading the code of HA, let alone debug or alter it. I need someone that has more experience with this to have a look to confirm or disapprove my findings...

honest moon Jun 30, 2023, 12:49 AM

#

I'm collecting some data for you to see what I mean.

honest moon Jun 30, 2023, 2:17 AM

#

https://imgur.com/a/KxXOhCH As an example, here's one day's overview that can't be right, in this case because the device that reports into 'Keuken: Spots: Verbuik' is a MCOHome MH-DT411 which does support energy metering, but if you activate metering reports on it, it quickly floods the z-wave mesh by sending too much messages in a short period with no way to properly tune this. So I disabled metering in its device parameters. And since then it has sat on 3.47kWh. It should show as 0.0kWh energy used each and every day in the reports.
Next I will include a text file that has the queries plus their results from the statistics and statistics_short_term tables for this device in this period to show that all of the data points actually have the value of 3.47000000000022 in their sum column. The running total of the differences should therefor be 0 for the time period of June 22. The usage for this device is however listed as that initial value of 3.5kWh (= 3.4700000000022 rounded to 1 decimal) on this day. For other days it is correctly listed as 0kWh, and also when I select the week that has June 22 in it, the device is listed as 0kWh. Whereas if there realy was an entry of 3.5-ish kWh in that range, the running total for the week should at least be 3.5kWh too. However, only when I select DAY + June 22 exactly, the 3.5kWh usage shows.

honest moon Jun 30, 2023, 2:38 AM

#

https://drive.google.com/file/d/1GFc6E1DIjoVIpz2vYHt9FhzDFLGdmFTh/view?usp=sharing

#

Coming to think of it: why does it only happen on this June 22nd? There IS data available before that day, so why does it only on this day include that initial 3.5 value into the running total and not the expected difference of 0.0? If I select June 21st or June 23rd or any other day or even the week or month including June 22nd, the correct 0.0 is shown?

honest moon Jun 30, 2023, 3:45 AM

#

Ah, there is a gap in the data: last entry was at 21-06-2023 13:00:00.000, whereas the first entry on the 22nd was only at 22-06-2023 08:00:00.000.

#

short term statistics has the same gap: from 21-06-2023 13:50:00.000 to 22-06-2023 08:05:00.000.

#

So it seems to me that in the case where no data is available at the start of the running total's period, an incorrect starting value is used.

#

This could easily be solved by using a bit more sophisticated queries. One can retrieve the current sum and the sum from the previous row in one statement using the lag() operator. I have however no idea if it is possible to incorporate this into HA somehow...
The statement would look like this: select strftime('%d-%m-%Y %H:%M:%f', s.start_ts, 'unixepoch') as start_dt, s.*, lag(s.sum, 1) over (partition by s.metadata_id order by s.start_ts) as sum_previous from statistics s where s.metadata_id = 139 and s.start_ts >= (julianday('2023-06-21T00:00:00.000') - 2440587.5)*86400.0 and s.start_ts <= (julianday('2023-06-23T00:00:00.000') - 2440587.5)86400.0 order by s.start_ts limit 1000;
And to add a fix for this "initial datapoint issue" we could then easily add a coalesce() to make the 1st row return the same sum value for sum_previous, so that the 1st row's difference is always 0:
select strftime('%d-%m-%Y %H:%M:%f', s.start_ts, 'unixepoch') as start_dt, s., coalesce(lag(s.sum, 1) over (partition by s.metadata_id order by s.start_ts), s.sum) as sum_previous from statistics s where s.metadata_id = 139 and s.start_ts >= (julianday('2023-06-21T00:00:00.000') - 2440587.5)*86400.0 and s.start_ts <= (julianday('2023-06-23T00:00:00.000') - 2440587.5)*86400.0 order by s.start_ts limit 1000;

#

With
select strftime('%d-%m-%Y %H:%M:%f', s.start_ts, 'unixepoch') as start_dt, s.*, coalesce(lag(s.sum, 1) over (partition by s.metadata_id order by s.start_ts), s.sum) as sum_previous, s.sum - coalesce(lag(s.sum, 1) over (partition by s.metadata_id order by s.start_ts), s.sum) as sum_diff from statistics s where s.metadata_id = 118 and s.start_ts >= (julianday('2023-06-21T00:00:00.000') - 2440587.5)*86400.0 and s.start_ts <= (julianday('2023-06-23T00:00:00.000') - 2440587.5)*86400.0 order by s.start_ts limit 1000;
being the complete query to retrieve both the previous sum and the difference between the current and previous sums. So no one can make an error in that too, like conversion or rounding issues etc.

honest moon Jun 30, 2023, 8:23 AM

#

I found HA uses SQLAlchemy for constructing its sql statements. Out of curiosity now reading to see if it's possible to add fetching a previous row's value.

honest moon Jun 30, 2023, 8:41 AM

#

Alchemist at least does support calling window functions: https://docs.sqlalchemy.org/en/20/tutorial/data_select.html#using-window-functions

#

It seems it depends on how sqlalchemy is used by HA, as an ORM or as a SQL statement generator, both are possible (https://docs.sqlalchemy.org/en/20/tutorial/data_select.html#using-window-functions). So I'll do have to go through the HA code to answer my own curiosity 🙂

fathom mantle Jun 30, 2023, 2:35 PM

#

Can't really follow what you're doing but if you think there is a bug then you should make an issue on github

honest moon Jun 30, 2023, 3:59 PM

#

yes, I'm pretty sure there is a bug. However not sure where exactly it is; in the energy cards, in the sankey card or in HA. So trying to find that first.

#

the sqlalchemy is just my curiosity 😉

#that seems correct behaviour to me if