#Oh I think I found the reason why people

1 messages ยท Page 1 of 1 (latest)

copper thicket
#

I think the problem is not that it hides the first entry, because that is intentional. I think the problem is that it doesn't read back far enough to find the previous value for the first actual reading of the period. The first actual reading being the end of the first period: i.e. if the time span would be a day showing hourly values, the first entry would be 01:00, as this is the usage between 00:00 - 01:00. For that it would need to read the sum from the 01:00 and the 00:00 rows. If a 00:00 row is not available, it should not show a value for 01:00, and so on. My queries take whatever row was the last row before the current row, but this could potentially include usage for previous hours as well.
The following query should list all available values for yesterday, so you can verify your assumptions:

select strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and se.metadata_id = (select id from statistics_meta where statistic_id like 'sensor.%koelkast%kwh') order by se.start_ts;
#

For me this results in:

datetime|change
2023-09-14 01:00:00.000|0.000599999999999934
2023-09-14 02:00:00.000|0.000499999999999945
2023-09-14 03:00:00.000|0.000599999999999934
2023-09-14 04:00:00.000|0.000600000000000156
2023-09-14 05:00:00.000|0.000599999999999934
2023-09-14 06:00:00.000|0.000599999999999934
2023-09-14 07:00:00.000|0.000500000000000167
2023-09-14 08:00:00.000|0.000599999999999934
2023-09-14 09:00:00.000|0.000599999999999934
2023-09-14 10:00:00.000|0.000599999999999934
2023-09-14 11:00:00.000|0.000600000000000156
2023-09-14 12:00:00.000|0.000499999999999945
2023-09-14 13:00:00.000|0.000599999999999934
2023-09-14 14:00:00.000|0.000600000000000156
2023-09-14 15:00:00.000|0.000599999999999934
2023-09-14 16:00:00.000|0.000599999999999934
2023-09-14 17:00:00.000|0.000499999999999945
2023-09-14 18:00:00.000|0.000500000000000167
2023-09-14 19:00:00.000|0.000599999999999934
2023-09-14 20:00:00.000|0.000499999999999945
2023-09-14 21:00:00.000|0.000600000000000156
2023-09-14 22:00:00.000|0.000599999999999934
2023-09-14 23:00:00.000|0.000599999999999934
2023-09-15 00:00:00.000|0.000599999999999934
#

I think this is the correct range to show for September 14th: sept 14th, 01:00 up until sept 15th 00:00. The fact that this may feel odd is because how the data gets stored with the end-time of the period.

copper thicket
#

Using the above query you'll get only available and valid "change" values. If a row is missing, the next hour will not be returned (the inner join will not find the previous hour's row) and a change value will only be listed if both the row for t and t-1 are available.
Although this does properly show values only at the hours that valid pairs exist for, it may lose some of the total usage for the day: the change values shown are exactly what was used during that specific hour. Any usage in the not-shown column(s) is however not shown. This is why I had the earlier versions of my queries search for the "previous row", as opposed to the "previous hour's row". Searching for the previous row can result in (for example) one hour missing from the graph, but the next hour would still include that missing hour's usage with the usage of the next available hour after that. I personally think that is better: the sum of all hours listed will match the total for the day, whereas if you only list the exact usage per hour and leave out any incomplete pairs (= match on previous hour), the sum of all listed hours may be less than the total for the day when a gap exists within that day.

#

b.t.w. to find any statistic that has a gap in yesterday's hourly statistics, you can use this query:

select m.statistic_id, strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics_meta m on m.id = se.metadata_id and m.has_sum = 1 left outer join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and ss.metadata_id is null order by se.start_ts;
#

For me that resulted in:

statistic_id|datetime|change
sensor.balkon_regen_meter_water_consumption_cubic_meters|2023-09-14 02:00:00.000|
sensor.washok_ups_device_electric_consumption_kwh|2023-09-14 05:00:00.000|
sensor.slaapkamer_angelique_electric_consumption_kwh|2023-09-14 11:00:00.000|
#

So I put that last one in the query that lists all change values based on the "match on previous hour" query:

select strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and se.metadata_id = (select id from statistics_meta where statistic_id like 'sensor.slaapkamer_angelique_electric_consumption_kwh') order by se.start_ts;
datetime|change
2023-09-14 12:00:00.000|0.00159999999999982
2023-09-14 13:00:00.000|0.00170000000000048
2023-09-14 14:00:00.000|0.00159999999999982
2023-09-14 15:00:00.000|0.00169999999999959
2023-09-14 16:00:00.000|0.00159999999999982
2023-09-14 17:00:00.000|0.00170000000000048
2023-09-14 18:00:00.000|0.00159999999999982
2023-09-14 19:00:00.000|0.00170000000000048
2023-09-14 20:00:00.000|0.00159999999999982
2023-09-14 21:00:00.000|0.00169999999999959
2023-09-14 22:00:00.000|0.00159999999999982
2023-09-14 23:00:00.000|0.00170000000000048
2023-09-15 00:00:00.000|0.00159999999999982
#

You see that any usage before 11:00 is not listed, but more importantly: the usage listed at 12:00 is exactly that what was used between 11:00 - 12:00. Any usage before 11:00 is not included in any graph, even though we do know the total value: it is the change between the previous row and the 11:00 row.

terse pawn
#

I'll point out I'm not really talking about SQL queries here, these are all done with recorder API calls.

copper thicket
#

I know, and that's exactly where the error is. That's what I'm trying to demonstrate here...

terse pawn
#

ok. Actually it seems the statistics platform already returns the change since the previous datapoint for you automatically, so it should not be necessary to worry about "looking back further".

copper thicket
#

I haven't seen the exact piece of code, but I think you are right. as the standard energy cards do not show the same (erroneous) behavior that the sankey-chart-card does.

terse pawn
#

e.g. here's a sensor I made "unavailable" for a few hours and turned it back on:

#

has a 448kWh spike

#

when I look at statistics for this time, I see nothing relevant

copper thicket
#

448kWh? That seems a bit high to me for a few hours. Do you have some factory in your back yard? ๐Ÿ™‚

terse pawn
#

it's all synthetic/fake ๐Ÿ˜›

#

but look at what data is actually returned from the API query:

#

the dialog only shows points 1 and 2

#

ignores 0

copper thicket
#

can you run my query on your data, just to humour me? I think you could be on to something there: if the sql query does not show that value, the API must have an error.

terse pawn
#

I think the only error here is the frontend decided not to show point #0, since it thinks it needs to calculate the change itself. it doesn't seem to be aware of this change variable, it just subtracts sum[1] - sum[0]

copper thicket
#

Yes, and I think it (somehow) incorrectly includes the [sum] instead of the change into cummulated values.

#

But only under some conditions...

#

If your fake sensor has 5kWh per hour, the total for that peak should have been 35kWh, not 448kWh.

terse pawn
#

those are 5 minute statistics, not hourly

#

so that's 5kWh per 5 minutes

copper thicket
#

ok, I just looked at the hight of the bars when the sensor was still available. the 1st bar after the gap should be the sum of the missing bars plus that bar itself, right?

terse pawn
#

yeah in this case that's correct

copper thicket
#

? what do you mean "it is correct"? The total of 448 is correct or my statement is corerect? ๐Ÿ˜‰

terse pawn
#

I think both. you're right that that bar is just the accumulated usage since I made the statistic unavailable

#

I'm not saying there's something wrong with the bar, I'm just pointing out why it doesn't show up in the adjust dialog

copper thicket
#

OK, but 448 is not the sum of the missing bars: the sum of those missing bars is only 35kWh...

terse pawn
#

each "missing bar" is 60kWh

copper thicket
#

๐Ÿ˜ณ , I see now. You're right. So, 448 IS correct.

copper thicket
#

If you work on that dialog, could you have a look at the rounding of the values shown too? For small values (> 0.005 and < -0.005) the dialog shows 0, and it won't let you store an exact 0 value, as it seems to check the newly entered value against the rounded value, not the actual value read when deciding whether or not to write the newly entered value.

terse pawn
#

Here is dialog after minor change:

#

I could take a look at rounding, probably makes sense to do that in a separate PR though

copper thicket
terse pawn
#

I'm not sure what the history was of deciding two decimal places was

copper thicket
#

I'm not so much interested in changing the number of decimals, but more that it doesn't write the newly entered value if the newly entered value matches the rounded value instead of the actual value. i.e. you can't update any value that is between 0.005 and -0.005 to 0.0.

terse pawn
#

ah hm yeah that seems bad

copper thicket
#

What happens in your dialog when you change the time to for example 1:00:00 pm?

#

(just checking for edge cases)

terse pawn
#

shows a single hourly datapoint, 1:00pm (-2:00pm), 448kwh

copper thicket
#

Do you think that is correct? As it showed that value for 1:55PM previously. Should it not show that 448 only for this 1:55PM row? I mean, should it not show no rows at all when you select 01:00:00 ?

terse pawn
#

I think it is. there's two separate queries happening, it queries both hourly statistics and 5 minute statistics.

If it finds 5 minute statistics it shows them, if not it shows the hourly statistics

#

since there's no 5 minutes within +/- 10 minutes of 1:00pm, it shows the hourly statistics at +/- 2 hours of 1:00pm

copper thicket
#

really? I see that this correctly applies your new logic, but I think that feature could potentially give new errors; it feels like that was meant for when you query for a period where the short term stats were cleaned and could give unexpected results when a gap exists in the 5 minute stats... But that's not something for now ๐Ÿ˜‰

#

Actually, there was no data collected in the 00:00:00 - 01:00:00 period according to the graph you showed, was there? Or did you delete some data from the short term stats and the LTS are still completely there?

terse pawn
#

no there was nothing collected from 12pm-1pm. I never actively deleted anything, I just forced the entity that was collecting the statistics to unavailable

#

so it stopped collecting statistics for a few hours

copper thicket
#

then, why is there an entry of 448kWh for 01pm?

#

in the LTS?

#

shouldn't that be collected at 02PM?

terse pawn
#

"1pm" is the growth between 1pm and 2pm

#

the sensor became available again at ~1:57pm and first got picked up in the 5 minute statistics for 1:55pm - 2:00pm