#Oh I think I found the reason why people
1 messages ยท Page 1 of 1 (latest)
I think the problem is not that it hides the first entry, because that is intentional. I think the problem is that it doesn't read back far enough to find the previous value for the first actual reading of the period. The first actual reading being the end of the first period: i.e. if the time span would be a day showing hourly values, the first entry would be 01:00, as this is the usage between 00:00 - 01:00. For that it would need to read the sum from the 01:00 and the 00:00 rows. If a 00:00 row is not available, it should not show a value for 01:00, and so on. My queries take whatever row was the last row before the current row, but this could potentially include usage for previous hours as well.
The following query should list all available values for yesterday, so you can verify your assumptions:
select strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and se.metadata_id = (select id from statistics_meta where statistic_id like 'sensor.%koelkast%kwh') order by se.start_ts;
For me this results in:
datetime|change
2023-09-14 01:00:00.000|0.000599999999999934
2023-09-14 02:00:00.000|0.000499999999999945
2023-09-14 03:00:00.000|0.000599999999999934
2023-09-14 04:00:00.000|0.000600000000000156
2023-09-14 05:00:00.000|0.000599999999999934
2023-09-14 06:00:00.000|0.000599999999999934
2023-09-14 07:00:00.000|0.000500000000000167
2023-09-14 08:00:00.000|0.000599999999999934
2023-09-14 09:00:00.000|0.000599999999999934
2023-09-14 10:00:00.000|0.000599999999999934
2023-09-14 11:00:00.000|0.000600000000000156
2023-09-14 12:00:00.000|0.000499999999999945
2023-09-14 13:00:00.000|0.000599999999999934
2023-09-14 14:00:00.000|0.000600000000000156
2023-09-14 15:00:00.000|0.000599999999999934
2023-09-14 16:00:00.000|0.000599999999999934
2023-09-14 17:00:00.000|0.000499999999999945
2023-09-14 18:00:00.000|0.000500000000000167
2023-09-14 19:00:00.000|0.000599999999999934
2023-09-14 20:00:00.000|0.000499999999999945
2023-09-14 21:00:00.000|0.000600000000000156
2023-09-14 22:00:00.000|0.000599999999999934
2023-09-14 23:00:00.000|0.000599999999999934
2023-09-15 00:00:00.000|0.000599999999999934
I think this is the correct range to show for September 14th: sept 14th, 01:00 up until sept 15th 00:00. The fact that this may feel odd is because how the data gets stored with the end-time of the period.
Using the above query you'll get only available and valid "change" values. If a row is missing, the next hour will not be returned (the inner join will not find the previous hour's row) and a change value will only be listed if both the row for t and t-1 are available.
Although this does properly show values only at the hours that valid pairs exist for, it may lose some of the total usage for the day: the change values shown are exactly what was used during that specific hour. Any usage in the not-shown column(s) is however not shown. This is why I had the earlier versions of my queries search for the "previous row", as opposed to the "previous hour's row". Searching for the previous row can result in (for example) one hour missing from the graph, but the next hour would still include that missing hour's usage with the usage of the next available hour after that. I personally think that is better: the sum of all hours listed will match the total for the day, whereas if you only list the exact usage per hour and leave out any incomplete pairs (= match on previous hour), the sum of all listed hours may be less than the total for the day when a gap exists within that day.
b.t.w. to find any statistic that has a gap in yesterday's hourly statistics, you can use this query:
select m.statistic_id, strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics_meta m on m.id = se.metadata_id and m.has_sum = 1 left outer join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and ss.metadata_id is null order by se.start_ts;
For me that resulted in:
statistic_id|datetime|change
sensor.balkon_regen_meter_water_consumption_cubic_meters|2023-09-14 02:00:00.000|
sensor.washok_ups_device_electric_consumption_kwh|2023-09-14 05:00:00.000|
sensor.slaapkamer_angelique_electric_consumption_kwh|2023-09-14 11:00:00.000|
So I put that last one in the query that lists all change values based on the "match on previous hour" query:
select strftime( '%Y-%m-%d %H:%M:%f', se.start_ts, 'unixepoch') as [datetime], se.[sum] - ss.[sum] as [change] from statistics se inner join statistics ss on ss.metadata_id = se.metadata_id and ss.start_ts = strftime('%s', se.start_ts, 'unixepoch', '-1 hour') where se.start_ts > strftime('%s', 'now', 'start of day', '-1 day') and se.start_ts <= strftime('%s', 'now', 'start of day') and se.metadata_id = (select id from statistics_meta where statistic_id like 'sensor.slaapkamer_angelique_electric_consumption_kwh') order by se.start_ts;
datetime|change
2023-09-14 12:00:00.000|0.00159999999999982
2023-09-14 13:00:00.000|0.00170000000000048
2023-09-14 14:00:00.000|0.00159999999999982
2023-09-14 15:00:00.000|0.00169999999999959
2023-09-14 16:00:00.000|0.00159999999999982
2023-09-14 17:00:00.000|0.00170000000000048
2023-09-14 18:00:00.000|0.00159999999999982
2023-09-14 19:00:00.000|0.00170000000000048
2023-09-14 20:00:00.000|0.00159999999999982
2023-09-14 21:00:00.000|0.00169999999999959
2023-09-14 22:00:00.000|0.00159999999999982
2023-09-14 23:00:00.000|0.00170000000000048
2023-09-15 00:00:00.000|0.00159999999999982
You see that any usage before 11:00 is not listed, but more importantly: the usage listed at 12:00 is exactly that what was used between 11:00 - 12:00. Any usage before 11:00 is not included in any graph, even though we do know the total value: it is the change between the previous row and the 11:00 row.
I'll point out I'm not really talking about SQL queries here, these are all done with recorder API calls.
I know, and that's exactly where the error is. That's what I'm trying to demonstrate here...
ok. Actually it seems the statistics platform already returns the change since the previous datapoint for you automatically, so it should not be necessary to worry about "looking back further".
I haven't seen the exact piece of code, but I think you are right. as the standard energy cards do not show the same (erroneous) behavior that the sankey-chart-card does.
e.g. here's a sensor I made "unavailable" for a few hours and turned it back on:
has a 448kWh spike
when I look at statistics for this time, I see nothing relevant
448kWh? That seems a bit high to me for a few hours. Do you have some factory in your back yard? ๐
it's all synthetic/fake ๐
but look at what data is actually returned from the API query:
the dialog only shows points 1 and 2
ignores 0
can you run my query on your data, just to humour me? I think you could be on to something there: if the sql query does not show that value, the API must have an error.
I think the only error here is the frontend decided not to show point #0, since it thinks it needs to calculate the change itself. it doesn't seem to be aware of this change variable, it just subtracts sum[1] - sum[0]
Yes, and I think it (somehow) incorrectly includes the [sum] instead of the change into cummulated values.
But only under some conditions...
If your fake sensor has 5kWh per hour, the total for that peak should have been 35kWh, not 448kWh.
ok, I just looked at the hight of the bars when the sensor was still available. the 1st bar after the gap should be the sum of the missing bars plus that bar itself, right?
yeah in this case that's correct
? what do you mean "it is correct"? The total of 448 is correct or my statement is corerect? ๐
I think both. you're right that that bar is just the accumulated usage since I made the statistic unavailable
I'm not saying there's something wrong with the bar, I'm just pointing out why it doesn't show up in the adjust dialog
OK, but 448 is not the sum of the missing bars: the sum of those missing bars is only 35kWh...
each "missing bar" is 60kWh
๐ณ , I see now. You're right. So, 448 IS correct.
I understand now: The logic in the API correctly searches for the previous row, whereas the logic in the AAS-dialog searches for the previous hour's row. Do you think you can fix that? (Maybe have the dialog call the API correctly?)
If you work on that dialog, could you have a look at the rounding of the values shown too? For small values (> 0.005 and < -0.005) the dialog shows 0, and it won't let you store an exact 0 value, as it seems to check the newly entered value against the rounded value, not the actual value read when deciding whether or not to write the newly entered value.
Here is dialog after minor change:
I could take a look at rounding, probably makes sense to do that in a separate PR though
makes sense (and avoids getting either change rejected)
I'm not sure what the history was of deciding two decimal places was
I'm not so much interested in changing the number of decimals, but more that it doesn't write the newly entered value if the newly entered value matches the rounded value instead of the actual value. i.e. you can't update any value that is between 0.005 and -0.005 to 0.0.
ah hm yeah that seems bad
Your change does seem to make it better indeed ๐
What happens in your dialog when you change the time to for example 1:00:00 pm?
(just checking for edge cases)
shows a single hourly datapoint, 1:00pm (-2:00pm), 448kwh
Do you think that is correct? As it showed that value for 1:55PM previously. Should it not show that 448 only for this 1:55PM row? I mean, should it not show no rows at all when you select 01:00:00 ?
I think it is. there's two separate queries happening, it queries both hourly statistics and 5 minute statistics.
If it finds 5 minute statistics it shows them, if not it shows the hourly statistics
since there's no 5 minutes within +/- 10 minutes of 1:00pm, it shows the hourly statistics at +/- 2 hours of 1:00pm
really? I see that this correctly applies your new logic, but I think that feature could potentially give new errors; it feels like that was meant for when you query for a period where the short term stats were cleaned and could give unexpected results when a gap exists in the 5 minute stats... But that's not something for now ๐
Actually, there was no data collected in the 00:00:00 - 01:00:00 period according to the graph you showed, was there? Or did you delete some data from the short term stats and the LTS are still completely there?
no there was nothing collected from 12pm-1pm. I never actively deleted anything, I just forced the entity that was collecting the statistics to unavailable
so it stopped collecting statistics for a few hours