Standardized Invoke Benchmark | Invoke

languid kite Oct 7, 2023, 9:55 PM

#

User-submitted benchmarks could use a standardized seed and prompt per model. The benchmark could collect CPU, GPU, and other relevant system settings and usage.

After the series of benchmarks is complete, the user could choose to submit the results to Invoke.

The end result is a database of hardware benchmarks, which would provide insights for development and end users.
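
A minimal sketch of what one benchmark record could look like, assuming a fixed seed and prompt per model. The `BenchmarkSpec` class, the model name, the seed, the prompt, and the step count are all hypothetical placeholders, not anything Invoke defines. Only standard-library calls are used, so GPU details are left out here.

```python
import os
import platform
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchmarkSpec:
    """Fixed settings so every submission runs the same workload (hypothetical)."""
    model: str
    seed: int
    prompt: str
    steps: int


# Placeholder values for illustration only.
SPECS = {
    "example-model-a": BenchmarkSpec(
        model="example-model-a",
        seed=12345,
        prompt="a photo of a lighthouse at dusk",
        steps=30,
    ),
}


def collect_system_info() -> dict:
    """Collect basic host details using only the standard library."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some systems
        "logical_cpus": os.cpu_count(),
        "python": platform.python_version(),
    }


def build_report(model: str, seconds: float) -> dict:
    """Bundle the fixed spec, the host details, and the measured time."""
    spec = SPECS[model]
    return {
        "spec": asdict(spec),
        "system": collect_system_info(),
        "seconds": seconds,
    }
```

The user would review a report like this locally and only then decide whether to submit it.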

lone ravine Oct 8, 2023, 12:08 AM

#

It's a good idea, but identifying all the contributing settings would be challenging.

#

For example, sometimes just an NVIDIA driver update of specific DLLs can bring drastic performance improvements, so this benchmark would need to identify all of the drivers that have an impact and get their DLL versions.

#

for example: https://discord.com/channels/1020123559063990373/1111430198370500709

languid kite Oct 8, 2023, 1:50 PM

#

@lone ravine Shouldn't the DLLs correspond to the driver and Python package versions? If so, that should capture most of the settings and software that affect performance. It should be trivial to capture pip package and graphics driver versions.
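
A rough sketch of capturing both, assuming the benchmark runs inside the same Python environment as the app and that `nvidia-smi` is on the PATH for NVIDIA cards. The function names are hypothetical. This records package and driver versions only; it would not notice DLLs that were swapped by hand inside a package.

```python
import shutil
import subprocess
from importlib import metadata


def pip_package_versions() -> dict:
    """Map installed distribution names to versions for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            versions[name] = dist.version
    return versions


def parse_smi(output: str) -> list:
    """Parse 'name, driver_version' CSV lines as printed by nvidia-smi."""
    gpus = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(parts):
            gpus.append({"name": parts[0], "driver_version": parts[1]})
    return gpus


def nvidia_driver_info() -> list:
    """Return one entry per GPU, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_smi(result.stdout)
```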

However, I don't think everything has to be captured initially for this to be actually useful.

lone ravine Oct 8, 2023, 8:03 PM

#

languid kite @lone ravine Shouldn't the DLLs correspond to driver and python package...

In the past, the best drivers related to Python packages were terrible for the 40 series. Upgrading the DLLs gave 2-3x perf gains for the 40 series, but hurt 30 series perf. So which DLLs should the product ship with? It's not a trivial thing to ensure the ideal version config for every card.

#

And again, I'm not saying it's not possible. I'm saying there are a ton of variables and capturing all of them will be challenging, so you may get terrible perf reports from misconfigured installations. This would be fine for "best case" peak metrics as long as at least some of the reports come from well-configured installs, but you may not be able to produce a reliable "average" expectation for typical use.

#

I had a discussion with @broken elbow about this some time in the past; he may have thoughts here.

broken elbow Oct 8, 2023, 9:27 PM

#

I do now have access to a server with a hypervisor setup, so I could play around with spinning up a new server for something like this, but my current server is a consumer computer so idfk what kind of performance I could push with the analysis.

upbeat dove Oct 12, 2023, 12:40 PM

#

It may not be scientific, but it could still be useful.

#

For example, a 4090 user may realize they're not getting the most out of their card if another 4090 user has done all the configs and reported "insane speeds"

broken elbow Oct 14, 2023, 11:01 PM

#

I could mess around with a simple Python script that yeets some benchmark data to a server automatically, but I don't think I should use my current server for any form of actual deployment as it is hosted at my house :P
I haven't ever done that type of data processing before though, so it'll definitely require some learning on my part. I bet the benchmark itself would be relatively easy though. If we collected all the configs at runtime, we could simply host a site that lists all the documented card types and their highest-performing config. Obviously the data would be skewed by inconsistent climate and over/underclocking, but it would still likely prove useful.
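
A small sketch of the "highest performing config per card" listing, assuming each submitted report is a dict with hypothetical `gpu`, `its_per_sec`, and `config` fields. It also reports the median, since the peak alone says nothing about a typical install.

```python
from collections import defaultdict
from statistics import median


def summarize_by_gpu(reports: list) -> dict:
    """Group reports by GPU model and pick the fastest config for each."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["gpu"]].append(report)

    summary = {}
    for gpu, rows in grouped.items():
        # Higher iterations per second is better.
        best = max(rows, key=lambda r: r["its_per_sec"])
        summary[gpu] = {
            "best_its_per_sec": best["its_per_sec"],
            "best_config": best["config"],
            "median_its_per_sec": median(r["its_per_sec"] for r in rows),
            "reports": len(rows),
        }
    return summary
```

A site could render this summary directly as one row per card type.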

worldly leaf Oct 15, 2023, 8:05 PM

#

My opinion on how it could look: a benchmark upload from a user could contain the CPU model, frequency, and number of cores visible to the system (inside a VM this can be less or more than reality), plus the GPU model, frequency, and VRAM. Store whether the computation ran on GPU or CPU.
Run a prompt with a specific model, specified steps, settings, etc.
My suggestion: store something like a hw-id, as one user might flood the statistics and skew them for everyone. Instead, if possible, store the measurements and have a user replace their previous measurement, or at least lock it to the hardware. If you think about fairness, you might see possible pitfalls.
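
A minimal sketch of the hw-id idea, assuming the id is a hash of a few hardware strings and that a new submission from the same hardware overwrites the old one. `hardware_id`, `BenchmarkStore`, and the `machine_id` input are hypothetical, and an in-memory dict stands in for a real database.

```python
import hashlib


def hardware_id(cpu_model: str, gpu_model: str, machine_id: str) -> str:
    """Derive a stable id from hardware strings without storing them raw."""
    raw = "|".join([cpu_model, gpu_model, machine_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class BenchmarkStore:
    """Keeps at most one result per (hardware id, model) pair."""

    def __init__(self) -> None:
        self._rows = {}

    def submit(self, hw_id: str, model: str, result: dict) -> None:
        # A repeat submission replaces the earlier one instead of adding a row,
        # so a single machine cannot flood the statistics.
        self._rows[(hw_id, model)] = result

    def results_for(self, model: str) -> list:
        """Return every stored result for one benchmark model."""
        return [r for (_, m), r in self._rows.items() if m == model]
```

This does not stop someone from spoofing the inputs to the hash, so it limits accidental flooding more than deliberate abuse.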
