Code Test Feeback | Rust Programming Language Community | Page 1

The style of solution seems unusual for Rust, you're not making use of iterators even when they're appropriate. I think the clearest solution here would be;

data_stream.map(TypeTotals).fold(...)

But it depends on how well you can express combining two TypeTotal objects.

Ending up with an object expressing the types and their totals.

Hashmaps might not get you the best performance here, but swapping the default out for hashbrown usually helps as you're unlikely to be concerned about hash attacks.

You're also deserialising the entire JSON, but you only care about the value of the field "type", so you're wasting effort on each read. Serde will ignore unknown fields, so just deserialise to a struct and handle deserialise failures.

For your cases with a non string entry you can either fail, or skip. This should be a CLI option to the user. Default failure, "--skip-malformed" or similar for bad entries.

You can also go for divide and conquer on larger files with multiprocessing tools like rayon. Start by dividing the file into chunks with memory mapping, then each process only reads one chunk, aggregating your totals at the end. This requires some balancing if the effort in memory mapping is similar to your data processing, as your setup time can exceed the time saved. Worth benchmarking.

What I really don't like is how you're limiting yourself to files across the process. Your library should be accepting a stream of data, and should ignore the notion of files, except if you're pursuing the memory mapped strategy above, where the file is divided into multiple streams, with your library still working over each stream.

#Code Test Feeback