#Code Test Feeback

4 messages · Page 1 of 1 (latest)

spark thunder
#

The style of solution seems unusual for Rust, you're not making use of iterators even when they're appropriate. I think the clearest solution here would be;

data_stream.map(TypeTotals).fold(...)

But it depends on how well you can express combining two TypeTotal objects.

Ending up with an object expressing the types and their totals.

Hashmaps might not get you the best performance here, but swapping the default out for hashbrown usually helps as you're unlikely to be concerned about hash attacks.

You're also deserialising the entire JSON, but you only care about the value of the field "type", so you're wasting effort on each read. Serde will ignore unknown fields, so just deserialise to a struct and handle deserialise failures.

For your cases with a non string entry you can either fail, or skip. This should be a CLI option to the user. Default failure, "--skip-malformed" or similar for bad entries.

You can also go for divide and conquer on larger files with multiprocessing tools like rayon. Start by dividing the file into chunks with memory mapping, then each process only reads one chunk, aggregating your totals at the end. This requires some balancing if the effort in memory mapping is similar to your data processing, as your setup time can exceed the time saved. Worth benchmarking.

What I really don't like is how you're limiting yourself to files across the process. Your library should be accepting a stream of data, and should ignore the notion of files, except if you're pursuing the memory mapped strategy above, where the file is divided into multiple streams, with your library still working over each stream.

leaden jolt
#

I second the performance topic : multiprocessing/multithreading and avoid parsing the whole json as stated above.

#

a side note, in your current implementation : you are conditioning the desired result of the CLI with the level of logs of the CLI. (lib.rs#L158 , lib.rs#L159 and lib.rs#L161 : info! => println! )

#

on a personal view, I find this code overengineered. I wouldn't implement this much of traits and I would go with the bare minimum : example reader.rs is not needed I would say for this task (maybe I am wrong )