#simple json automation code review

2 messages · Page 1 of 1 (latest)

visual sundial
#

Hey guys.

I've been programming alone for quite a while and I'm worried I might be perpetuating bad habits I might not even be aware of. I'd appreciate the feedback on this piece of code and on how I've approached the problem. It's an automation script to calculate the mean and standard deviation of values in json files inside a directory (the json files have nested structures). Thank you for your time.

import copy
import glob
import json
import os
import statistics as stats
from sys import argv
from typing import Any


def read_json(file_name: str) -> Any:
    with open(file_name, mode="r", encoding="utf-8") as file:
        return json.load(file)


def group_metrics(json_files: list[Any]) -> Any:
    grouped_json = copy.deepcopy(json_files[0])
    for json_object in json_files[1:]:
        copy_information(grouped_json, json_object)
    return grouped_json


def copy_information(grouped_json: Any, json_object: Any):
    for key, value in json_object.items():
        if isinstance(value, dict):
            copy_information(grouped_json[key], value)
            continue
        grouped_json[key] = [grouped_json[key]]
        grouped_json[key].append(value)


def calculate_stats(grouped_json: Any) -> Any:
    for key, value in grouped_json.items():
        if isinstance(value, dict):
            calculate_stats(grouped_json[key])
            continue
        mean = stats.fmean(value)
        std = stats.stdev(value)
        grouped_json[key] = (mean, std)
    return grouped_json


def main():
    assert len(argv) == 2
    directory = argv[1]
    assert os.path.exists(directory)

    directory_files = [file_name for file_name in glob.glob(f"{directory}/*.json")]
    json_files = [read_json(file_name) for file_name in directory_files]
    json_grouped = group_metrics(json_files)
    json_mean = calculate_stats(json_grouped)
    print(json.dumps(json_mean, indent=4))


if __name__ == "__main__":
    main()

stuck stirrup
#

One of the principals of functional code (so a personal preference) is the single responsibility principle. Basically every function should only do 1 thing.
group_metrics creates a deep copy and iterates over the json object. copy_information modifies the grouped json dict and groups values.
calculate_stats modifies the grouped json object and computes the mean/std.

The trouble with not following the single responsibility principle is you end up with functions that aren't reusable. You may want to do 1 thing but the existing function does a second thing you don't want to do, so you're forced to reimplement the same logic over again.

I see another issue that is kinda leading to the first issue. You modify the object that is passed to the function. You shouldn't do that as it can lead to unexpected side effects. A better approach is to return a value that can then be used to create the result you want.

If you didn't modify the arguments you wouldn't need to create deep copies. Which would save a lot of time.

Another thing I see is you're using asserts. Asserts are fine for debugging. In production though they can sometimes be disabled for performance reasons, so they shouldn't be relied on for validation.