Hi team, I have a suggestion for the Model Caching feature in Serverless.
- Currently, it seems to cache the entire Hugging Face repository.
- This is inefficient for GGUF LLM quantizations (e.g., https://huggingface.co/unsloth/GLM-4.5-Air-GGUF). These repositories often contain multiple full copies of the model (one per quantization), so caching the whole repository wastes a lot of download time and storage.
Suggestion: Please consider adding an option to cache only a subset of a Hugging Face repository, selected by a list of filenames or a glob/regex pattern. This would allow users to download only the specific GGUF quantization they intend to use.
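
To illustrate the kind of filtering I mean, here is a rough sketch in plain Python. The file names and the `select_files` helper are hypothetical and only for illustration; I don't know how Model Caching is implemented internally. As far as I know, the `huggingface_hub` client already exposes similar glob filtering through the `allow_patterns` argument of `snapshot_download`, so the option could mirror that behavior.

```python
from fnmatch import fnmatchcase


def select_files(repo_files, allow_patterns):
    """Return only the files that match at least one glob pattern."""
    return [
        name
        for name in repo_files
        if any(fnmatchcase(name, pattern) for pattern in allow_patterns)
    ]


# Hypothetical listing of a GGUF repository with several quantizations.
repo_files = [
    "README.md",
    "model-Q4_K_M.gguf",
    "model-Q8_0.gguf",
    "BF16/model-BF16-00001-of-00002.gguf",
    "BF16/model-BF16-00002-of-00002.gguf",
]

# Keep only the quantization the endpoint will actually load.
selected = select_files(repo_files, ["*Q4_K_M*.gguf"])
print(selected)
```

With a pattern like this, only the single Q4_K_M file would be cached instead of every quantization in the repository.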