# Issues Deploying Endpoints for Custom HF and Preset Azure Mistral Models in Azure ML


cedar pike

Hi everyone,

I'm fine‑tuning a language model for financial Q&A using a custom loss that emphasizes numerical tokens (tokens between <NUM> and </NUM>). I run the entire process in Azure ML using pipelines and register my models after training – one is a custom Hugging Face model and the other is an Azure Mistral 3B Finetuned model (which is a preset asset).
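To make the weighting idea concrete, here is a minimal sketch of one way to emphasize tokens between <NUM> and </NUM>. It is only an illustration, not the actual loss from this post: it assumes the two markers each appear as a single token, and the names `numeric_token_weights`, `weighted_mean_loss`, and the default `num_weight=3.0` are hypothetical.

```python
def numeric_token_weights(tokens, num_weight=3.0,
                          open_tag="<NUM>", close_tag="</NUM>"):
    """Return one weight per token: num_weight strictly inside a
    <NUM>...</NUM> span, 1.0 everywhere else (including the tags)."""
    weights = []
    inside = False
    for tok in tokens:
        if tok == open_tag:
            inside = True
            weights.append(1.0)
        elif tok == close_tag:
            inside = False
            weights.append(1.0)
        else:
            weights.append(num_weight if inside else 1.0)
    return weights


def weighted_mean_loss(per_token_losses, weights):
    """Weighted average of per-token losses (e.g. per-token cross-entropy)."""
    total_weight = sum(weights)
    return sum(l * w for l, w in zip(per_token_losses, weights)) / total_weight


# Hypothetical tokenization of "Revenue was <NUM>4.2</NUM> billion".
tokens = ["Revenue", "was", "<NUM>", "4", ".", "2", "</NUM>", "billion"]
weights = numeric_token_weights(tokens)
# weights == [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 1.0]
```

In a real training loop the same mask would be built from token IDs and multiplied into the unreduced per-token loss before averaging.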

The Problem:

Custom HF Model:
When I deploy my registered custom HF model as an endpoint, the container crashes. The logs mention issues like a missing azureml-inference-server-http (which I’ve already added to my environment). My scoring script is configured to load the model from the AZUREML_MODEL_DIR environment variable, but I'm unsure how to properly connect the deployment to this registered model.
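For context, a minimal sketch of a scoring script of the kind described above. The `init()`/`run()` entry points and the `AZUREML_MODEL_DIR` variable are the standard Azure ML scoring contract; everything else is an assumption for illustration: the helper `resolve_model_dir`, the search for `config.json`, the `{"prompt": ...}` request shape, and `max_new_tokens=256`. The registered files often sit one or more folders below `AZUREML_MODEL_DIR`, so the helper searches for them instead of assuming a fixed path.

```python
import json
import os
from pathlib import Path

model = None
tokenizer = None


def resolve_model_dir(root=None):
    """Return the folder under AZUREML_MODEL_DIR that contains config.json.

    Hypothetical helper: the registered artifact may be nested below the
    root, so search for it rather than hard-coding a subfolder name.
    """
    root = Path(root or os.environ["AZUREML_MODEL_DIR"])
    if (root / "config.json").is_file():
        return root
    matches = sorted(root.rglob("config.json"))
    if not matches:
        raise FileNotFoundError(f"no config.json found under {root}")
    return matches[0].parent


def init():
    """Called once by the inference server when the container starts."""
    global model, tokenizer
    # Imported lazily so the path helper can be tested without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_dir = resolve_model_dir()
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()


def run(raw_data):
    """Called per request; raw_data is the JSON request body as a string."""
    prompt = json.loads(raw_data)["prompt"]  # assumed request schema
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)  # assumed limit
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"response": text}
```

Logging the resolved folder and its contents inside `init()` is a cheap way to confirm from the deployment logs which files the container actually received.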

Azure Mistral Model (Preset):
I'm also encountering issues with the Azure Mistral 3B Finetuned model, which is registered as a preset asset. Since it isn’t packaged as a fully downloadable artifact, I run into mounting/loading problems when I try to use it as an input in my pipeline or deploy it as an endpoint.

Has anyone experienced these issues with deploying endpoints for both custom and preset model assets in Azure ML? How can I reliably connect my deployment to the registered models without encountering these artifact or mounting issues?

uneven heath

Hi, I would recommend using the Olive pipeline to do your fine-tuning and then deploying your model to a Machine Learning Studio model endpoint. Here is an end-to-end lab and tutorial on how to do this: https://aka.ms/ignite/pre016

GitHub

Choosing the right finetuning technique, and discover tools for finetuning. A scenario will be used to provide real-world scenario for fine tuning, and optimization techniques - Azure/Ignite_Fine...

indigo oasis
cedar pike

Hi Lee,

Thank you so much for your recommendation and sorry for the delayed response.

I revisited the Olive pipeline from the Ignite FineTuning workshop example at https://github.com/Azure/Ignite_FineTuning_workshop/tree/main/lab/workshop-instructions/Lab5-Optimize-Model and followed it closely. After quantizing, fine-tuning, and optimizing my model, I deployed the endpoint to Azure ML.

However, when I query it with a prompt that appears in the provided training sample data, such as:

{"prompt": "Can you recommend a restaurant in Tokyo?"}

I keep getting a short, incomplete answer, such as:

{"response": "Koichino (Japanese food),re you into sushi?d me a must-try in the"}

I get the same truncated text multiple times in a row, rather than the more complete response that references “Sushi Saito” from the training data.

Everything finished successfully in the pipeline, so I’m wondering if you have any tips on why the model might ignore the fine-tuned responses or keep truncating them. Any guidance on improving the final generation or debugging would be really helpful! Thank you

