Issues Deploying Endpoints for Custom HF and Preset Azure Mistral Models in Azure ML | Microsoft Foundry

cedar pike Feb 6, 2025, 5:27 PM

Hi everyone,

I'm fine‑tuning a language model for financial Q&A using a custom loss that emphasizes numerical tokens (tokens between <NUM> and </NUM>). I run the entire process in Azure ML using pipelines and register my models after training – one is a custom Hugging Face model and the other is an Azure Mistral 3B Finetuned model (which is a preset asset).
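For readers following along, the loss described above can be sketched as a weighted negative log-likelihood. This is only an illustration, not the poster's actual implementation: it assumes `<NUM>` and `</NUM>` are single tokens in the vocabulary, and the names `num_span_weights`, `weighted_nll`, and the weight value `3.0` are hypothetical.

```python
def num_span_weights(tokens, num_weight=3.0, open_tag="<NUM>", close_tag="</NUM>"):
    """Return one weight per token: num_weight inside a <NUM>...</NUM> span, else 1.0.

    Assumption: the tags are single tokens and are not up-weighted themselves.
    """
    weights = []
    inside = False
    for tok in tokens:
        if tok == open_tag:
            inside = True
            weights.append(1.0)
        elif tok == close_tag:
            inside = False
            weights.append(1.0)
        else:
            weights.append(num_weight if inside else 1.0)
    return weights


def weighted_nll(token_log_probs, weights):
    """Weighted mean of the per-token negative log-likelihoods."""
    total = sum(-lp * w for lp, w in zip(token_log_probs, weights))
    return total / sum(weights)


tokens = ["Revenue", "<NUM>", "4", ".", "2", "</NUM>", "bn"]
weights = num_span_weights(tokens)
```

In a real training loop the same per-token weights would multiply the unreduced cross-entropy before averaging.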

The Problem:

Custom HF Model:
When I deploy my registered custom HF model as an endpoint, the container crashes. The logs mention issues like a missing azureml-inference-server-http (which I’ve already added to my environment). My scoring script is configured to load the model from the AZUREML_MODEL_DIR environment variable, but I'm unsure how to properly connect the deployment to this registered model.
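A minimal scoring-script sketch for this situation, under stated assumptions: Azure ML calls `init()` once at container start and `run()` per request, and `AZUREML_MODEL_DIR` points at the root of the registered model's files, which are often nested in a subfolder. The helper `find_hf_model_dir`, the use of `config.json` as the marker, and the placeholder body of `run()` are illustrative choices, not part of any Azure API.

```python
import json
import os
from pathlib import Path

HF_MARKER = "config.json"  # assumption: marks a Hugging Face model folder
model_dir = None


def find_hf_model_dir(root: str) -> Path:
    """Return the first directory at or under `root` that contains config.json."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Model root does not exist: {root}")
    # Check the root itself first, then every subfolder in sorted order.
    candidates = [root_path] + sorted(p for p in root_path.rglob("*") if p.is_dir())
    for candidate in candidates:
        if (candidate / HF_MARKER).is_file():
            return candidate
    # Nothing found: list what is there so the container log explains the crash.
    found = sorted(str(p.relative_to(root_path)) for p in root_path.rglob("*") if p.is_file())
    raise FileNotFoundError(f"No {HF_MARKER} under {root}; files present: {found}")


def init():
    """Called once when the deployment container starts."""
    global model_dir
    model_dir = find_hf_model_dir(os.environ["AZUREML_MODEL_DIR"])
    # Load the model and tokenizer from model_dir here.


def run(raw_data: str) -> str:
    """Called per request. Placeholder body: replace with real generation."""
    prompt = json.loads(raw_data)["prompt"]
    return json.dumps({"response": f"(model at {model_dir} received: {prompt})"})
```

If the registered asset holds only a raw checkpoint and no `config.json`, this fails at startup with a message listing the files that are actually present, which is easier to act on than an opaque container crash.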

Azure Mistral Model (Preset):
I'm also encountering issues with the Azure Mistral 3B Finetuned model, which is registered as a preset asset. Since it isn’t packaged as a fully downloadable artifact, I run into mounting/loading problems when I try to use it as an input in my pipeline or deploy it as an endpoint.

Has anyone experienced these issues with deploying endpoints for both custom and preset model assets in Azure ML? How can I reliably connect my deployment to the registered models without encountering these artifact or mounting issues?

uneven heath Feb 7, 2025, 7:48 AM

cedar pike Hi everyone, I'm fine‑tuning a language model for financial Q&A using a custom ...

Hi, I would recommend using the Olive pipeline for your fine-tuning and then deploying your model to a Machine Learning Studio model endpoint. Here is an end-to-end lab and tutorial on how to do this: https://aka.ms/ignite/pre016

GitHub

GitHub - Azure/Ignite_FineTuning_workshop: Choosing the right finet...

Choosing the right finetuning technique, and discover tools for finetuning. A scenario will be used to provide real-world scenario for fine tuning, and optimization techniques - Azure/Ignite_Fine...

indigo oasis Feb 7, 2025, 5:31 PM

cedar pike Hi everyone, I'm fine‑tuning a language model for financial Q&A using a custom ...

@cedar pike - can you share the exact log message (or stack trace) for the HF model crash? I'm curious to learn more.

cedar pike Feb 17, 2025, 7:30 PM

Hi Lee,

Thank you so much for your recommendation and sorry for the delayed response.

I revisited the Olive pipeline from the Ignite FineTuning workshop example at https://github.com/Azure/Ignite_FineTuning_workshop/tree/main/lab/workshop-instructions/Lab5-Optimize-Model and followed it closely. After quantizing, fine-tuning, and optimizing my model, I deployed the endpoint to Azure ML.

However, when I query it with a prompt that appears in the provided training sample data, like:

{"prompt": "Can you recommend a restaurant in Tokyo?"}

I keep getting a short, incomplete answer, such as:

{"response": "Koichino (Japanese food),re you into sushi?d me a must-try in the"}

It’s the same truncated text multiple times in a row, rather than the more complete response that references “Sushi Saito” from the training data.

Everything finished successfully in the pipeline, so I’m wondering if you have any tips on why the model might ignore the fine-tuned responses or keep truncating them. Any guidance on improving the final generation or debugging would be really helpful! Thank you
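One thing worth ruling out when output is cut off the same way every time is a generation length cap in the scoring code combined with deterministic decoding. The toy loop below is a sketch of that mechanism only, not the workshop's inference code: `greedy_decode`, `make_scripted_model`, and the scripted answer tokens are all hypothetical stand-ins for a real model.

```python
def make_scripted_model(answer_tokens, prompt_len, eos_token="<eos>"):
    """Build a fake next-token function that replays a fixed answer, then EOS."""
    def next_token(context):
        i = len(context) - prompt_len
        return answer_tokens[i] if i < len(answer_tokens) else eos_token
    return next_token


def greedy_decode(next_token, prompt_tokens, max_new_tokens, eos_token="<eos>"):
    """Generate until EOS or until max_new_tokens is reached.

    Returns (tokens, finish_reason), where finish_reason is "eos" or "length".
    """
    generated = []
    for _ in range(max_new_tokens):
        tok = next_token(prompt_tokens + generated)
        if tok == eos_token:
            return generated, "eos"
        generated.append(tok)
    return generated, "length"


prompt = ["Can", "you", "recommend", "a", "restaurant", "?"]
answer = ["You", "could", "try", "Sushi", "Saito", "."]  # made-up placeholder text
model = make_scripted_model(answer, prompt_len=len(prompt))

short_out, short_reason = greedy_decode(model, prompt, max_new_tokens=3)
full_out, full_reason = greedy_decode(model, prompt, max_new_tokens=20)
```

With the small cap, the loop stops for reason "length" mid-sentence and returns the same cut-off tokens on every call. With the larger cap, it stops at the end-of-sequence token instead. Logging the equivalent finish reason and the configured length limit in the deployed scoring script would show whether this is what is happening.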

cedar pike Feb 17, 2025, 7:32 PM

indigo oasis @cedar pike - can you share the exact log message (or stack trace) fo...

Sure. I deleted the endpoint and moved to another approach, but the issue was also visible in the Azure ML artifacts: the only file there is model.pt, the PyTorch checkpoint file that contains the trained model's weights and state.
