xgboost4j predict fail in some devices | Learn AI Together

kind lava Mar 29, 2023, 1:24 PM

#

Anyone working with xgboost4j? 🙂 Today I hit a strange issue: predictions are wrong locally and on server A, but correct on server B.
On server A, with the same structure and config, model X1 fails but model X2 works :))
Any clue about that?
By "fail" I mean that, in my case, the model always gives a prediction score > 0.99.

lilac cairn Mar 29, 2023, 1:32 PM

#

kind lava anyone working with xgboost4j 🙂 today i face strange issue, predict wrong in lo...

Can you explain more about how you split the data and such?

#

I don't use Java, so I can't really read through the code, but understanding your diagram and flow is no problem

#

For example, there could be a data leakage problem, a data imbalance problem, and so on

#

If your model is only good at predicting class A, and your test set on server B is mostly filled with class A, then you will certainly have a higher result there

#

Ruling out these possibilities is what I'm looking at

kind lava Mar 29, 2023, 1:36 PM

#

Sorry, maybe I didn't explain it clearly.
I train with xgboost and run inference with xgboost4j.

My issue is: with the same input, the model predicts different results.

#

Locally, the model always predicts a prob > 0.99, but when I deploy it to server B, it runs correctly and outputs prob = 0.02.

#

I also deployed it to server A, but it runs into the same problem as local: the output prob is always > 0.99.

#

At first I thought the wrong library was being loaded, but then I tried another model (model X2, with fewer features) and did the same steps as above.
It works correctly on all devices.

lilac cairn Mar 29, 2023, 1:46 PM

#

That's strange. I also just found this: https://github.com/dmlc/xgboost/issues/4562 and it's an open ticket. If your stuff works perfectly fine without xgboost4j, then it might be xgboost4j itself

GitHub

[jvm-packages] Performance differ XGBoost4j prediction in windows a...

I am using XGBoost4j to get predtions. But I found the performances in windows and Linux are so different. And it is quite confusing. My codes are: long startTime = System.nanoTime(); DMatrix dmat;...

#

@pallid vault wonder if you have thoughts on this

pallid vault Mar 29, 2023, 1:51 PM

#

I think I might use xgboost4j one day. Other than that, I can only offer tech support questions atm.

Is it loading the same weights? Does the server have bias enabled/disabled, if applicable? My first step would be checking to see if the network on the server is actually the same as the local one

kind lava Mar 29, 2023, 1:53 PM

#

It uses the same weights, and loads them from a local file.

#

It just came to mind that something might go wrong when the model is bigger, related to RAM resources.

pallid vault Mar 29, 2023, 1:53 PM

#

Is there a debug way to check manually, or are you just confident the server reads the same as local?
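
One way to check this by hand, as a minimal sketch: hash the model file on each machine and compare the digests. This only shows whether the bytes on disk are identical, not whether xgboost4j parses them the same way. `file_sha256` is a hypothetical helper name, not part of xgboost.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so a large model does not need to fit in memory.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it against the model path on local, server A and server B. If all three digests match, the file itself is probably not the difference, and the runtime (library version, native library, input encoding) is the next thing to look at.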

kind lava Mar 29, 2023, 1:54 PM

#

But I'm not sure how to check.

#

I tried another model on the same server, same service, everything the same, just another bin 🙂

#

And amazingly it works. The model that doesn't work here does work on server B; it only fails locally and on server A.

pallid vault Mar 29, 2023, 1:56 PM

#

Could you try a different model that successfully runs locally? See if it is only this project or if it is related to them all

kind lava Mar 29, 2023, 1:57 PM

#

Yeah, model X2 works everywhere. I tested it in the same project.
Model X1 fails locally and on server A, but works on server B.

#

P.S.: model X1 is actually an upgrade of X2 with some new features, btw.

lilac cairn Mar 29, 2023, 2:03 PM

#

@pallid vault you're an angel

pallid vault Mar 29, 2023, 2:03 PM

#

I'm trying to read more about xgboost to see if I can find smth

kind lava Mar 29, 2023, 2:04 PM

#

Thank you so much. This took me 2 days of searching around, and I couldn't find any clue.

pallid vault Mar 29, 2023, 2:11 PM

#

The only thing that stands out to me atm is the different forms of saving the bin, to include feature maps or not

The Booster class also seems to contain a Booster#dump_model(), which according to the docs:

Dump model into a text or JSON file. Unlike save_model(), the output format is primarily used for visualization or interpretation, hence it’s more human readable but cannot be loaded back to XGBoost.

I am not 100% certain it exists in the Java version, but it looks like it could definitely show whether the model is the same internally or whether there are some minor differences
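
If dumps can be obtained on both sides, comparing them could look like the sketch below. It assumes each dump is a list of per-tree strings, which is what `Booster.get_dump()` returns in the Python package; I believe xgboost4j exposes something similar via `Booster.getModelDump`, but check that against the docs for your version. `diff_model_dumps` and the labels are made-up names for illustration.

```python
import difflib


def diff_model_dumps(dump_a, dump_b, label_a="server A", label_b="server B"):
    """Return unified-diff lines between two model dumps.

    Each dump is assumed to be a list of strings, one text dump per tree.
    An empty result means the two dumps are textually identical.
    """
    diff = []
    if len(dump_a) != len(dump_b):
        diff.append(f"tree count differs: {len(dump_a)} vs {len(dump_b)}")
    # Compare trees pairwise; extra trees on one side are only reported
    # through the count message above.
    for i, (tree_a, tree_b) in enumerate(zip(dump_a, dump_b)):
        diff.extend(
            difflib.unified_diff(
                tree_a.splitlines(),
                tree_b.splitlines(),
                fromfile=f"{label_a} tree {i}",
                tofile=f"{label_b} tree {i}",
                lineterm="",
            )
        )
    return diff
```

An empty list means both sides loaded the same trees, which would point away from the model file and toward how the input is fed at predict time.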

kind lava Mar 29, 2023, 2:31 PM

#

The problem in my case is that it works with the previous version, and it works on server B. But what I want is for it to work on server A 😄
My stupid solution for now is to make an API so that server A calls predict on server B. It's so stupid 😦 poor me

kind lava Mar 29, 2023, 2:35 PM

#

pallid vault The only thing that stands out to me atm is the different forms of saving the bi...

I train in Python
XGBoost version 1.6.1
model.save_model(f'{OUTPUT_DIR}/prod26_{PRJ_NAME}_model.bin')
which saves it to a bin

and I use xgboost4j to load it back for real-time inference

pallid vault Mar 29, 2023, 2:36 PM

#

I have to go soon, so hopefully something here has helped a bit

kind lava Mar 29, 2023, 2:37 PM

#

Thank you for your help. You are so nice.

pallid vault Mar 29, 2023, 2:37 PM

#

I am the default java helper here lol, Ian calls me occasionally
