#md5hash computation using the correct encoding

marsh narwhal

I have a Java method that takes a file as input and computes the MD5 hash of the file.
The issue is that I want to run this code on two different machines, one Windows and one Linux.
Windows uses UTF-16 encoding, whereas Linux uses UTF-8.
The contents of the files are the same, but because the encodings differ, the underlying bytes differ and so do the computed MD5 hashes. My aim is to compare the content of the files character by character.

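Here is the Java method:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Method name and signature are assumed; they were cut off in the post.
public static String md5Hex(String filename) throws IOException, NoSuchAlgorithmException {
    try (FileInputStream fis = new FileInputStream(filename)) {
        MessageDigest md = MessageDigest.getInstance("MD5");

        // Read the file's raw bytes and feed them to the message digest
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = fis.read(buffer)) != -1) {
            md.update(buffer, 0, bytesRead);
        }

        // Get the MD5 hash
        byte[] md5Bytes = md.digest();

        // Convert the byte array to a lowercase hexadecimal string
        StringBuilder sb = new StringBuilder();
        for (byte md5Byte : md5Bytes) {
            sb.append(Integer.toString((md5Byte & 0xff) + 0x100, 16).substring(1));
        }

        return sb.toString();
    }
}
```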
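The method above hashes raw bytes, so the same text stored as UTF-16 and as UTF-8 produces different digests. One possible way around this, sketched below, is to decode each file with its own charset and hash a single canonical encoding of the resulting characters. The class `TextMd5` and the helper `md5OfText` are hypothetical names, not part of the original post, and the sketch assumes that the caller knows each file's encoding and that the file fits in memory.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TextMd5 {

    // Hypothetical helper (not from the original post): hashes the decoded
    // characters of a file rather than its raw bytes.
    public static String md5OfText(Path file, Charset fileCharset)
            throws IOException, NoSuchAlgorithmException {
        // Decode the raw bytes using the encoding the file was written in.
        // Assumption: the whole file fits in memory.
        String text = new String(Files.readAllBytes(file), fileCharset);

        // Drop a leading byte order mark if the decoder left one in place.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }

        // Re-encode to one canonical encoding (UTF-8) before hashing.
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));

        // Convert the digest to a lowercase hexadecimal string.
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With this sketch, the Windows file would be hashed with `StandardCharsets.UTF_16` and the Linux file with `StandardCharsets.UTF_8`. Line endings are not normalized here, so a file with CRLF line breaks would still hash differently from one with LF line breaks.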
celest needle

What do you mean by characters? What do these files contain, exactly?

marsh narwhal