#Converting ulaw 8khz to pcm 16khz 16 bit?
While you are waiting for getting help, here are some tips to improve your experience:
If nobody is calling back, that usually means that your question was not well asked and hence nobody feels confident enough answering. Try to use your time to elaborate, provide details, context, more code, examples and maybe some screenshots. With enough info, someone knows the answer for sure.
Don't forget to close your thread using the command /help-thread close when your question has been answered, thanks.
When I run soxi "filename.wav" I see it shows:
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:00.08 = 659 samples ~ 6.17813 CDDA sectors
File Size : 44
Bit Rate : 4.27k
Sample Encoding: 16-bit Signed Integer PCM
If I write the audio to a temporary file, then load it and save it again, it works and I get:
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:13.18 = 210938 samples ~ 988.772 CDDA sectors
File Size : 422k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
System.out.println("uLaw: " + ulawFormat.getEncoding() + "," + ulawFormat.getSampleRate() + "," + ulawFormat.getSampleSizeInBits() + "," + ulawFormat.getChannels() + "," + ulawFormat.getFrameSize() + "," + ulawFormat.getFrameRate() + "," + ulawFormat.isBigEndian() + "," + ulawAudioInputStream.getFrameLength());
gives:
uLaw: ULAW,8000.0,8,1,160,50.0,false,659
System.out.println("PCM: " + pcmForm.getEncoding() + "," + pcmForm.getSampleRate() + "," + pcmForm.getSampleSizeInBits() + "," + pcmForm.getChannels() + "," + pcmForm.getFrameSize() + "," + pcmForm.getFrameRate() + "," + pcmForm.isBigEndian() + "," + pcmAudioInputStream.getFrameLength());
gives:
PCM: PCM_SIGNED,8000.0,16,1,2,8000.0,false,659
Closed the thread due to inactivity.
If your question was not resolved yet, feel free to just post a message to reopen it, or create a new thread. But try to improve the quality of your question to make it easier to help you
To be honest, I feel like it's hard to get an answer to this question here
Can you give a bit of context on what libraries you are using and that kinda stuff
It's 12h
They get closed within 12 hours?
of no activity, yes. but u can reopen it at any time. ur supposed to provide more info and make it easier to help u
cause after 12h pretty much all helpers have seen ur question already
Not sure what else I can add? I've included the code, the library etc. What else can I include to help?
thing is, keeping it open for longer wont magically generate u an answer. after everyone saw it, there wont be suddenly another user who didnt see it yet
so either no one knew the answer, or it wasnt easy enough to help and they moved on
Yes that's what I figured it's too complicated
So I guess I'll leave it closed. Thanks for the explanation! I just wanted to make sure
I'll figure it out myself
u could try to create a minimal example so that people can try it themselves
Not sure it can be more minimal
it means that i can copy pasta and run it
right now i can't
im missing the sound file and the code is not just a single main method
so i cant try it out myself locally
without extra effort
Alright I'll do that. Most people when they say minimal want it down to one or two lines, so I will get the strings and make an example in the next couple of hours. Thank you very much
but yeah, it's a very specific question. likely better suited for StackOverflow
Yeah I don't do stackoverflow. I never get help and they usually just troll and have a god complex
you def. get a good answer if you write a high quality question
but a minimal reproducible example would be important to provide
problem with SO is that most beginners have bad first experiences because the questions asked by beginners are mostly bad (either duplicate or smth similar)
its more an intermediate/expert kind of thing
but based on the complex question it seems like you know what you do
barely any asker reads the rules and takes them serious. which is unfortunate, since the other side is quite strict about them
low quality questions get stomped
anything that requires ping pong or guessing, which can't be immediately answered, gets stomped
anything that shows no research effort or similar gets stomped
but for a good reason imo
what matters is that u have a high "question asking" skill. and then u also get a high quality answer
we allow low quality questions which makes it sometimes hard for the helper
but easier for the asker
Not really sure how this is a low quality question considering that it takes any audio as a base64. Like literally the only thing I will add is loading the string
I didnt say your question is low quality
you said
Yeah I don't do stackoverflow. I never get help and they usually just troll and have a god complex
and I answered how to instead get good answers
So I should open a new thread? Or update here?
? I mean to post my updated question here.
update it here
ok
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.Base64;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class BaseAudioExample {

    public static void main(String[] args) throws LineUnavailableException, IOException {
        // Read the Base64 text line by line (readLine drops the line breaks)
        File file = new File("/path/to/base64.txt");
        BufferedReader br = new BufferedReader(new FileReader(file));
        StringBuilder sb = new StringBuilder();
        String st;
        while ((st = br.readLine()) != null) {
            sb.append(st);
        }

        byte[] decodedAudioBytes = Base64.getDecoder().decode(sb.toString());
        ByteArrayInputStream inputStream = new ByteArrayInputStream(decodedAudioBytes);

        AudioFormat audioFormat = new AudioFormat(
                AudioFormat.Encoding.ULAW, // Audio format details here
                8000,  // sample rate in Hz
                8,     // sample size in bits
                1,     // channels
                160,   // frame size in bytes
                50,    // frame rate
                false  // little endian
        );
        AudioInputStream ulawAudioInputStream = new AudioInputStream(
                inputStream, audioFormat, decodedAudioBytes.length / audioFormat.getFrameSize());
        AudioFormat ulawFormat = ulawAudioInputStream.getFormat();

        AudioFormat pcmFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                ulawFormat.getSampleRate(),
                16, // 16-bit sample size
                ulawFormat.getChannels(),
                ulawFormat.getChannels() * 2, // Frame size
                ulawFormat.getSampleRate(),
                false // Little endian
        );
        AudioInputStream pcmAudioInputStream = AudioSystem.getAudioInputStream(pcmFormat, ulawAudioInputStream);
        AudioFormat pcmForm = pcmAudioInputStream.getFormat();

        // Open a line to play the audio
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, pcmForm);
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        line.open(pcmForm);

        // Start playing
        line.start();
        byte[] buffer = new byte[4096];
        int bytesRead;

        // Read and play audio data from the stream
        while ((bytesRead = pcmAudioInputStream.read(buffer)) != -1) {
            line.write(buffer, 0, bytesRead);
        }

        // Save the converted stream as a WAV file
        AudioSystem.write(pcmAudioInputStream, AudioFileFormat.Type.WAVE, new FileOutputStream("/path/to/save/.wav"));
    }
}
It will play fine over the speakers, but it does not save correctly to the WAV file
Mind you this example is only to illustrate my issue.
let me try it on my end
And the rules you guys are mentioning are in #rules ?
I guess in the example I also didn't close the streams. So keep that in mind
where did we mention rules?
I think Zabu said smth about Stackoverflow rules
Ohhh yea then it was about stackoverflow. I always make it a point to read rules before I post so I was confused
all good
If you need more context I am trying to get this to work for transcription but it has to be on pcm
It needs to be in:
16bit little-endian mono
Or
8khz 16bit little-endian mono
they are algos for audio?
Yea well codecs
G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. It is an ITU-T standard (Recommendation) for audio encoding, titled Pulse code modulation (PCM) of voice frequencies released for use in 1972.
G.711 passes audio signals in the frequency band of 300–3400 Hz and samples them ...
I guess this might do it I will test it in a few
http://www.cs.columbia.edu/~hgs/research/projects/ng911-text/jp2105/NG-911/src/local/media/G711.java
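For readers following along: μ-law expansion is small enough to sketch by hand, which avoids any file round trip. The block below is a minimal sketch based on the widely circulated G.711 reference algorithm, not a copy of the linked G711.java; the class and method names (`MuLawSketch`, `decodeSample`, `decodeToPcm16Le`) are made up for illustration.

```java
final class MuLawSketch {
    // Bias constant (132) used by the common G.711 mu-law reference code
    private static final int BIAS = 0x84;

    /** Expands one 8-bit mu-law byte to a 16-bit signed linear sample. */
    static short decodeSample(byte ulaw) {
        int u = ~ulaw & 0xFF;            // mu-law bytes are stored bit-inverted
        int sign = u & 0x80;             // top bit carries the sign
        int exponent = (u >> 4) & 0x07;  // 3-bit segment number
        int mantissa = u & 0x0F;         // 4-bit step within the segment
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) (sign != 0 ? -magnitude : magnitude);
    }

    /** Decodes a mu-law buffer into 16-bit little-endian PCM bytes. */
    static byte[] decodeToPcm16Le(byte[] ulaw) {
        byte[] pcm = new byte[ulaw.length * 2];
        for (int i = 0; i < ulaw.length; i++) {
            short s = decodeSample(ulaw[i]);
            pcm[2 * i] = (byte) (s & 0xFF);            // low byte first
            pcm[2 * i + 1] = (byte) ((s >> 8) & 0xFF); // then high byte
        }
        return pcm;
    }
}
```

Each input byte becomes two output bytes, so the PCM buffer is exactly twice the size of the μ-law buffer at the same sample rate.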
CMUSphinx is an open source speech recognition system for mobile and server applications. Supported languages: C, C++, C#, Python, Ruby, Java, Javascript. Supported platforms: Unix, Windows, IOS, Android, hardware.
ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav this works if I do it with ProcessBuilder
But I want to not have to save it convert it and then load it from the disk before being able to do anything with the file
Because I'll be getting these in real time. In the base64 I sent I merged all the incoming base64 together
I changed it to this:
Path base64 = Path.of(Main.class.getResource("/base64.txt").toURI());
byte[] decodedAudioBytes = Base64.getDecoder().decode(Files.readAllBytes(base64));
at the beginning to load the file but that already gives me input byte array has incorrect ending byte at 140628
your method with old I/O is working but this might be important for the File writing later and thus fails maybe
but yeah file is also corrupted for me when creating
with this?
Path base64 = Path.of(Main.class.getResource("/base64.txt").toURI());
byte[] decodedAudioBytes = Base64.getDecoder().decode(Files.readAllBytes(base64));
this throws me an exception already, but I tried with old I/O like you did and it played correctly but corrupted file
and for sure the conversion happens before playing
this exception might be important and could be the problem why the file is corrupted
test this out yourself
i am
with your path
I guess though its worth mention that normally I am not loading this base64 string from a file
its being sent from a json object
in a socket that is streaming
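A side note on the streaming case: if each JSON message carries its own Base64 string, it is probably safer to decode every chunk separately and join the resulting bytes, because joining the Base64 text only works when no chunk ends in `=` padding. A minimal sketch, assuming the chunks arrive as plain strings (`ChunkDecodeSketch` and `decodeChunks` are hypothetical names):

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.List;

final class ChunkDecodeSketch {
    /** Decodes each Base64 chunk on its own and concatenates the raw bytes. */
    static byte[] decodeChunks(List<String> base64Chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Base64.Decoder decoder = Base64.getDecoder();
        for (String chunk : base64Chunks) {
            // trim() drops stray whitespace such as a trailing newline
            byte[] decoded = decoder.decode(chunk.trim());
            out.write(decoded, 0, decoded.length);
        }
        return out.toByteArray();
    }
}
```

Whether the chunks in this thread actually contain padding is not known; the sketch only shows the safer pattern.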
I think the base64 file might already be corrupted
and that the playing using java is working fine until that byte
but the file is corrupted overall
I can save the file in the same format and its fine
so you mean save it in a ulaw format and then its fine?
I still dont understand why this would not work
ah I found it, my base64.txt got a new line at the end which was unintended
oh
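The "incorrect ending byte" error fits that explanation: the basic decoder from `Base64.getDecoder()` rejects anything outside the Base64 alphabet, including a trailing line break. Two possible workarounds, sketched with hypothetical helper names:

```java
import java.util.Base64;

final class Base64NewlineSketch {
    /** Strict decode after removing surrounding whitespace such as a trailing newline. */
    static byte[] decodeTrimmed(String text) {
        return Base64.getDecoder().decode(text.trim());
    }

    /** Lenient decode: the MIME decoder skips line separators and other non-alphabet characters. */
    static byte[] decodeLenient(String text) {
        return Base64.getMimeDecoder().decode(text);
    }
}
```

The original `BufferedReader.readLine()` loop avoided the problem by accident, since `readLine()` strips line terminators.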
my guess would be that the conversion from ulaw -> pcm is failing somewhere but the playing is working
so its probably the conversion from pcm -> wav file
System.out.println(AudioSystem.isFileTypeSupported(AudioFileFormat.Type.WAVE, pcmAudioInputStream));
this prints true though
also if you want to add this
System.out.println("PCM: " + pcmForm.getEncoding() + "," + pcmForm.getSampleRate() + "," + pcmForm.getSampleSizeInBits() + "," + pcmForm.getChannels() + "," + pcmForm.getFrameSize() + "," + pcmForm.getFrameRate() + "," + pcmForm.isBigEndian() + "," + pcmAudioInputStream.getFrameLength());
System.out.println("uLaw: " + ulawFormat.getEncoding() + "," + ulawFormat.getSampleRate() + "," + ulawFormat.getSampleSizeInBits() + "," + ulawFormat.getChannels() + "," + ulawFormat.getFrameSize() + "," + ulawFormat.getFrameRate() + "," + ulawFormat.isBigEndian() + "," + ulawAudioInputStream.getFrameLength());
if I change things too much here it gives a conversion failed error
and fails long before playing audio
during the actual conversion
if you mess around with the audio formats
You are testing on windows?
yeah
this prints false:
System.out.println(AudioSystem.isConversionSupported(audioFormat, pcmFormat));
Interesting
same with
System.out.println(AudioSystem.isConversionSupported(ulawFormat, pcmFormat));
playing the audio still works lmao
these are the provided codecs it checks
so it cant convert?
seems like its not supported
but I am not enough into javas audio api to tell you what to do now
though I think the UlawCodec should work for that
this gets me further. I'll do more research. Thanks
8 will work
java.lang.IllegalArgumentException: Unsupported conversion: PCM_SIGNED 8000.0 Hz, 8 bit, mono, 2 bytes/frame, from ULAW 8000.0 Hz, 8 bit, mono, 160 bytes/frame, 50.0 frames/second,
yeah I misread
I do some more debugging to see where its having a problem
n v m
System.out.println(AudioSystem.isConversionSupported(pcmFormat, ulawFormat));
this returns true
the arguments were swapped
mb
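The mix-up is easy to make because `AudioSystem.isConversionSupported` takes the target format first and the source format second. A small sketch of a wrapper that puts the source first; the wrapper name is hypothetical, and the formats in `main` use a 1-byte μ-law frame, which is an assumption and differs from the 160-byte frame used in the thread:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

final class ConversionCheckSketch {
    /** Hypothetical wrapper that takes the source first, then the target. */
    static boolean canConvert(AudioFormat source, AudioFormat target) {
        // Java Sound's own signature is isConversionSupported(targetFormat, sourceFormat)
        return AudioSystem.isConversionSupported(target, source);
    }

    public static void main(String[] args) {
        AudioFormat ulaw = new AudioFormat(
                AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioFormat pcm = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED, 8000f, 16, 1, 2, 8000f, false);
        System.out.println("ulaw -> pcm supported: " + canConvert(ulaw, pcm));
    }
}
```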
idk something weird is happening
System.out.println(pcmAudioInputStream.available());
this prints the expected 105469
Yes I tested all that before even attempting to ask
I even wrote each byte to the file
I did do something last night that made it play but it was .08 seconds
System.out.println(AudioSystem.write(pcmAudioInputStream, AudioFileFormat.Type.WAVE, Files.newOutputStream(Path.of("D://test.wav"), StandardOpenOption.CREATE)));
but this prints 1362 bytes written
this getAudioFileFormat got the property (byte length 1362) but why?
Maybe because it's missing the header?
this calculation gives 1362
Well during the conversion the frame size is getting smaller?
Which it needs to get larger?
frameLength is 659 and frameSize is 160 based on the debugger
After the conversion?
I am not sure if this is before or after conversion
the header got a size of 44
so its calculating 659 * 2 + 44 -> 1362
but isnt that too small?
Yes
isnt pcm > ulaw
I think that's where the issue is. I haven't touched audio conversion until yesterday
But I think even then it would play
you set the frameSize with ulawFormat.getChannels() * 2 but why?
getChannels returns 1 and thus the result is 2
and I think that when playing instead of saving the audio it just doesnt calculate its size but instead play the stream until its end
and the saving saves the content until its calculation point (1362)
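The arithmetic behind that guess can be written out. This is only a sketch of the calculation described above, assuming the 105469 reported by `available()` is the number of μ-law bytes and that the WAV header is 44 bytes; `WavSizeSketch` is a made-up name.

```java
final class WavSizeSketch {
    static final int WAV_HEADER_BYTES = 44;

    /** Size of a WAV file: header plus frameLength frames of frameSizeBytes each. */
    static long wavFileSize(long frameLength, int frameSizeBytes) {
        return WAV_HEADER_BYTES + frameLength * frameSizeBytes;
    }

    public static void main(String[] args) {
        long ulawBytes = 105_469;

        // Frame length as declared in the example: bytes divided by a 160-byte frame
        long framesAsDeclared = ulawBytes / 160; // 659
        System.out.println(wavFileSize(framesAsDeclared, 2)); // 1362

        // Frame length if every mu-law byte counts as one frame (one sample)
        System.out.println(wavFileSize(ulawBytes, 2)); // 210982
    }
}
```

The first number matches the 1362 bytes that were written; the second is what 13.18 seconds of 8 kHz 16-bit mono audio would need.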
I need mono
ok that means for this?
yes
xD
ulawFormat.getChannels() * 2 calculates the frame size in bytes based on the number of audio channels and the assumption that each sample is represented by 16 bits (2 bytes). This frame size is important when working with audio because it helps determine the amount of data processed or transferred at a given time.
but how do you get from 160 bytes/frame (ulaw) to 2 bytes/frame (pcm) this already makes no sense for me
In ulaw, each sample takes 1 byte (8 bits)
PCM takes 2 bytes (16 bits)
This difference in encoding results in different frame sizes, with PCM having a smaller frame size (2 bytes/frame for mono) compared to μ-law (160 bytes/frame for mono).
Or am I mistaken?
I dont really know this
I thought ulaw is compressed
removing *2 gives me this
Unsupported conversion: PCM_SIGNED 8000.0 Hz, 16 bit, mono, 1 bytes/frame, little-endian from ULAW 8000.0 Hz, 8 bit, mono, 160 bytes/frame, 50.0 frames/second,
so it only makes sense that pcm takes more bytes per frame no?
already tried with having the same frameSize or doubled etc
yea same error only * 2 works
For mono PCM audio, which has one channel, the frame size would be:
Frame Size = Number of Channels * Bits per Sample / 8
Frame Size = 1 channel * 16 bits / 8 bits per byte
Frame Size = 2 bytes
So, for mono PCM audio, the frame size is 2 bytes per frame. Each frame represents one unit of audio data for the mono channel.
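The same formula can be applied to both encodings. In this sketch (hypothetical names), mono 8-bit μ-law comes out at 1 byte per frame rather than 160; the 160 bytes at 50 per second look more like 20 ms packets of 8 kHz audio, though where those numbers came from is an assumption on my part.

```java
final class FrameSizeSketch {
    /** Frame size in bytes = channels * bytes per sample. */
    static int frameSizeBytes(int channels, int bitsPerSample) {
        return channels * ((bitsPerSample + 7) / 8);
    }

    /** Playing time of a number of frames at a given frame rate. */
    static double durationSeconds(long frames, double frameRate) {
        return frames / frameRate;
    }

    public static void main(String[] args) {
        System.out.println(frameSizeBytes(1, 16)); // mono 16-bit PCM: 2
        System.out.println(frameSizeBytes(1, 8));  // mono 8-bit mu-law: 1
        // 160 one-byte frames at 8000 frames per second last 0.02 s
        System.out.println(durationSeconds(160, 8000.0));
    }
}
```

With one sample per frame in both formats, the frame rate stays at 8000 before and after conversion, so the playing time does not change even though the byte count doubles.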
but this would mean that the pcm audio is smaller in size, so to have the same length in time to play the sound it needs less frames/second
because the length of the sound needs to be the same no?
Yea that's where I'm confused. But if that was the case it wouldn't play correct either?
Because I came up with these settings while testing
hm and the playing works fine
Before these it sounded slow, like the chipmunks etc
Or a lot of crackling
Actually
I have an idea
lets see xD
the implementation just writes until this calculated size
but no idea what to do against that
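One thing that might be worth trying here: describe the μ-law input with a frame size of 1 byte and a frame rate equal to the sample rate, so the frame length equals the number of samples and the writer computes the full size. This is a sketch under the assumption that the payload is raw mono 8-bit μ-law; `UlawToWavSketch` and `ulawToWav` are hypothetical names, and it was not tested against the audio from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class UlawToWavSketch {
    /** Converts raw mono mu-law bytes to an in-memory 16-bit PCM WAV file. */
    static byte[] ulawToWav(byte[] ulawBytes, float sampleRate) {
        // Assumption: one byte per frame, frame rate equal to the sample rate
        AudioFormat ulaw = new AudioFormat(
                AudioFormat.Encoding.ULAW, sampleRate, 8, 1, 1, sampleRate, false);
        AudioFormat pcm = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED, sampleRate, 16, 1, 2, sampleRate, false);

        // Frame length is the number of mu-law bytes, since each frame is one byte
        try (AudioInputStream in = new AudioInputStream(
                     new ByteArrayInputStream(ulawBytes), ulaw, ulawBytes.length);
             AudioInputStream converted = AudioSystem.getAudioInputStream(pcm, in)) {
            ByteArrayOutputStream wav = new ByteArrayOutputStream();
            AudioSystem.write(converted, AudioFileFormat.Type.WAVE, wav);
            return wav.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the result stays in memory, it could be handed to another component without touching the disk; writing it to a file afterwards is just `Files.write(path, bytes)`.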
and I think this is the point where its worth making a Stackoverflow post
should I write one real quick
?
if you want. Like i said I usually avoid there now cause i either never get an answer or just rude comments
I asked a coworker but I'm pretty sure they won't know.
I have another file to use though for audio
rather than the one i sent
ok nice
me literally being 18 trying to fix this lmao
this is a good summarization of the conversion right?
store ulaw to pcm converted sound to wave file
I think so.
for this
https://stackoverflow.com/questions/77055054/store-ulaw-to-pcm-converted-sound-to-a-wave-file
might not be the best written post I did but it shows much information of what we tried and actually want to achieve
now we just need to wait
probably till tomorrow
I will tell you if there is any new information
also I am wondering, your title says 8khz (ulaw) - 16khz/16bit (pcm)
but isnt the pcmFormat in the code just using getSampleRate which is also 8khz?
I get unsupported conversions when changing to 16
yeah
PCM_SIGNED 8000.0 Hz, 8 bit, mono, 2 bytes/frame, from ULAW 8000.0 Hz, 8 bit, mono, 160 bytes/frame, 50.0 frames/second
so i guess i do too
sorry tried a lot of things
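If 16 kHz is really required, the rate change would have to happen as a separate step after decoding to PCM. The block below is a deliberately naive sketch that doubles the sample rate by linear interpolation; it has no filtering, so a proper resampler (for example the `ffmpeg -ar 16000` route mentioned earlier) will likely sound better. `UpsampleSketch` and `upsample2x` are hypothetical names.

```java
final class UpsampleSketch {
    /** Doubles the sample rate by inserting the midpoint between neighbouring samples. */
    static short[] upsample2x(short[] in) {
        short[] out = new short[in.length * 2];
        for (int i = 0; i < in.length; i++) {
            int current = in[i];
            // The last sample has no neighbour, so it is simply repeated
            int next = (i + 1 < in.length) ? in[i + 1] : current;
            out[2 * i] = (short) current;
            out[2 * i + 1] = (short) ((current + next) / 2);
        }
        return out;
    }
}
```

For example, the input samples 0, 100, 200 become 0, 50, 100, 150, 200, 200.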
funny
I asked chatgpt to review and it fixed it
replaced
System.out.println("Actually written: " + AudioSystem.write(pcmAudioInputStream, AudioFileFormat.Type.WAVE, Files.newOutputStream(Path.of("D://test.wav"), StandardOpenOption.CREATE)));
with
try (inputStream; ulawAudioInputStream; AudioInputStream pcmAudioInputStream = AudioSystem.getAudioInputStream(pcmFormat, ulawAudioInputStream)) {
    System.out.println("Available: " + pcmAudioInputStream.available());
    byte[] audioData = new byte[pcmAudioInputStream.available()];
    int bytesRead = pcmAudioInputStream.read(audioData);
    System.out.println("Actually written: " + writeWavFile("/path/to/wav.wav", audioData, ulawFormat));
}
and added:
private static int writeWavFile(String filePath, byte[] audioData, AudioFormat ulawFormat) throws IOException {
    AudioFormat format = new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            ulawFormat.getSampleRate(),
            16, // 16-bit sample size
            ulawFormat.getChannels(),
            ulawFormat.getChannels() * 2, // Frame size
            ulawFormat.getSampleRate(),
            false // Little endian
    );
@pale sierra thanks for the help. I can now stream the audio directly from an input stream to cmusphinx4 without creating a file. Now I just need to retrain it with 8khz instead of 16khz, I think
22:47:31.518 INFO speedTracker Total Time Audio: 3.43s Proc: 3.12s 0.91 X real time
22:47:31.518 INFO memoryTracker Mem Total: 3800.00 Mb Free: 3036.33 Mb
22:47:31.518 INFO memoryTracker Used: This: 763.67 Mb Avg: 763.67 Mb Max: 763.67 Mb
22:47:31.518 INFO trieNgramModel LM Cache Size: 12950 Hits: 2350499 Misses: 12950
Hypothesis: activation but i knew of a remake of the skull
22:47:31.876 INFO speedTracker This Time Audio: 1694062080.00s Proc: 0.29s Speed: 0.00 X real time
22:47:31.876 INFO speedTracker Total Time Audio: 1694062080.00s Proc: 3.41s 0.00 X real time
22:47:31.876 INFO memoryTracker Mem Total: 3800.00 Mb Free: 3361.35 Mb
22:47:31.876 INFO memoryTracker Used: This: 438.65 Mb Avg: 601.16 Mb Max: 763.67 Mb
22:47:31.876 INFO trieNgramModel LM Cache Size: 13625 Hits: 2442934 Misses: 13625
Hypothesis: the art
Ill go ahead and close this.
ok the stackoverflow got no answer till now