#Any way I can speed this up?
Here's a gist if you prefer that - https://gist.github.com/bones-ai/fc1064d85d083ec43700697c41c6d8b4
package main

import "core:fmt"
import "core:os"
import "core:strconv"
import "core:strings"
import "core:mem"

MNIST_TEST_FILE_PATH :: "./data/mnist_test.csv"
MNIST_TRAIN_FILE_PATH :: "./data/mnist_train.csv"
MNIST_IMG_SIZE :: 28

MnistRecord :: struct {
    data: [MNIST_IMG_SIZE * MNIST_IMG_SIZE]u8,
    value: u8,
}

load_mnist_data :: proc(path: string, size: u16) -> (data: []MnistRecord, ok: bool) {
    file, f_ok := os.read_entire_file(path)
    if !f_ok {
        return
    }
    defer delete(file)

    file_string := string(file)
    data = make([]MnistRecord, size)
    line_idx := 0
    is_first_line := true
    for line in strings.split_lines_iterator(&file_string) {
        defer line_idx += 1
        if is_first_line {
            is_first_line = false
            continue
        }

        splits := strings.split(line, ",")
        rec: MnistRecord
        is_record_value := true
        rec_idx := 0
        for s in splits {
            defer rec_idx += 1
            value := u8(strconv.atoi(s))
            if is_record_value {
                is_record_value = false
                rec.value = value
                continue
            }
            rec.data[rec_idx - 1] = value
        }
        data[line_idx - 1] = rec
    }
    return data, true
}

main :: proc() {
    mnist_data, ok := load_mnist_data(MNIST_TRAIN_FILE_PATH, 60000)
    defer delete(mnist_data)
    if !ok {
        fmt.println("Failed to read mnist data file")
        return
    }

    for d, i in mnist_data {
        if i % 5000 == 0 {
            fmt.printfln("i: %d -> v: %d", i, d.value)
        }
    }
}
load_mnist_data :: proc(path: string, size: u16) -> (data: []MnistRecord, ok: bool) {
    file, f_ok := os.read_entire_file(path, context.temp_allocator)
    if !f_ok {
        return
    }
    defer delete(file, context.temp_allocator)

    file_string := string(file)
    data = make([]MnistRecord, size)

    // Skip the header line ahead of the loop
    _, _ = strings.split_lines_iterator(&file_string)

    i := 0
    for line in strings.split_lines_iterator(&file_string) {
        defer i += 1
        line := line

        // The first field is the label, the rest are pixel values
        value_s, _ := strings.split_iterator(&line, ",")
        data[i].value = u8(strconv.atoi(value_s))

        j := 0
        for data_s in strings.split_iterator(&line, ",") {
            defer j += 1
            data[i].data[j] = u8(strconv.atoi(data_s))
        }
    }
    return data, true
}
1. No real reason to put conditions into the loops, just do them ahead of time
2. No reason to strings.split() and allocate (you were also leaking it)
3. No reason to copy rec when you do the assignment to data[i], just write it directly
4. temp allocator is your friend
+1. measure it first
There was a thread a while back on processing large CSV files - not sure if it’s applicable to your case but it might be worth a scan: https://discord.com/channels/568138951836172421/1014208116302299146
Btw there is also core:encoding/csv, but if you need speed, doing it yourself is better
core:encoding/csv is what I used initially, and yes, doing it manually was faster
Thanks for linking that, I'll have a look at the thread in detail 🙂
Hey, thanks a lot for listing all the things I could improve on 🙂
Also can you please elaborate on
4. temp allocator is your friend
+1. measure it first
4. temp allocator is usually faster; in most cases it already has enough memory that it doesn't need to touch the system allocator. If you were to have intermediate allocations (like with strings.split()), there's also runtime.DEFAULT_TEMP_ALLOCATOR_TEMP_GUARD() to automatically drop all allocations in the scope at once, which also saves the time spent in delete().
+1. core:time has Stopwatch for quickly getting some basic timing info on which part feels sluggish. Or, depending on your platform, run a perf tool; on Linux, hotspot is nice and simple to use (or cachegrind with kcachegrind).
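To make the two points above concrete, here is a minimal sketch that combines the temp allocator guard with a Stopwatch. The helper name `sum_fields` and the sample line are made up for illustration, and the `base:runtime` import path is an assumption about the compiler version (older releases used `core:runtime`).

```odin
package main

import "base:runtime"
import "core:fmt"
import "core:strconv"
import "core:strings"
import "core:time"

// Hypothetical helper: sums the comma-separated integers on one line.
sum_fields :: proc(line: string) -> (total: int) {
	// Everything allocated from context.temp_allocator inside this scope
	// is released in one go when the scope exits, so no delete() is needed.
	runtime.DEFAULT_TEMP_ALLOCATOR_TEMP_GUARD()

	// strings.split allocates its result slice; here it comes from the temp allocator.
	fields := strings.split(line, ",", context.temp_allocator)
	for f in fields {
		total += strconv.atoi(f)
	}
	return
}

main :: proc() {
	sw: time.Stopwatch
	time.stopwatch_start(&sw)

	total := 0
	for _ in 0..<100_000 {
		total += sum_fields("5,0,0,128,255,0")
	}

	time.stopwatch_stop(&sw)
	elapsed := time.stopwatch_duration(sw)
	fmt.printfln("total: %d, took %.3f ms", total, time.duration_milliseconds(elapsed))
}
```

Wrapping each suspect section (file read, line splitting, number parsing) in its own start/stop pair should show which part actually dominates before anything gets rewritten.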
I spent a decent amount of time at a company where I did a lot of large csv work. The fastest way to parse through a large file will always be to stream it. No read_entire_file.
And it looks like you're assuming no quotes =]
Thanks for explaining that, still fairly new to the language, I'll have to explore what temp allocators are 🥲
Yes, found this article https://odin-lang.org/news/read-a-file-line-by-line/ that shows how to stream a file
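For reference, a rough sketch of the streaming approach in the style of that article, using core:bufio over an os file handle. The helper `count_rows` is hypothetical and only counts data rows, and the calls assume the core:os and core:bufio APIs as they were around the compiler version used in this thread; treat it as a starting point, not a tested loader.

```odin
package main

import "core:bufio"
import "core:fmt"
import "core:os"
import "core:strings"

// Hypothetical helper: counts data rows (skipping the header) while reading
// the file one line at a time instead of calling os.read_entire_file.
count_rows :: proc(path: string) -> (rows: int, ok: bool) {
	f, ferr := os.open(path)
	if ferr != os.ERROR_NONE {
		return 0, false
	}
	defer os.close(f)

	r: bufio.Reader
	buffer: [4096]byte
	bufio.reader_init_with_buf(&r, os.stream_from_handle(f), buffer[:])
	defer bufio.reader_destroy(&r)

	is_header := true
	for {
		// Allocates, because a line may be longer than the backing buffer.
		line, err := bufio.reader_read_string(&r, '\n', context.allocator)
		defer delete(line, context.allocator)

		trimmed := strings.trim_right(line, "\r\n")
		if len(trimmed) > 0 {
			if is_header {
				is_header = false
			} else {
				rows += 1 // a real loader would parse `trimmed` here
			}
		}

		// On EOF the reader can still hand back a final unterminated line,
		// so the error is checked only after the line has been handled.
		if err != .None {
			break
		}
	}
	return rows, true
}

main :: proc() {
	rows, ok := count_rows("./data/mnist_train.csv")
	if !ok {
		fmt.println("Failed to open file")
		return
	}
	fmt.printfln("rows: %d", rows)
}
```

Whether this beats read_entire_file for a file of this size is something to measure; the per-line allocation in reader_read_string is a cost of its own.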
Update on this,
I was able to optimise it following a few of the suggestions above. It now takes almost half the time for 60k records, and I think I'm happy with this for now
https://gist.github.com/bones-ai/b9128a391d9b18540b284e0c499ceac7
is that a debug build or fully optimized release build? What's the use case, do you need the debug build to be fast too?
It was optimized for speed
odin build . -o:speed
This is for an MNIST neural network visualization I'm working on, and I'm fine with the debug builds being slower initially during the data load process, I guess
try odin build . -o:aggressive -disable-assert -no-bounds-check -microarch:native
few ideas:
// Pass in raw string bytes and skip decoding utf8 runes
string_to_u8 :: proc(s: []byte) -> u8 {
    result: u8 = 0;
    for ch in s {
        result = result * 10 + (ch - '0');
    }
    return result;
}

// ...

for i := 0;; i += 1 {
    // reader_read_string should have used context.temp_allocator. This way you don't need to free
    //
    // Never tested reader_read_slice but it could work:
    // reader_read_slice doesn't allocate, but data will be overwritten on next read.
    // the 'buffer' passed into bufio.Reader _has_ to be large enough to handle each line.
    line := transmute(string)(bufio.reader_read_slice(&r, '\n') or_break)

    // process line
    values := split_u8_string(line)
    ret[i].value = values[0]
    for j in 1..=MNIST_IMG_SIZE * MNIST_IMG_SIZE {
        ret[i].data[j-1] = f32(values[j]) / 255.0
    }
}
I didn't test this code (especially reader_read_slice, which might not work) but maybe it's useful
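The snippet above calls a `split_u8_string` helper that isn't shown. As an illustration of the same idea (walking raw bytes instead of decoding runes or allocating a split), here is a hypothetical stand-in named `parse_u8_fields`; it is not the poster's helper, and it assumes unquoted fields that contain only ASCII digits and fit in a u8.

```odin
package main

import "core:fmt"

// Hypothetical stand-in for a field splitter: parses comma-separated decimal
// values from `line` into `out` and returns how many values were written.
// No allocation; values above 255 would silently wrap around.
parse_u8_fields :: proc(line: string, out: []u8) -> (count: int) {
	value: u8 = 0
	has_digits := false

	for i in 0..<len(line) {
		ch := line[i] // indexing a string yields a byte, so no rune decoding
		if ch >= '0' && ch <= '9' {
			value = value * 10 + (ch - '0')
			has_digits = true
		} else if ch == ',' {
			if count >= len(out) {
				return
			}
			out[count] = value // an empty field is stored as 0
			count += 1
			value = 0
			has_digits = false
		}
		// Any other byte (such as a trailing '\r' or '\n') is ignored.
	}

	// The last field has no trailing comma.
	if has_digits && count < len(out) {
		out[count] = value
		count += 1
	}
	return
}

main :: proc() {
	fields: [4]u8
	n := parse_u8_fields("7,0,128,255\r", fields[:])
	fmt.println(n, fields)
}
```

For MNIST the output slice would need room for 785 values per line (one label plus 28 * 28 pixels), and the caller decides whether to keep them as u8 or scale them to f32.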
Is aggressive a nightly thing?
Invalid optimization mode for -o:<string>, got aggressive
Also added the other flags to my build script, so thanks for that
how old is your compiler lol?
Also, will try out the other changes you've suggested when I get some time
it has been in monthly releases for a few months
I remember installing odin only a week ago, let me check
you can use odin version btw
I think I'm on latest? I'm building it from source
[~/projects/bai/odin-nn] $ odin version
odin version dev-2024-08:9553bc368
[~/projects/bai/odin-nn] $ ./build.sh speed
Building for speed
Invalid optimization mode for -o:<string>, got aggressive
Valid optimization modes:
minimal
size
speed
none (useful for -debug builds)