This is a segment of some bioinformatics code where I take a "word" (a DNA string) and an "alphabet" (the possible DNA letters) as input:
word = "ACGTTCACGTCGATGCTATGCGATGCATGT";
alphabet = "ACGTN";
This code is supposed to take that word and build a vector listing every possible DNA strand that can be created by inserting X additional letters, where each letter can be any letter in the alphabet. It needs to work for any value of X (num_mismatches in my code).
I use a very brute-force method: I loop through the Vec<DNA>, insert each possible letter at each possible position, and append every result to a big list. Then, if num_mismatches > 1, I loop through THAT list and insert each possible letter at each possible position in the new words.
This involves a lot of cloning, allocation, and insertion, all of which look expensive, but I can't figure out how to reduce them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DNA {
    A,
    C,
    G,
    T,
    N,
}
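For context, the inputs above are written as strings, while the code works on DNA values. A conversion could look roughly like the sketch below. The helper names dna_from_char and parse_dna are hypothetical (they are not part of the code in this post), and returning None for an unknown character is an assumption about how bad input should be handled.

```rust
// The enum is repeated here only so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DNA {
    A,
    C,
    G,
    T,
    N,
}

/// Hypothetical helper: maps one character to a `DNA` value,
/// or `None` for anything outside "ACGTN".
fn dna_from_char(c: char) -> Option<DNA> {
    match c {
        'A' => Some(DNA::A),
        'C' => Some(DNA::C),
        'G' => Some(DNA::G),
        'T' => Some(DNA::T),
        'N' => Some(DNA::N),
        _ => None,
    }
}

/// Hypothetical helper: parses a whole string, returning `None`
/// as soon as any character is not a valid letter.
fn parse_dna(s: &str) -> Option<Vec<DNA>> {
    // Collecting an iterator of Option<DNA> into Option<Vec<DNA>>
    // short-circuits on the first None.
    s.chars().map(dna_from_char).collect()
}
```

With these, `parse_dna("ACGTN")` would give the alphabet as a `Vec<DNA>`, and the word can be parsed the same way.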
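// Insertions, slow?
let mut potential_mismatches: Vec<Vec<DNA>> = Vec::new();
// Words produced by the most recent round of insertions.
let mut insertion_mismatches: Vec<Vec<DNA>> = Vec::new();
let mut inserts = 1;
while inserts <= num_mismatches {
    let mut word_insert_list: Vec<Vec<DNA>> = Vec::new();
    if inserts == 1 {
        // Round 1 starts from the original word.
        word_insert_list.push(word.to_vec());
    } else {
        // Later rounds extend only the previous round's words. Clearing the
        // list keeps shorter words out of the next round; otherwise
        // insert(i, ..) panics once i exceeds their length (from round 3 on),
        // and earlier rounds get appended to potential_mismatches twice.
        word_insert_list = insertion_mismatches.clone();
        insertion_mismatches.clear();
    }
    // Each word in this round has word.len() + inserts - 1 letters, which
    // gives word.len() + inserts insertion positions.
    let word_length = word.len() + inserts;
    for word_to_insert_in in &mut word_insert_list {
        for i in 0..word_length {
            for nt in alphabet {
                word_to_insert_in.insert(i, *nt);
                insertion_mismatches.push(word_to_insert_in.clone());
                word_to_insert_in.remove(i);
            }
        }
    }
    potential_mismatches.extend(insertion_mismatches.clone());
    inserts += 1;
}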
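One possible way to cut down on the copying, shown only as a sketch of the same brute-force enumeration: build each new word once from two slices instead of insert/clone/remove, and keep every round in a single list instead of cloning whole lists between rounds. The function names insertion_variants and push_variants are hypothetical, and the sketch assumes the goal is the same output as above (all words with 1 up to num_mismatches inserted letters, duplicates included).

```rust
// The enum is repeated here only so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DNA {
    A,
    C,
    G,
    T,
    N,
}

/// Hypothetical helper: appends to `out` every word obtained by inserting
/// one alphabet letter at one position of `base`.
fn push_variants(base: &[DNA], alphabet: &[DNA], out: &mut Vec<Vec<DNA>>) {
    // Reserve space for all new words up front.
    out.reserve((base.len() + 1) * alphabet.len());
    for i in 0..=base.len() {
        for &nt in alphabet {
            // One allocation per new word; nothing is shifted back and forth.
            let mut variant = Vec::with_capacity(base.len() + 1);
            variant.extend_from_slice(&base[..i]);
            variant.push(nt);
            variant.extend_from_slice(&base[i..]);
            out.push(variant);
        }
    }
}

/// Hypothetical function: returns every word with 1..=num_mismatches
/// inserted letters, in the order the rounds produce them.
fn insertion_variants(word: &[DNA], alphabet: &[DNA], num_mismatches: usize) -> Vec<Vec<DNA>> {
    let mut all: Vec<Vec<DNA>> = Vec::new();
    // Index in `all` where the previous round's words begin.
    let mut prev_start = 0;
    for round in 1..=num_mismatches {
        let prev_end = all.len();
        if round == 1 {
            push_variants(word, alphabet, &mut all);
        } else {
            for k in prev_start..prev_end {
                // Move the base word out temporarily so `all` can be pushed
                // to while the base is read, then put it back (no clone).
                let base = std::mem::take(&mut all[k]);
                push_variants(&base, alphabet, &mut all);
                all[k] = base;
            }
        }
        prev_start = prev_end;
    }
    all
}
```

Note that this still produces duplicates, just like the loop above: inserting a letter directly before or directly after an identical letter gives the same strand. If only distinct strands are needed, the result could be collected into a `HashSet<Vec<DNA>>`, since DNA already derives Hash and Eq.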