Need some feedback on some Simple code I wrote to decrypt a Caesar Cypher | Rust Programming Language Community | Page 1

grave mural Dec 30, 2023, 9:48 AM

#

Hi, Rust experts!

I've written a simple library code with a submodule named utils.rs, to crack a simple Caesar Cypher (shift the letters of the alphabet by a fixed amount).

What this does is take the input text, index each letter by its position in the alphabet, add a shift to each index, and then map the new indices back to their corresponding letters. This is repeated for shifts of 0 to 25. It essentially just does the busywork of trying each possible Caesar Cypher configuration.

What I need to know is how to make this code more efficient, because I get the feeling that there may be an easier/faster/more efficient way of doing things.

main.rs

use rust_crypto_tools::*;

fn main() {
    
    attack::ceasar(String::from("hzruvadohafvbyjvbuayfjhukvmvyfvbhzrdohafvbjhukvmvyfvbyjvbuayf"),26);
    
    }

#

lib.rs

mod utils;
use utils::text_manipulation;
//Holds full attack algorithms
pub struct attack {
//metadata here

}

impl attack {

pub fn ceasar (plaintext:String, letters:i64) 
{

        let uppercase_ordered:Vec<String> = vec![
            "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
            "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
        ].into_iter().map(str::to_string).collect();

        let lowercase_ordered:Vec<String> = vec![
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q",
        "r", "s", "t", "u", "v", "w", "x", "y", "z",
        ].into_iter().map(str::to_string).collect();


let mut decrypted_text:Vec<String> =vec![];

for i in 0..letters {
let plaintext_vector=text_manipulation::english_alphabet_vectorize(String::from(&plaintext)).iter().map(|x|x+i).collect::<Vec<i64>>();
for j in plaintext_vector  {
let mut index=text_manipulation::looping_index(25, j);
decrypted_text.push(String::from(&uppercase_ordered[index as usize]));
}
println!("{:?}",  0b001100001+0b1    );
println!("{}   shift = {}",decrypted_text.join(""),i);  //To add in mistakes: WEhen you want to convert vector of strings to string, use join("")
decrypted_text.clear();
}

}
}

#

utils.rs

pub mod text_manipulation {
    fn reverse_text(text: String) -> String {
        text.chars().rev().collect()
    }

    fn get_item_type<T>(_: &T) {
        println!("{}", std::any::type_name::<T>());
    }

    //This function assumes zero-indexing, and is inclusive of counter_limit
    pub fn looping_index(mut index_limit: i64, current_point: i64) -> i64 {
        match current_point > index_limit {
            true => current_point % (index_limit + 1),
            _ => match current_point < 0 {
                true => {
                    index_limit += 1;
                    (index_limit - (current_point.abs() % index_limit)) % index_limit  //final part is to handle the n cases when n is a multiple of index_limit
                }
                false => current_point, //if it's within the range of 0..=index_limit
            },
        }
    }

    //Accepts the English alphabet as a single string, and transform it into a numbered vector
    pub fn english_alphabet_vectorize(text: String) -> Vec<i64> //Vec<String>
    {
        let uppercase_ordered:Vec<String> = vec![
            "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
            "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
        ].into_iter().map(str::to_string).collect();
        
        let mut trial_vector:Vec<usize>=vec![];
        for item in text.split("").map(str::to_uppercase).collect::<Vec<String>>() {

            match uppercase_ordered.iter().position(|r| *r==item)
            {
            Some(i) => trial_vector.push(i),
            None => {}
            }


        }
    trial_vector.iter().map(|x|*x as i64).collect()

    }

}

cobalt relic Dec 30, 2023, 10:41 AM

#

First of all, run cargo fmt. It indents the code and inserts spaces around operators (among other things) to make the formatting more consistent and readable.

#

Now, for some efficiency improvements. When dealing with single characters, you can make use of char. They take up less space in memory and don't need to allocate their contents on the heap. For example english_alphabet_vectorize could be written like this

pub fn english_alphabet_vectorize(text: String) -> Vec<i64> {
    let uppercase_ordered: Vec<char> = vec![
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
        'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
    ];
    
    let mut trial_vector: Vec<i64> = vec![];
    for item in text.to_uppercase().chars() {
        match uppercase_ordered.iter().position(|r| *r == item) {
            Some(i) => trial_vector.push(i as i64),
            None => {}
        }
    }
    trial_vector
}

.chars() is used to turn a string into an iterator of characters. It's basically .split("") except it gives you a char instead of a &str for each character.
To get rid of some more allocations, I removed the .collect() from the loop, because it's possible to iterate over iterators directly. I also changed the type of trial_vector to Vec<i64> so we don't have to create a new vector later to change the type.
(If removing .collect() somewhere gives you weird errors, that's likely due to lifetime errors. In those cases I suggest keeping the .collect())

And a sidenote: I placed .to_uppercase() before .chars() because it's difficult to turn a single char into uppercase (for reasons I will not go into right now).
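A minimal sketch of why uppercasing a single char is awkward: `char::to_uppercase` returns an iterator, because one character can uppercase to several. The helper names `unicode_upper` and `ascii_upper` are made up for illustration.

```rust
/// Uppercases one char the Unicode-aware way; the result may be longer than one char.
fn unicode_upper(c: char) -> String {
    // `char::to_uppercase` yields an iterator of chars, not a single char.
    c.to_uppercase().collect()
}

/// Uppercases one char using ASCII rules only; non-ASCII chars come back unchanged.
fn ascii_upper(c: char) -> char {
    c.to_ascii_uppercase()
}
```

For a cipher that only cares about A to Z, the ASCII-only variant is probably enough and avoids the extra `String`.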

#

Now for looping_index. There is a function to do that in the standard library, so it could be implemented as just

pub fn looping_index(index_limit: i64, current_point: i64) -> i64 {
    current_point.rem_euclid(index_limit + 1)
}

This won't make the code more efficient, but it makes the code simpler, reducing the risk of bugs
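To see why `rem_euclid` fits here and plain `%` does not, here is a small sketch comparing the two on negative inputs. `plain_remainder` is a hypothetical helper added only for the comparison.

```rust
/// Wraps `current_point` into `0..=index_limit` using Euclidean remainder.
fn looping_index(index_limit: i64, current_point: i64) -> i64 {
    current_point.rem_euclid(index_limit + 1)
}

/// Same arithmetic with `%`, which keeps the sign of the left operand.
fn plain_remainder(index_limit: i64, current_point: i64) -> i64 {
    current_point % (index_limit + 1)
}
```

With a limit of 25, `looping_index(25, -1)` wraps around to 25, while `plain_remainder(25, -1)` stays at -1 and would be an invalid index.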

solemn bough Dec 30, 2023, 11:00 AM

#

This should be equivalent code

//Accepts the English alphabet as a single string, and transform it into a numbered vector
pub fn english_alphabet_vectorize(text: &str) -> Vec<i64> {
    text.chars()
        // Uppercase first so lowercase input is kept, as in the original
        .map(|item| item.to_ascii_uppercase())
        .filter(|item| item.is_ascii_uppercase())
        .map(|item| (item as u8 - b'A') as i64)
        .collect()
}

cobalt relic Dec 30, 2023, 11:02 AM

#

In ceasar, apart from the efficiency improvement by using Vec<char> instead of Vec<String> for uppercase_ordered, you can also improve the efficiency a bit by building a string directly instead of pushing to a Vec<String>.

let mut decrypted_text = String::new(); // creates an empty string

for i in 0..letters {
    // ...
    for j in plaintext_vector {
        let index = text_manipulation::looping_index(25, j);
        decrypted_text.push_str(&uppercase_ordered[index as usize]);
        // Or if you change the type of `uppercase_ordered` to `Vec<char>`:
        // decrypted_text.push(uppercase_ordered[index as usize]);
    }
    println!("{}", decrypted_text);
    decrypted_text.clear();
}

Maybe you avoided it because it's inefficient to do so in other languages, where it would have to create a new string on each iteration, but in Rust it reuses the same string so it's actually more efficient because we can skip making the Vec<String>.
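A rough sketch of the reuse described above: `clear()` empties the string but keeps its allocation, so later rounds write into the same buffer. The function name `reuse_buffer` and its return shape are assumptions made for the demonstration.

```rust
/// Writes `word` into one reused `String` for `rounds` rounds and returns
/// (capacity after the first round, capacity after the last round).
fn reuse_buffer(word: &str, rounds: usize) -> (usize, usize) {
    let mut buf = String::new();
    let mut first = 0;
    for round in 0..rounds {
        buf.clear(); // drops the contents, keeps the allocation
        buf.push_str(word);
        if round == 0 {
            first = buf.capacity();
        }
    }
    (first, buf.capacity())
}
```

Since every round writes the same number of bytes, the two capacities should match, which suggests no new allocation happened after the first round.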

solemn bough Dec 30, 2023, 11:12 AM

#

pub fn ceasar(plaintext: &str, letters: i64) {
    let mut decrypted_text = String::new();

    for i in 0..letters {
        text_manipulation::ascii_uppercase_offsets(plaintext)
            .map(|x| text_manipulation::looping_index(25, x + i))
            .for_each(|index| {
                decrypted_text.push((index as u8 + b'A') as char);
            });

        println!("Shift: {:2}, Decrypted: {}", i, decrypted_text);
        decrypted_text.clear();
    }
}

pub mod text_manipulation {
    //This function assumes zero-indexing, and is inclusive of counter_limit
    pub fn looping_index(index_limit: i64, current_point: i64) -> i64 {
        if current_point > index_limit {
            current_point % (index_limit + 1)
        } else if current_point < 0 {
            let index_limit = index_limit + 1;
            (index_limit - (current_point.abs() % index_limit)) % index_limit
        } else {
            current_point
        }
    }

    //Accepts a string of ascii alphabetic characters, and transforms it into an iterator of the character's offset from 'A'
    pub fn ascii_uppercase_offsets(text: &str) -> impl Iterator<Item = i64> + '_ {
        text.chars()
            .map(|item| item.to_ascii_uppercase())
            .filter_map(|item| {
                if item.is_ascii_alphabetic() {
                    Some((item as u8 - b'A') as i64)
                } else {
                    None
                }
            })
    }
}

fn main() {
    ceasar("QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD", 26); // Prints "Shift:  3, Decrypted: THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
}

olive panther Dec 30, 2023, 11:20 AM

#

instead of returning a Vec and then calling iter, english_alphabet_vectorize can return an impl Iterator<Item = i64> + '_, avoiding the collect

solemn bough Dec 30, 2023, 11:34 AM

#

With inlining:

pub fn ceasar(plaintext: &str, letters: i64) {
    let looping_index = |index_limit, current_point: i64| if current_point > index_limit {
        current_point % (index_limit + 1)
    } else if current_point < 0 {
        let index_limit = index_limit + 1;
        (index_limit - (current_point.abs() % index_limit)) % index_limit
    } else {
        current_point
    };

    let mut decrypted_text = String::new();

    for i in 0..letters {
        plaintext.chars()
            .map(|item| item.to_ascii_uppercase())
            .filter(char::is_ascii_alphabetic)
            .map(|item| item as u8 - b'A')
            .map(|x| looping_index(25, x as i64 + i))
            .for_each(|index| {
                decrypted_text.push((index as u8 + b'A') as char);
            });

        println!("Shift: {:2}, Decrypted: {}", i, decrypted_text);
        decrypted_text.clear();
    }
}

fn main() {
    ceasar("QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD", 26); // Prints "Shift:  3, Decrypted: THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
}

grave mural Dec 30, 2023, 12:26 PM

#

cobalt relic In `ceasar`, apart from the efficiency improvement by using `Vec<char>` instea...

Ohh... all right, that's a great improvement!

grave mural Dec 30, 2023, 12:28 PM

#

cobalt relic In `ceasar`, apart from the efficiency improvement by using `Vec<char>` instea...

I actually did not realize this. Did you have to look through the compiler output to know?

solemn bough Dec 30, 2023, 12:32 PM

#

grave mural I actually did not realize this. Did you have to look through the compiler outpu...

String allocates heap memory for its internal buffer and stores a length, capacity, and pointer to that memory in the object, totalling 24 bytes per string on a 64-bit target, not counting the heap. A char, on the other hand, is "just" a 32-bit number stored entirely within 4 bytes.
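These sizes can be checked directly with `std::mem::size_of`. A small sketch follows; the helper name `sizes` is made up, and the `String` figure depends on the target's pointer width.

```rust
use std::mem::size_of;

/// Returns (size of `String`, size of `char`) in bytes for the current target.
fn sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<char>())
}
```

On a 64-bit target this typically gives `(24, 4)`: three pointer-sized fields for the `String`, and four bytes for the `char`.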

grave mural Dec 30, 2023, 12:33 PM

#

solemn bough `String` allocates heap memory for its internal buffer and stores a length, capa...

I see... the reason I was hesitant to make use of it was because I thought you'd have to create a new string object with every iteration.

solemn bough Dec 30, 2023, 12:36 PM

#

grave mural I see... the reason I was hesitant to make use of it was because I thought you'd...

Strings are amortized in Rust, meaning that if the data you're adding goes over the current capacity, a String grows its internal buffer geometrically (in practice it roughly doubles). This means that pushing to a string is amortized O(1): most pushes don't allocate at all, even though a few of them will.

#

The same internal buffer of the String can be reused
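One way to observe the amortization is to count how often the capacity changes while pushing one character at a time. This is only a sketch: the exact growth strategy is not guaranteed by the standard library, and `count_reallocations` is a hypothetical helper.

```rust
/// Pushes `n` ASCII characters one at a time and counts capacity changes.
fn count_reallocations(n: usize) -> usize {
    let mut s = String::new();
    let mut last_capacity = s.capacity();
    let mut reallocations = 0;
    for _ in 0..n {
        s.push('A');
        if s.capacity() != last_capacity {
            // The buffer was grown, so this push paid for an allocation.
            reallocations += 1;
            last_capacity = s.capacity();
        }
    }
    reallocations
}
```

For ten thousand pushes the count should be a small number, far below the number of pushes.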

grave mural Dec 30, 2023, 12:37 PM

#

solemn bough `String`s are amortized in Rust, meaning that if the data you're adding goes ove...

Ohh... where did you get all this information from? I couldn't remember seeing it in The Book (or maybe I was just tired when I read it).

#

Would love to get up to speed by reading the same material that you did!

late isle Dec 30, 2023, 12:43 PM

#

the info is probably derived from the following process:

1) knowing that String uses Vec<u8> internally, by looking at the source code
2) knowing that Vec grows its capacity geometrically (in practice it doubles) when it runs out of room, by reading the docs of Vec in the stdlib docs
3) knowing what amortization means and what a Vec fundamentally does, by reading a book on data structures and algorithms (DSAs)

solemn bough Dec 30, 2023, 12:45 PM

#

grave mural Ohh... where did you get all this information from? I couldn't remember seeing i...

The Rust documentation is incredibly good, you should always read up before trying to do something. Here's the doc on the internal representation of a String. Again, the Book, rustlings, and the docs are your friend. (And the Rustonomicon, but that's for very advanced Rust).

String in std::string - Rust

A UTF-8–encoded, growable string.

grave mural Dec 30, 2023, 12:46 PM

#

solemn bough The Rust documentation is incredibly good, you should always read up before tryi...

Yeah, I just read The Book, I thought it would be enough. 😭 Didn't know there was so much more detail.

late isle Dec 30, 2023, 12:46 PM

#

the book is just the basics

#

it gets you up and running, but not enough to understand the intricacies of most things

#

the next advice id usually give is reading the entirety of stdlib docs and doing this exercise

solemn bough Dec 30, 2023, 12:47 PM

#

grave mural Yeah, I just read The Book, I thought it would be enough. 😭 Didn't know there ...

Most of everything in Rust's std is documented very well, so check the docs if you're unsure of anything :)

late isle Dec 30, 2023, 12:47 PM

#

-pngme

willow juniperBOT Dec 30, 2023, 12:47 PM

#

https://jrdngr.github.io/pngme_book/

Introduction - PNGme: An Intermediate Rust Project

grave mural Dec 30, 2023, 12:49 PM

#

late isle the next advice id usually give is reading the entirety of stdlib docs and doing...

That's... a lot. I'll try though! Any advice on how you did it?

#

Not saying I won't try, it's just that there's a lot to remember.

late isle Dec 30, 2023, 12:50 PM

#

i did not memorize the entirety of stdlib. i just read it mostly to know where things are, and did a deep dive on the most common types i usually use

#

Option, Result, String, Vec, and Iterator is what i would explore first

grave mural Dec 30, 2023, 12:52 PM

#

late isle Option, Result, String, Vec, and Iterator is what i would explore first

Ahh, okay, thanks, that's very sound advice!

grave mural Dec 30, 2023, 3:38 PM

#

late isle the info is probably derived from the following process: 1) knowing that String ...

Would you recommend any books on Data Structures, BTW? I have Cormen's famous Introduction to Algorithms, but they don't say much about Vecs apart from a brief mention about bit vecs. They still have what looks to me like a pretty impressive list of Data Structures, though.

sharp stratus Dec 31, 2023, 2:14 AM

#

cobalt relic Now, for some efficiency improvements. When dealing with single characters, you ...

i mean u8 is 1 byte

solemn bough Dec 31, 2023, 2:43 AM

#

sharp stratus i mean u8 is 1 byte

What does that have to do with this?

#

A char is 4 bytes

sharp stratus Dec 31, 2023, 2:44 AM

#

    let uppercase_ordered: Vec<u8> = vec![
        b'A', b'B', b'C', b'D', b'E', b'F', b'G', b'H', b'I', b'J', b'K', b'L', b'M', b'N', b'O', b'P', b'Q',
        b'R', b'S', b'T', b'U', b'V', b'W', b'X', b'Y', b'Z',
    ];

solemn bough Dec 31, 2023, 2:45 AM

#

sharp stratus ```rs let uppercase_ordered: Vec<u8> = vec![ b'A', b'B', b'C', 'D', ...

Either way, all those are just ASCII offsets in order, none of that is required in the cipher

#

Also [u8; _] would be better if we go down that path

sharp stratus Dec 31, 2023, 2:46 AM

#

[char; 26] would be better too

solemn bough Dec 31, 2023, 2:47 AM

#

Just item as u8 - b'A' works as well
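A minimal sketch of the "no lookup table" idea: subtract `b'A'` to get the offset, add the shift modulo 26, and add `b'A'` back. The function name `shift_letters` is made up; like the versions earlier in the thread, it uppercases the input and drops non-letters.

```rust
/// Shifts ASCII letters forward by `shift` positions, wrapping around the alphabet.
/// Lowercase letters are uppercased and non-letters are dropped.
fn shift_letters(text: &str, shift: u8) -> String {
    text.bytes()
        .map(|b| b.to_ascii_uppercase())
        .filter(|b| b.is_ascii_uppercase())
        // Offset from 'A', shifted and wrapped, then turned back into a letter.
        .map(|b| ((b - b'A' + shift % 26) % 26 + b'A') as char)
        .collect()
}
```

Calling `shift_letters("QEB NRFZH", 3)` gives `"THEQUICK"`.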

sharp stratus Dec 31, 2023, 2:48 AM

#

eh

#

yeah

grave mural Dec 31, 2023, 4:43 AM

#

solemn bough Either way, all those are just ASCII offsets in order, none of that is required ...

I was actually thinking of using binary and incrementing it directly, but apparently when I tried that it got automatically recast to i32 by the + operator, and there's no way to cast it back apparently.
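For what it's worth, an unsuffixed integer literal defaults to `i32`, but giving it a `u8` type keeps the arithmetic in `u8`, and a `u8` can always be cast to `char`. A small sketch with made-up helper names:

```rust
/// Adds one to a byte written as a binary literal and converts the result to a char.
fn next_letter_after_a() -> char {
    // The `u8` annotation pins the type; without it the literal defaults to `i32`.
    let a: u8 = 0b0110_0001; // 97, ASCII 'a'
    let b = a + 0b1; // still a `u8`
    b as char // `u8 as char` is always allowed
}

/// Converts an `i32` back into a char, if it fits in a single byte.
fn i32_to_char(n: i32) -> Option<char> {
    if (0..=255).contains(&n) {
        Some(n as u8 as char)
    } else {
        None
    }
}
```

So a value that has already become an `i32` can still be brought back, as long as the range is checked first.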

livid linden Dec 31, 2023, 5:33 AM

#

?play ```rust
fn shift_by_one(s: &mut String) {
    for byte in unsafe { s.as_mut_vec() } {
        match byte {
            b'A'..=b'Y' | b'a'..=b'y' => *byte += 1,
            b'Z' | b'z' => *byte -= 25,
            _ => (),
        }
    }
}

fn main() {
    let mut s = String::from("QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD");

    for shift in 0..26 {
        println!("{shift:>2}: {s}");
        shift_by_one(&mut s);
    }
}

sly pecanBOT Dec 31, 2023, 5:33 AM

#

 0: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
 1: RFC OSGAI ZPMUL DMV HSKNQ MTCP RFC JYXW BME
 2: SGD PTHBJ AQNVM ENW ITLOR NUDQ SGD KZYX CNF
 3: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
 4: UIF RVJDL CSPXO GPY KVNQT PWFS UIF MBAZ EPH
 5: VJG SWKEM DTQYP HQZ LWORU QXGT VJG NCBA FQI
 6: WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ
 7: XLI UYMGO FVSAR JSB NYQTW SZIV XLI PEDC HSK
 8: YMJ VZNHP GWTBS KTC OZRUX TAJW YMJ QFED ITL
 9: ZNK WAOIQ HXUCT LUD PASVY UBKX ZNK RGFE JUM
10: AOL XBPJR IYVDU MVE QBTWZ VCLY AOL SHGF KVN
11: BPM YCQKS JZWEV NWF RCUXA WDMZ BPM TIHG LWO
12: CQN ZDRLT KAXFW OXG SDVYB XENA CQN UJIH MXP
13: DRO AESMU LBYGX PYH TEWZC YFOB DRO VKJI NYQ
14: ESP BFTNV MCZHY QZI UFXAD ZGPC ESP WLKJ OZR
15: FTQ CGUOW NDAIZ RAJ VGYBE AHQD FTQ XMLK PAS
16: GUR DHVPX OEBJA SBK WHZCF BIRE GUR YNML QBT
17: HVS EIWQY PFCKB TCL XIADG CJSF HVS ZONM RCU
18: IWT FJXRZ QGDLC UDM YJBEH DKTG IWT APON SDV
19: JXU GKYSA RHEMD VEN ZKCFI ELUH JXU BQPO TEW
20: KYV HLZTB SIFNE WFO ALDGJ FMVI KYV CRQP UFX
21: LZW IMAUC TJGOF XGP BMEHK GNWJ LZW DSRQ VGY
22: MAX JNBVD UKHPG YHQ CNFIL HOXK MAX ETSR WHZ
23: NBY KOCWE VLIQH ZIR DOGJM IPYL NBY FUTS XIA
24: OCZ LPDXF WMJRI AJS EPHKN JQZM OCZ GVUT YJB
25: PDA MQEYG XNKSJ BKT FQILO KRAN PDA HWVU ZKC```

livid linden Dec 31, 2023, 5:47 AM

#

grave mural Hi, Rust experts! I've written a simple library code with a submodule named `ut...

See the code above for a more efficient way, as it uses no division and only one allocation. It's also simpler and doesn't throw away spaces or punctuation.

grave mural Dec 31, 2023, 5:49 AM

#

livid linden ?play ```rust fn shift_by_one(s: &mut String) { for byte in unsafe { s.as_mu...

Oooh... unsafe territory... 😯 I'll read up on that!

livid linden Dec 31, 2023, 5:51 AM

#

It's unsafe because Strings have to have UTF-8, but if you see ASCII characters like English letters, they're guaranteed to be those ASCII characters, and so you can safely alter them to other ASCII characters without ruining UTF-8.
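If `unsafe` feels like too much for now, the same in-place shift can be sketched in safe code by taking the bytes out of the `String` and putting them back. This is an alternative with made-up naming (`shift_by_one_safe`), not the version posted above, and it pays for one extra UTF-8 validation pass.

```rust
/// Shifts every ASCII letter by one, in place, without `unsafe`.
fn shift_by_one_safe(s: &mut String) {
    // Take ownership of the bytes, leaving an empty String behind for a moment.
    let mut bytes = std::mem::take(s).into_bytes();
    for byte in &mut bytes {
        match *byte {
            b'A'..=b'Y' | b'a'..=b'y' => *byte += 1,
            b'Z' | b'z' => *byte -= 25,
            _ => (),
        }
    }
    // Re-validates UTF-8. It cannot fail here because only ASCII bytes were
    // replaced, and only with other ASCII bytes.
    *s = String::from_utf8(bytes).expect("ASCII-only edits keep the string valid UTF-8");
}
```

Non-ASCII characters such as `é` pass through untouched, since their bytes never match the letter ranges.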

grave mural Dec 31, 2023, 5:51 AM

#

livid linden ?play ```rust fn shift_by_one(s: &mut String) { for byte in unsafe { s.as_mu...

Yes, this would be the most efficient, I think! You don't even have to refer to a secondary vector or anything.

grave mural Dec 31, 2023, 5:52 AM

#

livid linden It's unsafe because `String`s have to have UTF-8, but if you see ASCII character...

That's a great solution for English charsets!

What if we want extended non-ASCII characters, how would you handle that?

livid linden Dec 31, 2023, 5:53 AM

#

For that, I'd probably do str.chars().whatever.collect::<String>() because non-ASCII characters have lengths from 2 to 4 bytes, and replacing them can change the length of that.

#

If you're sure the byte lengths of the original and replacement characters are the same, you can do it this way, though.
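A hedged sketch of the `chars()` and `collect::<String>()` route for a non-ASCII alphabet. It assumes the lowercase Cyrillic block from U+0430 to U+044F, which is 32 contiguous code points and leaves out 'ё'. The function name `shift_cyrillic` is made up for illustration.

```rust
/// Shifts lowercase Cyrillic letters U+0430..=U+044F forward by `shift`,
/// wrapping inside that range. Everything else passes through unchanged.
fn shift_cyrillic(text: &str, shift: u32) -> String {
    const FIRST: u32 = 0x0430; // first letter of the range
    const COUNT: u32 = 32; // letters in the range; U+0451 sits outside it
    text.chars()
        .map(|c| match c {
            '\u{0430}'..='\u{044F}' => {
                let offset = (c as u32 - FIRST + shift % COUNT) % COUNT;
                // FIRST + offset stays inside the range, so this is always Some.
                char::from_u32(FIRST + offset).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

Because the result is collected into a fresh `String`, it does not matter whether a replacement character has the same byte length as the original.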

grave mural Dec 31, 2023, 5:54 AM

#

livid linden If you're sure the byte lengths of the original and replacement characters are t...

By 'This way' you mean directly working with bytes?

livid linden Dec 31, 2023, 5:54 AM

#

Yes.

grave mural Dec 31, 2023, 5:54 AM

#

livid linden If you're sure the byte lengths of the original and replacement characters are t...

I see

#

The Caesar Cypher is pretty basic, eventually I'll have to move on to bit manipulation. Any useful advice for that?

livid linden Dec 31, 2023, 5:55 AM

#

What task with bit manipulation?

grave mural Dec 31, 2023, 5:56 AM

#

livid linden What task with bit manipulation?

XOR -ing bits, truncating them, etc.

#

If I remember correctly, those are the operations involved in SHA-256
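For a taste of what those operations look like in Rust, here is a sketch of three of the bitwise functions from the SHA-256 specification, plus the wrapping addition it relies on. The function names are chosen here for illustration. This is only a fragment, not a hash implementation.

```rust
/// SHA-256 "choose": for each bit, picks y where x is 1 and z where x is 0.
fn ch(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (!x & z)
}

/// SHA-256 "majority": each output bit is the majority vote of the three inputs.
fn maj(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

/// SHA-256 small sigma 0, used in the message schedule.
fn small_sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// Addition modulo 2^32, which SHA-256 uses throughout.
fn add_mod32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

The integer types already provide `rotate_right` and `wrapping_add`, so there is no need to hand-roll rotations or overflow handling.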

livid linden Dec 31, 2023, 5:57 AM

#

Ahh, if you're writing that, as long as you use the bit operations they say, it should be pretty efficient.

#

I don't think there's much you can do to speed it up.

#

For security, you'd probably want to figure out a way to do things in constant time or whatever.

livid linden Dec 31, 2023, 6:01 AM

#

grave mural If I remember correctly, those are the operations involved in SHA-256

Oh, there's also this: https://en.wikipedia.org/wiki/Intel_SHA_extensions.

#

That can provide hardware acceleration.

grave mural Dec 31, 2023, 6:03 AM

#

livid linden For security, you'd probably want to figure out a way to do things in constant t...

Not sure if that's possible, LOL, especially if the size of the data being hashed is huge.

livid linden Dec 31, 2023, 6:03 AM

#

No, I mean your bit shifts and so forth.

#

The only thing that should change the time taken would be the amount of data you're hashing.

#

The specific bits in the data shouldn't change the time it takes to do the hashing.
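A classic neighbouring example of the same idea is comparing two digests without bailing out at the first differing byte. The sketch below uses a made-up function name (`ct_eq`) and only illustrates the pattern. The compiler is free to optimize it in ways that reintroduce timing differences, so real code would more likely rely on a vetted crate such as `subtle`.

```rust
/// Compares two byte slices without exiting early on the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // lengths are usually public, so this early exit is acceptable
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate differences instead of branching on them
    }
    diff == 0
}
```

The loop always visits every byte, so the time taken depends on the length of the input and not on where the first difference sits.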

#

Ahh, there's hardware acceleration on more than Intel: https://en.wikipedia.org/wiki/SHA-2#Implementations.

grave mural Dec 31, 2023, 6:23 AM

#

livid linden No, I mean your bit shifts and so forth.

OMG... Yes, you're right, now I remember that if you mess up and make it faster/slower to encrypt data with more 1s than 0s, etc, even if it's for efficiency's sake, a malicious attacker might still be able to infer what information you had.

grave mural Dec 31, 2023, 6:23 AM

#

livid linden The specific bits in the data shouldn't change the time it takes to do the hashi...

Best tactic: Have a fixed delay that's longer than the whole hashing process LOLOL 😂
