Rewrite WUtils string.c in Rust #3

Merged
trurl merged 7 commits from trurl/wmaker:refactor/wutil-rs-string into refactor/wutil-rs 2025-12-08 12:40:57 -05:00
Owner

This should be pretty straightforward.

There's a case to be made that the functions which see very little use (wtokenjoin, wtokensplit) should belong elsewhere, but it all needs to be rewritten in Rust or factored away regardless.

This should be pretty straightforward. There's a case to be made that the functions which see very little use (wtokenjoin, wtokensplit) should belong elsewhere, but it all needs to be rewritten in Rust or factored away regardless.
trurl added 5 commits 2025-10-28 21:24:40 -04:00
These functions were added to glibc 2.38, so we don't even need to check for
libbsd. There isn't a strong need to worry about supporting older systems
because use of these functions should go away as we rewrite string.c in
Rust. And, apparently, they are not held in high regard historically. That's at
least 2 reasons to be rid of them.
This is slated for replacement, but it will help to be better documented first.
These functions should be gotten rid of as we transition to Rust, but replacing
them with minimally tested Rust code should suffice as a first step.
Without this, `make` won't automatically rebuild wutil-rs when string.rs
changes.
trurl requested review from cross 2025-10-28 21:25:02 -04:00
cross requested changes 2025-10-31 11:02:43 -04:00
Dismissed
@@ -0,0 +28,4 @@
pub unsafe extern "C" fn wstrndup(s: *const c_char, len: usize) -> *mut c_char {
assert!(!s.is_null());
let s = unsafe { CStr::from_ptr(s) };
let len = usize::min(len, s.count_bytes()) + 1;
Member

I don't think this is what you want, necessarily. You want to find a zero byte, sure, but looking beyond the first len bytes is not useful and may not even be correct, if the source string is not e.g. 0 terminated. I would argue that's a bug, but I could be wrong.

So I'd do something like this:

pub unsafe extern "C" fn strnlen(s: *const c_char, maxlen: usize) -> usize {
    assert!(!s.is_null());
    let s = s.cast<u8>();
    let bs = unsafe { std::slice::from_raw_parts(s, maxlen) };
    bs.into_iter().position(|p| p == 0).unwrap_or(maxlen)
}

pub unsafe extern "C" fn wstrndup(s: *const c_char, len: usize) -> *mut c_char {
    assert!(!s.is_null());
    let len = strnlen(s, len);
    let copy = allow_bytes(len + 1).cast();
    unsafe {
        ptr::copy(s.cast(), copy, len);
    }
    copy.cast()
}
I don't think this is what you want, necessarily. You want to find a zero byte, sure, but looking beyond the first `len` bytes is not useful and may not even be correct, if the source string is not e.g. 0 terminated. I would argue that's a bug, but I could be wrong. So I'd do something like this: ``` pub unsafe extern "C" fn strnlen(s: *const c_char, maxlen: usize) -> usize { assert!(!s.is_null()); let s = s.cast<u8>(); let bs = unsafe { std::slice::from_raw_parts(s, maxlen) }; bs.into_iter().position(|p| p == 0).unwrap_or(maxlen) } pub unsafe extern "C" fn wstrndup(s: *const c_char, len: usize) -> *mut c_char { assert!(!s.is_null()); let len = strnlen(s, len); let copy = allow_bytes(len + 1).cast(); unsafe { ptr::copy(s.cast(), copy, len); } copy.cast() } ```
Author
Owner

Thanks, this is an improvement. (The extant wstrndup called strlen on s, so I didn't feel too badly about scanning ahead for a null byte ... but we can do better.)

I'd like to migrate away from supporting C-style strings, so I'm putting the length computation inline.

Thanks, this is an improvement. (The extant `wstrndup` called `strlen` on `s`, so I didn't feel too badly about scanning ahead for a null byte ... but we can do better.) I'd like to migrate away from supporting C-style strings, so I'm putting the length computation inline.
@@ -0,0 +104,4 @@
// TODO: complain.
return ptr::null_mut();
};
let s: Vec<_> = s.trim().bytes().chain(iter::once(b'\0')).collect();
Member

So this will allocate a Vec<u8> to hold the trimmed bytes, and then invoke alloc_string to do another allocation. I don't know if there's a good way around that; perhaps use strndup? Something like,

    let trimmmed = s.trim();
    let ptr = trimmed.as_ptr();
    let len = trimmed.len();
    unsafe { wstrndup(ptr, len) }
So this will allocate a `Vec<u8>` to hold the trimmed bytes, and then invoke `alloc_string` to do another allocation. I don't know if there's a good way around that; perhaps use `strndup`? Something like, ```rust let trimmmed = s.trim(); let ptr = trimmed.as_ptr(); let len = trimmed.len(); unsafe { wstrndup(ptr, len) } ```
Author
Owner

This is an improvement. Thanks!

This is an improvement. Thanks!
trurl added 2 commits 2025-11-12 16:49:46 -05:00
cross approved these changes 2025-12-02 11:17:02 -05:00
trurl force-pushed refactor/wutil-rs-string from b574880350 to c298b5f96f 2025-12-08 12:40:35 -05:00 Compare
trurl merged commit c298b5f96f into refactor/wutil-rs 2025-12-08 12:40:57 -05:00
trurl deleted branch refactor/wutil-rs-string 2025-12-08 12:40:58 -05:00
Sign in to join this conversation.
No Reviewers
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: vitrine/wmaker#3