Document everything

Leonard Hecker 2025-05-16 01:12:59 +02:00
parent 3ba67f7613
commit 293ea36c49
33 changed files with 1229 additions and 147 deletions

CONTRIBUTING.md

@ -0,0 +1,49 @@
# Contributing
## Translation improvements
You can find our translations in [`src/bin/edit/localization.rs`](./src/bin/edit/localization.rs).
Please feel free to open a pull request with your changes at any time.
If you'd like to discuss your changes first, please feel free to open an issue.
## Bug reports
If you find any bugs, we gladly accept pull requests without prior discussion.
Otherwise, you can of course always open an issue for us to look into.
## Feature requests
Please open a new issue for any feature requests you have in mind.
Keeping the binary size of the editor small is a priority for us and so we may need to discuss any new features first until we have support for plugins.
## Code changes
The project has a focus on a small binary size and sufficient (good) performance.
As such, we generally do not accept pull requests that introduce dependencies (there are always exceptions of course).
Otherwise, you can consider this project a playground for trying out any cool ideas you have.
The overall architecture of the project can be summarized as follows:
* The underlying text buffer in `src/buffer` doesn't keep track of line breaks in the document.
This is a crucial design aspect that permeates throughout the entire codebase.
To oversimplify, the *only* state that is kept is the current cursor position.
When the user asks to move to another line, the editor will `O(n)` seek through the underlying document until it finds the corresponding number of line breaks.
* As a result, `src/simd` contains crucial `memchr2` functions to quickly find the next or previous line break (peaking at over 100GB/s).
* Furthermore, `src/unicode` implements a `Utf8Chars` iterator which transparently inserts U+FFFD replacements during iteration (runs at up to 4GB/s).
* Furthermore, `src/unicode` also implements grapheme cluster segmentation and cluster width measurement via its `MeasurementConfig` (runs at up to 600MB/s).
* If word wrap is disabled, `memchr2` is used for all navigation across lines, allowing us to breeze through 1GB files as if they were 1MB.
* Even if word-wrap is enabled, it's still sufficiently smooth thanks to `MeasurementConfig`. This is only possible because these base functions are heavily optimized.
* `src/framebuffer.rs` implements a "framebuffer" like in video games.
It allows us to draw the UI output into an intermediate buffer first, accumulating all changes and handling things like color blending.
Then, it can compare the accumulated output with the previous frame and only send the necessary changes to the terminal.
* `src/tui.rs` implements an immediate mode UI. Its module documentation gives an overview of how it works, and I recommend reading it.
* `src/vt.rs` implements our VT parser.
* `src/sys` contains our platform abstractions.
* Finally, `src/bin/edit` ties everything together.
It's roughly 90% UI code and business logic.
It contains a little bit of VT logic in `setup_terminal`.
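The `O(n)` line seek described above can be sketched as follows. This is a simplified, hypothetical helper: `line_start` is not a function from the codebase, and a plain scalar byte scan stands in for the SIMD `memchr2` the editor actually uses. Since a CRLF line break also ends in `\n`, counting `\n` bytes is enough to locate line starts.

```rust
/// Returns the byte offset at which the 0-based logical line `line` starts.
/// If the text has fewer lines than that, returns `text.len()`.
///
/// Hypothetical helper: a scalar byte scan stands in for the SIMD `memchr2`.
fn line_start(text: &[u8], line: usize) -> usize {
    let mut offset = 0;
    for _ in 0..line {
        // Seek to the next line break; both LF and CRLF end in b'\n'.
        match text[offset..].iter().position(|&b| b == b'\n') {
            Some(i) => offset += i + 1,
            None => return text.len(),
        }
    }
    offset
}
```

Nothing is cached, so the cost is proportional to the number of bytes scanned. For simplicity this sketch always starts at offset 0, which is exactly why the speed of the underlying search matters so much.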
If you have an issue with your terminal, the places of interest are the aforementioned:
* VT parser in `src/vt.rs`
* Platform specific code in `src/sys`
* And the `setup_terminal` function in `src/bin/edit/main.rs`


@ -1,3 +1,20 @@
# Microsoft Edit
A simple editor for simple needs.
This editor pays homage to the classic [MS-DOS Editor](https://en.wikipedia.org/wiki/MS-DOS_Editor), but with a modern interface and modern input controls similar to VS Code. The goal is to provide an accessible editor that even those largely unfamiliar with terminals can use.
## Installation
* Download the latest release from our [releases page](https://github.com/microsoft/edit/releases/latest)
* Extract the archive
* Copy the `edit` binary to a directory in your `PATH`
* You may delete any other files in the archive if you don't need them
## Build Instructions
* [Install Rust](https://www.rust-lang.org/tools/install)
* Install the nightly toolchain: `rustup install nightly`
* Alternatively, set the environment variable `RUSTC_BOOTSTRAP=1`
* Clone the repository
* For a release build run: `cargo build --config .cargo/release.toml --release`


@ -1,12 +1,16 @@
//! Provides a transparent error type for edit.
use std::{io, result};
use crate::sys;
// Remember to add an entry to `Error::message()` for each new error.
pub const APP_ICU_MISSING: Error = Error::new_app(0);
/// Edit's transparent `Result` type.
pub type Result<T> = result::Result<T, Error>;
/// Edit's transparent `Error` type.
/// Abstracts over system and application errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
App(u32),


@ -7,9 +7,34 @@ use std::ptr::NonNull;
use super::release;
use crate::apperr;
/// A debug wrapper for [`release::Arena`].
///
/// The problem with [`super::ScratchArena`] is that it only "borrows" an underlying
/// [`release::Arena`]. Once the [`super::ScratchArena`] is dropped it resets the watermark
/// of the underlying [`release::Arena`], freeing all allocations done since borrowing it.
///
/// It is completely valid for the same [`release::Arena`] to be borrowed multiple times at once,
/// *as long as* you only use the most recent borrow. Bad example:
/// ```should_panic
/// use edit::arena::scratch_arena;
///
/// let mut scratch1 = scratch_arena(None);
/// let mut scratch2 = scratch_arena(None);
///
/// let foo = scratch1.alloc_uninit::<usize>();
///
/// // This will also reset `scratch1`'s allocation.
/// drop(scratch2);
///
/// *foo; // BOOM! ...if it wasn't for our debug wrapper.
/// ```
///
/// To avoid this, this wraps the real [`release::Arena`] in a "debug" one, which pretends as if every
/// instance of itself is a distinct [`release::Arena`] instance. Then we use this "debug" [`release::Arena`]
/// for [`super::ScratchArena`] which allows us to track which borrow is the most recent one.
pub enum Arena {
// Delegate is 'static, because release::Arena requires no lifetime
// annotations, and so this mere debug helper cannot use them either.
Delegated { delegate: &'static release::Arena, borrow: usize },
Owned { arena: release::Arena },
}
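A minimal model of that borrow tracking, assuming a simple counter: each borrow records a sequence number, and only the newest live borrow may allocate. `SharedArena` and `TrackedBorrow` are hypothetical names used for illustration; no memory is actually managed, only the bump offset and the borrow count are modeled.

```rust
use std::cell::Cell;

/// Hypothetical model of an arena shared by several borrows.
/// Only the bump offset and a borrow counter are tracked.
struct SharedArena {
    offset: Cell<usize>,
    borrows: Cell<usize>,
}

/// A borrow that remembers its sequence number and the watermark to restore.
struct TrackedBorrow<'a> {
    arena: &'a SharedArena,
    id: usize,
    watermark: usize,
}

impl SharedArena {
    fn new() -> Self {
        Self { offset: Cell::new(0), borrows: Cell::new(0) }
    }

    /// Hands out a new borrow, which becomes the most recent one.
    fn borrow(&self) -> TrackedBorrow<'_> {
        let id = self.borrows.get() + 1;
        self.borrows.set(id);
        TrackedBorrow { arena: self, id, watermark: self.offset.get() }
    }
}

impl TrackedBorrow<'_> {
    /// Bumps the offset by `size` and returns the old offset.
    /// Panics if a newer borrow is still alive, because dropping that
    /// borrow would reset the offset and free this allocation.
    fn alloc(&self, size: usize) -> usize {
        assert_eq!(self.id, self.arena.borrows.get(), "not the most recent borrow");
        let start = self.arena.offset.get();
        self.arena.offset.set(start + size);
        start
    }
}

impl Drop for TrackedBorrow<'_> {
    fn drop(&mut self) {
        // Pop everything allocated through this borrow and retire it.
        self.arena.offset.set(self.watermark);
        self.arena.borrows.set(self.id - 1);
    }
}
```

The misuse from the `should_panic` example above (allocating from `scratch1` while `scratch2` is alive) trips the assertion in `alloc` instead of silently handing out memory that is about to be reset.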


@ -1,12 +1,14 @@
//! Arena allocators. Small and fast.
#[cfg(debug_assertions)]
mod debug;
mod release;
mod scratch;
mod string;
#[cfg(all(not(doc), debug_assertions))]
pub use self::debug::Arena;
#[cfg(any(doc, not(debug_assertions)))]
pub use self::release::Arena;
pub use self::scratch::{ScratchArena, init, scratch_arena};
pub use self::string::ArenaString;


@ -12,12 +12,36 @@ use crate::{apperr, sys};
const ALLOC_CHUNK_SIZE: usize = 64 * KIBI;
/// An arena allocator.
///
/// If you have never used an arena allocator before, think of it as
/// allocating objects on the stack, but the stack is *really* big.
/// Each time you allocate, memory gets pushed at the end of the stack,
/// each time you deallocate, memory gets popped from the end of the stack.
///
/// One reason you'd want to use this is obviously performance: It's very simple
/// and so it's also very fast, >10x faster than your system allocator.
///
/// However, modern allocators such as `mimalloc` are just as fast, so why not use them?
/// Because their performance comes at the cost of binary size and we can't have that.
///
/// The biggest benefit though is that it sometimes massively simplifies lifetime
/// and memory management. This can best be seen by this project's UI code, which
/// uses an arena to allocate a tree of UI nodes. This is infamously difficult
/// to do in Rust, but not so when you've got an arena allocator:
/// All nodes have the same lifetime, so you can just use references.
///
/// # Safety
///
/// **Do not** push objects into the arena that require destructors.
/// Destructors are not executed. Use a pool allocator for that.
pub struct Arena {
base: NonNull<u8>,
capacity: usize,
commit: Cell<usize>,
offset: Cell<usize>,
/// See [`super::debug`], which uses this for borrow tracking.
#[cfg(debug_assertions)]
pub(super) borrows: Cell<usize>,
}
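The "stack that is really big" analogy can be made concrete with a minimal bump arena. This is a hypothetical, safe sketch rather than the crate's `Arena`: it hands out offsets into a fixed `Vec<u8>` instead of pointers into reserved virtual memory, but the push-at-the-end and reset-to-a-watermark behavior is the same idea.

```rust
/// A minimal bump arena over a fixed-capacity buffer (hypothetical sketch).
/// Allocations return offsets rather than pointers to keep it free of `unsafe`.
struct BumpArena {
    buf: Vec<u8>,
    offset: usize,
}

impl BumpArena {
    fn with_capacity(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], offset: 0 }
    }

    /// Pushes `size` bytes aligned to `align` (a power of two) onto the end.
    /// Returns the start offset, or `None` if the arena is out of space.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        assert!(align.is_power_of_two());
        // Round the current offset up to the next multiple of `align`.
        let start = self.offset.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        self.offset = end;
        Some(start)
    }

    /// Mutable access to a previously allocated range.
    fn bytes_mut(&mut self, start: usize, len: usize) -> &mut [u8] {
        &mut self.buf[start..start + len]
    }

    /// The current watermark, to be passed to `reset` later.
    fn watermark(&self) -> usize {
        self.offset
    }

    /// Pops everything allocated after `to`. No destructors run.
    fn reset(&mut self, to: usize) {
        assert!(to <= self.offset);
        self.offset = to;
    }
}
```

Allocation is a round-up and an add, and deallocation is a single store, which is where the speed comes from. The flip side is the rule from the Safety section: nothing popped by `reset` is ever destructed.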
@ -61,6 +85,7 @@ impl Arena {
/// Obviously, this is GIGA UNSAFE. It runs no destructors and does not check
/// whether the offset is valid. You better take care when using this function.
pub unsafe fn reset(&self, to: usize) {
// Fill the deallocated memory with 0xDD to aid debugging.
if cfg!(debug_assertions) && self.offset.get() > to {
let commit = self.commit.get();
let len = (self.offset.get() + 128).min(commit) - to;


@ -9,6 +9,7 @@ use crate::helpers::*;
static mut S_SCRATCH: [release::Arena; 2] =
const { [release::Arena::empty(), release::Arena::empty()] };
/// Call this before using [`scratch_arena`].
pub fn init() -> apperr::Result<()> {
unsafe {
for s in &mut S_SCRATCH[..] {
@ -18,8 +19,27 @@ pub fn init() -> apperr::Result<()> {
Ok(())
}
/// Need an arena for temporary allocations? [`scratch_arena`] got you covered.
/// Call [`scratch_arena`] and it'll return an [`Arena`] that resets when it goes out of scope.
///
/// ---
///
/// Most methods make just two kinds of allocations:
/// * Interior: Temporary data that can be deallocated when the function returns.
/// * Exterior: Data that is returned to the caller and must remain alive until the caller stops using it.
///
/// Such methods only have two lifetimes, for which you consequently also only need two arenas.
/// ...even if your method calls other methods recursively! This is because the exterior allocations
/// of a callee are simply interior allocations to the caller, and so on, recursively.
///
/// This works as long as the two arenas flip/flop between being used as interior/exterior allocator
/// along the callstack. To ensure that is the case, we use a recursion counter in debug builds.
///
/// This approach was described among others at: <https://nullprogram.com/blog/2023/09/27/>
///
/// # Safety
///
/// If your function takes an [`Arena`] argument, you **MUST** pass it to `scratch_arena` as `Some(&arena)`.
pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> {
unsafe {
#[cfg(debug_assertions)]
@ -31,18 +51,9 @@ pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> {
}
}
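A hedged model of the flip/flop scheme: two arenas in a pool, a guard that restores its arena's watermark on drop, and a selector that avoids whichever arena is passed as `conflict`. Everything here (`Arena`, `ScratchArena`, the explicit `pool` parameter) is a simplified stand-in; the real `scratch_arena` uses two `static` arenas and takes only the `conflict` argument.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr;

/// Stand-in arena: only the bump offset is modeled.
struct Arena {
    offset: Cell<usize>,
}

impl Arena {
    const fn new() -> Self {
        Self { offset: Cell::new(0) }
    }

    /// Bumps the offset by `size` and returns the old offset.
    fn alloc(&self, size: usize) -> usize {
        let start = self.offset.get();
        self.offset.set(start + size);
        start
    }
}

/// Borrows an arena and resets it to the saved watermark when dropped.
struct ScratchArena<'a> {
    arena: &'a Arena,
    watermark: usize,
}

impl Deref for ScratchArena<'_> {
    type Target = Arena;
    fn deref(&self) -> &Arena {
        self.arena
    }
}

impl Drop for ScratchArena<'_> {
    fn drop(&mut self) {
        self.arena.offset.set(self.watermark);
    }
}

/// Returns a scratch borrow of whichever pool arena is not `conflict`.
fn scratch_arena<'a>(pool: &'a [Arena; 2], conflict: Option<&Arena>) -> ScratchArena<'a> {
    let arena = match conflict {
        Some(c) if ptr::eq(c, &pool[0]) => &pool[1],
        _ => &pool[0],
    };
    ScratchArena { arena, watermark: arena.offset.get() }
}
```

A callee that receives the caller's scratch arena as its output ("exterior") arena passes it as `conflict`, so its own interior allocations land in the other arena and are popped on return without touching the caller's data.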
/// Borrows an [`Arena`] for temporary allocations.
///
/// See [`scratch_arena`].
#[cfg(debug_assertions)]
pub struct ScratchArena<'a> {
arena: debug::Arena,


@ -4,49 +4,63 @@ use std::ops::{Bound, Deref, DerefMut, RangeBounds};
use super::Arena;
use crate::helpers::*;
/// A custom string type, because `std` lacks allocator support for [`String`].
///
/// To keep things simple, this one is hardcoded to [`Arena`].
#[derive(Clone)]
pub struct ArenaString<'a> {
vec: Vec<u8, &'a Arena>,
}
impl<'a> ArenaString<'a> {
/// Creates a new [`ArenaString`] in the given arena.
#[must_use]
pub const fn new_in(arena: &'a Arena) -> Self {
Self { vec: Vec::new_in(arena) }
}
#[inline]
/// Turns a [`str`] into an [`ArenaString`].
#[must_use]
pub fn from_str(arena: &'a Arena, s: &str) -> Self {
let mut res = Self::new_in(arena);
res.push_str(s);
res
}
/// It says right here that you checked if `bytes` is valid UTF-8
/// and you are sure it is. Presto! Here's an `ArenaString`!
///
/// # Safety
///
/// You fool! It says "unchecked" right there. Now the house is burning.
#[inline]
#[must_use]
pub unsafe fn from_utf8_unchecked(bytes: Vec<u8, &'a Arena>) -> Self {
Self { vec: bytes }
}
/// Checks whether `text` contains only valid UTF-8.
/// If the entire string is valid, it returns `Ok(text)`.
/// Otherwise, it returns `Err(ArenaString)` with all invalid sequences replaced with U+FFFD.
pub fn from_utf8_lossy<'s>(
arena: &'a Arena,
text: &'s [u8],
) -> Result<&'s str, ArenaString<'a>> {
let mut iter = text.utf8_chunks();
let Some(mut chunk) = iter.next() else {
return Ok("");
};
let valid = chunk.valid();
if chunk.invalid().is_empty() {
debug_assert_eq!(valid.len(), text.len());
return Ok(unsafe { str::from_utf8_unchecked(text) });
}
const REPLACEMENT: &str = "\u{FFFD}";
let mut res = Self::new_in(arena);
res.reserve(text.len());
loop {
res.push_str(chunk.valid());
@ -62,6 +76,7 @@ impl<'a> ArenaString<'a> {
Err(res)
}
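The same algorithm expressed with `std`'s `String` instead of `ArenaString` (a simplification for illustration; `<[u8]>::utf8_chunks()` requires Rust 1.79 or later): the input is split into alternating valid/invalid runs, and the fast path borrows the input as `&str` when the first chunk has no invalid tail.

```rust
/// Returns `Ok(&str)` if `text` is entirely valid UTF-8, otherwise
/// `Err(String)` with each invalid sequence replaced by U+FFFD.
fn from_utf8_lossy(text: &[u8]) -> Result<&str, String> {
    let mut iter = text.utf8_chunks();
    let Some(mut chunk) = iter.next() else {
        return Ok("");
    };
    // A chunk without an invalid tail is the last one, so a fully valid
    // first chunk means the whole input is valid: borrow it as-is.
    if chunk.invalid().is_empty() {
        return Ok(chunk.valid());
    }
    let mut res = String::with_capacity(text.len());
    loop {
        res.push_str(chunk.valid());
        if !chunk.invalid().is_empty() {
            res.push('\u{FFFD}');
        }
        match iter.next() {
            Some(next) => chunk = next,
            None => break,
        }
    }
    Err(res)
}
```

Returning a `Result` rather than a `Cow` lets the caller see at a glance whether any replacement happened, while the common all-valid case performs no allocation at all.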
/// Turns a [`Vec<u8>`] into an [`ArenaString`], replacing invalid UTF-8 sequences with U+FFFD.
#[must_use]
pub fn from_utf8_lossy_owned(v: Vec<u8, &'a Arena>) -> Self {
match Self::from_utf8_lossy(v.allocator(), &v) {
@ -70,26 +85,32 @@ impl<'a> ArenaString<'a> {
}
}
/// It's empty.
pub fn is_empty(&self) -> bool {
self.vec.is_empty()
}
/// It's lengthy.
pub fn len(&self) -> usize {
self.vec.len()
}
/// Its capacity.
pub fn capacity(&self) -> usize {
self.vec.capacity()
}
/// It's a [`String`], now it's a [`str`]. Wow!
pub fn as_str(&self) -> &str {
unsafe { str::from_utf8_unchecked(self.vec.as_slice()) }
}
/// It's a [`String`], now it's a [`str`]. And it's mutable! WOW!
pub fn as_mut_str(&mut self) -> &mut str {
unsafe { str::from_utf8_unchecked_mut(self.vec.as_mut_slice()) }
}
/// Now it's bytes!
pub fn as_bytes(&self) -> &[u8] {
self.vec.as_slice()
}
@ -103,22 +124,32 @@ impl<'a> ArenaString<'a> {
&mut self.vec
}
/// Reserves *additional* memory. For you old folks out there (totally not me),
/// this is different from C++'s `reserve` which reserves a total size.
pub fn reserve(&mut self, additional: usize) {
self.vec.reserve(additional)
}
/// Now it's small! Alarming!
///
/// *Do not* call this unless this string is the last thing on the arena.
/// Arenas are stacks, they can't deallocate what's in the middle.
pub fn shrink_to_fit(&mut self) {
self.vec.shrink_to_fit()
}
/// To no surprise, this clears the string.
pub fn clear(&mut self) {
self.vec.clear()
}
/// Append some text.
pub fn push_str(&mut self, string: &str) {
self.vec.extend_from_slice(string.as_bytes())
}
/// Append a single character.
#[inline]
pub fn push(&mut self, ch: char) {
match ch.len_utf8() {
1 => self.vec.push(ch as u8),
@ -156,6 +187,7 @@ impl<'a> ArenaString<'a> {
}
}
/// Replaces a range of characters with a new string.
pub fn replace_range<R: RangeBounds<usize>>(&mut self, range: R, replace_with: &str) {
match range.start_bound() {
Bound::Included(&n) => assert!(self.is_char_boundary(n)),


@ -1,19 +1,31 @@
//! Base64 facilities.
use crate::arena::ArenaString;
const CHARSET: [u8; 64] = *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/// Encodes the given bytes as base64 and appends them to the destination string.
pub fn encode(dst: &mut ArenaString, src: &[u8]) {
unsafe {
let mut inp = src.as_ptr();
let mut remaining = src.len();
let dst = dst.as_mut_vec();
// One aspect of base64 is that the encoded length can be calculated accurately in advance.
let out_len = src.len().div_ceil(3) * 4;
// ... we can then use this fact to reserve space all at once.
dst.reserve(out_len);
// SAFETY: Getting a pointer to the reserved space is only safe
// *after* calling `reserve()` as it may change the pointer.
let mut out = dst.as_mut_ptr().add(dst.len());
if remaining != 0 {
// Translate chunks of 3 source bytes into 4 base64-encoded bytes.
while remaining > 3 {
// SAFETY: Thanks to `remaining > 3`, reading 4 bytes at once is safe.
// This improves performance massively over a byte-by-byte approach,
// because it allows us to byte-swap the read and use simple bit-shifts below.
let val = u32::from_be((inp as *const u32).read_unaligned());
inp = inp.add(3);
remaining -= 3;
@ -32,6 +44,8 @@ pub fn encode(dst: &mut ArenaString, src: &[u8]) {
let mut in1 = 0;
let mut in2 = 0;
// We can simplify the following logic by assuming that there's only 1
// byte left. If there's >1 byte left, these two '=' will be overwritten.
*out.add(3) = b'=';
*out.add(2) = b'=';
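For comparison, a safe scalar version of the same encoding, written for clarity rather than speed and appending to a `std` `String` instead of `ArenaString`: each group of 3 input bytes becomes four 6-bit indices into `CHARSET`, and a short final group is padded with `=` (standard RFC 4648 base64).

```rust
const CHARSET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Appends the padded base64 encoding of `src` to `dst`.
fn encode(dst: &mut String, src: &[u8]) {
    // The output length is known up front: 4 characters per started 3-byte group.
    dst.reserve(src.len().div_ceil(3) * 4);
    for group in src.chunks(3) {
        // Pack up to 3 bytes into the low 24 bits; missing bytes count as zero.
        let b0 = group[0] as u32;
        let b1 = group.get(1).map_or(0, |&b| b as u32);
        let b2 = group.get(2).map_or(0, |&b| b as u32);
        let val = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit indices; positions with no input bytes become '='.
        let sextet = |shift: u32| CHARSET[((val >> shift) & 63) as usize] as char;
        dst.push(sextet(18));
        dst.push(sextet(12));
        dst.push(if group.len() > 1 { sextet(6) } else { '=' });
        dst.push(if group.len() > 2 { sextet(0) } else { '=' });
    }
}
```

The unsafe original gains its speed by reading 4 bytes at a time and writing into pre-reserved space without bounds checks; the output is identical.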


@ -27,7 +27,7 @@ use edit::input::{self, kbmod, vk};
use edit::oklab::oklab_blend;
use edit::tui::*;
use edit::vt::{self, Token};
use edit::{apperr, base64, icu, path, sys};
use localization::*;
use state::*;
@ -51,6 +51,10 @@ fn main() -> process::ExitCode {
}
fn run() -> apperr::Result<()> {
let items = vec!["hello.txt", "hallo.txt", "world.txt", "Hello, world.txt"];
let mut sorted = items.clone();
sorted.sort_by(|a, b| icu::compare_strings(a.as_bytes(), b.as_bytes()));
// Init `sys` first, as everything else may depend on its functionality (IO, function pointers, etc.).
let _sys_deinit = sys::init()?;
// Next init `arena`, so that `scratch_arena` works. `loc` depends on it.


@ -1,13 +1,17 @@
//! A text buffer for a text editor.
//!
//! Implements a Unicode-aware, layout-aware text buffer for terminals.
//! It's based on a gap buffer. It has no line cache and instead relies
//! on the performance of the ucd module for fast text navigation.
//!
//! ---
//!
//! If the project ever outgrows a basic gap buffer (e.g. to add time travel),
//! an ideal, alternative architecture would be a piece table with immutable trees.
//! The tree nodes can be allocated on the same arena allocator as the added chunks,
//! making lifetime management fairly easy. The algorithm is described here:
//! * <https://cdacamar.github.io/data%20structures/algorithms/benchmarking/text%20editors/c++/editor-data-structures/>
//! * <https://github.com/cdacamar/fredbuf>
//!
//! The downside is that text navigation & search takes a performance hit due to small chunks.
//! The solution to the former is to keep line caches, which further complicates the architecture.
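For readers unfamiliar with the data structure, here is a minimal gap buffer sketch. It is illustrative only: the `GapBuffer` below is a hypothetical, simplified type, not the crate's `GapBuffer` (which, among other things, tracks a `generation` counter). Text lives on both sides of an unused "gap"; edits happen at the gap, and moving the gap to a new edit position copies only the bytes between the old and new positions.

```rust
/// Hypothetical, simplified gap buffer over bytes.
struct GapBuffer {
    buf: Vec<u8>,
    gap_start: usize,
    gap_end: usize,
}

impl GapBuffer {
    fn new() -> Self {
        Self { buf: Vec::new(), gap_start: 0, gap_end: 0 }
    }

    /// Logical length: everything except the gap.
    fn len(&self) -> usize {
        self.buf.len() - (self.gap_end - self.gap_start)
    }

    /// Moves the gap so that it starts at logical `offset`.
    fn move_gap(&mut self, offset: usize) {
        assert!(offset <= self.len());
        if offset < self.gap_start {
            // Shift the text in [offset, gap_start) to just before gap_end.
            let n = self.gap_start - offset;
            self.buf.copy_within(offset..self.gap_start, self.gap_end - n);
            self.gap_start = offset;
            self.gap_end -= n;
        } else if offset > self.gap_start {
            // Shift n bytes from after the gap to its start.
            let n = offset - self.gap_start;
            self.buf.copy_within(self.gap_end..self.gap_end + n, self.gap_start);
            self.gap_start += n;
            self.gap_end += n;
        }
    }

    /// Ensures the gap can hold at least `min` bytes.
    fn grow_gap(&mut self, min: usize) {
        let gap = self.gap_end - self.gap_start;
        if gap >= min {
            return;
        }
        let extra = (min - gap).max(self.buf.len()).max(64);
        let tail = self.buf.len() - self.gap_end;
        self.buf.resize(self.buf.len() + extra, 0);
        // Move the text after the gap to the new end of the buffer.
        self.buf.copy_within(self.gap_end..self.gap_end + tail, self.gap_end + extra);
        self.gap_end += extra;
    }

    fn insert(&mut self, offset: usize, text: &[u8]) {
        self.move_gap(offset);
        self.grow_gap(text.len());
        self.buf[self.gap_start..self.gap_start + text.len()].copy_from_slice(text);
        self.gap_start += text.len();
    }

    fn delete(&mut self, offset: usize, len: usize) {
        self.move_gap(offset);
        assert!(len <= self.buf.len() - self.gap_end);
        // Deleting just widens the gap.
        self.gap_end += len;
    }

    fn contents(&self) -> Vec<u8> {
        let mut v = Vec::with_capacity(self.len());
        v.extend_from_slice(&self.buf[..self.gap_start]);
        v.extend_from_slice(&self.buf[self.gap_end..]);
        v
    }
}
```

Because typing is mostly local, the gap rarely moves far, and the text on each side of the gap is contiguous, which suits a fast `memchr2`-style scan.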
@ -36,8 +40,8 @@ use crate::framebuffer::{Framebuffer, IndexedColor};
use crate::helpers::*;
use crate::oklab::oklab_blend;
use crate::simd::memchr2;
use crate::unicode::{self, Cursor, MeasurementConfig};
use crate::{apperr, icu};
/// The margin template is used for line numbers.
/// The max. line number we should ever expect is probably 64-bit,
@ -47,16 +51,25 @@ const MARGIN_TEMPLATE: &str = " │ ";
/// Happens to reuse MARGIN_TEMPLATE, because it has sufficient whitespace.
const TAB_WHITESPACE: &str = MARGIN_TEMPLATE;
/// Stores statistics about the whole document.
#[derive(Copy, Clone)]
pub struct TextBufferStatistics {
logical_lines: CoordType,
visual_lines: CoordType,
}
/// Stores the active text selection.
#[derive(Copy, Clone)]
enum TextBufferSelection {
/// No active selection.
None,
/// The user is currently selecting text.
///
/// Moving the cursor will update the selection.
Active { beg: Point, end: Point },
/// The user stopped selecting text.
///
/// Moving the cursor will destroy the selection.
Done { beg: Point, end: Point },
}
@ -66,6 +79,9 @@ impl TextBufferSelection {
}
}
/// In order to group actions into a single undo step,
/// we need to know the type of action that was performed.
/// This stores the action type.
#[derive(Copy, Clone, Eq, PartialEq)]
enum HistoryType {
Other,
@ -73,11 +89,15 @@ enum HistoryType {
Delete,
}
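To illustrate how an action type can drive undo grouping, here is a hypothetical sketch. The variant `Write` is an assumption (only `Other` and `Delete` are visible above), and the merging rule is assumed too: consecutive actions of the same kind merge into one undo step, while `Other` always stands alone. A real implementation would likely also check that the edits are adjacent; that is omitted here.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
enum HistoryType {
    Other,
    Write,
    Delete,
}

/// Splits a sequence of edits into undo steps and returns each step's length.
/// Assumed rule: runs of `Write` or of `Delete` merge; every `Other` is its own step.
fn undo_step_sizes(actions: &[HistoryType]) -> Vec<usize> {
    let mut sizes: Vec<usize> = Vec::new();
    let mut prev: Option<HistoryType> = None;
    for &action in actions {
        let merge = prev == Some(action) && action != HistoryType::Other;
        if merge {
            // `merge` implies a previous action, hence an existing step.
            *sizes.last_mut().expect("a previous step exists") += 1;
        } else {
            sizes.push(1);
        }
        prev = Some(action);
    }
    sizes
}
```

With this rule, typing a word and then deleting part of it produces two undo steps rather than one per keystroke.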
/// An undo/redo entry.
struct HistoryEntry {
/// [`TextBuffer::cursor`] position before the change was made.
cursor_before: Point,
/// [`TextBuffer::selection`] before the change was made.
selection_before: TextBufferSelection,
/// [`TextBuffer::stats`] before the change was made.
stats_before: TextBufferStatistics,
/// [`GapBuffer::generation`] before the change was made.
generation_before: u32,
/// Logical cursor position where the change took place.
/// The position is at the start of the changed range.
@ -88,21 +108,38 @@ struct HistoryEntry {
added: Vec<u8>,
}
/// Caches an ICU search operation.
struct ActiveSearch {
/// The search pattern.
pattern: String,
/// The search options.
options: SearchOptions,
/// The ICU `UText` object.
text: icu::Text,
/// The ICU `URegularExpression` object.
regex: icu::Regex,
/// [`GapBuffer::generation`] when the search was created.
/// This is used to detect if we need to refresh the
/// [`ActiveSearch::regex`] object.
buffer_generation: u32,
/// [`TextBuffer::selection_generation`] when the search was
/// created. When the user manually selects text, we need to
/// refresh the [`ActiveSearch::pattern`] with it.
selection_generation: u32,
/// Stores the text buffer offset in between searches.
next_search_offset: usize,
/// If we know there were no hits, we can skip searching.
no_matches: bool,
}
/// Options for a search operation.
#[derive(Default, Clone, Copy, Eq, PartialEq)]
pub struct SearchOptions {
/// If true, the search is case-sensitive.
pub match_case: bool,
/// If true, the search matches whole words.
pub whole_word: bool,
/// If true, the search uses regex.
pub use_regex: bool,
}
@ -111,22 +148,36 @@ pub struct SearchOptions {
struct ActiveEditLineInfo {
/// Points to the start of the currently being edited line.
safe_start: Cursor,
/// Number of visual rows of the line that starts
/// at [`ActiveEditLineInfo::safe_start`].
line_height_in_rows: CoordType,
/// Byte distance from the start of the line at
/// [`ActiveEditLineInfo::safe_start`] to the next line.
distance_next_line_start: usize,
}
/// Char- or word-wise navigation? Your choice.
pub enum CursorMovement {
Grapheme,
Word,
}
/// The result of a call to [`TextBuffer::render()`].
pub struct RenderResult {
/// The maximum visual X position we encountered during rendering.
pub visual_pos_x_max: CoordType,
}
/// A [`TextBuffer`] with inner mutability.
pub type TextBufferCell = SemiRefCell<TextBuffer>;
/// A [`TextBuffer`] inside an [`Rc`].
///
/// We need this because the TUI system needs to borrow
/// the given text buffer(s) until after the layout process.
pub type RcTextBuffer = Rc<TextBufferCell>;
/// A text buffer for a text editor.
pub struct TextBuffer {
buffer: GapBuffer,
@ -167,11 +218,15 @@ pub struct TextBuffer {
}
impl TextBuffer {
/// Creates a new text buffer inside an [`Rc`].
/// See [`TextBuffer::new()`].
pub fn new_rc(small: bool) -> apperr::Result<RcTextBuffer> {
let buffer = TextBuffer::new(small)?;
Ok(Rc::new(SemiRefCell::new(buffer)))
}
/// Creates a new text buffer. With `small` you can control
/// if the buffer is optimized for <1MiB contents.
pub fn new(small: bool) -> apperr::Result<Self> {
Ok(Self {
buffer: GapBuffer::new(small)?,
@ -209,26 +264,36 @@ impl TextBuffer {
})
}
/// Length of the document in bytes.
pub fn text_length(&self) -> usize {
self.buffer.len()
}
/// Number of logical lines in the document,
/// that is, lines separated by newlines.
pub fn logical_line_count(&self) -> CoordType {
self.stats.logical_lines
}
/// Number of visual lines in the document,
/// that is, the number of lines after layout.
pub fn visual_line_count(&self) -> CoordType {
self.stats.visual_lines
}
/// Does the buffer need to be saved?
pub fn is_dirty(&self) -> bool {
self.last_save_generation != self.buffer.generation()
}
/// The buffer generation changes on every edit.
/// With this you can check if it has changed since
/// the last time you called this function.
pub fn generation(&self) -> u32 {
self.buffer.generation()
}
/// Force the buffer to be dirty.
pub fn mark_as_dirty(&mut self) {
self.last_save_generation = self.buffer.generation().wrapping_sub(1);
}
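The dirty-tracking methods above compare two generation numbers. A stripped-down model (hypothetical type `DirtyTracker`; the name `mark_as_clean` is also an assumption, mirroring the method that sets `last_save_generation` to the current generation) shows why `mark_as_dirty` uses `wrapping_sub(1)`: any value different from the current generation counts as dirty, and wrapping arithmetic avoids an overflow panic at 0.

```rust
/// Hypothetical model of the buffer's dirty-tracking fields.
struct DirtyTracker {
    generation: u32,
    last_save_generation: u32,
}

impl DirtyTracker {
    fn new() -> Self {
        Self { generation: 0, last_save_generation: 0 }
    }

    /// Every edit bumps the generation (wrapping instead of panicking).
    fn edit(&mut self) {
        self.generation = self.generation.wrapping_add(1);
    }

    fn is_dirty(&self) -> bool {
        self.last_save_generation != self.generation
    }

    /// Hypothetical name: records the current generation as saved.
    fn mark_as_clean(&mut self) {
        self.last_save_generation = self.generation;
    }

    /// Forces `is_dirty()` to be true without touching the text.
    fn mark_as_dirty(&mut self) {
        self.last_save_generation = self.generation.wrapping_sub(1);
    }
}
```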
@ -237,10 +302,12 @@ impl TextBuffer {
self.last_save_generation = self.buffer.generation();
}
/// The encoding used during reading/writing. "UTF-8" is the default.
pub fn encoding(&self) -> &'static str {
self.encoding
}
/// Set the encoding used during reading/writing.
pub fn set_encoding(&mut self, encoding: &'static str) {
if self.encoding != encoding {
self.encoding = encoding;
@ -248,10 +315,14 @@ impl TextBuffer {
}
}
/// Whether the document uses CRLF (`true`) or LF (`false`) newlines.
pub fn is_crlf(&self) -> bool {
self.newlines_are_crlf
}
/// Changes the newline type used in the document.
///
/// NOTE: Cannot be undone.
pub fn normalize_newlines(&mut self, crlf: bool) {
let newline: &[u8] = if crlf { b"\r\n" } else { b"\n" };
let mut off = 0;
@ -318,26 +389,34 @@ impl TextBuffer {
self.newlines_are_crlf = crlf;
}
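A standalone sketch of the same conversion on a byte slice. It assumes both `\r\n` and a lone `\n` count as line breaks, while a lone `\r` is left untouched (an assumption for illustration); `normalize_newlines` here is a hypothetical free function, whereas the real method works on the text buffer itself.

```rust
/// Rewrites every `\r\n` or `\n` in `text` as CRLF (if `crlf`) or LF.
/// Assumption: a lone `\r` is not a line break and is copied as-is.
fn normalize_newlines(text: &[u8], crlf: bool) -> Vec<u8> {
    let newline: &[u8] = if crlf { b"\r\n" } else { b"\n" };
    let mut out = Vec::with_capacity(text.len());
    let mut i = 0;
    while i < text.len() {
        match text[i] {
            b'\r' if text.get(i + 1) == Some(&b'\n') => {
                out.extend_from_slice(newline);
                i += 2;
            }
            b'\n' => {
                out.extend_from_slice(newline);
                i += 1;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    out
}
```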
/// Whether to insert or overtype text when writing.
pub fn is_overtype(&self) -> bool {
self.overtype
}
/// Set the overtype mode.
pub fn set_overtype(&mut self, overtype: bool) {
self.overtype = overtype;
}
/// Gets the logical cursor position, that is,
/// the position in lines and graphemes per line.
pub fn cursor_logical_pos(&self) -> Point {
self.cursor.logical_pos
}
/// Gets the visual cursor position, that is,
/// the position in laid out rows and columns.
pub fn cursor_visual_pos(&self) -> Point {
self.cursor.visual_pos
}
/// Gets the width of the left margin.
pub fn margin_width(&self) -> CoordType {
self.margin_width
}
/// Enables or disables the left margin.
/// Returns `false` if it was already in the requested state.
pub fn set_margin_enabled(&mut self, enabled: bool) -> bool {
if self.margin_enabled == enabled {
false
@ -348,22 +427,38 @@ impl TextBuffer {
}
}
/// Gets the width of the text contents for layout.
pub fn text_width(&self) -> CoordType {
self.width - self.margin_width
}
/// Ask the TUI system to scroll the buffer and make the cursor visible.
///
/// TODO: This function shows that [`TextBuffer`] is poorly abstracted
/// away from the TUI system. The only reason this exists is so that
/// if someone outside the TUI code enables word-wrap, the TUI code
/// recognizes this and scrolls the cursor into view. But beyond that,
/// scrolling, views, etc., are all UI concerns, so this should not be here.
pub fn make_cursor_visible(&mut self) {
self.wants_cursor_visibility = true;
}
/// For the TUI code to retrieve a prior [`TextBuffer::make_cursor_visible()`] request.
pub fn take_cursor_visibility_request(&mut self) -> bool {
mem::take(&mut self.wants_cursor_visibility)
}
/// Is word-wrap enabled?
///
/// Technically, this is a misnomer, because it's line-wrapping.
pub fn is_word_wrap_enabled(&self) -> bool {
self.word_wrap_enabled
}
/// Enable or disable word-wrap.
///
/// NOTE: It's expected that the tui code calls `set_width()` sometime after this.
/// This will then trigger the actual recalculation of the cursor position.
pub fn set_word_wrap(&mut self, enabled: bool) {
if self.word_wrap_enabled != enabled {
self.word_wrap_enabled = enabled;
@ -372,6 +467,11 @@ impl TextBuffer {
}
}
/// Set the width available for layout.
///
/// Ideally this would be a pure UI concern, but the text buffer needs this
/// so that it can abstract away visual cursor movement such as "go a line up".
/// What would that even mean if it didn't know how wide a line is?
pub fn set_width(&mut self, width: CoordType) -> bool {
if width <= 0 || width == self.width {
false
@ -382,10 +482,12 @@ impl TextBuffer {
}
}
/// Gets the tab width. Could be anything, but is expected to be 1-8.
pub fn tab_size(&self) -> CoordType {
self.tab_size
}
/// Set the tab size. Clamped to 1-8.
pub fn set_tab_size(&mut self, width: CoordType) -> bool {
let width = width.clamp(1, 8);
if width == self.tab_size {
@ -397,18 +499,22 @@ impl TextBuffer {
}
}
/// Returns whether tabs are used for indentation.
pub fn indent_with_tabs(&self) -> bool {
self.indent_with_tabs
}
/// Sets whether tabs or spaces are used for indentation.
pub fn set_indent_with_tabs(&mut self, indent_with_tabs: bool) {
self.indent_with_tabs = indent_with_tabs;
}
/// Sets whether the line the cursor is on should be highlighted.
pub fn set_line_highlight_enabled(&mut self, enabled: bool) {
self.line_highlight_enabled = enabled;
}
/// Sets a ruler column, e.g. 80.
pub fn set_ruler(&mut self, column: CoordType) {
self.ruler = column;
}
@ -799,6 +905,7 @@ impl TextBuffer {
Ok(())
}
/// Returns whether any text is selected.
pub fn has_selection(&self) -> bool {
self.selection.is_some()
}
@ -809,6 +916,7 @@ impl TextBuffer {
self.selection_generation
}
/// Moves the cursor to `visual_pos` and updates the selection to contain it.
pub fn selection_update_visual(&mut self, visual_pos: Point) {
let cursor = self.cursor;
self.set_cursor_for_selection(self.cursor_move_to_visual_internal(cursor, visual_pos));
@ -826,6 +934,7 @@ impl TextBuffer {
}
}
/// Moves the cursor to `logical_pos` and updates the selection to contain it.
pub fn selection_update_logical(&mut self, logical_pos: Point) {
let cursor = self.cursor;
self.set_cursor_for_selection(self.cursor_move_to_logical_internal(cursor, logical_pos));
@ -843,6 +952,7 @@ impl TextBuffer {
}
}
/// Moves the cursor by `delta` and updates the selection to contain it.
pub fn selection_update_delta(&mut self, granularity: CursorMovement, delta: CoordType) {
let cursor = self.cursor;
self.set_cursor_for_selection(self.cursor_move_delta_internal(cursor, granularity, delta));
@ -860,6 +970,7 @@ impl TextBuffer {
}
}
/// Select the current word.
pub fn select_word(&mut self) {
let Range { start, end } = navigation::word_select(&self.buffer, self.cursor.offset);
let beg = self.cursor_move_to_offset_internal(self.cursor, start);
@ -871,6 +982,7 @@ impl TextBuffer {
});
}
/// Select the current line.
pub fn select_line(&mut self) {
let beg = self.cursor_move_to_logical_internal(
self.cursor,
@ -885,6 +997,7 @@ impl TextBuffer {
});
}
/// Select the entire document.
pub fn select_all(&mut self) {
let beg = Default::default();
let end = self.cursor_move_to_logical_internal(beg, Point::MAX);
@ -895,18 +1008,23 @@ impl TextBuffer {
});
}
/// Turn an active selection into a finalized selection.
///
/// Any future cursor movement will destroy the selection.
pub fn selection_finalize(&mut self) {
if let TextBufferSelection::Active { beg, end } = self.selection {
self.set_selection(TextBufferSelection::Done { beg, end });
}
}
/// Destroy the current selection.
pub fn clear_selection(&mut self) -> bool {
let had_selection = self.selection.is_some();
self.set_selection(TextBufferSelection::None);
had_selection
}
/// Find the next occurrence of the given `pattern` and select it.
pub fn find_and_select(&mut self, pattern: &str, options: SearchOptions) -> apperr::Result<()> {
if let Some(search) = &mut self.search {
let search = search.get_mut();
@ -959,6 +1077,7 @@ impl TextBuffer {
Ok(())
}
/// Find the next occurrence of the given `pattern` and replace it with `replacement`.
pub fn find_and_replace(
&mut self,
pattern: &str,
@ -978,6 +1097,7 @@ impl TextBuffer {
self.find_and_select(pattern, options)
}
/// Find all occurrences of the given `pattern` and replace them with `replacement`.
pub fn find_and_replace_all(
&mut self,
pattern: &str,
@ -1333,18 +1453,22 @@ impl TextBuffer {
cursor
}
/// Moves the cursor to the given offset.
pub fn cursor_move_to_offset(&mut self, offset: usize) {
unsafe { self.set_cursor(self.cursor_move_to_offset_internal(self.cursor, offset)) }
}
/// Moves the cursor to the given logical position.
pub fn cursor_move_to_logical(&mut self, pos: Point) {
unsafe { self.set_cursor(self.cursor_move_to_logical_internal(self.cursor, pos)) }
}
/// Moves the cursor to the given visual position.
pub fn cursor_move_to_visual(&mut self, pos: Point) {
unsafe { self.set_cursor(self.cursor_move_to_visual_internal(self.cursor, pos)) }
}
/// Moves the cursor by the given delta.
pub fn cursor_move_delta(&mut self, granularity: CursorMovement, delta: CoordType) {
unsafe { self.set_cursor(self.cursor_move_delta_internal(self.cursor, granularity, delta)) }
}
@ -1847,11 +1971,13 @@ impl TextBuffer {
self.edit_end();
}
// TODO: This function is ripe for some optimizations:
// * Instead of replacing the entire selection,
// it should unindent each line directly (as if multiple cursors had been used).
// * The cursor movement at the end is rather costly, but at least without word wrap
// it should be possible to calculate it directly from the removed amount.
/// Unindents the current selection or line.
///
/// TODO: This function is ripe for some optimizations:
/// * Instead of replacing the entire selection,
/// it should unindent each line directly (as if multiple cursors had been used).
/// * The cursor movement at the end is rather costly, but at least without word wrap
/// it should be possible to calculate it directly from the removed amount.
pub fn unindent(&mut self) {
let mut selection_beg = self.cursor.logical_pos;
let mut selection_end = selection_beg;
@ -1927,7 +2053,8 @@ impl TextBuffer {
self.set_cursor_internal(self.cursor_move_to_logical_internal(self.cursor, selection_end));
}
/// Extracts a chunk of text or a line if no selection is active. May optionally delete it.
/// Extracts the contents of the current selection.
/// May optionally delete it, if requested. This is meant to be used for Ctrl+X.
pub fn extract_selection(&mut self, delete: bool) -> Vec<u8> {
let Some((beg, end)) = self.selection_range_internal(true) else {
return Vec::new();
@ -1946,6 +2073,9 @@ impl TextBuffer {
out
}
/// Extracts the contents of the current selection the user made.
/// This differs from [`TextBuffer::extract_selection()`] in that
/// it does nothing if the selection was made by searching.
pub fn extract_user_selection(&mut self, delete: bool) -> Option<Vec<u8>> {
if !self.has_selection() {
return None;
@ -1961,10 +2091,17 @@ impl TextBuffer {
Some(self.extract_selection(delete))
}
/// Returns the current selection anchors, or `None` if there
/// is no selection. The returned logical positions are sorted.
pub fn selection_range(&self) -> Option<(Cursor, Cursor)> {
self.selection_range_internal(false)
}
/// Returns the current selection anchors.
///
/// If there's no selection and `line_fallback` is `true`,
/// the start/end of the current line are returned.
/// This is meant to be used for Ctrl+C / Ctrl+X.
fn selection_range_internal(&self, line_fallback: bool) -> Option<(Cursor, Cursor)> {
let [beg, end] = match self.selection {
TextBufferSelection::None if !line_fallback => return None,
@ -1983,6 +2120,8 @@ impl TextBuffer {
if beg.offset < end.offset { Some((beg, end)) } else { None }
}
/// Starts a new edit operation.
/// This is used for tracking the undo/redo history.
fn edit_begin(&mut self, history_type: HistoryType, cursor: Cursor) {
self.active_edit_depth += 1;
if self.active_edit_depth > 1 {
@ -2033,6 +2172,8 @@ impl TextBuffer {
}
}
/// Writes `text` into the buffer at the current cursor position.
/// It records the change in the undo stack.
fn edit_write(&mut self, text: &[u8]) {
let logical_y_before = self.cursor.logical_pos.y;
@ -2052,6 +2193,8 @@ impl TextBuffer {
self.stats.logical_lines += self.cursor.logical_pos.y - logical_y_before;
}
/// Deletes the text between the current cursor position and `to`.
/// It records the change in the undo stack.
fn edit_delete(&mut self, to: Cursor) {
debug_assert!(to.offset >= self.active_edit_off);
@ -2076,6 +2219,8 @@ impl TextBuffer {
self.stats.logical_lines += logical_y_before - to.logical_pos.y;
}
/// Finalizes the current edit operation
/// and recalculates the line statistics.
fn edit_end(&mut self) {
self.active_edit_depth -= 1;
assert!(self.active_edit_depth >= 0);
@ -2125,10 +2270,12 @@ impl TextBuffer {
self.reflow(false);
}
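The `active_edit_depth` counter in `edit_begin`/`edit_end` suggests that edits can nest, with only the outermost pair producing an undo entry. Below is a minimal sketch of that idea. It assumes a hypothetical `History` type that records changes as strings, whereas the real buffer records byte ranges and cursors.

```rust
/// Hypothetical undo history with nestable edit groups.
#[derive(Default)]
struct History {
    /// Mirrors `active_edit_depth`: how many `edit_begin` calls are still open.
    depth: i32,
    /// Committed undo entries; each entry is one group of changes.
    undo_stack: Vec<Vec<String>>,
    /// Changes recorded since the outermost `edit_begin`.
    pending: Vec<String>,
}

impl History {
    fn edit_begin(&mut self) {
        self.depth += 1;
    }

    fn record(&mut self, change: &str) {
        debug_assert!(self.depth > 0, "record() called outside edit_begin/edit_end");
        self.pending.push(change.to_string());
    }

    fn edit_end(&mut self) {
        self.depth -= 1;
        assert!(self.depth >= 0);
        // Only the outermost edit_end commits, so nested edits undo as one step.
        if self.depth == 0 && !self.pending.is_empty() {
            self.undo_stack.push(std::mem::take(&mut self.pending));
        }
    }
}
```

Under this scheme, a compound operation built from several nested edits would be undone in a single step.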
/// Undo the last edit operation.
pub fn undo(&mut self) {
self.undo_redo(true);
}
/// Redo the last undo operation.
pub fn redo(&mut self) {
self.undo_redo(false);
}
@ -2238,10 +2385,12 @@ impl TextBuffer {
self.reflow(false);
}
/// For interfacing with ICU.
pub(crate) fn read_backward(&self, off: usize) -> &[u8] {
self.buffer.read_backward(off)
}
/// For interfacing with ICU.
pub fn read_forward(&self, off: usize) -> &[u8] {
self.buffer.read_forward(off)
}


@ -1,4 +1,4 @@
//! Like `RefCell`, but without any runtime checks in release mode.
//! [`std::cell::RefCell`], but without runtime checks in release builds.
#[cfg(debug_assertions)]
pub use debug::*;


@ -8,7 +8,7 @@ use std::path::PathBuf;
use crate::arena::{ArenaString, scratch_arena};
use crate::helpers::ReplaceRange as _;
/// An abstraction over potentially chunked text containers.
/// An abstraction over reading from text containers.
pub trait ReadableDocument {
/// Read some bytes starting at (including) the given absolute offset.
///
@ -16,7 +16,7 @@ pub trait ReadableDocument {
///
/// * Be lenient on inputs:
/// * The given offset may be out of bounds and you MUST clamp it.
/// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries.
/// * You should not assume that offsets are at grapheme cluster boundaries.
/// * Be strict on outputs:
/// * You MUST NOT break grapheme clusters across chunks.
/// * You MUST NOT return an empty slice unless the offset is at or beyond the end.
@ -28,14 +28,21 @@ pub trait ReadableDocument {
///
/// * Be lenient on inputs:
/// * The given offset may be out of bounds and you MUST clamp it.
/// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries.
/// * You should not assume that offsets are at grapheme cluster boundaries.
/// * Be strict on outputs:
/// * You MUST NOT break grapheme clusters across chunks.
/// * You MUST NOT return an empty slice unless the offset is zero.
fn read_backward(&self, off: usize) -> &[u8];
}
/// An abstraction over writing to text containers.
pub trait WriteableDocument: ReadableDocument {
/// Replace the given range with the given bytes.
///
/// # Warning
///
/// * The given range may be out of bounds and you MUST clamp it.
/// * The replacement may not be valid UTF-8.
fn replace(&mut self, range: Range<usize>, replacement: &[u8]);
}
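To make the contract concrete, here is a sketch that implements both traits for a plain `Vec<u8>`. The trait definitions are re-declared locally so the block compiles on its own. A contiguous buffer never splits grapheme clusters, so only the clamping rules matter here. The `Vec<u8>` implementation is an illustration, not one of the project's document types.

```rust
use std::ops::Range;

trait ReadableDocument {
    fn read_forward(&self, off: usize) -> &[u8];
    fn read_backward(&self, off: usize) -> &[u8];
}

trait WriteableDocument: ReadableDocument {
    fn replace(&mut self, range: Range<usize>, replacement: &[u8]);
}

impl ReadableDocument for Vec<u8> {
    /// Everything from `off` to the end; out-of-bounds offsets are clamped.
    fn read_forward(&self, off: usize) -> &[u8] {
        &self[off.min(self.len())..]
    }

    /// Everything before `off`; empty only if `off` is zero (or the document is empty).
    fn read_backward(&self, off: usize) -> &[u8] {
        &self[..off.min(self.len())]
    }
}

impl WriteableDocument for Vec<u8> {
    fn replace(&mut self, range: Range<usize>, replacement: &[u8]) {
        // Clamp the range so out-of-bounds requests append or no-op instead of panicking.
        let end = range.end.min(self.len());
        let start = range.start.min(end);
        let tail = self.split_off(end);
        self.truncate(start);
        self.extend_from_slice(replacement);
        self.extend_from_slice(&tail);
    }
}
```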


@ -1,3 +1,5 @@
//! A shoddy framebuffer for terminal applications.
use std::cell::Cell;
use std::fmt::Write;
use std::ops::{BitOr, BitXor};
@ -24,6 +26,7 @@ const CACHE_TABLE_SIZE: usize = 1 << CACHE_TABLE_LOG2_SIZE;
/// 8 bits out, but rather shift 56 bits down to get the best bits from the top.
const CACHE_TABLE_SHIFT: usize = usize::BITS as usize - CACHE_TABLE_LOG2_SIZE;
/// Standard 16 VT & default foreground/background colors.
#[derive(Clone, Copy)]
pub enum IndexedColor {
Black,
@ -47,33 +50,55 @@ pub enum IndexedColor {
Foreground,
}
/// Number of indices used by [`IndexedColor`].
pub const INDEXED_COLORS_COUNT: usize = 18;
/// Fallback theme.
pub const DEFAULT_THEME: [u32; INDEXED_COLORS_COUNT] = [
0xff000000, 0xff212cbe, 0xff3aae3f, 0xff4a9abe, 0xffbe4d20, 0xffbe54bb, 0xffb2a700, 0xffbebebe,
0xff808080, 0xff303eff, 0xff51ea58, 0xff44c9ff, 0xffff6a2f, 0xffff74fc, 0xfff0e100, 0xffffffff,
0xff000000, 0xffffffff,
];
/// A shoddy framebuffer for terminal applications.
///
/// The idea is that you create a [`Framebuffer`], draw a bunch of text and
/// colors into it, and it takes care of figuring out what changed since the
/// last rendering and sending the differences as VT to the terminal.
///
/// This is an improvement over how many other terminal applications work,
/// as they fail to accurately track what changed. If you watch the output
/// of `vim` for instance, you'll notice that it redraws unrelated parts of
/// the screen all the time.
pub struct Framebuffer {
/// Stores the color palette.
indexed_colors: [u32; INDEXED_COLORS_COUNT],
/// Front and back buffers. Indexed by `frame_counter & 1`.
buffers: [Buffer; 2],
/// The current frame counter. Increments on every `flip` call.
frame_counter: usize,
auto_colors: [u32; 2], // [dark, light]
/// The colors used for `contrasted()`. It stores the default colors
/// of the palette as [dark, light], unless the palette is recognized
/// as a light theme, in which case it swaps them.
auto_colors: [u32; 2],
/// A cache table for previously contrasted colors.
/// See: <https://fgiesen.wordpress.com/2019/02/11/cache-tables/>
contrast_colors: [Cell<(u32, u32)>; CACHE_TABLE_SIZE],
}
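The diffing idea behind `Framebuffer` can be illustrated with a hypothetical `MiniFramebuffer`. It keeps two line buffers, selects the back buffer with `frame_counter & 1`, and emits VT (a cursor-position sequence plus erase-line) only for lines that differ from the previous frame. This is a simplified sketch. It only diffs whole lines of text, while the real type also tracks colors, attributes, and the cursor.

```rust
/// Hypothetical, stripped-down double-buffered framebuffer: one `String` per line.
struct MiniFramebuffer {
    /// Front and back buffers, indexed by `frame_counter & 1`.
    buffers: [Vec<String>; 2],
    frame_counter: usize,
}

impl MiniFramebuffer {
    fn new(height: usize) -> Self {
        Self { buffers: [vec![String::new(); height], vec![String::new(); height]], frame_counter: 0 }
    }

    /// Starts a new frame: the previous back buffer becomes the front buffer
    /// and the new back buffer is cleared for drawing.
    fn flip(&mut self) {
        self.frame_counter += 1;
        for line in &mut self.buffers[self.frame_counter & 1] {
            line.clear();
        }
    }

    /// Draws `text` into line `y` of the back buffer.
    fn set_line(&mut self, y: usize, text: &str) {
        let line = &mut self.buffers[self.frame_counter & 1][y];
        line.clear();
        line.push_str(text);
    }

    /// Compares back and front buffers and serializes only the differences as VT.
    fn render(&self) -> String {
        let back = &self.buffers[self.frame_counter & 1];
        let front = &self.buffers[(self.frame_counter + 1) & 1];
        let mut out = String::new();
        for (y, (b, f)) in back.iter().zip(front).enumerate() {
            if b != f {
                // CUP (1-based row/column), then EL 2 to erase the stale line contents.
                out.push_str(&format!("\x1b[{};1H\x1b[2K{}", y + 1, b));
            }
        }
        out
    }
}
```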
impl Framebuffer {
/// Creates a new framebuffer.
pub fn new() -> Self {
Self {
indexed_colors: DEFAULT_THEME,
buffers: Default::default(),
frame_counter: 0,
auto_colors: [0, 0],
contrast_colors: [const { Cell::new((0, 0)) }; 256],
contrast_colors: [const { Cell::new((0, 0)) }; CACHE_TABLE_SIZE],
}
}
/// Sets the base color palette.
pub fn set_indexed_colors(&mut self, colors: [u32; INDEXED_COLORS_COUNT]) {
self.indexed_colors = colors;
@ -86,6 +111,7 @@ impl Framebuffer {
}
}
/// Begins a new frame with the given `size`.
pub fn flip(&mut self, size: Size) {
if size != self.buffers[0].bg_bitmap.size {
for buffer in &mut self.buffers {
@ -117,9 +143,7 @@ impl Framebuffer {
/// Replaces text contents in a single line of the framebuffer.
/// All coordinates are in viewport coordinates.
/// Assumes that all tabs have been replaced with spaces.
///
/// TODO: This function is ripe for performance improvements.
/// Assumes that control characters have been replaced or escaped.
pub fn replace_text(
&mut self,
y: CoordType,
@ -131,6 +155,18 @@ impl Framebuffer {
back.text.replace_text(y, origin_x, clip_right, text)
}
/// Draws a scrollbar in the given `track` rectangle.
///
/// Not entirely sure why I put it here instead of elsewhere.
///
/// # Parameters
///
/// * `clip_rect`: Clips the rendering to this rectangle.
/// This is relevant when you have scrollareas inside scrollareas.
/// * `track`: The rectangle in which to draw the scrollbar.
/// In absolute viewport coordinates.
/// * `content_offset`: The current offset of the scrollarea.
/// * `content_height`: The height of the scrollarea content.
pub fn draw_scrollbar(
&mut self,
clip_rect: Rect,
@ -247,8 +283,10 @@ impl Framebuffer {
self.indexed_colors[index as usize]
}
// To facilitate constant folding by the compiler,
// alpha is given as a fraction (`numerator` / `denominator`).
/// Returns a color from the palette.
///
/// To facilitate constant folding by the compiler,
/// alpha is given as a fraction (`numerator` / `denominator`).
#[inline]
pub fn indexed_alpha(&self, index: IndexedColor, numerator: u32, denominator: u32) -> u32 {
let c = self.indexed_colors[index as usize];
@ -259,6 +297,7 @@ impl Framebuffer {
a << 24 | r << 16 | g << 8 | b
}
/// Returns a color opposite to the brightness of the given `color`.
pub fn contrasted(&self, color: u32) -> u32 {
let idx = (color as usize).wrapping_mul(HASH_MULTIPLIER) >> CACHE_TABLE_SHIFT;
let slot = self.contrast_colors[idx].get();
@ -277,16 +316,25 @@ impl Framebuffer {
srgb_to_oklab(color).l < 0.5
}
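The `contrast_colors` field is a cache table in the sense of the linked article. It is a fixed-size array indexed by a multiplicative hash, and a collision simply overwrites the slot. The sketch below shows that lookup. It assumes a golden-ratio constant for `HASH_MULTIPLIER`, whose actual value isn't shown here, and uses an `Option` to mark empty slots. `ContrastCache` is a hypothetical name.

```rust
use std::cell::Cell;

const CACHE_LOG2_SIZE: usize = 8;
const CACHE_SIZE: usize = 1 << CACHE_LOG2_SIZE;
/// Keep the top `CACHE_LOG2_SIZE` bits of the 64-bit product; they are the best mixed.
const CACHE_SHIFT: u32 = u64::BITS - CACHE_LOG2_SIZE as u32;
/// Assumed multiplier (2^64 / golden ratio); not necessarily the project's value.
const HASH_MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

struct ContrastCache {
    /// Each slot stores `(input color, contrasted color)`; `None` means empty.
    slots: [Cell<Option<(u32, u32)>>; CACHE_SIZE],
}

impl ContrastCache {
    fn new() -> Self {
        Self { slots: [const { Cell::new(None::<(u32, u32)>) }; CACHE_SIZE] }
    }

    /// Returns the cached result for `color`, or computes and stores it on a miss.
    /// A collision overwrites the slot; there is no chaining or probing.
    fn get_or_compute(&self, color: u32, compute: impl Fn(u32) -> u32) -> u32 {
        let idx = ((color as u64).wrapping_mul(HASH_MULTIPLIER) >> CACHE_SHIFT) as usize;
        let slot = &self.slots[idx];
        if let Some((key, value)) = slot.get() {
            if key == color {
                return value;
            }
        }
        let value = compute(color);
        slot.set(Some((color, value)));
        value
    }
}
```

Using `Cell` lets the lookup take `&self`, which matches `contrasted(&self, ...)` in the diff.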
/// Blends the given sRGB color onto the background bitmap.
///
/// TODO: The current approach blends foreground/background independently,
/// but ideally, blending a semi-transparent dark color via `blend_bg` should also darken the text below it.
pub fn blend_bg(&mut self, target: Rect, bg: u32) {
let back = &mut self.buffers[self.frame_counter & 1];
back.bg_bitmap.blend(target, bg);
}
/// Blends the given sRGB color onto the foreground bitmap.
///
/// TODO: The current approach blends foreground/background independently,
/// but ideally `blend_fg` should blend with the background color below it.
pub fn blend_fg(&mut self, target: Rect, fg: u32) {
let back = &mut self.buffers[self.frame_counter & 1];
back.fg_bitmap.blend(target, fg);
}
/// Reverses the foreground and background colors in the given rectangle.
pub fn reverse(&mut self, target: Rect) {
let back = &mut self.buffers[self.frame_counter & 1];
@ -310,17 +358,23 @@ impl Framebuffer {
}
}
/// Replaces VT attributes in the given rectangle.
pub fn replace_attr(&mut self, target: Rect, mask: Attributes, attr: Attributes) {
let back = &mut self.buffers[self.frame_counter & 1];
back.attributes.replace(target, mask, attr);
}
/// Sets the current visible cursor position and type.
///
/// Call this when focus is inside an editable area and you want to show the cursor.
pub fn set_cursor(&mut self, pos: Point, overtype: bool) {
let back = &mut self.buffers[self.frame_counter & 1];
back.cursor.pos = pos;
back.cursor.overtype = overtype;
}
/// Renders the framebuffer contents accumulated since the
/// last call to `flip()` and returns them serialized as VT.
pub fn render<'a>(&mut self, arena: &'a Arena) -> ArenaString<'a> {
let idx = self.frame_counter & 1;
// Borrows the front/back buffers without letting Rust know that we have a reference to self.
@ -484,6 +538,7 @@ struct Buffer {
cursor: Cursor,
}
/// A buffer for the text contents of the framebuffer.
#[derive(Default)]
struct LineBuffer {
lines: Vec<String>,
@ -509,10 +564,8 @@ impl LineBuffer {
/// Replaces text contents in a single line of the framebuffer.
/// All coordinates are in viewport coordinates.
/// Assumes that all tabs have been replaced with spaces.
///
/// TODO: This function is ripe for performance improvements.
pub fn replace_text(
/// Assumes that control characters have been replaced or escaped.
fn replace_text(
&mut self,
y: CoordType,
origin_x: CoordType,
@ -632,6 +685,7 @@ impl LineBuffer {
}
}
/// An sRGB bitmap.
#[derive(Default)]
struct Bitmap {
data: Vec<u32>,
@ -647,6 +701,10 @@ impl Bitmap {
memset(&mut self.data, color);
}
/// Blends the given sRGB color onto the bitmap.
///
/// This uses the `oklab` color space for blending so the
/// resulting colors may look different from what you'd expect.
fn blend(&mut self, target: Rect, color: u32) {
if (color & 0xff000000) == 0x00000000 {
return;
@ -700,11 +758,14 @@ impl Bitmap {
}
}
/// A bitfield for VT text attributes.
///
/// It being a bitfield allows for simple diffing.
#[repr(transparent)]
#[derive(Default, Clone, Copy, PartialEq, Eq)]
pub struct Attributes(u8);
#[allow(non_upper_case_globals)] // Mimics an enum, but it's actually a bitfield. Allows simple diffing.
#[allow(non_upper_case_globals)]
impl Attributes {
pub const None: Attributes = Attributes(0);
pub const Italic: Attributes = Attributes(0b1);
@ -734,6 +795,7 @@ impl BitXor for Attributes {
}
}
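Because `Attributes` is a bitfield, XOR-ing the previous and next values yields exactly the attributes that changed, so only those need SGR sequences. This is a sketch under assumptions. The `UNDERLINED` bit value and the `diff_attributes` helper are hypothetical, since only `Italic = 0b1` is visible here. The SGR codes 3/23 and 4/24 are the standard italic and underline on/off sequences.

```rust
/// Local copy of the bitfield idea.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Attributes(u8);

impl Attributes {
    const NONE: Attributes = Attributes(0);
    const ITALIC: Attributes = Attributes(0b01);
    /// Hypothetical bit assignment.
    const UNDERLINED: Attributes = Attributes(0b10);

    fn contains(self, attr: Attributes) -> bool {
        self.0 & attr.0 != 0
    }
}

/// Appends SGR sequences for exactly those attributes that differ between frames.
fn diff_attributes(prev: Attributes, next: Attributes, out: &mut String) {
    let changed = Attributes(prev.0 ^ next.0);
    if changed.contains(Attributes::ITALIC) {
        out.push_str(if next.contains(Attributes::ITALIC) { "\x1b[3m" } else { "\x1b[23m" });
    }
    if changed.contains(Attributes::UNDERLINED) {
        out.push_str(if next.contains(Attributes::UNDERLINED) { "\x1b[4m" } else { "\x1b[24m" });
    }
}
```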
/// Stores VT attributes for the framebuffer.
#[derive(Default)]
struct AttributeBuffer {
data: Vec<Attributes>,
@ -782,6 +844,7 @@ impl AttributeBuffer {
}
}
/// Stores cursor position and type for the framebuffer.
#[derive(Default, PartialEq, Eq)]
struct Cursor {
pos: Point,


@ -1,3 +1,12 @@
//! Provides fast, non-cryptographic hash functions.
/// The venerable wyhash hash function.
///
/// It's fast, has good statistical properties, and is in the public domain.
/// See: <https://github.com/wangyi-fudan/wyhash>
/// If you visit the link, you'll find that it was superseded by "rapidhash",
/// but that's not particularly interesting for this project. rapidhash results
/// in way larger assembly and isn't faster when hashing small amounts of data.
pub fn hash(mut seed: u64, data: &[u8]) -> u64 {
unsafe {
const S0: u64 = 0xa0761d6478bd642f;


@ -1,3 +1,5 @@
//! Random assortment of helpers I didn't know where to put.
use std::alloc::Allocator;
use std::cmp::Ordering;
use std::io::Read;
@ -15,11 +17,17 @@ pub const KIBI: usize = 1024;
pub const MEBI: usize = 1024 * 1024;
pub const GIBI: usize = 1024 * 1024 * 1024;
/// A viewport coordinate type used throughout the application.
pub type CoordType = i32;
/// To avoid overflows when adding two [`CoordType::MAX`] values together,
/// use [`COORD_TYPE_SAFE_MIN`] and [`COORD_TYPE_SAFE_MAX`] instead.
pub const COORD_TYPE_SAFE_MAX: CoordType = 32767;
/// See [`COORD_TYPE_SAFE_MAX`].
pub const COORD_TYPE_SAFE_MIN: CoordType = -32767 - 1;
/// A 2D point. Uses [`CoordType`].
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
pub struct Point {
pub x: CoordType,
@ -46,6 +54,7 @@ impl Ord for Point {
}
}
/// A 2D size. Uses [`CoordType`].
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
pub struct Size {
pub width: CoordType,
@ -58,6 +67,7 @@ impl Size {
}
}
/// A 2D rectangle. Uses [`CoordType`].
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
pub left: CoordType,
@ -67,34 +77,44 @@ pub struct Rect {
}
impl Rect {
/// Mimics CSS's `padding` property where `padding: a` is `a a a a`.
pub fn one(value: CoordType) -> Self {
Self { left: value, top: value, right: value, bottom: value }
}
/// Mimics CSS's `padding` property where `padding: a b` is `a b a b`,
/// and `a` is top/bottom and `b` is left/right.
pub fn two(top_bottom: CoordType, left_right: CoordType) -> Self {
Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom }
}
/// Mimics CSS's `padding` property where `padding: a b c` is `a b c b`,
/// and `a` is top, `b` is left/right, and `c` is bottom.
pub fn three(top: CoordType, left_right: CoordType, bottom: CoordType) -> Self {
Self { left: left_right, top, right: left_right, bottom }
}
/// Is the rectangle empty?
pub fn is_empty(&self) -> bool {
self.left >= self.right || self.top >= self.bottom
}
/// Width of the rectangle.
pub fn width(&self) -> CoordType {
self.right - self.left
}
/// Height of the rectangle.
pub fn height(&self) -> CoordType {
self.bottom - self.top
}
/// Check if it contains a point.
pub fn contains(&self, point: Point) -> bool {
point.x >= self.left && point.x < self.right && point.y >= self.top && point.y < self.bottom
}
/// Intersect two rectangles.
pub fn intersect(&self, rhs: Self) -> Self {
let l = self.left.max(rhs.left);
let t = self.top.max(rhs.top);
@ -110,7 +130,7 @@ impl Rect {
}
}
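Here is a short usage sketch of the CSS-like constructors and of intersection, using a local copy of `Rect`. The tail of `intersect` isn't shown in the diff. Clamping `right`/`bottom` to at least `left`/`top` is therefore an assumption; it makes disjoint rectangles produce an empty result instead of a negative size.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rect {
    left: i32,
    top: i32,
    right: i32,
    bottom: i32,
}

impl Rect {
    /// CSS-like `padding: a b`: `a` is top/bottom, `b` is left/right.
    fn two(top_bottom: i32, left_right: i32) -> Self {
        Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom }
    }

    fn is_empty(&self) -> bool {
        self.left >= self.right || self.top >= self.bottom
    }

    /// Overlap of two rectangles. `right`/`bottom` are clamped so that disjoint
    /// inputs yield an empty rectangle rather than one with negative size (assumption).
    fn intersect(&self, rhs: Self) -> Self {
        let l = self.left.max(rhs.left);
        let t = self.top.max(rhs.top);
        let r = self.right.min(rhs.right).max(l);
        let b = self.bottom.min(rhs.bottom).max(t);
        Self { left: l, top: t, right: r, bottom: b }
    }
}
```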
/// `std::cmp::minmax` is unstable, as per usual.
/// [`std::cmp::minmax`] is unstable, as per usual.
pub fn minmax<T>(v1: T, v2: T) -> [T; 2]
where
T: Ord,
@ -145,12 +165,16 @@ pub const unsafe fn str_from_raw_parts<'a>(ptr: *const u8, len: usize) -> &'a st
unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}
/// [`<[T]>::copy_from_slice`] panics if the two slices have different lengths.
/// This one just returns the copied amount.
pub fn slice_copy_safe<T: Copy>(dst: &mut [T], src: &[T]) -> usize {
let len = src.len().min(dst.len());
unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), len) };
len
}
/// [`Vec::splice`] results in really bad assembly.
/// This doesn't. Don't use [`Vec::splice`].
pub trait ReplaceRange<T: Copy> {
fn replace_range<R: RangeBounds<usize>>(&mut self, range: R, src: &[T]);
}
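The trait body isn't shown here. Replacing a range in place without `Vec::splice` can be done with `copy_within` and `copy_from_slice`. The following is a hypothetical `vec_replace` function written for illustration, not the project's `vec_replace_impl`.

```rust
use std::ops::Range;

/// Hypothetical: replaces `dst[range]` with `src` without using `Vec::splice`.
fn vec_replace<T: Copy>(dst: &mut Vec<T>, range: Range<usize>, src: &[T]) {
    // Clamp the range to the vector's bounds.
    let end = range.end.min(dst.len());
    let start = range.start.min(end);
    let removed = end - start;
    let old_len = dst.len();

    if src.len() > removed {
        // Growing: append placeholder elements (taken from `src`, since `T` has
        // no default value), shift the tail right, then overwrite the hole below.
        let grow = src.len() - removed;
        dst.extend_from_slice(&src[..grow]);
        dst.copy_within(end..old_len, end + grow);
    } else {
        // Shrinking (or same size): shift the tail left, then drop the leftovers.
        dst.copy_within(end..old_len, start + src.len());
        dst.truncate(old_len - (removed - src.len()));
    }
    dst[start..start + src.len()].copy_from_slice(src);
}
```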
@ -205,6 +229,7 @@ fn vec_replace_impl<T: Copy, A: Allocator>(dst: &mut Vec<T, A>, range: Range<usi
}
}
/// [`Read`] but with [`MaybeUninit<u8>`] buffers.
pub fn file_read_uninit<T: Read>(
file: &mut T,
buf: &mut [MaybeUninit<u8>],
@ -216,11 +241,13 @@ pub fn file_read_uninit<T: Read>(
}
}
/// Turns a [`&[T]`] into a [`&[MaybeUninit<T>]`].
#[inline(always)]
pub const fn slice_as_uninit_ref<T>(slice: &[T]) -> &[MaybeUninit<T>] {
unsafe { slice::from_raw_parts(slice.as_ptr() as *const MaybeUninit<T>, slice.len()) }
}
/// Turns a [`&mut [T]`] into a [`&mut [MaybeUninit<T>]`].
#[inline(always)]
pub const fn slice_as_uninit_mut<T>(slice: &mut [T]) -> &mut [MaybeUninit<T>] {
unsafe { slice::from_raw_parts_mut(slice.as_mut_ptr() as *mut MaybeUninit<T>, slice.len()) }


@ -1,3 +1,5 @@
//! Bindings to the ICU library.
use std::cmp::Ordering;
use std::ffi::CStr;
use std::mem;
@ -13,6 +15,7 @@ use crate::{apperr, arena_format, sys};
static mut ENCODINGS: Vec<&'static str> = Vec::new();
/// Returns a list of encodings ICU supports.
pub fn get_available_encodings() -> &'static [&'static str] {
// OnceCell for people that want to put it into a static.
#[allow(static_mut_refs)]
@ -38,6 +41,7 @@ pub fn get_available_encodings() -> &'static [&'static str] {
}
}
/// Formats the given ICU error code into a human-readable string.
pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result {
fn format(code: u32) -> &'static str {
let Ok(f) = init_if_needed() else {
@ -62,6 +66,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re
}
}
/// Converts between two encodings using ICU.
pub struct Converter<'pivot> {
source: *mut icu_ffi::UConverter,
target: *mut icu_ffi::UConverter,
@ -80,6 +85,14 @@ impl Drop for Converter<'_> {
}
impl<'pivot> Converter<'pivot> {
/// Constructs a new `Converter` instance.
///
/// # Parameters
///
/// * `pivot_buffer`: A buffer used to cache partial conversions.
/// Don't make it too small.
/// * `source_encoding`: The source encoding name (e.g., "UTF-8").
/// * `target_encoding`: The target encoding name (e.g., "UTF-16").
pub fn new(
pivot_buffer: &'pivot mut [MaybeUninit<u16>],
source_encoding: &str,
@ -114,6 +127,20 @@ impl<'pivot> Converter<'pivot> {
arena_format!(arena, "{}\0", input)
}
/// Performs one step of the encoding conversion.
///
/// # Parameters
///
/// * `input`: The input buffer to convert from.
/// It should be in the `source_encoding` that was previously specified.
/// * `output`: The output buffer to convert to.
/// It should be in the `target_encoding` that was previously specified.
///
/// # Returns
///
/// A tuple containing:
/// 1. The number of bytes read from the input buffer.
/// 2. The number of bytes written to the output buffer.
pub fn convert(
&mut self,
input: &[u8],
@ -168,24 +195,26 @@ impl<'pivot> Converter<'pivot> {
// I picked 64 because it seemed like a reasonable lower bound.
const CACHE_SIZE: usize = 64;
// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format.
/// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format.
struct Cache {
/// The translated text. Contains `len`-many valid items.
/// The translated text. Contains [`Cache::utf16_len`]-many valid items.
utf16: [u16; CACHE_SIZE],
/// For each character in `utf16` this stores the offset in the `TextBuffer`,
/// For each character in [`Cache::utf16`] this stores the offset in the [`TextBuffer`],
/// relative to the start offset stored in `native_beg`.
/// This has the same length as `utf16`.
/// This has the same length as [`Cache::utf16`].
utf16_to_utf8_offsets: [u16; CACHE_SIZE],
/// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character
/// in `utf16` maps to the given `native_offset` in the underlying `TextBuffer`.
/// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character in
/// [`Cache::utf16`] maps to the given `native_offset` in the underlying [`TextBuffer`].
/// Contains `native_end - native_beg`-many valid items.
utf8_to_utf16_offsets: [u16; CACHE_SIZE],
/// The number of valid items in `utf16`.
/// The number of valid items in [`Cache::utf16`].
utf16_len: usize,
/// Offset of the first non-ASCII character.
/// Less than or equal to [`Cache::utf16_len`].
native_indexing_limit: usize,
/// The range of UTF-8 text in the `TextBuffer` that this chunk covers.
/// The range of UTF-8 text in the [`TextBuffer`] that this chunk covers.
utf8_range: Range<usize>,
}
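The sketch below shows how the `utf16`, `utf16_to_utf8_offsets`, and `utf8_to_utf16_offsets` tables relate. A hypothetical `build_offsets` converts a small UTF-8 chunk and records both mappings. It ignores `native_indexing_limit` and assumes the chunk is short enough for `u16` offsets, as with `CACHE_SIZE`. Mapping every byte of a multi-byte character to its first UTF-16 unit is also an assumption.

```rust
/// Hypothetical: converts a short UTF-8 chunk to UTF-16 and records offset maps
/// in both directions (offsets are relative to the start of the chunk).
fn build_offsets(utf8: &str) -> (Vec<u16>, Vec<u16>, Vec<u16>) {
    let mut utf16: Vec<u16> = Vec::new();
    let mut utf16_to_utf8: Vec<u16> = Vec::new();
    let mut utf8_to_utf16 = vec![0u16; utf8.len()];

    for (byte_off, ch) in utf8.char_indices() {
        let utf16_off = utf16.len() as u16;
        // Every UTF-8 byte of this character maps to its first UTF-16 unit.
        for i in 0..ch.len_utf8() {
            utf8_to_utf16[byte_off + i] = utf16_off;
        }
        // Both halves of a surrogate pair map back to the character's first UTF-8 byte.
        let mut buf = [0u16; 2];
        for &unit in ch.encode_utf16(&mut buf).iter() {
            utf16.push(unit);
            utf16_to_utf8.push(byte_off as u16);
        }
    }

    (utf16, utf16_to_utf8, utf8_to_utf16)
}
```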
@ -195,9 +224,15 @@ struct DoubleCache {
mru: bool,
}
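`DoubleCache` keeps two chunks and an `mru` flag, and the lookup loop in `utext_access_impl` marks a hit as most recently used. The following sketch uses the same two-slot scheme with least-recently-used eviction on a miss. The hypothetical `TwoSlotCache` stores `String` chunks instead of UTF-16 data. The eviction policy on a miss is an assumption, since that part of `utext_access_impl` isn't shown.

```rust
use std::ops::Range;

/// Hypothetical two-slot cache; each slot holds the byte range it covers and its data.
struct TwoSlotCache {
    slots: [Option<(Range<usize>, String)>; 2],
    /// Like `DoubleCache::mru`: `false` if slot 0 was used most recently, `true` for slot 1.
    mru: bool,
}

impl TwoSlotCache {
    fn new() -> Self {
        Self { slots: [None, None], mru: false }
    }

    /// Returns the chunk containing `index`, calling `load` only on a miss.
    fn get(&mut self, index: usize, load: impl Fn(usize) -> (Range<usize>, String)) -> &str {
        let hit = (0..2).find(|&i| matches!(&self.slots[i], Some((range, _)) if range.contains(&index)));
        let slot = match hit {
            Some(i) => i,
            None => {
                // Miss: overwrite the least recently used slot.
                let victim = usize::from(!self.mru);
                self.slots[victim] = Some(load(index));
                victim
            }
        };
        self.mru = slot != 0;
        &self.slots[slot].as_ref().unwrap().1
    }
}
```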
// I initially did this properly with a PhantomData marker for the TextBuffer lifetime,
// but it was a pain so now I don't. Not a big deal - its only use is in a self-referential
// struct in TextBuffer which Rust can't deal with anyway.
/// A wrapper around ICU's `UText` struct.
///
/// In our case its only purpose is to adapt a [`TextBuffer`] for ICU.
///
/// # Safety
///
/// Warning! No lifetime tracking is done here.
/// I initially did it properly with a PhantomData marker for the TextBuffer
/// lifetime, but it was a pain so now I don't. Not a big deal in our case.
pub struct Text(&'static mut icu_ffi::UText);
impl Drop for Text {
@ -208,11 +243,12 @@ impl Drop for Text {
}
impl Text {
/// Constructs an ICU `UText` instance from a `TextBuffer`.
/// Constructs an ICU `UText` instance from a [`TextBuffer`].
///
/// # Safety
///
/// The caller must ensure that the given `TextBuffer` outlives the returned `Text` instance.
/// The caller must ensure that the given [`TextBuffer`]
/// outlives the returned `Text` instance.
pub unsafe fn new(tb: &TextBuffer) -> apperr::Result<Self> {
let f = init_if_needed()?;
@ -349,12 +385,16 @@ fn utext_access_impl<'a>(
let dirty = ut.a != tb.generation() as i64;
if dirty {
// The text buffer contents have changed.
// Invalidate both caches so that future calls don't mistakenly use them
// when they enter the for loop in the else branch below (`dirty == false`).
double_cache.cache[0].utf16_len = 0;
double_cache.cache[1].utf16_len = 0;
double_cache.cache[0].utf8_range = 0..0;
double_cache.cache[1].utf8_range = 0..0;
ut.a = tb.generation() as i64;
} else {
// Check if one of the caches already contains the requested range.
for (i, cache) in double_cache.cache.iter_mut().enumerate() {
if cache.utf8_range.contains(&index_contained) {
double_cache.mru = i != 0;
@ -443,13 +483,12 @@ fn utext_access_impl<'a>(
}
}
// TODO: This loop is the slow part of our uregex search. May be worth optimizing.
loop {
let Some(c) = it.next() else {
break;
};
// Thanks to our `if utf16_len >= utf16_limit` check,
// Thanks to our `if utf16_len >= UTF16_LEN_LIMIT` check,
// we can safely assume that this will fit.
unsafe {
let utf8_len_beg = utf8_len;
@ -515,7 +554,11 @@ extern "C" fn utext_map_native_index_to_utf16(ut: &icu_ffi::UText, native_index:
off_rel as i32
}
// Same reason here for not using a PhantomData marker as with `Text`.
/// A wrapper around ICU's `URegularExpression` struct.
///
/// # Safety
///
/// Warning! No lifetime tracking is done here.
pub struct Regex(&'static mut icu_ffi::URegularExpression);
impl Drop for Regex {
@ -526,8 +569,14 @@ impl Drop for Regex {
}
impl Regex {
/// Enable case-insensitive matching.
pub const CASE_INSENSITIVE: i32 = icu_ffi::UREGEX_CASE_INSENSITIVE;
/// If set, ^ and $ match the start and end of each line.
/// Otherwise, they match the start and end of the entire string.
pub const MULTILINE: i32 = icu_ffi::UREGEX_MULTILINE;
/// Treat the given pattern as a literal string.
pub const LITERAL: i32 = icu_ffi::UREGEX_LITERAL;
/// Constructs a regex, plain and simple. Read `uregex_open` docs.
@ -566,7 +615,7 @@ impl Regex {
}
/// Updates the regex pattern with the given text.
/// If the text contents have changed, you can pass the same text as you usued
/// If the text contents have changed, you can pass the same text as you used
/// initially and it'll trigger ICU to reload the text and invalidate its caches.
///
/// # Safety
@ -578,6 +627,7 @@ impl Regex {
unsafe { (f.uregex_setUText)(self.0, text.0 as *const _ as *mut _, &mut status) };
}
/// Sets the regex to the absolute offset in the underlying text.
pub fn reset(&mut self, index: usize) {
let f = assume_loaded();
let mut status = icu_ffi::U_ZERO_ERROR;
@ -611,6 +661,7 @@ impl Iterator for Regex {
static mut ROOT_COLLATOR: Option<*mut icu_ffi::UCollator> = None;
/// Compares two UTF-8 strings for sorting using ICU's collation algorithm.
pub fn compare_strings(a: &[u8], b: &[u8]) -> Ordering {
// OnceCell for people that want to put it into a static.
#[allow(static_mut_refs)]
@ -688,6 +739,10 @@ fn compare_strings_ascii(a: &[u8], b: &[u8]) -> Ordering {
static mut ROOT_CASEMAP: Option<*mut icu_ffi::UCaseMap> = None;
/// Case-folds the given UTF-8 string, which is similar to, but not the same as,
/// converting it to lower case.
///
/// Case folding differs from lowercasing in that the output is primarily useful
/// to machines for comparisons. It's like applying Unicode normalization.
pub fn fold_case<'a>(arena: &'a Arena, input: &str) -> ArenaString<'a> {
// OnceCell for people that want to put it into a static.
#[allow(static_mut_refs)]


@ -1,10 +1,17 @@
//! Parses VT sequences into input events.
//!
//! In the future, this should allow us to take apart the application and
//! support input schemes that aren't VT, such as UEFI or a GUI.
use crate::helpers::{CoordType, Point, Size};
use crate::vt;
// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`.
// The reason it's an awkward u32 and not a struct is to hopefully make ABIs easier later.
// Of course you could just translate on the ABI boundary, but my hope is that this
// design lets me realize some restrictions early on that I can't foresee yet.
/// Represents a key/modifier combination.
///
/// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`.
/// The reason it's an awkward u32 and not a struct is to hopefully make ABIs easier later.
/// Of course you could just translate on the ABI boundary, but my hope is that this
/// design lets me realize some restrictions early on that I can't foresee yet.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct InputKey(u32);
@ -47,6 +54,7 @@ impl InputKey {
}
}
/// A keyboard modifier. Ctrl/Alt/Shift.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct InputKeyMod(u32);
@ -83,8 +91,10 @@ impl std::ops::BitOrAssign for InputKeyMod {
}
}
// The codes defined here match the VK_* constants on Windows.
// It's a convenient way to handle keyboard input, even on other platforms.
/// Keyboard keys.
///
/// The codes defined here match the VK_* constants on Windows.
/// It's a convenient way to handle keyboard input, even on other platforms.
pub mod vk {
use super::InputKey;
@ -189,6 +199,7 @@ pub mod vk {
pub const F24: InputKey = InputKey::new(0x87);
}
/// Keyboard modifiers.
pub mod kbmod {
use super::InputKeyMod;
@ -203,12 +214,17 @@ pub mod kbmod {
pub const CTRL_ALT_SHIFT: InputKeyMod = InputKeyMod::new(0x07000000);
}
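The comment on `InputKey` mentions writing `kbmod::CTRL | vk::A`. The sketch below shows how that can work with two `BitOr` impls. The individual modifier bit values are hypothetical, since the diff only shows `CTRL_ALT_SHIFT = 0x07000000`. `0x41` is the Windows `VK_A` code.

```rust
use std::ops::BitOr;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct InputKey(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct InputKeyMod(u32);

// Hypothetical bit assignments; only CTRL | ALT | SHIFT == 0x07000000 is known.
const CTRL: InputKeyMod = InputKeyMod(0x0100_0000);
const ALT: InputKeyMod = InputKeyMod(0x0200_0000);
const SHIFT: InputKeyMod = InputKeyMod(0x0400_0000);
/// Windows virtual-key code for the A key.
const VK_A: InputKey = InputKey(0x41);

/// Combines modifiers, e.g. `CTRL | SHIFT`.
impl BitOr for InputKeyMod {
    type Output = InputKeyMod;
    fn bitor(self, rhs: InputKeyMod) -> InputKeyMod {
        InputKeyMod(self.0 | rhs.0)
    }
}

/// Lets you write `CTRL | VK_A` to get a combined `InputKey`.
impl BitOr<InputKey> for InputKeyMod {
    type Output = InputKey;
    fn bitor(self, rhs: InputKey) -> InputKey {
        InputKey(self.0 | rhs.0)
    }
}

impl InputKey {
    /// The key code without modifiers (low 24 bits in this sketch).
    fn key(self) -> InputKey {
        InputKey(self.0 & 0x00ff_ffff)
    }

    /// The modifiers (high 8 bits in this sketch).
    fn modifiers(self) -> InputKeyMod {
        InputKeyMod(self.0 & 0xff00_0000)
    }
}
```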
/// Text input.
///
/// "Keyboard" input is also "text" input and vice versa.
/// It differs in that text input can also be Unicode.
#[derive(Clone, Copy)]
pub struct InputText<'a> {
pub text: &'a str,
pub bracketed: bool,
}
/// Mouse input state. Up/Down, Left/Right, etc.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
pub enum InputMouseState {
#[default]
@ -224,21 +240,34 @@ pub enum InputMouseState {
Scroll,
}
/// Mouse input.
#[derive(Clone, Copy)]
pub struct InputMouse {
/// The state of the mouse. Up/Down, Left/Right, etc.
pub state: InputMouseState,
/// Any keyboard modifiers that are held down.
pub modifiers: InputKeyMod,
/// Position of the mouse in the viewport.
pub position: Point,
/// Scroll delta.
pub scroll: Point,
}
/// Primary result type of the parser.
pub enum Input<'input> {
/// Window resize event.
Resize(Size),
/// Text input.
///
/// Note that [`Input::Keyboard`] events can also be text.
Text(InputText<'input>),
/// Keyboard input.
Keyboard(InputKey),
/// Mouse input.
Mouse(InputMouse),
}
/// Parses VT sequences into input events.
pub struct Parser {
bracketed_paste: bool,
x10_mouse_want: bool,
@ -247,6 +276,9 @@ pub struct Parser {
}
impl Parser {
/// Creates a new parser that turns VT sequences into input events.
///
/// Keep the instance alive for the lifetime of the input stream.
pub fn new() -> Self {
Self {
bracketed_paste: false,
@ -256,7 +288,8 @@ impl Parser {
}
}
/// Turns VT sequences into keyboard, mouse, etc., inputs.
/// Takes a [`vt::Stream`] and returns a [`Stream`]
/// that turns VT sequences into input events.
pub fn parse<'parser, 'vt, 'input>(
&'parser mut self,
stream: vt::Stream<'vt, 'input>,
@ -265,15 +298,15 @@ impl Parser {
}
}
/// An iterator that parses VT sequences into input events.
///
/// Can't implement [`Iterator`], because this is a "lending iterator".
pub struct Stream<'parser, 'vt, 'input> {
parser: &'parser mut Parser,
stream: vt::Stream<'vt, 'input>,
}
impl<'input> Stream<'_, '_, 'input> {
/// Parses the next input action from the previously given input.
///
/// Can't implement Iterator, because this is a "lending iterator".
#[allow(clippy::should_implement_trait)]
pub fn next(&mut self) -> Option<Input<'input>> {
loop {
@ -446,6 +479,17 @@ impl<'input> Stream<'_, '_, 'input> {
}
}
/// Once we encounter the start of a bracketed paste,
/// we seek to the end of the paste in this function.
///
/// A bracketed paste is basically:
/// ```text
/// <ESC>[200~ lots of text <ESC>[201~
/// ```
///
/// The text in between is expected to be taken literally.
/// It can be anything, though, including other escape sequences.
/// This is the reason why this is a separate method.
#[cold]
fn handle_bracketed_paste(&mut self) -> Option<Input<'input>> {
let beg = self.stream.offset();

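This sketch illustrates the framing that `handle_bracketed_paste` deals with. A paste starts with `ESC [ 200 ~` and ends with `ESC [ 201 ~`, and the example isolates the seeking step. `split_bracketed_paste` and `PASTE_END` are hypothetical names, not this crate's API. The sketch assumes two things: the opening sequence was already consumed, and the terminator arrives in the same chunk. The real parser also has to cope with pastes that are split across reads.

```rust
/// Terminator of a bracketed paste: `ESC [ 201 ~`.
const PASTE_END: &str = "\x1b[201~";

/// Hypothetical helper: given the input that follows `ESC [ 200 ~`,
/// returns the literal paste contents and the remaining input.
/// Returns `None` if the terminator hasn't arrived yet.
fn split_bracketed_paste(input: &str) -> Option<(&str, &str)> {
    let end = input.find(PASTE_END)?;
    // Everything before the terminator is taken literally,
    // even if it contains other escape sequences.
    Some((&input[..end], &input[end + PASTE_END.len()..]))
}
```

Because the contents are returned verbatim, an embedded `ESC [ 31 m` stays part of the pasted text and is not interpreted as a color change.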

@ -1,7 +1,10 @@
//! This module implements Oklab as defined at: https://bottosson.github.io/posts/oklab/
//! Oklab color space conversions.
//!
//! Implements Oklab as defined at: <https://bottosson.github.io/posts/oklab/>
#![allow(clippy::excessive_precision)]
/// An Oklab color with alpha.
pub struct Lab {
pub l: f32,
pub a: f32,
@ -9,6 +12,7 @@ pub struct Lab {
pub alpha: f32,
}
/// Converts a 32-bit sRGB color to Oklab.
///
/// The color is packed as `0xAABBGGRR`, i.e. red in the lowest byte.
pub fn srgb_to_oklab(color: u32) -> Lab {
let r = SRGB_TO_RGB_LUT[(color & 0xff) as usize];
let g = SRGB_TO_RGB_LUT[((color >> 8) & 0xff) as usize];
@ -31,6 +35,7 @@ pub fn srgb_to_oklab(color: u32) -> Lab {
}
}
/// Converts an Oklab color to a 32-bit sRGB color.
pub fn oklab_to_srgb(c: Lab) -> u32 {
let l_ = c.l + 0.3963377774 * c.a + 0.2158037573 * c.b;
let m_ = c.l - 0.1055613458 * c.a - 0.0638541728 * c.b;
@ -57,6 +62,7 @@ pub fn oklab_to_srgb(c: Lab) -> u32 {
r | (g << 8) | (b << 16) | (a << 24)
}
/// Blends two 32-bit sRGB colors in the Oklab color space.
pub fn oklab_blend(dst: u32, src: u32) -> u32 {
let dst = srgb_to_oklab(dst);
let src = srgb_to_oklab(src);

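For reference, the core of the Oklab transform from Björn Ottosson's post linked above consists of two 3x3 matrices with a cube root in between. This sketch works on linear sRGB in `f32`. It leaves out the gamma decoding (the `SRGB_TO_RGB_LUT` in the real code), alpha, and the `0xAABBGGRR` packing. The function names are hypothetical.

```rust
/// An Oklab color (alpha omitted for brevity).
#[derive(Debug, Clone, Copy)]
struct Lab {
    l: f32,
    a: f32,
    b: f32,
}

/// Linear (not gamma-encoded) sRGB to Oklab, using the published matrices.
fn linear_srgb_to_oklab(r: f32, g: f32, b: f32) -> Lab {
    // Linear sRGB -> approximate cone responses (LMS).
    let l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b;
    let m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b;
    let s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b;
    // Non-linearity: cube root.
    let (l_, m_, s_) = (l.cbrt(), m.cbrt(), s.cbrt());
    // LMS' -> Lab.
    Lab {
        l: 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        a: 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        b: 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
    }
}

/// Oklab back to linear sRGB (the inverse of the above).
fn oklab_to_linear_srgb(c: Lab) -> (f32, f32, f32) {
    let l_ = c.l + 0.3963377774 * c.a + 0.2158037573 * c.b;
    let m_ = c.l - 0.1055613458 * c.a - 0.0638541728 * c.b;
    let s_ = c.l - 0.0894841775 * c.a - 1.2914855480 * c.b;
    // Undo the cube root.
    let (l, m, s) = (l_ * l_ * l_, m_ * m_ * m_, s_ * s_ * s_);
    (
        4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
        -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
        -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
    )
}
```

Blending in this space, as `oklab_blend` does, means interpolating `l`, `a` and `b` before converting back. This avoids the muddy midpoints of naive sRGB averaging.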

@ -1,3 +1,5 @@
//! Path related helpers.
use std::ffi::OsStr;
use std::path::{Component, MAIN_SEPARATOR_STR, Path, PathBuf};


@ -1,13 +1,13 @@
//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself
//! why we don't just use it: Simply put, this is optimized for short inputs.
//! `memchr`, but with two needles.
use std::ptr;
use super::distance;
/// memchr(), but with two needles.
/// Returns the index of the first occurrence of either needle in the `haystack`.
/// If no needle is found, `haystack.len()` is returned.
/// `memchr`, but with two needles.
///
/// Returns the index of the first occurrence of either needle in the
/// `haystack`. If no needle is found, `haystack.len()` is returned.
/// `offset` specifies the index to start searching from.
pub fn memchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize {
unsafe {

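A plain scalar version of the documented `memchr2` contract can serve as a test oracle for the SIMD code. This is a hypothetical sketch. Clamping an out-of-range `offset` to `haystack.len()` is an assumption; the doc comment does not state it.

```rust
/// Hypothetical scalar reference: index of the first `needle1` or `needle2`
/// at or after `offset`, or `haystack.len()` if neither occurs.
fn memchr2_reference(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize {
    // Assumption: an out-of-range offset behaves like "nothing left to search".
    let start = offset.min(haystack.len());
    haystack[start..]
        .iter()
        .position(|&b| b == needle1 || b == needle2)
        .map_or(haystack.len(), |i| start + i)
}
```

`newlines_forward` calls `memchr2(b'\n', b'\n', ...)`, passing the same byte twice. Doing so turns the function into a single-needle search.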

@ -1,16 +1,15 @@
//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself
//! why we don't just use it: Simply put, this is optimized for short inputs.
//! `memrchr`, but with two needles.
use std::ptr;
use super::distance;
/// Same as `memchr2`, but searches from the end of the haystack.
/// If no needle is found, 0 is returned.
/// `memrchr`, but with two needles.
///
/// *NOTE: Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned.*
/// This is because this function is primarily used for `unicode::newlines_backward`,
/// which needs exactly that.
/// If no needle is found, 0 is returned.
/// Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned.
/// This is because this function is primarily used for
/// `unicode::newlines_backward`, which needs exactly that.
pub fn memrchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> Option<usize> {
unsafe {
let beg = haystack.as_ptr();


@ -1,21 +1,25 @@
//! This module provides a `memset` function for "arbitrary" sizes (1/2/4/8 bytes), as the regular `memset`
//! is only implemented for byte-sized arrays. This allows us to more aggressively unroll loops and to
//! use AVX2 on x64 for the non-byte-sized cases and opens the door to compiling with `-Copt-level=s`.
//! `memset` for arbitrary sizes (1/2/4/8 bytes).
//!
//! This implementation uses SWAR to only have a single implementation for all 4 sizes: By duplicating smaller
//! types into a larger `u64` register we can treat all sizes as if they were `u64`. The only thing we need
//! to take care of then, is the tail end of the array, where we need to write 0-7 additional bytes.
//! LLVM only emits calls to the C `memset` function for byte-sized types
//! (or zero fills). We, however, need to fill other types as well. For those,
//! LLVM generates SIMD loops at higher optimization levels. With `opt-level="s"`,
//! however, it only generates a trivial loop, which is too slow for our needs.
//!
//! This implementation uses SWAR to only have a single implementation for all
//! 4 sizes: by duplicating smaller types into a larger `u64` register, we can
//! treat all sizes as if they were `u64`. The only thing we need to take care
//! of is the tail end of the array, where 0-7 additional bytes must be written.
use std::mem;
use super::distance;
/// A trait to mark types that are safe to use with `memset`.
/// A marker trait for types that are safe to `memset`.
///
/// # Safety
///
/// Just like with C's `memset`, bad things happen
/// if you use this with types that are non-trivial.
/// if you use this with non-trivial types.
pub unsafe trait MemsetSafe: Copy {}
unsafe impl MemsetSafe for u8 {}
@ -30,6 +34,7 @@ unsafe impl MemsetSafe for i32 {}
unsafe impl MemsetSafe for i64 {}
unsafe impl MemsetSafe for isize {}
/// Fills a slice with the given value.
#[inline]
pub fn memset<T: MemsetSafe>(dst: &mut [T], val: T) {
unsafe {

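The SWAR idea from the module docs can be sketched safely on a byte buffer. Multiplying a value by a constant with a `1` in every lane broadcasts it across a `u64`. Because the pattern repeats every `size_of::<T>()` bytes, the 0-7 byte tail is simply a prefix of that pattern. `splat_*` and `swar_fill` are hypothetical helpers. The real `memset` works on `&mut [T]` with unsafe code instead.

```rust
/// Broadcasts a byte into all eight lanes of a `u64`.
fn splat_u8(val: u8) -> u64 {
    val as u64 * 0x0101_0101_0101_0101
}

/// Broadcasts a `u16` into all four 16-bit lanes of a `u64`.
fn splat_u16(val: u16) -> u64 {
    val as u64 * 0x0001_0001_0001_0001
}

/// Broadcasts a `u32` into both 32-bit lanes of a `u64`.
fn splat_u32(val: u32) -> u64 {
    val as u64 * 0x0000_0001_0000_0001
}

/// Fills `dst` with repeated native-endian bytes of `pattern`.
/// `dst.len()` should be a multiple of the element size that was splatted,
/// so the tail is always a whole number of elements.
fn swar_fill(dst: &mut [u8], pattern: u64) {
    let bytes = pattern.to_ne_bytes();
    let mut chunks = dst.chunks_exact_mut(8);
    // Bulk: one 8-byte store per chunk, regardless of element size.
    for chunk in &mut chunks {
        chunk.copy_from_slice(&bytes);
    }
    // Tail: the remaining 0-7 bytes are a prefix of the same pattern.
    let tail = chunks.into_remainder();
    let n = tail.len();
    tail.copy_from_slice(&bytes[..n]);
}
```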

@ -1,3 +1,5 @@
//! Provides various high-throughput utilities.
mod memchr2;
mod memrchr2;
mod memset;


@ -1,3 +1,5 @@
//! Platform abstractions.
use std::fs::File;
use std::path::Path;


@ -1,3 +1,8 @@
//! Unix-specific platform code.
//!
//! Read the `windows` module for reference.
//! TODO: This reminds me that the sys API should probably be a trait.
use std::ffi::{CStr, c_int, c_void};
use std::fs::{self, File};
use std::mem::{self, MaybeUninit};


@ -73,6 +73,7 @@ extern "system" fn console_ctrl_handler(_ctrl_type: u32) -> Foundation::BOOL {
1
}
/// Initializes the platform-specific state.
pub fn init() -> apperr::Result<Deinit> {
unsafe {
// Get the stdin and stdout handles first, so that if this function fails,
@ -151,6 +152,7 @@ impl Drop for Deinit {
}
}
/// Switches the terminal into raw mode, etc.
pub fn switch_modes() -> apperr::Result<()> {
unsafe {
check_bool_return(Console::SetConsoleCtrlHandler(Some(console_ctrl_handler), 1))?;
@ -180,6 +182,10 @@ pub fn switch_modes() -> apperr::Result<()> {
}
}
/// During startup we need to get the window size from the terminal.
/// Because I didn't want to type a bunch of code, this function tells
/// [`read_stdin`] to inject a fake sequence, which gets picked up by
/// the input parser and provided to the TUI code.
pub fn inject_window_size_into_stdin() {
unsafe {
STATE.inject_resize = true;
@ -202,9 +208,11 @@ fn get_console_size() -> Option<Size> {
/// Reads from stdin.
///
/// Returns `None` if there was an error reading from stdin.
/// Returns `Some("")` if the given timeout was reached.
/// Otherwise, it returns the read, non-empty string.
/// # Returns
///
/// * `None` if there was an error reading from stdin.
/// * `Some("")` if the given timeout was reached.
/// * Otherwise, it returns the read, non-empty string.
pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option<ArenaString<'_>> {
let scratch = scratch_arena(Some(arena));
@ -351,6 +359,10 @@ pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option<ArenaStr
Some(text)
}
/// Writes a string to stdout.
///
/// Use this instead of `print!` or `println!` to avoid
/// the overhead of Rust's stdio handling, which we don't need.
pub fn write_stdout(text: &str) {
unsafe {
let mut offset = 0;
@ -368,6 +380,12 @@ pub fn write_stdout(text: &str) {
}
}
/// Checks if the stdin handle is redirected to a file, etc.
///
/// # Returns
///
/// * `Some(file)` if stdin is redirected.
/// * Otherwise, `None`.
pub fn open_stdin_if_redirected() -> Option<File> {
unsafe {
let handle = Console::GetStdHandle(Console::STD_INPUT_HANDLE);
@ -376,12 +394,14 @@ pub fn open_stdin_if_redirected() -> Option<File> {
}
}
/// A unique identifier for a file.
#[derive(Clone)]
#[repr(transparent)]
pub struct FileId(FileSystem::FILE_ID_INFO);
impl PartialEq for FileId {
fn eq(&self, other: &Self) -> bool {
// Lowers to an efficient word-wise comparison.
const SIZE: usize = std::mem::size_of::<FileSystem::FILE_ID_INFO>();
let a: &[u8; SIZE] = unsafe { mem::transmute(&self.0) };
let b: &[u8; SIZE] = unsafe { mem::transmute(&other.0) };
@ -405,6 +425,10 @@ pub fn file_id(file: &File) -> apperr::Result<FileId> {
}
}
/// Canonicalizes the given path.
///
/// This differs from [`fs::canonicalize`] in that it strips the `\\?\`
/// (verbatim path) prefix on Windows, because it's confusing and ugly when displayed.
pub fn canonicalize(path: &Path) -> std::io::Result<PathBuf> {
let mut path = fs::canonicalize(path)?;
let path = path.as_mut_os_string();
@ -421,8 +445,8 @@ pub fn canonicalize(path: &Path) -> std::io::Result<PathBuf> {
}
/// Reserves a virtual memory region of the given size.
/// To commit the memory, use `virtual_commit`.
/// To release the memory, use `virtual_release`.
/// To commit the memory, use [`virtual_commit`].
/// To release the memory, use [`virtual_release`].
///
/// # Safety
///
@ -456,7 +480,7 @@ pub unsafe fn virtual_reserve(size: usize) -> apperr::Result<NonNull<u8>> {
/// # Safety
///
/// This function is unsafe because it uses raw pointers.
/// Make sure to only pass pointers acquired from `virtual_reserve`.
/// Make sure to only pass pointers acquired from [`virtual_reserve`].
pub unsafe fn virtual_release(base: NonNull<u8>, size: usize) {
unsafe {
Memory::VirtualFree(base.as_ptr() as *mut _, size, Memory::MEM_RELEASE);
@ -468,8 +492,8 @@ pub unsafe fn virtual_release(base: NonNull<u8>, size: usize) {
/// # Safety
///
/// This function is unsafe because it uses raw pointers.
/// Make sure to only pass pointers acquired from `virtual_reserve`
/// and to pass a size less than or equal to the size passed to `virtual_reserve`.
/// Make sure to only pass pointers acquired from [`virtual_reserve`]
/// and to pass a size less than or equal to the size passed to [`virtual_reserve`].
pub unsafe fn virtual_commit(base: NonNull<u8>, size: usize) -> apperr::Result<()> {
unsafe {
check_ptr_return(Memory::VirtualAlloc(
@ -511,14 +535,17 @@ pub unsafe fn get_proc_address<T>(handle: NonNull<c_void>, name: &CStr) -> apper
}
}
/// Loads the "common" portion of ICU4C.
pub fn load_libicuuc() -> apperr::Result<NonNull<c_void>> {
unsafe { load_library(w!("icuuc.dll")) }
}
/// Loads the internationalization portion of ICU4C.
pub fn load_libicui18n() -> apperr::Result<NonNull<c_void>> {
unsafe { load_library(w!("icuin.dll")) }
}
/// Returns a list of preferred languages for the current user.
pub fn preferred_languages(arena: &Arena) -> Vec<ArenaString, &Arena> {
// If the languages returned by GetUserPreferredUILanguages() don't fit into 512 characters,
// honestly, just give up. How many languages do you realistically need?
@ -606,6 +633,7 @@ pub(crate) fn io_error_to_apperr(err: std::io::Error) -> apperr::Error {
gle_to_apperr(err.raw_os_error().unwrap_or(0) as u32)
}
/// Formats a platform error code into a human-readable string.
pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result {
unsafe {
let mut ptr: *mut u8 = null_mut();
@ -635,6 +663,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re
}
}
/// Checks if the given error is a "file not found" error.
pub fn apperr_is_not_found(err: apperr::Error) -> bool {
err == gle_to_apperr(Foundation::ERROR_FILE_NOT_FOUND)
}

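This is a rough sketch of the prefix stripping described for `canonicalize`. The hypothetical `strip_verbatim_prefix` below only strips `\\?\` when a drive letter follows. Verbatim UNC paths (`\\?\UNC\server\share`) are left untouched, because removing the prefix alone wouldn't produce a valid `\\server\share` path. How the real function treats them isn't visible in this diff.

```rust
/// Hypothetical helper: turns `\\?\C:\foo` into `C:\foo` and leaves
/// every other path (including verbatim UNC paths) unchanged.
fn strip_verbatim_prefix(path: &str) -> &str {
    if let Some(rest) = path.strip_prefix(r"\\?\") {
        let bytes = rest.as_bytes();
        // Only strip when a drive letter follows, e.g. `C:\`.
        let is_drive = bytes.len() >= 3
            && bytes[0].is_ascii_alphabetic()
            && bytes[1] == b':'
            && bytes[2] == b'\\';
        if is_drive {
            return rest;
        }
    }
    path
}
```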
File diff suppressed because it is too large


@ -6,17 +6,24 @@ use crate::document::ReadableDocument;
use crate::helpers::{CoordType, Point};
use crate::simd::{memchr2, memrchr2};
/// Stores a position inside a [`ReadableDocument`].
///
/// The cursor tracks both the absolute byte-offset,
/// as well as the position in terminal-related coordinates.
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor {
/// Offset in bytes within the buffer.
pub offset: usize,
/// Position in the buffer in lines (.y) and grapheme clusters (.x).
///
/// Line wrapping has NO influence on this.
pub logical_pos: Point,
/// Position in the buffer in laid out rows (.y) and columns (.x).
///
/// Line wrapping has an influence on this.
pub visual_pos: Point,
/// Horizontal position in visual columns.
///
/// Line wrapping has NO influence on this and if word wrap is disabled,
/// it's identical to `visual_pos.x`. This is useful for calculating tab widths.
pub column: CoordType,
@ -27,6 +34,7 @@ pub struct Cursor {
pub wrap_opp: bool,
}
/// Your entrypoint to navigating inside a [`ReadableDocument`].
#[derive(Clone)]
pub struct MeasurementConfig<'doc> {
buffer: &'doc dyn ReadableDocument,
@ -36,25 +44,41 @@ pub struct MeasurementConfig<'doc> {
}
impl<'doc> MeasurementConfig<'doc> {
/// Creates a new [`MeasurementConfig`] for the given document.
pub fn new(buffer: &'doc dyn ReadableDocument) -> Self {
Self { buffer, tab_size: 8, word_wrap_column: 0, cursor: Default::default() }
}
/// Sets the tab size.
///
/// Defaults to 8, because that's what a tab in terminals evaluates to.
pub fn with_tab_size(mut self, tab_size: CoordType) -> Self {
self.tab_size = tab_size.max(1);
self
}
/// You want word wrap? Set it here!
///
/// Defaults to 0, which means no word wrap.
pub fn with_word_wrap_column(mut self, word_wrap_column: CoordType) -> Self {
self.word_wrap_column = word_wrap_column;
self
}
/// Sets the initial cursor to the given position.
///
/// WARNING: While the code doesn't panic if the cursor is invalid,
/// the results will obviously be complete garbage.
pub fn with_cursor(mut self, cursor: Cursor) -> Self {
self.cursor = cursor;
self
}
/// Navigates **forward** to the given absolute offset.
///
/// # Returns
///
/// The cursor position after the navigation.
pub fn goto_offset(&mut self, offset: usize) -> Cursor {
self.cursor = Self::measure_forward(
self.tab_size,
@ -68,6 +92,13 @@ impl<'doc> MeasurementConfig<'doc> {
self.cursor
}
/// Navigates **forward** to the given logical position.
///
/// Logical positions are in lines and grapheme clusters.
///
/// # Returns
///
/// The cursor position after the navigation.
pub fn goto_logical(&mut self, logical_target: Point) -> Cursor {
self.cursor = Self::measure_forward(
self.tab_size,
@ -81,6 +112,13 @@ impl<'doc> MeasurementConfig<'doc> {
self.cursor
}
/// Navigates **forward** to the given visual position.
///
/// Visual positions are in laid out rows and columns.
///
/// # Returns
///
/// The cursor position after the navigation.
pub fn goto_visual(&mut self, visual_target: Point) -> Cursor {
self.cursor = Self::measure_forward(
self.tab_size,
@ -94,6 +132,7 @@ impl<'doc> MeasurementConfig<'doc> {
self.cursor
}
/// Returns the current cursor position.
pub fn cursor(&self) -> Cursor {
self.cursor
}
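The doc comment on `Cursor::column` notes that the field is useful for calculating tab widths. A minimal sketch shows why. Tab stops are counted from the start of the logical line, so the next stop has to be derived from the unwrapped column. It cannot come from `visual_pos.x`, which presumably restarts on every wrapped row. `next_tab_stop` is hypothetical, and `i32` stands in for `CoordType`.

```rust
/// Hypothetical helper: the column a tab at `column` advances to.
/// Mirrors `with_tab_size`, which clamps the tab size to at least 1.
fn next_tab_stop(column: i32, tab_size: i32) -> i32 {
    let tab_size = tab_size.max(1);
    column + tab_size - column % tab_size
}
```

The width of the tab itself is then `next_tab_stop(column, tab_size) - column`.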
@ -447,10 +486,33 @@ impl<'doc> MeasurementConfig<'doc> {
}
}
// TODO: This code could be optimized by replacing memchr with manual line counting.
// If `line_stop` is very far away, we could accumulate newline counts horizontally
// in a AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal
// sum via `_mm256_sad_epu8` yielding us the newline count in the last block.
/// Seeks forward to the given line start.
///
/// If given a piece of `text`, and assuming you're currently at `offset` which
/// is on the logical line `line`, this will seek forward until the logical line
/// `line_stop` is reached. For instance, if `line` is 0 and `line_stop` is 2,
/// it'll seek forward past 2 line feeds.
///
/// This function always stops exactly past a line feed
/// and thus returns a position at the start of a line.
///
/// # Warning
///
/// If the end of `text` is hit before reaching `line_stop`, the function
/// will return an offset of `text.len()`, not at the start of a line.
///
/// # Parameters
///
/// * `text`: The text to search in.
/// * `offset`: The offset to start searching from.
/// * `line`: The current line.
/// * `line_stop`: The line to stop at.
///
/// # Returns
///
/// A tuple consisting of:
/// * The new offset.
/// * The line number that was reached.
pub fn newlines_forward(
text: &[u8],
mut offset: usize,
@ -467,6 +529,13 @@ pub fn newlines_forward(
offset = offset.min(len);
loop {
// TODO: This code could be optimized by replacing memchr with manual line counting.
//
// If `line_stop` is very far away, we could accumulate newline counts horizontally
// in an AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal
// sum via `_mm256_sad_epu8` yielding us the newline count in the last block.
//
// We could also just use `_mm256_sad_epu8` on each fetch as-is.
offset = memchr2(b'\n', b'\n', text, offset);
if offset >= len {
break;
@ -482,9 +551,18 @@ pub fn newlines_forward(
(offset, line)
}
// Seeks to the start of the given line.
// No matter what parameters are given, it only returns an offset at the start of a line.
// Put differently, even if `line == line_stop`, it'll seek backward to the line start.
/// Seeks backward to the given line start.
///
/// See [`newlines_forward`] for details.
/// This function does almost the same thing, but in reverse.
///
/// # Warning
///
/// In addition to the notes in [`newlines_forward`]:
///
/// No matter what parameters are given, [`newlines_backward`] only returns an
/// offset at the start of a line. Put differently, even if `line == line_stop`,
/// it'll seek backward to the line start.
pub fn newlines_backward(
text: &[u8],
mut offset: usize,
@ -506,6 +584,10 @@ pub fn newlines_backward(
}
}
/// Returns an offset past a newline.
///
/// If `offset` is right in front of a newline,
/// this will return the offset past said newline.
pub fn skip_newline(text: &[u8], mut offset: usize) -> usize {
if offset >= text.len() {
return offset;
@ -522,6 +604,7 @@ pub fn skip_newline(text: &[u8], mut offset: usize) -> usize {
offset
}
/// Strips a trailing newline from the given text.
pub fn strip_newline(mut text: &[u8]) -> &[u8] {
// Rust generates surprisingly tight assembly for this.
if text.last() == Some(&b'\n') {

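The contract documented for `newlines_forward` can be restated as a short scalar loop. The loop may help when reading or testing the `memchr2`-based version. This is a hypothetical sketch: `i32` stands in for `CoordType`. Its behavior when `line >= line_stop` on entry (it returns `offset` unchanged) is an assumption.

```rust
/// Hypothetical scalar sketch of the documented `newlines_forward` contract.
fn newlines_forward_sketch(text: &[u8], offset: usize, mut line: i32, line_stop: i32) -> (usize, i32) {
    let mut offset = offset.min(text.len());
    while line < line_stop {
        // Find the next line feed at or after `offset`.
        match text[offset..].iter().position(|&b| b == b'\n') {
            // Step past it: we're now at the start of the next line.
            Some(i) => {
                offset += i + 1;
                line += 1;
            }
            // End of text before `line_stop`: not necessarily a line start.
            None => return (text.len(), line),
        }
    }
    (offset, line)
}
```

The last two test cases reproduce the documented warning: asking for line 5 in a three-line text stops at `text.len()` with `line == 2`.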

@ -1,3 +1,5 @@
//! Everything related to Unicode lives here.
mod measurement;
mod tables;
mod utf8;


@ -1,5 +1,14 @@
use std::{hint, iter};
/// An iterator over UTF-8 encoded characters.
///
/// This differs from [`std::str::Chars`] in that it works on unsanitized
/// byte slices and transparently replaces invalid UTF-8 sequences with U+FFFD.
///
/// This follows ICU's bitmask approach for `U8_NEXT_OR_FFFD` relatively
/// closely. This is important for compatibility, because it implements the
/// WHATWG recommendation for UTF-8 error recovery. It's also helpful, because
/// the excellent folks at ICU have probably spent a lot of time optimizing it.
#[derive(Clone, Copy)]
pub struct Utf8Chars<'a> {
source: &'a [u8],
@ -7,30 +16,39 @@ pub struct Utf8Chars<'a> {
}
impl<'a> Utf8Chars<'a> {
/// Creates a new `Utf8Chars` iterator starting at the given `offset`.
pub fn new(source: &'a [u8], offset: usize) -> Self {
Self { source, offset }
}
/// Returns the byte slice this iterator was created with.
pub fn source(&self) -> &'a [u8] {
self.source
}
/// Checks if the source is empty.
pub fn is_empty(&self) -> bool {
self.source.is_empty()
}
/// Returns the length of the source.
pub fn len(&self) -> usize {
self.source.len()
}
/// Returns the current offset in the byte slice.
///
/// This will be past the last returned character.
pub fn offset(&self) -> usize {
self.offset
}
/// Sets the offset to continue iterating from.
pub fn seek(&mut self, offset: usize) {
self.offset = offset;
}
/// Returns `true` if `next` will return another character.
pub fn has_next(&self) -> bool {
self.offset < self.source.len()
}
@ -39,9 +57,6 @@ impl<'a> Utf8Chars<'a> {
// performance actually suffers when this gets inlined.
#[cold]
fn next_slow(&mut self, c: u8) -> char {
// See: https://datatracker.ietf.org/doc/html/rfc3629
// as well as ICU's `utf8.h` for the bitmask approach.
if self.offset >= self.source.len() {
return Self::fffd();
}
@ -114,12 +129,10 @@ impl<'a> Utf8Chars<'a> {
// The trail byte is the index and the lead byte mask is the value.
// This is because the split at 0x90 requires more bits than fit into a u8.
const TRAIL1_LEAD_BITS: [u8; 16] = [
// +------ 0xF4 lead
// |+----- 0xF3 lead
// ||+---- 0xF2 lead
// |||+--- 0xF1 lead
// ||||+-- 0xF0 lead
// vvvvv
// --------- 0xF4 lead
// | ...
// | +---- 0xF0 lead
// v v
0b_00000, //
0b_00000, //
0b_00000, //
@ -143,6 +156,8 @@ impl<'a> Utf8Chars<'a> {
cp &= !0xF0;
// Now we can verify if it's actually <= 0xF4.
// Curiously, this if condition does a lot of heavy lifting for
// performance (+13%). I think it's just a coincidence though.
if cp > 4 {
return Self::fffd();
}
@ -191,7 +206,8 @@ impl<'a> Utf8Chars<'a> {
}
}
// Improves performance by ~5% and reduces code size.
// This simultaneously serves as a `cold_path` marker.
// It improves performance by ~5% and reduces code size.
#[cold]
#[inline(always)]
fn fffd() -> char {
@ -202,8 +218,6 @@ impl<'a> Utf8Chars<'a> {
impl Iterator for Utf8Chars<'_> {
type Item = char;
// At opt-level="s", this function doesn't get inlined,
// but performance greatly suffers in that case.
#[inline]
fn next(&mut self) -> Option<Self::Item> {
if self.offset >= self.source.len() {

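The standard library makes a convenient oracle for testing `Utf8Chars`. As we understand it, `std::str::from_utf8` reports each invalid sequence through `Utf8Error::valid_up_to` and `Utf8Error::error_len`. It uses the same "maximal subpart" rule that the WHATWG recommendation prescribes, so emitting one U+FFFD per reported error should yield comparable output. `decode_lossy` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical reference decoder: replaces each invalid sequence reported
/// by `std::str::from_utf8` with one U+FFFD and keeps going.
fn decode_lossy(mut bytes: &[u8]) -> Vec<char> {
    let mut out = Vec::new();
    loop {
        match std::str::from_utf8(bytes) {
            Ok(valid) => {
                out.extend(valid.chars());
                return out;
            }
            Err(err) => {
                let (valid, rest) = bytes.split_at(err.valid_up_to());
                // The prefix up to `valid_up_to()` is guaranteed to be valid UTF-8.
                out.extend(std::str::from_utf8(valid).unwrap().chars());
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid subpart and continue after it.
                    Some(len) => bytes = &rest[len..],
                    // `None`: the input ended in the middle of a sequence.
                    None => return out,
                }
            }
        }
    }
}
```

For example, a truncated 4-byte sequence (`F0 9F 98` followed by `b`) becomes a single U+FFFD. An encoded surrogate (`ED A0 80`) becomes three.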

@ -1,19 +1,38 @@
//! Our VT parser.
use std::{mem, time};
use crate::simd::memchr2;
/// The parser produces these tokens.
pub enum Token<'parser, 'input> {
/// A bunch of text. Doesn't contain any control characters.
Text(&'input str),
/// A single control character, like backspace or return.
Ctrl(char),
/// We encountered `ESC x` and this contains `x`.
Esc(char),
/// We encountered `ESC O x` and this contains `x`.
SS3(char),
/// A CSI sequence started with `ESC [`.
///
/// They are the most common escape sequences. See [`Csi`].
Csi(&'parser Csi),
/// An OSC sequence started with `ESC ]`.
///
/// The sequence may be split up into multiple tokens if the input
/// is given in chunks. This is indicated by the `partial` field.
Osc { data: &'input str, partial: bool },
/// A DCS sequence started with `ESC P`.
///
/// The sequence may be split up into multiple tokens if the input
/// is given in chunks. This is indicated by the `partial` field.
Dcs { data: &'input str, partial: bool },
}
/// Stores the state of the parser.
#[derive(Clone, Copy)]
pub enum State {
enum State {
Ground,
Esc,
Ss3,
@ -24,10 +43,20 @@ pub enum State {
DcsEsc,
}
/// A single CSI sequence, parsed for your convenience.
pub struct Csi {
/// The parameters of the CSI sequence.
pub params: [u16; 32],
/// The number of parameters stored in [`Csi::params`].
pub param_count: usize,
/// The private byte, if any. `'\0'` if none.
///
/// The private byte is the first character right after the
/// `ESC [` sequence. It is usually a `?` or `<`.
pub private_byte: char,
/// The final byte of the CSI sequence.
///
/// This is the last character of the sequence, e.g. `m` or `H`.
pub final_byte: char,
}
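To make the `Csi` fields concrete, here is a simplified, hypothetical parser for a single complete CSI sequence. It differs from the real parser in several ways:

* It uses a `Vec` instead of the fixed `[u16; 32]` array.
* It ignores intermediate bytes.
* It treats an empty parameter as `0`.

The real parser is a streaming state machine and differs in these details.

```rust
/// Hypothetical, simplified stand-in for `Csi`.
#[derive(Debug)]
struct MiniCsi {
    params: Vec<u16>,
    /// `'\0'` if there is no private byte.
    private_byte: char,
    final_byte: char,
}

/// Parses one complete CSI sequence such as `ESC [ ? 1049 h`.
/// Returns `None` for incomplete or unsupported input.
fn parse_csi(s: &str) -> Option<MiniCsi> {
    let body = s.strip_prefix("\x1b[")?;
    let mut chars = body.chars().peekable();

    // Optional private byte (`<` through `?`) right after `ESC [`.
    let mut private_byte = '\0';
    if let Some(&c) = chars.peek() {
        if ('<'..='?').contains(&c) {
            private_byte = c;
            chars.next();
        }
    }

    let mut params = Vec::new();
    let mut current: u16 = 0;
    for c in chars {
        match c {
            // Accumulate decimal digits, saturating instead of overflowing.
            '0'..='9' => {
                let digit = c.to_digit(10).unwrap() as u16;
                current = current.saturating_mul(10).saturating_add(digit);
            }
            // `;` separates parameters.
            ';' => {
                params.push(current);
                current = 0;
            }
            // A final byte (0x40 to 0x7E) terminates the sequence.
            '\x40'..='\x7e' => {
                params.push(current);
                return Some(MiniCsi { params, private_byte, final_byte: c });
            }
            _ => return None,
        }
    }
    // Ran out of input before the final byte.
    None
}
```

For instance, an SGR mouse report such as `ESC [ < 0 ; 10 ; 5 M` yields `private_byte == '<'`, `params == [0, 10, 5]` and `final_byte == 'M'`.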
@ -73,6 +102,9 @@ impl Parser {
}
}
/// An iterator that parses VT sequences into [`Token`]s.
///
/// Can't implement [`Iterator`], because this is a "lending iterator".
pub struct Stream<'parser, 'input> {
parser: &'parser mut Parser,
input: &'input str,
@ -80,10 +112,12 @@ pub struct Stream<'parser, 'input> {
}
impl<'parser, 'input> Stream<'parser, 'input> {
/// Returns the input that is being parsed.
pub fn input(&self) -> &'input str {
self.input
}
/// Returns the current parser offset.
pub fn offset(&self) -> usize {
self.off
}
@ -99,8 +133,6 @@ impl<'parser, 'input> Stream<'parser, 'input> {
}
/// Parses the next VT sequence from the previously given input.
///
/// Can't implement Iterator, because this is a "lending iterator".
#[allow(clippy::should_implement_trait)]
pub fn next(&mut self) -> Option<Token<'parser, 'input>> {
// I don't know how to tell Rust that `self.parser` and its lifetime

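Both `Stream` types note that they can't implement `Iterator` because they are lending iterators. The following is a small, hypothetical illustration. Each item borrows the stream's own scratch buffer, much like `Token::Csi` borrows the `Csi` stored in the parser. As a result, the caller drives the stream with `while let` instead of a `for` loop.

```rust
/// Hypothetical stream whose items borrow from its own scratch buffer.
struct UpperStream<'a> {
    words: std::str::SplitWhitespace<'a>,
    scratch: String,
}

impl<'a> UpperStream<'a> {
    fn new(input: &'a str) -> Self {
        Self { words: input.split_whitespace(), scratch: String::new() }
    }

    /// The returned `&str` borrows `self`, so it must be dropped before the
    /// next call. `Iterator::next` can't express that, because
    /// `Iterator::Item` cannot depend on the lifetime of `&mut self`.
    #[allow(clippy::should_implement_trait)]
    fn next(&mut self) -> Option<&str> {
        let word = self.words.next()?;
        // Reuse the scratch buffer for every item.
        self.scratch.clear();
        self.scratch.extend(word.chars().flat_map(char::to_uppercase));
        Some(self.scratch.as_str())
    }
}
```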

@ -0,0 +1,15 @@
# Grapheme Table Generator
This tool processes Unicode Character Database (UCD) XML files to generate efficient, multi-stage trie lookup tables for properties relevant to terminal applications:
* Grapheme cluster breaking rules
* Line breaking rules (optional)
* Character width properties
## Usage
* Download [ucd.nounihan.grouped.zip](https://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip)
* Run some equivalent of:
```sh
grapheme-table-gen --lang=rust --extended --no-ambiguous --line-breaks path/to/ucd.nounihan.grouped.xml
```
* Place the result in `src/unicode/tables.rs`
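This is a rough illustration of what a multi-stage trie lookup table is: a hypothetical two-stage version for a byte-sized property. The code point's high bits select a block, the low bits index into it, and identical blocks are stored only once. The generator's actual stage count, block sizes, and value encoding aren't described here, so treat this as a sketch of the technique only.

```rust
/// log2 of the block size used by this hypothetical table.
const SHIFT: u32 = 4;
const BLOCK: usize = 1 << SHIFT;

/// Hypothetical two-stage lookup table for a byte-sized property.
struct TwoStage {
    /// Maps `cp >> SHIFT` to a block index in `stage2`.
    stage1: Vec<u16>,
    /// Deduplicated blocks of `BLOCK` property values each.
    stage2: Vec<u8>,
}

/// Builds the table from one property value per code point.
/// `values.len()` must be a multiple of `BLOCK`.
fn build(values: &[u8]) -> TwoStage {
    assert_eq!(values.len() % BLOCK, 0);
    let mut stage1 = Vec::new();
    let mut stage2: Vec<u8> = Vec::new();
    for block in values.chunks(BLOCK) {
        // Reuse an identical block if one was already stored.
        let existing = stage2.chunks(BLOCK).position(|b| b == block);
        let index = match existing {
            Some(i) => i,
            None => {
                stage2.extend_from_slice(block);
                stage2.len() / BLOCK - 1
            }
        };
        stage1.push(index as u16);
    }
    TwoStage { stage1, stage2 }
}

impl TwoStage {
    /// Two dependent loads: first the block, then the value within it.
    fn get(&self, cp: u32) -> u8 {
        let block = self.stage1[(cp >> SHIFT) as usize] as usize;
        self.stage2[block * BLOCK + (cp as usize & (BLOCK - 1))]
    }
}
```

Most of the code space shares a handful of identical blocks, which is why this kind of deduplication keeps Unicode property tables small.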