mirror of https://github.com/microsoft/edit.git
synced 2025-07-02 06:01:12 +00:00

Document everything

parent 3ba67f7613
commit 293ea36c49
33 changed files with 1229 additions and 147 deletions

49
CONTRIBUTING.md
Normal file
@@ -0,0 +1,49 @@
# Contributing

## Translation improvements

You can find our translations in [`src/bin/edit/localization.rs`](./src/bin/edit/localization.rs).
Please feel free to open a pull request with your changes at any time.
If you'd like to discuss your changes first, please feel free to open an issue.

## Bug reports

If you find any bugs, we gladly accept pull requests without prior discussion.
Otherwise, you can of course always open an issue for us to look into.

## Feature requests

Please open a new issue for any feature requests you have in mind.
Keeping the binary size of the editor small is a priority for us, so we may need to discuss new features first until we have support for plugins.

## Code changes

The project focuses on a small binary size and sufficient (good) performance.
As such, we generally do not accept pull requests that introduce dependencies (there are always exceptions, of course).
Otherwise, you can consider this project a playground for trying out any cool ideas you have.

The overall architecture of the project can be summarized as follows:
* The underlying text buffer in `src/buffer` doesn't keep track of line breaks in the document.
  This is a crucial design aspect that permeates the entire codebase.

  To oversimplify, the *only* state that is kept is the current cursor position.
  When the user asks to move to another line, the editor seeks through the underlying document in `O(n)` until it finds the corresponding number of line breaks.
  * As a result, `src/simd` contains crucial `memchr2` functions to quickly find the next or previous line break (runs at up to >100GB/s).
  * Furthermore, `src/unicode` implements a `Utf8Chars` iterator which transparently inserts U+FFFD replacements during iteration (runs at up to 4GB/s).
  * `src/unicode` also implements grapheme cluster segmentation and cluster width measurement via its `MeasurementConfig` (runs at up to 600MB/s).
  * If word wrap is disabled, `memchr2` is used for all navigation across lines, allowing us to breeze through 1GB files as if they were 1MB.
  * Even if word wrap is enabled, it's still sufficiently smooth thanks to `MeasurementConfig`. This is only possible because these base functions are heavily optimized.
* `src/framebuffer.rs` implements a "framebuffer" like in video games.
  It allows us to draw the UI output into an intermediate buffer first, accumulating all changes and handling things like color blending.
  Then, it can compare the accumulated output with the previous frame and only send the necessary changes to the terminal.
* `src/tui.rs` implements an immediate mode UI. Its module documentation gives an overview of how it works, and I recommend reading it.
* `src/vt.rs` implements our VT parser.
* `src/sys` contains our platform abstractions.
* Finally, `src/bin/edit` ties everything together.
  It's roughly 90% UI code and business logic.
  It contains a little bit of VT logic in `setup_terminal`.
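The "no line cache" design above can be sketched as follows. This is a minimal illustration with some assumptions. It treats the text as one contiguous byte slice, while the real buffer is a gap buffer. It also uses a plain iterator search where `src/simd` would use its vectorized `memchr2`. The name `line_start` is hypothetical.

```rust
/// Returns the byte offset where logical line `line` (0-based) starts,
/// or `text.len()` if the text has fewer lines than that.
///
/// Nothing is cached: every call scans from the start, so the cost is O(n)
/// in the number of bytes before the target line. A '\n' ends a line in both
/// LF and CRLF files, so it is the only byte we need to look for.
pub fn line_start(text: &[u8], line: usize) -> usize {
    let mut off = 0;
    for _ in 0..line {
        // Stand-in for the vectorized line-break search in `src/simd`.
        match text[off..].iter().position(|&b| b == b'\n') {
            Some(i) => off += i + 1,
            None => return text.len(),
        }
    }
    off
}
```

Every call in the sketch rescans from offset 0. Starting the scan from the cursor's known offset instead would limit the work to the distance the cursor actually moves.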

If you have an issue with your terminal, the places of interest are the aforementioned:
* VT parser in `src/vt.rs`
* Platform-specific code in `src/sys`
* And the `setup_terminal` function in `src/bin/edit/main.rs`
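The frame-diffing idea behind `src/framebuffer.rs` can be illustrated with a deliberately coarse sketch. It assumes frames are plain `String` rows of equal width and compares whole rows. A real framebuffer would track individual cells together with their colors. The name `diff_frames` is hypothetical. Each changed row is rewritten after the standard VT cursor-position sequence (CUP, `ESC [ row ; col H`, 1-based).

```rust
/// Compares two frames row by row and returns only the terminal output
/// needed to turn `prev` into `next`. For each changed row, the cursor is
/// moved to column 1 of that row with CUP, and then the whole row is rewritten.
/// Assumes both frames have the same number of rows and equal-width rows.
pub fn diff_frames(prev: &[String], next: &[String]) -> String {
    let mut out = String::new();
    for (y, (old, new)) in prev.iter().zip(next).enumerate() {
        if old != new {
            // VT rows are 1-based.
            out.push_str(&format!("\x1b[{};1H", y + 1));
            out.push_str(new);
        }
    }
    out
}
```

An unchanged frame produces no output at all. That is the point of accumulating a frame before sending anything to the terminal.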

21
README.md
@@ -1,3 +1,20 @@
# MS-DOS Editor Redux
# Microsoft Edit

TBA
A simple editor for simple needs.

This editor pays homage to the classic [MS-DOS Editor](https://en.wikipedia.org/wiki/MS-DOS_Editor), but with a modern interface and input controls similar to VS Code. The goal is to provide an accessible editor that even users largely unfamiliar with terminals can use.

## Installation

* Download the latest release from our [releases page](https://github.com/microsoft/edit/releases/latest)
* Extract the archive
* Copy the `edit` binary to a directory in your `PATH`
* You may delete any other files in the archive if you don't need them

## Build Instructions

* [Install Rust](https://www.rust-lang.org/tools/install)
* Install the nightly toolchain: `rustup install nightly`
  * Alternatively, set the environment variable `RUSTC_BOOTSTRAP=1`
* Clone the repository
* For a release build, run: `cargo build --config .cargo/release.toml --release`

@@ -1,12 +1,16 @@
//! Provides a transparent error type for edit.

use std::{io, result};

use crate::sys;

// Remember to add an entry to `Error::message()` for each new error.
pub const APP_ICU_MISSING: Error = Error::new_app(0);

/// Edit's transparent `Result` type.
pub type Result<T> = result::Result<T, Error>;

/// Edit's transparent `Error` type.
/// Abstracts over system and application errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    App(u32),

@@ -7,9 +7,34 @@ use std::ptr::NonNull;
use super::release;
use crate::apperr;

/// A debug wrapper for [`release::Arena`].
///
/// The problem with [`super::ScratchArena`] is that it only "borrows" an underlying
/// [`release::Arena`]. Once the [`super::ScratchArena`] is dropped, it resets the watermark
/// of the underlying [`release::Arena`], freeing all allocations made since borrowing it.
///
/// It is completely valid for the same [`release::Arena`] to be borrowed multiple times at once,
/// *as long as* you only use the most recent borrow. Bad example:
/// ```should_panic
/// use edit::arena::scratch_arena;
///
/// let mut scratch1 = scratch_arena(None);
/// let mut scratch2 = scratch_arena(None);
///
/// let foo = scratch1.alloc_uninit::<usize>();
///
/// // This will also reset `scratch1`'s allocation.
/// drop(scratch2);
///
/// *foo; // BOOM! ...if it weren't for our debug wrapper.
/// ```
///
/// To catch this, we wrap the real [`release::Arena`] in a "debug" one, which pretends that every
/// instance of itself is a distinct [`release::Arena`] instance. We then use this debug [`Arena`]
/// for [`super::ScratchArena`], which allows us to track which borrow is the most recent one.
pub enum Arena {
    // Delegate is 'static, because release::Arena requires no lifetime
    // annotations, and so this struct cannot use them either.
    // annotations, and so this mere debug helper cannot use them either.
    Delegated { delegate: &'static release::Arena, borrow: usize },
    Owned { arena: release::Arena },
}

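The borrow-tracking rule described above ("only use the most recent borrow") can be modelled in a few lines. This sketch is hypothetical. `TrackedArena` and `Borrow` model only the counter, not any memory. The real debug wrapper is structured differently: it is an enum that delegates to a `release::Arena`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an arena. Only the borrow counter is modelled.
pub struct TrackedArena {
    borrows: Cell<usize>,
}

/// One borrow of a `TrackedArena`, tagged with the counter value at creation.
pub struct Borrow<'a> {
    arena: &'a TrackedArena,
    id: usize,
}

impl TrackedArena {
    pub fn new() -> Self {
        Self { borrows: Cell::new(0) }
    }

    /// Hands out a new borrow, which becomes the only one allowed to allocate.
    pub fn borrow(&self) -> Borrow<'_> {
        let id = self.borrows.get() + 1;
        self.borrows.set(id);
        Borrow { arena: self, id }
    }
}

impl Borrow<'_> {
    /// Meant to be called on every allocation: panics if a newer borrow is still alive.
    pub fn check(&self) {
        assert_eq!(self.arena.borrows.get(), self.id, "allocated through a stale arena borrow");
    }
}

impl Drop for Borrow<'_> {
    /// Dropping a borrow makes the previous one the most recent again.
    fn drop(&mut self) {
        self.arena.borrows.set(self.id - 1);
    }
}
```

In this model, the allocation through `scratch1` in the doc example above would call `check()` and panic, because `scratch2` is the newer borrow.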
@@ -1,12 +1,14 @@
//! Arena allocators. Small and fast.

#[cfg(debug_assertions)]
mod debug;
mod release;
mod scratch;
mod string;

#[cfg(debug_assertions)]
#[cfg(all(not(doc), debug_assertions))]
pub use self::debug::Arena;
#[cfg(not(debug_assertions))]
#[cfg(any(doc, not(debug_assertions)))]
pub use self::release::Arena;
pub use self::scratch::{ScratchArena, init, scratch_arena};
pub use self::string::ArenaString;

@@ -12,12 +12,36 @@ use crate::{apperr, sys};

const ALLOC_CHUNK_SIZE: usize = 64 * KIBI;

/// An arena allocator.
///
/// If you have never used an arena allocator before, think of it as
/// allocating objects on the stack, but the stack is *really* big.
/// Each time you allocate, memory gets pushed onto the end of the stack;
/// each time you deallocate, memory gets popped from the end of the stack.
///
/// One reason you'd want to use this is obviously performance: It's very simple
/// and therefore very fast, >10x faster than your system allocator.
///
/// However, modern allocators such as `mimalloc` are just as fast, so why not use them?
/// Because their performance comes at the cost of binary size, and we can't have that.
///
/// The biggest benefit, though, is that it sometimes massively simplifies lifetime
/// and memory management. This is best seen in this project's UI code, which
/// uses an arena to allocate a tree of UI nodes. This is infamously difficult
/// to do in Rust, but not when you have an arena allocator:
/// All nodes have the same lifetime, so you can just use references.
///
/// # Safety
///
/// **Do not** push objects into the arena that require destructors.
/// Destructors are not executed. Use a pool allocator for that.
pub struct Arena {
    base: NonNull<u8>,
    capacity: usize,
    commit: Cell<usize>,
    offset: Cell<usize>,

    /// See [`super::debug`], which uses this for borrow tracking.
    #[cfg(debug_assertions)]
    pub(super) borrows: Cell<usize>,
}

@@ -61,6 +85,7 @@ impl Arena {
    /// Obviously, this is GIGA UNSAFE. It runs no destructors and does not check
    /// whether the offset is valid. You'd better take care when using this function.
    pub unsafe fn reset(&self, to: usize) {
        // Fill the deallocated memory with 0xDD to aid debugging.
        if cfg!(debug_assertions) && self.offset.get() > to {
            let commit = self.commit.get();
            let len = (self.offset.get() + 128).min(commit) - to;

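The "stack, but really big" analogy can be made concrete with a toy bump allocator. This is a simplified, hypothetical sketch. It hands out offsets into a fixed `Vec<u8>` instead of raw pointers and never grows. It also skips the virtual-memory reserve/commit work that the real `Arena` (with its `base`, `capacity` and `commit` fields) appears to do. Like the real type, it keeps its offset in a `Cell` so that allocation works through `&self`.

```rust
use std::cell::Cell;

/// A toy bump ("stack") allocator over a fixed byte buffer (hypothetical).
pub struct BumpArena {
    buf: Vec<u8>,
    offset: Cell<usize>,
}

impl BumpArena {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], offset: Cell::new(0) }
    }

    /// "Pushes" `size` bytes aligned to `align` (a power of two).
    /// Returns the start offset, or `None` if the arena is full.
    pub fn alloc(&self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        // Round the current offset up to the next multiple of `align`.
        let start = (self.offset.get() + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        self.offset.set(end);
        Some(start)
    }

    /// The current watermark. Pass it to `reset` later to pop everything allocated since.
    pub fn offset(&self) -> usize {
        self.offset.get()
    }

    /// "Pops" every allocation made after `to`. No destructors run.
    pub fn reset(&self, to: usize) {
        assert!(to <= self.offset.get());
        self.offset.set(to);
    }
}
```

`reset(mark)` is the "pop" half of the analogy: everything allocated after `mark` is released at once. This is also why values that need destructors must not be stored in such an arena.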
@@ -9,6 +9,7 @@ use crate::helpers::*;
static mut S_SCRATCH: [release::Arena; 2] =
    const { [release::Arena::empty(), release::Arena::empty()] };

/// Call this before using [`scratch_arena`].
pub fn init() -> apperr::Result<()> {
    unsafe {
        for s in &mut S_SCRATCH[..] {

@@ -18,8 +19,27 @@ pub fn init() -> apperr::Result<()> {
    Ok(())
}

/// Returns a new scratch arena for temporary allocations,
/// ensuring it doesn't conflict with the provided arena.
/// Need an arena for temporary allocations? [`scratch_arena`] has got you covered.
/// Call [`scratch_arena`] and it'll return a [`ScratchArena`] that resets when it goes out of scope.
///
/// ---
///
/// Most methods make just two kinds of allocations:
/// * Interior: Temporary data that can be deallocated when the function returns.
/// * Exterior: Data that is returned to the caller and must remain alive until the caller stops using it.
///
/// Such methods only have two lifetimes, for which you consequently also only need two arenas.
/// ...even if your method calls other methods recursively! This is because the exterior allocations
/// of a callee are simply interior allocations to the caller, and so on, recursively.
///
/// This works as long as the two arenas flip/flop between being used as interior/exterior allocator
/// along the callstack. To ensure that is the case, we use a recursion counter in debug builds.
///
/// This approach was described, among others, at: <https://nullprogram.com/blog/2023/09/27/>
///
/// # Safety
///
/// If your function takes an [`Arena`] argument, you **MUST** pass it to `scratch_arena` as `Some(&arena)`.
pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> {
    unsafe {
        #[cfg(debug_assertions)]

@@ -31,18 +51,9 @@ pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> {
    }
}

// Most methods make just two kinds of allocations:
// * Interior: Temporary data that can be deallocated when the function returns.
// * Exterior: Data that is returned to the caller and must remain alive until the caller stops using it.
//
// Such methods only have two lifetimes, for which you consequently also only need two arenas.
// ...even if your method calls other methods recursively! This is because the exterior allocations
// of a callee are simply interior allocations to the caller, and so on, recursively.
//
// This works as long as the two arenas flip/flop between being used as interior/exterior allocator
// along the callstack. To ensure that is the case, we use a recursion counter in debug builds.
//
// This approach was described among others at: https://nullprogram.com/blog/2023/09/27/
/// Borrows an [`Arena`] for temporary allocations.
///
/// See [`scratch_arena`].
#[cfg(debug_assertions)]
pub struct ScratchArena<'a> {
    arena: debug::Arena,

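The interior/exterior flip/flop described in the `scratch_arena` docs boils down to a selection rule: of the two scratch arenas, take the one that is *not* the caller-provided arena. `pick_scratch` and this `Arena` struct are hypothetical stand-ins. The sketch shows only that rule, not the real function's borrowing or reset behavior.

```rust
/// Hypothetical stand-in for an arena. Only its identity (address) matters here.
pub struct Arena {
    pub name: &'static str,
}

/// Picks whichever of the two scratch arenas is *not* `conflict`, so that a
/// callee's interior (temporary) allocations never land in the arena holding
/// the exterior data it returns to its caller.
pub fn pick_scratch<'a>(scratch: &'a [Arena; 2], conflict: Option<&Arena>) -> &'a Arena {
    match conflict {
        // The output arena is scratch[0], so temporaries go to scratch[1].
        Some(c) if std::ptr::eq(c, &scratch[0]) => &scratch[1],
        // No conflict, or the conflict is scratch[1] (or an unrelated arena).
        _ => &scratch[0],
    }
}
```

Passing each level's scratch arena down as the next level's `conflict` makes the choice alternate between the two arenas along the call stack. This is why two arenas suffice even for recursive calls.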
@@ -4,49 +4,63 @@ use std::ops::{Bound, Deref, DerefMut, RangeBounds};
use super::Arena;
use crate::helpers::*;

/// A custom string type, because `std` lacks allocator support for [`String`].
///
/// To keep things simple, this one is hardcoded to [`Arena`].
#[derive(Clone)]
pub struct ArenaString<'a> {
    vec: Vec<u8, &'a Arena>,
}

impl<'a> ArenaString<'a> {
    /// Creates a new [`ArenaString`] in the given arena.
    #[must_use]
    pub const fn new_in(arena: &'a Arena) -> Self {
        Self { vec: Vec::new_in(arena) }
    }

    #[inline]
    /// Turns a [`str`] into an [`ArenaString`].
    #[must_use]
    pub fn from_str(arena: &'a Arena, s: &str) -> Self {
        let mut res = Self::new_in(arena);
        res.push_str(s);
        res
    }

    /// It says right here that you checked if `bytes` is valid UTF-8
    /// and you are sure it is. Presto! Here's an `ArenaString`!
    ///
    /// # Safety
    ///
    /// It says "unchecked" right there. What did you expect?
    /// You fool! It says "unchecked" right there. Now the house is burning.
    #[inline]
    #[must_use]
    pub unsafe fn from_utf8_unchecked(bytes: Vec<u8, &'a Arena>) -> Self {
        Self { vec: bytes }
    }

    pub fn from_utf8_lossy<'s>(arena: &'a Arena, v: &'s [u8]) -> Result<&'s str, ArenaString<'a>> {
        let mut iter = v.utf8_chunks();
    /// Checks whether `text` contains only valid UTF-8.
    /// If the entire string is valid, it returns `Ok(text)`.
    /// Otherwise, it returns `Err(ArenaString)` with all invalid sequences replaced with U+FFFD.
    pub fn from_utf8_lossy<'s>(
        arena: &'a Arena,
        text: &'s [u8],
    ) -> Result<&'s str, ArenaString<'a>> {
        let mut iter = text.utf8_chunks();
        let Some(mut chunk) = iter.next() else {
            return Ok("");
        };

        let valid = chunk.valid();
        if chunk.invalid().is_empty() {
            debug_assert_eq!(valid.len(), v.len());
            return Ok(unsafe { str::from_utf8_unchecked(v) });
            debug_assert_eq!(valid.len(), text.len());
            return Ok(unsafe { str::from_utf8_unchecked(text) });
        }

        const REPLACEMENT: &str = "\u{FFFD}";

        let mut res = Self::new_in(arena);
        res.reserve(v.len());
        res.reserve(text.len());

        loop {
            res.push_str(chunk.valid());

@@ -62,6 +76,7 @@ impl<'a> ArenaString<'a> {
        Err(res)
    }

    /// Turns a [`Vec<u8>`] into an [`ArenaString`], replacing invalid UTF-8 sequences with U+FFFD.
    #[must_use]
    pub fn from_utf8_lossy_owned(v: Vec<u8, &'a Arena>) -> Self {
        match Self::from_utf8_lossy(v.allocator(), &v) {

@@ -70,26 +85,32 @@ impl<'a> ArenaString<'a> {
        }
    }

    /// It's empty.
    pub fn is_empty(&self) -> bool {
        self.vec.is_empty()
    }

    /// It's lengthy.
    pub fn len(&self) -> usize {
        self.vec.len()
    }

    /// Its capacity.
    pub fn capacity(&self) -> usize {
        self.vec.capacity()
    }

    /// It's a [`String`], now it's a [`str`]. Wow!
    pub fn as_str(&self) -> &str {
        unsafe { str::from_utf8_unchecked(self.vec.as_slice()) }
    }

    /// It's a [`String`], now it's a [`str`]. And it's mutable! WOW!
    pub fn as_mut_str(&mut self) -> &mut str {
        unsafe { str::from_utf8_unchecked_mut(self.vec.as_mut_slice()) }
    }

    /// Now it's bytes!
    pub fn as_bytes(&self) -> &[u8] {
        self.vec.as_slice()
    }

@@ -103,22 +124,32 @@ impl<'a> ArenaString<'a> {
        &mut self.vec
    }

    /// Reserves *additional* memory. For you old folks out there (totally not me),
    /// this is different from C++'s `reserve`, which reserves a total size.
    pub fn reserve(&mut self, additional: usize) {
        self.vec.reserve(additional)
    }

    /// Now it's small! Alarming!
    ///
    /// *Do not* call this unless this string is the last thing on the arena.
    /// Arenas are stacks; they can't deallocate what's in the middle.
    pub fn shrink_to_fit(&mut self) {
        self.vec.shrink_to_fit()
    }

    /// To no surprise, this clears the string.
    pub fn clear(&mut self) {
        self.vec.clear()
    }

    /// Append some text.
    pub fn push_str(&mut self, string: &str) {
        self.vec.extend_from_slice(string.as_bytes())
    }

    /// Append a single character.
    #[inline]
    pub fn push(&mut self, ch: char) {
        match ch.len_utf8() {
            1 => self.vec.push(ch as u8),

@@ -156,6 +187,7 @@ impl<'a> ArenaString<'a> {
        }
    }

    /// Replaces a byte range, which must lie on `char` boundaries, with a new string.
    pub fn replace_range<R: RangeBounds<usize>>(&mut self, range: R, replace_with: &str) {
        match range.start_bound() {
            Bound::Included(&n) => assert!(self.is_char_boundary(n)),

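The behavior documented for `ArenaString::from_utf8_lossy` can be reproduced with the standard library alone: borrow the input when it is valid, otherwise build a repaired copy. This sketch returns a `String` instead of an `ArenaString`. It needs Rust 1.79+ for `<[u8]>::utf8_chunks`.

```rust
/// Returns `Ok(&str)` borrowing `text` when it is entirely valid UTF-8,
/// otherwise `Err(String)` with every invalid sequence replaced by U+FFFD.
pub fn from_utf8_lossy(text: &[u8]) -> Result<&str, String> {
    let mut iter = text.utf8_chunks();
    let Some(mut chunk) = iter.next() else {
        return Ok("");
    };

    // Fast path: only the last chunk can have an empty invalid part, so if the
    // first chunk has none, it is the only chunk and covers the whole input.
    if chunk.invalid().is_empty() {
        return Ok(chunk.valid());
    }

    // Slow path: copy the valid parts and replace each invalid sequence.
    let mut res = String::with_capacity(text.len());
    loop {
        res.push_str(chunk.valid());
        if !chunk.invalid().is_empty() {
            res.push('\u{FFFD}');
        }
        match iter.next() {
            Some(next) => chunk = next,
            None => break,
        }
    }
    Err(res)
}
```

The `Ok`/`Err` split lets callers skip allocating entirely in the common case of valid input. That mirrors the return type of the arena version.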
@@ -1,19 +1,31 @@
//! Base64 facilities.

use crate::arena::ArenaString;

const CHARSET: [u8; 64] = *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes the given bytes as base64 and appends them to the destination string.
pub fn encode(dst: &mut ArenaString, src: &[u8]) {
    unsafe {
        let mut inp = src.as_ptr();
        let mut remaining = src.len();
        let dst = dst.as_mut_vec();

        // One aspect of base64 is that the encoded length can be calculated accurately in advance.
        let out_len = src.len().div_ceil(3) * 4;
        // ... we can then use this fact to reserve space all at once.
        dst.reserve(out_len);

        // SAFETY: Getting a pointer to the reserved space is only safe
        // *after* calling `reserve()` as it may change the pointer.
        let mut out = dst.as_mut_ptr().add(dst.len());

        if remaining != 0 {
            // Translate chunks of 3 source bytes into 4 base64-encoded bytes.
            while remaining > 3 {
                // SAFETY: Thanks to `remaining > 3`, reading 4 bytes at once is safe.
                // This improves performance massively over a byte-by-byte approach,
                // because it allows us to byte-swap the read and use simple bit-shifts below.
                let val = u32::from_be((inp as *const u32).read_unaligned());
                inp = inp.add(3);
                remaining -= 3;

@@ -32,6 +44,8 @@ pub fn encode(dst: &mut ArenaString, src: &[u8]) {
            let mut in1 = 0;
            let mut in2 = 0;

            // We can simplify the following logic by assuming that there's only 1
            // byte left. If there's >1 byte left, these two '=' will be overwritten.
            *out.add(3) = b'=';
            *out.add(2) = b'=';

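A safe, slower counterpart of the `encode` function above makes the length formula and the padding rules easy to check. This sketch writes into a `String` rather than an `ArenaString`. It walks the input with `chunks_exact(3)` instead of unaligned 4-byte reads. It uses the same standard (RFC 4648) alphabet as `CHARSET`.

```rust
const CHARSET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Maps the 6-bit group of `val` starting at bit `shift` to its base64 character.
fn sextet(val: u32, shift: u32) -> char {
    CHARSET[((val >> shift) & 0x3f) as usize] as char
}

/// Appends the padded base64 encoding of `src` to `dst`.
pub fn encode(dst: &mut String, src: &[u8]) {
    // Each started group of 3 input bytes becomes exactly 4 output characters,
    // so the output size is known up front and can be reserved in one go.
    dst.reserve(src.len().div_ceil(3) * 4);

    let mut chunks = src.chunks_exact(3);
    for c in &mut chunks {
        // Pack 3 bytes into the low 24 bits, then emit four 6-bit groups.
        let val = (u32::from(c[0]) << 16) | (u32::from(c[1]) << 8) | u32::from(c[2]);
        for shift in [18, 12, 6, 0] {
            dst.push(sextet(val, shift));
        }
    }

    // 1 or 2 leftover bytes: zero-fill the missing ones and pad with '='.
    let rest = chunks.remainder();
    if !rest.is_empty() {
        let b1 = rest.get(1).copied().unwrap_or(0);
        let val = (u32::from(rest[0]) << 16) | (u32::from(b1) << 8);
        dst.push(sextet(val, 18));
        dst.push(sextet(val, 12));
        dst.push(if rest.len() == 2 { sextet(val, 6) } else { '=' });
        dst.push('=');
    }
}
```

For example, 11 input bytes reserve `11.div_ceil(3) * 4 = 16` characters, which matches the output length exactly, including the single `=` of padding.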
@@ -27,7 +27,7 @@ use edit::input::{self, kbmod, vk};
use edit::oklab::oklab_blend;
use edit::tui::*;
use edit::vt::{self, Token};
use edit::{apperr, base64, path, sys};
use edit::{apperr, base64, icu, path, sys};
use localization::*;
use state::*;

@@ -51,6 +51,10 @@ fn main() -> process::ExitCode {
}

fn run() -> apperr::Result<()> {
    let items = vec!["hello.txt", "hallo.txt", "world.txt", "Hello, world.txt"];
    let mut sorted = items.clone();
    sorted.sort_by(|a, b| icu::compare_strings(a.as_bytes(), b.as_bytes()));

    // Init `sys` first, as everything else may depend on its functionality (IO, function pointers, etc.).
    let _sys_deinit = sys::init()?;
    // Next init `arena`, so that `scratch_arena` works. `loc` depends on it.

@@ -1,13 +1,17 @@
//! A text buffer for a text editor.
//!
//! Implements a Unicode-aware, layout-aware text buffer for terminals.
//! It's based on a gap buffer. It has no line cache and instead relies
//! on the performance of the ucd module for fast text navigation.
//!
//! ---
//!
//! If the project ever outgrows a basic gap buffer (e.g. to add time travel),
//! an ideal, alternative architecture would be a piece table with immutable trees.
//! The tree nodes can be allocated on the same arena allocator as the added chunks,
//! making lifetime management fairly easy. The algorithm is described here:
//! * https://cdacamar.github.io/data%20structures/algorithms/benchmarking/text%20editors/c++/editor-data-structures/
//! * https://github.com/cdacamar/fredbuf
//! * <https://cdacamar.github.io/data%20structures/algorithms/benchmarking/text%20editors/c++/editor-data-structures/>
//! * <https://github.com/cdacamar/fredbuf>
//!
//! The downside is that text navigation & search take a performance hit due to small chunks.
//! The solution to the former is to keep line caches, which further complicates the architecture.

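For readers unfamiliar with gap buffers, here is a minimal byte-oriented one. It is a hypothetical sketch, not this crate's `GapBuffer`, which among other things exposes a `generation()` counter and a `small` mode. The text lives in one allocation with a movable "gap" at the edit position. As a result, repeated inserts at the same spot only copy the new bytes.

```rust
/// A minimal gap buffer over bytes (hypothetical). Text before the gap lives
/// in `buf[..gap_start]`, and text after it lives in `buf[gap_end..]`.
pub struct GapBuffer {
    buf: Vec<u8>,
    gap_start: usize,
    gap_end: usize,
}

impl GapBuffer {
    pub fn new() -> Self {
        Self { buf: Vec::new(), gap_start: 0, gap_end: 0 }
    }

    /// Logical length: the allocation minus the gap.
    pub fn len(&self) -> usize {
        self.buf.len() - (self.gap_end - self.gap_start)
    }

    /// Moves the gap so that it starts at logical offset `off`.
    fn move_gap(&mut self, off: usize) {
        assert!(off <= self.len());
        if off < self.gap_start {
            // Move the bytes in [off, gap_start) to just before gap_end.
            let n = self.gap_start - off;
            self.buf.copy_within(off..self.gap_start, self.gap_end - n);
            self.gap_start = off;
            self.gap_end -= n;
        } else if off > self.gap_start {
            // Move the first n bytes after the gap to the gap's start.
            let n = off - self.gap_start;
            self.buf.copy_within(self.gap_end..self.gap_end + n, self.gap_start);
            self.gap_start += n;
            self.gap_end += n;
        }
    }

    /// Inserts `text` at logical offset `off`, growing the gap if needed.
    pub fn insert(&mut self, off: usize, text: &[u8]) {
        self.move_gap(off);
        let gap = self.gap_end - self.gap_start;
        if gap < text.len() {
            // Grow the allocation and shift the post-gap text to the new end.
            let grow = (text.len() - gap).max(64);
            let old_len = self.buf.len();
            self.buf.resize(old_len + grow, 0);
            self.buf.copy_within(self.gap_end..old_len, self.gap_end + grow);
            self.gap_end += grow;
        }
        self.buf[self.gap_start..self.gap_start + text.len()].copy_from_slice(text);
        self.gap_start += text.len();
    }

    /// Copies out the logical contents (both halves, skipping the gap).
    pub fn to_vec(&self) -> Vec<u8> {
        let mut v = self.buf[..self.gap_start].to_vec();
        v.extend_from_slice(&self.buf[self.gap_end..]);
        v
    }
}
```

Note that the contents are not contiguous, since they are split around the gap. This is one reason line navigation scans the two halves with a fast byte search instead of relying on a line index.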
@@ -36,8 +40,8 @@ use crate::framebuffer::{Framebuffer, IndexedColor};
use crate::helpers::*;
use crate::oklab::oklab_blend;
use crate::simd::memchr2;
use crate::unicode::{Cursor, MeasurementConfig};
use crate::{apperr, icu, unicode};
use crate::unicode::{self, Cursor, MeasurementConfig};
use crate::{apperr, icu};

/// The margin template is used for line numbers.
/// The max. line number we should ever expect is probably 64-bit,

@@ -47,16 +51,25 @@ const MARGIN_TEMPLATE: &str = " │ ";
/// Happens to reuse MARGIN_TEMPLATE, because it has sufficient whitespace.
const TAB_WHITESPACE: &str = MARGIN_TEMPLATE;

/// Stores statistics about the whole document.
#[derive(Copy, Clone)]
pub struct TextBufferStatistics {
    logical_lines: CoordType,
    visual_lines: CoordType,
}

/// Stores the active text selection.
#[derive(Copy, Clone)]
enum TextBufferSelection {
    /// No active selection.
    None,
    /// The user is currently selecting text.
    ///
    /// Moving the cursor will update the selection.
    Active { beg: Point, end: Point },
    /// The user stopped selecting text.
    ///
    /// Moving the cursor will destroy the selection.
    Done { beg: Point, end: Point },
}

@@ -66,6 +79,9 @@ impl TextBufferSelection {
    }
}

/// In order to group actions into a single undo step,
/// we need to know the type of action that was performed.
/// This stores the action type.
#[derive(Copy, Clone, Eq, PartialEq)]
enum HistoryType {
    Other,

@@ -73,11 +89,15 @@ enum HistoryType {
    Delete,
}

/// An undo/redo entry.
struct HistoryEntry {
    /// Logical cursor position before the change was made.
    /// [`TextBuffer::cursor`] position before the change was made.
    cursor_before: Point,
    /// [`TextBuffer::selection`] before the change was made.
    selection_before: TextBufferSelection,
    /// [`TextBuffer::stats`] before the change was made.
    stats_before: TextBufferStatistics,
    /// [`GapBuffer::generation`] before the change was made.
    generation_before: u32,
    /// Logical cursor position where the change took place.
    /// The position is at the start of the changed range.

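`HistoryType` exists so that consecutive actions of the same kind can be merged into one undo step. The sketch below shows one possible merge strategy. The `Write` variant, the `Entry` and `History` types, and the "contiguous insert" merge rule are all assumptions for illustration. They are not the editor's actual logic.

```rust
/// Hypothetical action kinds. Only `Other` and `Delete` appear in the diff above.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum HistoryType {
    Other,
    Write,
    Delete,
}

/// A simplified undo entry.
pub struct Entry {
    pub kind: HistoryType,
    pub offset: usize,  // where the change starts
    pub added: Vec<u8>, // bytes inserted by this step
}

#[derive(Default)]
pub struct History {
    pub undo: Vec<Entry>,
}

impl History {
    /// Records an insertion of `text` at `offset`. If the previous step was also
    /// a `Write` that ended exactly at `offset` (e.g. the user keeps typing),
    /// the text is appended to that step instead of creating a new one.
    pub fn record_write(&mut self, offset: usize, text: &[u8]) {
        if let Some(last) = self.undo.last_mut() {
            if last.kind == HistoryType::Write && last.offset + last.added.len() == offset {
                last.added.extend_from_slice(text);
                return;
            }
        }
        self.undo.push(Entry { kind: HistoryType::Write, offset, added: text.to_vec() });
    }
}
```

With this rule, typing "hello" key by key yields a single undo step, while jumping elsewhere and typing again starts a new one.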
@@ -88,21 +108,38 @@ struct HistoryEntry {
    added: Vec<u8>,
}

/// Caches an ICU search operation.
struct ActiveSearch {
    /// The search pattern.
    pattern: String,
    /// The search options.
    options: SearchOptions,
    /// The ICU `UText` object.
    text: icu::Text,
    /// The ICU `URegularExpression` object.
    regex: icu::Regex,
    /// [`GapBuffer::generation`] when the search was created.
    /// This is used to detect if we need to refresh the
    /// [`ActiveSearch::regex`] object.
    buffer_generation: u32,
    /// [`TextBuffer::selection_generation`] when the search was
    /// created. When the user manually selects text, we need to
    /// refresh the [`ActiveSearch::pattern`] with it.
    selection_generation: u32,
    /// Stores the text buffer offset in between searches.
    next_search_offset: usize,
    /// If we know there were no hits, we can skip searching.
    no_matches: bool,
}

/// Options for a search operation.
#[derive(Default, Clone, Copy, Eq, PartialEq)]
pub struct SearchOptions {
    /// If true, the search is case-sensitive.
    pub match_case: bool,
    /// If true, the search matches whole words.
    pub whole_word: bool,
    /// If true, the search uses regex.
    pub use_regex: bool,
}

@@ -111,22 +148,36 @@ pub struct SearchOptions {
struct ActiveEditLineInfo {
    /// Points to the start of the line currently being edited.
    safe_start: Cursor,
    /// Number of visual rows of the line that starts
    /// at [`ActiveEditLineInfo::safe_start`].
    line_height_in_rows: CoordType,
    /// Byte distance from the start of the line at
    /// [`ActiveEditLineInfo::safe_start`] to the next line.
    distance_next_line_start: usize,
}

/// Char- or word-wise navigation? Your choice.
pub enum CursorMovement {
    Grapheme,
    Word,
}

/// The result of a call to [`TextBuffer::render()`].
pub struct RenderResult {
    /// The maximum visual X position we encountered during rendering.
    pub visual_pos_x_max: CoordType,
}

/// A [`TextBuffer`] with inner mutability.
pub type TextBufferCell = SemiRefCell<TextBuffer>;

/// A [`TextBuffer`] inside an [`Rc`].
///
/// We need this because the TUI system needs to borrow
/// the given text buffer(s) until after the layout process.
pub type RcTextBuffer = Rc<TextBufferCell>;

/// A text buffer for a text editor.
pub struct TextBuffer {
    buffer: GapBuffer,

@@ -167,11 +218,15 @@ pub struct TextBuffer {
}

impl TextBuffer {
    /// Creates a new text buffer inside an [`Rc`].
    /// See [`TextBuffer::new()`].
    pub fn new_rc(small: bool) -> apperr::Result<RcTextBuffer> {
        let buffer = TextBuffer::new(small)?;
        Ok(Rc::new(SemiRefCell::new(buffer)))
    }

    /// Creates a new text buffer. With `small` you can control
    /// whether the buffer is optimized for <1MiB contents.
    pub fn new(small: bool) -> apperr::Result<Self> {
        Ok(Self {
            buffer: GapBuffer::new(small)?,

@@ -209,26 +264,36 @@ impl TextBuffer {
        })
    }

    /// Length of the document in bytes.
    pub fn text_length(&self) -> usize {
        self.buffer.len()
    }

    /// Number of logical lines in the document,
    /// that is, lines separated by newlines.
    pub fn logical_line_count(&self) -> CoordType {
        self.stats.logical_lines
    }

    /// Number of visual lines in the document,
    /// that is, the number of lines after layout.
    pub fn visual_line_count(&self) -> CoordType {
        self.stats.visual_lines
    }

    /// Does the buffer need to be saved?
    pub fn is_dirty(&self) -> bool {
        self.last_save_generation != self.buffer.generation()
    }

    /// The buffer generation changes on every edit.
    /// Store the returned value and compare it against a later call
    /// to check whether the buffer has changed in between.
    pub fn generation(&self) -> u32 {
        self.buffer.generation()
    }

    /// Force the buffer to be dirty.
    pub fn mark_as_dirty(&mut self) {
        self.last_save_generation = self.buffer.generation().wrapping_sub(1);
    }

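The `is_dirty`/`mark_as_dirty` pair above compares a wrapping `u32` generation with the generation recorded at the last save. A stand-alone model makes the trick easy to test. `DirtyTracker` and `mark_as_clean` are hypothetical names used only for illustration.

```rust
/// Hypothetical model of generation-based dirty tracking: every edit bumps a
/// wrapping `u32` counter, and the document is dirty whenever the counter
/// differs from the value recorded at the last save.
#[derive(Default)]
pub struct DirtyTracker {
    generation: u32,
    last_save_generation: u32,
}

impl DirtyTracker {
    /// Called on every edit.
    pub fn edit(&mut self) {
        self.generation = self.generation.wrapping_add(1);
    }

    /// Called after a successful save.
    pub fn mark_as_clean(&mut self) {
        self.last_save_generation = self.generation;
    }

    /// Any value other than the current generation works. `wrapping_sub(1)`
    /// is guaranteed to differ and cannot overflow-panic, even at 0.
    pub fn mark_as_dirty(&mut self) {
        self.last_save_generation = self.generation.wrapping_sub(1);
    }

    pub fn is_dirty(&self) -> bool {
        self.last_save_generation != self.generation
    }
}
```

A counter, unlike a `bool`, lets other consumers such as `ActiveSearch::buffer_generation` detect changes independently of the save state.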
@ -237,10 +302,12 @@ impl TextBuffer {
|
|||
self.last_save_generation = self.buffer.generation();
|
||||
}
|
||||
|
||||
/// The encoding used during reading/writing. "UTF-8" is the default.
|
||||
pub fn encoding(&self) -> &'static str {
|
||||
self.encoding
|
||||
}
|
||||
|
||||
/// Set the encoding used during reading/writing.
|
||||
pub fn set_encoding(&mut self, encoding: &'static str) {
|
||||
if self.encoding != encoding {
|
||||
self.encoding = encoding;
|
||||
|
@ -248,10 +315,14 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// The newline type used in the document. LF or CRLF.
|
||||
pub fn is_crlf(&self) -> bool {
|
||||
self.newlines_are_crlf
|
||||
}
|
||||
|
||||
/// Changes the newline type used in the document.
|
||||
///
|
||||
/// NOTE: Cannot be undone.
|
||||
pub fn normalize_newlines(&mut self, crlf: bool) {
|
||||
let newline: &[u8] = if crlf { b"\r\n" } else { b"\n" };
|
||||
let mut off = 0;
|
||||
|
@ -318,26 +389,34 @@ impl TextBuffer {
|
|||
self.newlines_are_crlf = crlf;
|
||||
}
|
||||
|
||||
/// Whether to insert or overtype text when writing.
|
||||
pub fn is_overtype(&self) -> bool {
|
||||
self.overtype
|
||||
}
|
||||
|
||||
/// Set the overtype mode.
|
||||
pub fn set_overtype(&mut self, overtype: bool) {
|
||||
self.overtype = overtype;
|
||||
}
|
||||
|
||||
/// Gets the logical cursor position, that is,
|
||||
/// the position in lines and graphemes per line.
|
||||
pub fn cursor_logical_pos(&self) -> Point {
|
||||
self.cursor.logical_pos
|
||||
}
|
||||
|
||||
/// Gets the visual cursor position, that is,
|
||||
/// the position in laid out rows and columns.
|
||||
pub fn cursor_visual_pos(&self) -> Point {
|
||||
self.cursor.visual_pos
|
||||
}
|
||||
|
||||
/// Gets the width of the left margin.
|
||||
pub fn margin_width(&self) -> CoordType {
|
||||
self.margin_width
|
||||
}
|
||||
|
||||
/// Is the left margin enabled?
|
||||
pub fn set_margin_enabled(&mut self, enabled: bool) -> bool {
|
||||
if self.margin_enabled == enabled {
|
||||
false
|
||||
|
@ -348,22 +427,38 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Gets the width of the text contents for layout.
|
||||
pub fn text_width(&self) -> CoordType {
|
||||
self.width - self.margin_width
|
||||
}
|
||||
|
||||
/// Ask the TUI system to scroll the buffer and make the cursor visible.
|
||||
///
|
||||
/// TODO: This function shows that [`TextBuffer`] is poorly abstracted
|
||||
/// away from the TUI system. The only reason this exists is so that
|
||||
/// if someone outside the TUI code enables word-wrap, the TUI code
|
||||
/// recognizes this and scrolls the cursor into view. But outside of this
|
||||
/// scrolling, views, etc., are all UI concerns, so this should not be here.
|
||||
pub fn make_cursor_visible(&mut self) {
|
||||
self.wants_cursor_visibility = true;
|
||||
}
|
||||
|
||||
/// For the TUI code to retrieve a prior [`TextBuffer::make_cursor_visible()`] request.
|
||||
pub fn take_cursor_visibility_request(&mut self) -> bool {
|
||||
mem::take(&mut self.wants_cursor_visibility)
|
||||
}
|
||||
|
||||
/// Is word-wrap enabled?
|
||||
///
|
||||
/// Technically, this is a misnomer, because it's line-wrapping.
|
||||
pub fn is_word_wrap_enabled(&self) -> bool {
|
||||
self.word_wrap_enabled
|
||||
}
|
||||
|
||||
/// Enable or disable word-wrap.
|
||||
///
|
||||
/// NOTE: It's expected that the TUI code calls `set_width()` sometime after this.
|
||||
/// This will then trigger the actual recalculation of the cursor position.
|
||||
pub fn set_word_wrap(&mut self, enabled: bool) {
|
||||
if self.word_wrap_enabled != enabled {
|
||||
self.word_wrap_enabled = enabled;
|
||||
|
@ -372,6 +467,11 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Set the width available for layout.
|
||||
///
|
||||
/// Ideally this would be a pure UI concern, but the text buffer needs this
|
||||
/// so that it can abstract away visual cursor movement such as "go a line up".
|
||||
/// What would that even mean if it didn't know how wide a line is?
|
||||
pub fn set_width(&mut self, width: CoordType) -> bool {
|
||||
if width <= 0 || width == self.width {
|
||||
false
|
||||
|
@ -382,10 +482,12 @@ impl TextBuffer {
|
|||
}
|
||||
}
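To make "go a line up" concrete: assuming every grapheme is exactly one cell wide and lines are broken at exactly `width` cells (real word-wrap also has to respect word boundaries and wide graphemes), the mapping from a logical column to a visual row/column, and a one-row-up movement, could be sketched as follows. `wrap_position` and `visual_line_up` are hypothetical names.

```rust
type CoordType = i32;

/// Hypothetical helper: maps a column within one logical line to a
/// (visual row, visual column) pair, assuming one cell per grapheme and
/// hard breaks at exactly `width` cells.
fn wrap_position(column: CoordType, width: CoordType) -> (CoordType, CoordType) {
    assert!(width > 0, "layout width must be positive");
    (column / width, column % width)
}

/// Hypothetical "go a line up" inside a single wrapped logical line.
/// Returns the new logical column, or `None` on the first visual row
/// (where the cursor would have to move to the previous logical line).
fn visual_line_up(column: CoordType, width: CoordType) -> Option<CoordType> {
    let (row, col) = wrap_position(column, width);
    if row == 0 { None } else { Some((row - 1) * width + col) }
}
```

Without `width`, neither function could be written, which is the point the comment above makes.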
|
||||
|
||||
/// Gets the tab width. Could be anything, but is expected to be 1-8.
|
||||
pub fn tab_size(&self) -> CoordType {
|
||||
self.tab_size
|
||||
}
|
||||
|
||||
/// Set the tab size. Clamped to 1-8.
|
||||
pub fn set_tab_size(&mut self, width: CoordType) -> bool {
|
||||
let width = width.clamp(1, 8);
|
||||
if width == self.tab_size {
|
||||
|
@ -397,18 +499,22 @@ impl TextBuffer {
|
|||
}
|
||||
}
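The tab size feeds into layout as tab stops. A minimal sketch of expanding tabs to spaces with the same 1-8 clamp as `set_tab_size`, assuming one cell per `char` (no wide graphemes); `expand_tabs` is a hypothetical helper, not part of this codebase:

```rust
type CoordType = i32;

/// Hypothetical helper: expands tabs to spaces using tab stops every
/// `tab_size` columns. The size is clamped to 1..=8 like `set_tab_size`.
fn expand_tabs(line: &str, tab_size: CoordType) -> String {
    let tab_size = tab_size.clamp(1, 8);
    let mut out = String::with_capacity(line.len());
    let mut column: CoordType = 0;
    for ch in line.chars() {
        if ch == '\t' {
            // Advance to the next multiple of `tab_size`.
            let spaces = tab_size - column % tab_size;
            out.push_str(&" ".repeat(spaces as usize));
            column += spaces;
        } else {
            out.push(ch);
            column += 1; // assumes every char occupies one cell
        }
    }
    out
}
```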
|
||||
|
||||
/// Returns whether tabs are used for indentation.
|
||||
pub fn indent_with_tabs(&self) -> bool {
|
||||
self.indent_with_tabs
|
||||
}
|
||||
|
||||
/// Sets whether tabs or spaces are used for indentation.
|
||||
pub fn set_indent_with_tabs(&mut self, indent_with_tabs: bool) {
|
||||
self.indent_with_tabs = indent_with_tabs;
|
||||
}
|
||||
|
||||
/// Sets whether the line the cursor is on should be highlighted.
|
||||
pub fn set_line_highlight_enabled(&mut self, enabled: bool) {
|
||||
self.line_highlight_enabled = enabled;
|
||||
}
|
||||
|
||||
/// Sets a ruler column, e.g. 80.
|
||||
pub fn set_ruler(&mut self, column: CoordType) {
|
||||
self.ruler = column;
|
||||
}
|
||||
|
@ -799,6 +905,7 @@ impl TextBuffer {
|
|||
Ok(())
|
||||
}
|
||||
|
||||
/// Returns whether there is a selection.
|
||||
pub fn has_selection(&self) -> bool {
|
||||
self.selection.is_some()
|
||||
}
|
||||
|
@ -809,6 +916,7 @@ impl TextBuffer {
|
|||
self.selection_generation
|
||||
}
|
||||
|
||||
/// Moves the cursor to `visual_pos` and updates the selection to contain it.
|
||||
pub fn selection_update_visual(&mut self, visual_pos: Point) {
|
||||
let cursor = self.cursor;
|
||||
self.set_cursor_for_selection(self.cursor_move_to_visual_internal(cursor, visual_pos));
|
||||
|
@ -826,6 +934,7 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Moves the cursor to `logical_pos` and updates the selection to contain it.
|
||||
pub fn selection_update_logical(&mut self, logical_pos: Point) {
|
||||
let cursor = self.cursor;
|
||||
self.set_cursor_for_selection(self.cursor_move_to_logical_internal(cursor, logical_pos));
|
||||
|
@ -843,6 +952,7 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Moves the cursor by `delta` and updates the selection to contain it.
|
||||
pub fn selection_update_delta(&mut self, granularity: CursorMovement, delta: CoordType) {
|
||||
let cursor = self.cursor;
|
||||
self.set_cursor_for_selection(self.cursor_move_delta_internal(cursor, granularity, delta));
|
||||
|
@ -860,6 +970,7 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Select the current word.
|
||||
pub fn select_word(&mut self) {
|
||||
let Range { start, end } = navigation::word_select(&self.buffer, self.cursor.offset);
|
||||
let beg = self.cursor_move_to_offset_internal(self.cursor, start);
|
||||
|
@ -871,6 +982,7 @@ impl TextBuffer {
|
|||
});
|
||||
}
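`navigation::word_select` is not shown in this diff. A simplified stand-in that expands left and right from a byte offset over alphanumerics and `_` could look like this. The character classes are an assumption, and the real implementation may treat punctuation or graphemes differently. `offset` must lie on a `char` boundary.

```rust
use std::ops::Range;

/// Hypothetical word selection around byte offset `offset`.
fn word_select(text: &str, offset: usize) -> Range<usize> {
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    // Walk backwards while the characters are word characters.
    let start = text[..offset]
        .char_indices()
        .rev()
        .take_while(|&(_, c)| is_word(c))
        .last()
        .map_or(offset, |(i, _)| i);
    // Walk forwards likewise; the end is exclusive.
    let end = text[offset..]
        .char_indices()
        .take_while(|&(_, c)| is_word(c))
        .last()
        .map_or(offset, |(i, c)| offset + i + c.len_utf8());
    start..end
}
```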
|
||||
|
||||
/// Select the current line.
|
||||
pub fn select_line(&mut self) {
|
||||
let beg = self.cursor_move_to_logical_internal(
|
||||
self.cursor,
|
||||
|
@ -885,6 +997,7 @@ impl TextBuffer {
|
|||
});
|
||||
}
|
||||
|
||||
/// Select the entire document.
|
||||
pub fn select_all(&mut self) {
|
||||
let beg = Default::default();
|
||||
let end = self.cursor_move_to_logical_internal(beg, Point::MAX);
|
||||
|
@ -895,18 +1008,23 @@ impl TextBuffer {
|
|||
});
|
||||
}
|
||||
|
||||
/// Turn an active selection into a finalized selection.
|
||||
///
|
||||
/// Any future cursor movement will destroy the selection.
|
||||
pub fn selection_finalize(&mut self) {
|
||||
if let TextBufferSelection::Active { beg, end } = self.selection {
|
||||
self.set_selection(TextBufferSelection::Done { beg, end });
|
||||
}
|
||||
}
|
||||
|
||||
/// Destroy the current selection. Returns whether there was one.
|
||||
pub fn clear_selection(&mut self) -> bool {
|
||||
let had_selection = self.selection.is_some();
|
||||
self.set_selection(TextBufferSelection::None);
|
||||
had_selection
|
||||
}
|
||||
|
||||
/// Find the next occurrence of the given `pattern` and select it.
|
||||
pub fn find_and_select(&mut self, pattern: &str, options: SearchOptions) -> apperr::Result<()> {
|
||||
if let Some(search) = &mut self.search {
|
||||
let search = search.get_mut();
|
||||
|
@ -959,6 +1077,7 @@ impl TextBuffer {
|
|||
Ok(())
|
||||
}
|
||||
|
||||
/// Find the next occurrence of the given `pattern` and replace it with `replacement`.
|
||||
pub fn find_and_replace(
|
||||
&mut self,
|
||||
pattern: &str,
|
||||
|
@ -978,6 +1097,7 @@ impl TextBuffer {
|
|||
self.find_and_select(pattern, options)
|
||||
}
|
||||
|
||||
/// Find all occurrences of the given `pattern` and replace them with `replacement`.
|
||||
pub fn find_and_replace_all(
|
||||
&mut self,
|
||||
pattern: &str,
|
||||
|
@ -1333,18 +1453,22 @@ impl TextBuffer {
|
|||
cursor
|
||||
}
|
||||
|
||||
/// Moves the cursor to the given offset.
|
||||
pub fn cursor_move_to_offset(&mut self, offset: usize) {
|
||||
unsafe { self.set_cursor(self.cursor_move_to_offset_internal(self.cursor, offset)) }
|
||||
}
|
||||
|
||||
/// Moves the cursor to the given logical position.
|
||||
pub fn cursor_move_to_logical(&mut self, pos: Point) {
|
||||
unsafe { self.set_cursor(self.cursor_move_to_logical_internal(self.cursor, pos)) }
|
||||
}
|
||||
|
||||
/// Moves the cursor to the given visual position.
|
||||
pub fn cursor_move_to_visual(&mut self, pos: Point) {
|
||||
unsafe { self.set_cursor(self.cursor_move_to_visual_internal(self.cursor, pos)) }
|
||||
}
|
||||
|
||||
/// Moves the cursor by the given delta.
|
||||
pub fn cursor_move_delta(&mut self, granularity: CursorMovement, delta: CoordType) {
|
||||
unsafe { self.set_cursor(self.cursor_move_delta_internal(self.cursor, granularity, delta)) }
|
||||
}
|
||||
|
@ -1847,11 +1971,13 @@ impl TextBuffer {
|
|||
self.edit_end();
|
||||
}
|
||||
|
||||
// TODO: This function is ripe for some optimizations:
|
||||
// * Instead of replacing the entire selection,
|
||||
// it should unindent each line directly (as if multiple cursors had been used).
|
||||
// * The cursor movement at the end is rather costly, but at least without word wrap
|
||||
// it should be possible to calculate it directly from the removed amount.
|
||||
/// Unindents the current selection or line.
|
||||
///
|
||||
/// TODO: This function is ripe for some optimizations:
|
||||
/// * Instead of replacing the entire selection,
|
||||
/// it should unindent each line directly (as if multiple cursors had been used).
|
||||
/// * The cursor movement at the end is rather costly, but at least without word wrap
|
||||
/// it should be possible to calculate it directly from the removed amount.
|
||||
pub fn unindent(&mut self) {
|
||||
let mut selection_beg = self.cursor.logical_pos;
|
||||
let mut selection_end = selection_beg;
|
||||
|
@ -1927,7 +2053,8 @@ impl TextBuffer {
|
|||
self.set_cursor_internal(self.cursor_move_to_logical_internal(self.cursor, selection_end));
|
||||
}
|
||||
|
||||
/// Extracts a chunk of text or a line if no selection is active. May optionally delete it.
|
||||
/// Extracts the contents of the current selection, or of the current line if there is no selection.
|
||||
/// May optionally delete it, if requested. This is meant to be used for Ctrl+C / Ctrl+X.
|
||||
pub fn extract_selection(&mut self, delete: bool) -> Vec<u8> {
|
||||
let Some((beg, end)) = self.selection_range_internal(true) else {
|
||||
return Vec::new();
|
||||
|
@ -1946,6 +2073,9 @@ impl TextBuffer {
|
|||
out
|
||||
}
|
||||
|
||||
/// Extracts the contents of the current selection the user made.
|
||||
/// This differs from [`TextBuffer::extract_selection()`] in that
|
||||
/// it does nothing if the selection was made by searching.
|
||||
pub fn extract_user_selection(&mut self, delete: bool) -> Option<Vec<u8>> {
|
||||
if !self.has_selection() {
|
||||
return None;
|
||||
|
@ -1961,10 +2091,17 @@ impl TextBuffer {
|
|||
Some(self.extract_selection(delete))
|
||||
}
|
||||
|
||||
/// Returns the current selection anchors, or `None` if there
|
||||
/// is no selection. The returned logical positions are sorted.
|
||||
pub fn selection_range(&self) -> Option<(Cursor, Cursor)> {
|
||||
self.selection_range_internal(false)
|
||||
}
|
||||
|
||||
/// Returns the current selection anchors.
|
||||
///
|
||||
/// If there's no selection and `line_fallback` is `true`,
|
||||
/// the start/end of the current line are returned.
|
||||
/// This is meant to be used for Ctrl+C / Ctrl+X.
|
||||
fn selection_range_internal(&self, line_fallback: bool) -> Option<(Cursor, Cursor)> {
|
||||
let [beg, end] = match self.selection {
|
||||
TextBufferSelection::None if !line_fallback => return None,
|
||||
|
@ -1983,6 +2120,8 @@ impl TextBuffer {
|
|||
if beg.offset < end.offset { Some((beg, end)) } else { None }
|
||||
}
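The line fallback described for `selection_range_internal` can be sketched on a plain string. `Selection` and `selection_or_line` are hypothetical simplifications (byte offsets instead of `Cursor`s), and including the trailing newline in the line range is an assumption:

```rust
use std::ops::Range;

/// Hypothetical simplified selection; `beg` may come after `end`
/// if the user selected backwards.
#[derive(Clone, Copy)]
enum Selection {
    None,
    Active { beg: usize, end: usize },
}

/// Returns the sorted selection range, or the current line when there is
/// no selection and `line_fallback` is set. Empty ranges yield `None`.
fn selection_or_line(text: &str, sel: Selection, cursor: usize, line_fallback: bool) -> Option<Range<usize>> {
    let (beg, end) = match sel {
        Selection::Active { beg, end } => (beg.min(end), beg.max(end)),
        Selection::None if line_fallback => {
            let start = text[..cursor].rfind('\n').map_or(0, |i| i + 1);
            let stop = text[cursor..].find('\n').map_or(text.len(), |i| cursor + i + 1);
            (start, stop)
        }
        Selection::None => return None,
    };
    if beg < end { Some(beg..end) } else { None }
}
```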
|
||||
|
||||
/// Starts a new edit operation.
|
||||
/// This is used for tracking the undo/redo history.
|
||||
fn edit_begin(&mut self, history_type: HistoryType, cursor: Cursor) {
|
||||
self.active_edit_depth += 1;
|
||||
if self.active_edit_depth > 1 {
|
||||
|
@ -2033,6 +2172,8 @@ impl TextBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Writes `text` into the buffer at the current cursor position.
|
||||
/// It records the change in the undo stack.
|
||||
fn edit_write(&mut self, text: &[u8]) {
|
||||
let logical_y_before = self.cursor.logical_pos.y;
|
||||
|
||||
|
@ -2052,6 +2193,8 @@ impl TextBuffer {
|
|||
self.stats.logical_lines += self.cursor.logical_pos.y - logical_y_before;
|
||||
}
|
||||
|
||||
/// Deletes the text between the current cursor position and `to`.
|
||||
/// It records the change in the undo stack.
|
||||
fn edit_delete(&mut self, to: Cursor) {
|
||||
debug_assert!(to.offset >= self.active_edit_off);
|
||||
|
||||
|
@ -2076,6 +2219,8 @@ impl TextBuffer {
|
|||
self.stats.logical_lines += logical_y_before - to.logical_pos.y;
|
||||
}
|
||||
|
||||
/// Finalizes the current edit operation
|
||||
/// and recalculates the line statistics.
|
||||
fn edit_end(&mut self) {
|
||||
self.active_edit_depth -= 1;
|
||||
assert!(self.active_edit_depth >= 0);
|
||||
|
@ -2125,10 +2270,12 @@ impl TextBuffer {
|
|||
self.reflow(false);
|
||||
}
|
||||
|
||||
/// Undo the last edit operation.
|
||||
pub fn undo(&mut self) {
|
||||
self.undo_redo(true);
|
||||
}
|
||||
|
||||
/// Redo the last undo operation.
|
||||
pub fn redo(&mut self) {
|
||||
self.undo_redo(false);
|
||||
}
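`edit_begin` increments `active_edit_depth`, and `edit_end` decrements it and asserts it never goes negative. A heavily simplified sketch of why that nesting matters for undo/redo: only the outermost `edit_begin` opens a history entry, so nested writes undo as one unit. `History` snapshots the whole text, which is an assumption for brevity; the real buffer records individual changes.

```rust
/// Hypothetical, snapshot-based undo history.
struct History {
    text: String,
    undo: Vec<String>,
    redo: Vec<String>,
    depth: i32,
}

impl History {
    fn new(text: &str) -> Self {
        Self { text: text.to_string(), undo: Vec::new(), redo: Vec::new(), depth: 0 }
    }

    /// Only the outermost call opens a new undo entry.
    fn edit_begin(&mut self) {
        self.depth += 1;
        if self.depth == 1 {
            self.undo.push(self.text.clone());
            self.redo.clear(); // a new edit invalidates the redo stack
        }
    }

    fn edit_end(&mut self) {
        self.depth -= 1;
        assert!(self.depth >= 0, "unbalanced edit_end");
    }

    /// A write is itself an edit, so it nests inside any outer edit.
    fn write(&mut self, s: &str) {
        self.edit_begin();
        self.text.push_str(s);
        self.edit_end();
    }

    fn undo(&mut self) {
        if let Some(prev) = self.undo.pop() {
            self.redo.push(std::mem::replace(&mut self.text, prev));
        }
    }

    fn redo(&mut self) {
        if let Some(next) = self.redo.pop() {
            self.undo.push(std::mem::replace(&mut self.text, next));
        }
    }
}
```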
|
||||
|
@ -2238,10 +2385,12 @@ impl TextBuffer {
|
|||
self.reflow(false);
|
||||
}
|
||||
|
||||
/// For interfacing with ICU.
|
||||
pub(crate) fn read_backward(&self, off: usize) -> &[u8] {
|
||||
self.buffer.read_backward(off)
|
||||
}
|
||||
|
||||
/// For interfacing with ICU.
|
||||
pub fn read_forward(&self, off: usize) -> &[u8] {
|
||||
self.buffer.read_forward(off)
|
||||
}
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
//! Like `RefCell`, but without any runtime checks in release mode.
|
||||
//! [`std::cell::RefCell`], but without runtime checks in release builds.
|
||||
|
||||
#[cfg(debug_assertions)]
|
||||
pub use debug::*;
|
||||
|
|
|
@ -8,7 +8,7 @@ use std::path::PathBuf;
|
|||
use crate::arena::{ArenaString, scratch_arena};
|
||||
use crate::helpers::ReplaceRange as _;
|
||||
|
||||
/// An abstraction over potentially chunked text containers.
|
||||
/// An abstraction over reading from text containers.
|
||||
pub trait ReadableDocument {
|
||||
/// Read some bytes starting at (including) the given absolute offset.
|
||||
///
|
||||
|
@ -16,7 +16,7 @@ pub trait ReadableDocument {
|
|||
///
|
||||
/// * Be lenient on inputs:
|
||||
/// * The given offset may be out of bounds and you MUST clamp it.
|
||||
/// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries.
|
||||
/// * You should not assume that offsets are at grapheme cluster boundaries.
|
||||
/// * Be strict on outputs:
|
||||
/// * You MUST NOT break grapheme clusters across chunks.
|
||||
/// * You MUST NOT return an empty slice unless the offset is at or beyond the end.
|
||||
|
@ -28,14 +28,21 @@ pub trait ReadableDocument {
|
|||
///
|
||||
/// * Be lenient on inputs:
|
||||
/// * The given offset may be out of bounds and you MUST clamp it.
|
||||
/// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries.
|
||||
/// * You should not assume that offsets are at grapheme cluster boundaries.
|
||||
/// * Be strict on outputs:
|
||||
/// * You MUST NOT break grapheme clusters across chunks.
|
||||
/// * You MUST NOT return an empty slice unless the offset is zero.
|
||||
fn read_backward(&self, off: usize) -> &[u8];
|
||||
}
|
||||
|
||||
/// An abstraction over writing to text containers.
|
||||
pub trait WriteableDocument: ReadableDocument {
|
||||
/// Replace the given range with the given bytes.
|
||||
///
|
||||
/// # Warning
|
||||
///
|
||||
/// * The given range may be out of bounds and you MUST clamp it.
|
||||
/// * The replacement may not be valid UTF-8.
|
||||
fn replace(&mut self, range: Range<usize>, replacement: &[u8]);
|
||||
}
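As an illustration of the contracts above, here is a sketch implementing both traits for a single contiguous `Vec<u8>`. The traits are restated locally, and the `read_forward` signature is assumed to mirror `read_backward`, since its declaration is elided. A single chunk trivially satisfies the grapheme rule, and `read_all` shows how a caller relies on "empty slice only at the end".

```rust
use std::ops::Range;

// Local restatement for illustration, not the crate's definitions.
trait ReadableDocument {
    fn read_forward(&self, off: usize) -> &[u8];
    fn read_backward(&self, off: usize) -> &[u8];
}

trait WriteableDocument: ReadableDocument {
    fn replace(&mut self, range: Range<usize>, replacement: &[u8]);
}

impl ReadableDocument for Vec<u8> {
    fn read_forward(&self, off: usize) -> &[u8] {
        let off = off.min(self.len()); // clamp out-of-bounds offsets
        &self[off..]
    }
    fn read_backward(&self, off: usize) -> &[u8] {
        let off = off.min(self.len());
        &self[..off]
    }
}

impl WriteableDocument for Vec<u8> {
    fn replace(&mut self, range: Range<usize>, replacement: &[u8]) {
        // Clamp both ends so out-of-bounds ranges are tolerated.
        let end = range.end.min(self.len());
        let start = range.start.min(end);
        let tail = self.split_off(end);
        self.truncate(start);
        self.extend_from_slice(replacement);
        self.extend_from_slice(&tail);
    }
}

/// Collects a whole document by calling `read_forward` until it returns
/// an empty slice, which the contract only allows at or beyond the end.
fn read_all(doc: &dyn ReadableDocument) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let chunk = doc.read_forward(out.len());
        if chunk.is_empty() {
            break;
        }
        out.extend_from_slice(chunk);
    }
    out
}
```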
|
||||
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
//! A shoddy framebuffer for terminal applications.
|
||||
|
||||
use std::cell::Cell;
|
||||
use std::fmt::Write;
|
||||
use std::ops::{BitOr, BitXor};
|
||||
|
@ -24,6 +26,7 @@ const CACHE_TABLE_SIZE: usize = 1 << CACHE_TABLE_LOG2_SIZE;
|
|||
/// 8 bits out, but rather shift 56 bits down to get the best bits from the top.
|
||||
const CACHE_TABLE_SHIFT: usize = usize::BITS as usize - CACHE_TABLE_LOG2_SIZE;
|
||||
|
||||
/// Standard 16 VT & default foreground/background colors.
|
||||
#[derive(Clone, Copy)]
|
||||
pub enum IndexedColor {
|
||||
Black,
|
||||
|
@ -47,33 +50,55 @@ pub enum IndexedColor {
|
|||
Foreground,
|
||||
}
|
||||
|
||||
/// Number of indices used by [`IndexedColor`].
|
||||
pub const INDEXED_COLORS_COUNT: usize = 18;
|
||||
|
||||
/// Fallback theme.
|
||||
pub const DEFAULT_THEME: [u32; INDEXED_COLORS_COUNT] = [
|
||||
0xff000000, 0xff212cbe, 0xff3aae3f, 0xff4a9abe, 0xffbe4d20, 0xffbe54bb, 0xffb2a700, 0xffbebebe,
|
||||
0xff808080, 0xff303eff, 0xff51ea58, 0xff44c9ff, 0xffff6a2f, 0xffff74fc, 0xfff0e100, 0xffffffff,
|
||||
0xff000000, 0xffffffff,
|
||||
];
|
||||
|
||||
/// A shoddy framebuffer for terminal applications.
|
||||
///
|
||||
/// The idea is that you create a [`Framebuffer`], draw a bunch of text and
|
||||
/// colors into it, and it takes care of figuring out what changed since the
|
||||
/// last rendering and sending the differences as VT to the terminal.
|
||||
///
|
||||
/// This is an improvement over how many other terminal applications work,
|
||||
/// as they fail to accurately track what changed. If you watch the output
|
||||
/// of `vim` for instance, you'll notice that it redraws unrelated parts of
|
||||
/// the screen all the time.
|
||||
pub struct Framebuffer {
|
||||
/// Store the color palette.
|
||||
indexed_colors: [u32; INDEXED_COLORS_COUNT],
|
||||
/// Front and back buffers. Indexed by `frame_counter & 1`.
|
||||
buffers: [Buffer; 2],
|
||||
/// The current frame counter. Increments on every `flip` call.
|
||||
frame_counter: usize,
|
||||
auto_colors: [u32; 2], // [dark, light]
|
||||
/// The colors used for `contrasted()`. It stores the default colors
|
||||
/// of the palette as [dark, light], unless the palette is recognized
|
||||
/// as a light theme, in which case it swaps them.
|
||||
auto_colors: [u32; 2],
|
||||
/// A cache table for previously contrasted colors.
|
||||
/// See: <https://fgiesen.wordpress.com/2019/02/11/cache-tables/>
|
||||
contrast_colors: [Cell<(u32, u32)>; CACHE_TABLE_SIZE],
|
||||
}
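A toy model of the double-buffering idea described above: two frames selected by `frame_counter & 1`, with `render` emitting VT only for rows that differ from the previous frame. `MiniFramebuffer` is hypothetical; it compares whole rows of plain text and ignores colors, attributes, and the cursor, which the real `Framebuffer` also tracks.

```rust
/// Hypothetical miniature framebuffer: rows of plain text only.
struct MiniFramebuffer {
    buffers: [Vec<String>; 2],
    frame_counter: usize,
}

impl MiniFramebuffer {
    fn new(height: usize) -> Self {
        Self { buffers: [vec![String::new(); height], vec![String::new(); height]], frame_counter: 0 }
    }

    /// Starts a new frame: the other buffer becomes the (cleared) back buffer.
    fn flip(&mut self) {
        self.frame_counter += 1;
        let back = &mut self.buffers[self.frame_counter & 1];
        back.iter_mut().for_each(String::clear);
    }

    fn set_row(&mut self, y: usize, text: &str) {
        let back = &mut self.buffers[self.frame_counter & 1];
        back[y].clear();
        back[y].push_str(text);
    }

    /// Emits only rows that changed since the previous frame.
    fn render(&self) -> String {
        let back = &self.buffers[self.frame_counter & 1];
        let front = &self.buffers[(self.frame_counter + 1) & 1];
        let mut out = String::new();
        for (y, (new, old)) in back.iter().zip(front).enumerate() {
            if new != old {
                // CUP to (row y+1, column 1), erase the line (EL), write the text.
                out.push_str(&format!("\x1b[{};1H\x1b[K{}", y + 1, new));
            }
        }
        out
    }
}
```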
|
||||
|
||||
impl Framebuffer {
|
||||
/// Creates a new framebuffer.
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
indexed_colors: DEFAULT_THEME,
|
||||
buffers: Default::default(),
|
||||
frame_counter: 0,
|
||||
auto_colors: [0, 0],
|
||||
contrast_colors: [const { Cell::new((0, 0)) }; 256],
|
||||
contrast_colors: [const { Cell::new((0, 0)) }; CACHE_TABLE_SIZE],
|
||||
}
|
||||
}
|
||||
|
||||
/// Sets the base color palette.
|
||||
pub fn set_indexed_colors(&mut self, colors: [u32; INDEXED_COLORS_COUNT]) {
|
||||
self.indexed_colors = colors;
|
||||
|
||||
|
@ -86,6 +111,7 @@ impl Framebuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Begins a new frame with the given `size`.
|
||||
pub fn flip(&mut self, size: Size) {
|
||||
if size != self.buffers[0].bg_bitmap.size {
|
||||
for buffer in &mut self.buffers {
|
||||
|
@ -117,9 +143,7 @@ impl Framebuffer {
|
|||
|
||||
/// Replaces text contents in a single line of the framebuffer.
|
||||
/// All coordinates are in viewport coordinates.
|
||||
/// Assumes that all tabs have been replaced with spaces.
|
||||
///
|
||||
/// TODO: This function is ripe for performance improvements.
|
||||
/// Assumes that control characters have been replaced or escaped.
|
||||
pub fn replace_text(
|
||||
&mut self,
|
||||
y: CoordType,
|
||||
|
@ -131,6 +155,18 @@ impl Framebuffer {
|
|||
back.text.replace_text(y, origin_x, clip_right, text)
|
||||
}
|
||||
|
||||
/// Draws a scrollbar in the given `track` rectangle.
|
||||
///
|
||||
/// Not entirely sure why I put it here instead of elsewhere.
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `clip_rect`: Clips the rendering to this rectangle.
|
||||
/// This is relevant when you have scrollareas inside scrollareas.
|
||||
/// * `track`: The rectangle in which to draw the scrollbar.
|
||||
/// In absolute viewport coordinates.
|
||||
/// * `content_offset`: The current offset of the scrollarea.
|
||||
/// * `content_height`: The height of the scrollarea content.
|
||||
pub fn draw_scrollbar(
|
||||
&mut self,
|
||||
clip_rect: Rect,
|
||||
|
@ -247,8 +283,10 @@ impl Framebuffer {
|
|||
self.indexed_colors[index as usize]
|
||||
}
|
||||
|
||||
// To facilitate constant folding by the compiler,
|
||||
// alpha is given as a fraction (`numerator` / `denominator`).
|
||||
/// Returns a color from the palette with the given alpha applied.
|
||||
///
|
||||
/// To facilitate constant folding by the compiler,
|
||||
/// alpha is given as a fraction (`numerator` / `denominator`).
|
||||
#[inline]
|
||||
pub fn indexed_alpha(&self, index: IndexedColor, numerator: u32, denominator: u32) -> u32 {
|
||||
let c = self.indexed_colors[index as usize];
|
||||
|
@ -259,6 +297,7 @@ impl Framebuffer {
|
|||
a << 24 | r << 16 | g << 8 | b
|
||||
}
|
||||
|
||||
/// Returns a color opposite to the brightness of the given `color`.
|
||||
pub fn contrasted(&self, color: u32) -> u32 {
|
||||
let idx = (color as usize).wrapping_mul(HASH_MULTIPLIER) >> CACHE_TABLE_SHIFT;
|
||||
let slot = self.contrast_colors[idx].get();
|
||||
|
@ -277,16 +316,25 @@ impl Framebuffer {
|
|||
srgb_to_oklab(color).l < 0.5
|
||||
}
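The `contrast_colors` field is a direct-mapped cache table (see the article linked in its docs): a key is hashed to exactly one slot, a hit returns the stored value, and a miss simply overwrites the slot. A sketch using multiplicative hashing that keeps the top 8 bits of the product. The 64-bit golden-ratio multiplier is an assumption, since the crate's `HASH_MULTIPLIER` value isn't shown, and `ContrastCache` is a hypothetical name. Storing `Option` avoids mistaking an empty slot for a cached color `0`.

```rust
use std::cell::Cell;

const CACHE_TABLE_LOG2_SIZE: u32 = 8;
const CACHE_TABLE_SIZE: usize = 1 << CACHE_TABLE_LOG2_SIZE;
// Keep the top LOG2_SIZE bits of the 64-bit product: they are the best mixed.
const HASH_SHIFT: u32 = 64 - CACHE_TABLE_LOG2_SIZE;
// Assumed multiplier (2^64 / golden ratio), not the crate's constant.
const HASH_MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

/// Hypothetical direct-mapped cache from a color to its contrasted color.
struct ContrastCache {
    slots: [Cell<Option<(u32, u32)>>; CACHE_TABLE_SIZE],
}

impl ContrastCache {
    fn new() -> Self {
        Self { slots: std::array::from_fn(|_| Cell::new(None)) }
    }

    fn get_or_insert_with(&self, color: u32, compute: impl FnOnce(u32) -> u32) -> u32 {
        let idx = ((color as u64).wrapping_mul(HASH_MULTIPLIER) >> HASH_SHIFT) as usize;
        let slot = &self.slots[idx];
        if let Some((key, value)) = slot.get() {
            if key == color {
                return value; // hit
            }
        }
        // Miss: compute and overwrite whatever was in the slot (no chaining).
        let value = compute(color);
        slot.set(Some((color, value)));
        value
    }
}
```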
|
||||
|
||||
/// Blends the given sRGB color onto the background bitmap.
|
||||
///
|
||||
/// TODO: The current approach blends foreground/background independently,
|
||||
/// but ideally blending a semi-transparent dark color with `blend_bg` should also darken the text below it.
|
||||
pub fn blend_bg(&mut self, target: Rect, bg: u32) {
|
||||
let back = &mut self.buffers[self.frame_counter & 1];
|
||||
back.bg_bitmap.blend(target, bg);
|
||||
}
|
||||
|
||||
/// Blends the given sRGB color onto the foreground bitmap.
|
||||
///
|
||||
/// TODO: The current approach blends foreground/background independently,
|
||||
/// but ideally `blend_fg` should blend with the background color below it.
|
||||
pub fn blend_fg(&mut self, target: Rect, fg: u32) {
|
||||
let back = &mut self.buffers[self.frame_counter & 1];
|
||||
back.fg_bitmap.blend(target, fg);
|
||||
}
|
||||
|
||||
/// Reverses the foreground and background colors in the given rectangle.
|
||||
pub fn reverse(&mut self, target: Rect) {
|
||||
let back = &mut self.buffers[self.frame_counter & 1];
|
||||
|
||||
|
@ -310,17 +358,23 @@ impl Framebuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Replaces VT attributes in the given rectangle.
|
||||
pub fn replace_attr(&mut self, target: Rect, mask: Attributes, attr: Attributes) {
|
||||
let back = &mut self.buffers[self.frame_counter & 1];
|
||||
back.attributes.replace(target, mask, attr);
|
||||
}
|
||||
|
||||
/// Sets the current visible cursor position and type.
|
||||
///
|
||||
/// Call this when focus is inside an editable area and you want to show the cursor.
|
||||
pub fn set_cursor(&mut self, pos: Point, overtype: bool) {
|
||||
let back = &mut self.buffers[self.frame_counter & 1];
|
||||
back.cursor.pos = pos;
|
||||
back.cursor.overtype = overtype;
|
||||
}
|
||||
|
||||
/// Renders the framebuffer contents accumulated since the
|
||||
/// last call to `flip()` and returns them serialized as VT.
|
||||
pub fn render<'a>(&mut self, arena: &'a Arena) -> ArenaString<'a> {
|
||||
let idx = self.frame_counter & 1;
|
||||
// Borrows the front/back buffers without letting Rust know that we have a reference to self.
|
||||
|
@ -484,6 +538,7 @@ struct Buffer {
|
|||
cursor: Cursor,
|
||||
}
|
||||
|
||||
/// A buffer for the text contents of the framebuffer.
|
||||
#[derive(Default)]
|
||||
struct LineBuffer {
|
||||
lines: Vec<String>,
|
||||
|
@ -509,10 +564,8 @@ impl LineBuffer {
|
|||
|
||||
/// Replaces text contents in a single line of the framebuffer.
|
||||
/// All coordinates are in viewport coordinates.
|
||||
/// Assumes that all tabs have been replaced with spaces.
|
||||
///
|
||||
/// TODO: This function is ripe for performance improvements.
|
||||
pub fn replace_text(
|
||||
/// Assumes that control characters have been replaced or escaped.
|
||||
fn replace_text(
|
||||
&mut self,
|
||||
y: CoordType,
|
||||
origin_x: CoordType,
|
||||
|
@ -632,6 +685,7 @@ impl LineBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// An sRGB bitmap.
|
||||
#[derive(Default)]
|
||||
struct Bitmap {
|
||||
data: Vec<u32>,
|
||||
|
@ -647,6 +701,10 @@ impl Bitmap {
|
|||
memset(&mut self.data, color);
|
||||
}
|
||||
|
||||
/// Blends the given sRGB color onto the bitmap.
|
||||
///
|
||||
/// This uses the `oklab` color space for blending so the
|
||||
/// resulting colors may look different from what you'd expect.
|
||||
fn blend(&mut self, target: Rect, color: u32) {
|
||||
if (color & 0xff000000) == 0x00000000 {
|
||||
return;
|
||||
|
@ -700,11 +758,14 @@ impl Bitmap {
|
|||
}
|
||||
}
|
||||
|
||||
/// A bitfield for VT text attributes.
|
||||
///
|
||||
/// Being a bitfield, it allows for simple diffing.
|
||||
#[repr(transparent)]
|
||||
#[derive(Default, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct Attributes(u8);
|
||||
|
||||
#[allow(non_upper_case_globals)] // Mimics an enum, but it's actually a bitfield. Allows simple diffing.
|
||||
#[allow(non_upper_case_globals)]
|
||||
impl Attributes {
|
||||
pub const None: Attributes = Attributes(0);
|
||||
pub const Italic: Attributes = Attributes(0b1);
|
||||
|
@ -734,6 +795,7 @@ impl BitXor for Attributes {
|
|||
}
|
||||
}
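Because `Attributes` is a bitfield, XOR-ing the previous and next values yields exactly the flags that changed. A sketch of turning that into SGR sequences (3/23 toggle italic, 4/24 toggle underline). The `Underlined` flag and `sgr_diff` are assumptions for illustration; only `None` and `Italic` are visible in this hunk.

```rust
use std::ops::{BitOr, BitXor};

#[repr(transparent)]
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
struct Attributes(u8);

#[allow(non_upper_case_globals)]
impl Attributes {
    const Italic: Attributes = Attributes(0b1);
    const Underlined: Attributes = Attributes(0b10); // assumed flag

    fn contains(self, other: Attributes) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Attributes {
    type Output = Attributes;
    fn bitor(self, rhs: Self) -> Self {
        Attributes(self.0 | rhs.0)
    }
}

impl BitXor for Attributes {
    type Output = Attributes;
    fn bitxor(self, rhs: Self) -> Self {
        Attributes(self.0 ^ rhs.0)
    }
}

/// Hypothetical diff: only flags set in `old ^ new` produce output.
fn sgr_diff(old: Attributes, new: Attributes) -> String {
    let changed = old ^ new;
    let mut out = String::new();
    if changed.contains(Attributes::Italic) {
        out.push_str(if new.contains(Attributes::Italic) { "\x1b[3m" } else { "\x1b[23m" });
    }
    if changed.contains(Attributes::Underlined) {
        out.push_str(if new.contains(Attributes::Underlined) { "\x1b[4m" } else { "\x1b[24m" });
    }
    out
}
```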
|
||||
|
||||
/// Stores VT attributes for the framebuffer.
|
||||
#[derive(Default)]
|
||||
struct AttributeBuffer {
|
||||
data: Vec<Attributes>,
|
||||
|
@ -782,6 +844,7 @@ impl AttributeBuffer {
|
|||
}
|
||||
}
|
||||
|
||||
/// Stores cursor position and type for the framebuffer.
|
||||
#[derive(Default, PartialEq, Eq)]
|
||||
struct Cursor {
|
||||
pos: Point,
|
||||
|
|
|
@ -1,3 +1,12 @@
|
|||
//! Provides fast, non-cryptographic hash functions.
|
||||
|
||||
/// The venerable wyhash hash function.
|
||||
///
|
||||
/// It's fast, has good statistical properties, and is in the public domain.
|
||||
/// See: <https://github.com/wangyi-fudan/wyhash>
|
||||
/// If you visit the link, you'll find that it was superseded by "rapidhash",
|
||||
/// but that's not particularly interesting for this project. rapidhash results
|
||||
/// in way larger assembly and isn't faster when hashing small amounts of data.
|
||||
pub fn hash(mut seed: u64, data: &[u8]) -> u64 {
|
||||
unsafe {
|
||||
const S0: u64 = 0xa0761d6478bd642f;
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
//! Random assortment of helpers I didn't know where to put.
|
||||
|
||||
use std::alloc::Allocator;
|
||||
use std::cmp::Ordering;
|
||||
use std::io::Read;
|
||||
|
@ -15,11 +17,17 @@ pub const KIBI: usize = 1024;
|
|||
pub const MEBI: usize = 1024 * 1024;
|
||||
pub const GIBI: usize = 1024 * 1024 * 1024;
|
||||
|
||||
/// A viewport coordinate type used throughout the application.
|
||||
pub type CoordType = i32;
|
||||
|
||||
/// To avoid overflow when adding two coordinates together (e.g. two [`CoordType::MAX`] values),
|
||||
/// you can use [`COORD_TYPE_SAFE_MIN`] and [`COORD_TYPE_SAFE_MAX`] instead.
|
||||
pub const COORD_TYPE_SAFE_MAX: CoordType = 32767;
|
||||
|
||||
/// See [`COORD_TYPE_SAFE_MAX`].
|
||||
pub const COORD_TYPE_SAFE_MIN: CoordType = -32767 - 1;
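A quick check of the headroom: two `COORD_TYPE_SAFE_MAX` values sum to 65534, far below `i32::MAX`, whereas `CoordType::MAX + CoordType::MAX` overflows (and panics in debug builds). `right_edge` is a hypothetical helper showing the safe constant used as an "unbounded" width:

```rust
type CoordType = i32;
const COORD_TYPE_SAFE_MAX: CoordType = 32767;
const COORD_TYPE_SAFE_MIN: CoordType = -32767 - 1;

/// Hypothetical helper: right edge of a box whose width may be "unbounded".
/// With both operands in the safe range, the sum cannot overflow `i32`.
fn right_edge(left: CoordType, width: CoordType) -> CoordType {
    debug_assert!((COORD_TYPE_SAFE_MIN..=COORD_TYPE_SAFE_MAX).contains(&left));
    debug_assert!((COORD_TYPE_SAFE_MIN..=COORD_TYPE_SAFE_MAX).contains(&width));
    left + width
}
```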
|
||||
|
||||
/// A 2D point. Uses [`CoordType`].
|
||||
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct Point {
|
||||
pub x: CoordType,
|
||||
|
@ -46,6 +54,7 @@ impl Ord for Point {
|
|||
}
|
||||
}
|
||||
|
||||
/// A 2D size. Uses [`CoordType`].
|
||||
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct Size {
|
||||
pub width: CoordType,
|
||||
|
@ -58,6 +67,7 @@ impl Size {
|
|||
}
|
||||
}
|
||||
|
||||
/// A 2D rectangle. Uses [`CoordType`].
|
||||
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct Rect {
|
||||
pub left: CoordType,
|
||||
|
@ -67,34 +77,44 @@ pub struct Rect {
|
|||
}
|
||||
|
||||
impl Rect {
|
||||
/// Mimics CSS's `padding` property where `padding: a` is `a a a a`.
|
||||
pub fn one(value: CoordType) -> Self {
|
||||
Self { left: value, top: value, right: value, bottom: value }
|
||||
}
|
||||
|
||||
/// Mimics CSS's `padding` property where `padding: a b` is `a b a b`,
|
||||
/// and `a` is top/bottom and `b` is left/right.
|
||||
pub fn two(top_bottom: CoordType, left_right: CoordType) -> Self {
|
||||
Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom }
|
||||
}
|
||||
|
||||
/// Mimics CSS's `padding` property where `padding: a b c` is `a b c b`,
|
||||
/// and `a` is top, `b` is left/right, and `c` is bottom.
|
||||
pub fn three(top: CoordType, left_right: CoordType, bottom: CoordType) -> Self {
|
||||
Self { left: left_right, top, right: left_right, bottom }
|
||||
}
|
||||
|
||||
/// Is the rectangle empty?
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.left >= self.right || self.top >= self.bottom
|
||||
}
|
||||
|
||||
/// Width of the rectangle.
|
||||
pub fn width(&self) -> CoordType {
|
||||
self.right - self.left
|
||||
}
|
||||
|
||||
/// Height of the rectangle.
|
||||
pub fn height(&self) -> CoordType {
|
||||
self.bottom - self.top
|
||||
}
|
||||
|
||||
/// Check if it contains a point.
|
||||
pub fn contains(&self, point: Point) -> bool {
|
||||
point.x >= self.left && point.x < self.right && point.y >= self.top && point.y < self.bottom
|
||||
}
|
||||
|
||||
/// Intersect two rectangles.
|
||||
pub fn intersect(&self, rhs: Self) -> Self {
|
||||
let l = self.left.max(rhs.left);
|
||||
let t = self.top.max(rhs.top);
|
||||
|
@ -110,7 +130,7 @@ impl Rect {
|
|||
}
|
||||
}
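Since the CSS-style constructors store padding amounts rather than coordinates, a typical use is shrinking a rectangle. A self-contained sketch with a hypothetical `inset` helper. The `intersect` here assumes that a non-overlapping result collapses to an empty default rectangle, because the original body is partly elided.

```rust
type CoordType = i32;

#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
struct Rect {
    left: CoordType,
    top: CoordType,
    right: CoordType,
    bottom: CoordType,
}

impl Rect {
    /// `padding: a b`: `a` is top/bottom, `b` is left/right.
    fn two(top_bottom: CoordType, left_right: CoordType) -> Self {
        Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom }
    }

    fn is_empty(&self) -> bool {
        self.left >= self.right || self.top >= self.bottom
    }

    fn intersect(&self, rhs: Self) -> Self {
        let l = self.left.max(rhs.left);
        let t = self.top.max(rhs.top);
        let r = self.right.min(rhs.right);
        let b = self.bottom.min(rhs.bottom);
        if l >= r || t >= b { Self::default() } else { Self { left: l, top: t, right: r, bottom: b } }
    }
}

/// Hypothetical helper: shrinks `outer` by the amounts stored in `padding`.
fn inset(outer: Rect, padding: Rect) -> Rect {
    Rect {
        left: outer.left + padding.left,
        top: outer.top + padding.top,
        right: outer.right - padding.right,
        bottom: outer.bottom - padding.bottom,
    }
}
```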
|
||||
|
||||
/// `std::cmp::minmax` is unstable, as per usual.
|
||||
/// [`std::cmp::minmax`] is unstable, as per usual.
|
||||
pub fn minmax<T>(v1: T, v2: T) -> [T; 2]
|
||||
where
|
||||
T: Ord,
|
||||
|
@ -145,12 +165,16 @@ pub const unsafe fn str_from_raw_parts<'a>(ptr: *const u8, len: usize) -> &'a st
|
|||
unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
|
||||
}
|
||||
|
||||
/// [`<[T]>::copy_from_slice`] panics if the two slices have different lengths.
|
||||
/// This one just returns the copied amount.
|
||||
pub fn slice_copy_safe<T: Copy>(dst: &mut [T], src: &[T]) -> usize {
|
||||
let len = src.len().min(dst.len());
|
||||
unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), len) };
|
||||
len
|
||||
}
|
||||
|
||||
/// [`Vec::splice`] results in really bad assembly.
|
||||
/// This doesn't. Don't use [`Vec::splice`].
|
||||
pub trait ReplaceRange<T: Copy> {
|
||||
fn replace_range<R: RangeBounds<usize>>(&mut self, range: R, src: &[T]);
|
||||
}
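`vec_replace_impl` is elided above. A sketch of the same idea without `Vec::splice`: shift the tail with `copy_within`, then copy the replacement in. `vec_replace` is a hypothetical name, and its `T: Default` bound (used only to grow the vector) is an assumption; an implementation could avoid it by writing into uninitialized capacity instead.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `ReplaceRange::replace_range` on `Vec<T>`.
fn vec_replace<T: Copy + Default>(dst: &mut Vec<T>, range: Range<usize>, src: &[T]) {
    let Range { start, end } = range;
    assert!(start <= end && end <= dst.len(), "range out of bounds");
    let old_len = dst.len();
    let removed = end - start;
    if src.len() > removed {
        // Grow first, then shift the tail right to make room.
        dst.resize(old_len + src.len() - removed, T::default());
        dst.copy_within(end..old_len, start + src.len());
    } else {
        // Shift the tail left, then drop the now-unused elements.
        dst.copy_within(end..old_len, start + src.len());
        dst.truncate(old_len - (removed - src.len()));
    }
    dst[start..start + src.len()].copy_from_slice(src);
}
```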
|
||||
|
@ -205,6 +229,7 @@ fn vec_replace_impl<T: Copy, A: Allocator>(dst: &mut Vec<T, A>, range: Range<usi
|
|||
}
|
||||
}
|
||||
|
||||
/// [`Read`] but with [`MaybeUninit<u8>`] buffers.
|
||||
pub fn file_read_uninit<T: Read>(
|
||||
file: &mut T,
|
||||
buf: &mut [MaybeUninit<u8>],
|
||||
|
@ -216,11 +241,13 @@ pub fn file_read_uninit<T: Read>(
|
|||
}
|
||||
}
|
||||
|
||||
/// Turns a [`&[T]`] into a [`&[MaybeUninit<T>]`].
|
||||
#[inline(always)]
|
||||
pub const fn slice_as_uninit_ref<T>(slice: &[T]) -> &[MaybeUninit<T>] {
|
||||
unsafe { slice::from_raw_parts(slice.as_ptr() as *const MaybeUninit<T>, slice.len()) }
|
||||
}
|
||||
|
||||
/// Turns a [`&mut [T]`] into a [`&mut [MaybeUninit<T>]`].
|
||||
#[inline(always)]
|
||||
pub const fn slice_as_uninit_mut<T>(slice: &mut [T]) -> &mut [MaybeUninit<T>] {
|
||||
unsafe { slice::from_raw_parts_mut(slice.as_mut_ptr() as *mut MaybeUninit<T>, slice.len()) }
|
||||
|
|
89
src/icu.rs
|
@ -1,3 +1,5 @@
|
|||
//! Bindings to the ICU library.
|
||||
|
||||
use std::cmp::Ordering;
|
||||
use std::ffi::CStr;
|
||||
use std::mem;
|
||||
|
@ -13,6 +15,7 @@ use crate::{apperr, arena_format, sys};
|
|||
|
||||
static mut ENCODINGS: Vec<&'static str> = Vec::new();
|
||||
|
||||
/// Returns a list of encodings ICU supports.
|
||||
pub fn get_available_encodings() -> &'static [&'static str] {
|
||||
// OnceCell for people that want to put it into a static.
|
||||
#[allow(static_mut_refs)]
|
||||
|
@ -38,6 +41,7 @@ pub fn get_available_encodings() -> &'static [&'static str] {
|
|||
}
|
||||
}
|
||||
|
||||
/// Formats the given ICU error code into a human-readable string.
|
||||
pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result {
|
||||
fn format(code: u32) -> &'static str {
|
||||
let Ok(f) = init_if_needed() else {
|
||||
|
@ -62,6 +66,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re
|
|||
}
|
||||
}
|
||||
|
||||
/// Converts between two encodings using ICU.
|
||||
pub struct Converter<'pivot> {
|
||||
source: *mut icu_ffi::UConverter,
|
||||
target: *mut icu_ffi::UConverter,
|
||||
|
@ -80,6 +85,14 @@ impl Drop for Converter<'_> {
|
|||
}
|
||||
|
||||
impl<'pivot> Converter<'pivot> {
|
||||
/// Constructs a new `Converter` instance.
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `pivot_buffer`: A buffer used to cache partial conversions.
|
||||
/// Don't make it too small.
|
||||
/// * `source_encoding`: The source encoding name (e.g., "UTF-8").
|
||||
/// * `target_encoding`: The target encoding name (e.g., "UTF-16").
|
||||
pub fn new(
|
||||
pivot_buffer: &'pivot mut [MaybeUninit<u16>],
|
||||
source_encoding: &str,
|
||||
|
@ -114,6 +127,20 @@ impl<'pivot> Converter<'pivot> {
|
|||
arena_format!(arena, "{}\0", input)
|
||||
}
|
||||
|
||||
/// Performs one step of the encoding conversion.
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `input`: The input buffer to convert from.
|
||||
/// It should be in the `source_encoding` that was previously specified.
|
||||
/// * `output`: The output buffer to convert to.
|
||||
/// It should be in the `target_encoding` that was previously specified.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// A tuple containing:
|
||||
/// 1. The number of bytes read from the input buffer.
|
||||
/// 2. The number of bytes written to the output buffer.
|
||||
pub fn convert(
|
||||
&mut self,
|
||||
input: &[u8],
|
||||
|
@ -168,24 +195,26 @@ impl<'pivot> Converter<'pivot> {
|
|||
// I picked 64 because it seemed like a reasonable lower bound.
|
||||
const CACHE_SIZE: usize = 64;
|
||||
|
||||
// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format.
|
||||
/// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format.
|
||||
struct Cache {
|
||||
/// The translated text. Contains `len`-many valid items.
|
||||
/// The translated text. Contains [`Cache::utf16_len`]-many valid items.
|
||||
utf16: [u16; CACHE_SIZE],
|
||||
/// For each character in `utf16` this stores the offset in the `TextBuffer`,
|
||||
/// For each character in [`Cache::utf16`] this stores the offset in the [`TextBuffer`],
|
||||
/// relative to the start offset stored in `native_beg`.
|
||||
/// This has the same length as `utf16`.
|
||||
/// This has the same length as [`Cache::utf16`].
|
||||
utf16_to_utf8_offsets: [u16; CACHE_SIZE],
|
||||
/// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character
|
||||
/// in `utf16` maps to the given `native_offset` in the underlying `TextBuffer`.
|
||||
/// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character in
|
||||
/// [`Cache::utf16`] maps to the given `native_offset` in the underlying [`TextBuffer`].
|
||||
/// Contains `native_end - native_beg`-many valid items.
|
||||
utf8_to_utf16_offsets: [u16; CACHE_SIZE],
|
||||
|
||||
/// The number of valid items in `utf16`.
|
||||
/// The number of valid items in [`Cache::utf16`].
|
||||
utf16_len: usize,
|
||||
/// Offset of the first non-ASCII character.
|
||||
/// Less than or equal to [`Cache::utf16_len`].
|
||||
native_indexing_limit: usize,
|
||||
|
||||
/// The range of UTF-8 text in the `TextBuffer` that this chunk covers.
|
||||
/// The range of UTF-8 text in the [`TextBuffer`] that this chunk covers.
|
||||
utf8_range: Range<usize>,
|
||||
}
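The two offset tables in `Cache` are easier to picture on a concrete chunk. Below is a simplified, heap-allocated sketch. The names `OffsetMaps` and `build_offset_maps` are hypothetical, and this is not the actual cache-filling code. It assumes each UTF-16 unit maps to the byte offset where its character starts, and each UTF-8 byte maps to the first UTF-16 unit of its character. The real `Cache` uses fixed `[u16; CACHE_SIZE]` arrays rather than `Vec`s.

```rust
/// Hypothetical standalone version of the tables kept by `Cache`.
struct OffsetMaps {
    utf16: Vec<u16>,
    /// For each UTF-16 unit: byte offset of its character, relative to the chunk start.
    utf16_to_utf8: Vec<u16>,
    /// For each UTF-8 byte: index of the first UTF-16 unit of its character.
    utf8_to_utf16: Vec<u16>,
}

fn build_offset_maps(chunk: &str) -> OffsetMaps {
    let mut maps = OffsetMaps { utf16: Vec::new(), utf16_to_utf8: Vec::new(), utf8_to_utf16: Vec::new() };
    let mut units = [0u16; 2];
    for (byte_off, c) in chunk.char_indices() {
        let utf16_beg = maps.utf16.len() as u16;
        // Characters outside the BMP become a surrogate pair; both halves
        // point back at the same UTF-8 start offset.
        for &u in c.encode_utf16(&mut units).iter() {
            maps.utf16.push(u);
            maps.utf16_to_utf8.push(byte_off as u16);
        }
        // Every byte of a multi-byte character points at the character's first unit.
        for _ in 0..c.len_utf8() {
            maps.utf8_to_utf16.push(utf16_beg);
        }
    }
    maps
}

fn main() {
    // 'a' = 1 byte / 1 unit, 'é' = 2 bytes / 1 unit, '😀' = 4 bytes / 2 units.
    let maps = build_offset_maps("aé😀");
    assert_eq!(maps.utf16.len(), 4);
    assert_eq!(maps.utf16_to_utf8, [0, 1, 3, 3]);
    assert_eq!(maps.utf8_to_utf16, [0, 1, 1, 2, 2, 2, 2]);
}
```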

@ -195,9 +224,15 @@ struct DoubleCache {
    mru: bool,
}

// I initially did this properly with a PhantomData marker for the TextBuffer lifetime,
// but it was a pain so now I don't. Not a big deal - its only use is in a self-referential
// struct in TextBuffer which Rust can't deal with anyway.
/// A wrapper around ICU's `UText` struct.
///
/// In our case its only purpose is to adapt a [`TextBuffer`] for ICU.
///
/// # Safety
///
/// Warning! No lifetime tracking is done here.
/// I initially did it properly with a PhantomData marker for the TextBuffer
/// lifetime, but it was a pain so now I don't. Not a big deal in our case.
pub struct Text(&'static mut icu_ffi::UText);

impl Drop for Text {
@ -208,11 +243,12 @@ impl Drop for Text {
}

impl Text {
    /// Constructs an ICU `UText` instance from a `TextBuffer`.
    /// Constructs an ICU `UText` instance from a [`TextBuffer`].
    ///
    /// # Safety
    ///
    /// The caller must ensure that the given `TextBuffer` outlives the returned `Text` instance.
    /// The caller must ensure that the given [`TextBuffer`]
    /// outlives the returned `Text` instance.
    pub unsafe fn new(tb: &TextBuffer) -> apperr::Result<Self> {
        let f = init_if_needed()?;

@ -349,12 +385,16 @@ fn utext_access_impl<'a>(
    let dirty = ut.a != tb.generation() as i64;

    if dirty {
        // The text buffer contents have changed.
        // Invalidate both caches so that future calls don't mistakenly use them
        // when they enter the for loop in the else branch below (`dirty == false`).
        double_cache.cache[0].utf16_len = 0;
        double_cache.cache[1].utf16_len = 0;
        double_cache.cache[0].utf8_range = 0..0;
        double_cache.cache[1].utf8_range = 0..0;
        ut.a = tb.generation() as i64;
    } else {
        // Check if one of the caches already contains the requested range.
        for (i, cache) in double_cache.cache.iter_mut().enumerate() {
            if cache.utf8_range.contains(&index_contained) {
                double_cache.mru = i != 0;
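The `dirty` check above follows a generation-counter scheme. The `UText` remembers the buffer generation it last saw (in `ut.a`). When the generation no longer matches, it empties both cached ranges, so the `contains` lookup can't hit stale data. The sketch below reduces the same idea to a few lines. The names `TwoSlotCache` and `lookup` are hypothetical, and the generation is a plain `u64`. `lookup` returns `None` whenever the caller would have to refill a slot.

```rust
use std::ops::Range;

/// Hypothetical, reduced stand-in for `DoubleCache`: two cached byte ranges plus
/// the buffer generation they were filled from.
struct TwoSlotCache {
    ranges: [Range<usize>; 2],
    generation: u64,
    /// `false` if slot 0 was used most recently, `true` for slot 1.
    mru: bool,
}

impl TwoSlotCache {
    /// Returns the slot covering `index`, or `None` if the caller must refill one.
    fn lookup(&mut self, current_generation: u64, index: usize) -> Option<usize> {
        if self.generation != current_generation {
            // The buffer was edited. Empty ranges never `contains()` anything,
            // so no later lookup can hit stale data until a slot is refilled.
            self.ranges = [0..0, 0..0];
            self.generation = current_generation;
            return None;
        }
        for (i, range) in self.ranges.iter().enumerate() {
            if range.contains(&index) {
                self.mru = i != 0;
                return Some(i);
            }
        }
        None
    }
}

fn main() {
    let mut cache = TwoSlotCache { ranges: [0..64, 64..128], generation: 1, mru: false };
    assert_eq!(cache.lookup(1, 70), Some(1));
    assert!(cache.mru);
    // An edit bumps the generation, which invalidates both slots.
    assert_eq!(cache.lookup(2, 70), None);
    assert_eq!(cache.lookup(2, 70), None);
}
```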
@ -443,13 +483,12 @@ fn utext_access_impl<'a>(
        }
    }

    // TODO: This loop is the slow part of our uregex search. May be worth optimizing.
    loop {
        let Some(c) = it.next() else {
            break;
        };

        // Thanks to our `if utf16_len >= utf16_limit` check,
        // Thanks to our `if utf16_len >= UTF16_LEN_LIMIT` check,
        // we can safely assume that this will fit.
        unsafe {
            let utf8_len_beg = utf8_len;
@ -515,7 +554,11 @@ extern "C" fn utext_map_native_index_to_utf16(ut: &icu_ffi::UText, native_index:
    off_rel as i32
}

// Same reason here for not using a PhantomData marker as with `Text`.
/// A wrapper around ICU's `URegularExpression` struct.
///
/// # Safety
///
/// Warning! No lifetime tracking is done here.
pub struct Regex(&'static mut icu_ffi::URegularExpression);

impl Drop for Regex {
@ -526,8 +569,14 @@ impl Drop for Regex {
}

impl Regex {
    /// Enable case-insensitive matching.
    pub const CASE_INSENSITIVE: i32 = icu_ffi::UREGEX_CASE_INSENSITIVE;

    /// If set, ^ and $ match the start and end of each line.
    /// Otherwise, they match the start and end of the entire string.
    pub const MULTILINE: i32 = icu_ffi::UREGEX_MULTILINE;

    /// Treat the given pattern as a literal string.
    pub const LITERAL: i32 = icu_ffi::UREGEX_LITERAL;

    /// Constructs a regex, plain and simple. Read `uregex_open` docs.
@ -566,7 +615,7 @@ impl Regex {
    }

    /// Updates the regex pattern with the given text.
    /// If the text contents have changed, you can pass the same text as you usued
    /// If the text contents have changed, you can pass the same text as you used
    /// initially and it'll trigger ICU to reload the text and invalidate its caches.
    ///
    /// # Safety
@ -578,6 +627,7 @@ impl Regex {
        unsafe { (f.uregex_setUText)(self.0, text.0 as *const _ as *mut _, &mut status) };
    }

    /// Sets the regex to the absolute offset in the underlying text.
    pub fn reset(&mut self, index: usize) {
        let f = assume_loaded();
        let mut status = icu_ffi::U_ZERO_ERROR;
@ -611,6 +661,7 @@ impl Iterator for Regex {

static mut ROOT_COLLATOR: Option<*mut icu_ffi::UCollator> = None;

/// Compares two UTF-8 strings for sorting using ICU's collation algorithm.
pub fn compare_strings(a: &[u8], b: &[u8]) -> Ordering {
    // OnceCell for people that want to put it into a static.
    #[allow(static_mut_refs)]
@ -688,6 +739,10 @@ fn compare_strings_ascii(a: &[u8], b: &[u8]) -> Ordering {

static mut ROOT_CASEMAP: Option<*mut icu_ffi::UCaseMap> = None;

/// Case-folds the given UTF-8 string.
///
/// Case folding differs from lower case in that the output is primarily useful
/// to machines for comparisons. It's like applying Unicode normalization.
pub fn fold_case<'a>(arena: &'a Arena, input: &str) -> ArenaString<'a> {
    // OnceCell for people that want to put it into a static.
    #[allow(static_mut_refs)]

64
src/input.rs
@ -1,10 +1,17 @@
//! Parses VT sequences into input events.
//!
//! In the future this allows us to take apart the application and
//! support input schemes that aren't VT, such as UEFI or a GUI.

use crate::helpers::{CoordType, Point, Size};
use crate::vt;

// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`.
// The reason it's an awkward u32 and not a struct is to hopefully make ABIs easier later.
// Of course you could just translate on the ABI boundary, but my hope is that this
// design lets me realize some restrictions early on that I can't foresee yet.
/// Represents a key/modifier combination.
///
/// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`.
/// The reason it's an awkward u32 and not a struct is to hopefully make ABIs easier later.
/// Of course you could just translate on the ABI boundary, but my hope is that this
/// design lets me realize some restrictions early on that I can't foresee yet.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct InputKey(u32);
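The TODO above mentions typing `kbmod::CTRL | vk::A`. That spelling works by packing the modifiers and the key code into the one `u32`. The sketch below shows the packing in a self-contained form. The combined modifier mask `0x07000000` appears in the diff. Everything else is an assumption for illustration: the individual bits for Ctrl, Alt and Shift, the `BitOr` impl between the two types, and the `key()`/`modifiers()` accessors.

```rust
use std::ops::BitOr;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct InputKey(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct InputKeyMod(u32);

// Assumed layout: modifiers live in bits 24..=26, the key code in the low bits.
const MOD_MASK: u32 = 0x0700_0000;
const CTRL: InputKeyMod = InputKeyMod(0x0100_0000);
const ALT: InputKeyMod = InputKeyMod(0x0200_0000);
const SHIFT: InputKeyMod = InputKeyMod(0x0400_0000);

// 0x41 is the Windows virtual-key code for the A key.
const VK_A: InputKey = InputKey(0x41);

// Combining two modifiers yields a modifier set.
impl BitOr for InputKeyMod {
    type Output = InputKeyMod;
    fn bitor(self, rhs: InputKeyMod) -> InputKeyMod {
        InputKeyMod(self.0 | rhs.0)
    }
}

// Combining modifiers with a key yields a key/modifier combination.
impl BitOr<InputKey> for InputKeyMod {
    type Output = InputKey;
    fn bitor(self, key: InputKey) -> InputKey {
        InputKey(self.0 | key.0)
    }
}

impl InputKey {
    fn key(self) -> InputKey {
        InputKey(self.0 & !MOD_MASK)
    }
    fn modifiers(self) -> InputKeyMod {
        InputKeyMod(self.0 & MOD_MASK)
    }
}

fn main() {
    let shortcut = CTRL | SHIFT | VK_A;
    assert_eq!(shortcut.0, 0x0500_0041);
    assert_eq!(shortcut.key(), VK_A);
    assert_eq!(shortcut.modifiers(), CTRL | SHIFT);
    assert_eq!((ALT | VK_A).modifiers(), ALT);
}
```

With a plain `u32`, comparing against a shortcut is a single integer comparison, which fits the stated goal of keeping the type ABI-friendly.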
@ -47,6 +54,7 @@ impl InputKey {
    }
}

/// A keyboard modifier. Ctrl/Alt/Shift.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct InputKeyMod(u32);
@ -83,8 +91,10 @@ impl std::ops::BitOrAssign for InputKeyMod {
    }
}

// The codes defined here match the VK_* constants on Windows.
// It's a convenient way to handle keyboard input, even on other platforms.
/// Keyboard keys.
///
/// The codes defined here match the VK_* constants on Windows.
/// It's a convenient way to handle keyboard input, even on other platforms.
pub mod vk {
    use super::InputKey;

@ -189,6 +199,7 @@ pub mod vk {
    pub const F24: InputKey = InputKey::new(0x87);
}

/// Keyboard modifiers.
pub mod kbmod {
    use super::InputKeyMod;

@ -203,12 +214,17 @@ pub mod kbmod {
    pub const CTRL_ALT_SHIFT: InputKeyMod = InputKeyMod::new(0x07000000);
}

/// Text input.
///
/// "Keyboard" input is also "text" input and vice versa.
/// It differs in that text input can also be Unicode.
#[derive(Clone, Copy)]
pub struct InputText<'a> {
    pub text: &'a str,
    pub bracketed: bool,
}

/// Mouse input state. Up/Down, Left/Right, etc.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
pub enum InputMouseState {
    #[default]
@ -224,21 +240,34 @@ pub enum InputMouseState {
    Scroll,
}

/// Mouse input.
#[derive(Clone, Copy)]
pub struct InputMouse {
    /// The state of the mouse. Up/Down, Left/Right, etc.
    pub state: InputMouseState,
    /// Any keyboard modifiers that are held down.
    pub modifiers: InputKeyMod,
    /// Position of the mouse in the viewport.
    pub position: Point,
    /// Scroll delta.
    pub scroll: Point,
}

/// Primary result type of the parser.
pub enum Input<'input> {
    /// Window resize event.
    Resize(Size),
    /// Text input.
    ///
    /// Note that [`Input::Keyboard`] events can also be text.
    Text(InputText<'input>),
    /// Keyboard input.
    Keyboard(InputKey),
    /// Mouse input.
    Mouse(InputMouse),
}

/// Parses VT sequences into input events.
pub struct Parser {
    bracketed_paste: bool,
    x10_mouse_want: bool,
@ -247,6 +276,9 @@ pub struct Parser {
}

impl Parser {
    /// Creates a new parser that turns VT sequences into input events.
    ///
    /// Keep the instance alive for the lifetime of the input stream.
    pub fn new() -> Self {
        Self {
            bracketed_paste: false,
@ -256,7 +288,8 @@ impl Parser {
        }
    }

    /// Turns VT sequences into keyboard, mouse, etc., inputs.
    /// Takes a [`vt::Stream`] and returns a [`Stream`]
    /// that turns VT sequences into input events.
    pub fn parse<'parser, 'vt, 'input>(
        &'parser mut self,
        stream: vt::Stream<'vt, 'input>,
@ -265,15 +298,15 @@ impl Parser {
    }
}

/// An iterator that parses VT sequences into input events.
///
/// Can't implement [`Iterator`], because this is a "lending iterator".
pub struct Stream<'parser, 'vt, 'input> {
    parser: &'parser mut Parser,
    stream: vt::Stream<'vt, 'input>,
}

impl<'input> Stream<'_, '_, 'input> {
    /// Parses the next input action from the previously given input.
    ///
    /// Can't implement Iterator, because this is a "lending iterator".
    #[allow(clippy::should_implement_trait)]
    pub fn next(&mut self) -> Option<Input<'input>> {
        loop {
@ -446,6 +479,17 @@ impl<'input> Stream<'_, '_, 'input> {
        }
    }

    /// Once we encounter the start of a bracketed paste
    /// we seek to the end of the paste in this function.
    ///
    /// A bracketed paste is basically:
    /// ```text
    /// <ESC>[200~ lots of text <ESC>[201~
    /// ```
    ///
    /// The text in between is then expected to be taken literally.
    /// It can be anything, though, including other escape sequences.
    /// This is the reason why this is a separate method.
    #[cold]
    fn handle_bracketed_paste(&mut self) -> Option<Input<'input>> {
        let beg = self.stream.offset();
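A bracketed paste is literal text framed by `ESC [200~` and `ESC [201~`. The sketch below shows the "seek to the end" step under two assumptions: the start marker has already been consumed, and the whole paste is available as one `&str`. The real parser works on a streaming `vt::Stream`, which this sketch ignores. `split_bracketed_paste` is a hypothetical name.

```rust
/// The sequence that terminates a bracketed paste.
const PASTE_END: &str = "\x1b[201~";

/// Splits `input`, which is assumed to start right after the `ESC [200~` marker,
/// into the literal pasted text and whatever follows the end marker.
/// Returns `None` if the end marker hasn't arrived yet.
fn split_bracketed_paste(input: &str) -> Option<(&str, &str)> {
    let end = input.find(PASTE_END)?;
    Some((&input[..end], &input[end + PASTE_END.len()..]))
}

fn main() {
    // The pasted text contains an escape sequence (cursor up) that must be kept verbatim.
    let input = "hello\x1b[Aworld\x1b[201~\x1b[B";
    let (text, rest) = split_bracketed_paste(input).unwrap();
    assert_eq!(text, "hello\x1b[Aworld");
    assert_eq!(rest, "\x1b[B");
    assert_eq!(split_bracketed_paste("still pasting"), None);
}
```

Searching only for the end marker, rather than parsing escape sequences, is what makes the pasted content literal.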

@ -1,7 +1,10 @@
//! This module implements Oklab as defined at: https://bottosson.github.io/posts/oklab/
//! Oklab colorspace conversions.
//!
//! Implements Oklab as defined at: <https://bottosson.github.io/posts/oklab/>

#![allow(clippy::excessive_precision)]

/// An Oklab color with alpha.
pub struct Lab {
    pub l: f32,
    pub a: f32,
@ -9,6 +12,7 @@ pub struct Lab {
    pub alpha: f32,
}

/// Converts a 32-bit sRGB color to Oklab.
pub fn srgb_to_oklab(color: u32) -> Lab {
    let r = SRGB_TO_RGB_LUT[(color & 0xff) as usize];
    let g = SRGB_TO_RGB_LUT[((color >> 8) & 0xff) as usize];
@ -31,6 +35,7 @@ pub fn srgb_to_oklab(color: u32) -> Lab {
    }
}

/// Converts an Oklab color to a 32-bit sRGB color.
pub fn oklab_to_srgb(c: Lab) -> u32 {
    let l_ = c.l + 0.3963377774 * c.a + 0.2158037573 * c.b;
    let m_ = c.l - 0.1055613458 * c.a - 0.0638541728 * c.b;
@ -57,6 +62,7 @@ pub fn oklab_to_srgb(c: Lab) -> u32 {
    r | (g << 8) | (b << 16) | (a << 24)
}

/// Blends two 32-bit sRGB colors in the Oklab color space.
pub fn oklab_blend(dst: u32, src: u32) -> u32 {
    let dst = srgb_to_oklab(dst);
    let src = srgb_to_oklab(src);
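For reference, here is a self-contained sketch of the conversions these functions perform. It uses the matrices and the standard sRGB transfer function from the linked Oklab article, and the constants visible in `oklab_to_srgb` match the article's. The channel layout `r | g << 8 | b << 16 | a << 24` comes from the diff. The sketch departs from the diff in two places. It computes the transfer function directly instead of using `SRGB_TO_RGB_LUT`. Its blend rule is an assumption, because the body of `oklab_blend` isn't shown: it lerps L/a/b by the source alpha and uses the usual "over" formula for alpha.

```rust
/// An Oklab color with alpha.
#[derive(Clone, Copy, Debug)]
struct Lab {
    l: f32,
    a: f32,
    b: f32,
    alpha: f32,
}

/// sRGB transfer function: 8-bit channel -> linear light in [0, 1].
fn srgb_to_linear(c: u32) -> f32 {
    let c = c as f32 / 255.0;
    if c <= 0.04045 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

/// Inverse transfer function: linear light -> 8-bit channel.
fn linear_to_srgb(c: f32) -> u32 {
    let c = c.clamp(0.0, 1.0);
    let s = if c <= 0.0031308 { c * 12.92 } else { 1.055 * c.powf(1.0 / 2.4) - 0.055 };
    (s * 255.0).round() as u32
}

fn srgb_to_oklab(color: u32) -> Lab {
    let r = srgb_to_linear(color & 0xff);
    let g = srgb_to_linear((color >> 8) & 0xff);
    let b = srgb_to_linear((color >> 16) & 0xff);
    let l = (0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b).cbrt();
    let m = (0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b).cbrt();
    let s = (0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b).cbrt();
    Lab {
        l: 0.2104542553 * l + 0.7936177850 * m - 0.0040720468 * s,
        a: 1.9779984951 * l - 2.4285922050 * m + 0.4505937099 * s,
        b: 0.0259040371 * l + 0.7827717662 * m - 0.8086757660 * s,
        alpha: (color >> 24) as f32 / 255.0,
    }
}

fn oklab_to_srgb(c: Lab) -> u32 {
    let l = (c.l + 0.3963377774 * c.a + 0.2158037573 * c.b).powi(3);
    let m = (c.l - 0.1055613458 * c.a - 0.0638541728 * c.b).powi(3);
    let s = (c.l - 0.0894841775 * c.a - 1.2914855480 * c.b).powi(3);
    let r = linear_to_srgb(4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s);
    let g = linear_to_srgb(-1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s);
    let b = linear_to_srgb(-0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s);
    let a = (c.alpha.clamp(0.0, 1.0) * 255.0).round() as u32;
    r | (g << 8) | (b << 16) | (a << 24)
}

/// Assumed blend rule: interpolate L/a/b by the source alpha, "over" for alpha.
fn oklab_blend(dst: u32, src: u32) -> u32 {
    let d = srgb_to_oklab(dst);
    let s = srgb_to_oklab(src);
    let t = s.alpha;
    let mix = |x: f32, y: f32| x + (y - x) * t;
    oklab_to_srgb(Lab { l: mix(d.l, s.l), a: mix(d.a, s.a), b: mix(d.b, s.b), alpha: mix(d.alpha, 1.0) })
}

/// True if every 8-bit channel of `x` and `y` differs by at most 1.
fn channels_close(x: u32, y: u32) -> bool {
    (0..4).all(|i| {
        let (cx, cy) = ((x >> (8 * i)) & 0xff, (y >> (8 * i)) & 0xff);
        cx.abs_diff(cy) <= 1
    })
}

fn main() {
    let c = 0xFF3366CC_u32;
    assert!(channels_close(oklab_to_srgb(srgb_to_oklab(c)), c));
    // A fully opaque source replaces the destination.
    assert!(channels_close(oklab_blend(0xFF000000, 0xFFFFFFFF), 0xFFFFFFFF));
}
```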

@ -1,3 +1,5 @@
//! Path related helpers.

use std::ffi::OsStr;
use std::path::{Component, MAIN_SEPARATOR_STR, Path, PathBuf};

@ -1,13 +1,13 @@
//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself
//! why we don't just use it: Simply put, this is optimized for short inputs.
//! `memchr`, but with two needles.

use std::ptr;

use super::distance;

/// memchr(), but with two needles.
/// Returns the index of the first occurrence of either needle in the `haystack`.
/// If no needle is found, `haystack.len()` is returned.
/// `memchr`, but with two needles.
///
/// Returns the index of the first occurrence of either needle in the
/// `haystack`. If no needle is found, `haystack.len()` is returned.
/// `offset` specifies the index to start searching from.
pub fn memchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize {
    unsafe {
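A scalar reference for the contract documented on `memchr2` is useful when testing the optimized version. The function below, `memchr2_scalar`, is a hypothetical sketch of that reference. It clamps an out-of-range `offset` to `haystack.len()`. Whether the real function tolerates such offsets isn't shown in the diff.

```rust
/// Scalar reference for `memchr2`'s documented contract: the index of the first
/// `needle1` or `needle2` at or after `offset`, or `haystack.len()` if there is none.
fn memchr2_scalar(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize {
    // Assumption: out-of-range offsets are clamped instead of being a caller error.
    let start = offset.min(haystack.len());
    haystack[start..]
        .iter()
        .position(|&b| b == needle1 || b == needle2)
        .map_or(haystack.len(), |i| start + i)
}

fn main() {
    let text = b"ab\ncd\r";
    assert_eq!(memchr2_scalar(b'\r', b'\n', text, 0), 2);
    // Searching resumes right after the previous hit.
    assert_eq!(memchr2_scalar(b'\r', b'\n', text, 3), 5);
    assert_eq!(memchr2_scalar(b'\r', b'\n', text, 6), text.len());
}
```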

@ -1,16 +1,15 @@
//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself
//! why we don't just use it: Simply put, this is optimized for short inputs.
//! `memrchr`, but with two needles.

use std::ptr;

use super::distance;

/// Same as `memchr2`, but searches from the end of the haystack.
/// If no needle is found, 0 is returned.
/// `memrchr`, but with two needles.
///
/// *NOTE: Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned.*
/// This is because this function is primarily used for `unicode::newlines_backward`,
/// which needs exactly that.
/// If no needle is found, 0 is returned.
/// Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned.
/// This is because this function is primarily used for
/// `ucd::newlines_backward`, which needs exactly that.
pub fn memrchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> Option<usize> {
    unsafe {
        let beg = haystack.as_ptr();

@ -1,21 +1,25 @@
//! This module provides a `memset` function for "arbitrary" sizes (1/2/4/8 bytes), as the regular `memset`
//! is only implemented for byte-sized arrays. This allows us to more aggressively unroll loops and to
//! use AVX2 on x64 for the non-byte-sized cases and opens the door to compiling with `-Copt-level=s`.
//! `memset` for arbitrary sizes (1/2/4/8 bytes).
//!
//! This implementation uses SWAR to only have a single implementation for all 4 sizes: By duplicating smaller
//! types into a larger `u64` register we can treat all sizes as if they were `u64`. The only thing we need
//! to take care of then, is the tail end of the array, where we need to write 0-7 additional bytes.
//! Clang calls the C `memset` function only for byte-sized types (or 0 fills).
//! We however need to fill other types as well. For that, clang generates
//! SIMD loops under higher optimization levels. With `-Os` however, it only
//! generates a trivial loop which is too slow for our needs.
//!
//! This implementation uses SWAR to only have a single implementation for all
//! 4 sizes: By duplicating smaller types into a larger `u64` register we can
//! treat all sizes as if they were `u64`. The only thing we need to take care
//! of is the tail end of the array, which needs to write 0-7 additional bytes.

use std::mem;

use super::distance;

/// A trait to mark types that are safe to use with `memset`.
/// A marker trait for types that are safe to `memset`.
///
/// # Safety
///
/// Just like with C's `memset`, bad things happen
/// if you use this with types that are non-trivial.
/// if you use this with non-trivial types.
pub unsafe trait MemsetSafe: Copy {}

unsafe impl MemsetSafe for u8 {}
@ -30,6 +34,7 @@ unsafe impl MemsetSafe for i32 {}
unsafe impl MemsetSafe for i64 {}
unsafe impl MemsetSafe for isize {}

/// Fills a slice with the given value.
#[inline]
pub fn memset<T: MemsetSafe>(dst: &mut [T], val: T) {
    unsafe {
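The SWAR idea from the module docs is to duplicate a small value across a `u64`, so that every element size can be handled by one 8-byte store loop. The sketch below illustrates it for `u16` only. The `splat_*` and `fill_u16` helpers are hypothetical. The sketch doesn't use AVX2, and it writes the tail one element at a time instead of using the 0-7 byte tail store the docs describe.

```rust
use std::ptr;

/// Broadcast a value into every lane of a `u64` by multiplying with a
/// "one in every lane" constant. The products can never overflow.
fn splat_u8(v: u8) -> u64 {
    v as u64 * 0x0101_0101_0101_0101
}

fn splat_u16(v: u16) -> u64 {
    v as u64 * 0x0001_0001_0001_0001
}

fn splat_u32(v: u32) -> u64 {
    v as u64 * 0x0000_0001_0000_0001
}

/// Fills `dst` with `val`, writing four `u16`s per 8-byte store.
fn fill_u16(dst: &mut [u16], val: u16) {
    let wide = splat_u16(val);
    let mut chunks = dst.chunks_exact_mut(4);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 8 bytes long and the store may be unaligned.
        // All lanes of `wide` are equal, so byte order doesn't matter.
        unsafe { ptr::write_unaligned(chunk.as_mut_ptr() as *mut u64, wide) };
    }
    // The 0-3 leftover elements are written one at a time.
    for x in chunks.into_remainder() {
        *x = val;
    }
}

fn main() {
    assert_eq!(splat_u8(0xAB), 0xABAB_ABAB_ABAB_ABAB);
    assert_eq!(splat_u32(0xDEAD_BEEF), 0xDEAD_BEEF_DEAD_BEEF);
    let mut buf = [0u16; 7];
    fill_u16(&mut buf, 0xBEEF);
    assert!(buf.iter().all(|&x| x == 0xBEEF));
}
```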

@ -1,3 +1,5 @@
//! Provides various high-throughput utilities.

mod memchr2;
mod memrchr2;
mod memset;

@ -1,3 +1,5 @@
//! Platform abstractions.

use std::fs::File;
use std::path::Path;

@ -1,3 +1,8 @@
//! Unix-specific platform code.
//!
//! Read the `windows` module for reference.
//! TODO: This reminds me that the sys API should probably be a trait.

use std::ffi::{CStr, c_int, c_void};
use std::fs::{self, File};
use std::mem::{self, MaybeUninit};

@ -73,6 +73,7 @@ extern "system" fn console_ctrl_handler(_ctrl_type: u32) -> Foundation::BOOL {
    1
}

/// Initializes the platform-specific state.
pub fn init() -> apperr::Result<Deinit> {
    unsafe {
        // Get the stdin and stdout handles first, so that if this function fails,
@ -151,6 +152,7 @@ impl Drop for Deinit {
    }
}

/// Switches the terminal into raw mode, etc.
pub fn switch_modes() -> apperr::Result<()> {
    unsafe {
        check_bool_return(Console::SetConsoleCtrlHandler(Some(console_ctrl_handler), 1))?;
@ -180,6 +182,10 @@ pub fn switch_modes() -> apperr::Result<()> {
    }
}

/// During startup we need to get the window size from the terminal.
/// Because I didn't want to type a bunch of code, this function tells
/// [`read_stdin`] to inject a fake sequence, which gets picked up by
/// the input parser and provided to the TUI code.
pub fn inject_window_size_into_stdin() {
    unsafe {
        STATE.inject_resize = true;
@ -202,9 +208,11 @@ fn get_console_size() -> Option<Size> {

/// Reads from stdin.
///
/// Returns `None` if there was an error reading from stdin.
/// Returns `Some("")` if the given timeout was reached.
/// Otherwise, it returns the read, non-empty string.
/// # Returns
///
/// * `None` if there was an error reading from stdin.
/// * `Some("")` if the given timeout was reached.
/// * Otherwise, it returns the read, non-empty string.
pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option<ArenaString<'_>> {
    let scratch = scratch_arena(Some(arena));

@ -351,6 +359,10 @@ pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option<ArenaStr
    Some(text)
}

/// Writes a string to stdout.
///
/// Use this instead of `print!` or `println!` to avoid
/// the overhead of Rust's stdio handling. Don't need that.
pub fn write_stdout(text: &str) {
    unsafe {
        let mut offset = 0;
@ -368,6 +380,12 @@ pub fn write_stdout(text: &str) {
    }
}

/// Checks whether the stdin handle is redirected to a file, etc.
///
/// # Returns
///
/// * `Some(file)` if stdin is redirected.
/// * Otherwise, `None`.
pub fn open_stdin_if_redirected() -> Option<File> {
    unsafe {
        let handle = Console::GetStdHandle(Console::STD_INPUT_HANDLE);
@ -376,12 +394,14 @@ pub fn open_stdin_if_redirected() -> Option<File> {
    }
}

/// A unique identifier for a file.
#[derive(Clone)]
#[repr(transparent)]
pub struct FileId(FileSystem::FILE_ID_INFO);

impl PartialEq for FileId {
    fn eq(&self, other: &Self) -> bool {
        // Lowers to an efficient word-wise comparison.
        const SIZE: usize = std::mem::size_of::<FileSystem::FILE_ID_INFO>();
        let a: &[u8; SIZE] = unsafe { mem::transmute(&self.0) };
        let b: &[u8; SIZE] = unsafe { mem::transmute(&other.0) };
@ -405,6 +425,10 @@ pub fn file_id(file: &File) -> apperr::Result<FileId> {
    }
}

/// Canonicalizes the given path.
///
/// This differs from [`fs::canonicalize`] in that it strips the `\\?\` verbatim
/// prefix on Windows. This is because it's confusing/ugly when displaying it.
pub fn canonicalize(path: &Path) -> std::io::Result<PathBuf> {
    let mut path = fs::canonicalize(path)?;
    let path = path.as_mut_os_string();
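On Windows, `fs::canonicalize` returns extended-length ("verbatim") paths such as `\\?\C:\Users\...`. The rest of `canonicalize` is cut off in this hunk, so the sketch below works at the string level and assumes only drive-letter paths are shortened. Stripping the prefix from `\\?\UNC\server\share` would leave `UNC\server\share`, which is not the same path, so such paths are left alone. `strip_verbatim_prefix` is a hypothetical helper.

```rust
/// Hypothetical helper: removes a leading `\\?\` from drive paths such as
/// `\\?\C:\foo`. Anything else, including `\\?\UNC\...`, is returned unchanged.
fn strip_verbatim_prefix(path: &str) -> &str {
    match path.strip_prefix(r"\\?\") {
        // Only shorten paths that continue with a drive letter and a colon.
        Some(rest)
            if rest.len() >= 2
                && rest.as_bytes()[0].is_ascii_alphabetic()
                && rest.as_bytes()[1] == b':' =>
        {
            rest
        }
        _ => path,
    }
}

fn main() {
    assert_eq!(strip_verbatim_prefix(r"\\?\C:\Users\me\file.txt"), r"C:\Users\me\file.txt");
    assert_eq!(strip_verbatim_prefix(r"\\?\UNC\server\share"), r"\\?\UNC\server\share");
    assert_eq!(strip_verbatim_prefix(r"C:\already\plain"), r"C:\already\plain");
}
```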
@ -421,8 +445,8 @@ pub fn canonicalize(path: &Path) -> std::io::Result<PathBuf> {
}

/// Reserves a virtual memory region of the given size.
/// To commit the memory, use `virtual_commit`.
/// To release the memory, use `virtual_release`.
/// To commit the memory, use [`virtual_commit`].
/// To release the memory, use [`virtual_release`].
///
/// # Safety
///
@ -456,7 +480,7 @@ pub unsafe fn virtual_reserve(size: usize) -> apperr::Result<NonNull<u8>> {
/// # Safety
///
/// This function is unsafe because it uses raw pointers.
/// Make sure to only pass pointers acquired from `virtual_reserve`.
/// Make sure to only pass pointers acquired from [`virtual_reserve`].
pub unsafe fn virtual_release(base: NonNull<u8>, size: usize) {
    unsafe {
        Memory::VirtualFree(base.as_ptr() as *mut _, size, Memory::MEM_RELEASE);
@ -468,8 +492,8 @@ pub unsafe fn virtual_release(base: NonNull<u8>, size: usize) {
/// # Safety
///
/// This function is unsafe because it uses raw pointers.
/// Make sure to only pass pointers acquired from `virtual_reserve`
/// and to pass a size less than or equal to the size passed to `virtual_reserve`.
/// Make sure to only pass pointers acquired from [`virtual_reserve`]
/// and to pass a size less than or equal to the size passed to [`virtual_reserve`].
pub unsafe fn virtual_commit(base: NonNull<u8>, size: usize) -> apperr::Result<()> {
    unsafe {
        check_ptr_return(Memory::VirtualAlloc(
@ -511,14 +535,17 @@ pub unsafe fn get_proc_address<T>(handle: NonNull<c_void>, name: &CStr) -> apper
    }
}

/// Loads the "common" portion of ICU4C.
pub fn load_libicuuc() -> apperr::Result<NonNull<c_void>> {
    unsafe { load_library(w!("icuuc.dll")) }
}

/// Loads the internationalization portion of ICU4C.
pub fn load_libicui18n() -> apperr::Result<NonNull<c_void>> {
    unsafe { load_library(w!("icuin.dll")) }
}

/// Returns a list of preferred languages for the current user.
pub fn preferred_languages(arena: &Arena) -> Vec<ArenaString, &Arena> {
    // If the languages from GetUserPreferredUILanguages() don't fit into 512 characters,
    // honestly, just give up. How many languages do you realistically need?
@ -606,6 +633,7 @@ pub(crate) fn io_error_to_apperr(err: std::io::Error) -> apperr::Error {
    gle_to_apperr(err.raw_os_error().unwrap_or(0) as u32)
}

/// Formats a platform error code into a human-readable string.
pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result {
    unsafe {
        let mut ptr: *mut u8 = null_mut();
@ -635,6 +663,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re
    }
}

/// Checks if the given error is a "file not found" error.
pub fn apperr_is_not_found(err: apperr::Error) -> bool {
    err == gle_to_apperr(Foundation::ERROR_FILE_NOT_FOUND)
}
|
||||
|
|
375
src/tui.rs
375
src/tui.rs
File diff suppressed because it is too large
Load diff
|
@ -6,17 +6,24 @@ use crate::document::ReadableDocument;
|
|||
use crate::helpers::{CoordType, Point};
|
||||
use crate::simd::{memchr2, memrchr2};
|
||||
|
||||
/// Stores a position inside a [`ReadableDocument`].
|
||||
///
|
||||
/// The cursor tracks both the absolute byte-offset,
|
||||
/// as well as the position in terminal-related coordinates.
|
||||
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub struct Cursor {
|
||||
/// Offset in bytes within the buffer.
|
||||
pub offset: usize,
|
||||
/// Position in the buffer in lines (.y) and grapheme clusters (.x).
|
||||
///
|
||||
/// Line wrapping has NO influence on this.
|
||||
pub logical_pos: Point,
|
||||
/// Position in the buffer in laid out rows (.y) and columns (.x).
|
||||
///
|
||||
/// Line wrapping has an influence on this.
|
||||
pub visual_pos: Point,
|
||||
/// Horizontal position in visual columns.
|
||||
///
|
||||
/// Line wrapping has NO influence on this and if word wrap is disabled,
|
||||
/// it's identical to `visual_pos.x`. This is useful for calculating tab widths.
|
||||
pub column: CoordType,
|
||||
|
@ -27,6 +34,7 @@ pub struct Cursor {
|
|||
pub wrap_opp: bool,
|
||||
}
|
||||
|
||||
/// Your entrypoint to navigating inside a [`ReadableDocument`].
|
||||
#[derive(Clone)]
|
||||
pub struct MeasurementConfig<'doc> {
|
||||
buffer: &'doc dyn ReadableDocument,
|
||||
|
@ -36,25 +44,41 @@ pub struct MeasurementConfig<'doc> {
|
|||
}
|
||||
|
||||
impl<'doc> MeasurementConfig<'doc> {
|
||||
/// Creates a new [`MeasurementConfig`] for the given document.
|
||||
pub fn new(buffer: &'doc dyn ReadableDocument) -> Self {
|
||||
Self { buffer, tab_size: 8, word_wrap_column: 0, cursor: Default::default() }
|
||||
}
|
||||
|
||||
/// Sets the tab size.
|
||||
///
|
||||
/// Defaults to 8, because that's what a tab in terminals evaluates to.
|
||||
pub fn with_tab_size(mut self, tab_size: CoordType) -> Self {
|
||||
self.tab_size = tab_size.max(1);
|
||||
self
|
||||
}
|
||||
|
||||
/// You want word wrap? Set it here!
|
||||
///
|
||||
/// Defaults to 0, which means no word wrap.
|
||||
pub fn with_word_wrap_column(mut self, word_wrap_column: CoordType) -> Self {
|
||||
self.word_wrap_column = word_wrap_column;
|
||||
self
|
||||
}
|
||||
|
||||
/// Sets the initial cursor to the given position.
|
||||
///
|
||||
/// WARNING: While the code doesn't panic if the cursor is invalid,
|
||||
/// the results will obviously be complete garbage.
|
||||
pub fn with_cursor(mut self, cursor: Cursor) -> Self {
|
||||
self.cursor = cursor;
|
||||
self
|
||||
}
|
||||
|
||||
/// Navigates **forward** to the given absolute offset.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// The cursor position after the navigation.
|
||||
pub fn goto_offset(&mut self, offset: usize) -> Cursor {
|
||||
self.cursor = Self::measure_forward(
|
||||
self.tab_size,
|
||||
|
@ -68,6 +92,13 @@ impl<'doc> MeasurementConfig<'doc> {
|
|||
self.cursor
|
||||
}
|
||||
|
||||
/// Navigates **forward** to the given logical position.
|
||||
///
|
||||
/// Logical positions are in lines and grapheme clusters.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// The cursor position after the navigation.
|
||||
pub fn goto_logical(&mut self, logical_target: Point) -> Cursor {
|
||||
self.cursor = Self::measure_forward(
|
||||
self.tab_size,
|
||||
|
@ -81,6 +112,13 @@ impl<'doc> MeasurementConfig<'doc> {
|
|||
self.cursor
|
||||
}
|
||||
|
||||
/// Navigates **forward** to the given visual position.
|
||||
///
|
||||
/// Visual positions are in laid out rows and columns.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// The cursor position after the navigation.
|
||||
pub fn goto_visual(&mut self, visual_target: Point) -> Cursor {
|
||||
self.cursor = Self::measure_forward(
|
||||
self.tab_size,
|
||||
|
@ -94,6 +132,7 @@ impl<'doc> MeasurementConfig<'doc> {
|
|||
self.cursor
|
||||
}
|
||||
|
||||
/// Returns the current cursor position.
|
||||
pub fn cursor(&self) -> Cursor {
|
||||
self.cursor
|
||||
}
|
||||
|
@@ -447,10 +486,33 @@ impl<'doc> MeasurementConfig<'doc> {
    }
}

/// Seeks forward to the given line start.
///
/// If given a piece of `text`, and assuming you're currently at `offset` which
/// is on the logical line `line`, this will seek forward until the logical line
/// `line_stop` is reached. For instance, if `line` is 0 and `line_stop` is 2,
/// it'll seek forward past 2 line feeds.
///
/// This function always stops exactly past a line feed
/// and thus returns a position at the start of a line.
///
/// # Warning
///
/// If the end of `text` is hit before reaching `line_stop`, the function
/// will return an offset of `text.len()`, not at the start of a line.
///
/// # Parameters
///
/// * `text`: The text to search in.
/// * `offset`: The offset to start searching from.
/// * `line`: The current line.
/// * `line_stop`: The line to stop at.
///
/// # Returns
///
/// A tuple consisting of:
/// * The new offset.
/// * The line number that was reached.
pub fn newlines_forward(
    text: &[u8],
    mut offset: usize,
@@ -467,6 +529,13 @@ pub fn newlines_forward(
    offset = offset.min(len);

    loop {
        // TODO: This code could be optimized by replacing memchr with manual line counting.
        //
        // If `line_stop` is very far away, we could accumulate newline counts horizontally
        // in an AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal
        // sum via `_mm256_sad_epu8` yielding us the newline count in the last block.
        //
        // We could also just use `_mm256_sad_epu8` on each fetch as-is.
        offset = memchr2(b'\n', b'\n', text, offset);
        if offset >= len {
            break;
@@ -482,9 +551,18 @@ pub fn newlines_forward(
    (offset, line)
}
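The TODO in the loop of `newlines_forward` describes counting newlines in per-byte accumulator lanes and only occasionally reducing them to a total. Here is a portable sketch of that idea, assuming a `[u8; 32]` array stands in for the 32 byte lanes of an AVX2 register and a plain sum stands in for the `_mm256_sad_epu8` reduction; `count_newlines` is a hypothetical helper. Because each lane is a `u8`, it can absorb at most 255 increments before wrapping, so the sketch reduces after every 255 blocks of 32 bytes:

```rust
/// Hypothetical helper: counts b'\n' in `text` using 32 u8 accumulator lanes,
/// mirroring the AVX2 approach described in the TODO (portable, no intrinsics).
pub fn count_newlines(text: &[u8]) -> usize {
    const LANES: usize = 32;
    // A u8 lane overflows after 255 increments, so reduce at least this often.
    const MAX_BLOCKS: usize = 255;

    let mut total = 0usize;
    let mut lanes = [0u8; LANES];
    let mut blocks = 0;

    let mut chunks = text.chunks_exact(LANES);
    for chunk in &mut chunks {
        for (lane, &b) in lanes.iter_mut().zip(chunk) {
            // Equivalent of a byte-wise compare, adding 1 to each matching lane.
            *lane += (b == b'\n') as u8;
        }
        blocks += 1;
        if blocks == MAX_BLOCKS {
            // Horizontal sum. `_mm256_sad_epu8` against zero would produce four
            // 64-bit partial sums here, which would then be added together.
            total += lanes.iter().map(|&x| x as usize).sum::<usize>();
            lanes = [0; LANES];
            blocks = 0;
        }
    }
    total += lanes.iter().map(|&x| x as usize).sum::<usize>();
    // Handle the tail that doesn't fill a whole block.
    total + chunks.remainder().iter().filter(|&&b| b == b'\n').count()
}
```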

/// Seeks backward to the given line start.
///
/// See [`newlines_forward`] for details.
/// This function does almost the same thing, but in reverse.
///
/// # Warning
///
/// In addition to the notes in [`newlines_forward`]:
///
/// No matter what parameters are given, [`newlines_backward`] only returns an
/// offset at the start of a line. Put differently, even if `line == line_stop`,
/// it'll seek backward to the line start.
pub fn newlines_backward(
    text: &[u8],
    mut offset: usize,
@@ -506,6 +584,10 @@ pub fn newlines_backward(
    }
}
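To pin down the contracts documented for `newlines_forward` and `newlines_backward`, here is a naive byte-by-byte reference sketch. It is a reading of the doc comments, not the actual implementation (which uses `memchr2`); the names `newlines_forward_naive` and `newlines_backward_naive`, the `usize` line numbers, and returning offset 0 when the start of `text` is reached are assumptions:

```rust
/// Hypothetical reference version of `newlines_forward`: seeks past line feeds
/// until `line_stop` is reached, or returns `text.len()` if `text` runs out first.
pub fn newlines_forward_naive(
    text: &[u8],
    mut offset: usize,
    mut line: usize,
    line_stop: usize,
) -> (usize, usize) {
    offset = offset.min(text.len());
    while line < line_stop {
        match text[offset..].iter().position(|&b| b == b'\n') {
            Some(i) => {
                // Stop exactly past the line feed, i.e. at the start of the next line.
                offset += i + 1;
                line += 1;
            }
            // Ran out of text: the documented "Warning" case.
            None => return (text.len(), line),
        }
    }
    (offset, line)
}

/// Hypothetical reference version of `newlines_backward`: always lands on a
/// line start, even when `line == line_stop`.
pub fn newlines_backward_naive(
    text: &[u8],
    mut offset: usize,
    mut line: usize,
    line_stop: usize,
) -> (usize, usize) {
    offset = offset.min(text.len());
    loop {
        match text[..offset].iter().rposition(|&b| b == b'\n') {
            Some(i) => {
                if line <= line_stop {
                    // Start of the current line: just past the previous line feed.
                    return (i + 1, line);
                }
                // Step onto the previous line and keep searching before its line feed.
                line -= 1;
                offset = i;
            }
            // Reached the start of `text`, which is always a line start.
            None => return (0, line),
        }
    }
}
```

With `text = b"ab\ncd\nef"`, seeking forward from offset 0 (line 0) to line 2 lands on offset 6, the start of `ef`, while seeking backward from offset 7 on line 2 to line 2 also lands on offset 6.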

/// Returns an offset past a newline.
///
/// If `offset` is right in front of a newline,
/// this will return the offset past said newline.
pub fn skip_newline(text: &[u8], mut offset: usize) -> usize {
    if offset >= text.len() {
        return offset;
@@ -522,6 +604,7 @@ pub fn skip_newline(text: &[u8], mut offset: usize) -> usize {
    offset
}

/// Strips a trailing newline from the given text.
pub fn strip_newline(mut text: &[u8]) -> &[u8] {
    // Rust generates surprisingly tight assembly for this.
    if text.last() == Some(&b'\n') {
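The bodies of `skip_newline` and `strip_newline` are mostly hidden in this diff. A minimal sketch of the documented behavior, assuming a "newline" means either `\n` or `\r\n` (an assumption, since only the `\n` check is visible); `skip_newline_sketch` and `strip_newline_sketch` are hypothetical names:

```rust
/// Hypothetical sketch of `skip_newline`, assuming a newline is `\n` or `\r\n`.
/// Returns `offset` unchanged if no newline starts there.
pub fn skip_newline_sketch(text: &[u8], offset: usize) -> usize {
    if offset >= text.len() {
        return offset;
    }
    if text[offset..].starts_with(b"\r\n") {
        offset + 2
    } else if text[offset] == b'\n' {
        offset + 1
    } else {
        offset
    }
}

/// Hypothetical sketch of `strip_newline`: removes one trailing `\n` or `\r\n`.
pub fn strip_newline_sketch(mut text: &[u8]) -> &[u8] {
    if let Some(rest) = text.strip_suffix(b"\n") {
        text = rest;
        // Only strip a `\r` that directly precedes the `\n`.
        if let Some(rest) = text.strip_suffix(b"\r") {
            text = rest;
        }
    }
    text
}
```

Note that a lone trailing `\r` is left alone by this sketch, because the `\r` check only runs after a `\n` was found.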
@@ -1,3 +1,5 @@
//! Everything related to Unicode lives here.

mod measurement;
mod tables;
mod utf8;
@@ -1,5 +1,14 @@
use std::{hint, iter};

/// An iterator over UTF-8 encoded characters.
///
/// This differs from [`std::str::Chars`] in that it works on unsanitized
/// byte slices and transparently replaces invalid UTF-8 sequences with U+FFFD.
///
/// This follows ICU's bitmask approach for `U8_NEXT_OR_FFFD` relatively
/// closely. This is important for compatibility, because it implements the
/// WHATWG recommendation for UTF-8 error recovery. It's also helpful, because
/// the excellent folks at ICU have probably spent a lot of time optimizing it.
#[derive(Clone, Copy)]
pub struct Utf8Chars<'a> {
    source: &'a [u8],
@@ -7,30 +16,39 @@ pub struct Utf8Chars<'a> {
}
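To make the U+FFFD behavior concrete: the WHATWG recommendation follows the Unicode "substitution of maximal subparts" practice, and Rust's `String::from_utf8_lossy` replaces invalid input in the same manner. The sketch below therefore uses the standard library as a stand-in to show how many replacement characters various malformed inputs produce; it does not exercise `Utf8Chars` itself, and `lossy` is a hypothetical helper:

```rust
/// Hypothetical helper: decodes `bytes` lossily and counts the U+FFFD replacements.
pub fn lossy(bytes: &[u8]) -> (String, usize) {
    let decoded = String::from_utf8_lossy(bytes).into_owned();
    let replacements = decoded.chars().filter(|&c| c == '\u{FFFD}').count();
    (decoded, replacements)
}
```

A truncated 4-byte sequence such as `F0 9F 98` followed by `b` collapses into a single U+FFFD, whereas `F4 90 80 80` yields four, because `90` is not a valid first trail byte after `F4` and each remaining byte is then an isolated trail byte.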

impl<'a> Utf8Chars<'a> {
    /// Creates a new `Utf8Chars` iterator starting at the given `offset`.
    pub fn new(source: &'a [u8], offset: usize) -> Self {
        Self { source, offset }
    }

    /// Returns the byte slice this iterator was created with.
    pub fn source(&self) -> &'a [u8] {
        self.source
    }

    /// Checks if the source is empty.
    pub fn is_empty(&self) -> bool {
        self.source.is_empty()
    }

    /// Returns the length of the source.
    pub fn len(&self) -> usize {
        self.source.len()
    }

    /// Returns the current offset in the byte slice.
    ///
    /// This will be past the last returned character.
    pub fn offset(&self) -> usize {
        self.offset
    }

    /// Sets the offset to continue iterating from.
    pub fn seek(&mut self, offset: usize) {
        self.offset = offset;
    }

    /// Returns true if `next` will return another character.
    pub fn has_next(&self) -> bool {
        self.offset < self.source.len()
    }
@@ -39,9 +57,6 @@ impl<'a> Utf8Chars<'a> {
    // performance actually suffers when this gets inlined.
    #[cold]
    fn next_slow(&mut self, c: u8) -> char {
        // See: https://datatracker.ietf.org/doc/html/rfc3629
        // as well as ICU's `utf8.h` for the bitmask approach.

        if self.offset >= self.source.len() {
            return Self::fffd();
        }
@@ -114,12 +129,10 @@ impl<'a> Utf8Chars<'a> {
            // The trail byte is the index and the lead byte mask is the value.
            // This is because the split at 0x90 requires more bits than fit into a u8.
            const TRAIL1_LEAD_BITS: [u8; 16] = [
                // --------- 0xF4 lead
                // |         ...
                // |   +---- 0xF0 lead
                // v   v
                0b_00000, //
                0b_00000, //
                0b_00000, //
@@ -143,6 +156,8 @@ impl<'a> Utf8Chars<'a> {
            cp &= !0xF0;

            // Now we can verify if it's actually <= 0xF4.
            // Curiously, this if condition does a lot of heavy lifting for
            // performance (+13%). I think it's just a coincidence though.
            if cp > 4 {
                return Self::fffd();
            }
@@ -191,7 +206,8 @@ impl<'a> Utf8Chars<'a> {
        }
    }

    // This simultaneously serves as a `cold_path` marker.
    // It improves performance by ~5% and reduces code size.
    #[cold]
    #[inline(always)]
    fn fffd() -> char {
@@ -202,8 +218,6 @@ impl<'a> Utf8Chars<'a> {
impl Iterator for Utf8Chars<'_> {
    type Item = char;

    // At opt-level="s", this function doesn't get inlined,
    // but performance greatly suffers in that case.
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.source.len() {
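Only the first three rows of `TRAIL1_LEAD_BITS` are visible in this diff. Going by the RFC 3629 ranges for the first trail byte of a 4-byte sequence (`F0` takes `90..BF`, `F1..F3` take `80..BF`, `F4` takes `80..8F`) and by the bit order in the comment diagram (bit 0 is `0xF0`, bit 4 is `0xF4`), a table and lookup along these lines would implement the check. The remaining rows and the helper `first_trail_ok` are a reconstruction for illustration, not copied from the source:

```rust
/// Reconstructed sketch: indexed by the high nibble of the first trail byte,
/// each entry has bit `n` set if lead byte `0xF0 + n` accepts that trail byte.
const TRAIL1_LEAD_BITS: [u8; 16] = [
    0b_00000, 0b_00000, 0b_00000, 0b_00000, // 0x00..=0x3F: not trail bytes
    0b_00000, 0b_00000, 0b_00000, 0b_00000, // 0x40..=0x7F: not trail bytes
    0b_11110, // 0x80..=0x8F: valid after F1, F2, F3, F4 (overlong after F0)
    0b_01111, // 0x90..=0x9F: valid after F0..=F3 (beyond U+10FFFF after F4)
    0b_01111, // 0xA0..=0xAF
    0b_01111, // 0xB0..=0xBF
    0b_00000, 0b_00000, 0b_00000, 0b_00000, // 0xC0..=0xFF: not trail bytes
];

/// Hypothetical helper: checks whether `trail` may directly follow the 4-byte lead `lead`.
pub fn first_trail_ok(lead: u8, trail: u8) -> bool {
    // Leads above 0xF4 would encode code points beyond U+10FFFF.
    if !(0xF0..=0xF4).contains(&lead) {
        return false;
    }
    let bit = lead & 0x07; // 0..=4, selects the bit for this lead byte
    TRAIL1_LEAD_BITS[(trail >> 4) as usize] & (1 << bit) != 0
}
```

A single 16-byte table replaces what would otherwise be a branch per lead byte, which is the point of the bitmask approach the struct documentation attributes to ICU.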
38
src/vt.rs
@@ -1,19 +1,38 @@
//! Our VT parser.

use std::{mem, time};

use crate::simd::memchr2;

/// The parser produces these tokens.
pub enum Token<'parser, 'input> {
    /// A bunch of text. Doesn't contain any control characters.
    Text(&'input str),
    /// A single control character, like backspace or return.
    Ctrl(char),
    /// We encountered `ESC x` and this contains `x`.
    Esc(char),
    /// We encountered `ESC O x` and this contains `x`.
    SS3(char),
    /// A CSI sequence started with `ESC [`.
    ///
    /// These are the most common escape sequences. See [`Csi`].
    Csi(&'parser Csi),
    /// An OSC sequence started with `ESC ]`.
    ///
    /// The sequence may be split up into multiple tokens if the input
    /// is given in chunks. This is indicated by the `partial` field.
    Osc { data: &'input str, partial: bool },
    /// A DCS sequence started with `ESC P`.
    ///
    /// The sequence may be split up into multiple tokens if the input
    /// is given in chunks. This is indicated by the `partial` field.
    Dcs { data: &'input str, partial: bool },
}

/// Stores the state of the parser.
#[derive(Clone, Copy)]
enum State {
    Ground,
    Esc,
    Ss3,
@@ -24,10 +43,20 @@ pub enum State {
    DcsEsc,
}

/// A single CSI sequence, parsed for your convenience.
pub struct Csi {
    /// The parameters of the CSI sequence.
    pub params: [u16; 32],
    /// The number of parameters stored in [`Csi::params`].
    pub param_count: usize,
    /// The private byte, if any. `0` if none.
    ///
    /// The private byte is the first character right after the
    /// `ESC [` sequence. It is usually a `?` or `<`.
    pub private_byte: char,
    /// The final byte of the CSI sequence.
    ///
    /// This is the last character of the sequence, e.g. `m` or `H`.
    pub final_byte: char,
}
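As a rough illustration of how the `Csi` fields map onto a raw sequence, here is a small standalone parser sketch for complete, in-memory sequences. It follows the ECMA-48 byte ranges (parameter bytes `0x30..=0x3F`, final byte `0x40..=0x7E`), rejects intermediate bytes and `:` sub-parameters, and saturates values at `u16::MAX`. `ParsedCsi` and `parse_csi` are hypothetical: unlike the real parser they use a `Vec` instead of a fixed `[u16; 32]` plus `param_count`, and they do not handle input split across chunks:

```rust
/// Hypothetical mirror of the `Csi` fields above.
#[derive(Debug, PartialEq)]
pub struct ParsedCsi {
    pub params: Vec<u16>,
    pub private_byte: char, // '\0' if none
    pub final_byte: char,
}

/// Parses a complete `ESC [ ... final` sequence. Returns `None` for anything else.
pub fn parse_csi(seq: &str) -> Option<ParsedCsi> {
    let body = seq.strip_prefix("\x1b[")?;
    let mut chars = body.chars().peekable();

    // Optional private marker such as `?` or `<` (0x3C..=0x3F).
    let mut private_byte = '\0';
    if let Some(&c) = chars.peek() {
        if ('<'..='?').contains(&c) {
            private_byte = c;
            chars.next();
        }
    }

    let mut params = Vec::new();
    let mut current: Option<u16> = None;
    for c in chars {
        match c {
            '0'..='9' => {
                let digit = (c as u32 - '0' as u32) as u16;
                let value = current.unwrap_or(0);
                current = Some(value.saturating_mul(10).saturating_add(digit));
            }
            // An empty parameter (e.g. `;5`) defaults to 0.
            ';' => params.push(current.take().unwrap_or(0)),
            '\x40'..='\x7e' => {
                if current.is_some() || !params.is_empty() {
                    params.push(current.unwrap_or(0));
                }
                return Some(ParsedCsi { params, private_byte, final_byte: c });
            }
            _ => return None,
        }
    }
    None // No final byte: the sequence is incomplete.
}
```

For example, `"\x1b[?25h"` yields `private_byte: '?'`, `params: [25]` and `final_byte: 'h'`.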

@@ -73,6 +102,9 @@ impl Parser {
    }
}

/// An iterator that parses VT sequences into [`Token`]s.
///
/// Can't implement [`Iterator`], because this is a "lending iterator".
pub struct Stream<'parser, 'input> {
    parser: &'parser mut Parser,
    input: &'input str,
@@ -80,10 +112,12 @@ pub struct Stream<'parser, 'input> {
}
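The "lending iterator" remark can be illustrated in isolation. `Iterator::next` cannot return an `Item` that borrows from the `&mut self` of that particular call, so a type whose tokens point into a buffer it owns and reuses has to provide an inherent `next` method and be consumed with `while let` instead of `for`. A minimal sketch of that pattern, using a hypothetical `UpperWords` type that reuses one `String` for every token (loosely analogous to a parser reusing its `Csi` storage):

```rust
/// Hypothetical lending iterator: each returned `&str` points into `self.buf`,
/// which is overwritten on the next call, so `Iterator` cannot be implemented.
pub struct UpperWords<'input> {
    input: std::str::SplitWhitespace<'input>,
    buf: String,
}

impl<'input> UpperWords<'input> {
    pub fn new(input: &'input str) -> Self {
        Self { input: input.split_whitespace(), buf: String::new() }
    }

    /// The returned slice borrows `self`, so it must be dropped before calling `next` again.
    #[allow(clippy::should_implement_trait)]
    pub fn next(&mut self) -> Option<&str> {
        let word = self.input.next()?;
        // Reuse one allocation for every token.
        self.buf.clear();
        self.buf.extend(word.chars().flat_map(char::to_uppercase));
        Some(self.buf.as_str())
    }
}
```

A caller drains it with `while let Some(word) = words.next() { ... }`; the borrow checker then guarantees that no token outlives the next call.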

impl<'parser, 'input> Stream<'parser, 'input> {
    /// Returns the input that is being parsed.
    pub fn input(&self) -> &'input str {
        self.input
    }

    /// Returns the current parser offset.
    pub fn offset(&self) -> usize {
        self.off
    }
@@ -99,8 +133,6 @@ impl<'parser, 'input> Stream<'parser, 'input> {
    }

    /// Parses the next VT sequence from the previously given input.
    #[allow(clippy::should_implement_trait)]
    pub fn next(&mut self) -> Option<Token<'parser, 'input>> {
        // I don't know how to tell Rust that `self.parser` and its lifetime
15
tools/grapheme-table-gen/README.md
Normal file
@@ -0,0 +1,15 @@
# Grapheme Table Generator

This tool processes Unicode Character Database (UCD) XML files to generate efficient, multi-stage trie lookup tables for properties relevant to terminal applications:
* Grapheme cluster breaking rules
* Line breaking rules (optional)
* Character width properties

## Usage

* Download [ucd.nounihan.grouped.zip](https://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip)
* Run some equivalent of:
  ```sh
  grapheme-table-gen --lang=rust --extended --no-ambiguous --line-breaks path/to/ucd.nounihan.grouped.xml
  ```
* Place the result in `src/unicode/tables.rs`
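For readers unfamiliar with multi-stage tables: a code point is split into a high part, which indexes a small first-stage array, and a low part, which indexes into a deduplicated block of the second stage, so that long runs of identical values share storage. The following is a generic two-stage sketch, not the tool's actual output format or stage sizes; `TwoStage`, `BLOCK_BITS`, and the example width property in the usage note are hypothetical:

```rust
use std::collections::HashMap;

/// Each block covers 2^BLOCK_BITS consecutive code points.
const BLOCK_BITS: u32 = 6;
const BLOCK_SIZE: usize = 1 << BLOCK_BITS;

/// Hypothetical two-stage lookup table.
pub struct TwoStage {
    pub stage1: Vec<u16>, // block number -> index of its (deduplicated) block in stage2
    pub stage2: Vec<u8>,  // concatenated unique blocks of property values
}

impl TwoStage {
    /// Builds the table for code points `0..limit` (`limit` must be a multiple of BLOCK_SIZE).
    pub fn build(limit: u32, property: impl Fn(u32) -> u8) -> Self {
        let mut stage1 = Vec::new();
        let mut stage2 = Vec::new();
        let mut seen: HashMap<Vec<u8>, u16> = HashMap::new();
        for start in (0..limit).step_by(BLOCK_SIZE) {
            let block: Vec<u8> = (start..start + BLOCK_SIZE as u32).map(&property).collect();
            // Identical blocks are stored once and shared through stage1.
            let next_index = seen.len() as u16;
            let index = *seen.entry(block.clone()).or_insert_with(|| {
                stage2.extend_from_slice(&block);
                next_index
            });
            stage1.push(index);
        }
        Self { stage1, stage2 }
    }

    /// Looks up the property of `cp`, which must be below the `limit` used in `build`.
    pub fn lookup(&self, cp: u32) -> u8 {
        let block = self.stage1[(cp >> BLOCK_BITS) as usize] as usize;
        self.stage2[block * BLOCK_SIZE + (cp as usize & (BLOCK_SIZE - 1))]
    }
}
```

With a toy width property (0 for `U+0300..=U+036F`, 2 for `U+1100..=U+115F`, 1 otherwise), the 65,536 code points of the BMP collapse to a handful of unique 64-entry blocks, while each lookup stays at two array reads.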