* internals.texi (C Integer Types): New section.

This follows up and records an email in <http://lists.gnu.org/archive/html/emacs-devel/2012-07/msg00496.html>.
2012-12-10 16:13:44 -08:00
commit d92d9c9501
parent ed6f2cd47f
2 changed files with 94 additions and 0 deletions
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,9 @@
+2012-12-11  Paul Eggert  <eggert@cs.ucla.edu>
+
+	* internals.texi (C Integer Types): New section.
+	This follows up and records an email in
+	<http://lists.gnu.org/archive/html/emacs-devel/2012-07/msg00496.html>.
+
 2012-12-10  Stefan Monnier  <monnier@iro.umontreal.ca>

 	* control.texi (Pattern maching case statement): New node.
--- a/doc/lispref/internals.texi
+++ b/doc/lispref/internals.texi
@@ -16,6 +16,7 @@ internal aspects of GNU Emacs that may be of interest to C programmers.
 * Memory Usage::        Info about total size of Lisp objects made so far.
 * Writing Emacs Primitives::   Writing C code for Emacs.
 * Object Internals::    Data formats of buffers, windows, processes.
+* C Integer Types::     How C integer types are used inside Emacs.
@end menu

@node Building Emacs
@@ -1531,4 +1532,91 @@ Symbol indicating the type of process: @code{real}, @code{network},

@end table

+@node C Integer Types
+@section C Integer Types
+@cindex integer types (C programming language)
+
+Here are some guidelines for use of integer types in the Emacs C
+source code.  These guidelines sometimes give competing advice; common
+sense is advised.
+
+@itemize @bullet
+@item
+Avoid arbitrary limits.  For example, avoid @code{int len = strlen
+(s);} unless the length of @code{s} is required for other reasons to
+fit in @code{int} range.
+
+@item
+Do not assume that signed integer arithmetic wraps around on overflow.
+This is no longer true of Emacs porting targets: signed integer
+overflow has undefined behavior in practice, and can dump core or
+even cause earlier or later code to behave ``illogically''.  Unsigned
+overflow does wrap around reliably, modulo a power of two.
+
+@item
+Prefer signed types to unsigned, as code gets confusing when signed
+and unsigned types are combined.  Many other guidelines assume that
+types are signed; in the rarer cases where unsigned types are needed,
+similar advice may apply to the unsigned counterparts (e.g.,
+@code{size_t} instead of @code{ptrdiff_t}, or @code{uintptr_t} instead
+of @code{intptr_t}).
+
+@item
+Prefer @code{int} for Emacs character codes, in the range 0 ..@: 0x3FFFFF.
+
+@item
+Prefer @code{ptrdiff_t} for sizes, i.e., for integers bounded by the
+maximum size of any individual C object or by the maximum number of
+elements in any C array.  This is part of Emacs's general preference
+for signed types.  Using @code{ptrdiff_t} limits objects to
+@code{PTRDIFF_MAX} bytes, but larger objects would cause trouble
+anyway since they would break pointer subtraction, so this does not
+impose an arbitrary limit.
+
+@item
+Prefer @code{intptr_t} for internal representations of pointers, or
+for integers bounded only by the number of objects that can exist at
+any given time or by the total number of bytes that can be allocated.
+Currently Emacs sometimes uses other types when @code{intptr_t} would
+be better; fixing this is lower priority, as the code works as-is on
+Emacs's current porting targets.
+
+@item
+Prefer the Emacs-defined type @code{EMACS_INT} for representing values
+converted to or from Emacs Lisp fixnums, as fixnum arithmetic is based
+on @code{EMACS_INT}.
+
+@item
+When representing a system value (such as a file size or a count of
+seconds since the Epoch), prefer the corresponding system type (e.g.,
+@code{off_t}, @code{time_t}).  Do not assume that a system type is
+signed, unless this assumption is known to be safe.  For example,
+although @code{off_t} is always signed, @code{time_t} need not be.
+
+@item
+Prefer the Emacs-defined type @code{printmax_t} for representing
+values that might be any signed integer value that can be printed
+using a @code{printf}-family function.
+
+@item
+Prefer @code{intmax_t} for representing values that might be any
+signed integer value.
+
+@item
+In bit fields, prefer @code{unsigned int} or @code{signed int} to
+@code{int}, as @code{int} is less portable: a plain @code{int} bit
+field might be signed, and might not be.  Single-bit bit fields are
+invariably @code{unsigned int} so that their values are 0 and 1.
+
+@item
+In C, Emacs commonly uses @code{bool}, 1, and 0 for boolean values.
+Using @code{bool} for booleans can make programs easier to read and a
+bit faster than using @code{int}.  Although it is also OK to use
+@code{int}, this older style is gradually being phased out.  When
+using @code{bool}, respect the limitations of the replacement
+implementation of @code{bool}, as documented in the source file
+@file{lib/stdbool.in.h}, so that Emacs remains portable to pre-C99
+platforms.
+@end itemize
+
@c FIXME Mention src/globals.h somewhere in this file?
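
A few of the guidelines added above can be illustrated with a short C sketch. This is not code from the Emacs sources: the names checked_add, wrapping_add and string_length are hypothetical helpers invented for this example. The sketch shows a size kept in ptrdiff_t rather than int, an addition that is tested for overflow before it is performed (since signed overflow is undefined), and unsigned arithmetic that wraps around modulo a power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an Emacs function.  Store A + B in *SUM
   and return true; return false and leave *SUM alone if the sum
   would not fit in ptrdiff_t.  The range test comes before the
   addition because signed overflow has undefined behavior.  */
static bool
checked_add (ptrdiff_t a, ptrdiff_t b, ptrdiff_t *sum)
{
  if (b < 0 ? a < PTRDIFF_MIN - b : PTRDIFF_MAX - b < a)
    return false;
  *sum = a + b;
  return true;
}

/* Hypothetical helper.  Unsigned addition is defined to wrap around
   modulo UINT_MAX + 1, so no range test is needed here.  */
static unsigned int
wrapping_add (unsigned int a, unsigned int b)
{
  return a + b;
}

/* Hypothetical helper.  Keep a string length in ptrdiff_t instead of
   int, so that the only limit is the one C objects already have.  */
static ptrdiff_t
string_length (char const *s)
{
  assert (s != NULL);
  ptrdiff_t len = strlen (s);
  return len;
}
```

A caller might write `if (!checked_add (nbytes, extra, &total))` and report an error (for instance, memory exhaustion) when the function returns false, instead of computing `nbytes + extra` directly. The variable names in that call are likewise only illustrative.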