A simple implementation for a flexible-sized dictionary of key-value pairs.


§1. Storage. There's nothing fancy here. A "dictionary" is a hash table allowing reasonably efficient lookup: it's a correspondence between "keys", which are texts, and pointers to some external structure. Often these will also be texts, but not necessarily.

    typedef struct dictionary {
        int textual;  values are texts?
        int no_entries;  total number of key-value pairs currently stored here
        int hash_table_size;  size of array...
        struct dict_entry *hash_table;  ...of linked lists of dictionary entries
        MEMORY_MANAGEMENT
    } dictionary;

    typedef struct dict_entry {
        int vacant;  a "vacant" entry is not currently used to store a k-v pair
        struct text_stream *key;  for non-vacant entries only: the key text
        void *value;  for non-vacant entries only: the value, some kind of pointer
        struct dict_entry *next_in_entry;
    } dict_entry;

The structure dictionary is private to this section.

The structure dict_entry is accessed in 5/ee, 8/bdfw and here.

§2. Creation. Dictionaries can have arbitrary size, in that they expand as needed, but for efficiency's sake the caller can set them up with an initial size of her choice.

    dictionary *Dictionaries::new(int S, int textual) {
        if (S < 2) internal_error("dictionary too small");
        dictionary *D = CREATE(dictionary);
        D->textual = textual;
        D->hash_table_size = S;
        D->no_entries = 0;
        D->hash_table = Memory::I7_calloc(S, sizeof(dict_entry), DICTIONARY_MREASON);
        for (int i=0; i<S; i++) {
            D->hash_table[i].vacant = TRUE;
            D->hash_table[i].value = NULL;
            D->hash_table[i].next_in_entry = NULL;
        }
        return D;
    }

The function Dictionaries::new is used in §7.2, 3/cla (§5), 4/sm (§26.1).

§3. Logging.

    void Dictionaries::log(OUTPUT_STREAM, dictionary *D) {
        WRITE("Dictionary:\n", (unsigned int) D); INDENT;
        for (int i=0; i<D->hash_table_size; i++) {
            WRITE("Slot %02d:", i);
            for (dict_entry *E = &(D->hash_table[i]); E; E = E->next_in_entry)
                if (E->vacant) WRITE(" vacant");
                else if (D->textual) WRITE(" %S='%S'", E->key, E->value);
                else WRITE(" %S=%08x", E->key, (unsigned int) E->value);
            WRITE("\n");
        }
        OUTDENT;
    }

The function Dictionaries::log appears nowhere else.

§4. Hashing. The whole point of a hash table is that it crudely sorts the contents by a rough indication of the key values. This crude indication is the hash value, calculated here. If there are N slots in the dictionary table, this tells us which slot (from 0 to N-1) a given key value belongs in.

    int Dictionaries::hash(text_stream *K, int N) {
        unsigned int hash = 0;
        LOOP_THROUGH_TEXT(P, K)
            hash = 16339*hash + ((unsigned int) Str::get(P));
        return (int) (hash % ((unsigned int) N));
    }

The function Dictionaries::hash is used in §7.3.

§5. Create, find, destroy. These three fundamental operations locate the dictionary entry structure for a given key value, and then do something to/with it. Note that these pointers remain valid only until somebody writes a new value into the dictionary; so be careful if thread safety's an issue.

    dict_entry *Dictionaries::find(dictionary *D, text_stream *K) {
        return Dictionaries::find_p(D, K, 0);
    }
    dict_entry *Dictionaries::create(dictionary *D, text_stream *K) {
        return Dictionaries::find_p(D, K, 1);
    }
    void Dictionaries::destroy(dictionary *D, text_stream *K) {
        Dictionaries::find_p(D, K, -1);
    }

The function Dictionaries::find is used in §6, §8, §10, 3/cla (§13).

The function Dictionaries::create is used in §6, §9, 3/cla (§5).

The function Dictionaries::destroy is used in §6.

§6. A nuisance we have to live with is that we often want to express the key as wide text (so that we can use literals like L"my-key") instead of text streams. So we also offer versions suffixed _literal:

    dict_entry *Dictionaries::find_literal(dictionary *D, wchar_t *lit) {
        TEMPORARY_TEXT(K);
        WRITE_TO(K, "%w", lit);
        dict_entry *E = Dictionaries::find(D, K);
        DISCARD_TEXT(K);
        return E;
    }
    dict_entry *Dictionaries::create_literal(dictionary *D, wchar_t *lit) {
        TEMPORARY_TEXT(K);
        WRITE_TO(K, "%w", lit);
        dict_entry *E = Dictionaries::create(D, K);
        DISCARD_TEXT(K);
        return E;
    }
    void Dictionaries::destroy_literal(dictionary *D, wchar_t *lit) {
        TEMPORARY_TEXT(K);
        WRITE_TO(K, "%w", lit);
        Dictionaries::destroy(D, K);
        DISCARD_TEXT(K);
    }

The function Dictionaries::find_literal is used in §8, §10.

The function Dictionaries::create_literal is used in §9, 4/sm (§26.1).

The function Dictionaries::destroy_literal appears nowhere else.

§7. So, then, find an entry (if change is 0), create it (if +1) or delete it (if -1).

    dict_entry *Dictionaries::find_p(dictionary *D, text_stream *K, int change) {
        if (D == NULL) <Handle the null dictionary 7.1>;
        if (change == 1) <Expand the dictionary if necessary 7.2>;
        <Work within the existing dictionary 7.3>;
    }

The function Dictionaries::find_p is used in §5, §7.2.

§7.1. It's legal to perform a find on the null dictionary: the answer's always "no".

<Handle the null dictionary 7.1> =

        if (change == 0) return NULL;
        internal_error("tried to create or destroy in a null dictionary");

This code is used in §7.

§7.2. For speed, our policy is that the hash table should have roughly the same number of slots as there are entries in the dictionary; that way, each slot will have an average of one or fewer entries. When we exceed this ideal population, we double the dictionary's capacity.

<Expand the dictionary if necessary 7.2> =

        if (D->no_entries > D->hash_table_size) {
            dictionary *D2 = Dictionaries::new(2*D->hash_table_size, D->textual);
            for (int i=0; i<D->hash_table_size; i++)
                for (dict_entry *E = &(D->hash_table[i]); E; E = E->next_in_entry)
                    if (E->vacant == FALSE) {
                        dict_entry *E2 = Dictionaries::find_p(D2, E->key, 1);
                        E2->value = E->value;
                    }
            Memory::I7_free(D->hash_table, DICTIONARY_MREASON,
                D->hash_table_size*((int) sizeof(dict_entry)));
            D->hash_table_size = D2->hash_table_size;
            D->hash_table = D2->hash_table;
        }

This code is used in §7.

§7.3. <Work within the existing dictionary 7.3> =

        dict_entry *last_E = NULL;
        for (dict_entry *E = &(D->hash_table[Dictionaries::hash(K, D->hash_table_size)]);
            E; E = E->next_in_entry) {
            last_E = E;
            if (E->vacant) {
                if (change == 1) { <Make E the new entry 7.3.1>; return E; }
            } else {
                if (Str::eq(K, E->key)) {
                    if (change == -1) <Destroy the E entry 7.3.2>;
                    return E;
                }
            }
        }
        if (change == 1) {
            dict_entry *E = CREATE(dict_entry);
            <Make E the new entry 7.3.1>;
            last_E->next_in_entry = E;
            return E;
        }
        return NULL;

This code is used in §7.

§7.3.1. When creating text values, we want them to be empty strings rather than null strings, so that printing to them will work.

<Make E the new entry 7.3.1> =

        E->vacant = FALSE;
        if (D->textual) E->value = Str::new(); else E->value = NULL;
        E->key = Str::duplicate(K);
        D->no_entries++;

This code is used in §7.3 (twice).

§7.3.2. When deleting text values, we close them first, to give back the memory. Careful: it would not be thread-safe to allow different threads to use the same dictionary if deletions are a possibility.

<Destroy the E entry 7.3.2> =

        E->vacant = TRUE; D->no_entries--;
        if ((D->textual) && (E->value)) Str::dispose_of(E->value);
        E->value = NULL;

This code is used in §7.3.

§8. Accessing entries. Eventually we're going to want the value. In principle we could be storing values which are arbitrary pointers, so we have to use void pointers:

    void *Dictionaries::read_value(dictionary *D, text_stream *key) {
        if (D == NULL) return NULL;
        if (D->textual) internal_error("textual dictionary accessed as pointy");
        dict_entry *E = Dictionaries::find(D, key);
        if (E == NULL) internal_error("read null dictionary entry");
        if (E->vacant) internal_error("read vacant dictionary entry");
        return E->value;
    }
    void *Dictionaries::read_value_literal(dictionary *D, wchar_t *key) {
        if (D == NULL) return NULL;
        if (D->textual) internal_error("textual dictionary accessed as pointy");
        dict_entry *E = Dictionaries::find_literal(D, key);
        if (E == NULL) internal_error("read null dictionary entry");
        if (E->vacant) internal_error("read vacant dictionary entry");
        return E->value;
    }

    void Dictionaries::write_value(dictionary *D, text_stream *key, void *val) {
        if (D == NULL) internal_error("wrote to null dictionary");
        if (D->textual) internal_error("textual dictionary accessed as pointy");
        dict_entry *E = Dictionaries::find(D, key);
        if (E == NULL) internal_error("wrote null dictionary entry");
        if (E->vacant) internal_error("wrote vacant dictionary entry");
        E->value = val;
    }
    void Dictionaries::write_value_literal(dictionary *D, wchar_t *key, void *val) {
        if (D == NULL) internal_error("wrote to null dictionary");
        if (D->textual) internal_error("textual dictionary accessed as pointy");
        dict_entry *E = Dictionaries::find_literal(D, key);
        if (E == NULL) internal_error("wrote null dictionary entry");
        if (E->vacant) internal_error("wrote vacant dictionary entry");
        E->value = val;
    }

The function Dictionaries::read_value is used in 3/cla (§13).

The function Dictionaries::read_value_literal appears nowhere else.

The function Dictionaries::write_value is used in 3/cla (§5).

The function Dictionaries::write_value_literal appears nowhere else.

§9. But the commonest use case is that the dictionary stores texts as values, so we provide convenient wrappers with the correct C types.

    text_stream *Dictionaries::create_text(dictionary *D, text_stream *key) {
        if (D == NULL) internal_error("wrote to null dictionary");
        if (D->textual == FALSE) internal_error("pointy dictionary accessed as textual");
        dict_entry *E = Dictionaries::create(D, key);
        return (text_stream *) E->value;
    }
    text_stream *Dictionaries::create_text_literal(dictionary *D, wchar_t *lit) {
        if (D == NULL) internal_error("wrote to null dictionary");
        if (D->textual == FALSE) internal_error("pointy dictionary accessed as textual");
        dict_entry *E = Dictionaries::create_literal(D, lit);
        return (text_stream *) E->value;
    }

The function Dictionaries::create_text appears nowhere else.

The function Dictionaries::create_text_literal appears nowhere else.

§10. We only need a read operation, because the caller can write to the dictionary entry by reading the text pointer and then using WRITE_TO.

    text_stream *Dictionaries::get_text(dictionary *D, text_stream *key) {
        if (D == NULL) return NULL;
        if (D->textual == FALSE) internal_error("pointy dictionary accessed as textual");
        dict_entry *E = Dictionaries::find(D, key);
        if (E == NULL) return NULL;
        return (text_stream *) E->value;
    }

    text_stream *Dictionaries::get_text_literal(dictionary *D, wchar_t *lit) {
        if (D == NULL) return NULL;
        if (D->textual == FALSE) internal_error("pointy dictionary accessed as textual");
        dict_entry *E = Dictionaries::find_literal(D, lit);
        if (E == NULL) return NULL;
        return (text_stream *) E->value;
    }

The function Dictionaries::get_text appears nowhere else.

The function Dictionaries::get_text_literal is used in 4/sm (§26.1).

§11. Disposal. If a dictionary was only needed temporarily then we should dispose of it and free the memory when done:

    void Dictionaries::dispose_of(dictionary *D) {
        if (D->textual)
            for (int i=0; i<D->hash_table_size; i++)
                for (dict_entry *E = &(D->hash_table[i]); E; E = E->next_in_entry)
                    if (E->vacant == FALSE)
                        Str::dispose_of(E->value);
        Memory::I7_free(D->hash_table, DICTIONARY_MREASON, D->hash_table_size*((int) sizeof(dict_entry)));
        D->hash_table = NULL;
    }

The function Dictionaries::dispose_of appears nowhere else.