練習46：三叉搜索樹

原文：Exercise 46: Ternary Search Tree

譯者：飛龍

我打算向你介紹的最后一種數(shù)據(jù)結(jié)構就是三叉搜索樹（TSTree），它和BSTree很像，除了它有三個分支，low、equal和high。它的用法和BStree以及Hashmap基本相同，用于儲存鍵值對的數(shù)據(jù)，但是它通過鍵中的獨立字符來控制。這使得TSTree具有一些BStree和Hashmap不具備的功能。

TSTree的工作方式是，每個鍵都是字符串，根據(jù)字符串中字符的等性，通過構建或者遍歷一棵樹來進行插入。首先由根節(jié)點開始，觀察每個節(jié)點的字符，如果小于、等于或大于則去往相應的方向。你可以參考這個頭文件：

#ifndef _lcthw_TSTree_h
#define _lctwh_TSTree_h

#include <stdlib.h>
#include <lcthw/darray.h>

typedef struct TSTree {
    char splitchar;
    struct TSTree *low;
    struct TSTree *equal;
    struct TSTree *high;
    void *value;
} TSTree;

void *TSTree_search(TSTree *root, const char *key, size_t len);

void *TSTree_search_prefix(TSTree *root, const char *key, size_t len);

typedef void (*TSTree_traverse_cb)(void *value, void *data);

TSTree *TSTree_insert(TSTree *node, const char *key, size_t len, void *value);

void TSTree_traverse(TSTree *node, TSTree_traverse_cb cb, void *data);

void TSTree_destroy(TSTree *root);

#endif

TSTree擁有下列成員：

splitchar

樹中該節(jié)點的字符。

low

小于splitchar的分支。

equal

等于splitchar的分支。

high

大于splitchar的分支。

value

這個節(jié)點上符合當前splitchar的值的集合。

你可以看到這個實現(xiàn)中含有下列操作：

為特定key尋找值的典型操作。

search_prefix

尋找第一個以key為前綴的值，這是你不能輕易使用BSTree 或 Hashmap 完成的操作。

insert

將key根據(jù)每個字符拆分，并把它插入到樹中。

traverse

遍歷整顆樹，使你能夠收集或分析所包含的所有鍵和值。

唯一缺少的操作就是TSTree_delete，這是因為它是一個開銷很大的操作，比BSTree_delete大得多。當我使用TSTree結(jié)構時，我將它們視為常量數(shù)據(jù)，我打算遍歷許多次，但是永遠不會移除任何東西。它們對于這樣的操作會很快，但是不適于需要快速插入或刪除的情況。為此我會使用Hashmap因為它由于BSTree和TSTree。

TSTree的實現(xiàn)非常簡單，但是第一次可能難以理解。我會在你讀完之后拆分它。

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <lcthw/dbg.h>
#include <lcthw/tstree.h>

static inline TSTree *TSTree_insert_base(TSTree *root, TSTree *node,
        const char *key, size_t len, void *value)
{
    if(node == NULL) {
        node = (TSTree *) calloc(1, sizeof(TSTree));

        if(root == NULL) {
            root = node;
        }

        node->splitchar = *key;
    }

    if(*key < node->splitchar) {
        node->low = TSTree_insert_base(root, node->low, key, len, value);
    } else if(*key == node->splitchar) {
        if(len > 1) {
            node->equal = TSTree_insert_base(root, node->equal, key+1, len - 1, value);
        } else {
            assert(node->value == NULL && "Duplicate insert into tst.");
            node->value = value;
        }
    } else {
        node->high = TSTree_insert_base(root, node->high, key, len, value);
    }

    return node;
}

TSTree *TSTree_insert(TSTree *node, const char *key, size_t len, void *value)
{
    return TSTree_insert_base(node, node, key, len, value);
}

void *TSTree_search(TSTree *root, const char *key, size_t len)
{
    TSTree *node = root;
    size_t i = 0;

    while(i < len && node) {
        if(key[i] < node->splitchar) {
            node = node->low;
        } else if(key[i] == node->splitchar) {
            i++;
            if(i < len) node = node->equal;
        } else {
            node = node->high;
        }
    }

    if(node) {
        return node->value;
    } else {
        return NULL;
    }
}

void *TSTree_search_prefix(TSTree *root, const char *key, size_t len)
{
    if(len == 0) return NULL;

    TSTree *node = root;
    TSTree *last = NULL;
    size_t i = 0;

    while(i < len && node) {
        if(key[i] < node->splitchar) {
            node = node->low;
        } else if(key[i] == node->splitchar) {
            i++;
            if(i < len) {
                if(node->value) last = node;
                node = node->equal;
            }
        } else {
            node = node->high;
        }
    }

    node = node ? node : last;

    // traverse until we find the first value in the equal chain
    // this is then the first node with this prefix
    while(node && !node->value) {
        node = node->equal;
    }

    return node ? node->value : NULL;
}

void TSTree_traverse(TSTree *node, TSTree_traverse_cb cb, void *data)
{
    if(!node) return;

    if(node->low) TSTree_traverse(node->low, cb, data);

    if(node->equal) {
        TSTree_traverse(node->equal, cb, data);
    }

    if(node->high) TSTree_traverse(node->high, cb, data);

    if(node->value) cb(node->value, data);
}

void TSTree_destroy(TSTree *node)
{
    if(node == NULL) return;

    if(node->low) TSTree_destroy(node->low);

    if(node->equal) {
        TSTree_destroy(node->equal);
    }

    if(node->high) TSTree_destroy(node->high);

    free(node);
}

對于TSTree_insert，我使用了相同模式的遞歸結(jié)構，其中我創(chuàng)建了一個小型函數(shù)，它調(diào)用真正的遞歸函數(shù)。我對此并不做任何檢查，但是你應該為之添加通常的防御性編程策略。要記住的一件事，就是它使用了一些不同的設計，這里并沒有單獨的TSTree_create函數(shù)，如果你將node傳入為NULL，它會新建一個，然后返回最終的值。

這意味著我需要為你分解TSTree_insert_base，使你理解插入操作。

tstree.c:10-18

像我提到的那樣，如果函數(shù)接收到NULL，我需要創(chuàng)建節(jié)點，并且將*key（當前字符）賦值給它。這用于當我插入鍵時來構建樹。

tstree.c:20-21

當*key小于splitchar時，選擇low分支。

tstree.c:22

如果splitchar相等，我就要進一步確定等性。這會在我剛剛創(chuàng)建這個節(jié)點時發(fā)生，所以這里我會構建這棵樹。

tstree.c:23-24

仍然有字符串需要處理，所以向下遞歸equal分支，并且移動到下一個*key字符。

tstree.c:26-27

這是最后一個字符的情況，所以我將值設置好。我編寫了一個assert來避免重復。

tstree.c:29-30

最后的情況是*key大于splitchar，所以我需要向下遞歸high分支。

這個數(shù)據(jù)結(jié)構的key實際上帶有一些特性，我只會在splitchar相等時遞增所要分析的字符。其它兩種情況我只會繼續(xù)遍歷整個樹，直到碰到了相等的字符，我才會遞歸處理下一個字符。這一操作使它對于找不到鍵的情況是非?？斓摹Ｎ铱梢詡魅胍粋€不存在的鍵，簡單地遍歷一些high和low節(jié)點，直到我碰到了末尾并且知道這個鍵不存在。我并不需要處理鍵的每個字符，或者樹的每個節(jié)點。

一旦你理解了這些，之后來分析TSTree_search如何工作：

tstree.c:46

我并不需要遞歸處理整棵樹，只需要使用使用while循環(huán)和當前的node節(jié)點。

tstree.c:47-48

如果當前字符小于節(jié)點中的splitchar，則選擇low分支。

tstree.c:49-51

如果相等，自增i并且選擇equal分支，只要不是最后一個字符。這就是if(i < len)所做的，使我不會越過最后的value。

tstree.c:52-53

否則我會選擇high分支，由于當前字符更大。

tstree.c:57-61

循環(huán)結(jié)束后如果node不為空，那么返回它的value，否則返回NULL。

這并不難以理解，并且你可以看到TSTree_search_prefix函數(shù)用了幾乎相同的算法。唯一的不同就是我并不試著尋找精確的匹配，而是可找到的最長前綴。我在相等時跟蹤last節(jié)點來實現(xiàn)它，并且在搜索循環(huán)結(jié)束之后，遍歷這個節(jié)點直到發(fā)現(xiàn)value。

觀察TSTree_search_prefix，你就會開始明白TSTree相對BSTree 和 Hashmap在查找操作上的另一個優(yōu)點。給定一個長度為X的鍵，你可以在X步內(nèi)找到任何鍵，但是也可以在X步加上額外的N步內(nèi)找到第一個前綴，取決于匹配的鍵有多長。如果樹中最長的鍵是十個字符，那么你就可以在10步之內(nèi)找到任意的前綴。更重要的是，你可以通過對鍵的每個字符只比較一次來實現(xiàn)。

相比之下，使用BSTree執(zhí)行相同操作，你需要在BSTree的每一個可能匹配的節(jié)點中檢查兩個字符串是否有共同的前綴。這對于尋找鍵，或者檢查鍵是否存在（TSTree_search）是相同的。你需要將每個字符與BSTree中的大多數(shù)字符對比，來確認是否匹配。

Hashamp對于尋找前綴更加糟糕，因為你不能夠僅僅計算前綴的哈希值。你基本上不能高效在Hashmap中實現(xiàn)它，除非數(shù)據(jù)類似URL可以被解析。即使這樣你還是需要遍歷Hashmap的所有節(jié)點。

譯者注：二叉樹和三叉樹在搜索時都是走其中的一支，但由于二叉樹中每個節(jié)點儲存字符串，而三叉樹儲存的是字符。所以三叉樹的整個搜索過程相當于一次字符串比較，而二叉樹的每個節(jié)點都需要一次字符串比較。三叉樹堆疊儲存字符串使搜索起來更方便。

至于哈希表，由于字符串整體和前綴計算出來的哈希值差別很大，所以按前綴搜索時，哈希的優(yōu)勢完全失效，所以只能改為暴力搜索，效果比二叉樹還要差。

最后的兩個函數(shù)應該易于分析，因為它們是典型的遍歷和銷毀操作，你已經(jīng)在其它數(shù)據(jù)結(jié)構中看到過了。

最后，我編寫了簡單的單元測試，來確保我所做的全部東西正確。

#include "minunit.h"
#include <lcthw/tstree.h>
#include <string.h>
#include <assert.h>
#include <lcthw/bstrlib.h>


TSTree *node = NULL;
char *valueA = "VALUEA";
char *valueB = "VALUEB";
char *value2 = "VALUE2";
char *value4 = "VALUE4";
char *reverse = "VALUER";
int traverse_count = 0;

struct tagbstring test1 = bsStatic("TEST");
struct tagbstring test2 = bsStatic("TEST2");
struct tagbstring test3 = bsStatic("TSET");
struct tagbstring test4 = bsStatic("T");

char *test_insert()
{
    node = TSTree_insert(node, bdata(&test1), blength(&test1), valueA);
    mu_assert(node != NULL, "Failed to insert into tst.");

    node = TSTree_insert(node, bdata(&test2), blength(&test2), value2);
    mu_assert(node != NULL, "Failed to insert into tst with second name.");

    node = TSTree_insert(node, bdata(&test3), blength(&test3), reverse);
    mu_assert(node != NULL, "Failed to insert into tst with reverse name.");

    node = TSTree_insert(node, bdata(&test4), blength(&test4), value4);
    mu_assert(node != NULL, "Failed to insert into tst with second name.");

    return NULL;
}

char *test_search_exact()
{
    // tst returns the last one inserted
    void *res = TSTree_search(node, bdata(&test1), blength(&test1));
    mu_assert(res == valueA, "Got the wrong value back, should get A not B.");

    // tst does not find if not exact
    res = TSTree_search(node, "TESTNO", strlen("TESTNO"));
    mu_assert(res == NULL, "Should not find anything.");

    return NULL;
}

char *test_search_prefix()
{
    void *res = TSTree_search_prefix(node, bdata(&test1), blength(&test1));
    debug("result: %p, expected: %p", res, valueA);
    mu_assert(res == valueA, "Got wrong valueA by prefix.");

    res = TSTree_search_prefix(node, bdata(&test1), 1);
    debug("result: %p, expected: %p", res, valueA);
    mu_assert(res == value4, "Got wrong value4 for prefix of 1.");

    res = TSTree_search_prefix(node, "TE", strlen("TE"));
    mu_assert(res != NULL, "Should find for short prefix.");

    res = TSTree_search_prefix(node, "TE--", strlen("TE--"));
    mu_assert(res != NULL, "Should find for partial prefix.");


    return NULL;
}

void TSTree_traverse_test_cb(void *value, void *data)
{
    assert(value != NULL && "Should not get NULL value.");
    assert(data == valueA && "Expecting valueA as the data.");
    traverse_count++;
}

char *test_traverse()
{
    traverse_count = 0;
    TSTree_traverse(node, TSTree_traverse_test_cb, valueA);
    debug("traverse count is: %d", traverse_count);
    mu_assert(traverse_count == 4, "Didn't find 4 keys.");

    return NULL;
}

char *test_destroy()
{
    TSTree_destroy(node);

    return NULL;
}

char * all_tests() {
    mu_suite_start();

    mu_run_test(test_insert);
    mu_run_test(test_search_exact);
    mu_run_test(test_search_prefix);
    mu_run_test(test_traverse);
    mu_run_test(test_destroy);

    return NULL;
}

RUN_TESTS(all_tests);

優(yōu)點和缺點

TSTree可以用于實現(xiàn)一些其它實用的事情：

除了尋找前綴，你可以反轉(zhuǎn)插入的所有鍵，之后通過后綴來尋找。我使用它來尋找主機名稱，因為我想要找到*.learncodethehardway.com，所以如果我反向來尋找，會更快匹配到它們。
你可以執(zhí)行“模糊”搜索，其中你可以收集所有與鍵的大多數(shù)字符相似的節(jié)點，或者使用其它算法用于搜索近似的匹配。
你可以尋找所有中間帶有特定部分的鍵。

我已經(jīng)談論了TSTree能做的一些事情，但是它們并不總是最好的數(shù)據(jù)結(jié)構。TSTree的缺點在于：

像我提到過的那樣，刪除操作非常麻煩。它們適用于需要快速檢索并且從不移除的操作。如果你需要刪除，可以簡單地將value置空，之后當樹過大時周期性重構它。
與BSTree和Hashmap相比，它在相同的鍵上使用了大量的空間。它對于鍵中的每個字符都使用了完整的節(jié)點。它對于短的鍵效果更好，但如果你在TSTree中放入一大堆東西，它會變得很大。
它們也不適合處理非常長的鍵，然而“長”是主觀的詞，所以應當像通常一樣先進行測試。如果你嘗試儲存一萬個字符的鍵，那么應當使用Hashmap。

如何改進

像通常一樣，瀏覽代碼，使用防御性的先決條件、斷言，并且檢查每個函數(shù)來改進。下面是一些其他的改進方案，但是你并不需要全部實現(xiàn)它們：

你可以使用DArray來允許重復的value值。
因為我提到刪除非常困難，但是你可以通過將值設為NULL來模擬，使值能夠高效被刪除。
目前還不能獲取到所有匹配指定前綴的值，我會讓你在附加題中實現(xiàn)它。
有一些其他得更復雜的算法會比它要好。查詢前綴數(shù)組、前綴樹和基數(shù)樹的資料。

附加題

實現(xiàn)TSTree_collect返回DArray包含所有匹配指定前綴的鍵。
實現(xiàn)TSTree_search_suffix和TSTree_insert_suffix，實現(xiàn)后綴搜索和插入。
使用valgrind來查看與BSTree 和 Hashmap相比，這個結(jié)構使用了多少內(nèi)存來儲存數(shù)據(jù)。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

笨辦法學C 練習46：三叉搜索樹

笨辦法學C 練習46：三叉搜索樹

練習46：三叉搜索樹

優(yōu)點和缺點

如何改進

附加題

相關閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

笨辦法學C 練習46：三叉搜索樹

練習46：三叉搜索樹

優(yōu)點和缺點

如何改進

附加題

相關閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av