練習(xí)35：排序和搜索

原文：Exercise 35: Sorting And Searching

譯者：飛龍

這個練習(xí)中我打算涉及到四個排序算法和一個搜索算法。排序算法是快速排序、堆排序、歸并排序和基數(shù)排序。之后在你完成基數(shù)排序之后，我打算想你展示二分搜索。

然而，我是一個懶人，大多數(shù)C標(biāo)準(zhǔn)庫都實(shí)現(xiàn)了堆排序、快速排序和歸并排序算法，你可以直接使用它們：

#include <lcthw/darray_algos.h>
#include <stdlib.h>

int DArray_qsort(DArray *array, DArray_compare cmp)
{
    qsort(array->contents, DArray_count(array), sizeof(void *), cmp);
    return 0;
}

int DArray_heapsort(DArray *array, DArray_compare cmp)
{
    return heapsort(array->contents, DArray_count(array), sizeof(void *), cmp);
}

int DArray_mergesort(DArray *array, DArray_compare cmp)
{
    return mergesort(array->contents, DArray_count(array), sizeof(void *), cmp);
}

這就是darray_algos.c文件的整個實(shí)現(xiàn)，它在大多數(shù)現(xiàn)代Unix系統(tǒng)上都能運(yùn)行。它們的每一個都使用DArray_compare對contents中儲存的無類型指針進(jìn)行排序。我也要向你展示這個頭文件：

#ifndef darray_algos_h
#define darray_algos_h

#include <lcthw/darray.h>

typedef int (*DArray_compare)(const void *a, const void *b);

int DArray_qsort(DArray *array, DArray_compare cmp);

int DArray_heapsort(DArray *array, DArray_compare cmp);

int DArray_mergesort(DArray *array, DArray_compare cmp);

#endif

大小幾乎一樣，你也應(yīng)該能預(yù)料到。接下來你可以了解單元測試中這三個函數(shù)如何使用：

#include "minunit.h"
#include <lcthw/darray_algos.h>

int testcmp(char **a, char **b)
{
    return strcmp(*a, *b);
}

DArray *create_words()
{
    DArray *result = DArray_create(0, 5);
    char *words[] = {"asdfasfd", "werwar", "13234", "asdfasfd", "oioj"};
    int i = 0;

    for(i = 0; i < 5; i++) {
        DArray_push(result, words[i]);
    }

    return result;
}

int is_sorted(DArray *array)
{
    int i = 0;

    for(i = 0; i < DArray_count(array) - 1; i++) {
        if(strcmp(DArray_get(array, i), DArray_get(array, i+1)) > 0) {
            return 0;
        }
    }

    return 1;
}

char *run_sort_test(int (*func)(DArray *, DArray_compare), const char *name)
{
    DArray *words = create_words();
    mu_assert(!is_sorted(words), "Words should start not sorted.");

    debug("--- Testing %s sorting algorithm", name);
    int rc = func(words, (DArray_compare)testcmp);
    mu_assert(rc == 0, "sort failed");
    mu_assert(is_sorted(words), "didn't sort it");

    DArray_destroy(words);

    return NULL;
}

char *test_qsort()
{
    return run_sort_test(DArray_qsort, "qsort");
}

char *test_heapsort()
{
    return run_sort_test(DArray_heapsort, "heapsort");
}

char *test_mergesort()
{
    return run_sort_test(DArray_mergesort, "mergesort");
}


char * all_tests()
{
    mu_suite_start();

    mu_run_test(test_qsort);
    mu_run_test(test_heapsort);
    mu_run_test(test_mergesort);

    return NULL;
}

RUN_TESTS(all_tests);

你需要注意的事情是第四行testcmp的定義，它困擾了我一整天。你必須使用char **而不是char *，因?yàn)?code>qsort會向你提供指向content數(shù)組中指針的指針。原因是qsort會打掃數(shù)組，使用你的比較函數(shù)來處理數(shù)組中每個元素的指針。因?yàn)槲以?code>contents中存儲指針，所以你需要使用指針的指針。

有了這些之后，你只需要實(shí)現(xiàn)三個困難的搜索算法，每個大約20行。你應(yīng)該在這里停下來，不過這本書的一部分就是學(xué)習(xí)這些算法的原理，附加題會涉及到實(shí)現(xiàn)這些算法。

基數(shù)排序和二分搜索

既然你打算自己實(shí)現(xiàn)快速排序、堆排序和歸并排序，我打算向你展示一個流行的算法叫做基數(shù)排序。它的實(shí)用性很小，只能用于整數(shù)數(shù)組，并且看上去像魔法一樣。這里我打算常見一個特殊的數(shù)據(jù)結(jié)構(gòu)，叫做RadixMap，用于將一個整數(shù)映射為另一個。

下面是為新算法創(chuàng)建的頭文件，其中也含有數(shù)據(jù)結(jié)構(gòu)：

#ifndef _radixmap_h
#include <stdint.h>

typedef union RMElement {
    uint64_t raw;
    struct {
        uint32_t key;
        uint32_t value;
    } data;
} RMElement;

typedef struct RadixMap {
    size_t max;
    size_t end;
    uint32_t counter;
    RMElement *contents;
    RMElement *temp;
} RadixMap;


RadixMap *RadixMap_create(size_t max);

void RadixMap_destroy(RadixMap *map);

void RadixMap_sort(RadixMap *map);

RMElement *RadixMap_find(RadixMap *map, uint32_t key);

int RadixMap_add(RadixMap *map, uint32_t key, uint32_t value);

int RadixMap_delete(RadixMap *map, RMElement *el);

#endif

你看到了其中有許多和Dynamic Array或List數(shù)據(jù)結(jié)構(gòu)相同的操作，不同就在于我只處理固定32位大小的uint32_t正忽視。我也會想你介紹C語言的一個新概念，叫做union。

C聯(lián)合體

聯(lián)合體是使用不同方式引用內(nèi)存中同一塊區(qū)域的方法。它們的工作方式，就像你把它定義為sturct，然而，每個元素共享同一片內(nèi)存區(qū)域。你可以認(rèn)為，聯(lián)合體是內(nèi)存中的一幅畫，所有顏色不同的元素都重疊在它上面。

它可以用于節(jié)約內(nèi)存，或在不同格式之間轉(zhuǎn)換內(nèi)存塊。它的第一個用途就是實(shí)現(xiàn)“可變類型”，你可以創(chuàng)建一個帶有類型“標(biāo)簽”的結(jié)構(gòu)體，之后在其中創(chuàng)建含有多種類型的聯(lián)合體。用于在內(nèi)存的不同格式之間轉(zhuǎn)換時，只需要定義兩個結(jié)構(gòu)體，訪問正確的那個類型。

首先讓我向你展示如何使用C聯(lián)合體構(gòu)造可變類型：

#include <stdio.h>

typedef enum {
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_STRING,
} VariantType;

struct Variant {
    VariantType type;
    union {
        int as_integer;
        float as_float;
        char *as_string;
    } data;
};

typedef struct Variant Variant;

void Variant_print(Variant *var)
{
    switch(var->type) {
        case TYPE_INT:
           printf("INT: %d\n", var->data.as_integer);
           break;
        case TYPE_FLOAT:
           printf("FLOAT: %f\n", var->data.as_float);
           break;
        case TYPE_STRING:
           printf("STRING: %s\n", var->data.as_string);
           break;
        default:
           printf("UNKNOWN TYPE: %d", var->type);
    }
}

int main(int argc, char *argv[])
{
    Variant a_int = {.type = TYPE_INT, .data.as_integer = 100};
    Variant a_float = {.type = TYPE_FLOAT, .data.as_float = 100.34};
    Variant a_string = {.type = TYPE_STRING, .data.as_string = "YO DUDE!"};

    Variant_print(&a_int);
    Variant_print(&a_float);
    Variant_print(&a_string);

    // here's how you access them
    a_int.data.as_integer = 200;
    a_float.data.as_float = 2.345;
    a_string.data.as_string = "Hi there.";

    Variant_print(&a_int);
    Variant_print(&a_float);
    Variant_print(&a_string);

    return 0;
}

你可以在許多動態(tài)語言實(shí)現(xiàn)中發(fā)現(xiàn)它。對于為語言中所有基本類型，代碼中首先定義了一些帶有變遷的可變類型，之后通常給你所創(chuàng)建的類型打上object標(biāo)簽。這樣的好處就是Variant通常只需要VariantType type標(biāo)簽的空間，加上聯(lián)合體最大成員的空間，因?yàn)镃將Variant.data的每個元素堆起來，它們是重疊的，只保證有足夠的空間放下最大的元素。

radixmap.h文件中我創(chuàng)建了RMElement聯(lián)合體，用于在類型之間轉(zhuǎn)換內(nèi)存塊。這里，我希望存儲uint64_t定長整數(shù)用于排序目錄，但是我也希望使用兩個uint32_t用于表示數(shù)據(jù)的key和value對。通過使用聯(lián)合體我就能夠使用所需的兩種不同方法來訪問內(nèi)存。

實(shí)現(xiàn)

接下來是實(shí)際的RadixMap對于這些操作的實(shí)現(xiàn)：

/*
* Based on code by Andre Reinald then heavily modified by Zed A. Shaw.
*/

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <lcthw/radixmap.h>
#include <lcthw/dbg.h>

RadixMap *RadixMap_create(size_t max)
{
    RadixMap *map = calloc(sizeof(RadixMap), 1);
    check_mem(map);

    map->contents = calloc(sizeof(RMElement), max + 1);
    check_mem(map->contents);

    map->temp = calloc(sizeof(RMElement), max + 1);
    check_mem(map->temp);

    map->max = max;
    map->end = 0;

    return map;
error:
    return NULL;
}

void RadixMap_destroy(RadixMap *map)
{
    if(map) {
        free(map->contents);
        free(map->temp);
        free(map);
    }
}


#define ByteOf(x,y) (((uint8_t *)x)[(y)])

static inline void radix_sort(short offset, uint64_t max, uint64_t *source, uint64_t *dest)
{
    uint64_t count[256] = {0};
    uint64_t *cp = NULL;
    uint64_t *sp = NULL;
    uint64_t *end = NULL;
    uint64_t s = 0;
    uint64_t c = 0;

    // count occurences of every byte value
    for (sp = source, end = source + max; sp < end; sp++) {
        count[ByteOf(sp, offset)]++;
    }

    // transform count into index by summing elements and storing into same array
    for (s = 0, cp = count, end = count + 256; cp < end; cp++) {
        c = *cp;
        *cp = s;
        s += c;
    }

    // fill dest with the right values in the right place
    for (sp = source, end = source + max; sp < end; sp++) {
        cp = count + ByteOf(sp, offset);
        dest[*cp] = *sp;
        ++(*cp);
    }
}

void RadixMap_sort(RadixMap *map)
{
    uint64_t *source = &map->contents[0].raw;
    uint64_t *temp = &map->temp[0].raw;

    radix_sort(0, map->end, source, temp);
    radix_sort(1, map->end, temp, source);
    radix_sort(2, map->end, source, temp);
    radix_sort(3, map->end, temp, source);
}

RMElement *RadixMap_find(RadixMap *map, uint32_t to_find)
{
    int low = 0;
    int high = map->end - 1;
    RMElement *data = map->contents;

    while (low <= high) {
        int middle = low + (high - low)/2;
        uint32_t key = data[middle].data.key;

        if (to_find < key) {
            high = middle - 1;
        } else if (to_find > key) {
            low = middle + 1;
        } else {
            return &data[middle];
        }
    }

    return NULL;
}

int RadixMap_add(RadixMap *map, uint32_t key, uint32_t value)
{
    check(key < UINT32_MAX, "Key can't be equal to UINT32_MAX.");

    RMElement element = {.data = {.key = key, .value = value}};
    check(map->end + 1 < map->max, "RadixMap is full.");

    map->contents[map->end++] = element;

    RadixMap_sort(map);

    return 0;

error:
    return -1;
}

int RadixMap_delete(RadixMap *map, RMElement *el)
{
    check(map->end > 0, "There is nothing to delete.");
    check(el != NULL, "Can't delete a NULL element.");

    el->data.key = UINT32_MAX;

    if(map->end > 1) {
        // don't bother resorting a map of 1 length
        RadixMap_sort(map);
    }

    map->end--;

    return 0;
error:
    return -1;
}

像往常一樣鍵入它并使它通過單元測試，之后我會解釋它。尤其要注意radix_sort函數(shù)，我實(shí)現(xiàn)它的方法非常特別。

#include "minunit.h"
#include <lcthw/radixmap.h>
#include <time.h>

static int make_random(RadixMap *map)
{
    size_t i = 0;

    for (i = 0; i < map->max - 1; i++) {
        uint32_t key = (uint32_t)(rand() | (rand() << 16));
        check(RadixMap_add(map, key, i) == 0, "Failed to add key %u.", key);
    }

    return i;

error:
    return 0;
}

static int check_order(RadixMap *map)
{
    RMElement d1, d2;
    unsigned int i = 0;

    // only signal errors if any (should not be)
    for (i = 0; map->end > 0 && i < map->end-1; i++) {
        d1 = map->contents[i];
        d2 = map->contents[i+1];

        if(d1.data.key > d2.data.key) {
            debug("FAIL:i=%u, key: %u, value: %u, equals max? %d\n", i, d1.data.key, d1.data.value,
                    d2.data.key == UINT32_MAX);
            return 0;
        }
    }

    return 1;
}

static int test_search(RadixMap *map)
{
    unsigned i = 0;
    RMElement *d = NULL;
    RMElement *found = NULL;

    for(i = map->end / 2; i < map->end; i++) {
        d = &map->contents[i];
        found = RadixMap_find(map, d->data.key);
        check(found != NULL, "Didn't find %u at %u.", d->data.key, i);
        check(found->data.key == d->data.key, "Got the wrong result: %p:%u looking for %u at %u",
                found, found->data.key, d->data.key, i);
    }

    return 1;
error:
    return 0;
}

// test for big number of elements
static char *test_operations()
{
    size_t N = 200;

    RadixMap *map = RadixMap_create(N);
    mu_assert(map != NULL, "Failed to make the map.");
    mu_assert(make_random(map), "Didn't make a random fake radix map.");

    RadixMap_sort(map);
    mu_assert(check_order(map), "Failed to properly sort the RadixMap.");

    mu_assert(test_search(map), "Failed the search test.");
    mu_assert(check_order(map), "RadixMap didn't stay sorted after search.");

    while(map->end > 0) {
        RMElement *el = RadixMap_find(map, map->contents[map->end / 2].data.key);
        mu_assert(el != NULL, "Should get a result.");

        size_t old_end = map->end;

        mu_assert(RadixMap_delete(map, el) == 0, "Didn't delete it.");
        mu_assert(old_end - 1 == map->end, "Wrong size after delete.");

        // test that the end is now the old value, but uint32 max so it trails off
        mu_assert(check_order(map), "RadixMap didn't stay sorted after delete.");
    }

    RadixMap_destroy(map);

    return NULL;
}


char *all_tests()
{
    mu_suite_start();
    srand(time(NULL));

    mu_run_test(test_operations);

    return NULL;
}

RUN_TESTS(all_tests);

我不應(yīng)該向你解釋關(guān)于測試的過多東西，它只是模擬將隨機(jī)正是放入RadixMap，確保你可以可靠地將其取出。也不是非常有趣。

在radixmap.c中的大多數(shù)操作都易于理解，如果你閱讀代碼的話。下面是每個基本函數(shù)作用及其工作原理的描述：

RadixMap_create

像往常一樣，我分配了結(jié)構(gòu)體所需的內(nèi)存，結(jié)構(gòu)體在radixmap.h中定義。當(dāng)后面涉及到radix_sort時我會使用temp和contents。

RadixMap_destroy

同樣，銷毀我所創(chuàng)建的東西。

radix_sort

這個數(shù)據(jù)結(jié)構(gòu)的靈魂，我會在下一節(jié)中解釋其作用。

RadixMap_sort

它使用了radix_sort函數(shù)來實(shí)際對contents進(jìn)行排序。

RadixMap_find

使用二分搜索算法來尋找提供的key，我之后會解釋它的原理。

RadixMap_add

使用RadixMap_sort函數(shù)，它會在末尾添加key和value，然后簡單地重新排序使一切元素都有序。一旦排序完，RadixMap_find會正確工作，因?yàn)樗嵌炙阉鳌?/p>

RadixMap_delete

工作方式類似RadixMap_add，除了“刪除”結(jié)構(gòu)中的元素，通過將它們的值設(shè)為無符號的32為整數(shù)的最大值，也就是UINT32_MAX。這意味著你不能使用這個值作為合法的鍵，但是它是元素刪除變得容易。簡單設(shè)置它之后排序，它會被移動到末尾，這就算刪除了。

學(xué)習(xí)我所描述的代碼，接下來還剩RadixMap_sort，radix_sort和RadixMap_find需要了解。

RadixMap_find 和二分搜索

我首先以二分搜索如何實(shí)現(xiàn)開始。二分搜索是一種簡單算法，大多數(shù)人都可以直觀地理解。實(shí)際上，你可以取一疊游戲卡片（或帶有數(shù)字的卡片）來手動操作。下面是該函數(shù)的工作方式，也是二分搜索的原理：

基于數(shù)組大小設(shè)置上界和下界。
獲取上下界之間的中間元素。
如果鍵小于這個元素的值，就一定在它前面，所以上界設(shè)置為中間元素。
如果鍵大于這個元素的值，就一定在它后面，所以下界設(shè)置為中間元素。
繼續(xù)循環(huán)直到上界和下界越過了彼此。如果退出了循環(huán)則沒有找到。

你實(shí)際上所做的事情是，通過挑選中間的值來比較，猜出key可能的位置。由于數(shù)據(jù)是有序的，你知道key一定會在它前面或者后面，這樣就能把搜索區(qū)域分成兩半。之后你繼續(xù)搜索知道找到他，或者越過了邊界并窮盡了搜索空間。

RadixMap_sort 和 radix_sort

如果你事先手動模擬基數(shù)排序，它就很易于理解。這個算法利用了一個現(xiàn)象，數(shù)字都以十進(jìn)制字符的序列來表示，按照“不重要”到“重要”的順序排列。之后它通過十進(jìn)制字符來選取數(shù)字并且將它們儲存在桶中，當(dāng)它處理完所有字符時，數(shù)字就排好序了。一開始它看上去像是魔法，瀏覽代碼也的確如此，但是你要嘗試手動執(zhí)行它。

為了解釋這個算法，需要先寫下一組三位的十進(jìn)制數(shù)，以隨機(jī)的順序，假設(shè)就是223、912、275、100、633、120 和 380。

按照它們的個位，將數(shù)字放入桶中：[380, 100, 120], [912], [633, 223], [275]。
現(xiàn)在遍歷每個桶中的數(shù)字，接著按十位排序：[100], [912], [120, 223], [633], [275], [380]。
現(xiàn)在每個桶都包含了按照個位和十位排序后的數(shù)字。接著我需要按照這個順序遍歷，并把它們放入最后百位的桶中：[100, 120], [223, 275], [380], [633], [912]。
到現(xiàn)在為止，每個數(shù)字都按照百位、十位和個位排序，并且如果我按照順序遍歷每個桶，我會得到最終排序的結(jié)果：100, 120, 223, 275, 380, 633, 912。

確保你多次重復(fù)了這個過程，便于你理解它如何工作。這實(shí)在是一種機(jī)智的算法，并且最重要的是它對于任何大小的數(shù)字都有效。所以你可以用它來排序比較大的數(shù)字，因?yàn)槟阋淮沃皇翘幚硪晃弧?/p>

在我的環(huán)境下，“字符”是獨(dú)立的8位字節(jié)，所以我需要256個桶來儲存這些數(shù)字按照字節(jié)的分布結(jié)果。我需要一種方法來儲存它，并且不需要花費(fèi)太多的空間。如果你查看radix_sort，首先我會構(gòu)建count直方圖，便于我了解對于給定的offset，每個字節(jié)的頻率。

一旦我知道了每一種字節(jié)的數(shù)量（共有256種），我就可以將目標(biāo)數(shù)組用于存儲這些值的分布。比如，如果0x00的數(shù)量為10個，我就可以將它們放在目標(biāo)數(shù)組的前10個位置中。這可以讓我索引到它們在目標(biāo)數(shù)組中的位置，這就是radix_sort中的第二個for循環(huán)。

最后，當(dāng)我知道它們在目標(biāo)數(shù)組中儲存在哪里，我只是遍歷source數(shù)組對于當(dāng)前offset的所有字節(jié)，并且將數(shù)值按順序放入它們的位置中。ByteOf宏的使用有助于保持代碼整潔，因?yàn)樗枰恍┲羔樀暮谀Х?，但是最后?dāng)for循環(huán)結(jié)束之后，所有整數(shù)都會按照它們的字節(jié)放入桶中。

我在RadixMap_sort中對這些64位的整數(shù)按照它們的前32位進(jìn)行排序，這非常有意思。還記得我是如何將鍵和值放入RMElement類型的聯(lián)合體了嗎？這意味著如果要按照鍵來對這個數(shù)組排序，我只需要對每個整數(shù)前4個字節(jié)（32位/8位每字節(jié)）進(jìn)行排序。

如果你觀察RadixMap_sort，你會看到我獲取了contents和temp的便利指針，用于源數(shù)組和目標(biāo)數(shù)組，之后我四次調(diào)用radix_sort。每次調(diào)用我將源數(shù)組和目標(biāo)數(shù)組替換為下一字節(jié)的情況。當(dāng)我完成時，radix_sort就完成了任務(wù)，并且contents中也有了最后的結(jié)果。

如何改進(jìn)

這個實(shí)現(xiàn)有個很大的缺點(diǎn)，就是它遍歷了整個數(shù)組四次。它執(zhí)行地很快，但是如果你通過需要排序的數(shù)值大小來限制排序的總量，會更好一些。

有兩個方法可以用于改進(jìn)這個實(shí)現(xiàn)：

使用二分搜索來尋找新元素的最小位置，只對這個位置到微末之間進(jìn)行排序。你需要找到它，將新元素放到末尾，之后對它們之間進(jìn)行排序。大多數(shù)情況下這會顯著地縮減排序范圍。
跟蹤當(dāng)前所使用的最大的鍵，之后只對足夠的位數(shù)進(jìn)行排序，來處理這個鍵。你也可以跟蹤最小的數(shù)值，之后只對范圍中必要的字節(jié)進(jìn)行排序。為了這樣做，你需要關(guān)心CPU的整數(shù)存儲順序（大小端序）。

附加題

實(shí)現(xiàn)快速排序、堆排序和歸并排序，并且提供一個#define讓其他人在二者（標(biāo)準(zhǔn)庫和你的實(shí)現(xiàn)）當(dāng)中進(jìn)行選擇，或者創(chuàng)建另一套不同名稱的函數(shù)。使用我教給你的技巧，閱讀維基百科的算法頁面，之后參照偽代碼來實(shí)現(xiàn)它。
對比你的實(shí)現(xiàn)和標(biāo)準(zhǔn)庫實(shí)現(xiàn)的性能。
使用這些排序函數(shù)創(chuàng)建DArray_sort_add，它可以向DArray添加元素，但是隨后對數(shù)組排序。
編寫DArray_find，使用RadixMap_find中的二分搜索算法和DArray_compare，來在有序的DArray中尋找元素。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

笨辦法學(xué)C 練習(xí)35：排序和搜索

笨辦法學(xué)C 練習(xí)35：排序和搜索

練習(xí)35：排序和搜索

基數(shù)排序和二分搜索

C聯(lián)合體

實(shí)現(xiàn)

RadixMap_find 和二分搜索

RadixMap_sort 和 radix_sort

如何改進(jìn)

附加題

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

笨辦法學(xué)C 練習(xí)35：排序和搜索

練習(xí)35：排序和搜索

基數(shù)排序和二分搜索

C聯(lián)合體

實(shí)現(xiàn)

RadixMap_find 和二分搜索

RadixMap_sort 和 radix_sort

如何改進(jìn)

附加題

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av