Solarian Programmer

My programming ramblings

C99/C11 dynamic array that mimics C++'s std::vector

Posted on January 6, 2017 by Paul

Recently, I’ve worked on a large C library used to store items from a database. The idea was to get all records that matched a particular criteria and store these in a C data structure for further processing. The original author of the library used a linked list to store the records which probably was a sensible idea 20 years ago. However, today’s computers perform better if you use a cache friendly data structure like an array. Basically, I wanted a dynamic array that automatically increases his size if needed, something like C++'s std::vector.

UPDATE 8 January 2017
Part 2 of this article presents a better, more flexible, implementation for a C dynamic array. Feel free to skip the remaining of this article and read the second part.

It goes without saying that this article is written for C programmers or people that use a C compiler. If you are using C++, you usually don’t need to implement a vector type, you already have one in the STL, use it.

Let’s start with a simplified problem - say that we want a C dynamic array that stores integers, something like this:

1 #include <stdio.h>
2 #include <stdlib.h>
3 
4 typedef struct {
5     size_t size;
6     size_t capacity;
7     int *data;
8 } Array;

In the above structure, size represents the number of elements from data and capacity the memory allocated for data. Following the C++ std::vector model, every time size is equal to capacity we will increase the available memory by a factor of twice the number of elements currently stored.

First, we need two helper functions to create/destroy an Array:

 1 Array *array_create(size_t size) {
 2     Array *p = malloc(sizeof(Array));
 3     p->size = size;
 4     p->capacity = size;
 5     if(size) {
 6         p->data = calloc(size, sizeof(int));
 7     } else {
 8         p->data = NULL;
 9     }
10     return p;
11 }
12 
13 void array_free(Array *arr) {
14     free(arr->data);
15     free(arr);
16 }

For simplicity we didn’t check if calloc was successful, don’t take shortcuts in production code!

Using the above functions we can, for example, create two arrays:

 1 int main() {
 2     Array *arr = array_create(0);
 3     Array *arr2 = array_create(100);
 4 
 5     // ...
 6 
 7     array_free(arr);
 8     array_free(arr2);
 9     return 0;
10 }

Please note that arr is initially empty, has zero elements, while arr2 can store 100 elements. We can iterate over the data stored in an Array with:

1     for(size_t i = 0; i < arr->size; ++i) {
2         // process data ...
3     }

What if we want to add some data to an Array ? Again, following the C++ std::vector model, we can implement a push_back function that will insert numbers to the end of our array, resizing the data buffer if necessary:

 1 void array_push_back(Array *arr, int x) {
 2     arr->size++;
 3     if(arr->size >= arr->capacity) {
 4         arr->capacity = 2 * arr->size;
 5         arr->data = realloc(arr->data, arr->capacity * sizeof(int));
 6         arr->data[arr->size - 1] = x;
 7     } else {
 8         arr->data[arr->size - 1] = x;
 9     }
10 }

At this point, we have a working, albeit simplified, C dynamic array that can store an arbitrary number of integers. Here is an example of how we can use Array:

 1 int main() {
 2     Array *arr = array_create(0);
 3     printf("array size = %ld\n", arr->size);
 4     printf("array capacity = %ld\n", arr->capacity);
 5 
 6     // Store some data in the array:
 7     int nr_elements = 0;
 8     printf("How many elements you want to store? ");
 9     scanf("%d", &nr_elements);
10 
11     for(int i = 0; i < nr_elements; ++i) {
12         array_push_back(arr, i);
13     }
14 
15     printf("array size = %ld\n", arr->size);
16     printf("array capacity = %ld\n", arr->capacity);
17 
18     // do something with the data ...
19 
20     array_free(arr);
21     return 0;
22 }

I will leave as an exercise for the reader the implementation of a shrink_to_fit function. Hint, use realloc to resize the data buffer in order to have size equal to capacity.

What if we want to be able to store more than integers in our dynamic array ? A classical C pattern is to use a void pointer, something like:

1 #include <stdio.h>
2 #include <stdlib.h>
3 
4 typedef struct {
5     size_t size;
6     size_t capacity;
7     void *data;
8 } Array;

No change is necessary for the array_free function. However, array_create must be modified to account for the size of the data type we want to store in data, see lines 1 and 6:

 1 Array *array_create(size_t size, size_t sizeof_data) {
 2     Array *p = malloc(sizeof(Array));
 3     p->size = size;
 4     p->capacity = size;
 5     if(size) {
 6         p->data = calloc(size, sizeof_data);
 7     } else {
 8         p->data = NULL;
 9     }
10     return p;
11 }

The array_push_back function also needs to be refactored in order to account for the size and type of what we store in the data buffer. A possible approach is to use a macro, instead of the original function:

 1 #define ARRAY_PUSH_BACK(arr, x, data_type) \
 2     do {\
 3         arr->size++;\
 4         if(arr->size >= arr->capacity) {\
 5             arr->capacity = 2 * arr->size;\
 6             arr->data = realloc(arr->data, arr->capacity * sizeof(data_type));\
 7             data_type *pp = arr->data;\
 8             pp[arr->size - 1] = (x);\
 9         } else {\
10             data_type *pp = arr->data;\
11             pp[arr->size - 1] = (x);\
12         }\
13     } while(0)

Assuming that we have all of the Array functions and macros defined in a file named Array.h, let’s see an example of how we can store arbitrary types in an Array.

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include "Array.h"
 4 
 5 typedef struct {
 6     int x;
 7     int y;
 8 } Vector2i;
 9 
10 int main() {
11     // 3 arrays of different types
12     Array *arr = array_create(0, sizeof(int));
13     Array *arr2 = array_create(0, sizeof(double));
14     Array *arr3 = array_create(0, sizeof(Vector2i));
15 
16     int nr_elements = 0;
17     printf("How many elements you want to store: ");
18     scanf("%d", &nr_elements);
19 
20     // Fill the arrays with some numbers
21     for(int i = 0; i < nr_elements; ++i) {
22         ARRAY_PUSH_BACK(arr, i, int);
23         ARRAY_PUSH_BACK(arr2, i*i, double);
24         ARRAY_PUSH_BACK(arr3, ((Vector2i){i, 2*i}), Vector2i);
25     }
26 
27     // Here is an example of how you can modify the data from one of the arrays:
28     ((Vector2i *)arr3->data)[0].x = 333;
29     ((Vector2i *)arr3->data)[0].y = 888;
30 
31     // Here is an example of how to print the data to stdout for the array of Vector2i:
32     for(size_t i = 0; i < arr3->size; ++i) {
33         Vector2i *pp = arr3->data;
34         printf("(%d %d) ", pp[i].x, pp[i].y);
35     }
36     printf("\n");
37 
38     // Do further processing ...
39 
40     array_free(arr);
41     array_free(arr2);
42     array_free(arr3);
43     return 0;
44 }

Please note that all the above code was tested with Clang 3.8, GCC 6.3 and Visual Studio 2017. Normally, it should work with any recent C99/C11 compiler, with the notable exception of Visual Studio, for which parts of the C99 standard were added starting with the 2013 version. If you are using Visual Studio 2017 and up, you should be able to use the full C11 Clang toolchain from Visual Studio instead of the CL compiler.

If you want to learn more about C99/C11 I would recommend reading 21st Century C: C Tips from the New School by Ben Klemens:

or the classic C Bible, The C Programming Language by B.W. Kernighan, D.M. Ritchie:


Show Comments