What Is a String in C and Why Is It Infamous?
A string is a sequence of characters that forms words and sentences that humans can read and understand. I call it infamous because, unlike other programming languages such as C++ or Python, C has no native string data type. This makes working with strings in C much trickier, and careless handling can cause a lot of harm.
In the C programming language, we use double quotes ("") to indicate a string and single quotes ('') to indicate a single character. For example:
char onechar = 'a';                      /* this is a single character */
char * onestring = "this is a string";   /* this is a string */
Declare a C String
As there is no native string data type in C, we must declare a string as an array of char or a char pointer and include a terminating null character ('\0') to mark the end of the string. Depending on how you declare this “string”, it can be either mutable or immutable. So, be careful here.
Declare Mutable Strings in C with Fixed Size
Below is an example of declaring a mutable array of size 8 + 1; the 1 extra byte stores the terminating null character that marks the end of the string. The maximum allowable length of this string is 8 characters. It is mutable because “myString” is an array allocated on the stack, so the program owns its memory and can write to it.
char myString[8+1] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', '\0'}; /* this is a mutable array with fixed size */
Declare Mutable Strings in C with dynamic Size
Similar to the above, except that we do not specify the size in the declaration. The compiler works out the size from the initializer. In the example below, the total size of “myString” is 9 + 1 bytes. Note that the size is still fixed at compile time by the initializer; it cannot change at run time. It is also mutable because “myString” is an array allocated on the stack.
char myString[] = "my string"; /* this is a mutable array with dynamic size */
Declare Mutable Strings in C with dynamic Size – 2
The example above lets the compiler determine the size of the string, but only at compile time. This is not useful when the length of a string is known only at run time. In this case, we have to declare the string as a char * and use dynamic memory allocation. The example below first allocates (9 + 1) bytes of memory from the heap, sets them all to zero, and uses strcpy to give it a string value. The memset call already zeroes the last byte, and strcpy also copies the terminating null character, so the string is properly terminated. This string is mutable because it is dynamically allocated on the heap. Remember to free() the memory once the string is no longer needed.
char * myString = NULL;

myString = (char *) malloc (9 + 1);
if (myString)
{
    memset(myString, 0, (9+1));
    strcpy(myString, "my string");
}
Declare Read Only Strings in C
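A string literal written in double quotes ("") is typically stored in read-only memory, so a char * that points to one is read only with or without the const keyword. Attempting to modify it is undefined behavior and often crashes the program, so prefer the const form, which lets the compiler catch accidental writes. Please note that when we declare and initialize string values using double quotes, the terminating null character is automatically included by the compiler, so you do not have to specify it in any way.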
char * myString = "my string";           /* this is a read only string */
const char * myString2 = "my string2";   /* this is also a read only string */
Print the String?
A string can be printed to the terminal with the printf() function using the %s format specifier:
printf("my string = %s\n", myString);
Can A String be Declared as Unsigned Char?
Yes, you can, but it is not encouraged because it may cause undesirable interpretation issues in some cases. The standard string functions such as strlen and strcpy expect char *, so an unsigned char string needs a cast at every call. When dealing with strings in C that represent text, it’s also important to consider the character encoding used. If the string is encoded using a multi-byte character encoding (such as UTF-8), treating it as an array of unsigned char may not be sufficient for proper string manipulation and processing.
Unsigned char is more suitable to represent a byte array, aka “binary data”. An unsigned char has a value range of 0 to 255, which exactly matches the values of 1 byte. For this reason, unsigned char is a better fit for binary data than for strings.
Get the Size Right – sizeof vs strlen
One of the most common mistakes in C is misusing the utilities that calculate size, such as sizeof and strlen. In a lower-level language such as C, knowing the size of something is super important. A lot of standard C functions, such as memcpy, require you to provide a size value. You also need to know the size to use as loop bounds, and so on.
- sizeof() is an operator that returns the size in bytes (number of memory bytes occupied) of its operand. It is commonly used to determine the size of variables, types, or expressions at compile time.
- strlen() is a function that returns the length of a null-terminated string, not counting the terminating null character.
Use sizeof() for general purposes. Use strlen() only on strings.
Do not rely on sizeof() to return the number of elements in an array or the length of a string. It returns a size in bytes, and for a pointer it returns the size of the pointer itself. Consider the example below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void size_example(void)
{
    char myString[16+1] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h','\0'};
    char * myString2 = "abcdefghabcdefgh";
    char * myString3 = (char *) malloc (16+1);
    char myIntArray[10] = {1, 2, 3, 4, 5, 0, 7, 8, 9, 10};

    if (myString3 == NULL)
        return;
    strcpy(myString3, "abcdefghabcdefgh");

    /* sizeof and strlen both yield size_t, which is printed with %zu */
    printf("sizeof(myString) = %zu, strlen(myString) = %zu\n", sizeof(myString), strlen(myString));
    printf("sizeof(myString2) = %zu, strlen(myString2) = %zu\n", sizeof(myString2), strlen(myString2));
    printf("sizeof(myString3) = %zu, strlen(myString3) = %zu\n", sizeof(myString3), strlen(myString3));
    printf("sizeof(myIntArray) = %zu, strlen(myIntArray) = %zu\n", sizeof(myIntArray), strlen(myIntArray));

    free(myString3);
}
which would produce this output:
sizeof(myString) = 17, strlen(myString) = 16
sizeof(myString2) = 8, strlen(myString2) = 16
sizeof(myString3) = 8, strlen(myString3) = 16
sizeof(myIntArray) = 10, strlen(myIntArray) = 5
But why?
- “myString” should be straightforward. The array is declared with 17 bytes, so sizeof() returns 17. strlen() counts the same characters without the terminating null character, so it returns 16.
- “myString2” is a pointer, which occupies 8 bytes on a typical 64-bit system to represent a memory address, so sizeof() returns 8. This pointer points to a read-only string of length 16, so strlen() returns 16.
- “myString3” is also a pointer, so sizeof() returns 8. It is allocated on the heap and assigned a string value of length 16, so strlen() also returns 16.
- “myIntArray” is a little tricky. Despite its name, it is declared as an array of char. A char occupies 1 byte in memory and there are 10 of them, so sizeof() returns 10. strlen() returns 5 because it stops counting when it encounters the 0 at index 5 (a 0 byte is the terminating null character).
Copy Data – memcpy vs strcpy
Here’s the rule of thumb. Use strcpy to copy a string to another string and use memcpy to copy anything else. Here’s their syntax:
#include <string.h>
void *memcpy(void *dest, const void *src, size_t n);
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
void *memset(void *s, int c, size_t n);
memcpy is used to copy a block of memory from a source address to a destination address. It copies n bytes of data from the memory location pointed to by src to the memory location pointed to by dest. It does not consider the contents of the data being copied and treats it purely as a series of bytes.
strcpy is used specifically for copying null-terminated strings. It copies the string pointed to by src (including the null terminator) to the memory location pointed to by dest. It stops copying when it encounters the null terminator. strncpy is the other variant that takes an additional size argument n and copies at most n characters from src. Be aware that if src has n or more characters, strncpy does not write a null terminator, so the caller must add one.
memset fills the first n bytes of the memory area pointed to by s with the constant byte c. memset is commonly used for initializing memory buffers, such as setting an array to all zeros or all ones. It’s important to note that memset operates at the byte level, meaning that the value c is interpreted as a byte and is copied into each byte of the memory area s.
Here’s an example:
#include <stdio.h>
#include <string.h>

typedef struct user
{
    int userid;
    char username[64];
    unsigned int age;
    char occupation[64];
} UserInfo;

void CopyDateExample(void)
{
    /*
     * memcpy and strcpy are 2 of the most commonly used functions to copy data from one pointer
     * location to another pointer location. memcpy is used for general purpose, strcpy is used
     * specifically for strings
     */
    char myString1[16+1] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', '\0'};
    char myString2[16+1];
    UserInfo myuser1 = {0};
    UserInfo myuser2 = {0};

    // more dangerous to use because it could copy more data than myString2[] can hold
    // strcpy(myString2, myString1);

    // much safer because it will not copy more than specified size.
    strncpy(myString2, myString1, sizeof(myString2) - 1);
    // strncpy does not terminate the copy when the source fills the limit, so terminate it explicitly
    myString2[sizeof(myString2) - 1] = '\0';

    // not as safe, because strlen could be longer than myString2[] can hold
    // strncpy(myString2, myString1, strlen(myString1));

    printf("\nmyString2 = %s\n", myString2);

    // set myString2 to all zeros
    memset(myString2, 0, sizeof(myString2));

    memcpy(&myString2[0], &myString1[0], sizeof(myString1)); //includes terminating null character
    memcpy(&myString2[0], &myString1[0], strlen(myString1)); //does not include terminating null character

    // bound each copy by the destination size; the zero-initialized struct keeps the last byte '\0'
    myuser1.age = 35;
    strncpy(myuser1.occupation, "doctor", sizeof(myuser1.occupation) - 1);
    myuser1.userid = 100;
    strncpy(myuser1.username, "caryh", sizeof(myuser1.username) - 1);

    memcpy(&myuser2, &myuser1, sizeof(UserInfo));

    // age is unsigned, so it is printed with %u
    printf("myuser2.age = %u\nmyuser2.occupation = %s\nmyuser2.userid = %d\nmyuser2.username = %s\n",
           myuser2.age, myuser2.occupation, myuser2.userid, myuser2.username);
}
The above example produces this output:
myString2 = abcdefghabcdefgh
myuser2.age = 35
myuser2.occupation = doctor
myuser2.userid = 100
myuser2.username = caryh
Compare Data – memcmp vs strcmp
Here’s a similar rule of thumb: Use strcmp or strncmp to compare 2 strings and use memcmp on everything else. Here are their prototypes:
#include <string.h>
int memcmp(const void *s1, const void *s2, size_t n);
int strcmp(const char *str1, const char *str2);
int strncmp(const char *str1, const char *str2, size_t n);
memcmp compares the first n bytes of the memory areas pointed to by s1 and s2. It treats the memory areas as raw data and compares them byte by byte.
Return Values:
- Returns a negative value if the contents of s1 are lexicographically less than the contents of s2.
- Returns zero if the contents of s1 are equal to the contents of s2.
- Returns a positive value if the contents of s1 are lexicographically greater than the contents of s2.
strcmp compares two null-terminated strings str1 and str2. It compares the characters of the strings lexicographically (based on their ASCII values) until it finds a difference or reaches the end of either string. strncmp does the same thing except that it only compares the first n characters.
Return Value:
- Returns a negative value if str1 is lexicographically less than str2.
- Returns zero if str1 is equal to str2.
- Returns a positive value if str1 is lexicographically greater than str2.
Example:
#include <stdio.h>
#include <string.h>

typedef struct user
{
    int userid;
    char username[64];
    unsigned int age;
    char occupation[64];
} UserInfo;

void compareDataExample(void)
{
    /*
     * strcmp and memcmp are 2 of the most commonly used functions to compare data from one pointer
     * location to another pointer location. memcmp is used for general purpose, strcmp is used
     * specifically for comparing strings
     */
    char myString1[16+1] = {'t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0'};
    char myString2[16+1];
    char * myname1 = "Cary Huang is my name";
    char * myname2 = "Cary Huang";
    int myNum1 = 200;
    int myNum2 = 200;
    int * myNum1ptr = &myNum1;
    int * myNum2ptr = &myNum2;
    UserInfo myuser1 = {0};
    UserInfo myuser2 = {0};

    /* copies all 17 bytes, including the terminating null character */
    strncpy(myString2, myString1, sizeof(myString1));

    printf("\n");
    printf("strcmp(myString1, myString2) returns %d\n", strcmp(myString1, myString2));
    printf("memcmp(myString1, myString2) returns %d\n", memcmp(myString1, myString2, sizeof(myString1)));

    printf("strcmp(myname1, myname2) returns %d\n", strcmp(myname1, myname2));
    printf("strcmp(myname2, myname1) returns %d\n", strcmp(myname2, myname1));

    /* strlen(myname2) is 10 and strlen(myname1) is 21 */
    printf("strncmp(myname1, myname2, 10) returns %d\n", strncmp(myname1, myname2, strlen(myname2)));
    printf("strncmp(myname1, myname2, 21) returns %d\n", strncmp(myname1, myname2, strlen(myname1)));

    if(myNum1 == myNum2)
        printf("myNum1 == myNum2\n");
    else
        printf("myNum1 != myNum2\n");

    if(myNum1ptr == myNum2ptr) //WRONG way to compare: this compares the addresses, not the values
        printf("myNum1ptr == myNum2ptr\n");
    else
        printf("myNum1ptr != myNum2ptr\n");

    if(*myNum1ptr == *myNum2ptr)
        printf("*myNum1ptr == *myNum2ptr\n");
    else
        printf("*myNum1ptr != *myNum2ptr\n");

    printf("memcmp(myNum1ptr, myNum2ptr) returns = %d\n", memcmp(myNum1ptr, myNum2ptr, sizeof(int)));

    myNum1 = 100;
    printf("memcmp(myNum1ptr, myNum2ptr) returns = %d\n", memcmp(myNum1ptr, myNum2ptr, sizeof(int)));

    printf("memcmp(&myuser1, &myuser2) returns = %d\n", memcmp(&myuser1, &myuser2, sizeof(UserInfo)));
}
The above will produce output like the following. Only the sign of a non-zero result is guaranteed by the C standard; the exact values (32, -32, -100) depend on the C library implementation.
strcmp(myString1, myString2) returns 0
memcmp(myString1, myString2) returns 0
strcmp(myname1, myname2) returns 32
strcmp(myname2, myname1) returns -32
strncmp(myname1, myname2, 10) returns 0
strncmp(myname1, myname2, 21) returns 32
myNum1 == myNum2
myNum1ptr != myNum2ptr
*myNum1ptr == *myNum2ptr
memcmp(myNum1ptr, myNum2ptr) returns = 0
memcmp(myNum1ptr, myNum2ptr) returns = -100
memcmp(&myuser1, &myuser2) returns = 0
Format Strings with sprintf
It is quite common for an application to construct a string from a combination of different data types such as int, short, or perhaps another string. sprintf is a handy utility function for formatting such a string. It and its bounded variant snprintf have these prototypes:
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);
sprintf is used to format and write data to a string (str) according to the format specifier string (format). It works similarly to printf but writes the formatted output to the character array specified by str.
Return Value: The number of characters written (excluding the null terminator) or a negative value if an error occurs.
snprintf is similar to sprintf, but it also takes a size parameter, which specifies the maximum number of bytes to be written to the string (str), including the null terminator. This prevents buffer overflow by ensuring that the output is truncated if it exceeds the specified size.
Return Value: The number of characters that would have been written if the buffer were large enough (excluding the null terminator) or a negative value if an error occurs. If the return value is greater than or equal to size, it indicates that the output was truncated.
Example:
#include <stdio.h>
#include <string.h>

typedef struct ipaddress
{
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned char d;
} Ipaddress;

void formatDataExample(void)
{
    /* sprintf() and snprintf() are common functions used to format various data types into a string */
    char myURL[64+1];
    Ipaddress ip = {192, 168, 20, 1};
    unsigned short portnumber = 8080;
    char * productPath = "/products/myproduct1";
    char * filename = "about.html";

    memset(myURL, 0, sizeof(myURL));

    /*
     * I want to construct a https URL address and save the result to myURL char array, that looks like:
     * https://192.168.20.1:8080/products/myproduct1/about.html
     */
    // sprintf(&myURL[0], "https://%u.%u.%u.%u:%u%s/%s", ip.a, ip.b, ip.c, ip.d, portnumber, productPath, filename);
    snprintf(&myURL[0], sizeof(myURL), "https://%u.%u.%u.%u:%u%s/%s",
             ip.a, ip.b, ip.c, ip.d, portnumber, productPath, filename);

    printf("myURL = %s\n", myURL);
}
The above example produces this output:
myURL = https://192.168.20.1:8080/products/myproduct1/about.html
Hi, this is Cary, your friendly tech enthusiast, educator and author. Currently working as a software architect at Highgo Software Canada. I enjoy simplifying complex concepts, diving into coding challenges, and unraveling the mysteries of software. Most importantly, I like sharing and teaching others about all things tech. Find more blogs from me at highgo.ca