in Tutorial & Interview questions by Goeduhub's Expert (8.3k points)
In C, a string is not an intrinsic type. By convention, a C-string is a one-dimensional array of characters terminated by a null character, '\0'.

This means that a C-string with a content of "abc" will have four characters 'a', 'b', 'c' and '\0'.


1 Answer


The function strtok breaks a string into smaller strings, or tokens, using a set of delimiters.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int toknum = 0;
    char src[] = "Hello,, world!";
    const char delimiters[] = ", !";
    char *token = strtok(src, delimiters);

    while (token != NULL)
    {
        printf("%d: [%s]\n", ++toknum, token);
        token = strtok(NULL, delimiters);
    }
    /* source is now "Hello\0, world\0\0" */
    return 0;
}

Output:

1: [Hello]
2: [world]

The string of delimiters may contain one or more delimiters and different delimiter strings may be used with each call to strtok.

Calls to strtok to continue tokenizing the same source string should not pass the source string again, but instead pass NULL as the first argument. If the same source string is passed then the first token will instead be re-tokenized. That is, given the same delimiters, strtok would simply return the first token again.

Note that as strtok does not allocate new memory for the tokens, it modifies the source string. That is, in the above example, the string src will be manipulated to produce the tokens that are referenced by the pointer returned by the calls to strtok. This means that the source string cannot be const (so it can't be a string literal). It also means that the identity of the delimiting byte is lost (i.e. in the example the "," and "!" are effectively deleted from the source string and you cannot tell which delimiter character matched).

Note also that multiple consecutive delimiters in the source string are treated as one; in the example, the second comma and the space after it are skipped, so no empty token is produced.

strtok is neither thread-safe nor re-entrant because it keeps its parsing position in static storage between calls. This means that if a function calls strtok, no function that it calls while it is using strtok can also use strtok, and it cannot be called by any function that is itself using strtok.

An example that demonstrates the problems caused by the fact that strtok is not re-entrant is as follows:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    char *first = strtok(src, ",");

    do
    {
        char *part;
        /* Nested calls to strtok do not work as desired */
        printf("[%s]\n", first);
        part = strtok(first, ".");
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok(NULL, ".");
        }
    } while ((first = strtok(NULL, ",")) != NULL);

    return 0;
}

Output:

[1.2]
 [1]
 [2]

The expected operation is that the outer do while loop should create three tokens consisting of each decimal number string ("1.2", "3.5", "4.2"), for each of which the strtok calls for the inner loop should split it into separate digit strings ("1", "2", "3", "5", "4", "2").

However, because strtok is not re-entrant, this does not occur. Instead the first strtok correctly creates the "1.2\0" token, and the inner loop correctly creates the tokens "1" and "2". But then the strtok in the outer loop is at the end of the string used by the inner loop, and returns NULL immediately. The second and third substrings of the src array are not analyzed at all.

Version < C11

Before C11, the standard C library does not contain a thread-safe or re-entrant version, but some other libraries do, such as POSIX's strtok_r. Note that MSVC also provides a thread-safe function named strtok_s; it takes the same three arguments as strtok_r.

Version ≥ C11

C11 has an optional part, Annex K, that offers a thread-safe and re-entrant version named strtok_s. You can test for the feature with __STDC_LIB_EXT1__. This optional part is not widely supported.

The strtok_s function differs from the POSIX strtok_r function by taking an additional argument, a pointer to an rsize_t that holds the number of characters remaining in the string being tokenized. It uses this to guard against storing outside of that string, and it also checks runtime constraints. On correctly written programs, though, strtok_s and strtok_r behave the same.

Using strtok_s with the example now yields the correct response, like so:

/* you have to announce that you want to use Annex K */
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <string.h>

#ifndef __STDC_LIB_EXT1__
# error "we need strtok_s from Annex K"
#endif

int main(void)
{
    char src[] = "1.2,3.5,4.2";
    rsize_t srcmax = sizeof src;    /* characters left to scan in src */
    char *next = NULL;
    char *first = strtok_s(src, &srcmax, ",", &next);

    do
    {
        char *part;
        char *posn;
        rsize_t firstmax = strlen(first) + 1;   /* size of this token, terminator included */

        printf("[%s]\n", first);
        part = strtok_s(first, &firstmax, ".", &posn);
        while (part != NULL)
        {
            printf(" [%s]\n", part);
            part = strtok_s(NULL, &firstmax, ".", &posn);
        }
    } while ((first = strtok_s(NULL, &srcmax, ",", &next)) != NULL);

    return 0;
}

And the output will be:

[1.2]
 [1]
 [2]
[3.5]
 [3]
 [5]
[4.2]
 [4]
 [2]
