
GRAPHEME_NEXT_CHARACTER_BREAK_UTF8

Section: C Library Functions (3)

suckless.org  

NAME

grapheme_next_character_break_utf8 - determine byte-offset to next grapheme cluster break  

SYNOPSIS

#include <grapheme.h>

size_t
grapheme_next_character_break_utf8(const char *str, size_t len);

DESCRIPTION

The grapheme_next_character_break_utf8() function computes the offset (in bytes) to the next grapheme cluster break (see libgrapheme(7)) in the UTF-8-encoded string str of length len. If a grapheme cluster begins at str, this offset is equal to the length of that grapheme cluster.

If len is set to SIZE_MAX (stdint.h is already included by grapheme.h), the string str is interpreted as NUL-terminated and processing stops when a NUL byte is encountered.

For non-UTF-8 input data, grapheme_is_character_break(3) and grapheme_next_character_break(3) can be used instead.

RETURN VALUES

The grapheme_next_character_break_utf8() function returns the offset (in bytes) to the next grapheme cluster break in str, or 0 if str is NULL.

EXAMPLES

/* cc (-static) -o example example.c -lgrapheme */
#include <grapheme.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        /* UTF-8 encoded input */
        const char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
                        "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
                        "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
                        "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
        size_t ret, len, off;

        printf("Input: \"%s\"\n", s);

        /* print each grapheme cluster with byte-length */
        printf("grapheme clusters in NUL-delimited input:\n");
        for (off = 0; s[off] != '\0'; off += ret) {
                ret = grapheme_next_character_break_utf8(s + off, SIZE_MAX);
                printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
        }
        printf("\n");

        /* do the same, but this time the string is length-delimited */
        len = 17;
        printf("grapheme clusters in input delimited to %zu bytes:\n", len);
        for (off = 0; off < len; off += ret) {
                ret = grapheme_next_character_break_utf8(s + off, len - off);
                printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
        }

        return 0;
}
 

SEE ALSO

grapheme_next_character_break(3), libgrapheme(7)

STANDARDS

grapheme_next_character_break_utf8() is compliant with the Unicode 15.0.0 specification.

AUTHORS

Laslo Hunhold <dev@frign.de>


 


This document was created by man2html, using the manual pages.
Time: 06:27:21 GMT, December 21, 2024