Java codepoints5/29/2023 ![]() IntStream intStream = “someString”.codePoints() For such cases-if your application has to process supplementary characters too-use method codePoints() instead. forEach(c -> (Character.toChars(c))) Īs you can see, the information about the character as a whole is lost. String supplString = new String(Character.toChars(0x1F600) ) The chars() method treats them as two different code points: The supplementary characters are not appropriate for processing in a stream created by the chars() method because they are represented as a pair of char values, the first-from the high-surrogates range, ( \uD800-\uDBFF), the second-from the low-surrogates range ( \uDC00-\uDFFF). String supplString = new String(Character.toChars(0x1F600)) The character 0x1F600 is an example of an emoji called “grinning face”: print(0x1F600 > Character.MIN_SUPPLEMENTARY_CODE_POINT) print('a' > Character.MIN_SUPPLEMENTARY_CODE_POINT) You can find if the character is a supplementary one by comparing it with Character.MIN_SUPPLEMENTARY_CODE_POINT. They are called supplementary characters and they are greater than U FFFF (such as emoji, for example). There are also characters that are not appropriate for processing in a stream created by the chars() method. For the majority of mainstream applications-those that do not process a significant number of characters-the performance gain does not justify the clarity of the code, which going to be less human-readable if all the characters are processed as integers. It does not mean though that one has to process characters this way all the time. This example is very simplistic, but I hope it makes the point: processing characters as integers-without converting them to Character-can be more efficient than processing them as objects or boxing/unboxing them unnecessarily. For example, the following code selects only the lower-case letters, taking advantage of the fact that the upper-case Latin letters are listed in the character set before the lower-case Latin letters: Using code point you can process the characters without boxing them into the Characterobjects, thus improving performance, which can be significant when the number of processed characters is big enough. To demonstrate that the emitted values are the expected ‘a’, ‘b’, and ‘c’, we can cast the emitted values back to char as follows: The created IntStream emits code points (integer char values) of the characters. ![]() IntStream intStream = “someString”.chars() ![]() The lines() method creates a stream of lines extracted from this string, separated by line terminators. The chars() and codePoints() methods create a stream of code points of the characters that compose the string.
0 Comments
Leave a Reply. |