1
0
mirror of https://gitea.com/gitea/tea.git synced 2024-11-03 04:27:21 -05:00
tea/vendor/github.com/rivo/uniseg/README.md
6543 0d98cbd657 Update Vendors (#337)
* update & migrate gitea sdk (Fix Delete Tag Issue)
* upgraded github.com/AlecAivazis/survey v2.2.7 => v2.2.8
* upgraded github.com/adrg/xdg v0.2.3 => v0.3.1
* upgraded github.com/araddon/dateparse
* upgraded github.com/olekukonko/tablewriter v0.0.4 => v0.0.5
* upgraded gopkg.in/yaml.v2 v2.3.0 => v2.4.0

Reviewed-on: https://gitea.com/gitea/tea/pulls/337
Reviewed-by: Norwin <noerw@noreply.gitea.io>
Reviewed-by: khmarbaise <khmarbaise@noreply.gitea.io>
Co-authored-by: 6543 <6543@obermui.de>
Co-committed-by: 6543 <6543@obermui.de>
2021-03-05 18:06:25 +08:00

63 lines
2.2 KiB
Markdown

# Unicode Text Segmentation for Go
[![Godoc Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/rivo/uniseg)
[![Go Report](https://img.shields.io/badge/go%20report-A%2B-brightgreen.svg)](https://goreportcard.com/report/github.com/rivo/uniseg)
This Go package implements Unicode Text Segmentation according to [Unicode Standard Annex #29](http://unicode.org/reports/tr29/) (Unicode version 12.0.0).
At this point, only the determination of grapheme cluster boundaries is implemented.
## Background
In Go, [strings are read-only slices of bytes](https://blog.golang.org/strings). They can be turned into Unicode code points using the `for` loop or by casting: `[]rune(str)`. However, multiple code points may be combined into one user-perceived character or what the Unicode specification calls "grapheme cluster". Here are some examples:
|String|Bytes (UTF-8)|Code points (runes)|Grapheme clusters|
|-|-|-|-|
|Käse|6 bytes: `4b 61 cc 88 73 65`|5 code points: `4b 61 308 73 65`|4 clusters: `[4b],[61 308],[73],[65]`|
|🏳️‍🌈|14 bytes: `f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88`|4 code points: `1f3f3 fe0f 200d 1f308`|1 cluster: `[1f3f3 fe0f 200d 1f308]`|
|🇩🇪|8 bytes: `f0 9f 87 a9 f0 9f 87 aa`|2 code points: `1f1e9 1f1ea`|1 cluster: `[1f1e9 1f1ea]`|
This package provides a tool to iterate over these grapheme clusters. This may be used to determine the number of user-perceived characters, to split strings in their intended places, or to extract individual characters which form a unit.
## Installation
```bash
go get github.com/rivo/uniseg
```
## Basic Example
```go
package uniseg
import (
"fmt"
"github.com/rivo/uniseg"
)
func main() {
gr := uniseg.NewGraphemes("👍🏼!")
for gr.Next() {
fmt.Printf("%x ", gr.Runes())
}
// Output: [1f44d 1f3fc] [21]
}
```
## Documentation
Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.
## Dependencies
This package does not depend on any packages outside the standard library.
## Your Feedback
Add your issue here on GitHub. Feel free to get in touch if you have any questions.
## Version
Version tags will be introduced once Golang modules are official. Consider this version 0.1.