summaryrefslogtreecommitdiff
path: root/includes/external/addressbook/node_modules/chardet/README.md
blob: 6563a852198025c16122278d370d262408f799dd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
# chardet

*Chardet* is a character detection module written in pure Javascript (Typescript). Module uses occurrence analysis to determine the most probable encoding.

- Packed size is only **22 KB**
- Works in all environments: Node / Browser / Native
- Works on all platforms: Linux / Mac / Windows
- No dependencies
- No native code / bindings
- 100% written in Typescript
- Extensive code coverage

## Installation

```
npm i chardet
```

## Usage

To return the encoding with the highest confidence:

```javascript
import chardet from 'chardet';

const encoding = chardet.detect(Buffer.from('hello there!'));
// or
const encoding = await chardet.detectFile('/path/to/file');
// or
const encoding = chardet.detectFileSync('/path/to/file');
```

To return the full list of possible encodings use `analyse` method.

```javascript
import chardet from 'chardet';
chardet.analyse(Buffer.from('hello there!'));
```

Returned value is an array of objects sorted by confidence value in descending order

```javascript
[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
```

## Working with large data sets

Sometimes, when data set is huge and you want to optimize performance (with a tradeoff of less accuracy),
you can sample only the first N bytes of the buffer:

```javascript
chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then(encoding => console.log(encoding));
```

You can also specify where to begin reading from in the buffer:

```javascript
chardet
  .detectFile('/path/to/file', { sampleSize: 32, offset: 128 })
  .then(encoding => console.log(encoding));
```

## Supported Encodings:

- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift_JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R

Currently only these encodings are supported.

## Typescript?

Yes. Type definitions are included.

### References

- ICU project http://site.icu-project.org/