fbb725bbdc
As usual, we ask ICU to do the actual work. The TextEncoding constructor is fed a sample of the text to identify (the ICU docs recommend a few hundred bytes). The text is analyzed in various ways (byte patterns such as UTF-8 escaping schemes, common letter sequences from known languages, byte order marks) and an encoding is determined. Replace the code in StyledEdit with this new implementation. Note that ICU seems to always return some valid encoding, even when fed obviously non-text data. This makes StyledEdit open files no matter what, where it would error out before. Fixes #9395.
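ICU's detector scores many candidate charsets statistically; the sketch below does not reproduce its algorithm. It is a hypothetical, ICU-free helper, `GuessEncoding()` (not part of Haiku or ICU), that illustrates two of the signals named above: byte order marks and UTF-8 byte patterns. Like ICU, it always returns some encoding name, falling back to ISO-8859-1 when the bytes are not well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy encoding guesser (not ICU's algorithm). It checks for a
// byte order mark, then whether the sample is well-formed UTF-8, and
// otherwise falls back to ISO-8859-1, so it never reports "unknown".
std::string GuessEncoding(const char* data, size_t length)
{
	const unsigned char* p = reinterpret_cast<const unsigned char*>(data);

	// 1. A byte order mark identifies the encoding directly.
	//    (Simplification: a UTF-32LE BOM, FF FE 00 00, is reported as UTF-16LE.)
	if (length >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
		return "UTF-8";
	if (length >= 2 && p[0] == 0xFE && p[1] == 0xFF)
		return "UTF-16BE";
	if (length >= 2 && p[0] == 0xFF && p[1] == 0xFE)
		return "UTF-16LE";

	// 2. Check that every multi-byte sequence follows the UTF-8 pattern:
	//    a lead byte announcing N continuation bytes of the form 10xxxxxx.
	//    (Simplification: overlong forms and surrogates are not rejected.)
	bool sawMultiByte = false;
	size_t i = 0;
	while (i < length) {
		unsigned char c = p[i];
		size_t extra;
		if (c < 0x80)
			extra = 0;
		else if ((c & 0xE0) == 0xC0)
			extra = 1;
		else if ((c & 0xF0) == 0xE0)
			extra = 2;
		else if ((c & 0xF8) == 0xF0)
			extra = 3;
		else
			return "ISO-8859-1";	// invalid UTF-8 lead byte

		// Truncated sequence; a real detector would tolerate a sample
		// that happens to be cut in the middle of a character.
		if (i + extra >= length)
			return "ISO-8859-1";

		for (size_t k = 1; k <= extra; k++) {
			if ((p[i + k] & 0xC0) != 0x80)
				return "ISO-8859-1";	// not a continuation byte
		}

		if (extra > 0)
			sawMultiByte = true;
		i += extra + 1;
	}

	// Pure 7-bit data is valid UTF-8 too, but ASCII is the narrower answer.
	return sawMultiByte ? "UTF-8" : "US-ASCII";
}
```

For example, `GuessEncoding("caf\xC3\xA9", 5)` yields "UTF-8", while the Latin-1 spelling `GuessEncoding("caf\xE9", 4)` falls back to "ISO-8859-1". Letter-frequency analysis, which ICU uses to tell apart the many single-byte charsets, is deliberately left out.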
/*
 * Copyright 2016, Haiku, inc.
 * Distributed under terms of the MIT license.
 */


#ifndef TEXTENCODING_H
#define TEXTENCODING_H


#include <String.h>

#include <stddef.h>


class TextEncoding
{
public:
	// Analyzes a sample of the text (ICU recommends a few hundred bytes)
	// and determines its encoding.
	TextEncoding(const char* data, size_t length);

	// Returns the name of the detected encoding, as reported by ICU.
	BString GetName();

private:
	BString fName;
};


#endif /* !TEXTENCODING_H */