parent 2c5a19e5ff
commit d67c997d21
|
@ -250,7 +250,7 @@ charToRaw("Hadley")
|
||||||
|
|
||||||
Each hexadecimal number represents a byte of information: `48` is H, `61` is a, and so on. The mapping from hexadecimal number to character is called the encoding, and in this case the encoding is called ASCII. ASCII does a great job of representing English characters, because it's the __American__ Standard Code for Information Interchange.
|
Each hexadecimal number represents a byte of information: `48` is H, `61` is a, and so on. The mapping from hexadecimal number to character is called the encoding, and in this case the encoding is called ASCII. ASCII does a great job of representing English characters, because it's the __American__ Standard Code for Information Interchange.
|
||||||
|
|
||||||
Things get more complicated for languages other than English. In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you need to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!).
|
Things get more complicated for languages other than English. In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you needed to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!).
|
||||||
|
|
||||||
readr uses UTF-8 everywhere: it assumes your data is UTF-8 encoded when you read it, and always uses it when writing. This is a good default, but will fail for data produced by older systems that don't understand UTF-8. If this happens to you, your strings will look weird when you print them. Sometimes just one or two characters might be messed up; other times you'll get complete gibberish. For example:
|
readr uses UTF-8 everywhere: it assumes your data is UTF-8 encoded when you read it, and always uses it when writing. This is a good default, but will fail for data produced by older systems that don't understand UTF-8. If this happens to you, your strings will look weird when you print them. Sometimes just one or two characters might be messed up; other times you'll get complete gibberish. For example:
|
||||||
|
|
||||||
|
@ -340,7 +340,7 @@ Time
|
||||||
: `%M` minutes.
|
: `%M` minutes.
|
||||||
: `%S` integer seconds.
|
: `%S` integer seconds.
|
||||||
: `%OS` real seconds.
|
: `%OS` real seconds.
|
||||||
: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware abbreviations:
|
: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware of abbreviations:
|
||||||
if you're American, note that "EST" is a Canadian time zone that does not
|
if you're American, note that "EST" is a Canadian time zone that does not
|
||||||
have daylight savings time. It is \emph{not} Eastern Standard Time! We'll
|
have daylight savings time. It is \emph{not} Eastern Standard Time! We'll
|
||||||
come back to this [time zones].
|
come back to this [time zones].
|
||||||
|
@ -628,6 +628,6 @@ To get other types of data into R, we recommend starting with the tidyverse pack
|
||||||
__RSQLite__, __RPostgreSQL__ etc) allows you to run SQL queries against a
|
__RSQLite__, __RPostgreSQL__ etc) allows you to run SQL queries against a
|
||||||
database and return a data frame.
|
database and return a data frame.
|
||||||
|
|
||||||
For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. whichYou will need to convert them to data frames using the tools on [handling hierarchy].
|
For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. You will need to convert them to data frames using the tools on [handling hierarchy].
|
||||||
|
|
||||||
For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [__rio__](https://github.com/leeper/rio) package.
|
For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [__rio__](https://github.com/leeper/rio) package.
|
||||||
|
|