Factors are one of the worst things in the R world. I don't recall ever needing factors...

VLM · on Feb 16, 2016

You should use factor for data cleaning and verification.

So you have "sex" on the questionnaire, and factor will very quickly identify contamination such as "often", "not yet", various mis-spellings, etc.
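A minimal sketch of that kind of check, assuming a hypothetical `sex` column with invented responses (none of these values come from a real questionnaire):

```r
# Hypothetical questionnaire responses; the values are invented for illustration.
sex <- c("male", "female", "Female", "often", "male", "not yet", "femal")

# levels() of the factor lists every distinct value once,
# so stray entries stand out at a glance.
print(levels(factor(sex)))

# Declaring the allowed levels explicitly turns everything else into NA.
sex_clean <- factor(sex, levels = c("female", "male"))

# The original strings behind those NAs are the contamination to fix.
bad <- unique(sex[is.na(sex_clean)])
print(bad)
```

Here `bad` collects the entries that need a closer look; whether "Female" should be recoded or rejected is a decision for whoever cleans the data.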

stewbrew · on Feb 16, 2016

How would you represent categorical data then? R's primary use case isn't text processing. And HW isn't always right.

klmr · on Feb 17, 2016

As character, for instance. In particular, character vectors can do everything factors can do when used in conjunction with `unique`, and ordered factors can be represented as a combination of characters and numerics. Factors work better, but only barely. In particular, they are nowadays not any more efficient than using character (!). They used to be, which is why they are liberally used everywhere in R’s base libraries.
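One way to read the `unique` suggestion, as a rough sketch with invented data: the distinct values stand in for the levels, and `match` recovers the integer codes a factor would store. The variable names are illustrative only.

```r
x <- c("b", "a", "c", "a", "b")    # invented data

lv    <- sort(unique(x))           # plays the role of levels(factor(x))
codes <- match(x, lv)              # plays the role of as.integer(factor(x))
stopifnot(identical(codes, as.integer(factor(x))))

# An ordering can be kept alongside the strings as a plain numeric rank.
sizes     <- c("small", "large", "medium")
size_ord  <- c("small", "medium", "large")   # assumed intended order
size_rank <- match(sizes, size_ord)
sorted    <- sizes[order(size_rank)]
print(sorted)
```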

stewbrew · on Feb 17, 2016

"In particular, they are nowadays not any more efficient than using character"

How could a comparison of two strings of unknown size be as efficient as comparing two integers? I'm curious to learn something new.

hadley · on Feb 17, 2016

R uses a global string cache so any string comparison is just comparing two pointers.

th0br0 · on Feb 16, 2016

You will (inevitably?) run into factors when importing data from SPSS files... sure, you can discard them upon reading... but are you sure you don't want access to the value labels in the future?

lottin · on Feb 16, 2016

Factors are weird because no other language has anything like them, but they are actually quite a clever way to group data. It just takes a while to get used to them.

Fomite · on Feb 16, 2016

I actually use factors a fair amount, and having factor-like data shoved into numeric values gets you to some bad places statistically.
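A small sketch of what can go wrong, using simulated data (the group codes and means are invented): with numeric codes, `lm` fits a single slope as if the codes were a quantity, whereas a factor gets one coefficient per non-reference group.

```r
set.seed(1)

# Three groups coded 1, 2, 3; the true group means are deliberately
# not linear in the code.
d <- data.frame(group = rep(1:3, each = 10))
d$y <- c(0, 5, 1)[d$group] + rnorm(30)

fit_num <- lm(y ~ group, data = d)          # codes treated as a continuous variable
fit_fct <- lm(y ~ factor(group), data = d)  # codes treated as categories

n_num <- length(coef(fit_num))  # intercept + one slope
n_fct <- length(coef(fit_fct))  # intercept + one contrast per extra group
print(c(numeric = n_num, factor = n_fct))
```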

grayclhn · on Feb 17, 2016

You must not do a lot of regression with categorical data, then. I use commands like `lm(y ~ (x1 + x2) * factor_variable, data = d)` and `xyplot(y ~ x1 | factor_1, groups = factor_2, data = d)` all the time.

hadley · on Feb 17, 2016

Those also work just fine with strings.

grayclhn · on Feb 18, 2016

Via an implicit call to factor, right?
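A quick way to check this, sketched on simulated data with made-up column names: fit the same model once with a character column and once with an explicit factor, then compare. As far as I know the conversion happens when the design matrix is built, and the fitted object records the levels it derived in `xlevels`.

```r
set.seed(1)

d <- data.frame(
  g = rep(c("a", "b", "c"), each = 5),
  x = rnorm(15),
  stringsAsFactors = FALSE   # keep g as character regardless of R version
)
d$y <- rnorm(15)

fit_chr <- lm(y ~ x * g, data = d)       # g is a character vector

d2 <- transform(d, g = factor(g))
fit_fct <- lm(y ~ x * g, data = d2)      # g is an explicit factor

# Same coefficients either way, and lm has recorded levels for the
# character column.
same_coefs <- isTRUE(all.equal(coef(fit_chr), coef(fit_fct)))
print(same_coefs)
print(fit_chr$xlevels)
```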

gbrown · on Feb 16, 2016

Factors are great, and surprisingly powerful even outside of statistical computing. That said, I prefer to create them on purpose rather than having `read.csv` attempt to be helpful.
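For reference, a sketch of the "on purpose" approach with invented CSV content. At the time of this thread `read.csv` defaulted to `stringsAsFactors = TRUE`; since R 4.0.0 the default is `FALSE`, so passing it explicitly only matters on older versions.

```r
# Invented CSV content, read from a string instead of a file.
csv <- "id,colour\n1,red\n2,blue\n3,red\n"

# Read strings as plain character vectors.
d <- read.csv(text = csv, stringsAsFactors = FALSE)
stopifnot(is.character(d$colour))

# Create the factor deliberately, with the levels and order you intend,
# including a level that does not occur in this particular file.
d$colour <- factor(d$colour, levels = c("red", "blue", "green"))

counts <- table(d$colour)
print(counts)
```

Declaring the levels up front means an unused category such as "green" still shows up in `counts` with a count of zero.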