The probabilistic representation of linguistic knowledge: Linguistic data sets annotated for grammatical acceptability