In a nutshell: fit on trainingData (trains a model), then transform on testData (predicts with that model)
- Transformer: DataFrame => DataFrame (via transform)
- Estimator: DataFrame => Transformer (via fit)
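
The fit/transform contract above can be sketched without Spark. This is a toy illustration only, not the Spark API: `Frame` (a list of dicts standing in for a DataFrame), `MeanThresholdEstimator`, and `ThresholdModel` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, float]
Frame = List[Row]  # hypothetical stand-in for a DataFrame


@dataclass(frozen=True)
class ThresholdModel:
    """Transformer: Frame => Frame. Appends a 'prediction' column."""
    threshold: float

    def transform(self, frame: Frame) -> Frame:
        # Returns new rows; the input frame is left untouched.
        return [
            {**row, "prediction": 1.0 if row["x"] > self.threshold else 0.0}
            for row in frame
        ]


class MeanThresholdEstimator:
    """Estimator: Frame => Transformer. 'Training' just learns the mean of x."""

    def fit(self, frame: Frame) -> ThresholdModel:
        mean = sum(row["x"] for row in frame) / len(frame)
        return ThresholdModel(threshold=mean)


training_data = [{"x": 1.0}, {"x": 2.0}, {"x": 6.0}]
test_data = [{"x": 2.5}, {"x": 4.0}]

model = MeanThresholdEstimator().fit(training_data)  # fit: train a model
predictions = model.transform(test_data)             # transform: predict
print(predictions)
```

The point of the split is that the fitted model is itself a Transformer, so it can be chained with other Transformers in the same way.
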
# Transformers
- Tokenizer: sentence => words (lowercases, splits on whitespace)
- RegexTokenizer: sentence => words, split by a regex - setPattern
- HashingTF: terms => fixed-length term-frequency vectors (hashing trick) - setNumFeatures sets the vector size
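
To make the three transformers above concrete, here is a rough Spark-free sketch of what each one computes. The function names are hypothetical, and `zlib.crc32` is an assumption standing in for the hash function; Spark's HashingTF uses a different one (MurmurHash3), so the indices will not match Spark's output.

```python
import re
import zlib
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    """Tokenizer-like: lowercase, then split on whitespace."""
    return sentence.lower().split()


def regex_tokenize(sentence: str, pattern: str = r"\W+") -> List[str]:
    """RegexTokenizer-like: the pattern marks the gaps between tokens."""
    return [tok for tok in re.split(pattern, sentence.lower()) if tok]


def hashing_tf(terms: List[str], num_features: int = 16) -> Dict[int, float]:
    """HashingTF-like: map each term to a bucket in [0, num_features) and
    count occurrences. Returned as a sparse {index: count} mapping."""
    counts: Dict[int, float] = {}
    for term in terms:
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


words = tokenize("Spark spark ML")
csv_words = regex_tokenize("logistic,regression,models")
features = hashing_tf(words, num_features=8)
print(words, csv_words, features)
```

A smaller num_features means a shorter vector but more collisions, where distinct terms land in the same bucket and their counts are added together.
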