train function does not work the same in current version #480

Closed
Arcticgrayling opened this issue Sep 1, 2016 · 3 comments

Comments

@Arcticgrayling

A Coursera class is running code from the class website:
https://github.com/DataScienceSpecialization/courses/blob/master/08_PracticalMachineLearning/016preProcessingPCA/index.Rmd

The train function gives an error with caret version 6.0-71. The problem goes away with older versions.

modelFit <- train(training$type ~ .,method="glm",data=trainPC)

Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :

undefined columns selected

In addition: There were 26 warnings (use warnings() to see them)

warnings() gives:

warnings()

Warning messages:

1: glm.fit: fitted probabilities numerically 0 or 1 occurred

2: glm.fit: fitted probabilities numerically 0 or 1 occurred

If I revert to an older version of caret (6.0-58), the error goes away.
With version 6.0-70 there are just warnings.

CODE:
library(caret); library(kernlab); data(spam)
inTrain <- createDataPartition(y=spam$type,
p=0.75, list=FALSE)
training <- spam[inTrain,]
testing <- spam[-inTrain,]

preProc <- preProcess(log10(training[,-58]+1),method="pca",pcaComp=2)
trainPC <- predict(preProc,log10(training[,-58]+1))
modelFit <- train(training$type ~ .,method="glm",data=trainPC)

# Preprocessing with PCA

testPC <- predict(preProc,log10(testing[,-58]+1))
confusionMatrix(testing$type,predict(modelFit,testPC))

# Alternative (sets # of PCs)

modelFit <- train(training$type ~ .,method="glm",preProcess="pca",data=training)
confusionMatrix(testing$type,predict(modelFit,testing))
@topepo
Owner

topepo commented Sep 1, 2016

You shouldn't use the data set name on the LHS of the formula. The formula interface should be used when the variables are in columns of the object that the data argument refers to.

If type is not in training and there are only numeric variables in trainPC, then you should use the non-formula method:

modelFit <- train(x = trainPC, y = training$type,method="glm")

Does that work?
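
For context on the error message in the original post, which points at `data[, all.vars(Terms), drop = FALSE]`: the following is a minimal base-R sketch of why a `training$type` left-hand side can trip that indexing. It is not caret's actual code path. The two-row `trainPC` is a hypothetical stand-in, and the formula spells out `PC1 + PC2` in place of `.`, which is an assumption about what the expanded terms look like.

```r
# A formula is not evaluated when it is created, so `training` need not exist here.
f <- training$type ~ PC1 + PC2

# all.vars() returns every symbol in the formula, including both sides of `$`.
vars <- all.vars(f)
print(vars)

# Hypothetical stand-in for the PCA scores: it has no `training` or `type` column.
trainPC <- data.frame(PC1 = c(0.1, 0.2), PC2 = c(0.3, 0.4))
print(setdiff(vars, names(trainPC)))  # names that are not columns of trainPC

# Selecting those names as columns fails; capture the message instead of stopping.
msg <- tryCatch(
  {
    trainPC[, vars, drop = FALSE]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because `training` and `type` are treated as column names to look up in `data`, the lookup fails whenever the response is not a column of that data frame, which is the situation the non-formula `x`/`y` call avoids.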

@Arcticgrayling
Author

That does work, thanks. I'll let the Coursera folks know.

@ireliatt

Hi, what about this line:

modelFit <- train(training$type ~ .,method="glm",preProcess="pca",data=training)

I deleted data=training but it still doesn't work...
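
Going by the advice earlier in the thread, one option for this line is to keep `data=training` and name the response by its bare column name, since `type` is a column of `training` (column 58 of `spam`). A minimal sketch, assuming the same `spam` setup as the original post; the `set.seed` value is an arbitrary addition for repeatability, and this has not been checked against every caret version mentioned in this issue.

```r
library(caret)
library(kernlab)
data(spam)

set.seed(1)  # arbitrary seed, not part of the original code
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)
training <- spam[inTrain, ]
testing <- spam[-inTrain, ]

# `type` is looked up as a column of `training`, so no `training$` prefix on the LHS.
modelFit <- train(type ~ ., method = "glm", preProcess = "pca", data = training)

# Predictions on the held-out rows; the PCA step is applied by train's preProcess.
pred <- predict(modelFit, testing)
confusionMatrix(testing$type, pred)
```

The glm.fit warnings about fitted probabilities of 0 or 1 may still appear; those are warnings, not the "undefined columns selected" error.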
