Autofeaturizer may run redundant conversions as many as 3 times #176

Closed
ardunn opened this issue Feb 3, 2019 · 2 comments

ardunn commented Feb 3, 2019

This is caused by calling fit_transform: both fit and transform call the methods that convert structures to oxidation-state-decorated structures, compositions to oxidation-state-decorated compositions, dict/string structures and compositions to structure/composition objects, and so on. The same conversions therefore run more than once on the same data.

Luckily these conversions are fast now, but this is still a priority issue since on large datasets it can take a while...
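
To make the double conversion concrete, here is a minimal sketch using a toy class. `ToyFeaturizer`, `reuse_conversion`, and `_prepare` are hypothetical names for illustration only, not automatminer's API, and the caching shown is just one possible way to avoid the repeat, not necessarily how the fix was implemented.

```python
class ToyFeaturizer:
    """Hypothetical stand-in for a featurizer whose fit and transform both convert input."""

    def __init__(self, reuse_conversion=False):
        self.reuse_conversion = reuse_conversion
        self.conversions_run = 0  # how many times the conversion actually ran
        self._cache = None        # (input object, converted result)

    def _convert(self, data):
        # Stand-in for an expensive conversion (e.g. string -> structure object).
        self.conversions_run += 1
        return [s.strip().upper() for s in data]

    def _prepare(self, data):
        # Reuse the previous result only if the very same input object is passed again.
        if self.reuse_conversion and self._cache is not None and self._cache[0] is data:
            return self._cache[1]
        converted = self._convert(data)
        self._cache = (data, converted)
        return converted

    def fit(self, data):
        self.seen_ = set(self._prepare(data))
        return self

    def transform(self, data):
        return [item in self.seen_ for item in self._prepare(data)]

    def fit_transform(self, data):
        # fit and transform each call _prepare, so a naive version converts twice.
        return self.fit(data).transform(data)


samples = [" fe2o3", "NaCl ", "sio2"]

naive = ToyFeaturizer(reuse_conversion=False)
naive_result = naive.fit_transform(samples)

cached = ToyFeaturizer(reuse_conversion=True)
cached_result = cached.fit_transform(samples)

print(naive.conversions_run, cached.conversions_run)
```

Both variants return the same result; only the number of conversion calls differs. The identity check (`is`) keeps the cache from being reused when transform is later called on different data.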

@ardunn ardunn added the priority label Feb 3, 2019
@ardunn ardunn self-assigned this Feb 3, 2019

ardunn commented Feb 3, 2019

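Example: in the log below, the "structure detected as strings" conversion and the guessing of structure oxidation states each run three times: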

2019-02-02 23:39:32,612 INFO Hostname/IP lookup (this will take a few seconds)
2019-02-02 23:39:32,616 INFO Launching Rocket
2019-02-02 23:39:44,449 INFO RUNNING fw_id: 6 in directory: /global/scratch/ardunn/fws/block_2019-02-03-07-34-11-600263/launcher_2019-02-03-07-39-26-894234
2019-02-02 23:39:44,832 INFO Task started: {{hmprivate.automatminer.benchmarking.tasks.RunPipe}}.
2019-02-02 23:39:49 WARNING  Beginning strict benchmark.
2019-02-02 23:39:49 INFO     Training on fold index 0
2019-02-02 23:39:49 INFO     Fitting MatPipe pipeline to data.
2019-02-02 23:39:49 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:49 INFO     Guessing oxidation states of structures, as they were not present in input.
2019-02-02 23:39:51 INFO     Guessing oxidation states of compositions, as they were not present in input.
2019-02-02 23:39:52 INFO     Fit AtomicOrbitals to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit ElementProperty to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit ElementProperty to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit ElementProperty to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit ElementFraction to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit Stoichiometry to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit TMetalFraction to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit BandCenter to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     Fit ValenceOrbital to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:53 INFO     Guessing oxidation states of structures, as they were not present in input.
2019-02-02 23:39:54 INFO     Fit DensityFeatures to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit GlobalSymmetryFeatures to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit EwaldEnergy to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit SiteStatsFingerprint to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit ChemicalOrdering to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit StructuralHeterogeneity to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit MaximumPackingEfficiency to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit XRDPowderPattern to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit Dimensionality to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit OrbitalFieldMatrix to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Fit JarvisCFID to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     Featurizer type bandstructure not in the dataframe to be fitted. Skipping...
2019-02-02 23:39:54 INFO     Featurizer type dos not in the dataframe to be fitted. Skipping...
2019-02-02 23:39:54 INFO     composition column already exists, overwriting with composition from structure.
2019-02-02 23:39:54 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:55 INFO     Guessing oxidation states of structures, as they were not present in input.
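
A quick way to spot repeats like these in a longer log is to count the conversion messages. The sketch below is only an illustration: `count_conversion_messages` and `CONVERSION_MARKERS` are hypothetical helpers, and the excerpt is a subset of the lines above with most "Fit ..." lines left out.

```python
from collections import Counter

# Excerpt copied from the log above (most featurizer "Fit ..." lines omitted).
LOG_EXCERPT = """\
2019-02-02 23:39:49 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:49 INFO     Guessing oxidation states of structures, as they were not present in input.
2019-02-02 23:39:51 INFO     Guessing oxidation states of compositions, as they were not present in input.
2019-02-02 23:39:52 INFO     Fit AtomicOrbitals to 944 samples in dataframe.
2019-02-02 23:39:52 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:53 INFO     Guessing oxidation states of structures, as they were not present in input.
2019-02-02 23:39:54 INFO     Fit DensityFeatures to 944 samples in dataframe.
2019-02-02 23:39:54 INFO     structure detected as strings. Attempting conversion to structure objects...
2019-02-02 23:39:55 INFO     Guessing oxidation states of structures, as they were not present in input.
"""

# Substrings that identify conversion steps (assumed from the messages in this log).
CONVERSION_MARKERS = ("Attempting conversion", "Guessing oxidation states")


def count_conversion_messages(log_text, markers=CONVERSION_MARKERS):
    """Count how often each conversion-related message appears in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Expected layout: date, time, level, message.
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue
        message = parts[3]
        if any(marker in message for marker in markers):
            counts[message] += 1
    return counts


counts = count_conversion_messages(LOG_EXCERPT)
repeated = {msg: n for msg, n in counts.items() if n > 1}
for msg, n in repeated.items():
    print(f"{n}x  {msg}")
```

Filtering on marker substrings avoids flagging lines such as the repeated "Fit ElementProperty" entries, which are fits of separate featurizers rather than repeated conversions of the same column.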


ardunn commented Feb 7, 2019

Closed via afc56f8

@ardunn ardunn closed this as completed Feb 7, 2019