To bring together and apply the various topics covered in this course, you will work on a project. The goal of the project is to go through the complete knowledge discovery process to answer one or more questions you have about a topic of your own choosing. You will acquire the data, formulate a question (or questions) of interest, perform the data analysis, and communicate the results. Projects are programming assignments that cover the topics of this course.
Students individually implement the given assignments in one of the recommended software tools. Each student should create their own folder inside this folder (via a pull request) and place their files in it. The folder name starts with the name of the dataset, followed by an underscore, and ends with the student's family name (NameOfData_Familyname). The folder should include a file containing the analysis of the results.
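As a small illustration of the NameOfData_Familyname convention, the sketch below builds a folder name from a dataset name and a family name. The helper `project_folder_name` is hypothetical and not part of the course materials, and removing whitespace from each part is an assumption made here so that the underscore stays the only separator.

```python
import re


def project_folder_name(data_name: str, family_name: str) -> str:
    """Build a folder name of the form NameOfData_Familyname."""

    def strip_whitespace(text: str) -> str:
        # Assumption: whitespace is removed so the underscore is the only separator.
        return re.sub(r"\s+", "", text)

    data_part = strip_whitespace(data_name)
    family_part = strip_whitespace(family_name)
    if not data_part or not family_part:
        raise ValueError("Both the dataset name and the family name are required.")
    return f"{data_part}_{family_part}"


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(project_folder_name("Iris", "Smith"))
```

For example, a student with the family name Smith working on a dataset called Iris would get the folder `Iris_Smith`.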
Each student will explain their project in a 10–15 minute presentation to the class. Presentations should clearly convey the project ideas, methods, and results, including the question(s) being addressed, the motivation for the analyses being employed, and relevant evaluations, contributions, and discussion questions.
You may also find it helpful to read about Python and its wonderful packages (Pandas, NumPy, Scikit-learn, SciPy, Seaborn, Matplotlib), but note that in this course it’s really not necessary to know Python. Instead, just use one of the following software tools and figure things out along the way.
- Orange, open-source data analytics and mining through visual programming or Python scripting, with components for visualization, rule learning, clustering, model evaluation, and more.
- RapidMiner, a unified platform for data prep, machine learning, and model deployment that aims to make data science teams more productive.
- Weka, a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform.
For more software options, refer to the following KDnuggets webpage: