☁️ Big Data Computing Course Assignments

Welcome to my Big Data Computing course repository! This collection contains the assignments completed for the course INP7079233 Big Data Computing (academic year 2023-2024), taught by Professors Pietracaprina and Silvestri at the University of Padova.

📚 Course Overview

This course covers techniques for processing and analyzing massive datasets on distributed platforms, with a focus on the MapReduce model, Apache Spark, and streaming computation.

🧠 Key Learning Outcomes

Mastery of Apache Spark for large-scale data processing
Implementation of distributed algorithms
Real-time data stream analysis
Practical experience with cloud computing platforms

🛠️ Homework Assignments

Homework 1: Outlier Detection in Large Datasets

Objective: Implement and compare exact and approximate outlier detection algorithms using Spark.

Key Components:

Exact algorithm implementation (sequential)
Approximate algorithm using Spark RDDs
Performance and accuracy analysis

🔗 Detailed Assignment Description
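The exact/approximate comparison can be illustrated with a minimal sequential Java sketch. It assumes 2D points and the usual (M,D)-outlier definition (a point is an outlier if at most M points, itself included, lie within distance D of it), plus a grid approximation with cells of side D/(2√2): a cell whose 3x3 neighbourhood holds more than M points contains only non-outliers, and a cell whose 7x7 neighbourhood holds at most M points contains only outliers. The class and method names are hypothetical, and the per-cell counting loop only stands in for the map/reduce-by-key step that a Spark RDD version would perform.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: exact vs grid-based approximate (M,D)-outlier detection on 2D points.
class OutlierSketch {

    // Exact, sequential: p is an (M,D)-outlier if at most M points
    // (p itself included) lie within distance D of p. O(N^2) distance checks.
    static List<double[]> exactOutliers(List<double[]> points, double d, int m) {
        List<double[]> outliers = new ArrayList<>();
        double d2 = d * d; // compare squared distances to avoid sqrt
        for (double[] p : points) {
            int count = 0;
            for (double[] q : points) {
                double dx = p[0] - q[0], dy = p[1] - q[1];
                if (dx * dx + dy * dy <= d2) count++;
            }
            if (count <= m) outliers.add(p);
        }
        return outliers;
    }

    // Approximate: count points per grid cell of side D / (2 * sqrt(2)), then
    // classify whole cells from the 3x3 (N3) and 7x7 (N7) neighbourhood sums.
    // Returns {number of sure outliers, number of uncertain points}.
    static int[] approxOutliers(List<double[]> points, double d, int m) {
        double side = d / (2 * Math.sqrt(2));
        Map<String, Integer> cellCounts = new HashMap<>(); // stands in for a reduce-by-key on cells
        for (double[] p : points) {
            long i = (long) Math.floor(p[0] / side);
            long j = (long) Math.floor(p[1] / side);
            cellCounts.merge(i + "," + j, 1, Integer::sum);
        }
        int sure = 0, uncertain = 0;
        for (Map.Entry<String, Integer> e : cellCounts.entrySet()) {
            String[] ij = e.getKey().split(",");
            long i = Long.parseLong(ij[0]), j = Long.parseLong(ij[1]);
            if (neighbourhood(cellCounts, i, j, 1) > m) continue; // N3 > M: no outliers in this cell
            if (neighbourhood(cellCounts, i, j, 3) <= m) sure += e.getValue(); // N7 <= M: all outliers
            else uncertain += e.getValue(); // cell counts alone cannot decide
        }
        return new int[] {sure, uncertain};
    }

    // Sum of the counts of the (2r+1) x (2r+1) block of cells centred on (i, j).
    static int neighbourhood(Map<String, Integer> counts, long i, long j, int r) {
        int total = 0;
        for (long a = i - r; a <= i + r; a++)
            for (long b = j - r; b <= j + r; b++)
                total += counts.getOrDefault(a + "," + b, 0);
        return total;
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        for (int k = 0; k < 10; k++) pts.add(new double[] {k * 0.01, 0.0}); // dense cluster
        pts.add(new double[] {5.0, 5.0});                                   // isolated point
        System.out.println(exactOutliers(pts, 1.0, 3).size()); // prints 1
        int[] approx = approxOutliers(pts, 1.0, 3);
        System.out.println(approx[0] + " sure, " + approx[1] + " uncertain"); // prints "1 sure, 0 uncertain"
    }
}
```

The cell side is what makes the two shortcuts safe: any two points in a 3x3 block of cells are at most D apart, while every point within distance D of p falls inside p's 7x7 block.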

Homework 2: K-Center Clustering for Outlier Detection

Objective: Enhance outlier detection by integrating k-center clustering techniques.

Key Tasks:

Refine MRApproxOutliers from HW1
Implement Farthest-First Traversal (FFT) algorithm
Develop MapReduce FFT (MRFFT)
Run experiments on the CloudVeneto cluster

🔗 Detailed Assignment Description
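Farthest-First Traversal and a coreset-style MRFFT can be sketched in plain Java as follows. Assumptions: Euclidean 2D points, the first input point as the arbitrary starting center, and an MRFFT simulated sequentially in three rounds (FFT on each partition, FFT on the union of the per-partition centers, then the clustering radius over the whole input). Names such as FftSketch and mrFftRadius are hypothetical, and the round-robin split only mimics how an RDD might be partitioned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Farthest-First Traversal and a sequential simulation of a 3-round MRFFT.
class FftSketch {

    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // FFT: start from an arbitrary point (here the first), then repeatedly add the
    // point farthest from the centers chosen so far. Each point's distance to its
    // closest center is cached, so the cost is O(N * k) distance computations.
    static List<double[]> fft(List<double[]> points, int k) {
        List<double[]> centers = new ArrayList<>();
        if (points.isEmpty() || k <= 0) return centers;
        double[] minDist = new double[points.size()];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        double[] next = points.get(0);
        while (true) {
            centers.add(next);
            if (centers.size() == k) break;
            int farthest = 0;
            for (int idx = 0; idx < points.size(); idx++) {
                minDist[idx] = Math.min(minDist[idx], dist(points.get(idx), next));
                if (minDist[idx] > minDist[farthest]) farthest = idx;
            }
            if (minDist[farthest] == 0) break; // every point already coincides with a center
            next = points.get(farthest);
        }
        return centers;
    }

    // Clustering radius: largest distance of any point from its closest center.
    static double radius(List<double[]> points, List<double[]> centers) {
        double r = 0;
        for (double[] p : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) best = Math.min(best, dist(p, c));
            r = Math.max(r, best);
        }
        return r;
    }

    // Round 1: FFT on each partition; round 2: FFT on the union of the partition
    // centers (the coreset); round 3: radius of the final centers over all points.
    static double mrFftRadius(List<double[]> points, int k, int numPartitions) {
        List<double[]> coreset = new ArrayList<>();
        for (int part = 0; part < numPartitions; part++) {
            List<double[]> partition = new ArrayList<>();
            for (int idx = part; idx < points.size(); idx += numPartitions) partition.add(points.get(idx));
            coreset.addAll(fft(partition, k));
        }
        return radius(points, fft(coreset, k));
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        pts.add(new double[] {0, 0});
        pts.add(new double[] {1, 0});
        pts.add(new double[] {10, 0});
        pts.add(new double[] {11, 0});
        System.out.println(radius(pts, fft(pts, 2))); // prints 1.0
        System.out.println(mrFftRadius(pts, 2, 2));   // prints 1.0
    }
}
```

Caching the nearest-center distance is the design choice that matters here: each new center needs only one pass over the points instead of a comparison against every center chosen so far.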

Homework 3: Frequent Item Detection in Data Streams

Objective: Use the Spark Streaming API to identify frequent items in real-time data streams.

Highlight Features:

Reservoir sampling implementation
Sticky sampling method
Real-time stream processing
Comparative analysis of sampling methods

🔗 Detailed Assignment Description
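Both sampling methods can be sketched without the streaming runtime, over an in-memory int array. Assumptions: reservoir sampling follows the classic scheme (the t-th item enters a sample of size m with probability m/t), and sticky sampling uses a simplified variant that knows the stream length n in advance, inserting an unseen item with probability r/n, where r = ln(1/(δφ))/ε, and reporting items whose counter reaches (φ - ε)n. In a Spark Streaming job these updates would be applied batch by batch; the class name and parameter names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: reservoir sampling and a simplified sticky sampling on an in-memory stream.
class StreamSamplingSketch {

    // Reservoir sampling: uniform random sample of m items from a stream of
    // unknown length. The t-th item (1-based) replaces a random slot with prob. m/t.
    static List<Integer> reservoir(int[] stream, int m, Random rnd) {
        List<Integer> sample = new ArrayList<>(m);
        for (int t = 1; t <= stream.length; t++) {
            int x = stream[t - 1];
            if (sample.size() < m) {
                sample.add(x);
            } else {
                int slot = rnd.nextInt(t);         // uniform in [0, t)
                if (slot < m) sample.set(slot, x); // happens with probability m/t
            }
        }
        return sample;
    }

    // Sticky sampling (simplified, stream length n known in advance): an item already
    // in the table is always counted; an unseen item is inserted with probability r/n.
    // Items whose counter reaches (phi - eps) * n are reported as frequent.
    static Map<Integer, Integer> sticky(int[] stream, double phi, double eps, double delta, Random rnd) {
        int n = stream.length;
        double r = Math.log(1.0 / (delta * phi)) / eps;
        double insertProb = Math.min(1.0, r / n);
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : stream) {
            if (counts.containsKey(x)) counts.merge(x, 1, Integer::sum);
            else if (rnd.nextDouble() < insertProb) counts.put(x, 1);
        }
        Map<Integer, Integer> frequent = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            if (e.getValue() >= (phi - eps) * n) frequent.put(e.getKey(), e.getValue());
        return frequent;
    }

    public static void main(String[] args) {
        int[] stream = new int[10_000];
        for (int t = 0; t < stream.length; t++) stream[t] = (t % 2 == 0) ? 7 : t; // item 7 fills half the stream
        Random rnd = new Random(42);
        System.out.println(reservoir(stream, 10, rnd));                  // m = ceil(1/phi) with phi = 0.1
        System.out.println(sticky(stream, 0.1, 0.01, 0.1, rnd).keySet()); // frequent items found by sticky sampling
    }
}
```

Choosing m = ⌈1/φ⌉, as in main, makes the expected number of copies in the reservoir of any item with frequency at least φn at least one, which is the usual baseline when comparing it with sticky sampling.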

🛠️ Technologies & Tools

Apache Spark & Spark Streaming
Java
CloudVeneto Cluster

🌟 Key Takeaways

This course provided:

Hands-on experience with industry-standard big data tools
Deep understanding of distributed computing principles
Practical skills in real-time data analysis and processing

📄 License

This project is licensed under the MIT License with a Non-Commercial Clause; see the LICENSE file for details.

💡 Feel free to explore the code and documentation!

francesco-biscaccia-carrara/BigData_Projects
