Sumit Kumar - Full-Stack Data & AI Engineer Portfolio

Welcome to my project portfolio! I am a passionate engineer with expertise in Data Analysis, Machine Learning, Deep Learning, Computer Vision, Data Engineering, Generative AI, and AI Agent systems. My work spans from building robust data pipelines and predictive models to developing intelligent multi-agent systems for real-world applications.


About Me

Hello! My name is Sumit Kumar, and I am a tech enthusiast with a background in software development and a passion for data and AI. My journey started in software engineering and evolved into data science and AI engineering. I excel at constructing scalable data pipelines, developing machine learning models, and building innovative AI systems to solve complex problems. This portfolio highlights my technical skills, projects, and progression in these fields.


Data Analysis Projects

1. Transfer Tracker: Football Data Insights

  • Description: Developed end-to-end pipelines for data cleaning, feature engineering, and exploratory analysis using Python, Pandas, NumPy, and Matplotlib/Seaborn.
  • Key Features: Missing value detection, outlier handling, and dynamic dashboards.
  • Technologies: Python, Pandas, NumPy, Matplotlib, Seaborn
  • Repository: https://github.com/Sumitkumar005/Projects
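
The missing-value detection and outlier handling listed above could look roughly like the sketch below. It is only an illustration: the `transfers` table, its column names, and the helper functions `missing_report` and `iqr_outlier_mask` are hypothetical, and the interquartile-range rule is one common choice for flagging outliers, not necessarily the method used in the project.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical transfer records; the real dataset's schema is not shown here.
transfers = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F"],
    "fee_million_eur": [12.0, 15.0, np.nan, 14.0, 13.0, 220.0],
})

report = missing_report(transfers)

# Drop missing fees before computing quartiles, then map flags back to players.
fees = transfers["fee_million_eur"].dropna()
flagged = fees[iqr_outlier_mask(fees)]
outliers = transfers.loc[flagged.index, "player"].tolist()

print(report)
print(outliers)
```

Whether a flagged row should be dropped, capped, or kept depends on the data; a record-breaking transfer fee is a legitimate value, not a measurement error.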

2. University Rankings Explorer: A Global Analysis

  • Description: Analyzed world university rankings to practice Python data analysis libraries and to explore how universities are ranked worldwide. The analysis draws on the World University Ranking and the Shanghai Ranking, then compares the rankings across selected countries.
  • Technologies: Python, Pandas, NumPy, Matplotlib, Seaborn
  • Repository: https://github.com/Sumitkumar005/Projects

Machine Learning Projects

1. Breast Cancer Classification using Gradient Boosting

  • Description: An end-to-end ML pipeline for classifying breast cancer data using Gradient Boosting Classifier.
  • Key Features: Data splitting, model training, hyperparameter tuning, and performance evaluation using ROC-AUC and confusion matrices.
  • Technologies: Python, Scikit-Learn, Pandas, NumPy, Matplotlib
  • Repository: https://github.com/Sumitkumar005/Projects
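
A minimal sketch of the steps listed above (splitting, training, hyperparameter tuning, and evaluation with ROC-AUC and a confusion matrix) is shown below. It assumes scikit-learn's bundled Wisconsin breast cancer dataset as a stand-in, since the project's actual data source is not stated here, and the small parameter grid is purely illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in dataset (assumption): scikit-learn's built-in breast cancer data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a stratified test set so class balance is preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative grid only; a real search would likely cover more parameters.
param_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out data.
proba = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, search.predict(X_test))

print("best params:", search.best_params_)
print("test ROC-AUC:", round(auc, 3))
print(cm)
```

Scoring the grid search on ROC-AUC keeps model selection aligned with the metric used for the final evaluation.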

2. Customer Churn Prediction & Other Predictive Models

  • Description: Multiple projects focusing on predictive analytics such as customer churn prediction, travel insurance prediction, and more.
  • Technologies: Python, Scikit-Learn, XGBoost, Pandas, Seaborn
  • Repository: GitHub Link

Deep Learning & Computer Vision Projects

1. CNN-based Image Recognition

  • Description: Designed and trained convolutional neural networks for image classification tasks.
  • Key Features: Model building, data augmentation, transfer learning.
  • Technologies: TensorFlow, Keras, OpenCV
  • Repository: GitHub Link

2. OCR & Real Estate Image Analysis

  • Description: Extracted text from property images using pytesseract and built models for real estate market analysis.
  • Technologies: Python, pytesseract, OpenCV
  • Repository: GitHub Link

Data Engineering Projects

1. Project 1

  • Description: An IoT smoke detection project in which data is emitted at high volume in a continuous, incremental manner, with low-latency processing as the goal. Apache Kafka processes the streaming data in real time, Apache Spark transforms it, and the result is loaded into SQL Server.
  • Technologies: Python, Apache Airflow, Apache Kafka, Apache Spark, SQL
  • Repository: https://github.com/Sumitkumar005/Projects

2. Project 2

  • Description: Currency exchange rates are extracted from websites that publish them; the data is generated continuously. Apache Kafka processes the streaming data in real time, with the tasks triggered by Apache Airflow. Apache Spark reads the data from Kafka and transforms it, and the result is loaded into PostgreSQL. Docker runs the application across multiple containers.
  • Technologies: Python, Apache Airflow, Apache Kafka, Apache Spark, PostgreSQL, Docker
  • Repository: https://github.com/Sumitkumar005/Projects
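
To make the transform stage of this pipeline concrete, here is a plain-Python stand-in for the kind of per-message cleaning that would sit between the Kafka topic and the database. It does not use Kafka or Spark, and the message schema (`base`, `quote`, `rate`, `ts`), the `RateRow` type, and the `transform` function are all hypothetical; the project's real message format is not described here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RateRow:
    """One cleaned exchange-rate record, ready to be written to a table."""
    base: str
    quote: str
    rate: float
    observed_at: datetime


def transform(raw: bytes) -> Optional[RateRow]:
    """Parse one raw message; return None for malformed or non-positive rates."""
    try:
        msg = json.loads(raw)
        rate = float(msg["rate"])
        row = RateRow(
            base=str(msg["base"]).upper(),
            quote=str(msg["quote"]).upper(),
            rate=rate,
            observed_at=datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc),
        )
    except (ValueError, KeyError, TypeError, OverflowError, OSError):
        return None
    return row if rate > 0 else None


# Hypothetical messages as they might arrive from a topic.
messages = [
    b'{"base": "usd", "quote": "eur", "rate": "0.92", "ts": 1700000000}',
    b"not json",
    b'{"base": "usd", "quote": "try", "rate": -1, "ts": 1700000000}',
]

rows = [row for row in map(transform, messages) if row is not None]
print(rows)
```

Returning `None` instead of raising keeps one bad record from stopping a continuous stream; a production job would typically also log or route rejected messages somewhere for inspection.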

3. Project 3

  • Description: A US dollar exchange rates table and a table of percentage changes over the last 24 hours are extracted from a website and loaded into a MinIO bucket using Python; this data is also generated continuously. Apache Kafka processes the streaming data in real time, with the tasks triggered by Apache Airflow. Apache Spark reads the data from Kafka and transforms it, and the result is loaded into Apache Cassandra. Docker runs the application across multiple containers.
  • Technologies: Python, MinIO, Apache Airflow, Apache Kafka, Apache Spark, Apache Cassandra, Docker
  • Repository: https://github.com/Sumitkumar005/Projects

Generative AI Projects

1. AI Customer Support Platform with RAG Chatbot

  • Description: A platform integrating a discussion forum with an AI chatbot using Retrieval-Augmented Generation (RAG) to provide dynamic customer support.
  • Technologies: Python, Streamlit, OpenAI GPT-4o, FAISS, Qdrant, Mem0
  • Repository: GitHub Link
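
For readers unfamiliar with Retrieval-Augmented Generation, the retrieve-then-prompt flow can be sketched with the standard library alone. This is a toy illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store such as FAISS or Qdrant, and `forum_posts`, `retrieve`, and `build_prompt` are hypothetical names, not the platform's actual code.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical forum content standing in for the indexed knowledge base.
forum_posts = [
    "reset your password from the account settings page",
    "refunds are processed within five business days",
    "the mobile app supports dark mode",
]

question = "how do I reset my password"
top = retrieve(question, forum_posts, k=1)
prompt = build_prompt(question, forum_posts, k=1)
print(prompt)
```

The generation step is deliberately left out: the prompt built here is what would be passed to the chat model, which answers from the retrieved context instead of from its training data alone.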

2. AI Lead Generation & Competitor Intelligence Agents

  • Description: Projects that leverage GPT models and external APIs to generate leads and analyze competitor data, automating business insights.
  • Technologies: Python, Streamlit, Firecrawl, Composio, OpenAI GPT-4o
  • Repository: GitHub Link

AI Agent Systems

1. AI Recruitment Agent Team

  • Description: Simulates a full-service recruitment process using specialized AI agents for resume analysis, candidate communication, and interview scheduling.
  • Key Features: Gmail and Zoom API integration, asynchronous multi-agent collaboration.
  • Technologies: Python, Streamlit, OpenAI GPT-4o, Phidata, EmailTools
  • Repository: GitHub Link

2. AI Services Agency

  • Description: A digital agency simulation where multiple AI agents collaborate to analyze and plan software projects.
  • Key Features: Asynchronous communication, role-specific agents (CEO, CTO, Product Manager, Developer, Client Success).
  • Technologies: Python, Streamlit, OpenAI GPT-4o, Custom Analysis Tools
  • Repository: GitHub Link

3. AI Real Estate Agent

  • Description: Automates property search and market analysis by aggregating data from multiple real estate websites and providing intelligent insights.
  • Key Features: Multi-source data integration, location trend analysis, GPT-powered recommendations.
  • Technologies: Python, Streamlit, Firecrawl, OpenAI GPT-4o
  • Repository: GitHub Link

4. AI Competitor Intelligence Agent Team

  • Description: Analyzes competitor websites to generate actionable business insights using a multi-agent system.
  • Key Features: Data extraction, competitive analysis, and structured reporting.
  • Technologies: Python, Streamlit, Firecrawl, Exa AI, OpenAI GPT-4o
  • Repository: GitHub Link

5. AI Lead Generation Agent

  • Description: Automates lead generation by extracting and processing data from Quora, integrating with Google Sheets for streamlined data presentation.
  • Key Features: Targeted search, intelligent data extraction, and customizable criteria.
  • Technologies: Python, Streamlit, Firecrawl, Composio, OpenAI GPT-4o
  • Repository: GitHub Link

Certifications & Education

Education:

  • [Your University], [Your Degree], [Years Attended]
  • [Additional relevant training or bootcamps]

Certifications:

  • [Certification Name] (e.g., Google Cloud Professional Data Engineer)
  • [Certification Name] (e.g., AWS Certified Solutions Architect – Professional)
  • [Other Certifications]

Contact Information

  • Email: [Your Email Address]
  • LinkedIn: [Your LinkedIn URL]
  • GitHub: [Your GitHub URL]
  • Portfolio Website: [Your Website URL, if applicable]

Thank you for reviewing my portfolio. I look forward to discussing how I can contribute to your team with my broad and deep expertise in data and AI technologies.


Note: This is an archived portfolio that highlights my skills and project progression. For the most up-to-date work, please refer to my GitHub profile or contact me directly.
