Welcome to my project portfolio! I am a passionate engineer with expertise in Data Analysis, Machine Learning, Deep Learning, Computer Vision, Data Engineering, Generative AI, and AI Agent systems. My work ranges from building robust data pipelines and predictive models to developing intelligent multi-agent systems for real-world applications.
- Based: [Bijnor, India]
- Current Role: [ML Intern]
- Contact: [+91 7503112577] | [Sumitkumar969074@gmail.com]
- LinkedIn: [https://www.linkedin.com/in/sumit-kumar-02a145239/]
- GitHub: [https://github.com/Sumitkumar005]
Hello! My name is Sumit Kumar, and I am a tech enthusiast with a background in software development and a passion for data and AI. My journey started in software engineering and evolved into data science and AI engineering. I excel at constructing scalable data pipelines, developing machine learning models, and building innovative AI systems to solve complex problems. This portfolio highlights my technical skills, projects, and progression in these fields.
- Data Analysis Projects
- Machine Learning Projects
- Deep Learning & Computer Vision Projects
- Data Engineering Projects
- Generative AI Projects
- AI Agent Systems
- Additional Projects
- Certifications & Education
- Contact Information
- Description: Developed end-to-end pipelines for data cleaning, feature engineering, and exploratory analysis using Python, Pandas, NumPy, and Matplotlib/Seaborn.
- Key Features: Missing value detection, outlier handling, and dynamic dashboards.
- Technologies: Python, Pandas, NumPy, Matplotlib, Seaborn
- Repository: https://github.com/Sumitkumar005/Projects
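
The missing-value detection and outlier handling listed above can be sketched as follows. This is a minimal illustration, not the project's code: it uses only the Python standard library so it runs without dependencies, whereas the project itself uses Pandas and NumPy. The `price` column, the sample values, the median fill, and the 1.5 × IQR clipping rule are all assumptions chosen for the example.

```python
import statistics


def count_missing(rows, columns):
    """Count missing (None) values per column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_column(rows, col):
    """Fill missing values with the median, then clip outliers to the fences."""
    present = [row[col] for row in rows if row.get(col) is not None]
    median = statistics.median(present)
    low, high = iqr_bounds(present)
    cleaned = []
    for row in rows:
        value = row.get(col)
        if value is None:
            value = median  # assumption: median imputation
        value = min(max(value, low), high)  # assumption: clip rather than drop
        cleaned.append({**row, col: value})
    return cleaned


# Hypothetical sample data: one missing value and one obvious outlier.
values = [10.0, 12.0, None, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 500.0]
rows = [{"price": v} for v in values]

missing = count_missing(rows, ["price"])
cleaned = clean_column(rows, "price")
```

Clipping keeps every row, which suits dashboards that count records; dropping outlier rows instead is an equally reasonable choice when the extreme values are known to be errors.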
- Description: Analysed world university rankings to practise Python data analysis libraries and to explore how universities are ranked worldwide. The World University Ranking and the Shanghai Ranking were used as data sources, and the rankings were then compared across selected countries.
- Technologies: Python, Pandas, NumPy, Matplotlib, Seaborn
- Repository: https://github.com/Sumitkumar005/Projects
- Description: An end-to-end ML pipeline for classifying breast cancer data using Gradient Boosting Classifier.
- Key Features: Data splitting, model training, hyperparameter tuning, and performance evaluation using ROC-AUC and confusion matrices.
- Technologies: Python, Scikit-Learn, Pandas, NumPy, Matplotlib
- Repository: https://github.com/Sumitkumar005/Projects
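
To show what the two evaluation metrics named above measure, here is a small plain-Python sketch. The project itself relies on Scikit-Learn for evaluation; this re-implementation only illustrates the definitions. The labels, scores, and the 0.5 decision threshold are made-up values, not results from the project.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0/1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (not project results).
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The confusion matrix depends on the chosen threshold, while ROC-AUC summarises ranking quality across all thresholds, which is why the two are usually reported together.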
- Description: Multiple projects focusing on predictive analytics such as customer churn prediction, travel insurance prediction, and more.
- Technologies: Python, Scikit-Learn, XGBoost, Pandas, Seaborn
- Repository: GitHub Link
- Description: Designed and trained convolutional neural networks for image classification tasks.
- Key Features: Model building, data augmentation, transfer learning.
- Technologies: TensorFlow, Keras, OpenCV
- Repository: GitHub Link
- Description: Extracted text from property images using pytesseract and built models for real estate market analysis.
- Technologies: Python, pytesseract, OpenCV
- Repository: GitHub Link
- Description: An IoT smoke detection project in which data was emitted at high volume in a continuous, incremental manner, with the goal of low-latency processing. Apache Kafka was used to process the streaming data in real time; the data was then transformed by Apache Spark and finally loaded into SQL Server.
- Technologies: Python, Apache Airflow, Apache Kafka, Apache Spark, SQL
- Repository: https://github.com/Sumitkumar005/Projects
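
The stream, transform, load flow described above can be sketched in miniature. This is an assumption-heavy stand-in rather than the project's implementation: a Python generator plays the role of the Kafka topic, a plain function plays the role of the Spark transformation, and an in-memory SQLite database stands in for SQL Server. The field names, sample readings, and the `ALARM_PPM` threshold are hypothetical.

```python
import json
import sqlite3
from itertools import islice

ALARM_PPM = 50  # hypothetical alarm threshold


def sensor_stream():
    """Stand-in for a Kafka topic: yields JSON-encoded readings one at a time."""
    readings = [
        {"sensor_id": "s1", "temperature_c": 21.5, "smoke_ppm": 3},
        {"sensor_id": "s2", "temperature_c": 48.0, "smoke_ppm": 120},
        {"sensor_id": "s1", "temperature_c": 21.7, "smoke_ppm": None},
        {"sensor_id": "s3", "temperature_c": 22.1, "smoke_ppm": 8},
        {"sensor_id": "s2", "temperature_c": 35.2, "smoke_ppm": 40},
    ]
    for reading in readings:
        yield json.dumps(reading)


def transform(message):
    """Stand-in for the Spark step: parse, drop invalid rows, flag alarms."""
    record = json.loads(message)
    if record.get("smoke_ppm") is None:
        return None  # skip readings without a smoke measurement
    alarm = int(record["smoke_ppm"] >= ALARM_PPM)
    return (record["sensor_id"], record["temperature_c"], record["smoke_ppm"], alarm)


def run_pipeline(stream, conn, batch_size=2):
    """Consume the stream in micro-batches and load each batch into the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, temperature_c REAL, smoke_ppm REAL, alarm INTEGER)"
    )
    iterator = iter(stream)
    loaded = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        rows = [row for row in map(transform, batch) if row is not None]
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
        conn.commit()
        loaded += len(rows)
    return loaded


conn = sqlite3.connect(":memory:")  # stand-in for SQL Server
loaded = run_pipeline(sensor_stream(), conn)
alarms = conn.execute("SELECT COUNT(*) FROM readings WHERE alarm = 1").fetchone()[0]
```

Processing in small batches is one common way to balance latency against per-write overhead; the batch size of 2 here is only for demonstration.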
- Description: Data was extracted from websites that hold currency exchange rates, and it was generated continuously. Apache Kafka was used to process the streaming data in real time. These tasks were triggered by Apache Airflow. Data from Apache Kafka was read and transformed by Apache Spark, and finally loaded into PostgreSQL. Docker was used to run the application in multiple containers.
- Technologies: Python, Apache Airflow, Apache Kafka, Apache Spark, PostgreSQL, Docker
- Repository: https://github.com/Sumitkumar005/Projects
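
The orchestration idea above, where a scheduler triggers each stage only after the stage it depends on has finished, can be illustrated with the standard library's `graphlib`. This is a sketch of dependency-ordered execution in general, not an Airflow DAG and not the project's code. The task names, the shared `ctx` dictionary, and the sample rates are hypothetical, and each task body is a placeholder for the real scrape, Kafka, Spark, and PostgreSQL steps.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def scrape_rates(ctx):
    # Placeholder for the web extraction step; values are made up.
    ctx["raw"] = [("USD", "EUR", "0.92"), ("USD", "INR", "83.10")]


def publish_to_kafka(ctx):
    # Placeholder for producing messages to a Kafka topic.
    ctx["topic"] = list(ctx["raw"])


def spark_transform(ctx):
    # Placeholder for the Spark job: cast the rate strings to floats.
    ctx["rows"] = [(base, quote, float(rate)) for base, quote, rate in ctx["topic"]]


def load_to_postgres(ctx):
    # Placeholder for the database load: just record how many rows arrived.
    ctx["loaded"] = len(ctx["rows"])


TASKS = {
    "scrape_rates": scrape_rates,
    "publish_to_kafka": publish_to_kafka,
    "spark_transform": spark_transform,
    "load_to_postgres": load_to_postgres,
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "scrape_rates": set(),
    "publish_to_kafka": {"scrape_rates"},
    "spark_transform": {"publish_to_kafka"},
    "load_to_postgres": {"spark_transform"},
}


def run_dag():
    """Run every task in an order that respects the declared dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        TASKS[name](ctx)
    return order, ctx


order, ctx = run_dag()
```

Declaring dependencies separately from task bodies is the design choice that lets an orchestrator retry or reschedule one stage without touching the others.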
### 3. Project 3
- Description: A US Dollar exchange rates table and a table of percentage changes over the last 24 hours were extracted from a website and loaded to a MinIO bucket using Python. This data was also generated continuously. Apache Kafka was used to process the streaming data in real time. These tasks were triggered by Apache Airflow. Data from Apache Kafka was read and transformed by Apache Spark, and finally loaded into Apache Cassandra. Docker was used to run the application in multiple containers.
- Technologies: Python, MinIO, Apache Airflow, Apache Kafka, Apache Spark, Apache Cassandra, Docker
- Repository: https://github.com/Sumitkumar005/Projects
- Description: A platform integrating a discussion forum with an AI chatbot using Retrieval-Augmented Generation (RAG) to provide dynamic customer support.
- Technologies: Python, Streamlit, OpenAI GPT-4o, FAISS, Qdrant, Mem0
- Repository: GitHub Link
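
The retrieval step of Retrieval-Augmented Generation can be sketched as below. This is a toy illustration under stated assumptions: word-count vectors and cosine similarity stand in for the learned embeddings and the vector store (FAISS or Qdrant) that the project lists, and the final call to the language model is omitted. The forum posts, the query, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble the augmented prompt that would be sent to the language model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical forum posts acting as the knowledge base.
posts = [
    "To reset your password, open Settings and choose Reset Password.",
    "Refunds are processed within five business days.",
    "You can export your data as CSV from the dashboard.",
]
query = "How do I reset my password?"
top = retrieve(query, posts, k=1)
prompt = build_prompt(query, posts, k=1)
```

Grounding the prompt in retrieved forum content is what lets the chatbot answer from the community's own posts instead of relying only on the model's general knowledge.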
- Description: Projects that leverage GPT models and external APIs to generate leads and analyze competitor data, automating business insights.
- Technologies: Python, Streamlit, Firecrawl, Composio, OpenAI GPT-4o
- Repository: GitHub Link
- Description: Simulates a full-service recruitment process using specialized AI agents for resume analysis, candidate communication, and interview scheduling.
- Key Features: Gmail and Zoom API integration, asynchronous multi-agent collaboration.
- Technologies: Python, Streamlit, OpenAI GPT-4o, Phidata, EmailTools
- Repository: GitHub Link
- Description: A digital agency simulation where multiple AI agents collaborate to analyze and plan software projects.
- Key Features: Asynchronous communication, role-specific agents (CEO, CTO, Product Manager, Developer, Client Success).
- Technologies: Python, Streamlit, OpenAI GPT-4o, Custom Analysis Tools
- Repository: GitHub Link
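
Asynchronous collaboration between role-specific agents, as described above, might be structured like this. It is a minimal sketch using `asyncio`, not the project's implementation: `run_agent` is a hypothetical stand-in that returns a canned string where the real system would call a language model, and the project brief is invented for the example.

```python
import asyncio

ROLES = ["CEO", "CTO", "Product Manager", "Developer", "Client Success"]


async def run_agent(role, brief):
    """Stand-in for an LLM-backed agent: returns a canned analysis string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return role, f"{role} analysis of: {brief}"


async def analyze_project(brief):
    """Run all role agents concurrently and collect their outputs by role."""
    results = await asyncio.gather(*(run_agent(role, brief) for role in ROLES))
    return dict(results)


# Hypothetical project brief.
report = asyncio.run(analyze_project("Build a booking app"))
```

Because `asyncio.gather` returns results in the order the coroutines were passed, the report keeps the role order even though the agents run concurrently.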
- Description: Automates property search and market analysis by aggregating data from multiple real estate websites and providing intelligent insights.
- Key Features: Multi-source data integration, location trend analysis, GPT-powered recommendations.
- Technologies: Python, Streamlit, Firecrawl, OpenAI GPT-4o
- Repository: GitHub Link
- Description: Analyzes competitor websites to generate actionable business insights using a multi-agent system.
- Key Features: Data extraction, competitive analysis, and structured reporting.
- Technologies: Python, Streamlit, Firecrawl, Exa AI, OpenAI GPT-4o
- Repository: GitHub Link
- Description: Automates lead generation by extracting and processing data from Quora, integrating with Google Sheets for streamlined data presentation.
- Key Features: Targeted search, intelligent data extraction, and customizable criteria.
- Technologies: Python, Streamlit, Firecrawl, Composio, OpenAI GPT-4o
- Repository: GitHub Link
Education:
- [Your University], [Your Degree], [Years Attended]
- [Additional relevant training or bootcamps]
Certifications:
- [Certification Name] (e.g., Google Cloud Professional Data Engineer)
- [Certification Name] (e.g., AWS Certified Solutions Architect – Professional)
- [Other Certifications]
- Email: [Your Email Address]
- LinkedIn: [Your LinkedIn URL]
- GitHub: [Your GitHub URL]
- Portfolio Website: [Your Website URL, if applicable]
Thank you for reviewing my portfolio. I look forward to discussing how I can contribute to your team with my broad and deep expertise in data and AI technologies.
Note: This is an archived portfolio that highlights my skills and project progression. For the most up-to-date work, please refer to my GitHub profile or contact me directly.