Skip to content

Latest commit

 

History

History
216 lines (179 loc) · 8.57 KB

week1_content.md

File metadata and controls

216 lines (179 loc) · 8.57 KB

Week 1: Foundations of LLM Software Development

January 6—January 12, 2024

Learning Objectives

By the end of this week, students will be able to:

  1. Understand the evolution and current state of LLM technology
  2. Analyze the implications of non-determinism in AI systems
  3. Implement basic LLM applications using Modal and OpenAI APIs
  4. Design appropriate development workflows for LLM applications

Session 1: Introduction to Modern LLM Development

Tuesday, January 7, 2024 (1:00 AM—3:00 AM GMT+1)

Key Topics

1. Evolution of Large Language Models

  • Historical development of language models
  • Breakthrough of transformer architecture
  • Scaling laws and their implications
  • Current state-of-the-art models and capabilities

2. Understanding Non-determinism in LLMs

  • Sources of variability in LLM outputs (Lou & Sun, 2024)
  • Statistical approaches to output validation (Tu et al., 2024)
  • Temperature and sampling strategies (Watson et al., 2024)
  • Deterministic pipelines for production systems
  • Reproducibility frameworks and best practices

References for this section:

  • Lou, J., & Sun, Y. (2024). Anchoring Bias in Large Language Models: An Experimental Study. arXiv preprint arXiv:2412.06593.
  • Tu, L., Meng, R., & Joty, S. (2024). Investigating Factuality in Long-Form Text Generation. arXiv preprint arXiv:2411.15993.
  • Watson, J., Góes, F., & Volpe, M. (2024). Are Frontier Large Language Models Suitable for Q&A in Science Centres? arXiv preprint arXiv:2412.05200.

3. Modern LLM Landscape

  • Survey of available models and their capabilities
  • Comparison of different approaches (OpenAI, Google, Anthropic)
  • Trade-offs between different model sizes and architectures
  • Latest developments in multi-modal capabilities

Required Readings

  1. Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
  2. Döll, M., Döhring, M., & Müller, A. (2024). Evaluating Gender Bias in Large Language Models. arXiv preprint arXiv:2411.09826.
  3. Xu, L., Zhao, S., Lin, Q., et al. (2024). Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study. arXiv preprint arXiv:2408.14438.

Supplementary Materials

  1. OpenAI API Documentation (2024) - Function Calling and Tool Use
  2. Modal Documentation (2024) - Serverless Deployment for AI Applications
  3. Google PaLM 2 Technical Report (2024)

Session 2: LLM Application Development Lifecycle

Thursday, January 9, 2024 (1:00 AM—3:00 AM GMT+1)

Key Topics

1. LLM-Specific Software Development Lifecycle

  • Evolution of AI Development Practices (Gong et al., 2024)
    • Traditional vs. LLM-based development cycles
    • Continuous evaluation patterns
    • Prompt version control strategies
  • Iterative Development with LLMs
    • Rapid prototyping methodologies
    • A/B testing frameworks
    • Feedback incorporation patterns
  • Advanced Integration Patterns
    • Microservices architecture for LLMs
    • API abstraction layers
    • Versioning strategies for prompts and models
  • Production Best Practices (Ventirozos et al., 2024)
    • CI/CD for LLM applications
    • Testing strategies for non-deterministic systems
    • Documentation requirements

References for this section:

  • Gong, X., Li, M., & Zhang, Y. (2024). Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs. arXiv preprint arXiv:2409.14866.
  • Ventirozos, F., Nteka, I., & Nandy, T. (2024). Shifting NER into High Gear: The Auto-AdvER Approach. arXiv preprint arXiv:2412.05655.
  • OpenAI. (2024). LLM Application Development Guide. OpenAI Documentation.

2. Development Environment Setup

  • Modal platform configuration
  • OpenAI API integration
  • Local development workflows
  • Version control considerations

3. Infrastructure and Deployment Basics

  • Modern GPU Architecture Requirements (NVIDIA, 2024)
    • H100 Tensor Core optimizations
    • Multi-GPU deployment patterns
    • Memory hierarchy considerations
  • Scaling and Performance (Ventirozos et al., 2024)
    • Distributed inference strategies
    • Load balancing techniques
    • Latency optimization
  • Security and Cost Management
    • API security patterns (OpenAI, 2024)
    • Resource utilization optimization
    • Cost-effective scaling strategies
  • Monitoring and Observability
    • Metrics collection frameworks
    • Performance profiling tools
    • Error tracking systems

References for this section:

  • NVIDIA. (2024). H100 Tensor Core GPU Architecture: Advancing the State of AI. NVIDIA Technical Documentation.
  • Ventirozos, F., Nteka, I., & Nandy, T. (2024). Shifting NER into High Gear: The Auto-AdvER Approach. arXiv preprint arXiv:2412.05655.
  • OpenAI. (2024). Production Best Practices: Security and Scaling. OpenAI Documentation.

Required Readings

  1. Gong, X., Li, M., Zhang, Y., et al. (2024). Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs. arXiv preprint arXiv:2409.14866.
  2. Gandhi, K., Lynch, Z., Fränken, J.P., et al. (2024). Human-like Affective Cognition in Foundation Models. arXiv preprint arXiv:2409.11733.
  3. Modal Platform Documentation (2024) - Production Deployment Guide

Supplementary Materials

  1. NVIDIA Hopper Architecture Documentation (2024)
  2. OpenAI Model Cards and System Cards (2024)
  3. Google Best Practices for LLM Application Development (2024)

Hands-on Activities

Lab 1: Production-Grade LLM Application Development

Build a robust question-answering system using Modal and OpenAI's APIs that implements industry best practices (OpenAI, 2024; Modal, 2024):

  1. Advanced API Integration

    • Secure API key rotation system
    • Intelligent rate limiting with backoff strategies
    • Comprehensive error handling (Gandhi et al., 2024)
    • Response validation framework
  2. Streaming and Processing

    • Efficient token streaming implementation
    • Real-time response validation
    • Output format enforcement
    • Caching with TTL management
  3. Production-Ready Features

    • Automated logging pipeline
    • Performance metrics collection
    • Cost optimization system
    • Health monitoring dashboard

Technical Stack Requirements:

  • Python 3.10+ with asyncio
  • Modal serverless deployment
  • OpenAI API (GPT-4 Turbo)
  • Redis for caching
  • Prometheus for metrics
  • Grafana for visualization

References:

  • OpenAI. (2024). Production System Design. OpenAI Documentation.
  • Modal. (2024). Enterprise Deployment Guide. Modal Documentation.
  • Gandhi, K., Lynch, Z., & Fränken, J.P. (2024). Human-like Affective Cognition in Foundation Models. arXiv preprint arXiv:2409.11733.

Lab 2: Development Environment Setup

Set up a complete development environment including:

  • Modal configuration
  • API key management
  • Local testing framework
  • Basic monitoring

Project Milestone 1

Due: January 12, 2024

Objective

Build and deploy a basic PDF query application that demonstrates understanding of:

  • LLM API integration
  • Proper error handling
  • Basic prompt engineering
  • Deployment using Modal

Requirements

  1. Implementation must include:

    • PDF text extraction
    • Proper chunking strategy
    • Effective prompt design
    • Basic error handling
    • Cost monitoring
  2. Documentation must include:

    • Architecture overview
    • Setup instructions
    • API documentation
    • Cost analysis

Evaluation Criteria

  • Code quality and organization (25%)
  • Implementation of best practices (25%)
  • Documentation quality (25%)
  • Error handling and robustness (25%)

Additional Resources

Technical Documentation

  1. OpenAI API Reference (2024)
  2. Modal Platform Documentation (2024)
  3. Google PaLM 2 API Documentation (2024)

Academic Papers

  1. Kaplan, J., McCandlish, S., Henighan, T., & Brown, T. B. (2020). Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
  2. Raffel, C., Shazeer, N., Roberts, A., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140), 1–67.
  3. Xu, R., & Li, G. (2024). A Comparative Study of Offline Models and Online LLMs in Fake News Detection. arXiv preprint arXiv:2409.03067.

Industry Resources

  1. OpenAI Engineering Blog (2024)
  2. Google AI Blog - PaLM 2 Architecture (2024)
  3. NVIDIA Developer Blog - GPU Architecture for LLMs (2024)

Discussion Topics

  1. How do different model architectures affect development practices?
  2. What are the implications of non-determinism for testing and validation?
  3. How do you balance cost, performance, and reliability in LLM applications?
  4. What are the key considerations when choosing between different LLM providers?

Assessment

  • Class Participation: 10%
  • Lab Assignments: 40%
  • Project Milestone: 50%