January 6–January 12, 2025
By the end of this week, students will be able to:
- Understand the evolution and current state of LLM technology
- Analyze the implications of non-determinism in AI systems
- Implement basic LLM applications using Modal and OpenAI APIs
- Design appropriate development workflows for LLM applications
Tuesday, January 7, 2025 (1:00 AM–3:00 AM GMT+1)
- Historical development of language models
- Breakthrough of transformer architecture
- Scaling laws and their implications
- Current state-of-the-art models and capabilities
- Sources of variability in LLM outputs (Lou & Sun, 2024)
- Statistical approaches to output validation (Tu et al., 2024)
- Temperature and sampling strategies (Watson et al., 2024)
- Deterministic pipelines for production systems
- Reproducibility frameworks and best practices
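To make the temperature and reproducibility topics concrete, here is a minimal sketch of temperature-scaled sampling over a toy vocabulary. The logits, the function names, and the seeding approach are illustrative assumptions, not any provider's implementation. Hosted APIs may still vary between calls even at temperature 0, so treat this as a model of the idea rather than a guarantee.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits, temperature, rng):
    """Pick one token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def sample_run(logits, temperature, seed, n):
    """Draw n tokens from a generator seeded once, so the whole run is repeatable."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]


# Hypothetical scores for a three-token vocabulary (an assumption for illustration).
logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, 0.5)  # mass concentrates on token 0
flat = softmax_with_temperature(logits, 2.0)   # mass spreads across tokens
greedy_run = sample_run(logits, 0, seed=1, n=5)
seeded_run = sample_run(logits, 1.0, seed=42, n=20)
```

Running `sample_run` twice with the same seed returns the same sequence, which is the property a deterministic pipeline relies on when it pins the sampling parameters.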
References for this section:
- Lou, J., & Sun, Y. (2024). Anchoring Bias in Large Language Models: An Experimental Study. arXiv preprint arXiv:2412.06593.
- Tu, L., Meng, R., & Joty, S. (2024). Investigating Factuality in Long-Form Text Generation. arXiv preprint arXiv:2411.15993.
- Watson, J., Góes, F., & Volpe, M. (2024). Are Frontier Large Language Models Suitable for Q&A in Science Centres? arXiv preprint arXiv:2412.05200.
- Survey of available models and their capabilities
- Comparison of different approaches (OpenAI, Google, Anthropic)
- Trade-offs between different model sizes and architectures
- Latest developments in multi-modal capabilities
- Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
- Döll, M., Döhring, M., & Müller, A. (2024). Evaluating Gender Bias in Large Language Models. arXiv preprint arXiv:2411.09826.
- Xu, L., Zhao, S., Lin, Q., et al. (2024). Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study. arXiv preprint arXiv:2408.14438.
- OpenAI API Documentation (2024) - Function Calling and Tool Use
- Modal Documentation (2024) - Serverless Deployment for AI Applications
- Google PaLM 2 Technical Report (2023)
Thursday, January 9, 2025 (1:00 AM–3:00 AM GMT+1)
- Evolution of AI Development Practices (Gong et al., 2024)
  - Traditional vs. LLM-based development cycles
  - Continuous evaluation patterns
  - Prompt version control strategies
- Iterative Development with LLMs
  - Rapid prototyping methodologies
  - A/B testing frameworks
  - Feedback incorporation patterns
- Advanced Integration Patterns
  - Microservices architecture for LLMs
  - API abstraction layers
  - Versioning strategies for prompts and models
- Production Best Practices (Ventirozos et al., 2024)
  - CI/CD for LLM applications
  - Testing strategies for non-deterministic systems
  - Documentation requirements
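One common way to test a non-deterministic system is to assert a property of the output over many trials instead of comparing against a single exact string. The sketch below assumes a hypothetical `fake_model` stand-in (no real API is called) and an arbitrary 80% pass threshold; both are illustrative choices, not course requirements.

```python
import random
import re


def fake_model(prompt, rng):
    """Hypothetical stand-in for an LLM call: returns one of several phrasings."""
    templates = [
        "The answer is 4.",
        "2 + 2 equals 4.",
        "4",
        "I think it is 5.",  # an occasional wrong answer
    ]
    weights = [0.4, 0.3, 0.25, 0.05]
    return rng.choices(templates, weights=weights, k=1)[0]


def contains_expected_number(output, expected):
    """Property check: the expected number appears somewhere in the output."""
    return expected in re.findall(r"\d+", output)


def pass_rate(prompt, check, trials, seed):
    """Run the model `trials` times and return the fraction of outputs passing `check`."""
    rng = random.Random(seed)  # seeded so the test itself is repeatable
    passed = sum(1 for _ in range(trials) if check(fake_model(prompt, rng)))
    return passed / trials


def is_four(output):
    return contains_expected_number(output, "4")


rate = pass_rate("What is 2 + 2?", is_four, trials=200, seed=0)

# The threshold is an assumption; a real suite would choose it from observed behavior.
THRESHOLD = 0.8
assert rate >= THRESHOLD, f"pass rate {rate:.2f} fell below {THRESHOLD}"
```

The check tolerates different wordings of a correct answer while still failing the suite if the share of wrong answers grows, which is the trade-off an exact-match test cannot express.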
References for this section:
- Gong, X., Li, M., Zhang, Y., et al. (2024). Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs. arXiv preprint arXiv:2409.14866.
- Ventirozos, F., Nteka, I., & Nandy, T. (2024). Shifting NER into High Gear: The Auto-AdvER Approach. arXiv preprint arXiv:2412.05655.
- OpenAI. (2024). LLM Application Development Guide. OpenAI Documentation.
- Modal platform configuration
- OpenAI API integration
- Local development workflows
- Version control considerations
- Modern GPU Architecture Requirements (NVIDIA, 2024)
  - H100 Tensor Core optimizations
  - Multi-GPU deployment patterns
  - Memory hierarchy considerations
- Scaling and Performance (Ventirozos et al., 2024)
  - Distributed inference strategies
  - Load balancing techniques
  - Latency optimization
- Security and Cost Management
  - API security patterns (OpenAI, 2024)
  - Resource utilization optimization
  - Cost-effective scaling strategies
- Monitoring and Observability
  - Metrics collection frameworks
  - Performance profiling tools
  - Error tracking systems
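As a small illustration of the metrics collection topic, the sketch below records per-call latency and error counts in memory and reports a mean and a 95th percentile. The `LatencyRecorder` class is a hypothetical helper written for this example; a production setup would typically export such measurements to a metrics system instead of keeping them in a list.

```python
import math
import statistics
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Hypothetical in-memory recorder for call latency and error counts."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.samples = []
        self.errors = 0

    @contextmanager
    def track(self):
        """Time the wrapped block; count it as an error if it raises."""
        start = self._clock()
        try:
            yield
        except Exception:
            self.errors += 1
            raise
        finally:
            self.samples.append(self._clock() - start)

    def summary(self):
        """Return count, errors, mean latency, and nearest-rank p95 latency."""
        if not self.samples:
            return {"count": 0, "errors": self.errors}
        ordered = sorted(self.samples)
        p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return {
            "count": len(ordered),
            "errors": self.errors,
            "mean": statistics.fmean(ordered),
            "p95": ordered[p95_index],
        }


recorder = LatencyRecorder()
with recorder.track():
    sum(range(1000))  # placeholder for a model call
report = recorder.summary()
```

Wrapping each model call in `recorder.track()` keeps timing logic out of the application code, and the injectable clock makes the recorder itself easy to unit test.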
References for this section:
- NVIDIA. (2024). H100 Tensor Core GPU Architecture: Advancing the State of AI. NVIDIA Technical Documentation.
- Ventirozos, F., Nteka, I., & Nandy, T. (2024). Shifting NER into High Gear: The Auto-AdvER Approach. arXiv preprint arXiv:2412.05655.
- OpenAI. (2024). Production Best Practices: Security and Scaling. OpenAI Documentation.
- Gong, X., Li, M., Zhang, Y., et al. (2024). Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs. arXiv preprint arXiv:2409.14866.
- Gandhi, K., Lynch, Z., Fränken, J.P., et al. (2024). Human-like Affective Cognition in Foundation Models. arXiv preprint arXiv:2409.11733.
- Modal Platform Documentation (2024) - Production Deployment Guide
- NVIDIA Hopper Architecture Documentation (2024)
- OpenAI Model Cards and System Cards (2024)
- Google Best Practices for LLM Application Development (2024)
Build a robust question-answering system using Modal and OpenAI's APIs that implements industry best practices (OpenAI, 2024; Modal, 2024):
- Advanced API Integration
  - Secure API key rotation system
  - Intelligent rate limiting with backoff strategies
  - Comprehensive error handling (Gandhi et al., 2024)
  - Response validation framework
- Streaming and Processing
  - Efficient token streaming implementation
  - Real-time response validation
  - Output format enforcement
  - Caching with TTL management
- Production-Ready Features
  - Automated logging pipeline
  - Performance metrics collection
  - Cost optimization system
  - Health monitoring dashboard
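For the rate limiting requirement, one widely used pattern is retrying with exponential backoff and random jitter. The sketch below is a starting point under stated assumptions: `RateLimitError` is a hypothetical exception standing in for whatever error your client library raises, and the delay parameters are arbitrary defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random):
    """Call func(), retrying on RateLimitError with capped, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads retries out so clients do not retry in lockstep.
            sleep(rng.uniform(0, cap))


def make_flaky(failures):
    """Build a test function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RateLimitError("slow down")
        return "ok"

    return flaky, state


# Record the delays instead of sleeping so the example runs instantly.
delays = []
flaky, state = make_flaky(failures=2)
result = call_with_backoff(flaky, sleep=delays.append)
```

Passing `sleep` and `rng` as parameters is a design choice that keeps the retry logic testable without real waiting; in the lab, `func` would wrap the actual API call.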
Technical Stack Requirements:
- Python 3.10+ with asyncio
- Modal serverless deployment
- OpenAI API (GPT-4 Turbo)
- Redis for caching
- Prometheus for metrics
- Grafana for visualization
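Before wiring up Redis, it can help to prototype the TTL caching behavior in memory. The sketch below is an assumption-laden stand-in, not a Redis client: `TTLCache` and `cache_key` are hypothetical helpers, and the model name is a placeholder.

```python
import hashlib
import time


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can advance time manually
        self._store = {}     # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # evict lazily on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def cache_key(model, prompt):
    """Derive a stable key from the model name and the exact prompt text."""
    return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()


# A fake clock makes expiry observable without waiting.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
key = cache_key("example-model", "What is in the PDF?")
cache.set(key, "cached answer")
fresh = cache.get(key)   # still within the TTL
now[0] = 61.0
stale = cache.get(key)   # past the TTL, so treated as a miss
```

Note that this sketch cannot distinguish a cached `None` from a miss, and that caching only makes sense for requests where a repeated answer is acceptable.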
References:
- OpenAI. (2024). Production System Design. OpenAI Documentation.
- Modal. (2024). Enterprise Deployment Guide. Modal Documentation.
- Gandhi, K., Lynch, Z., Fränken, J.P., et al. (2024). Human-like Affective Cognition in Foundation Models. arXiv preprint arXiv:2409.11733.
Set up a complete development environment including:
- Modal configuration
- API key management
- Local testing framework
- Basic monitoring
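For API key management, a reasonable baseline is to read the key from the environment and never print it in full. The sketch below assumes the key is exposed as the `OPENAI_API_KEY` environment variable; `load_api_key` and `redact` are hypothetical helper names, and how the variable gets set (a local shell export, a secret manager, a deployment platform's secret mechanism) is left to your setup.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Read the API key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; configure it before running")
    return key


def redact(key, visible=4):
    """Mask all but the last few characters so the key is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demonstration with a fake environment and a fake key (not a real credential).
fake_env = {"OPENAI_API_KEY": " sk-test-1234 "}
key = load_api_key(fake_env)
masked = redact(key)
```

Keeping the key out of source files also keeps it out of version control, which matters once the lab repository is shared.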
Due: January 12, 2025
Build and deploy a basic PDF query application that demonstrates understanding of:
- LLM API integration
- Proper error handling
- Basic prompt engineering
- Deployment using Modal
- Implementation must include:
  - PDF text extraction
  - Proper chunking strategy
  - Effective prompt design
  - Basic error handling
  - Cost monitoring
- Documentation must include:
  - Architecture overview
  - Setup instructions
  - API documentation
  - Cost analysis
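As a starting point for the chunking requirement, here is a minimal sketch of fixed-size chunking with overlap, so that text cut at a boundary also appears at the start of the next chunk. It counts characters for simplicity; the sizes are arbitrary assumptions, and a token-based or sentence-aware splitter may suit real PDFs better.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of up to chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny example so the overlap is easy to see.
example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
# example_chunks == ["abcd", "defg", "ghij"]
```

Each chunk begins with the last character of the previous one, which is the overlap; larger overlaps reduce the chance of splitting an answer across chunks at the cost of sending more text to the model.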
- Code quality and organization (25%)
- Implementation of best practices (25%)
- Documentation quality (25%)
- Error handling and robustness (25%)
- OpenAI API Reference (2024)
- Modal Platform Documentation (2024)
- Google PaLM 2 API Documentation (2024)
- Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., et al. (2020). Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
- Raffel, C., Shazeer, N., Roberts, A., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140), 1–67.
- Xu, R., & Li, G. (2024). A Comparative Study of Offline Models and Online LLMs in Fake News Detection. arXiv preprint arXiv:2409.03067.
- OpenAI Engineering Blog (2024)
- Google AI Blog - PaLM 2 Architecture (2024)
- NVIDIA Developer Blog - GPU Architecture for LLMs (2024)
- How do different model architectures affect development practices?
- What are the implications of non-determinism for testing and validation?
- How do you balance cost, performance, and reliability in LLM applications?
- What are the key considerations when choosing between different LLM providers?
- Class Participation: 10%
- Lab Assignments: 40%
- Project Milestone: 50%