feat(document-readers): Add GptRepo document reader module, issue: #281 #355
Core Features:
- Implement GptRepoDocumentReader for Git repository content processing
- Support file extension filtering and .gptignore patterns
- Add content concatenation with customizable preamble text
- Support custom file encoding with proper error handling
- Add comprehensive metadata extraction (file path, name, directory)
Test Coverage:
- Add unit tests for basic document reading
- Add tests for file filtering and encoding
- Add tests for metadata extraction
- Add tests for custom preamble text
Documentation:
- Add detailed README with usage examples
- Document API and configuration options
- Include best practices and error handling guidelines
BREAKING CHANGE: None
spring-javaformat apply
...xiv-document-reader/src/main/java/com/alibaba/cloud/ai/reader/arxiv/ArxivDocumentReader.java
The code quality is quite high; a few issues still need to be discussed or optimized.
… modules
- Translate class and method comments to English in ArxivSortCriterion
- Translate class and method comments to English in ArxivSortOrder
- Translate class, field and method comments to English in ArxivResult
- Translate class and method comments to English in ArxivClient
- Translate inline comments to English in ArxivDocumentReader
- Translate class and method comments to English in GptRepoDocumentReader
- Translate test comments to English in GptRepoDocumentReaderTest
This change improves code readability and maintains consistency in documentation.
        return documents;
    }

}
import java.nio.file.Path;

/**
 * arXiv资源类,用于管理查询和资源访问 (arXiv resource class for managing queries and resource access)
Please translate all of these into English.
LGTM, thanks
Add GptRepo Document Reader Module
Description
Add GptRepo document reader module for reading and processing Git repository content. This module converts repository files into structured document format for AI processing and analysis.
Key Features
Basic Features
Support recursive reading of Git repository content
Support file extension filtering
Support file exclusion via .gptignore
Support content concatenation or separate processing
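The basic features above can be sketched with Java NIO and the Stream API. This is a minimal illustration, not the module's actual code: the class `RepoFileCollector`, its method names, and the assumption that `.gptignore` holds one glob pattern per line (with `#` comments) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not a class from this PR.
final class RepoFileCollector {

    // Assumption: .gptignore contains one glob per line; blank lines and '#' comments are skipped.
    static List<PathMatcher> loadIgnoreMatchers(Path repoRoot) {
        Path ignoreFile = repoRoot.resolve(".gptignore");
        if (!Files.isRegularFile(ignoreFile)) {
            return List.of();
        }
        try (Stream<String> lines = Files.lines(ignoreFile)) {
            return lines.map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .map(pattern -> FileSystems.getDefault().getPathMatcher("glob:" + pattern))
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the repository recursively and returns matching files as paths relative to the root.
    // An empty extension set means "accept every extension".
    static List<Path> collect(Path repoRoot, Set<String> extensions) {
        List<PathMatcher> ignored = loadIgnoreMatchers(repoRoot);
        try (Stream<Path> paths = Files.walk(repoRoot)) {
            return paths.filter(Files::isRegularFile)
                .map(repoRoot::relativize)
                .filter(rel -> !rel.getFileName().toString().equals(".gptignore"))
                .filter(rel -> extensions.isEmpty() || extensions.contains(extensionOf(rel)))
                .filter(rel -> ignored.stream().noneMatch(matcher -> matcher.matches(rel)))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the text after the last dot in the file name, or "" if there is none.
    static String extensionOf(Path path) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }
}
```

With this sketch, `RepoFileCollector.collect(root, Set.of("java"))` would return only `.java` files that no ignore pattern matches. Note that a glob `*` does not cross directory boundaries, so a pattern such as `build/**` is needed to exclude a whole subtree.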
Advanced Features
Support custom document preamble text
Support custom file encoding (default UTF-8)
Provide rich file metadata
Maintain directory structure
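One way the preamble, encoding, and metadata features could fit together is sketched here. Everything in it is an assumption made for illustration: the `RepoDocumentSketch` and `RepoDocument` types, the `----` section separator, the metadata keys (`file_path`, `file_name`, `directory`, `source`), and the choice to prepend the preamble to every separate document. The real module returns its own document type, which is not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; type names, separator, and metadata keys are assumptions.
final class RepoDocumentSketch {

    // Minimal stand-in for a document: text plus metadata.
    static final class RepoDocument {
        final String text;
        final Map<String, Object> metadata;

        RepoDocument(String text, Map<String, Object> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    // Builds one combined document (concatenate = true) or one document per file.
    static List<RepoDocument> read(Path repoRoot, List<Path> relativeFiles, Charset charset,
            String preamble, boolean concatenate) {
        List<RepoDocument> documents = new ArrayList<>();
        StringBuilder combined = new StringBuilder(preamble);
        for (Path rel : relativeFiles) {
            String content = readFile(repoRoot.resolve(rel), charset);
            // Keep the relative path in the text so the directory structure stays visible.
            String section = "----\n" + rel + "\n" + content + "\n";
            if (concatenate) {
                combined.append(section);
            } else {
                Map<String, Object> metadata = new LinkedHashMap<>();
                metadata.put("file_path", rel.toString());
                metadata.put("file_name", rel.getFileName().toString());
                metadata.put("directory", rel.getParent() == null ? "" : rel.getParent().toString());
                documents.add(new RepoDocument(preamble + section, metadata));
            }
        }
        if (concatenate) {
            Map<String, Object> metadata = new LinkedHashMap<>();
            metadata.put("source", repoRoot.toString());
            documents.add(new RepoDocument(combined.toString(), metadata));
        }
        return documents;
    }

    // Files.readString is strict: bytes that are invalid in the charset raise an IOException
    // subclass, which is rethrown here with the offending file named.
    static String readFile(Path file, Charset charset) {
        try {
            return Files.readString(file, charset);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read " + file + " as " + charset, e);
        }
    }
}
```

Passing `StandardCharsets.UTF_8` as the charset would match the default stated above. A strict decoder is used so that a wrong encoding fails loudly instead of silently producing replacement characters.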
Metadata Support
Technical Implementation
Implement DocumentReader interface
Use Java NIO for file operations
Use Stream API for file collection processing
Adopt Builder pattern for search criteria
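The PR description does not show which fields the search criteria carry, so the following is only a generic sketch of the Builder pattern named above. The class `SearchCriteria` and its fields (`extensions`, `charset`, `concatenate`) are hypothetical; only the UTF-8 default comes from the feature list.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical immutable criteria object built through a fluent builder.
final class SearchCriteria {

    private final Set<String> extensions;
    private final Charset charset;
    private final boolean concatenate;

    private SearchCriteria(Builder builder) {
        this.extensions = Set.copyOf(builder.extensions); // defensive, unmodifiable copy
        this.charset = builder.charset;
        this.concatenate = builder.concatenate;
    }

    static Builder builder() {
        return new Builder();
    }

    Set<String> extensions() {
        return extensions;
    }

    Charset charset() {
        return charset;
    }

    boolean concatenate() {
        return concatenate;
    }

    static final class Builder {

        private final Set<String> extensions = new LinkedHashSet<>();
        private Charset charset = StandardCharsets.UTF_8; // default encoding per the feature list
        private boolean concatenate = false;

        Builder extension(String extension) {
            extensions.add(Objects.requireNonNull(extension));
            return this;
        }

        Builder charset(Charset charset) {
            this.charset = Objects.requireNonNull(charset);
            return this;
        }

        Builder concatenate(boolean concatenate) {
            this.concatenate = concatenate;
            return this;
        }

        SearchCriteria build() {
            return new SearchCriteria(this);
        }
    }
}
```

A caller would write something like `SearchCriteria.builder().extension("java").concatenate(true).build()`. The built object is immutable, so it can be shared between readers without copying.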
Test Coverage
Documentation