-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Fix github api pagination #247
Conversation
WalkthroughThe pull request refactors the GithubService class’s get_repos_for_user method by converting it from synchronous to asynchronous. The method now uses aiohttp to fetch GitHub installations and repositories concurrently, implementing pagination for both data sets. It gathers installations in batches and iterates through repository pages using the "next" link in response headers. Enhanced error handling logs issues and raises appropriate HTTP exceptions for failed requests. Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant G as GithubService
participant API as GitHub API
C->>G: get_repos_for_user(user_id)
G->>API: GET /installations?page=1
API-->>G: Returns headers (total pages) and data
G->>API: [Async] GET /installations?page=2...N
API-->>G: Aggregated installations data
loop For each installation
G->>API: GET /repos (with pagination)
API-->>G: Repositories data or next link info
end
G-->>C: Aggregated installations & repositories
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/modules/code_provider/github/github_service.py (1)
192-254
: Streamline the pagination loop for better PEP 8 compliance and readability.
While your pagination logic is well-structured, some list comprehensions and logging statements are triggering the pipeline’s PEP 8 style warnings. Consider splitting the line comprehension for page_urls, and ensure that logging statements meet the pipeline’s formatting guidelines.Apply something similar to the following diff to address the pipeline warnings:
- page_urls = [f"{base_url}?page={page}" for page in range(2, last_page + 1)] + page_urls = [ + f"{base_url}?page={page}" + for page in range(2, last_page + 1) + ] - logger.info(f"Fetched {len(installations)} installations from {url}") + logger.info( + f"Fetched {len(installations)} installations from {url}" + )🧰 Tools
🪛 GitHub Actions: Pre-commit
[warning] 203-205: Code does not adhere to PEP 8 standards. The logging statement should be formatted using f-strings.
[warning] 228-230: Code does not adhere to PEP 8 standards. The list comprehension should be split into multiple lines for better readability.
[warning] 238-240: Code does not adhere to PEP 8 standards. The logging statement should be formatted using f-strings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/modules/code_provider/github/github_service.py
(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app/modules/code_provider/github/github_service.py
273-273: Local variable github
is assigned to but never used
Remove assignment to unused variable github
(F841)
🪛 GitHub Actions: Pre-commit
app/modules/code_provider/github/github_service.py
[warning] 203-205: Code does not adhere to PEP 8 standards. The logging statement should be formatted using f-strings.
[warning] 228-230: Code does not adhere to PEP 8 standards. The list comprehension should be split into multiple lines for better readability.
[warning] 238-240: Code does not adhere to PEP 8 standards. The logging statement should be formatted using f-strings.
[warning] 254-256: Code does not adhere to PEP 8 standards. The logging statement should be formatted using f-strings.
🔇 Additional comments (1)
app/modules/code_provider/github/github_service.py (1)
8-8
: Import looks good.
No issues with introducing aiohttp for asynchronous operations.
@@ -217,20 +266,35 @@ async def get_repos_for_user(self, user_id: str): | |||
elif account_type == "Organization" and account_login in org_logins: | |||
user_installations.append(installation) | |||
|
|||
|
|||
repos = [] | |||
for installation in user_installations: | |||
app_auth = auth.get_installation_auth(installation["id"]) | |||
github = Github(auth=app_auth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused variable.
'github' is assigned to but never used, which might confuse readers and violate lint checks.
- github = Github(auth=app_auth)
repos_url = installation["repositories_url"]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
github = Github(auth=app_auth) | |
repos_url = installation["repositories_url"] |
🧰 Tools
🪛 Ruff (0.8.2)
273-273: Local variable github
is assigned to but never used
Remove assignment to unused variable github
(F841)
|
||
# Handle pagination for repositories as well | ||
next_repos_url = repos_url | ||
while next_repos_url: | ||
repos_response = requests.get( | ||
next_repos_url, | ||
headers={"Authorization": f"Bearer {app_auth.token}"} | ||
) | ||
|
||
if repos_response.status_code == 200: | ||
repos.extend(repos_response.json().get("repositories", [])) | ||
|
||
# Check for next page in Link header | ||
next_repos_url = None | ||
if "links" in repos_response.__dict__: | ||
for link in repos_response.links.values(): | ||
if link.get("rel") == "next": | ||
next_repos_url = link.get("url") | ||
break | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using aiohttp for repository pagination to maintain consistency.
You’re performing synchronous GET requests inside an async method, which can block the event loop. Switch to the existing aiohttp session (or open another context) for fetching paginated repositories to leverage concurrency benefits.
Example approach:
- repos_response = requests.get(
- next_repos_url,
- headers={"Authorization": f"Bearer {app_auth.token}"}
- )
+ async with session.get(
+ next_repos_url,
+ headers={"Authorization": f"Bearer {app_auth.token}"}
+ ) as response:
+ if response.status == 200:
+ data = await response.json()
+ repos.extend(data.get("repositories", []))
+ # Continue processing
Committable suggestion skipped: line range outside the PR's diff.
Summary by CodeRabbit