This project uses the GitHub API to analyze popular Java-based repositories and extract class names from .java files. The goal is to calculate the popularity of words used in class names across these repositories, helping to identify commonly used terms in Java development.
This program:
- Fetches Java-based repositories from GitHub using the GitHub API.
- Extracts class names from .java files within each repository.
- Calculates word popularity by breaking down class names into individual words.
- Displays the top 20 most popular words based on their frequency across class names.
This analysis provides insights into frequently used terms in Java class naming conventions across popular repositories on GitHub.
- Fetches the top Java repositories on GitHub based on star count.
- Dynamically adapts to each repository's default branch.
- Analyzes .java files to extract class names.
- Calculates and displays the most frequently used words in class names.
- Kotlin 1.5+ with Gradle (or your preferred build tool).
- GitHub API token with appropriate permissions (required to access repository data).
- Clone the Repository:
    git clone https://github.com/yourusername/JavaClassNamePopularityAnalyzer.git
    cd JavaClassNamePopularityAnalyzer
- Install Dependencies: Add the following dependencies to your build.gradle.kts:
    dependencies {
        implementation("com.squareup.okhttp3:okhttp:4.9.3")
        implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.3.3")
    }
- Set Up GitHub API Token:
  - Generate a GitHub personal access token by going to GitHub’s token settings.
  - Copy the token and replace "YOUR_GITHUB_TOKEN" in the code with your token.
- Build the Project:
    ./gradlew build
Run the program with:
    ./gradlew run
The program will fetch Java repositories, extract class names from .java files, and display the most popular words in those class names.
- searchJavaRepositories: Fetches popular Java repositories using the GitHub Search API. The repositories are sorted by star count in descending order.
- getLatestCommitSHA: Retrieves the latest commit SHA for the specified branch of a repository. It uses the repository's default branch, so it works regardless of how that branch is named.
- getJavaFiles: Retrieves the file tree for a repository and keeps only the .java files. These files are used to extract class names.
- getClassNamesFromJavaFile: For each .java file, retrieves the content using the GitHub API, decodes the Base64 content, and extracts class names using a regular expression. The regular expression captures the identifier that follows the class keyword.
- calculateWordPopularity: Splits each class name into words based on the CamelCase convention and counts how often each word occurs across all class names. The counts are stored in a map, from which the most popular terms are determined.
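The last two functions can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names extractClassNames, splitCamelCase, wordPopularity and topWords, as well as both regular expressions, are assumptions made for the example. The MIME Base64 decoder is used because the GitHub contents API returns Base64 text with embedded line breaks.

```kotlin
import java.util.Base64

// Assumed pattern: captures the identifier that follows the `class` keyword.
// It is deliberately naive and will also match the word "class" inside comments or strings.
private val CLASS_DECLARATION = Regex("""\bclass\s+([A-Za-z_]\w*)""")

// Assumed word pattern: an acronym run ("HTTP") or an optionally capitalised word ("Server").
private val CAMEL_CASE_WORD = Regex("""[A-Z]+(?![a-z])|[A-Z]?[a-z]+""")

/** Decodes Base64 file content (line breaks tolerated) and returns the class names found. */
fun extractClassNames(base64Content: String): List<String> {
    val source = String(Base64.getMimeDecoder().decode(base64Content), Charsets.UTF_8)
    return CLASS_DECLARATION.findAll(source).map { it.groupValues[1] }.toList()
}

/** Splits a CamelCase class name into lowercase words; digits are ignored. */
fun splitCamelCase(className: String): List<String> =
    CAMEL_CASE_WORD.findAll(className).map { it.value.lowercase() }.toList()

/** Counts how often each word occurs across all class names. */
fun wordPopularity(classNames: List<String>): Map<String, Int> =
    classNames.flatMap { splitCamelCase(it) }.groupingBy { it }.eachCount()

/** Returns the most frequent words, highest count first, ties broken alphabetically. */
fun topWords(popularity: Map<String, Int>, limit: Int = 20): List<Pair<String, Int>> =
    popularity.entries
        .sortedWith(compareByDescending<Map.Entry<String, Int>> { it.value }.thenBy { it.key })
        .take(limit)
        .map { it.key to it.value }

fun main() {
    // Stand-in for the Base64 payload a contents request would return.
    val encoded = Base64.getEncoder().encodeToString(
        "public class UserManager {}\nclass DataManager {}".toByteArray()
    )
    val names = extractClassNames(encoded)
    topWords(wordPopularity(names)).forEach { (word, count) -> println("$word: $count") }
}
```

Words are lowercased before counting so that "Manager" and "manager" fall into the same bucket, which matches the lowercase words in the sample output.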
- Search for Java Repositories: The program uses searchJavaRepositories to retrieve a list of Java repositories, limiting to the top 5 based on stars.
- Get Latest Commit SHA: For each repository, getLatestCommitSHA retrieves the SHA for the latest commit on the repository’s default branch.
- Retrieve and Process Java Files: Using getJavaFiles, the program fetches the .java files for each repository. Each file’s content is analyzed by getClassNamesFromJavaFile to extract class names.
- Calculate Word Popularity: After extracting class names, calculateWordPopularity breaks them down into individual words and calculates the frequency of each word, then displays the 20 most common words.
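The first three steps chain together roughly as follows. This is a sketch under stated assumptions: the Repo and GitHubSource types and the collectClassNames function are hypothetical, and the four lookups are passed in as function values so the flow can be run with stand-ins instead of real GitHub requests. The actual signatures in the program may differ.

```kotlin
// Hypothetical minimal view of a repository; the real program may hold more fields.
data class Repo(val fullName: String, val defaultBranch: String)

/** The four remote lookups, injected so the flow can run without network access. */
class GitHubSource(
    val searchJavaRepositories: (limit: Int) -> List<Repo>,
    val getLatestCommitSHA: (repo: Repo) -> String,
    val getJavaFiles: (repo: Repo, commitSha: String) -> List<String>,
    val getClassNamesFromJavaFile: (repo: Repo, path: String) -> List<String>,
)

/** Walks repositories, then .java files, and gathers every class name found. */
fun collectClassNames(source: GitHubSource, repoLimit: Int = 5): List<String> =
    source.searchJavaRepositories(repoLimit).flatMap { repo ->
        val sha = source.getLatestCommitSHA(repo)
        source.getJavaFiles(repo, sha).flatMap { path ->
            source.getClassNamesFromJavaFile(repo, path)
        }
    }

fun main() {
    // Placeholder data only; no request is sent.
    val fake = GitHubSource(
        searchJavaRepositories = { limit -> listOf(Repo("example/repo", "trunk")).take(limit) },
        getLatestCommitSHA = { "0000000" },
        getJavaFiles = { _, _ -> listOf("patterns/adapter/AdapterPattern.java") },
        getClassNamesFromJavaFile = { _, path ->
            listOf(path.substringAfterLast('/').removeSuffix(".java"))
        },
    )
    println(collectClassNames(fake))
}
```

The resulting list is what calculateWordPopularity would receive in the final step.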
After running the program, you’ll see output similar to:
Repository: iluwatar/java-design-patterns
Java File: patterns/adapter/AdapterPattern.java
Class: AdapterPattern
...
Word Popularity Score (Top 20):
manager: 85
data: 76
controller: 62
list: 60
service: 45
factory: 40
...
This output displays the top 20 most common words and their frequencies, based on the first 10,000 class names collected.
- Rate Limits: If making many requests in a short time, be mindful of GitHub’s rate limits. Using a personal access token increases your rate limits.
- Default Branch Handling: The program dynamically adapts to each repository’s default branch, which prevents errors if the default branch isn’t named main.
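One way to stay within the rate limit is to read the quota headers GitHub sends with each response: X-RateLimit-Remaining gives the number of requests left, and X-RateLimit-Reset gives the reset time in UTC epoch seconds. The helper below is a hypothetical sketch and is not part of the program described above; it only computes how long to pause and leaves the actual waiting to the caller.

```kotlin
/**
 * Returns how many seconds to pause before the next request, or 0 if quota remains
 * or the headers are missing. Header names are matched case-insensitively.
 */
fun secondsUntilReset(headers: Map<String, String>, nowEpochSeconds: Long): Long {
    val normalized = headers.mapKeys { it.key.lowercase() }
    val remaining = normalized["x-ratelimit-remaining"]?.toIntOrNull() ?: return 0L
    val resetAt = normalized["x-ratelimit-reset"]?.toLongOrNull() ?: return 0L
    return if (remaining > 0) 0L else maxOf(0L, resetAt - nowEpochSeconds)
}

fun main() {
    // Example header values, invented for illustration.
    val headers = mapOf("X-RateLimit-Remaining" to "0", "X-RateLimit-Reset" to "1700000060")
    println(secondsUntilReset(headers, nowEpochSeconds = 1_700_000_000L))
}
```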
This project is licensed under the MIT License. See the LICENSE file for details.