Skip to content

Parsing log files can be a tedious task, especially when dealing with complex log formats. The Log Parser aims to streamline this process by leveraging regular expressions to match and capture relevant fields from log entries. With the extracted data, users can perform further analysis, generate reports, or gain insights from their log files.

Notifications You must be signed in to change notification settings

ksn-developer/logbrain

Repository files navigation

LogBrain

The Log Parser is a Python tool that allows you to parse log files based on a specified log format. It extracts relevant log entries from a log file and provides methods to retrieve specific fields from the log entries.

Features

  • Parse log files based on a specified log format

  • Extract IP addresses, timestamps, request details, response statuses,bytes sent, referer URLs, and user agents from log entries

  • Generate a Pandas DataFrame from the parsed log entries

  • Keep track of any log entries that failed to parse

Requirements

  • Python 3.x
  • re (regular expression) module
  • pandas library

Usage

pip install -r requirements.txt

Instantiate the LogParser class:

log_parser = LogParser()

Parse a log file by calling the parse_log method and providing the path to the log file as an argument. This method returns a list of dictionaries, where each dictionary represents a parsed log entry.

entries = log_parser.parse_log(logfile_path)

Extract specific fields from the log entries using the provided methods, such as extract_ip, extract_time, extract_request, extract_status, extract_bytes_sent, extract_referer, and extract_user_agent.

ip = log_parser.extract_ip()
time = log_parser.extract_time()
request = log_parser.extract_request()
status = log_parser.extract_status()
bytes_sent = log_parser.extract_bytes_sent()
referer = log_parser.extract_referer()
user_agent = log_parser.extract_user_agent()

Generate a Pandas DataFrame from the parsed log entries using the to_df method.

  logs_df = log_parser.to_df(entries)

Total distinct HTTP status codes

status_count = logs_df["status"].value_counts().reset_index()
status_count.columns = ["status", "count"]
status_count

*see example.ipnb in notebooks directory*

Additional Notes

The log file can be read using the read_log_file method, which returns the contents of the log file as a list of strings.

The print_matches method can be used to display the matched fields and their values from a log entry.

Feel free to customize the code and adapt it to your specific use case. Happy log parsing!

About

Parsing log files can be a tedious task, especially when dealing with complex log formats. The Log Parser aims to streamline this process by leveraging regular expressions to match and capture relevant fields from log entries. With the extracted data, users can perform further analysis, generate reports, or gain insights from their log files.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published