SecurNet is a network security project that implements a Network Intrusion Detection System (NIDS) to strengthen network defenses. The project involves data preprocessing, feature selection, machine-learning-based log classification, and a Streamlit dashboard for insightful visualization of key metrics.
The project begins with the collection of network logs, which are sent to a Kafka topic named "logs" for initial preprocessing. The first Python file handles this task, preparing the data for feature selection.
A second Python file retrieves the preprocessed data from the "logs" Kafka topic, performs additional preprocessing, and sends the refined data to another Kafka topic named "logsprocessed."
The third Python file retrieves data from the "logsprocessed" Kafka topic. It passes the logs through a trained machine learning model to classify them into categories: Background, Normal, or Botnet. The results are then sent to the "logslabelled" Kafka topic.
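The consume → process → produce pattern shared by these middle stages looks roughly like the sketch below. It is a minimal illustration only: it assumes the kafka-python client, JSON-encoded log records, a joblib-saved scikit-learn model, and hypothetical feature/column names, not the project's actual code.

```python
# Sketch of the classification stage: consume from "logsprocessed",
# classify each log, and produce the result to "logslabelled".
# Model path, feature columns, and message format are assumptions.
import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "logsprocessed",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

model = joblib.load("Model/model.pkl")        # hypothetical model path
FEATURES = ["duration", "packets", "bytes"]   # hypothetical feature columns
LABELS = {0: "Background", 1: "Normal", 2: "Botnet"}

for message in consumer:
    log = message.value
    row = [[log[f] for f in FEATURES]]
    pred = model.predict(row)[0]
    log["label"] = LABELS.get(pred, str(pred))
    producer.send("logslabelled", log)
```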
Apache Pinot acts as a consumer, ingesting data from the "logslabelled" Kafka topic and storing it in a database. This ensures efficient storage and retrieval of labeled log data.
The final component is a Streamlit dashboard that fetches data from Apache Pinot. The dashboard displays key metrics and insights derived from the labeled log data. This visualization aids in better defending against network attacks by providing a real-time overview of network security.
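As a rough illustration of how the dashboard could pull data from Pinot, here is a minimal Streamlit sketch using the pinotdb client. The table name (taken from the transcript config files referenced in the setup steps), the "label" column, and the broker port are assumptions; the real app.py may query and plot differently.

```python
# Minimal dashboard sketch, assuming the pinotdb client, a Pinot table named
# "transcript", and a "label" column on the labelled logs.
import pandas as pd
import streamlit as st
from pinotdb import connect

# Connect to the Pinot broker (started on port 7001 in the setup steps below).
conn = connect(host="localhost", port=7001, path="/query/sql", scheme="http")
cursor = conn.cursor()
cursor.execute("SELECT label, COUNT(*) AS cnt FROM transcript GROUP BY label")
df = pd.DataFrame(cursor.fetchall(), columns=["label", "cnt"])

st.title("SecurNet - Network Traffic Overview")
st.bar_chart(df.set_index("label")["cnt"])
```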
To set up and run the SecurNet project, follow these steps:
- Clone the repository:
git clone https://github.com/yourusername/SecurNet.git
cd SecurNet
- Then download the required files from here: LINK, and move them into the SecurNet folder.
The preprocessing.py file cleans the raw log data and prepares it for training, producing a prepro.csv file. MLmodeltraining.py then uses this processed log data to train the model.
- First, run the preprocessing.py file.
- It will generate a CSV file in a folder named outprepro.
- Rename the generated CSV file to prepro.csv.
- Now run the MLmodeltraining.py file. This will save the trained model in the Model folder, ready to be used; a rough sketch of this step is shown below.
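This sketch assumes prepro.csv has a "Label" target column and that a scikit-learn classifier saved with joblib is acceptable; the actual MLmodeltraining.py may use a different model and column names.

```python
# Rough sketch of the training step: load prepro.csv, fit a classifier,
# and persist it to the Model folder. Column and file names are assumptions.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("prepro.csv")
X = df.drop(columns=["Label"])   # "Label" is an assumed target column name
y = df["Label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

joblib.dump(clf, "Model/model.pkl")  # hypothetical file name inside the Model folder
```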
Here we simulate log data arriving in real time by reading a CSV file of raw log data and sending it to Kafka in chunks of 10 rows.
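A minimal version of that simulation could look like the sketch below: read the raw-log CSV with pandas in 10-row chunks and publish each row to the "logs" topic. The file name, JSON encoding, and one-second pause are assumptions.

```python
# Sketch of the real-time simulation: stream a raw-log CSV to the "logs"
# topic in chunks of 10 rows. File name and pacing are illustrative.
import json
import time
import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for chunk in pd.read_csv("raw_logs.csv", chunksize=10):   # hypothetical file name
    for record in chunk.to_dict(orient="records"):
        producer.send("logs", record)
    producer.flush()
    time.sleep(1)   # pause to mimic logs arriving over time
```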
The flow of the log data can be seen below:
To run the project, follow the steps below.
NOTE: Run each of the following commands in a separate terminal.
- Run Apache ZooKeeper and Kafka in separate terminals, one after the other, with the following commands:
zookeeper-server-start /opt/homebrew/etc/zookeeper/zoo.cfg
kafka-server-start /opt/homebrew/etc/kafka/server.properties
- Create the Kafka topics "logs", "logsprocessed", and "logslabelled":
kafka-topics --create --topic logs --bootstrap-server localhost:9092
kafka-topics --create --topic logsprocessed --bootstrap-server localhost:9092
kafka-topics --create --topic logslabelled --bootstrap-server localhost:9092
- Start the Apache Pinot Controller, Broker, and Server:
pinot-admin StartController -zkAddress localhost:2181 -clusterName PinotCluster -controllerPort 9001
pinot-admin StartBroker -zkAddress localhost:2181 -clusterName PinotCluster -brokerPort 7001
pinot-admin StartServer -zkAddress localhost:2181 -clusterName PinotCluster -serverPort 8001 -serverAdminPort 8011
- Send the table schema and table config to Apache Pinot.
pinot-admin AddTable \
-schemaFile files_config/transcript_schema.json \
-tableConfigFile files_config/transcript_table_realtime.json \
-controllerPort 9001 -exec
- Start 0.py, 1.py, and 2.py in three separate terminals, one after the other.
- Open the Apache Pinot dashboard to see the data being ingested ----> Link
- Run the Streamlit app to see the dashboard:
streamlit run app.py