From e2f15bec68f446aa74a54bf6197292762af8d9de Mon Sep 17 00:00:00 2001 From: Tanmaya Panda <108695755+tanmaya-panda1@users.noreply.github.com> Date: Mon, 13 Jan 2025 16:29:49 +0530 Subject: [PATCH 1/3] Update azure_kusto.md for buffering commits Signed-off-by: Tanmaya Panda <108695755+tanmaya-panda1@users.noreply.github.com> --- pipeline/outputs/azure_kusto.md | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/pipeline/outputs/azure_kusto.md b/pipeline/outputs/azure_kusto.md index 19cf72157..a91870943 100644 --- a/pipeline/outputs/azure_kusto.md +++ b/pipeline/outputs/azure_kusto.md @@ -64,6 +64,19 @@ By default, Kusto will insert incoming ingestions into a table by inferring the | include_time_key | If enabled, a timestamp is appended to output. The key name is used `time_key` property. | `On` | | time_key | The key name of time. If `include_time_key` is false, This property is ignored. | `timestamp` | | workers | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `0` | +| buffering_enabled | _Optional_ - Enable buffering into disk before ingesting into Azure Kusto. | `Off` | +| buffer_dir | _Optional_ - When buffering is turned ON, specifies the location of directory where the buffered data will be stored. | `/tmp/fluent-bit/azure-kusto/` | +| upload_timeout | _Optional_ - When buffering is turned ON, specifies a timeout for uploads. Fluent Bit will start ingesting buffer files which have been created more than x minutes and haven't reached upload_file_size limit yet. | `30m` | +| upload_file_size | _Optional_ - When buffering is turned ON, specifies the size of files to be uploaded in MBs. | `200MB` | +| azure_kusto_buffer_key | _Optional_ - When buffering is turned ON, set the azure kusto buffer key which needs to be specified when using multiple instances of azure kusto output plugin and buffering is enabled. 
| `key` | +| store_dir_limit_size | _Optional_ - When buffering is turned ON, set the max size of the buffer directory. | `8GB` | +| buffer_file_delete_early | _Optional_ - When buffering is turned ON, whether to delete the buffered file early after successful blob creation. | `Off` | +| unify_tag | _Optional_ - This creates a single buffer file when the buffering mode is ON. | `On` | +| blob_uri_length | _Optional_ - Set the length of generated blob uri before ingesting to kusto. | `64` | +| scheduler_max_retries | _Optional_ - When buffering is turned ON, Set the maximum number of retries for ingestion using the scheduler. | `3` | +| use_imds | _Optional_ - Whether to use IMDS to retrieve oauth token. | `Off` | +| delete_on_max_upload_error | _Optional_ - When buffering is turned ON, Whether to delete the buffer file on maximum upload errors. | `Off` | +| io_timeout | _Optional_ - Configure the HTTP IO timeout for uploads. | `60s` | ### Configuration File @@ -80,6 +93,19 @@ Get started quickly with this configuration file: Database_Name Table_Name Ingestion_Mapping_Reference + buffering_enabled On + upload_timeout 2m + upload_file_size 125M + azure_kusto_buffer_key kusto1 + buffer_file_delete_early Off + unify_tag On + use_imds Off + buffer_dir /var/log/ + store_dir_limit_size 16GB + blob_uri_length 128 + scheduler_max_retries 3 + delete_on_max_upload_error Off + io_timeout 60s ``` ## Troubleshooting From 37abfb2dbcac49acd5aa5579f1486a4d3e6bca36 Mon Sep 17 00:00:00 2001 From: esmerel <6818907+esmerel@users.noreply.github.com> Date: Tue, 14 Jan 2025 08:52:41 -0800 Subject: [PATCH 2/3] Update pipeline/outputs/azure_kusto.md Signed-off-by: esmerel <6818907+esmerel@users.noreply.github.com> --- pipeline/outputs/azure_kusto.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/pipeline/outputs/azure_kusto.md b/pipeline/outputs/azure_kusto.md index a91870943..ab4866451 100644 --- a/pipeline/outputs/azure_kusto.md +++ 
b/pipeline/outputs/azure_kusto.md @@ -65,17 +65,17 @@ By default, Kusto will insert incoming ingestions into a table by inferring the | time_key | The key name of time. If `include_time_key` is false, This property is ignored. | `timestamp` | | workers | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `0` | | buffering_enabled | _Optional_ - Enable buffering into disk before ingesting into Azure Kusto. | `Off` | -| buffer_dir | _Optional_ - When buffering is turned ON, specifies the location of directory where the buffered data will be stored. | `/tmp/fluent-bit/azure-kusto/` | -| upload_timeout | _Optional_ - When buffering is turned ON, specifies a timeout for uploads. Fluent Bit will start ingesting buffer files which have been created more than x minutes and haven't reached upload_file_size limit yet. | `30m` | -| upload_file_size | _Optional_ - When buffering is turned ON, specifies the size of files to be uploaded in MBs. | `200MB` | -| azure_kusto_buffer_key | _Optional_ - When buffering is turned ON, set the azure kusto buffer key which needs to be specified when using multiple instances of azure kusto output plugin and buffering is enabled. | `key` | -| store_dir_limit_size | _Optional_ - When buffering is turned ON, set the max size of the buffer directory. | `8GB` | -| buffer_file_delete_early | _Optional_ - When buffering is turned ON, whether to delete the buffered file early after successful blob creation. | `Off` | -| unify_tag | _Optional_ - This creates a single buffer file when the buffering mode is ON. | `On` | -| blob_uri_length | _Optional_ - Set the length of generated blob uri before ingesting to kusto. | `64` | -| scheduler_max_retries | _Optional_ - When buffering is turned ON, Set the maximum number of retries for ingestion using the scheduler. | `3` | -| use_imds | _Optional_ - Whether to use IMDS to retrieve oauth token. 
| `Off` | -| delete_on_max_upload_error | _Optional_ - When buffering is turned ON, Whether to delete the buffer file on maximum upload errors. | `Off` | +| buffer_dir | _Optional_ - When buffering is `On`, specifies the directory where buffered data is stored. | `/tmp/fluent-bit/azure-kusto/` | +| upload_timeout | _Optional_ - When buffering is `On`, specifies a timeout for uploads. Fluent Bit starts ingesting buffer files that are older than this timeout even if they haven't reached the `upload_file_size` limit. | `30m` | +| upload_file_size | _Optional_ - When buffering is `On`, specifies the size of files to be uploaded, in MB. | `200MB` | +| azure_kusto_buffer_key | _Optional_ - When buffering is `On`, sets the Azure Kusto buffer key. Specify this key when running multiple instances of the Azure Kusto output plugin with buffering enabled. | `key` | +| store_dir_limit_size | _Optional_ - When buffering is `On`, sets the maximum size of the buffer directory. | `8GB` | +| buffer_file_delete_early | _Optional_ - When buffering is `On`, whether to delete the buffered file early, after successful blob creation. | `Off` | +| unify_tag | _Optional_ - When buffering is `On`, creates a single buffer file. | `On` | +| blob_uri_length | _Optional_ - Sets the length of the generated blob URI before ingesting to Kusto. | `64` | +| scheduler_max_retries | _Optional_ - When buffering is `On`, sets the maximum number of retries for ingestion using the scheduler. | `3` | +| use_imds | _Optional_ - Whether to use IMDS to retrieve the OAuth token. | `Off` | +| delete_on_max_upload_error | _Optional_ - When buffering is `On`, whether to delete the buffer file once the maximum number of upload errors is reached. | `Off` | | io_timeout | _Optional_ - Configure the HTTP IO timeout for uploads. 
| `60s` | ### Configuration File From 2f336baf696a6679595a811fd8b5090236512658 Mon Sep 17 00:00:00 2001 From: Tanmaya Panda Date: Tue, 6 May 2025 17:23:40 +0530 Subject: [PATCH 3/3] out_azure_kusto: added workload identity auth mode Signed-off-by: Tanmaya Panda --- pipeline/outputs/azure_kusto.md | 110 ++++++++++++++++++++++++-------- 1 file changed, 83 insertions(+), 27 deletions(-) diff --git a/pipeline/outputs/azure_kusto.md b/pipeline/outputs/azure_kusto.md index 46e0dbfc3..8ca4e7136 100644 --- a/pipeline/outputs/azure_kusto.md +++ b/pipeline/outputs/azure_kusto.md @@ -6,6 +6,40 @@ description: Send logs to Azure Data Explorer (Kusto) The Kusto output plugin allows to ingest your logs into an [Azure Data Explorer](https://azure.microsoft.com/en-us/services/data-explorer/) cluster, via the [Queued Ingestion](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/api/netfx/about-kusto-ingest#queued-ingestion) mechanism. This output plugin can also be used to ingest logs into an [Eventhouse](https://blog.fabric.microsoft.com/en-us/blog/eventhouse-overview-handling-real-time-data-with-microsoft-fabric/) cluster in Microsoft Fabric Real Time Analytics. +## Authentication Methods + +Fluent Bit supports the following authentication methods for connecting to your Azure Data Explorer cluster: + +### Service Principal Authentication (Default) + +For service principal authentication, you'll need to create an Azure AD application: + +- [Register an Application](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app#register-an-application) +- [Add a client secret](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app#add-a-client-secret) +- [Authorize the app in your database](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/access-control/principals-and-identity-providers#azure-ad-tenants) + +Configure Fluent Bit with your application's `tenant_id`, `client_id`, and `client_secret`. 
+ +### Managed Identity Authentication + +When running on Azure services that support Managed Identities (such as Azure VMs, AKS, or App Service): + +1. [Assign the managed identity appropriate permissions to your Kusto database](https://learn.microsoft.com/en-us/azure/data-explorer/configure-managed-identities-cluster) +2. Configure Fluent Bit with `auth_type` set to `managed_identity` +3. For system-assigned identity, set `client_id` to `system` +4. For user-assigned identity, set `client_id` to the managed identity's client ID (GUID) + +### Workload Identity Authentication + +For Kubernetes environments using Azure Workload Identity: + +1. [Set up Azure Workload Identity in your Kubernetes cluster](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) +2. Configure your pod to use a service account with Workload Identity Federation +3. Configure Fluent Bit with: + - `auth_type` set to `workload_identity` + - `tenant_id` and `client_id` of your Azure AD application + - `workload_identity_token_file` pointing to your token file path (typically `/var/run/secrets/azure/tokens/azure-identity-token`) + ## For ingesting into Azure Data Explorer: Creating a Kusto Cluster and Database You can create an Azure Data Explorer cluster in one of the following ways: @@ -20,15 +54,6 @@ You can create an Eventhouse cluster and a KQL database follow the following ste - [Create an Eventhouse cluster](https://docs.microsoft.com/en-us/azure/data-explorer/eventhouse/create-eventhouse-cluster) - [Create a KQL database](https://docs.microsoft.com/en-us/azure/data-explorer/eventhouse/create-database) - -## Creating an Azure Registered Application - -Fluent-Bit will use the application's credentials, to ingest data into your cluster. 
- -- [Register an Application](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app#register-an-application) -- [Add a client secret](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app#add-a-client-secret) -- [Authorize the app in your database](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/access-control/principals-and-identity-providers#azure-ad-tenants) - ## Creating a Table Fluent-Bit ingests the event data into Kusto in a JSON format, that by default will include 3 properties: @@ -51,11 +76,12 @@ By default, Kusto will insert incoming ingestions into a table by inferring the | Key | Description | Default | | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | -| tenant_id | _Required if `managed_identity_client_id` is not set_ - The tenant/domain ID of the AAD registered application. | | -| client_id | _Required if `managed_identity_client_id` is not set_ - The client ID of the AAD registered application. | | -| client_secret | _Required if `managed_identity_client_id` is not set_ - The client secret of the AAD registered application ([App Secret](https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#option-2-create-a-new-application-secret)). | -| managed_identity_client_id | _Required if `tenant_id`, `client_id`, and `client_secret` are not set_ - The managed identity ID to authenticate with. Set to `SYSTEM` for system-assigned managed identity, or set to the MI client ID (GUID) for user-assigned managed identity. 
| | -| ingestion_endpoint | _Required_ - The cluster's ingestion endpoint, usually in the form `https://ingest-cluster_name.region.kusto.windows.net | +| tenant_id | _Required for service principal and workload identity auth_ - The tenant/domain ID of the AAD registered application. | | +| client_id | _Required_ - For service principal and workload identity auth, the client ID of the AAD registered application. For managed identity auth, set this to `system` for a system-assigned identity, or to the managed identity's client ID (GUID) for a user-assigned identity. | | +| client_secret | _Required for service principal auth_ - The client secret of the AAD registered application ([App Secret](https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#option-2-create-a-new-application-secret)). | | +| workload_identity_token_file | _Required for workload identity auth_ - The path to the file containing the workload identity token, used with Azure Workload Identity authentication in Kubernetes. | `/var/run/secrets/azure/tokens/azure-identity-token` | +| auth_type | _Optional_ - The authentication type to use. Supported values: `service_principal` (default), `managed_identity`, `workload_identity`. | `service_principal` | +| ingestion_endpoint | _Required_ - The cluster's ingestion endpoint, usually in the form `https://ingest-cluster_name.region.kusto.windows.net` | | | database_name | _Required_ - The database name. | | | table_name | _Required_ - The table name. | | | ingestion_mapping_reference | _Optional_ - The name of a [JSON ingestion mapping](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/mappings#json-mapping) that will be used to map the ingested payload into the table columns. 
| | @@ -83,7 +109,9 @@ By default, Kusto will insert incoming ingestions into a table by inferring the ### Configuration File -Get started quickly with this configuration file: +Get started quickly with these configuration examples: + +#### Service Principal Authentication (Default) ``` [OUTPUT] @@ -99,18 +127,46 @@ Get started quickly with this configuration file: ingestion_endpoint_connect_timeout compression_enabled ingestion_resources_refresh_interval - buffering_enabled On - upload_timeout 2m - upload_file_size 125M - azure_kusto_buffer_key kusto1 - buffer_file_delete_early Off - unify_tag On - buffer_dir /var/log/ - store_dir_limit_size 16GB - blob_uri_length 128 - scheduler_max_retries 3 - delete_on_max_upload_error Off - io_timeout 60s + buffering_enabled + upload_timeout + upload_file_size + azure_kusto_buffer_key + buffer_file_delete_early + unify_tag + buffer_dir + blob_uri_length + scheduler_max_retries + delete_on_max_upload_error +``` + +#### Managed Identity Authentication + +``` +[OUTPUT] + Match * + Name azure_kusto + Auth_Type managed_identity + Client_Id # Use 'system' for system-assigned managed identity + Ingestion_Endpoint https://ingest-..kusto.windows.net + Database_Name + Table_Name + # Additional parameters as needed +``` + +#### Workload Identity Authentication + +``` +[OUTPUT] + Match * + Name azure_kusto + Auth_Type workload_identity + Tenant_Id + Client_Id + Workload_Identity_Token_File /var/run/secrets/azure/tokens/azure-identity-token + Ingestion_Endpoint https://ingest-..kusto.windows.net + Database_Name + Table_Name + # Additional parameters as needed ``` ## Troubleshooting