Enabling logs and alerting in an AWS EKS cluster - CloudWatch Logs Insights and Metric filters
AWS CloudWatch Logs can store logs generated by resources created in AWS as well as by external resources. Once the logs are in CloudWatch, you can run queries to extract specific information and create alerts for specific events.
In the first part of this post, I described the steps to enable logging in an EKS cluster: control plane logs and container logs using Fluent Bit and CloudWatch. In this post, I will show how to get helpful information from those logs and create alerts for specific events.
CloudWatch Logs Insights
CloudWatch Logs Insights allows you to search and analyze log data stored in Amazon CloudWatch Logs; queries can be run to identify potential causes of issues and to validate fixes. One advantage of Logs Insights is its ability to discover fields, which makes writing queries easier. Logs Insights automatically defines five fields:
- @message: contains the original log message sent to CloudWatch.
- @timestamp: contains the event timestamp recorded in the original event.
- @ingestionTime: contains the time when CloudWatch Logs received the log event.
- @logStream: contains the name of the log stream the event was added to.
- @log: an identifier in the form account-id:log-group-name. When querying multiple log groups, it is useful for identifying which log group a particular event belongs to.
Those fields are discovered automatically by CloudWatch, and depending on the log type, CloudWatch will discover additional fields; for instance, for EKS control plane logs you can see the fields shown in the following image:
Running queries in AWS Log Groups
Queries can be run to search for specific events, and field discovery is really helpful when designing them. When you don't know the structure of the logs, you can run a simple query to list the fields that CloudWatch discovered and then use them to build the query for your use case.
An important thing to mention here: if the CloudWatch log groups are encrypted with KMS, you must have permission to use the key.
How to run queries?
- Go to the AWS CloudWatch service and, in the left panel, select Logs Insights.
- Select the log groups to query; up to 20 log groups can be selected, and Logs Insights will search only in the groups specified.
- By default, CloudWatch shows a simple query; you can run it and validate the fields discovered by CloudWatch. The following image shows a query that gets up to 10 results; you can check it and validate the fields.
The AWS documentation describes the query syntax that you can use.
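The console is the usual way to run these queries, but they can also be run programmatically. The following is a minimal sketch using boto3, which starts the same kind of simple query and polls for the results; the log group name is only a placeholder, replace it with your cluster's log group:
import time
import boto3

# Minimal sketch: run a Logs Insights query with boto3 instead of the console.
# The log group name is a placeholder; replace it with your cluster's log group.
logs = boto3.client("logs")

query = logs.start_query(
    logGroupName="/aws/eks/my-cluster/cluster",
    startTime=int(time.time()) - 3600,  # query the last hour
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | sort @timestamp desc | limit 10",
)

# Poll until the query finishes, then print each result row.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})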
Query examples for EKS
Search API calls made by kubectl user-agent
The following example searches for calls made to the Kubernetes API server using the kubectl command with the get verb. In this case, the log group is the one that EKS created when you enabled logging in the cluster; in the previous post I mentioned its name format and how to enable it.
fields @logStream, @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| filter userAgent like /kubectl/
| sort @timestamp desc
| filter verb like /(get)/
The first line specifies the fields that you want to show in the results; the query in the example will show the log stream name, the timestamp, and the message, and you can add any other fields you want.
Search events filtering by namespace
You can use the fields discovered by CloudWatch to build your queries. For EKS control plane logs, one of the discovered fields is objectRef.namespace, and the following query uses it to get events where the kube-system namespace is involved.
fields @timestamp, @message
| sort @timestamp desc
| filter objectRef.namespace like 'kube-system'
| limit 2
The result of the previous query could look like:
Creating alerts for specific events
CloudWatch Logs can look for specific patterns inside the events that are sent; this allows you to create metrics that track whether a particular event happened and to create alerts for it. For that we need to use AWS CloudWatch metric filters, which are configured directly on the log group and require a pattern.
To create a metric filter, select the log group to use; then, under Actions, you will see the option to create a metric filter.
Defining a pattern for the filter
When you are creating the filter you need to define a pattern that specifies what to look for in the log events. When the logs are in JSON format, defining the pattern is easier because you just need to specify the name of the key that you want to evaluate; in this case you can use the following format:
{ PropertySelector Operator Value }
For more details about the pattern syntax you can check the AWS documentation.
When the logs are not in JSON format, defining the pattern is trickier. In this case you need to keep in mind that each space-separated token is treated as a word by the filter; for instance, suppose that you have the following log event:
time="2022-10-09T20:37:25Z" level=info msg="STS response" accesskeyid=ABCD1234 accountid=123456789 arn="arn:aws:sts::123456789:assumed-role/test" client="127.0.0.1:1234" method=POST path=/authenticate
In this case, you have different words separated by spaces, and if you want to look for a specific word you need to know the exact position of the element to compare. Let's see this with an example: I want to match the logs with a level equal to info. If you look at the previous log event, you can validate that level=info is word number 2 in the whole event, so the pattern could be:
[word1,word2="level=info",word3]
Remember that you need to include the whole word that you want to compare; in this case you can use level=info, or you can put the word between * characters (for example *info*), which matches any word containing the text specified. Let's see the result of the previous pattern.
As you can see, CloudWatch shows each word defined in the pattern and the events that match.
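You can also validate a pattern against sample log lines before creating the filter. A minimal sketch using boto3 and the TestMetricFilter API, reusing the example log event and pattern from above, could look like this:
import boto3

# Minimal sketch: test a space-delimited filter pattern against a sample event
# without creating the metric filter yet.
logs = boto3.client("logs")

sample_event = (
    'time="2022-10-09T20:37:25Z" level=info msg="STS response" '
    'accesskeyid=ABCD1234 accountid=123456789 '
    'arn="arn:aws:sts::123456789:assumed-role/test" '
    'client="127.0.0.1:1234" method=POST path=/authenticate'
)

response = logs.test_metric_filter(
    filterPattern='[word1,word2="level=info",word3]',
    logEventMessages=[sample_event],
)

# Prints the events that matched the pattern (empty if nothing matched).
for match in response["matches"]:
    print(match)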
Let's look at more examples to make this clearer.
Metric filter to alert when actions are made on the aws-auth ConfigMap
The aws-auth ConfigMap is used to map AWS IAM identities to Kubernetes RBAC, and part of this kind of event looks like the following message:
kind:Event
level: Metadata
objectRef.apiVersion: v1
objectRef.name: aws-auth
objectRef.namespace: kube-system
objectRef.resource: configmaps
verb: get
Unauthorized modifications to this ConfigMap could be a security risk, so a metric filter can be created to alert when it is edited. The pattern could be:
{( $.objectRef.name = "aws-auth" && $.objectRef.resource = "configmaps" ) && ($.verb = "delete" || $.verb = "create" || $.verb = "patch" ) }
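The console is one way to create the filter; it can also be created programmatically. Below is a minimal sketch using boto3, where the log group, filter, metric, and namespace names are only placeholders:
import boto3

# Minimal sketch: create the aws-auth metric filter with boto3.
# The log group, filter, metric, and namespace names are placeholders.
logs = boto3.client("logs")

logs.put_metric_filter(
    logGroupName="/aws/eks/my-cluster/cluster",
    filterName="aws-auth-configmap-changes",
    filterPattern=(
        '{ ($.objectRef.name = "aws-auth" && $.objectRef.resource = "configmaps") '
        '&& ($.verb = "delete" || $.verb = "create" || $.verb = "patch") }'
    ),
    metricTransformations=[
        {
            "metricName": "AwsAuthConfigMapChanges",
            "metricNamespace": "EKS/Audit",
            "metricValue": "1",   # count 1 for every matching event
            "defaultValue": 0,    # publish 0 when no events match
        }
    ],
)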
Metric filter for 403 response codes in calls to the Kubernetes API server
This is useful to detect repeated attempts to log in or make calls to the cluster without valid credentials; part of this event looks like the following message:
requestURI: /
responseStatus.code:403
responseStatus.reason: Forbidden
responseStatus.status: Failure
sourceIPs.0 : 12.34.56.78
verb: get
The pattern could be:
{$.responseStatus.code = "403" }
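To actually get alerted, the metric published by the filter can be attached to a CloudWatch alarm. The following is a minimal sketch with boto3, assuming the 403 filter publishes a metric named Eks403Responses in an EKS/Audit namespace and that an SNS topic for notifications already exists (the metric name, namespace, and topic ARN are placeholders):
import boto3

# Minimal sketch: alarm on the metric published by the 403 metric filter.
# The metric name, namespace, and SNS topic ARN are placeholders.
cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="eks-api-403-responses",
    AlarmDescription="Alerts when the Kubernetes API returns 403 responses",
    Namespace="EKS/Audit",            # namespace used in the metric filter
    MetricName="Eks403Responses",     # metric name used in the metric filter
    Statistic="Sum",
    Period=300,                       # evaluate 5-minute windows
    EvaluationPeriods=1,
    Threshold=5,                      # alarm when more than 5 events match
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",  # no matching events keeps the alarm OK
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:eks-alerts"],
)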
Metric filter to check Access Denied errors generated by AWS IAM RBAC
You can monitor the number of access-denied responses in API calls; these are generated by AWS IAM RBAC, and part of this event looks like the following message:
authenticator-8c7,2022-08-08 10:04:56,"time=""2020-08-04T28:43:44Z"" level=warning msg=""access denied"" client=""127.0.0.1:1234"" error=""sts getCallerIdentity failed: error from AWS (expected 200, got 403)"" method=POST path=/authenticate"
The pattern could be:
[time,level="*warning*",message1="*access*",message2="*denied*",more]