The S3 bucket that you specified for CloudFront logs does not enable ACL access

We have been finding more and more companies coming to CHAOSSEARCH to help them deal with the flood of logs that are being generated by their content delivery services such as Amazon CloudFront. In today’s world where your customers can exist all around the globe, it's important to make sure that your websites’ application assets are as close to the users as possible.

Amazon makes it incredibly easy to enable logging for your specific distribution — and will automatically send your logs to an S3 bucket of your choosing. Unfortunately, in order to get any value out of your logs, you would need to ingest them into a separate database, like Elasticsearch or Redshift. Maybe you are trying to track and analyze your bandwidth per distribution. Or perhaps you are trying to identify bot traffic by analyzing your top user agent strings per endpoint.

It’s no wonder that users end up feeling like they are struggling to get anything useful out of their CloudFront logs. Even Amazon’s own documentation recommends a long pipeline for deep insights into CloudFront logs: S3 -> Lambda -> Kinesis Firehose -> S3 -> Partitioning Lambda -> Athena.

This solution, while avoiding the need to run a database environment yourself, is still extremely complex and expensive to maintain. Amazon Athena charges $5 per TB scanned, which can easily become cost-prohibitive when you want insights over months and years. A company that generates 1 TB of CloudFront logs per day would spend $150 per QUERY to ask questions across one month (30 TB) of its log data. Another huge limitation of Athena is the inability to run text search queries across your entire data set. With Athena you are limited to SQL-style queries only, even as APIs like the Elasticsearch API have grown to become the de facto standard for log search and analytics.
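The $150 figure follows directly from the per-terabyte scan price. Here is a minimal sketch of the arithmetic, assuming a 30-day month and a query that scans every byte of that month's logs (no partitioning or compression savings). The constant and method names are made up for illustration.

```ruby
# Price per terabyte scanned, as quoted in the paragraph above.
ATHENA_USD_PER_TB_SCANNED = 5.0

# Cost of one query that scans `days` days of logs at `tb_per_day` terabytes per day.
def athena_query_cost(tb_per_day:, days:)
  tb_per_day * days * ATHENA_USD_PER_TB_SCANNED
end

puts athena_query_cost(tb_per_day: 1, days: 30) # prints 150.0
```

Running the same query ten times in a month would multiply that figure by ten, which is why scan-priced analysis gets expensive quickly on large log volumes.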

The power of the CHAOSSEARCH platform is to simplify search on your data in your Amazon S3 bucket. We can remove a significant portion of the complexity for customers who are looking to get quick, usable insights into their CloudFront log data. In this post, I’ll talk about how you can set up CloudFront logging, how to create an AWS Lambda function to process the log data into JSON, and then how to start getting quick answers to your questions. All within minutes.

CloudFront Log Process

At a high level we’ll go through the following steps:

  1. Create a bucket for raw CloudFront logs
  2. Create a bucket for Lambda processed logs
  3. Create our Lambda and update IAM permissions
  4. Enable CloudFront logging on our distribution
  5. Index the processed logs with CHAOSSEARCH
  6. Start getting answers to our questions

The first step is to create a couple of buckets within your AWS account. I’m creating two: one bucket for my raw CloudFront logs and one bucket for my post-processed log data.

Next, I’m going to create a new Lambda function, and I’m going to have it create a new IAM role with basic Lambda permissions.

You can use our example code for your AWS Lambda function, available on our GitHub, or you can see the example below:

DISCLAIMER: This is a “product manager” level of code written below — it’ll get you where you gotta go, but it may not work for your specific use case. It should give you an idea of just how simple it is to process these logs into an easy to parse format like JSON.

require 'zlib'
require 'time'
require 'json'
require 'stringio'
require 'aws-sdk-s3'

# Field names of the tab-separated CloudFront access log, in order.
LOGLINE_SCHEMA = [
  'date','time','edge_location','sc_bytes','c_ip','cs_method','cs_host','cs_uri_stem','sc_status','cs_referer','cs_user_agent','cs_uri_query','cs_cookie','edge_result_type','edge_request_id','host_header','cs_protocol','cs_bytes','time_taken','forwarded_for','ssl_protocol','ssl_cipher','edge_response_result_type','cs_protocol_version','fle_status','fle_encrypted_fields'
].freeze

# Gzip a string in memory and return the compressed bytes.
def gzip(data)
  sio = StringIO.new
  gz = Zlib::GzipWriter.new(sio)
  gz.write(data)
  gz.close
  sio.string
end

def lambda_handler(event:, context:)
  event = event['Records'].first
  filename = event['s3']['object']['key']
  source_bucket = event['s3']['bucket']['name']

  destination_bucket = ENV['DEST_BUCKET']
  aws_region = ENV['AWS_REGION']
  # The second dot-separated part of the log file name carries the date.
  filedate = Date.parse(filename.split('.')[1]).to_s

  s3 = Aws::S3::Resource.new(region: aws_region)

  # Download and decompress the raw log, one array element per line.
  source_file = s3.bucket(source_bucket).object(filename)
  data = Zlib::GzipReader.new(source_file.get.body).read.split("\n")

  logfile = []

  data.each do |line|
    next if line.start_with?('#') # skip the header lines

    logline = {}
    line.split("\t").each_with_index do |line_value, idx|
      logline[LOGLINE_SCHEMA[idx]] = line_value
    end
    logline['timestamp'] = Time.parse("#{logline['date']} #{logline['time']} UTC").iso8601
    logfile << logline.to_json
  end

  # Write the newline-delimited JSON, gzipped, under a per-day prefix.
  processed_filename = "#{File.basename(filename, '.gz')}-processed.json.gz"
  obj = s3.bucket(destination_bucket).object([filedate, processed_filename].join('/'))
  begin
    response = obj.put(body: gzip(logfile.join("\n")))
    puts "Successfully wrote #{processed_filename} with etag #{response.etag}"
  rescue Aws::S3::Errors::ServiceError => e
    puts e.message
  end
end
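To see what the handler does to each record without deploying anything, the per-line conversion can be tried locally. The sketch below is a simplified stand-in, not the handler itself: it uses only the first nine schema fields, pairs them with values via `zip`, and runs on a made-up sample line. The `SCHEMA` and `parse_logline` names are hypothetical.

```ruby
require 'json'
require 'time'

# First nine fields of the schema used by the Lambda; shortened for the example.
SCHEMA = %w[date time edge_location sc_bytes c_ip cs_method cs_host cs_uri_stem sc_status].freeze

# Convert one tab-separated log line into a hash, or nil for "#" header lines.
def parse_logline(line)
  return nil if line.start_with?('#')

  record = SCHEMA.zip(line.chomp.split("\t")).to_h
  record['timestamp'] = Time.parse("#{record['date']} #{record['time']} UTC").iso8601
  record
end

# Made-up sample data, not taken from a real distribution.
sample = [
  '#Version: 1.0',
  "2019-12-04\t21:02:31\tLAX1\t392\t192.0.2.100\tGET\td111111abcdef8.cloudfront.net\t/index.html\t200"
]

sample.each do |line|
  record = parse_logline(line)
  puts record.to_json if record
end
```

Every value stays a string at this stage, which matches the handler: the numeric and time typing is left to whatever indexes the JSON afterwards.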

Now that my Lambda function is created, I simply need to set the environment variable for my destination bucket — and make sure that my Lambda is notified when new log files are written into my CloudFront log bucket.
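The handler only reads two values from the notification it receives: the bucket name and the object key. Below is a small sketch of that lookup on a trimmed sample event. The distribution ID and file name are invented, and the `source_location` helper is hypothetical. One detail the handler above does not deal with is that object keys arrive URL-encoded in S3 notifications, so the sketch decodes them.

```ruby
require 'uri'

# Trimmed-down S3 "ObjectCreated" notification; real events carry many more fields.
# The distribution ID and file name are invented for the example.
sample_event = {
  'Records' => [
    {
      'eventName' => 'ObjectCreated:Put',
      's3' => {
        'bucket' => { 'name' => 'pete-cloudfront-logs' },
        'object' => { 'key' => 'E2EXAMPLE.2019-12-04-21.a1b2c3d4.gz' }
      }
    }
  ]
}

# Pull out the bucket and object key, decoding the URL-encoded key.
def source_location(event)
  record = event['Records'].first
  {
    bucket: record['s3']['bucket']['name'],
    key: URI.decode_www_form_component(record['s3']['object']['key'])
  }
end

p source_location(sample_event)
```

For plain CloudFront log names the decoded key is identical to the raw one. The decoding only matters if you configure a log prefix that contains spaces or other special characters.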

After that new IAM role is created for my Lambda, I’ll just want to add a new policy to the IAM permissions to make sure my Lambda has the ability to both read from and write to my two S3 buckets.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:Get*"
            ],
            "Resource": [
                "arn:aws:s3:::pete-cloudfront-logs",
                "arn:aws:s3:::pete-cloudfront-logs/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:Put*"
            ],
            "Resource": [
                "arn:aws:s3:::pete-cloudfront-processed",
                "arn:aws:s3:::pete-cloudfront-processed/*"
            ]
        }
    ]
}
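If you manage more than one pair of buckets, the same policy can be generated rather than hand-edited. The sketch below is one possible variation, not the policy used in this post: it narrows the actions to `s3:GetObject` and `s3:PutObject`, on the assumption that those two calls are all the handler makes. The `Sid` values and the `lambda_s3_policy` name are made up.

```ruby
require 'json'

# Build a policy that allows reading objects from one bucket and writing to another.
def lambda_s3_policy(source_bucket:, destination_bucket:)
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Sid' => 'ReadRawLogs',
        'Effect' => 'Allow',
        'Action' => ['s3:GetObject'],
        'Resource' => ["arn:aws:s3:::#{source_bucket}/*"]
      },
      {
        'Sid' => 'WriteProcessedLogs',
        'Effect' => 'Allow',
        'Action' => ['s3:PutObject'],
        'Resource' => ["arn:aws:s3:::#{destination_bucket}/*"]
      }
    ]
  }
end

puts JSON.pretty_generate(
  lambda_s3_policy(source_bucket: 'pete-cloudfront-logs',
                   destination_bucket: 'pete-cloudfront-processed')
)
```

The printed JSON can be pasted into the IAM policy editor in place of the broader `s3:Get*` and `s3:Put*` version.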

Now that everything is set up, we can turn on CloudFront logging for our distribution and have those logs start flowing into the source bucket.

Assuming that everything has been set up correctly, you should start to see logging events landing into your source bucket in about 5-10 minutes. Depending on the size of the logs, the Lambda function should process them and drop them into the destination bucket within seconds.

My source bucket:

My destination bucket (while not necessary for CHAOSSEARCH, I’ve had my Lambda drop the files into a prefix per day):
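The per-day prefix comes from the date embedded in each log file name. Here is a minimal sketch of that mapping, using an invented file name and a hypothetical `processed_key` helper. Unlike the handler, it slices the first ten characters of the date part and parses them strictly, which is a small variation on the same idea.

```ruby
require 'date'

# Derive the destination key "<YYYY-MM-DD>/<name>-processed.json.gz" from a raw log key.
def processed_key(source_key)
  stamp = source_key.split('.')[1]                       # e.g. "2019-12-04-21"
  filedate = Date.strptime(stamp[0, 10], '%Y-%m-%d').to_s
  processed_filename = "#{File.basename(source_key, '.gz')}-processed.json.gz"
  [filedate, processed_filename].join('/')
end

puts processed_key('E2EXAMPLE.2019-12-04-21.a1b2c3d4.gz')
# prints 2019-12-04/E2EXAMPLE.2019-12-04-21.a1b2c3d4-processed.json.gz
```

Grouping by day keeps each prefix to a manageable number of objects and makes it easy to expire or archive whole days at once.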

Since CHAOSSEARCH never needs to return to the source data, we can now enable an Amazon S3 lifecycle rule to purge files older than a few days from our source bucket. If I need to keep the data for compliance reasons, I can always set up S3 Intelligent-Tiering or a lifecycle transition to move those logs into Glacier.
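Such a purge rule can be written down as a small configuration hash. The sketch below assumes that "a few days" means three and uses a made-up rule name and a hypothetical `expire_raw_logs_rule` helper. The hash follows the shape that `Aws::S3::Client#put_bucket_lifecycle_configuration` accepts in the aws-sdk-s3 gem. The call itself is shown only as a comment, so the snippet runs without AWS credentials.

```ruby
require 'json'

# Lifecycle configuration that expires every object in the bucket after `days` days.
def expire_raw_logs_rule(days:)
  {
    rules: [
      {
        id: 'expire-raw-cloudfront-logs', # hypothetical rule name
        status: 'Enabled',
        filter: { prefix: '' },           # empty prefix applies to the whole bucket
        expiration: { days: days }
      }
    ]
  }
end

config = expire_raw_logs_rule(days: 3)
puts JSON.pretty_generate(config)

# With the aws-sdk-s3 gem and credentials available, it could be applied like this:
#   Aws::S3::Client.new.put_bucket_lifecycle_configuration(
#     bucket: 'pete-cloudfront-logs', lifecycle_configuration: config
#   )
```

Apply a rule like this only to the raw-log bucket. The processed bucket is the one being indexed, so its retention is a separate decision.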

Now I can navigate over to the CHAOSSEARCH platform and create an object group of all my processed CloudFront logs.

And since I have my Lambda function set up to continually drop new log files into my destination S3 bucket, I can enable CHAOSSEARCH to continually process data by leveraging SQS notifications for each PUT to S3. This will ensure that CHAOSSEARCH continually processes your log data in real time, making it available for search as it lands in your S3 buckets.

One of the best features of the CHAOSSEARCH platform is automated schema discovery of your log data, which means you don’t need to spend any time creating database schemas, or index mappings. During the indexing process, we will automatically identify strings, numbers, time values, etc. And since CHAOSSEARCH leverages a Schema-on-Read architecture, you can always adjust your schema any time without EVER having to reindex your data.

Now we can dive into the fully-integrated Kibana interface and get deep insights into our CloudFront log data. We can dive into response times from the Edge for both cache hits and misses, understand our customer usage patterns, and even identify potentially malicious user traffic by analyzing source IP addresses and user agent strings.
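To make the first of those questions concrete, here is what a "response time by cache result" aggregation computes, written as plain Ruby over a handful of made-up processed records. This is only an illustration of the calculation, not how Kibana or CHAOSSEARCH run it. The field names match the Lambda's schema, and the `average_time_by_result` helper is hypothetical.

```ruby
# Made-up processed records; every value is a string, as written by the Lambda.
records = [
  { 'edge_result_type' => 'Hit',  'time_taken' => '0.002' },
  { 'edge_result_type' => 'Hit',  'time_taken' => '0.004' },
  { 'edge_result_type' => 'Miss', 'time_taken' => '0.120' }
]

# Average time_taken (in seconds) per edge result type.
def average_time_by_result(records)
  records.group_by { |r| r['edge_result_type'] }.transform_values do |group|
    group.sum { |r| r['time_taken'].to_f } / group.size
  end
end

average_time_by_result(records).each do |result, avg|
  puts format('%-5s %.3f s', result, avg)
end
```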

Going from raw data within an Amazon S3 bucket to deep insights is something that is now able to happen within minutes. No need to spend time cobbling together multiple different database solutions in order to get answers to your questions. Simply point CHAOSSEARCH at your Amazon S3 buckets and index your data without ever having to move your data out of Amazon S3.

Reach out today and learn more about how you can get quick answers to all your CloudFront questions without ever having to move your data out of your Amazon S3 buckets.

How do I enable CloudFront to access my S3 bucket?

  1. Sign in to the AWS Management Console and open the CloudFront console.
  2. Choose the ID of a distribution that has an S3 origin.
  3. Choose the Origins tab.
  4. Select the Amazon S3 origin, and then choose Edit.
  5. For S3 bucket access, choose Yes use OAI.

How do I enable logging for CloudFront?

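Open the distribution and click the “Edit” button on the “General” tab. In “Distribution Settings”, scroll down and check the “Logging” setting. If Logging is “Off”, CloudFront is not creating the log files that contain detailed information about every user request it receives; switch it to “On” and choose the S3 bucket that should receive the logs. That bucket must have ACLs enabled, otherwise CloudFront rejects it with the error quoted in the title of this post.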

Can CloudFront access a private S3 bucket?

To use a bucket that is completely private, “Restrict Bucket Access” must be set to Yes. CloudFront then authenticates its requests for assets with an origin access identity (OAI), so you must use an existing identity or let CloudFront create a new one. CloudFront can update your bucket policy for you, or you can do it yourself.

How do I fix Error 403 CloudFront?

If a custom origin is returning the 403 error, the cause might be an AWS WAF or custom firewall configuration at the origin. To troubleshoot, make the request directly to the origin. If you can replicate the error without CloudFront, then the origin is causing the 403 error.
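That comparison is easy to script. The sketch below separates the decision from the network calls, so the decision logic can be read and run on its own with literal status codes. `status_for` and `diagnose_403` are hypothetical helper names, and you would supply your own CloudFront and origin URLs.

```ruby
require 'net/http'
require 'uri'

# Fetch only the HTTP status code for a URL (requires network access).
def status_for(url)
  Net::HTTP.get_response(URI(url)).code.to_i
end

# Decide where a 403 comes from, given the status seen through CloudFront
# and the status seen when requesting the origin directly.
def diagnose_403(cloudfront_status:, origin_status:)
  return :no_403 unless cloudfront_status == 403

  origin_status == 403 ? :origin : :not_reproduced_at_origin
end

# Example with literal status codes; swap in status_for(...) with your own URLs.
puts diagnose_403(cloudfront_status: 403, origin_status: 403) # prints origin
```

A result of `:not_reproduced_at_origin` means the origin answers direct requests normally, so the next place to look is the distribution's own configuration.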


