
Schedule AWS Lambda to download a file from the internet and save it to S3

How to write an AWS Lambda (Node.js) function and schedule it daily to download a file from the internet and save it into an S3 bucket.

This post will explain how to use AWS Lambda to download a file each day and save the file into an S3 bucket.

Why did I pick Lambda? The task runs for about a second every day, and I don’t really need a virtual machine for that, or any infrastructure at all. I wanted the cheapest and simplest solution that would let me trigger my code natively on a schedule (cron). AWS Lambda ticked all the right boxes for me, and the cost of the solution is less than $1 a year.

Here’s the work brief that we’ll go through in this short tutorial.

  1. The process should run once every day.
  2. The process should download a file from an internet location (https) and save it into an S3 bucket.
  3. The object metadata should include the content type from the origin.
  4. It should be designed and written so that we can run the same code for different source files.

I’m going to demonstrate how to do this directly via the Console, and I’ll follow up with setting up a development environment and using the AWS Serverless Application Model and command line to achieve the same.

All the source code for this article can be downloaded from GitHub.

Approach 1 — AWS Console

The steps we’re going to follow are:

  1. Create an S3 bucket to hold our file
  2. Create a Lambda function
  3. Update IAM to allow our Lambda function to write to S3
  4. Write our code and set some dynamic properties (source file, target bucket, and the target filename).
  5. Create a Test and verify everything is working.
  6. Configure a Schedule so the Lambda function will run every day.

First step is to create our bucket in AWS S3 — I selected all the default options, and I’ll be using a bucket called “our-lambda-demo”.


Next step is to head over to AWS Lambda and “Create function” where we are going to select to “Author from scratch”.


I’ve named the function “downloadFileToS3” and left all the defaults in place.


Once your function has been created, go to the “Permissions” tab and follow the link to our Execution Role in the AWS IAM Management Console.


Once in IAM, you will see that the default setup only has basic Lambda execution permissions. This includes the ability to write out to CloudWatch logs, but little else. So we need to give write access to S3.


I’m going the “Add inline policy” route, but you could also go through “Attach policies” and add an existing managed policy such as “AmazonS3FullAccess”.

You can use the dialogue to create a policy, or download the JSON if you want to just copy mine (remember to change the bucket name).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::our-lambda-demo/*"
        }
    ]
}

Give your Policy a name and save.

Back in Lambda, navigate to the code editor on the Configuration tab, where we’re going to upload some code and its dependencies.

Lambda provides us with access to the ‘aws-sdk’ automatically, but anything else you will need to upload. The AWS SAM instructions below show a more complete solution that lets you develop the code further, package everything together automatically, and deploy.

For now, select “Upload a .zip file” and upload the provided index.zip file (shared on GitHub).


Once uploaded, you should see the index.js file with the function code and its dependencies under node_modules.


You’ll see that this piece of code requires three environment variables, which you can also set from the Configuration tab.

  1. SOURCE_URI — the full path to the internet source we are downloading
  2. S3_BUCKET — the bucket we are writing to
  3. S3_KEY — the name of the file we are going to write into S3

Scroll down to “Manage environment variables” and add the variables. Once done, you should have something that looks like this.


We should now be ready to test out our function. You can find “Test” at the upper right of the screen; use that to create a dummy test event. Leave all the defaults in place, call it “MyTestEvent”, and hit Create.


Ensure your new test event is selected in the drop down and click “Test” again to run your test.


All going well, you should see output like this, showing the completion of your test along with metadata about the job execution and a link to the CloudWatch logs.


Heading back to AWS S3, we can see that our file was saved.


And opening the file displays its contents.


Awesome! The next and final stage is to configure a schedule so that our job continues to run and keeps the file in our S3 bucket up to date.

Back in the Configuration tab, select “+ Add trigger”.


From here we’re going to choose “CloudWatch Events/EventBridge” and create a new rule.


Give your rule a name and create the schedule using Cron or rate expressions.
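For example, EventBridge accepts either form; cron expressions have six fields (minute, hour, day-of-month, month, day-of-week, year) and are evaluated in UTC. The 06:00 time below is just an illustration:

```
rate(1 day)
cron(0 6 * * ? *)
```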


Save and you should find your Trigger is now generated.


I updated mine to run every minute to demonstrate that the schedule fires successfully.


So that’s it. Congratulations! We’ve written a Lambda function that runs on a schedule, downloads a file, and saves that file to an S3 bucket.

Approach 2 — AWS Serverless Application Model

This section will repeat the same process — from scratch — using AWS Serverless Application Model, which is an “open-source framework that you can use to build serverless applications on AWS”.

AWS SAM can be used for local running and testing of our NodeJS Lambda function, and helps us to build and deploy the application. It’s a really cool tool, and behind the scenes it creates a declarative CloudFormation template that defines the stack and all the associated services required.

Setting up your development environment

Although we don’t need the AWS CLI installed to use AWS SAM, I have version 2 installed. If you don’t have the CLI installed, you’ll need to create a credentials file or set your AWS credentials as environment variables.
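If you go the credentials-file route, a minimal setup looks something like this (the key values are placeholders for your own):

```
# ~/.aws/credentials
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```

Alternatively, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (and optionally AWS_DEFAULT_REGION) as environment variables.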

I’ve just reinstalled everything today, so here’s what I have.

aws --version
aws-cli/2.0.6 Python/3.7.5 Windows/10 botocore/2.0.0dev10
sam --version
SAM CLI, version 0.47.0
node -v
v12.16.1
npm -v
6.13.4

Just a reminder that all the source code for this article can be downloaded from GitHub.

I’m going to assume no prior knowledge of Node.js development, but feel free to skip through the project initialization stage.

We start off in an empty directory and we initialize using npm.

npm init

Give your application a name and you can leave all the other defaults as-is. On completion, you will get a default “package.json” file.

We need to add a couple of dependencies for our function to work (‘request-promise’ and ‘request’), so run these two commands to get going.

npm install request-promise --save
npm install request --save

You’ll notice that your “package.json” has been updated and you should have a file structure like this.


With our project initialized and dependencies in place, we are now going to create the Lambda function code, so create a file called “index.js” and copy the contents from here: index.js on GitHub

Function source code

The code needed to meet our objectives is trivial.


We take in three parameters (as environment variables) and use these to download a file and save it to S3. If you skipped the Console demonstration, here’s a reminder of the three parameters you’ll see in the code:

  1. SOURCE_URI — the full path to the internet source we are downloading
  2. S3_BUCKET — the bucket we are writing to
  3. S3_KEY — the name of the file we are going to write into S3

So far we’ve got our code and dependencies in place. Now to begin looking at the AWS Serverless Application Model, or SAM.

All AWS SAM operations require a template file (“template.yaml” by default) that defines all the resources we require. This is an extension of CloudFormation, so you’ll recognise plenty of overlaps.

To get started, we need to create our SAM template in our root directory. You can copy the example from here: “template-1.yaml on GitHub”. You can just call it “template.yaml” if you prefer, as that’s the default.


This initial template file is very basic and is just to get us started. You’ll see we define our handler and runtime, the policy we’re using (the managed AmazonS3FullAccess), and the environment variables.
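As a rough sketch of those pieces (template-1.yaml on GitHub is the reference; the SOURCE_URI value below is a placeholder):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  downloadFileToS3:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs12.x
      CodeUri: .
      Policies:
        - AmazonS3FullAccess
      Environment:
        Variables:
          SOURCE_URI: https://example.com/some-file
          S3_BUCKET: our-lambda-demo
          S3_KEY: our-example-file
```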

Your project folder should now look like this.


Create target bucket in S3

I’m using the command line throughout for this part of the tutorial, but you can use the Console of course. Remember earlier that I installed the AWS CLI.

aws s3 mb s3://our-lambda-demo
make_bucket: our-lambda-demo
aws s3 ls
2020-04-10 16:15:38 our-lambda-demo

Optional Stage — local testing with AWS SAM — requires Docker

Local testing requires Docker to be installed on your workstation.

AWS SAM provides a local Docker execution environment that allows for the testing of your Lambda function code without needing to upload to AWS — this is invoked using the sam local invoke command.

sam local invoke --no-event -t .\template-1.yaml

All being well, you should see output like this showing the Lambda function has been successfully executed on your local workstation and the file was created successfully.

The code shared previously returns the output from the s3.upload() command, and we can see that in the JSON in the screenshot.


Let’s view the contents of the bucket to confirm our file was saved.

aws s3 ls s3://our-lambda-demo
2020-04-10 16:17:03 85 our-example-file

Build, package, and deploy to AWS

Now that we have successfully created our function, and optionally tested it, we are ready to deploy it to AWS.

AWS SAM provides a couple of guided stages here:

  • sam build followed by sam deploy

First of all, let’s use AWS SAM to build our application.

sam build -t .\template-1.yaml
Building resource 'downloadFileToS3'
Running NodejsNpmBuilder:NpmPack
Running NodejsNpmBuilder:CopyNpmrc
Running NodejsNpmBuilder:CopySource
Running NodejsNpmBuilder:NpmInstall
Running NodejsNpmBuilder:CleanUpNpmrc
Build Succeeded

Built Artifacts : .aws-sam\build
Built Template : .aws-sam\build\template.yaml
Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided

You’ll notice a new directory has been created called .aws-sam/

Now we’re going to deploy to AWS and use all the default options.

sam deploy --guided

You’ll see a lot of output as resources are generated (too much to paste here).

Also notice how the work is handed off to CloudFormation. We have a stack, function, and role created for us automatically.

And if you head over to S3, you’ll find a new bucket was created automatically to contain your versioned, packaged code.


So let’s go to the Console to see what was created, and we can run a simple test there too. Starting off in CloudFormation, we can see the sam-app stack.


And across in AWS Lambda we can see the function was created successfully. Let’s create a Test and run it to verify… all good…


Adding a schedule trigger

So far we have been invoking our new function manually, so let’s update the template to include our schedule under Events. You can copy the updated example from here: “template-2.yaml on GitHub”. Again, you may prefer to just call your file “template.yaml”, which is the default.
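The change is an Events entry under the function’s Properties, roughly like this (the event name and the daily rate here are illustrative; template-2.yaml on GitHub is the reference):

```yaml
      Events:
        DailyDownload:
          Type: Schedule
          Properties:
            Schedule: rate(1 day)
```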


Next we follow the same process as before, clicking through all the defaults.

sam build -t .\template-2.yaml
sam deploy --guided

You’ll see that CloudFormation only applies the changes it recognises need to be made. Heading back to the Console, we can see that the “CloudWatch Events/EventBridge” trigger has been created for us.


A quick check of the run history shows it has run a few times, so everything worked as expected.


Conclusion

In this article, I demonstrated:

  • a Lambda function that downloads a file from the internet and saves it to an S3 bucket, parameterised so we can re-use the code;
  • how to schedule Lambda functions to run on a standard cron schedule;
  • how to achieve all the requirements both within the AWS Management Console and by using the AWS Serverless Application Model.

A note from the author

Thank you for reading this article — I hope you found this article useful and I look forward to your comments and feedback.

A reminder that all the source code for this article can be downloaded from GitHub. If you want to learn more about AWS SAM, check out the AWS Serverless Application Model documentation.

You can follow me on Twitter and connect on LinkedIn.

Written by

DevOps | SRE | AWS | GCP https://twitter.com/davelms
