Bytes by Ying

Postgres, as an App! (Now with one-click deploys to AWS + Heroku!)


Me saying “use Postgres as an app” is fine, but without shipping anything it doesn’t do anyone much good. So let’s ship!

First, the code

Here’s the GitHub repository complementing this blog post. Clone the repository, set up the system requirements, and follow the instructions in the README.


Alternatively, here’s a 3-step process to try out this stack:


Local Modeling

Follow the instructions listed in the local setup. This should model the stack on your local computer using docker-compose.
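For orientation, the local stack boils down to something like the following Compose file. This is a hedged sketch, not the repo's actual file: the image tags, credentials, and role names here are placeholder assumptions, so defer to the repository's own docker-compose.yml.

```yaml
version: "3.8"
services:
  db:
    image: postgres:15                 # assumed version; match the repo's
    environment:
      POSTGRES_USER: app               # placeholder credentials
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  api:
    image: postgrest/postgrest
    environment:
      PGRST_DB_URI: postgres://app:changeme@db:5432/app
      PGRST_DB_SCHEMA: public
      PGRST_DB_ANON_ROLE: app          # assumed anonymous role
    ports:
      - "3000:3000"
    depends_on:
      - db
volumes:
  pgdata:
```

The named volume is what gives you the durable data layer locally, mirroring what EBS provides on AWS later in this post.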

The difficulty of synchronizing foreign tables surprised me. Foreign table definitions are arbitrary and can drift from the tables they reference, and something like JDBC-based schema synchronization either isn’t possible or isn’t cheap. This blog post references a stored procedure that can pull in multiple tables at a time by regex. I haven’t tried it myself, but it’s one option I’d like to try in production should the need arise. Combine it with a job scheduler, and you can run a batched synchronization process. I’ve saved a copy of the stored procedure here.
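To make the foreign-table setup concrete, here’s a minimal postgres_fdw sketch. The server name, host, credentials, and table names are all placeholders, and the IMPORT has to be re-run after upstream DDL changes, which is exactly the synchronization gap described above.

```sql
-- Minimal postgres_fdw setup; server, host, and credentials are placeholders.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER upstream
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'upstream.example.com', port '5432', dbname 'sourcedb');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER upstream
    OPTIONS (user 'reader', password 'changeme');

-- Pull in a subset of tables. Foreign table definitions do not track the
-- source automatically, so this must be re-run when the upstream schema moves.
IMPORT FOREIGN SCHEMA public
    LIMIT TO (orders, customers)
    FROM SERVER upstream
    INTO public;
```

A scheduled job that drops and re-imports the foreign schema is the brute-force version of the regex-driven stored procedure linked above.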

Remote Modeling on AWS

Follow the instructions listed in the AWS setup. This should model the stack on AWS using AWS CloudFormation, a cloud-native infrastructure-as-code tool. Keep in mind that you need a valid AWS account for this step, and that running this stack may cost money or credits (it may not fall within AWS Free Tier usage). This stack also assumes unrestricted Internet access for downloading package archives and for reaching AWS.

I broke this stack up into multiple CloudFormation templates so that each stack’s lifecycle can be managed independently. This is especially important for databases, because you can tear down the database without deleting the underlying files. Whether those files are useful in a recovery situation is a separate discussion, but I wanted to be able to mount a volume on AWS the way you can locally with Docker volumes. Splitting templates might also minimize the copying and pasting needed to run them in different regions, without resorting to something like Pulumi.

Defining an AWS IAM user

AWS supports a complete identity and access management platform. The collection of roles, policies, and permissions keeps unwarranted behavior to a minimum, which is nice from a security and legal perspective.

I read Docker on AWS by Justin Menga a few months back, and he walks you through setting up an IAM user that forces multi-factor authentication when accessing AWS resources. I liked it enough to structure it as an AWS CloudFormation template. In practice, MFA may be annoying to use at times, and this setup grants IAM admin access (everything except billing) after MFA login, but I think it’s a great first step towards further refining IAM policies.

I found password scripting for IAM users fairly tricky. I created my own passwords and IAM rejected them for not following AWS’s default password policy, so now I use awscli and the aws secretsmanager command instead. This Stack Overflow answer gave me good insight into what AWS considers acceptable, since the password policy seemed opaque to me. Keep in mind you can always change your password in the AWS console later and get clearer errors. Store passwords in a password manager like Bitwarden for easy access later on.
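The awscli route looks roughly like this. It’s a sketch assuming you already have AWS credentials configured; the user name and password length are placeholders, and the generated password is compliant with the default IAM policy because Secrets Manager includes every character class.

```shell
# Generate a policy-compliant random password, then attach it to an IAM user.
# "my-admin" and the length are placeholders; requires configured AWS credentials.
PASSWORD="$(aws secretsmanager get-random-password \
  --password-length 32 \
  --require-each-included-type \
  --query RandomPassword --output text)"

aws iam create-login-profile \
  --user-name my-admin \
  --password "$PASSWORD" \
  --password-reset-required
```

Paste the generated password straight into your password manager before closing the terminal.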

I’ve found this to be the only step I can’t, or don’t feel comfortable, scripting entirely, since it requires changes to your local system (not just to remote infrastructure), and since the resulting credentials are used in every other part of the stack. You don’t have to set up an IAM user, as long as you have access to the relevant AWS resources defined in these templates.

Configuring a VPC

By default, AWS creates a VPC with public subnets for your applications, and those are the ones used if you don’t create your own VPC. If every service you’re creating is reachable from the Internet and needs access to the Internet, this arrangement is perfectly fine.

I’m creating my own VPC here because, in addition to learning how to do so, I’d like to add private subnets and a network address translation (NAT) gateway to this VPC, in order to secure the custom database while still letting it pull updates from package archives (i.e. run sudo apt-get -y update, which requires DNS resolution). I haven’t done so yet; I acknowledge that’s bad, but NAT gateways cost extra money to run and I’m pretty cost-constrained. Suffice it to say a NAT gateway would be the first thing on my todo list. Either that, or building a VM image with no expectation of needing the Internet, and deploying it behind a private subnet without adding a NAT gateway.
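The shape of that VPC, in abbreviated CloudFormation, would look something like this. Resource names and CIDR blocks are placeholders, and the Internet gateway and route tables are omitted for brevity; this is a sketch of the topology, not a deployable template.

```yaml
# Sketch: a VPC with one public and one private subnet, plus the NAT gateway
# that would let the private subnet reach package archives.
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: true
  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.0.1.0/24
  NatEip:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
  NatGateway:                      # the cost center I'm currently avoiding
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatEip.AllocationId
      SubnetId: !Ref PublicSubnet
```

The NAT gateway lives in the public subnet; the private subnet’s route table would send 0.0.0.0/0 through it.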

I found this InfoQ article very useful on getting started with VPCs and CloudFormation.

Creating the RDS instance

It’s a bit difficult to prototype with RDS because instances take some time to start up and tear down, due to running backups and the like. That may be another reason separating services into their own CloudFormation templates is a good idea: faster deployment cycles mean engineering is less of a bottleneck.

Since this is a proof of concept only, I templated this without any private subnets, enabled access from outside the VPC, and opened the database port to the world. In production, you’d want to use private subnets and remove inbound rules except to allowlisted IPs.
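The production-leaning alternative is a one-resource change. This sketch assumes a VPC defined elsewhere in the template; the CIDR is a placeholder from the TEST-NET-3 documentation range.

```yaml
# A security group admitting Postgres traffic only from an allowlisted CIDR.
Resources:
  DbSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow Postgres from an allowlisted CIDR only
      VpcId: !Ref Vpc                  # assumes a VPC defined elsewhere
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          CidrIp: 203.0.113.0/24       # placeholder allowlist
```

Attach this group to the RDS instance instead of a 0.0.0.0/0 rule and the proof-of-concept exposure goes away.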

If you log into the RDS instance, and try to create an unsupported extension, you can see how RDS locks down your data:

pgdb=> CREATE EXTENSION pg_cron;
ERROR:  Extension "pg_cron" is not supported by Amazon RDS
DETAIL:  Installing the extension "pg_cron" failed, because it is not on the list of extensions supported by Amazon RDS.
HINT:  Amazon RDS allows users with rds_superuser role to install supported extensions. See: SHOW rds.extensions;

If you don’t want to create the tables by hand, you can also create an AWS Lambda-backed custom resource that runs a SQL file or script after RDS signals CloudFormation that the database is stood up. I didn’t create the custom resource for this tutorial due to time and resource constraints.
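If I had built that custom resource, its core would be something like the function below. This is a hypothetical sketch, not the repo’s code: the connection factory is injected so the logic can be exercised without a live database, and in the real Lambda you’d build connect from a Postgres driver and relay the result to CloudFormation with the cfn-response mechanism.

```python
# Hedged sketch of a Lambda custom resource's core: run a seed SQL script
# once RDS is up. connect is a caller-supplied factory returning a DB-API
# connection, so the function is testable without a database.
def run_bootstrap_sql(connect, sql_text):
    """Split sql_text into statements and execute them in one transaction."""
    conn = connect()
    try:
        cur = conn.cursor()
        for stmt in (s.strip() for s in sql_text.split(";")):
            if stmt:                       # skip empty trailing fragments
                cur.execute(stmt)
        conn.commit()
        return "SUCCESS"                   # mirrors the cfn-response statuses
    except Exception:
        conn.rollback()
        return "FAILED"
    finally:
        conn.close()
```

Note this naive split breaks on semicolons inside string literals or function bodies; for anything beyond simple DDL seeds you’d want a proper statement splitter.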

Creating the EBS data layer

We can back our custom database deployment using AWS Elastic Block Store (EBS). I found this AWS article on deploying databases with ECS and EBS to be very useful when templating out this feature.

Creating the ECS compute node for custom Postgres / PostgREST

Now, we can stick on our compute node for our custom database, and our PostgREST proxy.

I’m using Docker and AWS ECS to deploy this stack. If you don’t want to use Docker, you can build your own Amazon Machine Image (AMI) using a tool like Packer, and deploy that AMI directly onto a VM, cutting Docker out of the process. That’s honestly a safer way of doing things, since containers are meant to be created and terminated without notice, which may not be appropriate for databases. It might also be faster, since ECS API calls sit on top of EC2 API calls.
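The Packer route would look roughly like this. Everything here is a placeholder assumption — the source AMI ID, region, instance type, and names would all need to be filled in for your account — so treat it as a shape, not a working template.

```hcl
# Hypothetical Packer template baking Postgres into an AMI,
# cutting Docker out of the deploy path.
source "amazon-ebs" "postgres" {
  ami_name      = "custom-postgres-{{timestamp}}"
  instance_type = "t3.micro"
  region        = "us-east-1"
  source_ami    = "ami-00000000000000000" # replace with a current base AMI
  ssh_username  = "ubuntu"
}

build {
  sources = ["source.amazon-ebs.postgres"]

  provisioner "shell" {
    inline = [
      "sudo apt-get -y update",
      "sudo apt-get -y install postgresql",
    ]
  }
}
```

The resulting AMI can then be referenced directly from a CloudFormation launch template, with no container runtime involved.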

I’m using an EC2 autoscaling group, which from my understanding means the Docker containers for a given ECS cluster run on that EC2 node, and if additional load is detected, a new EC2 instance is created with that cluster replicated. If that is the case, it’s kind of like a Kubernetes pod, without having to install Kubernetes.

I’m using Docker Hub for this example. Usually I use AWS Elastic Container Registry (ECR), an alternative to Docker Hub, because of the unlimited private repositories you can create and because it’s easy to template with CloudFormation. I’m not using ECR for this tutorial because it isn’t possible to fetch public Docker images from a private repository unless you add things like API Gateway + Lambda, which may be prone to breaking, and because building and uploading Docker images may not work well with one-click deploys.

I wanted to integrate PostgREST with AWS ECS, but I ran into the problem of service discovery between interconnected containers. Apparently, containers from different task definitions have a hard time communicating with each other. There are ways around it, such as AWS’s Route 53 DNS-based service discovery integrated with AWS Cloud Map, but that involves standing up additional DNS infrastructure, and that’s where I drew the line in scoping out this proof of concept.
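For reference, the Cloud Map route I scoped out looks roughly like this in CloudFormation. Names are placeholders and it assumes a VPC defined elsewhere; the ECS service would then attach to the discovery service via its ServiceRegistries property, letting PostgREST reach the database at postgres.app.internal.

```yaml
# Sketch of DNS-based service discovery: a private namespace plus a
# service entry the database task would register into.
Resources:
  Namespace:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties:
      Name: app.internal               # placeholder internal domain
      Vpc: !Ref Vpc                    # assumes a VPC defined elsewhere
  DbDiscovery:
    Type: AWS::ServiceDiscovery::Service
    Properties:
      Name: postgres                   # resolves as postgres.app.internal
      DnsConfig:
        NamespaceId: !Ref Namespace
        DnsRecords:
          - Type: A
            TTL: 10
```

It’s not a huge amount of template, but it’s more moving DNS machinery than this proof of concept warranted.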

What comes next?

I’m sure there are other ideas I haven’t thought of; the app possibilities are endless, as they are with anything. But with this kernel as the core of your stack, you can communicate your work and results to non-technical stakeholders in a much more transparent manner.

