TinyDevCRM Update #14: Wrapping up main DevOps push

This is a summary of TinyDevCRM development for the week of May 9th, 2020 to May 16th, 2020.

Goals from last week

  • [❓] Push the PostgreSQL + pg_cron Docker images up to AWS ECR
  • [❓] Ensure that the rex-ray Docker volume plugin persists PostgreSQL volumes after draining EC2/ECS instances, and that the same EBS volume, reattached to different EC2 instances, keeps the same cron.schedule table.
  • [❓] Do a write-up of shipping a Dockerized PostgreSQL instance with PostgreSQL extensions

  • [❓] Get static files properly shipped on both docker-compose and AWS via local EC2 Docker volumes
  • [❓] Get data uploads (CSV files) properly shipped on both local and AWS environments via EFS volumes

  • [❓] Set up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline
  • [❓] Update Basecamp roadmap
  • [❓] Add Docker healthchecks for staging release + acceptance + testing environments.
  • [❓] Add security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources
  • [❓] Deploy the database cluster behind private subnets + NAT, and turn off direct SSH access and public IPv4 address mapping
  • [❓] Add SNS topic for CloudFormation deployments

What I got done this week

  • [✔] Push the PostgreSQL + pg_cron Docker images up to AWS ECR
  • [✔] Ensure that the rex-ray Docker volume plugin persists PostgreSQL volumes after draining EC2/ECS instances, and that the same EBS volume, reattached to different EC2 instances, keeps the same cron.schedule table.
  • [👉] Do a write-up of shipping a Dockerized PostgreSQL instance with PostgreSQL extensions

  • [✔] Automate creation of Django superusers, especially when none exist already (a rough sketch follows this group).
  • [❌] Create custom change-password forms, so that password hashes aren't editable in the browser and passwords can be changed through a plaintext form.
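
For the superuser automation, here's a minimal sketch of the approach as a custom Django management command run at container startup; the command name, file path, and environment variable names are placeholders of mine, not the actual TinyDevCRM code:

    # app/management/commands/ensuresuperuser.py (hypothetical path)
    # Create a superuser from environment variables only if none exists,
    # so the command stays idempotent across container restarts.
    import os

    from django.contrib.auth import get_user_model
    from django.core.management.base import BaseCommand


    class Command(BaseCommand):
        help = "Create a superuser from environment variables if none exists."

        def handle(self, *args, **options):
            User = get_user_model()
            if User.objects.filter(is_superuser=True).exists():
                self.stdout.write("Superuser already exists; skipping.")
                return
            User.objects.create_superuser(
                username=os.environ["DJANGO_SUPERUSER_USERNAME"],
                email=os.environ.get("DJANGO_SUPERUSER_EMAIL", ""),
                password=os.environ["DJANGO_SUPERUSER_PASSWORD"],
            )
            self.stdout.write("Superuser created.")

Wired into the Docker entrypoint as python manage.py ensuresuperuser before the app server starts, it's safe to run on every boot.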

  • [✔] Create a separate Docker volume locally for various files POST'ed to TinyDevCRM API, such as CSV data dumps or PostgreSQL stored procedures.
  • [✔] Create a Django model for CSV data dumps that references the Docker volume / EFS resource.
  • [✔] Create an API endpoint /v1/data/upload for POSTing a CSV file after getting an auth token, and verify that the file is correctly uploaded to /tinydevcrm-files (a rough sketch follows this group).
  • [❓] Add an integration test to ensure the prior feature works for happy path cases, and register test run command as a Docker stage.
  • [✔] Add an EFS resource for the rex-ray plugin, so that /tinydevcrm-files is mounted as an EFS resource instead of local to the EC2 instance.
  • [✔] Mount a volume /tinydevcrm-files to the EFS resource.
  • [✔] Get data uploads (CSV files) properly shipped on both local and AWS environments via EFS volumes
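
As a sketch of how the upload pieces fit together, assuming Django REST Framework with token auth; the model, view, and field names here are illustrative, not the actual TinyDevCRM code:

    # Illustrative sketch, not the actual TinyDevCRM code. A model whose
    # FileField lands under MEDIA_ROOT (pointed at /tinydevcrm-files), plus
    # a DRF view accepting an authenticated multipart POST of a CSV file
    # under the form field name "file" (an assumption).
    from django.db import models
    from rest_framework import parsers, permissions, status
    from rest_framework.response import Response
    from rest_framework.views import APIView


    class CSVUpload(models.Model):
        file = models.FileField(upload_to="csv/")
        uploaded_at = models.DateTimeField(auto_now_add=True)


    class DataUploadView(APIView):
        # Reject requests without a valid auth token
        permission_classes = [permissions.IsAuthenticated]
        parser_classes = [parsers.MultiPartParser]

        def post(self, request):
            upload = CSVUpload.objects.create(file=request.data["file"])
            return Response({"path": upload.file.name}, status=status.HTTP_201_CREATED)

Because everything goes through MEDIA_ROOT, the same code runs against a local Docker volume under docker-compose and the EFS mount on AWS.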

  • [❌] Set up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline
  • [❌] Update Basecamp roadmap
  • [❌] Add Docker healthchecks for staging release + acceptance + testing environments.
  • [❌] Add security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources
  • [❌] Deploy the database cluster behind private subnets + NAT, and turn off direct SSH access and public IPv4 address mapping
  • [❌] Add SNS topic for CloudFormation deployments.
  • [❌] Update the app dashboard with concrete data, table views, trigger definitions, cron job scheduling, and notification endpoints.
  • [❌] Update landing page with user feedback, and hopefully some screenshots of the dashboard.

It looks like I have most of the DevOps stuff working. It's janky and probably insecure, but it works for now, and I don't think I want to touch it again until I pay for AWS Support (or migrate off of AWS entirely).

Metrics

  • Weeks to launch (primary KPI): 3 (10 weeks after declared KPI of 1 week)
  • Users talked to total: 1

RescueTime statistics

  • 71h 9m (56% productive)
    • 28h 31m “software development”
    • 19h 42m “utilities”
    • 8h 47m “entertainment”
    • 8h 13m “communication & scheduling”
    • 3h 42m “uncategorized”

iPhone screen time (assumed all unproductive)

  • Total: 31h 10m
  • Average: 4h 27m
  • Performance: 12% increase from last week

Hourly journal

https://hourly-journal.yingw787.com

Goals for next week

Scaffold out the MVP:

  • [❓] Add API endpoint for creating SQL tables using form-based POST requests.
  • [❓] Add API endpoint for viewing SQL tables.
  • [❓] Add API endpoint for creating materialized views (see the sketch after this list).
  • [❓] Add API endpoint for viewing materialized views.
  • [❓] Add API endpoint for creating trigger definitions.
  • [❓] Add API endpoint for viewing trigger definitions.
  • [❓] Add API endpoint for creating a scheduled cron job.
  • [❓] Add API endpoint for viewing scheduled cron jobs.
  • [❓] Add WebSocket endpoint for forwarding notifications.
  • [❓] Add dashboard view for viewing SQL tables.
  • [❓] Add dashboard view for creating materialized views.
  • [❓] Add dashboard view for viewing materialized views.
  • [❓] Add dashboard view for creating trigger definitions.
  • [❓] Add dashboard view for viewing trigger definitions.
  • [❓] Add dashboard view for creating a scheduled cron job.
  • [❓] Add dashboard view for viewing scheduled cron jobs.
  • [❓] Add dashboard view for forwarding notifications.
  • [❓] Update landing page with user feedback from Pioneer.app.
  • [❓] Update Basecamp roadmap
  • [❓] Add API endpoint for updating/deleting SQL tables.
  • [❓] Add API endpoint for updating/deleting materialized views.
  • [❓] Add API endpoint for updating/deleting trigger definitions.
  • [❓] Add API endpoint for updating/deleting scheduled cron jobs.
  • [❓] Create custom Django change-password forms, so that password hashes aren't editable in the browser and passwords can be changed through a plaintext form.
  • [❓] Set up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline
  • [❓] Add Docker healthchecks for staging release + acceptance + testing environments.
  • [❓] Add security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources
  • [❓] Deploy the database cluster behind private subnets + NAT, and turn off direct SSH access and public IPv4 address mapping
  • [❓] Add SNS topic for CloudFormation deployments.
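
Most of the table, view, and cron endpoints above should reduce to generating SQL and running it against the pg_cron-enabled PostgreSQL instance. A minimal sketch of that underlying primitive, assuming psycopg2; the connection string and the table/view names are placeholders:

    # Placeholder connection string and object names; pg_cron must be
    # installed and preloaded on the target instance.
    import psycopg2

    conn = psycopg2.connect("dbname=tinydevcrm user=postgres host=localhost")
    conn.autocommit = True

    with conn.cursor() as cur:
        # Create a materialized view over an uploaded table...
        cur.execute(
            "CREATE MATERIALIZED VIEW daily_signups AS "
            "SELECT date_trunc('day', created_at) AS day, count(*) AS signups "
            "FROM events GROUP BY 1"
        )
        # ...and have pg_cron refresh it every morning at 09:00.
        # cron.schedule returns a job ID, which an API could store
        # for later updates/deletes.
        cur.execute(
            "SELECT cron.schedule(%s, %s)",
            ("0 9 * * *", "REFRESH MATERIALIZED VIEW daily_signups"),
        )
        print(cur.fetchone()[0])

The trigger and notification endpoints would presumably layer PostgreSQL triggers and LISTEN/NOTIFY on top of the same connection to feed the WebSocket endpoint.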

Things I've learned this week

  • Persistent volumes are surprisingly annoying. This week I went deeper into EFS/EBS provisioning, and I thought I would be able to provision EBS and EFS volumes purely via CloudFormation and HTTP APIs. Turns out you can't, not completely, if the AMIs you're using don't ship with the right volume drivers: you end up installing the rex-ray plugin yourself through EC2 user data.

    Just take a look at this tutorial. I'm probably going to reference this tutorial among others, but this is what passes for Fn::UserData:

    ContainerInstances:
      Type: AWS::AutoScaling::LaunchConfiguration
      Properties:
        ImageId:
          Ref: ECSAMI
        InstanceType:
          Ref: InstanceType
        IamInstanceProfile:
          Ref: EC2InstanceProfile
        KeyName:
          Ref: KeyName
        AssociatePublicIpAddress: true
        SecurityGroups:
          - Ref: InstanceSecurityGroup
        UserData:
          Fn::Base64:
            Fn::Sub: "#!/bin/bash\nyum install -y aws-cfn-bootstrap\n/opt/aws/bin/cfn-init
              -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ContainerInstances\n/opt/aws/bin/cfn-signal
              -e $? --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSAutoScalingGroup\n\nexec
              2>>/var/log/ecs/ecs-agent-install.log\nset -x\nuntil curl -s http://localhost:51678/v1/metadata\ndo\n
              \  sleep 1\ndone\ndocker plugin install rexray/ebs REXRAY_PREEMPT=true
              EBS_REGION=us-west-2 --grant-all-permissions\nstop ecs \nstart ecs\n"
      Metadata:
        AWS::CloudFormation::Init:
          config:
            packages:
              yum:
                aws-cli: []
                jq: []
                ecs-init: []
            commands:
              01_add_instance_to_cluster:
                command:
                  Fn::Sub: echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
              02_start_ecs_agent:
                command: start ecs
            files:
              "/etc/cfn/cfn-hup.conf":
                mode: 256
                owner: root
                group: root
                content:
                  Fn::Sub: |
                    [main]
                    stack=${AWS::StackId}
                    region=${AWS::Region}
              "/etc/cfn/hooks.d/cfn-auto-reloader.conf":
                content:
                  Fn::Sub: |
                    [cfn-auto-reloader-hook]
                    triggers=post.update
                    path=Resources.ContainerInstances.Metadata.AWS::CloudFormation::Init
                    action=/opt/aws/bin/cfn-init -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ContainerInstances
            services:
              sysvinit:
                cfn-hup:
                  enabled: true
                  ensureRunning: true
                  files:
                    - /etc/cfn/cfn-hup.conf
                    - /etc/cfn/hooks.d/cfn-auto-reloader.conf

    Especially this part:

    UserData:
      Fn::Base64:
        Fn::Sub: "#!/bin/bash\nyum install -y aws-cfn-bootstrap\n/opt/aws/bin/cfn-init
          -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ContainerInstances\n/opt/aws/bin/cfn-signal
          -e $? --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSAutoScalingGroup\n\nexec
          2>>/var/log/ecs/ecs-agent-install.log\nset -x\nuntil curl -s http://localhost:51678/v1/metadata\ndo\n
          \  sleep 1\ndone\ndocker plugin install rexray/ebs REXRAY_PREEMPT=true
          EBS_REGION=us-west-2 --grant-all-permissions\nstop ecs \nstart ecs\n"

    You can imagine that the more services you need to tie together, the more complex this string becomes. Fat-finger it, and you get silent failures that might get picked up in CloudWatch or the EC2 system logs, if you're lucky.

    Seriously, use an AMI definition tool like Packer. That'll be one of my ops priorities after all this is over.

    I think this is probably why S3 is so popular. My understanding is that S3 is primarily an HTTP-based service; at least, I've never interacted with it by provisioning a driver locally. It's all through boto3 and an S3 Python object, which makes it tremendously easy to scale on top of EC2. Of course, the downside is that you can't host your own version of S3, whereas you can host an NFS store, or an interconnect to a block store, yourself, which makes it much easier to model for on-premise use cases.
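
    For contrast, a sketch of what "using" S3 looks like from the application side with boto3 (the bucket and key names are placeholders):

        # Everything goes over HTTP via boto3: no drivers to install,
        # nothing to mount. Bucket and key names are placeholders.
        import boto3

        s3 = boto3.client("s3")
        s3.upload_file("dump.csv", "tinydevcrm-example-bucket", "uploads/dump.csv")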

  • Automated provisioning of AWS network-based resources like EFS suffers from some permissioning issues. I'm using this Docker plugin for EFS to minimize some of the burden, but apparently deleting resources that require a network interface via CloudFormation causes dependency errors. I tried provisioning an EFS volume using the rex-ray Docker volume plugin, and when tearing down the stack, the network interface and security group got caught in a dependency error, so the stack didn't tear down cleanly.
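
    The provisioning step itself is tiny, which is part of what makes the teardown failure so annoying. A sketch of the equivalent call through the Docker SDK for Python ("docker" on PyPI); the volume name is a placeholder:

        # One call; behind it, rex-ray hits the AWS APIs to create the EFS
        # filesystem plus the network interface and security group that
        # later snag the CloudFormation teardown.
        import docker

        client = docker.from_env()
        volume = client.volumes.create(name="tinydevcrm-files", driver="rexray/efs")
        print(volume.name)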

    I think this is another argument for configuring your own AMIs: you can use a local Docker volume driver and mount the drive with a shell command instead of calling an API, which should be more reliable.

Subscribe to my mailing list