TinyDevCRM Update #16: Realtime Streaming + Other Keystones

This is a summary of TinyDevCRM development for the week of May 23rd, 2020 to May 30th, 2020.

Goals from last week

  • [❓] Make all Django models user-aware
  • [❓] Set up channels Django model and be able to successfully curl an HTTP endpoint to get events
  • [❓] Update dashboard to reflect basic creation process
  • [❓] Ship stacks to production
  • [❓] Add API endpoint for viewing SQL tables.
  • [❓] Add API endpoint for viewing materialized views.
  • [❓] Add API endpoint for creating trigger definitions.
  • [❓] Add API endpoint for viewing trigger definitions.
  • [❓] Add API endpoint for viewing scheduled cron jobs.
  • [❓] Add WebSocket endpoint for forwarding notifications.
  • [❓] Add dashboard view for viewing SQL tables.
  • [❓] Add dashboard view for creating materialized views.
  • [❓] Add dashboard view for viewing materialized views.
  • [❓] Add dashboard view for creating trigger definitions.
  • [❓] Add dashboard view for viewing trigger definitions.
  • [❓] Add dashboard view for creating a scheduled cron job.
  • [❓] Add dashboard view for viewing scheduled cron jobs.
  • [❓] Add dashboard view for forwarding notifications.
  • [❓] Update landing page with user feedback from Pioneer.app.
  • [❓] Update Basecamp roadmap
  • [❓] Do a write-up of shipping a Dockerized PostgreSQL instance with PostgreSQL extensions
  • [❓] Add API endpoint for updating/deleting SQL tables.
  • [❓] Add API endpoint for updating/deleting materialized views.
  • [❓] Add API endpoint for updating/deleting trigger definitions.
  • [❓] Add API endpoint for updating/deleting scheduled cron jobs.
  • [❓] Create custom Django change-password forms, so that the password hash field is not writable in the browser, and passwords can be changed via a plaintext form.
  • [❓] Set up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline
  • [❓] Add Docker healthchecks for staging release + acceptance + testing environments.
  • [❓] Connect AWS application to AWS database, and add migration task definition.
  • [❓] Add security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources
  • [❓] Deploy database cluster behind private subnets + NAT instead of a public subnet, and turn off direct SSH access and mapping of EC2 instances to public IPv4 addresses
  • [❓] Add SNS topic for CloudFormation deployments.
  • [❓] Add support for “agents” (“bots”) via authentication, with JWT fine-grained permissioning.

What I got done this week

  • [✔] Make all Django models user-aware

  • [✔] Set up channels Django model and be able to successfully curl an HTTP endpoint to get events

  • [❌] Update dashboard to reflect basic creation process

  • [❌] Ship stacks to production


  • [❌] Add API endpoint for viewing SQL tables.

  • [❌] Add API endpoint for viewing materialized views.

  • [✔] Add API endpoint for creating trigger definitions.

  • [❌] Add API endpoint for viewing trigger definitions.

  • [❌] Add API endpoint for viewing scheduled cron jobs.

  • [✔] Add WebSocket endpoint for forwarding notifications. (This is actually HTTP/2 and Server-Sent Events)


  • [❌] Add dashboard view for viewing SQL tables.

  • [❌] Add dashboard view for creating materialized views.

  • [❌] Add dashboard view for viewing materialized views.

  • [❌] Add dashboard view for creating trigger definitions.

  • [❌] Add dashboard view for viewing trigger definitions.

  • [❌] Add dashboard view for creating a scheduled cron job.

  • [❌] Add dashboard view for viewing scheduled cron jobs.

  • [❌] Add dashboard view for forwarding notifications.


  • [❌] Update landing page with user feedback from Pioneer.app.


  • [❌] Update Basecamp roadmap


  • [❌] Do a write-up of shipping a Dockerized PostgreSQL instance with PostgreSQL extensions


  • [❌] Add API endpoint for updating/deleting SQL tables.

  • [❌] Add API endpoint for updating/deleting materialized views.

  • [❌] Add API endpoint for updating/deleting trigger definitions.

  • [❌] Add API endpoint for updating/deleting scheduled cron jobs.


  • [❌] Create custom Django change-password forms, so that the password hash field is not writable in the browser, and passwords can be changed via a plaintext form.


  • [❌] Set up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline

  • [❌] Add Docker healthchecks for staging release + acceptance + testing environments.

  • [❌] Connect AWS application to AWS database, and add migration task definition.

  • [❌] Add security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources

  • [❌] Deploy database cluster behind private subnets + NAT instead of a public subnet, and turn off direct SSH access and mapping of EC2 instances to public IPv4 addresses

  • [❌] Add SNS topic for CloudFormation deployments.

  • [❌] Add support for “agents” (“bots”) via authentication, with JWT fine-grained permissioning.

  • [❌] Read more about deploying NGINX on AWS using Packer, in order to configure slact/nchan for production use cases; PacktPub has ‘NGINX HTTP Server - Fourth Edition’ available via a paid subscription I already have.

  • [❌] Remove dependency on the GRIP protocol and Fanout/Pushpin by removing dependency django-eventstream and relying on slact/nchan (Likely will not do, as slact/nchan is undergoing a rewrite + migration off of NGINX)

There's a good amount that isn't mentioned here, namely the substeps it took to get Django Channels up and running, and the research that went into weighing different solutions like Pushpin.

Metrics

  • Weeks to launch (primary KPI): 2 (12 weeks after declared KPI of 1 week)
  • Users talked to total: 1

RescueTime statistics

  • 75h 55m (46% productive, 15% decrease from last week)
    • 23h 10m “software development”
    • 19h 44m “utilities”
    • 15h 0m “entertainment”
    • 8h 21m “communication & scheduling”
    • 5h 26m “uncategorized”

This is a pretty scary decline. I need to watch myself this week. Hopefully I can hit 60% productivity this week, though that will really depend on how this Saturday goes (and whether this Sunday is enough to sink me).

iPhone screen time (assumed all unproductive)

  • Total: 28h 16m
  • Average: 4h 2m
  • Performance: 6% increase from last week

This is also pretty scary. Combined, I've spent 104h 11m in front of a screen this past week. I can feel the fatigue in my eyes, and this is definitely not sustainable. My biggest weakness is not holding myself accountable during the weekends and the mornings, and I'm not quite sure how to address that.

Hourly journal

https://hourly-journal.yingw787.com

Goals for next week

One good thing about this week is that I have a much clearer vision of what I need to do in order to ship MVP v0.1 to production.

  • [❓] Update /channels//create/ to create the SQL stored procedures (create trigger, drop trigger, create function, drop function) that issue a notification on a global PostgreSQL channel (wrap within django.db.transaction if possible)
  • [❓] Create Python process to read in psycopg2 asynchronous notifications from global PostgreSQL channel, and upon notify event, send event to Pushpin channel that a refresh has taken place.
  • [❓] Map Python process to a global subprocess that starts when the Django app is started (do NOT create one subprocess per API call).
  • [❓] Finish development-side Dockerization
  • [❓] Begin production Dockerization; pip install daphne and remove uwsgi, and finalize Dockerfile definitions for docker-compose.aws.yaml.
  • [❓] Publish Dockerfile images to Elastic Container Registry.
  • [❓] Update app.yaml CloudFormation stack definition in order to integrate Pushpin reverse proxy.
  • [❓] Re-deploy CloudFormation stack to release Pushpin into production.
  • [❓] Connect AWS application to AWS database, and add migration task definition.
  • [❓] Add back security group restrictions (policies and security group ingress/egress rules) for all CloudFormation resources
  • [❓] Deploy database cluster behind private subnets + NAT instead of a public subnet, and turn off direct SSH access and mapping of EC2 instances to public IPv4 addresses
  • [❓] Add HTTP redirect to HTTPS using ACM-defined *.tinydevcrm.com SSL certificate.
  • [❓] Alias the DNS-qualified name of the app load balancer to api.tinydevcrm.com.
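
As a concrete illustration of the first two steps above, here is a hedged sketch of the SQL a channel-create view might emit so that a trigger + function pair calls pg_notify() on a global channel. The channel name ("tinydevcrm_events") and the naming scheme are assumptions for illustration, not the actual implementation.

```python
# Sketch only: channel name and function/trigger naming are assumptions.

def build_channel_sql(table, channel="tinydevcrm_events"):
    """Return the four statements a channel-create view might run:
    create function, create trigger, drop trigger, drop function.
    The function calls pg_notify() on a global PostgreSQL channel."""
    func = f"{table}_notify"
    trigger = f"{table}_trigger"
    return {
        "create_function": f"""
            CREATE OR REPLACE FUNCTION {func}() RETURNS trigger AS $$
            BEGIN
                PERFORM pg_notify('{channel}', '{table}');
                RETURN NULL;
            END;
            $$ LANGUAGE plpgsql;
        """,
        "create_trigger": f"""
            CREATE TRIGGER {trigger}
            AFTER INSERT OR UPDATE OR DELETE ON {table}
            FOR EACH STATEMENT
            EXECUTE PROCEDURE {func}();
        """,
        "drop_trigger": f"DROP TRIGGER IF EXISTS {trigger} ON {table};",
        "drop_function": f"DROP FUNCTION IF EXISTS {func}();",
    }
```

Executing all four statements inside django.db.transaction.atomic() keeps the trigger and function creation all-or-nothing.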

At this point, MVP v0.1 should be cut for the API, and work should resume on the dashboard.

  • [❓] Update dashboard “Sign In” and “Sign Up” features.
  • [❓] Add dashboard view for creating SQL tables.
  • [❓] Add dashboard view for creating materialized views.
  • [❓] Add dashboard view for creating a scheduled cron job.
  • [❓] Add dashboard view for creating an HTTP/2 SSE channel.

After the dashboard has been completed to satisfaction, add back the following features for the API:

  • [❓] Add API endpoint for viewing all tables on a per-user basis.
  • [❓] Add API endpoint for viewing all materialized views on a per-user basis.
  • [❓] Add API endpoint for viewing all cron jobs (active / inactive) on a per-user basis.
  • [❓] Add API endpoint for viewing all channels (active / inactive) on a per-user basis.
  • [❓] Add API endpoint for viewing a specific table on a per-user basis.
  • [❓] Add API endpoint for updating a specific table on a per-user basis.
  • [❓] Add API endpoint for deleting a specific table on a per-user basis (with cascading drop for all dependent foreign-key resources).
  • [❓] Add API endpoint for viewing a specific materialized view on a per-user basis.
  • [❓] Add API endpoint for updating a specific materialized view on a per-user basis.
  • [❓] Add API endpoint for deleting a specific materialized view on a per-user basis (with cascading drop for all dependent foreign-key resources).
  • [❓] Add API endpoint for viewing a specific scheduled cron job on a per-user basis.
  • [❓] Add API endpoint for updating a specific scheduled cron job on a per-user basis.
  • [❓] Add API endpoint for deleting a specific scheduled cron job on a per-user basis (with cascading drop for all dependent foreign-key resources).
  • [❓] Add API endpoint for viewing a specific HTTP/2 SSE channel on a per-user basis.
  • [❓] Add API endpoint for updating a specific HTTP/2 SSE channel on a per-user basis.
  • [❓] Add API endpoint for deleting a specific HTTP/2 SSE channel on a per-user basis (with cascading drop for all dependent foreign-key resources).

At this point, API v0.2 should be declared, and work on dashboard v0.2 should begin.

  • [❓] Update dashboard to include creation wizard.
  • [❓] Add dashboard view for updating specific tables.
  • [❓] Add dashboard view for deleting specific tables.
  • [❓] Add dashboard view for updating specific materialized views.
  • [❓] Add dashboard view for deleting specific materialized views.
  • [❓] Add dashboard view for updating specific scheduled cron jobs.
  • [❓] Add dashboard view for deleting specific scheduled cron jobs.
  • [❓] Add dashboard view for updating specific HTTP/2 SSE channels.
  • [❓] Add dashboard view for deleting specific HTTP/2 SSE channels.

Miscellaneous tasks:

  • [❓] Update landing page with user feedback from Pioneer.app.

  • [❓] Update Basecamp roadmap

  • [❓] Do a write-up of shipping a Dockerized PostgreSQL instance with PostgreSQL extensions

  • [❓] Do a write-up of shipping TinyDevCRM over the past few months

  • [❓] Create Django custom change password forms, so that passwords do not have write access in the browser, and you can change passwords in a plaintext form.

  • [❓] Get up CI/CD pipelines for test/production deploys with AWS CodeBuild and AWS CodePipeline

  • [❓] Add Docker healthchecks for staging release + acceptance + testing environments.

  • [❓] Add SNS topic for CloudFormation deployments.

Backlog:

  • [❓] Create an ‘etl.tinydevcrm.com’ service to provide a spreadsheet view of data and import handlers, in order to POST a Parquet file to TinyDevCRM (do NOT integrate ETL logic directly into the data layer, as it can get very messy)
  • [❓] Add support for “agents” (bots) via authentication with JWT fine-grained permissioning.
  • [❓] Explore whether Pushpin or nchan have the ability to natively connect to a remote server's channel and read information in without having an orchestration process continuously poll and then push, in order to support much more arbitrary JSON payloads and use multiple processes without encountering the webapp backend bottleneck.
  • [❓] Support remote database connections to a user schema in order to run arbitrary Django migrations on a schema within a database (to have one database with extensions installed yet run compartmentalized migrations).
  • [❓] Use Packer to install Pushpin and PostgreSQL instead of using ECS and Docker, in order to remove Docker as a compile-time and runtime dependency.
  • [❓] Consider using NGINX instead of AWS Elastic Load Balancer for a full-fledged independent solution.
  • [❓] Create PostgreSQL users for each TinyDevCRM user with scoped permissions to their schema only, and create db.tinydevcrm.com as a PostgreSQL connection.
  • [❓] Add backups / export to Parquet & email or upload to S3 Glacier feature
  • [❓] Add ability to run migrations from an API
  • [❓] Add high availability for Pushpin, Django, and PostgreSQL.
  • [❓] Add logging / monitoring to the CloudFormation stack definition.

Things I've learned this week

  • I really like how PostgreSQL creates interfaces for internal components. I found a number of different libraries that interface directly with the ‘sql_parse’ C library from PostgreSQL, which enables you to create a parse tree from an unstructured query and keep it around in memory. The tree exactly matches the parse tree PostgreSQL generates internally, since it is part of PostgreSQL.

    I stumbled on this while looking at validating queries represented as unstructured text without having to execute a database write, and I think building validation on top of the internal query parser looks promising in that regard. Ultimately I went with something simpler (check that the statement begins with a keyword and that there are no semicolons), since this is an MVP for now, and just lifted the database cursor into a specific schema context.
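
    The "something simpler" looks roughly like this; the allowed-keyword list below is an assumption for illustration, not the actual one in the codebase:

```python
# Rough sketch of MVP-level validation: reject semicolons and require the
# statement to start with an expected keyword. Not a real parser.
ALLOWED_KEYWORDS = ("SELECT", "CREATE MATERIALIZED VIEW")  # illustrative

def is_probably_valid(statement):
    """Cheap guard against piggybacked statements like '...; DROP TABLE x'."""
    stripped = statement.strip()
    if ";" in stripped:
        return False
    return stripped.upper().startswith(ALLOWED_KEYWORDS)
```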

  • Plus-addressing in email is a lifesaver. I flip-flopped and thought I needed a username field for my custom user model, which would have broken a lot of stuff in my authentication workflow that I'd have had to rework. It turns out email is nifty: plus-addressing is supported by multiple email providers like Gmail, Outlook, and Fastmail.

    So you can use me+tinydevcrm@yingw787.com, which registers as a separate email from me@yingw787.com with Django's models.EmailField, and auto-filters based on what folders you might have in your email. I really like this because it means less data modeling is necessary on my end. Instead, the user model can rely on protocols and use functionality other people have built out.
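
    A quick demo of the idea, using the addresses from this post:

```python
# The two addresses are distinct strings (hence distinct values for a
# Django models.EmailField), but providers with plus-addressing deliver
# both to the same mailbox.

def base_address(email):
    """Strip the +tag from the local part, mirroring how Gmail-style
    providers route plus-addressed mail."""
    local, _, domain = email.partition("@")
    return f"{local.split('+', 1)[0]}@{domain}"
```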

  • HTTP + PL/pgSQL = friendly RPC. Lifting a function to be callable from a remote server is basically a remote procedure call (RPC)…right? The PostgREST project, which “lifts” PostgreSQL PL/pgSQL functions directly into HTTP endpoints via function overloading, calls those endpoints /rpc. You're hitting an endpoint from somewhere, which then calls a procedure somewhere remotely.

    I like HTTP + PL/pgSQL because everybody knows how to speak it. Fancier protocols like gRPC have their place, especially in an enterprise setting, but they're not as friendly to smaller peeps like me and it's just more stuff I'd have to learn and maintain and can't build on top of. I really like HTTP because there's less of a “walled-garden” mentality with it since the vast majority of devices know how to speak it out of the box, which means lower barriers to shared understanding.
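
    To make the /rpc convention concrete, here is a small sketch of what calling a PL/pgSQL function through a PostgREST-style endpoint amounts to; the base URL and function name are made up for the example:

```python
# Illustration only: PostgREST's convention is POST /rpc/<function> with
# the function arguments as the JSON request body.

def rpc_request(base_url, function, args):
    """Describe the HTTP call that invokes a PL/pgSQL function exposed
    under /rpc."""
    return {
        "method": "POST",
        "url": f"{base_url}/rpc/{function}",
        "json": args,
    }

req = rpc_request("https://api.example.com", "refresh_view", {"view": "leads"})
```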

  • HTTP/2 and Server-Sent Events are nice. After looking at different production architectures a bit more, I think I made the right call with going with HTTP/2 and Server-Sent Events. The primary async, persistent protocol for web apps today is WebSockets, but you may have to install additional software in order to read ws://, and that gets into SDKs which gets pretty messy. Otherwise you need to munge with headers, and headers are arbitrarily variable. Other alternatives, like HTTP long-polling, still require client modifications to support the protocol and keep the response alive. Most importantly, new protocols mean drastically different and much more complex architectures. There are projects like postgres-websockets, but most projects use something like Redis as the buffer. This may result in security and cost issues; I don't think RAM can be encrypted, and RAM costs more than disk does.

    My current understanding of HTTP/2 and Server-Sent Events (SSE) is that they independently issue keep-alive events to keep the channel open, and send information when it arrives. Everything is handled by the reverse proxy, which issues polling requests for data underneath the hood, so the client can be extremely simple. My setup involves curl $ENDPOINT. That's it. That's a major win in my book, since I want to build many different clients (and even backends) connecting to this data layer far into the future. This makes risk management much simpler and more compartmentalized.

    Since HTTP/2 SSE also enforces a protocol guarantee that you can't push by blocking the request, you can guarantee unitary dataflow, which means no merged states and simple, strict consistency, which is what I really wanted from a streaming standpoint. If you wanted to build a directed acyclic graph on top of SSE for various orchestration tasks, you could use something like AgensGraph as a PostgreSQL extension and expose channels when needed, without risking consistency issues. This is great for exploring things like idempotency and reproducibility. The nicest thing is that the dependency I use (django-eventstream) can buffer to PostgreSQL automatically and replay channel events from persistence in case the reverse proxy goes down, so the failure model is compartmentalized.
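
    The wire format itself is simple enough that any line-reading client, including curl, can consume it. Here is a sketch of the frames an SSE server writes to the open response (event names and payloads are illustrative):

```python
# Each SSE frame is a few "field: value" lines terminated by a blank line.

def sse_frame(data, event=None, event_id=None):
    """Serialize one Server-Sent Event frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")  # multi-line payloads get one data: line each
    return "\n".join(lines) + "\n\n"
```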

  • Realtime / async changes your system architecture requirements. Not just the fact that I'm using Docker, I know I can replace that with bare EC2 and Packer. But realtime / async. Realtime / async changes everything.

    This Heroku article aptly illustrates the difference in complexity. Your architecture goes from this:

    [image: synchronous architecture]

    To this:

    [image: asynchronous architecture]

    Which is a lot to take in, and presents a lot of opportunity for confusion.

    For example, I followed slact/nchan, an NGINX extension module for supporting realtime protocols, but there are some recent issues with fighting the NGINX paradigm that forced the maintainer to consider a rewrite as a standalone proxy. This likely isn't the only such case; older efforts like using uwsgi as an async server also ran into issues, which resulted in the creation of ASGI and daphne.

    I currently use Pushpin, which is a reverse proxy catering to realtime APIs. It's great so far, but it introduces a new protocol (GRIP) in order to communicate with the proxy server, and while it's not exposed beyond internal services, I'm a little worried about its evolution and overall adoption outside of Fanout.

    It's hard to get the balance right between wanting to re-use and extend existing services, and absolutely having to include something new. I overcompensated in trying to use what I had, but ultimately those solutions turned out to be stopgaps rather than the permanent solutions we see more often today.

  • Django Channels is not called django-channels. It's called channels. You run pip install channels in order to install Django Channels. DO NOT run pip install django-channels, because adding it to your Django INSTALLED_APPS setting will cause a namespace conflict and an incorrect, silent resolution underneath the hood that prevents the ASGI/Channels development server from running properly.

    This GitHub issue describes the problem; the regular development server will run and any and all dependencies on channels will fail in weird ways (like a Server-Sent Event request not issuing keep-alive and returning an HTTP 200 instead).
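
    A minimal settings.py fragment for the correct setup (the project name "myproject" is hypothetical):

```python
# settings.py fragment: the package AND the app label are both "channels".
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "channels",  # from `pip install channels` — NOT `pip install django-channels`
    # ...
]
ASGI_APPLICATION = "myproject.asgi.application"  # hypothetical project name
```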

  • I wish remote direct channel access were a thing. Currently, what I have in place on the database end is a LISTEN/NOTIFY procedure calling pg_notify() to publish an event on a local channel. This is then fetched by a Python subprocess running on the Django instance, which grabs the data sent by the procedure from the channel on the database instance, munges it, and sends it to the Pushpin reverse proxy for exposure on the HTTP/2 channel.

    This is problematic for a number of reasons. You have to have a secondary process running apart from the master process, which may cause issues with container lifecycle management and add unnecessary inter-process communication, both of which embrittle your system design.

    What I would really want to know is if Pushpin, or other types of reverse proxies, support “remote direct channel access”, where you can listen to a channel on a remote machine, and if necessary, upload a Python blob in order to execute the munging on middleware in front of the reverse proxy instead of keeping it on the webapp. That way, the webapp can be deployed on containers and scale respective to request load keeping “one container one process”, while the reverse proxy can be deployed on direct compute and have multiple processes running.

    It's not a deal-breaker by any means; I can install an HTTP PostgreSQL extension and call the reverse proxy's HTTP endpoint in order to publish information. I've also not looked into ZeroMQ's pub/sub support, which may enable cross-instance channel sharing at the cost of adding more dependencies.
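
    For reference, the bridge subprocess described above can be sketched as follows; the DSN, channel name, and payload shape are assumptions, and the actual forwarding to Pushpin is stubbed out with a print:

```python
import json
import select

def munge_payload(raw):
    """Normalize a pg_notify() payload into the event body forwarded to
    the reverse proxy (shape is an assumption for illustration)."""
    return {"event": "refresh", "payload": raw}

def listen_forever(dsn, channel="tinydevcrm_events"):
    """Bridge process: LISTEN on a global PostgreSQL channel and forward
    each notification. Runs as one subprocess per Django instance, not
    one per API call."""
    import psycopg2  # deferred so the sketch imports without a database
    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute(f"LISTEN {channel};")
    while True:
        # Wait until the connection's socket is readable, then drain
        # any pending notifications.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue  # timeout: loop and wait again
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            # Stand-in for the HTTP POST that publishes to Pushpin:
            print(json.dumps(munge_payload(notify.payload)))
```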

  • Wireshark / tshark doesn't work properly on Docker networks. There were some issues with Pushpin publishing to a channel, and I wanted to see how the HTTP request was constructed within the Docker container. The stack was pretty hard to follow by stepping into the call stack, and it behaved a bit weirdly (like the response being reflected in the stream before I had stepped all the way through), so I decided to run Wireshark (or tshark, the CLI equivalent).

    I thought Dockerization would make network inspection easier, because you're running a virtual network and there are only so many requests coming from one internal IP, but it turns out that's not the case. I don't know how to use Wireshark in a Docker container. tshark indicated a “Cisco remote” capture interface, and couldn't identify eth0, which is what ifconfig reported.

    I think the problem may be ngrok tunnels forcing an alias between the channels django_eventstream creates and the ones it can publish to, because the tunnel aliases the origin. I can publish to Pushpin directly and it shows in the Pushpin logs, but I can't publish to the channel created by django_eventstream; it just doesn't show up. When I publish via django_eventstream.send_event(), it shows up in the endpoint but not in the Pushpin logs.

  • Tracing within dependencies is more difficult with containerization. With local development environments, you can git clone the source and run pip install -e ., then add import ipdb; ipdb.set_trace() in the dependency source files in order to see results. With Dockerization, you need to install the dependency and add the trace by writing them into the Dockerfile, which then has to be cleaned up after you figure out the problem.

  • I'm not sure why ETL is such a big thing. If you had a JDBC foreign data wrapper (FDW) and could connect to any database that supports JDBC and create foreign table references locally, I'm not quite sure why you need a dedicated ETL service. It might be because you want finer-grained control over transforming data, or because it's too difficult to manage the physical resources underneath the hood for a particular driver, but I think FDW drivers are underrated right now. You can even have ingest points for streaming tables using something like bottledwater as a PostgreSQL extension.

  • I think one way or another, I'm going to have to try to monetize TinyDevCRM. I'm not sure I'm fully comfortable moving off of my existing services onto my own cloud unless I have high availability, logging / monitoring, and backups / export capability for TinyDevCRM. That's not just an investment of time; that's actual, additional boxes that need to run consistently, and that means increased operational costs. I need help managing those costs, and that means monetizing the project.

    Sigh. Maybe I'll hold off on the whole HA stuff until I uncomfortably move onto my own bespoke service, and then see who uses it and whether it keels over because too many people are using it. Then probably ask them whether they'd use it for their own purposes.

Subscribe to my mailing list