I had a good time at AWS re:Invent 2017 last week, despite being sick as a dog for most of it. Though I caught fewer sessions than I would have liked, the ones I did attend on serverless topics were top notch. Here are some links to my favorites:
- Serverless foundations
- Web applications
- Data Lakes
- Stream processing
- Operations automation (e.g., Tailor, for automating AWS account creation)
- Excellent review of best practices and new features in Lambda
- Optimization Katas
- Lean Functions
- Eventful Invocations
- Coordinated Calls
- Serviceful Operations
- Cold start issues in Lambda
- Instrumenting Lambda with X-Ray
- Resource allocation
- Concurrency vs. latency
- Compelling customer story from A Cloud Guru’s VP of Engineering on going 100% serverless
- Announced Serverless Application Repository
- Reviewed new Lambda console
- Reviewed new Lambda features
- Reviewed Cloud9 IDE
- Reviewed X-Ray tracing for Lambda
- New API Gateway features
- Compelling customer story from FICO’s VP of Engineering
When using SNS pub/sub components, a common integration pattern is to use Lambda to process SNS messages. This can include the use of data blobs as the SNS payload for doing file processing, data transformations, and archiving data in S3 among other things. SNS messages have a large payload limit of 256KB per message, but I recently ran into a situation where I could not reliably deliver messages that were sized well under that limit.
As it turns out, when Lambda consumes large SNS payloads as events, you hit a limit within Lambda that is exactly half of the SNS payload limit: Event (asynchronous) invocations in Lambda have a 128KB payload limit. So, if your SNS messages are not being processed by Lambda, check the size of the messages and verify that they are below 128KB. This was a confusing problem until I looked at the CloudWatch console for SNS message deliveries and noticed the errors there.
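A quick pre-publish size check can catch this before a message is ever sent. The following is a minimal sketch, assuming the 128KB figure above still applies (check the current Lambda quotas before relying on it); the function and constant names are hypothetical and not part of any AWS SDK.

```python
# Limit cited in this post for asynchronous (Event) invocations; an assumption to verify
LAMBDA_ASYNC_LIMIT_BYTES = 128 * 1024


def payload_size_bytes(message: str) -> int:
    """Return the size of a message body once encoded as UTF-8."""
    return len(message.encode("utf-8"))


def fits_lambda_async_limit(message: str, limit: int = LAMBDA_ASYNC_LIMIT_BYTES) -> bool:
    """Return True if the message body alone is within the given limit."""
    return payload_size_bytes(message) <= limit
```

Note that the size is measured in bytes, not characters, and that the event Lambda receives wraps the message in additional JSON, so it is worth leaving some headroom below the limit.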
I’ve recently been working on a streaming component in a project and have been spending a lot of time with both Kinesis Streams and Kinesis Firehose. I tend to think of the two as event queue frameworks, with Firehose also having the ability to forward events to other AWS services like Elasticsearch (for Kibana-style dashboarding) and back up the same data to an S3 bucket. If you don’t need either of those destinations, then most likely you will get plenty of mileage out of working with Streams alone.
Potential uses abound, but one powerful pattern is making Kinesis a destination for CloudWatch Logs streams via subscription filters. By creating a Kinesis stream and making it a CloudWatch log destination in one account, you can readily add CloudWatch subscription filters in other accounts to create a cross-account log sink. Once your CloudWatch Logs are in one or more Kinesis Streams shards, you can process that log data via Lambda and/or possibly forward to Kinesis Firehose for ES/S3 delivery. There’s a great blog post over at Blend about this exact sort of usage, including a link to their GitHub repo for the CloudFormation templates they use to build and deploy the solution.
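To make the Lambda processing step a bit more concrete: CloudWatch Logs delivers subscription data gzip-compressed, and Lambda hands Kinesis record data over base64-encoded, so a consumer has to undo both layers before it sees log events. Here is a minimal sketch of that decoding; the `handler` function and what it returns are illustrative assumptions, not code from the Blend solution.

```python
import base64
import gzip
import json


def decode_cloudwatch_record(record: dict) -> dict:
    """Decode one Kinesis record from a Lambda event into a CloudWatch Logs payload."""
    raw = base64.b64decode(record["kinesis"]["data"])
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def handler(event, context):
    """Hypothetical Lambda entry point: collect the log messages from all records."""
    messages = []
    for record in event.get("Records", []):
        payload = decode_cloudwatch_record(record)
        # CONTROL_MESSAGE records are health checks; only DATA_MESSAGE carries log events
        if payload.get("messageType") != "DATA_MESSAGE":
            continue
        for log_event in payload.get("logEvents", []):
            messages.append(log_event["message"])
    return messages
```

A real consumer would forward or store the events rather than return them; returning a list just keeps the sketch easy to exercise locally.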
One of the best overviews I’ve read recently of the design and scale-out issues around event queue processing, and of how Kinesis resolves many of those challenges by design (e.g., data duplication, ABA problems), comes from the fine folks over at Instrumental: “Amazon Kinesis: the best event queue you’re not using”. If you are considering using Kinesis at scale, or are already designing/deploying a consumer/producer pattern to be used with Kinesis, I highly recommend you check out the Instrumental blog post.
Python’s logging module provides a powerful framework for adding log statements to code, well beyond what can be done with print() statements. It provides a system of logging levels, similar to syslog-style levels, that can be used to produce both on-screen runtime diagnostics and more detailed logs with full debug-level insight into per module/submodule behavior.
Managing usage of logging() can be complicated, especially around the hierarchical nature of the log streams that it provides. I have developed a simple boto3 script that integrates logging to illustrate a basic usage that is easy to adopt and, in the end, not much more work than using print() statements. For detailed information on logging beyond what I present here, consult the excellent Python docs on the topic, as well as the links in the References section at the end of this post.
The setup for logging() that I am using involves two configuration files, logger_config.yaml and logger_config_debug.yaml. The difference between the two files has to do with the log levels used by the log handlers. By default, the example module deployVpc.py uses the logger_config setup. This config will produce no screen output by default except at the ERROR level and above. It produces a log file, however, that contains messages at the INFO level for the module and at the WARNING level for boto-specific calls.
Note: boto (including botocore) ships with some logging() active at the INFO level. While not as detailed as DEBUG, boto is busy enough at that level that you will likely not want to see its messages except when troubleshooting or debugging your code. This is the approach I took with the current configuration: I set custom logger definitions for boto and friends, so that the root logger will not by default display boto’s native log level messages.
Let’s take a look at the default logging configuration file I’ve put together, logger_config.yaml:
    ---
    version: 1
    disable_existing_loggers: False
    formatters:
      simple:
        format: "%(asctime)s %(levelname)s %(module)s %(message)s"
      fancy:
        format: "%(asctime)s|%(levelname)s|%(module)s.%(funcName)s:%(lineno)-2s|%(message)s"
      debug:
        format: "%(asctime)s|%(levelname)s|%(pathname)s:%(funcName)s:%(lineno)-2s|%(message)s"
    handlers:
      console:
        class: logging.StreamHandler
        level: DEBUG
        formatter: simple
        stream: ext://sys.stdout
      screen:
        class: logging.StreamHandler
        level: ERROR
        formatter: fancy
        stream: ext://sys.stdout
      logfile:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: debug
        filename: "/tmp/deployVpc.log"
        maxBytes: 1000000
        backupCount: 10
        encoding: utf8
    loggers:
      boto:
        level: WARNING
        handlers: [logfile, screen]
        propagate: no
      boto3:
        level: WARNING
        handlers: [logfile, screen]
        propagate: no
      botocore:
        level: WARNING
        handlers: [logfile, screen]
        propagate: no
      deployVpc:
        level: INFO
        handlers: [logfile, screen]
        propagate: no
      __main__:
        level: INFO
        handlers: [logfile, screen]
        propagate: no
    root:
      level: NOTSET
      handlers: [console, logfile]
I chose to use YAML for the configuration file as it’s easier to parse, both visually and programmatically. Python’s file-based configuration (logging.config.fileConfig) uses an INI-style format, but logging.config.dictConfig accepts a plain dictionary, so both JSON and YAML are easily supported.
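For comparison, the same kind of configuration can be loaded from JSON using only the standard library. This is a minimal sketch: the config below is a trimmed, hypothetical stand-in rather than a translation of the full YAML file, and `configure_from_json` is an illustrative name.

```python
import json
import logging
import logging.config

# Hypothetical, trimmed-down config expressed as JSON instead of YAML
CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s %(levelname)s %(module)s %(message)s"}
  },
  "handlers": {
    "console": {
      "class": "logging.StreamHandler",
      "level": "DEBUG",
      "formatter": "simple",
      "stream": "ext://sys.stdout"
    }
  },
  "root": {"level": "INFO", "handlers": ["console"]}
}
"""


def configure_from_json(text: str) -> dict:
    """Parse a JSON logging config and apply it with dictConfig."""
    config = json.loads(text)
    logging.config.dictConfig(config)
    return config
```

The only YAML-specific part of the setup is the parsing step; everything after it is the same dictConfig call.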
At the top of the file is some basic configuration information. Note the disable_existing_loggers setting. This allows us to avoid timing problems with module-level invocation of loggers. When logging per module/submodule, loggers are typically created when those modules are imported early in your main script, before the logging configuration has been loaded. By default, applying a configuration disables any loggers that already exist and are not named in it, so those early loggers would go silent. By setting disable_existing_loggers to False, we avoid that problem.
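The effect of this setting is easy to observe in isolation. The sketch below creates a logger first (standing in for a module-level getLogger() call at import time), applies a bare-bones configuration afterwards, and reports the logger's `disabled` attribute; the function and logger names are made up for the demonstration.

```python
import logging
import logging.config


def disabled_after_config(logger_name: str, disable_existing: bool) -> bool:
    """Create a logger, then apply a config, and report whether the logger got disabled."""
    # Simulates a logger created at import time, before configuration is loaded
    early_logger = logging.getLogger(logger_name)
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": disable_existing,
    })
    return early_logger.disabled
```

Calling it with `disable_existing=True` reports the early logger as disabled, while `False` leaves it enabled, which is the behavior the configuration file relies on.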
The rest of the file consists of four sections:
- formatters
- handlers
- loggers
- root logger definition
Formatters are used to define the log message string format. Here, I am using three different formatters:
- simple – very simple and brief
- fancy – more detail, including function name and line number, for a helpful log entry
- debug – fancy with module pathname instead of module name, useful for boto messages
By default, I leave simple for the console handler (for root logger), use fancy for the screen handler, and debug for the logfile handler.
Handlers are used to define at what level, in what format, and exactly where a particular log message should be generated. I’ve left console in its default configuration, but added a StreamHandler and a RotatingFileHandler. Python’s logging module supports multiple types of handlers including Syslog, SMTP, HTTP, and others. Very flexible and powerful!
- console – used by the root logger
- screen – log ERROR level and above using fancy formatting to the screen/stdout
- logfile – log DEBUG level messages and above using debug formatting to a file in /tmp that gets automatically rotated at 1MB and retention of 10 copies
Loggers are referenced in your code whenever a message is generated. The configuration for a given logger is found in this section of the configuration file. In my case, I wanted a separate logger per module/function if necessary, so I’ve made entries at that level. I also include entries for boto and friends so I can adjust their default log levels so I don’t see their detailed information except when and where I want to (i.e., by logging at WARNING instead of INFO or DEBUG for normal operation). A logger entry also defines where log streams should end up. In this case, I send all streams to both my screen handler and my logfile handler.
I also don’t want custom loggers to propagate messages throughout the logging hierarchy (i.e., up to the root logger). So I’ve set propagate to “no”.
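How logger levels, handler levels, and propagate interact is easier to see in a small, self-contained example. This sketch builds the equivalent of the screen and logfile handlers programmatically, with in-memory buffers standing in for stdout and the rotating log file; the logger name and the shortened format string are assumptions made for the demo.

```python
import io
import logging


def build_demo_logger(name: str = "deployVpc_demo"):
    """Build a logger with a 'screen' (ERROR and above) and a 'logfile' (DEBUG and above) handler.

    StringIO buffers stand in for stdout and the rotating log file.
    """
    screen_buf, logfile_buf = io.StringIO(), io.StringIO()
    formatter = logging.Formatter("%(levelname)s|%(message)s")

    screen = logging.StreamHandler(screen_buf)
    screen.setLevel(logging.ERROR)
    screen.setFormatter(formatter)

    logfile = logging.StreamHandler(logfile_buf)
    logfile.setLevel(logging.DEBUG)
    logfile.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)      # the logger's own level filters records first
    logger.handlers = [screen, logfile]
    logger.propagate = False           # keep records away from the root logger's handlers
    return logger, screen_buf, logfile_buf


logger, screen_buf, logfile_buf = build_demo_logger()
logger.debug("dropped by the logger's INFO level")
logger.info("VPC created")
logger.error("VPC creation failed")
```

After these three calls, the screen buffer holds only the ERROR line, while the logfile buffer holds the INFO and ERROR lines. The DEBUG message never reaches either handler because the logger itself is set to INFO, even though the logfile handler would accept it.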
Implementing logging in code
I created a module called loggerSetup.py which is where I do the initialization for defining how logging() will be configured, via the configuration files:
    #!/usr/bin/env python
    """Setup logging module for use"""

    import os
    import logging
    import logging.config
    import yaml

    home = os.path.expanduser('~')
    logger_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config.yaml"
    logger_debug_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config_debug.yaml"


    def configure(default_path=logger_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
        """Setup logging configuration"""
        path = default_path
        value = os.getenv(env_key, None)
        if value:
            path = value
        if os.path.exists(path):
            with open(path, 'rt') as f:
                config = yaml.safe_load(f.read())
            logging.config.dictConfig(config)
        else:
            logging.basicConfig(level=default_level)


    def configure_debug(default_path=logger_debug_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
        """Setup logging configuration for debugging"""
        path = default_path
        value = os.getenv(env_key, None)
        if value:
            path = value
        if os.path.exists(path):
            with open(path, 'rt') as f:
                config = yaml.safe_load(f.read())
            logging.config.dictConfig(config)
        else:
            logging.basicConfig(level=default_level)
This module defines two functions: configure() and configure_debug(). Having both provides another way of running a non-default logging configuration without using the LOG_CFG environment variable (i.e., on a per-module basis). When you set up logging in your module like so:
    loggerSetup.configure()
    logger = logging.getLogger(__name__)
You would simply edit the first line to use .configure_debug() instead of .configure().
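The other route, the LOG_CFG environment variable, comes down to a single path lookup. The helper below isolates that step as a minimal sketch; `resolve_config_path` is a hypothetical name, not a function in loggerSetup.py, though it follows the same rule as the code above.

```python
import os


def resolve_config_path(default_path: str, env_key: str = "LOG_CFG") -> str:
    """Return the path from the environment variable if it is set and non-empty, else the default."""
    return os.getenv(env_key) or default_path
```

With this rule, exporting LOG_CFG with the path to logger_config_debug.yaml before running the script switches configurations without touching any code.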
Usage is straightforward: simply do the following in each module where you wish to use logging(). Refer to the deployVpc.py script for the full syntax and usage around these bits of code.
Note: deployVpc.py requires use of AWS API key access that is stored in a config profile (I used one called ‘aws-mojo’, change to your own favorite profile). It will create a VPC and Internet Gateway in your AWS account. But it will also, by default, remove those objects as well. Caveat emptor…
- Import the logging modules and loggerSetup module
import logging, logging.config, loggerSetup
- Activate the logging configuration and define your logger for the module
    loggerSetup.configure()
    logger = logging.getLogger(__name__)
Note: By using __name__ instead of a custom logger name, you can easily re-use this setup code in any module.
- Add a logger command to your code using the level of your choice:
logger.info('EC2 Session object created')
That’s all there is to it. Below are some screenshots that show the handler output (screen and logfile) for both the default and debug configurations. Hopefully this will encourage you to look at using Python’s logging() framework for your own projects.
The full source for all of the logging module configuration as well as sample boto script is available over on GitHub in my aws-mojo repository.
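One detail behind the getLogger(__name__) pattern is the dotted-name hierarchy mentioned earlier: a logger named after a submodule becomes a child of the logger named after its package and inherits that logger's level unless it sets its own. A minimal sketch, using hypothetical module names:

```python
import logging

parent = logging.getLogger("deployVpc_hier")          # hypothetical top-level module name
child = logging.getLogger("deployVpc_hier.subnets")   # hypothetical submodule name

parent.setLevel(logging.INFO)

# The child has no level of its own (NOTSET), so it inherits INFO from its parent
inherited_level = child.getEffectiveLevel()
is_child_of_parent = child.parent is parent
```

This is why a single entry for a top-level module name in the configuration file can also govern loggers created in its submodules.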
Example: Default configuration – output to screen handler (should be no output except ERROR and above)
Example: Default configuration – output to logfile handler (should be messages at INFO and above for your code and at WARNING and above for boto library code messaging)
Example: Debug configuration – output to screen handler (should be messages at INFO and above for your code and at WARNING and above for boto library code messaging)
Example: Debug configuration – output to logfile handler (should be messages at DEBUG and all levels for your code and boto library code messaging)
References
- Python Logging Cookbook
- Good logging practice in Python
- Diving into Python logging
- Understanding Python’s logging module
- Logging and the logging module
- Python Logging 101
So, it’s been almost a couple of weeks now of hardcore Visual Studio Code (VSC) usage on my part. I have to say, it’s fantastic. Not a single crash after a solid two weeks of varied development (CloudFormation, Python, shell, and HCL (Terraform)) and at some points, intense Git activity with different repositories and different SCM endpoints. It’s easily 30% more performant than my Atom environment ever was.
The rock solid Git integration is the one feature I appreciate the most. It really works well with everything I do on a regular basis. I did install the GitEasy package just to see if it added anything beyond the built-in support. So far, I only use one GitEasy command reliably, and that’s GitEasy:PushCurrentBranchToOrigin.
I’ve also been able to increase my normal productivity after I installed a marketplace extension called “macros”. I use macros to automate combinations of git commands I often chain together manually (as well as any other keybindings I see fit to construct).
Nice job, Microsoft. I think that’s the first time I’ve said those words after nearly 30 years in technology.
I read a blog post the other day about Visual Studio Code vs. Atom. I was surprised to hear so much positivity about Code, but I confess that may be down to my experience-based deafness to anything extolling the virtues of a Microsoft product. And before you bust my chops on that, note the “experience-based” and accept that I may have a valid stance after over 25 years as a technology professional… but, I digress.
I’ve been using Atom exclusively for about 18 months now, hours upon hours, day after day. I absolutely, and obsessively, love this editor. I have it tweaked and configured perfectly for my workflow and coding style.
I run 99% of my git commands within Atom via the git-plus package, and manage my repos with the Project Manager package. I have both vi/vim and ex capable command shortcuts and keystrokes, all of which reflect decades of motor memory and are very important for my productivity. In fact, I’d say they are critical to it. Not long ago, my particular configuration hit a regression bug in the deprecated built-in vim support for Atom and it brought my productivity down to more of a bad limp in terms of cadence until I was able to migrate to the vim-mode-plus community package as a replacement; I had avoided doing so because that package, until recently, did not integrate with the ex-mode command package I relied on as well. That’s all resolved now, but it sure did create a disturbance in the Force for a bit.
I use lots of other packages as well for linting different languages including Python, Ruby, JSON, CloudFormation, Ansible, and Terraform. I appreciate the easy-on-the-eyes color themes I’ve found; my current combo is Atom Dark for the UI and Gruvbox Plus for Syntax. Atom is just freaking great!
But, hey, I’m all for trying new things, just to say I’ve tried them. Especially when I see a lot of other folks buzzing about something…
Visual Studio Code is blowing me away.
I installed the latest Mac version and have been running it for 24 hours now, side by side with Atom. The interface is nearly identical to Atom. The command keystrokes and palette can be made the same by installing Atom keymap support. There are packages in the “Marketplace”, for free, that give me all the extras I rely on with my configured Atom environment. And, on top of all that, it’s faster and uses fewer system resources. It also has the feel of a true IDE and not just a fancy editor, with built-in debugging facilities, built-in git support, etc.
Now, I’m not about to jump ship completely from Atom. It’s been too good to me for that. But, I’m giving Visual Studio Code a solid trial run. I want to find its shortcomings and compare those with Atom. And then I’ll make a tough decision.
Kudos to you, Microsoft. This may be the best product you’ve ever made.