Logging in Python

Updated: October 6, 2023


Logging in Python is performed through the simple and elegant logging module which comes in the standard Python library for both Python 2 and 3. The logging module enables developers to produce structured log messages and direct those log messages to a variety of outputs including console, files, TCP/UDP socket, syslog, and SMTP emails.

Recommended reference reading for working with logging in Python includes the official Python logging module documentation, the Logging HOWTO, and the Logging Cookbook.

Primary Audience

  • Application developers, particularly those creating services which might be hosted remotely or called by other users.
  • Developers of long-running analysis or batch jobs who want regular in-process status updates or checkpoints saved for later review; in some cases, if an analysis crashes and the appropriate information has been logged up to that point, it may be possible to restart the analysis from the crash point.
  • Anyone developing code which might be hosted on a machine separate from their own and who wants to be able to have logs available for any kind of debugging / performance analysis purposes.

Why Logging?

But I can just print stuff I need to see… why would I bother using a logger?

If you are writing applications which you expect others might use, might run for long periods of time, or are dependent on external services which could cause errors in your application if they are not working properly, chances are you may not be able to watch all your print statements fly across the terminal for hours on end. Or perhaps another user encounters an error while using your service, and replicating the error is simply impossible because you don’t know what all the inputs were, whether an external service was down, or possibly even what the error was. In these cases you’ll want to have logs detailing the use of the application leading up to the crash, hopefully capturing what went wrong and enough parameters of the event to give you insight into how to fix the problem without necessarily having to replicate it. Python loggers can be set up to automatically add contextual information such as line number the log message occurred on, timestamps, function / module, stack traces, and more, all of which can be vitally important when doing post mortems on a failure.

In more advanced use cases one can use retrospective log analysis on structured logs to profile an application and view metrics like how often components of your application are used, identify bottlenecks, and track down areas of the code where bugs seem to occur more often. When running in AWS Lambda, ECS, or Batch (or on EC2 with the CloudWatch agent installed), log messages written to the console are captured in CloudWatch Logs, which can be essential for troubleshooting your code when it executes in the cloud.

Getting Started

The quickest and simplest way to start logging is like this:

import logging
logging.basicConfig()
logging.warning('Blargh')

The logging module’s .basicConfig() method will set up the logging module to log messages of level WARNING and above (i.e. ERROR and CRITICAL) to the console. You can provide a format string (more on Formatter classes and adding contextual information later) to your .basicConfig() call to control how the log messages for this logger will appear. By itself this way of initializing your log facilities has fairly limited capabilities, but it can be further customized using some of the objects and methods described below.
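
For example, a minimal sketch of passing a level and a format string to .basicConfig() (the format string here is just an illustration):

import logging

# Lower the threshold to DEBUG and add a timestamp and level name to each message
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logging.debug('Now DEBUG messages are shown as well')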

Basic Logger

Setting up a basic logger without calling .basicConfig() is very simple.

import logging

logger = logging.getLogger('new_logger')
logger.setLevel(logging.DEBUG)

stream_handler = logging.StreamHandler()
logger.addHandler(stream_handler)

logger.info('Hello world!')

If you run the code above you should see Hello world! output to the terminal, much like a print statement would produce. Since this toy example doesn’t really provide any obvious additional functionality over print, let’s explore what’s happening here.
In the first section a Logger instance is created and its level is set to DEBUG. Next, a StreamHandler is created, which by default will send log messages to the console (STDERR, specifically). Finally, Hello world! is output to the console by invoking the info method on the Logger instance.

Formatting

We can quickly make this basic logger much more interesting by using a Formatter, which adds contextual information to every log message emitted by the Handler it is attached to. The contextual information is added to the string using special keywords within the %()s notation; more on this later.

import logging

logger = logging.getLogger('new_logger')
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s - line %(lineno)s')
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)

logger.addHandler(stream_handler)
logger.info('Hello world!')

Running this code, you should see something like this:
2020-01-23 14:46:29,928 - new_logger - INFO - Hello world! - line 11

Filtering Logs

A particularly helpful aspect of logging statements is that they carry different severity levels, allowing you to filter out log messages below a chosen severity without redeploying the code. For instance, I could use an environment variable or configuration file where the desired log level is set. When the program runs, that value is used to set the logger’s level, meaning all logging calls below that level won’t get reported. As an example, you might want a bunch of .debug() calls around a particularly tricky piece of logic when running in your test environment, but perhaps want to ignore .debug() and only see .error() and above messages in another environment. You could have a LOG_LEVEL environment variable which is then used in the code to set your logger levels to their desired state, without having to touch the code. Example below:

import logging
import os

LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')  # fall back to INFO if the environment variable isn't set

logger = logging.getLogger('new_logger')
logger.setLevel(LOG_LEVEL)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s - line %(lineno)s')
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)

logger.addHandler(stream_handler)
logger.info('Hello world!')

Now let’s examine the different components of a Logger to better understand what’s going on under the hood.

Anatomy of a Logger

Logger

A Logger instance is created when logging.getLogger() is called. Without any arguments this method returns a reference to a global root logger instance, which is stored in the logging module. Providing a name string to getLogger() creates (or retrieves) a child logger of the root logger. Loggers have a concept of log propagation up a chain of related parent-child loggers, which can be useful in more complex situations but will not be covered in depth here. For more information on logger hierarchies and propagation please see the Python Logging Cookbook and / or the Python Logging Docs.
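
As a small sketch of how names define the hierarchy (the logger names below are arbitrary examples):

import logging

root = logging.getLogger()        # the root logger
app = logging.getLogger('app')    # child of the root logger
db = logging.getLogger('app.db')  # child of 'app'; its records propagate up to 'app' and the root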

Handlers

A handler is a critical component of a Logger instance, and provides the machinery for directing log messages to a wide variety of possible output destinations. Some of the more common handlers include StreamHandlers for writing to output streams such as STDOUT and STDERR, and FileHandlers for writing records directly to a file. It is common to want to see the output of your code in the terminal when you run it, but you might also want output logged to a file somewhere in case your project runs on a cron schedule or is run by other remote users. There are a lot of different types of handlers, including email handlers, socket handlers, rotating file handlers, HTTP handlers, and more, but I’m not going into them all here.

The different log handler classes are listed in the logging.handlers documentation.
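
As a small taste of one of those extra handlers, here is a minimal sketch of a RotatingFileHandler (the file name and size limits are arbitrary examples):

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger('new_logger')
# Roll the log file over when it reaches roughly 1 MB, keeping up to 3 old copies
rotating_handler = RotatingFileHandler('app.log', maxBytes=1_000_000, backupCount=3)
logger.addHandler(rotating_handler)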

Formatters

Formatter objects determine the structure that every message will be reported in, and allow you to add contextual information to every log message, such as line number, module name, function name, and date / time. The full list of contextual information you can add to your message by embedding these attribute names in your Formatter string can be found in the LogRecord attributes section of the logging documentation. Some of the attributes I find myself using quite often are %(lineno)d (line number), %(asctime)s (timestamp), %(filename)s, %(funcName)s, and %(module)s.
There are also some handy formatting attributes for reporting what thread or process a message is coming from if you’re working in a concurrent environment.

Once a Formatter object is created, it simply needs to be attached to a Handler instance using the .setFormatter() method, as seen in the earlier example. By attaching different Formatter objects to different handlers, you can provide different levels of detail to different outputs. For instance, maybe I don’t want the module, function, and file name cluttering up my console, so I could attach a formatter to the console handler which doesn’t include those attributes, but I want my file handler to record as much information as possible so I can better reconstruct events when doing retrospectives (example below).

import logging
logger = logging.getLogger('new_logger')
logger.setLevel(logging.DEBUG)
file_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - line %(lineno)s - module %(module)s - function %(funcName)s - file %(filename)s')  # File record formatter provides lots of information, better for automated log analysis and post-mortems
file_handler = logging.FileHandler(filename='verbose_logger.log', mode='a')  # define where the log will be written.  mode parameter will determine whether to append to log if it exists ('a') or write over file ('w'). 
file_handler.setFormatter(file_formatter)

stream_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - line %(lineno)s')
stream_console_handler = logging.StreamHandler()
stream_console_handler.setFormatter(stream_formatter) # Stream formatter carries less information to clutter up terminal less
logger.addHandler(stream_console_handler) # Now log messages recorded by logger will be printed to console
logger.addHandler(file_handler)  # Now log messages recorded by logger will be recorded to file specified in logging.FileHandler() call

Filters

Filters can be attached to Loggers or Handlers and provide finer-grained control over which log records are emitted, based on where a record originates in the logger hierarchy. We didn’t go through logger hierarchies in much detail, but say we have a logger hierarchy like:

- base_logger  
- - module_logger  
- - debug_logger

where the parent-child relationships of our loggers can be described as base_logger.module_logger and base_logger.debug_logger. A Filter object can be attached to a Logger instance in order to prevent it from emitting messages from specific parts of the logger hierarchy. Filter objects can also be attached to specific handlers for more nuanced control of which log records are sent to a particular output location.
For instance, if we wanted messages from our module_logger to reach the base_logger and its handlers, but did not want any messages from debug_logger showing up there, a Filter object could be attached to the base_logger’s handler specifying that only records from the module_logger branch should pass.
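
A minimal sketch of that idea, using the hierarchy above (note that filters attached to a handler are consulted even for records propagated up from child loggers):

import logging

base_logger = logging.getLogger('base_logger')
handler = logging.StreamHandler()
# logging.Filter(name) lets through only records from that logger or its descendants,
# so records from 'base_logger.debug_logger' are dropped by this handler
handler.addFilter(logging.Filter('base_logger.module_logger'))
base_logger.addHandler(handler)

logging.getLogger('base_logger.module_logger').warning('this record is emitted')
logging.getLogger('base_logger.debug_logger').warning('this record is filtered out')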

Log levels

Both Logger and Handler instances have an attribute called .level, which determines the minimum severity of messages that will be processed. Each log level is represented by an integer internally, although levels can be referenced in a variety of ways. The following table shows the built-in log levels.

Level      Integer Representation   String Representation
DEBUG      10                       ‘DEBUG’
INFO       20                       ‘INFO’
WARNING    30                       ‘WARNING’
ERROR      40                       ‘ERROR’
CRITICAL   50                       ‘CRITICAL’

It is also possible to create your own log levels, a process which is described in the Logging Cookbook.
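
A quick sketch of the different ways a level can be referenced:

import logging

logger = logging.getLogger('new_logger')
logger.setLevel(logging.DEBUG)    # module-level constant (the integer 10)
logger.setLevel(20)               # a raw integer, equivalent to logging.INFO
logger.setLevel('WARNING')        # the level name as a string also works
print(logging.getLevelName(40))   # 'ERROR' - map an integer back to its name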

Where/When to Implement Logging

I like to log early and log often, especially during the development phase of a project. In particular, I often find myself logging the responses to remote calls so I always have a reference in my log of what the response looked like before I start processing it. When writing REST APIs I will often log the raw request prior to any processing, so that when something goes wrong I can look at what data I started with and investigate from there. I’ve found logging to be invaluable for debugging stateless applications in AWS, since log messages emitted to STDOUT are picked up by CloudWatch and I can log state at various points in the application to better understand what might have gone wrong. In most of my projects I create a function early on that creates or retrieves a pre-configured Logger instance, and use it constantly. I then use environment variables to set the log level of the service to reduce log message volume once the service is more stable.

A simple log function that I often keep in the src/__init__.py file of my projects looks something like this:

import logging
def get_logger(log_name: str = None, stream_logging_level: int = logging.INFO, file_handler: bool = False,
               log_file_path: str = ''):
    """ Create a logging instance that will log to console (or Cloudwatch if running on AWS), with an optional file handler.
        Logging levels can also be set using the top-level constants provided in the logging module, or an integer value
        corresponding to the desired logging level as described in the documentation.  When file logging is enabled logs
        will be printed in the /logs subdirectory using the file name 'registrar_log_{YYYYMMDD}.txt', where YYYYMMDD
        represents a day of log captures.
    """
    if not log_name:
        log_name = 'base_logger'
    
    logger = logging.getLogger(log_name)

    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(stream_logging_level)

    formatter = logging.Formatter(
        '[%(levelname)s] | %(asctime)s | %(message)s | Function: %(module)s.%(funcName)s | lineNumber: %(lineno)s |')
    ch.setFormatter(formatter)
    logger.addHandler(ch)  # Now any messages at or above the stream handler's level will be sent to the console

    # Add file handler if requested by caller - could be set by an environment variable as well.
    if file_handler:
        if not log_file_path:
            raise ValueError("file_handler is True but no log_file_path was provided")

        fh = logging.FileHandler(filename=log_file_path)
        fh.setLevel(logging.INFO)
        fh.setFormatter(formatter)
        logger.addHandler(fh)  # Now any messages recorded with logger.info() or higher will be sent to the console via
                               # the earlier-attached stream handler, as well as to a file by the newly attached FileHandler.
    
    return logger
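
A usage sketch of the helper above (the logger name and file path are just illustrative examples):

logger = get_logger('my_service', stream_logging_level=logging.DEBUG,
                    file_handler=True, log_file_path='my_service.log')
logger.debug('shown on the console because the stream handler level is DEBUG')
logger.info('recorded to both the console and my_service.log')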

Logging and AWS

As mentioned before, Python logging instances will emit their log records to an AWS CloudWatch log group / stream for most AWS services (I’ve seen it with EC2, ECS, Batch, and Lambda so far). By leveraging structured logging through Python’s custom log formatting, you can start to use CloudWatch Insights to query across log groups for messages pertaining to any of the attributes you’re logging.

Log messages are presented in a pretty nice format on CloudWatch:
(screenshot: CloudWatch Logs console)

While it’s not necessary in order to produce log messages to CloudWatch, there is a project called WatchTower which can really help consolidate your logs into particular user-story focused log streams; these can track logs from across multiple services if they all use WatchTower to point to the same log group / stream. I’ve found WatchTower helpful for consolidating Lambda logs, since by default a new stream is created for every new Lambda instance, which can get really confusing, particularly if you have concurrent executions. I have had some issues (which may have more to do with how I’m creating the WatchTower handlers and my application structure) with some functions hitting AWS service limits on API calls like Create/DescribeLogGroups and Create/DescribeLogStreams, which can affect responsiveness and at worst can crash the Lambda call if the exceptions aren’t handled properly. This is particularly the case when the service is under high load. If someone has another solution for this I’d love to hear about it!
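
A hedged sketch of wiring up a WatchTower handler (this assumes a recent watchtower release; the keyword argument names have changed between versions, and the group / stream names below are hypothetical):

import logging
import watchtower

logger = logging.getLogger('my_service')
logger.setLevel(logging.INFO)
# Ship records for this logger to a single shared CloudWatch log group / stream
logger.addHandler(watchtower.CloudWatchLogHandler(
    log_group_name='my-shared-log-group',   # hypothetical shared group
    log_stream_name='user-signup-flow',     # hypothetical user-story focused stream
))
logger.info('this record is shipped to CloudWatch by the WatchTower handler')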
