What is the best practice for formatting logs?

I'm writing a piece of honeypot software that will log its interactions extensively, and I plan to log to plaintext .log files.

I have two questions, as someone who isn't too familiar with how servers log.

  1. First, how should I break up my log files? I'm assuming that after running this for a month I don't want one big .log file. Do I split by day, month, or year? Is there a standard for it?

  2. For the format of each line, do I pick one standard delimiter, whatever it is (*, -, +, anything)? Is there a standard anywhere? (My googling hasn't brought up much.)

Best answer

I like this format for log files:

$ python simple_logging_module.py
2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
2005-03-19 15:10:26,620 - simple_example - INFO - info message
2005-03-19 15:10:26,695 - simple_example - WARNING - warn message
2005-03-19 15:10:26,697 - simple_example - ERROR - error message
2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message

This is from Python's logging module. I usually have one file per day, one folder for each month, and one folder for each year. Otherwise you'll get huge log files that you can't edit properly.

logs/
  2009/
    January/
     01012009.log
     02012009.log
     ...
    February/
     ...
  2008/
   ...
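
A minimal sketch of how that line format and folder layout could be produced with Python's standard logging module. The helper names daily_log_path and make_logger and the "logs" root are assumptions for illustration; the format string is the one that yields lines like those shown above.

```python
import logging
from datetime import date
from pathlib import Path


def daily_log_path(root, day):
    """Return root/YYYY/MonthName/DDMMYYYY.log (hypothetical helper).

    The month folder name comes from strftime("%B"), so it follows the
    process locale (English names in the default "C" locale).
    """
    return (Path(root) / day.strftime("%Y") / day.strftime("%B")
            / day.strftime("%d%m%Y.log"))


def make_logger(name, root="logs", day=None):
    """Create a logger that appends to the file for the given day."""
    day = day or date.today()
    path = daily_log_path(root, day)
    path.parent.mkdir(parents=True, exist_ok=True)

    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger
```

Note that this sketch picks the file once, when the logger is created; a long-running process would have to switch files at midnight itself or use logging.handlers.TimedRotatingFileHandler.
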
Answers

There is no standard for this kind of logging. Rolling and file layout all depend on what you need. In general I have come across 3 main scenarios:

  • All in one file. That does not seem to be an option for you.
  • Fixed-size rolling. You define a size, and a new log file is created once the current file grows beyond it. Most log4anything packages support this out of the box.
  • Fully custom rolling. I've seen layouts like this:
    • Every day gets its own directory named in the format YYYYMMDD. If you don't stage your logs, consider a directory layout like YYYY/MM/YYYYMMDD, as shown in other answers.
    • Inside this directory fixed size rolling should be used.
    • Every file is named logfile_yyyymmdd_ccc.log, where ccc is an increasing number. Adding the time to the file name is also a good idea (e.g. to easily judge how many logs per minute you are generating).
    • To save space, every log is compressed with zip automatically.
    • The last 3 days are always kept uncompressed so you have quick access with UNIX text tools.

This custom layout looked like this:

logs/
  20090101/
     logfile_20090101_001.zip
     logfile_20090101_002.zip
     ...
  20090102/
     logfile_20090102_001.zip
     logfile_20090102_002.zip
   logfile_20090101_001.log
   logfile_20090101_002.log
   logfile_20090102_001.log
   logfile_20090102_002.log
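
As a rough sketch of fixed-size rolling with automatic zip compression, Python's RotatingFileHandler can be combined with its namer and rotator hooks. This is a simplification of the layout above, not a reproduction of it: it compresses every rotated file immediately (nothing is kept uncompressed for 3 days) and uses the handler's own numeric suffixes (logfile.log.1.zip) instead of logfile_yyyymmdd_ccc. The function names and the size limits are assumptions.

```python
import logging
import logging.handlers
import os
import zipfile


def zip_namer(default_name):
    """Give rotated files a .zip suffix, e.g. logfile.log.1.zip."""
    return default_name + ".zip"


def zip_rotator(source, dest):
    """Compress the just-closed log file into dest, then drop the original."""
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(source, arcname=os.path.basename(source))
    os.remove(source)


def make_rolling_handler(path, max_bytes=1_000_000, backups=5):
    """Roll over once the file would exceed max_bytes; keep `backups` archives."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8")
    handler.namer = zip_namer
    handler.rotator = zip_rotator
    return handler
```

Placing the handler's file inside a per-day directory would get closer to the directory tree shown above.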

There are also a number of good practices for logging:

  • Always keep the date in your log file name.
  • Always add some name to your log file name. It will help you in the future to distinguish log files from different instances of your system.
  • Always log the time and date (preferably with millisecond resolution) for every log event.
  • Always store your dates as YYYYMMDD. Everywhere: in the file name and inside the log file. It greatly helps with sorting. Some separators are allowed (e.g. 2009-11-29).
  • In general, avoid storing logs in a database. It is another point of failure in your logging scheme.
  • If you have a multithreaded system, always log the thread ID.
  • If you have a multi-process system, always log the process ID.
  • If you have many computers, always log the computer ID.
  • Make sure you can process the logs later. Just try importing one log file into a database or Excel. If it takes longer than 30 seconds, your logging is wrong. This includes:
    • Choosing a good internal format for logging. I prefer space-delimited, since it works nicely with Unix text tools and with Excel.
    • Choosing a good format for date/time so you can easily import into an SQL database or Excel for further processing.
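
Several of these points can be covered by one format string in Python's logging module. The sketch below is only one possible column layout (date, time with milliseconds, host, process ID, thread ID, logger name, level, message), space-delimited; HostFilter and make_handler are hypothetical names.

```python
import logging
import socket

HOSTNAME = socket.gethostname()


class HostFilter(logging.Filter):
    """Attach the computer name to every record as `host`."""

    def filter(self, record):
        record.host = HOSTNAME
        return True


# Space-delimited columns; the free-text message comes last so that
# spaces inside it do not shift the earlier columns.
FORMATTER = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d %(host)s %(process)d %(thread)d "
        "%(name)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)


def make_handler(stream):
    """Build a handler that writes the format above to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(HostFilter())
    handler.setFormatter(FORMATTER)
    return handler
```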

To break up your log files, you could use an external application like logrotate and let it take care of the dirty work.

As for the format of each line, there's no standard, so you should use what works best for you. If you're going to parse the log file automatically later, keep that in mind as you format the log output.

I recommend you use a well-known logging library. Most logging libraries support rollover for you. Log4Net (.NET) / Log4J (Java) is a particularly good logging library to use, and it has a lot of options that you may find useful. Use whatever rollover interval works best for you. For a honeypot application, I think you will find hourly or daily turnover works best. You could also use a fixed size limit, like 256 MB, to ensure that your logs don't fill up the available free disk space. Log4Net/Log4J supports this as well.

Log4J @ Apache.Org
Log4Net @ Apache.Org

The format of your log files should be set up according to your needs. It is highly desirable to use a delimiter that is unlikely to show up in your log input. For your application, this may not be possible. Under typical circumstances, some parties use spaces (NCSA logs), some use commas (to make CSV files), and some use tabs (to make tab-delimited files). Each of these has its own benefits and drawbacks.
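
When the delimiter cannot be kept out of the input (likely for a honeypot, where the attacker controls the payload), one option is to let a CSV writer quote the fields for you. A small sketch using Python's csv module; format_row and parse_row are hypothetical helper names.

```python
import csv
import io


def format_row(fields):
    """Serialize one log event as a CSV line, quoting fields where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def parse_row(line):
    """Parse a line written by format_row back into its fields."""
    return next(csv.reader([line]))
```

Fields containing commas or double quotes survive the round trip unchanged. Raw newlines in the input would still need to be escaped separately if you want exactly one event per line.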

Today (2022) we love structured logging and log indexers (like Elasticsearch or Loki), so we log in NDJSON (newline-delimited JSON).
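
A minimal sketch of NDJSON output with Python's standard logging module: one JSON object per event, one event per line. The class name NdjsonFormatter and the key names (ts, level, logger, message) are assumptions; a log indexer may expect different field names.

```python
import json
import logging
from datetime import datetime, timezone


class NdjsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    def format(self, record):
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc)
                          .isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            event["exception"] = self.formatException(record.exc_info)
        # json.dumps escapes embedded newlines, so the event stays on one line.
        return json.dumps(event, ensure_ascii=False)
```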

Because log delivery agents have difficulty with log file rotation (who loves missing log events?), we avoid rotation (no more logrotate!). Instead, we name files using a date pattern or an auto-incremented sequence and define a policy to remove outdated files.
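
One way this could look, as a sketch only: files are named with a date pattern, never renamed, and a cleanup step deletes those older than a retention window. The function names, the prefix_YYYYMMDD.log pattern, and the keep_days parameter are assumptions.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def log_path_for(directory, prefix, day):
    """Return directory/prefix_YYYYMMDD.log for the given day."""
    return Path(directory) / f"{prefix}_{day:%Y%m%d}.log"


def remove_outdated(directory, prefix, keep_days, today=None):
    """Delete dated log files older than keep_days; return the removed names."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(Path(directory).glob(f"{prefix}_*.log")):
        stamp = path.stem[len(prefix) + 1:]
        try:
            file_day = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # not one of our dated files, leave it alone
        if file_day < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```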

Don't work with individual files! Use centralized log search engines!

The Common Log Format (https://en.wikipedia.org/wiki/Common_Log_Format) is a thing of the past.

A suggestion:

Since this is for a honeypot system (and unless the baddies are really hammering the application/site), you might consider taking the extra time to log to a database instead.

This will make analysis and use of the logs easier, and real-time (i.e. you do not need to go through an ETL process before analyzing or browsing the logs).

That said, whether the data goes into DB table(s) or file(s), you still need to define a format. Tentatively, you can have a "polymorphic" format, with a few common attributes (ID, IP address, timestamp, cookie/ID, "level" [of importance/urgency]) followed by a short mnemonic code defining a particular event type (say "LIA" = login attempt, "GURL" = guessed URL, "SQLI" = SQL injection attempt, etc.), followed by a few numeric fields and a few string fields whose semantics vary with the mnemonic. To summarize:

 - Id
 - TimeStamp  (maybe split in date and time)
 - IP_Address
 - UserID_of_sorts
 - // other generic/common fields that you may think of
 - EventCode   (LIA, GURL, SQLI...)
 - Message   Text message (varies with particular event instance)
 - Int1      // Numbers...
 - Int2
 - Str1      // ...and text which meaning varies with the EventCode
 - Str2
 - //... ?
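
To make the field list concrete, here is a sketch of how it might map onto a single table, using SQLite from Python's standard library. The table name, column names, and column types are assumptions derived from the list above, and the IP address in the usage note is a documentation-range placeholder.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   TEXT NOT NULL,
    ip_address  TEXT NOT NULL,
    user_id     TEXT,
    event_code  TEXT NOT NULL,   -- LIA, GURL, SQLI, ...
    message     TEXT,
    int1        INTEGER,         -- meaning depends on event_code
    int2        INTEGER,
    str1        TEXT,            -- meaning depends on event_code
    str2        TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the event store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_event(conn, ip_address, event_code, message="", user_id=None,
              int1=None, int2=None, str1=None, str2=None, timestamp=None):
    """Insert one event and return its row id."""
    timestamp = timestamp or datetime.now(timezone.utc).isoformat(
        timespec="milliseconds")
    cur = conn.execute(
        "INSERT INTO events (timestamp, ip_address, user_id, event_code,"
        " message, int1, int2, str1, str2) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (timestamp, ip_address, user_id, event_code, message,
         int1, int2, str1, str2))
    conn.commit()
    return cur.lastrowid
```

For example, log_event(conn, "203.0.113.7", "LIA", message="login attempt", str1="admin", int1=3) could record a third failed login for the user name "admin", with str1 and int1 interpreted according to the "LIA" code.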

Now... regardless of whether this goes to a flat file or to a SQL database (and maybe particularly if it goes to a DB), you could/should use a standard logging library. Maybe log4j, as suggested in other replies (although I'm not sure it readily has bindings in Python), or Python's standard library logging module, which is more or less the same and can probably be tailored to your needs.

In my opinion, the most important points are:

Log File

  1. Make Log Entries Meaningful With Context
  2. Use a Standard Date and Time Format
  3. Use Local Time + Offset for Your Timestamps
  4. Use Logging Levels Correctly
  5. Split Your Logging to Different Targets Based on Their Granularity
  6. Include the Stack Trace When Logging an Exception
  7. Include the Name of the Thread When Logging From a Multi-Threaded Application
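
A short sketch of points 2, 3, 6 and 7 with Python's logging module: an ISO 8601 timestamp in local time with its UTC offset, the thread name on every line, and the stack trace attached when an exception is logged. OffsetFormatter and make_handler are hypothetical names.

```python
import logging
from datetime import datetime


class OffsetFormatter(logging.Formatter):
    """Format timestamps as local ISO 8601 time with the UTC offset."""

    def formatTime(self, record, datefmt=None):
        local = datetime.fromtimestamp(record.created).astimezone()
        return local.isoformat(timespec="milliseconds")


def make_handler(stream):
    """Build a handler that writes timestamp, thread, level and message."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(OffsetFormatter(
        "%(asctime)s %(threadName)s %(levelname)s %(message)s"))
    return handler
```

Calling logger.exception("...") inside an except block (or passing exc_info=True) is what makes the formatter append the traceback after the message line.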



