
🧪 Using YARA for log analysis

An initial exploration of how useful a SAST tool like YARA could be for analyzing Gitaly and Redis logs when troubleshooting and performing systems administration tasks.

🌊 Hello, world!

I first started using static analysis tools like yara when I worked as an Information Security Analyst. Since leaving that role, the nature of my work has changed a bit but I have always found myself wondering:

  • 🤔 Can static analysis tools like semgrep and yara be useful for helping support engineers to solve certain kinds of problems?

YARA is primarily aimed at malware researchers but is broadly useful enough to be applicable for a range of other purposes. In this post, I am going to 🧪 experiment a bit and share thoughts and considerations for using YARA to ease log analysis. I’m going to be analyzing logs generated from one of my GitLab instances with a focus on the gitaly and redis services.

  • 🔖 Get your copy of the YARA rules I put together to accompany this post: 🖇 brie/yara-gitlab-logs.

🔑 Key Findings

If you do consider adopting YARA rules for log analysis on your team, consider:

  • 🗂 YARA permits you to organize rules by tags. With care, you can build a robust ruleset that supports a broad range of log analysis purposes. For example, you could organize your rules by service name (nginx, redis, postgres, etc.) and by rule purpose (health checks, incident response, troubleshooting, things that are safe to ignore, etc.).
    • Because YARA rules are simple text files, they can be versioned and your team can work together on maintaining and improving the ruleset.
    • Plan out your organizational scheme for your rules.
  • 👋 I would strongly recommend using regular expressions (instead of text strings) when writing YARA rules. (You get better matches this way: I explain why below.)
  • ⚖️ Remember the 80/20 rule. The idea here isn’t to generate a set of rules that captures every possible scenario. The goal is to get the most value for the least effort by writing the most impactful rules first. Consider your team’s tolerance for false positives and false negatives.
  • 📊 I did not take performance into consideration at this time. Depending on how much data you’re analyzing, the resources of the machine you’re running YARA on, your patience and how many rules you assemble, you may wish to take performance into account. To get started, take a look at:
  • 😭 If you include emoji in the metadata for a YARA rule, they will not be rendered properly.
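The tag-based organization from the first finding can be sketched as follows. This is a minimal, hypothetical example and not a rule from the brie/yara-gitlab-logs repository: the rule name, the tag names (REDIS for the service, HEALTH for the purpose) and the sample log line are all assumptions made for illustration.

```shell
set -eu
workdir=$(mktemp -d)

# Hypothetical rule: tags follow the rule name after a colon.
cat > "$workdir/redis_tagged.yara" <<'EOF'
// REDIS = service tag, HEALTH = purpose tag (illustrative names)
rule redisReadyTagged : REDIS HEALTH
{
    meta:
        description = "Redis reports that it is ready to accept connections"
    strings:
        $ready = /Ready to accept connections.*/
    condition:
        $ready
}
EOF

# One made-up log line to scan.
printf '%s\n' '1234:M 01 Jan 2022 00:00:00.000 * Ready to accept connections' \
    > "$workdir/current"

# -t TAG prints only rules carrying that tag; -s prints the matching strings.
if command -v yara >/dev/null 2>&1; then
    yara -s -t HEALTH "$workdir/redis_tagged.yara" "$workdir/current"
else
    echo "yara is not installed; rule written to $workdir/redis_tagged.yara"
fi
```

Because the tags live in the rule file itself, the same ruleset can be sliced by service (-t REDIS) or by purpose (-t HEALTH) without moving files around.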

🎋 YARA’s Suitability for Log Analysis

🧮 Use regular expressions in your rules

YARA supports regular expressions. Since version 2.0 (released in 2014), YARA uses its own regex flavor that implements most PCRE features. One really good reason to use regular expressions (instead of text strings) for log analysis is that the match can extend past the literal text, so you see the rest of the log line that matched the rule. To illustrate, let’s say we want a rule that tracks when the Repository Counter in the Gitaly service starts and stops counting repositories. Here is how the matches differ depending on whether the rule uses text strings or regular expressions. A rule that uses text strings would work like this:

# yara -s -t CONFIG --recursive \
    rules/gitaly/gitaly_storage_paths_text.yara /var/log/gitlab/gitaly
gitalyStoragePathText /var/log/gitlab/gitaly/current
0x603e:$start_count01: starting to count

By comparison, a rule that uses regular expressions would work like this:

# yara -s -t CONFIG --recursive \
    yara-gitlab-logs/rules/gitaly/gitaly_storage_paths.yara \
    /var/log/gitlab/gitaly
gitalyStoragePath /var/log/gitlab/gitaly/current
0x603e:$start_count: starting to count repositories","pid":1758,\
0x61fe:$complete_count: completed counting repositories",\
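To make the difference concrete, here is a hedged sketch of what the two kinds of rule could look like. These are not the rules from the repository: the rule names ending in Sketch, the exact regular expressions and the two sample log lines are assumptions. The sketch relies on the fact that "." in a YARA regular expression does not match a newline unless the s modifier is used, so ".*" stops at the end of the log line.

```shell
set -eu
workdir=$(mktemp -d)

# Two hypothetical rules: one with a text string, one with regular expressions.
cat > "$workdir/storage_paths_sketch.yara" <<'EOF'
rule storagePathTextSketch : CONFIG
{
    strings:
        // Text string: -s reports only these literal bytes.
        $start_count01 = "starting to count"
    condition:
        any of them
}

rule storagePathRegexSketch : CONFIG
{
    strings:
        // Regex: ".*" extends each match to the end of the log line.
        $start_count = /starting to count.*/
        $complete_count = /completed counting.*/
    condition:
        any of them
}
EOF

# Made-up Gitaly-style JSON log lines.
cat > "$workdir/current" <<'EOF'
{"level":"info","msg":"starting to count repositories","pid":1758}
{"level":"info","msg":"completed counting repositories","pid":1758}
EOF

if command -v yara >/dev/null 2>&1; then
    yara -s -t CONFIG "$workdir/storage_paths_sketch.yara" "$workdir/current"
else
    echo "yara is not installed; rules written to $workdir/storage_paths_sketch.yara"
fi
```

With yara installed, the text rule should report only the literal string, while the regex rule should report the remainder of each line, which is usually where the useful context (pid, paths, error details) lives.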

(When using hex strings, YARA supports wildcards and the not operator.)

ℹ️ Use the metadata

One of the nice things about YARA is the -m flag. With -m or --print-meta, YARA will print information about the rule that is matched. With a well-maintained ruleset, this can be a great way to link error messages and other log entries directly with relevant information (including links to the documentation, troubleshooting guides and the ✨ very source code ✨ that generated said error). Here’s an example to illustrate:

Let’s say we are looking through the logs from a GitLab instance as we troubleshoot Redis and we want to find out how often the redis service declares itself ready. On a healthy system, there shouldn’t be too many matches for the $ready_to_accept string in the redisNowReady rule. Here’s what the support engineer would see as they run YARA (and add a bit of 🪄 CLI magic):

 # yara -m yara-gitlab-logs/rules/redis/redis_ready.yara \
     gitlabsos/var/log/gitlab/redis/current-manyaccept \
     | grep '\[' | cut -d"[" -f2- | cut -d"]" -f1 \
     | gsed 's/,/\n/g'
author="Brie Carranza"
description="Check the Redis version and whether it is up and ready to accept connections"
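The post-processing step can be tried without a GitLab instance. The sketch below feeds the same grep and cut stages a hand-written line in the shape that the pipeline expects from yara -m (rule name, bracketed metadata, file path). The sample line is an assumption assembled from the output shown above, and tr stands in for gsed so the sketch also runs where GNU sed is not installed.

```shell
set -eu

# Hand-written sample in the "rule [meta] file" shape the pipeline expects.
sample='redisNowReady [author="Brie Carranza",description="Check the Redis version and whether it is up and ready to accept connections"] gitlabsos/var/log/gitlab/redis/current-manyaccept'

# Keep only the text between the brackets, then put each key=value pair on its own line.
meta=$(printf '%s\n' "$sample" \
    | grep '\[' | cut -d'[' -f2- | cut -d']' -f1 \
    | tr ',' '\n')

printf '%s\n' "$meta"
```

One caveat of splitting on commas: a metadata value that itself contains a comma would be broken across lines, so this approach works best when descriptions avoid commas.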

🎬 Demo Time

WATCH checking Redis version and readiness:


🤭 Additional Reading

If you are still here and found all of this interesting: neat! I indulged quite a bit in the things that I find interesting in this post and I’m glad this resonated with you. You may find these resources as interesting as I do:

🤓 I keep these tabs open when writing YARA rules:

✨ Parting Words and 🐾 Next Steps

  • There is an undeniable learning curve for tools like YARA.
  • YARA’s modules are incredibly cool, but they make it especially clear that YARA’s primary purpose is malware analysis. Few of the existing modules would aid log analysis, but the fact that YARA can be extended in this way adds to the appeal.

I expect to continue tinkering with YARA and log analysis. I will 💯 definitely say more right here at brie.dev if I do.

It had been on my mind to write this blog post for a while. I am super passionate about this topic and I would love to hear about fun and interesting uses of tools like yara (or semgrep) for easing log analysis. 📯 Please do let me know if you try something cool!