002 - SOC Analyst Tips pt. 1 [Writing Good Tickets]

Having only just begun my information security career, I don't have large papers or research documentation to present to the community. Drawing from my limited experience, though, I wanted to write a small series on the TTPs I've put together that have drastically improved the speed and quality of my analysis. The first part in this series is on writing tickets.

What and Why

In the SOC our job is to triage incoming incidents and requests and produce business-actionable responses, whether that means tuning a specific alert or providing analysis on phishing emails to C-level executives. To track our work we write tickets and reports; in my current role I've predominantly written tickets, so I will speak to those first. Tickets serve 3 main roles in my eyes:

  1. Provide technical tracking of incidents and alerts so that we can properly action security threats to an organization, whether that means our initial analysis leads to IR or that in 5 months we can look back and see a common pattern across various endpoints or users

  2. Provide audit with a way to verify work is being done correctly

  3. And most importantly to the SOC analyst, provide CYA in the event of an audit

With these in mind it becomes obvious that our tickets should always contain complete information, be concise and searchable for future reference, and leave nothing to the imagination of audit.

Being Clear and Concise

To make your ticket useful to all parties involved, make it as short as possible while still providing the level of information the situation requires, so that nobody spends extra time reviewing or referencing old tickets. In simpler terms: make the ticket short and simple while still getting your point across.

For me this takes the form of having a system for my tickets. All of my tickets have an easy-to-search 'highlights' section at the top, a bulleted list that contains my analysis, and a reference section that lists all the other tickets/resources I used to come to my conclusion.

Highlights Section

My 'highlights' section is set up so a colleague can skim my ticket and see our key indicators within seconds of opening it; and by listing out all of the key indicators I make each one searchable as a keyword in our ticketing platform.

Highlights Section Example: [A*]

[B*] TLS: 49195,49199,49162,49161,49171,49172,49170,49159,49169,51,50,69,57,56,136,22,47,65,53,132,10,5,4;0,65281,10,11,35,13172,5,13;23,24,25;0 [*1]
IP: 65.52.251.96
Device ID: 82aeb3c2
User Agent String: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36
[C*] User:
b.smith
h.jackson
k.peele
j.doe
r.torres
p.grant

[A] Definitions for the above list
<Indicator Type>: <Key Indicator>

[B] I list single key indicators on the same line as the indicator type to make the overall length of my document shorter

[C] For multiple key indicators per indicator type, I list them on the lines below the indicator type so that I can easily copy and paste the list, with or without the indicator type, to save time when writing emails or pasting lists to colleagues
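
The layout rules in [B] and [C] could be scripted along these lines. This is a minimal Python sketch, not part of any ticketing platform; the function name `render_highlights` and the dictionary input are assumptions made for illustration.

```python
def render_highlights(indicators):
    """Render indicators as '<Indicator Type>: <Key Indicator>' lines.

    Hypothetical helper: a single value stays on the same line as its
    indicator type; multiple values go one per line beneath the type so
    the list can be copied with or without its label.
    """
    lines = []
    for indicator_type, values in indicators.items():
        # Accept a bare string as shorthand for a single value.
        if isinstance(values, str):
            values = [values]
        if len(values) == 1:
            lines.append(f"{indicator_type}: {values[0]}")
        else:
            lines.append(f"{indicator_type}:")
            lines.extend(values)
    return "\n".join(lines)


highlights = render_highlights({
    "IP": "65.52.251.96",
    "Device ID": "82aeb3c2",
    "User": ["b.smith", "h.jackson", "k.peele"],
})
print(highlights)
```

Keeping every indicator on its own line, in plain text, is what lets a keyword search on the ticketing side match it.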

Analysis Section

My analysis section is set up to guide the audience through 'the story' of my incident or alert. Just as I went through a series of steps and checks over the course of my analysis, I want to convey those steps to my audience so that, without needing to access any of our tooling to fill in the blanks, they have the facts necessary to understand my analysis and believe in its integrity. This can also be useful for 'shaping' the response to an incident. Say, for example, you were to flatly state that an IP was labeled malicious in your TIP program and provide no context; this could raise red flags for a reader. But if in the following line you explain that:

'Commonly in our TIP program we see VPN hosts and cellular nodes listed as malicious because threat actors can easily rotate through them whenever their actions are complete; at this time we do not agree with the label associated with this IP'

Now you have reassured your audience by acknowledging that 'yes, this could be bad', but in this case it isn't. (Even though we're writing tickets for our employer, put on your social engineering hat; don't let your audience come away with an incorrect interpretation of your analysis.)

With regards to structure, I use each bulleted line to establish the next step in my thought process and paint the clearest picture possible. This can mean one line per tool, one line per query run, or one line per set of actions. The only two hard-and-fast rules I have are that my intro line states the policy name, and what it triggered on if that isn't obvious from the name; AND my conclusion line includes my recommended actions and WHY I came to that recommendation. Outside of that I aim to make my tickets easy to read and make sure they flow. For non-incident tickets my intro line changes to what sparked the creation of the ticket, and my conclusion line stays the same.

One other small thing to note: if a tool/resource gave me the information listed in a line, I will reference the screenshot or resource that provided that data.

Analysis Example:

Incident presented with 'Failed Logins Grouped by DeviceID' from 82aeb3c2 [D*]
Source IP is 65.52.251.96 (INC13420-splunk.PNG) [E*]
TIP shows the IP as a benign Azure host in the US (INC13420-otx.PNG)
Commonly cloud hosts are utilized to host VPN servers allowing benign users to access their chosen applications without exposing their home IP [F*]
Review in SIEM over the past 24 hours shows 58 total failed logins from DeviceID with no successful logins; IP, User Agent String, and TLS were also consistent across the set (INC13420-splunk-usernames-1.PNG)
Review of IP in same time frame showed no other activity (INC13420-splunk-usernames-2.PNG)
Doing a 30 day pull separately on IP and DeviceID shows no activity prior to today (INC13420-splunk-history.PNG)
Review of failed logins shows no successful passwords were used AND all accounts have SMS/Token MFA enabled (internal-site.com/MFA-policy, see below) [G*]
IP and DeviceID are being added to watchlists to see if this pattern of behavior continues
Activity appears to be a malicious actor attempting password spray of known accounts from an Azure node
No further action is recommended by SOC; all activity that presented failed, all accounts have MFA enabled, and blocking of cloud IPs could prevent benign customers from accessing accounts from their VPN connection [H*]

[D] Intro line with the policy and reason for triggering

[E] Reference to screenshots

[F] Line adding context to the previous line's findings

[G] Reference to other resources

[H] Conclusion line with recommendations and rationale
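
The structure annotated above (an intro line, supporting steps that each cite their evidence, and a conclusion line) can be modeled in a few lines of code. The following is a minimal Python sketch under stated assumptions: the `AnalysisLine` class and `render_analysis` function are hypothetical names for illustration, not part of any ticketing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnalysisLine:
    """One line of the analysis 'story' (hypothetical structure)."""
    text: str
    evidence: Optional[str] = None  # screenshot file name or resource link

    def render(self) -> str:
        # Append the evidence in parentheses, as in the example above.
        if self.evidence:
            return f"{self.text} ({self.evidence})"
        return self.text


def render_analysis(intro: AnalysisLine,
                    steps: List[AnalysisLine],
                    conclusion: AnalysisLine) -> str:
    """Join the lines so the intro always comes first and the conclusion last."""
    ordered = [intro, *steps, conclusion]
    return "\n".join(line.render() for line in ordered)


analysis = render_analysis(
    intro=AnalysisLine(
        "Incident presented with 'Failed Logins Grouped by DeviceID' from 82aeb3c2"
    ),
    steps=[
        AnalysisLine("Source IP is 65.52.251.96", "INC13420-splunk.PNG"),
        AnalysisLine("TIP shows the IP as a benign Azure host in the US",
                     "INC13420-otx.PNG"),
    ],
    conclusion=AnalysisLine(
        "No further action is recommended by SOC; all activity that presented "
        "failed and all accounts have MFA enabled"
    ),
)
print(analysis)
```

Making the intro and conclusion required arguments is one way to enforce the two fixed rules; everything in between stays free-form.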

Reference Section

My reference section is just a simple addition to the chunk of screenshots I upload with my ticket; it lists the internal and external resources that provide the additional information to justify and validate my analysis. Depending on your organization and use case this could be very broad or very narrow in focus. It can also be a great tool for tracking occurrences of FPs and providing context to your write-up. Say, for example, a policy has triggered 15 FPs since it was turned on 2 days ago; you could build audience buy-in by having a reference section like this:

References:

INC13560
INC13594
INC13582
INC13576
INC13563
INC13559
INC13541
INC13536
INC13523
INC13518
INC13502
INC13497
INC13485
INC13470
INC13462

Reference Example

References:
internal-site.com/MFA-policy

Final Thoughts

With all of that said, I am not saying you have to do tickets my way; I am encouraging you to experiment and ask your management/peers for candid feedback on what works best for your environment. After receiving that input, create a system for yourself AND keep improving it. I prefer short bulleted lists; I have fellow analysts who write long-form sentences in paragraph format. From personal experience I know that finding key information in paragraph format can be difficult, but if it works best for them, then who am I to judge?

Having a system in place allows my brain to know what boxes go on what shelf, and that leaves me the most mental bandwidth to provide analysis and stay attentive to our communications channels. Efficiency is a great goal to chase, and I will be covering tips on that later; but establishing a system that works for you is going to be one of the biggest things to help early on.

Example Ticket

TLS: 49195,49199,49162,49161,49171,49172,49170,49159,49169,51,50,69,57,56,136,22,47,65,53,132,10,5,4;0,65281,10,11,35,13172,5,13;23,24,25;0 [*1]
IP: 65.52.251.96
Device ID: 82aeb3c2
User Agent String: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36
User:
b.smith
h.jackson
k.peele
j.doe
r.torres
p.grant

Incident presented with 'Failed Logins Grouped by DeviceID' from 82aeb3c2
Source IP is 65.52.251.96
TIP shows the IP as a benign Azure host in the US
Review in SIEM over the past 24 hours shows 58 total failed logins from DeviceID with no successful logins; IP, User Agent String, and TLS were also consistent across the set
Review of IP in same time frame showed no other activity
Doing a 30 day pull separately on IP and DeviceID shows no activity prior to today
Review of failed logins shows no successful passwords were used AND all accounts have SMS/Token MFA enabled (internal-site.com/MFA-policy)
IP and DeviceID are being added to watchlists to see if this pattern of behavior continues
Activity appears to be a malicious actor attempting password spray of known accounts from an Azure node
No further action is recommended by SOC; all activity that presented failed, all accounts have MFA enabled, and blocking of cloud IPs could prevent VPN access

References:
internal-site.com/MFA-policy

References

[*1]: https://github.com/platonK/tls_fingerprints