Achieving 3 Clicks to Issue Resolution via ASM

A quick UI and visual intro to the ASM Portal.

This table will provide a brief overview of using Apica Synthetic Monitoring in the portal.

What you see above is the home dashboard of ASM.

These checks are organized by check function, but you can group them in whatever way is relevant to your teams: by application, business unit, the region the checks run from, production status, or criticality. It's really up to you.

The dashboard gives quick insight into all the virtual traffic from Apica to your site and applications, whether it comes externally from Apica's SaaS network or internally from an Apica agent running behind your corporate firewall. All of that data is aggregated here in the dashboard.

Step/Discussion

Screenshot


So first off, I want to point out that all the events coming into the dashboard are mapped to a severity level; those will be our greens, yellows, oranges, and reds.

These correspond to Information, Warning, Error, and Fatal.

There are three reasons that we have these four levels (rather than a simple pass/fail, or a red, yellow, green):

1. SLA Precision. Four levels let us calculate a more precise impact on our applications' SLAs. Information and Warning results do not impact the SLA, whereas Error (orange) and Fatal (red) results do. You can set the SLA thresholds on a rolling average or standard deviation of dominant or active timings.

2. 3rd Party Impact. You can flag third-party failures as warnings without counting them against your core SLAs. For example, you can ignore failures that happen on a third-party tracking pixel within a browser scenario. That's one of the ways that multiple severity levels keep your SLA definitions flexible.

3. Fewer False Positive Alerts. Very importantly, these additional levels give you more categories for deciding how alerts are created. Each of these severity levels is customizable; see the advanced feature overview or the Apica Knowledge Base for resources on customizing them. Your policies help define what counts as a pass or a failure. If you're familiar with other synthetics tools, you also know that the more potential false positives you can identify, the less alert fatigue your response teams will have when responding to alerts from your monitoring applications.
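To make the SLA reasoning above concrete, here is a minimal sketch in plain Python. It is not Apica's implementation or API; the names `Severity`, `SLA_BREAKING`, and `sla_percent` are hypothetical, and it simply assumes that only Error and Fatal results count against the SLA, as described above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four severity levels named above, ordered from best to worst."""
    INFORMATION = 0  # green
    WARNING = 1      # yellow
    ERROR = 2        # orange
    FATAL = 3        # red


# Assumption for this sketch: only Error and Fatal results breach the SLA.
SLA_BREAKING = {Severity.ERROR, Severity.FATAL}


def sla_percent(results):
    """Return the percentage of runs that did not breach the SLA."""
    if not results:
        raise ValueError("need at least one check result")
    ok = sum(1 for severity in results if severity not in SLA_BREAKING)
    return 100.0 * ok / len(results)


# Hypothetical data: 20 runs, including one third-party warning and one fatal.
runs = [Severity.INFORMATION] * 18 + [Severity.WARNING, Severity.FATAL]
print(f"SLA: {sla_percent(runs):.1f}%")  # prints "SLA: 95.0%"
```

The warning (for example, a failed third-party tracking pixel) leaves the SLA untouched, while the single fatal run accounts for the whole 5% drop.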

Three Clicks to Resolution: In the dashboard, there are three levels of detail, starting from this top level.

The 1st level is the dashboard.

At this level you can see the current Up/Down status of each check, as well as its status over the last 24 hours, shown in 6-hour segments.

Tip: You can bypass the 2nd level and go directly to the 3rd level (a single check run) right from the dashboard by hovering over one of these status bars for the last 24 hours and clicking one of the points on the trendline.
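As a rough illustration of the 6-hour segments on the dashboard, the sketch below rolls a list of check runs up into four segments covering the last 24 hours. This is an assumption about how such a roll-up could work, not a description of how the portal computes it; `segment_status` and the rule that each segment shows its worst severity are hypothetical.

```python
from datetime import datetime, timedelta

SEGMENT_HOURS = 6
WINDOW_HOURS = 24


def segment_status(runs, now):
    """Roll up (timestamp, severity) runs into four 6-hour segments.

    Severity is an int from 0 (Information) to 3 (Fatal). Each segment reports
    the worst severity seen in it, or None when it had no runs. Segments are
    ordered oldest first.
    """
    count = WINDOW_HOURS // SEGMENT_HOURS
    worst = [None] * count
    start = now - timedelta(hours=WINDOW_HOURS)
    for timestamp, severity in runs:
        if not (start <= timestamp < now):
            continue  # outside the last 24 hours
        index = int((timestamp - start) / timedelta(hours=SEGMENT_HOURS))
        if worst[index] is None or severity > worst[index]:
            worst[index] = severity
    return worst


# Hypothetical runs relative to a fixed "now".
now = datetime(2024, 1, 2, 12, 0)
runs = [
    (now - timedelta(hours=23), 0),  # oldest segment, Information
    (now - timedelta(hours=20), 1),  # oldest segment, Warning
    (now - timedelta(hours=2), 3),   # newest segment, Fatal
    (now - timedelta(hours=30), 3),  # older than 24 hours, ignored
]
print(segment_status(runs, now))  # prints [1, None, None, 3]
```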

The 2nd level will be a detailed view of a particular check.

The 2nd level isolates a single check and presents 4 sections:

  1. The selected Return Value over time (the past 48 hours)

    1. As you hover over this line, you can click on any of the round nodes and drill down to the results.

  2. The SLA over the last 12 hours

    1. You can change the sample period to the current month or a custom level and inspect the SLA over time.

  3. A 24-hour status widget for the check, showing the check's statistical performance trend compared to prior runs.

    1. This widget is essentially a blown-up view of what we saw on the dashboard screen, with a few more details. It adds a few more check actions, like running the check ad hoc, changing the check settings directly, analyzing the metrics (e.g. comparing this check with another running from a different agent, or correlating one metric with another), or managing the alerts that this check might have.

  4. A check Trends area that shows at a glance a breakdown of the check runs and their results in terms of Severity Level (Green I, Yellow W, Orange E, Red F) and the Returned Value statistics (median, mean, min, max, and standard deviation).
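The figures in the Trends area can be reproduced from raw run data. The sketch below is only an illustration using Python's standard library; the `summarize` helper and the sample values are hypothetical, and it assumes a population standard deviation, which may differ from what the portal reports.

```python
from collections import Counter
from statistics import mean, median, pstdev


def summarize(runs):
    """Summarize runs given as (severity_letter, returned_value_ms) pairs."""
    values = [value for _, value in runs]
    return {
        "by_severity": Counter(letter for letter, _ in runs),
        "median": median(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        # Assumption: population standard deviation over the sampled runs.
        "stdev": pstdev(values),
    }


# Hypothetical runs: three Information results and one Warning spike.
runs = [("I", 1000), ("I", 1200), ("W", 3900), ("I", 1100)]
summary = summarize(runs)
print(summary["by_severity"])             # severity counts: I=3, W=1
print(summary["median"], summary["max"])  # prints "1150.0 3900"
```

A single slow run barely moves the median but pulls the mean and standard deviation up sharply, which is why the area shows all of these statistics side by side.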

The 3rd level is the run-result detail.

This view of a single check contains that check’s results for the run. All checks have their Type, Agent Location, Title, check run date and time, and a high-level summary. Each check type also has levels of detail that are particular to its function.

In the example to the right, the Postman check, which checks an API’s response, has Variables and Assertions that a web browser check or DNS check won’t have.
In the waterfall areas, a grey right-pointing expansion arrow fetches other runs to compare with this run, so you can see whether the performance is historically similar (or not).

URL Run Comparison Example

AppDynamics Example. Click on the spike with the yellow warning, and it will take us to the waterfall view. Just above the waterfall, note the Result Message: “…Time (3,896) was above upper limit (3,826 ms).” So the reason we saw the warning is that we have an upper limit of about 3.8 seconds, and this spike breached it.
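The upper-limit check behind that Result Message can be sketched as follows. This is a hedged illustration, not Apica's code: `result_message` only mimics the wording of the message quoted above, and `upper_limit` is a hypothetical way to derive a limit from a rolling average plus `k` standard deviations, where `k` is an assumed tuning parameter.

```python
from statistics import mean, stdev


def upper_limit(history_ms, k=2.0):
    """Derive a limit from recent timings: average plus k standard deviations.

    Needs at least two timings, because a sample standard deviation is used.
    """
    return mean(history_ms) + k * stdev(history_ms)


def result_message(value_ms, limit_ms):
    """Return a warning message when the timing breaches the limit, else None."""
    if value_ms > limit_ms:
        return f"Time ({value_ms:,}) was above upper limit ({limit_ms:,} ms)."
    return None


# The values from the example above: 3,896 ms against a 3,826 ms limit.
print(result_message(3896, 3826))
# prints "Time (3,896) was above upper limit (3,826 ms)."

# A run under the limit produces no message.
print(result_message(3000, 3826))  # prints "None"
```

With a fixed limit, a spike of only 70 ms over the threshold is enough to raise the warning; a limit derived from recent history, as in `upper_limit`, adapts as the check's normal timings drift.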

The run view also lists the domains called and the 10 slowest URLs, so we can immediately identify which resources are causing those problems.
I also have screenshots; the check collects a screenshot for every page load in the transaction. I can also force screenshots to be taken as many times as I like throughout the scenario. So if I have an application that behaves like a single-page application, and I want to take multiple snaps of the details that I'm filling in, I can trace those and see whether there's an error in the scenario: either a graceful error that's being served to me, or the website performing in a strange way that doesn't allow me to proceed to the next step.

I also have some scenario details here. I can see all of the opens, types, clicks, and asserts that we're doing in the scenario, along with the targets, the pages we're targeting, and the values we're putting in. So here we can see our checkout information.

Coming down, I have my AppDynamics snapshot. This was a check that used the AppDynamics integration, so for every synthetic run that we do, we trigger an AppDynamics snapshot. Opening that snapshot directly lets us cross the bridge into AppDynamics and continue our root cause analysis. And it's not just going from Apica to AppDynamics; we also tag every one of our requests in the headers, so you'll be able to view Apica-tagged traffic from AppDynamics just as well. Whichever direction you come at the root cause analysis from, you'll be able to get there using Apica.

Next, I have our waterfall as well. I'm seeing all the resources being called, with the timings broken up by page. And we can, of course, see the pay-and-checkout final POST at the bottom, which is taking up much of the time. So that's the drill-down through the different layers of Apica in those three steps.


There are some other views that we can get into, like the Health and Events view or an operations view, to start the drill down. But we can get into those perhaps in later videos, and I hope to catch you in the next tutorial. Thank you.
