The Smile-IT Blog » Blog Archives

Tag Archives: monitoring

How to StartUp inside an Enterprise

I’ve been following Ruxit for quite some time now. In 2014, I first considered them for the Cloud delivery framework we were to create. Later – during another project – I elaborated on a comparison I did between Ruxit and newRelic; I was convinced by their “need to know” approach to monitor large diverse application landscapes.

Recently they added Docker Monitoring into their portfolio and expanded support for highly dynamic infrastructures; here’s a great webinar on that (be sure to watch closely on the live demos – compelling).

But let’s – for once – let aside the technical masterpieces in their development; let’s have a look on their strategic procession:

Dynatrace – the mothership – has been a well-known player in the monitoring field for years. I am working for quite some customers who leverage Dynatrace’s capabilities. I would not hesitate to call them a well-established enterprise. Especially in the field of cloud, well established enterprises tend to leak a certain elasticity in order to get their X-aaS initiatives to really lift-off; examples are manifold: Canopy failed eventually (my 2 cents; some may see that differently), IBM took a long time to differentiate their cloud from the core business, … some others still market their cloud endeavours sideways their core business – not for the better.

And then – last week – I received Ruxit’s eMail announcing “Ruxit grows up… announcing Dynatrace Ruxit!“, officially sent by “Bernd Greifeneder | Founder and CTO”. I was expecting that eMail; in the webinar mentioned before, slides were already branded “Dynatrace Ruxit”, and the question I raised on this, was answered expectedly, that from a successful startup-like endeavour they would now commence their move back into the parent company.

Comprehensible.

Because that is precisely what a disruptive endeavour inside a well-established company should look like: Greifeneder was obviously given the trust and money to ramp-up a totally new kind of business alongside Dynatrace’s core capabilities. I have long lost any doubts, that Ruxit created a new way of technologically and methodically doing things in Monitoring: In a container-based elastic cloud environment, there’s no need anymore to know about each and every entity; the only importance is to keep things alright for endusers – and when this is not the case, let admins quickly find the problem, and nothing else.

What – though – really baffled me was the rigorous way of pushing their technology into the market: I used to run a test account for running a few tests there and then for my projects. Whenever I logged in, something new had been deployed. Releases happened on an amazingly regular basis – DevOps style 100%. There is no way of doing this within established development processes and traditional on-premise release management. One may be able to derive traditional releases from DevOps-like continuous delivery – but not vice versa.

Bottom line: Greifeneder obviously had the possibility, the ability and the right people to do things in a totally different way from the mothership’s processes. I, of course, do not have insight in how things were really setup within Dynatrace – but last week they took their baby back into “mother’s bosom”, and in the cloud business – I’d argue – that does not happen when the baby isn’t ready to live on its own.

Respect!

Enterprise cloud and digitalisation endeavours may get their learnings from Dynatrace Ruxit. Wishing you a sunny future, Dynatrace Monitoring Cloud!

 

Published by:

Audit & Compliance for Automation Platforms

This post is part of the "Automation-Orchestration" architecture series. Posts of this series together comprise a whitepaper on Automation and Orchestration for Innovative IT-aaS Architectures.

 

Audit and Compliance has assumed greater importance in recent years. Following the Global Financial Crisis of 2007-08 – one of the most treacherous crises of our industrial age (Wikipedia cross-references various sources to the matter) – audit and standardization organizations as well as governmental institutions invested heavily into strengthening compliance laws, regulations, and enforcement.

This required enterprises in all industries to make significant investments to comply with these new regulations. Standards have evolved that define necessary policies and controls to be applied as well as requirements and procedures to audit, check, and enhance processes.

Typically, these policies encompass both business and IT related activities such as authentication, authorization, and access to systems. Emphasis is placed on tracking modifications to any IT systems or components through use of timestamps and other verification methods – particular focused on processes and communications that involve financial transactions.

Therefore, supporting the enforcement and reporting of these requirements, policies and regulations must be a core function of the automation solution. Following are the key factors to consider when it comes to automation and its impact on audit and compliance.

Traceability

The most important feature of an automation solution to meet compliance standards is traceability. The solution must allow for logging capabilities that tracks user activity within the system. It must provide tracking of all modifications to the system’s repository and include the user’s name, date and time of the change, and a copy of the data before and after the change was made. Such a feature ensures system integrity and compliance with regulatory statutes.

Statistics

Statistical records are a feature that ensures recording of any step performed either by an actual user or one initiated by an external interface (API). Such records should be stored in a hierarchy within the system’s backend database allowing follow up checking as to who performed what action at what specific time. Additionally, the system should allow for comments on single or multiple statistical records, thereby supporting complete traceability of automation activities by documenting additional operator actions.

Version Management

Some automation solutions offer the option of integrated version management. Once enabled, the solution keeps track of all changes made to tasks and blueprint definitions as well as to objects like calendars, time zones etc. Every change creates a new version of the specific object which can be accessible at any time for follow up investigation. Objects include additional information like version numbers, change dates and user identification. In some cases, the system allows for restoring an older version of the specific objects.

Monitoring

All of the above handle, process and record design-time activity of an automation system, ensuring stored data and data changes are documented to comply with audit needs. During execution, an automation system should also be able to monitor the behavior of every instantiated blueprint. Monitoring records need to track the instance itself as well as every input/output, changes performed to or by this instance (e.g. putting a certain task on hold manually).

Full Audit Trails

All of the above features contribute to a complete audit trail that complies with the reporting requirements as defined by the various standards. Ultimately an automation system must be able to easily produce an audit trail of all system activity from the central database in order to document specific actions being investigated by the auditor. An additional level of security that also enables compliance with law and regulations is the system’s ability to restrict access to this data on a user/group basis.

Compliance Through Standardization

Finally, to ease compliance adherence, the automation solution must follow common industry standards. While proprietary approaches within a system’s architecture are applicable and necessary (e.g. scripting language – see chapter “Dynamic Processing Control”), the automation solution itself must strictly follow encryption methods, communication protocols, and authentication technologies that are widely considered as common industry best practice. Any other approach in these areas would significantly complicate the efforts of IT Operations to prove compliance with audit and regulatory standards. In certain cases, it could even increase the audit cycle to less than a year depending on the financial and IT control standard being followed.

Published by:
SmileIT

Evaluation Report – Monitoring Comparison: newRelic vs. Ruxit

I’ve worked on cloud computing frameworks with a couple of companies meanwhile. DevOps like processes are always an issue along with these cooperations – even more when it comes to monitoring and how to innovatively approach the matter.

As an example I am ever and again emphasizing Netflix’s approach in these conversations: I very much like Netflix’s philosophy of how to deploy, operate and continuously change environment and services. Netflix’s different component teams do not have any clue on the activities of other component teams; their policy is that every team is self-responsible for changes not to break anything in the overall system. Also, no one really knows in detail which servers, instances, services are up and running to serve requests. Servers and services are constantly automatically re-instantiated, rebooted, added, removed, etc. Such is a philosophy to make DevOps real.

Clearly, when monitoring such a landscape traditional (SLA-fulfilment oriented) methods must fail. It simply isn’t sufficient for a Cloud-aware, continuous delivery oriented monitoring system to just integrate traditional on-premise monitoring solutions like e.g. Nagios with e.g. AWS’ CloudWatch. Well, we know that this works fine, but it does not yet ease the cumbersome work of NOCs or Application Operators to quickly identify

  1. the impact of a certain alert, hence its priority for ongoing operations and
  2. the root cause for a possible error

After discussing these facts the umpteenth time and (again) being confronted with the same old arguments about the importance of ubiquitous information on every single event within a system (for the sake of proving SLA compliancy), I thought to give it a try and dig deeper by myself to find out whether these arguments are valid (and I am therefore wrong) or whether there is a possibility to substantially reduce event occurrence and let IT personal only follow up the really important stuff. Efficiently.

At this stage, it is time for a little

DISCLAIMER: I am not a monitoring or APM expert; neither am I a .NET programming expert. Both skill areas are fairly familiar to me, but in this case I intentionally approached the matter from a business perspective – as least technical as possible.

The Preps

In autumn last year I had the chance to get a little insight into 2 pure-SaaS monitoring products: Ruxit and newRelic. Ruxit back then was – well – a baby: Early beta, no real functionality but a well-received glimpse of what the guys are on for. newRelic was already pretty strong and I very much liked their light and quick way of getting started.

As that project back then got stuck and I ended my evaluations in the middle of getting insight, I thought, getting back to that could be a good starting point (especially as I wasn’t able to find any other monitoring product going the SaaS path that radically, i.e. not even thinking of offering an on-premise option; and as a cloud “aficionado” I was very keen on seeing a full-stack SaaS approach). So the product scope was set pretty straight.

The investigative scope, this time, should answer questions a bit more in a structured way:

  1. How easy is it to kick off monitoring within one system?
  2. How easy is it to combine multiple systems (on-premise and cloud) within one easy-to-digest overview?
  3. What’s alerted and why?
  4. What steps are needed in order to add APM to a system already monitored?
  5. Correlation of events and its appearance?
  6. The “need to know” principle: Impact versus alert appearance?

The setup I used was fairly simple (and reduced – as I didn’t want to bother our customer’s workloads in any of their datacenters): I had an old t1.micro instance still lurking around on my AWS account; this is 1 vCPU with 613MB RAM – far too small to really perform with the stuff I wanted it to do. I intentionally decided to use that one for my tests. Later, the following was added to the overall setup:

  • An RDS SQL Server database (which I used for the application I wanted to add to the environment at a later stage)
  • IIS 6 (as available within the Server image that my EC2 instance is using)
  • .NET framework 4
  • Some .NET sample application (some “Contoso” app; deployed directly from within Visual Studio – no changes to the defaults)

Immediate Observations

2 things popped into my eyes only hours (if not minutes) after commencing my activities in newRelic and Ruxit, but let’s first start with the basics.

Setting up accounts is easy and straight forward in both systems. They are both truly following the cloud affine “on-demand” characteristic. newRelic creates a free “Pro” trial account which is converted into a lifetime free account when not upgraded to “paid” after 14 days. Ruxit sets up a free account for their product but takes a totally different approach – closer resembling to consumption-based pricing: you get 1000 hours of APM and 50k user visits for free.

Both systems follow pretty much the same path after an account has been created:

  • In the best case, access your account from within the system you want to monitor (or deploy the downloaded installer package – see below – to the target system manually)
  • Download the appropriate monitoring agent and run the installer. Done.

Both agents started to collect data immediately and the browser-based dashboards produced the first overview of my system within some minutes.

As a second step, I also installed the agents to my local client machine as I wanted to know how the dashboards display multiple systems – and here’s a bummer with Ruxit: My antivirus scanner alerted me with an Win32.Evo-Gen suspicion:

Avast virus alert upon Ruxit agent install

Avast virus alert upon Ruxit agent install

It wasn’t really a problem for the agent to install and operate properly and produce data; it was just a little confusing. In essence, the reason for this is fairly obvious: The agent is using a technique which is comparable to typical virus intrusion patterns, i.e. sticking its fingers deep into the system.

The second observation was newRelics approach to implement web browser remote checks, called “Synthetics”. It was indeed astonishingly easy to add a URL to the system and let newRelic do their thing – seemingly from within the AWS datacenters around the world. And especially with this, newRelic has a very compelling way of displaying the respective information on their Synthetics dashboard. Easy to digest and pretty comprehensive.

At the time when I started off with my evaluation, Ruxit didn’t offer that. Meanwhile they added their Beta for “Web Checks” to my account. Equally easy to setup but lacking some more rich UI features wrt display of information. I am fairly sure that this’ll be added soon. Hopefully. My take is, that combining system monitoring or APM with insights displaying real user usage patterns is an essential part to efficiently correlate events.

Security

I always spend a second thought on security questions, hence contemplated Ruxit’s way of making sure that an agent really connects to the right tenant when being installed. With newRelic you’re confronted with an extra step upon installation: They ask you to copy+paste a security key from your account page during their install procedure.

newRelic security key example

newRelic security key example

Ruxit doesn’t do that. However, they’re not really less secure; it’s just that they pre-embed this key into the installer package that is downloaded,c so they’re just a little more convenient. Following shows the msiexec command executed upon installation as well as its parameters taken form the installer log (you can easily find that information after the .exe package unpacks into the system’s temp folder):

@msiexec /i "%i_msi_dir%\%i_msi%" /L*v %install_log_file% SERVER="%i_server%" PROCESSHOOKING="%i_hooking%" TENANT="%i_tenant%" TENANT_TOKEN="%i_token%" %1 %2 %3 %4 %5 %6 %7 %8 %9 >con:
MSI (c) (5C:74) [13:35:21:458]: Command Line: SERVER=https://qvp18043.live.ruxit.com:443 PROCESSHOOKING=1 TENANT=qvp18043 TENANT_TOKEN=ABCdefGHI4JKLM5n CURRENTDIRECTORY=C:\Users\thome\Downloads CLIENTUILEVEL=0 CLIENTPROCESSID=43100

Alerting

After having applied the package (both packages) onto my Windows Server on EC2 things popped up quickly within the dashboards (note, that both dashboard screenshots are from a later evaluation stage; however, the basic layout was the very same at the beginning – I didn’t change anything visually down the road).

newRelic server monitoring dashboard

newRelic server monitoring dashboard showing the limits of my too-small instance 🙂

Ruxit server monitoring dashboard

The Ruxit dashboard on the same server; with a clear hint on a memory problem 🙂

What instantly stroke me here was the simplicity of Ruxit’s server monitoring information. It seemed sort-of “thin” on information (if you want a real whole lot of info right from the start, you probably prefer newRelic’s dashboard). Things, though, changed when my server went into memory saturation (which it constantly does right away when accessed via RDP). At that stage, newRelic started firing eMails alerting me of the problem. Also, the dashboard went red. Ruxit in turn did nothing really. Well, of course, it displayed the problem once I was logged into the dashboard again and had a look at my server’s monitoring data; but no alert triggered, no eMail, no red flag. Nothing.

If you’re into SLA fulfilment, then that is precisely the moment to become concerned. On second thought, however, I figured that actually no one was really bothered by the problem. There was no real user interaction going on in that server instance. I hadn’t even added an app really. Hence: why bother?

So, next step was to figure out, why newRelic went so crazy with that. It turned out that with newRelic every newly added server gets assigned to a default server policy.

newRelic's monitoring policy configuration

newRelic’s monitoring policy configuration

I could turn off that policy easily (also editing apparently seems straight forward; I didn’t try). However, to think that with every server I’m adding I’d have to figure out first, which alerts are important as they might be impacting someone or something seemed less on a “need to know” basis than I intended to have.

After having switched off the policy, newRelic went silent.

BTW, alerting via eMail is not setup by default in Ruxit; within the tenant’s settings area, this can be added as a so called “Integration” point.

AWS Monitoring

As said above, I was keen to know how both systems integrate multiple monitoring sources into their overviews. My idea was to add my AWS tenant to be monitored (this resulted from the previously mentioned customer conversations I had had earlier; that customer’s utmost concern was to add AWS to their monitoring overview – which in their case was Nagios, as said).

A nice thing with Ruxit is that they fill their dashboard with those little demo tiles, which easily lead you into their capabilities without having setup anything yet (the example below shows the database demo tile).

Ruxit demo tile example

This is one of the demo tiles in Ruxit’s dashboard – leading to DB monitoring in this case

I found an AWS demo tile (similar to the example above), clicked and ended up with a light explanation of how to add an AWS environment to my monitoring ecosystem (https://help.ruxit.com/pages/viewpage.action?pageId=9994248). They offer key based or role based access to your AWS tenant. Basically what they need you to do is these 3 steps:

  1. Create either a role or a user (for use of access key based connection)
  2. Apply the respective AWS policy to that role/user
  3. Create a new cloud monitoring instance within Ruxit and connect it to that newly created AWS resource from step 1

Right after having executed the steps, the aforementioned demo tiled changed into displaying real data and my AWS resources showed up (note, that the example below already contains RDS, which I added at a later stage; the cool thing here was, that that was added fully unattended as soon as I had created it in AWS).

Ruxit AWS monitoring overview

Ruxit AWS monitoring overview

Ruxit essentially monitors everything within AWS which you can put a CloudWatch metric on – which is a fair lot, indeed.

So, next step clearly was to seek the same capability within newRelic. As far as I could work out, newRelic’s approach here is to offer plugins – and newRelic’s plugin ecosystem is vast. That may mean, that there’s a whole lot of possibilities for integrating monitoring into the respective IT landscape (whatever it may be); however, one may consider the process to add plugin after plugin (until the whole landscape is covered) a bit cumbersome. Here’s a list of AWS plugins with newRelic:

newRelic plugins for AWS

newRelic plugins for AWS

newRelic plugins for AWS

newRelic plugins for AWS

Add APM

Adding APM to my monitoring ecosystem was probably the most interesting experience in this whole test: As a preps for the intended result (i.e.: analyse data about a web application’s performance at real user interaction) I added an IIS to my server and an RDS database to my AWS account (as mentioned before).

The more interesting fact, though, was that after having finalized the IIS installation, Ruxit instantly showed the IIS services in their “Smartscape” view (more on that a little later). I didn’t have to change anything in my Ruxit environment.

newRelic’s approach is a little different here. The below screenshot shows their APM start page with .NET selected.

newRelic APM start page with .NET selected

newRelic APM start page with .NET selected

After having confirmed each selection which popped up step by step, I was presented with a download link for another agent package which I had to apply to my server.

The interesting thing, though, was, that still nothing showed up. No services or additional information on any accessible apps. That is logical in a way, as I did not have anything published on that server yet which resembled an application, really. The only thing that was accessible from the outside was the IIS default web (just showing that IIS logo).

So, essentially the difference here is that with newRelic you get system monitoring with a system monitoring agent, and by means of an application monitoring agent you can add monitoring of precisely the type of application the agent is intended for.

I didn’t dig further yet (that may be subject for another article), but it seems that with Ruxit I can have monitoring for anything going on on a server by means of just one install package (maybe one more explanation for the aforementioned virus scan alert).

However, after having published my .NET application, everything was fine again in both systems – and the dashboards went red instantly as the server went into CPU saturation due to its weakness (as intended ;)).

Smartscape – Overview

So, final question to answer was: What do the dashboards show and how do they ease (root cause) analysis?

As soon as the app was up and running and web requests started to role in, newRelic displayed everything to know about the application’s performance. Particularly nice is the out-of-the-box combination of APM data with browser request data within the first and the second menu item (either switch between the 2 by clicking the menu or use the links within the diagrams displayed).

newRelic APM dashboard

newRelic APM dashboard

The difficulty with newRelic was to discover the essence of the web application’s problem. Transactions and front-end code performance was displayed in every detail, but I knew (from my configuration) that the problem of slow page loads – as displayed – lied in the general weakness of my web server.

And that is basically where Ruxit’s smartscape tile in their dashboard made the essential difference. The below screenshot shows a problem within my web application as initially displayed in Ruxit’s smartscape view:

Ruxit's smartscape view showing a problem in my application

Ruxit’s smartscape view showing a problem in my application

By this view, it was obvious that the problem was either within the application itself or within the server as such. A click to the server not only reveals the path to the depending web application but also other possibly impacted services (obviously without end user impact as otherwise there would be an alert on them, too).

Ruxit smartscape with dependencies between servers, services, apps

Ruxit smartscape with dependencies between servers, services, apps

And digging into the server’s details revealed the problem (CPU saturation, unsurprisingly).

Ruxit revealing CPU saturation as a root cause

Ruxit revealing CPU saturation as a root cause

Still, the amount of dashboard alerts where pretty few. While I had 6 eMails from newRelic telling me about the problem on that server, I had only 2 within Ruxit: 1 telling me about the web app’s weak response and another about CPU saturation.

Next step, hence, would be to scale-up the server (in my environment) or scale-out or implement an enhanced application architecture (in a realistic production scenario). But that’s another story …

Bottom line

Event correlation and alerting on a “need to know” basis – at least for me – remains the right way to go.

This little test was done with just one server, one database, one web application (and a few other services). While newRelics comprehensive approach to showing information is really compelling and perfectly serves the objective of complete SLA compliancy reporting, Ruxit’s “need to know” principle much more meets the needs of what I would expect form innovative cloud monitoring.

Considering Netflix’s philosophy from the beginning of this article, innovative cloud monitoring basically translates into: Every extra step is a burden. Every extra information on events without impact means extra OPS effort. And every extra-click to correlate different events to a probable common root-cause critically lengthens MTTR.

A “need to know” monitoring approach while at the same time offering full stack visibility of correlated events is – for me – one step closer to comprehensive Cloud-ready monitoring and DevOps.

And Ruxit really seems to be “spot on” in that respect!

 

Published by:

DevOps style performance monitoring for .NET

 

{{ this article has originally been published in DevOps.com }}

 

Recently I began looking for an application performance management solution for .NET. My requirements are code level visibility, end to end request tracing, and infrastructure monitoring in a DevOps production setup.

DotTrace is clearly the most well-known tool for code level visibility in development setups, but it can’t be used in a 24×7 production setup. DotTrace also doesn’t do typical Ops monitoring.

Unfortunately a Google search didn’t return much in terms of a tool comparison for .NET production monitoring. So I decided to do some research on my own. Following is a short list of well-known tools in the APM space that support .NET. My focus is on finding an end-to-end solution and profiler-like visibility into transactions.

New Relic was the first to do APM SaaS, focused squarely on production with a complete offering. New Relic offers web request monitoring for .NET, Java, and more. It automatically shows a component-based breakdown of the most important requests. The breakdown is fairly intuitive to use and goes down to the SQL level. Code level visibility, at least for .NET, is achieved by manually starting and stopping sampling. This is fine for analyzing currently running applications, but makes analysis of past problems a challenge. New Relic’s main advantage is its ease of us, intuitive UI, and a feature set that can help you quickly identify simple issues. Depth is the main weakness of NewRelic. As soon as you try to dig deeper into the data, you’re stuck. This might be a minor point, but if you’re used to working with a profiler, you’ll miss CPU breakdown as New Relic only shows response times.

net-1-newrelic

Dynatrace is the vendor that started the APM revolution and is definitely the strongest horse in this race. Its feature set in terms of .NET is the most complete, offering code level monitoring (including CPU and wait times), end to end tracing, and user experience monitoring. As far as I can determine, it’s the only tool with a memory profiler for .NET and it also features IIS web request insight. It supports the entire application life cycle from development environments, to load testing, to production. As such it’s nearly perfect for DevOps. Due to its pricing structure and architecture it’s targeted more at the mid to enterprise markets. In terms of ease of use it’s catching up to competition with a new Web UI. It’s rather light on infrastructure monitoring on its own, but shows additional strength with optional Dynatrace synthetic and network monitoring components.

net-2-dynatrace

Ruxit is a new SaaS solution built by Dynatrace. It’s unique in that it unites application performance management and real user monitoring with infrastructure, cloud, and network monitoring into a single product. It is by far the easiest to install, literally takes 2 minutes. It features full end to end tracing, code level visibility down to the method level, SQL visibility, and RUM for .NET, Java, and other languages, with insight into IIS and Apache. Apart from this it has an analytics engine that delivers both technical and user experience insights. Its main advantages are its ease of use, web UI, fully automated root cause analysis, and frankly, amazing breadth. Its flexible consumption based pricing scales from startups, cloud natives, and mid markets up to large web scale deployments of ten-thousands of servers.

net-3-ruxit

AppNetta‘s TraceView takes a different approach to application performance management. It does support tracing across most major languages including database statements and of course .NET. It visualizes things in charts and scatter plots. Even traces across multiple layers and applications are visualized in graphs. This has its advantages but takes some time getting used to it. Unfortunately while TraceView does support .NET it does not yet have code level visibility for it. This makes sense for AppNetta, which as a whole is more focused on large scale monitoring and has more of a network centric background. For DevOps in .NET environments however, it’s a bit lacking.

net-4-TraceView

Foglight, originally owned by Quest and now owned by Dell, is a well-known application performance management solution. It is clearly meant for operations monitoring and tracks all web requests. It integrates infrastructure and application monitoring, end to end tracing, and code level visibility on .NET, among other things. It has the required depth, but it’s rather complex to set up and obviously generates alert storms as far as I could experience. It takes a while to configure and get the data you need. Once properly set up though, you get a lot of insight into your .NET application. In a fast moving DevOps scenario though it might take too long to manually adapt to infrastructure changes.

net-5-foglight

AppDynamics is well known in the APM space. Its offering is quite complete and it features .NET monitoring, quite nice transaction flow tracing, user experience, and code level profiling capabilities. It is production capable, though code level visibility may be limited here to reduce overhead. Apart from these features though, AppDynamics has some weaknesses, mainly the lack of IIS request visibility and the fact that it only features walk clock time with no CPU breakdown. Its flash-based web UI and rather cumbersome agent configuration can also be counted as negatives. Compared to others it’s also lacking in terms of infrastructure monitoring. Its pricing structure definitely targets the mid market.

net-6-AppDynamics

Manage Engine has traditionally focused on IT monitoring, but in recent years they added end user and application performance monitoring to their portfolio called APM Insight. Manage Engine does give you metric level insight into .NET applications and transaction trace snap shots which give you code level stack traces and database interactions. However it’s apparent that Manage Engine is a monitoring tool and APM insight doesn’t provide the level of depth one might be accustomed to from other APM tools and profilers.

net-7-ME

JenniferSoft is a monitoring solution that provides nice real-time dashboarding and gives an overview of the topology of your environment. It enables users to see deviations in the speed of transactions with real time scatter charts and analysis of transactions. It provides “profiling” for IIS/.NET transactions, but only on single tiers and has no transaction tracing. Their strong suit is clearly cool dashboarding but not necessarily analytics. For example, they are the only vendor that features 3D animated dashboards.

net-8-JenniferSoft

Conclusion: There’s more buzz around on the APM space than a Google search would reveal on first sight and I did actually discover some cool vendors to target my needs; however, the field clears up pretty much when you dig for end-2-end visibility from code down to infrastructure, including RUM, any web service requests and deep SQL insights. And if you want to pair that with a nice, fluent, ease-of-use web UI and efficient analytics, there’s actually not many left …

Published by:
%d bloggers like this: