The Smile-IT Blog » Blog Archives

Tag Archives: DevOps

How to StartUp inside an Enterprise

I’ve been following Ruxit for quite some time now. In 2014, I first considered them for the Cloud delivery framework we were to create. Later – during another project – I elaborated on a comparison I did between Ruxit and newRelic; I was convinced by their “need to know” approach to monitor large diverse application landscapes.

Recently they added Docker Monitoring to their portfolio and expanded support for highly dynamic infrastructures; here’s a great webinar on that (be sure to watch the live demos closely – compelling).

But let’s – for once – leave aside the technical masterpieces in their development and have a look at their strategic progression:

Dynatrace – the mothership – has been a well-known player in the monitoring field for years. I work with quite a few customers who leverage Dynatrace’s capabilities, and I would not hesitate to call them a well-established enterprise. Especially in the field of cloud, well-established enterprises tend to lack the elasticity needed to get their X-aaS initiatives to really lift off; examples are manifold: Canopy eventually failed (my 2 cents; some may see that differently), IBM took a long time to differentiate their cloud from their core business, … and some others still market their cloud endeavours alongside their core business – not for the better.

And then – last week – I received Ruxit’s eMail announcing “Ruxit grows up… announcing Dynatrace Ruxit!“, officially sent by “Bernd Greifeneder | Founder and CTO”. I was expecting that eMail; in the webinar mentioned above, the slides were already branded “Dynatrace Ruxit”, and the question I raised on this was answered as expected: having succeeded as a startup-like endeavour, they would now commence their move back into the parent company.

Comprehensible.

Because that is precisely what a disruptive endeavour inside a well-established company should look like: Greifeneder was obviously given the trust and money to ramp up a totally new kind of business alongside Dynatrace’s core capabilities. I have long lost any doubt that Ruxit created a new way of doing things in monitoring, both technologically and methodically: in a container-based elastic cloud environment, there’s no need anymore to know about each and every entity; all that matters is to keep things alright for end users – and when that is not the case, to let admins quickly find the problem, and nothing else.

What really baffled me, though, was the rigorous way they pushed their technology into the market: I used to keep a test account for running a few tests now and then for my projects. Whenever I logged in, something new had been deployed. Releases happened on an amazingly regular basis – 100% DevOps style. There is no way of doing this within established development processes and traditional on-premise release management. One may be able to derive traditional releases from DevOps-like continuous delivery – but not vice versa.

Bottom line: Greifeneder obviously had the possibility, the ability and the right people to do things in a totally different way from the mothership’s processes. I, of course, do not have insight into how things were really set up within Dynatrace – but last week they took their baby back into “mother’s bosom”, and in the cloud business – I’d argue – that does not happen when the baby isn’t ready to live on its own.

Respect!

Enterprise cloud and digitalisation endeavours may get their learnings from Dynatrace Ruxit. Wishing you a sunny future, Dynatrace Monitoring Cloud!

 


Evaluation Report – Monitoring Comparison: newRelic vs. Ruxit

I’ve worked on cloud computing frameworks with a couple of companies by now. DevOps-like processes are always an issue in these cooperations – even more so when it comes to monitoring and how to approach the matter innovatively.

As an example, I keep emphasizing Netflix’s approach in these conversations: I very much like Netflix’s philosophy of how to deploy, operate and continuously change environments and services. Netflix’s component teams do not have any clue about the activities of other component teams; the policy is that every team is responsible for ensuring its changes do not break anything in the overall system. Also, no one really knows in detail which servers, instances and services are up and running to serve requests. Servers and services are constantly and automatically re-instantiated, rebooted, added, removed, etc. That is a philosophy that makes DevOps real.

Clearly, when monitoring such a landscape, traditional (SLA-fulfilment oriented) methods must fail. It simply isn’t sufficient for a cloud-aware, continuous-delivery-oriented monitoring system to just integrate traditional on-premise monitoring solutions like Nagios with, e.g., AWS CloudWatch. We know that this works fine, but it does not ease the cumbersome work of NOCs or application operators who need to quickly identify

  1. the impact of a certain alert, hence its priority for ongoing operations and
  2. the root cause for a possible error

After discussing these facts for the umpteenth time and (again) being confronted with the same old arguments about the importance of ubiquitous information on every single event within a system (for the sake of proving SLA compliance), I decided to give it a try and dig deeper myself to find out whether these arguments are valid (and I am therefore wrong) or whether there is a way to substantially reduce event noise and let IT personnel follow up only on the really important stuff. Efficiently.

At this stage, it is time for a little

DISCLAIMER: I am not a monitoring or APM expert; neither am I a .NET programming expert. Both skill areas are fairly familiar to me, but in this case I intentionally approached the matter from a business perspective – keeping it as non-technical as possible.

The Preps

In autumn last year I had the chance to get a little insight into 2 pure-SaaS monitoring products: Ruxit and newRelic. Ruxit back then was – well – a baby: early beta, not much real functionality, but a well-received glimpse of what the guys were up to. newRelic was already pretty strong, and I very much liked their light and quick way of getting started.

As that project got stuck back then and I had to end my evaluation halfway through, I thought getting back to it could be a good starting point (especially as I wasn’t able to find any other monitoring product going the SaaS path that radically, i.e. not even thinking of offering an on-premise option; and as a cloud “aficionado” I was very keen on seeing a full-stack SaaS approach). So the product scope was settled pretty quickly.

This time, the investigation was to answer questions in a somewhat more structured way:

  1. How easy is it to kick off monitoring within one system?
  2. How easy is it to combine multiple systems (on-premise and cloud) within one easy-to-digest overview?
  3. What’s alerted and why?
  4. What steps are needed in order to add APM to a system already monitored?
  5. How are events correlated, and how is that correlation presented?
  6. The “need to know” principle: impact versus the number of alerts raised?

The setup I used was fairly simple (and reduced, as I didn’t want to interfere with our customer’s workloads in any of their datacenters): I had an old t1.micro instance still lurking around in my AWS account; that is 1 vCPU with 613MB RAM – far too small to really cope with the stuff I wanted it to do. I intentionally decided to use that one for my tests. Later, the following was added to the overall setup:

  • An RDS SQL Server database (which I used for the application I wanted to add to the environment at a later stage)
  • IIS 6 (as available within the Server image that my EC2 instance is using)
  • .NET framework 4
  • Some .NET sample application (some “Contoso” app; deployed directly from within Visual Studio – no changes to the defaults)

Immediate Observations

2 things caught my eye only hours (if not minutes) after commencing my activities in newRelic and Ruxit, but let’s start with the basics first.

Setting up accounts is easy and straightforward in both systems. Both truly follow the cloud-affine “on-demand” characteristic. newRelic creates a free “Pro” trial account which is converted into a lifetime free account if not upgraded to “paid” after 14 days. Ruxit sets up a free account for their product but takes a totally different approach – more closely resembling consumption-based pricing: you get 1000 hours of APM and 50k user visits for free.

Both systems follow pretty much the same path after an account has been created:

  • In the best case, access your account from within the system you want to monitor (or deploy the downloaded installer package – see below – to the target system manually)
  • Download the appropriate monitoring agent and run the installer. Done.

Both agents started to collect data immediately and the browser-based dashboards produced the first overview of my system within some minutes.

As a second step, I also installed the agents on my local client machine, as I wanted to know how the dashboards display multiple systems – and here’s a bummer with Ruxit: my antivirus scanner raised a Win32.Evo-Gen suspicion:

Avast virus alert upon Ruxit agent install

The agent still installed, operated properly and produced data; it was just a little confusing. In essence, the reason for this is fairly obvious: the agent uses a technique comparable to typical virus intrusion patterns, i.e. sticking its fingers deep into the system.

The second observation was newRelic’s approach to implementing web browser remote checks, called “Synthetics”. It was indeed astonishingly easy to add a URL to the system and let newRelic do their thing – seemingly from within the AWS datacenters around the world. And especially with this, newRelic has a very compelling way of displaying the respective information on their Synthetics dashboard. Easy to digest and pretty comprehensive.

At the time I started my evaluation, Ruxit didn’t offer that. Meanwhile they have added their beta for “Web Checks” to my account. Equally easy to set up, but lacking some of the richer UI features with respect to displaying the information. I am fairly sure this will be added soon. Hopefully. My take is that combining system monitoring or APM with insights into real user usage patterns is essential for efficiently correlating events.

Security

I always give security questions a second thought, hence I contemplated how Ruxit makes sure that an agent really connects to the right tenant when being installed. With newRelic you’re confronted with an extra step upon installation: they ask you to copy+paste a security key from your account page during their install procedure.

newRelic security key example

Ruxit doesn’t do that. However, they’re not really less secure; it’s just that they pre-embed this key into the installer package that is downloaded, so they’re just a little more convenient. The following shows the msiexec command executed upon installation as well as its parameters taken from the installer log (you can easily find that information after the .exe package unpacks into the system’s temp folder):

@msiexec /i "%i_msi_dir%\%i_msi%" /L*v %install_log_file% SERVER="%i_server%" PROCESSHOOKING="%i_hooking%" TENANT="%i_tenant%" TENANT_TOKEN="%i_token%" %1 %2 %3 %4 %5 %6 %7 %8 %9 >con:
MSI (c) (5C:74) [13:35:21:458]: Command Line: SERVER=https://qvp18043.live.ruxit.com:443 PROCESSHOOKING=1 TENANT=qvp18043 TENANT_TOKEN=ABCdefGHI4JKLM5n CURRENTDIRECTORY=C:\Users\thome\Downloads CLIENTUILEVEL=0 CLIENTPROCESSID=43100

Alerting

After applying the packages to my Windows Server on EC2, things popped up quickly within the dashboards (note that both dashboard screenshots are from a later evaluation stage; however, the basic layout was the very same at the beginning – I didn’t change anything visually down the road).

newRelic server monitoring dashboard showing the limits of my too-small instance 🙂

The Ruxit dashboard on the same server; with a clear hint on a memory problem 🙂

What instantly struck me here was the simplicity of Ruxit’s server monitoring information. It seemed sort of “thin” on information (if you want a whole lot of info right from the start, you’ll probably prefer newRelic’s dashboard). Things changed, though, when my server went into memory saturation (which it does right away whenever it is accessed via RDP). At that stage, newRelic started firing eMails alerting me of the problem. Also, the dashboard went red. Ruxit, in turn, did nothing really. Well, of course, it displayed the problem once I logged into the dashboard again and had a look at my server’s monitoring data; but no alert was triggered, no eMail, no red flag. Nothing.

If you’re into SLA fulfilment, then that is precisely the moment to become concerned. On second thought, however, I figured that actually no one was really bothered by the problem. There was no real user interaction going on in that server instance. I hadn’t even added an app really. Hence: why bother?

So, the next step was to figure out why newRelic went so crazy about that. It turned out that with newRelic every newly added server gets assigned to a default server policy.

newRelic’s monitoring policy configuration

I could turn off that policy easily (editing apparently is straightforward, too; I didn’t try). However, the thought that for every server I add I’d first have to figure out which alerts are important because they might impact someone or something seemed less of a “need to know” basis than I intended to have.

After having switched off the policy, newRelic went silent.

BTW, alerting via eMail is not set up by default in Ruxit; within the tenant’s settings area, it can be added as a so-called “Integration” point.

AWS Monitoring

As said above, I was keen to know how both systems integrate multiple monitoring sources into their overviews. My idea was to add my AWS tenant to be monitored (this goes back to the customer conversations mentioned earlier; that customer’s utmost concern was to add AWS to their monitoring overview – which in their case was Nagios, as said).

A nice thing with Ruxit is that they fill their dashboard with those little demo tiles, which easily lead you into their capabilities without you having set up anything yet (the example below shows the database demo tile).

This is one of the demo tiles in Ruxit’s dashboard – leading to DB monitoring in this case

I found an AWS demo tile (similar to the example above), clicked it and ended up with a light explanation of how to add an AWS environment to my monitoring ecosystem (https://help.ruxit.com/pages/viewpage.action?pageId=9994248). They offer key-based or role-based access to your AWS tenant. Basically, they need you to do these 3 steps (a minimal sketch of the access-key variant follows right after the list):

  1. Create either a role or a user (for use of access key based connection)
  2. Apply the respective AWS policy to that role/user
  3. Create a new cloud monitoring instance within Ruxit and connect it to that newly created AWS resource from step 1
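
For illustration, here is what the access-key variant of steps 1 and 2 can look like when scripted. This is a minimal sketch and purely my own assumption of a workable setup: it uses boto3 with admin credentials, a made-up user name, and AWS’s managed CloudWatchReadOnlyAccess policy – the exact permission set Ruxit requires is specified in the help page linked above.

import boto3

iam = boto3.client("iam")

# 1. Create a dedicated user for the monitoring connection (the name is hypothetical).
user_name = "ruxit-monitoring"
iam.create_user(UserName=user_name)

# 2. Attach a read-only CloudWatch policy; the AWS-managed policy is used here as an
#    assumption - the actual permissions Ruxit needs are listed in their documentation.
iam.attach_user_policy(
    UserName=user_name,
    PolicyArn="arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess",
)

# 3. Create an access key; its ID and secret are what you enter when creating the
#    cloud monitoring instance within Ruxit.
key = iam.create_access_key(UserName=user_name)["AccessKey"]
print(key["AccessKeyId"], key["SecretAccessKey"])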

Right after executing the steps, the aforementioned demo tile changed into displaying real data and my AWS resources showed up (note that the example below already contains RDS, which I added at a later stage; the cool thing was that it was added fully unattended as soon as I had created it in AWS).

Ruxit AWS monitoring overview

Ruxit essentially monitors everything within AWS which you can put a CloudWatch metric on – which is a fair lot, indeed.
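
To give an idea of what “everything you can put a CloudWatch metric on” boils down to, here is a minimal sketch of the kind of query such an integration might perform under the hood – my own illustration using boto3, with the region and instance ID being made-up assumptions:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # region is an assumption

# Average CPU utilisation of one EC2 instance over the last hour, in 5-minute buckets
# (the instance ID below is a placeholder).
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2), "%")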

So, the next step clearly was to look for the same capability within newRelic. As far as I could work out, newRelic’s approach here is to offer plugins – and newRelic’s plugin ecosystem is vast. That means there are plenty of possibilities for integrating monitoring into the respective IT landscape (whatever it may be); however, one may consider the process of adding plugin after plugin (until the whole landscape is covered) a bit cumbersome. Here’s a list of AWS plugins with newRelic:

newRelic plugins for AWS

Add APM

Adding APM to my monitoring ecosystem was probably the most interesting experience in this whole test: as preparation for the intended result (i.e. analysing data about a web application’s performance under real user interaction), I added IIS to my server and an RDS database to my AWS account (as mentioned before).

The more interesting fact, though, was that after having finalized the IIS installation, Ruxit instantly showed the IIS services in their “Smartscape” view (more on that a little later). I didn’t have to change anything in my Ruxit environment.

newRelic’s approach is a little different here. The below screenshot shows their APM start page with .NET selected.

newRelic APM start page with .NET selected

After having confirmed each selection which popped up step by step, I was presented with a download link for another agent package which I had to apply to my server.

The interesting thing, though, was that still nothing showed up. No services, no additional information on any accessible apps. That is logical in a way, as I did not yet have anything published on that server that really resembled an application. The only thing accessible from the outside was the IIS default website (just showing the IIS logo).

So, essentially the difference here is that with newRelic you get system monitoring with a system monitoring agent, and by means of an application monitoring agent you can add monitoring of precisely the type of application the agent is intended for.

I didn’t dig further yet (that may be the subject of another article), but it seems that with Ruxit I can get monitoring for anything going on on a server by means of just one install package (maybe one more explanation for the aforementioned virus scan alert).

However, after having published my .NET application, everything was fine again in both systems – and the dashboards went red instantly as the server went into CPU saturation due to its weakness (as intended ;)).

Smartscape – Overview

So, final question to answer was: What do the dashboards show and how do they ease (root cause) analysis?

As soon as the app was up and running and web requests started to roll in, newRelic displayed everything there is to know about the application’s performance. Particularly nice is the out-of-the-box combination of APM data with browser request data within the first and the second menu item (either switch between the 2 by clicking the menu or use the links within the diagrams displayed).

newRelic APM dashboard

The difficulty with newRelic was to discover the essence of the web application’s problem. Transactions and front-end code performance were displayed in every detail, but I knew (from my configuration) that the problem of slow page loads – as displayed – lay in the general weakness of my web server.

And that is basically where Ruxit’s smartscape tile in their dashboard made the essential difference. The below screenshot shows a problem within my web application as initially displayed in Ruxit’s smartscape view:

Ruxit’s smartscape view showing a problem in my application

From this view, it was obvious that the problem was either within the application itself or within the server as such. A click on the server not only reveals the path to the dependent web application but also other possibly impacted services (obviously without end-user impact, as otherwise there would be an alert on them, too).

Ruxit smartscape with dependencies between servers, services, apps

And digging into the server’s details revealed the problem (CPU saturation, unsurprisingly).

Ruxit revealing CPU saturation as a root cause

Still, the number of dashboard alerts was pretty small. While I got 6 eMails from newRelic telling me about the problem on that server, I got only 2 from Ruxit: one telling me about the web app’s weak response and another about CPU saturation.

The next step, hence, would be to scale up the server (in my environment) or to scale out or implement an enhanced application architecture (in a realistic production scenario). But that’s another story …

Bottom line

Event correlation and alerting on a “need to know” basis – at least for me – remains the right way to go.

This little test was done with just one server, one database, one web application (and a few other services). While newRelic’s comprehensive approach to showing information is really compelling and perfectly serves the objective of complete SLA compliance reporting, Ruxit’s “need to know” principle comes much closer to what I would expect from innovative cloud monitoring.

Considering Netflix’s philosophy from the beginning of this article, innovative cloud monitoring basically translates into: every extra step is a burden. Every extra piece of information on events without impact means extra ops effort. And every extra click needed to correlate different events to a probable common root cause critically lengthens MTTR.

A “need to know” monitoring approach while at the same time offering full stack visibility of correlated events is – for me – one step closer to comprehensive Cloud-ready monitoring and DevOps.

And Ruxit really seems to be “spot on” in that respect!

 


The “Next Big Thing” series wrap-up: How to rule them all?

What is it that remains for the 8th and last issue of the “Next Big Thing” blog post series: To “rule them all” (all the forces, disruptive challenges and game changing innovations) and keep services connected, operating, integrated, … to deliver value to the business.

A while ago, I came upon Jonathan Murray’s concept of the Composable Enterprise – a paradigm which essentially preaches fully decoupled infrastructure and applications as services for company IT. Whether the Composable Enterprise is an entirely new approach or just a pin-pointed translation of what is essential to businesses mastering digital transformation challenges doesn’t really matter.

The importance lies with the core concepts of what Jonathan’s paradigm preaches. These are to

  • decouple the infrastructure
  • make data a service
  • decompose applications
  • and automate everything

Decouple the Infrastructure.

Rewind to my own application development and delivery days during the 1990s and 2000s: when we were ready to launch a new business application, we would – as part of the rollout process – inform IT of the resources (servers, databases, connections, interface configurations) needed to run the thing. Today, large IT ecosystems sometimes still function that way, making them a slow and heavyweight inhibitor of business agility. The change to incorporate here is twofold: on the one hand, infrastructure owners must understand that they need to deliver at the scale, time and demand of their business customers (which means more uniform, more agile and – in terms of sourcing – more flexible delivery mechanisms). On the other hand, application architects need to understand that it is no longer their architecture that defines IT needs; rather, their architecture needs to adapt to and adopt agile IT infrastructure resources from wherever they may be sourced. By following that pattern, CIOs will enable their IT landscapes not only to leverage more cloud-like infrastructure sourcing on-premise (thereby enabling private clouds) but also to ubiquitously use ubiquitous resources following hybrid sourcing models.

Make Data a Service.

This isn’t really about BigData-like services. It might be (in the long run). But it is essentially about where the properties and information of IT – of applications and services – really reside. Rewind again, this time only 1 or 2 years. The second-to-last delivery framework that my team of gorgeous cloud aficionados and I created was still built around a central source of information – essentially a master data database. This simply was the logical framework architecture approach back then. Even only a few months ago – when admittedly my then team (another awesome one) and I already knew that information needs to live within the service – it was still less complex (hence: quicker) to construct our framework around such a central source of (service) wisdom. What the Composable Enterprise, though, rightly preaches is a complete shift of where information resides. Every single service which offers its capabilities to the IT world around it needs to provide a well-defined, easy-to-consume, transparently reachable interface to query and store any information relevant to the consumption of the service. Applications or other services using that service simply engage via that interface – not only to leverage the service’s capabilities but even more to store and retrieve data and information relevant to the service and the interaction with it. And there is no central database. In essence, there is no database at all. There is no need for any. When services inherently know what they manage, need and provide, all db-centric architecture for the sole benefit of the db as such becomes void.
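
To make the “data lives with the service” idea a bit more tangible, here is a minimal sketch – my own illustration, not taken from the Composable Enterprise material: a single service keeps its own state and exposes it solely through a small HTTP interface; port, paths and payload format are arbitrary assumptions.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The service's own state lives with the service - there is no central database.
# What gets stored is whatever is relevant to consuming this particular service.
_state = {}

class ServiceInterface(BaseHTTPRequestHandler):
    """The well-defined, transparently reachable interface of one composable service."""

    def do_GET(self):                      # consumers query service information here
        key = self.path.strip("/")
        body = json.dumps(_state.get(key, {})).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):                      # consumers store service-relevant information here
        key = self.path.strip("/")
        length = int(self.headers.get("Content-Length", 0))
        _state[key] = json.loads(self.rfile.read(length) or b"{}")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ServiceInterface).serve_forever()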

Decompose Applications.

The aforementioned leads one way into the decomposition pattern. More important, however, is to think more thoroughly about what a single business-related activity – a business process – really needs in terms of application support. And in turn, what the applications providing this support to the business precisely need to be capable of. Decomposing applications means identifying useful service entities which follow the above patterns, offer certain functionality in an atomic kind of way via well-defined interfaces (APIs) to the outside world and thereby create an application landscape which delivers on scale, time, demand, … just by being composed through service orchestration in the right – the needed – way. This is the end of huge monolithic ERP systems which claim to offer all that a business needs (you just had to customize them rightly). This is the beginning of lightweight services which rapidly adapt to changing underlying infrastructures and can be consumed not only for the benefit of the business owning them but – through orchestration – form whole new business process support systems for cross-company integration along new digitalized business models.
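
As a sketch of what “composed through service orchestration” can look like in practice – with entirely hypothetical service names, URLs and payloads of my own invention – one business process step simply calls the well-defined APIs of the decomposed services it needs:

import json
import urllib.request

# Hypothetical endpoints of two decomposed services; in reality these would be the
# well-defined APIs of whatever services your business process requires.
INVENTORY_API = "https://inventory.example.com/reserve"
BILLING_API = "https://billing.example.com/invoices"

def call(url: str, payload: dict) -> dict:
    """POST a JSON payload to a service API and return its JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def process_order(order: dict) -> dict:
    """One business process step, composed purely by orchestrating service APIs."""
    reservation = call(INVENTORY_API, {"items": order["items"]})
    invoice = call(BILLING_API, {"customer": order["customer"],
                                 "reservation": reservation["id"]})
    return {"reservation": reservation, "invoice": invoice}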

Automate Everything.

So, eventually we’ve arrived at the heart of how to breathe life into an IT that supports businesses in their digital transformation challenge.

Let me walk you through one final example emphasizing the importance of facing all these disruptive challenges openly: an Austrian bank of high reputation (and respectable success in the market) gave a talk at the Pioneers about how they discovered that they were actually not a good bank anymore – that in a few years’ time they would no longer be able to live up to market challenges and customers’ demands. What they discovered was simply that within some years they would lose customers just because of their inability to offer a user experience integrated with the mobile and social demands of today’s generations. What they did in turn was to found a development hub within their IT unit, solely focussing on creating a new app-based ecosystem around their offerings in order to deliver an innovative, modern, digital experience to their account holders.

Some time prior to the Pioneers, I had received a text that “my” bank (yes, I am one of their customers) now offers a currency exchange app through which I can simply order the amount of currency needed and receive a confirmation once it’s ready to be handed to me in the nearest branch office. And some days after the Pioneers I received an eMail that a new “virtual bank servant” would be ready as an app on the net to serve all my account-related needs. Needless to say, a few moments later I was in, and the experience was just perfect, even though they follow an “early validation” policy with their new developments, accepting possible errors and flaws for the benefit of reduced time to market and more accurate customer feedback.

Now, for a moment imagine just a few of the important patterns behind this approach:

  • System maintenance and keeping-the-lights-on IT management
  • Flexible scaling of infrastructures
  • Core banking applications and services delivering the relevant information to the customer facing apps
  • App deployment on a regular – maybe a daily – basis
  • Integration of third-party service information
  • Data and information collection and aggregation for the benefit of enhanced customer behaviour insight
  • Provision of information to social platforms (to influence customer decisions)
  • Monitoring and dashboards (customer-facing as well as internally to business and IT leaders)
  • Risk mitigation
  • … (I could probably go on for hours)

All of the above capabilities can – and shall – be automated to a great extent. And this is precisely what the “automate everything” pattern is about.

Conclusion

There is a huge business shift going on. Software, back in the 80s and 90s, was a driver of growth, had its downturn during and after the dot-com age and now enters an era of being ubiquitously demanded.

Through the innovative possibilities of combining existing mobile, social and data technologies, through the merging of the physical and digital worlds and through the tremendously rapid invention of new thing-based daily-life support, businesses of all kinds will face the need for software – even if they have not felt that need so far.

The Composable Enterprise – or whatever one wants to call a paradigm of loosely coupled services being orchestrated through well-defined transparently consumable interfaces – is a way for businesses to accommodate this challenge more rapidly. Automating daily routine – like e.g. the aforementioned tasks – will be key to enterprises which want to stay on the edge of innovation within these fast changing times.

Most important, though, is to stay focussed within the blurring worlds of things, humans and businesses – to keep the focus on innovation not for the benefit of innovation as such but for the benefit of growing the business behind it.

Innovation Architects will be the business angels of tomorrow – navigating their stakeholders through an ongoing revolution and supporting or driving the right decisions for implementing and orchestrating services in a business-focussed way.

 

{the feature image of this last “The Next Big Thing” series post shows a design by New Jersey and New York-based architects and designers Patricia Sabater, Christopher Booth and Aditya Chauan: The Sky Cloud Skyscraper – found on evolo.us/architecture}


What about Transparency?

If you need to seek for transparency, your provider failed.

Around September 25, AWS notified their valued customers of reboots of their EC2 infrastructure during the course of the upcoming weekend. The notifications also stated: “You will not be able to stop/start or re-launch instances in order to avoid this maintenance update.” Hence, we – and many others, obviously – were forced to undergo this maintenance and prepare for any potential follow-up maintenance of our own after a possible failure during the reboot (which, admittedly, we were lucky not to have).

In an attempt to understand the root cause of this “scheduled” maintenance, we were able to discover some forum conversations such as this one: https://forums.aws.amazon.com/thread.jspa?threadID=161544&tstart=0

On Wednesday, Oct 1st, customers received an eMail notification with the subject “Follow-up Note on Bash Security Issues from Last Week”, which claimed that AWS “reviewed the security issues, known as CVE-2014-6271 and CVE-2014-7169, and determined that our APIs and backends were not affected“. More detailed explanations were linked in the eMail, referring to an AWS Security Bulletin.

When digging a little further into the issue, I was able to discover this article (also dating September 25th).

At this very moment it is still unclear what really caused the host reboots affecting many EC2 customers, and while AWS did a very good job of sending targeted information to those customers who were really affected by the reboot (rather than spamming everyone with the info), they failed completely at making it transparent to users why the reboots needed to happen.

Security – ladies and gentlemen at AWS – is about transparency. First and foremost.

 


(It’s) The End of the Cloud (as we know it)

“It’s the end of the world as we know it”, they say, “and I feel fine”. I do indeed. Things change. And change is always moving us and all around us. And in the best case it’s moving us forward.

So, let me start with a very moving (and maybe touching) statement:

 

“The Cloud has ended!
You can now stop talking about it.”

After about 5 years (if you think of Central European geography – maybe a few more in the US), the Cloud has finally come to its end. That’s great! Because we can finally move on to the really important things in our day-2-day IT life!

So – why is that? Why do I have the strong belief that this hype is now finally over?

Some 5 years ago I participated in a panel discussion. The main question it circled around was whether Cloud could ever be secure or not – multi-tenant or not – reliable or not – compliant or not. The panel discussion as such went quite well, and eventually attendees from some real industry giants, like e.g. Siemens, for the first time felt assured that this was

 

The next Big Thing to Come …

And it was! Indeed! It remained to be perceived as the next big thing for the next – well – let’s say: 2 to 3 years.

That occasion was only one example – however – of how conversations went from then on. People lost themselves in arguments about the security, compliance, controllability and segregability of a cloudy, foggy and fuzzy monster which no one could really grasp for a long time. Discussing these was just easier than understanding the big advantages to come …

Still, it remained not only a hype but found its way into our everyday life! Even more than that – Cloud Computing became the basis for what we today call the third IT revolution. And many talks today aim at broadening everybody’s perception of this revolution – one within our everyday lives, with IT being planted into any kind of small or big device – one within our industry lives, with production processes being shifted by 90° to create a completely new way of delivering applications – and eventually also a revolution for the big players, as many of them might lose traction when keeping one foot on the platform and the other on the train for too long …

Soon after that panel discussion back in 2009, my then team and I entered the Windows Azure “Technology Adoption Program”. We created a first Cloud-based software distribution platform; you could compare it to those systems management architectures which have long been very common in enterprise IT. By using Cloud patterns, we were able to achieve a broader reach and distribute in a more scalable and more flexible way. Azure – back then – had its really major gaps and improved intensively along the way – together with fellow teams like ours, who recommended changes and enhancements to the platform without end. Amazon – by that time – already had its place in the market. But what were they doing? Nothing more and nothing less than providing servers – more or less. Infrastructure-as-a-Service, it was called (the question is: would you call that disruptive today?).

Everybody talked about

 

The Big 3

Amazon, Microsoft and of course — ? — Google. Question: what do all 3 of them have in common?

Exactly: all are US-based, spreading their capabilities across the world pretty quickly – backed by major investments – but still US-based. And every enterprise knew back then: this was not going to work in compliance with their respective country’s laws or antitrust regulations.

So what remained the major discussion over the years? Still? Security. Compliance. Controllability. Segregability – still the same. Resentments were omnipresent, and the Big 3 kept having a hard time generating adoption of their brilliant new technologies (whereas, admittedly, Microsoft had the hardest time, because they lacked a dollar-rolling ecosystem such as a brilliant search engine or brilliant book sales).

Still, that Cloud hype could not be killed. Despite all the concerns of private and public endeavours, of small businesses and large enterprises, Cloud Computing remained and even strengthened its position as the basis for more to come: technology ecosystems that evolved solely and purely because cloud found its way into our everyday life and everyday enterprise IT.

And how couldn’t it?

How many of us are using eMail, some kind of file sharing facility, a social network of any kind? Anyone not using any remotely hosted collaboration product, authoring tools, design helpers – like the Adobe family – etc. etc. …

Who’s been using that within one’s work environment daily? For how long? 2 years, 3 years, …

We all started long ago to enhance our day-2-day work experience by adding useful little helpers into our IT behaviour – sometimes without even noticing how far away the premises are to which we’re providing our data. Haven’t we?

And many times we’ve all done that long before our enterprise IT provider offered us the same convenience, the same workforce enhancements, from within our company’s premises. Either privately or even on our company PCs. Enterprise IT departments have been facing the worst time of their existence, with sleepless nights for CIOs asking themselves the very same question over and over again: how the hell can I stop that Cloud thingy from happening to my enterprise IT when it is so insecure and non-compliant, while at the same time all my CxOs are using it on their iPads? How can I adopt it without losing control completely?

Well – Ladies and Gentlemen – let me assure you: The nightmare has an end as it’s

 

The End of the Cloud as we know it!

(BTW: ever had a look at a recent Gartner hype cycle and its positioning of “Cloud Computing”? They have kept it on the declining edge for the last 2 years or so …) So I think we can rest assured: our struggle is over.

But why? Why can a hype that big, that it managed to become the basis for all new and disruptive power in IT – social, mobile apps, data management and analytics – finally end?

Because it finally became utterly boring! Because it simply isn’t Cloud we’re talking about anymore! Stop talking Cloud. Stop talking SaaS. It no longer retains the importance it used to have. It has ceased to exist as a self-contained technology arguing its relevance by itself. It has ceased to exist on its own.

Hence – and here comes more good news – we’re now finally able to adopt this great new technology precisely for what we really need it: to accelerate and grow the businesses which we – the IT guys amongst us – are bound to support. To integrate it into the devices we’re creating and stay as connected as needed. Or to accelerate our delivery into continuity. Or to simply consume the services we need to consume. Right now. Exactly as much and as rapidly as we need ’em. And ultimately: to bridge the great IT systems we’ve all been managing for decades with the elasticity, scalability and accuracy of what was formerly called Cloud.

Let me talk you through a few examples:

  • Salesforce started in 2001, claiming the decline of software. They faced the same concerns, the same weak adoption (at least in the beginning), the same hurdles for their entirely subscription-based model. Today, Salesforce offers an ecosystem of products – even a platform to create your own: force.com – which integrate with your on-prem IT virtually friction-free, mostly by just a few configuration mouse clicks. And a great many enterprises trust Salesforce enough to hand them one of their most critical business assets: their customer and opportunity data.
  • Or take user management: many of today’s enterprises still struggle to provide their businesses a seamless login and authorization experience. At the same time, many roll out an identity provider which bridges on-prem IT systems with SaaS-provided systems and thereby offers a totally new single-sign-on experience, not only within their on-prem IT. It’s a way of shifting – or even dissolving – the borders.
  • Or think of the most obvious example of all: infrastructure and application provisioning. Who has not yet introduced virtualization into their IT? Every larger enterprise has, and the smaller ones leverage it from some off-premise provider. Finally, the technology that was formerly called “Cloud” has evolved to a level of maturity at which it lives up to the promises of earlier days: there are means for you to control and manage your on-premise IT infrastructure seamlessly, jointly with infrastructure provided – well – somewhere (I’m not stressing the word again).

However,

 

One Piece is Missing

One extremely important technology item that has the ability and purpose of gluing it all together: Automation. Or to be precise: Automation and Orchestration.

We are entering the era of Orchestration. We’re leaving the self-fulfilling, technology-focussed island of pure “Cloud” behind us and can finally commence creating the really interesting stuff: by bridging our solidly designed and implemented IT systems and architectures with even more solid, secure, reliable, scalable and elastic architectures, we are accelerating the business value of technology. We start to shift our thinking towards “Orchestration” and “Service”:

So, trust me: it’s the End of the Cloud as we know it! And we can indeed feel entirely fine about it. Security concerns have been addressed over the past years. Compliance has been factored into vendors’ platforms – as has multi-tenancy. And we are able to control where our data resides. Hence, the change can finally move on. We can stop talking “Cloud”, as it has become boring as a standalone technology.

 

The Hype is Over

We’re entering the era of Orchestration – of Service Orchestration and entirely Service-based delivery. And Automation is the glue to make it happen. Automation is the glue between those valuable, well-defined, secure IT architectures and whole-new ecosystems of platforms – for the benefit of Service Orchestration.

So: Let us stop the “Cloud” talk. And start to

Automate to Orchestrate

 

 


Mind: DevOps isn’t a role model!

This blog post (“How ‘DevOps’ is Killing the Developer”) is making waves at the moment. It reached me right in the middle of some really cool, breakthrough conversations on how DevOps will lead a change of culture and role perception … and it fully and truly nailed the opposite of those highly positive talks. To say it in the words of one of the commenters: “I couldn’t disagree more!” I would even go as far as to consider it dangerous!

Why?

Because the post reflects a totally wrong perception of DevOps! The article claims that DevOps transforms the role and responsibility of a particular person – a developer in this case. I would be surprised if the literature really postulated this – the change of a role. DevOps is the transformation of HOW things are done, not WHO does it. Firstly, you have to lay the basis for a DevOps company transformation. Do developers change their expertise because of that? No. Do OPS guys? Neither. BUT: they do get closer together and gain a better understanding of each other’s challenges.

Secondly: The post misses another highly important – maybe the most important – investment along with DevOps introduction: Automation! Along with the cultural change, you’ll have to invest in automation of processes for artefacts which would formerly have taken you days and weeks to create/setup/deploy/run.

90°

So – let’s be clear here: DevOps isn’t the change of a role! DevOps is a 90° turn of a modus operandi. The whole movement derives from manufacturing, where the importance lies in getting rid of any blocker in a production pipeline. Continuous production would not change the role of the screwmaster (to name just anything), and neither would DevOps change that of a QA expert or buildmaster … or – well – a developer (as taken as the example here)!

The article is dangerous in another respect: it claims developers should concentrate on development and nothing else. Yet it is another important aspect of DevOps as a cultural transformation to bring everybody an understanding of everybody else’s responsibility in the process – and thereby encourage Automation even more to take its place in it all. This importance is totally missed in the post!

Bottom line

Let’s be crystal clear on a few things with DevOps:

  • It’s a cultural and organizational change; not a role and responsibility change for single individuals
  • It is a 90° turn of a modus operandi. It turns vertical silos of responsibility and action into horizontal pipelines/chains of continuous workflow
  • It’s a way to create role and responsibility awareness throughout the whole chain of collaborating individuals
  • And it surfaces the need of Automation to support cultural and process transformation, stability, security, repeatability, speed, continuity, …

There’s – however – a really positive DevOps-supporting aspect in that post: It does indeed drive discussion into a good direction … just browse through the comments there … 😉

 

( This post was also published in the official Automic company blog: http://blog.automic.com/devops-not-a-role-model )

 

Published by:

3 things to consider when creating a new SaaS product – Automic blog post

You want to create a new product. At the same time you want to create a new delivery model – e.g. Software-as-a-Service (SaaS) – for your new product.

Of course – these days – you also want to create a new pricing model for your product. And you want to increase delivery speed while maintaining product quality, of course, as high as it always was.

Ultimately you also want to keep the consistent flow of upgrades, patches, and hotfixes, for your existing enterprise product landscape intact.

Challenging? Yes. Impossible? No.

 

1. Deal proactively with the risk of technical debt

Creating something of this complexity within such a short timeframe can easily lead to artefacts being developed that play together well at first but never scale to the extent that an enterprise-ready product needs to.

A clear architectural separation of concerns between entities, while at the same time keeping focus on a concise end-to-end architecture for all building blocks, is key to avoiding technical debt right from the beginning.

One approach of achieving that focus is, of course, to invest heavily into upfront creation of architecture and design specifications.

However, this approach might just not serve the goal of a short time-to-market sufficiently.

Hence, the only way of staying on the path and thereby reducing technical debt is to create just enough specs to form the boundaries within which a team of brilliant technologists can quickly develop the MVP – the minimum viable product – to be pushed out, while at the same time staying focused on the broader goal.

2. Be DevOps from the beginning

One might consider the creation of a new product within a new delivery model (like SaaS) to be just another product development.

Here at Automic we have a lean product development process in place based on agile patterns and already tailored to our customers’ needs with regards to fast change request and hotfix delivery.

However, approaching SaaS with this speed instantly surfaces the need of something new in that space. Hence – along with a concise architecture specification – you need to create not only a DevOps oriented tool chain but at the same time a DevOps culture between the involved organizational units.

DevOps – if implemented end-to-end – changes your delivery, maintenance and operations pipeline completely. Developers are challenged by the fact that their deliverables are instantly deployed to a QA system, test engineers shift their focus from testing to end-2-end test automation in order to support automated production deployments, and operations starts to deal with an application-centric rather than a system-centric view of their environments.

Setting the stage by creating a DevOps funnel from the very beginning is key to delivering not only the MVP but also its constant continuous enhancements.

3. Create a consistent architecture model of Automation and Orchestration

Having a solid, enterprise-ready, SaaS-ified product in place is a major challenge in itself. Creating a solid delivery framework of support services for operations and business processes clearly adds a significant level of complexity.

The cornerstone of this is a strong Automation layer that divides its capabilities into clearly separated building blocks for the respective purposes (e.g. customer instance management, component packaging, user provisioning, etc.). Put them into the entities they clearly belong to.

Do not put capabilities (logic, functionality) into a building block or component that actually serves a different purpose. Create small functional entities within the Automation layer and orchestrate them into a support service for a well-defined purpose within the framework.
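
As an illustration only – a minimal sketch of that idea, with hypothetical building-block names I made up for this example (this is not Automic functionality): small, single-purpose automation entities are orchestrated into one support service with a well-defined purpose, here customer onboarding.

# Small, single-purpose building blocks within the Automation layer.
# Each one does exactly one thing and knows nothing about the others.

def provision_customer_instance(customer: str) -> str:
    """Create an isolated product instance for a customer and return its ID."""
    instance_id = f"{customer}-instance"   # placeholder for the real provisioning logic
    print(f"provisioned {instance_id}")
    return instance_id

def create_user(instance_id: str, email: str) -> None:
    """Provision a single user on an existing instance."""
    print(f"created user {email} on {instance_id}")

def register_for_billing(customer: str, instance_id: str) -> None:
    """Hook the new instance into the billing/business process layer."""
    print(f"billing registered for {customer}/{instance_id}")

# One support service for a well-defined purpose - customer onboarding - composed
# purely by orchestrating the small entities above.
def onboard_customer(customer: str, admin_email: str) -> None:
    instance_id = provision_customer_instance(customer)
    create_user(instance_id, admin_email)
    register_for_billing(customer, instance_id)

if __name__ == "__main__":
    onboard_customer("acme", "admin@acme.example")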

Holding these paradigms high during the minimum viable design process as well as during the rapid – somewhat prototype-like – creation of the MVP will later allow you to decouple and re-bundle entities along the path of scaling your building blocks and your entire delivery framework. Of course, a strong Automation product tremendously eases achieving this goal.

Are you involved in creating a SaaS product? What have you learned from the experience? We’d be keen to get your thoughts in the comments below.

 

( This post was also published in the official Automic company blog: http://blog.automic.com/3-things-to-consider-when-creating-a-new-saas-product )

 


Change “the Change”?

I’ve made up a little story.

The story is fictitious. The characters in the story are fictitious. Any resemblance to existing persons is purely in the reader’s mind. Any resemblance to your company’s processes is mere coincidence.

Change?

Is it the process or is it communication?

Characters:

  • George: a very helpful infrastructure project manager located somewhere in Europe
  • Francis: an application manager located somewhere else in Europe
  • Olaf: an even more helpful datacenter operator geographically close to George
  • Dakshi: a very talented cloud operations engineer in/from India
  • Hans: a knowledgeable senior cloud technician, European
  • Bob: the senior architect for infrastructure engineering (US)
  • Evelyn: the infrastructure engineering project manager (US, close to Bob)
  • cloud-team.operations@company.com: a support mailbox

On Feb, 4th Francis writes:

Hi Hans, Dakshi,

I currently have an issue you’ll probably be able to solve easily.

Our company has signed a deal with a European government’s ministry to host an instance of our social enterprise application. For obvious reasons, that instance needs to be hosted in the customer’s country (not in our cloud).

As this project has a very tight schedule, and to be sure we can deliver in time, I need a dump of a few virtual machines from our existing V2 implementation so we can import them on our local VMWare infrastructure.

Machines needed are:

  • webserver01
  • dbserver01
  • accesssrv01
  • worker01
  • viewer01

I’m not sure what would be the best and fastest way to transfer them. A network transfer may take too long, so maybe we can use a hard drive sent to us by UPS, as we did for earlier transfers across the ocean.

I hope we can do this very quickly as the schedule is very tight, and I’m a bit late due to the time spent trying to fix the current V1 issues.

Best regards,

Francis


Let us briefly elaborate on the geographical circumstances of this request:

  • The datacenter running the V1 and V2 versions of the application under question is located in Europe
  • The customer is located in a different European country from the datacenter
  • The DC operations team is located in India

Dakshi is a very talented and quick operations guy who picks up the request just 5 minutes later and willingly responds that he will export the requested VMs, but that he needs a contact geographically close to the cloud DC to x-plug (and deliver) a USB hard disk. For this he redirects to Hans. Hans is known to know virtually everything, which in certain organizations sometimes means that Hans would also be the one doing everything – not so here: Hans redirects to George. George is a project manager located close to, and responsible for activities in, the DC in question, hence supposed to be the best choice to coordinate the HDD x-plug and delivery process.

* FULLSTOP *


On Feb, 13th Francis writes:

Hi George,

I’m Francis, in charge of our social enterprise application. We’ve been working with our cloud team on the company’s “DriveSocial” project (V1 and V2). I’m writing to you on behalf of Hans. Our company has signed a deal with a local government’s ministry to host an instance of our social enterprise application. For obvious reasons, that instance needs to be hosted in the customer’s country.

As this project has a very tight schedule, and to be sure we can deliver in time, I need a dump of a few virtual machines from our application’s V2 implementation so we can import them on our local VMWare infrastructure.

Machines are:

  • webserver01
  • dbserver01
  • accesssrv01
  • worker01
  • viewer01

Can you please have them dumped and sent on a hard drive to:

DC1GER – Markus Verdinger, Sackstrasse 240, 99999 Praiss, Germany

The latest snapshot of the virtual machines, even though from last week would be perfect.

Thank you very much in advance,

Francis


A few minutes later the same day, Hans (in a supportive manner) shares a diagram of the V2 implementation with everybody; included is a detailed directive on how to discover the right VMs within the “complicated jungle” of virtual organizations and virtual appliances [2(!) ORGs; 5(!) vApps].

George, the project manager and always eager to exactly specify the right activities, now kicks off a conversation about which server to plug the harddisk into – which ultimately involves Bob, the infrastructure architect.

On Feb, 13th Bob writes to clarify:

George,

They are looking for the physical jump server in EU1DC rack 0815. On the rack elevation diagram, you are looking for CWRWTs001 in RU99.

Looks like these UCSs have the old US names on them.  We need to review the names for the UCSs in Rack 0815 and make the corrections.

@Evelyn, please set up meeting to talk about and correct this issue.  We should double check EU2DC too.

Thanks. Bob


Evelyn confirms instantly. Case closed for today.

  • Total #mail: 11
  • Total #mail today: 8

The next day passes with George’s proactive attempts to get the right HDD into the right server. He’s supported by Olaf, who gives regular live reports on the actual situation in the DC (i.e. precisely explaining which HDD is plugged into which server and asking for permission to x-plug).

On Feb, 15th George writes:

Hans,

Can you let us know who will be bearing the costs for sending this HDD over to Praiss, Germany?

cheers G


Hans honestly responds that he does not know and redirects back to Francis “coram publico”, assuming that the cost question will surely be no issue here, as it is all about a very important customer and a very urgent request.

For the first time in our story things become “a bit complicated” here, as George has to ask permission to book the costs for x-plugging an HDD, putting it into an envelope and shipping it with UPS. The representative of the respective delivery organization for this customer kicks in and asks whether sales has a budget (and PO) for this effort. George suggests using a cloud development PO for the sake of simplicity; Hans again suggests – for the sake of even more simplicity – invoicing the department responsible for the social enterprise application directly (as this will be the benefitting party in this whole story anyway).

On Feb 15th, late in the evening (after having patiently watched the emails on the matter so far), Francis writes:

Jesus Christ…

Please invoice our department

In any case this is still our company’s money – so what!

Best regards

Francis

… and George kicks off the task of x-plugging and exporting with Dakshi by — — — asking Hans to open a Change Ticket in the Change Management system for this activity!

… which now – for the first time – leaves Hans standing in complete awe, totally failing to understand his involvement in the case (especially as Hans is a future-oriented, agile-minded technician who disbelieves in the flexibility of traditional change processes based on ITIL; ITIL was great some 10 years ago – Hans believes that this is the cloud era, which asks for more rapid process definitions and especially executions – but that’s a different story …)


This is the last we hear from our story’s heros before the weekend begins …

  • Time elapsed: 11 days
  • Total #mail: 34
  • Total #mail today: 17

On the following Monday, Hans and George spend some time (in writing and verbally) clarifying how to kick off such an activity properly, and George (who asked for a change ticket just a few days ago) suggests that the right way would be to engage with the cloud operations team in India directly. … Wait. … Directly with India. … I need to scroll up to the beginning here … Wasn’t that what Francis … did …

On Feb, 18th late in the evening, George writes:

Hi Dakshi,

could you please give me a feedback of the status regarding the transfer of the desired files?

Thanks and Best Regards

George

Dakshi now asks his colleague Abhu to act on the request (giving information on which VMs are needed), and Abhu asks Prahti to start the export. And Prahti re-queries the right ORG and vApps. Wait. … I need to scroll up again … Didn’t Hans … provide this very same information … directly to … Dakshi? Well – it’s only copy-paste anyway, and Hans is known to have an everlasting, rapidly searchable email archive. Info delivered. Now over to Prahti.

It’s Monday, Feb 18th.

  • Time elapsed: 14 days
  • Total #mail: 41
  • Total #mail today: 7

* FULLSTOP *


On Feb, 25th, Francis writes:

Hi Hans,

Sorry to disturb you, but do you know if the hard drive with the virtual machines has been sent to our DC? I just had a call with them and apparently nothing has been delivered yet.

Best regards,

Francis

Hans honestly responds that he has no new status, and Francis redirects his question to George. George in turn asks Dakshi. Dakshi confirms that he has started the export but reports that he had issues with some of the VMs (their export caused high load on the servers, which is why he did (does) not want to continue, in order not to disturb productive environments). This leads Francis to ask the blunt question of why it wouldn’t be possible to just use the latest backups of the very same VMs.

cloud-team.operations@company.com thereafter confirms that backups can be restored to a separate location and the export can then be started. Wait. … ? … Let’s briefly contemplate why a restore … … …

On Feb, 26th, Prahti writes:

Hi Francis,

We have exported following machines to the attached hard drive:

  • webserver01
  • dbserver01
  • accesssrv01
  • worker01
  • viewer01
  • loadbalancer01

Prahti


  • Time elapsed: 22 days
  • Total #mail: 49
  • Total #mail today: 2

* FULLSTOP *


On Mar, 15th, Francis writes:

Hello George and Hans,

I just had our DC people on the phone, and they’re still waiting for the VMs. It’s been one month now. Do you have a status on this please?

Best regards,

Francis

George, some hours later, replies:

Hi Hans, Hi Francis,

sorry for the delay, but there was no change ticket in place.

But in this case the colleagues will take an unconventional approach. However, the arrival of the disk at the agreed shipping address is not expected before next week.

Best Regards

George

* FULLSTOP *

On Mar, 27th, Francis writes:

Hi George,

I’ve been contacted by the DC, and they’re still waiting for your shipment to be delivered. Do you have any update on this please?

Best regards,


After this we lose track and the emails trickle away.

  • Time elapsed: 51 days
  • Total #mail: 53
  • Total #mail today: 1

* FULLSTOP *



On June, 6th, Francis writes:

Good afternoon,

Our DC has mounted the machines and it appears you did not provide the right machines. Actually, you sent us virtual machines that belong to another client and that are not even running Windows Server.

Besides the fact that it is unacceptable to get data from another client, the delay introduced by this puts our company at risk with that very important client (a government ministry for employment).

For the record, the virtual machines that need to be cloned, copied onto a USB drive and sent to our DC are:

  • webserver01
  • dbserver01
  • accesssrv01
  • worker01
  • viewer01
  • loadbalancer01

The client’s DC address is: DC1GER – Markus Verdinger, Sackstrasse 240, 99999 Praiss, Germany

Please, fix ASAP.

Regards,

Rewind to start …


* FULLSTOP *

Questions:

  • Where is the communication leak?
  • Where does the process lack clarity?
  • What could have been done by whom to improve the result of this operation/request?
  • Who should consider their job attitude?
  • How would ITIL support such a case?
  • Would this be possible to happen in your company? Why not?