Tag Archives: cloud computing

After the Cloud: What Comes Next?

Transcript of my keynote on the post-cloud era. Slides are available here for download.

Have you ever wondered where the cloud lies buried? There you go! Here it is 😉

This is where the Cloud is buried!

I have already dealt in several publications with the pointlessness of discussing cloud computing – not because the cloud is actually dead (opening a talk with that claim back in 2014 was, well, a bit "cheeky"), but because it is as ubiquitous as the Internet itself – and we don't discuss that anymore either. But let's take a closer look at the history:

Cloud Timeline 2016

Did you know that cloud computing actually dates back to 1999? That was when Salesforce was founded – and with guerrilla marketing it thoroughly shook up the traditional CRM scene; its success today, I think, proves the point. In the years that followed, things quieted down for a while; but from about 2006 onward, cloud providers shot up out of nowhere at a pace that was probably unimaginable back then.

Today we no longer discuss whether the cloud exists, but how we best integrate it into our IT strategy, which services we can use sensibly, or how we connect multiple services and our on-premise data center and manage them efficiently (RightScale, by the way, has a very promising approach with their cloud management platform). And while we used to show logo maps for COTS (Commercial Off The Shelf) software, we now show them – as here – for SaaS, for example.

SaaS Logos

Let me take one more step back – to the well-known NIST Special Publication 800-145, the definition of cloud computing.

Cloud Computing: 5 essential characteristics

As shown in the image above, the five essential characteristics of a cloud are:

  • On-demand self-service: I get something exactly when I need it – and immediately
  • Broad network access: I reach the service via a high-performance network – we are talking about the provider's connectivity here, not necessarily about my mobile phone up in the mountains
  • Resource pooling: The cloud's resources are managed in such a way that multiple consumers can use the cloud efficiently side by side, and everyone always gets exactly what they currently need
  • Rapid elasticity: As a result, services scale so efficiently that they adapt elastically to demand
  • Measured service: Every consumption of a service is measured precisely and reported to the user in a traceable way

If you have a service in your data center – in your IT – that supports your company's business processes along these lines, then you de facto have a cloud. Take a moment to consider: Do you have a cloud? Do you have a cloud service? Several? From where? Built yourself? Integrated? … ?

We no longer need to talk about the existence of cloud computing in all our business applications (you would be surprised, by the way, which companies in Austria have already moved from their own Exchange server to Office365 …)

So let's talk about what comes next!

Analysts presented models for this years ago – starting around 2013 – that put the cloud into a larger overall context. At IDC the concept was called "The Third Platform"; at Gartner it was "The Nexus of Forces".

The 3rd Platform (Source: IDC)

There are certainly others. One thing all of these concepts have in common: they postulate the next evolution in IT and explain it through the interplay of four forces:

  • Cloud Computing
  • Mobility
  • Social Business / Social Media
  • and BigData/Analytics

While all of that may well be correct, I believe these concepts share one original flaw: cloud computing is not one part of the whole but the inherent basis of the entire model:

Cloud Computing as a basis to the 3rd Platform

Let me use the other three paradigms to explain why that is.

1: Social Transformation

Social Transformation – from Profiles to Data Feeds

The image above shows what Facebook looked like in 2005 and 2011, and what it looks like today. You can see that the initial focus was on the user and their profile, their attributes, their preferences. Mark Zuckerberg originally simply wanted to connect the whole world. That such a groundbreaking idea – once it works – can of course be financed perfectly through advertising on the platform is obvious (and clearly visible by 2011). Today you can do far more with the social network and your own information: you can turn it into targeted information for other systems or groups of people. If 1,000 people along an otherwise free-flowing artery complain that they are stuck in traffic, it is probably true, and it may be possible to derive from the drivers' "social feeds" what is currently going wrong on that particular route.

And where is the cloud here? Underneath the "Facebook" platform itself, which runs in a data center built strictly along cloud elasticity principles, and in the attached data lakes, which are capable not only of ingesting such information at scale but also of analyzing it in correspondingly short time.

Which brings us to the topic of …

2: Data Transformation

On that topic – as already laid out in my whitepaper "The Next Big Thing" (which is a few months old by now) – here is what I consider the best definition of BigData to date:

"BigData summarizes the legal, social, technology, application and business dimension of the fact that through applications being consumed … a vast amount of information is generated and needs to be managed and efficiently used"

To simplify the principle and its implications, I have tried to show at a glance how data is to be handled in the future:

BigData Transformation – from ETL to ELT

Stop worrying at the outset about what you want to do with the data in the end; make sure you get hold of the data your business might need – in whatever form (whatever "format") it comes. A transformation for the purpose of intelligent linking and analysis can follow later. We didn't do it that way in the past because we simply could not have provided the computing power needed to consolidate and aggregate heterogeneously structured data in real time. And even without that capability, our DWH systems were at times unmaintainable behemoths, weren't they?

The cloud as the basis of a modern BigData architecture provides the elasticity and ad-hoc computing power whenever it is needed for analysis.

By the way, this by no means implies that you should not think at all about what to do with the painstakingly collected data – I actually believe this consideration is essential for using BigData strategically and efficiently; I merely believe it no longer has to come first. And it is cloud computing that makes this possible.
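To make the ELT idea a bit more tangible, here is a minimal sketch in Python – purely illustrative and not part of the talk; the file-based "data lake" and the aggregation by source are assumptions standing in for whatever store and analysis you actually use:

    import json

    def load_raw(record: dict, lake_path: str = "raw_events.jsonl") -> None:
        # Extract & Load: persist the record exactly as it arrives, regardless of shape or schema.
        with open(lake_path, "a", encoding="utf-8") as lake:
            lake.write(json.dumps(record) + "\n")

    def transform_on_demand(lake_path: str = "raw_events.jsonl") -> dict:
        # Transform: only when an analysis actually needs it - here a simple count per source.
        counts: dict = {}
        with open(lake_path, encoding="utf-8") as lake:
            for line in lake:
                event = json.loads(line)
                source = event.get("source", "unknown")
                counts[source] = counts.get(source, 0) + 1
        return counts

    load_raw({"source": "social-feed", "text": "stuck in traffic on the A23"})
    load_raw({"source": "sensor", "speed_kmh": 7})
    print(transform_on_demand())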

3: Mobile Transformation

This one is easy. The following two images suffice:

A very old mobile phone – some years maybe …

 

Mobile Payment – Mobile Banking

First a mobile phone that some of you may still remember (it is, let's say, about 10 years old, maybe a bit more) – then a "somewhat" newer one with an implanted debit card (possible in Austria since the beginning of June). Where is the cloud here? Underneath the AppStore for the required mobile apps, underneath the backend systems connecting the services, underneath the services themselves, which react elastically to user demand through container technology, …

So you see: cloud computing is not one "force" in a "nexus", but the base technology par excellence that makes the interplay of the other three forces in the 3rd-platform model possible in the first place.

And what this interplay makes possible is still best described by the term

"Digital Business"

I don't want to dwell on terminology for long here (the whitepaper mentioned above already covers a fair bit of it, and over time – as always with such fast-moving, innovative topics – myriads of "clever" people have busied themselves with what may or must be called what; just remember the much-quoted claim that "Industrie 4.0" may only be used in German because it is a German digitization initiative).

What matters is not the terminology but what it is actually about: a connection of people, systems, and devices (things, gadgets, …). Realizing what possibilities this connection – implemented intelligently – opens up for us is what sets digitization in motion in the first place: in our heads, in our innovations, and ultimately in our companies and in everyday life.

At the end of the whitepaper "The Next Big Thing" I tell a few little stories – scenarios illustrating what digital transformation makes conceivable. By now we are much further along than in those stories. Some of the sketched scenarios have become reality; some entirely new scenarios have emerged.

Connected Cars – is that a reality today?

BMW (2014) with connected-car information panel

Maybe not everywhere yet. But the picture shown here comes from a 2014 article. Today, all modern vehicles have an in-vehicle platform that makes it possible to feed information into the vehicle from the outside. Work is currently under way to make this capability usable for modern traffic.

Or SmartCity?

Santander is one of many cities around the globe where digitization and digital transformation have become reality. While in Amsterdam, for example, the "Beacon Mile" merely distributes a bit of information to passing smartphones, Santander has converted its entire city management – public transport, taxis, air quality, lighting, waste collection, and much more – to digitization paradigms in the sense described above (people – systems – devices). Here is a film that explains in more detail the disruptive change this initiative has brought to Santander:

And in the background, an infrastructure cloud is at work, enabling the integration of all these processes.

By the way, two examples from Vienna:

  1. The SmartCity strategy of the City of Vienna pursues exactly the same goals as Santander.
  2. And in Vienna's Neubau district, a company is working together with the city on a multi-mobility app, going live shortly, that lets you choose the best of several transport options for a given route. You can already use it today as a lab app – just download it from the AppStore; it is called "WienMobil LAB".

 

And where is the cloud now?

Cost of Broadband Subscription as a Percentage of Average Yearly Income, 2014

Four drivers are commonly cited for all these digital transformations:

  • Broadly available Internet connectivity (see the image above)
  • High acceptance of the mobile phone – or rather, the smartphone
  • Low cost of sensors
  • And computing power available at scale

Since the last of these (namely, computing power) is naturally provided through a cloud model, the question of the ubiquity of cloud computing pretty much answers itself, doesn't it?

Security & Privacy

Security – Privacy – Control – Multi Tenancy

Finally, a couple of words to everyone who perceives digital transformation as a (personal or business) threat: "Forget it!"

Why?

Well, let me answer that with a little story from the early days of cloud computing in Central Europe. In 2009 I was invited as a speaker to a conference at which Microsoft and Siemens jointly discussed the potential of cloud computing; back then, an otherwise brilliant Microsoft colleague carelessly confused multi-tenancy with data security and encryption. Without correcting the mistake, his talk slid into a discussion about the impossibility of using cloud in an industrial context, because it would supposedly be impossible to protect one's own corporate IP from the competition. This kind of discussion stayed with us for years at the relevant events. Did it stop the cloud? No.

If you like, we can go on to discuss the elements of

  • security
  • privacy
  • data control
  • and multi-tenancy

But please let us do so on an inclusive basis. Let us not rule out new opportunities, new services, and everyday conveniences because we fear a violation of the values above; instead, let us include those considerations in the possibilities opening up to us – and let us do so on the basis of a demand for greater transparency!

If I know what is done with my data, by whom and for what purpose, then I will gladly – and in a controlled way – make it available, because I gain an advantage from doing so: in my own processes, in my own professional and private everyday life!


Automation and Orchestration – a Conclusion

This post is part of the "Automation-Orchestration" architecture series. Posts of this series together comprise a whitepaper on Automation and Orchestration for Innovative IT-aaS Architectures.

 

Automation and Orchestration are core capabilities in any IT landscape.

Traditionally, there'd be classical on-premise IT, composed of multiple enterprise applications and (partly) based on old-style architecture patterns like file exchange, asynchronous time-boxed export/import scenarios, and historic file formats.

At the same time, the era of Cloud hype has come to an end in the sense that Cloud is now ubiquitous; it is as present as the Internet itself has been for years, and the descendants of Cloud – mobile, social, IoT – form the nexus for the new era of Digital Business.

For enterprises, this means an ever-increasing pace of innovation and a constant advance of business models and business processes. As this paper has outlined, automation and orchestration solutions form the core for IT landscapes to efficiently support businesses in their striving for constant innovation.

Let’s once again repeat the key findings of this paper:

  • Traditional "old style" integration capabilities – such as file transfer, object orientation, or audit readiness – remain key criteria even for a cloud-ready automation platform.
  • In an era where cloud has become a commodity, just like the internet itself, service-centered IT landscapes demand a maximum of scalability, adaptability, and multi-tenancy in order to create a service-oriented ecosystem for the advancement of the businesses using it.
  • Security, maximum availability, and centralized management and control are fundamental necessities for transforming an IT environment into an integrated service center supporting business expansion, transformation, and growth.
  • Service orchestration might be the ultimate goal to achieve for an IT landscape, but system orchestration is a first step towards creating an abstraction layer between basic IT systems and business-oriented IT-services.

Therefore, for IT leaders, choosing the right automation and orchestration solution to support the business efficiently may be the most crucial decision in either becoming a differentiator and true innovation leader or (just) remaining the head of a solid – yet commodity – enterprise IT.

The CIO of the future is a Chief Innovation (rather than "Information") Officer – and automation and orchestration together build the core basis for innovation. What to look at in order to reach the right make-or-buy decision was the main subject of this paper.

 


Managing Tenants in Automation

This post is part of the "Automation-Orchestration" architecture series. Posts of this series together comprise a whitepaper on Automation and Orchestration for Innovative IT-aaS Architectures.

 

Another decision one needs to make during the selection process is whether the automation platform needs to support multi-tenant and/or multi-client capability. How you choose can have a significant financial impact.

Multi-tenancy versus multi-client

Multi-tenancy is closely related to the Cloud. Although not strictly a cloud pattern, multi-tenancy has become one of the most discussed topics in application and service architectures due to the rise of Cloud Computing. That is because multi-tenancy ultimately enables some of the essential cloud characteristics, namely virtually endless scaling and resource pooling.

Multi-tenancy

Multi-tenancy partitions an application into virtual units. Each virtual unit serves one customer while being executed within a shared environment together with the other tenants' units. It does not interact or conflict with other virtual units, nor can a single virtual unit drain all resources from the shared environment in case of a malfunction (resource pooling, resource sharing).

Multi-client

In contrast, a multi-client system is able to split an application into logical environments by separating functionality, management, object storage, and permission layers. This enables setting up a server that allows logons by different users, with each user having their own working environment while sharing common resources – file system, CPU, memory. In such an environment, however, users can still impact each other's work.
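To illustrate the distinction, here is a generic sketch of the two concepts (not any specific vendor's data model): a tenant is an isolated virtual unit with its own resource limits, while clients only partition objects and permissions on top of shared resources.

    from dataclasses import dataclass, field

    @dataclass
    class Tenant:
        # one virtual unit per customer; resource limits are enforced per tenant,
        # so a malfunctioning tenant cannot drain the shared environment
        name: str
        cpu_quota_percent: int
        objects: dict = field(default_factory=dict)

    @dataclass
    class Client:
        # logical separation only: own objects and permissions ...
        name: str
        permissions: set = field(default_factory=set)
        objects: dict = field(default_factory=dict)

    @dataclass
    class SharedInstance:
        # ... but CPU, memory and file system are shared, so one client's load
        # can still affect the others
        clients: list = field(default_factory=list)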

Importance of multi-tenancy and multi-client capability

These concepts are critical because of the need to provide separate working environments, object stores, and automation flows for different customers or business lines. Therefore, one should be looking for an automation solution that supports this capability out of the box. In certain circumstances you may not require strict customer segregation or the ability to offer pooling and sharing of resources out of one single environment. This clear differentiation can become a cost-influencing factor in certain cases.

Independent units within one system

Whether your automation solution needs to be multi-tenant or not depends on the business case and usage scenario. Normally, in enterprise environments with major systems running on-premises, multi-tenancy is not a major requirement for an automation solution. Experience shows that even when automation systems are shared between multiple organizational units or automate multiple customers' IT landscapes in an outsourcing scenario, multi-tenancy isn't required, since all units and customers are managed through the central administration and architecture.

Multi-client capabilities, though, are indeed a necessity in an enterprise ready automation solution, as users of multiple different organizations want to work within the automation environment.

Multi-client capabilities would include the ability to:

  • Split a single automation solution instance into up to 1,000+ different logical units (clients)
  • Add clients on demand without downtime and without changing the underlying infrastructure
  • Segregate object permissions by client and enable user assignment to clients
  • Segregate automation objects and enable assignment to specific clients
  • Allow delegation of centrally implemented automation workflows simply by assigning them to specific clients (assuming the appropriate permissions have been set)
  • Re-use automation artifacts between clients (including clear and easy-to-use permission management)
  • Share resources across clients (but not necessarily with secure and scalable resource pooling across clients; see the differentiation above)

Segregation of duties

Having multiple clients within one automation solution instance enables servicing of multiple external as well as internal customers. This allows for quick adaptation to changing business needs. Each client can define separate automation templates, security regulations, and access to surrounding infrastructure. Having a simple transport/delegation mechanism between clients at hand also allows implementing a multi-staging concept for the automation solution.


Scalability in Automation

This post is part of the "Automation-Orchestration" architecture series. Posts of this series together comprise a whitepaper on Automation and Orchestration for Innovative IT-aaS Architectures.

 

Scalability criteria are often referred to as "load scalability", meaning the ability of a solution to automatically adapt to increasing and decreasing consumption or execution demands. While the need for this capability obviously holds true for an automation solution, there is more that belongs to the field of scalability. Hence, the following aspects will be discussed within this chapter:

  • load scalability
  • administrative scalability
  • functional scalability
  • throughput

Load scalability

Back at a time when cloud computing was still striving for the one and only definition of itself, one key cloud criterion became clear very quickly: Cloud should be defined mainly by virtually infinite resources that can be consumed by whatever means. In other words, cloud computing promised to transform the IT environment into a landscape of endlessly scalable services.

Today, whenever IT executives and decision makers consider the deployment of any new IT service, scalability is one of the major requirements. Scalability offers the ability for a distributed system to easily expand and contract its resource pool to accommodate heavier or lighter loads or number of inputs. It also determines the ease with which a system or component can be modified, added, or removed to accommodate changing load.

Most IT decisions today are based on the question of whether a solution can meet this definition. This is why, when deciding on an automation solution – especially when considering it to be the automation, orchestration, and deployment layer of your future (hybrid) IT framework – scalability should be a high-priority requirement. In order to determine a solution's load scalability capabilities, one must examine the system's architecture and how it handles the following key functions:

Expanding system resources

A scalable automation solution architecture allows adding and withdrawing resources seamlessly, on demand, and with no downtime of the core system.

The importance of a central management architecture will be discussed later in this paper; for now it is sufficient to understand that a centralized engine architecture derives its main load scalability characteristics from its technical process architecture.

As an inherent characteristic, such an architecture is composed of thoroughly separated worker processes, each having the same basic capabilities but – depending on the system's functionality – acting differently (i.e. each serving a different execution function). A worker process could – depending on technical automation requirements – be assigned to some of the following tasks (see the figure below for an overview of such an architecture):

  • worker process management (this would be a kind of "worker control process" capability that needs to be built out redundantly in order to allow for seamless handover in case of failure)
  • access point for log-on requests to the central engine (one of “Worker 1..n” below)
  • requests from a user interface instance (“UI worker process”)
  • workload automation synchronization (“Worker 1..n”)
  • Reporting (“Worker 1..n”)
  • Integrated authentication with remote identity directories (“Worker 1..n”)

At the same time, command and communication handling should be technically separated from automation execution handling and be spread across multiple “Command Processes” as well – all acting on and providing the same capabilities. This will keep the core system responsive and scalable in case of additional load.
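A minimal sketch of that "separation of concerns" idea – identical worker processes that differ only in the role they are assigned; the roles, the queue, and the scaling step are illustrative assumptions, not a description of any particular product:

    from multiprocessing import Process, Queue

    def worker(role: str, tasks: Queue) -> None:
        # every worker has the same basic capabilities; the assigned role decides what it serves
        # (logon requests, UI requests, workload synchronization, reporting, authentication, ...)
        while True:
            job = tasks.get()
            if job is None:          # sentinel: shut this worker down
                break
            print(f"[{role}] handling {job}")

    if __name__ == "__main__":
        tasks: Queue = Queue()
        workers = [Process(target=worker, args=(role, tasks)) for role in ("logon", "ui", "sync")]
        for p in workers:
            p.start()

        # scaling under additional load simply means starting another identical worker -
        # no downtime of the already running ones (command/communication handling would
        # live in separate processes of the same shape)
        extra = Process(target=worker, args=("reporting", tasks))
        extra.start()
        workers.append(extra)

        for job in ("job-1", "job-2", "job-3", "job-4"):
            tasks.put(job)
        for _ in workers:            # one sentinel per worker
            tasks.put(None)
        for p in workers:
            p.join()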

Scalable and highly available automation architecture

The architecture described in the figure above provides its core differentiation in the following load change scenarios:

Dynamically adding physical/virtual servers to alter CPU, memory and disk space or increasing storage for workload peaks without downtime

With the above architecture, changing load simply means adding or withdrawing physical (virtualized/cloud-based) resources to or from the infrastructure running the core automation system. With processes acting redundantly on the "separation of concerns" principle, it is possible to either provide more resources to running jobs or add jobs to the core engine (even when it physically runs on a distributed infrastructure).

This should take place without downtime of the core automation system, ensuring not only rapid reaction to load changes but also high resource availability (to be discussed in a later chapter).

Change the number of concurrently connected endpoints to one instance of the system

At any time during system uptime it might become necessary to handle load requirements by increasing the number of system automation endpoints (such as agents) connected to one specific job instance. This is possible only if the concurrently acting processes are made aware of changing endpoint connections and are able to re-distribute load among running jobs seamlessly and without downtime. The architecture described above allows for such scenarios, whereas a separated, less integrated core engine would demand reconfiguration once the number of added endpoints exceeds a certain limit.

Endpoint reconnection following an outage

Even if the solution meets the criteria of maximum availability, outages may occur. A load scalable architecture is a key consideration when it comes to disaster recovery. This involves the concurrent boot-up of a significant number of remote systems including their respective automation endpoints. The automation solution therefore must allow for concurrent re-connection of several thousand automation endpoints within minutes of an outage in order to resume normal operations.
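The usual way to make such mass re-connections survivable – not taken from the paper, just a generic sketch – is jittered exponential backoff on the endpoint side, so that thousands of agents do not retry in lock-step against the freshly recovered engine:

    import random
    import time

    def reconnect(connect, max_attempts: int = 8, base_delay: float = 1.0, cap: float = 60.0) -> bool:
        # "connect" is any callable returning True once the endpoint is registered again
        for attempt in range(max_attempts):
            if connect():
                return True
            # "full jitter": wait a random fraction of an exponentially growing window
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
        return False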

Administrative scalability

While load scalability is the most commonly discussed topic when it comes to key IT decisions, there are other scalability criteria to be considered as differentiators when deciding on an appropriate automation solution. One is "Administrative Scalability", defined as the ability for an increasing number of organizations or users to easily share a single distributed system.[1]

Organizational unit support

A system is considered administratively scalable when it is capable of:

  • Logically separating organizational units within a single system. This capability is generally understood as “multi-client” or “multi-tenancy”.
  • Providing one central administration interface (UI + API) for system maintenance and onboarding of new organizations and/or users.

Endpoint connection from different network segments

Another aspect of administrative scalability is the ability of an automation solution to seamlessly connect endpoints from various organizational sources.

In large enterprises, multiple clients (customers) might be separated organizationally and network-wise. Organizational units are well hidden from each other or are routed through gateways when they need to connect. The automation solution, however, is normally part of the core IT system serving several or all of these clients. Hence, it must allow for connectivity between processes and endpoints across the aforementioned separated sources. The established secure client network delineation must be kept in place, of course. One approach is for the automation solution to provide special dedicated routing (end)points capable of bridging the network separation via standard gateways and routers while supporting only the automation solution's connectivity and protocol needs.

Seamless automation expansion for newly added resources

While the previously mentioned selection criteria for automation systems are based on "segregation," another key decision criterion is based on harmonization and standardization.

An automation system can be considered administratively scalable when it is capable of executing the same, one-time-defined automation process on different endpoints within segregated environments.

The solution must be able to:

  • Add an organization and its users and systems seamlessly from any segregated network source.
  • Provide a dedicated management UI including those capabilities that is securely accessible by organization admin users only.

and at the same time

  • Define the core basic automation process only once and distribute it to new endpoints based on (organization-based) filter criteria.

The architecture thereby allows for unified process execution (implement once, serve all), administrative scalability and efficient automation.
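A toy sketch of the "implement once, serve all" principle described above – the endpoint attributes and the organization filter are assumptions for illustration only:

    from dataclasses import dataclass

    @dataclass
    class Endpoint:
        name: str
        organization: str
        network_zone: str

    def distribute(process_definition: str, endpoints: list, org_filter: str) -> dict:
        # the process is defined exactly once; only the target selection differs per organization
        return {e.name: process_definition for e in endpoints if e.organization == org_filter}

    endpoints = [
        Endpoint("srv-a01", "org-A", "dmz"),
        Endpoint("srv-b01", "org-B", "internal"),
    ]
    print(distribute("nightly-backup-workflow", endpoints, "org-A"))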

Functional scalability

Functional scalability, defined as the ability to enhance the system by adding new functionality at minimal effort[2], is another type of scalability characteristic that should be included in the decision-making process.

The following are key components of functional scalability:

Enhance functionality through customized solutions

Once the basic processes are automated, IT operations staff can add significant value to the business by incorporating other dedicated IT systems into the automation landscape. Solution architects are faced with a multitude of different applications, services, interfaces, and integration demands that can benefit from automation.

A functionally scalable automation solution supports these scenarios out-of-the-box with the ability to:

  • Introduce new business logic to existing automation workflows or activities by means of simple and easy-to-adopt mechanisms without impacting existing automation or target system functions.
  • Allow for creation of interfaces to core automation workflows (through use of well-structured APIs) in order to ease integration with external applications.
  • Add and use parameters and/or conditional programming/scripting to adapt the behavior of existing base automation functions without changing the function itself (see the sketch after this list).
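A minimal sketch of that last point – the base function stays untouched, and behavior is adapted purely through parameters (function and parameter names are made up for the example):

    def base_backup(target: str, retention_days: int = 7, compress: bool = True) -> str:
        # the one-time-implemented base automation function
        mode = "compressed" if compress else "plain"
        return f"backup of {target}, kept for {retention_days} days, {mode}"

    # different behavior per client or environment purely via parameters - no change to base_backup
    print(base_backup("db-prod", retention_days=30))
    print(base_backup("db-test", retention_days=3, compress=False))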

Template-based implementation and template transfer

A functionally scalable architecture also enables the use of templates for implementing functionality and sharing/distributing it accordingly.

Once templates have been established, the automation solution should provide a way to transfer these templates between systems or clients. This could be supported either through scripting or through solution packaging. An additional value-add (if available): sharing ready-tested implementations within an automation community.

Typical use-cases include but are not limited to:

  • Development-test-production deployment of automation packages.
  • Multi-client scenarios with well-segregated client areas with similar baseline functionality.

Customizable template libraries with graphical modeler

In today's object- and service-based IT landscapes, products that rely on scripting alone are simply not considered functionally scalable. When using parameterization, conditional programming, and/or object reuse through scripting only, scaling the automation implementation becomes time-consuming, complex, and unsustainable. Today's functionally scalable solutions use graphical modelers to create the object instances, workflows, templates, and packages that enable business efficiency and rapid adaptation to changing business requirements.

Throughput

Finally, consider the following question as a decision basis for selecting a highly innovative, cloud-ready, scalable automation solution:

What is the minimum and maximum workflow load the solution can execute without architectural change? If the answer is "100 up to 4-5 million concurrent jobs per day without changing the principal setup or architecture", you're good to go.

In other words: scalable automation architectures not only support the aforementioned key scalability criteria but are also able to handle a relatively small number of concurrent flows just as well as a very large one. The infrastructure footprint of the deployed automation solution must obviously adapt accordingly.

 

 


Scaling: Where to?

Pushed by some friends – and at a fortunately discounted price – I finally gave in and took an AWS exam; so call me certified now. Just so.

Solutions-Architect-Associate

However, this has nothing to do with the matter in discussion here: While waiting for the exam, I talked to some fellow cloud geeks and was surprised, once again, by them confusing up and down with out and in – as experienced so many times before; so: this is about "Scaling"!

And it’s really simple – here’s the boring piece:

The Principle

Both scaling patterns (up/down and out/in) essentially serve the same purpose and act by the same principle: upon explicit demand or implicit recognition, the amount of compute/memory resources is increased or decreased. Whether this is done

  • proactively or reactively (see guidelines above)
  • automatically
  • or manually through a portal by authorized personnel

is subject to the framework implementation possibilities and respective architecture decisions.

What’s up and down?

For the scale-up/-down pattern, the hypervisor (or IaaS service layer) managing cloud compute resources has to provide the ability to dynamically (ideally without an outage, though that is not guaranteed) increase or decrease the compute and memory resources of a single machine instance.

The trigger for scaling execution can either be implemented within a dedicated cloud automation engine, within the Scalability building block or as part of a self-service portal command, depending on the intended flexibility.

What’s in and out?

The same principles apply as for the scale-up/-down pattern; on scaling out, however, an additional instance is created for the same service. This may involve one of the following alternatives:

  • Create an instance and re-configure the service to additionally route requests to the new instance
  • Create an instance, re-configure the service to route requests to the newly created instance only, and de-provision an existing instance with lower capacity accordingly

Both cases may require (automated) load balancer reconfiguration and an application capable of dealing with changing server resources.

Conversely, scale-in means de-provisioning instances once load parameters have decreased sufficiently. The application on top has to be able to deal with dynamically de-provisioned server instances. In a scenario where the application's data layer is involved in the scaling process (i.e. a DBMS is part of the server to be de-provisioned), the application must take measures to reliably persist data before the respective resource is shut down.
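For a concrete (if hedged) illustration against AWS with boto3: scale-up changes the size of a single instance, scale-out/in changes the number of instances behind an Auto Scaling group; the instance ID, the target instance type, and the group name below are placeholders:

    import boto3

    ec2 = boto3.client("ec2")
    autoscaling = boto3.client("autoscaling")

    def scale_up(instance_id: str, new_type: str = "m5.xlarge") -> None:
        # resizing an EC2 instance requires a stop/start cycle - exactly the
        # "ideally without an outage, but not guaranteed" caveat from above
        ec2.stop_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
        ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})
        ec2.start_instances(InstanceIds=[instance_id])

    def scale_out(asg_name: str, desired: int) -> None:
        # adding or removing instances; the load balancer attached to the group
        # picks new instances up, and the application must cope with the change
        autoscaling.set_desired_capacity(AutoScalingGroupName=asg_name, DesiredCapacity=desired)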

And now – for the funnier part

It occurred to me that two silly graphics could make the difference easier to remember, so I invite you all to think of a memory IC as a little bug climbing up a ladder, and in turn of computers bursting out of a data center. Does that make it easier to distinguish the two patterns?

Scaling up and down

Scaling out and in

 

You think you don’t need to care?

Well – as a SaaS consumer you're right: as long as your tenant scales and responds accurately to any performance demand – no worries. But as soon as things deviate from that, you're in trouble finding the right application architecture if it is unclear whether you have to scale resources or instances. So — remember the bug and the fleeing servers 🙂

 


What is “trending” – anyway?

Source: Gartner (August 2015)

The "Hype Cycle for Emerging Technologies" – every year's eagerly awaited Gartner report about what's trending in IT – has been out for a few weeks now. Time to pore over it and analyze the most important messages:

1. Evolution

Gartner continues to categorize technologies on the Hype Cycle by their model of "business eras" (see my post about last year's Hype Cycle for more details on that). The technologies analyzed for this year's report are said to belong to the last three stages of this model: "Digital Marketing", "Digital Business", and "Autonomous". Little has changed among the most important technologies supporting these stages:

  • “Internet of Things” is still on its peak
  • "Wearable User Interfaces" has obviously been replaced by just the term "Wearables" (which makes total sense)
  • "Speech-to-Speech Translation" has advanced beyond its peak
  • "Autonomous Vehicles" is probably the most-hyped area around Digital Business at the moment

2. Revolution

However, there is a significant change in the world of technologies to be seen this year: while the plateau of productivity was pretty crowded last year with all sorts of 3D, Analytics, and Social stuff (like streams, e.g.), this year's Hype Cycle doesn't show much in that area. Which proves nothing less than that we are living in an era of major disruption. Formerly hyped technologies like "Cloud" have vanished from the graph – they have become a commodity. New stuff like all-things-digital, "Cryptocurrencies", or "Machine Learning" is still far from any maturity. So, it's a great time for re-shaping IT – let's go for it!

Still, besides that, there remain some questions:

  • Why is "Hybrid Cloud" not moving forward, while "Cloud" is long gone from the Hype Cycle and CIOs are mainly – according to experience with my customers – looking to adopt cloud in a hybrid way? Is there still too little on offer from the vendors? Are IT architects still not able to consume hybrid cloud models in a sufficiently significant way? Personally, I suspect "Hybrid" has advanced further towards productivity than is claimed here; it's just not talked about that much.
  • Why has Gartner quietly dropped "Software Defined Anything" (it was seen on the rise last year)? All that can be found on this year's Hype Cycle is "Software-Defined Security". While I agree that, in low-level infrastructure design, the trend of software-defining components co-addresses important aspects of security, "Software Defined Anything" has a much broader reach into how IT will be changed in the next couple of years by programmers of all kinds and languages of many sorts.
  • "IoT Platforms" has been newly introduced. With a 5-10 year adoption time? Really? Gartner, I know businesses working on that right now; I know vendors shaping their portfolios in this direction at an awesome pace. I thoroughly doubt this timeframe.

3. and More

What is really important about this year's Hype Cycle, though, is the concentration of technologies that address "biology" in some sense. Look at the rising edge of the graph and collect what's hyped there. We have:

  • Brain Computer Interface
  • Human Augmentation
  • 3D Bioprinting Systems
  • Biochips
  • or Bioacoustic Sensing

Not to mention "Smart Robots" and "Connected Homes" … Technologies like these will shape our future life. And it cannot be overestimated how drastically this change will affect us all – even if many of these technologies are still seen with a 5-10 year adoption time until they reach production maturity (however, it wouldn't be the first time that a timeframe on the Hype Cycle needs revision after a year of increased insight).

 

While reading a lot of comments on the Hype Cycle these days, I also came across "the five most over-hyped technologies" on venturebeat.com: the author, Chris O'Brien, takes a humorous view of some of the "peaked" technologies on the graph (autonomous vehicles, self-service analytics, IoT, speech-to-speech translation, and machine learning) – and shares a couple of really useful arguments on why the respective technologies will not be adopted that fast.

I can agree with most of O'Brien's arguments – however: while some of the things-based stuff being invented might be of limited applicability or use (connected forks? huh?), the overall meaningfulness of what "Digital Business" will bring to us all is beyond doubt. The question – as so often before – is not whether we will use all the new stuff to come, but whether we will be educated enough to use it to our benefit … ?

If you got questions and opinions of your own on that – or if you can answer some of my questions above – please, drop a comment! 🙂

The input for this post, Gartner's "2015 Hype Cycle for Emerging Technologies", is published in the Gartner Newsroom.


Evaluation Report – Monitoring Comparison: newRelic vs. Ruxit

I've worked on cloud computing frameworks with a couple of companies by now. DevOps-like processes are always an issue in these cooperations – even more so when it comes to monitoring and how to approach the matter innovatively.

As an example, I keep emphasizing Netflix's approach in these conversations: I very much like Netflix's philosophy of how to deploy, operate, and continuously change environments and services. Netflix's component teams have no clue about the activities of other component teams; their policy is that every team is responsible for ensuring its own changes don't break anything in the overall system. Also, no one really knows in detail which servers, instances, and services are up and running to serve requests. Servers and services are constantly and automatically re-instantiated, rebooted, added, removed, etc. That is a philosophy that makes DevOps real.

Clearly, traditional (SLA-fulfilment-oriented) methods must fail when monitoring such a landscape. It simply isn't sufficient for a cloud-aware, continuous-delivery-oriented monitoring system to just integrate traditional on-premise monitoring solutions such as Nagios with, say, AWS CloudWatch. We know that this works fine, but it does not ease the cumbersome work of NOCs or application operators in quickly identifying

  1. the impact of a certain alert, hence its priority for ongoing operations and
  2. the root cause for a possible error

After discussing these facts for the umpteenth time and (again) being confronted with the same old arguments about the importance of ubiquitous information on every single event within a system (for the sake of proving SLA compliance), I thought I'd give it a try and dig deeper myself to find out whether these arguments are valid (and I am therefore wrong) or whether there is a way to substantially reduce event noise and let IT personnel follow up only on the really important stuff. Efficiently.

At this stage, it is time for a little

DISCLAIMER: I am not a monitoring or APM expert; neither am I a .NET programming expert. Both skill areas are fairly familiar to me, but in this case I intentionally approached the matter from a business perspective – with as little technical depth as possible.

The Preps

In autumn last year I had the chance to get a little insight into two pure-SaaS monitoring products: Ruxit and newRelic. Ruxit back then was – well – a baby: early beta, no real functionality, but a well-received glimpse of what the guys were up to. newRelic was already pretty strong, and I very much liked their light and quick way of getting started.

As that project got stuck back then and I ended my evaluations halfway through, I thought getting back to it could be a good starting point (especially as I wasn't able to find any other monitoring product going the SaaS path that radically, i.e. not even thinking of offering an on-premise option; and as a cloud "aficionado" I was very keen on seeing a full-stack SaaS approach). So the product scope was set pretty quickly.

This time, the investigation was meant to answer questions in a somewhat more structured way:

  1. How easy is it to kick off monitoring within one system?
  2. How easy is it to combine multiple systems (on-premise and cloud) within one easy-to-digest overview?
  3. What’s alerted and why?
  4. What steps are needed in order to add APM to a system already monitored?
  5. Correlation of events and its appearance?
  6. The “need to know” principle: Impact versus alert appearance?

The setup I used was fairly simple (and reduced – as I didn't want to burden our customers' workloads in any of their datacenters): I had an old t1.micro instance still lurking around on my AWS account; that is 1 vCPU with 613 MB RAM – far too small to really perform with the stuff I wanted it to do. I intentionally decided to use that one for my tests. Later, the following was added to the overall setup:

  • An RDS SQL Server database (which I used for the application I wanted to add to the environment at a later stage)
  • IIS 6 (as available within the Server image that my EC2 instance is using)
  • .NET framework 4
  • Some .NET sample application (some “Contoso” app; deployed directly from within Visual Studio – no changes to the defaults)

Immediate Observations

Two things caught my eye only hours (if not minutes) after commencing my activities in newRelic and Ruxit, but let's start with the basics first.

Setting up accounts is easy and straightforward in both systems. They both truly follow the cloud-affine "on-demand" characteristic. newRelic creates a free "Pro" trial account which is converted into a lifetime free account if not upgraded to "paid" after 14 days. Ruxit sets up a free account for their product but takes a totally different approach – more closely resembling consumption-based pricing: you get 1,000 hours of APM and 50k user visits for free.

Both systems follow pretty much the same path after an account has been created:

  • In the best case, access your account from within the system you want to monitor (or deploy the downloaded installer package – see below – to the target system manually)
  • Download the appropriate monitoring agent and run the installer. Done.

Both agents started to collect data immediately and the browser-based dashboards produced the first overview of my system within some minutes.

As a second step, I also installed the agents on my local client machine, as I wanted to know how the dashboards display multiple systems – and here's a bummer with Ruxit: my antivirus scanner alerted me with a Win32.Evo-Gen suspicion:

Avast virus alert upon Ruxit agent install

It wasn’t really a problem for the agent to install and operate properly and produce data; it was just a little confusing. In essence, the reason for this is fairly obvious: The agent is using a technique which is comparable to typical virus intrusion patterns, i.e. sticking its fingers deep into the system.

The second observation was newRelic's approach to implementing remote web browser checks, called "Synthetics". It was indeed astonishingly easy to add a URL to the system and let newRelic do their thing – seemingly from within the AWS datacenters around the world. And especially with this, newRelic has a very compelling way of displaying the respective information on their Synthetics dashboard. Easy to digest and pretty comprehensive.

At the time when I started off with my evaluation, Ruxit didn't offer that. Meanwhile they have added their beta for "Web Checks" to my account. Equally easy to set up, but lacking some of the richer UI features with respect to displaying the information. I am fairly sure that this will be added soon. Hopefully. My take is that combining system monitoring or APM with insights into real user usage patterns is essential for correlating events efficiently.

Security

I always give security questions a second thought, hence I contemplated Ruxit's way of making sure that an agent really connects to the right tenant when being installed. With newRelic you're confronted with an extra step upon installation: they ask you to copy and paste a security key from your account page during the install procedure.

newRelic security key example

Ruxit doesn't do that. However, they're not really less secure; it's just that they pre-embed this key into the installer package that is downloaded, so they're just a little more convenient. The following shows the msiexec command executed upon installation as well as its parameters taken from the installer log (you can easily find that information after the .exe package unpacks into the system's temp folder):

@msiexec /i "%i_msi_dir%\%i_msi%" /L*v %install_log_file% SERVER="%i_server%" PROCESSHOOKING="%i_hooking%" TENANT="%i_tenant%" TENANT_TOKEN="%i_token%" %1 %2 %3 %4 %5 %6 %7 %8 %9 >con:
MSI (c) (5C:74) [13:35:21:458]: Command Line: SERVER=https://qvp18043.live.ruxit.com:443 PROCESSHOOKING=1 TENANT=qvp18043 TENANT_TOKEN=ABCdefGHI4JKLM5n CURRENTDIRECTORY=C:\Users\thome\Downloads CLIENTUILEVEL=0 CLIENTPROCESSID=43100

Alerting

After having applied the packages to my Windows Server on EC2, things popped up quickly within the dashboards (note that both dashboard screenshots are from a later evaluation stage; however, the basic layout was the very same at the beginning – I didn't change anything visually along the way).

newRelic server monitoring dashboard showing the limits of my too-small instance 🙂

The Ruxit dashboard on the same server; with a clear hint on a memory problem 🙂

What instantly struck me here was the simplicity of Ruxit's server monitoring information. It seemed sort of "thin" on information (if you want a whole lot of info right from the start, you will probably prefer newRelic's dashboard). Things changed, though, when my server went into memory saturation (which it constantly does right away when accessed via RDP). At that stage, newRelic started firing emails alerting me of the problem. Also, the dashboard went red. Ruxit, in turn, did nothing really. Well, of course, it displayed the problem once I was logged into the dashboard again and had a look at my server's monitoring data; but no alert triggered, no email, no red flag. Nothing.

If you’re into SLA fulfilment, then that is precisely the moment to become concerned. On second thought, however, I figured that actually no one was really bothered by the problem. There was no real user interaction going on in that server instance. I hadn’t even added an app really. Hence: why bother?

So, the next step was to figure out why newRelic went so crazy about that. It turned out that with newRelic every newly added server gets assigned to a default server policy.

newRelic's monitoring policy configuration

I could turn off that policy easily (editing also seems straightforward; I didn't try). However, the thought that for every server I add I would first have to figure out which alerts are important because they might impact someone or something seemed less "need to know" than I intended to have.

After having switched off the policy, newRelic went silent.

BTW, alerting via email is not set up by default in Ruxit; within the tenant's settings area, it can be added as a so-called "Integration" point.

AWS Monitoring

As said above, I was keen to know how both systems integrate multiple monitoring sources into their overviews. My idea was to add my AWS tenant to be monitored (this resulted from the previously mentioned customer conversations I had had earlier; that customer’s utmost concern was to add AWS to their monitoring overview – which in their case was Nagios, as said).

A nice thing about Ruxit is that they fill their dashboard with little demo tiles, which easily lead you into their capabilities without having set up anything yet (the example below shows the database demo tile).

This is one of the demo tiles in Ruxit's dashboard – leading to DB monitoring in this case

I found an AWS demo tile (similar to the example above), clicked, and ended up with a brief explanation of how to add an AWS environment to my monitoring ecosystem (https://help.ruxit.com/pages/viewpage.action?pageId=9994248). They offer key-based or role-based access to your AWS tenant. Basically, what they need you to do is these 3 steps (a sketch of the key-based variant follows the list):

  1. Create either a role or a user (for use of access key based connection)
  2. Apply the respective AWS policy to that role/user
  3. Create a new cloud monitoring instance within Ruxit and connect it to that newly created AWS resource from step 1
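As a hedged sketch (using boto3) of the key-based variant of steps 1 and 2 – the user name is made up, and ReadOnlyAccess is only a stand-in for whatever policy the Ruxit help page actually prescribes:

    import boto3

    iam = boto3.client("iam")
    iam.create_user(UserName="ruxit-monitoring")
    iam.attach_user_policy(
        UserName="ruxit-monitoring",
        PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
    )
    keys = iam.create_access_key(UserName="ruxit-monitoring")
    # the AccessKeyId / SecretAccessKey pair is what gets entered in step 3
    print(keys["AccessKey"]["AccessKeyId"])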

Right after executing the steps, the aforementioned demo tile switched to displaying real data and my AWS resources showed up (note that the example below already contains RDS, which I added at a later stage; the cool thing here was that it was picked up fully unattended as soon as I had created it in AWS).

Ruxit AWS monitoring overview

Ruxit essentially monitors everything within AWS that you can put a CloudWatch metric on – which is quite a lot, indeed.

So, the next step clearly was to look for the same capability within newRelic. As far as I could work out, newRelic's approach here is to offer plugins – and newRelic's plugin ecosystem is vast. That means there is a whole lot of possibilities for integrating monitoring into the respective IT landscape (whatever it may be); however, one may consider the process of adding plugin after plugin (until the whole landscape is covered) a bit cumbersome. Here's a list of AWS plugins for newRelic:

newRelic plugins for AWS

Add APM

Adding APM to my monitoring ecosystem was probably the most interesting experience in this whole test. As preparation for the intended result (i.e. analyzing data about a web application's performance under real user interaction), I added IIS to my server and an RDS database to my AWS account (as mentioned before).

The more interesting fact, though, was that after having finalized the IIS installation, Ruxit instantly showed the IIS services in their “Smartscape” view (more on that a little later). I didn’t have to change anything in my Ruxit environment.

newRelic’s approach is a little different here. The below screenshot shows their APM start page with .NET selected.

newRelic APM start page with .NET selected

After having confirmed each selection which popped up step by step, I was presented with a download link for another agent package which I had to apply to my server.

The interesting thing, though, was that still nothing showed up. No services or additional information on any accessible apps. That is logical in a way, as I did not yet have anything published on that server that really resembled an application. The only thing accessible from the outside was the IIS default website (just showing the IIS logo).

So, essentially the difference here is that with newRelic you get system monitoring with a system monitoring agent, and by means of an application monitoring agent you can add monitoring of precisely the type of application the agent is intended for.

I haven't dug further yet (that may be the subject of another article), but it seems that with Ruxit I get monitoring for anything going on on a server by means of just one install package (maybe one more explanation for the aforementioned virus scan alert).

However, after having published my .NET application, everything was fine again in both systems – and the dashboards went red instantly as the server went into CPU saturation due to its weakness (as intended ;)).

Smartscape – Overview

So, final question to answer was: What do the dashboards show and how do they ease (root cause) analysis?

As soon as the app was up and running and web requests started to role in, newRelic displayed everything to know about the application’s performance. Particularly nice is the out-of-the-box combination of APM data with browser request data within the first and the second menu item (either switch between the 2 by clicking the menu or use the links within the diagrams displayed).

newRelic APM dashboard

The difficulty with newRelic was discovering the essence of the web application's problem. Transactions and front-end code performance were displayed in every detail, but I knew (from my configuration) that the problem of slow page loads – as displayed – lay in the general weakness of my web server.

And that is basically where Ruxit’s smartscape tile in their dashboard made the essential difference. The below screenshot shows a problem within my web application as initially displayed in Ruxit’s smartscape view:

Ruxit’s smartscape view showing a problem in my application

From this view, it was obvious that the problem was either within the application itself or within the server as such. A click on the server not only reveals the path to the dependent web application but also other possibly impacted services (obviously without end user impact, as otherwise there would be an alert on them, too).

Ruxit smartscape with dependencies between servers, services, apps

And digging into the server’s details revealed the problem (CPU saturation, unsurprisingly).

Ruxit revealing CPU saturation as a root cause

Still, the number of dashboard alerts was pretty small. While I had six emails from newRelic telling me about the problem on that server, I had only two within Ruxit: one telling me about the web app's weak response and another about CPU saturation.

The next step, hence, would be to scale up the server (in my environment), or to scale out or implement an enhanced application architecture (in a realistic production scenario). But that's another story …
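(Just to sketch the scale-up variant for an EC2-hosted server before leaving that story: the snippet below is a hypothetical boto3 example – instance ID and target instance type are placeholder assumptions, and the instance needs to be stopped before its type can be changed.)

```python
# Hypothetical scale-up of an EC2 instance with boto3: stop, change type, start.
# Instance ID, region and target instance type are placeholder assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
instance_id = "i-0123456789abcdef0"   # placeholder

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(InstanceId=instance_id,
                              InstanceType={"Value": "m4.large"})  # bigger size

ec2.start_instances(InstanceIds=[instance_id])
```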

Bottom line

Event correlation and alerting on a “need to know” basis – at least for me – remains the right way to go.

This little test was done with just one server, one database, one web application (and a few other services). While newRelic's comprehensive approach to showing information is really compelling and perfectly serves the objective of complete SLA compliance reporting, Ruxit's "need to know" principle comes much closer to what I would expect from innovative cloud monitoring.

Considering Netflix's philosophy from the beginning of this article, innovative cloud monitoring basically translates into: Every extra step is a burden. Every piece of extra information about events without impact means extra Ops effort. And every extra click needed to correlate different events to a probable common root cause critically lengthens MTTR.

A "need to know" monitoring approach that at the same time offers full stack visibility of correlated events is – for me – one step closer to comprehensive Cloud-ready monitoring and DevOps.

And Ruxit really seems to be “spot on” in that respect!

 


DevOps style performance monitoring for .NET

 

{{ this article has originally been published in DevOps.com }}

 

Recently I began looking for an application performance management solution for .NET. My requirements are code level visibility, end to end request tracing, and infrastructure monitoring in a DevOps production setup.

DotTrace is clearly the most well-known tool for code level visibility in development setups, but it can’t be used in a 24×7 production setup. DotTrace also doesn’t do typical Ops monitoring.

Unfortunately a Google search didn’t return much in terms of a tool comparison for .NET production monitoring. So I decided to do some research on my own. Following is a short list of well-known tools in the APM space that support .NET. My focus is on finding an end-to-end solution and profiler-like visibility into transactions.

New Relic was the first to do APM SaaS, focused squarely on production with a complete offering. New Relic offers web request monitoring for .NET, Java, and more. It automatically shows a component-based breakdown of the most important requests. The breakdown is fairly intuitive to use and goes down to the SQL level. Code level visibility, at least for .NET, is achieved by manually starting and stopping sampling. This is fine for analyzing currently running applications, but makes analysis of past problems a challenge. New Relic's main advantages are its ease of use, intuitive UI, and a feature set that can help you quickly identify simple issues. Depth is the main weakness of New Relic. As soon as you try to dig deeper into the data, you're stuck. This might be a minor point, but if you're used to working with a profiler, you'll miss a CPU breakdown, as New Relic only shows response times.

net-1-newrelic

Dynatrace is the vendor that started the APM revolution and is definitely the strongest horse in this race. Its feature set in terms of .NET is the most complete, offering code level monitoring (including CPU and wait times), end to end tracing, and user experience monitoring. As far as I can determine, it’s the only tool with a memory profiler for .NET and it also features IIS web request insight. It supports the entire application life cycle from development environments, to load testing, to production. As such it’s nearly perfect for DevOps. Due to its pricing structure and architecture it’s targeted more at the mid to enterprise markets. In terms of ease of use it’s catching up to competition with a new Web UI. It’s rather light on infrastructure monitoring on its own, but shows additional strength with optional Dynatrace synthetic and network monitoring components.

net-2-dynatrace

Ruxit is a new SaaS solution built by Dynatrace. It's unique in that it unites application performance management and real user monitoring with infrastructure, cloud, and network monitoring in a single product. It is by far the easiest to install – it literally takes 2 minutes. It features full end to end tracing, code level visibility down to the method level, SQL visibility, and RUM for .NET, Java, and other languages, with insight into IIS and Apache. Apart from this, it has an analytics engine that delivers both technical and user experience insights. Its main advantages are its ease of use, web UI, fully automated root cause analysis, and, frankly, amazing breadth. Its flexible consumption-based pricing scales from startups, cloud natives, and mid markets up to large web scale deployments of tens of thousands of servers.

net-3-ruxit

AppNeta's TraceView takes a different approach to application performance management. It does support tracing across most major languages, including database statements, and of course .NET. It visualizes things in charts and scatter plots. Even traces across multiple layers and applications are visualized in graphs. This has its advantages but takes some time to get used to. Unfortunately, while TraceView does support .NET, it does not yet have code level visibility for it. This makes sense for AppNeta, which as a whole is more focused on large scale monitoring and has more of a network centric background. For DevOps in .NET environments, however, it's a bit lacking.

net-4-TraceView

Foglight, originally owned by Quest and now owned by Dell, is a well-known application performance management solution. It is clearly meant for operations monitoring and tracks all web requests. It integrates infrastructure and application monitoring, end to end tracing, and code level visibility for .NET, among other things. It has the required depth, but it's rather complex to set up and, as far as I could tell from my experience, generates alert storms. It takes a while to configure and get the data you need. Once properly set up, though, you get a lot of insight into your .NET application. In a fast moving DevOps scenario, however, it might take too long to manually adapt to infrastructure changes.

net-5-foglight

AppDynamics is well known in the APM space. Its offering is quite complete and it features .NET monitoring, quite nice transaction flow tracing, user experience, and code level profiling capabilities. It is production capable, though code level visibility may be limited here to reduce overhead. Apart from these features, though, AppDynamics has some weaknesses, mainly the lack of IIS request visibility and the fact that it only features wall clock time with no CPU breakdown. Its Flash-based web UI and rather cumbersome agent configuration can also be counted as negatives. Compared to others it's also lacking in terms of infrastructure monitoring. Its pricing structure definitely targets the mid market.

net-6-AppDynamics

Manage Engine has traditionally focused on IT monitoring, but in recent years they added end user and application performance monitoring to their portfolio, called APM Insight. Manage Engine does give you metric level insight into .NET applications and transaction trace snapshots which give you code level stack traces and database interactions. However, it's apparent that Manage Engine is a monitoring tool, and APM Insight doesn't provide the level of depth one might be accustomed to from other APM tools and profilers.

net-7-ME

JenniferSoft is a monitoring solution that provides nice real-time dashboarding and gives an overview of the topology of your environment. It enables users to see deviations in the speed of transactions with real-time scatter charts and analysis of transactions. It provides "profiling" for IIS/.NET transactions, but only on single tiers, and has no transaction tracing. Their strong suit is clearly cool dashboarding, but not necessarily analytics. For example, they are the only vendor that features 3D animated dashboards.

net-8-JenniferSoft

Conclusion: There's more buzz in the APM space than a Google search would reveal at first sight, and I did actually discover some cool vendors that target my needs; however, the field thins out pretty quickly when you dig for end-to-end visibility from code down to infrastructure, including RUM, web service requests and deep SQL insights. And if you want to pair that with a nice, fluent, easy-to-use web UI and efficient analytics, there aren't actually many left …


Synced – but where?

We had eventually set up our Office 365 (O365) tenant for email (read about that e.g. in the "Autodiscover" post) and, of course, wanted to leverage SharePoint as well. My-SharePoint, too. And the "OneDrive for Business" sync client (ODB) … what else.

This wasn't accomplished without some further ado, though …

Setup

is very straightforward, indeed. Go to your "OneDrive" in the O365 portal and click the "sync" link at the top of the page:

O365 Sync Link, displayed

Presuming you have the Office on-premise applications installed on your PC, items will quickly start showing up in your "OneDrive for Business" folder within the "Favorites" area of Explorer.

Also, ODB is nice enough to offer to take you straight to that folder – just click the little button in the bottom right corner of the confirmation dialog that appears once syncing has been initiated:

O365 ODB Confirmation popup

Easy, isn’t it?

Sharing

Now, having made files accessible in Explorer, the next thing would be to share them with others in your organization. ODB is nice in that respect as well, as it offers a "Share…" option in the Explorer context menu, from which you can launch a convenient "share'n'invite" browser popup with all the necessary options:

O365 ODB Share Options

Also, that one is very straightforward in that

  • you just type in a name,
  • O365 helps you with auto-completion of known names,
  • you select whether people shall be able to edit or only view items,
  • you're even able to create an "edit" or "view" link which allows people to access items without a dedicated invitation,
  • etc.

So – no rocket science here. Users will easily be able to share their files with others. And once one is done with sharing, invited colleagues will receive an email invitation to the newly shared items, which takes them into their browser and into O365 to actually see what's been shared with them.

Great!

And now …

Get that into your Windows Explorer!

Once the necessary items were shared with every user of our tenant as needed, I (at least) went right into my ODB sync folder in Explorer to await the newly shared files showing up. Okay, ODB takes a little while to sync it all (Hans Brender, Microsoft MVP for ODB and hence a real expert, wrote a great post on how syncing works in ODB). However, even waiting indefinitely wouldn't have led to us seeing any shared files. What we learned pretty quickly was that the ODB sync client will – in its initial default setup – never ever sync anything that was shared with you. Only your own files will show up in Explorer. Period.

Makes no sense really for collaboration, does it? But …:

Here are some solutions:

1. Access shared files through your browser only

Anything that has been shared with you is accessible through your browser-based ODB access. Just click "shared with me" in the navigation on the left of the ODB portal and you'll see it.

O365 ODB: Shared with me

Pretty lame, though, for anyone who's used to working from within Explorer and not e.g. from the browser or any of the Office applications.

2. Create a teamsite for documents shared between multiple colleagues

O365, with its SharePoint functionality, does of course offer the ability to create a site – which also contains a document library. Documents put there are available to anyone with permissions on that site. Permissions can even be set at a more granular level (e.g. management-only for certain folders, team A on team-A folders only, etc.).

Navigating to that site's document library offers you the same "sync" link as with your own files (see screenshot above), i.e. in a few moments ODB will sync any file that you're eligible to view or edit.

Nice. But what if creating umpteen sites for all the different project and team setups within your company is just not what you want? Or what if managing all the various permission sets within one site is just beyond the acceptable effort for your IT team? There's at least one more possibility that might help:

3. Sync your team-mate's ODB document library

As you already know, every O365 user has their own ODB site, which is invoked when they click OneDrive in the O365 portal. When you're invited to view or edit shared files and brought to the respective ODB site in your browser, you actually end up in someone else's document library:

O365 ODB: Sync someone else’s docs

Well – sync that! Just click the "sync" link on top as described before and the ODB client on your PC adds another folder to the ODB folders in Explorer. And those will show exactly what has been shared with you from that library. Not 100% perfect, maybe, as it leaves you having to know who shared what with you, but still a way to work around having to create a teamsite or working only from within the browser, if you don't want to.

Anybody out there knowing other options how to conveniently add shared files and folders to the local ODB folder tree? Please share your insight in a comment!

P.S. – what about the doc-lib links?

In case you do not want to go via the "sync" link in the ODB portal to invoke ODB synchronization, but instead want to add libraries from within your ODB sync client on your PC, right-click the ODB tray icon (the little blue cloud – not the white one, that's OneDrive, formerly aka "SkyDrive" ;)) and click "Sync a new library". Here's what to use for the syncing options discussed above (a small sketch after the list shows how these URLs are composed):

  1. Your own ODB library: https://<company-name>-my.sharepoint.com/personal/<your-username>_<domain>_<TLD>/Documents (where <your-username> is what you're called in that O365 tenant, e.g. johndoe, and <domain> and <TLD> are what your company is called in that tenant, e.g. contoso.com – in that case it would be "johndoe_contoso_com")
  2. Teamsite: https://<company-name>.sharepoint.com/<name-of-shared-documents-library> (which depends on the language chosen at the initial site setup; e.g. "Freigegebene%20Dokumente" in case the setup was done in German)
  3. Someone else's ODB library: https://<company-name>-my.sharepoint.com/personal/<that-person's-username>_<domain>_<TLD>/Documents – i.e. instead of using your own name as described in (1) above, you just exchange it with the username of the other person whose library you want to sync
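To make the composition of these three URL patterns a bit more tangible, here's a small helper sketch (tenant, user and library names are made-up example values):

```python
# Sketch: compose the three "Sync a new library" URLs discussed above.
# Tenant, user and library names are made-up example values.
def own_odb_library(tenant: str, user: str, domain: str, tld: str) -> str:
    return f"https://{tenant}-my.sharepoint.com/personal/{user}_{domain}_{tld}/Documents"

def teamsite_library(tenant: str, library: str) -> str:
    return f"https://{tenant}.sharepoint.com/{library}"

def someone_elses_odb_library(tenant: str, other_user: str, domain: str, tld: str) -> str:
    # same pattern as your own library, just with the other person's username
    return own_odb_library(tenant, other_user, domain, tld)

print(own_odb_library("contoso", "johndoe", "contoso", "com"))
# -> https://contoso-my.sharepoint.com/personal/johndoe_contoso_com/Documents
print(teamsite_library("contoso", "Freigegebene%20Dokumente"))
# -> https://contoso.sharepoint.com/Freigegebene%20Dokumente
print(someone_elses_odb_library("contoso", "janedoe", "contoso", "com"))
# -> https://contoso-my.sharepoint.com/personal/janedoe_contoso_com/Documents
```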

But I think how to format all those links correctly for use in the ODB client's "Sync a new library" dialog has already been discussed in multiple posts all over the web anyway.

 


Are you outdated?

The Gartner Hype Cycle 2014 special report is out

So – here it is: Gartner's assessment of emerging technologies for 2014. And it's the first time in years that I don't really have anything substantial to question about it. However, two things are worth mentioning:

Cloud Computing’s disillusionment

It's "The End of the Cloud as we know it", I said recently. Gartner – in quite a similar way – sees the Cloud entering the trough of disillusionment with many signs of fatigue, partly accompanied by rampant "cloud washing", but also driven by many – if not all – vendors offering a cloud strategy although "many aren't cloud-centric and some of their cloud strategies are in name only". The early promises of massive cost savings have finally worn out in favour of more realistic advantages of a move into the cloud. And Gartner appreciates that the Cloud continues to be one of the most hyped topics in IT history, with organizations that develop a true cloud strategy focusing on the real benefits such as agility, speed, time to market and innovation.

Journey into the Digital Age

However, what’s far more important and interesting than the Hype Cycle itself is their publication of the “Journey into the Digital Age” which comes – according to Gartner – with 6 business era models. These models – alongside their respective driving technologies – characterize the focus and outcome of organizations operating within each of those eras. Dividing lines between them are

  • the "Web", before which the only relevant era was "Analog", characterized by CRM and ERP as the most important emerging technologies, and
  • the "Nexus of Forces" (mobile, social, cloud and information), which separates "Web" (as an era), "E-Business" and "Digital Marketing" from "Digital Business" and "Autonomous"

While the era of "Digital Marketing" is mostly what we see with innovative organizations these days, it is the last two eras that separate the latter from the real innovators and the founders of the next age of IT (claimed by many to be called "Industry 4.0"):

  • Digital Business – mainly driven by how the "Internet of Things" changes the way we do business and interact with customers – will be the era in which our physical and virtual worlds blur and businesses adopt and mature technologies like 3D printing/scanning, sensor or machine-to-machine technologies or even cryptocurrencies (e.g. Bitcoin). We should be watching out for the main innovators in the healthcare domain to show us the way into and through this era within the next few years.
  • Autonomous – to me – is the most compelling of those 6 business era models. According to Gartner it represents the final post-nexus stage (which i.m.h.o. will change, as evolution is ubiquitous and change is constant) and is characterized by organizations' ability to "leverage technologies that provide humanlike or humanreplacing capabilities". Enterprises with the capabilities to operate within this business era model will push innovative solutions of all kinds that turn normal day-to-day activities like driving cars, writing texts, understanding languages or assisting each other into automated – autonomous – tasks.

When writing "Innovation doesn't happen in IT" last year around the same time, I was overwhelmed by the fact that we're beginning to leave an age in which IT was a discipline in itself. It is in these days that we sense an even stronger move towards IT being ubiquitous, the nexus of forces being felt in our everyday lives, and IT becoming a servant of what's really important.

I’m hoping it will be a humble servant!

 

(download the full Gartner Hype Cycle of Emerging Technologies Report here)
