System Center 2012 R2 Operations Manager (SCOM) is a very in-depth monitoring solution, owing to its ability to gain insight into many areas of an infrastructure. SCOM can help you understand your environment from the physical hardware of your network switches, SANs, and physical servers, right through to the operating systems, whether they are Unix, Linux, or Windows, and on to the applications running either within your data centers or externally in public cloud solutions such as Microsoft Azure.
Because System Center 2012 R2 Operations Manager is such a wide-reaching and powerful solution, it is very important, before implementing it, to understand the various roles and features of SCOM, how they function, and, more importantly, how they relate to the business and its needs.
Introducing Operations Manager roles
The following information will help you understand the various roles found within System Center 2012 R2 Operations Manager.
While in small or test environments you may consider installing all of the roles on a single server, large or production environments will often have the various roles spread across multiple servers for both performance and availability purposes.
How to do it…
SCOM has various roles that need to be deployed and configured within an implementation. A breakdown of these roles is given in the following sections to help you understand roles within SCOM.
Management server
The management server is the brains of SCOM. It coordinates management pack distribution, the application of monitors and rules, and agent communication, and it acts as the interface between the system and you, the administrator, via the console.
Every deployment of SCOM will contain at least one management server, but adding more management servers will allow you to start to scale out the implementation for both performance and availability. When implementing the first management server, SCOM creates what is known as a management group. This can be seen as a control boundary allowing you to select which servers are managed by this implementation of SCOM and, if required, to implement multiple management groups, each with their own sets of management servers, for different purposes.
Operational database
The operational database is the database backend used by the management servers for short-term storage of data and for processing information related to the management packs implemented within your deployment and their rules, monitors, and overrides; in other words, the system configuration.
Every management group requires one unique operational database.
Data warehouse database
The SCOM data warehouse is another SQL database, but it is used for long-term storage of data, the default retention period being 400 days. Data is written to the data warehouse in parallel with the operational database, but over time data in the warehouse, such as performance metrics, is aggregated rather than stored as raw data.
The data warehouse database is a required component for an SCOM management group, but it can be shared between different management groups, allowing for a centralized data warehouse to be implemented, providing you with a rolled-up and consolidated view of the health and performance of the different monitored areas of your environment.
Reporting server
The reporting server, while an optional extra, is highly recommended, as this is the role that provides access to the reporting features of SCOM. It requires a server with a dedicated SQL Server Reporting Services (SSRS) instance to be designated as the reporting server. SCOM requires a dedicated SSRS instance because it modifies the instance's security to match the role-based access model used by SCOM, potentially removing access to any reports you previously had on that SSRS instance. It is neither recommended nor supported to use this SSRS instance for any purpose other than SCOM reporting.
Gateway server
A gateway server is used in two main scenarios: to bridge security boundaries and to act as an agent traffic management point. A gateway can be placed outside the security boundary where the main management servers reside, such as in an isolated Demilitarized Zone (DMZ), a workgroup, or a domain environment without trusts established.
A gateway within an untrusted Active Directory environment can communicate with the agents there and act as the communication point from that environment back to the management group, using certificates to secure the communication channel.
A gateway can also be used to manage non-domain-joined devices; in this case, all agents communicate with the gateway using certificates, and the gateway in turn communicates back to the management group via a certificate-secured channel.
Agents can be set to communicate with the gateway instead of directly with the management servers. This is useful for low-bandwidth remote sites where instead of having multiple agents reporting data directly across the network link, they report their data to the gateway, which can then compress that data and send it in batches instead. The compression can be as much as 50 percent.
While we refer to agents reporting to a gateway server, the agents themselves have no concept of a gateway and just see it as a management server.
Web console server
SCOM offers users the ability to access a web-based version of the operator console, rendered using Silverlight. This role can be deployed either on a separate server or on an existing management server. However, it is worth noting that if it is installed on a separate server, a management server role cannot be deployed to that same server after the web console role has been installed.
Alongside the web-based operator console, the web portals for Application Performance Monitoring (APM) are also deployed as part of the web console role. These consoles give access to the rich diagnostic and performance monitoring data that is gathered for .NET and Java applications.
Audit Collection Services
Audit Collection Services (ACS) allows security events generated by audit policies applied to monitored systems to be collected to a central location for review and monitoring. When enabled within your environment, a service installed as part of the SCOM agent called ACS Forwarder will send all security events to the ACS Collector.
The ACS Collector is a role enabled on a management server; it filters the incoming security events and writes those you define as monitored to the ACS Database, a dedicated database for security events. Each ACS Collector requires its own individual database.
How it works…
System Center 2012 R2 Operations Manager uses either agent-based or agentless communication to collect data from servers and devices. Servers with agents push this data to the management servers or gateways they have been assigned to, while agentless-managed servers and devices, such as network switches, generally have their information pulled by the management servers.
The flow of information and/or connection points around the infrastructure can be visually represented as follows:
SCOM uses a mechanism known as management packs to control what type of information is collected and how to react to this information. These management packs are XML formatted files that define rules, monitors, scripts, and workflows for SCOM to use and essentially tell it how an aspect of your infrastructure should be monitored.
Most of these management packs will come from the suppliers of the software and devices used within your infrastructure, but there is nothing to stop you from creating your own management packs to fill a gap in monitoring if you find one. You are also able to override predefined options within management packs to better tune the monitoring for your environment.
While you’ve just been introduced to the main roles that will be encountered within almost all deployment scenarios of System Center 2012 R2 Operations Manager, there are a couple more, well, features rather than roles worth introducing. With the release of Service Pack 1 for System Center 2012, Microsoft introduced a new feature known as Global Services Monitoring (GSM).
GSM allows you to configure a watcher node outside of your organization utilizing Microsoft’s Azure platform, which can then be used to perform availability and performance monitoring of your externally accessible web-based application.
This allows you to gain a true 360-degree perspective on your environment, with internal monitoring happening from within your data center and a customer perspective from outside your network.
This information can then be surfaced through dashboards to see a visual representation of access to your services from different locations around the world.
Another feature introduced fully with the 2012 R2 release is System Center Advisor integration. System Center Advisor is a standalone cloud-based service that helps in the proactive monitoring of the configuration of infrastructure systems and provides suggestions in line with best practices.
At the time of writing this, Microsoft had a preview of the replacement for Advisor in testing named Azure Operational Insights. This allows configuration information from SCOM to be uploaded into the cloud service and for the data to be analyzed for different purposes such as capacity planning, change tracking, and security.
In versions of Operations Manager prior to 2012, there was a role known as the Root Management Server (RMS). This role was typically held by the first management server deployed into a management group and was responsible for running some distinct workflows such as AD assignment, notifications, and database maintenance.
This meant special attention was required when considering high availability with Failover Clustering, which added a layer of complexity. It also meant the placement of the RMS had to be considered in relation to other components, such as the data warehouse or operator console access, owing to the SDK service running on the RMS that scripts and consoles used to connect to SCOM.
The good news is that this role requirement has been removed in the 2012 and later releases, although there is still an RMS Emulator (RMSE) component.
The RMSE is present only to provide backward compatibility for legacy management packs that may still contain a workflow specifically targeting the RMS (for example, the Microsoft Exchange 2010 management pack). Most management packs, especially those from Microsoft, should by now have been re-released with the RMS requirement removed, but if you have any in-house management packs, it is recommended you check whether they target the Root Management Server class instance (Target="SC!Microsoft.SystemCenter.RootManagementServer").
You can identify which management server in your environment is running the RMSE by using the Get-SCOMRMSEmulator PowerShell cmdlet.
Running that command will display which management server is currently responsible for hosting and running the RMSE. In the event of a failure, however, the RMSE role will not fail over to another server, mainly as it isn’t considered critical and should have limited impact on the environment.
If the RMSE does need to be moved to another management server, whether because of a failure or for proactive maintenance reasons, you can use the Set-SCOMRMSEmulator PowerShell cmdlet.
To move the RMSE with a single line of PowerShell, combine the command that retrieves the target management server with the Set command, as in this example, which moves the RMSE to a server named PONSCOM01:
Get-SCOMManagementServer –Name "PONSCOM01" | Set-SCOMRMSEmulator
The following are useful links to information related to the System Center 2012 Operations Manager roles and features:
Microsoft TechNet—About Gateway Servers in Operations Manager:
System Center Advisor: https://www.systemcenteradvisor.com/
Microsoft TechNet—Global Service Monitor: http://technet.microsoft.com/enus/
Understanding the business and its requirements
In addition to understanding the various roles and how they are installed, as well as the performance, capacity sizing, and availability requirements for System Center 2012 R2 Operations Manager, it is equally important to understand what the business requires from its monitoring solution.
Without a clearly defined set of requirements, you run the risk of not implementing high availability for the roles in highest demand, of not implementing certain roles at all, or of focusing on monitoring areas of the business that provide no value.
The following information should give you a good idea of the areas and questions that you can then take back to the business and seek answers from those involved in the decision making processes.
How to do it…
The following information will provide you with areas of thought when discussing the monitoring requirements with the business.
Availability/percentage uptime required
Are you mandated to provide a five-nines (99.999 percent) service, or can you in reality provide a 98 percent uptime service? Most organizations like the sound of a five-nines service, but when they see the costs and controls associated with achieving this uptime, the requirements are often rethought.
Try gathering information regarding your key systems and their priority. Once ranked, work with the business to agree on individual uptime percentages for each application rather than as a whole, as some may be less critical to the business and therefore shouldn’t have the same amount of high availability and expense associated with them.
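To put these percentages in perspective, an uptime target can be translated into the downtime it actually allows per year. The following sketch is illustrative only (simple arithmetic, not SCOM output) and shows why the jump from 98 percent to five nines is so dramatic:

```python
# Convert an uptime percentage into the downtime allowance it implies.
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year

def allowed_downtime_minutes(uptime_percent):
    """Minutes of downtime permitted per year at a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)

for target in (99.999, 99.9, 99.0, 98.0):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.1f} minutes/year")
```

At 99.999 percent, roughly five minutes of downtime are permitted for the whole year, while 98 percent allows over a week; numbers like these often reframe the discussion with the business.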
Rather than concentrating only on the time that an application should be up, work with the business to identify the periods during which the application can be taken out of service for planned maintenance. This helps maximize the percentage uptime, as you can schedule work around the application's maintenance window and track the different types of downtime, providing accurate metrics that distinguish unplanned downtime, which lowers uptime, from planned downtime.
Cost of downtime
Establishing the cost of downtime helps you understand from the business what downtime of an application actually costs it.
Is it a financial loss, such as a stock exchange or mining corporation may see if a critical system is down? Maybe it's a loss of productivity or reputation, or even a loss of life in the case of systems used within hospitals.
Whatever the cost of downtime may be, knowing this in advance as you start designing your monitoring solution will enable you to focus on priorities and develop targeted reports that can represent the costs, highlighting areas doing well or others that need investments.
Services within the monitoring scope
Alongside simply deciding to deploy agents to monitor servers, you must also consider which business services are within the scope of monitoring. As part of this, you need to ensure that the individual components (servers, network, applications, and so on) are accounted for and that the solution is scaled to support them.
With these business services to be monitored, questions also arise regarding any specific SLAs for performance and availability that may need to be set up against the services, along with any reports that may be needed.
This requires you to take into account not only the scale but also the extra work involved in the creation and maintenance of your services.
Alongside knowing the cost of downtime, you should also know whether there are specific areas of the infrastructure that, if down, will cause business-specific financial penalties so that these can again be prioritized for monitoring.
Resource metering – showback/chargeback
In addition to ensuring that you are monitoring key systems that may cause expenses to the business if problems aren’t quickly identified, you may need to also capture areas within your business that earn revenue.
As multi-tenancy, or even just the requirement to recoup costs from individual parts of the organization, grows ever more important, you should start gathering information on how much capital was expended on your infrastructure and how that can be equated to costs for the individual resource usage of its components.
Typically, you would assign costs to CPU, memory, storage, and networking utilization.
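A simple showback model along these lines can be sketched as follows. The unit rates here are entirely hypothetical placeholders; your own figures would come from the capital and operating costs gathered from the business:

```python
# Hypothetical unit rates for a showback/chargeback model.
# These numbers are illustrative only, not derived from any real cost data.
RATES = {
    "cpu_core_hour": 0.03,  # cost per CPU core per hour
    "gb_ram_hour":   0.01,  # cost per GB of memory per hour
    "gb_disk_month": 0.05,  # cost per GB of storage per month
}

def monthly_charge(cores, ram_gb, disk_gb, hours=730):
    """Approximate monthly showback charge for one workload (730 h ~ 1 month)."""
    return (cores * hours * RATES["cpu_core_hour"]
            + ram_gb * hours * RATES["gb_ram_hour"]
            + disk_gb * RATES["gb_disk_month"])

# Example: a 4-core VM with 16 GB RAM and 200 GB of disk.
print(f"Monthly charge: ${monthly_charge(4, 16, 200):.2f}")
```

The utilization figures that feed such a model (CPU, memory, storage, and network counters) are exactly the performance data SCOM already collects into the data warehouse, which is why resource metering belongs in the requirements discussion.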
This is not so much an area in which to gather specific information as one in which to understand from the business the level of utilization at which they require foresight, for capacity planning and for the purchasing of new equipment or the redistribution of workloads.
For example, would the business like to know when the drive space is down to 20 percent or 40 percent of free space? Are they happy with the utilization of server memory at 80 percent or 95 percent?
Having this information on hand will help with the initial tuning of your new monitoring environment and the creation of any forecasting reports.
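The idea behind a basic forecasting report can be illustrated with a simple linear projection. This sketch assumes daily free-space samples and a straight-line trend, which is a simplification of what a real forecasting report would do:

```python
# Given recent free-space samples (GB), fit a linear trend and estimate how
# many days remain until free space falls to a chosen threshold.
def days_until_threshold(samples, threshold_gb):
    """samples: free-space readings taken one day apart, oldest first."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Least-squares slope of free space over time (GB per day).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope >= 0:
        return None  # free space is not shrinking; no projected breach
    return (threshold_gb - samples[-1]) / slope  # days from the last sample

# Free space shrinking by about 5 GB/day; when does it hit the 100 GB mark?
print(days_until_threshold([150, 145, 140, 135, 130], threshold_gb=100))
```

With the business's agreed threshold (say, 20 percent free space) plugged in, a projection like this turns a raw performance counter into an answer the business actually wants: how long until action is needed.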
How it works…
Gathering information from the outset, before implementing your SCOM design, allows you to understand exactly what the business is trying to achieve and how the SCOM implementation can best achieve it.
For example, if the business has no requirements to monitor access to files and systems, then implementing the ACS roles may be a waste of resources better served elsewhere.
Again, if the business decides it has 50 applications that require extensive distributed application models to be created and monitored, then be sure to scale the number of management servers appropriately.
Another area to consider is other systems and their integration. For example, does information regarding NetFlow data from another system need to be fed into SCOM, or does SCOM need to output information into a Service Desk tool such as System Center 2012 R2 Service Manager?
These interactions, along with normal notifications and other subscriptions, can again place load on the solution and must be taken into consideration.