| DBAmon |
|
Home |
Index / DBAmon Documentation |
| What DBAmon Monitors | Free Oracle Tool: orastat | Request Support | DBAmon Download | DBAmon Gadget
| |
| Note: While it may appear that DBAmon does not monitor as many metrics for MSSQL as
for other DBMS's, MSSQL actually does a good job of monitoring itself. See #5 below -- DBAmon leverages MSSQL alerts. |
| Category | Event | Metrics | What DBAmon Monitors | Metric Data Source | AutoFix Action Taken (If Any) Backup
| 1. Database Backup Age
| (AutoFix) BCKAGE_D
|
This Metric monitors The number of hours since the last successful D (Differential) or F (Full) backup is
measured. This metric will be measured for databases listed in the
DBC "Backup_Check_DBs:" parameter. If the
DBC "Backup_Age:"
number of hours is exceeded, then this event will occur.
| Data Dictionary Table: msdb..backupset
|
If specified, the DBC "Backup_Command:" is run.
| Availability
| 2. Instance Running
| INS
|
The DB is verified to be running and can be connected to.
| "net start" Is run to see which services are running +
Data Dictionary View: master..sysdatabases +
An ODBC connect is attempted
|
| Availability
| 3. Alerts
|
|
Any MSSQL Alert (SQL Studio Management -> SQL Server Logs) with a severity of 17 or higher will cause a DBAmon Critical Event.
the xp_readerrorlog procedure is used to read the log.
| System Proc: master..xp_readerrorlog
|
| Availability
| 4. Agent Running
|
|
A check is done to ensure that the SQL Agent is running.
| "net start"
|
| Availability
| 5. Transaction Log Full
| TLOG_PCT_FULL
|
The
DBC "T_Tlog_Full:"
parameter is used for the WARNING and CRITICAL thresholds. If the SQL Transaction Log:
|
DBCC sqlperf (logspace) + sp_helpdb
|
| Availability
| 6. Disk Drive Full
| DRIVE_SPACE
|
All disks which contain MSSQL DB Files are checked for capacity and freespace. If any are >= the
DBC "T_Disk_Full:" percent full, an event will occur.
| dbamon_mssql_di.exe + sp_spaceused
|
| Availability
| 7. Filegroup Full
| FILEGROUP_FULL
|
If ALL of the DB files in the file group DO NOT have MAX SIZE set to UNLIMITED, then
this check will take place. It will compare the TOTAL SIZE of all DB file to the TOTAL MAX SIZE
of all DB files. If the
DBC "T_FG_Full:" percent full is exceeded (warning
and critical threhsolds) are exceeded, a
DBA850 event will occur.
Again, this check will not occur
if ANY of the DB files are MAX SIZE UNLIMTED. In that case, the DRIVE FULL monitoring will be
used.
| sp_helpdb + sp_helpfilegroup
|
| Availability
| 8. Disk Potential Growth
| DISK_POTENTIAL
|
In the FILEGROUP check above, the POTENTIAL GROWTH for each DB file (the MAX size minus the CURRENT size)
is calculated for ALL DB file on ALL DISKS. If the total potential growth for ALL DB file, by disk, exceeds
the FREE SPACE for that drive, then a
DBA851 event will occur.
| sp_helpdb + sp_helpfilegroup
|
| Availability
| 9. DBMail Running OK
| DBMAIL_STATUS
|
A check is performed to endure that DBMail is working properly.
if the proper status is not detected, then a
DBA853 event will occur.
| msdb..sysmail_allitems
|
| Availability
| 10. Database SUSPECT PAGES
|
|
SQL Server will populate MSDB..SUSPECT_PAGES when a corrupt page is detected
in SQL 2005+. This check looks for rows in this table, which would indicate
DB CORRUPTION. The resulting
DBA856 event will occur in this case.
| msdb..suspect_pages
|
| Availability
| 11. Database Recovery Model Inconsistent With TLOG Backups
|
|
This metric checks for the condition where a SQL Database has been created with
RECOVERY MODEL set to FULL or BULK-LOGGED, but there are not any TLOG backups on
record (during the last 30 days). See
DBA519 for details.
|
SELECT name
FROM master.sys.databases
WHERE recovery_model_desc != 'SIMPLE'
AND is_read_only = 0
AND name NOT IN (
SELECT DISTINCT
database_name
FROM [msdb].[dbo].[backupset]
WHERE type = 'L'
AND backup_start_date > DATEADD(dd, -30, GETDATE()) )
|
| Management
| 12. TLog Size vs. DB Size Ratio
| TLOG_RATIO
|
If it is found for any MSSQL DB that the SQL Transaction Log is at least 1gb in size
and at least 5 times the total size of all Database files, a
DBA656 event will occur.
| sp_helpdb
|
| Management
| 13. Zero Non-System Databases
|
|
If a MSSQL instance is found to contain ZERO non-system databases, a
DBA663 event will occur.
| master..sysdatabases
|
| Management
| 14. OFFLINE Database(s) Found
|
|
If at least 1 database is OFFLINE, a
DBA804 event will occur.
| An ODBC connect at attempted
|
| Management
| 15. SQL Jobs Running
|
|
It is sometimes necessary in a SQL Instance to ensure that certain jobs are always running. An example
may be a SQL Distribution agent job, etc. By specifying the new
DBC "MSSQL_DDP_Job_Cat_MBR:"
and
DBC "MSSQL_DDP_Job_Name_MBR:"
parameters, you can specify the JOB CATEGORY or JOB NAME of jobs that should always be running.
if a job that is configured in this way is NOT running, then a
DBA351 event will occur.
| sp_help_job
|
| Management
| 16. SQL Job Failure
|
|
If the most recent exection of an Enabled SQL Job ended with a Failed status, the
DBA354 event will occur. This
check will occur if the
DBC "MSSQL_Job_Fail_Check:" parameter
is set to Y.
| sp_help_job
|
| Management
| 17. Hung SQL Backups
|
|
If a SQL backup is found to be executing for at least 24 hours, an event will occur. This is usually
due to a backup process being HUNG. In this case a
DBA854 event will occur.
| master..sysprocesses
|
| Management
| 18. MSSQL DB Ownership
|
|
This may be a matter of opinion, but it is (or can be) a poor practice for a production MSSQL database
to be owned by an individual. If that person were to be removed from DNS, some functionality would be
lost, as well as the obvious security concern when a person changes jobs, etc. The remedy is to have
all DBs owned by SA or some other generic ID (which you can specify), which is a good practice.
DBAmon checks all DBs to see that they are either owned
by SA or by a userid specified in the
DBC "MSSQL_DB_OK_Owners:" parameter.
If any DBs are found not owned by SA or the userids that you specify in the MSSQL_DB_OK_Owners:
DBC parameter, a
DBA353 event will occur. Corrective DDL is supplied in
the event log.
| sp_helpdb
|
| Management
| 19. MSSQL Job Ownership
|
|
This may be a matter of opinion, but it is (or can be) a poor practice for a production MSSQL job
to be owned by an individual. If that person were to be removed from DNS, some functionality would be
lost, as well as the obvious security concern when a person changes jobs, etc. The remedy is to have
all Jobs owned by SA or some other generic ID (which you can specify), which is a good practice.
DBAmon checks all Jobs to see that they are either owned
by SA or by a userid specified in the
DBC "MSSQL_DB_Job_Owners:" parameter.
If any Jobs are found not owned by SA or the userids that you specify in the MSSQL_Job_OK_Owners:
DBC parameter, a
DBA352 event will occur. Corrective DDL is supplied in
the event log.
| sp_helpjob
|
| Management
| 20. SQL Log String Checking
|
|
It is possible to tell DBAmon to check for a string of your choice in the SQL Log, and then to set
a threshold on the number of occurrences of this string.
The
DBC "MSSQL_String_Check:" parameter.
Look at the desciption of this DBC parameter (URL above) for more information.
| xp_readerrorlog
|
| Performance
| 21. *ANY* MSSQL or Server Performance Counter
|
|
Performance monitoring for SQL server and the servers that it runs on is
simple because of DBAmon's ability to monitor any Windows Performance Counter .
This Windows feature allows the OS or any underlying program to expose performance data
as a Performance Counter. Do a Google search for more information. With DBAmon
it is possible to take reading from any Performance Counter and to set a threshold.
A DBA852 event will occur if a
counter sample exceeds the threshold that you have set.
| *ANY* MSSQL or Windows Performance Counter.
|
| Performance
| 22. Known BAD DB Options
|
|
If it is found that any DBs have options set which are known to cause performance
problems (now just AUTOCLOSE and AUTOSHRINK), the a new
DBA855 event will occur.
| DATABASEPROPERTYEX
|
| Performance
| 22. "I/O Delay" Messages in SQL Log
|
|
Any messages with the text "I/O Delay" in the SQL Log indicate that the is I/O slowness. A
DBA665 Performance event will occur in this case.
| xp_readerrorlog
|
| Performance
| 24. "A significant part of SQL Memory has been paged out" Messages in SQL Log
|
|
Any messages with the text "A significant part of SQL Memory has been paged out" in the SQL Log indicate that the is I/O slowness. A
DBA670 Performance event will occur in this case.
| xp_readerrorlog
|
| Performance
| 25. TEMPDB Database File Count
| PERF_TEMPDB_FILE_COUNT
|
To prevent PAGE ALLOCATION contention, it is a good practice to have 1 TEMPDB Datbase File per usable CPU (CORE), up to
a maximum of 8. See the documentation for DBAmon (P)erformance event:
DBA857 for more information.
| sp_helpdb
|
| Performance
| 26. SQL Server Instance "Signal Wait" Percent
| PERF_SQL_SIGNAL_WAIT_PCT
|
SQL DMO's are queried to determine the current SIGNAL WAIT PERCENT. This basically is the percent of ALL wait time
which is spent WAITING FOR CPU. A high value here (over the 25% threshold) over a long period of time indicates a
need for additional CPUs. See DBAmon event:
DBA858 for more information.
| master.sys.dm_os_wait_stats
|
| Performance
| 27. SQL Server VLF (Virtual Log File) Count Per Database
|
PERF_SQL_VLF_COUNT
|
Having too many (you can set the threshold) VLFs can cause performance problems with SQL Updates/Inserts/Deletes and
with DB recovery at startup. A good article of VLFs can be found at:
Kimberly Tripp's BLOG Entry .
DBAmon simply counts the number VLF in each SQL database by running DBCC LOGINFO. The could of VLFs for each
DB is then compared to the
DBC "T_MSSQL_VLF_Count:" parameter. If the
number of VLFs is greater then or equal to the T_MSSQL_VLF_Count: value, then a
DBA859 event will occur. The text for this event
contains URLs to good internet articles that will show you how to reduce the number of VLFs.
| DBCC LOGINFO
|
| Performance
| 28. SQL Server Processes Blocked by Other Process
| PERF_SQL_BLOCKEDPROCSOTHERPROC
|
The value that SQL exposes to the performance counter for BLOCKED PROCESSES is misleading (and NOT THERE AT ALL for
SQL 2000). When an application uses MSSQL Parallel Processing (you know; MAXDOP), it is common for a process to be
waiting for its own parallel subprocesses. This metric better give a picture of SPIDs that are waiting for other
processes.
| select count(*) from master..sysprocesses (nolock) where blocked != 0 and blocked != spid
|
| Performance
| 29. SQL Server "Lock Pages In Memory" Setting
| PERF_SQL_LPIM_LOCKED_KB
|
It seems common that when we do a SQL install that we grant the server role to enable LOCK PAGES IN MEMORY (LPIM), but
for a variety of reasons it is not really enabled. A NON-ZERO value for this metric indicates that LPIM is enabled.
| select locked_page_allocations_kb from master.sys.dm_os_process_memory
|
| Performance
| 30. SQL Server "Lock Pages In Memory" Setting
| PERF_SQL_RUNNABLE_REQS
|
This metric shows the number of RUNNABLE requests. A non-zero value here indicates a possible CPU bottleneck.
| select count(*) from master.sys.dm_exec_requests where status = 'runnable'
|
| |
|---|