Dealing with Change Data Capture jobs for databases in Availability Groups

Problem

Having Change Data Capture enabled on databases that are part of an Availability Group, each time a failover happens (automatically or manually), we have to remember to make sure that the CDC capture and cleanup jobs are created and the capture job is started accordingly on the primary replica. Let's be honest, we are not going to remember (nor we want to) that after doing a system update and doing multiple failovers or after automatic failover we have some manual work to do. If we do forget about it, we most likely going to end up wtih some data loss in our CDC tables and a gap in anything that might rely on that data.



Solution

Lets first point out few things that we have to keep in mind while dealing with the problem that we are presented:
  • There might be more than one Availability Group in a Always On setup and each group might have primary role on different nodes.
  • When enabling CDC on a database, jobs are only created on the currently active node. This means that if AG fails to another node we won't have the capture and cleanup jobs.
  • When failing back to a node that already had the cdc enabled databases in the past, the jobs will be there but in a failed state and they have to be started again.
  • Capture job runs continuously and has to be started every time a failover happens.
  • Cleanup job runs on schedule, we just have to make sure it is created.
Keeping all the above in mind, I've created a stored procedure that needs to be scheduled on each node that is a part of our Always On setup:


CREATE PROCEDURE sp_cdc_hadr_watchdog
AS
BEGIN

SET NOCOUNT ON

DECLARE @database_name NVARCHAR(255)
DECLARE @SQL NVARCHAR(MAX)

DECLARE cdc_watchdog_dbs CURSOR FAST_FORWARD READ_ONLY 
FOR 
SELECT  D.name
FROM    sys.dm_hadr_availability_replica_states AS A
        JOIN sys.availability_replicas AS B ON B.replica_id = A.replica_id
        JOIN sys.availability_groups AS AG ON AG.group_id = A.group_id
        JOIN sys.availability_databases_cluster AS ADC ON ADC.group_id = A.group_id
        JOIN sys.databases AS D ON D.name = ADC.database_name AND D.is_cdc_enabled = 1
WHERE B.replica_server_name = @@SERVERNAME AND A.role_desc = 'PRIMARY'

OPEN cdc_watchdog_dbs

FETCH NEXT FROM cdc_watchdog_dbs INTO @database_name

WHILE @@FETCH_STATUS = 0
BEGIN

SET @SQL = '
USE ['+@database_name+'];

IF OBJECT_ID(''tempdb..#xp_results'') IS NOT NULL
EXEC (''DROP TABLE #xp_results'')

IF OBJECT_ID(''tempdb..#CDCJobs'') IS NOT NULL
EXEC (''DROP TABLE #CDCJobs'')

CREATE TABLE #CDCJobs(job_id          UNIQUEIDENTIFIER
                    , job_type        CHAR(8)
                    , job_name        NVARCHAR(255)
                    , maxtrans        SMALLINT
                    , maxscans        SMALLINT
                    , continuous      BIT
                    , pollinginterval SMALLINT
                    , retention       SMALLINT
                    , threshold       SMALLINT)

INSERT INTO #CDCJobs
EXEC sys.sp_cdc_help_jobs

DECLARE @job_id_capture UNIQUEIDENTIFIER
DECLARE @job_owner_capture sysname

DECLARE @job_id_cleanup UNIQUEIDENTIFIER

DECLARE @is_running INT

SELECT @job_owner_capture = SP.name, @job_id_capture = S.job_id
FROM #CDCJobs AS CJ
JOIN msdb.dbo.sysjobs AS S ON s.name = CJ.job_name
JOIN sys.server_principals AS SP ON SP.sid = S.owner_sid
WHERE CJ.job_type = ''capture''

SELECT @job_id_cleanup = CJ.job_id
FROM #CDCJobs AS CJ
WHERE CJ.job_type = ''cleanup''

IF @job_id_capture IS NULL
 EXEC sys.sp_cdc_add_job ''capture''

IF @job_id_cleanup IS NULL
 EXEC sys.sp_cdc_add_job ''cleanup''

IF @job_id_capture IS NOT NULL
 BEGIN
  CREATE TABLE #xp_results (job_id             UNIQUEIDENTIFIER NOT NULL,
        last_run_date         INT              NOT NULL,
        last_run_time         INT              NOT NULL,
        next_run_date         INT              NOT NULL,
        next_run_time         INT              NOT NULL,
        next_run_schedule_id  INT              NOT NULL,
        requested_to_run      INT              NOT NULL,
        request_source        INT              NOT NULL,
        request_source_id     sysname          COLLATE database_default NULL,
        running               INT              NOT NULL,
        current_step          INT              NOT NULL,
        current_retry_attempt INT              NOT NULL,
        job_state             INT              NOT NULL)

  INSERT INTO #xp_results
  EXEC master.dbo.xp_sqlagent_enum_jobs  0, @job_owner_capture, @job_id_capture 

  SELECT @is_running = XR.running
  FROM #xp_results AS XR

  IF @is_running = 1
   RETURN
  ELSE
   EXEC msdb.dbo.sp_start_job @job_id = @job_id_capture

 END'

EXEC (@SQL)

    FETCH NEXT FROM cdc_watchdog_dbs INTO @database_name
END

CLOSE cdc_watchdog_dbs
DEALLOCATE cdc_watchdog_dbs

It loops through all the databases that have cdc enabled and are a part of Availability Group that has "PRIAMRY" role. For each of those databases it gets the job information from sys.sp_cdc_help_jobs procedure to find out if the jobs is already created. If neither of jobs exist, they will be created by sys.sp_cdc_add_job procedure. Once made sure that capture job exists, procedure needs to check it's current state, to do that I've used dbo.xp_sqlagent_enum_jobs stored procedure - keep in mind that this extended procedure is undocumented and unsupported but was the best way I know to find the current state of a job. If the capture job was found to be running we simply do a RETURN and nothing happens, however if it was not dbo.sp_start_job is used to start it.

Comments

  1. it's give error...

    Msg 102, Level 15, State 1, Procedure sp_cdc_hadr_watchdog, Line 28 [Batch Start Line 0]
    Incorrect syntax near '];

    IF OBJECT_ID('tempdb..#xp_results') IS NOT NULL
    EXEC ('DROP TABLE #xp_results')

    IF OBJECT_ID('tempdb..#CDCJobs') IS NOT'.

    ReplyDelete
    Replies
    1. Check if the cursor FOR statement returns correct database names.

      Delete

Post a Comment

Popular posts from this blog

sp_QueryShortcuts - setup and how to use them