Categories: Projects

Remove Duplicate WordPress Posts in C# with WordPress API

While doing some QA on WordPress Poster, the program I wrote to post example T-SQL scripts and queries to my WordPress blog, I discovered that it duplicated many blog posts during testing.

Thankfully, I anticipated that something like this could happen during the development of WordPress Poster and made sure that any content that it posted had its  WordPress status set to ‘draft’.  Blog posts that are in ‘draft’ status, don’t appear on a WordPress site and are only available for WordPress administrators, so the duplicated blog posts weren’t viewable to website visitors.

Challenges

I recently edited a large number of posts in the WordPress backend and I wasn’t sure if the blog posts I edited were affected by this problem. The challenge was that I needed to delete the duplicate blog posts, retaining only the most recently edited version.

Solution

I resolved this issue with a short script that took the following approach:

  1. Back up WordPress Poster’s client-side blog post table.
  2. Identify duplicate blog posts by Post Id.
  3. Mark all but the most recently modified versions for deletion.
  4. .Connect to the WordPress API via WordPressSharp to delete the posts that were marked for deletion.

The Code

Below is the code I used to resolve this issue.  It is not intented to work for you, as I haven’t included any supporting code.  It is in C#.


// This code was added to resolve an issue where more than one blog post was being generated for a single 
// T-SQL script.  It physically deletes the extra WordPress blog post that was generated in error. The 
// version that's left on the server is the one that was most recently updated in the WordPress backend.  
using (var client = new WordPressClient (config))
{
 // Identify duplicated content, partitioning by SnippetId because BlogId was already a unique constraint.
 // Order By the last modified date, according to WordPress, descending, so that the version that was 
 // most recently edited in WordPress has record number 1 (rn=1), so this is the record we keep.

 // TO DO: This table needs to include only posts that this program cares about.  In other words, only 
 // posts that have Snippet Id's should be imported.  Until that change is in place, this might be a
 // problem again.
 // 
 // One way to reduce the possibility of duplicate Snippet Id's re-entering the database is to place a 
 // unique key constraint on the Post Name column, because that's the only part of the table that my
 // script presently retains when the post is in draft status.  Since Post Name is used in the URL to my 
 // pages, any pre-existing links to that page would be broken if it is updated, so I'm requiring that 
 // Post Name is updated in WordPress admin and will not be overwritten by WordPress Poster.
 sql = $@"SELECT PostId FROM 
  (SELECT *, rn=ROW_NUMBER()
  over (partition by SnippetId order by SnippetId ASC, PostModifiedDateTime DESC)
  from  CodeSnippets.[dbo].BlogPosts WHERE SnippetId is not null and PostStatus = 'draft' and BlogId ={blogid}) t1 where t1.rn > 1";
 records = db.ExecReader (sql);

 if (records.Count > 0)
 {
  // If this query returns records, backup the blog posts table
  sql = $"SELECT * INTO CodeSnippets.dbo.BlogPosts_{rundate.ToString ("yyyyMMdd_hhmm")} FROM CodeSnippets.dbo.BlogPosts";
  int i = db.ExecNonQuery (sql);

  // Physically look at SQL Server to ensure that the backup table was created and has content.
  Console.WriteLine ("Add breakpoint here and make sure the backup table was created");
 }

 foreach (IDataRecord record in records)
 {
  long postid = record.SafeGetInt64 (0);
  Console.WriteLine ($"Make sure that https://www.sevendaysofschema.com/?p={postid} exists");
  // WARNING: You are about to delete a post in WordPress.  Be very careful that you are 
  // not deleting content that you didn't want to.
  Console.WriteLine ("Add breakpoint here.");
  bool deletedpost = client.DeletePost (Convert.ToInt32(postid));
  // Make sure that the post you wanted to delete was the correct post. Once that's done, you can 
  // right click the breakpoints in Visual Studio and choose 'Disable Breakpoint' to allow the 
  // script to run without having to QA every deletion.  Remember that the most recent version of 
  // Visual Studio lets you re-enable the breakpoint at any time during execution by following the 
  // same steps.  That will let you confirm that the script hasn't gone rogue.
  Console.WriteLine ($"Make sure that https://www.sevendaysofschema.com/?p={postid} was removed");
  Console.WriteLine ("Add breakpoint here.");
 }
}

 

Lessons Learned

WordPress Poster’s reason for living is to post a single programmatically generated HTML blog post to WordPress for each of the T-SQL scripts in the archive of T-SQL scripts and queries on my development PC and in my Snippet Manager software.

Since there should only be one blog post devoted to each T-SQL script, this situation could have been identified and handled early on if I had taken the time to add unique constraints on WordPress Poster’s client-side blog post table.

This is something I originally considered but didn’t do, because the program imports all web content on my blog, including blog posts that don’t have a Snippet Id, like my ‘About Rick’ page. Obviously, key constraints can’t be placed on columns that can contain null values, which is what would need to happen in this case.

Another reason constraints weren’t being enforced on WordPress Poster’s Blog Posts table is because something was causing a key violation on the Blog Id column while importing blog posts from WordPress. This was strange because surely WordPress couldn’t export two records with the same BlogId. Alas, it could.

It turned out that WordPress  ‘auto-draft’ posts were attempting to insert into the table. These are the versions of blog posts that WordPress automatically saves while you’re editing the content in the WordPress control panel. Since these versions exist alongside the actual content, you’ll get a key violation on Post Id (and in my case, Snippet Id) if you attempt to import them into a table with a primary key or unique constraint set on those columns.

Ultimately,  the way to resolve this issue is to create a second table in WordPress Poster that will hold all of the content that this program doesn’t care about, such as my ‘About Rick’ page, or just exclude it altogether.  I’ve already excluded ‘auto-drafts’ from being imported into the client side WordPress Poster database.

Rick Bishop

Share
Published by
Rick Bishop

Recent Posts

C# System.Uri Class Examples

If you've needed to parse or construct a URI in C#, you've likely done it…

6 years ago

C# Basics – Access Modifiers

The second installment in my series of C# basics illustrates the proper use of access…

6 years ago

C# Coding Style

This page details the coding style I've adopted for C# code in my applications. You…

6 years ago

C# Basics – Inheritance

For the new C# section of my website, I wanted to post some notes that…

6 years ago

5 Reasons to Lock Down Your LinkedIn Profile

There are some pretty compelling reasons to lock down your LinkedIn account now. We bet…

6 years ago

LinkedIn is Ignoring Your Privacy Settings and You Paid Them to Do It

We bet you didn't know that your full name, picture, work history, and more may…

6 years ago