Migrating path aliases into Drupal 8 redirects: Part 2

 

A common problem for migrated Drupal 8 sites is that the URL scheme of content may have drastically changed. When that happens, you get broken links, bad search results, and loss of "SEO juice". In the last part, we took steps to remedy that situation. We installed the Pathauto and Redirect modules to provide pretty URLs as well as easily configurable redirects. We then started writing a new migration that will take in our Drupal 7 path aliases and use them to create new Drupal 8 redirects. 

But we're not finished yet. In this part, we'll write a new custom source plugin to extract the Drupal 7 node ID from the path alias, transform it into the Drupal 8 ID, then create the URI needed by the Redirect module. Finally, we'll run the migrations to preserve all our old URLs as 301 redirects.

So far, our migration looks like this:

id: deninet_node_redirects
migration_tags:
  - 'Drupal 7'
  - deninet
  - content
migration_group: deninet_urls
label: 'deninet node redirects'
source:
  plugin: d7_url_alias
  constants:
    slash: '/'
process:
  uid:
    plugin: default_value
    default_value: 1
  language:
    plugin: default_value
    source: language
    default_value: und
  status_code:
    plugin: default_value
    default_value: 301
  redirect_source/path: alias
  redirect_redirect/uri:
destination:
  plugin: entity:redirect

A lot of this was copied from the templates provided to us by the Pathauto and Redirect modules. We also did a few things specific to our use-case:

  • We changed the id and description, and added a migration_group.
  • We removed the rid field, as we are creating new redirects.
  • We removed the redirect_source/query as we didn't need it for a redirect.
  • We set the uid, language, and status_code fields to static values.
  • Critically, we set the Drupal 8 redirect_source/path to the Drupal 7 alias, mapping our incoming URL.

The one field that remains for us to migrate is redirect_redirect/uri, or, the target URL of the redirect. Sounds easy, right? Well...there's a problem.

For our migration source, we're using the d7_url_alias plugin. This plugin provides the incoming, user-facing URL in the alias field, and maps it to an internal Drupal path stored in the source field. We've already set the alias field above, so we just need to see what's inside the source field. We can do that with a quick query against the url_alias table in our Drupal 7 database:

SELECT * FROM url_alias LIMIT 1;
+-----+-----------------+-----------------------------+----------+
| pid | source          | alias                       | language |
+-----+-----------------+-----------------------------+----------+
|   1 | taxonomy/term/1 | category/tags/militarystory | und      |
+-----+-----------------+-----------------------------+----------+

Hm. That...doesn't quite help. Right now we're more concerned about content (nodes) than about anything else. The url_alias table contains path aliases for everything in our Drupal 7 site. Still, it tells us something: the source column contains the internal Drupal path, whereas alias provides the incoming URL entered by the user.

By default, all nodes in Drupal have a simple, utilitarian path:

node/nodeID

So, we should be able to change our query a little to only pick out rows with a source column starting with node/:

SELECT * FROM url_alias WHERE source LIKE 'node/%' LIMIT 1;
+-----+--------+--------------------------------------------+----------+
| pid | source | alias                                      | language |
+-----+--------+--------------------------------------------+----------+
| 529 | node/1 | picture/tess/2005-11-20/november-20th-2005 | und      |
+-----+--------+--------------------------------------------+----------+

Great! We should be able to just set redirect_redirect/uri to source now in our migration, right?

Not quite. While it's certainly possible to import content from a Drupal 7 site into a Drupal 8 site and preserve the node IDs, this isn't best practice. When doing a custom migration, we're far better off assuming that new node IDs will be created instead, so node/1 on the old site may point at a different node, or nothing at all, on the new one. This complicates things for us, since the d7_url_alias source plugin isn't aware of node IDs, just paths.

Normally, if we had just the node ID, we could use the migration_lookup plugin in our process section to transform the old node ID to the new one:

 

...
process:
  nid:
    -
      plugin: migration_lookup
      source: nid
      migration:
        - deninet_blog
      no_stub: true
    -
      plugin: skip_on_empty
      method: row
...

This runs two process plugins. The first instructs the migration system to look up that node ID in every migration listed under the migration key and return the matching Drupal 8 ID. We use no_stub: true to tell the migration system not to create placeholder (stub) nodes for IDs it can't find. If the lookup comes back empty, the skip_on_empty process plugin skips the row and moves on to the next one to migrate.
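To make the two steps concrete, here is a plain-PHP sketch of what they amount to, runnable outside Drupal. The lookupNewNid() function and the $idMap array are hypothetical stand-ins for the migration system's ID map, and the IDs are made up:

```php
<?php

/**
 * Hypothetical stand-in for migration_lookup followed by skip_on_empty.
 *
 * @param array $idMap
 *   Old Drupal 7 nid => new Drupal 8 nid (made-up data).
 * @param int $oldNid
 *   The Drupal 7 node ID to look up.
 *
 * @return int|null
 *   The new node ID, or NULL when the row would be skipped.
 */
function lookupNewNid(array $idMap, int $oldNid): ?int {
  // migration_lookup: find the destination ID recorded for this source ID.
  // With no_stub: true, a miss yields nothing rather than a placeholder node.
  $newNid = $idMap[$oldNid] ?? NULL;

  // skip_on_empty with method: row: an empty result skips the whole row.
  return empty($newNid) ? NULL : $newNid;
}

// Made-up ID map: Drupal 7 node 1 became Drupal 8 node 42, and so on.
$idMap = [1 => 42, 2 => 43];

var_dump(lookupNewNid($idMap, 1));  // int(42)
var_dump(lookupNewNid($idMap, 99)); // NULL
```

The real plugins work against the map tables the migrate system keeps for each migration; the array here only illustrates the shape of the lookup.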

While this eventually gets us the new, Drupal 8 node ID, we still don't even have a Drupal 7 node ID. It's buried in the source field of the path alias, prefixed by node/.

What we need to do is extract that node ID from the source field. Furthermore, we need to filter the path aliases we migrate so we only target source fields that start with node/. This is the perfect use case for a custom source plugin. So far we've been using the d7_url_alias source plugin given to us by Drupal core's Path module. We don't need to throw that plugin away; we only need to extend it a little. To do that in Drupal 8, we need to create a new subclass.

The d7_url_alias source plugin is provided to us by the Path module. A quick way to find where that plugin is defined is to search the entire Path module for the plugin ID. In doing so, we quickly find where the class is hiding:

core/modules/path
└── src
    └── Plugin
        └── migrate
            └── source
                ├── d6
                │   └── UrlAlias.php
                ├── d7
                │   └── UrlAlias.php
                └── UrlAliasBase.php

Since we're migrating from a Drupal 7 site, we use the UrlAlias class in the d7 folder. Note, however, there's a UrlAliasBase class too. At the time of this post, both the D6 and D7 UrlAlias classes are fairly sparse, with the bulk being defined in UrlAliasBase.

Now that we know what class to derive from, we can create our own custom source plugin.

  1. If you don't already have a custom module for your migration plugins, create one now. Drupal Console makes this easy!
  2. In your_migration_module/src/Plugin/migrate/source, create a new PHP file for the plugin. This tutorial uses D7NodeRedirectFromPath.php.
  3. Define a new PHP class, extending from the Drupal 7 UrlAlias class.
  4. Write an @MigrateSource annotation to specify a unique source plugin ID.

When finished, you'll have something like this:

<?php

namespace Drupal\deninet_migrate\Plugin\migrate\source;

use Drupal\path\Plugin\migrate\source\d7\UrlAlias;

/**
 * Custom migration source for Drupal 7 path aliases.
 *
 * @MigrateSource(
 *   id = "deninet_node_redirect_from_path",
 *   source_module = "node"
 * )
 */
class D7NodeRedirectFromPath extends UrlAlias {
}

I wrote the above for my own site, which is why the migration module is called deninet_migrate and the source plugin ID is deninet_node_redirect_from_path. For your source plugin, use whatever unique name makes the most sense to you!

With this finished, we should be able to update our migration *.yml to use the new source plugin:

 

...
source:
  plugin: deninet_node_redirect_from_path
  constants:
    slash: '/'
...

To make sure that took, we need to clear Drupal 8's cache using drush cr, and then re-import the migration *.yml using drush cim. Then our migration should appear in drush ms (migrate-status) just like before:

Group: deninet urls (deninet_urls)  Status  Total  Imported  Unprocessed  Last imported       
 deninet_node_redirects              Idle    3607   0         3607

Perfect. Now we know that the migration is still working and using our custom source plugin. 

We only want our source plugin to work with Drupal 7 path aliases where the source starts with node/. When we were doing this on the command line, we only needed to change our SQL query slightly by adding a new condition. We can do the same in our plugin.

Migration source plugins like UrlAlias have a query() method that provides a SQL query to the migration system. This is what is used to gather data from the database, as well as power the migration status display of drush ms. Our base class already provides a database query for all path aliases, so we just need to extend that:

<?php

namespace Drupal\deninet_migrate\Plugin\migrate\source;

use Drupal\path\Plugin\migrate\source\d7\UrlAlias;

/**
 * Drupal 7 node source from database.
 *
 * @MigrateSource(
 *   id = "deninet_node_redirect_from_path",
 *   source_module = "node"
 * )
 */
class D7NodeRedirectFromPath extends UrlAlias {

  public function query() {
    // Get the database query from the UrlAlias class.
    $query = parent::query();

    // Add our condition to filter for only node paths.
    $query->condition('ua.source', 'node/%', 'LIKE');

    // Return the modified query.
    return $query;
  }
}

If we run drush ms again, we'll see that the number of items to import has changed:

Group: deninet urls (deninet_urls)  Status  Total  Imported  Unprocessed  Last imported       
 deninet_node_redirects              Idle    2929   0         2929

This tells us that our query() is working and is now only picking up path aliases that start with node/. We're half-way there!
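For intuition, the LIKE 'node/%' condition behaves roughly like a "starts with node/" test. A minimal plain-PHP sketch, using made-up rows shaped like the url_alias table (the third row is invented for illustration):

```php
<?php

// Made-up rows shaped like the url_alias table.
$rows = [
  ['pid' => 1, 'source' => 'taxonomy/term/1', 'alias' => 'category/tags/militarystory'],
  ['pid' => 529, 'source' => 'node/1', 'alias' => 'picture/tess/2005-11-20/november-20th-2005'],
  ['pid' => 530, 'source' => 'user/7', 'alias' => 'people/example'],
];

// Rough equivalent of "source LIKE 'node/%'": keep rows whose source
// starts with node/.
$nodeRows = array_filter($rows, function (array $row): bool {
  return strncmp($row['source'], 'node/', 5) === 0;
});

echo count($rows) . ' aliases in total, ' . count($nodeRows) . " for nodes\n";
```

Note that a prefix test like this also lets through paths such as node/1/edit, so the filter alone doesn't guarantee that what follows node/ is only a node ID.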

Now we need to extract the node ID from the source field. But where? We've already queried the database, so now we need to do some additional work for each individual row. Thankfully the migration system has us covered.

For migration source plugins, we can manipulate or even add additional source data in the prepareRow() method. The method is given one argument, the row that is currently being migrated. A "row" in this case is a row of the url_alias table in our Drupal 7 database, wrapped in a Row object.

We implement the method, extracting the node ID from the source field, and provide it to the migration as a new nid field:

<?php

namespace Drupal\deninet_migrate\Plugin\migrate\source;

use Drupal\migrate\Row;
use Drupal\path\Plugin\migrate\source\d7\UrlAlias;

/**
 * Drupal 7 node source from database.
 *
 * @MigrateSource(
 *   id = "deninet_node_redirect_from_path",
 *   source_module = "node"
 * )
 */
class D7NodeRedirectFromPath extends UrlAlias {

  public function query() {
    // Get the database query from the UrlAlias class.
    $query = parent::query();

    // Add our condition to filter for only node paths.
    $query->condition('ua.source', 'node/%', 'LIKE');

    // Return the modified query.
    return $query;
  }

  public function prepareRow(Row $row) {
    // Get the source field from the row.
    $source = $row->getSourceProperty('source');

    // If it is exactly node/nodeID, capture the node ID.
    if (preg_match('/^node\/([0-9]+)$/', $source, $matches)) {
      // Provide it to the migration as the "nid" field.
      $row->setSourceProperty('nid', $matches[1]);
    }

    // Let the parent class finish preparing the row and return its result.
    return parent::prepareRow($row);
  }

}
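The extraction logic is easy to try on its own, outside Drupal. Below is a minimal sketch, assuming a hypothetical extractNid() helper that is not part of the plugin. It anchors the pattern and uses a capture group, so a path like node/12/edit yields nothing rather than a stray 12/edit:

```php
<?php

/**
 * Hypothetical helper: pull the node ID out of an internal Drupal path.
 *
 * @return int|null
 *   The node ID, or NULL if the path is not exactly node/<digits>.
 */
function extractNid(string $source): ?int {
  // Anchor the pattern so only exactly "node/<digits>" matches.
  if (preg_match('/^node\/([0-9]+)$/', $source, $matches)) {
    return (int) $matches[1];
  }
  return NULL;
}

var_dump(extractNid('node/1'));          // int(1)
var_dump(extractNid('node/12/edit'));    // NULL
var_dump(extractNid('taxonomy/term/1')); // NULL
```

Rows that end up without a nid still flow into the process section, where an empty lookup result causes them to be skipped.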

Now that our custom source plugin is providing the Drupal 7 node ID, we should be able to run it through the migration_lookup plugin to get the new, Drupal 8 node ID. We define a new field -- _nid -- in our process section to do this, referencing every migration we need to look in. Furthermore, we mark those migrations as required dependencies of this migration so they will all run in the right order. 

...
process:
  _nid:
    -
      plugin: migration_lookup
      source: nid
      migration:
        - deninet_blog
        - deninet_book
        - deninet_creation
        - deninet_gallery
        - deninet_podcast
      no_stub: true
    -
      plugin: skip_on_empty
      method: row
...
migration_dependencies:
  required:
    - deninet_blog
    - deninet_book
    - deninet_creation
    - deninet_gallery
    - deninet_podcast

But we're not finished yet. True, this gives us the new Drupal 8 node ID, but it doesn't give us a working path for use by the Redirect module. We need to prepend node/. To do that, we add yet another new field, _redirect. This field will take the _nid field and add the needed prefix:

...
source:
  plugin: deninet_node_redirect_from_path
  constants:
    prefix: node
...
  _redirect:
    plugin: concat
    source:
      - constants/prefix
      - '@_nid'
    delimiter: /
...

The _redirect field uses the concat process plugin provided to us by Drupal core. It joins the items provided to it, separated by the delimiter. Since we can't drop literal text in there, we have to define a new item in constants under the source section. This creates a new static string called prefix that contains node.
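In plain PHP terms, this step is comparable to an implode() over the source list. A small sketch, using a made-up node ID in place of the value the _nid step would produce:

```php
<?php

// Assumption: values as they would exist at this point in the pipeline.
$constants = ['prefix' => 'node'];
// Made-up Drupal 8 node ID standing in for the result of the _nid step.
$nid = 42;

// concat with "delimiter: /" behaves like implode() over the source list.
$redirect = implode('/', [$constants['prefix'], $nid]);

echo $redirect . "\n"; // node/42
```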

What's with the funky @_nid? Normally, a source value in the process section names a field provided by the migration source. Prefixing the name with @ instead refers to a field we've already defined earlier in the process section, in this case the _nid field.

Now all we have to do is set redirect_redirect/uri to @_redirect, right? RIGHT!?

No, there's one more thing. The raw node/nodeID path we've created in _redirect isn't quite in the right format for the Redirect module. It needs a bit more processing to be just right. 

If we open up the Redirect module, and take a look at the migration templates it provides, we'll notice that the redirect_redirect/uri field is run through one more, final process plugin, d7_path_redirect. This process plugin is provided by the Redirect module itself, and makes sure everything is formatted as expected. 
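To get a feel for what "formatted as expected" means, here is a rough plain-PHP approximation. It is an assumption about the general shape of the output (Drupal 8 stores internal link targets with an internal:/ prefix), not a copy of the Redirect module's code, and toRedirectUri() is a hypothetical name. Check the plugin's source for the authoritative behavior:

```php
<?php

/**
 * Hypothetical approximation of the formatting step, not the module's code.
 */
function toRedirectUri(string $path): string {
  // Absolute URLs pass through untouched.
  if (preg_match('#^https?://#', $path)) {
    return $path;
  }
  // Internal paths get the internal: scheme used for Drupal 8 link URIs.
  return 'internal:/' . ltrim($path, '/');
}

echo toRedirectUri('node/42') . "\n"; // internal:/node/42
```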

So, all we have to do is run _redirect through the process plugin, giving us our final migration *.yml:

id: deninet_node_redirects
migration_tags:
  - 'Drupal 7'
  - deninet
  - content
migration_group: deninet_urls
label: 'deninet node redirects'
source:
  plugin: deninet_node_redirect_from_path
  constants:
    prefix: node
process:
  _nid:
    -
      plugin: migration_lookup
      source: nid
      migration:
        - deninet_blog
        - deninet_book
        - deninet_creation
        - deninet_gallery
        - deninet_podcast
      no_stub: true
    -
      plugin: skip_on_empty
      method: row
  _redirect:
    plugin: concat
    source:
      - constants/prefix
      - '@_nid'
    delimiter: /
  uid:
    plugin: default_value
    default_value: 1
  language:
    plugin: default_value
    source: language
    default_value: und
  status_code:
    plugin: default_value
    default_value: 301
  redirect_source/path: alias
  redirect_redirect/uri:
    plugin: d7_path_redirect
    source:
      - '@_redirect'
destination:
  plugin: entity:redirect
migration_dependencies:
  required:
    - deninet_blog
    - deninet_book
    - deninet_creation
    - deninet_gallery
    - deninet_podcast

After running drush cim to update our configuration, we should be able to run the new migration with drush mi ourMigrationId. When finished, we can log into our site and navigate to Admin > Config > Search and Metadata > URL Redirects and see tons of shiny new redirects imported!

The Drupal 8 migration system is a lot more powerful than we think. It's a complex Extract-Transform-Load (ETL) system. With it, we can not only import content, but transform it into something new altogether. Transforming Drupal 7 path aliases into Drupal 8 redirects keeps 404s and lost SEO juice to a minimum with only a bit of migration work. 

This post was created with the support of my wonderful supporters on Patreon.

If you like this post, consider becoming a supporter at patreon.com/socketwench.

Thank you!!!