Sitecore Content Export Tool for Azure – avoiding the 230 second timeout

The Content Export Tool has always had one major limitation – the long processing time it can take. With small exports, the tool is very quick and can finish in a few seconds, but for large exports that include a ton of items or a lot of computation such as getting referrers or linked items, the export can take several minutes. This was a minor inconvenience on locally hosted Sitecore sites – the browser page will hang while the task is running and prevent you from doing other things in Sitecore, but you can get around this by opening an Incognito window or another browser and working in a new session while the task runs in the first session.

However, on Azure this created an unavoidable problem. Azure has a hard request timeout of 230 seconds which cannot be changed. Therefore, large exports in Azure were simply impossible; the only way to export the entire site was to break the task up into multiple smaller exports that are small enough to complete in under 230 seconds.

There are some clever tricks to subvert the 230 second timeout, such as writing an empty line to the response every 30 seconds to let Azure know to keep it alive. However, this did not seem to work for the Content Export Tool since the tool doesn’t just process things behind the scenes, but rather returns data in the Response; when I tried this trick, it caused the browser to download an empty CSV file. Therefore my ultimate solution was to run the export process in a background Job and write the file to the server, and use a separate request (or rather, series of requests) to check if the file had been created and download it when it had.

This is now available on GitHub and has been tested in Azure for Sitecore 9.0 and 9.3.

How it works

In order to prevent the request from timing out, the entire code to generate the CSV file has been moved into a Sitecore Job. Sitecore jobs run on a background thread and do not time out or block other Sitecore functions.

However, because the CSV is now created on a background thread, it can no longer be returned in the HTTP Response. The way the original Content Export Tool works is that it generates the CSV as a string, gives the Response the necessary headers to return a CSV file and writes and flushes the CSV string to the Response. With the Sitecore Job, that no longer works. We cannot write the file to the HTTP Response within the job; as soon as the job is started, the initial HTTP Request ends. We also cannot make the Response wait until the Job finishes, because then it would still have the original timeout problem.

The solution was to have the job write a file to the server. Meanwhile, on the client side, we would keep checking every 5 seconds to see if the file had been written, and stop checking once the file was downloaded. I haven’t included all of the code here, but the basic gist is:

  1. When the export begins, a unique download token is created and written to the hidden #txtDownloadToken input.
  2. The export job uses the value of the input to write a file with that name to the server.
  3. On page load, if #txtDownloadToken has a value, trigger the hidden btnDownloadFile button.
  4. The hidden button makes a call to DownloadFile, which checks if the file exists and downloads it.
    1. If it exists, the file is downloaded, txtDownloadToken is cleared out and a cookie is returned in the response indicating that the download is complete.
    2. Otherwise, the method returns void and btnDownloadFile will be triggered again in 5 seconds.

On the client side, the check looks like this:
function checkIfFileWritten(downloadToken) {
    console.log("checking if file is written");
    $(".loading-modal").show();

    // wait a few seconds to see if the cookie gets generated
    setTimeout(function () {
        // if the file has been written and is downloading, we can stop trying to download it
        var token = getCookie("DownloadToken");

        if (token == downloadToken) {
            $(".loading-modal").hide();
            $("#txtDownloadToken").val("");
        } else {
            // not downloaded yet: trigger the hidden button to try again
            console.log("download token doesn't match");
            $(".btnDownloadFile").click();
        }
    }, 5000);
}

And the server-side DownloadFile method:

private void DownloadFile()
{
    try
    {
        // if the cookie already matches the token, the file has already been downloaded
        var cookie = Request.Cookies["DownloadToken"];
        if (cookie != null && cookie.Value == txtDownloadToken.Value)
        {
            return;
        }

        // filePath is built from the download token elsewhere (not shown here);
        // if the job has not written the file yet, return and let the client retry
        if (!File.Exists(filePath))
        {
            return;
        }

        var fileName = !string.IsNullOrWhiteSpace(txtFileName.Value) ? txtFileName.Value : "ContentExport";

        var fileContents = File.ReadAllText(filePath);

        // helper methods (not shown) that prepare the response, set the cookie and return the file
        StartResponse(fileName);
        SetCookieAndResponse(fileContents);
    }
    catch (Exception)
    {
        // swallow the error; the client will simply try again in 5 seconds
        return;
    }
}
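
The client-side function above relies on a getCookie helper and on a download token, neither of which is shown in this post. As a rough sketch only (the names parseCookie and createDownloadToken are my own placeholders, and the real module may do this differently), they could look something like this:

```javascript
// Hypothetical helper: pull one cookie value out of a cookie string
// such as "a=1; DownloadToken=abc123". Returns null when not found.
function parseCookie(cookieString, name) {
    var parts = cookieString ? cookieString.split(";") : [];
    for (var i = 0; i < parts.length; i++) {
        var pair = parts[i].trim();
        var eq = pair.indexOf("=");
        if (eq === -1) {
            continue;
        }
        if (pair.substring(0, eq) === name) {
            return decodeURIComponent(pair.substring(eq + 1));
        }
    }
    return null;
}

// In the browser, getCookie just reads from document.cookie.
function getCookie(name) {
    return parseCookie(document.cookie, name);
}

// Hypothetical token generator: timestamp plus a random suffix.
// It only uses 0-9, a-z and "-", so it is also safe as a file name.
function createDownloadToken() {
    return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}
```

Keeping the parsing separate from document.cookie makes the helper easy to try outside the browser, e.g. parseCookie("a=1; DownloadToken=abc123", "DownloadToken").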

Pros and Cons

Pro: The 230 second timeout in Azure will no longer prevent large content exports

Pro: Because the file is now created as a background task, the user is no longer blocked from continuing to do other work in Sitecore while the export is running

Con: The window will continuously refresh every 5 seconds until it determines that the file has been created and downloaded. This makes for a slightly worse user experience

Con: As this is the first release of the Azure version, there may be some bugs. It is possible that the JavaScript might fail to detect that the file has downloaded. If the file has downloaded but the modal overlay does not disappear within a few seconds, close and reopen the window

Summary

This version of the module is intended for Azure, but can be used in on-prem instances. Personally, I like the original version because it doesn’t do the constant window reloading, but I do think the fact that the Azure version does not block consecutive requests while the export is running is a big advantage. If you are regularly doing large exports that take several minutes to run, you may want to use this version even if you’re not on Azure.

Currently the only thing that has been updated is the export feature. Moving the import feature to a background job will be the next step.

Make sure to download the correct version for 9.0 or 9.2+. Sitecore changed the class names for Jobs in 9.2, so the tool will fail to compile if you install the wrong version (if you do, just fix it by reinstalling the correct version).

Download Sitecore Media Library Files in Bulk

The latest update to the Content Export/Import Tool introduces a new feature for downloading media files in bulk.

There is a difference between downloading a Sitecore package of media items, and actually downloading the media files themselves. A Sitecore package will create a .zip file of Sitecore items that can be installed on another environment to migrate media library items. However, the actual attached images (or other media file types) cannot just be pulled from these packages. If you want to download the actual attached file, you have to browse to the Media Library item and click the Download button, and there is no OOTB way to do this for multiple items, or an entire folder, at once.

I have had a few requests from clients to help them download the actual media files from Sitecore, so this seemed like it would be a useful addition to the Content Export Tool.

Media Export feature

The Media Export is easy to use and, like the other features, uses the same filters as everything else. The Start Item(s) field is used to choose the starting folder; you can exclude Children to only pull selected media files, and use the other filters as well such as date and author.

Because the Media Library often has thousands of items, you must select a Start Item(s). If you do not, you will get a warning informing you that a start item must be set. If you really want to export the entire Media Library, you can do so by setting the Start Item to /Sitecore/Media Library.

The selected images are automatically downloaded as a compressed zip folder, and can be found in your Downloads folder.

The unzipped zip folder full of images

Content Export Tool Latest Changes 3/25: Published checkbox and User audit updates

The most recent version of the Content Export/Import Tool is Version 8.8. Here’s what’s been added since my last post:

8.5: Fix version creation when language-fallback is enabled

In the Import-Update process, new language versions were not being created if the items had Language Fallback enabled. This release fixes the issue so that new language versions can be created in the Import process when using Language Fallback.

8.6: Use selected language version(s) when date filtering

The Created and Updated Date filters previously only checked the English (or default) version of each item. This release checks the dates of the language version for more accurate filtering.

8.7: Add published flag to export options

This release introduced a checkbox to export True/False for whether each item exists in web. This does not indicate whether the item is up to date with master. This release was not language specific; it only checked if the item itself existed in web, but did not check the language version.

8.8: Add details to Published export, User audit

  • Modify the Published flag from the previous release to check if the specified language version exists in web, rather than just the item
  • Add the Up To Date flag, which compares the Modified Date in master and web to determine if the published item is up to date with master
  • Add “Status” column to the User Audit to report users that are Disabled or Locked Out
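
To make the difference between the two flags concrete, here is a small sketch of the logic in JavaScript. This is not the tool's actual code: the two lookup objects (language code to last-modified date) are a made-up data shape for illustration, and I am assuming "up to date" means the web date is not older than the master date.

```javascript
// Hypothetical inputs: for one item, a map of language code -> ISO date string
// for the master database and for the web database (null if the item is not in web).
function publishStatus(masterVersions, webVersions, language) {
    var masterModified = masterVersions[language];
    var webModified = webVersions ? webVersions[language] : undefined;

    // Published: the requested language version exists in web
    var published = webModified !== undefined;

    // Up To Date (assumption): web was modified at or after master
    var upToDate = published &&
        masterModified !== undefined &&
        Date.parse(webModified) >= Date.parse(masterModified);

    return { published: published, upToDate: upToDate };
}
```

So an item that was published in English and then edited in master would report Published = true and Up To Date = false for "en", and Published = false for any language that was never published.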

Auditing Sitecore Components with the Content Export/Import Tool

The other day, a colleague asked me if they could use the Content Export/Import Tool to find out all of the components that were being used on their site. They were in the process of upgrading from Sitecore 6.5 to Sitecore 9, and they needed to know which components were in use – there were a ton of components in the old solution, and many were likely obsolete, so they needed to know which components had to be included in the upgrade and which could be ignored.

I said, “give me a few hours.”

The result: the newest feature on the Content Export Tool, Component Audit.

The Component Audit feature is located within the Advanced Options

The Component Audit uses the same filtering options as the Content Export; you can specify one or multiple starting paths, and filter by template and created/published dates. You can also use the No Children checkbox to only audit the components on the Start Item(s). This is useful if, for example, you want the components on the Homepage only: without it, you could use the Template filters if the Homepage has a unique template, but otherwise you might get a lot of extra data that you don’t need. As a result, you can use the Component Audit to audit all of the components used throughout the entire site, the components on one single page, or the components on a subset of pages specified using the available filters.

New “No Children” option

The resulting data will let you know the location (item path) of every component, as well as the datasource (if any) and the placeholder. You can also easily determine how many times each component is used.

Results
Example report
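
If you want the usage counts without a pivot table, something like the following would do it once the rows are loaded. This is just a sketch: the row shape (component, itemPath, datasource, placeholder) is my assumption about how you might parse the report, not the exact column names in the export.

```javascript
// rows: hypothetical parsed audit rows, one per component placement
function countComponentUsage(rows) {
    var counts = new Map();
    rows.forEach(function (row) {
        counts.set(row.component, (counts.get(row.component) || 0) + 1);
    });

    // Most-used components first; ties sorted by name
    return Array.from(counts.entries())
        .map(function (entry) {
            return { component: entry[0], uses: entry[1] };
        })
        .sort(function (a, b) {
            return b.uses - a.uses || a.component.localeCompare(b.component);
        });
}
```

Any component from the old solution that never shows up in this list is a candidate for leaving out of the upgrade.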

As with the Content Export, this process can take a while when done on the entire site. Keep in mind how many items you have on your site and consider what data you need before doing a Component Audit on the entire site (/sitecore/content).

You can download the latest release here. Available soon on the Sitecore Marketplace.

Why you should use the Sitecore Content Export Tool: A comparison with Sitecore Powershell Extensions

The Sitecore PowerShell Extensions (SPE) module is an incredibly powerful tool for Sitecore, so much so that it is required for SXA. One of the features of SPE is Reporting. SPE PowerShell Reports offers a large variety of specific reports on Sitecore content, media, configuration, users, etc. It can find all broken links; all items with a particular template; items that are locked or stale.

SPE Reports

What SPE PowerShell Reports doesn’t offer

  1. The ability to report the values of specific fields
  2. The ability to combine filters/conditions in one report
  3. The ability to have multiple starting items in one report

All of the above are core features of the Content Export Tool.

Report the values of specific fields

One of the most common requests I have gotten from my clients is that they need to know the value of a particular field(s) on every item in a folder or of a specific template type. The primary purpose and reason for inception of the CET is to report the values of any desired field of Sitecore items.


The user can select as many fields as they want. Fields can be manually entered by name or ID (not case sensitive, but name must be exact). There is also a Browse feature which allows the user to select fields by template, or to search by field name; selecting fields through Browse ensures that field names will be spelled correctly. There is also the All Fields option, which will report the value of all fields on each selected item (excluding fields that begin with __). This is helpful if the user needs the values of all of the fields for each item of one particular template type, but is an expensive operation and has the potential to include a huge amount of unnecessary data, and so should only be used as needed.
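
The matching rules are easier to see in code. The sketch below is illustrative only (the tool itself is C#, and the field objects with id and name properties are a made-up shape): entries match by exact name or ID, ignoring case, and All Fields skips anything starting with a double underscore.

```javascript
// itemFields: hypothetical list of { id, name } for one item
// requested: names or IDs as typed by the user
function selectFields(itemFields, requested, allFields) {
    if (allFields) {
        // All Fields: everything except fields that begin with "__"
        return itemFields.filter(function (field) {
            return field.name.indexOf("__") !== 0;
        });
    }

    var wanted = requested.map(function (entry) {
        return entry.trim().toLowerCase();
    });

    // Exact match on name or ID, not case sensitive
    return itemFields.filter(function (field) {
        return wanted.indexOf(field.name.toLowerCase()) !== -1 ||
            wanted.indexOf(field.id.toLowerCase()) !== -1;
    });
}
```

Note that "titel" would not match a Title field here, which is exactly the kind of typo the Browse feature helps avoid.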

Combine filters/conditions in one report

The CET offers some of the same reporting functionality as SPE, such as reporting all Sitecore items with a particular template. However, these options can be combined with other filters to generate much more specific reports. The CET offers the following filtering options:

Filters

  • Template(s) – only select items of a specific template type(s)
  • Items with layout – only select items that have a layout (i.e. pages rather than components)
  • Language – only select the specified language version

In addition to those filters, there are a number of properties that can be included in a report in addition to field values:

Reportable properties

  • Item ID
  • Item name
  • Template name
  • Linked item IDs (paths of linked items are reported by default for link, image, droplist fields etc. This option will also report the linked item ID)
  • Raw HTML (paths of linked items are reported by default for image and link fields. This option will also report the raw HTML)
  • Referrers
  • Date Created
  • Date Modified
  • Created By
  • Modified By
  • Never Publish (true or false)
  • Workflow
  • Workflow State

All of these filters and property options can be combined in any combination to create highly specified and detailed reports.

Multiple starting items

The SPE Reports allow the user to specify the report root, the starting item where the scan begins. The CET offers this, but also offers the option to select multiple starting items. A user may want to get a report on items from multiple different folders, but starting at /content/Home is an expensive operation and may include a lot of unneeded data in the report. Sure, in SPE the user could run the report over and over again for each root item, and could then manually combine all of the generated spreadsheets into one, but in the CET you don’t have to do that; you can run one single report that pulls the items from all of the specified roots and get all of the results in one single spreadsheet.
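
Conceptually, a multi-root report is just "every item that sits at or under any of the chosen roots, listed once". Here is a simplified sketch of that idea; it works on a flat list of item paths, which is my own simplification and not how the tool walks the tree.

```javascript
// items: hypothetical list of { path }
// roots: the selected Start Items, as paths
function itemsUnderRoots(items, roots) {
    var normalized = roots.map(function (root) {
        // ignore trailing slashes and case
        return root.replace(/\/+$/, "").toLowerCase();
    });

    // Filtering the item list (rather than looping per root) means an item
    // is reported once even if two of the selected roots overlap
    return items.filter(function (item) {
        var path = item.path.toLowerCase();
        return normalized.some(function (root) {
            return path === root || path.indexOf(root + "/") === 0;
        });
    });
}
```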

Other awesome features

In addition to filtering by template, the CET includes the option to include templates that inherit any of the selected templates.


This can be used for multiple purposes. If the user wants a report on a large number of templates that all share a common base, they can accomplish this easily by selecting the base template and checking off the Inheritance option rather than selecting each individual template and running the risk of leaving out desired templates. Additionally, a user could run a report this way if they wanted to know all of the items there are that inherit a particular template (this would be like running the SPE “Items with template” report, but reporting all of the items that inherit the selected template instead).
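
The inheritance check has to follow base templates all the way up, not just one level. A minimal sketch of that walk is below; the baseTemplates lookup (template to its direct base templates) and the template names are hypothetical, and the real tool does this in C# against Sitecore's template API.

```javascript
// baseTemplates: hypothetical lookup, template ID/name -> array of direct base templates
function inheritsFrom(templateId, baseId, baseTemplates, seen) {
    if (templateId === baseId) {
        return true;
    }

    // guard against circular inheritance
    seen = seen || {};
    if (seen[templateId]) {
        return false;
    }
    seen[templateId] = true;

    var bases = baseTemplates[templateId] || [];
    return bases.some(function (base) {
        return inheritsFrom(base, baseId, baseTemplates, seen);
    });
}
```

With Article inheriting Page and Page inheriting Base Page, selecting Base Page with the Inheritance option checked would pick up items of all three templates.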

Advanced search

Another request I have received is to find all Sitecore items that have a word or phrase in any field. This is where the CET’s Advanced Search feature comes into play. The Advanced Search searches for the entered word or phrase in all fields (by default) of all selected items. The Advanced Search works in conjunction with the Start Item and Templates filters; the Fields filter can be used to only search for the keyword in specific fields.
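
In rough terms, the search behaves like the sketch below. The item shape (a fields object of name to value) is invented for the example, and I am assuming a case-insensitive match here; check the results against your own content if case matters to you.

```javascript
// items: hypothetical list of { path, fields: { fieldName: value } }
// fieldNames: optional list of fields to restrict the search to
function advancedSearch(items, phrase, fieldNames) {
    var needle = phrase.toLowerCase();
    var only = fieldNames && fieldNames.length
        ? fieldNames.map(function (name) { return name.toLowerCase(); })
        : null;

    return items.filter(function (item) {
        return Object.keys(item.fields).some(function (name) {
            // skip fields outside the Fields filter, if one was given
            if (only && only.indexOf(name.toLowerCase()) === -1) {
                return false;
            }
            return String(item.fields[name]).toLowerCase().indexOf(needle) !== -1;
        });
    });
}
```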

Could this be done in PowerShell?

Probably; SPE is really powerful, and there’s a lot you can accomplish with the PowerShell Console. But composing the right Sitecore query, especially with a large number of conditions and filters, can be difficult for a developer, let alone a regular content editor. The Content Export Tool provides a user-friendly, easy-to-use interface that allows the user to build out complex search conditions using simple form fields and to download that report as an Excel spreadsheet. It also provides the ability to save searches, so that if there is a report that a user frequently needs to run, they need only configure it once and then save it rather than remember how they configured it each time.

How the Content Export Tool could be incorporated into SPE

Easy! The SPE Reporting menu already has a Content Audit menu. The CET could be incorporated as a part of SPE by simply adding an item to this menu which would open the Content Export Tool window. I for one think it would be a great addition to SPE… but in the meantime, you can download it from the Sitecore Marketplace and use it as an independent tool for all of the custom content audits you or your clients need.

The Content Export Tool is tested with Sitecore 6.5 – 8.2

 

Content Export Tool for Sitecore

The Content Export Tool is a new Sitecore module that allows Sitecore users to easily export data from the content tree. This post outlines the uses for this tool and provides setup and usage documentation.

Introduction

In my work as a Sitecore developer, I have had many occasions where clients have requested audits of their Sitecore content, such as needing to know all the items in a certain folder with the values of particular fields for those items.  For these requests, I would create a custom utility page so that they could export the data they needed, but as I found I was writing basically the same code over and over with different hardcoded values, I decided to create a reusable module that could handle all of these requests dynamically. This module works on all versions of Sitecore through 8.2.

Download

The source code and Sitecore installation package can also be downloaded at
https://github.com/estockwell-alpert/ContentExportTool.

Setup

Setup is simple. All you need to do is download the package Content Export Tool.zip and install this in your Sitecore instance using the Installation Wizard. The Content Export button should appear in the Start Menu once Sitecore restarts. That’s it!

How to Use

The tool allows the user to run exports ranging from minimal data to hundreds of fields. If the user clicks the Run Export button with no fields filled out/checked off, it will return a spreadsheet with the item path of every item in the Sitecore content tree. The following fields can be used to further customize the data:

  • Database – Select which database to retrieve content from. The Database dropdown is populated with all of the available databases for the website.
  • Include Ids – Check this box to include the Guid IDs of each item
  • Start Item – If this field is blank, the entire content tree will be scanned. If this field is populated, then the export will only include items beneath and including the item specified. Start Item can be specified by path or ID. The Start Item can be manually entered, or you can use the Browse button to search the content tree for the item you want (Browse can take a while to load).
  • Fast Query – Enter a fast query to select the items you want to export. This field overrides the Start Items. The tool will only export items returned by the fast query. You can use the Test button to see whether the query works and how many items it will return.
  • Templates – If this field is blank, all items under the start node will be included. If templates are selected, only items of the selected template types will be included. Templates can be specified by name or ID; if name is used, then if multiple templates have the same name, items of all templates with that name will be included. You can use the Browse button to choose template names from the Sitecore tree, or enter names or IDs manually.
  • Include Template Name – Check this box to include the template name of each item in the spreadsheet
  • Fields – Enter the names or IDs of all the fields that you want to get the values of. The tool determines the type of each field and handles the data accordingly; this can handle string, rich text, image, link, droplist, multilist, and checkbox fields. Field names or IDs can be manually entered, or you can use Browse to select fields. If the Templates field is populated, then Browse will only suggest fields that belong to selected templates.
  • Include linked item IDs – This will include the item ID for any field that links to an item, such as images, links, droplists, and multilists (the linked item path is returned by default).
  • Include raw HTML – This will include the raw HTML that is returned by image and link fields
  • Workflow – This includes the workflow that each item uses
  • Workflow State – This includes the current workflow state that the item is in
  • Get All Language Versions – This includes the language of each item

Settings can be saved in case you want to run the same export multiple times. The Save Settings button will save the current configuration (you must enter a name to save as first). Selecting settings from the dropdown will populate all of the fields based on the saved configuration. You can delete a saved configuration when it is currently selected using the Delete button. You cannot save multiple configurations with the same name, but if you want to overwrite an existing configuration, you can enter the name of that configuration and click Save Settings (you will be prompted to confirm whether you want to overwrite).

Conclusion

The Content Export Tool can make auditing Sitecore content much easier for both developers and Sitecore users. It makes auditing content simple: you can easily find the locations of all items of a particular template type; retrieve the field values of thousands of items; get all of the items that exist in a particular folder; and many more possibilities. The code is open source and easily customizable as well. I hope you will find this module useful!