The Content Export Tool has always had one major limitation: long processing times. Small exports finish in a few seconds, but large exports that include a ton of items or a lot of computation, such as getting referrers or linked items, can take several minutes. This was a minor inconvenience on locally hosted Sitecore sites. The browser page hangs while the task is running and prevents you from doing other things in Sitecore, but you can get around this by opening an Incognito window or another browser and working in a new session while the task runs in the first session.
However, on Azure this created an unavoidable problem. Azure has a hard request timeout of 230 seconds which cannot be changed. Therefore, large exports in Azure were simply impossible; the only way to export the entire site was to break the task up into multiple exports, each small enough to complete in under 230 seconds.
There are some clever tricks to get around the 230-second timeout, such as writing an empty line to the response every 30 seconds to tell Azure to keep the request alive. However, this did not seem to work for the Content Export Tool, since the tool doesn’t just process things behind the scenes but returns data in the response; when I tried this trick, it caused the browser to download an empty CSV file. Therefore, my ultimate solution was to run the export process in a background Job that writes the file to the server, and to use a separate request (or rather, a series of requests) to check whether the file had been created and download it once it had.
This is now available on GitHub and has been tested in Azure for Sitecore 9.0 and 9.3.
How it works
In order to prevent the request from timing out, the entire code to generate the CSV file has been moved into a Sitecore Job. Sitecore jobs run on a background thread and do not time out or block other Sitecore functions.
However, because the CSV is now created on a background thread, it can no longer be returned in the HTTP response. The original Content Export Tool generates the CSV as a string, gives the response the headers needed to return a CSV file, then writes and flushes the CSV string to the response. With the Sitecore Job, that no longer works. We cannot write the file to the HTTP response within the job; as soon as the job is started, the initial HTTP request ends. We also cannot make the response wait until the Job finishes, because then it would still have the original timeout problem.
The solution was to have the job write a file to the server. Meanwhile, on the client side, we would keep checking every 5 seconds to see if the file had been written, and stop checking once the file was downloaded. I haven’t included all of the code here, but the basic gist is:
- When the export begins, a unique download token is created and written to the hidden #txtDownloadToken input.
- The export job uses the value of the input to write a file with that name to the server.
- On page load, if #txtDownloadToken has a value, the hidden btnDownloadFile button is triggered.
- The hidden button makes a call to DownloadFile, which checks whether the file exists.
  - If it exists, the file is downloaded, txtDownloadToken is cleared out and a cookie is returned in the response indicating that the download is complete.
  - Otherwise, the method returns without sending a file and btnDownloadFile will be triggered again in 5 seconds.
function checkIfFileWritten(downloadToken) {
    console.log("checking if file is written");
    $(".loading-modal").show();
    // wait a few seconds to see if the cookie gets generated
    setTimeout(function () {
        // if the file has been written and is downloading, we can stop trying to download it
        var token = getCookie("DownloadToken");
        if (token === downloadToken) {
            $(".loading-modal").hide();
            $("#txtDownloadToken").val("");
        } else {
            console.log("download token doesn't match");
            $(".btnDownloadFile").click();
        }
    }, 5000);
}
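The function above calls getCookie, which is not shown in this post. The following is only a minimal sketch of what such a helper might look like, not the module's actual code. The parseCookie function is a hypothetical name introduced here so the lookup can be exercised outside a browser.

```javascript
// Hypothetical helper: find one cookie in a "name=value; other=value" string.
// Returns the decoded value, or null if the cookie is not present.
function parseCookie(cookieString, name) {
  const pairs = cookieString ? cookieString.split(";") : [];
  for (const pair of pairs) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip fragments without a value
    const key = pair.slice(0, index).trim();
    if (key !== name) continue;
    const raw = pair.slice(index + 1).trim();
    try {
      return decodeURIComponent(raw);
    } catch (e) {
      // malformed percent-encoding: fall back to the raw value
      return raw;
    }
  }
  return null;
}

// Assumed shape of the getCookie used by checkIfFileWritten.
// Outside a browser there is no document, so it reads an empty string.
function getCookie(name) {
  const source = typeof document !== "undefined" ? document.cookie : "";
  return parseCookie(source, name);
}
```

With a helper like this, getCookie("DownloadToken") returns null until the server has set the cookie, so the comparison against the download token keeps failing and the hidden button keeps getting clicked.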
private void DownloadFile()
{
    try
    {
        // if the cookie already matches the token, the file has already been downloaded
        var cookie = Request.Cookies["DownloadToken"];
        if (cookie != null && cookie.Value == txtDownloadToken.Value)
        {
            return;
        }

        // filePath is set elsewhere (not shown here)
        // if the file has not been written yet, return and let the client try again
        if (!File.Exists(filePath))
        {
            return;
        }

        var fileName = !string.IsNullOrWhiteSpace(txtFileName.Value) ? txtFileName.Value : "ContentExport";
        var fileContents = File.ReadAllText(filePath);

        // helpers not shown: start the file response, then set the DownloadToken cookie and write the contents
        StartResponse(fileName);
        SetCookieAndResponse(fileContents);
    }
    catch (Exception)
    {
        // swallow the error; the client will trigger another attempt on its next check
        return;
    }
}
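To make the back-and-forth between the two snippets easier to follow, here is a simplified model of the token and cookie handshake in plain JavaScript. It is a sketch under assumptions, not the module's code: handleDownloadRequest and runPolling are hypothetical names, the server is just a set of file names, and the 5-second wait is replaced by a loop counter.

```javascript
// Hypothetical model of the server side: mirrors the three branches of DownloadFile.
function handleDownloadRequest(server, request) {
  if (request.cookieToken === request.token) {
    return { status: "already-downloaded" };
  }
  if (!server.files.has(request.token)) {
    return { status: "not-ready" };
  }
  // The file exists: send it and set the cookie the client is watching for.
  return { status: "file-sent", setCookie: request.token };
}

// Hypothetical model of the client side: one button click per poll.
// jobFinishesAfterPolls stands in for however long the background job takes.
function runPolling(server, token, jobFinishesAfterPolls, maxPolls) {
  const history = [];
  let cookieToken = null;
  for (let poll = 1; poll <= maxPolls; poll++) {
    // Pretend the background job has written its file by this point.
    if (poll > jobFinishesAfterPolls) {
      server.files.add(token);
    }
    const response = handleDownloadRequest(server, { token, cookieToken });
    history.push(response.status);
    if (response.setCookie) {
      cookieToken = response.setCookie;
    }
    // Same check as in checkIfFileWritten: stop once the cookie matches.
    if (cookieToken === token) {
      break;
    }
  }
  return history;
}

// Example: the job needs two polls' worth of time before the file appears.
console.log(runPolling({ files: new Set() }, "abc", 2, 10));
```

In this model the first two requests return without a file, the third sends the file and sets the cookie, and the client stops polling. The maxPolls cap exists only to keep the simulation finite; the code shown in this post keeps retrying until the cookie matches.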
Pros and Cons
Pro: The 230-second timeout in Azure will no longer prevent large content exports
Pro: Because the file is now created as a background task, the user is no longer blocked from continuing to do other work in Sitecore while the export is running
Con: The window will refresh every 5 seconds until it determines that the file has been created and downloaded. This makes for a slightly worse user experience
Con: As this is the first release of the Azure version, there may be some bugs. It is possible that the JavaScript might fail to detect that the file has downloaded. If the file has downloaded but the modal overlay does not disappear within a few seconds, close and reopen the window
Summary
This version of the module is intended for Azure, but can be used in on-prem instances. Personally, I like the original version because it doesn’t do the constant window reloading, but I do think the fact that the Azure version does not block other requests while the export is running is a big advantage. If you are regularly doing large exports that take several minutes to run, you may want to use this version even if you’re not on Azure.
Currently the only thing that has been updated is the export feature. Moving the import feature to a background job will be the next step.
Make sure to download the correct version for 9.0 or 9.2+. Sitecore changed the class names for Jobs in 9.2, so the tool will fail to compile if you install the wrong version (if you do, just fix it by reinstalling the correct version).