Improving Alfresco datalist performance

Let’s talk about Datalists… (yes, that was a reference to the wat talk). If you have worked with Datalists in Alfresco and added more than a few hundred rows to a table (maybe even more than 1000), you’ll have noticed a significant increase in loading time. To understand why that is, let’s have a look at what’s going on behind the scenes when you open a datalist in the grid view:

  • The webscript e.g. /datalists/data/node/workspace/SpacesStore/dae615e9-1d08-483f-a19b-69b30b5bffed (that’s the Datalist’s nodeRef) is called
  • A list of columns (from Share’s form configuration) and filter criteria are sent to the repository as well
  • The repository webscript runs a lucene search (using the filter criteria, if set)
  • It loops over the result
  • For each result, Evaluator.run() (from evaluator.lib.js) is called
  • Evaluator.run() uses the FormService to build the fields
  • The FormService builds the complete form from the data model (including model, data values, constraints)
  • For each field with a ListConstraint, all the values are localized (new in 3.4.8)
  • A ListValueConstraint’s getAllowedValues() and getDisplayLabel() are called potentially thousands of times

What we know

What’s wrong here? Several things. It starts with the search, which is not very fast. In Alfresco 3.4.x a Lucene PATH query is used here, which is known to be slow, especially with large repositories or when the Lucene index is not stored on a really fast disk. For the document library, Jeff Potts has suggested using node.children instead of the Lucene PATH query to speed things up. Of course, this only works if no filters are used.
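
To make the node.children idea concrete, here is a minimal sketch of how the webscript could skip Lucene whenever no filter is set. The helper name getListItems, the stub objects and the query string are made up for illustration; only node.children and search.luceneSearch() are taken from the repository JavaScript API, and this is not the code Alfresco ships.

```javascript
// Hypothetical helper: pick the cheapest way to fetch the rows of a datalist.
// `listNode` stands in for the datalist's ScriptNode and `search` for the
// repository's search root object. Both are stubbed below so that the sketch
// also runs outside of Alfresco.
function getListItems(listNode, search, filterQuery) {
  if (!filterQuery) {
    // No filter: read the children directly, no Lucene involved.
    return listNode.children;
  }
  // Filter set: fall back to the (slower) Lucene query.
  return search.luceneSearch(filterQuery);
}

// Minimal stand-ins, for illustration only.
var luceneCalls = 0;
var listNode = { children: [{ name: "row-1" }, { name: "row-2" }] };
var search = {
  luceneSearch: function (query) {
    luceneCalls++;
    return [{ name: "row-2" }];
  }
};

var unfiltered = getListItems(listNode, search, null);
var filtered = getListItems(listNode, search, "placeholder filter query");
```

Only the filtered call touches the search service, so the common case of opening an unfiltered list never pays for the PATH query.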

Another issue that adds to the performance problems of the datalists is that the whole datalist is read into the browser at once. There is no server-side paging implemented for datalists, which means that if you have 1000 rows in the datalist, you have to wait for the browser to load all 1000 rows before the first 50 are displayed. Stefano Argentiero has suggested a modification to the Alfresco datalist to add server-side pagination, which would also help speed up the datalist rendering.
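
The core of server-side paging is small. The following sketch is not Stefano’s modification, just an illustration of the idea; the function name getPage and the parameter names skipCount and maxItems are assumptions.

```javascript
// Hypothetical paging helper: return one page of rows plus paging metadata,
// so that only the rows of the requested page have to be evaluated and sent.
function getPage(items, skipCount, maxItems) {
  var start = Math.max(0, skipCount);
  var page = items.slice(start, start + maxItems);
  return {
    totalItems: items.length,
    skipCount: start,
    items: page,
    hasMore: start + page.length < items.length
  };
}

// 1000 fake rows, paged in blocks of 50.
var rows = [];
for (var i = 0; i < 1000; i++) {
  rows.push("row-" + i);
}
var firstPage = getPage(rows, 0, 50);
var lastPage = getPage(rows, 950, 50);
```

If the slicing happens before the per-row evaluation, only 50 rows go through the expensive part of the webscript instead of all 1000.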

Stefano Argentiero realized that running the Evaluator.run() function for each row takes a lot of time, but why is that? Evaluator.run() calls the FormService for each row and constructs the whole form. This means that all field definitions are built, including constraints, and for ListConstraints even all the individual constraint values are collected. Since the fields are only displayed in the datalist (mode="view"), these constraint values are never needed at all.

The new localization doesn’t help

With a change in Alfresco 3.4.9 (ALF-9514), each of these constraint values is even localized. That means if you have 1000 rows in the table and one field with a list constraint of 100 values (not uncommon), your ListConstraint’s getDisplayLabel() and getAllowedValues() methods are easily called 100,000 times when the datalist model is built in the webscript.
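
The multiplication can be reproduced with a small simulation. The constraint object here is a plain JavaScript stand-in, not Alfresco’s Java class, and countLabelLookups is a made-up name; the sketch only counts the label lookups that happen when the form is rebuilt for every row.

```javascript
// Count how often the label lookup runs when every row triggers the
// localization of every allowed value of one list constraint.
function countLabelLookups(rowCount, allowedValues) {
  var lookups = 0;
  // Stand-in for the constraint, for illustration only.
  var constraint = {
    getAllowedValues: function () { return allowedValues; },
    getDisplayLabel: function (value) { lookups++; return value; }
  };
  for (var row = 0; row < rowCount; row++) {
    var values = constraint.getAllowedValues();
    for (var i = 0; i < values.length; i++) {
      constraint.getDisplayLabel(values[i]);
    }
  }
  return lookups;
}

// 1000 rows, one field with 100 allowed values.
var allowed = [];
for (var v = 0; v < 100; v++) {
  allowed.push("value-" + v);
}
var totalLookups = countLabelLookups(1000, allowed); // 1000 * 100 lookups
```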

Anyone upgrading to Alfresco >=3.4.8 who is using custom ListConstraints should make sure that…

  • the getAllowedValues() method returns a cached List as quickly as possible
  • the getDisplayLabel() method is overridden to simply return the value, skipping the localization code, if localized constraint values are not needed
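
A real custom constraint is a Java class, so the following is only an illustration of the two points above, written in JavaScript to match the rest of the code in this post. createCachedConstraint and loadValues are hypothetical names.

```javascript
// Pattern sketch: load the allowed values once, then serve them from memory,
// and return the raw value as the label so no localization lookup happens.
function createCachedConstraint(loadValues) {
  var cache = null;
  return {
    getAllowedValues: function () {
      if (cache === null) {
        cache = loadValues(); // the expensive lookup runs only once
      }
      return cache;
    },
    getDisplayLabel: function (value) {
      return value; // no localization, just the value itself
    }
  };
}

// Stand-in for a slow source of values (database, remote system, ...).
var loadCount = 0;
var cachedConstraint = createCachedConstraint(function () {
  loadCount++;
  return ["open", "in progress", "closed"];
});

for (var n = 0; n < 1000; n++) {
  cachedConstraint.getAllowedValues();
}
```

After 1000 calls the slow source has been asked exactly once. Whether and when such a cache needs to be refreshed depends on where the values come from.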

What can be done?

Apart from the issues that this new localization introduced, I still wanted to have the datalist displayed as fast as possible. Calling the FormService and building the whole form over and over again for each row didn’t strike me as a smart approach. I looked into evaluator.lib.js to see if the call can be removed for good ;-)

This is the block from evaluator.lib.js that calls the FormService and extracts the data that is returned:

// Use the form service to parse the required properties
scriptObj = formService.getForm("node", node.nodeRef, fields, fields);

// Make sure we can quickly look-up the Field Definition within the formData loop...
var objDefinitions = {};
for each (formDef in scriptObj.fieldDefinitions)
{
   objDefinitions[formDef.dataKeyName] = formDef;
}

// Populate the data model
var formData = scriptObj.formData.data;
for (var k in formData)
{
   var isAssoc = k.indexOf("assoc") == 0,
      value = formData[k].value,
      values,
      type = isAssoc ? objDefinitions[k].endpointType : objDefinitions[k].dataType,
      endpointMany = isAssoc ? objDefinitions[k].endpointMany : false,
      objData =
      {
         type: type
      };

   if (value instanceof java.util.Date)
   {
      objData.value = utils.toISO8601(value);
      objData.displayValue = objData.value;
      nodeData[k] = objData;
   }
   else if (endpointMany)
   {
      if (value.length() > 0)
      {
         values = value.split(",");
         nodeData[k] = [];
         for each (value in values)
         {
            var objLoop =
            {
               type: objData.type,
               value: value,
               displayValue: value
            };

            if (Evaluator.decorateFieldData(objLoop, node))
            {
               nodeData[k].push(objLoop);
            }
         }
      }
   }
   else
   {
      objData.value = value;
      objData.displayValue = objData.value;

      if (Evaluator.decorateFieldData(objData, node))
      {
         nodeData[k] = objData;
      }
   }
}

This can easily be replaced by the following block of code, which doesn’t use the FormService at all:

// Populate the data model straight from the node (no FormService involved)
for each (k in fields)
{
   var value = node.properties[k],
      objData = {},
      formkey;

   // Compare against null so that a boolean false is not mistaken for "no property"
   if (value != null)
   {
      formkey = "prop_" + k.replace(":", "_");

      if (typeof value == "boolean")
      {
         objData.type = "boolean";
         objData.value = value;
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else if (value.getMonth)
      {
         // Only Date objects have getMonth()
         objData.type = "date";
         objData.value = utils.toISO8601(value);
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else
      {
         // Everything else is treated as text
         objData.type = "text";
         objData.value = value;
         objData.displayValue = objData.value;
         if (Evaluator.decorateFieldData(objData, node))
         {
            nodeData[formkey] = objData;
         }
      }
   }
   else
   {
      // No property value: check for an association with the same name
      var assoc = node.assocs[k];
      if (assoc && assoc.length > 0)
      {
         formkey = "assoc_" + k.replace(":", "_");
         nodeData[formkey] = [];

         for each (var target in assoc)
         {
            // Build a new object per target; reusing objData would push
            // the same object into the array several times
            var objAssoc =
            {
               type: target.typeShort,
               value: target.nodeRef,
               displayValue: target.nodeRef
            };
            if (Evaluator.decorateFieldData(objAssoc, target))
            {
               nodeData[formkey].push(objAssoc);
            }
         }
      }
   }
}
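
The keys in nodeData have to match the names the form-based version produced, which is why the code above turns the prefixed property name into a "prop_" or "assoc_" key. As a small standalone sketch of that mapping (toFormKey is a hypothetical helper and my:assignee a made-up association name):

```javascript
// Build the key under which a field is stored in nodeData.
// `kind` is "prop" or "assoc", `shortQName` a prefixed name such as "cm:title".
function toFormKey(kind, shortQName) {
  // A prefixed name contains exactly one colon, so replacing the first is enough.
  return kind + "_" + shortQName.replace(":", "_");
}

var propKey = toFormKey("prop", "cm:title");
var assocKey = toFormKey("assoc", "my:assignee");
```

With these inputs propKey is "prop_cm_title" and assocKey is "assoc_my_assignee".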

Does it help?

I tested this change using a datalist with 1000 rows and 6 columns, with one column having a list constraint (about 30 values). With this configuration the original webscript returned in about 2.4s (on a fast dev system). Using my patched evaluator.lib.js without the call to the FormService, the same datagrid is displayed in 870ms. That is roughly 2.8 times as fast. Depending on the number of rows and constraints, your results may vary.

I didn’t need all the features that can be modeled in the datalist so this change does NOT support all the features of the original implementation:

  • Multivalue properties are currently not supported (could be easily added)
  • NodeRef properties are currently not supported (could be easily added)
  • You cannot change the datalist’s values in the grid using FormFilters any more
  • Maybe other things are broken…
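
As a rough idea of how multivalue support could be added: the sketch below assumes that a multi-valued property arrives in the script as a JavaScript array, which I have not verified for every property type. toGridData is a hypothetical helper, and every value is treated as text.

```javascript
// Turn a single- or multi-valued property into a list of grid data entries.
function toGridData(value) {
  var values = (value instanceof Array) ? value : [value];
  var result = [];
  for (var i = 0; i < values.length; i++) {
    result.push({
      type: "text",
      value: values[i],
      displayValue: values[i]
    });
  }
  return result;
}

var single = toGridData("red");
var multi = toGridData(["red", "green", "blue"]);
```

A single value yields one entry and an array yields one entry per element, similar to what the original code does for associations with many targets.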

Maybe this will help some of you out there improve the performance of the datalists in your Alfresco solutions.