Let’s talk about Datalists… (yes, that was a reference to the wat talk). If you have worked with Datalists in Alfresco and added more than a few hundred rows to a table (maybe even more than 1000), you’ll have noticed a significant increase in loading time. To understand why that is, let’s have a look at what’s going on behind the scenes when you open a datalist in the grid view:

The webscript e.g. /datalists/data/node/workspace/SpacesStore/dae615e9-1d08-483f-a19b-69b30b5bffed (that’s the Datalist’s nodeRef) is called
A list of columns (from Share’s form configuration) and filter criteria are sent to the repository as well
The repository webscript runs a lucene search (using the filter criteria, if set)
It loops over the result
For each result, Evaluator.run() (from evaluator.lib.js) is called
Evaluator.run() uses the FormService to build the fields
The FormService builds the complete form from the data model (including model, data values, constraints)
For each field with a ListConstraint all the values are localized (new in 3.4.8)
A ListValueConstraint’s getAllowedValues() and getDisplayLabel() are called potentially thousands of times.

What we know

What’s wrong here? Several things. It starts with the search, which is not very fast. In Alfresco 3.4.x a Lucene PATH query is used here, which is known to be slow, especially with large repositories or when the Lucene index is not stored on a really fast disk. For the document library, Jeff Potts has suggested using node.children instead of the Lucene PATH query to speed things up. Of course this only works if no filters are used.
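
To make the idea concrete, here is a minimal sketch of how a webscript could choose between the two approaches. The function name getDataListRows, the searchService parameter and the exact query string are assumptions made for illustration and are not the code Alfresco ships; children, qnamePath and luceneSearch() are the repository JavaScript API names.

```javascript
// Returns the rows of a datalist. Without filter criteria the child nodes are
// read directly; only a filtered request falls back to the Lucene search.
// "searchService" is passed in to keep the sketch testable; in a repository
// webscript it would be the "search" root object.
function getDataListRows(listNode, filterQuery, searchService) {
  if (!filterQuery) {
    return listNode.children;
  }
  // Illustrative query only; the real webscript builds its own query string.
  var query = '+PATH:"' + listNode.qnamePath + '/*" ' + filterQuery;
  return searchService.luceneSearch(query);
}

// Stubbed objects so the sketch runs outside Alfresco.
var fakeList = { qnamePath: "/app:company_home/cm:list", children: ["row1", "row2"] };
var fakeSearch = {
  luceneSearch: function (query) {
    return ["hit for " + query];
  }
};

console.log(getDataListRows(fakeList, null, fakeSearch).length); // 2
```

The filtered case still pays for the Lucene query, so this only helps for the plain "show all rows" view.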

Another issue with the search that adds to the performance problems of the datalists is that the whole datalist is read into the browser at once. There is no server-side paging implemented for datalists, which means that if you have 1000 rows in the datalist you have to wait for the browser to load all 1000 rows before the first 50 are displayed. Stefano Argentiero has suggested a modification to the Alfresco datalist to add server-side pagination, which would also help speed up the datalist rendering.
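
Server-side paging boils down to cutting the result down to one page before the expensive per-row work starts. The following is only a sketch of that idea, not Stefano Argentiero’s modification; the function name pageRows, its parameters and the default page size of 50 are hypothetical, and the rows are assumed to be a plain JavaScript array.

```javascript
// Cuts a full result list down to one page. Only the returned items would
// then need to be passed through the per-row evaluator.
function pageRows(rows, startIndex, pageSize) {
  var start = startIndex > 0 ? startIndex : 0;
  var size = pageSize > 0 ? pageSize : 50; // hypothetical default page size
  return {
    totalRecords: rows.length,
    startIndex: start,
    items: rows.slice(start, start + size)
  };
}

// Example: 1000 fake rows, first page of 50.
var allRows = [];
for (var r = 0; r < 1000; r++) {
  allRows.push("row-" + r);
}
var firstPage = pageRows(allRows, 0, 50);
console.log(firstPage.items.length); // 50
console.log(firstPage.totalRecords); // 1000
```

The client still needs totalRecords to render its pager, which is why the sketch returns it alongside the page.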

Stefano Argentiero realized that running the Evaluator.run() function for each row takes a lot of time, but why is that? The Evaluator function calls the FormService for each row and constructs the whole form. This means that all field definitions are built, including constraints, and for ListConstraints even all the individual constraint values are collected. Since the fields are only displayed in the datalist (mode=”view”), these constraint values are never needed at all.

The new localization doesn’t help

With a change in Alfresco 3.4.9 (ALF-9514) each of these constraint values is even localized. That means if you have 1000 rows in the table and one field with a list constraint of 100 values (not uncommon), your ListConstraint’s getDisplayLabel() and getAllowedValues() methods are easily called 100,000 times (1000 rows × 100 values) when the datalist model is built in the webscript.
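
The multiplication is easy to reproduce with a small simulation. The constraint object below is a hypothetical stand-in, not the real Alfresco class, and the sketch assumes one getAllowedValues() call per row and one getDisplayLabel() call per value; the real FormService may well call them more often.

```javascript
// Hypothetical stand-in for a list constraint that counts how often it is
// asked for its values and labels.
function createCountingConstraint(values) {
  var calls = { getAllowedValues: 0, getDisplayLabel: 0 };
  return {
    calls: calls,
    getAllowedValues: function () {
      calls.getAllowedValues++;
      return values;
    },
    getDisplayLabel: function (value) {
      calls.getDisplayLabel++;
      return value; // no real localization here
    }
  };
}

// Mimics building the field definition once per row: every row asks for all
// allowed values and localizes each of them.
function buildFormForEveryRow(rowCount, constraint) {
  for (var row = 0; row < rowCount; row++) {
    var allowed = constraint.getAllowedValues();
    for (var i = 0; i < allowed.length; i++) {
      constraint.getDisplayLabel(allowed[i]);
    }
  }
  return constraint.calls;
}

var constraintValues = [];
for (var v = 0; v < 100; v++) {
  constraintValues.push("value-" + v);
}
var calls = buildFormForEveryRow(1000, createCountingConstraint(constraintValues));
console.log(calls.getDisplayLabel);  // 100000
console.log(calls.getAllowedValues); // 1000
```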

Anyone upgrading to Alfresco >=3.4.8 who is using custom ListConstraints should make sure that…

Your getAllowedValues() method returns a cached List as quickly as possible
If you don’t need localized constraint values, override the getDisplayLabel() method to simply return the value and skip the localization code.

What can be done?

Apart from the issues that this new localization introduced, I still wanted to have the datalist displayed as fast as possible. Calling the FormService and building the whole form over and over again for each row didn’t strike me as a smart approach. I looked into evaluator.lib.js to see if the call can be removed for good.

This is the block from evaluator.lib.js that calls the FormService and extracts the data that is returned:

// Use the form service to parse the required properties
scriptObj = formService.getForm("node", node.nodeRef, fields, fields);

// Make sure we can quickly look-up the Field Definition within the formData loop...
var objDefinitions = {};
for each (formDef in scriptObj.fieldDefinitions)
{
   objDefinitions[formDef.dataKeyName] = formDef;
}

// Populate the data model
var formData = scriptObj.formData.data;
for (var k in formData)
{
   var isAssoc = k.indexOf("assoc") == 0,
      value = formData[k].value,
      values,
      type = isAssoc ? objDefinitions[k].endpointType : objDefinitions[k].dataType,
      endpointMany = isAssoc ? objDefinitions[k].endpointMany : false,
      objData =
      {
         type: type
      };

   if (value instanceof java.util.Date)
   {
      objData.value = utils.toISO8601(value);
      objData.displayValue = objData.value;
      nodeData[k] = objData;
   }
   else if (endpointMany)
   {
      if (value.length() > 0)
      {
         values = value.split(",");
         nodeData[k] = [];
         for each (value in values)
         {
            var objLoop =
            {
               type: objData.type,
               value: value,
               displayValue: value
            };

            if (Evaluator.decorateFieldData(objLoop, node))
            {
               nodeData[k].push(objLoop);
            }
         }
      }
   }
   else
   {
      objData.value = value;
      objData.displayValue = objData.value;

      if (Evaluator.decorateFieldData(objData, node))
      {
         nodeData[k] = objData;
      }
   }
}

This can easily be replaced by the following block of code, which doesn’t use the FormService at all:

// Populate the data model directly from the node, without the FormService
for each (var k in fields)
{
   var value = node.properties[k],
      objData = {},
      formkey;

   // != null (rather than a truthiness test) keeps false, 0 and "" as property values
   if (value != null)
   {
      formkey = "prop_" + k.replace(":", "_");

      if (typeof value == "boolean")
      {
         objData.type = "boolean";
         objData.value = value;
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else if (value.getMonth)
      {
         // Date property
         objData.type = "date";
         objData.value = utils.toISO8601(value);
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else
      {
         objData.type = "text";
         objData.value = value;
         objData.displayValue = objData.value;
         if (Evaluator.decorateFieldData(objData, node))
         {
            nodeData[formkey] = objData;
         }
      }
   }
   else
   {
      // No property with that name, so treat the field as an association
      var assoc = node.assocs[k];
      if (assoc && assoc.length > 0)
      {
         formkey = "assoc_" + k.replace(":", "_");
         nodeData[formkey] = [];

         for each (var target in assoc)
         {
            // Fresh object per target; reusing one object would make every
            // entry in the array point at the last association
            var objLoop =
            {
               type: target.typeShort,
               value: target.nodeRef,
               displayValue: target.nodeRef
            };
            if (Evaluator.decorateFieldData(objLoop, target))
            {
               nodeData[formkey].push(objLoop);
            }
         }
      }
   }
}

Does it help?

I tested this change using a datalist with 1000 rows and 6 columns, with one column having a list constraint (about 30 values). With this configuration the original webscript returned in about 2.4s (on a fast dev system). Using my patched evaluator.lib.js without the call to the FormService, the same datagrid is displayed in 870ms. That is roughly 2.8 times faster. Depending on the number of rows and constraints your results may vary.

I didn’t need all the features that can be modeled in the datalist, so this change does NOT support all the features of the original implementation:

Multivalue properties are currently not supported (could be easily added)
NodeRef properties are currently not supported (could be easily added)
You cannot change the datalist’s values in the grid using FormFilters any more
Maybe other things are broken…
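
As a rough idea of how multi-value support could be added, the sketch below pulls the per-property logic into a helper that also accepts arrays. It assumes that multi-valued properties arrive as a JavaScript array and that the grid is happy with one comma-separated text value; both are assumptions that are not verified against the original implementation. The names buildPropertyData and isoStub are hypothetical, and isoStub only stands in for utils.toISO8601() so the code runs outside Alfresco.

```javascript
// Builds the grid data for one property value. Arrays are flattened into a
// single comma-separated text value (an assumption about what the grid expects).
function buildPropertyData(value, toISO8601) {
  if (value instanceof Array) {
    var parts = [];
    for (var i = 0; i < value.length; i++) {
      parts.push(buildPropertyData(value[i], toISO8601).displayValue);
    }
    var joined = parts.join(", ");
    return { type: "text", value: joined, displayValue: joined };
  }
  if (typeof value == "boolean") {
    return { type: "boolean", value: value, displayValue: value };
  }
  if (value && value.getMonth) {
    var iso = toISO8601(value);
    return { type: "date", value: iso, displayValue: iso };
  }
  var text = value == null ? "" : value;
  return { type: "text", value: text, displayValue: text };
}

// Stand-in for utils.toISO8601() so the sketch runs outside Alfresco.
function isoStub(date) {
  return date.toISOString();
}

console.log(buildPropertyData(["red", "green"], isoStub).displayValue); // red, green
console.log(buildPropertyData(false, isoStub).type); // boolean
```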

Maybe this will help some of you out there to improve the performance of the datalists in your Alfresco solutions.

And once again too much time has passed without me paying attention to the blog. I spent time experimenting with all sorts of apps and technologies and should’ve really posted some of my experiences.

Concerning my experiments with GWT – I upgraded to version 1.2.22 today and did realize the promised decrease in startup time when loading the project in the hosted mode browser but imho it could still be a bit smoother. The commercial GWT Designer (eclipse plugin, a free edition is available) looks very promising. Especially the useful wizards, refactoring support, I18N support and of course the visual designer make GWT development even more productive.

Another project I have spent some time on lately is the automatic generation of VMware images. The idea is basically to automate the installation of predefined software packages so that up-to-date images can be built at any time. Eventually custom images could be requested and the software installation could take place without supervision. I currently use the install scripts of the unattended.org project and an additional Perl script to start the VM, and as a prototype it works. I’ve also looked into other installer projects like WPKG, but I might end up creating my own database-based package and profile management to enable user-based profile definition via a web interface.

Btw, I need to get myself a new “java server” for the backyard: SUN’s Blackbox
(Audio Version)

techbits.de

thoughts on hardware, software, development and tech news

Improving Alfresco datalist performance

Splitting mp3/cue files into multiple mp3 files

UML done right?

Still here .. with GWT and VMware image generation