Improving Alfresco datalist performance

Let’s talk about Datalists… (yes, that was a reference to the wat talk). If you have worked with Datalists in Alfresco and added more than a few hundred rows to a table (maybe even more than 1000), you’ll have noticed a significant increase in loading time. To understand why that is, let’s have a look at what’s going on behind the scenes when you open a datalist in the grid view:

  • The webscript e.g. /datalists/data/node/workspace/SpacesStore/dae615e9-1d08-483f-a19b-69b30b5bffed (that’s the Datalist’s nodeRef) is called
  • A list of columns (from Share’s form configuration) and filter criteria are sent to the repository as well
  • The repository webscript runs a lucene search (using the filter criteria, if set)
  • It loops over the result
  • for each result Evaluator.run() (from evaluator.lib.js) is called
  • Evaluator.run() uses the FormService to build the fields
  • The FormService builds the complete form from the datamodel (including model, data values, constraints)
  • For each field with a ListConstraint, all the values are localized (new in 3.4.8)
  • A ListValueConstraint’s getAllowedValues() and getDisplayLabel() are called potentially thousands of times.

What we know

What’s wrong here? Several things. It starts with the search, which is not very fast. In Alfresco 3.4.x a lucene PATH query is used here, which is known to be slow, especially with large repositories or when the lucene index is not stored on a really fast disk. For the document library, Jeff Potts has suggested using node.children instead of the lucene PATH query to speed things up. Of course this only works if no filters are used.
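A minimal sketch of that idea for datalists. Only the children property comes from the Alfresco ScriptNode API; getListItems, runQuery and the stub objects are hypothetical names used for illustration, and runQuery stands in for whatever search the webscript currently runs:

```javascript
// Hypothetical helper: returns the rows of a datalist.
// listNode - the datalist container (in Alfresco a ScriptNode exposing `children`)
// filter   - filter criteria, or null/undefined when no filter is set
// runQuery - function(filter) that performs the (slow) lucene search
function getListItems(listNode, filter, runQuery)
{
   if (!filter)
   {
      // No filter: read the child nodes directly and skip the index
      return listNode.children;
   }
   // A filter is set: fall back to the search
   return runQuery(filter);
}

// Stand-in objects so the sketch runs outside the repository
var stubList = { children: ["row1", "row2", "row3"] };
var searchCalls = 0;
function stubQuery(filter)
{
   searchCalls++;
   return ["row2"];
}

var unfiltered = getListItems(stubList, null, stubQuery);
var filtered = getListItems(stubList, { filterId: "recent" }, stubQuery);
```

With these stubs the search function is only invoked for the filtered request; the unfiltered request never touches it.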

Another issue with the search that adds to the performance problems of the datalists is that the whole datalist is read into the browser at once. There is no server-side paging implemented for datalists, which means that if you have 1000 rows in the datalist you have to wait for the browser to load all 1000 rows before the first 50 are displayed. Stefano Argentiero has suggested a modification to the Alfresco datalist to add server-side pagination, which would also help speed up the datalist rendering.
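To illustrate what server-side paging means here, the following sketch cuts a single page out of the full result before any per-row work is done. It is not Stefano Argentiero’s modification; getPage, startIndex and pageSize are hypothetical names chosen for this example:

```javascript
// Hypothetical helper: cut one page out of the full result list.
function getPage(items, startIndex, pageSize)
{
   var start = Math.max(0, startIndex),
      end = Math.min(items.length, start + pageSize);

   return {
      totalRecords: items.length,   // lets the client render the pager
      startIndex: start,
      items: items.slice(start, end)
   };
}

// Build 1000 dummy rows and request the first and the last page
var rows = [];
for (var i = 0; i < 1000; i++)
{
   rows.push("row" + i);
}

var firstPage = getPage(rows, 0, 50);
var lastPage = getPage(rows, 980, 50);
```

Only the rows in the returned page would then need to be evaluated and sent to the browser.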

Stefano Argentiero realized that running the Evaluator.run() function for each row takes a lot of time, but why is that? The Evaluator function calls the FormService for each row and constructs the whole form. This means that all field definitions are built, including constraints, and for ListConstraints even all the individual constraint values are collected. Since the fields are only displayed in the datalist (mode="view"), these constraint values are never needed at all.

The new localization doesn’t help

With a change in Alfresco 3.4.9 (ALF-9514) each of these constraint values is even localized. That means if you have 1000 rows in the table and one field with a list constraint of 100 values (not uncommon), your ListConstraint’s getDisplayLabel() and getAllowedValues() methods are easily called 100000 times when the datalist model is built in the webscript.

Anyone upgrading to Alfresco >=3.4.8 who is using custom ListConstraints should make sure that…

  • Your getAllowedValues() method returns a cached List as quickly as possible
  • If you don’t need localized constraint values, override the getDisplayLabel() method to simply return the value and skip the localization code.
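The effect of these two recommendations can be illustrated with a small model. A real constraint is a Java class; the sketch below uses JavaScript only to stay in the language of the other snippets in this post, and createConstraint, slowLoad and simulate are hypothetical names:

```javascript
// Hypothetical model of a list constraint whose values come from a slow source
function createConstraint(loadValues, useCache)
{
   var cached = null;
   return {
      getAllowedValues: function ()
      {
         if (!useCache)
         {
            return loadValues();
         }
         if (cached === null)
         {
            cached = loadValues();   // load once, reuse afterwards
         }
         return cached;
      },
      // Skips localization: the label is simply the value itself
      getDisplayLabel: function (value)
      {
         return value;
      }
   };
}

var loads = 0;
function slowLoad()
{
   loads++;
   return ["open", "in progress", "done"];
}

// Simulate building the form model once per row
function simulate(constraint, rowCount)
{
   for (var r = 0; r < rowCount; r++)
   {
      var values = constraint.getAllowedValues();
      for (var v = 0; v < values.length; v++)
      {
         constraint.getDisplayLabel(values[v]);
      }
   }
}

loads = 0;
simulate(createConstraint(slowLoad, false), 1000);
var uncachedLoads = loads;

loads = 0;
simulate(createConstraint(slowLoad, true), 1000);
var cachedLoads = loads;
```

In this model the slow source is hit once per row without the cache and only once in total with it.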

What can be done?

Apart from the issues that this new localization introduced, I still wanted to have the datalist displayed as fast as possible. Calling the FormService and building the whole form over and over again for each row didn’t strike me as a smart approach. I looked into evaluator.lib.js to see if that call can be removed for good ;-)

This is the block from evaluator.lib.js that calls the FormService and extracts the data that is returned:

// Use the form service to parse the required properties
scriptObj = formService.getForm("node", node.nodeRef, fields, fields);

// Make sure we can quickly look-up the Field Definition within the formData loop...
var objDefinitions = {};
for each (formDef in scriptObj.fieldDefinitions)
{
 objDefinitions[formDef.dataKeyName] = formDef;
}

// Populate the data model
var formData = scriptObj.formData.data;
for (var k in formData)
{
 var isAssoc = k.indexOf("assoc") == 0,
	value = formData[k].value,
	values,
	type = isAssoc ? objDefinitions[k].endpointType : objDefinitions[k].dataType,
	endpointMany = isAssoc ? objDefinitions[k].endpointMany : false,
	objData =
	{
	   type: type
	};

 if (value instanceof java.util.Date)
 {
	objData.value = utils.toISO8601(value);
	objData.displayValue = objData.value;
	nodeData[k] = objData;
 }
 else if (endpointMany)
 {
	if (value.length() > 0)
	{
	   values = value.split(",");
	   nodeData[k] = [];
	   for each (value in values)
	   {
		  var objLoop =
		  {
			 type: objData.type,
			 value: value,
			 displayValue: value
		  };

		  if (Evaluator.decorateFieldData(objLoop, node))
		  {
			 nodeData[k].push(objLoop);
		  }
	   }
	}
 }
 else
 {
	objData.value = value;
	objData.displayValue = objData.value;

	if (Evaluator.decorateFieldData(objData, node))
	{
	   nodeData[k] = objData;
	}
 }
}

This can easily be replaced by the following block of code, which doesn’t use the FormService at all:

// Populate the data model
for each (var k in fields)
{
   var value = node.properties[k],
      objData = {},
      formkey;

   // Compare against null so that false, 0 and "" still count as property values
   if (value != null)
   {
      formkey = "prop_" + k.replace(":", "_");

      if (typeof value == "boolean")
      {
         objData.type = "boolean";
         objData.value = value;
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else if (value.getMonth)
      {
         // Date values expose getMonth()
         objData.type = "date";
         objData.value = utils.toISO8601(value);
         objData.displayValue = objData.value;
         nodeData[formkey] = objData;
      }
      else
      {
         objData.type = "text";
         objData.value = value;
         objData.displayValue = objData.value;
         if (Evaluator.decorateFieldData(objData, node))
         {
            nodeData[formkey] = objData;
         }
      }
   }
   else
   {
      // No property with this name, so look for an association instead
      var assoc = node.assocs[k];
      if (assoc && assoc.length > 0)
      {
         formkey = "assoc_" + k.replace(":", "_");
         nodeData[formkey] = [];

         for each (var target in assoc)
         {
            // One object per target; a shared object would leave every entry identical
            var objLoop =
            {
               type: target.typeShort,
               value: target.nodeRef,
               displayValue: target.nodeRef
            };
            if (Evaluator.decorateFieldData(objLoop, target))
            {
               nodeData[formkey].push(objLoop);
            }
         }
      }
   }
}

Does it help?

I tested this change using a datalist with 1000 rows and 6 columns, with one column having a list constraint (about 30 values). With this configuration the original webscript returned in about 2.4s (on a fast dev system). Using my patched evaluator.lib.js without the call to the FormService, the same datagrid is displayed in 870ms. That is more than 2x faster. Depending on the number of rows and constraints, your results may vary.

I didn’t need all the features that can be modeled in the datalist so this change does NOT support all the features of the original implementation:

  • Multivalue properties are currently not supported (could be easily added)
  • NodeRef properties are currently not supported (could be easily added)
  • You cannot change the datalist’s values in the grid using FormFilters any more
  • Maybe other things are broken…
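As a rough idea of how multivalue support could be added, the sketch below builds one entry per value. It assumes that a multi-valued property arrives as a JavaScript array, and buildPropertyData is a hypothetical helper that is not part of evaluator.lib.js; whether the grid renders such an array for a property is not verified here:

```javascript
// Hypothetical helper: build the field data for a single- or multi-valued property
function buildPropertyData(value, type)
{
   var isMulti = Object.prototype.toString.call(value) === "[object Array]",
      values = isMulti ? value : [value],
      result = [];

   for (var i = 0; i < values.length; i++)
   {
      // New object per value so the entries do not share state
      result.push({ type: type, value: values[i], displayValue: values[i] });
   }

   // Single-valued properties keep the plain object form
   return isMulti ? result : result[0];
}

var single = buildPropertyData("high", "text");
var multi = buildPropertyData(["red", "green"], "text");
```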

Maybe this will help some of you out there to improve the performance of the datalists in your Alfresco solutions.

 

Splitting mp3/cue files into multiple mp3 files

Usually you encode single MP3 files for each track on a CD. But there are cases when it’s sensible to encode the whole CD as one MP3 file. This is usually done when you have a continuous live album where you want to avoid gaps between the tracks and preserve the original timecodes of the CD. In these cases you create a single MP3 file and a CUE file which contains the timecodes and track names for the single tracks in the MP3.
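To give an idea of what such a CUE file holds, here is a small sketch that reads track titles and start times from a made-up CUE sheet. The album content and the parseCue helper are purely illustrative; the INDEX timecode is read as minutes:seconds:frames with 75 frames per second:

```javascript
// Illustrative CUE sheet (made-up file name and titles)
var cueText = [
   'FILE "concert.mp3" MP3',
   '  TRACK 01 AUDIO',
   '    TITLE "Opening"',
   '    INDEX 01 00:00:00',
   '  TRACK 02 AUDIO',
   '    TITLE "Second Song"',
   '    INDEX 01 04:31:50'
].join("\n");

// Hypothetical helper: extract number, title and start time of each track
function parseCue(text)
{
   var tracks = [], current = null, lines = text.split("\n"), m;

   for (var i = 0; i < lines.length; i++)
   {
      var line = lines[i].replace(/^\s+|\s+$/g, "");

      if ((m = /^TRACK (\d+) AUDIO$/.exec(line)))
      {
         current = { number: parseInt(m[1], 10), title: null, startSeconds: null };
         tracks.push(current);
      }
      else if (current && (m = /^TITLE "(.*)"$/.exec(line)))
      {
         current.title = m[1];
      }
      else if (current && (m = /^INDEX 01 (\d+):(\d+):(\d+)$/.exec(line)))
      {
         // minutes * 60 + seconds + frames / 75
         current.startSeconds = parseInt(m[1], 10) * 60 + parseInt(m[2], 10) + parseInt(m[3], 10) / 75;
      }
   }
   return tracks;
}

var tracks = parseCue(cueText);
```

A splitter uses exactly these start times to decide where to cut the single MP3 file.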

When you later want to split these MP3/CUE files into individual MP3 files, you need a specialized tool. Lately I used Medieval CUE Splitter to achieve this. It is a small freeware application which splits the MP3 and even fills the ID3 tags from the information in the CUE file.

UML done right?

Don’t get me wrong, I have modeled my share of use cases, class and sequence diagrams – but I feel that I lack a lot of background knowledge on how to put it all together into a UML development process. So I’m currently diving into the OMG documents at uml.org to find out how you would start out modeling in UML with a given methodology. You can more or less easily draw your diagrams with modeling tools like StarUML, but I have the impression that by understanding the process and leveraging model transformation, UML will become even more useful. I guess I’ll have to get a practical book on that topic, as I find myself confused every time I start out with a new project.

Still here .. with GWT and VMware image generation

And once again too much time has passed without me paying attention to the blog. I spent time experimenting with all sorts of apps and technologies and really should have posted some of my experiences.

Concerning my experiments with GWT – I upgraded to version 1.2.22 today and did notice the promised decrease in startup time when loading the project in the hosted mode browser, but imho it could still be a bit smoother. The commercial GWT Designer (Eclipse plugin, a free edition is available) looks very promising. Especially the useful wizards, refactoring support, I18N support and of course the visual designer make GWT development even more productive.

Another project I have spent some time on lately is the automatic generation of VMware images. The idea is basically to automate the installation of predefined software packages so that up-to-date images can be built at any time. Eventually custom images could be requested and the software installation could take place without supervision. I currently use the install scripts of the unattended.org project and an additional Perl script to start the VM – as a prototype it works. I’ve also looked into other installer projects like WPKG, but I might end up creating my own database-based package and profile management to enable user-based profile definition via a web interface.

Btw, I need to get myself a new “java server” for the backyard: SUN’s Blackbox ;-)