EvalJ Project

The EvalJ project develops Java source code for the evaluation of information retrieval experiments.

EvalJ is hosted on SourceForge: http://sourceforge.net/projects/evalj/

Please use the SourceForge interface to submit bug reports and/or feature requests. Include the version information, which you can get using the "about" command (see the section on EvalJ).


Please note that EvalJ contains two software packages: XCGEval and the metrics frontend evalj.EvalJ.
Both packages are documented in this file. At the end of the file you will also find some instructions on how to build EvalJ from CVS.


Usage info for XCGEval

Command line options and arguments:

Run XCGEval without any arguments to get detailed usage info.
Use -q to print evaluation results for each query.
Use -e to score empty result topics as 0, i.e. topics that occur in the recall-base but for which no results were returned by a run.
Use -gnu to produce gnuplot dat and gp files.
Use -jfree to use the jfree graphics modules to output graphs as png files.

The config file must contain the following settings:
INEXDOCCOLL_DIR: aDir #the directory where the INEX document collection resides, if different from ./inex-1.4
EXPANDED_FULLRB: aFile #an XML file that contains all assessments in a single file. It is automatically generated by the CreateExpandedFullRB class when XCGEval is first run (as it speeds up subsequent runs). Note that if EXPANDED_FULLRB is given, the ASSESSMENTS_DIR option is ignored!
ASSESSMENTS_DIR: aDir #the directory where the assessment files are stored, e.g. ./assess04. You may give a directory, a single file, a list of files (separated by commas), or a regular expression, e.g. aDir/16*.xml
SUBMISSIONRUNS_DIR: aDir #the directory containing the submission run files, e.g. ./runs04. You may give a directory, a single file, a list of files (separated by commas), or a regular expression, e.g. aDir/*/*.xml
RESULTS_DIR: aDir #the directory where all results are to be written, if different from ./results
METRICS: m1, m2, m3, ... #a comma-separated list of metrics; available options are: xCG, nxCG, ep/gr, q
QUANT_FUNCTIONS: q1, q2, ... #a comma-separated list of quantisation functions; available options are: strict/gen/sog[1,2,3]/sqsog[1,2,3]; default is sog2
OVERLAP: o1, o2, ... #a comma-separated list of overlap settings; available options are: on/off; default is on
DCV: dcv1, dcv2, ... #a comma-separated list of document cutoff values for {nxCG, MAnxCG}@DCV; default is 5, 10, 15, 25, 50, 100, 500, 1000, 1500
The config file may contain the following optional settings:
TASK: aTask #specifies the task to be evaluated, e.g. CO, CAS, CO.Focussed, which then acts as a filter. The submission runs' task attribute will be matched against the task defined here
POOLS: p1, p2, ... #a comma-separated list of assessment pools to use from the assessments (acts as a filter). If not given, the first pool found is taken for all topics (relevant for multi-assessed topics, where more than one pool exists)
QUERY_TYPE: aType #query type, e.g. CO, CAS, CO+S. Acts as a filter, whereby only assessments with the defined query type will be processed
DATA_POINTS: 100 #number of data points to use for graphs, if different from 100, e.g. effort-precision at 100 gain-recall points
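Putting the settings above together, a minimal config file might look as follows (all paths and values are placeholders; adapt them to your setup):

```
INEXDOCCOLL_DIR: ./inex-1.4
ASSESSMENTS_DIR: ./assess04
SUBMISSIONRUNS_DIR: ./runs04
RESULTS_DIR: ./results
METRICS: xCG, nxCG, ep/gr
QUANT_FUNCTIONS: strict, sog2
OVERLAP: on
DCV: 5, 10, 25, 50
```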

The output of XCGEval is the results of the chosen metrics, printed to STDOUT and also stored in the xcgeval.log file.
If the -gnu option is used then additional output includes the .gnu and .dat files, stored in the RESULTS_DIR directory.
If the -jfree option is used then additional output includes the .png graph files, stored in the RESULTS_DIR directory.
The naming of the generated files is standardised to use a combination of the run's id, the topic's id (or ALL), and the
metric (i.e. xCG, nxCG, nRnxCG = normalised Rank and normalised xCG, or ep-gr).
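As an illustration of the idea behind the nxCG metric (a sketch only, not EvalJ's actual implementation): xCG[i] is the gain cumulated over the first i results of a run, and nxCG[i] divides it by the cumulated gain of an ideal ranking built by sorting the recall-base gains in decreasing order. The gain values below are made-up placeholders, as would be produced by some quantisation function:

```python
def cumulate(gains):
    """Running sum of a gain vector (the cumulated gain xCG)."""
    total, out = 0.0, []
    for g in gains:
        total += g
        out.append(total)
    return out

def nxcg(run_gains, ideal_gains, dcv):
    """nxCG values up to the document cutoff value dcv."""
    xcg = cumulate(run_gains)[:dcv]
    # The ideal cumulated gain comes from the best possible ordering.
    xci = cumulate(sorted(ideal_gains, reverse=True))[:dcv]
    return [c / i if i else 0.0 for c, i in zip(xcg, xci)]

print(nxcg([1.0, 0.0, 0.5, 0.25], [1.0, 0.5, 0.25, 0.0], dcv=4))
```

A value of 1.0 at rank i means the run has accumulated as much gain as the ideal ranking up to that rank.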


Usage info (for metrics frontend evalj.EvalJ):

EvalJ (under Unix/Linux) can be invoked with the command scripts/evalj or bin/evalj; otherwise, use

java -cp jars/EvalJ.jar evalj.EvalJ

In the following, we write EVALJ for the start of the command line (either scripts/evalj, bin/evalj, or the java -cp invocation above). We write EVALJDB for the path to the directory that you choose for EvalJ to store data. A command is composed of three parts: the global parameters (e.g. the location of the database), the command name (e.g. set collection), and the command parameters:

EVALJ <global arguments> <command name> <arguments>
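For example, a hypothetical invocation listing the collections stored in a database under /tmp/evaljdb would be:

```
EVALJ -database /tmp/evaljdb list collections
```

Here -database /tmp/evaljdb is a global parameter, list collections is the command name, and there are no command parameters.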


help
    Get the list of commands.
about
    Get information about EvalJ (useful for a bug report).
set database
    Set the database directory. Setting the database is mandatory and must be done before any other command but about/help.
add assessments
    Add assessments.
set collection
    Set a new collection of XML documents.
list collections
    Get the list of available collections.
list quantisations
    Get the list of available quantisations.
    Evaluate runs (see "Run evaluation" below).
simulated run
    Generate a simulated run from assessments.

Instructions for evaluation

In order to construct the evalj database, the following steps should be followed:

  1. Create a database directory (set database)
  2. Add collection(s)
  3. Add assessments
  4. Evaluate
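Using the IEEE collection commands described below (all paths are placeholders), the first three steps could look like:

```
EVALJ set database EVALJDB
EVALJ -database EVALJDB set collection -name ieee -dir INEXPATH -dtd DTDPATH/xmlarticle.dtd -skip volume.xml
EVALJ -database EVALJDB add assessments -collection ieee my_pool PATH/TO/ASSESSMENT/DIR
```

The evaluation step is described in the "Run evaluation" section below.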

Create a database directory (denoted EVALJDB in the following)

EVALJ set database EVALJDB
Note that the directory should not already exist, unless it was previously created by EvalJ.

Add collection(s)

EVALJ [-database EVALJDB] set collection [STATISTICS] [-search] -name COLLECTION_ID -dir ROOTDIR [-dtd DTDFILE] [-skip REGEXP]

-search
    Sets the search flag for the collection. If set, files will be searched recursively in the directory (this can be useful for the Wikipedia collection but slows down the assessments indexing process; however, this process is only needed once).
STATISTICS
    Can be either -documents INT -elements INT -characters LONGINT, or -predefined NAME, where the list of predefined collections can be obtained by giving list as NAME:
    EVALJ set collection -predefined list
    If no statistics option is given, the collection is parsed in order to obtain the statistics. If -predefined is given, then the -name attribute (id of the collection) is not needed.
-name COLLECTION_ID
    Used as an ID to identify the collection. Please note that you must use the "ieee" identifier for INEX 2005 assessments since they use a new collection attribute which is set to "ieee" for the adhoc task. Similarly, you must use "wikipedia" for INEX 2006.
-dtd DTDFILE
    Sets the DTD used to parse the documents.

For the IEEE collection:

EVALJ set collection -name ieee -dir INEXPATH -dtd DTDPATH/xmlarticle.dtd -skip volume.xml

For the Wikipedia collection, if you have all the wikipedia documents in a single directory:

EVALJ set collection -predefined adhoc-2006 -dir WIKIPATH

or, in case you kept the files as provided, use the -search argument:

EVALJ set collection -search -predefined adhoc-2006 -dir WIKIPATH

Please note that the documents must be at the first level of the wikipedia directory (with the -predefined option, at least the relevant documents).

Add the assessments

EVALJ -database EVALJDB add assessments -collection COLLECTION_ID POOL_ID BASEDIR
The COLLECTION_ID should be the default collection for the assessments (it might be overridden by assessment files).
The POOL_ID should be a unique identifier for the assessments collection.

For example, for the IEEE collection,
EVALJ [-database EVALJDB] add assessments -collection ieee my_pool PATH/TO/ASSESSMENT/DIR
For INEX 2005 assessments, you do not need to use the "-collection" attribute.

Run evaluation

where PARAMETERS are:

    Used to output (in results.xml) the results of individual topics.
-collection COLLECTION_ID
    Set the default collection (might be overwritten by the submission run).
    Force the collection id to be the one given by the -collection option. Useful when there is only one collection and you want to ignore the value given in the runs.
-alias
    Creates an alias of a collection to be used when parsing runs. For example, -alias wikipedia wikien maps references to the collection wikipedia to the collection wikien.
-limit LIMIT
    Set the default maximum number of elements in each topical run (0 for no limit). The default value is 1500 (the INEX default).
-task TASK
    Only evaluate runs of the given task (TASK might be: CO.FetchBrowse, CO.Focussed, etc.).
-type TYPE
    Restrict evaluation to a given type of run. TYPE can be adhoc, nlp, etc.
-topics TOPICS
    Restrict the list of topics to be evaluated. TOPICS is a (compact) list of topics, e.g. 128-130,140 for topics 128, 129, 130 and 140. When not specified, all the pool topics are used.
-only-metric METRICS
    Restrict the metrics to be used. METRICS is a comma-separated list containing the ids of the metrics that should be used for evaluation.
-pool POOL_ID
    POOL_ID is the assessment set to evaluate against.
OUTDIR
    The directory where evaluation results will be stored (created if it does not exist).
    The metric definition file to use (this parameter can be used one or more times to use more metric definitions).
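For illustration, the compact TOPICS syntax can be expanded as sketched below (a hypothetical helper, not part of EvalJ's code):

```python
def expand_topics(spec):
    """Expand a compact topic list such as "128-130,140" into topic ids."""
    topics = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            topics.extend(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            topics.append(int(part))
    return topics

print(expand_topics("128-130,140"))  # → [128, 129, 130, 140]
```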

A metric definition file is an XML file that contains the definitions of the metrics you want to use. The root element is <metrics> and its children are tags with the name of a metric (PR, PRNG, XCG (experimental), E2PRUM, PRUM, GR (experimental)). Options are given using one child element whose name is the option name
and whose "value" attribute is the value. The content of the element gives suboptions, which are constructed similarly.

Some metric definitions are available in the metrics directory. A complete example (containing all the possible options) is:

<metrics>
  <!-- In this file, quantisation can take the values returned by the command EVALJ list quantisations -->
  <!-- EPRUM metrics -->
  <EPRUM>
    <!-- Module used to generate ideal elements -->
    <generator value="GKTargetGenerator"><quantisation value="SOG"/></generator>
    <!-- Quantisation used for scoring ideal elements -->
    <quantisation value="Exh"/>
    <!-- Modifies the quantisation so it is binary -->
    <binary value="false"/>
    <!-- User behaviour: can be
         * hierarchic
         * T2I
         * null (i.e., classical model)
         * BEPDistance (for the BEC task) with a parameter A.
           In this case use (here A=1):
           <behaviour value="BEPDistance"><A value="1"/></behaviour> -->
    <behaviour value="hierarchic"/>
  </EPRUM>
  <!-- Generalised Precision/Recall metrics -->
  <PR>
    <quantisation value="SOG"/>
  </PR>
  <!-- Precision/Recall NG metrics -->
  <!-- The two quantisations are standard and should not be changed -->
  <PRNG>
    <recallQuantisation value="Exh"/>
    <precisionQuantisation value="Spe"/>
  </PRNG>
  <!-- BEPD is a set based metric for the best in context task. The lower A, the more precise a system should be to get a high score -->
  <BEPCD>
    <a value="10"/>
    <normalised value="true"/>
  </BEPCD>
</metrics>

In the evaluation output directory, after the evaluation you will find some files (data and graphics) and a results.xml file. This file contains all the results for a metric. A summary of the data it contains (average precision, precision at some recall points and ranks) can be obtained by applying the stylesheet summarizeResults.xsl to the results.xml file.
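Assuming a standard XSLT 1.0 processor such as xsltproc is available, the summary could be produced with a command along these lines (the output file name is just a placeholder):

```
xsltproc summarizeResults.xsl OUTDIR/results.xml > summary.html
```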


Building EvalJ from CVS

See the SourceForge project page for instructions on obtaining the sources. After updating the source tree, you can use "ant" to build the code:

ant clean makejar

More ant options

The next commands may require some changes to build.xml to
reflect your local settings. XCGEval can be run using inex2005.prop:
ant run

GNUplot graphs are generated using

ant gnuplot
A version of the package with an installer can be produced by running:

ant package