EvalJ Project
The EvalJ project develops Java source code for the evaluation of
information retrieval experiments.
EvalJ is hosted on SourceForge: http://sourceforge.net/projects/evalj/
Please use the SourceForge interface to submit bug reports and/or
feature requests. Include the version information you can get using the
"about" command (see the section on EvalJ).
IMPORTANT NOTICE
Please note that EvalJ contains two software packages:
- XCGEval, which evaluates runs with the XCG metrics
- EvalJ, which evaluates submissions with the PR (aka
inex-eval), PRNG (aka inex-eval-ng) and PRUM-family metrics (GR, PRUM and
EPRUM). An interface for XCG is currently in development.
Both packages are documented in this file. You will also find at the end of
the file some instructions on how to build EvalJ from CVS.
XCGEval
Usage info for XCGEval
- To run XCGEval from the command line use:
java -Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader -jar jars/EvalJ.jar [-e] [-q] [-gnu/jfree] -config yourconfigfile.prop
- To run XCGEval from within an IDE (e.g. Eclipse) use:
XCGEval [-q] [-e] [-gnu/jfree] -config yourconfigfile.prop
with VM arguments:
-Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader -Xmx75
Command line options and arguments:
Run XCGEval without any arguments to get detailed usage info.
Use -q to print evaluation results for each query.
Use -e to score empty result topics as 0, i.e. topics that occur in the
recall-base but for which no results were returned by a run.
Use -gnu to produce gnuplot dat and gp files.
Use -jfree to use the jfree graphics modules to output graphs as png
files.
The config file must contain the following settings:
INEXDOCCOLL_DIR: aDir #the directory where the inex document collection
resides if different from ./inex-1.4.
EXPANDED_FULLRB: aFile #an XML file that contains all assessments in a single
file. It is automatically generated by the CreateExpandedFullRB class when
XCGEval is first run (as it speeds up consequent runs). Note that if
EXPANDED_FULLRB is given then the ASSESSMENTS_DIR option is ignored!
ASSESSMENTS_DIR: aDir #the directory where the assessment files are stored,
e.g. ./assess04. You may give a directory, a single file, a list of files
(separated by commas), or a regular expression, e.g. aDir/16*.xml.
SUBMISSIONRUNS_DIR: aDir #the directory containing the submission run files,
e.g. ./runs04. You may give a directory, a single file, a list of files
(separated by commas), or a regular expression, e.g. aDir/*/*.xml.
RESULTS_DIR: aDir #the directory where all results are to be written if
different from ./results.
METRICS: m1, m2, m3, ... #a comma separated list of metrics, available
options are: xCG, nxCG, ep/gr, q
QUANT_FUNCTIONS: q1, q2, ... #a comma separated list of quantisation
functions, available options are: strict/gen/sog[1,2,3]/sqsog[1,2,3], default
is sog2.
OVERLAP: o1, o2, ... #a comma separated list of overlap functions, available
options are: on/off, default is on.
DCV: dcv1, dcv2, ... #a comma separated list of document cutoff values for
{nXCG, MAnxCG}@DCV, default is 5, 10, 15, 25, 50, 100, 500, 1000, 1500.
The config file may also contain the following optional settings (a sample
config file is sketched after this list):
TASK: aTask #specifies the task to be evaluated, e.g. CO, CAS, CO.Focussed,
which then acts as a filter. The submission runs' task attribute will be
matched against the task defined here.
POOLS: p1, p2, ... #a comma separated list of assessment pools to use from
the assessments (acts as a filter). If not given, the first pool found is
taken for all topics (i.e. for multi-assessed topics, where more than one
pool exists).
QUERY_TYPE: aType #query type, e.g. CO, CAS, CO+S. Acts as a filter, whereby
only assessments with the defined query type will be processed.
DATA_POINTS: 100 #number of data points to use for graphs, if different from
100, e.g. effort-precision at 100 gain-recall points
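As an illustration, a minimal config file might look as follows (a sketch
only; all paths and values are hypothetical and should be adapted to your
setup):
INEXDOCCOLL_DIR: ./inex-1.4
ASSESSMENTS_DIR: ./assess04
SUBMISSIONRUNS_DIR: ./runs04
RESULTS_DIR: ./results
METRICS: nxCG, ep/gr
QUANT_FUNCTIONS: strict, gen
OVERLAP: on
DCV: 5, 10, 25, 50
With such a file saved as, say, myconfig.prop, XCGEval could then be invoked
with:
java -Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader -jar jars/EvalJ.jar -q -jfree -config myconfig.prop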
The output of XCGEval consists of the results of the chosen metrics, printed
to STDOUT and also stored in the xcgeval.log file.
If the -gnu option is used then additional output includes the .gnu and .dat
files, stored in the RESULTS_DIR directory.
If the -jfree option is used then additional output includes the .png graph
files, stored in the RESULTS_DIR directory.
The naming of the generated files is standardised to use a combination of
the run's id, the topic's id (or ALL), and the metric (i.e. xCG, nxCG,
nRnxCG = normalised Rank and normalised xCG, or ep-gr).
EvalJ
Usage info (for metrics frontend evalj.EvalJ):
EvalJ (under Unix/Linux) can be invoked with the command scripts/evalj or
bin/evalj; otherwise, use
java -cp jars/EvalJ.jar evalj.Eval
In the following, we denote by EVALJ the start of the command line (either
scripts/evalj, bin/evalj or java -cp jars/EvalJ.jar evalj.Eval), and by
EVALJDB the path to the directory that you choose for EvalJ to store its data.
A command is composed of three parts: the global parameters (e.g. the
location of the database), the command name (e.g. set collection), and the
command parameters:
EVALJ <global arguments> <command name> <arguments>
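For example, a hypothetical invocation asking for the list of available
quantisations of a database stored under /tmp/evaljdb would be:
EVALJ -database /tmp/evaljdb list quantisations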
Commands
- help
- Get the list of commands
- about
- Get information about EvalJ (useful for a bug report)
- set database
- Setting the database is mandatory and must be done before any other command
but about/help.
- add assessments
- to add assessments
- set collection
- to set a new collection of XML documents
- list collections
- get the list of available collections
- list quantisations
- get the list of available quantisations
- evaluate
- to evaluate runs
- simulated run
- generate a simulated run from assessments
Instructions for evaluation
In order to construct the evalj database, the following steps should be
followed:
- Create a database directory (set database)
- Add collection(s)
- Add assessments
- Evaluate
Create a database directory (denoted EVALJDB in the following)
EVALJ set database EVALJDB
Note that the directory should not already exist, unless it was previously
created by evalj.
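As a quick check (assuming an empty database simply yields an empty list),
you can then ask EvalJ to list the collections known to the new database:
EVALJ -database EVALJDB list collections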
Add collection(s)
EVALJ [-database EVALJDB] set collection [STATISTICS] [-search] -name COLLECTION_ID -dir ROOTDIR [-dtd DTDFILE] [-skip REGEXP]
where:
- -search
- Set the search flag for the collection. If set, files will be searched recursively in the
directory (this can be useful for the wikipedia collection but slows down the assessment
indexing process; however, this process is only needed once).
- STATISTICS
- can be either
-documents INT -elements INT -characters LONGINT
or -predefined NAME
where the list of predefined collections can be obtained by giving list
as NAME:
EVALJ set collection -predefined list
If no statistics option is given, the collection is parsed in order to obtain the statistics.
If -predefined is given, then the -name attribute (the id of the collection) is not needed.
- -name COLLECTION_ID
- used as an ID to identify the collection. Please note that you must use the "ieee" identifier
for INEX 2005 assessments, since they use a new collection attribute which is set to "ieee" for
the adhoc task. Similarly, you must use "wikipedia" for INEX 2006.
- -dtd DTDFILE
- set the DTD used to parse the documents
Examples
For the IEEE collection:
EVALJ set collection -name ieee -dir INEXPATH -dtd DTDPATH/xmlarticle.dtd -skip volume.xml
For the Wikipedia collection:
If you have all the wikipedia documents in a single directory:
EVALJ set collection -predefined adhoc-2006 -dir WIKIPATH
or, in case you kept the files as provided, use the -search argument:
EVALJ set collection -search -predefined adhoc-2006 -dir WIKIPATH
Please note that the documents must be at the first level of the wikipedia directory (with the
-predefined option, at least the relevant documents).
Add the assessments
EVALJ -database EVALJDB add assessments -collection COLLECTION_ID POOL_ID BASEDIR
The COLLECTION_ID should be the default collection for the assessments (it
might be overridden by the assessment files).
The POOL_ID should be a unique identifier for this set of assessments.
For example, for the IEEE collection,
EVALJ [-database EVALJDB] add assessments -collection ieee my_pool PATH/TO/ASSESSMENT/DIR
For INEX 2005 assessments, you do not need to use the "-collection" option.
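For the Wikipedia collection (INEX 2006), a hypothetical invocation (with a
made-up pool identifier) would be:
EVALJ -database EVALJDB add assessments -collection wikipedia my_pool_2006 PATH/TO/ASSESSMENT/DIR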
Run evaluation
EVALJ -database EVALJDB evaluate PARAMETERS RUNFILE [RUNFILE [...]]
where PARAMETERS are (an example invocation is sketched further below):
- -details
- is used to output (in results.xml) the results of individual topics
- -collection COLLECTION_ID
- Set the default collection (might be overridden by the submission run)
- -force-collection
- Force the collection id to be the one given by the -collection option.
Useful when there is only one collection and you want to ignore the value given in the runs.
- -alias ALIAS COLLECTION_ID
- Creates an alias of a collection to be used when parsing runs. For example,
-alias wikipedia wikien maps the references to collection wikipedia to collection wikien
- -limit LIMIT
- Set the default maximum number of elements in each per-topic run (0 for no limit).
The default value is 1500 (the INEX default).
- -task TASK
- only evaluates runs of the given task (TASK might be: CO.FetchBrowse, CO.Focussed, etc.)
- -type TYPE
- Restrict evaluation to a given type of run. TYPE can be adhoc, nlp, etc.
- -topics TOPICS
- Restrict the list of topics to be evaluated. TOPICS is a (compact) list of topics,
e.g. 128-130,140 for topics 128, 129, 130 and 140. When not specified, all the pool's topics are used.
- -only-metric METRICS
- Restrict the metrics to be used. METRICS is a comma-separated list containing the ids of the
metrics that should be used for evaluation.
- -pool POOL_ID
- POOL_ID is the assessment set to evaluate against
- -out OUTDIR
- The output directory where evaluation results will be stored (created if it does not exist)
- -metrics METRICS_FILE
- The metric definition file to use (this parameter can be given more than once in order to use
several metric definitions). A metric definition file is an XML file that contains the definition
of the metrics you want to use. The root element is <metrics> and its children are tags named
after a metric (PR, PRNG, XCG (experimental), E2PRUM, PRUM, GR (experimental)).
Options are given as child elements whose name is the option name and whose "value" attribute
holds the value. The content of such an element defines suboptions, which are constructed
similarly. Some metric definitions are available in the metrics directory.
A complete example (containing all the possible options) is:
<metrics>
<!-- In this file, quantisation can take the values returned by the command EVALJ list quantisations -->
<!-- EPRUM metrics -->
<EPRUM>
<!-- Module used to generate ideal elements -->
<generator value="GKTargetGenerator"><quantisation value="SOG"/></generator>
<!-- Quantisation used for scoring ideal elements -->
<quantisation value="Exh"/>
<!-- Modifies the quantisation so it is binary -->
<binary value="false"/>
<!-- User behaviour: can be
* hierarchic
* T2I
* null (ie, classical model)
* BEPDistance (for the BEC task) with a parameter A. In this case use (here A=1): <behaviour value="BEPDistance"><A value="1"/></behaviour>
-->
<behaviour value="hierarchic"/>
</EPRUM>
<!-- Generalised Precision/Recall metrics -->
<PR>
<quantisation value="SOG"/>
</PR>
<!-- Precision/Recall NG metrics -->
<!-- The two quantisations are standard and should not be changed -->
<PRNG>
<recallQuantisation value="Exh"/>
<precisionQuantisation value="Spe"/>
</PRNG>
<!-- BEPD is a set based metric for the best in context task.
The lower A, the more precise a system should be to get a high score -->
<BEPCD>
<a value="10"/>
<normalised value="true"/>
</BEPCD>
</metrics>
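Putting the pieces together, an evaluate invocation using such a metric
definition file would have the following shape (using the placeholders
defined above):
EVALJ -database EVALJDB evaluate -collection COLLECTION_ID -pool POOL_ID -out OUTDIR -metrics METRICS_FILE RUNFILE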
After the evaluation, the output directory will contain some files (data and graphics) and a
results.xml file. This file contains all the results for a metric. A summary of the data it
contains (average precision, precision at some recall points and ranks) can be obtained by
applying the stylesheet summarizeResults.xsl to the results.xml file.
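The stylesheet can be applied with any standard XSLT processor. For instance,
assuming xsltproc is installed and that summarizeResults.xsl is available in
the current directory, a summary could be produced with:
xsltproc summarizeResults.xsl results.xml > summary.txt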
CVS
See the sourceforge project page for instructions.
After updating the source tree, you can run "ant" to build the code:
ant clean makejar
More ant options
The next commands may require some changes to build.xml, to
reflect your local settings.
XCGEval can be run using inex2005.prop:
ant run
GNUplot graphs are generated using
ant gnuplot
A version of the package with an installer can be produced by running
ant package