Background
A research paper is a peer-reviewed description of a body of work, yet our research output is much more than the page-limited paper that describes it. For example, as part of our research we write tech reports containing full descriptions of the work, software that realises the work, proofs that verify the work’s correctness, models that encapsulate the ideas, test suites and benchmarks that document empirical evidence, and so on. The quality of these research artifacts is just as important as that of the paper itself, perhaps even more so. Yet many of our conferences offer no formal means to submit and evaluate anything but the paper. This should change!
Artifact Evaluations [1] have steadily become a common sight in our community. This year the 20th Asian Symposium on Programming Languages and Systems (APLAS’22) is excited to launch its own Artifact Evaluation process, which will allow authors of accepted papers to optionally submit supporting artifacts. The goal of artifact evaluation is two-fold: to probe further into the claims and results presented in a paper, and to reward authors who take the trouble to create useful artifacts to accompany the work in their paper. Although artifact evaluation is optional, we highly encourage authors of accepted papers to participate in this process.
The evaluation and dissemination of artifacts improves reproducibility and enables authors to build on top of each other’s work. Beyond helping the community, the evaluation and dissemination of artifacts confers several direct and indirect benefits to the authors themselves.
The ideal outcome for the artifact evaluation process is to accept every artifact that is submitted, provided it meets the evaluation criteria mentioned in the Call for Artifacts. We will strive to remain as close as possible to that ideal goal. However, even though some artifacts may not pass muster and may be rejected, we will evaluate in earnest and make our best attempt to follow authors’ evaluation instructions.
[1] https://www.artifact-eval.org/
The Process
To maintain the separation of paper and artifact review, authors will only be asked to upload their artifacts after their papers have been accepted. Authors planning to submit to the artifact evaluation should prepare their artifacts well in advance of the artifact submission deadline to ensure adequate time for packaging and documentation.
Throughout the artifact review period, submitted reviews will be (approximately) continuously visible to authors. Reviewers will be able to continuously interact (anonymously) with authors for clarifications, system-specific patches, and other logistical help to make the artifact evaluable. The goal of continuous interaction is to prevent rejecting artifacts for minor issues that are not research-related at all, such as a “wrong library version”-type problem. The conference proceedings will include a discussion of the continuous artifact evaluation process.
Types of Artifacts
The artifact evaluation will accept any artifact that authors wish to submit, broadly defined. A submitted artifact might be:
- software
- mechanized proofs
- test suites
- data sets
- hardware (if absolutely necessary)
- a video of a difficult- or impossible-to-share system in use
- any other artifact described in a paper
When in doubt, authors are encouraged to contact the AEC Co-chairs for guidance.
Artifact Evaluation Committee
By design, members of the Artifact Evaluation Committee (AEC) represent a broad church of experience, ranging from senior graduate students to research associates, lecturers, and professors. All are welcome! Formation of the AEC is through an open call that supports those from underrepresented and distant groups to become involved with the APLAS community.
A broad church is necessary as, among researchers, experienced graduate students are often in the best position to handle the diversity of systems that the AEC will encounter. In addition, graduate students represent the future of the community, so involving them in the AEC process early will help push this process forward. The AEC chairs devote considerable attention to both mentoring and monitoring, helping to educate the students on their responsibilities and privileges.
This text was adapted from existing text from the ESOP’22 & PLDI’22 AECs.
Call for Artifacts
APLAS 2022 will have post-paper-acceptance voluntary artifact evaluation (new in 2022!). Authors of accepted papers will be welcome to submit artifacts for evaluation after paper notification. The outcome will not alter the paper acceptance decision.
We will publish the submission links once authors of accepted papers have been notified of acceptance.
Submission
Please submit your artifact via HotCRP:
General Information
Research artifacts are not just about presenting raw code and raw data. The artifact evaluation process exists to aid reproducible research and the long-term conservation of research output. A well-packaged artifact is key to ensuring the longevity and use of your work for decades to come. Thus, you should think of the reader of the packaged artifact as a history buff on a curated tour in a museum, rather than an archaeologist in the middle of a dig searching for answers.
It belongs in a Virtual Machine (or Container)!
We strongly advise packaging your artifact in a virtual machine, or container, that runs out of the box with very little system-specific configuration.
We encourage all authors to provide a working installation of their artifact, together with the source and build scripts needed to regenerate the artifact.
Provisioning pre-built virtual machines, and containers, is preferable to providing tooling to build them, as this alleviates reliance on external dependencies.
For some submissions, for example FPGA-related research, provisioning access to the hardware itself is not possible.
We thus advise authors to think about how their work can be made accessible to reviewers, and what (if any) parts of the work can be bundled in a Virtual Machine or Container to demonstrate their research’s results.
Artifact evaluation is private
Submission of an artifact does not imply automatic permission to make its content public.
AEC members will be instructed that they may not publicize any part of the submitted artifacts during or after completing evaluation, and they will not retain any part of any artifact after evaluation. Thus, you are free to include models, data files, proprietary binaries, and similar items in your artifact.
Artifact evaluation is single-blind.
Please take precautions (e.g. turning off analytics, logging) to help prevent accidentally learning the identities of reviewers.
Submission Requirements
Your submission to HotCRP should consist of three things:
- The latest version (ideally the camera-ready version) of your accepted paper (in PDF format).
- A README.md file that explains your artifact (details below).
- A Zenodo link (details below).
We detail the packaging requirements of the artifact in the next section.
README.md
The README.md file is there to provide reviewers with salient information about the artifact for use during the review process.
Specifically, the README.md file should detail:
- the artifact itself, providing salient information about what is being submitted and how it relates to the submitted paper; and
- the size of the artifact.
Zenodo Link
Please create a Zenodo (https://zenodo.org/) repository. If you intend to publish the artifact, you can choose Open Access for the License. Please note that this generates a Zenodo DOI that is permanently public. Alternatively, you can create a “private” repository by checking Restricted Access, which requires you to grant permission to anyone (in our case, the AEC members) who wants to access the repository.
Packaging Requirements
To ensure consistency for the AEC Members we require all artifacts to adhere to the following requirements.
- The ‘artifact’ must be submitted as a single archive in a known open format.
- The intended audience of the artifact is an interested researcher from the future, not the reviewers themselves.
- The artifact must contain:
  - a README.md file that provides: information About the Archive; a Link to Research Paper section; a Getting Started Guide; and Step-by-Step Instructions that connect your submitted artifact to the submitted research paper;
  - the artifact itself (guidance on packaging can be found in the next section);
  - copies of any source code contained within the artifact.
About the Archive
The README.md file will be one of the first parts of the artifact that the reader encounters.
Please use this space to explain what is being submitted, how it relates to the submitted paper, what the reader should expect from interacting with the artifact, and how best to interact with it.
Moreover, alongside the minimum software requirements for interacting with the artifact and any external dependencies it relies upon, an explicit manifest of the other useful software included in the container/virtual machine (for example, which editors are available) would be useful.
Link to Research Paper
The Link to Research Paper section will detail how the research artifact links to the research paper.
Explicitly it should list:
- Claims from the paper supported by the artifact, and how/why.
- Claims from the paper not supported by the artifact, and how/why.
Example: performance claims cannot be reproduced in the VM, the authors are not allowed to redistribute specific benchmarks, etc. Artifact reviewers can then center their reviews/evaluation around these specific claims.
Getting Started Guide
The Getting Started Guide should contain setup instructions (including, for example, a pointer to the VM player software, its version, passwords if needed, etc.) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes.
Reviewers will follow all the steps in the guide during an initial kick-the-tires phase.
The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.
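As a rough sketch only, assuming a Docker-packaged artifact and using purely hypothetical image, script, and log names, the command portion of a Getting Started Guide might look like this:

```sh
# Hypothetical kick-the-tires sequence; substitute your own image and script names.
docker load -i aplas22-artifact-image.tar.gz        # import the provided image
docker run --rm -it aplas22-artifact:latest bash    # start an interactive shell in the container

# Inside the container: a smoke test that completes within a few minutes
./run_smoke_test.sh
# Expected output: "SMOKE TEST PASSED" (the full log is in expected/smoke_test.log)
```

Keeping this sequence to a handful of commands makes it realistic for reviewers to complete the kick-the-tires phase within the 30-minute budget.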
Step-By-Step Instructions
The Step-by-Step Instructions explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out and explain how to run it on smaller inputs.
Where appropriate, include descriptions of and links to files (included in the archive) that represent expected outputs (e.g. the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to be ignored, explain which ones they are.
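As a sketch of how the four required sections fit together (the paper title, wording, and file names below are illustrative only), a README.md might be laid out as follows:

```markdown
# Artifact for "Paper Title" (APLAS 2022)

## About the Archive
What is included, how it relates to the paper, minimum software requirements,
external dependencies, and other useful software bundled in the image (editors, etc.).

## Link to Research Paper
- Claims supported by the artifact, with section/table references and how to check them.
- Claims NOT supported by the artifact, with a rationale (e.g. licensing or hardware limits).

## Getting Started Guide (approx. 30 minutes)
How to obtain, load, and start the container or VM, plus a quick smoke test.

## Step-by-Step Instructions
How to reproduce each experiment, expected runtimes, smaller inputs for long-running
experiments, pointers to expected outputs (e.g. expected/*.log), and warnings that
can safely be ignored.
```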
Why Copies of the submitted code?
We require copies of source code/documentation alongside the artifact because a container is sometimes not the most convenient way to view the source code. Think of it this way: the provided source code is the sheet music, and the container is a recording of it in action.
Containing the Artifact
When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members will have a limited time in which to make an assessment of each artifact.
Your artifact should have a container or a bootable virtual machine image with all of the necessary libraries installed.
We strongly recommend use of the following technologies to contain the artifact:
- Docker (https://www.docker.com)
- VirtualBox (https://www.virtualbox.org)
Other technologies that authors may find helpful are:
- Packer (https://www.packer.io) for scripting, and staging, container/virtual machine creation.
Using a container or a virtual machine image provides a way to make an easily reproducible environment — it is less susceptible to bit rot. It also helps the AEC have confidence that errors or other problems cannot cause harm to their machines.
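As a minimal sketch of the container route (the base image, packages, and directory layout below are assumptions to adapt, not requirements), a Dockerfile for an artifact might look like this:

```dockerfile
# Sketch only: adjust the base image and dependencies to your artifact's needs.
FROM debian:bullseye-slim

# Install only what the artifact needs, and clean the package cache to keep the image small.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential python3 && \
    rm -rf /var/lib/apt/lists/*

# Copy a working installation of the artifact alongside its sources and build scripts.
COPY . /artifact
WORKDIR /artifact

# Default to an interactive shell so reviewers can follow the README from inside the container.
CMD ["/bin/bash"]
```

A pre-built image produced from such a Dockerfile can then be exported (for example with docker save) and shipped inside the archive, so reviewers do not need to rebuild it.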
It would also be prudent to be mindful of the final size of the resulting container/virtual machine. Anything greater than 1 GB can be a hefty download for those not on university networks. There are small, lightweight Linux distributions that can help reduce the virtual machine/container size to a couple of hundred megabytes rather than several gigabytes. One can also strip the virtual machine of unnecessary software; there are guides available on how to do this. Moreover, consider whether a full GUI is necessary or whether a text-based interface is good enough for submission.
We stress that authors should consider how they can reduce the size of their artifact to only the necessary components.
You should upload your artifact to Zenodo and submit the Zenodo link. Please use open formats for documents.
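For instance (the archive and image names here are purely illustrative), the final packaging and upload steps for a Docker-based artifact could be:

```sh
# Export the pre-built container image and bundle it with the README and sources
# into a single archive in an open format.
docker save aplas22-artifact:latest | gzip > aplas22-artifact-image.tar.gz
tar czf aplas22-artifact.tar.gz README.md LICENSE src/ expected/ aplas22-artifact-image.tar.gz
# Upload aplas22-artifact.tar.gz to Zenodo and submit the resulting link via HotCRP.
```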
Discussion with Reviewers
We expect each artifact to receive 3-4 reviews.
Throughout the review period, reviews will be submitted to HotCRP and will be (approximately) continuously visible to authors. AEC reviewers will be able to continuously interact (anonymously) with authors for clarifications, system-specific patches, and other logistics to help ensure that the artifact can be evaluated. The goal of continuous interaction is to prevent rejecting artifacts for “wrong library version”-type problems.
Distinguished Artifacts
Based on the reviews and discussion among the AEC, one or more artifacts will be selected for Distinguished Artifact awards.
Evaluation Criteria, Badges, and Process
Note This text was adapted from existing text from the ESOP’22 & PLDI’22 AECs.
Evaluation Approach
Artifacts submitted to the APLAS 2022 AEC will be judged by how well the submitted artifact conforms to the expectations set in the supplied documentation, and by how well it supports the paper’s claims.
The specific high-level criteria guiding our evaluation are:
Consistency with the paper
The artifact should reproduce the same results, modulo experimental error, as the paper where possible. Not all reported work, however, is reproducible within an isolated virtual environment. Take, for example: work that compares with, or involves, closed-source software, software with restrictions on use and export, or software that requires one to sign up to use; performance studies with experiments that work only on certain bare-metal machines; and distributed systems research that requires an experimental setup, such as a data centre, that is neither reproducible nor readily accessible.
In these cases, we ask that authors make a reasonable & good-faith effort to present their paper’s claims when producing their artifact.
For studies relying on unobtainable or hard-to-obtain software, this could involve reporting on a different aspect of the result whilst also providing information on how to reproduce the paper’s main results; for example, a showcase of what the tool can do in light of not being able to obtain the software it is compared against.
For performance studies (and distributed systems), this could involve the artifact documenting the paper’s claims (using the same experimental software) but for a different setting.
If authors have any doubts or concerns, we stress that they should contact the AEC Chairs well in advance of the submission deadline.
Completeness
The artifact should reproduce all the results that the paper reports, and should include everything (code, tools, 3rd party libraries, etc.) required to do so.
That being said, and in line with the remarks for the previous criterion, some research will rely on software that is not easily obtainable; even where such software is accessible now, it might not be in the future.
As with the previous criterion: authors must make a reasonable & good-faith effort to package their artifacts as completely as possible. Where this is not possible, there should be a clear rationale as to why.
Documentation
The artifact should be well documented so that reproducing the results is easy and transparent.
Ease of reuse
The artifact provides everything needed to build on top of the original work, including source files together with a working build process that can recreate the binaries provided.
Note that artifacts will be evaluated with respect to the claims and presentation in the submitted version of the paper, not the camera-ready version.
Badges
Upon successful evaluation, an artifact submitted to the APLAS 2022 AEC will be awarded badges that detail how well the evaluation criteria have been satisfied.
The three badges are:
- Accessible: has the artifact been made publicly available?
- Verified: does the artifact support the claims made in the paper?
- Expandable: can the artifact be used as the basis for further research by others?
Artifacts awarded one or both of the Verified and Expandable badges are referred to as accepted.
After decisions on the Verified and Expandable badges have been made, authors of any artifacts (including those not reviewed by the AEC, and those reviewed but not found Verified during reviewing) can earn an additional badge for their artifact being durably available.
We now detail the badges and their requirements.
Accessible
We hope that this will be the baseline outcome for all submitted artifacts. The Accessible badge will be awarded automatically if the artifact has been made available publicly on Zenodo. We strongly suggest, but do not require, that Verified artifacts are also Accessible ones. The reason for this is that not all artifacts can be disseminated publicly. We ask that if the archive cannot be shared publicly that the rationale be clearly stated and documented in the artifact submission.
Verified
The Verified badge will be awarded if the artifact supports the claims made in the paper.
This is the baseline outcome for attesting how well the artifact supports the claims made in the submitted paper.
In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g. benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper.
If the artifact claims to outperform a related system in some way (for example, in time or accuracy) and the other system was used to generate new numbers for the paper (for example, an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this expected behaviour.
If there are claims that cannot be substantiated by the artifact, for whatever reason, such deviation must be clearly documented and well substantiated.
Deviations from this ideal must be for good reason. A non-exhaustive list of justifiable deviations includes:
- Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (for example, licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code).
In such cases, all available benchmarks should be included.
If all benchmark data from the paper falls into this case, alternative data should be supplied: providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
- Some of the results are performance data, and therefore exact numbers depend on the particular hardware.
In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results (for example, that a certain optimization exhibits a particular trend, or that comparing two tools one outperforms the other in a certain class of cases).
- In some cases, repeating the evaluation may take a long time. Reviewers may not reproduce full results in such cases.
- In some cases, the artifact may require specialized hardware (e.g., a CPU with a particular new feature, or a specific class of GPU, or a cluster of GPUs).
For such cases, authors should contact the Artifact Evaluation Chairs as soon as possible after the Phase 1 (‘Kick-The-Tyres’) notification to work out how to make these possible to evaluate.
In past years, one outcome was that the authors of an artifact requiring specialized hardware paid for a cloud instance with the hardware, which reviewers could access remotely.
Expandable
This final badge may only be awarded to artifacts judged Verified.
The Expandable badge is awarded to artifacts that reviewers feel are particularly well packaged, documented, designed, etc., to support future research that might build on the artifact. For example, if it seems relatively easy for others to reuse the artifact directly as the basis of a follow-on project, the AEC may award an Expandable badge. For a binary-only artifact to be considered Expandable, it must be possible for others to directly use the binary in their own research; for example, a JAR file with very high-quality client documentation that allows someone else to use it as a component of their own project.
Artifacts with source can be considered Expandable if:
- they can be reused as components;
- others can learn from the source and apply the knowledge elsewhere (e.g., learning an implementation or proof/formalization technique for use in a separate codebase); or
- others can directly modify and/or extend the system to handle new or expanded use cases.
Evaluation Process
To maintain the separation of paper and artifact review, authors will only be asked to upload their artifacts after their papers have been accepted. Authors planning to submit to the artifact evaluation should prepare their artifacts well in advance of the artifact submission deadline to ensure adequate time for packaging and documentation.
Throughout the artifact review period, submitted reviews will be (approximately) continuously visible to authors.
Reviewers will be able to continuously interact (anonymously) with authors for clarifications, system-specific patches, and other logistical help to make the artifact reviewable.
The goal of continuous interaction is to prevent rejecting artifacts for minor issues that are not research-related at all, such as a “wrong library version”-type problem.
The conference proceedings will include a discussion of the continuous artifact evaluation process.
All communications will happen using the APLAS’22 AEC HotCRP instance.
The review process will consist of three phases:
- Phase 1 ‘Kick-The-Tyres’
- Phase 2 ‘Full Review’
- Phase 3 ‘Iron-out-the-Wrinkles’
More details about the reviewing process are available in Reviewer Information.
Evaluation Criteria
For the two initial phases, reviewers will be asked to ‘grade’ the artifact against the following explicit criteria.
‘Kick-The-Tyres’
- Has the artifact been packaged correctly, and does it follow the submission requirements?
- Can the ‘Getting Started Guide’ be followed successfully to completion?
- Does the artifact sufficiently document which of the paper’s claims it supports and which it does not? This includes, where appropriate, detailing any deviations from the claims made in the paper and ensuring that such deviations are sufficiently well documented and rationalised.
Main Review
For awarding the Verified Badge:
- The artifact runs out-of-the-box;
- The artifact includes all relevant code, dependencies, input data (e.g., benchmarks);
- The documentation is sufficient to run and reproduce the results claimed in the paper now, and also in the future;
- The artifact represents a reasonable & good-faith effort to present the paper’s claims;
For awarding the Expandable Badge:
- The Reviewer feels that a future consumer of the artifact would be on a curated tour;
- The artifact documents how the artifact can be extended;
- The artifact documents a correspondence between the proofs/code in the paper and that in the artifact;
- The Reviewer has confidence that the artifact can be easily reused in the future as the basis of a follow-on project;
- The Reviewer has confidence that the artifact can be easily modified in the future and/or extended to handle new or expanded use cases.
Grading Scheme
The grading scheme we will use is:
- Strongly Disagree
- Disagree
- Agree
- Strongly Agree
Reviewer Information
The 20th Asian Symposium on Programming Languages and Systems (APLAS’22) is forming its first Artifact Evaluation Committee (AEC). The artifact evaluation process aims to promote, share, and catalogue the research artifacts of papers accepted to the APLAS research track. We are looking for motivated researchers at all academic stages (PhD students, researchers, lecturers, and professors) to join us on the inaugural APLAS’22 AEC.
Nomination Forms
The self-nomination form:
To nominate a colleague, please use this form.
As a committee member, your primary responsibility will be to review artifacts submitted by authors of accepted papers and to ensure that each artifact is a faithful representation of the accepted paper’s results. This will involve interacting with the tooling provided by the authors, and checking whether the results of the main paper are consistent with the claims in the paper and are also reproducible for researchers to come. APLAS will use a three-phase artifact review process: Kick-The-Tyres; Full Review; and Iron-out-the-Wrinkles. Instructions for chosen committee members will be made available once the committee has been formed.
We will close nominations on:
- Friday 8th July 2022 (AOE)
and notify the selected committee members on:
- Friday 15th July 2022 (AOE)
Important Dates (AOE)
- Author Artifact Submission: Thursday 18th August 2022.
- Reviewer Preferences Due: Tuesday 23rd August, 2022
- Review Process:
- Phase 1 ‘Kick-The-Tyres’ Review Due: Wednesday, 31st August, 2022
- Phase 2 ‘Full Review’ Due: Monday, 12th September, 2022
- Phase 3 ‘Iron-out-the-Wrinkles’ Due: Monday 19th September, 2022
- Author Notification: Thursday 22nd September 2022.
We expect the majority of the reviewing process to be performed between 22nd August 2022 and 19th September 2022.
Reviewing Process
We expect each artifact to take, on average, eight hours to review, and we will look to assign each reviewer three to four reviews. For each artifact we will assign a Lead Reviewer to lead the reviewing process.
The review process is highly interactive: you will be communicating anonymously with the authors, and you will know the identity of your fellow reviewers.
All communications will happen using the APLAS’22 AEC HotCRP instance.
Phase 1 ‘Kick-The-Tyres’
The aim of the first phase is to ensure that the artifacts are ready for reviewing. The first phase of the review process will require reviewers to check that they can:
- Obtain the artifact using the provided instructions.
- Go through a ‘Getting Started Guide’ to ensure the artifact is fit for the main review.
Each reviewer will be asked to submit a short review based on these checks. These initial reviews will be immediately available to authors, who will be able to communicate with the reviewers to address any issues found.
Phase 2 ‘Full Review’
The aim of the second phase is to conduct a thorough assessment of the artifact against the paper, and to submit full, complete reviews that extend and expand upon the initial Phase 1 reviews as necessary. As before, these reviews will be immediately available to authors, and they can communicate with you through HotCRP.
During this phase you will decide whether or not the submitted artifact satisfies the main criteria for Badges.
Phase 3 ‘Iron-out-the-Wrinkles’
We expect the majority of evaluations to be complete after the initial two phases. The third phase, however, is for artifacts whose review process still has open issues after Phase 2. This additional phase will give authors and reviewers extra time to discuss and address any pertinent issues that stop the artifact from being evaluated.
Evaluation Guidelines
SIGPLAN has produced some guidance for reviewing empirical evaluations.
https://www.sigplan.org/Resources/EmpiricalEvaluation/
The ECOOP 2018 Committee produced some guidance for reviewing proof artifacts:
https://proofartifacts.github.io/guidelines/ecoop_guidelines.html
Some more general guidance for proof artifacts is available at:
https://proofartifacts.github.io/guidelines/
This call was adapted from the PLDI’22/ESOP’22 AEC reviewer information guides.