2021-22
Effort: ~60 hours per student
Credit: 40% of overall module mark
Team size: 3-4 students.
Due date: Friday 4th March 2022, 16:00.
Handin: 2022/COMP6248/1/
Required files: team.txt
Due date: Friday 13th May 2022, 16:00.
Handin: 2022/COMP6248/3/
Required files: paper.pdf; mark_split.pdf; team.txt
For the last few years the International Conference on Learning Representations (ICLR; pronounced “i-clear”) has run a challenge encouraging researchers to try to reproduce the work presented in papers submitted to the conference. The motivation is to address a wider problem: the lack of reproducibility of published results in AI papers. This coursework takes direct inspiration from that challenge.
In this coursework you will form a team and choose a paper submitted to ICLR 2020 or 2021, or to NeurIPS 2020 or 2021. You will attempt to reproduce the results of your chosen paper and report back on your findings.
Students will form teams of three to four members. I’ve made a page on the student wiki to help you form teams: https://secure.ecs.soton.ac.uk/student/wiki/w/COMP6248-2122-classlist and have set up a channel in Teams where you can talk to others in the class about finding a team.
Each group will propose a paper to reproduce. All papers submitted to ICLR are publicly available on the OpenReview website (https://openreview.net/group?id=ICLR.cc) together with their reviews. You can pick a paper that was accepted, or one that was not. The lab demonstrators and lecturers can offer advice on good and bad papers to try. Be careful not to pick a paper that requires an excessive amount of computational power (some papers report experiments using hundreds of GPUs!). In some cases the paper you pick might include experiments that require more resources than you have available; in such cases it is fine to reproduce constrained or reduced versions of the experiments (e.g. with lower-dimensional inputs, smaller datasets or fewer layers).
With your selected paper you should aim to replicate the experiments described in the paper. The goal is to assess if the experiments are reproducible, and to determine if the conclusions of the paper are supported by your findings. Your results can be either positive (i.e. confirm reproducibility), or negative (i.e. explain what you were unable to reproduce, and potentially explain why).
Essentially, think of your role as an inspector verifying the validity of the experimental results and conclusions of the paper. Your findings (including any code that you write) will be made publicly available in a repository in the “COMP6248 Reproducability Challenge” GitHub organisation (https://github.com/organizations/COMP6248-Reproducability-Challenge), and will serve as a useful resource for others in the future.
You do not need to reproduce all of the experiments in your selected paper. For example, the authors may propose a new method that requires more GPUs than you have access to, but also present results for a baseline method (e.g. simple logistic regression); in that case you could elect to reproduce only the baseline results. It is sometimes the case that baseline methods are not properly implemented, or that the hyper-parameter search is not performed with the same degree of attention. A minimal sketch of such a baseline is given below.
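By way of illustration only (this is not taken from any particular paper; the synthetic data, dimensions and hyper-parameters are placeholders), a logistic-regression baseline of the kind mentioned above can be written in a few lines of PyTorch:

```python
import torch
from torch import nn, optim

# Illustrative sketch only: a logistic-regression baseline in PyTorch.
# The synthetic data, dimensions and hyper-parameters are placeholders.
torch.manual_seed(0)

num_features, num_classes = 20, 2
X = torch.randn(1000, num_features)      # placeholder inputs
y = (X[:, 0] > 0).long()                 # placeholder binary labels

# Logistic regression: a single linear layer trained with a cross-entropy loss.
model = nn.Linear(num_features, num_classes)
criterion = nn.CrossEntropyLoss()
optimiser = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimiser.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimiser.step()

with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {accuracy:.3f}")
```

Even a baseline this simple involves choices (learning rate, number of epochs, regularisation) that the original paper may not fully specify, and these unstated details are often where reproducibility problems begin.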
If available, the authors’ code can be used (authors of ICLR submissions are encouraged to release their code when they submit their work for review). Alternatively, the methods can be re-implemented from the description in the paper. This sets a higher bar for reproducibility, but may help in detecting anomalies in the code, or in shedding light on aspects of the implementation that affect the results. Note that if you use the authors’ code alone we will expect a significant amount of analysis and experimentation, whereas if you write your own implementation less analysis and experimentation will be expected.
There are two deliverables for this coursework; the due dates and required files are listed at the top of this document.
Each team will receive an overall mark (broken down into sub-categories). Individual marks will be assigned based on a split decided by the team. Full details below:
This assignment should allow you to demonstrate knowledge and understanding of:
- Using deep learning libraries to create and evaluate network architectures.
- Critical appraisal of recent scientific literature in deep learning.
The conference paper and accompanying material will be marked as a single piece of work using the following criteria:
| Criterion | Description | Marks |
|---|---|---|
| Experimentation and Analysis | Analysis of the chosen paper, development of an approach to reproduce the findings, and use of this approach to generate results | 45 |
| Reflection | Reflection on what your experimental results tell us in relation to those in the paper you are reproducing | 45 |
| Reporting | Clear and professional reporting | 10 |
Standard ECS late submission penalties apply.
Written group feedback will be given covering the above points, and will be emailed out once marking is complete.
Team members should agree between themselves as to how the marks awarded for the team submission will be divided between the team members (see below for instructions on how to proceed if this is not possible). The Team Leader should print out the form here, complete it as agreed and arrange for every member of the team to sign and date it. The completed signed form must be submitted via the ECS Handin system with the conference paper. An incomplete form (e.g. with missing signatures) means that the entire ECS Handin submission is incomplete and therefore subject to penalties.
Teams are encouraged to split the work evenly between all team members (in which case the marks are split evenly). They are advised to consider any proposed non-uniform distribution very carefully before submission. Note that an individual contribution of zero is acceptable and will result in that team member being effectively removed from the team. One or more individual contributions of 10% or less may result in an ad-hoc reduction in the effective team size. Any proposed non-uniform distribution will be discussed with the team after submission and may be subject to modification by the Module Leader at that stage.
Teams are advised to make every effort to agree on the marks distribution because failure to agree will be interpreted as demonstrating a general lack of competence. However, the procedure to follow if there is no agreement is set out below:
The team should divide into two or more subteams (in the worst case, a team of size N could have N subteams). Each subteam should elect a subteam leader, who should make a full submission as detailed above. Each marks distribution form submitted should indicate the proposed percentages of the overall team marks to be allocated to the members of that subteam, together with a written one-page explanation of why such an allocation would be appropriate. Note that any attempt by a team member to exploit the advice above (that teams should make every effort to agree) by, for example, refusing to sign the marks distribution form will not be successful; in the unlikely event that this happens, each individual should make a brief signed statement as to the facts of the case and submit it with the other documentation.
The final marks breakdown for a team that fails to agree will be determined by the Module Leader, taking all relevant factors into account. This decision will be final.
If you need computational resources:
If you have any problems/questions then email or speak to Jon.