
Frequently Asked Questions

The following provides answers to some frequently asked questions about different aspects of the reproducibility checks and our data and code availability policy.

Scope of the reproducibility checks

 

The purpose of the reproducibility checks carried out at the Econometrics Journal is to verify three aspects of the replication package: (i) it is complete, in the sense of producing each table, figure, and in-text number in the paper and its appendices, including those online; (ii) it is self-contained, in the sense of not requiring a subprogram or module not included in the package; and (iii) the data and code are adequately documented for other researchers to be able to use them to replicate the results in the paper. When the data are accessible (included in the package or, in case of exemptions, via temporary access by the reproducibility team), the checks ensure that the code exactly reproduces the results in the paper and its appendices. In the case of a data exemption, authors may provide simulated or synthetic data to check that the code runs and produces all output, but the exact results cannot be checked.
We conduct reproducibility checks, not replication checks. This means that our checks do not screen for coding errors, discrepancies between what the paper claims the code does and what it actually does, econometric errors, or whether the empirical approach followed in the paper can be reproduced in other environments or on other datasets.

 

Yes, the replication package should produce each table, figure, and in-text number in the paper and its appendices, including those online. All these codes are checked for their ability to produce the results in the paper and appendices.

 

We firmly believe that reproducibility and replicability are the main pillars of science. The nature of replication checks requires time, effort, and resources that journals typically do not have: the publication process should be speedy for science to advance at the right pace. Our reproducibility checks provide a necessary first step: to ensure that authors publish all available data and the codes that generate the results they present in the papers we publish, and, importantly, to check that these codes and data run and produce the published results. The certification that we provide enhances transparency, since it assures that other researchers can reproduce the published research and test it against other datasets, assumptions, methods, etc. It also provides an additional service to the authors, as we often detect small errors that are better amended before publication than in an erratum afterwards.

 

Data and Code Availability Policy and Exemptions

 

Even publicly available data should be included in the replication package to ensure they remain available in the future for anyone who wants to replicate your results. The only exception is when your exact extract is published in a "trusted" repository (see the following list for guidance) with a permanent DOI. This is important, because datasets are often updated (or removed) by the provider, and your version of the data may no longer be available to researchers in the future.

 

Each provider has a different policy regarding re-distribution of original and transformed datasets. Some providers, for example, allow re-distribution as long as your extract is deposited in a specific repository. You should verify any restrictions on publishing your data before first submission. You should also seek permission from the original owner of the data to publish them, and cite the original source accordingly.

 

Yes, you can request an exemption on the grounds that the data are restricted-access. The request should be made at the time of initial submission, in a cover letter addressed to the Editor. The Editor in charge of your submission will determine whether your request is justified before sending the paper to referees. If the Editor decides against the exemption, the manuscript will not be sent to referees, and you will be requested to accept the data and code availability policy; otherwise, the paper will be rejected. When an exemption is needed for a dataset that is incorporated into the analysis during the editorial process, the exemption should be requested at the first iteration in which the new data are incorporated.

 

Yes, provided that the request is made at the time of initial submission.

 

Yes. If you do not request a data exemption at the time of your first submission, you will be required to publish all the data used in your paper.

 

Yes. The data to produce all results in the paper and appendices, including those online, should be shared unless an exemption is requested and granted at the time of first submission.

 

Yes. Unless you are granted an exemption at the time of first submission, you will be required to publish in the replication package all data to produce all results in the paper and appendices, including those online.

 

In general, no. Later exemptions can only be requested for new data that are incorporated into the analysis during the editorial process. If your data cannot be published and you did not request an exemption at the time of initial submission, your paper may be rejected for publication at the Econometrics Journal.

 

Yes. Whenever the data used for the analysis in the paper cannot be published with the replication package (or in an open-access "trusted" repository; see the following list for guidance on what constitutes a "trusted" repository), an exemption needs to be requested at the time of first submission. An exemption is not required only if the exact extract used in the study is published in the repository and is readily available in the exact format called by the code.

 

No. Data archived in "trusted" open repositories (see the following list for guidance) are acceptable in the replication package, provided the published version is the exact extract used in the study and is readily available in the exact format called by your code. The Data Editor will evaluate the suitability of the repository.

 

Data archived in "trusted" open repositories (see the following list for guidance) are acceptable in the replication package, provided the published version is the exact extract used in the study and is readily available in the exact format called by the code. The Data Editor will evaluate the suitability of the repository and whether a copy also needs to be published with the package on the journal's repository.

 

Yes. Personal websites are not considered "trusted" open repositories, because there is no guarantee that the package will be systematically archived. See the following list for guidance on what constitutes a "trusted" repository.

 

No. The goal of our data and code availability policy is to ensure transparency and reproducibility of research, and this requires publishing the data you collected. If others can use your data, your research will gain visibility.

 

Yes. Restricted-access data are generally discouraged, but when your research relies on a specific dataset and cannot be conducted on an open alternative, those data are eligible for an exemption. However, you may be requested to provide a certification from the provider indicating that the data will be archived and made available to other users who follow the same access procedure.

 

In general, no. Data should be anonymized to ensure that subjects cannot be identified. Only when the nature of the study prevents such anonymization can the authors request a data exemption, which will cover only the minimum required to ensure the anonymity of the experimental subjects.

 

No.

 

Yes. Open-source software is encouraged, but proprietary software is allowed.

 

Whenever possible, yes. If these packages or libraries are available in open repositories (e.g. most Stata packages), a clear indication of how to download and use them is sufficient. If the libraries cannot be included in the package and are not publicly available, the Data Editor will contact the authors to coordinate a feasible way to implement the checks.

 

Procedures when Exemptions Are Granted

 

If you are granted a data exemption, your paper still needs to go through reproducibility checks before final acceptance. To do so, you can either (i) grant temporary (remote or physical) access to the data to the reproducibility team for the sole purpose of the checks (the data will be destroyed or access terminated after the checks), or (ii) supply simulated or synthetic dataset(s) instead of the one(s) used in the analysis.

 

A simulated dataset is generated by a model (ideally, your model). A synthetic dataset is a scrambling or perturbation of the actual dataset to ensure anonymity.
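To make the distinction concrete, here is a minimal Python sketch; the linear model, sample size, and seed are illustrative assumptions, not journal requirements:

```python
import numpy as np

rng = np.random.default_rng(12345)  # fixed seed so the example is reproducible

# Simulated dataset: drawn from a model, here an assumed linear DGP
# y = 1 + 2*x + e, standing in for the model estimated in the paper.
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
simulated = np.column_stack([y, x])

# Synthetic dataset: a perturbation of the "actual" data (here `simulated`
# stands in for it), with rows shuffled so no record maps back to a subject.
noise = rng.normal(scale=0.1, size=simulated.shape)
synthetic = rng.permutation(simulated + noise)
```

Either object has the same shape and variable structure as the real data, which is what the reproducibility checks need in order to run the code end to end.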

 

Whenever feasible, we strongly recommend providing temporary access to restricted data. There are numerous advantages of this approach: (i) it saves the effort of producing synthetic or simulated datasets; (ii) the certification provided by the journal is stronger in the sense that we certify that we have been able to reproduce the results published in the paper as opposed to only checking that the code is complete, runs, and produces output for all tables, figures, and in-text numbers published in the article and its printed and online appendices; (iii) we can detect if the results cannot be reproduced, which gives the authors a chance to fix any errors before publication.

 

The reproducibility team will treat the data with the highest ethical standards, preventing any violations of confidentiality, and using them exclusively to run the reproducibility checks. The restricted datasets will be destroyed as soon as the checks are performed and, therefore, they will not be published.

 

Even if you cannot provide direct access to the reproducibility team, this option is preferred to the simulated/synthetic dataset alternative as long as the checks can be executed in a reasonable amount of time. In this case, you need to supply the replication package to the journal along with the contact details of the data provider. The reproducibility team will send the code to the provider, and the provider will send the output back to the team, who will check the results.

 

This option is still generally preferred to the simulated/synthetic dataset alternative. However, you should seek approval from the Data Editor before making any commitments to the certification agency. The Econometrics Journal will NOT be able to cover the cost of certification.

 

If this option is available, it is generally preferred to the simulated/synthetic dataset (though less preferred than providing temporary access to the original data), as long as the testing sample can be published with your package. Otherwise, a simulated/synthetic dataset that can be published with the package is preferred.

 

The simulated/synthetic dataset will be published with the replication package. Even if these are not the real data, their structure, which by design will largely mimic the actual dataset, will give readers a better sense of your data. Please make sure the manipulations used to produce the synthetic/simulated datasets are described in the ReadMe file.

 

Our view is that, when reproducibility checks cannot be performed on real data, there is still an advantage to running them on simulated/synthetic datasets: they are still useful to make sure the code is complete and self-contained, and that it runs without errors.

 

In this case, we strongly recommend simulating data using your model as the data-generating process. If that is not feasible, please contact the Data Editor explaining in detail why this is the case. The Data Editor will either assist you in the process or make a proposal to your original Editor about how to handle the situation.

 

If the goal is to generate a dataset that mimics the characteristics of the original one, the synthetic option may be easier: there are many open-source routines that do it for you. However, there are also two main disadvantages: (i) you need to make sure that your scrambling/perturbation algorithm correctly anonymizes the data; and (ii) non-linear estimation routines may not converge on synthetic data, whereas they are more likely to converge on an artificial dataset generated by the model that you are estimating.

 

There are multiple ways to generate it. You can find links to helpful resources, mostly in R, here, here, here, and here.
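As one minimal illustration (a Python sketch rather than the R resources mentioned above; the column-wise scrambling rule is an assumed choice, and whether it anonymizes adequately must be checked for your data):

```python
import numpy as np

def scramble_columns(data, seed=0):
    """Return a synthetic copy in which each column is permuted independently.

    Marginal distributions are preserved exactly, but the joint structure
    (and hence the link between any row and a real subject) is destroyed.
    """
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)  # copy; leave the original untouched
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out

actual = np.arange(20.0).reshape(10, 2)   # stand-in for the real dataset
synthetic = scramble_columns(actual)
```

Note that preserving marginals can still be identifying when values are rare, so a recipe like this must be validated against your confidentiality requirements before use.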

 

Implementation of the Reproducibility Checks

 

We will provide you with the outcome of our reproducibility checks as fast as possible. If the package is not complete or the code does not run, more than one iteration may be required, which increases the processing time. Articles whose code requires a relatively long running time may also take longer. The processing time also depends on how responsive the authors are to our requests.

 

The reproducibility checks are handled by our Data Editor and our reproducibility team: a team of advanced Ph.D. students hired to carry out the checks under the supervision of the Data Editor. Once an article is accepted for publication at the Econometrics Journal, the authors are requested to submit the replication package along with the other production files. Upon submission, the Data Editor assigns the package to one or several members of the reproducibility team. The reproducibility team provides the Data Editor with a report summarizing the outcome of the checks. After reviewing it, the Data Editor contacts the authors to inform them about the outcome of the reproducibility checks and, if necessary, requests that they amend the package. Once the reproducibility checks are completed, the article is transferred back to the original Editor, who is in charge of final acceptance. If results in the paper need to be modified as a result of the checks, the original Editor in charge will be responsible for approving these changes before acceptance. If these changes imply a modification of the message of the paper, the original Editor can decide to reject the paper. Final acceptance is conditional on full reproducibility.

 

Yes. Upon submission, the Data Editor assigns the package to one or several members of the reproducibility team, who will run your code and check the output generated. The reproducibility team provides the Data Editor with a report summarizing the outcome of the checks. In some instances, the code is too demanding to be run in a reasonable amount of time. In such cases, the Data Editor will be in contact with you with a recommendation for supplying a simplified version of the code that allows testing the essential parts of the code.

 

If the code is too demanding to be run in a reasonable amount of time, the Data Editor will contact you with a recommendation for supplying a simplified version of the code that allows testing its essential parts. For example, this can entail a reduced number of replications of a simulation exercise, the code that solves a structural model for a given set of parameters, a simplified function to test an optimization routine, etc. Such a simplified "testing" version will be published along with the original code in your replication package. This is because we believe these testing versions are extremely useful for other researchers who want to understand and use your code for replication or for related research, enhancing transparency and increasing the visibility of your research.

 

If the data and code that you provided fail to reproduce the results in the paper, the Data Editor will contact you to identify the source of the discrepancy. Once the reproducibility checks are completed, if the discrepancy implies a change in the results presented in the paper or online appendices, even if minor, the Data Editor will notify the original Editor in charge. The Editor in charge will be responsible for approving these changes before acceptance. If these changes imply a modification of the message of the paper, the original Editor can decide to reject the paper. Final acceptance is conditional on full reproducibility.

 

The Data Editor will be in contact with you indicating the amendments and additions that need to be done to the replication package to pass the reproducibility checks. Once amended, the revised package will go through the checks again.

 


 

We need you to submit the entire package again because updating the replication package ourselves increases the risk that the files you intend to submit for possible publication are mishandled.

 

Content of the Replication Package

 

The replication package should include the following information:

  • A ReadMe file called ReadMe.pdf in PDF format. We recommend using the following template, which includes all the information required by the journal. Specifically, the ReadMe should include the following information:
    • Description of the content of the package (datasets, programs, folders, etc.)
    • Data Availability Statement: precise indications on how the data were obtained, including required registrations, memberships, application procedures, monetary cost, or other qualifications, and, if applicable, URL to download them (which is typically part of the data citation).
    • Precise instructions on how to run the code.
    • Indications on where to find each output saved/displayed in the package's output.
    • Software requirements, including the software version and operating system used by the authors.
    • All packages and libraries that need to be installed to run the code and a clear indication on how to obtain them.
    • Expected running time (even if it is a few seconds). When relevant, include the hardware that the estimated time refers to.
    • Data citations: all datasets used in the paper (with no exceptions) should be listed in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file. You can find some examples on page 7 of this document.
  • The raw datasets used in the paper and online appendices, including complete, transparent, and precise documentation describing all variables. You can additionally provide the analysis data if this is helpful, but they are not required if the raw data are provided. If you were granted a data exemption at the time of first submission (see here, here, and here for details), you should either provide temporary access to the reproducibility team for the sole purpose of the reproducibility checks, or submit a synthetic/simulated dataset that allows the code to run and produce all outputs in the paper and appendices, even if the results do not match those in the paper.
  • The data cleaning codes and the analysis codes that produce all reproducible outputs reported in the article, appendix, and online appendices (including figures, tables, and numbers reported in the text). If some results are produced without scripts (e.g. ArcGIS maps), the ReadMe file should include very detailed step-by-step instructions on how to produce that output. In the case of simulation/Monte Carlo studies, the authors are requested to set a seed so that the exact numbers that are reported can be obtained.
  • If data are provided in proprietary format (e.g. Stata's .dta), a copy of the data in non-proprietary format (e.g. ASCII, csv).
  • Additional documentation for experimental papers (if these files are part of the paper or of an appendix, copy them again in a separate document included in the replication package):
    • A document outlining the design of the experiment.
    • A copy of the instructions given to participants, in both the original language and an English translation.
    • Information on the selection and eligibility of participants.
    • A PDF copy of the Institutional Review Board (IRB) approval of one of the authors' institutions (IRB approval number, date, name of the institution) or an explicit mention that an exemption has been granted by the Editorial Board.
  • Confidential data provided to grant temporary access to the reproducibility team should be included in a separate folder, "4 Confidential data not for publication", to ensure that it is treated as such and is not published.
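The seed requirement mentioned above can usually be met with a single line at the top of each simulation script; a Python sketch, where the seed value and the toy statistic are arbitrary illustrative choices:

```python
import numpy as np

def monte_carlo_mean(seed, n_reps=1000, n_obs=100):
    """Toy Monte Carlo exercise: average sample mean across replications.

    Because every draw comes from a generator seeded at the top, the
    reported number is identical on every run, as the checks require.
    """
    rng = np.random.default_rng(seed)
    return float(np.mean([rng.normal(size=n_obs).mean() for _ in range(n_reps)]))

# Two runs with the same seed reproduce the exact same reported number.
assert monte_carlo_mean(20240101) == monte_carlo_mean(20240101)
```

The same principle applies in any language (e.g. `set seed` in Stata or `set.seed()` in R): fix the seed once, before any random draws.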
 

Whenever possible, the easiest way is to provide a physical copy of your data by including it in a separate folder labeled "4 Confidential data not for publication" outside of the replication package. All replicators and the Data Editor have signed confidentiality agreements that prevent them from using the data for any purpose other than the reproducibility checks. When that option is not feasible, we recommend contacting our Data Editor to arrange the best way to provide access to the reproducibility team.

 

To ensure that you do not forget any element of the replication package. This avoids repeated iterations and speeds up the process.

 

Yes, you should, and it is very important to do so. When submitted to production, your package is handled by different people at the Econometrics Journal and at the publisher, not all of whom are familiar with data and code. Respecting the folder structure ensures that your package is published correctly.

 

The ReadMe file should be called ReadMe.pdf and should be in PDF format. We recommend using the following template, which includes all the information required by the journal. Specifically, the ReadMe should include the following information:

  • Description of the content of the package (datasets, programs, folders, etc.)
  • Data Availability Statement: precise indications on how the data were obtained, including required registrations, memberships, application procedures, monetary cost, or other qualifications, and, if applicable, URL to download them (which is typically part of the data citation).
  • Precise instructions on how to run the code.
  • Indications on where to find each output saved/displayed in the package's output.
  • Software requirements, including the software version and operating system used by the authors.
  • All packages and libraries that need to be installed to run the code and a clear indication on how to obtain them.
  • Expected running time (even if it is a few seconds). When relevant, include the hardware that the estimated time refers to.
  • Data citations: all datasets used in the paper (with no exceptions) should be listed in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file. You can find some examples on page 7 of this document.
 

Yes, this is requested by our Data and Code Availability Policy.

 

The PDF format is portable, which means that it can be transferred without having to worry about dependencies, fonts, etc. This ensures readability across platforms and users.

 

Some users of your replication package may not have access to the specific proprietary software that you used for your study. Providing a non-proprietary copy ensures that they can access your data without problems. It also minimizes compatibility issues (e.g., old versions of Stata cannot open files saved by newer versions).
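For instance, a Stata .dta file can be re-exported to CSV in a few lines; a sketch using pandas, where the file and variable names are made up for illustration:

```python
import pandas as pd

# Toy data standing in for a real analysis dataset.
df = pd.DataFrame({"wage": [10.5, 12.0, 9.8], "educ": [12, 16, 10]})

# Save a proprietary .dta copy, then re-export a portable CSV alongside it.
df.to_stata("example.dta", write_index=False)
pd.read_stata("example.dta").to_csv("example.csv", index=False)
```

Including a small conversion script like this in the package also documents exactly how the two copies of the data relate to each other.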

 

Data Citations

 

All datasets used in the paper (with no exceptions) should be cited both in the paper and in a dedicated section of the ReadMe file.

 

Yes, all datasets used in the paper (with no exceptions) should be listed in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file.

 

You should cite all datasets used in the paper (with no exceptions) in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file. You can find some examples on page 7 of this document. More specific guidance on data citations is available here.

 

Data citations are as fundamental as citations to other papers, if not more. Giving proper credit to data providers is in line with all scientific ethical standards. Moreover, giving proper credit to data providers ensures that they can keep receiving external funding to make their datasets publicly available for research.

 

Reproducibility Certification, Publication of the Replication Package and Copyright Issues

 

The empirical/simulation/experimental papers that we checked include the following statement: "The data and codes for this paper are available at [...]. They were checked for their ability to reproduce the results presented in the paper."

This statement is adjusted accordingly when data exemptions are granted (acknowledging either that the authors provided temporary access to the confidential data or that the checks were implemented on simulated/synthetic data provided by the authors). In particular, we either certify "The authors were granted an exemption to publish their data because access to the data is restricted. However, the authors provided a simulated or synthetic dataset that allowed the Journal to run their codes. The synthetic/simulated data and codes are available at [...]. They were checked for their ability to generate all tables and figures in the paper, however, the synthetic/simulated data are not designed to reproduce the same results." or "The authors were granted an exemption to publish their data because access to the data is restricted. However, the authors provided the Journal with temporary access to the data, which allowed the Journal to run their codes. The codes are available at [...]. The data and codes were checked for their ability to reproduce the results presented in the paper.", depending on the case that is applicable. These statements are combined accordingly when more than one situation applies.

The statements are also adjusted when the nature of the algorithms is highly demanding and a partial/simplified version of the code has been used for the reproducibility checks: we add the sentence "Given the highly demanding nature of the algorithms, the replication checks were run on a simplified version of the code, which is also available at [...]" to the applicable statement.

 

After all reproducibility checks are completed, your package will be published along with your paper online.

 

Yes, as long as one copy is published with your paper. The only exception is when your replication package is published in a "trusted" repository (see the following list for guidance) with a permanent DOI. In that case, your DOI can be used to link your article with your package, and the Data Editor can waive the requirement to publish the package with your paper.

 

Each provider has a different policy regarding re-distribution of original and transformed datasets. Some providers, for example, allow re-distribution as long as your extract is deposited in a specific repository. You should verify any restrictions on publishing your data before first submission. You should also seek permission from the original owner of the data to publish them, and cite the original source accordingly. You will be responsible for any copyright infringement arising from what you publish with the replication package.