====== Zotero Translators ======
Translators are at the core of one of Zotero’s most popular features: its ability to import and export item metadata from and to a variety of formats. Below we describe how translators work, and how you can write your own.
This page describes the function and structure of translators. For in-depth documentation on how to write translator code, see [[dev/translators/Coding]].
**Note:** Before writing a translator for a site, look at the [[dev:exposing_metadata|documentation on exposing metadata]]; website authors should try embedding the necessary metadata before attempting to write a translator.
If you're looking for a broken translator to fix, see the [[https://zotero-translator-tests.s3.amazonaws.com/index.html|recent translator errors]] and check on one of the top reported errors. You can also check the status of many translators by reviewing the [[dev:translators:testing#running_tests|translator test overview]].
===== Translator Types - Web, Import, Export and Search =====
Zotero translators can operate in four different ways (note that translators are not necessarily restricted to a single type):
* **[[dev:translators:coding#web_translators|Web translators]]**: can be activated when visiting a website. E.g. with Zotero for Firefox, an icon appears in Firefox's address bar when a translator finds item metadata on the loaded webpage. Clicking this icon will activate the translator, saving the item metadata into your Zotero library.
* **[[dev:translators:coding#import_translators|Import translators]]**: can import item metadata from one of the standard storage formats, such as BibTeX or RIS, into your Zotero library. The data to be imported may be supplied as a file, through the operating system's clipboard, or it may be delivered through a web translator (in this case, the role of the web translator is typically restricted to retrieving the metadata, with the import translator doing the actual parsing).
* **[[dev:translators:coding#export_translators|Export translators]]**: can export item metadata from items in your Zotero library to a file in one of the standard storage formats (like BibTeX or RIS).
* **[[dev:translators:coding#search_translators|Search translators]]**: can look up and retrieve item metadata when supplied with a standard identifier, like a PubMed ID (PMID) or DOI.
===== Translator Structure =====
Zotero translators are stored as individual JavaScript files in the “translators” subdirectory of the [[/support/zotero_data#locating_your_zotero_library|Zotero data directory]]. Each translator contains a JSON metadata header, followed by the translator’s JavaScript code. This code must include certain top-level JavaScript functions, as determined by the translator type(s).
==== Metadata ====
An example of a JSON metadata header is shown below:
{
"translatorID": "fcf41bed-0cbc-3704-85c7-8062a0068a7a",
"label": "NCBI PubMed",
"creator": "Simon Kornblith, Michael Berkowitz, Avram Lyon, and Rintze Zelle",
"target": "https?://[^/]*(www|preview)[\\.\\-]ncbi[\\.\\-]nlm[\\.\\-]nih[\\.\\-]gov[^/]*/(?:m/)?(books|pubmed|sites/pubmed|sites/entrez|entrez/query\\.fcgi\\?.*db=PubMed|myncbi/browse/collection/|myncbi/collections/)",
"minVersion": "2.1.9",
"maxVersion": "",
"priority": 100,
"inRepository": true,
"translatorType": 13,
"browserSupport": "gcsbv",
"lastUpdated": "2014-06-20 04:23:04"
}
The roles of the different metadata fields are:
* **translatorID** \\ The unique internal ID by which Zotero identifies the translator. You must use a stable [[http://en.wikipedia.org/wiki/Globally_Unique_Identifier|GUID]], as the ''translatorID'' is used for automatic updating of translators, and for calling translators within other translators.
* **translatorType** \\ An integer specifying to which type(s) the translator belongs. The value is the sum of the values assigned to each type: import (1), export (2), web (4) and search (8). E.g. the value of ''translatorType'' is 2 for an export translator, and 13 for a search/web/import translator, because 13=8+4+1.
* **label** \\ The name of the translator.
* **creator** \\ The author(s) of the translator.
* **target** \\
* For [[dev:translators:coding#web_translators|web translators]], the ''target'' should specify a [[:dev:technologies#regular_expressions|JavaScript regular expression]] that is matched against the page URL. Escaping requires two backslashes: one for the regular expression itself, and one for the JSON, e.g. "^https?://(www\\.)?example\\.com/". Scaffold takes care of the JSON escaping, so backslashes entered there do not need to be escaped a second time.\\ When only matching a domain, the target should end in a forward slash, so that it only matches the non-proxied domain. Zotero takes care of de-proxifying the URL and passes the de-proxified URL to the translator.\\ Whenever a webpage is loaded, Zotero tests the target regular expressions of all web translators against the webpage URL. If a translator has a matching target, its ''detectWeb'' function is run. If this function finds item metadata, the Zotero translator icon appears or becomes active in the browser. When multiple translators have a matching target, the translator with the lowest priority number is selected. Web translators with an empty ''target'' string (e.g. the DOI translator) match every webpage, but normally have a high priority number and are only used when no other translator matches.
* For import translators, the ''target'' is set to the expected file extension (e.g. the BibTeX import/export translator has its target set to "bib"; selecting BibTeX in Zotero’s import window filters for files with a ".bib" extension).
* For export translators, the ''target'' is set to the extension that should be given to generated files (e.g. the BibTeX translator produces "filename.bib" files).
* **minVersion** & **maxVersion** \\ Respectively the minimum and maximum version of Zotero (as specified in Zotero’s [[https://developer.mozilla.org/en/install_manifests|Install Manifest]]) with which the translator is compatible.
* **browserSupport** \\ A string containing one or more of the letters ''g'', ''c'', ''s'', ''i'', representing the connectors that the translator can be run in -- Gecko (Firefox), Chrome, Safari, Internet Explorer, respectively. ''b'' indicates support for the Bookmarklet ([[https://groups.google.com/forum/#!topic/zotero-dev/ZWCe86B3OCw/discussion|zotero-dev thread]]) and ''v'' indicates support for the [[https://github.com/zotero/translation-server|translation-server]]. For more information, see [[dev:translators:connectors|Connectors]].
* **priority** \\ An integer indicating translator priority. When multiple translators can translate a source, the translator with the lowest priority number is selected. Site-specific web translators normally have a priority of 100. For guidelines on picking an appropriate priority for web translators, see [[:dev:translators:priority|this page]].
* **inRepository** \\ Set to ''true'' for translators that are added to the Zotero repo and distributed to all Zotero users, and ''false'' for those that are not.
* **lastUpdated** \\ The date and time when the translator was last modified (format “YYYY-MM-DD HH:MM:SS”). For the metadata to be read correctly, this line must be the last line in the JSON block.
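The ''translatorType'' arithmetic and the double-backslash rule for ''target'' can be checked with a few lines of plain JavaScript. The following is a minimal sketch rather than Zotero's own code: the header, its all-zero ''translatorID'', and the helper ''decodeTranslatorType'' are made up for illustration, and how Zotero itself compiles the target expression is not shown here.

```javascript
// Per-type values as listed in the field descriptions above.
const TRANSLATOR_TYPES = { import: 1, export: 2, web: 4, search: 8 };

// Hypothetical helper: return the names of all types whose value is
// included in the translatorType sum.
function decodeTranslatorType(value) {
  return Object.keys(TRANSLATOR_TYPES).filter(
    name => (value & TRANSLATOR_TYPES[name]) !== 0
  );
}

// A made-up header for illustration. String.raw keeps the doubled
// backslashes exactly as they would appear in a translator file.
const headerText = String.raw`{
  "translatorID": "00000000-0000-0000-0000-000000000000",
  "label": "Example Site",
  "target": "^https?://(www\\.)?example\\.com/",
  "priority": 100,
  "translatorType": 4
}`;

const header = JSON.parse(headerText);
const types = decodeTranslatorType(header.translatorType); // ["web"]
const mixed = decodeTranslatorType(13); // ["import", "web", "search"]

// After JSON parsing, each "\\." has become "\.", a literal dot in the regex.
const target = new RegExp(header.target);
const matchesSite = target.test("https://www.example.com/article/1");
// The trailing slash in the target keeps a proxied host name from matching.
const matchesProxy = target.test(
  "https://www.example.com.proxy.example.edu/article/1"
);
```

Here ''matchesSite'' is ''true'' and ''matchesProxy'' is ''false'', which illustrates why a domain-only target should end in a forward slash.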
=== Metadata Options ===
In addition to the required metadata fields described above, two optional fields exist, **configOptions** and **displayOptions**. Both are JavaScript objects, with several properties that control translator behavior:
* **configOptions**
* **dataMode** \\ For [[dev:translators:coding#import_translators|import translators]], this sets the form in which the input data is presented to the translator. If set to "rdf/xml", Zotero will parse the input as XML and expose the data through the ''Zotero.RDF'' object. If "xml/dom", Zotero will expose the data through the function ''Zotero.getXML()''. Zotero does not natively support importing N3 representations of RDF. The values "block" and "line" are deprecated and no longer necessary in [[dev:client_coding:changes_in_zotero_2.1|Zotero 2.1]] and later.
* **getCollections** \\ For [[dev:translators:coding#export_translators|export translators]], set to ''true'' or ''false''. If ''true'', an export translator will have access to the collection names and can recreate them in the exported file.
* **displayOptions**
* **exportCharset** \\ The default character set to use for export; if unset, it defaults to "UTF-8".
* **exportFileData**, **exportNotes** and **exportTags** \\ For each property that is set, a checkbox (respectively "Export Files", "Export Notes" and "Export Tags") is added to Zotero's export window, allowing files, notes and/or tags to be exported. A checkbox is checked by default if the corresponding property is set to ''true'', and unchecked if the property is set to ''false''.
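Inside an export translator, the values the user chose for these options can be read with ''Zotero.getOption()''. The sketch below is only an illustration under stated assumptions: the ''Zotero'' object defined here is a small stand-in so that the code runs outside Zotero (in a real translator it is provided by the client), the item data is invented, and the "TITLE:"/"NOTE:" output format is not a real export format.

```javascript
// Stand-in for the Zotero object that the client provides to translators.
// It exists only so this sketch runs on its own; it is NOT the real thing.
function makeStubZotero(options, items) {
  const queue = items.slice();
  const output = [];
  return {
    getOption: name => options[name],
    nextItem: () => queue.shift() || false,
    write: text => output.push(text),
    output
  };
}

const Zotero = makeStubZotero(
  { exportNotes: true, exportFileData: false },
  [{ title: "Example Title", notes: [{ note: "A reading note" }] }]
);

// Export entry point: writes one line per item, plus its notes if the
// "Export Notes" option is enabled.
function doExport() {
  const includeNotes = Zotero.getOption("exportNotes");
  let item;
  while ((item = Zotero.nextItem())) {
    Zotero.write("TITLE: " + item.title + "\n");
    if (includeNotes && item.notes) {
      for (const note of item.notes) {
        Zotero.write("NOTE: " + note.note + "\n");
      }
    }
  }
}

doExport();
const exported = Zotero.output.join("");
```

With ''exportNotes'' set to ''false'' in the stub, only the "TITLE:" line would be written.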
An example of how these properties are set (taken from the BibTeX.js translator):
"configOptions": {"getCollections": true},
"displayOptions": {"exportCharset": "UTF-8", "exportNotes": true, "exportFileData": false, "useJournalAbbreviation": false}
==== Top-level Functions ====
Depending on the translator type, each Zotero translator must include certain top-level JavaScript functions:
* **[[dev:translators:coding#web_translators|Web translators]]**
* //detectWeb// \\ After a web translator has been selected based on its matching target and its priority ranking, ''detectWeb'' is run to determine whether item metadata can indeed be retrieved from the webpage. It should return the detected item type (e.g. "journalArticle", see the [[https://aurimasv.github.io/z2csl/typeMap.xml|overview of Zotero item types]]), or, if multiple items are found, "multiple". If ''detectWeb'' does not return a value, the matching translator that is next in priority order is tried, until all translators with a matching target have been exhausted.
* //doWeb// \\ Performs the actual item metadata retrieval.
* **[[dev:translators:coding#import_translators|Import translators]]**
* //detectImport// \\ Determines whether the translator can import item metadata. Should return ''true'' if it can, and ''false'' if it cannot.
* //doImport// \\ Performs the actual import.
* **[[dev:translators:coding#export_translators|Export translators]]**
* //doExport// \\ Performs the export.
* **[[dev:translators:coding#search_translators|Search translators]]**
* //detectSearch// \\ Determines whether the translator can look up item metadata. Should return ''true'' if it can, and ''false'' if it cannot.
* //doSearch// \\ Performs the actual lookup.
See [[dev:translators:coding|Translator Coding]] for a detailed description of how these functions can be coded.
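The way target matching, priority and ''detectWeb'' interact for web translators can be simulated in a few lines. This is a simplified sketch of the procedure described on this page, not Zotero's actual implementation: the two translators, their priorities, and the function ''selectWebTranslator'' are hypothetical, and real translators are separate files rather than objects in an array.

```javascript
// Hypothetical translators for illustration only.
const translators = [
  {
    label: "Generic fallback",
    target: "", // an empty target matches every page
    priority: 300,
    detectWeb: (doc, url) => "webpage"
  },
  {
    label: "Example Site",
    target: "^https?://(www\\.)?example\\.com/",
    priority: 100,
    // Only article pages carry metadata on this made-up site.
    detectWeb: (doc, url) =>
      url.includes("/article/") ? "journalArticle" : undefined
  }
];

// Try the matching translators in ascending priority-number order and
// return the first one whose detectWeb returns a value.
function selectWebTranslator(doc, url) {
  const candidates = translators
    .filter(t => new RegExp(t.target).test(url))
    .sort((a, b) => a.priority - b.priority);
  for (const translator of candidates) {
    const itemType = translator.detectWeb(doc, url);
    if (itemType) {
      return { label: translator.label, itemType };
    }
  }
  return null;
}

// The document argument is unused by these toy detectWeb functions.
const onArticle = selectWebTranslator(null, "https://www.example.com/article/42");
const onAbout = selectWebTranslator(null, "https://www.example.com/about");
const elsewhere = selectWebTranslator(null, "https://another.example.org/");
```

On the article page the site-specific translator wins because of its lower priority number. On the "about" page its ''detectWeb'' returns nothing, so the fallback is used, just as it is for the unrelated site.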
===== Tools =====
The following tools can make coding Zotero translators easier:
* [[:dev:translators:scaffold|Scaffold]] - Scaffold is an IDE for translators built into Zotero (Tools -> Developer -> Translator Editor). Translators can be quickly [[:dev:translators:testing|tested]] and debugged, and item saving is simulated, so no changes are made to your Zotero library.
* Browser inspector - Web translators generally use [[https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector|querySelector]] and [[https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll|querySelectorAll]] to extract content from web pages. Your browser likely provides an inspector tool to help you understand pages' structure. You can access it by right-clicking and selecting Inspect (Firefox) or Inspect Element (Chrome).
* XPath Tools - Many older web translators rely on XPath to extract information from HTML or XML. Various tools can assist with generating and checking XPath expressions, including the DevTools built into browsers. For example, in Firefox, you can get the XPath for any element by finding it in the browser's Inspector tool, right-clicking on the element, and choosing Copy -> XPath.
===== Contributing Translators =====
If you have created or modified a translator and wish to have it added to Zotero, or are looking for support on writing translators, please submit a pull request to the [[https://github.com/zotero/translators/|Zotero Translators GitHub repo]]. You can also ask questions about translator development on the [[http://groups.google.com/group/zotero-dev|Zotero development mailing list]].
To submit a pull request, fork the [[https://github.com/zotero/translators|Zotero Translator GitHub repository]], commit your changes (i.e., adding or modifying translator files), and create a [[http://help.github.com/pull-requests/|pull request]]. You can use your Git client of choice, but for new users we recommend [[http://www.syntevo.com/smartgit/index.html|SmartGit]], which is free for non-commercial purposes.
When you submit a pull request on GitHub, your translator code will be reviewed, and you will receive comments from the Zotero developers or experienced volunteers. Once you've made any necessary changes, your translator will be added to the Zotero translator repository.
==== Licensing ====
Please note that contributed translators need to be licensed in a way that allows the Zotero project to distribute and modify them. We encourage you to license new translators under the [[http://www.gnu.org/licenses/agpl.html|GNU Affero General Public License version 3]] (or later), which is the license used for Zotero. To do so, just add a license statement to the top of the file. Take a look at a recently committed translator, like "Die Zeit.js", for an example of such a statement.
===== Recommendations for Translator Authors =====
While there are no strict coding guidelines for translators, there are some general recommendations:
- Web translator ''target'' expressions should be selective, to minimize the number of ''detectWeb'' functions that are run for each page.
- ''detectWeb'', ''detectImport'' and ''detectSearch'' should be coded to minimize the likelihood of the corresponding ''doWeb'', etc. function failing. Do your minimum required input checking in the detect functions, because a failing ''do'' function will cause user-visible errors.
- Make detect functions lightweight; they may be run on pages that a user is not even considering saving. Detect functions should not need to make additional HTTP requests. This obviously runs counter to the preceding point, so find a happy medium.
- When translating the web page in the browser, do not modify any part of its DOM.
- Minimize HTTP requests. More HTTP requests slow down the user, cause undue load on servers, risk getting the user rate-limited or blocked, and in general are bad.
- Don't leak user data. HTTP requests should in general not be directed to 3rd-party hosts.
- Document your code. If there are input data deficiencies and the translator is working around them, document the deficiencies. If there are specific types of pages that a web translator is for, provide example URLs and expected output.
- Produce [[dev:translators:testing|translator tests]] when possible, covering the basic page types that the translator is designed to support.
- Run ESLint on your code before submitting it. Zotero provides an ESLint plugin for translator development. You can run it on your translator from within a clone of the ''zotero/translators'' repository: ''%%npm ci && npm run lint -- "Your Translator.js"%%''
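As an illustration of the balance between thorough and lightweight detection recommended above, a ''detectWeb'' function might check the URL first and then confirm, using only the already-loaded page, that the metadata ''doWeb'' relies on is present. This is a sketch under assumptions: the URL patterns, the ''citation_title'' meta tag and the ''a.result-title'' selector describe a made-up site, and ''fakeDoc'' is a stand-in document so that the code runs outside a browser.

```javascript
// Lightweight detection: inspects only the URL and the loaded document,
// makes no HTTP requests, and does not modify the DOM.
function detectWeb(doc, url) {
  if (/\/article\/\d+/.test(url)) {
    // Confirm that the metadata doWeb will rely on is actually present.
    return doc.querySelector('meta[name="citation_title"]')
      ? "journalArticle"
      : false;
  }
  if (url.includes("/search?") && doc.querySelector("a.result-title")) {
    return "multiple";
  }
  return false;
}

// Minimal fake document: querySelector finds only the listed selectors.
function fakeDoc(presentSelectors) {
  return {
    querySelector: sel => (presentSelectors.includes(sel) ? {} : null)
  };
}

const articleType = detectWeb(
  fakeDoc(['meta[name="citation_title"]']),
  "https://www.example.com/article/42"
);
const brokenArticle = detectWeb(fakeDoc([]), "https://www.example.com/article/43");
const searchType = detectWeb(
  fakeDoc(["a.result-title"]),
  "https://www.example.com/search?q=zotero"
);
```

The second call returns ''false'' even though the URL looks like an article page, so the translator icon would not be offered for a page on which ''doWeb'' would be likely to fail.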