XML: About

Folger Digital Texts uses eXtensible Markup Language (XML) to encode our master files. XML is a semantic encoding language that allows encoders to include information "behind the scenes" of the visible text, which can then be used for special searching and analysis, visualizations, and other applications. The special information in our encoded texts includes details about which characters enter or exit a scene, which character delivers each speech, and even when each character dies.
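
As an illustration of how this information can be used, the Python sketch below counts speeches per character and lists entrances and exits, relying on the <sp> and <stage> elements and their who attribute (described in the Tag Guide). It assumes the standard TEI namespace; the fragment is a simplified, hypothetical example loosely modeled on the opening of Hamlet, not an excerpt from a Folger file, and the character IDs are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: elements live in the standard TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, simplified fragment; real Folger files contain far more markup.
SAMPLE = """<div2 xmlns="http://www.tei-c.org/ns/1.0" type="scene" n="1">
  <stage type="entrance" who="#BARNARDO #FRANCISCO">Enter Barnardo and Francisco.</stage>
  <sp who="#BARNARDO"><speaker>BARNARDO</speaker><ab>Who's there?</ab></sp>
  <sp who="#FRANCISCO"><speaker>FRANCISCO</speaker><ab>Nay, answer me.</ab></sp>
  <sp who="#BARNARDO"><speaker>BARNARDO</speaker><ab>Long live the King!</ab></sp>
  <stage type="exit" who="#FRANCISCO">Francisco exits.</stage>
</div2>"""


def who_ids(elem):
    """Split a who attribute such as '#A #B' into bare IDs ['A', 'B']."""
    return [ref.lstrip("#") for ref in elem.get("who", "").split()]


def summarize(root):
    """Count speeches per character and collect entrance/exit IDs in order."""
    speeches = Counter()
    entrances, exits = [], []
    for sp in root.iter(TEI + "sp"):
        speeches.update(who_ids(sp))
    for stage in root.iter(TEI + "stage"):
        if stage.get("type") == "entrance":
            entrances.extend(who_ids(stage))
        elif stage.get("type") == "exit":
            exits.extend(who_ids(stage))
    return speeches, entrances, exits


speeches, entrances, exits = summarize(ET.fromstring(SAMPLE))
print(dict(speeches))  # {'BARNARDO': 2, 'FRANCISCO': 1}
print(entrances)       # ['BARNARDO', 'FRANCISCO']
print(exits)           # ['FRANCISCO']
```

Because who can hold several space-separated references, splitting it (rather than comparing the whole string) lets a single group entrance credit every character it names.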

Folger Digital Texts follows the guidelines of the Text Encoding Initiative (TEI), a set of guidelines that has become the standard for encoding literary texts. If you're new to XML or the TEI guidelines and want to learn more, helpful online resources to get you started include W3Schools' XML Tutorial and TEI By Example.

Starting from the encoded texts can save significant time when creating mobile apps and other digital projects or conducting research. We are delighted to share our encoded texts at no cost for noncommercial use (please review our Terms of Use for more details). To get started, select a play from the Download table below.

For more specific information about how Folger Digital Texts uses TEI tags in the corpus, please refer to the Tag Guide.

XML: Download

Please select your content from the list. Your browser will download a compressed folder containing the XML file(s), along with the supporting images and processing files needed to view your selected title(s) in a browser, edit them in a text editor, or import them into an existing XML-based project. For the files to render properly in a browser, all of the files included in the download must be kept together in the same folder on your system.

For more information about Folger Digital Texts' XML methodology, visit our Tag Guide, or download our Documentation PDF.

For your convenience, we no longer require you to register before downloading files. If you would like to join our mailing list or help us develop better offerings and services by giving us feedback on how we're doing so far, please visit our feedback page.

Title | Last Updated | Download Format
Folger Digital Texts - Complete Set | May 31, 2016 | XML
All's Well That Ends Well | July 31, 2015 | XML
Antony and Cleopatra | July 31, 2015 | XML
As You Like It | July 31, 2015 | XML
The Comedy of Errors | July 31, 2015 | XML
Coriolanus | July 31, 2015 | XML
Cymbeline | July 31, 2015 | XML
Hamlet | July 31, 2015 | XML
Henry IV, Part 1 | July 31, 2015 | XML
Henry IV, Part 2 | July 31, 2015 | XML
Henry V | July 31, 2015 | XML
Henry VI, Part 1 | July 31, 2015 | XML
Henry VI, Part 2 | July 31, 2015 | XML
Henry VI, Part 3 | July 31, 2015 | XML
Henry VIII | July 31, 2015 | XML
Julius Caesar | July 31, 2015 | XML
King John | July 31, 2015 | XML
King Lear | April 21, 2016 | XML
Love's Labor's Lost | July 31, 2015 | XML
Lucrece | July 31, 2015 | XML
Macbeth | July 31, 2015 | XML
Measure for Measure | July 31, 2015 | XML
The Merchant of Venice | July 31, 2015 | XML
The Merry Wives of Windsor | July 31, 2015 | XML
A Midsummer Night's Dream | July 31, 2015 | XML
Much Ado About Nothing | July 31, 2015 | XML
Othello | May 31, 2016 | XML
Pericles | July 31, 2015 | XML
The Phoenix and Turtle | July 31, 2015 | XML
Richard II | July 31, 2015 | XML
Richard III | July 31, 2015 | XML
Romeo and Juliet | July 31, 2015 | XML
Shakespeare's Sonnets | July 31, 2015 | XML
The Taming of the Shrew | July 31, 2015 | XML
The Tempest | July 31, 2015 | XML
Timon of Athens | July 31, 2015 | XML
Titus Andronicus | July 31, 2015 | XML
Troilus and Cressida | July 31, 2015 | XML
Twelfth Night | July 31, 2015 | XML
The Two Gentlemen of Verona | July 31, 2015 | XML
The Two Noble Kinsmen | July 31, 2015 | XML
Venus and Adonis | July 31, 2015 | XML
The Winter's Tale | April 21, 2016 | XML

XML: Tag Guide

The following is a list of the TEI standard elements used in the Folger Digital Texts XML source code, along with notes on how each element is used in the project. Full definitions of each element are available in the TEI Guidelines.
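
Before working with the elements listed here, it can help to see which of them actually occur in a given file. The Python sketch below tallies element names with the standard library's xml.etree.ElementTree. It assumes the file declares the standard TEI namespace (http://www.tei-c.org/ns/1.0); the function name tally_elements is our own illustration, not part of any Folger tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: Folger files declare the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def tally_elements(source):
    """Count how often each element occurs in an XML document.

    `source` may be a filename or a binary file object, as accepted by ET.parse.
    """
    counts = Counter()
    prefix = "{" + TEI_NS + "}"
    for elem in ET.parse(source).iter():
        tag = elem.tag
        # ElementTree reports namespaced tags as "{uri}local"; keep only "local"
        # for TEI elements so the names match this guide (e.g. "sp", "stage").
        if tag.startswith(prefix):
            tag = tag[len(prefix):]
        counts[tag] += 1
    return counts
```

For example, tally_elements("Ham.xml").most_common(10) would list the ten most frequent elements in a downloaded file; "Ham.xml" is a placeholder filename, not necessarily the name used in the download.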

Element Description
<pb> Marks the beginning of a page in the print edition. The n attribute gives the page number. The spanTo attribute gives the xml:id of a milestone element marking the end of the page.
<milestone> Milestones are used in several ways, distinguished by the value of the unit attribute:
  • "page": marks the end of a page in the print edition (see also the entry for <pb>). The n attribute gives the page number.
  • "ftln": describes a line of text, and the n attribute gives the line number. The corresp attribute notes the corresponding w, c, pc, or anchor elements. The ana attribute has the value "verse", "prose", or "short". The prev and next attributes provide the means for reconstructing split verse lines.
<fw> Provides the act/scene header for the page, as given in the print edition. The n attribute gives the page number. The type attribute has the value "header".
<lb> Marks a line break in the print edition.
<div1> Marks an act (or induction, prologue, epilogue). The type attribute gives the division type. The n attribute gives the canonical act number, where appropriate.
<div2> Marks a scene (or prologue, epilogue, chorus). The type attribute gives the division type. The n attribute gives the canonical scene number, where appropriate.
<head> Provides the act/scene header, as given in the print edition.
<stage> Marks stage directions. The n attribute gives the stage direction line number. The type attribute identifies the type of stage direction, as follows:
  • "entrance": marks character entrances.
  • "exit": marks character exits. In most circumstances, dead characters are included in the exit direction at the end of the scene, even if the removal of the body is not explicitly referenced in the text of the direction.
  • "delivery": marks directions on how a character speaks (asides, speaking to a specific character, reading, singing, disguising a voice).
  • "location": marks where the character speaks ("within", "above", etc.).
  • "modifier": usually marks a character in disguise (e.g., "as Balthazar").
  • "business": any other action, whether performed by a character or not. Directions such as "flourish" and "thunder and lightning" are considered to be "business", since someone will have to make them happen.
  • "dumbshow": describes the action of a dumbshow.
  • "mixed": a stage direction that combines several of the above.
The who attribute identifies the characters associated with that stage direction.
<sound> Marks musical and other sound cues. The type attribute categorizes the type of cue, as follows:
  • "military": marks alarums, marches, retreats, parleys, and other cues related to combat.
  • "flourish": marks flourishes, sennets, tuckets, and other such cues.
  • "music": marks any music, whether performed by musician characters or offstage.
  • "sound": marks other sound cues (e.g., "Thunder still", "A clock strikes").
<sp> Marks a speech within a text. The who attribute identifies the characters associated with that speech.
<speaker> Provides the speech prefix, as given in the print edition.
<ab> Within sp tags, contains the text of the speech.
<w> Marks a word in a speech, stage direction, speech prefix, or header. The n attribute gives the line number, where appropriate.
<c> Marks a space character in a speech, stage direction, speech prefix, or header. The n attribute gives the line number, where appropriate.
<pc> Marks a punctuation character in a speech, stage direction, speech prefix, or header. The n attribute gives the line number, where appropriate.
<gap> Marks editorial placeholders where words are missing or unclear in the primary text.
<join> No longer used. In previous beta versions, it was used to join w, c, pc, and anchor elements into a typographic line. The n attribute gave the line number. The type attribute had the value "verse", "prose", or "short". The prev and next attributes provided the means for reconstructing split verse lines.
<ptr> Creates a pointer for one or more w, c, pc, and anchor elements, used to link them to analytical interpretations such as textual notes or stanza identification.
<seg> Often contains a song, poem, or letter, identified by its type attribute. Also identifies a word segment that may be quoted or emended.
<label> Marks the header to a song or dumbshow.
<floatingText> No longer used. In previous beta versions, it was used sparingly to contain a song or poem that seems distinct from the surrounding text and may not be attributed to a specific speaker.
<q> Contains quoted sections of text.
<foreign> Marks non-English words. The xml:lang attribute identifies the foreign language, where appropriate.
<name> Marks a proper name that may be quoted or italicized in the text.
<title> Marks a title that may be quoted or italicized in the text.
<hi> Marks sections of text that are otherwise highlighted (generally italicized).
<anchor> Marks areas where content in a prior source text is not present in the current reading.
<app> Critical apparatus containing variant readings from prior publications.
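
The <w>, <c>, and <pc> entries above note that each token can carry its line number in the n attribute. Assuming that tokens on the same line share an n value and appear in reading order, the plain text of each line can be rebuilt by concatenating token text, as in this Python sketch. The fragment is hypothetical and shortened (modeled on the opening of Hamlet, with simplified markup), and the n values are illustrative; elements other than w, c, and pc are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: elements live in the standard TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = {TEI + "w", TEI + "c", TEI + "pc"}

# Hypothetical, shortened fragment with illustrative n values.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <ab><w n="1.1.1">Who's</w><c n="1.1.1"> </c><w n="1.1.1">there</w><pc n="1.1.1">?</pc></ab>
  <ab><w n="1.1.2">Nay</w><pc n="1.1.2">,</pc><c n="1.1.2"> </c><w n="1.1.2">answer</w><c n="1.1.2"> </c><w n="1.1.2">me</w><pc n="1.1.2">.</pc></ab>
</body>"""


def rebuild_lines(root):
    """Join w/c/pc token text into lines, keyed by the tokens' n attribute."""
    lines = {}
    for elem in root.iter():
        n = elem.get("n")
        if elem.tag in TOKEN_TAGS and n:
            # Dicts keep insertion order, so lines come out in document order.
            lines[n] = lines.get(n, "") + (elem.text or "")
    return lines


for n, text in rebuild_lines(ET.fromstring(SAMPLE)).items():
    print(n, text)
# 1.1.1 Who's there?
# 1.1.2 Nay, answer me.
```

Since spaces are encoded as <c> elements, concatenating token text without adding separators reproduces the original spacing.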