Markup Languages (SGML, XML, HTML, WML, HDML, SML)

Each of these markup languages offers benefits for the interactive publisher; the choice depends on the application being developed. XML, with its promise of providing the full functionality of SGML without some of its complexity, is receiving increasing attention. Many of the resources listed below in the XML section build on a thorough prior grounding in SGML.

The SGML/XML section of the products pages on this site gives extensive links to available software tools.

Organization for the Advancement of Structured Information Standards (OASIS)

The Organization for the Advancement of Structured Information Standards (formerly SGML Open, the international consortium that has guided the SGML industry since 1993) is a non-profit, international consortium of users and suppliers whose products and services support SGML and XML, today's most viable standards for document and data interchange. Their web site has information about their members and their products, and many papers on XML, SGML and HTML subjects. The site also hosts Robin Cover's SGML/XML Web Page, a comprehensive reference site for information about open document interchange standards.

URL: http://www.oasis-open.org

International Digital Enterprise Alliance (IDEAlliance)

The Graphic Communications Association (GCA) created the not-for-profit International Digital Enterprise Alliance (IDEAlliance) to provide support to working groups engaged in developing industry-specific applications of both vertical and cross-industry open information standards.

Current member groups of the IDEAlliance include:

IDEAlliance serves as a host for meetings of the committees and other working groups of the International Organization for Standardization (ISO), Organization for the Advancement of Structured Information Standards (OASIS), American National Standards Institute (ANSI), and World Wide Web Consortium (W3C), the groups responsible for the development and maintenance of structured information standards, the eXtensible Markup Language (XML), the Standard Generalized Markup Language (SGML), and their derivatives. It is active in the application of standards through a wide variety of technical committees staffed by developers working for both user and vendor organisations.

For more information on the IDEAlliance, visit its web site or contact IDEAlliance Chief Information Officer Pete Janhunen.

URL: http://www.IDEAlliance.org/
URL: Pete Janhunen mailto:pjanhunen@IDEAlliance.org

Markup languages: international standardisation

The ISO group responsible for the SGML family of standards is a subcommittee called ISO/IEC JTC1/SC34 Information Technology - Document Description and Processing Languages, which is divided into three working groups.

URL: http://www.ornl.gov/sgml/sc34/sc34home.htm

SGML (Standard Generalized Markup Language)

The SGML Web site is the place to go for information about SGML (Standard Generalized Markup Language), SGML sites, projects using SGML, and much more on HyTime and other SGML-related applications.

URL: http://www.sil.org/sgml/sgml.html

"A Tutorial Introduction to SGML Architectures" introduces SGML architectures in the context of using them with XML documents.

URL: http://www.isogen.com/papers/archintro.html

For about a decade now, the Text Encoding Initiative has been working on markup for the humanities and social sciences. Much of their original research work was based on SGML, which is easily transferable to XML.

URL: http://www.uic.edu/orgs/tei/

HyTime (Hypermedia/Time-based Structuring Language)

The HyTime User's Group web site provides a good starting point for all things HyTime.

URL: http://www.hytime.org

See also:
URL: "Introduction to HyTime" http://www.sgml.no/hytime/ekhtqstr.htm
URL: "Quick Guide to HyTime Basics" http://info.admin.kth.se/SGML/Anvandarforening/Arbetsgrupper/HyTime/Reports/tr1v1.html

DocBook: The Definitive Guide

The Organization for the Advancement of Structured Information Standards (OASIS), which is responsible for the continued maintenance of the DocBook DTD, has designated O'Reilly's publication "DocBook: The Definitive Guide" as the official documentation of the DocBook Document Type Definition (DTD). DocBook is a system for writing structured documents using SGML and XML. A number of computer companies use DocBook for their documentation.

The book includes:

In addition, the accompanying CD-ROM contains:

URL: O'Reilly publication details http://www.oreilly.com/catalog/docbook/chapter/book/docbook.html
URL: OASIS, book online http://www.oasis-open.org/docbook/documentation/reference/index.html
URL: the DocBook site, book online http://www.docbook.org/tdg/html

XML (Extensible Markup Language)

XML (Extensible Markup Language) is a subset of SGML designed for delivery on the web. At present the standard is being developed under the auspices of the World Wide Web Consortium (W3C). The activities are controlled through a wide variety of working groups within the W3C, whose site includes details of current activities, discussion groups, links to developer mailing lists, and the status of drafts and final recommendations. (Note: the W3C issues "Recommendations" rather than "Standards".)

URL: http://www.w3.org/XML/Activity

XML Resources

Further information and resources on a supplementary XML page

The range of information and resources related to XML has expanded considerably, and continues to do so. We have therefore created a supplementary XML page featuring links to resources concerning the development of the standard and its applications. Because developments in this area move so quickly, resources featured on the page tend to "age" fairly rapidly. Readers are therefore advised to consult more widely on the subject of XML.

HTML - Hypertext Markup Language

HTML 4.0

HyperText Markup Language (HTML) version 4.0 adds enhancements to previous versions of the standard in several areas including: advanced forms, frame improvements, table enhancements, object support, script and style elements, additional named entities (adding support for symbols and glyphs used in mathematics) and internationalization (supports multilingual documents by fully supporting the international ISO 10646 character set).

URL: http://www.w3.org/MarkUp/
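
As a small illustration of the additional named entities mentioned above, the sketch below resolves a few symbol entities to their ISO 10646 code points. It uses Python's standard html module, which is a choice made for this example and not something the HTML 4.0 specification prescribes; the sample string and the helper name code_point are invented for illustration.

```python
import html

# Hypothetical sample text using symbol entities from the HTML 4.0 entity sets.
sample = "&alpha; + &beta; &le; &sum; x"

# Replace every named entity with the character it stands for.
decoded = html.unescape(sample)

def code_point(entity_name):
    """Return the code point of a named entity in U+XXXX notation."""
    character = html.unescape("&" + entity_name + ";")
    return "U+{:04X}".format(ord(character))

# Each of these entities maps to a single ISO 10646 / Unicode code point.
code_points = {name: code_point(name) for name in ("alpha", "beta", "le", "sum")}
```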

HTML's future

The W3C Workshop: "Shaping the Future of HTML" held in San Francisco, on May 4-5, 1998 considered some key questions relating to the future of HTML such as:

A summary of the discussions held is available on the workshop Website. A key message was that:

"It was agreed that further extending HTML 4.0 would be difficult, as would converting 4.0 to be an XML application. The proposed way to break free of these restrictions is to make a fresh start with the next generation of HTML based upon a suite of XML tag-sets.
The next step is for W3C to draft a briefing package for setting up an activity to carry this forward. The previous HTML working group having been closed when HTML 4.0 became a W3C Recommendation. Work on the new version of HTML is expected to take 18 months or so."

URL: Meeting http://www.w3.org/MarkUp/future/
URL: position papers http://www.w3.org/MarkUp/future/papers.html
URL: meeting papers http://www.w3.org/MarkUp/future/presentations.html

HTML resources

Simple HTML Ontology Extensions (SHOE)

SHOE is a superset of HTML which adds the tags necessary to embed arbitrary semantic data into web pages. Further information is available on the Web and web-like technology topic page of this site.

URL: http://www.cs.umd.edu/projects/plus/SHOE/index.html

Document Object Model (DOM)

The World Wide Web Consortium (W3C) Document Object Model (DOM) Level 2 Recommendation reflects cross-industry agreement on a standard API (Applications Programming Interface) for manipulating documents and data through a programming language (such as Java or ECMAScript).

The specification defines the foundation of a platform- and language-neutral interface to access and update dynamically a document's content, structure, and style. The DOM Level 2 provides a standard set of objects for representing HTML and XML documents and data, a standard model of how these objects may be combined, and a standard interface for accessing and manipulating them.

Whilst the W3C's HTML 4.0 provides authors with a standard way to embed scripts in a document, it does not specify how those scripts can manipulate the document's content, structure, and style. Several vendors already offer powerful mechanisms for doing so, but these mechanisms do not always work with different software packages. The DOM, on the other hand, defines a standard API that allows authors to write programs that work without changes across tools and browsers from different vendors.
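
To give a feel for what accessing and updating a document through the DOM looks like, here is a minimal sketch using xml.dom.minidom, the lightweight DOM implementation in Python's standard library. The choice of Python and the sample document are assumptions of this example; the calls used (getElementsByTagName, createElement, createTextNode, appendChild) are interfaces defined by the DOM itself, so the same steps should carry over to other language bindings.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for this example.
doc = parseString("<page><title>Old title</title><para>Body text.</para></page>")

# Access: locate an element by tag name and read its text node.
title = doc.getElementsByTagName("title")[0]
old_text = title.firstChild.data

# Update content: change the text held by the existing text node.
title.firstChild.data = "New title"

# Update structure: create a new element and append it to the root element.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("Added through the DOM API."))
doc.documentElement.appendChild(note)

# Serialise the modified tree back to markup.
result = doc.documentElement.toxml()
```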

The DOM specification is split into several documents, with one module (or more, if they are strongly related) per document. Consult the W3C site for the latest information on the Document Object Model (DOM).

URL: DOM http://www.w3.org/DOM/

WML - Wireless Markup Language

Wireless Markup Language (WML) is being promoted as a potential markup language for WAP (Wireless Application Protocol) mobile devices. The language itself is based on XML, and a contributor to the XML developers list has published a very simple WML tutorial on the web. The tutorial explains features of the markup, demonstrates its use with a simple emulation of a PDA device, and provides examples.

URL: http://zvon.vscht.cz/HTMLonly/WMLTutorial/Examples/Example1/index.html
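
Because WML is based on XML, an ordinary XML parser can read it. The sketch below parses a small WML deck (WML groups content into "cards" within a "deck") and lists its cards. The deck content is invented for this example, the DOCTYPE declaration is left out for brevity, and Python's xml.etree.ElementTree is simply one convenient parser, not a tool named by the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-card deck; the DOCTYPE is omitted to keep the sketch short.
deck = """<wml>
  <card id="home" title="Welcome">
    <p>Hello from a WAP device.</p>
  </card>
  <card id="news" title="Headlines">
    <p>No news today.</p>
  </card>
</wml>"""

root = ET.fromstring(deck)

# Collect the id and title of every card in the deck, in document order.
cards = [(card.get("id"), card.get("title")) for card in root.findall("card")]
```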

WAP and web standards

The Wireless Application Protocol (WAP) Forum and the World Wide Web Consortium (W3C) worked together in order to define next-generation web specifications that support the full participation of wireless devices on the World Wide Web.

The collaboration centred on the development of a common process to produce XML-based web specifications, define and test implementation processes, and promote these specifications to the industry at large. A particular focus was the incorporation of WAP's Wireless Markup Language (WML) features into the W3C's XHTML, the next-generation markup language for the Web.

URL: http://www.wapforum.org/
URL: http://www.w3.org/

HDML - Handheld Device Markup Language

The Handheld Device Markup Language (HDML) is a simple language used to define hypertext-like content and applications for hand-held devices with small displays. HDML is designed to leverage the infrastructure and protocols of the World Wide Web while providing an efficient markup language for wireless and other handheld devices.

Congruent with the capabilities and limitations of many handheld devices, HDML's focus goes beyond presentation and layout. HDML provides an explicit navigation model which does not rely upon the visual context required of HTML. As such, HDML offers an efficient means of providing content via the WWW infrastructure to handheld devices such as cellular phones, pagers, and wireless PDAs.

URL: http://www.w3.org/pub/WWW/Submission/1997/5/Overview.html

SML - Simplified Markup Language

There has been much discussion on the XML developers list concerning a proposal to standardise "Simplified Markup Language" (SML), a cut-down version of XML. SML's proponents wish to cut XML 1.0 down to its barest essentials. The XML.com web site features an article which gives the background to the discussions, albeit from the point of view of an SML supporter.

URL: http://xml.com/pub/1999/11/sml/index.html?wwwrrr_19991124.txt

A mailing list concentrating on the development of Simplified Markup Language (SML) has been set up. For further details, and to sign up to the list, see the SML-DEV eGroup pages on the web.

URL: http://www.egroups.com/group/sml-dev/info.html

DARPA Agent Markup Language Programme (DAML)

The DARPA Agent Markup Language (DAML) Programme officially began in the US on 14 August, 2000. The goal of the DAML effort is to develop a language and tools to facilitate the concept of the semantic web through the creation of "technologies that will enable software agents to dynamically identify and understand information sources, and to provide interoperability between agents in a semantic manner".

This goal will be pursued by a research plan that includes the following six tasks:

Find out more from the public portal below.

URL: http://www.daml.org/

Style sheet languages

XSL: the eXtensible Stylesheet Language

XSL is a proposal to the W3C to provide style sheet information for eXtensible markup language (XML) data. XSL extends beyond Cascading Style Sheets (CSS), enabling developers to handle the full richness of XML data and documents.

CSS (see below) can remain the style sheet language of choice for HTML and simply structured XML documents, whilst XSL could be used for formatting highly structured XML data, especially where the data's presentation order may change between delivery and display. For example, a table of stocks, stock prices and trading information might be sent to a browser using XML. Using an XML-enabled browser, users could then sort, filter and display the stock information according to their own preferences.
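
The stock table scenario can be sketched in a few lines. Note that this is plain Python standing in for what a style sheet processor or XML-enabled browser would do, not XSL itself; the element names, symbols, prices and the price threshold are all invented for illustration. The point is only that the order in which data is delivered need not be the order in which it is displayed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stock data as it might be delivered to the client.
delivered = """<stocks>
  <stock symbol="AAA"><price>12.50</price><volume>3000</volume></stock>
  <stock symbol="BBB"><price>48.10</price><volume>1200</volume></stock>
  <stock symbol="CCC"><price>7.25</price><volume>9800</volume></stock>
</stocks>"""

root = ET.fromstring(delivered)

def price(stock):
    """Read the price of a <stock> element as a number."""
    return float(stock.findtext("price"))

# Filter: keep only stocks priced above a user-chosen threshold.
# Sort: order the remaining stocks by price, highest first.
chosen = sorted((s for s in root.findall("stock") if price(s) > 10.0),
                key=price, reverse=True)

# The display order now differs from the delivery order.
display_order = [s.get("symbol") for s in chosen]
```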

XSL joins Cascading Style Sheets (CSS), the other W3C-developed style sheet language implemented in current popular browsers, as part of the W3C Style Sheets Activity. W3C will be developing both the XSL and CSS style sheet languages in parallel. CSS is used to style HTML and XML documents on the Web. In addition to styling XML documents, XSL is also able to generate new XML documents from XML data.

URL: http://www.w3.org/Style/XSL

XSL exposed

The XML.com website focused on whether there was a need for the style language XSL, by asking questions such as: How necessary is XSL? Is it just too complicated? Is it really an improvement over what we have today? Might XSL even be considered harmful to the Web?

An article entitled "XSL Considered Harmful" provided a controversial view, although the sentiments expressed resonate with many in the XML developer community. An online debate based on the issues raised in the article was held, and is summarised, at the site. The site also features links to previous coverage of XSL, including a tutorial which introduces the technology.

URL: XSL Considered Harmful http://xml.com/xml/pub/1999/05/xsl/xslconsidered_1.html?wwwrrr_990520.txt
URL: XSL Considered Harmful, Part 2 http://xml.com/xml/pub/1999/05/xsl/XSLCompare.html?wwwrrr_990520.txt
URL: XML.com http://xml.com/

Extensible stylesheet technology

XML.com has published a comprehensive introduction to the W3C's extensible stylesheet technology. XSLT expert G. Ken Holman takes readers through XSLT's place in the world of XML standards, as well as XSLT's practical application.

XSLT (Extensible Stylesheet Language Transformations) is a technology which lets developers transform information marked up in XML from one vocabulary to another, providing a flexible solution for XML document manipulation.

URL: http://www.xml.com/pub/2000/08/holman/
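
To make "transforming from one vocabulary to another" concrete, the sketch below rebuilds a small XML document in a second, HTML-like vocabulary. It is written in plain Python with xml.etree.ElementTree as a stand-in and is not XSLT; an XSLT stylesheet would express the same mapping declaratively with template rules. The source vocabulary (catalogue, item, name, cost) and its data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in an invented "catalogue" vocabulary.
source = ET.fromstring(
    "<catalogue>"
    "<item><name>Widget</name><cost>4.00</cost></item>"
    "<item><name>Gadget</name><cost>9.50</cost></item>"
    "</catalogue>"
)

# Build the result tree in a second, HTML-like vocabulary:
# each <item> becomes a table row, each field becomes a cell.
table = ET.Element("table")
for item in source.findall("item"):
    row = ET.SubElement(table, "tr")
    ET.SubElement(row, "td").text = item.findtext("name")
    ET.SubElement(row, "td").text = item.findtext("cost")

result = ET.tostring(table, encoding="unicode")
```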

Style sheet language resources

Cascading Style Sheets Level 2 (CSS2)

Cascading Style Sheets, level 2 (CSS2) has been created and developed by the W3C Cascading Style Sheets and Formatting Properties Working Group. CSS2 is a style sheet language for HTML-based documents which builds on the CSS1 specification. CSS2 offers precise control over the presentation of Web pages, adding: improved printing, positioned and layered elements, improved Internationalization, and a rich WebFont model, including downloadable fonts. CSS2 can also control voice, pitch, stereo position and other aspects of how Web pages will sound when rendered to speech.

URL: spec. http://www.w3.org/TR/PR-CSS2
URL: web style sheets http://www.w3.org/Style

CSS2 tutorial

Miloslav Nic of the Department of Organic Chemistry, ICT Prague, has published the first version of his CSS2 tutorial on the Zvon site. At present it contains around 50 XML sources with around 100 simple stylesheets. The Mozilla browser is required to view the examples; the author is planning support for "other browsers (ie. Internet Explorer) when they reasonably support CSS2".

URL: http://zvon.vscht.cz/HTMLonly/CSS2Tutorial/General/book.html
URL: Miloslav Nic mailto:nicmila@vscht.cz


The Unicode Standard is an essential reference for computer programmers and software developers working on global software and multilingual applications, as it provides the foundation for the internationalization and localization of software. Development of Unicode is overseen by a consortium made up of individual and corporate members.

The Unicode Consortium web site provides authoritative sources of information on the Unicode character encoding standard. In addition to the authorised description and guide to the Unicode standard, there are sources on all the essential aspects, including basic principles, code charts, and a discussion of implementation issues. Encompassing the principal scripts of the world, the Unicode standard includes the alphabets used in countries across Europe, Africa, Asia, and the Indian subcontinent, and the "unified Han" set of logographic characters for Chinese, Japanese, Korean and historical Vietnamese. The unified Han section includes a radical-stroke index, a multi-glyph table, and a character cross-reference chart providing mapping information to major national, bibliographic, and industrial standards.

URL: http://www.unicode.org

Unicode 3.0 released

Unicode Standard Version 3.0 supports 49,194 characters, including 31% more ideographs for Japanese, Chinese and Korean markets. Implementation support is greatly expanded, with double the character property data, and four times as many technical specifications for supporting implementations. Unicode is the default text representation in XML, and is enabled in all modern web browsers, almost all operating systems, and Internet standards such as HTML, Java, ECMAScript, XML, and LDAP. The Unicode Standard may be obtained directly from the Unicode Consortium.

URL: http://www.unicode.org/
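
The role of Unicode as the text representation in XML can be seen in a short sketch. The same character is written once literally (encoded as UTF-8) and once as a numeric character reference; after parsing, both are the same Unicode code point. The sample document is invented, and Python's xml.etree.ElementTree is used here only as a convenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the ideograph U+4E2D appears once as a literal
# character and once as the numeric character reference &#x4E2D;.
markup = '<?xml version="1.0" encoding="UTF-8"?><word>\u4e2d&#x4E2D;</word>'
raw = markup.encode("utf-8")

# The parser decodes the bytes and resolves the character reference.
text = ET.fromstring(raw).text

# Both characters come out as the same code point.
code_points = ["U+{:04X}".format(ord(ch)) for ch in text]
```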

Unicode in XML

"Unicode in XML and other Markup Languages" is published jointly by the Unicode Consortium and the World Wide Web Consortium (W3C).

URL: http://www.w3.org/TR/unicode-xml
URL: http://www.unicode.org/unicode/reports/tr20

Unicode font system for Windows

Bitstream have produced a Unicode font system for Windows. It is free to download from their web site. The fonts are rather heavy to install, requiring approximately 14 MB of memory to run.

URL: http://www.bitstream.com

Topic maps


Topic maps are defined by an ISO standard (ISO/IEC 13250) designed to aid consistent navigation of large-scale information resources. They allow the concepts or topics that underlie a set of information objects to be exposed to the people or applications processing the information. Topic maps provide a semantic layer that is not hierarchical, although it could be visualised that way, and that facilitates navigation at a level independent of the information itself.
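
The idea of a semantic layer sitting above the information can be sketched as a small data structure. The standard's core constructs are topics, occurrences (links from a topic to the resources where it appears) and associations (relationships between topics); everything else below, including the class layout, the helper names related and resources, and the sample data, is a hypothetical simplification for illustration and not the standard's interchange syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    # Occurrences: pointers to the resources in which the topic appears.
    occurrences: list = field(default_factory=list)

@dataclass
class Association:
    kind: str
    members: tuple  # names of the topics this association relates

# Hypothetical topics, resources and association.
topics = {
    "XML": Topic("XML", ["spec.html", "tutorial.html#intro"]),
    "SGML": Topic("SGML", ["handbook.pdf"]),
}
associations = [Association("is-subset-of", ("XML", "SGML"))]

def related(topic_name):
    """Navigate the topic layer: topics linked to topic_name by any association."""
    return sorted(member
                  for assoc in associations if topic_name in assoc.members
                  for member in assoc.members if member != topic_name)

def resources(topic_name):
    """Step down from the topic layer to the underlying information objects."""
    return topics[topic_name].occurrences
```

Navigation happens entirely among topics and associations; the resources themselves are only consulted once a topic of interest has been reached.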

The topic map web site provides additional information on the concept, resources, demos and a topic map forum (an area of the site which will be used to host FAQs and to disseminate Topic Map information).

URL: http://www.topicmaps.com/

