
Thursday, July 17, 2008

14 - XML Schema

Hi,

Here we'll look at why XML Schema is needed in Web services.

Need -

· SOAP provides a standard method of encoding data into an XML document. This technology is also extremely flexible. Anything can be encoded into the body of a SOAP message as long as it does not invalidate the XML. The body of a message can contain a request for the latest weather information, a purchase order or whatever else the implementer of a Web service can dream up.

· With the variety of content and types of data that can be contained within a SOAP message, you need a way of expressing the structure of a message and of determining the type of data it contains.

In short, the need is this –

· What’s needed is a standard way of describing the structure and the type of information that should be contained within an XML message sent to the Web service.

· In other words, you need a way of representing the schema that an XML message must conform to in order to be processed by the Web service.

Schema –

· As mentioned above, we need a schema to describe the structure of an XML document and its type information.

· The two dominant technologies for defining an XML schema are DTDs and XML Schema.

DTD -

· DTDs can define the structure of an XML document but cannot describe its type information, i.e.

· DTDs do describe the structure of the document, but they cannot express the type of data it contains. There is no notion of fundamental types within DTDs such as integers and strings, nor is there support for defining your own types.

· DTD syntax is not XML-based.

· Because a DTD is not itself an XML document, it cannot be read or manipulated as XML, and it cannot be easily embedded into other XML documents.

· DTDs should be considered legacy technology for defining XML schemas because of their limitations and lack of industry support.

Due to these limitations, the best way to express a schema is with XML Schema.
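
To make the limitation concrete, here is a minimal sketch of a document that carries its own DTD. The declaration can say only that Amount holds character data (#PCDATA); nothing in DTD syntax can require the text to be a number. The non-numeric content is an assumption chosen to show the gap.

```xml
<?xml version="1.0"?>
<!-- The DTD lives in the DOCTYPE declaration and uses its own, non-XML syntax -->
<!DOCTYPE Amount [
  <!ELEMENT Amount (#PCDATA)>
]>
<!-- Valid against the DTD even though the content is not a number -->
<Amount>three hundred</Amount>
```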

XML Schema -

· The recommended way to express schemas for XML-based Web services is via XML Schema.

· XML Schema comprises two specifications managed by the W3C:

1. XML Schema Part 1: Structures (http://www.w3.org/TR/xmlschema-1/) and

2. XML Schema Part 2: Datatypes (http://www.w3.org/TR/xmlschema-2/).

· XML Schema provides a rich syntax for defining schemas used to validate XML instance documents.

· It not only allows you to define the structure of an XML document, but it also allows you to define the type of data the document contains and any constraints on that data.

· Also, it lets you specify foreign key and referential integrity constraints.

e.g.

XML Schema

<?xml version='1.0'?>
<schema xmlns='http://www.w3.org/2001/XMLSchema'>
  <element name='Amount'/>
</schema>

Instance Document

<?xml version='1.0'?>
<Amount>351.43</Amount>

The above sample schema is a valid XML document that is capable of being consumed by any standard XML parser. A schema definition is contained within a root schema element. The example schema defines one element named Amount. XML documents that can be validated against a schema are called Instance Documents.
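
As a sketch of how a schema can go beyond structure, the Amount declaration could be given a type and a constraint. The choice of decimal, the non-negative range, and the two-digit limit are assumptions made for illustration; they are not part of the example above.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Amount must be a non-negative decimal with at most two fraction digits -->
  <element name="Amount">
    <simpleType>
      <restriction base="decimal">
        <minInclusive value="0"/>
        <fractionDigits value="2"/>
      </restriction>
    </simpleType>
  </element>
</schema>
```

Under this schema an instance containing 351.43 would validate, while one containing -5 or abc would be rejected.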

DataType Support

· One of the more useful features of XML Schema is that it defines a core set of datatypes.

· These include basic programming types such as string, int, float, and double; mathematical types such as integer and decimal; and XML types such as NMTOKEN and IDREF.

· One of the most significant advantages of the XML Schema type system is that it is completely platform independent. Values of types are consistently represented no matter what hardware, operating system, or XML processing software is used.

· The XML Schema type system allows XML-based protocols such as SOAP to achieve strong interoperability in heterogeneous computing environments.
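
A few of the built-in types and their lexical forms can be sketched as follows. The element names are hypothetical; only the type names come from XML Schema Part 2.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Quantity"  type="int"/>       <!-- e.g. 12 -->
  <element name="Price"     type="decimal"/>   <!-- e.g. 351.43 -->
  <element name="InStock"   type="boolean"/>   <!-- true, false, 1 or 0 -->
  <element name="OrderedAt" type="dateTime"/>  <!-- e.g. 2008-07-17T09:30:00Z -->
  <element name="Sku"       type="NMTOKEN"/>   <!-- e.g. A-100 -->
</schema>
```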

SOAP message – Simple (no type information)

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <PurchaseItem>
      <Item>Apple</Item>
      <Quantity>12</Quantity>
    </PurchaseItem>
  </soap:Body>
</soap:Envelope>

SOAP message – Typed (containing type information)

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <PurchaseItem>
      <Item xsi:type="xsd:string">Apple</Item>
      <Quantity xsi:type="xsd:int">1</Quantity>
    </PurchaseItem>
  </soap:Body>
</soap:Envelope>

Namespaces in XML Schema:

Namespaces are leveraged heavily in XML schemas. Namespaces provide logical boundaries for entities defined within a schema.

For example, above, there is a schema that defines the Amount element. What if someone else defines an Amount element? If an Amount element appears in an instance document, how do I know whether it is an instance of the one defined by my schema or theirs?

If each Amount element were defined within a separate namespace, the Amount element would be fully qualified to a particular namespace and therefore would be unambiguous. To achieve this, we use the ‘targetNamespace’ attribute.

1). targetNamespace Attribute

· The targetNamespace attribute is used to set the identifier of the namespace.

· The value of this attribute is a URI that serves as an opaque pointer to reference the namespace.

· A namespace identified by a URI is defined within a schema document.

· Entities that can be scoped to namespaces include datatypes, elements, and attributes.

· Within an XML Schema document, the schema element can contain a targetNamespace attribute that holds the URI of the namespace the schema defines.

XML Schema

<?xml version='1.0'?>
<schema xmlns='http://www.w3.org/2001/XMLSchema'
        targetNamespace='urn:Commerce'>
  <element name='Amount'/>
</schema>

Instance Document

<?xml version='1.0'?>
<Amount xmlns='urn:Commerce'>351.43</Amount>

2). xmlns Attribute

· An XML document may reference one or more schemas in order to fully qualify the entities it references.

· Both instance documents and schema documents can reference schemas.

· Instance documents must refer to the namespace URI in order to fully qualify the entities referenced.

· You can accomplish this by adding an xmlns attribute to any element within the document, as shown above.

Moniker:

· When you reference a schema namespace in an XML Document, you can assign the reference a moniker.

· The assignment of a moniker to a referenced namespace takes the form of xmlns:moniker='SomeURI'.

· Once a moniker is assigned, any entity referenced from that namespace must be prefixed with the moniker.

See the example below, where ‘soap’ is used as the moniker for the SOAP Envelope namespace.

SOAP message – here ‘soap’ is used as the moniker

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <Amount xmlns='urn:Commerce'>123.45</Amount>
  </soap:Body>
</soap:Envelope>

· It is often necessary to reference multiple schemas. You can do this by adding multiple xmlns attributes.

· These are often added to the root element of the document for better readability and developer convenience.

· However, schema references can be made in any element within the instance document.

Here is a SOAP message that contains two Amount elements in the message body:

SOAP message – two equivalent Amount elements

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Amount xmlns='urn:Commerce'>123.45</Amount>
    <soap:Amount xmlns:soap='urn:Commerce'>123.45</soap:Amount>
  </soap:Body>
</soap:Envelope>

Even though I used different syntax, both Amount elements in the preceding document are equivalent. Let’s look at how I used the individual namespace references.

· Three namespace references were made. The first reference was made in the root element and defines the soap: moniker that is used to fully qualify entities referenced in the SOAP Envelope schema.

· The second reference sets urn:Commerce as the default namespace for the first Amount element and its child elements (if any existed).

· The final reference overrides the soap: moniker defined earlier to instead reference the Commerce schema for the second Amount element and its children (if any existed).

· By convention, the XML Schema namespace is assigned the xsd moniker and the XML Schema Instance namespace is assigned the xsi moniker.

· Also, if a schema contains references to its own definitions, its own namespace is usually assigned the tns moniker.

· A reference to a namespace that is not assigned a moniker defines the default namespace.

· Elements without a prefix that are within the scope of a default namespace declaration are qualified with respect to the default namespace. Unprefixed attributes are not; a default namespace declaration does not apply to attribute names.
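
The moniker conventions and the scoping of a default namespace described above can be sketched in one instance document. The Order and Note elements and the urn:Notes namespace are hypothetical, introduced only for illustration.

```xml
<?xml version="1.0"?>
<!-- The default namespace declared here applies to Order and to its unprefixed descendants -->
<Order xmlns="urn:Commerce"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Amount is in urn:Commerce; xsi:type is in the XML Schema Instance namespace -->
  <Amount xsi:type="xsd:decimal">123.45</Amount>
  <!-- A nested default declaration overrides the outer one for Note and its children -->
  <Note xmlns="urn:Notes">Deliver before noon</Note>
</Order>
```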

Tns moniker -

Note - tns is short for “this namespace.”

· Namespace references are also used heavily in schema documents. Within schema documents, it is often necessary to reference entities defined within the document.

· You can do this by creating a reference within the schema document to itself.

· By convention, this reference is usually associated with the tns: moniker, which is short for “this namespace.”

Here is a portion of the SOAP Envelope schema that shows the use of the tns: moniker.

- The schema element sets the target namespace to http://schemas.xmlsoap.org/soap/envelope/.

- It also contains a reference to itself that is given the moniker tns:

- The schema defines the Envelope element. The Envelope element is declared as type Envelope.

- Because the type definition is contained within the schema, it is prefixed by tns:

XML Schema – here the ‘tns:’ moniker is used

<?xml version='1.0'?>
<!-- XML Schema for SOAP v 1.1 Envelope -->
<schema xmlns='http://www.w3.org/2001/XMLSchema'
        xmlns:tns='http://schemas.xmlsoap.org/soap/envelope/'
        targetNamespace='http://schemas.xmlsoap.org/soap/envelope/'>
  <!-- Definition for the Envelope element -->
  <element name='Envelope' type='tns:Envelope'/>
  <!-- Definition for the Envelope type -->
  <complexType name='Envelope'>
    <sequence>
      <element ref='tns:Header' minOccurs='0'/>
      <element ref='tns:Body' minOccurs='1'/>
      <any minOccurs='0' maxOccurs='unbounded'/>
    </sequence>
    <anyAttribute/>
  </complexType>
</schema>

3). schemaLocation Attribute

· The URI of a namespace reference is an opaque pointer. So even if the URI is specified in the form of a URL, you cannot count on it to resolve to the actual schema document.

· However, you can use the schemaLocation attribute to give the parser hints on where the schema documents that define the referenced namespaces are located.

· The value of the schemaLocation attribute is a whitespace-delimited list of pairs. Each pair contains the URI of a namespace followed by the URL that resolves to the schema document that defines that namespace.

Instance Document

<?xml version='1.0'?>
<Amount xmlns='urn:Commerce'
        xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
        xsi:schemaLocation='urn:Commerce http://somedomain/Commerce.xsd'>123.45</Amount>
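
When an instance document draws on more than one namespace, the pairs are simply listed one after another. In this sketch the urn:Shipping namespace, the Order and Carrier elements, and the Shipping.xsd URL are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Two namespace/location pairs, separated by whitespace -->
<Order xmlns="urn:Commerce"
       xmlns:ship="urn:Shipping"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:Commerce http://somedomain/Commerce.xsd
                           urn:Shipping http://somedomain/Shipping.xsd">
  <Amount>123.45</Amount>
  <ship:Carrier>Ground</ship:Carrier>
</Order>
```

Because the locations are only hints, a parser is free to ignore them and load the schemas from elsewhere.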

4). noNamespaceSchemaLocation Attribute

· Schemas are not required to define namespaces.

· You can use the noNamespaceSchemaLocation attribute to reference schemas with no namespace.

Here is an example:

- Because the Amount element is not defined within a namespace, I used the noNamespaceSchemaLocation attribute to reference the schema.

- The unqualified Amount element is then validated against the Commerce.xsd schema.

XML Schema

<?xml version='1.0'?>
<schema xmlns='http://www.w3.org/2001/XMLSchema'>
  <element name='Amount'/>
</schema>

Instance Document

<?xml version='1.0'?>
<Amount xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
        xsi:noNamespaceSchemaLocation='file:Commerce.xsd'>123.45</Amount>

Custom Datatypes

· Elements are defined using the element element. The name attribute is used to specify the name of the element and the type attribute to indicate what type of data the element can contain.

· Datatypes fall into two categories, simple types and complex types. Simple types cannot contain subelements or attributes; complex types can.

1). Simple Types

· Simple types are datatypes that can be used to describe the type of data contained within an element or an attribute.

· Instances of simple types cannot contain attributes or other elements.

· Examples of simple types include int, long, string, and dateTime. A simple type can also define an enumeration or a union.

· A simple type definition always derives from another simple type.

· The three types of derivations allowed by XML Schema are By Restriction, By List, and By Union.

- By Restriction - A simple type derived from its base type by restriction can impose additional restrictions on the values that instances of the type can contain. Therefore, instances of a simple type derived by restriction can contain only a subset of the values allowed by its base type.

- By List - An instance of a type derived by list contains a whitespace-separated list of values of the list's item type.

- By Union - An instance of a type derived by union can contain a value of any one of the types contained within the union.

Derivation by restriction:

<?xml version='1.0'?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <simpleType name="MyInt">
    <restriction base="int"/>
  </simpleType>
  <simpleType name="GenericProductId">
    <restriction base="string">
      <minLength value="1"/>
      <maxLength value="20"/>
    </restriction>
  </simpleType>
  <simpleType name="Percent">
    <restriction base="integer">
      <minInclusive value="0"/>
      <maxInclusive value="100"/>
    </restriction>
  </simpleType>
</schema>

Derivation by union:

<?xml version='1.0'?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <simpleType name="MyUnion">
    <union memberTypes="string int"/>
  </simpleType>
  <simpleType name="PhoneNumber">
    <union>
      <!-- US phone number; member types declared inside a union are anonymous -->
      <simpleType>
        <restriction base="string">
          <pattern value="\([0-9]{3}\) [0-9]{3}-[0-9]{4}"/>
        </restriction>
      </simpleType>
      <!-- UK phone number; literal + and parentheses must be escaped -->
      <simpleType>
        <restriction base="string">
          <pattern value="\+[0-9]{2} \([0-9]\)[0-9]{3} [0-9]{3} [0-9]{4}"/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
</schema>
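
The schemas above show restriction and union; derivation by list is not illustrated, so here is a minimal sketch. The type names IntList and ShortIntList are hypothetical. The xsd moniker is used instead of a default namespace so that the unprefixed reference to IntList resolves to this schema's own (no-namespace) definition.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A whitespace-separated list of int values, e.g. "1 2 3" -->
  <xsd:simpleType name="IntList">
    <xsd:list itemType="xsd:int"/>
  </xsd:simpleType>
  <!-- A list type can itself be restricted, here to at most five items -->
  <xsd:simpleType name="ShortIntList">
    <xsd:restriction base="IntList">
      <xsd:maxLength value="5"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```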

2). Complex Types

· A complex type is defined using the complexType element. The complexType element contains declarations for all elements and attributes that can be contained within the element.

· A complex type is a logical grouping of element and/or attribute declarations.

· For example, the SOAP Envelope schema defines numerous complex types. The Envelope itself is a complex type because it must contain other elements such as the Body element and possibly a Header element.

<?xml version='1.0'?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="urn:Commerce" targetNamespace="urn:Commerce">
  <!-- Type Definitions -->
  <simpleType name="ProductId">
    <restriction base="string">
      <minLength value="1"/>
      <maxLength value="20"/>
      <!-- Excludes / \ [ ] : ; | = , + * ? > < -->
      <pattern value="[^/\\\[\]:;|=,+*?&gt;&lt;]+"/>
    </restriction>
  </simpleType>
  <simpleType name="Items">
    <!-- The base type lives in the target namespace, so it needs the tns: moniker -->
    <restriction base="tns:ProductId">
      <enumeration value="Apple"/>
      <enumeration value="Banana"/>
      <enumeration value="Orange"/>
    </restriction>
  </simpleType>
  <!-- Request Message (work-in-progress) -->
  <element name="PurchaseItem">
    <complexType>
      <sequence>
        <element name="Item" type="tns:ProductId"/>
        <element name="Quantity" type="int"/>
      </sequence>
    </complexType>
  </element>
  <!-- Response Message (work-in-progress) -->
  <element name="PurchaseItemResponse">
    <complexType>
      <sequence>
        <element name="Amount" type="double" nillable="true"/>
      </sequence>
    </complexType>
  </element>
</schema>
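
A request message that conforms to the PurchaseItem declaration might look like the sketch below, assuming the content is read as an ordered sequence of Item and Quantity. The c moniker is hypothetical. Note that the schema does not set elementFormDefault, so the locally declared Item and Quantity elements are unqualified and must appear without a namespace.

```xml
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- PurchaseItem is a global element, so it is qualified with urn:Commerce -->
    <c:PurchaseItem xmlns:c="urn:Commerce">
      <!-- Item and Quantity are local elements and stay unqualified -->
      <Item>Apple</Item>
      <Quantity>12</Quantity>
    </c:PurchaseItem>
  </soap:Body>
</soap:Envelope>
```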

Namespace Scoping

· Detailed From Book

· Complex types can be derived by restriction or by extension.

· They can have simple content (character data plus attributes, but no child elements) or complex content (child elements, optionally with attributes).

· These attributes and elements can be locally defined or can be references of globally defined entities.

· If the entities within a complex type are locally defined, they should be associated with the namespace by having their form attribute set to qualified. This makes it easier to author instance documents that reference a default namespace.

· You can also override the default value for the form attribute. You can do this by setting two attributes in the schema element: elementFormDefault and attributeFormDefault. Schemas automatically generated by the .NET platform for Web services will generally set elementFormDefault and attributeFormDefault to qualified.
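
The effect of elementFormDefault can be sketched with a cut-down PurchaseItem schema. Using plain string for Item is a simplification made here for brevity.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:Commerce"
        elementFormDefault="qualified">
  <element name="PurchaseItem">
    <complexType>
      <sequence>
        <!-- Local elements now belong to urn:Commerce as well -->
        <element name="Item" type="string"/>
        <element name="Quantity" type="int"/>
      </sequence>
    </complexType>
  </element>
</schema>
```

With every element qualified, an instance document can declare urn:Commerce once as the default namespace and leave all elements unprefixed:

```xml
<?xml version="1.0"?>
<PurchaseItem xmlns="urn:Commerce">
  <Item>Apple</Item>
  <Quantity>12</Quantity>
</PurchaseItem>
```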

Polymorphism

· Detailed From Book

· XML Schema enables polymorphic behavior by allowing an element to contain an instance of a type derived from its declared type. The instance document identifies the derived type with the xsi:type attribute.

· XML Schema also allows elements to be substituted with elements of a compatible type via substitution groups.

· To facilitate polymorphic behavior, an instance of a derived type must be substitutable wherever an instance of its base type is expected.
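
Both mechanisms can be sketched in one small schema. The Payment and CardPayment names and their child elements are hypothetical, chosen only to show a base type, a type derived by extension, and a substitution group.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="urn:Commerce"
            targetNamespace="urn:Commerce"
            elementFormDefault="qualified">
  <!-- Base type -->
  <xsd:complexType name="Payment">
    <xsd:sequence>
      <xsd:element name="Amount" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Derived type: extends Payment with one more element -->
  <xsd:complexType name="CardPayment">
    <xsd:complexContent>
      <xsd:extension base="tns:Payment">
        <xsd:sequence>
          <xsd:element name="CardNumber" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- Declared with the base type; an instance may carry a derived type -->
  <xsd:element name="Payment" type="tns:Payment"/>
  <!-- Member of Payment's substitution group: may appear wherever Payment is referenced -->
  <xsd:element name="CardPayment" type="tns:CardPayment"
               substitutionGroup="tns:Payment"/>
</xsd:schema>
```

An instance can then supply the derived type under the base element's name by naming it with xsi:type:

```xml
<?xml version="1.0"?>
<c:Payment xmlns:c="urn:Commerce"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="c:CardPayment">
  <c:Amount>123.45</c:Amount>
  <c:CardNumber>4111111111111111</c:CardNumber>
</c:Payment>
```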

Restricting Inheritance

· Detailed From Book

· XML Schema also provides mechanisms for restricting inheritance and polymorphic behavior.

· Complex type definitions can restrict how the type can be inherited by setting the final attribute.

· Element definitions can also restrict the type of substitutions that are allowed by setting the block attribute.
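
As a rough sketch of the two attributes, using a hypothetical Payment type and element:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="urn:Commerce"
            targetNamespace="urn:Commerce">
  <!-- final="extension": no other type may derive from Payment by extension -->
  <xsd:complexType name="Payment" final="extension">
    <xsd:sequence>
      <xsd:element name="Amount" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- block="substitution": substitution-group members may not stand in for Payment -->
  <xsd:element name="Payment" type="tns:Payment" block="substitution"/>
</xsd:schema>
```

Both attributes also accept restriction and #all, so a schema author can close off one derivation route while leaving the other open.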

Thanks & Regards,

Arun Manglick || Senior Tech Lead
