| How to Submit Solutions for this Homework/Computer Lab | |
|---|---|
| Due Date: | Friday 9-08-06 |
| E-mail To: | ![]() |
| Subject Line: | HWCL 01 |
| 1st Line: | Your name and student number |
This computer lab is written for Windows 9x/2000/NT/XP-based PCs with a Mozilla/Firefox or Internet Explorer browser. Those portions of the lab that are not platform-specific can also be run on a Mac.
The goals of this computer lab are:
Computer resources and user accounts on the CU Boulder campus are handled by Information Technology Services (ITS). If you need to create an account, change a password, or find the locations of computing labs on the campus, go to ITS.
In order for two (or more) users to communicate with each other it clearly takes some infrastructure, e.g., in the form of wires or radio transmitters between the users, or in the form of a network to which the users are connected. Examples of such networks are the PSTN (public switched telephone network) and the Internet.
To use a communication infrastructure like the Internet for communication between arbitrary sets of users, the following three ingredients are necessary:
| Ingredient | Definition |
|---|---|
| 1. Address | An address is a series of letters and/or numbers that uniquely identifies users and other resources in a network. |
| 2. Protocol | A protocol is a series of steps, involving two or more parties, designed to accomplish a task. |
| 3. Language | A language is a system of words or symbols and rules to combine them to describe ideas, concepts, tasks, processes, etc. |
For example, if user A wants to talk to user B over the telephone, user A first dials user B's telephone number (address). User B picks up the phone when it rings and says hello (part of protocol). Users A and B then talk to each other, e.g., in English (language). At the end of the call both users hang up (part of protocol).
In everyday life you are used to addresses in the form of postal addresses, telephone numbers, social security numbers, etc. Inside computers numerical addresses in binary form are used to locate information in memory and to communicate with peripherals such as the keyboard, the monitor, disk drives, printers, etc.
On the Internet, addresses take the form of a fully qualified domain name (see below) of a host computer, such as search.yahoo.com, or of a numeric IP (Internet Protocol) address, such as 66.218.71.200 (dotted decimal address).
A simple example of an everyday protocol is going to the store, selecting groceries, having them scanned and bagged at the checkout stand, paying the amount shown on the cash register, and then returning home to eat. In this case you interact with the grocery store and its clerks and you get food in exchange for money.
On the Internet several different protocols exist according to which computers interact to accomplish specific tasks. Some of the most commonly used (application-layer) protocols on the Internet are:
| Application-Layer Protocols on the Internet | |
|---|---|
| http | Hypertext Transfer Protocol. Used to access web pages. |
| smtp | Simple Mail Transfer Protocol. Used for e-mail. |
| ftp | File Transfer Protocol. Used to upload and download files. |
| telnet | Terminal emulation. Used to remotely control computers. |
Natural languages like English, French, German, etc., are quite complex because of the many irregularities and exceptions that need to be known to use and interpret these languages correctly.
For the purpose of programming computers, or storing, transmitting, and rendering text, images, and sound, languages with a much more rigorous structure and an unambiguous interpretation are needed. On the WWW the most common document language is HTML (hypertext markup language) and its variants. The language of a resource on the Internet is usually known from the filename extension of the file in which the resource is stored. A file with the name index.html, for instance, contains hypertext markup language. The filename extensions of the most common "languages" used on the Internet are shown in the table below.
| Common "Languages" (Filename Extensions) on the Internet and the WWW | |
|---|---|
| txt | A plain text file, usually written in ASCII (American Standard Code for Information Interchange) "language". |
| htm or html | Hyper-Text Markup Language. An html document can contain text, images, tables, and links to other documents or multimedia files. |
| asp | Active Server Pages. A scripting environment used by Microsoft Internet Information Servers (IIS) in which html, scripts and reusable ActiveX control components are combined to create dynamic web pages on the fly. ASP is not a language; it is a process that uses a (possibly extended) version of the html "language". |
| cfm or cfml | ColdFusion Markup Language. A server-side scripting environment that can be used to interface with databases and create interactive Web applications. What is eventually sent to the browser uses the html "language" (possibly with some extensions). |
| php | PHP stands for 'PHP: Hypertext Preprocessor'. A script language and interpreter that is freely available, primarily for use on Linux Web servers. As with Active Server Pages, PHP scripts are embedded within HTML code, but after processing the server just sends out documents in HTML "language". |
| pdf | Portable Document Format. The PDF "language" can describe documents containing any combination of text, graphics, and images in a device-, resolution-, and platform-independent format. A free reader for PDF files can be downloaded from http://www.adobe.com/products/acrobat/readstep.html. |
| gif | Graphics Interchange Format. A "language" that describes images, which can be animated, in (lossless) compressed form. Often used for icons, graphics and clipart. |
| jpg or jpeg | Joint Photographic Experts Group. A "language" that describes images in compressed form using (more or less) lossy compression. Good for photographic images and images that use more than 256 different colors. |
| png | Portable Network Graphics. Developed as a (royalty-free) successor to the gif format. A "language" to describe images with up to 64 bits per pixel in compressed form using lossless compression. Good for graphic images with large color depth. |
| mpg or mpeg | Moving Pictures Experts Group. A "language" that describes compressed audio and video information using one of the formats specified by the MPEG. |
| exe | "executable" files. These are files encoded in a low-level computer language that can be executed directly on the computer platform for which they were written, e.g., a Windows machine that uses an Intel chipset. |
| zip | "zipped" files. These are file archives that contain one or more files, with arbitrary filename extensions, in compressed form. This is handy for saving on storage space and transmission time, and for organizing sets of files that are related, but possibly written in different "languages" (e.g., text and image files). |
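As a minimal sketch of how a program might infer the "language" of a resource from its filename extension, the following Python snippet looks the extension up in a small dictionary. The names `LANGUAGES` and `language_of` are illustrative assumptions, and the dictionary only covers a few rows of the table above.

```python
import os.path

# Hypothetical lookup table, filled in from a few rows of the table above.
LANGUAGES = {
    "txt": "plain text (ASCII)",
    "htm": "Hyper-Text Markup Language",
    "html": "Hyper-Text Markup Language",
    "pdf": "Portable Document Format",
    "gif": "Graphics Interchange Format",
    "jpg": "Joint Photographic Experts Group",
    "jpeg": "Joint Photographic Experts Group",
    "png": "Portable Network Graphics",
    "mpg": "Moving Pictures Experts Group",
    "mpeg": "Moving Pictures Experts Group",
    "zip": "zipped file archive",
}

def language_of(filename):
    """Return the "language" suggested by the filename extension."""
    # os.path.splitext("index.html") returns ("index", ".html").
    extension = os.path.splitext(filename)[1]
    # Drop the leading dot and compare in lower case (a choice made for this sketch).
    return LANGUAGES.get(extension[1:].lower(), "unknown")

for name in ("index.html", "bike.mpg", "README"):
    print(name, "->", language_of(name))
```

Note that the extension is only a hint: nothing prevents a file named index.html from containing something other than HTML.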
Documents, images, sounds, programs, etc, are stored in the form of data files with appropriate filename extensions on computers. All major computer platforms use hierarchically organized file systems with directories, subdirectories, sub-subdirectories (or folders, subfolders, sub-subfolders), etc.
The conventions that determine the syntax and other restrictions for directories and file names are dependent on the operating system under which a particular host computer runs. The main variants of operating systems are Unix/Linux, Windows, and MacOS. For historical reasons Windows does not truly distinguish between upper and lower case in filenames (even though it lets you use upper and lower case letters in file and directory names). Thus, Windows is usually quite forgiving if you make upper/lower case errors when specifying a file or path name. Unix/Linux, however, which is very often used for WWW servers, is case sensitive when it comes to directory and file names, so index.html and Index.html refer to two different files.
As an example, the following figure shows all subfolders of a folder called "directories", as displayed in Windows Explorer (set to "Detailed View").
The starting point, which is also called the "root" of a directory tree, is the folder or directory called "directories" in the above figure. One level below are the subfolders or subdirectories "Dogs", "e-texts", "images", and "Monkeys". Each of these subdirectories can contain other sub-subdirectories, etc. Often the names of directories and their subdirectories are chosen in a logical fashion. For example, the electronic text versions of the works of Edgar Allan Poe are stored using the directory path directories/e-texts/Poe,EdgarAllan.
The above figure does not show the actual files that are stored in the various directories. A more explicit graph of the directory "directories" and its subdirectories (highlighted in yellow) that shows all files (the entries without highlighting) is shown in the next figure.
The representation in this figure is called a tree representation of the root directory "directories" because the blue lines that lead from directories to their subdirectories resemble the branches of a tree. There are no further branches emanating from the actual stored files, and thus the filenames are called the leaves of the tree. Note that any (sub)directory may contain files as well as further (sub)directories, e.g., "gutindex.all" is a file in the subdirectory "e-texts", whereas "Maugham,W.Somerset" is another subdirectory.
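A tree like this can also be listed by a short program. The sketch below builds a small temporary copy of part of the "directories" tree and walks it with Python's `os.walk`. Only the names mentioned in the text are recreated (the actual contents of the original folders are not known here), and the function name `tree_lines` is an assumption made for illustration.

```python
import os
import tempfile

def tree_lines(root):
    """Return an indented listing of all directories and files below root."""
    lines = []
    for current, subdirs, files in os.walk(root):
        subdirs.sort()  # visit subdirectories in a fixed (alphabetical) order
        relative = os.path.relpath(current, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        # Directories are marked with a trailing slash; files (leaves) are not.
        lines.append("  " * depth + os.path.basename(current) + "/")
        for name in sorted(files):
            lines.append("  " * (depth + 1) + name)
    return lines

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "directories")
    for sub in ("Dogs", "images", "Monkeys",
                os.path.join("e-texts", "Maugham,W.Somerset"),
                os.path.join("e-texts", "Poe,EdgarAllan")):
        os.makedirs(os.path.join(root, sub))
    # Create one empty file so that the tree has a leaf.
    open(os.path.join(root, "e-texts", "gutindex.all"), "w").close()
    listing = tree_lines(root)

print("\n".join(listing))
```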
URI stands for uniform resource identifier, and URL stands for uniform resource locator. A URI is characterized by the following definitions:
| Characterization of Uniform Resource Identifiers (URIs) | |
|---|---|
| Uniform | Uniformity allows different types of resource identifiers (e.g., for text and images) to be used in the same context, even if the mechanisms to access these sources differ. But it also allows the identifiers to be reused in many different contexts, so that new applications or protocols can take advantage of a pre-existing and widely-used set of resource identifiers. |
| Resource | A resource is anything that has an identity, such as an electronic document, an image, a service, etc. Not all resources are network retrievable, e.g., a plumber who stops a leak is also considered a resource. |
| Identifier | An identifier is an object that acts as a reference to a resource. In the case of URIs, the object is a sequence of characters with a restricted syntax, e.g., http://www.w3.org/TR/html401/about.html. |
Comment: The concept of a URI is more general than the concept of a URL because a URI can describe any resource (e.g., a book that was written 1000 years ago and subsequently lost in a fire), whereas a URL must also specify a location where the resource can be retrieved from. In practice the terms URI and URL are often used interchangeably.
To ensure that every user in a network executes a given protocol according to the same rules and every address is specified and used in a uniform manner, standards that everyone in the network agrees upon are necessary.
Definition: A standard is a documented agreement containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.
For the Internet, all information necessary to achieve standardization is published in the form of RFCs (Requests for Comments), which can be accessed on-line at http://www.rfc-editor.org.
The generic syntax of a URI, as defined in RFC 3986, consists of five main components as follows:
| Generic Syntax of a Uniform Resource Identifier (URI) |
|---|
| <scheme>://<authority>/<path>?<query>#<fragment> |
Here is an example of a URI (or URL) for the WWW and its interpretation:
| Example: http://search.yahoo.com/search?p=car+loan | |
|---|---|
| <scheme> | http (HTTP protocol) |
| <authority> | search.yahoo.com (fully qualified host name) |
| <path> | search (name of file on host) |
| <query> | p=car+loan (query passed along to "search") |
| <fragment> | (omitted) |
Here is another example of a URI (or URL) for the WWW and its interpretation:
| Example: http://www.schaik.com/pngsuite/pngsuite.html#palette | |
|---|---|
| <scheme> | http (HTTP protocol) |
| <authority> | www.schaik.com (fully qualified host name) |
| <path> | pngsuite/pngsuite.html (directory/name of HTML file on host) |
| <query> | (omitted) |
| <fragment> | palette (specific position within document) |
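The same decomposition can be reproduced with Python's standard `urllib.parse.urlsplit` function, shown here as a small sketch. The helper name `components` is an assumption for illustration. Note that `urlsplit` reports the path with its leading slash (`/search`), whereas the tables above list it without.

```python
from urllib.parse import urlsplit

def components(uri):
    """Split a URI into its five generic components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

for uri in ("http://search.yahoo.com/search?p=car+loan",
            "http://www.schaik.com/pngsuite/pngsuite.html#palette"):
    print(uri)
    for name, value in components(uri).items():
        # Components that are absent come back as empty strings.
        print("  {:<10} {}".format(name, value or "(omitted)"))
```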
More generally, if the URI specifies the location of a retrievable resource on a network, then the five components have the following meanings (the specification in RFC 3986 is actually more complex and general, but this generality is not currently needed in the context of the WWW):
| Meaning of URI Components for a URL | |
|---|---|
| <scheme> | Specifies the protocol that is used to retrieve the resource. Examples are http (HyperText Transfer Protocol) which is used to access Web pages and ftp (File Transfer Protocol) which is used to exchange files between two computers. In general <scheme> strings are NOT CASE SENSITIVE. |
| <authority> | Specifies the host address in the form of a fully qualified domain name (see below) of a network host computer, such as search.yahoo.com, or in the form of a numeric IP (Internet Protocol) address, such as 66.218.71.200 (dotted decimal address). In general <authority> strings are NOT CASE SENSITIVE. |
| <path> | Specifies the path to the file (including directories and subdirectories) on the host (or server) that contains data in a "language" specific to an application, e.g., in HTML for the WWW. The "language" is usually (implicitly) specified by the filename extension (see above). In general <path> strings ARE CASE SENSITIVE and depend on the conventions of the host where the resource resides. |
| <query> | The query component is a string of information, e.g., the specification of search terms for a database, to be interpreted by the resource. In general <query> strings ARE CASE SENSITIVE and depend on the conventions used by the application which processes them. |
| <fragment> | The fragment component allows a secondary resource within a primary resource to be specified, e.g., a subsection in an HTML document. It is up to the primary resource to determine how to interpret the fragment. In general <fragment> strings ARE CASE SENSITIVE and depend on the conventions used by the application which processes them. |
Not every URI (or URL) will consist of all five parts. The <query> part is usually only needed for searching databases or submitting data from forms embedded in a Web page. If the <path> component is omitted, then the host retrieves a default document, e.g., named index.html. If the <scheme> and <authority> components are omitted in a document, then the URL is taken to be relative to the URL of the document.
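To illustrate how a relative URL is resolved against the URL of the document that contains it, here is a short sketch using `urllib.parse.urljoin` from the Python standard library. The base URL is taken from the example above. The relative references `images/logo.png` and `/index.html` are hypothetical names chosen only for illustration.

```python
from urllib.parse import urljoin

# URL of the document in which the relative references appear.
base = "http://www.schaik.com/pngsuite/pngsuite.html"

for reference in ("#palette", "images/logo.png", "/index.html"):
    # Scheme and authority (and, where applicable, the directory part
    # of the path) are taken over from the base URL.
    print("{:<18} -> {}".format(reference, urljoin(base, reference)))
```

A reference that starts with a slash replaces the whole path of the base URL, while one without a leading slash is resolved relative to the directory of the base document.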
Upper/Lower Case Letters. The portions of a URL that specify the protocol and the host computer address are case insensitive, i.e., http, HttP, and HTTP all specify the same protocol and www.colorado.edu, WWW.Colorado.EDU, and WwW.cOlOrAdO.eDu all specify the same host computer. But the <path>, <query>, and <fragment> strings are specific to the host computer and the programs running on it, and will in most cases be case sensitive. Thus, whereas http://ece.colorado.edu/~mathys/ecen1200 takes you to the WWW site of this class, http://ece.colorado.edu/~mathys/ECEN1200 will give you an error message.
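The case rules can be seen in a small sketch based on `urllib.parse.urlsplit`, which lowercases the scheme, and on the `hostname` attribute of its result, which lowercases the host name. The path is left exactly as written. The mixed-case spelling of the second URL is made up for this illustration.

```python
from urllib.parse import urlsplit

a = urlsplit("http://ece.colorado.edu/~mathys/ecen1200")
b = urlsplit("HTTP://ECE.Colorado.EDU/~mathys/ECEN1200")

# Scheme and host are normalized to lower case, so they compare equal.
print(a.scheme == b.scheme, a.hostname == b.hostname)

# The path is not normalized; on a case-sensitive server these two
# paths may refer to different resources (or to none at all).
print(a.path == b.path)
```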
Special Characters. Certain characters cannot be used in URIs or should be avoided. Here's a table that lists all printable ASCII (American Standard Code for Information Interchange) characters and their classification for URIs:
| Special Characters in URIs | |
|---|---|
| reserved (special function) | ; / ? : @ & = + $ , |
| unreserved (no restrictions) | alpha a ... z, A ... Z and numeric 0 ... 9 and - _ . ! ~ * ' ( ) |
| excluded (must be escaped) | < > # % " SP, where SP stands for space. |
| unwise (better to escape) | { } \| \ ^ [ ] ` |
Non-printable characters, such as tabs, carriage returns, and line feeds, are not allowed in URIs. If a reserved, excluded or unwise character needs to be included as part of a URI, then it must be "escaped" according to the table below. The % sign is the escape character, and the 2-digit number after it is the hexadecimal representation of the ASCII code of the character.
| "Escaped" Representation of Special Characters in URIs | |||||
|---|---|---|---|---|---|
| SP | %20 | / | %2F | [ | %5B |
| " | %22 | : | %3A | \ | %5C |
| # | %23 | ; | %3B | ] | %5D |
| $ | %24 | < | %3C | ^ | %5E |
| % | %25 | = | %3D | ` | %60 |
| & | %26 | > | %3E | { | %7B |
| + | %2B | ? | %3F | \| | %7C |
| , | %2C | @ | %40 | } | %7D |
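The entries of this table can be computed rather than memorized. The sketch below derives the escaped form of a single character from its ASCII code and then uses the standard library functions `urllib.parse.quote` and `unquote` on a whole path segment. The function name `escape_char` and the file name `my file#1.html` are assumptions made for illustration.

```python
from urllib.parse import quote, unquote

def escape_char(ch):
    """Return % followed by the 2-digit hexadecimal ASCII code of ch."""
    return "%{:02X}".format(ord(ch))

for ch in (" ", "#", "|"):
    print(repr(ch), "->", escape_char(ch))

# quote() escapes every character that is not unreserved
# (by default it leaves "/" alone so that paths stay intact).
escaped = quote("my file#1.html")
print(escaped)
print(unquote(escaped))
```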
A host name, such as ece.colorado.edu, consists of fields separated by dots. The fields correspond to different levels in the hierarchical domain name system (DNS), with the highest hierarchy level (top level) at the end ("edu") and the lowest hierarchy level ("ece") at the beginning. Another example is shown in the following table.
| Example: galileo.jpl.nasa.gov | |
|---|---|
| gov | Top level domain, administered globally |
| nasa | Second level domain (NASA: National Aeronautics and Space Administration), administered by gov. |
| jpl | Third level domain (JPL: Jet Propulsion Laboratory), administered by nasa.gov |
| galileo | Fourth (and in this case bottom) level domain (Galileo project), administered by jpl.nasa.gov |
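The breakdown in this table can be sketched in a few lines of Python by splitting the host name at the dots and reading the fields from right to left. The function name `domain_levels` is an assumption for illustration.

```python
def domain_levels(host):
    """List (level, field, domain) tuples, starting at the top level domain."""
    fields = host.split(".")
    levels = []
    for level in range(1, len(fields) + 1):
        field = fields[-level]              # count fields from the right
        domain = ".".join(fields[-level:])  # domain name down to this level
        levels.append((level, field, domain))
    return levels

for level, field, domain in domain_levels("galileo.jpl.nasa.gov"):
    print(level, field, domain)
```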
Any host on the Internet has a unique numerical address associated with its name. Under IPv4 (Internet Protocol version 4), these numerical addresses have a fixed length of 32 bits and are usually expressed in dotted decimal form, by converting 8 bits at a time to a decimal number in the range 0..255. For example, the dotted decimal address of the host galileo.jpl.nasa.gov is 137.78.160.55 and the URLs http://www.tonic.to/faq.htm and http://206.14.214.154/faq.htm refer to the same resource (assuming that tonic.to does not change the assignment between the numeric IP address and the domain name).
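The conversion between a 32-bit address and its dotted decimal form, 8 bits at a time, can be sketched as follows. The function names are assumptions for illustration. Python's standard `ipaddress` module offers the same conversion ready-made.

```python
def from_dotted_decimal(text):
    """Combine the four decimal fields into one 32-bit integer."""
    value = 0
    for field in text.split("."):
        value = (value << 8) | int(field)  # append the next 8 bits
    return value

def to_dotted_decimal(value):
    """Convert a 32-bit integer to dotted decimal, 8 bits at a time."""
    fields = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(field) for field in fields)

address = from_dotted_decimal("137.78.160.55")
print(format(address, "032b"))     # the 32 bits of the address
print(to_dotted_decimal(address))  # back to dotted decimal form
```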
How about host addresses like 1.2.3.com or 1.2.3.4? Are these numeric IP addresses or domain names? To answer this question you look at the last field of the host address. If it is a number, then the address is a numeric IP address, otherwise it is a domain name.
Under IPv4, a valid numerical IP address has exactly 4 fields with decimal numbers in the range 0..255 in each. Valid domain names can in principle contain any number of fields (but more than 4 or 5 is not very practical), and the last field must be an official top level domain (TLD, see below).
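The rule of looking at the last field can be written down as a short sketch. The function name `classify_host` is an assumption, and the sketch deliberately does not check whether the last field of a domain name is an official top level domain.

```python
def classify_host(host):
    """Classify a host address by looking at its last field."""
    fields = host.split(".")
    if fields[-1].isdigit():
        # Numeric last field: must be a numeric IP address, which under
        # IPv4 needs exactly 4 decimal fields in the range 0..255.
        valid = len(fields) == 4 and all(
            field.isdigit() and 0 <= int(field) <= 255 for field in fields)
        return "valid IPv4 address" if valid else "invalid IPv4 address"
    return "domain name"

for host in ("1.2.3.com", "1.2.3.4", "1.2.3.256", "66.218.71.200"):
    print(host, "->", classify_host(host))
```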
In any kind of network, the address of each user must be distinct from the addresses of all other users. In a small network it is possible to just maintain a single table or database which keeps track of all addresses that are used. In a large network it is better to use a more distributed approach and to subdivide the whole address space into non-overlapping sets, which can then be either further subdivided, or administered by individual decentralized databases. The Internet uses a hierarchical decentralized system, called the domain name system (DNS), to ensure unique address allocation on a world-wide basis. At the top of the hierarchy are the top level domains (TLD), which can be either generic (or global) TLDs (gTLD) such as edu or country code TLDs (ccTLD) such as us, as shown in the following tables.
| Original generic Top Level Domains (gTLD) | |
|---|---|
| arpa | "Address and routing parameter area". Used for reverse address lookup (numeric IP to domain name). |
| com | This domain is intended for commercial entities or companies. |
| edu | This is the domain for 4-year colleges and universities. |
| gov | Reserved for agencies of the US Federal government. |
| mil | This domain is used by the US military. |
| net | Originally intended to hold only the computers of network providers and the network node computers. Now available to the general public. |
| org | Originally intended for any kind of organization that doesn't fit anywhere else. Now available to the general public. |
| Additional generic Top Level Domains (gTLD) | |
|---|---|
| aero | Intended for the air transport industry. |
| biz | Intended for business domain names. |
| cat | Domain names for Catalan language/culture. |
| coop | Domain names for cooperatives. |
| info | Is a general TLD with no restrictions that can be used by anyone. |
| int | For international organizations established by treaty. |
| jobs | Domain names for employment-related sites. |
| mobi | Domain names for sites catering to mobile devices. |
| museum | Domain names for museums. |
| name | Intended for personal Web sites in your name to post hobbies, interests, pictures, etc. |
| pro | Intended for use by certified professionals worldwide. |
| travel | Domain names for travel agents, airlines, hoteliers, etc. |
| Country code Top Level Domains (ccTLD) | |
|---|---|
| xx | xx stands for any 2-letter country code as specified in the ISO 3166-1 document, where ISO stands for International Organization for Standardization. |
| Examples of Country code Top Level Domains (ccTLD) | |
|---|---|
| af | Afghanistan |
| at | Austria |
| au | Australia |
| be | Belgium |
| ch | Switzerland |
| cn | China |
| de | Germany |
| fr | France |
| in | India |
| jp | Japan |
| mx | Mexico |
| ru | Russian Federation |
| sa | Saudi Arabia |
| uk | United Kingdom |
| us | United States |
To obtain credit for this homework/computer lab you need to answer the questions stated below. Send your solution by e-mail, making sure it conforms to the following rules.
Rules for the Submission of Homework/Computer Lab Solutions
Format: E-mail your solution as a plain text (ASCII) file. Do not use word processor files like Microsoft Word or "rich-text" HTML files.
Corrections: If you need to make corrections after you submitted your solution, resubmit all your answers (not only the ones that changed) since only your last submission of each homework/computer lab will be graded.
Teamwork: Teamwork is fine for the homework/computer labs, but the solutions must be turned in individually. In particular, copy and paste of entire solutions from other students is not acceptable.
Questions:
| URL # | <scheme> | <authority> | <path> | <query> | <fragment> |
|---|---|---|---|---|---|
| 1 | http | www.pocketmovies.net | pop.html | ||
| 2 | ftp | 128.138.189.30 | pub/ecen1200/mm/bike.mpg | ||
| 3 | http | www.aliceadsl.fr | meteo | ville=07156 | |
| 4 | http | www.w3.org | TR/html4/types.html | | h-6.5 |
hTtP://as.seen.on.tv/miracle_gro.asp?cash-in
http://show.biz/"oscar"awards.html
http://just.do.it/sweatshops.cfm?locations
http://flowers.4.u/order.htm#name
If a URL is illegal, point out why it is illegal.
©1996-2006, P. Mathys. Last revised: 9-07-06, PM.