ECEN 1200 - Telecommunications 1

Peter Mathys, Fall 2006, 9/01/06


Homework/Computer Lab 1: Resource Identification, Directories and Files


How to Submit Solutions for this Homework/Computer Lab
Due Date: Friday 9-08-06
E-mail To: no-spam e-mail
Subject Line: HWCL 01
1st Line: Your name and student number

This computer lab is written for Windows 9x/2000/NT/XP based PCs with a Mozilla/Firefox or Internet Explorer browser. Those portions of the lab that are not platform-specific can also be run on a Mac.


Goals of this Lab

The goals of this computer lab are:


Information Technology Services

Computer resources and user accounts on the CU Boulder campus are handled by Information Technology Services (ITS). If you need to create an account, change a password, or find the locations of computing labs on the campus go to ITS.


What Does it Take to Communicate?

In order for two (or more) users to communicate with each other it clearly takes some infrastructure, e.g., in the form of wires or radio transmitters between the users, or in the form of a network to which the users are connected. Examples of such networks are the PSTN (public switched telephone network) and the Internet.

To use a communication infrastructure like the Internet for communication between arbitrary sets of users, the following three ingredients are necessary:

1. Address An address is a series of letters and/or numbers that uniquely identifies users and other resources in a network.
2. Protocol A protocol is a series of steps, involving two or more parties, designed to accomplish a task.
3. Language A language is a system of words or symbols and rules to combine them to describe ideas, concepts, tasks, processes, etc.

For example, if user A wants to talk to user B over the telephone, user A first dials user B's telephone number (address). User B picks up the phone when it rings and says hello (part of protocol). Users A and B then talk to each other, e.g., in English (language). At the end of the call both users hang up (part of protocol).


Addresses

In everyday life you are used to addresses in the form of postal addresses, telephone numbers, social security numbers, etc. Inside computers numerical addresses in binary form are used to locate information in memory and to communicate with peripherals such as the keyboard, the monitor, disk drives, printers, etc.

On the Internet, addresses take the form of a fully qualified domain name (see below) of a host computer, such as search.yahoo.com, or of a numeric IP (Internet Protocol) address, such as 66.218.71.200 (dotted decimal address).


Protocols

A simple example of an everyday protocol is going to the store, selecting groceries, having them scanned and bagged at the checkout stand, paying the amount shown on the cash register, and then returning home to eat. In this case you interact with the grocery store and its clerks and you get food in exchange for money.

On the Internet several different protocols exist according to which computers interact to accomplish specific tasks. Some of the most commonly used (application-layer) protocols on the Internet are:

Application-Layer Protocols on the Internet
http Hypertext Transfer Protocol. Used to access web pages.
smtp Simple Mail Transfer Protocol. Used for e-mail.
ftp File Transfer Protocol. Used to upload and download files.
telnet Terminal emulation. Used to remotely control computers.

Languages

Natural languages like English, French, German, etc., are quite complex because of the many irregularities and exceptions that need to be known to use and interpret these languages correctly.

For the purpose of programming computers, or storing, transmitting, and rendering text, images, and sound, languages with much more rigorous structure and unambiguous interpretation are needed. On the WWW the most common document language is html (hypertext markup language) and its variants. The language of a resource on the Internet is usually known from the filename extension of the file in which the resource is stored. A file with the name index.html, for instance, contains hypertext markup language. The filename extensions of the most common "languages" used on the Internet are shown in the table below.

Common "Languages" (Filename Extensions) on the Internet and the WWW
txt A plain text file, usually written in ASCII (American Standard Code for Information Interchange) "language".
htm or html Hyper-Text Markup Language. An html document can contain text, images, tables, and links to other documents or multimedia files.
asp Active Server Pages. A scripting environment used by Microsoft Internet Information Servers (IIS) in which html, scripts and reusable ActiveX control components are combined to create dynamic web pages on the fly. ASP is not a language, it is a process that uses (a possibly extended) version of the html "language".
cfm or cfml ColdFusion Markup Language. A server-side scripting environment that can be used to interface with databases and create interactive Web applications. What is eventually sent to the browser uses the html "language" (possibly with some extensions).
php PHP stands for 'PHP: Hypertext Preprocessor'. A script language and interpreter that is freely available, primarily for use on Linux Web servers. As with Active Server Pages, PHP scripts are embedded within HTML code, but after processing the server just sends out documents in HTML "language".
pdf Portable Document Format. The PDF "language" can describe documents containing any combination of text, graphics, and images in a device-, resolution-, and platform-independent format. A free reader for PDF files can be downloaded from http://www.adobe.com/products/acrobat/readstep.html.
gif Graphics Interchange Format. A "language" that describes images, which can be animated, in (lossless) compressed form. Often used for icons, graphics and clipart.
jpg or jpeg Joint Photographic Experts Group. A "language" that describes images in compressed form using (more or less) lossy compression. Good for photographic images and images that use more than 256 different colors.
png Portable Network Graphics. Developed as a (royalty-free) successor to the gif format. A "language" to describe images with up to 64 bits per pixel in compressed form using lossless compression. Good for graphic images with large color depth.
mpg or mpeg Moving Picture Experts Group. A "language" that describes compressed audio and video information using one of the formats specified by the MPEG.
exe "executable" files. These are files encoded in a low-level computer language that can be executed directly on the computer platform for which they were written, e.g., a Windows machine that uses an Intel chipset.
zip "zipped" files. These are file archives that contain one or more files, with arbitrary filename extensions, in compressed form. This is handy for saving on storage space and transmission time, and for organizing sets of files that are related, but possibly written in different "languages" (e.g., text and image files).
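
As a rough illustration of how a filename extension hints at the "language" of a resource, the following Python sketch looks up the extension in a small table. The dictionary holds only a few entries taken from the table above, and the function name language_of is made up for this example. Real browsers and servers also rely on other information, so this is only a sketch.

```python
import os.path

# A few entries from the table above; the selection is illustrative only.
LANGUAGES = {
    "txt": "plain text (usually ASCII)",
    "htm": "Hyper-Text Markup Language",
    "html": "Hyper-Text Markup Language",
    "pdf": "Portable Document Format",
    "gif": "Graphics Interchange Format",
    "jpg": "JPEG image",
    "jpeg": "JPEG image",
    "png": "Portable Network Graphics",
    "zip": "zipped file archive",
}

def language_of(filename):
    """Guess the "language" of a file from its filename extension."""
    _, ext = os.path.splitext(filename)          # "index.html" -> ".html"
    return LANGUAGES.get(ext.lstrip(".").lower(), "unknown")

print(language_of("index.html"))
print(language_of("PHOTO.JPG"))
print(language_of("README"))
```

The lookup ignores upper/lower case in the extension, which is a choice made for this sketch and not a rule stated in the table.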

Directories, Subdirectories, and Files

Documents, images, sounds, programs, etc, are stored in the form of data files with appropriate filename extensions on computers. All major computer platforms use hierarchically organized file systems with directories, subdirectories, sub-subdirectories (or folders, subfolders, sub-subfolders), etc.

The conventions that determine the syntax and other restrictions for directory and file names depend on the operating system under which a particular host computer runs. The main variants of operating systems are Unix/Linux, Windows, and MacOS. For historical reasons Windows does not truly distinguish between filenames in upper and lower case (even though it lets you use upper and lower case letters in file and directory names). Thus, Windows is usually quite forgiving if you make upper/lower case errors when specifying a file or path name. Unix/Linux, however, which is very often used for WWW servers, is case sensitive when it comes to directory and file names, so index.html and Index.html refer to two different files.

As an example, the following figure shows all subfolders of a folder called "directories", as displayed in Windows Explorer (set to "Detailed View").

Subdirectories of the test directory `directories'

The starting point, which is also called the "root" of a directory tree, is the folder or directory called "directories" in the above figure. One level below are the subfolders or subdirectories "Dogs", "e-texts", "images", and "Monkeys". Each of these subdirectories can contain other sub-subdirectories, etc. Often the names of directories and their subdirectories are chosen in a logical fashion. For example, the electronic text versions of the works of Edgar Allan Poe are stored using the directory path directories/e-texts/Poe,EdgarAllan.

The above figure does not show the actual files that are stored in the various directories. A more explicit graph of the directory "directories" and its subdirectories (highlighted in yellow) that shows all files (the entries without highlighting) is shown in the next figure.

Tree representation of the hierarchical structure in 'directories'

The representation in this figure is called a tree representation of the root directory "directories" because the blue lines that lead from directories to their subdirectories resemble the branches of a tree. There are no further branches emanating from the actual stored files, and thus the filenames are called the leaves of the tree. Note that any (sub)directory may contain files as well as further (sub)directories, e.g., "gutindex.all" is a file in the subdirectory "e-texts", whereas "Maugham,W.Somerset" is another subdirectory.
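
The tree idea can be sketched in a few lines of Python. The nested dictionary below contains only the directory and file names mentioned in the text (the real test directory holds more entries), and the helper names show and leaves are invented for this illustration. Directories are modeled as dictionaries and files as None.

```python
# Directories are dicts, files are None. Only names mentioned in the text
# are listed here; the actual test directory contains more entries.
TREE = {
    "directories": {
        "Dogs": {},
        "e-texts": {
            "gutindex.all": None,
            "Maugham,W.Somerset": {},
            "Poe,EdgarAllan": {},
        },
        "images": {},
        "Monkeys": {},
    }
}

def show(tree, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for name, child in tree.items():
        is_dir = child is not None
        print("  " * depth + name + ("/" if is_dir else ""))
        if is_dir:
            show(child, depth + 1)

def leaves(tree, prefix=""):
    """Yield the full path of every file (leaf) in the tree."""
    for name, child in tree.items():
        path = prefix + "/" + name if prefix else name
        if child is None:
            yield path
        else:
            yield from leaves(child, path)

show(TREE)
print(list(leaves(TREE)))
```

In this sketch only files count as leaves, matching the usage in the text. Directories that happen to be empty in the dictionary are not reported.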


URIs and URLs

URI stands for uniform resource identifier, and URL stands for uniform resource locator. A URI is characterized by the following definitions:

Characterization of Uniform Resource Identifiers (URIs)
Uniform Uniformity allows different types of resource identifiers (e.g., for text and images) to be used in the same context, even if the mechanisms to access these sources differ. But it also allows the identifiers to be reused in many different contexts, so that new applications or protocols can take advantage of a pre-existing and widely-used set of resource identifiers.
Resource A resource is anything that has an identity, such as an electronic document, an image, a service, etc. Not all resources are network retrievable, e.g., a plumber who stops a leak is also considered a resource.
Identifier An identifier is an object that acts as a reference to a resource. In the case of URIs, the object is a sequence of characters with a restricted syntax, e.g., http://www.w3.org/TR/html401/about.html.

Comment: The concept of a URI is more general than the concept of a URL because a URI can describe any resource (e.g., a book that was written 1000 years ago and subsequently lost in a fire), whereas a URL must also specify a location where the resource can be retrieved from. In practice the terms URI and URL are often used interchangeably.

To ensure that every user in a network executes a given protocol according to the same rules and every address is specified and used in a uniform manner, standards that everyone in the network agrees upon are necessary.

Definition: A standard is a documented agreement containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.

For the Internet, all information necessary to achieve standardization is published in the form of RFCs (Requests for Comments), which can be accessed on-line at http://www.rfc-editor.org.

The generic syntax of a URI, as defined in RFC 3986, consists of five main components as follows:

Generic Syntax of a Uniform Resource Identifier (URI)
<scheme>://<authority>/<path>?<query>#<fragment>

Here is an example of a URI (or URL) for the WWW and its interpretation:

Example: http://search.yahoo.com/search?p=car+loan
<scheme> http (HTTP protocol)
<authority> search.yahoo.com (fully qualified host name)
<path> search (name of file on host)
<query> p=car+loan (query passed along to "search")
<fragment> (omitted)

Here is another example of a URI (or URL) for the WWW and its interpretation:

Example: http://www.schaik.com/pngsuite/pngsuite.html#palette
<scheme> http (HTTP protocol)
<authority> www.schaik.com (fully qualified host name)
<path> pngsuite/pngsuite.html (directory/name of HTML file on host)
<query> (omitted)
<fragment> palette (specific position within document)
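
For readers who want to experiment, the two examples above can be taken apart with the urlsplit function from Python's standard urllib.parse module. This is only a sketch: the helper name components is made up here, and urlsplit reports the path with its leading slash (for instance /search), which differs slightly from the way the tables above list it.

```python
from urllib.parse import urlsplit

def components(url):
    """Split a URL into the five generic URI components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # urllib calls the authority "netloc"
        "path": parts.path,          # includes the leading "/"
        "query": parts.query,
        "fragment": parts.fragment,
    }

for url in ("http://search.yahoo.com/search?p=car+loan",
            "http://www.schaik.com/pngsuite/pngsuite.html#palette"):
    print(url)
    for name, value in components(url).items():
        print("  <" + name + ">", value if value else "(omitted)")
```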

More generally, if the URI specifies the location of a retrievable resource on a network, then the five components have the following meanings (the specification in RFC 3986 is actually more complex and general, but this generality is not currently needed in the context of the WWW):

Meaning of URI Components for a URL
<scheme> Specifies the protocol that is used to retrieve the resource. Examples are http (HyperText Transfer Protocol) which is used to access Web pages and ftp (File Transfer Protocol) which is used to exchange files between two computers. In general <scheme> strings are NOT CASE SENSITIVE.
<authority> Specifies the host address in the form of a fully qualified domain name (see below) of a network host computer, such as search.yahoo.com, or in the form of a numeric IP (Internet Protocol) address, such as 66.218.71.200 (dotted decimal address). In general <authority> strings are NOT CASE SENSITIVE.
<path> Specifies the path to the file (including directories and subdirectories) on the host (or server) that contains data in a "language" specific to an application, e.g., in HTML for the WWW. The "language" is usually (implicitly) specified by the filename extension (see above). In general <path> strings ARE CASE SENSITIVE and depend on the conventions of the host where the resource resides.
<query> The query component is a string of information, e.g., the specification of search terms for a database, to be interpreted by the resource. In general <query> strings ARE CASE SENSITIVE and depend on the conventions used by the application which processes them.
<fragment> The fragment component specifies a secondary resource within a primary resource, e.g., a subsection in an HTML document. It is up to the primary resource to determine how to interpret the fragment. In general <fragment> strings ARE CASE SENSITIVE and depend on the conventions used by the application which processes them.

Not every URI (or URL) will consist of all five parts. The <query> part is usually only needed for searching databases or submitting data from forms embedded in a Web page. If the <path> component is omitted, then the host retrieves a default document, e.g., named index.html. If the <scheme> and <authority> components are omitted in a document, then the URL is taken to be relative to the URL of the document.
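
Relative URLs can be explored with urljoin from Python's urllib.parse module, which resolves a reference against the URL of the document that contains it. The base URL below is the example used earlier. The file names other.html and index.html are hypothetical and serve only to show the mechanics.

```python
from urllib.parse import urljoin

# URL of the document in which the relative references appear.
base = "http://www.schaik.com/pngsuite/pngsuite.html"

# A bare file name is resolved relative to the directory of the base document.
print(urljoin(base, "other.html"))    # http://www.schaik.com/pngsuite/other.html

# A reference starting with "/" is resolved relative to the host.
print(urljoin(base, "/index.html"))   # http://www.schaik.com/index.html

# A fragment alone refers to a position within the base document itself.
print(urljoin(base, "#palette"))      # http://www.schaik.com/pngsuite/pngsuite.html#palette
```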

Upper/Lower Case Letters. The portions of a URL that specify the protocol and the host computer address are case insensitive, i.e., http, HttP, and HTTP all specify the same protocol and www.colorado.edu, WWW.Colorado.EDU, and WwW.cOlOrAdO.eDu all specify the same host computer. But the <path>, <query>, and <fragment> strings are specific to the host computer and programs running on it, and will in most cases be case sensitive. Thus, whereas http://ece.colorado.edu/~mathys/ecen1200 takes you to the WWW site of this class, http://ece.colorado.edu/~mathys/ECEN1200 will give you an error message.
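
The case rules can be sketched as a small Python function that lower-cases only the case-insensitive parts of a URL. The name normalize_case is invented for this example, and the sketch assumes the authority consists of a host name only (no user name or password).

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url):
    """Lower-case scheme and host; leave path, query, and fragment untouched."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

a = normalize_case("HttP://ECE.Colorado.EDU/~mathys/ecen1200")
b = normalize_case("http://ece.colorado.edu/~mathys/ECEN1200")
print(a)
print(b)
print(a == b)   # the paths still differ in case
```

Two URLs that differ only in the case of the scheme or host normalize to the same string, whereas a difference in the path survives, which mirrors the rule stated above.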


Special Characters, "Escaped" Characters in URIs

Special Characters. Certain characters cannot be used in URIs or should be avoided. Here's a table that lists all printable ASCII (American Standard Code for Information Interchange) characters and their classification for URIs:

Special Characters in URIs
reserved (special function) ; / ? : @ & = + $ ,
unreserved (no restrictions) alpha a ... z, A ... Z and numeric 0 ... 9 and - _ . ! ~ * ' ( )
excluded (must be escaped) < > # % " SP, where SP stands for space.
unwise (better to escape) { } | \ ^ [ ] `

Non-printable characters, such as tabs, carriage returns, and line feeds, are not allowed in URIs. If a reserved, excluded or unwise character needs to be included as part of a URI, then it must be "escaped" according to the table below. The % sign is the escape character, and the 2-digit number after it is the hexadecimal representation of the ASCII code of the character.

"Escaped" Representation of Special Characters in URIs
SP %20 / %2F [ %5B
" %22 : %3A \ %5C
# %23 ; %3B ] %5D
$ %24 < %3C ^ %5E
% %25 = %3D ` %60
& %26 > %3E { %7B
+ %2B ? %3F | %7C
, %2C @ %40 } %7D
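
The escape rule (a % sign followed by the two-digit hexadecimal ASCII code) can be tried out in Python. The function escape_char below is a hand-written illustration of the rule, while quote and unquote are the standard library routines in urllib.parse. Note that quote with safe="" escapes every character except letters, digits, and _ . - ~, which is somewhat more than the table above requires. The sample strings are made up.

```python
from urllib.parse import quote, unquote

def escape_char(ch):
    """Return the escaped form of a single ASCII character, e.g. ' ' -> '%20'."""
    return "%{:02X}".format(ord(ch))

# Reproduce a few entries of the table above.
for ch in ' "#%{}':
    print(repr(ch), escape_char(ch))

# Escape and unescape whole strings with the standard library.
escaped = quote("car loan & rates", safe="")
print(escaped)
print(unquote(escaped))
```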

Host Names

A host name, such as ece.colorado.edu consists of fields separated by dots. The fields correspond to different levels in the hierarchical domain name system (DNS), with the highest hierarchy level (top level) at the end ("edu") and the lowest hierarchy level ("ece") at the beginning. Another example is shown in the following table.

Example: galileo.jpl.nasa.gov
gov Top level domain, administered globally
nasa Second level domain (NASA: National Aeronautics and Space Administration), administered by gov.
jpl Third level domain (JPL: Jet Propulsion Laboratory), administered by nasa.gov
galileo Fourth (and in this case bottom) level domain (Galileo project), administered by jpl.nasa.gov
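
The hierarchy in a host name can be made explicit with a short Python sketch that splits the name at the dots and lists the fields from the top level down. The function name domain_levels is made up for this illustration.

```python
def domain_levels(host):
    """Return the fields of a host name, ordered from the top level down."""
    return list(reversed(host.split(".")))

for level, field in enumerate(domain_levels("galileo.jpl.nasa.gov"), start=1):
    print("level", level, field)
```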

Any host on the Internet has a unique numerical address associated with its name. Under IPv4 (Internet Protocol version 4), these numerical addresses have a fixed length of 32 bits and are usually expressed in dotted decimal form, by converting 8 bits at a time to a decimal number in the range 0..255. For example, the dotted decimal address of the host galileo.jpl.nasa.gov is 137.78.160.55 and the URLs http://www.tonic.to/faq.htm and http://206.14.214.154/faq.htm refer to the same resource (assuming that tonic.to does not change the assignment between the numeric IP address and the domain name).
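
The conversion between a 32-bit address and its dotted decimal form can be sketched in Python as follows. Both function names are invented for this example, and the code assumes well-formed input.

```python
def to_dotted_decimal(address):
    """Convert a 32-bit integer to dotted decimal form, 8 bits at a time."""
    fields = [(address >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(field) for field in fields)

def from_dotted_decimal(text):
    """Convert a dotted decimal address back to a 32-bit integer."""
    value = 0
    for field in text.split("."):
        value = (value << 8) | int(field)
    return value

number = from_dotted_decimal("137.78.160.55")
print(format(number, "032b"))       # the same address written as 32 bits
print(to_dotted_decimal(number))
```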

How about host addresses like 1.2.3.com or 1.2.3.4? Are these numeric IP addresses or domain names? To answer this question you look at the last field of the host address. If it is a number, then the address is a numeric IP address, otherwise it is a domain name.

Under IPv4, a valid numerical IP address has exactly 4 fields with decimal numbers in the range 0..255 in each. Valid domain names can in principle contain any number of fields (but more than 4 or 5 is not very practical), and the last field must be an official top level domain (TLD, see below).
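
The rule of looking at the last field can be written down as a small Python function. The name classify_host is made up, and the sketch deliberately does not check whether the last field of a domain name is an official TLD, since that requires the lists given in the next section.

```python
def classify_host(host):
    """Classify a host address by looking at its last field."""
    fields = host.split(".")
    if fields[-1].isascii() and fields[-1].isdigit():
        # Numeric last field: must be exactly 4 decimal fields in 0..255.
        valid = len(fields) == 4 and all(
            f.isascii() and f.isdigit() and 0 <= int(f) <= 255 for f in fields
        )
        return "valid IPv4 address" if valid else "invalid IPv4 address"
    return "domain name"

for host in ("1.2.3.com", "1.2.3.4", "1.2.3.256"):
    print(host, "->", classify_host(host))
```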


Top Level Domains (TLDs)

In any kind of network, the address of each user must be distinct from the addresses of all other users. In a small network it is possible to just maintain a single table or database which keeps track of all addresses that are used. In a large network it is better to use a more distributed approach and to subdivide the whole address space into non-overlapping sets, which can then be either further subdivided, or administered by individual decentralized databases. The Internet uses a hierarchical decentralized system, called the domain name system (DNS), to ensure unique address allocation on a world-wide basis. At the top of the hierarchy are the top level domains (TLD), which can be either generic (or global) TLDs (gTLD) such as edu or country code TLDs (ccTLD) such as us, as shown in the following tables.

Original generic Top Level Domains (gTLD)
arpa "Address and routing parameter area". Used for reverse address lookup (numeric IP to domain name).
com This domain is intended for commercial entities or companies.
edu This is the domain for 4-year colleges and universities.
gov Reserved for agencies of the US Federal government.
mil This domain is used by the US military.
net Originally intended to hold only the computers of network providers and the network node computers. Now available to the general public.
org Originally intended for any kind of organization that doesn't fit anywhere else. Now available to the general public.


Additional generic Top Level Domains (gTLD)
aero Intended for the air transport industry.
biz Intended for business domain names.
cat Domain names for Catalan language/culture.
coop Domain names for cooperatives.
info Is a general TLD with no restrictions that can be used by anyone.
int For international organizations established by treaty.
jobs Domain names for employment-related sites.
mobi Domain names for sites catering to mobile devices.
museum Domain names for museums.
name Intended for personal Web sites in your name to post hobbies, interests, pictures, etc.
pro Intended for use by certified professionals worldwide.
travel Domain names for travel agents, airlines, hoteliers, etc.


Country code Top Level Domains (ccTLD)
xx xx stands for any 2-letter country code as specified in the ISO 3166-1 document, where ISO stands for International Organization for Standardization.


Examples of Country code Top Level Domains (ccTLD)
af Afghanistan
at Austria
au Australia
be Belgium
ch Switzerland
cn China
de Germany
fr France
in India
jp Japan
mx Mexico
ru Russian Federation
sa Saudi Arabia
uk United Kingdom
us United States

Your Task

To obtain credit for this homework/computer lab you need to answer the questions stated below. Send your solution to no-spam e-mail, making sure it conforms to the following rules.

Rules for the Submission of Homework/Computer Lab Solutions

Format: E-mail your solution as a plain text (ASCII) file. Do not use word processor files like Microsoft Word or "rich-text" HTML files.

Corrections: If you need to make corrections after you submitted your solution, resubmit all your answers (not only the ones that changed) since only your last submission of each homework/computer lab will be graded.

Teamwork: Teamwork is fine for the homework/computer labs, but the solutions must be turned in individually. In particular, copy and paste of entire solutions from other students is not acceptable.

Questions:

  1. URL Syntax. Use the pieces given below to make URLs and access the corresponding Web sites. State the complete URLs that you used and briefly describe the Web documents to which you were taken. State also the (computer as opposed to natural) "language" in which each was written and, if possible (based on the 2-letter country codes), the physical location of the host from which the documents came.

    URL #  <scheme>  <authority>           <path>                    <query>      <fragment>
    1      http      www.pocketmovies.net  pop.html                  -            -
    2      ftp       128.138.189.30        pub/ecen1200/mm/bike.mpg  -            -
    3      http      www.aliceadsl.fr      meteo                     ville=07156  -
    4      http      www.w3.org            TR/html4/types.html       -            h-6.5
    (A "-" marks an empty field.)


  2. RFCs. Use the search feature at http://www.rfc-editor.org to find the newest RFC that describes the generic syntax of URIs (the title of the document is "Uniform Resource Identifier (URI): Generic Syntax"). What is the number of this RFC and when was it issued?

  3. URL Rules. For each of the following URLs, determine whether it is "legal" or not in terms of the rules that govern the construction of URLs (note that the actual Web sites may not exist, even if the URL is "legal")
    hTtP://as.seen.on.tv/miracle_gro.asp?cash-in
    http://show.biz/"oscar"awards.html
    http://just.do.it/sweatshops.cfm?locations
    http://flowers.4.u/order.htm#name
    If a URL is illegal, point out why it is illegal.

  4. Directories and Files. The test directory "directories" and all its subdirectories and files that were used as an example in the section Directories, Subdirectories, and Files have been installed on the server for this class and can be accessed at http://ece.colorado.edu/~mathys/ecen1200/directories. By modifying the <path> directly in the address window of the browser, open several files until you become familiar with the structure of "directories".
    1. The different sentences that can be made up by using the <path>

      ~mathys/ecen1200/directories/{Dogs or Monkeys}/{hate or like}/{bananas or bones}.html

      are contained in the bananas.html and bones.html files which constitute the leaves of the Dogs and Monkeys directory subtrees. Two of the files got misplaced in the process of uploading to the server. Give the complete <paths> of these two files.
    2. What is the title of the Edgar Allan Poe work in the file story2.txt?
    3. About which people is the image in the file df950523.jpg?


  5. Standards.
    1. Give three examples of products for which international standards exist.
    2. Give three examples of products for which international standards do not exist.