==============================================================================
C-Scene Issue #2
CGI in C a starters tutorial
Name: Brent York
Handle: The Dragon	(ThaDragon on IRC)
Email: york@nbnet.nb.ca
Affils: Coder for Nuclear Winter Entertainment
	Coder for Spheringer technologies.
	Coder for Henry and York Development    "Development with thought."
        Editor of Cscene Magazine
	Organizer of the OS Developers Information Network (ODIN).

Desc:	CGI in C a starters tutorial
==============================================================================


	Many web developers haven't really experienced coding CGI, and if they
have, it's usually in a language such as Perl. The nice thing about CGI is
that its interface is common (hence Common Gateway Interface), and therefore
it's a standard across all languages.

	CGI requires that you use a language that writes to STDOUT, which means
that pretty much any language can be used for writing CGI. For fast
execution, though, C is IDEAL for CGI, and fast execution is usually a must
for databases, small or large, and other types of CGI.

	In this article I will give you a background on CGI, apply that 
background to C, and then walk you through developing a CGI application. 
The CGI application we will develop will take 2 forms, a text based counter,
and a program that shows you how to get input from the HTTPD, so you can use it
in your program.

	So without further ado, let's get on with the article.
=============================================================================
The common gateway interface (CGI)
==================================

	The common gateway interface, or CGI as people know it, is a 
standard put forth by NCSA for allowing more interaction with webpages. 
All programs are run on the server side, unlike Java, and pretty much any
language can be used for developing CGI.

	The CGI interface for output merely consists of a Content-type
line that contains a MIME type/subtype description, followed by two
newlines, which tells the browser what's coming to it. The CGI interface
as it applies to input consists of a set of environment variables which can be
retrieved and used by the CGI application, thereby allowing us to get
information from the client.

	This probably means nothing to you, but as we go on it will.


===============
Output with CGI 
===============

	CGI output is quite simple. It consists of a Content-type line
with a MIME type/subtype that tells the HTTPD what type of data is
coming, so it can pass the data along to the client with the proper
type and the client can use it.
	
	However, on the CGI end of things, the type/subtype you output
has to be adhered to quite strictly. This means that if you say you're
going to send text/plain, you send plain text, just as text/html means
you send HTML.

MIME types/subtypes are as follows:	
text/plain
text/html
image/gif
image/jpeg

There are others, but I won't relay them here as these are the ones we are
concerned with.

There are several headers that are valid to CGI, and they are as follows:

Content-type     - A header used to tell the HTTPD what type of data to 
		   expect so it can parse it and output things properly.
		   Example: "Content-type: text/plain"

Location         - A header used to refer the HTTPD to another location for
		   the proper document, often used in things like Microsoft's 
		   pull down page referral lists etc.
		   Example: 
                   "Location: http://dwi.netc.com/CScene/CS1/CS101.html"

Content-length   - A header used to tell the HTTPD the size, in bytes, of the
		   data being sent to it.
                   Example: "Content-length: 1024"

Expires		 - Used to tell the client that the data should be treated as
		   out of date after a certain weekday, day-month-year at a
		   certain hour:minute:second, given in GMT in 24 hour format.
		   Example: "Expires: Wednesday, 02-May-12 23:59:59 GMT"
		   
Content-encoding - Used to specify the encoding of a document. Valid
		   values for this are x-gzip (.gz), x-compress (.Z) and
		   x-zip (.zip).
		   Example: "Content-encoding: x-gzip"

Note that each header goes on its own line, and the last header you output
is followed by a blank line, which means two newlines in a row (\n\n).
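
To make the header rules a bit more concrete, here is a minimal sketch of two
helpers: one that sends a Location header, and one that formats a time in the
weekday, day-month-year style used in the Expires example above. The function
names write_redirect and format_expires are my own inventions for
illustration, not part of any CGI library.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write a Location header followed by the blank
   line that ends the header block. */
void write_redirect(FILE *out, const char *url) {
  fprintf(out, "Location: %s\n\n", url);
}

/* Hypothetical helper: format t (in GMT) in the style of the Expires
   example, such as "Wednesday, 02-May-12 23:59:59 GMT".
   Returns 1 on success, 0 if the time could not be formatted. */
int format_expires(time_t t, char *buf, size_t size) {
  struct tm *gmt = gmtime(&t);

  if (!gmt)
    return 0;
  return strftime(buf, size, "%A, %d-%b-%y %H:%M:%S GMT", gmt) > 0;
}
```

A CGI would call write_redirect(stdout, ...) as its only output, or print
"Expires: " followed by the formatted buffer and a newline before its last
header.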


	The only header we will be occupied with in this article is
Content-type, with the types text/html and text/plain, which are used for our
counter and which will also be used for our variable display program.

	So, basically, the CGI output consists of a header, two newlines
and your appropriate data, and that's it. Simple eh?

Example:

sayhi.c			/* Compile to sayhi.cgi */
==================================================

#include <stdio.h>		/* printf() */

int main(void) {
  printf("Content-type: text/html\n\n");
  printf("Hi from the CGI!\n");
  return 0;
}


This simply sends "Hi from the CGI!" to the browser as the HTML page. You
could also add text manipulation tags; in fact you could output your entire
page from the CGI, however it would be a waste of time to do so.

	So continuing on, you've now got a small grip on CGI and how it
works, so let's describe the input.


============================================================
Input with the environment with the common gateway interface
============================================================

	Input with CGI usually (but not always) requires environment 
variables, which can be gotten with a call to getenv() for each 
environment variable you want. 
 
	There are quite a few variables to choose from, each giving you 
valuable information that you might be able to use to your advantage.
We will cover each of them here, and possibly some you might not find in 
NCSA's own documentation (woo!). This is mainly because we will be using 
them all in our program to display them.

	I won't go into how getenv() works; you can check your helpfile,
C-Book or manpage for that. I will however list the variables that you
can access:

SERVER_SOFTWARE   - This obviously holds the software name and version of the
		    server you are running on. For example "NCSA 1.0"

SERVER_NAME	  - This holds the server's hostname, DNS alias, or IP address,
                    as it would appear in self referencing URLs.

GATEWAY_INTERFACE - This holds the revision of the CGI specification to which
		    this server complies and understands. Format is
		    CGI/revision.

SERVER_PROTOCOL   - This holds the name and revision of the information 
		    protocol this request came in with. 
                    Format is protocol/revision

SERVER_PORT	  - The port the server listens on for connections, usually
		    80, but it's best to check this if your CGI relies on it
		    because it doesn't have to be.

REQUEST_METHOD    - The method with which the request was made, for the HTTP
		    protocol, this is "GET", "HEAD", or "POST".

PATH_INFO	  - Scripts can be accessed by their virtual pathname,
		    followed by extra information at the end of this path.
                    The extra information is sent as PATH_INFO. This
                    information should be decoded by the server if it comes
		    from a URL before it is passed to the CGI script.

PATH_TRANSLATED	  - The server provides a translated version of PATH_INFO,
                    which takes the path and does any virtual to physical
                    mapping to it. It is then stored in this environment
                    variable.

SCRIPT_NAME	  - This is a virtual path to any script being executed, used
                    for self referencing URLs.

QUERY_STRING	  - Any information following a ? in the URL which
                    referred to this script. It should not be decoded in any
                    fashion when it gets to you, which means of course you'll
                    have to decode it. This is *GREAT* for search engines ;}.
 
REMOTE_HOST	  - This holds the hostname of the remote host, which is
                    the host of the person calling the script.
                    If the server doesn't have the information this is not
		    set, and REMOTE_ADDR is set instead with its IP.

REMOTE_ADDR	  - The IP address of the remote host making the request.

AUTH_TYPE	  - If the server supports authentication, and the script
		    is protected, this is the protocol specific method
                    used to validate the user.

REMOTE_USER	  - If the server supports authentication, and the
                    script is protected, this is the username they have
                    authenticated as.

REMOTE_IDENT      - If the server supports the RFC 931 identification
                    protocol, then this variable will be set to the name of
                    the user that it retrieved from the remote host.
                    Usage of this variable should be limited to logging only
                    and is not suggested for authentication purposes, as
                    identification can be faked easily.

CONTENT_TYPE	  - For queries which have attached information, such as HTTP
                    POST and PUT, this is the content type of the
                    data. For form submissions it is usually
                    application/x-www-form-urlencoded.

CONTENT_LENGTH	  - For queries which have attached information, such as HTTP
                    POST and PUT, this is the content length of the
                    data, in bytes.

HTTP_ACCEPT       - The MIME types which the client will accept, as given by
                    the HTTP headers. Each item in this list is separated by
                    commas.

HTTP_USER_AGENT   - The browser the client is using to send the request.
	            General format is software/version library/version, but
                    it can pretty much be anything.

HTTP_REFERER      - The URL of the document that referred you to the script.
		    This of course will be nothing if you happen to just
                    access the script instead of accessing it from an HTML
                    document.

All of these variables can be retrieved with getenv(), which hands back the
information given in each description, or NULL if the server did not set the
variable.
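
Since getenv() can hand you NULL, it pays to wrap it. The following is a
small sketch, assuming you are happy with a default string when a variable is
missing; the names env_or and server_port are hypothetical helpers of my own,
not standard functions.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of the environment variable
   `name`, or `fallback` when the server did not set it (getenv()
   returns NULL in that case). */
const char *env_or(const char *name, const char *fallback) {
  const char *value = getenv(name);

  return value ? value : fallback;
}

/* Hypothetical helper: the port from SERVER_PORT as a number, assuming
   80 when the variable is missing. atoi() gives 0 if the value is not
   a number at all. */
int server_port(void) {
  return atoi(env_or("SERVER_PORT", "80"));
}
```

With this, something like printf("%s\n", env_or("REMOTE_HOST", "(not set)"))
is safe even when the server left REMOTE_HOST out.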

Above and beyond that, for input HTTPD has its own encoding, which you
have to handle yourself. It's pretty simple to handle and only has a few
quirks. Basically you don't need to know any of this until you cover forms
and CGI, which will be my next installment of this document in the next
CScene. The reason I leave it till the next CScene is that I don't have
the time to cover forms in this document; there is a HUGE plethora of
information that involves forms and it's a tutorial in itself =}.
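
As a small taste of that encoding: in URL-encoded data a '+' stands for a
space and %XX stands for the character with hexadecimal code XX. Below is a
rough sketch of a decoder, assuming an ASCII system and a string you are
allowed to modify in place. The names hex_value and url_decode are my own for
illustration, and this is by no means the full story on forms.

```c
#include <ctype.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one.
   Assumes an ASCII style character set where 'a'..'f' are contiguous. */
static int hex_value(int c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Hypothetical helper: decode a URL-encoded string in place.
   '+' becomes a space, %XX becomes the byte with hex code XX, and a
   '%' that is not followed by two hex digits is copied unchanged. */
void url_decode(char *s) {
  char *out = s;

  while (*s) {
    if (*s == '+') {
      *out++ = ' ';
      s++;
    } else if (*s == '%' &&
               hex_value((unsigned char)s[1]) >= 0 &&
               hex_value((unsigned char)s[2]) >= 0) {
      *out++ = (char)(hex_value((unsigned char)s[1]) * 16 +
                      hex_value((unsigned char)s[2]));
      s += 3;
    } else {
      *out++ = *s++;
    }
  }
  *out = '\0';
}
```

The decoded string is never longer than the encoded one, which is why
decoding in place is safe here.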



===========================================================
C programming as it applies to the common gateway interface
===========================================================

	C programs are EXCELLENT for CGI because C is a fast compiled
language, and a C program usually doesn't take up as much RAM as Perl or
other interpreted languages. Above and beyond that, C programs allow for a
bit of security in that they are compiled, so someone who manages to fetch
your CGI program (which is possible) gets a binary rather than your source.

	So as it applies to CGI, C is an excellent way to go. C is
completely capable of handling CGI output and input. It's sometimes
harder in C to handle input, but it's never a complete and total
disaster.

	C is also perfect because it can open binary files and print the 
data from them to stdout, which is EXTREMELY useful when making things 
that involve picture based counters.
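
To give an idea of what that looks like, here is a minimal sketch that copies
a binary file to an output stream behind an image/gif header. The names
copy_stream and send_gif are hypothetical, and so is any file name you pass
in. On systems that translate newlines on text streams you would also have to
switch stdout to binary mode, which this sketch does not do.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out` in blocks.
   Returns the number of bytes copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out) {
  unsigned char buf[4096];
  size_t n;
  long total = 0;

  while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
    if (fwrite(buf, 1, n, out) != n)
      return -1;
    total += (long)n;
  }
  return ferror(in) ? -1 : total;
}

/* Hypothetical helper: send the GIF file at `path` to `out`, preceded
   by its Content-type header. Returns 0 on success, -1 on failure.
   Nothing is written if the file cannot be opened. */
int send_gif(const char *path, FILE *out) {
  FILE *in = fopen(path, "rb");	/* "rb": open in binary mode */
  long copied;

  if (!in)
    return -1;
  fprintf(out, "Content-type: image/gif\n\n");
  copied = copy_stream(in, out);
  fclose(in);
  return copied < 0 ? -1 : 0;
}
```

A picture based counter would call something like send_gif("digit.gif",
stdout), where digit.gif is a made up file name.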


==============================================
Developing A text based C/CGI counter program.
==============================================

	In developing a text based CGI counter program we will encounter
a few quirks of CGI, and therefore it's a great way to start programming in it.

	The first quirk is that with text you can't have its output on the
end of the page without SSI (Server Side Includes). This is fine, however,
as we will write it and put it in an A HREF and access it as a link, so
you can see it work.

	So basically the sequence we want is to print the header out to STDOUT
with printf, followed by two newline characters. We are printing out text
so we will go with text/plain so there's no translation by the server.

	Other things we need are file opening, input, and output, as well
as a way to increment the counter. This should be left to you as you
should know C and/or C++ well enough before you ever tackle CGI. Protocol
interfacing is *NEVER* a task for a newbie.

So without further ado, tcount.c

----------->8 Cut 8<----------tcount.c ------------>8 Cut 8<------------ 
#include <stdio.h>		/* Standard IO routines */

int main(void) {
  FILE *data_ptr;
  int count = 0;

  /* Print the header */
  printf("Content-type: text/plain\n\n");

  /* Open the data file; if you can't, say so and exit with errorlevel 1 */
  if (!(data_ptr = fopen("tcount.dat", "r"))) {
    printf("Error opening tcount.dat for reading!\n");
    printf("Error 001: Exiting.\n");
    return 1;
  }

  /* The datafile opened fine, so read it. If no number can be read,
     start counting again from 0. */
  if (fscanf(data_ptr, "%d", &count) != 1)
    count = 0;
  fclose(data_ptr);

  printf("%d\n", ++count);		/* increment the counter and print it */

  /* Open the same file for writing; if you can't, say so and exit with
     errorlevel 2. */
  if (!(data_ptr = fopen("tcount.dat", "w"))) {
    printf("Error opening tcount.dat for writing!\n");
    printf("Error 002: Exiting.\n");
    return 2;
  }

  fprintf(data_ptr, "%d\n", count);	/* write the new access count */
  fclose(data_ptr);			/* and flush it to the datafile */
  return 0;				/* Exit without error */
}



This is about as simple as a counter CGI gets. Compile it with:
gcc -o tcount.cgi tcount.c (or whatever is appropriate for your compiler).

Then comes the fun part of setting it up. chmod the program so it is
executable by everyone, make a datafile called tcount.dat that contains "0"
followed by a newline, and chmod it so it is readable and writable by
everyone. Stick both in your cgi-bin dir (wherever that may be) and use the
following HTML to test it.

<!-- Test for tcount.cgi -->
<Html>
<Body>
  <A Href="/cgi-bin/tcount.cgi">See the counter!</A>
</Body>
</Html>

It should work perfectly =}.

And now for input using CGI, isn't this document wonderful ? ;}

========================================
Input using the Common Gateway Interface
========================================

	Given the above variables, we can get some input from the user
for things like search engines, forms, and a few other things. We won't go
into forms right now, so basically we won't get into the GET or POST way
of things.

	All we will cover is how to get the variables and print them from
a CGI program. But with this you could use the QUERY_STRING variable with
a graphical counter CGI to implement things like number sets etc...
This will be an exercise for you. I won't cover using the QUERY_STRING
variable with the counter, but I will cover it with the CGI program I'm
going to write.

Basically we will be outputting a Content-type: text/plain header and
then outputting the names of the environment variables and what they contain.
We will however use the QUERY_STRING in our HTML so you can see what's
going on with it =}.

So basically here's our program:

-------->8 Cut 8<------- showvars.c ------->8 Cut 8<-------
#include <stdio.h>		/* printf() */
#include <stdlib.h>		/* getenv() */

const char *evars[] = {"SERVER_SOFTWARE", "SERVER_NAME", "SERVER_PROTOCOL",
                       "SERVER_PORT",
                       "GATEWAY_INTERFACE", "REQUEST_METHOD",
                       "PATH_INFO", "PATH_TRANSLATED", "SCRIPT_NAME",
                       "QUERY_STRING",
                       "REMOTE_HOST", "REMOTE_ADDR", "REMOTE_USER",
                       "REMOTE_IDENT",
                       "AUTH_TYPE", "CONTENT_TYPE", "CONTENT_LENGTH",
                       "HTTP_ACCEPT", "HTTP_USER_AGENT", "HTTP_REFERER"};


int main(void) {
  const int numvars = sizeof evars / sizeof evars[0];
  int i;

  printf("Content-type: text/plain\n\n");
  for (i = 0; i < numvars; i++) {
    /* getenv() returns NULL for variables the server did not set,
       and NULL must not be handed to %s */
    const char *value = getenv(evars[i]);
    printf("%s = %s\n", evars[i], value ? value : "(not set)");
  }
  return 0;
}
 


Note that the things you did for the text counter MUST be done for this
as well. That means make it world executable. There are no data files here,
so there is nothing else to chmod; if there were any, you would have to.

The HTML for the test for this is:
<Html>
<Body>
<A Href="/cgi-bin/showvars.cgi?Testing_Testing_1_2_3">See the environment vars!</a> 
</Body>
</Html>

Note the ?Testing_Testing_1_2_3... This will appear in QUERY_STRING... I 
think you can see the possibilities ;}.
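
To hint at one of those possibilities, here is a rough sketch that pulls a
single value out of a QUERY_STRING written in the common key=value&key=value
style. That style is an assumption on my part (the test link above does not
use it), the name query_value is hypothetical, and the value is copied as is,
without any decoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for "key=value" among the '&' separated
   pairs in `query` and copy the value into `out` (at most outsz - 1
   characters, always terminated). Returns 1 if the key was found,
   0 otherwise. The value is NOT decoded. */
int query_value(const char *query, const char *key, char *out, size_t outsz) {
  size_t klen = strlen(key);
  const char *p = query;

  if (outsz == 0)
    return 0;
  while (p && *p) {
    const char *end = strchr(p, '&');	/* end of this pair, if any */
    size_t plen = end ? (size_t)(end - p) : strlen(p);

    if (plen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
      size_t vlen = plen - klen - 1;

      if (vlen >= outsz)
        vlen = outsz - 1;		/* truncate to fit the buffer */
      memcpy(out, p + klen + 1, vlen);
      out[vlen] = '\0';
      return 1;
    }
    p = end ? end + 1 : NULL;		/* move on to the next pair */
  }
  return 0;
}
```

A counter called as tcount.cgi?set=odometer could then ask for "set" to
decide which set of numbers to use, after checking that getenv() did not
return NULL for QUERY_STRING.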



=======
Summary
=======

	In this document you have learned how to get basic input from the
user and output many things from your CGI to the HTTPD and, in turn, to
the remote host accessing the document.

	This is enough information to begin writing useful CGI. However,
I'm sure you would like to know more... Therefore, what I plan on doing is
another article covering forms and CGI in the next issue of CScene.
Finally, in a third installment two CScenes from now, I shall finish off
the entire CGI tutorial set with CGI and graphics.

Be on the lookout for them, as they supplement this article and should give 
you almost everything you ever needed to know to write good CGI. 

	Thank you for listening to my rants and raves.

You may contact me at york@nbnet.nb.ca if you have any questions or 
concerns about this document.

The Dragon
Brent York

You can download a zipfile of all the source in this document here.

C Scene Official Web Site : http://cscene.oftheinter.net
C Scene Official Email : cscene@mindless.com
This page is Copyright © 1997 By C Scene. All Rights Reserved