Homework 3

  1. Text files are structured. They are sequences of characters arranged in things called Lines. Lines look like lines when you display the text file.

    The lines are marked in the text file by special characters. Depending on the operating system, different characters are used. Because unix and windows use different standards, sometimes this causes pain and suffering. Unix marks the end of a line with a single end-of-line character called newline and denoted \n, backslash n. It is in fact character code 10 in ascii (see man ascii), and might sometimes show up as ^J, J being the tenth letter of the alphabet and ^ marking a control character (^@ is code 0, ^A is code 1, and so on).

    Windows inherits from DOS the convention that lines are separated by the two-character sequence carriage-return new-line, because in the olden times a new line was these two things: moving the carriage back to column 1 and advancing the page by one new line. Carriage-return is denoted \r, backslash r, and is character code 13 in ascii. Since M is the thirteenth letter of the alphabet, it might show up looking like ^M.

    If you transfer text from the unix world to the microsoft world you have to compensate for this change in convention. Programs have to identify the thing you are transferring as text, find out the conventions on both platforms, and do the transformation. When this doesn't work, you get problems.

    Typically what happens is the microsoft text file ends up copied literally onto a unix machine, and as far as the unix machine is concerned every line ends with a carriage-return (just before the new-line), and so you get ^M at the end of each line.

    Write a sed program to fix this problem. It is helpful to assume that what you are doing is simply squashing the last character of each line.

    Test it on this file. Find this file on lee, in directory ~csc322/bin. You might need the program hexdump -c to see the carriage returns. The vi on lee appears to have default options to treat the \r\n as an \n.
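
    As a starting point, here is a minimal sketch of the kind of sed command involved. The function name strip_cr is made up for illustration. It removes a carriage return only when one is actually the last character of the line, which is a little more cautious than squashing the last character unconditionally with s/.$//.

```shell
#!/bin/sh
# strip_cr [FILE...]: copy input to output, dropping one trailing
# carriage return (ascii 13) from each line that has one.
# strip_cr is a hypothetical name, not part of the assignment.
strip_cr() {
    # Build a literal CR with printf so the sed pattern does not
    # depend on sed understanding the \r escape.
    cr=$(printf '\r')
    sed "s/${cr}\$//" "$@"
}
```

    Piping the result through hexdump -c (or od -c) before and after shows whether the \r characters are gone.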

  2. I have a problem. I have several mail boxes that I sort my stuff into. When I run mail I have to use the ugly command,
       mail -f mailboxname
    
    with mailboxname a path to the mailbox. I'd rather have a script so that I can type,
       go-mail name
    
    Please write me this script.

    Use the case statement; see /etc/rc.d/init.d/httpd for the syntax (bottom of the file). Each name is a case entry, and the command expands out the full name. Here are some name-to-mailbox pairs to implement:

       go-mail zzz   -> mail -f ~/mail/urgent-mail
       go-mail junk  -> mail -f ~/mail/csc322-complaints
       go-mail !     -> mail -f ~/mail/rec.games
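
    The case syntax can be sketched as follows, assuming a Bourne-style shell. The function names mailbox_for and go_mail are hypothetical, and the three entries are just the pairs listed above.

```shell
#!/bin/sh
# mailbox_for NAME: print the mailbox path for a short name,
# or fail if the name is unknown. (Hypothetical helper.)
mailbox_for() {
    case "$1" in
        zzz)  echo "$HOME/mail/urgent-mail" ;;
        junk) echo "$HOME/mail/csc322-complaints" ;;
        '!')  echo "$HOME/mail/rec.games" ;;   # quoted to keep it a plain character
        *)    return 1 ;;
    esac
}

# go_mail NAME: expand the short name and run mail on that mailbox.
go_mail() {
    box=$(mailbox_for "$1") || {
        echo "usage: go-mail {zzz|junk|!}" >&2
        return 1
    }
    mail -f "$box"
}
```

    Saved as a file named go-mail with a final line reading go_mail "$1", this would behave like the command requested above.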
    
  3. It's time to talk about submitting homework. So far, pretty good! But what about the easy way? Like zip and attachments and point and click it just works? tar, uuencode, ... , no wonder why everyone hates unix!

    The point is, the easy way is this way, with a bit of sugar around. Well it is really great sugar, but why don't we get a bit curious and find out how it works? And in the meanwhile, you can learn to fix the problems you have with submissions.

    Where this knowledge might be useful in the future is if you have to write something to automatically handle mail. People will send your programs the stuff that (sometimes) you send me, and you might just have to deal with it. Besides it is amazing how a little bit of organization on the technical details can cause a social revolution.

    Those organizing details are called MIME, invented by Borenstein, then at Bellcore (New Jersey, the homeland of Unix). Here are two example emails csc322 has received:

    Look at the header of the first email. Find the line:

     Content-Type: multipart/mixed; boundary=Apple-Mail-7-898105972
    
    See how the mail is broken by lines with this boundary value embedded. After the boundary is a subheader, which is broken from the body by a blank line, just like all email. Find the line
      Content-Disposition: attachment;
    	filename=hw1.tar.uu
    
    Get the picture?
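
    To make the boundary idea concrete, here is a rough sketch that pulls the boundary value out of a saved message with sed. The function name get_boundary is hypothetical, and the pattern assumes the parameter is spelled boundary= in lower case, with or without double quotes around the value.

```shell
#!/bin/sh
# get_boundary FILE: print the value of the first boundary= parameter
# found in FILE, without any surrounding double quotes.
# (Hypothetical helper; assumes lower-case "boundary=".)
get_boundary() {
    # \{0,1\} makes the opening quote optional; the capture stops
    # at the next quote or semicolon.
    sed -n 's/.*boundary="\{0,1\}\([^";]*\).*/\1/p' "$1" | head -n 1
}
```

    In the body of the message, each piece is then introduced by a line consisting of two dashes followed by this value.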

    Now look at the second email. I uploaded the tar.uu to gmail as an attachment. Find the lines:

      ------=_Part_289_20995838.1108053948914
      Content-Type: application/uue; name="hw1.tar.uu"
      Content-Transfer-Encoding: base64
      Content-Disposition: attachment; filename="hw1.tar.uu"
    
    Can you figure it out? What is base 64 encoding? (Do a google.) Can you decode this portion of the file (see http://makcoder.sourceforge.net/demo/base64.php)?
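
    Decoding can also be tried from the command line. The sketch below assumes the base64 program from GNU coreutils is installed, where -d selects decoding; other systems may spell the option differently or offer a different tool. The name b64_decode is hypothetical.

```shell
#!/bin/sh
# b64_decode: decode base64 text read from standard input.
# Assumes GNU coreutils base64; b64_decode is a hypothetical wrapper.
b64_decode() {
    base64 -d
}
```

    For instance, the five characters of hello encode to aGVsbG8=, and feeding aGVsbG8= to b64_decode gives hello back. For the second email, the decoded attachment would presumably still be uuencoded text, since the file is named hw1.tar.uu.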

    Now that we've talked, here's what you should do:

    To help get you started:

    1. Write the grep that decides whether you have multipart mail; if not, exit.
    2. Write the grep and sed that split the mail into pieces along the boundary.
    3. Then, while you are doing the splitting, pass each piece through the proper filter script.
    4. Note this way of finding out if a string is in a file:
           if ( grep string filename > /dev/null ) then
      
      This works because grep outputs the match but also returns true if there is a match, so it can be used to check for, as well as extract, patterns.
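
    The first two steps can be sketched as below. This is one possible shape under simplifying assumptions, not a finished answer: the names is_multipart and split_parts are hypothetical, the boundary is passed in as an argument, boundary lines are assumed to carry no trailing spaces or carriage returns, and the message is assumed to end with the closing --boundary-- line.

```shell
#!/bin/sh
# is_multipart FILE: succeed if FILE has a multipart Content-Type line.
# Uses grep only for its exit status, as in the note above.
is_multipart() {
    grep -i '^Content-Type:[[:space:]]*multipart/' "$1" > /dev/null
}

# split_parts FILE BOUNDARY PREFIX: write the lines between each pair
# of consecutive boundary lines to PREFIX.1, PREFIX.2, ...
split_parts() {
    file=$1
    boundary=$2
    prefix=$3
    # grep -n gives "lineno:text"; -F -x matches whole lines literally,
    # so only --BOUNDARY and the closing --BOUNDARY-- are found.
    marks=$(grep -n -F -x -e "--$boundary" -e "--$boundary--" "$file" | cut -d: -f1)
    n=0
    start=
    for end in $marks; do
        # Skip the text before the first boundary and any empty piece.
        if [ -n "$start" ] && [ $(($end - $start)) -gt 1 ]; then
            n=$(($n + 1))
            # sed -n 'A,Bp' prints only lines A through B.
            sed -n "$(($start + 1)),$(($end - 1))p" "$file" > "$prefix.$n"
        fi
        start=$end
    done
}
```

    Each output file then begins with its own subheader, a blank line, and the body, which is where a filter chosen from Content-Transfer-Encoding could be applied for step 3.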

    This is open ended!

    You should be able to process the two example files given. You should experiment with mail you generate and take on any additional interesting challenges you feel appropriate.