Search Urban Dictionary with C#

January 22, 2010 by Dave
Filed under: Code Snippets 

Here’s a quick example of how one might scrape the search results from Urban Dictionary.  This shows some nice use of Regular Expressions and WebClient.

Thankfully, Urban Dictionary is a very scrape-friendly site: finding results is as easy as locating two divs and extracting their contents.

UrbanDictionary.Search(string) returns a list of key-value pairs, with the word as the key and the Urban Dictionary definition as the value.
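As a rough illustration of the approach described above, here is a minimal sketch rather than the original snippet. The div class names ("word" and "definition"), the search URL, and the UrbanDictionarySketch class name are all assumptions made for the example; the site's real markup may differ, and a non-greedy regex like this one will not cope with divs nested inside a definition.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class UrbanDictionarySketch
{
    // ASSUMPTION: the class names "word" and "definition" are placeholders,
    // not necessarily the markup the site actually uses.
    private static readonly Regex EntryPattern = new Regex(
        "<div class=\"word\">(?<word>.*?)</div>\\s*" +
        "<div class=\"definition\">(?<def>.*?)</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Removes any nested tags, then decodes HTML entities such as &amp;
    private static string Clean(string html)
    {
        string text = Regex.Replace(html, "<[^>]+>", "");
        return WebUtility.HtmlDecode(text).Trim();
    }

    // Extracts (word, definition) pairs from a page of HTML
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in EntryPattern.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                Clean(m.Groups["word"].Value),
                Clean(m.Groups["def"].Value)));
        }
        return results;
    }

    // Downloads the results page and parses it
    public static List<KeyValuePair<string, string>> Search(string term)
    {
        using (var client = new WebClient())
        {
            // ASSUMPTION: the search URL format is a guess for illustration
            string url = "http://www.urbandictionary.com/define.php?term="
                + Uri.EscapeDataString(term);
            return Parse(client.DownloadString(url));
        }
    }
}
```

Keeping Parse separate from Search means the regex can be tried against a saved copy of the page without touching the network.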

C#: WebClient Usage

May 9, 2008 by Dave · 12 Comments
Filed under: Code Snippets, Quick Tips 

Microsoft has provided a great utility since .NET 1.0 for doing quick I/O via websites.  The WebClient class provides basic functionality for downloading from and uploading to web servers.  The WebClient example on MSDN leaves a lot to be desired, especially for beginners.  I’d like to expand on that, as well as provide some example code.

C#: Using WebClient to fetch a page:

        // create a new instance of WebClient
        WebClient client = new WebClient();
 
        // set the user agent to IE6
        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");
        try
        {
            // actually execute the GET request
            string ret = client.DownloadString("http://www.google.com/");
 
            // ret now contains the contents of the webpage
            // (Math.Min guards against responses shorter than 256 characters)
            Console.WriteLine("First 256 characters of response: " + ret.Substring(0, Math.Min(256, ret.Length)));
        }
        catch (WebException we)
        {
            // WebException.Status holds useful information
            Console.WriteLine(we.Message + "\n" + we.Status.ToString());
        }
        catch (NotSupportedException ne)
        {
            // other errors
            Console.WriteLine(ne.Message);
        }

(This code uses DownloadString; you can also use DownloadData, which returns a byte array, for a more binary-friendly version.)

This is great for fetching simple pages that have data encoded in the query string, but the basic DownloadString method has a drawback: it is synchronous, so it blocks until the operation completes.  For slow connections or large files, the call would need to run on another thread.  There is a better way.  But first, here’s an example of another basic but important method, DownloadFile.

        // create a new instance of WebClient
        WebClient client = new WebClient();
 
        // set the user agent to IE6
        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");
        try
        {
            // actually execute the GET request
            client.DownloadFile("http://www.google.com/","google_fetch.txt");
 
            // the contents of the webpage have been written to google_fetch.txt
            Console.WriteLine("File Saved.");
        }
        catch (WebException we)
        {
            // WebException.Status holds useful information
            Console.WriteLine(we.Message + "\n" + we.Status.ToString());
        }
        catch (NotSupportedException ne)
        {
            // other errors
            Console.WriteLine(ne.Message);
        }

This example is almost identical to the one above, aside from the call to client.DownloadFile.  It saves the response to disk instead of returning a string, which is often what you would end up doing with the string anyway.

But can I send POST data?

Yes!  Using WebClient.UploadString or WebClient.UploadData you can POST data to the server easily.  I’ll show an example using UploadData, since UploadString is used in the same manner as DownloadString.

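            // tell the server the body is form-encoded, so it parses the fields
            client.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
            byte[] bret = client.UploadData("http://www.website.com/post.php", "POST",
                System.Text.Encoding.ASCII.GetBytes("field1=value1&field2=value2") );
 
            string sret = System.Text.Encoding.ASCII.GetString(bret);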

UploadData returns a byte array (byte[]) which contains the contents of the page.  This can easily be converted to a string by using the System.Text.Encoding.ASCII.GetString() method.  You can also use the other encoding types in System.Text.Encoding (such as UTF8) to do the same.

Also note the use of GetBytes(): the UploadData method only takes a byte array as its upload buffer.  This works well if you are uploading binary data.
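For plain form fields, WebClient.UploadValues can save the manual encoding step: it takes a NameValueCollection, URL-encodes the values, and sends them as application/x-www-form-urlencoded.  The sketch below shows the idea; the UploadValuesSketch class, the field names, and whatever URL you pass in are placeholders, not part of any real service.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

public static class UploadValuesSketch
{
    // Builds the form fields; names and values here are placeholders
    public static NameValueCollection BuildForm()
    {
        var form = new NameValueCollection();
        form.Add("field1", "value1");
        form.Add("field2", "value with spaces & symbols");
        return form;
    }

    // POSTs the form and returns the response body as text
    public static string Post(string url)
    {
        using (var client = new WebClient())
        {
            // UploadValues encodes each value and sets the Content-Type header
            byte[] raw = client.UploadValues(url, "POST", BuildForm());

            // ASSUMPTION: the server replies in UTF-8; adjust if it does not
            return Encoding.UTF8.GetString(raw);
        }
    }
}
```

A call such as UploadValuesSketch.Post("http://www.website.com/post.php") would then behave much like the UploadData example, without hand-building the "field1=value1&field2=value2" string.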

What about the asynchronous methods of WebClient?

WebClient provides asynchronous methods for fetching webpages as well.  They are named after their synchronous counterparts: DownloadFileAsync, DownloadDataAsync, DownloadStringAsync, UploadFileAsync, UploadDataAsync, UploadStringAsync, and UploadValuesAsync.

In order to execute an Async request, you will need to attach event handlers before the method is executed.  Without these event handlers you will not have any sense of when the operation finishes, or what stage it is at.

C# WebClient Asynchronous call example:

void do_upload() {
   WebClient client = new WebClient();
   // add event handlers for completed and progress changed
   client.UploadProgressChanged += new UploadProgressChangedEventHandler(client_UploadProgressChanged);
 
   client.UploadFileCompleted += new UploadFileCompletedEventHandler(client_UploadFileCompleted);
   // carry out the operation as normal (the Async methods take a Uri, not a string)
   client.UploadFileAsync(new Uri("http://www.daveamenta.com/up.php"), @"c:\somefile.bin");
}
 
void client_UploadProgressChanged(object sender, UploadProgressChangedEventArgs e)
{
   Console.WriteLine(e.ProgressPercentage);
}
 
void client_UploadFileCompleted(object sender, UploadFileCompletedEventArgs e)
{
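   // reading e.Result throws if the upload failed or was cancelled, so check first
   if(e.Cancelled || e.Error != null) {
      Console.WriteLine("Upload failed: " + (e.Error != null ? e.Error.Message : "cancelled"));
      return;
   }
   Console.WriteLine(System.Text.Encoding.ASCII.GetString(e.Result));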
}

The call to client.UploadFileAsync does not block the calling thread.  It executes in the background, periodically calling the event handlers to report progress.  The other Async methods can be used in the same way.
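To show that the download side follows the same pattern, here is a small sketch using DownloadStringAsync with its DownloadStringCompleted event.  The AsyncDownloadSketch class and its Describe helper are names invented for this example, and the ManualResetEvent is only there so a console program does not exit before the callback fires.

```csharp
using System;
using System.Net;
using System.Threading;

public static class AsyncDownloadSketch
{
    // Turns the outcome of a request into a one-line message
    public static string Describe(bool cancelled, Exception error, string result)
    {
        if (cancelled) return "Cancelled.";
        if (error != null) return "Failed: " + error.Message;
        return "Received " + result.Length + " characters.";
    }

    public static void Run(string url)
    {
        var done = new ManualResetEvent(false);
        using (var client = new WebClient())
        {
            // attach the handler before starting the request
            client.DownloadStringCompleted += (sender, e) =>
            {
                // e.Result throws on failure, so only read it on success
                string body = (e.Cancelled || e.Error != null) ? null : e.Result;
                Console.WriteLine(Describe(e.Cancelled, e.Error, body));
                done.Set();
            };

            // returns immediately; the download continues in the background
            client.DownloadStringAsync(new Uri(url));
            Console.WriteLine("Request started; this thread is free to do other work.");

            // only needed here to keep the console program alive
            done.WaitOne();
        }
    }
}
```

Calling AsyncDownloadSketch.Run("http://www.google.com/") prints the "Request started" line first, then the outcome once the download completes.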

C#: Send text to a pastebin (HttpWebRequest POST example)

May 9, 2008 by Dave · 4 Comments
Filed under: Code Snippets 

I’m working on a small application that hooks into the clipboard monitor chain and displays a list of options whenever text is saved to the clipboard.  One of the options I wanted was the ability to send data to pastebin.com, a popular text/code sharing service.  This is a relatively mundane task: faking the POST data for a webpage and sending the requests as a browser would.  There were some hiccups though, mostly with finding the redirect URL using HttpWebRequest/HttpWebResponse.

A little background:

HttpWebRequest and HttpWebResponse live in the System.Net namespace, and provide the ability to quickly send HTTP requests to a webserver.  This is what a simple GET request would look like:

try
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://davux.pastebin.com/pastebin.php");

    // the using blocks close the response and reader even if reading throws
    using (HttpWebResponse res = (HttpWebResponse) request.GetResponse())
    {
        if (res.StatusCode == HttpStatusCode.OK)
        {
            using (StreamReader reader = new StreamReader(res.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
        else
        {
            // handle errors
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine("Error: " + ex.Message);
}

Note: For most simple tasks, it is wise to use System.Net.WebClient.  It is not suitable for this task, however, because it does not expose the ability to disable automatic redirection.

To send new data to pastebin, it needs to be sent via HTTP POST.  Using Firefox LiveHTTPHeaders, I determined what string needs to be sent in order to create a new entry.  This can also be determined by reading the source of the page, and looking at each of the fields contained in the <form> block.

string post = "&parent_pid=&format=text&code2=" + code_text_to_send + "&poster=" + poster_name + "&paste=Send&expiry=" + expiry + "&email=";
  • code_text_to_send – The data which should be used to create a new page.
  • poster_name – The author of the post
  • expiry – How long the data should be retained for.  Options are ‘d’ for one day, ‘m’ for one month, and ‘f’ for indefinitely.

Note: Make sure to URLEncode each field that is sent!
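One way to make the encoding hard to forget is to build the body from name/value pairs and encode each piece in one place.  This is a minimal sketch, not the code used in the application: FormBodySketch is an invented helper, and it uses Uri.EscapeDataString rather than HttpUtility.UrlEncode so that it does not need a reference to System.Web (the two differ in how they encode spaces, %20 versus +, but servers generally accept both in form data).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBodySketch
{
    // Joins fields as name=value pairs separated by '&',
    // percent-encoding every name and value
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            // treat a null value as an empty field
            sb.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return sb.ToString();
    }
}
```

For example, passing the pairs ("code2", "a b&c=d") and ("poster", "Dave") produces code2=a%20b%26c%3Dd&poster=Dave, so an ampersand inside the pasted text can no longer be mistaken for a field separator.  For very large pastes HttpUtility.UrlEncode may be the safer choice, since Uri.EscapeDataString has, as far as I know, enforced an input length limit on .NET Framework.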

After further examination of the headers and data sent back, I decided that the best way to extract the link was to catch the HTTP 302 Found response and read its Location header.

HTTP/1.x 302 Found
Date: Thu, 08 May 2008 01:06:08 GMT
Server: Apache/1.3.33 (Debian GNU/Linux) mod_python/2.7.10 Python/2.3.4 PHP/4.3.10-22 mod_perl/1.29
X-Powered-By: PHP/4.3.10-22
Location: http://davux.pastebin.com/m10a794d6
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

HttpWebRequest, by default, has AllowAutoRedirect set to true, which means that, internally, it will follow this redirect and fetch the new URL.  This extra operation is needless in this circumstance, so it is best to disable it and extract the URL from the header.  The headers associated with each request or response are found in the Headers object for each class.  We wish to look at the Location field to find our URL.

C# HttpWebRequest HTTP POST Example:

try
{
    HttpWebRequest request = (HttpWebRequest)
        WebRequest.Create("http://davux.pastebin.com/pastebin.php");

    request.AllowAutoRedirect = false;
    request.Method = "POST";

    // text holds the content to paste; HttpUtility lives in System.Web
    string post = "&parent_pid=&format=text&code2=" + HttpUtility.UrlEncode(text) + "&poster=Dave&paste=Send&expiry=m&email=";
    byte[] data = System.Text.Encoding.ASCII.GetBytes(post);

    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;

    // write the POST body to the request stream
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(data, 0, data.Length);
    }

    // note that there is no need to hook up a StreamReader and
    // look at the response data, since it is of no need
    using (HttpWebResponse res = (HttpWebResponse) request.GetResponse())
    {
        // read the status and header before the response is disposed
        if (res.StatusCode == HttpStatusCode.Found)
        {
            Console.WriteLine(res.Headers["Location"]);
        }
        else
        {
            Console.WriteLine("Error");
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine("Error: " + ex.Message);
}

(Some error checking and extra code have been removed for the sake of clarity)

The values for expiry and author have been hard-coded in order to save space.  This basic example will create a new Pastebin entry on my not-so-private pastebin and print the URL of the entry to the console.  If there is an error, it will print the exception message.  If the HTTP code returned is not 302/Found, it will print a generic error.  (If the server is not returning a 302 for this request, there is nothing in the code that can be fixed, so the only sensible error message is a generic one.)
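If you do want slightly more detail than the exception message, WebException carries a Status value and, for HTTP-level failures, the server's response.  The sketch below is one possible way to turn that into a readable line; WebErrorSketch and Describe are names made up for this example.

```csharp
using System;
using System.Net;

public static class WebErrorSketch
{
    // Produces a short description of a failed request
    public static string Describe(WebException we)
    {
        // For 4xx/5xx replies the Status is ProtocolError and the
        // server's response is attached to the exception
        var http = we.Response as HttpWebResponse;
        if (we.Status == WebExceptionStatus.ProtocolError && http != null)
        {
            return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription;
        }

        // Otherwise (DNS failure, timeout, ...) there is no response to inspect
        return we.Status + ": " + we.Message;
    }
}
```

To use it, add a catch (WebException we) block ahead of the general catch (Exception ex) and print WebErrorSketch.Describe(we).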