Search Urban Dictionary with C#
Here’s a quick example of how one might scrape the search results from Urban Dictionary. It makes good use of regular expressions and WebClient.
Thankfully, Urban Dictionary is a very scrape-friendly site: finding results is as easy as locating two divs and extracting their contents.
UrbanDictionary.Search(string) returns a list of key-value pairs, with the word as the key and the Urban Dictionary definition as the value.
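As a rough sketch of the extraction step described above (not the original implementation), something along these lines could work. The class names `word` and `meaning`, and the helper names `UrbanDictionarySketch` and `ParseResults`, are assumptions made for illustration; the real markup would need to be checked against the live page, and a lazy regex like this will stop early if a definition contains nested divs.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class UrbanDictionarySketch
{
    // Hypothetical markup: <div class="word">...</div> followed by
    // <div class="meaning">...</div>. Adjust the pattern to the real page.
    private static readonly Regex EntryPattern = new Regex(
        "<div class=\"word\">(?<word>.*?)</div>.*?<div class=\"meaning\">(?<meaning>.*?)</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    // Pulls (word, definition) pairs out of HTML that has already been downloaded.
    public static List<KeyValuePair<string, string>> ParseResults(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in EntryPattern.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                Clean(m.Groups["word"].Value),
                Clean(m.Groups["meaning"].Value)));
        }
        return results;
    }

    // Strips nested tags, decodes HTML entities, and trims whitespace.
    private static string Clean(string fragment)
    {
        string text = TagPattern.Replace(fragment, "");
        return WebUtility.HtmlDecode(text).Trim();
    }
}
```

The HTML string passed to ParseResults would typically come from WebClient.DownloadString on the search results page.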
C#: WebClient Usage
Microsoft has provided a great utility since .NET 1.0 for doing quick I/O via websites. The WebClient class provides basic functionality for downloading from and uploading to web servers. The WebClient example on MSDN leaves a lot to be desired, especially for beginners. I’d like to expand on that, as well as provide some example code.
C#: Using WebClient to fetch a page:
```csharp
// create a new instance of WebClient
WebClient client = new WebClient();

// set the user agent to IE6
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");

try
{
    // actually execute the GET request
    string ret = client.DownloadString("http://www.google.com/");

    // ret now contains the contents of the webpage
    // (Math.Min guards against pages shorter than 256 characters)
    Console.WriteLine("First 256 characters of response: " + ret.Substring(0, Math.Min(256, ret.Length)));
}
catch (WebException we)
{
    // WebException.Status holds useful information
    Console.WriteLine(we.Message + "\n" + we.Status.ToString());
}
catch (NotSupportedException ne)
{
    // other errors
    Console.WriteLine(ne.Message);
}
```
(This code uses DownloadString; DownloadData is a more binary-friendly alternative.)
This is great for fetching simple pages that have data encoded in the querystring, but there are some problems with the basic DownloadString method of WebClient. It’s synchronous, so it will block until the operation completes. So for slow connections, or large files, this would need to run in another thread. There is a better way. But first, here’s another example of another basic, but important, method, DownloadFile.
```csharp
// create a new instance of WebClient
WebClient client = new WebClient();

// set the user agent to IE6
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");

try
{
    // actually execute the GET request and write the response to disk
    client.DownloadFile("http://www.google.com/", "google_fetch.txt");

    // google_fetch.txt now contains the contents of the webpage
    Console.WriteLine("File Saved.");
}
catch (WebException we)
{
    // WebException.Status holds useful information
    Console.WriteLine(we.Message + "\n" + we.Status.ToString());
}
catch (NotSupportedException ne)
{
    // other errors
    Console.WriteLine(ne.Message);
}
```
This example is almost identical to the one above, aside from the client.DownloadFile method. It saves the response to disk instead of returning a string, which is often what you would end up doing with the string anyway.
But can I send POST data?
Yes! Using WebClient.UploadString or WebClient.UploadData you can POST data to the server easily. I’ll show an example using UploadData, since UploadString is used in the same manner as DownloadString.
```csharp
byte[] bret = client.UploadData(
    "http://www.website.com/post.php",
    "POST",
    System.Text.Encoding.ASCII.GetBytes("field1=value1&field2=value2"));

string sret = System.Text.Encoding.ASCII.GetString(bret);
```
UploadData returns a byte array (byte[]) which contains the contents of the page. This can easily be converted to a string by using the System.Text.Encoding.ASCII.GetString() method. You can also use the other encoding types in System.Text.Encoding (such as UTF8) to do the same.
Also note the use of GetBytes(): the UploadData method only accepts a byte array as the upload buffer, which works well if you are uploading a binary file.
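The choice of encoding matters once the data contains anything beyond plain ASCII. The following small sketch (the `EncodingSketch` and `RoundTrip` names are made up for illustration) encodes a string to bytes and decodes it again, which is roughly what happens to text on its way into and out of UploadData.

```csharp
using System;
using System.Text;

public static class EncodingSketch
{
    // Encodes text to bytes and decodes it again with the same encoding,
    // mimicking the GetBytes/GetString pair used around UploadData.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

Calling EncodingSketch.RoundTrip("café", Encoding.ASCII) gives back "caf?", because ASCII cannot represent the accented character, while the same call with Encoding.UTF8 returns the string unchanged. If the server expects UTF-8, that is probably the safer default.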
What about the asynchronous methods of WebClient?
WebClient provides asynchronous methods for fetching webpages as well. They are named similarly to the synchronous methods: DownloadFileAsync, DownloadDataAsync, DownloadStringAsync, UploadFileAsync, UploadDataAsync, UploadStringAsync, and UploadValuesAsync.
In order to execute an Async request, you will need to attach event handlers before the method is executed. Without these event handlers you will not have any sense of when the operation finishes, or what stage it is at.
C# WebClient Asynchronous call example:
```csharp
void do_upload()
{
    WebClient client = new WebClient();

    // add event handlers for completed and progress changed
    client.UploadProgressChanged += new UploadProgressChangedEventHandler(client_UploadProgressChanged);
    client.UploadFileCompleted += new UploadFileCompletedEventHandler(client_UploadFileCompleted);

    // carry out the operation as normal (the async overloads take a Uri, not a string)
    client.UploadFileAsync(new Uri("http://www.daveamenta.com/up.php"), @"c:\somefile.bin");
}

void client_UploadProgressChanged(object sender, UploadProgressChangedEventArgs e)
{
    Console.WriteLine(e.ProgressPercentage);
}

void client_UploadFileCompleted(object sender, UploadFileCompletedEventArgs e)
{
    // check for failure first; reading e.Result after a failed upload throws
    if (e.Error != null)
    {
        Console.WriteLine(e.Error.Message);
    }
    else if (e.Result != null)
    {
        Console.WriteLine(System.Text.Encoding.ASCII.GetString(e.Result));
    }
}
```
The call to client.UploadFileAsync does not block the calling thread. It executes in the background, periodically calling the event handlers to report progress. The other asynchronous methods can be used in the same way.
C#: Send text to a pastebin (HttpWebRequest POST example)
I’m working on a small application that hooks into the clipboard monitor chain and displays a list of options whenever text is saved to the clipboard. One of the options I wanted was to be able to send data to pastebin.com, a popular text/code sharing service. This is a relatively mundane task: faking the POST data for a webpage and sending the requests as a browser would. There were some hiccups though, mostly with finding the redirect URL using HttpWebRequest/HttpWebResponse.
A little background:
HttpWebRequest and HttpWebResponse live in the System.Net namespace, and provide the ability to quickly send HTTP requests to a webserver. This is what a simple GET request would look like:
```csharp
// requires: using System.IO; using System.Net;
try
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://davux.pastebin.com/pastebin.php");
    HttpWebResponse res = (HttpWebResponse) request.GetResponse();

    if (res.StatusCode == HttpStatusCode.OK)
    {
        StreamReader reader = new StreamReader(res.GetResponseStream());
        Console.WriteLine(reader.ReadToEnd());
    }
    else
    {
        // handle errors
    }

    res.Close();
}
catch (Exception ex)
{
    Console.WriteLine("Error: " + ex.Message);
}
```
Note: For most simple tasks, it is wise to use System.Net.WebClient. It is not suitable for this task, however, because it does not expose the ability to disable automatic redirection.
To send new data to pastebin, it needs to be sent via HTTP POST. Using Firefox LiveHTTPHeaders, I determined what string needs to be sent in order to create a new entry. This can also be determined by reading the source of the page, and looking at each of the fields contained in the <form> block.
string post = "&parent_pid=&format=text&code2=" + code_text_to_send + "&poster=" + poster_name + "&paste=Send&expiry=" + expiry + "&email=";
- code_text_to_send – The data which should be used to create a new page.
- poster_name – The author of the post
- expiry – How long the data should be retained for. Options are ‘d’ for one day, ‘m’ for one month, and ‘f’ for indefinitely.
Note: Make sure to URL-encode each field that is sent!
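One way to make the encoding step hard to forget is to build the POST body from a list of fields rather than by string concatenation. This is only a sketch: the `FormBodySketch` and `Build` names are invented here, and it uses Uri.EscapeDataString instead of HttpUtility.UrlEncode so that it does not depend on System.Web. Note that EscapeDataString writes a space as %20 rather than +, which form decoders generally accept but which is worth checking against the target server.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBodySketch
{
    // Joins name/value pairs into "a=1&b=2" form, percent-encoding
    // every name and value so that '&' and '=' inside the data
    // cannot be mistaken for field separators.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            // treat a missing value as an empty field
            sb.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return sb.ToString();
    }
}
```

Passing the pairs ("code2", "a&b=c d") and ("poster", "Dave") to FormBodySketch.Build yields "code2=a%26b%3Dc%20d&poster=Dave", so the ampersand and equals sign in the pasted text stay inside the code2 field.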
After further examination of the headers and data sent back, I decided that the best way to extract the link was to catch the HTTP 302 Found response.
```
HTTP/1.x 302 Found
Date: Thu, 08 May 2008 01:06:08 GMT
Server: Apache/1.3.33 (Debian GNU/Linux) mod_python/2.7.10 Python/2.3.4 PHP/4.3.10-22 mod_perl/1.29
X-Powered-By: PHP/4.3.10-22
Location: http://davux.pastebin.com/m10a794d6
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
```
HttpWebRequest, by default, has AllowAutoRedirect set to true, which means that, internally, it will resolve this message, and fetch the new URL. This extra operation is needless in this circumstance, so it is best to disable it and extract the URL from the header. The headers associated with each request or response are found in the Headers object for each class. We wish to look at the Location field to find our URL.
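The response shown above carries an absolute URL in Location, but servers sometimes send a relative one. A small helper along these lines could cover both cases; `RedirectSketch` and `ResolveLocation` are hypothetical names, and the sketch assumes the headers come from the Headers property of the response.

```csharp
using System;
using System.Net;

public static class RedirectSketch
{
    // Returns the redirect target from the Location header, resolved
    // against the URL that was requested, or null if no Location
    // header is present. Header names are matched case-insensitively.
    public static Uri ResolveLocation(Uri requestUri, WebHeaderCollection headers)
    {
        string location = headers["Location"];
        if (string.IsNullOrEmpty(location))
        {
            return null;
        }
        // An absolute Location is used as-is; a relative one is
        // combined with the request URL.
        return new Uri(requestUri, location);
    }
}
```

With a request to http://davux.pastebin.com/pastebin.php, a Location of /m10a794d6 and a Location of http://davux.pastebin.com/m10a794d6 both resolve to the same absolute URL.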
C# HttpWebRequest HTTP POST Example:
```csharp
// requires: using System.IO; using System.Net; using System.Web;
try
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://davux.pastebin.com/pastebin.php");
    request.AllowAutoRedirect = false;
    request.Method = "POST";

    // text holds the content to paste
    string post = "&parent_pid=&format=text&code2=" + HttpUtility.UrlEncode(text) + "&poster=Dave&paste=Send&expiry=m&email=";
    byte[] data = System.Text.Encoding.ASCII.GetBytes(post);
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;

    // write the POST body to the request stream
    Stream requestStream = request.GetRequestStream();
    requestStream.Write(data, 0, data.Length);
    requestStream.Close();

    HttpWebResponse res = (HttpWebResponse) request.GetResponse();

    // note that there is no need to hook up a StreamReader and
    // look at the response data, since it is of no need
    if (res.StatusCode == HttpStatusCode.Found)
    {
        Console.WriteLine(res.Headers["location"]);
    }
    else
    {
        Console.WriteLine("Error");
    }

    // close the response once the status and headers have been read
    res.Close();
}
catch (Exception ex)
{
    Console.WriteLine("Error: " + ex.Message);
}
```
(Some error checking and extra code have been removed for the sake of clarity)
The values for expiry and author have been hard-coded in order to save space. This basic example will create a new Pastebin entry on my not so private pastebin, and print the URL of the entry to the console. If there is an error, it will print the exception message. If the HTTP code returned is not 302/Found, it will print a generic error. (If the server is not returning a 302 for this request, there is nothing codewise that can be fixed, and thus the only option for error is generic.)

