Showing posts with label Internet/Networking. Show all posts

Monday, December 18, 2006

"This stream does not support seek operations."

A couple of months ago I ran into a problem while reading binary data from a site for a web-spidering application I was developing. I could read text from a few sites, but many others failed because the response stream was not seekable. Consider this snippet:

 
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);         
HttpWebResponse myResponse = (HttpWebResponse) myRequest.GetResponse();
long length = myResponse.ContentLength; // ContentLength is a long; it is -1 when the server sends no Content-Length header
Asking for the length up front turned out to be unreliable. ContentLength only reflects the server's Content-Length header and is -1 when that header is missing, and reading the Length of the response stream itself throws a “This stream does not support seek operations” exception. I later realized that this was not actually a problem with the WebResponse object; the way I intended to retrieve binary data from the web was simply not right. OK, so how can I get data out of this stream, say, plain HTML text for further parsing? A reliable way is to copy the stream into a MemoryStream and then convert that into a byte array.
 
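// Needs: using System.IO; using System.Net;
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse();

Stream respStream = myResponse.GetResponseStream();

MemoryStream memStream = new MemoryStream();
byte[] buffer = new byte[2048];

// Read() returns 0 only once the end of the stream has been reached.
int bytesRead;
while ((bytesRead = respStream.Read(buffer, 0, buffer.Length)) > 0)
{
    // Copy only the bytes that were actually read in this pass.
    memStream.Write(buffer, 0, bytesRead);
}
respStream.Close();
myResponse.Close(); // release the underlying connection

buffer = memStream.ToArray();
// ASCII is only safe for 7-bit text; other charsets need the matching Encoding.
string html = System.Text.Encoding.ASCII.GetString(buffer);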

Here I instantiate a new MemoryStream, read the response stream in fixed-size chunks, and copy each chunk into the MemoryStream. Stream.Read() reads at most count bytes (its third argument) from the current stream into the buffer and returns the number of bytes actually read, which is the value stored in bytesRead. In the example above, each call reads at most 2048 bytes into the buffer, and those bytes are then written to the MemoryStream. The method returns 0 when there is no more data to read. One important point: Stream.Read() can return fewer bytes than requested (< 2048) even if the end of the stream has not been reached, so always write bytesRead bytes rather than the whole buffer.
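The read loop can be packaged as a small helper and exercised without any network access. What follows is a minimal sketch, not code from the original spider: `StreamHelper.ReadAll` and the `TrickleStream` test double are hypothetical names introduced for illustration. `TrickleStream` imitates a network stream by refusing to seek and by handing back only a few bytes per `Read()` call.

```csharp
using System;
using System.IO;

public static class StreamHelper
{
    // Hypothetical helper: drains any readable stream into a byte array
    // without ever touching Length or Position.
    public static byte[] ReadAll(Stream source)
    {
        using (MemoryStream memStream = new MemoryStream())
        {
            byte[] buffer = new byte[2048];
            int bytesRead;
            // A short read is normal; only a return value of 0 means end of stream.
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                memStream.Write(buffer, 0, bytesRead);
            }
            return memStream.ToArray();
        }
    }
}

// Hypothetical test double: forward-only, returns at most maxChunk bytes per Read().
public class TrickleStream : Stream
{
    private const string NoSeek = "This stream does not support seek operations.";
    private readonly byte[] data;
    private readonly int maxChunk;
    private int consumed;

    public TrickleStream(byte[] data, int maxChunk)
    {
        this.data = data;
        this.maxChunk = maxChunk;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }

    // Like a network stream, this one cannot report its length or position.
    public override long Length { get { throw new NotSupportedException(NoSeek); } }
    public override long Position
    {
        get { throw new NotSupportedException(NoSeek); }
        set { throw new NotSupportedException(NoSeek); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int remaining = data.Length - consumed;
        int n = Math.Min(Math.Min(count, maxChunk), remaining);
        Array.Copy(data, consumed, buffer, offset, n);
        consumed += n;
        return n; // 0 once all data has been handed out
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(NoSeek); }
    public override void SetLength(long value) { throw new NotSupportedException(NoSeek); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

With this in place, `StreamHelper.ReadAll(new TrickleStream(payload, 7))` should return a copy of `payload` even though each `Read()` delivers at most 7 bytes and the stream's `Length` is never available.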

MemoryStream.ToArray() then returns the accumulated data as a byte array. If the retrieved data is plain text, which you can tell from the response headers, it can be converted to a string with System.Text.Encoding.ASCII.GetString(buffer). Otherwise, write the byte array to a file using a FileStream object.
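The text-or-file decision can be sketched as follows. This is an illustration under stated assumptions, not the original spider's code: the `ResponseBody` class and its method names are hypothetical, and the `contentType` and `charset` arguments are assumed to come from the response headers (for example `myResponse.ContentType` and `myResponse.CharacterSet`). ASCII is kept as the fallback to match the post, although it will mangle any non-ASCII characters.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseBody
{
    // Hypothetical helper: treats any "text/..." content type as text.
    public static bool IsText(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }

    // Decodes with the named charset; falls back to ASCII when the
    // name is missing or not recognized.
    public static string Decode(byte[] data, string charset)
    {
        Encoding encoding = Encoding.ASCII;
        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the ASCII fallback.
            }
        }
        return encoding.GetString(data);
    }

    // Writes the raw bytes to disk unchanged, overwriting any existing file.
    public static void Save(byte[] data, string path)
    {
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            file.Write(data, 0, data.Length);
        }
    }
}
```

A caller would check `ResponseBody.IsText(...)` first, then either call `Decode` to get a string for parsing or `Save` to store the bytes as a file.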

You should find this approach useful if you’re planning to download files, and later resume broken downloads, in your web-spidering applications!