A couple of months ago, while developing a web-spidering application, I ran into a problem reading binary data from websites. I could read text strings from a few sites but failed on many others because the response stream was not seekable. See this code snippet:
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse();
long length = myResponse.ContentLength; // ContentLength is a long, and is -1 when no Content-Length header was sent

This did not give me a usable length. ContentLength returns -1 when the server sends no Content-Length header (for example, with chunked responses), and asking the response stream itself for its Length threw "This stream does not support seek operations" exceptions, because a network stream cannot seek. I later realized that the problem was not the WebResponse object but the way I was trying to retrieve the data. So how can I read data out of this stream, say, plain HTML text for further parsing? One reliable way is to copy the stream into a MemoryStream and then convert that into a byte array.
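To make the failure mode concrete, here is a minimal sketch of checking whether a length is actually known before relying on it. The class `LengthUtil` and method `KnownLength` are hypothetical names for illustration; the sketch assumes `contentLength` is the value read from `HttpWebResponse.ContentLength`. It relies on the standard `Stream.CanSeek` property: only seekable streams can report `Length` without throwing.

```csharp
using System.IO;

static class LengthUtil
{
    // Hypothetical helper: returns the body length if it can be known up front,
    // or -1 if the only option is to read the stream to the end.
    // contentLength is assumed to come from HttpWebResponse.ContentLength (-1 if unset).
    public static long KnownLength(Stream stream, long contentLength)
    {
        if (contentLength >= 0)
            return contentLength;   // the server sent a Content-Length header
        if (stream.CanSeek)
            return stream.Length;   // Length is safe to read on seekable streams
        return -1;                  // non-seekable: Length would throw NotSupportedException
    }
}
```

When this returns -1, there is no shortcut: the data has to be read in chunks until the stream reports its end.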
using System.IO;
using System.Net;
using System.Text;

HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
using (Stream respStream = myResponse.GetResponseStream())
using (MemoryStream memStream = new MemoryStream())
{
    byte[] buffer = new byte[2048];
    int bytesRead;
    // Copy the response in chunks of up to 2048 bytes; Read() returns 0 at the end of the stream.
    do
    {
        bytesRead = respStream.Read(buffer, 0, buffer.Length);
        memStream.Write(buffer, 0, bytesRead);
    } while (bytesRead != 0);

    // Collect everything copied so far into a single byte array.
    buffer = memStream.ToArray();
    string html = Encoding.ASCII.GetString(buffer);
}
Here, I create a new MemoryStream, read the response stream in fixed-size chunks, and copy each chunk into the MemoryStream. Stream.Read() reads at most the requested number of bytes (its count argument, buffer.Length here) into the buffer and returns the number of bytes it actually read, which is stored in bytesRead. In the example above, each call reads up to 2048 bytes into the buffer, and those bytesRead bytes are then written to the MemoryStream. Read() returns 0 only when there is no more data to read. One important point: Read() can return fewer bytes than requested (fewer than 2048) even when the end of the stream has not been reached, which is why the loop writes bytesRead bytes rather than a full buffer and keeps going until it gets 0.
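The same loop can be packaged as a reusable method that works on any stream, seekable or not. This is a minimal sketch; `StreamUtil.ReadFully` is a hypothetical name, and the `while` form is equivalent to the `do`/`while` above except that it never issues a zero-length write.

```csharp
using System.IO;

static class StreamUtil
{
    // Hypothetical helper: copies a stream of unknown length (such as a
    // non-seekable response stream) into a byte array, chunk by chunk.
    public static byte[] ReadFully(Stream input, int chunkSize = 2048)
    {
        using (MemoryStream memStream = new MemoryStream())
        {
            byte[] buffer = new byte[chunkSize];
            int bytesRead;
            // Read() may return fewer than chunkSize bytes before the end,
            // so only the bytesRead bytes actually received are written.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                memStream.Write(buffer, 0, bytesRead);
            }
            return memStream.ToArray();
        }
    }
}
```

Usage with the response from the earlier snippet would be `byte[] data = StreamUtil.ReadFully(myResponse.GetResponseStream());`. On .NET Framework 4.0 and later, `Stream.CopyTo(memStream)` performs the same chunked copy without a hand-written loop.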
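Finally, MemoryStream.ToArray() converts the collected data into a byte array. If the retrieved data is plain text, which you can tell from the Content-Type response header, you can convert it into a string with the System.Text.Encoding.ASCII.GetString(buffer) method. ASCII only decodes 7-bit characters correctly, so for pages in another encoding use the charset named in the Content-Type header (for example, Encoding.UTF8). Otherwise, write the byte array to a file using a FileStream object.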
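Both branches can be sketched as follows, assuming the Content-Type header value is available as a string (for example from `myResponse.ContentType`). The class `ContentUtil` and its methods are hypothetical; the "starts with `text/`" test is a deliberately simple heuristic, and the charset parsing is a minimal sketch rather than a full header parser. `Encoding.GetEncoding` throws `ArgumentException` for names it does not recognize, so the sketch falls back to a caller-supplied default.

```csharp
using System;
using System.IO;
using System.Text;

static class ContentUtil
{
    // Hypothetical helper: picks an Encoding from a Content-Type value such as
    // "text/html; charset=utf-8", falling back when no usable charset is given.
    public static Encoding EncodingFromContentType(string contentType, Encoding fallback)
    {
        if (contentType != null)
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    string name = p.Substring("charset=".Length).Trim('"', ' ');
                    try { return Encoding.GetEncoding(name); }
                    catch (ArgumentException) { return fallback; } // unknown charset name
                }
            }
        }
        return fallback;
    }

    // Simplistic heuristic: treat any "text/..." media type as plain text.
    public static bool IsText(string contentType)
    {
        return contentType != null
            && contentType.StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }

    // Writes binary (non-text) data to disk with a FileStream.
    public static void SaveBytes(byte[] data, string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
        }
    }
}
```

With the byte array from the earlier snippet, a caller would decode with `ContentUtil.EncodingFromContentType(contentType, Encoding.ASCII).GetString(buffer)` when `IsText` returns true, and call `SaveBytes` otherwise.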
This approach is especially handy if your web-spidering application downloads files and needs to resume broken downloads later.
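Resuming relies on something the snippets above do not show: asking the server for only the bytes you are missing. Here is a minimal sketch under stated assumptions, using `HttpWebRequest.AddRange(long)` (available from .NET Framework 4.0) to send a `Range: bytes=<n>-` header, where `n` is the size of the partially downloaded file. `ResumeUtil`, `CreateResumeRequest`, and `ResumeDownload` are hypothetical names. It assumes the server honours range requests; a server that ignores the header replies 200 with the full body, so the sketch overwrites the file in that case instead of appending.

```csharp
using System.IO;
using System.Net;

static class ResumeUtil
{
    // Hypothetical helper: builds a request for the bytes after those
    // already saved in partialPath.
    public static HttpWebRequest CreateResumeRequest(string url, string partialPath)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        long existing = File.Exists(partialPath) ? new FileInfo(partialPath).Length : 0;
        if (existing > 0)
        {
            request.AddRange(existing); // sends "Range: bytes=<existing>-"
        }
        return request;
    }

    // Appends to the partial file on 206 Partial Content; otherwise the server
    // sent the whole resource, so the file is rewritten from scratch.
    public static void ResumeDownload(string url, string partialPath)
    {
        HttpWebRequest request = CreateResumeRequest(url, partialPath);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            FileMode mode = response.StatusCode == HttpStatusCode.PartialContent
                ? FileMode.Append
                : FileMode.Create;
            using (FileStream file = new FileStream(partialPath, mode, FileAccess.Write))
            {
                byte[] buffer = new byte[2048];
                int bytesRead;
                // Same chunked read as before, but straight to disk.
                while ((bytesRead = body.Read(buffer, 0, buffer.Length)) > 0)
                {
                    file.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
```

If the file is already complete, the server may answer 416 (Range Not Satisfiable), which `GetResponse()` surfaces as a `WebException`; a real spider would need to handle that case.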