Using WebClient with Basic Authentication and Forms Authentication

August 10, 2015

I had the chance to investigate how we could automate downloads from a couple of websites. The current process is excruciatingly manual, and ripe for errors (as all manual processes are).

I first went to check out the websites to see what we were dealing with here. Is there an API that could be used to pull the files down? Is there some sort of service that they could provide to push the files to us? No. And no, of course.

So no API. No clean way to do it. I’ll just have to log in programmatically and download what I need. Of course the two sites I was accessing had completely different implementations.

The first one was pretty easy. It just uses basic authentication and then allows you to proceed. Why a public-facing web application uses basic authentication in 2015 I don’t know, but I guess that’s another conversation.

Here’s how I implemented it. I also needed to actually download the file by sending a POST to a particular URL. I needed to save it somewhere specific so that’s included as well.

            Uri uri = new Uri(_authUrl);

            var credentialCache = new CredentialCache();
            credentialCache.Add(
              new Uri(uri.GetLeftPart(UriPartial.Authority)), // request url's host
              "Basic",  // authentication type. hopefully they don't change it.
              new NetworkCredential(_uname, _pword) // credentials 
            );

            using (WebClient client = new WebClient())
            {
                // Use the explicit credentials above (not the Windows default credentials)
                client.Credentials = credentialCache;

                System.Collections.Specialized.NameValueCollection formParams = new System.Collections.Specialized.NameValueCollection();

                // This is the stuff that the form on the page expects to see. Pulled from the HTML source and javascript function.
                formParams.Add("param1", "value1");
                formParams.Add("param2", "value2");
                formParams.Add("param3", "value3");
                formParams.Add("filename", _downloadFileName);

                byte[] responsebytes = client.UploadValues(_urlForDownload, "POST", formParams);

                // Write the file to disk, creating the target directory first if it doesn't exist yet
                if (!Directory.Exists(_fileDownloadLocation))
                    Directory.CreateDirectory(_fileDownloadLocation);

                File.WriteAllBytes(Path.Combine(_fileDownloadLocation, _downloadFileName), responsebytes);
            }
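As far as I know, a WebClient configured with a CredentialCache only sends the credentials after the server answers the first request with a 401 challenge. If a server never sends that challenge, one alternative is to attach the Basic header yourself. Here is a minimal sketch of that idea; the BasicAuthHeader class and its method names are made up for illustration and were not part of my original implementation:

```csharp
using System;
using System.Net;
using System.Text;

public static class BasicAuthHeader
{
    // Builds the value of an HTTP Basic "Authorization" header:
    // "Basic " followed by base64("user:password").
    public static string Build(string userName, string password)
    {
        byte[] raw = Encoding.UTF8.GetBytes(userName + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Hypothetical helper: sends the header up front instead of waiting for a 401 challenge.
    public static void Apply(WebClient client, string userName, string password)
    {
        client.Headers[HttpRequestHeader.Authorization] = Build(userName, password);
    }
}
```

Calling BasicAuthHeader.Apply(client, _uname, _pword) before UploadValues would take the place of the CredentialCache setup. Keep in mind that Basic credentials are only base64-encoded, not encrypted, so this only makes sense over HTTPS.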

The other website used Forms Authentication in its implementation. While this was a welcome difference (since, again, it’s 2015), it did make it a little bit more difficult.

I couldn’t just use C#’s WebClient again because it doesn’t deal with cookies. And most applications on the internet use sessions, cookies, and other such hackery to keep track of you and make sure that you’re really logged in and are who you say you are.

I found an implementation of what seems to be called a “cookie-aware WebClient.” I don’t recall which site I got it from, but many implement it in a very similar way. Here is the code for a class called WebClientEx. It simply extends WebClient:

    public class WebClientEx : WebClient
    {
        public WebClientEx(CookieContainer container)
        {
            this.container = container;
        }

        private readonly CookieContainer container = new CookieContainer();

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest r = base.GetWebRequest(address);
            var request = r as HttpWebRequest;
            if (request != null)
            {
                request.CookieContainer = container;
            }
            return r;
        }

        protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
        {
            WebResponse response = base.GetWebResponse(request, result);
            ReadCookies(response);
            return response;
        }

        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse response = base.GetWebResponse(request);
            ReadCookies(response);
            return response;
        }

        private void ReadCookies(WebResponse r)
        {
            var response = r as HttpWebResponse;
            if (response != null)
            {
                CookieCollection cookies = response.Cookies;
                container.Add(cookies);
            }
        }
    }

And its usage for me is as follows:

            CookieContainer cookieJar = new CookieContainer();

            HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(_urlForLoginPage);
            req.CookieContainer = cookieJar;
            req.Method = "GET";
            Uri uri;

            // First send a request to the login page so that we can get the URL that we will be redirected to, which contains the proper
            // querystring info we'll need.
            using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
            {
                uri = response.ResponseUri;
            }

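            // The C# WebClient will not persist cookies by default. Found this WebClientEx class that does what we need for this
            using (WebClientEx ex = new WebClientEx(cookieJar))
            {
                // Declare the body as form data and escape the values so special characters don't break the post
                ex.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
                var postData = string.Format("USER={0}&PASSWORD={1}&target={2}",
                    Uri.EscapeDataString(_uname), Uri.EscapeDataString(_pword), Uri.EscapeDataString(_urlForDownload));
                var resp = ex.UploadString(uri, postData);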

                // Note that useUnsafeHeaderParsing is set to true in app.config. The response from this URL is not well-formed, so was throwing
                // an exception when parsed by the "strict" default method.
                ex.DownloadFile(_wirelessToWireline, string.Format(@"{0}\FILE1-{1}.TXT", _fileDownloadLocation, DateTime.Now.ToString("yyyyMMdd")));
                ex.DownloadFile(_wirelineToWireless, string.Format(@"{0}\FILE2-{1}.TXT", _fileDownloadLocation, DateTime.Now.ToString("yyyyMMdd")));
            }

You’ll often hear of people struggling with the response that is sent back to the first request. It’s basically the server sending back the challenge for credentials: a 401 for basic authentication, or a redirect to the login page for forms authentication. In my case, I needed to send the request and get the information that was appended to the querystring anyway, so it was handy. I then posted the data to the form that the application would be expecting, and downloaded my file.

Also note that the server I was downloading the information from sent back the response in a way that the .NET framework didn’t like, by default. So I had to set useUnsafeHeaderParsing to true. This was an acceptable risk for me. Make sure that you know what it means.

This took longer than I care to admit to implement, but once I found and understood the “cookie-aware” concept, it worked out pretty well.

SharePoint Throwing 503, Cause is Managed Account Failure

November 7, 2011

I knew that when I had a few dozen emails this morning regarding SharePoint being unavailable for some users that we had a real problem. Generally I can just chalk it up to users not understanding what they are doing, or locking themselves or others out due to configuration changes that they make on their own sites. But this morning seemed legit.

Our production farm has 2 web front ends. Both were “up” but I found that one of them (let’s say WEB01) was throwing lots of “503 Service Unavailable” errors in the logs. I looked at the server and noticed that the main application pool was stopped. So I started it up and went off to look at some information in the logs. I found this:

Application pool SharePoint – 80 has been disabled. Windows Process Activation Service (WAS) encountered a failure when it started a worker process to serve the application pool.

Of course this meant that the affected application pool would stop immediately as well. A bit of research revealed that this probably meant that the identity of the application pool was somehow compromised or not configured appropriately. I am using a SharePoint-managed account for the identity of this application pool and noticed that our password policy had changed the password this morning. Our policy has the passwords changing every month on the 7th between 2 and 3AM.

To make a long story short — somehow the process either did not complete or did not execute on WEB01, and it had invalid credentials for the application pool’s identity. I found that you can force a password reset for the managed accounts inside of SharePoint Central Administration. So I forced that reset, had SharePoint generate its own password, and everything is now fine.

Have others experienced the issue of the passwords not being configured appropriately during one of these changes? Is this a bug in SharePoint 2010?

Directory.GetFiles VS Directory.EnumerateFiles

April 12, 2011

Where I work, we have fairly large archives of files due to the large volume of messages received from various clients. Most of these messages are received through a VLTrader or ASP(M)X front-end. Eventually they are archived onto the file system according to some pre-determined process. The teams supporting these archives had grown concerned about the ever-increasing amount of storage required for these files. There are thousands of directories (nested many levels deep) and hundreds of thousands of files, potentially millions.

I was asked to help come up with a solution for this problem. The app needed to be configurable when run to specify the root directory and the number of days back to check the date on the file. I needed to allow them to specify that all files older than 90 days should be deleted, for example.

My initial reaction was to use the excellent (and very convenient) System.IO.Directory.GetFiles and System.IO.Directory.GetDirectories methods to simply get an array of the files and directories I would need to enumerate in order to accomplish the task. So I wrote a quick app, utilizing these methods, and saw the IOPS go crazy for a while, then do nothing, then go crazy again. All the while, not much was being accomplished. The issue, as anyone who has tried to “browse” the file system using Windows Explorer may tell you, is that getting the properties of the entire tree, including the number/size of directories and number/size of files, is quite an expensive process.

After doing a bit more research, I came upon the Directory.EnumerateFiles method, which (you guessed it) returns an enumerable collection of file names in a specified path, as opposed to Directory.GetFiles, which returns an array of file names in a specified path. With an enumerable you can start working on the first file right away, instead of waiting for the whole array to be built. The difference when checking a path with thousands of directories and hundreds of thousands of files is huge. In fact you don’t even have to have that many files/directories to see a dramatic difference. Directory.EnumerateFiles is only available in .NET 4.0 and above. I have seen others suggest ways of doing something similar with the Win32 API, but it was much easier for me to make sure I had .NET 4.0 available than it was to try and implement something using the Win32 API.

Usage is simply:

foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
    ShouldDeleteFile(file);

When using these methods, be sure that proper permissions are available on the entire tree. See this post at Stack Overflow for more information. Otherwise you may get an exception. Speaking of permissions — part of my requirement was that I was supposed to delete all files more than 90 days old and all directories which were empty. To avoid any potential conflicts with permissions and/or file properties, the application will run as an administrator and

File.SetAttributes(filePath, FileAttributes.Normal);

is being set each time through. I’m not sure of the performance penalty this may result in. I’ll have to research and see what the hit would be.
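To make the pieces above a bit more concrete, here is a rough sketch of how the age check and the delete could fit together. The ArchiveCleaner class, the IsExpired method (a stand-in for ShouldDeleteFile), and the choice of last-write time as "the date on the file" are all assumptions for illustration, not the code that actually went into production. It also skips the empty-directory cleanup and any exception handling for permissions:

```csharp
using System;
using System.IO;

public static class ArchiveCleaner
{
    // Hypothetical stand-in for the ShouldDeleteFile check:
    // true when the file was last written before the cutoff.
    public static bool IsExpired(string filePath, DateTime cutoffUtc)
    {
        return File.GetLastWriteTimeUtc(filePath) < cutoffUtc;
    }

    // Walks the tree lazily and deletes expired files; returns how many were deleted.
    public static int DeleteOlderThan(string rootDirectory, int days)
    {
        DateTime cutoffUtc = DateTime.UtcNow.AddDays(-days);
        int deleted = 0;

        foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
        {
            if (!IsExpired(file, cutoffUtc))
                continue;

            // Clear read-only and other attributes so the delete doesn't fail on them
            File.SetAttributes(file, FileAttributes.Normal);
            File.Delete(file);
            deleted++;
        }

        return deleted;
    }
}
```

Something like ArchiveCleaner.DeleteOlderThan(rootDirectory, 90) would then cover the "older than 90 days" case, with both values coming from configuration.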

BIN Deploy MVC 3 Project on IIS 6

April 7, 2011

Tonight I was doing a test deployment of a small MVC project into our corporate environment. We run IIS6 on most of the servers, aside from the newer SharePoint 2010 stuff. I didn’t want to have to install the MVC MSI onto the server, so I just thought I’d deploy what I needed in the bin directory. It was surprisingly frustrating, especially after all the talk I kept hearing about how it should “just work.”

Here are the DLLs I needed to include in the bin folder (copy local = true):

Microsoft.Web.Infrastructure
System.Web.Entity
System.Web.Extensions
System.Web.Helpers
System.Web.Mvc
System.Web.Razor
System.Web.Routing
System.Web.WebPages
System.Web.WebPages.Deployment
System.Web.WebPages.Razor

This post from Phil Haack was also quite helpful in configuring IIS appropriately.

Parallel.ForEach

January 18, 2011

I’ve spent some time off and on over the last year or so writing various versions of web crawlers to get different information off of the web. Some of it for a potential business idea, some of it just to learn a few things. One thing I had a hard time trying to figure out was how to deal with threading. I had a list of URLs that I wanted to crawl, but I had specific things that I wanted to try and do with each one, and there were various counters I was incrementing. Plus, threading and I don’t jibe that well, I’ve found. Maybe I’m just not smart enough for it, who knows.

As I was doing my research/learning/reading about C# in general, I ran across the excellent Parallel Processing blog from MSDN. I was fascinated by the Microsoft Biology Foundation and how they were using the parallelism support in .NET 4. The blog is a good read in general. Those guys are a bit too smart for me to keep up with, but it’s fascinating nonetheless.

I’ll let the smart guys at that blog explain it better than I can, but Parallel Processing allows you to execute additional threads if you have additional CPUs available. It’s important to note that you will not gain from this technique if some other outside resource is what is slowing down your processing. But in my case, I am going out to a website and pulling information from different pages. Parallel Processing allowed me to do this much faster than a regular foreach loop. Good stuff.
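For what it’s worth, here is a small sketch of the general shape: a Parallel.ForEach over a list of items with a shared counter. The ParallelCounterDemo class and the work delegate are hypothetical placeholders for whatever you do with each URL, not my actual crawler code. The one thing to watch is the counter. A plain counter++ is not safe when several threads run the loop body at once, so the sketch uses Interlocked.Increment:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCounterDemo
{
    // Runs 'work' for every item in parallel and returns how many items succeeded.
    public static int CountSuccesses(IEnumerable<string> items, Func<string, bool> work)
    {
        int successes = 0;

        Parallel.ForEach(items, item =>
        {
            // The loop body may run on several threads at once,
            // so the shared counter is updated atomically.
            if (work(item))
                Interlocked.Increment(ref successes);
        });

        return successes;
    }
}
```

Usage would look something like ParallelCounterDemo.CountSuccesses(urls, url => Crawl(url)), where Crawl is whatever method fetches and processes a single page.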

Configurable EndPoint for WCF Connecting to Authorize.NET’s ARB

October 15, 2010

Configuring a WCF service in a class library has been something that has been a struggle for me in the past. There was always something that I knew should be done differently, as it just didn’t “feel” right to have to recompile the class library when we move from a test environment to a production environment.

This specific example uses WCF to connect to Authorize.NET’s ARB service for creating subscriptions.

Here is what I came up with:

// Be sure to configure this in the database for the various environments, as needed
EndpointAddress ea = new EndpointAddress(YourDataAccess.GetUrl);

// HTTPS
BasicHttpBinding serviceBinding = new BasicHttpBinding(BasicHttpSecurityMode.Transport);
serviceBinding.ReceiveTimeout = new TimeSpan(0,0,0,20);
ARB.ServiceSoapClient service = new ARB.ServiceSoapClient(serviceBinding, ea);
ARB.ARBCreateSubscriptionResponseType response;

// Set the credentials
ARB.MerchantAuthenticationType authentication = new ARB.MerchantAuthenticationType();
authentication.name = this.AuthNetName();
authentication.transactionKey = this.AuthNetTxn();
response = service.ARBCreateSubscription(authentication, sub);

Saving a bar code image to JPG

July 30, 2008

I’ve used the excellent iTextSharp library to generate PDFs for different projects. It works very well and has been an excellent tool. One of my recent projects had me needing to generate bar codes for use in a rebate application. The bar code would be the unique rebate ID, used by the mail room scanner to streamline and accelerate the data entry and processing. There are other libraries out there, but since I was already familiar with iTextSharp and knew that it included bar code libraries, I decided to try it out. It was so easy it was nearly ridiculous.

I decided to implement it as an HttpHandler, so that it could be accessible by different applications (including my own). In addition to the bar code, the calling application would also require being passed a unique ID along with some identifying information, which would give minimal security to the page.

It went something like this:

Page.aspx?id=123456&z=12345

Where the 2 parameters would form a unique key that would allow the user to lookup information and get the desired bar code. Inside Page.aspx, I have it calling something like this:

Here is the code for BarCode.ashx:

        public void ProcessRequest(HttpContext context)
        {
            string _barCodeId;

            if (context.Request.QueryString["id"] != null)
            {
                _barCodeId = context.Request.QueryString["id"].ToString();
            }
            else
            {
                throw new ArgumentException("No Bar Code ID specified");
            }

            context.Response.ContentType = "image/jpeg";

            System.IO.MemoryStream strm = new System.IO.MemoryStream();
            iTextSharp.text.Document doc = new iTextSharp.text.Document(iTextSharp.text.PageSize.A4, 50, 50, 50, 50);
            iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(doc, strm);
            doc.Open();

            iTextSharp.text.pdf.PdfContentByte cb = writer.DirectContent;
            iTextSharp.text.pdf.Barcode128 code128 = new iTextSharp.text.pdf.Barcode128();
            code128.Code = _barCodeId;
            code128.StartStopText = true;
            code128.GenerateChecksum = false;
            code128.Extended = true;

            code128.CreateDrawingImage(System.Drawing.Color.Black, System.Drawing.Color.White).Save(context.Response.OutputStream, System.Drawing.Imaging.ImageFormat.Jpeg);
        }

Running ReportViewer Control on Windows 2000

April 30, 2008

I’ve been working on an interesting project where we had reporting considerations. I hadn’t been very involved with it. There were some folks creating the reports, some others telling those folks what to do, and still others trying to figure out how to deploy the thing. Needless to say, there were issues with it. So I was asked to come in and see what I could do.

Microsoft has a handy ReportViewer control that really is quite cool. I came from a background where we used Actuate and the report delivery mechanism was a little clunky (probably mostly due to our implementation) and not really integrated at all. So to have this control that basically puts the reports in an IFRAME is kind of cool. It looks integrated at least.

Our environment is a little messed up. I develop on my own W2K3 server workstation and deploy to another W2K3 server. I knew that I had to impersonate a domain user in order to access the required reports. So I followed the instructions I learned here. It all worked fine, until I deployed the solution to an old W2K server, which is our public test machine (don’t ask). I could not for the life of me get it to work.

After many hours trying this, that, and the other, I finally figured that I *must* be doing something wrong in the impersonation. That’s the only thing that makes sense. So then I came across this. And the solution was so obvious that I felt a little sheepish in announcing my find to others. I don’t know enough about the internals to figure out why it’s happening, but I do know that I need to grant the “Act as part of the operating system” privilege to the ASPNET user when running on Windows 2000. Hopefully that scenario won’t present itself again.

Strangeness in GridView’s HyperLinkField

April 10, 2008

Today I got an email from a client who I do some work for. He has an existing application that uses a GridView, probably in much the same way that many others do. There are a few BoundFields and a HyperLinkField. The HyperLinkField looked a little like this:

Pretty standard, right? We see this all the time. And it worked all the time, until we had a “Buyer” who happened to have a colon (:) in her username. It wouldn’t work. There was no error displayed on the page. But there was also no hint of a hyperlink.

I found an article that mentioned this behavior. Apparently it’s to keep your app more secure, so there isn’t malicious stuff thrown in there. In other words, it’s “by design.” So this is what I came up with in its place (the code thing isn’t parsing this properly, no matter what hackery I try):

`>

Another interesting thing is that you have to either remove the EnableSortingAndPagingCallbacks or set it to false. It makes some sense, I suppose, though I won’t pretend to know the exact reason why.

Calling Oracle Packages from C#

April 7, 2008

I’ve been working on a number of projects that required calling into an Oracle database to get various bits of data. Most of the information was related to product or customer in an ecommerce and/or ERP system. For the longest time, we’ve been relying on an old Java listener that just sits there and sends and receives data over sockets. No one is sure how it works, nor are they sure if they even have the latest source code. Everyone is scared that if we pull the latest from VSS and compile it that we’ll have a non-functioning system. They’re probably also wondering if it will even compile.

So here we are in the last few weeks of 2007 and we’re just starting to get going with .NET web services and .NET 2.0. Maybe we’ll even get rid of our old VB6 ecommerce platform (that’s a whole different story). So I found myself needing to create these web services that would call into Oracle and get what I needed. I searched a few places and didn’t find exactly what I needed, so I started with a little trial and error. My packages included CLOBs, so it was a little different. Here’s what I came up with. ProcedureInfo is just an object that holds information about the procedure that I’m calling and my connection to it.

        public static string SendDBRequest(ProcedureInfo pi)
        {
            OracleConnection con = GetPooledConnection(connectionString);
            OracleCommand cmd = con.CreateCommand();

            string clobString = pi.InMessage;

            // Unicode is a static member of Encoding; going through ASCIIEncoding only made it look like ASCII
            byte[] clobContents = System.Text.Encoding.Unicode.GetBytes(clobString);

            // Get our temporary clob
            OracleClob tempClob = new OracleClob(con);
            tempClob.Write(clobContents, 0, clobContents.Length);

            // Clear the parameters from the command object
            cmd.Parameters.Clear();

            // Set the name of the procedure
            cmd.CommandText = pi.Procedure;
            cmd.CommandType = CommandType.StoredProcedure;

            cmd.Parameters.Add("result_", OracleDbType.Varchar2, 8000);
            cmd.Parameters[0].Direction = ParameterDirection.Output;
            cmd.Parameters[0].Value = string.Empty;

            cmd.Parameters.Add("result_text_", OracleDbType.Varchar2, 8000);
            cmd.Parameters[1].Direction = ParameterDirection.Output;
            cmd.Parameters[1].Value = string.Empty;

            cmd.Parameters.Add("out_message_", OracleDbType.Clob);
            cmd.Parameters[2].Direction = ParameterDirection.Output;

            // In params
            cmd.Parameters.Add("in_message_", OracleDbType.Clob);
            cmd.Parameters["in_message_"].Value = tempClob;
            cmd.Parameters["in_message_"].Direction = ParameterDirection.Input;

            cmd.Parameters.Add("command_", OracleDbType.Varchar2);
            cmd.Parameters["command_"].Value = pi.Command;
            cmd.Parameters["command_"].Direction = ParameterDirection.Input;

            cmd.ExecuteNonQuery();

            string detailText = cmd.Parameters["result_text_"].Value.ToString();

            // Get the response from the clob
            OracleClob rtrnClob;
            string clobMsg = string.Empty;

            try
            {
                rtrnClob = (OracleClob)cmd.Parameters["out_message_"].Value;
                clobMsg = rtrnClob.Value.ToString();
            }
            catch (Exception exc)
            {
                clobMsg = exc.Message;
            }

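            cmd.Dispose();
            con.Dispose();

            // Assumption: the caller wants the CLOB response (or the error message if reading it failed)
            return clobMsg;
        }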