Using WebClient with Basic Authentication and Forms Authentication

I had the chance to investigate how we could automate downloads from a couple of websites. The current process is excruciatingly manual, and ripe for errors (as all manual processes are).

I first went to check out the websites to see what we were dealing with here. Is there an API that could be used to pull the files down? Is there some sort of service that they could provide to push the files to us? No. And no, of course.

So no API. No clean way to do it. I’ll just have to log in programmatically and download what I need. Of course, the two sites I was accessing had completely different implementations.

The first one was pretty easy. It just uses basic authentication and then allows you to proceed. Why a public-facing web application uses basic authentication in 2015 I don’t know, but I guess that’s another conversation.

Here’s how I implemented it. I also needed to download the file itself by sending a POST to a particular URL, and to save it to a specific location, so that’s included as well.

            Uri uri = new Uri(_authUrl);

            var credentialCache = new CredentialCache();
            credentialCache.Add(
              new Uri(uri.GetLeftPart(UriPartial.Authority)), // request url's host
              "Basic",  // authentication type. hopefully they don't change it.
              new NetworkCredential(_uname, _pword) // credentials 
            );

            using (WebClient client = new WebClient())
            {
                client.UseDefaultCredentials = true;
                client.Credentials = credentialCache;

                System.Collections.Specialized.NameValueCollection formParams = new System.Collections.Specialized.NameValueCollection();

                // This is the stuff that the form on the page expects to see. Pulled from the HTML source and javascript function.
                formParams.Add("param1", "value1");
                formParams.Add("param2", "value2");
                formParams.Add("param3", "value3");
                formParams.Add("filename", _downloadFileName);

                byte[] responsebytes = client.UploadValues(_urlForDownload, "POST", formParams);

                // Write the file to the configured download location, creating the directory first if it doesn't exist.
                if (!Directory.Exists(_fileDownloadLocation))
                    Directory.CreateDirectory(_fileDownloadLocation);

                File.WriteAllBytes(string.Format(@"{0}\{1}", _fileDownloadLocation, _downloadFileName), responsebytes);
            }

The other website used Forms Authentication in its implementation. While this was a welcome difference (since, again, it’s 2015), it did make things a little bit more difficult.

I couldn’t just use the plain WebClient again because it doesn’t handle cookies on its own. And most applications on the internet use sessions, cookies, and other such hackery to keep track of you and make sure that you’re really logged in and are who you say you are.

I found an implementation of what seems to be called a “cookie-aware WebClient.” I don’t recall which site I got it from, but many implement it in a very similar way. Here is the code for a class called WebClientEx. It simply extends WebClient:

    public class WebClientEx : WebClient
    {
        public WebClientEx(CookieContainer container)
        {
            this.container = container;
        }

        private readonly CookieContainer container = new CookieContainer();

        // Attach the shared cookie container to every outgoing request.
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest r = base.GetWebRequest(address);
            var request = r as HttpWebRequest;
            if (request != null)
            {
                request.CookieContainer = container;
            }
            return r;
        }

        protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
        {
            WebResponse response = base.GetWebResponse(request, result);
            ReadCookies(response);
            return response;
        }

        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse response = base.GetWebResponse(request);
            ReadCookies(response);
            return response;
        }

        // Capture any cookies returned on the response so they ride along on subsequent requests.
        private void ReadCookies(WebResponse r)
        {
            var response = r as HttpWebResponse;
            if (response != null)
            {
                CookieCollection cookies = response.Cookies;
                container.Add(cookies);
            }
        }
    }

And its usage for me is as follows:

            CookieContainer cookieJar = new CookieContainer();

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(_urlForLoginPage);
            req.CookieContainer = cookieJar;
            req.Method = "GET";
            Uri uri;

            // First send a request to the login page so that we can get the URL that we will be redirected to, which contains the proper
            // querystring info we'll need.
            using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
            {
                uri = response.ResponseUri;
            }

            // The .NET WebClient will not persist cookies by default. The WebClientEx class above does what we need here.
            using (WebClientEx ex = new WebClientEx(cookieJar))
            {
                var postData = string.Format("USER={0}&PASSWORD={1}&target={2}", _uname, _pword, _urlForDownload);
                var resp = ex.UploadString(uri, postData);

                // Note that useUnsafeHeaderParsing is set to true in app.config. The response from this URL is not well-formed, so was throwing
                // an exception when parsed by the "strict" default method.
                ex.DownloadFile(_wirelessToWireline, string.Format(@"{0}\FILE1-{1}.TXT", _fileDownloadLocation, DateTime.Now.ToString("yyyyMMdd")));
                ex.DownloadFile(_wirelineToWireless, string.Format(@"{0}\FILE2-{1}.TXT", _fileDownloadLocation, DateTime.Now.ToString("yyyyMMdd")));
            }

You’ll often hear of people struggling with the 401 response that is sent back. It’s basically the server issuing its challenge for credentials. In my case, I needed to send the request and get the information that was appended to the querystring anyway, so it was handy. I then posted the data to the form that the application would be expecting, and downloaded my file.

Also note that the server I was downloading the information from sent back the response in a way that the .NET Framework didn’t like by default, so I had to set useUnsafeHeaderParsing to true. This was an acceptable risk for me; make sure that you understand what it means before doing the same.
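For reference, the setting I’m talking about lives in the configuration file under system.net. A minimal app.config would look something like this (the element names come from the standard .NET configuration schema; adjust to fit your existing config):

    <configuration>
      <system.net>
        <settings>
          <!-- Relaxes response header validation. Only do this if you understand the implications. -->
          <httpWebRequest useUnsafeHeaderParsing="true" />
        </settings>
      </system.net>
    </configuration>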

This took longer than I care to admit to implement, but once I found and understood the “cookie-aware” concept, it worked out pretty well.

SharePoint Throwing 503, Cause is Managed Account Failure

When I had a few dozen emails this morning about SharePoint being unavailable for some users, I knew we had a real problem. Generally I can just chalk it up to users not understanding what they are doing, or locking themselves or others out due to configuration changes that they make on their own sites. But this morning seemed legit.

Our production farm has two web front ends. Both were “up,” but I found that one of them (let’s say WEB01) was throwing lots of “503 Service Unavailable” errors in the logs. I looked at the server and noticed that the main application pool was stopped. So I started it up and went off to look for more information in the logs. I found this:

Application pool SharePoint – 80 has been disabled. Windows Process Activation Service (WAS) encountered a failure when it started a worker process to serve the application pool.

Of course this meant that the application pool would stop again almost immediately after being started. A bit of research suggested that the identity of the application pool was somehow compromised or not configured appropriately. I am using a SharePoint managed account for the identity of this application pool, and I noticed that our password policy had changed the password this morning. Our policy has the passwords changing every month on the 7th, between 2 and 3 AM.

To make a long story short — somehow the process either did not complete or did not execute on WEB01, and it had invalid credentials for the application pool’s identity. I found that you can force a password reset for the managed accounts inside of SharePoint Central Administration. So I forced that reset, had SharePoint generate its own password, and everything is now fine.

Have others experienced the issue of the passwords not being configured appropriately during one of these changes? Is this a bug in SharePoint 2010?

Migrate MOSS 3.0 Using SQL Server Embedded Edition

We are in the midst of a fairly large migration from SharePoint/MOSS 2007/3.0 to SharePoint 2010. We have several content databases on an existing MOSS 2007 farm that have already been migrated, fairly successfully so far. I’ll post more on that soon.

In addition to this farm, there was one rogue server running SharePoint for a contact center. This instance was running completely outside the scope of any corporate IT support structure and needed to be migrated so that it could be supported appropriately. I found that it was running SharePoint V12.0.0.6545. As I went to back up the database, I found the connection string (SERVERNAME\Microsoft##SSEE) inside of Central Administration and attempted to connect. To my surprise, I was unable to.

It turns out that this instance is actually running SQL Server Embedded Edition, which means that I needed to use this as the server name when connecting in SSMS: \\.\pipe\mssql$microsoft##ssee\sql\query. Apparently this is frowned upon, unsupported, etc., but I needed to get into the database to see what was there.

Now to do the actual migration/upgrade to the 2010 farm. More on that later.

Directory.GetFiles vs. Directory.EnumerateFiles

Where I work, we have fairly large archives of files due to the large volume of messages received from various clients. Most of these messages are received through a VLTrader or ASP(M)X front-end. Eventually they are archived onto the file system according to some pre-determined process. The teams supporting these archives had grown concerned about the ever-increasing amount of storage required for these files. There are thousands of directories (nested many levels deep) and hundreds of thousands of files, potentially millions.

I was asked to help come up with a solution for this problem. The app needed to be configurable at run time, taking a root directory and the number of days back to check each file’s date against. For example, they needed to be able to say that all files older than 90 days should be deleted.

My initial reaction was to use the excellent (and very convenient) System.IO.Directory.GetFiles and System.IO.Directory.GetDirectories methods to simply get an array of the files and directories I would need to enumerate in order to accomplish the task. So I wrote a quick app, utilizing these methods, and saw the IOPS go crazy for a while, then do nothing, then go crazy again. All the while, not much was being accomplished. The issue, as anyone who has tried to “browse” the file system using Windows Explorer may tell you, is that getting the properties of the entire tree, including the number/size of directories and number/size of files, is quite an expensive process.

After doing a bit more research, I came upon the Directory.EnumerateFiles method, which (you guessed it) returns an enumerable collection of file names in a specified path, as opposed to Directory.GetFiles, which returns an array of file names in a specified path. The difference when checking a path with thousands of directories and hundreds of thousands of files is huge. In fact, you don’t even have to have that many files/directories to see a dramatic difference. This is only available in .NET 4.0 and above. I have seen others suggest ways of doing something similar with the Win32 API, but it was much easier for me to make sure I had .NET 4.0 available than it was to try and implement something using the Win32 API.

Usage is simply:

foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
    ShouldDeleteFile(file);

When using these methods, be sure that proper permissions are available on the entire tree; otherwise you may get an exception. See this post at Stack Overflow for more information. Speaking of permissions: part of my requirement was to delete all files more than 90 days old and all directories that were empty. To avoid any potential conflicts with permissions and/or file attributes, the application will run as an administrator and

File.SetAttributes(filePath, FileAttributes.Normal);

is called for each file. I’m not sure of the performance penalty this may incur; I’ll have to research and see what the hit would be.
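To tie it all together, here is a minimal sketch of the cleanup, assuming the root directory and age threshold come in as command-line arguments; the deletion rules are simplified compared to what actually ran, so treat it as an illustration of EnumerateFiles rather than the finished tool:

    using System;
    using System.IO;
    using System.Linq;

    class ArchiveCleanup
    {
        static void Main(string[] args)
        {
            // Assumed arguments: root directory and maximum file age in days.
            string rootDirectory = args[0];
            int maxAgeDays = int.Parse(args[1]);
            DateTime cutoff = DateTime.Now.AddDays(-maxAgeDays);

            // EnumerateFiles streams results as it walks the tree instead of
            // building the entire array up front like GetFiles does.
            foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
            {
                if (File.GetLastWriteTime(file) < cutoff)
                {
                    File.SetAttributes(file, FileAttributes.Normal); // clear read-only, etc.
                    File.Delete(file);
                }
            }

            // Remove directories that are now empty, deepest paths first so that
            // parents emptied by this pass get caught as well.
            var directories = Directory.EnumerateDirectories(rootDirectory, "*", SearchOption.AllDirectories)
                                       .OrderByDescending(d => d.Length);
            foreach (string dir in directories)
            {
                if (!Directory.EnumerateFileSystemEntries(dir).Any())
                    Directory.Delete(dir);
            }
        }
    }

Whether to key off of LastWriteTime, CreationTime, or something else entirely depends on how the archive process stamps the files.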

BIN Deploy MVC 3 Project on IIS 6

Tonight I was doing a test deployment of a small MVC project into our corporate environment. We run IIS 6 on most of the servers, aside from the newer SharePoint 2010 stuff. I didn’t want to have to install the MVC MSI onto the server, so I thought I’d just deploy what I needed in the bin directory. It was surprisingly frustrating, especially after all the talk I kept hearing about how it should “just work.”

Here are the DLLs I needed to include in the bin folder (copy local = true):

  • Microsoft.Web.Infrastructure
  • System.Web.Entity
  • System.Web.Extensions
  • System.Web.Helpers
  • System.Web.Mvc
  • System.Web.Razor
  • System.Web.Routing
  • System.Web.WebPages
  • System.Web.WebPages.Deployment
  • System.Web.WebPages.Razor

This post from Phil Haack was also quite helpful in configuring IIS appropriately.

Get host name from IP

My networking knowledge is fairly limited. I know more than the average person, I’m sure, but when I start talking to people who are true network engineers, I realize very quickly that my knowledge is pretty cursory.

I had a need recently to get the host name of a Windows server when all I had available was its IP address. I was told about nbtstat, which allows you to request NetBIOS information for an IP address. I’m sure it’s useful for much more than that, but this was my immediate need.

Its usage is as follows: nbtstat -A ipaddress (note the capital A; the lowercase -a switch expects a machine name rather than an IP address).

Parallel.ForEach

I’ve spent some time off and on over the last year or so writing various versions of web crawlers to get different information off of the web. Some of it was for a potential business idea, some of it was just to learn a few things. One thing I had a hard time figuring out was how to deal with threading. I had a list of URLs that I wanted to crawl, but there were specific things I wanted to do with each one, and there were various counters I was incrementing. Plus, threading and I don’t jibe that well, I’ve found. Maybe I’m just not smart enough for it, who knows.

As I was doing my research/learning/reading about C# in general, I ran across the excellent Parallel Processing blog from MSDN. I was fascinated by the Microsoft Biology Foundation and how they were using the parallelism support in .NET 4. The blog is a good read in general. Those guys are a bit too smart for me to keep up with, but it’s fascinating nonetheless.

I’ll let the smart guys at that blog explain it better than I can, but in short, Parallel.ForEach spreads the work across multiple threads when you have additional CPU cores available. It’s important to note that you will not gain much from this technique if some outside resource is what is slowing down your processing. But in my case, I am going out to a website and pulling information from different pages, and Parallel.ForEach allowed me to do this much faster than a regular foreach loop. Good stuff.
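As a rough illustration of the pattern (the URL list, the counter, and what happens to each page are placeholders rather than my actual crawler), it ends up looking something like this:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Net;
    using System.Threading;
    using System.Threading.Tasks;

    class CrawlerSketch
    {
        static void Main()
        {
            // Placeholder URLs; the real list came from elsewhere.
            var urls = new List<string>
            {
                "http://example.com/page1",
                "http://example.com/page2",
                "http://example.com/page3"
            };

            var pages = new ConcurrentBag<string>();
            int pagesProcessed = 0;

            // Parallel.ForEach partitions the list across worker threads for us.
            Parallel.ForEach(urls, url =>
            {
                using (var client = new WebClient())
                {
                    pages.Add(client.DownloadString(url));
                }

                // Shared counters need to be updated atomically across threads.
                Interlocked.Increment(ref pagesProcessed);
            });

            Console.WriteLine("Processed {0} pages.", pagesProcessed);
        }
    }

The thread-safe ConcurrentBag and Interlocked.Increment are what take care of the counter problem I mentioned above; plain lists and ints are not safe to update from multiple threads at once.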

SQL Server Date Functions

Here are some handy date functions that I find myself looking up occasionally (especially the “last day of”-type things):

----Today
SELECT GETDATE() 'Today'
----Yesterday
SELECT DATEADD(d,-1,GETDATE()) 'Yesterday'
----First Day of Current Week
SELECT DATEADD(wk,DATEDIFF(wk,0,GETDATE()),0) 'First Day of Current Week'
----Last Day of Current Week
SELECT DATEADD(wk,DATEDIFF(wk,0,GETDATE()),6) 'Last Day of Current Week'
----First Day of Last Week
SELECT DATEADD(wk,DATEDIFF(wk,7,GETDATE()),0) 'First Day of Last Week'
----Last Day of Last Week
SELECT DATEADD(wk,DATEDIFF(wk,7,GETDATE()),6) 'Last Day of Last Week'
----First Day of Current Month
SELECT DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0) 'First Day of Current Month'
----Last Day of Current Month
SELECT DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,GETDATE())+1,0))) 'Last Day of Current Month'
----First Day of Last Month
SELECT DATEADD(mm,-1,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)) 'First Day of Last Month'
----Last Day of Last Month
SELECT DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0))) 'Last Day of Last Month'
----First Day of Current Year
SELECT DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0) 'First Day of Current Year'
----Last Day of Current Year
SELECT DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,GETDATE())+1,0))) 'Last Day of Current Year'
----First Day of Last Year
SELECT DATEADD(yy,-1,DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0)) 'First Day of Last Year'
----Last Day of Last Year
SELECT DATEADD(ms,-3,DATEADD(yy,0,DATEADD(yy,DATEDIFF(yy,0,GETDATE()),0))) 'Last Day of Last Year'


I originally found them on the excellent SQL Authority blog.

HCPCS 2011 ICD9 Codes

There’s been a bit of activity on the OpenEMR lists lately about the ability to import the ICD9 codes into the application. Apparently there are some Perl scripts which go out to a particular website, extract the data, and pull it down for the application to use. I’ve been wanting an excuse to try the Parallel.ForEach functionality in .NET 4.0 and see how it works with threading, and this provided a perfect opportunity to write a quick program which would go out, parse the site, and pull the data down. In addition to the Parallel functions, I also used the excellent HtmlAgilityPack to parse the HTML.
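Here is the general shape of the program. To be clear, the URLs, the XPath expressions, and the output path below are placeholders, since those details depend entirely on the layout of the site being scraped:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    class Icd9Scraper
    {
        static void Main()
        {
            // Placeholder page list; the real one was built from the site's index pages.
            var pageUrls = new List<string>
            {
                "http://example.com/hcpcs/a.html",
                "http://example.com/hcpcs/b.html"
            };

            var rows = new ConcurrentBag<string>();

            Parallel.ForEach(pageUrls, url =>
            {
                HtmlDocument doc = new HtmlWeb().Load(url);

                // Placeholder XPath; the real expression came from inspecting the site's markup.
                var tableRows = doc.DocumentNode.SelectNodes("//table//tr");
                if (tableRows == null) return;

                foreach (var row in tableRows)
                {
                    var cells = row.SelectNodes("td");
                    if (cells == null || cells.Count < 2) continue;

                    string code = cells[0].InnerText.Trim();
                    string description = cells[1].InnerText.Trim();

                    // Code "type", ICD9 code, and description, tab-delimited.
                    rows.Add(string.Format("HCPCS\t{0}\t{1}", code, description));
                }
            });

            File.WriteAllLines(@"C:\temp\hcpcs-2011.txt", rows.ToArray());
        }
    }

Since Parallel.ForEach parses the pages on multiple threads, the results go into a ConcurrentBag and get written out in one shot at the end.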

I’m not exactly sure about where the data ends up yet (I’m not as familiar with the OpenEMR data model as I should be), so all I have for now is a tab-delimited text file which simply contains the code “type” (all HCPCS in this case), ICD9 code, and its description. I’ll have to poke through the OpenEMR code and database in the coming days and see what is done with the data. Perhaps then I can create a SQL file that someone can then load in phpMyAdmin inside of OpenEMR.

The file is located here: hcpcs 2011 ICD9

Drupal and CiviCRM

I have come across a project that I think might be a great fit for Drupal and CiviCRM. These are two excellent open source projects that I’ve been wanting to use for some time. Drupal has always intimidated me somehow, with all of its fancy taxonomy and nodes and whatnot, but I think this project will let me figure out whether I’m up to the task or not.

The install was a bit tricky, at least the CiviCRM part was. Drupal was installed with Fantastico (which is certainly fantastico), so that was a piece of cake. I kept reading about putting the CiviCRM files into /sites/all/modules, but that path didn’t exist in my Drupal installation. It seemed logical that I would put the directory into the /modules directory, especially after I saw the CiviCRM stuff show up in the modules section after I put it there. But I could not get it right.

I ended up finding a post on the excellent drupal.org which instructed me to simply create the missing directory, load the files, and run the install. After that it just worked.

I hope to chronicle the experience as I go. The hope is that I can build and configure something which is sort of a mini-EMR with great reporting already baked in. We’ll see if this thought makes sense.

Off to learn!