Saturday, January 24, 2009

SSIS Screen Scraper

I wanted to save one of my co-workers a bunch of monkey work, so I wrote this screen scraper to get the HTML he needed.

Please use this sample at your own risk. It worked on my box, and that is all I know. I stole, err, borrowed heavily from others. The three main victims are:





To run the sample you will need two tables in a SQL Server database. I used SQL Server 2008, and my database is named SandBox.

The scripts to create the tables are in the SQL folder along with one to insert data into the SourceURLs table.
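For readers who want to see the general idea outside of SSIS, here is a rough Python sketch of the same flow: read URLs from a source table, fetch each page, and write an HTML snippet to a destination table. It is not the package's actual logic. The destination table name HtmlSnippets, the column names Url and Snippet, the marker-based extraction, and the use of sqlite3 in place of SQL Server are all assumptions made for illustration; only the SourceURLs table name comes from the sample.

```python
import sqlite3
import urllib.request


def fetch_html(url: str) -> str:
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_snippet(html: str, start_marker: str, end_marker: str) -> str:
    """Return the text between two markers, or "" if either marker is missing."""
    start = html.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return ""
    return html[start:end]


def scrape(conn, fetch=fetch_html, start_marker="<title>", end_marker="</title>") -> int:
    """Fetch every URL in SourceURLs and store one snippet per URL.

    HtmlSnippets, Url, and Snippet are hypothetical names, not the
    schema shipped with the sample. Returns the number of URLs processed.
    """
    rows = conn.execute("SELECT Url FROM SourceURLs").fetchall()
    for (url,) in rows:
        snippet = extract_snippet(fetch(url), start_marker, end_marker)
        conn.execute(
            "INSERT INTO HtmlSnippets (Url, Snippet) VALUES (?, ?)",
            (url, snippet),
        )
    conn.commit()
    return len(rows)
```

Passing the fetch function as a parameter makes it possible to try the loop against canned HTML before pointing it at real pages.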

The SSIS package and SQL can be downloaded from my SkyDrive.

Have fun.

5 comments:

Anonymous said...

Hi Paul. I've opened your zip file, created the SandBox database (2008), and run the scripts to create the tables and perform the insert. I confirmed that the tables and data exist and made sure the SSIS connection tests OK.

Now I have hit a brick wall and am not sure where to look within the package - nor do I understand the errors!

When I run it I get these 2 errors.

[SS_DST_HTML_SnippetTable [17]] Error: Unable to prepare the SSIS bulk insert for data insertion.

and

[SSIS.Pipeline] Error: component "SS_DST_HTML_SnippetTable" (17) failed the pre-execute phase and returned error code 0xC0202071.

I'm new to SSIS coming from a DTS (as a DBA) background and wanted to do my own screen scraping. Any idea of what the errors mean and how I can get past them?

Paul S. Waters said...

Hello,
Thank you for your comment. To my surprise I was able to replicate your exact errors. Fortunately, it is easy to resolve. Run Visual Studio or BIDS as Administrator and then open the project.

You should now be able to run the package.

I first created this on Windows XP and now have Windows 7.

Paul

Anonymous said...

Hi Paul, this is Ram. I am running Visual Studio as Administrator on the server, but I still get the "Unable to prepare the SSIS bulk insert for data insertion" error. The package was developed on XP, and I am now trying to run it on Windows 7.

TomH said...

This comment has been removed by a blog administrator.

Paul S. Waters said...

TomH,

I have fixed the issue with the link. Thanks for letting me know.