
Performance for high volume writes

PJones (Newbie)
Joined: 10-Oct-2007
Location: United States
Posts: 10
Posted: 10-Oct-2007 at 5:05am
I'm a developer working on evaluating this product for our project.
 
We need high-performance bulk writes of output data from a calculation module.  The output could easily top 10GB for a large run.  We previously developed a WCF service that performed, we think, acceptably well: from a console application that generated rows in bulk, it wrote rows of 105 fields (100 floating point values plus some ints and a date) into a SQL Server 2005 database table.
 
This test application was able to write 1,000,000 rows of the above type in about 2min 10sec.  Using the IdeaBlade software and a set of 10,000 "row" objects, the write times were between 1min 15sec and 2min 10sec, as opposed to under 3sec for the same number of rows through the WCF service.
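To put the timings above on a common scale, here is a minimal sketch that converts a row count and an elapsed time into rows per second. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

public static class Throughput
{
    // Rows written per second for a run of `rows` rows that took `seconds` seconds.
    public static double RowsPerSecond(long rows, double seconds)
    {
        if (seconds <= 0)
            throw new ArgumentOutOfRangeException("seconds", "Elapsed time must be positive.");
        return rows / seconds;
    }
}
```

With the figures quoted in this post, 1,000,000 rows in 130 sec is about 7,692 rows/sec, while 10,000 rows in 75 to 130 sec is about 77 to 133 rows/sec. The WCF figure of 10,000 rows in under 3 sec is at least 3,333 rows/sec.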
 
Is there a way to get higher performance on the writes?
 
I also noticed significant overhead when instantiating new objects; it is a few orders of magnitude slower than normal C#. Are there any settings or configuration that might speed things up?
 
Kind regards,
 
Pete Jones

PJones (Newbie)
Posted: 10-Oct-2007 at 5:21am

I just tried setting my PersistenceManager.VerifierEngine.Enabled property to false (a suggestion from another thread), and the timing test actually shows worse results than with it enabled:

Start: 10/10/2007 8:16:40 AM
Stop: 10/10/2007 8:19:02 AM
Elapsed time: 00:02:22.4089842

Any suggestions would be much appreciated.
 
 

IdeaBlade (Moderator Group)
Joined: 30-May-2007
Location: United States
Posts: 346
Posted: 10-Oct-2007 at 7:55pm

You have run into the limit of what object oriented persistence (OOP) can handle effectively.

I'm not alone in thinking so. Rocky Lhotka makes the same point (I'm looking for the references but they've escaped me for the nonce). There is a good discussion of when to use OOP and when not to use it in Martin Fowler's Patterns of Enterprise Application Architecture.

We (IdeaBlade and Rocky) love OOP when we're making active use of rich behavior associated with our data. At some point, however, we are asking OOP to do what it was never intended to do - efficient processing of large volumes of data. We're talking an extremely low ratio of logic to data.

Fortunately, DevForce has the hooks to let you step outside of the OOP paradigm when you have to do so.

==

You mentioned that you already have a (WCF) service that works well. You might want to use this as is. We can talk about how later. In the next section, I describe another approach.

==

We have a customer who had to insert 350,000 records in a single shot. First try with DF normal save took 20 - 25 minutes. Next try with raw ADO.NET took about the same ("a little faster, not much"). Thus there is no faster standard way to insert a ton of records. It's not a DF issue. The .NET standard way to insert records is one record at a time and SQL Server can only eat the records at a certain pace.

Then he went with BulkCopy (SQL Server only) - and the insert took 10 seconds!

Please note that he was using a DevForce app and DevForce mechanisms to prepare for insert. He just side-stepped DF save at the last moment by isolating the in-memory table with the huge number of inserts and using BulkCopy to jam the records into the database.

The code is approximately four lines and looks like this:

DataTable aTable = ... // a DataTable with the records to insert
System.Data.SqlClient.SqlBulkCopy bulkCopy =
    new System.Data.SqlClient.SqlBulkCopy(connectionString);
bulkCopy.DestinationTableName = "high_volume_table";
bulkCopy.WriteToServer(aTable);
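
The snippet above leaves the construction of the DataTable open. As a rough sketch only, a table shaped like the rows described earlier in the thread (105 fields: 100 floating point values plus some ints and a date) could be built as follows. The column names, the split into 4 int columns and 1 date column, and the filler values are assumptions made for illustration; the thread does not give the real schema.

```csharp
using System;
using System.Data;

public static class OutputTableBuilder
{
    public const int FloatColumnCount = 100;
    public const int IntColumnCount = 4; // assumption: 105 fields - 100 floats - 1 date

    // Creates an empty table with 1 date, 4 int and 100 double columns (hypothetical names).
    public static DataTable CreateEmptyTable()
    {
        DataTable table = new DataTable("high_volume_table");
        table.Columns.Add("RunDate", typeof(DateTime));
        for (int i = 0; i < IntColumnCount; i++)
            table.Columns.Add("Key" + i, typeof(int));
        for (int i = 0; i < FloatColumnCount; i++)
            table.Columns.Add("Value" + i, typeof(double));
        return table;
    }

    // Fills the table with placeholder data standing in for calculation output.
    public static DataTable CreateFilledTable(int rowCount)
    {
        DataTable table = CreateEmptyTable();
        table.BeginLoadData(); // turns off notifications, index maintenance and constraints while loading
        for (int r = 0; r < rowCount; r++)
        {
            object[] values = new object[table.Columns.Count];
            values[0] = DateTime.UtcNow;
            for (int i = 0; i < IntColumnCount; i++)
                values[1 + i] = r;
            for (int i = 0; i < FloatColumnCount; i++)
                values[1 + IntColumnCount + i] = r * 0.5 + i;
            table.Rows.Add(values);
        }
        table.EndLoadData();
        return table;
    }
}
```

A table built this way can be passed to WriteToServer as in the snippet above, provided its columns line up with the destination table (or column mappings are supplied).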

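Note that, by default, BulkCopy bypasses triggers and constraint checks (including referential integrity), and the copy does not run inside a transaction; SqlBulkCopyOptions can switch some of these back on. That's the price for speed ... and it probably makes sense for data sources (e.g., calculation methods) that generate a ton of data.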
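
If some of those safeguards are needed after all, SqlBulkCopy accepts a SqlBulkCopyOptions value in its constructor. The following is a hedged sketch, not the customer's actual code: the BulkWriter class, its method names and its parameters are hypothetical, and re-enabling checks can be expected to cost some of the speed. On .NET Framework, System.Data.SqlClient is built in; on newer .NET versions it comes from a NuGet package.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkWriter
{
    // Combines the opt-in safeguards into a single SqlBulkCopyOptions value.
    public static SqlBulkCopyOptions BuildOptions(
        bool checkConstraints, bool fireTriggers, bool useInternalTransaction)
    {
        SqlBulkCopyOptions options = SqlBulkCopyOptions.Default;
        if (checkConstraints) options |= SqlBulkCopyOptions.CheckConstraints;
        if (fireTriggers) options |= SqlBulkCopyOptions.FireTriggers;
        // With this flag, each batch is written inside its own transaction.
        if (useInternalTransaction) options |= SqlBulkCopyOptions.UseInternalTransaction;
        return options;
    }

    // Writes all rows of the DataTable to the destination table in batches.
    public static void Write(string connectionString, string destinationTable,
                             DataTable rows, SqlBulkCopyOptions options, int batchSize)
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString, options))
        {
            bulkCopy.DestinationTableName = destinationTable;
            bulkCopy.BatchSize = batchSize; // 0 sends all rows as a single batch
            bulkCopy.WriteToServer(rows);
        }
    }
}
```

For example, BulkWriter.Write(connectionString, "high_volume_table", aTable, BulkWriter.BuildOptions(true, false, true), 5000) would check constraints and commit each batch of 5,000 rows separately; the batch size here is an arbitrary illustration, not a tuned value.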

I suppose you could always re-read and verify that the data made it but that's probably over the top. No reason for triggers. No worry about transactions (since all data are new). Just have to be careful about referential integrity.

Yes it's a hack ... but it does the job and you don't have to stray far from DevForce to do it.

===

I should mention that you probably want to GENERATE this data close to the data tier. Performance will be terrible if you create millions of objects on a remote client and ship them over the wire. You want to create them on the host side if at all possible. The DevForce RPC mechanism helps you initiate and control that server-side process from a remote client.

===

When there is virtually no human interaction - no UI to worry about - and the objective is to crunch, crunch, crunch as fast as you can, OOP may not be an acceptable approach. This different scenario is more suited to traditional procedural methodology - what Fowler calls "Transaction Script".

I trust that your application does more than just crank out tons of new records. Surely you're using other kinds of less numerous business objects, objects that will hang around for a while, objects with complex logic, objects you display to users, objects that change at the pace of human/computer interaction. Here is where OOP comes into its own; here OOP saves time and reduces complexity.

PJones (Newbie)
Posted: 11-Oct-2007 at 4:56am

Moderator,

The approach you mention, SQL Bulk Copy using a DataTable, is precisely how our WCF service does its work.  There are some additional enhancements that make it faster still, but the bulk copy is the heart of the operation.
 
I don't understand your "hack" and how it fits into an IdeaBlade implementation, though. It looks like plain code to me.  Am I missing something?
 
Yes, our app will have more than just number crunching, and we may decide to use IdeaBlade to facilitate business logic for other parts of the system.  What also concerns me is the amount of time required to instantiate a new object.  It's 500+ times slower (literally 26sec vs. 4msec for 10,000 objects, which works out to about 6,500 times) than newing a class object in code yourself.  I can appreciate some overhead, but that is IMHO alarmingly too much.
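
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch using Stopwatch. PlainRow and InstantiationTimer are hypothetical names; PlainRow merely stands in for a "normal C#" class with roughly the row shape discussed in this thread, and the factory delegate for an IdeaBlade entity is left to the reader because its creation call is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical plain class: a date, an int and 100 floating point values.
public sealed class PlainRow
{
    public DateTime RunDate;
    public int Id;
    public double[] Values = new double[100];
}

public static class InstantiationTimer
{
    // Creates `count` objects through `factory` and returns the elapsed time.
    // The created objects are handed back so the work cannot be optimized away.
    public static TimeSpan TimeCreation<T>(int count, Func<T> factory, out List<T> created)
    {
        created = new List<T>(count);
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            created.Add(factory());
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling InstantiationTimer.TimeCreation(10000, () => new PlainRow(), out rows) gives the plain-class baseline; passing a delegate that creates the framework's entity instead gives the other side of the comparison. A single run is noisy, so repeating the measurement a few times is advisable.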
 
I like object modeling in concept but don't feel the application performance has to suffer because of it.
 
Pete