This project has moved and is read-only. For the latest updates, please go here.

Importing and exporting data - performance

Jun 4, 2010 at 8:53 AM

Hi! First of all I want to thank You for a great component :)

I wonder what the best apporoach is for importing/exporting dataTables from/to xlsx.

Im relatively new to both programming and VS, so therefore use VB.net not c#.

On the homepage, quote: "The access to cells has been totally rewritten using dictionaries, to gain better performance. You can now load 50 000 cells in seconds."

When saving to xlsx I iterate approx 1000 rows and 400 columns, takes about 30-45 seconds.

Is this the best opproach or im I missing soemthing here?

For instance, is it possible to import and save a whole worksheet, "wsheet.ExtractToDataTable(dataTable, wsheet.Rows[start], wsheet.Columns[start])"

And similar for putting data from a table to a worksheet, setting a cellID as start cell....

 

I can live with the performance, but if finetunig is possible, its always welcome :)

Keep up the good work :)

Jun 4, 2010 at 2:05 PM

Hi

Loading 400 000 cells in 30-45 sek sounds normal I guess.

The performance depends on a lot of things, like what kind of data you load  (a lot of strings take longer time), what kind of formatting you do and the performance of you database query, of course.

Generally, always load data from the top, Left to right to avoid the internal index to be resorted (thats the way you do it if you load a datatable).

That means that if you add a formula on the data it's faster to add it for every row than do it for a whole range after you have loaded the datatable (since that will insert cells in the sorted internal index).

Avoid to use the InsertRow and DeleteRow methods.

You can use the LoadFromDataTable-method on a range to load a datatable. I have not added a method for the other way around.

Jan

 

Jun 4, 2010 at 3:50 PM

Thanks for the reply!

quote: "You can use the LoadFromDataTable-method on a range to load a datatable. I have not added a method for the other way around. "

So when saving to an xlsx I can 'loadFromDataTable' - but when importing an xlsx I have to iterate and place each cell - one at a time. Correct?

Svein

Jun 6, 2010 at 1:34 PM

Correct

Sep 24, 2010 at 3:56 PM
Edited Sep 24, 2010 at 3:58 PM

dear jankallman you say :
"That means that if you add a formula on the data it's faster to add it for every row than do it for a whole range after you have loaded the datatable (since that will insert cells in the sorted internal index)."

i have a datatable with 70 000 rows and 100 columns with a lot of string, with LoadFromDataTable-method it's very quick ( 5 minutes, it's good for me), but when a try to add a column with formula using cell range, example: 

sheet.Cells["CU2:CU" + (dt.Rows.Count + 1).ToString()].Formula = "CE2-BB2"

,

this take another 5 minutes for each column. How i can add a formula in my data.?
Your recommend me to write cells using LoadFromDataTable and later iterate for place formula cell by cell ?

wich is the faster method to add formula cells in a excel thar have many rows( like 70 000 with 100 columns) ¿?

 

Sep 27, 2010 at 7:15 AM

Hi,

Here is a few hints to get better performance with huge amounts of data.

The fastest way to add cells is always from Left to Right-Top to bottom.

If you have a datatable to start with use the LoadFromDataTable method, it's a little bit faster than using the Cells collection. For every formula column, add an empty column to the datatable to ensure the cells are created in order.

If you don't have a datatable (or if you can't or don't want to add columns to the datatable), just use the Cells collection and create the cells in order.

The reason cells should be created in order is that the internal RangeCollection class uses a a sorted array that points into a list of Cells. If you add a cell at the top, a "pointer item" will be inserted in the sorted array and all items after will be shifted down.

If you have millions of cells and insert a new column, that will take a looong time to complete, but if you add the cells in order the "pointer item" will allways be addad at the end.

I guess you could aviod this by implementing some sort of paging in the sorted array, but that is something for the future,

Have a look at the RangeCollection class if you are interested.

Jan

Oct 1, 2010 at 5:43 AM

Thanks Jankallman, i iterate trougth datatable and add cell by cell and i got a good time, aprox 5 minutes ( from load my data and put into excel), my datatable contains 70 000 rows and 100 columns, this works fine in this escenari, bue when i have a datataable with 90 000 rows and 300 columns i get a systemoutofmemory exception = (

Aug 31, 2013 at 9:35 AM
Hi, some results from my own investigation about performance loading huge amounts of data.

Scenario: Load 50.000 rows with 15 columns, no NumberFormat, Formulae, Styles, just plain data. The data loaded is mainly string, but contains also short, int, long and DateTime. The data to load is a list of objects (List<T>) that need some customization so need to be transformed into somewhat that will passed to EPPlus.

The approach with best performance we be found is LoadFromArrays using object[][] as the container of the data to load. Requires 300ms to prepare the data (converting List<T> to object[][] with some customization) and 9.800ms to load the 750.000 cells on a 1-year laptop with Core i7.

The second approach with best performance is LoadFromDataTable. The time to load the cells is very similar, but preparing the data tooks 1.300ms as the DataTable storage is not as efficient as object[][].

The last and worst scenario is of course Cells[row][column].Value = data, from left to right and from top to bottom as usual.

So, if you have your data already in a DataTable, use LoadFromDataTable but if you have your data in any other form, convert it to object[][] and use LoadFromArrays.

We did not test LoadFromCollection as the use of reflection may impose an overhead versus LoadFromArrays so if you can generate the object[][] easily, LoadFromArrays is your best option.

We use reflection in LoadFromArrays for generating the object[][] but in a very efficient way (we think!).