Problem when writing large files - OutOfMemoryException

Oct 8, 2010 at 2:47 AM

Hi everyone,

I'm using EPPlus to write a large amount of data to an Excel file (approximately 160,000 rows and 50 columns).

I've used the following controller action (I'm developing an ASP.NET MVC 2.0 web app):

public ActionResult Export()
{
    var template = new FileInfo(Path.Combine(Request.PhysicalApplicationPath, @"Content\template.xlsx"));
    using (ExcelPackage package = new ExcelPackage(template, true))
    {
        // fill cells
        // ...

        return File(package.GetAsByteArray(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "Data.xlsx");
    }
}

But the 'System.OutOfMemoryException' was thrown.

Furthermore, I'm following jankallman's advice for filling cells ("The fastest way to add cells is always from left to right, top to bottom."), and the exported data is stored in a List<T>.
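Roughly, the fill loop looks like this (simplified sketch; Item, its properties, and the starting row are placeholders for my actual code):

// Simplified fill loop: one pass, left to right, top to bottom.
// "Item" and its properties stand in for the real data type.
var ws = package.Workbook.Worksheets[1];
int row = 2;                               // assuming row 1 is the template header
foreach (Item item in data)                // data is the List<Item>
{
    ws.Cells[row, 1].Value = item.Id;
    ws.Cells[row, 2].Value = item.Name;
    ws.Cells[row, 3].Value = item.CreatedDate;
    // ... the remaining columns, in column order ...
    row++;
}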

I don't know how to write this much data to an Excel file without hitting the OutOfMemoryException.

Any suggestions in this case?

 

Thanks guys,

Coordinator
Oct 11, 2010 at 5:02 PM

Hi,

I did a performance test on my dev machine (8 GB RAM) with 650,000 rows and 40 columns (I duplicated the columns of sample 7 a few times). It took around 10 minutes. The result is a 237 MB xlsx; the sheet1.xml inside it is 1.3 GB, so it takes a lot of memory.

If I increase the rows beyond that, things start to go really slow (I guess the memory is swapped out to disk).

Check Task Manager while you generate the file and hit the exception, to see if you are running out of memory.

Jan

Oct 12, 2010 at 4:38 AM
Edited Oct 12, 2010 at 8:04 AM

Hi Jan,

I'm currently exporting the data in the XML Spreadsheet 2003 (*.xml) format. The exported file is quite large (~400 MB), but when I open it and save it as Excel 2007 (*.xlsx), its size drops to ~40 MB.

I'm using a TextWriter over Response.Output to append each row of the sheet from an ASP.NET MVC app on my dev machine (2 GB RAM), and that works fine. I only want to use EPPlus to export to the new format to reduce the file size. I don't understand why exporting 400 MB of XML is OK, but exporting the same data as a 40 MB xlsx is not.
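For context, the XML Spreadsheet 2003 export is basically streamed row by row like this (simplified sketch; the data source and the GetColumnValues helper are placeholders):

// Simplified sketch of the XML Spreadsheet 2003 export, streamed row by row
// so nothing is buffered in memory. GetColumnValues(item) is a placeholder helper.
const string ss = "urn:schemas-microsoft-com:office:spreadsheet";
Response.ContentType = "application/vnd.ms-excel";
using (var xw = System.Xml.XmlWriter.Create(Response.Output))
{
    xw.WriteProcessingInstruction("mso-application", "progid=\"Excel.Sheet\"");
    xw.WriteStartElement("Workbook", ss);
    xw.WriteStartElement("Worksheet", ss);
    xw.WriteAttributeString("Name", ss, "Data");
    xw.WriteStartElement("Table", ss);
    foreach (var item in data)                       // data is the exported List<T>
    {
        xw.WriteStartElement("Row", ss);
        foreach (object value in GetColumnValues(item))
        {
            xw.WriteStartElement("Cell", ss);
            xw.WriteStartElement("Data", ss);
            xw.WriteAttributeString("Type", ss, "String");
            xw.WriteString(Convert.ToString(value));
            xw.WriteEndElement();                    // Data
            xw.WriteEndElement();                    // Cell
        }
        xw.WriteEndElement();                        // Row
    }
    xw.WriteEndElement();                            // Table
    xw.WriteEndElement();                            // Worksheet
    xw.WriteEndElement();                            // Workbook
}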

Does EPPlus support a similar way of saving, like streaming the XML as above, that doesn't need to hold the whole content of the file in memory before writing it out?

Update: here is the stack trace output when I called "ExcelPackage.SaveAs(Response.OutputStream)"

[OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.]
   System.IO.MemoryStream.set_Capacity(Int32 value) +63
   System.IO.MemoryStream.EnsureCapacity(Int32 value) +55
   System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count) +245
   MS.Internal.IO.Packaging.PackagingUtilities.CopyStream(Stream sourceStream, Stream targetStream, Int64 bytesToCopy, Int32 bufferSize) +260
   MS.Internal.IO.Packaging.SparseMemoryStream.WriteToStream(Stream stream) +53
   MS.Internal.IO.Zip.ZipIOFileItemStream.Save() +583
   MS.Internal.IO.Zip.ZipIOLocalFileBlock.Save() +163
   MS.Internal.IO.Zip.ZipIOBlockManager.SaveContainer(Boolean closingFlag) +483
   MS.Internal.IO.Zip.ZipIOBlockManager.SaveStream(ZipIOLocalFileBlock blockRequestingFlush, Boolean closingFlag) +56
   MS.Internal.IO.Zip.ZipIOModeEnforcingStream.Dispose(Boolean disposing) +84
   System.IO.Stream.Close() +17
   System.IO.Packaging.PackagePart.Close() +114
   System.IO.Packaging.Package.DoClose(PackagePart p) +194
   System.IO.Packaging.Package.DoOperationOnEachPart(PartOperation operation) +261
   System.IO.Packaging.Package.System.IDisposable.Dispose() +139
   System.IO.Packaging.Package.Close() +8
   OfficeOpenXml.ExcelPackage.Dispose() +33

 

Coordinator
Oct 12, 2010 at 5:54 PM

Hi,

EPPlus supports the xlsx format only.

It's not the XML that consumes the most memory, it is the in-memory cell objects. Each cell is stored in the ExcelCell class, which holds some properties about the cell: the value, formula, styling, hyperlinks, comments, etc. I'm not sure how large each object is, but assume it is 200 bytes including the internal index; then the size will be 200 × 160,000 × 40 ≈ 1.2 GB just to store the cells. On top of that you have a MemoryStream for the package and a few other things, so in the end it requires a lot of memory. I guess the ExcelCell class could be optimized to use less memory, but that is something for the future.
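As a back-of-the-envelope check in code (the 200 bytes per cell is just an assumption):

// Rough estimate only; 200 bytes per cell is an assumed figure, not a measurement.
long bytesPerCell = 200;
long rows = 160000;
long cols = 40;
long totalBytes = bytesPerCell * rows * cols;              // 1,280,000,000 bytes
double totalGB = totalBytes / (1024.0 * 1024.0 * 1024.0);  // ≈ 1.2 GB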

Jan

Jul 12, 2011 at 2:31 AM

Is there any way to do an incremental write, so that, say, every 100 rows the file is written out to disk and the memory cleaned up?

Would saving the file, disposing of the ExcelPackage, and then loading the file and appending more data work?
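Something like this untested sketch is what I have in mind (the file path, sheet name, and data source are placeholders):

// Untested sketch of the idea: save every 100 rows, dispose, reload, keep appending.
var file = new FileInfo(@"C:\temp\dump.xlsx");          // placeholder path
var pck = new ExcelPackage(file);
var ws = pck.Workbook.Worksheets.Add("Data");
int row = 1;
foreach (var record in records)                         // records = whatever feeds the dump
{
    // ... write record into ws.Cells[row, 1..n] ...
    if (++row % 100 == 0)
    {
        pck.Save();                                     // flush to disk
        pck.Dispose();                                  // release the in-memory cells
        pck = new ExcelPackage(file);                   // reopen the same file
        ws = pck.Workbook.Worksheets["Data"];           // and keep appending
    }
}
pck.Save();
pck.Dispose();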

We're doing a data dump of 200,000 rows with 100-200 columns in ASP.NET/IIS6 x86, and we sometimes get an out-of-memory exception and sometimes don't.

Dec 27, 2011 at 1:12 PM

Probably a little late, but I had this exact same issue. I wrote an article on my blog with a solution for it. Here is the link:

http://blog.emendezrivas.com/?p=45

Hope this helps.

Apr 5, 2013 at 10:32 PM
I'm sorry, but I need to raise this question again. I am trying to write 11,000 columns with 900 rows each, which gives a total of nearly 10 million cells and consumes around 1.5 GB of RAM.

I've tried everything: writing directly to "Cells[x, y].Value", writing everything to an array and reading it with LoadFromCollection, and now writing all my info to an array, creating a CSV from it, and reading it through LoadFromText, because I thought that would give me better performance.

The question is: how can I write this amount of information to a spreadsheet? If I write all my data to an array, the whole program consumes no more than 500 MB. Is it possible to write to "Cells" without this huge overhead? I have all the information I need in one place, so I can do it in one pass, or save the file little by little... Any suggestion will help!
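For what it's worth, the LoadFromText attempt looks roughly like this (paths, sheet name, and delimiter are placeholders):

// Roughly how the CSV built from the array is loaded; paths and delimiter are placeholders.
var format = new ExcelTextFormat
{
    Delimiter = ';',
    Culture = System.Globalization.CultureInfo.InvariantCulture
};
using (var pck = new ExcelPackage(new FileInfo(@"C:\temp\out.xlsx")))
{
    var ws = pck.Workbook.Worksheets.Add("Data");
    ws.Cells["A1"].LoadFromText(new FileInfo(@"C:\temp\data.csv"), format);
    pck.Save();
}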

Thank you in advance!
Sep 10, 2013 at 9:44 AM
Hello

I found myself in a similar situation some time ago. I needed to generate very large files (>200 MB) and always got an OutOfMemoryException.
I finally found a workaround referencing DocumentFormat.OpenXml directly. The work is a two-step process.

Step 1. Create an XML file with your data. I use an XmlWriter (a rough sketch of the writer code follows the sample below). The data format is as follows (this worked for me, but I think the format could be somewhat different, as defined by OpenXML):
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>

    <row r="1">
      <c r="A1" t="inlineStr">
        <is>
          <t>Some Value row 1 column A</t> 
        </is>
      </c>
      <c r="B1" t="inlineStr">
        <is>
          <t>Other value row 1 column B</t> 
        </is>
      </c>
      ...
    </row>

    <row r="2">
      <c r="A2" t="inlineStr">
        <is>
          <t>Some Value row 2 column A</t> 
        </is>
      </c>
      <c r="B2" t="inlineStr">
        <is>
          <t>Other value row 2 column B</t> 
        </is>
      </c>
      ...
    </row>

    ....

  </sheetData>
</worksheet>
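A rough sketch of how Step 1 can be written with an XmlWriter (the output file name matches the one used in Step 2 below; rowCount, colCount, GetColumnLetter and GetValue are placeholders for your own data access):

// Sketch of Step 1: stream the sheetData XML straight to disk with XmlWriter.
const string ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
using (XmlWriter xw = XmlWriter.Create(sFullNameOfXmlGeneratedInStep1))
{
    xw.WriteStartElement("worksheet", ns);
    xw.WriteStartElement("sheetData", ns);
    for (int row = 1; row <= rowCount; row++)
    {
        xw.WriteStartElement("row", ns);
        xw.WriteAttributeString("r", row.ToString());
        for (int col = 0; col < colCount; col++)
        {
            string cellRef = GetColumnLetter(col) + row;        // e.g. "A1" (helper not shown)
            xw.WriteStartElement("c", ns);
            xw.WriteAttributeString("r", cellRef);
            xw.WriteAttributeString("t", "inlineStr");
            xw.WriteStartElement("is", ns);
            xw.WriteElementString("t", ns, GetValue(row, col)); // your data here
            xw.WriteEndElement();                               // is
            xw.WriteEndElement();                               // c
        }
        xw.WriteEndElement();                                   // row
    }
    xw.WriteEndElement();                                       // sheetData
    xw.WriteEndElement();                                       // worksheet
}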
Step 2. Once you have your XML file (as large as your drive capacity or your operating system can hold), you can use this code to create the xlsx file:
using (SpreadsheetDocument oExcel = SpreadsheetDocument.Create(sFullNameDes, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook))
{
    // Create workbook and worksheet parts
    WorkbookPart oWorkbook = oExcel.AddWorkbookPart();
    WorksheetPart oWorksheet = oWorkbook.AddNewPart<WorksheetPart>();
    string sheetId = oWorkbook.GetIdOfPart(oWorksheet);

    // Write a minimal workbook.xml that points to the single sheet
    string XML = @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><workbook xmlns=""http://schemas.openxmlformats.org/spreadsheetml/2006/main"" xmlns:r=""http://schemas.openxmlformats.org/officeDocument/2006/relationships""><sheets><sheet name=""{1}"" sheetId=""1"" r:id=""{0}"" /></sheets></workbook>";
    XML = string.Format(XML, sheetId, "Content");
    using (Stream stream = oWorkbook.GetStream())
    {
        byte[] xmlBytes = (new UTF8Encoding()).GetBytes(XML);
        stream.Write(xmlBytes, 0, xmlBytes.Length);
    }

    // Copy the sheet XML generated in step 1 into the worksheet part, 1 MB at a time
    byte[] buffer = new byte[1000000];
    int nBytesRead;
    using (Stream oStreamOut = oWorksheet.GetStream())
    using (FileStream oFileStream = new FileStream(sFullNameOfXmlGeneratedInStep1, FileMode.Open))
    {
        do
        {
            nBytesRead = oFileStream.Read(buffer, 0, buffer.Length);
            if (nBytesRead > 0) oStreamOut.Write(buffer, 0, nBytesRead);
        } while (nBytesRead > 0);
    }
}
This code uses a 1 MB buffer to copy the data into the final xlsx file. You can change that value depending on your memory availability.

Hope this will be useful.
Oct 5, 2013 at 1:58 AM
Edited Oct 5, 2013 at 1:58 AM
Has anyone found a way to handle the memory issue using EPPlus itself? I haven't tried @z0mp's solution because I already have a massive CSV file and don't want to generate XML from it; I want to go straight to XLSX. When I use LoadFromText I get an OutOfMemoryException, and when I iterate myself setting cell values I also get an OutOfMemoryException.

Is there a setting that will tell ExcelPackage not to hold all of the cells in memory?

There are Python libraries that limit memory usage (http://xlsxwriter.readthedocs.org/en/latest/working_with_memory.html), but I would rather stick with C#.
Nov 30, 2013 at 1:19 PM
Try this
const int LIMIT = 1048576;
const int BUFFER = 100000;
var tempFile = new FileInfo(Path.GetTempFileName());
try
{
    var pck = new ExcelPackage();
    var ws = pck.Workbook.Worksheets.Add("Sheet1");
    // initialize Worksheet
    // Write header row etc.

    // Save the initialized sheet to the temp file, then free memory by disposing and reloading
    pck.SaveAs(tempFile);
    pck.Dispose();

    using (var stream = tempFile.OpenRead())
    {
        pck.Load(stream);
    }
    pck.Workbook.Worksheets.MoveToEnd(1);
    ws = pck.Workbook.Worksheets[1];
    var rowCount = 0;

    // reader = your SqlDataReader (setup not shown)
    var values = new object[reader.FieldCount];
    while (reader.Read())
    {
        if (++rowCount >= LIMIT)
        {
            break;
        }
        else if (rowCount % BUFFER == 0)
        {
            // Every BUFFER rows: flush to the temp file, drop the in-memory cells, and reload
            pck.Save();
            pck.Dispose();

            using (var stream = tempFile.OpenRead())
            {
                pck.Load(stream);
            }
            pck.Workbook.Worksheets.MoveToEnd(1);
            ws = pck.Workbook.Worksheets[1];
        }
        reader.GetValues(values);
        ws.Cells[1 + rowCount, 1].LoadFromArrays(new[] { values.Select(v => (object)Convert.ToString(v)).ToArray() });
    }

    reader.Dispose();

    pck.Save();
    pck.Dispose();

    response.WriteFile(tempFile.FullName, true);
}
finally
{
    try { tempFile.Delete(); }
    catch { }
}
-Steven Chong
Oct 8, 2014 at 1:35 PM
I'm having the same issue, and I tried Steven's code, but I can't manage to write to the file past the first time (with pck.SaveAs).
And I can't use SaveAs each time, because then the entire file is rewritten and the OOM exception comes back...
Does anybody have another solution for this problem? (I'm working with the 4.02 beta.)
Apr 28, 2015 at 7:17 AM
I had this problem, but I fixed it by switching the "Platform target" option from x86 to x64 or "Any CPU" (right-click the project, select "Properties", open the "Build" tab, and set "Platform target" to "x64").

The problem is that an x86 (32-bit) process can only use roughly 1.8 GB of RAM; an x64 process does not have this limitation.
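If you're not sure which mode your app is actually running in, you can check at runtime (not EPPlus-specific, just a quick sanity check):

// Quick check of the process bitness (Environment.Is64BitProcess requires .NET 4.0+).
bool is64Bit = Environment.Is64BitProcess;
Console.WriteLine(is64Bit
    ? "64-bit process: the ~2 GB address space limit does not apply."
    : "32-bit process: expect roughly 1.5-2 GB of usable address space.");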