After putting so much effort into importing your data into an SQL database and connecting it to your website, how do you get it back out and into Excel in order to keep your off-line and on-line systems synchronised?
The following article presents a simple method for downloading any data from PHP into an Excel spreadsheet - or at least something that looks like one.
1. Preparing the data
The following examples use the dataset created for Sorting Arrays of Arrays which is defined as follows:
<?PHP $data = array( array("firstname" => "Mary", "lastname" => "Johnson", "age" => 25), array("firstname" => "Amanda", "lastname" => "Miller", "age" => 18), array("firstname" => "James", "lastname" => "Brown", "age" => 31), array("firstname" => "Patricia", "lastname" => "Williams", "age" => 7), array("firstname" => "Michael", "lastname" => "Davis", "age" => 43), array("firstname" => "Sarah", "lastname" => "Miller", "age" => 24), array("firstname" => "Patrick", "lastname" => "Miller", "age" => 27) ); ?>
Note: Further down this page you can find an example on creating an export from an SQL query.
The first step is to output the data in a tab-delimited format (CSV can also be used but is slightly more complicated). To achieve this we use the following code:
<?PHP header ("Content-Type: text/plain"); $flag = false; foreach($data as $row) { if(!$flag) { # display field/column names as first row echo implode ("\t", array_keys ($row)) . "\r\n"; $flag = true; } echo implode("\t", array_values ($row)) . "\r\n"; } ?>
We set the content type to text/plain so that the output can more easily be viewed in the browser. Otherwise, because there is no HTML formatting, the output would appear as a single line of text.
The first line of output will be the column headings (in this case the field names are used). Values are separated with a tab \t and rows with a line break \n. The output should look something like the following:
firstname lastname age Mary Johnson 25 Amanda Miller 18 James Brown 31 Patricia Williams 7 Michael Davis 43 Sarah Miller 24 Patrick Miller 27
There's already a weakness in this code that may not be immediately obvious. What if one of the fields to be ouput already contains one or more tab characters, or worse, a newline? That's going to throw the whole process out as we rely on those characters to indicate column- and line-breaks.
The solution is to 'escape' the tab characters. In this case we're going to replace tabs with a literal \t and line breaks with a literal \n so they don't affect the formatting:
<?PHP function cleanData(&$str) { $str = preg_replace ("/\t/", "\\t", $str); $str = preg_replace("/\r?\n/", "\\n", $str); } header("Content-Type: text/plain"); $flag = false; foreach($data as $row) { if(!$flag) { # display field/column names as first row echo implode("\t", array_keys($row)) . "\r\n"; $flag = true; } array_walk ($row, 'cleanData'); echo implode("\t", array_values($row)) . "\r\n"; } ?>
Now, before each row is echoed any tab characters are replaced "\t" so that our columns aren't broken up. Also any line breaks within the data are replaced with "\n". Now, how to set this up as a download...
2. Triggering a download
What many programmers don't realise is that you don't have to create a file, even a temporary one, in order for one to be downloaded. It's sufficient to 'mimic' a download by passing the equivalent HTTP headers followed by the data.
If we create a PHP file with the following code then when it's called a file will be downloaded which can be opened directly using Excel.
<?PHP function cleanData(&$str) { $str = preg_replace("/\t/", "\\t", $str); $str = preg_replace("/\r?\n/", "\\n", $str); if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"'; } # filename for download $filename = "website_data_" . date('Ymd') . ".xls"; header("Content-Disposition: attachment; filename=\"$filename\""); header("Content-Type: application/vnd.ms-excel"); $flag = false; foreach($data as $row) { if(!$flag) { # display field/column names as first row echo implode("\t", array_keys($row)) . "\r\n"; $flag = true; } array_walk($row, 'cleanData'); echo implode("\t", array_values($row)) . "\r\n"; } exit; ?>
Note that we've added an extra line to the cleanData function to detect double-quotes and escape any value that contains them. Without this an uneven number of quotes in a string can confuse Excel.
This should result in a file being downloaded and saved to your computer. If all goes well then the filename will be named "website_data_20111029.xls" and will open in Excel looking something like this:
How does it work? Setting the headers tells the browser to expect a file with a given name and type. The data is then echoed, but instead of appearing on the page it becomes the downloaded file.
Because of the .xls extension and the ms-excel file type, most computers will associate it with Excel and double-clicking will cause that program to open. You could also modify the file name and mime type to indicate a different spreadsheet package or database application.
Before you ask, there is no way to specify data or cell formatting, column widths, etc, using this method. We are only passing a tab-delimited text file. To include formatting try generating HTML code or a script that actually builds an Excel file. Or a macro.
A similar technique can be used to allow users to download files that have been uploaded previously using PHP and stored with different names. More on that later...
3. Exporting from an SQL database
If your goal is to allow data to be exported from a query result then the changes are relatively simple:
<?PHP // Original PHP code by Chirp Internet: www.chirp.com.au // Please acknowledge use of this code by including this header. function cleanData(&$str) { $str = preg_replace("/\t/", "\\t", $str); $str = preg_replace("/\r?\n/", "\\n", $str); if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"'; } // file name for download $filename = "website_data_" . date('Ymd') . ".xls"; header("Content-Disposition: attachment; filename=\"$filename\""); header("Content-Type: application/vnd.ms-excel"); $flag = false; $result = pg_query ("SELECT * FROM table ORDER BY field") or die('Query failed!'); while(false !== ($row = pg_fetch_assoc ($result))) { if(!$flag) { // display field/column names as first row echo implode("\t", array_keys($row)) . "\r\n"; $flag = true; } array_walk($row, 'cleanData'); echo implode("\t", array_values($row)) . "\r\n"; } ?>
This would be the entire file required to query the database and trigger the file download. The database functions need to match the database you're using. MySQL users for example would use mysql_query andmysql_fetch_assoc in place of the PostgreSQL functions.
For other databases see under User Comments below or check the PHP documentation. If you are seeing duplicate columns (numbered as well as labeled) you need to change the fetch call to return only the associative (ASSOC) array.
If you're having trouble at this stage, remove the Content-Disposition header and change the Content-Type back to text/plain. This makes debugging a lot easier as you can see the output in your browser rather than having to download and open the generated file every time you edit the script.
4. Preventing Excel's ridiculous auto-format
When importing from a text file as we're essentially doing here, Excel has a nasty habit of mangling dates, timestamps, phone numbers and similar input values.
For our purposes, some simple additions to the cleanData function take care of most of the problems:
<?PHP // Original PHP code by Chirp Internet: www.chirp.com.au // Please acknowledge use of this code by including this header. function cleanData(&$str) { // escape tab characters $str = preg_replace("/\t/", "\\t", $str); // escape new lines $str = preg_replace("/\r?\n/", "\\n", $str); // convert 't' and 'f' to boolean values if($str == 't') $str = 'TRUE'; if($str == 'f') $str = 'FALSE'; // force certain number/date formats to be imported as strings if(preg_match("/^0/", $str) || preg_match("/^\+?\d{8,}$/", $str) || preg_match("/^\d{4}.\d{1,2}.\d{1,2}/", $str)) { $str = "'$str"; } // escape fields that include double quotes if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"'; } ?>
The section that prevents values being scrambled does so by inserting an apostrophe at the start of the cell. When you open the resuling file in Excel you may see the apostrophe, but editing the field will make it disappear while retaining the string format. Excel is strange that way.
The types of values being escape this way are: values starting with a zero; values starting with an optional+ and at least 8 consecutive digits (phone numbers); and values starting with numbers in YYYY-MM-DD format (timestamps). The relevant regular expressions have been highlighted in the code above.
5. Formatting - colours, numbers, dates
Again, this script generates and triggers the download of a tab-delimited text file with an .xls extension and Excel Mime-Type. We are not building an actual Excel document.
Defining styles, colours, column widths, etc, is not possible using this technique. You may be able to generate an HTML table with some formatted data that Excel will recognise, otherwise you need a much more complicated script.
6. Exporting to CSV format
The tab-delimited text options describe above may be a bit limiting if your data contains newlines or tab breaks that you want to preserve when opened in Excel or another spreadsheet application.
A better format then is comma-separated variables (CSV) which can be generated as follows:
<?PHP function cleanData(&$str) { if($str == 't') $str = 'TRUE'; if($str == 'f') $str = 'FALSE'; if(preg_match("/^0/", $str) || preg_match("/^\+?\d{8,}$/", $str) || preg_match("/^\d{4}.\d{1,2}.\d{1,2}/", $str)) { $str = "'$str"; } if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"'; } header("Content-Disposition: attachment; filename=\"$filename\""); header("Content-Type: application/vnd.ms-excel;"); $out = fopen ("php://output", 'w'); $flag = false; $result = pg_query("SELECT * FROM table ORDER BY field") or die('Query failed!'); while(false !== ($row = pg_fetch_assoc($result))) { if(!$flag) { // display field/column names as first row fputcsv ($out, array_keys($row), ',', '"'); $flag = true; } array_walk($row, 'cleanData'); fputcsv($out, array_values($row), ',', '"'); } fclose($out); ?>
Normally the fputcsv command is used to write data in CSV format to a separate file. In this script we're tricking it into writing directly to the page by telling it to write to php://output instead of a regular file.
7. Exporting to CSV with Unicode intact
If like us your data contains UTF-8 characters you will notice that Excel doesn't handle them very well. Other applications can open UTF-8 content without problems, but for some reason Microsoft wants to keep you in the dark ages.
Fortunately, there is a trick you can use. Below you can see how we modify the script to convert everything from UTF-8 to UTF-16 Lower Endian (UTF-16LE) format which Excel, at least on Windows, will recognise.
Please Note: When you open the file in Excel you might find all the data bunched into the first column. This should be fixable using the "Text to Columns..." command in the Data menu.
<?PHP function cleanData(&$str) { if($str == 't') $str = 'TRUE'; if($str == 'f') $str = 'FALSE'; if(preg_match("/^0/", $str) || preg_match("/^\+?\d{8,}$/", $str) || preg_match("/^\d{4}.\d{1,2}.\d{1,2}/", $str)) { $str = "'$str"; } if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"'; $str = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8'); } header("Content-Disposition: attachment; filename=\"$filename\""); header("Content-Type: application/vnd.ms-excel; charset=UTF-16LE"); $out = fopen("php://output", 'w'); $flag = false; $result = pg_query("SELECT * FROM table ORDER BY field") or die('Query failed!'); while(false !== ($row = pg_fetch_assoc($result))) { if(!$flag) { // display field/column names as first row fputcsv($out, array_keys($row), ',', '"'); $flag = true; } array_walk($row, 'cleanData'); fputcsv($out, array_values($row), ',', '"'); } fclose($out); ?>
This script may not work for all versions of Excel. Please let us know using the Feedback form below if you encounter problems or come up with a better solution.
No comments:
Post a Comment