Web browser HTML table clipboard tests

Results and analysis of Internet Explorer

Note: don't miss the analysis and conclusion further down the page!

Raw Test Data

The raw test data for Internet Explorer can be found here: Internet Explorer 6 & 7 test data, Internet Explorer 8 Beta 1 test data
Test data was gathered for versions 6, 7 and 8 Beta 1, on multiple platforms. Although the Internet Explorer 8 Beta 1 test data differs overall from that of versions 6 and 7, the sections covering the actual HTML tables were found to be identical in all three!

Results

Legend: n = newline; t = tab; s = space

The following patterns were collected from the text between the data-containing cells on either side of each test. For example, if a test looks at an empty cell at row 2, cell 1, the pattern covering it includes everything between the contents of row 1, cell 5 and row 2, cell 2!
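To make the notation concrete, here is a minimal Python sketch that turns copied text into the n/t/s patterns used in the table. The names `to_pattern` and `gap_patterns` are hypothetical, and the sketch assumes that each data cell is a single run of non-whitespace characters and that a Windows CRLF line ending counts as one n.

```python
import re

# Legend: n = newline, t = tab, s = space.
LEGEND = {"\n": "n", "\t": "t", " ": "s"}


def to_pattern(whitespace: str) -> str:
    """Encode a run of whitespace with the n/t/s legend."""
    # Assumption: a Windows CRLF pair counts as a single newline.
    normalised = whitespace.replace("\r\n", "\n")
    # Any other whitespace character (e.g. a non-breaking space) is shown as "?".
    return "".join(LEGEND.get(ch, "?") for ch in normalised)


def gap_patterns(copied: str) -> list[str]:
    """Return the pattern between each pair of neighbouring data cells."""
    # Assumption: each data cell is one run of non-whitespace characters.
    parts = re.split(r"(\S+)", copied.replace("\r\n", "\n"))
    # With a capturing group, parts alternates: gap, data, gap, data, ..., gap.
    # Keep only the gaps that sit between two data cells.
    return [to_pattern(gap) for gap in parts[2:-1:2]]


# Row 1 holds two cells; row 2 starts with an empty cell.
print(gap_patterns("R1C1 R1C2 \n R2C2 \n"))  # ['s', 'sns']
```

The two patterns printed correspond to test 1 ("Between two normal cells") and test 3 ("Empty cell at start of row") in Group 1.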

# | Test | Group 1 Patterns | Group 2 Patterns | Group 3 Patterns | Group 4 Patterns
1 | Between two normal cells | s | s | ns | ns
2 | Between two normal rows | sn | sn | nsn | nsn
3 | Empty cell at start of row | sns | snss | nsnns | nsnsns
4 | Empty cell at end of row | ssn | sssn | nsnsn | nssnsn
5 | Two empty cells at start of row | snss | snssss | nsnnsns | nsnsnssns
6 | Two empty cells at end of row | sssn | sssssn | nsnsnsn | nssnssnsn
7 | Empty cell in middle of row | ss | sss | nsns | nssns
8 | Three empty cells in middle of row | ssss | sssssss | nsnsnsns | nssnssnssns
9 | Empty cell at end of row followed by empty cell at start of next row | ssns | sssnss | nsnsnns | nssnsnsns
10 | Three empty cells at end of row followed by three empty cells at start of next row | ssssnsss | sssssssnssssss | nsnsnsnsnnsnsns | nssnssnssnsnsnssnssns
12 | Entire row empty (Five cells) | snsssssn | snssssssssssn | nsnnsnsnsnsnsn | nsnsnssnssnssnssnsn
13 | Empty cells at beginning of first row (One cell) | (n)s | (n)ss | (n)ns | (n)sns
14 | Empty cells at beginning of first row (Two cells) | (n)ss | (n)ssss | (n)nsns | (n)snssns
15 | Empty cells at beginning of first row (Four cells) | (n)ssss | (n)ssssssss | (n)nsnsnsns | (n)snssnssnssns
16 | Empty cells at end of last row (One cell) | ssn(n) | sssn(n) | nsnsn(n) | nssnsn(n)
17 | Empty cells at end of last row (Two cells) | sssn(n) | sssssn(n) | nsnsnsn(n) | nssnssnsn(n)
18 | Empty cells at end of last row (Four cells) | sssssn | sssssssssn | nsnsnsnsnsn | nssnssnssnssnsn

Note: the parts of patterns shown in brackets were found in Internet Explorer 6 and 7, but not in Internet Explorer 8 Beta 1!

  • <thead>, <tbody> and <tfoot> seem to make no difference - good news!
  • No difference between <th> and <td> seen - good news!
  • A <br /> tag in the middle of the data splits it with a new line

Analysis

It is strange that in Internet Explorer 6 and 7, tests 16 and 17 have an extra n at the end of their patterns, but test 18 doesn't. I ran a small additional test in Internet Explorer 7, removing everything between the tables for tests 16 to 18 in Group 2. All of the bracketed n's (those that don't appear in IE8 Beta 1) for tests 13 to 18 disappeared, so they must all have been caused by the <h4> and <h5> tags placed alongside the tables. For the purposes of this study, you can therefore ignore their presence completely!

The basic building blocks used by Internet Explorer are an s between cells and sn between rows. Using &nbsp; in empty cells (required for cells to display correctly in certain browsers, such as this one!) inserts a single s as cell data.
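As a sanity check, these two building blocks (plus &nbsp; as one s) can be turned into a tiny model of Internet Explorer's output. This is a hypothetical sketch, not how IE works internally: `ie_copy` and `gaps` are illustrative names, data cells are stood in for by "x", and every row, including the last, is assumed to end with sn, as the IE8 patterns for tests 16 to 18 suggest.

```python
import re


def ie_copy(rows: list[list[str]], nbsp: bool = False) -> str:
    """Hypothetical model of IE's plain-text table output (Groups 1 and 2)."""
    text = ""
    for row in rows:
        # An empty cell contributes nothing, or one s if it contains &nbsp;.
        cells = [c if c else (" " if nbsp else "") for c in row]
        # s between cells, then sn closes every row, including the last.
        text += " ".join(cells) + " \n"
    return text


def gaps(text: str) -> list[str]:
    """n/s pattern between each pair of neighbouring data cells."""
    parts = re.split(r"(\S+)", text)
    return [g.replace(" ", "s").replace("\n", "n") for g in parts[2:-1:2]]


# Test 3 (empty cell at start of row), without and with &nbsp;:
print(gaps(ie_copy([["x"], ["", "x"]])))             # ['sns']
print(gaps(ie_copy([["x"], ["", "x"]], nbsp=True)))  # ['snss']
```

The same model also gives ss/sss for test 7 and snsssssn for test 12, matching the Group 1 and Group 2 columns of the table.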

If you compare the patterns above with those from the Opera browser (using the patterns from the first two tests to break each pattern down into its building blocks), you'll find that, with the exception of the final three tests in Group 1 and Group 2, the patterns are actually all the same. The only difference in those last three tests is that Internet Explorer's patterns each have an extra sn (a row divider) on the end.

The effect the <p> tag has on cells is to add a single n after the contents of the cell, whether the cell is completely empty or not.
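Adding this rule to a small model of IE's output (s between cells, sn after every row, &nbsp; as one s) is enough to reproduce several of the n-bearing patterns. This is a hypothetical sketch; `ie_copy_p` and `gaps` are illustrative names, and "x" stands in for real cell data. With every cell using <p>, it gives ns, nsn and nsnns for tests 1 to 3, matching the Group 3 column, and with &nbsp; added it gives nsnsns for test 3 and nssnssnssns for test 8, matching the Group 4 column.

```python
import re


def ie_copy_p(rows: list[list[str]], nbsp: bool = False) -> str:
    """Hypothetical model of IE's output when every cell wraps its contents in <p>."""
    text = ""
    for row in rows:
        cells = []
        for content in row:
            if not content and nbsp:
                content = " "  # &nbsp; copies as a single s
            # <p> adds one n after the contents, even for an empty cell.
            cells.append(content + "\n")
        text += " ".join(cells) + " \n"  # s between cells, sn after each row
    return text


def gaps(text: str) -> list[str]:
    """n/s pattern between each pair of neighbouring data cells."""
    parts = re.split(r"(\S+)", text)
    return [g.replace(" ", "s").replace("\n", "n") for g in parts[2:-1:2]]


print(gaps(ie_copy_p([["x", "x"]])))  # ['ns']
```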

Spacing out the HTML has the effect of adding a single s after existing cell contents for cells that are not completely empty. It even does this for cells using the <p> tag if enough spacing is added!

Conclusion

The only way to judge how well these test cases are handled is to see how easy it is to build a parsing algorithm that can convert all of these patterns into a simple format from which the data can then be easily extracted. Let's set a few rules first, though:

  1. Our algorithm will not be told how many columns or rows it has been provided with (you couldn't expect a user to have to provide this info).
  2. Any number of cells in the table could be using the <p> tag, and any number of "empty" cells in the table could be using &nbsp;.
  3. Any part of the HTML of the table could be spaced out and therefore introduce additional spaces into the patterns.
  4. One single algorithm must cover everything.

Processing of all Group 1 patterns is possible with the following steps:

  1. sn => [newrow]
  2. s => [newcell]
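The two steps above can be sketched in Python as follows. `parse_group1` is a hypothetical name; step 1 is applied first so that the space inside each sn is never read as a cell break, and the trailing sn that IE places after the last row is dropped.

```python
def parse_group1(text: str) -> list[list[str]]:
    """Hypothetical parser for Group 1 text using the two steps above."""
    # Step 1: sn => [newrow]. Applied first so that the space in each sn is
    # not read as a cell break.
    rows = text.split(" \n")
    # IE also ends the last row with sn, leaving an empty string at the end.
    if rows and rows[-1] == "":
        rows.pop()
    # Step 2: s => [newcell].
    return [row.split(" ") for row in rows]


print(parse_group1("a  c \n b  \n"))  # [['a', '', 'c'], ['', 'b', '']]
```

Fed the &nbsp; (Group 2) version of a row holding x, an empty cell and x, which is "x   x \n", it returns four cells instead of three, because the space copied from &nbsp; looks exactly like a cell break.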

However, none of the patterns from the other groups can be processed this way, and not even Group 1 patterns can be if spacing in the HTML inserts extra spaces.
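To see why, here is a small hypothetical model of IE's output (s between cells, sn after each row, &nbsp; as one s, and spaced-out HTML adding one s after non-empty contents). The name `ie_copy` is illustrative, and the combination of &nbsp; with spacing is deliberately not modelled. Under rules 2 and 3, two different tables can produce exactly the same text, so no parser could tell them apart.

```python
def ie_copy(rows: list[list[str]], nbsp: bool = False, spaced: bool = False) -> str:
    """Hypothetical model of IE's plain-text table output."""
    text = ""
    for row in rows:
        cells = []
        for content in row:
            if content:
                # Spaced-out HTML adds one s after non-empty contents.
                cells.append((content + " ") if spaced else content)
            else:
                # Empty cell: nothing, or one s if it holds &nbsp;.
                # (Spacing is not applied to &nbsp; cells in this sketch.)
                cells.append(" " if nbsp else "")
        text += " ".join(cells) + " \n"  # s between cells, sn after each row
    return text


# Rule 3: a spaced-out two-cell row vs. a plain four-cell row.
print(repr(ie_copy([["x", "x"]], spaced=True)))  # 'x  x  \n'
print(repr(ie_copy([["x", "", "x", ""]])))        # 'x  x  \n'
```

Likewise for rule 2, a four-cell row whose two empty cells use &nbsp; produces the same text as a plain six-cell row with four empty cells.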

If Microsoft swapped the basic building blocks used in Groups 1 and 2 for those used by Opera and Firefox (n and t), it would be entirely possible to process all Group 1 and Group 2 patterns. Group 3 and Group 4 patterns would still be a problem, though, as I'm struggling with those in Opera and Firefox too.
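A sketch of why n and t would help: if cells were separated by t and rows by n, a parser could split on those characters alone and then strip spaces from each cell, absorbing both &nbsp; cells and spacing from the HTML. This assumes a simple hypothetical format (t between cells, n after each row) rather than any browser's exact output, and `parse_tab_newline` is an illustrative name.

```python
def parse_tab_newline(text: str) -> list[list[str]]:
    """Parser for a hypothetical format: t between cells, n after each row."""
    rows = text.split("\n")
    # Drop the empty string left by a trailing n after the last row.
    if rows and rows[-1] == "":
        rows.pop()
    # Tabs alone mark cell boundaries, so stray spaces (&nbsp; cells or
    # spaced-out HTML) can simply be stripped from each cell.
    return [[cell.strip() for cell in row.split("\t")] for row in rows]


print(parse_tab_newline("x\t \tx\n \ty\t \n"))  # [['x', '', 'x'], ['', 'y', '']]
```

The trade-off is that any spaces genuinely belonging at the edges of a cell's data are trimmed too. An n added by <p> after a cell's contents would still look exactly like a row break to this parser, which fits with Groups 3 and 4 remaining a problem.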

Ultimately, not a good result, Microsoft! The majority of patterns are impossible to process. To be fair, though, if the basic building blocks were changed as described above, you'd be on par with Opera, which has given me the best results in this study!