HTML table clipboard tests

Note: don't miss the analysis and conclusion further down the page!

Raw Test Data

The raw test data for Internet Explorer can be found here: Internet Explorer 6 & 7 test data, Internet Explorer 8 Beta 1 test data
Test data was gathered for versions 6, 7, and 8 Beta 1, on multiple platforms. Although the Internet Explorer 8 Beta 1 test data differs overall from that of versions 6 and 7, the sections from the actual HTML tables were found to be identical across all three!

Results

Legend: n = newline; t = tab; s = space

The following patterns were collected from between the cells containing actual data on either side of each test. For example, if the test looks at an empty cell at row 2 cell 1, the pattern covering it includes everything between the contents of row 1 cell 5 and row 2 cell 2.
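To make the legend concrete, here is a minimal Python sketch of how a raw clipboard fragment could be turned into this notation. The function name to_pattern is hypothetical, and treating a Windows carriage return plus line feed as a single n is an assumption rather than something stated in the test data.

```python
def to_pattern(fragment: str) -> str:
    """Translate whitespace in a clipboard fragment into the legend letters."""
    # Assumption: a carriage return + line feed pair counts as one newline.
    fragment = fragment.replace("\r\n", "\n")
    legend = {"\n": "n", "\t": "t", " ": "s"}
    # Characters outside the legend are kept as-is so stray data stays visible.
    return "".join(legend.get(ch, ch) for ch in fragment)


# Whitespace between the data in row 1 cell 5 and row 2 cell 2
# when row 2 cell 1 is empty (test 3, Group 1).
print(to_pattern(" \n "))  # prints "sns"
```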

#	Test	Group 1 Patterns	Group 2 Patterns	Group 3 Patterns	Group 4 Patterns
1	Between two normal cells	s	s	ns	ns
2	Between two normal rows	sn	sn	nsn	nsn
3	Empty cell at start of row	sns	snss	nsnns	nsnsns
4	Empty cell at end of row	ssn	sssn	nsnsn	nssnsn
5	Two empty cells at start of row	snss	snssss	nsnnsns	nsnsnssns
6	Two empty cells at end of row	sssn	sssssn	nsnsnsn	nssnssnsn
7	Empty cell in middle of row	ss	sss	nsns	nssns
8	Three empty cells in middle of row	ssss	sssssss	nsnsnsns	nssnssnssns
9	Empty cell at end of row followed by empty cell at start of next row	ssns	sssnss	nsnsnns	nssnsnsns
10	Three empty cells at end of row followed by three empty cells at start of next row	ssssnsss	sssssssnssssss	nsnsnsnsnnsnsns	nssnssnssnsnsnssnssns
12	Entire row empty (Five cells)	snsssssn	snssssssssssn	nsnnsnsnsnsnsn	nsnsnssnssnssnssnsn
13	Empty cells at beginning of first row (One cell)	(n)s	(n)ss	(n)ns	(n)sns
14	Empty cells at beginning of first row (Two cells)	(n)ss	(n)ssss	(n)nsns	(n)snssns
15	Empty cells at beginning of first row (Four cells)	(n)ssss	(n)ssssssss	(n)nsnsnsns	(n)snssnssnssns
16	Empty cells at end of last row (One cell)	ssn(n)	sssn(n)	nsnsn(n)	nssnsn(n)
17	Empty cells at end of last row (Two cells)	sssn(n)	sssssn(n)	nsnsnsn(n)	nssnssnsn(n)
18	Empty cells at end of last row (Four cells)	sssssn	sssssssssn	nsnsnsnsnsn	nssnssnssnssnsn

Note: the parts of the patterns shown in brackets were found for Internet Explorer 6 and 7, but not for Internet Explorer 8 Beta 1!

<thead>, <tbody> and <tfoot> seem to make no difference - good news!
No difference between <th> and <td> seen - good news!
A <br /> tag in the middle of the data splits it with a newline.

Analysis

It is strange that in Internet Explorer 6 and 7, tests 16 and 17 have an extra n on the end of the patterns, but test 18 doesn't. I did a little additional test in Internet Explorer 7 and removed everything between the tables for tests 16 to 18 in Group 2. I discovered that all of the bracketed n's (those that don't appear in IE8 Beta 1) for tests 13 to 18 were no longer present, so they must all have been placed there because of the <h4> and <h5> tags alongside the tables. For the purposes of this study you can therefore completely ignore their presence!

The basic building blocks used by Internet Explorer are an s between cells and sn between rows. Using &nbsp; in empty cells (required for cells to display correctly in certain browsers, such as this one!) inserts a single s as cell data.
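As a rough illustration of these building blocks, the following Python sketch models the text Internet Explorer appears to produce for a table. It is a model inferred from the patterns above, not the browser's actual logic; the function name render_ie_text and its nbsp_for_empty flag are hypothetical.

```python
def render_ie_text(rows, nbsp_for_empty=False):
    """Join cells with a space and end every row with space + newline."""
    lines = []
    for row in rows:
        # With &nbsp; in empty cells, the cell's data is a single space.
        cells = [" " if (cell == "" and nbsp_for_empty) else cell for cell in row]
        lines.append(" ".join(cells) + " \n")
    return "".join(lines)


table = [["a", "b"], ["", "c"]]
print(repr(render_ie_text(table)))                       # 'a b \n c \n'
print(repr(render_ie_text(table, nbsp_for_empty=True)))  # 'a b \n  c \n'
```

Between "b" and "c" the first result contains sns and the second snss, which matches test 3 for Group 1 and Group 2 respectively.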

If you compare the patterns above to those from the Opera browser (use the patterns in the first two tests to break down the patterns), you'll find that with the exception of the final three tests for Group 1 and Group 2, the patterns are actually all the same. The only difference with the last three tests in Group 1 and Group 2 is that Internet Explorer's patterns each have an extra sn on the end (aka row divider).

The effect the <p> tag has on cells is to add a single n after the contents of the cell, whether the cell is completely empty or not.

Spacing out the HTML has the effect of adding a single s after existing cell contents for cells that are not completely empty. It even does this for cells using the <p> tag if enough spacing is added!

Conclusion

The only way to draw a conclusion as to how well these test cases are handled is to see how easy it is to build a parsing algorithm that can convert all of these patterns into a simple format from which the data can then be easily extracted. Let's make a few rules first though:

Our algorithm will not be told how many columns or rows it has been provided with (you couldn't expect a user to have to provide this info).
Any number of cells in the table could be using the <p> tag, and any number of "empty" cells in the table could be using &nbsp;.
Any part of the HTML of the table could be spaced out and therefore introduce additional spaces into the patterns.
One single algorithm must cover everything.

Processing of all Group 1 patterns is possible with the following steps:

sn => [newrow]
s => [newcell]

However, it is impossible to process any of the patterns from the other groups, or even Group 1 patterns once extra spaces are introduced by spacing in the HTML.
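The two substitution steps for Group 1 can be sketched in Python as follows. This is only an illustration working on the pattern letters from the results table; the function name tokenize and the intermediate markers R and C are hypothetical.

```python
def tokenize(pattern: str) -> list:
    """Apply the two Group 1 steps to a pattern written in legend letters."""
    # Replace the longer block first so "sn" is not consumed as a lone "s".
    marked = pattern.replace("sn", "R").replace("s", "C")
    names = {"R": "[newrow]", "C": "[newcell]"}
    # Anything that is neither block is left as-is to show what failed to parse.
    return [names.get(ch, ch) for ch in marked]


print(tokenize("ssns"))  # test 9, Group 1
print(tokenize("ns"))    # test 1, Group 3: a stray n is left over
```

The same sketch also shows why the other groups fail. For Group 2, test 7 (sss) comes out as three [newcell] tokens, which reads as two empty cells rather than one cell whose data is a single space.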

If Microsoft swapped the basic building blocks used in Groups 1 and 2 to those used by Opera and Firefox (n and t), it would be completely possible to process all Group 1 and Group 2 patterns. Group 3 and Group 4 patterns would still be a problem though as I'm having problems with those in Opera and Firefox too.
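For comparison, here is a minimal Python sketch of how simple parsing becomes with n between rows and t between cells. The function name parse_tab_delimited is hypothetical, and the handling of a trailing newline is an assumption, since the exact layout of the Opera and Firefox output is not shown on this page.

```python
def parse_tab_delimited(text: str) -> list:
    """Split clipboard text into rows on newlines and cells on tabs."""
    # Assumption: a single trailing newline closes the last row
    # rather than starting a new, empty one.
    if text.endswith("\n"):
        text = text[:-1]
    return [line.split("\t") for line in text.split("\n")]


print(parse_tab_delimited("a b\tc\n\td\n"))
# [['a b', 'c'], ['', 'd']]
```

Because a tab cannot be mistaken for a space inside cell data, the cell "a b" survives intact and the empty cell at the start of the second row is still recognised.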

Ultimately, not a good result Microsoft! The majority of patterns are impossible to process! Although to be fair, if the basic building blocks were changed as I mentioned above, you'd be on par with Opera, which has given me the best results in this study!
