Web browser HTML table clipboard tests

Results and analysis of Konqueror

Note: don't miss the analysis and conclusion further down the page!

Raw Test Data

The raw test data for Konqueror can be found here: Konqueror test data
Test data was gathered for version 3.9.5, on Linux.

Results

Legend: n = newline; t = tab; s = space

The following patterns were collected from between the cells containing actual data on either side of each test. For example, if the test looks at an empty cell at row 2, cell 1, the pattern covers everything between the contents of row 1, cell 5 and the contents of row 2, cell 2.
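As a rough illustration of how such a pattern can be read off pasted text, here is a minimal Python sketch. The helper names (`to_pattern`, `pattern_between`) and the sample string are made up for this example and are not part of the original test harness; only the n/t/s legend comes from this page.

```python
# Legend from this page: n = newline, t = tab, s = space.
LEGEND = {"\n": "n", "\t": "t", " ": "s"}


def to_pattern(whitespace: str) -> str:
    """Encode a run of whitespace using the n/t/s legend ('?' for anything else)."""
    return "".join(LEGEND.get(ch, "?") for ch in whitespace)


def pattern_between(clipboard_text: str, before: str, after: str) -> str:
    """Return the encoded text found between two known cell values."""
    start = clipboard_text.index(before) + len(before)
    end = clipboard_text.index(after, start)
    return to_pattern(clipboard_text[start:end])


# Hypothetical clipboard text: two data cells separated by a newline and two spaces.
sample = "r1c5\n  r2c2"
print(pattern_between(sample, "r1c5", "r2c2"))  # nss
```

The sketch assumes the two cell values are unique in the pasted text; repeated values would need positions rather than substring searches.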

# | Test | Group 1, Group 2, Group 3 and Group 4 patterns (run together)
1 | Between two normal cells | ssssnssnss
2 | Between two normal rows | nssnssnssnss
3 | Empty cell at start of row | nssnsssssnssnsssnss
4 | Empty cell at end of row | nsssssnssnssnsssnss
5 | Two empty cells at start of row | nssnssssssssnssnsssnsssnss
6 | Two empty cells at end of row | nssssssssnssnssnsssnsssnss
7 | Empty cell in middle of row | sssssssnssnsssnss
8 | Three empty cells in middle of row | sssssssssssssnssnsssnsssnsssnss
9 | Empty cell at end of row followed by empty cell at start of next row | nsssssnsssssnssnsssnsssnss
10 | Three empty cells at end of row followed by three empty cells at start of next row | nsssssssssssnsssssssssssnssnsssnsssnsssnsssnsssnsssnss
12 | Entire row empty (Five cells) | nssnsssssssssssssssnssnssnsssnsssnsssnsssnsssnss
13 | Empty cells at beginning of first row (One cell) | sssnssnsnss
14 | Empty cells at beginning of first row (Two cells) | ssssssnssnsnsssnss
15 | Empty cells at beginning of first row (Four cells) | ssssssssssssnssnsnsssnsssnsssnss
16 | Empty cells at end of last row (One cell) | nsssnnnsssn
17 | Empty cells at end of last row (Two cells) | nssssssnnnsssnsssn
18 | Empty cells at end of last row (Four cells) | nssssssssssssnnnsssnsssnsssnsssn
  • <thead>, <tbody> and <tfoot> seem to make no difference - good news!
  • There is an apparent difference between <th> and <td> tags! Cells are separated with an n rather than nss. I am unsure whether there are any other differences.
  • A <br /> tag in the middle of a cell's data splits that data with a newline.

Analysis

You know what, there may be some logic to how the patterns in Groups 2 and 4 are constructed, but honestly they are a complete mess, so I'm not going to waste time trying to analyse them!

Conclusion

The only way to judge how well these test cases are handled is to see how easy it is to build a parsing algorithm that converts all of these patterns into a simple format from which the data can then be easily extracted. Let's set a few rules first, though:

  1. Our algorithm will not be told how many columns or rows it has been provided with (you couldn't expect a user to have to provide this info).
  2. Any number of cells in the table could be using the <p> tag, and any number of "empty" cells in the table could be using &nbsp;.
  3. Any part of the HTML of the table could be spaced out and therefore introduce additional spaces into the patterns.
  4. One single algorithm must cover everything.
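To make the difficulty concrete, here is a small Python sketch of the most obvious parsing rule: a gap containing a newline starts a new row, and a run of two or more other whitespace characters separates cells. The function `naive_rows` and both sample strings are invented for illustration. The only detail taken from the results above is that <th> cells were seen separated by a single n.

```python
import re


def naive_rows(clipboard_text: str) -> list[list[str]]:
    """Naive rule (an assumption for illustration, not a tested method):
    a gap with a newline starts a new row; two or more other
    whitespace characters separate cells within a row."""
    rows = []
    for line in re.split(r"\s*\n\s*", clipboard_text.strip()):
        rows.append(re.split(r"\s{2,}", line))
    return rows


# Hypothetical data rows: cells separated by "ss", rows by "nss".
print(naive_rows("a  b  c\n  d  e  f"))
# [['a', 'b', 'c'], ['d', 'e', 'f']]

# Hypothetical header row of <th> cells separated by a single "n":
# every header cell is wrongly reported as its own row.
print(naive_rows("h1\nh2\nh3\n  a  b  c"))
# [['h1'], ['h2'], ['h3'], ['a', 'b', 'c']]
```

Under rule 1 the parser cannot use a known column count to stitch the header back together, and under rule 3 it cannot rely on the exact number of spaces either.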

Wow, it's simply impossible to do anything with these patterns. Terrible job, Konqueror!