Web browser HTML table clipboard tests

Explanation of the test page

Overall

The test page itself

The tests are all found on a single web page (link above). When gathering test data, the entire page was highlighted at once (CTRL+A) in the browser being tested, copied (CTRL+C), and then pasted into a text editor, ready for analysis.

The tests are split into six groups. Group 1 tests a variety of situations based around normal HTML tables and empty cells. Group 2 is a copy of Group 1 with an HTML &nbsp; entity placed in otherwise empty cells (this is sometimes done to make such cells display correctly in certain browsers, such as Internet Explorer). Groups 3 and 4 are copies of Groups 1 and 2, but with everything inside the <td> tags wrapped in <p> tags. For some reason this is common practice on the website that the system I am developing needs to copy data from. Groups 5 and 6 test a variety of other odd situations, explained below.
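As a rough illustration of how the cell markup differs between Groups 1 to 4, the sketch below builds one table row in each style. The names render_cell, render_row and GROUP_STYLES are made up for this example, and the handling of an empty cell in Group 3 (an empty <p></p> pair) is an assumption rather than something taken from the test page.

```python
def render_cell(text: str, nbsp_when_empty: bool = False, wrap_in_p: bool = False) -> str:
    """Return one <td> element in the style of one of Groups 1 to 4."""
    content = text
    if content == "" and nbsp_when_empty:
        content = "&nbsp;"  # Groups 2 and 4: placeholder entity in empty cells
    if wrap_in_p:
        content = f"<p>{content}</p>"  # Groups 3 and 4: cell content wrapped in <p>
    return f"<td>{content}</td>"


# Hypothetical mapping from group number to cell style.
GROUP_STYLES = {
    1: dict(nbsp_when_empty=False, wrap_in_p=False),
    2: dict(nbsp_when_empty=True, wrap_in_p=False),
    3: dict(nbsp_when_empty=False, wrap_in_p=True),
    4: dict(nbsp_when_empty=True, wrap_in_p=True),
}


def render_row(cells: list[str], group: int) -> str:
    """Render a list of cell texts as a <tr> in the given group's style."""
    style = GROUP_STYLES[group]
    return "<tr>" + "".join(render_cell(c, **style) for c in cells) + "</tr>"


if __name__ == "__main__":
    # The same row (one empty cell, one filled cell) in all four styles.
    for group in sorted(GROUP_STYLES):
        print(group, render_row(["", "x"], group))
```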

The overall idea is to examine how each browser formats the copied text in all of these situations, in order to build an algorithm that can parse every case tested.
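To give a sense of the kind of parsing involved, here is a minimal sketch of a naive starting point. It assumes that the pasted text uses a tab between cells and a newline between rows, and that an &nbsp; arrives as the character U+00A0. None of these assumptions is guaranteed for any particular browser, which is exactly what the tests are meant to establish. The function name parse_pasted_table is made up for this example.

```python
NBSP = "\u00a0"  # how an &nbsp; entity is assumed to appear in pasted text


def parse_pasted_table(text: str) -> list[list[str]]:
    """Split pasted text into rows of cells.

    Assumptions (not confirmed for any browser): tab-separated cells,
    newline-separated rows, completely blank lines carry no row data.
    """
    rows = []
    for line in text.splitlines():
        if line == "":
            continue  # assumption: a blank line is a separator, not a row
        # Treat a lone non-breaking space the same as an empty cell.
        cells = [cell.replace(NBSP, " ").strip() for cell in line.split("\t")]
        rows.append(cells)
    return rows


if __name__ == "__main__":
    sample = "a\t\tc\n\tb\t" + NBSP + "\n"
    print(parse_pasted_table(sample))
```

A parser this simple cannot tell a missing cell from an empty one if a browser drops the tab for an empty cell, which is why the empty-cell cases in the groups below matter.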

Groups 1 to 4

The section titled 'Test 1' covers a range of situations involving empty cells in the table. First there is an empty cell at the start of a row, then an empty cell at the end of a row. These two tests are then repeated using two empty cells each, which lets me compare the results and identify any repeating blocks. I then look at the effect of an empty cell in the middle of the table and, for comparison, a row of three. Next I look at an empty cell at the end of one row followed by an empty cell at the start of the next, and again follow this with a copy using multiple empty cells. The final test checks what happens when a whole row of empty cells is encountered.
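The first four situations can be pictured as rows like the ones produced below. This is only an illustration: the row width of four, the cell labels and the helper name make_row are assumptions, not the actual contents of the test page.

```python
def make_row(width: int, empty_at: set[int]) -> list[str]:
    """Build a row of labelled cells, leaving the given positions empty."""
    return ["" if i in empty_at else f"c{i + 1}" for i in range(width)]


# Hypothetical layouts for the opening cases of Test 1.
OPENING_CASES = {
    "one empty cell at start of row": make_row(4, {0}),
    "one empty cell at end of row": make_row(4, {3}),
    "two empty cells at start of row": make_row(4, {0, 1}),
    "two empty cells at end of row": make_row(4, {2, 3}),
}

if __name__ == "__main__":
    for label, row in OPENING_CASES.items():
        # Show empty cells as "(empty)" so they are visible in the output.
        print(f"{label}: " + " | ".join(cell or "(empty)" for cell in row))
```

Comparing the one-cell and two-cell versions of the same situation is what makes it possible to see which part of the pasted output repeats once per empty cell.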

The section titled 'Test 2' lets me analyse the effect of an empty cell at the very start and the very end of the table itself, compared with the non-empty cells in those positions in Test 1. Tests 3 and 4 are extensions of Test 2 that use multiple empty cells for further comparison.

Group 5

Group 5 Test 1 duplicates Test 1 of Group 4, but adds <thead>, <tbody> and <tfoot> tags to check whether they have any influence on the resulting patterns.

Group 5 Test 2 looks at the effect of adding a <br /> tag in the middle of a cell's data. I thought it might be interesting to see, although judging by the complications it adds to the results, it is fortunate that this does not occur in the tables I need to process, as far as I have seen. This test also looks, to some extent, at the influence the <p> tag has on the patterns, but I created a proper test for this later on.

Group 5 Test 3 repeats some of the previous first-row and last-row tests, but uses single-row tables and <th> tags instead of <td> tags, to check that column heading tags do not behave differently in any way.

Group 6

Group 6 Test 1 looks at the effect that spacing in the HTML source itself has on the pasted result, which is pretty useful to know. I threw in a variety of situations here to try to be thorough.
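For illustration only, the sketch below lists a few ways the same cell text could be spaced in the HTML source. The specific variants are hypothetical and are not the ones used on the test page. The collapse_whitespace helper roughly mimics how a browser renders ordinary text (runs of spaces, tabs and line breaks become a single space). Whether the clipboard text follows the same rule is what this test is checking, so the helper is a point of comparison and not a description of any browser's copy behaviour.

```python
import re

CELL_TEXT = "Cell text"

# Hypothetical source spacings for the content of a single <td>.
SPACING_VARIANTS = {
    "compact": CELL_TEXT,
    "padded with spaces": f"  {CELL_TEXT}  ",
    "padded with tabs": f"\t{CELL_TEXT}\t",
    "on its own indented line": f"\n    {CELL_TEXT}\n  ",
}


def collapse_whitespace(source: str) -> str:
    """Collapse runs of ASCII whitespace to one space and trim the ends."""
    return re.sub(r"[ \t\n\r\f]+", " ", source).strip()


if __name__ == "__main__":
    for label, source in SPACING_VARIANTS.items():
        print(f"{label}: {source!r} -> {collapse_whitespace(source)!r}")
```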

With Group 6 Test 2 I intended to check what effect various other, more obscure kinds of HTML table content would have, and I started by using an <h5> tag inside a cell. In the end, though, I realised that there is a huge number of possible tables of this kind, and since I have yet to see any that the system I am producing would need to process, I never went any deeper into testing this. This test can be ignored.

Group 6 Test 3 was a late addition. It let me determine exactly what influence the <p> tag has on formatting, which in turn helped me understand and analyse the data I had collected, especially for Mozilla Firefox.