Comparison of 2 tables in excel lesson. How to compare two columns in Excel for matches

Sometimes there is a need to compare two MS Excel files. This may be finding discrepancies in prices for certain items or changing any indications, it doesn’t matter, the main thing is that it is necessary to find certain discrepancies.

It would not be amiss to mention that if there are a couple of records in the MS Excel file, then there is no point in resorting to automation. If the file contains several hundred, or even thousands of records, then it is impossible to do without the help of the computing power of a computer.

Let's simulate a situation where two files have the same number of lines, and the discrepancy must be looked for in a specific column or in several columns. This situation is possible, for example, if you need to compare the price of goods according to two price lists, or compare measurements of athletes before and after the training season, although for such automation there must be a lot of them.

As a working example, let's take a file with the performance of fictitious participants: 100-meter run, 3000-meter run, and pull-ups. The first file is a measurement at the beginning of the season, and the second is the end of the season.

The first way to solve the problem. The solution is only using MS Excel formulas.

Since the records are arranged vertically (the most logical arrangement), it is necessary to use the function. If you use horizontal placement of records, you will have to use the function.

To compare 100 meter running performance, the formula is as follows:
=IF(VLOOKUP($B2,Sheet2!$B$2:$F$13,3,TRUE)<>D2;D2-VLOOKUP($B2;Sheet2!$B$2:$F$13,3,TRUE);"No difference")
If there is no difference, a message is displayed that there is no difference; if it is present, then the value at the end of the season is subtracted from the value at the end of the season.

The formula for the 3000 meter run is as follows:
=IF(VLOOKUP($B2,Sheet2!$B$2:$F$13,4,TRUE)<>E2;"There is a difference";"There is no difference")
If the final and initial values ​​are not equal, a corresponding message is displayed. The formula for pull-ups can be similar to any of the previous ones; there is no point in giving it additionally. The final file with the discrepancies found is shown below.

A little clarification. To make the formulas easier to read, the data from the two files was moved into one (on different sheets), but this could not have been done.

Video comparing two MS Excel files using and functions.

The second way to solve the problem. Solution using MS Access.

This problem can be solved if you first import MS Excel files into Access. As for the method of importing external data itself, there is no difference in finding different fields (any of the presented options will do).

The latter represents the connection Excel files and Access, so when you change data in Excel files, discrepancies will be found automatically when you run a query in MS Access.

The next step after importing is to create relationships between tables. As a connecting field, select the unique field “Item No.”
The third step is to create a simple select query using the Query Builder.

In the first column we indicate which records need to be displayed, and in the second - under what conditions the records will be displayed. Naturally, for the second and third fields the actions will be similar.

Video comparing MS files to Excel using MS Access.

As a result of the manipulations performed, all records are displayed, with different data in the field: “Running 100 meters.” The MS Access file is presented below (unfortunately, SkyDrive does not allow embedding as an Excel file)

These two methods exist for finding discrepancies in MS Excel tables. Each has both advantages and disadvantages. Obviously, this is not an exhaustive list of comparisons between the two Excel files. We are waiting for your suggestions in the comments.

Information presented in the form of tables is much more convenient to analyze and use in various calculations, but when it is necessary to compare data from several similar tables, it is very difficult to do all this visually. Suitable software can always help out in such a situation, and next we will look at how to compare two tables in Excel using different analysis methods.

Unfortunately, you won’t be able to compare tables in Excel with the click of one button, and what’s more, you may also have to prepare the data in some way and write a formula for comparison.

Depending on the desired result, the method for comparing data from tables is selected. The simplest way is to compare two seemingly identical columns to identify rows in which this difference still exists. You can compare both numeric values ​​and text in this way.

Let's compare two columns of digital values ​​in which there is a difference in only a few cells. By writing a simple formula in the adjacent column, the condition for equality of two cells "=B3=C3", we will get the result "TRUE", if the contents of the cells are the same, and "LIE", if the contents of the cells are different. By stretching the formula across the entire height of the column of compared values, it will be very easy to find the different cell.

If you just need to verify the presence or absence of differences in the columns, you can use the menu item "Find and Select", on the tab "Home". To do this, you need to first select the columns being compared, and then select the required menu item. In the drop-down list you must select “Select a group of cells...”, and in the window that appears select "differences by line".

Conditional formatting of differences in ordered values
If desired, you can apply conditional formatting to different cells by filling the cell, changing the text color, etc. In this case, you need to select the item "Conditional Formatting", in the drop-down list of which we select "Rule Management".

In the rules manager, select the item "Create Rule", and in creating rules we select . Now we can set the formula "=$B3<>$C3" to define the cell to be formatted, and set the format for it by clicking on the button "Format".

Now we have a cell selection rule, formatting has been set, and the range of cells being compared has been defined. After pressing the button "OK", the rule we set will be applied.

Comparing and formatting differences in unordered values
Comparing Excel tables is not limited to comparing ordered values. Sometimes you have to compare ranges of mixed values ​​in which you need to determine whether one value fits within a range of other values.

For example, we have a set of values, formatted as two columns, and another set of values ​​of the same type. In the first set we have all the values ​​from 1 to 20, and in the second set some values ​​are missing and duplicated by other values. Our task is to use conditional formatting to highlight values ​​in the first set that are not in the second set.

The procedure is as follows: select the first data set, which we call "Column 1", and in the menu "Conditional Formatting" select an item "Create a rule...". In the window that appears, select , enter the required formula "=COUNTIF($C$3:$D$12,A3)=0" and select the formatting method.

Our formula uses the function "COUNTIF", which counts the number of times a value from a specific cell is repeated "A3" within a given range "$C$3:$D$12", which is our second column. The comparison cell must be the first cell in the range of values ​​to which formatting will be applied.

After applying the created rule, all cells with non-repeating values ​​in another set of values ​​will be highlighted with the specified color.

Of course, there are more complex options for comparing two tables in Excel, such as comparing cents of goods in the new and old price lists. Let's say there are two tables with prices, and next to the prices in the new table you need to indicate the old prices for each product, and the order of the products in the lists is not respected.

Next to the prices in the new table, in the cell of the next column, you need to write a formula that will select the values. In the formula we will use the function "VPR", which can return a value from any column in the row in which the search condition was met. For the function to work correctly, it is necessary that each row in the column contains unique values, which will be searched. If values ​​are repeated, only the first one found will be taken into account.

The formula we need will look like this: "=VLOOKUP(B18,$B$3:$C$10,2,FALSE)". First value "B18" corresponds to the first cell of the desired product name. Second meaning "$B$3:$C$10" means the permanent address of the range of the old price table, the values ​​from which we will need. Third meaning "2" means the column number from the selected range, in the cell of which we will take the old price of the product. And the last meaning "LIE" specifies a search only by exact match of values. After dragging the formula over the entire column of the new table, we will receive in this column the old price values ​​​​for each item available in the new table. Opposite the name of the last product, the formula displays an error message "#N/A", which indicates the absence of this name in the old price list.

There can be countless options for comparing tables in Excel, and some of them can only be done using a VBA add-in.

1. Statement of the problem

Let there be two tables of values ​​that have the same composition and column types. It is required to compare these tables in order to determine the differences, available between them.

Thinking about the conditions of the problem using the most common circumstances, we will additionally establish that:

  1. The different order of the same rows in two tables does not make the tables different (in problems where the order of the rows is important, you can always add a column with the row number to notice their rearrangement);
  2. There cannot be two identical rows in one table (and if there is such a thing, you can always perform a convolution across all columns, counting identical rows in the added column - this will simplify the interpretation of the comparison results).
  3. Tables are compared by directly comparing the values ​​of their elements or references. If table elements contain collections, then only references to the collections are compared without attempting to determine whether their contents are equal.

The second refinement automatically leads to the fact that the table will always have one or more columns, the value (combination of values) in which will be unique and can serve as a row identifier. Such a column (set of columns) can be called a key: simple in the case of one column or compound in a more complex case. Better yet, by analogy with registers, call the mentioned columns table dimensions, and the remaining ones - resources.

Selecting dimension columns allows, when comparing tables, to establish not only the fact of deleting or adding a row, but also the fact of changing a row if resources have changed in the same set of dimensions.

For example, when comparing tables of values ​​obtained from the balance sheet of the raw materials and supplies account, the dimensions will be the columns containing the item and warehouse, and the resources will be the balances and turnover of the account. And when comparing the “Products” tabular parts, the dimensions will be nomenclature, characteristics and series, and the resources will be all other details of this tabular part. And then, by comparing the versions of the tabular parts, it will be possible to say that such and such nomenclature was removed or added, and such and such was changed.

When setting up the problem, we also determine the form for presenting the comparison results. This is the most vulnerable solution to criticism. Because the result of the competition of methods depends on it. One form may be convenient for one method, a second for another, a third for a third, but practice, due to the variety of tasks and situations, does not help the answer.

After much hesitation, the following decision was made: the result of comparing two tables Table0 and Table1 should be a “Difference” table of the same structure as the tables being compared. “Difference” should contain different rows of two tables (deleted, added, changed). In this case, in the additional column “Sign” there should be a mark: 0 - if the line is in Table0 and 1 - if the line is in Table1. This can be interpreted as 0 - row deleted, 1 - added, or 0 - row before change, 1 - after. In addition (attention!), rows with the same measurement values ​​must be located one below the other, which provides a convenient way for visual control to “link” the rows before and after the change.

For example, if you compare the “Grade 7” table with the “Grade 8” table using the proposed method, you should get a “Difference” table.

7th grade 8th grade Difference
Item Grade Item Grade Item Grade Sign
Singing 5 Literature 5 Singing 5 0
Literature 5 Algebra 4 Algebra 5 0
Algebra 5 Physics 5 Algebra 4 1
Physics 5 Chemistry 4 Chemistry 4 1

Well, one last thing. Not so often, but there are still cases when tables already ordered by key fields are compared. Let's add this condition to the problem in order to expand the set of tested algorithms with a method that is specially tailored for this case.

2.Evaluation criteria and test methods

The main evaluation criterion is naturally to choose the time of comparison. An additional criterion can be the simplicity of the comparison function. The comparison execution time can be measured using processing specially created for this purpose. The simplicity of functions is proposed to be assessed subjectively.

The processing created for testing generates a table of values ​​with a specified number of rows and columns and a specified number of dimensions. The data type of the elements is selected from a limited list of primitive types: string, number and date, and the length of the value can also be specified. The table element values ​​are generated randomly. By changing the first table, the second one is formed. The number of changes is specified as a percentage of the number of rows in the first table by three different indicators: the percentage of deletions, changes and additions. The number of repetitions is also specified to determine the average running time of the method. All tested methods are run one after another on the same test tables. The processing used in testing is attached to this publication so that the results can be rechecked on other equipment and in another software environment.

3. Brief description of the compared methods

A total of seven different methods were selected for detailed testing:

3.1. Collapse and sort

The essence of the method is to combine tables by adding one row from the first table to the second in a loop. Then an additional column “Account” is added for subsequent counting of identical rows. The calculation is done by convolution over all columns. This determines the same and different rows in the first and second tables. Those rows that appear one at a time in the combined table are rewritten into a table of differences, which is then sorted by dimension so that the rows before and after the changes are side by side. Here is the code for this function

FunctionTableValueDifference(Table0, Table1, Dimensions) Export AllColumns = ""; For Each Column From Table0.Columns Cycle AllColumns = AllColumns + ", " + Column.Name EndCycle; AllColumns = Avg(AllColumns, 2); Table = Table1.Copy(); Table.Columns.Add("Sign", New TypeDescription("Number")); Table.FillValues(1, "Sign"); For Each Row From Table0 Loop FillPropertyValues(Table.Add(), Row) EndLoop; Table.Columns.Add("Account"); Table.FillValues(1, "Account"); Table.Collapse(AllColumns, "Sign, Account"); Answer = Table.Copy(New Structure("Account", 1), AllColumns + ", Sign"); Answer.Sort(Dimensions); Return Reply EndFunction

3.2 Trick, folding and sorting

This function is a slight modification of the previous function due to the fact that the addition of the first table to the second occurs not in rows, but in columns. This is in certain range conditions speeds up the operation of joining tables

FunctionTableValueDifference(Table0, Table1, Dimensions) Export AllColumns = ""; For Each Column From Table0.Columns Cycle AllColumns = AllColumns + ", " + Column.Name EndCycle; AllColumns = Avg(AllColumns, 2); Table = Table1.Copy(); Table.Columns.Add("Sign", New TypeDescription("Number")); Table.FillValues(1, "Sign"); For е = 1 By Table0.Quantity() Cycle Table.Insert(0) EndCycle; For е = 0 By Table0.Columns.Quantity() - 1 Cycle Table.LoadColumn(Table0.UnloadColumn(е), е) EndCycle; Table.Columns.Add("Account"); Table.FillValues(1, "Account"); Table.Collapse(AllColumns, "Sign, Account"); Answer = Table.Copy(New Structure("Account", 1), AllColumns + ", Sign"); Answer.Sort(Dimensions); Return Reply EndFunction

3.3. Join by index

This feature is built on a simple and clear idea. The loop iterates through the rows of the first table. For each row, an attempt is made to find a row in the second table that corresponds to it by dimension value, using the “Find Rows” method. The resources of the found rows are then compared for discrepancies, the found row in the second table is marked with zero, in order to then select the unmarked “one” rows as missing in the first table. To make the FindRows method work quickly, one index is created for the second table over the entire set of dimensions.

Function DifferenceTableValues(Table0, Table1, Dimensions) Export Selection = New Structure(Dimensions); Resources = New Array; For ColumnIndex = 0 By Table0.Columns.Quantity() - 1 Cycle If NOT Select.Property(Table0.Columns[ColumnIndex].Name) Then Resources.Add(ColumnIndex) EndIf EndCycle; Table1.Columns.Add("Sign", New TypeDescription("Number")); Table1.FillValues(1, "Sign"); NewIndex = Table1.Indexes.Add(Dimensions); Difference = Table1.CopyColumns(); For Each Row0 From Table0 Loop FillPropertyValues(Selection, Row0); Rows1 = Table1.FindRows(Select); If Rows1.Quantity() = 0 Then FillPropertyValues(Difference.Add(), Row0) Else Row1 = Rows1; For Each Resource From Resources Loop If Row0[Resource]<>Row1[Resource] Then FillPropertyValues(Difference.Add(), Row0); FillPropertyValues(Difference.Add(), Line1); Abort EndIfLoopEnd; Line1.Sign = 0 EndIfEndCycle; For Each Row1 From Table1.FindRows(New Structure("Sign", 1)) Loop FillPropertyValues(Difference.Add(), Row1); EndCycle; Table1.Columns.Delete("Sign"); Table1.Indices.Delete(NewIndex); Refund Difference EndFunctions

3.4. Compliance connection

This function is algorithmically the same as the previous one, except that instead of a regular index, a “homemade” match-based index is used. To do this, the second table is first traversed, as a result of which references to its rows are remembered in the search tree built on the basis of the correspondence

Function DifferenceTablesValues_(Table0, Table1, Dimension Row) Export Table1.Columns.Add("Sign", New TypeDescription("Number")); Table1.FillValues(1, "Sign"); DimensionsStructure = New Structure(DimensionsString); Dimensions = New Array; Resources = New Array; For Index = 0 By Table0.Columns.Quantity() - 1 Loop ColumnName = Table0.Columns[Index].Name; If DimensionsStructure.Property(ColumnName) Then Dimensions.Add(Index) Else Resources.Add(Index) EndIf EndCycle; DimensionPlus = Dimensions[Dimensions.Quantity() - 1]; Dimensions.Delete(Dimensions.Quantity() - 1); HashMap = New Match; For Each Row1 From Table1 Loop Root = HashMap; For Each Dimension From Dimension Loop KeyPart = Row1[Dimension]; Branch = Root[KeyPart]; If Branch = Undefined Then Branch = New Match; Root[KeyPart] = BranchEndIf; Root = Branch EndCycle; KeyPart = Row1[DimensionPlus]; Root[PartKey] = Row1EndCycle; Dimensions.Add(DimensionPlus); Difference = Table1.CopyColumns(); For Each Row0 From Table0 Loop Root = HashMap; For Each Dimension From Dimension Loop KeyPart = Row0[Dimension]; Branch = Root[KeyPart]; If Branch = Undefined Then FillPropertyValues(Difference.Add(), Row0); Abort EndIf; Root = Branch EndCycle; If Branch<>Undefined Then For Each Resource From Resources Loop If Row0[Resource]<>Branch[Resource] Then FillPropertyValues(Difference.Add(), Row0); FillPropertyValues(Difference.Add(), Branch); Abort EndIfLoopEnd; Branch.Sign = 0 EndIfEndCycle; For Each Row1 From Table1.FindRows(New Structure("Sign", 1)) Loop FillPropertyValues(Difference.Add(), Row1); EndCycle; Table1.Columns.Delete("Sign"); Return Difference EndFunctions

3.5. Merger

This function assumes that the tables being compared are sorted by key dimensions. During its operation, the rows of two tables are read in turn, compared with each other so that the resulting output is a merged, ordered table without identical rows.

Function DifferenceTablesValues_(Table0, Table1, Dimension Row) Export Table1.Columns.Add("Sign", New TypeDescription("Number")); Table1.FillValues(1, "Sign"); Difference = Table1.CopyColumns(); DimensionsStructure = New Structure(DimensionsString); Dimensions = New Array; Resources = New Array; For Index = 0 By Table0.Columns.Quantity() - 1 Loop ColumnName = Table0.Columns[Index].Name; If DimensionsStructure.Property(ColumnName) Then Dimensions.Add(Index) Else Resources.Add(Index) EndIf EndCycle; Compare = New CompareValues; Index1 = Table0.Quantity() - 1; Index2 = Table1.Quantity() - 1; Row1 = Table0[Index1]; Row2 = Table1[Index2]; While True Loop For Each Dimension From Dimension Loop Comparison Result = Compare.Compare(Row1[Dimension], Row2[Dimension]); If Comparison Result<>0 Then Abort EndIf End Of Cycle; If Comparison Result = 0 Then For Each Resource From Resources Loop If Row1[Resource]<>Line2[Resource] Then FillPropertyValues(Difference.Add(), Line1); FillPropertyValues(Difference.Add(), Line2); Abort EndIfLoopEnd; Index1 = Index1 - 1; Index2 = Index2 - 1; If Min(Index1, Index2)< 0 Тогда Прервать КонецЕсли; Строка1 = Таблица0[Индекс1]; Строка2 = Таблица1[Индекс2]; ИначеЕсли РезультатСравнения >0 Then FillPropertyValues(Difference.Add(), Line1); Index1 = Index1 - 1; If Index1< 0 Тогда Прервать КонецЕсли; Строка1 = Таблица0[Индекс1] Иначе ЗаполнитьЗначенияСвойств(Разница.Добавить(), Строка2); Индекс2 = Индекс2 - 1; Если Индекс2 < 0 Тогда Прервать КонецЕсли; Строка2 = Таблица1[Индекс2] КонецЕсли КонецЦикла; Пока Индекс1 >= 0 Loop Row1 = Table0[Index1]; FillPropertyValues(Difference.Add(), Line1); Index1 = Index1 - 1 EndCycle; While Index2 >= 0 Loop Row2 = Table1[Index2]; FillPropertyValues(Difference.Add(), Line2); Index2 = Index2 - 1 EndCycle; Table1.Columns.Delete("Sign"); Return Difference EndFunctions

3.6. Query - Full Connection

The function is based on passing two tables into a query, where they are joined by equal values ​​in the dimensions. A slight complication is associated with the subsequent “unfolding” in two lines of lines,differing in resources.

Function LinePart(String, Separator) ExportSeparatorPosition = Find(String, Separator); If SeparatorPosition = 0 Then Response = New Array; Answer.Add(String); Otherwise Answer = StrPart(Avg(Row, SeparatorPosition + StrLength(Separator)), Separator); Answer.Insert(0, Avg(Row, 1, SeparatorPosition - 1)) EndIf; Return Answer EndFunction Function DifferenceTableValues(Table0, Table1, Dimensions) Export Query = New Query("SELECT | 0 AS Sign(), T.Field() | PLACE T0 | FROM | &Table0 AS T |; | |///// //////////////////////////////////////////////// ////////////////////// |SELECT |1 AS Sign(), T.Field() |PLACE T1 |FROM &Table1 AS T |; |//////////////////////////////////////////// /////////////////////////// |SELECT | 0 AS SIGN |PUT SIGNS |SELECT | ; |///////////////////////////////////////// //////////////////////////// |SELECT() | SELECT Signs.Sign | WHEN 0 | | ELSE T1.Field | END AS Field, () | Signs | T0 AS T0 | Signs | WHERE | (()T0.Field IS NULL AND Signs.Sign = 1 | OR T1.Field IS NULL AND Signs.Sign = 0 | () OR T0.Field<>T1.Field()) | |ORDER BY | ()Field"); DimensionStructure = New Structure(Dimensions); Sections = StrParts(Query.Text, "()"); Query.Text = Sections; For Each Column From Table1.Columns Cycle Query.Text = Query.Text + StrReplace(Sections, "Field", Column.Name) EndCycle; Request.Text = Request.Text + Sections; For Each Column From Table1.Columns Cycle Request.Text = Request.Text + StrReplace(Sections, "Field", Column. Name) EndCycle; Request.Text = Request.Text + Sections; For Each Column From Table1.Columns Cycle Request.Text = Request.Text + StrReplace(Sections, "Field", Column.Name) EndCycle.Text = Request. Text + Sections; For Each Element From Dimension Structure Cycle Request.Text = Request.Text + StrReplace(Sections, "Field", Element.Key) EndCycle Request.Text = Request.Text + Sections; Request.Text = Request.Text StrReplace(Sections, "Field", Table1.Columns.Name); For Each Column From Table1.Columns Loop If NOT DimensionStructure.Property(Column.Name) Then Query.Text = Query.Text + StrReplace(Sections, "Field", Column.Name) EndIf EndCycle; Query.Text = Query.Text + Sections; Query.Text = Query.Text + StrReplace(Sections, "Field", Dimensions); Query.SetParameter("Table0", Table0); Query.SetParameter("Table1", Table1); Return Request.Execute().Unload() EndFunction

3.7. Request - grouping

This function is built on exactly the same idea as function 3.1, only implemented inside the request

Function DifferenceTableValues(Table0, Table1, Dimensions) Export Query = New Query("SELECT | 0 AS Sign, | T.Field | PLACE T0 | FROM | &Table0 AS T |; | |/////////// //////////////////////////////////////////////// ///////////////// |SELECT | 1 AS Sign, | T.Field | PLACE T1 | //////////////////////////////////////////////// //////////////////// |SELECT |T.Sign, |T.Field |PLACE T |T0 AS T |SELECT ALL | | T. Sign, | T. Field | T1 AS T | |///////////////////////// ////////////////////////////////////////////// | SELECT | SUM (T. Sign) | T. AS T | | GROUP BY | | HAVING | AllColumns = ""; For Each Column From Table1.Columns Cycle AllColumns = AllColumns + ", T." + Column.Name EndCycle; Query.Text = StrReplace(Query.Text, "T.Field", Medium(AllColumns, 2)); Query.Text = StrReplace(Query.Text, "Field//", Dimensions); Query.SetParameter("Table0", Table0); Query.SetParameter("Table1", Table1); Return Request.Execute().Unload() EndFunction

All the functions presented here have been carefully tuned to achieve maximum performance. Taking into account the experience gained during joint tuning on the function forum for the one-dimensional case. However, this was not done as carefully as there, so perhaps a little more performance could be squeezed out of some functions.

4. Test results

4.1 Effect of number of lines

Let's study the dependence of comparison time on the number of rows in tables. To do this, we use the following values ​​of testing parameters. Number of rows – 20000, 40000, 60000, 80000, 100000, number of columns – 10, number of key columns – 1, data type – string, row length – 10, percentage of deletions, changes, additions – 5, number of test repetitions – 2. We get the following dependence, which is more convenient to present in graph form.

This relationship is almost linear for most methods! That's how it should be. The running time of the FindRows method in the presence of an index does not depend on the number of rows, so a join by index is performed in linear time. The same is true when using match and merge. With a full join, the query most likely uses a hash match to join tables of equal size.

The non-linearity of sorting time for a relatively small number of different rows slightly deviates from the direct relationship for convolution. The method using merging by copying columns is worse - it is this method of copying that introduces significant nonlinearity in addition to the slight nonlinearity of sorting. Because of this, the benefit of using the “trick” of joining tables with more than 60,000 rows is lost.

4.2 Effect of value length

Now let's examine the dependence of time on the length of string values. Let's set the number of lines equal to 50000. The remaining parameters are the same as in 4.1. We present the result in the form of a bar chart. It better shows the ratio of the operating time of different methods and allows you to identify the leader, which in most cases is the convolution method.

It can be seen that the dependence of time on the method practically does not change when the string length changes. Only the query execution time increases.

To make this diagram more descriptive of query methods, it separates the time it takes to enter tables into a query into separate dimensions. For this purpose, a dummy function has been created that only performs entering tables into the query and does not perform any other work. The long time it takes to enter tables shows that it is very difficult for query technology to compete with leading methods. In many cases, leaders have already finished their work by the time the initial data is just in the request.

4.3 Impact of data types

Next interesting question– relation of methods to data types. The following diagram shows it. Here the number of lines is also 50000, the length of the string and numeric values ​​is 10. The rest is as in 4.1.

It shows that the data type has the greatest impact on the time of query methods. For numbers, grouping works better. And dates are processed very well.

4.3 Effect of the number of columns

Another dependence is the dependence of comparison time on the number of columns. The following diagram shows it. The number of lines here is 50,000, the data type is a string of length 10, the percentage of additions, distortions and deletions is 5. One key column.

It can be seen that the number of columns does not greatly change the comparative speed of the methods. The biggest slowdown in query performance is increasing the number of columns.

4.4 Effect of number of measurements

More interesting is the dependence on the number of key columns shown below. The number of rows here is 50,000, the data type is a string of length 10, the percentage of additions, distortions and deletions is 5. There are 10 columns in total.

It can be seen that the correspondence-based method, which previously showed good results, is now an outsider. Fusion also worsens. But searching by index is improved - due to the fact that there are fewer columns left to compare.

4.5 Impact of differences in table sizes

Now let's pay attention to the asymmetry of methods 1 - 4 (convolution and join) regarding the sizes of the compared tables. All of these methods benefit from making the first table smaller! This is confirmed by the following table, which shows the time it takes to compare two tables of 50,000 and 40,000 rows in different orders.

There is an interesting artifact visible in the diagram above. Given the number of rows and columns, it turns out to be more profitable to add 50 thousand rows in a loop to a table of 40 thousand rows than vice versa. This may be due to the peculiarities of memory allocation for the value table.

4.6 Effect of number of differences

And finally, let’s examine the dependence of the comparison time on the degree of difference between the tables. It can be seen that as the percentage of discrepancies increases, the convolution operating time slows down. Since the nonlinearity of sorting begins to play a role.

4.7 Influence of hardware and software environment

Tests were performed on platform 8.3.5.1248 on a VGN-Z51MRG laptop. The obtained dependencies are generally confirmed on other equipment, but there are also some features that have not yet been generalized.

5. Conclusions

5.1. The simplest convolution method turns out to be the most productive in most cases. It should be used as a universal method, but not in special cases.

5.2 If the size is small (up to 50,000 rows), you can get additional speedup for the folding by using column copying when merging tables (method 3.2).

5.3 In the special case of one key column, a significant number of differences, and a significant difference in table sizes, a match join should be used. The same should be done even if there are several key tables, but the comparison is made with the same table, for which you can prepare in advance “decision tree” based on compliance,tuned to its characteristics.

5.4 In the special case of several key columns with a significant number of differences and the tables being compared are not sorted, you need to use the join method by index.

5.5 For the greatest efficiency of methods 1-4, you need to choose the correct order of indicating tables when comparing.

5.6 In the special case of sorted tables when there are a significant number of differences, merging should be used.

5.7. In the special case of large (depending on the equipment) and approximately equal in size tables, which, moreover, have significant differences and consist of short rows and an extremely small number of columns, it is possible to use queries.

5.8 If tables are dominated by numeric data, dates, medium and long strings, then table comparison queries should use grouping, and only for very short strings a full join.

6. General conclusions

6.1 In any case, before making a decisive choice, it is better, if possible, to compare several methods under real conditions of their use. For example, using the processing attached to the article.

6.2 Taking into account the features of the data in the tables allows for targeted additional optimization of most of the methods presented. There are many possibilities for this that remain outside the scope of issues considered.

6.3 Entering value tables into queries can take considerable time, which in most cases negates the effectiveness of their use in tasks where data is taken from memory rather than from the database. Thoughtless use of queries in this task is a harmful misconception.

6.4 The running time of the FindRows method, if there is an index on the columns included in the selection, does not depend on the size of the values ​​table. Thus, the correct estimate for the performance of the table comparison method using a join by index is O(N).

, who proposed, implemented and debugged their methods for the one-dimensional case, made many useful amendments and considerations, and also actively participated in all discussions. Special thanks to the sponsors of that very thread and - for an interesting question.

Sometimes you want to view only those records in an Access table that match records in another table that contain fields with matching data. For example, you might want to review the records of employees who processed at least one order to determine who deserves a promotion. Or you may need to review the contact details of clients who live in the same city as the employee to arrange a face-to-face meeting.

If you need to compare two Access tables and find matching data, there are two options.

    Create a query that joins fields from each table that contain matching data, using an existing relationship or a join that you created for the query. This method is optimal in terms of the speed of returning query results, but does not allow combining fields with data of different types.

    Create a field comparison query that uses one field as a condition for another. This method usually takes longer because a join removes rows from the query results before the base tables are read, whereas the conditions are applied to the query results after those tables are read. But the field can be used as a condition for comparing fields with data of different types, which cannot be done when using unions.

This article discusses comparing two tables to identify matching data and provides sample data that can be used in example procedures.

In this article

Comparing two tables using joins

To compare two tables using joins, you must create a select query that includes both tables. If there is no relationship between the tables yet based on the fields containing the required data, you need to create joins on them. You can create as many joins as you like, but each pair of joined fields must contain the same or compatible data type.

Let's say you work at a university and want to know how recent changes to the math curriculum have affected students' grades. In particular, you are interested in the grades of those students whose major subject is mathematics. You already have a table containing data about the major subjects, and a table containing data about the students who study them. Data on grades is stored in the “Students” table, and data on major subjects is stored in the “Specializations” table. To see how math majors' grades have changed since recent curriculum changes, you'll need to look at the entries in the Students table that correspond to the entries in the Majors table.

Preparing sample data

In this example, you are creating a query that determines how recent changes to the math curriculum have affected the grades of students in the math major. Use the two tables below: Specializations and Students. Add them to the database.

Access provides several ways to add these tables to a sample database. You can enter the data manually, copy each table into the program spreadsheet (for example, Microsoft Office Excel 2007) and then import the sheets into Access or you can paste the data into text editor, for example Notepad and then import the data from the resulting text files.

IN step by step instructions This section explains how to enter data manually on blank sheet, and how to copy sample tables into Excel and then import them into Access.

Specializations

Student code

Specialization

Students

Student code

Semester

Syllabus

Item number

Grade

If you are going to enter sample data in a spreadsheet, you can.

Entering example data manually

If you are not interested in creating a worksheet based on sample data, skip the next section (Creating Worksheets with Sample Data).

Creating sample data sheets

Creating Database Tables from Sheets


Compare sample tables and find matching records using joins

You are now ready to compare the Students and Specializations tables. Since the relationships between the two tables are not defined, you need to create joins on the corresponding fields in the query. The tables contain multiple fields, and you will need to create a join for each pair of common fields: Student ID, Year, and Curriculum (in the Students table) and Major (in the Specializations table). . In this case, we are only interested in mathematics, so we can limit the query results using a field condition.

    On the tab Creation click the button Query Builder.

    In the dialog box Adding a table Double-click the table that contains the records you want ( Students), and then double-click the table you're comparing it to ( Specializations).

    Close the dialog box Adding a table.

    Drag the field Student code from the table Students in the field Student code tables Specializations. A line will appear in the query form between the two tables, indicating that a join has been created. Double click the line to open the dialog box Merge Options.

    Notice the three options in the dialog box Merge Options. By default, option 1 is selected. In some cases, you need to add additional rows from the same table to the join parameters. Since you only need to find matching data, leave the join value at 1. Close the dialog box Merge Options by pressing the button Cancel.

    You need to create two more joins. To do this, drag the field Year from the table Students in the field Year tables Specializations and then the field Syllabus from the table Students in the field Specialization tables Specializations.

    In the table Students double-click the star ( * ) to add all table fields to the request form.

    Note: Students.*.

    In the table Specializations double click the field Specialization to add it to the form.

    Show column Specialization.

    In line Selection condition column Specialization enter MATH.

    On the tab Constructor in the group Results click the button Execute.

Comparing two tables using a field as a condition

Sometimes you need to compare tables based on fields with matching values, but different types data. For example, a field in one table may have a numeric data type, but you need to compare it with a field from another table that has a text data type. Fields containing similar data of different types may appear when you save numbers as text (for example, when importing data from another program). Since it is not possible to create joins between fields with different types of data, you will need to use another method. To compare two fields with different types of data, you can use one field as a condition for the other.

Let's say you work at a university and want to know how recent changes to the math curriculum have affected students' grades. In particular, you are interested in the grades of those students whose major subject is mathematics. You already have the "Specializations" and "Students" tables. Data on grades is stored in the “Students” table, and data on major subjects is stored in the “Specializations” table. To see how math majors' grades have changed, you need to look at the entries in the Students table that correspond to the entries in the Majors table. However, one of the fields you want to use to compare tables has a different data type than the field it is being matched against.

To compare two tables using a field as a condition, you need to create a select query that includes both tables. Include in your query the fields you want to display, as well as the field corresponding to the field that will be used as a condition. Then create a condition to compare the tables. You can create as many conditions as needed to compare fields.

To illustrate this method, we'll use , but in the Student ID field of the Specializations table, we'll change the data type from numeric to text. Since we cannot create a union of two fields with different data types, we will have to compare the two Student ID fields using one field as a condition for the other.

Changing the data type in the Student ID field of the Specializations table

    Open the database where you saved the sample tables.

    IN navigation areas right-click the Specializations table and select Constructor.

    The Specializations table opens in design mode.

    In column Data type change for field Student code data type Number on Text.

    Close the "Specializations" table. Click the button Yes when prompted to save changes.

Compare example tables and find matching records using a field condition

The following shows how to compare two Student ID fields using a field from the Students table as a condition for a field from the Specializations table. Using a keyword Like you can compare two fields even if they contain different types of data.

    On the tab Create in the group Other click the button Query Builder.

    In the dialog box Adding a table double click the table Students and then the table Specializations.

    Close the dialog box Adding a table.

    Drag the field Year from the table Students in the field Year tables Specializations and then the field Syllabus from the table Students in the field Specialization tables Specializations. These fields contain the same type of data, so you can use joins to compare them. To compare fields of the same data type, it is recommended to use unions.

    Double-click the star ( * ) in the table Students to add all table fields to the request form.

    Note: When you use an asterisk to add all the fields on a form, only one column appears. This column name includes the table name followed by a period (.) and an asterisk (*). In this example, the column is named Students.*.

    In the table Specializations double click the field Student code to add it to the form.

    In the request form, uncheck the box in the line Show column Student code. In line Selection condition column Student code enter Like [Students].[Student code].

    In the table Specializations double click the field Specialization to add it to the form.

    In the request form, uncheck the box in the line Show column Specialization. In line Selection condition enter MATH.

    On the tab Constructor in the group Results click the button Execute.

    The request is executed, and the math grades of only those students whose major is this subject are displayed.

Question from a user

Hello!

I have one task, and for the third day now I’ve been racking my brains - I don’t know how to complete it. There are 2 tables (about 500-600 rows in each), you need to take the column with the name of the product from one table and compare it with the name of the product from the other, and if the products match, copy and paste the value from table 2 into table 1. Confusedly explained , but I think you’ll understand the task from the photo ( approx. : the photo was cut out by censorship, it’s still personal information).

Thank you in advance. Andrey, Moscow.

Good day everyone!

What you described refers to fairly popular tasks that are relatively easy and quick to solve using Excel. All you need to do is put your two tables into the program and use the VLOOKUP function. More about her work below...

An example of working with the VLOOKUP function

As an example, I took two small signs, they are shown in the screenshot below. In the first table (columns A, B- product and price) no data on the column B; in the second, both columns (product and price) are filled in. Now you need to check the first columns in both tables and automatically, if a match is found, copy the price to the first table. It seems like a simple task...

How to do it...

Place the mouse pointer in the cell B2- that is, in the first cell of the column where we have no value and write the formula:

=VLOOKUP(A2,$E$1:$F$7,2,FALSE)

A2- the value from the first column of the first table (what we will look for in the first column of the second table);

$E$1:$F$7- a completely selected second table (in which we want to find and copy something). Pay attention to the "$" sign - it is necessary so that when copying the formula, the cells of the selected second table do not change;

2 - the number of the column from which we are copying the value (note that our selected second table has only 2 columns. If it had 3 columns, then the value could be copied from the 2nd or 3rd column);

LIE- we are looking for an exact match (otherwise the first similar one will be substituted, which obviously does not suit us).

Actually, you can adjust the finished formula to your needs by slightly changing it. The result of the formula is presented in the picture below: the price was found in the second table and entered in auto mode. Everything works!

To set the price for other product names, simply extend (copy) the formula to other cells. Example below.

After which, as you can see, the first columns of the tables will be compared: from the rows where the cell values ​​coincide, the necessary data will be copied and substituted. In general, it is clear that tables can be much larger!

Note: I must say that the VLOOKUP function is quite demanding on computer resources. In some cases, with an excessively large document, it may take quite a long time to compare tables. In these cases, it is worth considering either other formulas or completely different solutions (each case is individual).

That's all, good luck!