⏱️ About performance

© Alexander Sychev, Sergei Sychev, “TRIZ-RI Group”

Having described the interface that works with the database, we should now discuss performance.

First, let's take a closer look at search operations:

Case of "Get"

def get(SetId, Name, Value):

Let's call operations whose Value parameter equals None "fast", regardless of whether the SetId and Name parameters are known or unknown. These operations are "fast" because they compare integer fields (Set_ID and Name_ID) and never have to search through the Value field.

Let's call "potentially slow" those operations that have to determine which value_id in the Values table corresponds to a given value. They are slow because they require string comparisons, and according to the statement of the problem the number of values will be huge. Note that this comparison is only slow when the data type in the Values table is String.

For other types the search is much faster and there are no performance problems at all. In any case, if the problem is solved for the string type, it is solved for every other type as well.
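To make the distinction concrete, here is a sketch of the two query shapes; the SQL is purely illustrative, and the table and column names (Sets, Values, Set_ID, Name_ID, Value_ID, Id, Value) follow the schema described earlier rather than any actual kernel code.

# "Fast" shape: Value is None, so only integer fields are compared.
FAST_QUERY = "SELECT Name_ID, Value_ID FROM Sets WHERE Set_ID = ?"

# "Potentially slow" shape: the value itself must be located first,
# which means comparing strings inside the Values table.
SLOW_QUERY = 'SELECT Id FROM "Values" WHERE Value = ?'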

Now let's look at the situation where we have to search a very large Values table containing string data. Usually, the point of structuring data in this scenario is simply to be able to query a table holding a smaller amount of data. So the desired end result is a smaller amount of data in the one table we search, not any particular data structure. That means we should, ideally, have more Values tables that are easy to maintain and impose no typing of any sort.

So let's cut our table into a few smaller ones. They are simple and all have the same two fields: Id and Value. The unique identifier is generated automatically, regardless of which table the data is stored in.

Any number of such simple, similar tables does not complicate the maintenance of our program at all. On the contrary, it makes maintenance easier than in the "standard normalized case", because the programmer does not need to learn (understand, remember, describe, explain, etc.) the database structure.
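For illustration, a pair of such tables might be created like this. This is only a sketch: SQLite syntax is assumed, the names VALUES_1 and VALUES_2 match the dispatcher shown below, and the shared counter merely stands in for whatever mechanism generates identifiers that are unique across all Values tables.

import itertools

CREATE_VALUES_TABLE = "CREATE TABLE IF NOT EXISTS %s (Id INTEGER, Value TEXT)"

def create_values_tables(conn):
    # the tables are identical; only the name differs
    for table in ("VALUES_1", "VALUES_2"):
        conn.execute(CREATE_VALUES_TABLE % table)

# One shared counter, so an identifier stays unique no matter which
# Values table the record ends up in (illustrative only).
_next_value_id = itertools.count(1)

def new_value_id():
    return next(_next_value_id)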

The dispatcher function (which determines in which Values table to store data and in which table to look for requested data) is placed in the same place where we store the list of names. For example:

# List of names:
Document_name = 1
Provider_name = 2
Agent_name = 3
People_name = 4
# dispatcher:

def table_for_name(name):

    values1 = "VALUES_1"
    values2 = "VALUES_2"

    t1 = [Document_name, Provider_name]
    t2 = [Agent_name, People_name]

    if name in t1:
        return values1
    if name in t2:
        return values2

    return None

The dispatcher function determines which table to search before any request to the database is made, and searching through several thousand or even tens of thousands of records is quite fast. Thus, as soon as the data for a request is known, the kernel itself determines the specific Values table that should be addressed, and since the number of entries in that table is not too large, the search happens quickly.
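For illustration, here is how the get function might use the dispatcher for the case "SetId unknown, Name and Value known". This is only a sketch: conn stands for a database connection (e.g. sqlite3), and the field names follow the schema assumed earlier.

def get(conn, SetId, Name, Value):
    # SetId is None in this case; Name and Value are known
    table = table_for_name(Name)               # the dispatcher picks the small table
    cur = conn.cursor()
    # the "potentially slow" step, now confined to one small table
    cur.execute("SELECT Id FROM %s WHERE Value = ?" % table, (Value,))
    row = cur.fetchone()
    if row is None:
        return []
    value_id = row[0]
    # the rest is fast: integer comparisons only
    cur.execute("SELECT Set_ID FROM Sets WHERE Name_ID = ? AND Value_ID = ?",
                (Name, value_id))
    return [r[0] for r in cur.fetchall()]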

You can change the dispatcher code however you like (if you cut your tables by your own method). The kernel code (that is, the functions add, get, and delete) will not change at all.

So how should we cut our tables? If you have different types of data, split them into different Values tables: a string table ("String Values") and a numerical one ("Integer Values"), and that is probably it, since keeping boolean records in this structure would produce a "Bool Values" table consisting of just two rows, "true" and "false". The string table should then be cut into similar tables for data that is added rarely and data that is added frequently. String data that is added frequently should be kept in a separate table, and only when it accumulates a great many records (many hundreds of thousands, or even more) should it be cut into several tables, depending on the objectives of your project.
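Splitting by type can be expressed as one more thin dispatcher layer. A possible sketch (the table names BOOL_VALUES and INTEGER_VALUES are illustrative, not part of the kernel):

def table_for_value(name, value):
    if isinstance(value, bool):        # checked before int, since bool is a subtype of int
        return "BOOL_VALUES"           # would only ever hold "true" and "false"
    if isinstance(value, int):
        return "INTEGER_VALUES"
    # strings are further split by name, exactly as in table_for_name() above
    return table_for_name(name)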

In general, choosing a method for cutting one long table into several smaller ones is quite a typical task. Many major Internet services not only "cut" their tables but also store them on different servers. The structure of the table is not really important here, since one specific table is simply divided into several smaller ones, and in our case the table is primitive anyway. For the general case you can find some information at https://msdn.microsoft.com/en-us/library/dn589797.aspx or google "Database Sharding".

Now we see that search operations, and as a result the get function, perform optimally. Let's see how the remaining kernel functions, add and delete, behave.

Cases of "Add"

def add(SetId, Name, Value):

Case 1:

def add(SetId, Name, Value):
# updates the existing property with name Name and value Value
# for the set with id SetId.

Here the add function checks whether the requested value already exists in the corresponding Values table (a search operation). If it does not, the function creates it (an add operation). It then creates a record in the Sets table that points to this value (another add operation).

Since Case 1 updates an existing property, the value that the property pointed to before the update may become unreferenced. So it is also checked whether any other sets still have that property (the same Name and the same Value_id), which is a fast search operation.

If there are none, the function deletes that value from the corresponding table.

Both add operations are fast, and the search operations were discussed above; taking this into account, the function is fast.
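Putting these steps together, Case 1 might look roughly like this. It is a sketch under the same assumed schema and helpers as above (table_for_name, the hypothetical new_value_id, a sqlite3-style conn); in particular, the final check is read here as applying to the value the property pointed to before the update, since that is the value that can become unreferenced.

def add(conn, SetId, Name, Value):
    table = table_for_name(Name)
    cur = conn.cursor()
    # the value currently attached to (SetId, Name), if any
    cur.execute("SELECT Value_ID FROM Sets WHERE Set_ID = ? AND Name_ID = ?",
                (SetId, Name))
    old = cur.fetchone()
    # search: is the requested value already stored? (the "potentially slow" step)
    cur.execute("SELECT Id FROM %s WHERE Value = ?" % table, (Value,))
    row = cur.fetchone()
    if row is None:
        value_id = new_value_id()
        cur.execute("INSERT INTO %s (Id, Value) VALUES (?, ?)" % table,
                    (value_id, Value))                     # add operation
    else:
        value_id = row[0]
    # add operation: the record in Sets pointing to this value
    cur.execute("DELETE FROM Sets WHERE Set_ID = ? AND Name_ID = ?", (SetId, Name))
    cur.execute("INSERT INTO Sets (Set_ID, Name_ID, Value_ID) VALUES (?, ?, ?)",
                (SetId, Name, value_id))
    # fast check: does any other set still use the old value?
    if old is not None and old[0] != value_id:
        cur.execute("SELECT 1 FROM Sets WHERE Name_ID = ? AND Value_ID = ?",
                    (Name, old[0]))
        if cur.fetchone() is None:
            cur.execute("DELETE FROM %s WHERE Id = ?" % table, (old[0],))
    conn.commit()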

Case 2:

def add(SetId, Name, Value): # called with SetId = None: creates a new Set with a known property

In the second case, the add function immediately generates a new SetId and then works exactly as described for Case 1, because SetId is now known. Thus, the second case reduces to the first.
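A sketch of this branch, assuming the add() from the previous sketch; the identifier is taken here from the largest existing Set_ID, though any generator of unique identifiers would do.

def add_set(conn, Name, Value):
    cur = conn.cursor()
    # generate a fresh SetId
    cur.execute("SELECT COALESCE(MAX(Set_ID), 0) + 1 FROM Sets")
    new_set_id = cur.fetchone()[0]
    add(conn, new_set_id, Name, Value)    # Case 2 reduces to Case 1
    return new_set_id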

Other combinations are meaningless: we would be adding a property with an unknown name (Name = None) or an unknown value (Value = None), which is absurd (and the corresponding limitation is reflected in the description of the Sets table).

Thus, the add function is fast.

Cases of "Delete"

def delete(id, name):

Case 1:

def delete(id, name):
# removes the property with name Name from the set with the identifier id

Here the delete function searches for the set with identifier id (a search operation). Then it searches for the property with name Name and its corresponding Value_id (the search operation discussed above, which we now know is fast).

Then it checks whether there are other sets with the same property (with the same Name and the same Value_id) - a fast search operation.

If there are none, it deletes the value from the corresponding Values table and then removes the record in the Sets table that stores the pointer to the property. All of these removal operations are standard and fast.
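For illustration, Case 1 might be sketched like this, under the same assumed schema and helpers as above. The order differs slightly from the prose: the pointer record is removed first, so the remaining-references check does not have to exclude the current set.

def delete(conn, id, name):
    table = table_for_name(name)
    cur = conn.cursor()
    # fast search: which value does this property point to?
    cur.execute("SELECT Value_ID FROM Sets WHERE Set_ID = ? AND Name_ID = ?",
                (id, name))
    row = cur.fetchone()
    if row is None:
        return
    value_id = row[0]
    # remove the record in Sets that stores the pointer to the property
    cur.execute("DELETE FROM Sets WHERE Set_ID = ? AND Name_ID = ?", (id, name))
    # fast check: do other sets still have the same property?
    cur.execute("SELECT 1 FROM Sets WHERE Name_ID = ? AND Value_ID = ?",
                (name, value_id))
    if cur.fetchone() is None:
        cur.execute("DELETE FROM %s WHERE Id = ?" % table, (value_id,))
    conn.commit()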

Case 2:

def delete(id, name=None):
# removes the entire set with identifier id and all of its properties

Here the delete function searches for all of the properties associated with the given set (comparing two integers is a fast operation) and then removes each of them, just as described in Case 1.
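A sketch of this case, reusing the delete() from the Case 1 sketch above:

def delete_set(conn, id):
    cur = conn.cursor()
    # fast search: list all properties of the set (integer comparison only)
    cur.execute("SELECT Name_ID FROM Sets WHERE Set_ID = ?", (id,))
    for (name_id,) in cur.fetchall():
        delete(conn, id, name_id)      # each property is removed as in Case 1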

Thus, the delete function is fast.
