
News

Posted over 11 years ago by Park Kieun
This article is part of the "CUBRID Internals" book. In the previous article I explained the CUBRID processes and where to configure them. In this article I would like to dive into how those processes run. If you have not read the previous article, I recommend doing so before reading further.

Figure 1 below illustrates the basic operation flow in CUBRID. First, when a user enters the cubrid service start command, CUBRID gets started. Then the cub_master and cub_broker processes are started. At this time, a number of cub_cas processes are started, corresponding to the value of the MIN_NUM_APPL_SERVER parameter in the cubrid_broker.conf file. Next, a user enters the cubrid server start demodb command, which creates the cub_server process that mounts the demodb database volume. As described in CUBRID Processes, the cub_master process connects the cub_server process with the client that sends the requests, either a cub_cas process or the csql program.

Figure 1: CUBRID Operation Procedure.

As shown in Figure 1, once a JDBC application is connected to the server process which mounts the database volume file, SQL statements can be executed in CUBRID. The two most popular ways to use CUBRID are to execute SQL manually using the command-line CSQL Interpreter or to write a program which uses one of the various APIs such as JDBC, PHP, Node.js, etc.

Note: As we have discussed, a JDBC program connects to the database server through a broker, while the CSQL Interpreter connects to the database server directly, bypassing the broker. This is an important difference between the APIs and CSQL.

The CUBRID Manager database administration tool is built on top of the JDBC driver. The SQL statements executed in CUBRID Manager are passed to the server through the JDBC API. Other database management functions available in CUBRID Manager (creating or deleting a database, etc.) are executed through a management utility called CUBRID Manager Server, which runs as a separate manager server daemon outside of the database.

Now, let's take a look at CUBRID Manager as an example to see how an SQL statement is executed in CUBRID. Refer to Figure 2 for this example. The SQL statement entered in the query editor of CUBRID Manager (for example, SELECT * FROM olympic WHERE host_year > 1988 LIMIT 4;) is sent to the CUBRID JDBC driver. This assumes that a database has already been selected in CUBRID Manager and that the JDBC connection has been established. During the JDBC connection process, the connection is made to the port of the host specified in the JDBC connection information (the JDBC connection URL). The cub_broker process receives the connection and allocates a cub_cas process for the session. It then hands the socket connection over to the cub_cas process, so the connection between the JDBC driver and the cub_cas process is established.

Back to the SQL statement execution: the JDBC driver included in CUBRID Manager sends the SQL statement from the query editor to the connected broker, that is, the cub_cas process. The cub_cas process sequentially calls db_open_buffer(), db_compile_statement() and db_execute_statement(), C APIs provided by the client library, to execute the received SQL statement. The db_open_buffer() function parses the SQL statement, the db_compile_statement() function compiles the execution plan, and the db_execute_statement() function executes the SQL by sending XASL (eXtended Access Spec Language) to the server.

Figure 2: SQL Statement Execution Procedure in CUBRID.
As shown in Figure 2, the qmgr_execute_query() function is executed in the cub_cas process, and the xqmgr_execute_query() function is executed in the cub_server process. The qmgr_execute_query() function on the client side corresponds to the xqmgr_execute_query() function on the server side. As shown above, CUBRID implements a Remote Procedure Call (RPC) style communication interface between a client and a server. In the server, the qexec_execute_query() and qexec_execute_mainblock() functions of the query processing module are used to execute the XASL. If you are interested in how queries are processed in CUBRID, I recommend reading CUBRID Query Processing, which provides a detailed step-by-step explanation using real examples.

By Park Kieun, Senior Software Engineer and Architect at Service Platform Development Center & IT Service Center, NHN Corporation.
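To make the execution flow described in this article concrete, the following is a minimal sketch of a JDBC client that connects through the broker and runs the same query used in the example above. The driver class name, the connection URL format, the default broker port 33000, and the dba/empty-password credentials are assumptions based on a typical CUBRID installation, not details taken from the article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DemoDbQuery {
    public static void main(String[] args) throws Exception {
        // Assumed driver class and URL format for the CUBRID JDBC driver;
        // 33000 is the usual default broker port in cubrid_broker.conf.
        Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
        String url = "jdbc:cubrid:localhost:33000:demodb:::";

        // The broker (cub_broker) accepts this connection and hands the
        // session over to a cub_cas process, which talks to cub_server.
        try (Connection conn = DriverManager.getConnection(url, "dba", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT * FROM olympic WHERE host_year > 1988 LIMIT 4")) {
            while (rs.next()) {
                System.out.println(rs.getInt("host_year"));
            }
        }
    }
}
```

In a real deployment the broker port and the number of cub_cas processes come from your cubrid_broker.conf settings, and the credentials from your own account setup.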
Posted over 11 years ago by Park Kieun
Some developers have already asked why a dozen processes are created when CUBRID is started. Being one of the core developers of the CUBRID Database and the creator of the CUBRID Cluster project, I thought it was my duty to explain the CUBRID processes and the communication between them. Frankly speaking, it is very easy to understand. This article is a part of a book I am writing called CUBRID Internals. I will publish the rest of the book in coming posts here at the CUBRID Blog.

CUBRID is based on a client-server DBMS architecture. Such an architecture enhances processing speed and distributes DBMS functions across multiple hosts, as several clients can cache the database objects saved on the server. In CUBRID, the database server process (the cub_server process linked to libcubrid.so) performs the storage system functions, while the client process (the cub_cas process linked to libcubridcs.so) performs analysis, optimization, and creation of the execution plan for query statements.

CUBRID consists of several processes. In MySQL, one mysqld process provides most functions; in Oracle, many processes starting with ora_ are running. CUBRID, like Oracle, has several processes starting with cub_. As we have reviewed in CUBRID System Architecture, CUBRID is divided into a database server and a broker. Each of them consists of the processes shown in Figure 1. A circle indicates a process, including the process name and the dynamic shared library; for example, the cub_server process is linked to the shared library called "libcubrid.so". A rectangle indicates a DB file or a configuration file such as cubrid.conf.

Figure 1: CUBRID Process Configuration.

Database Server

cub_server

CUBRID has one server process per DB, called cub_server. A user uses it by specifying the DB name. The cub_server process is the core process of the CUBRID DBMS. It directly accesses the database volume file and the transaction log file to process requests from clients. cub_server is a multi-threaded process that can process multiple requests simultaneously. Generally, cub_cas and csql are the client processes that send requests to the cub_server process. All of these client processes are linked to the libcubridcs.so shared library. The system configuration file used by cub_server is cubrid.conf.

cub_master

cub_master is the master process for the CUBRID server processes. It establishes the communication between the client and server processes. One master process is created for each connection port specified in cubrid.conf (the cubrid_port_id parameter). Each cub_master process listens on the specified TCP/IP port. This port is used by client processes to connect to the server process. According to the specified database name, the socket connection is handed over to the corresponding cub_server process, and the communication between the client and server processes begins.

cub_admin

The cub_admin process performs database management utility functions such as createdb, deletedb, addvoldb, spacedb, compactdb and checkdb. Depending on the function, the cub_admin process runs either in client/server mode (CS mode) or in standalone mode (SA mode); therefore, the shared library is loaded dynamically. This is called shared library dynamic loading. In other words, when the command is a CS mode utility command, the process is dynamically linked to the libcubridcs.so shared library; when the command is an SA mode utility command, the process is dynamically linked to the libcubridsa.so shared library.
For a more detailed description of the database management utilities, see CUBRID Utilities.

Broker

cub_broker

The CUBRID broker provides various application programming interfaces (JDBC, PHP, CCI, etc.) to connect to a database server. The cub_broker process establishes a socket connection to connect JDBC or other APIs to a cub_cas process, tracks the status of the cub_cas processes through shared memory, and allocates a cub_cas process per application session. In addition, the cub_broker process adjusts the number of active cub_cas processes according to the settings in the cubrid_broker.conf file for connection pooling between the application client and the database server.

cub_cas

cub_cas relays the requests from a variety of application clients such as a JDBC driver, PHP driver or CCI library program (most requests are related to SQL execution) to the CUBRID database server and exchanges information with the cub_broker process through shared memory. The cub_cas process is linked to the libcubridcs.so shared library and runs in CS mode. Therefore, it is a client process of the cub_server process.

Now you know the purpose of each process that is created when CUBRID is started. There is another group of processes which belong to CUBRID Manager Server, but that is another story, as they are not a part of the CUBRID Server itself. If you are interested, check out this question: "What is cub_js process for?"

In the next article I will explain the basic operations in CUBRID, i.e. what happens when you start CUBRID, in what order the processes are started, and the like.

By Park Kieun, Senior Software Engineer and Architect at Service Platform Development Center & IT Service Center, NHN Corporation.
Posted over 11 years ago by CUBRID
And another one gone, and another one gone... We gladly announce that a couple more critters were smashed with your help! If you are curious just how you are doing in the race for the 13'' MacBook Air, you can find here the Rankings Table, which is updated once a day. Remember, the important thing is to have fun and discover as many pesky little bugs as possible!
Posted over 11 years ago by CUBRID
Hello everybody,

We have recently released a new application, CUBRID Web Manager. This is a web-based tool used to administer CUBRID databases. For those who are familiar with MySQL, we can also add that it's the equivalent of phpMyAdmin. Of course, it's still a work in progress, and here is where you come in: help us by finding bugs or by mentioning relevant features that are missing!

Now, you may wonder, why should I do this? Well, first of all, it's not a complicated thing to do. If you have basic knowledge of MySQL and phpMyAdmin, then you don't have anything to worry about. The principle is pretty much the same.

Do I gain something from my effort? At the end of the month we will reward the person with the most bugs found with a smashing 13'' MacBook Air. But that's not all! The 3 runners-up will also receive an Amazon Kindle Fire.

If I only post about the event on my blog or on social network sites, can I still win something? We will also reward 3 people who post information about the event on Twitter, Facebook or their blogs. The prize consists of $50 vouchers for Amazon or iTunes.

Where do I begin? The best place to begin is by reading the installation instructions available here: www.cubrid.org/wiki_tools/entry/cubrid-web-manager-installation-instructions

Where can I get more information? You can find all the information you need at the event page: www.cubrid.org/cwm_bug_bash. If you have any other questions still left unanswered, then all you need to do is go to www.cubrid.org/questions, post your question, and we'll respond to it asap.

Good luck and most importantly, have fun!
Posted over 11 years ago by Park Kyungil
Update: Bootstrap 2.1.0 has been released yesterday on August 20, 2012. Check out their official blog for release notes. Bootstrap is a front-end toolkit open sourced by Twitter which provides flexible HTML, CSS, and JavaScript-based design ... [More] templates, a variety of UI components and interactions for building a beautiful website. In this article I will provide an overview of Bootstrap, describe its characteristics, and will show how to start building websites with it. Bootstrap from Twitter Developing a Web UI looks easy; however, so many developers encounter a number of difficulties when trying to develop it. Different from general UI development which requires a single client environment, Web UI development requires Cross Browsing that satisfies compatibility among a variety of browser environments. There are five browsers that are mainly used in PCs - Internet Explorer, Firefox, Chrome, Safari and Opera - and additional work is necessary depending on the version and the operating system. In addition, we need to consider the current mobile environment that has a variety of screen sizes, such as smartphone and tablet PC, and the browsers specialized for each device type. These various and complex browser environments make it difficult to process Cross Browsing for the small- and mid-size Web sites. Bootstrap is the front-end toolkit from Twitter which tries to clear this difficulty and provide a start point that allows developers to quickly and easily implement a flexible web site. Bootstrap is an open source front-end toolkit developed by Mark Otto, a Twitter UI designer, and Jacob Thornton, a Twitter developer. As its meaning "to start up a system by turning on the power of a computer or pressing a reset key," Bootstrap is a starting point for building websites which provides flexible HTML, CSS and JavaScript-based design templates, variety of frequently-used UI components and interactions. Grid System Grid is a basic element for layout design. Generally, the screen is split by using the tag and CSS to place elements on the webpage layout. Popular CSS frameworks that standardize screen split and provide a grid layout system are Blueprint, 960 grid and YUI CSS grids. Bootstrap also provides its own grid layout system. The grid system of Bootstrap consists of 12-column grid and there are two versions: default and fluid. In addition, it has four design patterns for a variety of screen sizes. The patterns are linked to the responsive design to be discussed later. Default Grid System The default grid system utilizes a 12-column grid that renders out at the width of 940 pixels. A row is expressed by using row and a column is expressed by using span*. The span* class can be span1-12. The space between columns is 20 pixels and the width of the span1 class is 60 pixels. As the number of the span* class increases by one, the width increases by 80 pixels, so the width of the span12 class is 940 pixels. Figure 1. 12-column Grid System. The markup for the third row in Figure 1 is as shown below, expressed as 300 pixels + 20 pixels + 620 pixels (total 940 pixels) on the screen.       …     …   Fluid Grid System The fluid grid system uses percentage for column widths instead of fixed pixels. To use the fluid grid system, make any row fluid simply by changing to row-fluid.       …     …   Offsetting Columns Add the offset* class to a column to offset the column to the right. However, offsetting columns is available for the default grid system only. Figure 2. 
Offsetting Columns In Figure 2, the markup of the first row can be written as follows. The second row will be offset to right as much as four-grid width (offset4).         …     …       Nesting Columns To nest your content as shown in Figure 3, just add a new tag containing row and set of span* columns within an existing .span* column. Figure 3. Nesting Columns For the default grid, the sum of nested rows should be the number of columns of its parents.              Level 1 of column                      Level 2             Level 2                     For nesting with a fluid grid, the sum of nested rows should be 12, not the parent column.              Level 1 of column                      Level 2             Level 2                     Responsive Design Responsive Design requires to design the layout of a Web page using the CSS3 media queries in a way so that the layout automatically adapts to the screen size and resolution of various devices such as smartphones, PC, and tablet PC. Bootstrap supports a flexible screen layout with its grid system and Responsive Design. The size of the grid is changed according to the screen size as shown in the following table. Table 1. Changed Grid Size by Device. Label Layout width Column width Gutter width Smartphones 480 pixels and lower Fluid columns, no fixed widths Portrait tablet PCs 767 pixels and lower Fluid columns, no fixed widths Landscape tablet PCs 768 pixels and higher 42 pixels 20 pixels Default 980 pixels and higher 60 pixels 20 pixels Wide-screen monitor 1200 pixels and higher 70 pixels 30 pixels The following figure shows the changed layouts with responsive design according to the screen size. Figure 4. Responsive Design by Screen Size. The CSS that makes media queries responsive to the device size is as follows.   // Landscape phones and down @media (max-width: 480px) { ... } // Landscape phone to portrait tablet @media (max-width: 767px) { ... } // Portrait tablet to landscape and desktop @media (min-width: 768px) and (max-width: 979px) { ... } // Large desktop @media (min-width: 1200px) { ... }   In addition, this CSS includes a selector as a utility that displays the optimized contents on the device. Table 2. Responsive CSS Classes by Device. Class Phones 480 pixels and lower Tablet PCs 767 pixels and lower Desktops 768 pixels and higher .visible-phone Visible Hidden Hidden .visible-tablet Hidden Visible Hidden .visible-desktop Hidden Hidden Visible .hidden-phone Hidden Visible Visible .hidden-tablet Visible Hidden Visible .hidden-desktop Visible Visible Hidden CSS with a Clean and Organized Look Bootstrap provides buttons and a variety of styles offering a clean and organized look with the default CSS. The following table shows the default button designs of Bootstrap. Table 3. CSS Class by Button Type. Button class="" Description btn Gray default button with gradation btn btn-primary Button indicating something that needs to be visually highlighted or specialized btn btn-info Button that can replace the default button btn btn-success Button indicating a successful or positive behavior btn btn-warning Warning button btn btn-danger Button indicating danger or a negative effect btn btn-inverse Dark gray button that can be used generally without any special meaning The following table shows the default form designs of Bootstrap. Figure 5. CSS Class for Form. Initially, Bootstrap did not support Internet Explorer. However, it has been changed to be compatible with Internet Explorer 7 and above. 
Therefore, form design for Internet Explorer is not identical according to the version. A minor difference may be seen according to CSS support, such as gradation and rounded border. However, the overall layout and colors remain the same. Figure 6. Comparison of Examples of Expressing Form Design in Internet Explorer. CSS Reset uses Nicolas Gallagher's normalize.css and UI icon image uses a set of icons provided by Glyphicons. The icon set is a single image in which the CSS Sprite technique is used. Figure 7. Glyphicons Icon Set. The following is an example of a UI component where the icon set is applied. Figure 8. UI Component with Icon Set. Bootstrap CSS is written in LESS so it is easy to modify and reproduce a theme. This will be discussed later. UI Component and jQuery Plug-in There are UI patterns that are frequently used for producing webpages. UI components refers to a bundle of markup, CSS and JavaScript for easy pattern reuse. Bootstrap has a default CSS that provides a variety of easy-to-use UI components. Some components that require dynamic interaction can be controlled by jQuery plug-in type JavaScript. A component that uses the default CSS has the button group, dropdown list, navigation, tab, Move Page button, thumbnail layout, warning display and close icon. For a more detailed component list and examples, see http://twitter.github.com/bootstrap/components.html. Components that control the interaction by using the jQuery plug-in are modal layer, dropdown, toggle tap and tooltip. For a more detailed component list and examples, see http://twitter.github.com/bootstrap/javascript.html.   Note: NHN has open sourced UIO Factory, which provides developers a static UI pattern set, as well as Jindo framework-based Jindo Component which provides components for dynamic UI.   LESS Different from general programming language, CSS has no inheritance or variable. Therefore, developers should modify the code when they need to modify just the screen height. It is quite cumbersome to write and manage CSS. To solve this problem, the CSS preprocessors have been developed. CSS preprocessors have self-defined syntax by expanding the existing CSS syntax. The expanded syntax allows for describing variables, mixin, inheritance, operation, and function, and creating the final CSS that is compatible with the browser by compiling the code written with the expanded CSS. Some people have been inconvenienced by not having dynamic and object-oriented concept from CSS3. The most popular CSS preprocessors are LESS, Sass and Stylus, and the syntax and functions are similar. Bootstrap uses LESS as its CSS preprocessor. Even though it has significant benefits, most domestic sites have not applied the CSS preprocessors so far. As well as the inconvenience of compiling, in most cases, homepage design has been outsourced just for one-time so the existing CSS has not been reused and there are not many professionals who can create well-structured markup and CSS. In foreign countries, after releasing Bootstrap on Github, an active movement for applying CSS preprocessors is made and many developers started to create projects for which CSS preprocessors are changed to Sass or Stylus instead of LESS. Let's discuss LESS in detail. LESS Syntax LESS syntax uses a syntax that expands the functions only on the standard CSS syntax. Variable Declaration To declare a variable, use the @ symbol.   
@mainColor: #0982c1; @siteWidth: 1024px; @borderStyle: dotted; body {  color: @mainColor;  border: 1px @borderStyle @mainColor;  max-width: @siteWidth; }   Mixin Function You can create a mixin function similar to the class selector as shown below.   /* LESS mixin error with (optional) argument @borderWidth which defaults to 2px if not specified */ .error(@borderWidth: 2px) { border: @borderWidth solid #F00; color: #F00; } .generic-error { padding: 20px; margin: 4px; .error(); /* Applies styles from mixin error */ } .login-error { left: 12px; position: absolute; top: 20px; .error(5px); /* Applies styles from mixin error with argument @borderWidth equal to 5px */ }   Inheritance LESS defines the mixin function as a class selector. Therefore, the general class selector can be used as a mixin function so that inheritance can be expressed as shown below.   .block { margin: 10px 5px; padding: 2px; } p { .block; /* Inherit styles from '.block' */ border: 1px solid #EEE; } ul, ol { .block; /* Inherit styles from '.block' */ color: #333; text-transform: uppercase; }   Operator A simple operator can be used for the operable attributes as shown below.   body { margin: (14px/2); top: 50px + 100px; right: 100px - 50px; left: 10 * 10; }   Nested Selector Nested selector is expressed as shown below.   section { margin: 10px; nav { height: 25px; a { color: #0982C1; &:hover { text-decoration: underline; } } } }   When compiling a nested selector, the following CSS is made.   section { margin: 10px; } section nav { height: 25px; } section nav a { color: #0982C1; } section nav a:hover { text-decoration: underline; }   Bootstrap LESS There are a lot of pre-defined LESS variables and mixins related to the layouts and components. By modifying the predefined variables and mixins, it is simple to change and create a variety of themes. The following is an example of LESS variables and values defined by Bootstrap by default. For more details, see http://twitter.github.com/bootstrap/less.html#variables. Figure 9. A Variety of LESS Variables of Bootstrap. The following is an example defined in Bootstrap by default. For more details, see http://twitter.github.com/bootstrap/less.html#mixins. Figure 10. A Variety of LESS Mixins of Bootstrap. LESS Compiling LESS compiling is classified into client side and server side. For client side compiling, the less.js file is saved in the Web server and linked as shown below.       For server side compiling, download the compiler from the node.js package manager and then install and use it.   $ npm install –g less $ lessc bootstrap.less > bootstrap.css   Customizing The Bootstrap homepage allows developers to customize a set of CSS and jQuery plug-in by changing the variables applied to LESS and selecting the desired CSS and jQuery plug-in. You can select the components and jQuery plug-ins and download the desired CSS and jQuery JavaScript. Figure 11. Bootstrap Customizing Page. Even though there is no LESS compiler, you can change the values applied to the LESS variables as desired on the Customizing Page. Figure 12. LESS Variable Value Customizing. Select the components and jQuery plug-ins, modify the LESS variable values, and then click the Download button to download the .zip file compressing the selected CSS and JavaScript. Figure 13. Customizing Result Download. As well as the customizing page, you can use a variety of themes from http://bootswatch.com/, http://wrapbootstrap.com/, and http://stylebootstrap.info/. 
Conclusion If we take a deeper look into Bootstrap from Twitter, we can see very little new technology. There seems to be nothing special. Nevertheless, Bootstrap is very popular which is proved by the number of watches and forks of Bootstrap project on Github. Also there are a variety of Bootstrap-related sites. There are a large number of well designed components accompanied by a set of jQuery plugins. It is a great framework if you want to build beautiful Web sites quickly. By Kyungil Park, Senior Software Engineer, NHN Platform Ajax Team, NHN Corporation. About the author: I develop the JavaScript CI server and a variety of tools in NHN Platform Ajax Team. I like everything related to development. Dreaming of being a gray-haired developer... References   Bootstrap, from Twitter: http://twitter.github.com/bootstrap/ Stepping Out Width Bootstrap from Twitter: http://webdesign.tutsplus.com/tutorials/htmlcss-tutorials/stepping-out-with-bootstrap-from-twitter/ CSS3 Media Queries: http://www.w3.org/TR/css3-mediaqueries/ Sass vs. LESS vs. Stylus: http://net.tutsplus.com/tutorials/html-css-techniques/sass-vs-less-vs-stylus-a-preprocessor-shootout/ 20 Resources for Bootstrap Lovers: http://designshack.net/articles/css/20-awesome-resources-for-twitter-bootstrap-lovers/ [Less]
Posted over 11 years ago by Donghyun Lee
These days at NHN we use CUBRID for our services more often than MySQL. This article summarizes the differences between MySQL and CUBRID Database, the knowledge that several departments at NHN have obtained as they have changed their database from ... [More] MySQL to CUBRID. This document is based on MySQL 5.5 and CUBRID 8.4.1. The differences can be classified into three types: Column Types SQL Syntax Provided Functions Differences in Column TypesCase-sensitiveness of Character TypesBasically, MySQL is not case-sensitive for character type values when the query is executed. Therefore, to make the character type be case-sensitive in MySQL, you should add an additional binary keyword when creating a table or a query statement. On the contrary, CUBRID is basically case-sensitive for character type values when the query is executed. The following example shows how to indicate that a target column is BINARY when creating a table in MySQL. CREATE TABLE tbl (name CHAR(10) BINARY); INSERT INTO tbl VALUES('Charles'),('blues'); SELECT * FROM tbl WHERE name='CHARLES';   Empty set (0.00 sec) This example shows how to indicate that a target column is BINARY when executing a SELECT statement in MySQL. SELECT * FROM tbl WHERE BINARY name='Charles';   +---------+ | name | +---------+ | Charles | +---------+ To make CUBRID be case-insensitive, just like MySQL, apply the UPPER() function or the LOWER() function to the target column as shown in the following example. SELECT * FROM tbl ORDER BY name;   name ====================== 'Charles   ' 'blues     '     SELECT * FROM tbl ORDER BY UPPER(name);   name ====================== 'blues     ' 'Charles   ' As shown in the example above, when UPPER() function is not applied the data is returned as they were inserted. When we apply UPPER() function to the ORDER BY column we can obtain case-insensitive results. In this case even if there was an index defined on the name column this index cannot be used by ORDER BY due to the existence of UPPER() function, thus no optimization can be applied. If there was no UPPER() function, ORDER BY would fetch the data in the order of the defined index. In order to optimize ORDER BY, you may consider creating a separate column that is totally upper-cased or lower-cased and configuring index in the column. The following is an example of separately adding the sorting column name2 which is not case-sensitive. ALTER TABLE tbl ADD COLUMN (name2 CHAR(10)); UPDATE tbl SET name2=UPPER(name); CREATE INDEX i_tbl_name2 ON tbl(name2); SELECT * FROM tbl ORDER BY name2;   name name2 ============================================ 'blues     ' 'BLUES     ' 'Charles   ' 'CHARLES   ' This coming fall we will release a new version of CUBRID under the code name "Apricot" which will introduce function based indexes. Then, you will no longer need to create a separate column and create an index on it. You will be able to do the following: CREATE INDEX idx_tbl_name_upper ON tbl (UPPER(name)); SELECT * FROM tbl WHERE UPPER(name) = 'CHARLES'; Automatic Type Conversion for Date TypeMySQL is very flexible in converting the type. It accepts character string input in the numeric type and vice versa (number input in the character string type). It also accepts numeric input in the date type. From version 8.4.0, CUBRID supports flexible type conversion, allowing for character string input in the numeric type and number input in the character string type. However, unlike MySQL, CUBRID does not accept number input in the date type. 
The following is an example of inputting numbers in the date type dt column in MySQL. mysql> CREATE TABLE dt_tbl(dt DATE); mysql> INSERT INTO dt_tbl VALUES (20120515); mysql> SELECT * FROM dt_tbl;   +------------+ | dt | +------------+ | 2012-05-15 | +------------+ 1 row in set (0.00 sec) The following is an example of inputting numbers in the date type dt column in CUBRID. You can see that an error is returned as a result value when numbers are input in the date type. csql> CREATE TABLE dt_tbl(dt DATE); csql> INSERT INTO dt_tbl VALUES (20120515);   ERROR: before ' ); ' Cannot coerce 20120515 to type date.   csql> INSERT INTO dt_tbl VALUES ('20120515'); csql> SELECT * FROM dt_tbl;   dt ============ 05/15/2012 When an error occurs as a result of executing the date function, MySQL returns NULL and CUBRID returns an error by default. To make CUBRID return NULL for such cases, set the value of return_null_on_function_errors system parameter to yes. The following example shows that NULL is returned when an invalid parameter has been entered in the date function of MySQL. mysql> SELECT YEAR('12:34:56');   +------------------+ | YEAR('12:34:56') | +------------------+ | NULL | +------------------+ 1 row in set, 1 warning (0.00 sec) The following example shows that an error is returned when an invalid parameter has been entered in the date function of CUBRID when the value of system parameter return_null_on_function_error has been set to no which is the default value. csql> SELECT YEAR('12:34:56');   ERROR: Conversion error in date format. The following example shows that NULL is returned when an invalid parameter has been entered in the date function of CUBRID when the value of system parameter return_null_on_function_errors has been changed to yes. csql> SELECT YEAR('12:34:56');   year('12:34:56') ====================== NULL Result Value Type of Integer-by-Integer DivisionWhen integer-by-integer division is performed, MySQL prints the output value as a DECIMAL (m, n), but CUBRID prints it as a rounded INTEGER. This is because when each operand is of the same type, the result in CUBRID is printed as that same type. In this case to display the result value as a REAL number, apply the CAST() function to any or all operands in the fraction. The following shows an example of executing integer-by-integer division in MySQL. The result value will be printed as a real number type. mysql> SELECT 4/3;   +--------+ | 4/3 | +--------+ | 1.3333 | +--------+   mysql> SELECT 4/2;   +--------+ | 4/2 | +--------+ | 2.0000 | +--------+ The following shows an example of executing integer-by-integer division in CUBRID. The result value will be printed as an INTEGER type. csql> SELECT 4/3;   4/3 ============= 1   csql> SELECT 4/2;   4/2 ============= 2 The following shows an example of executing integer-by-integer division by using the CAST() function in CUBRID. The result value will be printed as a real number type. csql> SELECT CAST(4 AS DECIMAL(5,4))/CAST(3 AS DECIMAL(5,4));   cast(4 as numeric(5,4))/ cast(3 as numeric(5,4)) ====================== 1.333333333 The following shows an example of executing integer-by-real number division in the CUBRID. Since one of the input values is a real number type, the result value will be printed as a real number (DOUBLE) type . csql> SELECT 4/3.0;   4/3.0 ====================== 1.333333333 Processing SUM result which is larger than the Maximum Value of Input Value Type How will the result be printed if the result of SUM is larger than the maximum value of the input value type? 
MySQL converts the result of SUM to a pre-defined large DECIMAL number type. However, CUBRID processes the result as an overflow error. It means that in CUBRID the type of the input column decides the result type. Therefore, to avoid overflow errors in CUBRID, you should convert (CAST()) the input column type to a type that can accept the SUM result value before executing operations. Converting the type when executing the CAST() function incurs some additional cost, so I recommend to decide the column type considering the result value of functions you plan to use. First, configure the same table in MySQL and the CUBRID as follows. CREATE TABLE t (code SMALLINT); INSERT INTO t VALUES(32767); INSERT INTO t VALUES (32767); MySQL successfully prints the value because the result value of executing SUM is smaller than the result type. However, as the print type is decided by the input type in CUBRID, an overflow error occurs because the resulting value of executing SUM is larger than the result type. mysql> SELECT SUM(code) FROM t;   +-----------+ | sum(code) | +-----------+ | 65534 | +-----------+   mysql> SHOW COLUMNS FROM ttt;   +-----------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------------+------+-----+---------+-------+ | sum(code) | decimal(27,0) | YES | | NULL | | +-----------+---------------+------+-----+---------+-------+   csql> SELECT SUM(code) FROM t;   ERROR: Overflow occurred in addition context. The following is an example of converting the column type and then executing SUM in CUBRID. You can see that the value has been successfully printed. csql> SELECT SUM(CAST (CODE AS INT)) FROM t;   sum( cast(code as integer)) ====================== 65534 The following is an example of executing SUM after deciding the column type by considering the SUM result value size in the stage of creating a table in the CUBRID. You can see that the value has been successfully printed. csql> CREATE TABLE t (code INT); csql> INSERT INTO t VALUES(32767); csql> INSERT INTO t VALUES (32767); csql> SELECT SUM(code) FROM t;   sum(code) ====================== 65534 Result Value Type of VARCHAR TypeMySQL and CUBRID both allow for numerical operation when the value of the column that is VARCHAR type is a string consisting of numbers. In this case, the operation result type is DOUBLE for both (however, saving a value that needs numerical operation as a string type is not recommended; it is used just for explanation here). The following is an example of comparing the query result type of the MySQL to that of CUBRID. mysql> CREATE TABLE tbl(col VARCHAR(10)); mysql> INSERT INTO tbl VALUES('1'),('2'),('3'),('4'),('5'); mysql> CREATE TABLE ttbl AS SELECT SUM(col) FROM tbl; mysql> SHOW COLUMNS FROM ttbl;   +----------+--------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+--------+------+-----+---------+-------+ | SUM(col) | double | YES | | NULL | | +----------+--------+------+-----+---------+-------+     csql> CREATE TABLE tbl(col VARCHAR(10)); csql> INSERT INTO tbl VALUES('1'),('2'),('3'),('4'),('5'); csql> CREATE TABLE ttbl AS SELECT SUM(col) FROM tbl; csql> ;sc ttbl           ttbl         sum(col)         DOUBLE In MySQL, if there is a character (not a number) that exists on the column value, the character is considered as 0 for operation. However, CUBRID prints an error that it cannot convert the character to double type. See the following example. 
mysql> INSERT INTO tbl VALUES('a'); mysql> SELECT SUM(col) FROM tbl;   +--------+ | SUM(a) | +--------+ | 15 | +--------+ csql> INSERT INTO tbl VALUES('a'); csql> SELECT SUM(col) FROM tbl;   ERROR: Cannot coerce value of domain "character varying" to domain "double". Difference of SQL SyntaxSupporting START WITH … CONNECT BYCUBRID supports the START WITH … CONNECT BY syntax which can express the hierarchy that MySQL does not support as a query. This is a part of Oracle SQL compatibility syntax. As an example, using the following data we will print "managers" and "juniors" by sorting the result values within the same level in the order of "join date". id is the employee ID of a junior staff and mgrid is the employee ID of a manager. CREATE TABLE tree(id INT, mgrid INT, name VARCHAR(32), birthyear INT);     INSERT INTO tree VALUES (1,NULL,'Kim', 1963); INSERT INTO tree VALUES (2,NULL,'Moy', 1958); INSERT INTO tree VALUES (3,1,'Jonas', 1976); INSERT INTO tree VALUES (4,1,'Smith', 1974); INSERT INTO tree VALUES (5,2,'Verma', 1973); INSERT INTO tree VALUES (6,2,'Foster', 1972); INSERT INTO tree VALUES (7,6,'Brown', 1981); MySQL does not support hierarchy statement. Therefore, to print the result value satisfying the above request, you should execute several query statements in the following order. 1) First, print a "level 1" employees whose mgrid is NULL. SELECT id, mgrid, name, 1 AS level FROM tree WHERE mgrid IS NULL;  2) Then print the "level 2" employees whose mgrid is 1. SELECT id, mgrid, name, 2 AS level FROM tree WHERE mgrid=1; 3) Then print the "level 2" employees whose mgrid is 2. SELECT id, mgrid, name, 2 AS level FROM tree WHERE mgrid=2; 4) Then print the level "3 employee" whose mgrid is 6. SELECT id, mgrid, name, 3 AS level FROM tree WHERE mgrid=6; On the contrary, as CUBRID supports the hierarchical queries, a single query statement can be created as follows. SELECT id, mgrid, name, LEVEL FROM tree START WITH mgrid IS NULL CONNECT BY PRIOR id=mgrid ORDER SIBLINGS BY id;   id mgrid name level =============================================== 1 null Kim 1 3 1 Jonas 2 4 1 Smith 2 2 null Moy 1 5 2 Verma 2 6 2 Foster 2 7 6 Brown 3 The above code means that a parent node (manager) and the child node (junior staff) should be printed in order of the values of child nodes (junior staffs) with the same level in the order of the id. Including Disaggregate Item in the SELECT LIST which Includes Aggregate Function ONLY FULL GROUP BY When executing the GROUP BY clause, both MySQL and CUBRID basically allow the disaggregate column that is not included in the GROUP BY clause to be included in the SELECT list. However, the disaggregate column not included in the GROUP BY clause selects the record value which is fetched among various values for the first time. Therefore, note that the value may be different according to the fetch orders of MySQL and CUBRID. For the disaggregate column, it is not clear which proper value should it select among several values. Therefore, enable the ONLY FULL GROUP BY function to not expose the column that is not included in the GROUP BY clause to the SELECT list. To enable the ONLY FULL GROUP BY function, for MySQL set the sql_mode value in the my.conf configuration file to ONLY_FULL_GROUP_BY. For CUBRID, set the only_full_group_by value in the cubrid.conf configuration file to yes. From version 8.3.0 CUBRID supports the only_full_group_by system parameter. Before 8.3.0, the ONLY FULL GROUP BY function always ran as if it was enabled. 
The following example shows the result of executing the GROUP BY clause in MySQL and CUBRID while the ONLY FULL GROUP BY function is enabled. CREATE TABLE sales_tbl (dept_no int, name VARCHAR(20) PRIMARY KEY, sales_month int, sales_amount int DEFAULT 100); INSERT INTO sales_tbl VALUES (501, 'Stephan', 4, DEFAULT), (201, 'George' , 1, 450), (201, 'Laura' , 2, 500), (301, 'Max' , 4, 300), (501, 'Chang' , 5, 150), (501, 'Sue' , 6, 150), (NULL, 'Yoka' ,4, NULL);   SELECT dept_no, avg(sales_amount) FROM sales_tbl GROUP BY dept_no ORDER BY dept_no;   dept_no avg(sales_amount) ================================ NULL NULL 201 475 301 300 501 133 When There Is No GROUP BY Clause, Is it possible to Query Even if the Disaggregate Item Exists in the SELECT LIST?MySQL has an item that uses the aggregate function in the SELECT list and executes one value that has been fetched initially when there are several values for another item. However, in this case, CUBRID considers that it cannot decide the value and returns an error. If there is a disaggregate column by the aggregate function in the SELECT LIST, it means that any value among various values for the column will be randomly selected. Therefore, it is recommended not to execute that kind of query. Configure identical data in MySQL and CUBRID as shown below. CREATE TABLE tbl(a int, b date); INSERT INTO tbl VALUES (1,'20000101'); INSERT INTO tbl VALUES (2,'20000102'); INSERT INTO tbl VALUES (3,'20000103'); INSERT INTO tbl VALUES (4,'20000104'); In this case, the following query can be executed for both MySQL and CUBRID. SELECT COUNT(a), DATE_ADD(MAX(b), INTERVAL 10 DAY) FROM tbl;   +----------+-----------------------------------+ | COUNT(a) | DATE_ADD(MAX(b), INTERVAL 10 DAY) | +----------+-----------------------------------+ | 4 | 2000-01-14 | +----------+-----------------------------------+ In MySQL, when there are several values for column a, the column value of the record which has been fetched for the first time is calculated. In this case, the value of column a is floating so this processing is not proper. mysql> SELECT COUNT(a), DATE_ADD(MAX(b), INTERVAL a DAY) FROM tbl;   +----------+----------------------------------+ | COUNT(a) | DATE_ADD(MAX(b), INTERVAL a DAY) | +----------+----------------------------------+ | 4 | 2000-01-05 | +----------+----------------------------------+ In CUBRID, when there are several values for column a, CUBRID considers that it cannot determine which record column value must be calculated and returns an error. csql> SELECT COUNT(a), DATE_ADD(MAX(b), INTERVAL a DAY) FROM tbl;   ERROR: tbl.a is not single valued. Attributes exposed in aggregate queries must also appear in the group by clause. When an error is returned as shown above, if you want to make it executable, change the a to MAX(a) or MIN(a) in order to get only one value for the INTERVAL a as shown below. If the value of a is always the same, the result of executing this query in MySQL before changing the a will be same as the result in CUBRID after changing the a. csql> SELECT COUNT(a), DATE_ADD(MAX(b), INTERVAL MAX(a) DAY) FROM tbl;   count(a) date_add(max(b), INTERVAL max(a) DAY) ===================================================== 4 01/08/2000 Using a Reserved Word as Column Name and Table NameBoth MySQL and CUBRID do not allow reserved words for column name, table name, and alias. To use reserved words for identifiers such as column name, table name, and alias, the reserved words must be enclosed with quotes (" or `). 
In CUBRID, square brackets ([ ]) are allowed as well as quotes. Each DBMS supports different reserved words. For example, MySQL does not use ROWNUM, TYPE, NAMES, FILE, and SIZE as reserved words, however, CUBRID uses them all. For more details on CUBRID reserved words, see the online manual CUBRID SQL Guide > Reserved Words. The following example is using reserved words as column names in CUBRID. CREATE TABLE `names` (`file` VARCHAR(255), `size` INT, `type` CHAR(10)); CREATE TABLE "names" ("file" VARCHAR(255), "size" INT, "type" CHAR(10)); CREATE TABLE [names] ([file] VARCHAR(255), [size] INT, [type] CHAR(10)); SELECT [file], [size], [type] FROM [names]; Functional DifferencesSupporting Descending IndexTechnically, in MySQL, you can create a descending (DESC) index. However, the descending index is not actually created. However, the descending index is actually created in CUBRID. The following example shows creating the descending index for the ndate column in the CUBRID. csql> CREATE INDEX ON test_tbl(no ASC, ndate DESC); Note that you can create an index in CUBRID as using the REVERSE keyword. Then the index is created in the same order of having DESC in the column. The following example shows how to use the REVERSE keyword and the DESC keyword in order to create indexes for the ndate column in CUBRID in descending order. CREATE REVERSE INDEX ON test_tbl(ndate);  ... which is same as... CREATE INDEX ON test_tbl(ndate DESC); In MySQL, creating indexes in descending order is not allowed. Therefore, you can create an ascending index after adding an additional column in order to input values in the reverse order. For example, convert the value of DATE"2012-05-18" to the numeric value -20120518 (negative value) and then input an additional column. The following example shows creating an ascending index for the reverse_ndate column in MySQL. 1) CREATE TABLE test_tbl(no INT, ndate DATE, reverse_ndate INT); 2) CREATE INDEX ON test_tbl(ndate ASC); 3) UPDATE test_tbl SET reverse_ndate = -ndate; 4) CREATE INDEX ON test_tbl(reverse_ndate ASC); However, when you create a descending index in the ndate column, or when you create the reverse_ndate column with reverse order values and then create an ascending index, if one is UPDATE or DELETE and the other is SELECT, UPDATE, or DELETE and both of indexes are simultaneously scanned, the index scan will be made in reverse to each other. This case increases the possibility of a deadlock so you must note this while creating a descending index. Both MySQL and CUBRID have a bi-directional link between index nodes. Therefore, the ascending index can be used when the query planner determines that the cost of reverse scan using an ascending index will be lower than the sequential scan using the descending index even if ORDER BY DESC is performed in the query, The reverse scan takes the advantage only when the number of records to be scanned is relatively small. The following example shows using an ascending index when ordering in the descending order in CUBRID. csql> CREATE TABLE test_tbl(a INT, b char(1024000)); csql> CREATE INDEX i_test_tbl_a ON test_tbl(a); csql> INSERT INTO test_tbl (a, b) VALUES (10, 'a'), (20, 'b'), (30, 'c'), (40, 'd'), (50, 'e'), (60, 'f'), (70, 'g'), (80, 'h'), (90, 'i'), (100, 'j');   csql> SELECT a FROM test_tbl WHERE a > 70 ORDER BY a DESC; Supporting ROWNUMROWNUM is a function to number the result rows of the SELECT query from 1 in the ascending order and used as a column of a table. 
With ROWNUM, you can add a serial number for printed records and limit the number of records of the query result by using the conditions of the WHERE clause. CUBRID supports ROWNUM while MySQL does not. So if you want to add a serial number to the records in MySQL, you should use a session variable. Both MySQL and CUBRID support LIMIT … OFFSET in order to limit the number of records of the query result. However, the following discussion will address ROWNUM. The following two examples are used to print serial numbers for the records of the query result. In MySQL, process ROWNUM by using a session variable. mysql> CREATE TABLE test_tbl(col CHAR(1)); mysql> INSERT INTO test_tbl VALUES ('a'), ('b'),('c'),('d'); mysql> SELECT @rownum := @rownum + 1 as rownum, col FROM test_tbl WHERE (@rownum := 0)=0; +--------+------+ | rownum | col | +--------+------+ | 1 | a | | 2 | b | | 3 | c | | 4 | d | +--------+------+ In CUBRID, execute ROWNUM. csql> SELECT ROWNUM, col FROM test_tbl;   rownum col ============================================ 1 'a' 2 'b' 3 'c' 4 'd' Oracle also supports ROWNUM. However, in Oracle, ROWNUM is executed first in the syntax including the GROUP BY clause and the ORDER BY clause, and then the GROUP BY clause and the ORDER BY clause are executed later. Therefore, the printed ROWNUM> is not in the sorting order. So, if you want to execute the query including the GROUP BY clause or the ORDER BY clause first, make the query as a subquery of the FROM clause and execute ROWNUM for that. The following example shows printing ROWNUM in the order of ORDER BY in Oracle. SELECT ROWNUM, contents FROM (SELECT contents ORDER BY date) AS subtbl; CUBRID supports the GROUPBY_NUM() function and the ORDERBY_NUM() function which produces the result of GROUP BY and ORDER BY printed in the order. In the example of MySQL, since there was no ROWNUM and a session variable @rownum has been used, the final order result was used to print the result even though there were the GROUP BY clause and the ORDER BY clause. However, since CUBRID uses ROWNUM, you should use the GROUPBY_NUM() function and the ORDERBY_NUM() function if you want to keep the order of printed numbers without using a subquery. The following two examples show the difference between two cases; one uses GROUPBY_NUM() function while the other does not. Also it shows limiting the number of result records by using ROWNUM and then executing the GROUP BY order in the corresponding records. csql> SELECT ROWNUM, host_year, MIN(score) FROM history WHERE ROWNUM BETWEEN 1 AND 5 GROUP BY host_year;   rownum host_year min(score) ========================================================= 6 2000 '03:41.0' 6 2004 '01:45.0' The following example shows limiting the number of result records by using GROUPBY_NUM() for the record set ordered by using GROUP BY. csql> SELECT GROUPBY_NUM(), host_year, MIN(score) FROM history GROUP BY host_year HAVING GROUPBY_NUM() BETWEEN 1 AND 5;   groupby_num() host_year min(score) ================================================== 1 1968 '8.9' 2 1980 '01:53.0' 3 1984 '13:06.0' 4 1988 '01:58.0' 5 1992 '02:07.0' The following two examples show the difference between two cases; one uses the FOR ORDERBY_NUM() function while the other does not. Also it shows limiting the number of result records by using ROWNUM and then executing the ORDER BY order in the corresponding records. 
SELECT athlete, score FROM history WHERE ROWNUM BETWEEN 3 AND 5 ORDER BY score;   athlete score ============================================ 'Thorpe Ian' '01:45.0' 'Thorpe Ian' '03:41.0' 'Hackett Grant' '14:43.0' The following example shows limiting the number of the result records by using ORDERBY_NUM() for the record set ordered by using ORDER BY. SELECT athlete, score FROM history ORDER BY score FOR ORDERBY_NUM() BETWEEN 3 AND 5; athlete score ============================================ 'Luo Xuejuan' '01:07.0' 'Rodal Vebjorn' '01:43.0' 'Thorpe Ian' '01:45.0' Difference of AUTO_INCREMENTWhat should we do if we must know the primary key created after executing INSERT? The LAST_INSERT_ID() function is used to get the last value which has been INSERTED to the column after INSERTING a value to the AUTO_INCREMENT attribute column within a program connected to the database. For example, you can use it when you want to INSERT a value in the table where the primary key is included and then insert a foreign key value from another table in the table as the value of the LAST_INSERT_ID() function. Both MySQL and CUBRID support this function. However, an error may occur in some specific cases, so I recommend you to directly get the AUTO_INCREMENT value rather than using the LAST_INSERT_ID() function in CUBRID. Below I will explain these specific cases. In MySQL or CUBRID, when a record is INSERTED to the table which includes the AUTO_INCREMENT attribute column, the value of the column is automatically increased by 1. CREATE TABLE tbl (col INT AUTO_INCREMENT PRIMARY KEY, col2 INT); INSERT INTO tbl (col2) VALUES(1); INSERT INTO tbl (col2) VALUES(2); INSERT INTO tbl (col2) VALUES(3); SELECT LAST_INSERT_ID();   +------------------+ | last_insert_id() | +------------------+ | 3 | +------------------+ After executing INSERT to Table A, if you are trying to INSERT a foreign key value to Table B (which refers to Table A) by using the LAST_INSERT_ID() function value and an application or a server is unexpectedly and abnormally terminated, what will happen? (on the assumption that the procedure is one transaction, of course) In MySQL, the procedure, including all values increased by the AUTO_INCREMENT, is rolled back. If the AUTO_INCREMENT value of the record to execute INSERT newly was 3, the AUTO_INCREMENT value becomes 3 again after transaction rollback. However, in CUBRID, the increase of AUTO_INCREMENT value is not affected by transaction rollback. Therefore, the existing AUTO_INCREMENT value of the record was 3, the AUTO_INCREMENT value becomes 4 after transaction rollback. In MySQL, you can get the value entered in the column specified as the AUTO_INCREMENT by using the LAST_INSERT_ID() function only. However, in CUBRID, you can directly get the AUTO_INCREMENT value without using the LAST_INSERT_ID() function. It is just like using the SERIAL object of CUBRID because AUTO_INCREMENT has been implemented as a SERIAL object in CUBRID. When an AUTO_INCREMENT column is created in CUBRID, the name of the SERIAL object of the column is specified as "_ai_" internally. The following example shows calculating the next value and the current value on the AUTO_INCREMENT column in CUBRID. SELECT _ai_.NEXT_VALUE; SELECT _ai_.CURRENT_VALUE; In the current CUBRID version, the LAST_INSERT_ID() function may malfunction in the following specific cases. Therefore, I recommend that you directly get the AUTO_INCREMENT value rather than using the LAST_INSERT_ID() function. 
The LAST_INSERT_ID() is one of the session variables that manage the values in each database connection unit. If a broker is switched in CUBRID or if a failover occurs because of master node switch in the HA environment, it may malfunction so the desired value may not be printed. It will be modified later. In the following example, a session variable has been used as an intermediate storage for getting the next AUTO_INCREMENT value. For an application, use the variable in the program as an intermediate storage. Note that the SERIAL in CUBRID has the same function as the SEQUENCE in Oracle. The following example shows executing INSERT the primary key and the foreign key respectively after getting the NEXT_VALUE value of the SERIAL. csql> CREATE TABLE tblPK (col INT AUTO_INCREMENT PRIMARY KEY, col2 INT);   csql> CREATE TABLE tblFK (colfk INT AUTO_INCREMENT PRIMARY KEY, colfk2 INT, CONSTRAINT fk_col FOREIGN KEY (colfk2) REFERENCES tblPK (col)); At this time, the name of SERIAL created by the tblPK AUTO_INCREMENT is tblpk_ai_col. You can check that by executing: SELECT name FROM db_serial WHERE class_name='tblpk'; Directly insert the tblPK AUTO_INCREMENT value in the tblPK primary key and the tblPK foreign key. You don't need to save the tblFK colfk in an additional variable since it is not referred to by the other tables. csql> SET @a=(SELECT tblpk_ai_col.NEXT_VALUE); csql> INSERT INTO tblPK VALUES (@a,10); csql> INSERT INTO tblFK(colfk2) VALUES (@a); The following example shows how to write the code that returns the primary key value when a value executes INSERT in the tblPK in the iBatis sqlmaps. For the tblFK foreign key, use the returned value from the "insert" object. SELECT tblpk_ai_col.NEXT_VALUE INSERT INTO tblPK(col, col2) VALUES (#id#, #col2#) Length Difference of String Functions Such as CHAR_LENGTH For a string function, MySQL calculates the length based on the number of characters. However, CUBRID calculates it based on the character byte length. This difference depends on the database support of the character set. MySQL supports the character set and calculates the length based on the character set. However, CUBRID does not support the character set and considers all data as bytes, so it calculates the length based on the byte length. For UTF-8 character set, the byte length of one Korean character is 3 bytes. Therefore, the return value of the CHAR_LENGTH will be three times the number of characters in CUBRID in case of Korean characters. In CUBRID, the functions related to the character length, such as POSITION, RPAD and SUBSTR, use the byte length as the input parameter, not the character length. When specifying the length of CHAR or VARCHAR also as creating a table, the byte length is used. In the second half of this year we will release a new version of CUBRID under the code name "Apricot" with full Unicode support where the calculation will be made based on the character length, not the byte length. The following example shows the result of executing CHAR_LENGTH in the CUBRID. SELECT CHAR_LENGTH('CUBRIDa');   char length('CUBRIDa') ================== 13 Cursor Holdability After Commit while the Record Set FetchCursor holdability is that an application holds the record set of the query result to fetch the next record even after performing an explicit commit or an automatic commit. In this connection, the JDBC specification supports ResultSet.HOLD_CURSORS_OVER_COMMIT and ResultSet.CLOSE_CURSORS_AT_COMMIT. 
Cursor Holdability After Commit While Fetching the Record Set

Cursor holdability means that an application holds the record set of a query result and can fetch the next record even after performing an explicit commit or an automatic commit. For this, the JDBC specification provides ResultSet.HOLD_CURSORS_OVER_COMMIT and ResultSet.CLOSE_CURSORS_AT_COMMIT. Both MySQL and CUBRID ignore the configuration made with the conn.setHoldability() method: MySQL always runs as HOLD_CURSORS_OVER_COMMIT, and CUBRID always runs as CLOSE_CURSORS_AT_COMMIT. In MySQL, when conn.getHoldability() is called, the CLOSE_CURSORS_AT_COMMIT value is returned, which is the opposite of the actual behavior (refer to the source code of mysql-connector-java-5.1.20.tar.gz at http://www.mysql.com/downloads/connector/j/). In other words, MySQL holds the cursor even if a commit occurs during a fetch, and CUBRID closes the cursor when a commit occurs during a fetch. Therefore, to hold the cursor while fetching a SELECTed record set in CUBRID 8.4.x or lower versions, set auto commit to FALSE and fetch the record set without committing the transaction. (In the new version of CUBRID, "Apricot", to be released in the second half of this year, the cursor will be held regardless of commit unless the record set is closed.) The following schema is used below to demonstrate how cursor holdability works.

CREATE TABLE tbl1(a INT);
INSERT INTO tbl1 VALUES (1),(2),(3),(4);
CREATE TABLE tbl2(a INT);

The following example shows a program that, with auto commit enabled (autocommit=true), SELECTs the four rows of table tbl1 and INSERTs them into table tbl2. When it is executed, MySQL holds the cursor even when an auto commit happens, so all four rows are INSERTed into table tbl2. CUBRID, on the contrary, closes the cursor because of the auto commit triggered when INSERT is executed; no further fetch is possible, so only one row is INSERTed into table tbl2.

public static void executeTr(Connection conn) throws SQLException
{
    Statement stmt = conn.createStatement();
    Statement stmt2 = conn.createStatement();
    String q1 = "SELECT a FROM tbl1";
    String q2;
    ResultSet rs = stmt.executeQuery(q1);
    conn.commit();
    while (rs.next())
    {
        int a = rs.getInt("a");
        q2 = "INSERT INTO tbl2 VALUES (" + a + ")";
        stmt2.executeUpdate(q2);
    }
    rs.close();
    stmt.close();
    stmt2.close();
}

Conclusion

In this article I have reviewed many (but not all) differences between MySQL and CUBRID which I recommend users know before they switch from MySQL to CUBRID. It is important to fully understand these differences to successfully apply CUBRID to your services.

By Donghyeon Lee, NHN Business Platform DBMS Development Lab

From the author: "Is your baby a boy?" I am a father who is asked this question even though my little baby is a girl. I will do my best until the day CUBRID becomes a DBMS that is very familiar to users.

References

MySQL 5.5 Reference Manual: http://dev.mysql.com/doc/refman/5.5/en/create-index.html
MySQL Performance Blog: http://www.mysqlperformanceblog.com/2006/05/09/descending-indexing-and-loose-index-scan/
Interface ResultSet (Java Platform SE 7): http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
Posted about 12 years ago by Dongsun Choi
There are various techniques to improve the performance of your Java application. In this article I will talk about Statement Pooling configuration and its effect on the Garbage Collection process. Statement Pooling improves the performance of an application by caching SQL statements that are used repeatedly. Such a caching mechanism allows frequently used statements to be prepared only once and reused many times, thus reducing the overall number of times the database server has to parse, plan, and optimize these queries. A well-configured number of statements to be cached (maxStatements) can be as effective as tuning the Garbage Collection. Now let's see how Statement Pooling can affect the Garbage Collection.

Why Check the Number of Statements in the Pool?

Often the size of the JDBC statement pool is left at the default value. Using the default value, of course, does not usually lead to any special issue. But a well-configured maxStatements value can be as effective as GC tuning. If you are using the default maxStatements value and would like to optimize the use of memory, think about the correct statement pool size before attempting GC tuning. As was discussed in Understanding Java Garbage Collection, the weak generational hypothesis (most objects quickly become unreachable, and references from old objects to new objects are rare) was the precondition when the garbage collectors in Java were created. For the majority of NHN web services there should be a response within 300 ms at the latest, unless it is a special case. Therefore, NHN web services fit this hypothesis even better than general stand-alone applications.

The GC Process between HTTP Request and Response

When developing a web service using web containers like Tomcat and other frameworks, the lifespans of the objects created by a developer tend to be either very short or very long. Web developers usually write code such as Interceptors, Actions, BOs, or DAOs (BOs and DAOs are created and used as singletons from the applicationContext in Spring, and are not the target of GC). The objects generated by this code stay alive only for the brief period between the time an HTTP request is received and the time it is responded to. For this reason, such objects are usually collected during young GC. There are also objects, such as singleton objects, that stay alive for the whole lifecycle of Tomcat. Such objects are promoted to the old area soon after Tomcat starts running. Yet, when continuously monitoring web applications through jstat and the like, there are always some objects promoted to the old area during young GC. These are usually objects stored in the caches used to improve the performance of frameworks in most containers and projects. Whether a cached object becomes the target of GC or not is determined by its cache hit ratio, not by its age, so unless the hit ratio is 100%, such objects cannot avoid being promoted to the old area, even when the young GC cycle is set to be long. Among these caches, statement pooling affects memory usage the most. If you are using iBatis, you are using statement pooling, as iBatis processes all SQL as PreparedStatements. If the size of the statement pool is smaller than the number of SQLs being used, the cache hit ratio will decrease and result in cache maintenance cost.
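The exact place where maxStatements is set depends on the connection pool or framework in use; the article's Lucy framework is internal to NHN, so the following is only a sketch using c3p0 as a stand-in pool, with placeholder connection details and the 500-statement default discussed later:

import com.mchange.v2.c3p0.ComboPooledDataSource;

public final class PoolConfig {
    public static ComboPooledDataSource newDataSource() throws Exception {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setDriverClass("cubrid.jdbc.driver.CUBRIDDriver");      // placeholder; use your JDBC driver
        ds.setJdbcUrl("jdbc:cubrid:localhost:33000:demodb:::");    // placeholder URL
        ds.setMaxPoolSize(50);
        // Size the statement cache to the number of distinct SQLs the application actually prepares,
        // instead of leaving it at whatever the default happens to be.
        ds.setMaxStatements(500);
        ds.setMaxStatementsPerConnection(500);
        return ds;
    }
}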
Objects in the old area that are no longer reachable become the target of GC and are collected, then regenerated during the HTTP request process, only to be cached and promoted to the old area again. The full GC cycles are affected by this process.

Size of the Statement Objects

It would be safe to say that the size of a single statement object is proportional to the length of the SQL text processed by that statement. Even for a long and complex SQL, the size of the object should be around 500 bytes. The object's small size would seem to have little effect on the full GC cycles, but such an assumption would be incorrect. When you look at the JDBC specifications, each connection has its own statement pool (maxStatementsPerConnection), as described in Figure 1 below. So, although a statement object is as small as 500 bytes, if there are many connections, the statement cache may occupy a proportionally large amount of the heap.

Figure 1: Relationship between the Connection and the Statement. (Though the statement has a ResultSet, it should be clarified that the ResultSet is not an object for caching. The ResultSet reference is set to null when rs.close() is called by iBatis, and the ResultSet is then collected in the young area during young GC.)

The Effect of Statement Pool's Cache Hit Ratio on the Full GC

A simple test program was created to assess the effect of the cache hit ratio on the full GC. One cache hit ratio was set to 100% while the other was set to 50%. When the same amount of load was applied, the results presented in Tables 1 and 2 were obtained. In both cases, the occurrences of young GC were very similar, but the results for the full GC were different. When the cache hit ratio was 100%, full GC occurred only once, because the number of objects promoted to the old area during young GC was small. When the ratio was 50%, full GC occurred 4 times, because the number of statement objects promoted to the old area during young GC was high: the objects were cached in the statement pool, removed from the pool in an LRU manner, then cached again at the next request.

Table 1. Cache hit ratio = 100%.
...  OC       OU      YGC  FGC  FGCT   GCT
...  10688.0  6940.9  532  1    0.190  1.274
...  10688.0  6940.9  532  1    0.190  1.274

Table 2. Cache hit ratio = 50%.
...  OC       OU      YGC  FGC  FGCT   GCT
...  10240.0  7092.7  554  4    0.862  2.253
...  10240.0  7412.0  555  4    0.862  2.255

I would like to add one more thing. When the cache hit ratio is 50%, it violates the second premise of the weak generational hypothesis introduced previously (a reference from an old object to a new object is rare). When a low cache hit ratio causes frequent pool registration and subsequent removal, statement objects generated in the young area are referenced from the pool in the old area, which leads to additional strain during GC because the card marking technique is used to track these references separately.

In Conclusion

In Lucy (NHN's internal Java framework), the maxStatements value for statement pooling in Oracle and MySQL is 500. In most cases, 500 should be enough. However, when more SQL is being used, increasing the default value to meet the demand is a way to improve system efficiency (when $ string substitution is used in iBatis queries, for example because of table partitioning, the number of queries must be multiplied by the number of partitioned tables). However, when the configured value is higher than necessary, this leads to a different problem: a higher value means more memory usage and a higher likelihood of an Out Of Memory Error (OOME).
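Before raising maxStatements, it can help to sanity-check the resulting cache footprint against the configured heap. The following is a throwaway sketch with assumed numbers; the per-statement size comes from the rough estimate above, and the next paragraph works through the same arithmetic for a concrete case:

public class StatementCacheEstimate {
    public static void main(String[] args) {
        long bytesPerStatement = 500;            // rough per-statement size estimated above
        long connections = 50;                   // example pool size
        long statementsPerConnection = 10_000;   // example number of distinct SQLs cached per connection
        long cacheBytes = bytesPerStatement * connections * statementsPerConnection;
        long heapMax = Runtime.getRuntime().maxMemory();   // effectively the configured -Xmx value
        System.out.printf("statement cache ~ %d MB of a %d MB heap%n",
                cacheBytes / 1_000_000, heapMax / 1_000_000);
    }
}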
In a situation where the number of SQLs is 10,000 and the number of connections is 50, the total size of the statement objects is about 250 MB (500 bytes * 50 * 10,000 = 250 MB). It should be easy to determine the likelihood of an OOME by checking the Xmx configuration of the service in use. What strategy do you follow to determine the correct number of statements to be pooled? Share your experience in the comments below.

By Dongsoon Choi, Senior Engineer at Game Service Solution Team, NHN Corporation.
Posted about 12 years ago by Donghyun Lee
The internals of an RDBMS are extremely profound and sensitive. When updating a record in a table, not only the size of the columns to be updated affects the performance but also the size of the record itself. Updating a variable-length column is less favorable than updating a fixed-length column in terms of performance, and when updating variable-length columns you may need to change the unfill_factor value to adjust the percentage of reserved disk space. In this article, I will explain why these details affect UPDATE performance, and what happens inside CUBRID when an UPDATE query is executed. My story is based on the latest version of CUBRID as of today, 8.4.1.

UPDATE in RDBMS

Let's make an analogy between the process that implements an UPDATE query in an RDBMS and the process of selling products in a shopping mall.

Shopping mall    RDBMS
Storage          Disk
Shelf            Memory Buffer

At a shopping mall the employees keep frequently bought products on the shelves and less frequently purchased products in the main storage. When customers come, they can buy the products available on the shelves quickly, without waiting. For products not available on the shelves, they have to wait while the shopping mall employees go and retrieve them from the storage. It is the same in an RDBMS. To have fast access to the data, you need to put frequently requested data in a memory buffer. How and which products are displayed on the shelves has a profound impact on the shopping mall's sales. Likewise, which data is kept in the memory buffer has a decisive effect on RDBMS performance. In order to improve performance, an RDBMS handles data input/output by page (the default page size in CUBRID is 16KB). Operations on the memory buffer used as a cache are also processed by pages. There could be a shortage of space if too many products are displayed on a shelf. Suppose you want to remove slow-selling items from the shelves and display other, more frequently sold products; however, the latter are too big to be displayed. What should you do? You will have to move products around on the shelf to allocate more space for the big ones. Which method should you use if you do not want to rearrange a page when the new data to be UPDATEd is bigger than the old record? Maybe the employees at the shopping mall could rearrange the products on the shelf all together; it would be effective to divide the work by area, but that is impossible in an RDBMS. In an RDBMS, reading and writing can occur on one record at the same time, and there can be a race for the limited buffer space. Therefore, you should be able to access a record exclusively by using record locking. How do we actually make this work?

UPDATE athlete SET name = 'Cheolsu Kim' WHERE code = '10980';

The above SQL's execution process can be summarized in the flowchart below:

Figure 1: Implementation process flowchart for an UPDATE query.

To be precise, we should also cover SQL parsing, SQL optimization, transaction processing, data locking, data modification, data storing, operation logging, and recovery. However, this article will focus only on what happens in the memory buffer and on disk while updating.

Index Navigation

The first thing to be done is to find the record we need to update. If the conditions used in the query can use an index, we can use index navigation. If the details of our record are in the memory buffer, they are used as they are. If the record is not found in the memory buffer, it has to be read from the disk and transferred to the memory buffer.
This process is called FIX. In CUBRID, an index is composed of a B+ tree (for details refer to CUBRID Query Tuning Techniques and What is Covering Index in CUBRID 8.4.0?). Generally speaking, a B+ tree stores the locations (VPID) of the previous and next key nodes in a non-leaf node, while the disk location of the actual data corresponding to a key is stored in a leaf node. Therefore the data search process is affected by how the B+ tree is configured.

Figure 2: Storage structure of a B+ tree index.

Sometimes the size of a column (like text) can be bigger than a page (16KB). In this case you need separate storage outside the node of the B+ tree. Therefore, you might have to access a separate storage space besides the B+ tree when searching for data. Below I will explain which index configurations are likely to cause the overflow key file or overflow OID issues when UPDATE-ing a record.

Overflow Key

First, let's look at a case where a very large value is used as an index key.

CREATE INDEX idx_history ON athlete(history);
UPDATE athlete SET gold = 3 WHERE history = $history_string;

If the history field data used as the key is roughly 100KB, it is greater than a page (16KB), the unit that stores regular data. This is called an overflow key, and the key value cannot be saved in a page. In the node, you instead store an ID (OFID, Overflow File ID), which refers to a file that stores the key value. As CUBRID manages data by page, it is burdensome when data is divided into several pages, as many memory copies (memcpy) are needed to assemble a single piece of data. Hence, it is recommended to keep the size of a key such that it can be stored in one page if possible.

Overflow OID

Now let's look at a situation in which key values are heavily duplicated.

CREATE INDEX idx_gender ON athlete(gender);
UPDATE athlete SET gender_type = 'Male' WHERE gender = 'M';

Assume the gender column is indexed. The low selectivity of the gender field (only two values, "M" and "F", so DISTINCT(column) is low) may result in many OIDs being stored in the same B+ tree node for the same index key. If the space necessary for storing the OIDs is bigger than a quarter of a page, they are stored in a separate Overflow OID (OOID) page. If the number of separate overflow pages increases, performance is affected, as there is a burden to navigate these additional links. The following is the structure of a B+ tree when overflow files and overflow OID pages exist.

Figure 3: The storage structure of OFID and OOID.

Modifying the Column Value

This process finds the record to be updated and modifies the necessary column data of the record. The following flowchart summarizes this process:

Figure 4: Data update flowchart.

Since an RDBMS implements read and write operations in pages, you will need to read or write one or more pages. Data stored in memory can be read and written faster than data stored on disk, so you should minimize disk I/O operations by making the most of memory. To do this, you can use a memory buffer, which is an intermediate storage space. As shown in Figure 4 above, first check whether the necessary data is in the memory buffer. If it is not, load it into the memory buffer from the disk (DB volume). Then update the data in memory. When the data is updated, the modification is recorded in the logs, and, finally, it is written to the database volume (disk) at the checkpoint.

Memory and Disk Management

The following figure summarizes the policies and main tasks used for memory and disk management.
Figure 5: Work that occurs when storing updated data.

Page Management Policy

LRU (Least Recently Used): In order to effectively use the limited buffer size and keep recently used data, data that has not been used for a long time is removed to free up space. This is the memory management method used with the STEAL policy.

Tasks of the Buffer Administrator

FIX: Acquires a latch on a buffer, in competition with other transactions, when reading a page from disk into the memory buffer. Other transactions cannot use the buffer while your transaction holds its latch. If the corresponding page is already in a buffer, the page is not loaded from disk (database volume).

UNFIX: The opposite of FIX; it is performed on a page that is no longer used by the transaction. An UNFIXed page is not necessarily flushed to disk.

FLUSH: Writes a DIRTY page from the buffer to disk. As the log file (WAL, Write-Ahead Log) has to be written to disk just before this task is executed, a race can occur in which two or more writers write to the same log file.

Logging Policy to Recover from Failure or Rollback

WAL (Write-Ahead Logging): Write-Ahead Logging is the policy that enables restoration upon system failure, i.e. always write UNDO/REDO logs to disk before writing the data page. UNDO contains the data before the UPDATE and REDO contains the new data after the UPDATE. You can use the STEAL/NO STEAL and FORCE/NO FORCE policies to specify when a page stored in the memory buffer is reflected to disk.

STEAL/NO STEAL: When a transaction attempts to use the memory buffer, the STEAL policy flushes the least recently used (LRU) dirty pages and frees up memory buffer space. To use the STEAL policy, you need UNDO logging in order to recover the old data when you roll back a transaction. In contrast, the NO STEAL policy keeps DIRTY pages in the buffer until the transaction is completed. Therefore, there should be enough buffer space to keep all pages that have been changed by all transactions in progress.

FORCE/NO FORCE: The FORCE policy reflects all pages updated by a transaction to the database volume when commit is called. Using the FORCE policy affects performance, as a disk write has to be carried out every time a transaction commits: if a page is modified 20 times by several transactions within a brief period of time, the disk also needs to be written 20 times. The NO FORCE policy does not necessarily reflect all pages updated by a transaction during the commit process. The cost of rewriting a page can be reduced if other transactions update the same page while the commit has not yet been reflected to the database volume. However, to guarantee the changes of a successfully committed transaction in the event of a system failure, REDO logging is necessary.

The NO STEAL and FORCE policies are the easiest to implement, but their performance is the worst. As the STEAL and NO FORCE policies perform best, most databases, including CUBRID, are implemented with these policies. For this reason both UNDO logging and REDO logging are necessary, and the entire record should be logged, not just the column to UPDATE. For UNDO logging, the record before the change is stored as-is; for REDO logging, the record after the change is compressed and stored. You must note that a big record size will affect logging, even if the size of the column to UPDATE is small.
If there are a lot of records that are very big and the number of frequently UPDATEd columns is small, it is preferable to gather these UPDATE-able columns and create a separate table for them. This is recommended only from the viewpoint of the size of the logged records.

Page Storage Structure

Now let's learn how a page is stored depending on the size of the data to UPDATE. The following figure shows the basic structure in which records are stored in a page:

Figure 6: Record storage structure inside a page.

When a new value is entered, records are assigned from the front of the page, while slots are assigned from the end of the page in the opposite direction. One record consists of a header, fixed-length columns (Fix Col), and variable-length columns (Var Col). The following explains each component.

page header: stores information about the slots within the page, such as the number of slots and records, the size of the total free space and of the contiguous free space, the start offset of the contiguous space, the transaction ID, etc.
heap page header: stores the initial record location and the schema information location.
record: stores a record value.
slot: stores information about the record location (offset, length).

If there is a variable-length column in a record, the size of the record may change when it is UPDATEd. If the record to UPDATE is smaller than the existing size, the data can be UPDATEd in the same location. However, if the size is bigger, the new record is assigned within the same page, and the existing slot points to the new record. If there is no space in the same page for the new record, a separate page is assigned as shown in the following figure, and the existing record points to the new record.

Figure 7: The structure of record storage when space to UPDATE in the existing page is insufficient.

If the size of the record to UPDATE is bigger than a page, the data is divided and stored page by page as shown in the following figure, and the page that stored the previous data points to the page that stores the renewed data.

Figure 8: The structure of record storage when the record to UPDATE is bigger than a page.

When writing data in this case, the data has to be split into several pages, and to read it, the split pages have to be loaded into a contiguous area of memory, which may negatively affect performance.

Exclusive Record Lock

To this point we have explored how CUBRID uses system memory and disk when an UPDATE query is requested. Now I will explain how the records within the corresponding range are managed while executing an UPDATE query. Generally, a shared lock is used when searching for a specific record, to allow other transactions to see (SELECT) the same record you are currently navigating. For INSERT, UPDATE, or DELETE, an exclusive lock is used to block access from other transactions. In the searching process using a WHERE condition, however, you must use an update lock instead of a shared lock when acquiring the lock for a record in the target range of the UPDATE. You also need to acquire the lock for the key of the record being processed and for the next key, to block your operating range from other transactions. We will describe why an additional update lock is necessary besides the exclusive lock during UPDATE, and why key locking is needed.

Update Lock

The following explains lock acquisition and release during UPDATE execution:

1. Request the UPDATE query with the WHERE condition to the database server.
2. Acquire an update lock while searching for the records matching the range condition.
3. At the same time, acquire the key lock for the next key within the WHERE condition range.
4. While executing the actual data UPDATE, convert the update lock to an exclusive lock.
5. Commit the transaction and release all acquired locks.

The behavior of an update lock can be explained as: since this is the only transaction authorized to UPDATE until the UPDATE transaction is completed, other transactions cannot UPDATE. A transaction can acquire an update lock on a record that another transaction holds with a shared lock. However, once a transaction has acquired an update lock, other transactions cannot obtain a shared lock on that record. Acquiring an exclusive lock immediately on every record examined would decrease concurrency, so using a shared lock seems to be a better option while searching for targets by condition. Why, then, do we have to use an update lock instead of a shared lock? The reason is to minimize deadlocks. For a better understanding, let's look at the following sequence, which has no update lock.

Transaction 1: shared lock1 (Record A)
Transaction 2: shared lock2 (Record A)
Transaction 1: exclusive lock1 (Record A); standby
Transaction 2: exclusive lock2 (Record A); standby

This is the situation in which Transactions 1 and 2 have both acquired a shared lock on the same Record A, and both are about to upgrade the lock to an exclusive lock. However, each is waiting for the other transaction to free its shared lock; that is, they are in deadlock. What would happen if there were an update lock?

Transaction 1: update lock1 (Record A); navigation
Transaction 2: update lock2 (Record A); standby
Transaction 1: exclusive lock1 (Record A); UPDATE; lock release
Transaction 2: update lock2 (Record A); navigation; exclusive lock2 (Record A); UPDATE; lock release

Transaction 1 has acquired an update lock and Transaction 2 is in standby mode waiting to obtain an update lock. Transaction 1 converts the update lock to an exclusive lock, UPDATEs Record A and releases the lock. Transaction 2 then acquires update lock2, which was in standby, and can execute its UPDATE. With the update lock introduced, you can avoid some of the deadlocks caused by UPDATE. The following is a concrete example of the situation above. Automatic commit mode is disabled and the isolation level is 4 (REPEATABLE READ SCHEMA, READ COMMITTED INSTANCES).

create table t1(id int, num int);
create unique index idx_t1_id on t1(id);
insert into t1 values (10,10), (20,20), (40,40), (80, 80);
commit;

Transaction 1: execution (update ... where id = 40;)
Transaction 2: standby (update ... where id = 40;)
Transaction 1: commit;
Transaction 2: execution (update ... where id = 40;)
Transaction 2: commit;

While a transaction holds an update lock, other transactions cannot UPDATE or DELETE the corresponding record. Therefore, if you use an update lock and access the same record as above, you can avoid the deadlock resulting from UPDATE. An update lock is used not only for UPDATE, but also for DELETE tasks with a WHERE condition that uses an index.
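For reference, the same standby-then-proceed behavior can be observed from JDBC with two connections. The following is only an illustrative sketch, assuming a local demodb database containing the t1 table above; the URL, account, and timing are placeholders:

import java.sql.*;

public class UpdateLockDemo {
    static final String URL = "jdbc:cubrid:localhost:33000:demodb:::";  // placeholder URL

    public static void main(String[] args) throws Exception {
        Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
        try (Connection c1 = DriverManager.getConnection(URL, "dba", "");
             Connection c2 = DriverManager.getConnection(URL, "dba", "")) {
            c1.setAutoCommit(false);
            c2.setAutoCommit(false);

            try (Statement s1 = c1.createStatement()) {
                s1.executeUpdate("UPDATE t1 SET num = 400 WHERE id = 40"); // transaction 1 holds the lock
            }

            // Commit transaction 1 from another thread after a short delay;
            // until then, transaction 2's UPDATE below stays in standby.
            new Thread(() -> {
                try { Thread.sleep(2000); c1.commit(); } catch (Exception e) { e.printStackTrace(); }
            }).start();

            try (Statement s2 = c2.createStatement()) {
                // Blocks (standby) until transaction 1 commits and releases its locks.
                s2.executeUpdate("UPDATE t1 SET num = 500 WHERE id = 40");
            }
            c2.commit();
        }
    }
}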
Key Lock

An update lock can be obtained even when the record to UPDATE is held under a shared lock by other transactions; however, two or more transactions cannot acquire the update lock at the same time. For data consistency, the update lock alone is insufficient. Let's consider the following example.

Transaction 1: delete from t1 where id = 1
Transaction 2: insert into t1 (id) values (1)
Transaction 2: commit
Transaction 1: rollback

There are many models that can be used to handle the request above, and the approach varies depending on the RDBMS. CUBRID 2008 R4.1 places the INSERT task of Transaction 2 in standby mode. However, Transaction 1 cannot put Transaction 2's tasks in standby mode by using an update lock, as the data of the corresponding key will have been deleted from the B+ tree when the DELETE is executed. This problem is solved by using a key lock. When performing DML tasks, a key lock acquires the lock on the record pointed to by the next key of the record being processed. If table locking were used instead of a key lock, concurrency would deteriorate; thus, a lock on a limited range should be obtained to ensure maximum concurrency. A typical example of using key locks is a unique index. When performing DML tasks, you should obtain a key lock on the record pointed to by the next key of the target record, as well as on the target record itself, to ensure there are no DML tasks by other transactions within the range. Let's learn about the operation of key locks with the following example. Automatic commit mode is disabled.

create table t1 (id int, num int);
create unique index idx_t1_id on t1(id);
insert into t1 values (10,10), (20,20), (40,40), (80, 80);
commit;

Transaction 1: update t1 set num=400 where id=40;
Transaction 2: insert into t1 values (30, 30);   standby ... rollback;
Transaction 2: delete from t1 where id=20;   standby ... rollback;
Transaction 2: insert into t1 values (70, 70);   standby ... rollback;
Transaction 2: delete from t1 where id=80;   standby ... rollback;
Transaction 2: update t1 set num=444 where id=80;   standby ... rollback;

When Transaction 1 executes the UPDATE of the id=40 record, it acquires the key lock for the id=40 key and for id=80, the next key. In this case, other transactions cannot INSERT, UPDATE, or DELETE records within that range. For Transaction 2 to INSERT the id=30 record, it has to acquire the key lock for id=40; as Transaction 1 has already acquired the key lock for id=40, the work is in standby mode. As Transaction 1 has already acquired the key locks for id=40 and id=80, all other tasks that require these keys are in standby mode. Other transactions can therefore end up in standby mode because of key locks. The more indexes there are on a column you plan to UPDATE, the more key locks may occur, which has a greater negative effect on UPDATE performance. Thus, we recommend that you remove unnecessary indexes on such columns, considering both SELECT and UPDATE performance.

Figure 9: Key lock related scenario for records being updated.

Considering that deadlock can occur between indexes because of key locks, we recommend that you retry the corresponding task when it happens. In the following scenario, we will learn about deadlocks resulting from key locks. Deadlock may occur under the assumption that Transactions 1 and 2 are executed at the same time.

create table t1(id int, num int);
create unique index idx_t1_id on t1(id);
insert into t1 values (10,10), (20,20), (40,40), (80, 80);
commit;

Transaction 1: update t1 set num = 40 where id = 1 using index pk_t1_id; (pk)
Transaction 2: update t1 set id = 30 where num = 7 using index i_t1_num; (idx)

Figure 10: Deadlock scenario caused by key lock.

Transaction 1 attempts to UPDATE the column num of the id=1 record and obtains the key locks for id=1 and id=5 as well as the exclusive lock for record (1,9). Meanwhile, Transaction 2 attempts to UPDATE the column id of the num=7 record and acquires the key locks for num=7 and num=9 as well as the exclusive lock for record (5,7).
Transaction 1 needs the key lock for key 9 of the idx index to UPDATE the column num of the id=1 record, but Transaction 2 has already acquired that lock. Meanwhile, Transaction 2 needs the key lock for key 5 of the pk index to UPDATE the column id of the num=9 record, but Transaction 1 has already acquired that lock. That is, a deadlock has occurred. A key lock comes in two modes: key shared lock and key exclusive lock. The lock that an INSERT task acquires for the next key is a key shared lock, and the lock that UPDATE or DELETE tasks acquire for the next key is a key exclusive lock. When UPDATE or DELETE works on records in a specific range, other transactions must not change records within that range; during an INSERT operation, however, the key shared lock on the next key still allows other transactions to INSERT into the same range. Let's examine the following figure:

Figure 11: Key lock related scenario for a record being INSERTed.

When Transaction 1 INSERTs the id=90 record, it acquires the key shared lock for id=100, the next key. When Transaction 2 INSERTs the id=95 record, it acquires the key shared lock for id=100, which is also its next key. Two different transactions can share the lock on the same key id=100 because the key lock is acquired in key shared lock mode.

In Conclusion

In this article we have observed the events that occur in the memory and disk storage structures while data is UPDATEd in CUBRID, and how locks are obtained so as to block other transactions from the records within the UPDATE range. It is a mistake to think that the processing cost will be small just because one record or a small column is UPDATEd. When evaluating the WHERE condition to find the UPDATE target, yes, the use of an index will affect the search time. But you must also remember that it is the size of the record, not the size of the UPDATEd column, that affects UNDO/REDO logging. We can assume that the smaller the record to be UPDATEd, and the more of it that is fixed-length rather than variable-length, the more advantageous it is for UPDATE performance. If UPDATEs of variable-length columns occur constantly, you can consider adjusting the system parameter unfill_factor (default value 0.1), which controls how much disk space is pre-allocated in a page for data UPDATEs. For example, if the value of unfill_factor is 0.1, about 10% of extra space is left in each page while data INSERTs occur continuously; that remaining 10% of extra space can then be used for later INSERTs into the same page or for UPDATEs. UPDATE requires an update lock and key locks. You should understand that these can lead to standby or deadlock among competing transactions. If an unexpected delay or deadlock occurs, you should check whether the index configuration for the UPDATEd columns is appropriate, or whether the number of indexes on those columns can be minimized. I hope that this article and the recommendations given to reduce the burden of UPDATE execution provide you an opportunity to understand the locking and storage structures used in the CUBRID database when executing UPDATE operations.

By Lee Donghyun, CUBRID Manual and Release Notes writer, CUBRID DBMS Development Lab, NHN Corporation.
Posted about 12 years ago by Jeon Won Hee
"Ins and outs of NHN" is a series of articles that compares platforms and services from third-party vendors with NHN's own solutions. The topic of this first article in the series is Database Sharding Platform. I will introduce about the efforts ... [More] being made from inside and outside of NHN to implement Database Sharding. I will first explain the concept of Sharding data vs. Partitioning data. Then review the common methodology to implement sharding. And finally I will compares all sharding platforms. This article will be very interesting for developers interested in big data management. Database ExpansionTo store and search a volume of data which is so big that it cannot be handled by one database, you must find a way to use multiple databases. Although there are some databases made for distributed environments, such as Cassandra or Dynamo, these have many functional constraints, such as weakness in terms of search range or inability to use the JOIN operations. In order to expand data while using a relatively feature-rich functionality, it is recommended to use RDBMS by sharding the databases. In the past, the Sharding logic was implemented directly on the application layer, but now there are an increasing number of examples introducing Sharding platforms which allow to move the Sharding logic from the application layer to database or middleware layers. At a core level, Sharding platforms must respond effectively to ever-increasing data without failure, and handle different data characteristics and models depending on services. In this article I will compare Spock Proxy, the Sharding platform based on MySQL Proxy, Gizzard, created by Twitter, and CUBRID SHARD, native database sharding feature in CUBRID which is set to launch in the first half of 2012. A table of other solutions was previous posted at Database Sharding with CUBRID. Horizontal Partitioning and ShardingHorizontal partitioning is a design that divides and stores data, such as a schema, into two or more tables within one database. For example, to handle large data of user messages, such schema may be created so that messages by users from city A are stored in one table, while messages by users from city B are stored in another table. This allows to reduce the size of the index and increases the concurrency of operations. The key point in this approach is that the data is partitioned between tables within a single database. Sharding, on the other hand, is the distributed approach where data is horizontally partitioned between tables created in physically different databases.  Thus, Sharding is a method to store the messages by users from city A in database A and messages by users from city B in database B. Here each database is called a Shard. As you have to work on multiple databases, there could be limits to functionality according to circumstances and also a drawback in terms of consistency and replication, including JOIN operations. In many cases, Sharding used to be implemented at the application server level. There have been many attempts to provide this at the platform level. These can be classified into a pattern that operates in an application server, such as Hibernate Shards, a middle tier pattern as CUBRID SHARD, Spock Proxy, and Gizzard, and a pattern that provides the Sharding functionality from a database itself, e.g. nStore or MongoDB. 
Middle Tier Sharding Platform

By default, a Sharding platform should consider the following items:

- Database location abstraction
- Scalability
- Monitoring / ease of operations

Database location abstraction and scalability are different from each other but connected. Database location abstraction ensures that the application layer does not need to know which data (which row) is located in which database. The application is connected only to the Sharding platform; connecting to a database is the Sharding platform's job. In addition, the Sharding platform should be able to add replicated storage to a specific Shard (one of the partitioned databases) and migrate a database for replacement without restarting or changing the application code. When the Sharding platform stops, it is obvious that the application server will not be able to access the database; for this reason, the Sharding platform should provide redundancy. For monitoring, the Sharding platform should be able to provide the number of requests and error information per Shard key (the value used to determine which data is stored in which database).

Comparison between Spock Proxy and CUBRID SHARD

The key function of a Sharding platform is the CRUD operation (CREATE, READ, UPDATE, DELETE) which chooses one database among many according to a rule set by the developer, in other words the Sharding strategy. We will compare Spock Proxy, a typical Sharding platform for MySQL, with the CUBRID SHARD platform developed by NHN. Spock Proxy is a Sharding platform based on MySQL Proxy. In MySQL Proxy, you can execute Lua script code, written by a developer, before and after SQL is executed. The primary purpose of using MySQL Proxy is to analyze and modify SQL. To use Spock Proxy, you need to create a MySQL database to manage the information about the shards and how the data should be distributed; each row in it is a Sharding rule.

Figure 1: Specifying Sharding rules in Spock Proxy.

CUBRID SHARD is the Sharding platform for CUBRID. The uniqueness of CUBRID SHARD is that it can also be used with MySQL and Oracle. It is set to be launched in the first half of 2012, and is planned to be used for the meta information database system of the NDrive service, the cloud storage system developed by NHN. The following table summarizes a comparison between Spock Proxy and CUBRID SHARD.

Table 1: Comparison between Spock Proxy and CUBRID SHARD.
- Sharding rule store: DBMS table (Spock Proxy) vs. configuration file (CUBRID SHARD).
- How to create Shard keys: modulo (Spock Proxy) vs. modulo, or the developer's own sharding strategy provided as a library (CUBRID SHARD).
- How to find a Shard key: SQL parsing (Spock Proxy) vs. a HINT in the SQL (CUBRID SHARD).
- Strengths: no need to change SQL (Spock Proxy) vs. support for CUBRID, MySQL, and Oracle, and higher performance (CUBRID SHARD).
- Weaknesses: lower performance due to extra SQL parsing and support for MySQL only (Spock Proxy) vs. the need to change SQL queries to insert the sharding HINT (CUBRID SHARD).

Spock Proxy stores Sharding rules in tables in a MySQL database (called 'universal_db'). The SQL received from an application server is parsed and checked for shard keys. If a shard key is provided, the target MySQL instance is identified according to the rules recorded in universal_db, and the SQL is relayed to that MySQL instance. When using this method, you do not need to describe shard key information in the SQL itself, unlike in CUBRID SHARD.
Therefore, if you did not use Sharding before, but recently had to adopt it because of data growth, you may use Spock Proxy, which will work without requiring you to change the SQL in your application. Note that this is limited to cases where there is no need to change the schema for Sharding, or when Sharding can be applied without changing the SQL that you use. However, the method used by Spock Proxy of finding the Shard after parsing the SQL has weaknesses in terms of performance. It leads to unnecessary work, as the SQL is parsed twice: once by Spock Proxy to determine the MySQL instance, then by MySQL itself. In CUBRID SHARD, a HINT is used, which avoids parsing the SQL twice. Suppose there is a table as follows:

student (student_no, name, age, ...)

To use the student_no column data as the shard key, an application server sends the following prepared SQL to CUBRID SHARD.

SELECT student_no, name, age FROM student WHERE student_no = /*+ SHARD_KEY */ ?

If the /*+ SHARD_KEY */ HINT appears in the SQL, CUBRID SHARD treats the bound value marked by the HINT as the Shard key. It reads the student_no value that follows the hint, identifies the target RDBMS instance based on the configuration file, and transmits the query to it. As such, the benefit of using the HINT is that you can improve processing efficiency by avoiding parsing the SQL twice, and support various RDBMSs without violating database location abstraction. CUBRID SHARD provides various HINTs in addition to /*+ SHARD_KEY */. Typically, there is /*+ SHARD_ID(__id__) */, a HINT that allows you to address a specific shard. The /*+ SHARD_VAL(__shard_key_val__) */ HINT can also address a specific shard for tables which have no shard key column: like SHARD_ID it targets such tables, but instead of choosing a shard directly it supplies a shard key value, and the middleware selects the shard according to its internal rules. Unlike Spock Proxy, where Sharding rules are inserted into a database table, in CUBRID SHARD the Sharding rules are specified in a configuration file. If there are several access addresses for the actual RDBMS storage, you specify the DB addresses to CUBRID SHARD as follows.

0 shardDB shardNODE1:3306
1 shardDB shardNODE2:3306
2 shardDB shardNODE3:3306
3 shardDB shardNODE4:3306

If you want to use the modulo method, write the corresponding value in the configuration file as below.

SHARD_KEY_MODULAR = 256

In addition, specify [MIN..MAX] ranges over the values generated by SHARD_KEY_MODULAR, and describe which shard the corresponding query should be sent to.

#min  max  shard_id
0     63   0
64    127  1
128   191  2
192   255  3

If you need a more subtle method than modulo, you can write program code of your own that calculates the Shard ID from the Shard key.

SHARD_KEY_LIBRARY_NAME = libshardkeyid.so
SHARD_FUNCTION_NAME = user_get_shard_key

As shown in the example above, you can register your own library which implements the Shard ID calculation logic. The common weakness of both Spock Proxy and CUBRID SHARD is that they require additional network I/O time for the extra hop, because they are implemented as a middle tier.
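From the application's point of view, a query carrying the SHARD_KEY HINT is prepared and executed like any other JDBC statement. The following is a minimal sketch under the assumption that the application connects to the CUBRID SHARD broker just as it would to a regular CUBRID broker; the connection URL, account, and sample student_no are placeholders:

import java.sql.*;

public class ShardQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
        // Placeholder URL: point it at the CUBRID SHARD broker instead of a plain broker.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:cubrid:localhost:45511:sharddb:::", "shard", "");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT student_no, name, age FROM student "
               + "WHERE student_no = /*+ SHARD_KEY */ ?")) {
            ps.setInt(1, 20120001);   // the bind value marked by the HINT picks the shard
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}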
The following figure displays the general process of internal execution in CUBRID SHARD when a developer executes a query. When a developer executes a query, it is analyzed by the DB shard middleware, which determines to which shard it will be sent. Then the query is transmitted to the selected shard, and finally the middleware delivers the response to the client.

Figure 2: CUBRID SHARD process.

Gizzard

Gizzard is a Sharding platform developed by Twitter. It is also a middle tier, just like Spock Proxy and CUBRID SHARD. However, its usage and architecture are quite different from Spock Proxy or CUBRID SHARD. The following figure shows how data can be sharded across several databases by deploying two Gizzard instances.

Figure 3: Gizzard deployment diagram.

In this case, the interface used by the application server is Thrift, an RPC protocol, not JDBC. In this respect it is similar to the DBGW platform developed by NHN. When the schema of a storage changes, Gizzard may also need modifications. While this can be a constraint, it also opens up access not only to RDBMSs, but also to various other storages (e.g. Lucene). Gizzard is written in Scala, and you can extend its functions by adding Scala code as needed. Instead of viewing it as a complete product, we recommend that you download the source and modify it to suit your needs. The biggest advantage of Gizzard is that you can handle hotspot response and database migration at the middle tier platform level. For example, Gizzard provides a direct replication feature, as shown in the following figure.

Figure 4: The replication feature of Gizzard.

Spock Proxy and CUBRID SHARD do not offer this feature, as they do not need it: CUBRID, MySQL, and Oracle have their own replication features, so the middle tier does not need to provide one. The replication feature provided by Gizzard is very simple (each request is sent to several replicas) and cannot handle the variety of situations that may occur while replication is running. However, it can be useful when using a database without its own replication feature.

Figure 5: The migration feature of Gizzard.

As shown in the figure above, Gizzard offers a migration feature that does not require a service halt. When migrating a database, it first configures replication so that new data is written to both locations, and then copies over the data that existed before replication was configured. This approach, of course, cannot be used on an RDBMS because of consistency issues.

Limitations of Sharding

Sharding, on the other hand, also has its own limitations. Typical constraints of Sharding are as follows:

- JOIN operations cannot be performed across two or more Shards.
- auto increment (serial) values can vary from Shard to Shard.
- the last_insert_id() value is no longer valid.
- the shard key column value must not be updated; changing it requires a delete followed by an insert.
- one transaction cannot access two or more Shards.

When using Sharding, therefore, it is important that you do the data modeling and schema design properly to avoid running into the above constraints. To learn more about Sharding and other features of CUBRID, I suggest you see the Slideshare presentation at http://www.slideshare.net/cubrid/growing-in-the-wild-the-story-by-cubrid-database-developers.

By Jeon Won Hee, Senior Software Engineer at CUBRID DBMS Development Lab, NHN Corporation.