Skip to content

๐Ÿ’˜ Database Performance Improvement Story(EN)

๊น€๋ฏผ์ข… edited this page Oct 28, 2024 · 2 revisions

Noticeable Issues After Inserting 100,000 Records in the Database

Pair Room Creation

  • There was a logic that performed a full scan of all pair rooms to check for duplicate access codes during pair room creation.
  • This issue was not noticeable with a small amount of data, but as the data grew, it put a strain on the server, resulting in long response times for creation requests.

Link Retrieval

  • Similarly, during link retrieval, the logic involved fetching all links into memory and then filtering for pair rooms.
  • Just like with pair room creation, this became problematic as the amount of dummy data increased, leading to server strain.

image

  • A bottleneck occurred during the execution of findAll(), taking over 5 minutes.

Member Logic

  • An index was created on PROVIDER_USER_ID to optimize queries.

image

Next Steps

  • Due to the issues mentioned above, we will check and resolve the database bottlenecks across all tables.

Common Test Environment

  • Loaded Data: 5 million records
  • Measurement Target: Query Time (sec)

Pair Room Service

Query List

  • Create Pair Room (pair_room)

    INSERT INTO pair_room (access_code, created_at, driver, navigator, status, updated_at, id) 
    VALUES (?, ?, ?, ?, ?, ?, default);
  • Replace Pair and Change Pair Room Status (pair_room)

    UPDATE pair_room
    SET
      access_code=?,
      driver=?,
      navigator=?,
      status=?,
      updated_at=?
    WHERE
       id=?
  • Retrieve Pair Room Data by Specific Access Code (pair_room)

    SELECT
        pair_room.id,
        pair_room.access_code,
        pair_room.created_at,
        pair_room.driver,
        pair_room.navigator,
        pair_room.status,
        pair_room.updated_at
     FROM
        pair_room
     WHERE
        pair_room.access_code = ?
  • Check Existence of Pair Room with Specific Access Code (pair_room)

    SELECT 
        pair_room.id
    FROM
        pair_room
    WHERE
        pair_room.access_code= ?
    LIMIT ?
  • Retrieve Pair Room-Member Reference Table Data by Specific Member ID (pair_room_member)

    SELECT
      pair_room_member.id,
      pair_room_member.member_id,
      pair_room_member.pair_room_id
    FROM	
      pair_room_member
    WHERE
      pair_room_member.member_id = ?

Index Configuration and Performance Testing

The columns used in the query conditions for the pair room service are as follows:

  • pair_room
    • access_code
  • pair_room_member
    • member_id

Since both columns have FK & UNIQUE constraints, it was determined that separate index configurations were unnecessary. In fact, executing the retrieval queries confirms that the data is retrieved at a very fast speed.

  • Performance Test for Retrieving Pair Room Data by Specific Access Code (pair_room)

    Pair Room Data Retrieval Test

    • Query Time: 0.01 sec

    With the Unique Index already set, the retrieval performance is excellent.

  • Performance Test for Retrieving Pair Room-Member Reference Table Data (pair_room_member)

    Pair Room-Member Reference Data Retrieval Test

    • Query Time: 0.00 sec

    With the Index already set, the retrieval performance is excellent.

Conclusion

It was decided not to set additional indexes for the tables related to the pair room service.


Member Service

Query List

  • Create Member (member)

    SELECT
        m1_0.id,
        m1_0.access_token,
        m1_0.created_at,
        m1_0.provider_login_id,
        m1_0.profile_image,
        m1_0.updated_at,
        m1_0.provider_user_id,
        m1_0.user_name 
    FROM
        member m1_0 
    WHERE
        m1_0.provider_user_id=?
  • Retrieve Specific Member Information by Provider User ID (member)

    SELECT
        m1_0.id,
        m1_0.access_token,
        m1_0.created_at,
        m1_0.provider_login_id,
        m1_0.profile_image,
        m1_0.updated_at,
        m1_0.provider_user_id,
        m1_0.user_name 
    FROM
        member m1_0 
    WHERE
        m1_0.provider_user_id=?
  • Check Existence of Specific Member by Provider User ID (member)

    SELECT
      m1_0.id 
    FROM
      member m1_0 
    WHERE
      m1_0.provider_user_id=? 
    LIMIT ?
  • Check Existence of Specific Member by User ID (member)

    SELECT
      m1_0.id 
    FROM
      member m1_0 
    WHERE
      m1_0.user_id=? 
    LIMIT ?
  • Check Existence of Non-Deleted Member Information by User ID (member)

    SELECT
      m1_0.id 
    FROM
      member m1_0 
    WHERE
      m1_0.provider_user_id=? AND deleted_at IS NULL
    LIMIT ?

Index Configuration and Performance Testing

The columns used in the query conditions for the member service are as follows:

  • member
    • user_id
    • provider_user_id

Before and after configuring the index for the columns used in the conditions, the query times were measured by executing the same query.

  • The two queries used in the member service were executed almost identically, so only one query was tested.

  • The column using user_id was excluded because it already has a unique constraint.

  • Performance Test for Retrieving Specific Member Information by Provider User ID (member)

Before Index Configuration

![Before Index Configuration](https://github.com/user-attachments/assets/50c45547-d192-4139-831f-88d0218f9155)

- Time Taken: `1.45 sec`

After Index Configuration

![After Index Configuration](https://github.com/user-attachments/assets/6ccf25f0-37c6-4a95-adac-ad6cf650806d)

- Time Taken: `0.01 sec`

Conclusion

The query time was significantly reduced from 1.45 sec to 0.01 sec.

Overall Conclusion

It was decided to configure an index for the provider_user_id column in the member table.

  • Performance improvement was observed when the index was applied.
  • Since the values in that column are not frequently changed, it is not necessary to be overly concerned about performance issues related to the index.

Reference Link Service

Query List

  • Retrieve all reference link data based on pair room ID (reference_link)

    SELECT
        rl.id,
        rl.category_id,
        rl.created_at,
        rl.pair_room_id,
        rl.updated_at,
        rl.url
    FROM
        reference_link rl
    WHERE
        rl.pair_room_id=?
  • Delete a single reference link data based on reference link ID (reference_link)

    DELETE
    FROM
      reference_link
    WHERE
      id=?
  • Retrieve all reference link category data based on pair room ID (category)

    SELECT
        c.id,
        c.category_name,
        c.created_at,
        c.pair_room_id,
        c.updated_at
    FROM
        category c
    WHERE
        c.pair_room_id=?
  • Check the existence of a category based on pair room ID and category name (category)

    SELECT
        c.id
    FROM
        category c
    WHERE
        c.category_name=?
        AND c.pair_room_id=?
    LIMIT
        ?
  • Check the existence of a category based on category ID and category name (category)

    SELECT
        c.id
    FROM
        category c
    WHERE
        c.id=?
        AND c.pair_room_id=?
    LIMIT ?
  • Delete a single category data based on reference link ID and pair room ID (category)

    DELETE
    FROM
        category
    WHERE
        pair_room_id=?
        AND id=?
  • Retrieve all open graph data based on reference link ID (open_graph)

    SELECT
        og.id,
        og.created_at,
        og.description,
        og.head_title,
        og.image,
        og.open_graph_title,
        og.reference_link_id,
        og.updated_at
    FROM
        open_graph og
    WHERE
        og.reference_link_id=?
  • Delete a single open graph data based on reference link ID (open_graph)

    DELETE 
    FROM 
      open_graph 
    WHERE
      reference_link_id=?

Index Configuration and Performance Testing

The columns used for query conditions in the member service are as follows:

  • reference_link
    • id
    • pair_room_id
  • category
    • pair_room_id
    • pair_room_id & category_name
    • category_id & category_name
    • reference_link_id & pair_room_id
  • open_graph
    • reference_link_id

By default, we considered index configuration for the category_name column, excluding the columns already indexed as PK & FK.

Since category_name is not used alone in conditions but alongside FK values, we thought it might still allow for fast queries without a separate index, so we conducted an experiment.

  • We only tested queries that utilize pair_room_id & category_name due to their similar conditions.

  • Check the existence of a category based on pair room ID and category name (category)

image

  • Time taken for the query: 0.01 sec

We confirmed that the performance was fast when querying together with the already indexed pair_room_id.

Conclusion

In conclusion, we decided not to create separate index settings for the tables related to the reference link service.


To-Do Service

Query List

  • Retrieve all to-do data based on to-do ID (todo)

    SELECT * 
    FROM todo
    WHERE id = ?;
  • Retrieve all to-do data based on pair room ID in ascending order (todo)

    SELECT * 
    FROM todo td
    WHERE 
        td.pair_room_id = ?
    ORDER BY 
        td.sort ASC;
  • Retrieve the largest sort value in ascending order based on pair room ID (todo)

    SELECT *
    FROM 
        todo td
    WHERE 
        td.pair_room_id = ?
    ORDER BY 
        td.sort DESC
    LIMIT 1;

Index Configuration and Performance Testing

The columns used for query conditions in the to-do service are as follows:

  • todo
    • id
    • pair_room_id

Conclusion

Since all columns used in the conditions are already indexed as PK & FK, there is no need to consider additional index configurations.


Timer

Query List

  • Retrieve timer data based on pair room ID (timer)

    SELECT
        te1_0.id,
        te1_0.created_at,
        te1_0.duration,
        te1_0.pair_room_id,
        te1_0.remaining_time,
        te1_0.updated_at 
    FROM
        timer te1_0 
    WHERE
        te1_0.pair_room_id=?
  • Retrieve timer data based on pair room access code (timer, pair_room)

    SELECT
        te1_0.id,
        te1_0.created_at,
        te1_0.duration,
        te1_0.pair_room_id,
        te1_0.remaining_time,
        te1_0.updated_at 
    FROM
        timer te1_0 
        LEFT JOIN
        pair_room pre1_0 
        ON pre1_0.id=te1_0.pair_room_id 
    WHERE
        pre1_0.access_code=?
    SELECT
        pre1_0.id,
        pre1_0.access_code,
        pre1_0.created_at,
        pre1_0.driver,
        pre1_0.navigator,
        pre1_0.status,
        pre1_0.updated_at 
    FROM
        pair_room pre1_0 
    WHERE
        pre1_0.id=?

Index Configuration and Performance Testing

The columns used for query conditions in the timer service are as follows:

  • timer
    • pair_room_id

Conclusion

Since the FK is already indexed by default, we decided not to set an additional index.

๐Ÿ€ ์ฝ”๋”ฉํ•ด๋“€์˜ค

์ „์ฒด

์ •๋ณด

BE

FE

๊ธฐ์ˆ 

์ปจ๋ฒค์…˜

ํ…Œ์ŠคํŠธ

์„ฑ๋Šฅ ๊ฐœ์„  & ์ ‘๊ทผ์„ฑ ๊ฐœ์„  ๋ฆฌํฌํŠธ

์ธํ”„๋ผ

Clone this wiki locally