:

What is REDO?

In this article I hope the summarise in simple terms what the REDO logs are for in your database, how to manage them and why they are so important. There are a lot of topics to cover when discussing REDO logs, but the aim of this article is not to comprehensively detail every nuance of the REDO. My aim is to provide you with enough information to know why you have them and allow you to work safely with them, from understanding whether you should multiplex your online redo logs to the Oracle wait events concerned with REDO generation.

So, Tell me What REDO Is…Sure, let’s keep it simple and brief with a list of what they are and do:

  • Keep a record of every transaction which happens against the database
  • You must have at least 2 REDO logs for each instance (many instances may access 1 database, which is how RAC works)
  • Critical for instance recovery if the instance should fail

NOTE: A REDO thread for a single instance database is always 1. The only time you have different threads of REDO is when you have a RAC configuration in which case you have a set of REDO log files per instance. This helps to avoid a bottleneck of each instance having to write to the same set of REDO logs.

REDO logs contain REDO records which describe a change to a database block (actually it contains the change to the UNDO, the database block and the transaction table of the undo segments) Using the REDO you can reconstruct all changes made to the database, including the UNDO segments REDO records which are also written to an area of memory called the LOG BUFFER, before being written out to disk by the Log Writer (LGWR) process. When a transaction COMMITs, the LGWR process writes out the contents of the LOG BUFFER to the active online REDO log file and only returns control back to the user once this is complete. Uncommitted changes may make it through to the online REDO log on disk when another transaction commits, but these will be rolled back if the instance crashes before the transaction has actually been committed.

How Does Oracle Write to the REDO Logs?

The LGWR process writes out REDO records sequentially, so one after another, into the online REDO logs. Once the active online REDO log is filled, the database moves to the next one, then the next one, etc, until it reaches the last one then starts back at the beginning, circling around them continually. Online REDO logs are reusable once all changes in that logfile have been archived (when in Archivelog mode) and all changes have been written to the datafiles. If archiving is disabled then each REDO log is available for reuse only after all changes in the REDO log have been written to the datafiles. The LGWR only writes to one REDO log file at a time. There are three types of REDO:

  • CURRENT – REDO log which is being written to by the LGWR
  • ACTIVE – Required for instance recovery
  • INACTIVE – no longer required for instance recovery

REDO Log Switching and Log Sequence Numbers

A log switch is the name given to the time when the instance stops writing to one REDO log and starts writing to another. This normally happens when one REDO log is full, but can also be user initiated or on a time interval. The parameter ARCHIVE_LAG_TARGETcan be used to set a maximum time between log switches.

Every new REDO log written to by the database is given a unique sequence number, which is also retained when it becomes an archived REDO log. The database can use this ordering to ensure it recovers the database from the logs in the correct order.

NOTE: I have started calling them online REDO logs because there are also archived REDO logs which are just copies of the REDO logs which allow you to keep all of the transactions against your database so that you can perform point-in-time recovery.

Multiplexing REDO logs

This area is one of massive debate. If you Google it and have a read through the forums you will see that there are two main schools of thought on this one; those who do multiplex and those who don’t. Here are the pros and cons of each approach:

Multiplex REDO

Pros

  • Safe-guards against loss of one member of a group of log files

Cons

  • Costly performance hit as Log Writer (LGWR) has to write to two places simultaneously
  • Multiple files to manage

Non-Multiplexed REDO

Pros

  • Best performance possible

Cons

  • Doesn’t protect you against loss of a log file

So, the pros and cons are pretty simple but there is one factor which I haven’t mentioned above and that is RAID groups. So, the people who say there is no need to multiplex your online REDO logs will argue that using SAN technology to write your data across multiple disks and have redundancy (the most common, high performance solution being a RAID10 or RAID1+0) negates the need for multiplexing the REDO at the database level as well. Someone who is for multiplexing the REDO will say that although the SAN can help protect against this and make it very improbable that you will lose an online REDO log, there is still a chance. Also, the SAN can’t protect against someone accidentally or maliciously deleting the REDO log file from disk.

The short answer is it depends on what risk you are prepared to take for a higher performing database. Some see the risk as very small and don’t multiplex their REDO. Others see that as too much of a risk and go in favour of multiplexing them. It’s up to you to decide which approach you want to take. As always, it’s a trade-off between performance and belt and braces safety.

Oracle Wait Events

There are two main Oracle wait events which are concerned with the generation of REDO; Log file sync and Log file parallel write. Let’s take a closer look at each one in turn and find out more information.

Log File Sync

First of all I should recommend a book that I find very helpful when it comes to Oracle wait events. The Oracle Wait Interface book here is really good. Although it was written in 2004, the underlying concepts rarely change in the database so the information is not dated. Sure, there are some aspects which have changed in recent version but the underlying principles remain the same. Anyway, moving on.

A quick recap: Every transaction which occurs in the database is written to the log buffer and then to the online REDO logs. This is one of the fundamentals of how the database works to keep a track of everything which has happened in the database. It allows us to us tools like Log miner (to view every transaction in the log file) and technologies such as Data Guard and Change Data Capture. It is also critical for the recovery of the database.

If you see the log file sync wait event it is because the database is writing out data from the log buffer to the online redo logs and the user session is having to wait for that to complete. Why is it waiting? There could be many reasons, but here are the main ones:

  • There are a high number of commits or rollbacks happening on the system
  • The drives on which the REDO log files sit are performing poorly and are slow to respond

To address the first point, you can see if it is the code issuing a high frequency of commits/rollbacks. You need to try and identify which session is causing them and try to reduce this. Developers should commit in logical places to commit, so as to conform with the business requirements. The code should not just commit after every row for the sake of it, there has to be a reason. As I mentioned before, log miner can be used to look at the contents of the REDO logs and this is one way you can check to see the commit behaviour of the application.

With regard to the second point, you should check through the AWR report and see what the response times are. If they are above about 4-5 milliseconds(ms) it might be worth looking at the layout of the disks and if the contention can be reduced. Anything below that number and your response times are pretty good. If you are running your database on a SAN, the write cache should be absorbing all of the writes so the response time should be low. I know that on our system we have a 1ms response time for the online REDO log drive.

Log File Parallel Write

It is only the Log writer process (LGWR) which waits on the log file parallel write wait event. You’ll usually find that both the log file sync and log file parallel write wait events are high at the same time, as they are very much related to one another. I would describe it as the background wait event is this one and the foreground wait event (which a user waits on) is the log file sync. Because the two are so closely related, the actions you need to take are the same as above.

It is also worth double checking that asynchronous I/O has been enabled so that the writes to the online redo logs are faster.

I hope that this article about REDO has helped clarify what is happening inside the database and given you some food for thought in terms of multiplexing the REDO logs and what options you have available to use.

Like it, share it...

Category: Articles


Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *