Web cache acceleration based on reverse proxy: designing a cacheable CMS

Author: Che Dong Email: chedongATbigfoot.com / chedongATchedong.com

Written on: 2003/05 Last updated: 02/22/2006 14:42:55

Copyright notice: You may reprint this article freely; please cite the original source, the author information, and this notice with a hyperlink. http://www.chedong.com/tech/cache.html

Keywords: cache squid mod_proxy mod_cache "reverse proxy" cache acceleration

Summary: For a website with millions of visits per day, speed becomes the bottleneck. Besides optimizing the content publishing application itself, converting the output of dynamic pages that do not need real-time updates into static pages brings a significant speedup, because a dynamic page is typically 2-10 times slower than a static one. If the static content can also be cached in memory, access can be 2-3 orders of magnitude faster than the original dynamic page.

Contents:

Comparison of static cache and dynamic cache
Site planning for reverse proxy acceleration
Reverse proxy cache acceleration with Apache mod_proxy
Reverse proxy acceleration with Squid
Cache-oriented page design
Cache compatibility design for applications: HTTP_HOST / SERVER_NAME and REMOTE_ADDR / REMOTE_HOST need to be replaced by HTTP_X_FORWARDED_HOST / HTTP_X_FORWARDED_SERVER

If the pages output by the back-end content management system follow a cacheable design, performance problems can be handed over to the front-end cache server, which greatly reduces the complexity of the CMS itself.

Comparison of static cache and dynamic cache

Static pages can be cached in two ways. The main difference is whether the CMS itself is responsible for updating the cache of related content.

Static cache: the static page is generated at the moment the new content is published. For example, on March 22, 2003, as soon as the administrator enters an article through the back-end content management interface, the system generates the static page http://www.chedong.com/tech/2003/03/22/001.html and updates the links on the related index pages at the same time.

Dynamic cache: after new content is published, no static page is generated in advance. When a request for the content arrives and the front-end cache server finds no cached copy, it sends the request to the back-end system, which generates the static page for that content. The first visit to a page may be a little slower, but later visits are served directly from the cache. If you visit ZDNet and other foreign websites, you will find that the Vignette content management system they use produces page names such as 0,22342566,300458.html. Here 0,22342566,300458 is a set of parameters separated by commas: when the page is not found on first access, the server in effect runs a query like doc_type=0&doc_id=22342566&doc_template=300458, and the query result is stored as the cached static page 0,22342566,300458.html.
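The mapping between a comma-separated page name and the back-end query can be sketched as follows. This is only an illustration in Python, not Vignette's actual implementation; the parameter names are the ones used in the example above, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Parameter names taken from the example in the text; the real system may differ.
PARAM_NAMES = ("doc_type", "doc_id", "doc_template")


def page_name_to_query(page_name: str) -> str:
    """Turn '0,22342566,300458.html' into a back-end query string."""
    stem, _, ext = page_name.rpartition(".")
    if ext != "html" or not stem:
        raise ValueError(f"not a cacheable page name: {page_name!r}")
    values = stem.split(",")
    if len(values) != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} parameters: {page_name!r}")
    return urlencode(list(zip(PARAM_NAMES, values)))


def query_to_page_name(params: dict) -> str:
    """Build the static file name under which the query result is cached."""
    return ",".join(str(params[name]) for name in PARAM_NAMES) + ".html"
```

Because the file name carries every query parameter, the front-end cache can treat the page as an ordinary static file, while the back end can still rebuild it on a cache miss.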

Disadvantages of static cache:

Complex update triggers: both mechanisms work very well when the content management system is relatively simple. For a website with more complex relationships, however, the logical references between pages become a very complicated problem. The most typical example is a news item that must appear on the news home page and in three related news topics at the same time. In static cache mode, every new article requires the system to regenerate, through triggers, several related static pages in addition to the page for the article itself. This trigger logic often becomes one of the most complex parts of the content management system.

Batch updates of old content: content published through static caching is hard to modify once the static pages exist, so when users visit old pages, a new template does not take effect at all.

In dynamic cache mode, each dynamic page only needs to take care of itself, and the related pages are updated automatically, which greatly reduces the need to design update triggers for related pages.

I used a similar approach in small applications before: after the first access, the application saves the database query result to a local file, and on the next request it first checks whether a cache file exists in the local cache directory, which reduces access to the back-end database. Although this can carry a fairly large load, a system that combines content management and cache management in this way is hard to separate later, data integrity is hard to maintain, and whenever content is updated the application has to delete the corresponding cache file. When there are many cache files, such a design also needs to spread them over several directories; otherwise, once a single directory holds more than about 3000 files, even rm * fails.
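One common way to spread cache files over directories is to derive subdirectory names from a hash of the cache key. The sketch below is an assumption about how such a layout could look, not the author's original code; the function names, the two-level layout, and the .cache suffix are all hypothetical.

```python
import hashlib
from pathlib import Path


def cache_path(cache_root: str, key: str, levels: int = 2) -> Path:
    """Map a cache key to root/ab/cd/abcd....cache so no directory grows too large."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Use the first hex pairs of the digest as nested directory names.
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return Path(cache_root, *parts, digest + ".cache")


def store(cache_root: str, key: str, body: bytes) -> Path:
    """Write a query result to its cache file, creating directories as needed."""
    path = cache_path(cache_root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path


def invalidate(cache_root: str, key: str) -> bool:
    """Delete the cache file after a content update; return False if none existed."""
    try:
        cache_path(cache_root, key).unlink()
        return True
    except FileNotFoundError:
        return False
```

With two levels of two hex digits there are 65536 leaf directories, so even millions of cached files stay far below a few thousand entries per directory.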

At this point the system needs a further division of labor: the complex content management system is split into two relatively simple systems, content input and caching.

Back end: the content management system, which concentrates on publishing content well, for example complex workflow management and complex template rules.

Front end: page cache management, which can be implemented with a cache system.

 ______________________        ___________________
| Squid software cache |      | F5 hardware cache |
 ----------------------        -------------------
             \     ________________     /
                  | ASP | JSP | PHP |
                Content Manage System
                   ----------------

After this division of labor, both content management and cache management offer a wide range of choices: software (for example, Squid on front-end port 80 caching the content publishing system on back-end port 8080), cache hardware, or even handing the job over to a professional service provider such as Akamai.

Cache-oriented site planning

An HTTP acceleration scheme that uses Squid to accelerate multiple sites:

The original site layout might look like this:

www.chedong.com
news.chedong.com
bbs.chedong.com
images.chedong.com

In a cache-oriented design, external DNS points all sites to the same IP, that of the cache servers (two servers are used for redundancy).

                           ________________     __________
www.chedong.com request  \ |  cache box    |   |          | / www.chedong.com
news.chedong.com request - |               | - | firewall | - news.chedong.com
bbs.chedong.com request  / |  /etc/hosts   |   |   box    | \ bbs.chedong.com
                           ----------------     ----------

How it works: when an external request arrives, the cache resolves where to forward it according to its configuration file. In this way, the request is forwarded to the internal address we specify.

For forwarding to multiple virtual hosts, mod_proxy is simpler than Squid: it can forward different services to different ports on multiple back-end IPs.

Squid can only do this by disabling its DNS resolution and then forwarding each request, by its domain name, to the address listed in the local /etc/hosts file. All back-end servers must use the same port.
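The hosts-file style of forwarding described above can be modeled in a few lines. This is a simplified Python illustration of the lookup, not Squid's implementation; the 192.168.0.x addresses in the usage example and the function names are assumptions.

```python
from typing import Dict, Optional, Tuple


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse /etc/hosts-style text into a hostname -> IP table."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name.lower()] = ip
    return table


def route(host_header: str, hosts: Dict[str, str],
          backend_port: int = 80) -> Optional[Tuple[str, int]]:
    """Pick the internal address for a request from its Host header.

    Every back end shares one port, matching the Squid restriction above.
    """
    hostname = host_header.split(":", 1)[0].strip().lower()
    ip = hosts.get(hostname)
    return (ip, backend_port) if ip is not None else None
```

For example, with a hosts file that maps www.chedong.com to the hypothetical internal address 192.168.0.4, route("www.chedong.com:80", hosts) yields that address with port 80, and an unknown host name yields None.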

Reverse proxy acceleration brings not only better performance but also extra security and configuration flexibility:

More flexible configuration: you control the name resolution of the back-end servers on your own internal server. When services need to be moved between servers, there is no need to change the external DNS configuration; adjusting the internal name resolution is enough.

Better data security: all back-end servers can easily be protected behind the firewall.

Simpler back-end application design: previously, for efficiency, a dedicated image server such as images.chedong.com often had to be separated from a heavily loaded application server such as bbs.chedong.com. In reverse proxy mode, all front-end requests pass through the cache server and are in effect served as static pages, so the application design no longer needs to separate images from the application itself. This also greatly reduces the design complexity of the back-end content publishing system, and because data and applications are stored together, the file system is easier to maintain and manage.

Reverse proxy cache acceleration with Apache mod_proxy

Apache includes the mod_proxy module, which can be used to implement a proxy server and to accelerate a back-end server as a reverse proxy.

When compiling Apache 1.3.x, use:

--enable-shared=max --enable-module=most

Note: in Apache 2.x, mod_proxy has been split into mod_proxy and mod_cache, and mod_cache has separate file-based and memory-based implementations.

Create /var/www/proxy and make it writable by the user that the Apache service runs as.

mod_proxy configuration example: reverse proxy plus cache

Set up www.example.com at the front end as a reverse proxy for the service on port 8080 of www.backend.com.

Edit httpd.conf:

ServerName www.example.com
ServerAdmin admin@example.com

# reverse proxy setting
ProxyPass / http://www.backend.com:8080/
ProxyPassReverse / http://www.backend.com:8080/

# cache dir root
CacheRoot "/var/www/proxy"
# max cache storage
CacheSize 50000000
# garbage collection interval in hours: every 4 hours
CacheGcInterval 4
# max page expire time in hours
CacheMaxExpire 240
# expire time = (now - last_modified) * CacheLastModifiedFactor
CacheLastModifiedFactor 0.1
# default expire time in hours
CacheDefaultExpire 1
# force completion after this percentage of content is retrieved: 60-90%
CacheForceCompletion 80

CustomLog /usr/local/apache/logs/dev_access_log combined
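The expiry formula in the configuration comments can be worked through with a small calculation. The Python sketch below is a simplified model of how the three settings interact, assuming the values from the example configuration; it is not Apache's code, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def estimate_expiry(now: datetime,
                    last_modified: Optional[datetime],
                    factor: float = 0.1,
                    max_expire_hours: int = 240,
                    default_expire_hours: int = 1) -> datetime:
    """Estimate when a page without an Expires header leaves the cache."""
    if last_modified is None:
        # No Last-Modified either: fall back to the default expiry period.
        period = timedelta(hours=default_expire_hours)
    else:
        # expire period = (now - last_modified) * CacheLastModifiedFactor
        period = (now - last_modified) * factor
    # The period never exceeds CacheMaxExpire.
    period = min(period, timedelta(hours=max_expire_hours))
    return now + period
```

With a factor of 0.1, a page last modified 10 hours ago is kept for 1 hour, while a page untouched for a year is capped at the 240-hour maximum.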

Reverse proxy acceleration with Squid

Squid is a more specialized proxy server, and its performance and efficiency are much higher than those of Apache's mod_proxy.

If you need logs in the combined format, a patch is required:


Compiling Squid:

./configure --enable-useragent-log --enable-referer-log --enable-default-err-language=Simplify_Chinese \
    --enable-err-languages="Simplify_Chinese English" --disable-internal-dns
make
make install
cd /usr/local/squid
mkdir cache
chown squid.squid *
vi /usr/local/squid/etc/squid.conf

In /etc/hosts, add the internal name resolution entries for the accelerated sites, such as www.chedong.com, news.chedong.com and bbs.chedong.com.

-------------------- cut here --------------------

# visible name
visible_hostname cache.example.com

# cache config: disk space use 1G and memory use 256M
cache_dir ufs /usr/local/squid/cache 1024 16 256
cache_mem 256 MB
cache_effective_user squid
cache_effective_group squid







# accelerate my domains only
acl acceleratedHostA dstdomain .example1.com
acl acceleratedHostB dstdomain .example2.com
acl acceleratedHostC dstdomain .example3.com
# accelerate HTTP protocol on port 80
acl acceleratedProtocol proto HTTP
acl acceleratedPort port 80
# access acl

# Allow requests when they are to the accelerated machine AND to the
# right port with the right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHostA
http_access allow acceleratedProtocol acceleratedPort acceleratedHostB
http_access allow acceleratedProtocol acceleratedPort acceleratedHostC

# logging
emulate_httpd_log on
cache_store_log none

acl manager proto cache_object

cachemgr_passwd pass all

-------------------- cut here --------------------

Create the cache directories:

/usr/local/squid/sbin/squid -z

Start Squid:

/usr/local/squid/sbin/squid

Stop Squid:

/usr/local/squid/sbin/squid -k shutdown

Apply a new configuration:

/usr/local/squid/sbin/squid -k reconfigure

Rotate the logs once a day through crontab:

0 0 * * * (/usr/local/squid/sbin/squid -k rotate)

Designing cacheable dynamic pages

What kind of page can a cache server cache well? A page whose HTTP response headers contain "Last-Modified" and "Expires" declarations, such as:

Last-Modified: Wed, 14 May 2003 13:06:17 GMT
Expires: Mon, 16 Jun 2003 13:06:17 GMT

The front-end cache server then stores the generated page locally, on disk or in memory, until the page expires.
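The decision a cache makes from these headers can be sketched as a simple freshness check. This Python illustration is a deliberately reduced model (it looks only at Expires and ignores Cache-Control and revalidation); the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, now: datetime) -> bool:
    """Return True while a stored response may still be served from cache."""
    expires = headers.get("Expires")
    if not expires:
        return False  # no expiry information: do not serve from cache
    try:
        expires_at = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return False  # unparseable date: treat as already expired
    if expires_at.tzinfo is None:
        # Dates without a usable zone are treated as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

Here now is expected to be a timezone-aware datetime, for example datetime.now(timezone.utc). A page that sends no Expires header is never considered fresh in this model, which is why dynamic pages need to set the header explicitly.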

Therefore, a cacheable page must meet these requirements:

The page must carry a Last-Modified header. Purely static pages normally have Last-Modified information already; dynamic pages have to add it explicitly, for example in PHP:

// always modified now
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

The page must set its expiration time with an Expires or Cache-Control: max-age header. For static pages, Apache's mod_expires can set the cache period according to the MIME type of the page, for example one month for images and two days for HTML pages:

ExpiresActive on
ExpiresByType image/gif "access plus 1 month"
ExpiresByType text/css "now plus 2 day"
ExpiresDefault "now plus 1 day"

For dynamic pages, the expiration time can be written directly into the HTTP response headers. The news home page index.php might expire after 20 minutes, for instance, while an individual news page might expire after 1 day. For example, in PHP, to expire one month later:

// Expires one month later
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600 * 24 * 30) . " GMT");

If the server uses HTTP-based authentication, the response must carry a Cache-Control: public header so that the front-end cache is allowed to store the page.

Cache modifications for ASP applications

First, add the following common functions to a shared include file (such as include.asp):

<%

' Set Expires header in minutes
Function SetExpiresHeader(ByVal minutes)
    ' Set page Last-Modified header:
    ' converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
    Response.AddHeader "Last-Modified", DateToHTTPDate(Now())
    ' The page expires in minutes
    Response.Expires = minutes
    ' Set cache control to external applications
    Response.CacheControl = "public"
End Function

' Converts date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
Function DateToHTTPDate(ByVal OleDATE)
    ' Offset of the server's local time from GMT (here UTC+8)
    Const GMTdiff = #08:00:00#
    OleDATE = OleDATE - GMTdiff
    DateToHTTPDate = engWeekDayName(OleDATE) & _
        ", " & Right("0" & Day(OleDATE), 2) & " " & engMonthName(OleDATE) & _
        " " & Year(OleDATE) & " " & Right("0" & Hour(OleDATE), 2) & _
        ":" & Right("0" & Minute(OleDATE), 2) & ":" & Right("0" & Second(OleDATE), 2) & " GMT"
End Function

' English weekday abbreviation, independent of the server locale
Function engWeekDayName(dt)
    Dim Out
    Select Case WeekDay(dt, 1)
        Case 1: Out = "Sun"
        Case 2: Out = "Mon"
        Case 3: Out = "Tue"
        Case 4: Out = "Wed"
        Case 5: Out = "Thu"
        Case 6: Out = "Fri"
        Case 7: Out = "Sat"
    End Select
    engWeekDayName = Out
End Function

' English month abbreviation, independent of the server locale
Function engMonthName(dt)
    Dim Out
    Select Case Month(dt)
        Case 1: Out = "Jan"
        Case 2: Out = "Feb"
        Case 3: Out = "Mar"
        Case 4: Out = "Apr"
        Case 5: Out = "May"
        Case 6: Out = "Jun"
        Case 7: Out = "Jul"
        Case 8: Out = "Aug"
        Case 9: Out = "Sep"
        Case 10: Out = "Oct"
        Case 11: Out = "Nov"
        Case 12: Out = "Dec"
    End Select
    engMonthName = Out
End Function
%>

Then, at the top of each specific page, for example index.asp and news.asp, include the shared file and set the HTTP headers:

<%
' The page expires after 20 minutes
SetExpiresHeader(20)
%>

Cache compatibility design for applications

Once a proxy sits between the client and the service, the server can no longer see the client's IP directly, and the server-side application cannot reply to the client simply by using the address the forwarded request came from. However, the proxy adds HTTP_X_FORWARDED_* information to the HTTP headers of the forwarded request, which can be used to track the original client IP address and the server address the client originally requested:
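The fallback logic can be sketched independently of ASP or PHP. The Python illustration below assumes a CGI/WSGI-style environment dictionary using the variable names mentioned in this article; the function names are hypothetical.

```python
def client_ip(environ: dict) -> str:
    """Prefer the address reported by the proxy, else the direct peer address."""
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # The header may hold a comma-separated chain; the first entry is the client.
    for candidate in forwarded.split(","):
        candidate = candidate.strip()
        if candidate:
            return candidate
    return environ.get("REMOTE_ADDR", "")


def host_name(environ: dict) -> str:
    """Prefer the host name the client originally requested, else HTTP_HOST."""
    forwarded = environ.get("HTTP_X_FORWARDED_HOST", "").strip()
    return forwarded or environ.get("HTTP_HOST", "")
```

Since clients can send these headers themselves, this is only reasonable when the application is reachable solely through your own cache server.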

Below are 2 examples that illustrate the design principle of cache-compatible applications:

' For an ASP application that needs the server name: do not reference HTTP_HOST / SERVER_NAME directly; check whether HTTP_X_FORWARDED_HOST is present
function getHostName()
    dim hostName as String = ""
    hostName = Request.ServerVariables("HTTP_HOST")
    if not isDBNull(Request.ServerVariables("HTTP_X_FORWARDED_HOST")) then
        if len(trim(Request.ServerVariables("HTTP_X_FORWARDED_HOST"))) > 0 then
            hostName = Request.ServerVariables("HTTP_X_FORWARDED_HOST")
        end if
    end if
    return hostName
end function

// For a PHP application that needs to record the client IP: do not reference REMOTE_ADDR directly; use HTTP_X_FORWARDED_FOR instead
function getUserIP() {
    $user_ip =

