4 C++ Boost Regular Expressions


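Contents:
Offline documentation:
Stripping tags from an HTML file:

A regex test program:
	Regex metacharacters:
	Anchors:
	Matching several letters followed by several digits
	Marked sub-expressions: whatever is enclosed in a pair of parentheses (); in Boost, () needs no escaping
	?: the group is not marked and cannot be back-referenced
	Repeats [greedy: match as much as possible]:
	? non-greedy repeats [match as little as possible]:
	Possessive repeats: no backtracking, whatever has matched stays matched; useful for performance:
	Back-references: a marked sub-expression must exist
	Alternation:
	Word boundaries:
	Named sub-expressions:
	Comments:
	Branch reset:
	Lookahead:
	Example 1: only th is matched, not ing, but ing must follow
	Example 2: the lookahead checks ing without consuming it, so (in) can still match in
	Example 3: th matches unless it is followed by ing
	Lookbehind:
	Recursive expressions:
	Operator precedence:

Showing the number of sub-expressions
Boost regex: sub-matches
Boost regex: the regex_replace algorithm
Boost regex: iterators
Boost regex: token iterator with -1, i.e. the text that was not matched
Boost regex: captures. Why does the official example crash?
Boost regex: an official example
Boost regex: a simple lexer using regex_search to find C++ class definitions
Boost regex: a simple lexer using iterators to find C++ class definitions
Boost regex: converting a C++ file to HTML
Boost regex: extracting all links from a web page: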

Offline documentation:

boost_1_62_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Stripping tags from an HTML file:

chunli@Linux:~/workspace/Boost$ sed 's/<[\/]\?\([[:alpha:]][[:alnum:]]*[^>]*\)>//g' index.html

A regex test program:

chunli@Linux:~/boost$ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	if (argc != 2) 
	{
		cerr << "Usage: " << argv[0] << " regex-str" << endl;
		return 1;
	}

	boost::regex e(argv[1], boost::regex::icase);
	// mark_count returns the number of marked sub-expressions in the regex, i.e. the parts enclosed in parentheses
	cout << "subexpressions: " << e.mark_count() << endl;

	string line;
	while (getline(cin, line)) 
	{
		boost::match_results<string::const_iterator> m;
		if (boost::regex_search(line, m, e, boost::match_default)) 
		{
			const int n = m.size();
			for (int i = 0; i < n; ++i)
			{
				cout << m[i] << " ";
			}
			cout << endl;
		} 
		else 
		{
			cout << setw(line.size()) << setfill('-') << '-' << right << endl;
		}
	}
}

Regex metacharacters:

.[{}()\*+?|^$

Anchors:

A '^' character shall match the start of a line.

A '$' character shall match the end of a line.

Matching several letters followed by several digits

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out "\w+\d+"

subexpressions: 0

Hello,world2016

world2016

Marked sub-expressions: whatever is enclosed in a pair of parentheses (); in Boost, () needs no escaping

chunli@Linux:~/boost$ g++ main.cpp  -l boost_regex -Wall  && ./a.out "([[:alpha:]]+)[[:digit:]]+\1" 
subexpressions: 1
hello123abc8888888abc
abc8888888abc abc 
\1 refers to $1, the first marked sub-expression.
Only marked sub-expressions can be back-referenced.

?: the group is not marked and cannot be back-referenced

chunli@Linux:~/boost$ g++ main.cpp  -l boost_regex -Wall  && ./a.out '(?:[[:alpha:]]+)[[:digit:]]+'
subexpressions: 0
abcd1234
abcd1234 
11111@@
-------

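Repeats [greedy: match as much as possible]:

*	zero or more times
+	one or more times
?	zero or one time
{n}	exactly n times
{n,}	n or more times
{n,m}	between n and m times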

chunli@Linux:~/boost$ g++ main.cpp  -l boost_regex -Wall  && ./a.out 'a.*b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzbbaaazzzzzzzb

? non-greedy repeats [match as little as possible]:
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.

*? Matches the previous atom zero or more times, while consuming as little input as possible.

+? Matches the previous atom one or more times, while consuming as little input as possible.

?? Matches the previous atom zero or one times, while consuming as little input as possible.

{n,}? Matches the previous atom n or more times, while consuming as little input as possible.

{n,m}? Matches the previous atom between n and m times, while consuming as little input as possible.

chunli@Linux:~/boost$ g++ main.cpp  -l boost_regex -Wall  && ./a.out 'a.*?b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzb
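
The difference between the two runs above can also be checked in code. The following is a minimal sketch that uses std::regex from the standard library as a stand-in for boost::regex, so that it builds without linking Boost; for these two patterns its ECMAScript grammar is expected to behave the same way. The helper name first_match is made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text`,
// or an empty string when nothing matches.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Called with 'a.*b' on azzzzzzzzzbbaaazzzzzzzb it returns the whole line, while 'a.*?b' stops at the first b.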

Possessive repeats: no backtracking, whatever has matched stays matched; useful for performance:

By default when a repeated pattern does not match then the engine will backtrack until a match is found. However, this behaviour can sometimes be undesirable so there are also "possessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.

*+ Matches the previous atom zero or more times, while giving nothing back.

++ Matches the previous atom one or more times, while giving nothing back.

?+ Matches the previous atom zero or one times, while giving nothing back.

{n,}+ Matches the previous atom n or more times, while giving nothing back.

{n,m}+ Matches the previous atom between n and m times, while giving nothing back.

Back-references: a marked sub-expression must exist

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '^(a*).*\1$'
subexpressions: 1
a66a66
a66a66  
asssasss
asssasss
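
In the run above the second column is empty: a66a66 does not end in a, so (a*) backtracks to the empty string and \1 then matches nothing. A small sketch makes this visible. It uses std::regex as a stand-in for boost::regex so that it builds without Boost, and the helper name group1 and its "<no match>" sentinel are assumptions made for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns what group 1 captured when `pattern`
// matches the whole `line`, or the sentinel "<no match>" otherwise.
std::string group1(const std::string& pattern, const std::string& line)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(line, m, re))
        return "<no match>";
    return m.str(1);
}
```

With (a*) the capture for a66a66 is empty; replacing it with (a+) forces a non-empty capture, and the same line no longer matches at all.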

Alternation:
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def".

Parenthesis can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".

Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:

|abc is not a valid expression, but

(?:)|abc is and is equivalent, also the expression:

(?:abc)?? has exactly the same effect.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out 'l(i|o)ve'
subexpressions: 1
love
love o 
live
live i 
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '\<l(i|o)ve\>'
subexpressions: 1
love
love o 
live
live i 

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out 'abc|123|234'
subexpressions: 0
23
--
123
123 
abc
abc 
234
234 
123456789abc
123


Word boundaries:

The following escape sequences match the boundaries of words:

\< Matches the start of a word.

\> Matches the end of a word.

\b Matches a word boundary (the start or end of a word).

\B Matches only when not at a word boundary.
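
A short sketch of \b and \B in use. It relies on std::regex as a stand-in for boost::regex (an assumption made so the code builds without Boost); std::regex understands \b and \B but not the \< and \> forms. The helper name count_matches is made up for this illustration.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `pattern` in `text`.
long count_matches(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::sregex_iterator first(text.begin(), text.end(), re), last;
    return static_cast<long>(std::distance(first, last));
}
```

On the text "love glove lovely love" the plain pattern love is found four times, \blove\b only twice (the two standalone words), and \Blove once (inside glove).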

Named sub-expressions:

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '(?<r1>\d+)[[:blank:]]+\1'
subexpressions: 1
123 123
123 123 123 
234    234
234    234 234 
^C
chunli@Linux:~/boost$ 
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '(?<r1>\d+)[[:blank:]]+\g{r1}'
subexpressions: 1
1234                 1234 
1234                 1234 1234 
1236     1236
1236     1236 1236

Comments:

(?# ... ) is treated as a comment, its contents are ignored.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '\d+(?#my comment)'
subexpressions: 0
hello1234
1234

Branch reset:

(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.

The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index.

In the following example the index of each sub-expression is shown below the expression:

# before  ---------------branch-reset----------- after
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4

chunli@Linux:~/boost$ ./a.out '( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x'
subexpressions: 4

Lookahead:

The characters are tested but not consumed, so they remain available for the rest of the expression to match.

(?=pattern) consumes zero characters, only if pattern matches.

(?!pattern) consumes zero characters, only if pattern does not match.

Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:

(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}

could be used to validate the password.
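
The password expression can be wrapped in a small checker. This is a sketch under the assumption that std::regex is an acceptable stand-in for boost::regex here (it supports lookahead and the POSIX character classes used, and needs no extra library); the function name is_valid_password is made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `candidate` has a lower-case letter, an
// upper-case letter, a punctuation symbol, and at least 6 characters.
bool is_valid_password(const std::string& candidate)
{
    // Each lookahead scans the whole candidate from the start without
    // consuming anything; only .{6,} actually consumes the characters.
    static const std::regex re(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, re);
}
```

Because the three lookaheads all start at the same position, their order does not matter: the result is the logical AND of the three conditions plus the length check.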

Example 1: only th is matched, not ing, but ing must follow

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out 'th(?=ing)'
subexpressions: 0
those
-----
thing
th

Example 2: the lookahead checks ing without consuming it, so the following (in) can still match and capture in

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out 'th(?=ing)(in)'
subexpressions: 1
thing
thin in 
those
-----

Example 3: th matches unless it is followed by ing

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out 'th(?!ing)'
subexpressions: 0
this 
th 
thing
-----

Lookbehind:
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '(?<=ti)mer'
subexpressions: 0
timer
mer 
memer
-----

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall &&./a.out '(?<!ti)mer'
subexpressions: 0
timer
-----
hhmer
mer

Recursive expressions:

(?N) (?-N) (?+N) (?R) (?0) (?&NAME)
(?R) and (?0) recurse to the start of the entire pattern.
(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?&NAME) recurses to named sub-expression NAME.

Operator precedence:

The order of precedence of operators is as follows:

Collation-related bracket symbols [==] [::] [..]
Escaped characters \
Character set (bracket expression) []
Grouping ()
Single-character-ERE duplication * + ? {m,n}
Concatenation
Anchoring ^$
Alternation |
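
Two consequences of this ordering are easy to get wrong: anchors bind tighter than alternation, and repetition binds tighter than concatenation. The sketch below checks both. It uses std::regex as a stand-in for boost::regex (an assumption, so the code builds without Boost; the precedence rules involved are the same), and the helper names found and matches_whole are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: true when `pattern` matches all of `text`.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

^ab|cd$ is read as (^ab)|(cd$), so it is found in abXYZ, whereas ^(?:ab|cd)$ is not. Likewise ab* means a followed by any number of b, so it matches abbb but not abab, which needs (?:ab)*.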

===========================================================

Boost regex API

Showing the number of sub-expressions

pi@raspberrypi:~/boost $ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	using boost::regex;

	regex e1;
	e1 =  "^[[:xdigit:]]*$";
	cout << e1.str() << endl;
	cout << e1.mark_count() << endl;
	// if regex::save_subexpression_location is not set, e2.subexpression(0) reports an error
	regex e2("\\b\\w+(?=ing)\\b.{2,}?([[:alpha:]]*)$", regex::perl | regex::icase | regex::save_subexpression_location);
	cout << e2.str() << endl;
	cout << e2.mark_count() << endl;

	pair<regex::const_iterator,regex::const_iterator> sub1 = e2.subexpression(0);
	string sub1Str(sub1.first,++sub1.second);
	cout << sub1Str << endl;

	return 0;
}
pi@raspberrypi:~/boost $ 


pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall &&./a.out 
^[[:xdigit:]]*$
0
\b\w+(?=ing)\b.{2,}?([[:alpha:]]*)$
1
([[:alpha:]]*)
pi@raspberrypi:~/boost $

Boost regex: sub-matches

pi@raspberrypi:~/boost $ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	using boost::regex;
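	// a word that starts with T (delimited by \b), a space, then a hexadecimal number
	regex e1("\\bT\\w+\\b ([[:xdigit:]]+)"); // backslashes are doubled so that the regex sees a single one
	
	string s("Time ef09,Todo 001");
	boost::smatch m;
	
	//bool b = boost::regex_search(s,m,e1,boost::match_all); // match_all requires the match to run to the end of the input, so only the last occurrence is found
	bool b = boost::regex_search(s,m,e1); // by default the first occurrence is reported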
	cout << b <<endl;
	
	const int n = m.size();
	for(int i = 0; i<n; i++)
	{
		cout << "matched:" << i << " ,position:" << m.position(i) <<", ";
		cout << "length:" << m.length(i) << " , str:" << m.str(i) << endl;
	}

	return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall &&./a.out 
1
matched:0 ,position:0, length:9 , str:Time ef09
matched:1 ,position:5, length:4 , str:ef09
pi@raspberrypi:~/boost $

Boost regex: the regex_replace algorithm

pi@raspberrypi:~/boost $ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	using boost::regex;
	regex e1("([TQV])|(\\*)|(@)");
	string replaceFmt("(\\L?1$&)(?2+)(?3#)"); // T, Q, V to lower case; * to +; @ to #

	string src("guTdQhV@@g*b*"); // input string
	cout << "before replaced: " <<src << endl;
	//before replaced: guTdQhV@@g*b*
	
	string newStr1 = regex_replace(src,e1,replaceFmt,boost::match_default|boost::format_all); // format_all is required for the (?N...) conditionals
	cout << "after replaced: " << newStr1 << endl;
	//after replaced: gutdqhv##g+b+

	string newStr2 = regex_replace(src,e1,replaceFmt,boost::match_default|boost::format_default); // without format_all the conditionals are copied out literally
	cout << "after replaced: " << newStr2 << endl;
	
	// alternative: write the result through an output iterator
	ostream_iterator<char> oi(cout);
	regex_replace(oi,src.begin(),src.end(),e1,replaceFmt,boost::match_default | boost::match_all);
	cout << endl;

	return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall &&./a.out 
before replaced: guTdQhV@@g*b*
after replaced: gutdqhv##g+b+
after replaced: gu(?1t)(?2+)(?3#)d(?1q)(?2+)(?3#)h(?1v)(?2+)(?3#)(?1@)(?2+)(?3#)(?1@)(?2+)(?3#)g(?1*)(?2+)(?3#)b(?1*)(?2+)(?3#)
guTdQhV@@g*b(?1*)(?2+)(?3#)
pi@raspberrypi:~/boost $

Boost regex: iterators

pi@raspberrypi:~/boost $ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	using boost::regex;
	regex e("(a+).+?",regex::icase);
	
	string s("ann abb aaat");
	
	boost::sregex_iterator it1(s.begin(),s.end(),e);
	boost::sregex_iterator it2;

	for(;it1 != it2;++it1)
	{
		boost::smatch m = *it1;
		cout << m << endl;
	}
	return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall &&./a.out 
an
ab
aaat
pi@raspberrypi:~/boost $

Boost regex: token iterator with -1, i.e. the text that was not matched

pi@raspberrypi:~/boost $ cat main.cpp 
#include <iostream>
#include <iomanip>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, const char* argv[]) 
{
	using boost::regex;

	string s("this is ::a string ::of tokens");
	boost::regex re("\\s+:*"); // separator: whitespace followed by any number of colons
	boost::sregex_token_iterator i(s.begin(),s.end(),re,-1);
	boost::sregex_token_iterator j;
	unsigned count  = 0;
	while(i != j)
	{
		cout << *i++ << endl;
		count++;
	}
	cout << "There were "<< count << " tokens found !" << endl;
	return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall &&./a.out 
this   
is
a
string
of
tokens
There were 6 tokens found !
pi@raspberrypi:~/boost $
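
The same splitting idiom can be packaged as a reusable function. The sketch below uses std::sregex_token_iterator from the standard library as a stand-in for boost::sregex_token_iterator (an assumption, so the code builds without Boost; the constructor arguments are the same as in the program above). The function name split is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every match of `separator_pattern`.
// Sub-match index -1 selects the stretches of text between the matches.
std::vector<std::string> split(const std::string& text,
                               const std::string& separator_pattern)
{
    const std::regex re(separator_pattern);
    std::vector<std::string> tokens;
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

One thing to keep in mind: if the text begins with a separator, the first token is an empty string, so callers may want to filter empty tokens.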

Boost regex: captures. Why does the official example crash?

pi@raspberrypi:~/boost $ cat main.cpp 
#include <boost/regex.hpp>
#include <iostream>

void print_captures(const std::string& regx, const std::string& text)
{
	boost::regex e(regx);
	boost::smatch what;
	std::cout << "Expression:  \"" << regx << "\"\n";
	std::cout << "Text:        \"" << text << "\"\n";
	if(boost::regex_match(text, what, e, boost::match_extra))
	{
		unsigned i, j;
		std::cout << "** Match found **\n   Sub-Expressions:\n";
		for(i = 0; i < what.size(); ++i)
			std::cout << "      $" << i << " = \"" << what[i] << "\"\n";
		std::cout << "   Captures:\n";
		for(i = 0; i < what.size(); ++i)
		{
			std::cout << "      $" << i << " = {";
			for(j = 0; j < what.captures(i).size(); ++j)
			{
				if(j)
					std::cout << ", ";
				else
					std::cout << " ";
				std::cout << "\"" << what.captures(i)[j] << "\"";
			}
			std::cout << " }\n";
		}
	}
	else
	{
		std::cout << "** No Match found **\n";
	}
}

int main(int , char* [])
{
	print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
	print_captures("a(b+|((c)*))+d", "abd");
	print_captures("(.*)bar|(.*)bah", "abcbar");
	print_captures("(.*)bar|(.*)bah", "abcbah");
	print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
	print_captures("^(?>(\\w+)\\W*)*$", "now is the time for all good men to come to the aid of the party");
	print_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$", "now is the time for all good men to come to the aid of the party");
	print_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$", "now is the time for all good men to come to the aid of the party");
	return 0;
}
pi@raspberrypi:~/boost $ g++ -D BOOST_REGEX_MATCH_EXTRA -l boost_regex -Wall main.cpp &&./a.out 
Expression:  "(([[:lower:]]+)|([[:upper:]]+))+"
Text:        "aBBcccDDDDDeeeeeeee"
** No Match found **
Bus error
pi@raspberrypi:~/boost $

A likely explanation: BOOST_REGEX_MATCH_EXTRA has to be defined for every translation unit, including the Boost.Regex library sources themselves. Here it is passed only when compiling main.cpp, while the installed libboost_regex was presumably built without it. Such a mismatch would account for both the wrong "No Match found" and the bus error.

Boost regex: an official example

pi@raspberrypi:~/boost $ cat main.cpp 

#include <cstdlib>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace std;
using namespace boost;

regex expression("^([0-9]+)(\\-| |$)(.*)$"); // three groups: numeric code, separator (-, space or end of line), message text

int process_ftp(const char* response, std::string* msg)
{
	cmatch what;
	if(regex_match(response, what, expression))
	{
		// what[0] contains the whole string
		// what[1] contains the response code
		// what[2] contains the separator character
		// what[3] contains the text message.
		if(msg)
			msg->assign(what[3].first, what[3].second);
		return ::atoi(what[1].first);
	}
	// failure did not match
	if(msg)
		msg->erase();
	return -1;
}

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
istream& getline(istream& is, std::string& s)
{
	s.erase();
	char c = static_cast<char>(is.get());
	while(c != '\n')
	{
		s.append(1, c);
		c = static_cast<char>(is.get());
	}
	return is;
}
#endif

int main(int argc, const char*[])
{
	std::string in, out;
	do
	{
		if(argc == 1)
		{
			cout << "enter test string" << endl;
			getline(cin, in);
			if(in == "quit")
				break;
		}
		else
			in = "100 this is an ftp message text";
		int result;
		result = process_ftp(in.c_str(), &out);
		if(result != -1)
		{
			cout << "Match found:" << endl;
			cout << "Response code: " << result << endl;
			cout << "Message text: " << out << endl;
		}
		else
		{
			cout << "Match not found" << endl;
		}
		cout << endl;
	} while(argc == 1);
	return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp &&./a.out 
enter test string
404 not found
Match found:
Response code: 404
Message text: not found

enter test string
500 service error
Match found:
Response code: 500
Message text: service error

enter test string
^C
pi@raspberrypi:~/boost $

Boost regex: a simple lexer using regex_search to find C++ class definitions

pi@raspberrypi:~/boost $ cat main.cpp 
  
#include <string>
#include <map>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's

typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re = 
   // possibly leading whitespace:   
   "^[[:space:]]*" 
   // possible template declaration:
   "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   // class or struct:
   "(class|struct)[[:space:]]*" 
   // leading declspec macros etc:
   "("
      "\\<\\w+\\>"
      "("
         "[[:blank:]]*\\([^)]*\\)"
      ")?"
      "[[:space:]]*"
   ")*" 
   // the class name
   "(\\<\\w*\\>)[[:space:]]*" 
   // template specialisation parameters
   "(<[^;:{]+>)?[[:space:]]*"
   // terminate in { or :
   "(\\{|:[^;\\{()]*\\{)";


boost::regex expression(re);

void IndexClasses(map_type& m, const std::string& file)
{
   std::string::const_iterator start, end;
   start = file.begin();
   end = file.end();   
   boost::match_results<std::string::const_iterator> what;
   boost::match_flag_type flags = boost::match_default;
   while(boost::regex_search(start, end, what, expression, flags))   
   {
      // what[0] contains the whole string
      // what[5] contains the class name.
      // what[6] contains the template specialisation if any.
      // add class name and position to map:
      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
               what[5].first - file.begin();      
      // update search position:
      start = what[0].second;      
      // update flags:
      flags |= boost::match_prev_avail;
      flags |= boost::match_not_bob;
   }
}


#include <iostream>
#include <fstream>

using namespace std;

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      map_type m;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      fs.close();
      IndexClasses(m, text);
      cout << m.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = m.begin();
      d = m.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
   }
   return 0;
}

pi@raspberrypi:~/boost $ cat  my_class.cpp 
template      <class      T>
struct A
{
public:

};

template      <class      T>
class M
{

}
;
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp &&./a.out my_class.cpp 
Processing file my_class.cpp
2 matches found
class "A" found at index: 36
class "M" found at index: 88
pi@raspberrypi:~/boost $

Boost regex: a simple lexer using iterators to find C++ class definitions

pi@raspberrypi:~/boost $ cat main.cpp

#include <string>
#include <map>
#include <fstream>
#include <iostream>
#include <algorithm> // std::for_each
#include <boost/regex.hpp>

using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's

typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re = 
   // possibly leading whitespace:   
   "^[[:space:]]*" 
   // possible template declaration:
   "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   // class or struct:
   "(class|struct)[[:space:]]*" 
   // leading declspec macros etc:
   "("
      "\\<\\w+\\>"
      "("
         "[[:blank:]]*\\([^)]*\\)"
      ")?"
      "[[:space:]]*"
   ")*" 
   // the class name
   "(\\<\\w*\\>)[[:space:]]*" 
   // template specialisation parameters
   "(<[^;:{]+>)?[[:space:]]*"
   // terminate in { or :
   "(\\{|:[^;\\{()]*\\{)";


boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
   // what[0] contains the whole string
   // what[5] contains the class name.
   // what[6] contains the template specialisation if any.
   // add class name and position to map:
   class_index[what[5].str() + what[6].str()] = what.position(5);
   return true;
}

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      fs.close();
      // construct our iterators:
      boost::sregex_iterator m1(text.begin(), text.end(), expression);
      boost::sregex_iterator m2;
      std::for_each(m1, m2, &regex_callback);
      // copy results:
      cout << class_index.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = class_index.begin();
      d = class_index.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
      class_index.erase(class_index.begin(), class_index.end());
   }
   return 0;
}




pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp &&./a.out main.cpp my_class.cpp 
Processing file main.cpp
0 matches found
Processing file my_class.cpp
2 matches found
class "A" found at index: 23
class "B" found at index: 36
pi@raspberrypi:~/boost $

Boost regex: converting a C++ file to HTML

pi@raspberrypi:~/boost $ cat main.cpp

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file and transform to
// syntax highlighted code in html format

boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   try{
   e1.assign(expression_text);
   e2.assign(pre_expression);
   for(int i = 1; i < argc; ++i)
   {
      std::cout << "Processing file " << argv[i] << std::endl;
      std::ifstream fs(argv[i]);
      std::string in;
      load_file(in, fs);
      fs.close();
      std::string out_name = std::string(argv[i]) + std::string(".htm");
      std::ofstream os(out_name.c_str());
      os << header_text;
      // strip '<' and '>' first by outputting to a
      // temporary string stream
      std::ostringstream t(std::ios::out | std::ios::binary);
      std::ostream_iterator<char> oi(t);
      boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
      // then output to final output stream
      // adding syntax highlighting:
      std::string s(t.str());
      std::ostream_iterator<char> out(os);
      boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
      os << footer_text;
      os.close();
   }
   }
   catch(...)
   { return -1; }
   return 0;
}

const char* pre_expression = "(<)|(>)|(&)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)(?3&amp;)";


const char* expression_text = // preprocessor directives: index 1
                              "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
                              // comment: index 2
                              "(//[^\\n]*|/\\*.*?\\*/)|"
                              // literals: index 3
                              "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
                              // string literals: index 4
                              "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
                              // keywords: index 5
                              "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
                              "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
                              "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
                              "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
                              "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
                              "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
                              "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
                              "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
                              "|using|virtual|void|volatile|wchar_t|while)\\>"
                              ;

const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
                            "(?2<I><font color=\"#000080\">$&</font></I>)"
                            "(?3<font color=\"#0000A0\">$&</font>)"
                            "(?4<font color=\"#0000FF\">$&</font>)"
                            "(?5<B>$&</B>)";

const char* header_text = "<HTML>\n<HEAD>\n"
                          "<TITLE>Auto-generated html formated source</TITLE>\n"
                          "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
                          "</HEAD>\n"
                          "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
                          "<P> </P>\n<PRE>";

const char* footer_text = "</PRE>\n</BODY>\n\n";

pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp &&./a.out main.cpp 
Processing file main.cpp

[Figure: screenshot of the generated main.cpp.htm]

Boost regex: extracting all links from a web page:

pi@raspberrypi:~/boost $ cat main.cpp

#include <fstream>
#include <iostream>
#include <iterator>
#include <boost/regex.hpp>

boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
               boost::regex::normal | boost::regbase::icase);

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   //
   // attempt to grow string buffer to match file size,
   // this doesn't always work...
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      // use logarithmic growth strategy, in case
      // in_avail (above) returned zero:
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, char** argv)
{
   std::string s;
   int i;
   for(i = 1; i < argc; ++i)
   {
      std::cout << "Findings URL's in " << argv[i] << ":" << std::endl;
      s.erase();
      std::ifstream is(argv[i]);
      load_file(s, is);
      is.close();
      boost::sregex_token_iterator i(s.begin(), s.end(), e, 1);
      boost::sregex_token_iterator j;
      while(i != j)
      {
         std::cout << *i++ << std::endl;
      }
   }
   //
   // alternative method:
   // test the array-literal constructor, and split out the whole
   // match as well as $1....
   //
   for(i = 1; i < argc; ++i)
   {
      std::cout << "Findings URL's in " << argv[i] << ":" << std::endl;
      s.erase();
      std::ifstream is(argv[i]);
      load_file(s, is);
      is.close();
      const int subs[] = {1, 0,};
      boost::sregex_token_iterator i(s.begin(), s.end(), e, subs);
      boost::sregex_token_iterator j;
      while(i != j)
      {
         std::cout << *i++ << std::endl;
      }
   }

   return 0;
}



pi@raspberrypi:~/boost $ curl http://www.boost.org/ > boost.html
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp &&./a.out boost.html 
Findings URL's in boost.html:
/
http://www.gotw.ca/
http://en.wikipedia.org/wiki/Andrei_Alexandrescu
http://safari.awprofessional.com/?XmlId=0321113586
/users/license.html
http://www.open-std.org/jtc1/sc22/wg21/
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
http://cppnow.org/
https://developers.google.com/open-source/soc/?csw=1
/doc/libs/release/more/getting_started/index.html
http://fedoraproject.org/
http://www.debian.org/
http://www.netbsd.org/
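
The core of the link extractor can be tried without a file or a network connection. The sketch below applies the same expression to an in-memory string, using std::regex as a stand-in for boost::regex (an assumption, so the code builds without Boost); the function name extract_links is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the href value of every <a ...> tag in `html`.
// Sub-match index 1 selects the text captured by the parenthesised group.
std::vector<std::string> extract_links(const std::string& html)
{
    static const std::regex re(
        R"rx(<\s*a\s+[^>]*href\s*=\s*"([^"]*)")rx", std::regex::icase);
    std::vector<std::string> links;
    std::sregex_token_iterator it(html.begin(), html.end(), re, 1), end;
    for (; it != end; ++it)
        links.push_back(it->str());
    return links;
}
```

Like the original expression, this only recognises double-quoted href values; it is a pattern-matching shortcut rather than an HTML parser.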
