Community

Capture first word on line: using Unicode

 
Mike Olds



Joined: 30 Sep 2009
Posts: 226

Posted: Thu Jan 07, 2021 2:46 pm    Post subject: Capture first word on line: using Unicode

Hello,

Windows 7, 64bit TP 8.50

I am working on a .txt file of a dictionary which I am converting to .htm and would like to have the first word of each main entry put into boldface.

I am using the TP Search and Replace tool, not Wild-Edit.

Main entries begin a new line.

The problem is that this tool thinks that Unicode characters are word breaks.

Any solutions?
_________________
Thank you, and
Best Wishes,
Obo
http://buddhadust.net/
check out the What's New? Oblog:
http://buddhadust.net/dhammatalk/dhammatalk_forum/whats.new.htm
ben_josephs



Joined: 02 Mar 2003
Posts: 2360

Posted: Thu Jan 07, 2021 8:16 pm

What regex are you using to match words?
Please provide examples of words that this regex doesn't recognise as single words.
MudGuard



Joined: 02 Mar 2003
Posts: 1253
Location: Munich, Germany

Posted: Thu Jan 07, 2021 11:08 pm    Post subject: Re: Capture first word on line: using Unicode

Mike Olds wrote:
The problem is that this tool thinks that Unicode characters are word breaks.


So every character is treated as a word break?
After all, every character is a Unicode character ...
Mike Olds



Joined: 30 Sep 2009
Posts: 226

Posted: Fri Jan 08, 2021 11:21 pm

Hello,

Sorry for the delay in responding. I do not get notices although I have it checked.

Sample lines:

<p>:Akāca (adjective) [a + kāca] pure, flawless, clear D II 244; Snp 476; Ja V 203.</p>

<p>:Paricāreti [causative of paricarati]</p>

I think I need to ask a different question, as the so-called regex I was using appears to be capturing only the first letter.

I was using <p>(:\w)

All the relevant lines begin with <p>:

I realize now that what I meant by Unicode characters was too vague. I am speaking about characters with diacritics, including compound characters. And since my regex was unsuitable, there might not be any more of a problem than my ignorance.

So can you give me a regex that will capture the first word of a line?
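
A minimal sketch of why `<p>(:\w)` stops after one letter. It uses Python's `re` module as a stand-in, which is an assumption: the TextPad Search and Replace tool has its own regex engine and may treat some characters differently. The sample line is shortened from the post above, with the macron letter written as an escape so that it is the single precomposed code point U+0101.

```python
import re

# Sample line from the post, shortened. "\u0101" is the precomposed
# LATIN SMALL LETTER A WITH MACRON.
line = "<p>:Ak\u0101ca (adjective) [a + k\u0101ca] pure, flawless, clear</p>"

# \w without a quantifier matches exactly one word character,
# so the group holds only the colon and the first letter.
single = re.search(r"<p>(:\w)", line)
print(single.group(1))  # :A

# In Python 3, \w in a str pattern is Unicode-aware, so the precomposed
# letter with a macron counts as a word character.
is_word_char = re.fullmatch(r"\w", "\u0101") is not None
print(is_word_char)  # True
```

In this engine at least, the short capture comes from the missing quantifier rather than from the diacritics.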
ben_josephs



Joined: 02 Mar 2003
Posts: 2360

Posted: Fri Jan 08, 2021 11:28 pm

Try
<p>(:\w+)

Let us know whether that works.
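
As a rough illustration of the suggestion, here is how `<p>(:\w+)` behaves in Python's `re` module (an assumption; TextPad's engine is not Python's). The sketch also covers one case worth checking if "compound characters" means a base letter followed by a separate combining mark: in Python, `\w` does not match combining marks, so a decomposed word is cut short unless the text is normalised to precomposed form first.

```python
import re
import unicodedata

# Sample line from the thread, shortened, in precomposed (NFC) form.
line = "<p>:Ak\u0101ca (adjective) [a + k\u0101ca] pure, flawless, clear</p>"

pattern = re.compile(r"<p>(:\w+)")

# With the + quantifier the whole headword is captured.
composed = pattern.search(line).group(1)
print(composed)  # :Akāca

# NFD splits the macron letter into "a" followed by U+0304 COMBINING MACRON.
decomposed_line = unicodedata.normalize("NFD", line)

# Python's \w does not match the combining mark, so the match stops early.
truncated = pattern.search(decomposed_line).group(1)
print(truncated)  # :Aka

# Normalising back to NFC before matching restores the full capture.
restored = pattern.search(unicodedata.normalize("NFC", decomposed_line)).group(1)
print(restored == composed)  # True
```

Whether the dictionary file actually contains decomposed characters is not stated in the thread, so the NFD case is only something to test for.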
Mike Olds



Joined: 30 Sep 2009
Posts: 226

Posted: Fri Jan 08, 2021 11:36 pm

Hello Ben,

Thank you for your quick response. Yes, that seems to work.

I had also just found another that appears to work:

^<p>(:\S+)
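
A sketch of how the captured group could be used for the original goal of putting the first word in boldface. The thread does not show the replacement expression, so the `<p><b>\1</b>` replacement is an assumption, and Python's `re.sub` stands in for the TextPad Replace dialog. In Python, `^` only matches at the start of every line when `re.MULTILINE` is set.

```python
import re

# The two sample lines from the thread (the first one shortened).
text = (
    "<p>:Ak\u0101ca (adjective) [a + k\u0101ca] pure, flawless, clear</p>\n"
    "<p>:Paric\u0101reti [causative of paricarati]</p>\n"
)

# re.MULTILINE lets ^ anchor at the beginning of each line,
# not just at the beginning of the whole string.
entry_start = re.compile(r"^<p>(:\S+)", re.MULTILINE)

# Hypothetical replacement: wrap the captured ":word" in <b>...</b>.
bolded = entry_start.sub(r"<p><b>\1</b>", text)
print(bolded)
# <p><b>:Akāca</b> (adjective) [a + kāca] pure, flawless, clear</p>
# <p><b>:Paricāreti</b> [causative of paricarati]</p>
```

Because the colon is inside the group, it ends up inside the bold tags as well; moving it outside the parentheses would leave it unbolded.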
ben_josephs



Joined: 02 Mar 2003
Posts: 2360

Posted: Fri Jan 08, 2021 11:57 pm

That matches sequences of any characters that aren't white space. Is that what you want?
Mike Olds



Joined: 30 Sep 2009
Posts: 226

Posted: Sat Jan 09, 2021 12:34 am

Hello Ben,

Yes, sort of. There are some complications. But that seems to get what I need. (It captures <sup>1</sup>, which is good.)

Thanks again for this help and for tolerating my ignorance. I'm getting way too old for this sort of thing, my mind just can't keep up.
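
The difference between the two patterns in this thread can be sketched as follows, again using Python's `re` as a stand-in for the TextPad engine. Both test lines are hypothetical: they reuse the headword from the sample above, one with a `<sup>1</sup>` marker attached and one with the closing tag directly after the word.

```python
import re

word_pattern = re.compile(r"^<p>(:\w+)")      # word characters only
nonspace_pattern = re.compile(r"^<p>(:\S+)")  # anything up to the first white space

# Hypothetical lines, not taken from the dictionary file.
with_sup = "<p>:Ak\u0101ca<sup>1</sup> (adjective) pure</p>"
bare = "<p>:Ak\u0101ca</p>"


def first_group(pattern, line):
    """Return the first captured group, or None if the line does not match."""
    match = pattern.search(line)
    return match.group(1) if match else None


# \w+ stops at the "<" of the superscript markup.
print(first_group(word_pattern, with_sup))      # :Akāca

# \S+ keeps going until white space, so the superscript is included.
print(first_group(nonspace_pattern, with_sup))  # :Akāca<sup>1</sup>

# Side effect of \S+: with no space after the word, the closing tag is captured too.
print(first_group(nonspace_pattern, bare))      # :Akāca</p>
```

If entries like the last one exist in the file, a bold replacement built on `\S+` would wrap the `</p>` as well, so they may need a separate pass.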
AmigoJack



Joined: 30 Oct 2016
Posts: 291
Location: グリーン ヒル ゾーン

Posted: Sun Jan 10, 2021 3:21 pm

Mike Olds wrote:
Sorry for the delay in responding. I do not get notices although I have it checked.

Just visit this board regularly (e.g. every weekend) and look for the "unread post" icon to spot activity you have not yet seen.

If you need an overview of all your own posts (to see topics you have created or participated in), just use the "Find all posts by Mike Olds" link in your public profile; you could bookmark it.