+ 1
My pattern is similar to Josh Greig's.
My approach is to match EITHER
- a word starting with a capital letter, enclosed in a capturing group
OR
- a string enclosed in quotation marks (like "text"), enclosed in a *non-capturing group*.
This way, if the capturing group matched, you extract the word; otherwise you just ignore the match.
The pattern used is this:
"(?:[\"'].*[\"'])|(\\b[A-Z]\\w+\\b)"
Let's focus on "(?:[\"'].*[\"'])" first. It matches any number of characters enclosed in single or double quotes. The "?:" makes it a non-capturing group. Note that ".*" is greedy, so if a line contains several quoted strings, it matches from the first quote to the last one.
The '|' metacharacter works like a logical OR.
"(\\b[A-Z]\\w+\\b)" matches a word starting with a capital letter, delimited by word boundaries.
After this, you need to enclose the print statement inside an if-block with the condition
`if (m.group(1) != null)`
You also need to change `m.group()` to `m.group(1)` in line 25.
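Here is a minimal sketch of how the pieces above could fit together. The class name `CapitalWords`, the helper `capitalWords`, and the sample text are made up for illustration; they are not taken from the original code, whose line 25 is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, used only for this sketch.
class CapitalWords {
    // Left side: quoted text in a non-capturing group.
    // Right side: a capitalized word in capturing group 1.
    static final Pattern PATTERN =
            Pattern.compile("(?:[\"'].*[\"'])|(\\b[A-Z]\\w+\\b)");

    static List<String> capitalWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            // group(1) is null when the quoted alternative matched, so skip it.
            if (m.group(1) != null) {
                found.add(m.group(1) + " is found at index " + m.start(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample input, not the text from the original question.
        String text = "Hello there \"Quoted Text\" dontFindMe World";
        for (String line : capitalWords(text)) {
            System.out.println(line);
        }
        // prints:
        // Hello is found at index 0
        // World is found at index 37
    }
}
```

"Quoted" and "Text" are swallowed by the quoted alternative, and "FindMe" is skipped because there is no word boundary between "dont" and "FindMe".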
+ 1
Is this still a problem?
When I run it, I get the output:
Matched is found at index 0
MathchMeToo is found at index 23
FindMe is found at index 199
FindMe is found at index 226
That doesn't include the dontFindMe from lines 18 and 19, which is likely what you want. Is there still a problem here, or did you fix it since posting your question?
+ 1
Sucheta, OK, I see. You don't want even the "FindMe" there, because it is preceded by "dont", which starts with a lowercase letter.
I'll troubleshoot it now.
+ 1
This would find words starting with a capital that appear at the start of a line or after a non-word/quote character:
String regex = "((^[A-Z]\\w+)|([^\\w\\\"][A-Z][\\w]+))";
I'm having some trouble with the lookarounds and non-capturing groups, though. You'll need one to exclude words within quotes and to make sure the first letter of each match is at the start of a word and not merely after a lowercase letter. My regular expression above also matches the preceding non-word character, such as your period, which should be excluded from the match.
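To make that problem concrete, here is a rough sketch comparing the regex as posted with one possible variant that swaps the consuming character class for a negative lookbehind. The variant, the class name `LookbehindSketch`, and the sample text are assumptions for illustration only. The lookbehind rejects only a word that directly follows a quote, so later words inside a quoted phrase would still be matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class LookbehindSketch {
    // The regex as posted: the character before the capital letter is consumed.
    static final Pattern CONSUMING =
            Pattern.compile("((^[A-Z]\\w+)|([^\\w\\\"][A-Z][\\w]+))");

    // Assumed variant: a negative lookbehind checks the previous character
    // without including it in the match. It also succeeds at the start of input.
    static final Pattern LOOKBEHIND =
            Pattern.compile("(?<![\\w\"])[A-Z]\\w+");

    static List<String> allMatches(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample input.
        String text = "First one.Second \"Quoted\" dontFindMe";
        System.out.println(allMatches(CONSUMING, text));  // prints [First, .Second]
        System.out.println(allMatches(LOOKBEHIND, text)); // prints [First, Second]
    }
}
```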
+ 1
Example string:
"""
Word abcd efgh "Java" two Another
"""
1. 'Word' will be matched by the right-hand side of '|' and put into the capturing group, so m.group(1) will return 'Word'.
2. '"Java"' will be matched too, by the left-hand side of '|'. But since it is not put into a capturing group, m.group(1) will return null, which we can ignore.
3. 'Another' is handled the same way as in point 1.
Fixed code:
https://code.sololearn.com/c2ETLMud0UR1/?ref=app