Python slice text
I have a word and I want to check it section by section. Suppose I divide the word as follows:

word = 'systematically'

1. Split the word based on vowels, so we could have:

new_list = []
word_list = ['sys', 'te', 'ma', 'ti', 'cal', 'ly']

2. Then check each section and, if it meets my conditions, reset the list:

if len(word_list) == 3 and word_list == 'CVC':
    new_list.append(word_list)
    # here the word list should be reset to a new one, until word_list is used up
    word_list = ['te', 'ma', 'ti', 'cal', 'ly']
elif len(word_list) == 4:
    DO THIS
elif len(word_list) == 3:
    DO ABOVE (the first IF condition)

How would you implement such an idea?
2/7/2020 12:08:26 PM Dolan Hêriş
13 Answers
It is not clear to me what you want to achieve. Do you want to split words into syllables? There are ready-made modules for this that you can install via pip. If I remember correctly, one such module (or package?) is called pyhyphen; I used it some months ago. Such modules are not easy to build on your own, because that takes a lot of linguistic knowledge and a lot of statistics.
Dolan Hêriş I used pyhyphen successfully, even though it does not seem to have been updated since 2017. Just follow the instructions at https://pypi.org/project/PyHyphen/ . It is easy to use. I don't know of anything better. The nltk module seems to have a different aim: semantic analysis of texts. I have not dealt with that topic yet.
I would try filtering the word list by a criterion. Here, keeping only the parts with len == 3:

new_list = list(filter(lambda x: len(x) == 3, word_list))
Have you asked something like this before? I did the "split on vowels" part but didn't post it, because a comment I posted about the task didn't get a reply. I should still have it in my IDE at home; I'll post it tonight.
import re

# example 1.
words = ["work", "because", "function", "systematically"]
for word in words:
    print(re.findall(r"[^aeiou]+[aeiou]*", word))

# example 2.
print(re.findall(r"[^aeiou]+[aeiou]*", "systematically"))

# or:
for x in re.findall(r"[^aeiou]+[aeiou]*", "systematically"):
    # do what you need here for each of the substrings
    print(x)
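A note on what that pattern actually returns: for "systematically" it gives ['syste', 'ma', 'ti', 'ca', 'lly'], which is a split on vowel boundaries and not the syllable list from the question. It also silently drops vowels at the start of a word (for "apple" it returns only ['pple']). Below is a minimal sketch of a variant that keeps leading vowels. The function name split_on_vowels is made up for illustration, and treating 'y' as a consonant is an assumption carried over from the pattern above.

```python
import re

# Assumption: 'y' is treated as a consonant, as in the pattern above.
VOWELS = "aeiou"

def split_on_vowels(word):
    """Split a word into chunks of optional consonants followed by vowels.

    A trailing run of consonants with no vowel after it becomes its own chunk.
    """
    # First alternative: zero or more consonants, then at least one vowel.
    # Second alternative: a leftover run of consonants (e.g. at the end of the word).
    pattern = rf"[^{VOWELS}]*[{VOWELS}]+|[^{VOWELS}]+"
    return re.findall(pattern, word.lower())

print(split_on_vowels("systematically"))  # ['syste', 'ma', 'ti', 'ca', 'lly']
print(split_on_vowels("apple"))           # ['a', 'pple']
```

If real syllables are needed (for example 'sys', 'te', 'ma', ...), a regular expression like this is probably not enough, and a hyphenation module is likely the better route.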
Thanks Jan Markus. You are right. I searched and found that module, but it hasn't been updated for a long time. Is there anything better (other than nltk)?
Before I post the code... is there an error in your example "word_list = ['sys','te','ma','ti','cal','ly',]"? The "sys" and "te" are not split on vowels.
I wanted to split long words based on vowels (including them) and then apply some if conditions based on the length of each part, not only filter the parts that are three characters long.

word = 'systematically'
word_list = ['sys', 'te', 'ma', 'ti', 'cal', 'ly']

If the first condition on word_list is true, then:

word_list = ['te', 'ma', 'ti', 'cal', 'ly']
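One possible way to read this is as a loop that consumes word_list from the front and sorts each part by the stated conditions. The sketch below is only an illustration under assumptions: the names cv_pattern and process_chunks are made up, 'y' is counted as a vowel so that 'sys' has the shape CVC (which the example seems to imply), and the unspecified "DO THIS" branch is replaced by simply collecting the four-letter parts.

```python
from collections import deque

# Assumption: 'y' counts as a vowel here so that 'sys' matches 'CVC'.
VOWELS = set("aeiouy")

def cv_pattern(chunk):
    """Map a chunk to its consonant/vowel shape, e.g. 'cal' -> 'CVC'."""
    return "".join("V" if ch in VOWELS else "C" for ch in chunk.lower())

def process_chunks(word_list):
    """Consume word_list from the front and sort the chunks into three groups."""
    remaining = deque(word_list)  # popleft() does the "reset": the first chunk is removed
    new_list = []                 # chunks of length 3 with shape CVC
    four_letter = []              # placeholder for the unspecified "DO THIS" branch
    other = []                    # chunks that match neither condition
    while remaining:
        chunk = remaining.popleft()
        if len(chunk) == 3 and cv_pattern(chunk) == "CVC":
            new_list.append(chunk)
        elif len(chunk) == 4:
            four_letter.append(chunk)
        else:
            other.append(chunk)
    return new_list, four_letter, other

new_list, four_letter, other = process_chunks(['sys', 'te', 'ma', 'ti', 'cal', 'ly'])
print(new_list)  # ['sys', 'cal']
print(other)     # ['te', 'ma', 'ti', 'ly']
```

Using a deque keeps removal from the front cheap. With a plain list, word_list = word_list[1:] would work as well, at the cost of copying the list on every step.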
I have asked some similar questions, but not about resetting the value of a list.
Thanks Jan Markus I will test it.
You are right. It should be based on vowels.
I will check and let you know. Thanks for your time.