Function to return subword of a camelcase string
















Given a camelcase string and an index, return the subword of the string that includes that index, e.g.:

    find_word('CamelCaseString', 6) -> 'Case'
    find_word('ACamelCaseString', 0) -> 'A'

My code:

    def find_word(s, index):
        for i in range(index, 0, -1):
            if s[i].isupper():
                left = i
                break
        else:
            left = 0

        for i in range(index, len(s)-1):
            if s[i].islower() and s[i+1].isupper() or s[i:i+2].isupper():
                right = i
                break
        else:
            right = len(s) - 1

        return s[left:right+1]

Can this be made more concise/efficient?







python strings interview-questions

asked Dec 13 '18 at 14:30
Eugene Yarmash






















          3 Answers




































































Review

• Add docstrings and tests... or both in the form of doctests!

      def find_word(s, index):
          """
          Finds the CamelCased word surrounding the given index in the string

          >>> find_word('CamelCaseString', 6)
          'Case'
          >>> find_word('ACamelCaseString', 0)
          'A'
          """

          ...



• Loop like a native.

  Instead of going over the indexes, we can loop over the items directly. Instead of

      range(index, 0, -1)

  we can loop over the item and a counter at the same time using enumerate:

      for i, s in enumerate(string[index:0:-1]):

  Note that i then counts steps back from index rather than being a position in the string. However, this would be slower, since the slice creates a new string object on each call.




• If we can be sure that the given string is a CamelCase string

  Then every character is a letter, so s[i] is always either lowercase or uppercase, and we can drop part of the condition in your second loop:

      if s[i].islower() and s[i+1].isupper() or s[i:i+2].isupper():

  would become

      if s[i+1].isupper():



• Actually your code (from a performance aspect) is quite good

  We could, however, use a single while loop that moves both ends of the word at once, which might give a small performance gain.
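The while-loop idea from the last bullet can be sketched as follows. This is only one possible reading of it, not the reviewer's own code: the name find_word_while is hypothetical, and the sketch assumes that s contains only letters in CamelCase and that 0 <= index < len(s).

```python
def find_word_while(s, index):
    """Return the CamelCase subword of s that covers position index."""
    left = index        # inclusive start of the word
    right = index + 1   # exclusive end of the word
    moved = True
    while moved:
        moved = False
        # Step left until we stand on the capital that starts the word.
        if left > 0 and not s[left].isupper():
            left -= 1
            moved = True
        # Step right until we hit the next capital or the end of the string.
        if right < len(s) and not s[right].isupper():
            right += 1
            moved = True
    return s[left:right]


print(find_word_while('CamelCaseString', 6))   # Case
print(find_word_while('ACamelCaseString', 0))  # A
```

Both ends advance inside the same loop, so the loop runs as many times as the longer of the two walks. Whether that is measurably faster than two separate for loops would need to be timed.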




(slower, yet more readable) Alternative

A different approach to finding CamelCase words is to use a regex.

We can find all CamelCase words with the following regex: r"([A-Z][a-z]*)"

We can then use re.finditer to get an iterator over the matches, loop over them, and return the match whose start and end enclose our index.

    import re

    def find_word_2(string, index):
        for match in re.finditer(r"([A-Z][a-z]*)", string):
            if match.start() <= index < match.end():
                return match.group()

NOTE This yields more readable code, but it is likely to be a lot slower for large inputs, because it scans the string from the beginning up to the index.
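One way to check the speed claim on your own machine is to time both versions with timeit. The sketch below repeats the two functions from this page so that it runs on its own; the input size, the index and the repetition count are arbitrary assumptions, and no particular result is implied.

```python
import re
import timeit


def find_word(s, index):
    # Index-based version from the question.
    for i in range(index, 0, -1):
        if s[i].isupper():
            left = i
            break
    else:
        left = 0

    for i in range(index, len(s) - 1):
        if s[i].islower() and s[i + 1].isupper() or s[i:i + 2].isupper():
            right = i
            break
    else:
        right = len(s) - 1

    return s[left:right + 1]


def find_word_2(string, index):
    # Regex version from this answer.
    for match in re.finditer(r"([A-Z][a-z]*)", string):
        if match.start() <= index < match.end():
            return match.group()


# Hypothetical benchmark input: a long CamelCase string, index near the end.
text = "CamelCaseString" * 2_000
index = len(text) - 3

loop_time = timeit.timeit(lambda: find_word(text, index), number=20)
regex_time = timeit.timeit(lambda: find_word_2(text, index), number=20)
print(f"index loops: {loop_time:.4f}s, regex: {regex_time:.4f}s")
```

An index near the end of the string is the unfavourable case for the regex version, since it has to walk past every earlier word first.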







answered Dec 13 '18 at 15:43
Ludisposed




























An alternative approach is to trade space for time and pre-calculate a mapping between letter indexes and the individual words. After a one-time pass over the string, the actual lookup runs in $O(1)$, at the cost of $O(n)$ extra space. This may be especially useful if the function is executed many times for the same word and needs a constant-time response.

And, as this is tagged with interview-questions, I personally think it would be beneficial for a candidate to mention this idea of pre-calculating indexes for future constant-time lookups.

We could use a list to store the mapping between indexes and words:

    import re


    class Solver:
        def __init__(self, word):
            self.indexes = []
            for match in re.finditer(r"([A-Z][a-z]*)", word):
                matched_word = match.group()
                for index in range(match.start(), match.end()):
                    self.indexes.append(matched_word)

        def find_word(self, index):
            return self.indexes[index]


    solver = Solver('CamelCaseString')
    print(solver.find_word(2))  # prints "Camel"
    print(solver.find_word(5))  # prints "Case"
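To make the lookup table concrete, here is a small sketch that builds the same kind of list for a short input and prints it. The helper name build_index is hypothetical and not part of the answer above. The sketch assumes the string starts with a capital and contains only letters, so that the regex matches cover every position; otherwise list positions would no longer line up with string positions.

```python
import re


def build_index(word):
    """Map every character position of word to the subword covering it."""
    table = []
    for match in re.finditer(r"[A-Z][a-z]*", word):
        # Repeat the matched subword once per character it spans.
        table.extend([match.group()] * len(match.group()))
    return table


table = build_index("AbCdeF")
print(table)     # ['Ab', 'Ab', 'Cde', 'Cde', 'Cde', 'F']
print(table[3])  # Cde
```

Each entry refers to the same string object as its neighbours within a word, so the extra space is one list slot per character plus one string per subword.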






answered Dec 13 '18 at 22:51
alecxe


























Actually, you found an example where looping over indices is OK. What you messed up is the search for the right end. When slicing, the second value is not included: 'abc'[0:2] gives 'ab'. So your right should be one past the last included character, that is, the index of the next uppercase one. We rewrite the second loop to follow the style of the first one:

    for i in range(index+1, len(s)):
        if s[i].isupper():
            right = i
            break
    else:
        right = len(s)

and return the slice

    return s[left:right]

That is IMHO also the most readable solution, following the KISS principle (and some Python Zen).
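Put together with the unchanged first loop from the question, the whole function might look like the sketch below. The name find_word_simple is hypothetical, and the sketch assumes a string of letters only; a run of capitals such as 'HTTP' would be treated as several one-letter words.

```python
def find_word_simple(s, index):
    # Walk left to the capital that starts the word (first loop, unchanged).
    for i in range(index, 0, -1):
        if s[i].isupper():
            left = i
            break
    else:
        left = 0

    # Walk right to the next capital; that index is the exclusive end.
    for i in range(index + 1, len(s)):
        if s[i].isupper():
            right = i
            break
    else:
        right = len(s)

    return s[left:right]


print(find_word_simple('CamelCaseString', 6))   # Case
print(find_word_simple('ACamelCaseString', 0))  # A
```

Because right is now an exclusive bound, the slice needs no +1 and the last word falls out of the for-else branch without a special case.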







answered Dec 14 '18 at 20:35
stefan





























