I have a program that requires all keywords to be in a single paragraph, usually separated by commas.

For example:

I have these terms:

1-Term
1.1-Term
2-Term
3-Term
4-Term

I collected and organized them into groups and subgroups with titles and subtitles:

Title

  • 1-Term

  • 1.1-Term

  • 2-Term

    • Sub-Title
      • 3-Term
      • 4-Term

But then I want to turn them into:

1-Term, 1.1-Term, 2-Term, 3-Term, 4-Term 
 

Removing certain marked words (titles and subtitles), blank lines, extra whitespace, and line breaks, while adding commas between the terms. I want to keep the dashes “-” that are part of a term, like in “1-Term”:

1-Term,1.1-Term,2-Term,3-Term,4-Term
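A minimal Python sketch of this transformation, assuming the titles to drop are simply passed in by name (`flatten_terms` and its `titles` parameter are hypothetical, not part of any existing tool). A leading “•”, “-” or “*” is treated as a bullet only when whitespace follows it, so hyphens inside a term such as “1-Term” are kept.

```python
import re

# Optional leading bullet ("•", "-" or "*") plus the whitespace after it.
# A dash counts as a bullet only if whitespace follows, so "1-Term" is untouched.
BULLET = re.compile(r"^\s*(?:[•*-]\s+)?")


def flatten_terms(text: str, titles: set[str]) -> str:
    """Join the terms of a bulleted outline into one comma-separated line.

    Blank lines, and lines whose text (after removing the bullet) is one of
    `titles`, are dropped.
    """
    terms = []
    for line in text.splitlines():
        term = BULLET.sub("", line, count=1).strip()
        if term and term not in titles:
            terms.append(term)
    return ",".join(terms)


outline = """\
Title

  • 1-Term

  • 1.1-Term

  • 2-Term

    • Sub-Title
      • 3-Term
      • 4-Term
"""

print(flatten_terms(outline, titles={"Title", "Sub-Title"}))
# 1-Term,1.1-Term,2-Term,3-Term,4-Term
```

Passing titles by name only works when you know them up front; the harder part is letting the script recognize titles by itself.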

  • Cactus_Head@programming.dev (OP)
    12 hours ago

    Basically, I collect keywords (e.g. Transformers, A Deep Dive, Harry Potter, The Worst, Xbox, Star wars, Jedi) from videos on my YouTube home page and organize them into lists:

    • YouTuber terms:

      • A Deep Dive
      • The Worst

    • Franchises:
      • Star wars
      • Jedi
      • Harry Potter
      • Transformers

    • Companies:

      • Xbox

    And turn them into:

    A Deep Dive,The Worst,Star wars,Jedi,Harry Potter,Transformers,Xbox

    Removing the titles and subtitles.
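    In these lists every title ends with a colon (“YouTuber terms:”, “Franchises:”, “Companies:”). Assuming that holds for all of your lists, a Python sketch can drop any line ending in “:” and join the rest (`flatten_keywords` is a hypothetical name):

```python
import re

# Optional leading bullet ("•", "-" or "*") plus the whitespace after it.
BULLET = re.compile(r"^\s*(?:[•*-]\s+)?")


def flatten_keywords(text: str) -> str:
    """Return the keywords as one comma-separated line.

    Assumes every title or subtitle line ends with a colon, as in
    "YouTuber terms:", and that every other non-blank line is a keyword.
    """
    keywords = []
    for line in text.splitlines():
        item = BULLET.sub("", line, count=1).strip()
        if not item or item.endswith(":"):
            continue  # skip blank lines and titles/subtitles
        keywords.append(item)
    return ",".join(keywords)


lists = """\
• YouTuber terms:

  • A Deep Dive
  • The Worst

• Franchises:
  • Star wars
  • Jedi
  • Harry Potter
  • Transformers

• Companies:

  • Xbox
"""

print(flatten_keywords(lists))
# A Deep Dive,The Worst,Star wars,Jedi,Harry Potter,Transformers,Xbox
```

    The trade-off is that a keyword which itself ends in a colon would be mistaken for a title.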

    How do you tell text and titles/subtitles apart?

    I was thinking of putting a symbol, for example “#”, in front of each title,

    # - YouTuber terms:  
    

    so the script knows to ignore that whole line, like a comment in many programming languages.
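    With a “#” marker the script only has to skip lines that start with it, much like a comment. A Python sketch (`flatten_unmarked` and its `marker` parameter are hypothetical names):

```python
import re

# Optional leading bullet ("•", "-" or "*") plus the whitespace after it.
BULLET = re.compile(r"^\s*(?:[•*-]\s+)?")


def flatten_unmarked(text: str, marker: str = "#") -> str:
    """Join all lines except blank lines and lines starting with `marker`.

    Leading whitespace is ignored when checking for the marker, so indented
    subtitles such as "    # - Sub-Title" are skipped too.
    """
    keywords = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(marker):
            continue  # blank line or marked title: ignore the whole line
        keywords.append(BULLET.sub("", stripped, count=1))
    return ",".join(keywords)


marked = """\
# - YouTuber terms:
  - A Deep Dive
  - The Worst
# - Companies:
  - Xbox
"""

print(flatten_unmarked(marked))
# A Deep Dive,The Worst,Xbox
```

    With an explicit marker, titles no longer need any particular wording, but a keyword that genuinely starts with “#” (a hashtag, say) would be dropped as well.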

    • a14o@feddit.org
      12 hours ago

      This is not difficult to achieve at all with tools like sed or awk. But unless you provide a concrete example input file or files, all we can do is point to those tools.

      • Cactus_Head@programming.dev (OP)
        5 hours ago

        Something like this?

        - Franchise(Title): 
        
          - Harry potter
        
          - Perfect Blue
        
          - Jurassic world
          - Jurassic Park
        
          - Jedi
          - Star wars
          - The clone wars
        
          - MCU
        
          - Cartoons(Sub-Title):
        
            - Gumball 
        
            - Flapjack
        
            - Steven Universe
        
            - Star vs. the forces of Evil

            - WordGirl

            - Flapjack

        Turned into:

        Harry potter,Perfect Blue,Jurassic world,Jurassic Park,Jedi,Star wars,The clone wars,MCU,Gumball,Flapjack,Steven Universe,Star vs. the forces of Evil,WordGirl,Flapjack
        

        Both “Franchise” and “Cartoons” were removed, i.e. not included with the other words.

        • moonpiedumplings@programming.dev
          9 hours ago

          This is technically YAML, I think: a list with one entry, which maps a title to a list that contains mostly single items but also one other titled list. You should be able to parse it with a YAML parser such as PyYAML for Python (a third-party package; Python's standard library has no built-in YAML module).

          Note that YAML is picky about syntax, though, so it wouldn't be able to handle deviations.
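          Once a YAML parser has turned the list into nested Python lists and dicts, the titles are exactly the dict keys, so collecting only the values leaves the terms. A sketch, assuming PyYAML is installed (`pip install pyyaml`); `collect_terms` and `load_terms` are hypothetical names, and the `parsed` literal is a shortened version of what `yaml.safe_load` should return for the Franchise example.

```python
from typing import Any


def collect_terms(node: Any) -> list[str]:
    """Collect the terms of a parsed outline in document order.

    Dict keys are the titles/subtitles, so only their values are visited;
    list items are visited in order; anything else is a term.
    """
    if isinstance(node, dict):
        return [term for value in node.values() for term in collect_terms(value)]
    if isinstance(node, list):
        return [term for item in node for term in collect_terms(item)]
    if node is None:
        return []  # a title with nothing under it parses as None
    return [str(node)]


def load_terms(path: str) -> str:
    """Parse a YAML outline file and return its terms as one line."""
    import yaml  # PyYAML, a third-party package: pip install pyyaml

    with open(path, encoding="utf-8") as f:
        return ",".join(collect_terms(yaml.safe_load(f)))


# Shortened version of what yaml.safe_load should return for the
# "Franchise(Title)" example above.
parsed = [
    {
        "Franchise(Title)": [
            "Harry potter",
            "Perfect Blue",
            "MCU",
            {"Cartoons(Sub-Title)": ["Gumball", "Flapjack"]},
        ]
    }
]

print(",".join(collect_terms(parsed)))
# Harry potter,Perfect Blue,MCU,Gumball,Flapjack
```

          One catch with YAML specifically: PyYAML interprets bare values, so a keyword such as `yes` or `no` comes back as a boolean and would be written out as `True`/`False`. Quoting such entries in the file avoids that.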