본문 바로가기
알고리즘

Codility : GenomicRangeQuery 풀이

by 내기록 2022. 3. 5.
반응형

문제

 

GenomicRangeQuery coding task - Learn to Code - Codility

Find the minimal nucleotide from a range of sequence DNA.

app.codility.com

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively.

-> A(1) C(2) G(3) T(4)

You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).

For example, consider string S = CAGCCTA and arrays P, Q such that:

 

 

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

The answers to these M = 3 queries are as follows:

The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.

  • The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
  • The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.

Write a function:

class Solution { public int[] solution(String S, int[] P, int[] Q); }

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as an array of integers.

For example, given the string S = CAGCCTA and arrays P, Q such that:

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

the function should return the values [2, 4, 1], as explained above.

Write an efficient algorithm for the following assumptions:

  • N is an integer within the range [1..100,000];

M is an integer within the range [1..50,000];

  • each element of arrays P and Q is an integer within the range [0..N - 1];
  • P[K] ≤ Q[K], where 0 ≤ K < M;
  • string S consists only of upper-case English letters A, C, G, T.

 

 

 

 

해답

생각보다 좀 헤맸다. 다른 블로그들 참고하면서 이해했는데 개인적인 생각으로는 타입이 ACGT 4개 뿐이라서 2차원 배열 만들어서 저장하는게 부담없이 진행되지 않았나 싶다.

ACGT를 숫자로 바꾸는게 번거로워서 Map에 값을 넣고 변경했다.

key point : 각 층마다 존재하는 알파벳 수를 기입하는 2중 포문 map 사용

 

문제에 나온 예제로 살펴보면

map[0]=C={0,1,0,0}

map[1]=CA={1,1,0,0}

map[2]=CAG={1,1,1,0}

map[3]=CAGC={1,2,1,0}

map[4]=CAGCC={1,3,1,0}

map[5]=CAGCCT={1,3,1,1}

map[6]=CAGCCTA={2,3,1,1}

이렇게 2차원 배열 map에 저장이 된다.

 

array 위치 기준으로 2~4 내부에서 찾는다고 하면 GCC = map[4]-map[1]={0,2,1,0} 이 된다.

시작 위치는 2인데 map[1]을 빼는 이유는 map[2]를 빼게되면 2에 위치한 G의 값까지 빼게 된다!

map[4]-map[2]={0,2,0,0} 으로 G가 빠지기 때문에 적절하지 않다.

 

import java.util.HashMap;
import java.util.Map;

public class PrefixSum {
	public int[] solution(String S, int[] P, int[] Q) {

		Map<Character, Integer> change = new HashMap<>();
		change.put('A', 0);
		change.put('C', 1);
		change.put('G', 2);
		change.put('T', 3);

		int N = P.length;

		int[][] arr = new int[S.length()][4];

		arr[0][change.get(S.charAt(0))]++;
		for (int i = 1; i < S.length(); i++) {
			char c = S.charAt(i);

			arr[i] = arr[i - 1].clone();
			arr[i][change.get(c)]++;
		}

		int[] res = new int[N];

		for (int i = 0; i < N; i++) {
			for (int j = 0; j < 4; j++) {

				int start = (P[i] == 0) ? 0 : arr[P[i]-1][j];
				int end = arr[Q[i]][j];

				if (end - start > 0) {
					res[i] = j + 1;
					break;
				}
			}
		}

		return res;
	}
}

 

References

https://jackjeong.tistory.com/44

 

반응형

댓글